RGBD Pipeline for Indoor Scene Reconstruction and Understanding
In this work, we consider the problem of reconstructing a 3D model from a sequence of
color and depth frames. Generating such a model has many important applications,
ranging from the entertainment industry to real estate. However, transforming the
RGBD frames into high-quality 3D models is a challenging problem, especially if
additional semantic information is required. In this document, we introduce three
projects, which implement various stages of a robust RGBD processing pipeline.
First, we consider the challenges arising during the RGBD data capture process.
While the depth cameras are providing dense, per-pixel depth measurements, there is
a non-trivial error associated with the resulting data. We discuss the depth generation
problem and propose an error reduction technique based on estimating an imagespace undistortion field. We describe the capture process of the data required for
the generation of such an undistortion field. We showcase how correcting the depth
measurements improves the reconstruction quality.
Second, we address the problem of registering RGBD frames over a long video
sequence into a globally consistent 3D model. We propose a “fine-to-coarse” global
registration algorithm that leverages robust registrations at finer scales to seed detection and enforcement of geometrical constraints, modeled as planar structures, at
coarser scales. To test global registration algorithms, we provide a benchmark with
10,401 manually-clicked point correspondences in 25 scenes from the SUN3D dataset.
We find that our fine-to-coarse algorithm registers long RGBD sequences better than
Last, we show how repeated scans of the same space can be used to establish
associations between the different observations. Specifically, we consider a situation
where 3D scans are acquired repeatedly at sparse time intervals. We develop an
algorithm that analyzes these “rescans” and builds a temporal model of a scene with
semantic instance information. The proposed algorithm operates inductively by using
a temporal model resulting from past observations to infer instance segmentation of
a new scan. The temporal model is continuously updated to reflect the changes
that occur in the scene over time, providing object associations across time. The
algorithm outperforms alternate approaches based on state-of-the-art networks for
semantic instance segmentation