From Pixels to Scenes: Recovering 3D Geometry and Semantics for Indoor Environments

Report ID:
TR-021-18
Authors:
Date:
October 29, 2018
Pages:
170
Download Formats:
[PDF]

Abstract:

Understanding the 3D geometry and semantics of real environments is in critical demand for many applications, such as autonomous driving, robotics, and augmented reality. However, the task is extremely challenging due to imperfect and noisy measurements
from real sensors, limited access to ground truth data, and cluttered scenes
exhibiting heavy occlusions and intervening objects. To address these issues, this thesis
introduces a series of works that produce a geometric and semantic understanding
of the scene in both pixel-wise and holistic 3D representations. Starting from estimating
a depth map, which is a fundamental task in many approaches for reconstructing
the 3D geometry of the scene, we introduce a learning-based active stereo system
that is trained in a self-supervised fashion and reduces the disparity error to one tenth of that of canonical stereo systems. To handle the more common case where only a single
input image is available for scene understanding, we create a high-quality synthetic
dataset that facilitates pre-training of data-driven approaches, and demonstrate that it improves both surface normal estimation and the raw depth measurements from commodity RGB-D sensors. Lastly, we pursue holistic 3D scene understanding
by estimating a 3D representation of the scene, in which objects and room layout
are represented by 3D bounding boxes and planar surfaces, respectively. We propose
methods to produce such a representation from either a single color panorama or a
depth image, leveraging scene context. On the whole, the proposed methods produce an understanding of both 3D geometry and semantics, from the fine-grained pixel level to the holistic scene scale, building foundations that support future work
in 3D scene understanding.
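
To make the holistic representation described in the last part concrete, the sketch below shows one way such a scene could be stored: objects as oriented 3D bounding boxes and the room layout as planar surfaces. This is a minimal, illustrative Python sketch only; the class and field names (ObjectBox, LayoutPlane, Scene) are assumptions for exposition, not the code used in the thesis.

```python
# Minimal sketch (not the thesis code) of a holistic scene representation:
# objects as oriented 3D bounding boxes, room layout as planar surfaces.
# All names and conventions here are illustrative assumptions.
from dataclasses import dataclass, field
from typing import List
import numpy as np


@dataclass
class ObjectBox:
    """An object represented by an oriented 3D bounding box."""
    label: str          # semantic category, e.g. "chair"
    center: np.ndarray  # (3,) box center in world coordinates
    size: np.ndarray    # (3,) box extents (width, height, depth)
    heading: float      # rotation about the vertical axis, in radians

    def corners(self) -> np.ndarray:
        """Return the 8 box corners in world coordinates, shape (8, 3)."""
        # Corners of a unit cube scaled by the half-extents.
        signs = np.array([[sx, sy, sz]
                          for sx in (-1, 1)
                          for sy in (-1, 1)
                          for sz in (-1, 1)], dtype=float)
        local = signs * (self.size / 2.0)
        # Yaw rotation about the vertical (y) axis, then translate to center.
        c, s = np.cos(self.heading), np.sin(self.heading)
        R = np.array([[c, 0.0, s],
                      [0.0, 1.0, 0.0],
                      [-s, 0.0, c]])
        return local @ R.T + self.center


@dataclass
class LayoutPlane:
    """A room-layout surface (wall, floor, ceiling) as a plane n·x = d."""
    label: str           # e.g. "floor", "ceiling", "wall"
    normal: np.ndarray   # (3,) unit normal
    offset: float        # signed distance of the plane from the origin

    def distance(self, points: np.ndarray) -> np.ndarray:
        """Signed distance of points with shape (N, 3) to the plane."""
        return points @ self.normal - self.offset


@dataclass
class Scene:
    """Holistic scene: a set of object boxes plus the room layout."""
    objects: List[ObjectBox] = field(default_factory=list)
    layout: List[LayoutPlane] = field(default_factory=list)
```

Under this sketch, a method estimating the scene from a panorama or depth image would populate `Scene.objects` with one box per detected object and `Scene.layout` with one plane per wall, floor, and ceiling surface.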
