A 2D + 3D Rich Data Approach to Scene Understanding
To push the traditional 2D representation to the limit, we built the Scene Understanding (SUN) Database, a large collection of images that exhaustively spans all scene categories. However, the lack of a "rich" representation still significantly limits the traditional recognition pipeline. An image is a 2D array, but the world is 3D and our eyes see it from a particular viewpoint, and traditional pipelines do not model this. Shifting toward a rich representation also opens up new challenges that require a new kind of big data -- data with extra descriptions, namely rich data. Specifically, we focus on a highly valuable kind of rich data -- multiple viewpoints in 3D -- and we build the SUN3D database to obtain an integrated "place-centric" representation of scenes. This novel representation with rich data opens up exciting new opportunities for integrating scene recognition over space and for obtaining scene-level reconstructions of large environments. It also has many applications, such as organizing big visual data to provide photo-realistic indoor 3D maps. Finally, I will discuss some open challenges and my future plans for rich data and representation.
Jianxiong Xiao is a Ph.D. candidate in the Computer Science and Artificial Intelligence Laboratory (CSAIL) at the Massachusetts Institute of Technology (MIT). Before that, he received a B.Eng. and an M.Phil. from the Hong Kong University of Science and Technology. His research interests are in computer vision, with a focus on scene understanding. His work received the Best Student Paper Award at the European Conference on Computer Vision (ECCV) in 2012 and has appeared in the popular press. Jianxiong was awarded the Google U.S./Canada Ph.D. Fellowship in Computer Vision in 2012 and the MIT CSW Best Research Award in 2011. More information can be found on his website: http://mit.edu/jxiao.