What does it mean to "understand" an image? One popular answer is simply naming the objects seen in the image. During the last decade, most computer vision researchers have focused on this "object naming" problem. While there has been great progress in detecting things like "cars" and "people", such a level of understanding still cannot answer even basic questions about an image such as "What is the geometric structure of the scene?", "Where in the image can I walk?" or "What is going to happen next?" In this talk, I will show that it is beneficial to go beyond mere object naming and harness relationships between objects for image understanding. These relationships can provide crucial high-level constraints to help construct a globally consistent model of the scene, as well as allow for powerful ways of understanding and interpreting the underlying image. Specifically, I will present image and video understanding systems that incorporate: (1) physical relationships between objects via a qualitative 3D volumetric representation; (2) functional relationships between objects and actions via data-driven physical interactions; and (3) causal relationships between actions via a storyline representation. I will demonstrate the importance of these relationships on a diverse set of real-world images and videos.
Abhinav Gupta is a postdoctoral fellow at the Robotics Institute, Carnegie Mellon University, working with Alexei Efros and Martial Hebert. His research is in the area of computer vision and its applications to robotics and computer graphics. He is particularly interested in using physical, functional and causal relationships for understanding images and videos. His other research interests include exploiting the relationship between language and vision, semantic image parsing, and exemplar-based models for recognition. Abhinav received his PhD in 2009 from the University of Maryland under Larry Davis. His dissertation was nominated for the ACM Doctoral Dissertation Award by the University of Maryland. Abhinav is a recipient of the ECCV Best Paper Runner-up Award (2010) and the University of Maryland Dean's Fellowship Award (2004).