Over the past sixty years, Intelligent Machines (IM) have made great progress in playing games, tagging images in isolation, and, more recently, making decisions for self-driving vehicles. Despite these advances, they are still far from making decisions in social scenes and assisting humans in public spaces such as terminals, malls, campuses, or any crowded urban environment. To overcome these limitations, we need to empower machines with social intelligence, i.e., the ability to get along well with others and facilitate mutual cooperation. This is crucial for designing smart spaces that adapt to human behavior for efficiency, or for developing autonomous machines that assist in crowded public spaces (e.g., delivery robots or self-navigating Segways).
In this talk, I will present my work towards socially-aware machines that can understand human social dynamics and learn to forecast them. First, I will highlight the machine vision techniques behind understanding the behavior of more than 100 million individuals captured by multi-modal cameras in urban spaces. I will show how to use sparsity-promoting priors to extract meaningful information about human behavior from an overwhelming volume of high-dimensional, high-entropy data. Second, I will introduce a new deep learning method to forecast human social behavior. The causality behind human behavior is an interplay between observable and non-observable cues (e.g., intentions). For instance, when humans walk into crowded urban environments such as a busy train terminal, they obey a large number of (unwritten) common-sense rules and comply with social conventions. They typically avoid crossing groups and keep a personal distance from their surroundings. I will present detailed insights on how to learn these interactions from millions of trajectories. I will describe a new recurrent neural network that can jointly reason over correlated sequences and simulate human trajectories in crowded scenes. It opens new avenues of research in learning the causalities behind the world we observe. I will conclude my talk by mentioning some ongoing work in applying these techniques to social robots and the first generation of smart hospitals.
Alexandre Alahi is currently a postdoctoral fellow at Stanford University and received his PhD from EPFL. His research interests span visual information processing, computer vision, machine learning, and robotics, with a focus on understanding and forecasting human social behaviors. In particular, he is interested in sparse approximation, deep learning, big visual data processing, real-time machine vision, and multi-modal reasoning. He was awarded the Swiss NSF early and advanced researcher grants. He won the CVPR 2012 Open Source Award for his work on a retina-inspired image descriptor, and the ICDSC 2009 Challenge Prize for his sparsity-driven algorithm to track sports players. His PhD was nominated for the EPFL PhD prize. His research has been covered by the Wall Street Journal and the PBS TV channel in the US, as well as by European newspapers, Swiss TV news, and the Euronews TV channel in Europe. He has also co-founded the startup Visiosafe, transferring his research on understanding human behavior into an industrial product. He was selected as one of the Top 20 Swiss Venture leaders in 2010 and won several startup competitions.