Applications of Large-Scale Machine Learning in Vision and Robotics

Yann LeCun

Computer Science Department, New York University

One long-term goal of machine learning research is to produce methods that are applicable to highly complex tasks, such as visual perception, auditory perception, language understanding, reasoning, intelligent control, and other tasks associated with artificial intelligence. To reach that goal, the machine learning community faces two main challenges: solving the normalization problem and solving the deep learning problem.

The normalization problem is the difficulty of training probabilistic models over large spaces while keeping them properly normalized. In recent years, the ML and natural language communities have devoted considerable effort to circumventing this problem by developing ``un-normalized'' learning models for tasks in which the output is highly structured (e.g. English sentences). This class of models was in fact originally developed during the early 1990s in the speech and handwriting recognition communities, and resulted in highly successful commercial systems for automatically reading bank checks and other documents.
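
To make the difficulty concrete, the following is a minimal sketch in notation that is assumed here for illustration and is not taken from the talk itself: x is an input, y ranges over a set Y of structured outputs, and E_theta(x, y) is an energy (lower means more compatible) with parameters theta.

```latex
% Assumed notation: \mathcal{Y} is the set of candidate outputs,
% E_\theta the energy function, Z_\theta the normalization term.
P_\theta(y \mid x) = \frac{\exp\bigl(-E_\theta(x, y)\bigr)}{Z_\theta(x)},
\qquad
Z_\theta(x) = \sum_{y' \in \mathcal{Y}} \exp\bigl(-E_\theta(x, y')\bigr)

% An un-normalized model skips Z_\theta and predicts by energy minimization:
y^\ast = \arg\min_{y \in \mathcal{Y}} E_\theta(x, y)
```

If y is a sentence of n words over a vocabulary V, the set Y contains |V|^n candidates, so the sum defining Z_theta(x) cannot in general be computed exactly. An un-normalized model only needs energies to be ranked correctly, which is why the normalization term can be dropped.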


The deep learning problem is the issue of training all the levels of a recognition system (e.g. segmentation, feature extraction, recognition) in an integrated fashion. We first consider ``traditional'' methods for deep supervised learning, such as multi-layer neural networks and convolutional networks, a learning architecture for image recognition loosely modeled after the architecture of the visual cortex. Several practical applications of convolutional nets will be demonstrated with videos and live demos, including a handwriting recognition system, a real-time human face detector that also estimates the pose of the face, a real-time system that can detect and recognize objects such as airplanes, cars, animals and people in images, and a vision-based navigation system for off-road mobile robots that trains itself on-line to avoid obstacles.
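
As a rough illustration of what one stage of a convolutional network computes, the sketch below applies a single shared kernel across an image, passes the result through a squashing nonlinearity, and subsamples it. It is a simplified NumPy example written for this abstract, not the architecture used in the demonstrated systems; the function names, the 28x28 input, the 5x5 kernel, and the tanh nonlinearity are all assumptions chosen for illustration.

```python
import numpy as np


def conv2d_valid(image, kernel):
    """Slide one shared kernel over the image ('valid' cross-correlation)."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # The same weights are reused at every position (weight sharing).
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out


def subsample2x2(fmap):
    """Average non-overlapping 2x2 blocks, dropping an odd trailing row/column."""
    h = fmap.shape[0] // 2 * 2
    w = fmap.shape[1] // 2 * 2
    trimmed = fmap[:h, :w]
    return trimmed.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))


def conv_stage(image, kernel, bias=0.0):
    """One hypothetical stage: convolution, tanh nonlinearity, subsampling."""
    return subsample2x2(np.tanh(conv2d_valid(image, kernel) + bias))


rng = np.random.default_rng(0)
image = rng.standard_normal((28, 28))        # stand-in for a grayscale image
kernel = rng.standard_normal((5, 5)) * 0.1   # weights that training would adjust
features = conv_stage(image, kernel)
print(features.shape)  # (12, 12)
```

In a full network, several such stages with many kernels each are stacked, and all kernels are trained jointly by gradient descent, which is the sense in which the levels of the system are trained in an integrated fashion.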

Although these methods achieve excellent performance, they require many training samples. The next challenge is to devise unsupervised learning methods for deep networks. Inspired by recent work by Hinton on ``deep belief networks'', we devised an energy-based unsupervised algorithm that can learn deep hierarchies of invariant features for image recognition. We show how such algorithms can dramatically reduce the required number of training samples, particularly for tasks such as the recognition of everyday objects at the category level.