Google and the Vapnik-Chervonenkis Dimension
Date and Time
Wednesday, February 11, 2009 - 4:15pm to 5:45pm
Computer Science Small Auditorium (Room 105)
Stuart Geman, from Brown University
Google engineers routinely train query classifiers, for ranking advertisements or search results, on more words than any human being sees or hears in a lifetime. A human being who sees a meaningfully new image every second for one hundred years will not see as many images as Google has in its libraries, all of which are available for training object detectors and image classifiers. Yet by human standards the state of the art in computer understanding of language and in computer-generated image analysis is primitive. What explains the gap? Why can’t learning theory tell us how to make machines that learn as efficiently as humans? Upper bounds on the number of training samples needed to learn a classifier as rich and competent as the human visual system can be derived using the Vapnik-Chervonenkis dimension, or the metric entropy, but these bounds suggest not only that Google needs more examples, but that all of evolution might fall short. I will make some proposals for efficient learning and offer some mathematics to support them.