Combining Words & Pictures

Tamara Berg
Computer Science, SUNY Stony Brook

Billions of photographs with associated text are available on the web. Images and words are naturally linked in web pages, captioned photographs, and video with speech or closed captioning. The central problem in organizing these collections effectively is how to extract the images that depict specified objects from large pools of pictures with noisy text. This is very challenging because the relationship between the words associated with an image and the objects depicted in it is often complex.

My work has demonstrated that in many situations these collections can be mined successfully. In this talk I will describe three projects I have worked on in this area: automatically labeling faces in news photographs, classifying images from the web, and ranking iconic images from consumer photo collections.

Related papers, datasets, and demos are available on my webpage.

Most relevant papers:

"Who's in the Picture?" Tamara L. Berg, Alexander C. Berg, Jaety Edwards, David A. Forsyth. Neural Information Processing Systems (NIPS), 2004.

"Animals on the Web." Tamara L. Berg, David A. Forsyth. Computer Vision and Pattern Recognition (CVPR), 2006.