Learning and Recognizing Real-World Scenes and Objects
In this talk, we will begin by showing a recent algorithm developed toward learning and recognizing complex real-life images such as busy city street, beach, kitchen, etc. To motivate this topic, I will present a series of recent human psychophysics studies on natural scene recognition. All these experiments converge to one prominent phenomena of the human visual system: humans are extremely efficient and rapid in capturing the overall gist of natural images. Can we achieve such a feat in computer vision modeling? We propose here a generative Bayesian hierarchical model that learns to categorize natural images in a weakly supervised fashion. We represent an image by a collection of local regions, denoted as codewords obtained by unsupervised clustering. Each region is then represented as part of a `theme'. In previous work, such themes were learnt from hand-annotations of experts, while our method learns the theme distribution as well as the codewords distribution over the themes without such supervision. We report excellent categorization performances on a large set of 13 categories of complex scenes.
If time permits, I will also briefly go over some work toward generic object categorization in cluttered scenes, especially under constraint training condition. We will highlight some progress made in one-shot learning of object categories. Learning visual models of object categories notoriously requires thousands of training examples; this is due to the diversity and richness of object appearance which requires models containing hundreds of parameters. We present a method for learning object categories from just a few images (1~5). It is based on incorporating "generic" knowledge which may be obtained from previously learnt models of unrelated categories. We operate in a variational Bayesian framework: object categories are represented by probabilistic models, and "prior" knowledge is represented as a probability density function on the parameters of these models. The "posterior" model for an object category is obtained by updating the prior in the light of one or more observations. Our ideas are demonstrated on objects belonging to 101 widely varied categories.