02-25
Bayesian Nonparametric Models and "Big Data"

Bayesian nonparametrics is an area of machine learning in which models grow in size and complexity as data accrue. As such, these models are particularly relevant to the world of "Big Data", where it may be difficult or even counterproductive to fix the number of parameters a priori. A stumbling block for Bayesian nonparametric models has been that their algorithms for posterior inference generally scale poorly. In this talk, we tackle this issue in the domain of large-scale text collections. Our model is a novel tree-structured model in which documents are represented by collections of paths in an infinite-dimensional tree. We develop a general and efficient variational inference strategy, based on stochastic optimization, for learning such models, and show that with this combination of modeling and inference we are able to learn high-quality models using millions of documents.

John Paisley received the B.S.E. (2004), M.S. (2007) and Ph.D. (2010) in Electrical & Computer Engineering from Duke University, where his advisor was Lawrence Carin. He was a postdoctoral researcher with David Blei in the Computer Science Department at Princeton University, and is currently a postdoctoral researcher with Michael Jordan in the Department of EECS at UC Berkeley. He works on developing Bayesian models for machine learning applications, particularly for dictionary learning and topic modeling.

Date and Time

Monday February 25, 2013 4:30pm - 5:30pm

Location

Computer Science Small Auditorium (Room 105)

Event Type

CS Department Colloquium Series

Speaker

John Paisley, from the University of California, Berkeley

Host

David Blei

Contributions to and/or sponsorship of any event do not constitute departmental or institutional endorsement of the specific program, speakers or views presented.

