Data-Driven Text Analysis with Joint Models | Computer Science Department at Princeton University

Date and Time

Tuesday, March 1, 2016 - 12:30pm to 1:30pm

Location

Computer Science Small Auditorium (Room 105)

Type

CS Department Colloquium Series

Speaker

Greg Durrett, from UC Berkeley

Host

Barbara Engelhardt

One reason that analyzing text is hard is that it involves dealing with deeply entangled linguistic variables: objects like syntactic structures, semantic types, and discourse relations depend on one another in complex ways. Our work tackles several facets of text analysis using joint modeling, combining model components both across and within the various subtasks of this analysis. This model structure allows us to pass information between these entangled subtasks and propagate high-confidence predictions rather than errors. Critically, our models have the capacity to learn key linguistic phenomena as well as other important patterns in the data; that is, linguistics tells us how to structure these models, and the data then injects knowledge into them. We describe state-of-the-art systems for a range of tasks, including syntactic parsing, entity resolution, and document summarization.

Greg is a Ph.D. candidate at UC Berkeley working on natural language processing with Dan Klein. He is interested in building structured machine learning models for a wide variety of text analysis problems and downstream NLP applications. His work comprises two broad thrusts: first, designing joint models that combine information across different tasks or different views of a problem, and second, building systems that strike a balance between being linguistically motivated and data-driven.