Putting language in context: social networks and discourse structures for robust language technology
The creation of software that can use and understand natural human language would be a transformative technological advance, enabling computational inference to be brought to bear on the vast stores of knowledge that are encoded only as text. Supervised machine learning approaches have brought this possibility into view, but existing methods rely on annotated datasets that focus overwhelmingly on a narrow set of news texts, and they fail to generalize to high-impact application domains such as social media and electronic health records. Human readers and listeners successfully comprehend language under difficult circumstances by relying on various forms of contextual information -- knowing who is speaking, and what they might be trying to say. My research brings this same contextual awareness to automated language processing, using deep learning architectures that are constructed to reflect theoretical ideas from sociolinguistics and discourse semantics. I will describe three applications of this methodology: (1) incorporating sociolinguistic variation into document classification; (2) linking textual references to canonicalized entities; and (3) predicting the discourse relations that hold between sentences. I will also briefly describe research that uses computational linguistic analysis to obtain new evidence on sociocultural affinity and influence.
Jacob Eisenstein is an Assistant Professor in the School of Interactive Computing at Georgia Tech. He works on statistical natural language processing, focusing on computational sociolinguistics, social media analysis, discourse, and machine learning. He is a recipient of the NSF CAREER Award, a member of the Air Force Office of Scientific Research (AFOSR) Young Investigator Program, and was a SICSA Distinguished Visiting Fellow at the University of Edinburgh. His work has also been supported by the National Institutes of Health, the National Endowment for the Humanities, and Google. Jacob was a postdoctoral researcher at Carnegie Mellon and the University of Illinois. He completed his Ph.D. at MIT in 2008, winning the George M. Sprowls dissertation award. Jacob's research has been featured in the New York Times and on National Public Radio and the BBC. Thanks to his brief appearance in If These Knishes Could Talk, Jacob has a Bacon number of 2.