Natural Language Understanding with Paraphrases and Composition
Natural language processing (NLP) aims to teach computers to understand human language. NLP has enabled some of the most visible applications of artificial intelligence, including Google search, IBM Watson, and Apple’s Siri. As AI is applied to increasingly complex domains such as health care, education, and government, NLP will play a crucial role in allowing computational systems to access the vast amount of human knowledge documented in the form of unstructured speech and text.
In this talk, I will discuss my work on training computers to make inferences about what is true or false based on information expressed in natural language. My approach combines machine learning with insights from formal linguistics in order to build data-driven models of semantics that are more precise and interpretable than would be possible using linguistically naive approaches. I will begin with my work on automatically adding semantic annotations to the 100 million phrase pairs in the Paraphrase Database (PPDB). These annotations provide the type of information necessary for carrying out precise inferences in natural language, transforming the database into the largest available lexical semantics resource for natural language processing. I will then turn to the problem of compositional entailment, and present an algorithm for performing inferences about long phrases that are unlikely to have been observed in data. Finally, I will discuss my current work on pragmatic reasoning: when and how humans derive meaning from a sentence beyond what is literally contained in the words. I will describe the difficulties that such "common-sense" inference poses for automatic language understanding, and present my ongoing work on models for overcoming these challenges.
Ellie Pavlick is a PhD student at the University of Pennsylvania, advised by Dr. Chris Callison-Burch. Her dissertation focuses on natural language inference and entailment. Outside of her dissertation research, Ellie has published work on stylistic variation in paraphrase (e.g., how paraphrases can affect the formality or the complexity of language) and on applications of crowdsourcing to natural language processing and social science problems. She has been involved in the design and instruction of Penn's first undergraduate course on Crowdsourcing and Human Computation (NETS 213). Ellie is a 2016 Facebook PhD Fellow, and has interned at Google Research, Yahoo Labs, and the Allen Institute for Artificial Intelligence.