Knowledge-lean Approaches to Natural Language Processing
Date and Time
Wednesday, April 28, 2004 - 4:00pm to 5:30pm
Computer Science Small Auditorium (Room 105)
Lillian Lee, from Cornell University
To enable computers to understand and use natural language, a massive amount of knowledge, linguistic and otherwise, must be acquired. As a result, much recent research has focused on creating systems that automatically learn high-quality information about language, and about the world, directly from the statistics of unprocessed or minimally processed language samples. As examples, I will focus on two lines of work. The first uses information-theoretic distributional clustering methods, trained on large language samples, to induce probabilistic models of linguistic co-occurrences. The second applies multiple-sequence-alignment algorithms, commonly employed in computational biology, to learn to generate English versions of computer-generated proofs, producing texts whose quality rivals that of hand-crafted systems.
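As a rough illustration of the first line of work (a minimal sketch, not code from the talk: the toy corpus, the right-neighbor context choice, and all function names are assumptions), distributional methods represent each word by the probability distribution of the contexts it occurs in, and measure word similarity by an information-theoretic distance between those distributions, such as the Jensen-Shannon divergence:

```python
# Sketch of distributional word similarity: represent each word by the
# distribution of its immediate left-neighbor contexts, then compare
# words with the Jensen-Shannon divergence (0 = identical contexts,
# 1 = disjoint contexts, using log base 2).
import math
from collections import Counter, defaultdict

# Toy corpus; "." marks clause boundaries and is not counted as context.
corpus = ("drink coffee . drink tea . drink water . "
          "eat bread . eat cheese . eat rice").split()

# Count, for each word, how often each word appears immediately to its left.
contexts = defaultdict(Counter)
for left, right in zip(corpus, corpus[1:]):
    if left != "." and right != ".":
        contexts[right][left] += 1

def distribution(word):
    """Normalize a word's context counts into a probability distribution."""
    counts = contexts[word]
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

def js_divergence(p, q):
    """Jensen-Shannon divergence between two sparse distributions."""
    keys = set(p) | set(q)
    m = {k: 0.5 * (p.get(k, 0.0) + q.get(k, 0.0)) for k in keys}
    def kl(a, b):
        return sum(a[k] * math.log2(a[k] / b[k]) for k in a if a[k] > 0)
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

# "coffee" and "tea" share the context "drink", so their divergence is 0;
# "coffee" and "bread" share no contexts, so their divergence is maximal.
```

Clustering words whose context distributions lie close together under such a divergence is what yields the induced probabilistic co-occurrence classes the abstract refers to.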
Portions of this talk are based on joint work with Regina Barzilay and with Fernando Pereira.