Making Language Models Useful

Date and Time
Thursday, February 29, 2024 - 12:30pm to 1:30pm
Location
Computer Science Small Auditorium (Room 105)
Type
CS Department Colloquium Series
Host
Karthik Narasimhan

Eric Mitchell
Large pre-trained language models, most notably GPT-3, are the engines of knowledge and capability underpinning powerful systems such as ChatGPT, Gemini, and Claude. Yet much like building a safe, comfortable vehicle requires more than a powerful engine, building a useful, beneficial language system requires additional techniques to promote key attributes such as controllability, factuality, and updatability. This talk will share my work towards imbuing large language models with these traits. I will first share the direct preference optimization algorithm, a more scalable algorithm for training language models to follow instructions in accordance with human preferences. I will next discuss approaches for improving the factual reliability of language models, which is challenging even for models that generally follow user instructions well. Finally, I will share my work towards methods for updating individual model behaviors or beliefs that have fallen out-of-date or are otherwise problematic. I will conclude with several important topics for future work toward more useful, trustworthy AI systems, including unsupervised continual learning, scalable oversight, and robust reasoning.

Bio: Eric Mitchell is a final-year PhD student in Stanford's Computer Science department, advised by Chelsea Finn and Christopher Manning. His research uses tools from machine learning to improve the usefulness and reliability of language models, in particular by developing techniques that enhance their controllability, factuality, and updatability. His work has appeared at ICML, NeurIPS, ICLR, and EMNLP, and was recognized with an outstanding paper runner-up award at NeurIPS 2023. His work, in particular the direct preference optimization algorithm, is widely used in state-of-the-art open-source and proprietary language models. He is a former Knight-Hennessy Scholar and received his BS from Princeton University.


To request accommodations for a disability please contact Emily Lawrence, emilyl@cs.princeton.edu, at least one week prior to the event.