11-17
Large Language Models: Will they keep getting bigger? And, how will we use them if they do?

The trend of building ever larger language models has dominated much research in NLP over the last few years. In this talk, I will discuss our recent efforts to (at least partially) answer two key questions in this area: Will we be able to keep scaling? And, how will we actually use the models, if we do? I will first cover our work on learning new types of sparse mixture-of-experts (MoE) models. Unlike model-parallel algorithms for learning dense models, which are very difficult to scale further with existing hardware, our sparse approaches significantly reduce cross-node communication costs and could possibly provide the next big leap in performance, although finding a version that scales well in practice remains an open challenge. I will also present our recent work on prompting language models in ways that better control for surface form variation, to improve the performance of models so large that we can only afford to run inference, with little to no task-specific fine-tuning. Finally, time permitting, I will discuss work on new forms of supervision for language model training, including learning from the hypertext and multi-modal structure of web pages to provide new signals for both learning and prompting the model. Together, these methods present our best guesses for how to keep the scaling trend alive as we move forward to the next generation of NLP models.

This talk describes work done at the University of Washington and Meta, primarily led by Armen Aghajanyan, Suchin Gururangan, Ari Holtzman, Mike Lewis, Margaret Li, Sewon Min, and Peter West.

Bio: Luke Zettlemoyer is a Professor in the Paul G. Allen School of Computer Science & Engineering at the University of Washington, and a Research Director at Meta. His research focuses on empirical methods for natural language semantics, and involves designing machine learning algorithms, introducing new tasks and datasets, and, most recently, studying how to best develop self-supervision signals for pre-training. His honors include being named an ACL Fellow as well as winning a PECASE award, an Allen Distinguished Investigator award, and multiple best paper awards. Luke received his PhD from MIT and was a postdoc at the University of Edinburgh.

To request accommodations for a disability, please contact Emily Lawrence at emilyl@cs.princeton.edu at least one week prior to the event.

This talk is co-sponsored by Computer Science and the Center for Statistics and Machine Learning.

This talk will be recorded and live streamed on Princeton University Media Central.

Date and Time

Thursday, November 17, 2022, 12:30 pm - 1:30 pm

Location

Friend Center Convocation Room

Event Type

CS Department Colloquium Series

Speaker

Luke Zettlemoyer, University of Washington

Host

Danqi Chen

Contributions to and/or sponsorship of any event does not constitute departmental or institutional endorsement of the specific program, speakers or views presented.
