03-05
Scalable and Efficient Systems for Large Language Models

Large Language Models (LLMs) have been driving recent breakthroughs in AI. These advancements would not have been possible without the support of scalable and efficient infrastructure systems. In this talk, I will introduce several underlying systems I have designed and built to support the entire model lifecycle, from training to deployment to evaluation. First, I will present Alpa, a system for large-scale model-parallel training that automatically generates execution plans unifying data, operator, and pipeline parallelism. Next, I will discuss efficient deployment systems, covering the frontend programming interface and backend runtime optimizations for high-performance inference. Finally, I will complete the model lifecycle by presenting our model evaluation efforts, including the crowdsourced live benchmark platform, Chatbot Arena, and the automatic evaluation pipeline, LLM-as-a-Judge. These projects have collectively laid a solid foundation for large language model systems and have been widely adopted by leading LLM developers and companies. I will conclude by outlining some future directions for machine learning systems, such as co-optimizing across the full stack for building AI-centric applications.

Bio: Lianmin Zheng is a Ph.D. student in the EECS department at UC Berkeley, advised by Ion Stoica and Joseph E. Gonzalez. His research interests include machine learning systems, large language models, compilers, and distributed systems. He builds full-stack, scalable, and efficient systems to advance the development of AI. He co-founded LMSYS.org, where he leads impactful open-source large language model projects such as Vicuna and Chatbot Arena, which have received millions of downloads and served millions of users. He also co-organized the Big Model Tutorial at ICML 2022. He has received a Meta Ph.D. Fellowship, an IEEE Micro Best Paper Award, and an a16z open-source AI grant.

To request accommodations for a disability, please contact Emily Lawrence, emilyl@cs.princeton.edu, at least one week prior to the event.

Date and Time

Tuesday, March 5, 2024, 12:30pm - 1:30pm

Location

Computer Science Small Auditorium (Room 105)

Event Type

CS Department Colloquium Series

Speaker

Lianmin Zheng, from University of California, Berkeley

Host

Ravi Netravali

Contributions to and/or sponsorship of any event does not constitute departmental or institutional endorsement of the specific program, speakers or views presented.
