04-10
Principled Methods for Reliable AI at Scale

Making large-scale pretrained models reliable in practice remains an open challenge. In this talk, I will show that reliable AI requires understanding and manipulating how data and training dynamics shape model behavior, and present methods that do so. First, I will tackle the problem of estimating model performance in the wild, where labeled data matching the deployment setting is expensive or unavailable. I introduce Agreement-on-the-Line, which exploits surprising structure in how models behave under natural shifts to predict out-of-distribution accuracy from unlabeled data alone. Next, I will show why post-hoc approaches to reliability fail: standard training entangles all learned information in the same neurons. I introduce Memorization Sinks, a new training paradigm that exploits learning dynamics to disentangle memorization and generalization by design, one application of which yields the first natively unlearnable language models. Finally, I will discuss how to enable creativity in tasks that require a far-sighted leap of thought, like scientific discovery. I will argue that we need to go beyond next-token prediction and show how multi-token objectives and new ways of injecting randomness into generation can unlock the diversity and originality these tasks demand. I will close by discussing how to apply these ideas to the reliability challenges that lie ahead, as models become agentic, long-horizon, and increasingly embedded in high-stakes decisions.

Bio: Aditi Raghunathan is an Assistant Professor of Computer Science at Carnegie Mellon University. Her work advances trustworthy AI by translating insights from the scientific study of frontier model failures into methods that make these models robust and safe. She is a recipient of the Sloan Research Fellowship, the NSF CAREER Award, the Okawa Research Award, the Schmidt AI2050 Early Career Fellowship, the Google Research Scholar Award, Forbes 30 Under 30 recognition, the Arthur Samuel Best Thesis Award at Stanford, and multiple PhD fellowships. Her work has also been recognized with an Outstanding Paper Award at ICML 2025 and several workshop paper awards.
 

Date and Time
Friday, April 10, 2026, 11:00am - 12:00pm
Location
Computer Science Small Auditorium (Room 105)
Event Type
Speaker
Aditi Raghunathan, from Carnegie Mellon University

Contributions to and/or sponsorship of any event does not constitute departmental or institutional endorsement of the specific program, speakers or views presented.
