Sadhika Malladi will present her FPO "Deep Learning Theory in the Age of Generative AI" on Thursday, November 20, 2025 at 11:00 AM in CS 401.
The members of Sadhika’s committee are as follows:
Examiners: Sanjeev Arora (Adviser), Danqi Chen, Chi Jin
Readers: Tri Dao, Zhiyuan Li (TTIC)
A copy of her thesis is available upon request; please email gradinfo@cs.princeton.edu.
Everyone is invited to attend her talk.
The abstract follows:
This thesis develops theoretical frameworks for understanding modern deep learning in realistic training regimes. As model architectures, optimization methods, and data pipelines continue to evolve, much of the existing theory no longer describes how large-scale systems behave in practice. Exhaustive empirical studies, once central to understanding these systems, have become infeasible at current computational scales. The goal of this work is to bridge that gap by constructing mathematical analyses that remain faithful to real training scenarios while offering prescriptive guidance for practice.
The first part establishes stochastic differential equation (SDE) approximations that accurately model stochastic optimization at finite learning rates. These analyses clarify when SDE-based descriptions are valid and yield general scaling rules for efficient large-batch training. The second part examines the fine-tuning of pre-trained language models, formalizing it as a small, structured modification to an existing representation. This perspective explains the effectiveness of parameter-efficient adaptation methods and motivates a memory-efficient fine-tuning algorithm derived from the theoretical analysis. The third part studies preference-based alignment, identifying why commonly used objectives fail to fit preference data and introducing diagnostic measures that link training objectives to observed model behavior.
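The abstract does not spell out the memory-efficient fine-tuning algorithm it mentions, but the general idea of forward-pass-only (zeroth-order) optimization can be sketched briefly. The following is a minimal, illustrative simultaneous-perturbation (SPSA-style) update on a toy quadratic loss; the function names and the toy objective are this sketch's own assumptions, not details taken from the thesis:

```python
import numpy as np

def spsa_step(params, loss_fn, rng, lr=1e-2, eps=1e-3):
    """One zeroth-order update: estimate the directional derivative from
    two forward passes along a random direction z, then step along z."""
    z = rng.standard_normal(params.shape)                  # random probe direction
    g = (loss_fn(params + eps * z) - loss_fn(params - eps * z)) / (2 * eps)
    return params - lr * g * z                             # no backward pass needed

# Toy usage: minimize a quadratic using forward passes only.
loss = lambda w: float(np.sum(w ** 2))
rng = np.random.default_rng(0)
w = np.ones(4)                                             # initial loss = 4.0
for _ in range(500):
    w = spsa_step(w, loss, rng)
print(f"final loss: {loss(w):.4f}")
```

Because each update needs only two loss evaluations, no activations or gradients have to be stored, which is the source of the memory savings in this family of methods.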
Together, these results demonstrate how mathematically grounded analysis can extend beyond idealized assumptions to capture the dynamics of contemporary machine learning systems. The thesis argues that such theory can play a prescriptive and interpretive role—guiding large-scale training, diagnosing failure modes, and providing a stable foundation for understanding as empirical practices continue to evolve.