05-28
Abhishek Panigrahi FPO

Abhishek Panigrahi will present his FPO "Understanding Optimization, Distillation, And Transfer: A Theoretical Study of Deep Learning" on Thursday, May 28, 2026 at 2:00 PM in CS 302.

The members of Abhishek’s committee are as follows:
Examiners: Sanjeev Arora (Adviser), Elad Hazan, Karthik Narasimhan
Readers: Danqi Chen, Surbhi Goel (UPENN)

A copy of his thesis is available upon request. Please email gradinfo@cs.princeton.edu if you would like a copy of the thesis.

Everyone is invited to attend his talk.

Abstract follows below:
Modern deep learning systems are trained through increasingly complex pipelines involving large-scale data, multiple stages of training, and a growing number of design choices. While such systems have achieved remarkable empirical success, our theoretical understanding of how these choices shape the capabilities of the final model remains limited. This thesis develops theoretical studies on three fundamental questions underlying modern training pipelines: how optimization choices affect learning dynamics, how capabilities can be effectively transferred across models, and how learned capabilities generalize across tasks and modalities.

First, the thesis analyzes the role of the learning rate in driving the Edge of Stability phenomenon, where training dynamics differ from the predictions of classical optimization theory. It also introduces and studies Context-Enhanced Learning, a learning framework in which access to privileged contextual information during training can substantially accelerate training, even when that information is difficult to recover from the model’s outputs after training.

Second, the thesis investigates how existing trained models can be used to more effectively train new models through knowledge distillation. Stronger teachers do not necessarily produce stronger students; rather, the quality of transfer depends critically on both the choice of teacher and the form of supervision. The results establish design principles for distillation from two complementary perspectives: selecting teachers by analyzing the student’s own gradients, and adapting supervision by identifying the student’s missing skills. Together, these principles enable efficient and reliable teacher selection using only 1% of the training compute, and yield improvements of at least 7% in settings where standard fine-tuning and reinforcement learning methods struggle to improve the student model.

Finally, the thesis studies the transferability of learned capabilities beyond training. It develops methods for localizing the skills required for a task within a model and shows that such localized skills can improve generalization to out-of-distribution tasks. At the same time, it demonstrates that standard training pipelines are often insufficient for transferring capabilities to new modalities, and studies the importance of data composition in enabling reliable cross-modal transfer.

Together, these results contribute to a more principled understanding of modern deep learning pipelines, with the broader goal of moving from empirical trial-and-error toward a science of training design.

Date and Time

Thursday, May 28, 2026, 2:00pm - 4:00pm

Location

Computer Science 302

Contributions to and/or sponsorship of any event does not constitute departmental or institutional endorsement of the specific program, speakers or views presented.

