Dingli Yu FPO

Date and Time
Thursday, July 18, 2024 - 1:30pm to 3:30pm
Computer Science 402

Dingli Yu will present his FPO "Efficient Scaling of Large Models: Principles in Optimization and Data Aspects" on Thursday, July 18, 2024 at 1:30 PM in CS 402 and Zoom.

Zoom link: https://princeton.zoom.us/j/7188314894?omn=99115109336

The members of Dingli’s committee are as follows:
Examiners: Sanjeev Arora (Adviser), Elad Hazan, Chi Jin
Readers: Danqi Chen, Mark Braverman

A copy of his thesis is available upon request; please email gradinfo@cs.princeton.edu if you would like a copy.

Everyone is invited to attend his talk.

Abstract follows below:
Deep learning has advanced remarkably in recent decades, yet its theoretical foundations, particularly in the realm of large models, still lag behind. This thesis presents research that combines strong theoretical foundations with practical applications for efficiently scaling up large models.

In the first part of the thesis, we focus on the training dynamics of neural nets, covering the theory of overparametrized neural nets. We briefly introduce the theory of the Neural Tangent Kernel (NTK) and then turn to Hyperparameter Transfer, an important application of the Tensor Programs framework. We cover some of the earliest papers that established NTK as a research field, along with the limitations of NTK. Hyperparameter Transfer is a novel and efficient paradigm for hyperparameter tuning that provides an optimal strategy for scaling up models. We characterize the training dynamics of deep neural nets and offer an efficient hyperparameter selection scheme in which optimal hyperparameters found by tuning shallow nets also work for deep nets.
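To make the tune-small, transfer-large workflow concrete, the following is a minimal illustrative sketch (not the thesis code): per-layer initialization and Adam learning rates are shrunk with width in a simplified rendering of the muP-style prescription, a learning rate is tuned on a narrow proxy model, and the same value is then reused on a much wider model. The toy task, network sizes, and the base_width constant are assumptions made purely for this sketch; the thesis also treats transfer across depth.

import torch
import torch.nn as nn


def make_mlp_and_optimizer(width, base_lr, base_width=64):
    # Build a 3-layer MLP whose initialization and per-layer Adam learning
    # rates shrink with width, so that a learning rate tuned at base_width
    # can be reused at larger widths (simplified muP-style scaling).
    mult = width / base_width
    fc_in = nn.Linear(10, width)
    fc_hidden = nn.Linear(width, width)
    fc_out = nn.Linear(width, 1)
    with torch.no_grad():
        fc_hidden.weight.normal_(0.0, width ** -0.5)   # ~ 1/sqrt(fan_in)
        fc_out.weight.normal_(0.0, 1.0 / width)        # readout shrinks faster
    model = nn.Sequential(fc_in, nn.ReLU(), fc_hidden, nn.ReLU(), fc_out)
    param_groups = [
        {"params": fc_in.parameters(), "lr": base_lr},             # input layer: lr unchanged
        {"params": fc_hidden.parameters(), "lr": base_lr / mult},  # hidden layer: lr ~ 1/width
        {"params": fc_out.parameters(), "lr": base_lr / mult},     # readout: lr ~ 1/width
    ]
    return model, torch.optim.Adam(param_groups)


def final_loss(model, opt, steps=200):
    # Train on a fixed toy regression problem and report the final loss.
    torch.manual_seed(0)
    x, y = torch.randn(256, 10), torch.randn(256, 1)
    for _ in range(steps):
        opt.zero_grad()
        loss = ((model(x) - y) ** 2).mean()
        loss.backward()
        opt.step()
    return loss.item()


# Step 1: tune the learning rate on a narrow, cheap proxy model.
candidates = [1e-3, 3e-3, 1e-2, 3e-2]
best_lr = min(candidates, key=lambda lr: final_loss(*make_mlp_and_optimizer(64, lr)))

# Step 2: reuse the tuned learning rate directly on a much wider model.
wide_model, wide_opt = make_mlp_and_optimizer(1024, best_lr)
print("transferred lr:", best_lr, "wide-model loss:", final_loss(wide_model, wide_opt))

Under a standard parameterization the optimal learning rate typically drifts as width grows; the width-dependent scaling is what makes the reuse step in the second part of the sketch justified.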

In the second part of the thesis, we focus on the data aspect of large model scaling. We first introduce Skill-Mix, a novel and unique evaluation that sidesteps issues of traditional large language model (LLM) evaluations such as data contamination and cramming for leaderboards. Skill-Mix randomly selects k language skills, then prompts the LLM to produce a concise text that demonstrates the chosen skills. The exponentially growing number of skill combinations provably prevents data contamination and can further reveal the novelty of successful answers by powerful LLMs. We then introduce ConceptMix, an extension of Skill-Mix that evaluates the capability of text-to-image models to combine k randomly selected visual concepts. Finally, we show that LLMs can learn and generalize skill compositions when trained on good Skill-Mix responses: a few thousand such examples are enough to significantly improve performance on unseen skill combinations, beating much larger models. This suggests that incorporating skill-rich synthetic text into training is an efficient way to scale up the data.
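As a rough illustration of the Skill-Mix query format, one can sample k skills and a topic at random and ask the model for a short text exhibiting all of them; the number of possible skill subsets grows combinatorially in k. The skill list, topic list, and prompt wording below are placeholders for this sketch and are not the official Skill-Mix release.

import random
from math import comb

# Placeholder skill and topic lists, for illustration only.
SKILLS = ["metaphor", "red herring", "modus ponens", "self-serving bias",
          "spatial reasoning", "irony", "statistical syllogism", "anaphora"]
TOPICS = ["gardening", "sewing", "dueling"]


def skill_mix_query(k, rng=random):
    # Sample k distinct skills and one topic, then format the evaluation query.
    skills = rng.sample(SKILLS, k)
    topic = rng.choice(TOPICS)
    return ("Produce a short, natural piece of text about {} that "
            "simultaneously illustrates the following language skills: {}."
            .format(topic, ", ".join(skills)))


# The number of distinct skill subsets grows as C(|SKILLS|, k), which is what
# makes memorizing or pre-training on all possible queries infeasible.
for k in (2, 3, 4):
    print("k={}: {} skill combinations".format(k, comb(len(SKILLS), k)))
print(skill_mix_query(k=3))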

