Mengzhou Xia will present her FPO "Advancing the Pareto Frontier of Training Open Language Models" on Wednesday, November 19, 2025 at 4:30 PM in CS 105.
The members of Mengzhou’s committee are as follows:
Examiners: Danqi Chen (Adviser), Sanjeev Arora, Peter Henderson
Readers: Karthik Narasimhan, Pang Wei Koh (University of Washington)
A copy of her thesis is available upon request. Please email gradinfo@cs.princeton.edu if you would like a copy.
Everyone is invited to attend her talk.
Abstract follows below:
Large language models (LLMs) have reshaped AI by enabling breakthroughs in language understanding, reasoning, and diverse applications. However, their massive computational demands and the proprietary nature of leading models hinder broad accessibility and customization. My work addresses these challenges by optimizing the use of existing compute, data, and models to push the Pareto frontier for LLM training. In doing so, it not only produces stronger language models but also offers universal approaches that support effective customization and advance our scientific understanding of model training.
First, we study how structured pruning can be leveraged to pre-train compact, high-performing models at a fraction of the usual pre-training cost, demonstrating its effectiveness in pushing the Pareto frontier for general-purpose pre-training. Next, we turn to the post-training phase to explore the critical role of data in shaping model behavior, presenting principled data optimization techniques that enhance models’ capabilities, safety, and transparency, and showing that “less is more” when it comes to constructing effective training datasets. Finally, we introduce novel post-training approaches that more effectively align language models with desired behaviors and objectives. By revealing gaps in the reasoning abilities of even proprietary models, we outline future directions for building AI systems with enhanced reasoning capabilities, focusing on broader data synthesis through agentic processes and enabling advanced applications.