I'm a PhD student in the Princeton NLP group, advised by Danqi Chen, and affiliated with Princeton Language and Intelligence (PLI).
Previously, I graduated with a BA & MEng from the University of Cambridge, where I worked with Adrian Weller.
I am interested in improving our understanding of language models in terms of their training data, architecture and training objectives.
[Google Scholar] [Twitter] [GitHub]
(* indicates equal contribution)
How to Train Long-Context Language Models (Effectively) Pre-print 2024
OLMoE: Open Mixture-of-Experts Language Models Pre-print 2024
Finding Transformer Circuits with Edge Pruning NeurIPS 2024 (Spotlight)
QuRating: Selecting High-Quality Data for Training Language Models ICML 2024 (Spotlight)
Language Models as Science Tutors ICML 2024
SWE-bench: Can Language Models Resolve Real-World GitHub Issues? ICLR 2024 (Oral)
Learning Transformer Programs NeurIPS 2023 (Oral)
A Kernel-Based View of Language Model Fine-Tuning ICML 2023; ICLR 2023 Workshop on Mathematical and Empirical Understanding of Foundation Models (Spotlight)