| Instructor | Ellen Zhong |
| Time | Tuesdays 1:20-4:10p, Friend Center 007 |
| Office hours | Mondays 4:00-5:00p, CS 314, or by appointment |
| Slack | Link |
| Syllabus | Link |
Recent breakthroughs in machine learning algorithms have transformed the study of the 3D structure of proteins and other biomolecules. This seminar class will survey recent papers on ML applied to tasks in protein structure prediction, structure determination, computational protein design, physics-based modeling, and more. We will take a holistic approach to each paper, discussing its historical context, algorithmic contributions, and potential impact on scientific discovery and on applications such as drug discovery.
For more information on the discussion format, expectations, and grading, see the course syllabus.
A non-exhaustive list of topics we will cover includes:
Selected papers will cover a broad range of algorithmic concepts and machine learning techniques including:
In addition to the assigned papers, optional primers or reviews on relevant topics will be made available for background reading.
Final project guidelines: link
Tuesday, March 3rd, 1:20pm ET
Mark Goldstein (Flatiron Institute)
Title: Diffusion models and flow matching for molecule generation and design
Bio: Mark Goldstein is a Research Fellow in the Center for Computational Mathematics at the Flatiron Institute. Previously, he completed his PhD at the NYU Courant Institute of Mathematical Sciences, CILVR group, advised by Rajesh Ranganath and Thomas Wies. He works on deep generative models and machine learning in the sciences.
Tuesday, March 17th, 1:20pm ET
Zeming Lin (CZI Biohub, EvolutionaryScale)
Title: Evolutionary scale protein language modeling
Bio: TBD
Tuesday, March 24th, 1:20pm ET
Elana Simon (Stanford University)
Title: Discovering interpretable features in protein language models
Bio: Elana Simon is a PhD student at Stanford advised by James Zou, working on understanding what machine learning models learn from biological sequences and structures. Previously, she was an ML engineer at Reverie Labs designing small-molecule cancer drugs and studied computer science at Harvard, where she worked with Debora Marks on protein language models. She also writes in-depth ML-biology analyses on her blog matmols and has been actively involved in research and advocacy for Fibrolamellar Hepatocellular Carcinoma.
Tuesday, April 28th, 1:20pm ET (tentative)
Sam Rodriques (FutureHouse, Edison Scientific)
Title: Building AI scientists
Bio: Sam Rodriques is an inventor and entrepreneur and the founder of FutureHouse, a research lab focused on building AI scientists, and Edison Scientific, which commercializes AI agents for scientific discovery. He was previously head of the Applied Biotechnology Lab at the Francis Crick Institute and earned his PhD at MIT. He was named one of Time Magazine’s 100 most influential people in AI in 2025. His work spans accelerating biomedical discovery, engineering human biology, and developing new institutional models for scientific research.
Please fill out this form and contact Ellen if you are interested in signing up for this class. See a previous year's course website for a sample of topics and papers we will cover.
Post-lecture feedback: Please fill out this form if you are assigned to give feedback on a lecture.
| Week | Date | Topic | Readings | Presenters | Questions and Feedback |
|---|---|---|---|---|---|
| 1 | January 27 | Course overview; Introduction to machine learning in structural biology | Additional Resources: 1. Dill et al. The Protein Folding Problem. Annual Review of Biophysics 2008. | Ellen Zhong [Slides] | N/A |
| 2 | February 3 | Protein structure prediction; CASP; Supervised learning; Protein-specific metrics | 1. Senior, A.W., Evans, R., Jumper, J. et al. Improved protein structure prediction using potentials from deep learning. Nature 2020. 2. Ingraham, J. et al. Learning Protein Structure with a Differentiable Simulator. ICLR 2019 Oral. [Talk] Additional Resources: 3. AlphaFold1 CASP13 slides 4. https://moalquraishi.wordpress.com/2018/12/09/alphafold-casp13-what-just-happened/ 5. trRosetta: Yang et al. Improved protein structure prediction using predicted interresidue orientations. PNAS 2020. | Ellen Zhong [Slides], Yufan Xia [Slides] | Pre-lecture questions; Feedback: Jack McMahon, Ziyu Xiong |
| 3 | February 10 | Breakthroughs in protein structure prediction | 1. Jumper, J., Evans, R., Pritzel, A. et al. Highly accurate protein structure prediction with AlphaFold. Nature 2021. 2. Abramson, J., Adler, J., Dunger, J. et al. Accurate structure prediction of biomolecular interactions with AlphaFold 3. Nature 2024. Additional Resources: 3. Tunyasuvunakool, K., Adler, J., Wu, Z. et al. Highly accurate protein structure prediction for the human proteome. Nature 2021. 4. AlphaFold2 slides. [CASP14 talk] [Michael Figurnov slides] 5. https://moalquraishi.wordpress.com/2020/12/08/alphafold2-casp14-it-feels-like-ones-child-has-left-home/. 6. Primer on transformers: [1] [2] 7. The Illustrated AlphaFold(3) | Jack Shaw, Maxwell Soh [Slides-AF2] [Slides-AFDB] [Slides-AF3] | Pre-lecture questions; Feedback: Robert Heeter, Yagiz Devre |
| 4 | February 17 | Protein design I | 1. Ingraham et al. Generative models for graph-based protein design. NeurIPS 2019. 2. ESM-IF1: Hsu et al. Learning inverse folding from millions of predicted structures. ICML 2022. 3. Pacesa et al. One-shot design of functional protein binders with BindCraft. Nature 2025. Additional Resources: 4. Dauparas et al. Robust deep learning–based protein sequence design using ProteinMPNN. Science 2022. | Jack McMahon, Joseph Clark, Md Toki Tahmid | Pre-lecture questions; Feedback: Tony Chen, Khai Evdaev |
| 5 | February 24 | Protein structure determination I: Cryo-EM reconstruction | 1. Zhong et al. Reconstructing continuous distributions of protein structure from cryo-EM images. ICLR 2020 Spotlight. 2. Zhong et al. CryoDRGN: reconstruction of heterogeneous cryo-EM structures using neural networks. Nature Methods 2021. [pdf] 3. Levy et al. CryoDRGN-AI: neural ab initio reconstruction of challenging cryo-EM and cryo-ET datasets. Nature Methods 2025. Additional Resources: 4. Computer vision related works: i. Mildenhall, Srinivasan, Tancik et al. NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis. ECCV 2020 Oral. [project page] ii. Tancik et al. Fourier Features Let Networks Learn High Frequency Functions in Low Dimensional Domains. NeurIPS 2020 Spotlight. iii. Xie et al. Neural Fields in Visual Computing and Beyond. Computer Graphics Forum 2022. 5. Cryo-EM background: Singer & Sigworth. Computational Methods for Single-Particle Cryo-EM. Annual Review of Biomedical Data Science, 2020. 6. Primer on Variational Autoencoders: [1] [2] [3] [4] | Guest lecture by Rish Raghu, Robert Heeter | Pre-lecture questions; Feedback: Xingjian Hou, Sterling Hall |
| 6 | March 3 | Protein design II: Diffusion and flow matching models of sequence and structure | |||
| 7 | March 10 | No class -- Spring Recess | |||
| 8 | March 17 | Protein + Biological Language Modeling I | |||
| 9 | March 24 | Protein + Biological Language Modeling II | |||
| 10 | March 31 | Physics-based modeling | |||
| 11 | April 7 | Protein Structure Determination II | |||
| 12 | April 14 | Other molecules: RNA, small molecules, etc. | |||
| 13 | April 21 | Skip | |||
| 14 | April 28 | AI Scientists | |||
| 15 | Tuesday, May 5 or May 12 (TBD), 1:20-4:10pm | Final project presentations |