Instructor: Ellen Zhong
Time: Tuesdays 1:20-4:10pm, Friend Center 007
Office hours: Mondays 4:00-5:00pm, CS 314, or by appointment
Slack: Link
Syllabus: Link

Recent breakthroughs in machine learning (ML) algorithms have transformed the study of the 3D structure of proteins and other biomolecules. This seminar class will survey recent papers on ML applied to tasks in protein structure prediction, structure determination, computational protein design, physics-based modeling, and more. We will take a holistic approach to each paper, discussing its historical context, its algorithmic contributions, and its potential impact on scientific discovery and on applications such as drug discovery.

For more information on the discussion format, expectations, and grading, see the course syllabus.


Goals

  • Learn about machine learning methods applied to problems in structural biology
  • Learn how to critically read and evaluate papers
  • Learn how to pose research problems and practice oral and written scientific communication skills
  • Bonus: Exposure to relevant basic and applied ML research in industry from guest speakers


Topics

A non-exhaustive list of topics we will cover includes:

  • An introduction to structural biology
  • Protein structure prediction before and after AlphaFold2
  • Computer vision and cryo-electron microscopy (cryo-EM)
  • Computational protein design, in particular antibody and vaccine design
  • Physics-based modeling and statistical mechanics
  • Small molecule drug discovery

Selected papers will cover a broad range of algorithmic concepts and machine learning techniques, including:

  • Supervised learning and designing appropriate benchmarks and metrics
  • Language modeling and transformers
  • Generative modeling techniques including VAEs, GANs, normalizing flows, and diffusion models
  • Geometric deep learning
  • Neural rendering and multi-view 3D reconstruction

In addition to the assigned papers, optional primers or reviews on relevant topics will be made available for background reading.


Schedule

Please fill out this form and contact Ellen if you are interested in signing up for this class. See a previous year's course website for a sample of topics and papers we will cover.

Post-lecture feedback: Please fill out this form if you are assigned to give feedback on a lecture.

| Week | Date | Topic | Readings | Presenters | Questions and Feedback |
|---|---|---|---|---|---|
| 1 | January 27 | Course overview; Introduction to machine learning in structural biology | Additional Resources: 1. Dill et al. The Protein Folding Problem. Annual Review of Biophysics 2008. | Ellen Zhong [Slides] | N/A |
| 2 | February 3 | Protein structure prediction; CASP; Supervised learning; Protein-specific metrics | 1. Senior, A.W., Evans, R., Jumper, J. et al. Improved protein structure prediction using potentials from deep learning. Nature 2020. 2. Ingraham, J. et al. Learning Protein Structure with a Differentiable Simulator. ICLR 2019 Oral. [Talk] Additional Resources: 3. AlphaFold1 CASP13 slides 4. https://moalquraishi.wordpress.com/2018/12/09/alphafold-casp13-what-just-happened/ 5. trRosetta: Yang et al. Improved protein structure prediction using predicted interresidue orientations. PNAS 2020. | TBD | Pre-lecture questions |
| 3 | February 10 | Breakthroughs in protein structure prediction | | | |
| 4 | February 17 | Protein structure determination I: Cryo-EM reconstruction | | | |
| 5 | February 24 | Protein language modeling | | | |
| 6 | March 3 | Protein design I: Inverse folding | | | |
| 7 | March 10 | No class -- Spring Recess | | | |
| 8 | March 17 | Structural bioinformatics | | | |
| 9 | March 24 | Physics-based modeling | | | |
| 10 | March 31 | Protein structure determination II | | | |
| 11 | April 7 | Protein Design II | | | |
| 12 | April 14 | Small molecule drug discovery | | | |
| 13 | April 21 | RNA structure prediction | | | |
| 14 | April 28 | Reading period (potential makeup class) | | | |
| 15 | Tuesday, May 5 or May 12 (TBD), 1:20-4:10pm | Final project presentations | | | |