Instructor: Prof. Ellen Zhong
Time: Thursdays 3:00-5:00p, Julis Romo A02
"Precept" / student-only discussion: Wednesdays 1:00-2:00p, CS 301
Office hours: Mondays 4:00-5:00p, CS 314
Slack: Link
Syllabus: Link

Recent breakthroughs in machine learning algorithms have transformed the study of the 3D structure of proteins and other biomolecules. This seminar will survey recent papers on ML applied to protein structure prediction, structure determination, computational protein design, physics-based modeling, and more. We will take a holistic approach to each paper, discussing its historical context, algorithmic contributions, and potential impact on scientific discovery and on applications such as drug discovery.

For more information on the discussion format, expectations, and grading, see the course syllabus.


Goals

  • Learn about machine learning methods applied to problems in structural biology
  • Learn how to critically read and evaluate papers
  • Learn how to pose research problems and practice written scientific communication skills
  • Bonus: Exposure to relevant basic and applied ML research in industry from guest speakers


Topics

Topics we will cover include (non-exhaustive):

  • An introduction to structural biology
  • Protein structure prediction before and after AlphaFold2
  • Computer vision and cryo-electron microscopy (cryo-EM)
  • Computational protein design, in particular, antibody and vaccine design
  • Physics-based modeling and statistical mechanics
  • Small molecule drug discovery

Selected papers will cover a broad range of algorithmic concepts and machine learning techniques including:

  • Supervised learning and designing appropriate benchmarks and metrics
  • Language modeling and transformers
  • Generative modeling techniques including VAEs, GANs, normalizing flows, and diffusion models
  • Geometric deep learning
  • Neural fields and multi-view 3D reconstruction

In addition to the assigned papers, optional primers or reviews on relevant topics will be made available for background reading.


Assignments

Assignment 1. Due 11am, Friday, September 30th via Canvas


Guest Speakers

Thursday September 22nd, 3pm ET
Dr. Michael Figurnov (DeepMind)

Title: Highly accurate protein structure prediction with AlphaFold

Abstract: Predicting a protein’s structure from its primary sequence has been a grand challenge in biology for the past 50 years, holding the promise to bridge the gap between the pace of genomics discovery and resulting structural characterization. In this talk, we will describe work at DeepMind to develop AlphaFold, a new deep learning-based system for structure prediction that achieves high accuracy across a wide range of targets. We demonstrated our system in the 14th biennial Critical Assessment of Protein Structure Prediction (CASP14) across a wide range of difficult targets, where the assessors judged our predictions to be at an accuracy “competitive with experiment” for approximately 2/3rds of proteins. The talk will focus on the underlying machine learning ideas, while also touching on the implications for biological research.

Bio: Michael Figurnov is a Staff Research Scientist at DeepMind. He has been working with the AlphaFold team for the past four years. Before joining DeepMind, he did his Ph.D. in Computer Science at the Bayesian Methods Research Group under the supervision of Dmitry Vetrov. His research interests include deep learning, Bayesian methods, and machine learning for biology.


Thursday November 10th, 12:30p ET (CS 105)
Dr. John Ingraham (Generate Biomedicines)


Schedule

Week 1, September 8
Topic: Course overview; Introduction to machine learning in structural biology
Readings:
  Optional reading:
  1. Dill et al. The Protein Folding Problem. Annual Review of Biophysics 2008.
Format: E.Z. lecture
Assignment: N/A

Week 2, September 15
Topic: Protein structure prediction; CASP; supervised learning; The alphabet soup of protein-specific terminology and acronyms
Readings:
  1. Senior, A.W., Evans, R., Jumper, J. et al. Improved protein structure prediction using potentials from deep learning. Nature 2020.
  2. Ingraham, J. et al. Learning Protein Structure with a Differentiable Simulator. ICLR 2019 Oral. [Talk]
  Optional further reading:
  3. https://moalquraishi.wordpress.com/2018/12/09/alphafold-casp13-what-just-happened/
  4. trRosetta: Yang et al. Improved protein structure prediction using predicted interresidue orientations. PNAS 2020.
Format: Paper discussion
Assignment: N/A

Week 3, September 22
Topic: Breakthroughs in protein structure prediction
Readings:
  1. Jumper, J., Evans, R., Pritzel, A. et al. Highly accurate protein structure prediction with AlphaFold. Nature 2021.
  2. Tunyasuvunakool, K., Adler, J., Wu, Z. et al. Highly accurate protein structure prediction for the human proteome. Nature 2021.
  Optional further reading:
  3. AlphaFold2 slides. [CASP14 talk] [Michael Figurnov slides]
  4. https://moalquraishi.wordpress.com/2020/12/08/alphafold2-casp14-it-feels-like-ones-child-has-left-home/
  5. Baek et al. Accurate prediction of protein structures and interactions using a three-track neural network. Science 2021. [paper]
  6. Primer on transformers: [1] [2]
Format: Guest Seminar (Michael Figurnov) + Paper discussion
Assignment: N/A

Week 4, September 29
Topic: Complexes, integrative modeling, and limits of structure prediction
Readings:
  1. Evans et al. Protein complex prediction with AlphaFold-Multimer. bioRxiv
  2. Terwilliger et al. Improved AlphaFold modeling with implicit experimental information. bioRxiv
  Optional further reading:
  3. Nuclear pore complexes: https://www.science.org/doi/full/10.1126/science.abq4792?intcmp=trendmd-sci
  4. Cluspro: https://www.nature.com/articles/nprot.2016.169
Format: Paper discussion
Assignment: Assignment 1 due at 11am Fri, Sept 30th

Week 5, October 6
Topic: Computer vision and cryo-EM
Readings:
  1. Zhong et al. Reconstructing continuous distributions of protein structure from cryo-EM images. ICLR 2020 Spotlight.
  2. Zhong et al. CryoDRGN: reconstruction of heterogeneous cryo-EM structures using neural networks. Nature Methods 2021. [pdf]
  3. Mildenhall, Srinivasan, Tancik et al. NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis. ECCV 2020 Oral.
  Optional further reading: Stay tuned on Slack.
Format: Paper discussion
Assignment: N/A

Week 6, October 13
Topic: Physics-based modeling
Format: Paper discussion
Assignment: Assignment 2 due at 11am Fri, Oct 14

Week 7, October 20
Topic: No class -- Fall Recess
Format: N/A
Assignment: N/A

Week 8, October 27
Topic: Geometric deep learning and drug discovery
Format: Paper discussion
Assignment: N/A

Week 9, November 3
Topic: Computational protein design
Format: Guest lecture
Assignment: (Tentative) Assignment 3 due

Week 10, November 10
Topic: Protein design continued; DNA/RNA structure
Format: Guest Seminar (John Ingraham) + Paper Discussion
Assignment: N/A

Week 11, November 17
Topic: Protein language models
Format: Paper discussion
Assignment: (Tentative) Assignment 4 due

Week 12, November 24
Topic: No class -- Thanksgiving
Format: N/A
Assignment: N/A

Week 13, December 1
Topic: No class -- NeurIPS
Format: N/A
Assignment: (Tentative) Assignment 5 due

Week 14, December 8
Topic: Generative modeling of sequence and structure
Format: Paper discussion
Assignment: (Tentative) Assignment 6 due

Week 15, December 15
Topic: Structural bioinformatics
Format: Paper discussion
Assignment: N/A