Princeton University
Computer Science Department

Computer Science 597G
Theoretical Foundations of Deep Learning
  

Sanjeev Arora


  Fall 2018

Course Summary

This is a graduate course focused on research in theoretical aspects of deep learning. In recent years, deep learning has become the central paradigm of machine learning and related fields such as computer vision and natural language processing. But mathematical understanding of many aspects of this endeavor is still lacking. When and how fast does training succeed, and how many examples does it need? What are the strengths and limitations of various architectures?

The course is geared towards graduate students in computer science and allied fields, but may also suit suitably prepared undergraduates (background in machine learning as well as algorithm design/analysis; ideally, they would have taken at least one of COS 521 and COS 511).

Enrolled students as well as auditors are expected to come to class regularly and participate in class discussion. To get a grade in the class, students should also be prepared to present a topic at some point during the course, and to write a brief paper or lecture notes on somebody else's presentation. There may also be an opportunity to do a course project, depending upon student interest.

This course does not satisfy any undergrad requirements in the COS major (BSE or AB).

Administrative Information

Lectures: Tues-Thurs 4:30-5:50pm. Room: COS 105. First meeting: Sept 13.

Instructor: Sanjeev Arora - 407 CS Building - 609-258-3869 - arora AT the domain name cs.princeton.edu

TA: Yuping Luo  yupingl AT the domain name cs.princeton.edu


Lecture Schedule


Each entry lists the date and topic, followed by the required and additional reading for that lecture.

Sept 13: Basic setup of deep learning. Gradient descent, backpropagation, and training methods. (A minimal training-loop sketch appears after the schedule.)
  Additional reading: Michael Nielsen's book for the basics.

Sept 18: Representing functions using deep nets. Barron's theorem.
  Required reading: Scribe notes by Neelesh Kumar and Yuting Wang.
  Additional reading: Barron's paper; Holden Lee's notes (a bit rough); Matus Telgarsky's lecture slides.

Sept 20: Basics of optimization concepts.
  Required reading: Scribe notes by Abhishek Bhrushundi, Pritish Sahu, and Wenjia Zhang.

Sept 25: Introduction to practical aspects of deep learning. (Lecturer: Yuping Luo.)
  Required reading: Slides.

Sept 27: No class.

Oct 2: Understanding adaptive regularization, preconditioners, and learning rates (and how they are related). (A sketch of the diagonal AdaGrad update appears after the schedule.)
  Required reading: Scribe notes by Ethan Tseng and DiJia (Andy) Su; description of AdaGrad, pp. 89-94 of Hazan's OCO book.
  Additional reading: Catalog of adaptive learning algorithms with brief discussion (Sebastian Ruder's blog article).

Oct 4: Concluding remarks on adaptive regularization. Escaping saddle points via perturbed gradient descent.
  Required reading: Scribe notes by Pranay Manocha and Mohammed T. Islam.
  Additional reading: Blog post by Chi Jin and Mike Jordan; 2017 lecture notes by Misha Khodak.

Oct 9: Generalization: basic concepts. The ubiquitous union bound. Rademacher complexity. (The standard Rademacher-complexity bound is written out after the schedule.)
  Required reading: Updated scribe notes by Charlie Hou.
  Additional reading: Blog post; 2017 scribe notes by Hrishikesh Khanderparker.

Oct 11: PAC-Bayes bounds and some applications.

Oct 16: Generalization: proving generalization via compression arguments. Linear classifiers with margin. ReLU deep nets.
  Additional reading: Stronger Generalization Bounds via a Compression Approach (ICML'18); see also blog post.

Oct 18: Finish generalization via compression. Start unsupervised learning. Various notions of unsupervised learning and how they are connected.
  Required reading: Unsupervised learning and VAEs, by Nataly Brukhim, Arushi Gupta, and Edgar Minasyan.
  Additional reading: Blog post.

Oct 23: Variational autoencoders (VAEs). Start GANs.

Oct 25: Metrics on distributions and GANs. Limitations of GANs: the training objective cannot prevent mode collapse.
  Additional reading: Blog post 1, post 2, post 3.

Nov 6: More on gradient-based training and "shaping" the gradient. Proximal gradient descent. Architectures for "shaping" the gradient: ResNets, DenseNets, and BatchNorm.
  Additional reading: Blog post on Towards Data Science describing various ResNet models.

Nov 8: Proving gradient descent attains zero training error with overparametrized nets (in simple architectures).
  Required reading: Paper by Du, Zhai, Poczos, and Singh.

Nov 13: Language models. Introduction to n-grams, word embeddings, LSTMs.
  Additional reading: Language Modeling (book chapter by Mike Collins); lecture notes by Socher et al. (from Stanford CS224).

Nov 15 (guest lecture by Sham Kakade): Recent trends in language models: ELMo and his friends from Sesame Street.
  Required reading: Rough notes by Sham. (Scribe notes coming.)
  Additional reading: Google news releases on Transformers and BERT.

Nov 20: Sham Kakade guest lecture. Continuation of language modeling.
  Required reading: More notes by Sham.

Nov 27: Understanding word and sentence embeddings.
  Additional reading: Blog post 1, post 2, post 3, post 4.

Nov 29: Finish sentence embeddings. Start adversarial examples.
  Additional reading: Blog post 1; Paper 1 (Liu et al.), Paper 2 (Tramer et al.), Paper 3 (Madry et al.).

Dec 4: Certified defenses against adversarial examples.
  Additional reading: Paper 1 (Wong and Kolter '18) and Paper 2 (Wong et al., Nov '18).

Dec 11: Recent analyses of teacher-student net training where the student net is overparametrized.
  Additional reading: Paper by Simon Du et al.

Dec 13: Teacher-student nets, continued. Generalization in the presence of vast overparametrization.
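For readers coming to the Sept 13 material cold, here is a minimal sketch (not course code) of the basic training setup: a one-hidden-layer ReLU net fit to a toy regression problem by full-batch gradient descent, with backpropagation written out by hand. All data, layer sizes, and hyperparameters below are made up for illustration.

# Minimal sketch: one-hidden-layer ReLU net trained by full-batch gradient descent.
# Illustrative only; data and hyperparameters are invented for this example.
import numpy as np

rng = np.random.default_rng(0)

# Toy regression data: y = sin(x) plus a little noise.
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X) + 0.1 * rng.normal(size=X.shape)

d_in, d_hidden, d_out = 1, 64, 1
W1 = rng.normal(scale=1.0 / np.sqrt(d_in), size=(d_in, d_hidden))
b1 = np.zeros(d_hidden)
W2 = rng.normal(scale=1.0 / np.sqrt(d_hidden), size=(d_hidden, d_out))
b2 = np.zeros(d_out)

lr = 0.1
for step in range(2000):
    # Forward pass.
    z1 = X @ W1 + b1          # hidden-layer pre-activations
    h = np.maximum(z1, 0)     # ReLU
    pred = h @ W2 + b2
    loss = np.mean((pred - y) ** 2)

    # Backward pass (backpropagation, written out by hand).
    n = X.shape[0]
    d_pred = 2.0 * (pred - y) / n        # dL/dpred
    dW2 = h.T @ d_pred
    db2 = d_pred.sum(axis=0)
    dh = d_pred @ W2.T
    dz1 = dh * (z1 > 0)                  # ReLU derivative
    dW1 = X.T @ dz1
    db1 = dz1.sum(axis=0)

    # Gradient descent update.
    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2

    if step % 500 == 0:
        print(f"step {step}: loss {loss:.4f}")

Real training pipelines of course use minibatches, automatic differentiation, and more careful initialization; the point here is only the structure of the forward pass, backward pass, and update.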

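Related to the Oct 2 topic, here is a sketch of the diagonal AdaGrad update, which adapts a per-coordinate learning rate using accumulated squared gradients. The quadratic objective and hyperparameters below are invented for the example; see Hazan's OCO book (linked in the schedule) for the actual treatment.

# Sketch of the diagonal AdaGrad update: per-coordinate step sizes that shrink
# with the accumulated squared gradients. Illustrative objective only.
import numpy as np

def grad(w):
    # Gradient of a simple ill-conditioned quadratic f(w) = 0.5 * w^T diag(1, 100) w.
    return np.array([1.0, 100.0]) * w

w = np.array([1.0, 1.0])
eta, eps = 0.5, 1e-8
G = np.zeros_like(w)          # running sum of squared gradients, per coordinate

for t in range(100):
    g = grad(w)
    G += g ** 2
    w -= eta * g / (np.sqrt(G) + eps)   # coordinate-wise (diagonal) preconditioning

print("final iterate:", w)

The coordinate with the large curvature accumulates large squared gradients and therefore gets a small effective step size, which is the preconditioning effect discussed in lecture.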
Please use this style file for scribe notes. Sample files: a source file and a compiled file.
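For the Oct 9 material, here is the standard Rademacher-complexity generalization bound in its usual textbook form (the constants follow that convention and may differ from the exact statement used in lecture):

% Standard Rademacher-complexity generalization bound (textbook form).
% For a function class F with values in [0,1] and an i.i.d. sample S of size m,
% with probability at least 1 - \delta, every f in F satisfies
\[
  \mathbb{E}[f(z)] \;\le\; \frac{1}{m}\sum_{i=1}^{m} f(z_i)
  \;+\; 2\,\mathcal{R}_m(\mathcal{F})
  \;+\; \sqrt{\frac{\log(1/\delta)}{2m}},
\]
% where the Rademacher complexity is
\[
  \mathcal{R}_m(\mathcal{F}) \;=\;
  \mathbb{E}_{S,\sigma}\left[\sup_{f\in\mathcal{F}}
  \frac{1}{m}\sum_{i=1}^{m}\sigma_i f(z_i)\right],
  \qquad \sigma_i \in \{\pm 1\}\ \text{uniform.}
\]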



Reading List (will be continually updated)

Optimization
  1. Going with the slope: offline, online, and randomly. (Lecture notes from COS 521: Arora and Kothari)
  2. Optimization for ML; survey lecture by Elad Hazan (includes video and slides)
  3. Deep Learning book by Goodfellow, Bengio, and Courville.





Useful Resources

  1. Ankur Moitra's notes Algorithmic Aspects of Machine Learning.
  2. Rong Ge's notes on Algorithmic Aspects of ML.
  3. Blog: Off the Convex Path.