Princeton University
Computer Science Department

Computer Science 597B
Theoretical Deep Learning
  

Sanjeev Arora


  Fall 2019

Course Summary

This is a graduate course focused on research in theoretical aspects of deep learning. In recent years, deep learning has become the central paradigm of machine learning and of related fields such as computer vision and natural language processing. But mathematical understanding of many aspects of this endeavor is still lacking. When and how fast does training succeed, and using how many examples? What are the strengths and limitations of various architectures?

The course is geared towards graduate students in computer science and allied fields who have knowledge of machine learning as well as algorithm design/analysis; ideally, they will have taken at least one of COS 521 and COS 511. Auditors are welcome, provided there is space in the classroom. We will prepare detailed notes on the lectures, and the plan is to convert them into a monograph. There will be many guest lecturers from the ongoing IAS special year on Optimization, Statistics and Machine Learning.

Enrolled students as well as auditors are expected to come to class regularly and participate in class discussion. Students who fail to do this will not get credit for the course.

This course does not satisfy any undergraduate requirements in the COS major (BSE or AB), and undergraduates are not allowed to take this course for a grade.

Administrative Information

Lectures: Friday 1:30-4:30. Room: Equad E225. First meeting: Sept 13.

Instructor: Sanjeev Arora, 407 CS Building, 609-258-3869, arora AT the domain name cs.princeton.edu



Lecture Schedule

This is tentative.

Basic readings: (a) Deep Learning book by Goodfellow, Courville and Bengio. (b) Webpages of Fall18 course and Fall17 course.

Date: Topic and main reading

Sept 13: Basic framework; intro to optimization and generalization. Lecture notes.

Sept 20: Nonconvex landscapes. Generalized linear models, PCA, etc.

Sept 27: Thinking of gradient descent as a random walk in the landscape. Escaping saddle points. Langevin dynamics. Batch size vs. step size.

Oct 4: Towards understanding the generalization puzzle (part 1): infinitely wide deep nets and the associated Neural Tangent Kernels.

Oct 11: Current ways to understand generalization of finite but overparametrized nets (+ their limitations).

Oct 18: Possibly no lecture; instead attend the IAS workshop on theory of deep learning that week (as your schedule permits).

Oct 25: Implicit regularization in the algorithm.

Nov 1: Fall break.

Nov 8: Understanding the effect of dropout regularization + ??

Nov 15: Variational autoencoders and the reparametrization trick. Generative Adversarial Nets and their limitations.

Nov 22: Empirically successful tricks (e.g., batch norm, data augmentation) and efforts to understand them.

Nov 29: Thanksgiving.

Dec 6: Implicit regularization and acceleration by going deeper: understanding via the dynamics of gradient descent.

Dec 13: Adversarial examples and approaches towards certified defense. Min-max algorithms.

Please use this style file to scribe notes. Sample files: a source file and a compiled file.



Reading List (will be continually updated)

Useful Resources

  1. Ankur Moitra's notes Algorithmic Aspects of Machine Learning.
  2. Rong Ge's notes on Algorithmic Aspects of ML.
  3. Blog: Off the Convex Path.
  4. Going with the slope: offline, online, and randomly. (Lecture notes from COS 521: Arora and Kothari)
  5. Optimization for ML; survey lecture by Elad Hazan (includes video and slides)