Computer Science 597G
This is a graduate course focused on research in theoretical aspects of deep learning. In recent years, deep learning has become the central paradigm of machine learning and related fields such as computer vision and natural language processing. But mathematical understanding of many aspects of this endeavor is still lacking. When and how fast does training succeed, and how many examples does it need? What are the strengths and limitations of various architectures?
The course is geared towards graduate students in computer science and allied fields, but may be suitable for undergraduates who are adequately prepared (knowledge of machine learning as well as algorithm design/analysis; ideally, they would have taken at least one of COS 521 and COS 511).
Enrolled students as well as auditors are expected to come to class regularly and participate in class discussion. To receive a grade in the class, students should also be prepared to present a topic at some point in the course and to write a brief paper or lecture notes on somebody else's presentation. There may also be an opportunity to do a course project, depending upon student interest. This course does not satisfy any undergraduate requirements in the COS major (BSE or AB).
Instructor: Sanjeev Arora, Room 407, 609-258-3869, arora AT the domain name cs.princeton.edu
TA: Yuping Luo, yupingl AT the domain name cs.princeton.edu
Sept 13: Basic setup of deep learning. Gradient descent, back propagation, and training methods.
Nielsen's book for the basics.
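For readers skimming this schedule, here is a minimal sketch of what "gradient descent plus backpropagation" means in code: a toy one-hidden-layer ReLU net trained on synthetic data. All names, sizes, and hyperparameters are illustrative, not from the course materials.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: learn y = sin(x) on [-2, 2].
X = rng.uniform(-2, 2, size=(256, 1))
Y = np.sin(X)

# Parameters of a 1 -> 32 -> 1 ReLU network.
W1 = rng.normal(0, 0.5, size=(1, 32))
b1 = np.zeros(32)
W2 = rng.normal(0, 0.5, size=(32, 1))
b2 = np.zeros(1)

lr = 0.1
for step in range(2000):
    # Forward pass.
    Z1 = X @ W1 + b1            # hidden pre-activations
    H = np.maximum(Z1, 0.0)     # ReLU
    pred = H @ W2 + b2
    err = pred - Y
    loss = (err ** 2).mean()    # mean squared error

    # Backward pass: the chain rule written out by hand.
    n = X.shape[0]
    d_pred = 2 * err / n        # dL/dpred
    dW2 = H.T @ d_pred
    db2 = d_pred.sum(axis=0)
    dH = d_pred @ W2.T
    dZ1 = dH * (Z1 > 0)         # gradient through ReLU
    dW1 = X.T @ dZ1
    db1 = dZ1.sum(axis=0)

    # Plain (full-batch) gradient descent update.
    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2

print(loss)
```

Automatic differentiation frameworks compute exactly the backward pass above mechanically; writing it out once makes the later optimization lectures concrete.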
Sept 18: Representing functions using deep nets. Barron's theorem.
Scribe notes by Neelesh Kumar and Yuting Wang.
Holden Lee's notes (a bit rough).
Matus Telgarsky's lecture slides.
Sept 20: Basics of optimization concepts.
Scribe notes by Abhishek Bhrushundi, Pritish Sahu, and Wenjia Zhang.
Sept 25: Introduction to practical aspects of deep learning. (Lecturer: Yuping Luo.)
Sept 27: No class.
Oct 2: Understanding adaptive regularization, preconditioners, and learning rates (and how they are related).
Scribe notes by Ethan Tseng and DiJia (Andy) Su.
Description of Adagrad, pp. 89-94 of Hazan's OCO book.
Catalog of adaptive learning algorithms with brief discussion (Sebastian Ruder's blog article).
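As a companion to the readings above, here is a minimal sketch of the diagonal AdaGrad update (in the spirit of the description in Hazan's OCO book): each coordinate gets its own step size, shrinking with the accumulated squared gradients. The quadratic test objective is illustrative, not from the course.

```python
import numpy as np

def adagrad(grad_fn, x0, lr=1.0, eps=1e-8, steps=500):
    """Diagonal AdaGrad: per-coordinate learning rate lr / sqrt(sum g_i^2)."""
    x = np.asarray(x0, dtype=float).copy()
    G = np.zeros_like(x)                    # running sum of squared gradients
    for _ in range(steps):
        g = grad_fn(x)
        G += g * g
        x -= lr * g / (np.sqrt(G) + eps)    # adaptive per-coordinate step
    return x

# Badly scaled quadratic: f(x) = 0.5 * (100 * x0^2 + x1^2), minimum at 0.
grad = lambda x: np.array([100.0 * x[0], x[1]])
x_star = adagrad(grad, [1.0, 1.0])
print(x_star)
```

The point of the example: despite the 100:1 difference in curvature between the two coordinates, AdaGrad's rescaling handles both with the same nominal learning rate, which is the "preconditioner" view discussed in lecture.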
Oct 4: Concluding remarks on adaptive regularization. Escaping saddle points via perturbed gradient descent.
Scribe notes by Pranay Manocha and Mohammed T. Islam.
Blog post by Chi Jin and Mike Jordan.
2017 lecture notes by Misha Khodak.
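A minimal sketch of perturbed gradient descent in the spirit of the Jin et al. line of work: take plain gradient steps, but inject a small random perturbation whenever the gradient is tiny, so the iterate can escape a strict saddle point. The constants and the test function below are illustrative, not from the paper.

```python
import numpy as np

def perturbed_gd(grad_fn, x0, lr=0.1, g_thresh=1e-3, radius=1e-2,
                 steps=2000, seed=0):
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float).copy()
    for _ in range(steps):
        g = grad_fn(x)
        if np.linalg.norm(g) < g_thresh:
            # Near a first-order stationary point: perturb randomly so a
            # strict saddle's negative-curvature direction can take over.
            x = x + rng.uniform(-radius, radius, size=x.shape)
        else:
            x = x - lr * g
    return x

# f(x, y) = x^2 + y^4/4 - y^2/2 has a strict saddle at the origin and
# minima at (0, +1) and (0, -1).
grad = lambda v: np.array([2.0 * v[0], v[1] ** 3 - v[1]])
x = perturbed_gd(grad, [0.0, 0.0])   # start exactly at the saddle
print(x)
```

Plain gradient descent started at the origin would stay there forever (the gradient is exactly zero); the perturbation is what lets the iterate slide off toward one of the two minima.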
Oct 9: Generalization: basic concepts. The ubiquitous union bound. Rademacher complexity.
Scribe notes by Charlie Hou.
2017 scribe notes by Hrishikesh Khanderparker.
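For reference alongside the scribe notes, the Rademacher-complexity generalization bound can be stated in a standard textbook form (for a class of functions with values in [0, 1]; the notation may differ from the lecture's):

```latex
% With probability at least 1 - \delta over an i.i.d. sample z_1, \dots, z_m,
% simultaneously for every f in the class \mathcal{F} (values in [0,1]):
\[
  \mathbb{E}[f(z)] \;\le\; \frac{1}{m}\sum_{i=1}^{m} f(z_i)
  \;+\; 2\,\mathcal{R}_m(\mathcal{F})
  \;+\; \sqrt{\frac{\log(1/\delta)}{2m}},
  \qquad
  \mathcal{R}_m(\mathcal{F}) \;=\;
  \mathbb{E}_{\sigma}\Big[\sup_{f \in \mathcal{F}}
  \frac{1}{m}\sum_{i=1}^{m} \sigma_i f(z_i)\Big],
\]
% where the \sigma_i are independent uniform \pm 1 (Rademacher) signs.
```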
Oct 11: PAC-Bayes bounds.
Oct 16: Generalization: proving generalization via compression arguments. Linear classifiers with margin. ReLU deep nets.
Stronger Generalization Bounds for Deep Nets via a Compression Approach (ICML'18). See also the blog post.
Oct 18: Finish generalization via compression. Start unsupervised learning. Various notions of unsupervised learning and how they're connected.
Scribe notes on unsupervised learning and VAEs by Nataly Brukhim, Arushi Gupta, and Edgar Minasyan.
Oct 23: Variational Auto-encoders (VAEs).
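As a compact reminder for this lecture, the VAE training objective is the evidence lower bound (ELBO), written here in standard notation that may differ from the lecture's:

```latex
% For encoder q_\phi(z \mid x), decoder p_\theta(x \mid z), and prior p(z):
\[
  \log p_\theta(x) \;\ge\;
  \underbrace{\mathbb{E}_{q_\phi(z \mid x)}\big[\log p_\theta(x \mid z)\big]}_{\text{reconstruction}}
  \;-\;
  \underbrace{\mathrm{KL}\big(q_\phi(z \mid x)\,\big\|\,p(z)\big)}_{\text{regularizer}}
\]
% Training maximizes this lower bound over \theta and \phi, typically with
% p(z) = N(0, I) and the reparametrization trick for sampling z.
```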
Oct 25: Metrics on distributions and GANs. Limitations of GANs: the training objective cannot prevent mode collapse.
Blog post 1, post 2.
Nov 6: More on gradient-based training and "shaping" the gradient. Proximal gradient descent. Architectures for "shaping" the gradient: ResNets, DenseNets, and BatchNorm.
Article on towardsdatascience describing various ResNet models.
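A minimal sketch of why residual ("skip") connections help signals survive depth, the core ResNet idea: compare a deep stack of plain blocks against the same blocks with identity shortcuts, h ↦ h + F(h). The sizes and initialization scale are illustrative, not from any reading.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 16

def branch(h, W1, W2):
    # F(h) = W2 @ relu(W1 @ h): the residual branch of one block.
    return W2 @ np.maximum(W1 @ h, 0.0)

x = rng.normal(size=d)
h_res, h_plain = x.copy(), x.copy()
for _ in range(50):                          # a 50-block "deep" stack
    W1 = rng.normal(0, 0.05, size=(d, d))
    W2 = rng.normal(0, 0.05, size=(d, d))
    h_res = h_res + branch(h_res, W1, W2)    # with skip connection
    h_plain = branch(h_plain, W1, W2)        # without

# The plain stack's activations collapse toward zero at this init scale,
# while the identity path keeps the residual stack's signal at the
# original order of magnitude.
print(np.linalg.norm(h_plain), np.linalg.norm(h_res))
```

The same identity path appears in the backward pass: the Jacobian of each block is I plus a small term, which is one concrete sense in which these architectures "shape" the gradient.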
Nov 8: Proving gradient descent attains zero training error with overparametrized nets (in simple settings).
Paper by Du, Zhai, Poczos, and Singh.
Nov 13: Language models. Introduction to n-grams, word embeddings, LSTMs.
Language Modeling (book chapter by Mike Collins).
Lecture notes by Socher et al. (from Stanford CS224).
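A minimal sketch of a count-based bigram model, the n = 2 case of the n-gram models covered in Collins's chapter, with add-one (Laplace) smoothing. The tiny corpus is illustrative.

```python
from collections import Counter

corpus = "the cat sat on the mat . the dog sat on the log .".split()
vocab = sorted(set(corpus))

# Count bigrams (w_prev, w) and how often each word appears as a context.
bigram = Counter(zip(corpus, corpus[1:]))
context = Counter(corpus[:-1])

def prob(w, prev, alpha=1.0):
    """P(w | prev) with add-alpha smoothing over the vocabulary."""
    return (bigram[(prev, w)] + alpha) / (context[prev] + alpha * len(vocab))

print(prob("cat", "the"))
```

Smoothing is what keeps unseen bigrams from getting probability zero; the per-context probabilities still sum to one over the vocabulary, which is a quick sanity check worth running on any n-gram implementation.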
Nov 15 (guest lecture by Sham Kakade): Recent trends in language models: ELMo and friends.
Notes by Sham. Scribe notes coming.
Google news releases on Transformers.
Nov 20: Sham Kakade guest lecture. Continuation of language modeling.
Notes by Sham.
Nov 27: Understanding word and sentence embeddings.
Nov 29: Finish sentence embeddings. Start adversarial examples.
Paper 1 (Liu et al), Paper 2 (Tramer et al), Paper 3 (Madry et al).
Dec 4: Certified defenses against adversarial examples.
Paper 1 (Wong and Kolter '18) and Paper 2 (Wong et al).
Dec 11: Recent analyses of teacher-student net training where the student net is overparametrized.
Paper by Simon Du et al.
Dec 13: Teacher-student nets, contd. Generalization in the presence of vast overparametrization.
Please use this style file for scribe notes. Sample files: a source file and a compiled file.