Computer Science 597A
This is a graduate seminar focused on research in theoretical machine learning. Recent successes of machine learning involve nonconvex optimization problems, many of which are NP-hard in the worst case. This complicates developing a theoretical understanding of these problems. Topics will include: (i) Analysing nonconvex optimization. (ii) Towards a generalization theory for deep learning. (iii) Semantics of natural language, and some other NLP topics. (iv) Latent variable models, and learning such models via tensor decomposition. (v) Unsupervised learning, representation learning, and Generative Adversarial Nets (GANs). (vi) Adversarial examples in deep learning: are they inherent? (vii) Interpretability in ML.
The course is geared towards graduate students in computer science and allied fields, but may be suitable for undergrads who are adequately prepared, meaning they know machine learning as well as algorithm design and analysis. Ideally, they would have taken at least one of COS 521 and COS 511.
Enrolled students as well as auditors are expected to come to class regularly and participate in class discussion. To get a grade in the class, students should also be prepared to present a topic at some point in the course, and to write a brief paper or lecture notes on somebody else's presentation. There may also be an opportunity to do a course project, depending upon student interest. This course does not satisfy any undergrad requirements in the COS major (BSE or AB).
Instructor: Sanjeev Arora - 407 CS Building - 609-258-3869 - arora AT the domain name cs.princeton.edu
|Sept 14: Recap of basic optimization via
first-order methods, Lyapunov functions (measures of
progress), Stochastic Gradient Descent.
Ge's intro to convex optimization.
Going with the slope: offline, online, and randomly. (COS 521 lecture notes)
Optimization for ML; survey lecture by Elad Hazan (includes video and slides)
|Sept 19: Nonconvex optimization. First-order
methods. Descent lemma to arrive at a critical
point. Arriving at approximate local optima (escaping
saddle points).
How to escape saddle points efficiently by Jin et al.
Scribe notes by Misha Khodak.
Blog post #1 on offconvex.org
by Rong Ge
Blog post #2 on offconvex.org
by Chi Jin and Mike Jordan.
|Sept 21: Recap of generalization
theory. (Conditions which guarantee no overfitting
occurs.) Simple bound, Compression bound, PAC-Bayes bound.
notes by Mannor and Shalev-Shwartz
(also many other resources on the web)
Scribe notes by Hrishikesh Khanderparker
(Not) bounding the true error
by J. Langford and R. Caruana.
(derives generalization bounds for NNs using PAC-Bayes)
|Sept 26: Some weirdnesses of
deep learning (wrt generalization). Discussion re:
paper and its controversial title. An interesting video.
Understanding deep learning requires rethinking generalization. (Sanjeev's
highlighted copy.) By Zhang, Bengio, Hardt, Recht, Vinyals.
Geometry, Optimization, and Generalization in Multilayer Networks. Video
of talk by Nati Srebro at Simons Institute, Berkeley.
|Sept 28: Guest lecture by Behnam
Neyshabur, plus class discussion.
A PAC-Bayesian Approach to Spectrally-Normalized Generalization Bounds
for Neural Nets (by Neyshabur et al.)
Path-Normalized Optimization in Neural Networks
by Neyshabur, Salakhutdinov and Srebro.
Spectrally-normalized margin bounds for neural networks by Bartlett, Foster, and Telgarsky.
|Oct 3: Intro to Tensors and Tensor
Decomposition. Difficulties compared to SVD. Jennrich's
algorithm.
Rong Ge's Notes 1 and Notes 2.
blog post on tensor methods in ML. Wikipedia page
on eigendecomposition provides most linear algebra
background. See also Rong Ge's notes on SVD.
|Oct 5: Latent Variable Models. Topic
Models. ICA. How to learn them via Tensor Decomposition.
Tensor Decompositions for Learning Latent Variable Models by
Anandkumar, Ge, Hsu, Kakade, Telgarsky
|Oct 10: Guest lecturer Tengyu Ma. How to show all local minima are approximate global minima.||Matrix
completion has no spurious local minimum, by Ge, Lee, and Ma.
|Oct 12: Learning topic models via
nonnegative matrix factorization. Efficient algorithms
under the separability assumption. Brief intro to solving
nonlinear noisy-OR model via tensor decomposition.
Cameo by Tengyu Ma on more stable tensor decomposition.
Learning topic models: provably and efficiently, by Arora et
al. (this version to appear in CACM)
Section 10 of Polynomial-time algorithms for Tensor Decompositions via sum-of-squares
by Ma, Shi and Steurer.
ICML version of topic model paper.
An introductory account of stability issues in linear algebraic procedures appears in this paper by O'Rourke et al.
First two sections of Provable Learning of Noisy-OR networks.
|Oct 17: No class.
|Oct 19: Compressed sensing and matrix
completion via convex programming: quick intro and proofs.
(Guest lecturer: Pravesh Kothari)
||Moitra Sections 4.1, 4.5, and Chapter 7.
|Oct 24: (Class begins at 5pm.) Language models: n-grams. Perplexity.||Language Models. (Chapter by Mike Collins)|
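As a rough illustration of the two topics in the Oct 24 entry, the sketch below fits a bigram model and computes its perplexity, taken here as 2 raised to the negative average log2-probability per predicted token. The function names, the toy corpus, and the use of add-alpha smoothing are assumptions made for this example and are not taken from the course materials or the Collins chapter.

```python
import math
from collections import Counter

START, STOP = "<s>", "</s>"

def train_bigram(sentences):
    """Count contexts and bigrams over sentences padded with START and STOP."""
    contexts, bigrams = Counter(), Counter()
    for words in sentences:
        padded = [START] + words + [STOP]
        contexts.update(padded[:-1])                  # every token that has a successor
        bigrams.update(zip(padded[:-1], padded[1:]))  # (previous, current) pairs
    return contexts, bigrams

def perplexity(sentences, contexts, bigrams, vocab_size, alpha=1.0):
    """Perplexity under an add-alpha smoothed bigram model.

    Assumes every token in `sentences` belongs to the vocabulary of size
    `vocab_size` (which counts STOP but not START).
    """
    log_prob, n_tokens = 0.0, 0
    for words in sentences:
        padded = [START] + words + [STOP]
        for prev, cur in zip(padded[:-1], padded[1:]):
            p = (bigrams[(prev, cur)] + alpha) / (contexts[prev] + alpha * vocab_size)
            log_prob += math.log2(p)
            n_tokens += 1
    return 2 ** (-log_prob / n_tokens)

# Toy corpus (hypothetical data, for illustration only).
train = [["the", "cat", "sat"], ["the", "dog", "sat"]]
held_out = [["sat", "the", "dog", "cat"]]
vocab_size = len({w for s in train for w in s} | {STOP})

contexts, bigrams = train_bigram(train)
train_ppl = perplexity(train, contexts, bigrams, vocab_size)
held_out_ppl = perplexity(held_out, contexts, bigrams, vocab_size)
print(train_ppl, held_out_ppl)
```

With empty counts the smoothed model is uniform over the vocabulary, so its perplexity equals the vocabulary size. That gives a useful sanity check: a trained model should score below that value on text resembling its training data.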