Princeton University

Computer Science 597A

Fall 2017

This is a graduate seminar focused on research in theoretical machine learning. Recent successes of machine learning involve nonconvex optimization problems, many of which are NP-hard in the worst case. This complicates developing a theoretical understanding of these problems. Topics will include: (i) Analyzing nonconvex optimization. (ii) Towards a generalization theory for deep learning. (iii) Semantics of natural language and other topics in NLP. (iv) Latent variable models, and learning such models via tensor decomposition. (v) Unsupervised learning, representation learning, and Generative Adversarial Nets (GANs). (vi) Adversarial examples in deep learning: are they inherent? (vii) Interpretability in ML.
The course is geared towards graduate students in computer science and allied fields, but may be suitable for undergraduates who are adequately prepared. (Students should know machine learning as well as algorithm design/analysis; ideally, they will have taken at least one of COS 521 and COS 511.)
Enrolled students as well as auditors are expected to come to class regularly and participate in class discussion. To receive a grade in the class, students should also be prepared to present a topic at some point during the course and to write a brief paper or lecture notes on somebody else's presentation. There may also be an opportunity to do a course project, depending upon student interest.
This course does not satisfy any undergrad requirements in the COS major (BSE or AB).
Instructor: Sanjeev Arora, 407 CS Building, 609-258-3869, arora AT the domain name cs.princeton.edu
Schedule. Each entry below lists the date and topic, followed by the required reading and additional reading.
Sept 14: Recap of basic optimization via first-order methods, Lyapunov functions (measures of progress), and Stochastic Gradient Descent.
Required reading: Rong Ge's intro to convex optimization; Going with the slope: offline, online, and randomly (COS 521 lecture notes).
Additional reading: Boyd and Vandenberghe, Sections 9.1-9.3; Optimization for ML, a survey lecture by Elad Hazan (includes video and slides).
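As a toy illustration of the SGD recap (the problem, step size, and iteration count below are invented for this example, not taken from the course notes), here is a minimal stochastic gradient descent loop on a least-squares objective, where the distance to the optimum serves as a Lyapunov-style measure of progress:

```python
import numpy as np

# Minimal SGD sketch on f(w) = (1/2n) * ||Xw - y||^2 (illustrative toy).
# The quantity ||w - w*||^2 acts as a Lyapunov function: for a suitably
# small step size it shrinks in expectation at every step.
rng = np.random.default_rng(0)
n, d = 200, 5
X = rng.standard_normal((n, d))
w_star = rng.standard_normal(d)
y = X @ w_star                       # noiseless data, so w* is optimal

w = np.zeros(d)
eta = 0.01                           # step size (assumed small enough)
for t in range(5000):
    i = rng.integers(n)              # sample one data point uniformly
    grad = (X[i] @ w - y[i]) * X[i]  # stochastic gradient at w
    w -= eta * grad

err = np.linalg.norm(w - w_star)     # Lyapunov quantity after training
```

Because the data is noiseless, each stochastic gradient vanishes at w*, so the iterates contract toward the optimum rather than merely hovering near it.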
Sept 19: Nonconvex optimization. First-order methods. Descent lemma for arriving at a critical point. Arriving at approximate local optima (escaping saddle points).
Required reading: How to escape saddle points efficiently, by Jin et al.; scribe notes by Misha Khodak.
Additional reading: Blog post #1 on offconvex.org by Rong Ge; blog post #2 on offconvex.org by Chi Jin and Mike Jordan.
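A simplified sketch of the idea behind perturbed gradient descent (my own toy construction, not the exact algorithm or parameters of Jin et al.): plain gradient descent started on the saddle's stable manifold stalls at the saddle, while a small random perturbation near a critical point lets it escape along the negative-curvature direction.

```python
import numpy as np

# f(x, y) = x^2/2 - y^2/2 + y^4/4 has a strict saddle at (0, 0) and
# minima at (0, +/-1). Starting on the x-axis, plain GD converges to
# the saddle; perturbing when the gradient is tiny escapes it.
def grad(w):
    x, y = w
    return np.array([x, -y + y**3])

rng = np.random.default_rng(1)
w = np.array([1.0, 0.0])            # exactly on the saddle's stable manifold
eta = 0.1
for t in range(2000):
    g = grad(w)
    if np.linalg.norm(g) < 1e-6:    # near a critical point: random kick
        w = w + 0.01 * rng.standard_normal(2)
    else:
        w = w - eta * g             # ordinary gradient step
# w ends near one of the true minima (0, +/-1), not the saddle (0, 0)
```

The negative curvature in the y-direction amplifies the perturbation geometrically, which is the mechanism the Jin et al. analysis makes quantitative.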
Sept 21: Recap of generalization theory (conditions that guarantee no overfitting occurs). Simple bound, compression bound, PAC-Bayes bound.
Required reading: Lecture notes by Mannor and Shalev-Shwartz (many other resources are also available on the web); scribe notes by Hrishikesh Khanderparker.
Additional reading: (Not) bounding the true error, by J. Langford and R. Caruana (derives generalization bounds for NNs using PAC-Bayes).
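For reference, one common (McAllester-style) statement of the PAC-Bayes bound mentioned above; the notation here is a standard choice, not necessarily that of the lecture notes:

```latex
% P = prior over hypotheses (fixed before seeing data), Q = posterior,
% m = sample size, L_D = population loss, \widehat{L}_S = empirical loss.
% With probability at least 1 - \delta over the draw of the sample S,
% simultaneously for all posteriors Q:
\[
  L_{\mathcal{D}}(Q) \;\le\; \widehat{L}_S(Q)
  \;+\; \sqrt{\frac{\mathrm{KL}(Q \,\|\, P) + \ln(m/\delta)}{2(m-1)}}
\]
```

The KL term is what the Langford-Caruana paper (and the later Neyshabur et al. work) controls to get nonvacuous bounds for neural nets.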
Sept 26: Some weirdnesses of deep learning (with respect to generalization). Discussion of the paper and its controversial title. An interesting video.
Required reading: Understanding deep learning requires rethinking generalization, by Zhang, Bengio, Hardt, Recht, and Vinyals (Sanjeev's highlighted copy).
Additional reading: Geometry, Optimization, and Generalization in Multilayer Networks: video of a talk by Nati Srebro at the Simons Institute, Berkeley.
Sept 28: Guest lecture by Behnam Neyshabur, plus class discussion.
Required reading: PAC-Bayesian Approach to Spectrally-Normalized Generalization Bounds for Neural Nets, by Neyshabur et al.
Additional reading: Path-SGD: Path-Normalized Optimization in Neural Networks, by Neyshabur, Salakhutdinov, and Srebro; Spectrally-normalized margin bounds for neural networks, by Bartlett, Foster, and Telgarsky.
Oct 3: Intro to tensors and tensor decomposition. Difficulties compared to SVD. Jennrich's algorithm.
Required reading: Moitra, Chapter 3; Rong Ge, Notes 1 and Notes 2.
Additional reading: Rong's blog post on tensor methods in ML. The Wikipedia page on eigendecomposition provides most of the linear algebra background; see also Rong Ge's notes on SVD.
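A small numpy sketch of Jennrich's algorithm on a synthetic symmetric tensor (an illustrative toy, not a robust implementation; the components are taken orthonormal here to keep the pseudoinverse step clean, though linear independence suffices for the algorithm):

```python
import numpy as np

# Jennrich's algorithm for T = sum_i lam_i * a_i (x) a_i (x) a_i.
rng = np.random.default_rng(0)
n, k = 8, 3
A = np.linalg.qr(rng.standard_normal((n, k)))[0]  # components a_1..a_k
lam = np.array([3.0, 2.0, 1.0])
T = np.einsum('i,ai,bi,ci->abc', lam, A, A, A)    # build the rank-3 tensor

# Two random contractions along the third mode:
# M_x = A diag(lam * (A^T x)) A^T, and similarly M_y.
x, y = rng.standard_normal(n), rng.standard_normal(n)
Mx = np.einsum('abc,c->ab', T, x)
My = np.einsum('abc,c->ab', T, y)

# Eigenvectors of Mx My^+ with the k largest |eigenvalues| recover the
# a_i up to sign; the eigenvalues are ratios of the random weightings,
# generically distinct, which is why the decomposition is unique.
vals, vecs = np.linalg.eig(Mx @ np.linalg.pinv(My))
top = np.argsort(-np.abs(vals))[:k]
recovered = np.real(vecs[:, top])
```

The simultaneous-diagonalization step is exactly what fails for a single matrix (SVD factors are only unique up to rotation), which is the sense in which tensors are more rigid than matrices.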
Oct 5: Latent variable models. Topic models. ICA. How to learn them via tensor decomposition.
Required reading: Tensor Decomposition for Learning Latent Variable Models, by Anandkumar, Ge, Hsu, Kakade, and Telgarsky.

Oct 10: Guest lecturer Tengyu Ma. How to show that all local minima are approximate global minima.
Required reading: Matrix completion has no spurious local minimum, by Ge, Lee, and Ma.

Oct 12: Learning topic models via nonnegative matrix factorization. Efficient algorithms under the separability assumption. Brief intro to solving the nonlinear noisy-OR model via tensor decomposition. Cameo by Tengyu Ma on more stable tensor decomposition.
Required reading: Learning topic models Provably and Efficiently, by Arora et al. (this version to appear in CACM); Section 10 of Polynomial-time algorithms for Tensor Decompositions via sum-of-squares, by Ma, Shi, and Steurer.
Additional reading: Longer ICML version of the topic model paper; an introductory account of stability issues in linear algebraic procedures in this paper by O'Rourke et al.; first two sections of Provable Learning of Noisy-Or networks.
Oct 17: No class. 

Oct 19: Compressed sensing and matrix completion via convex programming: quick intro and proofs. (Guest lecturer: Pravesh Kothari.)
Required reading: Moitra, Sections 4.1 and 4.5, and Chapter 7.

Oct 24 (class begins at 5pm): Language models: n-grams. Perplexity.
Required reading: Language Models (chapter by Mike Collins).
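A minimal sketch of a bigram language model and its perplexity (the toy corpus, add-one smoothing, and base-2 convention here are my own illustrative choices, not from the Collins chapter):

```python
import math
from collections import Counter

# Train a bigram model with add-one (Laplace) smoothing on a toy corpus,
# then compute perplexity of a held-out sentence: 2 to the power of the
# negative average log2-probability per predicted token.
train = "the cat sat on the mat . the dog sat on the cat .".split()
V = len(set(train))                     # vocabulary size (for smoothing)
unigrams = Counter(train)
bigrams = Counter(zip(train, train[1:]))

def prob(w_prev, w):
    # smoothed conditional probability P(w | w_prev)
    return (bigrams[(w_prev, w)] + 1) / (unigrams[w_prev] + V)

test = "the dog sat on the mat .".split()
log_prob = sum(math.log2(prob(p, w)) for p, w in zip(test, test[1:]))
perplexity = 2 ** (-log_prob / (len(test) - 1))
```

Lower perplexity means the model assigns higher probability to the held-out text; a uniform model over this vocabulary would have perplexity V = 7, while the trained bigram model does better.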