Princeton University logo
Princeton University
Computer Science Department

Computer Science 597A
New Directions in Theoretical Machine Learning
  

Sanjeev Arora


  Fall 2017

Course Summary

This is a graduate seminar focused on research in theoretical machine learning. Recent successes of machine learning involve nonconvex optimization problems, many of which are are NP-hard in the worst case. This complicates developing a theoretical understanding of these problems. List of topics will include: (i) Analysing nonconvex optimization. (ii) Towards a generalization theory for deep learning. (iii) Semantics of natural language. Some other NLP. (iv) Latent variable models. Learning such models via tensor decomposition. (v) Unsupervised learning, Representation Learning, and Generative Adversarial Nets (GANs). (vi) Adversarial examples in deep learning: are they inherent? (vii) Interpretability in ML.

The course is geared towards graduate students in computer science and allied fields, but may be OK for undergrads who're suitably prepared. (Knowledge of machine learning as well as algorithm design/analysis. Ideally, they would have taken at least one of COS 521 and COS 511.)

Enrolled students as well as auditors are expected to come to class regularly and participate in class discussion. To get a grade in the class, students should also be prepared to present a topic some time in the course, and write a brief paper or lecture notes on somebody else's presentation. There may also be an opportunity to do a course project, depending upon student interest.

This course does not satisfy any undergrad  requirements in the COS major (BSE or AB).

Administrative Information

Lectures: Tues-Thurs 4:30-5:50pm.   Room: 402 CS Building. First meeting: Sept 14.

Instructor: Sanjeev Arora- 407 CS Building - 609-258-3869 arora AT the domain name cs.princeton.edu


Lecture Schedule


Date/topic
Required reading
Additional reading
Sept 14: Recap of basic optimization via 1st order methods, Lyapunov functions (measures of progress), Stochastic Gradient Descent.
Rong Ge's intro to convex optimization.

Going with the slope: offline, online, and randomly.  (COS 521 lecture notes)
Boyd-Vanderberghe
Sec 9.1-9.3

Optimization for ML; survey lecture by Elad Hazan (includes video and slides)
Sept 19: Nonconvex optimization. First order methods. Descent lemma to arrive at critical point. Arriving at approximate local optima  (escaping saddle points).
How to escape saddle points efficiently by Jin et al.

Scribe notes. by Misha Khodak
Blog post #1 on offconvex.org
by Rong Ge


Blog post #2 on offconvex.org
by Chi Jin and Mike Jordan.
Sept 21: Recap of generalization theory. (Conditions which guarantee no overfitting occurs.) Simple bound, Compression bound, PAC-Bayes bound.
Lecture notes by Mannor and Shalev-Schwartz
(also many other resources on the web)
Scribe notes. by Hrishikesh Khanderparker
(Not) bounding the true error
by J. Langford and R. Caruana.
(derives generalization bounds for NNs using PAC-Bayes)
Sept 26 : Some weirdnesses of deep learning (wrt generalization). Discussion re: paper and its controversial title. An interesting video.
Understanding deep learning requires rethinking generalization.  (Sanjeev's highlighted copy.) By Zhang, Bengio, Hardt, Recht, Vinyals.
Geometry, Optimization, and Generalization in Multilayer Networks. Video of talk by Nati Srebro at Simons Institute, Berkeley.
Sept 28: Guest lecture by Behnam Neyshabur, plus class discussion.
PAC-Bayesian Approach to Spectrally-Normalized Generalization Bounds for Neural Nets (by Neyshabur et al.)
Path-SGD: Path Normalized Optimization in Neural Networks
by Neyshabur, Salakhutdinov and Srebro.
Spectrally-normalized margin bounds for neural networks by Bartlett, Foster, and Telgarsky.
Oct 3: Intro to Tensors and Tensor Decomposition. Difficulties compared to SVD. Jennrich's algorithm.
Moitra chapter 3.
Rong Ge Notes 1 and Notes 2.
Rong's blog post on tensor methods in ML. Wikipedia page on eigendecomposition provides most linear algebra background. See also Rong Ge's notes on SVD.
Oct 5: Latent Variable Models. Topic Models. ICA. How to learn them via Tensor Decomposition.
Tensor Decomposition for Learning Latent Variable Models by Anandkumar, Ge, Hsu, Kakade, Telgarsky

Oct 10: Guest lecturer Tengyu Ma. How to show all local minima are approximate global minima. Matrix completion has no spurious local minimum, by Ge, Lee, and Ma.

Oct 12: Learning topic models via nonnegative matrix factorization. Efficient algorithms assuming separability assumption. Brief intro to solving nonlinear noisy-OR model via tensor decomposition.

Cameo by Tengyu Ma on more stable tensor decomposition.

Learning topic models ---Provably and Efficiently. Arora et al. (this version to appear in CACM)

Section 10 of Polynomial-time algorithms for Tensor Decompositions via sum-of-squares
by Ma, Shi and Steurer.
Longer ICML version of topic model paper.

Introductory account of stability issues in linear algebraic procedures appears in this paper by O'Rourke et al.

First two sections of Provable Learning of Noisy-Or networks.
Oct 17: No class.


Oct 19: Compressed sensing and matrix completion via convex programming: quick intro and proofs. (Guest lecturer: Pravesh Kothari)
Moitra Sections 4.1, 4.5, and Chapter 7.

Oct 24: (Class begins at 5pm.) Language models: n-grams. Perplexity. Language Models. (Chapter by Mike Collins)



Reading List  (will be continually updated)

Optimization
  1. Going with the slope: offline, online, and randomly. (Lecture notes from COS 521: Arora and Kothari)
  2. Optimization for ML; survey lecture by Elad Hazan (includes video and slides)






Useful Resources

Lecture notes

  1. Rob Schapire's notes on Machine Learning Theory.
  2. Ankur Moitra's notes Algorithmic Aspects of Machine Learning.
  3. Sanjeev Arora's grad course on Advanced Algorithm Design.
  4. Rong Ge's notes on Algorithmic Aspects of ML.
  5. Blog: Off the Convex Path.


Online Books


  1. Draft of Foundations of Data Science by Blum, Hopcroft and Kannan. (Highly recommended)
  2. Elad Hazan's book Online Convex Optimization
  3. Kevin Murphy's book on machine learning (written from a bayesian perspective).
  4. Ben-David and Shalev-Schwartz's Understanding Machine Learning (written from an ML theory perspective)
  5. Boyd and Vandenberghe, Convex Optimization (the classic book)