COS597A (Fall 2017) New Directions in Theoretical Machine Learning

Princeton University
Computer Science Department

Computer Science 597A
New Directions in Theoretical Machine Learning

Sanjeev Arora

Fall 2017

Course Summary

This is a graduate seminar focused on research in theoretical machine learning. Recent successes of machine learning involve nonconvex optimization problems, many of which are are NP-hard in the worst case. This complicates developing a theoretical understanding of these problems. List of topics will include: (i) Analysing nonconvex optimization. (ii) Towards a generalization theory for deep learning. (iii) Semantics of natural language. Some other NLP. (iv) Latent variable models. Learning such models via tensor decomposition. (v) Unsupervised learning, Representation Learning, and Generative Adversarial Nets (GANs). (vi) Adversarial examples in deep learning: are they inherent? (vii) Interpretability in ML.

The course is geared towards graduate students in computer science and allied fields, but may be OK for undergrads who're suitably prepared. (Knowledge of machine learning as well as algorithm design/analysis. Ideally, they would have taken at least one of COS 521 and COS 511.)

Enrolled students as well as auditors are expected to come to class regularly and participate in class discussion. To get a grade in the class, students should also be prepared to present a topic some time in the course, and write a brief paper or lecture notes on somebody else's presentation. There may also be an opportunity to do a course project, depending upon student interest.

This course does not satisfy any undergrad requirements in the COS major (BSE or AB).

Administrative Information

Lectures: Tues-Thurs 4:30-5:50pm. Room: 402 CS Building. First meeting: Sept 14.

Instructor: Sanjeev Arora- 407 CS Building - 609-258-3869 arora AT the domain name cs.princeton.edu

Lecture Schedule

Date/topic	Required reading	Additional reading
Sept 14: Recap of basic optimization via 1st order methods, Lyapunov functions (measures of progress), Stochastic Gradient Descent.	Rong Ge's intro to convex optimization. Going with the slope: offline, online, and randomly. (COS 521 lecture notes)	Boyd-Vanderberghe Sec 9.1-9.3 Optimization for ML; survey lecture by Elad Hazan (includes video and slides)
Sept 19: Nonconvex optimization. First order methods. Descent lemma to arrive at critical point. Arriving at approximate local optima (escaping saddle points).	How to escape saddle points efficiently by Jin et al. Scribe notes. by Misha Khodak	Blog post #1 on offconvex.org by Rong Ge Blog post #2 on offconvex.org by Chi Jin and Mike Jordan.
Sept 21: Recap of generalization theory. (Conditions which guarantee no overfitting occurs.) Simple bound, Compression bound, PAC-Bayes bound.	Lecture notes by Mannor and Shalev-Schwartz (also many other resources on the web) Scribe notes. by Hrishikesh Khanderparker	(Not) bounding the true error by J. Langford and R. Caruana. (derives generalization bounds for NNs using PAC-Bayes)
Sept 26 : Some weirdnesses of deep learning (wrt generalization). Discussion re: paper and its controversial title. An interesting video.	Understanding deep learning requires rethinking generalization. (Sanjeev's highlighted copy.) By Zhang, Bengio, Hardt, Recht, Vinyals.	Geometry, Optimization, and Generalization in Multilayer Networks. Video of talk by Nati Srebro at Simons Institute, Berkeley.
Sept 28: Guest lecture by Behnam Neyshabur, plus class discussion.	PAC-Bayesian Approach to Spectrally-Normalized Generalization Bounds for Neural Nets (by Neyshabur et al.)	Path-SGD: Path Normalized Optimization in Neural Networks by Neyshabur, Salakhutdinov and Srebro. Spectrally-normalized margin bounds for neural networks by Bartlett, Foster, and Telgarsky.
Oct 3: Intro to Tensors and Tensor Decomposition. Difficulties compared to SVD. Jennrich's algorithm.	Moitra chapter 3. Rong Ge Notes 1 and Notes 2. Scribe notes by Woo Sang Cho	Rong's blog post on tensor methods in ML. Wikipedia page on eigendecomposition provides most linear algebra background. See also Rong Ge's notes on SVD.
Oct 5: Latent Variable Models. Topic Models. ICA. How to learn them via Tensor Decomposition.	Tensor Decomposition for Learning Latent Variable Models by Anandkumar, Ge, Hsu, Kakade, Telgarsky
Oct 10: Guest lecturer Tengyu Ma. How to show all local minima are approximate global minima.	Matrix completion has no spurious local minimum, by Ge, Lee, and Ma.
Oct 12: Learning topic models via nonnegative matrix factorization. Efficient algorithms assuming separability assumption. Brief intro to solving nonlinear noisy-OR model via tensor decomposition. Cameo by Tengyu Ma on more stable tensor decomposition.	Learning topic models ---Provably and Efficiently. Arora et al. (this version to appear in CACM) Section 10 of Polynomial-time algorithms for Tensor Decompositions via sum-of-squares by Ma, Shi and Steurer. Scribe notes by Suqi Liu	Longer ICML version of topic model paper. Introductory account of stability issues in linear algebraic procedures appears in this paper by O'Rourke et al. First two sections of Provable Learning of Noisy-Or networks.
Oct 17: No class.
Oct 19: Compressed sensing and matrix completion via convex programming: quick intro and proofs. (Guest lecturer: Pravesh Kothari)	Moitra Sections 4.1, 4.5, and Chapter 7.
Oct 24: (Class begins at 5pm.) Language models: n-grams. Perplexity. (Guest lecturer: Sida Wang)	Language Models. (Chapter by Mike Collins)
Oct 26: Word embeddings and their fascinating properties.	A Latent Variable Model Approach to PMI-based Word Embeddings by Arora et al. (TACL'16)	Blog articles: Semantic word embeddings and Word embeddings: Explaining their properties.
Nov 7: Unsupervised learning: formalizations. Bayesian perspective. Sentence embeddings.	Unsupervised learning: one notion or many? by Arora and Risteski.	Provable benefits of representation learning by Arora and Risteski.
Nov 9: Autoencoders; Denoising Autoencoders. Variational Autoencoders.	Autoencoding variational Bayes by Kingma and Welling.	Stacked denoising autoencoders by Vincent et al. Importance weighted autoencoders by Burda, Grosse, and Salakhutdinov.
Nov 14: Generative Adversarial Nets (GANs).	Ian Goodfellow's tutorial on GANs. Generalization and Equilibrium in GANs by Arora, Ge, Liang, Ma, and Zhang.	Offconvex.org blogpost1 and blogpost2 (the second reports experimental findings using the birthday paradox test). Theoretical limitations of encoder-decoder GAN architectures. by Arora, Risteski, Zhang.
Nov 16: Can we replace complicated LSTM models with simpler, interpretable models?	A simple but tough to beat baseline for sentence embeddings.	Unreasonable effectiveness by RNNs (blog post by A. Karpathy) Understanding LSTM networks (post on Colah's blog)
Nov 21: Adversarial examples for deep nets.	Is attacking ML easier than defending it? blog post by Goodfellow and Papernot. Delving into transferable adversarial examples and black box attacks by Liu et al.	Toward deep learning models resistant to adversarial attacks, by Madry et al.
Nov 28: More on generalization mystery for deep learning...	Wait for scribe notes...	Introduction to matrix concentration inequalities by J. Tropp. We used Thm 1.6.2 in class. Nonvacuous PAC-bayes bounds for deep nets by Dziugaite and Roy. (Sanjeev's highlighted version.)
Nov 30: Frank-Wolfe updates as an approach to nonconvex optimization. (Guest lecturer Elad Hazan)
Dec 5: Models for interactive learning. Machine teaching, curriculum learning etc.	Lecture notes on machine teaching by Jerry Zhu Curriculum Learning by Bengio et al. (ICML09)	Pieter Abbeel's lecture notes on policy gradient. (Basic idea is on slide 8.) Hossein Mobahi's talk on homotopy method. (watched for a few min to see connection to curriculum learning)
Dec 7: No lecture due to NIPS
Dec 12: No lecture; attend Yann LeCun's talk at IAS instead.
Dec 14: Recent progress in making deep nets resistant to adversarial attacks.	Provable defenses against adversarial examples via the convex outer adversarial polytope by Kolter and Wong (NIPS'17) Countering adversarial images using input transformations, by Anonymous (ICLR'17 submission)	Toward deep learning models resistant to adversarial attacks by Madry, Makelov, Schmidt, Tsipras, Vladu. Certified defenses for data poisoning attacks by Steinhardt, Koh and Liang

Reading List (will be continually updated)

Optimization

Going with the slope: offline, online, and randomly. (Lecture notes from COS 521: Arora and Kothari)
Optimization for ML; survey lecture by Elad Hazan (includes video and slides)

Useful Resources

Lecture notes

Rob Schapire's notes on Machine Learning Theory.
Ankur Moitra's notes Algorithmic Aspects of Machine Learning.
Sanjeev Arora's grad course on Advanced Algorithm Design.
Rong Ge's notes on Algorithmic Aspects of ML.
Blog: Off the Convex Path.

Online Books

Draft of Foundations of Data Science by Blum, Hopcroft and Kannan. (Highly recommended)
Elad Hazan's book Online Convex Optimization
Kevin Murphy's book on machine learning (written from a bayesian perspective).
Ben-David and Shalev-Schwartz's Understanding Machine Learning (written from an ML theory perspective)
Boyd and Vandenberghe, Convex Optimization (the classic book)

Princeton University Computer Science Department

Computer Science 597A New Directions in Theoretical Machine Learning

Sanjeev Arora

Fall 2017

Course Summary

Administrative Information

Lectures: Tues-Thurs 4:30-5:50pm. Room: 402 CS Building. First meeting: Sept 14.

Lecture Schedule

Reading List (will be continually updated)

Useful Resources

Lecture notes

Online Books

Princeton University
Computer Science Department

Computer Science 597A
New Directions in Theoretical Machine Learning