Princeton University

Computer Science 598D

Spring 2015

This is a seminar course that will focus on the following phenomenon: many problems in machine learning are formally intractable (e.g., NPhard). Nevertheless they are solved in practice by heuristics. Can we design algorithms with provable guarantees (running time, solution quality)? The course will cover some successes in this direction and focus on discussing open problems (of which there are many).
The course is geared towards graduate students but may be OK for undergrads who are suitably prepared. (Undergrad level training in algorithms and machine learning. Knowledge of material in COS 521 as taught in Fall 2014 would be highly desirable, especially the lectures on high dimensional geometry and SVD.)
Enrolled students as well as auditors are expected to come to class regularly and participate in class discussion. To get a grade in the class, students should also be prepared to present a topic some time in the course, and write a brief paper or lecture notes on somebody else's presentation. There may also be an opportunity to do a course project, depending upon student interest.
Instructor: Sanjeev Arora 307 CS Building  6092583869 arora AT the domain name cs.princeton.edu
Feb 5: Whirlwind intro to ML.
(Supervised, Unsupervised, Discriminative, Generative, Max
Entropy Models, Max likelihood, etc.). Intro to themes of
the course. 
Lecture 1 notes Relevant chapter on generative vs discriminative in Tom Mitchell’s book. Survey by K Conrad on Max Entropy. 
Readings: On discriminative vs generative: A comparison of logistic regression and naïve bayes, by Ng and Jordan in NIPS 2001. Generative or Discriminative? Getting the best of both worlds by Bishop and Lasserre. Bayesian Statistics 2007. Connection between max likelihood and max entropy. (Note by J. Coughlan.) More philosophical: Steinhardt's blogpost on the old Bayesian vs Frequentist argument (and how it relates to more modern arguments in ML). 
Feb 12: Linearized models in ML. SVD: A Swiss army knife for solving many such problems. 
Brief Lecture
notes. COS521 lecture notes (Lec 1 and Lec 2) on SVD. Moitra's chapter on Sparse Recovery. Hopcroft Kannan Section 8.4 (spectral clustering). 
Read wikipedia pages on topic modeling, sparse coding, restricted isometry property (RIP). 
Feb 19: Nonnegative Matrix Factorization and
Topic Modeling. 
Moitra's Chapter 2 except Sec 2.3 
Papers: Arora, Ge, Moitra,
Kannan on NMF (STOC 2012); Arora, Ge
Moitra on Topic Models (FOCS 2012) and Arora et al. on efficient versions of topic modeling algorithm (ICML'13) 
Feb 26: No Class 

Mar 5: Sparse coding/dictionary learning, and
unsupervised learning of deep nets. (A common thread is a simple algorithm for community detection.) 
Scribe notes by
Max Simchowitz. 
Papers: Arora, Ge, Moitra on
Provable algorithms for Dictionary Learning (COLT'14) Arora, Bhaskara, Ge, Ma on Unsupervised learning of Deep nets (ICML'14) 
Mar 12: Learning latent variable models by
Tensor Decomposition. Learning topic models under l1 error (new manuscript by Bansal, Bhattacharya, Kannan; presented by Andrej Risteski) 
Scribe notes by
Holden Lee. 
Tensor
decompositions for learning latent variable models by
Anandkumar, Ge, Hsu, Kakade, Telgarsky. Provable SVDbased algorithm for topic modeling by Bansal, Bhattacharya, Kannan. 
Mar 26: Graphical models; Markov Random
Fields; Bayes Nets; Gibbs Distribution. Computational
Intractability of Inference problems; 
Slides
from a Short Course on Graphical Models by Mark
Paskin. (Lectures 2, 3 are especially useful.) 
Graphical
models, exponential families, and variational inference by
Wainwright and Jordan (a booklength survey) A sensitivity analysis of Pathfinder by Kevin Ng (evaluates this seminal use of graphical models in medical diagnoses). Chapter 7: Gibbs States in Probability on Graphs by G. Grimmett 
April 2: Introduction to NLP; Markovian
models; Smoothing. Semantic Word Vectors. 
Some wellwritten book chapters by Mike
Collins (a) Language
models (b) Loglinear models. Kathy McKeown's lecture notes on Hidden Markov Models. (Also explains Partofspeech tagging, and ForwardBackword and Viterbi algorithm.) 
Lecture notes by Kathy McKeown ngrams and corpus linguistics. 
April 9: More NLP. Probabilistic CFGs;
lexicalized CFGs. Semantics. 
Book chapters by Mike Collins: Probabilistic CFGs Lexicalized CFGs. IBM Translation Models 
A
convex alternative to the IBM Model 2, by Simion,
Collins and Stein. EMNLP 2013. 
April 16: Semantics. Semantic word embeddings. 
Paper
by Arora, Li, Liang, Ma, Risteski. 
Paper
in Neuron 2013 showing how to use topic modeling to
understand fMRI data. (by Stansbury, Naselaris, Gallant). 
April 23: (1) SVD based algorithms for learning
HMMs and continuousoutput HMMs. (Presentation by Alex
Smith.) (2) Matrix Completion via convex relaxation. 
(1) Spectral
Algorithm for Learning HMMs by Hsu, Kakade and
Zhang. (2) Hilbert Space Embeddings of HMMs by Song et al. (3) Moitra's lecture notes on matrix completion. 
Slides
by Le Song on paper (2). 
April 30: (1) Efficiently learning Ising
models on arbitrary graphs. (presentation by Holden Lee) (2) Energy based modeling in ML (including RBMs, word2vec, etc.) 
Paper
by Guy Bresler. 