Computer Science 598D
This is a seminar course that will focus on the following phenomenon: many problems in machine learning are formally intractable (e.g., NP-hard). Nevertheless, they are routinely solved in practice by heuristics. Can we design algorithms with provable guarantees (running time, solution quality)? The course will cover some successes in this direction and focus on discussing open problems (of which there are many).
The course is geared towards graduate students but may be suitable for undergraduates who are appropriately prepared (undergraduate-level training in algorithms and machine learning; knowledge of the material in COS 521 as taught in Fall 2014 would be highly desirable, especially the lectures on high-dimensional geometry and SVD).
Enrolled students as well as auditors are expected to come to class regularly and participate in class discussion. To get a grade in the class, students should also be prepared to present a topic at some point during the course, and to write a brief paper or lecture notes on somebody else's presentation. There may also be an opportunity to do a course project, depending upon student interest.
Instructor: Sanjeev Arora, 307 CS Building, 609-258-3869, arora AT the domain name cs.princeton.edu
Feb 5: Whirlwind intro to ML (supervised, unsupervised, discriminative, generative, max-entropy models, max likelihood, etc.). Intro to themes of the course.
Readings:
- Lecture 1 notes.
- Relevant chapter on generative vs. discriminative models in Tom Mitchell's book.
- Survey by K. Conrad on max entropy.
- On discriminative vs. generative: A comparison of logistic regression and naive Bayes, by Ng and Jordan, NIPS 2001.
- Generative or Discriminative? Getting the Best of Both Worlds, by Bishop and Lasserre, Bayesian Statistics 2007.
- Connection between max likelihood and max entropy (note by J. Coughlan).
- More philosophical: Steinhardt's blog post on the old Bayesian vs. frequentist argument (and how it relates to more modern arguments in ML).
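To make the max-likelihood/max-entropy connection concrete, here is a tiny numerical sketch (my own toy example, not from the readings): among all distributions on {0, 1, 2} with a given mean, the entropy maximizer has exponential-family form p_i ∝ exp(λi), and we can find λ by bisection.

```python
# Toy illustration (support, mean constraint, and tolerances are my choices):
# the max-entropy distribution under a mean constraint is exponential-family.
import numpy as np

xs = np.array([0.0, 1.0, 2.0])
target_mean = 0.5

def expfam(lam):
    w = np.exp(lam * xs)
    return w / w.sum()

def entropy(p):
    return -np.sum(p * np.log(p))

# Bisection on lambda: the mean of expfam(lam) is increasing in lam.
lo, hi = -10.0, 10.0
for _ in range(200):
    mid = (lo + hi) / 2
    if expfam(mid) @ xs < target_mean:
        lo = mid
    else:
        hi = mid
p_star = expfam((lo + hi) / 2)

# Any other distribution with the same mean has lower entropy,
# e.g. q = (0.6, 0.3, 0.1) also has mean 0.5.
q = np.array([0.6, 0.3, 0.1])
assert abs(q @ xs - target_mean) < 1e-12
assert entropy(p_star) > entropy(q)
print(np.round(p_star, 4), round(entropy(p_star), 4))
```

The same duality underlies the Coughlan note: maximizing entropy subject to moment constraints and maximizing likelihood within the exponential family pick out the same distribution.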
Feb 12: Linearized models in ML. SVD: a Swiss army knife for solving many such problems.
Readings:
- COS 521 lecture notes (Lectures 1 and 2) on SVD.
- Moitra's chapter on sparse recovery.
- Hopcroft and Kannan, Section 8.4.
- Read the Wikipedia pages on topic modeling, sparse coding, and the restricted isometry property (RIP).
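A minimal demonstration of the "Swiss army knife" in action (my own toy example): truncating the SVD gives the best low-rank approximation (Eckart–Young), and on an exactly low-rank matrix the truncation is lossless.

```python
# Sketch: best rank-k approximation by truncated SVD. The matrix and the
# rank are my choices for illustration.
import numpy as np

rng = np.random.default_rng(0)
# An exactly rank-2 matrix: product of thin factors.
A = rng.standard_normal((6, 2)) @ rng.standard_normal((2, 5))

U, s, Vt = np.linalg.svd(A, full_matrices=False)
# Only two singular values are (numerically) nonzero.
assert s[2] < 1e-10 * s[0]

# Rank-2 truncation reconstructs A up to roundoff.
A2 = U[:, :2] * s[:2] @ Vt[:2, :]
assert np.allclose(A, A2)
print("rank-2 reconstruction error:", np.linalg.norm(A - A2))
```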
Feb 19: Nonnegative Matrix Factorization and Topic Models.
Readings:
- Moitra's Chapter 2, except Section 2.3.
- Papers: Arora, Ge, Moitra, Kannan on NMF (STOC 2012); Arora, Ge, Moitra on topic models; and Arora et al. on efficient versions of the topic modeling algorithm (ICML 2013).
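For contrast with the provable algorithms above, here is the standard heuristic they are competing with: Lee–Seung multiplicative updates for NMF. This sketch is my own illustration and is emphatically not the Arora–Ge–Moitra–Kannan algorithm; sizes, seed, and iteration count are arbitrary choices.

```python
# Lee–Seung multiplicative updates for min ||A - WH||_F with W, H >= 0.
# A heuristic with no worst-case guarantee — the phenomenon this course
# studies. Sizes and seed are my choices.
import numpy as np

rng = np.random.default_rng(1)
# Ground-truth nonnegative factors, so an exact rank-3 NMF exists.
A = rng.random((8, 3)) @ rng.random((3, 10))

W, H = rng.random((8, 3)), rng.random((3, 10))
err0 = np.linalg.norm(A - W @ H)
eps = 1e-12                     # guard against division by zero
errs = []
for _ in range(300):
    H *= (W.T @ A) / (W.T @ W @ H + eps)
    W *= (A @ H.T) / (W @ H @ H.T + eps)
    errs.append(np.linalg.norm(A - W @ H))

assert errs[-1] < 0.1 * err0                         # large error reduction
assert all(x >= y - 1e-6 * err0 for x, y in zip(errs, errs[1:]))  # monotone
print("final relative error:", errs[-1] / np.linalg.norm(A))
```

The updates provably never increase the objective, but they can converge to a local minimum; the STOC 2012 paper shows when the *global* problem becomes tractable (e.g., under separability).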
Feb 26: No class.
Mar 5: Sparse coding/dictionary learning, and unsupervised learning of deep nets. (A common thread is a simple algorithm for community detection.)
Readings:
- Scribe notes.
- Papers: Arora, Ge, Moitra, Provable algorithms for Dictionary Learning (COLT'14); Arora, Bhaskara, Ge, Ma on unsupervised learning of deep nets (ICML'14).
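As a taste of the "common thread," here is a spectral sketch of community detection (my own toy example, not the specific combinatorial routine used in the papers): on the *expected* adjacency matrix of a two-community block model with within-probability p and across-probability q, p > q, the signs of the second eigenvector split the communities exactly.

```python
# Spectral community detection on an idealized (expected) block matrix.
# Community sizes, p, and q are my choices.
import numpy as np

n, p, q = 6, 0.8, 0.2                 # two communities of size n//2
block = np.full((n, n), q)
block[:n//2, :n//2] = p
block[n//2:, n//2:] = p

vals, vecs = np.linalg.eigh(block)    # eigenvalues in ascending order
v = vecs[:, -2]                       # eigenvector of the 2nd-largest eigenvalue
labels = v > 0
# All of community 1 gets one sign, all of community 2 the other.
assert len(set(labels[:n//2])) == 1 and len(set(labels[n//2:])) == 1
assert labels[0] != labels[-1]
print("recovered labels:", labels.astype(int))
```

On a random graph drawn from this model the same idea works once the eigenvector perturbation is controlled, which is where the technical work lies.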
Mar 12: Learning latent variable models by tensor decomposition. Also: learning topic models under l1 error (new manuscript by Bansal, Bhattacharya, Kannan; presented by Andrej Risteski).
Readings:
- Scribe notes.
- Tensor decompositions for learning latent variable models, by Anandkumar, Ge, Hsu, Kakade, Telgarsky.
- Provable SVD-based algorithm for topic modeling, by Bansal, Bhattacharya, Kannan.
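The workhorse of the Anandkumar et al. paper is the tensor power method. Here is a minimal sketch on an orthogonally decomposable order-3 tensor (components, weights, seed, and iteration count are my choices): iterating v ← T(I, v, v)/‖·‖ converges to one of the components.

```python
# Tensor power iteration on T = sum_i lam_i * a_i (x) a_i (x) a_i with
# orthonormal a_i. Toy dimensions and seed are my choices.
import numpy as np

a1 = np.array([1.0, 0.0, 0.0])
a2 = np.array([0.0, 1.0, 0.0])
lam = [2.0, 1.0]
T = lam[0] * np.einsum('i,j,k->ijk', a1, a1, a1) \
  + lam[1] * np.einsum('i,j,k->ijk', a2, a2, a2)

rng = np.random.default_rng(3)
v = rng.standard_normal(3)
v /= np.linalg.norm(v)
for _ in range(50):
    v = np.einsum('ijk,j,k->i', T, v, v)   # v <- T(I, v, v)
    v /= np.linalg.norm(v)

# v converges (quadratically) to one of the components, up to sign.
assert max(abs(v @ a1), abs(v @ a2)) > 0.999
print("recovered component:", np.round(v, 3))
```

In the latent-variable application, T is an empirical third-moment tensor after whitening, and deflation recovers the remaining components one by one.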
Mar 26: Graphical models; Markov random fields; Bayes nets; the Gibbs distribution. Computational intractability of inference problems.
Readings:
- From A Short Course on Graphical Models by Mark Paskin (Lectures 2 and 3 are especially useful).
- Graphical models, exponential families, and variational inference, by Wainwright and Jordan (a book-length survey).
- A sensitivity analysis of Pathfinder, by Kevin Ng (evaluates this seminal use of graphical models in medical diagnosis).
- Chapter 7 (Gibbs States) of Probability on Graphs, by G. Grimmett.
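To make the Gibbs distribution concrete, here is brute-force inference in a tiny Ising model (my own toy instance). Note that the enumeration is exponential in the number of spins — exactly the intractability of inference that the lecture discusses.

```python
# Exact inference in a 3-spin ferromagnetic Ising model on a triangle,
# by enumerating all 2^3 states. Graph and temperature are my choices.
import itertools
import numpy as np

edges = [(0, 1), (1, 2), (2, 0)]      # a triangle
beta = 0.7                            # inverse temperature, ferromagnetic

def energy(s):
    return -sum(s[i] * s[j] for i, j in edges)

states = list(itertools.product([-1, 1], repeat=3))
weights = np.array([np.exp(-beta * energy(s)) for s in states])
Z = weights.sum()                     # partition function
probs = weights / Z

assert abs(probs.sum() - 1.0) < 1e-12
# No external field => symmetry under global spin flip => P(s0 = +1) = 1/2.
p_plus = sum(p for s, p in zip(states, probs) if s[0] == 1)
assert abs(p_plus - 0.5) < 1e-12
# But neighboring spins are positively correlated: P(s0 = s1) > 1/2.
p_agree = sum(p for s, p in zip(states, probs) if s[0] == s[1])
assert p_agree > 0.5
print("P(s0 = s1) =", round(p_agree, 4))
```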
April 2: Introduction to NLP; Markovian models; semantic word vectors.
Readings:
- Some well-written book chapters by Mike Collins: (a) language modeling; (b) log-linear models.
- Kathy McKeown's lecture notes on hidden Markov models (also explains part-of-speech tagging and the Forward-Backward and Viterbi algorithms).
- Lecture notes by Kathy McKeown on n-grams and corpus linguistics.
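The Viterbi algorithm from the HMM notes fits in a few lines; this sketch checks the dynamic program against brute-force enumeration on a tiny HMM (the transition and emission tables are my own toy numbers, not from the notes).

```python
# Viterbi decoding for a 2-state, 3-symbol HMM, verified by brute force.
import itertools
import numpy as np

pi = np.array([0.6, 0.4])                         # initial distribution
A = np.array([[0.7, 0.3], [0.4, 0.6]])            # transition matrix
B = np.array([[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]])  # emission matrix
obs = [0, 1, 2, 2]

def viterbi(obs):
    T, K = len(obs), len(pi)
    delta = np.zeros((T, K))          # best path probability ending in state k
    back = np.zeros((T, K), dtype=int)
    delta[0] = pi * B[:, obs[0]]
    for t in range(1, T):
        scores = delta[t-1][:, None] * A * B[:, obs[t]]
        back[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0)
    path = [int(delta[-1].argmax())]
    for t in range(T - 1, 0, -1):     # follow back-pointers
        path.append(int(back[t][path[-1]]))
    return path[::-1]

def brute_force(obs):
    def prob(states):
        p = pi[states[0]] * B[states[0], obs[0]]
        for t in range(1, len(obs)):
            p *= A[states[t-1], states[t]] * B[states[t], obs[t]]
        return p
    return list(max(itertools.product([0, 1], repeat=len(obs)), key=prob))

assert viterbi(obs) == brute_force(obs)
print("most likely state path:", viterbi(obs))
```

The Forward-Backward algorithm replaces the max in the recurrence with a sum, yielding marginals instead of the single best path.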
April 9: More NLP. Probabilistic CFGs; machine translation.
Readings:
- Book chapters by Mike Collins on the IBM translation models.
- A convex alternative to IBM Model 2, by Simion, Collins and Stein, EMNLP 2013.
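One ingredient of the IBM translation models is simple enough to run here: EM for the IBM Model 1 word-translation probabilities t(f|e), shown on a two-sentence toy parallel corpus (the corpus is my own choice; this is a standard textbook exercise, simplified by omitting the NULL word).

```python
# EM for IBM Model 1 translation probabilities on a toy corpus.
from collections import defaultdict

corpus = [(["the", "house"], ["la", "maison"]),
          (["the"], ["la"])]

e_vocab = {e for es, _ in corpus for e in es}
f_vocab = {f for _, fs in corpus for f in fs}
t = {(f, e): 1.0 / len(f_vocab) for f in f_vocab for e in e_vocab}

for _ in range(20):                      # EM iterations
    count = defaultdict(float)
    total = defaultdict(float)
    for es, fs in corpus:                # E-step: expected alignment counts
        for f in fs:
            z = sum(t[(f, e)] for e in es)
            for e in es:
                count[(f, e)] += t[(f, e)] / z
                total[e] += t[(f, e)] / z
    for (f, e), c in count.items():      # M-step: renormalize per English word
        t[(f, e)] = c / total[e]

# The shorter sentence pair disambiguates the longer one:
# "la" aligns to "the", so "maison" must align to "house".
assert t[("la", "the")] > 0.9
assert t[("maison", "house")] > 0.9
print({k: round(v, 3) for k, v in t.items()})
```

Model 1's E-step is exact because alignments factor independently per target word; the Simion–Collins–Stein paper replaces Model 2's non-convex likelihood with a convex surrogate.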
April 16: Semantics. Semantic word embeddings.
Readings:
- Semantic word embeddings, by Arora, Li, Liang, Ma, Risteski.
- Paper in Neuron 2013 showing how to use topic modeling to understand fMRI data (by Stansbury, Naselaris, Gallant).
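To make "semantic word vectors" concrete, here is the classic LSA-style construction (not the Arora et al. model): factor a word-context co-occurrence matrix with SVD and compare words by cosine similarity. The counts below are invented for illustration.

```python
# SVD embeddings from a toy co-occurrence matrix: rows are words,
# columns are contexts. Counts are invented.
import numpy as np

words = ["cat", "dog", "car"]
contexts = ["purr", "fetch", "engine"]
C = np.array([[8.0, 4.0, 0.0],        # cat and dog share contexts,
              [3.0, 9.0, 0.0],        # car does not
              [0.0, 0.0, 10.0]])

U, s, Vt = np.linalg.svd(C, full_matrices=False)
emb = U[:, :2] * s[:2]                 # 2-dimensional word embeddings

def cos(u, v):
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

cat, dog, car = emb
assert cos(cat, dog) > cos(cat, car)   # distributionally similar words are closer
print("cos(cat,dog) =", round(cos(cat, dog), 3),
      " cos(cat,car) =", round(cos(cat, car), 3))
```

Practical pipelines reweight the counts (e.g., with PMI) before the SVD; the paper above asks *why* such linear-algebraic embeddings end up with semantic structure.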
April 23: (1) SVD-based algorithms for learning HMMs and continuous-output HMMs (presentation by Alex Smith). (2) Matrix completion via convex relaxation.
Readings:
- (1) A Spectral Algorithm for Learning HMMs, by Hsu, Kakade and Zhang.
- (2) Hilbert Space Embeddings of HMMs, by Song et al.
- (3) Moitra's lecture notes on matrix completion.
- Slides by Le Song on paper (2).
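A toy matrix-completion sketch (my own example): on an exactly rank-1 matrix with a few hidden entries, the simple "hard-impute" iteration — alternate a truncated SVD with re-imposing the observed entries — recovers the missing values. This is a heuristic cousin of, not the same as, the convex nuclear-norm relaxation in Moitra's notes.

```python
# Hard-impute on a rank-1 matrix with three hidden entries.
# Matrix, mask, and iteration count are my choices.
import numpy as np

u = np.array([1.0, 2.0, 3.0, 4.0])
v = np.array([1.0, 1.0, 2.0, 2.0])
M = np.outer(u, v)                    # ground truth, rank 1

mask = np.ones_like(M, dtype=bool)    # True = observed
mask[0, 0] = mask[1, 2] = mask[3, 3] = False

X = np.where(mask, M, 0.0)            # initialize missing entries to 0
for _ in range(500):
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    X = s[0] * np.outer(U[:, 0], Vt[0])   # project to best rank-1 matrix
    X[mask] = M[mask]                     # re-impose observed entries

assert np.allclose(X, M, atol=1e-3)
print("recovered missing entries:", X[0, 0], X[1, 2], X[3, 3])
```

With this mask the rank-1 completion is unique (each missing entry is pinned down by row/column ratios); the convex relaxation handles higher rank and incoherent sampling with provable guarantees.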
April 30: (1) Efficiently learning Ising models on arbitrary graphs (presentation by Holden Lee). (2) Energy-based modeling in ML (including RBMs, word2vec, etc.).
Readings:
- Efficiently learning Ising models on arbitrary graphs, by Guy Bresler.
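To ground "energy-based modeling," here is an RBM small enough to treat exactly (sizes and weights are my own choices): with two visible and two hidden units we can enumerate the 2^4 configurations, compute the partition function Z, and read off exact marginals — precisely the computation that is intractable at realistic scale and motivates training heuristics like contrastive divergence.

```python
# Brute-force partition function and visible marginals of a tiny RBM
# with energy E(v, h) = -(v^T W h + b^T v + c^T h). Weights are mine.
import itertools
import numpy as np

W = np.array([[1.5, -0.5],
              [-0.5, 1.5]])           # visible-hidden couplings
b = np.zeros(2)                       # visible biases
c = np.zeros(2)                       # hidden biases

def energy(v, h):
    return -(v @ W @ h + b @ v + c @ h)

configs = list(itertools.product([0, 1], repeat=2))
Z = sum(np.exp(-energy(np.array(v), np.array(h)))
        for v in configs for h in configs)

def p_v(v):                           # exact marginal of a visible pattern
    return sum(np.exp(-energy(np.array(v), np.array(h)))
               for h in configs) / Z

total = sum(p_v(v) for v in configs)
assert abs(total - 1.0) < 1e-9
# The positive diagonal couplings favor the fully-on pattern.
assert p_v((1, 1)) > p_v((1, 0))
print("P(v = (1,1)) =", round(p_v((1, 1)), 4))
```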