Princeton
University

Spring
2012: COS598C


Directory
General Information  Assignments
 Handouts
 Interesting
Links
Professor: Sanjeev Arora
 307 CS Building  2583869 arora@cs.princeton.edu
Office hrs: Mon 34pm, or by appointment. (Or try dropping
by any afternoon.)
Fridays 1:30pm4:30pm. (Room 402, CS Bldg).
Lecture topics 
More details and main readings. 
Additional readings. 
Lecture 1: The basic problem in machine learning. Intro to themes and methodology of the rest of the course. 
Max Likelihood, Maximum
Entropy Principle, and their duality. Philosophical
musings. Bayesian Learning. Examples. The simplest
cases are NPhard! What does it mean to have provable
guarantees? Survey on max entropy by K. Conrad. (Almost everything you need to know, except the duality with MLE, which is discussed in this brief note by J.Coughlan. ) JohnsonLindenstrauss Dimension Reduction (Old Princeton lecture notes; read Section 4). 
Lecture
notes on ME and MLE by S. Mahadevan. An article that describes the philosophical difficulties of MaxEntropy principle by J. Uffink. Andrew Moore's slides on learning Gaussian Mixtures and EM (just to get a taste of how such problems are currently solved) 
Lec 2: How
"Clustering" problems arise in learning. A Swissarmy knife for provably solving various "clustering" and "learning" problems. 
Singular Value Decomposition and its
properties. Read this extract from
a manuscript of a forthcoming book by Hopcroft and Kannan. Its use as a Swissarmy knife for clustering is introduced in this extract from the same book. (The version done in today's lecture.) 
The recent paper of
Kannan and Kumar that describes the more general
Swiss Army Knife. An older paper of McSherry that shows how to learn "planted" patterns in matrices (which the above paper solves with the Swiss Knife). Lecture notes (by D. Spielman) on eigenvalues of random matrices and an intro to McSherry's results. Paper on algorithms for kmeans clustering by Jaiswal, Kumar, Sen. (One of many papers out there.) 
Lec 3: Part 1: Ravi Kannan's guest
lecture on his SwissArmy Knife Algorithm (SVD
followed by Lloyd's kmeans algorithm). Part 2: Algorithms for
nonnegative matrix factorization (NMF). Techniques
introduced: matrix pseudoinverses, limited
exponential search, epsilon nets. 
Part 1: recent paper of
Kannan and Kumar. Part 2: Paper of Arora, Ge, Kannan, Moitra. 

Lec 4: Part 1: Solving NMF under the
separability condition. Applications to Blind
Source Separation, Learning LDA's (linear dirichlet
allocations) and other related matrix models.
Part 2: Intro to
graphical models. Markov random fields. The two
representations and their equivalence via
CliffordHammersley Theorem. Examples of uses in vision,
statistical physics, neural nets, etc. 
Part 1: Manuscript by
Arora, Ge, Moitra. (Handed out in class; available by
email request.) Part 2: Chapter on Gibbs State from the online book "Probabability on Graphs" by G. Grimmett. 
Paper
by Geman and Geman introducing MRF's for computer
vision. 
Mar 9: No class due to CCI
group meeting. 

Mar 16: Provably learning mixtures of
Gaussians. (Guest lecture by Ankur.) 
Paper by Moitra
and Valiant. 
Shorter CACM
paper (introduces the main ideas) 
Mar 30: Graphical models. The
inference problem. Variable elimination, junction tree theorem,
message passing. Examples: errorcorrecting codes, Discrete Boltzmann Machines (DBMs). 
Brief
lecture notes on junction trees by Mark Paskin Survey article on graphical models by Michael Jordan. Article on inference by Jordan and Weiss. 
How sensitive are graphical
models to imprecision in the model specification? An early
analysis of Pathfinder 
Apr 6: No class due to CCI
group meeting. 

April 13: Part 1: (a) Who's afraid of hardness
results (in PAC learning)? (b) Sushant talks
about Introduction to
variational methods in ML. (c) Brief student
presentations about planned course projects. 
Tutorial
by T. Jaakola on variational methods. Class handout on PAC and possible modifications. 

April 20: Nonlinear optimization gone
wild: Autoencoding, dimension reduction, MDS
(multidimensional scaling), sparse coding, RBMs, deep
learning, etc. etc. A plethora of open
problems. (b) The issue of noisestability. (c)
Batting around ideas on student projects. 
Freund and Haussler's paper
on training DBMs (v. cleanly written). The autoencoding view of deep learning is explained in this paper by Ranzato et al.. 
On
the irrelevance of Kolmogorov's theorem (A note on the
difficulty of encoding arbitrary multivariate
distributions by a fixed basis of wellbehaved
functions) (Girosi & Poggio) Many many papers and slides on Dasgupta's course homepage (link below). 
April 27: (a) Problems in natural language
processing (NLP). Digram, Trigram, and Loglinear
models. Perplexity, a measure of a model's goodness.
Probabilistic Context Free Grammers. (b) Speculation about
ubiquity of loglinear notions in ML. (c) Ankur talks about
various KlivansO'DonnelServedio papers on learning intersections of
halfspaces and possible relevance to things we've
been discussing, including loglinear type models. (d)
Continued discussion of student projects. 
Tutorial notes by Michael
Collins: Introduction
to NLP , Loglinear
Models , and PCFGs. 
Honor Code for this class
Collaborating with your classmates on assignments is encouraged (and may be essential to get the most out of the course). You must, however, list your collaborators for each problem.