General Information | Assignments | Handouts | Interesting Links|
Professor: Sanjeev Arora - 307 CS Building - 258-3869 firstname.lastname@example.org Office hrs: Mon 3-4pm, or by appointment. (Or try dropping by any afternoon.)
Fridays 1:30pm-4:30pm. (Room 402, CS Bldg).
||More details and main readings.
|Lecture 1: The basic problem in machine learning.
Intro to themes and methodology of the rest of the course.
|Max Likelihood, Maximum
Entropy Principle, and their duality. Philosophical
musings. Bayesian Learning. Examples. The simplest
cases are NP-hard! What does it mean to have provable
Survey on max entropy by K. Conrad. (Almost everything you need to know, except the duality with MLE, which is discussed in this brief note by J.Coughlan. )
Johnson-Lindenstrauss Dimension Reduction (Old Princeton lecture notes; read Section 4).
notes on ME and MLE by S. Mahadevan.
An article that describes the philosophical difficulties of Max-Entropy principle by J. Uffink.
Andrew Moore's slides on learning Gaussian Mixtures and EM (just to get a taste of how such problems are currently solved)
|Lec 2: How
"Clustering" problems arise in learning.
A Swiss-army knife for provably solving various "clustering" and "learning" problems.
|Singular Value Decomposition and its
properties. Read this extract from
a manuscript of a forthcoming book by Hopcroft and Kannan.
Its use as a Swiss-army knife for clustering is introduced in this extract from the same book. (The version done in today's lecture.)
|The recent paper of
Kannan and Kumar that describes the more general
Swiss Army Knife.
An older paper of McSherry that shows how to learn "planted" patterns in matrices (which the above paper solves with the Swiss Knife).
Lecture notes (by D. Spielman) on eigenvalues of random matrices and an intro to McSherry's results.
Paper on algorithms for k-means clustering by Jaiswal, Kumar, Sen. (One of many papers out there.)
|Lec 3: Part 1: Ravi Kannan's guest
lecture on his Swiss-Army Knife Algorithm (SVD
followed by Lloyd's k-means algorithm). Part 2: Algorithms for
nonnegative matrix factorization (NMF). Techniques
introduced: matrix pseudoinverses, limited
exponential search, epsilon nets.
||Part 1: recent paper of
Kannan and Kumar.
Part 2: Paper of Arora, Ge, Kannan, Moitra.
|Lec 4: Part 1: Solving NMF under the
separability condition. Applications to Blind
Source Separation, Learning LDA's (linear dirichlet
allocations) and other related matrix models.
Part 2: Intro to
graphical models. Markov random fields. The two
representations and their equivalence via
Clifford-Hammersley Theorem. Examples of uses in vision,
statistical physics, neural nets, etc.
||Part 1: Manuscript by
Arora, Ge, Moitra. (Handed out in class; available by
Part 2: Chapter on Gibbs State from the online book "Probabability on Graphs" by G. Grimmett.
by Geman and Geman introducing MRF's for computer
|Mar 9: No class due to CCI
|Mar 16: Provably learning mixtures of
Gaussians. (Guest lecture by Ankur.)
||Paper by Moitra
paper (introduces the main ideas)
|Mar 30: Graphical models. The
inference problem. Variable elimination, junction tree theorem,
Examples: error-correcting codes, Discrete Boltzmann Machines (DBMs).
lecture notes on junction trees by Mark Paskin
Survey article on graphical models by Michael Jordan.
Article on inference by Jordan and Weiss.
|How sensitive are graphical
models to imprecision in the model specification? An early
analysis of Pathfinder
|Apr 6: No class due to CCI
|April 13: Part 1: (a) Who's afraid of hardness
results (in PAC learning)? (b) Sushant talks
about Introduction to
variational methods in ML. (c) Brief student
presentations about planned course projects.
by T. Jaakola on variational methods.
Class handout on PAC and possible modifications.
|April 20: Nonlinear optimization gone
wild: Autoencoding, dimension reduction, MDS
(multidimensional scaling), sparse coding, RBMs, deep
learning, etc. etc. A plethora of open
problems. (b) The issue of noise-stability. (c)
Batting around ideas on student projects.
||Freund and Haussler's paper
on training DBMs (v. cleanly written).
The autoencoding view of deep learning is explained in this paper by Ranzato et al..
the irrelevance of Kolmogorov's theorem (A note on the
difficulty of encoding arbitrary multivariate
distributions by a fixed basis of well-behaved
functions) (Girosi & Poggio)
Many many papers and slides on Dasgupta's course homepage (link below).
|April 27: (a) Problems in natural language
processing (NLP). Digram, Trigram, and Loglinear
models. Perplexity, a measure of a model's goodness.
Probabilistic Context Free Grammers. (b) Speculation about
ubiquity of loglinear notions in ML. (c) Ankur talks about
various Klivans-O'Donnel-Servedio papers on learning intersections of
halfspaces and possible relevance to things we've
been discussing, including loglinear type models. (d)
Continued discussion of student projects.
||Tutorial notes by Michael
to NLP , Loglinear
Models , and PCFGs.
Honor Code for this class
Collaborating with your classmates on assignments is encouraged (and may be essential to get the most out of the course). You must, however, list your collaborators for each problem.