Princeton University
Computer Science Department

Computer Science 511
Theoretical Machine Learning

Rob Schapire

Spring 2013




Schedule and readings

Numbers in brackets under "reading" refer to chapters or sections of Mohri et al.  Note that this schedule is continually being updated as the semester progresses, so check back often.  (If I seem to be falling behind in keeping this up-to-date, please send me a reminder.)

# | Date | Topic | Core reading | Other (optional) readings and links
1 Tu 2/5 General introduction to machine learning; consistency model [1]
scribe notes
2 Th 2/7 Consistency model; review of probability; begin PAC model [2.1]
scribe notes

[Appendix C], a review of probability (containing more material than is actually needed in this course)

3 Tu 2/12 PAC model; simple special cases; begin Occam's razor [2.2]
scribe notes
Valiant's original "Theory of the Learnable" paper, which first got theoretical computer scientists working on machine learning
4 Th 2/14 Prove Occam's razor; begin sample complexity for infinite hypothesis spaces; growth function (the bound is stated below)
scribe notes
original "Occam's Razor" paper
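For reference, the bound proved in this lecture for finite hypothesis spaces, in its standard form (the lecture's notation may differ): if $h \in H$ is consistent with $m$ i.i.d. training examples, then with probability at least $1-\delta$,
\[
\mathrm{err}(h) \;\le\; \frac{1}{m}\left(\ln|H| + \ln\frac{1}{\delta}\right).
\]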
5 Tu 2/19 Sample complexity for infinite hypothesis spaces; VC-dimension [3.3]
scribe notes
6 Th 2/21 Sauer's lemma; upper and lower bounds on sample complexity [3.4] (the lemma is stated below)
scribe notes
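For reference, Sauer's lemma in its standard form (notation may differ from lecture): if $H$ has VC-dimension $d$, then the growth function $\Pi_H(m)$, the maximum number of distinct labelings of $m$ points realizable by $H$, satisfies
\[
\Pi_H(m) \;\le\; \sum_{i=0}^{d}\binom{m}{i} \;\le\; \left(\frac{em}{d}\right)^{d} \quad (m \ge d).
\]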
7 Tu 2/26 Finish lower bound; handling inconsistent hypotheses; begin Chernoff bounds [2.3]
scribe notes
8 Th 2/28 Finish Chernoff bounds; McDiarmid's inequality; error bounds for inconsistent hypotheses; overfitting [D.1-D.2]
scribe notes
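The concentration inequality that drives the inconsistent-hypothesis bounds, stated here in its standard two-sided (Hoeffding) form, which may differ slightly from the version given in lecture: for i.i.d. random variables $X_1,\dots,X_m \in [0,1]$ with mean $\mu$,
\[
\Pr\!\left[\ \Bigl|\frac{1}{m}\sum_{i=1}^{m} X_i - \mu\Bigr| \ge \epsilon\ \right] \;\le\; 2e^{-2m\epsilon^2}.
\]
Combined with a union bound over a finite $H$, this gives, with probability at least $1-\delta$, simultaneously for all $h \in H$:
\[
\mathrm{err}(h) \;\le\; \widehat{\mathrm{err}}(h) + \sqrt{\frac{\ln|H| + \ln(2/\delta)}{2m}} .
\]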
9 Tu 3/5 Rademacher complexity [3.1-3.2]
scribe notes
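The central definition of this lecture, in standard notation (which may differ slightly from [3.1]): the empirical Rademacher complexity of a function class $F$ on a sample $S=(z_1,\dots,z_m)$ is
\[
\widehat{\mathfrak{R}}_S(F) \;=\; \mathbb{E}_{\boldsymbol{\sigma}}\!\left[\,\sup_{f\in F}\ \frac{1}{m}\sum_{i=1}^{m}\sigma_i f(z_i)\right],
\]
where the $\sigma_i$ are independent uniform $\pm 1$ ("Rademacher") random variables, and the Rademacher complexity is $\mathfrak{R}_m(F)=\mathbb{E}_S\bigl[\widehat{\mathfrak{R}}_S(F)\bigr]$.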
10 Th 3/7 Finish Rademacher complexity; boosting; training error bound [6.1-6.2] (okay to skip or skim 6.2.2 and 6.2.3)
slides
scribe notes
introductory chapter from Boosting: Foundations and Algorithms by Schapire & Freund
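As a companion to the boosting lectures, here is a minimal AdaBoost sketch in Python. The weak-learner interface (weak_learn, which trains on the weighted sample and returns a classifier with values in {-1, +1}) is an assumption made for illustration; it is not part of the course materials.

import math

def adaboost(examples, labels, weak_learn, T):
    """AdaBoost. labels[i] is +1 or -1.
    weak_learn(examples, labels, D) -> h with h(x) in {-1, +1}
    (an assumed interface, for illustration only)."""
    m = len(examples)
    D = [1.0 / m] * m                          # distribution over examples
    hypotheses, alphas = [], []
    for _ in range(T):
        h = weak_learn(examples, labels, D)
        # weighted error of the weak hypothesis under D
        eps = sum(D[i] for i in range(m) if h(examples[i]) != labels[i])
        eps = min(max(eps, 1e-12), 1.0 - 1e-12)   # avoid log(0) below
        alpha = 0.5 * math.log((1.0 - eps) / eps)
        hypotheses.append(h)
        alphas.append(alpha)
        # increase weight on mistakes, decrease on correct predictions
        D = [D[i] * math.exp(-alpha * labels[i] * h(examples[i]))
             for i in range(m)]
        Z = sum(D)                             # normalization factor Z_t
        D = [d / Z for d in D]
    def combined(x):
        return 1 if sum(a * h(x) for a, h in zip(alphas, hypotheses)) >= 0 else -1
    return combined

The normalization factors Z_t computed in the loop are exactly the quantities behind the training error bound covered in lecture: the training error of the combined classifier is at most the product of the Z_t's.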
11 Tu 3/12 Generalization error bounds: direct and margins-based [6.3.1-6.3.3]
slides
margins "movie"
scribe notes
12 Th 3/14 Support-vector machines [4]; [5.1-5.3] (okay to skip or skim the many parts of these chapters that we are not covering)
scribe notes
tutorial on SVMs
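For orientation, the soft-margin SVM primal problem in its standard form (the notation may differ slightly from [4] and [5.1-5.3]):
\[
\min_{\mathbf{w},\,b,\,\boldsymbol{\xi}}\ \frac{1}{2}\|\mathbf{w}\|^2 + C\sum_{i=1}^{m}\xi_i
\quad \text{subject to} \quad y_i(\mathbf{w}\cdot\mathbf{x}_i + b) \ge 1 - \xi_i,\ \ \xi_i \ge 0,\ \ i=1,\dots,m.
\]
The hard-margin version is the limit $C \to \infty$.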
13 Tu 3/26 Guest lecture: Vladimir Vapnik on "Learning with teacher using privileged information"
"A new learning paradigm: Learning using privileged information" by Vladimir Vapnik & Akshay Vashist (may need to be on the Princeton intranet to access)
slides
scribe notes
14 Th 3/28 Finish support-vector machines; online learning; learning with expert advice [7.1-7.2.1]
scribe notes
"mind-reading" game based on online learning
15 Tu 4/2 Weighted majority algorithm [7.2.2-7.2.3]
scribe notes
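A short Python sketch of the deterministic weighted majority algorithm covered in this lecture. The binary-prediction data format is an assumption made for illustration.

def weighted_majority(expert_preds, outcomes, beta=0.5):
    """Weighted majority (Littlestone & Warmuth).
    expert_preds[t][i] is expert i's {0,1} prediction in round t;
    outcomes[t] is the true {0,1} outcome. Returns the mistake count."""
    n = len(expert_preds[0])
    w = [1.0] * n                              # one weight per expert
    mistakes = 0
    for preds, y in zip(expert_preds, outcomes):
        # predict with the weighted majority vote
        vote1 = sum(w[i] for i in range(n) if preds[i] == 1)
        vote0 = sum(w[i] for i in range(n) if preds[i] == 0)
        if (1 if vote1 >= vote0 else 0) != y:
            mistakes += 1
        # multiply the weight of each mistaken expert by beta
        w = [w[i] * (beta if preds[i] != y else 1.0) for i in range(n)]
    return mistakes

The standard analysis bounds the learner's mistakes by O(log n) plus a constant times the number of mistakes of the best expert.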
16 Th 4/4 Perceptron algorithm; Winnow [7.3] (a sketch follows this entry)
scribe notes
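A minimal Python sketch of the perceptron algorithm analyzed in [7.3]; the list-of-floats data format is an illustrative assumption.

def perceptron(examples, labels, epochs=1):
    """Online perceptron. examples[i] is a list of floats, labels[i] is +1/-1.
    Returns the weight vector after the given number of passes."""
    w = [0.0] * len(examples[0])
    for _ in range(epochs):
        for x, y in zip(examples, labels):
            # update only when the current weights make a mistake
            if y * sum(wi * xi for wi, xi in zip(w, x)) <= 0:
                w = [wi + y * xi for wi, xi in zip(w, x)]
    return w

The classical mistake bound is (R/gamma)^2 when the data are linearly separable with margin gamma inside a ball of radius R.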
17 Tu 4/9 Regression; linear regression; begin Widrow-Hoff algorithm [10.1; 10.3.1; 10.3.6]
scribe notes
rest of [10]

Jyrki Kivinen and Manfred K. Warmuth.
Exponentiated Gradient versus Gradient Descent for Linear Predictors.
Information and Computation, 132(1):1-63, January 1997.

18 Th 4/11 Finish Widrow-Hoff; EG; mirror descent; converting on-line to batch [7.4] (a Widrow-Hoff sketch follows this entry)
scribe notes
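A Python sketch of the Widrow-Hoff (LMS) update, i.e., online gradient descent on the squared loss; the learning rate and the (x, y) stream format are illustrative assumptions.

def widrow_hoff(stream, eta=0.1):
    """Widrow-Hoff / LMS. stream yields (x, y) pairs with x a list of
    floats and y a float. Returns the final weight vector."""
    w = None
    for x, y in stream:
        if w is None:
            w = [0.0] * len(x)                 # initialize on first example
        yhat = sum(wi * xi for wi, xi in zip(w, x))
        # gradient of (yhat - y)^2 / 2 with respect to w is (yhat - y) * x
        w = [wi - eta * (yhat - y) * xi for wi, xi in zip(w, x)]
    return w

EG, from the Kivinen-Warmuth paper above, keeps the same gradient but applies it multiplicatively (w_i proportional to w_i * exp(-eta * (yhat - y) * x_i), renormalized to sum to 1), which is the contrast the paper's title refers to.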
19 Tu 4/16 Modeling probability distributions; maximum likelihood; maximum entropy modeling of distributions

The duality theorem given in class is taken from:

Stephen Della Pietra, Vincent Della Pietra and John Lafferty.
Inducing features of random fields.
IEEE Transactions on Pattern Analysis and Machine Intelligence, 19(4):380-393, April 1997.
pdf

scribe notes

More readings, software, etc. on maximum entropy are available here.
20 Th 4/18 Finish maxent; begin on-line log loss
scribe notes
21 Tu 4/23 On-line log loss and universal compression; Bayes algorithm (the mixture bound is stated below)
scribe notes
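The basic guarantee of the Bayes algorithm, stated here in its standard form with a uniform prior over $N$ expert distributions $p_1,\dots,p_N$ (the lecture's notation may differ): the algorithm predicts with the mixture
\[
p_{\mathrm{Bayes}}(x_1,\dots,x_T) \;=\; \frac{1}{N}\sum_{i=1}^{N} p_i(x_1,\dots,x_T),
\]
so its cumulative log loss satisfies
\[
-\ln p_{\mathrm{Bayes}}(x_1,\dots,x_T) \;\le\; \min_{i}\bigl(-\ln p_i(x_1,\dots,x_T)\bigr) + \ln N .
\]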
22 Th 4/25 Shifting experts

Mark Herbster and Manfred K. Warmuth.
Tracking the Best Expert.
Machine Learning, 32(2):151-178, August 1998.
pdf

scribe notes

23 Tu 4/30 Portfolio selection

Avrim Blum and Adam Kalai.
Universal Portfolios With and Without Transaction Costs.
Machine Learning, 35:193-205, 1999.
pdf

scribe notes

For a view that says this is not the way to do things, see this short piece (and try to guess what is unusual about how he wrote it):

Paul A. Samuelson.
Why we should not make mean log of wealth big though years to act are long.
Journal of Banking and Finance, 3:305-307, 1979.
pdf

24 Th 5/2 Game theory and learning

Yoav Freund and Robert E. Schapire.
Game theory, on-line prediction and boosting.
In Proceedings of the Ninth Annual Conference on Computational Learning Theory, pages 325-332, 1996.
pdf

scribe notes

For more details, see:

Yoav Freund and Robert E. Schapire.
Adaptive game playing using multiplicative weights.
Games and Economic Behavior, 29:79-103, 1999.
pdf