Princeton University
Computer Science Department

Computer Science 511
Foundations of Machine Learning

Rob Schapire

Spring 2004



Schedule and readings

Recommended readings are listed after each lecture's topics. Numbers in brackets refer to chapters or sections of Kearns & Vazirani.



1 Tu 2/3 General introduction to machine learning; consistency model  
2 Th 2/5 Consistency model; review of probability  
3 Tu 2/10 PAC model; simple special cases; Occam's razor  [1], [2.1-2.2]
4 Th 2/12 Growth function and sample complexity for infinite hypothesis spaces [3]
5 Tu 2/17 VC-dimension; Sauer's lemma; upper bound on sample complexity
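As a companion to lecture 5, Sauer's lemma can be checked numerically. The sketch below (a standalone illustration, not course code) computes the bound Phi_d(m) = sum_{i=0}^{d} C(m, i) on the number of labelings a class of VC-dimension d can induce on m points:

```python
# Numerical check of Sauer's lemma: a class of VC-dimension d induces at
# most Phi_d(m) = sum_{i=0}^{d} C(m, i) distinct labelings of any m
# points. The specific d and m values used are arbitrary illustrations.
from math import comb

def sauer_bound(d, m):
    """Phi_d(m), the Sauer's lemma bound on the growth function."""
    return sum(comb(m, i) for i in range(d + 1))
```

For m <= d the bound equals 2^m (every labeling is possible), while for m > d it grows only polynomially in m, which is what drives the sample-complexity bounds of lectures 4-5.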
6 Th 2/19 lower bounds on sample complexity; handling inconsistent hypotheses
7 Tu 2/24 Chernoff bounds; McDiarmid's inequality; error bounds for inconsistent hypotheses; overfitting
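The Chernoff/Hoeffding bound from lecture 7 can be sanity-checked by simulation. The sketch below compares the empirical deviation rate of a sample mean against the two-sided bound 2 exp(-2 n eps^2); the simulation parameters are arbitrary:

```python
# Empirical check of the Hoeffding bound: for n i.i.d. [0,1]-valued
# samples, P(|mean - E| > eps) <= 2 exp(-2 n eps^2). Parameters below
# (n, eps, trial count, seed) are invented for illustration.
import math
import random

def hoeffding_bound(n, eps):
    """Two-sided Hoeffding bound on the deviation of a sample mean."""
    return 2 * math.exp(-2 * n * eps * eps)

def empirical_deviation_rate(n, eps, p=0.5, trials=2000, seed=0):
    """Fraction of trials where the mean of n Bernoulli(p) draws
    deviates from p by more than eps."""
    rng = random.Random(seed)
    bad = 0
    for _ in range(trials):
        mean = sum(rng.random() < p for _ in range(n)) / n
        if abs(mean - p) > eps:
            bad += 1
    return bad / trials
```

The observed rate sits well below the bound, as expected: Hoeffding holds for any bounded distribution, so it is loose for any particular one.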
8 Th 2/26 Boosting; training error bound; naive generalization bounds
Robert E. Schapire.
The boosting approach to machine learning: An overview.
In MSRI Workshop on Nonlinear Estimation and Classification, 2002.
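The training-error analysis of lecture 8 can be seen concretely in a toy implementation. The sketch below is an illustrative AdaBoost over one-dimensional threshold stumps; the data set, stump learner, and parameter choices are invented for the example and are not course code:

```python
# A toy AdaBoost with one-dimensional threshold stumps of the form
# h(x) = sign * (+1 if x >= threshold else -1).
import math

def best_stump(xs, ys, w):
    """Return (threshold, sign, weighted error) of the best stump."""
    best = (None, None, float("inf"))
    for thr in sorted(set(xs)):
        for sign in (1, -1):
            err = sum(wi for xi, yi, wi in zip(xs, ys, w)
                      if sign * (1 if xi >= thr else -1) != yi)
            if err < best[2]:
                best = (thr, sign, err)
    return best

def adaboost(xs, ys, rounds=10):
    n = len(xs)
    w = [1.0 / n] * n                       # start with uniform weights
    hyps = []                               # (threshold, sign, alpha) triples
    for _ in range(rounds):
        thr, sign, err = best_stump(xs, ys, w)
        if err >= 0.5:                      # no stump beats random guessing
            break
        err = max(err, 1e-12)               # guard against log(0)
        alpha = 0.5 * math.log((1 - err) / err)
        hyps.append((thr, sign, alpha))
        # Reweight: mistakes promoted, correct points demoted, then normalize.
        w = [wi * math.exp(-alpha * yi * sign * (1 if xi >= thr else -1))
             for xi, yi, wi in zip(xs, ys, w)]
        z = sum(w)
        w = [wi / z for wi in w]
    return hyps

def predict(hyps, x):
    """Sign of the weighted vote of all stumps."""
    vote = sum(a * s * (1 if x >= t else -1) for t, s, a in hyps)
    return 1 if vote >= 0 else -1
```

On the toy label pattern + + - - + + no single stump is consistent, yet three boosting rounds already produce a consistent weighted vote, in line with the exponential training-error bound shown in class.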
9 Tu 3/2 Margins-based analysis of boosting
Robert E. Schapire, Yoav Freund, Peter Bartlett and Wee Sun Lee.
Boosting the margin: A new explanation for the effectiveness of voting methods.
The Annals of Statistics, 26(5):1651-1686, 1998.


10 Th 3/4 Finish boosting; support-vector machines
Excerpt from Vladimir N. Vapnik, The Nature of Statistical Learning Theory.


11 Tu 3/9 Finish support-vector machines; learning with expert advice; halving algorithm
Avrim Blum.
On-line algorithms in machine learning.
In Dagstuhl Workshop on On-Line Algorithms, June, 1996.
12 Th 3/11 Weighted majority algorithm
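The weighted majority algorithm of lecture 12 is short enough to sketch in full: each expert starts with weight 1, the learner follows the weighted vote, and every expert that errs is demoted by a factor beta. The expert pool and outcome sequence in the usage note below are invented for illustration:

```python
# Deterministic weighted majority; demotion factor beta = 1/2 by default.

def weighted_majority(expert_preds, outcomes, beta=0.5):
    """expert_preds[t][i] is expert i's 0/1 prediction at time t;
    returns the number of mistakes the combined learner makes."""
    n_experts = len(expert_preds[0])
    w = [1.0] * n_experts
    mistakes = 0
    for preds, y in zip(expert_preds, outcomes):
        vote_for_1 = sum(wi for wi, p in zip(w, preds) if p == 1)
        vote_for_0 = sum(w) - vote_for_1
        guess = 1 if vote_for_1 >= vote_for_0 else 0
        if guess != y:
            mistakes += 1
        # Demote every expert that predicted incorrectly.
        w = [wi * (beta if p != y else 1.0) for wi, p in zip(w, preds)]
    return mistakes
```

With a perfect expert in the pool, the standard analysis bounds the learner's mistakes by a quantity logarithmic in the number of experts, independent of the sequence length; for example, with three experts one of whom is always right, the learner here makes only one mistake on a five-round sequence.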
13 Tu 3/23 Perceptron algorithm; winnow algorithm
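Lecture 13's perceptron rule fits in a few lines: scan the data, and on every mistake add y*x to the weight vector (and y to the bias). The AND-style data set in the usage note is an invented illustration; winnow, also covered in this lecture, differs in using multiplicative rather than additive updates:

```python
# Minimal perceptron with an explicit bias term.

def perceptron(data, epochs=10):
    """data: list of (x, y) pairs with y in {-1, +1}; returns (w, b)."""
    dim = len(data[0][0])
    w = [0.0] * dim
    b = 0.0
    for _ in range(epochs):
        for x, y in data:
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            if margin <= 0:                 # mistake (or on the boundary)
                w = [wi + y * xi for wi, xi in zip(w, x)]
                b += y
    return w, b
```

On linearly separable data the perceptron convergence theorem bounds the number of updates by (R/gamma)^2, so enough passes always yield a consistent separator.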
14 Th 3/25 Finish winnow; begin regression; linear least squares
15 Tu 3/30 Widrow-Hoff algorithm; EG (exponentiated gradient)
Jyrki Kivinen and Manfred K. Warmuth.
Additive versus exponentiated gradient updates for linear prediction.
Information and Computation, 132(1):1-64, January, 1997.
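The Widrow-Hoff (LMS) rule of lecture 15 can be sketched directly: after seeing (x, y), move w against the gradient of the squared loss, w <- w - eta*(w.x - y)*x. The target function and step size below are invented for illustration; eta must satisfy eta*||x||^2 < 2 for the update to be stable:

```python
# Online least squares via the Widrow-Hoff / LMS update.

def widrow_hoff(examples, eta=0.5, epochs=50):
    """examples: list of (x, y) pairs; returns the learned weight vector."""
    dim = len(examples[0][0])
    w = [0.0] * dim
    for _ in range(epochs):
        for x, y in examples:
            pred = sum(wi * xi for wi, xi in zip(w, x))
            w = [wi - eta * (pred - y) * xi for wi, xi in zip(w, x)]
    return w
```

Repeated passes over a consistent linear data set drive w to the exact solution; the EG algorithm of the Kivinen-Warmuth paper replaces this additive update with a multiplicative (exponentiated) one.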
16 Th 4/1 Converting on-line to batch; modeling probability distributions; maximum likelihood
17 Tu 4/6 Maximum entropy modeling of distributions
There are a number of excellent papers and other materials on maximum entropy available online. The duality theorem given in class is taken from:

Stephen Della Pietra, Vincent Della Pietra and John Lafferty.
Inducing features of random fields.
IEEE Transactions on Pattern Analysis and Machine Intelligence, 19(4):380-393, April, 1997.

18 Th 4/8 Finish analyzing maximum entropy; on-line log loss and universal compression; begin Bayes algorithm
19 Tu 4/13 Analysis of Bayes algorithm; shifting experts
Mark Herbster and Manfred K. Warmuth.
Tracking the Best Expert.
Machine Learning, 32(2):151-178, August, 1998.
20 Th 4/15 Finish shifting experts; begin portfolio selection
21 Tu 4/20 Universal portfolios algorithm; begin reinforcement learning
Avrim Blum and Adam Kalai.
Universal Portfolios With and Without Transaction Costs.
Machine Learning, 35:193-205, 1999.
22 Th 4/22 Guest Lecture: Prof. Olga Troyanskaya
Tu 4/27
Th 4/29
Reinforcement learning (value iteration, policy iteration, Q-learning, TD(lambda))
Richard S. Sutton and Andrew G. Barto.
Reinforcement Learning: An Introduction.
MIT Press, 2001.
See especially Chapters 3, 4 and 6.
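Value iteration, the first of the methods listed above, can be sketched compactly. The three-state chain MDP below is invented for illustration (moving right from state 1 into the absorbing state 2 pays reward 1, everything else pays 0); it is not an example from the course or the Sutton-Barto text:

```python
# Value iteration on a made-up three-state chain MDP.
ACTIONS = ("left", "right")

def transition(s, a):
    """Deterministic chain dynamics; returns [(prob, next_state)]."""
    if s == 2:                              # state 2 is absorbing
        return [(1.0, 2)]
    s2 = min(s + 1, 2) if a == "right" else max(s - 1, 0)
    return [(1.0, s2)]

def reward(s, a, s2):
    """Reward 1 for entering the absorbing state, 0 otherwise."""
    return 1.0 if s != 2 and s2 == 2 else 0.0

def value_iteration(n_states, actions, trans, rew, gamma=0.9, tol=1e-8):
    """Iterate the Bellman optimality backup until values stop changing."""
    V = [0.0] * n_states
    while True:
        delta = 0.0
        for s in range(n_states):
            best = max(sum(p * (rew(s, a, s2) + gamma * V[s2])
                           for p, s2 in trans(s, a))
                       for a in actions)
            delta = max(delta, abs(best - V[s]))
            V[s] = best                     # in-place (Gauss-Seidel) update
        if delta < tol:
            return V
```

Here the optimal values are V(1) = 1 and V(0) = gamma * V(1) = 0.9, reflecting that state 0 must take one discounted step before collecting the reward.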



Here is a list of optional books for further background reading. All of these are being placed on reserve at the Engineering Library.