Princeton University
Computer Science Department

Computer Science 511
Theoretical Machine Learning

Rob Schapire

Spring 2008



Schedule and readings

Each entry below gives the lecture topic, a link to scribe notes, the core reading, and other (optional) readings and links. Numbers in brackets under "core reading" refer to chapters or sections of Kearns & Vazirani, available from the library's electronic reserves or on Blackboard. Note that this schedule is continually updated as the semester progresses, and is therefore subject to change.

1. M 2/4: General introduction to machine learning; consistency model. (scribe notes)
2. W 2/6: Consistency model; review of probability; begin PAC model. (scribe notes)

   Optional: For a very brief review of probability, see, for instance, Appendices C.2 and C.3 of Introduction to Algorithms by Cormen, Leiserson, Rivest, and Stein.

3. M 2/11: PAC model; simple special cases; begin Occam's razor. (scribe notes)

   Core reading: [1], [2.1-2.2]
   Optional: Valiant's original "Theory of the Learnable" paper, which first got theoretical computer scientists working on machine learning.
4. W 2/13: Prove Occam's razor; begin sample complexity for infinite hypothesis spaces. (scribe notes; the bound is restated below)

   Optional: The original "Occam's Razor" paper.
5. M 2/18: Growth function and sample complexity for infinite hypothesis spaces; VC-dimension. (scribe notes)
6. W 2/20: Sauer's lemma; upper bound on sample complexity; lower bounds on sample complexity. (scribe notes; the lemma is restated below)

   Core reading: [3.2-3.4], [3.6]
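For reference, the standard statement of Sauer's lemma (notation may differ slightly from lecture): if $H$ has VC-dimension $d$, then its growth function satisfies

\[ \Pi_H(m) \;\le\; \sum_{i=0}^{d} \binom{m}{i} \;\le\; \left(\frac{em}{d}\right)^{d} \quad \text{for } m \ge d, \]

so the number of dichotomies grows only polynomially in $m$ once $m$ exceeds the VC-dimension.
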
7. M 2/25: Finish lower bound; handling inconsistent hypotheses; begin Chernoff bounds. (scribe notes)
8. W 2/27: Finish Chernoff bounds; McDiarmid's inequality; error bounds for inconsistent hypotheses; overfitting. (scribe notes; the bound is restated below)
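For reference, the Hoeffding form of the Chernoff bound that drives these error bounds (standard statement): if $\hat{p}$ is the average of $m$ i.i.d. random variables taking values in $[0,1]$ with mean $p$, then

\[ \Pr[\hat{p} \ge p + \epsilon] \le e^{-2\epsilon^2 m} \quad \text{and} \quad \Pr[\hat{p} \le p - \epsilon] \le e^{-2\epsilon^2 m}. \]
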
9. M 3/3: Boosting; training error bound. (scribe notes; the bound is restated below)

   Optional: Robert E. Schapire. The boosting approach to machine learning: An overview. In D. D. Denison, M. H. Hansen, C. Holmes, B. Mallick, and B. Yu, editors, Nonlinear Estimation and Classification. Springer, 2003. (Postscript or gzipped postscript.)
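For reference, the training error bound proved in this lecture, in standard AdaBoost notation ($\epsilon_t$ is the weighted error of the $t$-th weak hypothesis and $\gamma_t = 1/2 - \epsilon_t$ its edge): the training error of the combined classifier is at most

\[ \prod_{t=1}^{T} 2\sqrt{\epsilon_t(1-\epsilon_t)} \;=\; \prod_{t=1}^{T} \sqrt{1-4\gamma_t^2} \;\le\; \exp\left(-2\sum_{t=1}^{T}\gamma_t^2\right), \]

which drops exponentially fast whenever every weak hypothesis beats random guessing by a fixed edge.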


10. W 3/5: Generalization error bounds: naive and margins-based. (scribe notes)

   Optional: Robert E. Schapire, Yoav Freund, Peter Bartlett, and Wee Sun Lee. Boosting the margin: A new explanation for the effectiveness of voting methods. The Annals of Statistics, 26(5):1651-1686, 1998. (Postscript or compressed postscript.)

11. M 3/10: Finish boosting; begin support-vector machines. (scribe notes)
12. W 3/12: Support-vector machines. (scribe notes; the basic formulation is restated below)

   Optional: Excerpt from Vapnik's The Nature of Statistical Learning Theory.
   Optional: Chris Burges's tutorial on SVMs.
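For reference, the standard hard-margin formulation (Burges's tutorial covers the soft-margin and kernel extensions): given examples $(x_i, y_i)$ with $y_i \in \{-1,+1\}$, the maximum-margin separating hyperplane solves

\[ \min_{w,b} \ \frac{1}{2}\|w\|^2 \quad \text{subject to} \quad y_i(w \cdot x_i + b) \ge 1, \quad i = 1, \dots, m. \]
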
13. M 3/24: Finish support-vector machines; on-line learning; learning with expert advice; weighted majority algorithm. (scribe notes; a sketch of the algorithm appears below)

   Optional: Avrim Blum. On-line algorithms in machine learning. In Dagstuhl Workshop on On-Line Algorithms, June 1996. (Gzipped postscript.)
   Optional: A "mind-reading" game based on on-line learning.
14. W 3/26: Finish weighted majority algorithm; perceptron algorithm. (scribe notes; the update and mistake bound are restated below)
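For reference, the perceptron's mistake-driven update and the classical (Novikoff) mistake bound, in standard notation: on a mistake on example $(x_t, y_t)$ with $y_t \in \{-1,+1\}$, update

\[ w \leftarrow w + y_t x_t. \]

If every example satisfies $\|x_t\| \le R$ and some unit vector $u$ achieves margin $y_t (u \cdot x_t) \ge \gamma > 0$ on all examples, then the total number of mistakes is at most $(R/\gamma)^2$.
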
M 3/31 (no class)
15. W 4/2: Winnow; begin regression. (scribe notes)
16. M 4/7: Linear regression; Widrow-Hoff algorithm. (scribe notes)

   Optional: Jyrki Kivinen and Manfred K. Warmuth. Exponentiated gradient versus gradient descent for linear predictors. Information and Computation, 132(1):1-63, January 1997.

17. W 4/9: Finish Widrow-Hoff; EG; converting on-line to batch. (scribe notes; both updates are restated below)
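For reference, the two update rules compared in these lectures, in the standard forms of the Kivinen-Warmuth paper above: with prediction $\hat{y}_t = w_t \cdot x_t$ and learning rate $\eta > 0$, Widrow-Hoff (gradient descent) makes the additive update

\[ w_{t+1} = w_t - \eta(\hat{y}_t - y_t)\,x_t, \]

while exponentiated gradient (EG) applies the same gradient multiplicatively, keeping the weights in the probability simplex:

\[ w_{t+1,i} = \frac{w_{t,i}\, e^{-\eta(\hat{y}_t - y_t) x_{t,i}}}{\sum_j w_{t,j}\, e^{-\eta(\hat{y}_t - y_t) x_{t,j}}}. \]
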
18. M 4/14: Modeling probability distributions; maximum likelihood; maximum entropy modeling of distributions. (scribe notes)

   The duality theorem given in class is taken from: Stephen Della Pietra, Vincent Della Pietra, and John Lafferty. Inducing features of random fields. IEEE Transactions on Pattern Analysis and Machine Intelligence, 19(4):380-393, April 1997.

   There are a number of excellent papers and other materials on maximum entropy here.
19. W 4/16: Finish maxent; begin on-line log loss. (scribe notes)
20. M 4/21: On-line log loss and universal compression; Bayes algorithm. (scribe notes; the regret bound is restated below)
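For reference, the standard log-loss guarantee behind the Bayes algorithm: predicting with a uniform-prior Bayesian mixture over $N$ experts incurs cumulative log loss at most

\[ \min_i L_i + \ln N, \]

where $L_i$ is the cumulative log loss of expert $i$; this additive $\ln N$ overhead is the connection to universal compression.
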
21. W 4/23: Shifting experts. (scribe notes)

   Optional: Mark Herbster and Manfred K. Warmuth. Tracking the best expert. Machine Learning, 32(2):151-178, August 1998.

22. M 4/28: Portfolio selection. (scribe notes)

   Optional: Avrim Blum and Adam Kalai. Universal portfolios with and without transaction costs. Machine Learning, 35:193-205, 1999. (Gzipped postscript.)

   For an opposing view of this approach, see the following short paper (and try to detect what is unusual about the writing): Paul A. Samuelson. Why we should not make mean log of wealth big though years to act are long. Journal of Banking and Finance, 3:305-307, 1979.

23. W 4/30: Game theory and learning. (scribe notes)

   Optional: Yoav Freund and Robert E. Schapire. Game theory, on-line prediction and boosting. In Proceedings of the Ninth Annual Conference on Computational Learning Theory, pages 325-332, 1996. (Postscript or compressed postscript.)

   For more details, see: Yoav Freund and Robert E. Schapire. Adaptive game playing using multiplicative weights. Games and Economic Behavior, 29:79-103, 1999. (Postscript or compressed postscript.)



Here is a list of optional books for further background reading. All of these are being placed on reserve at the Engineering Library.