Princeton University
Computer Science Department

Computer Science 511
Theoretical Machine Learning

Rob Schapire

Spring 2008

 



Schedule and readings

Numbers in brackets under "Core reading" refer to chapters or sections of Kearns & Vazirani, available from the library's electronic reserves or on Blackboard. Each row lists the lecture number, date, and topic (with a link to scribe notes), followed by the core reading; other (optional) readings and links appear beneath each lecture. Note that this schedule is continually updated as the semester progresses and is therefore subject to change.

# | Date | Topic | Core reading
1 | M 2/4 | General introduction to machine learning; consistency model (scribe notes)
2 | W 2/6 | Consistency model; review of probability; begin PAC model (scribe notes)

For a very brief review of probability, see, for instance, Appendices C.2 and C.3 of Introduction to Algorithms by Cormen, Leiserson, Rivest, and Stein.

3 | M 2/11 | PAC model; simple special cases; begin Occam's razor (scribe notes) | [1], [2.1-2.2]
Valiant's original "A Theory of the Learnable" paper, which first got theoretical computer scientists working on machine learning.
4 | W 2/13 | Prove Occam's razor; begin sample complexity for infinite hypothesis spaces (scribe notes) | [3.1]
original "Occam's Razor" paper
5 | M 2/18 | Growth function and sample complexity for infinite hypothesis spaces; VC-dimension (scribe notes) | [3.5]
6 | W 2/20 | Sauer's lemma; upper bound on sample complexity; lower bounds on sample complexity (scribe notes) | [3.2-3.4], [3.6]
7 | M 2/25 | Finish lower bound; handling inconsistent hypotheses; begin Chernoff bounds (scribe notes)
8 | W 2/27 | Finish Chernoff bounds; McDiarmid's inequality; error bounds for inconsistent hypotheses; overfitting (scribe notes)
9 | M 3/3 | Boosting; training error bound (scribe notes)

Robert E. Schapire.
The boosting approach to machine learning: An overview.
In D. D. Denison, M. H. Hansen, C. Holmes, B. Mallick, and B. Yu, editors, Nonlinear Estimation and Classification. Springer, 2003.
Postscript or gzipped postscript.

Slides.

10 | W 3/5 | Generalization error bounds: naive and margins-based (scribe notes)

Robert E. Schapire, Yoav Freund, Peter Bartlett, and Wee Sun Lee.
Boosting the margin: A new explanation for the effectiveness of voting methods.
The Annals of Statistics, 26(5):1651-1686, 1998.
Postscript or compressed postscript.

11 | M 3/10 | Finish boosting; begin support-vector machines (scribe notes)
12 | W 3/12 | Support-vector machines (scribe notes)

Excerpt from Vapnik's The Nature of Statistical Learning Theory.

Chris Burges's tutorial on SVMs.
13 | M 3/24 | Finish support-vector machines; on-line learning; learning with expert advice; weighted majority algorithm (scribe notes)

Avrim Blum.
On-line algorithms in machine learning.
In Dagstuhl Workshop on On-Line Algorithms, June 1996.
Gzipped postscript.

"mind-reading" game based on on-line learning
14 | W 3/26 | Finish weighted majority algorithm; perceptron algorithm (scribe notes)
- | M 3/31 | (no class)
15 | W 4/2 | Winnow; begin regression (scribe notes)
16 | M 4/7 | Linear regression; Widrow-Hoff algorithm (scribe notes)

Jyrki Kivinen and Manfred K. Warmuth.
Exponentiated Gradient versus Gradient Descent for Linear Predictors.
Information and Computation, 132(1):1-63, January 1997.
pdf.

17 | W 4/9 | Finish Widrow-Hoff; exponentiated gradient (EG); converting on-line to batch (scribe notes)
18 | M 4/14 | Modeling probability distributions; maximum likelihood; maximum entropy modeling of distributions (scribe notes)

The duality theorem given in class is taken from:

Stephen Della Pietra, Vincent Della Pietra and John Lafferty.
Inducing features of random fields.
IEEE Transactions on Pattern Analysis and Machine Intelligence, 19(4):380-393, April 1997.
Postscript.

There are a number of excellent papers and other materials on maximum entropy here.
19 | W 4/16 | Finish maximum entropy; begin on-line log loss (scribe notes)
20 | M 4/21 | On-line log loss and universal compression; Bayes algorithm (scribe notes)
21 | W 4/23 | Shifting experts (scribe notes)

Mark Herbster and Manfred K. Warmuth.
Tracking the Best Expert.
Machine Learning, 32(2):151-178, August 1998.
pdf.

22 | M 4/28 | Portfolio selection (scribe notes)

Avrim Blum and Adam Kalai.
Universal Portfolios With and Without Transaction Costs.
Machine Learning, 35:193-205, 1999.
Gzipped postscript.

For an opposing view to this approach, see the following short paper (and try to detect what is unusual about the writing):

Paul A. Samuelson.
Why we should not make mean log of wealth big though years to act are long.
Journal of Banking and Finance, 3:305-307, 1979.
pdf.

23 | W 4/30 | Game theory and learning (scribe notes)

Yoav Freund and Robert E. Schapire.
Game theory, on-line prediction and boosting.
In Proceedings of the Ninth Annual Conference on Computational Learning Theory, pages 325-332, 1996.
Postscript or compressed postscript.

For more details, see:

Yoav Freund and Robert E. Schapire.
Adaptive game playing using multiplicative weights.
Games and Economic Behavior, 29:79-103, 1999.
Postscript or compressed postscript.

 


Books

Here is a list of optional books for further background reading. All of these are being placed on reserve at the Engineering Library.