COS 511: Schedule & Readings

Princeton University
Computer Science Department

Computer Science 511
Foundations of Machine Learning

Rob Schapire

Spring 2006

Directory
General Information | Schedule & Readings | Assignments | Final Project

Schedule and readings

Numbers in brackets under "reading" refer to chapters or sections of Kearns & Vazirani.

#	Date	Topic	Reading
1	Tu 2/7	General introduction to machine learning; consistency model	scribe notes
2	Th 2/9	Consistency model; review of probability; begin PAC model	scribe notes for a very brief review of probability, see, for instance, Appendix C.2 and C.3 in Introduction to Algorithms by Cormen, Leiserson, Rivest and Stein.
3	Tu 2/14	PAC model; simple special cases; begin Occam's razor	scribe notes [1], [2.1-2.2]
4	Th 2/16	Prove Occam's razor; begin sample complexity for infinite hypothesis spaces	scribe notes [3.1]
5	Tu 2/21	Growth function and sample complexity for infinite hypothesis spaces; beginVC-dimension; Sauer's lemma; upper bound on sample complexity	scribe notes [3.5]
6	Th 2/23	VC-dimension; Sauer's lemma; upper bound on sample complexity; lower bounds on sample complexity	scribe notes [3.2-3.4], [3.6]
7	Tu 2/28	Finish lower bound; handling inconsistent hypotheses; begin Chernoff bounds	scribe notes
8	Th 3/2	Finish Chernoff bounds; McDiarmid's inequality; error bounds for inconsistent hypotheses; overfitting; begin boosting	scribe notes
9	Tu 3/7	Boosting; training error bound; naive generalization bounds	scribe notes Robert E. Schapire. The boosting approach to machine learning: An overview. In D. D. Denison, M. H. Hansen, C. Holmes, B. Mallick, B. Yu, editors, Nonlinear Estimation and Classification. Springer, 2003. Postscript or gzipped postscript.
10	Th 3/9	Margins-based analysis of boosting	scribe notes Robert E. Schapire, Yoav Freund, Peter Bartlett and Wee Sun Lee. Boosting the margin: A new explanation for the effectiveness of voting methods. The Annals of Statistics, 26(5):1651-1686, 1998. Postscript or compressed postscript. Slides.
11	Tu 3/14	Finish boosting; begin support-vector machines	scribe notes Excerpt from Vapnik's The nature of statistical learning theory.
12	Th 3/16	Finish support-vector machines	scribe notes
13	Tu 3/28	On-line learning; learning with expert advice; weighted majority algorithm	scribe notes Avrim Blum. On-line algorithms in machine learning. In Dagstuhl Workshop on On-Line Algorithms, June, 1996. Gzipped postscript. "mind-reading" game based on on-line learning
14	Th 3/30	Finish weighted majority algorithm; perceptron algorithm	scribe notes
15	Tu 4/4	Winnow; begin regression	scribe notes
16	Th 4/6	linear regression; Widrow-Hoff algorithm	scribe notes Jyrki Kivinen and Manfred K. Warmuth. Additive versus exponentiated gradient updates for linear prediction. Information and Computation, 132(1):1-64, January, 1997. Postscript.
17	Tu 4/11	finish Widrow-Hoff; EG; Converting on-line to batch	scribe notes
	Th 4/13	(no class)
18	Tu 4/18	modeling probability distributions; maximum likelihood; maximum entropy modeling of distributions	scribe notes There are a number of excellent papers and other materials on maximum entropy here. The duality theorem given in class is taken from: Stephen Della Pietra, Vincent Della Pietra and John Lafferty. Inducing features of random fields. IEEE Transactions on Pattern Analysis and Machine Intelligence, 19(4):380-393, April, 1997. Postscript
19	Th 4/20	finish maxent; begin on-line log loss	scribe notes
20	Tu 4/25	on-line log loss and universal compression; Bayes algorithm	scribe notes
21	Th 4/27	shifting experts	scribe notes Mark Herbster and Manfred K. Warmuth. Tracking the Best Expert. Machine Learning, 32(2):151-178, August, 1998. Postscript.
22	Tu 5/2	portfolio selection	scribe notes Avrim Blum and Adam Kalai. Universal Portfolios With and Without Transaction Costs. Machine Learning, 35:193-205, 1999. Gzipped postscript. For an opposing view to this approach, see the following short paper (and try to detect what is unusual about the writing): Paul A. Samuelson Why we should not make mean log of wealth big though years to act are long Journal of Banking and Finance, 3:305-307, 1979. pdf
23	Th 5/4	game theory and learning	scribe notes Yoav Freund and Robert E. Schapire. Game theory, on-line prediction and boosting. In Proceedings of the Ninth Annual Conference on Computational Learning Theory, pages 325-332, 1996. Postscript or compressed postscript. Or, for more details, see: Yoav Freund and Robert E. Schapire. Adaptive game playing using multiplicative weights. Games and Economic Behavior, 29:79-103, 1999. Postscript or compressed postscript.

Books

Here is a list of optional books for further background reading. All of these are being placed on reserve at the Engineering Library.

Ethem Alpaydin.
Introduction to machine learning.
MIT Press, 2004.
Luc Devroye, László Györfi, Gábor Lugosi.
A probabilistic theory of pattern recognition .
Springer, 1996.
Richard O. Duda, Peter E. Hart and David G. Stork.
Pattern classification .
Wiley, 2001.
Trevor Hastie, Robert Tibshirani and Jerome Friedman.
The elements of statistical learning: data mining, inference, and prediction .
Springer, 2001.
Michael J. Kearns and Umesh V. Vazirani.
An introduction to computational learning theory.
MIT Press, 1994.
Tom M. Mitchell.
Machine learning .
McGraw-Hill, 1997.
Vladimir N. Vapnik.
The nature of statistical learning theory .
Springer, 1995.
Vladimir N. Vapnik.
Statistical learning theory .
Wiley, 1998.

Princeton University Computer Science Department

Computer Science 511 Foundations of Machine Learning Rob Schapire

Spring 2006

Schedule and readings

#

Date

Topic

Reading

Books

Princeton University
Computer Science Department

Computer Science 511
Foundations of Machine Learning

Rob Schapire