COS 511 FOUNDATIONS OF MACHINE LEARNING
 Instructor: Rob Schapire
 TA: Donna Gabai
 Time: Tue/Thurs 11:00–12:20
 Place: Frick Lab room 124
General course information.
Preparing scribe notes.
Final project information.
Homeworks
Lecture notes
 Feb. 4: Introduction, basic definitions, the consistency model. [ps | pdf]
 Feb. 6: Review of probability; PAC learning model. [ps | pdf]
 Feb. 11: Examples of PAC learnable classes; sample complexity for finite hypothesis spaces; Occam's razor. [ps | pdf]
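The finite-hypothesis-space bound covered in this lecture is standard and worth recording. For a consistent learner and a finite class H, a sample of size

```latex
m \;\ge\; \frac{1}{\epsilon}\left(\ln|\mathcal{H}| + \ln\frac{1}{\delta}\right)
```

suffices to guarantee that, with probability at least 1 − δ, every hypothesis in H consistent with all m training examples has generalization error at most ε.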
 Feb. 13: General bounds on sample complexity for infinite hypothesis spaces. [ps | pdf]
 Feb. 18: Sample complexity bound in terms of the growth function; VC-dimension; Sauer's Lemma. [ps | pdf]
 Feb. 20: A lower bound on sample complexity using VC-dimension; handling inconsistent hypotheses; beginning of Chernoff bounds. [ps | pdf]
 Feb. 25: Chernoff bounds proof; McDiarmid's inequality; error bounds for inconsistent hypotheses; overfitting. [ps | pdf]
 Feb. 27: Boosting: introduction and bounding the training error. [ps | pdf]
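As an illustration of the algorithm this lecture analyzes (a minimal sketch, not from the course notes; the threshold-stump weak learner and all names here are my own), AdaBoost's reweighting and weighted-majority vote look like this:

```python
import math

def adaboost(X, y, stumps, T):
    """Minimal AdaBoost sketch: X is a list of feature vectors, y[i] in {-1,+1},
    stumps is a pool of candidate weak hypotheses (functions x -> {-1,+1})."""
    m = len(X)
    D = [1.0 / m] * m          # distribution over training examples
    ensemble = []              # list of (alpha, h) pairs
    for _ in range(T):
        # pick the stump with smallest weighted error under D
        h, eps = min(
            ((h, sum(D[i] for i in range(m) if h(X[i]) != y[i])) for h in stumps),
            key=lambda pair: pair[1])
        if eps >= 0.5:         # no weak hypothesis better than chance
            break
        eps = max(eps, 1e-12)  # avoid division by zero for a perfect stump
        alpha = 0.5 * math.log((1 - eps) / eps)
        ensemble.append((alpha, h))
        # reweight: mistakes go up, correct predictions go down; then normalize
        D = [D[i] * math.exp(-alpha * y[i] * h(X[i])) for i in range(m)]
        Z = sum(D)
        D = [d / Z for d in D]
    def H(x):                  # final weighted-majority hypothesis
        return 1 if sum(a * h(x) for a, h in ensemble) >= 0 else -1
    return H
```

The training-error bound from the lecture falls out of the product of the normalizers Z over the T rounds.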
 Mar. 4: Generalization error of boosting. [ps | pdf]
 Mar. 6: Finished generalization error of boosting; support-vector machines. [ps | pdf]
 Mar. 11: Finished support-vector machines; learning with expert advice; the halving algorithm. [ps | pdf]
 Mar. 13: Continued learning with expert advice; the weighted majority algorithm. [ps | pdf]
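The weighted majority algorithm from this lecture can be sketched in a few lines (an illustration only, not from the course notes; the function and variable names are my own):

```python
def weighted_majority(expert_preds, outcomes, beta=0.5):
    """Minimal weighted majority sketch: expert_preds[t][i] is expert i's
    {-1,+1} prediction at trial t; returns the learner's mistake count."""
    n = len(expert_preds[0])
    w = [1.0] * n                       # one weight per expert
    mistakes = 0
    for preds, y in zip(expert_preds, outcomes):
        vote = sum(wi * p for wi, p in zip(w, preds))
        guess = 1 if vote >= 0 else -1  # weighted majority vote
        if guess != y:
            mistakes += 1
        # multiply the weight of every expert that erred by beta
        w = [wi * (beta if p != y else 1.0) for wi, p in zip(w, preds)]
    return mistakes
```

With beta = 1/2 the standard bound is that the learner's mistakes are at most about 2.41(m + lg N), where m is the best expert's mistake count and N the number of experts.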
 Mar. 25: Perceptron and winnow. [ps | pdf]
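The perceptron's mistake-driven update can be sketched as follows (a minimal illustration, not from the course notes; winnow keeps the same loop but multiplies each weight by a factor on a mistake instead of adding the example):

```python
def perceptron(X, y, max_passes=100):
    """Minimal perceptron sketch: additive mistake-driven updates on
    labeled examples (x, y) with y in {-1,+1}."""
    n = len(X[0])
    w, b = [0.0] * n, 0.0
    for _ in range(max_passes):
        mistakes = 0
        for x, yi in zip(X, y):
            if yi * (sum(wi * xi for wi, xi in zip(w, x)) + b) <= 0:
                # mistake: move the separator toward the misclassified example
                w = [wi + yi * xi for wi, xi in zip(w, x)]
                b += yi
                mistakes += 1
        if mistakes == 0:      # a clean pass: the data is fit exactly
            break
    return w, b
```

On linearly separable data the classical mistake bound guarantees this loop terminates.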
 Mar. 27: Analysis of winnow; square loss; beginning linear regression. [ps | pdf]
 Apr. 1: Finish linear regression; Widrow-Hoff and other online algorithms for linear regression. [ps | pdf]
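The Widrow-Hoff (LMS) rule from this lecture is online gradient descent on the square loss; a minimal sketch (not from the course notes; names are my own):

```python
def widrow_hoff(stream, eta=0.1, n=2):
    """Minimal Widrow-Hoff (LMS) sketch: for each (x, y) in the stream,
    predict y_hat = w . x and take a gradient step on (y_hat - y)^2."""
    w = [0.0] * n
    for x, y in stream:
        y_hat = sum(wi * xi for wi, xi in zip(w, x))
        # gradient step on the square loss for this single example
        w = [wi - eta * (y_hat - y) * xi for wi, xi in zip(w, x)]
    return w
```

With a small enough learning rate eta, repeated passes over a fixed dataset drive w toward the least-squares solution.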
 Apr. 3: Converting online to batch; neural networks; modeling probability distributions. [ps | pdf]
 Apr. 8: Maximum likelihood and maximum entropy modeling of distributions. [ps | pdf]
 Apr. 10: Analysis of maximum entropy; Bayes algorithm for online learning with log loss. [ps | pdf]
 Apr. 15: Bayes algorithm; switching experts; beginning investment portfolios. [ps | pdf]
 Apr. 17: Cover's universal portfolio algorithm; tips on running machine learning experiments. [ps | pdf]
 Apr. 22: Membership and equivalence queries; Angluin's algorithm for learning finite automata. [ps | pdf]
 Apr. 24: Angluin's algorithm continued; begin reinforcement learning and Markov decision processes. [ps | pdf]
 Apr. 29: Reinforcement learning. [ps | pdf]
 May 1: More reinforcement learning. [ps | pdf]
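A standard way to compute optimal values for the finite Markov decision processes discussed in these lectures is value iteration; a minimal sketch (an illustration only, not from the course notes; the data layout is my own):

```python
def value_iteration(P, R, gamma=0.9, iters=200):
    """Minimal value-iteration sketch for a finite MDP.
    P[s][a] is a list of (probability, next_state) pairs and R[s][a]
    is the expected immediate reward for taking action a in state s."""
    states = range(len(P))
    V = [0.0] * len(P)
    for _ in range(iters):
        # Bellman optimality backup for every state
        V = [max(R[s][a] + gamma * sum(p * V[s2] for p, s2 in P[s][a])
                 for a in range(len(P[s])))
             for s in states]
    return V
```

Because the backup is a contraction with factor gamma, the iterates converge to the optimal value function from any starting point.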
Additional Readings
(Related papers that go into more detail on topics discussed in class.)

Robert E. Schapire.
The boosting approach to machine learning: An
overview.
In MSRI Workshop on Nonlinear Estimation and
Classification, 2002.
Postscript or
gzipped postscript.

Robert E. Schapire, Yoav Freund, Peter Bartlett and Wee Sun Lee.
Boosting the margin: A new explanation for the
effectiveness of voting methods.
The Annals of Statistics, 26(5):1651–1686, 1998.
Postscript or
compressed postscript.

Avrim Blum.
Online algorithms in machine learning.
In Dagstuhl Workshop on On-Line Algorithms, June 1996.
Gzipped postscript.

Jyrki Kivinen and Manfred K. Warmuth.
Additive versus exponentiated gradient updates
for linear prediction.
Information and Computation, 132(1):1–64, January 1997.
Postscript.

There are a number of excellent papers and other materials on maximum entropy
here.
The duality theorem given in class is taken from:

Stephen Della Pietra, Vincent Della Pietra and John Lafferty.
Inducing features of random fields.
IEEE Transactions on Pattern Analysis and Machine
Intelligence, 19(4):380–393, April 1997.
Postscript.

Mark Herbster and Manfred K. Warmuth.
Tracking the Best Expert.
Machine Learning, 32(2):151–178, August 1998.
Postscript.

Avrim Blum and Adam Kalai.
Universal Portfolios With and Without Transaction Costs.
Machine Learning, 35:193–205, 1999.
Gzipped postscript.

Richard S. Sutton and Andrew G. Barto.
Reinforcement Learning: An Introduction.
MIT Press, 2001.
See especially Chapters 3, 4 and 6.
Books
Here is a list of optional books for further background reading.
All of these are being placed on reserve at the Engineering Library.

Luc Devroye, László Györfi and Gábor Lugosi.
A probabilistic theory of pattern recognition.
Springer, 1996.

Richard O. Duda, Peter E. Hart and David G. Stork.
Pattern classification.
Wiley, 2001.

Trevor Hastie, Robert Tibshirani and Jerome Friedman.
The elements of statistical learning: data mining,
inference, and prediction.
Springer, 2001.

Michael J. Kearns and Umesh V. Vazirani.
An introduction to computational learning
theory.
MIT Press, 1994.

Tom M. Mitchell.
Machine learning.
McGraw-Hill, 1997.

Vladimir N. Vapnik.
The nature of statistical learning theory.
Springer, 1995.

Vladimir N. Vapnik.
Statistical learning theory.
Wiley, 1998.