Princeton University

Computer Science 511


You can choose to do readings from either the "textbook track" or the "eclectic track" (or both).
Numbers in square brackets (e.g. [3]) under "textbook track" refer to chapters or sections of the Mohri et al. textbook.
Some of the readings are available through blackboard (after logging on, click on "reserves"). In particular, numbers in braces (e.g. {3}) under "eclectic track" refer to chapters or sections of the Kearns & Vazirani textbook ("An Introduction to Computational Learning Theory"), available on blackboard. If you are not registered for the course but want to access these readings, let me know so that we can arrange guest access to blackboard.
Some of these readings can only be accessed when using the Princeton intranet. (See this link for more details, if connecting remotely.)
Note that this schedule is continually being updated as the semester progresses, so check back often. (If I seem to be falling behind in keeping this uptodate, please send me a reminder.)
# 
Date 
Topic 
Core reading 
Other (optional) readings and links 

Textbook track  Eclectic track  
1  M 2/4  General introduction to machine learning  [1]  Valiant on "Nature's algorithms for learning and prospering in a complex world" (chapter 1 of his book, Probably Approximately Correct, available on blackboard reserves)  
2  W 2/6  Consistency model; review of probability; begin PAC model  [2.1]  for a review of probability, see (for
instance): [Appendix C] from Mohri et al. (containing more material than is actually needed in this course) OR: Appendix C.2 and C.3 in "Introduction to Algorithms" by Cormen, Leiserson, Rivest and Stein. 

3  M 2/11  PAC model; simple special cases; begin Occam's razor  [2.2]  {1}, {2.12.2}  Valiant's original "Theory of the Learnable" paper, which first got theoretical computer scientists working on machine learning 
4  W 2/13  Prove Occam's razor; the consistency model and PAC learning  {3.1}  original "Occam's Razor" paper  
5  M 2/18  Growth function; sample complexity for infinite hypothesis spaces  [3.3]  {3.5}  Vapnik and Chervonenkis's original
paper "On the Uniform Convergence of Relative Frequencies of
Events to Their Probabilities"
The classic that introduced Vapnik and Chervonenkis's work to the learning theory community: "Learnability and the VapnikChervonenkis dimension" 
6  W 2/20  VCdimension; begin Sauer's lemma  [3.4]  {3.23.4}, {3.6}  
7  M 2/25  Finish Sauer's lemma; lower bounds on sample complexity; begin inconsistent hypotheses  [2.3]  
8  W 2/27  Inconsistent hypotheses; Chernoff bounds  [D.1; D.7]  "Probability Inequalities for Sums of Bounded Random Variables" by Wassily Hoeffding (first four sections)  
9  M 3/4  McDiarmid's inequality; overfitting; Rademacher complexity  [3.13.2]  Section 3 of "Theory of Classification: A Survey of Some Recent Advances" by Boucheron, Bousquet, Lugosi  
10  W 3/6  Finish Rademacher complexity; begin boosting 
[7.1]  Chapter 1 of Boosting: Foundations and Algorithms by Schapire & Freund  
11  M 3/11  Boosting  [7.2; 7.3.1] (okay to skip or skim 7.2.2 and 7.2.3)  Sections 3.03.1 and 4.04.1 of Boosting: Foundations and Algorithms.  
12  W 3/13  Finish boosting  [7.3.27.3.3]  Sections 5.05.4.1 (except 5.2.25.2.3) of Boosting: Foundations and Algorithms.  A highlevel overview of some of the various approaches that have attempted to explain AdaBoost as a learning method: "Explaining AdaBoost" [by Schapire, from Empirical Inference: Festschrift in Honor of Vladimir N. Vapnik] 
13  M 3/25  Supportvector machines  [5]; [6.16.3] (okay to skip or skim the many parts of these chapters that we are not covering)  Sections 5.45.8 of The Nature of Statistical Learning Theory by Vapnik.  tutorial on SVM's 
14  W 3/27  Finish supportvector machines; online learning; learning with expert advice  [8.18.2.1]  Sections 1 and 2 of "The Weighted Majority Algorithm" by Littlestone and Warmuth  A comparison of boosting and SVM's is given in Section 5.6 of Boosting: Foundations and Algorithms 
15  M 4/1  Weighted majority algorithm  [8.2.28.2.3]  Sections 5 and 6 of "The Weighted Majority Algorithm" by Littlestone and Warmuth  A "mindreading" game based on online learning 
16  W 4/3  Perceptron algorithm; begin winnow  [8.3]  (see textbook track)  clip from documentary discussing "perceptron research from the 50's and 60's" 
17  M 4/8  winnow; regression; linear regression; begin WidrowHoff algorithm  [11.1; 11.3.1;11.3.6]  Sections 15 of "Exponentiated Gradient versus Gradient Descent for Linear Predictors" by Kivinen and Warmuth  
18  W 4/10  finish WidrowHoff; EG; mirror descent; converting online to batch  [8.4]  (see textbook track)  
19  M 4/15  modeling probability distributions; maximum likelihood; maximum entropy modeling of distributions  [12.112.7] 
The duality theorem given in class is taken
from: Stephen Della Pietra, Vincent Della Pietra and John Lafferty. 
Maxent (software and readings) for modeling species distributions is available here. 
20  W 4/17  finish maxent; begin online log loss  for a brief review of limits, continuity, etc., see (for instance): Appendix A.4 and A.5 in Boosting: Foundations and Algorithms.  
21  M 4/22  online log loss and universal compression; Bayes algorithm; begin shifting experts  Sections 9.19.3 of Prediction, Learning and Games by CesaBianchi & Lugosi  
22  W 4/24  shifting experts  Mark Herbster and Manfred K. Warmuth. Tracking the Best Expert. Machine Learning, 32(2):151178, August, 1998. 

23  M 4/29  portfolio selection  Avrim Blum and Adam Kalai. Universal Portfolios With and Without Transaction Costs. Machine Learning, 35:193205, 1999. 
For a view that says this is not the way to do
things, see this short piece (and try to guess what is weird in how he wrote
it):
Paul A. Samuelson 

24  W 5/1  game theory and learning  Yoav Freund and Robert E. Schapire. Game theory, online prediction and boosting. In Proceedings of the Ninth Annual Conference on Computational Learning Theory, pages 325332, 1996. 
For more details, see: Yoav Freund and Robert
E. Schapire.
Adaptive game playing using multiplicative weights.
Games and Economic Behavior, 29:79103, 1999. OR: Chapter 6 of Boosting: Foundations and Algorithms. 