COS 511: Schedule & Readings

Princeton University
Computer Science Department

Computer Science 511
Theoretical Machine Learning

Rob Schapire

Spring 2014

Directory
General Information | Schedule & Readings | Assignments | Final Project | blackboard

Schedule and readings

You can choose to do readings from either the "textbook track" or the "alternative track" (or both).

Numbers in square brackets (e.g. [3]) under "textbook track" refer to chapters or sections of the Mohri et al. textbook.

Some of the readings are available through blackboard (after logging on, click on "e-reserves"). In particular, numbers in braces (e.g. {3}) under "alternative track" refer to chapters or sections of the Kearns & Vazirani textbook ("An Introduction to Computational Learning Theory"), available through e-reserves. If you are not registered for the course but want to access these readings, let me know so that we can arrange guest access to blackboard.

Some of these readings can only be accessed when using the Princeton intranet. (See this link for more details, if connecting remotely.)

Note that this schedule is continually being updated as the semester progresses, so check back often. (If I seem to be falling behind in keeping this up-to-date, please send me a reminder.)

#	Date	Topic	Core reading		Other (optional) readings and links
			Textbook track	Alternative track
1	Tu 2/4	General introduction to machine learning; consistency model	[1]
			scribe notes
2	Th 2/6	Consistency model; review of probability; begin PAC model	[2.1]		for a review of probability, see (for instance): [Appendix C] from Mohri et al. (containing more material than is actually needed in this course) OR: Appendix C.2 and C.3 in "Introduction to Algorithms" by Cormen, Leiserson, Rivest and Stein (on e-reserves).
			scribe notes
3	Tu 2/11	PAC model; simple special cases; begin Occam's razor	[2.2]	{1}, {2.1-2.2}	Valiant's original "Theory of the Learnable" paper, which first got theoretical computer scientists working on machine learning
			scribe notes
4	Th 2/13	Prove Occam's razor; begin sample complexity for infinite hypothesis spaces; growth function		{3.1}	original "Occam's Razor" paper
			scribe notes
5	Tu 2/18	sample complexity for infinite hypothesis spaces; VC-dimension	[3.3]	{3.5}	Vapnik and Chervonenkis's original paper "On the Uniform Convergence of Relative Frequencies of Events to Their Probabilities"
			scribe notes
6	Th 2/20	Sauer's lemma; upper bound on sample complexity; lower bounds on sample complexity	[3.4]	{3.2-3.4}, {3.6}
			scribe notes
7	Tu 2/25	Finish lower bound; handling inconsistent hypotheses; begin Chernoff bounds	[2.3]
			scribe notes
8	Th 2/27	Finish Chernoff bounds; McDiarmid's inequality; error bounds for inconsistent hypotheses; overfitting	[D.1-D.2]	"Probability Inequalities for Sums of Bounded Random Variables" by Wassily Hoeffding (first four sections)
			scribe notes
9	Tu 3/4	Rademacher complexity	[3.1-3.2]	Section 3 of "Theory of Classification: A Survey of Some Recent Advances" by Boucheron, Bousquet, Lugosi
			scribe notes
10	Th 3/6	Finish Rademacher complexity; begin boosting	[6.1-6.2] (okay to skip or skim 6.2.2 and 6.2.3)	Chapter 1 and Sections 3.0-3.1 of Boosting: Foundations and Algorithms by Schapire & Freund
			scribe notes
11	Tu 3/11	Boosting	[6.3.1-6.3.3]	Sections 4.0-4.1 and 5.0-5.4.1 (except 5.2.2-5.2.3) of Boosting: Foundations and Algorithms.
			slides (toy example) slides (learning curves) scribe notes
12	Th 3/13	Finish boosting
			margins movie scribe notes
13	Tu 3/25	Support-vector machines	[4]; [5.1-5.3] (okay to skip or skim the many parts of these chapters that we are not covering)	Sections 5.4-5.8 of The Nature of Statistical Learning Theory by Vapnik.	tutorial on SVM's
			scribe notes
14	Th 3/27	Finish support-vector machines; online learning; learning with expert advice	[7.1-7.2.1]	Sections 1 and 2 of "The Weighted Majority Algorithm" by Littlestone and Warmuth	"mind-reading" game based on online learning
			scribe notes
15	Tu 4/1	Weighted majority algorithm	[7.2.2-7.2.3]	Sections 5 and 6 of "The Weighted Majority Algorithm" by Littlestone and Warmuth
			scribe notes
16	Th 4/3	Perceptron algorithm; winnow	[7.3]	(see textbook track)
			scribe notes
17	Tu 4/8	regression; linear regression; begin Widrow-Hoff algorithm	[10.1; 10.3.1;10.3.6]	Sections 1-5 of "Exponentiated Gradient versus Gradient Descent for Linear Predictors" by Kivinen and Warmuth
			scribe notes
18	Th 4/10	finish Widrow-Hoff; EG; mirror descent; converting on-line to batch	[7.4]	(see textbook track)
			scribe notes
19	Tu 4/15	modeling probability distributions; maximum likelihood; maximum entropy modeling of distributions	The duality theorem given in class is taken from: Stephen Della Pietra, Vincent Della Pietra and John Lafferty. Inducing features of random fields. IEEE Transactions on Pattern Analysis and Machine Intelligence, 19(4):380-393, April, 1997. pdf scribe notes		More readings, software, etc. on maximum entropy are available here (although a bit out of date).
20	Th 4/17	finish maxent; begin on-line log loss	scribe notes
21	Tu 4/22	on-line log loss and universal compression; Bayes algorithm	scribe notes
22	Th 4/24	shifting experts	Mark Herbster and Manfred K. Warmuth. Tracking the Best Expert. Machine Learning, 32(2):151-178, August, 1998. pdf scribe notes
23	Tu 4/29	portfolio selection	Avrim Blum and Adam Kalai. Universal Portfolios With and Without Transaction Costs. Machine Learning, 35:193-205, 1999. pdf scribe notes		For a view that says this is not the way to do things, see this short piece (and try to guess what is weird in how he wrote it): Paul A. Samuelson Why we should not make mean log of wealth big though years to act are long. Journal of Banking and Finance, 3:305-307, 1979. pdf
24	Th 5/1	game theory and learning	Yoav Freund and Robert E. Schapire. Game theory, on-line prediction and boosting. In Proceedings of the Ninth Annual Conference on Computational Learning Theory, pages 325-332, 1996. pdf scribe notes		For more details, see: Yoav Freund and Robert E. Schapire. Adaptive game playing using multiplicative weights. Games and Economic Behavior, 29:79-103, 1999. pdf OR: Chapter 6 of Boosting: Foundations and Algorithms by Schapire & Freund

Princeton University Computer Science Department

Computer Science 511 Theoretical Machine Learning Rob Schapire

Spring 2014

Schedule and readings

#

Date

Topic

Core reading

Other (optional) readings and links

Princeton University
Computer Science Department

Computer Science 511
Theoretical Machine Learning

Rob Schapire