COS 511
Theoretical Machine Learning
Roi Livni

Spring 2017

Course Summary

Can the mechanism of learning be automated and implemented by a machine?
In this course we formally define and study various models that have been proposed for learning. The course will present and contrast the statistical, computational and online models for learning. We will present and rigorously analyze some of the most successful algorithms in machine learning that are extensively used today. Likely topics include: intro to statistical learning theory and bounds on the number of random examples needed to learn; learning in adversarial settings and the on-line learning model; using convex optimization to model and solve learning problems; learning with partial observability; how to boost the accuracy of a weak learning algorithm; universal portfolio selection; unsupervised learning; spectral methods in learning.

Administrative Information

Lectures: Thu 13:30-14:50, CS Building 105; Wed 16:30-17:50, Friend Center 008.

Professor: Roi Livni, CS building 219, Reception hours: Wed 11:00-12:00, or by appointment.

Teaching Assistant: Kiran Vodrahalli, CS building 402, Reception hours: Mon 15:00-16:30, or by appointment.

Requirements: This is a graduate-level course that requires significant mathematical background.
Required background: probability, discrete math, calculus, analysis, linear algebra, algorithms and data structures, theory of computation / complexity theory
Recommended: linear programming, mathematical optimization, game theory

Attendance and the use of electronic devices: Attendance is expected at all lectures. The use of laptops and similar devices for note-taking is permitted.

Textbook and readings

1. (draft) Introduction to Online Convex Optimization, by E. Hazan, available here
2. An Introduction To Computational Learning Theory, by M.J. Kearns and U. Vazirani
3. Prediction, Learning and Games, by N. Cesa-Bianchi and G. Lugosi
4. Understanding Machine Learning: From Theory to Algorithms, by Shai Shalev-Shwartz and Shai Ben-David
5. Boosting: Foundations and Algorithms, by R. E. Schapire and Y. Freund
6. Foundations of Machine Learning, by Mehryar Mohri, Afshin Rostamizadeh, and Ameet Talwalkar

We'll post additional optional reading as the course progresses.

Grading and collaboration

Grading: homework assignments (65%) and a final take-home exam (35%). There will be 6-8 homework assignments, given roughly every week or two, each consisting of a small number of problems (some with optional programming). All assignments carry equal weight.
Additional credit is given for constructive participation in class and for solving occasional "bonus problems" presented in class. Points may be deducted for repeated unexcused absences.
Turning in assignments and late policy: Coordinate submission of assignments with the TA. Each late day reduces the attainable credit for the exercise by 10%.

Collaboration: Collaboration on the problem sets is allowed; the final exam is individual. Clearly acknowledge your collaborators: before the solution to each question, list everyone you collaborated with on that particular question.

Schedule

Lecture 1: Introduction to Learning Theory; The PAC Model. Reading: Chapters 1-5 in book 4.
Lecture 2: Agnostic PAC and ERM; Learnability of Finite Classes. Reading: Chapters 1-5 in book 4.
Lecture 3: VC Dimension; The Fundamental Theorem. Reading: Chapters 1-5 in book 4.
Lecture 4: No Free Lunch; Sauer's Lemma. Reading: Chapters 1-5 in book 4; The Probabilistic Method, N. Alon & J. Spencer. Exercise 1 assigned.
Lecture 5: Proof of the Fundamental Theorem; Neural Networks. Reading: Chapters 1-5 and 20 in book 4; The Probabilistic Method, N. Alon & J. Spencer.
Lecture 6: Neural Networks: Expressiveness and Sample Complexity. Reading: Chapter 20 in book 4; Neural Network Learning: Theoretical Foundations, M. Anthony & P. Bartlett. Exercise 2 assigned.
Lecture 7: Generalized PAC Model; Computational Learning Theory. Reading: Chapter 8 in book 4; Santosh Vempala's lecture notes on LP: The Ellipsoid Algorithm.
Lecture 8: Perceptron Algorithm; Hardness of Learning. Reading: A. Klivans and A. Sherstov (2009), "Cryptographic Hardness of Learning Intersections of Halfspaces".
Lecture 9: Boosting. Reading: Y. Freund and R. Schapire (1997), "A Decision-Theoretic Generalization of On-Line Learning and an Application to Boosting".
Lecture 10: Boosting (cont.); Convex Learning Problems. Reading: Book 5.
Lecture 11: Surrogate Loss; Rademacher Complexity. Reading: Chapter 26 in book 4. Exercise 3 assigned.
Lecture 12: Rademacher Bounds and Calculus. Reading: Chapter 26 in book 4.
Lecture 13: The Sample Complexity of Linear Classes; Stochastic Gradient Descent. Reading: Chapter 26 in book 4; S. Shalev-Shwartz, O. Shamir, N. Srebro, and K. Sridharan (2009), "Stochastic Convex Optimization".
Lecture 14: Analysis of SGD.
Lecture 15: Kernel Methods. Exercise 4 assigned.
Lecture 16: Neural Networks 2 & Backpropagation.
Lecture 17: Online Learning.
Lecture 18: Expert Advice (Hedge Algorithm); Online to Batch.
Lecture 19: Strong Convexity (Logarithmic Regret); Second-Order Methods.
Lecture 20: Online Newton Step Analysis.
Lecture 21: Follow the Regularized Leader. Exercise 5 assigned.
Lecture 22: Learning with Partial Feedback.
Lecture 23: Bandit Convex Optimization.