MIT Course 6.858/18.428: Machine Learning
This machine learning course covered the following topics:
 Formal models of machine learning
 Learning concepts from examples
 Learnable classes of concepts
 PAC learning
 VC-dimension
 Bayesian Inference
 Neural Nets
 Learning from queries
 Learning with noise
 Learning finite automata
 Hidden Markov Models
Available Lecture Notes Fall 1994
 Lecture 1:
Introduction to the course. Defining models for machine learning.
Learning conjunctions in the mistake-bounded model.
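As a concrete illustration of Lecture 1's topic, here is a minimal sketch (not taken from the notes) of the standard mistake-bounded algorithm for learning a monotone conjunction over n Boolean variables: start with the conjunction of all variables and, on each mistake, delete every variable set to 0 in that example. It makes at most n mistakes.

```python
def learn_conjunction(examples, n):
    """Learn a monotone conjunction in the mistake-bounded model.

    examples: iterable of (x, label) with x a tuple of n bits.
    Returns the final hypothesis (set of variable indices) and the
    number of mistakes made.
    """
    hypothesis = set(range(n))   # start with x_0 AND x_1 AND ... AND x_{n-1}
    mistakes = 0
    for x, label in examples:
        prediction = all(x[i] == 1 for i in hypothesis)
        if prediction != label:
            mistakes += 1
            # Mistakes can only occur on positive examples; any variable
            # that is 0 there cannot be in the target conjunction.
            hypothesis -= {i for i in hypothesis if x[i] == 0}
    return hypothesis, mistakes
```

Each mistake removes at least one variable from the hypothesis, which gives the mistake bound of n.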
 Lecture 2:
Review of the online mistake-bounded model. The halving algorithm.
Winnow algorithm for learning linearly separable Boolean functions.
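As a concrete illustration, here is a minimal sketch (not taken from the notes) of Winnow1 for learning a monotone disjunction of at most k of n variables: weights start at 1, prediction compares the weighted sum against the threshold n, a false negative doubles the active weights (promotion), and a false positive zeroes them (elimination), giving an O(k log n) mistake bound.

```python
def winnow(examples, n):
    """Winnow1 for monotone disjunctions over n Boolean variables.

    examples: iterable of (x, label) with x a tuple of n bits.
    Returns the final weight vector and the number of mistakes.
    """
    w = [1.0] * n
    theta = n                        # prediction threshold
    mistakes = 0
    for x, label in examples:
        prediction = sum(w[i] for i in range(n) if x[i] == 1) >= theta
        if prediction != label:
            mistakes += 1
            for i in range(n):
                if x[i] == 1:
                    if label:        # false negative: promote active weights
                        w[i] *= 2
                    else:            # false positive: eliminate active weights
                        w[i] = 0.0
    return w, mistakes
```

The multiplicative update is what makes the mistake bound logarithmic in n, in contrast to the Perceptron's additive update.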
 Lecture 3:
Review of the Winnow Algorithm. Perceptron Convergence Theorem.
Relationship between VC-dimension and mistake bounds.
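To make the Perceptron Convergence Theorem concrete, here is a minimal sketch (not taken from the notes) of the Perceptron update rule for linearly separable data with labels in {-1, +1}; the theorem bounds the number of updates by (R/γ)², where R bounds the example norms and γ is the separation margin.

```python
def perceptron(examples, dim, max_passes=100):
    """Run the Perceptron algorithm until no mistakes are made in a pass.

    examples: iterable of (x, y) with x a length-dim tuple and y in {-1, +1}.
    Returns the learned weight vector.
    """
    w = [0.0] * dim
    for _ in range(max_passes):
        made_mistake = False
        for x, y in examples:
            activation = sum(wi * xi for wi, xi in zip(w, x))
            if y * activation <= 0:      # mistake (or point on the boundary)
                made_mistake = True
                # additive update: move the weight vector toward y * x
                w = [wi + y * xi for wi, xi in zip(w, x)]
        if not made_mistake:             # separating hyperplane found
            break
    return w
```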
 Lecture 4:
Probably Approximately Correct (PAC) Learning. PAC learning
conjunctions.
 Lecture 5:
Intractability of learning 3-term DNF by 3-term DNF. Occam
Algorithms. Learning 3-term DNF by 3-CNF.
 Lecture 6:
Learning k-decision lists. Occam's Razor (general case). Learning
conjunctions with few literals.
 Lecture 7:
VC-dimension and PAC learning.
 Lecture 8:
PAC learnability of infinite concept classes with finite VC-dimension.
 Lecture 9:
Estimating error rates. Uniform convergence and VC-dimension.
 Lecture 10:
Lower bound on sample complexity for PAC learning. Cover's coin problem.
 Lecture 11:
Bayesian learning. Minimum description length principle.
 Lecture 12:
Definition of weak learning. Confidence boosting. Accuracy boosting.
 Lecture 13:
Finish proof that weak learnability implies strong learnability.
 Lecture 14:
Finish discussion of weak learnability. Freund's boosting method.
Introduction to neural networks.
 Lecture 15:
Neural networks and back propagation.
 Lecture 16:
Applications of neural nets. Sejnowski and Rosenberg's NETtalk system
for learning to pronounce English text. Gorman and Sejnowski's network
for learning how to classify sonar targets.
 Lecture 17:
Computational complexity of training neural nets. Training a 3-Node
Neural Network is NP-complete. Expressive power of continuous-valued
neural networks with only two hidden layers.
 Lecture 18:
VC-dimension of neural networks. Asymptotic error rates of neural
networks.
 Lecture 19:
Learning in the presence of noise. Malicious noise. Classification
noise. Minimizing disagreements to handle classification noise.
Minimizing disagreements for conjunctions is NP-hard.
 Lecture 20:
Statistical query model. Statistical query algorithm for conjunctions.
Statistical query learnability implies PAC learnability in the
presence of classification noise.
 Lecture 21:
Learning decision trees using the Fourier spectrum (in the membership
query model, with respect to the uniform distribution).
 Lecture 22:
Finish algorithm for learning decision trees.
Learning DNF with membership queries with respect to the uniform
distribution.
 Lecture 23:
Learning finite automata with membership and equivalence queries.
 Lecture 24:
Finish algorithm for learning finite automata with membership and
equivalence queries. Learning finite automata without resets using
homing sequences.
 Lecture 25:
Hidden Markov models and an application to speech recognition.
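As a concrete illustration of Lecture 25's topic, here is a minimal sketch (not taken from the notes) of the forward algorithm for a discrete HMM, which computes the probability of an observation sequence by dynamic programming over hidden states; it is the core subroutine in HMM-based speech recognizers. The parameters in the test below are purely illustrative.

```python
def forward(pi, A, B, obs):
    """Forward algorithm: P(obs) under a discrete HMM.

    pi: initial state probabilities; A[i][j]: transition probability
    from state i to state j; B[i][o]: probability state i emits symbol o;
    obs: sequence of observation symbol indices.
    """
    n = len(pi)
    # alpha[j] = P(obs so far, current state = j)
    alpha = [pi[i] * B[i][obs[0]] for i in range(n)]
    for o in obs[1:]:
        alpha = [sum(alpha[i] * A[i][j] for i in range(n)) * B[j][o]
                 for j in range(n)]
    return sum(alpha)
```

The same dynamic-programming table, with sums replaced by maxima, gives the Viterbi algorithm for the most likely state sequence.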
 Lecture 26:
Piecemeal learning of unknown environments.
Ron Rivest (rivest@theory.lcs.mit.edu)
Mona Singh (mona@cs.princeton.edu)