Validation

One of the key questions about any classifier is how well it will perform, i.e., what error rate we can expect on new data.

One way to estimate the error rate is to see how well the classifier classifies the examples used to estimate the parameters. This is called testing on the training data. It is a good necessary condition -- if the classifier cannot correctly classify the examples used to train it, it is definitely worthless. However, it is a very poor measure of generalization ability. It inevitably promises much better performance than will be obtained with independent test data.
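To make this concrete, here is a minimal sketch of estimating the training-data (resubstitution) error rate. The dataset and classifier are illustrative choices using scikit-learn, not part of the discussion above:

    # Sketch: "testing on the training data" (resubstitution error).
    # The iris data and the nearest-neighbor classifier are arbitrary examples.
    from sklearn.datasets import load_iris
    from sklearn.neighbors import KNeighborsClassifier

    X, y = load_iris(return_X_y=True)
    clf = KNeighborsClassifier(n_neighbors=3).fit(X, y)

    # Scoring on the same examples used for training gives an
    # optimistically biased estimate of the true error rate.
    training_error = 1.0 - clf.score(X, y)
    print(f"Error rate on the training data: {training_error:.3f}")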

One simple validation procedure is to divide the available data into two disjoint subsets -- a subset used to train the classifier, and a subset used to test it. This is called the holdout method. It works reasonably well, but it often results in suboptimal performance because if you hold out enough examples to get a reliable test, you will not have enough examples left for training.
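A sketch of the holdout method, under the same illustrative assumptions as above; the 70/30 split ratio is an arbitrary choice:

    # Sketch: the holdout method with an illustrative 70/30 train/test split.
    from sklearn.datasets import load_iris
    from sklearn.model_selection import train_test_split
    from sklearn.neighbors import KNeighborsClassifier

    X, y = load_iris(return_X_y=True)

    # Hold out 30% of the examples for testing; train on the remaining 70%.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, random_state=0)

    clf = KNeighborsClassifier(n_neighbors=3).fit(X_train, y_train)

    # Error rate estimated on the independent held-out examples.
    holdout_error = 1.0 - clf.score(X_test, y_test)
    print(f"Holdout error rate: {holdout_error:.3f}")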

There are several other more sophisticated alternatives. For example, in k-fold cross-validation, the examples are divided into k subsets of equal size. (A common choice is k = 10.) The system is designed k times, each time leaving out one of the subsets from training, and using the omitted subset for testing. Although this approach is time consuming, most of the data can be used for training while still having enough independent tests to estimate the error rate.
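A sketch of 10-fold cross-validation with the same illustrative dataset and classifier; the error rate is estimated by averaging over the k omitted folds:

    # Sketch: k-fold cross-validation with k = 10.
    import numpy as np
    from sklearn.datasets import load_iris
    from sklearn.model_selection import KFold
    from sklearn.neighbors import KNeighborsClassifier

    X, y = load_iris(return_X_y=True)
    kf = KFold(n_splits=10, shuffle=True, random_state=0)

    errors = []
    for train_idx, test_idx in kf.split(X):
        # Train on k - 1 folds, test on the omitted fold.
        clf = KNeighborsClassifier(n_neighbors=3).fit(X[train_idx], y[train_idx])
        errors.append(1.0 - clf.score(X[test_idx], y[test_idx]))

    # The cross-validation estimate is the average error over the k folds.
    print(f"10-fold cross-validation error rate: {np.mean(errors):.3f}")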

In a related approach called bootstrapping, one forms the test set by randomly sampling the set of examples. See Masters for a good discussion of these alternatives.
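One common bootstrap variant is sketched below; the details (training on a sample drawn with replacement and testing on the "out-of-bag" examples that were not drawn) are an illustrative assumption, since the text does not fix them:

    # Sketch: an out-of-bag bootstrap estimate of the error rate.
    # The number of repetitions (100) is an arbitrary choice.
    import numpy as np
    from sklearn.datasets import load_iris
    from sklearn.neighbors import KNeighborsClassifier

    X, y = load_iris(return_X_y=True)
    rng = np.random.default_rng(0)
    n = len(X)

    errors = []
    for _ in range(100):
        boot = rng.integers(0, n, size=n)        # indices sampled with replacement
        oob = np.setdiff1d(np.arange(n), boot)   # out-of-bag indices used for testing
        if len(oob) == 0:
            continue
        clf = KNeighborsClassifier(n_neighbors=3).fit(X[boot], y[boot])
        errors.append(1.0 - clf.score(X[oob], y[oob]))

    print(f"Bootstrap (out-of-bag) error rate: {np.mean(errors):.3f}")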
