Regularization

What can be done to reduce the need for data to estimate the covariance matrix? Basically, there are only a few options.
  1. Assume that the features are statistically independent. This results in a diagonal covariance matrix, C0 = diag(v1, v2, ..., vd ), where vk is the variance for Feature k. The d individual variances are much easier to estimate than the full set of d(d+1)/2 covariance terms. Unfortunately, the assumption of independence is likely to be a poor one.

  2. Assume that the covariance matrix is the same for all of the classes. This allows us to pool the data from all the classes. It is an attractive option if the number of classes is large.

  3. Regularize the estimated covariance matrix. One way to do this is to form a convex combination of the estimated covariance matrix C and C0, the estimate obtained assuming statistical independence: a C + (1 - a) C0. Here a is a parameter between zero and one that you have to adjust experimentally to get the best results. This brings us to the topic of validation.
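The third option can be sketched in a few lines of code. The following is an illustrative example (the function name and the choice of a are not from the original text): it forms the sample covariance C, the diagonal estimate C0 obtained by zeroing the off-diagonal terms, and their convex combination.

```python
import numpy as np

def regularized_covariance(X, a):
    """Shrink the sample covariance toward its diagonal.

    X : (n, d) data matrix, one sample per row.
    a : mixing parameter between zero and one.
    """
    C = np.cov(X, rowvar=False)     # full sample covariance estimate
    C0 = np.diag(np.diag(C))        # independence assumption: variances only
    return a * C + (1.0 - a) * C0   # convex combination a C + (1 - a) C0

# Example: with only 20 samples in 10 dimensions, the full estimate C is
# noisy; mixing in the diagonal estimate C0 keeps it better conditioned.
rng = np.random.default_rng(0)
X = rng.standard_normal((20, 10))
C_reg = regularized_covariance(X, a=0.5)
```

Note that the diagonal of the result equals the diagonal of C for any a, since both C and C0 share the same variances; only the off-diagonal covariances are shrunk toward zero.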

Back to Covariance | On to Validation | Up to Learn