Regularization

What can be done to reduce the need for data to estimate the covariance matrix? Basically, there are only a few options.
  1. Assume that the features are statistically independent. This results in a diagonal covariance matrix, C0 = diag(v1, v2, ..., vd ), where vk is the variance for Feature k. The d individual variances are much easier to estimate than the full set of d(d+1)/2 covariance terms. Unfortunately, the assumption of independence is likely to be a poor one.

  2. Assume that the covariance matrix is the same for all of the classes. This allows us to pool the data from all the classes. It is an attractive option if the number of classes is large.

  3. Regularize the estimated covariance matrix. One way to do this is to form a convex combination of the estimated covariance matrix C and C0, the estimate obtained assuming statistical independence: a C + (1 - a) C0. Here a is a parameter between zero and one that you have to adjust experimentally to get the best results. This brings us to the topic of validation.
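The third option can be sketched in a few lines of code. The following is an illustrative example (the function name and the choice of a are not from the original text): it forms the sample covariance C, the diagonal estimate C0 obtained by zeroing the off-diagonal terms, and their convex combination.

```python
import numpy as np

def regularized_covariance(X, a):
    """Shrink the sample covariance toward its diagonal.

    X : (n, d) data matrix, one sample per row.
    a : mixing parameter between zero and one.
    """
    C = np.cov(X, rowvar=False)     # full sample covariance estimate
    C0 = np.diag(np.diag(C))        # independence assumption: variances only
    return a * C + (1.0 - a) * C0   # convex combination a C + (1 - a) C0

# Example: with only 20 samples in 10 dimensions, the full estimate C is
# noisy; mixing in the diagonal estimate C0 keeps it better conditioned.
rng = np.random.default_rng(0)
X = rng.standard_normal((20, 10))
C_reg = regularized_covariance(X, a=0.5)
```

Note that the diagonal of the result equals the diagonal of C for any a, since both C and C0 share the same variances; only the off-diagonal covariances are shrunk toward zero.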

Back to Covariance | On to Validation | Up to Learn