This formula conceals a trap for the unwary. Recall that x is
a d-element vector, and C is a d-by-d matrix. It turns out that if n <
d + 1, the matrix C is singular: each term in the estimate is centered by the
sample mean, so C can have rank at most n - 1. That is very bad, since we need
to invert C to form the Mahalanobis distance.
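A quick numerical check makes the singularity concrete. The sizes below are illustrative assumptions, with n chosen smaller than d + 1:

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 5, 4  # n < d + 1, so C cannot have full rank
X = rng.standard_normal((n, d))  # n examples, each a d-element vector
C = np.cov(X, rowvar=False)      # d-by-d sample covariance estimate

# Because each example is centered by the sample mean,
# the rank of C is at most n - 1, here 3 out of 5.
print(np.linalg.matrix_rank(C))
print(abs(np.linalg.det(C)))     # determinant is numerically zero
```

Since the determinant is zero, inverting C for the Mahalanobis distance fails.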
Even if n ≥ d + 1, this estimate of the true covariance matrix may be
very poor. When you think about it, C contains d² elements. Taking
into account that C has to be symmetric, we can show that C contains d(d+1)/2
independent elements. We should not expect to get a good estimate for C
until our number n of examples gets close to the number d(d+1)/2 of unknown
elements. This is not a big problem if d is small, but it is not unusual
to have 50 or 100 features. If d = 100, d(d+1)/2 is about 5000, meaning
that we need around 5000 examples!
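Counting the diagonal plus one triangle of a symmetric d-by-d matrix gives the number of independent elements. A one-line sketch of the arithmetic for d = 100:

```python
# Independent elements of a symmetric d-by-d covariance matrix:
# the d diagonal entries plus d(d-1)/2 off-diagonal entries in one
# triangle, which together equal d(d+1)/2.
d = 100
n_params = d * (d + 1) // 2
print(n_params)  # 5050, i.e. about 5000 examples needed
```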