Mean and Variance

Consider one feature x. Suppose that we have n examples of patterns that all belong to the same class. Let the different values for the feature x be x(1), x(2), ..., x(n).

There are two important statistics that we can use to characterize this collection of examples -- the mean m and the variance v. The mean is the arithmetic average or the center of mass:

m = [ x(1) + x(2) + ... + x(n) ] / n .

In general, if the data fall in one cluster, we expect the mean to be more or less in the center of that cluster. That is, the mean represents a typical value. The variance is a measure of the size of the cluster -- how much departure there is from the typical value. Roughly speaking, it is the average of the squared deviations from the mean. To be more precise, the conventional definition divides the sum of squared deviations by n - 1 rather than by n, which compensates for the fact that m is itself estimated from the same n examples:

v = [ ( x(1) - m )² + ( x(2) - m )² + ... + ( x(n) - m )² ] / ( n - 1 ) .

Clearly, m has the same dimensions as x, but v has those dimensions squared. The square root of the variance is the standard deviation, s, which is essentially the RMS deviation from the mean, and it has the same dimensions as x:

s = sqrt(v) .
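
As a minimal sketch of the three formulas above, the following Python computes m, v and s for a small list of feature values. The function names and the sample data are made up for illustration; the cross-check uses the sample variance from Python's standard statistics module, which also divides by n - 1.

```python
import math
import statistics


def mean(xs):
    # m = [ x(1) + x(2) + ... + x(n) ] / n
    return sum(xs) / len(xs)


def variance(xs):
    # v = sum of squared deviations from the mean, divided by (n - 1)
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two examples to compute the variance")
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / (n - 1)


def std_dev(xs):
    # s = sqrt(v), in the same units as x
    return math.sqrt(variance(xs))


# Hypothetical feature values for n = 8 examples of one class.
xs = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

m = mean(xs)
v = variance(xs)
s = std_dev(xs)

print("mean:", m)
print("variance:", v)
print("standard deviation:", s)

# Cross-check against the standard library's sample variance.
print("statistics.variance:", statistics.variance(xs))
```

For this data the squared deviations sum to 32, so v = 32 / 7 and m = 5. Dividing by n instead would give 4, which shows how much the choice of divisor matters when n is small.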

Whereas the mean measures the location of the center of the cluster, the standard deviation measures its "radius". It can be shown that if x has a Gaussian distribution, about 68% of the examples will lie within one standard deviation of the mean, and about 95% will lie within two standard deviations.
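
The Gaussian percentages can be checked numerically. For a Gaussian distribution, the probability of falling within k standard deviations of the mean is erf(k / sqrt(2)), where erf is the error function. The short sketch below evaluates that expression with Python's standard math module; the helper name fraction_within is made up for illustration.

```python
import math


def fraction_within(k):
    # Probability that a Gaussian variable lies within k standard
    # deviations of its mean: erf(k / sqrt(2)).
    return math.erf(k / math.sqrt(2.0))


one_sigma = fraction_within(1)
two_sigma = fraction_within(2)

print("within 1 standard deviation: %.1f%%" % (100 * one_sigma))
print("within 2 standard deviations: %.1f%%" % (100 * two_sigma))
```

The values come out near 68.3% and 95.4%, matching the rounded figures quoted above.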
