Statistical Classification Example

Solutions

```
Training Data:
        Male L1  Male P1  Male T1    Fem L1  Fem P1  Fem T1
M/F1    3.51     2.52     2.49       2.89    2.21    2.17
M/F2    3.54     2.51     2.51       2.95    2.20    2.29
M/F3    3.47     2.51     2.50       3.12    2.25    2.35
M/F4    3.51     2.51     2.49       2.75    2.07    1.99
M/F5    3.55     2.54     2.53       3.05    2.14    2.21

Means   3.516    2.518    2.504      2.952   2.174   2.202
Stdevs  0.0313   0.0130   0.0167     0.1435  0.0702  0.1375
(Stdevs are sample standard deviations, i.e. computed with n-1.)

a)  We calculate the Euclidean distance from each unknown to the male and
female mean vectors, then assign each unknown input handscan to the
class with the minimum distance (DistM3D/DistF3D use all three features).

Unknown  L1     P1     T1     DistM3D   DistF3D   Class
x1       3.30   2.70   2.20   0.41496   0.63070   Male
x2       2.90   1.99   2.20   0.86640   0.19121   Fem
x3       2.55   2.31   2.40   0.99359   0.46829   Fem

b) Your own hand scan: answers will vary; expect some misclassifications.

c) Compute the covariance matrix (male training data only, n = 5):
c(x,y) = ( (x[0]-mean_x)*(y[0]-mean_y) + ... +
           (x[n-1]-mean_x)*(y[n-1]-mean_y) ) / (n-1)

     L          P          T
L    0.000980   0.000240   0.000345
P    0.000240   0.000170   0.000160
T    0.000345   0.000160   0.000280

d) What does it tell us?  Technically, NOTHING.  There aren't
really enough training vectors to say much reliably about
the covariance.  If it did tell us something, however, it would
be that the Long finger varies the most and the Pinkie the least
(main diagonal; we already knew this from the standard deviations
calculated above).  To see how the features co-vary, we need to
divide each covariance by the product of the two features'
standard deviations, then inspect.

SD(L1) = sqrt( c(L1,L1) ) = .0313
SD(P1) = sqrt( c(P1,P1) ) = .0130
SD(T1) = sqrt( c(T1,T1) ) = .0167

Now we can find the correlation coefficient cc between each pair, defined by:

c(x,y) = cc(x,y) * SD(x) * SD(y),   i.e.   cc(x,y) = c(x,y) / ( SD(x) * SD(y) )

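cc( L1, P1 ) = 0.588
cc( L1, T1 ) = 0.659
cc( P1, T1 ) = 0.733
(computed from the unrounded SDs; dividing by the rounded values
above gives about .590, .660 and .737)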

From this, it seems that L1 and P1 are the LEAST correlated,
and P1 and T1 are the MOST correlated.

e) NOTE:  the covariance matrix and correlation coefficients were
only calculated using the male data, so they don't necessarily
tell us much about which feature to throw away.

A more direct method is "withholding": throw out each feature
one at a time and see which remaining pair still gives the
"correct" (unchanged from the 3D case) classification of x1, x2 and x3.

        DistM-LP      DistF-LP      Class
x1      0.282453536   0.630698026   Male   CORRECT
x2      0.811319912   0.191206694   Fem    CORRECT
x3      0.988139666   0.424381903   Fem    CORRECT

        DistM-LT      DistF-LT      Class
x1      0.372923585   0.348005747   Fem    WRONG!!
x2      0.6869294     0.052038447   Fem    CORRECT
x3      0.971582215   0.448116056   Fem    CORRECT

        DistM-PT      DistF-PT      Class
x1      0.354316243   0.526003802   Male   CORRECT
x2      0.609261848   0.184010869   Fem    CORRECT
x3      0.23255107    0.240208243   Male   WRONG!!

So from this experiment, we could throw away the
thumb dimension/measurement and still get the
same classification results.  Note that the covariance
matrix did tell us that the thumb and the pinkie
were most highly correlated; withholding told us
which of the two we could throw away.
```
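The minimum-distance rule in part (a) can be sketched in Python as below. This is a minimal illustration, assuming plain Euclidean distance to each class mean (the assumption that reproduces the DistM3D/DistF3D column); the names `MALE`, `FEMALE`, `UNKNOWNS`, `mean_vector` and `classify` are hypothetical. It needs Python 3.8+ for `math.dist`.

```python
import math

# Training vectors (L, P, T) copied from the training-data table.
MALE = [
    (3.51, 2.52, 2.49),
    (3.54, 2.51, 2.51),
    (3.47, 2.51, 2.50),
    (3.51, 2.51, 2.49),
    (3.55, 2.54, 2.53),
]
FEMALE = [
    (2.89, 2.21, 2.17),
    (2.95, 2.20, 2.29),
    (3.12, 2.25, 2.35),
    (2.75, 2.07, 1.99),
    (3.05, 2.14, 2.21),
]
# Unknown handscans (L, P, T) to classify.
UNKNOWNS = {"x1": (3.3, 2.7, 2.2), "x2": (2.9, 1.99, 2.2), "x3": (2.55, 2.31, 2.4)}


def mean_vector(rows):
    """Component-wise mean of a list of equal-length feature vectors."""
    n = len(rows)
    return tuple(sum(col) / n for col in zip(*rows))


def classify(x, male_mean, female_mean):
    """Return (dist_to_male, dist_to_female, label) by minimum Euclidean distance."""
    d_m = math.dist(x, male_mean)
    d_f = math.dist(x, female_mean)
    return d_m, d_f, "Male" if d_m < d_f else "Fem"


male_mean = mean_vector(MALE)      # approx. (3.516, 2.518, 2.504)
female_mean = mean_vector(FEMALE)  # approx. (2.952, 2.174, 2.202)
for name, x in UNKNOWNS.items():
    d_m, d_f, label = classify(x, male_mean, female_mean)
    print(f"{name}  {d_m:.5f}  {d_f:.5f}  {label}")
```

The printed distances may differ from the table in the last digit because of rounding; the labels do not change.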
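Parts (c) and (d) can be reproduced with the short Python sketch below. It applies the (n-1) covariance formula from part (c) to the male training vectors only and then normalizes each entry as cc(x,y) = c(x,y) / (SD(x) * SD(y)). The helper names `covariance_matrix` and `correlation_matrix` are hypothetical, and only the standard library is used.

```python
import math

FEATURES = ("L", "P", "T")
# Male training vectors (L, P, T) from the training-data table.
MALE = [
    (3.51, 2.52, 2.49),
    (3.54, 2.51, 2.51),
    (3.47, 2.51, 2.50),
    (3.51, 2.51, 2.49),
    (3.55, 2.54, 2.53),
]


def covariance_matrix(rows):
    """Sample covariance: sum of products of deviations from the means, divided by (n - 1)."""
    n = len(rows)
    cols = list(zip(*rows))
    means = [sum(c) / n for c in cols]
    k = len(cols)
    return [
        [
            sum((cols[i][r] - means[i]) * (cols[j][r] - means[j]) for r in range(n)) / (n - 1)
            for j in range(k)
        ]
        for i in range(k)
    ]


def correlation_matrix(cov):
    """Divide each covariance by the product of the two standard deviations."""
    k = len(cov)
    sd = [math.sqrt(cov[i][i]) for i in range(k)]
    return [[cov[i][j] / (sd[i] * sd[j]) for j in range(k)] for i in range(k)]


cov = covariance_matrix(MALE)
cc = correlation_matrix(cov)
for i, name in enumerate(FEATURES):
    print(name, "  ".join(f"{v:.6f}" for v in cov[i]))
for i, j in ((0, 1), (0, 2), (1, 2)):
    print(f"cc({FEATURES[i]}, {FEATURES[j]}) = {cc[i][j]:.3f}")
```

Because it divides by the unrounded standard deviations, the third decimal of each cc can differ slightly from a hand calculation that uses the rounded SDs (0.0313, 0.0130, 0.0167); the ranking of the pairs (L,P lowest, P,T highest) is the same either way.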
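The withholding experiment in part (e) amounts to rerunning the part (a) classifier on each two-feature subset and comparing every label with the full 3D result. A minimal sketch, again assuming Euclidean distance to the class means from the Means row; `project`, `label` and `withhold` are hypothetical helper names.

```python
import math
from itertools import combinations

FEATURES = ("L", "P", "T")
# Class means (L, P, T) from the "Means" row of the training-data table.
MALE_MEAN = (3.516, 2.518, 2.504)
FEMALE_MEAN = (2.952, 2.174, 2.202)
UNKNOWNS = {"x1": (3.3, 2.7, 2.2), "x2": (2.9, 1.99, 2.2), "x3": (2.55, 2.31, 2.4)}


def project(v, idx):
    """Keep only the coordinates whose indices are listed in idx."""
    return [v[i] for i in idx]


def label(x, idx):
    """Minimum-distance label using only the features in idx."""
    d_m = math.dist(project(x, idx), project(MALE_MEAN, idx))
    d_f = math.dist(project(x, idx), project(FEMALE_MEAN, idx))
    return "Male" if d_m < d_f else "Fem"


def withhold():
    """For each 2-feature subset, list the unknowns whose label differs from the 3D label."""
    full = {name: label(x, range(3)) for name, x in UNKNOWNS.items()}
    result = {}
    for idx in combinations(range(3), 2):
        subset = "".join(FEATURES[i] for i in idx)
        result[subset] = [n for n, x in UNKNOWNS.items() if label(x, idx) != full[n]]
    return result


for subset, changed in withhold().items():
    print(subset, "all unchanged" if not changed else f"changed: {changed}")
```

With these inputs only the L,P subset (thumb withheld) leaves all three labels unchanged; L,T flips x1 and P,T flips x3, which agrees with the withholding tables.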