Your hardware engineer has already designed and built a feature extractor which gives three numbers for each handscan. You've collected five training data tokens for each of the sexes:
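|Male handscan training data|L1|P1|T1|
|M1|3.51|2.52|2.49|
|M2|3.54|2.51|2.51|
|M3|3.47|2.51|2.5|
|M4|3.51|2.51|2.49|
|M5|3.55|2.54|2.53|

|Female handscan training data|L1|P1|T1|
|F1|2.89|2.21|2.17|
|F2|2.95|2.2|2.29|
|F3|3.12|2.25|2.35|
|F4|2.75|2.07|1.99|
|F5|3.05|2.14|2.21|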
a. Using a Euclidean linear statistical classifier, compute how the following input handscan feature vectors would be classified:
|Unknown input|L1|P1|T1|
|x1|3.3|2.7|2.2|
|x2|2.9|1.99|2.2|
|x3|2.55|2.31|2.4|
b. Take a ruler and place it between your middle and index fingers. Push a little firmly on the ruler to depress the skin between those fingers, then measure the length of your middle finger. That's L1 in the handscan feature set. Do the same for your pinkie (that's P1) and for your thumb (that's T1). Compute the decision the Euclidean classifier would yield for your hand measurements. Was it correct as to your sex? (Don't freak out if not; I made up the data. :)
c. Compute the covariance matrix for the Male handscan training data.
d. What does this matrix tell you about the features in that region?
e. If you were forced to pick only two features from the existing
three, which ones would you pick?
Plot the training data in 2D based on your selected two features.
Plot and label the Male and Female mean vectors.
Plot and label the unknown input data points x1, x2, and x3, and your own measurement point.
Training Data:

|Token|Male L1|Male P1|Male T1|Fem L1|Fem P1|Fem T1|
|M/F1|3.51|2.52|2.49|2.89|2.21|2.17|
|M/F2|3.54|2.51|2.51|2.95|2.2|2.29|
|M/F3|3.47|2.51|2.5|3.12|2.25|2.35|
|M/F4|3.51|2.51|2.49|2.75|2.07|1.99|
|M/F5|3.55|2.54|2.53|3.05|2.14|2.21|
|Means|3.516|2.518|2.504|2.952|2.174|2.202|
|Stdevs|0.0313|0.0130|0.0167|0.1435|0.0702|0.1375|

a) We calculate the distance from each unknown to the male and female means, then classify each unknown input handscan by minimum distance.

|Unknown Data|L1|P1|T1|DistM3D|DistF3D|Class|
|x1|3.3|2.7|2.2|0.41496|0.63070|Male|
|x2|2.9|1.99|2.2|0.86640|0.19121|Fem|
|x3|2.55|2.31|2.4|0.99359|0.46829|Fem|

b) Your finger. Some misclassifications are to be expected.

c) Compute the Covariance Matrix:

c(x,y) = ( (x[0]-mean_x)*(y[0]-mean_y) + ... + (x[n-1]-mean_x)*(y[n-1]-mean_y) ) / (n-1)

| |L|P|T|
|L|0.00098|0.00024|0.000345|
|P|0.00024|0.00017|0.00016|
|T|0.000345|0.00016|0.00028|

d) What does it tell us? Technically, NOTHING. There aren't really enough training vectors to reliably say much about the covariance. If it did tell us something, however, it would be that the Long finger varies the most and the Pinkie the least (main diagonal; we knew this anyway from the standard deviations we calculated above). To see how things co-vary, we need to divide by the individual standard deviations, then inspect.

SD(L1) = sqrt( c(L1,L1) ) = .0313
SD(P1) = sqrt( c(P1,P1) ) = .0130
SD(T1) = sqrt( c(T1,T1) ) = .0167

Now we can find the coefficient of correlation cc between each pair, defined by c(x,y) = cc * SD(x) * SD(y). Using the rounded standard deviations above:

cc( L1, P1 ) = .590
cc( L1, T1 ) = .66
cc( P1, T1 ) = .737

From this, it seems that L1 and P1 are the LEAST correlated, and P1 and T1 are the MOST correlated.

e) NOTE: the covariance matrix and correlation coefficients were calculated using only the male data, so they don't necessarily tell us much about which feature to throw away.
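The numbers in parts a), c), and d) can be cross-checked with a short script. What follows is a minimal Python sketch using only the standard library; the names `MALE`, `FEMALE`, `UNKNOWN`, `mean_vector`, `classify`, `covariance_matrix`, and `correlation` are made up for illustration and are not part of the assignment. It works from unrounded standard deviations, so the correlation coefficients it prints may differ from the hand-computed ones in the third decimal place.

```python
import math

# Training data copied from the solution table: one (L1, P1, T1) tuple per token.
MALE = [
    (3.51, 2.52, 2.49),
    (3.54, 2.51, 2.51),
    (3.47, 2.51, 2.50),
    (3.51, 2.51, 2.49),
    (3.55, 2.54, 2.53),
]
FEMALE = [
    (2.89, 2.21, 2.17),
    (2.95, 2.20, 2.29),
    (3.12, 2.25, 2.35),
    (2.75, 2.07, 1.99),
    (3.05, 2.14, 2.21),
]
UNKNOWN = {
    "x1": (3.30, 2.70, 2.20),
    "x2": (2.90, 1.99, 2.20),
    "x3": (2.55, 2.31, 2.40),
}


def mean_vector(rows):
    """Column-wise mean of a list of equal-length tuples."""
    n = len(rows)
    return tuple(sum(col) / n for col in zip(*rows))


def classify(x, mean_m, mean_f):
    """Minimum-Euclidean-distance decision between the two class means."""
    d_m = math.dist(x, mean_m)
    d_f = math.dist(x, mean_f)
    return ("Male" if d_m < d_f else "Fem"), d_m, d_f


def covariance_matrix(rows):
    """Sample covariance matrix (divides by n - 1, as in part c)."""
    n = len(rows)
    mu = mean_vector(rows)
    cols = list(zip(*rows))
    k = len(cols)
    return [
        [
            sum((a - mu[i]) * (b - mu[j]) for a, b in zip(cols[i], cols[j])) / (n - 1)
            for j in range(k)
        ]
        for i in range(k)
    ]


def correlation(cov, i, j):
    """Correlation coefficient: c(x,y) / (SD(x) * SD(y))."""
    return cov[i][j] / math.sqrt(cov[i][i] * cov[j][j])


mean_m, mean_f = mean_vector(MALE), mean_vector(FEMALE)
for name, x in UNKNOWN.items():
    label, d_m, d_f = classify(x, mean_m, mean_f)
    print(f"{name}: DistM={d_m:.5f} DistF={d_f:.5f} -> {label}")

cov = covariance_matrix(MALE)
names = ("L1", "P1", "T1")
for i, row in enumerate(cov):
    print(names[i], " ".join(f"{v:.6f}" for v in row))
for i, j in ((0, 1), (0, 2), (1, 2)):
    print(f"cc({names[i]}, {names[j]}) = {correlation(cov, i, j):.3f}")
```

To classify your own measurement from part b), add it to `UNKNOWN` as another `(L1, P1, T1)` tuple.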
To do that, the best method is "withholding": throw out each feature one at a time and see which of the remaining pairs still gives us a "correct" (unchanged from the 3D case) classification.

Withholding T (features L and P):

|Unknown|DistM-LP|DistF-LP|Class|Result|
|x1|0.282453536|0.630698026|Male|CORRECT|
|x2|0.811319912|0.191206694|Fem|CORRECT|
|x3|0.988139666|0.424381903|Fem|CORRECT|

Withholding P (features L and T):

|Unknown|DistM-LT|DistF-LT|Class|Result|
|x1|0.372923585|0.348005747|Fem|WRONG!!|
|x2|0.6869294|0.052038447|Fem|CORRECT|
|x3|0.971582215|0.448116056|Fem|CORRECT|

Withholding L (features P and T):

|Unknown|DistM-PT|DistF-PT|Class|Result|
|x1|0.354316243|0.526003802|Male|CORRECT|
|x2|0.609261848|0.184010869|Fem|CORRECT|
|x3|0.23255107|0.240208243|Male|WRONG!!|

So from this experiment, we could throw away the thumb dimension/measurement and still get the same classification results. Note that the covariance matrix did tell us that the thumb and the pinkie were the most highly correlated pair (cc = .737); withholding told us which one of them we could throw away.
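The withholding experiment can be sketched the same way. The Python below is an illustrative sketch, not the method used to produce the tables above: it assumes the class means from the training data table, and the names `FEATURES`, `MEAN_M`, `MEAN_F`, `UNKNOWN`, `classify`, `baseline`, and `results` are hypothetical. For each pair of features it checks whether all three unknowns keep their 3D classification.

```python
import math
from itertools import combinations

FEATURES = ("L", "P", "T")
# Class means taken from the "Means" row of the training data table.
MEAN_M = (3.516, 2.518, 2.504)
MEAN_F = (2.952, 2.174, 2.202)
UNKNOWN = {
    "x1": (3.30, 2.70, 2.20),
    "x2": (2.90, 1.99, 2.20),
    "x3": (2.55, 2.31, 2.40),
}


def classify(x, keep):
    """Minimum-distance decision using only the feature indices in `keep`."""
    d_m = math.sqrt(sum((x[i] - MEAN_M[i]) ** 2 for i in keep))
    d_f = math.sqrt(sum((x[i] - MEAN_F[i]) ** 2 for i in keep))
    return "Male" if d_m < d_f else "Fem"


# Reference decisions with all three features (the 3D case).
baseline = {name: classify(x, (0, 1, 2)) for name, x in UNKNOWN.items()}

# For every feature pair, record whether each unknown keeps its 3D label.
results = {}
for keep in combinations(range(3), 2):
    pair = "".join(FEATURES[i] for i in keep)
    unchanged = {name: classify(x, keep) == baseline[name] for name, x in UNKNOWN.items()}
    results[pair] = all(unchanged.values())
    print(pair, unchanged)

print("Pairs that preserve every 3D decision:", [p for p, ok in results.items() if ok])
```

With only three unknowns this is a very small test, so the outcome says which pair reproduces these particular decisions rather than which pair is best in general.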