These notes provide background on feature selection and clustering for the new NSF-sponsored course entitled Human Computer Interface Design.
We often want to recognize patterns in the signals we get from input sensors, and other notes for this course describe some statistically based procedures for pattern classification. The standard feature-vector model for classification assumes that the designer has somehow already identified the features on which the classification will be based. The classifier then uses all of these features to assign each feature vector to a class.
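These notes leave the actual classifiers to other course notes, so the sketch below only illustrates the feature-vector model itself. As an assumption, it uses a simple nearest-mean (minimum-distance) classifier; the function names `class_means` and `classify` and the toy data are hypothetical. Each sample is a vector of feature values, and every feature contributes to the distance used to pick a class.

```python
import math
from typing import Dict, List, Sequence

def class_means(samples: Sequence[Sequence[float]],
                labels: Sequence[str]) -> Dict[str, List[float]]:
    """Average the training feature vectors of each class."""
    sums: Dict[str, List[float]] = {}
    counts: Dict[str, int] = {}
    for x, y in zip(samples, labels):
        if y not in sums:
            sums[y] = [0.0] * len(x)
            counts[y] = 0
        sums[y] = [s + xi for s, xi in zip(sums[y], x)]
        counts[y] += 1
    return {y: [s / counts[y] for s in sums[y]] for y in sums}

def classify(x: Sequence[float], means: Dict[str, List[float]]) -> str:
    """Assign x to the class whose mean is nearest (Euclidean distance over all features)."""
    return min(means, key=lambda y: math.dist(x, means[y]))

# Toy training set (hypothetical): two features per sample, two classes.
train = [[1.0, 1.0], [1.2, 0.8], [4.0, 4.2], [3.8, 4.0]]
labels = ["a", "a", "b", "b"]
means = class_means(train, labels)
print(classify([1.1, 0.9], means))  # a
print(classify([4.1, 3.9], means))  # b
```

Any classifier that maps a full feature vector to a label fits the same model; nearest-mean is used here only because it is short enough to read at a glance.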
Because good features are so specific to the problem at hand, there is no general theory for designing an effective feature set. However, there are some useful procedures for improving the performance that can be obtained from a given set of features:
- Feature selection. If there are too many features, one can speed up classification, and often improve its accuracy, by using only a small subset of the most important features.
- Clustering. If the problem has natural subcategories, one can improve accuracy by finding those clusters and classifying in two stages: first assign the input to a subcategory, then map that subcategory to its final class.
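The notes do not say here which selection procedure to use, so the following is only a sketch of one common greedy heuristic, sequential forward selection: start with no features and repeatedly add the single feature that most improves a score. As assumptions, the score is the leave-one-out accuracy of a nearest-mean classifier, and the names `loo_accuracy` and `forward_select` and the toy data are hypothetical.

```python
import math
from typing import Dict, List, Sequence

def loo_accuracy(X: Sequence[Sequence[float]], y: Sequence[str],
                 features: List[int]) -> float:
    """Leave-one-out accuracy of a nearest-mean classifier that sees only `features`."""
    correct = 0
    for i in range(len(X)):
        # Class means computed from every training sample except sample i.
        sums: Dict[str, List[float]] = {}
        counts: Dict[str, int] = {}
        for j, (xj, yj) in enumerate(zip(X, y)):
            if j == i:
                continue
            proj = [xj[f] for f in features]
            if yj not in sums:
                sums[yj] = [0.0] * len(features)
                counts[yj] = 0
            sums[yj] = [s + v for s, v in zip(sums[yj], proj)]
            counts[yj] += 1
        means = {c: [s / counts[c] for s in sums[c]] for c in sums}
        xi = [X[i][f] for f in features]
        predicted = min(means, key=lambda c: math.dist(xi, means[c]))
        correct += predicted == y[i]
    return correct / len(X)

def forward_select(X: Sequence[Sequence[float]], y: Sequence[str],
                   k: int) -> List[int]:
    """Greedily add the feature that most improves the score until k are chosen."""
    selected: List[int] = []
    remaining = list(range(len(X[0])))
    while remaining and len(selected) < k:
        best = max(remaining, key=lambda f: loo_accuracy(X, y, selected + [f]))
        selected.append(best)
        remaining.remove(best)
    return selected

# Toy data (hypothetical): feature 0 separates the classes; features 1 and 2 are mostly noise.
X = [[0.0, 5.0, 0.1], [0.2, -3.0, 0.4], [0.1, 2.0, 0.0],
     [3.0, -4.0, 0.2], [3.2, 4.0, 0.5], [2.9, 1.0, 0.3]]
y = ["a", "a", "a", "b", "b", "b"]
print(forward_select(X, y, 1))  # [0]
```

Forward selection evaluates far fewer subsets than exhaustive search, at the cost of possibly missing feature combinations that only help jointly.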
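The two-stage idea can be sketched as follows, under stated assumptions: subcategories are found by running plain k-means separately within each class, stage one picks the nearest subcategory centroid, and stage two reports the class that subcategory came from. The deterministic farthest-first seeding and the names `kmeans`, `train_two_stage`, and `classify_two_stage` are hypothetical choices for this illustration, not a method these notes prescribe.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]

def farthest_first(points: Sequence[Vector], k: int) -> List[Vector]:
    """Deterministic seeding: start at the first point, then add the point farthest from all seeds."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(list(far))
    return centroids

def kmeans(points: Sequence[Vector], k: int, iters: int = 20) -> List[Vector]:
    """Plain k-means (Lloyd's algorithm): alternate assignment and mean-update steps."""
    centroids = farthest_first(points, k)
    for _ in range(iters):
        clusters: List[List[Vector]] = [[] for _ in range(k)]
        for p in points:
            # Assignment step: each point joins the cluster of its nearest centroid.
            nearest = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            clusters[nearest].append(p)
        for c, members in enumerate(clusters):
            # Update step: move the centroid to its cluster mean (keep it if the cluster is empty).
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids

def train_two_stage(X: Sequence[Vector], y: Sequence[str],
                    k_per_class: int) -> List[Tuple[Vector, str]]:
    """Find subcategories within each class and remember which class each one belongs to."""
    model: List[Tuple[Vector, str]] = []
    for label in sorted(set(y)):
        pts = [x for x, yi in zip(X, y) if yi == label]
        for centroid in kmeans(pts, min(k_per_class, len(pts))):
            model.append((centroid, label))
    return model

def classify_two_stage(x: Vector, model: List[Tuple[Vector, str]]) -> str:
    # Stage 1: pick the nearest subcategory centroid.
    # Stage 2: report the class that subcategory belongs to.
    _, label = min(model, key=lambda m: math.dist(x, m[0]))
    return label

# Toy data (hypothetical): class "a" has two separated subclusters; class "b" sits between them.
X = [[0.0, 0.0], [0.5, 0.0], [0.0, 0.5],
     [10.0, 10.0], [10.5, 10.0], [10.0, 10.5],
     [5.0, 5.0], [5.5, 5.0], [5.0, 5.5]]
y = ["a"] * 6 + ["b"] * 3
model = train_two_stage(X, y, k_per_class=2)
print(classify_two_stage([0.2, 0.3], model))   # a
print(classify_two_stage([9.8, 10.1], model))  # a
print(classify_two_stage([5.2, 5.1], model))   # b
```

In this toy data the overall means of classes "a" and "b" both land at about (5.17, 5.17), so a classifier with one mean per class cannot separate them; representing "a" by its two subcategories removes that ambiguity.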