Feature Selection

Suppose that we are given a set of d features, which form the components of a feature vector x = [x1, x2, ..., xd]'. It often happens that many of these features have relatively little value in discriminating between the different classes. For example, we might have d = 100 features, but we might be able to classify x just about as well using only 5 or 6 of them. Including a lot of weak or irrelevant features not only slows down training and classification, but can also degrade classification performance. How do we find the good features?
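To get a feel for the size of this search problem, the following minimal Python sketch scores every subset of a fixed size and keeps the best one. The names best_subset and toy_score are hypothetical, and the toy criterion (features 1 and 4 are the only useful ones) is an assumption made purely for illustration; in practice the score might be something like cross-validated accuracy.

```python
from itertools import combinations
from math import comb


def best_subset(d, k, score):
    """Return the k-feature subset (a tuple of indices) with the highest score.

    `score` is a hypothetical user-supplied function that maps a tuple of
    feature indices to a number; larger means better.
    """
    return max(combinations(range(d), k), key=score)


# Toy criterion (an assumption for illustration): only features 1 and 4 help.
useful = {1, 4}


def toy_score(subset):
    # Count how many of the useful features the subset contains.
    return len(useful & set(subset))


print(best_subset(6, 2, toy_score))  # (1, 4)

# Number of candidate subsets when choosing 5 of 100 features.
print(comb(100, 5))  # 75287520

# Number of non-empty subsets of 100 features, if the size is not fixed.
print(2**100 - 1)
```

Even with the subset size fixed at 5, there are over 75 million candidates for d = 100, and each one would require training and evaluating a classifier.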

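Unfortunately, the only guaranteed solutions to this problem are exhaustive: they search through the candidate subsets of features, and a set of d features has 2^d - 1 non-empty subsets. However, there are some heuristic approaches that are often useful. We will look at the following approaches: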
