Exhaustive Methods

Here is a brute-force procedure for finding the best subset of m features out of the d available features, where m < d. Systematically loop through all subsets of m features, and for each subset:

Design a classifier using only the features in that subset
Use independent data to estimate its error rate
Remember the subset giving the smallest error rate so far
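The loop above can be sketched in Python as follows. This is a minimal illustration under assumptions that are not in the text: the classifier is a simple nearest-class-mean rule standing in for whatever classifier one would actually design, each dataset is a list of (feature vector, label) pairs, and the helper names `nearest_mean_error` and `exhaustive_select` are hypothetical. As the procedure requires, the error rate is estimated on data separate from the training data.

```python
from itertools import combinations
from statistics import fmean


def nearest_mean_error(train, test, subset):
    """Hypothetical stand-in classifier (the text does not specify one):
    nearest class mean, using only the features in `subset`.
    Trains on `train`, returns the error rate on the independent `test` set."""
    # Per-class mean of the selected features, computed from training data only.
    by_class = {}
    for x, y in train:
        by_class.setdefault(y, []).append([x[j] for j in subset])
    means = {y: [fmean(col) for col in zip(*rows)] for y, rows in by_class.items()}

    def classify(x):
        z = [x[j] for j in subset]
        # Pick the class whose mean is closest in squared Euclidean distance.
        return min(means, key=lambda y: sum((a - b) ** 2 for a, b in zip(z, means[y])))

    errors = sum(classify(x) != y for x, y in test)
    return errors / len(test)


def exhaustive_select(train, test, d, m, error_rate=nearest_mean_error):
    """Try every m-feature subset of the d features; keep the lowest-error one."""
    best_subset, best_error = None, float("inf")
    for subset in combinations(range(d), m):
        err = error_rate(train, test, subset)  # design classifier, then evaluate it
        if err < best_error:                    # remember the best subset so far
            best_subset, best_error = subset, err
    return best_subset, best_error


# Toy usage: feature 0 separates the classes; features 1 and 2 are noise.
train = [((0.0, 5.0, 1.0), "a"), ((0.2, 1.0, 4.0), "a"),
         ((1.0, 5.0, 4.0), "b"), ((1.2, 1.0, 1.0), "b")]
test = [((0.1, 1.0, 4.0), "a"), ((1.1, 5.0, 1.0), "b")]
print(exhaustive_select(train, test, d=3, m=1))  # ((0,), 0.0)
```

Any other design-and-evaluate routine with the same arguments can be passed as `error_rate`. What makes the method exhaustive is the outer loop over `itertools.combinations(range(d), m)`, which visits every m-feature subset exactly once.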

While this may be feasible when d is small, there are some obvious problems with this approach:

The number of ways to select m features out of d is d!/(m!(d-m)!), which grows very quickly. For example, if d = 100 and m = 10, we must try over 10¹³ subsets (about 1.7 × 10¹³).
If we reuse the same test data to compare every subset, we may obtain features that are well suited to that particular test set but are not the best in general.
The result depends on m, which must be fixed in advance. To make an informed choice of m, we may have to repeat the whole process for several values of m.
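The first and third problems can be quantified directly. The sketch below uses Python's `math.comb` (available since Python 3.8), which computes the binomial coefficient d!/(m!(d-m)!) exactly; the wrapper name `num_subsets` is a hypothetical helper introduced only for illustration.

```python
from math import comb


def num_subsets(d, m):
    """Number of ways to choose m features out of d: d! / (m! (d - m)!)."""
    return comb(d, m)


print(num_subsets(100, 10))  # 17310309456440, i.e. about 1.7 x 10^13

# Repeating the search for every m from 1 to d - 1 means trying
# sum over m of C(d, m) = 2**d - 2 subsets in total
# (every subset except the empty set and the full set).
d = 100
total = sum(num_subsets(d, m) for m in range(1, d))
assert total == 2**d - 2
print(f"{total:.3e}")  # 1.268e+30
```

Even at a fast evaluation rate per subset, counts of this size rule out exhaustive search for d = 100.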
