Exhaustive Methods

Here is a brute-force procedure for finding the best subset of m features, where m < d: Systematically loop through the subsets of m features. For each subset:
  1. Design a classifier using the features in the particular subset
  2. Use independent data to estimate its error rate
  3. Remember the subset giving the smallest error rate
While this may be feasible when d is small, there are some obvious problems with this approach: the number of subsets to examine is C(d, m) = d!/(m!(d-m)!), which grows combinatorially with d, and a separate classifier must be designed and its error estimated for every one of them.
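The three-step procedure above can be sketched as follows. This is a minimal illustration, not a prescribed implementation: it assumes a simple nearest-class-mean classifier as the "design a classifier" step and uses a separate test set as the independent data for error estimation.

```python
from itertools import combinations

def nearest_mean_error(train, test, feats):
    # Step 1: "design a classifier" -- here, a nearest-class-mean rule
    # restricted to the feature indices in feats (an assumed stand-in
    # for whatever classifier the practitioner actually uses).
    means, counts = {}, {}
    for x, y in train:
        if y not in means:
            means[y] = [0.0] * len(feats)
            counts[y] = 0
        for i, f in enumerate(feats):
            means[y][i] += x[f]
        counts[y] += 1
    for y in means:
        means[y] = [v / counts[y] for v in means[y]]
    # Step 2: estimate the error rate on independent data.
    errors = 0
    for x, y in test:
        pred = min(means, key=lambda c: sum((x[f] - means[c][i]) ** 2
                                            for i, f in enumerate(feats)))
        if pred != y:
            errors += 1
    return errors / len(test)

def best_subset(train, test, d, m):
    # Step 3: loop over all C(d, m) subsets and remember the one
    # giving the smallest estimated error rate.
    best, best_err = None, float("inf")
    for feats in combinations(range(d), m):
        err = nearest_mean_error(train, test, feats)
        if err < best_err:
            best, best_err = feats, err
    return best, best_err
```

The outer loop makes the cost explicit: `combinations(range(d), m)` enumerates all d!/(m!(d-m)!) subsets, and each iteration trains and evaluates a fresh classifier.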
