Here is a brute-force procedure for finding the best subset of m features,
where m < d:
While this may be feasible when d is small, there are some obvious problems
with this approach:
- There are very many ways to select m features out of d, namely d!/(m!(d-m)!).
For example, if d = 100 and m = 10, we must try over 10^13 subsets.
- If we repeatedly use the same test data, we may obtain features that
are well suited for that particular test set, but that are not the best
in general.
- The results will depend on m. We may also have to repeat the process
for various values of m to make an informed choice.
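
To make the size of the search concrete, here is a minimal Python sketch of the subset count and of an exhaustive search over feature subsets. It is an illustration under stated assumptions, not a prescribed implementation: the names count_subsets and best_subset are hypothetical, and score stands in for whatever performance measure is used (for example, accuracy on held-out data).

```python
from itertools import combinations
from math import comb


def count_subsets(d, m):
    """Number of ways to choose m features out of d: d!/(m!(d-m)!)."""
    return comb(d, m)


def best_subset(d, m, score):
    """Exhaustively evaluate every m-subset of the feature indices 0..d-1.

    `score` is a hypothetical callable (an assumption of this sketch) that
    maps a tuple of feature indices to a number, where higher is better.
    """
    return max(combinations(range(d), m), key=score)


if __name__ == "__main__":
    # For d = 100 and m = 10 the count exceeds 10**13.
    print(count_subsets(100, 10))
    print(count_subsets(100, 10) > 10**13)
```

Because best_subset calls score once per subset, its running time grows with d!/(m!(d-m)!), which is why the exhaustive approach is practical only for small d.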