Reducing the number of features used in data classification can remove noisy or redundant features, reduce the cost of data collection, and improve the accuracy of the classifier. This important step is normally conducted before a data separation (typically a hyperplane) is found. We reverse the process: a separating hyperplane is found first, and then a revised hyperplane is calculated that has fewer features but provides the same separation or better. The method relies on recent algorithms for finding sparse solutions to linear programs. Experiments show that the number of features in the separating hyperplane can be reduced substantially, while the overall accuracy often increases and never decreases. This approach allows feature reduction to be easily incorporated into classifier decision tree construction using any algorithm for hyperplane selection. Separations that are substantially similar to the original separation but that use even fewer features can also be found. If the features have costs, the algorithm is easily extended to find separations that are at least as accurate as the original separation at much lower total cost.
Published May 2019, 19 pages
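The second stage described above can be sketched as a 1-norm minimization linear program: given any separating hyperplane, find a new weight vector of minimum L1 norm that satisfies the same separation constraints, so that weights on noisy or redundant features are driven to zero. This is a minimal illustrative sketch (not the paper's exact algorithm), assuming hard separation constraints y_i(w·x_i + b) ≥ 1 and using `scipy.optimize.linprog`; the toy data and variable names are hypothetical.

```python
import numpy as np
from scipy.optimize import linprog

# Toy linearly separable data: only feature 0 carries the class signal;
# features 1 and 2 are noise. Labels are +/-1. (Hypothetical example data.)
X = np.array([[ 2.0,  0.5, -1.0],
              [ 3.0, -1.0,  0.3],
              [ 2.5,  0.2,  0.8],
              [-2.0,  0.6, -0.5],
              [-3.0, -0.4,  1.0],
              [-2.5,  1.2,  0.1]])
y = np.array([1, 1, 1, -1, -1, -1])
n, d = X.shape

# Minimize ||w||_1 subject to y_i (w . x_i + b) >= 1 (preserve the separation).
# Split w = u - v with u, v >= 0, so the objective is linear; variables are [u, v, b].
c = np.concatenate([np.ones(2 * d), [0.0]])
A_ub = np.hstack([-y[:, None] * X,            # -y_i x_i . u
                   y[:, None] * X,            # +y_i x_i . v
                  -y[:, None].astype(float)]) # -y_i b
b_ub = -np.ones(n)                            # ... <= -1
bounds = [(0, None)] * (2 * d) + [(None, None)]

res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
w = res.x[:d] - res.x[d:2 * d]
b = res.x[-1]
print("sparse hyperplane w =", np.round(w, 4), " b =", round(b, 4))
print("features kept:", int(np.sum(np.abs(w) > 1e-6)))  # noise features get zero weight
```

The cost-weighted extension mentioned in the abstract fits the same template: replacing the all-ones objective vector with per-feature costs makes the LP trade off which features to keep against their collection cost.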