Post-separation feature reduction

Chinneck, John W.

Reducing the number of features used in data classification can remove noisy or redundant features, reduce the cost of data collection, and improve the accuracy of the classifier. This important step is normally conducted before a data separation (typically a hyperplane) is found. We reverse the process: a separating hyperplane is found first, and a revised hyperplane is then calculated that uses fewer features but provides the same separation or better. The method relies on recent algorithms for finding sparse solutions to linear programs. Experiments show that the number of features in the separating hyperplane can be reduced substantially, while the overall accuracy often increases and never decreases. This approach allows feature reduction to be easily incorporated into the construction of classifier decision trees using any algorithm for hyperplane selection. Separations that are substantially similar to the original separation but that use even fewer features can also be found. If the features have costs, the algorithm is easily extended to find separations that are at least as accurate as the original separation at much lower total cost.
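The idea of reducing features after a separation is found can be illustrated with a minimal sketch. It does not reproduce the report's method, which relies on sparse solutions of linear programs; as a stand-in it uses a simple greedy elimination that zeroes a coefficient only if every point the original hyperplane classified correctly stays correctly classified. The function names `correctly_classified` and `reduce_features`, and the toy data, are hypothetical and chosen only for illustration.

```python
def correctly_classified(w, b, X, y):
    """Return the indices of points lying on the side of w.x + b = 0 that matches their label."""
    return {
        i
        for i, (x, label) in enumerate(zip(X, y))
        if label * (sum(wj * xj for wj, xj in zip(w, x)) + b) > 0
    }


def reduce_features(w, b, X, y):
    """Greedy stand-in (an assumption, not the report's LP-based algorithm):
    zero out coefficients one at a time, keeping a change only when all
    originally correct points remain correct."""
    must_keep = correctly_classified(w, b, X, y)
    w = list(w)
    # Try the smallest-magnitude coefficients first.
    for j in sorted(range(len(w)), key=lambda k: abs(w[k])):
        if w[j] == 0:
            continue
        trial = w[:j] + [0.0] + w[j + 1:]
        if must_keep <= correctly_classified(trial, b, X, y):
            w = trial
    return w


# Toy data (hypothetical): labels are +1 / -1, three features per point.
X = [[2.0, 0.1, 1.0], [1.5, -0.2, 0.5], [-1.0, 0.3, -0.5], [-2.0, -0.1, -1.0]]
y = [1, 1, -1, -1]
w0, b0 = [1.0, 0.05, 0.4], 0.0  # an initial separating hyperplane

w1 = reduce_features(w0, b0, X, y)
print(w1)  # → [1.0, 0.0, 0.0]
```

On this toy data the revised hyperplane uses one feature instead of three and still classifies all four points correctly. Because the acceptance test only requires that originally correct points stay correct, the accuracy of the revised hyperplane can rise but cannot fall, which mirrors the guarantee described in the abstract.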

Published May 2019, 19 pages

Research axis

Axis 1: Data valuation for decision making

Research application

Engineering (engineering design, digital design)

Document

G1931.pdf (630 KB)

GERAD

G-2019-31
