Robustness of random forests for regression

Roy, Marie-Hélène; Larocque, Denis

In this paper, we empirically investigate the robustness of random forests for regression problems. We also investigate the performance of five variations of the original random forest method, all aimed at improving robustness. All the proposed variations can be easily implemented using the R package randomForest. The first main idea behind these variations is to use the median, instead of the mean, to combine the predictions from the individual trees. The second idea is to build the trees using the ranks of the response instead of the original values. The competing methods are compared via a simulation study and ten real data sets obtained from the UCI Machine Learning Repository. Our results show that the median-based random forests (using either the ranks or the original responses) offer good and stable performance on the simulated and real data sets considered and, as such, should be considered serious alternatives to the original random forest method.
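The two ideas in the abstract can be illustrated with a minimal sketch. It is written in Python with the standard library only, not with the R package randomForest used in the paper, and it fits no trees: the per-tree predictions and the responses are made-up numbers. The function names `aggregate` and `average_ranks`, and the choice of average ranks for ties, are assumptions made for illustration and are not taken from the paper.

```python
from statistics import mean, median


def aggregate(tree_predictions, how="median"):
    """Combine the per-tree predictions for one observation."""
    if how == "mean":
        return mean(tree_predictions)
    if how == "median":
        return median(tree_predictions)
    raise ValueError("how must be 'mean' or 'median'")


def average_ranks(values):
    """Return 1-based ranks; tied values share their average rank
    (the tie-handling rule is an assumption, not from the paper)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to cover the run of values tied with values[order[i]].
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


# Hypothetical predictions from five trees for a single observation;
# the last tree is assumed to have been pulled away by an outlying response.
tree_preds = [10.0, 11.0, 9.0, 10.0, 100.0]
mean_pred = aggregate(tree_preds, "mean")      # 28.0
median_pred = aggregate(tree_preds, "median")  # 10.0

# Hypothetical responses with one outlier; trees would be grown on the ranks.
responses = [3.2, 150.0, 2.9, 3.2]
ranks = average_ranks(responses)               # [2.5, 4.0, 1.0, 2.5]
```

In this toy example a single extreme tree moves the mean to 28.0 while the median stays at 10.0, and the outlying response 150.0 becomes rank 4.0, so its magnitude no longer affects how the trees are built.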

Published in October 2010, 17 pages

Research axis

Axis 1: Data valuation for decision making

Research application

Marketing (business intelligence, revenue management, recommender systems)

Publication

Jan. 2012

Robustness of random forests for regression

Marie-Hélène Roy and Denis Larocque

Journal of Nonparametric Statistics, 24(4), 993–1006, 2012

GERAD

G-2010-56
