Robustness of random forests for regression

Roy, Marie-Hélène; Larocque, Denis

In this paper, we empirically investigate the robustness of random forests for regression problems. We also investigate the performance of five variations of the original random forest method, all aimed at improving robustness. All the proposed variations can be easily implemented using the R package randomForest. The first main idea behind these variations is to use the median, instead of the mean, to combine the predictions from the individual trees. The second idea is to build the trees using the ranks of the response instead of the original values. The competing methods are compared via a simulation study and on ten real data sets obtained from the UCI Machine Learning Repository. Our results show that the median-based random forests (using either the ranks or the original responses) offer good and stable performance for the simulated and real data sets considered and, as such, should be considered serious alternatives to the original random forest method.
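The two ideas in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation (the paper relies on the R package randomForest); it is plain Python using only the standard library, and the function names `combine_tree_predictions` and `average_ranks`, as well as the numbers, are hypothetical and chosen for illustration. The sketch only shows how a median combiner differs from a mean combiner when one tree gives an extreme prediction, and how a response vector can be replaced by its ranks before tree building. It does not grow any trees, and it does not cover how rank-scale predictions would be mapped back to the original scale.

```python
from statistics import mean, median


def combine_tree_predictions(per_tree_predictions, how="median"):
    """Combine per-tree predictions into one prediction per observation.

    per_tree_predictions: list of lists, one inner list per tree, each
    holding that tree's prediction for every observation.
    """
    combiner = median if how == "median" else mean
    # zip(*...) regroups the predictions by observation instead of by tree.
    return [combiner(obs_preds) for obs_preds in zip(*per_tree_predictions)]


def average_ranks(values):
    """Return 1-based ranks of values; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        # Extend 'end' over the run of values tied with the one at 'start'.
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        avg_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = avg_rank
        start = end + 1
    return ranks


# Hypothetical predictions from five trees for two observations; the last
# tree gives an extreme value for the first observation.
trees = [[10.0, 3.0], [11.0, 3.5], [9.5, 2.5], [10.5, 3.0], [60.0, 3.2]]
mean_pred = combine_tree_predictions(trees, how="mean")
median_pred = combine_tree_predictions(trees, how="median")

# Hypothetical responses with one outlier; ranks bound its influence.
responses = [2.1, 150.0, 3.4, 2.1]
rank_responses = average_ranks(responses)
```

In this toy example the single extreme tree moves the mean for the first observation far more than the median, and the outlying response of 150.0 simply becomes the largest rank.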

Published October 2010, 17 pages

Research Axis

Axis 1: Data valuation for decision making

Research application

Marketing (business intelligence, revenue management, recommendation systems)

Publication

Jan 2012

Robustness of random forests for regression

Marie-Hélène Roy and Denis Larocque

Journal of Nonparametric Statistics, 24(4), 993–1006, 2012

GERAD

G-2010-56
