An Empirical Comparison of Ensemble Methods Based on Classification Trees

Hamza, M; Larocque, Denis

In this paper, we perform an empirical comparison of the classification error of several ensemble methods based on classification trees. This comparison is based on fourteen publicly available data sets that were used in Lim, Loh and Shih (Machine Learning 40, 203-228, 2000). The methods used are a single tree, Bagging, Boosting (Arcing) and random forests. They are compared on several aspects: the effect of noise, the effect of allowing linear combinations in the construction of the trees, the differences between some splitting criteria and, specifically for random forests, the effect of the number of variables from which to choose the best split at each node. Moreover, we compare our results with those obtained in Lim et al. (2000). In this study, the best overall results are obtained with random forests. In particular, random forests are the most robust against noise. The effect of allowing linear combinations and the differences between splitting criteria are small on average but can be substantial for some data sets.
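To make the quantities named in the abstract concrete, here is a minimal sketch of a bagged ensemble with a per-split variable subset and a label-noise helper. It is not the authors' code or experimental setup: it uses one-split trees (stumps) in place of full classification trees, and every name in it (`fit_stump`, `fit_ensemble`, `mtry`, `add_label_noise`, the toy data) is hypothetical. Because a stump has a single node, drawing the variable subset once per tree is the same as drawing it at each node; setting `mtry` to the total number of variables reduces the sketch to plain Bagging.

```python
import random
from collections import Counter


def majority_vote(labels):
    """Return the most frequent label (ties go to the first one seen)."""
    return Counter(labels).most_common(1)[0][0]


def fit_stump(rows, feature_ids):
    """Fit a one-split tree using only the candidate features.

    rows is a list of (features, label) pairs. The split kept is the
    (feature, threshold) pair with the fewest training errors.
    """
    best = None
    for f in feature_ids:
        for t in sorted({x[f] for x, _ in rows}):
            left = [y for x, y in rows if x[f] <= t]
            right = [y for x, y in rows if x[f] > t]
            left_label = majority_vote(left)
            right_label = majority_vote(right) if right else left_label
            errors = (sum(y != left_label for y in left)
                      + sum(y != right_label for y in right))
            if best is None or errors < best[0]:
                best = (errors, f, t, left_label, right_label)
    return best[1:]


def predict_stump(stump, x):
    """Classify one feature tuple with a fitted stump."""
    f, t, left_label, right_label = stump
    return left_label if x[f] <= t else right_label


def fit_ensemble(rows, n_trees, mtry, seed=0):
    """Fit n_trees stumps, each on a bootstrap sample and a random
    subset of mtry candidate variables (assumption: stumps stand in
    for full trees)."""
    rng = random.Random(seed)
    n_features = len(rows[0][0])
    stumps = []
    for _ in range(n_trees):
        sample = [rng.choice(rows) for _ in rows]  # draw with replacement
        feature_ids = rng.sample(range(n_features), mtry)
        stumps.append(fit_stump(sample, feature_ids))
    return stumps


def predict_ensemble(stumps, x):
    """Aggregate the stump predictions by majority vote."""
    return majority_vote([predict_stump(s, x) for s in stumps])


def add_label_noise(rows, rate, rng):
    """Flip each binary (0/1) label independently with probability rate."""
    return [(x, 1 - y if rng.random() < rate else y) for x, y in rows]


# Toy data (made up): variable 0 separates the classes, variable 1 does not.
toy = [((i, 0), int(i >= 5)) for i in range(10)]
forest = fit_ensemble(toy, n_trees=25, mtry=2)
```

Refitting on `add_label_noise(toy, rate, rng)` for increasing `rate`, or varying `mtry`, gives a small-scale analogue of the noise and variable-subset comparisons described above, though nothing here reproduces the paper's results.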

Published in October 2003, 17 pages

Research axis

Axis 1: Data valuation for decision making

Research application

Marketing (business intelligence, revenue management, recommender systems)

Publication

Jan. 2005

An empirical comparison of ensemble methods based on classification trees

M Hamza and Denis Larocque

Journal of Statistical Computation and Simulation, 75, 629–643, 2005

GERAD

G-2003-71
