Clusterwise regression is a technique for clustering data. Instead of using the classical homogeneity or separation criteria, clusterwise regression is based on the accuracy of a linear regression model associated with each cluster. This model has many advantages, especially for data mining; however, the underlying mathematical model is difficult to solve because of its large number of local optima. In this paper, we propose using the Variable Neighborhood Search (VNS) metaheuristic to improve solution quality. Two perturbation strategies are described, and one of them yields a substantial improvement compared with multistart (the error is reduced by a factor of more than 1.5 on average for the 10-cluster problem).
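To make the objective and the search scheme concrete, here is a minimal sketch in plain Python. It is not the authors' implementation. It makes several simplifying assumptions: each cluster fits a single-regressor line y = a + b*x, the error is the total sum of squared residuals, local search alternates between refitting lines and reassigning points, and the shaking step (reassigning k random points to random clusters) is a hypothetical perturbation, not necessarily either of the two strategies described in the paper. The names `fit_line`, `sse`, `local_search`, and `vns` and the parameters `k_max`, `n_rounds`, and `seed` are illustrative.

```python
import random


def fit_line(points):
    """Least-squares fit y = a + b*x for one cluster; returns (a, b)."""
    n = len(points)
    if n == 0:
        return 0.0, 0.0  # empty cluster: arbitrary line, contributes no error
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points)
    if sxx == 0.0:
        return my, 0.0  # all x equal: horizontal line through the mean is optimal
    sxy = sum((x - mx) * (y - my) for x, y in points)
    b = sxy / sxx
    return my - b * mx, b


def sse(data, labels, m):
    """Clusterwise objective: squared residual of each point w.r.t. its own cluster's line."""
    total = 0.0
    for c in range(m):
        pts = [p for p, lab in zip(data, labels) if lab == c]
        a, b = fit_line(pts)
        total += sum((y - a - b * x) ** 2 for x, y in pts)
    return total


def local_search(data, labels, m, max_iter=100):
    """Alternate refitting each cluster's line and moving points to the best-fitting line.

    Each pass cannot increase the objective, so this descends to a local optimum.
    """
    labels = list(labels)
    for _ in range(max_iter):
        lines = [fit_line([p for p, lab in zip(data, labels) if lab == c]) for c in range(m)]
        new = [min(range(m), key=lambda c: (y - lines[c][0] - lines[c][1] * x) ** 2)
               for x, y in data]
        if new == labels:
            break
        labels = new
    return labels


def vns(data, m, k_max=3, n_rounds=30, seed=0):
    """Basic VNS: shake the incumbent with neighborhood size k, re-optimize, accept improvements."""
    rng = random.Random(seed)
    labels = local_search(data, [rng.randrange(m) for _ in data], m)
    best = sse(data, labels, m)
    for _ in range(n_rounds):
        k = 1
        while k <= k_max:
            # Hypothetical shaking step: move k randomly chosen points to random clusters.
            trial = list(labels)
            for i in rng.sample(range(len(data)), min(k, len(data))):
                trial[i] = rng.randrange(m)
            trial = local_search(data, trial, m)
            value = sse(data, trial, m)
            if value < best - 1e-12:
                labels, best = trial, value  # improvement: recentre, restart at k = 1
                k = 1
            else:
                k += 1  # no improvement: try a larger neighborhood
    return labels, best


if __name__ == "__main__":
    # Two noiseless lines; the ideal 2-cluster partition has zero error.
    data = [(x, 2 * x + 1) for x in range(10)] + [(x, 20 - x) for x in range(10)]
    labels, best = vns(data, 2)
    print(labels, best)
```

The design difference from multistart is that multistart would call `local_search` from many independent random labelings and keep the best, whereas VNS perturbs the current best solution, with progressively larger perturbations when smaller ones fail. This lets it escape a local optimum while retaining most of its structure.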
Published August 2005, 18 pages
This cahier was revised in December 2007
G-2005-61.pdf (300 KB)