Groupe d’études et de recherche en analyse des décisions

G-2002-43

Analysis of Global k-Means, an Incremental Heuristic for Minimum Sum-of-Squares Clustering

, , et

The global k-means heuristic is a recently proposed (Likas et al. 2003) incremental approach for minimum sum-of-squares clustering of a set X of N points of Rd into M clusters. For k=2,3,...,M-1 it considers the best-known set of k-1 centroids previously obtained, adds a new cluster center at each point of X in turn and applies k-means to each set of k centroids so-obtained, keeping the best k-partition found. We show that global k-means cannot be guaranteed to find the optimum partition for any M greater or equal to 2 and d greater or equal to 1; moreover, the same holds for all M greater or equal to 3 if the new cluster center is chosen anywhere in Rd instead of belonging to X. The empirical performance of global k-means is also evaluated by comparing the values it obtains with those obtained for three data sets with N less or equal to 150 which are solved optimally, as well as with values obtained by the recent j-means heuristic and extensions thereof for three larger data sets with N less or equal to 3038.

, 26 pages

Ce cahier a été révisé en décembre 2004