Groupe d’études et de recherche en analyse des décisions

# On strategies to fix degenerate k-Means-means solutions

## Daniel Aloise, Nielson Castelo Damasceno, Nenad Mladenović et Daniel Nobre Pinheiro

The $$k$$-means is a benchmark algorithm used in cluster analysis. It belongs to the large category of heuristics based on location-allocation steps that alternately locate cluster centers and allocate data points to them until no further improvement is possible. Such heuristics are known to suffer from a phenomenon called degeneracy in which some of the clusters are empty. In this paper, we compare and propose a series of strategies to circumvent degenerate solutions during a $$k$$-means execution. Our computational experiments show that these strategies are effective leading to better clustering solutions in the vast majority of the cases in which degeneracy appears in $$k$$-means. Moreover, we compare the use of our fixing strategies within $$k$$-means against the use of two initialization methods found in the literature. These results demonstrate how useful the proposed strategies can be, specially inside memory-based clustering algorithms.

, 22 pages