Group for Research in Decision Analysis

G-2016-81

On strategies to fix degenerate k-Means-means solutions

, , , and

The \(k\)-means is a benchmark algorithm used in cluster analysis. It belongs to the large category of heuristics based on location-allocation steps that alternately locate cluster centers and allocate data points to them until no further improvement is possible. Such heuristics are known to suffer from a phenomenon called degeneracy in which some of the clusters are empty. In this paper, we compare and propose a series of strategies to circumvent degenerate solutions during a \(k\)-means execution. Our computational experiments show that these strategies are effective leading to better clustering solutions in the vast majority of the cases in which degeneracy appears in \(k\)-means. Moreover, we compare the use of our fixing strategies within \(k\)-means against the use of two initialization methods found in the literature. These results demonstrate how useful the proposed strategies can be, specially inside memory-based clustering algorithms.

, 22 pages