On strategies to fix degenerate <i>k</i>-Means-means solutions

Aloise, Daniel; Castelo Damasceno, Nielsen; Mladenović, Nenad; Nobre Pinheiro, Daniel

The $k$ -means is a benchmark algorithm used in cluster analysis. It belongs to the large category of heuristics based on location-allocation steps that alternately locate cluster centers and allocate data points to them until no further improvement is possible. Such heuristics are known to suffer from a phenomenon called degeneracy in which some of the clusters are empty. In this paper, we compare and propose a series of strategies to circumvent degenerate solutions during a $k$ -means execution. Our computational experiments show that these strategies are effective leading to better clustering solutions in the vast majority of the cases in which degeneracy appears in $k$ -means. Moreover, we compare the use of our fixing strategies within $k$ -means against the use of two initialization methods found in the literature. These results demonstrate how useful the proposed strategies can be, specially inside memory-based clustering algorithms.

Published October 2016 , 22 pages

Research Axis

Axis 1: Data valuation for decision making

Research applications

Publication

Jul 2017

On strategies to remove degeneracy in k-means

Daniel Aloise, Nielsen Castelo Damasceno, Nenad Mladenović, and Daniel Nobre Pinheiro

Journal of Classification, 34(2), 165–190, 2017 BibTeX reference

GERAD

G-2016-81

On strategies to fix degenerate k-Means-means solutions

Daniel Aloise, Nielsen Castelo Damasceno, Nenad Mladenović, and Daniel Nobre Pinheiro

Research Axis

Research applications

Publication