
G-2026-09

A variable neighborhood search heuristic for semi-supervised minimum sum-of-squares clustering



Semi-supervised clustering is a learning approach that relies primarily on unlabeled data but incorporates some prior information to improve the clustering results. Among clustering objectives, minimum sum-of-squares clustering (MSSC) is widely used. It partitions the data by minimizing the sum of squared Euclidean distances between each point and the centroid of its cluster, that is, the total within-cluster variance. We propose a Variable Neighborhood Search (VNS) heuristic for semi-supervised MSSC in which prior information is given as pairwise must-link and cannot-link constraints. Our approach reformulates the optimization problem by merging must-linked points into super-points, so that must-link constraints are satisfied implicitly, while cannot-link constraints are incorporated as penalties in the objective function. Computational experiments indicate that, in the majority of tested cases, the proposed VNS heuristic finds better solutions than the state-of-the-art heuristic from the literature within the same computational time.
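The super-point idea rests on a standard identity. For a group S of points with mean c_S and any centroid mu, the sum over x in S of ||x - mu||^2 equals the sum over x in S of ||x - c_S||^2 plus |S| * ||c_S - mu||^2. The first term does not depend on the clustering, so MSSC over points that are forced together is equivalent, up to a constant, to a weighted MSSC over their means. The following is a minimal Python sketch of that reformulation inside a basic VNS. It makes several assumptions that are not taken from the report: a fixed penalty `lam` per violated cannot-link pair, random-relocation shaking in neighborhood N_m, and a penalized k-means-style local search. All function names (`build_super_points`, `local_search`, `vns_ssmssc`, etc.) are hypothetical. The report's actual penalty form, neighborhoods, and local search may differ.

```python
import random
from collections import defaultdict

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def build_super_points(points, must_link):
    """Merge must-linked points into weighted super-points (union-find)."""
    parent = list(range(len(points)))
    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i
    for i, j in must_link:
        parent[find(i)] = find(j)
    groups = defaultdict(list)
    for i in range(len(points)):
        groups[find(i)].append(i)
    dim = len(points[0])
    centers, weights, owner = [], [], [0] * len(points)
    for s, members in enumerate(groups.values()):
        centers.append([sum(points[i][d] for i in members) / len(members) for d in range(dim)])
        weights.append(len(members))  # super-point weight = number of merged points
        for i in members:
            owner[i] = s
    return centers, weights, owner

def centroids(centers, weights, assign, k, old=None):
    """Weighted cluster means; an empty cluster keeps its old centroid (or a seed super-point)."""
    dim = len(centers[0])
    sums, totals = [[0.0] * dim for _ in range(k)], [0] * k
    for s, c in enumerate(assign):
        totals[c] += weights[s]
        for d in range(dim):
            sums[c][d] += weights[s] * centers[s][d]
    fallback = old if old is not None else [centers[c % len(centers)] for c in range(k)]
    return [[v / totals[c] for v in sums[c]] if totals[c] else list(fallback[c]) for c in range(k)]

def objective(centers, weights, cl, assign, k, lam):
    """Weighted SSE over super-points plus lam per violated cannot-link pair (assumed penalty form)."""
    mus = centroids(centers, weights, assign, k)
    sse = sum(w * sq_dist(x, mus[assign[s]]) for s, (x, w) in enumerate(zip(centers, weights)))
    return sse + lam * sum(cnt for (s, t), cnt in cl.items() if assign[s] == assign[t])

def local_search(centers, weights, nbrs, assign, k, lam, max_iter=100):
    """Penalized k-means-style descent: reassign each super-point, then update centroids."""
    assign = list(assign)
    mus = centroids(centers, weights, assign, k)
    def cost(s, c):
        clash = sum(cnt for t, cnt in nbrs[s] if assign[t] == c)
        return weights[s] * sq_dist(centers[s], mus[c]) + lam * clash
    for _ in range(max_iter):
        changed = False
        for s in range(len(centers)):
            best = min(range(k), key=lambda c: cost(s, c))
            if cost(s, best) < cost(s, assign[s]) - 1e-12:  # move only on strict improvement
                assign[s], changed = best, True
        mus = centroids(centers, weights, assign, k, old=mus)
        if not changed:
            break
    return assign

def vns_ssmssc(points, must_link, cannot_link, k, lam=1.0, k_max=3, iters=20, seed=0):
    """Basic VNS over super-point labels; returns point labels and penalized MSSC."""
    rng = random.Random(seed)
    centers, weights, owner = build_super_points(points, must_link)
    cl = defaultdict(int)  # number of cannot-link pairs between two super-points
    for i, j in cannot_link:
        s, t = sorted((owner[i], owner[j]))
        if s != t:  # cannot-link inside a super-point contradicts the must-links; skipped
            cl[(s, t)] += 1
    nbrs = defaultdict(list)
    for (s, t), cnt in cl.items():
        nbrs[s].append((t, cnt))
        nbrs[t].append((s, cnt))
    n = len(centers)
    best = local_search(centers, weights, nbrs, [rng.randrange(k) for _ in range(n)], k, lam)
    best_val = objective(centers, weights, cl, best, k, lam)
    for _ in range(iters):
        m = 1
        while m <= k_max:
            trial = list(best)
            for s in rng.sample(range(n), min(m, n)):  # shaking in neighborhood N_m
                trial[s] = rng.randrange(k)
            trial = local_search(centers, weights, nbrs, trial, k, lam)
            val = objective(centers, weights, cl, trial, k, lam)
            if val < best_val - 1e-12:
                best, best_val, m = trial, val, 1  # improvement: move and restart at N_1
            else:
                m += 1  # no improvement: try a larger neighborhood
    # Scatter inside super-points is constant; add it to report MSSC on the original points.
    inner = sum(sq_dist(p, centers[owner[i]]) for i, p in enumerate(points))
    return [best[owner[i]] for i in range(len(points))], best_val + inner
```

As a usage example, `vns_ssmssc(points, must_link=[(0, 1)], cannot_link=[(0, 3)], k=2)` returns one label per original point, and points 0 and 1 always share a label because they form one super-point. Only the cannot-link pair can be violated, and then only at a cost of `lam`. The design keeps the must-link handling exact and cheap (union-find) and moves all constraint trade-offs into the penalized objective that the VNS explores.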

11 pages


Document

G2609.pdf (400 KB)