Group for Research in Decision Analysis

G-2018-23

On the \(k\)-medoids model for semi-supervised clustering

, , , and

Clustering is an automated and powerful technique for data analysis. It aims to divide a given set of data points into clusters which are homogeneous and/or well separated. The biggest challenge of data clustering is indeed to find a clustering criterion to express good separation of data into homogeneous groups so that they bring useful information to the user. To overcome this issue, it is suggested that the user provides a priori information about the data set. Clustering under this assumption is often called semi-supervised clustering. This work explores semi-supervised clustering through the \(k\)-medoids model. Results obtained by a Variable Neighborhood Search (VNS) heuristic show that the \(k\)-medoids model presents classification accuracy compared to that of the typical \(k\)-means approach. Furthermore, the model demonstrates high flexibility and performance by combining kernel projections with clustering constraints.

, 14 pages