On the $k$-medoids model for semi-supervised clustering

Randel, Rodrigo Alves; Aloise, Daniel; Mladenović, Nenad; Hansen, Pierre

Clustering is an automated and powerful technique for data analysis. It aims to divide a given set of data points into clusters that are homogeneous and/or well separated. The biggest challenge in data clustering is to find a criterion that expresses good separation of the data into homogeneous groups, so that the resulting clusters bring useful information to the user. To overcome this issue, it is suggested that the user provide a priori information about the data set. Clustering under this assumption is often called semi-supervised clustering. This work explores semi-supervised clustering through the $k$-medoids model. Results obtained by a Variable Neighborhood Search (VNS) heuristic show that the $k$-medoids model achieves classification accuracy comparable to that of the typical $k$-means approach. Furthermore, the model demonstrates high flexibility and performance by combining kernel projections with clustering constraints.

Published March 2018, 14 pages

Research Axis

Axis 1: Data valuation for decision making

Research application

Smart logistics (schedule design, supply chains, logistics, manufacturing systems)

Document

G1823.pdf (500 KB)

GERAD

G-2018-23

