A model for clustering data from heterogeneous subjects

Santi, Éverton; Aloise, Daniel; Blanchard, Simon

Clustering is a data mining method that partitions a given set of n objects into p clusters so as to minimize the dissimilarity among objects in the same cluster while maximizing the dissimilarity between objects in different clusters. Classical clustering methods take as input a single dissimilarity matrix defined over pairs of objects. However, in settings where data are collected from different perspectives, multiple dissimilarity matrices are available. In such cases, researchers typically aggregate their data into a single matrix, which yields clustering results that mask the true nature of the data. In this paper, we propose a clustering model consisting of a three-way partitioning problem that identifies segments of subjects who cluster objects in a similar way. The model is a nonconvex problem, for which we propose a Variable Neighborhood Search heuristic whose local search is based on solving mixed-integer problems. Computational experiments show that the heuristic is efficient, that the proposed model is well suited to recovering heterogeneous data, and that it is robust to different clustering settings.
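The masking effect of aggregation mentioned in the abstract can be seen on a toy example. The following is a minimal sketch in Python, not the model or the heuristic proposed in the paper: it assumes four objects, four subjects split into two segments, and 0/1 dissimilarities derived from each subject's own grouping. All function and variable names are hypothetical and chosen for illustration only.

```python
from itertools import combinations


def dissimilarity_from_labels(labels):
    """Build a 0/1 dissimilarity matrix: 0 if two objects share a label, else 1."""
    n = len(labels)
    return [[0.0 if labels[i] == labels[j] else 1.0 for j in range(n)]
            for i in range(n)]


def within_cluster_cost(D, labels):
    """Sum of dissimilarities over pairs of objects placed in the same cluster."""
    pairs = combinations(range(len(labels)), 2)
    return sum(D[i][j] for i, j in pairs if labels[i] == labels[j])


def average(matrices):
    """Entrywise mean of several dissimilarity matrices (the aggregation step)."""
    n = len(matrices[0])
    m = len(matrices)
    return [[sum(M[i][j] for M in matrices) / m for j in range(n)]
            for i in range(n)]


# Assumed toy data: two segments of subjects that group the objects differently.
part_a = [0, 0, 1, 1]   # subjects 0 and 1 see clusters {0,1} and {2,3}
part_b = [0, 1, 0, 1]   # subjects 2 and 3 see clusters {0,2} and {1,3}
subjects = [dissimilarity_from_labels(part_a)] * 2 \
         + [dissimilarity_from_labels(part_b)] * 2

# Aggregating into one matrix makes both partitions look equally good.
pooled = average(subjects)
pooled_cost_a = within_cluster_cost(pooled, part_a)
pooled_cost_b = within_cluster_cost(pooled, part_b)

# One partition imposed on every subject fits only half of them.
single_partition_total = sum(within_cluster_cost(D, part_a) for D in subjects)

# Letting each segment of subjects keep its own partition fits all of them.
segment_partitions = [part_a, part_a, part_b, part_b]
segmented_total = sum(within_cluster_cost(D, p)
                      for D, p in zip(subjects, segment_partitions))

print(pooled_cost_a, pooled_cost_b, single_partition_total, segmented_total)
```

On this toy data the pooled matrix cannot tell the two partitions apart, whereas assigning a separate partition to each segment of subjects removes all within-cluster dissimilarity. This is the intuition behind partitioning subjects and objects jointly; the actual formulation and its solution method are those given in the paper.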

Published March 2015, 20 pages

Research Axis

Axis 1: Data valuation for decision making

Research application

Marketing (business intelligence, revenue management, recommendation systems)

Publication

Sep 2016

A model for clustering data from heterogeneous dissimilarities

Éverton Santi, Daniel Aloise, and Simon Blanchard

European Journal of Operational Research, 253(3), 659–672, 2016

GERAD

G-2015-18

