Groupe d’études et de recherche en analyse des décisions


A model for clustering data from heterogeneous subjects


Clustering is a data mining method that consists of partitioning a given set of n objects into p clusters so as to minimize the dissimilarity among objects in the same cluster while maximizing the dissimilarity between objects in different clusters. Classical clustering methods take as input a single matrix giving the dissimilarity between each pair of objects. However, in settings where data are collected from different perspectives, multiple dissimilarity matrices are available. In such cases, researchers typically aggregate the data into a single matrix, producing clustering results that mask the true nature of the data. In this paper we propose a clustering model, formulated as a three-way partitioning problem, that identifies segments of subjects who cluster the objects in a similar way. The model is a nonconvex problem, for which we propose a Variable Neighborhood Search heuristic whose local search is based on solving mixed-integer problems. Computational experiments show that the heuristic is efficient, that the proposed model is well suited to recovering heterogeneous data, and that it is robust to different clustering settings.
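The abstract does not state the formulation, so the following is only a minimal sketch under an assumed objective: each subject k supplies a symmetric dissimilarity matrix D_k with zero diagonal, subjects are assigned to S non-empty segments, each segment receives its own partition of the objects into p clusters, and the cost is the within-cluster dissimilarity summed over all subjects. For a fixed matrix the total pairwise dissimilarity is constant, so minimizing within-cluster dissimilarity is equivalent to maximizing between-cluster dissimilarity. The sketch solves tiny instances by exhaustive enumeration; it is not the Variable Neighborhood Search heuristic proposed in the paper, and all function names and the toy data are hypothetical.

```python
import itertools

import numpy as np


def within_cost(D, labels):
    """Sum of D[i, j] over unordered pairs i < j sharing a cluster label.

    Assumes D is symmetric with a zero diagonal (hypothetical setup).
    """
    labels = np.asarray(labels)
    same = labels[:, None] == labels[None, :]
    # Each unordered pair is counted twice in the symmetric mask; the diagonal adds 0.
    return float(D[same].sum() / 2.0)


def partitions(n, p):
    """All labelings of n objects into exactly p non-empty clusters."""
    for labels in itertools.product(range(p), repeat=n):
        if len(set(labels)) == p:
            yield labels


def best_partition(D, p):
    """Exhaustively find the p-partition minimizing within-cluster cost of D."""
    n = D.shape[0]
    return min(((within_cost(D, lab), lab) for lab in partitions(n, p)),
               key=lambda t: t[0])


def three_way_exhaustive(Ds, p, S):
    """Assign subjects to S non-empty segments, each with its own p-partition.

    For a fixed subject-to-segment assignment the per-subject costs add up,
    so a segment's best partition is the best partition of its summed matrix.
    """
    K = len(Ds)
    best = (float("inf"), None, None)
    for assign in itertools.product(range(S), repeat=K):
        if len(set(assign)) < S:
            continue  # every segment must contain at least one subject
        total, parts = 0.0, []
        for s in range(S):
            D_seg = sum(D for D, a in zip(Ds, assign) if a == s)
            cost, labels = best_partition(D_seg, p)
            total += cost
            parts.append(labels)
        if total < best[0]:
            best = (total, assign, parts)
    return best


def block_matrix(groups, n):
    """Toy dissimilarities: 0 within each group, 1 between groups."""
    D = np.ones((n, n))
    for g in groups:
        for i in g:
            for j in g:
                D[i, j] = 0.0
    np.fill_diagonal(D, 0.0)
    return D


# Two subjects see {0,1},{2,3}; a third subject sees {0,2},{1,3}.
A = block_matrix([[0, 1], [2, 3]], 4)
B = block_matrix([[0, 2], [1, 3]], 4)
Ds = [A, A, B]

cost1, _, parts1 = three_way_exhaustive(Ds, p=2, S=1)
cost2, assign2, parts2 = three_way_exhaustive(Ds, p=2, S=2)
print(cost1, parts1)   # 2.0 [(0, 0, 1, 1)]
print(cost2, assign2)  # 0.0 (0, 0, 1)
```

With S=1 all three subjects must share one partition, which is equivalent to clustering the summed matrix: the majority structure {0,1},{2,3} wins and the third subject contributes the entire cost of 2. With S=2 the third subject gets its own segment and partition, and the total cost drops to 0. Exhaustive enumeration grows exponentially in both the number of objects and the number of subjects, which is why a heuristic is needed for realistic instances.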

20 pages