A model for clustering data from heterogeneous subjects

Santi, Éverton; Aloise, Daniel; Blanchard, Simon

Clustering is a data mining method that partitions a given set of n objects into p clusters so as to minimize the dissimilarity among objects in the same cluster while maximizing the dissimilarity between objects in different clusters. Classical clustering methods take as input a single dissimilarity matrix covering each pair of objects. However, in settings where data are collected from different perspectives, multiple dissimilarity matrices are available. In such cases, researchers typically aggregate their data into a single matrix, which yields clustering results that mask the true nature of the data. In this paper, we propose a clustering model consisting of a three-way partitioning problem that identifies segments of subjects that cluster objects in a similar way. The model is a nonconvex problem, for which we propose a Variable Neighborhood Search heuristic whose local search is based on the solution of mixed-integer problems. Computational experiments show that the heuristic is efficient, that the proposed model is suited to recovering heterogeneous data, and that it is robust to different clustering settings.
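To make the idea of three-way partitioning concrete, here is a minimal Python sketch of how such a solution could be scored. It is an illustration under stated assumptions, not the formulation from the paper: the cost (sum of within-cluster pairwise dissimilarities), the function names, and the toy data are all hypothetical, and no Variable Neighborhood Search is implemented. Each subject contributes one dissimilarity matrix, subjects are assigned to segments, and each segment has its own partition of the objects.

```python
from itertools import combinations


def within_cluster_cost(dissim, labels):
    """Sum of dissimilarities over object pairs that share a cluster label."""
    return sum(
        dissim[i][j]
        for i, j in combinations(range(len(labels)), 2)
        if labels[i] == labels[j]
    )


def three_way_cost(matrices, segment_of, partitions):
    """Assumed objective: score each subject's matrix against the object
    partition of the segment that subject is assigned to."""
    return sum(
        within_cluster_cost(matrices[s], partitions[segment_of[s]])
        for s in range(len(matrices))
    )


def aggregate(matrices):
    """Average the subjects' matrices into a single dissimilarity matrix."""
    n = len(matrices[0])
    return [
        [sum(m[i][j] for m in matrices) / len(matrices) for j in range(n)]
        for i in range(n)
    ]


def block_matrix(labels):
    """Toy data: dissimilarity 0 inside a group, 1 across groups."""
    return [[0 if a == b else 1 for b in labels] for a in labels]


# Two hypothetical subjects who group the same four objects differently.
view_a = [0, 0, 1, 1]
view_b = [0, 1, 0, 1]
matrices = [block_matrix(view_a), block_matrix(view_b)]

# One segment per subject, each with its own object partition.
cost_two_segments = three_way_cost(matrices, [0, 1], [view_a, view_b])

# A single segment forces both subjects onto one partition.
cost_one_segment = three_way_cost(matrices, [0, 0], [view_a])

# On the averaged matrix, both partitions look equally good.
avg = aggregate(matrices)
cost_avg_a = within_cluster_cost(avg, view_a)
cost_avg_b = within_cluster_cost(avg, view_b)
```

In this toy case the two-segment solution fits both subjects exactly, the one-segment solution does not, and the averaged matrix cannot distinguish the two groupings, which is the masking effect the abstract attributes to aggregation.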

Published in March 2015, 20 pages

Research axis

Axis 1: Data valorization for decision making

Research application

Marketing (business intelligence, revenue management, recommender systems)

Publication

Sept. 2016

A model for clustering data from heterogeneous dissimilarities

Éverton Santi, Daniel Aloise and Simon Blanchard

European Journal of Operational Research, 253(3), 659–672, 2016

GERAD

G-2015-18

