Group for Research in Decision Analysis

A model for clustering data from heterogeneous subjects

Daniel Aloise, Assistant Professor, Department of Computer Engineering, Universidade Federal do Rio Grande do Norte, Brazil

Clustering is a data mining method that partitions a given set of n objects into p clusters so that dissimilarity among objects in the same cluster is minimized while dissimilarity between objects in different clusters is maximized. Classical clustering methods take as input a single dissimilarity matrix defined over pairs of objects. However, in settings where data can be collected from different perspectives, multiple dissimilarity matrices are available. In such cases, researchers typically aggregate their data into a single matrix, which yields clustering results that mask the true nature of the data. In this paper, we propose a clustering model consisting of a three-way partitioning problem that identifies segments of subjects that cluster objects in a similar way. The model is a 0-1 quadratically constrained quadratic problem, for which we propose a Variable Neighborhood Search heuristic whose local search is based on solving mixed-integer problems. Computational experiments show that the heuristic is efficient, that the proposed model is well suited to recovering heterogeneous data, and that it is robust to different clustering settings.
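To make the idea of segmenting subjects concrete, here is a minimal Python sketch. It is not the formulation from the paper: it assumes, for illustration only, that the cost of a solution is the sum of within-cluster dissimilarities, with each subject evaluated under the object partition of its own segment. The function names (within_cluster_cost, three_way_cost, make_matrix) and the toy dissimilarity values 1 and 9 are hypothetical.

```python
from itertools import combinations


def within_cluster_cost(dissim, labels):
    """Sum of dissimilarities over object pairs placed in the same cluster."""
    n = len(labels)
    return sum(
        dissim[i][j]
        for i, j in combinations(range(n), 2)
        if labels[i] == labels[j]
    )


def three_way_cost(dissims, segment_of, partitions):
    """Assumed objective: each subject is scored with its segment's partition.

    dissims    -- one n x n dissimilarity matrix per subject
    segment_of -- segment index assigned to each subject
    partitions -- one list of cluster labels (length n) per segment
    """
    return sum(
        within_cluster_cost(d, partitions[segment_of[s]])
        for s, d in enumerate(dissims)
    )


def make_matrix(n, close_pairs, near=1, far=9):
    """Toy symmetric matrix: `near` for the listed pairs, `far` otherwise."""
    m = [[0 if i == j else far for j in range(n)] for i in range(n)]
    for i, j in close_pairs:
        m[i][j] = m[j][i] = near
    return m


# Two subjects who perceive the same four objects in incompatible ways.
subject_a = make_matrix(4, [(0, 1), (2, 3)])
subject_b = make_matrix(4, [(0, 2), (1, 3)])
dissims = [subject_a, subject_b]

# One shared partition for both subjects (what aggregation effectively forces).
pooled = three_way_cost(dissims, [0, 0], [[0, 0, 1, 1]])

# One segment per subject, each with its own partition of the objects.
segmented = three_way_cost(dissims, [0, 1], [[0, 0, 1, 1], [0, 1, 0, 1]])

print(pooled, segmented)  # 20 4
```

In this toy case the shared partition fits the first subject and misfits the second, whereas letting each segment carry its own partition fits both. This is the kind of masking effect of aggregation that the abstract describes.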

Free admission.
Everyone is welcome!