Groupe d’études et de recherche en analyse des décisions

G-93-06

Maximum Split Clustering Under Connectivity Constraints

, , , et

Consider N entities to be classified (e.g., geographical areas), a matrix of dissimilarity between pairs of entities, a graph H with vertices associated with these entities such that the edges join the vertices corresponding to contiguous entities. The split of a cluster is the smallest dissimilarity between an entity of this cluster and an entity outside of it. The single-linkage algorithm (ignoring contiguity between entities) provides partitions into M clusters for which the smallest split of the clusters, called split of the partition, is maximum. We study here the partitioning of the set of entities into M connected clusters for all M between N - 1 and 2 (i.e., clusters such that the subgraphs of H induced by their corresponding sets of entities are connected) with maximum split subject to that condition. We first provide an exact algorithm with a Θ(N2) complexity for the particular case in which H is a tree. This algorithm suggests in turn a first heuristic algorithm for the general problem. Several variants of this heuristic are also explored. We then present an exact algorithm for the general case based on iterative determination of cocycles of subtrees and on the solution of auxiliary set covering problems. As solution of the latter problems is time-consuming for large instances, we provide another heuristic in which the auxiliary set covering problems are solved approximately. Computational results obtained with the exact and heuristic algorithms are presented on test problems from the literature.

, 34 pages

Ce cahier a été révisé en juillet 2003