Group for Research in Decision Analysis

G-2018-31

Distributed integral simplex for clustering

, , and

Clustering is the subject of active research in several fields such as operations research, statistics, pattern recognition, and machine learning. The range of applications is very wide: scheduling, vehicle routing, pattern recognition, etc. Depending on the specific needs of the community, the methods used to solve these problems vary from heuristics of rather primal nature (improving of clustering iteratively by relocation moves for instance) to exact methods of a rather dual nature where we generally solve the continuous relaxation by releasing the integrality constraints and restoring it by implicit enumeration (branch and cut or branch and cut and price). In this paper, we propose the integral simplex, an exact primal method that could be suitable for both major classes. More interestingly, it could be distributedly solved better than the dual approach. Consequently, this work aims to propose a distributed version of the integral simplex, called DISUD, using some decompositions and multiple agents. Each agent dynamically splits the overall set partitioning (clustering) problem into sub-problems and solve them in parallel on a single machine. The new algorithm DISUD improves at each iteration the current clustering until (near) optimality is reached. It works much better on real set partitioning instances from the airline industry than DCPLEX, the distributed version of the state of the art commercial solver CPLEX.

, 25 pages