An exact algorithm is proposed for minimum sum-of-squares nonhierarchical clustering, i.e., for partitioning a given set of points from Euclidean m-space into a given number of clusters so as to minimize the sum of squared distances from all points to the centroid of the cluster to which they belong. This problem is expressed as a constrained hyperbolic program in 0-1 variables. The solution method combines an interior point algorithm, i.e., a weighted analytic center column generation method, with branch-and-bound. The auxiliary problem of determining the entering column (i.e., the oracle) is an unconstrained hyperbolic program in 0-1 variables with quadratic numerator and linear denominator. It is solved through a sequence of unconstrained quadratic programs in 0-1 variables. To accelerate the solution process, variable neighborhood search heuristics are used both to obtain a good initial solution and to solve the auxiliary problem quickly as long as global optimality is not reached. Estimated bounds for the dual variables are deduced from the heuristic solution and used in the solution process as a trust region. Proven minimum sum-of-squares partitions are determined for the first time for several fairly large data sets from the literature, including Fisher's 150-iris data set.
Published August 1997, 27 pages
This cahier was revised in January 1999
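
As an illustration of the objective described in the abstract, the following minimal Python sketch computes the sum of squared distances to cluster centroids, and also evaluates the cost of a single cluster as a ratio with a quadratic numerator and a linear denominator in a 0-1 membership vector. The ratio form relies on the standard identity that the squared distances to a centroid equal the pairwise squared distances within the cluster divided by the cluster size. The function names `sum_of_squares` and `cluster_cost_ratio` are hypothetical and chosen for illustration only; this is not the authors' implementation, and it does not include the column generation or branch-and-bound steps.

```python
import numpy as np


def sum_of_squares(points, labels):
    """Sum of squared Euclidean distances from each point to its cluster centroid."""
    total = 0.0
    for k in np.unique(labels):
        cluster = points[labels == k]
        centroid = cluster.mean(axis=0)
        total += ((cluster - centroid) ** 2).sum()
    return total


def cluster_cost_ratio(points, y):
    """Cost of one cluster written as a ratio in the 0-1 vector y.

    Numerator: sum over pairs i < j of d_ij^2 * y_i * y_j (quadratic in y).
    Denominator: sum of y_i, i.e. the cluster size (linear in y).
    Assumes y selects at least one point.
    """
    diff = points[:, None, :] - points[None, :, :]
    d2 = (diff ** 2).sum(axis=2)      # matrix of squared pairwise distances
    numerator = 0.5 * (y @ d2 @ y)    # the factor 0.5 counts each pair once
    denominator = y.sum()
    return numerator / denominator


# Small hand-checkable data set (illustrative, not from the paper).
pts = np.array([[0.0, 0.0], [2.0, 0.0], [10.0, 0.0], [10.0, 4.0]])
labels = np.array([0, 0, 1, 1])

total = sum_of_squares(pts, labels)
by_ratio = (cluster_cost_ratio(pts, np.array([1.0, 1.0, 0.0, 0.0]))
            + cluster_cost_ratio(pts, np.array([0.0, 0.0, 1.0, 1.0])))
print(total, by_ratio)
```

Both quantities agree on this example, which is why the cost of a candidate cluster can be treated as a hyperbolic (fractional) function of its 0-1 membership vector.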