Data Clustering using Large <i>p</i>-Median Models and Primal-Dual Variable Neighborhood Search

Hansen, Pierre; Brimberg, Jack; Urosević, Dragan; Mladenović, Nenad

Data clustering methods have been developed extensively in the data mining literature for detecting useful patterns in large datasets in the form of densely populated regions in a multi-dimensional Euclidean space. One of the challenges in dealing with these large databases is to obtain quality solutions within reasonable CPU time and memory requirements. Earlier partitioning methods such as PAM (1990) become inefficient on these larger sets due to repetitive and lengthy neighborhood searches in the solution space. To get around this problem, CLARA (1990) randomly samples the solution space first, and then solves a clustering problem (p-median) on the much smaller representative sample using PAM. Meanwhile CLARANS (2002) randomly selects neighborhood points up to a maximum number instead of searching the entire neighborhood. BIRCH (1996) uses hierarchical approach to cluster the data in a single pass. Improvements are made at the user's option by further passes through the (classified) data. The purpose of this paper is to present a new approach for solving the p-median problem based on the variable neighborhood search metaheuristic. Using a highly efficient data structure and interchange updating procedure within the embedded local search we are able to analyze very large multi-dimensional datasets directly and quickly. Another feature of the our method is that a guaranteed bound on solution quality is obtained by solving heuristically a dual relaxation of the problem.

Paru en juin 2007 , 22 pages

Axe de recherche

Axe 2 : Aide à la décision prise dans les systèmes complexes

Applications de recherche

Publication

jan. 2009

Solving large p-median clustering problems by primal-dual variable neighborhood search

Pierre Hansen, Jack Brimberg, Dragan Urosević et Nenad Mladenović

Data Mining and Knowledge Discovery, 19(3), 351–375, 2009 référence BibTeX

GERAD

G-2007-41

Data Clustering using Large p-Median Models and Primal-Dual Variable Neighborhood Search

Pierre Hansen, Jack Brimberg, Dragan Urosević et Nenad Mladenović

Axe de recherche

Applications de recherche

Publication