Groupe d’études et de recherche en analyse des décisions

Convergence Assessment in Bayesian Clustering

Vahid Partovi Nia, Assistant Professor, Department of Mathematics and Industrial Engineering, Polytechnique Montréal, Canada

When data clustering is of interest, the data partition must be regarded as the statistical parameter. This view of clustering, however, is recent. In Bayesian clustering, a model is assumed for the data given the partition, and a prior distribution is placed on partitions. The goal is often to find the maximum a posteriori grouping. When a Bayesian model is formulated for clustering, a Markov chain Monte Carlo (MCMC) method is often applied to explore the posterior. A measure of convergence defined on the partition space, which is a finite state space, is therefore needed. Such a convergence measure can also be used to quantify the efficiency of a sampler. A Pearson-like goodness-of-fit statistic is introduced for Bayesian models with analytically tractable marginal posteriors. The asymptotic distribution of the statistic is derived, providing a statistical significance test of convergence. The proposed method is demonstrated on MCMC clustering of high-dimensional, low-sample-size metabolite data.
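To make the idea of a goodness-of-fit check on a finite partition space concrete, here is a minimal Python sketch. It is not the statistic introduced in the talk, whose exact form and asymptotic distribution are not given in this abstract. It computes the classical Pearson X^2 between how often each partition is visited and a known target probability. The function names and the toy target weights are assumptions made for illustration only. For independent draws, X^2 is approximately chi-square with K - 1 degrees of freedom, where K is the number of partitions; MCMC draws are correlated, so that reference distribution does not carry over directly.

```python
import random
from collections import Counter


def set_partitions(items):
    """Yield every partition of the tuple `items` as a tuple of blocks."""
    if not items:
        yield ()
        return
    first, rest = items[0], items[1:]
    for part in set_partitions(rest):
        # Put `first` in a block of its own ...
        yield ((first,),) + part
        # ... or add it to each existing block in turn.
        for i, block in enumerate(part):
            yield part[:i] + ((first,) + block,) + part[i + 1:]


def pearson_statistic(draws, probs):
    """Classical Pearson X^2 between visit counts and target probabilities.

    `draws` is a sequence of visited states; `probs` maps every state of the
    finite space to its (strictly positive) target probability.
    """
    n = len(draws)
    counts = Counter(draws)
    return sum(
        (counts.get(state, 0) - n * p) ** 2 / (n * p)
        for state, p in probs.items()
    )


if __name__ == "__main__":
    # Hypothetical toy target: weight 2^(-number of blocks), normalised.
    states = list(set_partitions((1, 2, 3, 4)))  # 15 partitions of 4 items
    weights = [2.0 ** -len(s) for s in states]
    total = sum(weights)
    probs = {s: w / total for s, w in zip(states, weights)}

    # Independent draws from the target stand in for sampler output here.
    random.seed(0)
    draws = random.choices(states, weights=weights, k=5000)
    print("X^2 =", pearson_statistic(draws, probs))
```

A small X^2 relative to the reference distribution indicates that visit frequencies agree with the target; a sampler stuck in one partition produces a very large value.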

Joint work with Masoud Asgharian (McGill University) and Ioana Cosma (University of Cambridge).