Finite mixture models are useful in a wide variety of applications such as astronomy, botany, genetics, medicine and zoology. One appeal of such models is that they provide a convenient model-based statistical framework for clustering. Further, parameter estimation is made computationally feasible by the expectation-maximization (EM) algorithm, which can also help estimate the dispersions of the resulting parameter estimates. We have used these estimated dispersions, along with first- and second-order multivariate asymptotics, to develop approaches for assessing the significance of various aspects of such models. These include determining the number of significant components in the mixture model, variable selection, quantifying the uncertainty in the derived grouping, and identifying significantly influential and outlying observations. In this talk, I will outline the development of these methods and illustrate their performance on both simulated and classification datasets. This work is joint with Volodymyr Melnykov and is supported in part by the US National Science Foundation under its CAREER grant DMS-0437555.
Groupe d’études et de recherche en analyse des décisions