Group for Research in Decision Analysis

Estimating the Cluster Tree of a Density

Werner Stuetzle

Clustering problems occur in may domains, from genomics and astronomy to document analysis and marketing. The general goal is to identify distinct groups in a collection of objects. To cast clustering as a statistical problem we regard the feature vectors characterizing the objects as a sample from some unknown probability density. The premise of nonparametric clustering is that groups correspond to modes of this density. Building on ideas of David Wishart and John Hartigan I will introduce the cluster tree of a density as a summary statistic reflecting the group structure, and I will describe methods for estimating the cluster tree.