Outliers and anomalous data arise in many experimental phenomena; the stock market, finance, insurance, and communication systems are a few examples. Such data can often be modeled using non-Gaussian heavy-tailed distributions.
Clustering is the primary technique for dividing data into groups according to unknown structure inherent in the data. Controlling the entire clustering process is complicated and subject to several sources of uncertainty. One of the first decisions to be made is the choice of similarity measure, which establishes how the similarity between two objects is quantified.
This research focuses on the influence of similarity measures on hierarchical clustering when uncovering patterns in heavy-tailed data. A well-known similarity measure is based on the correlation between two objects. However, correlation requires finite second moments, which heavy-tailed distributions may lack, so this measure is not suitable for such data. We illustrate how to perform hierarchical cluster analysis on heavy-tailed data by extending the correlation-based similarity measure: we introduce a new similarity measure based on the covariation coefficient. We then evaluate the performance of the covariation similarity and compare it with other measures using external and internal validation criteria.
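The abstract does not give the exact form of the proposed similarity, so the following is only a minimal sketch under stated assumptions. It assumes symmetric alpha-stable data with 1 < alpha <= 2 and estimates the covariation coefficient lambda_{X,Y} with the fractional-lower-order-moment ratio sum(x * sign(y) * |y|^(p-1)) / sum(|y|^p) for some 1 <= p < alpha. The symmetrized similarity s = lambda_{X,Y} * lambda_{Y,X} (clipped to [0, 1]), the distance 1 - s, average linkage, and the function names `covariation_coefficient`, `covariation_similarity`, `covariation_distance_matrix` and `hierarchical_clusters` are all hypothetical illustrations, not the author's method. Clustering uses SciPy's `linkage` and `fcluster`.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform
from scipy.stats import levy_stable


def covariation_coefficient(x, y, p=1.0):
    """Sample estimate of lambda_{X,Y} = E[X * Y^<p-1>] / E[|Y|^p].

    Y^<a> is the signed power sign(Y) * |Y|**a. For symmetric alpha-stable
    data with 1 < alpha <= 2, the population ratio equals the covariation
    coefficient for any 1 <= p < alpha.
    """
    if p < 1.0:
        raise ValueError("p must be >= 1")
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    signed_power = np.sign(y) * np.abs(y) ** (p - 1.0)
    return np.sum(x * signed_power) / np.sum(np.abs(y) ** p)


def covariation_similarity(x, y, p=1.0):
    """Hypothetical symmetric similarity: lambda_{X,Y} * lambda_{Y,X}, clipped to [0, 1]."""
    s = covariation_coefficient(x, y, p) * covariation_coefficient(y, x, p)
    return float(np.clip(s, 0.0, 1.0))


def covariation_distance_matrix(series, p=1.0):
    """Pairwise distances 1 - similarity between the rows of `series`."""
    n = len(series)
    dist = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            d = 1.0 - covariation_similarity(series[i], series[j], p)
            dist[i, j] = dist[j, i] = d
    return dist


def hierarchical_clusters(series, n_clusters, p=1.0, method="average"):
    """Agglomerative clustering on the covariation-based distances."""
    dist = covariation_distance_matrix(series, p)
    Z = linkage(squareform(dist, checks=False), method=method)
    # Cut the dendrogram so that at most n_clusters flat clusters remain.
    return fcluster(Z, t=n_clusters, criterion="maxclust")


if __name__ == "__main__":
    # Simulated heavy-tailed example (alpha = 1.5, symmetric, beta = 0).
    rng = np.random.default_rng(0)
    T, alpha = 2000, 1.5
    factors = levy_stable.rvs(alpha, 0.0, size=(2, T), random_state=rng)
    noise = levy_stable.rvs(alpha, 0.0, size=(6, T), random_state=rng)
    # Series 0-2 share factor 0; series 3-5 share factor 1.
    data = np.vstack([factors[i // 3] + 0.3 * noise[i] for i in range(6)])
    print(hierarchical_clusters(data, n_clusters=2, p=1.0))
```

For zero-mean Gaussian data (alpha = 2), the product of the two population coefficients reduces to Cov(X,Y)^2 / (Var(X) Var(Y)), the squared correlation. The sketch therefore behaves like a squared-correlation similarity in the Gaussian case while remaining defined when second moments are infinite. It is also invariant to rescaling either series.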