Groupe d’études et de recherche en analyse des décisions

G-2020-23-EIW05

Statistical learning with the determinantal point process

The determinantal point process (DPP) provides a promising alternative to simple random sampling for the initial random selection of points required by most clustering and classification algorithms. As a probabilistic model of repulsion, the DPP assigns a lower probability to subsets that contain similar points, and therefore favours more diverse subsets. After a short introduction to the DPP, we show how using it to choose the initial subsets of points in a clustering algorithm run multiple times on large datasets can improve the quality of the final results.

12 pages
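
The repulsion property described in the abstract can be illustrated with a minimal sketch. It is not taken from the report: it assumes a fixed-size DPP over pairs of points (a 2-DPP), a Gaussian similarity kernel, and a hypothetical `bandwidth` parameter, all chosen only for illustration. Under those assumptions, the probability of a pair is proportional to the determinant of the corresponding 2x2 kernel submatrix, so near-duplicate points are rarely drawn together.

```python
import itertools
import math


def gaussian_kernel(x, y, bandwidth=1.0):
    """Similarity in (0, 1]; equals 1 when x == y (assumed kernel)."""
    d2 = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-d2 / (2.0 * bandwidth ** 2))


def pair_probabilities(points, bandwidth=1.0):
    """Exact 2-DPP probabilities for every pair of points.

    Each pair {i, j} is weighted by det([[1, k], [k, 1]]) = 1 - k**2,
    where k is the similarity of the two points; the weights are then
    normalised so that they sum to 1.
    """
    weights = {}
    for i, j in itertools.combinations(range(len(points)), 2):
        k = gaussian_kernel(points[i], points[j], bandwidth)
        weights[(i, j)] = 1.0 - k ** 2
    total = sum(weights.values())
    return {pair: w / total for pair, w in weights.items()}


# Toy data: points 0 and 1 are near-duplicates, point 2 is far away.
points = [(0.0, 0.0), (0.1, 0.0), (3.0, 3.0)]
probs = pair_probabilities(points)
for pair, p in sorted(probs.items()):
    print(pair, p)
```

In this toy example the pair of near-duplicates (0, 1) receives a much smaller probability than either pair involving the distant point, whereas simple random sampling would give all three pairs the same probability of 1/3.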