Groupe d’études et de recherche en analyse des décisions

G-2011-64

An Algorithm for Parametric Communities Detection in Networks

, et

Modularity maximization is extensively used to detect communities in complex networks. It has been shown however that this method suffers from a resolution limit: small communities may be undetectable in the presence of larger ones even if they are very dense. To alleviate this defect, various modifications of the modularity function have been proposed as well as multi-resolution methods. In this paper we systematically study a simple model (first proposed by Pons and Latapy [Theor. Comp. Sci. 412:892-900 (2010)] and similar to the parametric model of Reichardt and Bornholdt [Phys. Rev. E 74:016110 (2006)]) with a single parameter α which balances the fraction of within community edges and the expected fraction of edges according to the configuration model.

An exact algorithm is proposed to find optimal solutions for all values of α as well as the corresponding successive intervals of α values for which they are optimal. This algorithm relies upon a routine for exact modularity maximization and is limited to moderate size instances. An agglomerative hierarchical heuristic is therefore proposed to address parametric modularity detection in large networks. At each iteration the smallest value of α for which it is worthwhile to merge two communities of the current partition is found. Then merging is performed and the data updated accordingly. An implementation is proposed with the same time and space complexity as the well-known Clauset Newman Moore heuristic (CNM) [Phys. Rev. E 70:066111 (2004)].

Experimental results on artificial and real world problems show that
(i) communities are detected by both exact and heuristic methods for all values of the parameter α ;
(ii) the dendrogram summarizing the results of the heuristic method provides a useful tool for substantive analysis, as illustrated particularly on Les Misérables data set;
(iii) the difference between the parametric modularity values given by the exact method and those given by the heuristic is moderate;
(iv) the heuristic version of the proposed parametric method, viewed as a modularity maximization tool, gives better results than the CNM heuristic for large instances.

, 22 pages