Abstract
The identification of cohesive communities is a key process in social network analysis. However, the algorithms that are effective for finding communities do not scale well to very large problems, as their time complexity is worse than linear in the number of edges in the graph. This is an important issue for those interested in applying social network analysis techniques to very large networks, such as networks of mobile phone subscribers. In this respect, the contributions of this paper are twofold. First, we demonstrate these scaling issues using a prominent community-finding algorithm as a case study. Then, we show that a two-stage process, whereby the network is first decomposed into manageable subnetworks using a multilevel graph partitioning procedure, is effective in finding communities in networks with more than 106 nodes.
Similar content being viewed by others
References
Paliouras G, Papatheodorou C, Karkaletsis V, Spyropoulos CD (2000) Clustering the users of large web sites into communities. In: Proceedings of seventh international conference on machine learning (ICML’00), pp 719–726
Srivastava J, Cooley R, Deshpande M, Tan P (2000) Web usage mining: discovery and applications of usage patterns from Web data. ACM SIGKDD Explor Newslett 1: 12–23
He Y, Cheung Hui S (2002) Mining a Web Citation Database for author co-citation analysis. Inf Process Manag 38: 491–508
Newman MEJ, Girvan M (2004) Finding and evaluating community structure in networks. Phys Rev E 69: 56–68
Palla G, Derenyi I, Farkas I, Vicsek T (2005) Uncovering the overlapping community structure of complex networks in nature and society. Nature 435: 814–818
Gregory S (2007) An algorithm to find overlapping community structure in networks. In: Proceedings of 11th European conference on principles and practice of knowledge discovery in databases (PKDD’07), pp 91–102
Abello J, Pardalos PM, Resende MGC (1999) On maximum clique problems in very large graphs. In: External memory algorithms. DIMACS series in discrete mathematics and theoretical computer science, pp 119–130
Dhillon I, Guan Y, Kulis B (2007) Weighted graph cuts without eigenvectors a multilevel approach. IEEE Trans Pattern Anal Mach Intell 29: 1944–1957
Karp R (1972) Reducibility among combinatorial problems. Complex Comput Comput 43: 85–103
Bron C, Kerbosch J (1973) Finding all cliques of an undirected graph. Commun ACM 16: 575–577
Balas E, Yu CS (1986) Finding a maximum clique in an arbitrary graph. SIAM J Comput 15: 1054–1068
Wood DR (1997) An algorithm for finding a maximum clique in a graph. Oper Res Lett 21: 211–217
Alon N, Krivelevich M, Sudakov B (1998) Finding a large hidden clique in a random graph. In: 9th symposium on discrete algorithms (SODA), pp 91–102
Östergård P (2002) A fast algorithm for the maximum clique problem. Discrete Appl Math 120: 197–207
Freeman L (1979) Centrality in social networks: conceptual clarification. Soc Netw 1: 215–239
Granovetter M (1983) The strength of weak ties: a network theory revisited. Sociol Theory 1: 201–233
Girvan M, Newman M (2002) Community structure in social and biological networks. PNAS 99: 7821–7826
Palla G, Dernyi I, Farkas I, Vicsek T (2005) Supplementary information: uncovering the overlapping community structure of complex networks in nature and society. Nature 1–12
Shi J, Malik J (1997) Normalized cuts and image segmentation. In: Proceedings of IEEE conference on computer vision and pattern recognition (CVPR ’97), pp 731–737
Chung FRK (1994) Spectral graph theory. In: CBMS conference on recent advances in spectral graph theory. Regional conference series in mathematics, vol 92, California State University, Fresno
Pothen A, Simon HD, Liou KP (1990) Partitioning sparse matrices with eigenvectors of graphs. SIAM J Math Anal Appl 11: 430–452
Chan P, Schlag M, Zien J (1994) Spectral k-way ratio cut partitioning. IEEE Trans CAD Integr Circuits Syst 13: 1088–1096
Karypis G, Kumar V (1999) A fast and high quality multilevel scheme for partitioning irregular graphs. SIAM J Sci Comput 20: 359–392
Kernighan BW, Lin S (1970) An efficient heuristic procedure for partitioning graphs. Bell Syst Tech J 49: 291–307
McCallum A, Nigam K, Rennie J, Seymore K (2000) Automating the construction of internet portals with machine learning. Inf Retr J 3: 127–163
Erdős P, Rényi A (1961) On the evolution of random graphs. Bull Inst Int Stat 38: 343–347
Barabasi AL, Albert R (1999) Emergence of scaling in random networks. Science 286: 509–512
Hakimi S (1962) On the realizability of a set of integers as degrees of the vertices of a graph. SIAM J Appl Math 10: 496–506
Narayanan H, Belkin M, Niyogi P (2007) On the relation between low density separation, spectral clustering and graph cuts. Adv Neural Inf Process Syst 19: 1025
Author information
Authors and Affiliations
Corresponding author
Additional information
The work was supported by Science Foundation Ireland Grant nos. 05/IN.1/I24 and 08/SRC/I407 and Enterprise Ireland Grant no. PC/2007/010.
Rights and permissions
About this article
Cite this article
Narasimhamurthy, A., Greene, D., Hurley, N. et al. Partitioning large networks without breaking communities. Knowl Inf Syst 25, 345–369 (2010). https://doi.org/10.1007/s10115-009-0251-x
Received:
Revised:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s10115-009-0251-x