Nothing Special   »   [go: up one dir, main page]

skip to main content
research-article

Engineering Parallel Algorithms for Community Detection in Massive Networks

Published: 01 January 2016 Publication History

Abstract

The amount of graph-structured data has recently experienced an enormous growth in many applications. To transform such data into useful information, fast analytics algorithms and software tools are necessary. One common graph analytics kernel is disjoint community detection (or graph clustering). Despite extensive research on heuristic solvers for this task, only few parallel codes exist, although parallelism will be necessary to scale to the data volume of real-world applications. We address the deficit in computing capability by a flexible and extensible community detection framework with shared-memory parallelism. Within this framework we design and implement efficient parallel community detection heuristics: A parallel label propagation scheme; the first large-scale parallelization of the well-known Louvain method, as well as an extension of the method adding refinement; and an ensemble scheme combining the above. In extensive experiments driven by the algorithm engineering paradigm, we identify the most successful parameters and combinations of these algorithms. We also compare our implementations with state-of-the-art competitors. The processing rate of our fastest algorithm often reaches 50&#x00A0;M edges/second. We recommend the parallel Louvain method and our variant with refinement as both qualitatively strong and fast. Our methods are suitable for massive data sets with billions of edges. (A preliminary version of this paper appeared in <italic> Proceedings of the 42nd International Conference on Parallel Processing (ICPP 2013)</italic> <xref ref-type="bibr" rid="ref35">[35]</xref> .)

References

[1]
D. A. Bader, H. Meyerhenke, P. Sanders, and D. Wagner, Eds., Graph Partitioning and Graph Clustering, ser. Contemporary Mathematics. Providence, RI, USA: AMS, 2013, no. 588.
[2]
J. W. Berry, B. Hendrickson, R. A. LaViolette, and C. A. Phillips, “Tolerating the community detection resolution limit with edge weighting,” Phys. Rev. E, vol. 83, no. 5, p. 056119, 2011.
[3]
S. Bhowmick and S. Srinivasan, “A template for parallelizing the louvain method for modularity maximization, ” in Dynamics and Of Complex Networks. New York, NY, USA: Springer, 2013, pp. 111–124, vol. 2.
[4]
C. Bichot and P. Siarry, Graph Partitioning. New York, NY, USA: Wiley, 2011.
[5]
V. D. Blondel, J.-L. Guillaume, R. Lambiotte, and E. Lefebvre, “Fast unfolding of communities in large networks,” J. Statist. Mech.: Theory Exp., vol. 2008, no. 10, p. P10008, 2008.
[6]
U. Brandes, D. Delling, M. Gaertler, R. Görke, M. Hoefer, Z. Nikoloski, and D. Wagner, “On modularity clustering,” IEEE Trans. Knowl. Data Eng., vol. 20, no. 2, pp. 172–188, Feb. 2008.
[7]
A. Buluç, H. Meyerhenke, I. Safro, P. Sanders, and C. Schulz. (2014). “Recent advances in graph partitioning, ” arXiv preprint arXiv:1311.3144 [Online]. Available: http://arxiv.org/abs/1311.3144
[8]
A. Clauset, M. E. Newman, and C. Moore, “Finding community structure in very large networks,” Phys. Rev. E, vol. 70, no. 6, p. 066111, 2004.
[9]
B. O. Fagginger Auer and R. H. Bisseling, “Graph coarsening and clustering on the GPU,” in Graph Partitioning and Graph Clustering, ser. Contemporary Mathematics, D. A. Bader, H. Meyerhenke, P. Sanders, and D. Wagner, Eds. Providence, RI, USA: AMS, 2013, no. 588.
[10]
S. Fortunato, “Community detection in graphs,” Phy. Rep., vol. 486, no. 3-5, pp. 75–174, 2010.
[11]
S. Fortunato and M. Barthelemy, “Resolution limit in community detection,” Proc. Nat. Acad. Sci. USA, vol. 104, no. 1, pp. 36–41, 2007.
[12]
U. Gargi, W. Lu, V. S. Mirrokni, and S. Yoon, “Large-scale community detection on youtube for topic discovery and exploration,” in Proc. 5th Int. Conf. Weblogs Soc. Media, 2011, pp. 486–489.
[13]
J. R. Gilbert, S. Reinhardt, and V. B. Shah, “ High-performance graph algorithms from parallel sparse matrices,” in Proc. 8th Int. Conf. Appl. Parallel Comput. State Art Sci. Comput., 2007, pp. 260–269.
[14]
M. Girvan and M. Newman, “Community structure in social and biological networks,” Proc. Nat. Acad. Sci. USA, vol. 99, no. 12, p. 7821, 2002.
[15]
P. F. Jonsson, T. Cavanna, D. Zicha, and P. A. Bates, “Cluster analysis of networks generated through homology: Automatic identification of important protein communities involved in cancer metastasis, ” BMC Bioinformat., vol. 7, p. 2, 2006.
[16]
K. Kothapalli, S. Pemmaraju, and V. Sardeshmukh, “On the analysis of a label propagation algorithm for community detection,” in Distributed Computing and Networking, ser. Lecture Notes in Computer Science, D. Frey, M. Raynal, S. Sarkar, R. Shyamasundar, and P. Sinha, Eds. New York, NY, USA: Springer, 2013, vol. 7730, pp. 255–269.
[17]
D. Krioukov, F. Papadopoulos, M. Kitsak, A. Vahdat, and M. Boguñá, “Hyperbolic geometry of complex networks, ” Phys. Rev. E, vol. 82, p. 036106, Sep 2010.
[18]
R. Lambiotte, “Multi-scale modularity in complex networks,” in Proc. 8th Int. Symp. Model Optim. Mobile, Ad Hoc Wireless Netw., 2010, pp. 546 –553.
[19]
A. Lancichinetti, S. Fortunato, and F. Radicchi, “Benchmark graphs for testing community detection algorithms,” Phys. Rev. E, vol. 78, no. 4, p. 046110, 2008.
[20]
D. LaSalle and G. Karypis, “Multi-threaded graph partitioning,” in Proc. 27th IEEE Int. Symp. Parallel Distrib. Process., 2013, pp. 225–236.
[21]
D. LaSalle and G. Karypis. (2014). “Multi-threaded modularity based graph clustering using the multilevel paradigm, ” J. Parallel Distrib. Comput. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S0743731514001750
[22]
M. von Looz, C. L. Staudt, H. Meyerhenke, and R. Prutkin. (2014, Nov.). Fast generation of dynamic complex networks with underlying hyperbolic geometry. Karlsruhe Institute of Technology, Karlsruhe, Germany, Karlsruhe Reports in Informatics 2014,14 [Online]. Available: http://digbib.ubka.uni-karlsruhe.de/volltexte/1000043881
[23]
P. D. Meo, E. Ferrara, G. Fiumara, and A. Provetti, “Mixing local and global information for community detection in large networks,” J. Comput. Syst. Sci., vol. 80, no. 1, pp. 72–87, 2014.
[24]
H. Meyerhenke, P. Sanders, and C. Schulz, “Parallel graph partitioning for complex networks,” in Proc. 29th IEEE Int. Symp. Parallel Distrib. Process., 2015.
[25]
M. Müller-Hannemann and S. Schirra, Eds., Algorithm Engineering: Bridging the Gap between Algorithm Theory and Practice, ser. Lecture Notes in Computer Science. New York, NY, USA: Springer, 2010, vol. 5971.
[26]
M. Ovelgönne, “Distributed community detection in web-scale networks, ” in Proc. Adv. Soc. Netw. Anal. Min., 2013, pp. 66 –73.
[27]
M. Ovelgönne and A. Geyer-Schulz, “An ensemble learning strategy for graph clustering,” in Graph Partitioning and Graph Clustering, ser. Contemporary Mathematics, D. A. Bader, H.  Meyerhenke, P. Sanders, and D. Wagner, Eds. Providence, RI, USA: AMS, 2013, no. 588.
[28]
U. N. Raghavan, R. Albert, and S. Kumara, “Near linear time algorithm to detect community structures in large-scale networks,” Phys. Rev. E , vol. 76, no. 3, p. 036106, 2007.
[29]
E. J. Riedy and D. A. Bader, “Multithreaded community monitoring for massive streaming graph data,” in Proc. Workshop Multithreaded Archit. Appl., 2013, pp. 1646– 1655.
[30]
E. J. Riedy, H. Meyerhenke, D. Ediger, and D. A. Bader, “Parallel community detection for massive graphs,” in Graph Partitioning and Graph Clustering, ser. Contemporary Mathematics, D. A. Bader, H. Meyerhenke, P. Sanders, and D. Wagner, Eds. Providence, RI, USA: AMS, 2013, no. 588.
[31]
R. Rotta and A. Noack, “Multilevel local search algorithms for modularity clustering,” J. Exp. Algorithmics, vol. 16, pp. 2.3:2.1–2.3:2.27, Jul. 2011.
[32]
S. E. Schaeffer, “Graph clustering,” Comput. Sci. Rev. , vol. 1, no. 1, pp. 27–64, 2007.
[33]
J. Soman and A. Narang, “Fast community detection algorithm with GPUs and multicore architectures,” in Proc. 25th IEEE Intl. Parallel Distrib. Process. Symp., 2011, pp. 568 –579.
[34]
C. Staudt, A. Schumm, H. Meyerhenke, R. Görke, and D. Wagner, “Static and dynamic aspects of scientific collaboration networks, ” in Proc. IEEE/ACM Int. Conf. Adv. Soc. Netw. Anal. Min., 2012, pp. 522–526.
[35]
C. L. Staudt and H. Meyerhenke, “Engineering high-performance community detection heuristics for massive graphs, ” in Proc. Int. Conf. Parallel Process., 2013, pp. 180– 189.
[36]
C. L. Staudt, A. Sazonovs, and H. Meyerhenke. (2014). “ NetworKit: An interactive tool suite for high-performance network analysis,” arXiv preprint arXiv:1403.3005 [Online]. Available: http://arxiv.org/abs/1403.3005
[37]
J. Yang and J. Leskovec, “Defining and evaluating network communities based on ground-truth,” in Proc. ACM SIGKDD Workshop Min. Data Semantics, 2012, p. 3.

Cited By

View all
  • (2024)Scalable High-Quality Hypergraph PartitioningACM Transactions on Algorithms10.1145/362652720:1(1-54)Online publication date: 22-Jan-2024
  • (2023)Modeling the Centrality of Developer Output with Software Supply ChainsProceedings of the 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering10.1145/3611643.3613873(1809-1819)Online publication date: 30-Nov-2023
  • (2023)Parallel Peeling of Bipartite Networks for Hierarchical Dense Subgraph DiscoveryACM Transactions on Parallel Computing10.1145/358308410:2(1-35)Online publication date: 20-Jun-2023
  • Show More Cited By

Index Terms

  1. Engineering Parallel Algorithms for Community Detection in Massive Networks
          Index terms have been assigned to the content through auto-classification.

          Recommendations

          Comments

          Please enable JavaScript to view thecomments powered by Disqus.

          Information & Contributors

          Information

          Published In

          cover image IEEE Transactions on Parallel and Distributed Systems
          IEEE Transactions on Parallel and Distributed Systems  Volume 27, Issue 1
          Jan. 2016
          304 pages

          Publisher

          IEEE Press

          Publication History

          Published: 01 January 2016

          Qualifiers

          • Research-article

          Contributors

          Other Metrics

          Bibliometrics & Citations

          Bibliometrics

          Article Metrics

          • Downloads (Last 12 months)0
          • Downloads (Last 6 weeks)0
          Reflects downloads up to 19 Nov 2024

          Other Metrics

          Citations

          Cited By

          View all
          • (2024)Scalable High-Quality Hypergraph PartitioningACM Transactions on Algorithms10.1145/362652720:1(1-54)Online publication date: 22-Jan-2024
          • (2023)Modeling the Centrality of Developer Output with Software Supply ChainsProceedings of the 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering10.1145/3611643.3613873(1809-1819)Online publication date: 30-Nov-2023
          • (2023)Parallel Peeling of Bipartite Networks for Hierarchical Dense Subgraph DiscoveryACM Transactions on Parallel Computing10.1145/358308410:2(1-35)Online publication date: 20-Jun-2023
          • (2023)Isolate Sets Based Parallel Louvain Method for Community DetectionJournal of Computer Science and Technology10.1007/s11390-023-1599-138:2(373-390)Online publication date: 30-Mar-2023
          • (2022)Direction-optimizing Label Propagation Framework for Structure Detection in Graphs: Design, Implementation, and Experimental AnalysisACM Journal of Experimental Algorithmics10.1145/356459327(1-31)Online publication date: 13-Dec-2022
          • (2022)Batched Graph Community Detection on GPUsProceedings of the International Conference on Parallel Architectures and Compilation Techniques10.1145/3559009.3569655(172-184)Online publication date: 8-Oct-2022
          • (2022)Generating Simple Directed Social Network Graphs for Information SpreadingProceedings of the ACM Web Conference 202210.1145/3485447.3512194(1475-1485)Online publication date: 25-Apr-2022
          • (2022)Network Sparsification via Degree- and Subgraph-Based Edge SamplingProceedings of the 2022 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining10.1109/ASONAM55673.2022.10068651(9-16)Online publication date: 10-Nov-2022
          • (2022)Scalable distributed Louvain algorithm for community detection in large graphsThe Journal of Supercomputing10.1007/s11227-021-04224-278:7(10275-10309)Online publication date: 1-May-2022
          • (2022)Motif‐based embedding label propagation algorithm for community detectionInternational Journal of Intelligent Systems10.1002/int.2275937:3(1880-1902)Online publication date: 25-Jan-2022
          • Show More Cited By

          View Options

          View options

          Login options

          Media

          Figures

          Other

          Tables

          Share

          Share

          Share this Publication link

          Share on social media