Nothing Special   »   [go: up one dir, main page]

skip to main content
10.1145/3219819.3220093acmotherconferencesArticle/Chapter ViewAbstractPublication PageskddConference Proceedingsconference-collections
research-article

D2K: Scalable Community Detection in Massive Networks via Small-Diameter k-Plexes

Published: 19 July 2018 Publication History

Abstract

This paper studies k-plexes, a well known pseudo-clique model for network communities. In a k-plex, each node can miss at most k-1 links. Our goal is to detect large communities in today's real-world graphs which can have hundreds of millions of edges. While many have tried, this task has been elusive so far due to its computationally challenging nature: k-plexes and other pseudo-cliques are harder to find and more numerous than cliques, a well known hard problem. We present D2K, which is the first algorithm able to find large k-plexes of very large graphs in just a few minutes. The good performance of our algorithm follows from a combination of graph-theoretical concepts, careful algorithm engineering and a high-performance implementation. In particular, we exploit the low degeneracy of real-world graphs, and the fact that large enough k-plexes have diameter 2. We validate a sequential and a parallel/distributed implementation of D2K on real graphs with up to half a billion edges.

References

[1]
David Avis and Komei Fukuda. Reverse search for enumeration. Discrete Applied Mathematics, 65(1--3):21 -- 46, 1996.
[2]
Rachel Behar and Sara Cohen. Finding all maximal connected s-cliques in social networks. In 21th International Conference on Extending Database Technology, EDBT, pages 61--72, 2018.
[3]
Devora Berlowitz, Sara Cohen, and Benny Kimelfeld. Efficient enumeration of maximal k-plexes. In 2015 ACM SIGMOD International Conference on Management of Data, pages 431--444. ACM, 2015.
[4]
Paolo Boldi, Bruno Codenotti, Massimo Santini, and Sebastiano Vigna. Ubicrawler: A scalable fully distributed web crawler. Software: Practice & Experience, 34(8):711--726, 2004.
[5]
Coen Bron and Joep Kerbosch. Algorithm 457: finding all cliques of an undirected graph. Communications of the ACM, 16(9):575--577, 1973.
[6]
Coenraad Bron and Joep Kerbosch. Finding all cliques of an undirected graph (algorithm 457). Communications of the ACM, 16(9):575--576, 1973.
[7]
James Cheng, Linhong Zhu, Yiping Ke, and Shumo Chu. Fast algorithms for maximal clique enumeration with limited memory. In 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD, pages 1240--1248, 2012.
[8]
Norishige Chiba and Takao Nishizeki. Arboricity and subgraph listing algorithms. SIAM Journal on Computing, 14(1):210--223, 1985.
[9]
Alessio Conte, Donatella Firmani, Caterina Mordente, Maurizio Patrignani, and Riccardo Torlone. Fast enumeration of large k-plexes. In 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD, pages 115--124, 2017.
[10]
Alessio Conte, Roberto Grossi, Andrea Marino, and Luca Versari. Sublinear-space bounded-delay enumeration for massive network analytics: Maximal cliques. In 43rd International Colloquium on Automata, Languages, and Programming, ICALP, pages 148:1--148:15, 2016.
[11]
Alessio Conte, Roberto De Virgilio, Antonio Maccioni, Maurizio Patrignani, and Riccardo Torlone. Finding all maximal cliques in very large social networks. In 19th International Conference on Extending Database Technology, EDBT, pages 173--184, 2016.
[12]
David Eppstein, Maarten Löffler, and Darren Strash. Listing all maximal cliques in large sparse real-world graphs. ACM Journal of Experimental Algorithmics, 18, 2013.
[13]
Santo Fortunato. Community detection in graphs. Physics Reports, 486(3):75 -- 174, 2010.
[14]
Xin Huang, Hong Cheng, Lu Qin, Wentao Tian, and Jeffrey Xu Yu. Querying k-truss community in large and dynamic graphs. In 2014 ACM SIGMOD International Conference on Management of Data, pages 1311--1322. ACM, 2014.
[15]
Guimei Liu and Limsoon Wong. Effective pruning techniques for mining quasi-cliques. In Machine Learning and Knowledge Discovery in Databases, pages 33--49. Springer Berlin Heidelberg, 2008.
[16]
Kazuhisa Makino and Takeaki Uno. New algorithms for enumerating all maximal cliques. In Algorithm Theory-SWAT 2004, pages 260--272. Springer, 2004.
[17]
Robert J Mokken. Cliques, clubs and clans. Quality and quantity, 13(2):161--173, 1979.
[18]
Alberto Montresor, Francesco De Pellegrini, and Daniele Miorandi. Distributed k-core decomposition. IEEE Transactions on parallel and distributed systems, 24(2):288--300, 2013.
[19]
Stephen B Seidman and Brian L Foster. A graph-theoretic generalization of the clique concept. Journal of Mathematical sociology, 6(1):139--154, 1978.
[20]
Etsuji Tomita, Akira Tanaka, and Haruhisa Takahashi. The worst-case time complexity for generating all maximal cliques and computational experiments. Theoretical Computer Science, 363(1):28--42, 2006.
[21]
Charalampos E. Tsourakakis, Francesco Bonchi, Aristides Gionis, Francesco Gullo, and Maria A. Tsiarli. Denser than the densest subgraph: extracting optimal quasi-cliques with quality guarantees. In 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2013, pages 104--112, 2013.
[22]
Shuji Tsukiyama, Mikio Ide, Hiromu Ariyoshi, and Isao Shirakawa. A new algorithm for generating all the maximal independent sets. SIAM Journal on Computing, 6(3):505--517, 1977.
[23]
Takeaki Uno. An efficient algorithm for solving pseudo clique enumeration problem. Algorithmica, 56(1):3--16, 2008.
[24]
Zhuo Wang, Qun Chen, Boyi Hou, Bo Suo, Zhanhuai Li, Wei Pan, and Zachary G. Ives. Parallelizing maximal clique and k-plex enumeration over graph data. Journal of Parallel and Distributed Computing, 106:79 -- 91, 2017.
[25]
Bin Wu and Xin Pei. A parallel algorithm for enumerating all the maximal k-plexes. In Emerging Technologies in Knowledge Discovery and Data Mining, pages 476--483, Berlin, Heidelberg, 2007. Springer Berlin Heidelberg.
[26]
Yanyan Xu, James Cheng, Ada Wai-Chee Fu, and Yingyi Bu. Distributed maximal clique computation. In 2014 IEEE International Congress on Big Data, pages 160--167. IEEE, 2014.
[27]
Hongjie Zhai, Makoto Haraguchi, Yoshiaki Okubo, and Etsuji Tomita. A fast and complete algorithm for enumerating pseudo-cliques in large graphs. International Journal of Data Science and Analytics, 2(3):145--158, Dec 2016.
[28]
Zhaonian Zou, Jianzhong Li, Hong Gao, and Shuo Zhang. Finding top-k maximal cliques in an uncertain graph. In 26th International Conference on Data Engineering, ICDE, pages 649--652. IEEE, 2010.

Cited By

View all
  • (2024)Discovering stable communities in massive temporal graphsProceedings of the 2024 5th International Conference on Computing, Networks and Internet of Things10.1145/3670105.3670186(468-472)Online publication date: 24-May-2024
  • (2024)Maximum k-Plex Computation: Theory and PracticeProceedings of the ACM on Management of Data10.1145/36393182:1(1-26)Online publication date: 26-Mar-2024
  • (2024)Streaming Local Community Detection Through Approximate ConductanceIEEE Transactions on Big Data10.1109/TBDATA.2023.331025110:1(12-22)Online publication date: Feb-2024
  • Show More Cited By

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Other conferences
KDD '18: Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining
July 2018
2925 pages
ISBN:9781450355520
DOI:10.1145/3219819
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 19 July 2018

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. community discovery
  2. graph enumeration
  3. k-plexes
  4. parallel programming

Qualifiers

  • Research-article

Funding Sources

  • EU H2020-ICT-2014-1
  • JST CREST, Japan

Conference

KDD '18
Sponsor:

Acceptance Rates

KDD '18 Paper Acceptance Rate 107 of 983 submissions, 11%;
Overall Acceptance Rate 1,133 of 8,635 submissions, 13%

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)58
  • Downloads (Last 6 weeks)7
Reflects downloads up to 19 Nov 2024

Other Metrics

Citations

Cited By

View all
  • (2024)Discovering stable communities in massive temporal graphsProceedings of the 2024 5th International Conference on Computing, Networks and Internet of Things10.1145/3670105.3670186(468-472)Online publication date: 24-May-2024
  • (2024)Maximum k-Plex Computation: Theory and PracticeProceedings of the ACM on Management of Data10.1145/36393182:1(1-26)Online publication date: 26-Mar-2024
  • (2024)Streaming Local Community Detection Through Approximate ConductanceIEEE Transactions on Big Data10.1109/TBDATA.2023.331025110:1(12-22)Online publication date: Feb-2024
  • (2024)Top-$L$ Most Influential Community Detection Over Social Networks2024 IEEE 40th International Conference on Data Engineering (ICDE)10.1109/ICDE60146.2024.10639540(5767-5779)Online publication date: 13-May-2024
  • (2024)Efficient Maximal Temporal Plex Enumeration2024 IEEE 40th International Conference on Data Engineering (ICDE)10.1109/ICDE60146.2024.00240(3098-3110)Online publication date: 13-May-2024
  • (2024)On Searching Maximum Directed $(k, \ell)$-Plex2024 IEEE 40th International Conference on Data Engineering (ICDE)10.1109/ICDE60146.2024.00202(2570-2583)Online publication date: 13-May-2024
  • (2024)Quantum Algorithms for the Maximum K-Plex Problem2024 IEEE 40th International Conference on Data Engineering (ICDE)10.1109/ICDE60146.2024.00192(2435-2448)Online publication date: 13-May-2024
  • (2024)An Approximation Algorithm for K-best Enumeration of Minimal Connected Edge Dominating Sets with Cardinality ConstraintsTheoretical Computer Science10.1016/j.tcs.2024.114628(114628)Online publication date: May-2024
  • (2024)A local search algorithm with movement gap and adaptive configuration checking for the maximum weighted s-plex problemEngineering Applications of Artificial Intelligence10.1016/j.engappai.2024.108079133(108079)Online publication date: Jul-2024
  • (2023)A fast maximum k-plex algorithm parameterized by the degeneracy gapProceedings of the Thirty-Second International Joint Conference on Artificial Intelligence10.24963/ijcai.2023/627(5648-5656)Online publication date: 19-Aug-2023
  • Show More Cited By

View Options

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media