Nothing Special   »   [go: up one dir, main page]

skip to main content
research-article
Open access

Mining Largest Maximal Quasi-Cliques

Published: 15 April 2021 Publication History

Abstract

Quasi-cliques are dense incomplete subgraphs of a graph that generalize the notion of cliques. Enumerating quasi-cliques from a graph is a robust way to detect densely connected structures with applications in bioinformatics and social network analysis. However, enumerating quasi-cliques in a graph is a challenging problem, even harder than the problem of enumerating cliques. We consider the enumeration of top-k degree-based quasi-cliques and make the following contributions: (1) we show that even the problem of detecting whether a given quasi-clique is maximal (i.e., not contained within another quasi-clique) is NP-hard. (2) We present a novel heuristic algorithm KernelQC to enumerate the k largest quasi-cliques in a graph. Our method is based on identifying kernels of extremely dense subgraphs within a graph, followed by growing subgraphs around these kernels, to arrive at quasi-cliques with the required densities. (3) Experimental results show that our algorithm accurately enumerates quasi-cliques from a graph, is much faster than current state-of-the-art methods for quasi-clique enumeration (often more than three orders of magnitude faster), and can scale to larger graphs than current methods.

References

[1]
James Abello, Mauricio G.C. Resende, and Sandra Sudarsky. 2002. Massive quasi-clique detection. In Proceedings of the 5th Latin American Symposium on Theoretical Informatics. Springer, 598--612.
[2]
Richard D. Alba. 1973. A graph-theoretic definition of a sociometric clique. Journal of Mathematical Sociology 3 (1973), 113--126.
[3]
Gabriela Alexe, Sorin Alexe, Yves Crama, Stephan Foldes, Peter L. Hammer, and Bruno Simeone. 2004. Consensus algorithms for the generation of all maximal bicliques. Discrete Applied Mathematics 145, 1 (2004), 11--21.
[4]
Reid Andersen and Kumar Chellapilla. 2009. Finding dense subgraphs with size bounds. In Proceedings of the International Workshop on Algorithms and Models for the Web-Graph. Springer, 25--37.
[5]
Albert Angel, Nikos Sarkas, Nick Koudas, and Divesh Srivastava. 2012. Dense subgraph maintenance under streaming edge weight updates for real-time story identification. Proceedings of the VLDB Endowment 5, 6 (2012), 574--585.
[6]
Oana Denisa Balalau, Francesco Bonchi, T.H. Chan, Francesco Gullo, and Mauro Sozio. 2015. Finding subgraphs with maximum total density and limited overlap. In Proceedings of the 8th ACM International Conference on Web Search and Data Mining. ACM, 379--388.
[7]
Balabhaskar Balasundaram, Sergiy Butenko, and Illya V. Hicks. 2011. Clique relaxations in social network analysis: The maximum k-plex problem. Operations Research 59, 1 (2011), 133--142.
[8]
Paul Balister, Béla Bollobás, Julian Sahasrabudhe, and Alexander Veremyev. 2019. Dense subgraphs in random graphs. Discrete Applied Mathematics 260 (2019), 66--74.
[9]
Vladimir Batagelj and Matjaz Zaversnik. 2003. An O (m) algorithm for cores decomposition of networks. arXiv preprint cs/0310049 (2003).
[10]
Devora Berlowitz, Sara Cohen, and Benny Kimelfeld. 2015. Efficient enumeration of maximal k-plexes. In Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data. 431--444.
[11]
Malay Bhattacharyya and Sanghamitra Bandyopadhyay. 2009. Mining the largest quasi-clique in human protein interactome. In Proceedings of the International Conference on Adaptive and Intelligent Systems (ICAIS’09). IEEE, 194--199.
[12]
Brigitte Boden, Stephan Günnemann, Holger Hoffmann, and Thomas Seidl. 2012. Mining coherent subgraphs in multi-layer graphs with edge labels. In Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 1258--1266.
[13]
Coen Bron and Joep Kerbosch. 1973. Algorithm 457: Finding all cliques of an undirected graph. Communications of the ACM 16, 9 (1973), 575--577.
[14]
Mauro Brunato, Holger H. Hoos, and Roberto Battiti. 2007. On effectively finding maximal quasi-cliques in graphs. In Proceedings of the International Conference on Learning and Intelligent Optimization. Springer, 41--55.
[15]
Moses Charikar. 2000. Greedy approximation algorithms for finding dense components in a graph. In Proceedings of the International Workshop on Approximation Algorithms for Combinatorial Optimization. Springer, 84--95.
[16]
Jie Chen and Yousef Saad. 2012. Dense subgraph extraction with application to community detection. IEEE Transactions on Knowledge and Data Engineering 24, 7 (2012), 1216--1230.
[17]
James Cheng, Yiping Ke, Shumo Chu, and M. Tamer Özsu. 2011. Efficient core decomposition in massive networks. In Proceedings of the 2011 IEEE 27th International Conference on Data Engineering (ICDE’11). IEEE, 51--62.
[18]
Norishige Chiba and Takao Nishizeki. 1985. Arboricity and subgraph listing algorithms. SIAM Journal on Computing 14, 1 (1985), 210--223.
[19]
Shumo Chu and James Cheng. 2012. Triangle listing in massive networks. ACM Transactions on Knowledge Discovery from Data 6, 4 (2012), 17.
[20]
Jonathan Cohen. 2008. Trusses: Cohesive subgraphs for social network analysis. National Security Agency Technical Report 16.
[21]
Alessio Conte, Tiziano De Matteis, Daniele De Sensi, Roberto Grossi, Andrea Marino, and Luca Versari. 2018. D2K: Scalable community detection in massive networks via small-diameter k-plexes. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. 1272--1281.
[22]
Alessio Conte, Donatella Firmani, Caterina Mordente, Maurizio Patrignani, and Riccardo Torlone. 2017. Fast enumeration of large k-plexes. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 115--124.
[23]
Alessio Conte, Roberto Grossi, Andrea Marino, and Luca Versari. 2016. Sublinear-space bounded-delay enumeration for massive network analytics: Maximal cliques. In Proceedings of the 43rd International Colloquium on Automata, Languages, and Programming (ICALP’16), Vol. 148. 1--148.
[24]
Apurba Das, Seyed-Vahid Sanei-Mehri, and Srikanta Tirthapura. 2018. Shared-memory parallel maximal clique enumeration. In Proceedings of the 2018 IEEE 25th International Conference on High Performance Computing (HiPC’18). IEEE, 62--71.
[25]
Apurba Das, Seyed-Vahid Sanei-Mehri, and Srikanta Tirthapura. 2020. Shared-memory parallel maximal clique enumeration from static and dynamic graphs. ACM Transactions on Parallel Computing 7, 1 (2020), 1--28.
[26]
Apurba Das, Michael Svendsen, and Srikanta Tirthapura. 2016. Change-sensitive algorithms for maintaining maximal cliques in a dynamic graph. CoRR abs/1601.06311 (2016). http://arxiv.org/abs/1601.06311
[27]
Naga Shailaja Dasari, Ranjan Desh, and Mohammad Zubair. 2014. ParK: An efficient algorithm for k-core decomposition on multicore processors. In Proceedings of the 2014 IEEE International Conference on Big Data (Big Data’14). IEEE, 9--16.
[28]
Youcef Djeddi, Hacene Ait Haddadene, and Nabil Belacel. 2019. An extension of adaptive multi-start tabu search for the maximum quasi-clique problem. Computers & Industrial Engineering 132 (2019), 280--292.
[29]
David Eppstein, Maarten Löffler, and Darren Strash. 2010. Listing all maximal cliques in sparse graphs in near-optimal time. In Proceedings of the International Symposium on Algorithms and Computation. 403--414.
[30]
Linton C. Freeman. 1992. The sociological concept of ”Group”: An empirical test of two models. American Journal of Sociology 98, 1 (1992), 152--166.
[31]
Esther Galbrun, Aristides Gionis, and Nikolaj Tatti. 2016. Top-k overlapping densest subgraphs. Data Mining and Knowledge Discovery 30, 5 (2016), 1134--1165.
[32]
Michael R. Garey and David S. Johnson. 2002. Computers and Intractability. Vol. 29. W.H. Freeman, New York.
[33]
Andrew V. Goldberg. 1984. Finding a Maximum Density Subgraph. University of California Berkeley, CA.
[34]
Stephan Gunnemann, Ines Farber, Brigitte Boden, and Thomas Seidl. 2010. Subspace clustering meets dense subgraph mining: A synthesis of two paradigms. In Proceedings of the 2010 IEEE 10th International Conference on Data Mining (ICDM’10). IEEE, 845--850.
[35]
Guimu Guo, Da Yan, M. Tamer Özsu, and Zhe Jiang. 2020. Scalable mining of maximal quasi-cliques: An algorithm-system codesign approach. arXiv preprint arXiv:2005.00081 (2020).
[36]
Makoto Haraguchi and Yoshiaki Okubo. 2006. A method for pinpoint clustering of web pages with pseudo-clique search. In Federation Over the Web. Springer, 59--78.
[37]
Daxin Jiang and Jian Pei. 2009. Mining frequent cross-graph quasi-cliques. ACM Transactions on Knowledge Discovery from Data 2, 4 (2009), 16.
[38]
Wissam Khaouid, Marina Barsky, Venkatesh Srinivasan, and Alex Thomo. 2015. K-core decomposition of large networks on a single PC. Proceedings of the VLDB Endowment 9, 1 (2015), 13--23.
[39]
Samir Khuller and Barna Saha. 2009. On finding dense subgraphs. In Proceedings of the 36th International Colloquium on Automata, Languages, and Programming. Springer, 597--608.
[40]
Christian Komusiewicz. 2016. Multivariate algorithmics for finding cohesive subnetworks. Algorithms 9, 1 (2016), 21.
[41]
Valdis E. Krebs. 2002. Mapping networks of terrorist cells. Connections 24, 3 (2002), 43--52.
[42]
Pei Lee and Laks V.S. Lakshmanan. 2016. Query-driven maximum quasi-clique search. In Proceedings of the 2016 SIAM International Conference on Data Mining. SIAM, 522--530.
[43]
Guimei Liu, Kelvin Sim, and Jinyan Li. 2006. Efficient mining of large maximal bicliques. In Data Warehousing and Knowledge Discovery. 437--448.
[44]
Guimei Liu and Limsoon Wong. 2008. Effective pruning techniques for mining quasi-cliques. In Proceedings of the Joint European Conference on Machine Learning and Knowledge Discovery in Databases. Springer, 33--49.
[45]
Xiuli Ma, Guangyu Zhou, Jingbo Shang, Jingjing Wang, Jian Peng, and Jiawei Han. 2017. Detection of complexes in biological networks through diversified dense subgraph mining. Journal of Computational Biology 24, 9 (2017), 923--941.
[46]
Kazuhisa Makino and Takeaki Uno. 2004. New algorithms for enumerating all maximal cliques. In Proceedings of the Scandinavian Workshop on Algorithm Theory. 260--272.
[47]
Fabrizio Marinelli, Andrea Pizzuti, and Fabrizio Rossi. 2018. A star-based reformulation for the maximum quasi-clique problem. In Proceedings of the 16th Cologne-Twente Workshop on Graphs and Combinatorial Optimization. 112.
[48]
Hideo Matsuda, Tatsuya Ishihara, and Akihiro Hashimoto. 1999. Classifying molecular sequences using a linkage graph with their pairwise similarities. Theoretical Computer Science 210, 2 (1999), 305--325.
[49]
Diane B. Miller and James P. O’Callaghan. 2015. Biomarkers of Parkinson’s disease: Present and future. Metabolism 64, 3 (2015), S40--S46.
[50]
Michael Mitzenmacher, Jakub Pachocki, Richard Peng, Charalampos Tsourakakis, and Shen Chen Xu. 2015. Scalable large near-clique detection in large-scale networks via sampling. In Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 815--824.
[51]
John W. Moon and Leo Moser. 1965. On cliques in graphs. Israel Journal of Mathematics 3, 1 (1965), 23--28.
[52]
Arko Provo Mukherjee and Srikanta Tirthapura. 2017. Enumerating maximal bicliques from a large graph using mapreduce. IEEE Transactions on Services Computing 10, 5 (2017), 771--784.
[53]
Arko Provo Mukherjee, Pan Xu, and Srikanta Tirthapura. 2017. Enumeration of maximal cliques from an uncertain graph. IEEE Transactions on Knowledge and Data Engineering 29, 3 (2017), 543--555.
[54]
Ha-Myung Park, Francesco Silvestri, Rasmus Pagh, Chin-Wan Chung, Sung-Hyon Myaeng, and U. Kang. 2018. Enumerating trillion subgraphs on distributed systems. ACM Transactions on Knowledge Discovery from Data 12, 6 (2018), 71.
[55]
Grigory Pastukhov, Alexander Veremyev, Vladimir Boginski, and Oleg A. Prokopyev. 2018. On maximum degree-based-quasi-clique problem: Complexity and exact approaches. Networks 71, 2 (2018), 136--152.
[56]
Jeffrey Pattillo, Alexander Veremyev, Sergiy Butenko, and Vladimir Boginski. 2013. On the maximum quasi-clique problem. Discrete Applied Mathematics 161, 1--2 (2013), 244--257.
[57]
Jeffrey Pattillo, Nataly Youssef, and Sergiy Butenko. 2012. Clique relaxation models in social network analysis. In Handbook of Optimization in Complex Networks. Springer, 143--162.
[58]
Jian Pei, Daxin Jiang, and Aidong Zhang. 2005. On mining cross-graph quasi-cliques. In Proceedings of the 11th ACM SIGKDD International Conference on Knowledge Discovery in Data Mining. ACM, 228--238.
[59]
Sergio Rajsbaum. 2002. LATIN 2002: Theoretical Informatics: 5th Latin American Symposium, Cancun, Mexico, April 3-6, 2002, Proceedings. Vol. 5. Springer Science & Business Media.
[60]
Celso C. Ribeiro and José A. Riveaux. 2019. An exact algorithm for the maximum quasi-clique problem. International Transactions in Operational Research 26, 6 (2019), 2199--2229.
[61]
Polina Rozenshtein, Nikolaj Tatti, and Aristides Gionis. 2017. Finding dynamic dense subgraphs. ACM Transactions on Knowledge Discovery from Data 11, 3 (2017), 27.
[62]
Seyed-Vahid Sanei-Mehri, Apurba Das, and Srikanta Tirthapura. 2018. Enumerating top-k quasi-cliques. In Proceedings of the 2018 IEEE International Conference on Big Data (Big Data’18). IEEE, 1107--1112.
[63]
Seyed-Vahid Sanei-Mehri, Ahmet Erdem Sariyuce, and Srikanta Tirthapura. 2018. Butterfly counting in bipartite networks. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. ACM, 2150--2159.
[64]
Arlei Silva, Wagner Meira Jr., and Mohammed J. Zaki. 2012. Mining attribute-structure correlated patterns in large attributed graphs. Proceedings of the VLDB Endowment 5, 5 (2012), 466--477.
[65]
Kelvin Sim, Jinyan Li, Vivekanand Gopalkrishnan, and Guimei Liu. 2006. Mining maximal quasi-bicliques to co-cluster stocks and financial ratios for value investment. In Proceedings of the 6th International Conference on Data Mining (ICDM’06). IEEE, 1059--1063.
[66]
Victor Spirin and Leonid A. Mirny. 2003. Protein complexes and functional modules in molecular networks. 100, 21 (2003), 12123--12128.
[67]
Michael Svendsen, Arko Provo Mukherjee, and Srikanta Tirthapura. 2015. Mining maximal cliques from a large graph using mapreduce: Tackling highly uneven subproble m sizes. Journal of Parallel and Distributed Computing 79 (2015), 104--114.
[68]
Jaya Thomas, Dongmin Seo, and Lee Sael. 2016. Review on graph clustering and subgraph similarity based analysis of neurological disorders. International Journal of Molecular Sciences 17, 6 (2016), 862.
[69]
Etsuji Tomita, Akira Tanaka, and Haruhisa Takahashi. 2006. The worst-case time complexity for generating all maximal cliques and computational experiments. Theoretical Computer Science 363, 1 (2006), 28--42.
[70]
Charalampos Tsourakakis. 2015. The k-clique densest subgraph problem. In Proceedings of the 24th International Conference on World Wide Web. International World Wide Web Conferences Steering Committee, 1122--1132.
[71]
Charalampos Tsourakakis, Francesco Bonchi, Aristides Gionis, Francesco Gullo, and Maria Tsiarli. 2013. Denser than the densest subgraph: Extracting optimal quasi-cliques with quality guarantees. In Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 104--112.
[72]
Charalampos E. Tsourakakis. 2014. A novel approach to finding near-cliques: The triangle-densest subgraph problem. arXiv preprint arXiv:1405.1477 (2014).
[73]
Takeaki Uno. 2007. An efficient algorithm for enumerating pseudo cliques. In Proceedings of the International Symposium on Algorithms and Computation. Springer, 402--414.
[74]
Alexander Veremyev, Vladimir Boginski, Pavlo A. Krokhmal, and David E. Jeffcoat. 2012. Dense percolation in large-scale mean-field random networks is provably “explosive”. PloS One 7, 12 (2012), e51883.
[75]
Chrysafis Vogiatzis, Alexander Veremyev, Eduardo L. Pasiliao, and Panos M. Pardalos. 2015. An integer programming approach for finding the most and the least central cliques. Optimization Letters 9, 4 (2015), 615--633.
[76]
Ling Wang, Yu Lu, Bo Jiang, Kai Tai Gao, and Tie Hua Zhou. 2020. Dense subgraphs summarization: An efficient way to summarize large scale graphs by super nodes. In Proceedings of the International Conference on Intelligent Computing. Springer, 520--530.
[77]
Zhiping Zeng, Jianyong Wang, Lizhu Zhou, and George Karypis. 2007. Out-of-core coherent closed quasi-clique mining from large dense graph databases. ACM Transactions on Database Systems 32, 2 (2007), 13.
[78]
Zhaonian Zou, Jianzhong Li, Hong Gao, and Shuo Zhang. 2010. Finding top-k maximal cliques in an uncertain graph. In Proceedings of the 2010 IEEE 26th International Conference on Data Engineering (ICDE’10). IEEE, 649--652.

Cited By

View all
  • (2024)An optimization algorithm for maximum quasi-clique problem based on information feedback modelPeerJ Computer Science10.7717/peerj-cs.217310(e2173)Online publication date: 12-Jul-2024
  • (2024)Diversity-Optimized Group Extraction in Social NetworksIEEE Transactions on Computational Social Systems10.1109/TCSS.2022.322493511:1(756-769)Online publication date: Feb-2024
  • (2024)DeepDenseExpert Systems with Applications: An International Journal10.1016/j.eswa.2023.121816238:PBOnline publication date: 27-Feb-2024
  • Show More Cited By

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Transactions on Knowledge Discovery from Data
ACM Transactions on Knowledge Discovery from Data  Volume 15, Issue 5
October 2021
508 pages
ISSN:1556-4681
EISSN:1556-472X
DOI:10.1145/3461317
Issue’s Table of Contents
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 15 April 2021
Accepted: 01 December 2020
Revised: 01 November 2020
Received: 01 April 2020
Published in TKDD Volume 15, Issue 5

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. Dense subgraph mining
  2. In-complete subgraphs
  3. Large-scale graphs
  4. Quasi-Clique enumeration
  5. heuristic algorithms

Qualifiers

  • Research-article
  • Research
  • Refereed

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)220
  • Downloads (Last 6 weeks)33
Reflects downloads up to 24 Nov 2024

Other Metrics

Citations

Cited By

View all
  • (2024)An optimization algorithm for maximum quasi-clique problem based on information feedback modelPeerJ Computer Science10.7717/peerj-cs.217310(e2173)Online publication date: 12-Jul-2024
  • (2024)Diversity-Optimized Group Extraction in Social NetworksIEEE Transactions on Computational Social Systems10.1109/TCSS.2022.322493511:1(756-769)Online publication date: Feb-2024
  • (2024)DeepDenseExpert Systems with Applications: An International Journal10.1016/j.eswa.2023.121816238:PBOnline publication date: 27-Feb-2024
  • (2024)A Recursive Approach for Maximal ()-Clique Enumeration in Temporal NetworksAdvances in Databases and Information Systems10.1007/978-3-031-70626-4_6(79-92)Online publication date: 28-Aug-2024
  • (2023)Fast Maximal Quasi-clique Enumeration: A Pruning and Branching Co-Design ApproachProceedings of the ACM on Management of Data10.1145/36173311:3(1-26)Online publication date: 13-Nov-2023
  • (2023)Computing Graph Descriptors on Edge StreamsACM Transactions on Knowledge Discovery from Data10.1145/359146817:8(1-25)Online publication date: 12-May-2023
  • (2023)MIP formulations for induced graph optimization problems: a tutorialInternational Transactions in Operational Research10.1111/itor.1329930:6(3159-3200)Online publication date: 17-Apr-2023
  • (2023)Listing maximal k-relaxed-vertex connected components from large graphsInformation Sciences: an International Journal10.1016/j.ins.2022.11.043620:C(67-83)Online publication date: 1-Jan-2023
  • (2023)A biased random-key genetic algorithm for the minimum quasi-clique partitioning problemAnnals of Operations Research10.1007/s10479-023-05609-7Online publication date: 28-Sep-2023
  • (2022)A local search algorithm with hybrid strategies for the maximum weighted quasi‐clique problemElectronics Letters10.1049/ell2.1268559:1Online publication date: 7-Dec-2022
  • Show More Cited By

View Options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

HTML Format

View this article in HTML Format.

HTML Format

Login options

Full Access

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media