Nothing Special   »   [go: up one dir, main page]

skip to main content
10.1145/3394486.3403100acmconferencesArticle/Chapter ViewAbstractPublication PageskddConference Proceedingsconference-collections
research-article
Public Access

Mining Large Quasi-cliques with Quality Guarantees from Vertex Neighborhoods

Published: 20 August 2020 Publication History

Abstract

Mining dense subgraphs is an important primitive across a spectrum of graph-mining tasks. In this work, we formally establish that two recurring characteristics of real-world graphs, namely heavy-tailed degree distributions and large clustering coefficients, imply the existence of substantially large vertex neighborhoods with high edge-density. This observation suggests a very simple approach for extracting large quasi-cliques: simply scan the vertex neighborhoods, compute the clustering coefficient of each vertex, and output the best such subgraph. The implementation of such a method requires counting the triangles in a graph, which is a well-studied problem in graph mining. When empirically tested across a number of real-world graphs, this approach reveals a surprise: vertex neighborhoods include maximal cliques of non-trivial sizes, and the density of the best neighborhood often compares favorably to subgraphs produced by dedicated algorithms for maximizing subgraph density. For graphs with small clustering coefficients, we demonstrate that small vertex neighborhoods can be refined using a local-search method to grow larger cliques and near-cliques. Our results indicate that contrary to worst-case theoretical results, mining cliques and quasi-cliques of non-trivial sizes from real-world graphs is often not a difficult problem, and provides motivation for further work geared towards a better explanation of these empirical successes.

References

[1]
2015. Large Near-Clique Detection. http://github.com/tsourolampis/Scalable-Large-Near-Clique-Detection.
[2]
Noga Alon and Joel H Spencer. 2016. The probabilistic method. John Wiley & Sons.
[3]
Albert Angel, Nikos Sarkas, Nick Koudas, and Divesh Srivastava. 2012. Dense subgraph maintenance under streaming edge weight updates for real-time story identification. In Proceedings of the 38th International Conference on Very Large Data Bases. VLDB Endowment, 574--585.
[4]
Gary D Bader and Christopher WV Hogue. 2003. An automated method for finding molecular complexes in large protein interaction networks. BMC Bioinformatics 4, 1 (2003), 2.
[5]
Albert-László Barabási and Réka Albert. 1999. Emergence of scaling in random networks. Science 286, 5439 (1999), 509--512.
[6]
Vladimir Batagelj and Matjaz Zaversnik. 2003. An O(m) algorithm for cores decomposition of networks. arXiv preprint cs/0310049 (2003).
[7]
Coen Bron and Joep Kerbosch. 1973. Algorithm 457: Finding all cliques of an undirected graph. Commun. ACM 16, 9 (1973), 575--577.
[8]
Gregory Buehrer and Kumar Chellapilla. 2008. A scalable pattern mining approach to web graph compression with communities. In Proceedings of the 2008 International Conference on Web Search and Data Mining. ACM, 95--106.
[9]
Jose Cadena, Anil Kumar Vullikanti, and Charu C Aggarwal. 2016. On dense subgraphs in signed network streams. In Proceedings of the 16th IEEE International Conference on Data Mining (ICDM). IEEE, 51--60.
[10]
Moses Charikar. 2000. Greedy approximation algorithms for finding dense components in a graph. In International Workshop on Approximation Algorithms for Combinatorial Optimization. Springer, 84--95.
[11]
Thomas H Cormen, Charles E Leiserson, Ronald L Rivest, and Clifford Stein. 2009. Introduction to algorithms. MIT press.
[12]
Timothy A Davis and Yifan Hu. 2011. The University of Florida sparse matrix collection. ACM Transactions on Mathematical Software (TOMS) 38, 1 (2011), 1.
[13]
David Eppstein, Maarten Löffler, and Darren Strash. 2010. Listing all maximal cliques in sparse graphs in near-optimal time. In International Symposium on Algorithms and Computation. Springer, 403--414.
[14]
Pooya Esfandiar, Francesco Bonchi, David F Gleich, Chen Greif, Laks VS Lakshmanan, and Byung-Won On. 2010. Fast Katz and commuters: Efficient estimation of social relatedness in large networks. In International Workshop on Algorithms and Models for the Web-Graph. Springer, 132--145.
[15]
Michalis Faloutsos, Petros Faloutsos, and Christos Faloutsos. 1999. On power-law relationships of the internet topology. In ACM SIGCOMM Computer Communication Review, Vol. 29. ACM, 251--262.
[16]
Alessandro Ferrante, Gopal Pandurangan, and Kihong Park. 2008. On the hardness of optimization in power-law graphs. Theoretical Computer Science 393, 1--3 (2008), 220--230.
[17]
Michael R Garey and David S Johnson. 2002. Computers and intractability.
[18]
David Gibson, Ravi Kumar, and Andrew Tomkins. 2005. Discovering large dense subgraphs in massive graphs. In Proceedings of the 31st International Conference on Very Large Data Bases. VLDB Endowment, 721--732.
[19]
David F Gleich and C Seshadhri. 2012. Vertex neighborhoods, low conductance cuts, and good seeds for local community methods. In Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 597--605.
[20]
Andrew V Goldberg. 1984. Finding a maximum density subgraph. Technical report, University of California Berkeley, CA.
[21]
Rishi Gupta, Tim Roughgarden, and Comandur Seshadhri. 2014. Decompositions of triangle-dense graphs. In Proceedings of the 5th conference on Innovations in Theoretical Computer Science. ACM, 471--482.
[22]
Bryan Hooi, Hyun Ah Song, Alex Beutel, Neil Shah, Kijung Shin, and Christos Faloutsos. 2016. Fraudar: Bounding graph fraud in the face of camouflage. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 895--904.
[23]
Matthieu Latapy. 2008. Main-memory triangle computations for very large (sparse (power-law)) graphs. Theoretical Computer Science 407, 1--3 (2008), 458--473.
[24]
Jure Leskovec and Andrej Krevl. 2014. SNAP Datasets: Stanford Large Network Dataset Collection. http://snap.stanford.edu/data.
[25]
Zhi-Quan Luo, Wing-Kin Ma, Anthony Man-Cho So, Yinyu Ye, and Shuzhong Zhang. 2010. Semidefinite relaxation of quadratic optimization problems. IEEE Signal Processing Magazine 3, 27 (2010), 20--34.
[26]
Michael Mitzenmacher, Jakub Pachocki, Richard Peng, Charalampos Tsourakakis, and Shen Chen Xu. 2015. Scalable large near-clique detection in large-scale networks via sampling. In Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 815--824.
[27]
Mark Newman. 2018. Networks. Oxford university press.
[28]
N Prulj, Dennis A Wigle, and Igor Jurisica. 2004. Functional topology in a network of protein interactions. Bioinformatics 20, 3 (2004), 340--348.
[29]
Stephen B Seidman. 1983. Network structure and minimum degree. Social networks 5, 3 (1983), 269--287.
[30]
Lei Tang and Huan Liu. 2009. Relational learning via latent social dimensions. In Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 817--826.
[31]
Charalampos Tsourakakis. 2015. The k-clique densest subgraph problem. In Proceedings of the 24th International Conference on World Wide Web. International World Wide Web Conferences Steering Committee, 1122--1132.
[32]
Charalampos Tsourakakis, Francesco Bonchi, Aristides Gionis, Francesco Gullo, and Maria Tsiarli. 2013. Denser than the densest subgraph: extracting optimal quasi-cliques with quality guarantees. In Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 104--112.
[33]
Takeaki Uno. 2005. Maximal Clique Enumerator (MACE). http://research.nii.ac.jp/~uno/codes.htm.
[34]
Bimal Viswanath, Alan Mislove, Meeyoung Cha, and Krishna P Gummadi. 2009. On the evolution of user interaction in Facebook. In Proceedings of the 2nd ACM Workshop on Online social networks. ACM, 37--42.
[35]
Duncan JWatts and Steven H Strogatz. 1998. Collective dynamics of 'small-world' networks. Nature 393, 6684 (1998), 440.

Cited By

View all
  • (2024)A Similarity-based Approach for Efficient Large Quasi-clique DetectionProceedings of the ACM Web Conference 202410.1145/3589334.3645374(401-409)Online publication date: 13-May-2024
  • (2022)AntiBenford Subgraphs: Unsupervised Anomaly Detection in Financial NetworksProceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining10.1145/3534678.3539100(2762-2770)Online publication date: 14-Aug-2022

Index Terms

  1. Mining Large Quasi-cliques with Quality Guarantees from Vertex Neighborhoods

      Recommendations

      Comments

      Please enable JavaScript to view thecomments powered by Disqus.

      Information & Contributors

      Information

      Published In

      cover image ACM Conferences
      KDD '20: Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining
      August 2020
      3664 pages
      ISBN:9781450379984
      DOI:10.1145/3394486
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Sponsors

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      Published: 20 August 2020

      Permissions

      Request permissions for this article.

      Check for updates

      Author Tags

      1. clustering coefficients
      2. neighborhoods
      3. quasi-cliques
      4. triangles

      Qualifiers

      • Research-article

      Funding Sources

      Conference

      KDD '20
      Sponsor:

      Acceptance Rates

      Overall Acceptance Rate 1,133 of 8,635 submissions, 13%

      Contributors

      Other Metrics

      Bibliometrics & Citations

      Bibliometrics

      Article Metrics

      • Downloads (Last 12 months)77
      • Downloads (Last 6 weeks)13
      Reflects downloads up to 12 Nov 2024

      Other Metrics

      Citations

      Cited By

      View all
      • (2024)A Similarity-based Approach for Efficient Large Quasi-clique DetectionProceedings of the ACM Web Conference 202410.1145/3589334.3645374(401-409)Online publication date: 13-May-2024
      • (2022)AntiBenford Subgraphs: Unsupervised Anomaly Detection in Financial NetworksProceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining10.1145/3534678.3539100(2762-2770)Online publication date: 14-Aug-2022

      View Options

      View options

      PDF

      View or Download as a PDF file.

      PDF

      eReader

      View online with eReader.

      eReader

      Get Access

      Login options

      Media

      Figures

      Other

      Tables

      Share

      Share

      Share this Publication link

      Share on social media