research-article

Public Access

Mining Large Quasi-cliques with Quality Guarantees from Vertex Neighborhoods

Authors:

Nicholas D. SidiropoulosAuthors Info & Claims

KDD '20: Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining

Pages 577 - 587

https://doi.org/10.1145/3394486.3403100

Published: 20 August 2020 Publication History

Abstract

Mining dense subgraphs is an important primitive across a spectrum of graph-mining tasks. In this work, we formally establish that two recurring characteristics of real-world graphs, namely heavy-tailed degree distributions and large clustering coefficients, imply the existence of substantially large vertex neighborhoods with high edge-density. This observation suggests a very simple approach for extracting large quasi-cliques: simply scan the vertex neighborhoods, compute the clustering coefficient of each vertex, and output the best such subgraph. The implementation of such a method requires counting the triangles in a graph, which is a well-studied problem in graph mining. When empirically tested across a number of real-world graphs, this approach reveals a surprise: vertex neighborhoods include maximal cliques of non-trivial sizes, and the density of the best neighborhood often compares favorably to subgraphs produced by dedicated algorithms for maximizing subgraph density. For graphs with small clustering coefficients, we demonstrate that small vertex neighborhoods can be refined using a local-search method to grow larger cliques and near-cliques. Our results indicate that contrary to worst-case theoretical results, mining cliques and quasi-cliques of non-trivial sizes from real-world graphs is often not a difficult problem, and provides motivation for further work geared towards a better explanation of these empirical successes.

References

[1]

2015. Large Near-Clique Detection. http://github.com/tsourolampis/Scalable-Large-Near-Clique-Detection.

[2]

Noga Alon and Joel H Spencer. 2016. The probabilistic method. John Wiley & Sons.

[3]

Albert Angel, Nikos Sarkas, Nick Koudas, and Divesh Srivastava. 2012. Dense subgraph maintenance under streaming edge weight updates for real-time story identification. In Proceedings of the 38th International Conference on Very Large Data Bases. VLDB Endowment, 574--585.

Digital Library

[4]

Gary D Bader and Christopher WV Hogue. 2003. An automated method for finding molecular complexes in large protein interaction networks. BMC Bioinformatics 4, 1 (2003), 2.

[5]

Albert-László Barabási and Réka Albert. 1999. Emergence of scaling in random networks. Science 286, 5439 (1999), 509--512.

[6]

Vladimir Batagelj and Matjaz Zaversnik. 2003. An O(m) algorithm for cores decomposition of networks. arXiv preprint cs/0310049 (2003).

[7]

Coen Bron and Joep Kerbosch. 1973. Algorithm 457: Finding all cliques of an undirected graph. Commun. ACM 16, 9 (1973), 575--577.

Digital Library

[8]

Gregory Buehrer and Kumar Chellapilla. 2008. A scalable pattern mining approach to web graph compression with communities. In Proceedings of the 2008 International Conference on Web Search and Data Mining. ACM, 95--106.

Digital Library

[9]

Jose Cadena, Anil Kumar Vullikanti, and Charu C Aggarwal. 2016. On dense subgraphs in signed network streams. In Proceedings of the 16th IEEE International Conference on Data Mining (ICDM). IEEE, 51--60.

[10]

Moses Charikar. 2000. Greedy approximation algorithms for finding dense components in a graph. In International Workshop on Approximation Algorithms for Combinatorial Optimization. Springer, 84--95.

[11]

Thomas H Cormen, Charles E Leiserson, Ronald L Rivest, and Clifford Stein. 2009. Introduction to algorithms. MIT press.

[12]

Timothy A Davis and Yifan Hu. 2011. The University of Florida sparse matrix collection. ACM Transactions on Mathematical Software (TOMS) 38, 1 (2011), 1.

Digital Library

[13]

David Eppstein, Maarten Löffler, and Darren Strash. 2010. Listing all maximal cliques in sparse graphs in near-optimal time. In International Symposium on Algorithms and Computation. Springer, 403--414.

[14]

Pooya Esfandiar, Francesco Bonchi, David F Gleich, Chen Greif, Laks VS Lakshmanan, and Byung-Won On. 2010. Fast Katz and commuters: Efficient estimation of social relatedness in large networks. In International Workshop on Algorithms and Models for the Web-Graph. Springer, 132--145.

[15]

Michalis Faloutsos, Petros Faloutsos, and Christos Faloutsos. 1999. On power-law relationships of the internet topology. In ACM SIGCOMM Computer Communication Review, Vol. 29. ACM, 251--262.

Digital Library

[16]

Alessandro Ferrante, Gopal Pandurangan, and Kihong Park. 2008. On the hardness of optimization in power-law graphs. Theoretical Computer Science 393, 1--3 (2008), 220--230.

Digital Library

[17]

Michael R Garey and David S Johnson. 2002. Computers and intractability.

[18]

David Gibson, Ravi Kumar, and Andrew Tomkins. 2005. Discovering large dense subgraphs in massive graphs. In Proceedings of the 31st International Conference on Very Large Data Bases. VLDB Endowment, 721--732.

Digital Library

[19]

David F Gleich and C Seshadhri. 2012. Vertex neighborhoods, low conductance cuts, and good seeds for local community methods. In Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 597--605.

Digital Library

[20]

Andrew V Goldberg. 1984. Finding a maximum density subgraph. Technical report, University of California Berkeley, CA.

[21]

Rishi Gupta, Tim Roughgarden, and Comandur Seshadhri. 2014. Decompositions of triangle-dense graphs. In Proceedings of the 5th conference on Innovations in Theoretical Computer Science. ACM, 471--482.

Digital Library

[22]

Bryan Hooi, Hyun Ah Song, Alex Beutel, Neil Shah, Kijung Shin, and Christos Faloutsos. 2016. Fraudar: Bounding graph fraud in the face of camouflage. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 895--904.

Digital Library

[23]

Matthieu Latapy. 2008. Main-memory triangle computations for very large (sparse (power-law)) graphs. Theoretical Computer Science 407, 1--3 (2008), 458--473.

Digital Library

[24]

Jure Leskovec and Andrej Krevl. 2014. SNAP Datasets: Stanford Large Network Dataset Collection. http://snap.stanford.edu/data.

[25]

Zhi-Quan Luo, Wing-Kin Ma, Anthony Man-Cho So, Yinyu Ye, and Shuzhong Zhang. 2010. Semidefinite relaxation of quadratic optimization problems. IEEE Signal Processing Magazine 3, 27 (2010), 20--34.

[26]

Michael Mitzenmacher, Jakub Pachocki, Richard Peng, Charalampos Tsourakakis, and Shen Chen Xu. 2015. Scalable large near-clique detection in large-scale networks via sampling. In Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 815--824.

Digital Library

[27]

Mark Newman. 2018. Networks. Oxford university press.

[28]

N Prulj, Dennis A Wigle, and Igor Jurisica. 2004. Functional topology in a network of protein interactions. Bioinformatics 20, 3 (2004), 340--348.

Digital Library

[29]

Stephen B Seidman. 1983. Network structure and minimum degree. Social networks 5, 3 (1983), 269--287.

[30]

Lei Tang and Huan Liu. 2009. Relational learning via latent social dimensions. In Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 817--826.

Digital Library

[31]

Charalampos Tsourakakis. 2015. The k-clique densest subgraph problem. In Proceedings of the 24th International Conference on World Wide Web. International World Wide Web Conferences Steering Committee, 1122--1132.

Digital Library

[32]

Charalampos Tsourakakis, Francesco Bonchi, Aristides Gionis, Francesco Gullo, and Maria Tsiarli. 2013. Denser than the densest subgraph: extracting optimal quasi-cliques with quality guarantees. In Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 104--112.

Digital Library

[33]

Takeaki Uno. 2005. Maximal Clique Enumerator (MACE). http://research.nii.ac.jp/~uno/codes.htm.

[34]

Bimal Viswanath, Alan Mislove, Meeyoung Cha, and Krishna P Gummadi. 2009. On the evolution of user interaction in Facebook. In Proceedings of the 2nd ACM Workshop on Online social networks. ACM, 37--42.

Digital Library

[35]

Duncan JWatts and Steven H Strogatz. 1998. Collective dynamics of 'small-world' networks. Nature 393, 6684 (1998), 440.

Cited By

Pang JMa CFang YChua TNgo CKa-Wei Lee RKumar RLauw H(2024)A Similarity-based Approach for Efficient Large Quasi-clique DetectionProceedings of the ACM Web Conference 202410.1145/3589334.3645374(401-409)Online publication date: 13-May-2024
https://dl.acm.org/doi/10.1145/3589334.3645374
Chen TTsourakakis CZhang ARangwala H(2022)AntiBenford Subgraphs: Unsupervised Anomaly Detection in Financial NetworksProceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining10.1145/3534678.3539100(2762-2770)Online publication date: 14-Aug-2022
https://dl.acm.org/doi/10.1145/3534678.3539100

Index Terms

Mining Large Quasi-cliques with Quality Guarantees from Vertex Neighborhoods
1. Mathematics of computing
  1. Discrete mathematics
    1. Combinatorics
    2. Graph theory

Recommendations

Mining Largest Maximal Quasi-Cliques

Quasi-cliques are dense incomplete subgraphs of a graph that generalize the notion of cliques. Enumerating quasi-cliques from a graph is a robust way to detect densely connected structures with applications in bioinformatics and social network analysis. ...
Finding Maximal Quasi-cliques Containing a Target Vertex in a Graph
DATA 2015: Proceedings of 4th International Conference on Data Management Technologies and Applications

Many real-world phenomena such as social networks and biological networks can be modeled as graphs. Discovering dense sub-graphs from these graphs may be able to find interesting facts about the phenomena.

Quasi-cliques are a type of dense graphs, which ...
On Vertex-Disjoint Triangles in Tripartite Graphs and Multigraphs
Abstract
Let G be a tripartite graph with tripartition $(V_{1}, V_{2}, V_{3})$ , where $∣ V_{1} ∣ = ∣ V_{2} ∣ = ∣ V_{3} ∣ = k > 0$ . It is proved that if $d (x) + d (y) \geq 3 k$ for every pair of nonadjacent vertices $x \in V_{i}, y \in V_{j}$ with $i \neq j (i, j \in {1, 2, 3})$ , then G contains k vertex-disjoint triangles. As a ... $\frac{}{}$ $_{}_{}_{}$ $_{}_{}_{}$ $_{}$

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Conferences

KDD '20: Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining

August 2020

3664 pages

ISBN:9781450379984

DOI:10.1145/3394486

General Chairs:
Rajesh Gupta
UC San Diego, USA
,
Yan Liu
USC, USA
,
Program Chairs:
Mohak Shah
LG Electronics, USA
,
Suju Rajan
Linkedin, USA
,
Publications Chairs:
Jiliang Tang
Michigan State, USA
,
B. Aditya Prakash
Georgia Tech, USA

Copyright © 2020 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 20 August 2020

Permissions

Request permissions for this article.

Request Permissions

Check for updates

Author Tags

Qualifiers

Research-article

Funding Sources

Conference

KDD '20

Sponsor:

KDD '20: The 26th ACM SIGKDD Conference on Knowledge Discovery and Data Mining

July 6 - 10, 2020

CA, Virtual Event, USA

Acceptance Rates

Overall Acceptance Rate 1,133 of 8,635 submissions, 13%

Contributors

Other Metrics

View Article Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

2
Total Citations
View Citations
423
Total Downloads

Downloads (Last 12 months)77
Downloads (Last 6 weeks)13

Reflects downloads up to 12 Nov 2024

Other Metrics

View Author Metrics

Citations

Cited By

Pang JMa CFang YChua TNgo CKa-Wei Lee RKumar RLauw H(2024)A Similarity-based Approach for Efficient Large Quasi-clique DetectionProceedings of the ACM Web Conference 202410.1145/3589334.3645374(401-409)Online publication date: 13-May-2024
https://dl.acm.org/doi/10.1145/3589334.3645374
Chen TTsourakakis CZhang ARangwala H(2022)AntiBenford Subgraphs: Unsupervised Anomaly Detection in Financial NetworksProceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining10.1145/3534678.3539100(2762-2770)Online publication date: 14-Aug-2022
https://dl.acm.org/doi/10.1145/3534678.3539100

View Options

View options

PDF

View or Download as a PDF file.

eReader

View online with eReader.

Get Access

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

Media

Figures

Other

Tables

View Table of Contents