Nothing Special   »   [go: up one dir, main page]

skip to main content
10.1145/3188745.3188810acmconferencesArticle/Chapter ViewAbstractPublication PagesstocConference Proceedingsconference-collections
research-article

On approximating the number of k-cliques in sublinear time

Published: 20 June 2018 Publication History

Abstract

We study the problem of approximating the number of k-cliques in a graph when given query access to the graph. We consider the standard query model for general graphs via (1) degree queries, (2) neighbor queries and (3) pair queries. Let n denote the number of vertices in the graph, m the number of edges, and Ck the number of k-cliques. We design an algorithm that outputs a (1+ε)-approximation (with high probability) for Ck, whose expected query complexity and running time are O(n/Ck1/k+mk/2/Ck )(logn, 1/ε,k).
Hence, the complexity of the algorithm is sublinear in the size of the graph for Ck = ω(mk/2−1). Furthermore, we prove a lower bound showing that the query complexity of our algorithm is essentially optimal (up to the dependence on logn, 1/ε and k).
The previous results in this vein are by Feige (SICOMP 06) and by Goldreich and Ron (RSA 08) for edge counting (k=2) and by Eden et al. (FOCS 2015) for triangle counting (k=3). Our result matches the complexities of these results.
The previous result by Eden et al. hinges on a certain amortization technique that works only for triangle counting, and does not generalize for larger cliques. We obtain a general algorithm that works for any k≥ 3 by designing a procedure that samples each k-clique incident to a given set S of vertices with approximately equal probability. The primary difficulty is in finding cliques incident to purely high-degree vertices, since random sampling within neighbors has a low success probability. This is achieved by an algorithm that samples uniform random high degree vertices and a careful tradeoff between estimating cliques incident purely to high-degree vertices and those that include a low-degree vertex.

Supplementary Material

MP4 File (5c-1.mp4)

References

[1]
A. Abboud, A. Backurs, and V. V. Williams. 2015. If the Current Clique Algorithms are Optimal, So is Valiant’s Parser. In Proceedings of the Symposium on Foundations of Computer Science (FOCS). 98–117.
[2]
K. J. Ahn, S. Guha, and A. McGregor. 2012. Graph sketches: sparsification, spanners, and subgraphs. In Proceedings of the Symposium on Principles of Database Systems (PODS). 5–14.
[3]
H. Avron. 2010. Counting triangles in large graphs using randomized matrix trace estimation. In Workshop on Large-scale Data Mining: Theory and Applications (LDMTA), Vol. 10. 10–9.
[4]
L. Becchetti, P. Boldi, C. Castillo, and A. Gionis. 2008. Efficient semi-streaming algorithms for local triangle counting in massive graphs. In Proceedings of the International Conference on Knowledge Discovery and Data Mining (SIGKDD). 16–24.
[5]
J. W. Berry, B. Hendrickson, R. A. LaViolette, and C. A. Phillips. 2011. Tolerating the Community Detection Resolution Limit with Edge Weighting. Physical Review E 83, 5 (May 2011), 056119.
[6]
R. S. Burt. 2004. Structural Holes and Good Ideas. Amer. J. Sociology 110, 2 (2004), 349–399. http://www.jstor.org/stable/10.1086/421787
[7]
B. Chazelle, R. Rubinfeld, and L. Trevisan. 2005. Approximating the Minimum Spanning Tree Weight in Sublinear Time. SIAM J. Comput. 34, 6 (2005), 1370– 1379.
[8]
H. Chernoff. 1952. A measure of asymptotic efficiency for tests of a hypothesis based on the sum of observations. The Annals of Mathematical Statistics (1952), 493–507.
[9]
N. Chiba and T. Nishizeki. 1985.
[10]
Arboricity and subgraph listing algorithms. SIAM J. Comput. 14, 1 (1985), 210–223.
[11]
F. Chierichetti, A. Dasgupta, R. Kumar, S. Lattanzi, and T. Sarlos. 2016.
[12]
On Sampling Nodes in a Network. In Conference on the World Wide Web (WWW). 471–481.
[13]
S. Chu and J. Cheng. 2011. Triangle listing in massive networks and its applications. In Proceedings of the International Conference on Knowledge Discovery and Data Mining (SIGKDD). 672–680.
[14]
J. S. Coleman. 1988. Social Capital in the Creation of Human Capital. Amer. J. Sociology 94 (1988), S95–S120. http://www.jstor.org/stable/2780243
[15]
A. Czumaj, F. Ergün, L. Fortnow, A. Magen, I. Newman, R. Rubinfeld, and C. Sohler. 2005. Approximating the Weight of the Euclidean Minimum Spanning Tree in Sublinear Time. SIAM J. Comput. 35, 1 (2005), 91–109.
[16]
A. Czumaj and C. Sohler. 2009.
[17]
Estimating the Weight of Metric Minimum Spanning Trees in Sublinear Time. SIAM J. Comput. 39, 3 (2009), 904–922.
[18]
A. Dasgupta, R. Kumar, and T. Sarlos. 2014. On Estimating the Average Degree. In Conference on the World Wide Web (WWW). ACM, 795–806.
[19]
J. P. Eckmann and E. Moses. 2002.
[20]
Curvature of Co-links Uncovers Hidden Thematic Layers in the World Wide Web. Proceedings of the National Academy of Sciences 99, 9 (2002), 5825–5829.
[21]
T. Eden, A. Levi, D. Ron, and C Seshadhri. 2015. Approximately counting triangles in sublinear time. In Proceedings of the Symposium on Foundations of Computer Science (FOCS). 614–633.
[22]
T. Eden, D. Ron, and C. Seshadhri. 2017.
[23]
On Approximating the Number of k-cliques in Sublinear Time. CoRR abs/1707.04858v1 (2017). arXiv: 1707.04858v1 http://arxiv.org/abs/1707.04858v1
[24]
T. Eden, D. Ron, and C. Seshadhri. 2017. Sublinear Time Estimation of Degree Distribution Moments: The Degeneracy Connection. In 44th International Colloquium on Automata, Languages, and Programming, ICALP 2017, July 10-14, 2017, Warsaw, Poland. 7:1–7:13.
[25]
T. Eden and W. Rosenbaum. 2017.
[26]
Lower Bounds for Approximating Graph Parameters via Communication Complexity. CoRR abs/1709.04262 (2017).
[27]
arXiv: 1709.04262 http://arxiv.org/abs/1709.04262
[28]
T. Eden and W. Rosenbaum. 2018.
[29]
On Sampling Edges Almost Uniformly. In 1st Symposium on Simplicity in Algorithms, SOSA 2018, January 7-10, 2018, New Orleans, LA, USA. 7:1–7:9.
[30]
F. Eisenbrand and F. Grandoni. 2004. On the complexity of fixed parameter clique and dominating set. Theoretical Computer Science 326, 1-3 (2004), 57–67.
[31]
D. Eppstein, M. Löffler, and D. Strash. 2013. Listing All Maximal Cliques in Large Sparse Real-World Graphs. ACM Journal of Experimental Algorithmics 18 (2013), 3–1.
[32]
U. Feige. 2006.
[33]
On sums of independent random variables with unbounded variance and estimating the average degree in a graph. SIAM J. Comput. 35, 4 (2006), 964–984.
[34]
I. Finocchi, M. Finocchi, and E. G. Fusco. 2015. Clique Counting in MapReduce: Algorithms and Experiments. ACM Journal of Experimental Algorithmics 20 (2015), 1–7.
[35]
B. Foucault Welles, A. Van Devender, and N. Contractor. 2010.
[36]
Is a friend a friend?: Investigating the structure of friendship networks in virtual worlds. In CHI Extended Abstracts on Human Factors in Computing Systems. 4027–4032.
[37]
O. Goldreich. 2017.
[38]
Introduction to Property Testing. Cambridge University Press.
[39]
O. Goldreich and D. Ron. 2008. Approximating average parameters of graphs. Random Structures and Algorithms 32, 4 (2008), 473–493.
[40]
M. Gonen, D. Ron, and Y. Shavitt. 2011. Counting stars and other small subgraphs in sublinear-time. SIAM Journal on Discrete Mathematics 25, 3 (2011), 1365–1411.
[41]
On Approximating the Number of k-Cliques in Sublinear Time STOC’18, June 25–29, 2018, Los Angeles, CA, USA
[42]
A. Hassidim, J. A. Kelner, H. N. Nguyen, and K. Onak. 2009. Local graph partitions for approximation and testing. In Proceedings of the Symposium on Foundations of Computer Science (FOCS). 22–31.
[43]
P. W. Holland and S. Leinhardt. 1970.
[44]
A method for detecting structure in sociometric data. Amer. J. Sociology 76 (1970), 492–513.
[45]
M. O. Jackson, T. Rodriguez-Barraquer, and X. Tan. 2012.
[46]
Social Capital and Social Quilts: Network Patterns of Favor Exchange. American Economic Review 102, 5 (2012), 1857?1897.
[47]
S. Jain and C. Seshadhri. 2017. A Fast and Provable Method for Estimating Clique Counts Using Turán’s Theorem. In Conference on the World Wide Web (WWW). 441–449.
[48]
H. Jowhari and M. Ghodsi. 2005.
[49]
New streaming algorithms for counting triangles in graphs. In Proceedings of the International Conference Computing and Combinatorics (COCOON). Springer, 710–716.
[50]
D. M. Kane, K. Mehlhorn, T. Sauerwald, and H. Sun. 2012. Counting Arbitrary Subgraphs in Data Streams. In International Colloquium on Automata, Languages, and Programming (ICALP). 598–609.
[51]
M. N. Kolountzakis, G. L. Miller, R. Peng, and C. E. Tsourakakis. 2012. Efficient triangle counting in large graphs via degree-based vertex partitioning. Internet Mathematics 8, 1-2 (2012), 161–185.
[52]
S. Marko and D. Ron. 2009. Approximating the distance to properties in boundeddegree and general sparse graphs. ACM Transactions on Algorithms 5, 2 (2009), 22.
[53]
G. Marsaglia, W. W. Tsang, J. Wang, et al. 2004. Fast generation of discrete random variables. Journal of Statistical Software 11, 3 (2004), 1–11.
[54]
R. Milo, S. Shen-Orr, S. Itzkovitz, N. Kashtan, D. Chklovskii, and U. Alon. 2002.
[55]
Network motifs: simple building blocks of complex networks. Science 298, 5594 (2002), 824–827.
[56]
J. Neštřil and S. Poljak. 1985.
[57]
On the complexity of the subgraph problem. Commentationes Mathematicae Universitatis Carolinae 26, 2 (1985), 415–419.
[58]
H. N. Nguyen and K. Onak. 2008. Constant-time approximation algorithms via local improvements. In Proceedings of the Symposium on Foundations of Computer Science (FOCS). 327–336.
[59]
K. Onak, D. Ron, M. Rosen, and R. Rubinfeld. 2012. A near-optimal sublinear-time algorithm for approximating the minimum vertex cover size. In Proceedings of the Symposium on Discrete Algorithms (SODA). 1123–1131.
[60]
M. Parnas and D. Ron. 2007.
[61]
Approximating the minimum vertex cover in sublinear time and a connection to distributed algorithms. Theoretical Computer Science 381, 1-3 (2007), 183–196.
[62]
Alejandro Portes. 2000. Social Capital: Its Origins and Applications in Modern Sociology. In Knowledge and Social Capital, Eric L. Lesser (Ed.). Butterworth-Heinemann, Boston, 43 – 67. 0- 7506- 7222- 1.50006- 4
[63]
T. Schank and D. Wagner. 2005.
[64]
Approximating Clustering Coefficient and Transitivity. Journal of Graph Algorithms and Applications 9 (2005), 265–275. Issue 2.
[65]
T. Schank and D. Wagner. 2005. Finding, Counting and Listing All Triangles in Large Graphs, an Experimental Study. In Experimental and Efficient Algorithms. 606–609.
[66]
C. Seshadhri, T. G. Kolda, and A. Pinar. 2012. Community structure and scale-free collections of Erdös-Rényi graphs. Physical Review E 85, 5 (May 2012), 056109.
[67]
C. Seshadhri, A. Pinar, and T. G. Kolda. 2013. Fast Triangle Counting through Wedge Sampling. arXiv:1202.5230. In Proceedings of the International Conference on Data Mining (ICDM), Vol. 4. 5. http://arxiv.org/abs/1202.5230
[68]
S. Suri and S. Vassilvitskii. 2011.
[69]
Counting triangles and the curse of the last reducer. In Proceedings of the International Conference on World Wide Web (WWW). 607–614.
[70]
K. Tangwongsan, A. Pavan, and S. Tirthapura. 2013. Parallel Triangle Counting in Massive Streaming Graphs. In Proceedings of the International Conference on Information and Knowledge Management (CIKM). ACM, 781–786.
[71]
C. E. Tsourakakis. 2008. Fast counting of triangles in large real networks without counting: Algorithms and laws. In International Conference on Data Mining (ICDM). 608–617.
[72]
C. E. Tsourakakis. 2015. The K-clique Densest Subgraph Problem. In Proceedings of the International Conference on World Wide Web (WWW). 1122–1132.
[73]
C. E. Tsourakakis, U. Kang, G.L. Miller, and C. Faloutsos. 2009. Doulion: counting triangles in massive graphs with a coin. In Proceedings of the International Conference on Knowledge Discovery and Data Mining (SIGKDD). 837–846.
[74]
C. E. Tsourakakis, M. N. Kolountzakis, and G. L. Miller. 2011. Triangle Sparsifiers. Journal of Graph Algorithms and Applications 15, 6 (2011), 703–726.
[75]
V. Vassilevska. 2009. Efficient algorithms for clique problems. Inform. Process. Lett. 109, 4 (2009), 254–257.
[76]
A. J. Walker. 1974. New fast method for generating discrete random numbers with arbitrary frequency distributions. Electronics Letters 10, 8 (1974), 127–128.
[77]
A. J. Walker. 1977. An efficient method for generating discrete random variables with general distributions. ACM Trans. Math. Software 3, 3 (1977), 253–256.

Cited By

View all
  • (2024)Efficient -Clique Counting on Large Graphs: The Power of Color-Based Sampling ApproachesIEEE Transactions on Knowledge and Data Engineering10.1109/TKDE.2023.331464336:4(1518-1536)Online publication date: Apr-2024
  • (2023)DeMEtRIS: Counting (near)-Cliques by CrawlingProceedings of the Sixteenth ACM International Conference on Web Search and Data Mining10.1145/3539597.3570438(312-320)Online publication date: 27-Feb-2023
  • (2022)Sampling Multiple Nodes in Large NetworksProceedings of the Fifteenth ACM International Conference on Web Search and Data Mining10.1145/3488560.3498383(37-47)Online publication date: 11-Feb-2022
  • Show More Cited By

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Conferences
STOC 2018: Proceedings of the 50th Annual ACM SIGACT Symposium on Theory of Computing
June 2018
1332 pages
ISBN:9781450355599
DOI:10.1145/3188745
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 20 June 2018

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. Approximation algorithms
  2. Counting cliques
  3. Sublinear algorithms

Qualifiers

  • Research-article

Conference

STOC '18
Sponsor:
STOC '18: Symposium on Theory of Computing
June 25 - 29, 2018
CA, Los Angeles, USA

Acceptance Rates

Overall Acceptance Rate 1,469 of 4,586 submissions, 32%

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)31
  • Downloads (Last 6 weeks)7
Reflects downloads up to 20 Nov 2024

Other Metrics

Citations

Cited By

View all
  • (2024)Efficient -Clique Counting on Large Graphs: The Power of Color-Based Sampling ApproachesIEEE Transactions on Knowledge and Data Engineering10.1109/TKDE.2023.331464336:4(1518-1536)Online publication date: Apr-2024
  • (2023)DeMEtRIS: Counting (near)-Cliques by CrawlingProceedings of the Sixteenth ACM International Conference on Web Search and Data Mining10.1145/3539597.3570438(312-320)Online publication date: 27-Feb-2023
  • (2022)Sampling Multiple Nodes in Large NetworksProceedings of the Fifteenth ACM International Conference on Web Search and Data Mining10.1145/3488560.3498383(37-47)Online publication date: 11-Feb-2022
  • (2022)A Lower Bound on Cycle-Finding in Sparse DigraphsACM Transactions on Algorithms10.1145/341797918:4(1-23)Online publication date: 10-Oct-2022
  • (2022)Supernodes: a generalization of the rich-clubJournal of Complex Networks10.1093/comnet/cnab05210:1Online publication date: 13-Jan-2022
  • (2021)Combining Sampling and Synopses with Worst-Case Optimal Runtime and Quality Guarantees for Graph Pattern Cardinality EstimationProceedings of the 2021 International Conference on Management of Data10.1145/3448016.3457246(964-976)Online publication date: 9-Jun-2021
  • (2021)Testing stability properties in graphical hedonic gamesAutonomous Agents and Multi-Agent Systems10.1007/s10458-021-09505-x35:2Online publication date: 7-Jun-2021
  • (2021)On Triangle Estimation Using Tripartite Independent Set QueriesTheory of Computing Systems10.1007/s00224-021-10043-yOnline publication date: 28-May-2021
  • (2020)Ordering heuristics for k-clique listingProceedings of the VLDB Endowment10.14778/3407790.340784313:12(2536-2548)Online publication date: 14-Sep-2020
  • (2020)Edge Estimation with Independent Set OraclesACM Transactions on Algorithms10.1145/340486716:4(1-27)Online publication date: 16-Sep-2020
  • Show More Cited By

View Options

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media