Nothing Special   »   [go: up one dir, main page]

skip to main content
10.1145/3519935.3520059acmconferencesArticle/Chapter ViewAbstractPublication PagesstocConference Proceedingsconference-collections
research-article

Edge sampling and graph parameter estimation via vertex neighborhood accesses

Published: 10 June 2022 Publication History

Abstract

In this paper, we consider the problems from the area of sublinear-time algorithms of edge sampling, edge counting, and triangle counting. Part of our contribution is that we consider three different settings, differing in the way in which one may access the neighborhood of a given vertex. In previous work, people have considered indexed neighbor access, with a query returning the i-th neighbor of a given vertex. Full neighborhood access model, which has a query that returns the entire neighborhood at a unit cost, has recently been considered in the applied community. Between these, we propose hash-ordered neighbor access, inspired by coordinated sampling, where we have a global fully random hash function, and can access neighbors in order of their hash values, paying a constant for each accessed neighbor.
For edge sampling and counting, our new lower bounds are in the most powerful full neighborhood access model. We provide matching upper bounds in the weaker hash-ordered neighbor access model. Our new faster algorithms can be provably implemented efficiently on massive graphs in external memory and with the current APIs for, e.g., Twitter or Wikipedia. For triangle counting, we provide a separation: a better upper bound with full neighborhood access than the known lower bounds with indexed neighbor access. The technical core of our paper is our edge-sampling algorithm on which the other results depend. We now describe our results on the classic problems of edge and triangle counting.
We give an algorithm that uses hash-ordered neighbor access to approximately count edges in time Õ(n/є √m + 1/є2) (compare to the state of the art without hash-ordered neighbor access of Õ(n2m) by Eden, Ron, and Seshadhri [ICALP 2017]). We present an Ω(n/є √m) lower bound for є ≥√m/n in the full neighborhood access model. This improves the lower bound of Ω(n/√є m) by Goldreich and Ron [Rand. Struct. Alg. 2008]) and it matches our new upper bound for є ≥ √m/n. We also show an algorithm that uses the more standard assumption of pair queries (“are the vertices u and v adjacent?”), with time complexity of Õ(n/є √m + 1/є4). This matches our lower bound for є ≥ m1/6/n1/3.
Finally, we focus on triangle counting. For this, we use the full power of the full neighbor access. In the indexed neighbor model, an algorithm that makes Õ(n10/3 T1/3 + min(m,m3/23 T)) queries for T being the number of triangles, is known and this is known to be the best possible up to the dependency on є (Eden, Levi, Ron, and Seshadhri [FOCS 2015]). We improve this significantly to Õ(min(n,nT1/3 + √n m2T)) full neighbor accesses, thus showing that the full neighbor access is fundamentally stronger for triangle counting than the weaker indexed neighbor model. We also give a lower bound, showing that this is the best possible with full neighborhood access, in terms of n,m,T.

References

[1]
AC Armenakis, LE Garey, and RD Gupta. 1985. An adaptation of a root finding method to searching ordered disk files. BIT Numerical Mathematics, 25, 4 (1985), 561–568.
[2]
Sepehr Assadi. 2020. CS 514: Advanced Algorithms II-Sublinear Algorithms 1 Sublinear Time Algorithms for Graphs. Rutgers University. https://www.cs.rutgers.edu/~sa1497/courses/cs514-s20/lec3.pdf
[3]
Omri Ben-Eliezer, Talya Eden, Joel Oren, and Dimitris Fotakis. 2021. Sampling Multiple Nodes in Large Networks: Beyond Random Walks. arxiv:2110.13324.
[4]
Suman K Bera and C Seshadhri. 2020. How to count triangles, without seeing the whole graph. In Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. 306–316.
[5]
Flavio Chiericetti, Anirban Dasgupta, Ravi Kumar, Silvio Lattanzi, and Tamás Sarlós. 2016. On sampling nodes in a network. In Proceedings of the 25th International Conference on World Wide Web. 471–481.
[6]
Flavio Chierichetti and Shahrzad Haddadan. 2018. On the complexity of sampling vertices uniformly from a graph. In Leibniz International Proceedings in Informatics, LIPIcs. 107, Schloss Dagstuhl- Leibniz-Zentrum fur Informatik GmbH, Dagstuhl Publishing. isbn:9783959770767 issn:18688969 https://doi.org/10.4230/LIPIcs.ICALP.2018.149
[7]
Anirban Dasgupta, Ravi Kumar, and Tamas Sarlos. 2014. On estimating the average degree. In Proceedings of the 23rd international conference on World wide web. 795–806.
[8]
Talya Eden, Amit Levi, Dana Ron, and C. Seshadhri. 2015. Approximately Counting Triangles in Sublinear Time. Proceedings - Annual IEEE Symposium on Foundations of Computer Science, FOCS, 2015-Decem (2015), apr, 614–633. https://doi.org/10.1109/FOCS.2015.44 arxiv:1504.00954.
[9]
Talya Eden, Saleet Mossel, and Ronitt Rubinfeld. 2021. Sampling Multiple Edges Efficiently. In Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2021), Mary Wootters and Laura Sanità (Eds.) (Leibniz International Proceedings in Informatics (LIPIcs), Vol. 207). Schloss Dagstuhl – Leibniz-Zentrum für Informatik, Dagstuhl, Germany. 51:1–51:15. isbn:978-3-95977-207-5 issn:1868-8969 https://doi.org/10.4230/LIPIcs.APPROX/RANDOM.2021.51
[10]
Talya Eden, Dana Ron, and Will Rosenbaum. 2019. The arboricity captures the complexity of sampling edges. In Leibniz International Proceedings in Informatics, LIPIcs. 132, Schloss Dagstuhl- Leibniz-Zentrum fur Informatik GmbH, Dagstuhl Publishing. isbn:9783959771092 issn:18688969 https://doi.org/10.4230/LIPIcs.ICALP.2019.52 arxiv:1902.08086.
[11]
Talya Eden, Dana Ron, and Will Rosenbaum. 2019. The Arboricity Captures the Complexity of Sampling Edges. In 46th International Colloquium on Automata, Languages, and Programming (ICALP 2019), Christel Baier, Ioannis Chatzigiannakis, Paola Flocchini, and Stefano Leonardi (Eds.) (Leibniz International Proceedings in Informatics (LIPIcs), Vol. 132). Schloss Dagstuhl–Leibniz-Zentrum fuer Informatik, Dagstuhl, Germany. 52:1–52:14. isbn:978-3-95977-109-2 issn:1868-8969 https://doi.org/10.4230/LIPIcs.ICALP.2019.52
[12]
Talya Eden, Dana Ron, and C. Seshadhri. 2017. Sublinear Time Estimation of Degree Distribution Moments: The Degeneracy Connection. In 44th International Colloquium on Automata, Languages, and Programming (ICALP 2017), Ioannis Chatzigiannakis, Piotr Indyk, Fabian Kuhn, and Anca Muscholl (Eds.) (Leibniz International Proceedings in Informatics (LIPIcs), Vol. 80). Schloss Dagstuhl–Leibniz-Zentrum fuer Informatik, Dagstuhl, Germany. 7:1–7:13. isbn:978-3-95977-041-5 issn:1868-8969 https://doi.org/10.4230/LIPIcs.ICALP.2017.7
[13]
Talya Eden and Will Rosenbaum. 2018. Lower Bounds for Approximating Graph Parameters via Communication Complexity. In Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2018), Eric Blais, Klaus Jansen, José D. P. Rolim, and David Steurer (Eds.) (Leibniz International Proceedings in Informatics (LIPIcs), Vol. 116). Schloss Dagstuhl–Leibniz-Zentrum fuer Informatik, Dagstuhl, Germany. 11:1–11:18. isbn:978-3-95977-085-9 issn:1868-8969 https://doi.org/10.4230/LIPIcs.APPROX-RANDOM.2018.11
[14]
Talya Eden and Will Rosenbaum. 2018. On Sampling Edges Almost Uniformly. In 1st Symposium on Simplicity in Algorithms (SOSA 2018), Raimund Seidel (Ed.) (OpenAccess Series in Informatics (OASIcs), Vol. 61). Schloss Dagstuhl–Leibniz-Zentrum fuer Informatik, Dagstuhl, Germany. 7:1–7:9. isbn:978-3-95977-064-4 issn:2190-6807 https://doi.org/10.4230/OASIcs.SOSA.2018.7
[15]
Uriel Feige. 2004. On sums of independent random variables with unbounded variance, and estimating the average degree in a graph. 594–603. issn:07349025 https://doi.org/10.1145/1007352.1007443
[16]
Oded Goldreich. 2018. Introduction to property testing. Cambridge University Press, Cambridge, United Kingdom ; New York, NY, USA. isbn:9781107194052
[17]
Oded Goldreich and Dana Ron. 2008. Approximating average parameters of graphs. Random Structures and Algorithms, 32, 4 (2008), jul, 473–493. issn:10429832 https://doi.org/10.1002/rsa.20203
[18]
D. G. Horvitz and D. J. Thompson. 1952. A Generalization of Sampling Without Replacement From a Finite Universe. J. Amer. Statist. Assoc., 47, 260 (1952), 663–685. issn:01621459 http://www.jstor.org/stable/2280784
[19]
Alon Itai and Michael Rodeh. 1977. Finding a Minimum Circuit in a Graph. In Proceedings of the Ninth Annual ACM Symposium on Theory of Computing (STOC ’77). Association for Computing Machinery, New York, NY, USA. 1–10. isbn:9781450374095 https://doi.org/10.1145/800105.803390
[20]
John Kallaugher, Andrew McGregor, Eric Price, and Sofya Vorotnikova. 2019. The Complexity of Counting Cycles in the Adjacency List Streaming Model. In Proceedings of the 38th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems (PODS ’19). Association for Computing Machinery, New York, NY, USA. 119–133. isbn:9781450362276 https://doi.org/10.1145/3294052.3319706
[21]
Tali Kaufman, Michael Krivelevich, and Dana Ron. 2004. Tight bounds for testing bipartiteness in general graphs. SIAM Journal on computing, 33, 6 (2004), 1441–1483.
[22]
Mihail N. Kolountzakis, Gary L. Miller, Richard Peng, and Charalampos E. Tsourakakis. 2012. Efficient Triangle Counting in Large Graphs via Degree-Based Vertex Partitioning. Internet Mathematics, 8, 1-2 (2012), 161–185. https://doi.org/10.1080/15427951.2012.625260 arxiv:https://doi.org/10.1080/15427951.2012.625260.
[23]
W Wesley Peterson. 1957. Addressing for random-access storage. IBM journal of Research and Development, 1, 2 (1957), 130–146.
[24]
Seagate. 2014. Seagate® Desktop HDD, ST4000DM000, ST3000DM003. https://www.seagate.com/www-content/product-content/desktop-hdd-fam/en-us/docs/100710254f.pdf
[25]
C. Seshadhri. 2015. A simpler sublinear algorithm for approximating the triangle count. may, arxiv:1505.01927. arxiv:1505.01927
[26]
Toshiba. 2019. Enterprise Hard Drives MG Series. https://www.toshiba-storage.com/wp-content/uploads/2019/09/TOSH_DS_MG_Series_print.pdf

Cited By

View all
  • (2024)Better Sum Estimation via Weighted SamplingACM Transactions on Algorithms10.1145/365003020:3(1-33)Online publication date: 21-Jun-2024
  • (2024)Generic network sparsification via degree- and subgraph-based edge samplingInformation Sciences: an International Journal10.1016/j.ins.2024.121096679:COnline publication date: 1-Sep-2024
  • (2023)An adaptive graph sampling framework for graph analyticsSocial Network Analysis and Mining10.1007/s13278-023-01157-x14:1Online publication date: 6-Dec-2023

Index Terms

  1. Edge sampling and graph parameter estimation via vertex neighborhood accesses

    Recommendations

    Comments

    Please enable JavaScript to view thecomments powered by Disqus.

    Information & Contributors

    Information

    Published In

    cover image ACM Conferences
    STOC 2022: Proceedings of the 54th Annual ACM SIGACT Symposium on Theory of Computing
    June 2022
    1698 pages
    ISBN:9781450392648
    DOI:10.1145/3519935
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

    Sponsors

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 10 June 2022

    Permissions

    Request permissions for this article.

    Check for updates

    Author Tags

    1. Edge counting
    2. Edge sampling
    3. Sublinear-time algorithms
    4. Triangle counting

    Qualifiers

    • Research-article

    Conference

    STOC '22
    Sponsor:

    Acceptance Rates

    Overall Acceptance Rate 1,469 of 4,586 submissions, 32%

    Contributors

    Other Metrics

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • Downloads (Last 12 months)32
    • Downloads (Last 6 weeks)1
    Reflects downloads up to 27 Nov 2024

    Other Metrics

    Citations

    Cited By

    View all
    • (2024)Better Sum Estimation via Weighted SamplingACM Transactions on Algorithms10.1145/365003020:3(1-33)Online publication date: 21-Jun-2024
    • (2024)Generic network sparsification via degree- and subgraph-based edge samplingInformation Sciences: an International Journal10.1016/j.ins.2024.121096679:COnline publication date: 1-Sep-2024
    • (2023)An adaptive graph sampling framework for graph analyticsSocial Network Analysis and Mining10.1007/s13278-023-01157-x14:1Online publication date: 6-Dec-2023

    View Options

    Login options

    View options

    PDF

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader

    Media

    Figures

    Other

    Tables

    Share

    Share

    Share this Publication link

    Share on social media