Nothing Special   »   [go: up one dir, main page]

skip to main content
10.1145/1963405.1963493acmotherconferencesArticle/Chapter ViewAbstractPublication PagesthewebconfConference Proceedingsconference-collections
research-article

HyperANF: approximating the neighbourhood function of very large graphs on a budget

Published: 28 March 2011 Publication History

Abstract

The neighbourhood function NG(t) of a graph G gives, for each tN, the number of pairs of nodes x, y such that y is reachable from x in less that t hops. The neighbourhood function provides a wealth of information about the graph [10] (e.g., it easily allows one to compute its diameter), but it is very expensive to compute it exactly. Recently, the ANF algorithm [10] (approximate neighbourhood function) has been proposed with the purpose of approximating NG(t) on large graphs. We describe a breakthrough improvement over ANF in terms of speed and scalability. Our algorithm, called HyperANF, uses the new HyperLogLog counters [5] and combines them efficiently through broadword programming [8]; our implementation uses talk decomposition to exploit multi-core parallelism. With HyperANF, for the first time we can compute in a few hours the neighbourhood function of graphs with billions of nodes with a small error and good confidence using a standard workstation.
Then, we turn to the study of the distribution of the distances between reachable nodes (that can be efficiently approximated by means of HyperANF), and discover the surprising fact that its index of dispersion provides a clear-cut characterisation of proper social networks vs. web graphs. We thus propose the spid (Shortest-Paths Index of Dispersion) of a graph as a new, informative statistics that is able to discriminate between the above two types of graphs. We believe this is the first proposal of a significant new non-local structural index for complex networks whose computation is highly scalable.

References

[1]
Noga Alon, Yossi Matias, and Mario Szegedy. The space complexity of approximating the frequency moments. J. Comput. Syst. Sci, 58(1):137--147, 1999.
[2]
Paolo Boldi and Sebastiano Vigna. The WebGraph framework I: Compression techniques. In Proc. of the Thirteenth International World Wide Web Conference (WWW 2004), pages 595--601, Manhattan, USA, 2004. ACM Press.
[3]
Edith Cohen. Size-estimation framework with applications to transitive closure and reachability. J. Comput. Syst. Sci., 55:441--453, 1997.
[4]
Marianne Durand and Philippe Flajolet. Loglog counting of large cardinalities (extended abstract). In Giuseppe Di Battista and Uri Zwick, editors, Algorithms - ESA 2003, 11th Annual European Symposium, Budapest, Hungary, September 16-19, 2003, Proceedings, volume 2832 of Lecture Notes in Computer Science, pages 605--617. Springer, 2003.
[5]
Philippe Flajolet, Éric Fusy, Olivier Gandouet, and Frédéric Meunier. Hyperloglog: the analysis of a near-optimal cardinality estimation algorithm. In Proceedings of the 13th conference on analysis of algorithm (AofA 07), pages 127--146, 2007.
[6]
J. A. Hartigan and P. M. Hartigan. The dip test of unimodality. Ann. Statist., 13(1):70--84, 1985.
[7]
U Kang, Charalampos E. Tsourakakis, Ana Paula Appel, Christos Faloutsos, and Jure Leskovec. HADI: Mining radii of large graphs. ACM Transactions on Knowledge Discovery from Data, 2010.
[8]
Donald E. Knuth. The Art of Computer Programming. Pre-Fascicle 1A. Draft of Section 7.1.3: Bitwise Tricks and Techniques, 2007.
[9]
Grzegorz Malewicz, Matthew H. Austern, Aart J.C Bik, James C. Dehnert, Ilan Horn, Naty Leiser, and Grzegorz Czajkowski. Pregel: a system for large-scale graph processing. In SIGMOD '10: Proceedings of the 2010 international conference on Management of data, pages 135--146, New York, NY, USA, 2010. ACM.
[10]
Christopher R. Palmer, Phillip B. Gibbons, and Christos Faloutsos. Anf: a fast and scalable tool for data mining in massive graphs. In KDD '02: Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining, pages 81--90, New York, NY, USA, 2002. ACM.
[11]
Sebastiano Vigna. Stanford matrix considered harmful. In Andreas Frommer, Michael W. Mahoney, and Daniel B. Szyld, editors, Web Information Retrieval and Linear Algebra Algorithms, number 07071 in Dagstuhl Seminar Proceedings, 2007. http://arxiv.org/abs/0710.1962.
[12]
Sebastiano Vigna. Broadword implementation of rank/select queries. In Catherine C. McGeoch, editor, Experimental Algorithms. 7th International Workshop, WEA 2008, number 5038 in Lecture Notes in Computer Science, pages 154--168. Springer-Verlag, 2008.
[13]
D. F. Vysochanskiĭ and Yu. Ī. Petunīn. Remark: "Proof of the 3σ rule for unimodal distributions" {Teor. Veroyatnost. i Mat. Statist. 21 (1979), 23--35}. Teor. Veroyatnost. i Mat. Statist., 27:26--27, 157, 1982.

Cited By

View all
  • (2024)UltraLogLog: A Practical and More Space-Efficient Alternative to HyperLogLog for Approximate Distinct CountingProceedings of the VLDB Endowment10.14778/3654621.365463217:7(1655-1668)Online publication date: 1-Mar-2024
  • (2024)(Vision Paper) A Vision for Spatio-Causal Situation Awareness, Forecasting, and PlanningACM Transactions on Spatial Algorithms and Systems10.1145/367255610:2(1-42)Online publication date: 1-Jul-2024
  • (2024)A Revisit to Graph Neighborhood Cardinality Estimation2024 IEEE 40th International Conference on Data Engineering (ICDE)10.1109/ICDE60146.2024.00242(3125-3137)Online publication date: 13-May-2024
  • Show More Cited By

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Other conferences
WWW '11: Proceedings of the 20th international conference on World wide web
March 2011
840 pages
ISBN:9781450306324
DOI:10.1145/1963405
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

In-Cooperation

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 28 March 2011

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. effective diameter
  2. neighbourhood function
  3. probabilistic counters
  4. shortest paths

Qualifiers

  • Research-article

Conference

WWW '11
WWW '11: 20th International World Wide Web Conference
March 28 - April 1, 2011
Hyderabad, India

Acceptance Rates

Overall Acceptance Rate 1,899 of 8,196 submissions, 23%

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)18
  • Downloads (Last 6 weeks)3
Reflects downloads up to 12 Nov 2024

Other Metrics

Citations

Cited By

View all
  • (2024)UltraLogLog: A Practical and More Space-Efficient Alternative to HyperLogLog for Approximate Distinct CountingProceedings of the VLDB Endowment10.14778/3654621.365463217:7(1655-1668)Online publication date: 1-Mar-2024
  • (2024)(Vision Paper) A Vision for Spatio-Causal Situation Awareness, Forecasting, and PlanningACM Transactions on Spatial Algorithms and Systems10.1145/367255610:2(1-42)Online publication date: 1-Jul-2024
  • (2024)A Revisit to Graph Neighborhood Cardinality Estimation2024 IEEE 40th International Conference on Data Engineering (ICDE)10.1109/ICDE60146.2024.00242(3125-3137)Online publication date: 13-May-2024
  • (2023) SILVAN: Estimating Betweenness Centralities with Progressive Sampling and Non-uniform Rademacher BoundsACM Transactions on Knowledge Discovery from Data10.1145/362860118:3(1-55)Online publication date: 9-Dec-2023
  • (2023)Effective and efficient core computation in signed networksInformation Sciences10.1016/j.ins.2023.03.097634(290-307)Online publication date: Jul-2023
  • (2023)propagate: A Seed Propagation Framework to Compute Distance-Based Metrics on Very Large GraphsMachine Learning and Knowledge Discovery in Databases: Research Track10.1007/978-3-031-43418-1_40(671-688)Online publication date: 17-Sep-2023
  • (2022)SIEVE: A Space-Efficient Algorithm for Viterbi DecodingProceedings of the 2022 International Conference on Management of Data10.1145/3514221.3526170(1136-1145)Online publication date: 10-Jun-2022
  • (2022)ProbGraph: High-Performance and High-Accuracy Graph Mining with Probabilistic Set RepresentationsSC22: International Conference for High Performance Computing, Networking, Storage and Analysis10.1109/SC41404.2022.00048(1-17)Online publication date: Nov-2022
  • (2022)AEGNN: Asynchronous Event-based Graph Neural Networks2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)10.1109/CVPR52688.2022.01205(12361-12371)Online publication date: Jun-2022
  • (2022)Disrupting networks of hate: characterising hateful networks and removing critical nodesSocial Network Analysis and Mining10.1007/s13278-021-00818-z12:1Online publication date: 31-Jan-2022
  • Show More Cited By

View Options

Get Access

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media