research-article

HyperANF: approximating the neighbourhood function of very large graphs on a budget

Authors:

Paolo Boldi,

Marco Rosa,

Sebastiano VignaAuthors Info & Claims

WWW '11: Proceedings of the 20th international conference on World wide web

Pages 625 - 634

https://doi.org/10.1145/1963405.1963493

Published: 28 March 2011 Publication History

Get Access

Abstract

The neighbourhood function N_G(t) of a graph G gives, for each t ∈ N, the number of pairs of nodes x, y such that y is reachable from x in less that t hops. The neighbourhood function provides a wealth of information about the graph [10] (e.g., it easily allows one to compute its diameter), but it is very expensive to compute it exactly. Recently, the ANF algorithm [10] (approximate neighbourhood function) has been proposed with the purpose of approximating N_G(t) on large graphs. We describe a breakthrough improvement over ANF in terms of speed and scalability. Our algorithm, called HyperANF, uses the new HyperLogLog counters [5] and combines them efficiently through broadword programming [8]; our implementation uses talk decomposition to exploit multi-core parallelism. With HyperANF, for the first time we can compute in a few hours the neighbourhood function of graphs with billions of nodes with a small error and good confidence using a standard workstation.

Then, we turn to the study of the distribution of the distances between reachable nodes (that can be efficiently approximated by means of HyperANF), and discover the surprising fact that its index of dispersion provides a clear-cut characterisation of proper social networks vs. web graphs. We thus propose the spid (Shortest-Paths Index of Dispersion) of a graph as a new, informative statistics that is able to discriminate between the above two types of graphs. We believe this is the first proposal of a significant new non-local structural index for complex networks whose computation is highly scalable.

References

[1]

Noga Alon, Yossi Matias, and Mario Szegedy. The space complexity of approximating the frequency moments. J. Comput. Syst. Sci, 58(1):137--147, 1999.

Digital Library

Google Scholar

[2]

Paolo Boldi and Sebastiano Vigna. The WebGraph framework I: Compression techniques. In Proc. of the Thirteenth International World Wide Web Conference (WWW 2004), pages 595--601, Manhattan, USA, 2004. ACM Press.

Digital Library

Google Scholar

[3]

Edith Cohen. Size-estimation framework with applications to transitive closure and reachability. J. Comput. Syst. Sci., 55:441--453, 1997.

Digital Library

Google Scholar

[4]

Marianne Durand and Philippe Flajolet. Loglog counting of large cardinalities (extended abstract). In Giuseppe Di Battista and Uri Zwick, editors, Algorithms - ESA 2003, 11th Annual European Symposium, Budapest, Hungary, September 16-19, 2003, Proceedings, volume 2832 of Lecture Notes in Computer Science, pages 605--617. Springer, 2003.

Google Scholar

[5]

Philippe Flajolet, Éric Fusy, Olivier Gandouet, and Frédéric Meunier. Hyperloglog: the analysis of a near-optimal cardinality estimation algorithm. In Proceedings of the 13th conference on analysis of algorithm (AofA 07), pages 127--146, 2007.

Google Scholar

[6]

J. A. Hartigan and P. M. Hartigan. The dip test of unimodality. Ann. Statist., 13(1):70--84, 1985.

Crossref

Google Scholar

[7]

U Kang, Charalampos E. Tsourakakis, Ana Paula Appel, Christos Faloutsos, and Jure Leskovec. HADI: Mining radii of large graphs. ACM Transactions on Knowledge Discovery from Data, 2010.

Digital Library

Google Scholar

[8]

Donald E. Knuth. The Art of Computer Programming. Pre-Fascicle 1A. Draft of Section 7.1.3: Bitwise Tricks and Techniques, 2007.

Google Scholar

[9]

Grzegorz Malewicz, Matthew H. Austern, Aart J.C Bik, James C. Dehnert, Ilan Horn, Naty Leiser, and Grzegorz Czajkowski. Pregel: a system for large-scale graph processing. In SIGMOD '10: Proceedings of the 2010 international conference on Management of data, pages 135--146, New York, NY, USA, 2010. ACM.

Digital Library

Google Scholar

[10]

Christopher R. Palmer, Phillip B. Gibbons, and Christos Faloutsos. Anf: a fast and scalable tool for data mining in massive graphs. In KDD '02: Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining, pages 81--90, New York, NY, USA, 2002. ACM.

Digital Library

Google Scholar

[11]

Sebastiano Vigna. Stanford matrix considered harmful. In Andreas Frommer, Michael W. Mahoney, and Daniel B. Szyld, editors, Web Information Retrieval and Linear Algebra Algorithms, number 07071 in Dagstuhl Seminar Proceedings, 2007. http://arxiv.org/abs/0710.1962.

Google Scholar

[12]

Sebastiano Vigna. Broadword implementation of rank/select queries. In Catherine C. McGeoch, editor, Experimental Algorithms. 7th International Workshop, WEA 2008, number 5038 in Lecture Notes in Computer Science, pages 154--168. Springer-Verlag, 2008.

Digital Library

Google Scholar

[13]

D. F. Vysochanskiĭ and Yu. Ī. Petunīn. Remark: "Proof of the 3σ rule for unimodal distributions" {Teor. Veroyatnost. i Mat. Statist. 21 (1979), 23--35}. Teor. Veroyatnost. i Mat. Statist., 27:26--27, 157, 1982.

Google Scholar

Cited By

View all

Ertl O(2024)UltraLogLog: A Practical and More Space-Efficient Alternative to HyperLogLog for Approximate Distinct CountingProceedings of the VLDB Endowment10.14778/3654621.365463217:7(1655-1668)Online publication date: 1-Mar-2024
https://dl.acm.org/doi/10.14778/3654621.3654632
Azad FCandan KKapkiç ALi MLiu HMandal PSheth PArslan BChowell-Puente GSabo JMuenich RRedondo Anton JSapino M(2024)(Vision Paper) A Vision for Spatio-Causal Situation Awareness, Forecasting, and PlanningACM Transactions on Spatial Algorithms and Systems10.1145/367255610:2(1-42)Online publication date: 1-Jul-2024
https://dl.acm.org/doi/10.1145/3672556
Wang PZhang YCheng KZhao J(2024)A Revisit to Graph Neighborhood Cardinality Estimation2024 IEEE 40th International Conference on Data Engineering (ICDE)10.1109/ICDE60146.2024.00242(3125-3137)Online publication date: 13-May-2024
https://doi.org/10.1109/ICDE60146.2024.00242
Show More Cited By

Recommendations

Reconfiguration graphs of shortest paths
Abstract
For a graph G anda , b ∈ V ( G ), the shortest path reconfiguration graph of G with respect to a andb is denoted by S ( G , a , b ). The vertex set of S ( G , a , b ) is the set of all shortest paths between a andb in G. Two vertices ...
A faster algorithm for computing the girth of planar and bounded genus graphs

The girth of a graph G is the length of a shortest cycle of G. In this article we design an O(n^5/4 log n) algorithm for finding the girth of an undirected n-vertex planar graph, the first o(n²) algorithm for this problem. We also extend our results for ...
Weakly bipartite graphs and the Max-cut problem

A new class of graphs, called weakly bipartite graphs, is introduced. A graph is called weakly bipartite if its bipartite subgraph polytope coincides with a certain polyhedron related to odd cycle constraints. The class of weakly bipartite graphs ...

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

WWW '11: Proceedings of the 20th international conference on World wide web

March 2011

840 pages

ISBN:9781450306324

DOI:10.1145/1963405

General Chairs:
S. Sadagopan
IIIT-Bangalore, India
,
Krithi Ramamritham
IIT-Bombay, India
,
Arun Kumar
IBM Research, India
,
M. P. Ravindra
Infosys E & R, India
,
Program Chairs:
Elisa Bertino
Purdue University, USA
,
Ravi Kumar
Yahoo! Research, USA

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

In-Cooperation

SIGWEB: ACM Special Interest Group on Hypertext, Hypermedia, and Web
The International Institute of Information Technology Bangalore: The International Institute of Information Technology Bangalore

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 28 March 2011

Permissions

Request permissions for this article.

Request Permissions

Check for updates

Author Tags

Qualifiers

Research-article

Conference

WWW '11

WWW '11: 20th International World Wide Web Conference

March 28 - April 1, 2011

Hyderabad, India

Acceptance Rates

Overall Acceptance Rate 1,899 of 8,196 submissions, 23%

Contributors

Other Metrics

View Article Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

90
Total Citations
View Citations
837
Total Downloads

Downloads (Last 12 months)18
Downloads (Last 6 weeks)3

Reflects downloads up to 12 Nov 2024

Other Metrics

View Author Metrics

Citations

Cited By

View all

Ertl O(2024)UltraLogLog: A Practical and More Space-Efficient Alternative to HyperLogLog for Approximate Distinct CountingProceedings of the VLDB Endowment10.14778/3654621.365463217:7(1655-1668)Online publication date: 1-Mar-2024
https://dl.acm.org/doi/10.14778/3654621.3654632
Azad FCandan KKapkiç ALi MLiu HMandal PSheth PArslan BChowell-Puente GSabo JMuenich RRedondo Anton JSapino M(2024)(Vision Paper) A Vision for Spatio-Causal Situation Awareness, Forecasting, and PlanningACM Transactions on Spatial Algorithms and Systems10.1145/367255610:2(1-42)Online publication date: 1-Jul-2024
https://dl.acm.org/doi/10.1145/3672556
Wang PZhang YCheng KZhao J(2024)A Revisit to Graph Neighborhood Cardinality Estimation2024 IEEE 40th International Conference on Data Engineering (ICDE)10.1109/ICDE60146.2024.00242(3125-3137)Online publication date: 13-May-2024
https://doi.org/10.1109/ICDE60146.2024.00242
Pellegrina LVandin F(2023) SILVAN: Estimating Betweenness Centralities with Progressive Sampling and Non-uniform Rademacher BoundsACM Transactions on Knowledge Discovery from Data10.1145/362860118:3(1-55)Online publication date: 9-Dec-2023
https://dl.acm.org/doi/10.1145/3628601
Kim JJeong HLim SKim J(2023)Effective and efficient core computation in signed networksInformation Sciences10.1016/j.ins.2023.03.097634(290-307)Online publication date: Jul-2023
https://doi.org/10.1016/j.ins.2023.03.097
Amati GCruciani APasquini DVocca PAngelini S(2023)propagate: A Seed Propagation Framework to Compute Distance-Based Metrics on Very Large GraphsMachine Learning and Knowledge Discovery in Databases: Research Track10.1007/978-3-031-43418-1_40(671-688)Online publication date: 17-Sep-2023
https://doi.org/10.1007/978-3-031-43418-1_40
Ciaperoni MGionis AKatsamanis AKarras PIves ZBonifati AEl Abbadi A(2022)SIEVE: A Space-Efficient Algorithm for Viterbi DecodingProceedings of the 2022 International Conference on Management of Data10.1145/3514221.3526170(1136-1145)Online publication date: 10-Jun-2022
https://dl.acm.org/doi/10.1145/3514221.3526170
Besta MMiglioli CLabini PTětek JIff PKanakagiri RAshkboos SJanda KPodstawski MKwaśniewski GGleinig NVella FMutlu OHoefler T(2022)ProbGraph: High-Performance and High-Accuracy Graph Mining with Probabilistic Set RepresentationsSC22: International Conference for High Performance Computing, Networking, Storage and Analysis10.1109/SC41404.2022.00048(1-17)Online publication date: Nov-2022
https://doi.org/10.1109/SC41404.2022.00048
Schaefer SGehrig DScaramuzza D(2022)AEGNN: Asynchronous Event-based Graph Neural Networks2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)10.1109/CVPR52688.2022.01205(12361-12371)Online publication date: Jun-2022
https://doi.org/10.1109/CVPR52688.2022.01205
Alorainy WBurnap PLiu HWilliams MGiommoni L(2022)Disrupting networks of hate: characterising hateful networks and removing critical nodesSocial Network Analysis and Mining10.1007/s13278-021-00818-z12:1Online publication date: 31-Jan-2022
https://doi.org/10.1007/s13278-021-00818-z
Show More Cited By

View Options

Get Access

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Cited By

Recommendations

Reconfiguration graphs of shortest paths

A faster algorithm for computing the girth of planar and bounded genus graphs

Weakly bipartite graphs and the Max-cut problem