Nothing Special   »   [go: up one dir, main page]

skip to main content
10.1145/1376916.1376928acmconferencesArticle/Chapter ViewAbstractPublication PagespodsConference Proceedingsconference-collections
research-article

Estimating PageRank on graph streams

Published: 09 June 2008 Publication History

Abstract

This study focuses on computations on large graphs (e.g., the web-graph) where the edges of the graph are presented as a stream. The objective in the streaming model is to use small amount of memory (preferably sub-linear in the number of nodes n) and a few passes.
In the streaming model, we show how to perform several graph computations including estimating the probability distribution after a random walk of length l, mixing time, and the conductance. We estimate the mixing time M of a random walk in Õ(nα+Mα√n+√Mn/
α) space and Õ(√Mα) passes. Furthermore, the relation between mixing time and conductance gives us an estimate for the conductance of the graph. By applying our algorithm for computing probability distribution on the web-graph, we can estimate the PageRank p of any node up to an additive error of √εp in Õ(√M/α) passes and Õ(min(nα + 1/ε √M/α + 1/ε Mα, αnMα + 1/ε √M/α)) space, for any α ∈ (0, 1]. In particular, for ε = M/n, by setting α = M--1/2, we can compute the approximate PageRank values in Õ(nM--1/4) space and Õ(M3/4) passes. In comparison, a standard implementation of the PageRank algorithm will take O(n) space and O(M) passes.

References

[1]
N. Alon, Y. Matias, and M. Szegedy. The space complexity of approximating the frequency moments. Journal of Computer and System Sciences, 58(1):137--147, 1999.
[2]
Reid Andersen, Fan R. K. Chung, and Kevin J. Lang. Local graph partitioning using pagerank vectors. In Proc. of the 42nd IEEE Symposium on Foundations of Computer Science (FOCS), pages 475--486, 2006.
[3]
Roy Armoni, Amnon Ta-Shma, Avi Wigderson, and Shiyu Zhou. sll 4/3. In ACM Symposium on Theory of Computing, pages 230--239, 1997.
[4]
Z. Bar-Yossef, R. Kumar, and D. Sivakumar. Reductions in streaming algorithms, with an application to counting triangles in graphs. In In Proc. ACM-SIAM Symposium on Discrete Algorithms, pages 623--632, 2002.
[5]
T. Batu, E. Fischer, L. Fortnow, R. Kumar, R. Rubenfeld, and P. White. Testing random variables for independence and identity. In Proc. of the 42nd IEEE Symposium on Foundations of Computer Science (FOCS), pages 442--451, 2001.
[6]
L. Bhuvanagiri and S. Ganguly. Estimating entropy over data streams. In European Symposium on Algorithms (ESA), pages 148--159, 2006.
[7]
L. Bhuvanagiri, S. Ganguly, D. Kesh, and C. Saha. Simpler algorithm for estimating frequency moments of data streams. In Proc of the 17th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 708--713, 2006.
[8]
Sergey Brin and Lawrence Page. The anatomy of a large-scale hypertextual web search engine. In Proc. 7th international conference on World Wide Web, pages 107--117, 1998.
[9]
Luciana S. Buriol, Gereon Frahling, Stefano Leonardi, Alberto Marchetti-Spaccamela, and Christian Sohle. Counting triangles in data streams. In PODS, pages 253--262, 2006.
[10]
G. Cormode and S. Muthukrishnan. Space efficient mining of multigraph streams. In In ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, PODS, pages 271--282, 2005.
[11]
C. Demetrescu, I. Finocchi, and A. Ribichini. Trading of space for passes in graph streaming problems. In In ACM-SIAM Symposium on Discrete Algorithms, SODA, pages 714--723, 2006.
[12]
Uriel Feige. A spectrum of time-space trade-offs for undirected s-t connectivity. Journal of Computer and System Sciences, 54(2):305--316, 1997.
[13]
J. Feigenbaum, S. Kannan, A McGregor, S Suri, and J. Zhang. Graph distances in the streaming model: the value of space. In In ACM-SIAM Symposium on Discrete Algorithms, SODA, pages 745--754, 2005.
[14]
Jon Feldman, S. Muthukrishnan, Anastasios Sidiropoulos, Cliff Stein, and Zoya Svitkina. On the complexity of processing massive, unordered, distributed data. In CoRR abs/cs/0611108, 2006.
[15]
M. Greenwald and S. Khanna. Space-efficient online computation of quantile summaries. In In ACM International Conference on Management of Data, SIGMOD, pages 58--66, 2001.
[16]
S. Guha and A. McGregor. Approximate quantiles and the order of the stream. In In ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, PODS, pages 273--279, 2006.
[17]
Sudipto Guha and Andrew McGregor. Space-efficient sampling. In In AISTATS, pages 169--176, 2007.
[18]
Sudipto Guha, Andrew McGregor, and Suresh Venkatasubramanian. Streaming and sublinear approximation of entropy and information distances. In In ACM-SIAM Symposium on Discrete Algorithms, SODA, pages 733--742, 2006.
[19]
Sudipto Guha and Andrew McGrgor. Lower bounds for quantile estimation in random-order and multi-pass streaming. In ICALP, pages 704--715, 2007.
[20]
M. Henzinger, P. Raghavan, and S. Rajagopalan. Computing on data streams. In In External Memory Algorithms, DIMACS Series in Discrete Mathematics and Theoretical Computer Science, volume 50, pages 107--118, 1999.
[21]
P. Indyk and D. P. Woodruff. Optimal approximations of the frequency moments of data streams. In IEEE Symposium on Foundations of Computer Science, FOCS, pages 283--292, 2003.
[22]
Piotr Indyk. Algorithms for dynamic geometric problems over data streams. In ACM Symposium on Theory of Computing, STOC, pages 373--380, 2004.
[23]
M. Jerrum and A. Sinclair. Approximating the permanent. SIAM Journal of Computing, 18(6):1149--1178, 1989.
[24]
H. Jowhari and M. Ghodsi. New streaming algorithms for counting triangles in graphs. In In COCOON, pages 710--716, 2005.
[25]
G.S. Manku, S. Rajagopalan, and B.G. Lindsay. Randomized sampling techniques for space efficient online computation of order statistics of large datasets. In In ACM SIGMOD International Conference on Management of Data, pages 251--262, 1999.
[26]
A. McGregor. Finding graph matchings in data streams. In In APPROX-RANDOM, pages 170--181, 2005.
[27]
Frank McSherry. A uniform approach to accelerated pagerank computation. In WWW, pages 575--582, 2005.
[28]
T. Sarlos, A. Benczur, K. Csalogany, D. Fogaras, and B. Racz. To randomize or not to randomize: Space optimal summaries for hyperlink analysis. In In the 15th International World Wide Web Conference, WWW, pages 297--306, 2006.
[29]
John Wicks and Amy R. Greenwald. Parallelizing the computation of pagerank. In Proc. 5th Workshop On Algorithms And Models For The Web-Graph (WAW), pages 202--208, 2007.
[30]
David P. Woodruff. Optimal space lower bounds for all frequency moments'. In In ACM-SIAM Symposium on Discrete Algorithms, SODA, pages 167--175, 2004.

Cited By

View all
  • (2024)Efficient Algorithms for Personalized PageRank Computation: A SurveyIEEE Transactions on Knowledge and Data Engineering10.1109/TKDE.2024.337600036:9(4582-4602)Online publication date: Sep-2024
  • (2022)Fast Nearest Neighbor Search on Large Time-Evolving GraphsMachine Learning and Knowledge Discovery in Databases10.1007/978-3-662-44848-9_2(17-33)Online publication date: 10-Mar-2022
  • (2019)Accelerating parallel graph computing with speculationProceedings of the 16th ACM International Conference on Computing Frontiers10.1145/3310273.3323049(115-124)Online publication date: 30-Apr-2019
  • Show More Cited By

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Conferences
PODS '08: Proceedings of the twenty-seventh ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
June 2008
330 pages
ISBN:9781605581521
DOI:10.1145/1376916
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 09 June 2008

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. PageRank
  2. graph conductance
  3. mixing time
  4. random walk
  5. streaming algorithms

Qualifiers

  • Research-article

Conference

SIGMOD/PODS '08
Sponsor:

Acceptance Rates

PODS '08 Paper Acceptance Rate 28 of 159 submissions, 18%;
Overall Acceptance Rate 642 of 2,707 submissions, 24%

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)3
  • Downloads (Last 6 weeks)0
Reflects downloads up to 13 Nov 2024

Other Metrics

Citations

Cited By

View all
  • (2024)Efficient Algorithms for Personalized PageRank Computation: A SurveyIEEE Transactions on Knowledge and Data Engineering10.1109/TKDE.2024.337600036:9(4582-4602)Online publication date: Sep-2024
  • (2022)Fast Nearest Neighbor Search on Large Time-Evolving GraphsMachine Learning and Knowledge Discovery in Databases10.1007/978-3-662-44848-9_2(17-33)Online publication date: 10-Mar-2022
  • (2019)Accelerating parallel graph computing with speculationProceedings of the 16th ACM International Conference on Computing Frontiers10.1145/3310273.3323049(115-124)Online publication date: 30-Apr-2019
  • (2018)ASAPProceedings of the 13th USENIX conference on Operating Systems Design and Implementation10.5555/3291168.3291224(745-761)Online publication date: 8-Oct-2018
  • (2018)Recent Advancements in Event ProcessingACM Computing Surveys10.1145/317043251:2(1-36)Online publication date: 13-Feb-2018
  • (2018)Processing of RDF Stream DataLinked Data10.1007/978-3-319-73515-3_5(85-108)Online publication date: 2-Mar-2018
  • (2018)Graph Mining on StreamsEncyclopedia of Database Systems10.1007/978-1-4614-8265-9_184(1651-1656)Online publication date: 7-Dec-2018
  • (2017)On sampling from massive graph streamsProceedings of the VLDB Endowment10.14778/3137628.313765110:11(1430-1441)Online publication date: 1-Aug-2017
  • (2016)CASQDProceedings of the 10th ACM International Conference on Distributed and Event-based Systems10.1145/2933267.2933316(226-237)Online publication date: 13-Jun-2016
  • (2016)TornadoProceedings of the 2016 International Conference on Management of Data10.1145/2882903.2882950(417-430)Online publication date: 26-Jun-2016
  • Show More Cited By

View Options

Get Access

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media