Nothing Special   »   [go: up one dir, main page]

skip to main content
research-article
Public Access

A Structural Result for Personalized PageRank and its Algorithmic Consequences

Published: 19 June 2019 Publication History

Abstract

Many systems, such as the Internet, social networks, and the power grid, can be represented as graphs. When analyzing graphs, it is often useful to compute scores describing the relative importance or distance between nodes. One example is Personalized PageRank (PPR), which assigns to each node v a vector whose i-th entry describes the importance of the i-th node from the perspective of v. PPR has proven useful in many applications, such as recommending who users should follow on social networks (if this i-th entry is large, v may be interested in following the i-th user). Unfortunately, computing n PPR vectors exactly for a graph of n nodes has complexity O(n^3), which is infeasible for many graphs of interest. In this work, we devise a scheme to estimate all n PPR vectors with bounded l_1 error and complexity O(nc), where c < 2 depends on the degrees of the graph at hand, the desired error tolerance, and a parameter that defines PPR. This improves upon existing methods, the best of which have complexity O(n2 łog n) in our setting. Our complexity guarantee holds with high probability, for certain choices of the PPR parameter, and for a certain class of random graphs (roughly speaking, the sparse directed configuration model with heavy-tailed in-degrees); our accuracy guarantee holds with probability 1 and for arbitrary graphs and PPR parameters. The complexity result arises as a consequence of our main (structural) result, which shows that the dimensionality of the set of PPR vectors scales sublinearly in n with high probability, for the same class of random graphs and for a notion of dimensionality similar to matrix rank. It is this coupling of the PPR vectors for the nodes on a common underlying graph that allows for estimating them faster. Hence, at a high level, our scheme is analogous to (but distinct from) low-rank matrix approximation. We also note that our scheme is similar to one that was proposed in [Jeh and Widom 2003] but lacked accuracy and complexity guarantees, so another contribution of our paper is to address this gap in the literature.

References

[1]
David J Aldous, Antar Bandyopadhyay, et almbox. 2005. A survey of max-type recursive distributional equations. The Annals of Applied Probability, Vol. 15, 2 (2005), 1047--1110.
[2]
Brian Amento, Loren Terveen, and Will Hill. 2000. Does "authority" mean quality? Predicting expert quality ratings of Web documents. In Proceedings of the 23rd annual international ACM SIGIR conference on Research and development in information retrieval. ACM, 296--303.
[3]
Reid Andersen, Christian Borgs, Jennifer Chayes, John Hopcroft, Vahab Mirrokni, and Shang-Hua Teng. 2008. Local computation of PageRank contributions. Internet Mathematics, Vol. 5, 1--2 (2008), 23--45.
[4]
Reid Andersen, Fan Chung, and Kevin Lang. 2006. Local graph partitioning using PageRank vectors. In 2006 IEEE Symposium on Foundations of Computer Science. IEEE, 475--486.
[5]
Konstantin Avrachenkov, Arun Kadavankandy, Liudmila Ostroumova Prokhorenkova, and Andrei Raigorodskii. 2015. PageRank in undirected random graphs. In International Workshop on Algorithms and Models for the Web-Graph. Springer, 151--163.
[6]
Konstantin Avrachenkov and Dmitri Lebedev. 2006. PageRank of scale-free growing networks. Internet Mathematics, Vol. 3, 2 (2006), 207--231.
[7]
Konstantin Avrachenkov, Nelly Litvak, Danil Nemirovsky, and Natalia Osipova. 2007. Monte Carlo methods in PageRank computation: When one iteration is sufficient. SIAM J. Numer. Anal., Vol. 45, 2 (2007), 890--904.
[8]
Albert-László Barabási and Réka Albert. 1999. Emergence of scaling in random networks. Science, Vol. 286, 5439 (1999), 509--512.
[9]
Edward A Bender and E Rodney Canfield. 1978. The asymptotic number of labeled graphs with given degree sequences. Journal of Combinatorial Theory, Series A, Vol. 24, 3 (1978), 296--307.
[10]
Paolo Boldi, Massimo Santini, and Sebastiano Vigna. {n. d.}. Laboratory for Web Algorithmics datasets. http://law.di.unimi.it/datasets.php .
[11]
Paolo Boldi, Massimo Santini, and Sebastiano Vigna. 2005. PageRank as a function of the damping factor. In Proceedings of the 14th international conference on World Wide Web. ACM, 557--566.
[12]
Paolo Boldi, Massimo Santini, and Sebastiano Vigna. 2009. PageRank: functional dependencies. ACM Transactions on Information Systems (TOIS), Vol. 27, 4 (2009), 19.
[13]
Paolo Boldi and Sebastiano Vigna. 2004. The WebGraph framework I: Compression techniques. In Proceedings of the 13th International Conference on World Wide Web. ACM, 595--602.
[14]
Béla Bollobás. 1980. A probabilistic proof of an asymptotic formula for the number of labelled regular graphs. European Journal of Combinatorics, Vol. 1, 4 (1980), 311--316.
[15]
Charles Bordenave. {n. d.}. Lecture notes on random graphs and probabilistic combinatorial optimization. https://www.math.univ-toulouse.fr/ bordenave/coursRG.pdf .
[16]
Charles Bordenave, Pietro Caputo, and Justin Salez. 2016. Cutoff at the "entropic time" for sparse Markov chains. Probability Theory and Related Fields (2016), 1--32.
[17]
Charles Bordenave, Pietro Caputo, and Justin Salez. 2018. Random walk on sparse random digraphs. Probability Theory and Related Fields, Vol. 170, 3--4 (2018), 933--960.
[18]
Christian Borgs, Michael Brautbar, Jennifer Chayes, and Shang-Hua Teng. 2014. Multiscale matrix sampling and sublinear-time pagerank computation. Internet Mathematics, Vol. 10, 1--2 (2014), 20--48.
[19]
Moses Charikar and Amit Sahai. 2002. Dimension reduction in the l1 norm. In 2002 IEEE Symposium on Foundations of Computer Science. IEEE, 551--560.
[20]
Ningyuan Chen, Nelly Litvak, and Mariana Olvera-Cravioto. 2014. PageRank in scale-free random graphs. In International Workshop on Algorithms and Models for the Web-Graph. Springer, 120--131.
[21]
Ningyuan Chen, Nelly Litvak, and Mariana Olvera-Cravioto. 2017. Generalized PageRank on directed configuration networks. Random Structures & Algorithms, Vol. 51, 2 (2017), 237--274.
[22]
Ningyuan Chen and Mariana Olvera-Cravioto. 2013. Directed random graphs with given degree distributions. Stochastic Systems, Vol. 3, 1 (2013), 147--186.
[23]
Aaron Clauset, Cosma Rohilla Shalizi, and Mark EJ Newman. 2009. Power-law distributions in empirical data. SIAM Rev., Vol. 51, 4 (2009), 661--703.
[24]
D.P. Dubhashi and A. Panconesi. 2009. Concentration of Measure for the Analysis of Randomized Algorithms .Cambridge University Press. https://books.google.com/books?id=UUohAwAAQBAJ
[25]
Santo Fortunato, Marián Bogu ná, Alessandro Flammini, and Filippo Menczer. 2006. Approximating PageRank from in-degree. In International Workshop on Algorithms and Models for the Web-Graph. Springer, 59--71.
[26]
R.G. Gallager. 2013. Stochastic Processes: Theory for Applications .Cambridge University Press. 2013005146 https://books.google.com/books?id=ERLrAQAAQBAJ
[27]
Alessandro Garavaglia, Remco van der Hofstad, and Nelly Litvak. 2018. Local weak convergence for PageRank. arXiv preprint arXiv:1803.06146 (2018).
[28]
David F Gleich. 2015. PageRank beyond the Web. SIAM Rev., Vol. 57, 3 (2015), 321--363.
[29]
Pankaj Gupta, Ashish Goel, Jimmy Lin, Aneesh Sharma, Dong Wang, and Reza Zadeh. 2013. WTF: The who to follow service at Twitter. In Proceedings of the 22nd international conference on World Wide Web. ACM, 505--514.
[30]
Will Hamilton, Zhitao Ying, and Jure Leskovec. 2017. Inductive representation learning on large graphs. In Advances in Neural Information Processing Systems. 1024--1034.
[31]
Taher H Haveliwala. 2002. Topic-sensitive PageRank. In Proceedings of the 11th International Conference on World Wide Web. ACM, 517--526.
[32]
Piotr Indyk. 2000. Stable distributions, pseudorandom generators, embeddings and data stream computation. In 2000 IEEE Symposium on Foundations of Computer Science. IEEE, 189--197.
[33]
Glen Jeh and Jennifer Widom. 2003. Scaling personalized web search. In Proceedings of the 12th International Conference on World Wide Web. ACM, 271--279.
[34]
A.J. Laub. 2005. Matrix Analysis for Scientists and Engineers .Society for Industrial and Applied Mathematics (SIAM, 3600 Market Street, Floor 6, Philadelphia, PA 19104). https://books.google.com/books?id=nCsLwYJdOlwC
[35]
Daniel D Lee and H Sebastian Seung. 2001. Algorithms for non-negative matrix factorization. In Advances in neural information processing systems. 556--562.
[36]
Jiung Lee and Mariana Olvera-Cravioto. 2017. PageRank on inhomogeneous random digraphs. arXiv preprint arXiv:1707.02492 (2017).
[37]
Jure Leskovec and Andrej Krevl. {n. d.}. SNAP Datasets: Stanford Large Network Dataset Collection. http://snap.stanford.edu/data .
[38]
Nelly Litvak, Werner RW Scheinhardt, and Yana Volkovich. 2007. In-degree and PageRank: Why do they follow similar power laws? Internet Mathematics, Vol. 4, 2--3 (2007), 175--198.
[39]
Peter Lofgren, Siddhartha Banerjee, and Ashish Goel. 2016. Personalized PageRank estimation and search: A bidirectional approach. In Proceedings of the Ninth ACM International Conference on Web Search and Data Mining. ACM, 163--172.
[40]
Peter Lofgren and Ashish Goel. 2013. Personalized pagerank to a target node. arXiv preprint arXiv:1304.4658 (2013).
[41]
Ivan Markovsky. 2011. Low rank approximation: algorithms, implementation, applications .Springer Science & Business Media.
[42]
Michael Molloy and Bruce Reed. 1995. A critical point for random graphs with a given degree sequence. Random structures & algorithms, Vol. 6, 2--3 (1995), 161--180.
[43]
Lawrence Page, Sergey Brin, Rajeev Motwani, and Terry Winograd. 1999. The PageRank citation ranking: Bringing order to the web. Technical Report. Stanford InfoLab.
[44]
Kijung Shin, Jinhong Jung, Sael Lee, and U Kang. 2015. Bear: Block elimination approach for random walk with restart on large graphs. In Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data. ACM, 1571--1585.
[45]
Hanghang Tong, Christos Faloutsos, and Jia-Yu Pan. 2008. Random walk with restart: Fast solutions and applications. Knowledge and Information Systems, Vol. 14, 3 (2008), 327--346.
[46]
Remco van der Hofstad, Gerard Hooghiemstra, and Piet Van Mieghem. 2005. Distances in random graphs with finite variance degrees. Random Structures & Algorithms, Vol. 27, 1 (2005), 76--123.
[47]
Yana Volkovich and Nelly Litvak. 2010. Asymptotic analysis for personalized web search. Advances in Applied Probability, Vol. 42, 2 (2010), 577--604.
[48]
Yana Volkovich, Nelly Litvak, and Debora Donato. 2007. Determining factors behind the PageRank log-log plot. In International Workshop on Algorithms and Models for the Web-Graph. Springer, 108--123.
[49]
Nicholas C Wormald. 1980. Some problems in the enumeration of labelled graphs. Bulletin of the Australian Mathematical Society, Vol. 21, 1 (1980), 159--160.
[50]
Zhanxing Zhu, Zhirong Yang, and Erkki Oja. 2013. Multiplicative updates for learning with stochastic matrices. In Scandinavian Conference on Image Analysis. Springer, 143--152.

Cited By

View all
  • (2022)Red Light Green Light Method for Solving Large Markov ChainsJournal of Scientific Computing10.1007/s10915-022-01976-893:1Online publication date: 1-Oct-2022
  • (2021)Mixing time of PageRank surfers on sparse random digraphsRandom Structures & Algorithms10.1002/rsa.2100959:3(376-406)Online publication date: 12-Apr-2021

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image Proceedings of the ACM on Measurement and Analysis of Computing Systems
Proceedings of the ACM on Measurement and Analysis of Computing Systems  Volume 3, Issue 2
June 2019
683 pages
EISSN:2476-1249
DOI:10.1145/3341617
Issue’s Table of Contents
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 19 June 2019
Published in POMACS Volume 3, Issue 2

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. branching processes
  2. directed configuration model
  3. low-rank approximation
  4. mixing times
  5. personalized pagerank

Qualifiers

  • Research-article

Funding Sources

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)87
  • Downloads (Last 6 weeks)12
Reflects downloads up to 17 Dec 2024

Other Metrics

Citations

Cited By

View all
  • (2022)Red Light Green Light Method for Solving Large Markov ChainsJournal of Scientific Computing10.1007/s10915-022-01976-893:1Online publication date: 1-Oct-2022
  • (2021)Mixing time of PageRank surfers on sparse random digraphsRandom Structures & Algorithms10.1002/rsa.2100959:3(376-406)Online publication date: 12-Apr-2021

View Options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Login options

Full Access

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media