Nothing Special   »   [go: up one dir, main page]

skip to main content
10.1145/3183713.3196920acmconferencesArticle/Chapter ViewAbstractPublication PagesmodConference Proceedingsconference-collections
research-article

TopPPR: Top-k Personalized PageRank Queries with Precision Guarantees on Large Graphs

Published: 27 May 2018 Publication History

Abstract

Personalized PageRank (PPR) is a classic metric that measures the relevance of graph nodes with respect to a source node. Given a graph G, a source node s, and a parameter k, a top-k PPR query returns a set of k nodes with the highest PPR values with respect to s. This type of queries serves as an important building block for numerous applications in web search and social networks, such as Twitter's Who-To-Follow recommendation service. Existing techniques for top-k PPR, however, suffer from two major deficiencies. First, they either incur prohibitive space and time overheads on large graphs, or fail to provide any guarantee on the precision of top-k results (i.e., the results returned might miss a number of actual top-k answers). Second, most of them require significant pre-computation on the input graph G, which renders them unsuitable for graphs with frequent updates (e.g., Twitter's social graph).
To address the deficiencies of existing solutions, we propose PPR, an algorithm for top-k PPR queries that ensure at least ρ precision (i.e., at least ρ fraction of the actual top-k results are returned) with at least 1 - 1/n probability, where ρ ∈;n (0, 1] is a user-specified parameter and n is the number of nodes in G. In addition, PPR offers non-trivial guarantees on query time in terms of ρ, and it can easily handle dynamic graphs as it does not require any preprocessing. We experimentally evaluate PPR using a variety of benchmark datasets, and demonstrate that PPR outperforms the state-of-the-art solutions in terms of both efficiency and precision, even when we set ρ = 1 (i.e., when PPR returns the exact top-k results). Notably, on a billion-edge Twitter graph, PPR only requires 15 seconds to answer a top-500 PPR query with ρ = 1.

References

[1]
https://spark-summit.org/2017/events/random-walks-on-large-scale-graphs-with-apache-spark/.
[2]
http://compbio.case.edu/omics/software/chopper/index.html.
[3]
https://github.com/YubaoWu/FLoS.
[4]
https://datalab.snu.ac.kr/bepi/.
[5]
http://snap.stanford.edu/data/index.html.
[6]
http://law.di.unimi.it/datasets.php.
[7]
Reid Andersen, Christian Borgs, Jennifer T. Chayes, John E. Hopcroft, Vahab S. Mirrokni, and Shang-Hua Teng. Local computation of pagerank contributions. In WAW, pages 150--165, 2007.
[8]
Reid Andersen, Fan R. K. Chung, and Kevin J. Lang. Local graph partitioning using pagerank vectors. In FOCS, pages 475--486, 2006.
[9]
Jean-Yves Audibert, Rémi Munos, and Csaba Szepesvári. Tuning bandit algorithms in stochastic environments. In ALT, volume 4754, pages 150--165. Springer, 2007.
[10]
Lars Backstrom and Jure Leskovec. Supervised random walks: predicting and recommending links in social networks. In WSDM, pages 635--644, 2011.
[11]
Bahman Bahmani, Kaushik Chakrabarti, and Dong Xin. Fast personalized pagerank on mapreduce. In SIGMOD, pages 973--984, 2011.
[12]
Bahman Bahmani, Abdur Chowdhury, and Ashish Goel. Fast incremental and personalized pagerank. VLDB, 4(3):173--184, 2010.
[13]
Soumen Chakrabarti. Dynamic personalized pagerank in entity-relation graphs. In WWW, pages 571--580, 2007.
[14]
Fan R. K. Chung and Lincoln Lu. Survey: Concentration inequalities and martingale inequalities: A survey. Internet Mathematics, 3(1):79--127, 2006.
[15]
Mustafa Coskun, Ananth Grama, and Mehmet Koyuturk. Efficient processing of network proximity queries via chebyshev acceleration. In KDD, pages 1515--1524, 2016.
[16]
Dániel Fogaras, Balázs Rácz, Károly Csalogány, and Tamás Sarlós. Towards scaling fully personalized pagerank: Algorithms, lower bounds, and experiments. Internet Mathematics, 2(3):333--358, 2005.
[17]
Yasuhiro Fujiwara, Makoto Nakatsuji, Makoto Onizuka, and Masaru Kitsuregawa. Fast and exact top-k search for random walk with restart. PVLDB, 5(5):442--453, 2012.
[18]
Yasuhiro Fujiwara, Makoto Nakatsuji, Hiroaki Shiokawa, Takeshi Mishima, and Makoto Onizuka. Efficient ad-hoc search for personalized pagerank. In SIGMOD, pages 445--456, 2013.
[19]
Yasuhiro Fujiwara, Makoto Nakatsuji, Hiroaki Shiokawa, Takeshi Mishima, and Makoto Onizuka. Fast and exact top-k algorithm for pagerank. In AAAI, 2013.
[20]
Yasuhiro Fujiwara, Makoto Nakatsuji, Takeshi Yamamuro, Hiroaki Shiokawa, and Makoto Onizuka. Efficient personalized pagerank with accuracy assurance. In KDD, pages 15--23, 2012.
[21]
Tao Guo, Xin Cao, Gao Cong, Jiaheng Lu, and Xuemin Lin. Distributed algorithms on exact personalized pagerank. In SIGMOD, pages 479--494, 2017.
[22]
Manish S. Gupta, Amit Pathak, and Soumen Chakrabarti. Fast algorithms for top-k personalized pagerank queries. In WWW, pages 1225--1226, 2008.
[23]
Pankaj Gupta, Ashish Goel, Jimmy Lin, Aneesh Sharma, Dong Wang, and Reza Zadeh. Wtf: The who to follow service at twitter. In WWW, pages 505--514, 2013.
[24]
Ali Hadian, Sadegh Nobari, Behrooz Minaei-Bidgoli, and Qiang Qu. Roll: Fast in-memory generation of gigantic scale-free networks. In SIGMOD, pages 1829--1842. ACM, 2016.
[25]
Kalervo J"arvelin and Jaana Kekäläinen. IR evaluation methods for retrieving highly relevant documents. In SIGIR, pages 41--48, 2000.
[26]
Glen Jeh and Jennifer Widom. Scaling personalized web search. In WWW, pages 271--279, 2003.
[27]
Jinhong Jung, Namyong Park, Sael Lee, and U Kang. Bepi: Fast and memory-efficient method for billion-scale random walk with restart. In SIGMOD, pages 789--804, 2017.
[28]
David C. Liu, Stephanie Rogers, Raymond Shiau, Dmitry Kislyuk, Kevin C. Ma, Zhigang Zhong, Jenny Liu, and Yushi Jing. Related pins at pinterest: The evolution of a real-world recommender system. In WWW (Companion), pages 583--592, 2017.
[29]
Peter Lofgren, Siddhartha Banerjee, and Ashish Goel. Bidirectional pagerank estimation: From average-case to worst-case. In WAW, pages 164--176, 2015.
[30]
Peter Lofgren, Siddhartha Banerjee, and Ashish Goel. Personalized pagerank estimation and search: A bidirectional approach. In WSDM, pages 163--172, 2016.
[31]
Peter A Lofgren, Siddhartha Banerjee, Ashish Goel, and C Seshadhri. Fast-ppr: Scaling personalized pagerank estimation for large graphs. In KDD, pages 1436--1445, 2014.
[32]
Takanori Maehara, Takuya Akiba, Yoichi Iwata, and Ken-ichi Kawarabayashi. Computing personalized pagerank quickly by exploiting graph structures. PVLDB, 7(12):1023--1034, 2014.
[33]
Naoto Ohsaka, Takanori Maehara, and Ken-ichi Kawarabayashi. Efficient pagerank tracking in evolving networks. In KDD, pages 875--884, 2015.
[34]
Lawrence Page, Sergey Brin, Rajeev Motwani, and Terry Winograd. The pagerank citation ranking: bringing order to the web. 1999.
[35]
CH Ren, Luyi Mo, CM Kao, CK Cheng, and DWL Cheung. Clude: An efficient algorithm for lu decomposition over a sequence of evolving graphs. In EDBT, 2014.
[36]
Atish Das Sarma, Anisur Rahaman Molla, Gopal Pandurangan, and Eli Upfal. Fast distributed pagerank computation. Theoretical Computer Science, 561:113--121, 2015.
[37]
Kijung Shin, Jinhong Jung, Lee Sael, and U. Kang. BEAR: block elimination approach for random walk with restart on large graphs. In SIGMOD, pages 1571--1585, 2015.
[38]
Alastair J. Walker. New fast method for generating discrete random numbers with arbitrary frequency distributions. Electronics Letters, 10(8):127--128, 1974.
[39]
Sibo Wang, Youze Tang, Xiaokui Xiao, Yin Yang, and Zengxiang Li. Hubppr: Effective indexing for approximate personalized pagerank. PVLDB, 10(3):205--216, 2016.
[40]
Sibo Wang, Renchi Yang, Xiaokui Xiao, Zhewei Wei, and Yin Yang. FORA: simple and effective approximate single-source personalized pagerank. In KDD, pages 505--514, 2017.
[41]
Yubao Wu, Ruoming Jin, and Xiang Zhang. Fast and unified local search for random walk based k-nearest-neighbor query in large graphs. In SIGMOD 2014, pages 1139--1150, 2014.
[42]
Weiren Yu and Xuemin Lin. IRWR: incremental random walk with restart. In SIGIR, pages 1017--1020, 2013.
[43]
Weiren Yu and Julie A. McCann. Random walk with restart over dynamic graphs. In ICDM, pages 589--598, 2016.
[44]
Hongyang Zhang, Peter Lofgren, and Ashish Goel. Approximate personalized pagerank on dynamic graphs. In KDD, pages 1315--1324, 2016.
[45]
Fanwei Zhu, Yuan Fang, Kevin Chen-Chuan Chang, and Jing Ying. Incremental and accuracy-aware personalized pagerank through scheduled approximation. PVLDB, 6(6):481--492, 2013.

Cited By

View all

Index Terms

  1. TopPPR: Top-k Personalized PageRank Queries with Precision Guarantees on Large Graphs

      Recommendations

      Comments

      Please enable JavaScript to view thecomments powered by Disqus.

      Information & Contributors

      Information

      Published In

      cover image ACM Conferences
      SIGMOD '18: Proceedings of the 2018 International Conference on Management of Data
      May 2018
      1874 pages
      ISBN:9781450347037
      DOI:10.1145/3183713
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Sponsors

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      Published: 27 May 2018

      Permissions

      Request permissions for this article.

      Check for updates

      Author Tags

      1. personalized pagerank
      2. top-k queries

      Qualifiers

      • Research-article

      Conference

      SIGMOD/PODS '18
      Sponsor:

      Acceptance Rates

      SIGMOD '18 Paper Acceptance Rate 90 of 461 submissions, 20%;
      Overall Acceptance Rate 785 of 4,003 submissions, 20%

      Contributors

      Other Metrics

      Bibliometrics & Citations

      Bibliometrics

      Article Metrics

      • Downloads (Last 12 months)48
      • Downloads (Last 6 weeks)5
      Reflects downloads up to 14 Feb 2025

      Other Metrics

      Citations

      Cited By

      View all
      • (2024)Effective Tool Augmented Multi-Agent Framework for Data AnalysisData Intelligence10.3724/2096-7004.di.2024.0013Online publication date: 17-Oct-2024
      • (2024)BIRD: Efficient Approximation of Bidirectional Hidden Personalized PageRankProceedings of the VLDB Endowment10.14778/3665844.366585517:9(2255-2268)Online publication date: 1-May-2024
      • (2024)QTCS: Efficient Query-Centered Temporal Community SearchProceedings of the VLDB Endowment10.14778/3648160.364816317:6(1187-1199)Online publication date: 3-May-2024
      • (2024)Efficient and Provable Effective Resistance Computation on Large Graphs: An Index-based ApproachProceedings of the ACM on Management of Data10.1145/36549362:3(1-27)Online publication date: 30-May-2024
      • (2024)Revisiting Local Computation of PageRank: Simple and OptimalProceedings of the 56th Annual ACM Symposium on Theory of Computing10.1145/3618260.3649661(911-922)Online publication date: 10-Jun-2024
      • (2024)Accurate and Scalable Graph Convolutional Networks for Recommendation Based on Subgraph PropagationIEEE Transactions on Knowledge and Data Engineering10.1109/TKDE.2024.346733336:12(7556-7568)Online publication date: Dec-2024
      • (2024)Efficient Algorithms for Personalized PageRank Computation: A SurveyIEEE Transactions on Knowledge and Data Engineering10.1109/TKDE.2024.337600036:9(4582-4602)Online publication date: Sep-2024
      • (2024)Efficient Algorithms for Group Hitting Probability Queries on Large GraphsIEEE Transactions on Knowledge and Data Engineering10.1109/TKDE.2023.3349164(1-13)Online publication date: 2024
      • (2024)Attributed Network Embedding in Streaming Style2024 IEEE 40th International Conference on Data Engineering (ICDE)10.1109/ICDE60146.2024.00243(3138-3150)Online publication date: 13-May-2024
      • (2024)Personalized PageRanks over Dynamic Graphs - The Case for Optimizing Quality of Service2024 IEEE 40th International Conference on Data Engineering (ICDE)10.1109/ICDE60146.2024.00038(409-422)Online publication date: 13-May-2024
      • Show More Cited By

      View Options

      Login options

      View options

      PDF

      View or Download as a PDF file.

      PDF

      eReader

      View online with eReader.

      eReader

      Figures

      Tables

      Media

      Share

      Share

      Share this Publication link

      Share on social media