Nothing Special   »   [go: up one dir, main page]

Skip to main content
Log in

Parallelizing approximate single-source personalized PageRank queries on shared memory

  • Regular Paper
  • Published:
The VLDB Journal Aims and scope Submit manuscript

Abstract

Given a directed graph G, a source node s, and a target node t, the personalized PageRank (PPR) \(\pi (s,t)\) measures the importance of node t with respect to node s. In this work, we study the single-source PPR query, which takes a source node s as input and outputs the PPR values of all nodes in G with respect to s. The single-source PPR query finds many important applications, e.g., community detection and recommendation. Deriving the exact answers for single-source PPR queries is prohibitive, so most existing work focuses on approximate solutions. Nevertheless, existing approximate solutions are still inefficient, and it is challenging to compute single-source PPR queries efficiently for online applications. This motivates us to devise efficient parallel algorithms running on shared-memory multi-core systems. In this work, we present how to efficiently parallelize the state-of-the-art index-based solution FORA, and theoretically analyze the complexity of the parallel algorithms. Theoretically, we prove that our proposed algorithm achieves a time complexity of \(O(W/P+\log ^2{n})\), where W is the time complexity of sequential FORA algorithm, P is the number of processors used, and n is the number of nodes in the graph. FORA includes a forward push phase and a random walk phase, and we present optimization techniques to both phases, including effective maintenance of active nodes, improving the efficiency of memory access, and cache-aware scheduling. Extensive experimental evaluation demonstrates that our solution achieves up to 37\(\times \) speedup on 40 cores and 3.3\(\times \) faster than alternatives on 40 cores. Moreover, the forward push alone can be used for local graph clustering, and our parallel algorithm for forward push is 4.8\(\times \) faster than existing parallel alternatives.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Subscribe and save

Springer+ Basic
$34.99 /Month
  • Get 10 units per month
  • Download Article/Chapter or eBook
  • 1 Unit = 1 Article or 1 Chapter
  • Cancel anytime
Subscribe now

Buy Now

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6
Fig. 7
Fig. 8
Fig. 9

Similar content being viewed by others

Notes

  1. The grainsize is set to 128 according to Leiserson et al. [23].

References

  1. Andersen, R., Borgs, C., Chayes, J., Hopcraft, J., Mirrokni, V., Teng, S.-H.: Local computation of pagerank contributions. In: WAW, pp. 150–165 (2007)

    Chapter  Google Scholar 

  2. Andersen, R., Chung, F.R.K., Lang, K.J.: Local graph partitioning using pagerank vectors. In: FOCS, pp. 475–486 (2006)

  3. Bahmani, B., Chakrabarti, K., Xin, D.: Fast personalized pagerank on mapreduce. In: SIGMOD, pp. 973–984 (2011)

  4. Bahmani, B., Chowdhury, A., Goel, A.: Fast incremental and personalized pagerank. PVLDB 4(3), 173–184 (2010)

    Google Scholar 

  5. Beamer, S., Asanović, K., Patterson, D.: Direction-optimizing breadth-first search. Sci. Program. 21(3–4), 137–148 (2013)

    Google Scholar 

  6. Brent, R.P.: The parallel evaluation of general arithmetic expressions. J. ACM 21(2), 201–206 (1974)

    Article  MathSciNet  Google Scholar 

  7. Chakrabarti, D., Zhan, Y., Faloutsos, C.: R-mat: a recursive model for graph mining. In: Proceedings of the 2004 SIAM International Conference on Data Mining, pp. 442–446. SIAM (2004)

  8. Cohen, E.: Size-estimation framework with applications to transitive closure and reachability. J. Comput. Syst. Sci. 55(3), 441–453 (1997)

    Article  MathSciNet  Google Scholar 

  9. Cormen, T.H., Leiserson, C.E., Rivest, R.L., Stein, C.: Introduction to Algorithms, 3rd edn. MIT Press, Cambridge (2009)

    MATH  Google Scholar 

  10. Coskun, M., Grama, A., Koyutürk, M.: Efficient processing of network proximity queries via chebyshev acceleration. In: SIGKDD, pp. 1515–1524 (2016)

  11. Dagum, L., Menon, R.: Openmp: an industry standard API for shared-memory programming. IEEE Comput. Sci. Eng. 5(1), 46–55 (1998)

    Article  Google Scholar 

  12. Fogaras, D., Rácz, B., Csalogány, K., Sarlós, T.: Towards scaling fully personalized pagerank: algorithms, lower bounds, and experiments. Internet Math. 2(3), 333–358 (2005)

    Article  MathSciNet  Google Scholar 

  13. Fujiwara, Y., Nakatsuji, M., Onizuka, M., Kitsuregawa, M.: Fast and exact top-k search for random walk with restart. PVLDB 5(5), 442–453 (2012)

    Google Scholar 

  14. Fujiwara, Y., Nakatsuji, M., Shiokawa, H., Mishima, T., Onizuka, M.: Efficient ad-hoc search for personalized pagerank. In: SIGMOD, pp. 445–456 (2013)

  15. Fujiwara, Y., Nakatsuji, M., Yamamuro, T., Shiokawa, H., Onizuka, M.: Efficient personalized pagerank with accuracy assurance. In: SIGKDD, pp. 15–23 (2012)

  16. Guo, T., Cao, X., Cong, G., Lu, J., Lin, X.: Distributed algorithms on exact personalized pagerank. In: SIGMOD, pp. 479–494 (2017)

  17. Guo, W., Li, Y., Sha, M., Tan, K.-L.: Parallel personalized pagerank on dynamic graphs. PVLDB 11(1), 93–106 (2017)

    Google Scholar 

  18. Gupta, M., Pathak, A., Chakrabarti, S.: Fast algorithms for topk personalized pagerank queries. In: WWW, pp. 1225–1226 (2008)

  19. Gupta, P., Goel, A., Lin, J.J., Sharma, A., Wang, D., Zadeh, R.: WTF: the who to follow service at twitter. In: WWW, pp. 505–514 (2013)

  20. https://www.cilkplus.org/ (2018)

  21. Jeh, G., Widom, J.: Scaling personalized web search. In: WWW, pp. 271–279 (2003)

  22. Jung, J., Park, N., Sael, L., Kang, U.: Bepi: fast and memory-efficient method for billion-scale random walk with restart. In: SIGMOD, pp 789–804 (2017)

  23. Leiserson, C.E., Schardl, T.B.: A work-efficient parallel breadth-first search algorithm (or how to cope with the nondeterminism of reducers). In: SPAA, pp. 303–314 (2010)

  24. Lin, W.: Distributed algorithms for fully personalized pagerank on large graphs. In: WWW, pp. 1084–1094 (2019)

  25. Liu, D.C., Rogers, S., Shiau, R., Kislyuk, D., Ma, K.C., Zhong, Z., Liu, J., Jing, Y.: Related pins at pinterest: the evolution of a real-world recommender system. In: WWW, pp. 583–592 (2017)

  26. Lofgren, P., Banerjee, S., Goel, A.: Personalized pagerank estimation and search: a bidirectional approach. In: WSDM, pp. 163–172 (2016)

  27. Nguyen, P., Tomeo, P., Noia, T.D., Sciascio, E.D.: An evaluation of simrank and personalized pagerank to build a recommender system for the web of data. In: WWW, pp. 1477–1482 (2015)

  28. Page, L., Brin, S., Motwani, R., Winograd, T.: The pagerank citation ranking: bringing order to the web. Technical report, Stanford InfoLab (1999)

  29. Park, H., Jung, J., Kang, U.: A comparative study of matrix factorization and random walk with restart in recommender systems. In: BigData, pp. 756–765 (2017)

  30. Shin, K., Jung, J., Sael, L., Kang, U.: BEAR: block elimination approach for random walk with restart on large graphs. In: SIGMOD, pp. 1571–1585 (2015)

  31. Shun, J., Blelloch, G.E.: Ligra: a lightweight graph processing framework for shared memory. In: PPoPP, pp. 135–146 (2013)

    Article  Google Scholar 

  32. Shun, J., Blelloch, G.E.: Phase-concurrent hash tables for determinism. In: SPAA, pp. 96–107 (2014)

  33. Shun, J., Roosta-Khorasani, F., Fountoulakis, K., Mahoney, M.W.: Parallel local graph clustering. PVLDB 9(12), 1041–1052 (2016)

    Google Scholar 

  34. Wang, S., Tang, Y., Xiao, X., Yang, Y., Li, Z.: Hubppr: effective indexing for approximate personalized pagerank. Proc. VLDB Endow. 10(3), 205–216 (2016)

    Article  Google Scholar 

  35. Wang, S., Tao, Y.: Efficient algorithms for finding approximate heavy hitters in personalized pageranks. In: SIGMOD, pp. 1113–1127 (2018)

  36. Wang, S., Yang, R., Xiao, X., Wei, Z., Yang, Y.: FORA: simple and effective approximate single-source personalized pagerank. In: SIGKDD, pp. 505–514 (2017)

  37. Wei, H., Yu, J.X., Lu, C., Lin, X.: Speedup graph processing by graph ordering. In: SIGMOD, pp. 1813–1828 (2016)

  38. Wei, Z., He, X., Xiao, X., Wang, S., Shang, S., Wen, J.-R.: Topppr: top-k personalized pagerank queries with precision guarantees on large graphs. In: SIGMOD, pp. 441–456 (2018)

  39. Whang, J.J., Gleich, D.F., Dhillon, I.S.: Overlapping community detection using neighborhood-inflated seed expansion. IEEE Trans. Knowl. Data Eng. 28(5), 1272–1284 (2016)

    Article  Google Scholar 

  40. Yin, H., Benson, A.R., Leskovec, J., Gleich, D.F.: Local higher-order graph clustering. In: SIGKDD, pp. 555–564 (2017)

  41. Zhang, H., Lofgren, P., Goel, A.: Approximate personalized pagerank on dynamic graphs. In: SIGKDD, pp. 1315–1324 (2016)

  42. Zhu, F., Fang, Y., Chang, K.C., Ying, J.: Incremental and accuracy-aware personalized pagerank through scheduled approximation. PVLDB 6(6), 481–492 (2013)

    Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Runhui Wang.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Wang, R., Wang, S. & Zhou, X. Parallelizing approximate single-source personalized PageRank queries on shared memory. The VLDB Journal 28, 923–940 (2019). https://doi.org/10.1007/s00778-019-00576-7

Download citation

  • Received:

  • Revised:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s00778-019-00576-7

Keywords

Navigation