research-article

Accelerating dynamic graph analytics on GPUs

Authors:

Kian-Lee TanAuthors Info & Claims

Proceedings of the VLDB Endowment, Volume 11, Issue 1

Pages 107 - 120

https://doi.org/10.14778/3151113.3151122

Published: 01 September 2017 Publication History

Abstract

As graph analytics often involves compute-intensive operations, GPUs have been extensively used to accelerate the processing. However, in many applications such as social networks, cyber security, and fraud detection, their representative graphs evolve frequently and one has to perform a rebuild of the graph structure on GPUs to incorporate the updates. Hence, rebuilding the graphs becomes the bottleneck of processing high-speed graph streams. In this paper, we propose a GPU-based dynamic graph storage scheme to support existing graph algorithms easily. Furthermore, we propose parallel update algorithms to support efficient stream updates so that the maintained graph is immediately available for high-speed analytic processing on GPUs. Our extensive experiments with three streaming applications on large-scale real and synthetic datasets demonstrate the superior performance of our proposed approach.

References

[1]

Apache flink. https://flink.apache.org/. Accessed: 2016-10-18.

[2]

Cusp library. https://developer.nvidia.com/cusp. Accessed: 2017-03-25.

[3]

cusparse. https://developer.nvidia.com/cusparse. Accessed: 2016-11-09.

[4]

CUDA UnBound (CUB) library. https://nvlabs.github.io/cub/, 2015.

[5]

L. Akoglu, H. Tong, and D. Koutra. Graph based anomaly detection and description: a survey. Data Min. Knowl. Discov., 29(3):626--688, 2015.

Digital Library

[6]

A. Ashari, N. Sedaghati, J. Eisenlohr, S. Parthasarathy, and P. Sadayappan. Fast sparse matrix-vector multiplication on gpus for graph applications. In SC, pages 781--792, 2014.

Digital Library

[7]

A. Ashari, N. Sedaghati, J. Eisenlohr, and P. Sadayappan. An efficient two-dimensional blocking strategy for sparse matrix-vector multiplication on gpus. In ICS, pages 273--282, 2014.

Digital Library

[8]

D. A. Bader, J. Berry, A. Amos-Binks, D. Chavarría-Miranda, C. Hastings, K. Madduri, and S. C. Poulos. Stinger: Spatio-temporal interaction networks and graphs (sting) extensible representation. Georgia Institute of Technology, Tech. Rep, 2009.

[9]

N. Bell and M. Garland. Efficient sparse matrix-vector multiplication on CUDA. Technical Report NVR-2008--004, NVIDIA Corporation, 2008.

[10]

M. A. Bender, E. D. Demaine, and M. Farach-Colton. Cache-oblivious b-trees. SIAM J. Comput., 35(2):341--358, 2005.

Digital Library

[11]

M. A. Bender and H. Hu. An adaptive packed-memory array. ACM Trans. Database Syst., 32(4), 2007.

Digital Library

[12]

L. Braun, T. Etter, G. Gasparis, M. Kaufmann, D. Kossmann, D. Widmer, A. Avitzur, A. Iliopoulos, E. Levy, and N. Liang. Analytics in motion: High performance event-processing and real-time analytics in the same database. In SIGMOD, pages 251--264, 2015.

Digital Library

[13]

F. Busato and N. Bombieri. Bfs-4k: an efficient implementation of bfs for kepler gpu architectures. TPDS, 26(7):1826--1838, 2015.

[14]

R. Cheng, J. Hong, A. Kyrola, Y. Miao, X. Weng, M. Wu, F. Yang, L. Zhou, F. Zhao, and E. Chen. Kineograph: Taking the pulse of a fast-changing and connected world. In EuroSys, pages 85--98, 2012.

Digital Library

[15]

M. S. Crouch, A. McGregor, and D. Stubbs. Dynamic graphs in the sliding-window model. In European Symposium on Algorithms, pages 337--348. Springer, 2013.

[16]

H.-V. Dang and B. Schmidt. The sliced coo format for sparse matrix-vector multiplication on cuda-enabled gpus. Procedia Computer Science, 9:57--66, 2012.

[17]

M. Datar, A. Gionis, P. Indyk, and R. Motwani. Maintaining stream statistics over sliding windows. SIAM journal on computing, 31(6):1794--1813, 2002.

Digital Library

[18]

A. Davidson, S. Baxter, M. Garland, and J. D. Owens. Work-efficient parallel gpu methods for single-source shortest paths. In Parallel and Distributed Processing Symposium, 2014 IEEE 28th International, pages 349--359. IEEE, 2014.

Digital Library

[19]

D. Ediger, R. McColl, E. J. Riedy, and D. A. Bader. STINGER - High performance data structure for streaming graphs. HPEC, 2012.

[20]

M. Elkin. Streaming and fully dynamic centralized algorithms for constructing and maintaining sparse spanners. ACM Trans. Algorithms, 7(2):20:1--20:17, 2011.

Digital Library

[21]

J. Feigenbaum, S. Kannan, A. McGregor, S. Suri, and J. Zhang. On graph problems in a semi-streaming model. Theor. Comput. Sci., 348(2--3):207--216, 2005.

Digital Library

[22]

Z. Fu, M. Personick, and B. Thompson. MapGraph: A High Level API for Fast Development of High Performance Graph Analytics on GPUs. A High Level API for Fast Development of High Performance Graph Analytics on GPUs. ACM, New York, New York, USA, June 2014.

[23]

S. Guha and A. McGregor. Graph synopses, sketches, and streams: A survey. PVLDB, 5(12):2030--2031, 2012.

Digital Library

[24]

W. Guo, Y. Li, M. Sha, and K.-L. Tan. Parallel personalized pagerank on dynamic graphs. PVLDB, 11(1), 2017.

Digital Library

[25]

P. Harish and P. Narayanan. Accelerating large graph algorithms on the gpu using cuda. In International Conference on High-Performance Computing, pages 197--208. Springer, 2007.

Digital Library

[26]

D. S. Hirschberg. Parallel algorithms for the transitive closure and the connected component problems. In Proceedings of the eighth annual ACM symposium on Theory of computing, pages 55--57. ACM, 1976.

Digital Library

[27]

A. P. Iyer, L. E. Li, T. Das, and I. Stoica. Time-evolving graph processing at scale. In Proceedings of the Fourth International Workshop on Graph Data Management Experiences and Systems, pages 5:1--5:6, 2016.

Digital Library

[28]

A. P. Iyer, L. E. Li, and I. Stoica. Celliq : Real-time cellular network analytics at scale. In NSDI, pages 309--322, 2015.

Digital Library

[29]

R. Kaleem, A. Venkat, S. Pai, M. Hall, and K. Pingali. Synchronization trade-offs in gpu implementations of graph algorithms. In Parallel and Distributed Processing Symposium, 2016 IEEE International, pages 514--523. IEEE, 2016.

[30]

J. King, T. Gilray, R. M. Kirby, and M. Might. Dynamic sparse-matrix allocation on gpus. In ISC, pages 61--80, 2016.

[31]

J. Leskovec and R. Sosič. Snap: A general-purpose network analysis and graph-mining library. TIST, 8(1):1, 2016.

Digital Library

[32]

X. Lin, R. Zhang, Z. Wen, H. Wang, and J. Qi. Efficient subgraph matching using gpus. In ADC, pages 74--85, 2014.

[33]

H. Liu, H. H. Huang, and Y. Hu. ibfs: Concurrent breadth-first search on gpus. In SIGMOD, pages 403--416, 2016.

Digital Library

[34]

L. Luo, M. Wong, and W.-m. Hwu. An effective gpu implementation of breadth-first search. In DAC, pages 52--55, 2010.

Digital Library

[35]

M. Martone, S. Filippone, S. Tucci, P. Gepner, and M. Paprzycki. Use of hybrid recursive csr/coo data structures in sparse matrix-vector multiplication. In IMCSIT, pages 327--335. IEEE, 2010.

[36]

A. McGregor. Graph stream algorithms: A survey. SIGMOD Rec., 43(1):9--20, 2014.

Digital Library

[37]

D. Merrill, M. Garland, and A. Grimshaw. High-Performance and Scalable GPU Graph Traversal. TOPC, 1(2), 2015.

Digital Library

[38]

R. C. Murphy, K. B. Wheeler, B. W. Barrett, and J. A. Ang. Introducing the graph 500. 2010.

[39]

N. Ohsaka, T. Maehara, and K.-i. Kawarabayashi. Efficient pagerank tracking in evolving networks. In KDD, pages 875--884, 2015.

Digital Library

[40]

Y. Saad. Numerical solution of large nonsymmetric eigenvalue problems. Computer Physics Communications, 53(1):71--90, 1989.

[41]

D. Sayce. 10 billions tweets, number of tweets per day. http://www.dsayce.com/social-media/10-billions-tweets/. Accessed: 2016-10-18.

[42]

M. Sha, Y. Li, B. He, and K.-L. Tan. Technical report: Accelerating dynamic graph analytics on gpus. arXiv preprint arXiv:1709.05061, 2017.

[43]

J. Soman, K. Kothapalli, and P. J. Narayanan. A fast GPU algorithm for graph connectivity. IPDPS Workshops, 2010.

[44]

M. Stonebraker, U. Çetintemel, and S. Zdonik. The 8 requirements of real-time stream processing. ACM SIGMOD Record, 34(4):42--47, 2005.

Digital Library

[45]

J. A. Stratton, N. Anssari, C. Rodrigues, I.-J. Sung, N. Obeid, L. Chang, G. D. Liu, and W.-m. Hwu. Optimization and architecture effects on gpu computing workload performance. In InPar, pages 1--10, 2012.

[46]

N. Tang, Q. Chen, and P. Mitra. Graph stream summarization: From big bang to big crunch. In SIGMOD, pages 1481--1496, 2016.

Digital Library

[47]

A. Toshniwal, S. Taneja, A. Shukla, K. Ramasamy, J. M. Patel, S. Kulkarni, J. Jackson, K. Gade, M. Fu, J. Donham, N. Bhagat, S. Mittal, and D. Ryaboy. Storm@twitter. In SIGMOD, pages 147--156, 2014.

Digital Library

[48]

C. E. Tsourakakis, U. Kang, G. L. Miller, and C. Faloutsos. DOULION: counting triangles in massive graphs with a coin. In SIGKDD, pages 837--846, 2009.

Digital Library

[49]

U. Verner, A. Schuster, M. Silberstein, and A. Mendelson. Scheduling processing of real-time data streams on heterogeneous multi-gpu systems. In SYSTOR, page 7, 2012.

Digital Library

[50]

Y. Wang, A. Davidson, Y. Pan, Y. Wu, A. Riffel, and J. D. Owens. Gunrock: A high-performance graph processing library on the gpu. SIGPLAN Not., 50(8):265--266, 2015.

Digital Library

[51]

Y. Wang, Q. Fan, Y. Li, and K.-L. Tan. Real-time influence maximization on dynamic social streams. PVLDB, 10(7):805--816, 2017.

Digital Library

[52]

S. Yan, C. Li, Y. Zhang, and H. Zhou. yaspmv: yet another spmv framework on gpus. In SIGPLAN Notices, volume 49, pages 107--118, 2014.

Digital Library

[53]

X. Yang, S. Parthasarathy, and P. Sadayappan. Fast sparse matrix-vector multiplication on gpus: Implications for graph mining. PVLDB, 4(4):231--242, 2011.

Digital Library

[54]

Y. Yang, Z. Wang, J. Pei, and E. Chen. Tracking influential nodes in dynamic networks. arXiv preprint arXiv:1602.04490, 2016.

[55]

M. Zaharia, T. Das, H. Li, T. Hunter, S. Shenker, and I. Stoica. Discretized streams: Fault-tolerant streaming computation at scale. In SOSP, pages 423--438, 2013.

Digital Library

[56]

H. Zhang, P. Lofgren, and A. Goel. Approximate personalized pagerank on dynamic graphs. arXiv preprint arXiv:1603.07796, 2016.

[57]

Y. Zhang and F. Mueller. Gstream: A general-purpose data streaming framework on GPU clusters. In ICPP, pages 245--254, 2011.

Digital Library

[58]

J. Zhong and B. He. Medusa: Simplified graph processing on gpus. IEEE Trans. Parallel Distrib. Syst., 25(6):1543--1552, 2014.

Digital Library

Cited By

Qi HWu YHe LZhang YLuo KCai MJin HZhang ZZhao J(2024)LSGraph: A Locality-centric High-performance Streaming Graph EngineProceedings of the Nineteenth European Conference on Computer Systems10.1145/3627703.3650076(33-49)Online publication date: 22-Apr-2024
https://dl.acm.org/doi/10.1145/3627703.3650076
Wheatman BBurns RBuluc AXu HLee IChabbi MSteuwer M(2024)CPMA: An Efficient Batch-Parallel Compressed Set Without PointersProceedings of the 29th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming10.1145/3627535.3638492(348-363)Online publication date: 2-Mar-2024
https://dl.acm.org/doi/10.1145/3627535.3638492
Zhang XLiu CNi JCheng YZhang LLi HLi X(2024)PDG: A Prefetcher for Dynamic Graph UpdatingIEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems10.1109/TCAD.2023.333588043:4(1246-1259)Online publication date: 1-Apr-2024
https://dl.acm.org/doi/10.1109/TCAD.2023.3335880
Show More Cited By

Recommendations

Accelerating CUDA graph algorithms at maximum warp
PPoPP '11

Graphs are powerful data representations favored in many computational domains. Modern GPUs have recently shown promising results in accelerating computationally challenging graph problems but their performance suffered heavily when the graph structure ...
Accelerating Radiative Transfer Simulation on NVIDIA GPUs with OpenACC
Parallel and Distributed Computing, Applications and Technologies
Abstract
To accelerate multiphysics applications, making use of not only GPUs but also FPGAs has been emerging. Multiphysics applications are simulations involving multiple physical models and multiple simultaneous physical phenomena. Operations with ...
Accelerating CUDA graph algorithms at maximum warp
PPoPP '11: Proceedings of the 16th ACM symposium on Principles and practice of parallel programming

Graphs are powerful data representations favored in many computational domains. Modern GPUs have recently shown promising results in accelerating computationally challenging graph problems but their performance suffered heavily when the graph structure ...

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image Proceedings of the VLDB Endowment

Proceedings of the VLDB Endowment Volume 11, Issue 1

Proceedings of the 44th International Conference on Very Large Data Bases, Rio de Janeiro, Brazil

September 2017

120 pages

ISSN:2150-8097

Editors:
Jian Pei
Simon Fraser University
,
Sihem Amer-Yahia
University of Grenoble Alpes, CNRS

Issue’s Table of Contents

Publisher

VLDB Endowment

Publication History

Published: 01 September 2017

Published in PVLDB Volume 11, Issue 1

Qualifiers

Research-article

Contributors

Other Metrics

View Article Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

30
Total Citations
View Citations
372
Total Downloads

Downloads (Last 12 months)74
Downloads (Last 6 weeks)6

Reflects downloads up to 16 Dec 2024

Other Metrics

View Author Metrics

Citations

Cited By

Qi HWu YHe LZhang YLuo KCai MJin HZhang ZZhao J(2024)LSGraph: A Locality-centric High-performance Streaming Graph EngineProceedings of the Nineteenth European Conference on Computer Systems10.1145/3627703.3650076(33-49)Online publication date: 22-Apr-2024
https://dl.acm.org/doi/10.1145/3627703.3650076
Wheatman BBurns RBuluc AXu HLee IChabbi MSteuwer M(2024)CPMA: An Efficient Batch-Parallel Compressed Set Without PointersProceedings of the 29th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming10.1145/3627535.3638492(348-363)Online publication date: 2-Mar-2024
https://dl.acm.org/doi/10.1145/3627535.3638492
Zhang XLiu CNi JCheng YZhang LLi HLi X(2024)PDG: A Prefetcher for Dynamic Graph UpdatingIEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems10.1109/TCAD.2023.333588043:4(1246-1259)Online publication date: 1-Apr-2024
https://dl.acm.org/doi/10.1109/TCAD.2023.3335880
Gao HLiao XShao ZLi KChen JJin H(2024)A survey on dynamic graph processing on GPUs: concepts, terminologies and systemsFrontiers of Computer Science: Selected Publications from Chinese Universities10.1007/s11704-023-2656-118:4Online publication date: 1-Aug-2024
https://dl.acm.org/doi/10.1007/s11704-023-2656-1
Concessao KCheramangalath UDev RNasre R(2024)Meerkat: A Framework for Dynamic Graph Algorithms on GPUsInternational Journal of Parallel Programming10.1007/s10766-024-00774-z52:5-6(400-453)Online publication date: 1-Dec-2024
https://dl.acm.org/doi/10.1007/s10766-024-00774-z
Jibril MBaumstark ASattler K(2023)Adaptive update handling for graph HTAPDistributed and Parallel Databases10.1007/s10619-023-07428-y41:3(331-357)Online publication date: 14-May-2023
https://dl.acm.org/doi/10.1007/s10619-023-07428-y
Peng JChen ZShao YShen YChen LCao J(2022)SancusProceedings of the VLDB Endowment10.14778/3538598.353861415:9(1937-1950)Online publication date: 1-May-2022
https://dl.acm.org/doi/10.14778/3538598.3538614
Chou SAmarasinghe S(2022)Compilation of dynamic sparse tensor algebraProceedings of the ACM on Programming Languages10.1145/35633386:OOPSLA2(1408-1437)Online publication date: 31-Oct-2022
https://dl.acm.org/doi/10.1145/3563338
Wang ZWang NNie JWei ZGu YYu G(2022)A lock-free approach to parallelizing personalized PageRank computations on GPUFrontiers of Computer Science: Selected Publications from Chinese Universities10.1007/s11704-022-1546-217:1Online publication date: 8-Aug-2022
https://dl.acm.org/doi/10.1007/s11704-022-1546-2
Lu SHe BLi YFu H(2021)Accelerating exact constrained shortest paths on GPUsProceedings of the VLDB Endowment10.14778/3436905.343691414:4(547-559)Online publication date: 22-Feb-2021
https://dl.acm.org/doi/10.14778/3436905.3436914
Show More Cited By

View Options

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Article

View options

PDF

View or Download as a PDF file.

eReader

View online with eReader.

Media

Figures

Other

Tables

View Issue’s Table of Contents