Nothing Special   »   [go: up one dir, main page]

skip to main content
research-article

Accelerating dynamic graph analytics on GPUs

Published: 01 September 2017 Publication History

Abstract

As graph analytics often involves compute-intensive operations, GPUs have been extensively used to accelerate the processing. However, in many applications such as social networks, cyber security, and fraud detection, their representative graphs evolve frequently and one has to perform a rebuild of the graph structure on GPUs to incorporate the updates. Hence, rebuilding the graphs becomes the bottleneck of processing high-speed graph streams. In this paper, we propose a GPU-based dynamic graph storage scheme to support existing graph algorithms easily. Furthermore, we propose parallel update algorithms to support efficient stream updates so that the maintained graph is immediately available for high-speed analytic processing on GPUs. Our extensive experiments with three streaming applications on large-scale real and synthetic datasets demonstrate the superior performance of our proposed approach.

References

[1]
Apache flink. https://flink.apache.org/. Accessed: 2016-10-18.
[2]
Cusp library. https://developer.nvidia.com/cusp. Accessed: 2017-03-25.
[3]
cusparse. https://developer.nvidia.com/cusparse. Accessed: 2016-11-09.
[4]
CUDA UnBound (CUB) library. https://nvlabs.github.io/cub/, 2015.
[5]
L. Akoglu, H. Tong, and D. Koutra. Graph based anomaly detection and description: a survey. Data Min. Knowl. Discov., 29(3):626--688, 2015.
[6]
A. Ashari, N. Sedaghati, J. Eisenlohr, S. Parthasarathy, and P. Sadayappan. Fast sparse matrix-vector multiplication on gpus for graph applications. In SC, pages 781--792, 2014.
[7]
A. Ashari, N. Sedaghati, J. Eisenlohr, and P. Sadayappan. An efficient two-dimensional blocking strategy for sparse matrix-vector multiplication on gpus. In ICS, pages 273--282, 2014.
[8]
D. A. Bader, J. Berry, A. Amos-Binks, D. Chavarría-Miranda, C. Hastings, K. Madduri, and S. C. Poulos. Stinger: Spatio-temporal interaction networks and graphs (sting) extensible representation. Georgia Institute of Technology, Tech. Rep, 2009.
[9]
N. Bell and M. Garland. Efficient sparse matrix-vector multiplication on CUDA. Technical Report NVR-2008--004, NVIDIA Corporation, 2008.
[10]
M. A. Bender, E. D. Demaine, and M. Farach-Colton. Cache-oblivious b-trees. SIAM J. Comput., 35(2):341--358, 2005.
[11]
M. A. Bender and H. Hu. An adaptive packed-memory array. ACM Trans. Database Syst., 32(4), 2007.
[12]
L. Braun, T. Etter, G. Gasparis, M. Kaufmann, D. Kossmann, D. Widmer, A. Avitzur, A. Iliopoulos, E. Levy, and N. Liang. Analytics in motion: High performance event-processing and real-time analytics in the same database. In SIGMOD, pages 251--264, 2015.
[13]
F. Busato and N. Bombieri. Bfs-4k: an efficient implementation of bfs for kepler gpu architectures. TPDS, 26(7):1826--1838, 2015.
[14]
R. Cheng, J. Hong, A. Kyrola, Y. Miao, X. Weng, M. Wu, F. Yang, L. Zhou, F. Zhao, and E. Chen. Kineograph: Taking the pulse of a fast-changing and connected world. In EuroSys, pages 85--98, 2012.
[15]
M. S. Crouch, A. McGregor, and D. Stubbs. Dynamic graphs in the sliding-window model. In European Symposium on Algorithms, pages 337--348. Springer, 2013.
[16]
H.-V. Dang and B. Schmidt. The sliced coo format for sparse matrix-vector multiplication on cuda-enabled gpus. Procedia Computer Science, 9:57--66, 2012.
[17]
M. Datar, A. Gionis, P. Indyk, and R. Motwani. Maintaining stream statistics over sliding windows. SIAM journal on computing, 31(6):1794--1813, 2002.
[18]
A. Davidson, S. Baxter, M. Garland, and J. D. Owens. Work-efficient parallel gpu methods for single-source shortest paths. In Parallel and Distributed Processing Symposium, 2014 IEEE 28th International, pages 349--359. IEEE, 2014.
[19]
D. Ediger, R. McColl, E. J. Riedy, and D. A. Bader. STINGER - High performance data structure for streaming graphs. HPEC, 2012.
[20]
M. Elkin. Streaming and fully dynamic centralized algorithms for constructing and maintaining sparse spanners. ACM Trans. Algorithms, 7(2):20:1--20:17, 2011.
[21]
J. Feigenbaum, S. Kannan, A. McGregor, S. Suri, and J. Zhang. On graph problems in a semi-streaming model. Theor. Comput. Sci., 348(2--3):207--216, 2005.
[22]
Z. Fu, M. Personick, and B. Thompson. MapGraph: A High Level API for Fast Development of High Performance Graph Analytics on GPUs. A High Level API for Fast Development of High Performance Graph Analytics on GPUs. ACM, New York, New York, USA, June 2014.
[23]
S. Guha and A. McGregor. Graph synopses, sketches, and streams: A survey. PVLDB, 5(12):2030--2031, 2012.
[24]
W. Guo, Y. Li, M. Sha, and K.-L. Tan. Parallel personalized pagerank on dynamic graphs. PVLDB, 11(1), 2017.
[25]
P. Harish and P. Narayanan. Accelerating large graph algorithms on the gpu using cuda. In International Conference on High-Performance Computing, pages 197--208. Springer, 2007.
[26]
D. S. Hirschberg. Parallel algorithms for the transitive closure and the connected component problems. In Proceedings of the eighth annual ACM symposium on Theory of computing, pages 55--57. ACM, 1976.
[27]
A. P. Iyer, L. E. Li, T. Das, and I. Stoica. Time-evolving graph processing at scale. In Proceedings of the Fourth International Workshop on Graph Data Management Experiences and Systems, pages 5:1--5:6, 2016.
[28]
A. P. Iyer, L. E. Li, and I. Stoica. Celliq : Real-time cellular network analytics at scale. In NSDI, pages 309--322, 2015.
[29]
R. Kaleem, A. Venkat, S. Pai, M. Hall, and K. Pingali. Synchronization trade-offs in gpu implementations of graph algorithms. In Parallel and Distributed Processing Symposium, 2016 IEEE International, pages 514--523. IEEE, 2016.
[30]
J. King, T. Gilray, R. M. Kirby, and M. Might. Dynamic sparse-matrix allocation on gpus. In ISC, pages 61--80, 2016.
[31]
J. Leskovec and R. Sosič. Snap: A general-purpose network analysis and graph-mining library. TIST, 8(1):1, 2016.
[32]
X. Lin, R. Zhang, Z. Wen, H. Wang, and J. Qi. Efficient subgraph matching using gpus. In ADC, pages 74--85, 2014.
[33]
H. Liu, H. H. Huang, and Y. Hu. ibfs: Concurrent breadth-first search on gpus. In SIGMOD, pages 403--416, 2016.
[34]
L. Luo, M. Wong, and W.-m. Hwu. An effective gpu implementation of breadth-first search. In DAC, pages 52--55, 2010.
[35]
M. Martone, S. Filippone, S. Tucci, P. Gepner, and M. Paprzycki. Use of hybrid recursive csr/coo data structures in sparse matrix-vector multiplication. In IMCSIT, pages 327--335. IEEE, 2010.
[36]
A. McGregor. Graph stream algorithms: A survey. SIGMOD Rec., 43(1):9--20, 2014.
[37]
D. Merrill, M. Garland, and A. Grimshaw. High-Performance and Scalable GPU Graph Traversal. TOPC, 1(2), 2015.
[38]
R. C. Murphy, K. B. Wheeler, B. W. Barrett, and J. A. Ang. Introducing the graph 500. 2010.
[39]
N. Ohsaka, T. Maehara, and K.-i. Kawarabayashi. Efficient pagerank tracking in evolving networks. In KDD, pages 875--884, 2015.
[40]
Y. Saad. Numerical solution of large nonsymmetric eigenvalue problems. Computer Physics Communications, 53(1):71--90, 1989.
[41]
D. Sayce. 10 billions tweets, number of tweets per day. http://www.dsayce.com/social-media/10-billions-tweets/. Accessed: 2016-10-18.
[42]
M. Sha, Y. Li, B. He, and K.-L. Tan. Technical report: Accelerating dynamic graph analytics on gpus. arXiv preprint arXiv:1709.05061, 2017.
[43]
J. Soman, K. Kothapalli, and P. J. Narayanan. A fast GPU algorithm for graph connectivity. IPDPS Workshops, 2010.
[44]
M. Stonebraker, U. Çetintemel, and S. Zdonik. The 8 requirements of real-time stream processing. ACM SIGMOD Record, 34(4):42--47, 2005.
[45]
J. A. Stratton, N. Anssari, C. Rodrigues, I.-J. Sung, N. Obeid, L. Chang, G. D. Liu, and W.-m. Hwu. Optimization and architecture effects on gpu computing workload performance. In InPar, pages 1--10, 2012.
[46]
N. Tang, Q. Chen, and P. Mitra. Graph stream summarization: From big bang to big crunch. In SIGMOD, pages 1481--1496, 2016.
[47]
A. Toshniwal, S. Taneja, A. Shukla, K. Ramasamy, J. M. Patel, S. Kulkarni, J. Jackson, K. Gade, M. Fu, J. Donham, N. Bhagat, S. Mittal, and D. Ryaboy. Storm@twitter. In SIGMOD, pages 147--156, 2014.
[48]
C. E. Tsourakakis, U. Kang, G. L. Miller, and C. Faloutsos. DOULION: counting triangles in massive graphs with a coin. In SIGKDD, pages 837--846, 2009.
[49]
U. Verner, A. Schuster, M. Silberstein, and A. Mendelson. Scheduling processing of real-time data streams on heterogeneous multi-gpu systems. In SYSTOR, page 7, 2012.
[50]
Y. Wang, A. Davidson, Y. Pan, Y. Wu, A. Riffel, and J. D. Owens. Gunrock: A high-performance graph processing library on the gpu. SIGPLAN Not., 50(8):265--266, 2015.
[51]
Y. Wang, Q. Fan, Y. Li, and K.-L. Tan. Real-time influence maximization on dynamic social streams. PVLDB, 10(7):805--816, 2017.
[52]
S. Yan, C. Li, Y. Zhang, and H. Zhou. yaspmv: yet another spmv framework on gpus. In SIGPLAN Notices, volume 49, pages 107--118, 2014.
[53]
X. Yang, S. Parthasarathy, and P. Sadayappan. Fast sparse matrix-vector multiplication on gpus: Implications for graph mining. PVLDB, 4(4):231--242, 2011.
[54]
Y. Yang, Z. Wang, J. Pei, and E. Chen. Tracking influential nodes in dynamic networks. arXiv preprint arXiv:1602.04490, 2016.
[55]
M. Zaharia, T. Das, H. Li, T. Hunter, S. Shenker, and I. Stoica. Discretized streams: Fault-tolerant streaming computation at scale. In SOSP, pages 423--438, 2013.
[56]
H. Zhang, P. Lofgren, and A. Goel. Approximate personalized pagerank on dynamic graphs. arXiv preprint arXiv:1603.07796, 2016.
[57]
Y. Zhang and F. Mueller. Gstream: A general-purpose data streaming framework on GPU clusters. In ICPP, pages 245--254, 2011.
[58]
J. Zhong and B. He. Medusa: Simplified graph processing on gpus. IEEE Trans. Parallel Distrib. Syst., 25(6):1543--1552, 2014.

Cited By

View all
  • (2024)LSGraph: A Locality-centric High-performance Streaming Graph EngineProceedings of the Nineteenth European Conference on Computer Systems10.1145/3627703.3650076(33-49)Online publication date: 22-Apr-2024
  • (2024)CPMA: An Efficient Batch-Parallel Compressed Set Without PointersProceedings of the 29th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming10.1145/3627535.3638492(348-363)Online publication date: 2-Mar-2024
  • (2024)PDG: A Prefetcher for Dynamic Graph UpdatingIEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems10.1109/TCAD.2023.333588043:4(1246-1259)Online publication date: 1-Apr-2024
  • Show More Cited By

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image Proceedings of the VLDB Endowment
Proceedings of the VLDB Endowment  Volume 11, Issue 1
Proceedings of the 44th International Conference on Very Large Data Bases, Rio de Janeiro, Brazil
September 2017
120 pages
ISSN:2150-8097
Issue’s Table of Contents

Publisher

VLDB Endowment

Publication History

Published: 01 September 2017
Published in PVLDB Volume 11, Issue 1

Qualifiers

  • Research-article

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)74
  • Downloads (Last 6 weeks)6
Reflects downloads up to 16 Dec 2024

Other Metrics

Citations

Cited By

View all
  • (2024)LSGraph: A Locality-centric High-performance Streaming Graph EngineProceedings of the Nineteenth European Conference on Computer Systems10.1145/3627703.3650076(33-49)Online publication date: 22-Apr-2024
  • (2024)CPMA: An Efficient Batch-Parallel Compressed Set Without PointersProceedings of the 29th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming10.1145/3627535.3638492(348-363)Online publication date: 2-Mar-2024
  • (2024)PDG: A Prefetcher for Dynamic Graph UpdatingIEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems10.1109/TCAD.2023.333588043:4(1246-1259)Online publication date: 1-Apr-2024
  • (2024)A survey on dynamic graph processing on GPUs: concepts, terminologies and systemsFrontiers of Computer Science: Selected Publications from Chinese Universities10.1007/s11704-023-2656-118:4Online publication date: 1-Aug-2024
  • (2024)Meerkat: A Framework for Dynamic Graph Algorithms on GPUsInternational Journal of Parallel Programming10.1007/s10766-024-00774-z52:5-6(400-453)Online publication date: 1-Dec-2024
  • (2023)Adaptive update handling for graph HTAPDistributed and Parallel Databases10.1007/s10619-023-07428-y41:3(331-357)Online publication date: 14-May-2023
  • (2022)SancusProceedings of the VLDB Endowment10.14778/3538598.353861415:9(1937-1950)Online publication date: 1-May-2022
  • (2022)Compilation of dynamic sparse tensor algebraProceedings of the ACM on Programming Languages10.1145/35633386:OOPSLA2(1408-1437)Online publication date: 31-Oct-2022
  • (2022)A lock-free approach to parallelizing personalized PageRank computations on GPUFrontiers of Computer Science: Selected Publications from Chinese Universities10.1007/s11704-022-1546-217:1Online publication date: 8-Aug-2022
  • (2021)Accelerating exact constrained shortest paths on GPUsProceedings of the VLDB Endowment10.14778/3436905.343691414:4(547-559)Online publication date: 22-Feb-2021
  • Show More Cited By

View Options

Login options

Full Access

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media