short-paper

Dynamic parallelism for simple and efficient GPU graph algorithms

Authors:

Andrew Lumsdaine

IA³ '15: Proceedings of the 5th Workshop on Irregular Applications: Architectures and Algorithms

Article No.: 11, Pages 1 - 4

https://doi.org/10.1145/2833179.2833189

Published: 15 November 2015

Abstract

Dynamic parallelism allows GPU kernels to launch additional kernels at runtime directly from the GPU. In this paper we show that dynamic parallelism enables relatively simple high-performance graph algorithms for GPUs. We present breadth-first search (BFS) and single-source shortest paths (SSSP) algorithms that use dynamic parallelism to adapt to the irregular and data-driven nature of these problems. Our approach results in simple code that closely follows the high-level description of the algorithms but yields performance competitive with the current state of the art.
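
The abstract does not reproduce the paper's code, so the following is only a minimal sketch of how a device-side kernel launch can adapt BFS to irregular vertex degrees; it is not the authors' implementation. The kernel names `expand_frontier` and `expand_edges`, the `threshold` parameter, the CSR arrays, and the toy graph are all assumptions made for illustration. A parent thread handles one frontier vertex and, only when that vertex has many neighbors, launches a child grid sized to its degree.

```cuda
#include <cstdio>
#include <utility>
#include <cuda_runtime.h>

// Child kernel (hypothetical name): one thread per outgoing edge of one vertex.
__global__ void expand_edges(const int* col_idx, int first, int degree,
                             int* level, int next_level,
                             int* next_frontier, int* next_size) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= degree) return;
    int v = col_idx[first + i];
    // atomicCAS lets exactly one thread claim an unvisited vertex (level == -1).
    if (atomicCAS(&level[v], -1, next_level) == -1)
        next_frontier[atomicAdd(next_size, 1)] = v;
}

// Parent kernel (hypothetical name): one thread per frontier vertex.
__global__ void expand_frontier(const int* row_ptr, const int* col_idx,
                                const int* frontier, int frontier_size,
                                int* level, int next_level,
                                int* next_frontier, int* next_size,
                                int threshold) {
    int t = blockIdx.x * blockDim.x + threadIdx.x;
    if (t >= frontier_size) return;
    int u = frontier[t];
    int first = row_ptr[u];
    int degree = row_ptr[u + 1] - first;
    if (degree > threshold) {
        // Dynamic parallelism: launch a child grid sized to this vertex's degree.
        int blocks = (degree + 127) / 128;
        expand_edges<<<blocks, 128>>>(col_idx, first, degree, level,
                                      next_level, next_frontier, next_size);
    } else {
        // Low-degree vertex: a serial loop avoids the child-launch overhead.
        for (int i = 0; i < degree; ++i) {
            int v = col_idx[first + i];
            if (atomicCAS(&level[v], -1, next_level) == -1)
                next_frontier[atomicAdd(next_size, 1)] = v;
        }
    }
}

int main() {
    // Toy directed graph in CSR form (an assumption): 0->{1,2,3,4}, 1->5, 5->6.
    const int n = 7, m = 6, source = 0;
    const int threshold = 2;  // tiny cutoff so vertex 0 takes the child-launch path
    int h_row_ptr[n + 1] = {0, 4, 5, 5, 5, 5, 6, 6};
    int h_col_idx[m] = {1, 2, 3, 4, 5, 6};

    int *row_ptr, *col_idx, *level, *frontier, *next_frontier, *next_size;
    cudaMalloc(&row_ptr, sizeof(h_row_ptr));
    cudaMalloc(&col_idx, sizeof(h_col_idx));
    cudaMalloc(&level, n * sizeof(int));
    cudaMalloc(&frontier, n * sizeof(int));
    cudaMalloc(&next_frontier, n * sizeof(int));
    cudaMalloc(&next_size, sizeof(int));
    cudaMemcpy(row_ptr, h_row_ptr, sizeof(h_row_ptr), cudaMemcpyHostToDevice);
    cudaMemcpy(col_idx, h_col_idx, sizeof(h_col_idx), cudaMemcpyHostToDevice);

    // Every byte 0xFF makes each int -1 (unvisited); the source gets level 0.
    cudaMemset(level, 0xFF, n * sizeof(int));
    int zero = 0;
    cudaMemcpy(level + source, &zero, sizeof(int), cudaMemcpyHostToDevice);
    cudaMemcpy(frontier, &source, sizeof(int), cudaMemcpyHostToDevice);

    int frontier_size = 1;
    for (int depth = 0; frontier_size > 0; ++depth) {
        cudaMemset(next_size, 0, sizeof(int));
        int blocks = (frontier_size + 127) / 128;
        expand_frontier<<<blocks, 128>>>(row_ptr, col_idx, frontier, frontier_size,
                                         level, depth + 1, next_frontier,
                                         next_size, threshold);
        // A parent grid is complete only after its child grids have finished,
        // so this host-side wait also covers every child launch.
        cudaError_t err = cudaDeviceSynchronize();
        if (err != cudaSuccess) {
            std::fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
            return 1;
        }
        cudaMemcpy(&frontier_size, next_size, sizeof(int), cudaMemcpyDeviceToHost);
        std::swap(frontier, next_frontier);
    }

    int h_level[n];
    cudaMemcpy(h_level, level, sizeof(h_level), cudaMemcpyDeviceToHost);
    for (int v = 0; v < n; ++v) std::printf("vertex %d: level %d\n", v, h_level[v]);

    cudaFree(row_ptr); cudaFree(col_idx); cudaFree(level);
    cudaFree(frontier); cudaFree(next_frontier); cudaFree(next_size);
    return 0;
}
```

Device-side launches need relocatable device code and the device runtime library, for example `nvcc -rdc=true bfs_sketch.cu -lcudadevrt` (the file name is a placeholder), and a GPU of compute capability 3.5 or higher. The threshold trades child-launch overhead against load balance; the value used here is chosen only so that the toy graph exercises both paths.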

Index terms: Mathematics of computing > Discrete mathematics > Graph theory

Published In

IA³ '15: Proceedings of the 5th Workshop on Irregular Applications: Architectures and Algorithms

November 2015

79 pages

ISBN: 9781450340014

DOI: 10.1145/2833179

Conference Chairs: Antonino Tumeo (Pacific Northwest National Laboratory), John Feo (Pacific Northwest National Laboratory), Oreste Villa (NVIDIA)

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Short-paper

Conference

SC15

Sponsor:

SIGHPC
SIGARCH
IEEE-CS\DATC

SC15: The International Conference for High Performance Computing, Networking, Storage and Analysis

November 15, 2015

Austin, Texas

Acceptance Rates

IA³ '15 Paper Acceptance Rate 6 of 24 submissions, 25%;

Overall Acceptance Rate 18 of 67 submissions, 27%

Bibliometrics

Total Citations: 14
Total Downloads: 971
Downloads (Last 12 months): 117
Downloads (Last 6 weeks): 30

Reflects downloads up to 16 Nov 2024

Cited By

Liu Z, Zhang S, Garrigus J, Zhao H (2023). Genomics-GPU: A Benchmark Suite for GPU-accelerated Genome Analysis. 2023 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), pages 178-188. Online publication date: Apr-2023.
https://doi.org/10.1109/ISPASS57527.2023.00026
Ben-Shalom R, Ladd A, Artherya N, Cross C, Kim K, Sanghevi H, Korngreen A, Bouchard K, Bender K (2022). NeuroGPU: Accelerating multi-compartment, biophysically detailed neuron simulations on GPUs. Journal of Neuroscience Methods 366 (109400). Online publication date: Jan-2022.
https://doi.org/10.1016/j.jneumeth.2021.109400
Muhammed T, Mehmood R, Albeshri A, Katib I (2019). SURAA: A Novel Method and Tool for Loadbalanced and Coalesced SpMV Computations on GPUs. Applied Sciences 9:5 (947). Online publication date: 6-Mar-2019.
https://doi.org/10.3390/app9050947
Todling D, Winter M, Steinberger M (2019). Breadth-First Search on Dynamic Graphs using Dynamic Parallelism on the GPU. 2019 IEEE High Performance Extreme Computing Conference (HPEC), pages 1-7. Online publication date: Sep-2019.
https://doi.org/10.1109/HPEC.2019.8916476
Carneiro T, Gmys J, Melab N, de Carvalho Junior F, Rebouças Filho P, Tuyttens D (2019). Dynamic Configuration of CUDA Runtime Variables for CDP-Based Divide-and-Conquer Algorithms. Water Governance and Management in India, pages 16-30. Online publication date: 26-Mar-2019.
https://doi.org/10.1007/978-3-030-15996-2_2
Agrawal S, Yadav A, Parwani D, Mayya V (2018). An Efficient Parallel Implementation of CPU Scheduling Algorithms Using Data Parallel Algorithms. International Conference on Advanced Computing Networking and Informatics, pages 429-438. Online publication date: 28-Nov-2018.
https://doi.org/10.1007/978-981-13-2673-8_45
Baudis N, Jacob F, Andelfinger P (2017). Performance Evaluation of Priority Queues for Fine-Grained Parallel Tasks on GPUs. 2017 IEEE 25th International Symposium on Modeling, Analysis, and Simulation of Computer and Telecommunication Systems (MASCOTS), pages 1-11. Online publication date: Sep-2017.
https://doi.org/10.1109/MASCOTS.2017.15
Jarząbek Ł, Czarnul P (2017). Performance evaluation of unified memory and dynamic parallelism for selected parallel CUDA applications. The Journal of Supercomputing 73:12, pages 5378-5401. Online publication date: 1-Dec-2017.
https://dl.acm.org/doi/10.1007/s11227-017-2091-x
Carneiro Pessoa T, Gmys J, de Carvalho Júnior F, Melab N, Tuyttens D (2017). GPU‐accelerated backtracking using CUDA Dynamic Parallelism. Concurrency and Computation: Practice and Experience 30:9. Online publication date: 27-Nov-2017.
https://doi.org/10.1002/cpe.4374
Hajj I, Gómez-Luna J, Li C, Chang L, Milojicic D, Hwu W, Hsu W, Yang C, Lipasti M, Lee H (2016). KLAP. The 49th Annual IEEE/ACM International Symposium on Microarchitecture, pages 1-12. Online publication date: 15-Oct-2016.
https://dl.acm.org/doi/10.5555/3195638.3195654
