DOI: 10.1145/2833179.2833189
short-paper
Public Access

Dynamic parallelism for simple and efficient GPU graph algorithms

Published: 15 November 2015

Abstract

Dynamic parallelism allows GPU kernels to launch additional kernels at runtime directly from the GPU. In this paper we show that dynamic parallelism enables relatively simple high-performance graph algorithms for GPUs. We present breadth-first search (BFS) and single-source shortest paths (SSSP) algorithms that use dynamic parallelism to adapt to the irregular and data-driven nature of these problems. Our approach results in simple code that closely follows the high-level description of the algorithms but yields performance competitive with the current state of the art.
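To make the idea concrete, the following is a minimal sketch of a level-synchronous BFS in which a parent kernel launches a child kernel for high-degree vertices. It is not the authors' implementation: the CSR layout, the names `bfs_levels`, `expand_frontier`, `expand_edges`, and `visit`, and the `DEGREE_THRESHOLD` cutoff are all assumptions made for illustration. Error checking is omitted for brevity, and the code assumes a device and toolchain that support dynamic parallelism with relocatable device code (for example, `nvcc -rdc=true`).

```cuda
#include <utility>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical cutoff: vertices with more neighbors than this get a child kernel.
#define DEGREE_THRESHOLD 32

// Claim v for this level if it is still unvisited (-1) and append it to the next frontier.
__device__ void visit(int v, int level, int *dist, int *next, int *next_size) {
    if (atomicCAS(&dist[v], -1, level) == -1) {
        next[atomicAdd(next_size, 1)] = v;
    }
}

// Child kernel: one thread per outgoing edge of a single high-degree vertex.
__global__ void expand_edges(const int *col, int begin, int end, int level,
                             int *dist, int *next, int *next_size) {
    int e = begin + blockIdx.x * blockDim.x + threadIdx.x;
    if (e < end) visit(col[e], level, dist, next, next_size);
}

// Parent kernel: one thread per frontier vertex; adapts to the vertex degree at runtime.
__global__ void expand_frontier(const int *row, const int *col,
                                const int *frontier, int frontier_size, int level,
                                int *dist, int *next, int *next_size) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= frontier_size) return;
    int u = frontier[i];
    int begin = row[u], end = row[u + 1];
    int degree = end - begin;
    if (degree > DEGREE_THRESHOLD) {
        // Dynamic parallelism: launch a grid sized to this vertex's adjacency list.
        int threads = 128;
        int blocks = (degree + threads - 1) / threads;
        expand_edges<<<blocks, threads>>>(col, begin, end, level, dist, next, next_size);
    } else {
        // Low-degree vertices are cheaper to handle serially in the parent thread.
        for (int e = begin; e < end; ++e) visit(col[e], level, dist, next, next_size);
    }
}

// Host driver: returns the BFS level of every vertex (-1 if unreachable).
// row/col are CSR row offsets and column indices (assumed input format).
std::vector<int> bfs_levels(const std::vector<int> &row, const std::vector<int> &col,
                            int source) {
    int n = static_cast<int>(row.size()) - 1;
    std::vector<int> dist(n, -1);
    dist[source] = 0;

    int *d_row, *d_col, *d_dist, *d_cur, *d_next, *d_next_size;
    cudaMalloc(&d_row, row.size() * sizeof(int));
    cudaMalloc(&d_col, col.size() * sizeof(int));
    cudaMalloc(&d_dist, n * sizeof(int));
    cudaMalloc(&d_cur, n * sizeof(int));
    cudaMalloc(&d_next, n * sizeof(int));
    cudaMalloc(&d_next_size, sizeof(int));
    cudaMemcpy(d_row, row.data(), row.size() * sizeof(int), cudaMemcpyHostToDevice);
    cudaMemcpy(d_col, col.data(), col.size() * sizeof(int), cudaMemcpyHostToDevice);
    cudaMemcpy(d_dist, dist.data(), n * sizeof(int), cudaMemcpyHostToDevice);
    cudaMemcpy(d_cur, &source, sizeof(int), cudaMemcpyHostToDevice);

    int frontier_size = 1;
    for (int level = 1; frontier_size > 0; ++level) {
        cudaMemset(d_next_size, 0, sizeof(int));
        int threads = 128;
        int blocks = (frontier_size + threads - 1) / threads;
        expand_frontier<<<blocks, threads>>>(d_row, d_col, d_cur, frontier_size, level,
                                             d_dist, d_next, d_next_size);
        // Blocking copy: the parent grid finishes only after all of its child grids.
        cudaMemcpy(&frontier_size, d_next_size, sizeof(int), cudaMemcpyDeviceToHost);
        std::swap(d_cur, d_next);
    }

    cudaMemcpy(dist.data(), d_dist, n * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_row); cudaFree(d_col); cudaFree(d_dist);
    cudaFree(d_cur); cudaFree(d_next); cudaFree(d_next_size);
    return dist;
}
```

Calling `bfs_levels({0, 2, 3, 4, 4}, {1, 2, 3, 3}, 0)` on the four-vertex graph with edges 0→1, 0→2, 1→3, and 2→3 yields levels 0, 1, 1, 2. The degree test is the only place where the code departs from a plain per-vertex loop, which is the sense in which dynamic parallelism lets the kernel follow the high-level algorithm while still adapting to irregular degree distributions. A real implementation would also need to consider the device's limit on pending child launches.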

    Published In

    IA3 '15: Proceedings of the 5th Workshop on Irregular Applications: Architectures and Algorithms
    November 2015
    79 pages
    ISBN: 9781450340014
    DOI: 10.1145/2833179
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Qualifiers

    • Short-paper

    Conference

    SC15
    Acceptance Rates

    IA3 '15 Paper Acceptance Rate 6 of 24 submissions, 25%;
    Overall Acceptance Rate 18 of 67 submissions, 27%

    Article Metrics

    • Downloads (last 12 months): 117
    • Downloads (last 6 weeks): 30
    Reflects downloads up to 16 Nov 2024

    Cited By

    • (2023) Genomics-GPU: A Benchmark Suite for GPU-accelerated Genome Analysis. 2023 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), pp. 178-188. DOI: 10.1109/ISPASS57527.2023.00026. Online publication date: Apr-2023.
    • (2022) NeuroGPU: Accelerating multi-compartment, biophysically detailed neuron simulations on GPUs. Journal of Neuroscience Methods, 366 (109400). DOI: 10.1016/j.jneumeth.2021.109400. Online publication date: Jan-2022.
    • (2019) SURAA: A Novel Method and Tool for Loadbalanced and Coalesced SpMV Computations on GPUs. Applied Sciences, 9:5 (947). DOI: 10.3390/app9050947. Online publication date: 6-Mar-2019.
    • (2019) Breadth-First Search on Dynamic Graphs using Dynamic Parallelism on the GPU. 2019 IEEE High Performance Extreme Computing Conference (HPEC), pp. 1-7. DOI: 10.1109/HPEC.2019.8916476. Online publication date: Sep-2019.
    • (2019) Dynamic Configuration of CUDA Runtime Variables for CDP-Based Divide-and-Conquer Algorithms. Water Governance and Management in India, pp. 16-30. DOI: 10.1007/978-3-030-15996-2_2. Online publication date: 26-Mar-2019.
    • (2018) An Efficient Parallel Implementation of CPU Scheduling Algorithms Using Data Parallel Algorithms. International Conference on Advanced Computing Networking and Informatics, pp. 429-438. DOI: 10.1007/978-981-13-2673-8_45. Online publication date: 28-Nov-2018.
    • (2017) Performance Evaluation of Priority Queues for Fine-Grained Parallel Tasks on GPUs. 2017 IEEE 25th International Symposium on Modeling, Analysis, and Simulation of Computer and Telecommunication Systems (MASCOTS), pp. 1-11. DOI: 10.1109/MASCOTS.2017.15. Online publication date: Sep-2017.
    • (2017) Performance evaluation of unified memory and dynamic parallelism for selected parallel CUDA applications. The Journal of Supercomputing, 73:12, pp. 5378-5401. DOI: 10.1007/s11227-017-2091-x. Online publication date: 1-Dec-2017.
    • (2017) GPU‐accelerated backtracking using CUDA Dynamic Parallelism. Concurrency and Computation: Practice and Experience, 30:9. DOI: 10.1002/cpe.4374. Online publication date: 27-Nov-2017.
    • (2016) KLAP. The 49th Annual IEEE/ACM International Symposium on Microarchitecture, pp. 1-12. DOI: 10.5555/3195638.3195654. Online publication date: 15-Oct-2016.
