DOI: 10.1145/2503210.2503223
research-article

General transformations for GPU execution of tree traversals

Published: 17 November 2013

Abstract

With the advent of programmer-friendly GPU computing environments, there has been much interest in offloading workloads that can exploit the high degree of parallelism available on modern GPUs. Exploiting this parallelism and optimizing for the GPU memory hierarchy are well understood for regular applications that operate on dense data structures such as arrays and matrices. However, there has been significantly less work on irregular algorithms, and less still when pointer-based dynamic data structures are involved. Recently, irregular algorithms such as Barnes-Hut and kd-tree traversals have been implemented on GPUs, yielding significant performance gains over CPU implementations. However, these implementations often rely on application-specific semantics to achieve acceptable performance. We argue that there are general-purpose techniques for implementing irregular algorithms on GPUs that exploit similarities in algorithmic structure rather than application-specific knowledge. We demonstrate these techniques on several tree traversal algorithms, achieving speedups of up to 38x over 32-thread CPU versions.
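The abstract does not spell out the transformations themselves, so the following is only a generic illustration of the kind of algorithmic structure that tree traversals share, not the paper's technique. It assumes a hypothetical binary tree and a made-up pruning test (`node.value > query`), and shows a pointer-based recursive traversal next to an equivalent loop with an explicit stack, a form that is typically easier to map onto GPU threads than deep recursion. All names (`Node`, `traverse_recursive`, `traverse_iterative`) are invented for this sketch.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical pointer-based tree node (not taken from the paper)."""
    value: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def traverse_recursive(node: Optional[Node], query: int, out: List[int]) -> None:
    """Recursive pre-order traversal that prunes a subtree per query."""
    # Assumed pruning rule for illustration: skip subtrees whose root exceeds the query.
    if node is None or node.value > query:
        return
    out.append(node.value)
    traverse_recursive(node.left, query, out)
    traverse_recursive(node.right, query, out)


def traverse_iterative(root: Optional[Node], query: int) -> List[int]:
    """Same traversal with recursion replaced by an explicit stack."""
    out: List[int] = []
    stack: List[Optional[Node]] = [root]
    while stack:
        node = stack.pop()
        if node is None or node.value > query:
            continue  # same pruning test as the recursive version
        out.append(node.value)
        # Push right first so the left child is popped (visited) first.
        stack.append(node.right)
        stack.append(node.left)
    return out


if __name__ == "__main__":
    tree = Node(5, Node(3, Node(1), Node(9)), Node(8, Node(6), Node(7)))
    visited: List[int] = []
    traverse_recursive(tree, 6, visited)
    print(visited, traverse_iterative(tree, 6))
```

Both functions visit the same nodes in the same order for any query; only the control structure differs. Each query's traversal is independent of the others, which is the source of parallelism in workloads of this kind.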

    Published In

    SC '13: Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
    November 2013
    1123 pages
    ISBN:9781450323789
    DOI:10.1145/2503210
    General Chair: William Gropp
    Program Chair: Satoshi Matsuoka

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. GPU
    2. irregular programs
    3. tree traversals
    4. vectorization

    Conference

    SC13

    Acceptance Rates

    SC '13 Paper Acceptance Rate 91 of 449 submissions, 20%;
    Overall Acceptance Rate 1,516 of 6,373 submissions, 24%

    Article Metrics

    • Downloads (last 12 months): 23
    • Downloads (last 6 weeks): 2
    Reflects downloads up to 14 Feb 2025

    Cited By

    • (2024) Garbage Collection for Mostly Serialized Heaps. Proceedings of the 2024 ACM SIGPLAN International Symposium on Memory Management, pages 1-14. DOI: 10.1145/3652024.3665512. Online publication date: 20-Jun-2024
    • (2024) Arkade: k-Nearest Neighbor Search With Non-Euclidean Distances using GPU Ray Tracing. Proceedings of the 38th ACM International Conference on Supercomputing, pages 14-25. DOI: 10.1145/3650200.3656601. Online publication date: 30-May-2024
    • (2023) GPU-parallelisation of Haar wavelet-based grid resolution adaptation for fast finite volume modelling: application to shallow water flows. Journal of Hydroinformatics, 25:4, pages 1210-1234. DOI: 10.2166/hydro.2023.154. Online publication date: 16-Jun-2023
    • (2023) Optimization Techniques for GPU Programming. ACM Computing Surveys, 55:11, pages 1-81. DOI: 10.1145/3570638. Online publication date: 16-Mar-2023
    • (2023) RT-DBSCAN: Accelerating DBSCAN using Ray Tracing Hardware. 2023 IEEE International Parallel and Distributed Processing Symposium (IPDPS), pages 963-973. DOI: 10.1109/IPDPS54959.2023.00100. Online publication date: May-2023
    • (2022) Accelerating Random Forest Classification on GPU and FPGA. Proceedings of the 51st International Conference on Parallel Processing, pages 1-11. DOI: 10.1145/3545008.3545067. Online publication date: 29-Aug-2022
    • (2021) Compiling pattern matching to in-place modifications. Proceedings of the 20th ACM SIGPLAN International Conference on Generative Programming: Concepts and Experiences, pages 123-129. DOI: 10.1145/3486609.3487204. Online publication date: 17-Oct-2021
    • (2021) Efficient tree-traversals: reconciling parallelism and dense data representations. Proceedings of the ACM on Programming Languages, 5:ICFP, pages 1-29. DOI: 10.1145/3473596. Online publication date: 19-Aug-2021
    • (2021) Optimization of cosmological N-body simulation with FMM-PM on SIMT accelerators. The Journal of Supercomputing, 78:5, pages 7186-7205. DOI: 10.1007/s11227-021-04153-0. Online publication date: 5-Nov-2021
    • (2020) Implementing an Attack Graph Generator in CUDA. 2020 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), pages 730-738. DOI: 10.1109/IPDPSW50202.2020.00128. Online publication date: May-2020
