DOI: 10.1145/3148173.3148189
research-article

Implementing implicit OpenMP data sharing on GPUs

Published: 12 November 2017

Abstract

OpenMP is a shared-memory programming model that supports offloading target regions to accelerators such as NVIDIA GPUs. The Clang/LLVM implementation aims to deliver a generic GPU compilation toolchain that supports both native CUDA C/C++ and OpenMP device offloading. In some situations, however, the semantics of OpenMP and CUDA diverge. One example is the policy for implicitly handling local variables. In CUDA, local variables are implicitly mapped to thread-local memory and are therefore private to each CUDA thread. In OpenMP, regions executed by different numbers of threads can be nested, so variables need to be implicitly shared among the threads of a contention group.
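To make the divergence concrete, here is a minimal C/OpenMP sketch (the function and parameter names are illustrative, not taken from the paper). The local `base` is declared in the sequential part of a target region and then read inside a nested `parallel for`. Under OpenMP's default data-sharing rules, `base` is shared by the threads of that parallel region, so a GPU compiler cannot simply keep it in each thread's private local memory, as a direct CUDA-style lowering of a local variable would.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative sketch: 'base' is declared in the sequential part of the
   target region and read by every thread of the nested parallel region. */
void fill_from_shared_base(int *out, int n, int base_value) {
  assert(out != NULL && n > 0);
#pragma omp target map(tofrom : out[0:n])
  {
    /* Runs once, on the initial thread of the target region. */
    int base = base_value;
#pragma omp parallel for
    for (int i = 0; i < n; ++i) {
      /* 'base' is implicitly shared: all threads read the same copy,
         so it cannot live only in one thread's private local memory. */
      out[i] = base + i;
    }
  }
}
```

With an offloading-enabled Clang (for example `-fopenmp -fopenmp-targets=nvptx64-nvidia-cuda`) the target region runs on the GPU; with host fallback, or when compiled without OpenMP, it runs on the host. Either way the result should be `out[i] == base_value + i`.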
In this paper we present a redesign of the OpenMP device data-sharing infrastructure in the Clang/LLVM toolchain, the component responsible for the implicit sharing of local variables. The new infrastructure lowers implicitly shared variables to GPU shared memory.
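The abstract does not describe the allocator itself, so the following is only a hypothetical host-side model of the idea: a per-team bump-pointer stack that stands in for GPU shared memory, with storage released in LIFO order as nested regions end. The fixed budget `SHARED_SLOT_BYTES`, the heap fallback for variables that do not fit (standing in for global memory), and all names are assumptions for illustration, not the paper's design.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical per-team budget for implicitly shared variables. */
#define SHARED_SLOT_BYTES 1024u

/* Round a request up to 8 bytes so every slot stays 8-byte aligned. */
static size_t round_up8(size_t n) { return (n + 7u) & ~(size_t)7u; }

typedef struct {
  _Alignas(8) unsigned char buf[SHARED_SLOT_BYTES]; /* models __shared__ memory */
  size_t top;                                       /* bytes currently in use */
} team_stack;

/* Push: hand out storage for one implicitly shared variable. Uses the
   shared buffer when the request fits; otherwise falls back to the heap,
   which here stands in for GPU global memory (an assumption). */
static void *team_push(team_stack *s, size_t size, int *on_heap) {
  size_t need = round_up8(size);
  if (need <= SHARED_SLOT_BYTES - s->top) {
    void *p = s->buf + s->top;
    s->top += need;
    *on_heap = 0;
    return p;
  }
  *on_heap = 1;
  return malloc(size);
}

/* Pop: release storage in LIFO order, mirroring how nested regions end. */
static void team_pop(team_stack *s, void *p, size_t size, int on_heap) {
  if (on_heap) {
    free(p);
    return;
  }
  size_t need = round_up8(size);
  assert(s->top >= need && (unsigned char *)p == s->buf + (s->top - need));
  s->top -= need;
}
```

In this model, a scalar such as a `double` consumes only 8 bytes of the budget, while a large statically allocated array can exhaust it and spill to the fallback, which is one way to picture why arrays put more pressure on shared memory than scalars.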
We measure the amount of shared memory used by our scheme in cases involving scalar variables and statically allocated arrays, offloading to NVIDIA K40 and P100 GPUs. For scalar variables, the pressure on shared memory is relatively low (under 26% shared-memory utilization on the K40) and does not reduce occupancy; in that case, occupancy is limited by register pressure instead. The data-sharing scheme offers users a simple memory model for controlling the implicit allocation of device shared memory.
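The occupancy observation can be illustrated with simple arithmetic. The sketch below computes resident blocks per streaming multiprocessor (SM) as the minimum of the thread, block, register, and shared-memory limits and reports which limit binds. It ignores allocation granularity, so it is a simplified model rather than NVIDIA's occupancy calculator. The K40 limits used below (compute capability 3.5: 2048 threads, 16 blocks, 65,536 registers, and 48 KB of shared memory per SM) are standard published figures; the kernel parameters are hypothetical and are not the paper's measurements.

```c
#include <assert.h>

/* Per-SM hardware limits that bound how many blocks can be resident. */
typedef struct {
  int max_threads_per_sm;
  int max_blocks_per_sm;
  int regs_per_sm;       /* 32-bit registers per SM */
  int smem_bytes_per_sm; /* shared memory available per SM */
} sm_limits;

enum limiter { LIMIT_THREADS, LIMIT_BLOCKS, LIMIT_REGISTERS, LIMIT_SHARED_MEM };

/* Simplified occupancy model: resident blocks per SM is the minimum of the
   four limits; allocation granularity is ignored. *why reports the binding
   limit (on ties, the earliest in enum order). */
static int blocks_per_sm(sm_limits lim, int threads_per_block,
                         int regs_per_thread, int smem_per_block,
                         enum limiter *why) {
  assert(threads_per_block > 0 && regs_per_thread >= 0 && smem_per_block >= 0);
  int by_threads = lim.max_threads_per_sm / threads_per_block;
  int by_regs = regs_per_thread > 0
                    ? lim.regs_per_sm / (regs_per_thread * threads_per_block)
                    : lim.max_blocks_per_sm;
  int by_smem = smem_per_block > 0 ? lim.smem_bytes_per_sm / smem_per_block
                                   : lim.max_blocks_per_sm;

  int best = by_threads;
  *why = LIMIT_THREADS;
  if (lim.max_blocks_per_sm < best) { best = lim.max_blocks_per_sm; *why = LIMIT_BLOCKS; }
  if (by_regs < best) { best = by_regs; *why = LIMIT_REGISTERS; }
  if (by_smem < best) { best = by_smem; *why = LIMIT_SHARED_MEM; }
  return best;
}

/* Occupancy as a fraction of the SM's maximum resident threads. */
static double occupancy(sm_limits lim, int threads_per_block, int blocks) {
  return (double)(blocks * threads_per_block) / lim.max_threads_per_sm;
}
```

For example, with `sm_limits k40 = {2048, 16, 65536, 48 * 1024}`, a hypothetical kernel using 128 threads per block, 64 registers per thread, and 1 KB of shared memory per block gets 8 resident blocks (65,536 / (64 × 128) = 8), limited by registers, for an occupancy of 0.5, even though its shared-memory use alone would allow 48 blocks.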

Information

Published In

LLVM-HPC'17: Proceedings of the Fourth Workshop on the LLVM Compiler Infrastructure in HPC
November 2017
106 pages
ISBN:9781450355650
DOI:10.1145/3148173
Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Clang
  2. OpenMP
  3. data sharing
  4. shared memory

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

SC '17

Acceptance Rates

LLVM-HPC'17 paper acceptance rate: 9 of 10 submissions (90%)
Overall acceptance rate: 16 of 22 submissions (73%)


Article Metrics

  • Downloads (last 12 months): 6
  • Downloads (last 6 weeks): 0

Reflects downloads up to 18 Nov 2024

Cited By

  • (2020) Mixed-data-model heterogeneous compilation and OpenMP offloading. Proceedings of the 29th International Conference on Compiler Construction, pp. 119-131. DOI: 10.1145/3377555.3377891. Online publication date: 22-Feb-2020.
  • (2020) Data Transfer and Reuse Analysis Tool for GPU-Offloading Using OpenMP. OpenMP: Portable Multi-Level Parallelism on Modern Systems, pp. 280-294. DOI: 10.1007/978-3-030-58144-2_18. Online publication date: 1-Sep-2020.
  • (2019) An open-source solution to performance portability for Summit and Sierra supercomputers. IBM Journal of Research and Development, pp. 1-1. DOI: 10.1147/JRD.2019.2955944. Online publication date: 2019.
  • (2019) The TRegion Interface and Compiler Optimizations for OpenMP Target Regions. OpenMP: Conquering the Full Hardware Spectrum, pp. 153-167. DOI: 10.1007/978-3-030-28596-8_11. Online publication date: 9-Aug-2019.
  • (2018) Evaluating the Impact of Proposed OpenMP 5.0 Features on Performance, Portability and Productivity. 2018 IEEE/ACM International Workshop on Performance, Portability and Productivity in HPC (P3HPC), pp. 37-46. DOI: 10.1109/P3HPC.2018.00007. Online publication date: Nov-2018.
