Integrating GPU support for OpenMP offloading directives into Clang

Authors: Carlo Bertolli, Samuel F. Antao, Gheorghe-Teodor Bercea, Arpith C. Jacob, Alexandre E. Eichenberger, Georgios Rokos, David Appelhans, Kevin O'Brien

LLVM '15: Proceedings of the Second Workshop on the LLVM Compiler Infrastructure in HPC

Article No.: 5, Pages 1 - 11

https://doi.org/10.1145/2833157.2833161

Published: 15 November 2015

Abstract

The LLVM community is currently developing OpenMP 4.1 support, consisting of software improvements for Clang and new runtime libraries. OpenMP 4.1 includes offloading constructs that permit execution of user-selected regions on generic devices external to the main host processor. This paper describes our ongoing work towards delivering support for OpenMP offloading constructs for the OpenPower system in the LLVM compiler infrastructure. We previously introduced a design for a control loop scheme necessary to implement the OpenMP generic offloading model on NVIDIA GPUs. In this paper we show how we integrated the control loop into Clang, containing its complexity by limiting its support to OpenMP-related functionality. We also briefly report the results of performance analysis on benchmarks and a complex application kernel. Finally, we show an optimization in the Clang code generation scheme that, for specific code patterns, serves as an alternative to the control loop and delivers improved performance.
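To make the term "offloading constructs" concrete, the following is a minimal sketch of a user-selected region marked for execution on a device. It is not taken from the paper: the function name `saxpy_offload` and the choice of a SAXPY loop are assumptions made purely for illustration. The `target` directive and `map` clauses come from the OpenMP specification. The region mixes a sequential part with a nested `parallel for`, which is the kind of generic region an offloading implementation has to support. How Clang lowers such a region for a GPU is the subject of the paper and is not shown here.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not from the paper): computes y = a * x + y,
// with the work placed inside an OpenMP target region.
void saxpy_offload(float a, const std::vector<float>& x, std::vector<float>& y) {
    assert(x.size() == y.size());

    // Raw pointers are used so the map clauses can name array sections.
    const float* xp = x.data();
    float* yp = y.data();
    const int n = static_cast<int>(x.size());

    // The target construct marks the region to run on a device.
    // map(to:) copies x to the device; map(tofrom:) copies y in and back out.
    #pragma omp target map(to: xp[0:n]) map(tofrom: yp[0:n])
    {
        // Code here runs sequentially until a parallel region is reached.
        #pragma omp parallel for
        for (int i = 0; i < n; ++i) {
            yp[i] = a * xp[i] + yp[i];
        }
    }
}
```

When the code is compiled without OpenMP support the pragmas are ignored and the loop runs sequentially on the host. With OpenMP enabled but no device available, the region is expected to fall back to host execution, so the result is the same either way.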

Index Terms

1. Computing methodologies
  1. Computer graphics
    1. Graphics systems and interfaces
      1. Graphics processors
2. Software and its engineering
  1. Software notations and tools
    1. Compilers
  2. Software organization and properties
    1. Contextual software domains
      1. Operating systems
        1. Communications management
          1. Message passing


Published In

LLVM '15: Proceedings of the Second Workshop on the LLVM Compiler Infrastructure in HPC, November 2015, 74 pages

ISBN: 9781450340052

DOI: 10.1145/2833157

Conference Chair: Hal Finkel, Argonne National Laboratory

Copyright © 2015 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGHPC: ACM Special Interest Group on High Performance Computing
SIGARCH: ACM Special Interest Group on Computer Architecture
IEEE-CS\DATC: IEEE Computer Society

Publisher

Association for Computing Machinery

New York, NY, United States


Qualifiers

Research-article

Funding Sources

LLNS

Conference

SC15


SC15: The International Conference for High Performance Computing, Networking, Storage and Analysis

November 15, 2015

Austin, Texas

Acceptance Rates

LLVM '15 Paper Acceptance Rate 7 of 12 submissions, 58%;

Overall Acceptance Rate 16 of 22 submissions, 73%
