Article

Portable mapping of data parallel programs to OpenCL for heterogeneous systems

Authors:

Michael F. P. O'Boyle,

Dominik Grewe

CGO '13: Proceedings of the 2013 IEEE/ACM International Symposium on Code Generation and Optimization (CGO)

Pages 1 - 10

https://doi.org/10.1109/CGO.2013.6494993

Published: 23 February 2013

Abstract

General-purpose GPU-based systems are highly attractive as they give potentially massive performance at little cost. Realizing such potential is challenging due to the complexity of programming. This paper presents a compiler-based approach to automatically generate optimized OpenCL code from data-parallel OpenMP programs for GPUs. Such an approach brings together the benefits of a clear high-level language (OpenMP) and an emerging standard (OpenCL) for heterogeneous multi-cores. A key feature of our scheme is that it leverages existing transformations, especially data transformations, to improve performance on GPU architectures and uses predictive modeling to automatically determine whether it is worthwhile running the OpenCL code on the GPU or the OpenMP code on the multi-core host. We applied our approach to the entire NAS parallel benchmark suite and evaluated it on two distinct GPU-based systems: Core i7/NVIDIA GeForce GTX 580 and Core i7/AMD Radeon 7970. We achieved average (up to) speedups of 4.51x and 4.20x (143x and 67x) respectively over a sequential baseline. This is, on average, 1.63 and 1.56 times faster than a hand-coded, GPU-specific OpenCL implementation developed by independent expert programmers.
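To make the OpenMP-to-OpenCL correspondence concrete, the following is a minimal hand-written sketch, not output of the compiler described in the abstract. It assumes a simple SAXPY loop as the data-parallel input; the names `saxpy_omp` and `SAXPY_CL_SRC` are hypothetical and chosen only for illustration. Each iteration of the OpenMP loop corresponds to one OpenCL work-item, identified by `get_global_id(0)`.

```c
#include <assert.h>
#include <stddef.h>

/* Host version: a data-parallel OpenMP loop (illustrative example only).
 * Without OpenMP support the pragma is ignored and the loop runs serially. */
void saxpy_omp(int n, float a, const float *x, float *y) {
    assert(x != NULL && y != NULL);
    int i;
    #pragma omp parallel for
    for (i = 0; i < n; i++) {
        y[i] = a * x[i] + y[i];
    }
}

/* A hand-written OpenCL C kernel with the same semantics, kept as a source
 * string the way an OpenCL host program would pass it to the runtime.
 * This is an assumed translation for illustration, not generated code. */
const char *const SAXPY_CL_SRC =
    "__kernel void saxpy(const int n, const float a,\n"
    "                    __global const float *x, __global float *y) {\n"
    "    int i = get_global_id(0); /* one work-item per loop iteration */\n"
    "    if (i < n) y[i] = a * x[i] + y[i];\n"
    "}\n";
```

The bounds check in the kernel matters because the global work size is often rounded up to a multiple of the work-group size, so some work-items may fall outside the original iteration space.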

Cited By

Dutta A, Jannesari A (2024). MIREncoder: Multi-modal IR-based Pretrained Embeddings for Performance Optimizations. Proceedings of the 2024 International Conference on Parallel Architectures and Compilation Techniques, pp. 156-167. DOI: 10.1145/3656019.3676895. Online publication date: 14-Oct-2024
https://dl.acm.org/doi/10.1145/3656019.3676895
Niu C, Li C, Ng V, Lo D, Luo B, Roychoudhury A, Paiva A, Abreu R, Storey M (2024). FAIR: Flow Type-Aware Pre-Training of Compiler Intermediate Representations. Proceedings of the IEEE/ACM 46th International Conference on Software Engineering, pp. 1-12. DOI: 10.1145/3597503.3608136. Online publication date: 20-May-2024
https://dl.acm.org/doi/10.1145/3597503.3608136
Dutta A, Alcaraz J, TehraniJamsaz A, Cesar E, Sikora A, Jannesari A, Butt A, Mi N, Chard K (2023). Performance Optimization using Multimodal Modeling and Heterogeneous GNN. Proceedings of the 32nd International Symposium on High-Performance Parallel and Distributed Computing, pp. 45-57. DOI: 10.1145/3588195.3592984. Online publication date: 7-Aug-2023
https://dl.acm.org/doi/10.1145/3588195.3592984

Index Terms

Portable mapping of data parallel programs to OpenCL for heterogeneous systems
1. Software and its engineering
  1. Software notations and tools
    1. Compilers
    2. General programming languages
      1. Language features
2. Theory of computation
  1. Models of computation
    1. Concurrency
      1. Parallel computing models

Information & Contributors

Information

Published In

CGO '13: Proceedings of the 2013 IEEE/ACM International Symposium on Code Generation and Optimization (CGO)

February 2013

366 pages

ISBN: 9781467355247

Publisher

IEEE Computer Society

United States

Bibliometrics & Citations

Article Metrics

Total Citations: 33
Total Downloads: 262
Downloads (last 12 months): 4
Downloads (last 6 weeks): 2

Reflects downloads up to 21 Nov 2024

Citations

Cited By

S. F. X. Teixeira T, Henzinger A, Yadav R, Aiken A, Mohror K, Arnold D, Badia R (2023). Automated Mapping of Task-Based Programs onto Distributed and Heterogeneous Machines. Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, pp. 1-13. DOI: 10.1145/3581784.3607079. Online publication date: 12-Nov-2023
https://dl.acm.org/doi/10.1145/3581784.3607079
Martínez P, Woodruff J, Armengol-Estapé J, Bernabé G, García J, O’Boyle M, Verbrugge C, Lhoták O, Shen X (2023). Matching Linear Algebra and Tensor Code to Specialized Hardware Accelerators. Proceedings of the 32nd ACM SIGPLAN International Conference on Compiler Construction, pp. 85-97. DOI: 10.1145/3578360.3580262. Online publication date: 17-Feb-2023
https://dl.acm.org/doi/10.1145/3578360.3580262
Tsimpourlas F, Petoumenos P, Xu M, Cummins C, Hazelwood K, Rajan A, Leather H, Kloeckner A, Moreira J (2022). BenchPress. Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, pp. 505-516. DOI: 10.1145/3559009.3569644. Online publication date: 8-Oct-2022
https://dl.acm.org/doi/10.1145/3559009.3569644
Wang H, Tang Z, Zhang C, Zhao J, Cummins C, Leather H, Wang Z, Egger B, Smith A (2022). Automating reinforcement learning architecture design for code optimization. Proceedings of the 31st ACM SIGPLAN International Conference on Compiler Construction, pp. 129-143. DOI: 10.1145/3497776.3517769. Online publication date: 19-Mar-2022
https://dl.acm.org/doi/10.1145/3497776.3517769
Yang W, Fang J, Dong D, Su X, Wang Z, de Supinski B, Hall M, Gamblin T (2021). LIBSHALOM. Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, pp. 1-14. DOI: 10.1145/3458817.3476217. Online publication date: 14-Nov-2021
https://dl.acm.org/doi/10.1145/3458817.3476217
Papadimitriou M, Markou E, Fumero J, Stratikopoulos A, Blanaru F, Kotselidis C, Titzer B, Xu H, Zhang I (2021). Multiple-tasks on multiple-devices (MTMD): exploiting concurrency in heterogeneous managed runtimes. Proceedings of the 17th ACM SIGPLAN/SIGOPS International Conference on Virtual Execution Environments, pp. 125-138. DOI: 10.1145/3453933.3454019. Online publication date: 7-Apr-2021
https://dl.acm.org/doi/10.1145/3453933.3454019
Ye G, Tang Z, Tan S, Huang S, Fang D, Sun X, Bian L, Wang H, Wang Z, Freund S, Yahav E (2021). Automated conformance testing for JavaScript engines via deep compiler fuzzing. Proceedings of the 42nd ACM SIGPLAN International Conference on Programming Language Design and Implementation, pp. 435-450. DOI: 10.1145/3453483.3454054. Online publication date: 19-Jun-2021
https://dl.acm.org/doi/10.1145/3453483.3454054
