DOI: 10.1109/CGO.2013.6494993

Portable mapping of data parallel programs to OpenCL for heterogeneous systems

Published: 23 February 2013

Abstract

General-purpose GPU-based systems are highly attractive as they give potentially massive performance at little cost. Realizing such potential is challenging due to the complexity of programming. This paper presents a compiler-based approach to automatically generate optimized OpenCL code from data-parallel OpenMP programs for GPUs. Such an approach brings together the benefits of a clear high-level language (OpenMP) and an emerging standard (OpenCL) for heterogeneous multi-cores. A key feature of our scheme is that it leverages existing transformations, especially data transformations, to improve performance on GPU architectures and uses predictive modeling to automatically determine whether it is worthwhile running the OpenCL code on the GPU or the OpenMP code on the multi-core host. We applied our approach to the entire NAS parallel benchmark suite and evaluated it on two distinct GPU-based systems: Core i7/NVIDIA GeForce GTX 580 and Core i7/AMD Radeon 7970. We achieved average (up to) speedups of 4.51x and 4.20x (143x and 67x) respectively over a sequential baseline. This is, on average, a factor of 1.63 and 1.56 times faster than a hand-coded, GPU-specific OpenCL implementation developed by independent expert programmers.
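The mapping decision described in the abstract (run the generated OpenCL code on the GPU, or keep the OpenMP code on the multi-core host) can be pictured as a small classifier over per-kernel features. The sketch below is illustrative only and is not the paper's model: the feature names (`compute_ops`, `transfer_bytes`, `coalesced_fraction`), the thresholds, and the function `choose_device` are all hypothetical, and a real predictive model would be trained on measured data rather than written by hand.

```python
from dataclasses import dataclass


@dataclass
class KernelFeatures:
    """Hypothetical static features of one data-parallel loop."""
    compute_ops: int           # estimated arithmetic operations
    transfer_bytes: int        # bytes copied between host and GPU
    coalesced_fraction: float  # share of memory accesses that coalesce (0..1)


def choose_device(f: KernelFeatures,
                  min_ops_per_byte: float = 4.0,
                  min_coalesced: float = 0.5) -> str:
    """Hand-written two-level decision rule (assumed thresholds).

    Returns "gpu-opencl" or "cpu-openmp".
    """
    # Work per transferred byte; no transfer at all counts as unbounded.
    if f.transfer_bytes == 0:
        ops_per_byte = float("inf")
    else:
        ops_per_byte = f.compute_ops / f.transfer_bytes

    # Too little work to amortize the host-GPU copy: stay on the host.
    if ops_per_byte < min_ops_per_byte:
        return "cpu-openmp"
    # Mostly uncoalesced accesses: assume the GPU would be memory bound.
    if f.coalesced_fraction < min_coalesced:
        return "cpu-openmp"
    return "gpu-opencl"


if __name__ == "__main__":
    heavy = KernelFeatures(compute_ops=1_000_000, transfer_bytes=1_000,
                           coalesced_fraction=0.9)
    light = KernelFeatures(compute_ops=1_000, transfer_bytes=1_000,
                           coalesced_fraction=0.9)
    print(choose_device(heavy), choose_device(light))
```

The point of the sketch is the shape of the decision: a cheap prediction made per program or kernel, so that code which would not benefit from the GPU is left on the multi-core host.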


Published In

CGO '13: Proceedings of the 2013 IEEE/ACM International Symposium on Code Generation and Optimization (CGO)
February 2013
366 pages
ISBN:9781467355247

Publisher

IEEE Computer Society

United States

Author Tags

  1. GPU
  2. Machine-Learning Mapping
  3. OpenCL

Qualifiers

  • Article
