Characterizing the challenges and evaluating the efficacy of a CUDA-to-OpenCL translator

Authors:

Gabriel Martinez

Parallel Computing, Volume 39, Issue 12

Pages 769 - 786

https://doi.org/10.1016/j.parco.2013.09.003

Published: 01 December 2013

Abstract

The proliferation of heterogeneous computing systems has led to increased interest in parallel architectures and their associated programming models. One of the most promising models for heterogeneous computing is the accelerator model, and one of the most cost-effective, high-performance accelerators currently available is the general-purpose graphics processing unit (GPU). Two similar programming environments have been proposed for GPUs: CUDA and OpenCL. While there are more lines of code already written in CUDA, OpenCL is an open standard that supports a broader range of devices. Hence, there is significant interest in automatic translation from CUDA to OpenCL. The contributions of this work are three-fold: (1) an extensive characterization of the subtle challenges of translation, (2) CU2CL (CUDA to OpenCL), an implementation of a translator, and (3) an evaluation of CU2CL with respect to coverage of CUDA, translation performance, and performance of the translated applications.
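To make the idea of CUDA-to-OpenCL translation concrete, the following is a minimal sketch of a purely textual rewrite of a few device-side constructs. It is not CU2CL's method and does not reflect its implementation; the function name `toy_translate` and the sample kernel `scale` are hypothetical, chosen only for illustration. The mappings themselves (for example `threadIdx.x` to `get_local_id(0)`) are the commonly documented CUDA and OpenCL equivalents.

```python
import re

# A small, incomplete subset of well-known CUDA -> OpenCL device-side equivalents.
_DIMS = {"x": 0, "y": 1, "z": 2}
_INDEX_BUILTINS = {
    "threadIdx": "get_local_id",
    "blockIdx": "get_group_id",
    "blockDim": "get_local_size",
    "gridDim": "get_num_groups",
}
_KEYWORDS = {
    "__global__": "__kernel",
    "__shared__": "__local",
}


def toy_translate(cuda_src: str) -> str:
    """Rewrite a few CUDA device-side constructs to OpenCL by text substitution.

    Hypothetical helper for illustration only; a real translator works on a
    parsed representation of the program rather than on raw text.
    """
    def index_repl(match):
        # e.g. blockIdx.x -> get_group_id(0)
        return f"{_INDEX_BUILTINS[match.group(1)]}({_DIMS[match.group(2)]})"

    out = re.sub(
        r"\b(threadIdx|blockIdx|blockDim|gridDim)\.([xyz])\b", index_repl, cuda_src
    )
    for cuda_kw, ocl_kw in _KEYWORDS.items():
        out = out.replace(cuda_kw, ocl_kw)
    # Block-level barrier; only the local-memory fence flag is emitted here.
    out = re.sub(r"\b__syncthreads\s*\(\s*\)", "barrier(CLK_LOCAL_MEM_FENCE)", out)
    return out


cuda_kernel = """__global__ void scale(float *a, float s, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) a[i] *= s;
}"""

opencl_kernel = toy_translate(cuda_kernel)
print(opencl_kernel)
```

The rewritten kernel is still not valid OpenCL: the pointer parameter `float *a` needs an address-space qualifier such as `__global`, which a textual substitution cannot infer, and all host-side code (kernel launches, memory allocation and copies) is left untouched. Gaps of this kind are examples of why translation is harder than a one-to-one renaming.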

Cited By

Sathre P, Gardner M, Feng W (2019). On the Portability of CPU-Accelerated Applications via Automated Source-to-Source Translation. Proceedings of the International Conference on High Performance Computing in Asia-Pacific Region, pp. 1-8. Online publication date: 14-Jan-2019. https://dl.acm.org/doi/10.1145/3293320.3293338

Kim J, Dao T, Jung J, Joo J, Lee J, Kern J, Vetter J (2015). Bridging OpenCL and CUDA. Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, pp. 1-12. Online publication date: 15-Nov-2015. https://dl.acm.org/doi/10.1145/2807591.2807621

Published In

Parallel Computing Volume 39, Issue 12

December, 2013

140 pages

ISSN: 0167-8191

Copyright © 2013.

Publisher

Elsevier Science Publishers B. V.

Netherlands
