DOI: 10.1145/2581122.2544165
tutorial

Efficient Mapping of Irregular C++ Applications to Integrated GPUs

Published: 15 February 2014

Abstract

There is growing interest in using GPUs to accelerate general-purpose computation since they offer the potential of massive parallelism with reduced energy consumption. This interest has been encouraged by the ubiquity of integrated processors that combine a GPU and CPU on the same die, lowering the cost of offloading work to the GPU. However, while the majority of effort has focused on GPU acceleration of regular applications, relatively little is known about the behavior of irregular applications on GPUs. These applications are expected to perform poorly on GPUs without major software engineering effort. We present a compiler framework with support for C++ features that enables GPU acceleration of a wide range of C++ applications with minimal changes. This framework, Concord, includes a low-cost, software shared virtual memory (SVM) implementation that permits seamless sharing of pointer-containing data structures between the CPU and GPU. It also includes compiler optimizations to improve irregular application performance on GPUs. Using Concord, we ran nine irregular C++ programs on two computer systems containing Intel 4th Generation Core processors. One system is an Ultrabook with an integrated HD Graphics 5000 GPU, and the other system is a desktop with an integrated HD Graphics 4600 GPU. The nine applications are pointer-intensive and operate on irregular data structures such as trees and graphs; they include face detection, BTree, single-source shortest path, soft-body physics simulation, and breadth-first search. Our results show that Concord acceleration using the GPU improves energy efficiency by up to 6.04× on the Ultrabook and 3.52× on the desktop over multicore-CPU execution.
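
The abstract does not say how Concord's software SVM is implemented. As an illustration only, the sketch below shows one common way pointer-containing data can be shared when two devices see the same region at different base addresses: every pointer loaded on the second device is rebased by a constant offset before it is dereferenced. This is an assumption for illustration, not a description of Concord itself. The names `Node`, `SharedRegion`, `to_gpu_view`, `sum_list_in_gpu_view`, and `demo_sum` are hypothetical, and the "GPU view" is simulated on the CPU with a byte-for-byte copy of the buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical node type: a linked list whose links are ordinary CPU pointers.
struct Node {
    int value;
    Node* next;  // CPU virtual address, or nullptr at the end of the list
};

// Hypothetical descriptor of one shared region that each device sees at a
// different base address (an assumption made for this sketch).
struct SharedRegion {
    std::uintptr_t cpu_base;
    std::uintptr_t gpu_base;
    std::size_t size_bytes;
};

// Rebase a CPU pointer into the second view by applying a constant offset.
template <typename T>
T* to_gpu_view(const SharedRegion& region, T* cpu_ptr) {
    if (cpu_ptr == nullptr) return nullptr;
    const std::uintptr_t addr = reinterpret_cast<std::uintptr_t>(cpu_ptr);
    // The pointer must lie inside the shared region for the rebase to be valid.
    assert(addr >= region.cpu_base && addr - region.cpu_base < region.size_bytes);
    return reinterpret_cast<T*>(addr - region.cpu_base + region.gpu_base);
}

// Stand-in for a device kernel: walks the list through the second view,
// translating every pointer it loads before dereferencing it.
int sum_list_in_gpu_view(const SharedRegion& region, Node* cpu_head) {
    int sum = 0;
    for (Node* n = to_gpu_view(region, cpu_head); n != nullptr;
         n = to_gpu_view(region, n->next)) {
        sum += n->value;
    }
    return sum;
}

// Builds a three-node list with CPU pointers, simulates the second view with
// a byte-for-byte copy, and sums the list through that view.
int demo_sum() {
    std::vector<Node> cpu_nodes(3);
    for (std::size_t i = 0; i < cpu_nodes.size(); ++i) {
        cpu_nodes[i].value = static_cast<int>(i + 1) * 10;  // 10, 20, 30
        cpu_nodes[i].next =
            (i + 1 < cpu_nodes.size()) ? &cpu_nodes[i + 1] : nullptr;
    }

    // Simulated "GPU view": same bytes (including the CPU pointers) at a
    // different address.
    std::vector<Node> gpu_nodes(cpu_nodes.size());
    const std::size_t bytes = cpu_nodes.size() * sizeof(Node);
    std::memcpy(gpu_nodes.data(), cpu_nodes.data(), bytes);

    // Overwrite the CPU-side values so the result can only come from the
    // second view; the stored links are left untouched.
    for (Node& n : cpu_nodes) n.value = -1;

    const SharedRegion region{
        reinterpret_cast<std::uintptr_t>(cpu_nodes.data()),
        reinterpret_cast<std::uintptr_t>(gpu_nodes.data()),
        bytes};
    return sum_list_in_gpu_view(region, cpu_nodes.data());
}
```

Calling `demo_sum()` returns 60: the stored links are never rewritten, and the traversal reaches the copied nodes only through the rebase. On an integrated processor the copy would be unnecessary because both devices address the same physical memory; it is used here only so the sketch runs on a CPU alone.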

Published In

CGO '14: Proceedings of Annual IEEE/ACM International Symposium on Code Generation and Optimization
February 2014
328 pages
ISBN: 9781450326704
DOI: 10.1145/2581122
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Compiler optimization
  2. Energy efficiency
  3. Integrated GPU programming

Qualifiers

  • Tutorial
  • Research
  • Refereed limited

Conference

CGO '14

Acceptance Rates

CGO '14 Paper Acceptance Rate 29 of 100 submissions, 29%;
Overall Acceptance Rate 312 of 1,061 submissions, 29%
