DOI: 10.1109/SC.2010.36
Article

OpenMPC: Extended OpenMP Programming and Tuning for GPUs

Published: 13 November 2010

Abstract

General-Purpose Graphics Processing Units (GPGPUs) are promising parallel platforms for high-performance computing. The CUDA (Compute Unified Device Architecture) programming model provides improved programmability for general computing on GPGPUs. However, its unique execution model and memory model still pose significant challenges for developers of efficient GPGPU code. This paper proposes a new programming interface, called OpenMPC, which builds on OpenMP to provide an abstraction of the complex CUDA programming model and offers high-level controls of the involved parameters and optimizations. We have developed a fully automatic compilation and user-assisted tuning system supporting OpenMPC. In addition to a range of compiler transformations and optimizations, the system includes tuning capabilities for generating, pruning, and navigating the search space of compilation variants. Our results demonstrate that OpenMPC offers both programmability and tunability. Our system achieves 88% of the performance of hand-coded CUDA programs.
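
As a rough illustration of the kind of input the abstract refers to, the sketch below shows an ordinary OpenMP work-sharing loop in C. It uses only standard OpenMP; the function name `axpy_sum` and its parameters are hypothetical and do not come from the paper, and no OpenMPC-specific directives are shown because the abstract does not specify them. A loop of this shape is the sort of parallel region an OpenMP-to-CUDA translator could plausibly map to a GPU kernel.

```c
#include <stddef.h>

/*
 * Illustrative only: computes y[i] = a * x[i] + y[i] for every element
 * and returns the sum of the updated y values.
 * Standard OpenMP pragmas; a compiler without OpenMP support ignores
 * the pragma and runs the loop sequentially with the same result.
 */
double axpy_sum(size_t n, double a, const double *x, double *y)
{
    double sum = 0.0;
    long i;

    /* Each iteration is independent; the partial sums are combined
       through the reduction clause. */
    #pragma omp parallel for reduction(+:sum)
    for (i = 0; i < (long)n; i++) {
        y[i] = a * x[i] + y[i];
        sum += y[i];
    }
    return sum;
}
```

Calling `axpy_sum(4, 2.0, x, y)` with `x = {1, 2, 3, 4}` and `y = {1, 1, 1, 1}` updates `y` in place; how such a loop would be annotated or tuned under OpenMPC is left to the paper itself.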

    Published In

    SC '10: Proceedings of the 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis
    November 2010
    634 pages
    ISBN: 9781424475599

    Publisher

    IEEE Computer Society

    United States

    Qualifiers

    • Article

    Conference

    SC '10
    Acceptance Rates

    SC '10 Paper Acceptance Rate 51 of 253 submissions, 20%;
    Overall Acceptance Rate 1,516 of 6,373 submissions, 24%

    Cited By

    • (2020) Tiling-Based Programming Model for Structured Grids on GPU Clusters. Proceedings of the International Conference on High Performance Computing in Asia-Pacific Region, pp. 43-51. DOI: 10.1145/3368474.3368485. Online publication date: 15-Jan-2020
    • (2019) Optimization of lattice Boltzmann simulations on heterogeneous computers. International Journal of High Performance Computing Applications 33:1, pp. 124-139. DOI: 10.1177/1094342017703771. Online publication date: 1-Jan-2019
    • (2019) NoT. The Journal of Supercomputing 75:7, pp. 3810-3841. DOI: 10.1007/s11227-019-02749-1. Online publication date: 1-Jul-2019
    • (2019) Compiler Optimization of Accelerator Data Transfers. International Journal of Parallel Programming 47:1, pp. 39-58. DOI: 10.1007/s10766-017-0549-3. Online publication date: 1-Feb-2019
    • (2018) Automatic annotation of tasks in structured code. Proceedings of the 27th International Conference on Parallel Architectures and Compilation Techniques, pp. 1-13. DOI: 10.1145/3243176.3243200. Online publication date: 1-Nov-2018
    • (2017) DawnCC. ACM Transactions on Architecture and Code Optimization 14:2, pp. 1-25. DOI: 10.1145/3084540. Online publication date: 26-May-2017
    • (2017) Optimized two-level parallelization for GPU accelerators using the polyhedral model. Proceedings of the 26th International Conference on Compiler Construction, pp. 22-33. DOI: 10.1145/3033019.3033022. Online publication date: 5-Feb-2017
    • (2017) VectorPU. Proceedings of the 8th Workshop and 6th Workshop on Parallel Programming and Run-Time Management Techniques for Many-core Architectures and Design Tools and Architectures for Multicore Embedded Computing Platforms, pp. 7-12. DOI: 10.1145/3029580.3029582. Online publication date: 25-Jan-2017
    • (2017) Panda. International Journal of Parallel Programming 45:3, pp. 711-729. DOI: 10.1007/s10766-016-0454-1. Online publication date: 1-Jun-2017
    • (2016) Exploring compiler optimization opportunities for the OpenMP 4.x accelerator model on a POWER8+GPU platform. Proceedings of the Third International Workshop on Accelerator Programming Using Directives, pp. 68-78. DOI: 10.5555/3019120.3019127. Online publication date: 13-Nov-2016
