research-article

Employing Software-Managed Caches in OpenACC: Opportunities and Benefits

Published: 12 February 2016

Abstract

The OpenACC programming model was developed to simplify accelerator programming and improve development productivity. In this article, we investigate the main limitations that prevent OpenACC from harnessing the full capabilities of GPU-like accelerators. Building on our findings, we discuss the opportunity to exploit a software-managed cache as (i) a fast communication medium and (ii) a cache for data reuse. To this end, we propose a new directive and communication model for OpenACC. Using several benchmarks, we show that the proposed directive can improve performance by up to 2.54×, at the cost of minor programming effort.
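The second use case above, a software-managed cache for data reuse, can be illustrated with the `cache` directive that standard OpenACC (version 2.0 and later) already provides. This is a minimal sketch under stated assumptions, not the new directive proposed in this article: the function name `smooth3` and the 3-point averaging workload are hypothetical choices, and whether a given compiler actually stages the `in[i-1:3]` window in on-chip memory such as GPU shared memory is implementation-dependent, because the directive is only a hint. Compiled without OpenACC support, the pragmas are ignored and the loop runs serially on the host.

```c
#include <assert.h>

/* Hypothetical 3-point smoothing kernel: out[i] = (in[i-1] + in[i] + in[i+1]) / 3.
   Neighbouring iterations read overlapping inputs, which is the data reuse a
   software-managed cache can exploit. Requires n >= 2. */
void smooth3(const float *restrict in, float *restrict out, int n)
{
    assert(n >= 2);

    #pragma acc parallel loop copyin(in[0:n]) copyout(out[0:n])
    for (int i = 1; i < n - 1; ++i) {
        /* Standard OpenACC cache directive (a hint, not a guarantee): ask the
           compiler to keep the 3-element window in[i-1], in[i], in[i+1] in the
           fastest available memory, e.g. GPU shared memory. */
        #pragma acc cache(in[i-1:3])
        out[i] = (in[i - 1] + in[i] + in[i + 1]) / 3.0f;
    }

    /* Boundary elements have only one neighbour; set them on the host after
       the device results have been copied back. */
    out[0] = in[0];
    out[n - 1] = in[n - 1];
}
```

For example, with eight inputs `0, 3, 6, ..., 21`, every interior output is the average of three consecutive multiples of 3 and therefore equals the middle input, so `out` matches `in`. Each input element is read by up to three neighbouring iterations; a software-managed cache lets those repeated reads hit fast on-chip storage instead of global memory.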




    Information

    Published In

    ACM Transactions on Modeling and Performance Evaluation of Computing Systems, Volume 1, Issue 1
    Inaugural Issue
    March 2016
    118 pages
    ISSN: 2376-3639
    EISSN: 2376-3647
    DOI: 10.1145/2893449
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 12 February 2016
    Accepted: 01 June 2015
    Revised: 01 June 2015
    Received: 01 December 2014
    Published in TOMPECS Volume 1, Issue 1


    Author Tags

    1. Accelerator
    2. CUDA
    3. OpenACC
    4. software-managed cache

    Qualifiers

    • Research-article
    • Research
    • Refereed

    Funding Sources

    • Natural Sciences and Engineering Research Council of Canada


    Bibliometrics

    Article Metrics

    • Downloads (last 12 months): 6
    • Downloads (last 6 weeks): 0
    Reflects downloads up to 24 Nov 2024

    Cited By

    • (2022) On the Performance Portability of OpenACC, OpenMP, Kokkos and RAJA. International Conference on High Performance Computing in Asia-Pacific Region, 103-114. DOI: 10.1145/3492805.3492806. Online publication date: 7-Jan-2022.
    • (2019) Using Compiler Directives for Performance Portability in Scientific Computing: Kernels from Molecular Simulation. Accelerator Programming Using Directives, 22-47. DOI: 10.1007/978-3-030-12274-4_2. Online publication date: 24-Jan-2019.
    • (2017) Extending OpenACC for Efficient Stencil Code Generation and Execution by Skeleton Frameworks. 2017 International Conference on High Performance Computing & Simulation (HPCS), 719-726. DOI: 10.1109/HPCS.2017.110. Online publication date: Jul-2017.
    • (2017) Enabling efficient stencil code generation in OpenACC. Procedia Computer Science 108, 2333-2337. DOI: 10.1016/j.procs.2017.05.155. Online publication date: 2017.
    • (2016) OpenACC cache directive. Proceedings of the Third International Workshop on Accelerator Programming Using Directives, 46-56. DOI: 10.5555/3019120.3019125. Online publication date: 13-Nov-2016.
    • (2016) OpenACC Cache Directive: Opportunities and Optimizations. 2016 Third Workshop on Accelerator Programming Using Directives (WACCPD), 46-56. DOI: 10.1109/WACCPD.2016.009. Online publication date: Nov-2016.
    • (2016) Histogram optimization with CUDA. 2016 IEEE Industrial Electronics and Applications Conference (IEACon), 312-318. DOI: 10.1109/IEACON.2016.8067397. Online publication date: Nov-2016.
