research-article

Employing Software-Managed Caches in OpenACC: Opportunities and Benefits

Authors:

Amirali Baniasadi

ACM Transactions on Modeling and Performance Evaluation of Computing Systems (TOMPECS), Volume 1, Issue 1

Article No.: 2, Pages 1 - 34

https://doi.org/10.1145/2798724

Published: 12 February 2016

Abstract

The OpenACC programming model has been developed to simplify accelerator programming and improve development productivity. In this article, we investigate the main limitations OpenACC faces in harnessing the full capabilities of GPU-like accelerators. Building on our findings, we discuss the opportunity to exploit a software-managed cache as (i) a fast communication medium and (ii) a cache for data reuse. To this end, we propose a new directive and communication model for OpenACC. Evaluating several benchmarks, we show that the proposed directive can improve performance by up to 2.54× at the cost of minor programming effort.
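To make the data-reuse role of a software-managed cache more concrete, here is a minimal CPU-side sketch in Python. It is not the directive proposed in the article, whose syntax is not given in this abstract. It only emulates the general scratchpad pattern that GPU shared memory enables. All names (`GlobalMemory`, `stencil_direct`, `stencil_tiled`) and the sizes `N` and `TILE` are hypothetical. Each "thread block" stages a tile of a 1D array, plus one halo element on each side, into a local buffer, and a 3-point stencil then reads every neighbour from that buffer. Because each "thread" consumes values staged by its neighbours, the buffer also loosely illustrates the communication role described above.

```python
# Hypothetical sketch: emulates a GPU software-managed cache (scratchpad) on the CPU.
# N and TILE are illustrative sizes, not values taken from the article.
N = 1024
TILE = 128
assert N % TILE == 0, "N must be a multiple of TILE"


class GlobalMemory:
    """Wraps the input array and counts every read, standing in for DRAM traffic."""

    def __init__(self, data):
        self.data = data
        self.reads = 0

    def load(self, i):
        self.reads += 1
        # Zero padding outside the array bounds.
        return self.data[i] if 0 <= i < len(self.data) else 0.0


def stencil_direct(mem):
    """Baseline: each output fetches its three neighbours from global memory."""
    return [mem.load(i - 1) + mem.load(i) + mem.load(i + 1) for i in range(N)]


def stencil_tiled(mem):
    """Each 'thread block' stages its tile plus a one-element halo on each side
    into a local buffer (the role CUDA __shared__ memory plays on a GPU); all
    neighbour reads in the compute phase are then served from that buffer."""
    out = [0.0] * N
    for base in range(0, N, TILE):
        # Cooperative load phase: exactly one global read per staged element.
        tile = [mem.load(base + t - 1) for t in range(TILE + 2)]
        # On a GPU, a block-wide barrier would separate the load and compute phases.
        for t in range(TILE):
            out[base + t] = tile[t] + tile[t + 1] + tile[t + 2]
    return out


if __name__ == "__main__":
    data = [float(i % 7) for i in range(N)]
    direct_mem, tiled_mem = GlobalMemory(data), GlobalMemory(data)
    same = stencil_direct(direct_mem) == stencil_tiled(tiled_mem)
    print("results match:", same)
    print("global reads: direct =", direct_mem.reads, "tiled =", tiled_mem.reads)
```

With these illustrative sizes, the direct version issues 3N = 3072 global reads, while the tiled version issues (N/TILE)(TILE + 2) = 1040, roughly a 3× reduction. On real hardware, the actual benefit also depends on hardware caches, occupancy, and barrier cost, so this read count is only a proxy.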

Index Terms

Employing Software-Managed Caches in OpenACC: Opportunities and Benefits
1. Software and its engineering
  1. Software notations and tools
    1. General programming languages
      1. Language features

Information

Published In

ACM Transactions on Modeling and Performance Evaluation of Computing Systems Volume 1, Issue 1

Inaugural Issue

March 2016

118 pages

ISSN: 2376-3639

EISSN: 2376-3647

DOI: 10.1145/2893449

Editors:
Don Towsley, University of Massachusetts Amherst, USA
Carey Williamson, University of Calgary, Canada

Copyright © 2016 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 12 February 2016

Accepted: 01 June 2015

Revised: 01 June 2015

Received: 01 December 2014

Published in TOMPECS Volume 1, Issue 1

Qualifiers

Research-article
Research
Refereed

Funding Sources

Natural Sciences and Engineering Research Council of Canada

Article Metrics

Total Citations: 7
Total Downloads: 196
Downloads (last 12 months): 6
Downloads (last 6 weeks): 0

Reflects downloads up to 24 Nov 2024

Cited By

Marowka A (2022) On the Performance Portability of OpenACC, OpenMP, Kokkos and RAJA. International Conference on High Performance Computing in Asia-Pacific Region, pp. 103-114. DOI 10.1145/3492805.3492806. Online publication date: 7-Jan-2022.
https://dl.acm.org/doi/10.1145/3492805.3492806

Sedova A, Tillack A, Tharrington A (2019) Using Compiler Directives for Performance Portability in Scientific Computing: Kernels from Molecular Simulation. Accelerator Programming Using Directives, pp. 22-47. DOI 10.1007/978-3-030-12274-4_2. Online publication date: 24-Jan-2019.
https://doi.org/10.1007/978-3-030-12274-4_2

Pereira A, Castro M, Dantas M, Rocha R, Goes L (2017) Extending OpenACC for Efficient Stencil Code Generation and Execution by Skeleton Frameworks. 2017 International Conference on High Performance Computing & Simulation (HPCS), pp. 719-726. DOI 10.1109/HPCS.2017.110. Online publication date: Jul-2017.
https://doi.org/10.1109/HPCS.2017.110

Pereira A, Rocha R, Castro M, Góes L, Dantas M (2017) Enabling efficient stencil code generation in OpenACC. Procedia Computer Science 108, pp. 2333-2337. DOI 10.1016/j.procs.2017.05.155. Online publication date: 2017.
https://doi.org/10.1016/j.procs.2017.05.155

Lashgar A, Baniasadi A, Chandrasekaran S, Juckeland G (2016) OpenACC cache directive. Proceedings of the Third International Workshop on Accelerator Programming Using Directives, pp. 46-56. DOI 10.5555/3019120.3019125. Online publication date: 13-Nov-2016.
https://dl.acm.org/doi/10.5555/3019120.3019125

Lashgar A, Baniasadi A (2016) OpenACC Cache Directive: Opportunities and Optimizations. 2016 Third Workshop on Accelerator Programming Using Directives (WACCPD), pp. 46-56. DOI 10.1109/WACCPD.2016.009. Online publication date: Nov-2016.
https://doi.org/10.1109/WACCPD.2016.009

Yong K, Talib S (2016) Histogram optimization with CUDA. 2016 IEEE Industrial Electronics and Applications Conference (IEACon), pp. 312-318. DOI 10.1109/IEACON.2016.8067397. Online publication date: Nov-2016.
https://doi.org/10.1109/IEACON.2016.8067397