research-article

On the Performance Portability of OpenACC, OpenMP, Kokkos and RAJA

Author:

Ami Marowka

HPCAsia '22: International Conference on High Performance Computing in Asia-Pacific Region

Pages 103 - 114

https://doi.org/10.1145/3492805.3492806

Published: 07 January 2022

Abstract

Performance portability frameworks are becoming increasingly central to heterogeneous computing systems. However, developers lack tools for assessing the degree of performance portability these frameworks actually achieve.

This article presents a new definition and a metric for evaluating the performance portability of high-level parallel programming models. Using the new metric, the performance portability of OpenACC, OpenMP, Kokkos and RAJA was evaluated on 324 case studies spanning a range of application domains, CPU and GPU architectures, and high-performance compilers. The results show that all four performance portability frameworks achieve performance portability above 80%, with no significant differences across architectures or compilers.
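The abstract does not reproduce the article's new metric, so the sketch below is offered only as a point of reference for what such a metric computes. It implements the widely cited metric of Pennycook, Sewall and Lee: the harmonic mean of a code's per-platform efficiencies over a platform set, defined as zero if the code is unsupported on any platform in the set. A plain arithmetic mean is shown alongside for comparison. Neither function is claimed to be the metric proposed in this article; the function names, the time-based efficiency definition, and the sample efficiencies are illustrative assumptions.

```python
from statistics import harmonic_mean, mean
from typing import Mapping, Optional

# Per-platform efficiency: a value in (0, 1], or None if the code does not
# run on that platform at all. Platform names are hypothetical labels.
Efficiencies = Mapping[str, Optional[float]]


def efficiency(best_time: float, observed_time: float) -> float:
    """Time-based application efficiency (an assumption): best known runtime
    on a platform divided by the runtime achieved by the code under study."""
    if best_time <= 0 or observed_time <= 0:
        raise ValueError("runtimes must be positive")
    return best_time / observed_time


def _supported_values(effs: Efficiencies) -> Optional[list]:
    """Return the efficiencies, or None if any platform is unsupported."""
    values = list(effs.values())
    if not values or any(e is None or e <= 0 for e in values):
        return None
    return values


def pp_harmonic(effs: Efficiencies) -> float:
    """Pennycook-style metric: harmonic mean over the platform set,
    or 0.0 if the code is unsupported on any platform."""
    values = _supported_values(effs)
    return 0.0 if values is None else harmonic_mean(values)


def pp_arithmetic(effs: Efficiencies) -> float:
    """Arithmetic mean of the same efficiencies, for comparison only,
    using the same 0.0 rule for unsupported platforms."""
    values = _supported_values(effs)
    return 0.0 if values is None else mean(values)


if __name__ == "__main__":
    # Hypothetical efficiencies for one framework on one CPU and one GPU.
    effs = {"cpu": 0.9, "gpu": 0.6}
    print(f"{pp_harmonic(effs):.2f}")    # 0.72
    print(f"{pp_arithmetic(effs):.2f}")  # 0.75
    print(pp_harmonic({"cpu": 0.9, "gpu": None}))  # 0.0
```

The harmonic mean weights the weaker platform more heavily, so for the same two efficiencies it yields 0.72 where the arithmetic mean yields 0.75; how efficiencies are averaged is one of the design choices any performance portability metric has to make.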


Published In


HPCAsia '22: International Conference on High Performance Computing in Asia-Pacific Region

January 2022

145 pages

ISBN: 9781450384988

DOI: 10.1145/3492805

Copyright © 2022 ACM.


Publisher

Association for Computing Machinery

New York, NY, United States


Qualifiers

Research-article
Research
Refereed limited

Conference

HPC Asia 2022: International Conference on High Performance Computing in Asia-Pacific Region

January 12 - 14, 2022

Virtual Event, Japan

Acceptance Rates

Overall Acceptance Rate 69 of 143 submissions, 48%

Article Metrics

Total Citations: 10
Total Downloads: 477
Downloads (last 12 months): 91
Downloads (last 6 weeks): 4

Reflects downloads up to 08 Feb 2025.

Cited By

Yanhaona M, Grimshaw A, Mickey S (2024). HighP5: Programming using Partitioned Parallel Processing Spaces. Journal of the Brazilian Computer Society 30:1, pp. 653-687. DOI: 10.5753/jbcs.2024.4345. Online publication date: 17-Dec-2024.
https://doi.org/10.5753/jbcs.2024.4345
Valero-Lara P, Godoy W, Mankad H, Teranishi K, Vetter J, Blaschke J, Schanen M (2024). JACC: Leveraging HPC Meta-Programming and Performance Portability with the Just-in-Time and LLVM-based Julia Language. Proceedings of the SC '24 Workshops of the International Conference on High Performance Computing, Network, Storage, and Analysis, pp. 1955-1966. DOI: 10.1109/SCW63240.2024.00245. Online publication date: 17-Nov-2024.
https://dl.acm.org/doi/10.1109/SCW63240.2024.00245
Munera A, Dasca G, Quiñones E, Royuela S (2024). Adaptive Parallelism in OpenMP Through Dynamic Variants. High Performance Computing. ISC High Performance 2024 International Workshops, pp. 17-30. DOI: 10.1007/978-3-031-73716-9_2. Online publication date: 14-Dec-2024.
https://doi.org/10.1007/978-3-031-73716-9_2
Đukić J, Mišić M (2023). An Evaluation of Directive-Based Parallelization on the GPU Using a Parboil Benchmark. Electronics 12:22, article 4555. DOI: 10.3390/electronics12224555. Online publication date: 7-Nov-2023.
https://doi.org/10.3390/electronics12224555
Marowka A (2023). Toward Open Repository of Performance Portability of Applications, Benchmarks and Models. 2023 IEEE 35th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD), pp. 160-169. DOI: 10.1109/SBAC-PAD59825.2023.00025. Online publication date: 17-Oct-2023.
https://doi.org/10.1109/SBAC-PAD59825.2023.00025
Godoy W, Valero-Lara P, Dettling T, Trefftz C, Jorquera I, Sheehy T, Miller R, Gonzalez-Tallada M, Vetter J, Churavy V (2023). Evaluating performance and portability of high-level programming models: Julia, Python/Numba, and Kokkos on exascale nodes. 2023 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), pp. 373-382. DOI: 10.1109/IPDPSW59300.2023.00068. Online publication date: May-2023.
https://doi.org/10.1109/IPDPSW59300.2023.00068
Taboada H, Pereira R, Jaeger J, Besnard J (2023). Towards Achieving Transparent Malleability Thanks to MPI Process Virtualization. High Performance Computing, pp. 28-41. DOI: 10.1007/978-3-031-40843-4_3. Online publication date: 21-May-2023.
https://dl.acm.org/doi/10.1007/978-3-031-40843-4_3
Marowka A (2023). A comparison of two performance portability metrics. Concurrency and Computation: Practice and Experience 35:25. DOI: 10.1002/cpe.7868. Online publication date: 4-Aug-2023.
https://doi.org/10.1002/cpe.7868
Marowka A (2022). Inferential Statistical Analysis of Performance Portability. Parallel Processing and Applied Mathematics, pp. 39-50. DOI: 10.1007/978-3-031-30445-3_4. Online publication date: 11-Sep-2022.
https://dl.acm.org/doi/10.1007/978-3-031-30445-3_4
Jiménez D, Herrera-Mora J, Rampp M, Laure E, Meneses E (2022). Implementing a GPU-Portable Field Line Tracing Application with OpenMP Offload. High Performance Computing, pp. 31-46. DOI: 10.1007/978-3-031-23821-5_3. Online publication date: 21-Dec-2022.
https://doi.org/10.1007/978-3-031-23821-5_3
