DOI: 10.1145/3492805.3492806
research-article

On the Performance Portability of OpenACC, OpenMP, Kokkos and RAJA

Published: 07 January 2022

Abstract

Performance portability frameworks are becoming increasingly central to, and essential for, heterogeneous computing systems. However, the developer's toolbox lacks tools for assessing the degree of performance portability these frameworks achieve.
This article presents a new definition and a metric for evaluating the performance portability of high-level parallel programming models. Using this metric, the performance portability of OpenACC, OpenMP, Kokkos and RAJA was evaluated on 324 case studies spanning various application domains, CPU and GPU architectures, and high-performance compilers. The results show that all four frameworks achieve an impressive performance portability of over 80%, with no significant differences across architectures or compilers.
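As a rough illustration of what a performance portability metric computes, the Python sketch below implements the widely cited harmonic-mean metric of Pennycook, Sewall and Lee. It is not the new metric proposed in this article. For an application running on a platform set H with efficiency e_i on platform i, that metric is |H| divided by the sum of 1/e_i, and it is defined as 0 if any platform in H is unsupported. The function names, the use of `None` to mark an unsupported platform, the run-time-based efficiency helper and all numbers are hypothetical. The arithmetic-mean variant is included only for contrast and, by assumption, treats an unsupported platform as efficiency 0.

```python
from statistics import fmean, harmonic_mean
from typing import Dict, Optional


def app_efficiency(best_time_s: float, time_s: float) -> float:
    """Application efficiency on one platform: best-known run time / observed run time."""
    if best_time_s <= 0 or time_s <= 0:
        raise ValueError("run times must be positive")
    return best_time_s / time_s


def harmonic_pp(eff: Dict[str, Optional[float]]) -> float:
    """Harmonic-mean performance portability over the platform set H.

    `eff` maps a (hypothetical) platform name to an efficiency in (0, 1];
    None marks a platform on which the application does not run.
    """
    if not eff:
        raise ValueError("platform set H must not be empty")
    values = list(eff.values())
    # The metric is defined as 0 if any platform in H is unsupported.
    if any(e is None or e <= 0 for e in values):
        return 0.0
    return harmonic_mean(values)


def arithmetic_pp(eff: Dict[str, Optional[float]]) -> float:
    """Hypothetical arithmetic-mean variant, shown only for contrast.

    Assumption: an unsupported platform (None) contributes an efficiency of 0.
    """
    if not eff:
        raise ValueError("platform set H must not be empty")
    return fmean(0.0 if e is None else e for e in eff.values())


if __name__ == "__main__":
    # Made-up efficiencies for one code on three platforms.
    eff = {"cpu": 0.9, "gpu_a": 0.9, "gpu_b": 0.3}
    print(f"harmonic:   {harmonic_pp(eff):.2f}")    # harmonic:   0.54
    print(f"arithmetic: {arithmetic_pp(eff):.2f}")  # arithmetic: 0.70
```

With these made-up efficiencies, the harmonic mean (0.54) sits well below the arithmetic mean (0.70) because a single weak platform dominates the sum of reciprocals. How a metric averages per-platform efficiencies therefore shapes the portability score it reports.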

Information

Published In

ACM Other conferences
HPCAsia '22: International Conference on High Performance Computing in Asia-Pacific Region
January 2022
145 pages
ISBN:9781450384988
DOI:10.1145/3492805
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 07 January 2022

Author Tags

  1. Kokkos
  2. OpenACC
  3. OpenMP
  4. Performance Efficiency
  5. Performance Portability
  6. RAJA

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

HPC Asia 2022

Acceptance Rates

Overall Acceptance Rate 69 of 143 submissions, 48%

Article Metrics

  • Downloads (Last 12 months)91
  • Downloads (Last 6 weeks)4
Reflects downloads up to 08 Feb 2025

Cited By

  • (2024) HighP5: Programming using Partitioned Parallel Processing Spaces. Journal of the Brazilian Computer Society 30:1, pp. 653-687. DOI: 10.5753/jbcs.2024.4345. Online publication date: 17-Dec-2024
  • (2024) JACC: Leveraging HPC Meta-Programming and Performance Portability with the Just-in-Time and LLVM-based Julia Language. Proceedings of the SC '24 Workshops of the International Conference on High Performance Computing, Network, Storage, and Analysis, pp. 1955-1966. DOI: 10.1109/SCW63240.2024.00245. Online publication date: 17-Nov-2024
  • (2024) Adaptive Parallelism in OpenMP Through Dynamic Variants. High Performance Computing. ISC High Performance 2024 International Workshops, pp. 17-30. DOI: 10.1007/978-3-031-73716-9_2. Online publication date: 14-Dec-2024
  • (2023) An Evaluation of Directive-Based Parallelization on the GPU Using a Parboil Benchmark. Electronics 12:22, 4555. DOI: 10.3390/electronics12224555. Online publication date: 7-Nov-2023
  • (2023) Toward Open Repository of Performance Portability of Applications, Benchmarks and Models. 2023 IEEE 35th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD), pp. 160-169. DOI: 10.1109/SBAC-PAD59825.2023.00025. Online publication date: 17-Oct-2023
  • (2023) Evaluating performance and portability of high-level programming models: Julia, Python/Numba, and Kokkos on exascale nodes. 2023 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), pp. 373-382. DOI: 10.1109/IPDPSW59300.2023.00068. Online publication date: May-2023
  • (2023) Towards Achieving Transparent Malleability Thanks to MPI Process Virtualization. High Performance Computing, pp. 28-41. DOI: 10.1007/978-3-031-40843-4_3. Online publication date: 21-May-2023
  • (2023) A comparison of two performance portability metrics. Concurrency and Computation: Practice and Experience 35:25. DOI: 10.1002/cpe.7868. Online publication date: 4-Aug-2023
  • (2022) Inferential Statistical Analysis of Performance Portability. Parallel Processing and Applied Mathematics, pp. 39-50. DOI: 10.1007/978-3-031-30445-3_4. Online publication date: 11-Sep-2022
  • (2022) Implementing a GPU-Portable Field Line Tracing Application with OpenMP Offload. High Performance Computing, pp. 31-46. DOI: 10.1007/978-3-031-23821-5_3. Online publication date: 21-Dec-2022
