Performance and Portability of a Linear Solver Across Emerging Architectures

Authors: Aaron C. Walden, Mohammad Zubair, Eric J. Nielsen

Accelerator Programming Using Directives: 7th International Workshop, WACCPD 2020, Virtual Event, November 20, 2020, Proceedings

Pages 61 - 79

https://doi.org/10.1007/978-3-030-74224-9_4

Published: 20 November 2020

Abstract

A linear solver algorithm used by a large-scale unstructured-grid computational fluid dynamics application is examined for a broad range of familiar and emerging architectures. Efficient implementation of a linear solver is challenging on recent CPUs offering vector architectures. Vector loads and stores are essential to effectively utilize available memory bandwidth on CPUs, and maintaining performance across different CPUs can be difficult because each offers a different vector length. A similar challenge occurs on GPU architectures, where coalesced memory accesses are essential to utilize memory bandwidth effectively. In this work, we demonstrate that restructuring a computation, and possibly its data layout, for the target architecture is essential to achieve optimal performance. We do so by establishing a performance benchmark for each target architecture in a low-level programming model such as vector intrinsics or CUDA, which shows how a linear solver kernel can be mapped to Intel® Xeon™ and Xeon Phi™, Marvell® ThunderX2®, NEC® SX-Aurora™ TSUBASA Vector Engine, and NVIDIA® and AMD® GPUs. We further demonstrate that the required code restructuring can be achieved in higher-level programming environments such as OpenACC, OCCA, and Intel® oneAPI™/SYCL, and that each generally results in optimal performance on the target architecture. Relative performance metrics for all implementations are shown, and subjective ratings for ease of implementation and optimization are suggested.
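The abstract does not specify the kernel or the data layout, so the following is only a hypothetical illustration of what "restructuring a data layout" for vector loads or coalesced accesses can mean. It assumes an ELLPACK-style padded sparse format stored slot by slot, so that neighbouring rows sit next to each other in memory; the function names `to_ell`, `spmv_ell`, and `spmv_rowwise` are invented for this sketch and are not taken from the paper.

```python
# Illustrative sketch only: the ELLPACK-style layout and all names here are
# assumptions, not the data structure or kernel described in the paper.

def to_ell(rows, width):
    """Pad each sparse row to `width` entries and store them slot-major.

    rows: list of rows, each a list of (column, value) pairs.
    Slot k of row i is stored at index k * n_rows + i, so consecutive
    rows are adjacent in memory for a fixed slot.
    """
    n = len(rows)
    cols = [0] * (n * width)    # padding refers to column 0 ...
    vals = [0.0] * (n * width)  # ... with value 0.0, so it contributes nothing
    for i, row in enumerate(rows):
        for k, (c, v) in enumerate(row):
            cols[k * n + i] = c
            vals[k * n + i] = v
    return cols, vals


def spmv_ell(cols, vals, x, n, width):
    """Compute y = A x; the inner loop over rows is unit-stride."""
    y = [0.0] * n
    for k in range(width):
        base = k * n
        # Consecutive i touch consecutive entries of vals and cols, which is
        # the access pattern that vector lanes or GPU threads favour.
        for i in range(n):
            y[i] += vals[base + i] * x[cols[base + i]]
    return y


def spmv_rowwise(rows, x):
    """Reference row-by-row product used to check the restructured version."""
    return [sum(v * x[c] for c, v in row) for row in rows]


# Small tridiagonal example matrix, chosen arbitrarily for the check.
rows = [
    [(0, 4.0), (1, -1.0)],
    [(0, -1.0), (1, 4.0), (2, -1.0)],
    [(1, -1.0), (2, 4.0)],
]
x = [1.0, 2.0, 3.0]
width = max(len(r) for r in rows)
cols, vals = to_ell(rows, width)
y_ell = spmv_ell(cols, vals, x, len(rows), width)
y_ref = spmv_rowwise(rows, x)
```

Both versions compute the same product; only the storage order and loop nesting differ. In a compiled language the unit-stride inner loop of `spmv_ell` is the part a compiler can vectorize or a GPU can map to adjacent threads, at the cost of storing padding for short rows.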



Published In

Accelerator Programming Using Directives: 7th International Workshop, WACCPD 2020, Virtual Event, November 20, 2020, Proceedings

Nov 2020, 107 pages

ISBN: 978-3-030-74223-2

DOI: 10.1007/978-3-030-74224-9

Editors: Sridutt Bhalachandra (Lawrence Berkeley National Laboratory, Berkeley, CA, USA), Sandra Wienke (RWTH Aachen University, Aachen, Germany), Sunita Chandrasekaran (University of Delaware, Newark, DE, USA), Guido Juckeland (Helmholtz-Zentrum Dresden-Rossendorf, Dresden, Germany)

© Springer Nature Switzerland AG 2021.

Publisher: Springer-Verlag, Berlin, Heidelberg
