Enabling Performance Portability for Shallow Water Equations on CPUs, GPUs, and FPGAs with SYCL

Authors:

Markus Büttner,

Harald Köstler,

Christian Plessl,

Vadym Aizinger

PASC '24: Proceedings of the Platform for Advanced Scientific Computing Conference

Article No.: 11, Pages 1 - 12

https://doi.org/10.1145/3659914.3659925

Published: 03 June 2024

Abstract

To make the best use of the diverse hardware architectures in present and future high-performance computers, developers and maintainers of scientific simulation codes strive for performance portability. The goal is to reach a good fraction of the hardware-specific practically achievable performance while maintaining a largely unified codebase. In benchmarks and first production codes, SYCL has been demonstrated to be a promising programming model for this purpose when targeting different CPUs and GPUs.

In this work, we utilize SYCL to develop a performance portable implementation of the 2D shallow water equations, discretized on unstructured triangular meshes using the discontinuous Galerkin method with polynomial orders zero, one, and two. In addition to GPUs from three and CPUs from two vendors, we also broaden the scope of target architectures by including Intel Stratix FPGAs with a fundamentally different execution model. We show that with a few targeted and encapsulated specializations, it is possible to adapt the execution flow to the respective targets. The performance analysis shows how FPGAs complement the other two architectures with particularly good performance for small problem sizes.

Published In

PASC '24: Proceedings of the Platform for Advanced Scientific Computing Conference

June 2024

296 pages

ISBN: 9798400706394

DOI: 10.1145/3659914

Proceedings Chairs: Timothy Robinson (ETH Zurich/CSCS), Michèle Woodtli (ETH Zurich/CSCS)

Program Chairs: Axel Huebl (Lawrence Berkeley National Laboratory), Cristina Silvano (Politecnico di Milano)

Copyright © 2024 Copyright is held by the owner/author(s). Publication rights licensed to ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Funding Sources

Deutsche Forschungsgemeinschaft

Conference

PASC '24

Sponsor:

SIGHPC

PASC '24: Platform for Advanced Scientific Computing Conference

June 3 - 5, 2024

Zurich, Switzerland

Acceptance Rates

PASC '24 Paper Acceptance Rate 26 of 36 submissions, 72%

Overall Acceptance Rate 109 of 221 submissions, 49%
