DOI: 10.1145/3659914.3659925
research-article

Enabling Performance Portability for Shallow Water Equations on CPUs, GPUs, and FPGAs with SYCL

Published: 03 June 2024

Abstract

To make the best use of the diverse hardware architectures in present and future high-performance computers, developers and maintainers of scientific simulation codes strive for performance portability. The goal is to reach a good fraction of the hardware-specific practically achievable performance while maintaining a largely unified codebase. In benchmarks and first production codes, SYCL has been demonstrated to be a promising programming model for this purpose when targeting different CPUs and GPUs.
In this work, we utilize SYCL to develop a performance portable implementation of the 2D shallow water equations, discretized on unstructured triangular meshes using the discontinuous Galerkin method with polynomial orders zero, one, and two. In addition to GPUs from three and CPUs from two vendors, we also broaden the scope of target architectures by including Intel Stratix FPGAs with a fundamentally different execution model. We show that with a few targeted and encapsulated specializations, it is possible to adapt the execution flow to the respective targets. The performance analysis shows how FPGAs complement the other two architectures with particularly good performance for small problem sizes.
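
As a rough illustration of what the three polynomial orders mean for problem size, the following C++ sketch counts unknowns per triangle. It assumes a full polynomial basis of total degree p on each triangle, which has (p+1)(p+2)/2 functions, and three unknowns for the 2D shallow water system (one free-surface or depth variable plus two momentum components). The function names are hypothetical and are not taken from the authors' code.

```cpp
#include <cstddef>

// Number of basis functions of a full polynomial space of total degree <= p
// on a triangle (assumption: the usual P_p space is used on each element).
constexpr std::size_t dg_basis_size(std::size_t p) {
    return (p + 1) * (p + 2) / 2;
}

// Assumption: three unknowns per point for the 2D shallow water system
// (one height-type variable and two momentum components).
constexpr std::size_t kNumUnknowns = 3;

// Degrees of freedom stored per triangle for polynomial order p.
constexpr std::size_t dofs_per_element(std::size_t p) {
    return kNumUnknowns * dg_basis_size(p);
}

// Degrees of freedom for a whole mesh (hypothetical helper).
constexpr std::size_t total_dofs(std::size_t num_triangles, std::size_t p) {
    return num_triangles * dofs_per_element(p);
}

// Orders zero, one, and two give 1, 3, and 6 basis functions per triangle.
static_assert(dg_basis_size(0) == 1, "piecewise constant");
static_assert(dg_basis_size(1) == 3, "piecewise linear");
static_assert(dg_basis_size(2) == 6, "piecewise quadratic");
```

Under these assumptions, a mesh of 1000 triangles carries 3000, 9000, or 18000 degrees of freedom for orders zero, one, and two respectively, which gives a sense of how the working set grows with the polynomial order.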



Published In

PASC '24: Proceedings of the Platform for Advanced Scientific Computing Conference
June 2024
296 pages
ISBN:9798400706394
DOI:10.1145/3659914

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. SYCL
  2. performance-portability
  3. shallow water equations
  4. discontinuous Galerkin method

Qualifiers

  • Research-article

Conference

PASC '24

Acceptance Rates

PASC '24 Paper Acceptance Rate 26 of 36 submissions, 72%;
Overall Acceptance Rate 109 of 221 submissions, 49%

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Total Citations: 0
  • Total Downloads: 68
  • Downloads (Last 12 months): 68
  • Downloads (Last 6 weeks): 6
Reflects downloads up to 21 Nov 2024
