DOI: 10.1145/3432261.3436753
research-article
Open access

SeisSol on Distributed Multi-GPU Systems: CUDA Code Generation for the Modal Discontinuous Galerkin Method

Published: 20 January 2021

Abstract

We present a GPU implementation of the high-order Discontinuous Galerkin (DG) scheme in SeisSol, a software package for simulating seismic waves and earthquake dynamics. Our particular focus is on providing a performance-portable solution for heterogeneous distributed multi-GPU systems. We therefore redesigned SeisSol’s code generation cascade for GPU programming models. This includes CUDA source code generation for the performance-critical small batched matrix multiplication kernels. The parallelisation extends the existing MPI+X scheme and supports SeisSol’s cluster-wise Local Time Stepping (LTS) algorithm for ADER time integration.
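As a point of reference for the term "small batched matrix multiplication", the following minimal Python sketch computes one small product per mesh element, with a single operator matrix shared by the whole batch. The function name `batched_gemm`, the nested-list layout, and the shared-operator assumption are illustrative choices made here and are not taken from SeisSol or its code generator. On a GPU, each batch entry would plausibly be handled by its own group of threads; that mapping is likewise an assumption of this sketch.

```python
def batched_gemm(a_batch, b):
    """Return [A @ b for A in a_batch] using plain nested lists.

    a_batch: list of small matrices, one per element (hypothetical layout).
    b:       one small operator matrix shared by every batch entry.
    """
    inner, cols = len(b), len(b[0])
    c_batch = []
    for a in a_batch:
        # Every row of A must have as many entries as B has rows.
        if any(len(row) != inner for row in a):
            raise ValueError("inner dimensions of A and B do not match")
        c = [
            [sum(a_row[k] * b[k][j] for k in range(inner)) for j in range(cols)]
            for a_row in a
        ]
        c_batch.append(c)
    return c_batch


if __name__ == "__main__":
    elements = [[[1, 2], [3, 4]], [[0, 1], [1, 0]]]  # two tiny "elements"
    operator = [[1, 0], [0, 2]]
    for c in batched_gemm(elements, operator):
        print(c)
```

The individual products are independent of one another, which is what makes the batch well suited to massively parallel hardware even though each matrix is far too small to occupy a GPU on its own.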
We performed a Roofline model analysis to ensure that the generated batched matrix operations reach the performance limit imposed by the memory-bandwidth roofline. Our results also demonstrate that the generated GPU kernels outperform the corresponding cuBLAS subroutines by a factor of 2.5 on average. We present strong and weak scaling studies of our implementation on the Marconi100 supercomputer (with 4 Nvidia Volta V100 GPUs per node) on up to 256 GPUs; these reveal good parallel performance and efficiency for time integration with global time stepping (GTS). However, we show that directly mapping the LTS method from CPUs to distributed GPU environments results in lower hardware utilization. Nevertheless, due to the algorithmic advantages of local time stepping, the method still reduces time-to-solution by a factor of 1.3 on average compared to the GTS scheme.
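The Roofline bound referred to above can be illustrated with a short calculation: attainable performance is the smaller of the peak compute rate and the product of arithmetic intensity and memory bandwidth. The sketch below is a generic illustration, not the analysis from the paper. The function names, the traffic model (A, B, and C each read once, C written once, no cache reuse), the matrix sizes, and the hardware figures in the usage example (nominal vendor values for a V100) are all assumptions made here.

```python
def gemm_arithmetic_intensity(m, n, k, bytes_per_value=8):
    """FLOP per byte for C += A @ B with A (m x k), B (k x n), C (m x n).

    Assumed traffic model: A, B and C are each read once and C is written
    once; no reuse across batch entries is taken into account.
    """
    flops = 2 * m * n * k
    bytes_moved = bytes_per_value * (m * k + k * n + 2 * m * n)
    return flops / bytes_moved


def roofline(intensity, peak_flops, bandwidth):
    """Attainable FLOP/s: min(peak, arithmetic intensity * bandwidth)."""
    return min(peak_flops, intensity * bandwidth)


if __name__ == "__main__":
    # Hypothetical matrix sizes and nominal hardware figures (assumptions).
    ai = gemm_arithmetic_intensity(m=56, n=9, k=9)
    bound = roofline(ai, peak_flops=7.8e12, bandwidth=900e9)
    print(f"arithmetic intensity: {ai:.3f} FLOP/byte")
    print(f"roofline bound:       {bound / 1e9:.1f} GFLOP/s")
```

With operands this small the arithmetic intensity stays well below one FLOP per byte under the assumed traffic model, so the bound is set by memory bandwidth rather than by peak compute.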

    Published In

    HPCAsia '21: The International Conference on High Performance Computing in Asia-Pacific Region
    January 2021
    143 pages
    ISBN:9781450388429
    DOI:10.1145/3432261
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from ACM.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. ADER
    2. Discontinuous Galerkin
    3. GPU
    4. SeisSol
    5. code generation
    6. high performance computing
    7. local time stepping
    8. seismic wave propagation

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Funding Sources

    • the European Union’s Horizon 2020 research and innovation programme

    Conference

    HPC Asia 2021

    Acceptance Rates

    Overall Acceptance Rate 69 of 143 submissions, 48%

    Cited By

    • (2024) Rupture Dynamics of Cascading Earthquakes in a Multiscale Fracture Network. Journal of Geophysical Research: Solid Earth 129:3. DOI: 10.1029/2023JB027578. Online publication date: 19-Mar-2024
    • (2024) Fused GEMMs towards an efficient GPU implementation of the ADER‐DG method in SeisSol. Concurrency and Computation: Practice and Experience 36:12. DOI: 10.1002/cpe.8037. Online publication date: 13-Feb-2024
    • (2023) The EU Center of Excellence for Exascale in Solid Earth (ChEESE). Future Generation Computer Systems 146:C (47-61). DOI: 10.1016/j.future.2023.04.006. Online publication date: 1-Sep-2023
    • (2022) Next-Generation Local Time Stepping for the ADER-DG Finite Element Method. 2022 IEEE International Parallel and Distributed Processing Symposium (IPDPS) (402-413). DOI: 10.1109/IPDPS53621.2022.00046. Online publication date: May-2022
    • (2022) A discontinuous Galerkin method for sequences of earthquakes and aseismic slip on multiple faults using unstructured curvilinear grids. Geophysical Journal International 233:1 (586-626). DOI: 10.1093/gji/ggac467. Online publication date: 25-Nov-2022
    • (2022) An efficient ADER-DG local time stepping scheme for 3D HPC simulation of seismic waves in poroelastic media. Journal of Computational Physics 455:C. DOI: 10.1016/j.jcp.2021.110886. Online publication date: 15-Apr-2022
    • (2022) Finch: Domain Specific Language and Code Generation for Finite Element and Finite Volume in Julia. Computational Science – ICCS 2022 (118-132). DOI: 10.1007/978-3-031-08751-6_9. Online publication date: 21-Jun-2022
    • (2021) 3D Linked Subduction, Dynamic Rupture, Tsunami, and Inundation Modeling: Dynamic Effects of Supershear and Tsunami Earthquakes, Hypocenter Location, and Shallow Fault Slip. Frontiers in Earth Science 9. DOI: 10.3389/feart.2021.626844. Online publication date: 24-Jun-2021
