DOI: 10.1145/3404397.3404428
research-article

Performance Portable Supernode-based Sparse Triangular Solver for Manycore Architectures

Published: 17 August 2020

Abstract

The sparse triangular solver is an important kernel in many computational applications. However, a fast, parallel sparse triangular solver for manycore architectures such as GPUs has remained an open problem for several years. In this paper, we develop a sparse triangular solver that takes advantage of the supernodal structure of the triangular matrices produced by the direct factorization of a sparse matrix. We implemented our solver using Kokkos and Kokkos Kernels so that it is portable to different manycore architectures. This has the additional benefit of allowing our triangular solver to use team-level kernels and to take advantage of the hierarchical parallelism available on the GPU. We compare the effects of different scheduling schemes on performance and also investigate an algorithmic variant called the partitioned inverse. Our performance results on an NVIDIA V100 or P100 GPU demonstrate that our implementation can be 12.4× or 19.5× faster than the vendor-optimized implementation in NVIDIA’s cuSPARSE library.
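To make the role of scheduling concrete, the following is a minimal sketch of level-set scheduling for a sparse lower-triangular solve. The abstract does not name the schemes it compares, so level-set scheduling is an assumption chosen here because it is a common way to expose parallelism in this kernel. The sketch works on scalar rows in CSR format and plain Python, not on supernodal blocks or Kokkos team-level kernels, and the function names `compute_levels` and `lower_solve_by_levels` are hypothetical.

```python
# Illustrative sketch only; not the paper's implementation.
# Assumes a lower-triangular matrix in CSR format with a nonzero diagonal.

def compute_levels(row_ptr, col_idx):
    """Group rows into levels so that rows in one level are independent."""
    n = len(row_ptr) - 1
    level = [0] * n
    for i in range(n):
        for k in range(row_ptr[i], row_ptr[i + 1]):
            j = col_idx[k]
            if j < i:
                # Row i needs x[j], so it must come after row j's level.
                level[i] = max(level[i], level[j] + 1)
    num_levels = max(level) + 1 if n else 0
    levels = [[] for _ in range(num_levels)]
    for i, lv in enumerate(level):
        levels[lv].append(i)
    return levels


def lower_solve_by_levels(row_ptr, col_idx, vals, b):
    """Solve L x = b, visiting the rows one level at a time."""
    x = [0.0] * len(b)
    for rows in compute_levels(row_ptr, col_idx):
        # Rows within a level do not depend on each other; a manycore
        # implementation could process them in parallel.
        for i in rows:
            s = b[i]
            diag = None
            for k in range(row_ptr[i], row_ptr[i + 1]):
                j = col_idx[k]
                if j == i:
                    diag = vals[k]
                else:
                    s -= vals[k] * x[j]
            if diag is None:
                raise ValueError("missing diagonal entry in row %d" % i)
            x[i] = s / diag
    return x


# Small 4x4 example:
# [2 0 0 0]
# [0 4 0 0]
# [1 0 1 0]
# [0 2 3 5]
row_ptr = [0, 1, 2, 4, 7]
col_idx = [0, 1, 0, 2, 1, 2, 3]
vals = [2.0, 4.0, 1.0, 1.0, 2.0, 3.0, 5.0]
b = [2.0, 8.0, 4.0, 33.0]

levels = compute_levels(row_ptr, col_idx)
x = lower_solve_by_levels(row_ptr, col_idx, vals, b)
print(levels)
print(x)
```

In this example rows 0 and 1 fall into the same level and could be solved concurrently, while rows 2 and 3 each wait on an earlier level. A supernodal variant would apply the same dependency analysis to blocks of columns with similar sparsity structure rather than to individual rows, which is where dense team-level kernels become applicable.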

Cited By

  • (2024) AG-SpTRSV: An Automatic Framework to Optimize Sparse Triangular Solve on GPUs. ACM Transactions on Architecture and Code Optimization 21:4 (1-25). DOI: 10.1145/3674911. Online publication date: 25-Jun-2024
  • (2023) A Multi-GPU Aggregation-Based AMG Preconditioner for Iterative Linear Solvers. IEEE Transactions on Parallel and Distributed Systems 34:8 (2365-2376). DOI: 10.1109/TPDS.2023.3287238. Online publication date: Aug-2023
  • (2023) An Experimental Study of Two-level Schwarz Domain-Decomposition Preconditioners on GPUs. 2023 IEEE International Parallel and Distributed Processing Symposium (IPDPS) (680-689). DOI: 10.1109/IPDPS54959.2023.00073. Online publication date: May-2023

Information

Published In

ICPP '20: Proceedings of the 49th International Conference on Parallel Processing
August 2020
844 pages
ISBN: 9781450388160
DOI: 10.1145/3404397
© 2020 Association for Computing Machinery. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of the United States government. As such, the United States Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

ICPP '20

Acceptance Rates

Overall Acceptance Rate 91 of 313 submissions, 29%
