- Research article, November 2024
Ginkgo - A math library designed to accelerate Exascale Computing Project science applications
- Terry Cojean,
- Pratik Nayak,
- Tobias Ribizel,
- Natalie Beams,
- Yu-Hsiang Mike Tsai,
- Marcel Koch,
- Fritz Göbel,
- Thomas Grützmacher,
- Hartwig Anzt,
- Michael Heroux
International Journal of High Performance Computing Applications (SAGE-HPCA), Volume 38, Issue 6, Pages 568–584. https://doi.org/10.1177/10943420241268323
Large-scale simulations require efficient computation across the entire computing hierarchy. A challenge of the Exascale Computing Project (ECP) was to reconcile highly heterogeneous hardware with the myriad of applications that were required to run on ...
- Research article, October 2024
Mixed-precision pre-pivoting strategy for the LU factorization
Abstract: This paper investigates the efficient application of half-precision floating-point (FP16) arithmetic on GPUs for boosting LU decompositions in double (FP64) precision. Addressing the motivation to enhance computational efficiency, we introduce two ...
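The listing truncates the abstract before the two proposed strategies are named. Purely as a hypothetical illustration of the general idea of pre-pivoting in low precision (not necessarily the authors' algorithm), the sketch below records a partial-pivoting order on an FP16-rounded copy of the matrix and then runs an unpivoted FP64 factorization on the pre-permuted matrix; the function names and the 64x64 test matrix are invented for the example.

```python
import numpy as np

def pivot_order_fp16(A):
    # Run partial-pivoting elimination on a half-precision copy purely to
    # record the pivot order; the FP64 data is never touched here.
    B = A.astype(np.float16).astype(np.float64)
    n = B.shape[0]
    perm = np.arange(n)
    for k in range(n - 1):
        p = k + np.argmax(np.abs(B[k:, k]))
        B[[k, p]], perm[[k, p]] = B[[p, k]], perm[[p, k]]
        B[k + 1:, k] /= B[k, k]
        B[k + 1:, k + 1:] -= np.outer(B[k + 1:, k], B[k, k + 1:])
    return perm

def lu_nopivot(A):
    # Unpivoted right-looking LU in FP64 on the pre-permuted matrix; returns
    # L (unit lower, strictly below the diagonal) and U packed together.
    A = np.array(A, dtype=np.float64)
    n = A.shape[0]
    for k in range(n - 1):
        A[k + 1:, k] /= A[k, k]
        A[k + 1:, k + 1:] -= np.outer(A[k + 1:, k], A[k, k + 1:])
    return A

rng = np.random.default_rng(0)
A = rng.standard_normal((64, 64))
LU = lu_nopivot(A[pivot_order_fp16(A)])
```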
- Research article, November 2023
GPU-based LU Factorization and Solve on Batches of Matrices with Band Structure
SC-W '23: Proceedings of the SC '23 Workshops of The International Conference on High Performance Computing, Network, Storage, and Analysis, Pages 1672–1679. https://doi.org/10.1145/3624062.3624247
This paper presents a portable and performance-efficient approach to solve a batch of linear systems of equations using Graphics Processing Units (GPUs). Each system is represented using a special type of matrix with a band structure above and/or below ...
- Research article, November 2023
MatRIS: Multi-level Math Library Abstraction for Heterogeneity and Performance Portability using IRIS Runtime
- Mohammad Alaul Haque Monil,
- Narasinga Rao Miniskar,
- Keita Teranishi,
- Jeffrey S. Vetter,
- Pedro Valero-Lara
SC-W '23: Proceedings of the SC '23 Workshops of The International Conference on High Performance Computing, Network, Storage, and Analysis, Pages 1081–1092. https://doi.org/10.1145/3624062.3624184
Vendor libraries are tuned for a specific architecture and are not portable to others. Moreover, they lack support for heterogeneity and multi-device orchestration, which is required for efficient use of contemporary HPC and cloud resources. To address ...
- Research article, November 2023
Optimized matrix ordering of sparse linear solver using a few-shot model for circuit simulation
Abstract: The sparse linear solver has become the bottleneck in a SPICE-like circuit simulator. A general sparse linear solver comprises pre-analysis, numeric factorization, and right-hand-side solving. The matrix ordering method in pre-analysis determines fill-...
- Research article, June 2023
Using Additive Modifications in LU Factorization Instead of Pivoting
ICS '23: Proceedings of the 37th ACM International Conference on Supercomputing, Pages 14–24. https://doi.org/10.1145/3577193.3593731
Direct solvers for dense systems of linear equations commonly use partial pivoting to ensure numerical stability. However, pivoting can introduce significant performance overheads, such as synchronization and data movement, particularly on distributed ...
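The abstract above is cut off before describing the proposed modifications. As a loosely related sketch of the general idea of avoiding pivoting (a generic diagonal-boosting scheme followed by iterative refinement, not the method of this ICS '23 paper), the following code runs an unpivoted LU that perturbs tiny pivots and then recovers accuracy by refining against the original matrix; `lu_with_diagonal_boost`, `solve_with_refinement`, and the tolerance `tau` are illustrative names and values.

```python
import numpy as np
from scipy.linalg import solve_triangular

def lu_with_diagonal_boost(A, tau=1e-8):
    # Unpivoted right-looking LU; a pivot smaller than tau * max|A| is replaced
    # by a signed boost so the factorization never breaks down. Returns the
    # packed LU factors and the list of modified pivot positions.
    LU = np.array(A, dtype=np.float64)
    n = LU.shape[0]
    boost = tau * np.abs(LU).max()
    modified = []
    for k in range(n):
        if abs(LU[k, k]) < boost:
            LU[k, k] = boost if LU[k, k] >= 0 else -boost
            modified.append(k)
        if k < n - 1:
            LU[k + 1:, k] /= LU[k, k]
            LU[k + 1:, k + 1:] -= np.outer(LU[k + 1:, k], LU[k, k + 1:])
    return LU, modified

def solve_with_refinement(A, LU, b, iters=3):
    # Use the (possibly perturbed) factors as a solver and recover accuracy
    # with a few steps of iterative refinement against the original A.
    L = np.tril(LU, -1) + np.eye(A.shape[0])
    U = np.triu(LU)
    x = solve_triangular(U, solve_triangular(L, b, lower=True))
    for _ in range(iters):
        r = b - A @ x
        x += solve_triangular(U, solve_triangular(L, r, lower=True))
    return x

rng = np.random.default_rng(0)
A = rng.standard_normal((64, 64))
b = rng.standard_normal(64)
LU, modified = lu_with_diagonal_boost(A)
x = solve_with_refinement(A, LU, b)
print(len(modified), np.linalg.norm(A @ x - b))
```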
- Research article, March 2023
Mixed precision LU factorization on GPU tensor cores: reducing data movement and memory footprint
International Journal of High Performance Computing Applications (SAGE-HPCA), Volume 37, Issue 2, Pages 165–179. https://doi.org/10.1177/10943420221136848
Modern GPUs equipped with mixed precision tensor core units present great potential to accelerate dense linear algebra operations such as LU factorization. However, state-of-the-art mixed half/single precision LU factorization algorithms all require the ...
- Research article, February 2023
End-to-End LU Factorization of Large Matrices on GPUs
PPoPP '23: Proceedings of the 28th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming, Pages 288–300. https://doi.org/10.1145/3572848.3577486
LU factorization for sparse matrices is an important computing step for many engineering and scientific problems such as circuit simulation. There have been many efforts toward parallelizing and scaling this algorithm, which include the recent efforts ...
- Research article, November 2022
Solving linear systems on a GPU with hierarchically off-diagonal low-rank approximations
SC '22: Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis, Article No.: 84, Pages 1–15
We are interested in solving linear systems arising from three applications: (1) kernel methods in machine learning, (2) discretization of boundary integral equations from mathematical physics, and (3) Schur complements formed in the factorization of ...
- Research article, November 2022
Addressing irregular patterns of matrix computations on GPUs and their impact on applications powered by sparse direct solvers
SC '22: Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis, Article No.: 26, Pages 1–14
Many scientific applications rely on sparse direct solvers for their numerical robustness. However, performance optimization for these solvers remains a challenging task, especially on GPUs. This is due to workloads of small dense matrices that are ...
- Article, May 2023
A Portable and Heterogeneous LU Factorization on IRIS
Abstract: Here, the IRIS programming model is evaluated as a method to improve performance portability for heterogeneous systems that use LU matrix factorization. LU (lower-upper) factorization is considered one of the most important numerical linear ...
- Research article, July 2021
Augmented Joint Domain Localized Method for Polarimetric Space–Time Adaptive Processing
Circuits, Systems, and Signal Processing (CSSP), Volume 40, Issue 7, Pages 3592–3608. https://doi.org/10.1007/s00034-020-01634-0
Abstract: An augmented joint domain localized technique for computationally efficient polarimetric space–time adaptive processing (pSTAP) is proposed. In the proposed method, the signal vector to be detected is first estimated by using a modified least ...
- Research article, January 2021
Block Low-Rank Matrices with Shared Bases: Potential and Limitations of the BLR$^2$ Format
SIAM Journal on Matrix Analysis and Applications (SIMAX), Volume 42, Issue 2, Pages 990–1010. https://doi.org/10.1137/20M1386451
We investigate a special class of data sparse rank-structured matrices that combine a flat block low-rank (BLR) partitioning with the use of shared (called nested in the hierarchical case) bases. This format is to $\mathcal{H}^2$ matrices what BLR is to ...
- Research article, January 2021
Matrices with Tunable Infinity-Norm Condition Number and No Need for Pivoting in LU Factorization
SIAM Journal on Matrix Analysis and Applications (SIMAX), Volume 42, Issue 1, Pages 417–435. https://doi.org/10.1137/20M1357238
We propose a two-parameter family of nonsymmetric dense $n\times n$ matrices $A(\alpha,\beta)$ for which LU factorization without pivoting is numerically stable, and we show how to choose $\alpha$ and $\beta$ to achieve any value of the $\infty$-norm ...
- Research article, January 2021
Random Matrices Generating Large Growth in LU Factorization with Pivoting
SIAM Journal on Matrix Analysis and Applications (SIMAX), Volume 42, Issue 1, Pages 185–201. https://doi.org/10.1137/20M1338149
We identify a class of random, dense, $n\times n$ matrices for which LU factorization with any form of pivoting produces a growth factor typically of size at least $n/(4 \log n)$ for large $n$. The condition number of the matrices can be arbitrarily chosen, ...
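For readers unfamiliar with the quantity studied in this SIMAX paper, the snippet below computes a commonly used growth-factor proxy, max|u_ij| / max|a_ij|, for LU with partial pivoting on a generic Gaussian random matrix; it does not reproduce the paper's special matrix class, and the matrix size is arbitrary.

```python
import numpy as np
from scipy.linalg import lu

def growth_factor(A):
    # Common proxy for the growth factor of LU with partial pivoting:
    # max |u_ij| / max |a_ij| (the textbook definition also maximises
    # over the intermediate Schur complements).
    _, _, U = lu(np.asarray(A, dtype=np.float64))
    return np.abs(U).max() / np.abs(A).max()

rng = np.random.default_rng(0)
A = rng.standard_normal((1000, 1000))
print(growth_factor(A))  # typically modest for plain Gaussian matrices
```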
- Article, November 2020
ADELUS: A Performance-Portable Dense LU Solver for Distributed-Memory Hardware-Accelerated Systems
Abstract: Solving dense systems of linear equations is essential in applications encountered in physics, mathematics, and engineering. This paper describes our current efforts toward the development of the ADELUS package for current and next generation ...
- Research article, January 2020
A hierarchical butterfly LU preconditioner for two-dimensional electromagnetic scattering problems involving open surfaces
Journal of Computational Physics (JOCP), Volume 401, Issue C. https://doi.org/10.1016/j.jcp.2019.109014
Highlights: $O(N \log^2 N)$ fast matvec and approximate LU factorization of the linear system from 2D EFIE involving open surfaces.
This paper introduces a hierarchical interpolative decomposition butterfly-LU factorization (H-IDBF-LU) preconditioner for solving two-dimensional electric-field integral equations (EFIEs) in electromagnetic scattering problems of ...
- Research article, January 2020
Mixed Precision Block Fused Multiply-Add: Error Analysis and Application to GPU Tensor Cores
SIAM Journal on Scientific Computing (SISC), Volume 42, Issue 3, Pages C124–C141. https://doi.org/10.1137/19M1289546
Computing units that carry out a fused multiply-add (FMA) operation with matrix arguments, referred to as tensor units by some vendors, have great potential for use in scientific computing. However, these units are inherently mixed precision, and existing ...
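As a small aid to the abstract above, this sketch emulates the mixed-precision block FMA semantics commonly documented for tensor-core-style units (FP16 inputs, FP32 products and accumulation); it is a software emulation under that assumption, not the paper's error-analysis model or any vendor API.

```python
import numpy as np

def block_fma(C, A, B):
    # Emulated mixed-precision block FMA: the matrix inputs are rounded to
    # FP16, while products and the accumulation with C are carried in FP32.
    # This is an emulation for illustration, not a model of specific hardware.
    A16 = A.astype(np.float16).astype(np.float32)
    B16 = B.astype(np.float16).astype(np.float32)
    return C.astype(np.float32) + A16 @ B16

rng = np.random.default_rng(0)
A, B, C = (rng.standard_normal((4, 4)) for _ in range(3))
print(np.abs((C + A @ B) - block_fma(C, A, B)).max())  # error from FP16 input rounding
```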
- Research article, September 2019
Distributed-memory lattice H-matrix factorization
International Journal of High Performance Computing Applications (SAGE-HPCA), Volume 33, Issue 5, Pages 1046–1063. https://doi.org/10.1177/1094342019861139
We parallelize the LU factorization of a hierarchical low-rank matrix (H-matrix) on a distributed-memory computer. This is much more difficult than the H-matrix-vector multiplication due to the dataflow of the factorization, and it is much harder ...
- Research article, September 2019
Hierarchical approach for deriving a reproducible unblocked LU factorization
International Journal of High Performance Computing Applications (SAGE-HPCA), Volume 33, Issue 5, Pages 791–803. https://doi.org/10.1177/1094342019832968
We propose a reproducible variant of the unblocked LU factorization for graphics processor units (GPUs). For this purpose, we build upon Level-1/2 BLAS kernels that deliver correctly-rounded and reproducible results for the dot (inner) product, vector ...
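To make the term "unblocked LU" concrete, here is the textbook right-looking kernel built from Level-1/2 BLAS-like operations; this is the standard algorithm, not the paper's reproducible GPU variant, and the function name and test size are illustrative.

```python
import numpy as np

def lu_unblocked(A):
    # Textbook unblocked (right-looking) LU with partial pivoting. Each step
    # uses Level-1/2 BLAS-like operations: a pivot search (iamax), a column
    # scaling (scal) and a rank-1 update (ger).
    A = np.array(A, dtype=np.float64)
    n = A.shape[0]
    piv = np.arange(n)
    for k in range(n - 1):
        p = k + np.argmax(np.abs(A[k:, k]))                         # iamax
        A[[k, p]], piv[[k, p]] = A[[p, k]], piv[[p, k]]
        A[k + 1:, k] /= A[k, k]                                     # scal
        A[k + 1:, k + 1:] -= np.outer(A[k + 1:, k], A[k, k + 1:])   # ger
    return A, piv

rng = np.random.default_rng(0)
A = rng.standard_normal((32, 32))
LU, piv = lu_unblocked(A)
L = np.tril(LU, -1) + np.eye(32)
U = np.triu(LU)
print(np.linalg.norm(A[piv] - L @ U))  # should be at roundoff level
```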