- Research article, November 2024
Ginkgo - A math library designed to accelerate Exascale Computing Project science applications
- Terry Cojean,
- Pratik Nayak,
- Tobias Ribizel,
- Natalie Beams,
- Yu-Hsiang Mike Tsai,
- Marcel Koch,
- Fritz Göbel,
- Thomas Grützmacher,
- Hartwig Anzt,
- Michael Heroux
International Journal of High Performance Computing Applications (SAGE-HPCA), Volume 38, Issue 6, Pages 568–584. https://doi.org/10.1177/10943420241268323
Large-scale simulations require efficient computation across the entire computing hierarchy. A challenge of the Exascale Computing Project (ECP) was to reconcile highly heterogeneous hardware with the myriad of applications that were required to run on ...
- Research article, October 2024
Mixed-precision pre-pivoting strategy for the LU factorization
Abstract: This paper investigates the efficient application of half-precision floating-point (FP16) arithmetic on GPUs for boosting LU decompositions in double (FP64) precision. Addressing the motivation to enhance computational efficiency, we introduce two ...
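The listing truncates the abstract before the two proposed strategies are named. Purely as a hypothetical illustration of the general idea of pre-pivoting in low precision (not necessarily the authors' algorithm), the sketch below records a partial-pivoting order on an FP16-rounded copy of the matrix and then runs an unpivoted FP64 factorization on the pre-permuted matrix; the function names and the 64x64 test matrix are invented for the example.

```python
import numpy as np

def pivot_order_fp16(A):
    # Run partial-pivoting elimination on a half-precision copy purely to
    # record the pivot order; the FP64 data is never touched here.
    B = A.astype(np.float16).astype(np.float64)
    n = B.shape[0]
    perm = np.arange(n)
    for k in range(n - 1):
        p = k + np.argmax(np.abs(B[k:, k]))
        B[[k, p]], perm[[k, p]] = B[[p, k]], perm[[p, k]]
        B[k + 1:, k] /= B[k, k]
        B[k + 1:, k + 1:] -= np.outer(B[k + 1:, k], B[k, k + 1:])
    return perm

def lu_nopivot(A):
    # Unpivoted right-looking LU in FP64 on the pre-permuted matrix; returns
    # L (unit lower, strictly below the diagonal) and U packed together.
    A = np.array(A, dtype=np.float64)
    n = A.shape[0]
    for k in range(n - 1):
        A[k + 1:, k] /= A[k, k]
        A[k + 1:, k + 1:] -= np.outer(A[k + 1:, k], A[k, k + 1:])
    return A

rng = np.random.default_rng(0)
A = rng.standard_normal((64, 64))
LU = lu_nopivot(A[pivot_order_fp16(A)])
```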
- Research article, November 2023
GPU-based LU Factorization and Solve on Batches of Matrices with Band Structure
SC-W '23: Proceedings of the SC '23 Workshops of The International Conference on High Performance Computing, Network, Storage, and Analysis, Pages 1672–1679. https://doi.org/10.1145/3624062.3624247
This paper presents a portable and performance-efficient approach to solve a batch of linear systems of equations using Graphics Processing Units (GPUs). Each system is represented using a special type of matrix with a band structure above and/or below ...
- Research article, November 2023
MatRIS: Multi-level Math Library Abstraction for Heterogeneity and Performance Portability using IRIS Runtime
- Mohammad Alaul Haque Monil,
- Narasinga Rao Miniskar,
- Keita Teranishi,
- Jeffrey S. Vetter,
- Pedro Valero-Lara
SC-W '23: Proceedings of the SC '23 Workshops of The International Conference on High Performance Computing, Network, Storage, and Analysis, Pages 1081–1092. https://doi.org/10.1145/3624062.3624184
Vendor libraries are tuned for a specific architecture and are not portable to others. Moreover, they lack support for heterogeneity and multi-device orchestration, which is required for efficient use of contemporary HPC and cloud resources. To address ...
- Research article, November 2023
Optimized matrix ordering of sparse linear solver using a few-shot model for circuit simulation
Abstract: The sparse linear solver has become the bottleneck in a SPICE-like circuit simulator. A general sparse linear solver comprises pre-analysis, numeric factorization, and right-hand-side solving. The matrix ordering method in pre-analysis determines fill-...
- Research article, June 2023
Using Additive Modifications in LU Factorization Instead of Pivoting
ICS '23: Proceedings of the 37th ACM International Conference on Supercomputing, Pages 14–24. https://doi.org/10.1145/3577193.3593731
Direct solvers for dense systems of linear equations commonly use partial pivoting to ensure numerical stability. However, pivoting can introduce significant performance overheads, such as synchronization and data movement, particularly on distributed ...
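The abstract above is cut off before describing the proposed modifications. As a loosely related sketch of the general idea of avoiding pivoting (a generic diagonal-boosting scheme followed by iterative refinement, not the method of this ICS '23 paper), the following code runs an unpivoted LU that perturbs tiny pivots and then recovers accuracy by refining against the original matrix; `lu_with_diagonal_boost`, `solve_with_refinement`, and the tolerance `tau` are illustrative names and values.

```python
import numpy as np
from scipy.linalg import solve_triangular

def lu_with_diagonal_boost(A, tau=1e-8):
    # Unpivoted right-looking LU; a pivot smaller than tau * max|A| is replaced
    # by a signed boost so the factorization never breaks down. Returns the
    # packed LU factors and the list of modified pivot positions.
    LU = np.array(A, dtype=np.float64)
    n = LU.shape[0]
    boost = tau * np.abs(LU).max()
    modified = []
    for k in range(n):
        if abs(LU[k, k]) < boost:
            LU[k, k] = boost if LU[k, k] >= 0 else -boost
            modified.append(k)
        if k < n - 1:
            LU[k + 1:, k] /= LU[k, k]
            LU[k + 1:, k + 1:] -= np.outer(LU[k + 1:, k], LU[k, k + 1:])
    return LU, modified

def solve_with_refinement(A, LU, b, iters=3):
    # Use the (possibly perturbed) factors as a solver and recover accuracy
    # with a few steps of iterative refinement against the original A.
    L = np.tril(LU, -1) + np.eye(A.shape[0])
    U = np.triu(LU)
    x = solve_triangular(U, solve_triangular(L, b, lower=True))
    for _ in range(iters):
        r = b - A @ x
        x += solve_triangular(U, solve_triangular(L, r, lower=True))
    return x

rng = np.random.default_rng(0)
A = rng.standard_normal((64, 64))
b = rng.standard_normal(64)
LU, modified = lu_with_diagonal_boost(A)
x = solve_with_refinement(A, LU, b)
print(len(modified), np.linalg.norm(A @ x - b))
```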
- Research article, March 2023
Mixed precision LU factorization on GPU tensor cores: reducing data movement and memory footprint
International Journal of High Performance Computing Applications (SAGE-HPCA), Volume 37, Issue 2, Pages 165–179. https://doi.org/10.1177/10943420221136848
Modern GPUs equipped with mixed precision tensor core units present great potential to accelerate dense linear algebra operations such as LU factorization. However, state-of-the-art mixed half/single precision LU factorization algorithms all require the ...
- Research article, February 2023
End-to-End LU Factorization of Large Matrices on GPUs
PPoPP '23: Proceedings of the 28th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming, Pages 288–300. https://doi.org/10.1145/3572848.3577486
LU factorization for sparse matrices is an important computing step for many engineering and scientific problems such as circuit simulation. There have been many efforts toward parallelizing and scaling this algorithm, which include the recent efforts ...
- Research article, November 2022
Solving linear systems on a GPU with hierarchically off-diagonal low-rank approximations
SC '22: Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis, Article No.: 84, Pages 1–15
We are interested in solving linear systems arising from three applications: (1) kernel methods in machine learning, (2) discretization of boundary integral equations from mathematical physics, and (3) Schur complements formed in the factorization of ...
- Research article, November 2022
Addressing irregular patterns of matrix computations on GPUs and their impact on applications powered by sparse direct solvers
SC '22: Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis, Article No.: 26, Pages 1–14
Many scientific applications rely on sparse direct solvers for their numerical robustness. However, performance optimization for these solvers remains a challenging task, especially on GPUs. This is due to workloads of small dense matrices that are ...
- Article, May 2023
A Portable and Heterogeneous LU Factorization on IRIS
Abstract: Here, the IRIS programming model is evaluated as a method to improve performance portability for heterogeneous systems that use LU matrix factorization. LU (lower-upper) factorization is considered one of the most important numerical linear ...
- Research article, July 2021
Augmented Joint Domain Localized Method for Polarimetric Space–Time Adaptive Processing
Circuits, Systems, and Signal Processing (CSSP), Volume 40, Issue 7, Pages 3592–3608. https://doi.org/10.1007/s00034-020-01634-0
Abstract: An augmented joint domain localized technique for computationally efficient polarimetric space–time adaptive processing (pSTAP) is proposed. In the proposed method, the signal vector to be detected is first estimated by using a modified least ...
- Research article, January 2021
Block Low-Rank Matrices with Shared Bases: Potential and Limitations of the BLR$^2$ Format
SIAM Journal on Matrix Analysis and Applications (SIMAX), Volume 42, Issue 2, Pages 990–1010. https://doi.org/10.1137/20M1386451
We investigate a special class of data sparse rank-structured matrices that combine a flat block low-rank (BLR) partitioning with the use of shared (called nested in the hierarchical case) bases. This format is to $\mathcal{H}^2$ matrices what BLR is to ...
- Research article, January 2021
Matrices with Tunable Infinity-Norm Condition Number and No Need for Pivoting in LU Factorization
SIAM Journal on Matrix Analysis and Applications (SIMAX), Volume 42, Issue 1, Pages 417–435. https://doi.org/10.1137/20M1357238
We propose a two-parameter family of nonsymmetric dense $n\times n$ matrices $A(\alpha,\beta)$ for which LU factorization without pivoting is numerically stable, and we show how to choose $\alpha$ and $\beta$ to achieve any value of the $\infty$-norm ...
- Research article, January 2021
Random Matrices Generating Large Growth in LU Factorization with Pivoting
SIAM Journal on Matrix Analysis and Applications (SIMAX), Volume 42, Issue 1, Pages 185–201. https://doi.org/10.1137/20M1338149
We identify a class of random, dense, $n\times n$ matrices for which LU factorization with any form of pivoting produces a growth factor typically of size at least $n/(4 \log n)$ for large $n$. The condition number of the matrices can be arbitrarily chosen, ...
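For readers unfamiliar with the quantity studied in this SIMAX paper, the snippet below computes a commonly used growth-factor proxy, max|u_ij| / max|a_ij|, for LU with partial pivoting on a generic Gaussian random matrix; it does not reproduce the paper's special matrix class, and the matrix size is arbitrary.

```python
import numpy as np
from scipy.linalg import lu

def growth_factor(A):
    # Common proxy for the growth factor of LU with partial pivoting:
    # max |u_ij| / max |a_ij| (the textbook definition also maximises
    # over the intermediate Schur complements).
    _, _, U = lu(np.asarray(A, dtype=np.float64))
    return np.abs(U).max() / np.abs(A).max()

rng = np.random.default_rng(0)
A = rng.standard_normal((1000, 1000))
print(growth_factor(A))  # typically modest for plain Gaussian matrices
```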
- Article, November 2020
ADELUS: A Performance-Portable Dense LU Solver for Distributed-Memory Hardware-Accelerated Systems
Abstract: Solving dense systems of linear equations is essential in applications encountered in physics, mathematics, and engineering. This paper describes our current efforts toward the development of the ADELUS package for current and next generation ...
- Research article, January 2020
A hierarchical butterfly LU preconditioner for two-dimensional electromagnetic scattering problems involving open surfaces
Journal of Computational Physics (JOCP), Volume 401, Issue C. https://doi.org/10.1016/j.jcp.2019.109014
Highlights: $O(N \log^2 N)$ fast matvec and approximate LU factorization of the linear system from 2D EFIE involving open surfaces.
This paper introduces a hierarchical interpolative decomposition butterfly-LU factorization (H-IDBF-LU) preconditioner for solving two-dimensional electric-field integral equations (EFIEs) in electromagnetic scattering problems of ...
- Research article, January 2020
Mixed Precision Block Fused Multiply-Add: Error Analysis and Application to GPU Tensor Cores
SIAM Journal on Scientific Computing (SISC), Volume 42, Issue 3, Pages C124–C141. https://doi.org/10.1137/19M1289546
Computing units that carry out a fused multiply-add (FMA) operation with matrix arguments, referred to as tensor units by some vendors, have great potential for use in scientific computing. However, these units are inherently mixed precision, and existing ...
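As a small aid to the abstract above, this sketch emulates the mixed-precision block FMA semantics commonly documented for tensor-core-style units (FP16 inputs, FP32 products and accumulation); it is a software emulation under that assumption, not the paper's error-analysis model or any vendor API.

```python
import numpy as np

def block_fma(C, A, B):
    # Emulated mixed-precision block FMA: the matrix inputs are rounded to
    # FP16, while products and the accumulation with C are carried in FP32.
    # This is an emulation for illustration, not a model of specific hardware.
    A16 = A.astype(np.float16).astype(np.float32)
    B16 = B.astype(np.float16).astype(np.float32)
    return C.astype(np.float32) + A16 @ B16

rng = np.random.default_rng(0)
A, B, C = (rng.standard_normal((4, 4)) for _ in range(3))
print(np.abs((C + A @ B) - block_fma(C, A, B)).max())  # error from FP16 input rounding
```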
- Research article, September 2019
Distributed-memory lattice H-matrix factorization
International Journal of High Performance Computing Applications (SAGE-HPCA), Volume 33, Issue 5, Pages 1046–1063. https://doi.org/10.1177/1094342019861139
We parallelize the LU factorization of a hierarchical low-rank matrix (H-matrix) on a distributed-memory computer. This is much more difficult than the H-matrix-vector multiplication due to the dataflow of the factorization, and it is much harder ...
- Research article, September 2019
Hierarchical approach for deriving a reproducible unblocked LU factorization
International Journal of High Performance Computing Applications (SAGE-HPCA), Volume 33, Issue 5, Pages 791–803. https://doi.org/10.1177/1094342019832968
We propose a reproducible variant of the unblocked LU factorization for graphics processor units (GPUs). For this purpose, we build upon Level-1/2 BLAS kernels that deliver correctly-rounded and reproducible results for the dot (inner) product, vector ...
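To make the term "unblocked LU" concrete, here is the textbook right-looking kernel built from Level-1/2 BLAS-like operations; this is the standard algorithm, not the paper's reproducible GPU variant, and the function name and test size are illustrative.

```python
import numpy as np

def lu_unblocked(A):
    # Textbook unblocked (right-looking) LU with partial pivoting. Each step
    # uses Level-1/2 BLAS-like operations: a pivot search (iamax), a column
    # scaling (scal) and a rank-1 update (ger).
    A = np.array(A, dtype=np.float64)
    n = A.shape[0]
    piv = np.arange(n)
    for k in range(n - 1):
        p = k + np.argmax(np.abs(A[k:, k]))                         # iamax
        A[[k, p]], piv[[k, p]] = A[[p, k]], piv[[p, k]]
        A[k + 1:, k] /= A[k, k]                                     # scal
        A[k + 1:, k + 1:] -= np.outer(A[k + 1:, k], A[k, k + 1:])   # ger
    return A, piv

rng = np.random.default_rng(0)
A = rng.standard_normal((32, 32))
LU, piv = lu_unblocked(A)
L = np.tril(LU, -1) + np.eye(32)
U = np.triu(LU)
print(np.linalg.norm(A[piv] - L @ U))  # should be at roundoff level
```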