- Research article, September 2024
Fast hardware-aware matrix-free algorithms for higher-order finite-element discretized matrix multivector products on distributed systems
Journal of Parallel and Distributed Computing (JPDC), Volume 192, Issue C. https://doi.org/10.1016/j.jpdc.2024.104925
Abstract: Recent hardware-aware matrix-free algorithms for higher-order finite-element (FE) discretized matrix-vector multiplications reduce floating point operations and data access costs compared to traditional sparse matrix approaches. In this work, we ...
Highlights:
- Matrix-free algorithms for FE matrix-multivector products with thousands of vectors on multi-node CPU and GPU architectures.
- Hardware-tuned batched evaluation strategies for improved data locality.
- Architecture-specific ...
- Research article, May 2023
Programming parallel dense matrix factorizations and inversion for new-generation NUMA architectures
Journal of Parallel and Distributed Computing (JPDC), Volume 175, Issue C, Pages 51–65. https://doi.org/10.1016/j.jpdc.2023.01.004
Abstract: We propose a methodology to address the programmability issues derived from the emergence of new-generation shared-memory NUMA architectures. For this purpose, we employ dense matrix factorizations and matrix inversion (DMFI) as a use ...
Highlights:
- Exposure of the performance penalty introduced by NUMA-oblivious implementations.
- Research article, May 2022
Pipelined Preconditioned Conjugate Gradient Methods for real and complex linear systems for distributed memory architectures
Journal of Parallel and Distributed Computing (JPDC), Volume 163, Issue C, Pages 147–155. https://doi.org/10.1016/j.jpdc.2022.01.008
Abstract: Preconditioned Conjugate Gradient (PCG) is a popular method for solving large, sparse linear systems of equations. The performance of PCG at scale is limited by the costly global synchronization steps that arise in dot products ...
Highlights:
- We introduce PIPECG-OATI-c to solve complex Hermitian and symmetric linear systems.
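As a concrete point of reference for the synchronization cost this entry describes, below is a minimal NumPy sketch of the classical PCG iteration, not the paper's pipelined PIPECG-OATI variant. The function name, the test matrix, and the Jacobi preconditioner are illustrative choices, not taken from the paper; the comments mark the two dot products per iteration that become global reductions in a distributed run.

```python
import numpy as np

def pcg(A, b, M_inv, tol=1e-10, max_iter=1000):
    """Classical preconditioned conjugate gradient (illustrative sketch).

    Each iteration performs two dot products; in a distributed-memory run
    these become global reductions, which is the synchronization cost that
    pipelined variants try to overlap with other work.
    """
    x = np.zeros_like(b)
    r = b - A @ x
    z = M_inv @ r
    p = z.copy()
    rz = r @ z
    for _ in range(max_iter):
        Ap = A @ p
        alpha = rz / (p @ Ap)          # dot product -> global reduction
        x += alpha * p
        r -= alpha * Ap
        if np.linalg.norm(r) < tol:
            break
        z = M_inv @ r
        rz_new = r @ z                 # dot product -> global reduction
        p = z + (rz_new / rz) * p
        rz = rz_new
    return x

# Small SPD test problem with a Jacobi (diagonal) preconditioner.
n = 50
rng = np.random.default_rng(0)
B = rng.standard_normal((n, n))
A = B @ B.T + n * np.eye(n)            # symmetric positive definite
b = rng.standard_normal(n)
M_inv = np.diag(1.0 / np.diag(A))
x = pcg(A, b, M_inv)
print(np.linalg.norm(A @ x - b) < 1e-6)   # True
```

The recurrence form above (updating `r` rather than recomputing `b - A @ x`) is the standard trick that keeps the cost at one matrix-vector product per iteration.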
- Research article, February 2022
Comparing the performance of general matrix multiplication routine on heterogeneous computing systems
Journal of Parallel and Distributed Computing (JPDC), Volume 160, Issue C, Pages 39–48. https://doi.org/10.1016/j.jpdc.2021.10.002
Abstract: This paper presents the results of research on the performance of the general matrix multiplication routine on modern heterogeneous computing systems. In addition to the single-threaded and multi-threaded performance of the routine for ...
Highlights:
- The performance of GEMM routines on heterogeneous computing systems has been studied.
- Research article, April 2020
sLASs: A fully automatic auto-tuned linear algebra library based on OpenMP extensions implemented in OmpSs (LASs Library)
Journal of Parallel and Distributed Computing (JPDC), Volume 138, Issue C, Pages 153–171. https://doi.org/10.1016/j.jpdc.2019.12.002
Abstract: In this work we have implemented a novel linear algebra library on top of the task-based runtime OmpSs-2. We have used some of the most advanced OmpSs-2 features: weak dependencies and regions, together with the final clause for the ...
Highlights:
- Development of a highly optimized auto-tuned library for BLAS-3 and LAPACK operations.
- Research article, March 2020
Batched transpose-free ADI-type preconditioners for a Poisson solver on GPGPUs
Journal of Parallel and Distributed Computing (JPDC), Volume 137, Issue C, Pages 148–159. https://doi.org/10.1016/j.jpdc.2019.11.004
Abstract: We investigate the iterative solution of a symmetric positive definite linear system involving the shifted Laplacian as the system matrix on General Purpose Graphics Processing Units (GPGPUs). We consider in particular the Chebyshev ...
Highlights:
- Solution of batched tridiagonal linear systems on GPUs.
- Problems originating ...
- Research article, August 2019
Node aware sparse matrix–vector multiplication
Journal of Parallel and Distributed Computing (JPDC), Volume 130, Issue C, Pages 166–178. https://doi.org/10.1016/j.jpdc.2019.03.016
Abstract: The sparse matrix–vector multiply (SpMV) operation is a key computational kernel in many simulations and linear solvers. The large communication requirements associated with a reference implementation of a parallel SpMV result in poor ...
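To make the kernel concrete, here is a minimal serial CSR sketch of SpMV in NumPy. The paper's node-aware strategy concerns the distributed exchange of `x` entries owned by other processes, which this purely local kernel omits; the function name and example matrix are illustrative, not from the paper.

```python
import numpy as np

def csr_spmv(indptr, indices, data, x):
    """y = A @ x for a matrix stored in compressed sparse row (CSR) form.

    In a distributed SpMV, rows are partitioned across processes and the
    entries of x indexed by off-process columns must be communicated before
    this loop can run; node-aware variants aggregate that traffic per node
    rather than per process.
    """
    n = len(indptr) - 1
    y = np.zeros(n)
    for i in range(n):                          # one output row at a time
        for k in range(indptr[i], indptr[i + 1]):
            y[i] += data[k] * x[indices[k]]
    return y

# CSR storage of the 3x3 matrix [[4, 1, 0], [0, 3, 2], [1, 0, 5]].
indptr  = np.array([0, 2, 4, 6])
indices = np.array([0, 1, 1, 2, 0, 2])
data    = np.array([4.0, 1.0, 3.0, 2.0, 1.0, 5.0])
x = np.array([1.0, 2.0, 3.0])
y = csr_spmv(indptr, indices, data, x)
print(y)   # y = [6, 12, 16]
```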
- Research article, July 2019
Optimizing sparse tensor times matrix on GPUs
Journal of Parallel and Distributed Computing (JPDC), Volume 129, Issue C, Pages 99–109. https://doi.org/10.1016/j.jpdc.2018.07.018
Abstract: This work optimizes tensor-times-dense matrix multiply (Ttm) for general sparse and semi-sparse tensors on CPU and NVIDIA GPU platforms. Ttm is a computational kernel in tensor-method-based data analytics and data mining applications, ...
Highlights:
- Designed an in-place SpTTM algorithm to avoid tensor-matrix data transformation.
- Research article, June 2019
Modeling the asynchronous Jacobi method without communication delays
Journal of Parallel and Distributed Computing (JPDC), Volume 128, Issue C, Pages 84–98. https://doi.org/10.1016/j.jpdc.2019.02.002
Abstract: Asynchronous iterative methods for solving linear systems are gaining renewed interest due to the high cost of synchronization points in massively parallel codes. Historically, theory on asynchronous iterative methods has focused on ...
Highlights:
- We study the asynchronous Jacobi (AJ) method for solving sparse linear systems.
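For reference, below is a minimal NumPy sketch of the synchronous Jacobi baseline; the asynchronous variant analyzed in the paper lets each process update its components with whatever neighbor values are currently available, dropping the barrier between sweeps, which a serial sketch cannot exhibit. The function name and the example system are illustrative only.

```python
import numpy as np

def jacobi(A, b, max_sweeps=200, tol=1e-10):
    """Synchronous Jacobi iteration x <- D^{-1} (b - (A - D) x).

    Every component update in a sweep uses values from the previous sweep,
    so a parallel implementation needs a barrier between sweeps; the
    asynchronous method removes that barrier at the cost of a harder
    convergence analysis.
    """
    d = np.diag(A)
    R = A - np.diag(d)       # off-diagonal part of A
    x = np.zeros_like(b)
    for _ in range(max_sweeps):
        x_new = (b - R @ x) / d
        if np.linalg.norm(x_new - x) < tol:
            return x_new
        x = x_new
    return x

# Strictly diagonally dominant system, so Jacobi is guaranteed to converge.
A = np.array([[10.0,  1.0,  2.0],
              [ 1.0, 12.0,  3.0],
              [ 2.0,  3.0, 15.0]])
b = np.array([13.0, 16.0, 20.0])
x = jacobi(A, b)
print(np.allclose(A @ x, b, atol=1e-7))   # True
```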
- Research article, September 2018
Using Jacobi iterations and blocking for solving sparse triangular systems in incomplete factorization preconditioning
Journal of Parallel and Distributed Computing (JPDC), Volume 119, Issue C, Pages 219–230. https://doi.org/10.1016/j.jpdc.2018.04.017
Abstract: When using incomplete factorization preconditioners with an iterative method to solve large sparse linear systems, each application of the preconditioner involves solving two sparse triangular systems. These triangular systems are ...
Highlights:
- Jacobi solvers for sparse incomplete factor preconditioning are extensively tested.
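A minimal NumPy sketch of the idea in this entry, assuming (as the title suggests) that the triangular solve is replaced by Jacobi sweeps: each sweep is a matrix-vector product, which parallelizes far better than sequential forward substitution, and for a triangular matrix the iteration matrix is nilpotent, so n sweeps recover the exact solution. Names and the example matrix are illustrative, not from the paper.

```python
import numpy as np

def jacobi_triangular_solve(T, b, num_sweeps):
    """Approximate solve of a lower triangular system T x = b by Jacobi sweeps.

    Each sweep computes x <- D^{-1} (b - (T - D) x).  Because D^{-1}(T - D)
    is strictly triangular (hence nilpotent), n sweeps give the exact
    solution; in preconditioning, a few sweeps usually suffice.
    """
    d = np.diag(T)
    R = T - np.diag(d)           # strictly triangular off-diagonal part
    x = b / d                    # first Jacobi sweep from x0 = 0
    for _ in range(num_sweeps - 1):
        x = (b - R @ x) / d
    return x

# Small lower-triangular example: exact after n = 3 sweeps.
T = np.array([[2.0, 0.0, 0.0],
              [1.0, 3.0, 0.0],
              [0.5, 1.0, 4.0]])
b = np.array([2.0, 5.0, 9.0])
x = jacobi_triangular_solve(T, b, num_sweeps=3)
print(np.allclose(T @ x, b))   # True
```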
- Article, August 2008
Parallel computation of the eigenvalues of symmetric Toeplitz matrices through iterative methods
Journal of Parallel and Distributed Computing (JPDC), Volume 68, Issue 8, Pages 1113–1121. https://doi.org/10.1016/j.jpdc.2008.03.003
Abstract: This paper presents a new procedure to compute many or all of the eigenvalues and eigenvectors of symmetric Toeplitz matrices. The key to this algorithm is the use of the "Shift-and-Invert" technique applied with iterative methods, which allows the ...
- Article, June 2008
A relaxation scheme for increasing the parallelism in Jacobi-SVD
Journal of Parallel and Distributed Computing (JPDC), Volume 68, Issue 6, Pages 769–777. https://doi.org/10.1016/j.jpdc.2007.12.003
Abstract: The Singular Value Decomposition (SVD) is a vital problem that finds a place in numerous application domains in science and engineering. As an example, SVDs are used in processing voluminous datasets. Many sequential and parallel algorithms have been ...
- Article, May 2008
Parallel block tridiagonalization of real symmetric matrices
Journal of Parallel and Distributed Computing (JPDC), Volume 68, Issue 5, Pages 703–715. https://doi.org/10.1016/j.jpdc.2007.10.001
Abstract: Two parallel block tridiagonalization algorithms and implementations for dense real symmetric matrices are presented. Block tridiagonalization is a critical pre-processing step for the block tridiagonal divide-and-conquer algorithm for computing ...
- Article, February 2008
Load balancing algorithms based on gradient methods and their analysis through algebraic graph theory
Journal of Parallel and Distributed Computing (JPDC), Volume 68, Issue 2, Pages 209–220. https://doi.org/10.1016/j.jpdc.2007.09.001
Abstract: The main results of this paper are based on the idea that most load balancing algorithms can be described in the framework of optimization theory. This makes it possible to apply classical results on convergence, its rate, and related properties. We ...
- Article, May 2007
MPI implementation of parallel subdomain methods for linear and nonlinear convection–diffusion problems
Journal of Parallel and Distributed Computing (JPDC), Volume 67, Issue 5, Pages 581–591. https://doi.org/10.1016/j.jpdc.2007.01.003
Abstract: The solution of linear and nonlinear convection–diffusion problems via parallel subdomain methods is considered. MPI implementation of parallel Schwarz alternating methods on distributed memory multiprocessors is discussed. Parallel synchronous and ...
- Article, November 2006
Parallel sparse LU factorization on different message passing platforms
Journal of Parallel and Distributed Computing (JPDC), Volume 66, Issue 11, Pages 1387–1403. https://doi.org/10.1016/j.jpdc.2006.07.001
Abstract: Several message passing-based parallel solvers have been developed for general (non-symmetric) sparse LU factorization with partial pivoting. Existing solvers were mostly deployed and evaluated on parallel computing platforms with high message passing ...
- Article, March 2006
Complexity of matrix product on modular linear systolic arrays for algorithms with affine schedules
Journal of Parallel and Distributed Computing (JPDC), Volume 66, Issue 3, Pages 323–333. https://doi.org/10.1016/j.jpdc.2005.07.008
Abstract: This paper investigates the computation of matrix product on both partially pipelined and fully pipelined modular linear arrays. These investigations are guided by a constructive and unified approach for both target architectures. First, permissible ...
- Article, March 2005
Distributed block independent set algorithms and parallel multilevel ILU preconditioners
Journal of Parallel and Distributed Computing (JPDC), Volume 65, Issue 3, Pages 331–346. https://doi.org/10.1016/j.jpdc.2004.10.007
Abstract: We present a class of parallel preconditioning strategies utilizing multilevel block incomplete LU (ILU) factorization techniques to solve large sparse linear systems. The preconditioners are constructed by exploiting the concept of block independent ...
- Article, March 2005
Toward an automatic parallelization of sparse matrix computations
Journal of Parallel and Distributed Computing (JPDC), Volume 65, Issue 3, Pages 313–330. https://doi.org/10.1016/j.jpdc.2004.09.017
Abstract: In this paper, we propose a generic method for the automatic parallelization of sparse matrix computations. This method is based on both a refinement of the data-dependence test proposed by Bernstein and an inspector-executor scheme specialized to ...
- Article, September 2003
Communication characteristics of large-scale scientific applications for contemporary cluster architectures
Journal of Parallel and Distributed Computing (JPDC), Volume 63, Issue 9, Pages 853–865. https://doi.org/10.1016/S0743-7315(03)00104-7
Abstract: This paper examines the explicit communication characteristics of several sophisticated scientific applications, which, by themselves, constitute a representative suite of publicly available benchmarks for large cluster architectures. By focusing on the ...