Design and Implementation of the ScaLAPACK LU, QR, and Cholesky Factorization Routines
Jaeyoung Choi, Jack J. Dongarra, L. Susan Ostrouchov, Antoine P. Petitet, David W. Walker, R. Clint Whaley
This article discusses the core factorization routines included in the ScaLAPACK library. These routines allow the factorization and solution of a dense system of linear equations via LU, QR, and Cholesky. They are implemented using a block cyclic data ...
Irregular Coarse-Grain Data Parallelism under LPARX
LPARX is a software development tool for implementing dynamic, irregular scientific applications, such as multilevel finite difference and particle methods, on high-performance multiple instruction multiple data (MIMD) parallel architectures. It ...
Massively Parallel Searching for Better Algorithms or How to Do a Cross Product with Five Multiplications
A number of "tricks" are known that trade multiplications for additions. The term "tricks" reflects the way these methods seem not to proceed from any general theory, but instead jump into existence as recipes that work. The Strassen method for 2 × 2 ...
A Case Study of Some Issues in the Optimization of Fortran 90 Array Notation
Some issues in the relationship of coding style and compiler optimization are discussed with regard to Fortran 90 array notation. A review of several important Fortran 90 array constructs and their performance on vector and scalar hardware sets the ...
Software Tools for High-Performance Computing: Survey and Recommendations
Applications programming for high-performance computing is notoriously difficult. Although parallel programming is intrinsically complex, the principal reason why high-performance computing is difficult is the lack of effective software tools. We ...
Pattern-Driven Automatic Parallelization
This article describes a knowledge-based system for automatic parallelization of a wide class of sequential numerical codes operating on vectors and dense matrices, and for execution on distributed memory message-passing multiprocessors. Its main ...