
Scalable tile communication-avoiding QR factorization on multicore cluster systems

F Song, H Ltaief, B Hadri… - SC'10: Proceedings of the …, 2010 - ieeexplore.ieee.org

SC'10: Proceedings of the 2010 ACM/IEEE International Conference …, 2010•ieeexplore.ieee.org

As tile linear algebra algorithms continue achieving high performance on shared-memory multicore architectures, it is a challenging task to make them scalable on distributed-memory multicore cluster machines. The main contribution of this paper is the extension to the distributed-memory environment of the previous work done by Hadri et al. on Communication-Avoiding QR (CA-QR) factorizations for tall and skinny matrices (initially done on shared-memory multicore systems). The fine granularity of tile algorithms associated with communication-avoiding techniques for the QR factorization presents a high degree of parallelism where multiple tasks can be concurrently executed, computation and communication largely overlapped, and computation steps fully pipelined. A decentralized dynamic scheduler has then been integrated as a runtime system to efficiently schedule tasks across the distributed resources. Our experimental results performed on two clusters (with dual-core and 8-core nodes, respectively) and a Cray XT5 system with 12-core nodes show that the tile CA-QR factorization is able to outperform the de facto ScaLAPACK library by up to 4 times for tall and skinny matrices, and has good scalability on up to 3,072 cores.
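The communication-avoiding idea for tall and skinny matrices can be illustrated with a small serial sketch. It is not the paper's tile algorithm, its distributed implementation, or its scheduler; it only shows the generic reduction pattern, assuming a full-column-rank input. Each row block is factored independently, and the small R factors are then merged pairwise in a binary tree, so only n-by-n triangles would need to travel between nodes. The names `qr_r`, `tsqr_r`, and `random_matrix` are hypothetical helpers, and modified Gram-Schmidt stands in for the Householder kernels a real library would use.

```python
import math
import random


def qr_r(a):
    """Return the n x n upper-triangular R factor of a (rows >= n, full column
    rank assumed), computed with modified Gram-Schmidt."""
    rows, n = len(a), len(a[0])
    cols = [[a[i][j] for i in range(rows)] for j in range(n)]
    r = [[0.0] * n for _ in range(n)]
    for j in range(n):
        r[j][j] = math.sqrt(sum(x * x for x in cols[j]))
        q = [x / r[j][j] for x in cols[j]]
        # Remove the component along q from every remaining column.
        for k in range(j + 1, n):
            r[j][k] = sum(qi * ck for qi, ck in zip(q, cols[k]))
            cols[k] = [ck - r[j][k] * qi for qi, ck in zip(q, cols[k])]
    return r


def tsqr_r(a, block_rows):
    """R factor of a tall matrix via local factorizations and a binary
    reduction tree over the small R factors (illustrative sketch only)."""
    n = len(a[0])
    if block_rows < n:
        raise ValueError("block_rows must be at least the number of columns")
    blocks = [a[i:i + block_rows] for i in range(0, len(a), block_rows)]
    # Fold a short trailing block into its neighbor so each block has >= n rows.
    if len(blocks) > 1 and len(blocks[-1]) < n:
        tail = blocks.pop()
        blocks[-1] = blocks[-1] + tail
    # Step 1: independent local factorizations (one per node in a real system).
    rs = [qr_r(b) for b in blocks]
    # Step 2: stack pairs of R factors and refactor until one remains.
    while len(rs) > 1:
        merged = [qr_r(rs[i] + rs[i + 1]) for i in range(0, len(rs) - 1, 2)]
        if len(rs) % 2:
            merged.append(rs[-1])  # odd one out moves up a level unchanged
        rs = merged
    return rs[0]


def random_matrix(rows, n, seed=0):
    """Hypothetical helper: a random rows x n test matrix."""
    rng = random.Random(seed)
    return [[rng.uniform(-1.0, 1.0) for _ in range(n)] for _ in range(rows)]
```

Because the R factor with a positive diagonal is unique for a full-rank matrix, `tsqr_r(a, block_rows)` should agree with `qr_r(a)` up to rounding error, which gives a simple way to check the sketch on a matrix from `random_matrix(64, 4)`.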


