A recursive formulation of Cholesky factorization of a matrix in packed storage

Published: 01 June 2001

Abstract

A new compact way to store a symmetric or triangular matrix, called RPF for Recursive Packed Format, is fully described. Novel ways to transform RPF to and from standard packed format are included. A new algorithm, called RPC for Recursive Packed Cholesky, that operates on the RPF format is presented. Algorithm RPC is based on level-3 BLAS and requires variants of the algorithms TRSM and SYRK that work on RPF. We call these RP_TRSM and RP_SYRK and find that they do most of their work by calling GEMM. It follows that most of the execution time of RPC lies in GEMM. The advantage of this storage scheme compared to traditional packed and full storage is demonstrated. First, the RPF storage format uses the minimal amount of storage for the symmetric or triangular matrix. Second, RPC gives a level-3 implementation of Cholesky factorization, whereas standard packed implementations are only level-2. Hence, the performance of our RPC implementation is decidedly superior. Third, unlike fixed-block-size algorithms, RPC requires no block-size tuning parameter. We present performance measurements on several current architectures that demonstrate improvements over the traditional packed routines. SMP parallel computations on the IBM SMP computer are also reported. The graphs in Section 7 show that the RPC algorithms are superior to the traditional packed algorithms by a factor between 1.6 and 7.4 for order around 1000, and between 1.9 and 10.3 for order around 3000. For some architectures, the RPC performance results are almost the same as, or even better than, those of the traditional full-storage algorithms.
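At its core, RPC splits the triangle, factors the leading part, and turns the remaining work into TRSM, SYRK, and GEMM calls. The following is a minimal NumPy sketch of that recursion, written on full storage for readability rather than on RPF itself; the function name is illustrative, not the paper's Fortran API, whose RP_TRSM and RP_SYRK operate directly on the recursive packed layout.

```python
import numpy as np

def recursive_cholesky(A):
    # Lower-triangular Cholesky factor of a symmetric positive-definite A.
    # Illustrates the RPC splitting on full storage, not on RPF itself.
    n = A.shape[0]
    if n == 1:
        return np.array([[np.sqrt(A[0, 0])]])
    p = n // 2  # split A into [[A11, *], [A21, A22]] with A11 of order p
    L = np.zeros((n, n))
    L[:p, :p] = recursive_cholesky(A[:p, :p])           # factor leading triangle
    # TRSM step: solve L21 @ L11.T = A21 (the role played by RP_TRSM)
    L[p:, :p] = np.linalg.solve(L[:p, :p], A[p:, :p].T).T
    # SYRK step: rank-p update of the trailing triangle (the role of RP_SYRK);
    # for large p this update is where the GEMM-rich work concentrates
    A22 = A[p:, p:] - L[p:, :p] @ L[p:, :p].T
    L[p:, p:] = recursive_cholesky(A22)                 # factor trailing triangle
    return L
```

For a sanity check, L = recursive_cholesky(M) should satisfy np.allclose(L @ L.T, M) for a symmetric positive-definite M, agreeing with numpy.linalg.cholesky. Because the split point is always n // 2, there is no block-size parameter to tune, which is the property the abstract emphasizes.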




Reviews

Maurice W. Benson

Due to symmetry in Cholesky factorization, only the lower (or upper) triangular part of the matrix need be stored. With standard packed storage, matrix elements in the triangular part are ordered by columns (or rows) and stored in a one-dimensional array. This presents efficiency problems, since the memory hierarchy (CPU, registers, cache, and so on) is better used when the algorithm exhibits locality of reference (blocking). These considerations motivate a new packed storage arrangement produced by a recursive approach: at the coarsest level, the triangular part of the matrix is partitioned into an approximately square part and two smaller triangular parts. This process is continued recursively for the two smaller triangular parts, with the final new packing taking the same total space as standard packing. The recursive Cholesky algorithm using this new packed storage achieves superior performance by utilizing level-3 BLAS (Basic Linear Algebra Subprograms). Using well-constructed figures, the paper provides a clear description of the algorithms. A significant part of the presentation involves tests on seven machines over a range of matrix sizes. Standard packing, the new formulation, and full-matrix (no packing) algorithms are compared. The new approach is shown to be significantly more effective than the one using standard packing. This work should be of interest to researchers in computational linear algebra, as well as to those developing high-performance software libraries.

Online Computing Reviews Service
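To make the layout the review describes concrete: for a lower triangle of order n split at p = n // 2, RPF stores the leading p-by-p triangle (itself recursively packed), then the (n-p)-by-p rectangular block in column-major order, and finally the trailing triangle (again recursively packed). Below is a minimal Python sketch of this packing; it assumes a lower-triangular, column-major convention, and the function name is illustrative rather than the paper's Fortran interface.

```python
import numpy as np

def rpf_from_lower(A):
    # Pack the lower triangle of a square matrix A into Recursive Packed
    # Format: leading triangle T1 (recursive), off-diagonal block S
    # (column by column), trailing triangle T2 (recursive).
    n = A.shape[0]
    if n == 0:
        return []
    if n == 1:
        return [A[0, 0]]
    p = n // 2
    out = rpf_from_lower(A[:p, :p])                          # T1
    out += [A[i, j] for j in range(p) for i in range(p, n)]  # S, by columns
    out += rpf_from_lower(A[p:, p:])                         # T2
    return out

# RPF uses exactly n(n+1)/2 elements, the same total space as standard packing.
A = np.arange(36.0).reshape(6, 6)
assert len(rpf_from_lower(A)) == 6 * 7 // 2
```

Since each rectangular block S is stored contiguously, a level-3 kernel can treat it as an ordinary column-major matrix and hand it straight to GEMM, which is what makes the format friendly to blocked, cache-efficient computation.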



Published In

ACM Transactions on Mathematical Software, Volume 27, Issue 2
June 2001, 154 pages
ISSN: 0098-3500
EISSN: 1557-7295
DOI: 10.1145/383738
Editor: Ron Boisvert

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 01 June 2001
Published in TOMS Volume 27, Issue 2


Author Tags

  1. BLAS
  2. Cholesky factorization and solution
  3. complex Hermitian matrices
  4. novel packed matrix data structures
  5. positive definite matrices
  6. real symmetric matrices
  7. recursive algorithms

Qualifiers

  • Article


Cited By

  • (2021) Enhancing parallelism of distributed algorithms with the actor model and a smart data movement technique. International Journal of Parallel, Emergent and Distributed Systems, pp. 1-14. DOI: 10.1080/17445760.2021.1971665. Online publication date: 31-Aug-2021.
  • (2019) Least squares solvers for distributed-memory machines with GPU accelerators. Proceedings of the ACM International Conference on Supercomputing, pp. 117-126. DOI: 10.1145/3330345.3330356. Online publication date: 26-Jun-2019.
  • (2019) An Efficient Solution to Structured Optimization Problems using Recursive Matrices. Computer Graphics Forum 38(8), pp. 33-39. DOI: 10.1111/cgf.13758. Online publication date: 14-Nov-2019.
  • (2019) Linear Systems Solvers for Distributed-Memory Machines with GPU Accelerators. Euro-Par 2019: Parallel Processing, pp. 495-506. DOI: 10.1007/978-3-030-29400-7_35. Online publication date: 26-Aug-2019.
  • (2017) Optimization of Triangular and Banded Matrix Operations Using 2d-Packed Layouts. ACM Transactions on Architecture and Code Optimization 14(4), pp. 1-19. DOI: 10.1145/3162016. Online publication date: 18-Dec-2017.
  • (2017) Algorithm 979. ACM Transactions on Mathematical Software 44(2), pp. 1-19. DOI: 10.1145/3061664. Online publication date: 18-Sep-2017.
  • (2016) Auto-tuning TRSM with an asynchronous task assignment model on multicore, multi-GPU and coprocessor systems. 2016 IEEE/ACS 13th International Conference of Computer Systems and Applications (AICCSA), pp. 1-8. DOI: 10.1109/AICCSA.2016.7945637. Online publication date: Nov-2016.
  • (2014) Direct Methods for Linear Systems. Numerical Methods in Matrix Computations, pp. 1-209. DOI: 10.1007/978-3-319-05089-8_1. Online publication date: 7-Oct-2014.
  • (2013) Level-3 Cholesky Factorization Routines Improve Performance of Many Cholesky Algorithms. ACM Transactions on Mathematical Software 39(2), pp. 1-10. DOI: 10.1145/2427023.2427026. Online publication date: 1-Feb-2013.
  • (2013) Layout-oblivious compiler optimization for matrix computations. ACM Transactions on Architecture and Code Optimization 9(4), pp. 1-20. DOI: 10.1145/2400682.2400694. Online publication date: 20-Jan-2013.
