Nothing Special   »   [go: up one dir, main page]

skip to main content
article
Free access

A set of level 3 basic linear algebra subprograms

Published: 01 March 1990 Publication History

Abstract

This paper describes an extension to the set of Basic Linear Algebra Subprograms. The extensions are targeted at matrix-vector operations that should provide for efficient and portable implementations of algorithms for high-performance computers

References

[1]
BARRON, D. W., AND SWINNERTON-DYER, H. P.F. Solution of simultaneous linear equations using a magnetic-tape store. Comput. J. 3 (1960), 28-33.
[2]
BERRY, M., GALLIVAN, K., HARROD, W., JALBY, W., LO, S., MEIER, U., PHILIPPE, B., AND SAMEH, A. Parallel algorithms on the CEDAR system. CSRD Report 581, 1986.
[3]
BISCHOF, C., AND VAN LOAN, C. The WY representation for products of Householder matrices. SIAM J. Sci. Star. Comput. 8, 1 (Jan. 1987), s2-s13.
[4]
BRONLUND, O. E., AND JOHNSEN, T. QR-factorization of partitioned matrices. Comput. Meth. Appl. Mech. Eng., vol. 3, pp. 153-172, 1974.
[5]
BUCHER, I., AND JORDAN, T. Linear algebra programs for use on a vector computer with a secondary solid state storage device. In Advances in Computer Methods for Partial Differential Equations, R. Vichnevetsky and R. Stepleman, Eds. IMACS, 1984, 546-550.
[6]
CALAHAN, D.A. Block-oriented local-memory-based linear equation solution on the CRAY-2: Uniprocessor algorithms. In Proceedings International Conference on Parallel Processing (Aug. 1986). IEEE Computer Society Press, New York, 1986.
[7]
CARNEVALI, P., RADICATI DI BROZOLO, G., ROBERT, Y., AND SGUAZZERO, P. Efficient Fortran implementation of the Gaussian elimination and Householder reduction algorithms on the IBM 3090 vector multiprocessor. IBM ECSEC Rep. ICE-0012, 1987.
[8]
CHARTRES, B. Adaption of the Jacobi and Givens methods for a computer with magnetic tape backup store. Univ. of Sydney Tech. Rep. 8, 1960.
[9]
DAVE, A. K., AND DUFF, I.S. Sparse matrix calculations on the CRAY-2. Parallel Comput. 5 (July 1987), 55-64.
[10]
DEMMEL, J., DONGARRA, J. J., DU CROZ, J., GREENBAUM, A., HAMMARLING, S., AND SORENSEN, D. Prospectus for the development of a linear algebra library for high-performance computers. Argonne National Lab. Rep. ANL-MCS-TM-97, Sept. 1987.
[11]
DIETRICH, G. A new formulation of the hypermatrix Householder QR-decomposition. Comput. Meth. AppI. Mech. Eng. 9 (1976), 273-280.
[12]
DODSON, D., AND LEWIS, J. Issues relating to extension of the basic linear algebra subprograms. ACM SIGNUM Newsl. 20, 1 (1985), 2-18.
[13]
DONGARRA, J. J., BUNCH, J., MOLER, C., AND STEWART, G. LINPACK Users' Guide. SIAM, Philadelphia, Pa., 1979.
[14]
DONGARRA, J. J., DuCRoz, J., HAMMARLING, S., AND HANSON, R. An extended set of Fortran basic linear algebra subprograms. ACM Trans. Math. Softw. 14, i (Mar. 1988), 1-17.
[15]
DONGARRA, J. J., DuCRoz, Z., HAMMARLING, S., AND HANSON, R. An extended set of Fortran basic linear algebra subprograms: Model implementation and test programs. ACM Trans. Math. Softw. 14, I (Mar. 1988), 18-32.
[16]
DONGARRA, J. J., DuCRoz, J., DUFF, I. S., AND HAMMARLING, S. A set of level 3 basic linear algebra subprograms: Model implementation and test programs. This issue, pp. 18-37.
[17]
DONGARRA, J. J., AND DUFF, I.S. Advanced architecture computers. Univ. of Tennessee, Rep. CS-89-90, Nov. 1989.
[18]
DONGARRA, J. J., GUSTAVSON, F., AND KARP, A. Implementing linear algebra algorithms for dense matrices on a vector pipeline machine. SIAM Rev. 26, 1 (1984), 91-112.
[19]
DONGARRA, J. J., HAMMARLING, S., AND SORENSEN, O. C. Block reduction of matrices to condensed forms for eigenvalue computations. Argonne National Lab. Rep. ANL-MCS-TM-99, Sept. 1987.
[20]
DONGARRA, J. J., AND HEWITT, T. Implementing dense linear algebra using multitasking on the CRAY X-MP-4. J. Comput. Appl. Math. 27 (1989), 215-227.
[21]
DONGARRA, J. J., AND SORENSEN, D.C. Linear algebra on high-performance computers. In Proceedings Parallel Computing 85, U. Schendel, Ed. North Holland, Amsterdam, 1986, 113-136.
[22]
DuCRoz, J., NUGENT, S., REID, J., AND TAYLOR, D. Solving large full sets of linear equations in a paged virtual store. ACM Trans. Math. Softw. 7, 4 (1981), 527-536.
[23]
DUFF, I. S. Full matrix techniques in sparse Gaussian elimination. In Numerical Analysis Proceedings, Dundee 1981, Lecture Notes in Mathematics 912. Springer-Verlag, New York, 1981, 71-84.
[24]
GALLIVAN, K., JALBV, W., AND MEIER, U. The use of BLAS3 in linear algebra on a parallel processor with a hierarchical memory. SIAM J. Sci. Star. Comput. 8, 6 (Nov. 1987), 1079-1084.
[25]
GEORGE, A., AND RASHWAN, S. Auxiliary storage methods for solving finite element systems. SIAM J. Sci. Star. Comput. 6, 4 (Oct. 1985), 882-910.
[26]
IBM. Engineering and scientific subroutine library. Program 5668-863, 1986.
[27]
LAWSON, C., HANSON, R. KINCAID, D., AND KROGH, F. Basic linear algebra subprograms for Fortran usage. ACM Trans. Math. Softw. 5 (1979), 308-323.
[28]
LAWSON, C., HANSON, R., KINCAID, D., AND KROGH, F. Algorithm 539: Basic linear algebra subprograms for Fortran usage. ACM Trans. Math. Softw. 5 (1979), 324-325.
[29]
MCKELLAR, A. C., AND COFFMAN, E. G., JR. Organizing matrices and matrix operations for paged memory systems. Commun. ACM 12, 3 (1969), 153-165.
[30]
ROBERT, Y., AND SGUAZZERO, P. The LU decomposition algorithm and its efficient Fortran implementation on the IBM 3090 vector multiprocessor. IBM ECSEC Rep. ICE-0006, 1987.
[31]
SCHREIBER, R. Module design specification (Version 1.0). SAXPY Computer Corp., 255 San Geronimo Way, Sunnyvale, CA 94086, 1986.
[32]
SCHREIBER, R., AND PARLETT, B. Block reflectors: Theory and computation. SIAM J. Numer. Anal. 25, 1 (Feb. 1988), 189-205.

Cited By

View all
  • (2025)Parallel cyclic reduction of padded bordered almost block diagonal matricesJournal of Computational and Applied Mathematics10.1016/j.cam.2024.116331458(116331)Online publication date: Apr-2025
  • (2024)A computational perspective on neural-symbolic integrationNeurosymbolic Artificial Intelligence10.3233/NAI-240672(1-12)Online publication date: 18-Jul-2024
  • (2024)Quantitative Performance Analysis of BLAS Libraries on GPU ArchitecturesBLAS Kütüphanelerinin GPU Mimarilerindeki Nicel Performans AnaliziDeu Muhendislik Fakultesi Fen ve Muhendislik10.21205/deufmd.202426760626:76(40-48)Online publication date: 23-Jan-2024
  • Show More Cited By

Recommendations

Reviews

Chaya Gurwitz

The original set of FORTRAN basic linear algebra subprograms, or Level 1 BLAS, included vector operations [1]; the routines in Level 2 BLAS were later added to provide matrix-vector operations [2]. This paper proposes adding a set of Level 3 BLAS, which would be used to perform matrix-matrix operations. The Level 1 and Level 2 BLAS have been adopted by the mathematical programming community as basic routines that are used as building blocks for software development. Efficient machine code implementations of the BLAS that take advantage of specific hardware features can lead to significant computational speedup. Using the BLAS provides portability and ease of maintenance. The proposed Level 3 BLAS are especially suitable for programming on computers with a hierarchy of memory and on machines using parallel processors. For these types of computers, computation is most efficient if the matrices are partitioned into blocks and matrix-matrix operations are performed on the blocks. On architectures that support parallel processing, operations on different blocks may be performed in parallel. The operations proposed for inclusion in the Level 3 BLAS are: matrix-matrix products, rank- k and rank-2 k updates of symmetric and Hermetian matrices, multiplication of a rectangular matrix by a triangular matrix, and solving triangular systems of equations with multiple right-hand sides. The routines are provided for four different FORTRAN data types: real, double precision, complex, and double complex. The paper describes the naming conventions and calling sequences for the subprograms, which generally follow the conventions used in the Level 2 BLAS. The authors discuss the reasoning used in selecting the operations to be included in the Level 3 BLAS. The paper concludes with a discussion of the application of the Level 3 BLAS to solve numerical linear algebra problems in terms of operations on submatrices (blocks). An example illustrates how the Level 3 BLAS can be used to implement the Cholesky factorization as a block algorithm.

Access critical reviews of Computing literature here

Become a reviewer for Computing Reviews.

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Transactions on Mathematical Software
ACM Transactions on Mathematical Software  Volume 16, Issue 1
March 1990
109 pages
ISSN:0098-3500
EISSN:1557-7295
DOI:10.1145/77626
Issue’s Table of Contents

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 01 March 1990
Published in TOMS Volume 16, Issue 1

Permissions

Request permissions for this article.

Check for updates

Qualifiers

  • Article

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)433
  • Downloads (Last 6 weeks)47
Reflects downloads up to 14 Nov 2024

Other Metrics

Citations

Cited By

View all
  • (2025)Parallel cyclic reduction of padded bordered almost block diagonal matricesJournal of Computational and Applied Mathematics10.1016/j.cam.2024.116331458(116331)Online publication date: Apr-2025
  • (2024)A computational perspective on neural-symbolic integrationNeurosymbolic Artificial Intelligence10.3233/NAI-240672(1-12)Online publication date: 18-Jul-2024
  • (2024)Quantitative Performance Analysis of BLAS Libraries on GPU ArchitecturesBLAS Kütüphanelerinin GPU Mimarilerindeki Nicel Performans AnaliziDeu Muhendislik Fakultesi Fen ve Muhendislik10.21205/deufmd.202426760626:76(40-48)Online publication date: 23-Jan-2024
  • (2024)Experiences with nested parallelism in task-parallel applications using malleable BLAS on multicore processorsInternational Journal of High Performance Computing Applications10.1177/1094342023115765338:2(55-68)Online publication date: 1-Mar-2024
  • (2024)Optimizing a Super-Fast Eigensolver for Hierarchically Semiseparable MatricesProceedings of the 53rd International Conference on Parallel Processing10.1145/3673038.3673119(32-41)Online publication date: 12-Aug-2024
  • (2024)High-Performance 3D convolution on the Latest Generation Sunway ProcessorProceedings of the 53rd International Conference on Parallel Processing10.1145/3673038.3673093(241-251)Online publication date: 12-Aug-2024
  • (2024)CoActo: CoActive Neural Network Inference Offloading with Fine-grained and Concurrent ExecutionProceedings of the 22nd Annual International Conference on Mobile Systems, Applications and Services10.1145/3643832.3661885(412-424)Online publication date: 3-Jun-2024
  • (2024)HPS Cholesky: Hierarchical Parallelized Supernodal Cholesky with Adaptive ParametersACM Transactions on Parallel Computing10.1145/363005111:1(1-22)Online publication date: 11-Mar-2024
  • (2024)Reorthogonalized Block Classical Gram–Schmidt Using Two Cholesky-Based TSQR AlgorithmsSIAM Journal on Matrix Analysis and Applications10.1137/23M160538745:3(1487-1517)Online publication date: 9-Aug-2024
  • (2024)A Numerically Stable Communication-Avoiding \({s}\)-Step GMRES AlgorithmSIAM Journal on Matrix Analysis and Applications10.1137/23M157710945:4(2039-2074)Online publication date: 21-Oct-2024
  • Show More Cited By

View Options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Get Access

Login options

Full Access

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media