Nothing Special   »   [go: up one dir, main page]

skip to main content
article
Free access

A set of level 3 basic linear algebra subprograms

Published: 01 March 1990 Publication History

Abstract

This paper describes an extension to the set of Basic Linear Algebra Subprograms. The extensions are targeted at matrix-vector operations that should provide for efficient and portable implementations of algorithms for high-performance computers

References

[1]
BARRON, D. W., AND SWINNERTON-DYER, H. P.F. Solution of simultaneous linear equations using a magnetic-tape store. Comput. J. 3 (1960), 28-33.
[2]
BERRY, M., GALLIVAN, K., HARROD, W., JALBY, W., LO, S., MEIER, U., PHILIPPE, B., AND SAMEH, A. Parallel algorithms on the CEDAR system. CSRD Report 581, 1986.
[3]
BISCHOF, C., AND VAN LOAN, C. The WY representation for products of Householder matrices. SIAM J. Sci. Star. Comput. 8, 1 (Jan. 1987), s2-s13.
[4]
BRONLUND, O. E., AND JOHNSEN, T. QR-factorization of partitioned matrices. Comput. Meth. Appl. Mech. Eng., vol. 3, pp. 153-172, 1974.
[5]
BUCHER, I., AND JORDAN, T. Linear algebra programs for use on a vector computer with a secondary solid state storage device. In Advances in Computer Methods for Partial Differential Equations, R. Vichnevetsky and R. Stepleman, Eds. IMACS, 1984, 546-550.
[6]
CALAHAN, D.A. Block-oriented local-memory-based linear equation solution on the CRAY-2: Uniprocessor algorithms. In Proceedings International Conference on Parallel Processing (Aug. 1986). IEEE Computer Society Press, New York, 1986.
[7]
CARNEVALI, P., RADICATI DI BROZOLO, G., ROBERT, Y., AND SGUAZZERO, P. Efficient Fortran implementation of the Gaussian elimination and Householder reduction algorithms on the IBM 3090 vector multiprocessor. IBM ECSEC Rep. ICE-0012, 1987.
[8]
CHARTRES, B. Adaption of the Jacobi and Givens methods for a computer with magnetic tape backup store. Univ. of Sydney Tech. Rep. 8, 1960.
[9]
DAVE, A. K., AND DUFF, I.S. Sparse matrix calculations on the CRAY-2. Parallel Comput. 5 (July 1987), 55-64.
[10]
DEMMEL, J., DONGARRA, J. J., DU CROZ, J., GREENBAUM, A., HAMMARLING, S., AND SORENSEN, D. Prospectus for the development of a linear algebra library for high-performance computers. Argonne National Lab. Rep. ANL-MCS-TM-97, Sept. 1987.
[11]
DIETRICH, G. A new formulation of the hypermatrix Householder QR-decomposition. Comput. Meth. AppI. Mech. Eng. 9 (1976), 273-280.
[12]
DODSON, D., AND LEWIS, J. Issues relating to extension of the basic linear algebra subprograms. ACM SIGNUM Newsl. 20, 1 (1985), 2-18.
[13]
DONGARRA, J. J., BUNCH, J., MOLER, C., AND STEWART, G. LINPACK Users' Guide. SIAM, Philadelphia, Pa., 1979.
[14]
DONGARRA, J. J., DuCRoz, J., HAMMARLING, S., AND HANSON, R. An extended set of Fortran basic linear algebra subprograms. ACM Trans. Math. Softw. 14, i (Mar. 1988), 1-17.
[15]
DONGARRA, J. J., DuCRoz, Z., HAMMARLING, S., AND HANSON, R. An extended set of Fortran basic linear algebra subprograms: Model implementation and test programs. ACM Trans. Math. Softw. 14, I (Mar. 1988), 18-32.
[16]
DONGARRA, J. J., DuCRoz, J., DUFF, I. S., AND HAMMARLING, S. A set of level 3 basic linear algebra subprograms: Model implementation and test programs. This issue, pp. 18-37.
[17]
DONGARRA, J. J., AND DUFF, I.S. Advanced architecture computers. Univ. of Tennessee, Rep. CS-89-90, Nov. 1989.
[18]
DONGARRA, J. J., GUSTAVSON, F., AND KARP, A. Implementing linear algebra algorithms for dense matrices on a vector pipeline machine. SIAM Rev. 26, 1 (1984), 91-112.
[19]
DONGARRA, J. J., HAMMARLING, S., AND SORENSEN, O. C. Block reduction of matrices to condensed forms for eigenvalue computations. Argonne National Lab. Rep. ANL-MCS-TM-99, Sept. 1987.
[20]
DONGARRA, J. J., AND HEWITT, T. Implementing dense linear algebra using multitasking on the CRAY X-MP-4. J. Comput. Appl. Math. 27 (1989), 215-227.
[21]
DONGARRA, J. J., AND SORENSEN, D.C. Linear algebra on high-performance computers. In Proceedings Parallel Computing 85, U. Schendel, Ed. North Holland, Amsterdam, 1986, 113-136.
[22]
DuCRoz, J., NUGENT, S., REID, J., AND TAYLOR, D. Solving large full sets of linear equations in a paged virtual store. ACM Trans. Math. Softw. 7, 4 (1981), 527-536.
[23]
DUFF, I. S. Full matrix techniques in sparse Gaussian elimination. In Numerical Analysis Proceedings, Dundee 1981, Lecture Notes in Mathematics 912. Springer-Verlag, New York, 1981, 71-84.
[24]
GALLIVAN, K., JALBV, W., AND MEIER, U. The use of BLAS3 in linear algebra on a parallel processor with a hierarchical memory. SIAM J. Sci. Star. Comput. 8, 6 (Nov. 1987), 1079-1084.
[25]
GEORGE, A., AND RASHWAN, S. Auxiliary storage methods for solving finite element systems. SIAM J. Sci. Star. Comput. 6, 4 (Oct. 1985), 882-910.
[26]
IBM. Engineering and scientific subroutine library. Program 5668-863, 1986.
[27]
LAWSON, C., HANSON, R. KINCAID, D., AND KROGH, F. Basic linear algebra subprograms for Fortran usage. ACM Trans. Math. Softw. 5 (1979), 308-323.
[28]
LAWSON, C., HANSON, R., KINCAID, D., AND KROGH, F. Algorithm 539: Basic linear algebra subprograms for Fortran usage. ACM Trans. Math. Softw. 5 (1979), 324-325.
[29]
MCKELLAR, A. C., AND COFFMAN, E. G., JR. Organizing matrices and matrix operations for paged memory systems. Commun. ACM 12, 3 (1969), 153-165.
[30]
ROBERT, Y., AND SGUAZZERO, P. The LU decomposition algorithm and its efficient Fortran implementation on the IBM 3090 vector multiprocessor. IBM ECSEC Rep. ICE-0006, 1987.
[31]
SCHREIBER, R. Module design specification (Version 1.0). SAXPY Computer Corp., 255 San Geronimo Way, Sunnyvale, CA 94086, 1986.
[32]
SCHREIBER, R., AND PARLETT, B. Block reflectors: Theory and computation. SIAM J. Numer. Anal. 25, 1 (Feb. 1988), 189-205.

Cited By

View all
  • (2025)Exo 2: Growing a Scheduling LanguageProceedings of the 30th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 110.1145/3669940.3707218(426-444)Online publication date: 3-Feb-2025
  • (2025)PyLOM: A HPC open source reduced order model suite for fluid dynamics applicationsComputer Physics Communications10.1016/j.cpc.2024.109459308(109459)Online publication date: Mar-2025
  • (2025)Parallel cyclic reduction of padded bordered almost block diagonal matricesJournal of Computational and Applied Mathematics10.1016/j.cam.2024.116331458(116331)Online publication date: Apr-2025
  • Show More Cited By

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Transactions on Mathematical Software
ACM Transactions on Mathematical Software  Volume 16, Issue 1
March 1990
109 pages
ISSN:0098-3500
EISSN:1557-7295
DOI:10.1145/77626
Issue’s Table of Contents

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 01 March 1990
Published in TOMS Volume 16, Issue 1

Permissions

Request permissions for this article.

Check for updates

Qualifiers

  • Article

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)485
  • Downloads (Last 6 weeks)62
Reflects downloads up to 16 Feb 2025

Other Metrics

Citations

Cited By

View all
  • (2025)Exo 2: Growing a Scheduling LanguageProceedings of the 30th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 110.1145/3669940.3707218(426-444)Online publication date: 3-Feb-2025
  • (2025)PyLOM: A HPC open source reduced order model suite for fluid dynamics applicationsComputer Physics Communications10.1016/j.cpc.2024.109459308(109459)Online publication date: Mar-2025
  • (2025)Parallel cyclic reduction of padded bordered almost block diagonal matricesJournal of Computational and Applied Mathematics10.1016/j.cam.2024.116331458(116331)Online publication date: Apr-2025
  • (2024)A computational perspective on neural-symbolic integrationNeurosymbolic Artificial Intelligence10.3233/NAI-240672(1-12)Online publication date: 18-Jul-2024
  • (2024)Quantitative Performance Analysis of BLAS Libraries on GPU ArchitecturesBLAS Kütüphanelerinin GPU Mimarilerindeki Nicel Performans AnaliziDeu Muhendislik Fakultesi Fen ve Muhendislik10.21205/deufmd.202426760626:76(40-48)Online publication date: 23-Jan-2024
  • (2024)Experiences with nested parallelism in task-parallel applications using malleable BLAS on multicore processorsInternational Journal of High Performance Computing Applications10.1177/1094342023115765338:2(55-68)Online publication date: 10-Apr-2024
  • (2024)Algorithm 1051: UltimateKalman, Flexible Kalman Filtering and Smoothing Using Orthogonal TransformationsACM Transactions on Mathematical Software10.1145/369995850:4(1-19)Online publication date: 12-Dec-2024
  • (2024)Optimizing a Super-Fast Eigensolver for Hierarchically Semiseparable MatricesProceedings of the 53rd International Conference on Parallel Processing10.1145/3673038.3673119(32-41)Online publication date: 12-Aug-2024
  • (2024)High-Performance 3D convolution on the Latest Generation Sunway ProcessorProceedings of the 53rd International Conference on Parallel Processing10.1145/3673038.3673093(241-251)Online publication date: 12-Aug-2024
  • (2024)CoActo: CoActive Neural Network Inference Offloading with Fine-grained and Concurrent ExecutionProceedings of the 22nd Annual International Conference on Mobile Systems, Applications and Services10.1145/3643832.3661885(412-424)Online publication date: 3-Jun-2024
  • Show More Cited By

View Options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Login options

Full Access

Figures

Tables

Media

Share

Share

Share this Publication link

Share on social media