Nothing Special   »   [go: up one dir, main page]

skip to main content
article

Design, implementation and testing of extended and mixed precision BLAS

Published: 01 June 2002 Publication History

Abstract

This article describes the design rationale, a C implementation, and conformance testing of a subset of the new Standard for the BLAS (Basic Linear Algebra Subroutines): Extended and Mixed Precision BLAS. Permitting higher internal precision and mixed input/output types and precisions allows us to implement some algorithms that are simpler, more accurate, and sometimes faster than possible without these features. The new BLAS are challenging to implement and test because there are many more subroutines than in the existing Standard, and because we must be able to assess whether a higher precision is used for internal computations than is used for either input or output variables. We have therefore developed an automated process of generating and systematically testing these routines. Our methodology is applicable to languages besides C. In particular, our algorithms used in the testing code will be valuable to all other BLAS implementors. Our extra precision routines achieve excellent performance---close to half of the machine peak Megaflop rate even for the Level 2 BLAS, when the data access is stride one.

References

[1]
Basic Linear Algebra Subprograms (BLAS) Standard. 2000. http://www.netlib.org/utk/papers/blast-forum.html.
[2]
IEEE. 2001. 754 Revision. http://grouper.ieee.org/groups/754/.
[3]
Anderson, E., Bai, Z., Bischof, C., Blackford, S., Demmel, J., Dongarra, J., Du Croz, J., Greenbaum, A., Hammarling, S., McKenney, A., and Sorensen, D. 1999. LAPACK Users' Guide, Release 3.0. SIAM, Philadelphia. Pa.
[4]
ANSI/IEEE Std. 754-1985. IEEE Standard for Binary Floating Point Arithmetic.
[5]
ANSI/IEEE Std. 854-1987. IEEE Standard for Radix Independent Floating Point Arithmetic.
[6]
ISO/IEC 9899:1999. Standard for the C programming language (C99). Jan 99 draft available at http://anubis.dkuug.dk/JTC1/SC22/WG14/www/docs/n869. Final version to be available from http://www.iso.ch.
[7]
Bailey, D. 2000. A Fortran-90 double-double precision library. http://www.nersc.gov/∼dhbailey/mpdist/mpdist.html.
[8]
Bailey, D. H. 1995. A Fortran-90 based multiprecision system. ACM Trans. Math. Soft. 21, 4, 379--387.
[9]
Bilmes, J., Asanović, K., Demmel, J., Lam, D., and Chin, C. 1996. The PHiPAC WWW home page. http://www.icsi.berkeley.edu/∼bilmes/phipac.
[10]
Blackford, L. S., Choi, J., D'Azevedo, E., Demmel, J., Dhillon, I., Dongarra, J., Hammarling, S., Henry, G., Petitet, A., Stanley, K., Walker, D., and Whaley, R. C. 1997. ScaLAPACK Users' Guide. SIAM, Philadelphia, Pa.
[11]
Bleher, J., Roeder, A., and Rump, S. 1985. ACRITH: High-Accuracy Arithmetic---An advanced tool for numerical computation. In Proceedings of the 7th Symposium on Computer Arithmetic. IEEE Computer Society Press, Urbana, Il.
[12]
Brent, R. 1978. A Fortran multiple precision arithmetic package. ACM Trans. Math. Soft. 4, 57--70.
[13]
Briggs, K. 1998. Doubledouble floating point arithmetic. http://www-epidem.plantsci.cam.ac.uk/∼kbriggs/doubledouble.html.
[14]
Dekker, T. 1971. A floating-point technique for extending the available precision. Numer. Math. 18, 224--242.
[15]
Demmel, J. 1984. Underflow and the reliability of numerical software. SIAM J. Sci. Stat. Comput. 5, 4 (Dec.), 887--919.
[16]
Demmel, J. and Li, X. 1994. Faster numerical algorithms via exception handling. IEEE Trans. Comput. 43, 8, 983--992. LAPACK Working Note 59.
[17]
Demmel, J. W. 1997. Applied Numerical Linear Algebra. SIAM, Philadelphia, Pa.
[18]
Dhillon, I., Fann, G., and Parlett, B. 1997. Application of a new algorithm for the symmetric eigenproblem to computational quantum chemistry. In Proceedings of the 8th SIAM Conference on Parallel Processing for Scientific Computing. SIAM, Philadelphia, Pa.
[19]
Dhillon, I. S. 1997. A new O(n2) algorithm for the symmetric tridiagonal eigenvalue/eigenvector problem. Ph.D. dissertation. University of California, Berkeley Berkeley, California.
[20]
Dongarra, J., Bunch, J., Moler, C., and Stewart, G. W. 1979. LINPACK User's Guide. SIAM, Philadelphia, Pa.
[21]
Dongarra, J., Du Croz, J., Duff, I. S., and Hammarling, S. 1990. A set of level 3 basic linear algebra subprograms. ACM Trans. Math. Softw. 16, 1--17.
[22]
Dongarra, J., Du Croz, J., Hammarling, S., and Hanson, R. J. 1988. An extended set of FORTRAN basic linear algebra subprograms. ACM Trans. Math. Softw. 14, 1 (Mar.), 1--17.
[23]
Eisenstat, S. C., Elman, H. C., and Schultz, M. H. 1983. Variational iterative methods for nonsymmetric systems of linear equations. SIAM J. Numer. Anal. 20, 345--357.
[24]
Forsythe, G. E. and Moler, C. B. 1967. Computer Solution of Linear Algebraic Systems. Prentice-Hall, Englewood Cliffs, N.J.
[25]
Gupta, A., Joshi, M., Karypis, G., and Kumar, V. 1999. PSPASES: A scalable parallel direct solver for sparse symmetric positive definite systems. http://www.cs.umn.edu/mjoshi/pspases.
[26]
Henry, G. 1998. UNIX extended precision library for the pentium. http://www.cs.utk.edu/∼ghenry/distrib/archive.htm.
[27]
Higham, N. J. 1996. Accuracy and Stability of Numerical Algorithms. SIAM, Philadelphia, Pa.
[28]
Kahan, W. 1966 (revised June 1968). Accurate eigenvalues of a symmetric tridiagonal matrix. Computer Science Dept. Technical Report CS41, Stanford University, Stanford, Calif., July.
[29]
Kahan, W. 1998. Matlab's loss is nobody's gain. http://www.cs.berkeley.edu/∼wkahan/MxMulEps.pdf.
[30]
Kahan, W. and Darcy, J. D. 1998. How Java's floating-point hurts everyone everywhere. http://www.cs.berkeley.edu/∼wkahan/JAVAhurt.pdf.
[31]
Kahan, W. and LeBlanc, E. 1985. Anomalies in the IBM ACRITH package. In Proceedings of the 7th Symposium on Computer Arithmetic. IEEE Computer Society Press, Urbana, Ill.
[32]
Knuth, D. 1969. The Art of Computer Programming, vol. 2. Addison-Wesley, Reading, Mass.
[33]
Kulisch, U. and Miranker, W., Eds. 1983. A New Approach to Scientific Computing. Academic Press, New York.
[34]
Lawson, C., Hanson, R., Kincaid, D., and Krogh, F. 1979. Basic linear algebra subprograms for Fortran usage. ACM Trans. Math. Softw. 5, 308--323.
[35]
Li, X. S. and Demmel, J. W. 1998. Making sparse Gaussian elimination scalable by static pivoting. In Proceedings of SC98. Orlando, Fla.
[36]
Li, X. S. and Demmel, J. W. 2001. SuperLU_DIST: A scalable distributed-memory sparse direct solver for unsymmetric linear systems. Tech. rep., Lawrence Berkeley National Laboratory. December. in preparation.
[37]
Li, X. S., Demmel, J. W., Bailey, D. H., Henry, G., Hida, Y., Iskandar, J., Kahan, W., Kapur, A., Martin, M. C., Tung, T., and Yoo, D. J. 2000. Design, implementation and testing of extended and mixed precision BLAS. Tech. Rep. LBNL-45991, Lawrence Berkeley National Laboratory. June. http://www.nersc.gov/∼xiaoye/XBLAS/.
[38]
M4 macro processor. http://www.cs.utah.edu/csinfo/texinfo/m4/m4.html.
[39]
Macsyma, Inc. 1996. Macsyma Mathematics and System Reference Manual, 16th ed, 589 pages.
[40]
Møller, O. 1965. Quasi double precision in floating-point arithmetic. BIT 5, 37--50.
[41]
Monagan, M., Geddes, K., Heal, K., Labahn, G., and Vorkoetter, S. 1997. Maple V Programming Guide for Release 5. Springer-Verlag, New York.
[42]
Oettli, W. and Prager, W. 1964. Compatibility of approximate solution of linear equations with given error bounds for coefficients and right-hand sides. Numer. Math. 6, 405--409.
[43]
Parlett, B. and Dhillon, I. 1997. Fernando's solution to Wilkinson's problem: An application of double factorization. Lin. Alg. Appl. 267, 247--279.
[44]
Parlett, B. and Marques, O. 2000. An implementation of the DQDS algorithm (Positive case). Lin. Alg. Appl. 309, 217--259.
[45]
Pichat, M. 1972. Correction d'une somme en arithmetique à virgule flottante. Numer. Math. 19, 400--406.
[46]
Priest, D. 1991. Algorithms for arbitrary precision floating point arithmetic. In Proceedings of the 10th Symposium on Computer Arithmetic, P. Kornerup and D. Matula, Eds. IEEE Computer Society Press, Grenoble, France, 132--145.
[47]
Saad, Y. and Schultz, M. H. 1986. GMRES: A generalized minimal residual algorithm for solving nonsymmetric linear systems. SIAM J. Sci. Statist. Comput. 7, 856--869.
[48]
Shewchuk, J. R. 1997. Adaptive precision floating-point arithmetic and fast robust geometric predicates. Disc. Comput. Geom. 18, 305--363.
[49]
Smith, B. T., Boyle, J. M., Dongarra, J. J., Garbow, B. S., Ikebe, Y., Klema, V. C., and Moler, C. B. 1976. Matrix Eigensystem Routines---EISPACK Guide. In Lecture Notes in Computer Science, vol. 6. Springer-Verlag, Berlin, Germany.
[50]
Turner, K. and Walker, H. F. 1992. Efficient high accuracy solutions with GMRES(m). SIAM J. Sci. Stat. Comput. 13, 815--825.
[51]
Whaley, R. C. and Dongarra, J. 1998. The ATLAS home page. http://www.netlib.org/atlas/.
[52]
Wolfram, S. 1988. Mathematica: A System for Doing Mathematics by Computer. Addison-Wesley, Reading, Mass.

Cited By

View all
  • (2024)Mille-feuille: A Tile-Grained Mixed Precision Single-Kernel Conjugate Gradient Solver on GPUsSC24: International Conference for High Performance Computing, Networking, Storage and Analysis10.1109/SC41406.2024.00064(1-16)Online publication date: 17-Nov-2024
  • (2024)Useful applications of correctly-rounded operators of the form ab + cd + e2024 IEEE 31st Symposium on Computer Arithmetic (ARITH)10.1109/ARITH61463.2024.00015(32-39)Online publication date: 10-Jun-2024
  • (2024)Acceleration of iterative refinement for singular value decompositionNumerical Algorithms10.1007/s11075-023-01596-995:2(979-1009)Online publication date: 1-Feb-2024
  • Show More Cited By

Index Terms

  1. Design, implementation and testing of extended and mixed precision BLAS

    Recommendations

    Comments

    Please enable JavaScript to view thecomments powered by Disqus.

    Information & Contributors

    Information

    Published In

    cover image ACM Transactions on Mathematical Software
    ACM Transactions on Mathematical Software  Volume 28, Issue 2
    June 2002
    151 pages
    ISSN:0098-3500
    EISSN:1557-7295
    DOI:10.1145/567806
    Issue’s Table of Contents

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 01 June 2002
    Published in TOMS Volume 28, Issue 2

    Permissions

    Request permissions for this article.

    Check for updates

    Author Tags

    1. BLAS
    2. double-double arithmetic
    3. extended and mixed precision

    Qualifiers

    • Article

    Contributors

    Other Metrics

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • Downloads (Last 12 months)43
    • Downloads (Last 6 weeks)8
    Reflects downloads up to 16 Feb 2025

    Other Metrics

    Citations

    Cited By

    View all
    • (2024)Mille-feuille: A Tile-Grained Mixed Precision Single-Kernel Conjugate Gradient Solver on GPUsSC24: International Conference for High Performance Computing, Networking, Storage and Analysis10.1109/SC41406.2024.00064(1-16)Online publication date: 17-Nov-2024
    • (2024)Useful applications of correctly-rounded operators of the form ab + cd + e2024 IEEE 31st Symposium on Computer Arithmetic (ARITH)10.1109/ARITH61463.2024.00015(32-39)Online publication date: 10-Jun-2024
    • (2024)Acceleration of iterative refinement for singular value decompositionNumerical Algorithms10.1007/s11075-023-01596-995:2(979-1009)Online publication date: 1-Feb-2024
    • (2023)Sparse Matrix-Vector Multiplication with Reduced-Precision Memory Accessor2023 IEEE 16th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC)10.1109/MCSoC60832.2023.00094(608-615)Online publication date: 18-Dec-2023
    • (2023)Leveraging Mixed Precision in Exponential Time Integration Methods2023 IEEE High Performance Extreme Computing Conference (HPEC)10.1109/HPEC58863.2023.10363489(1-8)Online publication date: 25-Sep-2023
    • (2023)LAProof: A Library of Formal Proofs of Accuracy and Correctness for Linear Algebra Programs2023 IEEE 30th Symposium on Computer Arithmetic (ARITH)10.1109/ARITH58626.2023.00021(36-43)Online publication date: 4-Sep-2023
    • (2023)PACFApplied Mathematics and Computation10.1016/j.amc.2022.127611440:COnline publication date: 1-Mar-2023
    • (2023)ddRingAllreduce: a high-precision RingAllreduce algorithmCCF Transactions on High Performance Computing10.1007/s42514-023-00150-25:3(245-257)Online publication date: 5-Jul-2023
    • (2022)Formalization of Double-Word Arithmetic, and Comments on “Tight and Rigorous Error Bounds for Basic Building Blocks of Double-Word Arithmetic”ACM Transactions on Mathematical Software10.1145/348451448:1(1-24)Online publication date: 16-Feb-2022
    • (2022)Accelerating Variants of the Conjugate Gradient with the Variable Precision Processor2022 IEEE 29th Symposium on Computer Arithmetic (ARITH)10.1109/ARITH54963.2022.00017(51-57)Online publication date: Sep-2022
    • Show More Cited By

    View Options

    Login options

    Full Access

    View options

    PDF

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader

    Figures

    Tables

    Media

    Share

    Share

    Share this Publication link

    Share on social media