research-article

Restructuring the Tridiagonal and Bidiagonal QR Algorithms for Performance

Authors:

Field G. Van Zee,

Robert A. van de Geijn,

Gregorio Quintana-OrtíAuthors Info & Claims

ACM Transactions on Mathematical Software (TOMS), Volume 40, Issue 3

Article No.: 18, Pages 1 - 34

https://doi.org/10.1145/2535371

Published: 01 April 2014 Publication History

Abstract

We show how both the tridiagonal and bidiagonal QR algorithms can be restructured so that they become rich in operations that can achieve near-peak performance on a modern processor. The key is a novel, cache-friendly algorithm for applying multiple sets of Givens rotations to the eigenvector/singular vector matrix. This algorithm is then implemented with optimizations that: (1) leverage vector instruction units to increase floating-point throughput, and (2) fuse multiple rotations to decrease the total number of memory operations. We demonstrate the merits of these new QR algorithms for computing the Hermitian eigenvalue decomposition (EVD) and singular value decomposition (SVD) of dense matrices when all eigenvectors/singular vectors are computed. The approach yields vastly improved performance relative to traditional QR algorithms for these problems and is competitive with two commonly used alternatives---Cuppen’s Divide-and-Conquer algorithm and the method of Multiple Relatively Robust Representations---while inheriting the more modest O(n) workspace requirements of the original QR algorithms. Since the computations performed by the restructured algorithms remain essentially identical to those performed by the original methods, robust numerical properties are preserved.

Supplementary Material

PDF File (a18-van_zee_appendix.pdf)

The proof is given in an electronic appendix, available online in the ACM Digital Library.

Download
1.83 MB

References

[1]

Anderson, E., Bai, Z., Bischof, C., Blackford, L. S., Demmel, J., Dongarra, J. J., Croz, J. D., Hammarling, S., Greenbaum, A., McKenney, A., and Sorensen, D. 1999. LAPACK Users’ Guide 3rd Ed. SIAM, Philadelphia, PA.

Digital Library

[2]

Bientinesi, P., Igual, F. D., Kressner, D., Petschow, M., and Quintana-Ortí, E. S. 2011. Condensed forms for the symmetric eigenvalue problem on multi-threaded architectures. Concurr. Comput. Pract. Exper. 23, 7, 694--707.

Digital Library

[3]

Bischof, C. and Van Loan, C. 1987. The WY representation for products of householder matrices. SIAM J. Sci. Stat. Comput. 8, 1, 2--13.

Digital Library

[4]

Bischof, C., Lang, B., and Sun, X. 1994. Parallel tridiagonalization through two-step band reduction. In Proceedings of the Scalable High-Performance Computing Conference. IEEE Computer Society Press, 23--27.

[5]

Blast. 2002. Basic linear algebra subprograms technical forum standard. Int. J. High Perf. Appl. Supercomput. 16, 1.

Digital Library

[6]

Braman, K., Byers, R., and Mathias, R. 2001a. The multishift QR algorithm. Part I: Maintaining well focused shifts and level 3 performance. SIAM J. Matrix Anal. Appl. 23, 929--947.

Digital Library

[7]

Braman, K., Byers, R., and Mathias, R. 2001b. The multishift QR algorithm. Part II: Aggressive early deflation. SIAM J. Matrix Anal. Appl. 23, 948--973.

Digital Library

[8]

Cline, A. K. and Dhillon, I. S. 2006. Computation of the singular value decomposition. In Handbook of Linear Algebra, CRC Press, Boca Raton, FL, 45--1--45--13.

[9]

Cuppen, J. J. M. 1980. A divide and conquer method for the symmetric tridiagonal eigenproblem. Numerische Mathematik 36, 177--195.

Digital Library

[10]

Demmel, J. and Kahan, W. 1990. Accurate singular values of bidiagonal matrices. SIAM J. Sci. Statist. Comput. 11, 873--912.

Digital Library

[11]

Demmel, J. and Veselić, K. 1992. Jacobi’s method is more accurate than QR. SIAM J. Matrix Anal. Appl. 13, 4, 1204--1245.

Digital Library

[12]

Demmel, J., Dhillon, I., and Ren, H. 1995. On the correctness of some bisection-like parallel eigenvalue algorithms in floating point arithmetic. Electron. Trans. Num. Anal. 3, 116--149.

[13]

Demmel, J., Marques, O., Parlett, B., and Vömel, C. 2008. Performance and accuracy of LAPACK’s symmetric tridiagonal eigensolvers. SIAM J. Sci. Comput. 30, 3, 1508--1526.

Digital Library

[14]

Dhillon, I. S. 1997. A new O(n²) algorithm for the symmetric tridiagonal eigenvalue/eigenvector problem. Ph.D. thesis, EECS Department, University of California, Berkeley.

Digital Library

[15]

Dhillon, I. S. 1998. Current inverse iteration software can fail. BIT Numer. Math. 38, 685--704.

[16]

Dhillon, I. S. and Parlett, B. N. 2003. Orthogonal eigenvectors and relative gaps. SIAM J. Matrix Anal. Appl. 25, 3, 858--899.

Digital Library

[17]

Dhillon, I. S. and Parlett, B. N. 2004. Multiple representations to compute orthogonal eigenvectors of symmetric tridiagonal matrices. Linear Algor. Appl. 387, 1--28.

[18]

Dhillon, I. S., Parlett, B. N., and Vomel, C. 2006. The design and implementation of the MRRR algorithm. ACM Trans. Math. Softw. 32, 533--560.

Digital Library

[19]

Dongarra, J. J., Du Croz, J., Hammarling, S., and Hanson, R. J. 1988. An extended set of FORTRAN basic linear algebra subprograms. ACM Trans. Math. Softw. 14, 1, 1--17.

Digital Library

[20]

Drmač, Z. 2009. A global convergence proof for cyclic Jacobi methods with block rotations. SIAM J. Matrix Anal. Appl. 31, 3, 1329--1350.

Digital Library

[21]

Drmač, Z. and Veselić, K. 2008a. New fast and accurate Jacobi SVD algorithm. I. SIAM J. Matrix Anal. Appl. 29, 4, 1322--1342.

Digital Library

[22]

Drmač, Z. and Veselić, K. 2008b. New fast and accurate Jacobi SVD algorithm. II. SIAM J. Matrix Anal. Appl. 29, 4, 1343--1362.

Digital Library

[23]

Fernando, K. V. and Parlett, B. N. 1994. Accurate singular values and differential qd algorithms. Numer. Math. 67, 191--229.

[24]

Golub, G. H. and Loan, C. F. V. 1996. Matrix Computations 3rd Ed. The Johns Hopkins University Press, Baltimore, MD.

Digital Library

[25]

Goto, K. and van de Geijn, R. A. 2008. Anatomy of a high-performance matrix multiplication. ACM Trans. Math. Soft. 34, 3.

Digital Library

[26]

Gu, M. and Eisenstat, S. C. 1995a. A divide-and-conquer algorithm for the bidiagonal SVD. SIAM J. Matrix Anal. Appl. 16, 1, 79--92.

Digital Library

[27]

Gu, M. and Eisenstat, S. C. 1995b. A divide-and-conquer algorithm for the symmetric tridiagonal eigenproblem. SIAM J. Matrix Anal. Appl. 16, 1, 172--191.

Digital Library

[28]

Haidar, A., Ltaief, H., and Dongarra, J. 2011. Parallel reduction to condensed forms for symmetric eigenvalue problems using aggregated fine-grained and memory-aware kernels. LAPACK Working Note 254. http://www.netlib.org/lapack/lawnspdf/lawn254.pdf.

[29]

Howell, G. W., Demmel, J. W., Fulton, C. T., Hammarling, S., and Marmol, K. 2008. Cache efficient bidiagonalization using BLAS 2.5 operators. ACM Trans. Math. Softw. 34, 3, 14:1--14:33.

Digital Library

[30]

Jacobi, C. 1846. Über ein leichtes verfahren, die in der theorie der säkularstörungen vorkommenden gleichungen numerisch aufzulösen. Crelle’s J. 30, 51--94.

[31]

Jiang, E. and Zhang, Z. 1985. A new shift of the QL algorithm for irreducible symmetric tridiagonal matrices. Linear Algebra Appl. 65, 0, 261--272.

[32]

Joffrain, T., Low, T. M., Quintana-Ortí, E. S., van de Geijn, R., and Van Zee, F. 2006. Accumulating householder transformations. ACM Trans. Math. Softw. 32, 2, 169--179.

Digital Library

[33]

Kågström, B., Kressner, D., Quintana-Ortí, E. S., and Quintana-Ortí, G. 2008. Blocked algorithms for the reduction to Hessenberg-triangular form revisited. BIT Numer. Math. 48, 3, 563--584.

[34]

Lang, B. 1998. Using level 3 BLAS in rotation-based algorithms. SIAM J. Sci. Comput. 19, 2, 626--634.

Digital Library

[35]

LAPACK. 2011. http://www.netlib.org/lapack/.

[36]

Lawson, C. L., Hanson, R. J., Kincaid, D. R., and Krogh, F. T. 1979. Basic linear algebra subprograms for Fortran usage. ACM Trans. Math. Soft. 5, 3, 308--323.

Digital Library

[37]

libflame. 2011. http://www.cs.utexas.edu/users/flame/libflame/.

[38]

Luszczek, P., Ltaief, H., and Dongarra, J. 2011. Two-stage tridiagonal reduction for dense symmetric matrices using tile algorithms on multicore architectures. LAPACK Working Note 244. http://www.netlib.org/lapack/lawnspdf/lawn244.pdf.

[39]

Nakatsukasa, Y., Aishima, K., and Yamazaki, I. 2012. dqds with aggressive early deflation. SIAM J. Matrix Anal. Appl. 33, 1, 22--51.

Digital Library

[40]

OpenBLAS. 2011. https://github.com/xianyi/OpenBLAS/.

[41]

Parlett, B. N. 1980. The Symmetric Eigenvalue Problem. Prentice-Hall.

Digital Library

[42]

Parlett, B. N. and Marques, O. A. 1999. An implementation of the dqds algorithm (positive case). Linear Algebra Appl. 309, 217--259.

[43]

Quintana-Ortí, G. and Vidal, A. M. 1992. Parallel algorithms for QR factorization on shared memory multiprocessors. In Proceedings of the European Workshop on Parallel Computing’92, Parallel Computing: From Theory to Sound Practice. IOS Press, 72--75.

[44]

Rajamanickam, S. 2009. Efficient algorithms for sparse singular value decomposition. Ph.D. thesis, University of Florida.

[45]

Rutter, J. D. 1994. A serial implementation of Cuppen’s divide and conquer algorithm for the symmetric eigenvalue problem. Tech. rep. CSD-94-799, Computer Science Division, University of California at Berkeley.

Digital Library

[46]

Schreiber, R. and van Loan, C. 1989. A storage-efficient WY representation for products of Householder transformations. SIAM J. Sci. Statist. Comput. 10, 1, 53--57.

Digital Library

[47]

Smith, B. T., Boyle, J. M., Dongarra, J. J., Garbow, B. S., Ikebe, Y., et al. 1976. Matrix Eigensystem Routines -- EISPACK Guide 2nd Ed. Lecture Notes in Computer Science, vol. 6, Springer.

[48]

Stewart, G. W. 2001. Matrix Algorithms Volume II: Eigensystems. SIAM, Philadelphia, PA.

Digital Library

[49]

Van Zee, F. G. 2011. Libflame: The Complete Reference. http://www.lulu.com/.

[50]

Van Zee, F. G., van de Geijn, R., and Quintana-Ortí, G. 2011. Restructuring the QR algorithm for high-performance application of Givens rotations. Tech. rep. TR-11-36, FLAME Working Note 60. Department of Computer Science, The University of Texas at Austin.

[51]

Van Zee, F. G., van de Geijn, R. A., Quintana-Ortí, G., and Elizondo, G. J. 2012. Families of algorithms for reducing a matrix to condensed form. ACM Trans. Math. Soft. 39, 1.

Digital Library

[52]

Watkins, D. S. 1982. Understanding the QR algorithm. SIAM Rev. 24, 4, 427--440.

Digital Library

[53]

Wilkinson, J. H. 1968. Global convergence of tridiagonal QR algorithm with origin shifts. Linear Algebra Appl. 1, 409--420.

[54]

Willems, P. R. and Lang, B. 2011. A framework for the MR³ algorithm: Theory and implementation. Tech. rep., Bergische Universität Wuppertal. http://www-ai.math.uni-wuppertal.de/SciComp/preprints/SC1102.pdf.

[55]

Willems, P. R., Lang, B., and Vömel, C. 2006. Computing the bidiagonal SVD using multiple relatively robust representations. SIAM J. Matrix Anal. Appl. 28, 4, 907--926.

Digital Library

Cited By

Zhao THu QHe XXu WWang JLeng CCheng J(2023)ECBC: Efficient Convolution via Blocked ColumnizingIEEE Transactions on Neural Networks and Learning Systems10.1109/TNNLS.2021.309527634:1(433-445)Online publication date: Jan-2023
https://doi.org/10.1109/TNNLS.2021.3095276
Kokkiligadda VNaikoti VPatkotwar GSabat SPeesapati R(2023)FPGA-Based Hardware Accelerator for Matrix InversionSN Computer Science10.1007/s42979-022-01542-x4:2Online publication date: 9-Jan-2023
https://dl.acm.org/doi/10.1007/s42979-022-01542-x
Heavner NChen CGopal AMartinsson P(2023)Efficient algorithms for computing rank‐revealing factorizations on a GPUNumerical Linear Algebra with Applications10.1002/nla.251530:6Online publication date: 2-Jun-2023
https://doi.org/10.1002/nla.2515
Show More Cited By

Index Terms

Restructuring the Tridiagonal and Bidiagonal QR Algorithms for Performance
1. Mathematics of computing
  1. Mathematical software

Recommendations

Bidiagonal SVD Computation via an Associated Tridiagonal Eigenproblem

The Singular Value Decomposition (SVD) is widely used in numerical analysis and scientific computing applications, including dimensionality reduction, data compression and clustering, and computation of pseudo-inverses. In many cases, a crucial part of ...
Families of Algorithms for Reducing a Matrix to Condensed Form

In a recent paper it was shown how memory traffic can be diminished by reformulating the classic algorithm for reducing a matrix to bidiagonal form, a preprocess when computing the singular values of a dense matrix. The key is a reordering of the ...
Accurately Counting Singular Values of Bidiagonal Matrices and Eigenvalues of Skew-Symmetric Tridiagonal Matrices

We have developed algorithms to count singular values of a real bidiagonal matrix which are greater than a specified value. This requires the transformation of the singular value problem to an equivalent symmetric eigenvalue problem. The counting of ...

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Transactions on Mathematical Software

ACM Transactions on Mathematical Software Volume 40, Issue 3

April 2014

152 pages

ISSN:0098-3500

EISSN:1557-7295

DOI:10.1145/2610268

Editor:
Michael A. Heroux
Sandia National Laboratories USA

Issue’s Table of Contents

Copyright © 2014 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 01 April 2014

Accepted: 01 October 2013

Revised: 01 March 2013

Received: 01 October 2011

Published in TOMS Volume 40, Issue 3

Permissions

Request permissions for this article.

Request Permissions

Check for updates

Author Tags

Qualifiers

Research-article
Research
Refereed

Funding Sources

Microsoft
UTAustin-Portugal Colab program
Office of Cyberinfrastructure
Division of Computing and Communication Foundations

Contributors

Other Metrics

View Article Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

12
Total Citations
View Citations
666
Total Downloads

Downloads (Last 12 months)29
Downloads (Last 6 weeks)3

Reflects downloads up to 19 Nov 2024

Other Metrics

View Author Metrics

Citations

Cited By

Zhao THu QHe XXu WWang JLeng CCheng J(2023)ECBC: Efficient Convolution via Blocked ColumnizingIEEE Transactions on Neural Networks and Learning Systems10.1109/TNNLS.2021.309527634:1(433-445)Online publication date: Jan-2023
https://doi.org/10.1109/TNNLS.2021.3095276
Kokkiligadda VNaikoti VPatkotwar GSabat SPeesapati R(2023)FPGA-Based Hardware Accelerator for Matrix InversionSN Computer Science10.1007/s42979-022-01542-x4:2Online publication date: 9-Jan-2023
https://dl.acm.org/doi/10.1007/s42979-022-01542-x
Heavner NChen CGopal AMartinsson P(2023)Efficient algorithms for computing rank‐revealing factorizations on a GPUNumerical Linear Algebra with Applications10.1002/nla.251530:6Online publication date: 2-Jun-2023
https://doi.org/10.1002/nla.2515
Rizwan MJung EPark YChoi JKim Y(2022)Optimization of Matrix-Matrix Multiplication Algorithm for Matrix-Panel Multiplication on Intel KNL2022 IEEE/ACS 19th International Conference on Computer Systems and Applications (AICCSA)10.1109/AICCSA56895.2022.10017947(1-7)Online publication date: Dec-2022
https://doi.org/10.1109/AICCSA56895.2022.10017947
Bosner N(2022)Efficient algorithm for simultaneous reduction to the $$m$$-Hessenberg-triangular-triangular formBIT10.1007/s10543-014-0516-y55:3(677-703)Online publication date: 11-Mar-2022
https://dl.acm.org/doi/10.1007/s10543-014-0516-y
Theodoris CAlachiotis NLow TPavlidis P(2020)qLD: High-performance Computation of Linkage Disequilibrium on CPU and GPU2020 IEEE 20th International Conference on Bioinformatics and Bioengineering (BIBE)10.1109/BIBE50027.2020.00019(65-72)Online publication date: Oct-2020
https://doi.org/10.1109/BIBE50027.2020.00019
Fibich CTauner SRossler PHorauer MKrapfenbauer MLinauer MMatschnig MTaucher H(2019)Evaluation of Open-Source Linear Algebra Libraries in Embedded Applications2019 8th Mediterranean Conference on Embedded Computing (MECO)10.1109/MECO.2019.8760041(1-6)Online publication date: Jun-2019
https://doi.org/10.1109/MECO.2019.8760041
Dongarra JGates MHaidar AKurzak JLuszczek PTomov SYamazaki I(2018)The Singular Value Decomposition: Anatomy of Optimizing an Algorithm for Extreme ScaleSIAM Review10.1137/17M111773260:4(808-865)Online publication date: 8-Nov-2018
https://doi.org/10.1137/17M1117732
Zhang JHuang G(2018)Research on distributed heterogeneous data PCA algorithm based on cloud platform10.1063/1.5038988(020016)Online publication date: 2018
https://doi.org/10.1063/1.5038988
Yurchenko SLodi LTennyson JStolyarov A(2016)Duo: A general program for calculating spectra of diatomic moleculesComputer Physics Communications10.1016/j.cpc.2015.12.021202(262-275)Online publication date: May-2016
https://doi.org/10.1016/j.cpc.2015.12.021
Show More Cited By

View Options

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Article

View options

PDF

View or Download as a PDF file.

eReader

View online with eReader.

Media

Figures

Other

Tables

View Issue’s Table of Contents