Abstract
The small-bulge multishift QR algorithm proposed by Braman, Byers and Mathias is one of the most efficient algorithms for computing the eigenvalues of nonsymmetric matrices on processors with hierarchical memory. However, to fully extract its potential performance, it is crucial to choose the block size m properly according to the target architecture and the matrix size n. In this paper, we construct a performance model for this algorithm. The model has a hierarchical structure that reflects the structure of the original algorithm and given n, m and the performance data of the basic components of the algorithm, such as the level-3 BLAS routines and the double implicit shift QR routine, predicts the total execution time. Experiments on SMP machines with PowerPC G5 and Opteron processors show that the variation of the execution time as a function of m predicted by the model agrees well with the measurements. Thus our model can be used to automatically select the optimal value of m for a given matrix size on a given architecture.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Preview
Unable to display preview. Download preview PDF.
Similar content being viewed by others
References
Francis, J.G.F.: The QR transformation. a unitary analogue to the LR transformation. I. Comput. J. 4, 265–271 (1961)
Francis, J.G.F.: The QR transformation. II. Comput. J. 4, 332–345 (1961)
Kublanovskaya, V.N.: On some algorithms for the solution of the complete eigenvalue problem. U.S.S.R. Comput. Math. and Math. Phys. 3, 637–657 (1961)
Bai, Z., Demmel, J.: On a block implementation of Hessenberg QR iteration. Int. J. of High Speed Computing 1, 97–112 (1989)
Braman, K., Byers, R., Mathias, R.: The multishift QR algorithm. part I: Maintaining well-focused shifts and level 3 performance. SIAM Journal on Matrix Analysis and Applications 23, 929–947 (2002)
Dubrulle, A.: The multishift QR algorithm: Is it worth the trouble? Palo Alto Scientific Center Report G320-3558x, IBM Corp. (1991)
Henry, G., Watkins, D.S., Dongarra, J.: A parallel implementation of the nonsymmetric QR algorithm for distributed memory architectures. SIAM J. Sci. Comput. 24, 284–311 (2002)
Watkins, D.S.: Bidirectional chasing algorithms for the eigenvalue problem. SIAM J. Matrix Anal. Appl. 14, 166–179 (1993)
Watkins, D.S.: Shifting strategies for the parallel QR algorithm. SIAM J. Sci. Comput. 15, 953–958 (1994)
Demmel, J.W.: Applied Numerical Linear Algebra. SIAM, Philadelphia (1997)
Golub, G.H., van Loan, C.F.: Matrix Computations, 3rd edn. Johns Hopkins University Press, Baltimore (1996)
Watkins, D.S.: The transmission of shifts and shift blurring in the QR algorithm. Linear Algebra and Its Applications 241/243, 877–896 (1996)
Anderson, E., Bai, Z., Bischof, C., Demmel, J., Dongarra, J., Croz, J.D., Greenbaum, A., Hammarling, S., McKenney, A., Ostrouchov, S., Sorensen, D.: LAPACK User’s Guide. SIAM, Philadelphia (1992)
Dackland, K., Kågström, B.: A hierarchical approach for performance analysis of ScaLAPACK-based routines using the distributed linear algebra machine. In: Madsen, K., Olesen, D., Waśniewski, J., Dongarra, J. (eds.) PARA 1996. LNCS, vol. 1184, pp. 187–195. Springer, Heidelberg (1996)
Cuenca, J., Gimenez, D., Gonzalez, J.: Architecture of an automatically tuned linear algebra library. Parallel Computing 30, 187–210 (2004)
Cuenca, J., Garcia, L.P., Gimenez, D.G.: Empirical modelling of parallel linear algebra routines. In: Wyrzykowski, R., Dongarra, J., Paprzycki, M., Waśniewski, J. (eds.) PPAM 2004. LNCS, vol. 3019, pp. 169–174. Springer, Heidelberg (2004)
Katagiri, T., Kuroda, H., Kanada, Y.: A methodology for automatically tuned parallel tri-diagonalization on distributed memory parallel machines. In: Proceedings of VecPar2000, Faculdade de Engenharia da Universidade do Porto, Portugal, pp. 265–277 (2000)
Yamamoto, Y.: Performance modeling and optimal block size selection for a BLAS-3 based tridiagonalization algorithm. In: Proceedings of HPC-Asia 2005, Beijing, pp. 249–256 (2005)
Dongarra, J., Eijkhout, V.: Self-adapting numerical software for next generation applications. International Journal of High Performance Computing Applications 17, 125–131 (2003)
Whaley, R., Petitet, A., Dongarra, J.: Automated empirical optimizations of software and the ATLAS project. Parallel Computing 27, 3–35 (2001)
Bilmes, J., Asanovic, K., Chin, C.W., Demmel, J.: Optimizing matrix multiply using PhiPAC: a portable, high-performance, ANSI-C coding methodology. In: Proceedings of the 11th International Conference on Supercomputing, Vienna, pp. 340–347 (1997)
Kressner, D.: Numerical Methods for General and Structured Eigenvalue Problems. Springer, Heidelberg (2005)
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Yamamoto, Y. (2006). Performance Modeling and Optimal Block Size Selection for the Small-Bulge Multishift QR Algorithm. In: Guo, M., Yang, L.T., Di Martino, B., Zima, H.P., Dongarra, J., Tang, F. (eds) Parallel and Distributed Processing and Applications. ISPA 2006. Lecture Notes in Computer Science, vol 4330. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11946441_44
Download citation
DOI: https://doi.org/10.1007/11946441_44
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-68067-3
Online ISBN: 978-3-540-68070-3
eBook Packages: Computer ScienceComputer Science (R0)