Abstract.
Parallel implementation of the BLAS library for sparse matrix algorithms in computational linear algebra is a critical problem, especially on shared-memory architectures with finite memory bandwidth. In this study, we evaluate the performance of cc-NUMA systems using low-level multithreaded BLAS kernels. The performance of both the compiler and the systems is evaluated on two Intel processor-based architectures, the NEC TX7/AzusA and the IBM xSeries 440.
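As an illustration of what a low-level multithreaded BLAS kernel can look like, the following is a minimal sketch and not the code evaluated in the paper. It assumes OpenMP as the threading model, which the abstract does not state, and the names `vec_fill_mt`, `daxpy_mt`, and `ddot_mt` are hypothetical. The fill routine uses the same static schedule as the kernels so that, on systems with a first-touch page placement policy (an assumption about the operating system), each thread later works mostly on memory local to its node.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: initialise a vector in parallel. With a first-touch
 * placement policy (assumed), pages end up near the thread that writes them. */
void vec_fill_mt(size_t n, double value, double *x)
{
    long i;
    assert(n == 0 || x != NULL);
    #pragma omp parallel for schedule(static)
    for (i = 0; i < (long)n; i++)
        x[i] = value;
}

/* Level-1 AXPY: y <- alpha * x + y. Each iteration is independent, so the
 * loop is split statically across threads. */
void daxpy_mt(size_t n, double alpha, const double *x, double *y)
{
    long i;
    assert(n == 0 || (x != NULL && y != NULL));
    #pragma omp parallel for schedule(static)
    for (i = 0; i < (long)n; i++)
        y[i] += alpha * x[i];
}

/* Level-1 DOT: returns the sum of x[i] * y[i]. Partial sums are kept per
 * thread and combined by the OpenMP reduction clause. */
double ddot_mt(size_t n, const double *x, const double *y)
{
    double sum = 0.0;
    long i;
    assert(n == 0 || (x != NULL && y != NULL));
    #pragma omp parallel for schedule(static) reduction(+:sum)
    for (i = 0; i < (long)n; i++)
        sum += x[i] * y[i];
    return sum;
}
```

Compiled without OpenMP support, the pragmas are ignored and the routines run serially with the same results. Kernels of this kind perform only about two floating-point operations per pair of vector elements loaded, which is why their throughput tends to follow memory bandwidth rather than peak arithmetic rate.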
Copyright information
© 2003 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Nishida, A., Oyanagi, Y. (2003). Performance Evaluation of Low Level Multithreaded BLAS Kernels on Intel Processor Based cc-NUMA Systems. In: Veidenbaum, A., Joe, K., Amano, H., Aiso, H. (eds) High Performance Computing. ISHPC 2003. Lecture Notes in Computer Science, vol 2858. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-39707-6_45
DOI: https://doi.org/10.1007/978-3-540-39707-6_45
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-20359-9
Online ISBN: 978-3-540-39707-6
eBook Packages: Springer Book Archive