Abstract
We present a modeling framework to accurately predict the time required to run dense linear algebra calculations. We report the framework's accuracy in a number of varied computational environments, such as shared-memory multicore systems, clusters, and large supercomputing installations with tens of thousands of cores. We also test its accuracy for various algorithms, each of which has different scaling properties and a different tolerance for low-bandwidth/high-latency interconnects. The predictive accuracy is very good, on the order of the measurement accuracy, which makes the method suitable for both dedicated and non-dedicated environments. We also present a practical application of our model: reducing the time required to tune and optimize large parallel runs whose time is dominated by linear algebra computations. We show practical examples of how to apply the methodology to avoid common pitfalls and to reduce the influence of measurement errors and inherent performance variability.
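The core idea of combining partial execution with performance modeling can be illustrated with a minimal sketch: time a few short runs at small problem sizes, fit a low-order polynomial cost model by least squares, and extrapolate to the large run one wishes to tune. This is only an illustrative toy, not the paper's actual model; the problem sizes, timings, and cubic model form below are assumptions chosen for demonstration.

```python
import numpy as np

# Hypothetical "partial execution" timings: a few short runs at small
# matrix orders. The numbers are synthetic, chosen only for illustration.
sizes = np.array([1000.0, 2000.0, 4000.0, 8000.0])  # matrix orders timed
times = np.array([0.05, 0.35, 2.60, 20.1])          # measured seconds (made up)

# Assume a simple cubic cost model for a dense factorization,
# t(n) ~ a*n^3 + b*n^2 + c*n, and fit its coefficients by least squares.
A = np.vstack([sizes**3, sizes**2, sizes]).T
coeffs, *_ = np.linalg.lstsq(A, times, rcond=None)

def predict(n):
    """Predicted run time in seconds for matrix order n."""
    return coeffs @ np.array([n**3, n**2, n])

# Extrapolate to a problem size too expensive to time directly while tuning.
print(predict(16000.0))
```

In practice one would average repeated timings at each size to damp measurement noise and performance variability, which is one of the pitfalls the methodology addresses.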
This research was supported by DARPA through ORNL subcontract 4000075916, as well as by NSF through award number 1038814. We would also like to thank Patrick Worley from ORNL for facilitating the large-scale runs on Jaguar's Cray XT4 partition.
© 2012 Springer-Verlag Berlin Heidelberg
Luszczek, P., Dongarra, J. (2012). Reducing the Time to Tune Parallel Dense Linear Algebra Routines with Partial Execution and Performance Modeling. In: Wyrzykowski, R., Dongarra, J., Karczewski, K., Waśniewski, J. (eds) Parallel Processing and Applied Mathematics. PPAM 2011. Lecture Notes in Computer Science, vol 7203. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-31464-3_74
DOI: https://doi.org/10.1007/978-3-642-31464-3_74
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-31463-6
Online ISBN: 978-3-642-31464-3