Nothing Special   »   [go: up one dir, main page]

skip to main content
10.1145/1654059.1654061acmconferencesArticle/Chapter ViewAbstractPublication PagesscConference Proceedingsconference-collections
research-article

Sparse matrix factorization on massively parallel computers

Published: 14 November 2009 Publication History

Abstract

Direct methods for solving sparse systems of linear equations have a high asymptotic computational and memory requirements relative to iterative methods. However, systems arising in some applications, such as structural analysis, can often be too ill-conditioned for iterative solvers to be effective. We cite real applications where this is indeed the case, and using matrices extracted from these applications to conduct experiments on three different massively parallel architectures, show that a well designed sparse factorization algorithm can attain very high levels of performance and scalability. We present strong scalability results for test data from real applications on up to 8,192 cores, along with both analytical and experimental weak scalability results for a model problem on up to 16,384 cores---an unprecedented number for sparse factorization. For the model problem, we also compare experimental results with multiple analytical scaling metrics and distinguish between some commonly used weak scaling methods.

References

[1]
Blue Waters: Sustained Petascale Computing. National Center for Supercomputing Applications, University of Illinois, http://www.ncsa.uiuc.edu/Blue Waters.
[2]
MSC Software Products: Engineering Analysis. MSC Software Corporation, Santa Ana, CA. http://www.mscsoftware.com/products.
[3]
P. R. Amestoy, T. A. Davis, and I. S. Duff. An approximate minimum degree ordering algorithm. SIAM Journal on Matrix Analysis and Applications, 17(4):886--905, 1996.
[4]
P. R. Amestoy, I. S. Duff, J. Koster, and J. Y. L'Excellent. A fully asynchronous multifrontal solver using distributed dynamic scheduling. SIAM Journal on Matrix Analysis and Applications, 23(1):15--41, 2001.
[5]
P. R. Amestoy, I. S. Duff, and J. Y. L'Excellent. Multifrontal parallel distributed symmetric and unsymmetric solvers. Computational Methods in Applied Mechanical Engineering, 184:501--520, 2000.
[6]
E. Chow. Parallel implementation and practical use of sparse approximate inverse preconditioners with a priori sparsity patterns. International Journal of High Performance Computing Applications, 15(1):56--74, 2001.
[7]
J. J. Dongarra, J. D. Croz, S. Hammarling, and I. S. Duff. A set of level 3 Basic Linear Algebra Subprograms. ACM Transactions on Mathematical Software, 16(1):1--17, 1990.
[8]
I. S. Duff and J. K. Reid. The multifrontal solution of indefinite sparse symmetric linear equations. ACM Transactions on Mathematical Software, 9(3):302--325, 1983.
[9]
R. D. Falgout and U. M. Yang. hypre, High Performance Preconditioners: Users manual. Technical report, Lawrence Livermore National Laboratory, 2006. Paper appears in ICCS 2002.
[10]
M. Faverge and P. Ramet. Dynamic scheduling for sparse direct solver on NUMA architectures. In Proceedings of PARA '2008, Trondheim, Norway.
[11]
J. Fettig, S. Koric, and N. Sobh. A comparison of direct and iterative linear sparse solvers in computational solid mechanics. In International Conference on Preconditioning Techniques for Large Sparse Matrix Problems, Napa, CA, 2003.
[12]
A. George and J. W.-H. Liu. Computer Solution of Large Sparse Positive Definite Systems. Prentice-Hall, NJ, 1981.
[13]
T. George, A. Gupta, and V. Sarin. An empirical analysis of iterative solver performance for SPD systems. Technical Report RC 24737, IBM T. J. Watson Research Center, Yorktown Heights, NY, January 2009.
[14]
J. R. Gilbert and S. Toledo. An assessment of incomplete-LU preconditioners for nonsymmetric linear systems. Informatica, 24(3):409--425, 2000.
[15]
A. Gupta. A shared- and distributed-memory parallel sparse direct solver. Applicable Algebra in Engineering, Communication, and Computing, 18(3):263--277, 2007.
[16]
A. Gupta. Fast and effective algorithms for graph partitioning and sparse matrix ordering. IBM Journal of Research and Development, 41(1/2):171--183, January/March, 1997.
[17]
A. Gupta. WSMP: Watson sparse matrix package (Part-I: Direct solution of symmetric sparse systems). Technical Report RC 21886, IBM T. J. Watson Research Center, Yorktown Heights, NY, November 2000. http://www.cs.umn.edu/~agupta/wsmp.
[18]
A. Gupta, M. Joshi, and V. Kumar. WSMP: A high-performance serial and parallel symmetric sparse linear solver. In B. Kagstrom, J. J. Dongarra, E. Elmroth, and J. Wasniewski, editors, PARA '98 Workshop on Applied Parallel Computing in Large Scale Scientific and Industrial Problems. Springer Verlag, 1998.
[19]
A. Gupta, G. Karypis, and V. Kumar. Highly scalable parallel algorithms for sparse matrix factorization. IEEE Transactions on Parallel and Distributed Systems, 8(5):502--520, May 1997.
[20]
J. L. Gustafson. Reevaluating Amdahl's law. Communications of the ACM, 31(5):532--533, 1988.
[21]
J. L. Gustafson, G. R. Montry, and R. E. Benner. Development of parallel methods for a 1024-processor hypercube. SIAM Journal on Scientific and Statistical Computing, 9(4):609--638, 1988.
[22]
P. Hénon, P. Ramet, and J. Roman. PaStiX: A high-performance parallel direct solver for sparse symmetric definite systems. Parallel Computing, 28(2):301--321, 2002.
[23]
V. E. Henson and U. M. Yang. BoomerAMG: A parallel algebraic multigrid solver and preconditioner. Applied Numerical Mathematics: Transactions of IMACS, 41(1): 155--177, 2002.
[24]
D. Hysom and A. Pothen. A scalable parallel algorithm for incomplete factor preconditioning. SIAM Journal on Scientific Computing, 22(6):2194--2215, 2000.
[25]
M. Joshi, A. Gupta, G. Karypis, and V. Kumar. Two-dimensional scalable parallel algorithms for solution of triangular systems. In Proceedings of the 1997 International Conference on High Performance Computing (HiPC), 1997.
[26]
G. Karypis and V. Kumar. ParMETIS: Parallel graph partitioning and sparse matrix ordering library. Technical Report TR 97-060, Department of Computer Science, University of Minnesota, 1997.
[27]
V. Kumar and A. Gupta. Analyzing scalability of parallel algorithms and architectures. Journal of Parallel and Distributed Computing, 22(3):379--391, 1994.
[28]
V. Kumar and V. N. Rao. Parallel depth-first search, part II: Analysis. International Journal of Parallel Programming, 16(6):501--519, 1987.
[29]
G. Lakner and C. P. Sosa. IBM System Blue Gene Solution: Blue Gene/P Application Development. IBM, 2008. http://www.redbooks.ibm.com/abstracts/sg247287.html.
[30]
X. S. Li and J. W. Demmel. Making sparse Gaussian elimination scalable by static pivoting. In Supercomputing '98 Proceedings, 1998.
[31]
X. S. Li and J. W. Demmel. A scalable sparse direct solver using static pivoting. In Proceedings of the Ninth SIAM Conference on Parallel Processing for Scientific Computing, 1999.
[32]
R. J. Lipton, D. J. Rose, and R. E. Tarjan. Generalized nested dissection. SIAM Journal on Numerical Analysis, 16:346--358, 1979.
[33]
J. W.-H. Liu. The role of elimination trees in sparse factorization. SIAM Journal on Matrix Analysis and Applications, 11:134--172, 1990.
[34]
J. W.-H. Liu. The multifrontal method for sparse matrix solution: Theory and practice. SIAM Review, 34(1):82--109, 1992.
[35]
J. W. Ruge and K. Stüben. Multigrid methods. In S. F. McCormick, editor, Frontiers in Applied Mathematics, volume 3, pages 73--130. SIAM, Philadelphia, PA, 1987.
[36]
Y. Saad. Iterative Methods for Sparse Linear Systems, 2nd edition. SIAM, 2003.
[37]
J. Schulze. Towards a tighter coupling of bottom-up and top-down sparse matrix ordering methods. Bit Numerical Mathematics, 41(4):800--841, 2001.
[38]
X.-H. Sun and J. L. Gustafson. Toward a better parallel performance metric. Parallel Computing, 17(2):1093--1109, 1991.
[39]
X.-H. Sun and L. M. Ni. Another view of parallel speedup. In Supercomputing '90 Proceedings, pages 324--333, 1990.
[40]
X.-H. Sun and L. M. Ni. Scalable problems and memory-bounded speedup. Journal of Parallel and Distributed Computing, 19(1):27--37, 1993.
[41]
X.-H. Sun and D. T. Rover. Scalability of parallel algorithm-machine combinations. IEEE Transactions on Parallel and Distributed Systems, 5(6):599--613, 1994.
[42]
P. H. Worley. The effect of time constraints on scaled speedup. SIAM Journal on Scientific and Statistical Computing, 11(5):838--858, 1990.

Cited By

View all
  • (2019)A novel TDFEM scheme based on high‐performance parallel solver PaStiXInternational Journal of Numerical Modelling: Electronic Networks, Devices and Fields10.1002/jnm.269133:2Online publication date: 15-Oct-2019
  • (2017)On the improvement of a scalable sparse direct solver for unsymmetrical linear equationsThe Journal of Supercomputing10.1007/s11227-016-1892-773:5(1852-1904)Online publication date: 1-May-2017
  • (2016)Multi-core parallel robust structured multifrontal factorization method for large discretized PDEsJournal of Computational and Applied Mathematics10.1016/j.cam.2015.09.012296:C(36-46)Online publication date: 1-Apr-2016
  • Show More Cited By

Index Terms

  1. Sparse matrix factorization on massively parallel computers

                      Recommendations

                      Reviews

                      James Harold Davenport

                      Gupta, Koric, and George consider direct methods for solving large sparse linear systems. As the authors themselves admit that "direct methods have a high asymptotic computational and memory requirement relative to iterative methods," the question to ask is why. Because, as the authors conclusively demonstrate in Section 3, real-world examples exist-in particular, three-dimensional finite element problems-that either cannot be solved iteratively, no matter how much preconditioning is attempted, or where the iterative solution takes longer. Therefore, they explore the performance of their Watson sparse matrix package against two other direct solvers, on a range of symmetric positive definite matrices, on a range of machines, going up to 16,384 cores. The results are too detailed to describe here, but what I took away is the message that at least in this range, one still gets linear speedup, provided that the problem size is allowed to grow with the number of processors. I have one minor quibble: while Gupta, Koric, and George are careful to talk about cores and nodes in the description of the machines, they too often use the term "CPU," forcing a detailed search to discover what they mean. Online Computing Reviews Service

                      Access critical reviews of Computing literature here

                      Become a reviewer for Computing Reviews.

                      Comments

                      Please enable JavaScript to view thecomments powered by Disqus.

                      Information & Contributors

                      Information

                      Published In

                      cover image ACM Conferences
                      SC '09: Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis
                      November 2009
                      778 pages
                      ISBN:9781605587448
                      DOI:10.1145/1654059
                      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

                      Sponsors

                      Publisher

                      Association for Computing Machinery

                      New York, NY, United States

                      Publication History

                      Published: 14 November 2009

                      Permissions

                      Request permissions for this article.

                      Check for updates

                      Author Tags

                      1. error correction
                      2. last-level caches
                      3. reliability
                      4. soft error

                      Qualifiers

                      • Research-article

                      Conference

                      SC '09
                      Sponsor:

                      Acceptance Rates

                      SC '09 Paper Acceptance Rate 59 of 261 submissions, 23%;
                      Overall Acceptance Rate 1,516 of 6,373 submissions, 24%

                      Contributors

                      Other Metrics

                      Bibliometrics & Citations

                      Bibliometrics

                      Article Metrics

                      • Downloads (Last 12 months)8
                      • Downloads (Last 6 weeks)1
                      Reflects downloads up to 14 Nov 2024

                      Other Metrics

                      Citations

                      Cited By

                      View all
                      • (2019)A novel TDFEM scheme based on high‐performance parallel solver PaStiXInternational Journal of Numerical Modelling: Electronic Networks, Devices and Fields10.1002/jnm.269133:2Online publication date: 15-Oct-2019
                      • (2017)On the improvement of a scalable sparse direct solver for unsymmetrical linear equationsThe Journal of Supercomputing10.1007/s11227-016-1892-773:5(1852-1904)Online publication date: 1-May-2017
                      • (2016)Multi-core parallel robust structured multifrontal factorization method for large discretized PDEsJournal of Computational and Applied Mathematics10.1016/j.cam.2015.09.012296:C(36-46)Online publication date: 1-Apr-2016
                      • (2016)Numerical MethodsHandbook of Software Solutions for ICME10.1002/9783527693566.ch7(487-531)Online publication date: 23-Sep-2016
                      • (2015)Kinetic Dependence GraphsACM SIGARCH Computer Architecture News10.1145/2786763.269436343:1(457-471)Online publication date: 14-Mar-2015
                      • (2015)Kinetic Dependence GraphsACM SIGPLAN Notices10.1145/2775054.269436350:4(457-471)Online publication date: 14-Mar-2015
                      • (2015)Kinetic Dependence GraphsProceedings of the Twentieth International Conference on Architectural Support for Programming Languages and Operating Systems10.1145/2694344.2694363(457-471)Online publication date: 14-Mar-2015
                      • (2014)ELEFANT: a user-friendly multipurpose geodynamics codeSolid Earth Discussions10.5194/sed-6-1949-20146:2(1949-2096)Online publication date: 29-Jul-2014
                      • (2014)Efficient enforcement of hard articulation constraints in the presence of closed loops and contactsComputer Graphics Forum10.1111/cgf.1232233:2(235-244)Online publication date: 1-May-2014
                      • (2014)Real-Time Stochastic Optimization of Complex Energy Systems on High-Performance ComputersComputing in Science & Engineering10.1109/MCSE.2014.5316:5(32-42)Online publication date: Sep-2014
                      • Show More Cited By

                      View Options

                      Get Access

                      Login options

                      View options

                      PDF

                      View or Download as a PDF file.

                      PDF

                      eReader

                      View online with eReader.

                      eReader

                      Media

                      Figures

                      Other

                      Tables

                      Share

                      Share

                      Share this Publication link

                      Share on social media