research-article

Sparse matrix factorization on massively parallel computers

Authors:

Anshul Gupta,

Seid Koric,

Thomas GeorgeAuthors Info & Claims

SC '09: Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis

Article No.: 1, Pages 1 - 12

https://doi.org/10.1145/1654059.1654061

Published: 14 November 2009 Publication History

Get Access

Abstract

Direct methods for solving sparse systems of linear equations have a high asymptotic computational and memory requirements relative to iterative methods. However, systems arising in some applications, such as structural analysis, can often be too ill-conditioned for iterative solvers to be effective. We cite real applications where this is indeed the case, and using matrices extracted from these applications to conduct experiments on three different massively parallel architectures, show that a well designed sparse factorization algorithm can attain very high levels of performance and scalability. We present strong scalability results for test data from real applications on up to 8,192 cores, along with both analytical and experimental weak scalability results for a model problem on up to 16,384 cores---an unprecedented number for sparse factorization. For the model problem, we also compare experimental results with multiple analytical scaling metrics and distinguish between some commonly used weak scaling methods.

References

[1]

Blue Waters: Sustained Petascale Computing. National Center for Supercomputing Applications, University of Illinois, http://www.ncsa.uiuc.edu/Blue Waters.

Google Scholar

[2]

MSC Software Products: Engineering Analysis. MSC Software Corporation, Santa Ana, CA. http://www.mscsoftware.com/products.

Google Scholar

[3]

P. R. Amestoy, T. A. Davis, and I. S. Duff. An approximate minimum degree ordering algorithm. SIAM Journal on Matrix Analysis and Applications, 17(4):886--905, 1996.

Digital Library

Google Scholar

[4]

P. R. Amestoy, I. S. Duff, J. Koster, and J. Y. L'Excellent. A fully asynchronous multifrontal solver using distributed dynamic scheduling. SIAM Journal on Matrix Analysis and Applications, 23(1):15--41, 2001.

Digital Library

Google Scholar

[5]

P. R. Amestoy, I. S. Duff, and J. Y. L'Excellent. Multifrontal parallel distributed symmetric and unsymmetric solvers. Computational Methods in Applied Mechanical Engineering, 184:501--520, 2000.

Crossref

Google Scholar

[6]

E. Chow. Parallel implementation and practical use of sparse approximate inverse preconditioners with a priori sparsity patterns. International Journal of High Performance Computing Applications, 15(1):56--74, 2001.

Digital Library

Google Scholar

[7]

J. J. Dongarra, J. D. Croz, S. Hammarling, and I. S. Duff. A set of level 3 Basic Linear Algebra Subprograms. ACM Transactions on Mathematical Software, 16(1):1--17, 1990.

Digital Library

Google Scholar

[8]

I. S. Duff and J. K. Reid. The multifrontal solution of indefinite sparse symmetric linear equations. ACM Transactions on Mathematical Software, 9(3):302--325, 1983.

Digital Library

Google Scholar

[9]

R. D. Falgout and U. M. Yang. hypre, High Performance Preconditioners: Users manual. Technical report, Lawrence Livermore National Laboratory, 2006. Paper appears in ICCS 2002.

Google Scholar

[10]

M. Faverge and P. Ramet. Dynamic scheduling for sparse direct solver on NUMA architectures. In Proceedings of PARA '2008, Trondheim, Norway.

Google Scholar

[11]

J. Fettig, S. Koric, and N. Sobh. A comparison of direct and iterative linear sparse solvers in computational solid mechanics. In International Conference on Preconditioning Techniques for Large Sparse Matrix Problems, Napa, CA, 2003.

Google Scholar

[12]

A. George and J. W.-H. Liu. Computer Solution of Large Sparse Positive Definite Systems. Prentice-Hall, NJ, 1981.

Digital Library

Google Scholar

[13]

T. George, A. Gupta, and V. Sarin. An empirical analysis of iterative solver performance for SPD systems. Technical Report RC 24737, IBM T. J. Watson Research Center, Yorktown Heights, NY, January 2009.

Google Scholar

[14]

J. R. Gilbert and S. Toledo. An assessment of incomplete-LU preconditioners for nonsymmetric linear systems. Informatica, 24(3):409--425, 2000.

Google Scholar

[15]

A. Gupta. A shared- and distributed-memory parallel sparse direct solver. Applicable Algebra in Engineering, Communication, and Computing, 18(3):263--277, 2007.

Digital Library

Google Scholar

[16]

A. Gupta. Fast and effective algorithms for graph partitioning and sparse matrix ordering. IBM Journal of Research and Development, 41(1/2):171--183, January/March, 1997.

Digital Library

Google Scholar

[17]

A. Gupta. WSMP: Watson sparse matrix package (Part-I: Direct solution of symmetric sparse systems). Technical Report RC 21886, IBM T. J. Watson Research Center, Yorktown Heights, NY, November 2000. http://www.cs.umn.edu/~agupta/wsmp.

Google Scholar

[18]

A. Gupta, M. Joshi, and V. Kumar. WSMP: A high-performance serial and parallel symmetric sparse linear solver. In B. Kagstrom, J. J. Dongarra, E. Elmroth, and J. Wasniewski, editors, PARA '98 Workshop on Applied Parallel Computing in Large Scale Scientific and Industrial Problems. Springer Verlag, 1998.

Digital Library

Google Scholar

[19]

A. Gupta, G. Karypis, and V. Kumar. Highly scalable parallel algorithms for sparse matrix factorization. IEEE Transactions on Parallel and Distributed Systems, 8(5):502--520, May 1997.

Digital Library

Google Scholar

[20]

J. L. Gustafson. Reevaluating Amdahl's law. Communications of the ACM, 31(5):532--533, 1988.

Digital Library

Google Scholar

[21]

J. L. Gustafson, G. R. Montry, and R. E. Benner. Development of parallel methods for a 1024-processor hypercube. SIAM Journal on Scientific and Statistical Computing, 9(4):609--638, 1988.

Digital Library

Google Scholar

[22]

P. Hénon, P. Ramet, and J. Roman. PaStiX: A high-performance parallel direct solver for sparse symmetric definite systems. Parallel Computing, 28(2):301--321, 2002.

Digital Library

Google Scholar

[23]

V. E. Henson and U. M. Yang. BoomerAMG: A parallel algebraic multigrid solver and preconditioner. Applied Numerical Mathematics: Transactions of IMACS, 41(1): 155--177, 2002.

Digital Library

Google Scholar

[24]

D. Hysom and A. Pothen. A scalable parallel algorithm for incomplete factor preconditioning. SIAM Journal on Scientific Computing, 22(6):2194--2215, 2000.

Digital Library

Google Scholar

[25]

M. Joshi, A. Gupta, G. Karypis, and V. Kumar. Two-dimensional scalable parallel algorithms for solution of triangular systems. In Proceedings of the 1997 International Conference on High Performance Computing (HiPC), 1997.

Digital Library

Google Scholar

[26]

G. Karypis and V. Kumar. ParMETIS: Parallel graph partitioning and sparse matrix ordering library. Technical Report TR 97-060, Department of Computer Science, University of Minnesota, 1997.

Google Scholar

[27]

V. Kumar and A. Gupta. Analyzing scalability of parallel algorithms and architectures. Journal of Parallel and Distributed Computing, 22(3):379--391, 1994.

Digital Library

Google Scholar

[28]

V. Kumar and V. N. Rao. Parallel depth-first search, part II: Analysis. International Journal of Parallel Programming, 16(6):501--519, 1987.

Digital Library

Google Scholar

[29]

G. Lakner and C. P. Sosa. IBM System Blue Gene Solution: Blue Gene/P Application Development. IBM, 2008. http://www.redbooks.ibm.com/abstracts/sg247287.html.

Digital Library

Google Scholar

[30]

X. S. Li and J. W. Demmel. Making sparse Gaussian elimination scalable by static pivoting. In Supercomputing '98 Proceedings, 1998.

Digital Library

Google Scholar

[31]

X. S. Li and J. W. Demmel. A scalable sparse direct solver using static pivoting. In Proceedings of the Ninth SIAM Conference on Parallel Processing for Scientific Computing, 1999.

Google Scholar

[32]

R. J. Lipton, D. J. Rose, and R. E. Tarjan. Generalized nested dissection. SIAM Journal on Numerical Analysis, 16:346--358, 1979.

Crossref

Google Scholar

[33]

J. W.-H. Liu. The role of elimination trees in sparse factorization. SIAM Journal on Matrix Analysis and Applications, 11:134--172, 1990.

Digital Library

Google Scholar

[34]

J. W.-H. Liu. The multifrontal method for sparse matrix solution: Theory and practice. SIAM Review, 34(1):82--109, 1992.

Digital Library

Google Scholar

[35]

J. W. Ruge and K. Stüben. Multigrid methods. In S. F. McCormick, editor, Frontiers in Applied Mathematics, volume 3, pages 73--130. SIAM, Philadelphia, PA, 1987.

Google Scholar

[36]

Y. Saad. Iterative Methods for Sparse Linear Systems, 2nd edition. SIAM, 2003.

Digital Library

Google Scholar

[37]

J. Schulze. Towards a tighter coupling of bottom-up and top-down sparse matrix ordering methods. Bit Numerical Mathematics, 41(4):800--841, 2001.

Crossref

Google Scholar

[38]

X.-H. Sun and J. L. Gustafson. Toward a better parallel performance metric. Parallel Computing, 17(2):1093--1109, 1991.

Digital Library

Google Scholar

[39]

X.-H. Sun and L. M. Ni. Another view of parallel speedup. In Supercomputing '90 Proceedings, pages 324--333, 1990.

Digital Library

Google Scholar

[40]

X.-H. Sun and L. M. Ni. Scalable problems and memory-bounded speedup. Journal of Parallel and Distributed Computing, 19(1):27--37, 1993.

Digital Library

Google Scholar

[41]

X.-H. Sun and D. T. Rover. Scalability of parallel algorithm-machine combinations. IEEE Transactions on Parallel and Distributed Systems, 5(6):599--613, 1994.

Digital Library

Google Scholar

[42]

P. H. Worley. The effect of time constraints on scaled speedup. SIAM Journal on Scientific and Statistical Computing, 11(5):838--858, 1990.

Digital Library

Google Scholar

Cited By

View all

Wu X(2019)A novel TDFEM scheme based on high‐performance parallel solver PaStiXInternational Journal of Numerical Modelling: Electronic Networks, Devices and Fields10.1002/jnm.269133:2Online publication date: 15-Oct-2019
https://doi.org/10.1002/jnm.2691
Celebi MDuran AOztoprak FTuncel MAkaydin B(2017)On the improvement of a scalable sparse direct solver for unsymmetrical linear equationsThe Journal of Supercomputing10.1007/s11227-016-1892-773:5(1852-1904)Online publication date: 1-May-2017
https://dl.acm.org/doi/10.1007/s11227-016-1892-7
Zuo XMo ZGu TXu XZhang A(2016)Multi-core parallel robust structured multifrontal factorization method for large discretized PDEsJournal of Computational and Applied Mathematics10.1016/j.cam.2015.09.012296:C(36-46)Online publication date: 1-Apr-2016
https://dl.acm.org/doi/10.1016/j.cam.2015.09.012
Show More Cited By

Index Terms

Recommendations

Highly Scalable Parallel Algorithms for Sparse Matrix Factorization

In this paper, we describe scalable parallel algorithms for symmetric sparse matrix factorization, analyze their performance and scalability, and present experimental results for up to 1,024 processors on a Cray T3D parallel computer. Through our ...
A sparse-sparse iteration for computing a sparse incomplete factorization of the inverse of an SPD matrix

In this paper, a method via sparse-sparse iteration for computing a sparse incomplete factorization of the inverse of a symmetric positive definite matrix is proposed. The resulting factorized sparse approximate inverse is used as a preconditioner for ...
On sparse matrix reordering for parallel factorization
ICS '94: Proceedings of the 8th international conference on Supercomputing

To minimize the amount of computation and storage for parallel sparse factorization, sparse matrices have to be reordered prior to factorization. We show that none of the popular ordering heuristics proposed before, namely, mulitple minimum degree and ...

Reviews

Reviewer: James Harold Davenport

Gupta, Koric, and George consider direct methods for solving large sparse linear systems. As the authors themselves admit that "direct methods have a high asymptotic computational and memory requirement relative to iterative methods," the question to ask is why. Because, as the authors conclusively demonstrate in Section 3, real-world examples exist-in particular, three-dimensional finite element problems-that either cannot be solved iteratively, no matter how much preconditioning is attempted, or where the iterative solution takes longer. Therefore, they explore the performance of their Watson sparse matrix package against two other direct solvers, on a range of symmetric positive definite matrices, on a range of machines, going up to 16,384 cores. The results are too detailed to describe here, but what I took away is the message that at least in this range, one still gets linear speedup, provided that the problem size is allowed to grow with the number of processors. I have one minor quibble: while Gupta, Koric, and George are careful to talk about cores and nodes in the description of the machines, they too often use the term "CPU," forcing a detailed search to discover what they mean. Online Computing Reviews Service

Access critical reviews of Computing literature here

Become a reviewer for Computing Reviews.

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

SC '09: Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis

November 2009

778 pages

ISBN:9781605587448

DOI:10.1145/1654059

Conference Chair:
Wilfred Pinfold

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 14 November 2009

Permissions

Request permissions for this article.

Request Permissions

Check for updates

Author Tags

Qualifiers

Research-article

Conference

SC '09

Sponsor:

SIGARCH
IEEE-CS

SC '09: International Conference for High Performance Computing, Networking, Storage and Analysis

November 14 - 20, 2009

Oregon, Portland

Acceptance Rates

SC '09 Paper Acceptance Rate 59 of 261 submissions, 23%;

Overall Acceptance Rate 1,516 of 6,373 submissions, 24%

Other Metrics

View Article Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

21
Total Citations
View Citations
51
Total Downloads

Downloads (Last 12 months)8
Downloads (Last 6 weeks)1

Reflects downloads up to 14 Nov 2024

Other Metrics

View Author Metrics

Citations

Cited By

View all

Wu X(2019)A novel TDFEM scheme based on high‐performance parallel solver PaStiXInternational Journal of Numerical Modelling: Electronic Networks, Devices and Fields10.1002/jnm.269133:2Online publication date: 15-Oct-2019
https://doi.org/10.1002/jnm.2691
Celebi MDuran AOztoprak FTuncel MAkaydin B(2017)On the improvement of a scalable sparse direct solver for unsymmetrical linear equationsThe Journal of Supercomputing10.1007/s11227-016-1892-773:5(1852-1904)Online publication date: 1-May-2017
https://dl.acm.org/doi/10.1007/s11227-016-1892-7
Zuo XMo ZGu TXu XZhang A(2016)Multi-core parallel robust structured multifrontal factorization method for large discretized PDEsJournal of Computational and Applied Mathematics10.1016/j.cam.2015.09.012296:C(36-46)Online publication date: 1-Apr-2016
https://dl.acm.org/doi/10.1016/j.cam.2015.09.012
de Saracibar CBoman RBussetta PCajas JCervera MChiumenti MColl ADadvand PHernández Ortega JHouzeaux Gde Riera MPonthot J(2016)Numerical MethodsHandbook of Software Solutions for ICME10.1002/9783527693566.ch7(487-531)Online publication date: 23-Sep-2016
https://doi.org/10.1002/9783527693566.ch7
Hassaan MNguyen DPingali K(2015)Kinetic Dependence GraphsACM SIGARCH Computer Architecture News10.1145/2786763.269436343:1(457-471)Online publication date: 14-Mar-2015
https://dl.acm.org/doi/10.1145/2786763.2694363
Hassaan MNguyen DPingali K(2015)Kinetic Dependence GraphsACM SIGPLAN Notices10.1145/2775054.269436350:4(457-471)Online publication date: 14-Mar-2015
https://dl.acm.org/doi/10.1145/2775054.2694363
Hassaan MNguyen DPingali KOzturk OEbcioglu KDwarkadas S(2015)Kinetic Dependence GraphsProceedings of the Twentieth International Conference on Architectural Support for Programming Languages and Operating Systems10.1145/2694344.2694363(457-471)Online publication date: 14-Mar-2015
https://dl.acm.org/doi/10.1145/2694344.2694363
Thieulot C(2014)ELEFANT: a user-friendly multipurpose geodynamics codeSolid Earth Discussions10.5194/sed-6-1949-20146:2(1949-2096)Online publication date: 29-Jul-2014
https://doi.org/10.5194/sed-6-1949-2014
Tomcin RSibbing DKobbelt L(2014)Efficient enforcement of hard articulation constraints in the presence of closed loops and contactsComputer Graphics Forum10.1111/cgf.1232233:2(235-244)Online publication date: 1-May-2014
https://dl.acm.org/doi/10.1111/cgf.12322
Petra CSchenk OAnitescu M(2014)Real-Time Stochastic Optimization of Complex Energy Systems on High-Performance ComputersComputing in Science & Engineering10.1109/MCSE.2014.5316:5(32-42)Online publication date: Sep-2014
https://doi.org/10.1109/MCSE.2014.53
Show More Cited By

View Options

Get Access

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Cited By

Index Terms

Recommendations

Highly Scalable Parallel Algorithms for Sparse Matrix Factorization

A sparse-sparse iteration for computing a sparse incomplete factorization of the inverse of an SPD matrix

On sparse matrix reordering for parallel factorization

Reviews

Access critical reviews of Computing literature here