Nothing Special   »   [go: up one dir, main page]

skip to main content
research-article

swSuperLU: A highly scalable sparse direct solver on Sunway manycore architecture

Published: 01 June 2022 Publication History

Abstract

Sparse LU factorization is essential for scientific and engineering simulations. In this work, we present swSuperLU, a highly scalable sparse direct solver on Sunway manycore architecture based on sparse LU factorization. To improve the parallelism of sparse LU factorization, we introduce the hierarchical scheme to exploit the hierarchy of Sunway manycore architecture in process-level parallelism between MPEs and thread-level parallelism between the CPE arrays. A task-based hierarchical scheme and a series of highly optimized computation kernels are designed to map processor loads and memory access well to this hierarchy. Moreover, we compared various ordering strategies and several machine-dependent parameter settings to find the most suitable ordering strategies and parameter settings for Sunway manycore architecture. We present performance and scalability experiments of swSuperLU on Newest Generation Sunway Supercomputer and Sunway TaihuLight. swSuperLU achieves 9.02× speedup on average compared to state-of-the-art packages and strong scalability from 10 thousand cores to million cores.

References

[1]
Harrington RF Field Computation by Moment Methods 1993 Hoboken Wiley-IEEE Press
[2]
Jin JM Theory and computation of electromagnetic fields 2011 Hoboken John Wiley & Sons
[3]
Wu YS Multiphase fluid flow in porous and fractured reservoirs 2015 Oxford Gulf professional publishing
[4]
Blazek J Computational fluid dynamics: principles and applications 2015 Oxford Butterworth-Heinemann
[5]
Davis TA Direct methods for sparse linear systems 2006 Philadelphia SIAM
[6]
Saad Y Iterative methods for sparse linear systems 2003 Philadelphia SIAM
[7]
Demmel JW, Eisenstat SC, Gilbert JR, Li XS, and Liu JW A supernodal approach to sparse partial pivoting SIAM J Matrix Anal Appl 1999 20 3 720-755
[8]
Gilbert JR and Liu JW Elimination structures for unsymmetric sparse lu factors SIAM J Matrix Anal Appl 1993 14 2 334-352
[9]
Blackford LS, Petitet A, Pozo R, Remington K, Whaley RC, Demmel J, Dongarra J, Duff I, Hammarling S, Henry G, et al. An updated set of basic linear algebra subprograms (blas) ACM Trans Math Softw 2002 28 2 135-151
[10]
Fu H, Liao J, Yang J, Wang L, Song Z, Huang X, Yang C, Xue W, Liu F, Qiao F, et al. The sunway taihulight supercomputer: system and applications Sci China Inform Sci 2016 59 7 1-16
[11]
Liu Y, Jacquelin M, Ghysels P, Li XS (2018) Highly scalable distributed-memory sparse triangular solution algorithms. In: 2018 Proceedings of the Seventh SIAM Workshop on Combinatorial Scientific Computing, pp. 87–96. SIAM
[12]
Yamazaki I, Li XS (2012) New scheduling strategies and hybrid programming for a parallel right-looking sparse lu factorization algorithm on multicore cluster systems. In: 2012 IEEE 26th International Parallel and Distributed Processing Symposium, pp. 619–630. IEEE
[13]
Sao P, Li XS, Vuduc R (2018) A communication-avoiding 3d lu factorization algorithm for sparse matrices. In: 2018 IEEE International Parallel and Distributed Processing Symposium (IPDPS), pp. 908–919. IEEE
[14]
Sao P, Vuduc R, Li XS (2014) A distributed cpu-gpu sparse direct solver. In: European Conference on Parallel Processing, pp. 487–498. Springer
[15]
Sao P, Liu X, Vuduc R, Li X (2015) A sparse direct solver for distributed memory xeon phi-accelerated systems. In: 2015 IEEE International Parallel and Distributed Processing Symposium, pp. 71–81. IEEE
[16]
Niu Y, Lu Z, Dong M, Jin Z, Liu W, Tan G (2021) Tilespmv: A tiled algorithm for sparse matrix-vector multiplication on gpus. In: 2021 IEEE International Parallel and Distributed Processing Symposium (IPDPS), pp. 68–78. IEEE
[17]
Su J, Zhang F, Liu W, He B, Wu R, Du X, Wang R (2020) Capellinisptrsv: A thread-level synchronization-free sparse triangular solve on gpus. In: 49th International Conference on Parallel Processing-ICPP, pp. 1–11
[18]
Lu Z, Niu Y, Liu W (2020) Efficient block algorithms for parallel sparse triangular solve. In: 49th International Conference on Parallel Processing-ICPP, pp. 1–11
[19]
Duan X, Gao P, Zhang T, Zhang M, Liu W, Zhang W, Xue W, Fu H, Gan L, Chen D et al (2018) Redesigning lammps for peta-scale and hundred-billion-atom simulation on sunway taihulight. In: SC18: International Conference for High Performance Computing, Networking, Storage and Analysis, pp. 148–159. IEEE
[20]
Chen B, Fu H, Wei Y, He C, Zhang W, Li Y, Wan W, Zhang W, Gan L, Zhang Z et al (2018) Simulating the wenchuan earthquake with accurate surface topography on sunway taihulight. In: SC18: International Conference for High Performance Computing, Networking, Storage and Analysis, pp. 517–528. IEEE
[21]
Fu H, Liao J, Ding N, Duan X, Gan L, Liang Y, Wang X, Yang J, Zheng Y, Liu W et al (2017) Redesigning cam-se for peta-scale climate modeling performance and ultra-high resolution on sunway taihulight. In: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, pp. 1–12
[22]
Lin H, Zhu X, Yu B, Tang X, Xue W, Chen W, Zhang L, Hoefler T, Ma X, Liu X et al (2018)Shentu: processing multi-trillion edge graphs on millions of cores in seconds. In: SC18: International Conference for High Performance Computing, Networking, Storage and Analysis, pp. 706–716. IEEE
[23]
Zhong X, Li M, Yang H, Liu Y, Qian D (2018) swmr: a framework for accelerating mapreduce applications on sunway taihulight. IEEE Transactions on Emerging Topics in Computing
[24]
Li L, Fang J, Fu H, Jiang J, Zhao W, He C, You X, Yang G (2018) swcaffe: A parallel framework for accelerating deep learning applications on sunway taihulight. In: 2018 IEEE International Conference on Cluster Computing (CLUSTER), pp. 413–422. IEEE
[25]
Liu C, Xie B, Liu X, Xue W, Yang H, Liu X (2018) Towards efficient spmv on sunway manycore architectures. In: Proceedings of the 2018 International Conference on Supercomputing, pp. 363–373
[26]
Li M, Liu Y, Yang H, Luan Z, Qian D (2018) Multi-role sptrsv on sunway many-core architecture. In: 2018 IEEE 20th International Conference on High Performance Computing and Communications; IEEE 16th International Conference on Smart City; IEEE 4th International Conference on Data Science and Systems (HPCC/SmartCity/DSS), pp. 594–601. IEEE
[27]
Wang X, Liu W, Xue W, Wu L (2018) swsptrsv: A fast sparse triangular solve with sparse level tile layout on sunway architectures. In: Proceedings of the 23rd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, pp. 338–353
[28]
Fang J, Fu H, Zhao W, Chen B, Zheng W, Yang G (2017) swdnn: A library for accelerating deep learning applications on sunway taihulight. In: 2017 IEEE International Parallel and Distributed Processing Symposium (IPDPS), pp. 615–624. IEEE
[29]
Li M, Liu Y, Yang H, Luan Z, Gan L, Yang G, and Qian D Accelerating sparse cholesky factorization on sunway manycore architecture IEEE Trans Parallel Distrib Syst 2019 31 7 1636-1650
[30]
Davis TA and Hu Y The university of florida sparse matrix collection ACM Trans Math Softw (TOMS) 2011 38 1 1-25
[31]
Rose DJ, Tarjan RE, and Lueker GS Algorithmic aspects of vertex elimination on graphs SIAM J Comput 1976 5 2 266-283
[32]
Rose DJ and Tarjan RE Algorithmic aspects of vertex elimination on directed graphs SIAM J Appl Math 1978 34 1 176-197
[33]
Gilbert JR A note on the np-completeness of vertex elimination on directed graphs SIAM J Algebraic Discrete Methods 1980 1 3 292-294
[34]
Yannakakis M Computing the minimum fill-in is np-complete SIAM J Algeb Discrete Methods 1981 2 1 77-79
[35]
Tinney WF and Walker JW Direct solutions of sparse network equations by optimally ordered triangular factorization Proc IEEE 1967 55 11 1801-1809
[36]
Rose DJ A graph-theoretic study of the numerical solution of sparse positive definite systems of linear equations In Graph Theory and Computing 1972 New York Elsevier 183-217
[37]
Amestoy PR, Davis TA, and Duff IS An approximate minimum degree ordering algorithm SIAM J Matrix Anal Appl 1996 17 4 886-905
[38]
Eisenstat SC, Schultz MH, and Sherman AH Algorithms and data structures for sparse symmetric gaussian elimination SIAM J Sc Statist Comput 1981 2 2 225-237
[39]
Liu JW Modification of the minimum-degree algorithm by multiple elimination ACM Trans Math Softw (TOMS) 1985 11 2 141-153
[40]
Karypis G and Kumar V A fast and high quality multilevel scheme for partitioning irregular graphs SIAM J Sci Comput 1998 20 1 359-392
[41]
Karypis G and Kumar V Multilevelk-way partitioning scheme for irregular graphs J Parallel Distrib Comput 1998 48 1 96-129
[42]
Karypis G, Kumar V (1998) Multilevel algorithms for multi-constraint graph partitioning. In: SC’98: Proceedings of the 1998 ACM/IEEE Conference on Supercomputing, pp. 28–28. IEEE
[43]
Karypis G, Kumar V (1997) A coarse-grain parallel formulation of multilevel k-way graph partitioning algorithm. In: PPSC
[44]
Schloegel K, Karypis G, Kumar V (2000) Parallel multilevel algorithms for multi-constraint graph partitioning. In: European Conference on Parallel Processing, pp. 296–310. Springer
[45]
Cuthill E, McKee J (1969) Reducing the bandwidth of sparse symmetric matrices. In: Proceedings of the 1969 24th National Conference, pp. 157–172
[46]
George A and Liu JW Computer solution of large sparse positive definite 1981 Englewood Cliffs Prentice Hall Professional Technical Reference
[47]
Li XS and Demmel JW Superlu\_dist: a scalable distributed-memory sparse direct solver for unsymmetric linear systems ACM Trans Math Softw (TOMS) 2003 29 2 110-140
[48]
Amestoy PR, Duff IS, L’Excellent J-Y, and Koster J A fully asynchronous multifrontal solver using distributed dynamic scheduling SIAM J Matrix Anal Appl 2001 23 1 15-41

Cited By

View all
  • (2024)swPTS: an efficient parallel Thomas split algorithm for tridiagonal systems on Sunway manycore processorsThe Journal of Supercomputing10.1007/s11227-023-05641-180:4(4682-4706)Online publication date: 1-Mar-2024
  • (2023)swParaFEM: a highly efficient parallel finite element solver on Sunway many-core architectureThe Journal of Supercomputing10.1007/s11227-023-05114-579:10(11427-11451)Online publication date: 1-Jul-2023

Index Terms

  1. swSuperLU: A highly scalable sparse direct solver on Sunway manycore architecture
    Index terms have been assigned to the content through auto-classification.

    Recommendations

    Comments

    Please enable JavaScript to view thecomments powered by Disqus.

    Information & Contributors

    Information

    Published In

    cover image The Journal of Supercomputing
    The Journal of Supercomputing  Volume 78, Issue 9
    Jun 2022
    864 pages

    Publisher

    Kluwer Academic Publishers

    United States

    Publication History

    Published: 01 June 2022
    Accepted: 20 December 2021

    Author Tags

    1. Sparse LU factorization
    2. Sparse direct solver
    3. Sunway manycore architecture
    4. Performance optimization
    5. Parallel scalability

    Qualifiers

    • Research-article

    Funding Sources

    • National Natural Science Foundation of China
    • Shandong Provincial Natural Science Foundation
    • “Colleges and Universities 20 Terms” Foundation of Jinan City, China
    • Research and Application Demonstration of Key Technologies of Autonomous Controllable Supercomputing Software Ecosystem Project

    Contributors

    Other Metrics

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • Downloads (Last 12 months)0
    • Downloads (Last 6 weeks)0
    Reflects downloads up to 22 Nov 2024

    Other Metrics

    Citations

    Cited By

    View all
    • (2024)swPTS: an efficient parallel Thomas split algorithm for tridiagonal systems on Sunway manycore processorsThe Journal of Supercomputing10.1007/s11227-023-05641-180:4(4682-4706)Online publication date: 1-Mar-2024
    • (2023)swParaFEM: a highly efficient parallel finite element solver on Sunway many-core architectureThe Journal of Supercomputing10.1007/s11227-023-05114-579:10(11427-11451)Online publication date: 1-Jul-2023

    View Options

    View options

    Login options

    Media

    Figures

    Other

    Tables

    Share

    Share

    Share this Publication link

    Share on social media