Nothing Special   »   [go: up one dir, main page]

skip to main content
10.5555/762761.762797acmconferencesArticle/Chapter ViewAbstractPublication PagesscConference Proceedingsconference-collections
Article

A new data-mapping scheme for latency-tolerant distributed sparse triangular solution

Published: 16 November 2002 Publication History

Abstract

This paper concerns latency-tolerant schemes for the efficient parallel solution of sparse triangular linear systems on distributed memory multiprocessors. Such triangular solution is required when sparse Cholesky factors are used to solve for a sequence of right-hand-side vectors or when incomplete sparse Cholesky factors are used to precondition a Conjugate Gradients iterative solver. In such applications, the use of traditional distributed substitution schemes can create a performance bottleneck when the latency of interprocessor communication is large. We had earlier developed the Selective Inversion (SI) scheme to reduce communication latency costs by replacing distributed substitution by parallel matrix vector multiplication. We now present a new two-way mapping of the triangular sparse matrix to processors to improve the performance of SI by halving its communication latency costs. We provide analytic results for model sparse matrices and we report on the performance of our scheme for parallel preconditioning with incomplete sparse Cholesky factors.

References

[1]
M. A. AJIZAND A. JENNINGS, A robust incomplete Cholesky-conjugate gradient algorithm, Internat. J. Numer. Methods. Engrg., 20 (1984), pp. 949--966.
[2]
O. AXELSSON, Iterative Solution Methods, Cambridge University Press, Cambridge, 1994.
[3]
M. BENZIAND C. D. MEYERAND M. TÛMA, A sparse approximate inverse preconditioner for the Conjugate Gradient method, SIAM J. Sci. Comput., 17 (1996), pp. 1135--1149.
[4]
M. BOLLHÖFERA robust ILU with pivoting based on monitoring the growth of the inverse factors, Linear Algebra and its Applications, 338(1--3), pp. 201--218, 2001.
[5]
E. CHOW, ParaSails User's Guide, Tech. Report UCRL-MA-137863, Lawrence Livermore National Laboratory, Livermore, CA, 2000.
[6]
E. CHOW, Parallel implementation and performance characteristics of least squares sparse approximate inverse preconditioners, Tech. Report UCRL-JC-138883, Lawrence Livermore National Laboratory, Livermore, CA, 2000.
[7]
P. CONCUS, G. GOLUB, AND D. O'LEARY, A generalized conjugate gradient method for the numerical solution of elliptic partial differential equations, J. R. Bunchand D. J. Rose, eds., Sparse Matrix Computations, pp. 309--332, Academic Press, 1976.
[8]
J. J. DONGARRA, J. D. CROZ, S. HAMMARLING, AND I. S. DUFF, An extended set of basic linear algebra subprograms, ACM Trans. Math. Software, 14 (1988), pp. 1--17.
[9]
J. J. DONGARRA, J. DU CROZ, I. DUFF, AND S. HAMMARLING, An Set of Level 3 FORTRAN Basic Linear Algebra Subprograms, ACM Trans. Math. Software, 16 (1990), pp. 1--17.
[10]
I. S. DUFF, A. M. ERISMAN, AND J. K. RAID, Direct Methods for sparse Matrices, Claredon Press, Oxford, 1986.
[11]
A. GEORGE, M. T. HEATH, J. W-H. LIUAND E. G-Y. NGSymbolic Cholesky factorization on a local-memory multiprocessor, Parallel Computing 5 (1987), pp. 85--95.
[12]
J. A. GEORGE, AND J. W-H. LIU, Computer Solution of Large Sparse Positive Definite Systems, Prentice-Hall Inc., Englewood Cliffs, NJ, 1981.
[13]
G. GOLUBAND C. F. VAN LOAN, Matrix Computations, third edition, The Johns Hopkins University Press, Baltimore, MD, 1996.
[14]
A. GUPTAAND V. KUMAR, A scalable parallel algorithm for sparse matrix factorization, Technical Report 94--19, Department of Computer Science, University of Minnesota, Minneapolis, 1994.
[15]
I. GUSTAFSSON, An incomplete factorization preconditioning method based on modification of element matrices BIT Numerical Mathematics 36, 1, pp. 86--100, 1996.
[16]
M. J. GROTEAND T. HUCKLE, Parallel Preconditioning with Sparse Approximate Inverses, SIAM J. Sci. Comput., 18 (1997), pp. 838--853.
[17]
M. T. HEATHAND C. H. ROMINE, Parallel solution of triangular systems on distributed-memory multiprocessors, SIAM J. Sci. Stat. Comput., 9 (1988), pp. 558--587.
[18]
M. T. HEATH, E. NG, AND B. W. PEYTON, Parallel algorithms for sparse linear systems, SIAM Review, 33 (1991), pp. 420--460.
[19]
M. HEATHAND P. RAGHAVAN, Parallel sparse triangular solution, in IMA Volumes in Mathematics and its Applications, R. Gulliver, M. Heath, R. Schreiber, and P. Bjorstad, eds., vol. 105, Springer-Verlag (1998), pp. 289--306.
[20]
L. YU. KOLOTILINAAND A. YU. YEREMIN, Factorized sparse approximate inverse preconditionings I. Theory, SIAM J. Matrix Anal. Appl., 14 (1993), pp. 45--58.
[21]
C. -J. LINAND J. J. MORÉ, Incomplete Cholesky factorizations with limited memory, SIAM J. Sci. Comput., 21 (1999), pp. 24--45.
[22]
J. W. H. LIU, The role of elimination trees on sparse factorization, SIAM J. Matrix Anal. Appl., 11 (1990), pp. 134--172.
[23]
J. W. H. LIU, E. NG, AND B. W. PAYTON, On finding supernodes for sparse matrix computations, SIAM J. Matrix Anal. Appl., 14 (1993), pp. 242--252.
[24]
R. F. LUCAS, Solving planar systems of equations on distributed-memory multiprocessors, Ph. D. Thesis, Department of Electrical Engineering, Stanford University, 1987.
[25]
T. A. MANTEUFFEL, An incomplete factorization technique for positive definite linear systems, Math. Comput., 34 (1980), pp. 473--497.
[26]
J. A. MEIJERINKAND H. A. VANDER VORST, An iterative solution method for linear systems of which the coefficient matrix is a symmetric M-matrix, Math. Comp., 31 (1977), pp. 148--162.
[27]
MESSAGE PASSING INTERFACE FORUM, MPI: A Message-Passing Interface Standard, Int. J. of Supercomputer Applications, 8, (1994) pp. 165--414.
[28]
E. G. NG, B. W.PEYTONAND P. RAGHAVAN, A Blocked incomplete Cholesky preconditioner for hierarchical-memory computers, Proceedings of the Fourth IMACS International Symposium on Iterative Methods in Scientific Computation. IMACS Series in Computational and Applied Mathematics: Iterative Methods in Scientific Computation IV, D. R. Kincaid and A. C. Elster, eds., 1999, pp. 211--222.
[29]
P. RAGHAVAN, DSCPACK: A Domain-Separator Codes For The Parallel Solution Of Sparse Liner Systems, Technical Report CSE-02-004, Department of Computer Science and Engineering, The Pennsylvania State University.
[30]
P. RAGHAVAN, Efficient parallel sparse triangular solution using selective inversion, Parallel processing Letters, Vol. 8 No. 1 (1998), pp. 29--40.
[31]
P. RAGHAVAN, K. TERANISHIAND E. NG, Towards Scalable Preconditioning Using Incomplete Factorization, Proceedings of 2001 International Conference On Preconditioning Techniques For Large Sparse Matrix Problems In Industrial Applications, pp. 63--65, Tahoe City CA, April 29--May 1, 2001. Longer version submitted to Numerical Linear Algebra.
[32]
Y. SAAD, Iterative Methods for Sparse Linear Systems, PWS Publishing, Boston, 1996.
[33]
B. SMITH, L. C. MCINNES, AND W. GROPP, PETSc 2. 0 user's manual, Technical Report ANL 95/11 - Revision 2. 0. 22 (1997), Mathematics and Computer Science Division Argonne National Laboratory, Argonne IL 60439.

Cited By

View all
  • (2019)Automatic translation of MPI source into a latency-tolerant, data-driven formJournal of Parallel and Distributed Computing10.1016/j.jpdc.2017.02.009106:C(1-13)Online publication date: 4-Jan-2019
  • (2004)Parallel hybrid sparse solvers through flexible incomplete cholesky preconditioningProceedings of the 7th international conference on Applied Parallel Computing: state of the Art in Scientific Computing10.1007/11558958_76(637-643)Online publication date: 20-Jun-2004

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Conferences
SC '02: Proceedings of the 2002 ACM/IEEE conference on Supercomputing
November 2002
952 pages
ISBN:076951524X

Sponsors

Publisher

IEEE Computer Society Press

Washington, DC, United States

Publication History

Published: 16 November 2002

Check for updates

Qualifiers

  • Article

Conference

SC '02
Sponsor:

Acceptance Rates

SC '02 Paper Acceptance Rate 67 of 230 submissions, 29%;
Overall Acceptance Rate 1,516 of 6,373 submissions, 24%

Upcoming Conference

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)1
  • Downloads (Last 6 weeks)0
Reflects downloads up to 10 Nov 2024

Other Metrics

Citations

Cited By

View all
  • (2019)Automatic translation of MPI source into a latency-tolerant, data-driven formJournal of Parallel and Distributed Computing10.1016/j.jpdc.2017.02.009106:C(1-13)Online publication date: 4-Jan-2019
  • (2004)Parallel hybrid sparse solvers through flexible incomplete cholesky preconditioningProceedings of the 7th international conference on Applied Parallel Computing: state of the Art in Scientific Computing10.1007/11558958_76(637-643)Online publication date: 20-Jun-2004

View Options

Get Access

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media