Nothing Special   »   [go: up one dir, main page]

skip to main content
article

An iteration-based hybrid parallel algorithm for tridiagonal systems of equations on multi-core architectures

Published: 10 December 2015 Publication History

Abstract

An optimized parallel algorithm is proposed to solve the problem occurred in the process of complicated backward substitution of cyclic reduction during solving tridiagonal linear systems. Adopting a hybrid parallel model, this algorithm combines the cyclic reduction method and the partition method. This hybrid algorithm has simple backward substitution on parallel computers comparing with the cyclic reduction method. In this paper, the operation count and execution time are obtained to evaluate and make comparison for these methods. On the basis of results of these measured parameters, the hybrid algorithm using the hybrid approach with a multi-threading implementation achieves better efficiency than the other parallel methods, that is, the cyclic reduction and the partition methods. In particular, the approach involved in this paper has the least scalar operation count and the shortest execution time on a multi-core computer when the size of equations meets some dimension threshold. The hybrid parallel algorithm improves the performance of the cyclic reduction and partition methods by 19.2% and 13.2%, respectively. In addition, by comparing the single-iteration and multi-iteration hybrid parallel algorithms, it is found that increasing iteration steps of the cyclic reduction method does not affect the performance of the hybrid parallel algorithm very much. Copyright © 2015 John Wiley & Sons, Ltd.

References

[1]
Wang HH. A parallel method for tridiagonal equations. ACM Transactions on Mathematical Software 1981; Volume 7: pp.170-183.
[2]
Zhang Y, Cohen J, Owens JD. Fast tridiagonal solvers on the GPU. Proceedings of the 15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming PPoPP' 10, Bangalore, India, 2010; pp.127-136.
[3]
Lambiotte JJ, Jr., Voigt RG. The solution of tridiagonal linear systems on the CDC STAR-100 computer. ACM Transactions on Mathematical Software 1975; Volume 1: pp.308-329.
[4]
Nguyen H, Beauregard MA, Morgan R. Improving the speed of convergence of GMRES for certain perturbed tridiagonal systems. IEEE 45th Southeastern Symposium on System TheorySSST, Waco, Texas, 2013; pp.63-67.
[5]
Thomas LH. Elliptic problems in linear difference equations over a network. Watson Scientific Computing Laboratory Report, Columbia University: Manhattan, 1949.
[6]
Hirshman SP, Perumalla KS, Lynch VE, Sanchez R. BCYCLIC: a parallel block tridiagonal matrix cyclic solver. Journal of Computational Physics 2010; Volume 229: pp.6392-6404.
[7]
Natarajan EP. Klu-a high performance sparse linear solver for circuit simulation problems. Diss, University of Florida: Gainesville, 2005.
[8]
Edison TM, Erlebacher G. Implementation of a fully-balanced periodic tridiagonal solver on a parallel distributed memory architecture. Concurrency and Computation: Practice and Experience 1995; Volume 7 Issue 4: pp.273-302.
[9]
Goddeke D, Strzodka R. Cyclic reduction tridiagonal solvers on gpus applied to mixed precision multigrid. IEEE Transactions on Parallel and Distributed Systems 2011; Volume 22: pp.22-32.
[10]
Coakley ES, Rokhlin V. A fast divide-and-conquer algorithm for computing the spectra of real symmetric tridiagonal matrices. Applied and Computational Harmonic Analysis 2013; Volume 34: pp.379-414.
[11]
Bini DA, Meini B. The cyclic reduction algorithm: from Poisson equation to stochastic processes and beyond. Numerical Algorithms 2009; Volume 51: pp.23-60.
[12]
Sogabe T, El-Mikkawy M. Fast block diagonalization of k-tridiagonal matrices. Applied Mathematics and Computation 2011; Volume 218: pp.2740-2743.
[13]
Seal SK, Perumalla KS, Hirshman SP. Revisiting parallel cyclic reduction and parallel prefix-based algorithms for block tridiagonal systems of equations. Journal of Parallel and Distributed Computing 2013; Volume 73: pp.273-280.
[14]
Stone HS. Parallel tridiagonal equation solvers. ACM Transactions on Mathematical Software 1975; Volume 1: pp.289-307.
[15]
Stone HS. An efficient parallel algorithm for the solution of a tridiagonal linear system of equations. Journal of the ACM 1973; Volume 20: pp.27-38.
[16]
Wu C-Y, Huang T-Z. Stability of block LU factorization for block tridiagonal block H-matrices. Journal of Computational and Applied Mathematics 2012; Volume 236: pp.2673-2684.
[17]
Miro R, Shklarski G, Toledo S. Partitioned triangular tridiagonalization. ACM Transactions on Mathematical Software 2011; Volume 37: pp.38-38:16.
[18]
Heller D. Some aspects of the cyclic reduction algorithm for block tridiagonal linear systems. SIAM Journal on Numerical Analysis 1976; Volume 13: pp.484-496.
[19]
Barnes GH, Brown RM, Kato M, Kuck DJ, Slotnick DL, Stokes RA. The ILLIAC IV computer. IEEE Transactions on Computers 1968; Volume 17: pp.746-757.
[20]
Andrew V. Terekhov: a fast parallel algorithm for solving block-tridiagonal systems of linear equations including the domain decomposition method. Parallel Computation 2013; Volume 39: pp.245-258.
[21]
Akimova EN, Belousov DV. Parallel algorithms for solving linear systems with block-tridiagonal matrices on multi-core CPU with GPU. Journal of Computational Science 2012; Volume 3: pp.445-449.
[22]
Petschow M, Bientinesi P. MR3-SMP: a symmetric tridiagonal eigensolver for multicore architectures. Parallel Computation 2011; Volume 37: pp.795-805.
[23]
Tang G, Li K, Li K, Chen H, Jiayi D. A hybrid parallel tridiagonal solver on multi-core architectures. 2014 IEEE International Parallel Distributed Processing Symposium Workshops IPDPSW, IEEE, Phoenix Arizona, USA, 2014; pp.604-613.
[24]
Ltaief H, Luszczek P, Dongarra J. High-performance bidiagonal reduction using tile algorithms on homogeneous multicore architectures. ACM Transactions on Mathematical Software 2013; Volume 39: pp.16-16:22.
[25]
Li K, Yang W, Li K. Performance analysis and optimization for SpMV on GPU using probabilistic modeling. IEEE Transactions on Parallel and Distributed Systems 2015; Volume 26: pp.196-205.
[26]
Heng Z, Wu Z. A parallel multi-layer hybrid algorithm for solving block-tridiagonal systems. International Conference on Transportation, Mechanical, and Electrical Engineering TMEE, Changchun, China, 2011; pp.1415-1418.
[27]
Hockney Roger W. A fast direct solution of Poissons equation using Fourier analysis. Journal of the ACM 1965; Volume 12: pp.95-113.
[28]
Allmann S, Rauber T, Runger G. Cyclic reduction on distributed shared memory machines. Parallel, Distributed, and Network-Based Processing, Euromicro Conference, Mantova, Italy, 2001; pp.290-297.

Cited By

View all
  • (2017)A general and efficient divide-and-conquer algorithm framework for multi-core clustersCluster Computing10.1007/s10586-017-0766-y20:3(2605-2626)Online publication date: 1-Sep-2017

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image Concurrency and Computation: Practice & Experience
Concurrency and Computation: Practice & Experience  Volume 27, Issue 17
December 2015
981 pages
ISSN:1532-0626
EISSN:1532-0634
Issue’s Table of Contents

Publisher

John Wiley and Sons Ltd.

United Kingdom

Publication History

Published: 10 December 2015

Author Tags

  1. hybrid parallel algorithm
  2. multi-core architecture
  3. multi-threading
  4. tridiagonal system of equations

Qualifiers

  • Article

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)0
  • Downloads (Last 6 weeks)0
Reflects downloads up to 16 Nov 2024

Other Metrics

Citations

Cited By

View all
  • (2017)A general and efficient divide-and-conquer algorithm framework for multi-core clustersCluster Computing10.1007/s10586-017-0766-y20:3(2605-2626)Online publication date: 1-Sep-2017

View Options

View options

Login options

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media