Nothing Special   »   [go: up one dir, main page]

skip to main content
research-article

Optimizing Communications of Dynamic Data Redistribution on Symmetrical Matrices in Parallelizing Compilers

Published: 01 November 2006 Publication History

Abstract

Dynamic data redistribution is used to enhance data locality and algorithm performance by reducing interprocessor communication in many parallel scientific applications on distributed memory multicomputers. Since the redistribution is performed at runtime, there is a performance tradeoff between the efficiency of the new data decomposition for a subsequent phase of an algorithm and the cost of redistributing data among processors. In this paper, we present a processor replacement scheme to minimize the cost of interprocessor data exchange during runtime. The main idea of the proposed technique is to develop a replacement function for reordering logical processors in the destination phase. Based on the replacement function, a realigned sequence of destination processors can be derived and is then used to perform data decomposition in the receiving phase. Together with local matrix and compressed CRS vectors transposition schemes, the interprocessor communication can be eliminated during runtime. A significant improvement of this approach is that the realignment of data can be performed without interprocessor communication for special cases. The second contribution of the present technique is that the complicated communication sets generation could be simplified by applying local matrix transposition. Consequently, the indexing cost could be reduced significantly. The proposed techniques can be applied in both dense and sparse applications. A generalized symmetric redistribution algorithm is also presented in this work. To analyze the efficiency of the proposed technique, the theoretical analysis proves that up to (p-1)/p data transmission cost can be saved. For general cases, the symmetric redistribution algorithm saves 1/p communication overheads compared with the traditional method. Experimental results also show that the proposed techniques provide superior performance in most data redistribution instances.

References

[1]
B. Chapman, P. Mehrotra, H. Moritsch, and H. Zima, “Dynamic Data Distribution in Vienna Fortran,” Proc. Supercomputing Conf. '93, pp. 284-293, Nov. 1993.
[2]
S. Chatterjee, J.R. Gilbert, F.J.E. Long, R. Schreiber, and S.-H. Teng, “Generating Local Address and Communication Sets for Data Parallel Programs,” J. Parallel and Distributed Computing, vol. 26, pp. 72-84, 1995.
[3]
F. Desprez, J. Dongarra, and A. Petitet, “Scheduling Block-Cyclic Data Redistribution,” IEEE Trans. Parallel and Distributed Systems, vol. 9, no. 2, pp. 192-205, Feb. 1998.
[4]
J. Dongarra, R. Van De Geijn, and D. W. Walker, “A Look at Scalable Dense Linear Algebra Libraries,” Technical Report ORNL/TM-12126 from Oak Ridge Nat'l Laboratory, Apr. 1992.
[5]
J. Garcia, E. Ayguade, and J. Labarta, “A Framework for Integrating Data Alignment, Distribution, and Redistribution in Distributed Memory Multiprocessors,” IEEE Trans. Parallel and Distributed Systems, vol. 12, no. 4, Apr. 2001.
[6]
M. Guo, “Communication Generation for Irregular Codes,” J.Supercomputing, vol. 25, no. 3, pp. 199-214, 2003.
[7]
S.K.S. Gupta, S.D. Kaushik, C.-H. Huang, and P. Sadayappan, “On the Generation of Efficient Data Communication for Distributed-Memory Machines,” Proc. Int'l Computing Symp., pp. 504-513, 1992.
[8]
S.K.S. Gupta, S.D. Kaushik, C.-H. Huang, and P. Sadayappan, “On Compiling Array Expressions for Efficient Execution on Distributed-Memory Machines,” J. Parallel and Distributed Computing, vol. 32, pp. 155-172, 1996.
[9]
S. Hiranandani, K. Kennedy, J. Mellor-Crammey, and A. Sethi, “Compilation Technique for Block-Cyclic Distribution,” Proc. ACM Int'l Conf. Supercomputing, pp. 392-403, 1994.
[10]
C.-H. Hsu, S.-W. Bai, Y.-C. Chung, C.-S. Yang, “A Generalized Basic-Cycle Calculation Method for Efficient Array Redistribution,” IEEE Trans. Parallel and Distributed Systems, vol. 11, no. 12, pp. 1201-1216, Dec. 2000.
[11]
C.-H. Hsu, Y.-C. Chung, and C.-R. Dow, “Efficient Methods for Multidimensional Array Redistribution,” J. Supercomputing, vol. 17, no. 1, pp. 23-46, Aug. 2000.
[12]
C.-H. Hsu, D.-L. Yang, Y.-C. Chung, and C.-R. Dow, “A Generalized Processor Mapping Technique for Array Redistribution,” IEEE Trans. Parallel and Distributed Systems, vol. 12, no. 7, pp. 743-757, July 2001.
[13]
E.T. Kalns and L.M. Ni, “Processor Mapping Technique toward Efficient Data Redistribution,” IEEE Trans. Parallel and Distributed Systems, vol. 6, no. 12, Dec. 1995.
[14]
S.D. Kaushik, C.H. Huang, J. Ramanujam, and P. Sadayappan, “Multiphase Data Redistribution: Modeling and Evaluation,” Proc. Int'l Parallel Processing Symp. '95, pp. 441-445, 1995.
[15]
P.-Z. Lee and W.Y. Chen, “Compiler Techniques for Determining Data Distribution and Generating Communication Sets on Distributed-Memory Multicomputers,” Proc. 29th IEEE Hawaii Int'l Conf. System Sciences, pp.537-546, Jan. 1996.
[16]
S. Lee, H. Yook, M. Koo, and M. Park, “Processor Reordering Algorithms toward Efficient GEN_BLOCK Redistribution,” Proc. 2001 ACM Symp. Applied Computing, 2001.
[17]
Y.W. Lim, P.B. Bhat, and V.K. Prasanna, “Efficient Algorithms for Block-Cyclic Redistribution of Arrays,” Proc. Eighth IEEE Symp. Parallel and Distributed Processing, pp. 74-83, 1996.
[18]
Y.W. Lim, N. Park, and V.K. Prasanna, “Efficient Algorithms for Multi-Dimensional Block-Cyclic Redistribution of Arrays,” Proc. 26th Int'l Conf. Parallel Processing, pp. 234-241, 1997.
[19]
N. Park, V.K. Prasanna, and C.S. Raghavendra, “Efficient Algorithms for Block-Cyclic Data Redistribution between Processor Sets,” IEEE Trans. Parallel and Distributed Systems, vol. 10, no. 12, pp. 1217-1240, Dec. 1999.
[20]
A.P. Petitet and J.J. Dongarra, “Algorithmic Redistribution Methods for Block-Cyclic Decompositions,” IEEE Trans. Parallel and Distributed Systems, vol. 10, no. 12, pp. 1201-1216, Dec. 1999.
[21]
L. Prylli and B. Touranchean, “Fast Runtime Block Cyclic Data Redistribution on Multiprocessors,” J. Parallel and Distributed Computing, vol. 45, pp. 63-72, Aug. 1997.
[22]
S. Ramaswamy, B. Simons, and P. Banerjee, “Optimization for Efficient Data Redistribution on Distributed Memory Multicomputers,” J. Parallel and Distributed Computing, vol. 38, pp.217-228, 1996.
[23]
J.M. Stichnoth, D. O'Hallaron, and T.R. Gross, “Generating Communication for Array Statements: Design, Implementation, and Evaluation,” J. Parallel and Distributed Computing, vol. 21, pp.150-159, 1994.
[24]
R. Thakur, A. Choudhary, and J. Ramanujam, “Efficient Algorithms for Data Redistribution,” IEEE Trans. Parallel and Distributed Systems, vol. 7, no. 6, June 1996.
[25]
A. Thirumalai and J. Ramanujam, “HPF Array Statements: Communication Generation and Optimization,” Proc. Third Workshop Languages, Compilers, and Run-Time System for Scalable Computers, May 1995.
[26]
A. Wakatani and M. Wolfe, “Optimization of Array Redistribution for Distributed Memory Multicomputers,” Parallel Computing, vol. 21, no. 9, pp. 1485-1490, Sept. 1995.
[27]
D.W. Walker and S.W. Otto, “Redistribution of Block-Cyclic Data Distributions Using MPI,” Concurrency: Practice and Experience, vol. 8, no. 9, pp. 707-728, 1996.
[28]
H. Wang, M. Guo, and D. Wei, “Divide-and-Conquer Algorithm for Irregular Redistributions in Parallelizing Compilers,” J. Supercomputing, vol. 29, no. 2, 2004.
[29]
H. Wang, M. Guo, and W. Chen, “An Efficient Algorithm for Irregular Redistribution in Parallelizing Compilers,” Proc. 2003 Int'l Symp. Parallel and Distributed Processing with Applications (ISPA '03), 2003.

Cited By

View all
  • (2018)A Two-Level Scheduling Strategy for optimising communications of data parallel programs in clustersInternational Journal of Ad Hoc and Ubiquitous Computing10.1504/IJAHUC.2010.0355376:4(263-269)Online publication date: 27-Dec-2018
  • (2011)CADProceedings of the 4th international conference on Data management in grid and peer-to-peer systems10.5555/2040132.2040147(120-134)Online publication date: 1-Sep-2011
  • (2010)A compound scheduling strategy for irregular array redistribution in cluster based parallel systemProceedings of the Second Russia-Taiwan conference on Methods and tools of parallel programming multicomputers10.5555/1927517.1927528(69-77)Online publication date: 16-May-2010

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image IEEE Transactions on Parallel and Distributed Systems
IEEE Transactions on Parallel and Distributed Systems  Volume 17, Issue 11
November 2006
159 pages

Publisher

IEEE Press

Publication History

Published: 01 November 2006

Author Tags

  1. CRS transposition
  2. Processor replacement
  3. communication free
  4. data redistribution
  5. sparse matrix.
  6. symmetric matrix

Qualifiers

  • Research-article

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)0
  • Downloads (Last 6 weeks)0
Reflects downloads up to 15 Feb 2025

Other Metrics

Citations

Cited By

View all
  • (2018)A Two-Level Scheduling Strategy for optimising communications of data parallel programs in clustersInternational Journal of Ad Hoc and Ubiquitous Computing10.1504/IJAHUC.2010.0355376:4(263-269)Online publication date: 27-Dec-2018
  • (2011)CADProceedings of the 4th international conference on Data management in grid and peer-to-peer systems10.5555/2040132.2040147(120-134)Online publication date: 1-Sep-2011
  • (2010)A compound scheduling strategy for irregular array redistribution in cluster based parallel systemProceedings of the Second Russia-Taiwan conference on Methods and tools of parallel programming multicomputers10.5555/1927517.1927528(69-77)Online publication date: 16-May-2010

View Options

View options

Figures

Tables

Media

Share

Share

Share this Publication link

Share on social media