Generalized overlap regions for communication optimization in data-parallel programs

A. Venkatachar¹,
J. Ramanujam¹ &
A. Thirumalai¹

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 1239)

Included in the following conference series:

International Workshop on Languages and Compilers for Parallel Computing

Abstract

Data-parallel languages such as High Performance Fortran, Vienna Fortran and Fortran D include directives for alignment and distribution that describe how data and computation are mapped onto the processors of a distributed-memory multiprocessor. A compiler for these languages that generates code for each processor has to compute the sequence of local memory addresses accessed by each processor and the sequence of sends and receives that a given processor needs in order to access non-local data. While the address generation problem has received much attention, issues in communication have not been dealt with extensively. A novel approach for the management of communication sets and strategies for local storage of remote references is presented. Algorithms for deriving communication patterns are discussed first. Then, two schemes are presented that extend the notion of a local array by providing storage for non-local elements (called overlap regions) interspersed throughout the storage for the local portion. The two schemes, namely course padding and column padding, enhance locality of reference significantly at the cost of a small overhead due to unpacking of messages. The performance of these schemes is compared to the traditional buffer-based approach, and improvements of up to 30% in total time are demonstrated. Several message optimizations such as offset communication, message aggregation and coalescing are also discussed.
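To make the notion of a communication set concrete, the following is a minimal sketch in Python. It is not the algorithm of the paper. It assumes a one-dimensional array of n elements with a cyclic(k) distribution over P processors, the owner-computes rule, and a simple shifted assignment A(i) = B(i + shift). The function names `owner`, `local_index` and `receive_set` are hypothetical and chosen only for illustration.

```python
def owner(i, k, P):
    """Processor that owns global index i under a cyclic(k) distribution over P processors."""
    return (i // k) % P


def local_index(i, k, P):
    """Offset of global index i within its owner's local storage (blocks of k stored contiguously)."""
    return (i // (k * P)) * k + i % k


def receive_set(p, n, k, P, shift):
    """Non-local elements of B that processor p reads when executing
    A(i) = B(i + shift) for the indices i it owns (owner-computes assumed).
    Returns a dict mapping each sending processor to a list of global indices."""
    needed = {}
    for i in range(n):
        j = i + shift
        if owner(i, k, P) != p or not (0 <= j < n):
            continue  # p does not execute this iteration, or the reference is out of bounds
        q = owner(j, k, P)
        if q != p:
            needed.setdefault(q, []).append(j)
    return needed


if __name__ == "__main__":
    # 16 elements, blocks of 4, 2 processors, right-hand side shifted by 1
    print(receive_set(0, 16, 4, 2, 1))
```

In this brute-force sketch every global index is scanned, which is only practical for illustration. A buffer-based scheme would place the received elements in a separate buffer, whereas an overlap-region scheme as described in the abstract would reserve slots for them next to the local blocks that reference them.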

Supported in part by NSF Young Investigator Award CCR-9457768, by NSF grant CCR-9210422, and by the Louisiana Board of Regents through contract LEQSF (1991–94)-RD-A-09.

Author information

Authors and Affiliations

Department of Electrical and Computer Engineering, Louisiana State University, 70803-5901, Baton Rouge, LA, USA
A. Venkatachar, J. Ramanujam & A. Thirumalai

Editor information

David Sehr, Utpal Banerjee, David Gelernter, Alex Nicolau, David Padua

About this paper

Cite this paper

Venkatachar, A., Ramanujam, J., Thirumalai, A. (1997). Generalized overlap regions for communication optimization in data-parallel programs. In: Sehr, D., Banerjee, U., Gelernter, D., Nicolau, A., Padua, D. (eds) Languages and Compilers for Parallel Computing. LCPC 1996. Lecture Notes in Computer Science, vol 1239. Springer, Berlin, Heidelberg. https://doi.org/10.1007/BFb0017266

DOI: https://doi.org/10.1007/BFb0017266
Published: 10 June 2005
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-63091-3
Online ISBN: 978-3-540-69128-0
eBook Packages: Springer Book Archive
