Article

Free access

Communication optimization and code generation for distributed memory machines

Authors:

Saman P. Amarasinghe,

Monica S. LamAuthors Info & Claims

PLDI '93: Proceedings of the ACM SIGPLAN 1993 conference on Programming language design and implementation

Pages 126 - 138

https://doi.org/10.1145/155090.155102

Published: 01 June 1993 Publication History

Abstract

This paper presents several algorithms to solve code generation and optimization problems specific to machines with distributed address spaces. Given a description of how the computation is to be partitioned across the processors in a machine, our algorithms produce an SPMD (single program multiple data) program to be run on each processor. Our compiler generated the necessary receive and send instructions, optimizes the communication by eliminating redundant communication and aggregating small messages into large messages, allocates space locally on each processor, and translates global data addresses to local addresses.

Our techniques are based on an exact data-flow analysis on individual array element accesses. Unlike data dependence analysis, this analysis determines if two dynamic instances refer to the same value, and not just to the same location. Using this information, our compiler can handle more flexible data decompositions and find more opportunities for communication optimization than systems based on data dependence analysis.

Our technique is based on a uniform framework, where data decompositions, computation decompositions and the data flow information are all represented as systems of linear inequalities. We show that the problems of communication code generation, local memory management, message aggregation and redundant data communication elimination can all be solved by projecting polyhedra represented by sets of inequalities onto lower dimensional spaces.

References

[1]

C. Ancourt and F. Irigoin. Scanning Polyhedra with DO Loops. In Third A CM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPOPP, pp. 39-50, April 1991.

Digital Library

[2]

M. Ancourt. Generation automatique de codes de transfert pour multt~vrocesseurs a memoires locales. PhD thesis, University of Paris VI, March 1991.

[3]

J. M. Anderson and M. S. Lam. Global Optimizations for Parallelism and Locality on Scalable Parallel Machines. In Proceedings of the SIGPLAN '93 Conference on Program Language Design and Implementation, June 1993.

Digital Library

[4]

F. Andr6, O. Ch6ron and J.-L. Pazat. Compiling Sequential Programs for Distributed Memory Parallel Computers with Pandore II. In Third Workshop on Compilers for Parallel Computers, pp. 231-242, July 1992.

[5]

U. Banerjee. Dependence Analysis for Supercomputing. Kluwer Academic, 1988.

Digital Library

[6]

W. Blume and R. Eigenmann. Performance Analysis of Parallelizing Compilers on the Perfect Benchmarks Programs. In IEEE Transactions on Parallel and Distributed Systems, vol. 3, no. 6, pp. 643-656, November 1992.

Digital Library

[7]

M. Bromley, S. Heller, T. McNerney and G. L. Steele Jr. Fortran at Ten Gigaflops: The Connection Machine Convolution Compiler. In Proceedings of A CM Conference on Programming Language Design and Implementation, pp. 145-164, June 1991.

Digital Library

[8]

B. Chapman, P. Mehrotra, and H. Zima. Programming in Vienna Fortran. In Third Workshop on Compilers for Parallel Computers, pp. 121-160, July 1992.

[9]

C. Koelbel. Compile-time generation of regular communication patterns. In Proceedings of Supercomputing '91, pp. 101-110, November 1991.

Digital Library

[10]

P. Feautrier, Array expansion. In International Conference on Supercomputing, pp. 429-442, 1988.

Digital Library

[11]

P. Feautrier, Parametric integer programming. Technical Report 209, Laboratoire Methodologie and Architecture Des Systems Informatiques, January 1988.

[12]

P. Feautrier. Dataflow analysis of array and scalar references. In International Journal of Parallel Programming, 20(1):23- 52, February 1991.

Digital Library

[13]

G. Gannon, W. Jalby and K Gallivan. Strategies for cache and local memory management by global program transformation. In Journal of Parallel and Distributed Computing, 5:587-616, 1988.

Digital Library

[14]

G. R. Gao, R. Olsen, V. Sarkar and R. Thekkath. Collective loop fusion for array contraction. In Proceedings of the Fifth Workshop on Programming Languages and Compilers for Parallel Computing, pp. 171-181, August 1992.

Digital Library

[15]

P. Havlak and K. Kennedy. An implementation of interprocedural bounded regular section analysis. In IEEE Transactions on Parallel and Distributed Systems, vol. 2, no. 3, pp. 350-360, July 1991.

Digital Library

[16]

S. Hiranandani, K. Kennedy and C. Tseng. Compiling Fortran D for MIMD Distributed-Memory Machines. In Communications of ACM, vol. 35, no. 8, pp. 66-80, August 1992.

Digital Library

[17]

L. G. C. Hamey, J. A. Webb and I. C. Wu. An architecture independent programming language for low-level vision. Computer Vision, Graphics, and Image Processing, vol. 48, no. 2, pp. 246-264, November 1989.

Digital Library

[18]

J. Li and M. Chen. Index domain alignment: Minimizing cost of cross-referencing between distributed arrays, in Proceedings of Frontiers '90: The Third Symposium on the Frontiers of Massively Parallel Computation. pp. 424-432. October 1990.

[19]

D. E. Maydan, S. P. Amarasinghe and M. S. Lain. Array Data- Flow Analysis and its Use in Array Privatization. in Proceedings of ACM Conference on Principles of Programming Languages, pp. 2-15. January 1993.

Digital Library

[20]

D. E. Maydan, Accurate Analysis of Array References. PhD thesis, Stanford University, September 1992. Published as CSL-TR-92-547.

Digital Library

[21]

P. Mehrotra and J. Van Rosendale. High Level Programming of Distributed Memory Architectures. In A. Nicolau, D. Gelernter, T. Gross and D. Padua editors, Advances in Languages and Compilers for Parallel Processing, pp. 364- 384, The MIT Press, Cambridge, Massachusetts, 1991.

[22]

K. Pingali and A. Rogers. Process decomposition through locality of reference. In Proceedings of the SIGPLAN '89 Conference on Program Language Design and Implementation, pp. 69-80, June 1989.

Digital Library

[23]

A. Schrijver, Theory of Linear and Integer Programming, Wiley, Chichester' 1986.

Digital Library

[24]

G. L. Steele. Proposal for alignment and distribution directives in HPF. Draft presented at HPF Forum meeting, June 1992.

[25]

S. Tjiang, M. Wolf, M. Lain, K. Pieper and J. Hennessy. Integrating scalar optimizations and parallelization. In U. Banerjee, D. Gelernter, A. Nicolau and D. Padua editors, Languages and Compilers for Parallel Computing, pp. 137- 151, Springer-Verlag, Berlin, Germany, 1992.

Digital Library

[26]

C.-W. Tseng. An Optimizing Fortran D Compiler for MIMD Distributed-Memory Machines. PhD thesis, Rice University, January 1993. Published as Rice COMP TR93-199.

Digital Library

[27]

M. E. Wolf and M. S. Lam. A data locality optimizing algorithm. In SIGPLAN Notices, vol. 26, no. 6, pp. 30-44, June 1991.

Digital Library

[28]

M. E. Wolf. Improving locality and parallelism in nested loops. PhD thesis, Stanford University, August 1992. Published as CSL-TR-92-538.

Digital Library

Cited By

Willis BShrivastava AMack JDave SChakrabarti CBrunhaver J(2024)Cyclebite: Extracting Task Graphs From Unstructured Compute-ProgramsIEEE Transactions on Computers10.1109/TC.2023.332750473:1(221-234)Online publication date: Jan-2024
https://doi.org/10.1109/TC.2023.3327504
Kong MAbu Yosef RRountev ASadayappan PMohror KArnold DBadia R(2023)Automatic Generation of Distributed-Memory Mappings for Tensor ComputationsProceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis10.1145/3581784.3607096(1-13)Online publication date: 12-Nov-2023
https://dl.acm.org/doi/10.1145/3581784.3607096
Bakhtin VZakharov DKataev NKolganov AYakobovskiy M(2023)Automation of Programming for Promising High-Performance Computing SystemsParallel Computing Technologies10.1007/978-3-031-41673-6_1(3-17)Online publication date: 15-Aug-2023
https://doi.org/10.1007/978-3-031-41673-6_1
Show More Cited By

Index Terms

Communication optimization and code generation for distributed memory machines

Recommendations

Communication optimization and code generation for distributed memory machines

This paper presents several algorithms to solve code generation and optimization problems specific to machines with distributed address spaces. Given a description of how the computation is to be partitioned across the processors in a machine, our ...
Eliminating Redundant Communication of Code Generation for Distributed Memory Machines
SNPD '07: Proceedings of the Eighth ACIS International Conference on Software Engineering, Artificial Intelligence, Networking, and Parallel/Distributed Computing - Volume 02

How to eliminate redundant communications is a key issue in developing parallelizing compilers for distributed memory machines. An important way to improve the performance of the code generated by parallelizing compiler in distributed memory systems is ...
Code optimization of pipeline constraints

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Conferences

PLDI '93: Proceedings of the ACM SIGPLAN 1993 conference on Programming language design and implementation

August 1993

313 pages

ISBN:0897915984

DOI:10.1145/155090

Editor:
Robert Cartwright
Rice Univ., Houston, TX

ACM SIGPLAN Notices Volume 28, Issue 6
June 1993
313 pages
ISSN:0362-1340
EISSN:1558-1160
DOI:10.1145/173262
Editor:
Richard Wexelblat
IDA/CSED, Alexandria, VA
Issue’s Table of Contents

Copyright © 1993 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGPLAN: ACM Special Interest Group on Programming Languages

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 01 June 1993

Permissions

Request permissions for this article.

Request Permissions

Check for updates

Qualifiers

Article

Conference

PLDI93

Sponsor:

SIGPLAN

PLDI93: ACM SIGPLAN Conference on Programming Languages Design and Implementation

June 21 - 25, 1993

New Mexico, Albuquerque, USA

Acceptance Rates

Overall Acceptance Rate 406 of 2,067 submissions, 20%

Contributors

Other Metrics

View Article Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

198
Total Citations
View Citations
1,073
Total Downloads

Downloads (Last 12 months)162
Downloads (Last 6 weeks)20

Reflects downloads up to 22 Nov 2024

Other Metrics

View Author Metrics

Citations

Cited By

Willis BShrivastava AMack JDave SChakrabarti CBrunhaver J(2024)Cyclebite: Extracting Task Graphs From Unstructured Compute-ProgramsIEEE Transactions on Computers10.1109/TC.2023.332750473:1(221-234)Online publication date: Jan-2024
https://doi.org/10.1109/TC.2023.3327504
Kong MAbu Yosef RRountev ASadayappan PMohror KArnold DBadia R(2023)Automatic Generation of Distributed-Memory Mappings for Tensor ComputationsProceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis10.1145/3581784.3607096(1-13)Online publication date: 12-Nov-2023
https://dl.acm.org/doi/10.1145/3581784.3607096
Bakhtin VZakharov DKataev NKolganov AYakobovskiy M(2023)Automation of Programming for Promising High-Performance Computing SystemsParallel Computing Technologies10.1007/978-3-031-41673-6_1(3-17)Online publication date: 15-Aug-2023
https://doi.org/10.1007/978-3-031-41673-6_1
Barua PZhao JSarkar V(2020)OmpMemOpt: Optimized Memory Movement for Heterogeneous ComputingEuro-Par 2020: Parallel Processing10.1007/978-3-030-57675-2_13(200-216)Online publication date: 24-Aug-2020
https://dl.acm.org/doi/10.1007/978-3-030-57675-2_13
Freiberg OPalsberg JEslamimehr M(2016)Retargetable Communication for Distributed Programs2016 12th International ACM SIGSOFT Conference on Quality of Software Architectures (QoSA)10.1109/QoSA.2016.8(21-30)Online publication date: Apr-2016
https://doi.org/10.1109/QoSA.2016.8
Tavarageri SMeister BBaskaran MPradelle BHenretty TKonstantinidis AJohnson ALethin R(2015)Automatic cluster parallelization and minimizing communication via selective data replication2015 IEEE High Performance Extreme Computing Conference (HPEC)10.1109/HPEC.2015.7322481(1-7)Online publication date: Sep-2015
https://doi.org/10.1109/HPEC.2015.7322481
Zhao JZhao RXu J(2015)Code Generation for Distributed-Memory ArchitecturesThe Computer Journal10.1093/comjnl/bxv077(bxv077)Online publication date: 15-Sep-2015
https://doi.org/10.1093/comjnl/bxv077
Waterland AAngelino EAdams RAppavoo JSeltzer M(2014)ASCACM SIGARCH Computer Architecture News10.1145/2654822.254198542:1(575-590)Online publication date: 24-Feb-2014
https://dl.acm.org/doi/10.1145/2654822.2541985
Waterland AAngelino EAdams RAppavoo JSeltzer M(2014)ASCACM SIGPLAN Notices10.1145/2644865.254198549:4(575-590)Online publication date: 24-Feb-2014
https://dl.acm.org/doi/10.1145/2644865.2541985
Waterland AAngelino EAdams RAppavoo JSeltzer MBalasubramonian RDavis AAdve S(2014)ASCProceedings of the 19th international conference on Architectural support for programming languages and operating systems10.1145/2541940.2541985(575-590)Online publication date: 24-Feb-2014
https://dl.acm.org/doi/10.1145/2541940.2541985
Show More Cited By

View Options

View options

PDF

View or Download as a PDF file.

eReader

View online with eReader.

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

Media

Figures

Other

Tables

View Table of Contents