Nothing Special   »   [go: up one dir, main page]

skip to main content
article
Free access

Communication optimization and code generation for distributed memory machines

Published: 01 June 1993 Publication History

Abstract

This paper presents several algorithms to solve code generation and optimization problems specific to machines with distributed address spaces. Given a description of how the computation is to be partitioned across the processors in a machine, our algorithms produce an SPMD (single program multiple data) program to be run on each processor. Our compiler generated the necessary receive and send instructions, optimizes the communication by eliminating redundant communication and aggregating small messages into large messages, allocates space locally on each processor, and translates global data addresses to local addresses.
Our techniques are based on an exact data-flow analysis on individual array element accesses. Unlike data dependence analysis, this analysis determines if two dynamic instances refer to the same value, and not just to the same location. Using this information, our compiler can handle more flexible data decompositions and find more opportunities for communication optimization than systems based on data dependence analysis.
Our technique is based on a uniform framework, where data decompositions, computation decompositions and the data flow information are all represented as systems of linear inequalities. We show that the problems of communication code generation, local memory management, message aggregation and redundant data communication elimination can all be solved by projecting polyhedra represented by sets of inequalities onto lower dimensional spaces.

References

[1]
C. Ancourt and F. Irigoin. Scanning Polyhedra with DO Loops. In Third A CM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPOPP, pp. 39-50, April 1991.
[2]
M. Ancourt. Generation automatique de codes de transfert pour multt~vrocesseurs a memoires locales. PhD thesis, University of Paris VI, March 1991.
[3]
J. M. Anderson and M. S. Lam. Global Optimizations for Parallelism and Locality on Scalable Parallel Machines. In Proceedings of the SIGPLAN '93 Conference on Program Language Design and Implementation, June 1993.
[4]
F. Andr6, O. Ch6ron and J.-L. Pazat. Compiling Sequential Programs for Distributed Memory Parallel Computers with Pandore II. In Third Workshop on Compilers for Parallel Computers, pp. 231-242, July 1992.
[5]
U. Banerjee. Dependence Analysis for Supercomputing. Kluwer Academic, 1988.
[6]
W. Blume and R. Eigenmann. Performance Analysis of Parallelizing Compilers on the Perfect Benchmarks Programs. In IEEE Transactions on Parallel and Distributed Systems, vol. 3, no. 6, pp. 643-656, November 1992.
[7]
M. Bromley, S. Heller, T. McNerney and G. L. Steele Jr. Fortran at Ten Gigaflops: The Connection Machine Convolution Compiler. In Proceedings of A CM Conference on Programming Language Design and Implementation, pp. 145-164, June 1991.
[8]
B. Chapman, P. Mehrotra, and H. Zima. Programming in Vienna Fortran. In Third Workshop on Compilers for Parallel Computers, pp. 121-160, July 1992.
[9]
C. Koelbel. Compile-time generation of regular communication patterns. In Proceedings of Supercomputing '91, pp. 101-110, November 1991.
[10]
P. Feautrier, Array expansion. In International Conference on Supercomputing, pp. 429-442, 1988.
[11]
P. Feautrier, Parametric integer programming. Technical Report 209, Laboratoire Methodologie and Architecture Des Systems Informatiques, January 1988.
[12]
P. Feautrier. Dataflow analysis of array and scalar references. In International Journal of Parallel Programming, 20(1):23- 52, February 1991.
[13]
G. Gannon, W. Jalby and K Gallivan. Strategies for cache and local memory management by global program transformation. In Journal of Parallel and Distributed Computing, 5:587-616, 1988.
[14]
G. R. Gao, R. Olsen, V. Sarkar and R. Thekkath. Collective loop fusion for array contraction. In Proceedings of the Fifth Workshop on Programming Languages and Compilers for Parallel Computing, pp. 171-181, August 1992.
[15]
P. Havlak and K. Kennedy. An implementation of interprocedural bounded regular section analysis. In IEEE Transactions on Parallel and Distributed Systems, vol. 2, no. 3, pp. 350-360, July 1991.
[16]
S. Hiranandani, K. Kennedy and C. Tseng. Compiling Fortran D for MIMD Distributed-Memory Machines. In Communications of ACM, vol. 35, no. 8, pp. 66-80, August 1992.
[17]
L. G. C. Hamey, J. A. Webb and I. C. Wu. An architecture independent programming language for low-level vision. Computer Vision, Graphics, and Image Processing, vol. 48, no. 2, pp. 246-264, November 1989.
[18]
J. Li and M. Chen. Index domain alignment: Minimizing cost of cross-referencing between distributed arrays, in Proceedings of Frontiers '90: The Third Symposium on the Frontiers of Massively Parallel Computation. pp. 424-432. October 1990.
[19]
D. E. Maydan, S. P. Amarasinghe and M. S. Lain. Array Data- Flow Analysis and its Use in Array Privatization. in Proceedings of ACM Conference on Principles of Programming Languages, pp. 2-15. January 1993.
[20]
D. E. Maydan, Accurate Analysis of Array References. PhD thesis, Stanford University, September 1992. Published as CSL-TR-92-547.
[21]
P. Mehrotra and J. Van Rosendale. High Level Programming of Distributed Memory Architectures. In A. Nicolau, D. Gelernter, T. Gross and D. Padua editors, Advances in Languages and Compilers for Parallel Processing, pp. 364- 384, The MIT Press, Cambridge, Massachusetts, 1991.
[22]
K. Pingali and A. Rogers. Process decomposition through locality of reference. In Proceedings of the SIGPLAN '89 Conference on Program Language Design and Implementation, pp. 69-80, June 1989.
[23]
A. Schrijver, Theory of Linear and Integer Programming, Wiley, Chichester' 1986.
[24]
G. L. Steele. Proposal for alignment and distribution directives in HPF. Draft presented at HPF Forum meeting, June 1992.
[25]
S. Tjiang, M. Wolf, M. Lain, K. Pieper and J. Hennessy. Integrating scalar optimizations and parallelization. In U. Banerjee, D. Gelernter, A. Nicolau and D. Padua editors, Languages and Compilers for Parallel Computing, pp. 137- 151, Springer-Verlag, Berlin, Germany, 1992.
[26]
C.-W. Tseng. An Optimizing Fortran D Compiler for MIMD Distributed-Memory Machines. PhD thesis, Rice University, January 1993. Published as Rice COMP TR93-199.
[27]
M. E. Wolf and M. S. Lam. A data locality optimizing algorithm. In SIGPLAN Notices, vol. 26, no. 6, pp. 30-44, June 1991.
[28]
M. E. Wolf. Improving locality and parallelism in nested loops. PhD thesis, Stanford University, August 1992. Published as CSL-TR-92-538.

Cited By

View all
  • (2024)Cyclebite: Extracting Task Graphs From Unstructured Compute-ProgramsIEEE Transactions on Computers10.1109/TC.2023.332750473:1(221-234)Online publication date: 1-Jan-2024
  • (2022)DISTAL: the distributed tensor algebra compilerProceedings of the 43rd ACM SIGPLAN International Conference on Programming Language Design and Implementation10.1145/3519939.3523437(286-300)Online publication date: 9-Jun-2022
  • (2019)Tiramisu: a polyhedral compiler for expressing fast and portable codeProceedings of the 2019 IEEE/ACM International Symposium on Code Generation and Optimization10.5555/3314872.3314896(193-205)Online publication date: 16-Feb-2019
  • Show More Cited By

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM SIGPLAN Notices
ACM SIGPLAN Notices  Volume 28, Issue 6
June 1993
313 pages
ISSN:0362-1340
EISSN:1558-1160
DOI:10.1145/173262
Issue’s Table of Contents
  • cover image ACM Conferences
    PLDI '93: Proceedings of the ACM SIGPLAN 1993 conference on Programming language design and implementation
    August 1993
    313 pages
    ISBN:0897915984
    DOI:10.1145/155090
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 01 June 1993
Published in SIGPLAN Volume 28, Issue 6

Check for updates

Qualifiers

  • Article

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)162
  • Downloads (Last 6 weeks)20
Reflects downloads up to 25 Nov 2024

Other Metrics

Citations

Cited By

View all
  • (2024)Cyclebite: Extracting Task Graphs From Unstructured Compute-ProgramsIEEE Transactions on Computers10.1109/TC.2023.332750473:1(221-234)Online publication date: 1-Jan-2024
  • (2022)DISTAL: the distributed tensor algebra compilerProceedings of the 43rd ACM SIGPLAN International Conference on Programming Language Design and Implementation10.1145/3519939.3523437(286-300)Online publication date: 9-Jun-2022
  • (2019)Tiramisu: a polyhedral compiler for expressing fast and portable codeProceedings of the 2019 IEEE/ACM International Symposium on Code Generation and Optimization10.5555/3314872.3314896(193-205)Online publication date: 16-Feb-2019
  • (2019)Tiramisu: A Polyhedral Compiler for Expressing Fast and Portable Code2019 IEEE/ACM International Symposium on Code Generation and Optimization (CGO)10.1109/CGO.2019.8661197(193-205)Online publication date: Feb-2019
  • (2018)Fog Computing and Data as a Service: A Goal-Based Modeling Approach to Enable Effective Data MovementsAdvanced Information Systems Engineering10.1007/978-3-319-91563-0_13(203-219)Online publication date: 11-Jun-2018
  • (2015)On the run-time cost of distributed-memory communications generated using the polyhedral model2015 International Conference on High Performance Computing & Simulation (HPCS)10.1109/HPCSim.2015.7237034(151-159)Online publication date: Jul-2015
  • (2013)Automatic Generation of Communications for Redundant Multi-dimensional Data Parallel Redistributions2013 IEEE 10th International Conference on High Performance Computing and Communications & 2013 IEEE International Conference on Embedded and Ubiquitous Computing10.1109/HPCC.and.EUC.2013.116(800-811)Online publication date: Nov-2013
  • (2010)Compilation for Distributed Memory ArchitecturesThe Compiler Design Handbook10.1201/9781420040579.ch11Online publication date: 7-Mar-2010
  • (2009)Slicing based code parallelization for minimizing inter-processor communicationProceedings of the 2009 international conference on Compilers, architecture, and synthesis for embedded systems10.1145/1629395.1629409(87-96)Online publication date: 11-Oct-2009
  • (2008)Finding Synchronization-Free Slices of Operations in Arbitrarily Nested LoopsProceedings of the international conference on Computational Science and Its Applications, Part II10.1007/978-3-540-69848-7_69(871-886)Online publication date: 30-Jun-2008
  • Show More Cited By

View Options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Login options

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media