Nothing Special   »   [go: up one dir, main page]

skip to main content
10.1145/155090.155102acmconferencesArticle/Chapter ViewAbstractPublication PagespldiConference Proceedingsconference-collections
Article
Free access

Communication optimization and code generation for distributed memory machines

Published: 01 June 1993 Publication History

Abstract

This paper presents several algorithms to solve code generation and optimization problems specific to machines with distributed address spaces. Given a description of how the computation is to be partitioned across the processors in a machine, our algorithms produce an SPMD (single program multiple data) program to be run on each processor. Our compiler generated the necessary receive and send instructions, optimizes the communication by eliminating redundant communication and aggregating small messages into large messages, allocates space locally on each processor, and translates global data addresses to local addresses.
Our techniques are based on an exact data-flow analysis on individual array element accesses. Unlike data dependence analysis, this analysis determines if two dynamic instances refer to the same value, and not just to the same location. Using this information, our compiler can handle more flexible data decompositions and find more opportunities for communication optimization than systems based on data dependence analysis.
Our technique is based on a uniform framework, where data decompositions, computation decompositions and the data flow information are all represented as systems of linear inequalities. We show that the problems of communication code generation, local memory management, message aggregation and redundant data communication elimination can all be solved by projecting polyhedra represented by sets of inequalities onto lower dimensional spaces.

References

[1]
C. Ancourt and F. Irigoin. Scanning Polyhedra with DO Loops. In Third A CM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPOPP, pp. 39-50, April 1991.
[2]
M. Ancourt. Generation automatique de codes de transfert pour multt~vrocesseurs a memoires locales. PhD thesis, University of Paris VI, March 1991.
[3]
J. M. Anderson and M. S. Lam. Global Optimizations for Parallelism and Locality on Scalable Parallel Machines. In Proceedings of the SIGPLAN '93 Conference on Program Language Design and Implementation, June 1993.
[4]
F. Andr6, O. Ch6ron and J.-L. Pazat. Compiling Sequential Programs for Distributed Memory Parallel Computers with Pandore II. In Third Workshop on Compilers for Parallel Computers, pp. 231-242, July 1992.
[5]
U. Banerjee. Dependence Analysis for Supercomputing. Kluwer Academic, 1988.
[6]
W. Blume and R. Eigenmann. Performance Analysis of Parallelizing Compilers on the Perfect Benchmarks Programs. In IEEE Transactions on Parallel and Distributed Systems, vol. 3, no. 6, pp. 643-656, November 1992.
[7]
M. Bromley, S. Heller, T. McNerney and G. L. Steele Jr. Fortran at Ten Gigaflops: The Connection Machine Convolution Compiler. In Proceedings of A CM Conference on Programming Language Design and Implementation, pp. 145-164, June 1991.
[8]
B. Chapman, P. Mehrotra, and H. Zima. Programming in Vienna Fortran. In Third Workshop on Compilers for Parallel Computers, pp. 121-160, July 1992.
[9]
C. Koelbel. Compile-time generation of regular communication patterns. In Proceedings of Supercomputing '91, pp. 101-110, November 1991.
[10]
P. Feautrier, Array expansion. In International Conference on Supercomputing, pp. 429-442, 1988.
[11]
P. Feautrier, Parametric integer programming. Technical Report 209, Laboratoire Methodologie and Architecture Des Systems Informatiques, January 1988.
[12]
P. Feautrier. Dataflow analysis of array and scalar references. In International Journal of Parallel Programming, 20(1):23- 52, February 1991.
[13]
G. Gannon, W. Jalby and K Gallivan. Strategies for cache and local memory management by global program transformation. In Journal of Parallel and Distributed Computing, 5:587-616, 1988.
[14]
G. R. Gao, R. Olsen, V. Sarkar and R. Thekkath. Collective loop fusion for array contraction. In Proceedings of the Fifth Workshop on Programming Languages and Compilers for Parallel Computing, pp. 171-181, August 1992.
[15]
P. Havlak and K. Kennedy. An implementation of interprocedural bounded regular section analysis. In IEEE Transactions on Parallel and Distributed Systems, vol. 2, no. 3, pp. 350-360, July 1991.
[16]
S. Hiranandani, K. Kennedy and C. Tseng. Compiling Fortran D for MIMD Distributed-Memory Machines. In Communications of ACM, vol. 35, no. 8, pp. 66-80, August 1992.
[17]
L. G. C. Hamey, J. A. Webb and I. C. Wu. An architecture independent programming language for low-level vision. Computer Vision, Graphics, and Image Processing, vol. 48, no. 2, pp. 246-264, November 1989.
[18]
J. Li and M. Chen. Index domain alignment: Minimizing cost of cross-referencing between distributed arrays, in Proceedings of Frontiers '90: The Third Symposium on the Frontiers of Massively Parallel Computation. pp. 424-432. October 1990.
[19]
D. E. Maydan, S. P. Amarasinghe and M. S. Lain. Array Data- Flow Analysis and its Use in Array Privatization. in Proceedings of ACM Conference on Principles of Programming Languages, pp. 2-15. January 1993.
[20]
D. E. Maydan, Accurate Analysis of Array References. PhD thesis, Stanford University, September 1992. Published as CSL-TR-92-547.
[21]
P. Mehrotra and J. Van Rosendale. High Level Programming of Distributed Memory Architectures. In A. Nicolau, D. Gelernter, T. Gross and D. Padua editors, Advances in Languages and Compilers for Parallel Processing, pp. 364- 384, The MIT Press, Cambridge, Massachusetts, 1991.
[22]
K. Pingali and A. Rogers. Process decomposition through locality of reference. In Proceedings of the SIGPLAN '89 Conference on Program Language Design and Implementation, pp. 69-80, June 1989.
[23]
A. Schrijver, Theory of Linear and Integer Programming, Wiley, Chichester' 1986.
[24]
G. L. Steele. Proposal for alignment and distribution directives in HPF. Draft presented at HPF Forum meeting, June 1992.
[25]
S. Tjiang, M. Wolf, M. Lain, K. Pieper and J. Hennessy. Integrating scalar optimizations and parallelization. In U. Banerjee, D. Gelernter, A. Nicolau and D. Padua editors, Languages and Compilers for Parallel Computing, pp. 137- 151, Springer-Verlag, Berlin, Germany, 1992.
[26]
C.-W. Tseng. An Optimizing Fortran D Compiler for MIMD Distributed-Memory Machines. PhD thesis, Rice University, January 1993. Published as Rice COMP TR93-199.
[27]
M. E. Wolf and M. S. Lam. A data locality optimizing algorithm. In SIGPLAN Notices, vol. 26, no. 6, pp. 30-44, June 1991.
[28]
M. E. Wolf. Improving locality and parallelism in nested loops. PhD thesis, Stanford University, August 1992. Published as CSL-TR-92-538.

Cited By

View all
  • (2024)Cyclebite: Extracting Task Graphs From Unstructured Compute-ProgramsIEEE Transactions on Computers10.1109/TC.2023.332750473:1(221-234)Online publication date: Jan-2024
  • (2023)Automatic Generation of Distributed-Memory Mappings for Tensor ComputationsProceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis10.1145/3581784.3607096(1-13)Online publication date: 12-Nov-2023
  • (2023)Automation of Programming for Promising High-Performance Computing SystemsParallel Computing Technologies10.1007/978-3-031-41673-6_1(3-17)Online publication date: 15-Aug-2023
  • Show More Cited By

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Conferences
PLDI '93: Proceedings of the ACM SIGPLAN 1993 conference on Programming language design and implementation
August 1993
313 pages
ISBN:0897915984
DOI:10.1145/155090
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 01 June 1993

Permissions

Request permissions for this article.

Check for updates

Qualifiers

  • Article

Conference

PLDI93
Sponsor:

Acceptance Rates

Overall Acceptance Rate 406 of 2,067 submissions, 20%

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)162
  • Downloads (Last 6 weeks)20
Reflects downloads up to 22 Nov 2024

Other Metrics

Citations

Cited By

View all
  • (2024)Cyclebite: Extracting Task Graphs From Unstructured Compute-ProgramsIEEE Transactions on Computers10.1109/TC.2023.332750473:1(221-234)Online publication date: Jan-2024
  • (2023)Automatic Generation of Distributed-Memory Mappings for Tensor ComputationsProceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis10.1145/3581784.3607096(1-13)Online publication date: 12-Nov-2023
  • (2023)Automation of Programming for Promising High-Performance Computing SystemsParallel Computing Technologies10.1007/978-3-031-41673-6_1(3-17)Online publication date: 15-Aug-2023
  • (2020)OmpMemOpt: Optimized Memory Movement for Heterogeneous ComputingEuro-Par 2020: Parallel Processing10.1007/978-3-030-57675-2_13(200-216)Online publication date: 24-Aug-2020
  • (2016)Retargetable Communication for Distributed Programs2016 12th International ACM SIGSOFT Conference on Quality of Software Architectures (QoSA)10.1109/QoSA.2016.8(21-30)Online publication date: Apr-2016
  • (2015)Automatic cluster parallelization and minimizing communication via selective data replication2015 IEEE High Performance Extreme Computing Conference (HPEC)10.1109/HPEC.2015.7322481(1-7)Online publication date: Sep-2015
  • (2015)Code Generation for Distributed-Memory ArchitecturesThe Computer Journal10.1093/comjnl/bxv077(bxv077)Online publication date: 15-Sep-2015
  • (2014)ASCACM SIGARCH Computer Architecture News10.1145/2654822.254198542:1(575-590)Online publication date: 24-Feb-2014
  • (2014)ASCACM SIGPLAN Notices10.1145/2644865.254198549:4(575-590)Online publication date: 24-Feb-2014
  • (2014)ASCProceedings of the 19th international conference on Architectural support for programming languages and operating systems10.1145/2541940.2541985(575-590)Online publication date: 24-Feb-2014
  • Show More Cited By

View Options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Login options

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media