Article

Compilation techniques for block-cyclic distributions

Authors:

Seema Hiranandani,

John Mellor-Crummey,

Ajay Sethi

ICS '94: Proceedings of the 8th international conference on Supercomputing

Pages 392 - 403

https://doi.org/10.1145/181181.181572

Published: 16 July 1994

Abstract

Compilers for data-parallel languages such as Fortran D and High Performance Fortran use data alignment and distribution specifications as the basis for translating programs for execution on MIMD distributed-memory machines. This paper describes techniques for generating efficient code for programs that use block-cyclic distributions. These techniques can be applied to programs with symbolic loop bounds, symbolic array dimensions, and loops with non-unit strides. We present algorithms for computing the data elements that must be communicated among processors for loops with both unit and non-unit strides, a linear-time algorithm for computing the memory access sequence for loops with non-unit strides, and experimental results for a hand-compiled test case using block-cyclic distributions.
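
To make the abstract's terms concrete, the following Python sketch shows one common definition of a block-cyclic distribution and what "memory access sequence" and "communicated data elements" mean for a strided loop. It is an illustration only and does not reproduce the paper's algorithms: the access sequence and communication sets are found by brute-force enumeration of loop iterations, and no attempt is made to match the linear-time algorithm mentioned above. The function names, the 0-based indexing, the owner-computes rule, and the example statement A(i) = B(i + shift) are all assumptions introduced here.

```python
def owner(i, block, nprocs):
    """Processor owning global index i (0-based) when blocks of size
    `block` are dealt round-robin to `nprocs` processors."""
    return (i // block) % nprocs


def local_index(i, block, nprocs):
    """Offset of global index i inside its owner's local array."""
    rounds = i // (block * nprocs)      # full rounds of blocks before i
    return rounds * block + i % block   # earlier local blocks + offset in block


def local_access_sequence(lo, hi, stride, block, nprocs, proc):
    """Local offsets that `proc` touches for i = lo, lo+stride, ..., <= hi.

    Brute force: visits every iteration, so the cost grows with the
    iteration count. Assumes stride > 0.
    """
    return [local_index(i, block, nprocs)
            for i in range(lo, hi + 1, stride)
            if owner(i, block, nprocs) == proc]


def communication_sets(lo, hi, stride, shift, block, nprocs):
    """Elements of B that must be sent for A(i) = B(i + shift).

    Assumes A and B share the same distribution and that the owner of
    A(i) performs the assignment (owner-computes rule).
    Returns {(sender, receiver): [global indices of B]}.
    """
    sets = {}
    for i in range(lo, hi + 1, stride):
        receiver = owner(i, block, nprocs)
        sender = owner(i + shift, block, nprocs)
        if sender != receiver:
            sets.setdefault((sender, receiver), []).append(i + shift)
    return sets


if __name__ == "__main__":
    # Hypothetical configuration: block size 4 on 3 processors, stride 5.
    for p in range(3):
        print(p, local_access_sequence(0, 40, 5, block=4, nprocs=3, proc=p))
    print(communication_sets(0, 11, 1, shift=1, block=4, nprocs=3))
```

With a non-unit stride, the gaps between successive local offsets on a processor are irregular, which is why enumerating the sequence efficiently is a compilation problem in its own right.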

Index Terms

Compilation techniques for block-cyclic distributions
1. Software and its engineering
  1. Software notations and tools
    1. Compilers

Published In

ICS '94: Proceedings of the 8th international conference on Supercomputing

July 1994

452 pages

ISBN: 0897916654

DOI: 10.1145/181181

Chairmen:
John Gurd (Univ. of Manchester, Manchester, UK),
William Jalby (Univ. de Versailles, France)

Copyright © 1994 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGARCH: ACM Special Interest Group on Computer Architecture

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Article

Conference

ICS94

ICS94: International Conference on Supercomputing '94

July 11 - 15, 1994

Manchester, England

Acceptance Rates

ICS '94 Paper Acceptance Rate 45 of 114 submissions, 39%;

Overall Acceptance Rate 629 of 2,180 submissions, 29%

Bibliometrics

Total Citations: 49
Total Downloads: 362
Downloads (Last 12 months): 102
Downloads (Last 6 weeks): 27

Reflects downloads up to 22 Nov 2024
