Article

Free access

A linear-time algorithm for computing the memory access sequence in data-parallel programs

Authors:

Nenad Nedeljkovic,

Ajay SethiAuthors Info & Claims

PPOPP '95: Proceedings of the fifth ACM SIGPLAN symposium on Principles and practice of parallel programming

Pages 102 - 111

https://doi.org/10.1145/209936.209948

Published: 01 August 1995 Publication History

Abstract

Data-parallel languages, such as High Performance Fortran, are widely regarded as a promising means for writing portable programs for distributed-memory machines. Novel features of these languages call for the development of new techniques in both compilers and run-time systems. In this paper, we present an improved algorithm for finding the local memory access sequence in computations involving regular sections of arrays with cyclic(k) distributions. After establishing the fact that regular section indices correspond to elements of an integer lattice, we show how to find a lattice basis that allows for simple and fast enumeration of memory accesses. The complexity of our algorithm is shown to be lower than that of the previous solution for the same problem. In addition, the experimental results demonstrate the efficiency of our method in practice.

References

[1]

C. Ancourt, F. Coelho, F. Irigoin, and R. Keryell. A linear algebra framework for static HPF code distribution. In Proceedings of the Fourth Workshop on Compilers for Parallel Computers, Delft, The Netherlands, December 1993.

[2]

B. Chapman, P. Mehrotra, and H. Zima. Programming in Vienna Fortran. Scientific Programming, 1(1):31-50, Fall 1992.

Digital Library

[3]

S. Chatterjee. Private communication, October 1994.

[4]

S. Chatterjee, J. Gilbert, F. Long, R. Schreiber, and S. Teng. Generating local addresses and communication sets for dataparallel programs. In Proceedings of the Fourth A CM SIG- PLAN Symposium on Principles and Practice of Parallel Programming, San Diego, CA, May 1993.

Digital Library

[5]

T.H. Cormen, C,E. Lelserson, and R.L. Rivest. Introduction to Algorithms. The MIT Press, Cambridge, MA, 1990.

Digital Library

[6]

J. Dongarra, R. van de Geijn, and D. Walker. A look at scalable dense linear algebra libraries. In Proceedings of the 1992 Scalable High Performance Computing Conference, pages 372-379, Williamsburg, VA, April 1992.

[7]

S.K.S. Gupta, S.D. Kaushik, C.-H. Huang, and P. Sadayappan. On compiling array expressions for efficient execution on distributed-memory machines. Teclmical Report OSE- CISRC-4/94-TR19, Department of Computer and Information Science, The Ohio State University, April 1994.

[8]

High Performance Fortran Forum. High Performance Fortran language specification. Scientific Programming, 2(1- 2):1-170, 1993.

[9]

S. Hiranandani, K. Kennedy, J. Mellor-Crummey, and A. Sethi. Compilation techniques for block-cyclic distributions. In Proceedings of the 199# ACM international Conference on Supercomputlng, Manchester, England, July 1994.

Digital Library

[10]

S. Hiranandani, K. Kemmdy, and C.-W. Tseng. Compiling Fortran D for MIMD distributed-memory machines. Communications of the A CM, 35(8):66-80, August 1992.

Digital Library

[11]

R. Kannan. Algorithmic geometry of numbers. In J. Traub, editor, Annual Review of Computer Science. Annual Reviews hlc., Polo Alto, CA, 1987.

[12]

K. Kennedy, N. Nedeljkovid, and A. Sethi. Efficient address generation for block-cyclic distributions. In Proceedings of the 1995 A CM International Conference on Supercomputing, Barcelona, Spain, July 1995.

Digital Library

[13]

A. Knies, M. O'Keefe, and T. MacDonald. High Performance Fortran: A practical analysis. Scientific Programming, 3(3):187-199, Fall 1994.

Digital Library

[14]

C. Koelbel, D. Loveman, R. Schreiber# G. Steele, Jr., and M. Zosel. The High Performance Fortran Handbook. The MIT Press, Cambridge, MA, 1994.

Digital Library

[15]

C. van Reeuwijk, H. J. Sips, W. Denissen, a#ld E. M. Paalvast. Implementing HPF distributed arrays on a mesagepassing parallel computer. Tectutical report, Advanced School of Computing and Imaging, Delft University of Technology, February 1995.

[16]

J. Stichnoth, D. O'Hallaxon, aald T. Gross. Generating communication for array statements: Design, implementation, and evaluation. Journal of Parallel and Distributed Computing, 21(1):150-159, April 1994.

Digital Library

Cited By

Ziogas AKwasniewski GBen-Nun TSchneider THoefler TWolf FShende SCulhane CAlam SJagode H(2022)DeinsumProceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis10.5555/3571885.3571918(1-15)Online publication date: 13-Nov-2022
https://dl.acm.org/doi/10.5555/3571885.3571918
Ziogas AKwasniewski GBen-Nun TSchneider THoefler T(2022)Deinsum: Practically I/O Optimal Multi-Linear AlgebraSC22: International Conference for High Performance Computing, Networking, Storage and Analysis10.1109/SC41404.2022.00030(1-15)Online publication date: Nov-2022
https://doi.org/10.1109/SC41404.2022.00030
Mellor-Crummey JHiranandani SSethi A(2014)Author retrospectiveACM International Conference on Supercomputing 25th Anniversary Volume10.1145/2591635.2591651(29-31)Online publication date: 10-Jun-2014
https://dl.acm.org/doi/10.1145/2591635.2591651
Show More Cited By

Index Terms

A linear-time algorithm for computing the memory access sequence in data-parallel programs
1. Computing methodologies
  1. Concurrent computing methodologies
    1. Concurrent programming languages
2. Software and its engineering
  1. Software notations and tools
    1. Compilers
    2. General programming languages
      1. Language types
        Concurrent programming languages

Recommendations

A linear-time algorithm for computing the memory access sequence in data-parallel programs

Data-parallel languages, such as High Performance Fortran, are widely regarded as a promising means for writing portable programs for distributed-memory machines. Novel features of these languages call for the development of new techniques in both ...
Runtime data management on non-volatile memory-based heterogeneous memory for task-parallel programs
SC '18: Proceedings of the International Conference for High Performance Computing, Networking, Storage, and Analysis

Non-volatile memory (NVM) provides a scalable solution to replace DRAM as main memory. Because of relatively high latency and low bandwidth of NVM (comparing with DRAM), NVM often pairs with DRAM to build a heterogeneous main memory system (HMS). ...
Efficient memory partitioning for parallel data access in multidimensional arrays
DAC '15: Proceedings of the 52nd Annual Design Automation Conference

Memory bandwidth bottlenecks severely restrict parallel access of data from memory arrays. To increase bandwidth, memory partitioning algorithms have been proposed to access multiple memory banks simultaneously. However, previous partitioning schemes ...

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Conferences

PPOPP '95: Proceedings of the fifth ACM SIGPLAN symposium on Principles and practice of parallel programming

August 1995

234 pages

ISBN:0897917006

DOI:10.1145/209936

Chairmen:
Jeanne Ferrante
Univ. of California, San Diego
,
David Padua
Univ. of Illinois at Urbana-Champaign, Urbana
,
Editor:
Richard L. Wexelblat
IRS IS:AO, Washington, DC

ACM SIGPLAN Notices Volume 30, Issue 8
Aug. 1995
226 pages
ISSN:0362-1340
EISSN:1558-1160
DOI:10.1145/209937
Editors:
Richard L. Wexelblat
IRS IS:AO, Washington, DC
,
Jeanne Ferrante
Univ. of California, San Diego
,
David Padua
Univ. of Illinois at Urbana-Champaign, Urbana
Issue’s Table of Contents

Copyright © 1995 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGPLAN: ACM Special Interest Group on Programming Languages

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 01 August 1995

Permissions

Request permissions for this article.

Request Permissions

Check for updates

Qualifiers

Article

Conference

PPoPP95

Sponsor:

SIGPLAN

PPoPP95: Principles & Practices of Parallel Programming

July 19 - 21, 1995

California, Santa Barbara, USA

Acceptance Rates

Overall Acceptance Rate 230 of 1,014 submissions, 23%

Contributors

Other Metrics

View Article Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

43
Total Citations
View Citations
371
Total Downloads

Downloads (Last 12 months)81
Downloads (Last 6 weeks)19

Reflects downloads up to 19 Nov 2024

Other Metrics

View Author Metrics

Citations

Cited By

Ziogas AKwasniewski GBen-Nun TSchneider THoefler TWolf FShende SCulhane CAlam SJagode H(2022)DeinsumProceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis10.5555/3571885.3571918(1-15)Online publication date: 13-Nov-2022
https://dl.acm.org/doi/10.5555/3571885.3571918
Ziogas AKwasniewski GBen-Nun TSchneider THoefler T(2022)Deinsum: Practically I/O Optimal Multi-Linear AlgebraSC22: International Conference for High Performance Computing, Networking, Storage and Analysis10.1109/SC41404.2022.00030(1-15)Online publication date: Nov-2022
https://doi.org/10.1109/SC41404.2022.00030
Mellor-Crummey JHiranandani SSethi A(2014)Author retrospectiveACM International Conference on Supercomputing 25th Anniversary Volume10.1145/2591635.2591651(29-31)Online publication date: 10-Jun-2014
https://dl.acm.org/doi/10.1145/2591635.2591651
Ramanujam JDutta SVenkatachar A(2005)Code generation for complex subscripts in data-parallel programsLanguages and Compilers for Parallel Computing10.1007/BFb0032683(49-63)Online publication date: 9-Jun-2005
https://doi.org/10.1007/BFb0032683
Midkiff S(2005)Optimizing the representation of local iteration sets and access sequences for block-cyclic distributionsLanguages and Compilers for Parallel Computing10.1007/BFb0017267(420-434)Online publication date: 10-Jun-2005
https://doi.org/10.1007/BFb0017267
Venkatachar ARamanujam JThirumalai A(2005)Generalized overlap regions for communication optimization in data-parallel programsLanguages and Compilers for Parallel Computing10.1007/BFb0017266(404-419)Online publication date: 10-Jun-2005
https://doi.org/10.1007/BFb0017266
Stichnoth JGross T(2005)A communication backend for parallel language compilersLanguages and Compilers for Parallel Computing10.1007/BFb0014202(224-238)Online publication date: 9-Jun-2005
https://doi.org/10.1007/BFb0014202
Kaushik SHuang CSadayappan P(2005)Compiling array statements for efficient execution on distributed-memory machines: Two-level mappingsLanguages and Compilers for Parallel Computing10.1007/BFb0014201(209-223)Online publication date: 9-Jun-2005
https://doi.org/10.1007/BFb0014201
Thirumalai ARamanujam J(2005)Fast address sequence generation for data-parallel programs using integer latticesLanguages and Compilers for Parallel Computing10.1007/BFb0014200(191-208)Online publication date: 9-Jun-2005
https://doi.org/10.1007/BFb0014200
Hwang G(2004)An efficient algorithm for communication set generation of data parallel programs with block-cyclic distributionParallel Computing10.1016/j.parco.2004.02.00130:4(473-501)Online publication date: 1-Apr-2004
https://dl.acm.org/doi/10.1016/j.parco.2004.02.001
Show More Cited By

View Options

View options

PDF

View or Download as a PDF file.

eReader

View online with eReader.

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

Media

Figures

Other

Tables

View Table of Contents