Article

Effective communication coalescing for data-parallel applications

Authors:

Daniel Chavarría-Miranda,

John Mellor-Crummey

PPoPP '05: Proceedings of the tenth ACM SIGPLAN symposium on Principles and practice of parallel programming

Pages 14 - 25

https://doi.org/10.1145/1065944.1065948

Published: 15 June 2005

Abstract

Communication coalescing is a static optimization that can reduce both communication frequency and redundant data transfer in compiler-generated code for regular, data-parallel applications. We present an algorithm for coalescing the communication that arises when generating code for such applications written in High Performance Fortran (HPF). To handle sophisticated computation partitionings, our algorithm normalizes communication before attempting coalescing. We experimentally evaluate our algorithm, which is implemented in the dHPF compiler, by compiling HPF versions of the NAS application benchmarks SP, BT and LU. Our normalized coalescing algorithm improves the performance and scalability of compiler-generated code for these benchmarks by reducing communication volume by up to 55% compared to a simpler coalescing strategy, and it enables us to match the communication volume and frequency of hand-optimized MPI implementations of these codes.
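As a rough illustration of the general idea of coalescing (not the algorithm from the paper, which is implemented in the dHPF compiler and is not reproduced here), the following Python sketch merges messages that target the same processor and array when their index ranges overlap or touch. The message representation `(dest, array, lo, hi)` and the helper names `coalesce` and `volume` are assumptions made for this example only.

```python
from collections import defaultdict


def coalesce(messages):
    """Merge messages for the same (dest, array) whose ranges overlap or touch.

    messages: iterable of (dest, array, lo, hi) with inclusive integer bounds.
    Returns a sorted list of merged (dest, array, lo, hi) tuples.
    This is a toy model, not the dHPF algorithm.
    """
    by_target = defaultdict(list)
    for dest, array, lo, hi in messages:
        by_target[(dest, array)].append((lo, hi))

    merged = []
    for (dest, array), ranges in sorted(by_target.items()):
        ranges.sort()
        cur_lo, cur_hi = ranges[0]
        for lo, hi in ranges[1:]:
            if lo <= cur_hi + 1:
                # Overlapping or adjacent range: extend the current message.
                cur_hi = max(cur_hi, hi)
            else:
                merged.append((dest, array, cur_lo, cur_hi))
                cur_lo, cur_hi = lo, hi
        merged.append((dest, array, cur_lo, cur_hi))
    return merged


def volume(messages):
    """Total number of array elements transferred by a list of messages."""
    return sum(hi - lo + 1 for _, _, lo, hi in messages)


# Hypothetical communication events generated for separate array references.
naive = [
    (1, "u", 0, 9),
    (1, "u", 5, 14),   # overlaps the first message
    (1, "u", 15, 19),  # adjacent to the second message
    (2, "u", 0, 4),
]
coalesced = coalesce(naive)
print(len(naive), volume(naive))
print(len(coalesced), volume(coalesced))
```

In this toy input, the three messages to processor 1 collapse into one, so both the message count (frequency) and the number of elements sent (volume) drop, which are the two quantities the abstract says coalescing targets.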

Index Terms

Effective communication coalescing for data-parallel applications
1. General and reference
  1. Cross-computing tools and techniques
    1. Performance
2. Software and its engineering
  1. Software notations and tools
    1. Compilers
      1. Source code generation
    2. General programming languages
      1. Language features
        Concurrent programming structures
      2. Language types
        Concurrent programming languages
        Distributed programming languages
        Parallel programming languages

Published In

PPoPP '05: Proceedings of the tenth ACM SIGPLAN symposium on Principles and practice of parallel programming

June 2005

310 pages

ISBN:1595930809

DOI:10.1145/1065944

General Chair: Keshav Pingali (Cornell University)

Program Chairs: Katherine Yelick (University of California, Berkeley and LBNL), Andrew Grimshaw (University of Virginia)

Copyright © 2005 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Article

Conference

PPoPP05

Sponsor:

PPoPP05: ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming 2005

June 15 - 17, 2005

Chicago, IL, USA

Acceptance Rates

Overall Acceptance Rate 230 of 1,014 submissions, 23%
