DOI: 10.1145/1065944.1065948
Article

Effective communication coalescing for data-parallel applications

Published: 15 June 2005

Abstract

Communication coalescing is a static optimization that can reduce both communication frequency and redundant data transfer in compiler-generated code for regular, data-parallel applications. We present an algorithm for coalescing the communication that arises when generating code for such applications written in High Performance Fortran (HPF). To handle sophisticated computation partitionings, our algorithm normalizes communication before attempting coalescing. We experimentally evaluate our algorithm, which is implemented in the dHPF compiler, by compiling HPF versions of the NAS application benchmarks SP, BT, and LU. Our normalized coalescing algorithm improves the performance and scalability of compiler-generated code for these benchmarks: it reduces communication volume by up to 55% compared to a simpler coalescing strategy, and it enables us to match the communication volume and frequency of hand-optimized MPI implementations of these codes.
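To make the general idea concrete, the following is a minimal sketch of coalescing, not the algorithm from the paper (which operates on normalized communication sets inside the dHPF compiler). It assumes each pending transfer is a hypothetical `(partner, start, stop)` tuple with a half-open index range, and it merges overlapping or adjacent ranges bound for the same partner so that fewer messages are sent and no element is sent twice. The names `coalesce` and `volume` are illustrative only.

```python
from collections import defaultdict


def coalesce(messages):
    """Merge transfers bound for the same partner.

    messages: iterable of (partner, start, stop), half-open index ranges.
    Returns a dict mapping partner -> sorted list of disjoint (start, stop).
    """
    by_partner = defaultdict(list)
    for partner, start, stop in messages:
        by_partner[partner].append((start, stop))

    merged = {}
    for partner, ranges in by_partner.items():
        ranges.sort()
        out = [ranges[0]]
        for start, stop in ranges[1:]:
            last_start, last_stop = out[-1]
            if start <= last_stop:
                # Overlapping or adjacent: extend the previous range.
                out[-1] = (last_start, max(last_stop, stop))
            else:
                out.append((start, stop))
        merged[partner] = out
    return merged


def volume(ranges):
    """Total number of elements covered by a list of (start, stop) ranges."""
    return sum(stop - start for start, stop in ranges)


# Hypothetical transfers: three to partner 1 (two of them overlap), one to partner 2.
messages = [(1, 0, 10), (1, 5, 15), (1, 15, 20), (2, 0, 4)]
plan = coalesce(messages)

volume_before = volume([(start, stop) for _, start, stop in messages])
volume_after = sum(volume(ranges) for ranges in plan.values())
messages_after = sum(len(ranges) for ranges in plan.values())
```

In this toy input, four transfers collapse into two, and the elements in the overlap between the first two ranges are sent only once. A real compiler would perform the equivalent reasoning statically over symbolic index sets rather than concrete tuples.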



Published In

PPoPP '05: Proceedings of the tenth ACM SIGPLAN symposium on Principles and practice of parallel programming
June 2005
310 pages
ISBN: 1595930809
DOI: 10.1145/1065944

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. communication analysis and optimization
  2. data-parallelism
  3. High Performance Fortran (HPF)
  4. parallel languages

Qualifiers

  • Article

Conference

PPoPP '05

Acceptance Rates

Overall Acceptance Rate 230 of 1,014 submissions, 23%


Cited By

  • (2020) OmpMemOpt: Optimized Memory Movement for Heterogeneous Computing. Euro-Par 2020: Parallel Processing, pp. 200-216. DOI: 10.1007/978-3-030-57675-2_13. Online publication date: 24-Aug-2020
  • (2019) Combining Static and Dynamic Data Coalescing in Unified Parallel C. IEEE Transactions on Parallel and Distributed Systems, 27:2, pp. 381-393. DOI: 10.1109/TPDS.2015.2405551. Online publication date: 1-Jan-2019
  • (2015) LLVM-based communication optimizations for PGAS programs. Proceedings of the Second Workshop on the LLVM Compiler Infrastructure in HPC, pp. 1-11. DOI: 10.1145/2833157.2833164. Online publication date: 15-Nov-2015
  • (2014) Affine Loop Optimization Based on Modulo Unrolling in Chapel. Proceedings of the 8th International Conference on Partitioned Global Address Space Programming Models, pp. 1-12. DOI: 10.1145/2676870.2676877. Online publication date: 6-Oct-2014
  • (2014) Reducing Compiler-Inserted Instrumentation in Unified-Parallel-C Code Generation. Proceedings of the 2014 IEEE 26th International Symposium on Computer Architecture and High Performance Computing, pp. 270-277. DOI: 10.1109/SBAC-PAD.2014.34. Online publication date: 22-Oct-2014
  • (2013) Generating efficient data movement code for heterogeneous architectures with distributed-memory. Proceedings of the 22nd international conference on Parallel architectures and compilation techniques, pp. 375-386. DOI: 10.5555/2523721.2523771. Online publication date: 7-Oct-2013
  • (2013) Optimizing remote accesses for offloaded kernels. Proceedings of the Conference on Design, Automation and Test in Europe, pp. 575-580. DOI: 10.5555/2485288.2485430. Online publication date: 18-Mar-2013
  • (2013) Automatic data allocation and buffer management for multi-GPU machines. ACM Transactions on Architecture and Code Optimization, 10:4, pp. 1-26. DOI: 10.1145/2544100. Online publication date: 1-Dec-2013
  • (2013) Improving communication in PGAS environments. Proceedings of the 27th international ACM conference on International conference on supercomputing, pp. 129-138. DOI: 10.1145/2464996.2465006. Online publication date: 10-Jun-2013
  • (2013) Dynamic memory access monitoring based on tagged memory. Proceedings of the 22nd International Conference on Parallel Architectures and Compilation Techniques, pp. 409-410. DOI: 10.1109/PACT.2013.6618833. Online publication date: Oct-2013
