research-article

Combining Static and Dynamic Data Coalescing in Unified Parallel C

Published: 01 February 2016

Abstract

Significant progress has been made in the development of programming languages and tools suitable for hybrid computer architectures that group several shared-memory multicores interconnected through a network. This paper addresses important limitations in code generation for partitioned global address space (PGAS) languages. These languages allow fine-grained communication and lead to programs that perform many fine-grained accesses to data. When the data is distributed to remote computing nodes, code transformations are required to prevent performance degradation. Until now, code transformations for PGAS programs have been restricted to cases where either the physical mapping of the data or the number of processing nodes is known at compilation time. In this paper, a novel application of the inspector-executor model overcomes these limitations and enables profitable code transformations, which result in fewer and larger messages sent through the network, even when neither the data mapping nor the number of processing nodes is known at compilation time. A performance evaluation reports both scaling and absolute performance numbers on up to 32,768 cores of a Power 775 supercomputer. This evaluation indicates that the compiler transformation yields speedups between 1.15× and 21× over a baseline and that these automated transformations achieve up to 63 percent of the performance of the MPI versions.

References

[1]
J. Protic, M. Tomasevic, and V. Milutinovic, “Distributed shared memory: Concepts and systems,” IEEE Parallel Distrib. Technol.: Syst. Appl., vol. 4, no. 2, pp. 63–71, Jun. 1996.
[2]
B. Nitzberg and V. Lo, “Distributed shared memory: A survey of issues and algorithms,” Computer, vol. 24, no. 8, pp. 52–60, 1991.
[3]
C. Amza, A. L. Cox, H. Dwarkadas, P. Keleher, H. Lu, R. Rajamony, W. Yu, and W. Zwaenepoel, “TreadMarks: Shared memory computing on networks of workstations,” IEEE Comput., vol. 29, no. 2, pp. 18–28, Feb. 1996.
[4]
A. Itzkovitz and A. Schuster, “MultiView and Millipage—Fine-grain sharing in page-based DSMs,” in Proc. 3rd Symp. Oper. Syst. Des. Implementation, 1999, pp. 215–228.
[5]
UPC Consortium. (2013). UPC Specifications, V1.3 [Online]. Available: http://upc.gwu.edu/documentation.html
[6]
R. W. Numrich and J. Reid, “Co-array Fortran for parallel programming,” Rutherford Appleton Lab., Chilton, Oxfordshire, England, Tech. Rep. RAL-TR-1998-060, 1998.
[7]
E. Allen, D. Chase, J. Hallett, V. Luchangco, J.-W. Maessen, S. Ryu, G. L. Steele Jr., and S. Tobin-Hochstadt. (2008, Mar.). The Fortress language specification version 1.0 [Online]. Available: http://labs.oracle.com/projects/plrg/Publications/fortress.1.0.pdf
[8]
Cray Inc. (2011, Apr.). Chapel language specification version 0.8 [Online]. Available: http://chapel.cray.com/spec/spec-0.8.pdf
[9]
P. Charles, C. Grothoff, V. Saraswat, C. Donawa, A. Kielstra, K. Ebcioglu, C. von Praun, and V. Sarkar, “X10: An object-oriented approach to non-uniform cluster computing,” ACM SIGPLAN Notices, vol. 40, no. 10, pp. 519–538, Oct. 2005.
[10]
K. A. Yelick, L. Semenzato, G. Pike, C. Miyamoto, B. Liblit, A. Krishnamurthy, P. N. Hilfinger, S. L. Graham, D. Gay, P. Colella, and A. Aiken, “Titanium: A high-performance Java dialect,” Concurrency-Practice Exp., vol. 10, nos. 11/13, pp. 825–836, 1998.
[11]
MPI Forum. (2014). MPI: A Message-Passing Interface Standard [Online]. Available: http://www.mpi-forum.org
[12]
W.-Y. Chen, C. Iancu, and K. Yelick, “Communication optimizations for fine-grained UPC applications,” in Proc. 14th Int. Conf. Parallel Archit. Compilation Techn., 2005, pp. 267–278.
[13]
D. Chavarria-Miranda and J. Mellor-Crummey, “Effective communication coalescing for data-parallel applications,” in Proc. 10th ACM SIGPLAN Symp. Principles Practice Parallel Programm., 2005, pp. 14–25.
[14]
C. Barton, G. Almasi, M. Farreras, and J. N. Amaral, “A unified parallel C compiler that implements automatic communication coalescing,” presented at the 14th Workshop Compilers Parallel Comput., Zurich, Switzerland, 2009.
[15]
R. Rajamony, L. Arimilli, and K. Gildea, “PERCS: The IBM POWER7-IH high-performance computing system,” IBM J. Res. Develop., vol. 55, no. 3, pp. 3:1–3:12, 2011.
[16]
B. Arimilli, R. Arimilli, V. Chung, S. Clark, W. Denzel, B. Drerup, T. Hoefler, J. Joyner, J. Lewis, J. Li, N. Ni, and R. Rajamony, “The PERCS high-performance interconnect,” in Proc. 18th Annu. Symp. High-Perform. Interconnects, 2010, pp. 75–82.
[17]
J. H. Saltz, R. Mirchandaney, and K. Crowley, “Run-time parallelization and scheduling of loops,” IEEE Trans. Comput., vol. 40, no. 5, pp. 603–612, May 1991.
[18]
C. Koelbel and P. Mehrotra, “Compiling global name-space parallel loops for distributed execution,” IEEE Trans. Parallel Distrib. Syst., vol. 2, no. 4, pp. 440–451, Oct. 1991.
[19]
P. Brezany, M. Gerndt, and V. Sipkova, “SVM support in the Vienna Fortran compilation system,” Julich Supercomputing Centre, KFA Juelich, Tech. Rep. KFA-ZAM-IB-9401, 1994.
[20]
J. Su and K. Yelick, “Automatic support for irregular computations in a high-level language,” in Proc. 19th IEEE Int. Parallel Distrib. Process. Symp., 2005, p. 56b.
[21]
International Organization for Standardization, “ISO/IEC 9899:TC2 Programming Languages—C,” May 2005.
[22]
M. Gupta, E. Schonberg, and H. Srinivasan, “A unified framework for optimizing communication in data-parallel programs,” IEEE Trans. Parallel Distrib. Syst., vol. 7, no. 7, pp. 689–704, Jul. 1996.
[23]
D. Yokota, S. Chiba, and K. Itano, “A new optimization technique for the inspector-executor method,” in Proc. Int. Conf. Parallel Distrib. Comput. Syst., 2002, pp. 706–711.
[24]
W.-Y. Chen, D. Bonachea, C. Iancu, and K. Yelick, “Automatic nonblocking communication for partitioned global address space programs,” in Proc. 21st Annu. Int. Conf. Supercomput., 2007, pp. 158–167.
[25]
C. M. Barton, “Improving access to shared data in a partitioned global address space programming model,” Ph.D. dissertation, Dept. of Computing Science, University of Alberta, Edmonton, AB, Canada, 2009.
[26]
Y. Dotsenko, C. Coarfa, and J. Mellor-Crummey, “A multi-platform Co-array Fortran compiler,” in Proc. 13th Int. Conf. Parallel Arch. Compilation Techn., 2004, pp. 29–40.
[27]
K. Ebcioglu, V. Saraswat, and V. Sarkar, “X10: Programming for hierarchical parallelism and non-uniform data access,” in Proc. Int. Workshop Lang. Runtimes, 2004, pp. 519–538.
[28]
A. Sanz, R. Asenjo, J. Lopez, R. Larrosa, A. Navarro, V. Litvinov, S.-E. Choi, and B. L. Chamberlain, “Global data re-allocation via communication aggregation in Chapel,” in Proc. 24th Int. Symp. Comput. Archit. High Perform. Comput., 2012, pp. 235–242.
[29]
M. Alvanos, M. Farreras, E. Tiotto, and X. Martorell, “Automatic communication coalescing for irregular computations in UPC language,” in Proc. Conf. Center Adv. Studies Collaborative Res., 2012, pp. 220–234.
[30]
M. Alvanos and E. Tiotto, “Data prefetching and coalescing for partitioned global address space languages,” US Patent App. 13/659,048, Oct. 24, 2012.
[31]
M. Alvanos, M. Farreras, E. Tiotto, J. N. Amaral, and X. Martorell, “Improving communication in PGAS environments: Static and dynamic coalescing in UPC,” in Proc. 27th Annu. Int. Conf. Supercomput., 2013, pp. 129–138.
[32]
G. Tanase, G. Almási, E. Tiotto, M. Alvanos, A. Ly, and B. Dalton, “Performance analysis of the IBM XL UPC on the PERCS architecture,” Tech. Rep. IBM RC25360, 2013.
[33]
G. I. Tanase, G. Almási, H. Xue, and C. Archer, “Composable, non-blocking collective operations on power7 IH,” in Proc. 26th ACM Int. Conf. Supercomput., 2012, pp. 215–224.
[34]
R. Kalla, B. Sinharoy, W. Starke, and M. Floyd, “POWER7: IBM's next-generation server processor,” IEEE Micro, vol. 30, no. 2, pp. 7–15, Mar./Apr. 2010.
[35]
T. El-Ghazawi and F. Cantonnet, “UPC performance and potential: A NPB experimental study,” in Proc. ACM/IEEE Conf. Supercomput., 2002, pp. 1–26.
[36]
S. Aarseth, Gravitational N-Body Simulations: Tools and Algorithms, ser. Cambridge Monographs on Mathematical Physics. Cambridge, U.K.: Cambridge Univ. Press, 2003.
[37]
A. K. Dewdney, “Computer recreations: Sharks and fish wage an ecological war on the toroidal planet Wa-Tor,” Sci. Amer., vol. 251, pp. 14–22, 1984.
[38]
T. H. Cormen, C. Stein, R. L. Rivest, and C. E. Leiserson, Introduction to Algorithms, 2nd ed. New York, NY, USA: McGraw-Hill, 2001.
[39]
C. Barton, C. Cascaval, G. Almasi, Y. Zheng, M. Farreras, S. Chatterjee, and J. N. Amaral, “Shared memory programming for large scale machines,” in Proc. ACM Conf. Programm. Lang. Des. Implementation, Jun. 2006, pp. 108–117.
[40]
K. J. Barker, A. Hoisie, and D. J. Kerbyson, “An early performance analysis of POWER7-IH HPC systems,” in Proc. Int. Conf. High Perform. Comput., Netw., Storage Anal., 2011, pp. 42:1–42:11.
[41]
IBM Redbooks, IBM Power Systems 775 for AIX and Linux HPC Solution, IBM Corp., Armonk, NY, USA, 2012.

Publisher

IEEE Press
