

MPI detach — Towards automatic asynchronous local completion

Published: 01 March 2022

Abstract

When aiming for large-scale parallel computing, waiting times due to network latency, synchronization, and load imbalance are the primary obstacles to high parallel efficiency. A common approach to hiding latency behind computation is the use of non-blocking communication. In the presence of a consistent load imbalance, synchronization cost is merely the visible symptom of that imbalance. Tasking approaches as in OpenMP, TBB, OmpSs, or C++20 coroutines promise to expose a higher degree of concurrency, which can be distributed across the available execution units and significantly improve load balance. The available MPI non-blocking functionality does not integrate seamlessly into such tasking parallelization. In this work, we present a slim extension of the MPI interface that allows seamless integration of non-blocking communication with the existing concepts of asynchronous execution in OpenMP and C++. Our concept makes it possible to span task dependency graphs for asynchronous execution across the full distributed-memory application. We furthermore investigate the compile-time analysis necessary to transform an application using blocking MPI communication into one that integrates OpenMP tasks with our proposed MPI interface extension.
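
The proposed interface is not reproduced on this page; the following is a minimal sketch in C of the idea described above, assuming a hypothetical MPIX_Detach call (name and signature are illustrative, not the paper's verbatim API) combined with OpenMP 5.0 detached tasks (detach clause and omp_fulfill_event). The request and a callback are handed back to the MPI library, and the callback fulfills the task's completion event once the operation completes locally, so dependent tasks are released without an explicit MPI_Wait.

    #include <stdlib.h>
    #include <mpi.h>
    #include <omp.h>

    /* Assumed interface extension (illustrative only): hand a request back to
     * the MPI library together with a callback that the library invokes on
     * local completion. */
    typedef void (MPIX_Detach_callback)(void *data);
    int MPIX_Detach(MPI_Request *request, MPIX_Detach_callback *callback, void *data);

    /* Invoked by the library once the send buffer may be reused; releases the
     * detached OpenMP task that guards the buffer. */
    static void fulfill_cb(void *data) {
        omp_event_handle_t *ev = (omp_event_handle_t *)data;
        omp_fulfill_event(*ev);
        free(ev);
    }

    void send_async(const double *buf, int count, int dest, MPI_Comm comm) {
        omp_event_handle_t event;
        /* OpenMP 5.0 detached task: it counts as completed only once the event
         * is fulfilled, so tasks with a matching depend clause wait for the send. */
        #pragma omp task detach(event) depend(in: buf[0:count])
        {
            omp_event_handle_t *ev = malloc(sizeof *ev);
            *ev = event;
            MPI_Request req;
            MPI_Isend(buf, count, MPI_DOUBLE, dest, /*tag=*/0, comm, &req);
            MPIX_Detach(&req, fulfill_cb, ev);   /* replaces MPI_Wait */
        }
    }

In this sketch the thread that created the task never blocks on the request; the MPI library decides when local completion has been reached and drives the task graph forward through the callback.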

Highlights

  • MPI interface extensions to transfer request completion back to the MPI library.
  • Callback-driven notification of asynchronous completion back to the application.
  • Prototype implementation of the interface, independent of the MPI implementation.
  • Integration of MPI communication into OpenMP task programming.
  • Compile-time analysis to convert blocking communication into non-blocking (see the sketch after this list).
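
As a companion to the last highlight, here is a hedged before/after sketch of the kind of rewrite such a compile-time analysis might produce. The MPIX_Detach name and signature are assumptions carried over from the sketch above, and process() is a hypothetical consumer of the received data, not code from the paper.

    #include <stdlib.h>
    #include <mpi.h>
    #include <omp.h>

    void process(double *buf, int count);              /* hypothetical consumer */
    extern int MPIX_Detach(MPI_Request *request,
                           void (*callback)(void *), void *data);  /* assumed extension */

    /* Before: blocking receive, the consumer runs only after MPI_Recv returns. */
    void consume_blocking(double *buf, int count, int src, MPI_Comm comm) {
        MPI_Recv(buf, count, MPI_DOUBLE, src, 0, comm, MPI_STATUS_IGNORE);
        process(buf, count);
    }

    static void fulfill(void *data) {
        omp_fulfill_event(*(omp_event_handle_t *)data);
        free(data);
    }

    /* After: non-blocking receive inside a detached task; the consumer becomes
     * a dependent task that the runtime releases from the completion callback. */
    void consume_async(double *buf, int count, int src, MPI_Comm comm) {
        omp_event_handle_t ev;
        #pragma omp task detach(ev) depend(out: buf[0:count])
        {
            omp_event_handle_t *e = malloc(sizeof *e);
            *e = ev;
            MPI_Request req;
            MPI_Irecv(buf, count, MPI_DOUBLE, src, 0, comm, &req);
            MPIX_Detach(&req, fulfill, e);
        }
        #pragma omp task depend(in: buf[0:count])
        process(buf, count);               /* runs once the message has arrived */
    }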


Cited By

  • (2024) Task-based low-rank hybrid parallel Cholesky factorization for distributed memory environment. Proceedings of the International Conference on High Performance Computing in Asia-Pacific Region, pp. 107–116. https://doi.org/10.1145/3635035.3635039. Online publication date: 18-Jan-2024.
  • (2023) Mapping High-Level Concurrency from OpenMP and MPI to ThreadSanitizer Fibers. Proceedings of the SC '23 Workshops of The International Conference on High Performance Computing, Network, Storage, and Analysis, pp. 187–195. https://doi.org/10.1145/3624062.3624085. Online publication date: 12-Nov-2023.
  • (2022) On-the-Fly Calculation of Model Factors for Multi-paradigm Applications. Euro-Par 2022: Parallel Processing, pp. 69–84. https://doi.org/10.1007/978-3-031-12597-3_5. Online publication date: 22-Aug-2022.




Information

Published In

Parallel Computing, Volume 109, Issue C, March 2022, 96 pages

Publisher

Elsevier Science Publishers B. V.

Netherlands

Publication History

Published: 01 March 2022

Author Tags

  1. Message Passing Interface
  2. Asynchronous communication
  3. OpenMP tasking
  4. Hybrid parallelism
  5. Static analysis
  6. Code transformation

Qualifiers

  • Research-article


