

MPI detach — Towards automatic asynchronous local completion

Published: 01 March 2022

Abstract

When aiming for large-scale parallel computing, waiting times due to network latency, synchronization, and load imbalance are the primary obstacles to high parallel efficiency. A common approach to hiding latency behind computation is the use of non-blocking communication. In the presence of a consistent load imbalance, synchronization cost is merely the visible symptom of that imbalance. Tasking approaches as in OpenMP, TBB, OmpSs, or C++20 coroutines promise to expose a higher degree of concurrency, which can be distributed across the available execution units and significantly improve load balance. The available MPI non-blocking functionality does not integrate seamlessly into such tasking parallelization. In this work, we present a slim extension of the MPI interface that allows seamless integration of non-blocking communication with the existing concepts of asynchronous execution in OpenMP and C++. Our concept makes it possible to span task dependency graphs for asynchronous execution across the full distributed-memory application. We furthermore investigate the compile-time analysis necessary to transform an application using blocking MPI communication into one that integrates OpenMP tasks with our proposed MPI interface extension.
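
The proposed interface is not reproduced on this page; the following is a minimal sketch in C of the idea described above, assuming a hypothetical MPIX_Detach call (name and signature are illustrative, not the paper's verbatim API) combined with OpenMP 5.0 detached tasks (detach clause and omp_fulfill_event). The request and a callback are handed back to the MPI library, and the callback fulfills the task's completion event once the operation completes locally, so dependent tasks are released without an explicit MPI_Wait.

    #include <stdlib.h>
    #include <mpi.h>
    #include <omp.h>

    /* Assumed interface extension (illustrative only): hand a request back to
     * the MPI library together with a callback that the library invokes on
     * local completion. */
    typedef void (MPIX_Detach_callback)(void *data);
    int MPIX_Detach(MPI_Request *request, MPIX_Detach_callback *callback, void *data);

    /* Invoked by the library once the send buffer may be reused; releases the
     * detached OpenMP task that guards the buffer. */
    static void fulfill_cb(void *data) {
        omp_event_handle_t *ev = (omp_event_handle_t *)data;
        omp_fulfill_event(*ev);
        free(ev);
    }

    void send_async(const double *buf, int count, int dest, MPI_Comm comm) {
        omp_event_handle_t event;
        /* OpenMP 5.0 detached task: it counts as completed only once the event
         * is fulfilled, so tasks with a matching depend clause wait for the send. */
        #pragma omp task detach(event) depend(in: buf[0:count])
        {
            omp_event_handle_t *ev = malloc(sizeof *ev);
            *ev = event;
            MPI_Request req;
            MPI_Isend(buf, count, MPI_DOUBLE, dest, /*tag=*/0, comm, &req);
            MPIX_Detach(&req, fulfill_cb, ev);   /* replaces MPI_Wait */
        }
    }

In this sketch the thread that created the task never blocks on the request; the MPI library decides when local completion has been reached and drives the task graph forward through the callback.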

Highlights

  • MPI interface extensions to transfer request completion back to the MPI library.
  • Callback-driven notification of asynchronous completion back to the application.
  • Prototype implementation of the interface, independent of the MPI implementation.
  • Integration of MPI communication into OpenMP task programming.
  • Compile-time analysis to convert blocking communication into non-blocking (see the sketch after this list).
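
As a companion to the last highlight, here is a hedged before/after sketch of the kind of rewrite such a compile-time analysis might produce. The MPIX_Detach name and signature are assumptions carried over from the sketch above, and process() is a hypothetical consumer of the received data, not code from the paper.

    #include <stdlib.h>
    #include <mpi.h>
    #include <omp.h>

    void process(double *buf, int count);              /* hypothetical consumer */
    extern int MPIX_Detach(MPI_Request *request,
                           void (*callback)(void *), void *data);  /* assumed extension */

    /* Before: blocking receive, the consumer runs only after MPI_Recv returns. */
    void consume_blocking(double *buf, int count, int src, MPI_Comm comm) {
        MPI_Recv(buf, count, MPI_DOUBLE, src, 0, comm, MPI_STATUS_IGNORE);
        process(buf, count);
    }

    static void fulfill(void *data) {
        omp_fulfill_event(*(omp_event_handle_t *)data);
        free(data);
    }

    /* After: non-blocking receive inside a detached task; the consumer becomes
     * a dependent task that the runtime releases from the completion callback. */
    void consume_async(double *buf, int count, int src, MPI_Comm comm) {
        omp_event_handle_t ev;
        #pragma omp task detach(ev) depend(out: buf[0:count])
        {
            omp_event_handle_t *e = malloc(sizeof *e);
            *e = ev;
            MPI_Request req;
            MPI_Irecv(buf, count, MPI_DOUBLE, src, 0, comm, &req);
            MPIX_Detach(&req, fulfill, e);
        }
        #pragma omp task depend(in: buf[0:count])
        process(buf, count);               /* runs once the message has arrived */
    }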


Cited By

  • (2024) Task-based low-rank hybrid parallel Cholesky factorization for distributed memory environment. Proceedings of the International Conference on High Performance Computing in Asia-Pacific Region, pp. 107–116. https://doi.org/10.1145/3635035.3635039. Online publication date: 18-Jan-2024.
  • (2023) Mapping High-Level Concurrency from OpenMP and MPI to ThreadSanitizer Fibers. Proceedings of the SC '23 Workshops of The International Conference on High Performance Computing, Network, Storage, and Analysis, pp. 187–195. https://doi.org/10.1145/3624062.3624085. Online publication date: 12-Nov-2023.
  • (2022) On-the-Fly Calculation of Model Factors for Multi-paradigm Applications. Euro-Par 2022: Parallel Processing, pp. 69–84. https://doi.org/10.1007/978-3-031-12597-3_5. Online publication date: 22-Aug-2022.




Information

Published In

Parallel Computing, Volume 109, Issue C, March 2022, 96 pages

Publisher

Elsevier Science Publishers B. V.

Netherlands

Publication History

Published: 01 March 2022

Author Tags

  1. Message Passing Interface
  2. Asynchronous communication
  3. OpenMP tasking
  4. Hybrid parallelism
  5. Static analysis
  6. Code transformation

Qualifiers

  • Research-article


