MPI + MPI: a new hybrid approach to parallel programming with MPI plus shared memory

Published: 01 December 2013

Abstract

Hybrid parallel programming with the Message Passing Interface (MPI) for internode communication in conjunction with a shared-memory programming model to manage intranode parallelism has become a dominant approach to scalable parallel programming. While this model provides a great deal of flexibility and performance potential, it saddles programmers with the complexity of using two parallel programming systems in the same application. We introduce an MPI-integrated shared-memory programming model that is incorporated into MPI through a small extension to the one-sided communication interface. We discuss the integration of this interface with the MPI-3.0 one-sided semantics and describe solutions for providing portable and efficient data sharing, atomic operations, and memory consistency. We describe an implementation of the new interface in both MPICH2 and Open MPI and demonstrate an average performance improvement of 40% in the communication component of a five-point stencil solver.
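The interface extension described in the abstract was adopted in MPI-3.0 as shared-memory windows. As a minimal sketch of that interface (not code from the paper), the following C program splits MPI_COMM_WORLD into per-node communicators with MPI_Comm_split_type, collectively allocates a window whose memory every rank on the node can access with plain loads and stores via MPI_Win_allocate_shared, and locates another rank's segment with MPI_Win_shared_query. The buffer size, the fence-based synchronization, and the access pattern are illustrative assumptions.

/* Minimal sketch of MPI-3.0 shared-memory windows; buffer size and
 * synchronization choices are illustrative, not taken from the paper. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Comm nodecomm;
    MPI_Win  win;
    double  *mybase, *base0;
    MPI_Aint segsize;
    int      nrank, disp_unit;

    MPI_Init(&argc, &argv);

    /* One communicator per shared-memory domain (typically one node). */
    MPI_Comm_split_type(MPI_COMM_WORLD, MPI_COMM_TYPE_SHARED, 0,
                        MPI_INFO_NULL, &nodecomm);
    MPI_Comm_rank(nodecomm, &nrank);

    /* Collectively allocate a shared segment; each rank contributes
     * one double, laid out contiguously across ranks by default. */
    MPI_Win_allocate_shared(sizeof(double), sizeof(double),
                            MPI_INFO_NULL, nodecomm, &mybase, &win);

    *mybase = (double)nrank;   /* direct store: no MPI_Put required */

    /* Find the base address of rank 0's segment on this node. */
    MPI_Win_shared_query(win, 0, &segsize, &disp_unit, &base0);

    /* Synchronize so the stores above are visible, then load directly. */
    MPI_Win_fence(0, win);
    printf("node rank %d reads rank 0's value: %.1f\n", nrank, base0[0]);

    MPI_Win_free(&win);
    MPI_Comm_free(&nodecomm);
    MPI_Finalize();
    return 0;
}

Because the segments are contiguous by default, base0[i] addresses the double contributed by node rank i; the memory-consistency solutions discussed in the paper concern exactly when such direct loads and stores become visible to other ranks.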


Cited By

  • (2022) Enabling Global MPI Process Addressing in MPI Applications. Proceedings of the 29th European MPI Users' Group Meeting, pp 27-36. DOI: 10.1145/3555819.3555829. Online publication date: 14-Sep-2022
  • (2020) Pencil. Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, pp 1-16. DOI: 10.5555/3433701.3433814. Online publication date: 9-Nov-2020
  • (2020) How I learned to stop worrying about user-visible endpoints and love MPI. Proceedings of the 34th ACM International Conference on Supercomputing, pp 1-13. DOI: 10.1145/3392717.3392773. Online publication date: 29-Jun-2020
  • (2019) MPI Collectives for Multi-core Clusters. Workshop Proceedings of the 48th International Conference on Parallel Processing, pp 1-10. DOI: 10.1145/3339186.3339199. Online publication date: 5-Aug-2019
  • (2019) Software combining to mitigate multithreaded MPI contention. Proceedings of the ACM International Conference on Supercomputing, pp 367-379. DOI: 10.1145/3330345.3330378. Online publication date: 26-Jun-2019
  • (2019) Lock Contention Management in Multithreaded MPI. ACM Transactions on Parallel Computing 5(3):1-21. DOI: 10.1145/3275443. Online publication date: 8-Jan-2019
  • (2019) Toward Heterogeneous MPI+MPI Programming: Comparison of OpenMP and MPI Shared Memory Models. Euro-Par 2019: Parallel Processing Workshops, pp 270-281. DOI: 10.1007/978-3-030-48340-1_21. Online publication date: 26-Aug-2019
  • (2018) Multi-level load balancing with an integrated runtime approach. Proceedings of the 18th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, pp 31-40. DOI: 10.1109/CCGRID.2018.00018. Online publication date: 1-May-2018
  • (2017) Improving the memory access locality of hybrid MPI applications. Proceedings of the 24th European MPI Users' Group Meeting, pp 1-10. DOI: 10.1145/3127024.3127038. Online publication date: 25-Sep-2017
  • (2017) MPI windows on storage for HPC applications. Proceedings of the 24th European MPI Users' Group Meeting, pp 1-11. DOI: 10.1145/3127024.3127034. Online publication date: 25-Sep-2017


    Published In

Computing, Volume 95, Issue 12
    December 2013
    97 pages

    Publisher

    Springer-Verlag

    Berlin, Heidelberg

    Author Tags

1. 68N19 Other programming techniques (object-oriented, sequential, concurrent, automatic, etc.)
2. Hybrid parallel programming
3. MPI-3.0
4. Shared memory


