Performance comparison of MPI and three OpenMP programming styles on shared memory multiprocessors

DOI: 10.1145/777412.777433
Published: 07 June 2003

Abstract

When using a shared memory multiprocessor, the programmer must select the portable programming model that will deliver the best performance. Even when the choice is restricted to the standard programming environments (MPI and OpenMP), a broad range of programming approaches remains. To help the programmer make this selection, we compare MPI with three OpenMP programming styles (loop level, loop level with large parallel sections, SPMD) using a subset of the NAS benchmarks (CG, MG, FT, LU), two dataset sizes (A and B), and two shared memory multiprocessors (IBM SP3 NightHawk II, SGI Origin 3800). We also present a path from MPI to OpenMP SPMD to guide programmers starting from an existing MPI code. We present the first SPMD OpenMP version of the NAS benchmarks and compare it with other OpenMP versions from independent sources (PBN, SDSC and RWCP). Experimental results demonstrate that OpenMP provides competitive performance compared to MPI for a large set of experimental conditions. However, the price of this performance is a strong programming effort on dataset adaptation and inter-thread communications. MPI still provides the best performance under some conditions. We present breakdowns of the execution times and measurements of hardware performance counters to explain the performance differences.




Published In

SPAA '03: Proceedings of the fifteenth annual ACM symposium on Parallel algorithms and architectures
June 2003
374 pages
ISBN:1581136617
DOI:10.1145/777412
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. MPI
  2. multiprocessors
  3. OpenMP
  4. performance evaluation
  5. shared memory

Qualifiers

  • Article

Conference

SPAA '03

Acceptance Rates

SPAA '03 Paper Acceptance Rate 38 of 106 submissions, 36%;
Overall Acceptance Rate 447 of 1,461 submissions, 31%


Article Metrics

  • Downloads (last 12 months): 56
  • Downloads (last 6 weeks): 12
Reflects downloads up to 07 Mar 2025


Cited By

  • (2024) Designing and prototyping extensions to the Message Passing Interface in MPICH. The International Journal of High Performance Computing Applications. DOI: 10.1177/10943420241263544. Online publication date: 19 Aug 2024.
  • (2023) Frustrated With MPI+Threads? Try MPIxThreads! Proceedings of the 30th European MPI Users' Group Meeting, pages 1-10. DOI: 10.1145/3615318.3615320. Online publication date: 11 Sep 2023.
  • (2022) Coupling of a multi-GPU accelerated elasto-visco-plastic fast Fourier transform constitutive model with the implicit finite element method. Computational Materials Science, 208:111348. DOI: 10.1016/j.commatsci.2022.111348. Online publication date: Jun 2022.
  • (2021) Parallel Implementation of Real-Time Object Detection using OpenMP. 2021 International Conference on Information Science and Communications Technologies (ICISCT), pages 1-4. DOI: 10.1109/ICISCT52966.2021.9670146. Online publication date: 3 Nov 2021.
  • (2020) parSMURF, a high-performance computing tool for the genome-wide detection of pathogenic variants. GigaScience, 9:5. DOI: 10.1093/gigascience/giaa052. Online publication date: 23 May 2020.
  • (2020) A Comparison of the Scalability of OpenMP Implementations. Euro-Par 2020: Parallel Processing, pages 83-97. DOI: 10.1007/978-3-030-57675-2_6. Online publication date: 24 Aug 2020.
  • (2020) A Methodology Approach to Compare Performance of Parallel Programming Models for Shared-Memory Architectures. Numerical Computations: Theory and Algorithms, pages 318-325. DOI: 10.1007/978-3-030-39081-5_28. Online publication date: 14 Feb 2020.
  • (2017) Accommodating Thread-Level Heterogeneity in Coupled Parallel Applications. 2017 IEEE International Parallel and Distributed Processing Symposium (IPDPS), pages 469-478. DOI: 10.1109/IPDPS.2017.13. Online publication date: May 2017.
  • (2017) OpenMP Tasking Model for Ada: Safety and Correctness. Reliable Software Technologies – Ada-Europe 2017, pages 184-200. DOI: 10.1007/978-3-319-60588-3_12. Online publication date: 30 May 2017.
  • (2016) IMPACC. Proceedings of the 25th ACM International Symposium on High-Performance Parallel and Distributed Computing, pages 189-201. DOI: 10.1145/2907294.2907302. Online publication date: 31 May 2016.
