DOI: 10.1007/11557265_21

Collective error detection for MPI collective operations

Published: 18 September 2005

Abstract

An MPI profiling library is a standard mechanism for intercepting MPI calls made by applications. Profiling libraries are so named because they are commonly used to gather performance data on MPI programs. Here we present a profiling library whose purpose is instead to detect user errors in the use of MPI’s collective operations. Some errors can be detected locally (by a single process), but errors involving the consistency of arguments passed to MPI collective functions must be tested for collectively. Although the idea of using such a profiling library does not originate here, we take it further than before by detecting more errors, and we offer an open-source library that can be used with any MPI implementation. We describe the tests carried out, provide some details of the implementation, illustrate the usage of the library, and present performance tests.
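The distinction between local and collective checks can be illustrated with a small simulation. The sketch below is not the library described in the abstract and does not use MPI at all: it models each process's arguments to a broadcast-like call as a plain Python object, and the names `BcastArgs`, `local_errors`, `collective_errors`, and `checked_bcast` are hypothetical. It also assumes a simplified notion of consistency (all ranks must agree on the root, the count, and a basic datatype name); a real checker would exchange this information between processes before forwarding the call to the underlying MPI implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BcastArgs:
    """Arguments one simulated process passes to a broadcast-like collective."""
    count: int
    datatype: str  # name of a basic datatype, e.g. "MPI_INT" (simplification)
    root: int


def local_errors(rank, args, comm_size):
    """Checks a single process can perform using only its own arguments."""
    errors = []
    if args.count < 0:
        errors.append(f"rank {rank}: negative count {args.count}")
    if not 0 <= args.root < comm_size:
        errors.append(f"rank {rank}: root {args.root} out of range")
    return errors


def collective_errors(all_args):
    """Checks that need every process's arguments; rank 0 is the reference."""
    errors = []
    reference = all_args[0]
    for rank, args in enumerate(all_args):
        if args.root != reference.root:
            errors.append(
                f"rank {rank}: root {args.root} differs from rank 0's root {reference.root}"
            )
        if (args.count, args.datatype) != (reference.count, reference.datatype):
            errors.append(f"rank {rank}: count/datatype differs from rank 0")
    return errors


def checked_bcast(all_args):
    """Run local checks for every simulated rank, then the collective checks."""
    comm_size = len(all_args)
    errors = []
    for rank, args in enumerate(all_args):
        errors.extend(local_errors(rank, args, comm_size))
    errors.extend(collective_errors(all_args))
    return errors


if __name__ == "__main__":
    # Rank 1 names a different root: each rank's arguments are valid in
    # isolation, so only the collective comparison can report the mistake.
    calls = [
        BcastArgs(4, "MPI_INT", 0),
        BcastArgs(4, "MPI_INT", 1),
        BcastArgs(4, "MPI_INT", 0),
    ]
    for message in checked_bcast(calls):
        print(message)
```

In this simulation the mismatched root passes every local check, which is the kind of error the abstract says must be tested for in a collective fashion.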



Published In

PVM/MPI'05: Proceedings of the 12th European PVM/MPI users' group conference on Recent Advances in Parallel Virtual Machine and Message Passing Interface
September 2005
545 pages
ISBN: 3540290095
Editors: Beniamino Martino, Dieter Kranzlmüller, Jack Dongarra

Sponsors

  • Myricom
  • Microsoft
  • Intel
  • IBM
  • HP

Publisher

Springer-Verlag

Berlin, Heidelberg

Author Tags

  1. MPI
  2. collective
  3. datatype
  4. errors
  5. hashing

Cited By

  • (2016) Runtime Correctness Analysis of MPI-3 Nonblocking Collectives. Proceedings of the 23rd European MPI Users' Group Meeting, pp. 188-197. DOI: 10.1145/2966884.2966906. Online publication date: 25-Sep-2016
  • (2014) Accurate application progress analysis for large-scale parallel debugging. ACM SIGPLAN Notices 49:6, pp. 193-203. DOI: 10.1145/2666356.2594336. Online publication date: 9-Jun-2014
  • (2014) Accurate application progress analysis for large-scale parallel debugging. Proceedings of the 35th ACM SIGPLAN Conference on Programming Language Design and Implementation, pp. 193-203. DOI: 10.1145/2594291.2594336. Online publication date: 9-Jun-2014
  • (2013) Runtime MPI collective checking with tree-based overlay networks. Proceedings of the 20th European MPI Users' Group Meeting, pp. 129-134. DOI: 10.1145/2488551.2488570. Online publication date: 15-Sep-2013
  • (2013) Combining static and dynamic validation of MPI collective communications. Proceedings of the 20th European MPI Users' Group Meeting, pp. 117-122. DOI: 10.1145/2488551.2488555. Online publication date: 15-Sep-2013
  • (2012) MPI runtime error detection with MUST. Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis, pp. 1-11. DOI: 10.5555/2388996.2389037. Online publication date: 10-Nov-2012
  • (2009) MPIWiz. ACM SIGPLAN Notices 44:4, pp. 251-260. DOI: 10.1145/1594835.1504213. Online publication date: 14-Feb-2009
  • (2009) MPIWiz. Proceedings of the 14th ACM SIGPLAN symposium on Principles and practice of parallel programming, pp. 251-260. DOI: 10.1145/1504176.1504213. Online publication date: 14-Feb-2009
  • (2007) Open issues in MPI implementation. Proceedings of the 12th Asia-Pacific conference on Advances in Computer Systems Architecture, pp. 327-338. DOI: 10.5555/2392163.2392194. Online publication date: 23-Aug-2007
  • (2007) A Portable Method for Finding User Errors in the Usage of MPI Collective Operations. International Journal of High Performance Computing Applications 21:2, pp. 155-165. DOI: 10.1177/1094342007077860. Online publication date: 1-May-2007
