DOI: 10.5555/370049.370462

Dynamic software testing of MPI applications with Umpire

Published: 01 November 2000

Abstract

As evidenced by the popularity of MPI (Message Passing Interface), message passing is an effective programming technique for managing coarse-grained concurrency on distributed computers. Unfortunately, debugging message-passing applications can be difficult. Software complexity, data races, and scheduling dependencies can make programming errors challenging to locate with manual, interactive debugging techniques. This article describes Umpire, a new tool for detecting programming errors at runtime in message-passing applications. Umpire monitors the MPI operations of an application by interposing itself between the application and the MPI runtime system using the MPI profiling layer. Umpire then checks the application's MPI behavior for specific errors. Our initial set of checks covers deadlock, mismatched collective operations, and resource exhaustion. We present an evaluation on a variety of applications that demonstrates the effectiveness of this approach.
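
The interposition idea in the abstract can be illustrated with a small, self-contained Python sketch. It is not Umpire's implementation and does not use MPI: `FakeComm`, `Interposer`, and `mismatched_collectives` are hypothetical names invented for this illustration, and the "runtime" is a no-op stand-in. The wrapper records each operation before forwarding it, roughly as a tool layered on the MPI profiling interface would, and a simple checker then compares the order of collective operations across ranks.

```python
from collections import defaultdict

# Operation names treated as collectives in this toy model (an assumption).
COLLECTIVES = {"barrier", "bcast", "reduce"}


class FakeComm:
    """Hypothetical stand-in for one rank's MPI runtime; every call is a no-op."""

    def barrier(self):
        return None

    def bcast(self, value, root=0):
        return value

    def send(self, value, dest):
        return None


class Interposer:
    """Sits between the application and the runtime: records each call,
    then forwards it unchanged to the wrapped object."""

    def __init__(self, rank, runtime, log):
        self._rank = rank
        self._runtime = runtime
        self._log = log

    def __getattr__(self, name):
        # Only reached for names not set in __init__, i.e. runtime operations.
        target = getattr(self._runtime, name)

        def wrapper(*args, **kwargs):
            self._log[self._rank].append(name)  # observe the operation
            return target(*args, **kwargs)      # hand it to the runtime

        return wrapper


def mismatched_collectives(log):
    """Return (position, {rank: op}) for the first position at which ranks
    disagree on the collective they called, or None if all sequences match."""
    per_rank = {r: [op for op in ops if op in COLLECTIVES]
                for r, ops in log.items()}
    longest = max((len(ops) for ops in per_rank.values()), default=0)
    for i in range(longest):
        seen = {r: (ops[i] if i < len(ops) else None)
                for r, ops in per_rank.items()}
        if len(set(seen.values())) > 1:
            return i, seen
    return None


log = defaultdict(list)
ranks = [Interposer(r, FakeComm(), log) for r in range(2)]

ranks[0].send("x", dest=1)   # point-to-point, ignored by the collective check
ranks[0].barrier()
ranks[0].bcast(1)

ranks[1].barrier()
ranks[1].barrier()           # bug: rank 0 calls bcast at this position

result = mismatched_collectives(log)
print(result)
```

In a real MPI program the same pattern is usually written in C: the tool defines `MPI_Send`, `MPI_Barrier`, and so on, and each wrapper calls the corresponding name-shifted `PMPI_` entry point provided by the MPI library. The sketch checks the log after the fact for simplicity, whereas the abstract describes checks performed at runtime.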

Published In

SC '00: Proceedings of the 2000 ACM/IEEE conference on Supercomputing
November 2000
889 pages
ISBN:0780398025

In-Cooperation

  • SIAM: Society for Industrial and Applied Mathematics

Publisher

IEEE Computer Society

United States

Qualifiers

  • Article

Conference

SC '00

Acceptance Rates

SC '00 Paper Acceptance Rate 62 of 179 submissions, 35%;
Overall Acceptance Rate 1,516 of 6,373 submissions, 24%
