Abstract
While high performance computing has been eagerly adopted by users as a vehicle for satisfying a growing demand for computational power, some areas remain poorly explored. The MPI paradigm is considered the keystone of the extensive development of HPC infrastructure over the last decade. However, even today users face a lack of tools that help increase the stability of the software stack or of the applications. In this paper we present and evaluate a tool designed to let developers investigate the execution of parallel applications further by dynamically moving back and forth in the execution timeline of a parallel application. Based on an unobtrusive message logging mechanism, the tool enforces deterministic replay, leading to a simpler and more efficient way to debug parallel software.
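The idea of enforcing deterministic replay from a message log can be illustrated with a small simulation. The sketch below is not taken from the paper and does not use MPI; it assumes a single receiver that accepts messages from any sender, models nondeterminism as a seed-dependent interleaving of per-sender FIFO streams, and uses hypothetical helper names (`arrival_order`, `record_run`, `replay_run`). During recording only the sender of each delivery is logged; during replay, messages that arrive early are buffered until the log says it is their turn.

```python
import random
from collections import deque


def arrival_order(per_sender, seed):
    """Interleave per-sender message streams in a seed-dependent order.

    Messages from the same sender keep their relative order (FIFO per
    sender); only the interleaving between senders varies with the seed.
    """
    rng = random.Random(seed)
    queues = {s: deque(msgs) for s, msgs in per_sender.items() if msgs}
    order = []
    while queues:
        sender = rng.choice(sorted(queues))
        order.append((sender, queues[sender].popleft()))
        if not queues[sender]:
            del queues[sender]
    return order


def record_run(per_sender, seed):
    """First execution: deliver in arrival order and log each delivery's sender."""
    delivered = arrival_order(per_sender, seed)
    log = [sender for sender, _ in delivered]
    return delivered, log


def replay_run(per_sender, seed, log):
    """Replayed execution: arrivals may differ, deliveries follow the log."""
    pending = {}      # sender -> deque of payloads that arrived but are not yet delivered
    delivered = []
    next_idx = 0      # position of the next delivery expected by the log
    for sender, payload in arrival_order(per_sender, seed):
        pending.setdefault(sender, deque()).append(payload)
        # Deliver as many messages as the log allows with what has arrived so far.
        while next_idx < len(log) and pending.get(log[next_idx]):
            expected = log[next_idx]
            delivered.append((expected, pending[expected].popleft()))
            next_idx += 1
    return delivered


# Hypothetical workload: three senders, six messages in total.
per_sender = {0: ["a0", "a1"], 1: ["b0", "b1", "b2"], 2: ["c0"]}
recorded, log = record_run(per_sender, seed=1)
replayed = replay_run(per_sender, seed=2, log=log)
```

Because the log stores only the delivery order and not the payloads, it stays small; the replayed delivery sequence matches the recorded one as long as each sender's own messages arrive in the same relative order in both runs, which is the assumption this sketch makes.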
Copyright information
© 2007 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Bouteiller, A., Bosilca, G., Dongarra, J. (2007). Retrospect: Deterministic Replay of MPI Applications for Interactive Distributed Debugging. In: Cappello, F., Herault, T., Dongarra, J. (eds) Recent Advances in Parallel Virtual Machine and Message Passing Interface. EuroPVM/MPI 2007. Lecture Notes in Computer Science, vol 4757. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-75416-9_41
DOI: https://doi.org/10.1007/978-3-540-75416-9_41
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-75415-2
Online ISBN: 978-3-540-75416-9
eBook Packages: Computer Science (R0)