Abstract
This paper describes a technique for distributed recovery in multiprocessor ring configurations. The technique has been developed and implemented for DIRMU 25, a 25-processor system that is operational at the University of Erlangen-Nuremberg. First, a short overview of the DIRMU hardware architecture and the distributed operating system DIRMOS is given. Then the steps of distributed recovery using distributed system checkpoints are described. Finally, the efficiency of the technique is discussed in comparison to recovery techniques using central system checkpoints, based on measurements of the runtime overhead of a realistic application (2D Poisson multigrid).
Copyright information
© 1987 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Lehmann, L., Brehm, J. (1987). Rollback Recovery in Multiprocessor Ring Configurations. In: Belli, F., Görke, W. (eds) Fehlertolerierende Rechensysteme / Fault-Tolerant Computing Systems. Informatik-Fachberichte, vol 147. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-45628-2_19
DOI: https://doi.org/10.1007/978-3-642-45628-2_19
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-18294-8
Online ISBN: 978-3-642-45628-2
eBook Packages: Springer Book Archive