Failure transparency

In a distributed system, failure transparency refers to the extent to which errors and subsequent recoveries of hosts and services within the system are invisible to users and applications. For example, if a server fails, but users are automatically redirected to another server and never notice the failure, the system is said to exhibit high failure transparency.

Failure transparency is one of the hardest types of transparency to achieve, because it is often impossible to tell whether a server has actually failed or is simply responding very slowly.^[1] Additionally, full failure transparency is generally unattainable in a distributed system, since the network itself is unreliable: messages can be lost or delayed.
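
The ambiguity between a failed server and a slow one can be illustrated with a minimal sketch of a timeout-based check. The function name `classify`, its parameters, and the status labels are hypothetical and chosen only for illustration; the delays are simulated rather than measured on a real network.

```python
from typing import Optional


def classify(response_delay: Optional[float], timeout: float) -> str:
    """Classify a server from the client's point of view.

    response_delay is the number of seconds the reply would take, or
    None if the server has crashed and will never reply (hypothetical
    model; a real client cannot observe this value directly).
    """
    if response_delay is not None and response_delay <= timeout:
        return "alive"
    # A crashed server and a slow server look identical here: in both
    # cases the client has seen no reply before the timeout expired.
    return "suspected"


crashed = classify(None, timeout=1.0)  # server is really down
slow = classify(5.0, timeout=1.0)      # server is up, but slow
fast = classify(0.2, timeout=1.0)      # server replies in time
```

In this sketch `crashed` and `slow` receive the same label, so the client cannot tell them apart. Raising the timeout reduces false suspicions of slow servers, but delays the detection of real failures.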

There is also usually a trade-off between achieving a high level of failure transparency and maintaining an adequate level of system performance. For example, if a distributed system attempts to mask a transient server failure by having the client retry the failed server several times, the performance of the system may suffer. In such a case, it would be preferable to give up earlier and try another server.^[1]
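
The trade-off can be made concrete with a small sketch of a client that retries each server a bounded number of times before failing over to the next one. All names here (`call_with_failover`, `ServerUnavailable`, `fake_request`, and the server labels) are hypothetical, and the servers are simulated rather than contacted over a network.

```python
from typing import Callable, Sequence, Tuple


class ServerUnavailable(Exception):
    """Raised when a (simulated) server does not answer."""


def call_with_failover(
    servers: Sequence[str],
    request: Callable[[str], str],
    max_attempts_per_server: int,
) -> Tuple[str, int]:
    """Try each server in order; return (reply, total attempts made)."""
    attempts = 0
    for server in servers:
        for _ in range(max_attempts_per_server):
            attempts += 1
            try:
                return request(server), attempts
            except ServerUnavailable:
                continue  # retry the same server until the limit is hit
    raise ServerUnavailable("all servers failed")


def fake_request(server: str) -> str:
    # Assumption for the example: "primary" is down, "backup" is healthy.
    if server == "primary":
        raise ServerUnavailable(server)
    return "reply from " + server


servers = ["primary", "backup"]
reply_quick, attempts_quick = call_with_failover(servers, fake_request, 2)
reply_patient, attempts_patient = call_with_failover(servers, fake_request, 10)
```

Both calls hide the failure of `primary` from the caller, but the second spends ten attempts on the failed server before moving on, against two for the first. If each attempt waits for a timeout, a higher retry limit translates directly into higher latency for the user.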

References

^[1] Tanenbaum, Andrew S.; van Steen, Maarten. Distributed Systems: Principles and Paradigms. Second Edition. Prentice Hall, 2007. ISBN 0-13-239227-5.
