Nothing Special   »   [go: up one dir, main page]

skip to main content
article

Practical byzantine fault tolerance and proactive recovery

Published: 01 November 2002 Publication History

Abstract

Our growing reliance on online services accessible on the Internet demands highly available systems that provide correct service without interruptions. Software bugs, operator mistakes, and malicious attacks are a major cause of service interruptions and they can cause arbitrary behavior, that is, Byzantine faults. This article describes a new replication algorithm, BFT, that can be used to build highly available systems that tolerate Byzantine faults. BFT can be used in practice to implement real services: it performs well, it is safe in asynchronous environments such as the Internet, it incorporates mechanisms to defend against Byzantine-faulty clients, and it recovers replicas proactively. The recovery mechanism allows the algorithm to tolerate any number of faults over the lifetime of the system provided fewer than 1/3 of the replicas become faulty within a small window of vulnerability. BFT has been implemented as a generic program library with a simple interface. We used the library to implement the first Byzantine-fault-tolerant NFS file system, BFS. The BFT library and BFS perform well because the library incorporates several important optimizations, the most important of which is the use of symmetric cryptography to authenticate messages. The performance results show that BFS performs 2% faster to 24% slower than production implementations of the NFS protocol that are not replicated. This supports our claim that the BFT library can be used to build practical systems that tolerate Byzantine faults.

References

[1]
Alsberg, P. and Day, J. 1976. A principle for resilient sharing of distributed resources. In Proceedings of the Second International Conference on Software Engineering, IEEE Computer Society Press, San Francisco, 627--644.]]
[2]
Alvisi, L., Malkhi, D., Pierce, E., Reiter, M., and Wright, R. 2000. Dynamic Byzantine quorum systems. In International Conference on Dependable Systems and Networks (DSN, FTCS-30 and DCCA-8), IEEE Computer Society Press, New York, 283--292.]]
[3]
Alvisi, L., Pierce, E., Malkhi, D., and Reiter, M. 1999. Fault detection for Byzantine quorum systems. In Proceedings of the Seventh IFIP International Working Conference on Dependable Computing for Critical Applications (DCCA-7), IEEE Computer Society Press, San Jose, Calif. 357--371.]]
[4]
Bellare, M. and Micciancio, D. 1997. A new paradigm for collision-free hashing: Incrementality at reduced cost. In Advances in Cryptology---EUROCRYPT' 97, Lecture Notes in Computer Science, vol. 1233, W. Fumy, Ed., Springer-Verlag, Konstanz, Germany, 163--192.]]
[5]
Bellare, M. and Rogaway, P. 1995. Optimal asymmetric encryption---How to encrypt with RSA. In Advances in Cryptology---EUROCRYPT 94, Lecture Notes in Computer Science, vol. 950, A. D. Santis, Ed., Springer-Verlag, Perugia, Italy, 92--111.]]
[6]
Bellare, M. and Rogaway, P. 1996. The exact security of digital signatures. How to sign with RSA and Rabin. In Advances in Cryptology---EUROCRYPT 96, Lecture Notes in Computer Science, vol. 1070, U. Maurer, Ed., Springer-Verlag, Zaragoza, Spain, 399--416.]]
[7]
Bennett, C., Bessette, F., Brassard, G., Salvail, L., and Smolin, J. 1992. Experimental quantum cryptography. J. Cryptol. 5, 1, 3--28.]]
[8]
Black, J., Halevi, S., Krawczyk, H., Krovetz, T., and Rogaway, P. 1999. UMAC: Fast and secure message authentication. In Advances in Cryptology---CRYPTO'99, Lecture Notes in Computer Science, vol. 1666, M. Wiener, Ed., Springer-Verlag, Santa Barbara, Calif., 216--233.]]
[9]
Blum, M., Evans, W., Gemmel, P., Kannan, S., and Naor, M. 1994. Checking the correctness of memories. Algorithmica 12, 225--244.]]
[10]
Bracha, G. and Toueg, S. 1985. Asynchronous consensus and broadcast protocols. J. ACM 32, 4, 824--240.]]
[11]
Cachin, C., Kursawe, K., and Shoup, V. 2000. Random oracles in Constantinople: Practical asynchronous Byzantine agreement using cryptography. In Proceedings of the Nineteenth ACM Symposium on Principles of Distributed Computing (PODC 2000), ACM Press, Portland, Ore.]]
[12]
Canetti, R. and Rabin, T. 1992. Optimal asynchronous byzantine agreement. Tech. Rep. #92-15, Computer Science Department, Hebrew University.]]
[13]
Canetti, R., Halevi, S., and Herzberg, A. 1997. Maintaining authenticated communication in the presence of break-ins. In Proceedings of the Fourth ACM Conference on Computers and Communication Security, ACM Press, Zurich, Switzerland.]]
[14]
Castro, M. 2001. Practical Byzantine fault tolerance. Tech. Rep. MIT/LCS/TR-817, MIT Laboratory for Computer Science. January.]]
[15]
Castro, M. and Liskov, B. 1999a. A Correctness proof for a practical byzantine-fault-tolerant replication algorithm. Tech. Memo MIT/LCS/TM-590, MIT Laboratory for Computer Science.]]
[16]
Castro, M. and Liskov, B. 1999b. Practical Byzantine fault tolerance. In Proceedings of the Third Symposium on Operating Systems Design and Implementation (OSDI), USENIX, New Orleans.]]
[17]
Chockler, G., Malkhi, D., and Reiter, M. 2001. Backoff protocols for distributed mutual exclusion and ordering. In Proceedings of the 21st International Conference on Distributed Computing Systems, IEEE Computer Society Press, Phoenix, Ariz.]]
[18]
Cristian, F., Aghili, H., Strong, R., and Dolev, D. 1985. Atomic broadcast: From simple message diffusion to Byzantine agreement. In Proceedings of the Fifteenth International Conference on Fault Tolerant Computing, IEEE Computer Society Press, Ann Arbor, Mich.]]
[19]
Deering, S. and Cheriton, D. 1990. Multicast routing in datagram internetworks and extended LANs. ACM Trans. Comput. Syst. 8, 2 (May), 85--110.]]
[20]
Doudou, A., Garbinato, B., and Guerraoui, R. 2000. Modular abstractions for devising Byzantine-resilient state machine Replication. In Proceedings of the IEEE Symposium on Reliable Distributed Systems, IEEE Computer Society Press, Nurnberg, Germany, 144--153.]]
[21]
Doudou, A., Garbinato, B., Guerraoui, R., and Schiper, A. 1999. Muteness failure detectors: Specification and implementation. In Proceedings of the Third European Dependable Computing Conference (EDCC-3), Lecture Notes in Computer Science, vol. 1667, J. Hlavicka, E. Maehle, and A. Pataricza, Eds., Springer-Verlag, Prague, Czech Republic, 71--87.]]
[22]
Fischer, M., Lynch, N., and Paterson, M. 1985. Impossibility of distributed consensus with one faulty process. J. ACM 32, 2 (April), 374--382.]]
[23]
Fu, K., Kaashoek, M. F., and Mazières, D. 2000. Fast and secure distributed read-only file system. In Proceedings of the Fourth USENIX Symposium on Operating Systems Design and Implementation (OSDI 2000), USENIX, San Diego.]]
[24]
Garay, J. and Moses, Y. 1998. Fully polynomial Byzantine agreement for n > 3t processors in t + 1 rounds. SIAM J. Comput. 27, 1 (Feb.), 247--290.]]
[25]
Garay, J., Gennaro, R., Jutla, C., and Rabin, T. 2000. Secure distributed storage and retrieval. Theo. Comput. Sci. 243, 1--2 (July), 363--389.]]
[26]
Gifford, D. K. 1979. Weighted voting for replicated data. In Proceedings of the Seventh Symposium on Operating Systems Principles, ACM Press, Pacific Grove, Calif., 150--162.]]
[27]
Gong, L. 1992. A security risk of depending on synchronized clocks. Oper. Syst. Rev. 26, 1 (Jan.), 49--53.]]
[28]
Gray, J. 2000. FT 101. Talk at the University of California at Berkeley.]]
[29]
Herlihy, M. P. and Wing, J. M. 1987. Axioms for concurrent objects. In Proceedings of the Fourteenth ACM Symposium on Principles of Programming Languages, ACM Press, Munich, 13--26.]]
[30]
Herzberg, A., Jakobsson, M., Jarecki, S., Krawczyk, H., and Yung, M. 1997. Proactive public key and signature systems. In Proceedings of the Fourth ACM Conference on Computers and Communication Security, ACM Press, Zurich, Switzerland.]]
[31]
Herzberg, A., Jarecki, S., Krawczyk, H., and Yung, M. 1995. Proactive secret sharing, or: How to cope with perpetual leakage. In Advances in Cryptology---CRYPTO'95, Lecture Notes in Computer Science, vol. 963, D. Coppersmith, Ed., Springer-Verlag, Santa Barbara, Calif.]]
[32]
Howard, J., Kazar, M., Menees, S., Nichols, D., Satyanarayanan, M., Sidebotham, R., and West, M. 1988. Scale and performance in a distributed file system. ACM Trans. Comput. Syst. 6, 1 (Feb.), 51--81.]]
[33]
Katcher, J. 1997. PostMark: A new file system benehmark. Tech. Rep. TR-3022, Network Appliance. October.]]
[34]
Keidar, I. and Dolev, D. 1996. Efficient message ordering in dynamic networks. In Proceedings of the Fifteenth ACM Symposium on Principles of Distributed Computing, ACM Press, Philadelphia, 68--76.]]
[35]
Keidar, I. and Dolev, D. 1998. Increasing the resilience of distributed and replicated database systems. J. Computer Syst. Sci. 57, 3 (Dec.), 309--324.]]
[36]
Kihlstrom, K., Moser, L., and Melliar-Smith, P. 1998. The SecureRing protocols for securing group communication. In Proceedings of the Hawaii International Conference on System Sciences, IEEE Computer Society Press, Hawaii.]]
[37]
Lamport, L. 1977. Proving the correctness of multiprocess programs. IEEE Trans. Softw. Eng. 3, 2 (Nov.), 125--143.]]
[38]
Lamport, L. 1978. Time, clocks, and the ordering of events in a distributed system. Commun. ACM 21, 7 (July), 558--565.]]
[39]
Lamport, L. 1984. Using time instead of timeout for fault-tolerant distributed systems. ACM Trans. Program. Lang. and Syst. 6, 2 (Apr.), 254--280.]]
[40]
Lamport, L. 1989. The part-time parliament. Research Rep. 49, Digital Equipment Corporation Systems Research Center, Palo Alto, Sept.]]
[41]
Lamport, L., Shostak, R., and Pease, M. 1982. The Byzantine generals problem. ACM Trans. Program. Lang. Syst. 4, 3 (July), 382--401.]]
[42]
Lampson, B. 2001. The ABCDs of Paxos. Presented at Principles of Distributed Computing. Available at http://www.research.microsoft.com/lampson.]]
[43]
Liskov, B. and Zilles, S. 1975. Specification techniques for data abstractions. IEEE Trans. Softw. Eng. SE-1, 1 (Mar.), 7--17.]]
[44]
Liskov, B., Ghemawat, S., Gruber, R., Johnson, P., Shrira, L., and Williams, M. 1991. Replication in the Harp file system. In Proceedings of the Thirteenth ACM Symposium on Operating System Principles (SOSP), ACM Press, Pacific Grove, Calif., 226--238.]]
[45]
Lynch, N. 1996. Distributed Algorithms. Morgan Kaufmann Publishers, San Mateo, Calif.]]
[46]
Maheshwari, U., Vingralek, R., and Shapiro, B. 2000. How to build a trusted database system on untrusted storage. In Proceedings of the Fourth USENIX Symposium on Operating Systems Design and Implementation (OSDI 2000), USENIX, San Diego.]]
[47]
Malkhi, D. and Reiter, M. 1996a. A high-throughput secure reliable multicast protocol. In Proceedings of the Ninth Computer Security Foundations Workshop, IEEE Computer Society Press, Ireland, 9--17.]]
[48]
Malkhi, D. and Reiter, M. 1996b. Unreliable intrusion detection in distributed computations. In Proceedings of the Ninth Computer Security Foundations Workshop, IEEE Computer Society Press, Ireland, 9--17.]]
[49]
Malkhi, D. and Reiter, M. 1998a. Byzantine quorum systems. J. Distrib. Comput. 11, 4, 203--213.]]
[50]
Malkhi, D. and Reiter, M. 1998b. Secure and scalable replication in phalanx. In Proceedings of the Seventeenth IEEE Symposium on Reliable Distributed Systems, IEEE Computer Society Press, West Lafayette, Ind.]]
[51]
Malkhi, D. and Reiter, M. 2000. An architecture for survivable coordination in large distributed systems. IEEE Trans. Knowl. Data Eng. 12, 2 (Apr.), 187--202.]]
[52]
Malkhi, D., Reiter, M., and Lynch, N. 1998. A correctness condition for memory shared by Byzantine processes (Submitted).]]
[53]
Mazières, D., Kaminsky, M., Kaashoek, M. F., and Witchel, E. 1999. Separating key management from file system security. In Proceedings of the Seventeenth ACM Symposium on Operating System Principles, ACM Press, Kiawah Island, S.C.]]
[54]
Merkle, R. 1987. A digital signature based on a conventional encryption function. In Advances in Cryptology---Crypto'87, Lecture Notes in Computer Science, vol. 293, C. Pomerance, Ed., Springer-Verlag, Santa Barbara, Calif., 369--378.]]
[55]
Minnich, R. 2000. The Linux BIOS home page. Available at http://www.acl.lanl.gov/linuxbios.]]
[56]
Murphy, B. and Levidow, B. 2000. Windows 2000 dependability. In Proceedings of IEEE International Conference on Dependable Systems and Networks, IEEE Computer Society Press, New York.]]
[57]
Oki, B. and Liskov, B. 1988. Viewstamped replication: A new primary copy method to support highly-available distributed systems. In Proceedings of ACM Symposium on Principles of Distributed Computing, ACM Press, Toronto, 8--17.]]
[58]
Ostrovsky, R. and Yung, M. 1991. How to withstand mobile virus attack. In Proceedings of the Nineteenth Symposium on Principles of Distributed Computing, ACM Press, Montreal, 51--59.]]
[59]
Ousterhout, J. 1990. Why aren't operating systems getting faster as fast as hardware? In Proceedings of USENIX Summer Conference, USENIX, Anaheim, Calif., 247--256.]]
[60]
Pease, M., Shostak, R., and Lamport, L. 1980. Reaching agreement in the presence of faults. J. ACM 27, 2 (April), 228--234.]]
[61]
Postel, J. 1980. User datagram protocol. DARPA-Internet RFC-768.]]
[62]
Reiter, M. 1994. Secure agreement protocols. In Proceedings of the Second ACM Conference on Computer and Communication Security, ACM Press, Fairfax, Va., 68--80.]]
[63]
Reiter, M. 1995. The Rampart toolkit for building high-integrity services. In Theory and Practice in Distributed Systems. Lecture Notes in Computer Science, vol. 938, Springer Verlag, New York, 99--110.]]
[64]
Reiter, M. 1996. A secure group membership protocol. IEEE Trans. Softw. Eng. 22, 1 (Jan.), 31--42.]]
[65]
Rivest, R. 1992. The MD5 message-digest algorithm. Internet RFC-1321.]]
[66]
Rodrigues, R., Castro, M., and Liskov, B. 2001. BASE: Using abstraction to improve fault tolerance. In Proceedings of the Eighteenth Symposium on Operating System Principles, ACM Press, Banff, Canada.]]
[67]
Sandberg, R., Goldberg, D., Kleiman, S., Walsh, D., and Lyon, B. 1985. Design and implementation of the sun network filesystem. In Proceedings of the Summer 1985 USENIX Conference, USENIX, Portland, Oreo, 119--130.]]
[68]
Schneider, F. 1990. Implementing fault-tolerant services using the state machine approach: A tutorial. ACM Comput. Surv. 22, 4 (Dec.), 299--319.]]
[69]
Schneider, F. 1982. Synchronization in distributed programs. ACM Trans. Program. Lang. Syst. 4, 2 (Apr.), 125--148.]]
[70]
Schneier, B. 1996. Applied Cryptography. Wiley, New York.]]
[71]
SHA1 1994. Announcement of Weakness in Secure Hash Standard.]]
[72]
Wensley, J., Lamport, L., Goldberg, J., Green, M., Levitt, K., Melliar-Smith, M., Shostak, R., and Weinstock, C. 1978. SIFT: Design and analysis of a fault-tolerant computer for aircraft control. Proc. IEEE 66, 10 (Oct.), 1240--1255.]]
[73]
Zhou, L., Schneider, F., and Renesse, R. 2000. COCA: A secure distributed on-line certification authority. Tech. Rep. 2000-1828, Department of Computer Science, Cornell University, Ithaca, NY., Dec. ACM Trans. Comput. Syst. (to appear).]]

Cited By

View all
  • (2025)Bodyless block propagation: TPS fully scalable blockchain with pre-validationFuture Generation Computer Systems10.1016/j.future.2024.107516163(107516)Online publication date: Feb-2025
  • (2024)Scalable and Robust Fraud Detection in Distributed SystemsInternational Journal of Advanced Research in Science, Communication and Technology10.48175/IJARSCT-19519(103-107)Online publication date: 7-Sep-2024
  • (2024)Foundations of Blockchain and Digital TwinsHarnessing Blockchain-Digital Twin Fusion for Sustainable Investments10.4018/979-8-3693-1878-2.ch002(20-48)Online publication date: 16-Feb-2024
  • Show More Cited By

Recommendations

Reviews

Cristiano di Flora

The most challenging faults in the context of modern wide-area distributed applications are those caused by software bugs, intentional attacks, and operator mistakes. Such faults, called Byzantine faults, cause arbitrary behavior of the overall system. Building Byzantine fault tolerant (BFT) systems is not a straightforward task; this paper presents an extremely interesting and effective solution for building systems that tolerate Byzantine faults. Castro and Liskov contribute significantly to the solution of this problem, providing readers with a complete and exhaustive description of their BFT algorithm. The main contribution of this work is twofold. First, the description of the algorithm is engineered in a perfect way. Even researchers who are not interested in fault tolerance issues may gain a lot of experience by reading this paper. The algorithm is described in detail, and the authors have done a good job of documenting its vulnerability window (and strategies to dimension and configure it); its optimizations (which make it usable in practice); and its implementation (as a generic library), providing the reader with a practical case study (namely, the first BFT distributed file system) and a complete performance analysis. Second, experienced researchers in the field of fault tolerance in distributed systems will find many explicit or implicit guidelines for addressing BFT issues, and for evaluating and predicting the behavior of the BFT algorithm in particular contexts. Performance analysis and evaluation are very accurately and deeply documented, and the authors also provide the reader with both the necessary guidelines for customizing the performance and behavior of the algorithm and a substantial section devoted to related work. The inclusion of some additional figures in the middle of the paper (particularly in sections 5 and 6), or of some more standard formalism for the description of the algorithm, would have helped inexperienced readers. Nevertheless, this paper remains a worthwhile read, both for those interested in BFT, and for those interested in evaluating and modeling complex algorithms. Online Computing Reviews Service

Access critical reviews of Computing literature here

Become a reviewer for Computing Reviews.

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Transactions on Computer Systems
ACM Transactions on Computer Systems  Volume 20, Issue 4
November 2002
133 pages
ISSN:0734-2071
EISSN:1557-7333
DOI:10.1145/571637
Issue’s Table of Contents

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 01 November 2002
Published in TOCS Volume 20, Issue 4

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. Byzantine fault tolerance
  2. asynchronous systems
  3. proactive recovery
  4. state machine replication
  5. state transfer

Qualifiers

  • Article

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)585
  • Downloads (Last 6 weeks)55
Reflects downloads up to 26 Sep 2024

Other Metrics

Citations

Cited By

View all
  • (2025)Bodyless block propagation: TPS fully scalable blockchain with pre-validationFuture Generation Computer Systems10.1016/j.future.2024.107516163(107516)Online publication date: Feb-2025
  • (2024)Scalable and Robust Fraud Detection in Distributed SystemsInternational Journal of Advanced Research in Science, Communication and Technology10.48175/IJARSCT-19519(103-107)Online publication date: 7-Sep-2024
  • (2024)Foundations of Blockchain and Digital TwinsHarnessing Blockchain-Digital Twin Fusion for Sustainable Investments10.4018/979-8-3693-1878-2.ch002(20-48)Online publication date: 16-Feb-2024
  • (2024)Securing Blockchain-Based Supply Chain Management: Textual Data Encryption and Access ControlTechnologies10.3390/technologies1207011012:7(110)Online publication date: 9-Jul-2024
  • (2024)A Redactable Blockchain-Based Data Management Scheme for Agricultural Product TraceabilitySensors10.3390/s2405166724:5(1667)Online publication date: 4-Mar-2024
  • (2024)Incentive Mechanism for Privacy-Preserving Collaborative Routing Using Secure Multi-Party Computation and BlockchainSensors10.3390/s2402054224:2(542)Online publication date: 15-Jan-2024
  • (2024)Dynamic Byzantine Fault-Tolerant Consensus Algorithm with Supervised Feedback MechanismsMathematics10.3390/math1217264312:17(2643)Online publication date: 26-Aug-2024
  • (2024)Blizzard: A Distributed Consensus Protocol for Mobile DevicesMathematics10.3390/math1205070712:5(707)Online publication date: 28-Feb-2024
  • (2024)BeHarmony: Blockchain-Enabled Trustworthy Communication and Legitimate Decision Making in Multi-Party Internet of Vehicles SystemsElectronics10.3390/electronics1316321913:16(3219)Online publication date: 14-Aug-2024
  • (2024)Linear Consensus Protocol Based on Vague Sets and Multi-Attribute Decision-Making MethodsElectronics10.3390/electronics1313246113:13(2461)Online publication date: 24-Jun-2024
  • Show More Cited By

View Options

Get Access

Login options

Full Access

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media