Abstract
In this paper, we describe the design and implementation of BChain, a Byzantine fault-tolerant state machine replication protocol, which performs comparably to other modern protocols in fault-free cases, but in the face of failures can also quickly recover its steady state performance. Building on chain replication, BChain achieves high throughput and low latency under high client load. At the core of BChain is an efficient Byzantine failure detection mechanism called re-chaining, where faulty replicas are placed out of harm’s way at the end of the chain, until they can be replaced. Our experimental evaluation confirms our performance expectations for both fault-free and failure scenarios. We also use BChain to implement an NFS service, and show that its performance overhead, with and without failures, is low, both compared to unreplicated NFS and other BFT implementations.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Preview
Unable to display preview. Download preview PDF.
Similar content being viewed by others
References
Abd-El-Malek, M., Ganger, G., Goodson, G., Reiter, M., Wylie, J.: Fault-scalable Byzantine fault-tolerant services. In: SOSP, pp. 59–74. ACM Press (2005)
Adams, J., Ramarao, K.: Distributed diagnosis of Byzantine processors and links. In: ICDCS, pp. 562–569. IEEE Computer Society (1989)
Baldoni, R., Helary, J., Raynal, M.: From crash fault-tolerance to arbitrary-fault tolerance: Towards a modular approach. In: DSN, pp. 273–282 (2000)
Benzel, T.: The science of cyber security experimentation: The DETER project. In: ACSAC (2011)
Castro, M., Liskov, B.: Practical Byzantine fault tolerance. In: OSDI, pp. 173–186. USENIX Association (1999)
Chandra, T., Hadzilacos, V., Toueg, S.: The weakest failure detector for solving consensus. J. ACM 43(4), 685–722 (1996)
Chandra, T., Toueg, S.: Unreliable failure detectors for reliable distributed systems. Journal of the ACM 43(2), 225–267 (1996)
Chiang, M., Wang, S., Tseng, L.: An early fault diagnosis agreement under hybrid fault model. Expert Syst. Appl. 36(3), 5039–5050 (2009)
Clement, A., Wong, E., Alvisi, L., Dahlin, M., Marchetti, M.: Making Byzantine fault tolerant systems tolerate Byzantine faults. In: NSDI, pp. 153–168. USENIX Association (2009)
Coker, R.: http://www.coker.com.au/bonnie++
Clement, A., Kapritsos, M., Lee, S., Wang, Y., Alvisi, L., Dahlin, M., Riche, T.: UpRight cluster services. In: SOSP, pp. 277–290. ACM Press (2009)
Cowling, J., Myers, D., Liskov, B., Rodrigues, R., Shrira, L.: HQ replication: A hybrid quorum protocol for Byzantine fault tolerance. In: OSDI, pp. 177–190. USENIX Association (2006)
Doudou, A., Garbinato, B., Guerraoui, R., Schiper, A.: Muteness failure detectors: Specification and implementation. In: Hlavicka, J., Maehle, E., Pataricza, A. (eds.) EDDC 1999. LNCS, vol. 1667, pp. 71–87. Springer, Heidelberg (1999)
Doudou, A., Garbinato, B., Guerraoui, R.: Encapsulating Failure Detection: From Crash to Byzantine Failures. In: Blieberger, J., Strohmeier, A. (eds.) Ada-Europe 2002. LNCS, vol. 2361, pp. 24–50. Springer, Heidelberg (2002)
Dwork, C., Lynch, N., Stockmeyer, L.: Consensus in the presence of partial synchrony. J. ACM 35(2), 288–323 (1988)
Fischer, M., Lynch, N., Paterson, M.: Impossibility of distributed consensus with one faulty process. J. ACM 32(2), 374–382 (1985)
Ghemawat, S., Gobioff, H., Leung, S.: The Google file system. In: SOSP, pp. 29–43 (2003)
Guerraoui, R., Knezevic, N., Quema, V., Vukolic, M.: The next 700 BFT protocols. In: EuroSys, pp. 363–376. ACM (2010)
Haeberlen, A., Kouznetsov, P., Druschel, P.: PeerReview: practical accountability for distributed systems. In: SOSP, pp. 175–188. ACM (2007)
Hendricks, J., Sinnamohideen, S., Ganger, G., Reiter, M.: Zzyzx: Scalable fault tolerance through Byzantine locking. In: DSN, pp. 363–372. IEEE Computer Society (2010)
Hirt, M., Maurer, U.M., Przydatek, B.: Efficient secure multi-party computation (Extended Abstract). In: Okamoto, T. (ed.) ASIACRYPT 2000. LNCS, vol. 1976, pp. 143–161. Springer, Heidelberg (2000)
Hsiao, H., Chin, Y., Yang, W.: Reaching fault diagnosis agreement under a hybrid fault model. IEEE Transactions on Computers 49(9) (September 2000)
Kihlstrom, K.P., Moser, L.E., Melliar-Smith, P.M.: Byzantine Fault Detectors for Solving Consensus. Comput. J. 46(1), 16–35 (2003)
Kotla, R., Alvisi, L., Dahlin, M., Clement, A., Wong, E.: Zyzzyva: Speculative Byzantine fault tolerance. In: SOSP, pp. 45–58. ACM (2007)
Lamport, L.: Using time instead of timeout for fault-tolerant distributed systems. Trans. on Programming Languages and Systems 6(2), 254–280 (1984)
Lamport, L., Malkhi, D., Zhou, L.: Reconfiguring a state machine. SIGACT News 41(1), 63–73 (2010)
Malkhi, D., Reiter, M.: Unreliable intrusion detection in distributed computations. In: CSFW, pp. 116–125 (1997)
Malkhi, D., Reiter, M.: Byzantine quorum systems. Distributed Computing 11(4) (1998)
Preperata, F., Metze, G., Chien, R.: On the connection asssignment problem of diagnosable systems. IEEE Transactions on Electronic Computers EC-16(6), 848–854 (1967)
Ramarao, K., Adams, J.: On the diagnosis of Byzantine faults. In: Proc. Symp. Reliable Distributed Systems, pp. 144–153 (1988)
Schneider, F.: Implementing fault-tolerant services using the state machine approach: A tutorial. ACM Computing Surveys 22(4), 299–319 (1990)
Serafini, M., Bondavalli, A., Suri, N.: Online diagnosis and recovery: On the choice and impact of tuning parameters. IEEE Trans. Dependable Sec. Comput. 4(4), 295–312 (2007)
Shin, K., Ramanathan, P.: Diagnosis of processors with Byzantine faults in a distributed computing system. In: Proc. Symp. Fault-Tolerant Computing, pp. 55–60 (July 1987)
van Renesse, R., Ho, C., Schiper, N.: Byzantine chain replication. In: Baldoni, R., Flocchini, P., Binoy, R. (eds.) OPODIS 2012. LNCS, vol. 7702, pp. 345–359. Springer, Heidelberg (2012)
van Renesse, R., Schneider, F.B.: Chain replication for supporting high throughput and availability. In: OSDI, pp. 91–104. USENIX Association (2004)
Vukolic, M.: Abstractions for asynchronous distributed computing with malicious players. PhD thesis. EPFL, Lausanne, Switzerland (2008)
Walter, C., Lincoln, P., Suri, N.: Formally verified on-line diagnosis. IEEE Trans. Software Eng. 23(11), 684–721 (1997)
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Duan, S., Meling, H., Peisert, S., Zhang, H. (2014). BChain: Byzantine Replication with High Throughput and Embedded Reconfiguration. In: Aguilera, M.K., Querzoni, L., Shapiro, M. (eds) Principles of Distributed Systems. OPODIS 2014. Lecture Notes in Computer Science, vol 8878. Springer, Cham. https://doi.org/10.1007/978-3-319-14472-6_7
Download citation
DOI: https://doi.org/10.1007/978-3-319-14472-6_7
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-14471-9
Online ISBN: 978-3-319-14472-6
eBook Packages: Computer ScienceComputer Science (R0)