Abstract
The overhead of Byzantine fault tolerant (BFT) storage is a primary concern that prevents its adoption in practice. The cost stems from the need to maintain at least 3t + 1 copies of the data at different storage replicas in the asynchronous model, so that t Byzantine replica faults can be tolerated. This paper presents MDStore, the first fully asynchronous BFT storage protocol that reduces the number of replicas that store the payload data to as few as 2t + 1 and maintains metadata at 3t + 1 replicas on (possibly) different servers. At the heart of MDStore lies a metadata service built upon a new abstraction called “timestamped storage.” Timestamped storage allows for conditional writes (facilitating the implementation of the metadata service) and has consensus number one (making it implementable with wait-free semantics in an asynchronous system despite faults). In addition to its low replication overhead, MDStore offers strong guarantees by emulating a multi-writer multi-reader atomic register, providing wait-free termination, and tolerating any number of Byzantine readers and crash-faulty writers.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Preview
Unable to display preview. Download preview PDF.
Similar content being viewed by others
References
Abraham, I., Chockler, G., Keidar, I., Malkhi, D.: Byzantine disk Paxos: Optimal resilience with Byzantine shared memory. Distributed Computing 18(5), 387–408 (2006)
Adya, A., Bolosky, W.J., Castro, M., et al.: FARSITE: Federated, available, and reliable storage for an incompletely trusted environment. In: Proc. 5th Symp. Operating Systems Design and Implementation, OSDI (2002)
Aiyer, A.S., Alvisi, L., Bazzi, R.A.: Bounded wait-free implementation of optimally resilient Byzantine storage without (unproven) cryptographic assumptions. In: Pelc, A. (ed.) DISC 2007. LNCS, vol. 4731, pp. 7–19. Springer, Heidelberg (2007)
Androulaki, E., Cachin, C., Dobre, D., Vukolić, M.: Erasure-coded Byzantine storage with separate metadata. Report ArXiv:1402.4958, CoRR (2014)
Attiya, H., Welch, J.: Distributed Computing: Fundamentals, Simulations and Advanced Topics. McGraw-Hill, London (1998)
Cachin, C., Dobre, D., Vukolić, M.: BFT storage with 2t + 1 data replicas. Report ArXiv:1305.4868, CoRR (2013)
Cachin, C., Guerraoui, R., Rodrigues, L.: Introduction to Reliable and Secure Distributed Programming, 2nd edn. Springer (2011)
Cachin, C., Junker, B., Sorniotti, A.: On limitations of using cloud storage for data replication. In: Proc. 6th Workshop on Recent Advances in Intrusion Tolerance and reSilience, WRAITS 2012 (2012)
Cachin, C., Tessaro, S.: Optimal resilience for erasure-coded Byzantine distributed storage. In: Proc. International Conference on Dependable Systems and Networks (DSN-DCCS), pp. 115–124 (2006)
Cho, B., Aguilera, M.K.: Surviving congestion in geo-distributed storage systems. In: Proc. USENIX Annual Technical Conference, pp. 439–451 (2012)
Chun, B.-G., Maniatis, P., Shenker, S., Kubiatowicz, J.: Attested append-only memory: Making adversaries stick to their word. In: Proc. 21st ACM Symposium on Operating Systems Principles (SOSP), pp. 189–204 (2007)
Correia, M., Neves, N.F., Veríssimo, P.: How to tolerate half less one Byzantine nodes in practical distributed systems. In: Proc. 23rd Symposium on Reliable Distributed Systems (SRDS), pp. 174–183 (2004)
Dobre, D., Karame, G., Li, W., Majuntke, M., Suri, N., Vukolić, M.: PoWerStore: Proofs of writing for efficient and robust storage. In: Proc. ACM Conference on Computer and Communications Security, CCS (2013)
Dobre, D., Viotti, P., Vukolić, M.: Hybris: Consistency hardening in robust hybrid cloud storage. Research Report RR-13-291, Eurécom (2013)
Dwork, C., Lynch, N., Stockmeyer, L.: Consensus in the presence of partial synchrony. Journal of the ACM 35(2), 288–323 (1988)
Fan, R., Lynch, N.A.: Efficient replication of large data objects. In: Fich, F.E. (ed.) DISC 2003. LNCS, vol. 2848, pp. 75–91. Springer, Heidelberg (2003)
Fischer, M.J., Lynch, N.A., Paterson, M.S.: Impossibility of distributed consensus with one faulty process. Journal of the ACM 32(2), 374–382 (1985)
Guerraoui, R., Vukolić, M.: How fast can a very robust read be? In: Proc. 25th ACM Symposium on Principles of Distributed Computing (PODC), pp. 248–257 (2006)
Guerraoui, R., Vukolić, M.: Refined quorum systems. Distributed Computing 23(1), 1–42 (2010)
Herlihy, M.: Wait-free synchronization. ACM Transactions on Programming Languages and Systems 11(1), 124–149 (1991)
Herlihy, M.P., Wing, J.M.: Linearizability: A correctness condition for concurrent objects. ACM Transactions on Programming Languages and Systems 12(3), 463–492 (1990)
Kapitza, R., Behl, J., Cachin, C., Distler, T., Kuhnle, S., Mohammadi, S.V., Schröder-Preikschat, W., Stengel, K.: CheapBFT: Resource-efficient Byzantine fault tolerance. In: Proc. 7th European Conference on Computer Systems (EuroSys), pp. 295–308 (April 2012)
Katz, J., Lindell, Y.: Introduction to Modern Cryptography: Principles and Protocols. Chapman & Hall/CRC (2007)
Lamport, L.: On interprocess communication. Distributed Computing 1(2), 77–85, 86–101 (1986)
Lamport, L.: The part-time parliament. ACM Transactions on Computer Systems 16(2), 133–169 (1998)
Lamport, L.: Paxos made simple. SIGACT News 32(4), 51–58 (2001)
Lamport, L.: Lower bounds for asynchronous consensus. In: Schiper, A., Shvartsman, M.M.A.A., Weatherspoon, H., Zhao, B.Y. (eds.) Future Directions in Distributed Computing. LNCS, vol. 2584, pp. 22–23. Springer, Heidelberg (2003)
Lynch, N.A.: Distributed Algorithms. Morgan Kaufmann, San Francisco (1996)
Lynch, N.A., Tuttle, M.R.: An introduction to input/output automata. CWI Quaterly 2(3), 219–246 (1989)
Malkhi, D., Reiter, M.: Secure and scalable replication in Phalanx. In: Proc. 17th Symposium on Reliable Distributed Systems, SRDS (1998)
Malkhi, D., Reiter, M.K.: Byzantine quorum systems. Distributed Computing 11(4), 203–213 (1998)
Martin, J.-P., Alvisi, L., Dahlin, M.: Minimal Byzantine storage. In: Malkhi, D. (ed.) DISC 2002. LNCS, vol. 2508, pp. 311–325. Springer, Heidelberg (2002)
Veronese, G.S., Correia, M., Bessani, A., Lung, L.C., Veríssimo, P.: Efficient Byzantine fault tolerance. IEEE Transactions on Computers 62(1), 16–30 (2011)
Vukolić, M.: Quorum Systems: With Applications to Storage and Consensus. Synthesis Lectures on Distributed Computing Theory. Morgan & Claypool Publishers (2012)
Wang, Y., Alvisi, L., Dahlin, M.: Gnothi: Separating data and metadata for efficient and available storage replication. In: Proc. USENIX Annual Technical Conference, pp. 413–424 (2012)
Wilkes, J., Hoover, C., Keer, B., Mehra, P., Veitch, A.: Storage, Data, and Information Systems. HP Laboratories (2008)
Yin, J., Martin, J.-P., Venkataramani, A., Alvisi, L., Dahlin, M.: Separating agreement from execution for Byzantine fault-tolerant services. In: Proc. 19th ACM Symposium on Operating Systems Principles (SOSP), pp. 253–268 (2003)
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Cachin, C., Dobre, D., Vukolić, M. (2014). Separating Data and Control: Asynchronous BFT Storage with 2t + 1 Data Replicas. In: Felber, P., Garg, V. (eds) Stabilization, Safety, and Security of Distributed Systems. SSS 2014. Lecture Notes in Computer Science, vol 8756. Springer, Cham. https://doi.org/10.1007/978-3-319-11764-5_1
Download citation
DOI: https://doi.org/10.1007/978-3-319-11764-5_1
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-11763-8
Online ISBN: 978-3-319-11764-5
eBook Packages: Computer ScienceComputer Science (R0)