Scalable Byzantine fault-tolerant state-machine replication on heterogeneous servers

Authors:

Michael Eischer,

Tobias Distler

Computing, Volume 101, Issue 2

Pages 97 - 118

https://doi.org/10.1007/s00607-018-0652-3

Published: 01 February 2019

Abstract

When provided with more powerful or additional hardware, state-of-the-art Byzantine fault-tolerant (BFT) replication protocols are unable to effectively exploit the extra computing resources. On the one hand, in settings with heterogeneous servers, existing protocols cannot fully utilize servers with higher performance capabilities. On the other hand, using more servers than the minimum number of replicas required for Byzantine fault tolerance generally does not improve throughput and latency, but instead degrades performance. In this paper, we address these problems with Omada, a BFT system architecture that is able to benefit from additional hardware resources. To achieve this property while still providing strong consistency, Omada first parallelizes agreement into multiple groups and then executes the requests handled by different groups in a deterministic order. By varying the number of requests to be ordered from group to group, as well as the number of groups that a replica participates in from server to server, Omada makes it possible to adjust the resource usage of each server individually. Moreover, because not all replicas need to take part in every group, the architecture can exploit additional servers.
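The idea of ordering requests in parallel groups and then executing them in one deterministic order can be illustrated with a small sketch. The abstract does not specify how Omada derives that order, so the weighted round-robin below is an assumption made purely for illustration. The function name `deterministic_merge`, the per-group `weights`, and the request labels are all hypothetical. The sketch also works on complete, finite logs; a running system would have to wait for a slow group (or fill its slots) instead of skipping it.

```python
from collections import deque
from typing import Deque, Dict, Iterator, List


def deterministic_merge(
    group_logs: Dict[int, List[str]],
    weights: Dict[int, int],
) -> Iterator[str]:
    """Interleave per-group ordered requests in a fixed weighted round-robin.

    Hypothetical illustration, not Omada's actual algorithm: in every round,
    group g contributes up to weights[g] requests, and groups are visited in
    ascending id order, so every replica derives the same execution order
    from the same group logs.
    """
    if any(weights[g] < 1 for g in group_logs):
        raise ValueError("every group needs a weight of at least 1")

    # Copy each group's agreed sequence into a queue we can consume.
    queues: Dict[int, Deque[str]] = {g: deque(log) for g, log in group_logs.items()}
    group_ids = sorted(queues)  # fixed visiting order, independent of dict order

    while any(queues[g] for g in group_ids):
        for g in group_ids:
            for _ in range(weights[g]):
                if not queues[g]:
                    break  # this group's (finite) log is exhausted
                yield queues[g].popleft()


if __name__ == "__main__":
    # Group 0 is assumed to run on faster servers and orders twice as many
    # requests per round as group 1.
    logs = {0: ["a1", "a2", "a3", "a4"], 1: ["b1", "b2"]}
    print(list(deterministic_merge(logs, {0: 2, 1: 1})))
    # prints ['a1', 'a2', 'b1', 'a3', 'a4', 'b2']
```

In this sketch, giving a group a larger weight shifts more ordering work to the servers that host it, which is one plausible way to read "varying the number of requests to be ordered" per group.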

Index Terms

Scalable Byzantine fault-tolerant state-machine replication on heterogeneous servers
1. Computer systems organization
  1. Dependable and fault-tolerant systems and networks
2. Software and its engineering
  1. Software organization and properties
    1. Software system structures
      1. Distributed systems organizing principles

Index terms have been assigned to the content through auto-classification.

Information

Published In

Computing Volume 101, Issue 2

February 2019

111 pages

ISSN: 0010-485X

Copyright © 2019 Springer-Verlag GmbH Austria, part of Springer Nature.

Publisher

Springer-Verlag

Berlin, Heidelberg

Cited By

Stathakopoulou C, Pavlovic M, Vukolić M, Bromberg Y, Kermarrec A, Kozyrakis C (2022) State machine replication scalability made simple. In: Proceedings of the Seventeenth European Conference on Computer Systems, pp 17-33. Online publication date: 28-Mar-2022
https://dl.acm.org/doi/10.1145/3492321.3519579

Distler T (2021) Byzantine Fault-tolerant State-machine Replication from a Systems Perspective. ACM Computing Surveys 54(1):1-38. Online publication date: 11-Feb-2021
https://dl.acm.org/doi/10.1145/3436728

Eischer M, Distler T (2020) Resilient Cloud-based Replication with Low Latency. In: Proceedings of the 21st International Middleware Conference, pp 14-28. Online publication date: 7-Dec-2020
https://dl.acm.org/doi/10.1145/3423211.3425689

Eischer M, Straßner B, Distler T, Fekete A, Kleppmann M (2020) Low-latency geo-replicated state machines with guaranteed writes. In: Proceedings of the 7th Workshop on Principles and Practice of Consistency for Distributed Data, pp 1-9. Online publication date: 27-Apr-2020
https://dl.acm.org/doi/10.1145/3380787.3393686

Beheshti M, Safi-Esfahani F (2020) BFPF-Cloud: Applying SVM for Byzantine Failure Prediction to Increase Availability and Failure Tolerance in Cloud Computing. SN Computer Science 1:5. Online publication date: 1-Sep-2020
https://dl.acm.org/doi/10.1007/s42979-020-00299-5

Deyerl C, Distler T (2019) In Search of a Scalable Raft-based Replication Architecture. In: Proceedings of the 6th Workshop on Principles and Practice of Consistency for Distributed Data, pp 1-7. Online publication date: 25-Mar-2019
https://dl.acm.org/doi/10.1145/3301419.3323968
