article

BASE: Using abstraction to improve fault tolerance

Authors:

Rodrigo Rodrigues,

Barbara LiskovAuthors Info & Claims

ACM Transactions on Computer Systems (TOCS), Volume 21, Issue 3

Pages 236 - 269

https://doi.org/10.1145/859716.859718

Published: 01 August 2003 Publication History

Abstract

Software errors are a major cause of outages and they are increasingly exploited in malicious attacks. Byzantine fault tolerance allows replicated systems to mask some software errors but it is expensive to deploy. This paper describes a replication technique, BASE, which uses abstraction to reduce the cost of Byzantine fault tolerance and to improve its ability to mask software errors. BASE reduces cost because it enables reuse of off-the-shelf service implementations. It improves availability because each replica can be repaired periodically using an abstract view of the state stored by correct replicas, and because each replica can run distinct or nondeterministic service implementations, which reduces the probability of common mode failures. We built an NFS service where each replica can run a different off-the-shelf file system implementation, and an object-oriented database where the replicas ran the same, nondeterministic implementation. These examples suggest that our technique can be used in practice---in both cases, the implementation required only a modest amount of new code, and our performance results indicate that the replicated services perform comparably to the implementations that they reuse.

References

[1]

Adya, A., Gruber, R., Liskov, B., and Maheshwari, U. 1995. Efficient Optimistic Concurrency Control using Loosely Synchronized Clocks. In Proceedings of the 1995 ACM SIGMOD International Conference on Management of Data. San Jose, California, 23--34.]]

[2]

Amir, Y., Danilov, C., Miskin-Amir, M., Stanton, J., and Tutu, C. 2002. Practical Wide-Area Database Replication. Tech. Rep. CNDS-2002-1, Johns Hopkins University.]]

[3]

Birman, K., Schiper, A., and Stephenson, P. 1991. Lightweight causal and atomic group multicast. In ACM Trans. Comput. Syst. Vol. 9(3).]]

[4]

Bressoud, T. and Schneider, F. 1995. Hypervisor-based Fault Tolerance. In Proceedings of the Fifteenth ACM Symposium on Operating System Principles. Copper Mountain Resort, Colorado, 1--11.]]

[5]

Callaghan, B. 1999. NFS Illustrated. Addison-Wesley, Reading, Massachusetts.]]

[6]

Carey, M. J., DeWitt, D. J., and Naughton, J. F. 1993. The OO7 Benchmark. In Proceedings of the 1993 ACM SIGMOD International Conference on Management of Data. Washington D.C., 12--21.]]

[7]

Castro, M. 2000. Practical Byzantine fault-tolerance. Ph.D. thesis, Massachusetts Institute of Technology, Cambridge, Massachusetts.]]

[8]

Castro, M., Adya, A., Liskov, B., and Myers, A. 1997. HAC: Hybrid Adaptive Caching for Distributed Storage Systems. In Proceedings of the Sixteenth ACM Symposium on Operating System Principles. Saint Malo, France, 102--115.]]

[9]

Castro, M. and Liskov, B. 1999. Practical Byzantine fault tolerance. In Proceedings of the Third Symposium on Operating Systems Design and Implementation. New Orleans, Louisiana, 173--186.]]

[10]

Castro, M. and Liskov, B. 2000. Proactive recovery in a Byzantine-fault-tolerant system. In Proceedings of the Fourth Symposium on Operating Systems Design and Implementation. San Diego, California, 273--288.]]

[11]

Castro, M. and Liskov, B. 2002. Practical Byzantine Fault Tolerance and Proactive Recovery. ACM Trans. Comput. Syst. 20, 4 (Nov.), 398--461.]]

Digital Library

[12]

Chen, L. and Avizienis, A. 1978. N-Version Programming: A Fault-Tolerance Approach to Reliability of Software Operation. In Fault Tolerant Computing, FTCS-8. 3--9.]]

[13]

Chen, P. M., Lee, E. K., Gibson, G. A., Katz, R. H., and Patterson, D. A. 1994. RAID: High-performance, reliable secondary storage. ACM Comput. Surv. 26, 2, 145--185.]]

Digital Library

[14]

Cooper, E. 1985. Replicated Distributed Programs. In Proceedings of the Tenth ACM Symposium on Operating System Principles. Orcas Island, Washington, 63--78.]]

[15]

Geiger, K. 1995. Inside ODBC. Microsoft Press.]]

[16]

Ghemawat, S. 1995. The Modified Object Buffer: a storage management technique for object-oriented databases. Ph.D. thesis, Massachusetts Institute of Technology, Cambridge, Massachusetts.]]

[17]

Gray, J. and Siewiorek, D. 1991. High-availability computer systems. IEEE Comput. 24, 9 (Sept.), 39--48.]]

Digital Library

[18]

Gray, J. N. and Reuter, A. 1993. Transaction Processing: Concepts and Techniques. Morgan Kaufmann Publishers Inc.]]

Digital Library

[19]

Hayden, M. 1998. The Ensemble System. Tech. Rep. TR98-1662, Cornell University, Ithaca, New York. Jan.]]

Digital Library

[20]

Herlihy, M. P. and Wing, J. M. 1987. Axioms for Concurrent Objects. In Conference Record of the Fourteenth Annual ACM Symposium on Principles of Programming Languages. Munich, Germany, 13--26.]]

[21]

Howard, J., Kazar, M., Menees, S., Nichols, D., Satyanarayanan, M., Sidebotham, R., and West, M. 1988. Scale and performance in a distributed file system. ACM Trans. Comput. Syst. 6, 1 (Feb.), 51--81.]]

Digital Library

[22]

Huang, Y., Kintala, C., Kolettis, N., and Fulton, N. D. 1995. Software rejuvenation: Analysis, modules and applications. In Digest of Papers: FTCS-25, The Twenty-Fifth International Symposium on Fault-Tolerant Computing. Pasadena, California, 381--390.]]

[23]

Jiménez-Peris, R., Patiño-Martínez, M., Kemme, B., and Alonso, G. 2002. Improving the Scalability of Fault-Tolerant Database Clusters. In Proceedings of the 22nd International Conference on Distributed Computing Systems (ICDCS'02). Vienna, Austria.]]

[24]

Kemme, B. and Alonso, G. 2000. Don't be lazy be consistent: Postgres-R, a new way to implement Database Replication. In VLDB 2000, Proceedings of 26th International Conference on Very Large Data Bases. Cairo, Egypt.]]

[25]

Lamport, L. 1978. Time, Clocks, and the Ordering of Events in a Distributed System. Comm. ACM 21, 7 (July), 558--565.]]

Digital Library

[26]

Liskov, B., Castro, M., Shrira, L., and Adya, A. 1999. Providing persistent objects in distributed systems. In Proceedings of the 13th European Conference on Object-Oriented Programming (ECOOP '99). Lisbon, Portugal, 230--257.]]

[27]

Liskov, B., Ghemawat, S., Gruber, R., Johnson, P., Shrira, L., and Williams, M. 1991. Replication in the Harp File System. In Proceedings of the Thirteenth ACM Symposium on Operating System Principles. Pacific Grove, California, 226--238.]]

[28]

Liskov, B. and Guttag, J. 2000. Program Development in Java: Abstraction, Specification, and Object-Oriented Design. Addison-Wesley.]]

[29]

Maffeis, S. 1995. Adding group communication and fault tolerance to CORBA. In Proceedings of the Second USENIX Conference on Object-Oriented Technologies. Toronto, Canada, 135--146.]]

[30]

Marzullo, K. and Schmuck, F. 1988. Supplying high availability with a standard network file system. In Proceedings of the 8th International Conference on Distributed Computing Systems. San Jose, California, 447--453.]]

[31]

Mills, D. L. 1992. Network Time Protocol (Version 3) Specification, Implementation and Analysis. Network Working Report RFC 1305.]]

[32]

Minnich, R. 2000. The Linux BIOS Home Page. http://www.acl.lanl.gov/linuxbios.]]

[33]

Moser, L., Melliar-Smith, P., and Narasimhan, P. 1998. Consistent object replication in the eternal system. Theory and Practice of Object Systems 4, 2 (Jan.), 81--92.]]

[34]

Narasimhan, P., Kihlstrom, K., Moser, L., and Melliar-Smith, P. 1999. Providing Support for Survivable CORBA Applications with the Immune System. In Proceedings of the 19th International Conference on Distributed Computing Systems. Austin, Texas, 507--516.]]

[35]

Object Management Group. 1999. The Common Object Request Broker: Architecture and Specification. Omg techical committee document formal/98-12-01. June.]]

[36]

Object Management Group. 2000. Fault Tolerant CORBA. Omg techical committee document orbos/2000-04-04. Mar.]]

[37]

Ousterhout, J. 1990. Why Aren't Operating Systems Getting Faster as Fast as Hardware? In Proceedings of the Usenix Summer 1990 Technical Conference. Anaheim, California, 247--256.]]

[38]

Pease, M., Shostak, R., and Lamport, L. 1980. Reaching Agreement in the Presence of Faults. J. ACM 27, 2 (Apr.), 228--234.]]

Digital Library

[39]

Ramkumar, B. and Strumpen, V. 1997. Portable checkpointing for heterogeneous architectures. In Digest of Papers: FTCS-27, The Twenty-Seventh Annual International Symposium on Fault-Tolerant Computing. Seattle, Washington, 58--67.]]

[40]

RFC-1014 1987. Network working group request for comments: 1014. XDR: External data representation standard.]]

[41]

RFC-1094 1989. Network working group request for comments: 1094. NFS: Network file system protocol specification.]]

[42]

Rodrigues, R. 2001. Combining abstraction with Byzantine fault-tolerance. M.S. thesis, Massachusetts Institute of Technology, Cambridge, Massachusetts.]]

[43]

Romanovsky, A. 2000. Faulty version recovery in object-oriented N-version programming. IEE Proc. Soft. 147, 3 (June), 81--90.]]

[44]

Salles, F., Rodríguez, M., Fabre, J., and Arlat, J. 1999. MetaKernels and Fault Containment Wrappers. In Digest of Papers: FTCS-29, The Twenty-Ninth Annual International Symposium on Fault-Tolerant Computing. Madison, Wisconsin, 22--29.]]

[45]

Schneider, F. 1990. Implementing fault-tolerant services using the state machine approach: a tutorial. ACM Comput. Surv. 22, 4 (Dec.), 299--319.]]

Digital Library

[46]

Stonebraker, M., Rowe, L. A., and Hirohama, M. 1990. The implementation of POSTGRES. IEEE Trans. Knowl. Data Eng. 2, 1 (Mar.), 125--142.]]

Digital Library

[47]

Tso, K. and Avizienis, A. 1987. Community error recovery in N-version software: A design study with experimentation. In Digest of Papers: FTCS-17, the Seventeenth Annual Symposium on Fault Tolerant Computing. Pittsburgh, Pennsylvania, 127--133.]]

[48]

W3C. 2000. Extensible Markup Language (XML) 1.0 (Second Edition). W3C recommendation.]]

Cited By

Marotta RPellegrini AAndelfinger P(2024)Follow the Leader: Alternating CPU/GPU Computations in PDESProceedings of the 38th ACM SIGSIM Conference on Principles of Advanced Discrete Simulation10.1145/3615979.3656056(47-51)Online publication date: 24-Jun-2024
https://dl.acm.org/doi/10.1145/3615979.3656056
Ye ZMisra UCheng JZhou WSong D(2024)Specular: Towards Secure, Trust-minimized Optimistic Blockchain Execution2024 IEEE Symposium on Security and Privacy (SP)10.1109/SP54263.2024.00175(3943-3960)Online publication date: 19-May-2024
https://doi.org/10.1109/SP54263.2024.00175
Shoker ARahli VDecouchant JEsteves-Verissimo P(2023)Intrusion Resilience Systems for Modern Vehicles2023 IEEE 97th Vehicular Technology Conference (VTC2023-Spring)10.1109/VTC2023-Spring57618.2023.10200532(1-7)Online publication date: Jun-2023
https://doi.org/10.1109/VTC2023-Spring57618.2023.10200532
Show More Cited By

Recommendations

Practical byzantine fault tolerance and proactive recovery

Our growing reliance on online services accessible on the Internet demands highly available systems that provide correct service without interruptions. Software bugs, operator mistakes, and malicious attacks are a major cause of service interruptions ...
Tolerating byzantine faults in transaction processing systems using commit barrier scheduling
SOSP '07: Proceedings of twenty-first ACM SIGOPS symposium on Operating systems principles

This paper describes the design, implementation, and evaluation of areplication scheme to handle Byzantine faults in transaction processing database systems. The scheme compares answers from queries and updates on multiple replicas which are unmodified, ...
Tolerating byzantine faults in transaction processing systems using commit barrier scheduling
SOSP '07

This paper describes the design, implementation, and evaluation of areplication scheme to handle Byzantine faults in transaction processing database systems. The scheme compares answers from queries and updates on multiple replicas which are unmodified, ...

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Transactions on Computer Systems

ACM Transactions on Computer Systems Volume 21, Issue 3

August 2003

106 pages

ISSN:0734-2071

EISSN:1557-7333

DOI:10.1145/859716

Issue’s Table of Contents

Copyright © 2003 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 01 August 2003

Published in TOCS Volume 21, Issue 3

Permissions

Request permissions for this article.

Request Permissions

Check for updates

Author Tags

Qualifiers

Article

Contributors

Other Metrics

View Article Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

109
Total Citations
View Citations
2,190
Total Downloads

Downloads (Last 12 months)20
Downloads (Last 6 weeks)0

Reflects downloads up to 26 Sep 2024

Other Metrics

View Author Metrics

Citations

Cited By

Marotta RPellegrini AAndelfinger P(2024)Follow the Leader: Alternating CPU/GPU Computations in PDESProceedings of the 38th ACM SIGSIM Conference on Principles of Advanced Discrete Simulation10.1145/3615979.3656056(47-51)Online publication date: 24-Jun-2024
https://dl.acm.org/doi/10.1145/3615979.3656056
Ye ZMisra UCheng JZhou WSong D(2024)Specular: Towards Secure, Trust-minimized Optimistic Blockchain Execution2024 IEEE Symposium on Security and Privacy (SP)10.1109/SP54263.2024.00175(3943-3960)Online publication date: 19-May-2024
https://doi.org/10.1109/SP54263.2024.00175
Shoker ARahli VDecouchant JEsteves-Verissimo P(2023)Intrusion Resilience Systems for Modern Vehicles2023 IEEE 97th Vehicular Technology Conference (VTC2023-Spring)10.1109/VTC2023-Spring57618.2023.10200532(1-7)Online publication date: Jun-2023
https://doi.org/10.1109/VTC2023-Spring57618.2023.10200532
Wang GNixon M(2023)SoK: Essentials of BFT Consensus for Blockchains2023 Fifth International Conference on Blockchain Computing and Applications (BCCA)10.1109/BCCA58897.2023.10338868(315-328)Online publication date: 24-Oct-2023
https://doi.org/10.1109/BCCA58897.2023.10338868
Shrivastava SSharma A(2022)Distributed Ledger Technology (DLT) and Byzantine Fault Tolerance in BlockchainSoft Computing: Theories and Applications10.1007/978-981-19-0707-4_86(971-981)Online publication date: 2-Jun-2022
https://doi.org/10.1007/978-981-19-0707-4_86
Distler T(2021)Byzantine Fault-tolerant State-machine Replication from a Systems PerspectiveACM Computing Surveys10.1145/343672854:1(1-38)Online publication date: 11-Feb-2021
https://dl.acm.org/doi/10.1145/3436728
Oglio JHood KSharma GNesterenko M(2021)Byzantine GeoconsensusNetworked Systems10.1007/978-3-030-91014-3_2(19-35)Online publication date: 19-May-2021
https://dl.acm.org/doi/10.1007/978-3-030-91014-3_2
Zhao W(2021)Byzantine Fault ToleranceFrom Traditional Fault Tolerance to Blockchain10.1002/9781119682127.ch7(245-293)Online publication date: 18-Jun-2021
https://doi.org/10.1002/9781119682127.ch7
Zhang YSetty SChen QZhou LAlvisi LLu SHowell J(2020)Byzantine ordered consensus without byzantine oligarchyProceedings of the 14th USENIX Conference on Operating Systems Design and Implementation10.5555/3488766.3488802(633-649)Online publication date: 4-Nov-2020
https://dl.acm.org/doi/10.5555/3488766.3488802
Gorbenko ARomanovsky ATarasyuk OBiloborodov O(2020)From Analyzing Operating System Vulnerabilities to Designing Multiversion Intrusion-Tolerant ArchitecturesIEEE Transactions on Reliability10.1109/TR.2019.289724869:1(22-39)Online publication date: Mar-2020
https://doi.org/10.1109/TR.2019.2897248
Show More Cited By

View Options

Get Access

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Article

View options

PDF

View or Download as a PDF file.

eReader

View online with eReader.

Media

Figures

Other

Tables

View Issue’s Table of Contents