
DOI: 10.1145/3578358.3591329
research-article

Generic Checkpointing Support for Stream-based State-Machine Replication

Published: 08 May 2023

Abstract

Stream-based replication facilitates the deployment and operation of state-machine replication protocols by running them as applications on top of data-stream processing frameworks. Taking advantage of platform-provided features, this approach makes it possible to significantly reduce implementation complexity at the protocol level. To further extend these benefits, in this paper we examine how the concept can be used to provide generic support for creating, storing, and applying checkpoints of replica states, both to support catch-up and garbage collection and to recover failed replicas. Specifically, we present three checkpointing-mechanism designs with different degrees of platform involvement and evaluate them in the context of Twitter's stream-processing engine Heron.
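To make the checkpoint lifecycle in the abstract (creating, storing, and applying checkpoints for catch-up, garbage collection, and recovery) more concrete, the following Python sketch shows a generic replica that creates a checkpoint every few commands, discards the log entries that checkpoint covers, and lets a lagging or recovering replica catch up by installing a checkpoint and replaying the remaining log suffix. It is a minimal illustration under stated assumptions, not one of the paper's three designs, and it uses no Heron APIs. The `Replica` and `Checkpoint` classes, the `CHECKPOINT_INTERVAL` constant, and the `("put", key, value)` command format are all hypothetical, and commands are assumed to arrive already totally ordered.

```python
# Minimal sketch of generic checkpointing for a replicated key-value state machine.
# Hypothetical assumptions: commands arrive already totally ordered as
# ("put", key, value) tuples, and a checkpoint is taken every CHECKPOINT_INTERVAL commands.
import copy
from dataclasses import dataclass, field

CHECKPOINT_INTERVAL = 3  # hypothetical checkpoint period, in commands


@dataclass
class Checkpoint:
    seq: int     # sequence number of the last command reflected in `state`
    state: dict  # snapshot of the replica state at that point


@dataclass
class Replica:
    state: dict = field(default_factory=dict)
    seq: int = 0  # sequence number of the last applied command
    log: list = field(default_factory=list)  # (seq, command) entries newer than the checkpoint
    checkpoint: Checkpoint = field(default_factory=lambda: Checkpoint(0, {}))

    def apply(self, seq, command):
        """Execute the next ordered command and log it for later catch-up."""
        if seq != self.seq + 1:
            raise ValueError(f"expected command {self.seq + 1}, got {seq}")
        op, key, value = command
        if op == "put":
            self.state[key] = value
        self.seq = seq
        self.log.append((seq, command))
        if self.seq % CHECKPOINT_INTERVAL == 0:
            self.create_checkpoint()

    def create_checkpoint(self):
        """Snapshot the state, then garbage-collect log entries the snapshot covers."""
        self.checkpoint = Checkpoint(self.seq, copy.deepcopy(self.state))
        self.log = [(s, c) for s, c in self.log if s > self.checkpoint.seq]

    def catch_up_from(self, source):
        """Update a lagging or recovered replica from another replica's latest
        checkpoint plus the log suffix recorded after it."""
        if source.checkpoint.seq > self.seq:
            # The log entries this replica is missing were garbage-collected,
            # so install the checkpoint first.
            self.state = copy.deepcopy(source.checkpoint.state)
            self.seq = source.checkpoint.seq
            self.log = []
            self.checkpoint = Checkpoint(self.seq, copy.deepcopy(self.state))
        for seq, command in source.log:
            if seq > self.seq:
                self.apply(seq, command)
```

For example, applying four commands with `CHECKPOINT_INTERVAL = 3` leaves a checkpoint at sequence number 3 and a single log entry (sequence number 4); a fresh replica calling `catch_up_from` adopts that checkpoint and replays entry 4 to reach the same state. Deep-copying the state keeps a stored checkpoint unaffected by later commands. A real system would also need to persist checkpoints and transfer them between replicas, which this in-memory sketch omits.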



    Published In

    PaPoC '23: Proceedings of the 10th Workshop on Principles and Practice of Consistency for Distributed Data
    May 2023
    89 pages
    ISBN: 9798400700866
    DOI: 10.1145/3578358

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. state-machine replication
    2. checkpointing
    3. recovery
    4. data-stream processing

    Conference

    PaPoC '23

    Acceptance Rates

    Overall Acceptance Rate 34 of 47 submissions, 72%


    Bibliometrics

    Article metrics (downloads reflect data up to 17 Nov 2024)

    • 0 total citations
    • 68 total downloads
    • 22 downloads in the last 12 months
    • 2 downloads in the last 6 weeks
