Nothing Special   »   [go: up one dir, main page]

skip to main content
research-article
Public Access

Strong and Efficient Consistency with Consistency-aware Durability

Published: 18 January 2021 Publication History

Abstract

We introduce consistency-aware durability or Cad, a new approach to durability in distributed storage that enables strong consistency while delivering high performance. We demonstrate the efficacy of this approach by designing cross-client monotonic reads, a novel and strong consistency property that provides monotonic reads across failures and sessions in leader-based systems; such a property can be particularly beneficial in geo-distributed and edge-computing scenarios. We build Orca, a modified version of ZooKeeper that implements Cad and cross-client monotonic reads. We experimentally show that Orca provides strong consistency while closely matching the performance of weakly consistent ZooKeeper. Compared to strongly consistent ZooKeeper, Orca provides significantly higher throughput (1.8--3.3×) and notably reduces latency, sometimes by an order of magnitude in geo-distributed settings. We also implement Cad in Redis and show that the performance benefits are similar to that of Cad’s implementation in ZooKeeper.

References

[1]
Ramnatthan Alagappan, Aishwarya Ganesan, Jing Liu, Andrea Arpaci-Dusseau, and Remzi Arpaci-Dusseau. 2018. Fault-tolerance, fast and slow: Exploiting failure asynchrony in distributed systems. In Proceedings of the 13th USENIX Conference on Operating Systems Design and Implementation (OSDI’18).
[2]
Ramnatthan Alagappan, Aishwarya Ganesan, Yuvraj Patel, Thanumalayan Sankaranarayana Pillai, Andrea C. Arpaci-Dusseau, and Remzi H. Arpaci-Dusseau. 2016. Correlated crash vulnerabilities. In Proceedings of the 12th USENIX Conference on Operating Systems Design and Implementation (OSDI’16).
[3]
Apache. ZooKeeper. Retrieved from https://zookeeper.apache.org/.
[4]
Apache. ZooKeeper Configuration Parameters. Retrieved from https://zookeeper.apache.org/doc/r3.1.2/zookeeperAdmin.html#sc_configuration.
[5]
Apache. ZooKeeper Guarantees, Properties, and Definitions. Retrieved from https://zookeeper.apache.org/doc/r3.2.2/zookeeperInternals.html#sc_guaranteesPropertiesDefinitions.
[6]
Apache. ZooKeeper Leader Activation. Retrieved from https://zookeeper.apache.org/doc/r3.2.2/zookeeperInternals.html#sc_leaderElection.
[7]
Apache. ZooKeeper Overview. Retrieved from https://zookeeper.apache.org/doc/r3.5.1-alpha/zookeeperOver.html.
[8]
Apache ZooKeeper. ZooKeeper Consistency Guarantees. Retrieved from https://zookeeper.apache.org/doc/r3.3.3/zookeeperProgrammers.html#ch_zkGuarantees.
[9]
Apache ZooKeeper. ZooKeeper Programmer’s Guide - ZooKeeper Stat Structure. Retrieved from https://zookeeper.apache.org/doc/r3.1.2/zookeeperProgrammers.html#sc_zkStatStructure.
[10]
Peter Bailis, Ali Ghodsi, Joseph M. Hellerstein, and Ion Stoica. 2013. Bolt-on causal consistency. In Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD’13).
[11]
William J. Bolosky, Dexter Bradshaw, Randolph B. Haagens, Norbert P. Kusters, and Peng Li. 2011. Paxos replicated state machines as the basis of a high-performance data store. In Proceedings of the 8th Symposium on Networked Systems Design and Implementation (NSDI’11).
[12]
Randal C. Burns, Robert M. Rees, and Darrell D. E. Long. 2001. An analytical study of opportunistic lease renewal. In Proceedings of the International Conference on Distributed Computing Systems (ICDCS’01).
[13]
Vijay Chidambaram, Thanumalayan Sankaranarayana Pillai, Andrea C. Arpaci-Dusseau, and Remzi H. Arpaci-Dusseau. 2013. Optimistic crash consistency. In Proceedings of the 24th ACM Symposium on Operating Systems Principles (SOSP’13).
[14]
James Cipar, Greg Ganger, Kimberly Keeton, Charles B. Morrey III, Craig A. N. Soules, and Alistair Veitch. 2012. LazyBase: Trading freshness for performance in a scalable database. In Proceedings of the EuroSys Conference (EuroSys’12).
[15]
Brian F. Cooper, Adam Silberstein, Erwin Tam, Raghu Ramakrishnan, and Russell Sears. 2010. Benchmarking cloud serving systems with YCSB. In Proceedings of the ACM Symposium on Cloud Computing (SOCC’10).
[16]
Alan Demers, Dan Greene, Carl Hauser, Wes Irish, John Larson, Scott Shenker, Howard Sturgis, Dan Swinehart, and Doug Terry. 1987. Epidemic algorithms for replicated database maintenance. In Proceedings of the 26th ACM Symposium on Principles of Distributed Computing.
[17]
Aishwarya Ganesan, Ramnatthan Alagappan, Andrea C. Arpaci-Dusseau, and Remzi H. Arpaci-Dusseau. 2020. Strong and efficient consistency with consistency-aware durability. In Proceedings of the 18th USENIX Conference on File and Storage Technologies (FAST’20).
[18]
Yilong Geng, Shiyu Liu, Zi Yin, Ashish Naik, Balaji Prabhakar, Mendel Rosenblum, and Amin Vahdat. 2018. Exploiting a natural network effect for scalable, fine-grained clock synchronization. In Proceedings of the 15th Symposium on Networked Systems Design and Implementation (NSDI’18).
[19]
Cary G. Gray and David Cheriton. 1989. Leases: An efficient fault-tolerant mechanism for distributed file cache consistency. In Proceedings of the 12th ACM Symposium on Operating Systems Principles (SOSP’89).
[20]
Rachid Guerraoui, Matej Pavlovic, and Dragos-Adrian Seredinschi. 2016. Incremental consistency guarantees for replicated objects. In Proceedings of the 12th USENIX Conference on Operating Systems Design and Implementation (OSDI’16).
[21]
Maurice P. Herlihy and Jeannette M. Wing. 1990. Linearizability: A correctness condition for concurrent objects. ACM Trans. Prog. Lang. Syst. 12, 3 (1990).
[22]
Patrick Hunt, Mahadev Konar, Flavio P. Junqueira, and Benjamin Reed. 2010. ZooKeeper: Wait-free coordination for internet-scale systems. In Proceedings of the USENIX Annual Technical Conference (USENIX ATC’10).
[23]
Henrik Ingo and Aishwarya Ganesan. 2020. Discussion with Henrik Ingo. Retrieved from https://www.openlife.cc/comment/662091#comment-662091.
[24]
Jonathan Corbet. 2009. O_*SYNC. Retrieved from https://lwn.net/Articles/350219/.
[25]
Karthik Ranganathan. 2019. Low latency reads in geo-distributed SQL with raft leader leases. Retrieved from https://blog.yugabyte.com/low-latency-reads-in-geo-distributed-sql-with-raft-leader-leases/.
[26]
Leslie Lamport, Robert Shostak, and Marshall Pease. 1982. The Byzantine generals problem. ACM Trans. Prog. Lang. Syst. 4, 3 (1982), 382--401.
[27]
Collin Lee, Seo Jin Park, Ankita Kejriwal, Satoshi Matsushita, and John Ousterhout. 2015. Implementing linearizability at large scale and low latency. In Proceedings of the 25th ACM Symposium on Operating Systems Principles (SOSP’15).
[28]
Barbara Liskov and James Cowling. 2012. Viewstamped Replication Revisited. Technical Report MIT-CSAIL-TR-2012-021, MIT CSAIL, 2012.
[29]
Wyatt Lloyd, Michael J. Freedman, Michael Kaminsky, and David G. Andersen. 2011. Don’t settle for eventual: Scalable causal consistency for wide-area storage with COPS. In Proceedings of the 23rd ACM Symposium on Operating Systems Principles (SOSP’11).
[30]
LogCabin. 2016. Retrieved from https://github.com/logcabin/logcabin.
[31]
Haonan Lu, Kaushik Veeraraghavan, Philippe Ajoux, Jim Hunt, Yee Jiun Song, Wendy Tobagus, Sanjeev Kumar, and Wyatt Lloyd. 2015. Existential consistency: Measuring and understanding consistency at Facebook. In Proceedings of the 25th ACM Symposium on Operating Systems Principles (SOSP’15).
[32]
Syed Akbar Mehdi, Cody Littley, Natacha Crooks, Lorenzo Alvisi, Nathan Bronson, and Wyatt Lloyd. 2017. I can’t believe it’s not causal! Scalable causal consistency with no slowdown cascades. In Proceedings of the 14th Symposium on Networked Systems Design and Implementation (NSDI’17).
[33]
MongoDB. 2020. MongoDB Read Preference. Retrieved from https://docs.mongodb.com/manual/core/read-preference/.
[34]
MongoDB. 2020. MongoDB Replication. Retrieved from https://docs.mongodb.org/manual/replication/.
[35]
MongoDB. 2018. Non-Blocking Secondary Reads. Retrieved from https://www.mongodb.com/blog/post/mongodb-40-nonblocking-secondary-reads.
[36]
MongoDB. 2019. Read Concern Linearizable. Retrieved from https://docs.mongodb.com/manual/reference/read-concern-linearizable/.
[37]
Iulian Moraru, David G. Andersen, and Michael Kaminsky. 2013. There is more consensus in egalitarian parliaments. In Proceedings of the 24th ACM Symposium on Operating Systems Principles (SOSP’13).
[38]
Seyed Hossein Mortazavi, Bharath Balasubramanian, Eyal de Lara, and Shankaranarayanan Puzhavakath Narayanan. 2018. Toward session consistency for the edge. In Proceedings of the USENIX Workshop on Hot Topics in Edge Computing (HotEdge’18).
[39]
Edmund B. Nightingale, Kaushik Veeraraghavan, Peter M. Chen, and Jason Flinn. 2006. Rethink the sync. In Proceedings of the 7th Symposium on Operating Systems Design and Implementation (OSDI’06).
[40]
Diego Ongaro. 2014. Consensus: Bridging Theory and Practice. PhD thesis. Stanford University, Stanford, CA.
[41]
Diego Ongaro and John Ousterhout. 2014. In search of an understandable consensus algorithm. In Proceedings of the USENIX Annual Technical Conference (USENIX ATC’14).
[42]
Dan R. K. Ports, Jialin Li, Vincent Liu, Naveen Kr. Sharma, and Arvind Krishnamurthy. 2015. Designing distributed systems using approximate synchrony in data center networks. In Proceedings of the 12th Symposium on Networked Systems Design and Implementation (NSDI’15).
[43]
David Ratner, Peter Reiher, Gerald J. Popek, and Geoffrey H. Kuenning. 2001. Replication requirements in mobile environments. Mob. Netw. Applic. 6, 6 (2001), 525--533.
[44]
Redis. Redis. Retrieved from http://redis.io/.
[45]
Redis. Redis Persistence. Retrieved from https://redis.io/topics/persistence.
[46]
Redis. Redis Replication. Retrieved from http://redis.io/topics/replication.
[47]
Redis. Redis Sentinel Documentation. Retrieved from https://redis.io/topics/sentinel.
[48]
Redis. Redis WAIT. Retrieved from https://redis.io/commands/wait.
[49]
Redis. Scaling Reads. Retrieved from https://redislabs.com/ebook/part-3-next-steps/chapter-10-scaling-redis/10-1-scaling-reads/.
[50]
Retwis. 2014. Retwis. Retrieved from https://github.com/antirez/retwis.
[51]
Robert Ricci, Eric Eide, and CloudLab Team. 2014. Introducing CloudLab: Scientific infrastructure for advancing cloud architectures and applications. USENIX ;login: 39, 6 (2014).
[52]
Doug Terry. 2013. Replicated data consistency explained through baseball. Commun. ACM 56, 12 (2013), 82--89.
[53]
Douglas B. Terry, Alan J. Demers, Karin Petersen, Mike J. Spreitzer, Marvin M. Theimer, and Brent B. Welch. 1994. Session guarantees for weakly consistent replicated data. In Proceedings of the 3rd International Conference on on Parallel and Distributed Information Systems (PDIS’94).
[54]
Paolo Viotti and Marko Vukolić. 2016. Consistency in non-transactional distributed storage systems. ACM Comput. Surv. 49, 1 (2016), 19:1--19:34.
[55]
Benjamin Wester, James Cowling, Edmund B. Nightingale, Peter M. Chen, Jason Flinn, and Barbara Liskov. 2009. Tolerating latency in replicated state machines through client speculation. In Proceedings of the 6th Symposium on Networked Systems Design and Implementation (NSDI’09).
[56]
Youjip Won, Jaemin Jung, Gyeongyeol Choi, Joontaek Oh, Seongbae Son, Jooyoung Hwang, and Sangyeun Cho. 2018. Barrier-enabled IO stack for flash storage. In Proceedings of the 16th USENIX Conference on File and Storage Technologies (FAST’18).

Cited By

View all
  • (2024)SWARM: Replicating Shared Disaggregated-Memory Data in No TimeProceedings of the ACM SIGOPS 30th Symposium on Operating Systems Principles10.1145/3694715.3695945(24-45)Online publication date: 4-Nov-2024
  • (2024)Raft Consensus Grouping Mechanism Based on Affinity Propagation Clustering Algorithm2024 4th International Conference on Consumer Electronics and Computer Engineering (ICCECE)10.1109/ICCECE61317.2024.10504245(15-19)Online publication date: 12-Jan-2024
  • (2022)Exploiting Nil-external Interfaces for Fast Replicated StorageACM Transactions on Storage10.1145/354282118:3(1-35)Online publication date: 2-Sep-2022
  • Show More Cited By

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Transactions on Storage
ACM Transactions on Storage  Volume 17, Issue 1
Special Section on Usenix Fast 2020
February 2021
165 pages
ISSN:1553-3077
EISSN:1553-3093
DOI:10.1145/3446939
  • Editor:
  • Sam H. Noh
Issue’s Table of Contents
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 18 January 2021
Accepted: 01 September 2020
Received: 01 June 2020
Published in TOS Volume 17, Issue 1

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. Consistency models
  2. durability
  3. persistence
  4. replication

Qualifiers

  • Research-article
  • Research
  • Refereed

Funding Sources

  • VMware
  • DOE
  • NSF

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)277
  • Downloads (Last 6 weeks)51
Reflects downloads up to 20 Nov 2024

Other Metrics

Citations

Cited By

View all
  • (2024)SWARM: Replicating Shared Disaggregated-Memory Data in No TimeProceedings of the ACM SIGOPS 30th Symposium on Operating Systems Principles10.1145/3694715.3695945(24-45)Online publication date: 4-Nov-2024
  • (2024)Raft Consensus Grouping Mechanism Based on Affinity Propagation Clustering Algorithm2024 4th International Conference on Consumer Electronics and Computer Engineering (ICCECE)10.1109/ICCECE61317.2024.10504245(15-19)Online publication date: 12-Jan-2024
  • (2022)Exploiting Nil-external Interfaces for Fast Replicated StorageACM Transactions on Storage10.1145/354282118:3(1-35)Online publication date: 2-Sep-2022
  • (2022)Leader Confirmation Replication for Millisecond Consensus in Private ChainsIEEE Internet of Things Journal10.1109/JIOT.2021.31138359:11(7944-7958)Online publication date: 1-Jun-2022
  • (2022)Multidirectional Replication for Supporting Strong Consistency, Low Latency, and High ThroughputAlexandria Engineering Journal10.1016/j.aej.2022.05.00561:12(11485-11510)Online publication date: Dec-2022

View Options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

HTML Format

View this article in HTML Format.

HTML Format

Login options

Full Access

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media