Cornus: atomic commit for a cloud DBMS with storage disaggregation

Published: 01 October 2022

Abstract

Two-phase commit (2PC) is widely used in distributed databases to ensure atomicity of distributed transactions. Conventional 2PC was originally designed for the shared-nothing architecture and has two limitations: long latency due to two eager log writes on the critical path, and blocking of progress when a coordinator fails.
Modern cloud-native databases are moving to a storage disaggregation architecture where storage is a shared, highly available service. Our key observation is that disaggregated storage enables protocol innovations that can address both the long-latency and blocking problems. We develop Cornus, an optimized 2PC protocol that achieves this goal. The only extra functionality Cornus requires is an atomic compare-and-swap capability in the storage layer, which many existing storage services already support. We present Cornus in detail and show how it addresses the two limitations. We also deploy it on real storage services including Azure Blob Storage and Redis. Empirical evaluations show that Cornus can achieve up to 1.9× latency reduction over conventional 2PC.
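
The abstract names an atomic compare-and-swap in the storage layer as the only extra capability required, but does not spell out the protocol steps. The following is a minimal sketch, under stated assumptions, of how such a primitive could give each participant a write-once log record and let any node settle a transaction's outcome from shared storage without waiting on a coordinator. The names `SharedStorage`, `compare_and_swap`, `log_once`, and `resolve` are hypothetical, the storage is an in-memory stand-in rather than Azure Blob Storage or Redis, and this is not presented as the authors' implementation.

```python
import threading
from typing import Dict, List, Optional


class SharedStorage:
    """Hypothetical in-memory stand-in for a storage service with compare-and-swap."""

    def __init__(self) -> None:
        self._records: Dict[str, str] = {}
        self._lock = threading.Lock()

    def compare_and_swap(self, key: str, expected: Optional[str], new: str) -> Optional[str]:
        """Atomically store `new` if the current value equals `expected`.

        Returns the value held under `key` after the call.
        """
        with self._lock:
            current = self._records.get(key)
            if current == expected:
                self._records[key] = new
                return new
            return current


def log_once(storage: SharedStorage, txn_id: str, participant: str, state: str) -> Optional[str]:
    """Write `state` only if this participant's record is still empty.

    Returns whichever state ends up in the record, so the first writer wins.
    """
    return storage.compare_and_swap(f"{txn_id}/{participant}", None, state)


def resolve(storage: SharedStorage, txn_id: str, participants: List[str]) -> str:
    """Assumed termination step: try to write ABORT into every participant's record.

    The transaction counts as committed only if every record already held VOTE-YES.
    """
    states = [log_once(storage, txn_id, p, "ABORT") for p in participants]
    return "COMMIT" if all(s == "VOTE-YES" for s in states) else "ABORT"


storage = SharedStorage()
participants = ["p1", "p2", "p3"]

# Transaction t1: every participant logs its vote before anyone resolves.
for p in participants:
    log_once(storage, "t1", p, "VOTE-YES")
outcome_all_yes = resolve(storage, "t1", participants)

# Transaction t2: only p1 has voted when another node times out and resolves.
log_once(storage, "t2", "p1", "VOTE-YES")
outcome_missing = resolve(storage, "t2", participants)

# A vote that arrives after resolution cannot overwrite the ABORT record.
late_vote = log_once(storage, "t2", "p2", "VOTE-YES")
```

In this sketch the outcome is a function of the records in shared storage alone, which is why a failed coordinator would not leave the other nodes blocked: any node can run `resolve` and all nodes that do so reach the same answer.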

Published In

Proceedings of the VLDB Endowment  Volume 16, Issue 2
October 2022
266 pages
ISSN:2150-8097

Publisher

VLDB Endowment

Qualifiers

  • Research-article

Cited By

  • (2024) PALF: Replicated Write-Ahead Logging for Distributed Databases. Proceedings of the VLDB Endowment 17:12 (3745-3758). DOI: 10.14778/3685800.3685803. Online publication date: 1-Aug-2024
  • (2024) Occam's Razor for Distributed Protocols. Proceedings of the 2024 ACM Symposium on Cloud Computing (618-636). DOI: 10.1145/3698038.3698514. Online publication date: 20-Nov-2024
  • (2024) Boosting Data Center Performance via Intelligently Managed Multi-backend Disaggregated Memory. Proceedings of the International Conference for High Performance Computing, Networking, Storage, and Analysis (1-18). DOI: 10.1109/SC41406.2024.00043. Online publication date: 17-Nov-2024
  • (2023) VeriTxn: Verifiable Transactions for Cloud-Native Databases with Storage Disaggregation. Proceedings of the ACM on Management of Data 1:4 (1-27). DOI: 10.1145/3626764. Online publication date: 12-Dec-2023
  • (2023) Evaluating the Performance Impact of No-Wait Approach to Resolving Write Conflicts in Databases. Computer Performance Engineering and Stochastic Modelling (171-185). DOI: 10.1007/978-3-031-43185-2_12. Online publication date: 7-Oct-2023
