Scalable and Robust Snapshot Isolation for High-Performance Storage Engines

Published: 01 February 2023

Abstract

MVCC-based snapshot isolation promises that read queries can proceed without interfering with concurrent writes. However, as we show experimentally, in existing implementations a single long-running query can easily cause transactional throughput to collapse. Moreover, existing out-of-memory commit protocols fail to meet the scalability needs of modern multi-core systems. In this paper, we present three complementary techniques for robust and scalable snapshot isolation in out-of-memory systems. First, we propose a commit protocol that minimizes cross-thread communication for better scalability, avoids touching the write set on commit, and enables efficient fine-granular garbage collection. Second, we introduce the Graveyard Index, an auxiliary data structure that moves logically-deleted tuples out of the way of operational transactions. Third, we present an adaptive version storage scheme that enables fast garbage collection and improves scan performance of frequently-modified tuples. All techniques are engineered to scale well on multi-core processors, and together enable robust performance for complex hybrid workloads.
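
To illustrate why a single long-running query can hurt, the following is a minimal sketch of a textbook MVCC version chain with watermark-based garbage collection. It is not the commit protocol, Graveyard Index, or version storage scheme proposed in the paper; the names `Version`, `VersionChain`, and `garbage_collect` are hypothetical and chosen only for this example. It assumes integer commit timestamps and a collector that may only discard versions no active snapshot can still read.

```python
from dataclasses import dataclass, field


@dataclass
class Version:
    commit_ts: int  # timestamp at which this version became visible
    value: str


@dataclass
class VersionChain:
    """Versions of one tuple, ordered newest first (illustrative only)."""
    versions: list = field(default_factory=list)

    def write(self, commit_ts, value):
        # Assumes writes arrive in increasing commit_ts order.
        self.versions.insert(0, Version(commit_ts, value))

    def read(self, snapshot_ts):
        # A snapshot sees the newest version committed at or before it.
        for v in self.versions:
            if v.commit_ts <= snapshot_ts:
                return v.value
        return None

    def garbage_collect(self, oldest_active_snapshot):
        # Keep every version newer than the watermark, plus the newest
        # version at or below it (the oldest reader still needs that one).
        kept = []
        for v in self.versions:
            kept.append(v)
            if v.commit_ts <= oldest_active_snapshot:
                break
        removed = len(self.versions) - len(kept)
        self.versions = kept
        return removed


chain = VersionChain()
for ts in range(1, 6):
    chain.write(ts, f"v{ts}")

# A reader that started at timestamp 1 pins the whole chain.
removed_while_pinned = chain.garbage_collect(oldest_active_snapshot=1)

# Once that reader finishes, the watermark advances and old versions can go.
removed_after_release = chain.garbage_collect(oldest_active_snapshot=5)
```

In this sketch, as long as the reader at timestamp 1 is active, no version can be reclaimed and every later read has to walk a growing chain. This is one common way long-running queries degrade write-heavy workloads in watermark-based designs.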

Published In

Proceedings of the VLDB Endowment, Volume 16, Issue 6
February 2023
393 pages
ISSN: 2150-8097

Publisher

VLDB Endowment
