Strong consistency is not hard to get: two-phase locking and two-phase commit on thousands of cores

Published: 01 September 2019

Abstract

Concurrency control is a cornerstone of distributed database engines and storage systems. In pursuit of scalability, a common assumption is that Two-Phase Locking (2PL) and Two-Phase Commit (2PC) are not viable solutions due to their communication overhead. Recent results, however, have hinted that 2PL and 2PC might not perform as poorly as assumed. Nevertheless, there has been no attempt to actually measure how a state-of-the-art implementation of 2PL and 2PC would perform on modern hardware.
The goal of this paper is to establish a baseline for concurrency control mechanisms on thousands of cores connected through a low-latency network. We develop a distributed lock table supporting all the standard locking modes used in database engines. We focus on strong consistency in the form of strict serializability implemented through strict 2PL, but also explore read-committed and repeatable-read, two isolation levels used in many systems. We do not leverage any known optimizations in the locking or commit parts of the protocols. The surprising result is that, for TPC-C, 2PL and 2PC can be made to scale to thousands of cores and hundreds of machines, reaching a throughput of over 21 million transactions per second with 9.5 million New Order operations per second. Since most existing relational database engines use some form of locking to implement concurrency control, our findings provide a path for such systems to scale without having to significantly redesign transaction management. To achieve these results, our implementation relies on Remote Direct Memory Access (RDMA). Today, this technology is commonly available on both InfiniBand and Ethernet networks, making the results valid across a wide range of systems and platforms, including database appliances, data centers, and cloud environments.
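
To make the idea of a lock table with standard locking modes and strict 2PL concrete, the following is a minimal single-process sketch, not the paper's implementation. The abstract does not list the modes, so the five multi-granularity modes (IS, IX, S, SIX, X) and their textbook compatibility matrix are assumed here. The names `LockTable`, `try_lock`, and `release_all` are hypothetical, and the RDMA-based distribution described in the abstract is not modeled.

```python
from collections import defaultdict

# Assumed set of "standard" modes: the textbook multi-granularity modes.
# COMPATIBLE[held] is the set of modes another transaction may be granted
# on the same item while `held` is in place.
COMPATIBLE = {
    "IS": {"IS", "IX", "S", "SIX"},
    "IX": {"IS", "IX"},
    "S": {"IS", "S"},
    "SIX": {"IS"},
    "X": set(),
}


class LockTable:
    """Single-process stand-in for a lock table (hypothetical API)."""

    def __init__(self):
        # item -> {transaction id -> mode currently held}
        self.granted = defaultdict(dict)

    def try_lock(self, txn, item, mode):
        """Grant `mode` on `item` if it is compatible with every other holder.

        Simplification: each transaction is assumed to request at most one
        mode per item, so lock upgrades are not handled.
        """
        holders = self.granted[item]
        for other, held in holders.items():
            if other != txn and mode not in COMPATIBLE[held]:
                return False  # conflict: the caller has to wait or abort
        holders[txn] = mode
        return True

    def release_all(self, txn):
        """Strict 2PL: every lock is released together, at commit or abort."""
        for holders in self.granted.values():
            holders.pop(txn, None)
```

In this sketch a transaction would call `try_lock` for each item it touches and call `release_all` only once it has committed or aborted. Holding every lock until that point is what makes the locking strict.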

Published In

Proceedings of the VLDB Endowment, Volume 12, Issue 13
September 2019
97 pages
ISSN: 2150-8097

Publisher

VLDB Endowment