Design Guidelines for Correct, Efficient, and Scalable Synchronization using One-Sided RDMA

Published: 20 June 2023

Abstract

Remote data structures built with one-sided Remote Direct Memory Access (RDMA) are at the heart of many disaggregated database management systems today. Concurrent access to these data structures by thousands of remote workers necessitates a highly efficient synchronization scheme. Remarkably, our investigation reveals that existing synchronization schemes display substantial variations in performance and scalability. Even worse, some schemes do not correctly synchronize, resulting in rare and hard-to-detect data corruption. Motivated by these observations, we conduct the first comprehensive analysis of one-sided synchronization techniques and provide general principles for correct synchronization using one-sided RDMA. Our research demonstrates that adherence to these principles not only guarantees correctness but also results in substantial performance enhancements.

Supplemental Material

MP4 File
Presentation video for SIGMOD 2023

      Published In

      Proceedings of the ACM on Management of Data  Volume 1, Issue 2
      PACMMOD
      June 2023
      2310 pages
      EISSN:2836-6573
      DOI:10.1145/3605748

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. RDMA
      2. distributed database management systems
      3. synchronization

      Qualifiers

      • Research-article
