Nothing Special   »   [go: up one dir, main page]

skip to main content
10.1145/3492321.3519555acmconferencesArticle/Chapter ViewAbstractPublication PageseurosysConference Proceedingsconference-collections
research-article

Building an efficient key-value store in a flexible address space

Published: 28 March 2022 Publication History

Abstract

Data management applications store their data using structured files in which data are usually sorted to serve indexing and queries. However, in-place insertions and removals of data are not naturally supported in a file's address space. To avoid repeatedly rewriting existing data in a sorted file to admit changes in place, applications usually employ extra layers of indirections, such as mapping tables and logs, to admit changes out of place. However, this approach leads to increased access cost and excessive complexity.
This paper presents a novel storage abstraction that provides a flexible address space, where in-place updates of arbitrary-sized data, such as insertions and removals, can be performed efficiently. With these mechanisms, applications can manage sorted data in a linear address space with minimal complexity. Extensive evaluations show that a key-value store built on top of it can achieve high performance and efficiency with a simple implementation.

References

[1]
Berk Atikoglu, Yuehai Xu, Eitan Frachtenberg, Song Jiang, and Mike Paleczny. "Workload Analysis of a Large-Scale Key-Value Store". In: SIGMETRICS Perform. Eval. Rev. 40.1 (2012), pp. 53--64.
[2]
Michael A Bender, Martin Farach-Colton, William Jannen, Rob Johnson, Bradley C Kuszmaul, Donald E Porter, Jun Yuan, and Yang Zhan. "And introduction to Be-trees and write-optimization". In: Login; Magazine 40.5 (2015).
[3]
Lawrence Benson, Hendrik Makait, and Tilmann Rabl. "Viper: An Efficient Hybrid PMem-DRAM Key-Value Store". In: Proceedings of the VLDB Endowment 14.9 (2021), pp. 1544--1556.
[4]
Edward Bortnikov, Anastasia Braginsky, Eshcar Hillel, Idit Keidar, and Gali Sheffi. "Accordion: Better Memory Organization for LSM Key-Value Stores". In: Proc. VLDB Endow. 11.12 (2018), pp. 1863--1875.
[5]
Zhichao Cao, Siying Dong, Sagar Vemuri, and David H. C. Du. "Characterizing, Modeling, and Benchmarking RocksDB Key-Value Workloads at Facebook". In: 18th USENIX Conference on File and Storage Technologies (FAST'20). 2020, pp. 209--223.
[6]
Badrish Chandramouli, Guna Prasaad, Donald Kossmann, Justin Levandoski, James Hunter, and Mike Barnett. "FASTER: A Concurrent Key-Value Store with In-Place Updates". In: Proceedings of the 2018 International Conference on Management of Data. 2018, pp. 275--290.
[7]
Hao Chen, Chaoyi Ruan, Cheng Li, Xiaosong Ma, and Yinlong Xu. "SpanDB: A Fast, Cost-Effective LSM-tree Based KV Store on Hybrid Storage". In: 19th USENIX Conference on File and Storage Technologies (FAST 21). 2021, pp. 17--32.
[8]
DWARF Debugging Information Format Committee. DWARF debugging information format version 5. 2017.
[9]
Alexander Conway, Abhishek Gupta, Vijay Chidambaram, Martin Farach-Colton, Richard Spillane, Amy Tai, and Rob Johnson. "SplinterDB: Closing the Bandwidth Gap for NVMe Key-Value Stores". In: 2020 USENIX Annual Technical Conference (USENIX ATC'20). 2020, pp. 49--63.
[10]
Brian F. Cooper, Adam Silberstein, Erwin Tam, Raghu Ramakrishnan, and Russell Sears. "Benchmarking Cloud Serving Systems with YCSB". In: Proceedings of the 1st ACM Symposium on Cloud Computing (SoCC'10). 2010, pp. 143--154.
[11]
Fernando J Corbato. A paging experiment with the multics system. Tech. rep. MASSACHUSETTS INST OF TECH CAMBRIDGE PROJECT MAC, 1968.
[12]
Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, and Clifford Stein. Introduction to Algorithms, Third Edition. 3rd. The MIT Press, 2009.
[13]
Counted B-Trees. https://www.chiark.greenend.org.uk/~sgtatham/algorithms/cbtree.html.
[14]
Ali Davoudian, Liu Chen, and Mengchi Liu. "A Survey on NoSQL Stores". In: ACM Comput. Surv. 51.2 (2018).
[15]
Niv Dayan and Stratos Idreos. "Dostoevsky: Better Space-Time Trade-Offs for LSM-Tree Based Key-Value Stores via Adaptive Removal of Superfluous Merging". In: Proceedings of the 2018 International Conference on Management of Data (SIGMOD'18). 2018, pp. 505--520.
[16]
Niv Dayan and Stratos Idreos. "The Log-Structured Merge-Bush & the Wacky Continuum". In: Proceedings of the 2019 International Conference on Management of Data (SIGMOD'19). 2019, pp. 449--466.
[17]
Siying Dong, Mark Callaghan, Leonidas Galanis, Dhruba Borthakur, and Tony Savor. "Optimizing Space Amplification in RocksDB." In: The Conference on Innovative Data Systems Research (CIDR'17). Vol. 3. 2017, p. 3.
[18]
John Esmet, Michael A. Bender, Martin Farach-Colton, and Bradley C. Kuszmaul. "The TokuFS Streaming File System". In: Proceedings of the 4th USENIX Conference on Hot Topics in Storage and File Systems (HotStorage'12). 2012, p. 14.
[19]
Ext4 Disk Layout. https://ext4.wiki.kernel.org/index.php/Ext4_Disk_Layout.
[20]
Ext4 Filesystem. https://www.kernel.org/doc/Documentation/filesystems/ext4.txt.
[21]
Facebook. RocksDB. https://rocksdb.org.
[22]
fallocate(2) --- Linux manual page. https://www.man7.org/linux/man-pages/man2/fallocate.2.html.
[23]
Sanjay Ghemawat and Jeff Dean. LevelDB. https://github.com/google/leveldb.
[24]
Eran Gilad, Edward Bortnikov, Anastasia Braginsky, Yonatan Gottesman, Eshcar Hillel, Idit Keidar, Nurit Moscovici, and Rana Shahout. "EvenDB: Optimizing Key-Value Storage for Spatial Locality". In: Proceedings of the Fifteenth European Conference on Computer Systems (EuroSys'20). 2020.
[25]
Gui Huang, Xuntao Cheng, Jianying Wang, Yujie Wang, Dengcheng He, Tieying Zhang, Feifei Li, Sheng Wang, Wei Cao, and Qiang Li. "X-Engine: An Optimized Storage Engine for Large-Scale E-Commerce Transaction Processing". In: Proceedings of the 2019 International Conference on Management of Data. 2019, pp. 651--665.
[26]
Stratos Idreos and Mark Callaghan. "Key-Value Storage Engines". In: Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data. 2020, pp. 2667--2672.
[27]
Junsu Im, Jinwook Bae, Chanwoo Chung, Arvind, and Sungjin Lee. "PinK: High-speed In-storage Key-value Store with Bounded Tails". In: 2020 USENIX Annual Technical Conference (USENIX ATC 20). 2020, pp. 173--187.
[28]
Inserting a hole into a file. https://lwn.net/Articles/629965/.
[29]
Intel® OptaneTechnology. https://www.intel.com/content/www/us/en/architecture-and-technology/intel-optane-technology.html.
[30]
William Jannen, Jun Yuan, Yang Zhan, Amogh Akshintala, John Esmet, Yizheng Jiao, Ankur Mittal, Prashant Pandey, Phaneendra Reddy, Leif Walsh, Michael A. Bender, Martin Farach-Colton, Rob Johnson, Bradley C. Kuszmaul, and Donald E. Porter. "BetrFS: Write-Optimization in a Kernel File System". In: ACM Trans. Storage 11.4 (2015).
[31]
Olzhas Kaiyrakhmet, Songyi Lee, Beomseok Nam, Sam H. Noh, and Young-Ri Choi. "SLM-DB: Single-Level Key-Value Store with Persistent Memory". In: Proceedings of the 17th USENIX Conference on File and Storage Technologies (FAST'19). 2019, pp. 191--204.
[32]
Sudarsun Kannan, Nitish Bhat, Ada Gavrilovska, Andrea Arpaci-Dusseau, and Remzi Arpaci-Dusseau. "Redesigning LSMs for Nonvolatile Memory with NoveLSM". In: Proceedings of the 2018 USENIX Conference on Usenix Annual Technical Conference (USENIX ATC'18). 2018, pp. 993--1005.
[33]
Kornilios Kourtis, Nikolas Ioannou, and Ioannis Koltsidas. "Reaping the Performance of Fast NVM Storage with Udepot". In: Proceedings of the 17th USENIX Conference on File and Storage Technologies. 2019, pp. 1--15.
[34]
Changman Lee, Dongho Sim, Joo-Young Hwang, and Sangyeun Cho. "F2FS: A New File System for Flash Storage". In: Proceedings of the 13th USENIX Conference on File and Storage Technologies (FAST'15). 2015, pp. 273--286.
[35]
D. De Leo and P. Boncz. "Packed Memory Arrays - Rewired". In: 2019 IEEE 35th International Conference on Data Engineering (ICDE). 2019, pp. 830--841.
[36]
Baptiste Lepers, Oana Balmau, Karan Gupta, and Willy Zwaenepoel. "KVell: The Design and Implementation of a Fast Persistent Key-Value Store". In: Proceedings of the 27th ACM Symposium on Operating Systems Principles (SOSP'19). 2019, pp. 447--461.
[37]
Yinan Li, Bingsheng He, Robin Jun Yang, Qiong Luo, and Ke Yi. "Tree Indexing on Solid State Drives". In: Proc. VLDB Endow. 3.1--2 (2010), pp. 1195--1206.
[38]
Lanyue Lu, Thanumalayan Sankaranarayana Pillai, Andrea C. Arpaci-Dusseau, and Remzi H. Arpaci-Dusseau. "WiscKey: Separating Keys from Values in SSD-conscious Storage". In: 14th USENIX Conference on File and Storage Technologies (FAST'16). 2016, pp. 133--148.
[39]
Chen Luo and Michael J Carey. "LSM-based storage techniques: a survey". In: The VLDB Journal 29.1 (2020), pp. 393--418.
[40]
Peter Macko, Margo Seltzer, and Keith A. Smith. "Tracking Back References in a Write-Anywhere File System". In: Proceedings of the 8th USENIX Conference on File and Storage Technologies (FAST'10). 2010, p. 2.
[41]
MongoDB. https://www.mongodb.com/.
[42]
Patrick O'Neil, Edward Cheng, Dieter Gawlick, and Elizabeth O'Neil. "The Log-Structured Merge-Tree (LSM-Tree)". In: Acta Inf. 33.4 (1996), pp. 351--385.
[43]
Michael A Olson, Keith Bostic, and Margo I Seltzer. "Berkeley DB." In: USENIX Annual Technical Conference, FREENIX Track. 1999, pp. 183--191.
[44]
Anastasios Papagiannis, Giorgos Saloustros, Pilar González-Férez, and Angelos Bilas. "Tucana: Design and Implementation of a Fast and Efficient Scale-up Key-Value Store". In: Proceedings of the 2016 USENIX Conference on Usenix Annual Technical Conference (USENIX ATC'16). 2016, pp. 537--550.
[45]
PerconaFT (TokuDB). https://github.com/percona/PerconaFT.
[46]
Pandian Raju, Rohan Kadekodi, Vijay Chidambaram, and Ittai Abraham. "PebblesDB: Building Key-Value Stores Using Fragmented Log-Structured Merge Trees". In: Proceedings of the 26th Symposium on Operating Systems Principles (SOSP'17). 2017, pp. 497--514.
[47]
Kai Ren and Garth Gibson. "TABLEFS: Enhancing Metadata Efficiency in the Local File System". In: Proceedings of the 2013 USENIX Conference on Annual Technical Conference (USENIX ATC'13). 2013, pp. 145--156.
[48]
Kai Ren, Qing Zheng, Joy Arulraj, and Garth Gibson. "SlimDB: A Space-Efficient Key-Value Storage Engine for Semi-Sorted Data". In: Proc. VLDB Endow. 10.13 (2017), pp. 2037--2048.
[49]
RocksDB Tuning Guide. https://github.com/facebook/rocksdb/wiki/RocksDB-Tuning-Guide.
[50]
Ohad Rodeh. "B-Trees, Shadowing, and Clones". In: ACM Trans. Storage 3.4 (2008).
[51]
Ohad Rodeh, Josef Bacik, and Chris Mason. "BTRFS: The Linux B-Tree Filesystem". In: ACM Trans. Storage 9.3 (2013).
[52]
Mendel Rosenblum and John K. Ousterhout. "The Design and Implementation of a Log-Structured File System". In: ACM Trans. Comput. Syst. 10.1 (1992), pp. 26--52.
[53]
Stephen M. Rumble, Ankita Kejriwal, and John Ousterhout. "Log-Structured Memory for DRAM-Based Storage". In: Proceedings of the 12th USENIX Conference on File and Storage Technologies (FAST'14). 2014, pp. 1--16.
[54]
Felix Martin Schuhknecht, Jens Dittrich, and Ankur Sharma. "RUMA Has It: Rewired User-Space Memory Access is Possible!" In: Proc. VLDB Endow. 9.10 (2016), pp. 768--779.
[55]
Kai Shen, Stan Park, and Meng Zhu. "Journaling of Journal is (Almost) Free". In: Proceedings of the 12th USENIX Conference on File and Storage Technologies (FAST'14). 2014, pp. 287--293.
[56]
Pradeep Shetty, Richard Spillane, Ravikant Malpani, Binesh Andrews, Justin Seyster, and Erez Zadok. "Building Workload-Independent Storage with VT-Trees". In: Proceedings of the 11th USENIX Conference on File and Storage Technologies (FAST'13). 2013, pp. 17--30.
[57]
Symas Lightning Memory-mapped Database. https://symas.com/lmdb/.
[58]
The SGI XFS Filesystem. https://www.kernel.org/doc/Documentation/filesystems/xfs.txt.
[59]
Darrick Wong, Dave Chinner, Eric Sandeen, Ryan Lerch, and Sillion Graphics Inc. XFS Algorithms & Data Structures, 3rd Edition. 2018.
[60]
Kan Wu, Andrea Arpaci-Dusseau, and Remzi Arpaci-Dusseau. "Towards an Unwritten Contract of Intel Optane SSD". In: Proceedings of the 11th USENIX Conference on Hot Topics in Storage and File Systems (HotStorage'19). 2019, p. 3.
[61]
Xingbo Wu, Yuehai Xu, Zili Shao, and Song Jiang. "LSM-trie: An LSM-tree-based Ultra-Large Key-Value Store for Small Data Items". In: 2015 USENIX Annual Technical Conference (USENIX ATC'15). 2015, pp. 71--82.
[62]
Fei Xia, Dejun Jiang, Jin Xiong, and Ninghui Sun. "HiKV: A Hybrid Index Key-Value Store for DRAM-NVM Memory Systems". In: Proceedings of the 2017 USENIX Conference on Usenix Annual Technical Conference. 2017, pp. 349--362.
[63]
Jingpei Yang, Ned Plasson, Greg Gillis, Nisha Talagala, and Swaminathan Sundararaman. "Don't Stack Your Log On My Log". In: 2nd Workshop on Interactions of NVM/Flash with Operating Systems and Workloads (INFLOW'14). 2014.
[64]
Juncheng Yang, Yao Yue, and K. V. Rashmi. "A large scale analysis of hundreds of in-memory cache clusters at Twitter". In: 14th USENIX Symposium on Operating Systems Design and Implementation (OSDI'20). 2020, pp. 191--208.
[65]
Ting Yao, Yiwen Zhang, Jiguang Wan, Qiu Cui, Liu Tang, Hong Jiang, Changsheng Xie, and Xubin He. "MatrixKV: Reducing Write Stalls and Write Amplification in LSM-tree Based KV Stores with Matrix Container in NVM". In: 2020 USENIX Annual Technical Conference (USENIX ATC'20). 2020, pp. 17--31.
[66]
Jun Yuan, Yang Zhan, William Jannen, Prashant Pandey, Amogh Akshintala, Kanchan Chandnani, Pooja Deo, Zardosht Kasheff, Leif Walsh, Michael A. Bender, Martin Farach-Colton, Rob Johnson, Bradley C. Kuszmaul, and Donald E. Porter. "Optimizing Every Operation in a Write-Optimized File System". In: Proceedings of the 14th Usenix Conference on File and Storage Technologies. 2016, pp. 1--14.
[67]
Yang Zhan, Alex Conway, Yizheng Jiao, Eric Knorr, Michael A. Bender, Martin Farach-Colton, William Jannen, Rob Johnson, Donald E. Porter, and Jun Yuan. "The Full Path to Full-Path Indexing". In: Proceedings of the 16th USENIX Conference on File and Storage Technologies (FAST'18). 2018, pp. 123--138.
[68]
H. Zhang, G. Chen, B. C. Ooi, K. Tan, and M. Zhang. "In-Memory Big Data Management and Processing: A Survey". In: IEEE Transactions on Knowledge and Data Engineering 27.7 (2015), pp. 1920--1948.
[69]
Wenshao Zhong, Chen Chen, Xingbo Wu, and Song Jiang. "REMIX: Efficient Range Query for LSM-trees". In: 19th USENIX Conference on File and Storage Technologies (FAST 21). 2021, pp. 51--64.

Index Terms

  1. Building an efficient key-value store in a flexible address space

      Recommendations

      Comments

      Please enable JavaScript to view thecomments powered by Disqus.

      Information & Contributors

      Information

      Published In

      cover image ACM Conferences
      EuroSys '22: Proceedings of the Seventeenth European Conference on Computer Systems
      March 2022
      783 pages
      ISBN:9781450391627
      DOI:10.1145/3492321
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

      Sponsors

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      Published: 28 March 2022

      Permissions

      Request permissions for this article.

      Check for updates

      Badges

      Author Tags

      1. address space
      2. key-value store
      3. storage

      Qualifiers

      • Research-article

      Conference

      EuroSys '22
      Sponsor:

      Acceptance Rates

      Overall Acceptance Rate 241 of 1,308 submissions, 18%

      Upcoming Conference

      EuroSys '25
      Twentieth European Conference on Computer Systems
      March 30 - April 3, 2025
      Rotterdam , Netherlands

      Contributors

      Other Metrics

      Bibliometrics & Citations

      Bibliometrics

      Article Metrics

      • 0
        Total Citations
      • 1,132
        Total Downloads
      • Downloads (Last 12 months)106
      • Downloads (Last 6 weeks)21
      Reflects downloads up to 10 Nov 2024

      Other Metrics

      Citations

      View Options

      Get Access

      Login options

      View options

      PDF

      View or Download as a PDF file.

      PDF

      eReader

      View online with eReader.

      eReader

      Media

      Figures

      Other

      Tables

      Share

      Share

      Share this Publication link

      Share on social media