Nothing Special   »   [go: up one dir, main page]

skip to main content
10.1145/3588195.3592982acmconferencesArticle/Chapter ViewAbstractPublication PageshpdcConference Proceedingsconference-collections
research-article

Closing the Performance Gap between Leveling and Tiering Compaction via Bundle Compaction

Published: 07 August 2023 Publication History

Abstract

So far, most LSM-tree-based storage engines adopt either leveling or tiering compaction. We note that while leveling compaction can deliver high search performance and low space amplification, it has a high rate of write amplification (therefore delivering poor write performance). On the other hand, tiering compaction has a low rate of write amplification (therefore delivering good write performance) but has poor search performance and high space amplification. Aiming to close the performance gap between leveling and tiering databases, this paper proposes a new storage engine called B+LSM. The novel ideas of B+LSM lie in two aspects: (1) B+LSM replaces the underlying level structure of LSM-tree with a B+-tree-like tree, and each tree node is defined as a Bundle Compaction Unit (BCU), whose size is allowed to be dynamically changed with workload statistics to balance read and write performance. (2) B+LSM proposes a new node-grained compaction scheme called Bundle Compaction. Bundle compaction is always triggered to merge all the data within a BCU node, partition them into bundles, and then send bundles to the children. Such a compaction scheme can take advantage of leveling and tiering compaction by auto-tuning the size of BCU nodes. We implemented B+LSM and compared it with LevelDB, RocksDB, PebblesDB, and L2SM on the YCSB workloads. The results show that B+LSM can achieve high time performance and reduce space amplification on both static and dynamic workloads.

References

[1]
Oana Balmau, Diego Didona, Rachid Guerraoui, Willy Zwaenepoel, Huapeng Yuan, Aashray Arora, Karan Gupta, and Pavan Konka. 2017. TRIAD: Creating Synergies Between Memory, Disk and Log in Log Structured Key-Value Stores. In ATC. 363--375.
[2]
Oana Balmau, Florin Dinu, Willy Zwaenepoel, Karan Gupta, Ravishankar Chandhiramoorthi, and Diego Didona. 2019. SILK: Preventing Latency Spikes in Log-Structured Merge Key-Value Stores. In ATC. 753--766.
[3]
Oana Balmau, Florin Dinu, Willy Zwaenepoel, Karan Gupta, and Diego Didona. 2020. SILK Preventing Latency Spikes in Log-Structured Merge Key-Value Stores Running Heterogeneous Workloads. ACM Transactions on Computer Systems, Vol. 36, 4 (2020), 1--27.
[4]
Burton H Bloom. 1970. Space/time trade-offs in hash coding with allowable errors. Commun. ACM, Vol. 13, 7 (1970), 422--426.
[5]
Helen H. W. Chan, Yongkun Li, Patrick P. C. Lee, and Yinlong Xu. 2018. HashKV: Enabling Efficient Updates in KV Storage via Hashing. In ATC. 1007--1019.
[6]
Lidong Chen, Yinliang Yue, Haobo Wang, and Jianhua Wu. 2018. A Priority and Fairness Mixed Compaction Scheduling Mechanism for LSM-tree Based KV-Stores. In ICA3PP. 89--105.
[7]
A. Conway, A. Gupta, V. Chidambaram, M. Farach-Colton, R. P. Spillane, A. Tai, and R. Johnson. 2020. SplinterDB: Closing the bandwidth gap for NVMe key-value stores. In ATC. 49--63.
[8]
B. F. Cooper, A. Silberstein, E. Tam, R. Ramakrishnan, and R. Sears. 2010. Benchmarking cloud serving systems with YCSB. In SoCC. 143--154.
[9]
Niv Dayan, Manos Athanassoulis, and Stratos Idreos. 2017. Monkey: Optimal navigable key-value store. In SIGMOD. 79--94.
[10]
Niv Dayan and Stratos Idreos. 2018. Dostoevsky: Better Space-Time Trade-Offs for LSM-Tree Based Key-Value Stores via Adaptive Removal of Superfluous Merging. In SIGMOD. 505--520.
[11]
HBase. 2022. http://hbase.apache.org/.
[12]
Kecheng Huang, Zhiping Jia, Zhaoyan Shen, Zili Shao, and Feng Chen. 2021. Less is More: De-amplifying I/Os for Key-value Stores with a Log-assisted LSM-tree. In ICDE. 612--623.
[13]
Peiquan Jin, Jianchuan Li, and Hai Long. 2021. DLC: A New Compaction Scheme for LSM-tree with High Stability and Low Latency. In EDBT. 547--557.
[14]
Jungwon Lee, Miran Shim, and Hyesook Lim. 2016. Name prefix matching using bloom filter pre-searching for content centric network. J. Netw. Comput. Appl., Vol. 65 (2016), 36--47.
[15]
LevelDB. 2022. https://github.com/google/leveldb.
[16]
Jianchuan Li, Peiquan Jin, Yuanjin Lin, Ming Zhao, Yi Wang, and Kuankuan Guo. 2021. Elastic and Stable Compaction for LSM-tree: A FaaS-Based Approach on TerarkDB. In CIKM. 3906--3915.
[17]
Ruicheng Liu, Peiquan Jin, Shouhong Wan, and Bei Hua. 2021. RoBF: An Auto-Tuning Bloom Filter for Mixed Queries on LSM-tree. In SEKE. 1--6.
[18]
Ruicheng Liu, Peiquan Jin, Xiaoliang Wang, Yongping Luo, and Zhaole Chu. 2022. Design Considerations of A Novel Distributed Key-Value Store for New Storage. In ICDCS. 1276--1277.
[19]
Lanyue Lu, Thanumalayan Sankaranarayana Pillai, Hariharan Gopalakrishnan, Andrea C. Arpaci-Dusseau, and Remzi H. Arpaci-Dusseau. 2017. WiscKey: Separating Keys from Values in SSD-Conscious Storage. ACM Trans. Storage, Vol. 13, 1 (2017), 5:1--5:28.
[20]
Mingchen Lu, Chen Tang, and Peiquan Jin. 2022. Revisiting LSM-Tree-Based Key-Value Stores for ZNS SSDs. In IEEE BigData. 6772--6774.
[21]
Chen Luo and Michael J. Carey. 2019. On Performance Stability in LSM-based Storage Systems. Proceedings of the VLDB Endowment, Vol. 13, 4 (2019), 449--462.
[22]
Chen Luo and Michael J. Carey. 2020. LSM-based storage techniques: a survey. The VLDB Journal, Vol. 29, 1 (2020), 393--418.
[23]
Siqiang Luo, Subarna Chatterjee, Rafael Ketsetsidis, Niv Dayan, Wilson Qin, and Stratos Idreos. 2020. Rosetta: A Robust Space-Time Optimized Range Filter for Key-Value Stores. In SIGMOD. 2071--2086.
[24]
Patrick O'Neil, Edward Cheng, Dieter Gawlick, and Elizabeth O'Neil. 1996. The log-structured merge-tree (LSM-tree). Acta Informatica, Vol. 33, 4 (1996), 351--385.
[25]
Fengfeng Pan, Yinliang Yue, and Jin Xiong. 2017. duppercaseCompaction: Delayed compaction for the uppercaseLSM-tree. International Journal of Parallel Programming, Vol. 45, 6 (2017), 1310--1325.
[26]
Pandian Raju, Rohan Kadekodi, Vijay Chidambaram, and Ittai Abraham. 2017. PebblesDB: Building Key-Value Stores using Fragmented Log-Structured Merge Trees. In SOSP. 497--514.
[27]
RocksDB. 2022. http://rocksdb.org/.
[28]
Russell Sears and Raghu Ramakrishnan. 2012. buppercaseLSM: a general purpose log structured merge tree. In SIGMOD. 217--228.
[29]
Dejun Teng, Lei Guo, Rubao Lee, Feng Chen, Siyuan Ma, Yanfeng Zhang, and Xiaodong Zhang. 2017. uppercaseLSbuppercaseM-tree: Re-enabling buffer caching in data management for mixed reads and writes. In ICDCS. 68--79.
[30]
A. v. Renen, V. Leis, A. Kemper, T. Neumann, T. Hashida, K. Oe, Y. Doi, L. Harada, and M. Sato. 2018. Managing Non-Volatile Memory in Database Systems. In SIGMOD. 1541--1555.
[31]
Xiaoliang Wang, Peiquan Jin, Bei Hua, Hai Long, and Wei Huang. 2022. Reducing Write Amplification of LSM-Tree with Block-Grained Compaction. In ICDE. 3119--3131.
[32]
Huanchen Zhang, Hyeontaek Lim, Viktor Leis, David G. Andersen, Michael Kaminsky, Kimberly Keeton, and Andrew Pavlo. 2018. SuRF: Practical Range Query Filtering with Fast Succinct Tries. In SIGMOD. 323--336.
[33]
X. Zhou, J. Arulraj, A. Pavlo, and D. Cohen. 2021. Spitfire: A Three-Tier Buffer Manager for Volatile and Non-Volatile Memory. In SIGMOD. 2195--2207.

Cited By

View all
  • (2024)Scavenger: Better Space-Time Trade-Offs for Key-Value Separated LSM-trees2024 IEEE 40th International Conference on Data Engineering (ICDE)10.1109/ICDE60146.2024.00312(4072-4085)Online publication date: 13-May-2024
  • (2024)Range Cache: An Efficient Cache Component for Accelerating Range Queries on LSM - Based Key-Value Stores2024 IEEE 40th International Conference on Data Engineering (ICDE)10.1109/ICDE60146.2024.00044(488-500)Online publication date: 13-May-2024
  • (2023)LeanKV: Efficient Garbage Collection for LSM-Based Key-Value Stores on ZNS SSDs through Lifetime-Based SSTable Clustering2023 IEEE 29th International Conference on Parallel and Distributed Systems (ICPADS)10.1109/ICPADS60453.2023.00260(1895-1902)Online publication date: 17-Dec-2023

Index Terms

  1. Closing the Performance Gap between Leveling and Tiering Compaction via Bundle Compaction

    Recommendations

    Comments

    Please enable JavaScript to view thecomments powered by Disqus.

    Information & Contributors

    Information

    Published In

    cover image ACM Conferences
    HPDC '23: Proceedings of the 32nd International Symposium on High-Performance Parallel and Distributed Computing
    August 2023
    350 pages
    ISBN:9798400701559
    DOI:10.1145/3588195
    • General Chair:
    • Ali R. Butt,
    • Program Chairs:
    • Ningfang Mi,
    • Kyle Chard
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

    Sponsors

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 07 August 2023

    Permissions

    Request permissions for this article.

    Check for updates

    Author Tags

    1. LSM-tree
    2. compaction
    3. key-value storage
    4. optimization

    Qualifiers

    • Research-article

    Funding Sources

    Conference

    HPDC '23

    Acceptance Rates

    Overall Acceptance Rate 166 of 966 submissions, 17%

    Contributors

    Other Metrics

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • Downloads (Last 12 months)153
    • Downloads (Last 6 weeks)19
    Reflects downloads up to 19 Nov 2024

    Other Metrics

    Citations

    Cited By

    View all
    • (2024)Scavenger: Better Space-Time Trade-Offs for Key-Value Separated LSM-trees2024 IEEE 40th International Conference on Data Engineering (ICDE)10.1109/ICDE60146.2024.00312(4072-4085)Online publication date: 13-May-2024
    • (2024)Range Cache: An Efficient Cache Component for Accelerating Range Queries on LSM - Based Key-Value Stores2024 IEEE 40th International Conference on Data Engineering (ICDE)10.1109/ICDE60146.2024.00044(488-500)Online publication date: 13-May-2024
    • (2023)LeanKV: Efficient Garbage Collection for LSM-Based Key-Value Stores on ZNS SSDs through Lifetime-Based SSTable Clustering2023 IEEE 29th International Conference on Parallel and Distributed Systems (ICPADS)10.1109/ICPADS60453.2023.00260(1895-1902)Online publication date: 17-Dec-2023

    View Options

    Login options

    View options

    PDF

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader

    Media

    Figures

    Other

    Tables

    Share

    Share

    Share this Publication link

    Share on social media