Nothing Special   »   [go: up one dir, main page]

skip to main content
research-article

SlimDB: a space-efficient key-value storage engine for semi-sorted data

Published: 01 September 2017 Publication History

Abstract

Modern key-value stores often use write-optimized indexes and compact in-memory indexes to speed up read and write performance. One popular write-optimized index is the Log-structured merge-tree (LSM-tree) which provides indexed access to write-intensive data. It has been increasingly used as a storage backbone for many services, including file system metadata management, graph processing engines, and machine learning feature storage engines. Existing LSM-tree implementations often exhibit high write amplifications caused by compaction, and lack optimizations to maximize read performance on solid-state disks. The goal of this paper is to explore techniques that leverage common workload characteristics shared by many systems using key-value stores to reduce the read/write amplification overhead typically associated with general-purpose LSM-tree implementations. Our experiments show that by applying these design techniques, our new implementation of a key-value store, SlimDB, can be two to three times faster, use less memory to cache metadata indices, and show lower tail latency in read operations compared to popular LSM-tree implementations such as LevelDB and RocksDB.

References

[1]
T. G. Armstrong, V. Ponnekanti, D. Borthakur, and M. Callaghan. Linkbench: a database benchmark based on the facebook social graph. In Proceedings of the ACM SIGMOD International Conference on Management of Data, 2013.
[2]
B. Atikoglu, Y. Xu, E. Frachtenberg, S. Jiang, and M. Paleczny. Workload analysis of a large-scale key-value store. In ACM SIGMETRICS Performance Evaluation Review, 2012.
[3]
D. Belazzougui, F. C. Botelho, and M. Dietzfelbinger. Hash, displace, and compress. In Proceedings of the European Symposium on Algorithms, 2009.
[4]
M. A. Bender, M. Farach-Colton, J. T. Fineman, Y. R. Fogel, B. C. Kuszmaul, and J. Nelson. Cache-oblivious streaming B-trees. In Proceedings of annual ACM symposium on parallel algorithms and architectures, 2007.
[5]
B. Bloom. Space/time trade-offs in hash coding with allowable errors. Communication of ACM, 1970.
[6]
G. S. Brodal and R. Fagerberg. Lower bounds for external memory dictionaries. In Proceedings of the annual ACM-SIAM symposium on Discrete algorithms, 2003.
[7]
F. Chang, J. Dean, S. Ghemawat, and et al. Bigtable: a distributed storage system for structured data. In the USENIX Symposium on Operating Systems Design and Implementation, 2006.
[8]
J. Chen, C. Douglas, and et al. Walnut: A unified cloud object store. In Proceedings of the ACM SIGMOD International Conference on Management of Data, 2012.
[9]
B. F. Cooper, A. Silberstein, E. Tam, R. Ramakrishnan, and R. Sears. Benchmarking Cloud Serving Systems with YCSB. In the ACM Symposium on Cloud Computing (SOCC), 2010.
[10]
N. Dayan, M. Athanassoulis, and S. Idreos. Monkey: Optimal navigable key-value store. In the ACM International Conference on Management of Data (SIGMOD), 2017.
[11]
B. Debnath, S. Sengupta, and J. Li. Flashstore: high throughput persistent key-value store. Proceedings of the VLDB Endowment, 2010.
[12]
B. Debnath, S. Sengupta, and J. Li. SkimpyStash: RAM space skimpy key-value store on flash-based storage. In ACM International Conference on Management of data, 2011.
[13]
G. DeCandia, D. Hastorun, and et al. Dynamo: Amazon's highly available key-value store. In Proceedings of ACM SIGOPS Symposium on Operating Systems Principles, 2007.
[14]
B. Fan, D. G. Andersen, M. Kaminsky, and M. D. Mitzenmacher. Cuckoo filter: Practically better than bloom. In Proceedings of the ACM International on Conference on Emerging Networking Experiments and Technologies, 2014.
[15]
B. Fan, D. Zhou, H. Lim, M. Kaminsky, and D. G. Andersen. When cycles are cheap, some tables can be huge. In Proceedings of the USENIX conference on Hot Topics in Operating Systems, 2013.
[16]
N. Fountoulakis, M. Khosla, and K. Panagiotou. The multiple-orientability thresholds for random hypergraphs. In the ACM-SIAM symposium on Discrete Algorithms, 2011.
[17]
X. He, J. Pan, O. Jin, and et al. Practical lessons from predicting clicks on ads at facebook. In The International Workshop on Data Mining for Online Advertising, 2014.
[18]
G. J. Jacobson. Succinct Static Data Structures. PhD thesis, Pittsburgh, PA, USA, 1988. AAI8918056.
[19]
H. V. Jagadish, P. P. S. Narayan, S. Seshadri, S. Sudarshan, and R. Kanneganti. Incremental organization for data recording and warehousing. In Proceedings of the International Conference on Very Large Data Bases, 1997.
[20]
LevelDB. A fast and lightweight key/value database library, 2011. http://code.google.com/p/leveldb/.
[21]
C. Li, Y. Lu, Q. Mei, D. Wang, and S. Pandey. Click-through prediction for advertising in twitter timeline. In Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2015.
[22]
Y. Li, B. He, R. J. Yang, Q. Luo, and K. Yi. Tree indexing on solid state drives. Proceedings of the VLDB Endowment, 2010.
[23]
H. Lim, D. G. Andersen, and M. Kaminsky. Towards accurate and fast evaluation of multi-stage log-structured designs. In Proceedings of the Usenix Conference on File and Storage Technologies (FAST), 2016.
[24]
H. Lim, B. Fan, D. G. Andersen, and M. Kaminsky. SILT: a memory-efficient, high-performance key-value store. In the ACM Symposium on Operating Systems Principles, 2011.
[25]
L. Lu, T. S. Pillai, and et al. Wisckey: Separating keys from values in ssd-conscious storage. In Proceedings of the 14th Usenix Conference on File and Storage Technologies, 2016.
[26]
P. O'Neil, E. Cheng, D. Gawlick, and E. O'Neil. The log-structured merge-tree (LSM-tree). Acta Informatica, 1996.
[27]
R. Pagh and F. F. Rodler. Cuckoo hashing. Journal of Algorithms, 2004.
[28]
K. Ren and G. Gibson. TableFS: Enhancing metadata efficiency in the local file system. Usenix Annual Technical Conference, 2013.
[29]
RocksDB. A facebook fork of leveldb which is optimized for flash and big memory machines, 2013. https://rocksdb.org.
[30]
O. Rodeh, J. Bacik, and C. Mason. BRTFS: The Linux B-tree Filesystem. IBM Research Report RJ10501, 2012.
[31]
R. Sears and R. Ramakrishnan. bLSM: a general purpose log structured merge tree. Proceedings of the ACM SIGMOD International Conference on Management of Data, 2012.
[32]
P. Shetty, R. Spillane, R. Malpani, B. Andrews, J. Seyster, and E. Zadok. Building workload-independent storage with VT-Trees. In Proccedings of the conference on File and Storage Technologies, 2013.
[33]
X. Wu, Y. Xu, Z. Shao, and S. Jiang. LSM-trie: An LSM-tree-based ultra-large key-value store for small data. In USENIX Annual Technical Conference, 2015.

Cited By

View all
  • (2024)LavaStore: ByteDance's Purpose-Built, High-Performance, Cost-Effective Local Storage Engine for Cloud ServicesProceedings of the VLDB Endowment10.14778/3685800.368580717:12(3799-3812)Online publication date: 1-Aug-2024
  • (2024)Aleph Filter: To Infinity in Constant TimeProceedings of the VLDB Endowment10.14778/3681954.368202717:11(3644-3656)Online publication date: 1-Jul-2024
  • (2024)Optimizing Collections of Bloom Filters within a Space BudgetProceedings of the VLDB Endowment10.14778/3681954.368202017:11(3551-3564)Online publication date: 1-Jul-2024
  • Show More Cited By

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image Proceedings of the VLDB Endowment
Proceedings of the VLDB Endowment  Volume 10, Issue 13
Proceedings of the 43rd International Conference on Very Large Data Bases, Munich, Germany
September 2017
72 pages
ISSN:2150-8097
Issue’s Table of Contents

Publisher

VLDB Endowment

Publication History

Published: 01 September 2017
Published in PVLDB Volume 10, Issue 13

Qualifiers

  • Research-article

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)91
  • Downloads (Last 6 weeks)4
Reflects downloads up to 16 Nov 2024

Other Metrics

Citations

Cited By

View all
  • (2024)LavaStore: ByteDance's Purpose-Built, High-Performance, Cost-Effective Local Storage Engine for Cloud ServicesProceedings of the VLDB Endowment10.14778/3685800.368580717:12(3799-3812)Online publication date: 1-Aug-2024
  • (2024)Aleph Filter: To Infinity in Constant TimeProceedings of the VLDB Endowment10.14778/3681954.368202717:11(3644-3656)Online publication date: 1-Jul-2024
  • (2024)Optimizing Collections of Bloom Filters within a Space BudgetProceedings of the VLDB Endowment10.14778/3681954.368202017:11(3551-3564)Online publication date: 1-Jul-2024
  • (2024)On Reducing Space Amplification with Multi-Column Compaction in Apache IoTDBProceedings of the VLDB Endowment10.14778/3681954.368197717:11(2974-2986)Online publication date: 30-Aug-2024
  • (2024)CAMAL: Optimizing LSM-trees via Active LearningProceedings of the ACM on Management of Data10.1145/36771382:4(1-26)Online publication date: 30-Sep-2024
  • (2024)CaaS-LSM: Compaction-as-a-Service for LSM-based Key-Value Stores in Storage Disaggregated InfrastructureProceedings of the ACM on Management of Data10.1145/36549272:3(1-28)Online publication date: 30-May-2024
  • (2024)An LSM Tree Augmented with B+ Tree on Nonvolatile MemoryACM Transactions on Storage10.1145/363347520:1(1-24)Online publication date: 30-Jan-2024
  • (2024)An Enhanced Batch Query Architecture in Real-time RecommendationProceedings of the 33rd ACM International Conference on Information and Knowledge Management10.1145/3627673.3680034(5078-5085)Online publication date: 21-Oct-2024
  • (2024)Beyond Bloom: A Tutorial on Future Feature-Rich FiltersCompanion of the 2024 International Conference on Management of Data10.1145/3626246.3654681(636-644)Online publication date: 9-Jun-2024
  • (2024)SolsDBFuture Generation Computer Systems10.1016/j.future.2024.05.050160:C(295-304)Online publication date: 1-Nov-2024
  • Show More Cited By

View Options

Login options

Full Access

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media