Abstract
Key-value (KV) stores have become a backbone of large-scale applications in today’s data centers. Write-optimized data structures like the Log-Structured Merge-tree (LSM-tree) and their variants are widely used in KV storage systems like BigTable and RocksDB. Conventional LSM-tree organizes KV items into multiple, successively larger components, and uses compaction to push KV items from one smaller component to another adjacent larger component until the KV items reach the largest component. Unfortunately, current compaction scheme incurs significant write amplification due to repeated KV item reads and writes, and then results in poor throughput. We propose a new compaction scheme, delayed compaction (dCompaction), that decreases write amplification. dCompaction postpones some compactions and gather them into the following compaction. In this way, it avoids KV item reads and writes during compaction, and consequently improves the throughput of LSM-tree based KV stores. We implement dCompaction on RocksDB, and conduct extensive experiments. Validation using YCSB framework shows that compared with RocksDB dCompaction has about 30% write performance improvements and also comparable read performance.
Similar content being viewed by others
References
Sears, R., Ramakrishnan, R.: bLSM: a general purpose log-structured merge tree. In: Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data, SIGMOD’12, pp. 217–228. ACM, New York, NY, USA (2012)
Google: LevelDB. http://code.google.com/p/LevelDB (2012)
Facebook: RocksDB. http://rocksdb.org/ (2013)
Chang, F., Dean, J., Ghemawat, S., Hsieh, W.C., Wallach, D.A., Burrows, M., Chandra, T., Fikes, A., Gruber, R.E.: Bigtable: a distributed storage system for structured data. In: OSDI 2006: Proceedings of the 7th USENIX Symposium on Operating Systems Design and Implementation, pp 15–25. USENIX Association, Berkeley, CA, USA (2006)
Lakshman, A., Malik, P.: Cassandra: a decentralized structured storage system[J]. ACM SIGOPS Oper. Syst. Rev. 44(2), 35–40 (2010)
HBase Documentation. Hbase: Bigtable-like structured storage for hadoop hdfs. http://wiki.apache.org/hadoop/Hbasea, (2011)
Neil, P.O., Cheng, E., Gawlick, D., Neil, E.O.: The log-structured merge-tree (LSM-tree). Acta Inf. 33(4), 351–385 (1996)
Redis: http://redis.io/
Memcached: http://memcached.org/
Huang, Q., Birman, K., van Renesse, R., Lloyd, W., Kumar, S., Li, H.C.: An analysis of facebook photo caching. In: Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles (SOSP’03)
Atikoglu, B., Yuehai, X., Frachtenberg, E., Jiang, S., Paleczny, M.: Workload analysis of a large-scale key-value store. In: SIGMETRICS (2012)
Shetty, P., Spillane, R., el at.: Building workload-independent storage with VT-trees. In: 11th USENIX Conference on File and Storage Technologies (2013)
Jermaine, C., Omiecinski, E., Yee, W.G.: The partitioned exponential file for database storage management. VLDB J. 16(4), 417–437 (2007)
Zigang, Z., Yinliang, Y., Bingsheng, H., et al.: Pipelined compaction for the LSM-tree. In: 28th International Parallel and Distributed Processing Symposium, pp. 777–786 (2014)
Cooper, B.F., Silberstein, A., Tam, E., Ramakrishnan, R., Sears, R.: Benchmarking cloud serving systems with YCSB. In: Proceedings of the 1st ACM Symposium on Cloud Computing, SoCC’00, pp. 143–154. ACM, New York, NY, USA (2010)
Escriva, R., Wong, B., Sirer, E.G.: HyperDex: a distributed, searchable key-value store. SIGCOMM Comput. Commun. Rev. 42(4), 25–36 (2012)
Bender, M.A., Farach-Colton, M., Fineman, J.T., Fogel, Y.R., Kuszmaul, B.C., Nelson, J.: Cache-oblivious streaming b-trees. In: Proceedings of the Nineteenth Annual ACM Symposium on Parallel Algorithms and Architectures (SPAA’07), pp. 81–92. ACM, New York, NY, USA (2007)
Spillane, R.P., Shetty, P.J., Zadok, E., Dixit, S., Archak, S.: An efficient multi-tier tablet server storage architecture. In: Proceedings of the 2nd ACM Symposium on Cloud Computing, SOCC’11, pp. 1–14. ACM, New York, NY, USA (2011)
Li, Y., He, B., Yang, R.J., Luo, Q., Yi, K.: Tree indexing on solid state drives. Proc. VLDB Endow. 3(1–2), 1195–1206 (2010)
Chazelle, B., Guibas, L.J.: Fractional cascading: a data structuring technique with geometric applications. In: Automata, Languages and Programming, volume 194 of Lecture Notes in Computer Science, pp. 90–100. Springer, Berlin Heidelberg (1985)
Bloom, B.H.: Space/time trade-offs in hash coding with allowable errors. Commun. ACM 13(7), 422–426 (1970)
Wu, X., et al.: LSM-trie: an LSM-treebased ultra-large key-value store for small data. In: USENIX Annual Technical Conference (2015)
Acknowledgements
We thank the anonymous reviewers for their helpful feedback. This work is partly supported by MOST’s 13th FYP project of China under Grant No. 2016YFB1000202, and National Science Foundation of China under Grants Nos. 61303056, 61379042 and Youth Innovation Promotion Association, CAS, No. 2016146.
Author information
Authors and Affiliations
Corresponding author
Rights and permissions
About this article
Cite this article
Pan, F., Yue, Y. & Xiong, J. dCompaction: Delayed Compaction for the LSM-Tree. Int J Parallel Prog 45, 1310–1325 (2017). https://doi.org/10.1007/s10766-016-0472-z
Received:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s10766-016-0472-z