Abstract
Recently, the log-structured merge-tree (LSM-tree) has been widely adopted for use in the storage layer of modern NoSQL systems. Because of this, there have been a large number of research efforts, from both the database community and the operating systems community, that try to improve various aspects of LSM-trees. In this paper, we provide a survey of recent research efforts on LSM-trees so that readers can learn the state of the art in LSM-based storage techniques. We provide a general taxonomy to classify the literature of LSM-trees, survey the efforts in detail, and discuss their strengths and trade-offs. We further survey several representative LSM-based open-source NoSQL systems and discuss some potential future research directions resulting from the survey.
Similar content being viewed by others
Notes
Also referred to as runs in the literature.
Also referred to as compaction in the literature.
Even for an append-mostly workload, the total number of levels will grow extremely slowly since the maximum number of entries that an LSM-tree can store increases exponentially with a factor of T as the number of levels increases.
RocksDB supports a limited form of a partitioned tiering merge policy to bound the maximum size of each SSTable due to its internal restrictions. However, the disk space may still be doubled temporarily during large merges.
RocksDB [57] supports a limited form of fractional cascading by maintaining the set of overlapping SSTables at the adjacent next level for each SSTable. These pointers are used to narrow down the search range when locating specific SSTables during point lookups.
It is called the compaction filter in RocksDB since RocksDB prefers the term compaction to merge. We use the term merge here to minimize the potential for terminology confusion.
References
Absalyamov, I., et al.: Lightweight cardinality estimation in LSM-based systems. In: ACM SIGMOD, pp. 841–855 (2018)
Ahmad, M.Y., Kemme, B.: Compaction management in distributed key-value datastores. PVLDB 8(8), 850–861 (2015)
Alsubaiee, S., et al.: AsterixDB: a scalable, open source BDMS. PVLDB 7(14), 1905–1916 (2014)
Alsubaiee, S., et al.: Storage management in AsterixDB. PVLDB 7(10), 841–852 (2014)
Alsubaiee, S., et al.: LSM-based storage and indexing: an old idea with timely benefits. In: International ACM SIGMOD Workshop on Managing and Mining Enriched Geo-spatial Data (GeoRich), pp 1–6 (2015)
Amur, H., et al.: Design of a write-optimized data store. Tech. rep, Georgia Institute of Technology (2013)
Athanassoulis, M., et al.: MaSM: efficient online updates in data warehouses. In: ACM SIGMOD, pp. 865–876. ACM (2011)
Athanassoulis, M., et al.: Designing access methods: the RUM conjecture. In: EDBT, vol. 2016, pp. 461–466 (2016)
Balmau, O., et al.: FloDB: unlocking memory in persistent key-value stores. In: European Conference on Computer Systems (EuroSys), pp. 80–94 (2017)
Balmau, O., et al.: TRIAD: creating synergies between memory, disk and log in log structured key-value stores. In: USENIX Annual Technical Conference (ATC), pp. 363–375 (2017)
Bender, M.A., et al.: Don’t thrash: how to cache your hash on flash. PVLDB 5(11), 1627–1637 (2012)
Bloom, B.H.: Space/time trade-offs in hash coding with allowable errors. CACM 13(7), 422–426 (1970)
Bortnikov, E., et al.: Accordion: better memory organization for LSM key-value stores. PVLDB 11(12), 1863–1875 (2018)
Cassandra. http://cassandra.apache.org/
Chan, H.H.W., et al.: HashKV: enabling efficient updates in KV storage via hashing. In: USENIX Annual Technical Conference (ATC), pp. 1007–1019 (2018)
Chang, F., et al.: Bigtable: a distributed storage system for structured data. ACM TOCS 26(2), 4:1–4:26 (2008)
Chazelle, B., Guibas, L.J.: Fractional cascading: I. A data structuring technique. Algorithmica 1(1), 133–162 (1986)
Chen, G.J., et al.: Realtime data processing at Facebook. In: ACM SIGMOD, pp. 1087–1098 (2016)
Dragon: A distributed graph query engine. https://code.fb.com/data-infrastructure/dragon-a-distributed-graph-query-engine/
Dayan, N., Idreos, S.: Dostoevsky: Better space-time trade-offs for LSM-tree based key-value stores via adaptive removal of superfluous merging. In: ACM SIGMOD, pp. 505–520 (2018)
Dayan, N., et al.: Monkey: optimal navigable key-value store. In: ACM SIGMOD, pp. 79–94 (2017)
Dayan, N., et al.: Optimal Bloom filters and adaptive merging for LSM-trees. ACM TODS 43(4), 16:1–16:48 (2018)
DeCandia, G., et al.: Dynamo: Amazon’y highly available key-value store. In: ACM SOSP, pp. 205–220 (2007)
Dong, S., et al.: Optimizing space amplification in RocksDB. In: CIDR, vol. 3, p. 3 (2017)
D’silva, J.V., et al.: Secondary indexing techniques for key-value stores: two rings to rule them all. In: International Workshop On Design, Optimization, Languages and Analytical Processing of Big Data (DOLAP) (2017)
Duan, H., et al.: Incremental materialized view maintenance on distributed log-structured merge-tree. In: DASFAA, pp. 682–700 (2018)
Fagin, R., et al.: Optimal aggregation algorithms for middleware. In: ACM PODS, pp. 102–113 (2001)
Fan, B., et al.: Cuckoo filter: practically better than bloom. In: International Conference on emerging Networking EXperiments and Technologies (CoNEXT), pp. 75–88 (2014)
Fang, Y., et al.: Spatial indexing in Microsoft SQL Server 2008. In: ACM SIGMOD, pp. 1207–1216 (2008)
Golan-Gueta, G., et al.: Scaling concurrent log-structured data stores. In: European Conference on Computer Systems (EuroSys), pp. 32:1–32:14 (2015)
Guttman, A.: R-trees: a dynamic index structure for spatial searching. In: ACM SIGMOD, pp. 47–57 (1984)
HBase. https://hbase.apache.org/
Haerder, T., Reuter, A.: Principles of transaction-oriented database recovery. ACM CSUR 15(4), 287–317 (1983)
Jagadish, H.V., et al.: Incremental organization for data recording and warehousing. In: VLDB, pp. 16–25 (1997)
Jermaine, C., et al.: The partitioned exponential file for database storage management. VLDBJ 16(4), 417–437 (2007)
Kannan, S., et al.: Redesigning LSMs for nonvolatile memory with NoveLSM. In: USENIX Annual Technical Conference (ATC), pp. 993–1005 (2018)
Khodaei, A., et al.: Hybrid indexing and seamless ranking of spatial and textual features of web documents. In: Database and Expert Systems Applications (DEXA), pp. 450–466 (2010)
Kim, T., et al.: Supporting similarity queries in Apache AsterixDB. In: EDBT, pp. 528–539 (2018)
Kim, Y., et al.: A comparative study of log-structured merge-tree-based spatial indexes for big data. In: ICDE, pp. 147–150 (2017)
LevelDB. http://leveldb.org/
Lawder, J.: The application of space-filling curves to the storage and retrieval of multi-dimensional data. Ph.D. thesis, PhD Thesis, University of London, UK (2000)
Li, Y., et al.: Tree indexing on solid state drives. PVLDB 3(1–2), 1195–1206 (2010)
Lim, H., et al.: Towards accurate and fast evaluation of multi-stage log-structured designs. In: USENIX Conference on File and Storage Technologies (FAST), pp. 149–166 (2016)
Lu, L., et al.: WiscKey: separating keys from values in SSD-conscious storage. In: USENIX Conference on File and Storage Technologies (FAST), pp. 133–148 (2016)
Luo, C., Carey, M.J.: Efficient data ingestion and query processing for LSM-based storage systems. PVLDB 12(5), 531–543 (2019)
MyRocks. http://myrocks.io/
Mathieu, C., et al.: Bigtable merge compaction. CoRR arXiv:1407.3008 (2014)
Mei, F., et al.: LSM-tree managed storage for large-scale key-value store. In: ACM SoCC, pp. 142–156 (2017)
Mei, F., et al.: SifrDB: a unified solution for write-optimized key-value stores in large datacenter. In: ACM SoCC, pp. 477–489 (2018)
Muth, P., et al.: The LHAM log-structured history data access method. VLDBJ 8(3), 199–221 (2000)
O’Neil, P., et al.: The log-structured merge-tree (LSM-tree). Acta Inf. 33(4), 351–385 (1996)
Pan, F.F., et al.: dCompaction: speeding up compaction of the LSM-tree via delayed compaction. J. Comput. Sci. Technol. 32(1), 41–54 (2017)
Papagiannis, A., et al.: An efficient memory-mapped key-value store for flash storage. In: ACM SoCC, pp. 490–502 (2018)
Pugh, W.: Skip lists: a probabilistic alternative to balanced trees. CACM 33(6), 668–676 (1990)
Putze, F., et al.: Cache-, hash-, and space-efficient bloom filters. J. Exp. Algorithmics 14, 4:4.4–4:4.18 (2010)
Qader, M.A., et al.: A comparative study of secondary indexing techniques in LSM-based NoSQL databases. In: ACM SIGMOD, pp. 551–566 (2018)
RocksDB. http://rocksdb.org/
Raju, P., et al.: PebblesDB: building key-value stores using fragmented log-structured merge trees. In: ACM SOSP, pp. 497–514 (2017)
Ren, K., et al.: SlimDB: a space-efficient key-value storage engine for semi-sorted data. PVLDB 10(13), 2037–2048 (2017)
Rosenblum, M., Ousterhout, J.K.: The design and implementation of a log-structured file system. ACM TOCS 10(1), 26–52 (1992)
Sears, R., Ramakrishnan, R.: bLSM: a general purpose log structured merge tree. In: ACM SIGMOD, pp. 217–228 (2012)
Seltzer, M.I.: File system performance and transaction support. Tech. rep., PhD Thesis, Department of Electrical Engineering and Computer Sciences, University of California Berkeley (1992)
Severance, D.G., Lohman, G.M.: Differential files: their application to the maintenance of large databases. ACM TODS 1(3), 256–267 (1976)
Shetty, P.J., et al.: Building workload-independent storage with VT-trees. In: USENIX Conference on File and Storage Technologies (FAST), pp. 17–30 (2013)
Stonebraker, M.: The design of the Postgres storage system. In: VLDB, pp. 289–300 (1987)
Tan, W., et al.: Diff-index: differentiated index in distributed log-structured data stores. In: EDBT, pp. 700–711 (2014)
Tang, Y., et al.: Deferred lightweight indexing for log-structured key-value stores. In: International Symposium in Cluster, Cloud, and Grid Computing (CCGrid), pp. 11–20 (2015)
Teng, D., et al.: LSbM-tree: Re-enabling buffer caching in data management for mixed reads and writes. In: IEEE International Conference on Distributed Computing Systems (ICDCS), pp. 68–79 (2017)
Teng, D., et al.: A low-cost disk solution enabling LSM-tree to achieve high performance for mixed read/write workloads. ACM TOS 14(2), 15:1–15:26 (2018)
Thonangi, R., Yang, J.: On log-structured merge for solid-state drives. In: ICDE, pp. 683–694 (2017)
Thonangi, R., et al.: A practical concurrent index for solid-state drives. In: ACM CIKM, pp. 1332–1341 (2012)
Turner, J.: New directions in communications (or which way to the information age?). IEEE Commun. Mag. 24(10), 8–15 (1986)
Vinçon, T., et al.: NoFTL-KV: Tackling write-amplification on KV-stores with native storage management. In: EDBT, pp. 457–460 (2018)
Wang, P., et al.: An efficient design and implementation of LSM-tree based key-value store on open-channel SSD. In: European Conference on Computer Systems (EuroSys), pp. 16:1–16:14 (2014)
Wu, L., et al.: LSII: An indexing structure for exact real-time search on microblogs. In: ICDE, pp. 482–493 (2013)
Wu, X., et al.: LSM-trie: an LSM-tree-based ultra-large key-value store for small data. In: USENIX Annual Technical Conference (ATC), pp. 71–82 (2015)
Yao, A.C.C.: On random 2–3 trees. Acta Inf. 9(2), 159–170 (1978)
Yao, T., et al.: Building efficient key-value stores via a lightweight compaction tree. ACM TOS 13(4), 29:1–29:28 (2017)
Yao, T., et al.: A light-weight compaction tree to reduce I/O amplification toward efficient key-value stores. In: International Conference on Massive Storage Systems and Technology (MSST) (2017)
Yoon, H., et al.: Mutant: Balancing storage cost and latency in LSM-tree data stores. In: ACM SoCC, pp. 162–173 (2018)
Yue, Y., et al.: Building an efficient put-intensive key-value store with skip-tree. IEEE Trans. Parallel Distrib. Syst. 28(4), 961–973 (2017)
Zhang, W., et al.: Improving write performance of LSMT-based key-value store. In: International Conference on Parallel and Distributed Systems (ICPADS), pp. 553–560 (2016)
Zhang, Y., et al.: ElasticBF: Fine-grained and elastic bloom filter towards efficient read for LSM-tree-based KV stores. In: USENIX Workshop on Hot Topics in Storage and File Systems (HotStorage) (2018)
Zhang, Z., et al.: Pipelined compaction for the LSM-tree. In: IEEE International Parallel & Distributed Processing Symposium (IPDPS), pp. 777–786 (2014)
Zhu, Y., et al.: An efficient bulk loading approach of secondary index in distributed log-structured data stores. In: DASFAA, pp. 87–102 (2017)
Acknowledgements
We would like to thank Mark Callaghan, Manos Athanassoulis, and the anonymous reviewers for their valuable comments and feedback. This work was supported by NSF awards CNS-1305430, IIS-1447720, and IIS-1838248 along with industrial support from Amazon, Google, and Microsoft and support from The Donald Bren Foundation (via a Bren Chair) of UC Irvine.
Author information
Authors and Affiliations
Corresponding author
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
About this article
Cite this article
Luo, C., Carey, M.J. LSM-based storage techniques: a survey. The VLDB Journal 29, 393–418 (2020). https://doi.org/10.1007/s00778-019-00555-y
Received:
Revised:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s00778-019-00555-y