Nothing Special   »   [go: up one dir, main page]

skip to main content
10.5555/2813767.2813773guideproceedingsArticle/Chapter ViewAbstractPublication PagesConference Proceedingsacm-pubtype
Article

LSM-trie: an LSM-tree-based ultra-large key-value store for small data

Published: 08 July 2015 Publication History

Abstract

Key-value (KV) stores have become a backbone of large-scale applications in today's data centers. The data set of the store on a single server can grow to billions of KV items or many terabytes, while individual data items are often small (with their values as small as a couple of bytes). It is a daunting task to efficiently organize such an ultra-large KV store to support fast access. Current KV storage systems have one or more of the following inadequacies: (1) very high data write amplifications, (2) large index set, and (3) dramatic degradation of read performance with overspill index out of memory.
To address the issue, we propose LSM-trie, a KV storage system that substantially reduces metadata for locating KV items, reduces write amplification by an order of magnitude, and needs only two disk accesses with each KV read even when only less than 10% of metadata (Bloom filters) can be held in memory. To this end, LSM-trie constructs a trie, or a prefix tree, that stores data in a hierarchical structure and keeps reorganizing them using a compaction method much more efficient than that adopted for LSM-tree. Our experiments show that LSM-trie can improve write and read throughput of LevelDB, a state-of-the-art KV system, by up to 20 times and up to 10 times, respectively.

References

[1]
Apache cassandra. http://cassandra.apache.org.
[2]
Consider the apache cassandra database. http://goo.gl/tX37h3.
[3]
How much text versus metadata is in a tweet? http://goo.gl/EBFIFs.
[4]
Leveldb: A fast and lightweight key/value database library by google. https://code.google.com/p/leveldb/.
[5]
mongodb. http://goo.gl/sQdYfo.
[6]
Mysql cluster: Scalability. http://goo.gl/SIfvfe.
[7]
Project voldermort: A distributed key-value storage system. http://project-voldemort.com.
[8]
Seagate kinetic HDD. http://goo.gl/pS9bs1.
[9]
Silt: A memory-efficient, high-performance key-value store. https://github.com/silt/silt.
[10]
Storing hundreds of millions of simple key-value pairs in redis. http://goo.gl/ieeU17.
[11]
Under the hood: Building and open-sourcing rocksdb. http://goo.gl/9xulVB.
[12]
ANDERSEN, D. G., FRANKLIN, J., KAMINSKY, M., PHANISHAYEE, A., TAN, L., AND VASUDEVAN, V. Fawn: A fast array of wimpy nodes. Commun. ACM 54, 7 (July 2011), 101-109.
[13]
ATIKOGLU, B., XU, Y., FRACHTENBERG, E., JIANG, S., AND PALECZNY, M. Workload analysis of a large-scale key-value store. In Proceedings of the 12th ACM SIGMETRICS/ PERFORMANCE Joint International Conference on Measurement and Modeling of Computer Systems (New York, NY, USA, 2012), SIGMETRICS '12, ACM, pp. 53-64.
[14]
BEAVER, D., KUMAR, S., LI, H. C., SOBEL, J., AND VAJGEL, P. Finding a needle in haystack: Facebook's photo storage. In Proceedings of the 9th USENIX Conference on Operating Systems Design and Implementation (Berkeley, CA, USA, 2010), OSDI'10, USENIX Association, pp. 1-8.
[15]
BENDER, M. A., FARACH-COLTON, M., FINEMAN, J. T., FOGEL, Y. R., KUSZMAUL, B. C., AND NELSON, J. Cacheoblivious streaming b-trees. In Proceedings of the Nineteenth Annual ACM Symposium on Parallel Algorithms and Architectures (New York, NY, USA, 2007), SPAA '07, ACM, pp. 81-92.
[16]
BLOOM, B. H. Space/time trade-offs in hash coding with allowable errors. Commun. ACM 13, 7 (July 1970), 422-426.
[17]
BRANTNER, M., FLORESCU, D., GRAF, D., KOSSMANN, D., AND KRASKA, T. Building a database on s3. In Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data (New York, NY, USA, 2008), SIGMOD '08, ACM, pp. 251-264.
[18]
COOPER, B. F., SILBERSTEIN, A., TAM, E., RAMAKRISHNAN, R., AND SEARS, R. Benchmarking cloud serving systems with ycsb. In Proceedings of the 1st ACM Symposium on Cloud Computing (New York, NY, USA, 2010), SoCC '10, ACM, pp. 143-154.
[19]
DEAN, J., AND GHEMAWAT, S. Mapreduce: Simplified data processing on large clusters. Commun. ACM 51, 1 (Jan. 2008), 107-113.
[20]
DEBNATH, B., SENGUPTA, S., AND LI, J. Flashstore: High throughput persistent key-value store. Proc. VLDB Endow. 3, 1-2 (Sept. 2010), 1414-1425.
[21]
DEBNATH, B., SENGUPTA, S., AND LI, J. Skimpystash: Ram space skimpy key-value store on flash-based storage. In Proceedings of the 2011 ACM SIGMOD International Conference on Management of Data (New York, NY, USA, 2011), SIGMOD '11, ACM, pp. 25-36.
[22]
DECANDIA, G., HASTORUN, D., JAMPANI, M., KAKULAPATI, G., LAKSHMAN, A., PILCHIN, A., SIVASUBRAMANIAN, S., VOSSHALL, P., AND VOGELS, W. Dynamo: Amazon's highly available key-value store. In Proceedings of Twenty-first ACM SIGOPS Symposium on Operating Systems Principles (New York, NY, USA, 2007), SOSP '07, ACM, pp. 205-220.
[23]
KUSZMAUL, B. C. How fractal trees work. http://goo.gl/PG3kr4.
[24]
LIM, H., FAN, B., ANDERSEN, D. G., AND KAMINSKY, M. SILT: A memory-efficient, high-performance key-value store. In Proceedings of the Twenty-Third ACM Symposium on Operating Systems Principles (New York, NY, USA, 2011), SOSP '11, ACM, pp. 1-13.
[25]
LOMET, D. Replicated indexes for distributed data. In In PDIS (1996), pp. 108-119.
[26]
NISHTALA, R., FUGAL, H., GRIMM, S., KWIATKOWSKI, M., LEE, H., LI, H. C., MCELROY, R., PALECZNY, M., PEEK, D., SAAB, P., STAFFORD, D., TUNG, T., AND VENKATARAMANI, V. Scaling memcache at facebook. In Presented as part of the 10th USENIX Symposium on Networked Systems Design and Implementation (NSDI 13) (Lombard, IL, 2013), USENIX, pp. 385-398.
[27]
O'NEIL, P., CHENG, E., GAWLICK, D., AND O'NEIL, E. The log-structured merge-tree (lsm-tree). Acta Inf. 33, 4 (June 1996), 351-385.
[28]
PIRZADEH, P., TATEMURA, J., AND HACIGÜMÜS, H. Performance evaluation of range queries in key value stores. In 25th IEEE International Symposium on Parallel and Distributed Processing, IPDPS 2011, Anchorage, Alaska, USA, 16-20 May 2011 - Workshop Proceedings (2011), pp. 1092-1101.
[29]
SEARS, R., AND RAMAKRISHNAN, R. blsm: A general purpose log structured merge tree. In Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data (New York, NY, USA, 2012), SIGMOD '12, ACM, pp. 217-228.
[30]
SHETTY, P., SPILLANE, R., MALPANI, R., ANDREWS, B., SEYSTER, J., AND ZADOK, E. Building workload-independent storage with vt-trees. In Proceedings of the 11th USENIX Conference on File and Storage Technologies (Berkeley, CA, USA, 2013), FAST'13, USENIX Association, pp. 17-30.
[31]
VO, H. T., WANG, S., AGRAWAL, D., CHEN, G., AND OOI, B. C. Logbase: A scalable log-structured database system in the cloud. Proc. VLDB Endow. 5, 10 (June 2012), 1004-1015.
[32]
YU, X., GUM, B., CHEN, Y., WANG, R. Y., LI, K., KRISHNAMURTHY, A., AND ANDERSON, T. E. Trading capacity for performance in a disk array. In Proceedings of the 4th conference on Symposium on Operating System Design & Implementation-Volume 4 (2000), USENIX Association, pp. 17-17.
[33]
ZAHARIA, M., CHOWDHURY, M., DAS, T., DAVE, A., MA, J., MCCAULEY, M., FRANKLIN, M. J., SHENKER, S., AND STOICA, I. Resilient distributed datasets: A fault-tolerant abstraction for in-memory cluster computing. In Proceedings of the 9th USENIX Conference on Networked Systems Design and Implementation (Berkeley, CA, USA, 2012), NSDI'12, USENIX Association, pp. 2-2.

Cited By

View all
  • (2024)CAMAL: Optimizing LSM-trees via Active LearningProceedings of the ACM on Management of Data10.1145/36771382:4(1-26)Online publication date: 30-Sep-2024
  • (2023)An LSM Tree Augmented with B+ Tree on Nonvolatile MemoryACM Transactions on Storage10.1145/363347520:1(1-24)Online publication date: 2-Dec-2023
  • (2023)MirrorKV: An Efficient Key-Value Store on Hybrid Cloud Storage with Balanced Performance of Compaction and QueryingProceedings of the ACM on Management of Data10.1145/36267361:4(1-27)Online publication date: 12-Dec-2023
  • Show More Cited By
  1. LSM-trie: an LSM-tree-based ultra-large key-value store for small data

    Recommendations

    Comments

    Please enable JavaScript to view thecomments powered by Disqus.

    Information & Contributors

    Information

    Published In

    cover image Guide Proceedings
    USENIX ATC '15: Proceedings of the 2015 USENIX Conference on Usenix Annual Technical Conference
    July 2015
    625 pages
    ISBN:9781931971225

    Sponsors

    • VMware
    • NetApp
    • Google Inc.
    • Facebook: Facebook
    • HP: HP

    Publisher

    USENIX Association

    United States

    Publication History

    Published: 08 July 2015

    Qualifiers

    • Article

    Contributors

    Other Metrics

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • Downloads (Last 12 months)0
    • Downloads (Last 6 weeks)0
    Reflects downloads up to 12 Nov 2024

    Other Metrics

    Citations

    Cited By

    View all
    • (2024)CAMAL: Optimizing LSM-trees via Active LearningProceedings of the ACM on Management of Data10.1145/36771382:4(1-26)Online publication date: 30-Sep-2024
    • (2023)An LSM Tree Augmented with B+ Tree on Nonvolatile MemoryACM Transactions on Storage10.1145/363347520:1(1-24)Online publication date: 2-Dec-2023
    • (2023)MirrorKV: An Efficient Key-Value Store on Hybrid Cloud Storage with Balanced Performance of Compaction and QueryingProceedings of the ACM on Management of Data10.1145/36267361:4(1-27)Online publication date: 12-Dec-2023
    • (2023)Learning to Optimize LSM-trees: Towards A Reinforcement Learning based Key-Value Store for Dynamic WorkloadsProceedings of the ACM on Management of Data10.1145/36173331:3(1-25)Online publication date: 13-Nov-2023
    • (2023)Hector: A Framework to Design and Evaluate Scheduling Strategies in Persistent Key-Value StoresProceedings of the 52nd International Conference on Parallel Processing10.1145/3605573.3605614(535-545)Online publication date: 7-Aug-2023
    • (2022)Building a Fast and Efficient LSM-tree Store by Integrating Local Storage with Cloud StorageACM Transactions on Architecture and Code Optimization10.1145/352745219:3(1-26)Online publication date: 25-May-2022
    • (2022)TimeUnion: An Efficient Architecture with Unified Data Model for Timeseries Management Systems on Hybrid Cloud StorageProceedings of the 2022 International Conference on Management of Data10.1145/3514221.3526175(1418-1432)Online publication date: 10-Jun-2022
    • (2022)Dissecting, Designing, and Optimizing LSM-based Data StoresProceedings of the 2022 International Conference on Management of Data10.1145/3514221.3522563(2489-2497)Online publication date: 10-Jun-2022
    • (2022)p2KVSProceedings of the Seventeenth European Conference on Computer Systems10.1145/3492321.3519567(575-591)Online publication date: 28-Mar-2022
    • (2021)Verifying concurrent multicopy search structuresProceedings of the ACM on Programming Languages10.1145/34854905:OOPSLA(1-32)Online publication date: 15-Oct-2021
    • Show More Cited By

    View Options

    View options

    Media

    Figures

    Other

    Tables

    Share

    Share

    Share this Publication link

    Share on social media