Nothing Special   »   [go: up one dir, main page]

skip to main content
10.5555/3650697.3650715guideproceedingsArticle/Chapter ViewAbstractPublication PagesfastConference Proceedingsconference-collections
research-article

ELECT: enabling erasure coding tiering for LSM-tree-based storage

Published: 11 March 2024 Publication History

Abstract

Given the skewed nature of practical key-value (KV) storage workloads, distributed KV stores can adopt a tiered approach to support fast data access in a hot tier and persistent storage in a cold tier. To provide data availability guarantees for the hot tier, existing distributed KV stores often rely on replication and incur prohibitively high redundancy overhead. Erasure coding provides a low-cost redundancy alternative, but incurs high access performance overhead. We present ELECT, a distributed KV store that enables erasure coding tiering based on the log-structured merge tree (LSM-tree), by adopting a hybrid redundancy approach that carefully combines replication and erasure coding with respect to the LSM-tree layout. ELECT incorporates hotness awareness and selectively converts data from replication to erasure coding in the hot tier and offloads data from the hot tier to the cold tier. It also provides a tunable approach to balance the trade-off between storage savings and access performance through a single user-configurable parameter. We implemented ELECT atop Cassandra, which is replication-based. Experiments on Alibaba Cloud show that ELECT achieves significant storage savings in the hot tier, while maintaining high performance and data availability guarantees, compared with Cassandra.

References

[1]
Amazon. Amazon Elastic Block Store. https://aws.amazon.com/ebs/.
[2]
Amazon. Amazon Simple Storage Service. https://aws.amazon.com/s3/.
[3]
Apache. Cassandra 4.1.0. https://github.com/apache/cassandra/releases/tag/cassandra-4.1.0.
[4]
Apache. Cassandra documentation - hints. https://cassandra.apache.org/doc/4.1/cassandra/operating/hints.html.
[5]
Apache. HBase. https://hbase.apache.org/.
[6]
Berk Atikoglu, Yuehai Xu, Eitan Frachtenberg, Song Jiang, and Mike Paleczny. Workload analysis of a large-scale key-value store. In Proc. of ACM SIGMETRICS, 2012.
[7]
Burton Howard Bloom. Space/time trade-offs in hash coding with allowable errors. Communications of the ACM, 12(7):422--426, 1970.
[8]
Viveck R. Cadambe, Kishori M. Konwar, Muriel Medard, Haochen Pan, Lewis Tseng, and Yingjian Wu. CassandrEAS: Highly available and storage-efficient distributed key-value store with erasure coding. In Proc. of NCA, 2020.
[9]
Zhichao Cao and Siying Dong. Characterizing, modeling, and benchmarking RocksDB key-value workloads at Facebook. In Proc. of USENIX FAST, 2020.
[10]
Jeremy C. W. Chan, Qian Ding, Patrick P. C. Lee, and Helen H. W. Chan. Parity logging with reserved space: Towards efficient updates and recovery in erasure-coded clustered storage. In Proc. of USENIX FAST, 2014.
[11]
Fay Chang, Jeffrey Dean, Sanjay Ghemawat, Wilson C. Hsieh, Deborah A. Wallach, Mike Burrows, Tushar Chandra, Andrew Fikes, and Robert E. Gruber. Bigtable: A distributed storage system for structured data. In Proc. of USENIX OSDI, 2006.
[12]
Batyr Charyyev, Engin Arslan, and Mehmet Hadi Gunes. Latency comparison of cloud datacenters and edge servers. In Proc. of IEEE GLOBECOM, 2020.
[13]
Haibo Chen, Heng Zhang, Mingkai Dong, Zhaoguo Wang, Yubin Xia, Haibing Guan, and Binyu Zang. Efficient and available in-memory KV-store with hybrid erasure coding and replication. ACM Trans. on Storage, 13(3):25, 2017.
[14]
Liangfeng Cheng, Yuchong Hu, and Patrick P. C. Lee. Coupling decentralized key-value stores with erasure coding. In Proc. of ACM SOCC, 2019.
[15]
Asaf Cidon, Stephen Rumble, Ryan Stutsman, Sachin Katti, John Ousterhout, and Mendel Rosenblum. Copysets: Reducing the frequency of data loss in cloud storage. In Proc. of USENIX ATC, 2013.
[16]
Brian F. Cooper, Adam Silberstein, Erwin Tam, Raghu Ramakrishnan, and Russell Sears. Benchmarking cloud serving systems with YCSB. In Proc. of ACM SOCC, 2010.
[17]
Couchbase. Couchbase automated data partitioning. https://www.couchbase.com/blog/what-exactly-membase/.
[18]
Giuseppe DeCandia, Deniz Hastorun, Madan Jampani, Gunavardhan Kakulapati, Avinash Lakshman, Alex Pilchin, Swaminathan Sivasubramanian, Peter Vosshall, and Werner Vogels. Dynamo: Amazon's highly available key-value store. In Proc. of ACM SOSP, 2007.
[19]
Alexandros G. Dimakis, P. Brighten Godfrey, Yunnan Wu, Martin J. Wainwright, and Kannan Ramchandran. Network coding for distributed storage systems. IEEE Trans. on Information Theory, 56(9):4539--4551, 2010.
[20]
Partha Dutta, Rachid Guerraoui, and Ron R. Levy. Optimistic erasure-coded distributed storage. In Proc. of DISC, 2008.
[21]
EdgeKV. Augment image and video manager with edge compute. https://techdocs.akamai.com/edgekv/docs/augment-image-and-video-manager-with-edge-compute.
[22]
Robert Escriva, Bernard Wong, and Emin Gün Sirer. HyperDex: A distributed, searchable key-value store. In Proc. of ACM SIGCOMM, 2012.
[23]
Bin Fan, Wittawat Tantisiriroj, Lin Xiao, and Garth A. Gibson. Diskreduce: Replication as a prelude to erasure coding in data-intensive scalable computing. Technical Report, CMU-PDL-11-112, Carnegie Mellon University Parallel Data Laboratory, 2011.
[24]
Daniel Ford, François Labelle, Florentina Popovici, Murray Stokely, Van-Anh Truong, Luiz Barroso, Carrie Grimes, and Sean Quinlan. Availability in globally distributed storage systems. In Proc. of USENIX OSDI, 2010.
[25]
Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung. The Google file system. In Proc. of ACM SOSP, 2003.
[26]
Cheng Huang, Huseyin Simitci, Yikang Xu, Aaron Ogus, Brad Calder, Parikshit Gopalan, Jin Li, and Sergey Yekhanin. Erasure coding in windows azure storage. In Proc. of USENIX ATC, 2012.
[27]
Intel. Intelligent storage acceleration library. https://github.com/intel/isa-l.
[28]
Jeeyoon Jung and Dongkun Shin. Lifetime-leveling LSM-tree compaction for ZNS SSD. In Proc. of ACM HotStorage, 2022.
[29]
Saurabh Kadekodi, Francisco Maturana, Sanjith Athlur, Arif Merchant, K. V. Rashmi, and Gregory R. Ganger. Tiger: Disk-adaptive redundancy without placement restrictions. In Proc. of USENIX OSDI, 2022.
[30]
Saurabh Kadekodi, Francisco Maturana, Suhas Jayaram Subramanya, Juncheng Yang, K. V. Rashmi, and Gregory R. Ganger. PACEMAKER: Avoiding HeART attacks in storage clusters with disk-adaptive redundancy. In Proc. of USENIX OSDI, 2020.
[31]
David Karger, Eric Lehman, Tom Leighton, Rina Panigrahy, Matthew Levine, and Daniel Lewin. Consistent hashing and random trees: Distributed caching protocols for relieving hot spots on the world wide web. In Proc. of ACM STOC, 1997.
[32]
Kishori M. Konwar, N. Prakash, Erez Kantor, Nancy Lynch, Muriel Médard, and Alexander A. Schwarzmann. Storage-optimized data-atomic algorithms for handling erasures and errors in distributed storage systems. In Proc. of IEEE IPDPS, 2016.
[33]
Chunbo Lai, Song Jiang, Liqiong Yang, Shiding Lin, Guangyu Sun, Zhenyu Hou, Can Cui, and Jason Cong. Atlas: Baidu's key-value storage system for cloud data. In Proc. of IEEE MSST, 2015.
[34]
Avinash Lakshman and Prashant Malik. Cassandra: a decentralized structured storage system. ACM SIGOPS operating systems review, 44(2):35--40, 2010.
[35]
Youngmoon Lee, Hasan Al Maruf, Mosharaf Chowdhury, Asaf Cidon, and Kang G. Shin. Hydra: Resilient and highly available remote memory. In Proc. of USENIX FAST, 2022.
[36]
Runhui Li, Yuchong Hu, and Patrick P. C. Lee. Enabling efficient and reliable transition from replication to erasure coding for clustered file systems. IEEE Trans. on parallel and distributed systems, 28(9):2500--2513, 2017.
[37]
Witold Litwin, Rim Moussa, and Thomas Schwarz. LH*RS---a highly-available scalable distributed data structure. ACM Trans. on Database System, 30(3):769- -811, 2005.
[38]
Alibaba Cloud Computing Co. Ltd. Alibaba cloud. https://www.alibabacloud.com/.
[39]
Francisco Maturana and K. V. Rashmi. Convertible codes: Enabling efficient conversion of coded data in distributed storage. IEEE Trans. on Information Theory, 68(7):4392 -- 4407, 2022.
[40]
Ruben Mayer, Harshit Gupta, Enrique Saurez, and Umakishore Ramachandran. FogStore: Toward a distributed data store for fog computing. In Proc. of IEEE FWC, 2017.
[41]
Ralph C. Merkle. A digital signature based on a conventional encryption function. In Proc. of CRYPTO, 1987.
[42]
Seyed Hossein Mortazavi, Mohammad Salehe, Carolina Simoes Gomes, Caleb Phillips, and Eyal De Lara. CloudPath: A multi-tier cloud computing framework. In Proc. of ACM/IEEE SEC, 2017.
[43]
Subramanian Muralidhar, Wyatt Lloyd, Sabyasachi Roy, Cory Hill, Ernest Lin, Weiwen Liu, Satadru Pan, Shiva Shankar, Viswanath Sivakumar, Linpeng Tang, and Sanjeev Kumar. f4: Facebook's warm BLOB storage system. In Proc. of USENIX OSDI, 2014.
[44]
Nicolas Nicolaou, Viveck Cadambe, N. Prakash, Andria Trigeorgi, Kishori Konwar, Muriel Medard, and Nancy Lynch. ARES: Adaptive, reconfigurable, erasure coded, atomic storage. In Proc. of IEEE ICDCS, 2019.
[45]
Patrick O'Neil, Edward Cheng, Dieter Gawlick, and Elizabeth O'Neil. The log-structured merge-tree (LSM-tree). Acta Informatica, 33:351--385, 1996.
[46]
PingCAP. TiKV. https://tikv.org.
[47]
Sean Quinlan and Sean Dorward. Venti: a new approach to archival storage. In Proc. USENIX FAST, 2002.
[48]
Pandian Raju, Rohan Kadekodi, Vijay Chidambaram, and Ittai Abraham. PebblesDB: Building key-value stores using fragmented log-structured merge trees. In Proc. of ACM SOSP, 2017.
[49]
K. V. Rashmi, Mosharaf Chowdhury, Jack Kosaian, Ion Stoica, and Kannan Ramchandran. EC-Cache: Load-balanced, low-latency cluster caching with online erasure coding. In Proc. of USENIX OSDI, 2016.
[50]
Irving S. Reed and Gustave Solomon. Polynomial codes over certain finite fields. Journal of the society for industrial and applied mathematics, 8(2):300--304, 1960.
[51]
David Reinsel. How you contribute to today's growing datasphere and its enterprise impact. https://blogs.idc.com/2019/11/04/how-you-contribute-to-todays-growing-datasphere-and-its-enterprise-impact/, Nov 2019.
[52]
Mahadev Satyanarayanan. The emergence of edge computing. IEEE Computer, 50(1):30--39, 2017.
[53]
Weisong Shi, Jie Cao, Quan Zhang, Youhuizi Li, and Lanyu Xu. Edge computing: Vision and challenges. IEEE Internet of Things Journal, 3(5):637 -- 646, 2016.
[54]
Amy Tai, Michael Wei, Michael J. Freedman, Ittai Abraham, and Dahlia Malkhi. Replex: A scalable, highly available multi-index data store. In Proc. of USENIX ATC, 2016.
[55]
Konstantin Taranov, Gustavo Alonso, and Torsten Hoefler. Fast and strongly-consistent per-item resilience in key-value stores. In Proc. of ACM EuroSys, 2018.
[56]
Robbert Van Renesse and Fred B. Schneider. Chain replication for supporting high throughput and availability. In Proc. of USENIX OSDI, 2004.
[57]
Michalis Vardoulakis, Giorgos Saloustros, Pilar González-Férez, and Angelos Bilas. Tebis: index shipping for efficient replication in lsm key-value stores. In Proc. of EuroSys, 2022.
[58]
John Wilkes, Richard Golding, Carl Staelin, and Tim Sullivan. The HP AutoRAID hierarchical storage system. ACM Trans. on Computer Systems, 14(1):108-- 136, 1996.
[59]
Si Wu, Zhirong Shen, and Patrick P. C. Lee. Enabling I/O-efficient redundancy transitioning in erasure-coded KV stores via elastic Reed-Solomon codes. In Proc. of IEEE SRDS, 2020.
[60]
Mingyuan Xia, Mohit Saxena, Mario Blaum, and David A. Pease. A tale of two erasure codes in HDFS. In Proc. of USENIX FAST, 2015.
[61]
Juncheng Yang, Anirudh Sabnis, Daniel S. Berger, K. V. Rashmi, and Ramesh K. Sitaraman. C2DN: How to harness erasure codes at the edge for efficient content delivery. In Proc. of USENIX NSDI, 2022.
[62]
Juncheng Yang, Yao Yue, and K. V. Rashmi. A large scale analysis of hundreds of in-memory cache clusters at Twitter. In Proc. of USENIX OSDI, 2020.
[63]
Qiaori Yao, Yuchong Hu, Liangfeng Cheng, Patrick P. C. Lee, Dan Feng, Weichun Wang, and Wei Chen. StripeMerge: Efficient wide-stripe generation for large-scale erasure-coded storage. In Proc. of IEEE ICDCS, 2021.
[64]
Ting Yao, Jiguang Wan, Ping Huang, Yiwen Zhang, Zhiwen Liu, Changsheng Xie, and Xubin He. GearDB: A GC-free Key-Value store on HM-SMR drives with gear compaction. In Proc. of USENIX FAST, 2019.
[65]
Matt M. T. Yiu, Helen H. W. Chan, and Patrick P. C. Lee. Erasure coding for small objects in in-memory KV storage. In Proc. of ACM SYSTOR, 2017.
[66]
Heng Zhang, Mingkai Dong, and Haibo Chen. Efficient and available in-memory KV-store with hybrid erasure coding and replication. In Proc. of USENIX FAST, 2016.
[67]
Heng Zhang, Shaoyuan Huang, Mengwei Xu, Deke Guo, Xiaofei Wang, Victor C. M. Leung, and Wenyu Wang. How far have edge clouds gone? a spatial-temporal analysis of edge network latency in the wild. In Proc. of IEEE/ACM IWQoS, 2023.
[68]
Qiang Zhang, Yongkun Li, Patrick P. C. Lee, Yinlong Xu, and Si Wu. DEPART: Replica decoupling for distributed key-value storage. In Proc. of USENIX FAST, 2022.
[69]
Yang Zhou, Hassan M. G. Wassel, Sihang Liu, Jiaqi Gao, James Mickens, Minlan Yu, Chris Kennelly, Paul Turner, David E. Culler, Henry M. Levy, and Amin Vahdat. Carbink: Fault-tolerant far memory. In Proc. of USENIX OSDI, 2022.

Index Terms

  1. ELECT: enabling erasure coding tiering for LSM-tree-based storage
    Index terms have been assigned to the content through auto-classification.

    Recommendations

    Comments

    Please enable JavaScript to view thecomments powered by Disqus.

    Information & Contributors

    Information

    Published In

    cover image Guide Proceedings
    FAST '24: Proceedings of the 22nd USENIX Conference on File and Storage Technologies
    February 2024
    387 pages
    ISBN:978-1-939133-38-0
    • Others:
    • Xiaosong Ma,
    • Youjip Won

    Sponsors

    • FUTUREWEI
    • NSF
    • Google Inc.
    • IBM
    • Samsung

    Publisher

    USENIX Association

    United States

    Publication History

    Published: 11 March 2024

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Contributors

    Other Metrics

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • 0
      Total Citations
    • 0
      Total Downloads
    • Downloads (Last 12 months)0
    • Downloads (Last 6 weeks)0
    Reflects downloads up to 12 Nov 2024

    Other Metrics

    Citations

    View Options

    View options

    Media

    Figures

    Other

    Tables

    Share

    Share

    Share this Publication link

    Share on social media