Nothing Special   »   [go: up one dir, main page]

skip to main content
research-article

Asynchronous Compaction Acceleration Scheme for Near-data Processing-enabled LSM-tree-based KV Stores

Published: 11 September 2024 Publication History

Abstract

LSM-tree-based key-value stores (KV stores) convert random-write requests to sequence-write ones to achieve high I/O performance. Meanwhile, compaction operations in KV stores update SSTables in forms of reorganizing low-level data components to high-level ones, thereby guaranteeing an orderly data layout in each component. Repeated writes caused by compaction (a.k.a. write amplification) impacts I/O bandwidth and overall system performance. Near-data processing (NDP) is one of the effective approaches to addressing this write-amplification issue. Most NDP-based techniques adopt synchronous parallel schemes to perform a compaction task on both the host and its NDP-enabled device. In synchronous parallel compaction schemes, the execution time of compaction is determined by a subsystem that has lower compaction performance coupled by under-utilized computing resources in a NDP framework. To solve this problem, we propose an asynchronous parallel scheme named PStore to improve the compaction performance in KV stores. In PStore, we designed a multi-tasks queue and three priority-based scheduling methods. PStore elects proper compaction tasks to be offloaded in host- and device-side compaction modules. Our proposed cross-leveled compaction mechanism mitigates write amplification induced by asynchronous compaction. PStore featured with the asynchronous compaction mechanism fully utilizes computing resources in both host- and device-side subsystems. Compared with the two popular synchronous compaction modes based on KV stores (TStore and LevelDB), our PStore immensely improves the throughput by up to a factor of 14 and 10.52 with an average of a factor of 2.09 and 1.73, respectively.

References

[2]
Rajeev Balasubramonian, Jichuan Chang, Troy Manning, Jaime H. Moreno, Richard Murphy, Ravi Nair, and Steven Swanson. 2014. Near-data processing: Insights from a MICRO-46 workshop. IEEE Micro 34, 4 (2014), 36–42.
[3]
Helen H. W. Chan, Chieh-Jan Mike Liang, Yongkun Li, Wenjia He, Patrick P. C. Lee, Lianjie Zhu, Yaozu Dong, Yinlong Xu, Yu Xu, Jin Jiang et al. 2018. HashKV: Enabling efficient updates in KV storage via hashing. In Proceedings of the USENIX Annual Technical Conference (USENIX ATC’18). 1007–1019.
[4]
Fay Chang, Jeffrey Dean, Sanjay Ghemawat, Wilson C. Hsieh, Deborah A. Wallach, Mike Burrows, Tushar Chandra, Andrew Fikes, and Robert E. Gruber. 2008. Bigtable: A distributed storage system for structured data. ACM Trans. Comput. Syst. 26, 2 (2008), 4.
[5]
Benjamin Y Cho, Won Seob Jeong, Doohwan Oh, and Won Woo Ro. 2013. XSD: Accelerating mapreduce by harnessing the GPU inside an SSD. In Proceedings of the 1st Workshop on Near-Data Processing.
[6]
Sangyeun Cho, Sanghoan Chang, and Insoon Jo. 2015. The solid-state drive technology, today and tomorrow. In Proceedings of the IEEE 31st International Conference on Data Engineering (ICDE’15). IEEE, 1520–1522.
[7]
Sangyeun Cho, Chanik Park, Hyunok Oh, Sungchan Kim, Youngmin Yi, and Gregory R. Ganger. 2013. Active disk meets flash: A case for intelligent SSDs. In Proceedings of the 27th International ACM Conference on International Conference on Supercomputing. ACM, 91–102.
[8]
Jaeyoung Do, Yang-Suk Kee, Jignesh M. Patel, Chanik Park, Kwanghyun Park, and David J. DeWitt. 2013. Query processing on smart SSDs: Opportunities and challenges. In Proceedings of the ACM SIGMOD International Conference on Management of Data. ACM, 1221–1230.
[9]
Maya Gokhale, Bill Holmes, and Ken Iobst. 1995. Processing in memory: The terasys massively parallel PIM array. Computer 28, 4 (1995), 23–31.
[10]
Guy Golan-Gueta, Edward Bortnikov, Eshcar Hillel, and Idit Keidar. 2015. Scaling concurrent log-structured data stores. In Proceedings of the Tenth European Conference on Computer Systems. Association for Computing Machinery, Article 32, 14 pages.
[11]
Boncheol Gu, Andre S. Yoon, Duck-Ho Bae, Insoon Jo, Jinyoung Lee, Jonghyun Yoon, Jeong-Uk Kang, Moonsang Kwon, Chanho Yoon, Sangyeun Cho et al. 2016. Biscuit: A framework for near-data processing of big data workloads. In Proceedings of the ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA’16). IEEE, 153–165.
[12]
Tayler H. Hetherington, Timothy G. Rogers, Lisa Hsu, Mike O’Connor, and Tor M. Aamodt. 2012. Characterizing and evaluating a key-value store application on heterogeneous CPU-GPU systems. In Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software. 88–98.
[13]
Gui Huang, Xuntao Cheng, Jianying Wang, Yujie Wang, Dengcheng He, Tieying Zhang, Feifei Li, Sheng Wang, Wei Cao, and Qiang Li. 2019. X-engine: An optimized storage engine for large-scale E-commerce transaction processing. In Proceedings of the International Conference on Management of Data. 651–665.
[14]
Junsu Im, Jinwook Bae, Chanwoo Chung, Arvind, and Sungjin Lee. 2020. PinK: High-speed in-storage key-value store with bounded tails. In Proceedings of the USENIX Annual Technical Conference.173–187.
[15]
Yanqin Jin, Hung-Wei Tseng, Yannis Papakonstantinou, and Steven Swanson. 2017. KAML: A flexible, high-performance key-value SSD. In Proceedings of the IEEE International Symposium on High Performance Computer Architecture. 373–384.
[16]
Yangwook Kang, Yang-suk Kee, Ethan L. Miller, and Chanik Park. 2013. Enabling cost-effective data processing with smart ssd. In Proceedings of the IEEE 29th Symposium on Mass Storage Systems and Technologies (MSST’13). IEEE, 1–12.
[17]
Sudarsun Kannan, Nitish Bhat, Ada Gavrilovska, Andrea Arpaci-Dusseau, and Remzi Arpaci-Dusseau. 2018. Redesigning LSMs for nonvolatile memory with NoveLSM. In Proceedings of the USENIX Annual Technical Conference.993–1005.
[18]
Kimberly Keeton, David A. Patterson, and Joseph M. Hellerstein. 1998. A case for intelligent disks (IDISKs). ACM SIGMOD Rec. 27, 3 (1998), 42–52.
[19]
Chunbo Lai, Song Jiang, Liqiong Yang, Shiding Lin, Guangyu Sun, Zhenyu Hou, Can Cui, and Jason Cong. 2015. Atlas: Baidu’s key-value storage system for cloud data. In Proceedings of the 31st Symposium on Mass Storage Systems and Technologies (MSST’15). IEEE, 1–14.
[20]
Leveldb. 2018. fast and lightweight key/value database library by Google. Retrieved from http://code.google.com/p/leveldb
[21]
Yinan Li, Bingsheng He, Robin Jun Yang, Qiong Luo, and Ke Yi. 2010. Tree indexing on solid state drives. Proc. VLDB Endow. 3, 1-2 (2010), 1195–1206.
[22]
Shengwen Liang, Ying Wang, Youyou Lu, Zhe Yang, Huawei Li, and Xiaowei Li. 2019. Cognitive SSD: A deep learning engine for in-storage data retrieval. In Proceedings of the USENIX Annual Technical Conference (USENIXATC’19). 395–410.
[23]
Ling Liu. 2013. Computing infrastructure for big data processing. Front. Comput. Sci. 7, 2 (2013), 165–170.
[24]
Lanyue Lu, Thanumalayan Sankaranarayana Pillai, Andrea C. Arpaci-Dusseau, and Remzi H. Arpaci-Dusseau. 2016. WiscKey: Separating keys from values in SSD-conscious storage. In Proceedings of the 14th Usenix Conference on File and Storage Technologies. USENIX Association, 133–148.
[25]
Patrick O’Neil, Edward Cheng, Dieter Gawlick, and Elizabeth O’Neil. 1996. The log-structured merge-tree (LSM-tree). Acta Inform. 33, 4 (1996), 351–385.
[26]
Dongchul Park, Jianguo Wang, and Yang Suk Kee. 2016. In-storage computing for hadoop MapReduce framework: Challenges and possibilities. IEEE Trans. Comput. PP, 99 (2016), 1–1.
[27]
RocksDB. 2018. A persistent key-value store. Retrieved from https://rocksdb.org/
[28]
Russell Sears and Raghu Ramakrishnan. 2012. bLSM: A general purpose log structured merge tree. In Proceedings of the ACM SIGMOD International Conference on Management of Data. ACM, 217–228.
[29]
CALD Seminar. 1998. Active storage for large-scale data mining and multimedia. Center for Automated Learning and Discovery (1998), 62–73.
[30]
Pradeep Shetty, Richard Spillane, Ravikant Malpani, Binesh Andrews, Justin Seyster, and Erez Zadok. 2013. Building workload-independent storage with VT-trees. In Proceedings of the 11th USENIX Conference on File and Storage Technologies. USENIX Association, 17–30.
[31]
Mark Silberstein, Bryan Ford, Idit Keidar, and Emmett Witchel. 2013. GPUfs: Integrating a file system with GPUs. In Proceedings of the International Conference on Architectural Support for Programming Languages and Operating Systems. 485–498.
[32]
Hui Sun, Wei Liu, Jianzhong Huang, Song Fu, Zhi Qiao, and Weisong Shi. 2019. Near-data processing-enabled and time-aware compaction optimization for LSM-tree-based key-value stores. In Proceedings of the 48th International Conference on Parallel Processing. 1–11.
[33]
Hui Sun, Wei Liu, Jianzhong Huang, and Weisong Shi. 2019. Collaborative compaction optimization system using near-data processing for LSM-tree-based key-value stores. J. Parallel Distrib. Comput. 131 (2019), 29–43.
[34]
Hui Sun, Wei Liu, Zhi Qiao, Song Fu, and Weisong Shi. 2018. DStore: A holistic key-value store exploring near-data processing and on-demand scheduling for compaction optimization. IEEE Access 6 (2018), 61233–61253.
[35]
Xuan Sun, Jinghuan Yu, Zimeng Zhou, and Chun Jason Xue. 2020. FPGA-based compaction engine for accelerating LSM-tree key-value stores. In Proceedings of the IEEE International Conference on Data Engineering. 1261–1272.
[36]
Dejun Teng, Lei Guo, Rubao Lee, Feng Chen, Siyuan Ma, Yanfeng Zhang, and Xiaodong Zhang. 2017. LSbM-tree: Re-enabling buffer caching in data management for mixed reads and writes. In Proceedings of the 37th IEEE International Conference on Distributed Computing Systems.
[37]
Tobias Vinçon, Arthur Bernhardt, Ilia Petrov, and Andreas Koch. 2020. nKV in action: Accelerating KV-stores on native computational storage with near-data processing. Proc. VLDB Endow. 13, 12 (2020), 2981–2984.
[38]
Jianguo Wang, Dongchul Park, Yang Suk Kee, Yannis Papakonstantinou, and Steven Swanson. 2016. SSD in-storage computing for list intersection. In Proceedings of the International Workshop on Data Management on New Hardware. 4:1–4:7.
[39]
Jianguo Wang, Dongchul Park, Yannis Papakonstantinou, and Steven Swanson. 2016. SSD in-storage computing for search engines. IEEE Trans. Comput. (2016), 1–1.
[40]
Peng Wang, Guangyu Sun, Song Jiang, Jian Ouyang, Shiding Lin, Chen Zhang, and Jason Cong. 2014. An efficient design and implementation of LSM-tree-based key-value store on open-channel SSD. In Proceedings of the 9th European Conference on Computer Systems. ACM, 16.
[41]
Xingbo Wu, Yuehai Xu, Zili Shao, and Song Jiang. 2015. LSM-trie: An LSM-tree-based ultra-large key-value store for small data. In Proceedings of the USENIX Conference on Usenix Annual Technical Conference. 71–82.
[42]
Peng Xu, Jiguang Wan, Ping Huang, Xiaogang Yang, Chenlei Tang, Fei Wu, and Changsheng Xie. 2020. LUDA: Boost LSM key value store compactions with GPUs. Retrieved from https://arxiv.org/abs/2004.03054
[43]
Ting Yao, Yiwen Zhang, Jiguang Wan, Qiu Cui, Liu Tang, Hong Jiang, Changsheng Xie, and Xubin He. 2020. MatrixKV: Reducing write stalls and write amplification in LSM-tree-based KV stores with matrix container in NVM. In Proceedings of the USENIX Annual Technical Conference.17–31.
[44]
Zigang Zhang, Yinliang Yue, Bingsheng He, Jin Xiong, Mingyu Chen, Lixin Zhang, and Ninghui Sun. 2014. Pipelined compaction for the LSM-tree. In Proceedings of the IEEE 28th International Parallel and Distributed Processing Symposium. IEEE, 777–786.

Cited By

View all
  • (2024)Coordinating Compaction Between LSM-Tree Based Key-Value Stores for Edge Federation2024 IEEE 17th International Conference on Cloud Computing (CLOUD)10.1109/CLOUD62652.2024.00054(419-429)Online publication date: 7-Jul-2024

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Transactions on Embedded Computing Systems
ACM Transactions on Embedded Computing Systems  Volume 23, Issue 6
November 2024
505 pages
EISSN:1558-3465
DOI:10.1145/3613645
  • Editor:
  • Tulika Mitra
Issue’s Table of Contents

Publisher

Association for Computing Machinery

New York, NY, United States

Journal Family

Publication History

Published: 11 September 2024
Online AM: 29 September 2023
Accepted: 21 September 2023
Revised: 19 August 2023
Received: 28 January 2023
Published in TECS Volume 23, Issue 6

Check for updates

Author Tags

  1. Near-data processing
  2. key-value store
  3. asynchronous compaction
  4. write amplification

Qualifiers

  • Research-article

Funding Sources

  • National Natural Science Foundation of China
  • State Key Laboratory of Computer Architecture (ICT, CAS)
  • Natural Science Research Projects at Higher Institutions in Anhui Province
  • U.S. National Science Foundation

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)367
  • Downloads (Last 6 weeks)77
Reflects downloads up to 19 Nov 2024

Other Metrics

Citations

Cited By

View all
  • (2024)Coordinating Compaction Between LSM-Tree Based Key-Value Stores for Edge Federation2024 IEEE 17th International Conference on Cloud Computing (CLOUD)10.1109/CLOUD62652.2024.00054(419-429)Online publication date: 7-Jul-2024

View Options

Login options

Full Access

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Full Text

View this article in Full Text.

Full Text

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media