DOI: 10.1145/3352460.3358303
research-article

FIDR: A Scalable Storage System for Fine-Grain Inline Data Reduction with Efficient Memory Handling

Published: 12 October 2019

Abstract

Storage systems play a critical role in modern servers that run highly data-intensive applications. To satisfy the high performance and capacity demands of such applications, storage systems now deploy an array of fast SSDs per server. To reduce the storage cost of employing many SSDs per server, storage systems actively perform inline data reduction (e.g., data deduplication, compression). Existing inline data reduction studies can achieve high performance and scalability by offloading computation-intensive data-reduction operations to dedicated hardware accelerators. However, such existing studies suffer from limited workload support and scalability. For example, they reduce only large data blocks, which incurs many IO requests and leads to low data reduction rates, and their offloading overlooks memory-intensive operations, leading to suboptimal scalability.
In this paper, we propose FIDR, a highly scalable storage system that enables inline data reduction of fine-grain data. We first identify key limitations of existing studies, and then present a scalable storage server design that effectively resolves these limitations by employing an optimal offloading mechanism. The key ideas of FIDR are to achieve high applicability by enabling fine-grain data reduction and high scalability by distributing data-reduction operations to optimal devices (e.g., host processor, accelerator, network interface card). The proposed offloading mechanism considers computation, memory capacity, and memory bandwidth requirements altogether. For evaluation, we implement an example FIDR system prototype using FPGAs. Our prototype system outperforms a current state-of-the-art data reduction system by up to 3.3 times by significantly reducing both computation and memory resource requirements.
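
To illustrate why chunk granularity matters for inline data reduction, the following is a minimal software sketch of fixed-size chunk deduplication followed by compression. It is not FIDR's implementation: the function names (`reduce_inline`, `restore`), the use of SHA-256 fingerprints and zlib, the in-memory dictionary standing in for a fingerprint table, and the 4 KiB and 32 KiB chunk sizes are all assumptions chosen for the example.

```python
import hashlib
import zlib


def reduce_inline(data, chunk_size):
    """Deduplicate, then compress, fixed-size chunks of `data`.

    Returns (store, recipe): `store` maps a chunk fingerprint to the
    compressed chunk, and `recipe` lists fingerprints in write order.
    The dict is a stand-in (assumption) for a real fingerprint table.
    """
    store = {}
    recipe = []
    for offset in range(0, len(data), chunk_size):
        chunk = data[offset:offset + chunk_size]
        fingerprint = hashlib.sha256(chunk).digest()
        if fingerprint not in store:
            # First time this content is seen: compress and keep it.
            store[fingerprint] = zlib.compress(chunk)
        # A duplicate chunk costs only a reference in the recipe.
        recipe.append(fingerprint)
    return store, recipe


def restore(store, recipe):
    """Rebuild the original bytes from the reduced representation."""
    return b"".join(zlib.decompress(store[fp]) for fp in recipe)


# Hypothetical workload: four distinct 4 KiB blocks written 16 times
# in an order that repeats at 4 KiB granularity but not at 32 KiB.
blocks = [hashlib.sha256(bytes([i])).digest() * 128 for i in range(4)]
order = [0, 1, 2, 3, 0, 1, 2, 3, 3, 2, 1, 0, 3, 2, 1, 0]
data = b"".join(blocks[i] for i in order)

for chunk_size in (4096, 32768):
    store, recipe = reduce_inline(data, chunk_size)
    print(chunk_size, "unique chunks:", len(store), "of", len(recipe))
```

On this synthetic input, 4 KiB chunking stores 4 unique chunks out of 16 written, while 32 KiB chunking finds no duplicates (2 unique chunks out of 2), because each large chunk differs even though its 4 KiB pieces repeat. The trade-off is that finer chunks mean more fingerprints to compute and look up per byte written.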


Cited By

  • (2024) Eliminating Storage Management Overhead of Deduplication over SSD Arrays Through a Hardware/Software Co-Design. Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 2, pp. 320-335. DOI: 10.1145/3620665.3640368. Online publication date: 27-Apr-2024
  • (2024) HA-CSD: Host and SSD Coordinated Compression for Capacity and Performance. 2024 IEEE International Parallel and Distributed Processing Symposium (IPDPS), pp. 825-838. DOI: 10.1109/IPDPS57955.2024.00078. Online publication date: 27-May-2024
  • (2023) Accelerating Content-Defined Chunking for Data Deduplication Based on Speculative Jump. IEEE Transactions on Parallel and Distributed Systems 34:9, pp. 2568-2579. DOI: 10.1109/TPDS.2023.3290770. Online publication date: Sep-2023

        Information & Contributors

        Information

        Published In

        MICRO '52: Proceedings of the 52nd Annual IEEE/ACM International Symposium on Microarchitecture
        October 2019
        1104 pages
        ISBN: 9781450369381
        DOI: 10.1145/3352460
        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Author Tags

        1. FPGA
        2. SSD array
        3. compression
        4. deduplication
        5. memory management
        6. small chunk
        7. table management

        Qualifiers

        • Research-article
        • Research
        • Refereed limited

        Conference

        MICRO '52

        Acceptance Rates

        Overall Acceptance Rate 484 of 2,242 submissions, 22%

        Bibliometrics & Citations

        Bibliometrics

        Article Metrics

        • Downloads (Last 12 months): 86
        • Downloads (Last 6 weeks): 10
        Reflects downloads up to 02 Oct 2024

        Citations

        Cited By

        • (2023) CostFM: A High Cost-Performance Fingerprint Management Mechanism for Shared SSDs. 2023 IEEE 41st International Conference on Computer Design (ICCD), pp. 223-230. DOI: 10.1109/ICCD58817.2023.00042. Online publication date: 6-Nov-2023
        • (2022) An Enterprise-Grade Open-Source Data Reduction Architecture for All-Flash Storage Systems. Proceedings of the ACM on Measurement and Analysis of Computing Systems 6:2, pp. 1-27. DOI: 10.1145/3530896. Online publication date: 6-Jun-2022
        • (2021) Understanding the Performance Characteristics of Computational Storage Drives: A Case Study with SmartSSD. Electronics 10:21, article 2617. DOI: 10.3390/electronics10212617. Online publication date: 26-Oct-2021
        • (2020) Hardware-assisted Service Live Migration in Resource-limited Edge Computing Systems. 2020 57th ACM/IEEE Design Automation Conference (DAC), pp. 1-6. DOI: 10.1109/DAC18072.2020.9218677. Online publication date: Jul-2020
