research-article

An Enterprise-Grade Open-Source Data Reduction Architecture for All-Flash Storage Systems

Published: 06 June 2022

Abstract

All-flash storage (AFS) systems have become an essential infrastructure component for enterprise applications that require sub-millisecond latency and very high throughput. Nevertheless, the price per capacity of solid-state drives (SSDs) is relatively high, which has encouraged system architects to adopt data reduction techniques, mainly deduplication and compression, in enterprise storage solutions. To provide higher reliability and performance, SSDs are typically grouped in redundant array of independent disks (RAID) configurations. Data reduction on top of RAID arrays, however, adds I/O overheads and complicates the I/O patterns that reach the underlying backend SSDs, which invalidates the best-practice configurations used in AFS. Unfortunately, existing work on the performance of data reduction does not consider its interaction with, and its I/O overheads on, other enterprise storage components, including SSD arrays and RAID controllers. In this paper, using a real setup with enterprise-grade components and the open-source data reduction module RedHat VDO, we reveal novel observations on the performance gap between the state-of-the-art and the optimal all-flash storage stack with integrated data reduction. Specifically, we explore the I/O patterns at the storage entry point and compare them with those at the disk subsystem. Our analysis shows significant I/O overheads from guaranteeing consistency and avoiding data loss through data journaling, from frequent small metadata updates, and from duplicate content verification. We accompany these observations with cross-layer optimizations to enhance the performance of AFS, ranging from deriving new optimal hardware RAID configurations to introducing changes to the enterprise storage stack.
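To make the data reduction overheads named above concrete, here is a minimal Python sketch of an inline deduplication-plus-compression write path over fixed-size blocks. It is not VDO's implementation: the 4 KiB block size, SHA-256 (`hashlib`) as the fingerprint, `zlib` for compression, and the names `ReducingVolume`, `write_block`, `read_block`, and the counters are all illustrative assumptions. It shows where extra backend I/O comes from: a fingerprint hit triggers a read of the stored block to verify that the contents really match, and every write, duplicate or not, updates the logical-to-physical block map. The data and metadata journaling that the abstract also identifies as an overhead is not modeled.

```python
import hashlib
import zlib
from dataclasses import dataclass, field

BLOCK_SIZE = 4096  # assumed deduplication granularity in bytes (illustrative)


@dataclass
class ReducingVolume:
    """Hypothetical toy inline data-reduction layer, for illustration only."""
    backend: dict = field(default_factory=dict)    # physical block -> compressed bytes
    index: dict = field(default_factory=dict)      # fingerprint -> physical block
    block_map: dict = field(default_factory=dict)  # logical block -> physical block
    next_pbn: int = 0
    backend_reads: int = 0   # reads issued only to verify duplicate candidates
    backend_writes: int = 0  # data writes that reach the backend
    map_updates: int = 0     # block-map (metadata) updates

    def write_block(self, lbn: int, data: bytes) -> None:
        assert len(data) == BLOCK_SIZE
        fp = hashlib.sha256(data).digest()
        pbn = self.index.get(fp)
        if pbn is not None:
            # Duplicate candidate: read the stored block back and compare bytes
            # before sharing it, so a fingerprint collision cannot corrupt data.
            self.backend_reads += 1
            if zlib.decompress(self.backend[pbn]) != data:
                pbn = None
        if pbn is None:
            # Unique content: compress it and store it in a new physical block.
            pbn = self.next_pbn
            self.next_pbn += 1
            self.backend[pbn] = zlib.compress(data)
            self.backend_writes += 1
            self.index[fp] = pbn
        # Every write, duplicate or not, changes the logical-to-physical map.
        self.block_map[lbn] = pbn
        self.map_updates += 1

    def read_block(self, lbn: int) -> bytes:
        return zlib.decompress(self.backend[self.block_map[lbn]])


# Usage: four logical writes, only two distinct contents.
vol = ReducingVolume()
a = b"A" * BLOCK_SIZE
b = b"B" * BLOCK_SIZE
for lbn, blk in enumerate([a, b, a, a]):
    vol.write_block(lbn, blk)
print(vol.backend_writes, vol.backend_reads, vol.map_updates)  # 2 2 4
```

Even in this toy model, the two duplicate writes save two data writes but cost two verification reads, and all four writes still produce a metadata update, which is the kind of hidden backend traffic the abstract describes.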
By analyzing the characteristics of I/O types and their overheads, we propose three techniques: (a) application-aware lazy persistence, (b) a fast, read-only I/O cache for duplicate verification, and (c) disaggregation of block maps and data by offloading block maps to a very fast persistent memory device. By consolidating all proposed optimizations and implementing them in an enterprise AFS, we show a 1.3× to 12.5× speedup over the baseline AFS with 90% data reduction, and a 7.8× to 57× performance/cost improvement over an optimized AFS (with no data reduction), for applications ranging from 100% read-only to 100% write-only accesses.
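Technique (b) can be pictured as a small read-only LRU cache placed in front of the backend and consulted only by duplicate-verification reads. The sketch below is an illustration under stated assumptions, not the paper's design: the class `VerificationCache`, its methods `read_for_verify` and `invalidate`, the capacity, the LRU policy built on `collections.OrderedDict`, and the `fake_read` backend are all hypothetical. Because the cache holds copies of already-written physical blocks and never writes back, it adds no persistence obligations; a hit simply saves one backend read.

```python
from collections import OrderedDict
from typing import Callable


class VerificationCache:
    """Hypothetical read-only LRU cache for duplicate-verification reads."""

    def __init__(self, capacity: int, backend_read: Callable[[int], bytes]):
        self.capacity = capacity
        self.backend_read = backend_read
        self.entries: "OrderedDict[int, bytes]" = OrderedDict()
        self.hits = 0
        self.misses = 0

    def read_for_verify(self, pbn: int) -> bytes:
        if pbn in self.entries:
            self.entries.move_to_end(pbn)  # mark as most recently used
            self.hits += 1
            return self.entries[pbn]
        self.misses += 1
        data = self.backend_read(pbn)  # only misses reach the SSD array
        self.entries[pbn] = data
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict the least recently used block
        return data

    def invalidate(self, pbn: int) -> None:
        # Call when a physical block is freed and reused; otherwise a later
        # verification could compare new data against stale cached contents.
        self.entries.pop(pbn, None)


# Usage: a fake backend that records which blocks it had to serve.
backend = {0: b"A" * 4096, 1: b"B" * 4096}
backend_reads = []


def fake_read(pbn: int) -> bytes:
    backend_reads.append(pbn)
    return backend[pbn]


cache = VerificationCache(capacity=1, backend_read=fake_read)
for pbn in [0, 0, 1, 0]:
    cache.read_for_verify(pbn)
print(cache.hits, cache.misses, backend_reads)  # 1 3 [0, 1, 0]
```

The `invalidate` hook is the one correctness obligation such a cache carries in a deduplicating system: physical blocks are recycled once no logical block references them, so a cached copy must not outlive the block it mirrors.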


Cited By

  • (2024) From SSDs Back to HDDs: Optimizing VDO to Support Inline Deduplication and Compression for HDDs as Primary Storage Media. ACM Transactions on Storage. DOI: 10.1145/3678250. Online publication date: 23-Jul-2024.
  • (2024) IO-SEA: Storage I/O and Data Management for Exascale Architectures. Proceedings of the 21st ACM International Conference on Computing Frontiers: Workshops and Special Sessions, pp. 94-100. DOI: 10.1145/3637543.3654620. Online publication date: 7-May-2024.


    Information

    Published In

    Proceedings of the ACM on Measurement and Analysis of Computing Systems, Volume 6, Issue 2
    POMACS
    June 2022
    499 pages
    EISSN: 2476-1249
    DOI: 10.1145/3543145
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 06 June 2022
    Published in POMACS Volume 6, Issue 2


    Author Tags

    1. all-flash storage systems
    2. compression
    3. deduplication
    4. performance evaluation
    5. raid
    6. solid-state drives


    Funding Sources

    • The European High-Performance Computing Joint Undertaking (JU) and the German Ministry of Education and Research (BMBF)
    • HPDS Corp.

    Article Metrics

    • Downloads (last 12 months): 78
    • Downloads (last 6 weeks): 11
    Reflects downloads up to 21 Nov 2024
