A Read-leveling Data Distribution Scheme for Promoting Read Performance in SSDs with Deduplication

Authors:

Yuchong Hu

ICPP '19: Proceedings of the 48th International Conference on Parallel Processing

Article No.: 22, Pages 1 - 10

https://doi.org/10.1145/3337821.3337884

Published: 05 August 2019

Abstract

Deduplication, a space-saving technology, is widely deployed in flash-based storage systems to address the capacity and endurance limitations of flash devices. In this paper, we find that deduplication changes the physical data layout, which raises the likelihood of an uneven read distribution. This uneven distribution not only increases access contention but also reduces read parallelism, thereby degrading read performance. To solve this issue, we propose an efficient read-leveling data distribution scheme (RLDDS), which scatters highly duplicated data across different parallel units to improve the read performance of SSDs with deduplication under access-intensive workloads. RLDDS writes data into a parallel unit with lower potential read-hotness to balance the read distribution among all parallel units. Extensive experimental results show that RLDDS improves read performance by up to 21.61% compared to deduplication with the conventional dynamic data allocation scheme. Additional benefits of RLDDS include improved write performance (up to 23.69%) in access-intensive workloads and an overall system performance improvement (up to 18.22%) with the same write traffic reduction.
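The placement idea in the abstract, writing new data to the parallel unit with the lowest potential read-hotness, can be illustrated with a small toy model. This is only a sketch under stated assumptions, not the paper's algorithm: the class name `ReadLevelingAllocator`, the use of SHA-1 fingerprints, and the choice of a chunk's reference count as a proxy for read-hotness are all hypothetical, since the abstract does not define how hotness is estimated.

```python
import hashlib


class ReadLevelingAllocator:
    """Toy model: place each unique chunk on the parallel unit whose
    estimated read-hotness is currently lowest (hypothetical heuristic)."""

    def __init__(self, num_units):
        # hotness[u]: total reference count of the chunks stored on unit u.
        # Assumption: a chunk's reference count approximates how often it
        # will be read; the paper's actual hotness metric may differ.
        self.hotness = [0] * num_units
        self.location = {}  # fingerprint -> index of the unit holding it

    def write(self, data):
        fp = hashlib.sha1(data).hexdigest()
        unit = self.location.get(fp)
        if unit is None:
            # Unique chunk: pick the coldest unit (lowest index on ties).
            unit = min(range(len(self.hotness)), key=self.hotness.__getitem__)
            self.location[fp] = unit
        # Duplicate or not, one more logical page now maps to this chunk,
        # so the unit that stores it becomes a little hotter.
        self.hotness[unit] += 1
        return unit


alloc = ReadLevelingAllocator(num_units=4)
# Chunk b"A" is written five times (highly duplicated); the rest are unique.
chunks = [b"A"] * 5 + [b"B", b"C", b"D", b"E", b"F", b"G"]
placement = [alloc.write(c) for c in chunks]
print(placement)      # unit chosen for each write
print(alloc.hotness)  # estimated read-hotness per unit
```

In this run the unit holding the highly duplicated chunk receives no further unique chunks, so later writes land on units 1 to 3 and the per-unit hotness ends at `[5, 2, 2, 2]`. A plain round-robin allocator would keep adding unique chunks to unit 0 as well, which is the kind of read imbalance the abstract describes.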

Index Terms

A Read-leveling Data Distribution Scheme for Promoting Read Performance in SSDs with Deduplication
1. Computer systems organization
  1. Architectures

Published In

ICPP '19: Proceedings of the 48th International Conference on Parallel Processing

August 2019

1107 pages

ISBN:9781450362955

DOI:10.1145/3337821

Copyright © 2019 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

In-Cooperation

University of Tsukuba

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article
Research
Refereed limited

Conference

ICPP 2019: 48th International Conference on Parallel Processing

August 5 - 8, 2019

Kyoto, Japan

Acceptance Rates

Overall Acceptance Rate 91 of 313 submissions, 29%

Bibliometrics & Citations

Bibliometrics

Article Metrics

Total Citations: 10
Total Downloads: 358
Downloads (Last 12 months): 34
Downloads (Last 6 weeks): 7

Reflects downloads up to 16 Nov 2024

Citations

Cited By

Dong Y, Chen B, Pan Y, Zou X, Xia W. (2024). H2C-Dedup: Reducing I/O and GC Amplification for QLC SSDs from the Deduplication Metadata Perspective. Proceedings of the 2024 ACM Symposium on Cloud Computing, 704-719. Online publication date: 20-Nov-2024.
https://dl.acm.org/doi/10.1145/3698038.3698507
Li J, Chen X, Liu D, Ren A, Zeng Z, Tan Y. (2023). RadarSSD: A Computational Storage for Radar Signal Processing. Proceedings of the 52nd International Conference on Parallel Processing, 244-253. Online publication date: 7-Aug-2023.
https://dl.acm.org/doi/10.1145/3605573.3605628
Liu W, Lu Y, Wu C, Li J, Guo M. (2023). ERP: An Efficient Rewrite Scheme to Improve the Inline Deduplication Restore Performance in Backup Systems. 2022 IEEE 28th International Conference on Parallel and Distributed Systems (ICPADS), 371-378. Online publication date: Jan-2023.
https://doi.org/10.1109/ICPADS56603.2022.00055
Bae J, Park J, Jun Y, Seo E, Malka M, Kolodner H, Bellosa F, Gabel M. (2022). Dedup-for-speed. Proceedings of the 15th ACM International Conference on Systems and Storage, 128-139. Online publication date: 6-Jun-2022.
https://dl.acm.org/doi/10.1145/3534056.3534937
Lu M, Wang F, Li Z, He W. (2022). EDC: An Elastic Data Cache to Optimizing the I/O Performance in Deduplicated SSDs. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 41:7, 2250-2262. Online publication date: Jul-2022.
https://doi.org/10.1109/TCAD.2021.3101404
He Q, Zhang F, Bian G, Zhang W, Duan D, Li Z, Chen C. (2022). Research on Data Routing Strategy of Deduplication in Cloud Environment. IEEE Access 10, 9529-9542. Online publication date: 2022.
https://doi.org/10.1109/ACCESS.2021.3139757
Cheng L, Hu Y, Ke Z, Wu Z. (2021). Coupling Right-Provisioned Cold Storage Data Centers with Deduplication. Proceedings of the 50th International Conference on Parallel Processing, 1-11. Online publication date: 9-Aug-2021.
https://dl.acm.org/doi/10.1145/3472456.3472485
Hu Z, Zou X, Xia W, Zhao Y, Zhang W, Wu D. (2021). Smart-DNN: Efficiently Reducing the Memory Requirements of Running Deep Neural Networks on Resource-constrained Platforms. 2021 IEEE 39th International Conference on Computer Design (ICCD), 533-541. Online publication date: Oct-2021.
https://doi.org/10.1109/ICCD53106.2021.00087
Mo Y, Hua Y, Li P, Cao Q, Liu X. (2021). A Cost-Efficient Metadata Scheme for High-Performance Deduplication Systems. 2021 IEEE 23rd Int Conf on High Performance Computing & Communications; 7th Int Conf on Data Science & Systems; 19th Int Conf on Smart City; 7th Int Conf on Dependability in Sensor, Cloud & Big Data Systems & Application (HPCC/DSS/SmartCity/DependSys), 49-56. Online publication date: Dec-2021.
https://doi.org/10.1109/HPCC-DSS-SmartCity-DependSys53884.2021.00034
Hu Z, Zou X, Xia W, Jin S, Tao D, Liu Y, Zhang W, Zhang Z. (2020). Delta-DNN: Efficiently Compressing Deep Neural Networks via Exploiting Floats Similarity. Proceedings of the 49th International Conference on Parallel Processing, 1-12. Online publication date: 17-Aug-2020.
https://dl.acm.org/doi/10.1145/3404397.3404408
