Article

Assuring Demanded Read Performance of Data Deduplication Storage with Backup Datasets

MASCOTS '12: Proceedings of the 2012 IEEE 20th International Symposium on Modeling, Analysis and Simulation of Computer and Telecommunication Systems

Pages 201 - 208

https://doi.org/10.1109/MASCOTS.2012.32

Published: 07 August 2012

Abstract

Data deduplication has been widely adopted in contemporary backup storage systems. It not only saves considerable storage space but also significantly shortens backup time. Because the original goal of data deduplication is to save storage space, its design has focused primarily on improving write performance by removing as much duplicate data as possible from incoming data streams. Although fast recovery from a system crash relies mainly on the read performance of deduplication storage, read performance improvement has received little investigation. In general, as the amount of deduplicated data increases, write performance improves, whereas the associated read performance degrades. In this paper, we propose a deduplication scheme that assures the demanded read performance of each data stream while keeping its write performance at a reasonable level, and can thereby guarantee a target system recovery time. To this end, we first propose an indicator called the cache-aware Chunk Fragmentation Level (CFL), which estimates degraded read performance on the fly by taking into account both incoming chunk information and read cache effects. We also show a strong correlation between this CFL and read performance on backup datasets. To guarantee a demanded read performance expressed as a CFL value, we propose a read performance enhancement scheme called selective duplication, which is activated whenever the current CFL becomes worse than the demanded one. The key idea is to judiciously write non-unique (shared) chunks into storage together with unique chunks unless the shared chunks exhibit good enough spatial locality. We quantify spatial locality using a selective duplication threshold value. Our experiments with actual backup datasets demonstrate that the proposed scheme achieves the demanded read performance in most cases at a reasonable cost in write performance.
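The abstract gives neither the CFL formula nor the locality measure, so the following Python sketch only illustrates the general shape of the idea and is not the authors' method. It assumes (hypothetically) that CFL is the ratio of ideal container reads to the container reads a restore would need under a small LRU read cache, and that spatial locality is the length of a run of consecutive shared chunks. The names `CflTracker`, `should_rewrite_shared_run`, `container_size`, `cache_slots`, `run_length`, and `threshold` are all invented for illustration.

```python
from collections import OrderedDict
from dataclasses import dataclass, field


@dataclass
class CflTracker:
    """Running, cache-aware fragmentation estimate for one stream (illustrative only)."""
    container_size: int  # bytes per container; assumed to be the unit of one disk read
    cache_slots: int     # containers held by an assumed LRU read cache
    bytes_seen: int = 0
    container_reads: int = 0
    _cache: OrderedDict = field(default_factory=OrderedDict)

    def record(self, chunk_size: int, container_id: int) -> None:
        """Account for one chunk that a later restore would fetch from container_id."""
        self.bytes_seen += chunk_size
        if container_id in self._cache:
            self._cache.move_to_end(container_id)  # cache hit: no extra container read
            return
        self.container_reads += 1                  # cache miss: one container read
        self._cache[container_id] = True
        if len(self._cache) > self.cache_slots:
            self._cache.popitem(last=False)        # evict the least recently used entry

    def cfl(self) -> float:
        """Assumed metric: ideal reads / simulated reads, capped at 1.0 (1.0 is best)."""
        if self.container_reads == 0:
            return 1.0
        ideal_reads = max(1.0, self.bytes_seen / self.container_size)
        return min(1.0, ideal_reads / self.container_reads)


def should_rewrite_shared_run(current_cfl: float, demanded_cfl: float,
                              run_length: int, threshold: int) -> bool:
    """Decide whether a run of shared chunks is written again next to the unique chunks.

    Assumption: a run shorter than `threshold` counts as poor spatial locality.
    """
    if current_cfl >= demanded_cfl:
        return False               # target met: deduplicate as usual
    return run_length < threshold  # target missed: duplicate only poorly localized runs


# Usage: four 1-byte chunks alternating between two containers.
tracker = CflTracker(container_size=4, cache_slots=1)
for container_id in (0, 1, 0, 1):
    tracker.record(1, container_id)
print(tracker.cfl(), should_rewrite_shared_run(tracker.cfl(), 0.6, 2, 8))
```

With a one-slot cache every access in the usage example misses, so the estimate falls below the demanded value and the short shared run is selected for rewriting. Raising `cache_slots` to 2 halves the simulated reads, which is the cache effect the abstract says the indicator accounts for.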

Published In

MASCOTS '12: Proceedings of the 2012 IEEE 20th International Symposium on Modeling, Analysis and Simulation of Computer and Telecommunication Systems, August 2012, 480 pages, ISBN 9780769547930

Publisher

IEEE Computer Society, United States

Cited By

Panda A, Sarangi S (2023). SnapStore. Proceedings of the 24th International Middleware Conference, pp. 261-274. Online publication date: 27-Nov-2023.
https://dl.acm.org/doi/10.1145/3590140.3629120
Zou X, Yuan J, Shilane P, Xia W, Zhang H, Wang X (2022). From Hyper-dimensional Structures to Linear Structures: Maintaining Deduplicated Data’s Locality. ACM Transactions on Storage 18:3, pp. 1-28. Online publication date: 24-Aug-2022.
https://dl.acm.org/doi/10.1145/3507921
Zhang D, Deng Y, Zhou Y, Zhu Y, Qin X (2021). Improving the Performance of Deduplication-Based Backup Systems via Container Utilization Based Hot Fingerprint Entry Distilling. ACM Transactions on Storage 17:4, pp. 1-23. Online publication date: 15-Oct-2021.
https://dl.acm.org/doi/10.1145/3459626
Yang R, Deng Y, Zhou Y, Huang P (2021). Boosting the Restoring Performance of Deduplication Data by Classifying Backup Metadata. ACM/IMS Transactions on Data Science 2:2, pp. 1-16. Online publication date: 21-Apr-2021.
https://dl.acm.org/doi/10.1145/3437261
Li P, Hua Y, Cao Q, Zhang M (2020). Improving the Restore Performance via Physical-Locality Middleware for Backup Systems. Proceedings of the 21st International Middleware Conference, pp. 341-355. Online publication date: 7-Dec-2020.
https://dl.acm.org/doi/10.1145/3423211.3425691
Cao Z, Liu S, Wu F, Wang G, Li B, Du D, Merchant A, Weatherspoon H (2019). Sliding look-back window assisted data chunk rewriting for improving deduplication restore performance. Proceedings of the 17th USENIX Conference on File and Storage Technologies, pp. 129-142. Online publication date: 25-Feb-2019.
https://dl.acm.org/doi/10.5555/3323298.3323311
Cao Z, Wen H, Wu F, Du D, Agrawal N, Rangaswami R (2018). ALACC. Proceedings of the 16th USENIX Conference on File and Storage Technologies, pp. 309-323. Online publication date: 12-Feb-2018.
https://dl.acm.org/doi/10.5555/3189759.3189789
Kaur R, Chana I, Bhattacharya J (2018). Data deduplication techniques for efficient cloud storage management. The Journal of Supercomputing 74:5, pp. 2035-2085. Online publication date: 1-May-2018.
https://dl.acm.org/doi/10.1007/s11227-017-2210-8
Deng M, Chen W, Xiao N, Yu S, Hu Y (2017). GLE-Dedup. International Journal of Parallel Programming 45:4, pp. 946-964. Online publication date: 1-Aug-2017.
https://dl.acm.org/doi/10.1007/s10766-016-0450-5
Fu M, Feng D, Hua Y, He X, Chen Z, Xia W, Zhang Y, Tan Y, Schindler J, Zadok E (2015). Design tradeoffs for data deduplication performance in backup workloads. Proceedings of the 13th USENIX Conference on File and Storage Technologies, pp. 331-344. Online publication date: 16-Feb-2015.
https://dl.acm.org/doi/10.5555/2750482.2750507