DOI: 10.1109/MASCOTS.2012.32
Article

Assuring Demanded Read Performance of Data Deduplication Storage with Backup Datasets

Published: 07 August 2012

Abstract

Data deduplication has been widely adopted in contemporary backup storage systems. It not only saves considerable storage space but also shortens backup time significantly. Because the original goal of data deduplication was to save storage space, its design has focused primarily on improving write performance by removing as much duplicate data as possible from incoming data streams. Although fast recovery from a system crash depends mainly on the read performance of deduplication storage, little work has investigated how to improve that read performance. In general, as the amount of deduplicated data increases, write performance improves accordingly, whereas read performance degrades. In this paper, we propose a deduplication scheme that assures the demanded read performance of each data stream while keeping write performance at a reasonable level, and is thereby able to guarantee a target system recovery time. To this end, we first propose an indicator called the cache-aware Chunk Fragmentation Level (CFL), which estimates degraded read performance on the fly by taking into account both incoming chunk information and read cache effects. We also show a strong correlation between CFL and read performance on the backup datasets. To guarantee demanded read performance, expressed as a CFL value, we propose a read performance enhancement scheme called selective duplication, which is activated whenever the current CFL becomes worse than the demanded one. The key idea is to judiciously write non-unique (shared) chunks to storage together with unique chunks, unless the shared chunks exhibit sufficient spatial locality. We quantify spatial locality using a selective duplication threshold. Our experiments with actual backup datasets demonstrate that the proposed scheme achieves the demanded read performance in most cases at a reasonable cost in write performance.
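The abstract gives no formulas, so the following Python sketch is only an illustration under stated assumptions, not the authors' implementation. It assumes that chunks are packed into fixed-size containers and that a restore reads whole containers through an LRU cache. It takes CFL to be the optimal number of container reads divided by the number of reads actually issued, capped at 1.0. It also approximates "spatial locality" by counting how many of a segment's shared chunks sit in the same old container. The names `cache_aware_cfl`, `SelectiveDedupStore`, `threshold`, and `segment_size` are hypothetical.

```python
from collections import Counter, OrderedDict
from math import ceil


def cache_aware_cfl(recipe, chunks_per_container, cache_size):
    """Hypothetical cache-aware CFL: optimal container reads divided by the
    container reads a restore would issue through an LRU cache holding
    `cache_size` containers. 1.0 means no fragmentation penalty."""
    if not recipe:
        return 1.0
    optimal = ceil(len(recipe) / chunks_per_container)
    cache = OrderedDict()
    reads = 0
    for cid in recipe:
        if cid in cache:
            cache.move_to_end(cid)          # hit: refresh LRU position
        else:
            reads += 1                      # miss: read the whole container
            cache[cid] = True
            if len(cache) > cache_size:
                cache.popitem(last=False)   # evict least recently used
    return min(1.0, optimal / reads)        # capped at 1.0 (an assumption)


class SelectiveDedupStore:
    """Toy in-memory store illustrating selective duplication (hypothetical)."""

    def __init__(self, chunks_per_container, cache_size, demanded_cfl, threshold):
        self.cap = chunks_per_container
        self.cache_size = cache_size
        self.demanded_cfl = demanded_cfl
        self.threshold = threshold      # hypothetical locality threshold
        self.index = {}                 # fingerprint -> container of newest copy
        self.open_cid = 0               # container currently being filled
        self.open_fill = 0
        self.duplicated = 0             # shared chunks written again

    def _append(self, fp):
        """Write one chunk into the open container, sealing it when full."""
        if self.open_fill == self.cap:
            self.open_cid += 1
            self.open_fill = 0
        self.index[fp] = self.open_cid
        self.open_fill += 1
        return self.open_cid

    def backup(self, stream, segment_size):
        """Deduplicate `stream`; return its recipe (container read per chunk)."""
        recipe = []
        for start in range(0, len(stream), segment_size):
            segment = stream[start:start + segment_size]
            # Locality proxy: number of distinct shared chunks of this
            # segment that live in each previously written container.
            locality = Counter(self.index[fp] for fp in set(segment)
                               if fp in self.index)
            # CFL is re-estimated once per segment from the recipe so far.
            cfl = cache_aware_cfl(recipe, self.cap, self.cache_size)
            written_here = set()
            for fp in segment:
                if fp in written_here:          # repeat within this segment
                    recipe.append(self.index[fp])
                elif fp not in self.index:      # unique chunk: always written
                    recipe.append(self._append(fp))
                    written_here.add(fp)
                elif (cfl < self.demanded_cfl
                      and locality[self.index[fp]] < self.threshold):
                    # Shared chunk with poor locality while CFL is too low:
                    # write a fresh copy next to the unique chunks.
                    recipe.append(self._append(fp))
                    written_here.add(fp)
                    self.duplicated += 1
                else:                           # shared chunk: reference it
                    recipe.append(self.index[fp])
        return recipe
```

Here is an example under these assumptions. Back up chunks 0 to 7 into 4-chunk containers. Then back up a second stream in which chunk 4 is the only chunk drawn from its old container, while chunks 1, 2 and 3 appear together. With `demanded_cfl=1.0` and `threshold=2`, the store writes a second copy of chunk 4 and keeps chunks 1, 2 and 3 as references. Re-estimating CFL once per segment, rather than once per chunk, is a simplification chosen to keep the sketch short.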




Published In

MASCOTS '12: Proceedings of the 2012 IEEE 20th International Symposium on Modeling, Analysis and Simulation of Computer and Telecommunication Systems
August 2012
480 pages
ISBN:9780769547930

Publisher

IEEE Computer Society

United States


Author Tags

  1. data deduplication
  2. read performance
  3. storage




Cited By

  • (2023) SnapStore. Proceedings of the 24th International Middleware Conference, pp. 261-274. DOI: 10.1145/3590140.3629120. Online publication date: 27-Nov-2023.
  • (2022) From Hyper-dimensional Structures to Linear Structures: Maintaining Deduplicated Data’s Locality. ACM Transactions on Storage 18(3), pp. 1-28. DOI: 10.1145/3507921. Online publication date: 24-Aug-2022.
  • (2021) Improving the Performance of Deduplication-Based Backup Systems via Container Utilization Based Hot Fingerprint Entry Distilling. ACM Transactions on Storage 17(4), pp. 1-23. DOI: 10.1145/3459626. Online publication date: 15-Oct-2021.
  • (2021) Boosting the Restoring Performance of Deduplication Data by Classifying Backup Metadata. ACM/IMS Transactions on Data Science 2(2), pp. 1-16. DOI: 10.1145/3437261. Online publication date: 21-Apr-2021.
  • (2020) Improving the Restore Performance via Physical-Locality Middleware for Backup Systems. Proceedings of the 21st International Middleware Conference, pp. 341-355. DOI: 10.1145/3423211.3425691. Online publication date: 7-Dec-2020.
  • (2019) Sliding look-back window assisted data chunk rewriting for improving deduplication restore performance. Proceedings of the 17th USENIX Conference on File and Storage Technologies, pp. 129-142. DOI: 10.5555/3323298.3323311. Online publication date: 25-Feb-2019.
  • (2018) ALACC. Proceedings of the 16th USENIX Conference on File and Storage Technologies, pp. 309-323. DOI: 10.5555/3189759.3189789. Online publication date: 12-Feb-2018.
  • (2018) Data deduplication techniques for efficient cloud storage management. The Journal of Supercomputing 74(5), pp. 2035-2085. DOI: 10.1007/s11227-017-2210-8. Online publication date: 1-May-2018.
  • (2017) GLE-Dedup. International Journal of Parallel Programming 45(4), pp. 946-964. DOI: 10.1007/s10766-016-0450-5. Online publication date: 1-Aug-2017.
  • (2015) Design tradeoffs for data deduplication performance in backup workloads. Proceedings of the 13th USENIX Conference on File and Storage Technologies, pp. 331-344. DOI: 10.5555/2750482.2750507. Online publication date: 16-Feb-2015.
