DOI: 10.5555/3026852.3026870
Article

99 deduplication problems

Published: 20 June 2016

Abstract

Deduplication is a widely studied capacity optimization technique that replaces redundant regions of data with references. Not only is deduplication an ongoing area of academic research, but numerous vendors also offer deduplicated storage products. Historically, most deduplication-related publications have focused on a narrow range of topics: maximizing deduplication ratios and read/write performance. While future research will continue to optimize these areas, we believe that numerous novel, deduplication-specific problems have been largely ignored by the academic community. Drawing on customer feedback as well as internal architecture discussions, we present new deduplication problems that will hopefully spur the next generation of research.
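To make the core idea of "replacing redundant regions of data with references" concrete, here is a minimal sketch. It is not the authors' method or any vendor's design. It assumes fixed-size 4 KiB chunks and SHA-256 fingerprints held in an in-memory dictionary, and the names `deduplicate`, `restore`, and `dedup_ratio` are hypothetical. Many production systems instead use content-defined (variable-size) chunking and on-disk fingerprint indexes.

```python
import hashlib
from typing import Dict, List

CHUNK_SIZE = 4096  # assumption: fixed-size chunks of 4 KiB


def deduplicate(data: bytes, store: Dict[str, bytes], chunk_size: int = CHUNK_SIZE) -> List[str]:
    """Split data into fixed-size chunks and replace each one with a fingerprint reference.

    A chunk whose SHA-256 fingerprint is already in `store` is not stored again.
    Returns the ordered list of fingerprints (a "recipe") needed to rebuild `data`.
    """
    recipe: List[str] = []
    for offset in range(0, len(data), chunk_size):
        chunk = data[offset:offset + chunk_size]
        fp = hashlib.sha256(chunk).hexdigest()
        if fp not in store:
            store[fp] = chunk  # first occurrence: keep the actual bytes
        recipe.append(fp)      # every occurrence: keep only a reference
    return recipe


def restore(recipe: List[str], store: Dict[str, bytes]) -> bytes:
    """Rebuild the original bytes by following the references in order."""
    return b"".join(store[fp] for fp in recipe)


def dedup_ratio(logical_bytes: int, store: Dict[str, bytes]) -> float:
    """Deduplication ratio: logical (pre-dedup) bytes divided by physical bytes stored."""
    physical = sum(len(chunk) for chunk in store.values())
    return logical_bytes / physical if physical else 0.0


if __name__ == "__main__":
    store: Dict[str, bytes] = {}
    # Two hypothetical "backups" that share most of their content.
    backup1 = b"A" * 8192 + b"B" * 4096
    backup2 = b"A" * 8192 + b"C" * 4096
    r1 = deduplicate(backup1, store)
    r2 = deduplicate(backup2, store)
    assert restore(r1, store) == backup1 and restore(r2, store) == backup2
    logical = len(backup1) + len(backup2)
    print(len(store), dedup_ratio(logical, store))  # prints: 3 2.0
```

In this toy run, 24 KiB of logical data is held in three unique 4 KiB chunks, which gives a 2:1 deduplication ratio, the metric the abstract notes most prior work has tried to maximize.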




Information

Published In

Guide Proceedings
HotStorage'16: Proceedings of the 8th USENIX Conference on Hot Topics in Storage and File Systems
June 2016
120 pages

Sponsors

  • VMware
  • NetApp
  • Google Inc.
  • IBM Research
  • Facebook

Publisher

USENIX Association

United States


Qualifiers

  • Article

Cited By

  • (2024) An End-to-end High-performance Deduplication Scheme for Docker Registries and Docker Container Storage Systems. ACM Transactions on Storage 20:3 (1-35). DOI: 10.1145/3643819. Online publication date: 30-Jan-2024
  • (2022) From Hyper-dimensional Structures to Linear Structures: Maintaining Deduplicated Data’s Locality. ACM Transactions on Storage 18:3 (1-28). DOI: 10.1145/3507921. Online publication date: 24-Aug-2022
  • (2021) GoSeed: Optimal Seeding Plan for Deduplicated Storage. ACM Transactions on Storage 17:3 (1-28). DOI: 10.1145/3453301. Online publication date: 16-Aug-2021
  • (2020) DupHunter. Proceedings of the 2020 USENIX Conference on Usenix Annual Technical Conference (769-783). DOI: 10.5555/3489146.3489199. Online publication date: 15-Jul-2020
  • (2020) GoSeed. Proceedings of the 18th USENIX Conference on File and Storage Technologies (193-208). DOI: 10.5555/3386691.3386710. Online publication date: 24-Feb-2020
  • (2019) Sketching volume capacities in deduplicated storage. Proceedings of the 17th USENIX Conference on File and Storage Technologies (107-119). DOI: 10.5555/3323298.3323309. Online publication date: 25-Feb-2019
  • (2019) Sketching Volume Capacities in Deduplicated Storage. ACM Transactions on Storage 15:4 (1-23). DOI: 10.1145/3369737. Online publication date: 18-Dec-2019
  • (2019) Improving Flash Memory Performance and Reliability for Smartphones With I/O Deduplication. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 38:6 (1017-1027). DOI: 10.1109/TCAD.2018.2834395. Online publication date: 1-Jun-2019
