Optimizing the restoration performance of deduplication systems through an energy-saving data layout

Fang Yan¹,²,
Xi Yang¹,
Jiamou Liu²,
HengLiang Tang¹,
Yu-An Tan³ &
YuanZhang Li³

Abstract

Data deduplication is an important data compression technique that removes copies of repeated data to improve storage utilization, but it raises security and privacy risks because sensitive user data are exposed to both insider and outsider attacks. A distinct factor that degrades the performance of the technique is data fragmentation, which not only slows down the restoration process but also leads to massive power consumption. In this paper, we address this problem from the perspective of data layout. The core of our method is a novel RAID-5-based cross grouping data layout (CGDL). We introduce a selective deduplication algorithm (SDD) to perform data replication and restoration. We also propose a new CGDL-based disk scheduling algorithm (LDP) that predicts location dependence and saves energy by eliminating redundant disk read/write operations. We evaluate our method on the Linux MD (multiple device) driver modules. The experiments show that, under a storage configuration of 10 disks in 3 groups, our method improves restoration efficiency by 20% with only a 7.6% reduction in the deduplication ratio, while reducing power consumption by 23%.
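
The fragmentation effect mentioned in the abstract can be illustrated with a toy model. The sketch below is not the paper's CGDL, SDD, or LDP; it is a generic chunk-level deduplication store written for illustration, and every name in it (`ToyDedupStore`, `split_fixed`, `containers_touched`, the fixed 4-byte chunk size, the container capacity) is an assumption. It shows how a later backup that shares chunks with an earlier one ends up spread over more containers than a non-deduplicated copy would occupy, which is what slows restoration.

```python
import hashlib


def split_fixed(data: bytes, size: int) -> list:
    """Split data into fixed-size chunks (an illustrative simplification)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


class ToyDedupStore:
    """Hypothetical in-memory store; containers stand in for on-disk units."""

    def __init__(self, container_capacity: int):
        self.container_capacity = container_capacity
        self.containers = [[]]   # each container holds up to N unique chunks
        self.location = {}       # fingerprint -> (container id, slot)

    def backup(self, data: bytes, chunk_size: int = 4) -> list:
        """Store only unseen chunks; return the recipe (list of fingerprints)."""
        recipe = []
        for chunk in split_fixed(data, chunk_size):
            fp = hashlib.sha1(chunk).hexdigest()
            if fp not in self.location:
                # Open a new container when the current one is full.
                if len(self.containers[-1]) >= self.container_capacity:
                    self.containers.append([])
                self.containers[-1].append(chunk)
                slot = len(self.containers[-1]) - 1
                self.location[fp] = (len(self.containers) - 1, slot)
            recipe.append(fp)
        return recipe

    def restore(self, recipe) -> bytes:
        """Rebuild the original data by following the recipe."""
        return b"".join(
            self.containers[c][s] for c, s in (self.location[fp] for fp in recipe)
        )

    def containers_touched(self, recipe) -> int:
        """Number of distinct containers a restore of this recipe must read."""
        return len({self.location[fp][0] for fp in recipe})


store = ToyDedupStore(container_capacity=2)
v1 = b"AAAABBBBCCCCDDDD"   # first backup: four new chunks
v2 = b"AAAAEEEECCCCFFFF"   # second backup: two chunks shared with v1
r1 = store.backup(v1)
r2 = store.backup(v2)
print(store.containers_touched(r1))  # 2
print(store.containers_touched(r2))  # 3
```

In this toy run the second backup stores only two new chunks, yet restoring it reads three containers instead of the two a plain copy would need. Trading a little deduplication ratio for fewer scattered reads is the general tension the abstract describes.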

Funding

This work is supported by the National Key R&D Program of China (no. 2018YFB1004402), the Beijing Municipal Natural Science Foundation (no. 4172053), the National Natural Science Foundation of China (no. U1636213), and the China State Key Laboratory of Virtual Reality Technology and Systems (2016–2018).

Author information

Authors and Affiliations

Information School, Beijing Wuzi University, Beijing, China
Fang Yan, Xi Yang & HengLiang Tang
Department of Computer Science, The University of Auckland, Auckland, New Zealand
Fang Yan & Jiamou Liu
Department of Computer Science, Beijing Institute of Technology, Beijing, China
Yu-An Tan & YuanZhang Li

Corresponding author

Correspondence to YuanZhang Li.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Yan, F., Yang, X., Liu, J. et al. Optimizing the restoration performance of deduplication systems through an energy-saving data layout. Ann. Telecommun. 74, 461–471 (2019). https://doi.org/10.1007/s12243-019-00711-z

Received: 27 June 2018
Accepted: 01 March 2019
Published: 09 March 2019
Issue Date: 01 August 2019
DOI: https://doi.org/10.1007/s12243-019-00711-z
