Abstract
Data deduplication is an important data compression technique that removes duplicate copies of repeated data to improve storage utilization; however, it raises security and privacy risks, since sensitive user data are exposed to both insider and outsider attacks. A distinct factor that degrades the performance of deduplication is data fragmentation, which not only slows down the restoration process but also leads to high power consumption. In this paper, we address this problem from the perspective of data layout. The core of our method is a novel RAID-5-based cross grouping data layout (CGDL). We introduce a selective deduplication algorithm (SDD) to perform data replication and restoration. We also propose a new CGDL-based disk scheduling algorithm (LDP) that predicts location dependence to save energy by eliminating redundant disk read/write operations. We evaluate our method using the Linux MD (multiple device) driver modules. The experiments show that, under a storage configuration of 10 disks in 3 groups, our method improves restoration efficiency by 20% at the cost of only a 7.6% reduction in the deduplication ratio, while reducing power consumption by 23%.
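To make the trade-off described above concrete, the following toy sketch shows how deduplication saves space while a later backup's shared chunks end up scattered across containers written at different times, so restoring it touches more containers. It is a generic fingerprint-based chunk store under assumed fixed-size chunks, SHA-256 fingerprints, and a one-container restore cache; the class `DedupStore` and its parameters are hypothetical, and it is not the paper's SDD algorithm, CGDL layout, or LDP scheduler.

```python
import hashlib


class DedupStore:
    """Toy content-addressed chunk store (hypothetical; not the paper's SDD)."""

    def __init__(self, chunk_size: int = 4, container_capacity: int = 2):
        self.chunk_size = chunk_size                  # assumed fixed-size chunking
        self.container_capacity = container_capacity  # unique chunks per container
        self.index = {}     # fingerprint -> (container id, chunk bytes)
        self.next_slot = 0  # number of unique chunks written so far

    def backup(self, data: bytes) -> list:
        """Store only unseen chunks; return the recipe (ordered fingerprints)."""
        recipe = []
        for i in range(0, len(data), self.chunk_size):
            chunk = data[i:i + self.chunk_size]
            fp = hashlib.sha256(chunk).hexdigest()
            if fp not in self.index:
                # New chunks are appended to the currently open container.
                container = self.next_slot // self.container_capacity
                self.index[fp] = (container, chunk)
                self.next_slot += 1
            recipe.append(fp)
        return recipe

    def restore(self, recipe: list):
        """Rebuild the data and count container loads with a one-container cache."""
        out, loads, cached = [], 0, None
        for fp in recipe:
            container, chunk = self.index[fp]
            if container != cached:  # cache miss: another container must be read
                loads += 1
                cached = container
            out.append(chunk)
        return b"".join(out), loads

    def stored_bytes(self) -> int:
        """Physical bytes kept after deduplication."""
        return sum(len(chunk) for _, chunk in self.index.values())


# Usage: a second backup shares two of its four chunks with the first.
store = DedupStore(chunk_size=4, container_capacity=2)
v1 = b"AAAABBBBCCCCDDDD"
v2 = b"AAAAXXXXCCCCYYYY"
r1 = store.backup(v1)
r2 = store.backup(v2)
data1, loads1 = store.restore(r1)
data2, loads2 = store.restore(r2)
ratio = (len(v1) + len(v2)) / store.stored_bytes()
```

In this run the first backup restores with two container loads, while the second needs four loads for its four chunks because its shared chunks live in older containers; the deduplication ratio is 32/24 ≈ 1.33. A layout-level approach such as the one proposed here targets exactly this gap between space savings and restore locality.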
Funding
This work is supported by the National Key R&D Program of China (no. 2018YFB1004402), the Beijing Municipal Natural Science Foundation (no. 4172053), the National Natural Science Foundation of China (no. U1636213), and the China State Key Laboratory of Virtual Reality Technology and Systems (2016–2018).
About this article
Cite this article
Yan, F., Yang, X., Liu, J. et al. Optimizing the restoration performance of deduplication systems through an energy-saving data layout. Ann. Telecommun. 74, 461–471 (2019). https://doi.org/10.1007/s12243-019-00711-z
DOI: https://doi.org/10.1007/s12243-019-00711-z