Abstract
Data deduplication is an effective method for reducing data storage requirements. During deduplication, fingerprint identification may cause frequent on-disk fingerprint lookups, which seriously hurt performance. Several locality-aware approaches have been proposed to tackle this issue. Recently, Persistent Memory (PM), which offers low latency and high bandwidth, has become a hot topic in data storage. Deduplication systems that store fingerprints on PM provide extremely fast fingerprint lookups, and therefore traditional locality-aware approaches designed for slow devices are likely no longer valid.
In this paper, we model the traditional locality-aware approaches and analyze their performance on PM. Inspired by the analysis, we propose an optimized PM-based fingerprint identification scheme in which the fingerprint cache is replaced with a simple, low-cost read buffer, and the order of the Bloom filter and the read buffer is swapped. The experimental results on real PM devices show that, compared with the traditional locality-aware approaches, the proposed scheme improves the fingerprint identification throughput by 1.2–2.3 times.
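The lookup order described above can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that "swapped" means the Bloom filter is consulted before the read buffer, and it models the read buffer as a single most recently read bucket. The names `BloomFilter`, `FingerprintIndex`, `pm_table`, and `pm_reads` are hypothetical, and an in-memory list stands in for the PM-resident fingerprint table.

```python
import hashlib


class BloomFilter:
    """A small Bloom filter over byte-string fingerprints (illustrative sizes)."""

    def __init__(self, num_bits=1 << 16, num_hashes=4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8)

    def _positions(self, fp):
        # Derive k bit positions by salting one hash with the probe index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(bytes([i]) + fp).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, fp):
        for pos in self._positions(fp):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, fp):
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(fp))


class FingerprintIndex:
    """Hypothetical index: Bloom filter first, then a one-bucket read buffer."""

    def __init__(self, bucket_count=1024):
        self.bloom = BloomFilter()
        self.bucket_count = bucket_count
        # Assumption: a list of buckets stands in for the table stored on PM.
        self.pm_table = [[] for _ in range(bucket_count)]
        self.buffered_id = None      # which bucket the read buffer holds
        self.buffered_bucket = []    # DRAM copy of that bucket
        self.pm_reads = 0            # counts simulated PM bucket reads

    def _bucket_id(self, fp):
        return int.from_bytes(fp[:8], "big") % self.bucket_count

    def _read_bucket(self, bucket_id):
        # Touch the (simulated) PM table only when the buffer misses.
        if bucket_id != self.buffered_id:
            self.pm_reads += 1
            self.buffered_bucket = list(self.pm_table[bucket_id])
            self.buffered_id = bucket_id
        return self.buffered_bucket

    def is_duplicate(self, fp):
        # Step 1: the Bloom filter rejects most new fingerprints outright.
        if not self.bloom.might_contain(fp):
            return False
        # Step 2: confirm against the read buffer, refilled from PM on a miss.
        return fp in self._read_bucket(self._bucket_id(fp))

    def insert(self, fp):
        bucket_id = self._bucket_id(fp)
        self.pm_table[bucket_id].append(fp)
        self.bloom.add(fp)
        # Keep the buffered copy consistent with the table.
        if bucket_id == self.buffered_id:
            self.buffered_bucket.append(fp)

    def identify(self, chunk):
        """Return True if the chunk is a duplicate; otherwise record it."""
        fp = hashlib.sha1(chunk).digest()
        if self.is_duplicate(fp):
            return True
        self.insert(fp)
        return False
```

In this sketch a new chunk is usually rejected by the Bloom filter without any PM access, and repeated lookups that fall in the same bucket are served from the buffer, which is why a single buffered bucket can stand in for a larger fingerprint cache when the backing store is fast.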
This work is partially supported by the National Science Foundation of China (U1833114, 61872201, 61702521) and the Science and Technology Development Plan of Tianjin (18ZXZNGX00140, 18ZXZNGX00200).
Copyright information
© 2021 IFIP International Federation for Information Processing
About this paper
Cite this paper
Li, Y., He, K., Wang, G., Liu, X. (2021). Towards Optimizing Deduplication on Persistent Memory. In: He, X., Shao, E., Tan, G. (eds) Network and Parallel Computing. NPC 2020. Lecture Notes in Computer Science, vol 12639. Springer, Cham. https://doi.org/10.1007/978-3-030-79478-1_39
DOI: https://doi.org/10.1007/978-3-030-79478-1_39
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-79477-4
Online ISBN: 978-3-030-79478-1
eBook Packages: Computer Science, Computer Science (R0)