TBF: A High-Efficient Query Mechanism in De-duplication Backup System

Bin Zhou, Hai Jin, Xia Xie & PingPeng Yuan

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 7296)

Included in the following conference series:

International Conference on Grid and Pervasive Computing

Abstract

For big data, the set of data-chunk fingerprints is too large to be held entirely in memory. Accordingly, a new query mechanism, the Two-stage Bloom Filter (TBF), is proposed. First, each bit of the second-grade Bloom filter represents the chunks that have identical fingerprints, which reduces the false-positive rate. Second, a two-dimensional list corresponding to the two grades of Bloom filter is created to gather the absolute addresses of the data chunks with identical fingerprints. Finally, we suggest a new class of hash functions with strong global randomness. The Two-stage Bloom Filter decreases the number of disk accesses, speeds up the detection of redundant data chunks, and reduces the false-positive rate. Our experiments indicate that, for the same length of the first-grade Bloom filter, the Two-stage Bloom Filter reduces the storage accesses caused by false positives by about 30% to 40%.
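
The abstract does not give the data layout, so the following is only a minimal sketch of how a two-stage lookup of this kind might be organized, not the authors' implementation. Everything named here is an assumption made for illustration: the class `TwoStageBloomFilter`, the sizes `m1` and `m2`, the hash count `k`, and the salted SHA-256 hashing, which stands in for the paper's own hash function class. The first grade is an ordinary Bloom filter. The second grade has one bit per slot, and each slot owns one row of a two-dimensional list that holds chunk addresses.

```python
import hashlib


def _positions(fingerprint, salt, k, m):
    """Derive k positions in [0, m) from salted SHA-256 digests (illustrative only)."""
    return [
        int.from_bytes(
            hashlib.sha256(salt + bytes([i]) + fingerprint).digest()[:8], "big"
        ) % m
        for i in range(k)
    ]


class TwoStageBloomFilter:
    """Hypothetical two-stage filter; the layout is an assumption, not the paper's."""

    def __init__(self, m1=1024, m2=4096, k=3):
        self.m1, self.m2, self.k = m1, m2, k
        self.stage1 = bytearray(m1)  # first-grade filter (one byte per bit, for clarity)
        self.stage2 = bytearray(m2)  # second-grade filter: one bit per slot
        # Two-dimensional list: row i holds the addresses recorded under slot i.
        self.addresses = [[] for _ in range(m2)]

    def _slot(self, fingerprint):
        return _positions(fingerprint, b"stage2", 1, self.m2)[0]

    def add(self, fingerprint, address):
        """Record a chunk fingerprint and the address where the chunk is stored."""
        for p in _positions(fingerprint, b"stage1", self.k, self.m1):
            self.stage1[p] = 1
        slot = self._slot(fingerprint)
        self.stage2[slot] = 1
        self.addresses[slot].append(address)

    def query(self, fingerprint):
        """Return candidate addresses; an empty list means the chunk is certainly new."""
        if not all(
            self.stage1[p]
            for p in _positions(fingerprint, b"stage1", self.k, self.m1)
        ):
            return []  # rejected by the first grade, no disk access needed
        slot = self._slot(fingerprint)
        if not self.stage2[slot]:
            return []  # first-grade false positive caught by the second grade
        return list(self.addresses[slot])


tbf = TwoStageBloomFilter()
fp = hashlib.sha1(b"example chunk").digest()  # chunk fingerprint
tbf.add(fp, 100)
tbf.add(fp, 200)  # a duplicate chunk stored at another address
print(tbf.query(fp))
```

In this sketch a fingerprint reaches the address rows only after passing both grades, which is one way the second grade could filter out some first-grade false positives before any disk access. Distinct fingerprints can still share a second-grade slot, so the returned addresses are candidates that a real system would need to verify.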

Author information

Authors and Affiliations

Services Computing Technology and System Lab, Cluster and Grid Computing Lab, School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan, 430074, China
Bin Zhou, Hai Jin, Xia Xie & PingPeng Yuan
School of Computer Science and Technology, South-Central University for Nationalities, Wuhan, China
Bin Zhou

Editor information

Editors and Affiliations

School of Computer Science and Technology, Huazhong University of Science and Technology, 1037 Luoyu Road, 430074, Wuhan, China
Ruixuan Li
Department of Computing, Hong Kong Polytechnic University, Hung Hom, Kowloon, Hong Kong, China
Jiannong Cao
University of Franche-Comte, FEMTO-ST, 1 cours Leprince-Ringuet, 25200, Montbéliard, France
Julien Bourgeois

About this paper

Cite this paper

Zhou, B., Jin, H., Xie, X., Yuan, P. (2012). TBF: A High-Efficient Query Mechanism in De-duplication Backup System. In: Li, R., Cao, J., Bourgeois, J. (eds) Advances in Grid and Pervasive Computing. GPC 2012. Lecture Notes in Computer Science, vol 7296. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-30767-6_21

DOI: https://doi.org/10.1007/978-3-642-30767-6_21
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-30766-9
Online ISBN: 978-3-642-30767-6
eBook Packages: Computer Science, Computer Science (R0)
