research-article

Multi-Granularity Locality-Sensitive Bloom Filter

Published: 01 December 2015

Abstract

In many applications, such as homeland security, image processing, social networks, and bioinformatics, it is often necessary to support an approximate membership query (AMQ), which answers a question such as “is a query object q near at least one of the objects in the given data set Ω?” However, existing techniques for processing AMQs require a key parameter, the distance value, to be fixed before query processing. In this paper, we propose a novel filter, called the multi-granularity locality-sensitive Bloom filter (MLBF), which can process AMQs at multiple distance granularities. Specifically, the MLBF is composed of two Bloom filters (BFs): the basic multi-granularity locality-sensitive BF (BMLBF) and the multi-granularity verification BF (MVBF). The BMLBF stores the data objects and adopts an alignable locality-sensitive hashing (LSH) function family to support multiple granularities. The MVBF is used to reduce the false positive rate of the MLBF. The false negative rate of the MLBF is reduced by applying AND-constructions followed by an OR-construction. In addition, based on the MLBF structure, we suggest a more space-effective variant, called the MLBF*, to further reduce space cost. Theoretical analyses for estimating the false positive/negative rates of the MLBF/MLBF* are given. Experiments using synthetic and real data show that the theoretical estimates are quite accurate, and that the MLBF/MLBF* technique can handle AMQs with low false positive and negative rates for multiple distance granularities.
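To make the AND/OR amplification concrete, the following Python sketch builds a simple single-granularity locality-sensitive Bloom filter over Euclidean vectors. It is not the MLBF. It has no alignable multi-granularity hash family and no verification filter (MVBF). All names and parameters are hypothetical assumptions. It uses p-stable (Gaussian) LSH functions h(v) = ⌊(a·v + b)/w⌋ in the style of Datar et al., with a bucket width w. Each table concatenates k such functions (AND-construction), and L tables are combined by OR. SHA-256 maps each concatenated key to one of m bit positions. The helper `amplified_collision_prob` evaluates the standard amplification formula 1 − (1 − p^k)^L for a single-function collision probability p.

```python
import hashlib
import random


def amplified_collision_prob(p: float, k: int, L: int) -> float:
    """Probability that two objects collide in at least one of L tables,
    where each table ANDs k independent LSH functions that each collide
    with probability p: 1 - (1 - p**k)**L."""
    return 1.0 - (1.0 - p ** k) ** L


class LSHBloomFilter:
    """Hypothetical sketch of a locality-sensitive Bloom filter for
    Euclidean vectors (single granularity; not the paper's MLBF)."""

    def __init__(self, dim: int, m: int = 1 << 16, k: int = 4,
                 L: int = 8, w: float = 4.0, seed: int = 0):
        rng = random.Random(seed)
        self.m, self.k, self.L, self.w = m, k, L, w
        self.bits = bytearray(m)  # one byte per bit, for readability
        # L groups of k Gaussian (2-stable) projections (a, b): h(v) = floor((a.v + b) / w)
        self.funcs = [
            [([rng.gauss(0.0, 1.0) for _ in range(dim)], rng.uniform(0.0, w))
             for _ in range(k)]
            for _ in range(L)
        ]

    def _positions(self, v):
        """Yield one bit position per table."""
        for t, group in enumerate(self.funcs):
            # AND-construction: concatenate the k bucket ids into one key
            key = tuple(
                int((sum(ai * vi for ai, vi in zip(a, v)) + b) // self.w)
                for a, b in group
            )
            # Map (table index, key) to a bit position with a cryptographic hash
            digest = hashlib.sha256(repr((t, key)).encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, v) -> None:
        for pos in self._positions(v):
            self.bits[pos] = 1

    def query(self, v) -> bool:
        # OR-construction: report "near" if any table's position is set
        return any(self.bits[pos] for pos in self._positions(v))
```

For example, after `f = LSHBloomFilter(dim=3)` and `f.add([1.0, 2.0, 3.0])`, `f.query([1.0, 2.0, 3.0])` returns `True`, because a stored object always hits its own positions. A slightly perturbed vector such as `[1.01, 2.0, 3.0]` will almost always be reported as near as well. Increasing k suppresses collisions between far objects, which lowers false positives. Increasing L recovers collisions between near objects, which lowers false negatives. In this sketch, any set bit triggers a positive, so collisions in the shared bit array add false positives. Reducing that effect is the role the abstract assigns to the MVBF.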

Published In

IEEE Transactions on Computers, Volume 64, Issue 12
Dec. 2015
299 pages

Publisher

IEEE Computer Society

United States

Author Tags

  1. false positive/negative rates
  2. Approximate membership query
  3. query processing
  4. Bloom filter
  5. locality-sensitive hashing