Nothing Special   »   [go: up one dir, main page]

skip to main content
10.1145/1400751.1400798acmconferencesArticle/Chapter ViewAbstractPublication PagespodcConference Proceedingsconference-collections
research-article

Optimizing data popularity conscious bloom filters

Published: 18 August 2008 Publication History

Abstract

Bloom filters are compact set representations that support set membership queries with small, one-sided error probabilities. Standard Bloom filters are oblivious to object popularity in sets and membership queries. However, sets and queries in many distributed applications follow known, stable, highly skewed distributions (e.g., Zipf-like). This paper studies the problem of minimizing the false-positive probability of a Bloom filter by adapting the number of hashes used for each data object to its popularity in sets and membership queries. We model the problem as a constrained nonlinear integer program and propose two polynomial-time solutions with bounded approximation ratios -- one is a 2-approximation algorithm with O(Nc) running time (c ≥ 6 in practice); the other is a (2+ε)-approximation algorithm with running time O(N2/ε)$, ε > 0. Here N denotes the total number of distinct data objects that appear in sets or queries. We quantitatively evaluate our proposed approach on two distributed applications (cooperative caching and full-text keyword searching) driven by real-life data traces. Compared to standard Bloom filters, our data popularity-conscious Bloom filters achieve up to 24 and 27 times false-positive probability reduction for the two applications respectively. The quantitative evaluation also validates our solution's bounded approximation ratio to the optimal.

References

[1]
V. Almeida, A. Bestavros, M. Crovella, and A. de Oliveira. Characterizing reference locality in the WWW. In the 4th Int'l Conf. on Parallel and Distributed Information Systems, pages 92--103, 1996.
[2]
B. H. Bloom. Space/time trade-offs in hash coding with allowable errors. Communications of the ACM, 13(7):422--426, 1970.
[3]
J. Bruck, J. Gao, and A. Jiang. Weighted Bloom filters. In the IEEE Int'l Symp. on Information Theory, July 2006.
[4]
J. Byers, J. Considine, M. Mitzenmacher, and S. Rost. Informed content delivery across adaptive overlay networks. In ACM SIGCOMM, pages 47--60, 2002.
[5]
B. Chazelle, J. Kilian, R. Rubinfeld, and A. Tal. The Bloomier filter: An efficient data structure for static support lookup tables. In the 15th Symp. on Distributed Algorithms, pages 30--39, 2004.
[6]
S. Cohen and Y. Matias. Spectral Bloom filters. In the ACM SIGMOD, pages 241--252, 2003.
[7]
L. Fan, P. Cao, J. Almeida, and A. Z. Broder. Summary cache: A scalable wide-area web cache sharing protocol. IEEE/ACM Trans. on Networking, 8(3):281--293, 2000.
[8]
F. Hao, M. Kodialam, and T.V. Lakshman. Building high accuracy Bloom filters using partitioned hashing. In ACM SIGMETRICS, pages 277--288, 2007.
[9]
W. Hoeffding. Probability inequalities for sums of bounded random variables. Journal of the American Statistical Association, 58(301):13--30, 1963.
[10]
D. S. Johnson and K. A. Niemi. On knapsack partitions and a new dynamic programming techniques for trees. Mathematics of Operations Research, 8(1):1--14, 1983.
[11]
A. Kirsch and M. Mitzenmacher. Less hashing, same performance: Building a better Bloom filter. In the 14th European Symp. on Algorithms, pages 456--467, 2006.
[12]
S. G. Kolliopoulos and G. Steiner. Partially ordered knapsack and applications to scheduling. In the 10th European Symp. on Algorithms, pages 612--624, 2002.
[13]
S. Kullback and R.A. Leibler. On information and sufficiency. Annals of Mathematical Statistics, 22:79--86, 1951.
[14]
A. Kumar, J. Xu, and E. W. Zegura. Efficient and scalable query routing for unstructured peer-to-peer networks. In the IEEE INFOCOM, 2005.
[15]
J. Li, T. Loo, J. Hellerstein, F. Kaashoek, D.R. Karger, and R. Morris. On the feasibility of peer-to-peer web indexing and search. In the 2nd Int'l Workshop on Peer-to-Peer Systems, 2003.
[16]
L. F. Mackert and G. M. Lohman. R* optimizer validation and performance evaluation for distributed queries. In the 12th Int'l Conf. on Very Large Data Bases, pages 149--159, 1986.
[17]
U. Manber and S. Wu. An algorithm for approximate membership checking with application to password security. Information Processing Letters, 50:191--197, 1994.
[18]
M. Mitzenmacher. Compressed bloom filters. In the 20th ACM Symp. on Principles of Distributed Computing, pages 144--150, 2001.
[19]
J. K. Mullin. Optimal semijoins for distributed database systems. IEEE Trans. on Software Engineering, 16(5):558--560, 1990.
[20]
A. Pagh, R. Pagh, and S. S. Rao. An optimal Bloom filter replacement. In the 16th ACM Symp. on Discrete Algorithms, pages 823--829, 2005.
[21]
S. Podlipnig and L. Boszormenyi. A survey of web cache replacement strategies. ACM Computing Surveys, 35(4):374--398, 2003.
[22]
M. V. Ramakrishna. Practical performance of Bloom filters and parallel free-text searching. Communications of ACM, 32(10):1237--1239, 1989.
[23]
P. Reynold and A. Vahdat. Efficient peer-to-peer keyword searching. In Middleware, pages 21--40, 2003.
[24]
S. Rhea and J. Kubiatowicz. Probabilistic location and routing. In the 21st IEEE INFOCOM, pages 1248--1257, 2002.
[25]
Y.J. Song, V. Ramasubramanian, and E.G. Sirer. Optimal resource utilization in content distribution networks. Technical Report TR2005-2004, Dept. of Computer Science, Cornell Univ., 2005.
[26]
E. H. Spafford. Opurs: Preventing weak password choices. Computer and Security, 11:273--278, 1992.
[27]
M. Zhong, K. Shen, and J. Seiferas. Replication degree customization for high availability. In the 3rd EuroSys Conf., pages 55--68, 2008.
[28]
M. Zhong, K. Shen, and J. Seiferas. Correlation-aware object placement for multi-object operations. In the 28th Int'l Conf. on Distributed Computing Systems, 2008.

Cited By

View all
  • (2024)Neutron Beam Evaluation of Probabilistic Data Structure-based Online Checkers2024 IEEE 30th International Symposium on On-Line Testing and Robust System Design (IOLTS)10.1109/IOLTS60994.2024.10616084(1-6)Online publication date: 3-Jul-2024
  • (2023)Seesaw Counting Filter: A Dynamic Filtering Framework for Vulnerable Negative KeysIEEE Transactions on Knowledge and Data Engineering10.1109/TKDE.2023.327370935:12(12987-13001)Online publication date: 1-Dec-2023
  • (2023)PA-Sketch: A Fast and Accurate Sketch for Differentiated Flow Estimation2023 IEEE 31st International Conference on Network Protocols (ICNP)10.1109/ICNP59255.2023.10355581(1-11)Online publication date: 10-Oct-2023
  • Show More Cited By

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Conferences
PODC '08: Proceedings of the twenty-seventh ACM symposium on Principles of distributed computing
August 2008
474 pages
ISBN:9781595939890
DOI:10.1145/1400751
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 18 August 2008

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. bloom filters
  2. dynamic programming
  3. knapsack problems
  4. optimization

Qualifiers

  • Research-article

Conference

PODC '08

Acceptance Rates

Overall Acceptance Rate 740 of 2,477 submissions, 30%

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)26
  • Downloads (Last 6 weeks)8
Reflects downloads up to 20 Nov 2024

Other Metrics

Citations

Cited By

View all
  • (2024)Neutron Beam Evaluation of Probabilistic Data Structure-based Online Checkers2024 IEEE 30th International Symposium on On-Line Testing and Robust System Design (IOLTS)10.1109/IOLTS60994.2024.10616084(1-6)Online publication date: 3-Jul-2024
  • (2023)Seesaw Counting Filter: A Dynamic Filtering Framework for Vulnerable Negative KeysIEEE Transactions on Knowledge and Data Engineering10.1109/TKDE.2023.327370935:12(12987-13001)Online publication date: 1-Dec-2023
  • (2023)PA-Sketch: A Fast and Accurate Sketch for Differentiated Flow Estimation2023 IEEE 31st International Conference on Network Protocols (ICNP)10.1109/ICNP59255.2023.10355581(1-11)Online publication date: 10-Oct-2023
  • (2022)Seesaw Counting Filter: An Efficient Guardian for Vulnerable Negative Keys During Dynamic FilteringProceedings of the ACM Web Conference 202210.1145/3485447.3511996(2759-2767)Online publication date: 25-Apr-2022
  • (2022)Adaptive One Memory Access Bloom FiltersIEEE Transactions on Network and Service Management10.1109/TNSM.2022.314543619:2(848-859)Online publication date: Jun-2022
  • (2022)A Pareto optimal Bloom filter family with hash adaptivityThe VLDB Journal10.1007/s00778-022-00755-z32:3(525-548)Online publication date: 26-Jul-2022
  • (2021)Bloom Filter With a False Positive Free ZoneIEEE Transactions on Network and Service Management10.1109/TNSM.2021.305907518:2(2334-2349)Online publication date: Jun-2021
  • (2021)Hash Adaptive Bloom Filter2021 IEEE 37th International Conference on Data Engineering (ICDE)10.1109/ICDE51399.2021.00061(636-647)Online publication date: Apr-2021
  • (2020)Modifications using Circular Shift for a Better Bloom FilterProceedings of the International Conference on Research in Adaptive and Convergent Systems10.1145/3400286.3418232(149-154)Online publication date: 13-Oct-2020
  • (2020)Adaptive Cuckoo FiltersACM Journal of Experimental Algorithmics10.1145/333950425(1-20)Online publication date: 13-Mar-2020
  • Show More Cited By

View Options

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media