Redundant Bit Vectors for Quickly Searching High-Dimensional Regions

  • Conference paper
Deterministic and Statistical Methods in Machine Learning (DSMML 2004)

Abstract

Applications such as audio fingerprinting require search in high dimensions: find an item in a database that is similar to a query. An important property of this search task is that negative answers are very frequent: much of the time, a query does not correspond to any database item.

We propose Redundant Bit Vectors (RBVs): a novel method for quickly solving this search problem. RBVs rely on three key ideas: 1) approximate the high-dimensional regions/distributions as tightened hyperrectangles, 2) partition the query space to store each item redundantly in an index and 3) use bit vectors to store and search the index efficiently.
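The three ideas above can be illustrated with a small sketch. This is not the authors' implementation: the class name, the `edges` parameter, and the choice of bin boundaries are all hypothetical, and the hyperrectangles are assumed to be given already (the tightening step in idea 1 is not shown). Each dimension of the query space is cut into bins, every bin keeps one bit vector marking the items whose rectangle overlaps it (so an item is stored redundantly in several bins), and a query ANDs one bit vector per dimension before verifying the few surviving candidates exactly.

```python
from bisect import bisect_right


class RedundantBitVectorIndex:
    """Toy index over axis-aligned hyperrectangles (hypothetical API)."""

    def __init__(self, boxes, edges):
        # boxes: list of (lo, hi) pairs, each a sequence of d numbers.
        # edges: for each dimension, a sorted list of interior bin boundaries
        #        (an assumption of this sketch; k edges give k + 1 bins).
        self.boxes = boxes
        self.edges = edges
        self.bits = []  # bits[dim][bin] is a Python int used as a bit vector
        for dim, dim_edges in enumerate(edges):
            vectors = [0] * (len(dim_edges) + 1)
            for item, (lo, hi) in enumerate(boxes):
                first = bisect_right(dim_edges, lo[dim])
                last = bisect_right(dim_edges, hi[dim])
                # Redundant storage: set the item's bit in every bin it overlaps.
                for b in range(first, last + 1):
                    vectors[b] |= 1 << item
            self.bits.append(vectors)

    def query(self, point):
        """Return the indices of all boxes that contain the query point."""
        candidates = (1 << len(self.boxes)) - 1  # start with every item
        for dim, x in enumerate(point):
            b = bisect_right(self.edges[dim], x)
            candidates &= self.bits[dim][b]
            if candidates == 0:
                return []  # fast negative answer: no item can match
        hits = []
        item = 0
        while candidates:
            # The bins are coarse, so each surviving candidate is checked exactly.
            if candidates & 1 and self._contains(item, point):
                hits.append(item)
            candidates >>= 1
            item += 1
        return hits

    def _contains(self, item, point):
        lo, hi = self.boxes[item]
        return all(l <= x <= h for l, x, h in zip(lo, point, hi))


# Three 2-D rectangles and two bin boundaries per dimension (made-up data).
boxes = [((0, 0), (1, 1)), ((2, 2), (3, 3)), ((0.5, 0.5), (2.5, 2.5))]
index = RedundantBitVectorIndex(boxes, edges=[[1.0, 2.0], [1.0, 2.0]])
print(index.query((0.75, 0.75)))  # prints [0, 2]
print(index.query((5, 5)))        # prints []
```

In this sketch a query costs one bitwise AND per dimension, and the AND can reach zero before any exact comparison is made, which is one plausible reading of why the approach suits workloads where most queries match nothing.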

We show that our method is preferable for very large databases or when the queries are often not in the database. Our method is 109 times faster than linear scan and 48 times faster than locality-sensitive hashing on a data set of 239,369 audio fingerprints.




Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Goldstein, J., Platt, J.C., Burges, C.J.C. (2005). Redundant Bit Vectors for Quickly Searching High-Dimensional Regions. In: Winkler, J., Niranjan, M., Lawrence, N. (eds) Deterministic and Statistical Methods in Machine Learning. DSMML 2004. Lecture Notes in Computer Science, vol 3635. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11559887_9


  • DOI: https://doi.org/10.1007/11559887_9

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-29073-5

  • Online ISBN: 978-3-540-31728-9

  • eBook Packages: Computer Science, Computer Science (R0)
