Abstract
Applications such as audio fingerprinting require search in high dimensions: find an item in a database that is similar to a query. An important property of this search task is that negative answers are very frequent: much of the time, a query does not correspond to any database item.
We propose Redundant Bit Vectors (RBVs), a novel method for quickly solving this search problem. RBVs rely on three key ideas: 1) approximate the high-dimensional regions/distributions as tightened hyperrectangles; 2) partition the query space to store each item redundantly in an index; and 3) use bit vectors to store and search the index efficiently.
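The three ideas can be illustrated with a minimal sketch. This is not the authors' implementation: the class name, the fixed bin edges, and the use of Python integers as bit vectors are all assumptions made for illustration, and the abstract does not specify how the hyperrectangles are tightened or how the partition is chosen. Each item is an axis-aligned hyperrectangle, each dimension is cut into bins, an item is stored in every bin its interval overlaps (the redundancy), and a query ANDs one bit vector per dimension.

```python
from bisect import bisect_right


class RedundantBitVectorIndex:
    """Toy index over axis-aligned hyperrectangles (hypothetical names)."""

    def __init__(self, boxes, edges):
        # boxes: list of (lows, highs) pairs, one hyperrectangle per item.
        # edges: per-dimension sorted bin boundaries (assumed to be given).
        self.boxes = boxes
        self.edges = edges
        self.n_items = len(boxes)
        # bits[d][b] is an int whose bit i is set when item i's interval in
        # dimension d overlaps bin b. An item spanning several bins is
        # stored in each of them.
        self.bits = [[0] * (len(e) + 1) for e in edges]
        for i, (lows, highs) in enumerate(boxes):
            for d, e in enumerate(edges):
                first = bisect_right(e, lows[d])
                last = bisect_right(e, highs[d])
                for b in range(first, last + 1):
                    self.bits[d][b] |= 1 << i

    def candidates(self, query):
        # AND together the bit vector of the bin the query falls into in
        # each dimension; stop early once no item can match.
        mask = (1 << self.n_items) - 1
        for d, e in enumerate(self.edges):
            mask &= self.bits[d][bisect_right(e, query[d])]
            if mask == 0:
                return []
        return [i for i in range(self.n_items) if (mask >> i) & 1]

    def search(self, query):
        # Bins are coarser than the boxes, so candidates may include false
        # positives; an exact containment test removes them.
        hits = []
        for i in self.candidates(query):
            lows, highs = self.boxes[i]
            if all(lo <= q <= hi for lo, q, hi in zip(lows, query, highs)):
                hits.append(i)
        return hits


boxes = [((0, 0), (1, 1)), ((2, 2), (3, 3)), ((0.5, 0.5), (2.5, 2.5))]
edges = [[1.5, 5.0], [1.5, 5.0]]
index = RedundantBitVectorIndex(boxes, edges)
print(index.search((0.5, 0.5)))   # [0, 2]
print(index.search((10.0, 0.5)))  # [] (rejected after the first dimension)
```

In this sketch a query that matches nothing is often rejected after ANDing only a few bit vectors, which is in the spirit of the frequent-negative setting described above. A production index would pack the bits into machine words rather than rely on arbitrary-precision integers.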
We show that our method is the preferred method for very large databases or when the queries are often not in the database. Our method is 109 times faster than linear scan, and 48 times faster than locality-sensitive hashing, on a data set of 239,369 audio fingerprints.
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Goldstein, J., Platt, J.C., Burges, C.J.C. (2005). Redundant Bit Vectors for Quickly Searching High-Dimensional Regions. In: Winkler, J., Niranjan, M., Lawrence, N. (eds) Deterministic and Statistical Methods in Machine Learning. DSMML 2004. Lecture Notes in Computer Science, vol 3635. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11559887_9
DOI: https://doi.org/10.1007/11559887_9
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-29073-5
Online ISBN: 978-3-540-31728-9
eBook Packages: Computer Science