Nothing Special   »   [go: up one dir, main page]

skip to main content
10.1145/997817.997857acmconferencesArticle/Chapter ViewAbstractPublication PagessocgConference Proceedingsconference-collections
Article

Locality-sensitive hashing scheme based on p-stable distributions

Published: 08 June 2004 Publication History

Abstract

We present a novel Locality-Sensitive Hashing scheme for the Approximate Nearest Neighbor Problem under lp norm, based on p-stable distributions.Our scheme improves the running time of the earlier algorithm for the case of the lp norm. It also yields the first known provably efficient approximate NN algorithm for the case p<1. We also show that the algorithm finds the exact near neigbhor in O(log n) time for data satisfying certain "bounded growth" condition.Unlike earlier schemes, our LSH scheme works directly on points in the Euclidean space without embeddings. Consequently, the resulting query time bound is free of large factors and is simple and easy to implement. Our experiments (on synthetic data sets) show that the our data structure is up to 40 times faster than kd-tree.

References

[1]
C. Aggarwal and D. Keim A. Hinneburg. On the surprising behavior of distance metrics in high dimensional spaces. Proceedings of the International Conference on Database Theory, pages 420--434, 2001.]]
[2]
S. Arya and D. Mount. Ann: Library for approximate nearest neighbor searching. available at http://www.cs.umd.edu/~mount/ANN/.]]
[3]
S. Arya, D.M. Mount, N.S. Netanyahu, R. Silverman, and A. Wu. An optimal algorithm for approximate nearest neighbor searching. Proceedings of the Fifth Annual ACM-SIAM Symposium on Discrete Algorithms, pages 573--582, 1994.]]
[4]
K. S. Beyer, J. Goldstein, R. Ramakrishnan, and U. Shaft. When is nearest neighbor meaningful? Proceedings of the International Conference on Database Theory, pages 217--235, 1999.]]
[5]
J. Buhler. Efficient large-scale sequence comparison by locality-sensitive hashing. Bioinformatics, 17:419--428, 2001.]]
[6]
J. Buhler. Provably sensitive indexing strategies for biosequence similarity search. Proceedings of the Annual International Conference on Computational Molecular Biology (RECOMB02), 2002.]]
[7]
J. Buhler and M. Tompa. Finding motifs using random projections. Proceedings of the Annual International Conference on Computational Molecular Biology (RECOMB01), 2001.]]
[8]
J. M. Chambers, C. L. Mallows, and B. W. Stuck. A method for simulating stable random variables. J. Amer. Statist. Assoc., 71:340--344, 1976.]]
[9]
K. Clarkson. Nearest neighbor queries in metric spaces. Proceedings of the Twenty-Ninth Annual ACM Symposium on Theory of Computing, pages 609--617, 1997.]]
[10]
E. Cohen, M. Datar, S. Fujiwara, A. Gionis, P. Indyk, R. Motwani, J. Ullman, and C. Yang. Finding interesting associations without support prunning. Proceedings of the 16th International Conference on Data Engineering (ICDE), 2000.]]
[11]
G. Cormode, P. Indyk, N. Koudas, and S. Muthukrishnan. Fast mining of massive tabular data via approximate distance computations. Proc. 18th International Conference on Data Engineering (ICDE), 2002.]]
[12]
T. Darrell, P. Indyk, G. Shakhnarovich, and P. Viola. Approximate nearest neighbors methods for learning and vision. NIPS Workshop at http://www.ai.mit.edu/projects/vip/nips03ann, 2003.]]
[13]
B. Georgescu, I. Shimshoni, and P. Meer. Mean shift based clustering in high dimensions: A texture classification example. Proceedings of the 9th International Conference on Computer Vision, 2003.]]
[14]
A. Gionis, P. Indyk, and R. Motwani. Similarity search in high dimensions via hashing. Proceedings of the 25th International Conference on Very Large Data Bases (VLDB), 1999.]]
[15]
S. Har-Peled. A replacement for voronoi diagrams of near linear size. Proceedings of the Symposium on Foundations of Computer Science, 2001.]]
[16]
T. Haveliwala, A. Gionis, and P. Indyk. Scalable techniques for clustering the web. WebDB Workshop, 2000.]]
[17]
A. Hinneburg, C. C. Aggarwal, and D. A. Keim. What is the nearest neighbor in high dimensional spaces? Proceedings of the International Conference on Very Large Databases (VLDB), pages 506--515, 2000.]]
[18]
P. Indyk. Stable distributions, pseudorandom generators, embeddings and data stream computation. Proceedings of the Symposium on Foundations of Computer Science, 2000.]]
[19]
P. Indyk and R. Motwani. Approximate nearest neighbor: towards removing the curse of dimensionality. Proceedings of the Symposium on Theory of Computing, 1998.]]
[20]
P. Indyk and N. Thaper. Fast color image retrieval via embeddings. Workshop on Statistical and Computational Theories of Vision (at ICCV), 2003.]]
[21]
D. Karger and M Ruhl. Finding nearest neighbors in growth-restricted metrics. Proceedings of the Symposium on Theory of Computing, 2002.]]
[22]
J. Kleinberg. Two algorithms for nearest-neighbor search in high dimensions. Proceedings of the Twenty-Ninth Annual ACM Symposium on Theory of Computing, 1997.]]
[23]
R. Krauthgamer and J. R. Lee. Navigating nets: Simple algorithms for proximity search. Proceedings of the ACM-SIAM Symposium on Discrete Algorithms, 2004.]]
[24]
E. Kushilevitz, R. Ostrovsky, and Y. Rabani. Efficient search for approximate nearest neighbor in high dimensional spaces. Proceedings of the Thirtieth ACM Symposium on Theory of Computing, pages 614--623, 1998.]]
[25]
J. P. Nolan. An introduction to stable distributions. available at http://www.cas.american.edu/~jpnolan/chap1.ps.]]
[26]
Z. Ouyang, N. Memon, T. Suel, and D. Trendafilov. Cluster-based delta compression of collections of files. Proceedings of the International Conference on Web Information Systems Engineering (WISE), 2002.]]
[27]
N. Shivakumar. Detecting digital copyright violations on the Internet (Ph.D. thesis). Department of Computer Science, Stanford University, 2000.]]
[28]
Roger Weber, Hans J. Schek, and Stephen Blott. A quantitative analysis and performance study for similarity-search methods in high-dimensional spaces. Proceedings of the 24th Int. Conf. Very Large Data Bases (VLDB), 1998.]]
[29]
C. Yang. Macs: Music audio characteristic sequence indexing for similarity retrieval. Proceedings of the Workshop on Applications of Signal Processing to Audio and Acoustics, 2001.]]
[30]
P.N. Yiannilos. Locally lifting the curse of dimensionality for nearest neighbor search. Proceedings of the ACM-SIAM Symposium on Discrete Algorithms, 2000.]]
[31]
V.M. Zolotarev. One-Dimensional Stable Distributions. Vol. 65 of Translations of Mathematical Monographs, American Mathematical Society, 1986.]]

Cited By

View all
  • (2025)Audio feature enhancement based on quaternion filtering and deep hashingNeurocomputing10.1016/j.neucom.2024.128727612(128727)Online publication date: Jan-2025
  • (2025)Information sparsity guided transformer for multi-modal medical image super-resolutionExpert Systems with Applications10.1016/j.eswa.2024.125428261(125428)Online publication date: Feb-2025
  • (2024)Selecting Indispensable Edge Patterns With Adaptive Sampling and Double Local Analysis for Data DescriptionJournal of Cases on Information Technology10.4018/JCIT.33594526:1(1-26)Online publication date: 21-Feb-2024
  • Show More Cited By

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Conferences
SCG '04: Proceedings of the twentieth annual symposium on Computational geometry
June 2004
468 pages
ISBN:1581138857
DOI:10.1145/997817
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 08 June 2004

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. p-stable distributions
  2. approximate nearest neighbor
  3. locally sensitive hashing
  4. sublinear algorithm

Qualifiers

  • Article

Conference

SoCG04
SoCG04: Annual ACM Symposium on Computational Geometry 2004
June 8 - 11, 2004
New York, Brooklyn, USA

Acceptance Rates

SCG '04 Paper Acceptance Rate 49 of 147 submissions, 33%;
Overall Acceptance Rate 625 of 1,685 submissions, 37%

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)366
  • Downloads (Last 6 weeks)54
Reflects downloads up to 19 Nov 2024

Other Metrics

Citations

Cited By

View all
  • (2025)Audio feature enhancement based on quaternion filtering and deep hashingNeurocomputing10.1016/j.neucom.2024.128727612(128727)Online publication date: Jan-2025
  • (2025)Information sparsity guided transformer for multi-modal medical image super-resolutionExpert Systems with Applications10.1016/j.eswa.2024.125428261(125428)Online publication date: Feb-2025
  • (2024)Selecting Indispensable Edge Patterns With Adaptive Sampling and Double Local Analysis for Data DescriptionJournal of Cases on Information Technology10.4018/JCIT.33594526:1(1-26)Online publication date: 21-Feb-2024
  • (2024)Deep Supervised Hashing by Fusing Multiscale Deep Features for Image RetrievalInformation10.3390/info1503014315:3(143)Online publication date: 5-Mar-2024
  • (2024)RoarGraph: A Projected Bipartite Graph for Efficient Cross-Modal Approximate Nearest Neighbor SearchProceedings of the VLDB Endowment10.14778/3681954.368195917:11(2735-2749)Online publication date: 1-Jul-2024
  • (2024)DET-LSH: A Locality-Sensitive Hashing Scheme with Dynamic Encoding Tree for Approximate Nearest Neighbor SearchProceedings of the VLDB Endowment10.14778/3665844.366585417:9(2241-2254)Online publication date: 1-May-2024
  • (2024)An Efficient Privacy-Preserving Face Authentication Scheme Using Local Sensitive HashingProceedings of the 2024 12th International Conference on Communications and Broadband Networking10.1145/3688636.3688666(123-128)Online publication date: 24-Jul-2024
  • (2024)BMoss: Reconfigurable Hardware Accelerator for Scalable Plagiarism DetectionProceedings of the 15th ACM SIGOPS Asia-Pacific Workshop on Systems10.1145/3678015.3680491(102-107)Online publication date: 4-Sep-2024
  • (2024)Semi-supervised Concept Preserving Hashing for Image Retrieval in Non-stationary Data EnvironmentProceedings of 2024 ACM ICMR Workshop on Multimodal Video Retrieval10.1145/3664524.3675364(14-19)Online publication date: 10-Jun-2024
  • (2024)Rank-based Hashing for Effective and Efficient Nearest Neighbor Search for Image RetrievalACM Transactions on Multimedia Computing, Communications, and Applications10.1145/365958020:10(1-19)Online publication date: 12-Sep-2024
  • Show More Cited By

View Options

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media