Nothing Special   »   [go: up one dir, main page]

skip to main content
10.1145/2020408.2020576acmconferencesArticle/Chapter ViewAbstractPublication PageskddConference Proceedingsconference-collections
poster

Fast approximate similarity search based on degree-reduced neighborhood graphs

Published: 21 August 2011 Publication History

Abstract

This paper presents a fast approximate similarity search method for finding the most similar object to a given query object from an object set with a dissimilarity with a success probability exceeding a given value. As a search index, the proposed method utilizes a degree-reduced k-nearest neighbor (k-DR) graph constructed from the object set with the dissimilarity, and explores the k-DR graph along its edges using a greedy search (GS) algorithm starting from multiple initial vertices with parallel processing. In the graph-construction stage, the structural parameter k of the k-DR graph is determined so that the probability with which at least one search trial of those with multiple initial vertices succeeds is more than the given success probability. To estimate the greedy-search success probability, we introduce the concept of a basin in the k-DR graph. The experimental results on a real data set verify the approximation scheme and high search performance of the proposed method and demonstrate that it is superior to E2LSH in terms of the expected search cost.

References

[1]
A. Andoni, M. Datar, N. Immorlica, P. Indyk, and V. Mirrokni. Nearest-neighbor methods in learning and vision; theory and practice, chapter 3: Locality-sensitive hashing using stable distributions. The MIT Press, Cambridge, Massachusetts, 2005.
[2]
A. Andoni and P. Indyk. Near-optimal algorithms for approximate nearest neighbor in high dimensions. In Proc. IEEE Symp. Foundations of Computer Science, pages 459--468, October 2006.
[3]
K. Aoyama, K. Saito, T. Yamada, and N. Ueda. Fast similarity search in small-world networks. In R. M. et al., editor, Complex Networks: Int. Workshop on Complex Networks, pages 185--196. Springer, 2009.
[4]
K. Aoyama, S. Watanabe, H. Sawada, Y. Minami, N. Ueda, and K. Saito. Fast similarity search on a large speech data set with neighborhood graph indexing. In Proc. Int. Conf. Acoustics, Speech, Signal Process., pages 5358--5361, March 2010.
[5]
E. Chávez, G. Navarro, R. Baeza-Yates, and J. L. Marroquín. Searching in metric spaces. ACM Comp. Surveys, 33(3):273--321, September 2001.
[6]
J. Chen, H.-R. Fang, and Y. Saad. Fast approximate kNN graph construction for high dimensional data via recursive Lanczos bisection. The Journal of Machine Learning Research, 10:1989--2012, 2009.
[7]
R. Dechter and J. Pearl. Generalized best-first search strategies and the optimality of A*. Journal of the ACM, 32(3):505--536, July 1985.
[8]
H. Hacid and T. Yoshida. Neighborhood graphs for indexing and retrieving multi-dimensional data. J. Intelligent Information Systems, 34:93--111, 2010.
[9]
G. R. Hjaltason and H. Samet. Index-driven similarity search in metric spaces. ACM Trans. Database System, 28(4):517--580, December 2003.
[10]
P. Indyk and R. Motwani. Approximate nearest neighbors: Towards removing the curse of dimensionality. In Proc. ACM Symp. Theory of Computing, pages 604--613, May 1998.
[11]
J. W. Jaromczyk and G. T. Toussaint. Relative neighborhood graphs and their relatives. Proc. IEEE, 80(9):1502--1516, September 1992.
[12]
D. R. Karger and M. Ruhl. Finding nearest neighbors in growth-restricted metrics. In Proc. ACM Symp. on Theory of Computing, pages 741--750, May 2002.
[13]
R. E. Korf, W. Zhang, I. T. Thayer, and H. Hohwald. Frontier search. Journal of the ACM, 52(5):715--748, September 2005.
[14]
R. Krauthgamer and J. R. Lee. Navigating nets: Simple algorithms for proximity search. In Proc. ACM-SIAM Symp. Discrete Algorithms, pages 798--807. SIAM/SIGACT, January 2004.
[15]
Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner. Gradient-based learning applied to document recognition. Proc. IEEE, 86(11):2278--2324, 1998.
[16]
Ö. Şimşek and D. Jensen. Decentralized search in networks using homophily and degree disparity. In Proc. Int. Joint Conf. Artificial Intelligence, pages 304--310, July 2005.
[17]
M. T. Orchard. A fast nearest-neighbor search algorithm. In Proc. Int. Conf. Acoustics, Speech, Signal Process., volume 4, pages 2297--2300, 1991.
[18]
P. Ram, D. Lee, W. B. March, and A. G. Gray. Linear-time algorithms for pairwise statistical problems. In Advances in Neural Information Processing Systems 22 (Dec 2009). MIT Press, 2010.
[19]
J. Sakagaito and T. Wada. Nearest first traversing graph for simultaneous object tracking and recognition. In Proc. IEEE Conf. Computer Vision and Pattern Recognition, pages 1--7, June 2007.
[20]
T. B. Sebastian and B. B. Kimia. Metric-based shape retrieval in large databases. In Proc. Int. Conf. Pattern Recognition, volume 3, pages 291--296, 2002.
[21]
Y. Tao, K. Yi, C. Sheng, and P. Kalnis. Quality and efficiency in high dimensional nearest neighbor search. In Proc. SIGMOD, pages 563--575, June 2009.
[22]
D. J. Watts and S. H. Strogatz. Collective dynamics of 'small-world' networks. Nature, 393:440--442, 1998.

Cited By

View all
  • (2024)Starling: An I/O-Efficient Disk-Resident Graph Index Framework for High-Dimensional Vector Similarity Search on Data SegmentProceedings of the ACM on Management of Data10.1145/36392692:1(1-27)Online publication date: 26-Mar-2024
  • (2023)Ultra-fast and accurate electron ionization mass spectrum matching for compound identification with million-scale in-silico libraryNature Communications10.1038/s41467-023-39279-714:1Online publication date: 22-Jun-2023
  • (2023)An efficient indexing technique for billion-scale nearest neighbor searchMultimedia Tools and Applications10.1007/s11042-023-14825-z82:20(31673-31689)Online publication date: 23-Mar-2023
  • Show More Cited By

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Conferences
KDD '11: Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining
August 2011
1446 pages
ISBN:9781450308137
DOI:10.1145/2020408
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 21 August 2011

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. approximate algorithm
  2. index structure
  3. neighborhood graph
  4. similarity search

Qualifiers

  • Poster

Conference

KDD '11
Sponsor:

Acceptance Rates

Overall Acceptance Rate 1,133 of 8,635 submissions, 13%

Upcoming Conference

KDD '25

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)20
  • Downloads (Last 6 weeks)3
Reflects downloads up to 14 Feb 2025

Other Metrics

Citations

Cited By

View all
  • (2024)Starling: An I/O-Efficient Disk-Resident Graph Index Framework for High-Dimensional Vector Similarity Search on Data SegmentProceedings of the ACM on Management of Data10.1145/36392692:1(1-27)Online publication date: 26-Mar-2024
  • (2023)Ultra-fast and accurate electron ionization mass spectrum matching for compound identification with million-scale in-silico libraryNature Communications10.1038/s41467-023-39279-714:1Online publication date: 22-Jun-2023
  • (2023)An efficient indexing technique for billion-scale nearest neighbor searchMultimedia Tools and Applications10.1007/s11042-023-14825-z82:20(31673-31689)Online publication date: 23-Mar-2023
  • (2022)GPU-accelerated Proximity Graph Approximate Nearest Neighbor Search and Construction2022 IEEE 38th International Conference on Data Engineering (ICDE)10.1109/ICDE53745.2022.00046(552-564)Online publication date: May-2022
  • (2021)A comprehensive survey and experimental comparison of graph-based approximate nearest neighbor searchProceedings of the VLDB Endowment10.14778/3476249.347625514:11(1964-1978)Online publication date: 27-Oct-2021
  • (2021)Exploring State-of-the-Art Nearest Neighbor (NN) Search TechniquesProceedings of the 3rd ACM India Joint International Conference on Data Science & Management of Data (8th ACM IKDD CODS & 26th COMAD)10.1145/3430984.3431968(443-446)Online publication date: 2-Jan-2021
  • (2021)Two-stage routing with optimized guided search and greedy algorithm on proximity graph▪Knowledge-Based Systems10.1016/j.knosys.2021.107305229:COnline publication date: 11-Oct-2021
  • (2020)Efficient and Robust Approximate Nearest Neighbor Search Using Hierarchical Navigable Small World GraphsIEEE Transactions on Pattern Analysis and Machine Intelligence10.1109/TPAMI.2018.288947342:4(824-836)Online publication date: 1-Apr-2020
  • (2019)Return of the Lernaean HydraProceedings of the VLDB Endowment10.14778/3368289.336830313:3(403-420)Online publication date: 1-Nov-2019
  • (2019)Similarity Search-based Blind Source SeparationICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)10.1109/ICASSP.2019.8682162(7888-7892)Online publication date: May-2019
  • Show More Cited By

View Options

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Figures

Tables

Media

Share

Share

Share this Publication link

Share on social media