Nothing Special   »   [go: up one dir, main page]

skip to main content
research-article

Efficient Approximate Nearest Neighbor Search in Multi-dimensional Databases

Published: 30 May 2023 Publication History

Abstract

Approximate nearest neighbor (ANN) search is a fundamental search in multi-dimensional databases, which has numerous real-world applications, such as image retrieval, recommendation, entity resolution, and sequence matching. Proximity graph (PG) has been the state-of-the-art index for ANN search. However, the search on existing PGs either suffers from a high time complexity or has no performance guarantee on the search result. In this paper, we propose a novel τ-monotonic graph (τ- MG) to address the limitations. The novelty of τ-MG lies in a τ-monotonic property. Based on this property, we prove that if the distance between a query q and its nearest neighbor is less than a constant τ, the search on τ-MG guarantees to find the exact nearest neighbor of q and the time complexity of the search is smaller than all existing PG-based methods. For index construction efficiency, we propose an approximate variant of τ-MG, namely τ-monotonic neighborhood graph (τ- MNG), which only requires the neighborhood of each node to be τ-monotonic. We further propose an optimization to reduce the number of distance computations in search. Our extensive experiments show that our techniques outperform all existing methods on well-known real-world datasets.

Supplemental Material

MP4 File
Presentation video for SIGMOD 2023 research paper "Efficient Approximate Nearest Neighbor Search in Multi-dimensional Databases".

References

[1]
Laurent Amsaleg, Oussama Chelly, Teddy Furon, Stéphane Girard, Michael E. Houle, Ken-ichi Kawarabayashi, and Michael Nett. 2015. Estimating Local Intrinsic Dimensionality. In Proceedings of the ACM International Conference on Knowledge Discovery and Data Mining (SIGKDD). 29--38.
[2]
Laurent Amsaleg, Björn Þór Jónsson, and Herwig Lejsek. 2018. Scalability of the NV-tree: Three Experiments. In Proceedings of the International Conference on Similarity Search and Applications (SISAP), Vol. 11223. 59--72.
[3]
Anon. 2010. Datasets for approximate nearest neighbor search. Retrieved May 2022 from http://corpus-texmex.irisa.fr/.
[4]
Anon. 2011. Million Song Dataset Benchmarks. Retrieved May 2020 from http://www.ifs.tuwien.ac.at/mir/msd/.
[5]
Anon. unknown. Common Crawl. Retrieved April 2020 from http://commoncrawl.org/.
[6]
Martin Aumüller, Erik Bernhardsson, and Alexander Faithfull. 2020. ANN-Benchmarks: A benchmarking tool for approximate nearest neighbor algorithms. Information Systems, Vol. 87 (2020), 101374.
[7]
Franz Aurenhammer. 1991. Voronoi Diagrams - A Survey of a Fundamental Geometric Data Structure. Comput. Surveys, Vol. 23, 3 (1991), 345--405.
[8]
Dmitry Baranchuk, Artem Babenko, and Yury Malkov. 2018. Revisiting the Inverted Indices for Billion-Scale Approximate Nearest Neighbors. In Proceedings of the European Conference on Computer Vision (ECCV), Vol. 11216. 209--224.
[9]
Dmitry Baranchuk, Dmitry Persiyanov, Anton Sinitsin, and Artem Babenko. 2019. Learning to route in similarity graphs. In Proceedings of the International Conference on Machine Learning (ICML). 475--484.
[10]
Konstantin Berlin, Sergey Koren, Chen-Shan Chin, James P. Drake, Jane M. Landolin, and Adam M. Phillippy. 2015. Assembling Large Genomes with Single-Molecule Sequencing and Locality-Sensitive Hashing. Nature Biotechnology, Vol. 33, 6 (2015), 623--630.
[11]
Kevin S. Beyer, Jonathan Goldstein, Raghu Ramakrishnan, and Uri Shaft. 1999. When Is ”Nearest Neighbor” Meaningful?. In Proceedings of the International Conference on Database Theory (ICDT). 217--235.
[12]
Abhinandan Das, Mayur Datar, Ashutosh Garg, and Shyamsundar Rajaram. 2007. Google news personalization: scalable online collaborative filtering. In Proceedings of the International Conference on World Wide Web (WWW). 271--280.
[13]
D.W. Dearholt, N. Gonzales, and G. Kurup. 1988. Monotonic Search Networks For Computer Vision Databases. In Proceedings of the Asilomar Conference on Signals, Systems and Computers, Vol. 2. 548--553.
[14]
Wei Dong, Moses Charikar, and Kai Li. 2011. Efficient k-nearest neighbor graph construction for generic similarity measures. In Proceedings of the International Conference on World Wide Web (WWW). 577--586.
[15]
Cong Fu and Deng Cai. 2016. EFANNA : An Extremely Fast Approximate Nearest Neighbor Search Algorithm Based on kNN Graph. CoRR, Vol. abs/1609.07228 (2016).
[16]
Cong Fu, Changxu Wang, and Deng Cai. 2021. High Dimensional Similarity Search with Satellite System Graph: Efficiency, Scalability, and Unindexed Query Compatibility. IEEE Transactions on Pattern Analysis and Machine Intelligence (2021), 1--1.
[17]
Cong Fu, Chao Xiang, Changxu Wang, and Deng Cai. 2019. Fast Approximate Nearest Neighbor Search with the Navigating Spreading-out Graph. Proceedings of the VLDB Endowment, Vol. 12, 5 (2019), 461--474.
[18]
Gylfi Þó r Gudmundsson, Bjö rn Þó r Jó nsson, Laurent Amsaleg, and Michael J. Franklin. 2018. Prototyping a Web-Scale Multimedia Retrieval Service Using Spark. ACM Transactions on Multimedia Computing, Communications, and Applications, Vol. 14, 3s (2018), 65:1--65:24.
[19]
Kiana Hajebi, Yasin Abbasi-Yadkori, Hossein Shahbazi, and Hong Zhang. 2011. Fast Approximate Nearest-Neighbor Search with k-Nearest Neighbor Graph. In Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI). 1312--1317.
[20]
Ben Harwood and Tom Drummond. 2016. FANNG: Fast Approximate Nearest Neighbour Graphs. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 5713--5722.
[21]
Johannes Hoffart, Stephan Seufert, Dat Ba Nguyen, Martin Theobald, and Gerhard Weikum. 2012. KORE: keyphrase overlap relatedness for entity disambiguation. In Proceedings of the ACM International Conference on Information and Knowledge Management (CIKM). 545--554.
[22]
Omid Jafari, Preeti Maurya, Parth Nagarkar, Khandker Mushfiqul Islam, and Chidambaram Crushev. 2021. A Survey on Locality Sensitive Hashing Algorithms and their Applications. CoRR, Vol. abs/2102.08942 (2021).
[23]
J.W. Jaromczyk and G.T. Toussaint. 1992. Relative neighborhood graphs and their relatives. Proc. IEEE, Vol. 80, 9 (1992), 1502--1517.
[24]
Pennington Jeffrey, Socher Richard, and D. Manning Christopher. 2015. GloVe: Global Vectors for Word Representation. Retrieved May 2022 from http://nlp.stanford.edu/projects/glove/.
[25]
Hervé Jé gou, Matthijs Douze, and Cordelia Schmid. 2011. Product Quantization for Nearest Neighbor Search. IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 33, 1 (2011), 117--128.
[26]
Zhongming Jin, Debing Zhang, Yao Hu, Shiding Lin, Deng Cai, and Xiaofei He. 2014. Fast and Accurate Hashing Via Iterative Nearest Neighbors Expansion. IEEE Transactions on Cybernetics, Vol. 44, 11 (2014), 2167--2177.
[27]
Jon Kleinberg. 2000. The Small-World Phenomenon: An Algorithmic Perspective. In Proceedings of the Annual ACM Symposium on Theory of Computing (STOC). 163--170.
[28]
Brian Kulis and Kristen Grauman. 2009. Kernelized locality-sensitive hashing for scalable image search. In Proceedings of the International Conference on Computer Vision (ICCV). 2130--2137.
[29]
Govinda D. Kurup. 1992. Database Organized on the Basis of Similarities with Applications in Computer Vision. Ph.,D. Dissertation.
[30]
Herwig Lejsek, Friðrik Heiðar Á smundsson, Bjö rn Þó r Jó nsson, and Laurent Amsaleg. 2009. NV-Tree: An Efficient Disk-Based Index for Approximate Search in Very Large High-Dimensional Collections. IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 31, 5 (2009), 869--883.
[31]
Conglong Li, Minjia Zhang, David G. Andersen, and Yuxiong He. 2020. Improving Approximate Nearest Neighbor Search through Learned Adaptive Early Termination. In Proceedings of the ACM International Conference on Management of Data (SIGMOD). 2539--2554.
[32]
W. Li, Y. Zhang, Y. Sun, W. Wang, M. Li, W. Zhang, and X. Lin. 2020. Approximate Nearest Neighbor Search on High Dimensional Data - Experiments, Analyses, and Improvement. IEEE Transactions on Knowledge and Data Engineering, Vol. 32, 8 (2020), 1475--1488.
[33]
Ting Liu, Andrew Moore, Ke Yang, and Alexander Gray. 2004. An Investigation of Practical Approximate Nearest Neighbor Algorithms. In Advances in Neural Information Processing Systems, Vol. 17.
[34]
Wanqi Liu, Hanchen Wang, Ying Zhang, Wei Wang, Lu Qin, and Xuemin Lin. 2021. EI-LSH: An early-termination driven I/O efficient incremental c-approximate nearest neighbor search. VLDB Journal, Vol. 30, 2 (2021), 215--235.
[35]
Yury Malkov, Alexander Ponomarenko, Andrey Logvinov, and Vladimir Krylov. 2014. Approximate nearest neighbor algorithm based on navigable small world graphs. Information Systems, Vol. 45 (2014), 61--68.
[36]
Y. A. Malkov and D. A. Yashunin. 2020. Efficient and Robust Approximate Nearest Neighbor Search Using Hierarchical Navigable Small World Graphs. IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 42, 4 (2020), 824--836.
[37]
Charles U. Martel and Van Nguyen. 2004. Analyzing Kleinberg's (and other) small-world Models. In Proceedings of the Annual ACM Symposium on Principles of Distributed Computing (PODC). ACM, 179--188.
[38]
Yusuke Matsui, Yusuke Uchida, Hervé Jégou, and Shin'ichi Satoh. 2018. A Survey of Product Quantization. ITE Transactions on Media Technology and Applications, Vol. 6, 1 (2018), 2--10.
[39]
Stanley Milgram. 1967. The small world problem. Psychology Today, Vol. 1, 1 (1967), 61--67.
[40]
Stanislav Morozov and Artem Babenko. 2018. Non-metric Similarity Graphs for Maximum Inner Product Search. In Advances in Neural Information Processing Systems, Vol. 31.
[41]
Toshiro Ogita, Hidetomo Ichihashi, Akira Notsu, and Katsuhiro Honda. 2014. Improvement of PCA-Based Approximate Nearest Neighbor Search Using Distance Statistics. Journal of Advanced Computational Intelligence and Intelligent Informatics, Vol. 18, 4 (2014), 658--664.
[42]
Marco Patella and Paolo Ciaccia. 2008. The Many Facets of Approximate Similarity Search. In Proceedings of the First International Workshop on Similarity Search and Applications (SISAP). 10--21.
[43]
Yun Peng, Byron Choi, Tsz Nam Chan, and Jianliang Xu. 2022. LAN: Learning-based Approximate k-Nearest Neighbor Search in Graph Databases. In Proceedings of the International Conference on Data Engineering (ICDE). 2508--2521.
[44]
James Philbin, Ondrej Chum, Michael Isard, Josef Sivic, and Andrew Zisserman. 2007. Object retrieval with large vocabularies and fast spatial matching. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR).
[45]
Liudmila Prokhorenkova and Aleksandr Shekhovtsov. 2020a. Graph-based Nearest Neighbor Search: From Practice to Theory. In Proceedings of the International Conference on Machine Learning (ICML). 7803--7813.
[46]
Liudmila Prokhorenkova and Aleksandr Shekhovtsov. 2020b. Graph-based Nearest Neighbor Search: From Practice to Theory. In Proceedings of the International Conference on Machine Learning (ICML)), Vol. 119. 7803--7813.
[47]
Uri Shaft and Raghu Ramakrishnan. 2005. When Is Nearest Neighbors Indexable?. In Proceedings of the International Conference on Database Theory (ICDT). 158--172.
[48]
Larissa C. Shimomura, Rafael Seidi Oyamada, Marcos R. Vieira, and Daniel S. Kaster. 2021. A survey on graph-based methods for similarity searches in metric spaces. Information Systems, Vol. 95 (2021), 101507.
[49]
Yifang Sun, Wei Wang, Jianbin Qin, Ying Zhang, and Xuemin Lin. 2014. SRS: Solving c-Approximate Nearest Neighbor Queries in High Dimensional Euclidean Space with a Tiny Index. Proceedings of the VLDB Endowment, Vol. 8, 1 (2014), 1--12.
[50]
Javier Vargas Muñoz, Marcos A. Gonçalves, Zanoni Dias, and Ricardo da S. Torres. 2019. Hierarchical Clustering-Based Graphs for Large Scale Approximate Nearest Neighbor Search. Pattern Recognition (PR), Vol. 96 (2019), 106970.
[51]
Hongya Wang, Zhizheng Wang, Wei Wang, Yingyuan Xiao, Zeng Zhao, and Kaixiang Yang. 2020. A Note on Graph-Based Nearest Neighbor Search. arxiv: 2012.11083 [cs.LG]
[52]
Jingdong Wang, Naiyan Wang, You Jia, Jian Li, Gang Zeng, Hongbin Zha, and Xian-Sheng Hua. 2014. Trinary-Projection Trees for Approximate Nearest Neighbor Search. IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 36, 2 (2014), 388--403.
[53]
Mengzhao Wang, Xiaoliang Xu, Qiang Yue, and Yuxiang Wang. 2021. A Comprehensive Survey and Experimental Comparison of Graph-Based Approximate Nearest Neighbor Search. Proceedings of the VLDB Endowment, Vol. 14, 11 (2021), 1964--1978.
[54]
Duncan J. Watts and Steven H. Strogatz. 1998. Collective dynamics of 'small-world' networks. Nature, Vol. 339 (1998), 440--442.
[55]
Xiang Wu, Ruiqi Guo, Ananda Theertha Suresh, Sanjiv Kumar, Daniel N. Holtmann-Rice, David Simcha, and Felix X. Yu. 2017. Multiscale Quantization for Fast Similarity Search. In Advances in Neural Information Processing Systems. 5745--5755.
[56]
Zhaozhuo Xu, Weijie Zhao, Shulong Tan, Zhixin Zhou, and Ping Li. 2022. Proximity Graph Maintenance for Fast Online Nearest Neighbor Search. arXiv (2022).
[57]
Artem Babenko Yandex and Victor Lempitsky. 2016. Efficient Indexing of Billion-Scale Datasets of Deep Descriptors. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 2055--2063.
[58]
Yuanhang Yu, Dong Wen?, Ying Zhang, Lu Qin, Wenjie Zhang, and Xuemin Lin. 2022. GPU-accelerated Proximity Graph Approximate Nearest Neighbor Search and Construction. In Proceedings of the IEEE International Conference on Data Engineering (ICDE). 552--564.
[59]
Weijie Zhao, Shulong Tan, and Ping Li. 2020. SONG: Approximate Nearest Neighbor Search on GPU. In Proceedings of the IEEE International Conference on Data Engineering (ICDE). 1033--1044.
[60]
Yuxin Zheng, Qi Guo, Anthony K.H. Tung, and Sai Wu. 2016. LazyLSH: Approximate Nearest Neighbor Search for Multiple Distance Functions with a Single Index. In Proceedings of the International Conference on Management of Data (SIGMOD). 2023--2037.

Cited By

View all
  • (2024)RoarGraph: A Projected Bipartite Graph for Efficient Cross-Modal Approximate Nearest Neighbor SearchProceedings of the VLDB Endowment10.14778/3681954.368195917:11(2735-2749)Online publication date: 30-Aug-2024
  • (2024)DET-LSH: A Locality-Sensitive Hashing Scheme with Dynamic Encoding Tree for Approximate Nearest Neighbor SearchProceedings of the VLDB Endowment10.14778/3665844.366585417:9(2241-2254)Online publication date: 6-Aug-2024
  • (2024)Rank-based Hashing for Effective and Efficient Nearest Neighbor Search for Image RetrievalACM Transactions on Multimedia Computing, Communications, and Applications10.1145/365958020:10(1-19)Online publication date: 12-Sep-2024
  • Show More Cited By

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image Proceedings of the ACM on Management of Data
Proceedings of the ACM on Management of Data  Volume 1, Issue 1
PACMMOD
May 2023
2807 pages
EISSN:2836-6573
DOI:10.1145/3603164
Issue’s Table of Contents
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 30 May 2023
Published in PACMMOD Volume 1, Issue 1

Permissions

Request permissions for this article.

Author Tags

  1. $au$-monotonic
  2. ann search
  3. edge occlusion rule
  4. proximity graph

Qualifiers

  • Research-article

Funding Sources

  • Hong Kong RGC
  • NSF of Hunan Province
  • NSF of Guangdong Province
  • NSFC

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)561
  • Downloads (Last 6 weeks)79
Reflects downloads up to 14 Nov 2024

Other Metrics

Citations

Cited By

View all
  • (2024)RoarGraph: A Projected Bipartite Graph for Efficient Cross-Modal Approximate Nearest Neighbor SearchProceedings of the VLDB Endowment10.14778/3681954.368195917:11(2735-2749)Online publication date: 30-Aug-2024
  • (2024)DET-LSH: A Locality-Sensitive Hashing Scheme with Dynamic Encoding Tree for Approximate Nearest Neighbor SearchProceedings of the VLDB Endowment10.14778/3665844.366585417:9(2241-2254)Online publication date: 6-Aug-2024
  • (2024)Rank-based Hashing for Effective and Efficient Nearest Neighbor Search for Image RetrievalACM Transactions on Multimedia Computing, Communications, and Applications10.1145/365958020:10(1-19)Online publication date: 12-Sep-2024
  • (2024)RaBitQ: Quantizing High-Dimensional Vectors with a Theoretical Error Bound for Approximate Nearest Neighbor SearchProceedings of the ACM on Management of Data10.1145/36549702:3(1-27)Online publication date: 30-May-2024
  • (2024)ChatGraph: Chat with Your Graphs2024 IEEE 40th International Conference on Data Engineering (ICDE)10.1109/ICDE60146.2024.00424(5445-5448)Online publication date: 13-May-2024
  • (2024)Efficient Reverse $k$ Approximate Nearest Neighbor Search Over High-Dimensional Vectors2024 IEEE 40th International Conference on Data Engineering (ICDE)10.1109/ICDE60146.2024.00325(4262-4274)Online publication date: 13-May-2024
  • (2024)Top-Down Construction of Locally Monotonic Graphs for Similarity SearchSimilarity Search and Applications10.1007/978-3-031-75823-2_25(291-300)Online publication date: 25-Oct-2024

View Options

Get Access

Login options

Full Access

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media