research-article

Revisiting kd-tree for Nearest Neighbor Search

Authors:

Kaushik SinhaAuthors Info & Claims

KDD '19: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining

Pages 1378 - 1388

https://doi.org/10.1145/3292500.3330875

Published: 25 July 2019 Publication History

Abstract

\kdtree \citefriedman1976algorithm has long been deemed unsuitable for exact nearest-neighbor search in high dimensional data. The theoretical guarantees and the empirical performance of \kdtree do not show significant improvements over brute-force nearest-neighbor search in moderate to high dimensions. \kdtree has been used relatively more successfully for approximate search \citemuja2009flann but lack theoretical guarantees. In the article, we build upon randomized-partition trees \citedasgupta2013randomized to propose \kdtree based approximate search schemes with $O(d łog d + łog n)$ query time for data sets with n points in d dimensions and rigorous theoretical guarantees on the search accuracy. We empirically validate the search accuracy and the query time guarantees of our proposed schemes, demonstrating the significantly improved scaling for same level of accuracy.

References

[1]

Nir Ailon and Bernard Chazelle. 2009. The fast Johnson--Lindenstrauss transform and approximate nearest neighbors. SIAM Journal on computing, Vol. 39, 1 (2009), 302--322.

Digital Library

[2]

Alexandr Andoni, Piotr Indyk, Thijs Laarhoven, Ilya Razenshteyn, and Ludwig Schmidt. 2015. Practical and optimal LSH for angular distance. In Advances in Neural Information Processing Systems. 1225--1233.

Digital Library

[3]

Alexandr Andoni, Assaf Naor, Aleksandar Nikolov, Ilya Razenshteyn, and Erik Waingarten. 2018. Hölder homeomorphisms and approximate nearest neighbors. In 2018 IEEE 59th Annual Symposium on Foundations of Computer Science (FOCS). IEEE, 159--169.

[4]

Alexandr Andoni, Huy L Nguyen, Aleksandar Nikolov, Ilya Razenshteyn, and Erik Waingarten. 2017. Approximate near neighbors for general symmetric norms. In Proceedings of the 49th Annual ACM SIGACT Symposium on Theory of Computing. ACM, 902--913.

Digital Library

[5]

Sunil Arya, David M Mount, Nathan Netanyahu, Ruth Silverman, and Angela Y Wu. 1994. An optimal algorithm for approximate nearest neighbor searching in fixed dimensions. In Proc. 5th ACM-SIAM Sympos. Discrete Algorithms. 573--582.

Digital Library

[6]

Patrice Assouad. 1979. Etude d'une dimension metrique liee a la possibilite de plongements dans R^ n. CR Acad. Sci. Paris Sér. AB, Vol. 288 (1979), A731--A734.

[7]

Arthur Asuncion and David Newman. 2007. UCI machine learning repository. https://archive.ics.uci.edu/ml/datasets.html

[8]

Martin Aumüller, Erik Bernhardsson, and Alexander Faithfull. 2017. ANN-Benchmarks: A benchmarking tool for approximate nearest neighbor algorithms. In International Conference on Similarity Search and Applications. Springer, 34--49. https://github.com/erikbern/ann-benchmarks

[9]

Artem Babenko and Victor Lempitsky. 2012. The inverted multi-index. In Computer Vision and Pattern Recognition (CVPR), 2012 IEEE Conference on. IEEE, 3069--3076.

Digital Library

[10]

Alina Beygelzimer, Sham Kakade, and John Langford. 2006. Cover trees for nearest neighbor. In Proceedings of the 23rd international conference on Machine learning. ACM, 97--104.

Digital Library

[11]

Moses S Charikar. 2002. Similarity estimation techniques from rounding algorithms. In Proceedings of the thiry-fourth annual ACM symposium on Theory of computing. ACM, 380--388.

Digital Library

[12]

Kenneth L Clarkson. 2006. Nearest-neighbor searching and metric space dimensions. Nearest-neighbor methods for learning and vision: theory and practice (2006), 15--59.

[13]

Sanjoy Dasgupta and Yoav Freund. 2008. Random projection trees and low dimensional manifolds. In Proceedings of the fortieth annual ACM symposium on Theory of computing. ACM, 537--546.

Digital Library

[14]

Sanjoy Dasgupta and Kaushik Sinha. 2013. Randomized partition trees for exact nearest neighbor search. In Conference on Learning Theory. 317--337.

[15]

Mayur Datar, Nicole Immorlica, Piotr Indyk, and Vahab S Mirrokni. 2004. Locality-sensitive hashing scheme based on p-stable distributions. In Proceedings of the twentieth annual symposium on Computational geometry. ACM, 253--262.

Digital Library

[16]

Jerome H Friedman, Jon Louis Bentley, and Raphael Ari Finkel. 1976. An algorithm for finding best matches in logarithmic time. ACM Trans. Math. Software, Vol. 3, SLAC-PUB-1549-REV. 2 (1976), 209--226.

Digital Library

[17]

Keinosuke Fukunaga and Patrenahalli M. Narendra. 1975. A branch and bound algorithm for computing k-nearest neighbors. IEEE transactions on computers, Vol. 100, 7 (1975), 750--753.

Digital Library

[18]

Tiezheng Ge, Kaiming He, Qifa Ke, and Jian Sun. 2014. Optimized product quantization. IEEE transactions on pattern analysis and machine intelligence, Vol. 36, 4 (2014), 744--755.

Digital Library

[19]

Aristides Gionis, Piotr Indyk, Rajeev Motwani, et almbox. 1999. Similarity search in high dimensions via hashing. In Vldb, Vol. 99. 518--529.

Digital Library

[20]

Yunchao Gong, Svetlana Lazebnik, Albert Gordo, and Florent Perronnin. 2013. Iterative quantization: A procrustean approach to learning binary codes for large-scale image retrieval. IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 35, 12 (2013), 2916--2929.

Digital Library

[21]

Robert M Gray et almbox. 2006. Toeplitz and circulant matrices: A review. Foundations and Trends® in Communications and Information Theory, Vol. 2, 3 (2006), 155--239.

Digital Library

[22]

Kiana Hajebi, Yasin Abbasi-Yadkori, Hossein Shahbazi, and Hong Zhang. 2011. Fast approximate nearest-neighbor search with k-nearest neighbor graph. In IJCAI Proceedings-International Joint Conference on Artificial Intelligence, Vol. 22. 1312.

Digital Library

[23]

Ville Hyvönen, Teemu Pitk"anen, Sotiris Tasoulis, Elias J"a"asaari, Risto Tuomainen, Liang Wang, Jukka Corander, and Teemu Roos. 2016. Fast nearest neighbor search through sparse random projections and voting. In Big Data (Big Data), 2016 IEEE International Conference on. IEEE, 881--888.

[24]

Piotr Indyk and Rajeev Motwani. 1998. Approximate nearest neighbors: towards removing the curse of dimensionality. In Proceedings of the thirtieth annual ACM symposium on Theory of computing. ACM, 604--613.

Digital Library

[25]

Herve Jegou, Matthijs Douze, and Cordelia Schmid. 2011. Product quantization for nearest neighbor search. IEEE transactions on pattern analysis and machine intelligence, Vol. 33, 1 (2011), 117--128.

Digital Library

[26]

Omid Keivani and Kaushik Sinha. 2018. Improved nearest neighbor search using auxiliary information and priority functions. In International Conference on Machine Learning . 2578--2586.

[27]

Omid Keivani, Kaushik Sinha, and Parikshit Ram. 2017. Improved maximum inner product search with better theoretical guarantees. In 2017 International Joint Conference on Neural Networks (IJCNN). IEEE, 2927--2934.

[28]

Omid Keivani, Kaushik Sinha, and Parikshit Ram. 2018. Improved maximum inner product search with better theoretical guarantee using randomized partition trees. Machine Learning (2018), 1--26.

Digital Library

[29]

Quoc Le, Tamás Sarlós, and Alex Smola. 2013. Fastfood-approximating kernel expansions in loglinear time. In Proceedings of the international conference on machine learning, Vol. 85.

Digital Library

[30]

Ting Liu, Andrew W Moore, Ke Yang, and Alexander G Gray. 2005. An investigation of practical approximate nearest neighbor algorithms. In Advances in neural information processing systems. 825--832.

Digital Library

[31]

Qin Lv, William Josephson, Zhe Wang, Moses Charikar, and Kai Li. 2007. Multi-probe LSH: efficient indexing for high-dimensional similarity search. In Proceedings of the 33rd international conference on Very large data bases. VLDB Endowment, 950--961.

Digital Library

[32]

Yury Malkov, Alexander Ponomarenko, Andrey Logvinov, and Vladimir Krylov. 2014. Approximate nearest neighbor algorithm based on navigable small world graphs. Information Systems, Vol. 45 (2014), 61--68.

[33]

Yu A Malkov and Dmitry A Yashunin. 2016. Efficient and robust approximate nearest neighbor search using hierarchical navigable small world graphs. arXiv preprint arXiv:1603.09320 (2016).

[34]

James McNames. 2001. A fast nearest-neighbor algorithm based on a principal axis search tree. IEEE Transactions on Pattern Analysis & Machine Intelligence 9 (2001), 964--976.

Digital Library

[35]

Andrew W Moore. 2000. The anchors hierarchy: Using the triangle inequality to survive high dimensional data. In Proceedings of the Sixteenth conference on Uncertainty in artificial intelligence. Morgan Kaufmann Publishers Inc., 397--405.

Digital Library

[36]

Marius Muja and David G Lowe. 2009. Flann, fast library for approximate nearest neighbors. In International Conference on Computer Vision Theory and Applications (VISAPP'09), Vol. 3. INSTICC Press.

[37]

David Nister and Henrik Stewenius. 2006. Scalable recognition with a vocabulary tree. In Computer vision and pattern recognition, 2006 IEEE computer society conference on, Vol. 2. Ieee, 2161--2168.

Digital Library

[38]

Michael T Orchard. 1991. A fast nearest-neighbor search algorithm. In Acoustics, Speech, and Signal Processing, 1991. ICASSP-91., 1991 International Conference on. IEEE, 2297--2300.

[39]

Parikshit Ram, Dongryeol Lee, and Alexander G Gray. 2012. Nearest-neighbor search on a time budget via max-margin trees. In Proceedings of the 2012 SIAM International Conference on Data Mining. SIAM, 1011--1022.

[40]

Kaushik Sinha. 2014. LSH vs Randomized Partition Trees: Which One to Use for Nearest Neighbor Search?. In Machine Learning and Applications (ICMLA), 2014 13th International Conference on. IEEE, 41--46.

Digital Library

[41]

Kaushik Sinha and Omid Keivani. 2017. Sparse Randomized Partition Trees for Nearest Neighbor Search. In Artificial Intelligence and Statistics. 681--689.

[42]

Antonio Torralba, Rob Fergus, and William T Freeman. 2008. 80 million tiny images: A large data set for nonparametric object and scene recognition. IEEE transactions on pattern analysis and machine intelligence, Vol. 30, 11 (2008), 1958--1970.

Digital Library

[43]

Santosh S Vempala. 2012. Randomly-oriented kd trees adapt to intrinsic dimension. In LIPIcs-Leibniz International Proceedings in Informatics, Vol. 18. Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik.

[44]

Nakul Verma, Samory Kpotufe, and Sanjoy Dasgupta. 2009. Which spatial partition trees are adaptive to intrinsic dimension?. In Proceedings of the twenty-fifth conference on uncertainty in artificial intelligence. AUAI Press, 565--574.

Digital Library

[45]

Enrique Vidal. 1994. New formulation and improvements of the nearest-neighbour approximating and eliminating search algorithm (AESA). Pattern Recognition Letters, Vol. 15, 1 (1994), 1--7.

Digital Library

[46]

Jun Wang, Wei Liu, Sanjiv Kumar, and Shih-Fu Chang. 2016. Learning to hash for indexing big data-a survey. Proc. IEEE, Vol. 104, 1 (2016), 34--57.

[47]

Jingdong Wang, Ting Zhang, Nicu Sebe, Heng Tao Shen, et almbox. 2018. A survey on learning to hash. IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 40, 4 (2018), 769--790.

[48]

Xiang Wu, Ruiqi Guo, Ananda Theertha Suresh, Sanjiv Kumar, Daniel N Holtmann-Rice, David Simcha, and Felix Yu. 2017. Multiscale quantization for fast similarity search. In Advances in Neural Information Processing Systems. 5745--5755.

Digital Library

[49]

Yubao Wu, Ruoming Jin, and Xiang Zhang. 2014. Fast and unified local search for random walk based k-nearest-neighbor query in large graphs. In Proceedings of the 2014 ACM SIGMOD international conference on Management of Data. ACM, 1139--1150.

Digital Library

[50]

Felix Yu, Sanjiv Kumar, Yunchao Gong, and Shih-Fu Chang. 2014. Circulant binary embedding. In International conference on machine learning . 946--954.

Digital Library

Cited By

Yin ZGao JBalsebre PCong GLong C(2025)DEG: Efficient Hybrid Vector Search Using the Dynamic Edge Navigation GraphProceedings of the ACM on Management of Data10.1145/37096793:1(1-28)Online publication date: 11-Feb-2025
https://dl.acm.org/doi/10.1145/3709679
Hliš TFister IFister I(2025)Path Overlap Detection and Property Graph Construction from TCX Data for Advanced Analysis2025 IEEE 23rd World Symposium on Applied Machine Intelligence and Informatics (SAMI)10.1109/SAMI63904.2025.10883235(000489-000494)Online publication date: 23-Jan-2025
https://doi.org/10.1109/SAMI63904.2025.10883235
Zhang JMu LZhang DChen ZRajbhandari-Thapa JPagán JLi YMai GZhou Z(2025)SpaCE: a spatial counterfactual explainable deep learning model for predicting out-of-hospital cardiac arrest survival outcomeInternational Journal of Geographical Information Science10.1080/13658816.2024.2443757(1-32)Online publication date: 28-Jan-2025
https://doi.org/10.1080/13658816.2024.2443757
Show More Cited By

Index Terms

Revisiting kd-tree for Nearest Neighbor Search
1. Computing methodologies
  1. Artificial intelligence
    1. Search methodologies
      1. Randomized search
2. Theory of computation
  1. Design and analysis of algorithms
    1. Streaming, sublinear and near linear time algorithms
      1. Nearest neighbor algorithms

Recommendations

Concurrent linearizable nearest neighbour search in LockFree-kD-tree
Highlights
- The included proof is generic and presents a useful contour for proving correctness – linearizability and lock-freedom – of similar data structures.
Abstract
The Nearest neighbour search (NNS) is a fundamental problem in many application domains dealing with multidimensional data. In a concurrent setting, where dynamic modifications are allowed, a linearizable implementation of the NNS is ...
An improved algorithm finding nearest neighbor using Kd-trees
LATIN'08: Proceedings of the 8th Latin American conference on Theoretical informatics

We suggest a simple modification to the Kd-tree search algorithm for nearest neighbor search resulting in an improved performance. The Kd-tree data structure seems to work well in finding nearest neighbors in low dimensions but its performance degrades ...
Efficient approximate nearest neighbor search with integrated binary codes
MM '11: Proceedings of the 19th ACM international conference on Multimedia

Nearest neighbor search in Euclidean space is a fundamental problem in multimedia retrieval. The difficulty of exact nearest neighbor search has led to approximate solutions that sacrifice precision for efficiency. Among such solutions, approaches that ...

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Conferences

KDD '19: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining

July 2019

3305 pages

ISBN:9781450362016

DOI:10.1145/3292500

General Chairs:
Ankur Teredesai
KenSci
,
Vipin Kumar
University of Minnesota
,
Program Chairs:
Ying Li
EV Analysis Corporation
,
Rómer Rosales
LinkedIn
,
Evimaria Terzi
Boston University
,
George Karypis
University of Minnesota

Copyright © 2019 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 25 July 2019

Permissions

Request permissions for this article.

Request Permissions

Check for updates

Author Tags

Qualifiers

Research-article

Conference

KDD '19

Sponsor:

KDD '19: The 25th ACM SIGKDD Conference on Knowledge Discovery and Data Mining

August 4 - 8, 2019

AK, Anchorage, USA

Acceptance Rates

KDD '19 Paper Acceptance Rate 110 of 1,200 submissions, 9%;

Overall Acceptance Rate 1,133 of 8,635 submissions, 13%

Contributors

Other Metrics

View Article Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

70
Total Citations
View Citations
2,009
Total Downloads

Downloads (Last 12 months)367
Downloads (Last 6 weeks)38

Reflects downloads up to 19 Feb 2025

Other Metrics

View Author Metrics

Citations

Cited By

Yin ZGao JBalsebre PCong GLong C(2025)DEG: Efficient Hybrid Vector Search Using the Dynamic Edge Navigation GraphProceedings of the ACM on Management of Data10.1145/37096793:1(1-28)Online publication date: 11-Feb-2025
https://dl.acm.org/doi/10.1145/3709679
Hliš TFister IFister I(2025)Path Overlap Detection and Property Graph Construction from TCX Data for Advanced Analysis2025 IEEE 23rd World Symposium on Applied Machine Intelligence and Informatics (SAMI)10.1109/SAMI63904.2025.10883235(000489-000494)Online publication date: 23-Jan-2025
https://doi.org/10.1109/SAMI63904.2025.10883235
Zhang JMu LZhang DChen ZRajbhandari-Thapa JPagán JLi YMai GZhou Z(2025)SpaCE: a spatial counterfactual explainable deep learning model for predicting out-of-hospital cardiac arrest survival outcomeInternational Journal of Geographical Information Science10.1080/13658816.2024.2443757(1-32)Online publication date: 28-Jan-2025
https://doi.org/10.1080/13658816.2024.2443757
Peng SZhang ZHuang XWang ZSu SCai G(2025)A robust few‐shot classifier with image as set of pointsIET Computer Vision10.1049/cvi2.1234019:1Online publication date: 2-Jan-2025
https://doi.org/10.1049/cvi2.12340
Jiang RMa RZhao LDong WZhou R(2025)Impacts of Lake Basin Topography Predicted by Long Short-Term Memory on Difference of Lake Area and Water Resource EvolutionJournal of Cleaner Production10.1016/j.jclepro.2025.144781(144781)Online publication date: Jan-2025
https://doi.org/10.1016/j.jclepro.2025.144781
Liu BZhou AWei BWang JTsoumakas G(2025)Oversampling multi-label data based on natural neighbor and label correlationExpert Systems with Applications: An International Journal10.1016/j.eswa.2024.125257259:COnline publication date: 1-Jan-2025
https://dl.acm.org/doi/10.1016/j.eswa.2024.125257
Zhang YXie YZhou JXu XMiao M(2024)Cucumber Seedling Segmentation Network Based on a Multiview Geometric Graph Encoder from 3D Point CloudsPlant Phenomics10.34133/plantphenomics.02546(0254)Online publication date: 2024
https://doi.org/10.34133/plantphenomics.0254
Lin CYang LZhou RZhang THan YWang Y(2024)Spatial Pattern and Environmental Driving Factors of Treeline Elevations in Yulong Snow Mountain, ChinaForests10.3390/f1507126115:7(1261)Online publication date: 19-Jul-2024
https://doi.org/10.3390/f15071261
De-Jesús-Grullón RBatista Jorge REspinal Serrata ABueno Díaz JPichardo Estévez JGuerrero-Rodríguez N(2024)Modeling and Simulation of Distribution Networks with High Renewable Penetration in Open-Source Software: QGIS and OpenDSSEnergies10.3390/en1712292517:12(2925)Online publication date: 14-Jun-2024
https://doi.org/10.3390/en17122925
Mateos-Ramirez PGomez-Avila JVillaseñor CArana-Daniel N(2024)Visual Odometry in GPS-Denied Zones for Fixed-Wing Unmanned Aerial Vehicle with Reduced Accumulative Error Based on Satellite ImageryApplied Sciences10.3390/app1416742014:16(7420)Online publication date: 22-Aug-2024
https://doi.org/10.3390/app14167420
Show More Cited By

View Options

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

View options

PDF

View or Download as a PDF file.

eReader

View online with eReader.

Figures

Tables

Media

View Table of Conten