Nothing Special   »   [go: up one dir, main page]

skip to main content
article
Free access

Searching in metric spaces by spatial approximation

Published: 01 August 2002 Publication History

Abstract

We propose a new data structure to search in metric spaces. A metric space is formed by a collection of objects and a distance function defined among them which satisfies the triangle inequality. The goal is, given a set of objects and a query, retrieve those objects close enough to the query. The complexity measure is the number of distances computed to achieve this goal. Our data structure, called sa-tree (“spatial approximation tree”), is based on approaching the searched objects spatially, that is, getting closer and closer to them, rather than the classic divide-and-conquer approach of other data structures. We analyze our method and show that the number of distance evaluations to search among n objects is sublinear. We show experimentally that the sa-tree is the best existing technique when the metric space is hard to search or the query has low selectivity. These are the most important unsolved cases in real applications. As a practical advantage, our data structure is one of the few that does not need to tune parameters, which makes it appealing for use by non-experts.

References

[1]
1. Aurenhammer F (1991) Voronoi diagrams - a survey of a fundamental geometric data structure. ACM Comput Surv 23(3):345- 405.
[2]
2. Bentley J (1975) Multidimensional binary search trees used for associative searching. Comm ACM 18(9):509-517.
[3]
3. Bentley J (1979) Multidimensional binary search trees in database applications. IEEE Trans Software Eng 5(4):333-340.
[4]
4. Burkhard W, Keller R (1973) Some approaches to best-match file searching. Comm ACM 16(4):230-236.
[5]
5. Bozkaya T, Ozsoyoglu M (1997) Distance-based indexing for high-dimensional metric spaces. In Proc. ACM Conference on Management of Data (SIGMOD'97), Sigmod Rec 26(2):357- 368.
[6]
6. Brin S (1995) Near neighbor search in large metric spaces. In Proc. 21st Conference on Very Large Databases (VLDB'95), pp 574-584.
[7]
7. Baeza-Yates R, Cunto W, Manber U, Wu S (1994) Proximity matching using fixed-queries trees. In Proc. 5th Conference on Combinatorial Pattern Matching (CPM'94), Lecture Notes in Computer Science, vol. 807. Springer, Berlin Heidelberg New York, pp 198-212.
[8]
8. Baeza-Yates R, Ribeiro-Neto B (1999) Modern information retrieval. Addison-Wesley, Reading, Mass., USA.
[9]
9. Chávez E, Marroquín J (1997) Proximity queries in metric spaces. In Proc. 4th South American Workshop on String Processing (WSP'97), pp 21-36. Carleton University.
[10]
10. Chávez E, Marroquín J, Baeza-Yates R (1999) Spaghettis: an array-based algorithm for similarity queries in metric spaces. In Proc. 6th South American Symposium on String Processing and Information Retrieval (SPIRE'99), pp 38-46. IEEE, New York.
[11]
11. Chávez E, Marroquín J, Navarro G (2001) Fixed queries array: a fast and economical data structure for proximity searching. Multimedia Tools Appl 14(2):113-135.
[12]
12. Chávez E, Navarro G (2000) An effective clustering algorithm to index high dimensional metric spaces. In Proc. 7th South American Symposium on String Processing and Information Retrieval (SPIRE'00), pp 75-86. IEEE, New York.
[13]
13. Chávez E, Navarro G (2001) A probabilistic spell for the curse of dimensionality. In Proc. 3rd Workshop on Algorithm Engineering and Experiments (ALENEX'01), pp 147-160, Lecture Notes in Computer Science, vol. 2153. Springer, Berlin Heidelberg New York.
[14]
14. Chávez E, Navarro G, Baeza-Yates R, Marroquín J (2001) Searching in metric spaces. ACM Comput Surv 33(3):273-321.
[15]
15. Ciaccia P, Patella M, Zezula P (1997) M-tree: an efficient access method for similarity search in metric spaces. In Proc. 23rd Conference on Very Large Databases (VLDB'97), pp 426-435.
[16]
16. Dehne F, Noltemeier H (1987) Voronoi trees and clustering problems. Inf Syst 12(2):171-175.
[17]
17. Guttman A (1984) R-trees: a dynamic index structure for spatial searching. In Proc. ACM Conference on Management of Data (SIGMOD'84), pp 47-57.
[18]
18. Harman D (1995) Overview of the third text retrieval conference. In: Proc. 3rd Text Retrieval Conference (TREC-3), pp 1-19. NIST Special Publication 500-207.
[19]
19. Hjaltason G, Samet H (1999) Distance browsing in spatial databases. ACM Trans Database Syst 24(2):265-318.
[20]
20. Micó L, Oncina J, Carrasco R (1996) A fast branch and bound nearest neighbor classifier in metric spaces. Pattern Recognition Lett 17:731-739.
[21]
21. Micó L, Oncina J, Vidal E (1994) A new version of the nearest-neighbor approximating and eliminating search (aesa) with linear preprocessing-time and memory requirements. Pattern Recognition Lett 15:9-17.
[22]
22. Navarro G (1999) Searching in metric spaces by spatial approx - imation. In Proc. 6th South American Symposium on String Processing and Information Retrieval (SPIRE'99), pp 141-148. IEEE, New York.
[23]
23. Navarro G (2001) A guided tour to approximate string matching. ACM Comput Surv 33(1):31-88.
[24]
24. Nene S, Nayar S (1997) A simple algorithm for nearest neighbor search in high dimensions. IEEE Trans Pattern Anal Mach Intell 19(9):989-1003.
[25]
25. Noltemeier H (1989) Voronoi trees and applications. In Proc. International Workshop on Discrete Algorithms and Complexity, pp 69-74.
[26]
26. Navarro G, Reyes N (2001) Dynamic spatial approximation trees. In Proc. XXI Conference of the Chilean Computer Science Society (SCCC'01). IEEE, New York, pp 213-222.
[27]
27. Noltemeier H, Verbarg K, Zirkelbach C (1992) Monotonous Bisector* Trees-a tool for efficient partitioning of complex schenes of geometric objects. In: Data structures and efficient algorithms, Lecture Notes in Computer Science, vol. 594. Springer, Berlin Heidelberg New York, pp 186-203.
[28]
28. Reyes N (2001) Dynamic data structures for searching metric spaces. MSc. Thesis, Univ. Nac. de San Luis, Argentina. In progress. Advisor: Navarro G.
[29]
29. Shapiro M (1977) The choice of reference points in best-match file searching. Comm ACM 20(5):339-343.
[30]
30. Uhlmann J (1991) Implementing metric trees to satisfy general proximity/similarity queries. Manuscript.
[31]
31. Uhlmann J (1991) Satisfying general proximity/similarity queries with metric trees. Inf Process Lett 40:175-179.
[32]
32. Vidal E (1986) An algorithm for fnding nearest neighbors in (approximately) constant verage time. Pattern Recognition Lett 4:145-157.
[33]
33. Yianilos P (1993) Data structures and algorithms for nearest neighbor search in general metric spaces. In Proc. 4th ACM-SIAM Symposium on Discrete Algorithms (SODA '93), pp 311 - 321.
[34]
34. Yianilos P (2000) Locally lifting the curse of dimensionality for nearest neighbor search. In: Proc. 11th ACM-SIAM Symposium on Discrete Algorithms (SODA'00).

Cited By

View all
  • (2024)ACORN: Performant and Predicate-Agnostic Search Over Vector Embeddings and Structured DataProceedings of the ACM on Management of Data10.1145/36549232:3(1-27)Online publication date: 30-May-2024
  • (2024)An Exploration Graph with Continuous Refinement for Efficient Multimedia RetrievalProceedings of the 2024 International Conference on Multimedia Retrieval10.1145/3652583.3658117(657-665)Online publication date: 30-May-2024
  • (2024)DForest: A Minimal Dimensionality-Aware Indexing for High-Dimensional Exact Similarity SearchIEEE Transactions on Knowledge and Data Engineering10.1109/TKDE.2024.338111136:10(5092-5105)Online publication date: 1-Oct-2024
  • Show More Cited By

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image The VLDB Journal — The International Journal on Very Large Data Bases
The VLDB Journal — The International Journal on Very Large Data Bases  Volume 11, Issue 1
August 2002
91 pages

Publisher

Springer-Verlag

Berlin, Heidelberg

Publication History

Published: 01 August 2002

Author Tags

  1. Multimedia databases
  2. Similarity or proximity search
  3. Spatial and multidimensional search
  4. Spatial approximation tree

Qualifiers

  • Article

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)132
  • Downloads (Last 6 weeks)30
Reflects downloads up to 13 Nov 2024

Other Metrics

Citations

Cited By

View all
  • (2024)ACORN: Performant and Predicate-Agnostic Search Over Vector Embeddings and Structured DataProceedings of the ACM on Management of Data10.1145/36549232:3(1-27)Online publication date: 30-May-2024
  • (2024)An Exploration Graph with Continuous Refinement for Efficient Multimedia RetrievalProceedings of the 2024 International Conference on Multimedia Retrieval10.1145/3652583.3658117(657-665)Online publication date: 30-May-2024
  • (2024)DForest: A Minimal Dimensionality-Aware Indexing for High-Dimensional Exact Similarity SearchIEEE Transactions on Knowledge and Data Engineering10.1109/TKDE.2024.338111136:10(5092-5105)Online publication date: 1-Oct-2024
  • (2024)Survey of vector database management systemsThe VLDB Journal — The International Journal on Very Large Data Bases10.1007/s00778-024-00864-x33:5(1591-1615)Online publication date: 1-Sep-2024
  • (2024)Top-Down Construction of Locally Monotonic Graphs for Similarity SearchSimilarity Search and Applications10.1007/978-3-031-75823-2_25(291-300)Online publication date: 5-Nov-2024
  • (2023)Worst-case performance of popular approximate nearest neighbor search implementationsProceedings of the 37th International Conference on Neural Information Processing Systems10.5555/3666122.3669013(66239-66256)Online publication date: 10-Dec-2023
  • (2023)Adaptive Indexing in High-Dimensional Metric SpacesProceedings of the VLDB Endowment10.14778/3603581.360359216:10(2525-2537)Online publication date: 1-Jun-2023
  • (2023)LiteHST: A Tree Embedding based Method for Similarity SearchProceedings of the ACM on Management of Data10.1145/35887151:1(1-26)Online publication date: 30-May-2023
  • (2023)Overview of the SISAP 2023 Indexing ChallengeSimilarity Search and Applications10.1007/978-3-031-46994-7_21(255-264)Online publication date: 9-Oct-2023
  • (2023)Finding HSP Neighbors via an Exact, Hierarchical ApproachSimilarity Search and Applications10.1007/978-3-031-46994-7_1(3-18)Online publication date: 9-Oct-2023
  • Show More Cited By

View Options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Get Access

Login options

Full Access

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media