Abstract
Similarity search using metric indexing techniques is largely a solved problem in low-dimensional spaces. However techniques based only on the triangle inequality property start to fail as dimensionality increases.
Since proper metric spaces allow a finite projection of any three objects into a 2D Euclidean space, the notion of angle can be validly applied among any three (but no more) objects. High dimensionality is known to have interesting effects on angles in vector spaces, but to our knowledge this has not been studied in more general metric spaces. Here, we consider the use of angles among objects in combination with distances.
As dimensionality becomes higher, we show that the variance in sampled angles reduces. Furthermore, sampled angles also become correlated with inter-object distances, giving different distributions between query solutions and non-solutions. We show the theoretical underpinnings of this observation in unbounded high-dimensional Euclidean spaces, and then examine how the pure property is reflected in some real-world high dimensional spaces. Our experiments on both generated and real world datasets demonstrate that these observations can have an important impact on the tractability of search as dimensionality increases.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Similar content being viewed by others
Notes
- 1.
\(V_n(R)=\int _{B_n(R)}1 \, d x_1\, \dots \, dx_n = \int _{B_n(1)}R^n \, d y_1\, \dots \, dy_n=R^n V_n(1)\), where we integrate by substitution with \(x_i=R{y_i}\) for all i.
- 2.
If the radius of the hypersphere is constrained to be within the unit cube rather than at the defined radii the results are identical.
- 3.
We separately established that the viewpoint does not affect the measured angles.
- 4.
- 5.
- 6.
- 7.
Source code is available from https://github.com/aldearle/SISAP2020_angles or from the authors.
- 8.
The same number is used across all sets to allow fair comparison, even although for example the cost of performing an exclusion for AnnSift may be greater than the cost of measuring the distance directly.
- 9.
Unless \(|d(p_i,s_j)- d(p_i,q)| > t\), when exclusion can occur in any case.
References
Large powers of sine appear gaussian — why? https://math.stackexchange.com/questions/2293330. Accessed 22 June 2020
Blum, A., Hopcroft, J., Kannan, R.: High-Dimensional Space, p. 4–28. Cambridge University Press (2020). https://doi.org/10.1017/9781108755528.002
Cai, T., Fan, J., Jiang, T.: Distributions of angles in random packing on spheres. J. Mach. Learn. Res. 14(21), 1837–1864 (2013)
Chávez, E., Navarro, G.: Metric databases. In: Encyclopedia of Database Technologies and Applications. Idea Group (2005)
Connor, R., Cardillo, F.A., Vadicamo, L., Rabitti, F.: Hilbert exclusion: improved metric search through finite isometric embeddings. ACM Trans. Inf. Syst. 35(3), 17:1–17:27, December 2016. https://doi.org/10.1145/3001583
Connor, R.C.H., Cardillo, F.A.: Quantifying the specificity of near-duplicate image classification functions. In: Proceedings of the 11th Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications (VISIGRAPP 2016) - Volume 4: VISAPP, Rome, Italy, 27–29 February 2016, pp. 647–654 (2016)
Donahue, J., Jia, Y., Vinyals, O., Hoffman, J., Zhang, N., Tzeng, E., Darrell, T.: DeCAF: a deep convolutional activation feature for generic visual recognition. In: ICML 2014, Beijing, China, pp. 647–655 (2014)
Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep convolutional neural networks. In: Advances in Neural Information Processing Systems 25, pp. 1097–1105. Curran Associates, Inc. (2012)
Levina, E., Bickel, P.J.: Maximum likelihood estimation of intrinsic dimension. In: Advances in Neural Information Processing Systems, pp. 777–784. MIT Press (2005)
Lowe, D.G.: Object recognition from local scale-invariant features. In: Proceedings of the International Conference on Computer Vision, Kerkyra, Corfu, Greece, 20–25 September 1999, pp. 1150–1157 (1999)
Oliva, A., Torralba, A.: Modeling the shape of the scene: a holistic representation of the spatial envelope. Int. J. Comput. Vis. 42, 145–175 (2001)
Pramanik, S.K., Li, J., Ruan, J., Bhattacharjee, S.K.: Efficient search scheme for very large image databases. In: Internet Imaging, vol. 3964 (1999)
Voelker, A.R., Gosmann, J., Stewart, T.C.: Efficiently sampling vectors and coordinates from the n-sphere and n-ball. Technical report, Centre for Theoretical Neuroscience, Waterloo, ON, January 2017
Zezula, P., Amato, G., Dohnal, V., Batko, M.: Similarity Search: The Metric Space Approach, vol. 32. Springer, Heidelberg (2006). https://doi.org/10.1007/0-387-29151-2
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Connor, R., Dearle, A. (2020). Sampled Angles in High-Dimensional Spaces. In: Satoh, S., et al. Similarity Search and Applications. SISAP 2020. Lecture Notes in Computer Science(), vol 12440. Springer, Cham. https://doi.org/10.1007/978-3-030-60936-8_18
Download citation
DOI: https://doi.org/10.1007/978-3-030-60936-8_18
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-60935-1
Online ISBN: 978-3-030-60936-8
eBook Packages: Computer ScienceComputer Science (R0)