Abstract
Similarity search is becoming popular in even more disciplines, such as multimedia databases, bioinformatics, social networks, to name a few. The existing indexing techniques often assume the metric space model that could be too restrictive from the domain point of view. Hence, many modern applications that involve complex similarities do not use any indexing and use just sequential search, so they are applicable only to small databases. In this paper we revisit the assumptions which persist in the mainstream research of content-based retrieval. Leaving the traditional indexing paradigms such as the metric space model, our goal is to propose alternative methods for indexing that shall lead to high-performance similarity search. We introduce the design of the algorithmic framework SIMDEX for exploration of analytical properties (axioms) useful for indexing that hold in a given complex similarity space but were not discovered so far. Consequently, the known axioms will be localized as a subset within the universe of all axioms suitable for indexing. Speaking in a hyperbole, for database research the discovery of new axioms valid in some similarity space might have an impact comparable to the discovery of new laws of physics holding in parallel universes.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Preview
Unable to display preview. Download preview PDF.
Similar content being viewed by others
References
Barták, R.: Constraint Models for Reasoning on Unification in Inductive Logic Programming. In: Dicheva, D., Dochev, D. (eds.) AIMSA 2010. LNCS, vol. 6304, pp. 101–110. Springer, Heidelberg (2010)
Beecks, C., Uysal, M.S., Seidl, T.: Signature quadratic form distance. In: Proc. ACM International Conference on Image and Video Retrieval, pp. 438–445 (2010)
Bolettieri, P., Esuli, A., Falchi, F., Lucchese, C., Perego, R., Piccioli, T., Rabitti, F.: CoPhIR: a Test Collection for Content-Based Image Retrieval. CoRR, abs/0905.4627v2 (2009)
Chávez, E., Navarro, G., Baeza-Yates, R., Marroquín, J.L.: Searching in metric spaces. ACM Computing Surveys 33(3), 273–321 (2001)
Dean, J., Ghemawat, S.: MapReduce: simplified data processing on large clusters. In: Proceedings of the 6th Conference on Symposium on Opearting Systems Design & Implementation - Volume 6 (2004)
Galgonek, J., Hoksza, D., Skopal, T.: SProt: sphere-based protein structure similarity algorithm. Proteome Science 9, 1–12 (2011)
Hetland, M.L.: Ptolemaic indexing. arXiv:0911.4384 [cs.DS] (2009)
Hettich, S., Bay, S.D.: The UCI KDD archive (1999), http://kdd.ics.uci.edu
Howarth, P., Rüger, S.: Fractional distance measures for content-based image retrieval. In: 27th European Conference on Information Retrieval, pp. 447–456. Springer (2005)
Lämmel, R., Schulte, W.: Controllable Combinatorial Coverage in Grammar-Based Testing. In: Uyar, M.Ü., Duale, A.Y., Fecko, M.A. (eds.) TestCom 2006. LNCS, vol. 3964, pp. 19–38. Springer, Heidelberg (2006)
Lokoč, J., Hetland, M.L., Skopal, T., Beecks, C.: Ptolemaic indexing of the signature quadratic form distance. In: Proceedings of the Fourth International Conference on Similarity Search and Applications, pp. 9–16. ACM (2011)
Navarro, G.: Analyzing metric space indexes: What for? In: IEEE SISAP 2009, pp. 3–10 (2009)
Noam, Chomsky: On certain formal properties of grammars. Information and Control 2(2), 137–167 (1959)
Novák, J., Skopal, T., Hoksza, D., Lokoč, J.: Non-metric Similarity Search of Tandem Mass Spectra Including Posttranslational Modifications. Journal of Discrete Algorithms 13 (2012)
Samet, H.: Foundations of Multidimensional and Metric Data Structures. Morgan Kaufmann Publishers Inc., San Francisco (2005)
Skopal, T.: Unified framework for fast exact and approximate search in dissimilarity spaces. ACM Transactions on Database Systems 32(4), 1–46 (2007)
Skopal, T., Bustos, B.: On nonmetric similarity search problems in complex domains. ACM Comput. Surv. 43, 34:1–34:50 (2011)
Smith, T.F., Waterman, M.S.: Identification of common molecular subsequences. Journal of Molecular Biology 147(1), 195–197 (1981)
Zezula, P., Amato, G., Dohnal, V., Batko, M.: Similarity Search: The Metric Space Approach (Advances in Database Systems). Springer-Verlag New York, Inc., Secaucus (2005)
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Skopal, T., Bartoš, T. (2012). Algorithmic Exploration of Axiom Spaces for Efficient Similarity Search at Large Scale. In: Navarro, G., Pestov, V. (eds) Similarity Search and Applications. SISAP 2012. Lecture Notes in Computer Science, vol 7404. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-32153-5_4
Download citation
DOI: https://doi.org/10.1007/978-3-642-32153-5_4
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-32152-8
Online ISBN: 978-3-642-32153-5
eBook Packages: Computer ScienceComputer Science (R0)