Nothing Special   »   [go: up one dir, main page]

skip to main content
research-article

On nonmetric similarity search problems in complex domains

Published: 18 October 2011 Publication History

Abstract

The task of similarity search is widely used in various areas of computing, including multimedia databases, data mining, bioinformatics, social networks, etc. In fact, retrieval of semantically unstructured data entities requires a form of aggregated qualification that selects entities relevant to a query. A popular type of such a mechanism is similarity querying. For a long time, the database-oriented applications of similarity search employed the definition of similarity restricted to metric distances. Due to its topological properties, metric similarity can be effectively used to index a database which can then be queried efficiently by so-called metric access methods. However, together with the increasing complexity of data entities across various domains, in recent years there appeared many similarities that were not metrics—we call them nonmetric similarity functions. In this article we survey domains employing nonmetric functions for effective similarity search, and methods for efficient nonmetric similarity search. First, we show that the ongoing research in many of these domains requires complex representations of data entities. Simultaneously, such complex representations allow us to model also complex and computationally expensive similarity functions (often represented by various matching algorithms). However, the more complex similarity function one develops, the more likely it will be a nonmetric. Second, we review state-of-the-art techniques for efficient (fast) nonmetric similarity search, concerning both exact and approximate search. Finally, we discuss some open problems and possible future research trends.

References

[1]
Ackermann, M., Blömer, J., and Sohler, C. 2008. Clustering for metric and non-metric distance measures. In Proceedings of the 19th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA). SIAM Philadelphia, PA, 799--808.
[2]
Aggarwal, C., Hinneburg, A., and Keim, D. 2001. On the surprising behavior of distance metrics in high dimensional spaces. In Proceedings of the 8th International Conference on Database Theory (ICDT). Springer-Verlag, Berlin, Germany.
[3]
Aggarwal, C. and Yu, P. 2000. The IGrid index: Reversing the dimensionality curse for similarity indexing in high dimensional space. In Proceedings of the 6th ACM International Conference on Knowledge Discovery and Data Mining (KDD). ACM Press, New York, NY, 119--129.
[4]
Altschul, S., Gish, W., Miller, W., Myers, E., and Lipman, D. 1990. Basic local alignment search tool. J. Molec. Biol. 215, 3, 403--410.
[5]
Angeles-Yreta, A., Solis-Estrella, H., Landassuri-Moreno, V., and Figueroa-Nazuno, J. 2004. Similarity search in seismological signals. In Proceedings of the 6th Mexican International Conference in Computer Science (ENC)). IEEE Computer Society Press, Los Alamitos, CA, 50--56.
[6]
Antani, S., Lee, D., Long, L., and Thoma, G. 2004. Evaluation of shape similarity measurement methods for spine X-ray images. J. Vis. Comm. Image Rep. 15, 3, 285--302.
[7]
Ashby, F. and Perrin, N. 1988. Toward a unified theory of similarity and recognition. Psych. Rev. 95, 1, 124--150.
[8]
Ashby, F. G., Ed. 1992. Multidimensional Models of Perception and Cognition. Lawrence Erlbaum Associates, Hillsdale, NJ.
[9]
Athitsos, V., Alon, J., Sclaroff, S., and Kollios, G. 2004. Boostmap: A method for efficient approximate similarity rankings. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, Computer Society Press, Los Alamitos, CA, 486--493.
[10]
Athitsos, V., Hadjieleftheriou, M., Kollios, G., and Sclaroff, S. 2005. Query-sensitive embeddings. In Proceedings of the ACM International Conference on Management of Data (SIGMOD). ACM Press, New York, NY, 706--717.
[11]
Baeza-Yates, R. and Ribeiro-Neto, B. 1999. Modern Information Retrieval. Addison-Wesley Longman Publishing, Boston, MA.
[12]
Bartolini, I., Ciaccia, P., and Patella, M. 2005. WARP: Accurate retrieval of shapes using phase of fourier descriptors and time warping distance. IEEE Patt. Anal. Mach. Intell. 27, 1, 142--147.
[13]
Barton, G. J. 2002. SCANPS Version 2.3.9 User guide. University of Dundee, UK.
[14]
Becker, G. and Potts, M. 2007. Non-metric biometric clustering. In Proceedings of the Biometrics Symposium. 1--6.
[15]
Beckmann, N., Kriegel, H.-P., Schneider, R., and Seeger, B. 1990. The R*-tree: An efficient and robust access method for points and rectangles. In Proceedings of the ACM SIGMOD International Conference on Management of Data. ACM Press, New York, NY, 322--331.
[16]
Berndt, D. and Clifford, J. 1994. Using dynamic time warping to find patterns in time series. In Proceedings of the AAAI Workshop On Knowledge Discovery in Databases. 229--248.
[17]
Berry, M. and Browne, M. 1999. Understanding Search Engines, Mathematical Modeling and Text Retrieval. SIAM, Philadelphia, PA.
[18]
Berry, M., Dumais, S., and Letsche, T. 1995. Computation methods for intelligent information access. In Proceedings of the ACM/IEEE Supercomputing Conference.
[19]
Blanken, H. M., de Vries, A. P., Blok, H. E., and Feng, L. 2007. Multimedia Retrieval. Springer.
[20]
Böhm, C., Berchtold, S., and Keim, D. 2001. Searching in high-dimensional spaces—Index structures for improving the performance of multimedia databases. ACM Comp. Surv. 33, 3, 322--373.
[21]
Brin, S. 1995. Near neighbor search in large metric spaces. In Proceedings of the 21st Conference on Very Large Databases (VLDB). Morgan Kaufmann, San Francisco, CA, 574--584.
[22]
Bustos, B., Navarro, G., and Chávez, E. 2003. Pivot selection techniques for proximity searching in metric spaces. Patt. Recog. Lett. 24, 14, 2357--2366.
[23]
Bustos, B. and Skopal, T. 2006. Dynamic similarity search in multi-metric spaces. In Proceedings of the 8th ACM SIGMM International Workshop on Multimedia Information Retrieval (MIR). ACM Press, New York, NY, 137--146.
[24]
Buttler, D. 2004. A short survey of document structure similarity algorithms. In Proceedings of the 5th International Conference on Internet Computing (IC). CSREA Press, 3--9.
[25]
Cha, G.-H. 2006. Non-metric similarity ranking for image retrieval. In Proceedings of the 17th International Conference on Database and Expert Systems Applications (DEXA). Lecture Notes in Computer Science, vol. 4080. Springer, Berlin, Germany, 853--862.
[26]
Cha, S.-H., Yoon, S., and Tappert, C. 2005. On binary similarity measures for handwritten character recognition. In Proceedings of the 8th International Conference on Document Analysis and Recognition (ICDAR). IEEE Computer Society Press, Los Alamitos, CA, 4--8.
[27]
Chávez, E. and Navarro, G. 2001. A probabilistic spell for the curse of dimensionality. In Proceedings of the Workshops on Algorithm Engineering and Experiments. Lecture Notes in Computer Science, vol. 2153. Springer, Berlin, Germany, 147--160.
[28]
Chávez, E., Navarro, G., Baeza-Yates, R., and Marroquín, J. 2001. Searching in metric spaces. ACM Comp. Surv. 33, 3, 273--321.
[29]
Chen, L. and Lian, X. 2008. Dynamic skyline queries in metric spaces. In Proceedings of the 11th International Conference on Extending Database Technology. ACM Press, New York, NY, 333--343.
[30]
Chen, L., Zsu, M. T., and Oria, V. 2005. Robust and fast similarity search for moving object trajectories. In Proceedings of the ACM International Conference on Managment of Data (SIGMOD). ACM Press, New York, NY, 491--502.
[31]
Ciaccia, P. and Patella, M. 2002. Searching in metric spaces with user-defined and approximate distances. ACM Trans. Datab. Syst. 27, 4, 398--437.
[32]
Ciaccia, P. and Patella, M. 2009. Principles of information filtering in metric spaces. In Proceedings of the 2nd International Workshop on Similarity Search and Applications (SISAP). IEEE Computer Society Press, Los Alamitos, CA, 99--106.
[33]
Ciaccia, P., Patella, M., and Zezula, P. 1997. M-tree: An efficient access method for similarity search in metric spaces. In Proceedings of the International Conference on Very Large Databases. 426--435.
[34]
Corazza, P. 1999. Introduction to metric-preserving functions. Amer. Math. Monthly 104, 4, 309--23.
[35]
Cormen, T., Stein, C., Rivest, R., and Leiserson, C. 2001. Introduction to Algorithms, 2nd ed. MIT Press, Cambridge, MA.
[36]
Corral, A., Manolopoulos, Y., Theodoridis, Y., and Vassilakopoulos, M. 2000. Closest pair queries in spatial databases. SIGMOD Rec. 29, 2, 189--200.
[37]
Datta, R., Joshi, D., Li, J., and Wang, J. Z. 2008. Image retrieval: Ideas, influences, and trends of the new age. ACM Comp. Surv. 40, 2, 1--60.
[38]
Dayhoff, M. O., Schwartz, R. M., and Orcutt, B. C. 1978. A model for evolutionary change in proteins. Atlas Protein Sequence Structure 5, 345--352.
[39]
Deb, S. 2004. Multimedia Systems and Content-Based Image Retrieval. Information Science Publications.
[40]
Dixon, S. L. and Koehler, R. T. 1999. The hidden component of size in two-dimensional fragment descriptors: Side effects on sampling in bioactive libraries. J. Med. Chem. 42, 15, 2887--2900.
[41]
Dohnal, V., Gennaro, C., Savino, P., and Zezula, P. 2003. D-index: Distance searching index for metric data sets. Multimed. Tools Appl. 21, 1, 9--33.
[42]
Donahue, M., Geiger, D., Liu, T., and Hummel, R. 1996. Sparse representations for image decomposition with occlusions. In Proceedings of the Conference on Computer Vision and Pattern Recognition (CVPR). IEEE Computer Society Press, Los Alamitos, CA, 7--12.
[43]
Dubuisson, M.-P. and Jain, A. 1994. A modified Hausdorff distance for object matching. In Proceedings of the 12th IAPR Conference on Computer Vision and Image Processing. 566--568.
[44]
Eckhardt, A., Skopal, T., and Vojtáš, P. 2009. On fuzzy vs. metric similarity search in complex databases. In Proceedings of the 8th Conference on Flexible Query Answering Systems (FQAS). Lecture Notes in Artificial Intelligence, vol. 5822. Springer, Berlin, Germany, 64--75.
[45]
Faloutsos, C. 1996. Searching Multimedia Databases by Content. Kluwer Academic Publishers, Norwell, MA.
[46]
Faloutsos, C. and Kamel, I. 1994. Beyond uniformity and independence: Analysis of r-trees using the concept of fractal dimension. In Proceedings of the 13th ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems (PODS). ACM Press, New York, NY, 4--13.
[47]
Faloutsos, C. and Lin, K.-I. 1995. Fastmap: A fast algorithm for indexing, data-mining and visualization of traditional and multimedia datasets. In Proceedings of the ACM International Conference on Management of Data (SIGMOD). ACM Press, New York, NY, 163--174.
[48]
Faloutsos, C., Ranganathan, M., and Manolopoulos, Y. 1994. Fast subsequence matching in time-series databases. SIGMOD Rec. 23, 2, 419--429.
[49]
Farago, A., Linder, T., and Lugosi, G. 1993. Fast nearest-neighbor search in dissimilarity spaces. IEEE Trans. Patt. Anal. Mach. Intell. 15, 9, 957--962.
[50]
Foote, J. 1997. A similarity measure for automatic audio classification. In Proceedings of the AAAI Spring Symposium on Intelligent Integration and Use of Text, Image, Video, and Audio Corpora.
[51]
Foote, J. and Silverman, H. 1994. A model distance measure for talker clustering and identification. In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP). IEEE, Computer Society Press, Los Alamitos, CA, 317--320.
[52]
Freeman, M. 2006. Evaluating dataflow and pipelined vector processing architectures for FPGA co-processors. In Proceedings of the 9th EUROMICRO Conference on Digital System Design. IEEE Computer Society Press, Los Alamitos, CA, 127--130.
[53]
Freeman, M., Weeks, M., and Austin, J. 2005. Hardware implementation of similarity functions. In Proceedings of the IADIS International Conference on Applied Computing.
[54]
Fu, A. Y., Wenyin, L., and Deng, X. 2006. Detecting phishing Web pages with visual similarity assessment based on earth mover's distance (EMD). IEEE Trans. Depend. Secure Comput. 3, 4, 301--311.
[55]
Gerstein, M. and Levitt, M. 1998. Comprehensive assessment of automatic structural alignment against a manual standard, the scop classification of proteins. Protein Sci. 7, 2, 445--456.
[56]
Giannopoulos, P. and Veltkamp, R. C. 2002. A pseudo-metric for weighted point sets. In Proceedings of the 7th European Conference on Computer Vision-Part III (ECCV). Springer, Berlin, Germany, 715--730.
[57]
Goh, K.-S., Li, B., and Chang, E. 2002. DynDex: A dynamic and non-metric space indexer. In Proceedings of the 10th ACM International Conference on Multimedia (MM). ACM Press, New York, NY, 466--475.
[58]
Goyal, N., Lifshits, Y., and Schtze, H. 2008. Disorder inequality: A combinatorial approach to nearest neighbor search. In Proceedings of the 1st ACM International Conference on Web Search and Data Mining (WSDM). ACM Press, New York, NY, 25--32.
[59]
Green, P. 1993. SWAT. http://www.phrap.org/phredphrap/swat.html.
[60]
Guo, A. and Siegelmann, H. 2004. Time-warped longest common subsequence algorithm for music retrieval. In Proceedings of the 5th International Conference on Music Information Retrieval.
[61]
Hart, P. 1968. The condensed nearest neighbour rule. IEEE Trans. Inform. Theor. 14, 3, 515--516.
[62]
He, H. and Singh, A. 2006. Closure-tree: An index structure for graph queries. In Proceedings of the 22nd International Conference on Data Engineering (ICDE). IEEE Computer Society Press, Los Alamitos, CA, 38.
[63]
Henikoff, S. and Henikoff, J. 1992. Amino acid substitution matrices from protein blocks. Proc. Nat. Acad. Sci. U.S.A. 89, 22, 10915--10919.
[64]
Higgins, A., Bahler, L., and Porter, J. 1993. Voice identification using nearest-neighbor distance measure. Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing 2. 375--378.
[65]
Hjaltason, G. and Samet, H. 1995. Ranking in spatial databases. In Proceedings of the 4th International Symposium on Advances in Spatial Databases (SSD). Lecture Notes in Computer Science, vol. 951. Springer, Berlin, Germany, 83--95.
[66]
Holm, L. and Sander, C. 1993. Protein structure comparison by alignment of distance matrices. J. Molec. Biol. 233, 1, 123--138.
[67]
Howarth, P. and Ruger, S. 2005. Fractional distance measures for content-based image retrieval. In Proceedings of the 27th European Conference on Information Retrieval Research (ECIR). Lecture Notes in Computer Science, vol. 3408. Springer-Verlag, Berlin, Germany, 447--456.
[68]
Hristescu, G. and Farach-Colton, M. 1999. Cluster-preserving embedding of proteins. Tech. rep. 99--50. Department of Computer Science, Rutgers University, New Brunswick, NJ.
[69]
Hu, N., Dannenberg, R., and Tzanetakis, G. 2003. Polyphonic audio matching and alignment for music retrieval. In Proceedings of the IEEE Workshop on Applications of Signal Processing to Audio and Acoustics. 185--188.
[70]
Huang, B. and Kinsner, W. 2002. ECG frame classification using dynamic time warping. In Proceedings of the Canadian Conference on Electrical and Computer Engineering. Vol. 2. IEEE Computer Society Press, Los Alamitos, CA, 1105--1110.
[71]
Huttenlocher, D., Klanderman, G., and Rucklidge, W. 1993. Comparing images using the Hausdorff distance. IEEE Patt. Anal. Mach. Intell. 15, 9, 850--863.
[72]
Jacobs, D., Weinshall, D., and Gdalyahu, Y. 2000. Classification with nonmetric distances: Image retrieval and class representation. IEEE Patt. Anal. Mach. Intell. 22, 6, 583--600.
[73]
Jagadish, H. V., Ooi, B. C., Tan, K.-L., Yu, C., and Zhang, R. 2005. iDistance: An adaptive B+-tree based indexing method for nearest neighbor search. ACM Trans. Datab. Syst. 30, 2, 364--397.
[74]
Jäkela, F., Schölkopf, B., and Wichmanna, F. A. 2008. Similarity, kernels, and the triangle inequality. J. Math. Psych. 52, 5, 297--303.
[75]
Jesorsky, O., Kirchberg, K. J., and Frischholz, R. 2001. Robust face detection using the Hausdorff distance. In Proceedings of the 3rd International Conference on Audio- and Video-Based Biometric Person Authentication (AVBPA). Lecture Notes in Computer Science, vol. 2091. Springer, Berlin, Germany, 90--95.
[76]
Kabsch, W. 1976. A solution for the best rotation to relate two sets of vectors. Acta Crystal. Sec. A 32, 5, 922--923.
[77]
Kao, D., Bergeron, R., and Sparr, T. 1997. Mapping metric data to multidimensional spaces. Tech. rep. TR 97-13. Department of Computer Science, University of New Hampshire, Durham, NH.
[78]
Ke, Y., Cheng, J., and Ng, W. 2008. Efficient correlation search from graph databases. IEEE Trans. Data Knowl. Eng. 20, 12, 1601--1615.
[79]
Keogh, E., Chakrabarti, K., Pazzani, M., and Mehrotra, S. 2000. Dimensionality reduction for fast similarity search in large time series databases. J. Knowl. Inform. Syst. 3, 263--286.
[80]
Keogh, E. and Ratanamahatana, C. 2005. Exact indexing of dynamic time warping. J. Knowl. Inform. Syst. 7, 3, 358--386.
[81]
Khamsi, M. A. and Kirk, W. A., Eds. 2001. An Introduction to Metric Spaces and Fixed Point Theory. Wiley-Interscience, New York, NY.
[82]
Kim, S.-W., Park, S., and Chu, W. 2001. An index-based approach for similarity search supporting time warping in large sequence databases. In Proceedings of the 17th International Conference on Data Engineering (ICDE). IEEE Computer Society Press, Los Alamitos, CA, 607--614.
[83]
Krumhansl, C. L. 1978. Concerning the applicability of geometric models to similar data: The interrelationship between similarity and spatial density. Psych. Rev. 85, 5, 445--463.
[84]
Kruskal, J. B. 1964. Multidimensional scaling by optimizing goodness of fit to a nonmetric hypothesis. Psychometrika 29, 1, 1--27.
[85]
Lackner, P., Koppensteiner, W. A., Sippl, M. J., and Domingues, F. S. 2000. Prosup: A refined tool for protein structure alignment. Protein Eng. 13, 11, 745--752.
[86]
Lee, C.-H. and Lin, M.-F. 2008. Adaptive similarity measurement using relevance feedback. In Proceedings of the IEEE 8th International Conference on Computer and Information Technology Workshops (CITWORKSHOPS). IEEE Computer Society Press, Los Alamitos, CA, 314--318.
[87]
Levenshtein, I. 1966. Binary codes capable of correcting deletions, insertions and reversals. Sov. Phys. Doklady 10, 707--710.
[88]
Lifshits, Y. 2009. Combinatorial framework for similarity search. In Proceedings of the 2nd International Workshop on Similarity Search and Applications (SISAP). IEEE Computer Society Press, Los Alamitos, CA, 11--17.
[89]
Lipman, D. and Pearson, W. 1985. Rapid and sensitive protein similarity searches. Science 227, 1435--1441.
[90]
Logan, B. and Salomon, A. 2001. A music similarity function based on signal analysis. In Proceedings of the IEEE International Conference on Multimedia and Expo (ICME). 745--748.
[91]
Lu, X. and Jain, A. 2008. Deformation modeling for robust 3d face matching. IEEE Trans. Patt. Anal. Mach. Intell. 30, 8, 1346--1357.
[92]
Mandl, T. 1998. Learning similarity functions in information retrieval. In Proceedings of EUFIT.
[93]
Marcu, D. 2004. A study on metrics and statistical analysis. Studia Univ. BABESBOLYAI, Math. XLIX, 3, 43--74.
[94]
Marzal, A. and Vidal, E. 1993. Computation of normalized edit distance and applications. IEEE Trans. Patt. Anal. Mach. Intell. 15, 9, 926--932.
[95]
Micó, M. L., Oncina, J., and Vidal, E. 1994. A new version of the nearest-neighbour approximating and eliminating search algorithm (AESA) with linear preprocessing time and memory requirements. Patt. Recog. Lett. 15, 1, 9--17.
[96]
Morse, M. and Patel, J. 2007. An efficient and accurate method for evaluating time series similarity. In Proceedings of the ACM International Conference on Managment of Data (SIGMOD). ACM Press, New York, NY, 569--580.
[97]
Mukherjee, A. 1989. Hardware algorithms for determining similarity between two strings. IEEE Trans. Comp. 38, 4, 600--603.
[98]
Needleman, S. and Wunsch, C. 1970. A general method applicable to the search for similarities in the amino acid sequence of two proteins. J. Molec. Biol. 48, 3, 443--453.
[99]
Nikolova, N. and Jaworska, J. 2003. Approaches to measure chemical similarity: A review. SAR Combin. Sci. 22, 10, 1006--1026.
[100]
Ortiz, A. R., Strauss, C. E., and Olmea, O. 2002. Mammoth (matching molecular models obtained from theory): An automated method for model comparison. Protein Sci. 11, 11, 2606--2621.
[101]
Pampalk, E. 2006. Audio-based music similarity and retrieval: Combining a spectral similarity model with information extracted from fluctuation patterns. implementation submitted to the 3rd Annual Music Information Retrieval eXchange (MIREX). In Proceedings of the International Symposium on Music Information Retrieval.
[102]
Papadimitriou, C. H., Tamaki, H., Raghavan, P., and Vempala, S. 1998. Latent semantic indexing: A probabilistic analysis. In Proceedings of the ACM SIGACT-SIGMOD-SIGART Conference on Principles of Database Systems (PODS). ACM Press, New York, NY, 159--168.
[103]
Parker, C., Fern, A., and Tadepalli, P. 2007. Learning for efficient retrieval of structured data with noisy queries. In Proceedings of the 24th International Conference on Machine Learning (ICML). ACM Press, New York, NY, 729--736.
[104]
Perera, D. G. and Li, K. F. 2008. Parallel computation of similarity measures using an FPGA-based processor array. In Proceedings of the 22nd International Conference on Advanced Information Networking and Applications. IEEE Computer Society Press, Los Alamitos, CA, 955--962.
[105]
Pu, J., Kalyanaraman, Y., Jayanti, S., Ramani, K., and Pizlo, Z. 2007. Navigation and discovery in 3D cad repositories. IEEE Comp. Graph. Appl. 27, 4, 38--47.
[106]
Ratanamahatana, C. A. and Tohlong, P. 2006. Speech audio retrieval using voice query. In Proceedings of the 9th International Conference on Asian Digital Libraries (ICADL). Lecture Notes in Computer Science, vol. 4312. Springer-Verlag, Berlin, Germany, 494--497.
[107]
Robinson, D. D., Lyne, P. D., and Richards, W. G. 2000. Partial molecular alignment via local structure analysis. J. Chem. Inform. Comp. Sci. 40, 2, 503--512.
[108]
Rosch, E. 1975. Cognitive reference points. Cog. Psych. 7, 532--547.
[109]
Roth, V., Laub, J., Buhmann, J. M., and Müller, K. R. 2002. Going metric: Denoising pairwise data. In Proceedings of the International Conference on Neural Information Processing Systems (NIPS). 817--824.
[110]
Rothkopf, E. 1957. A measure of stimulus similarity and errors in some paired-associate learning tasks. J. Exper. Psych. 53, 2, 94--101.
[111]
Rubner, Y., Puzicha, J., Tomasi, C., and Buhmann, J. 2001. Emipirical evaluation of dissimilarity measures for color and texture. Comp. Vis. Image Understand. 84, 1, 25--43.
[112]
Rubner, Y. and Tomasi, C. 2001. Perceptual Metrics for Image Database Navigation. Kluwer, Dordrecht, The Netherlands.
[113]
Rubner, Y., Tomasi, C., and Guibas, L. 1998. A metric for distributions with applications to image databases. In Proceedings of the 6th International Conference on Computer Vision (ICCV). 59--66.
[114]
Saha, S. and Bandyopadhyay, S. 2007. MRI brain image segmentation by fuzzy symmetry based genetic clustering technique. In Proceedings of the IEEE Congress on Evolutionary Computation (CEC). IEEE, Computer Society Press, Los Alamitos, CA, 4417--4424.
[115]
Samet, H. 2006. Foundations of Multidimensional and Metric Data Structures. Morgan Kaufmann, San Francisco, CA.
[116]
Santini, S. and Jain, R. 1999. Similarity measures. IEEE Patt. Anal. Mach. Intell. 21, 9, 871--883.
[117]
Sanz, I., Mesiti, M., Guerrini, G., and Berlanga, R. 2008. Fragment-based approximate retrieval in highly heterogeneous xml collections. Data Knowl. Eng. 64, 1, 266--293.
[118]
Schaefer, G., Zhu, S., and Ruszala, S. 2005. Visualization of medical infrared image databases. In Proceedings of the 27th IEEE Annual Conference on Engineering in Medicine and Biology. IEEE, Computer Society Press, Los Alamitos, CA, 634--637.
[119]
Shepard, R. N. 1962. The analysis of proximities: Multidimensional scaling with an unknown distance function I. Psychometrika 27, 2, 125--140.
[120]
Skopal, T. 2004. Pivoting M-tree: A metric access method for efficient similarity search. In Proceedings of the 4th Annual Workshop DATESO'04. 21--31. Also in http://www.ceur-ws.org/Vol-98.
[121]
Skopal, T. 2006. On fast non-metric similarity search by metric access methods. In Proceedings of the 10th International Conference on Extending Database Technology (EDBT). Lecture Notes in Computer Science, vol. 3896. Springer, Berlin, Germany, 718--736.
[122]
Skopal, T. 2007. Unified framework for fast exact and approximate search in dissimilarity spaces. ACM Trans. Datab. Syst. 32, 4, 1--46.
[123]
Skopal, T. and Bustos, B. 2009. On index-free similarity search in metric spaces. In Proceedings of the 20th International Conference on Database and Expert System Applications (DEXA). Lecture Notes in Computer Science, vol. 5690. Springer, Berlin, Germany, 516--531.
[124]
Skopal, T. and Lokoč, J. 2008. NM-tree: Flexible approximate similarity search in metric and non-metric spaces. In Proceedings of the 19th International Conference on Database and Expert Systems Applications (DEXA'08). Lecture Notes in Computer Science, vol. 5181. Springer-Verlag, Berlin, Germany. 312--325.
[125]
Smeaton, A. F., Over, P., and Kraaij, W. 2006. Evaluation campaigns and TRECVid. In Proceedings of the 8th ACM International Workshop on Multimedia Information Retrieval (MIR). ACM Press, New York, NY, 321--330.
[126]
Smith, T. and Waterman, M. 1981. Identification of common molecular subsequences. J. Molec. Biol. 147, 1, 195--197.
[127]
Su, M.-C. and Chou, C.-H. 2001. A modified version of the k-means algorithm with a distance based on cluster symmetry. IEEE Trans. Patt. Anal. Mach. Intell. 23, 6, 674--680.
[128]
Suyoto, I. S. H., Uitdenbogerd, A. L., and Scholer, F. 2007. Effective retrieval of polyphonic audio with polyphonic symbolic queries. In Proceedings of the International Workshop on Multimedia Information Retrieval (MIR). ACM Press, New York, NY, 105--114.
[129]
Tao, Y., Papadias, D., and Lian, X. 2004. Reverse knn search in arbitrary dimensionality. In Proceedings of the 30th International Conference on Very Large Data Bases. VLDB Endowment, 744--755.
[130]
Taylor, W. and Orengo, C. 1989. Protein structure alignment. J. Molec. Biol. 208, 1, 1--22.
[131]
Tekli, J., Chbeir, R., and Yetongnon, K. 2007. A hybrid approach for XML similarity. In Proceedings of the 33rd Conference on Current Trends in Theory and Practice of Computer Science (SOFSEM). Lecture Notes in Computer Science, vol. 4362. Springer, Berlin, Germany, 783--795.
[132]
Tsang, W., Corboy, A., Lee, K., Raicu, D., and Furst, J. 2005. Texture-based image retrieval for computerized tomography databases. In Proceedings of the 18th IEEE Symposium on Computer-Based Medical Systems (CBMS). IEEE Computer Society Press, Los Alamitos, CA, 593--598.
[133]
Tsukada, S. and Watanabe, T. 1995. Speech recognition device for calculating a corrected similarity partially dependent on circumstances of production of input patterns. U.S. Patent No. 5432886. NEC Corporation.
[134]
Tuzcu, V. and Nas, S. 2005. Dynamic time warping as a novel tool in pattern recognition of ECG changes in heart rhythm disturbances. In Proceedings of the IEEE International Conference on Systems, Man and Cybernetics. IEEE Computer Society Press, Los Alamitos, CA, 182--186.
[135]
Tversky, A. 1977. Features of similarity. Psych. Rev. 84, 4, 327--352.
[136]
Tversky, A. and Gati, I. 1982. Similarity, separability, and the triangle inequality. Psych. Rev. 89, 2, 123--154.
[137]
Typke, R., Giannopoulos, P., Veltkamp, R., Wiering, F., and van Oostrum, R. 2003. Using transportation distances for measuring melodic similarity. In Proceedings of the 4th International Conference on Music Information Retrieval (ISMIR). 107--114.
[138]
Uhlmann, J. 1991. Satisfying general proximity/similarity queries with metric trees. Inform. Process. Lett. 40, 4, 175--179.
[139]
Vacha, P. and Haindl, M. 2008. Illumination invariants based on Markov random fields. In Proceedings of the 19th International Conference on Pattern Recognition (ICPR). IEEE Computer Society Press, Los Alamitos, CA.
[140]
Vidal, E. 1986. An algorithm for finding nearest neighbours in (approximately) constant average time. Patt. Recog. Lett. 4, 3, 145--157.
[141]
Vlachos, M., Hadjieleftheriou, M., Gunopulos, D., and Keogh, E. 2003. Indexing multi-dimensional time-series with support for multiple distance measures. In Proceedings of the 9th ACM International Conference on Knowledge Discovery and Data Mining (KDD). ACM Press, New York, NY, 216--225.
[142]
Vlachos, M., Kollios, G., and Gunopulos, D. 2005. Elastic translation invariant matching of trajectories. Mach. Learn. 58, 2--3, 301--334.
[143]
Vojtáš, P. and Eckhardt, A. 2009. Using tuneable fuzzy similarity in non-metric search. In Proceedings of the 2nd International Workshop on Similarity Search and Applications (SISAP). IEEE, Computer Society Press, Los Alamitos, CA, 163--164.
[144]
Wang, X., Wang, J. T. L., Lin, K. I., Shasha, D., Shapiro, B. A., and Zhang, K. 2000. An index structure for data mining and clustering. Knowl. Inform. Syst. 2, 2, 161--184.
[145]
Weber, R., Schek, H.-J., and Blott, S. 1998. A quantitative analysis and performance study for similarity-search methods in high-dimensional spaces. In Proceedings of the 24rd International Conference on Very Large Data Bases. Morgan Kaufmann, San Francisco, CA, 194--205.
[146]
Wild, D. J. and Willett, P. 1996. Similarity searching in files of three-dimensional chemical structures. alignment of molecular electrostatic potential fields with a genetic algorithm. J. Chem. Inform. Comp. Sci. 36, 2, 159--167.
[147]
Willett, P. 1998. Structural similarity measures for database searching. In Encyclopedia of Computational Chemistry. John Wiley, New York, NY, 2748--2756.
[148]
Willett, P., Barnard, J. M., and Downs, G. M. 1998. Chemical similarity searching. J. Chem. Inform. Comp. Sci. 38, 6, 983--996.
[149]
Wilson, D. L. 1972. Asymptotic properties of nearest neighbor rules using edited data. IEEE Trans. Syst. Man Cybernet. 2, 3, 408--421.
[150]
Wirotius, M., Ramel, J.-Y., and Vincent, N. 2004. Improving DTW for online handwritten signature verification. In Proceedings of the International Conference on Image Analysis and Recognition (ICIAR). Lecture Notes in Computer Science, vol. 3212. Springer-Verlag, Berlin Germany, 786--793.
[151]
Yan, X., Yu, P., and Han, J. 2005. Substructure similarity search in graph databases. In Proceedings of the ACM International Conference on Management of Data (SIGMOD). ACM Press, New York, NY, 766--777.
[152]
Yang, K. and Shahabi, C. 2004. A PCA-based similarity measure for multivariate time series. In Proceedings of the 2nd ACM International Workshop on Multimedia Databases (MMDB). ACM Press, New York, NY, 65--74.
[153]
Yi, B.-K., Jagadish, H. V., and Faloutsos, C. 1998. Efficient retrieval of similar time sequences under time warping. In Proceedings of the International Conference on Data Engineering (ICDE). 201--208.
[154]
Zezula, P., Amato, G., Dohnal, V., and Batko, M. 2005. Similarity Search: The Metric Space Approach (Advances in Database Systems). Springer, Berlin, Germany.
[155]
Zhang, B. and Srihari, S. 2002. A fast algorithm for finding k-nearest neighbors with non-metric dissimilarity. In Proceedings of the 8th International Workshop on Frontiers in Handwriting Recognition (IWFHR). IEEE Computer Society Press, Los Alamitos, CA, 13.
[156]
Zhao, Q., Hoi, S. C. H., Liu, T.-Y., Bhowmick, S. S., Lyu, M. R., and Ma, W.-Y. 2006. Time-dependent semantic similarity measure of queries using historical click-through data. In Proceedings of the 15th International Conference on World Wide Web (WWW). ACM Press, New York, NY, 543--552.
[157]
Zhou, X., Zhou, X., and Shen, H. 2007. Efficient similarity search by summarization in large video databse. In Proceedings of the 18th Australasian Database Conference (ADC). Australian Computer Society, 161--167.
[158]
Zuo, X. and Jin, X. 2007. General hierarchical model (GHM) to measure similarity of time series. SIGMOD Rec. 36, 1, 13--18.

Cited By

View all
  • (2024)Worst-Case-Optimal Similarity Joins on Graph DatabasesProceedings of the ACM on Management of Data10.1145/36392942:1(1-26)Online publication date: 26-Mar-2024
  • (2024)Filtering with relational similarityInformation Systems10.1016/j.is.2024.102345122(102345)Online publication date: May-2024
  • (2024)Visualizations for universal deep-feature representations: survey and taxonomyKnowledge and Information Systems10.1007/s10115-023-01933-366:2(811-840)Online publication date: 1-Feb-2024
  • Show More Cited By

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Computing Surveys
ACM Computing Surveys  Volume 43, Issue 4
October 2011
556 pages
ISSN:0360-0300
EISSN:1557-7341
DOI:10.1145/1978802
Issue’s Table of Contents
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 18 October 2011
Accepted: 01 January 2010
Revised: 01 September 2009
Received: 01 May 2009
Published in CSUR Volume 43, Issue 4

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. Similarity retrieval
  2. approximate and exact search
  3. nonmetric distances
  4. similarity measuring

Qualifiers

  • Research-article
  • Research
  • Refereed

Funding Sources

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)31
  • Downloads (Last 6 weeks)4
Reflects downloads up to 23 Nov 2024

Other Metrics

Citations

Cited By

View all
  • (2024)Worst-Case-Optimal Similarity Joins on Graph DatabasesProceedings of the ACM on Management of Data10.1145/36392942:1(1-26)Online publication date: 26-Mar-2024
  • (2024)Filtering with relational similarityInformation Systems10.1016/j.is.2024.102345122(102345)Online publication date: May-2024
  • (2024)Visualizations for universal deep-feature representations: survey and taxonomyKnowledge and Information Systems10.1007/s10115-023-01933-366:2(811-840)Online publication date: 1-Feb-2024
  • (2023)Unconventional application of k-means for distributed approximate similarity searchInformation Sciences: an International Journal10.1016/j.ins.2022.11.024619:C(208-234)Online publication date: 1-Jan-2023
  • (2023)Mathematical Methods for the Shape Analysis and Indexing of Tangible CH ArtefactsMathematical Modeling in Cultural Heritage10.1007/978-981-99-3679-3_7(99-120)Online publication date: 1-Jun-2023
  • (2022)Open dataset discovery using context-enhanced similarity searchKnowledge and Information Systems10.1007/s10115-022-01751-z64:12(3265-3291)Online publication date: 1-Dec-2022
  • (2022)Concept of Relational Similarity SearchSimilarity Search and Applications10.1007/978-3-031-17849-8_8(89-103)Online publication date: 28-Sep-2022
  • (2022)Hybrid (CPU/GPU) Exact Nearest Neighbors Search in High-Dimensional SpacesArtificial Intelligence Applications and Innovations10.1007/978-3-031-08337-2_10(112-123)Online publication date: 10-Jun-2022
  • (2021)Similarity vs. Relevance: From Simple Searches to Complex DiscoverySimilarity Search and Applications10.1007/978-3-030-89657-7_9(104-117)Online publication date: 22-Oct-2021
  • (2020)Recent (Dis)similarity Measures Between Histograms for Recognizing Many Classes of Plant LeavesPattern Recognition Applications in Engineering10.4018/978-1-7998-1839-7.ch008(180-203)Online publication date: 2020
  • Show More Cited By

View Options

Login options

Full Access

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media