Abstract
Determining the semantic similarity between concept pairs is an important task in many language-related problems. In the biomedical field, several approaches have been proposed that assess the semantic similarity between concepts by exploiting the knowledge provided by a domain ontology. In this paper, some of those approaches are studied using the taxonomical structure of a biomedical ontology (SNOMED-CT). Then, a new measure is presented that computes the amount of overlapping and non-overlapping taxonomical knowledge between concept pairs. The performance of our proposal is compared against related measures using a set of standard benchmarks of manually ranked terms. The correlation between the results obtained by the computerized approaches and the manual ranking shows that our proposal clearly outperforms previous work.
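The abstract does not give the formula, so the following is only an illustrative sketch of how "overlapping and non-overlapping taxonomical knowledge" could be quantified. It assumes that a concept's taxonomical knowledge is the set made of the concept and all its ancestors, and scores a pair as the negative base-2 log of the fraction of non-shared concepts in the union of both sets. It then mirrors the described evaluation by correlating scores with human ratings. The toy taxonomy, the ratings, the formula and the helper names (`ancestors`, `overlap_similarity`, `pearson`) are all assumptions for illustration, not the paper's definitions or SNOMED-CT data.

```python
import math

# Hypothetical toy is-a taxonomy (concept -> list of direct parents).
# Illustrative only; this is NOT real SNOMED-CT content.
TAXONOMY = {
    "root": [],
    "disorder": ["root"],
    "infection": ["disorder"],
    "viral_infection": ["infection"],
    "bacterial_infection": ["infection"],
    "influenza": ["viral_infection"],
    "measles": ["viral_infection"],
    "tuberculosis": ["bacterial_infection"],
}


def ancestors(concept, taxonomy):
    """Return the concept together with all of its ancestors (handles multiple parents)."""
    seen = set()
    stack = [concept]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(taxonomy[current])
    return seen


def overlap_similarity(c1, c2, taxonomy):
    """Assumed measure: -log2(non-shared concepts / all concepts in both ancestor sets).

    The more ancestors the two concepts share, the smaller the non-shared
    fraction and the higher the score.
    """
    a1, a2 = ancestors(c1, taxonomy), ancestors(c2, taxonomy)
    union = len(a1 | a2)
    non_shared = union - len(a1 & a2)
    if non_shared == 0:
        return math.inf  # identical ancestor sets: maximal similarity
    return -math.log2(non_shared / union)


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    pairs = [
        ("influenza", "measles"),
        ("viral_infection", "bacterial_infection"),
        ("influenza", "tuberculosis"),
    ]
    # Hypothetical human ratings for illustration only (not benchmark data).
    human = [3.5, 2.5, 1.5]
    computed = [overlap_similarity(a, b, TAXONOMY) for a, b in pairs]
    for (a, b), score in zip(pairs, computed):
        print(f"{a} / {b}: {score:.3f}")
    print(f"Pearson r = {pearson(computed, human):.3f}")
```

With this toy taxonomy, influenza and measles share four of the six concepts in their combined ancestor sets and score log2(3) ≈ 1.585, higher than influenza and tuberculosis (three shared out of seven), which score log2(7/4) ≈ 0.807. A real evaluation would replace the toy taxonomy with the ontology's is-a hierarchy and the ratings with a benchmark's manual scores.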
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Batet, M., Sanchez, D., Valls, A., Gibert, K. (2010). Exploiting Taxonomical Knowledge to Compute Semantic Similarity: An Evaluation in the Biomedical Domain. In: García-Pedrajas, N., Herrera, F., Fyfe, C., Benítez, J.M., Ali, M. (eds) Trends in Applied Intelligent Systems. IEA/AIE 2010. Lecture Notes in Computer Science, vol 6096. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-13022-9_28
DOI: https://doi.org/10.1007/978-3-642-13022-9_28
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-13021-2
Online ISBN: 978-3-642-13022-9
eBook Packages: Computer Science, Computer Science (R0)