Design, implementation and evaluation of a new semantic similarity metric combining features and intrinsic information content

G Pirró, N Seco - On the Move to Meaningful Internet Systems: OTM …, 2008 - Springer
Abstract
In many research fields such as Psychology, Linguistics, Cognitive Science, Biomedicine, and Artificial Intelligence, computing semantic similarity between words is an important issue. In this paper we present a new semantic similarity metric that draws on notions from early work on a feature-based theory of similarity and translates them into the information-theoretic domain, which leverages the notion of Information Content (IC). In particular, the proposed metric exploits the notion of intrinsic IC, which quantifies IC values by scrutinizing how concepts are arranged in an ontological structure. To evaluate this metric, we conducted an online experiment asking the community of researchers to rank a list of 65 word pairs. The experiment’s web setup allowed us to collect 101 similarity ratings and to differentiate between native and non-native English speakers. Such a large and diverse dataset makes it possible to evaluate similarity metrics with confidence by correlating them with human assessments. Experimental evaluations using WordNet indicate that our metric, coupled with the notion of intrinsic IC, yields results above the state of the art. Moreover, the intrinsic IC formulation also improves the accuracy of other IC-based metrics. We implemented our metric and several others in the Java WordNet Similarity Library.
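The abstract does not state any formulas, so the following is only a minimal sketch of how an intrinsic IC value and a feature-inspired IC similarity might be computed from the structure of a taxonomy alone. It assumes the commonly cited intrinsic IC form, 1 - log(hypo(c) + 1) / log(N), where hypo(c) is the number of descendants of c and N is the total number of concepts, and a similarity of the form 3·IC(subsumer) - IC(c1) - IC(c2); both should be checked against the paper itself. The toy taxonomy and every function name are hypothetical and are not taken from WordNet or from the Java WordNet Similarity Library.

```python
import math

# Hypothetical toy taxonomy (child -> parent); not taken from WordNet.
PARENT = {
    "entity": None,
    "animal": "entity",
    "vehicle": "entity",
    "dog": "animal",
    "cat": "animal",
    "car": "vehicle",
}


def ancestors(concept):
    """Return the concept followed by its ancestors up to the root."""
    chain = []
    while concept is not None:
        chain.append(concept)
        concept = PARENT[concept]
    return chain


def hyponym_count(concept):
    """Count the proper descendants of a concept in the toy taxonomy."""
    return sum(1 for c in PARENT if c != concept and concept in ancestors(c))


def intrinsic_ic(concept):
    """Assumed intrinsic IC: 1 - log(hypo(c) + 1) / log(total concepts).

    Leaves get IC 1 (maximally specific); the root gets IC 0.
    """
    return 1.0 - math.log(hyponym_count(concept) + 1) / math.log(len(PARENT))


def most_specific_subsumer(c1, c2):
    """Return the deepest concept that subsumes both c1 and c2."""
    seen = set(ancestors(c1))
    for c in ancestors(c2):  # walks upward from c2, so the first hit is deepest
        if c in seen:
            return c
    return None


def similarity(c1, c2):
    """Assumed feature-inspired form: shared IC minus each concept's distinctive IC."""
    if c1 == c2:
        return 1.0
    shared = intrinsic_ic(most_specific_subsumer(c1, c2))
    return 3.0 * shared - intrinsic_ic(c1) - intrinsic_ic(c2)


if __name__ == "__main__":
    for pair in [("dog", "cat"), ("dog", "car"), ("dog", "dog")]:
        print(pair, round(similarity(*pair), 3))
```

In this sketch no corpus statistics are needed: the IC values depend only on how concepts are arranged in the hierarchy. The raw scores are not normalized to [0, 1]; what matters for an evaluation of the kind described above is how well their ordering correlates with human ratings.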