Closing the gap in WSD: supervised results with unsupervised methods
Abstract
Word-Sense Disambiguation (WSD) holds promise for many NLP applications requiring
broad-coverage language understanding, such as summarization (Barzilay and
Elhadad, 1997) and question answering (Ramakrishnan et al., 2003). Recent studies
have also shown that WSD can benefit machine translation (Vickrey et al., 2005) and
information retrieval (Stokoe, 2005). Much work has focused on the computational
treatment of sense ambiguity, primarily using data-driven methods. The most accurate
WSD systems to date are supervised and rely on the availability of sense-labeled
training data. This restriction poses a significant barrier to widespread use of WSD
in practice, since such data is extremely expensive to acquire for new languages and
domains.
Unsupervised WSD holds the key to enabling such applications, as it does not require
sense-labeled data. However, unsupervised methods fall far behind supervised ones
in both accuracy and ease of use. In this thesis we explore the reasons for this gap
and present solutions to close it. We hypothesize that one of the main
problems with unsupervised WSD is its lack of a standard formulation and of the
general-purpose tools common to supervised methods. As a first step, we examine existing approaches
to unsupervised WSD, with the aim of detecting independent principles that
can be utilized in a general framework. We investigate ways of leveraging the diversity
of existing methods, using ensembles, a common tool in the supervised learning
framework. This approach allows us to achieve accuracy beyond that of the individual
methods, without the need for extensive modification of the underlying systems.
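
As a rough illustration of the ensemble idea, the sketch below combines several sense predictors by simple majority vote. It is not the combination scheme used in the thesis: the function names, the toy predictors, the sense labels, and the tie-breaking rule (return a caller-supplied fallback sense) are all assumptions made for this example.

```python
from collections import Counter
from typing import Callable, Sequence

# A "system" is any function that maps a context (a list of tokens) to a
# sense label. All names here are hypothetical and purely illustrative.
Disambiguator = Callable[[Sequence[str]], str]


def majority_vote(
    systems: Sequence[Disambiguator],
    context: Sequence[str],
    fallback: str,
) -> str:
    """Return the sense chosen by most systems; use `fallback` on a tie."""
    votes = Counter(system(context) for system in systems)
    if not votes:
        # No systems were supplied, so there is nothing to vote on.
        return fallback
    ranked = votes.most_common(2)
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        # The two top senses received equally many votes.
        return fallback
    return ranked[0][0]


# Three toy predictors for the ambiguous word "bank" (assumed, not real systems).
def by_money(context: Sequence[str]) -> str:
    return "bank/finance" if "money" in context else "bank/river"


def by_water(context: Sequence[str]) -> str:
    return "bank/river" if "water" in context else "bank/finance"


def always_finance(context: Sequence[str]) -> str:
    return "bank/finance"


SYSTEMS = [by_money, by_water, always_finance]
```

Because each system is treated as a black box, the underlying methods need no modification; only their output labels are combined.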
Our examination of existing unsupervised approaches highlights the importance of
using the predominant sense in case of uncertainty, and the effectiveness of statistical
similarity methods as a tool for WSD. However, it also serves to emphasize the need for
a way to merge and combine learning elements, and the potential of a supervised-style
approach to the problem. Relying on existing methods does not take full advantage of
the insights gained from the supervised framework.
We therefore present an unsupervised WSD system which circumvents the question
of which disambiguation method to use, the main source of discrepancy in unsupervised
WSD, and deals directly with the data. Our method uses statistical and semantic
similarity measures to produce labeled training data in a completely unsupervised fashion.
This allows the training and use of any standard supervised classifier for the actual
disambiguation. Classifiers trained with our method significantly outperform those trained
on data generated by other methods, and represent a substantial step toward bridging the accuracy
gap between supervised and unsupervised methods.
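
The general shape of such a data-generation step can be sketched as follows. This is only an illustration under strong assumptions: the sense signatures, the sense labels, the use of Jaccard overlap as a stand-in for the statistical and semantic similarity measures, and the abstention margin are all hypothetical and are not taken from the thesis.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical sense signatures: words assumed to be related to each sense.
SIGNATURES: Dict[str, Set[str]] = {
    "bank/finance": {"money", "loan", "deposit", "account"},
    "bank/river": {"water", "river", "shore", "fishing"},
}


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard overlap, used here as a simple stand-in similarity measure."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def auto_label(
    context: List[str],
    signatures: Dict[str, Set[str]],
    margin: float = 0.05,
) -> Optional[str]:
    """Label a context with its most similar sense, or abstain (None)
    when the best sense does not beat the runner-up by `margin`."""
    tokens = set(context)
    scored = sorted(
        ((jaccard(tokens, sig), sense) for sense, sig in signatures.items()),
        reverse=True,
    )
    if not scored:
        return None
    best_score, best_sense = scored[0]
    runner_up_score = scored[1][0] if len(scored) > 1 else 0.0
    if best_score - runner_up_score < margin:
        return None
    return best_sense


def build_training_set(
    contexts: List[List[str]],
    signatures: Dict[str, Set[str]],
) -> List[Tuple[List[str], str]]:
    """Keep only the contexts that could be labeled with some confidence."""
    labeled = []
    for context in contexts:
        sense = auto_label(context, signatures)
        if sense is not None:
            labeled.append((context, sense))
    return labeled
```

The resulting list of (context, sense) pairs has the same form as hand-labeled training data, so it could be passed to any standard supervised classifier.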
Finally, we address a major drawback of classical unsupervised systems: their reliance
on a fixed sense inventory and lexical resources. This dependence is
a substantial limitation for unsupervised methods in cases where such resources are unavailable.
Unfortunately, these are exactly the settings in which unsupervised methods are
most needed. Unsupervised sense discrimination, which does not share those restrictions,
presents a promising solution to the problem. We therefore develop an unsupervised
sense discrimination system. We base our system on a well-studied probabilistic
generative model, Latent Dirichlet Allocation (Blei et al., 2003), which has many of
the advantages of supervised frameworks. The model’s probabilistic nature lends itself
to easy combination and extension, and its generative aspect is well suited to linguistic
tasks. Our model achieves state-of-the-art performance on the unsupervised sense
induction task, while remaining independent of any fixed sense inventory, and thus
represents a fully unsupervised, general-purpose WSD tool.