Word sense disambiguation using OntoNotes: an empirical study

Published: 25 October 2008

Abstract

The accuracy of current word sense disambiguation (WSD) systems is affected by the fine-grained sense inventory of WordNet as well as a lack of training examples. Using the WSD examples provided through OntoNotes, we conduct the first large-scale WSD evaluation involving hundreds of word types and tens of thousands of sense-tagged examples, while adopting a coarse-grained sense inventory. We show that though WSD systems trained with a large number of examples can obtain a high level of accuracy, they nevertheless suffer a substantial drop in accuracy when applied to a different domain. To address this issue, we propose combining a domain adaptation technique using feature augmentation with active learning. Our results show that this approach is effective in reducing the annotation effort required to adapt a WSD system to a new domain. Finally, we propose that one can maximize the dual benefits of reducing the annotation effort while ensuring an increase in WSD accuracy, by only performing active learning on the set of most frequently occurring word types.
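
The abstract does not spell out how feature augmentation and active learning fit together, so the following is only a minimal sketch under stated assumptions. It assumes a common augmentation scheme in which each feature is copied into a shared ("general") version and a domain-specific version, and it assumes least-confidence uncertainty sampling as the active learning criterion. The function names, feature strings, sense labels, and probabilities are all hypothetical and are not taken from the paper.

```python
from typing import Dict, List

Features = Dict[str, float]


def augment(features: Features, domain: str) -> Features:
    """Copy every feature into a shared version and a domain-specific version."""
    out: Features = {}
    for name, value in features.items():
        out[f"general:{name}"] = value   # shared across domains
        out[f"{domain}:{name}"] = value  # only fires for this domain
    return out


def least_confident(posteriors: List[Dict[str, float]], k: int) -> List[int]:
    """Return indices of the k examples whose most probable sense is least certain."""
    ranked = sorted(range(len(posteriors)),
                    key=lambda i: max(posteriors[i].values()))
    return ranked[:k]


# Hypothetical features for one occurrence of "bank" in each domain.
src = augment({"left_word=river": 1.0}, "source")
tgt = augment({"left_word=central": 1.0}, "target")

# Hypothetical classifier posteriors over coarse senses for three
# unlabeled target-domain examples (assumed values, not real results).
pool = [
    {"bank.1": 0.90, "bank.2": 0.10},
    {"bank.1": 0.55, "bank.2": 0.45},
    {"bank.1": 0.70, "bank.2": 0.30},
]

# Pick the example the classifier is least sure about for annotation.
to_annotate = least_confident(pool, k=1)
```

In a loop of this kind, the selected target-domain examples would be labeled, augmented with the "target" prefix, and added to the training set before retraining. Restricting the loop to the most frequent word types, as the abstract proposes, would amount to filtering which words get a pool at all.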

Published In

EMNLP '08: Proceedings of the Conference on Empirical Methods in Natural Language Processing
October 2008
1129 pages

Publisher

Association for Computational Linguistics

United States

Qualifiers

  • Research-article

Acceptance Rates

Overall Acceptance Rate 73 of 234 submissions, 31%

Cited By

  • (2011) Reducing the need for double annotation. Proceedings of the 5th Linguistic Annotation Workshop, pages 65-73. DOI: 10.5555/2018966.2018974. Online publication date: 23-Jun-2011
  • (2011) Identification of domain-specific senses in a machine-readable dictionary. Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies: short papers - Volume 2, pages 552-557. DOI: 10.5555/2002736.2002845. Online publication date: 19-Jun-2011
  • (2010) Jointly modeling WSD and SRL with Markov logic. Proceedings of the 23rd International Conference on Computational Linguistics, pages 161-169. DOI: 10.5555/1873781.1873800. Online publication date: 23-Aug-2010
  • (2010) Improving semantic role labeling with word sense. Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics, pages 246-249. DOI: 10.5555/1857999.1858029. Online publication date: 2-Jun-2010
  • (2009) SemEval-2010 task 17. Proceedings of the Workshop on Semantic Evaluations: Recent Achievements and Future Directions, pages 123-128. DOI: 10.5555/1621969.1621991. Online publication date: 4-Jun-2009
  • (2009) Supervised domain adaption for WSD. Proceedings of the 12th Conference of the European Chapter of the Association for Computational Linguistics, pages 42-50. DOI: 10.5555/1609067.1609071. Online publication date: 30-Mar-2009
  • (2009) Semi-supervised Clustering for Word Instances and Its Effect on Word Sense Disambiguation. Proceedings of the 10th International Conference on Computational Linguistics and Intelligent Text Processing, pages 266-279. DOI: 10.1007/978-3-642-00382-0_22. Online publication date: 17-Feb-2009
