Article

Part-of-speech tagging using virtual evidence and negative training

Authors: Sheila M. Reynolds, Jeff A. Bilmes

HLT '05: Proceedings of the conference on Human Language Technology and Empirical Methods in Natural Language Processing

Pages 459 - 466

https://doi.org/10.3115/1220575.1220633

Published: 06 October 2005

Abstract

We present a part-of-speech tagger which introduces two new concepts: virtual evidence in the form of an "observed child" node, and negative training data to learn the conditional probabilities for the observed child. Associated with each word is a flexible feature-set which can include binary flags, neighboring words, etc. The conditional probability of Tag given Word + Features is implemented using a factored language-model with back-off to avoid data sparsity problems. This model remains within the framework of Dynamic Bayesian Networks (DBNs) and is conditionally-structured, but resolves the label bias problem inherent in the conditional Markov model (CMM).
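The "observed child" idea in the abstract can be illustrated with a toy chain model. The sketch below is not the authors' tagger: it assumes one common reading of virtual evidence, in which each tag has a binary child node C that is always observed as 1, and P(C=1 | tag, word) acts as a non-negative score that need not sum to one over tags. All names (`TRANS`, `EVIDENCE`, `posterior`, `best_tagging`) and all numbers are invented for illustration, and brute-force enumeration stands in for proper DBN inference.

```python
from itertools import product

TAGS = ("DET", "NOUN", "VERB")

# Hypothetical transition probabilities P(tag_t | tag_{t-1}); "<s>" marks the start.
TRANS = {
    "<s>":  {"DET": 0.6,  "NOUN": 0.3, "VERB": 0.1},
    "DET":  {"DET": 0.05, "NOUN": 0.9, "VERB": 0.05},
    "NOUN": {"DET": 0.1,  "NOUN": 0.3, "VERB": 0.6},
    "VERB": {"DET": 0.5,  "NOUN": 0.4, "VERB": 0.1},
}

# Hypothetical virtual-evidence scores P(C=1 | tag, word).
# Each row is a score per tag, not a distribution over tags.
EVIDENCE = {
    "the":   {"DET": 0.9,  "NOUN": 0.01, "VERB": 0.01},
    "dog":   {"DET": 0.01, "NOUN": 0.8,  "VERB": 0.1},
    "barks": {"DET": 0.01, "NOUN": 0.3,  "VERB": 0.7},
}


def sequence_score(words, tags):
    """Unnormalized joint score: product of transition and evidence terms."""
    score = 1.0
    prev = "<s>"
    for word, tag in zip(words, tags):
        score *= TRANS[prev][tag] * EVIDENCE[word][tag]
        prev = tag
    return score


def posterior(words):
    """Posterior over whole tag sequences given C=1 at every position.

    Normalization happens once over complete sequences, rather than
    locally at each position.
    """
    scores = {
        tags: sequence_score(words, tags)
        for tags in product(TAGS, repeat=len(words))
    }
    z = sum(scores.values())
    return {tags: s / z for tags, s in scores.items()}


def best_tagging(words):
    """Return the tag sequence with the highest posterior probability."""
    post = posterior(words)
    return max(post, key=post.get)
```

With these made-up tables, `best_tagging(["the", "dog", "barks"])` returns `("DET", "NOUN", "VERB")`. Because the evidence scores multiply into a sequence-level score that is normalized only once, a position with weak evidence lowers the score of every path through it, which is the intuition usually given for why such models avoid the per-state normalization behind label bias.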

Cited By

Umansky-Pesin S, Reichart R, Rappoport A, Joshi A, Huang C, Jurafsky D (2010). A multi-domain web-based algorithm for POS tagging of unknown words. Proceedings of the 23rd International Conference on Computational Linguistics: Posters, 1274-1282. DOI: 10.5555/1944566.1944712. Online publication date: 23-Aug-2010
https://dl.acm.org/doi/10.5555/1944566.1944712
Li X, Koehn P, Mihalcea R (2009). On the use of virtual evidence in conditional random fields. Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing: Volume 3, 1289-1297. DOI: 10.5555/1699648.1699675. Online publication date: 6-Aug-2009
https://dl.acm.org/doi/10.5555/1699648.1699675

Published In

HLT '05: Proceedings of the conference on Human Language Technology and Empirical Methods in Natural Language Processing

October 2005

1054 pages

Conference Chair:
Raymond J. Mooney
The University of Texas at Austin

Publisher

Association for Computational Linguistics

United States

Acceptance Rates

HLT '05 Paper Acceptance Rate 127 of 402 submissions, 32%

Overall Acceptance Rate 240 of 768 submissions, 31%
