Abstract
Adopting the contextualized hidden Markov model (CHMM) framework for unsupervised Russian POS tagging, we investigate how left, right, and unambiguous context can be used within that framework. We propose a backoff smoothing method that incorporates all three types of context into the estimation of transition probabilities during expectation-maximization (EM) training. The resulting model achieves overall and disambiguation accuracies comparable to those of a CHMM that uses the classic backoff smoothing method for HMM-based POS tagging from [17].
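To make the idea of backoff smoothing for transition probabilities concrete, the following is a minimal, generic sketch and not the method proposed in the paper or the exact formula from [17]. It assumes fractional (expected) tag n-gram counts such as those gathered in an EM E-step, and it mixes trigram, bigram, and unigram relative frequencies with fixed weights. The function names, the `counts` layout, and the weights `(0.6, 0.3, 0.1)` are all hypothetical choices made for illustration.

```python
def relative_freq(counts, context, tag, tagset):
    """Relative frequency of `tag` after `context`, or None if the context is unseen.

    `counts` maps tag tuples of length 1 to 3 to (possibly fractional) counts.
    """
    denom = sum(counts.get(context + (t,), 0.0) for t in tagset)
    if denom == 0:
        return None
    return counts.get(context + (tag,), 0.0) / denom


def backoff_transition_prob(tag, prev2, prev1, counts, tagset,
                            weights=(0.6, 0.3, 0.1)):
    """Estimate P(tag | prev2, prev1) by mixing trigram, bigram and unigram estimates.

    The weights are illustrative constants (an assumption of this sketch).
    Levels whose context was never observed are skipped and the remaining
    weights are renormalized, so the result is still a distribution over tags.
    """
    contexts = [(prev2, prev1), (prev1,), ()]
    estimates = [relative_freq(counts, c, tag, tagset) for c in contexts]
    usable = [(w, p) for w, p in zip(weights, estimates) if p is not None]
    if not usable:
        # No counts at all: fall back to a uniform distribution.
        return 1.0 / len(tagset)
    norm = sum(w for w, _ in usable)
    return sum(w * p for w, p in usable) / norm


# Toy expected counts over a three-tag set (made-up numbers).
tagset = ["N", "V", "A"]
counts = {
    ("N",): 5.0, ("V",): 3.0, ("A",): 2.0,
    ("N", "V"): 2.5, ("N", "N"): 1.5, ("A", "N"): 2.0,
    ("A", "N", "V"): 1.2, ("A", "N", "N"): 0.8,
}
p = backoff_transition_prob("V", "A", "N", counts, tagset)
```

In this toy setup, a context that was never observed at the trigram or bigram level falls back entirely to the unigram estimate. A contextualized model would condition on richer context than the two preceding tags, which this sketch does not attempt to reproduce.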
References
1. Abend, O., Reichart, R., Rappoport, A.: Improved unsupervised POS induction through prototype discovery. In: Proceedings of the 48th ACL (2010)
2. Adler, M.: Hebrew Morphological Disambiguation. Ph.D. thesis, University of the Negev (2007)
3. Banko, M., Moore, R.C.: Part of speech tagging in context. In: Proceedings of the 20th International Conference on Computational Linguistics (2004)
4. Berg-Kirkpatrick, T., Bouchard-Côté, A., DeNero, J., Klein, D.: Painless unsupervised learning with features. In: Proceedings of NAACL 2010 (2010)
5. Brill, E.: Unsupervised learning of disambiguation rules for part of speech tagging. In: Very Large, pp. 1–13. Kluwer Academic Press, Dordrecht (1995)
6. Chen, S.F.: Building Probabilistic Models for Natural Language. Ph.D. thesis, Harvard University (1996)
7. Goldberg, Y., Adler, M., Elhadad, M.: EM can find pretty good POS taggers (when given a good start). In: Proceedings of ACL 2008: HLT (2008)
8. Goldwater, S., Griffiths, T.: A fully Bayesian approach to unsupervised part-of-speech tagging. In: Proceedings of the 45th ACL (2007)
9. Haghighi, A., Klein, D.: Prototype-driven learning for sequence models. In: Proceedings of the main conference on HLT-NAACL (2006)
10. Johnson, M.: Why doesn't EM find good HMM POS-taggers? In: EMNLP (2007)
11. Kriouile, A.: Some improvements in speech recognition algorithms based on HMM. In: Acoustics, Speech, and Signal Processing (1990)
12. Kupiec, J.: Robust part-of-speech tagging using a hidden Markov model. Computer Speech & Language 6, 225–242 (1992)
13. Lamar, M., Maron, Y., Bienenstock, E.: Latent descriptor clustering for unsupervised POS induction. In: EMNLP 2010 (2010)
14. Merialdo, B.: Tagging English text with a probabilistic model. Computational Linguistics 20, 155–171 (1994)
15. Mihalcea, R.: The role of non-ambiguous words in natural language disambiguation. In: Proceedings of the Conference on RANLP (2003)
16. Ravi, S., Knight, K.: Minimized models for unsupervised part-of-speech tagging. In: Proceedings of ACL-IJCNLP 2009, pp. 504–512 (2009)
17. Thede, S.M., Harper, M.P.: A second-order hidden Markov model for part-of-speech tagging. In: Proceedings of the 37th Annual Meeting of the ACL (1999)
18. Toutanova, K., Johnson, M.: A Bayesian LDA-based model for semi-supervised part-of-speech tagging. In: Proceedings of NIPS (2007)
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Yang, L., Peterson, E., Chen, J., Petrova, Y., Srihari, R. (2011). Unsupervised Russian POS Tagging with Appropriate Context. In: Habernal, I., Matoušek, V. (eds) Text, Speech and Dialogue. TSD 2011. Lecture Notes in Computer Science, vol 6836. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-23538-2_54
DOI: https://doi.org/10.1007/978-3-642-23538-2_54
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-23537-5
Online ISBN: 978-3-642-23538-2
eBook Packages: Computer Science (R0)