Abstract
In natural language processing, a part-of-speech (POS) tagger is a crucial subsystem in a wide range of applications. It labels (or classifies) unannotated words of natural language with POS labels corresponding to categories such as noun, verb or adjective. Mainstream approaches are generally corpus-based: a POS tagger learns from a corpus of pre-annotated data how to correctly tag unlabeled data. This article presents a brief account of the state of the art in POS tagging. The approaches covered use a labeled corpus to train computational models. Several typical models from three kinds of tagging are introduced: rule-based tagging, statistical approaches and evolutionary algorithms. The advantages and pitfalls of each kind are discussed and analyzed. Some rule-based and stochastic methods have achieved accuracies of 93–96 %, while some evolutionary algorithms reach about 96–97 %.
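As a rough illustration of the corpus-based idea described above, the following sketch trains a most-frequent-tag baseline on a tiny hand-made labeled corpus. It is not one of the models surveyed in the article; the toy corpus, the tag names and the function names (`train_unigram_tagger`, `tag_words`, `accuracy`) are all assumptions made for this example.

```python
from collections import Counter, defaultdict

# Toy pre-annotated corpus (hypothetical data, for illustration only).
TRAIN = [
    [("the", "DET"), ("dog", "NOUN"), ("barks", "VERB")],
    [("a", "DET"), ("cat", "NOUN"), ("sleeps", "VERB")],
    [("dogs", "NOUN"), ("chase", "VERB"), ("cats", "NOUN")],
]

def train_unigram_tagger(tagged_sentences):
    """Learn, for each word, the tag it carries most often in the corpus."""
    word_tag_counts = defaultdict(Counter)
    tag_totals = Counter()
    for sentence in tagged_sentences:
        for word, tag in sentence:
            word_tag_counts[word.lower()][tag] += 1
            tag_totals[tag] += 1
    lexicon = {w: c.most_common(1)[0][0] for w, c in word_tag_counts.items()}
    # Unknown words fall back to the most frequent tag overall.
    default_tag = tag_totals.most_common(1)[0][0]
    return lexicon, default_tag

def tag_words(words, lexicon, default_tag):
    """Tag unlabeled words using the learned lexicon."""
    return [(w, lexicon.get(w.lower(), default_tag)) for w in words]

def accuracy(tagged_sentences, lexicon, default_tag):
    """Fraction of tokens whose predicted tag matches the gold tag."""
    correct = total = 0
    for sentence in tagged_sentences:
        predicted = tag_words([w for w, _ in sentence], lexicon, default_tag)
        for (_, gold), (_, guess) in zip(sentence, predicted):
            correct += gold == guess
            total += 1
    return correct / total if total else 0.0

lexicon, default_tag = train_unigram_tagger(TRAIN)
tagged = tag_words(["The", "dog", "sleeps"], lexicon, default_tag)
```

A baseline like this ignores context entirely. The rule-based, statistical and evolutionary taggers named in the abstract differ mainly in how they use neighboring words and tags to resolve ambiguous cases.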
Acknowledgments
This work was supported in part by the National Natural Science Foundation of China (Nos. 61440018, 61501411) and the Hubei Natural Science Foundation (No. 2014CFB904).
Cite this article
Lv, C., Liu, H., Dong, Y. et al. Corpus based part-of-speech tagging. Int J Speech Technol 19, 647–654 (2016). https://doi.org/10.1007/s10772-016-9356-2
DOI: https://doi.org/10.1007/s10772-016-9356-2