A hybrid Arabic POS tagging for simple and compound morphosyntactic tags

International Journal of Speech Technology

Abstract

The objective of this work is to develop a part-of-speech (POS) tagger for the Arabic language. The tagger uses a very rich tag set that gives syntactic information about the proclitics attached to words. It combines a probabilistic model with a morphological analyzer to identify the correct tag in context. Most published research on probabilistic tagging relies only on a training corpus to find the probable tags for each word, which sometimes hurts performance. In this paper, we propose a method that also takes into account tags that are not included in the training data. These tags are proposed by the Alkhalil_Morpho_Sys analyzer (Bebah et al. 2011). We show that this consideration significantly increases the accuracy of the morphosyntactic analysis. In addition, the adopted tag set contains compound tags that make it possible to analyze the proclitics attached to words.
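The idea in the abstract can be illustrated with a small sketch. The preview does not give the details of the paper's model, so the code below assumes a first-order hidden Markov model with additive smoothing, decoded with the Viterbi algorithm. All names (`train`, `viterbi`, `toy_analyzer`), the tag labels (`VERB`, `NOUN`, `CONJ+NOUN`), and the transliterated toy words are hypothetical; `toy_analyzer` is only a stand-in for a morphological analyzer such as Alkhalil_Morpho_Sys, whose real interface is not shown here.

```python
import math
from collections import defaultdict

def train(tagged_sentences):
    """Count tag-to-tag transitions and tag-to-word emissions."""
    trans = defaultdict(lambda: defaultdict(int))
    emit = defaultdict(lambda: defaultdict(int))
    for sentence in tagged_sentences:
        prev = "<s>"  # sentence-start pseudo-tag
        for word, tag in sentence:
            trans[prev][tag] += 1
            emit[tag][word] += 1
            prev = tag
    return trans, emit

def log_prob(counts, given, event, n_events, alpha=0.1):
    """Additive smoothing (an assumption): unseen events keep a small probability."""
    row = counts.get(given, {})
    total = sum(row.values())
    return math.log((row.get(event, 0) + alpha) / (total + alpha * n_events))

def viterbi(words, trans, emit, analyzer, tagset, vocab_size):
    """Best tag sequence; candidates = corpus tags plus analyzer-proposed tags."""
    best = {"<s>": (0.0, [])}  # tag -> (log score, best path ending in tag)
    for word in words:
        corpus_tags = {t for t in emit if word in emit[t]}
        candidates = corpus_tags | set(analyzer(word))
        if not candidates:  # nothing known about the word: try every tag
            candidates = set(tagset)
        new_best = {}
        for tag in candidates:
            e = log_prob(emit, tag, word, vocab_size)
            score, path = max(
                ((s + log_prob(trans, prev, tag, len(tagset)), p)
                 for prev, (s, p) in best.items()),
                key=lambda item: item[0],
            )
            new_best[tag] = (score + e, path + [tag])
        best = new_best
    return max(best.values(), key=lambda item: item[0])[1]

# Toy corpus: the compound tag CONJ+NOUN never occurs in the training data.
corpus = [
    [("kataba", "VERB"), ("alwalad", "NOUN")],
    [("kataba", "VERB"), ("kitab", "NOUN")],
]
tagset = ["VERB", "NOUN", "CONJ+NOUN"]
trans, emit = train(corpus)

def toy_analyzer(word):
    """Hypothetical analyzer output for a word with an attached proclitic."""
    return {"wakitab": ["CONJ+NOUN"]}.get(word, [])

sentence = ["kataba", "wakitab"]
with_analyzer = viterbi(sentence, trans, emit, toy_analyzer, tagset, vocab_size=4)
corpus_only = viterbi(sentence, trans, emit, lambda w: [], tagset, vocab_size=4)
print(with_analyzer)  # ['VERB', 'CONJ+NOUN']
print(corpus_only)    # ['VERB', 'NOUN']
```

In this toy setting the corpus-only tagger falls back to the frequent `VERB` to `NOUN` transition and misses the compound tag, while the analyzer-backed run restricts the unseen word to the tag the analyzer proposes. This only mirrors the motivation stated in the abstract; it makes no claim about the paper's actual probabilities or results.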

References

  • Al Shamsi, F., & Guessoum, A. (2006). A hidden Markov model-based POS tagger for Arabic. In Proceedings of the 8th International Conference on the Statistical. Besançon, France.

  • Al-Taani, A. T., & Al-Rub, S. A. (2009). A rule-based approach for tagging non-vocalized Arabic words. International Arab Journal of Information Technology, 6(3), 320–328.

  • Altabba, M., Al-Zaraee, A., & Shukairy, M. A. (2010). An Arabic morphological analyzer and part-of-speech tagger. Thesis, Faculty of Informatics Engineering, Arab International University, Damascus.

  • Antony, P. J., & Soman, K. P. (2011). Parts of speech tagging for Indian languages: A literature survey. International Journal of Computer Applications (0975-8887), 34(8), 22–29.

  • Atiyya, M., Choukri, K., & Yaseen, M. (2005, September 29). NEMLAR Arabic written corpus. Retrieved June 11, 2015, from http://www.rdi-eg.com/Downloads/Lang%20Tech/Nemlar-specifications-resources-WC-V3.0_Final.doc.

  • Attia, M., Yaseen, M., & Choukri, K. (2005). Specifications of the Arabic Written Corpus produced within the NEMLAR project. http://www.medar.info/The_Nemlar_Project/Publications/WC_design_final.pdf.

  • Bebah, M. O. A. O., Meziane, A., Mazroui, A., & Lakhouaja, A. (2011). Alkhalil morpho sys. In 7th International computing conference in Arabic.

  • Boudchiche, M., Mazroui, M., ould Abdallahi Ould Bebah, M., & Lakhouaja, A. (2014). L’analyseur Morphosyntaxique Alkhalil Morpho Sys 2. In 1st National Doctoral Day of Engineering Arabic Language.

  • Brill, E. (1992). A simple rule-based part of speech tagger. In Proceedings of the workshop on speech and natural language (pp. 112–116). Association for Computational Linguistics.

  • Buckwalter, T. (2002). Buckwalter Arabic morphological analyzer version 2.0. Linguistic Data Consortium, University of Pennsylvania. LDC Catalog No. LDC2002L49. ISBN 1-58563-324-0.

  • Chalabi, A. (2004). Sakhr Arabic lexicon. In NEMLAR international conference on Arabic language resources and tools (pp. 21–24).

  • Darwish, K., Abdelali, A., & Mubarak, H. (2014). Using stem-templates to improve Arabic POS and gender/number tagging. In International conference on language resources and evaluation (LREC-2014).

  • Diab, M. (2009). Second generation AMIRA tools for Arabic processing: Fast and robust tokenization, POS tagging, and base phrase chunking. In 2nd International conference on Arabic language resources and tools. Cairo, Egypt.

  • Diab, M., Hacioglu, K., & Jurafsky, D. (2004). Automatic tagging of Arabic text: From raw text to base phrase chunks. In Proceedings of HLT-NAACL 2004: Short papers (pp. 149–152). Association for Computational Linguistics.

  • El Jihad, A., & Yousfi, A. (2005). Etiquetage morpho-syntaxique des textes arabes par modèle de Markov caché. In Proceedings of Rencontre Des Etudiants Chercheurs En Informatique Pour Le Traitement Automatique Des Langues (pp. 649–654). Dourdan, France.

  • El-Jihad, A., Yousfi, A., & Si-Lhoussain, A. (2011). Morpho-syntactic tagging system based on the patterns words for Arabic texts. International Arab Journal of Information Technology, 8(4), 350–354.

  • Ghoul, D. (2011). Outils génériques pour l’étiquetage morphosyntaxique de la langue arabe: segmentation et corpus d’entraînement.

  • Huang, L., Peng, Y., Wang, H., & Wu, Z. (2002). Statistical part-of-speech tagging for classical Chinese. In Text, speech and dialogue (pp. 115–122). Brno.

  • Khoja, S. (2001). APT: Arabic part-of-speech tagger. In Proceedings of the student workshop at NAACL (pp. 20–25).

  • Manning, C. D., & Schütze, H. (1999). Foundations of statistical natural language processing. Cambridge, MA: MIT Press.

  • Nakagawa, T., & Uchimoto, K. (2007). A hybrid approach to word segmentation and POS tagging. In Proceedings of the 45th annual meeting of the ACL on interactive poster and demonstration sessions (pp. 217–220). Association for Computational Linguistics.

  • Neuhoff, D. L. (1975). The Viterbi algorithm as an aid in text recognition (Corresp.). IEEE Transactions on Information Theory, 21(2), 222–226.

  • Ney, H., Essen, U., & Kneser, R. (1994). On structuring probabilistic dependences in stochastic language modelling. Computer Speech and Language, 8(1), 1–38.

  • Pasha, A., Al-Badrashiny, M., Diab, M., El Kholy, A., Eskander, R., Habash, N., et al. (2014). A fast, comprehensive tool for morphological analysis and disambiguation of Arabic. Reykjavik: LREC.

  • Schmid, H. (1994). Probabilistic part-of-speech tagging using decision trees. In Proceedings of the international conference on new methods in language processing (Vol. 12, pp. 44–49). Manchester.

  • Thibeault, M. (2004). La catégorisation grammaticale automatique: adaptation du catégoriseur de Brill au français et modification de l’approche. Université Laval.

  • Toutanova, K., Klein, D., Manning, C. D., & Singer, Y. (2003). Feature-rich part-of-speech tagging with a cyclic dependency network. In Proceedings of the 2003 conference of the North American chapter of the Association for Computational Linguistics on Human Language Technology (Vol. 1, pp. 173–180). Association for Computational Linguistics.

  • Viterbi, A. J. (1967). Error bounds for convolutional codes and an asymptotically optimum decoding algorithm. IEEE Transactions on Information Theory, 13(2), 260–269.

Author information

Corresponding author

Correspondence to N. Ababou.

About this article

Cite this article

Ababou, N., Mazroui, A. A hybrid Arabic POS tagging for simple and compound morphosyntactic tags. Int J Speech Technol 19, 289–302 (2016). https://doi.org/10.1007/s10772-015-9302-8

  • DOI: https://doi.org/10.1007/s10772-015-9302-8
