Abstract
The importance of parsing for NLP applications is well understood. However, developing parsers for Arabic remains difficult because of the complexity of the language. Most parsers are based on grammars that describe the syntactic structures of a language, and developing these grammars by hand is laborious and time-consuming. In this paper we present our method for building an Arabic parser based on an induced probabilistic context-free grammar (PCFG). We first induce the PCFG from an Arabic treebank. Then we implement a parser that assigns a syntactic structure to each input sentence. The parser is tested on 1650 sentences extracted from the treebank, and we calculate precision, recall and F-measure. Our experimental results show that the proposed parser is effective for parsing Modern Standard Arabic sentences (precision: 83.59%, recall: 82.98%, F-measure: 83.23%).
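The two steps named in the abstract, inducing a PCFG from treebank trees and scoring parser output with precision, recall and F-measure, can be illustrated with a minimal sketch. This is not the authors' implementation: the bracketed toy trees, the placeholder tokens (`w1`, `w2`, ...), the label set and all function names are assumptions made for illustration. Rule probabilities are estimated by relative frequency, and scoring compares labeled spans, with part-of-speech spans included for simplicity.

```python
from collections import Counter

def parse_bracketed(text):
    """Read one bracketed tree, e.g. '(S (VP (V w1)) (NP (N w2)))', into (label, children) tuples."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read():
        nonlocal pos
        assert tokens[pos] == "("
        label = tokens[pos + 1]
        pos += 2
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(read())
            else:
                children.append(tokens[pos])  # a terminal (word)
                pos += 1
        pos += 1  # consume ')'
        return (label, tuple(children))

    return read()

def productions(tree):
    """Yield every rule (lhs, rhs) used in the tree, one per internal node."""
    label, children = tree
    yield (label, tuple(c[0] if isinstance(c, tuple) else c for c in children))
    for child in children:
        if isinstance(child, tuple):
            yield from productions(child)

def induce_pcfg(trees):
    """Relative-frequency estimate: P(lhs -> rhs) = count(lhs -> rhs) / count(lhs)."""
    rule_counts = Counter(rule for tree in trees for rule in productions(tree))
    lhs_counts = Counter()
    for (lhs, _), n in rule_counts.items():
        lhs_counts[lhs] += n
    return {rule: n / lhs_counts[rule[0]] for rule, n in rule_counts.items()}

def labeled_spans(tree):
    """Collect (label, start, end) for every constituent, including POS tags."""
    spans = set()

    def walk(node, start):
        label, children = node
        end = start
        for child in children:
            end = walk(child, end) if isinstance(child, tuple) else end + 1
        spans.add((label, start, end))
        return end

    walk(tree, 0)
    return spans

def precision_recall_f(gold, predicted):
    """Labeled-span precision, recall and F-measure for a single sentence."""
    g, p = labeled_spans(gold), labeled_spans(predicted)
    correct = len(g & p)
    precision, recall = correct / len(p), correct / len(g)
    f = 2 * precision * recall / (precision + recall) if correct else 0.0
    return precision, recall, f

# Hypothetical two-tree "treebank" with placeholder tokens.
toy_treebank = [
    parse_bracketed("(S (VP (V w1)) (NP (N w2)))"),
    parse_bracketed("(S (VP (V w3)) (NP (N w2) (N w4)))"),
]
grammar = induce_pcfg(toy_treebank)

# Hypothetical parser output for the second sentence, scored against its gold tree.
predicted = parse_bracketed("(S (VP (V w3) (NP (N w2))) (NP (N w4)))")
scores = precision_recall_f(toy_treebank[1], predicted)
```

In this sketch `grammar` maps each rule to its estimated probability, so the two expansions of `NP` in the toy data share the probability mass for that label. A real evaluation would aggregate scores over all test sentences, and the aggregation scheme is a further choice that the abstract does not specify.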
Cite this article
Khoufi, N., Aloulou, C. & Belguith, L.H. Parsing Arabic using induced probabilistic context free grammar. Int J Speech Technol 19, 313–323 (2016). https://doi.org/10.1007/s10772-015-9300-x
DOI: https://doi.org/10.1007/s10772-015-9300-x