Parsing Arabic using induced probabilistic context free grammar

Nabil Khoufi¹,
Chafik Aloulou¹ &
Lamia Hadrich Belguith¹


Abstract

The importance of parsing for NLP applications is well understood. However, developing parsers for Arabic remains difficult because of the complexity of the language. Most parsers are based on syntactic grammars that describe the syntactic structures of a language, and developing these grammars is laborious and time-consuming. In this paper, we present our method for building an Arabic parser based on an induced probabilistic context-free grammar (PCFG). We first induce the PCFG from an Arabic treebank. We then implement the parser, which assigns a syntactic structure to each input sentence. The parser is tested on 1650 sentences extracted from the treebank, and we calculate precision, recall and F-measure. Our experimental results show the efficiency of the proposed parser for parsing Modern Standard Arabic sentences (precision: 83.59 %, recall: 82.98 %, F-measure: 83.23 %).
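To make the induction step concrete, the following is a minimal sketch of how a PCFG can be read off a set of bracketed trees. It assumes relative-frequency (maximum-likelihood) estimation, which is the standard way to induce a PCFG from a treebank; the abstract does not state which estimator the authors used. The toy trees, the nested-tuple tree encoding and the function names `productions` and `induce_pcfg` are all hypothetical and chosen only for illustration; they are not taken from the paper or from the Arabic treebank.

```python
from collections import Counter

# A tree is a nested tuple: (label, child, child, ...); a leaf is a plain string.
# These two toy trees are invented for illustration (not treebank data).
TOY_TREEBANK = [
    ("S", ("NP", ("NOUN", "boy")),
          ("VP", ("VERB", "reads"), ("NP", ("NOUN", "book")))),
    ("S", ("NP", ("NOUN", "girl")),
          ("VP", ("VERB", "sleeps"))),
]


def productions(tree):
    """Yield one (lhs, rhs) pair for every internal node of the tree."""
    label, children = tree[0], tree[1:]
    # The right-hand side lists child labels, or the word itself for a leaf.
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    yield label, rhs
    for child in children:
        if not isinstance(child, str):
            yield from productions(child)


def induce_pcfg(treebank):
    """Estimate P(lhs -> rhs) as count(lhs -> rhs) / count(lhs)."""
    rule_counts = Counter()
    lhs_counts = Counter()
    for tree in treebank:
        for lhs, rhs in productions(tree):
            rule_counts[(lhs, rhs)] += 1
            lhs_counts[lhs] += 1
    return {rule: n / lhs_counts[rule[0]] for rule, n in rule_counts.items()}


pcfg = induce_pcfg(TOY_TREEBANK)
for (lhs, rhs), prob in sorted(pcfg.items()):
    print(f"{lhs} -> {' '.join(rhs)}  [{prob:.2f}]")
```

With this estimator, the probabilities of all rules sharing a left-hand side sum to one, which is the defining property of a PCFG. A parser built on such a grammar would then search for the most probable tree for each input sentence; that step is not sketched here.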



Author information

Authors and Affiliations

ANLP-RG, MIR@CL Lab, University of Sfax, Sfax, Tunisia
Nabil Khoufi, Chafik Aloulou & Lamia Hadrich Belguith


Corresponding author

Correspondence to Nabil Khoufi.


About this article

Cite this article

Khoufi, N., Aloulou, C. & Belguith, L.H. Parsing Arabic using induced probabilistic context free grammar. Int J Speech Technol 19, 313–323 (2016). https://doi.org/10.1007/s10772-015-9300-x


Received: 04 May 2015
Accepted: 12 August 2015
Published: 04 September 2015
Issue Date: June 2016
DOI: https://doi.org/10.1007/s10772-015-9300-x
