Abstract
Arabic is a challenging language when it comes to grammar production and parsing. It combines complex linguistic phenomena with a rich morphology, which together make its processing particularly ambiguous. This led us to choose the Tree-Adjoining Grammar (TAG) formalism. Indeed, TAG provides sufficient constraints for handling diverse linguistic phenomena and seems adequate for representing Arabic syntactic structures. In this paper, we present a semi-automatically generated TAG for Modern Standard Arabic, produced using a compiler and a metagrammatical description language called XMG (eXtensible MetaGrammar). We describe the linguistic coverage of our grammar and show how we used the properties of TAG and XMG to define different linguistic phenomena in an expressive and concise way. To check the coverage of our grammar, we set up a development environment that includes a parser and a test corpus gathering both grammatical and ungrammatical sentences illustrating these phenomena.
Notes
- 1.
- 2.
- 3.
- 4. XMG2 extends XMG by including a meta metagrammar compiler.
- 5. A black node is a resource and can be unified with 0 or more white nodes; a white node is a need and must be unified with a black node; a red node is saturated and cannot be unified with any other node.
- 6. In our metagrammatical description, tree fragment names are in French (e.g. EpineVerbe) and so are syntactic categories (e.g. SV for Syntagme Verbal).
- 7. For the elliptical subject.
- 8. ObjetCanon[Objet1] \(\longrightarrow\) ObjetCanonSN[Objet1] \(\vee\) ObjetCanonClit[Objet1] \(\vee\) ObjetIndCanon[Objet1]
- 9. ObjetCanon[Objet2] \(\longrightarrow\) ObjetCanonSN[Objet2] \(\vee\) ObjetCanonClit[Objet2] \(\vee\) ObjetIndCanon[Objet2]
- 10. In order to decrease the size of the image some features have been omitted.
- 11. We did not include phrasal structures.
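The node-color constraints stated in note 5 can be illustrated with a minimal Python sketch. It is not XMG code and does not reproduce the XMG compiler; the names `Color`, `unify_colors`, `merge_group` and `is_saturated` are hypothetical and introduced only for illustration. The sketch also assumes that merging two white nodes yields a white node that still needs a black one, which note 5 does not state explicitly.

```python
from enum import Enum
from typing import Optional, Sequence


class Color(Enum):
    BLACK = "black"  # a resource
    WHITE = "white"  # a need
    RED = "red"      # saturated


def unify_colors(a: Color, b: Color) -> Optional[Color]:
    """Return the color of the merged node, or None if the merge is forbidden."""
    if Color.RED in (a, b):
        return None  # a red node cannot be unified with any other node
    if a is Color.BLACK and b is Color.BLACK:
        return None  # a black node only absorbs white nodes
    if Color.BLACK in (a, b):
        return Color.BLACK  # black + white: the need is satisfied
    return Color.WHITE  # assumption: white + white is still an unmet need


def merge_group(colors: Sequence[Color]) -> Optional[Color]:
    """Unify a non-empty group of nodes that are identified with one another."""
    merged: Optional[Color] = colors[0]
    for color in colors[1:]:
        merged = unify_colors(merged, color)
        if merged is None:
            return None
    return merged


def is_saturated(colors: Sequence[Color]) -> bool:
    """A group is acceptable if it merges successfully into a black or red node."""
    return merge_group(colors) in (Color.BLACK, Color.RED)
```

Under these assumptions, `is_saturated([Color.BLACK, Color.WHITE, Color.WHITE])` holds because one resource can satisfy several needs, whereas a lone white node or a pair of black nodes is rejected.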
© 2023 Springer Nature Switzerland AG
About this paper
Cite this paper
Ben Khelil, C., Othmane Zribi, C.B., Duchier, D., Parmentier, Y. (2023). Parsing Arabic with a Semi-automatically Generated TAG: Dealing with Linguistic Phenomena. In: Gelbukh, A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2018. Lecture Notes in Computer Science, vol 13397. Springer, Cham. https://doi.org/10.1007/978-3-031-23804-8_17
DOI: https://doi.org/10.1007/978-3-031-23804-8_17
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-23803-1
Online ISBN: 978-3-031-23804-8
eBook Packages: Computer Science, Computer Science (R0)