Abstract
In this paper, we present an adaptation of a French and English term extractor to Modern Standard Arabic. The goal of this work is to address the lack of resources and NLP tools for the Arabic language in specialised domains. The adaptation first focuses on describing extraction processes similar to those already defined for French and English, while taking into account the morpho-syntactic specificities of Arabic. Agglutination phenomena are then taken into account in the term extraction process. The current state of the adapted system was evaluated on a medical text corpus: 400 maximal candidate terms were examined, of which 288 were correct (72% precision). An error analysis shows that term extraction errors are due first to part-of-speech tagging errors and the difficulties induced by non-diacritised texts, and then to remaining agglutination phenomena.
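The precision figure reported in the abstract follows directly from the two counts given there. A minimal sketch of the arithmetic, where the function name `precision` is chosen here for illustration and is not taken from the paper:

```python
def precision(correct: int, examined: int) -> float:
    """Share of examined candidate terms that were judged correct."""
    if examined <= 0:
        raise ValueError("examined must be a positive count")
    return correct / examined


# Counts reported in the abstract: 288 correct out of 400 examined candidates.
p = precision(288, 400)
print(f"{p:.0%}")  # prints "72%"
```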
Notes
- 2. noun-m-s-g-d: defined singular masculine noun in genitive case. noun-f-s-n-c: constructed singular feminine noun in nominative case.
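The two tags glossed in note 2 suggest a hyphen-separated layout of part of speech, gender, number, case and state. The sketch below decodes tags under that assumption. The field order is inferred from the two examples only, the name `parse_tag` is hypothetical, and only the codes that appear in the note are mapped; any other code is returned unchanged rather than guessed.

```python
from typing import Dict

# Code tables limited to the values glossed in note 2 (an assumption:
# the full tagset very likely contains more codes than these).
GENDER = {"m": "masculine", "f": "feminine"}
NUMBER = {"s": "singular"}
CASE = {"g": "genitive", "n": "nominative"}
STATE = {"d": "defined", "c": "constructed"}

# Assumed field order after the part of speech, inferred from the examples.
FIELDS = (("gender", GENDER), ("number", NUMBER), ("case", CASE), ("state", STATE))


def parse_tag(tag: str) -> Dict[str, str]:
    """Split a tag such as 'noun-m-s-g-d' into named features."""
    parts = tag.split("-")
    if len(parts) != 5:
        raise ValueError(f"expected 5 hyphen-separated fields, got {len(parts)}: {tag!r}")
    features = {"pos": parts[0]}
    for (name, table), code in zip(FIELDS, parts[1:]):
        # Unknown codes are kept as-is instead of being interpreted.
        features[name] = table.get(code, code)
    return features


print(parse_tag("noun-m-s-g-d"))
print(parse_tag("noun-f-s-n-c"))
```

Keeping unknown codes untouched means the sketch does not silently mislabel tags that fall outside the two documented examples.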
Copyright information
© 2018 Springer International Publishing AG, part of Springer Nature
About this paper
Cite this paper
Neifar, W., Hamon, T., Zweigenbaum, P., Khemakhem, M.E., Belguith, L.H. (2018). Adaptation of a Term Extractor to Arabic Specialised Texts: First Experiments and Limits. In: Gelbukh, A. (ed.) Computational Linguistics and Intelligent Text Processing. CICLing 2016. Lecture Notes in Computer Science, vol 9623. Springer, Cham. https://doi.org/10.1007/978-3-319-75477-2_16
DOI: https://doi.org/10.1007/978-3-319-75477-2_16
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-75476-5
Online ISBN: 978-3-319-75477-2
eBook Packages: Computer Science, Computer Science (R0)