Abstract
This paper describes the design and implementation of a computational model for Arabic natural language semantics: a semantic parser that captures the deep semantic representation of Arabic text. The parser is a major component of an Interlingua-based machine translation system for translating Arabic text into Sign Language. It follows a frame-based analysis to map the overall meaning of Arabic text into a formal representation suitable for NLP applications that require deep semantic representations, such as language generation and machine translation. We show the representational power of this theory for the semantic analysis of texts in Arabic, a language that differs substantially from English in several ways. We also show that integrating WordNet and FrameNet into a single unified knowledge resource can improve disambiguation accuracy. Furthermore, we propose a rule-based algorithm to generate an equivalent Arabic FrameNet, using a lexical resource alignment of FrameNet 1.3 lexical units (LUs) and WordNet 3.0 synsets for English. A pilot study of motion and location verbs was carried out to test our system. Our corpus consists of more than 2000 Arabic sentences in the domain of motion events, collected from Algerian first-level educational Arabic books and other relevant Arabic corpora.
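As an illustration only, the projection idea mentioned above can be sketched in a few lines of Python. This is not the authors' algorithm: the two lookup tables are toy data, the synset identifiers are illustrative labels, and the names `LU_TO_SYNSET`, `SYNSET_TO_ARABIC`, `project_frames`, and `instantiate` are hypothetical. The sketch assumes that some resource links WordNet synsets to Arabic lemmas, which the abstract does not specify.

```python
from dataclasses import dataclass, field


@dataclass
class Frame:
    """A frame instance: the frame name plus its filled frame elements."""
    name: str
    elements: dict = field(default_factory=dict)


# Toy English alignment: (frame, lexical unit) -> synset label.
# The labels are illustrative, not taken from the paper.
LU_TO_SYNSET = {
    ("Self_motion", "walk.v"): "walk.v.01",
    ("Arriving", "arrive.v"): "arrive.v.01",
}

# Hypothetical synset -> Arabic lemma table (assumption: such a link exists).
SYNSET_TO_ARABIC = {
    "walk.v.01": ["مشى"],
    "arrive.v.01": ["وصل"],
}


def project_frames(lu_to_synset, synset_to_arabic):
    """Rule: an Arabic lemma evokes frame F if it is linked to a synset
    that is aligned with an English lexical unit of F."""
    arabic_lus = {}
    for (frame, _lu), synset in lu_to_synset.items():
        for lemma in synset_to_arabic.get(synset, []):
            arabic_lus.setdefault(lemma, set()).add(frame)
    return arabic_lus


ARABIC_LUS = project_frames(LU_TO_SYNSET, SYNSET_TO_ARABIC)


def instantiate(lemma, roles):
    """Build a frame instance for a verb lemma, or None if no frame is known.
    With several candidate frames, the first in sorted order is taken;
    a real system would disambiguate instead."""
    frames = sorted(ARABIC_LUS.get(lemma, ()))
    if not frames:
        return None
    return Frame(frames[0], dict(roles))


# Toy sentence: "وصل الولد إلى المدرسة" (the boy arrived at the school).
arrival = instantiate("وصل", {"Theme": "الولد", "Goal": "المدرسة"})
```

Here `arrival` holds a frame instance whose name and frame elements play the role of the language-independent representation that a generation component could consume.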
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Cite this article
Lakhfif, A., Laskri, M.T. A frame-based approach for capturing semantics from Arabic text for text-to-sign language MT. Int J Speech Technol 19, 203–228 (2016). https://doi.org/10.1007/s10772-015-9290-8
DOI: https://doi.org/10.1007/s10772-015-9290-8