Abstract
One of the aims of the DARPA BOLT project is to translate Egyptian blog data into English. While parallel data for MSA-English is abundantly available, it is sparse for Egyptian-English and Egyptian-MSA. A notable drop in translation quality is observed when translating Egyptian to English compared with translating MSA to English. One reason for this drop is the high OOV rate; another is the dialectal difference between training and test data. This work focuses on improving Egyptian-to-English translation by bridging the gap between Egyptian and MSA. First, we try to reduce the OOV rate by proposing MSA candidates for unknown Egyptian words through methods such as spelling correction and suggesting synonyms based on context. Second, we apply a convolution model using English as a pivot to map Egyptian words into MSA. We then evaluate our edits by running a decoder built on MSA-to-English data. Our spelling-based correction shows an improvement of 1.7 BLEU points over the baseline system, which translates unedited Egyptian into English.
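The two mapping steps named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how spelling candidates are generated or ranked, nor how the pivot model is parameterised. The sketch assumes Levenshtein distance with frequency-based tie-breaking for the spelling step, and a simple marginalisation over English words for the pivot step. All function names, the transliterated toy words, and the probabilities are hypothetical.

```python
from collections import defaultdict


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two strings (dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def spelling_candidates(oov_word, msa_counts, max_dist=1):
    """Rank in-vocabulary MSA words within max_dist edits of an OOV word.

    Assumption: candidates are ordered by edit distance, then by
    descending corpus frequency. The paper may rank differently.
    """
    scored = []
    for word, count in msa_counts.items():
        dist = edit_distance(oov_word, word)
        if dist <= max_dist:
            scored.append((dist, -count, word))
    return [word for _, _, word in sorted(scored)]


def pivot_map(egy_to_en, en_to_msa):
    """Combine p(en | egy) and p(msa | en) by summing over the English pivot.

    Assumption: p(msa | egy) = sum over en of p(en | egy) * p(msa | en).
    """
    egy_to_msa = defaultdict(lambda: defaultdict(float))
    for egy, en_probs in egy_to_en.items():
        for en, p_en in en_probs.items():
            for msa, p_msa in en_to_msa.get(en, {}).items():
                egy_to_msa[egy][msa] += p_en * p_msa
    return {egy: dict(msa_probs) for egy, msa_probs in egy_to_msa.items()}


# Toy data in an ad hoc Latin transliteration (hypothetical values).
msa_counts = {"ktAb": 40, "ktb": 100, "qlm": 7}
candidates = spelling_candidates("ktab", msa_counts)

egy_to_en = {"Ayz": {"want": 0.8, "need": 0.2}}
en_to_msa = {"want": {"yryd": 0.9, "yrgb": 0.1}, "need": {"yHtAj": 1.0}}
mapped = pivot_map(egy_to_en, en_to_msa)
```

In a pipeline of this shape, the top-ranked candidate would replace the unknown Egyptian word before the sentence is passed to the MSA-to-English decoder.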
© 2014 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Durrani, N., Al-Onaizan, Y., Ittycheriah, A. (2014). Improving Egyptian-to-English SMT by Mapping Egyptian into MSA. In: Gelbukh, A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2014. Lecture Notes in Computer Science, vol 8404. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-54903-8_23
DOI: https://doi.org/10.1007/978-3-642-54903-8_23
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-54902-1
Online ISBN: 978-3-642-54903-8
eBook Packages: Computer Science, Computer Science (R0)