Improving Egyptian-to-English SMT by Mapping Egyptian into MSA

  • Conference paper
Computational Linguistics and Intelligent Text Processing (CICLing 2014)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 8404)

Abstract

One of the aims of the DARPA BOLT project is to translate Egyptian blog data into English. While parallel data for Modern Standard Arabic (MSA) and English is abundant, it is sparse for Egyptian-English and Egyptian-MSA. A notable drop in translation quality is observed when translating Egyptian to English compared with translating MSA to English. One reason for this drop is the high out-of-vocabulary (OOV) rate, whereas another is the dialectal difference between training and test data. This work focuses on improving Egyptian-to-English translation by bridging the gap between Egyptian and MSA. First, we try to reduce the OOV rate by proposing MSA candidates for unknown Egyptian words through methods such as spelling correction and context-based synonym suggestion. Second, we apply a convolution model using English as a pivot to map Egyptian words into MSA. We then evaluate our edits by running a decoder built on MSA-to-English data. Our spelling-based correction shows an improvement of 1.7 BLEU points over the baseline system, which translates unedited Egyptian into English.
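
The abstract does not say how the spelling-based correction is implemented. As an illustration only, one common way to propose in-vocabulary candidates for an unknown word is to rank vocabulary entries by edit distance and break ties by frequency. The sketch below assumes that approach. The function names, the distance threshold, and the toy Latin-script vocabulary are all hypothetical and are not taken from the paper.

```python
from typing import Dict, List


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed with a rolling one-row table."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def propose_msa_candidates(oov_word: str,
                           msa_counts: Dict[str, int],
                           max_distance: int = 1,
                           top_k: int = 3) -> List[str]:
    """Return up to top_k vocabulary words within max_distance edits.

    Candidates are ordered by distance first, then by descending count.
    Both the threshold and the ranking are assumptions for this sketch.
    """
    scored = []
    for word, count in msa_counts.items():
        dist = edit_distance(oov_word, word)
        if dist <= max_distance:
            scored.append((dist, -count, word))
    scored.sort()
    return [word for _, _, word in scored[:top_k]]


def map_sentence(tokens: List[str], msa_counts: Dict[str, int]) -> List[str]:
    """Replace each unknown token with its best candidate, if one exists."""
    mapped = []
    for tok in tokens:
        if tok in msa_counts:
            mapped.append(tok)  # already in vocabulary, keep as is
        else:
            candidates = propose_msa_candidates(tok, msa_counts)
            mapped.append(candidates[0] if candidates else tok)
    return mapped


# Toy vocabulary with made-up counts (hypothetical placeholder strings).
toy_msa_counts = {"ktAb": 120, "ktb": 300, "qlm": 80, "bAb": 50}
print(map_sentence(["ktAAb", "qlm", "zzz"], toy_msa_counts))
```

In this toy run the unknown token "ktAAb" is one edit away from "ktAb" and is replaced by it, while "zzz" has no candidate within the threshold and is left unchanged. A real system would presumably also weigh context, as the abstract's mention of context-based synonym suggestion implies.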

Copyright information

© 2014 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Durrani, N., Al-Onaizan, Y., Ittycheriah, A. (2014). Improving Egyptian-to-English SMT by Mapping Egyptian into MSA. In: Gelbukh, A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2014. Lecture Notes in Computer Science, vol 8404. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-54903-8_23

  • DOI: https://doi.org/10.1007/978-3-642-54903-8_23

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-54902-1

  • Online ISBN: 978-3-642-54903-8

  • eBook Packages: Computer Science, Computer Science (R0)
