Abstract
This paper presents an in-depth description of the features of the open-source CMU-EBMT example-based machine translation system. CMU-EBMT is a complete end-to-end system including lexicon induction, word and phrase alignment, corpus indexing and lookup, language model, decoder, and parameter tuning components. While it does not require them, it can take advantage of external alignment information and other annotations provided by GIZA++ and other systems. To illustrate a recent addition to CMU-EBMT, experiments are presented which show an improvement of 0.16 BLEU points (0.9% relative) on a cross-validated small-data English–Haitian translation task when using a new set of fine-grained log-linear feature values representing language model match lengths in addition to language model probabilities.
Cite this article
Brown, R.D. The CMU-EBMT machine translation system. Machine Translation 25, 179–195 (2011). https://doi.org/10.1007/s10590-011-9095-8
DOI: https://doi.org/10.1007/s10590-011-9095-8