Abstract
Machine translation of human languages is a field almost as old as computers themselves. Recent approaches to this challenging problem aim at learning translation knowledge automatically (or semi-automatically) from online text corpora, especially human-translated documents. For some language pairs, substantial translation resources exist, and these corpus-based systems can perform well. But for most language pairs, data is scarce, and current techniques do not work well. To examine the gap between human and machine translators, we created an experiment in which human beings were asked to translate an unknown language into English on the sole basis of a very small bilingual text. Participants performed quite well, and debriefings revealed a number of valuable strategies. We discuss these strategies and apply some of them to a statistical translation system.
Cite this article
Al-Onaizan, Y., Germann, U., Hermjakob, U. et al. Translation with Scarce Bilingual Resources. Machine Translation 17, 1–17 (2002). https://doi.org/10.1023/A:1025539822079
DOI: https://doi.org/10.1023/A:1025539822079