Translation with Scarce Bilingual Resources

Yaser Al-Onaizan¹,
Ulrich Germann¹,
Ulf Hermjakob¹,
Kevin Knight¹,
Philipp Koehn¹,
Daniel Marcu¹ &
Kenji Yamada¹

Abstract

Machine translation of human languages is a field almost as old as computers themselves. Recent approaches to this challenging problem aim at learning translation knowledge automatically (or semi-automatically) from online text corpora, especially human-translated documents. For some language pairs, substantial translation resources exist, and these corpus-based systems can perform well. But for most language pairs, data is scarce, and current techniques do not work well. To examine the gap between human and machine translators, we created an experiment in which human beings were asked to translate an unknown language into English on the sole basis of a very small bilingual text. Participants performed quite well, and debriefings revealed a number of valuable strategies. We discuss these strategies and apply some of them to a statistical translation system.

Author information

Authors and Affiliations

Information Sciences Institute, University of Southern California, 4676 Admiralty Way, Suite 1001, Marina del Rey, CA, 90292, USA
Yaser Al-Onaizan, Ulrich Germann, Ulf Hermjakob, Kevin Knight, Philipp Koehn, Daniel Marcu & Kenji Yamada

About this article

Cite this article

Al-Onaizan, Y., Germann, U., Hermjakob, U. et al. Translation with Scarce Bilingual Resources. Machine Translation 17, 1–17 (2002). https://doi.org/10.1023/A:1025539822079

Issue Date: March 2002
DOI: https://doi.org/10.1023/A:1025539822079
