Divide and translate: improving long distance reordering in statistical machine translation

Authors: Katsuhito Sudoh, Hajime Tsukada, Masaaki Nagata

WMT '10: Proceedings of the Joint Fifth Workshop on Statistical Machine Translation and MetricsMATR

Pages 418 - 427

Published: 15 July 2010

Abstract

This paper proposes a novel method for long-distance, clause-level reordering in statistical machine translation (SMT). The proposed method translates the clauses of the source sentence separately and reconstructs the target sentence from the clause translations, which contain non-terminals. The non-terminals are placeholders for embedded clauses, and they reduce complicated clause-level reordering to simple word-level reordering. The translation model is trained on a bilingual corpus with clause-level alignment, which can be annotated automatically by our alignment algorithm using a syntactic parser for the source language. We achieved significant improvements of 1.4% in BLEU and 1.3% in TER using Moses, and 2.2% in BLEU and 3.5% in TER using our hierarchical phrase-based SMT, for English-to-Japanese translation of research paper abstracts in the medical domain.
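The reconstruction step described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the placeholder syntax `[S1]`, the function name `reconstruct`, and the toy romanized clause translations are all assumptions made for illustration. The sketch only shows how a clause translation containing a non-terminal can be completed by substituting the translation of the embedded clause.

```python
import re

# Assumed placeholder syntax for a non-terminal: "[S<k>]" refers to clause k.
# The paper's actual notation may differ.
PLACEHOLDER = re.compile(r"\[S(\d+)\]")


def reconstruct(clause_translations, root=0):
    """Rebuild a target sentence from separately translated clauses.

    clause_translations maps a clause index to its translation, which may
    contain placeholders for embedded clauses. Placeholders are expanded
    recursively, starting from the root clause.
    """
    def expand(idx, seen):
        if idx in seen:
            # A clause that (indirectly) embeds itself would never terminate.
            raise ValueError(f"cyclic clause reference at clause {idx}")
        text = clause_translations[idx]
        return PLACEHOLDER.sub(
            lambda m: expand(int(m.group(1)), seen | {idx}), text
        )

    return expand(root, frozenset())


# Toy example (invented strings, not data from the paper): the main clause
# was translated with a placeholder where the relative clause belongs.
clauses = {
    0: "john wa [S1] hon wo nakushita",
    1: "toshokan kara karita",
}
sentence = reconstruct(clauses)
print(sentence)
```

Because each clause is translated on its own, the decoder in this picture only has to position the single placeholder token within the main clause, which is the sense in which clause-level reordering becomes word-level reordering.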

Published In

WMT '10: Proceedings of the Joint Fifth Workshop on Statistical Machine Translation and MetricsMATR

July 2010, 455 pages

ISBN: 9781932432718

Program Chairs: Chris Callison-Burch (Johns Hopkins University), Philipp Koehn (University of Edinburgh, United Kingdom), Christof Monz (University of Amsterdam, The Netherlands), Kay Peterson (NIST), Omar Zaidan (Johns Hopkins University)

Publisher: Association for Computational Linguistics, United States

Cited By

Tran N, Tran H, Nguyen S, Nguyen H, Nguyen T, Guéhéneuc Y, Khomh F, Sarro F (2019). Does BLEU score work for code migration? Proceedings of the 27th International Conference on Program Comprehension, pages 165-176. DOI: 10.1109/ICPC.2019.00034. Online publication date: 25-May-2019.
https://dl.acm.org/doi/10.1109/ICPC.2019.00034

Nguyen A, Nguyen T, Nguyen T, Cohen M, Grunske L, Whalen M (2015). Divide-and-conquer approach for multi-phase statistical migration for source code. Proceedings of the 30th IEEE/ACM International Conference on Automated Software Engineering, pages 585-596. DOI: 10.1109/ASE.2015.74. Online publication date: 9-Nov-2015.
https://dl.acm.org/doi/10.1109/ASE.2015.74

Koeva S, Rizov B, Stoyanova I, Leseva S, Dekova R, Genov A, Tarpomanova E, Dimitrova T, Kukova H (2012). Application of clause alignment for statistical machine translation. Proceedings of the Sixth Workshop on Syntax, Semantics and Structure in Statistical Translation, pages 102-110. DOI: 10.5555/2392936.2392952. Online publication date: 12-Jul-2012.
https://dl.acm.org/doi/10.5555/2392936.2392952

Isozaki H, Sudoh K, Tsukada H, Duh K (2012). HPSG-Based Preprocessing for English-to-Japanese Translation. ACM Transactions on Asian Language Information Processing, 11(3), pages 1-16. DOI: 10.1145/2334801.2334802. Online publication date: 1-Sep-2012.
https://dl.acm.org/doi/10.1145/2334801.2334802

Onishi T, Utiyama M, Sumita E, Lin D (2011). Reordering constraint based on document-level context. Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies: short papers - Volume 2, pages 434-438. DOI: 10.5555/2002736.2002824. Online publication date: 19-Jun-2011.
https://dl.acm.org/doi/10.5555/2002736.2002824
