DOI: 10.5555/1868850.1868912
Research article

Divide and translate: improving long distance reordering in statistical machine translation

Published: 15 July 2010

Abstract

This paper proposes a novel method for long-distance, clause-level reordering in statistical machine translation (SMT). The proposed method translates the clauses of the source sentence separately and reconstructs the target sentence from the clause translations, which contain non-terminals. The non-terminals are placeholders for embedded clauses; they reduce complicated clause-level reordering to simple word-level reordering. The translation model is trained on a bilingual corpus with clause-level alignment, which our alignment algorithm annotates automatically using a syntactic parser for the source language. For English-to-Japanese translation of research paper abstracts in the medical domain, we achieved significant improvements of 1.4% in BLEU and 1.3% in TER with Moses, and of 2.2% in BLEU and 3.5% in TER with our hierarchical phrase-based SMT system.
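
The divide-and-reconstruct idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the nested-list sentence representation, the `<s0>`/`<s1>` placeholder names, the functions `split_clauses`, `reconstruct`, and `divide_and_translate`, and the lookup-table "translator" `toy_translate` are all assumptions made for illustration. A real system would obtain the clauses from a syntactic parser and would translate each clause with an SMT decoder that treats the placeholder as an ordinary token.

```python
from typing import Callable, Dict, List, Union

# A node is either a word or an embedded clause (a nested list of nodes).
Node = Union[str, List["Node"]]


def split_clauses(sentence: List[Node]) -> Dict[str, List[str]]:
    """Flatten a nested sentence into flat clauses linked by placeholders.

    The root clause is labelled "<s0>"; each embedded clause is replaced in
    its parent by a placeholder token such as "<s1>" (hypothetical naming).
    """
    clauses: Dict[str, List[str]] = {}

    def visit(clause: List[Node]) -> str:
        label = f"<s{len(clauses)}>"
        clauses[label] = []  # reserve the label before recursing
        flat: List[str] = []
        for child in clause:
            # An embedded clause becomes a single placeholder token.
            flat.append(visit(child) if isinstance(child, list) else child)
        clauses[label] = flat
        return label

    visit(sentence)
    return clauses


def reconstruct(label: str, translated: Dict[str, List[str]]) -> List[str]:
    """Substitute each placeholder with its (recursively rebuilt) translation."""
    out: List[str] = []
    for token in translated[label]:
        if token in translated:
            out.extend(reconstruct(token, translated))
        else:
            out.append(token)
    return out


def divide_and_translate(
    sentence: List[Node],
    translate_clause: Callable[[List[str]], List[str]],
) -> List[str]:
    """Translate every clause on its own, then stitch the results together."""
    clauses = split_clauses(sentence)
    translated = {label: translate_clause(toks) for label, toks in clauses.items()}
    return reconstruct("<s0>", translated)


# Toy stand-in for a clause-level translator (assumption: a fixed lookup table).
# The placeholder "<s1>" moves like a single word inside the main clause.
TOY_TABLE = {
    ("I", "know", "<s1>"): ["watashi", "wa", "<s1>", "shitte", "iru"],
    ("that", "he", "came"): ["kare", "ga", "kita", "to"],
}


def toy_translate(tokens: List[str]) -> List[str]:
    return TOY_TABLE[tuple(tokens)]


sentence = ["I", "know", ["that", "he", "came"]]
result = " ".join(divide_and_translate(sentence, toy_translate))
print(result)  # watashi wa kare ga kita to shitte iru
```

In the toy example, moving the whole embedded clause in front of the verb requires repositioning only one token, `<s1>`, within the main clause; this is the sense in which clause-level reordering becomes word-level reordering.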

Published In

WMT '10: Proceedings of the Joint Fifth Workshop on Statistical Machine Translation and MetricsMATR
July 2010, 455 pages
ISBN: 9781932432718

Publisher

Association for Computational Linguistics, United States

Cited By

  • (2019) Does BLEU score work for code migration? Proceedings of the 27th International Conference on Program Comprehension, pages 165-176. DOI: 10.1109/ICPC.2019.00034. Online publication date: 25-May-2019
  • (2015) Divide-and-conquer approach for multi-phase statistical migration for source code. Proceedings of the 30th IEEE/ACM International Conference on Automated Software Engineering, pages 585-596. DOI: 10.1109/ASE.2015.74. Online publication date: 9-Nov-2015
  • (2012) Application of clause alignment for statistical machine translation. Proceedings of the Sixth Workshop on Syntax, Semantics and Structure in Statistical Translation, pages 102-110. DOI: 10.5555/2392936.2392952. Online publication date: 12-Jul-2012
  • (2012) HPSG-Based Preprocessing for English-to-Japanese Translation. ACM Transactions on Asian Language Information Processing, 11:3, pages 1-16. DOI: 10.1145/2334801.2334802. Online publication date: 1-Sep-2012
  • (2011) Reordering constraint based on document-level context. Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies: short papers - Volume 2, pages 434-438. DOI: 10.5555/2002736.2002824. Online publication date: 19-Jun-2011
