Improving statistical machine translation using domain bilingual multiword expressions

Published: 06 August 2009

Abstract

Multiword expressions (MWEs) have proved useful for many natural language processing tasks. However, how to use them to improve the performance of statistical machine translation (SMT) has not been well studied. This paper presents a simple yet effective strategy for extracting domain bilingual multiword expressions. In addition, we implement three methods for integrating bilingual MWEs into Moses, the state-of-the-art phrase-based machine translation system. Experiments show that bilingual MWEs can improve translation performance significantly.

      Published In

      MWE '09: Proceedings of the Workshop on Multiword Expressions: Identification, Interpretation, Disambiguation and Applications
      August 2009
      80 pages
      ISBN: 9781932432602

      Publisher

      Association for Computational Linguistics

      United States

      Qualifiers

      • Research-article

      Acceptance Rates

      Overall Acceptance Rate 31 of 69 submissions, 45%


      Cited By

      • (2019) Unsupervised compositionality prediction of nominal compounds. Computational Linguistics 45:1, pages 1-57. DOI: 10.1162/coli_a_00341. Online publication date: 17-May-2019
      • (2019) Topic-based term translation models for statistical machine translation. Artificial Intelligence 232:C, pages 54-75. DOI: 10.1016/j.artint.2015.12.002. Online publication date: 4-Jan-2019
      • (2018) Reuse of termino-ontological resources and text corpora for building a multilingual domain ontology. Journal of Biomedical Informatics 48:C, pages 171-182. DOI: 10.1016/j.jbi.2013.12.013. Online publication date: 26-Dec-2018
      • (2012) Bootstrapping method for chunk alignment in phrase based SMT. Proceedings of the Joint Workshop on Exploiting Synergies between Information Retrieval and Machine Translation (ESIRMT) and Hybrid Approaches to Machine Translation (HyTra), pages 93-100. DOI: 10.5555/2387956.2387969. Online publication date: 23-Apr-2012
      • (2011) A rapid method to extract multiword expressions with statistic measures and linguistic rules. Proceedings of the 2011 international conference on Web information systems and mining - Volume Part II, pages 234-241. DOI: 10.5555/2045753.2045789. Online publication date: 24-Sep-2011
      • (2011) An n-gram frequency database reference to handle MWE extraction in NLP applications. Proceedings of the Workshop on Multiword Expressions: from Parsing and Generation to the Real World, pages 83-91. DOI: 10.5555/2021121.2021139. Online publication date: 23-Jun-2011
      • (2011) Influence of treebank design on representation of multiword expressions. Proceedings of the 12th international conference on Computational linguistics and intelligent text processing - Volume Part I, pages 1-14. DOI: 10.5555/1964799.1964801. Online publication date: 20-Feb-2011
      • (2010) Task-based evaluation of multiword expressions. Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics, pages 242-245. DOI: 10.5555/1857999.1858028. Online publication date: 2-Jun-2010
