Efficient accurate syntactic direct translation models: one tree at a time

Hany Hassan, Khalil Sima’an & Andy Way

Abstract

A challenging aspect of Statistical Machine Translation from Arabic to English lies in bringing the Arabic source morpho-syntax to bear on the lexical as well as word-order choices of the English target string. In this article, we extend the feature-rich discriminative Direct Translation Model 2 (DTM2) with a novel linear-time parsing algorithm based on an eager, incremental interpretation of Combinatory Categorial Grammar (CCG). This way we can reap the benefits of a target syntactic enhancement that leads to more grammatical output while also enabling dynamic decoding without the risk of blowing up decoding space and time requirements. Our model defines a mix of model parameters, some of which involve DTM2 source morpho-syntactic features, while others are novel target-side syntactic features. Alongside translation features extracted from the derived parse tree, we explore syntactic features extracted from the incremental derivation process. Our empirical experiments show that our model significantly outperforms the state-of-the-art DTM2 system.
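
The idea of an eager, incremental interpretation of Combinatory Categorial Grammar can be illustrated with a toy left-to-right parser. The sketch below is not the parser described in the article. It assumes that each word already carries exactly one category (supertag), that a single running state is kept, and that only forward application, forward composition, backward application, and subject type-raising are available. All names (`combine`, `type_raise`, `parse_incrementally`) are hypothetical. Because each word triggers a bounded number of combination attempts, the number of steps grows linearly with sentence length.

```python
from typing import Optional, Sequence, Tuple, Union

# A category is an atom ("S", "NP") or a triple (result, slash, argument).
# Example: (S\NP)/NP is written (("S", "\\", "NP"), "/", "NP").
Cat = Union[str, Tuple["Cat", str, "Cat"]]


def combine(state: Cat, nxt: Cat) -> Optional[Cat]:
    """Try to merge the running state with the next word's category."""
    # Forward application: X/Y  Y  =>  X
    if isinstance(state, tuple) and state[1] == "/" and state[2] == nxt:
        return state[0]
    # Forward composition: X/Y  Y/Z  =>  X/Z
    if (isinstance(state, tuple) and state[1] == "/"
            and isinstance(nxt, tuple) and nxt[1] == "/"
            and state[2] == nxt[0]):
        return (state[0], "/", nxt[2])
    # Backward application: Y  X\Y  =>  X
    if isinstance(nxt, tuple) and nxt[1] == "\\" and nxt[2] == state:
        return nxt[0]
    return None


def type_raise(cat: Cat, target: Cat = "S") -> Cat:
    """Forward type-raising: X  =>  T/(T\\X)."""
    return (target, "/", (target, "\\", cat))


def parse_incrementally(supertags: Sequence[Cat]) -> Optional[Cat]:
    """Fold a non-empty supertag sequence into one state, word by word.

    Returns the final category, or None if some word cannot be attached.
    """
    state = supertags[0]
    for nxt in supertags[1:]:
        merged = combine(state, nxt)
        if merged is None:
            # Assumption of this sketch: retry once with a type-raised state.
            merged = combine(type_raise(state), nxt)
        if merged is None:
            return None
        state = merged
    return state


# "John loves Mary": NP  (S\NP)/NP  NP
TRANSITIVE = (("S", "\\", "NP"), "/", "NP")
print(parse_incrementally(["NP", TRANSITIVE, "NP"]))
```

In this example the subject is type-raised to S/(S\NP) and composed with the verb to give S/NP before the object arrives, so a single connected state exists after every word. This is the property that lets syntactic features be read off during left-to-right decoding.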

References

Bangalore S, Joshi A (1999) Supertagging: An approach to almost parsing. Comput Linguist 25(2): 237–265
Berger A, Della Pietra S, Della Pietra VJ (1996) A maximum entropy approach to natural language processing. Comput Linguist 22(1): 39–71
Brown P, Cocke J, Della Pietra S, Jelinek F, Della Pietra VJ, Lafferty J, Mercer R, Roossin P (1990) A statistical approach to machine translation. Comput Linguist 16(2): 79–85
Chelba C (2000) Exploiting syntactic structure for natural language modeling. Ph.D. thesis, Johns Hopkins University, Baltimore, MD
Chiang D (2005) A hierarchical phrase-based model for statistical machine translation. In: 43rd annual meeting of the association for computational linguistics (ACL05), Ann Arbor, pp 263–270
Clark S, Curran J (2007) Wide-coverage efficient statistical parsing with CCG and log-linear models. Comput Linguist 33(4): 493–552
Hassan H, Sima’an K, Way A (2009) Lexicalized semi-incremental dependency parsing. In: Proceedings of RANLP 2009, the international conference on recent advances in natural language processing, Borovets, Bulgaria (to appear)
Hassan H, Sima’an K, Way A (2008a) Syntactically lexicalized phrase-based statistical translation. IEEE Trans Audio Speech Lang Process 16(7): 1260–1273
Hassan H, Sima’an K, Way A (2008b) A syntactic language model based on incremental CCG parsing. In: Proceedings of the IEEE workshop on spoken language technology (SLT), Goa
Hassan H, Sima’an K, Way A (2007) Integrating supertags into phrase-based statistical machine translation. In: Proceedings of the ACL-2007, Prague, Czech Republic, pp 288–295
Hockenmaier J (2003) Data and models for statistical parsing with combinatory categorial grammar. Ph.D. thesis, University of Edinburgh, Edinburgh
Huang L, Chiang D (2007) Forest rescoring: faster decoding with integrated language models. In: Proceedings of the ACL-2007, Prague
Ittycheriah A, Roukos S (2007) Direct translation model 2. In: Human Language Technologies 2007: the conference of the North American chapter of the association for computational linguistics. Proceedings of the main conference, Rochester, pp 57–64
Koehn P (2004a) Pharaoh: a beam search decoder for phrase-based statistical machine translation models. Machine translation: from real users to research. In: Proceedings of 6th conference of the association for machine translation in the Americas, AMTA, Washington, DC, pp 115–124
Koehn P (2004b) Statistical significance tests for machine translation evaluation. In: Proceedings of the conference on empirical methods in natural language processing (EMNLP), Barcelona, pp 388–395
Koehn P, Och FJ, Marcu D (2003) Statistical phrase-based translation. In: Proceedings of the joint human language technology conference and the annual meeting of the North American chapter of the association for computational linguistics (HLT-NAACL 2003), Edmonton, pp 127–133
Marcu D, Wang W, Echihabi A, Knight K (2006) SPMT: statistical machine translation with syntactified target language phrases. In: Proceedings of the 2006 conference on empirical methods in natural language processing (EMNLP 2006), Sydney, pp 44–52
Papineni K, Roukos S, Ward T (1997) Feature-based language understanding. In: Proceedings of 5th European conference on speech communication and technology EUROSPEECH ’97, Rhodes, pp 1435–1438
Papineni K, Roukos S, Ward T, Zhu W-J (2002) BLEU: a method for automatic evaluation of machine translation. In: 40th annual meeting of the association for computational linguistics (ACL’02), Philadelphia, pp 311–318
Shen L, Xu J, Weischedel R (2008) A new string-to-dependency machine translation algorithm with a target dependency language model. In: Proceedings of ACL-08: HLT, Columbus, pp 577–585
Snover M, Dorr B, Schwartz R, Micciulla L, Makhoul J (2006) A study of translation edit rate with targeted human annotation. In: AMTA 2006: Proceedings of the 7th conference of the association for machine translation in the Americas, Cambridge, pp 223–231
Steedman M (2000) The syntactic process. MIT Press, Cambridge
Tillmann C, Ney H (2003) Word reordering and a dynamic programming beam search algorithm for statistical machine translation. Comput Linguist 29(1): 97–133
Zollmann A, Venugopal A (2006) Syntax augmented machine translation via chart parsing. In: Proceedings of the workshop on statistical machine translation, HLT/NAACL, New York, pp 138–141

Author information

Authors and Affiliations

Microsoft Research, Redmond, USA
Hany Hassan
University of Amsterdam, Amsterdam, The Netherlands
Khalil Sima’an
Dublin City University, Dublin, Ireland
Andy Way

Corresponding author

Correspondence to Hany Hassan.

Additional information

This work was done while the first author was working at IBM.

About this article

Cite this article

Hassan, H., Sima’an, K. & Way, A. Efficient accurate syntactic direct translation models: one tree at a time. Machine Translation 26, 121–136 (2012). https://doi.org/10.1007/s10590-011-9116-7

Received: 30 June 2010
Accepted: 30 September 2011
Published: 25 October 2011
Issue Date: March 2012
DOI: https://doi.org/10.1007/s10590-011-9116-7
