Abstract
A challenging aspect of Statistical Machine Translation from Arabic to English lies in bringing the Arabic source morpho-syntax to bear on both the lexical and the word-order choices of the English target string. In this article, we extend the feature-rich discriminative Direct Translation Model 2 (DTM2) with a novel linear-time parsing algorithm based on an eager, incremental interpretation of Combinatory Categorial Grammar. This lets us reap the benefits of target-side syntax, which leads to more grammatical output, while still allowing dynamic decoding without the risk of blowing up the decoder's space and time requirements. Our model defines a mix of parameters: some involve DTM2 source morpho-syntactic features, and others involve novel target-side syntactic features. Alongside translation features extracted from the derived parse tree, we explore syntactic features extracted from the incremental derivation process. Our experiments show that our model significantly outperforms the state-of-the-art DTM2 system.
Additional information
This work was done while the first author was working at IBM.
Cite this article
Hassan, H., Sima’an, K. & Way, A. Efficient accurate syntactic direct translation models: one tree at a time. Machine Translation 26, 121–136 (2012). https://doi.org/10.1007/s10590-011-9116-7
DOI: https://doi.org/10.1007/s10590-011-9116-7