Abstract
We present a new implication of Wu’s (1997) Inversion Transduction Grammar (ITG) Hypothesis for the problem of retrieving truly parallel sentence translations from large collections of highly non-parallel documents. Our approach leverages a strong language-universal constraint posited by the ITG Hypothesis, which can serve as a strong inductive bias for various language learning problems, resulting in both efficiency and accuracy gains. The task we attack is highly practical: non-parallel multilingual data exists in far greater quantities than parallel corpora, yet parallel sentences are a much more useful resource. Our aim here is to mine truly parallel sentences, as opposed to the comparable sentence pairs or loose translations targeted by most previous work. The method we introduce exploits Bracketing ITGs to produce the first known results for this problem. Experiments show that it obtains large accuracy gains on this task compared to the expected performance of state-of-the-art models that were developed for the less stringent task of mining comparable sentence pairs.
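As a rough illustration of the kind of word-order constraint the ITG Hypothesis imposes (and not the mining method of this paper), the sketch below checks whether a permutation of word positions can be built purely from binary "straight" and "inverted" combinations of adjacent spans, which is what a Bracketing ITG permits. The function name `is_itg_permutation` and the encoding of an alignment as a 0-based permutation are assumptions made for this example; the check is the standard greedy shift-reduce test for binarizable permutations.

```python
from itertools import permutations


def is_itg_permutation(perm):
    """Return True if `perm` (a permutation of 0..n-1) can be derived by
    repeatedly merging two adjacent spans in straight or inverted order.

    Hypothetical helper for illustration only: perm[i] is assumed to be the
    target-side position aligned to source word i.
    """
    stack = []  # each entry is a contiguous block (lo, hi) of target positions
    for p in perm:
        stack.append((p, p))
        # Greedily merge the top two blocks while they are adjacent on the
        # target side (straight: first block precedes; inverted: it follows).
        while len(stack) >= 2:
            (lo1, hi1), (lo2, hi2) = stack[-2], stack[-1]
            if hi1 + 1 == lo2 or hi2 + 1 == lo1:
                stack[-2:] = [(min(lo1, lo2), max(hi1, hi2))]
            else:
                break
    # A single remaining block means the whole permutation was bracketed.
    return len(stack) <= 1


if __name__ == "__main__":
    print(is_itg_permutation([1, 0, 3, 2]))  # two local swaps
    print(is_itg_permutation([1, 3, 0, 2]))  # "inside-out" pattern
    n_ok = sum(is_itg_permutation(p) for p in permutations(range(4)))
    print(n_ok, "of 24 four-word permutations are ITG-derivable")
```

Under these assumptions, `[1, 0, 3, 2]` is accepted, while the "inside-out" orderings `[1, 3, 0, 2]` and `[2, 0, 3, 1]` are rejected; these are the only two four-element permutations that no ITG can generate. A sentence pair whose best word alignment needs such a pattern is, under the hypothesis, less likely to be a true translation.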
This work was supported in part by the Hong Kong Research Grants Council through grants RGC6083/99E, RGC6256/00E, DAG03/04.EG09, and RGC6206/03E.
References
Wu, D.: An algorithm for simultaneously bracketing parallel texts by aligning words. In: ACL-1995, Cambridge, MA (1995)
Wu, D.: Stochastic inversion transduction grammars and bilingual parsing of parallel corpora. Computational Linguistics 23 (1997)
Zens, R., Ney, H.: A comparative study on reordering constraints in statistical machine translation. In: ACL-2003, Sapporo, pp. 192–202 (2003)
Zhang, H., Gildea, D.: Syntax-based alignment: Supervised or unsupervised? In: COLING-2004, Geneva (2004)
Yamada, K., Knight, K.: A syntax-based statistical translation model. In: ACL-2001, Toulouse, France (2001)
Zhang, H., Gildea, D.: Stochastic lexicalized inversion transduction grammar for alignment. In: ACL-2005, Ann Arbor, pp. 475–482 (2005)
Zens, R., Ney, H., Watanabe, T., Sumita, E.: Reordering constraints for phrase-based statistical machine translation. In: COLING-2004, Geneva (2004)
Chiang, D.: A hierarchical phrase-based model for statistical machine translation. In: ACL-2005, Ann Arbor, pp. 263–270 (2005)
Fung, P., Cheung, P.: Mining very-non-parallel corpora: Parallel sentence and lexicon extraction via bootstrapping and EM. In: EMNLP-2004, Barcelona (2004)
Munteanu, D.S., Fraser, A., Marcu, D.: Improved machine translation performance via parallel sentence extraction from comparable corpora. In: NAACL-2004 (2004)
Zhao, B., Vogel, S.: Adaptive parallel sentences mining from web bilingual news collections. In: IEEE Workshop on Data Mining (2002)
Lewis, P.M., Stearns, R.E.: Syntax-directed transduction. Journal of the Association for Computing Machinery 15, 465–488 (1968)
Fung, P., Liu, X., Cheung, C.S.: Mixed-language query disambiguation. In: ACL-1999, Maryland (1999)
Och, F.J., Ney, H.: Improved statistical alignment models. In: ACL-2000, Hong Kong (2000)
Brown, P.F., DellaPietra, S.A., DellaPietra, V.J., Mercer, R.L.: The mathematics of statistical machine translation. Computational Linguistics 19, 263–311 (1993)
Leusch, G., Ueffing, N., Ney, H.: A novel string-to-string distance measure with applications to machine translation evaluation. In: MT Summit IX (2003)
© 2005 Springer-Verlag Berlin Heidelberg
Cite this paper
Wu, D., Fung, P. (2005). Inversion Transduction Grammar Constraints for Mining Parallel Sentences from Quasi-Comparable Corpora. In: Dale, R., Wong, KF., Su, J., Kwong, O.Y. (eds) Natural Language Processing – IJCNLP 2005. IJCNLP 2005. Lecture Notes in Computer Science, vol 3651. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11562214_23
DOI: https://doi.org/10.1007/11562214_23
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-29172-5
Online ISBN: 978-3-540-31724-1
eBook Packages: Computer Science, Computer Science (R0)