Abstract
In recent years, Recurrent Neural Network based Neural Machine Translation (RNN-based NMT), equipped with an attention mechanism from the decoder to the encoder, has achieved great advances and exhibited strong performance on many language pairs. However, little work has been done on attention for the target side, which has the potential to further improve NMT. To address this issue, in this paper we propose a novel bilingual attention based NMT model, whose bilingual attention mechanism exploits the decoding history and enables the model to dynamically select and exploit both source-side and target-side information. Compared with previous RNN-based NMT models, our model has two advantages. First, it exercises dynamic control over the ratios at which the source and target contexts contribute to the generation of the next target word; in this way, the weakly induced structural relations on both sides can be exploited for NMT. Second, through short-cut connections, the training errors of our model can be back-propagated directly, which effectively alleviates the vanishing and exploding gradient problems. Experimental results and in-depth analyses on Chinese-English, English-German, and English-French translation tasks show that, with proper configurations, our model significantly surpasses the dominant NMT model, Transformer. In particular, our proposed model won first place in the English-Chinese translation task of WMT2018.
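At its core, the bilingual attention described above amounts to a learned gate that mixes a source-side attention context with a target-side context computed over the decoding history, combined with a short-cut connection for gradient flow. The following is a minimal PyTorch sketch of that idea; the gate parameterisation, dimensions, and the class name BilingualAttentionGate are illustrative assumptions, not the authors' published implementation.

```python
import torch
import torch.nn as nn


class BilingualAttentionGate(nn.Module):
    """Sketch of a bilingual-attention gate: mixes a source-side attention
    context with a target-side context derived from the decoding history.
    The exact parameterisation here is an assumption for illustration."""

    def __init__(self, hidden_size: int):
        super().__init__()
        # Gate conditioned on the decoder state and both contexts (assumed form).
        self.gate = nn.Linear(3 * hidden_size, hidden_size)

    def forward(self, dec_state, src_context, tgt_context):
        # g in (0, 1) sets the per-dimension ratio at which the source and
        # target contexts contribute to predicting the next target word.
        g = torch.sigmoid(
            self.gate(torch.cat([dec_state, src_context, tgt_context], dim=-1))
        )
        mixed = g * src_context + (1.0 - g) * tgt_context
        # Short-cut (residual) connection: training errors back-propagate
        # directly through the addition, easing vanishing/exploding gradients.
        return dec_state + mixed


if __name__ == "__main__":
    # Toy usage: batch of 2, hidden size 8.
    layer = BilingualAttentionGate(8)
    s, c_src, c_tgt = torch.randn(2, 8), torch.randn(2, 8), torch.randn(2, 8)
    print(layer(s, c_src, c_tgt).shape)  # torch.Size([2, 8])
```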
Notes
The corpora include LDC2002E18, LDC2003E07, LDC2003E14, the Hansards portion of LDC2004T07, LDC2004T08, and LDC2005T06.
Acknowledgements
The authors were supported by the National Key Research and Development Program of China (No. 2020AAA0108004), the National Natural Science Foundation of China (No. 61672440), the Natural Science Foundation of Fujian Province of China (No. 2020J06001), the Youth Innovation Fund of Xiamen (No. 3502Z20206059), and the Industry-University-Research Project of Xiamen City (No. 3502Z20203002). Fei Long and Jinsong Su are corresponding authors. We also thank the anonymous reviewers for their insightful comments.
Cite this article
Kang, L., He, S., Wang, M. et al. Bilingual attention based neural machine translation. Appl Intell 53, 4302–4315 (2023). https://doi.org/10.1007/s10489-022-03563-8