Abstract
Document-level machine translation remains challenging owing to the high time complexity of existing models. In this paper, we propose a dual-learning-based neural machine translation (NMT) approach that uses an undirected neural sequence model for document-level translation. The dual-learning mechanism enables an NMT system to learn automatically from corpora through a reinforcement learning process. Undirected neural sequence models such as Bidirectional Encoder Representations from Transformers (BERT) have achieved success on several natural language processing (NLP) tasks. Inspired by a BERT-like machine translation model, we employ a constant-time decoding strategy, in which the number of decoding iterations is fixed rather than growing with the output length. In addition, we adopt a two-step training strategy. Experimental results on several document-level translation tasks show that our approach decodes much faster than a previous document-level NMT model while incurring only an acceptable loss in translation quality.
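To make the constant-time decoding strategy concrete, the following is a minimal sketch of mask-predict-style parallel decoding with an undirected (BERT-like) sequence model: all target positions start masked, every position is predicted in parallel at each step, and the least confident predictions are re-masked, for a fixed number of iterations independent of the output length. The `model(src, tokens)` interface, the mask id, and the linear re-masking schedule are illustrative assumptions, not the paper's exact implementation.

```python
import torch

def mask_predict_decode(model, src, tgt_len, num_iters=10, mask_id=0):
    """Sketch of constant-time (mask-predict-style) decoding.

    `model(src, tokens)` is a hypothetical interface that returns
    per-position log-probabilities of shape (tgt_len, vocab_size),
    conditioned on the source and the partially masked target.
    """
    # Start with every target position masked.
    tokens = torch.full((tgt_len,), mask_id, dtype=torch.long)
    probs = torch.zeros(tgt_len)
    for t in range(num_iters):
        log_p = model(src, tokens)
        conf, pred = log_p.max(dim=-1)  # predict all positions in parallel
        # Only masked positions take the new prediction; committed
        # tokens keep their token and confidence from earlier steps.
        masked = tokens == mask_id
        tokens = torch.where(masked, pred, tokens)
        probs = torch.where(masked, conf.exp(), probs)
        # Linearly shrinking schedule: re-mask the least confident
        # tokens, fewer on each successive iteration.
        n_mask = tgt_len * (num_iters - 1 - t) // num_iters
        if n_mask == 0:
            break
        remask = probs.topk(n_mask, largest=False).indices
        tokens[remask] = mask_id
        probs[remask] = 0.0
    return tokens
```

Because the refinement loop runs a fixed `num_iters` times regardless of `tgt_len`, the number of decoding passes does not grow with the output length, which is the source of the speedup over left-to-right, token-by-token decoding.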
Cite this paper
Zhang, L., Xu, J. (2020). Dual-Learning-Based Neural Machine Translation Using Undirected Sequence Model for Document Translation. In: Yang, H., Pasupa, K., Leung, A.C.S., Kwok, J.T., Chan, J.H., King, I. (eds.) Neural Information Processing. ICONIP 2020. Communications in Computer and Information Science, vol. 1332. Springer, Cham. https://doi.org/10.1007/978-3-030-63820-7_73