Abstract
Evaluation discrepancy and the overcorrection phenomenon are two common problems in neural machine translation (NMT). NMT models are generally trained with a word-level learning objective but evaluated with sentence-level metrics. Moreover, the cross-entropy loss discourages the model from generating synonymous predictions, overcorrecting them toward the ground-truth words. To address these two drawbacks, we adopt multi-task learning and propose a mixed learning objective (MLO), which combines the strengths of word-level and sentence-level evaluation without modifying the model structure. At the word level, it calculates the semantic similarity between predicted and ground-truth words. At the sentence level, it computes probabilistic n-gram matching scores of generated translations. We also combine a loss-sensitive scheduled sampling decoding strategy with MLO to explore its extensibility. Experimental results on the IWSLT 2016 German-English and WMT 2019 English-Chinese datasets demonstrate that our method significantly improves translation quality. An ablation study shows that both the word-level and sentence-level learning objectives improve BLEU scores. Furthermore, MLO is compatible with state-of-the-art scheduled sampling methods and yields further gains.
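The abstract describes the objective only at a high level. Below is a minimal, illustrative sketch (PyTorch-style Python) of how a word-level semantic-similarity term and a sentence-level probabilistic n-gram matching term might be combined into one loss. The function names, the interpolation weight alpha, and the simplified n-gram credit are assumptions for illustration, not the authors' implementation.

# Hedged sketch of a mixed learning objective (MLO): word-level semantic
# similarity plus sentence-level probabilistic n-gram matching. Illustrative only.
import torch
import torch.nn.functional as F

def word_level_loss(probs, target_ids, embedding):
    """Penalize low cosine similarity between the expected predicted embedding
    and the ground-truth word embedding (probs: [T, V], target_ids: [T])."""
    expected_emb = probs @ embedding.weight        # [T, d] expectation over vocabulary
    target_emb = embedding(target_ids)             # [T, d]
    cos = F.cosine_similarity(expected_emb, target_emb, dim=-1)
    return (1.0 - cos).mean()

def sentence_level_loss(probs, target_ids, n=2):
    """Simplified probabilistic n-gram matching: expected probability, under the
    model distribution, of reproducing each reference n-gram."""
    T = target_ids.size(0)
    if T < n:
        return probs.new_tensor(0.0)
    matches = []
    for t in range(T - n + 1):
        # probability of generating the reference n-gram starting at position t
        p = torch.prod(torch.stack(
            [probs[t + k, target_ids[t + k]] for k in range(n)]))
        matches.append(p)
    # maximizing expected n-gram matches == minimizing their negative mean
    return 1.0 - torch.stack(matches).mean()

def mixed_learning_objective(probs, target_ids, embedding, alpha=0.5):
    """MLO = alpha * word-level term + (1 - alpha) * sentence-level term
    (alpha is an assumed interpolation weight)."""
    return alpha * word_level_loss(probs, target_ids, embedding) \
        + (1.0 - alpha) * sentence_level_loss(probs, target_ids)

# Toy usage: vocabulary of 10 words, 5-token target sentence.
vocab, dim, T = 10, 8, 5
embedding = torch.nn.Embedding(vocab, dim)
logits = torch.randn(T, vocab, requires_grad=True)
probs = logits.softmax(dim=-1)
target = torch.randint(0, vocab, (T,))
loss = mixed_learning_objective(probs, target, embedding)
loss.backward()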
Acknowledgment
This research work was funded by the National Natural Science Foundation of China (Grant No. 61772337) and the National Key Research and Development Program of China (No. 2018YFC0830803).