Abstract
Evaluation discrepancy and the overcorrection phenomenon are two common problems in neural machine translation (NMT). NMT models are generally trained with a word-level learning objective but evaluated with sentence-level metrics. Moreover, the cross-entropy loss discourages the model from generating synonymous predictions, overcorrecting them toward the ground-truth words. To address these two drawbacks, we adopt multi-task learning and propose a mixed learning objective (MLO), which combines the strengths of word-level and sentence-level evaluation without modifying the model structure. At the word level, it calculates the semantic similarity between predicted and ground-truth words. At the sentence level, it computes probabilistic n-gram matching scores of generated translations. We also combine a loss-sensitive scheduled sampling decoding strategy with MLO to explore its extensibility. Experimental results on the IWSLT 2016 German-English and WMT 2019 English-Chinese datasets demonstrate that our method significantly improves translation quality. An ablation study shows that both the word-level and sentence-level learning objectives improve BLEU scores. Furthermore, MLO is compatible with state-of-the-art scheduled sampling methods and yields further gains.
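The abstract describes the objective only at a high level. Below is a minimal, illustrative sketch (PyTorch-style Python) of how a word-level semantic-similarity term and a sentence-level probabilistic n-gram matching term might be combined into one loss. The function names, the interpolation weight alpha, and the simplified n-gram credit are assumptions for illustration, not the authors' implementation.

# Hedged sketch of a mixed learning objective (MLO): word-level semantic
# similarity plus sentence-level probabilistic n-gram matching. Illustrative only.
import torch
import torch.nn.functional as F

def word_level_loss(probs, target_ids, embedding):
    """Penalize low cosine similarity between the expected predicted embedding
    and the ground-truth word embedding (probs: [T, V], target_ids: [T])."""
    expected_emb = probs @ embedding.weight        # [T, d] expectation over vocabulary
    target_emb = embedding(target_ids)             # [T, d]
    cos = F.cosine_similarity(expected_emb, target_emb, dim=-1)
    return (1.0 - cos).mean()

def sentence_level_loss(probs, target_ids, n=2):
    """Simplified probabilistic n-gram matching: expected probability, under the
    model distribution, of reproducing each reference n-gram."""
    T = target_ids.size(0)
    if T < n:
        return probs.new_tensor(0.0)
    matches = []
    for t in range(T - n + 1):
        # probability of generating the reference n-gram starting at position t
        p = torch.prod(torch.stack(
            [probs[t + k, target_ids[t + k]] for k in range(n)]))
        matches.append(p)
    # maximizing expected n-gram matches == minimizing their negative mean
    return 1.0 - torch.stack(matches).mean()

def mixed_learning_objective(probs, target_ids, embedding, alpha=0.5):
    """MLO = alpha * word-level term + (1 - alpha) * sentence-level term
    (alpha is an assumed interpolation weight)."""
    return alpha * word_level_loss(probs, target_ids, embedding) \
        + (1.0 - alpha) * sentence_level_loss(probs, target_ids)

# Toy usage: vocabulary of 10 words, 5-token target sentence.
vocab, dim, T = 10, 8, 5
embedding = torch.nn.Embedding(vocab, dim)
logits = torch.randn(T, vocab, requires_grad=True)
probs = logits.softmax(dim=-1)
target = torch.randint(0, vocab, (T,))
loss = mixed_learning_objective(probs, target, embedding)
loss.backward()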
Acknowledgment
This research work was funded by the National Natural Science Foundation of China (Grant No. 61772337) and the National Key Research and Development Program of China (No. 2018YFC0830803).