Recent Advances in Dialogue Machine Translation
Figures:
- Architectures of the (a) SMT and (b) NMT models.
- An example of chat-based dialogue translation, extracted from the Chinese-English MVSub Corpus.
- An example of task-based dialogue translation, extracted from the Chinese-English IWSLT-DIALOG Corpus.
- Overview of dialogue translation issues and sub-categories, along with related works.
- The architecture of the document-level Transformer model.
- The architecture of reconstructor-augmented NMT; the Chinese word in red is a zero pronoun ("it").
- Illustration of decoding with reconstruction.
- Personalized dialogue translation model using speaker information.
Abstract
1. Introduction
- Previous works mainly exploited dialogue MT from the perspectives of coherence, consistency, and cohesion. More recently, studies have begun to pay attention to personality issues such as role preference.
- Although several related corpora exist, the scarcity of training data remains a crucial issue that severely hinders the application of deep learning methods to real-world dialogue translation.
- Existing approaches fall into three main strands. The first exploits document-level NMT architectures, which improve consistency and coherence in the translation output. The second deals with specific discourse phenomena such as anaphora, leading to better cohesion in translations. The third enhances the personality of dialogue MT systems by leveraging additional human-labeled information. In future work, it will be necessary to design an end-to-end model that captures these various characteristics of dialogue jointly.
- Our empirical experiments yield several interesting findings: (1) data selection methods can significantly improve the baseline model, especially with small-scale data; (2) large-batch learning works well, making sentence-level NMT models the best performers among the NMT variants; (3) document-level contexts are not always useful for dialogue translation due to limited data; (4) transferring general knowledge from pretrained models is helpful for dialogue MT.
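The document-level strand mentioned above is often realized by simply concatenating a few preceding source-side utterances to the current one before feeding it to a standard NMT encoder. The sketch below illustrates only this input-building step; the window size `k` and the `<sep>` separator token are illustrative assumptions, not the exact setup of any surveyed system.

```python
def build_context_inputs(dialogue, k=2, sep="<sep>"):
    """Prepend up to k previous source-side utterances to each
    utterance, joined by a separator token, producing the
    context-augmented inputs used by concatenation-style
    document-level NMT."""
    augmented = []
    for i, utterance in enumerate(dialogue):
        context = dialogue[max(0, i - k):i]  # up to k preceding turns
        augmented.append(f" {sep} ".join(context + [utterance]))
    return augmented
```

For example, `build_context_inputs(["a", "b", "c"], k=1)` yields `["a", "a <sep> b", "b <sep> c"]`; the translation model then sees each turn together with its recent dialogue history.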
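Finding (1) concerns selecting pseudo-in-domain sentences from a large general-domain pool. As a minimal illustration of one common family of such methods, the sketch below ranks sentences by the cross-entropy difference between an in-domain and a general-domain language model; the language models are simplified here to add-one-smoothed unigram models, and the function names are illustrative assumptions rather than the authors' exact pipeline.

```python
import math
from collections import Counter

def unigram_logprob(tokens, counts, total, vocab_size, alpha=1.0):
    # Add-alpha smoothed unigram log-probability, length-normalized
    lp = sum(math.log((counts[t] + alpha) / (total + alpha * vocab_size))
             for t in tokens)
    return lp / max(len(tokens), 1)

def moore_lewis_select(general_pool, in_domain, top_k):
    """Score each general-domain sentence by the difference between its
    in-domain and general-domain LM log-probabilities; higher scores
    are more in-domain-like. Return the top_k sentences."""
    in_counts = Counter(t for s in in_domain for t in s.split())
    gen_counts = Counter(t for s in general_pool for t in s.split())
    vocab_size = len(set(in_counts) | set(gen_counts)) or 1
    in_total, gen_total = sum(in_counts.values()), sum(gen_counts.values())
    scored = []
    for s in general_pool:
        toks = s.split()
        score = (unigram_logprob(toks, in_counts, in_total, vocab_size)
                 - unigram_logprob(toks, gen_counts, gen_total, vocab_size))
        scored.append((score, s))
    scored.sort(reverse=True)
    return [s for _, s in scored[:top_k]]
```

In practice the two language models would be n-gram or neural LMs trained on the dialogue corpus and the general corpus, respectively, but the ranking principle is the same.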
2. Preliminary
2.1. Machine Translation
2.1.1. Statistical Machine Translation
2.1.2. Neural Machine Translation
2.2. Dialogue Translation
3. Overview of Dialogue Machine Translation
3.1. Dialogue Translation Issues
3.2. Existing Data
3.3. Representative Approaches
3.3.1. Architecture: Document-Level NMT
3.3.2. Discourse Phenomenon: Zero Pronoun Translation
3.3.3. Dialogue Personality: Speaker Information
3.4. Real-Life Applications
4. Building Advanced Dialogue NMT Systems
4.1. Methodology
4.2. Experiments
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Abbreviations
| Abbreviation | Full Form |
|---|---|
| NLP | Natural Language Processing |
| MT | Machine Translation |
| NMT | Neural Machine Translation |
| RBMT | Rule-Based Machine Translation |
| SMT | Statistical Machine Translation |
| LM | Language Model |
| NE | Named Entity |
| ZP | Zero Pronoun |
| SOTA | State-of-the-Art |
References
- Simpson, J.A.; Weiner, E.S.C. (Eds.) Oxford English Dictionary; Oxford University Press: Oxford, UK, 1989. [Google Scholar]
- Bahdanau, D.; Cho, K.; Bengio, Y. Neural machine translation by jointly learning to align and translate. In Proceedings of the 3rd International Conference on Learning Representations, San Diego, CA, USA, 7–9 May 2015. [Google Scholar]
- Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need. In Proceedings of the 31st International Conference on Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017. [Google Scholar]
- Gehring, J.; Auli, M.; Grangier, D.; Yarats, D.; Dauphin, Y.N. Convolutional sequence to sequence learning. In Proceedings of the 34th International Conference on Machine Learning, Sydney, Australia, 6–11 August 2017. [Google Scholar]
- Danescu-Niculescu-Mizil, C.; Lee, L. Chameleons in Imagined Conversations: A New Approach to Understanding Coordination of Linguistic Style in Dialogs. In Proceedings of the 2nd Workshop on Cognitive Modeling and Computational Linguistics, Portland, OR, USA, 23 June 2011; pp. 76–87. [Google Scholar]
- Banchs, R.E. Movie-DiC: A Movie Dialogue Corpus for Research and Development. In Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics, Jeju Island, Korea, 8–14 July 2012; pp. 203–207. [Google Scholar]
- Walker, M.A.; Lin, G.I.; Sawyer, J. An Annotated Corpus of Film Dialogue for Learning and Characterizing Character Style. In Proceedings of the 8th International Conference on Language Resources and Evaluation, Istanbul, Turkey, 23–25 May 2012; pp. 1373–1378. [Google Scholar]
- Schmitt, A.; Ultes, S.; Minker, W. A Parameterized and Annotated Spoken Dialog Corpus of the CMU Let’s Go Bus Information System. In Proceedings of the 8th International Conference on Language Resources and Evaluation, Istanbul, Turkey, 23–25 May 2012; pp. 3369–3373. [Google Scholar]
- Byrne, B.; Krishnamoorthi, K.; Sankar, C.; Neelakantan, A.; Goodrich, B.; Duckworth, D.; Yavuz, S.; Dubey, A.; Kim, K.Y.; Cedilnik, A. Taskmaster-1: Toward a Realistic and Diverse Dialog Dataset. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing, Hong Kong, China, 3–7 November 2019; pp. 4516–4525. [Google Scholar]
- Tiedemann, J. Parallel Data, Tools and Interfaces in OPUS. In Proceedings of the 8th International Conference on Language Resources and Evaluation, Istanbul, Turkey, 23–25 May 2012; pp. 2214–2218. [Google Scholar]
- Wang, L.; Zhang, X.; Tu, Z.; Liu, Q.; Way, A. Automatic Construction of Discourse Corpora for Dialogue Translation. In Proceedings of the 10th International Conference on Language Resources and Evaluation, Portorož, Slovenia, 23–28 May 2016; pp. 2248–2254. [Google Scholar]
- Farajian, M.A.; Lopes, A.V.; Martins, A.F.; Maruf, S.; Haffari, G. Findings of the WMT 2020 shared task on chat translation. In Proceedings of the 5th Conference on Machine Translation, Online, 19–20 November 2020; pp. 65–75. [Google Scholar]
- Wang, L.; Tu, Z.; Way, A.; Liu, Q. Exploiting Cross-Sentence Context for Neural Machine Translation. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, Copenhagen, Denmark, 7–11 September 2017. [Google Scholar]
- Maruf, S.; Martins, A.F.; Haffari, G. Contextual Neural Model for Translating Bilingual Multi-Speaker Conversations. In Proceedings of the 3rd Conference on Machine Translation, Brussels, Belgium, 31 October–1 November 2018. [Google Scholar]
- Wang, L.; Tu, Z.; Shi, S.; Zhang, T.; Graham, Y.; Liu, Q. Translating Pro-Drop Languages with Reconstruction Models. In Proceedings of the 32nd AAAI Conference on Artificial Intelligence, New Orleans, LA, USA, 2–7 February 2018. [Google Scholar]
- Wang, L.; Tu, Z.; Wang, X.; Shi, S. One Model to Learn Both: Zero Pronoun Prediction and Translation. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing, Hong Kong, China, 3–7 November 2019; pp. 921–930. [Google Scholar]
- Yang, J.; Tong, J.; Li, S.; Gao, S.; Guo, J.; Xue, N. Recovering dropped pronouns in Chinese conversations via modeling their referents. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics, Minneapolis, MN, USA, 3–5 June 2019. [Google Scholar]
- Meyer, T.; Poláková, L. Machine translation with many manually labeled discourse connectives. In Proceedings of the Workshop on Discourse in Machine Translation, Sofia, Bulgaria, 9 August 2013; pp. 43–50. [Google Scholar]
- Meyer, T.; Webber, B. Implicitation of discourse connectives in (machine) translation. In Proceedings of the Workshop on Discourse in Machine Translation, Sofia, Bulgaria, 9 August 2013; pp. 19–26. [Google Scholar]
- Liang, Y.; Meng, F.; Chen, Y.; Xu, J.; Zhou, J. Modeling bilingual conversational characteristics for neural chat translation. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, Online, 1–6 August 2021. [Google Scholar]
- Liang, Y.; Zhou, C.; Meng, F.; Xu, J.; Chen, Y.; Su, J.; Zhou, J. Towards Making the Most of Dialogue Characteristics for Neural Chat Translation. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, Punta Cana, Dominican Republic, 12 October 2021. [Google Scholar]
- Nirenburg, S.; Raskin, V.; Tucker, A. On knowledge-based machine translation. In Proceedings of the 11th Conference on Computational Linguistics, Bonn, Germany, 25–29 August 1986; pp. 627–632. [Google Scholar]
- Koehn, P. Statistical Machine Translation; Cambridge University Press: Cambridge, UK, 2009. [Google Scholar]
- Kalchbrenner, N.; Blunsom, P. Recurrent Continuous Translation Models. In Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, Seattle, WA, USA, 18–21 October 2013. [Google Scholar]
- Sutskever, I.; Vinyals, O.; Le, Q.V. Sequence to sequence learning with neural networks. In Proceedings of the 28th Conference on Neural Information Processing Systems, Montréal, QC, Canada, 8–13 December 2014. [Google Scholar]
- Brown, P.F.; Pietra, V.J.D.; Pietra, S.A.D.; Mercer, R.L. The mathematics of statistical machine translation: Parameter estimation. Comput. Linguist. 1993, 19, 263–311. [Google Scholar]
- Och, F.J.; Ney, H. Discriminative training and maximum entropy models for statistical machine translation. In Proceedings of the 40th Annual Meeting on Association for Computational Linguistics, Philadelphia, PA, USA, 7–12 July 2002; pp. 295–302. [Google Scholar]
- Weaver, W. The mathematics of communication. Sci. Am. 1949, 181, 11–15. [Google Scholar] [CrossRef] [PubMed]
- Koehn, P.; Hoang, H.; Birch, A.; Callison-Burch, C.; Federico, M.; Bertoldi, N.; Cowan, B.; Shen, W.; Moran, C.; Zens, R.; et al. Moses: Open Source Toolkit for Statistical Machine Translation. In Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics, Prague, Czech Republic, 23–30 June 2007; pp. 177–180. [Google Scholar]
- Och, F.J.; Ney, H. A Systematic Comparison of Various Statistical Alignment Models. Comput. Linguist. 2003, 29, 19–51. [Google Scholar] [CrossRef]
- Stolcke, A. Srilm—An extensible language modeling toolkit. In Proceedings of the 7th International Conference on Spoken Language Processing, Denver, CO, USA, 16–20 September 2002; pp. 901–904. [Google Scholar]
- Och, F.J. Minimum Error Rate Training in Statistical Machine Translation. In Proceedings of the 41st Annual Meeting on Association for Computational Linguistics, Sapporo, Japan, 7–12 July 2003; pp. 160–167. [Google Scholar]
- Cho, K.; van Merrienboer, B.; Gulcehre, C.; Bahdanau, D.; Bougares, F.; Schwenk, H.; Bengio, Y. Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing, Doha, Qatar, 25–29 October 2014. [Google Scholar]
- Papineni, K.; Roukos, S.; Ward, T.; Zhu, W.J. BLEU: A Method for Automatic Evaluation of Machine Translation. In Proceedings of the 40th Annual Meeting on Association for Computational Linguistics, Philadelphia, PA, USA, 7–12 July 2002; pp. 311–318. [Google Scholar]
- Sanders, T.; Maat, H.P. Cohesion and coherence: Linguistic approaches. Reading 2006, 99, 440–446. [Google Scholar]
- Mann, W.C.; Thompson, S.A. Rhetorical structure theory: Toward a functional theory of text organization. Text-Interdiscip. J. Study Discourse 1988, 8, 243–281. [Google Scholar] [CrossRef]
- Foster, G.; Isabelle, P.; Kuhn, R. Translating Structured Documents. In Proceedings of the 9th Conference of the Association for Machine Translation in the Americas, Denver, CO, USA, 31 October–4 November 2010. [Google Scholar]
- Marcu, D.; Carlson, L.; Watanabe, M. The automatic translation of discourse structures. In Proceedings of the 1st Meeting of the North American Chapter of the Association for Computational Linguistics, Seattle, WA, USA, 29 April–4 May 2000. [Google Scholar]
- Tu, M.; Zhou, Y.; Zong, C. A novel translation framework based on rhetorical structure theory. In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics, Sofia, Bulgaria, 4–9 August 2013; pp. 370–374. [Google Scholar]
- Guzmán, F.; Joty, S.; Màrquez, L.; Nakov, P. Using discourse structure improves machine translation evaluation. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics, Baltimore, MD, USA, 22–27 June 2014; pp. 687–698. [Google Scholar]
- Chen, J.; Li, X.; Zhang, J.; Zhou, C.; Cui, J.; Wang, B.; Su, J. Modeling Discourse Structure for Document-level Neural Machine Translation. In Proceedings of the 1st Workshop on Automatic Simultaneous Translation, Seattle, WA, USA, 9 July 2020; pp. 30–36. [Google Scholar]
- Smith, K.S.; Specia, L. Assessing crosslingual discourse relations in machine translation. arXiv 2018, arXiv:1810.03148. [Google Scholar]
- Xiao, T.; Zhu, J.; Yao, S.; Zhang, H. Document-level consistency verification in machine translation. In Proceedings of the 13th Machine Translation Summit, Xiamen, China, 19–23 September 2011; pp. 131–138. [Google Scholar]
- Gong, Z.; Zhang, M.; Tan, C.L.; Zhou, G. Classifier-based tense model for SMT. In Proceedings of the 24th International Conference on Computational Linguistics, Mumbai, India, 8–15 December 2012; pp. 411–420. [Google Scholar]
- Gong, Z.; Zhang, M.; Tan, C.L.; Zhou, G. N-gram-based tense models for statistical machine translation. In Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, Jeju Island, Korea, 12–14 July 2012; pp. 276–285. [Google Scholar]
- Sun, Z.; Wang, M.; Zhou, H.; Zhao, C.; Huang, S.; Chen, J.; Li, L. Capturing longer context for document-level neural machine translation: A multi-resolutional approach. arXiv 2020, arXiv:2010.08961. [Google Scholar]
- Guillou, L. Analysing lexical consistency in translation. In Proceedings of the Workshop on Discourse in Machine Translation, Sofia, Bulgaria, 9 August 2013; pp. 10–18. [Google Scholar]
- Chen, B.; Zhu, X. Bilingual sentiment consistency for statistical machine translation. In Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics, Gothenburg, Sweden, 26–30 April 2014; pp. 607–615. [Google Scholar]
- Tiedemann, J. Context adaptation in statistical machine translation using models with exponentially decaying cache. In Proceedings of the 2010 Workshop on Domain Adaptation for Natural Language Processing, Uppsala, Sweden, 15 July 2010; pp. 909–919. [Google Scholar]
- Gong, Z.; Zhang, M.; Zhou, G. Cache-based document-level statistical machine translation. In Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing, Edinburgh, Scotland, UK, 27–31 July 2011. [Google Scholar]
- Hardmeier, C.; Nivre, J.; Tiedemann, J. Document-wide decoding for phrase-based statistical machine translation. In Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, Jeju Island, Korea, 12–14 July 2012; pp. 1179–1190. [Google Scholar]
- Yu, L.; Sartran, L.; Stokowiec, W.; Ling, W.; Kong, L.; Blunsom, P.; Dyer, C. Better Document-Level Machine Translation with Bayes’ Rule. Trans. Assoc. Comput. Linguist. 2020, 8, 346–360. [Google Scholar] [CrossRef]
- Jean, S.; Lauly, S.; Firat, O.; Cho, K. Does Neural Machine Translation Benefit from Larger Context? arXiv 2017, arXiv:1704.05135. [Google Scholar]
- Halliday, M.A.K.; Hasan, R. Cohesion in English; Longman: London, UK, 1976. [Google Scholar]
- Voita, E.; Serdyukov, P.; Sennrich, R.; Titov, I. Context-Aware Neural Machine Translation Learns Anaphora Resolution. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics, Melbourne, Australia, 15–20 July 2018; pp. 1264–1274. [Google Scholar]
- Li, C.N.; Thompson, S.A. Third-person pronouns and zero-anaphora in Chinese discourse. In Discourse and Syntax; Brill: Leiden, The Netherlands, 1979; pp. 311–335. [Google Scholar]
- Chung, T.; Gildea, D. Effects of empty categories on machine translation. In Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing, Cambridge, MA, USA, 9–11 October 2010; pp. 636–645. [Google Scholar]
- Wang, L.; Tu, Z.; Zhang, X.; Li, H.; Way, A.; Liu, Q. A Novel Approach for Dropped Pronoun Translation. In Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics, San Diego, CA, USA, 12–17 June 2016; pp. 983–993. [Google Scholar]
- Xiong, D.; Ben, G.; Zhang, M.; Lv, Y.; Liu, Q. Modeling lexical cohesion for document-level machine translation. In Proceedings of the 23rd International Joint Conference on Artificial Intelligence, Beijing, China, 3–9 August 2013. [Google Scholar]
- Wong, B.T.; Kit, C. Extending machine translation evaluation metrics with lexical cohesion to document level. In Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, Jeju Island, Korea, 12–14 July 2012; pp. 1060–1068. [Google Scholar]
- Zheng, Y.; Chen, G.; Huang, M.; Liu, S.; Zhu, X. Personalized dialogue generation with diversified traits. arXiv 2019, arXiv:1901.09672. [Google Scholar]
- Vanmassenhove, E.; Hardmeier, C.; Way, A. Getting Gender Right in Neural Machine Translation. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, Brussels, Belgium, 31 October–4 November 2018; pp. 3003–3008. [Google Scholar]
- Mirkin, S.; Nowson, S.; Brun, C.; Perez, J. Motivating personality-aware machine translation. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, Lisbon, Portugal, 17–21 September 2015; pp. 1102–1108. [Google Scholar]
- Chen, K.; Wang, R.; Utiyama, M.; Sumita, E.; Zhao, T. Neural Machine Translation With Sentence-Level Topic Context. IEEE/ACM Trans. Audio, Speech, Lang. Process. 2019, 27, 1970–1984. [Google Scholar] [CrossRef]
- Lavecchia, C.; Smaili, K.; Langlois, D. Building Parallel Corpora from Movies. In Proceedings of the 4th International Workshop on Natural Language Processing and Cognitive Science, Madeira, Portugal, 12–13 June 2007; pp. 201–210. [Google Scholar]
- Tiedemann, J. Improved sentence alignment for movie subtitles. In Proceedings of the 3rd Conference on Recent Advances in Natural Language Processing, Varna, Bulgaria, 1–3 September 2007; Volume 7, pp. 582–588. [Google Scholar]
- Itamar, E.; Itai, A. Using Movie Subtitles for Creating a Large-Scale Bilingual Corpora. In Proceedings of the 6th International Conference on Language Resources and Evaluation, Marrakech, Morocco, 26 May–1 June 2008; pp. 269–272. [Google Scholar]
- Tiedemann, J. Synchronizing Translated Movie Subtitles. In Proceedings of the 6th International Conference on Language Resources and Evaluation, Marrakech, Morocco, 26 May–1 June 2008; pp. 1902–1906. [Google Scholar]
- Xiao, H.; Wang, X. Constructing Parallel Corpus from Movie Subtitles. In Proceedings of the 22nd International Conference on Computer Processing of Oriental Languages. Language Technology for the Knowledge-based Economy, Hong Kong, China, 26–27 March 2009; pp. 329–336. [Google Scholar]
- Zhang, S.; Ling, W.; Dyer, C. Dual Subtitles as Parallel Corpora. In Proceedings of the 10th International Conference on Language Resources and Evaluation, Portorož, Slovenia, 23–28 May 2016; pp. 1869–1874. [Google Scholar]
- Lison, P.; Tiedemann, J. OpenSubtitles2016: Extracting Large Parallel Corpora from Movie and TV Subtitles. In Proceedings of the 10th International Conference on Language Resources and Evaluation, Portorož, Slovenia, 23–28 May 2016. [Google Scholar]
- Paul, M.; Federico, M.; Stüker, S. Overview of the IWSLT 2010 evaluation campaign. In Proceedings of the 2010 International Workshop on Spoken Language Translation, Paris, France, 2–3 December 2010. [Google Scholar]
- Koehn, P.; Knowles, R. Six Challenges for Neural Machine Translation. In Proceedings of the 1st Workshop on Neural Machine Translation, Vancouver, BC, Canada, 30 July–4 August 2017. [Google Scholar]
- Poria, S.; Hazarika, D.; Majumder, N.; Naik, G.; Cambria, E.; Mihalcea, R. MELD: A Multimodal Multi-Party Dataset for Emotion Recognition in Conversations. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, Florence, Italy, 28 July–2 August 2019; pp. 527–536. [Google Scholar]
- Koehn, P. Europarl: A Parallel Corpus for Statistical Machine Translation. In Proceedings of the 10th Machine Translation Summit, Phuket, Thailand, 13–15 September 2005; pp. 79–86. [Google Scholar]
- Yang, Y.; Liu, Y.; Xue, N. Recovering dropped pronouns from Chinese text messages. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing, Beijing, China, 26–31 July 2015; pp. 309–313. [Google Scholar]
- Wang, L.; Du, J.; Li, L.; Tu, Z.; Way, A.; Liu, Q. Semantics-Enhanced Task-Oriented Dialogue Translation: A Case Study on Hotel Booking. In Proceedings of the 8th International Joint Conference on Natural Language Processing: System Demonstrations, Taiwan, China, 28–30 November 2017; pp. 33–36. [Google Scholar]
- Ghazvininejad, M.; Levy, O.; Liu, Y.; Zettlemoyer, L. Mask-predict: Parallel decoding of conditional masked language models. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing, Hong Kong, China, 3–7 November 2019. [Google Scholar]
- Sennrich, R.; Haddow, B.; Birch, A. Improving Neural Machine Translation Models with Monolingual Data. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics, Berlin, Germany, 7–12 August 2016. [Google Scholar]
- Wang, L.; Wong, D.F.; Chao, L.S.; Lu, Y.; Xing, J. A systematic comparison of data selection criteria for SMT domain adaptation. Sci. World J. 2014, 2014, 745485. [Google Scholar] [CrossRef] [PubMed]
- Wang, L.; Li, M.; Liu, F.; Shi, S.; Tu, Z.; Wang, X.; Wu, S.; Zeng, J.; Zhang, W. Tencent Translation System for the WMT21 News Translation Task. In Proceedings of the 6th Conference on Machine Translation, Punta Cana, Dominican Republic, 7–11 November 2021. [Google Scholar]
- Ott, M.; Edunov, S.; Grangier, D.; Auli, M. Scaling Neural Machine Translation. In Proceedings of the 3rd Conference on Machine Translation, Brussels, Belgium, 31 October–1 November 2018. [Google Scholar]
- Devlin, J.; Chang, M.W.; Lee, K.; Toutanova, K. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Minneapolis, MN, USA, 2–7 June 2019. [Google Scholar]
- Conneau, A.; Lample, G. Cross-lingual language model pretraining. In Proceedings of the Advances in Neural Information Processing Systems, Vancouver, BC, Canada, 8–14 December 2019. [Google Scholar]
- Liu, Y.; Gu, J.; Goyal, N.; Li, X.; Edunov, S.; Ghazvininejad, M.; Lewis, M.; Zettlemoyer, L. Multilingual denoising pre-training for neural machine translation. arXiv 2020, arXiv:2001.08210. [Google Scholar] [CrossRef]
- Kingma, D.P.; Ba, J. Adam: A method for stochastic optimization. arXiv 2014, arXiv:1412.6980. [Google Scholar]
- Szegedy, C.; Vanhoucke, V.; Ioffe, S.; Shlens, J.; Wojna, Z. Rethinking the inception architecture for computer vision. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 2818–2826. [Google Scholar]
- Zhang, J.; Luan, H.; Sun, M.; Zhai, F.; Xu, J.; Zhang, M.; Liu, Y. Improving the Transformer Translation Model with Document-Level Context. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, Brussels, Belgium, 31 October–4 November 2018; pp. 533–542. [Google Scholar]
- Gu, J.; Bradbury, J.; Xiong, C.; Li, V.O.; Socher, R. Non-Autoregressive Neural Machine Translation. In Proceedings of the 6th International Conference on Learning Representations, Vancouver, BC, Canada, 30 April–3 May 2018. [Google Scholar]
- Ding, L.; Wang, L.; Liu, X.; Wong, D.F.; Tao, D.; Tu, Z. Understanding and Improving Lexical Choice in Non-Autoregressive Translation. In Proceedings of the 9th International Conference on Learning Representations, Vienna, Austria, 4–8 May 2021. [Google Scholar]
- Kim, Y.; Rush, A.M. Sequence-Level Knowledge Distillation. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, Austin, TX, USA, 1–5 November 2016. [Google Scholar]
- Kasai, J.; Cross, J.; Ghazvininejad, M.; Gu, J. Parallel Machine Translation with Disentangled Context Transformer. arXiv 2020, arXiv:2001.05136. [Google Scholar]
- Li, L.; Jiang, X.; Liu, Q. Pretrained language models for document-level neural machine translation. arXiv 2019, arXiv:1911.03110. [Google Scholar]
- Ott, M.; Edunov, S.; Baevski, A.; Fan, A.; Gross, S.; Ng, N.; Grangier, D.; Auli, M. FAIRSEQ: A Fast, Extensible Toolkit for Sequence Modeling. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics, Minneapolis, MN, USA, 2–7 June 2019. [Google Scholar]
- Sennrich, R.; Haddow, B.; Birch, A. Neural Machine Translation of Rare Words with Subword Units. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics, Berlin, Germany, 7–12 August 2016. [Google Scholar]
- Kim, Y.; Tran, D.T.; Ney, H. When and Why is Document-level Context Useful in Neural Machine Translation? In Proceedings of the 4th Workshop on Discourse in Machine Translation, Hong Kong, China, 3 November 2019. [Google Scholar]
- Li, B.; Liu, H.; Wang, Z.; Jiang, Y.; Xiao, T.; Zhu, J.; Liu, T.; Li, C. Does Multi-Encoder Help? A Case Study on Context-Aware Neural Machine Translation. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, Online, 5–10 July 2020. [Google Scholar]
| Corpus | Language | Domain | # Sent. | # Doc. | Ave. Len. |
|---|---|---|---|---|---|
| OpenSubtitle | FR-EN | movie subtitle | 29.2 M | 35 K | 8.0/7.5 |
| | ES-EN | | 64.7 M | 78 K | 8.0/7.3 |
| | EN-RU | | 27.4 M | 35 K | 5.8/6.7 |
| | ZH-EN | | 11.2 M | 14 K | 5.4/7.3 |
| TVSub | ZH-EN | TV series subtitle | 2.2 M | 3 K | 5.6/7.7 |
| MVSub | ZH-EN | Friends subtitle | 0.1 M | 5 K | 6.0/7.9 |
| IWSLT-DIALOG | ZH-EN | travel dialogue | 0.2 M | 2 K | 19.5/21.0 |
| BConTrasT | EN-DE | task-based dialogue | 8.1 K | 0.6 K | 6.7/9.2 |
| BMELD | ZH-EN | Friends subtitle | 6.2 K | 1 K | 6.0/7.9 |
| Europarl | ET-EN | European Parliament speech | 0.2 M | 150 K | 35.1/36.4 |
| | EN-DE | | 1.9 M | - | 23.2/24.9 |
| | FR-EN | | 2.0 M | - | 25.6/25.0 |
| Data | # Sent. | Ave. Len. |
|---|---|---|
| Parallel | | |
| In-domain | 13,845 | 10.3/10.1 |
| Valid | 1902 | 10.3/10.2 |
| Test | 2100 | 10.1/10.0 |
| Out-of-domain | 46,074,573 | 23.4/22.4 |
| +filter | 33,293,382 | 24.3/23.6 |
| +select | 1,000,000 | 21.4/20.9 |
| Monolingual | | |
| Out-of-domain De | 58,044,806 | 28.0 |
| +filter | 56,508,715 | 27.1 |
| +select | 1,000,000 | 24.2 |
| Out-of-domain En | 34,209,709 | 17.2 |
| +filter | 32,823,301 | 16.6 |
| +select | 1,000,000 | 14.5 |
| Systems | Finetune | BLEU |
|---|---|---|
| Models | | |
| Sent-B | In | 42.56 |
| | In+Out | 59.81 |
| Sent-S | In | 41.87 |
| | In+Out | 58.62 |
| Doc | In | 45.65 |
| | In+Out | 51.12 |
| | In→In | 51.93 |
| Nat | In+Out | 54.01 |
| | In+Out | 54.59 |
| Pre-training | | |
| Sent→Doc | Out→In | 49.77 |
| | Out→In+Out | 51.58 |
| Xlm→Sent | In+Out | 59.61 |
| Bert→Doc | In+Out | 56.01 |
| mBart→Sent | In+Out | 62.67 |
| Models | −Domain | +Domain |
|---|---|---|
| Valid Set (combined) | | |
| Sent-S | 62.66 | 61.19 |
| Sent-B | 64.99 | 63.00 |
| Xlm | 64.19 | 61.30 |
| Valid Set (split) | | |
| Sent-S | 60.05 | 62.09 |
| Sent-B | 59.64 | 63.31 |
| Xlm | 61.12 | 62.04 |
| Ave. | 62.27 | 62.48 |
© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Liu, S.; Sun, Y.; Wang, L. Recent Advances in Dialogue Machine Translation. Information 2021, 12, 484. https://doi.org/10.3390/info12110484