Multi-Granularity Self-Attention for Neural Machine Translation

Jie Hao, Xing Wang, Shuming Shi, Jinfeng Zhang, Zhaopeng Tu

Abstract

Current state-of-the-art neural machine translation (NMT) uses a deep multi-head self-attention network with no explicit phrase information. However, prior work on statistical machine translation has shown that extending the basic translation unit from words to phrases has produced substantial improvements, suggesting the possibility of improving NMT performance from explicit modeling of phrases. In this work, we present multi-granularity self-attention (Mg-Sa): a neural network that combines multi-head self-attention and phrase modeling. Specifically, we train several attention heads to attend to phrases in either n-gram or syntactic formalisms. Moreover, we exploit interactions among phrases to enhance the strength of structure modeling – a commonly-cited weakness of self-attention. Experimental results on WMT14 English-to-German and NIST Chinese-to-English translation tasks show the proposed approach consistently improves performance. Targeted linguistic analysis reveal that Mg-Sa indeed captures useful phrase information at various levels of granularities.

Anthology ID:: D19-1082
Volume:: Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)
Month:: November
Year:: 2019
Address:: Hong Kong, China
Editors:: Kentaro Inui, Jing Jiang, Vincent Ng, Xiaojun Wan
Venues:: EMNLP | IJCNLP
SIG:: SIGDAT
Publisher:: Association for Computational Linguistics
Note:
Pages:: 887–897
Language:
URL:: https://aclanthology.org/D19-1082
DOI:: 10.18653/v1/D19-1082
Bibkey:
Cite (ACL):: Jie Hao, Xing Wang, Shuming Shi, Jinfeng Zhang, and Zhaopeng Tu. 2019. Multi-Granularity Self-Attention for Neural Machine Translation. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 887–897, Hong Kong, China. Association for Computational Linguistics.
Cite (Informal):: Multi-Granularity Self-Attention for Neural Machine Translation (Hao et al., EMNLP-IJCNLP 2019)
Copy Citation:
PDF:: https://aclanthology.org/D19-1082.pdf

PDF Cite Search