short-paper

POS Tag-enhanced Coarse-to-fine Attention for Neural Machine Translation

Authors:

Yidong Chen

ACM Transactions on Asian and Low-Resource Language Information Processing (TALLIP), Volume 18, Issue 4

Article No.: 46, Pages 1 - 14

https://doi.org/10.1145/3321124

Published: 22 April 2019

Abstract

Although neural machine translation (NMT) has some capacity to learn the semantic information of sentences implicitly, we explore and show that Part-of-Speech (POS) tags can be explicitly and effectively incorporated into the attention mechanism of NMT to yield further improvements. In this article, we propose an NMT model with a tag-enhanced attention mechanism. In our model, NMT and POS tagging are jointly modeled via multi-task learning. Besides following the common practice of enriching encoder annotations with predicted source POS tags, we exploit predicted target POS tags to refine the attention model in a coarse-to-fine manner. Specifically, we first perform a coarse attention operation based solely on the source annotations and the target hidden state; the resulting context vector is used to update the target hidden state, which is then used for target POS tagging. Then, we perform a fine attention operation that extends the coarse one by further exploiting the predicted target POS tags. Finally, we facilitate word prediction by using both the context vector from the fine attention and the predicted target POS tags. Experimental results and further analyses on Chinese-English and Japanese-English translation tasks demonstrate the superiority of our proposed model over conventional NMT models. We release our code at https://github.com/middlekisser/PEA-NMT.git.
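
The decoding step described in the abstract can be pictured with a small toy sketch. This is not the authors' implementation (see the released repository for that). It only mirrors the order of operations: coarse attention, state update, target POS tagging, fine attention, word prediction. Everything else is an assumption made for illustration: dot-product attention scoring, the tanh state update, feeding the tagger's expected tag embedding into the fine attention query, and all names (`attend`, `coarse_to_fine_step`, `W_tag`, `tag_emb`, `W_out`) are hypothetical.

```python
import math
import random


def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def matvec(W, v):
    return [dot(row, v) for row in W]


def attend(query, annotations):
    # Dot-product attention: an assumption, the paper may score differently.
    weights = softmax([dot(query, h) for h in annotations])
    dim = len(annotations[0])
    context = [sum(w * h[i] for w, h in zip(weights, annotations))
               for i in range(dim)]
    return weights, context


def coarse_to_fine_step(state, annotations, W_tag, tag_emb, W_out):
    # 1) Coarse attention: source annotations and target hidden state only.
    _, c_coarse = attend(state, annotations)
    # 2) Update the target hidden state with the coarse context (toy rule).
    updated = [math.tanh(s + c) for s, c in zip(state, c_coarse)]
    # 3) Target POS tagging from the updated state.
    tag_probs = softmax(matvec(W_tag, updated))
    # Expected tag embedding under the predicted distribution (hypothetical).
    dim = len(state)
    tag_vec = [sum(p * e[i] for p, e in zip(tag_probs, tag_emb))
               for i in range(dim)]
    # 4) Fine attention: the query also carries the predicted tag information.
    fine_query = [u + t for u, t in zip(updated, tag_vec)]
    fine_weights, c_fine = attend(fine_query, annotations)
    # 5) Word prediction from the state, the fine context, and the tag vector
    #    (the three lists are concatenated into one feature vector).
    word_probs = softmax(matvec(W_out, updated + c_fine + tag_vec))
    return tag_probs, fine_weights, word_probs


def random_matrix(rows, cols, rng):
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


# Toy sizes and random parameters, chosen only to make the sketch runnable.
rng = random.Random(0)
DIM, SRC_LEN, N_TAGS, VOCAB = 4, 5, 3, 6
annotations = random_matrix(SRC_LEN, DIM, rng)
state = [rng.uniform(-1.0, 1.0) for _ in range(DIM)]
W_tag = random_matrix(N_TAGS, DIM, rng)
tag_emb = random_matrix(N_TAGS, DIM, rng)
W_out = random_matrix(VOCAB, 3 * DIM, rng)

tag_probs, fine_weights, word_probs = coarse_to_fine_step(
    state, annotations, W_tag, tag_emb, W_out
)
print(len(tag_probs), len(fine_weights), len(word_probs))
```

In a multi-task setup of the kind the abstract mentions, `tag_probs` and `word_probs` would each feed a loss term (tagging and translation) that are trained jointly; how the two losses are weighted is not stated in the abstract.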


Index Terms

POS Tag-enhanced Coarse-to-fine Attention for Neural Machine Translation
1. Computing methodologies
  1. Artificial intelligence
    1. Natural language processing


Information & Contributors

Information

Published In

ACM Transactions on Asian and Low-Resource Language Information Processing, Volume 18, Issue 4

December 2019

305 pages

ISSN:2375-4699

EISSN:2375-4702

DOI:10.1145/3327969

Editor:
Nianwen Xue
Brandeis University, Waltham, USA


Copyright © 2019 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 22 April 2019

Accepted: 01 February 2019

Revised: 01 February 2019

Received: 01 June 2018

Qualifiers

Short-paper
Research
Refereed

Funding Sources

Scientific Research Project of National Language Committee of China
Natural Science Foundation of Fujian Province
National Natural Science Foundation of China


Bibliometrics & Citations

Article Metrics

Total Citations: 11
Total Downloads: 406
Downloads (Last 12 months): 22
Downloads (Last 6 weeks): 2

Reflects downloads up to 03 Mar 2025

Cited By

Kadeer Z, Yi N, Wumaier A. (2023). Part-of-Speech Tags Guide Low-Resource Machine Translation. Electronics 12:16 (3401). Online publication date: 10-Aug-2023.
https://doi.org/10.3390/electronics12163401
Nguyen L, Pham N, Duc L, Hoang C, Dinh D. (2022). Moment matching training for neural machine translation: An empirical study. Journal of Intelligent & Fuzzy Systems 43:3 (2633-2645). Online publication date: 21-Jul-2022.
https://doi.org/10.3233/JIFS-213240
Shi X. (2022). Chinese-English Contrastive Translation System Based on Lagrangian Search Mathematical Algorithm Model. Applied Mathematics and Nonlinear Sciences. Online publication date: 15-Jul-2022.
https://doi.org/10.2478/amns.2022.2.0122
Jian L, Xiang H, Le G. (2022). LSTM-Based Attentional Embedding for English Machine Translation. Scientific Programming 2022. Online publication date: 1-Jan-2022.
https://dl.acm.org/doi/10.1155/2022/3909726
Di Gennaro G, Ospedale A, Di Girolamo A, Buonanno A, Palmieri F, Fedele G. (2022). Split-word Architecture in Recurrent Neural Networks POS-Tagging. 2022 International Joint Conference on Neural Networks (IJCNN) (01-07). Online publication date: 18-Jul-2022.
https://doi.org/10.1109/IJCNN55064.2022.9892466
Mundotiya R, Mehta A, Baruah R, Singh A. (2022). Integration of morphological features and contextual weightage using monotonic chunk attention for part of speech tagging. Journal of King Saud University - Computer and Information Sciences 34:9 (7324-7334). Online publication date: Oct-2022.
https://doi.org/10.1016/j.jksuci.2021.08.023
Hlaing Z, Thu Y, Supnithi T, Netisopakul P. (2022). Improving neural machine translation with POS-tag features for low-resource language pairs. Heliyon 8:8 (e10375). Online publication date: Aug-2022.
https://doi.org/10.1016/j.heliyon.2022.e10375
Bharti S, Gupta R, Patel S, Shah M. (2022). Context-Based Bigram Model for POS Tagging in Hindi: A Heuristic Approach. Annals of Data Science 11:1 (347-378). Online publication date: 16-Aug-2022.
https://doi.org/10.1007/s40745-022-00434-4
Ganesh P, Rawal B, Peter A, Giri A. (2021). POS-Tagging based Neural Machine Translation System for European Languages using Transformers. WSEAS Transactions on Information Science and Applications 18 (26-33). Online publication date: 24-May-2021.
https://doi.org/10.37394/23209.2021.18.5
Wang R. (2021). Research on Intelligent English Translation Method Based on the Improved Attention Mechanism Model. Scientific Programming 2021. Online publication date: 23-Nov-2021.
https://dl.acm.org/doi/10.1155/2021/9667255
