Plan, Attend, Generate: Character-Level Neural Machine Translation with Planning

Caglar Gulcehre, Francis Dutil, Adam Trischler, Yoshua Bengio

Abstract

We investigate the integration of a planning mechanism into an encoder-decoder architecture with attention. We develop a model that plans ahead when it computes alignments between the source and target sequences: it computes them not only for the current time-step but also for the next k time-steps, by constructing a matrix of proposed future alignments and a commitment vector that governs whether to follow or recompute the plan. This mechanism is inspired by the strategic attentive reader and writer (STRAW) model, a recent neural architecture for planning with hierarchical reinforcement learning that can also learn higher-level temporal abstractions. Our proposed model is end-to-end trainable with differentiable operations. We show that our model outperforms strong baselines on a character-level translation task from WMT’15 with fewer parameters and computes alignments that are qualitatively intuitive.
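The control flow described in the abstract (a matrix of proposed future alignments, plus a commitment vector that decides whether to follow or recompute the plan) can be illustrated with a small sketch. This is not the authors' implementation. The function names, the hard 0.5 threshold, and the callables `propose_plan` and `propose_commitment` are hypothetical stand-ins for learned network components, and the thresholding shown here is not differentiable, unlike the model the abstract describes.

```python
import math


def softmax(scores):
    """Turn raw alignment scores into a probability distribution."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def decode_with_plan(num_steps, propose_plan, propose_commitment):
    """Follow a k-step alignment plan, recomputing it when told to.

    propose_plan(t)       -> rows of alignment scores, one row per
                             planned time-step, starting at step t
                             (hypothetical stand-in for a learned module).
    propose_commitment(t) -> flags for the steps after t; a flag >= 0.5
                             means "recompute the plan at that step"
                             (the hard threshold is an assumption).
    """
    plan = []        # remaining rows of the alignment plan
    commitment = []  # remaining recompute flags for upcoming steps
    alignments, replanned = [], []
    for t in range(num_steps):
        # Recompute when the plan has run out or the commitment says so.
        recompute = not plan or not commitment or commitment.pop(0) >= 0.5
        if recompute:
            plan = [list(row) for row in propose_plan(t)]
            commitment = list(propose_commitment(t))
        # The first remaining row of the plan is the current alignment.
        alignments.append(softmax(plan.pop(0)))
        replanned.append(recompute)
    return alignments, replanned


# Toy usage: a 3-step plan over 4 source positions.
SOURCE_LEN, K = 4, 3


def toy_plan(t):
    # Row i favours source position (t + i) mod SOURCE_LEN.
    return [
        [5.0 if j == (t + i) % SOURCE_LEN else 0.0 for j in range(SOURCE_LEN)]
        for i in range(K)
    ]


def toy_commitment(t):
    # Commit to the plan for the two steps that follow a replan.
    return [0.0, 0.0]


alignments, replanned = decode_with_plan(6, toy_plan, toy_commitment)
print(replanned)  # [True, False, False, True, False, False]
```

In this toy run the plan is recomputed only at steps 0 and 3, and the intermediate steps reuse rows of the existing plan. In the sketch, that reuse is the only thing that separates planning from recomputing attention at every time-step.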

Anthology ID: W17-2627
Volume: Proceedings of the 2nd Workshop on Representation Learning for NLP
Month: August
Year: 2017
Address: Vancouver, Canada
Editors: Phil Blunsom, Antoine Bordes, Kyunghyun Cho, Shay Cohen, Chris Dyer, Edward Grefenstette, Karl Moritz Hermann, Laura Rimell, Jason Weston, Scott Yih
Venue: RepL4NLP
SIG: SIGREP
Publisher: Association for Computational Linguistics
Pages: 228–234
URL: https://aclanthology.org/W17-2627
DOI: 10.18653/v1/W17-2627
Cite (ACL): Caglar Gulcehre, Francis Dutil, Adam Trischler, and Yoshua Bengio. 2017. Plan, Attend, Generate: Character-Level Neural Machine Translation with Planning. In Proceedings of the 2nd Workshop on Representation Learning for NLP, pages 228–234, Vancouver, Canada. Association for Computational Linguistics.
Cite (Informal): Plan, Attend, Generate: Character-Level Neural Machine Translation with Planning (Gulcehre et al., RepL4NLP 2017)
PDF: https://aclanthology.org/W17-2627.pdf
