The representational geometry of word meanings acquired by neural machine translation models

Authors:

Sébastien Jean,

Yoshua Bengio

Machine Translation, Volume 31, Issue 1-2

Pages 3 - 18

https://doi.org/10.1007/s10590-017-9194-2

Published: 01 June 2017

Abstract

This work is the first comprehensive analysis of the properties of word embeddings learned by neural machine translation (NMT) models trained on bilingual texts. We show that the word representations of NMT models outperform those learned from monolingual text by established algorithms such as Skipgram and CBOW on tasks that require knowledge of semantic similarity and/or lexical-syntactic role. These effects hold when translating from English to French and from English to German, and we argue that the desirable properties of NMT word embeddings should emerge largely independently of the source and target languages. Further, we apply a recently proposed heuristic method for training NMT models with very large vocabularies, and show that this vocabulary expansion method results in minimal degradation of embedding quality. This allows us to make a large vocabulary of NMT embeddings available for future research and applications. Overall, our analyses indicate that NMT embeddings should be used in applications that require word concepts to be organised according to similarity and/or lexical function, while monolingual embeddings are better suited to modelling (nonspecific) inter-word relatedness.
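
The abstract does not spell out how embedding quality on similarity tasks is measured. A common approach, assumed here rather than taken from the paper, is to compare the cosine similarity of word-vector pairs against human similarity ratings using Spearman rank correlation. The sketch below illustrates that idea in plain Python; the toy vectors, word pairs, ratings, and all function names are hypothetical and chosen only for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def ranks(values):
    """1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def evaluate(embeddings, rated_pairs):
    """Correlate model cosine similarities with human ratings.

    Pairs containing a word missing from `embeddings` are skipped.
    """
    model_scores, human_scores = [], []
    for w1, w2, rating in rated_pairs:
        if w1 in embeddings and w2 in embeddings:
            model_scores.append(cosine(embeddings[w1], embeddings[w2]))
            human_scores.append(rating)
    return spearman(model_scores, human_scores)


# Hypothetical toy data, not taken from the paper.
toy_embeddings = {
    "coast": [0.9, 0.1, 0.0],
    "shore": [0.8, 0.2, 0.1],
    "teacher": [0.1, 0.9, 0.2],
    "student": [0.3, 0.7, 0.4],
    "car": [0.0, 0.1, 0.9],
}
toy_pairs = [
    ("coast", "shore", 9.0),
    ("teacher", "student", 6.0),
    ("coast", "car", 1.0),
]
score = evaluate(toy_embeddings, toy_pairs)
print(f"Spearman correlation: {score:.3f}")
```

On this toy data the model ordering matches the human ordering, so the correlation is close to 1. With real embeddings and a real benchmark, a higher correlation would suggest that distances in the embedding space track human similarity judgments more closely.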

Index Terms
1. Computing methodologies
  1. Artificial intelligence
2. Hardware
  1. Power and energy
    1. Power estimation and optimization

Published In

Machine Translation Volume 31, Issue 1-2

June 2017

85 pages

ISSN:0922-6567

Copyright © 2017 Springer Science+Business Media B.V.

Publisher

Kluwer Academic Publishers

United States

Cited By

Ren L (2022) Application of Improved Image Restoration Algorithm and Depth Generation in English Intelligent Translation Teaching System. Mobile Information Systems 2022. DOI: 10.1155/2022/7398929. Online publication date: 1-Jan-2022
https://dl.acm.org/doi/10.1155/2022/7398929
Li Z (2021) Artificial Intelligence Machine Translation Based on Fuzzy Algorithm. Mobile Information Systems 2021. DOI: 10.1155/2021/1827627. Online publication date: 1-Jan-2021
https://dl.acm.org/doi/10.1155/2021/1827627
Bi S, Maseleno A, Yuan X, Balas V (2020) Intelligent system for English translation using automated knowledge base. Journal of Intelligent & Fuzzy Systems: Applications in Engineering and Technology 39:4 (5057-5066). DOI: 10.3233/JIFS-179991. Online publication date: 1-Jan-2020
https://dl.acm.org/doi/10.3233/JIFS-179991
Wang H, Yao Y, Zhang W (2020) Semantic ordering of English machine translation based on fuzzy theory. Journal of Intelligent & Fuzzy Systems: Applications in Engineering and Technology 38:4 (3765-3772). DOI: 10.3233/JIFS-179599. Online publication date: 1-Jan-2020
https://dl.acm.org/doi/10.3233/JIFS-179599
Amplayo R, Lee K, Yeo J, Hwang S (2018) Translations as additional contexts for sentence classification. Proceedings of the 27th International Joint Conference on Artificial Intelligence (3955-3961). DOI: 10.5555/3304222.3304320. Online publication date: 13-Jul-2018
https://dl.acm.org/doi/10.5555/3304222.3304320
Gao F, Ge Y, Liu Y (2018) Remember and forget. Multimedia Tools and Applications 77:22 (29269-29282). DOI: 10.5555/3288251.3288309. Online publication date: 1-Nov-2018
https://dl.acm.org/doi/10.5555/3288251.3288309
McCann B, Bradbury J, Xiong C, Socher R (2017) Learned in translation. Proceedings of the 31st International Conference on Neural Information Processing Systems (6297-6308). DOI: 10.5555/3295222.3295377. Online publication date: 4-Dec-2017
https://dl.acm.org/doi/10.5555/3295222.3295377
