The representational geometry of word meanings acquired by neural machine translation models

Published: 01 June 2017

Abstract

This work is the first comprehensive analysis of the properties of word embeddings learned by neural machine translation (NMT) models trained on bilingual texts. We show that the word representations of NMT models outperform those learned from monolingual text by established algorithms such as Skipgram and CBOW on tasks that require knowledge of semantic similarity and/or lexical-syntactic role. These effects hold when translating from English to French and from English to German, and we argue that the desirable properties of NMT word embeddings should emerge largely independently of the source and target languages. Further, we apply a recently proposed heuristic method for training NMT models with very large vocabularies, and show that this vocabulary expansion method results in minimal degradation of embedding quality. This allows us to make a large vocabulary of NMT embeddings available for future research and applications. Overall, our analyses indicate that NMT embeddings should be used in applications that require word concepts to be organised according to similarity and/or lexical function, while monolingual embeddings are better suited to modelling (nonspecific) inter-word relatedness.

      Published In

      Machine Translation, Volume 31, Issue 1-2
      June 2017
      85 pages

      Publisher

      Kluwer Academic Publishers

      United States

      Author Tags

      1. Machine translation
      2. Representation
      3. Word embeddings

      Qualifiers

      • Article
