Learned in translation: contextualized word vectors

Published: 04 December 2017

Abstract

Computer vision has benefited from initializing multiple deep layers with weights pretrained on large supervised training sets like ImageNet. Natural language processing (NLP) typically sees initialization of only the lowest layer of deep models with pretrained word vectors. In this paper, we use a deep LSTM encoder from an attentional sequence-to-sequence model trained for machine translation (MT) to contextualize word vectors. We show that adding these context vectors (CoVe) improves performance over using only unsupervised word and character vectors on a wide variety of common NLP tasks: sentiment analysis (SST, IMDb), question classification (TREC), entailment (SNLI), and question answering (SQuAD). For fine-grained sentiment analysis and entailment, CoVe improves performance of our baseline models to the state of the art.
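
To make the pipeline described above concrete, here is a minimal sketch in PyTorch: pretrained word vectors are passed through a deep bidirectional LSTM encoder of the kind taken from an attentional sequence-to-sequence MT model, and the encoder outputs (the context vectors) are concatenated with the original word vectors before being fed to a downstream task model. The class and function names, dimensions, and random stand-in embeddings are illustrative assumptions, not the authors' released implementation.

import torch
import torch.nn as nn

class MTEncoder(nn.Module):
    """Deep LSTM encoder of the kind used on the MT source side.
    In a real setup its weights would be loaded from a pretrained
    translation model rather than left randomly initialized."""
    def __init__(self, glove_dim=300, hidden_dim=300, num_layers=2):
        super().__init__()
        self.lstm = nn.LSTM(glove_dim, hidden_dim, num_layers=num_layers,
                            bidirectional=True, batch_first=True)

    def forward(self, word_vectors):
        # word_vectors: (batch, seq_len, glove_dim)
        outputs, _ = self.lstm(word_vectors)
        # (batch, seq_len, 2 * hidden_dim): one context vector per token
        return outputs

def contextualize(word_vectors, encoder):
    """Concatenate the original word vectors with their context vectors,
    keeping the pretrained encoder frozen."""
    with torch.no_grad():
        context_vectors = encoder(word_vectors)
    return torch.cat([word_vectors, context_vectors], dim=-1)

# Usage with random stand-in embeddings (a real setup would look up
# GloVe vectors and load pretrained MT encoder weights):
encoder = MTEncoder()
glove = torch.randn(8, 20, 300)        # 8 sentences, 20 tokens, 300-d vectors
features = contextualize(glove, encoder)
print(features.shape)                  # torch.Size([8, 20, 900])

Downstream task models then consume these concatenated features in place of the word vectors alone.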




Published In

NIPS'17: Proceedings of the 31st International Conference on Neural Information Processing Systems
December 2017
7104 pages

Publisher

Curran Associates Inc.

Red Hook, NY, United States



Cited By

  • (2022) The Rediscovery Hypothesis. Journal of Artificial Intelligence Research, 72:1343-1384. DOI: 10.1613/jair.1.12788. Online publication date: 4-Jan-2022.
  • (2021) Memorizing All for Implicit Discourse Relation Recognition. ACM Transactions on Asian and Low-Resource Language Information Processing, 21(3):1-20. DOI: 10.1145/3485016. Online publication date: 13-Dec-2021.
  • (2021) Rethinking search. ACM SIGIR Forum, 55(1):1-27. DOI: 10.1145/3476415.3476428. Online publication date: 16-Jul-2021.
  • (2021) BATS: A Spectral Biclustering Approach to Single Document Topic Modeling and Segmentation. ACM Transactions on Intelligent Systems and Technology, 12(5):1-29. DOI: 10.1145/3468268. Online publication date: 15-Oct-2021.
  • (2021) Learning Syllables Using Conv-LSTM Model for Swahili Word Representation and Part-of-speech Tagging. ACM Transactions on Asian and Low-Resource Language Information Processing, 20(4):1-25. DOI: 10.1145/3445975. Online publication date: 26-May-2021.
  • (2021) A Comprehensive Survey on Word Representation Models: From Classical to State-of-the-Art Word Representation Language Models. ACM Transactions on Asian and Low-Resource Language Information Processing, 20(5):1-35. DOI: 10.1145/3434237. Online publication date: 30-Jun-2021.
  • (2021) A Hybrid Siamese Neural Network for Natural Language Inference in Cyber-Physical Systems. ACM Transactions on Internet Technology, 21(2):1-25. DOI: 10.1145/3418208. Online publication date: 15-Mar-2021.
  • (2020) A simple language model for task-oriented dialogue. Proceedings of the 34th International Conference on Neural Information Processing Systems, pages 20179-20191. DOI: 10.5555/3495724.3497418. Online publication date: 6-Dec-2020.
  • (2020) Pre-training via Paraphrasing. Proceedings of the 34th International Conference on Neural Information Processing Systems, pages 18470-18481. DOI: 10.5555/3495724.3497275. Online publication date: 6-Dec-2020.
  • (2020) Adaptable Conversational Machines. AI Magazine, 41(3):28-44. DOI: 10.1609/aimag.v41i3.5322. Online publication date: 1-Sep-2020.
