Learned in translation: contextualized word vectors

Published: 04 December 2017

Abstract

Computer vision has benefited from initializing multiple deep layers with weights pretrained on large supervised training sets like ImageNet. Natural language processing (NLP) typically sees initialization of only the lowest layer of deep models with pretrained word vectors. In this paper, we use a deep LSTM encoder from an attentional sequence-to-sequence model trained for machine translation (MT) to contextualize word vectors. We show that adding these context vectors (CoVe) improves performance over using only unsupervised word and character vectors on a wide variety of common NLP tasks: sentiment analysis (SST, IMDb), question classification (TREC), entailment (SNLI), and question answering (SQuAD). For fine-grained sentiment analysis and entailment, CoVe improves performance of our baseline models to the state of the art.
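
To make the pipeline described above concrete, here is a minimal sketch in PyTorch: pretrained word vectors are passed through a deep bidirectional LSTM encoder of the kind taken from an attentional sequence-to-sequence MT model, and the encoder outputs (the context vectors) are concatenated with the original word vectors before being fed to a downstream task model. The class and function names, dimensions, and random stand-in embeddings are illustrative assumptions, not the authors' released implementation.

import torch
import torch.nn as nn

class MTEncoder(nn.Module):
    """Deep LSTM encoder of the kind used on the MT source side.
    In a real setup its weights would be loaded from a pretrained
    translation model rather than left randomly initialized."""
    def __init__(self, glove_dim=300, hidden_dim=300, num_layers=2):
        super().__init__()
        self.lstm = nn.LSTM(glove_dim, hidden_dim, num_layers=num_layers,
                            bidirectional=True, batch_first=True)

    def forward(self, word_vectors):
        # word_vectors: (batch, seq_len, glove_dim)
        outputs, _ = self.lstm(word_vectors)
        # (batch, seq_len, 2 * hidden_dim): one context vector per token
        return outputs

def contextualize(word_vectors, encoder):
    """Concatenate the original word vectors with their context vectors,
    keeping the pretrained encoder frozen."""
    with torch.no_grad():
        context_vectors = encoder(word_vectors)
    return torch.cat([word_vectors, context_vectors], dim=-1)

# Usage with random stand-in embeddings (a real setup would look up
# GloVe vectors and load pretrained MT encoder weights):
encoder = MTEncoder()
glove = torch.randn(8, 20, 300)        # 8 sentences, 20 tokens, 300-d vectors
features = contextualize(glove, encoder)
print(features.shape)                  # torch.Size([8, 20, 900])

Downstream task models then consume these concatenated features in place of the word vectors alone.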




Published In

NIPS'17: Proceedings of the 31st International Conference on Neural Information Processing Systems
December 2017
7104 pages

Publisher

Curran Associates Inc.

Red Hook, NY, United States



Cited By

  • (2022) The Rediscovery Hypothesis. Journal of Artificial Intelligence Research, 72:1343-1384. DOI: 10.1613/jair.1.12788. Online publication date: 4-Jan-2022.
  • (2021) Memorizing All for Implicit Discourse Relation Recognition. ACM Transactions on Asian and Low-Resource Language Information Processing, 21(3):1-20. DOI: 10.1145/3485016. Online publication date: 13-Dec-2021.
  • (2021) Rethinking search. ACM SIGIR Forum, 55(1):1-27. DOI: 10.1145/3476415.3476428. Online publication date: 16-Jul-2021.
  • (2021) BATS: A Spectral Biclustering Approach to Single Document Topic Modeling and Segmentation. ACM Transactions on Intelligent Systems and Technology, 12(5):1-29. DOI: 10.1145/3468268. Online publication date: 15-Oct-2021.
  • (2021) Learning Syllables Using Conv-LSTM Model for Swahili Word Representation and Part-of-speech Tagging. ACM Transactions on Asian and Low-Resource Language Information Processing, 20(4):1-25. DOI: 10.1145/3445975. Online publication date: 26-May-2021.
  • (2021) A Comprehensive Survey on Word Representation Models: From Classical to State-of-the-Art Word Representation Language Models. ACM Transactions on Asian and Low-Resource Language Information Processing, 20(5):1-35. DOI: 10.1145/3434237. Online publication date: 30-Jun-2021.
  • (2021) A Hybrid Siamese Neural Network for Natural Language Inference in Cyber-Physical Systems. ACM Transactions on Internet Technology, 21(2):1-25. DOI: 10.1145/3418208. Online publication date: 15-Mar-2021.
  • (2020) A simple language model for task-oriented dialogue. Proceedings of the 34th International Conference on Neural Information Processing Systems, pages 20179-20191. DOI: 10.5555/3495724.3497418. Online publication date: 6-Dec-2020.
  • (2020) Pre-training via Paraphrasing. Proceedings of the 34th International Conference on Neural Information Processing Systems, pages 18470-18481. DOI: 10.5555/3495724.3497275. Online publication date: 6-Dec-2020.
  • (2020) Adaptable Conversational Machines. AI Magazine, 41(3):28-44. DOI: 10.1609/aimag.v41i3.5322. Online publication date: 1-Sep-2020.
