
DOI: 10.1609/aaai.v33i01.33016940

Revisiting LSTM networks for semi-supervised text classification via mixed objective function

Published: 27 January 2019

Abstract

In this paper, we study the bidirectional LSTM network for the task of text classification using both supervised and semi-supervised approaches. Several prior works have suggested that either complex pretraining schemes using unsupervised methods such as language modeling (Dai and Le 2015; Miyato, Dai, and Goodfellow 2016) or complicated models (Johnson and Zhang 2017) are necessary to achieve high classification accuracy. However, we develop a training strategy that allows even a simple BiLSTM model, when trained with cross-entropy loss, to achieve competitive results compared with more complex approaches. Furthermore, in addition to cross-entropy loss, by using a combination of entropy minimization, adversarial, and virtual adversarial losses for both labeled and unlabeled data, we report state-of-the-art results for the text classification task on several benchmark datasets. In particular, on the ACL-IMDB sentiment analysis and AG-News topic classification datasets, our method outperforms current approaches by a substantial margin. We also show the generality of the mixed objective function by improving performance on the relation extraction task.
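The mixed objective described above can be sketched in a few lines. The following is a minimal, hypothetical illustration, not the paper's implementation: a linear softmax classifier stands in for the BiLSTM, the adversarial term uses a single FGSM-style perturbation of the inputs (computed analytically for the linear model), the virtual adversarial loss is omitted for brevity, and the names and mixing coefficients (`lam_em`, `lam_adv`, `eps`) are assumptions.

```python
import numpy as np

def softmax(z):
    # numerically stable softmax over the last axis
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def mixed_objective(W, x_lab, y_lab, x_unlab, lam_em=1.0, lam_adv=1.0, eps=0.1):
    """Cross-entropy + entropy-minimization + adversarial loss.

    W: (d, k) weights of a linear stand-in classifier.
    x_lab/y_lab: labeled examples; x_unlab: unlabeled examples.
    Returns (ce, em, adv, total)."""
    n = len(y_lab)
    # 1) supervised cross-entropy on labeled data
    p = softmax(x_lab @ W)
    ce = -np.mean(np.log(p[np.arange(n), y_lab] + 1e-12))
    # 2) entropy minimization on unlabeled data (penalize uncertain predictions)
    q = softmax(x_unlab @ W)
    em = np.mean(-np.sum(q * np.log(q + 1e-12), axis=1))
    # 3) adversarial loss: cross-entropy at an FGSM-perturbed input;
    #    for a linear model, grad of CE w.r.t. x is (p - onehot) @ W.T
    onehot = np.eye(W.shape[1])[y_lab]
    grad_x = (p - onehot) @ W.T
    x_adv = x_lab + eps * np.sign(grad_x)
    p_adv = softmax(x_adv @ W)
    adv = -np.mean(np.log(p_adv[np.arange(n), y_lab] + 1e-12))
    return ce, em, adv, ce + lam_em * em + lam_adv * adv
```

In the paper's setting the gradients for the adversarial and virtual adversarial terms are taken with respect to the (BiLSTM) word embeddings rather than raw inputs, and all terms are optimized jointly.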

References

[1]
Blum, A., and Mitchell, T. 1998. Combining labeled and unlabeled data with co-training. In COLT.
[2]
Collobert, R.; Weston, J.; Bottou, L.; Karlen, M.; Kavukcuoglu, K.; and Kuksa, P. 2011. Natural language processing (almost) from scratch. JMLR.
[3]
Dai, A. M., and Le, Q. V. 2015. Semi-supervised sequence learning. In NIPS.
[4]
Deerwester, S.; Dumais, S. T.; Furnas, G. W.; Landauer, T. K.; and Harshman, R. 1990. Indexing by latent semantic analysis. JASIS.
[5]
Devlin, J.; Chang, M.-W.; Lee, K.; and Toutanova, K. 2018. BERT: Pre-training of deep bidirectional transformers for language understanding. Computing Research Repository arXiv:1810.04805.
[6]
Goodfellow, I. J.; Shlens, J.; and Szegedy, C. 2014. Explaining and harnessing adversarial examples. Computing Research Repository arXiv:1412.6572.
[7]
Grandvalet, Y., and Bengio, Y. 2004. Semi-supervised learning by entropy minimization. In NIPS.
[8]
Hochreiter, S., and Schmidhuber, J. 1997. Long short-term memory. Neural Computation.
[9]
Howard, J., and Ruder, S. 2018. Universal language model fine-tuning for text classification. In ACL.
[10]
Joachims, T. 1998. Text categorization with support vector machines: Learning with many relevant features. In ECML.
[11]
Johnson, R., and Zhang, T. 2015a. Effective use of word order for text categorization with convolutional neural networks. In NAACL.
[12]
Johnson, R., and Zhang, T. 2015b. Semi-supervised convolutional neural networks for text categorization via region embedding. In NIPS.
[13]
Johnson, R., and Zhang, T. 2016. Supervised and semi-supervised text categorization using LSTM for region embeddings. In ICML.
[14]
Johnson, R., and Zhang, T. 2017. Deep pyramid convolutional neural networks for text categorization. In ACL.
[15]
Joulin, A.; Grave, E.; Bojanowski, P.; and Mikolov, T. 2017. Bag of tricks for efficient text classification. In EACL.
[16]
Kim, Y. 2014. Convolutional neural networks for sentence classification. In EMNLP.
[17]
Kingma, D. P., and Ba, J. 2014. Adam: A method for stochastic optimization. Computing Research Repository arXiv:1412.6980.
[18]
Lai, S.; Xu, L.; Liu, K.; and Zhao, J. 2015. Recurrent convolutional neural networks for text classification. In AAAI.
[19]
Le, Q., and Mikolov, T. 2014. Distributed representations of sentences and documents. In ICML.
[20]
Lewis, D. D.; Yang, Y.; Rose, T. G.; and Li, F. 2004. RCV1: A new benchmark collection for text categorization research. JMLR.
[21]
Maas, A. L.; Daly, R. E.; Pham, P. T.; Huang, D.; Ng, A. Y.; and Potts, C. 2011. Learning word vectors for sentiment analysis. In ACL.
[22]
McCallum, A., and Nigam, K. 1998. A comparison of event models for naive Bayes text classification. In Workshop on Learning for Text Categorization, AAAI.
[23]
Mikolov, T.; Sutskever, I.; Chen, K.; Corrado, G.; and Dean, J. 2013. Distributed representations of words and phrases and their compositionality. In NIPS.
[24]
Mikolov, T.; Grave, E.; Bojanowski, P.; Puhrsch, C.; and Joulin, A. 2018. Advances in pre-training distributed word representations. In LREC.
[25]
Miyato, T.; Maeda, S.-i.; Koyama, M.; and Ishii, S. 2018. Virtual adversarial training: a regularization method for supervised and semi-supervised learning. IEEE TPAMI.
[26]
Miyato, T.; Dai, A. M.; and Goodfellow, I. 2016. Adversarial training methods for semi-supervised text classification. Computing Research Repository arXiv:1605.07725.
[27]
Nigam, K.; McCallum, A. K.; Thrun, S.; and Mitchell, T. 2000. Text classification from labeled and unlabeled documents using EM. Machine Learning.
[28]
Pang, B., and Lee, L. 2008. Opinion mining and sentiment analysis. Found. Trends Inf. Retr.
[29]
Paszke, A.; Gross, S.; Chintala, S.; Chanan, G.; Yang, E.; DeVito, Z.; Lin, Z.; Desmaison, A.; Antiga, L.; and Lerer, A. 2017. Automatic differentiation in pytorch. In NIPS-W.
[30]
Peters, M.; Neumann, M.; Iyyer, M.; Gardner, M.; Clark, C.; Lee, K.; and Zettlemoyer, L. 2018. Deep contextualized word representations. In NAACL.
[31]
Qi, Y.; Sachan, D.; Felix, M.; Padmanabhan, S.; and Neubig, G. 2018. When and why are pre-trained word embeddings useful for neural machine translation? In NAACL.
[32]
Ramachandran, P.; Liu, P.; and Le, Q. 2017. Unsupervised pretraining for sequence to sequence learning. In EMNLP.
[33]
Sachan, D. S.; Zaheer, M.; and Salakhutdinov, R. 2018. Investigating the working of text classifiers. In COLING.
[34]
Sahami, M.; Dumais, S.; Heckerman, D.; and Horvitz, E. 1998. A Bayesian approach to filtering junk e-mail. In Workshop on Learning for Text Categorization, AAAI.
[35]
Santos, C. d.; Xiang, B.; and Zhou, B. 2015. Classifying relations by ranking with convolutional neural networks. In ACL.
[36]
Schuster, M., and Paliwal, K. 1997. Bidirectional recurrent neural networks. Trans. Sig. Proc.
[37]
Tang, D.; Qin, B.; and Liu, T. 2015. Document modeling with gated recurrent neural network for sentiment classification. In EMNLP.
[38]
Werbos, P. J. 1988. Generalization of backpropagation with application to a recurrent gas market model. Neural Networks.
[39]
Xu, Y.; Mou, L.; Li, G.; Chen, Y.; Peng, H.; and Jin, Z. 2015. Classifying relations via long short term memory networks along shortest dependency paths. In EMNLP.
[40]
Yang, Z.; Yang, D.; Dyer, C.; He, X.; Smola, A.; and Hovy, E. 2016. Hierarchical attention networks for document classification. In NAACL.
[41]
Zhang, Y.; Zhong, V.; Chen, D.; Angeli, G.; and Manning, C. D. 2017. Position-aware attention and supervised data improve slot filling. In EMNLP.
[42]
Zhang, X.; Zhao, J.; and LeCun, Y. 2015. Character-level convolutional networks for text classification. In NIPS.
[43]
Zhou, C.; Sun, C.; Liu, Z.; and Lau, F. C. M. 2015. A C-LSTM neural network for text classification. Computing Research Repository arXiv:1511.08630.
[44]
Zhou, P.; Qi, Z.; Zheng, S.; Xu, J.; Bao, H.; and Xu, B. 2016. Text classification improved by integrating bidirectional LSTM with two-dimensional max pooling. In COLING.




          Published In

          AAAI'19/IAAI'19/EAAI'19: Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence and Thirty-First Innovative Applications of Artificial Intelligence Conference and Ninth AAAI Symposium on Educational Advances in Artificial Intelligence
          January 2019
          10088 pages
          ISBN:978-1-57735-809-1

          Sponsors

          • Association for the Advancement of Artificial Intelligence

          Publisher

          AAAI Press


          Qualifiers

          • Research-article
          • Research
          • Refereed limited


          Cited By

          • (2024)I-SFND: a novel interpretable self-ensembled semi-supervised model based on transformers for fake news detectionJournal of Intelligent Information Systems10.1007/s10844-023-00821-062:2(355-375)Online publication date: 1-Apr-2024
          • (2023)CluSent – Combining Semantic Expansion and De-Noising for Dataset-Oriented Sentiment Analysis of Short TextsProceedings of the 29th Brazilian Symposium on Multimedia and the Web10.1145/3617023.3617039(110-118)Online publication date: 23-Oct-2023
          • (2023)UniSA: Unified Generative Framework for Sentiment AnalysisProceedings of the 31st ACM International Conference on Multimedia10.1145/3581783.3612336(6132-6142)Online publication date: 26-Oct-2023
          • (2022)C-BDCLSTMApplied Soft Computing10.1016/j.asoc.2022.109659130:COnline publication date: 1-Nov-2022
          • (2021)Deep Learning--based Text ClassificationACM Computing Surveys10.1145/343972654:3(1-40)Online publication date: 17-Apr-2021
          • (2021)Associative Graphs for Fine-Grained Text Sentiment AnalysisNeural Information Processing10.1007/978-3-030-92270-2_21(238-249)Online publication date: 8-Dec-2021
          • (2020)Time-aware large kernel convolutionsProceedings of the 37th International Conference on Machine Learning10.5555/3524938.3525511(6172-6183)Online publication date: 13-Jul-2020
          • (2020)Not all unlabeled data are equalProceedings of the 34th International Conference on Neural Information Processing Systems10.5555/3495724.3497552(21786-21797)Online publication date: 6-Dec-2020
