
DOI: 10.1609/aaai.v33i01.33016940

Revisiting LSTM networks for semi-supervised text classification via mixed objective function

Published: 27 January 2019

Abstract

In this paper, we study the bidirectional LSTM network for the task of text classification using both supervised and semi-supervised approaches. Several prior works have suggested that either complex pretraining schemes using unsupervised methods such as language modeling (Dai and Le 2015; Miyato, Dai, and Goodfellow 2016) or complicated models (Johnson and Zhang 2017) are necessary to achieve high classification accuracy. However, we develop a training strategy that allows even a simple BiLSTM model, when trained with cross-entropy loss, to achieve competitive results compared with more complex approaches. Furthermore, in addition to cross-entropy loss, by using a combination of entropy minimization, adversarial, and virtual adversarial losses for both labeled and unlabeled data, we report state-of-the-art results for the text classification task on several benchmark datasets. In particular, on the ACL-IMDB sentiment analysis and AG-News topic classification datasets, our method outperforms current approaches by a substantial margin. We also show the generality of the mixed objective function by improving performance on the relation extraction task.
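The mixed objective described above can be sketched in a few lines. The following is a minimal, hypothetical illustration, not the paper's implementation: a linear softmax classifier stands in for the BiLSTM, the adversarial term uses a single FGSM-style perturbation of the inputs (computed analytically for the linear model), the virtual adversarial loss is omitted for brevity, and the names and mixing coefficients (`lam_em`, `lam_adv`, `eps`) are assumptions.

```python
import numpy as np

def softmax(z):
    # numerically stable softmax over the last axis
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def mixed_objective(W, x_lab, y_lab, x_unlab, lam_em=1.0, lam_adv=1.0, eps=0.1):
    """Cross-entropy + entropy-minimization + adversarial loss.

    W: (d, k) weights of a linear stand-in classifier.
    x_lab/y_lab: labeled examples; x_unlab: unlabeled examples.
    Returns (ce, em, adv, total)."""
    n = len(y_lab)
    # 1) supervised cross-entropy on labeled data
    p = softmax(x_lab @ W)
    ce = -np.mean(np.log(p[np.arange(n), y_lab] + 1e-12))
    # 2) entropy minimization on unlabeled data (penalize uncertain predictions)
    q = softmax(x_unlab @ W)
    em = np.mean(-np.sum(q * np.log(q + 1e-12), axis=1))
    # 3) adversarial loss: cross-entropy at an FGSM-perturbed input;
    #    for a linear model, grad of CE w.r.t. x is (p - onehot) @ W.T
    onehot = np.eye(W.shape[1])[y_lab]
    grad_x = (p - onehot) @ W.T
    x_adv = x_lab + eps * np.sign(grad_x)
    p_adv = softmax(x_adv @ W)
    adv = -np.mean(np.log(p_adv[np.arange(n), y_lab] + 1e-12))
    return ce, em, adv, ce + lam_em * em + lam_adv * adv
```

In the paper's setting the gradients for the adversarial and virtual adversarial terms are taken with respect to the (BiLSTM) word embeddings rather than raw inputs, and all terms are optimized jointly.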

References

[1]
Blum, A., and Mitchell, T. 1998. Combining labeled and unlabeled data with co-training. In COLT.
[2]
Collobert, R.; Weston, J.; Bottou, L.; Karlen, M.; Kavukcuoglu, K.; and Kuksa, P. 2011. Natural language processing (almost) from scratch. JMLR.
[3]
Dai, A. M., and Le, Q. V. 2015. Semi-supervised sequence learning. In NIPS.
[4]
Deerwester, S.; Dumais, S. T.; Furnas, G. W.; Landauer, T. K.; and Harshman, R. 1990. Indexing by latent semantic analysis. JASIS.
[5]
Devlin, J.; Chang, M.-W.; Lee, K.; and Toutanova, K. 2018. BERT: Pre-training of deep bidirectional transformers for language understanding. Computing Research Repository arXiv:1810.04805.
[6]
Goodfellow, I. J.; Shlens, J.; and Szegedy, C. 2014. Explaining and harnessing adversarial examples. Computing Research Repository arXiv:1412.6572.
[7]
Grandvalet, Y., and Bengio, Y. 2004. Semi-supervised learning by entropy minimization. In NIPS.
[8]
Hochreiter, S., and Schmidhuber, J. 1997. Long short-term memory. Neural Computation.
[9]
Howard, J., and Ruder, S. 2018. Universal language model fine-tuning for text classification. In ACL.
[10]
Joachims, T. 1998. Text categorization with support vector machines: Learning with many relevant features. In ECML.
[11]
Johnson, R., and Zhang, T. 2015a. Effective use of word order for text categorization with convolutional neural networks. In NAACL.
[12]
Johnson, R., and Zhang, T. 2015b. Semi-supervised convolutional neural networks for text categorization via region embedding. In NIPS.
[13]
Johnson, R., and Zhang, T. 2016. Supervised and semi-supervised text categorization using LSTM for region embeddings. In ICML.
[14]
Johnson, R., and Zhang, T. 2017. Deep pyramid convolutional neural networks for text categorization. In ACL.
[15]
Joulin, A.; Grave, E.; Bojanowski, P.; and Mikolov, T. 2017. Bag of tricks for efficient text classification. In EACL.
[16]
Kim, Y. 2014. Convolutional neural networks for sentence classification. In EMNLP.
[17]
Kingma, D. P., and Ba, J. 2014. Adam: A method for stochastic optimization. Computing Research Repository arXiv:1412.6980.
[18]
Lai, S.; Xu, L.; Liu, K.; and Zhao, J. 2015. Recurrent convolutional neural networks for text classification. In AAAI.
[19]
Le, Q., and Mikolov, T. 2014. Distributed representations of sentences and documents. In ICML.
[20]
Lewis, D. D.; Yang, Y.; Rose, T. G.; and Li, F. 2004. RCV1: A new benchmark collection for text categorization research. JMLR.
[21]
Maas, A. L.; Daly, R. E.; Pham, P. T.; Huang, D.; Ng, A. Y.; and Potts, C. 2011. Learning word vectors for sentiment analysis. In ACL.
[22]
McCallum, A., and Nigam, K. 1998. A comparison of event models for naive Bayes text classification. In Workshop on Learning for Text Categorization, AAAI.
[23]
Mikolov, T.; Sutskever, I.; Chen, K.; Corrado, G.; and Dean, J. 2013. Distributed representations of words and phrases and their compositionality. In NIPS.
[24]
Mikolov, T.; Grave, E.; Bojanowski, P.; Puhrsch, C.; and Joulin, A. 2018. Advances in pre-training distributed word representations. In LREC.
[25]
Miyato, T.; Maeda, S.-i.; Koyama, M.; and Ishii, S. 2018. Virtual adversarial training: a regularization method for supervised and semi-supervised learning. IEEE TPAMI.
[26]
Miyato, T.; Dai, A. M.; and Goodfellow, I. 2016. Adversarial training methods for semi-supervised text classification. Computing Research Repository arXiv:1605.07725.
[27]
Nigam, K.; McCallum, A. K.; Thrun, S.; and Mitchell, T. 2000. Text classification from labeled and unlabeled documents using EM. Machine Learning.
[28]
Pang, B., and Lee, L. 2008. Opinion mining and sentiment analysis. Found. Trends Inf. Retr.
[29]
Paszke, A.; Gross, S.; Chintala, S.; Chanan, G.; Yang, E.; DeVito, Z.; Lin, Z.; Desmaison, A.; Antiga, L.; and Lerer, A. 2017. Automatic differentiation in pytorch. In NIPS-W.
[30]
Peters, M.; Neumann, M.; Iyyer, M.; Gardner, M.; Clark, C.; Lee, K.; and Zettlemoyer, L. 2018. Deep contextualized word representations. In NAACL.
[31]
Qi, Y.; Sachan, D.; Felix, M.; Padmanabhan, S.; and Neubig, G. 2018. When and why are pre-trained word embeddings useful for neural machine translation? In NAACL.
[32]
Ramachandran, P.; Liu, P.; and Le, Q. 2017. Unsupervised pretraining for sequence to sequence learning. In EMNLP.
[33]
Sachan, D. S.; Zaheer, M.; and Salakhutdinov, R. 2018. Investigating the working of text classifiers. In COLING.
[34]
Sahami, M.; Dumais, S.; Heckerman, D.; and Horvitz, E. 1998. A Bayesian approach to filtering junk e-mail. In Workshop on Learning for Text Categorization, AAAI.
[35]
Santos, C. d.; Xiang, B.; and Zhou, B. 2015. Classifying relations by ranking with convolutional neural networks. In ACL.
[36]
Schuster, M., and Paliwal, K. 1997. Bidirectional recurrent neural networks. Trans. Sig. Proc.
[37]
Tang, D.; Qin, B.; and Liu, T. 2015. Document modeling with gated recurrent neural network for sentiment classification. In EMNLP.
[38]
Werbos, P. J. 1988. Generalization of backpropagation with application to a recurrent gas market model. Neural Networks.
[39]
Xu, Y.; Mou, L.; Li, G.; Chen, Y.; Peng, H.; and Jin, Z. 2015. Classifying relations via long short term memory networks along shortest dependency paths. In EMNLP.
[40]
Yang, Z.; Yang, D.; Dyer, C.; He, X.; Smola, A.; and Hovy, E. 2016. Hierarchical attention networks for document classification. In NAACL.
[41]
Zhang, Y.; Zhong, V.; Chen, D.; Angeli, G.; and Manning, C. D. 2017. Position-aware attention and supervised data improve slot filling. In EMNLP.
[42]
Zhang, X.; Zhao, J.; and LeCun, Y. 2015. Character-level convolutional networks for text classification. In NIPS.
[43]
Zhou, C.; Sun, C.; Liu, Z.; and Lau, F. C. M. 2015. A C-LSTM neural network for text classification. Computing Research Repository arXiv:1511.08630.
[44]
Zhou, P.; Qi, Z.; Zheng, S.; Xu, J.; Bao, H.; and Xu, B. 2016. Text classification improved by integrating bidirectional LSTM with two-dimensional max pooling. In COLING.




          Published In

          AAAI'19/IAAI'19/EAAI'19: Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence and Thirty-First Innovative Applications of Artificial Intelligence Conference and Ninth AAAI Symposium on Educational Advances in Artificial Intelligence
          January 2019
          10088 pages
          ISBN:978-1-57735-809-1

          Sponsors

          • Association for the Advancement of Artificial Intelligence

          Publisher

          AAAI Press


          Qualifiers

          • Research-article
          • Research
          • Refereed limited


          Cited By

          • (2024)I-SFND: a novel interpretable self-ensembled semi-supervised model based on transformers for fake news detectionJournal of Intelligent Information Systems10.1007/s10844-023-00821-062:2(355-375)Online publication date: 1-Apr-2024
          • (2023)CluSent – Combining Semantic Expansion and De-Noising for Dataset-Oriented Sentiment Analysis of Short TextsProceedings of the 29th Brazilian Symposium on Multimedia and the Web10.1145/3617023.3617039(110-118)Online publication date: 23-Oct-2023
          • (2023)UniSA: Unified Generative Framework for Sentiment AnalysisProceedings of the 31st ACM International Conference on Multimedia10.1145/3581783.3612336(6132-6142)Online publication date: 26-Oct-2023
          • (2022)C-BDCLSTMApplied Soft Computing10.1016/j.asoc.2022.109659130:COnline publication date: 1-Nov-2022
          • (2021)Deep Learning--based Text ClassificationACM Computing Surveys10.1145/343972654:3(1-40)Online publication date: 17-Apr-2021
          • (2021)Associative Graphs for Fine-Grained Text Sentiment AnalysisNeural Information Processing10.1007/978-3-030-92270-2_21(238-249)Online publication date: 8-Dec-2021
          • (2020)Time-aware large kernel convolutionsProceedings of the 37th International Conference on Machine Learning10.5555/3524938.3525511(6172-6183)Online publication date: 13-Jul-2020
          • (2020)Not all unlabeled data are equalProceedings of the 34th International Conference on Neural Information Processing Systems10.5555/3495724.3497552(21786-21797)Online publication date: 6-Dec-2020
