
Finding decision jumps in text classification

Published: 02 January 2020

Highlights

• We propose Jumper, a novel framework that models text classification as a sequential decision process.
• Experiments show that Jumper makes a decision as soon as the evidence is sufficient, reducing total text reading by 30–40% and often finding the key rationale for its prediction.
• Jumper achieves classification accuracy better than or comparable to state-of-the-art models on several benchmark and industrial datasets.
• Jumper is able to make a decision at the theoretically optimal decision position.

Abstract

Text classification is one of the key problems in natural language processing (NLP), and for many years it was typically addressed with feature-based machine learning models. Recently, deep neural networks have become powerful learning machines, making it possible to take raw text itself as input for classification. However, existing neural networks are typically end-to-end and lack an explicit interpretation of their predictions. In this paper, we propose Jumper, a novel framework that models text classification as a sequential decision process. Jumper is a neural system that scans a piece of text sequentially and makes a classification decision at the time of its choosing, inspired by the cognitive process of human reading. In our framework, both the classification result and the moment of classification are part of the decision process, controlled by a policy network and trained with reinforcement learning. Experimental results on real-world applications demonstrate the following properties of a properly trained Jumper: (1) it tends to make a decision as soon as the evidence is sufficient, reducing total text reading by 30–40% and often finding the key rationale for its prediction; and (2) it achieves classification accuracy better than or comparable to state-of-the-art models on several benchmark and industrial datasets. We further conduct a simulation experiment with mock data, which confirms that Jumper is able to make a decision at the theoretically optimal position.
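To make the sequential decision process concrete, here is a minimal sketch in PyTorch of the general idea the abstract describes: a recurrent reader that, after consuming each unit of text, either keeps reading or stops and emits a class label, trained with the REINFORCE policy-gradient algorithm. This is not the authors' implementation; the name SkimClassifier, the token-level reading granularity, the GRU encoder, and the ±1 terminal reward are all illustrative assumptions.

```python
# A minimal sketch of "read sequentially, decide when confident."
# Hypothetical code, not the paper's released implementation.
import torch
import torch.nn as nn

class SkimClassifier(nn.Module):
    def __init__(self, vocab_size, emb_dim, hidden_dim, n_classes):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.rnn = nn.GRUCell(emb_dim, hidden_dim)
        # Action 0 = "keep reading"; actions 1..n_classes = "stop, predict class".
        self.policy = nn.Linear(hidden_dim, n_classes + 1)
        self.hidden_dim = hidden_dim

    def forward(self, tokens):
        # tokens: LongTensor of shape (seq_len,) holding one document.
        h = tokens.new_zeros((1, self.hidden_dim), dtype=torch.float)
        log_probs = []
        for t in range(tokens.size(0)):
            h = self.rnn(self.embed(tokens[t]).unsqueeze(0), h)
            dist = torch.distributions.Categorical(logits=self.policy(h))
            action = dist.sample()
            log_probs.append(dist.log_prob(action))
            if action.item() > 0:  # the policy chose to stop and classify
                return action.item() - 1, t, torch.stack(log_probs)
        # Reached the end without stopping: force the most likely class.
        best = self.policy(h)[0, 1:].argmax().item()
        return best, tokens.size(0) - 1, torch.stack(log_probs)

def reinforce_step(model, optimizer, tokens, label):
    """One REINFORCE update: reward the whole action sequence by correctness."""
    pred, stop_pos, log_probs = model(tokens)
    reward = 1.0 if pred == label else -1.0   # terminal reward only
    loss = -reward * log_probs.sum()          # policy-gradient surrogate loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return pred, stop_pos
```

In practice, a variance-reduction baseline (e.g., a running average of the reward) and a small per-step reading penalty are common additions; the latter is one way such a model can learn to stop early, consistent with the 30–40% reduction in text reading reported above.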




Information

    Published In

Neurocomputing, Volume 371, Issue C
    Jan 2020
    200 pages

    Publisher

Elsevier Science Publishers B.V., Netherlands

    Publication History

    Published: 02 January 2020

    Author Tags

    1. Text classification
    2. Reinforcement learning
    3. Weak supervision
    4. Rationalizing neural prediction

    Qualifiers

    • Research-article

Article Metrics

• Downloads (Last 12 months): 0
• Downloads (Last 6 weeks): 0
Reflects downloads up to 21 Nov 2024

Cited By
• (2024) Adapting Neural Networks at Runtime: Current Trends in At-Runtime Optimizations for Deep Learning. ACM Computing Surveys 56(10), 1–40. DOI: 10.1145/3657283. Online publication date: 14-May-2024.
• (2023) Early Classifying Multimodal Sequences. Proceedings of the 25th International Conference on Multimodal Interaction, 183–189. DOI: 10.1145/3577190.3614163. Online publication date: 9-Oct-2023.
• (2023) A resource-efficient ECG diagnosis model for mobile health devices. Information Sciences 648:C. DOI: 10.1016/j.ins.2023.119628. Online publication date: 1-Nov-2023.
• (2023) MTD: Multi-Timestep Detector for Delayed Streaming Perception. Pattern Recognition and Computer Vision, 337–349. DOI: 10.1007/978-981-99-8435-0_27. Online publication date: 13-Oct-2023.
• (2023) A Policy for Early Sequence Classification. Artificial Neural Networks and Machine Learning – ICANN 2023, 50–61. DOI: 10.1007/978-3-031-44207-0_5. Online publication date: 26-Sep-2023.
• (2022) Learning robust rule representations for abstract reasoning via internal inferences. Proceedings of the 36th International Conference on Neural Information Processing Systems, 33550–33562. DOI: 10.5555/3600270.3602701. Online publication date: 28-Nov-2022.
• (2021) Deep Learning-based Text Classification. ACM Computing Surveys 54(3), 1–40. DOI: 10.1145/3439726. Online publication date: 17-Apr-2021.
