
Finding decision jumps in text classification

Published: 02 January 2020

Highlights

• We propose Jumper, a novel framework that models text classification as a sequential decision process.
• Experiments show that Jumper makes a decision as soon as the evidence is sufficient, reducing total text reading by 30–40% and often finding the key rationale for its prediction.
• Jumper achieves classification accuracy better than or comparable to state-of-the-art models on several benchmark and industrial datasets.
• Jumper is able to make a decision at the theoretically optimal decision position.

Abstract

Text classification is one of the key problems in natural language processing (NLP), and for many years it was typically addressed with feature-based machine learning models. Recently, deep neural networks have become powerful learning machines, making it possible to take raw text itself as input for classification. However, existing neural networks are typically end-to-end and lack an explicit interpretation of their predictions. In this paper, we propose Jumper, a novel framework that models text classification as a sequential decision process. Jumper is a neural system that scans a piece of text sequentially and makes a classification decision at the time of its choosing, inspired by the cognitive process of human reading. In our framework, both the classification result and the moment of classification are part of the decision process, controlled by a policy network and trained with reinforcement learning. Experimental results on real-world applications demonstrate the following properties of a properly trained Jumper: (1) it tends to make a decision as soon as the evidence is sufficient, reducing total text reading by 30–40% and often finding the key rationale for its prediction; and (2) it achieves classification accuracy better than or comparable to state-of-the-art models on several benchmark and industrial datasets. We further conduct a simulation experiment with mock data, which confirms that Jumper is able to make a decision at the theoretically optimal position.
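To make the sequential decision process concrete, here is a minimal sketch in PyTorch of the general idea the abstract describes: a recurrent reader that, after consuming each unit of text, either keeps reading or stops and emits a class label, trained with the REINFORCE policy-gradient algorithm. This is not the authors' implementation; the name SkimClassifier, the token-level reading granularity, the GRU encoder, and the ±1 terminal reward are all illustrative assumptions.

```python
# A minimal sketch of "read sequentially, decide when confident."
# Hypothetical code, not the paper's released implementation.
import torch
import torch.nn as nn

class SkimClassifier(nn.Module):
    def __init__(self, vocab_size, emb_dim, hidden_dim, n_classes):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.rnn = nn.GRUCell(emb_dim, hidden_dim)
        # Action 0 = "keep reading"; actions 1..n_classes = "stop, predict class".
        self.policy = nn.Linear(hidden_dim, n_classes + 1)
        self.hidden_dim = hidden_dim

    def forward(self, tokens):
        # tokens: LongTensor of shape (seq_len,) holding one document.
        h = tokens.new_zeros((1, self.hidden_dim), dtype=torch.float)
        log_probs = []
        for t in range(tokens.size(0)):
            h = self.rnn(self.embed(tokens[t]).unsqueeze(0), h)
            dist = torch.distributions.Categorical(logits=self.policy(h))
            action = dist.sample()
            log_probs.append(dist.log_prob(action))
            if action.item() > 0:  # the policy chose to stop and classify
                return action.item() - 1, t, torch.stack(log_probs)
        # Reached the end without stopping: force the most likely class.
        best = self.policy(h)[0, 1:].argmax().item()
        return best, tokens.size(0) - 1, torch.stack(log_probs)

def reinforce_step(model, optimizer, tokens, label):
    """One REINFORCE update: reward the whole action sequence by correctness."""
    pred, stop_pos, log_probs = model(tokens)
    reward = 1.0 if pred == label else -1.0   # terminal reward only
    loss = -reward * log_probs.sum()          # policy-gradient surrogate loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return pred, stop_pos
```

In practice, a variance-reduction baseline (e.g., a running average of the reward) and a small per-step reading penalty are common additions; the latter is one way such a model can learn to stop early, consistent with the 30–40% reduction in text reading reported above.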




Information

    Published In

Neurocomputing, Volume 371, Issue C
    Jan 2020
    200 pages

    Publisher

Elsevier Science Publishers B.V., Netherlands

    Publication History

    Published: 02 January 2020

    Author Tags

    1. Text classification
    2. Reinforcement learning
    3. Weak supervision
    4. Rationalizing neural prediction

    Qualifiers

    • Research-article

Article Metrics

• Downloads (Last 12 months): 0
• Downloads (Last 6 weeks): 0
Reflects downloads up to 21 Nov 2024

Cited By
• (2024) Adapting Neural Networks at Runtime: Current Trends in At-Runtime Optimizations for Deep Learning. ACM Computing Surveys 56(10), 1–40. DOI: 10.1145/3657283. Online publication date: 14-May-2024.
• (2023) Early Classifying Multimodal Sequences. Proceedings of the 25th International Conference on Multimodal Interaction, 183–189. DOI: 10.1145/3577190.3614163. Online publication date: 9-Oct-2023.
• (2023) A resource-efficient ECG diagnosis model for mobile health devices. Information Sciences 648:C. DOI: 10.1016/j.ins.2023.119628. Online publication date: 1-Nov-2023.
• (2023) MTD: Multi-Timestep Detector for Delayed Streaming Perception. Pattern Recognition and Computer Vision, 337–349. DOI: 10.1007/978-981-99-8435-0_27. Online publication date: 13-Oct-2023.
• (2023) A Policy for Early Sequence Classification. Artificial Neural Networks and Machine Learning – ICANN 2023, 50–61. DOI: 10.1007/978-3-031-44207-0_5. Online publication date: 26-Sep-2023.
• (2022) Learning robust rule representations for abstract reasoning via internal inferences. Proceedings of the 36th International Conference on Neural Information Processing Systems, 33550–33562. DOI: 10.5555/3600270.3602701. Online publication date: 28-Nov-2022.
• (2021) Deep Learning-based Text Classification. ACM Computing Surveys 54(3), 1–40. DOI: 10.1145/3439726. Online publication date: 17-Apr-2021.
