DOI: 10.1145/3331453.3361636
research-article

Chinese Text Feature Extraction and Classification Based on Deep Learning

Published: 22 October 2019

Abstract

With the rapid development of deep learning, neural networks have been widely applied to natural language processing tasks and have achieved good results. Because convolutional neural networks can learn high-level features that better represent the semantic information of text, convolutional neural networks (CNN) and convolutional recurrent neural networks (CRNN) are used to build feature extraction models that extract text features. At the same time, the TF-IDF and Word2vec methods are used to represent text features. The resulting features are fed into SVM and random forest classifiers to classify a dataset of Chinese academic papers. Experimental results show that the classification results obtained with the CNN and CRNN feature extraction models are better than those obtained with the TF-IDF and Word2vec feature extraction methods. In addition, the classification results obtained with the SVM and random forest classifiers are better than those of the original neural networks.
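
As a rough illustration of the TF-IDF baseline mentioned in the abstract, the sketch below computes TF-IDF weights for documents that have already been segmented into words, using only the Python standard library. The abstract does not specify the TF-IDF variant, the word segmenter, or any parameters, so the function name `tfidf_vectors`, the smoothed IDF formula, and the toy documents are all assumptions made for illustration, not the authors' implementation.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return (vocabulary, vectors) for a list of pre-tokenized documents.

    Assumed variant (not taken from the paper):
      tf  = term count / document length
      idf = log((1 + N) / (1 + df)) + 1   (a common smoothed form)
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    vocab = sorted(df)
    idf = {t: math.log((1 + n_docs) / (1 + df[t])) + 1.0 for t in vocab}

    vectors = []
    for doc in docs:
        counts = Counter(doc)
        length = len(doc) or 1  # guard against empty documents
        vectors.append([counts[t] / length * idf[t] for t in vocab])
    return vocab, vectors


# Hypothetical pre-segmented toy documents (not from the paper's dataset).
docs = [
    ["深度", "学习", "文本", "分类"],
    ["文本", "特征", "提取"],
    ["深度", "学习", "特征"],
]
vocab, vectors = tfidf_vectors(docs)
```

Each row of `vectors` is a fixed-length feature vector that could be passed to a classifier such as an SVM or a random forest; terms that occur in fewer documents receive a higher weight than terms shared across many documents.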



    Published In

    CSAE '19: Proceedings of the 3rd International Conference on Computer Science and Application Engineering
    October 2019
    942 pages
    ISBN: 9781450362948
    DOI: 10.1145/3331453

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. CNN
    2. CRNN
    3. Feature extraction
    4. Text classification

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Conference

    CSAE 2019

    Acceptance Rates

    Overall Acceptance Rate 368 of 770 submissions, 48%
