ClassiNet -- Predicting Missing Features for Short-Text Classification

Published: 27 June 2018

Abstract

Short and sparse texts such as tweets, search engine snippets, product reviews, and chat messages are abundant on the Web. Classifying such short texts into a pre-defined set of categories is a common problem that arises in various contexts, such as sentiment classification, spam detection, and information recommendation. The fundamental problem in short-text classification is feature sparseness -- the lack of feature overlap between a trained model and a test instance to be classified. To overcome the feature sparseness problem, we propose ClassiNet -- a network of classifiers trained to predict missing features in a given instance. Using a set of unlabeled training instances, we first learn binary classifiers as feature predictors, each of which predicts whether a particular feature occurs in a given instance. Next, each feature predictor is represented as a vertex v_i in the ClassiNet, so that feature predictors and vertices are in one-to-one correspondence. The weight of the directed edge e_ij connecting a vertex v_i to a vertex v_j represents the conditional probability that v_j occurs in an instance given that v_i occurs in the same instance.
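As a rough illustration of the directed edge weights described above, the following minimal sketch estimates the conditional probability that feature j occurs given that feature i occurs, using plain co-occurrence counts over unlabeled instances. This count-based estimate is an assumption made for brevity: it stands in for the trained binary feature predictors and is not the authors' implementation. The function name `cooccurrence_edge_weights` and the toy instances are hypothetical.

```python
from collections import Counter
from itertools import permutations


def cooccurrence_edge_weights(instances):
    """Estimate weights[i][j] ~ P(j occurs | i occurs) from unlabeled instances.

    Assumption: simple counting replaces the trained feature predictors.
    """
    feature_count = Counter()  # number of instances containing each feature
    pair_count = Counter()     # number of instances containing both i and j
    for tokens in instances:
        features = set(tokens)
        feature_count.update(features)
        # Every ordered pair (i, j) with i != j yields one directed edge count.
        pair_count.update(permutations(sorted(features), 2))

    weights = {}
    for (i, j), n_ij in pair_count.items():
        weights.setdefault(i, {})[j] = n_ij / feature_count[i]
    return weights


# Hypothetical unlabeled short texts, already tokenized.
unlabeled = [
    ["great", "battery", "life"],
    ["great", "screen"],
    ["battery", "life", "short"],
    ["great", "battery"],
]
weights = cooccurrence_edge_weights(unlabeled)
print(weights["great"]["battery"])  # weight of the edge great -> battery
print(weights["battery"]["great"])  # weight of the reverse edge
```

The weights are directed: in the toy data, "life" never occurs without "battery", while "battery" also occurs without "life", so the two edges between them carry different weights.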
We show that ClassiNets generalize word co-occurrence graphs by considering implicit co-occurrences between features. We extract numerous features from the trained ClassiNet to overcome feature sparseness. In particular, for a given instance x, we find similar features from ClassiNet that did not appear in x, and append those features to the representation of x. Moreover, we propose a method based on graph propagation to find features that are indirectly related to a given short text. We evaluate ClassiNets on several benchmark datasets for short-text classification. Our experimental results show that ClassiNet yields statistically significant improvements in accuracy on short-text classification tasks, without using any external resources such as thesauri to find related features.
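The feature expansion and graph propagation ideas can be sketched as follows. This is only an illustrative sketch under stated assumptions: activation starts at 1.0 on the features observed in x, spreads along directed edges for a fixed number of hops, and each additional hop is damped by a constant factor. The scoring rule, the parameters `hops`, `decay`, and `top_k`, the function name `expand_features`, and the toy edge weights are all hypothetical and are not taken from the paper.

```python
def expand_features(instance, weights, hops=2, decay=0.5, top_k=2):
    """Rank features absent from `instance` by damped propagation over the graph.

    Assumption: a feature's score is the sum of damped path-weight products
    reaching it from the observed features within `hops` steps.
    """
    observed = set(instance)
    frontier = {feature: 1.0 for feature in observed}
    scores = {}
    for hop in range(hops):
        next_frontier = {}
        for source, activation in frontier.items():
            for target, w in weights.get(source, {}).items():
                next_frontier[target] = next_frontier.get(target, 0.0) + activation * w
        damping = decay ** hop  # direct neighbours are not damped
        for feature, activation in next_frontier.items():
            if feature not in observed:
                scores[feature] = scores.get(feature, 0.0) + damping * activation
        frontier = next_frontier
    # Highest score first; ties broken alphabetically for determinism.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top_k]


# Hypothetical directed edge weights, for illustration only.
toy_weights = {
    "battery": {"life": 0.8, "charger": 0.5},
    "life": {"battery": 0.9, "longevity": 0.6},
    "charger": {"cable": 0.7},
}
for feature, score in expand_features(["battery"], toy_weights, hops=2, top_k=3):
    print(feature, round(score, 3))
```

With a single hop, only the direct neighbours of the observed features are candidates. With two hops, features such as "longevity" become reachable even though they share no edge with any observed feature, which is the sense in which propagation surfaces indirectly related features.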


    Published In

    ACM Transactions on Knowledge Discovery from Data, Volume 12, Issue 5
    October 2018
    354 pages
    ISSN: 1556-4681
    EISSN: 1556-472X
    DOI: 10.1145/3234931

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 27 June 2018
    Accepted: 01 March 2018
    Revised: 01 March 2018
    Received: 01 May 2017
    Published in TKDD Volume 12, Issue 5


    Author Tags

    1. Classifier networks
    2. feature sparseness
    3. short-texts
    4. text classification

    Qualifiers

    • Research-article
    • Research
    • Refereed

    Funding Sources

    • ERATO Kawarabayashi Large Graph Project from the Japan Science and Technology Agency (JST)


