DOI: 10.5555/2390948.2390966

Active learning for imbalanced sentiment classification

Published: 12 July 2012

Abstract

Active learning is a promising way to reduce annotation cost in sentiment classification. In this paper, we focus on sentiment classification under an imbalanced class distribution, in which the number of positive samples differs considerably from the number of negative samples. This scenario poses new challenges for active learning. To address them, we propose a novel active learning approach, named co-selecting, that takes both the imbalanced class distribution and uncertainty into account. Specifically, co-selecting employs two feature subspace classifiers that collectively select the most informative minority-class samples for manual annotation, using a certainty measurement and an uncertainty measurement, while automatically labeling the most informative majority-class samples to reduce human annotation effort. Extensive experiments across four domains demonstrate the potential and effectiveness of the proposed co-selecting approach for active learning in imbalanced sentiment classification.
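
The abstract does not spell out the selection rule, so the following is only a minimal sketch of one plausible reading of a single co-selecting round, not the authors' implementation. It assumes two classifiers trained on disjoint feature subspaces have already produced, for each unlabeled sample, an estimated probability of the minority class. The function name `co_select`, the parameter `n_candidates`, and the specific rules (certainty from classifier A, uncertainty from classifier B, auto-labeling by joint confidence) are all hypothetical.

```python
from typing import Optional, Sequence, Tuple


def co_select(
    p_min_a: Sequence[float],
    p_min_b: Sequence[float],
    n_candidates: int = 3,
) -> Tuple[int, Optional[int]]:
    """One illustrative selection round (assumed rules, see lead-in).

    p_min_a, p_min_b: minority-class probabilities for the same unlabeled
    samples, as estimated by two feature-subspace classifiers A and B.
    Returns (index to send for manual annotation, index to auto-label
    as majority class, or None if no other sample is available).
    """
    if len(p_min_a) != len(p_min_b) or len(p_min_a) == 0:
        raise ValueError("probability lists must be non-empty and aligned")

    indices = range(len(p_min_a))

    # Certainty step (assumption): keep the samples that classifier A is
    # most confident belong to the minority class.
    candidates = sorted(indices, key=lambda i: p_min_a[i], reverse=True)
    candidates = candidates[:n_candidates]

    # Uncertainty step (assumption): among those candidates, query the one
    # classifier B is least sure about, i.e. closest to 0.5.
    query = min(candidates, key=lambda i: abs(p_min_b[i] - 0.5))

    # Auto-labeling step (assumption): the remaining sample that both
    # classifiers consider least likely to be minority is labeled as
    # majority class without human effort.
    rest = [i for i in indices if i != query]
    auto = min(rest, key=lambda i: max(p_min_a[i], p_min_b[i])) if rest else None

    return query, auto


# Usage with made-up probabilities for five unlabeled reviews.
p_a = [0.9, 0.8, 0.2, 0.1, 0.7]
p_b = [0.95, 0.55, 0.3, 0.05, 0.1]
query_idx, auto_idx = co_select(p_a, p_b)
```

In this reading, the certainty step steers the query budget toward likely minority-class samples, and the uncertainty step then picks the candidate whose label would be most informative to the second classifier.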

Published In

EMNLP-CoNLL '12: Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning
July 2012
1573 pages

Publisher

Association for Computational Linguistics

United States

Qualifiers

  • Research-article

Acceptance Rates

Overall Acceptance Rate 73 of 234 submissions, 31%

Cited By

  • (2022) Breaking the Curse of Class Imbalance: Bangla Text Classification. ACM Transactions on Asian and Low-Resource Language Information Processing 21:5 (1-21). DOI: 10.1145/3511601. Online publication date: 29-Apr-2022
  • (2018) Incorporating Multi-Level User Preference into Document-Level Sentiment Classification. ACM Transactions on Asian and Low-Resource Language Information Processing 18:1 (1-17). DOI: 10.1145/3234512. Online publication date: 19-Nov-2018
  • (2017) An empirical study of self-training and data balancing techniques for splice site prediction. International Journal of Bioinformatics Research and Applications 13:1 (40-61). DOI: 10.1504/IJBRA.2017.082055. Online publication date: 1-Jan-2017
  • (2017) A Two-step Information Accumulation Strategy for Learning from Highly Imbalanced Data. Proceedings of the 2017 ACM on Conference on Information and Knowledge Management (1289-1298). DOI: 10.1145/3132847.3132940. Online publication date: 6-Nov-2017
  • (2017) A comparison study on active learning integrated ensemble approaches in sentiment analysis. Computers and Electrical Engineering 57:C (311-323). DOI: 10.1016/j.compeleceng.2016.11.015. Online publication date: 1-Jan-2017
  • (2015) Combination of active learning and self-training for cross-lingual sentiment classification with density analysis of unlabelled samples. Information Sciences: an International Journal 317:C (67-77). DOI: 10.1016/j.ins.2015.04.003. Online publication date: 1-Oct-2015
  • (2015) Chinese comments sentiment classification based on word2vec and SVMperf. Expert Systems with Applications: An International Journal 42:4 (1857-1863). DOI: 10.1016/j.eswa.2014.09.011. Online publication date: 1-Mar-2015
  • (2013) Contextual and active learning-based affect-sensing from virtual drama improvisation. ACM Transactions on Speech and Language Processing 9:4 (1-25). DOI: 10.1145/2407736.2407738. Online publication date: 30-Jan-2013
