research-article
DOI:10.1145/1557019.1557156

Sentiment analysis of blogs by combining lexical knowledge with text classification

Published: 28 June 2009

Abstract

The explosion of user-generated content on the Web has led to new opportunities and significant challenges for companies that are increasingly concerned about monitoring the discussion around their products. Tracking such discussion on weblogs provides useful insight into how to improve products or market them more effectively. An important component of such analysis is characterizing the sentiment expressed in blogs about specific brands and products. Sentiment analysis focuses on this task of automatically identifying whether a piece of text expresses a positive or negative opinion about the subject matter. Most previous work in this area uses prior lexical knowledge in terms of the sentiment polarity of words. In contrast, some recent approaches treat the task as a text classification problem, in which a classifier learns sentiment based only on labeled training data. In this paper, we present a unified framework in which one can use background lexical information in terms of word-class associations and refine this information for specific domains using any available training examples. Empirical results on diverse domains show that our approach performs better than using background knowledge or training data in isolation, and better than alternative approaches to using lexical knowledge with text classification.
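
To make the idea of combining word-class associations with labeled examples concrete, the following is a minimal sketch and not the paper's implementation. It assumes a multinomial naive Bayes setting in which one set of word-class distributions comes from a sentiment lexicon, another comes from training documents, and the two are merged by a simple convex combination (a linear pool). The toy lexicon, the pseudo-count `strength`, the mixing weight `alpha`, and all function names are hypothetical choices made for illustration only.

```python
import math
from collections import Counter

# Hypothetical toy lexicon (word -> polarity); an assumption for illustration.
LEXICON = {"great": "pos", "love": "pos", "awful": "neg", "hate": "neg"}
CLASSES = ("pos", "neg")


def lexicon_model(vocab, strength=4.0):
    """P(w|c) from the lexicon alone: a word listed under class c gets
    `strength` pseudo-counts in c and 1 pseudo-count everywhere else."""
    model = {}
    for c in CLASSES:
        weights = {w: (strength if LEXICON.get(w) == c else 1.0) for w in vocab}
        total = sum(weights.values())
        model[c] = {w: v / total for w, v in weights.items()}
    return model


def data_model(docs, vocab):
    """P(w|c) estimated from labeled documents with Laplace smoothing."""
    model = {}
    for c in CLASSES:
        counts = Counter(w for words, label in docs if label == c for w in words)
        total = sum(counts.values()) + len(vocab)
        model[c] = {w: (counts[w] + 1) / total for w in vocab}
    return model


def pool(lex, data, alpha=0.5):
    """Convex combination of the two word-class distributions."""
    return {
        c: {w: alpha * lex[c][w] + (1 - alpha) * data[c][w] for w in lex[c]}
        for c in CLASSES
    }


def classify(words, model):
    """Return the class with the highest log-likelihood.
    A uniform class prior is assumed; out-of-vocabulary words are skipped."""
    scores = {
        c: sum(math.log(model[c][w]) for w in words if w in model[c])
        for c in CLASSES
    }
    return max(scores, key=scores.get)


# Tiny made-up training set: (tokens, label).
docs = [(["great", "battery"], "pos"), (["sluggish", "screen"], "neg")]
vocab = sorted(set(LEXICON) | {w for words, _ in docs for w in words})
pooled = pool(lexicon_model(vocab), data_model(docs, vocab))

print(classify(["love"], pooled))      # lexicon word never seen in training
print(classify(["sluggish"], pooled))  # domain word absent from the lexicon
```

In this toy setup the lexicon supplies evidence for words that never appear in the training data, while the training data supplies evidence for domain-specific words the lexicon lacks. How the two sources should be weighted is a design choice that this sketch fixes arbitrarily at `alpha=0.5`.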


    Published In

    KDD '09: Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining
    June 2009
    1426 pages
    ISBN:9781605584959
    DOI:10.1145/1557019
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from ACM.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. background knowledge
    2. blog analysis
    3. dual supervision
    4. movie reviews
    5. naive bayes
    6. opinion mining
    7. political blogs
    8. prior knowledge
    9. sentiment analysis
    10. technology blogs
    11. text mining

    Qualifiers

    • Research-article

    Conference

    KDD09

    Acceptance Rates

    Overall Acceptance Rate 1,133 of 8,635 submissions, 13%
