Thumbs up?: sentiment classification using machine learning techniques

Authors:

Shivakumar Vaithyanathan

EMNLP '02: Proceedings of the ACL-02 conference on Empirical methods in natural language processing - Volume 10

Pages 79 - 86

https://doi.org/10.3115/1118693.1118704

Published: 06 July 2002

Abstract

We consider the problem of classifying documents not by topic, but by overall sentiment, e.g., determining whether a review is positive or negative. Using movie reviews as data, we find that standard machine learning techniques definitively outperform human-produced baselines. However, the three machine learning methods we employed (Naive Bayes, maximum entropy classification, and support vector machines) do not perform as well on sentiment classification as on traditional topic-based categorization. We conclude by examining factors that make the sentiment classification problem more challenging.
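
As a rough illustration of the task the abstract describes, the sketch below trains a small Naive Bayes classifier to label reviews as positive or negative. It is a minimal sketch under stated assumptions, not the authors' implementation: the whitespace tokenizer, the use of word presence rather than word frequency, add-one smoothing, the function names, and the four toy reviews are all hypothetical choices made for this example.

```python
import math
from collections import Counter

# Hypothetical toy data; the paper's experiments use a movie-review corpus.
TOY_REVIEWS = [
    ("a brilliant and moving film", "positive"),
    ("great acting and a great story", "positive"),
    ("a dull and boring plot", "negative"),
    ("terrible acting and a bad script", "negative"),
]


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately crude tokenizer)."""
    return text.lower().split()


def train_naive_bayes(labeled_docs):
    """Collect per-class document counts and per-class word counts.

    Each word is counted at most once per document (presence features),
    which is an assumption of this sketch.
    """
    doc_counts = Counter()
    word_counts = {}
    vocab = set()
    for text, label in labeled_docs:
        words = set(tokenize(text))
        doc_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(words)
        vocab |= words
    return doc_counts, word_counts, vocab


def classify(text, model):
    """Return the label with the highest log posterior under Naive Bayes."""
    doc_counts, word_counts, vocab = model
    total_docs = sum(doc_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(doc_counts):
        # Log prior for the class.
        score = math.log(doc_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for word in set(tokenize(text)):
            if word not in vocab:
                continue  # ignore words never seen in training
            # Add-one (Laplace) smoothed log likelihood of the word.
            smoothed = (word_counts[label][word] + 1) / (total_words + len(vocab))
            score += math.log(smoothed)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


MODEL = train_naive_bayes(TOY_REVIEWS)
print(classify("a brilliant story", MODEL))
print(classify("boring and bad", MODEL))
```

With so little training data the result only shows the mechanics. A word such as "and" appears in both classes and contributes almost nothing, while class-specific words such as "brilliant" or "boring" decide the label.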

Publisher

Association for Computational Linguistics

United States

Acceptance Rates

Overall Acceptance Rate 73 of 234 submissions, 31%
