DOI: 10.1145/3243082.3267448
short-paper

LIWBC: a bigram algorithm to enhance results in polarity classification

Published: 16 October 2018

Abstract

The text mining literature shows a growing body of work concerned with the automatic identification of sentiment in text. Sentiment polarity classification is one of the most important text mining tasks. The typical approach to polarity classification uses lexicons to count word usage from linguistic or emotional aspects. One of the most widely used lexicons is the Linguistic Inquiry and Word Count (LIWC). LIWC assigns words to categories (e.g., positive emotion) based on a lexicon of words associated with psycholinguistic categories. It has been widely used in polarity classification tasks with good results. However, it only accounts for word counts, discarding text structure and ignoring important semantic relationships between words. In this work, we present LIWBC, an algorithm that counts bigrams using the lexicon provided by LIWC. The goal is to incorporate text structure information to improve polarity classification with the LIWC lexicon. We conducted experiments to evaluate LIWBC on two real datasets: the first consists of blogger posts; the second is the movie reviews dataset, which contains full-text movie reviews from IMDB. Both datasets were processed with LIWC and LIWBC. We then ran four classification algorithms on the data processed by LIWC and by LIWBC. The SVM algorithm run on LIWBC data yielded the best result on both datasets. The F1 score of SVM on the blogger posts and movie reviews datasets improved by 2.2% and 2.5%, respectively.
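
The abstract does not spell out how LIWBC counts bigrams, so the following is only a minimal sketch of one plausible reading: each word is mapped to its lexicon categories, and a count is kept for every pair of categories found on two adjacent words. The toy lexicon, the category labels, and the function names are all assumptions made for illustration; they are not the LIWC lexicon or the authors' implementation.

```python
from collections import Counter

# Hypothetical toy lexicon standing in for LIWC (the real lexicon is
# licensed and far larger). Category labels are illustrative only.
TOY_LEXICON = {
    "not": {"negate"},
    "never": {"negate"},
    "good": {"posemo"},
    "happy": {"posemo"},
    "bad": {"negemo"},
    "sad": {"negemo"},
}


def word_category_counts(tokens, lexicon):
    """Word-level counting: one count per category of each known word."""
    counts = Counter()
    for token in tokens:
        for category in lexicon.get(token, ()):
            counts[category] += 1
    return counts


def bigram_category_counts(tokens, lexicon):
    """Assumed bigram counting: one count per category pair on adjacent words."""
    counts = Counter()
    for first, second in zip(tokens, tokens[1:]):
        for cat_a in lexicon.get(first, ()):
            for cat_b in lexicon.get(second, ()):
                counts[(cat_a, cat_b)] += 1
    return counts


tokens = "the movie was not good but never bad".split()
unigrams = word_category_counts(tokens, TOY_LEXICON)
bigrams = bigram_category_counts(tokens, TOY_LEXICON)
print(unigrams)
print(bigrams)
```

In this toy sentence, word-level counting records one positive-emotion word and one negative-emotion word, with no indication that both are negated. The bigram counts keep the pairs ("negate", "posemo") and ("negate", "negemo"), which is the kind of structural information the abstract says word counting discards.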


Cited By

  • (2019) Text Extraction and Clustering for Multimedia: A review on Techniques and Challenges. 2019 International Conference on Digitization (ICD), 38-43. DOI: 10.1109/ICD47981.2019.9105905. Online publication date: Nov-2019

    Published In

    WebMedia '18: Proceedings of the 24th Brazilian Symposium on Multimedia and the Web
    October 2018
    437 pages
    ISBN:9781450358675
    DOI:10.1145/3243082

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. LIWC
    2. Sentiment analysis
    3. Text mining

    Qualifiers

    • Short-paper
    • Research
    • Refereed limited

    Conference

    WebMedia '18: Brazilian Symposium on Multimedia and the Web
    October 16 - 19, 2018
    Salvador, BA, Brazil

    Acceptance Rates

    WebMedia '18 Paper Acceptance Rate 37 of 111 submissions, 33%;
    Overall Acceptance Rate 270 of 873 submissions, 31%
