Abstract
This article proposes novel frameworks for the SentiVerb and Spell Checker systems, which extract the reactions, moods, and opinions of users from social media text (SMT). User opinion is extracted from text written on social media, such as comments, tweets, blogs, and feedback, and is classified as positive or negative based on the sentiment score of the SMT, using a dictionary-based approach and a binary classifier. The dictionary-based approach uses an opinion verb dictionary (OVD) to extract the sentiment of the opinion verbs present in the SMT. The OVD contains only opinion verbs along with their sentiment scores. The steps of the framework are discussed: lower-case conversion, tokenization, spell checking, Part-of-Speech tagging, stop word elimination, stemming, sentiment score calculation, and classification of the SMT. A new concept, the threshold negative parameter, is introduced for the first time in this article. In the experiments, the performance of the proposed SentiVerb system is evaluated on three datasets: Facebook comments on the implementation of the goods and services tax (GST) in India, tweets on the debate between former US president Barack Obama and John McCain, and movie reviews. The proposed SentiVerb system, implemented with a rule-based classifier (RBC), gives the best performance in terms of accuracy, with 82.5% on the GST comments and 79.18% on the Obama-McCain debate, which is better than existing algorithms on datasets from the social issues domain. Its accuracy of 71.3% on the standard movie review dataset is also better than other reported results.
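The dictionary-based part of the pipeline can be illustrated with a minimal Python sketch. It is not the authors' implementation: the dictionary entries, scores, stop word list, and the helper names `toy_stem`, `sentiment_score`, and `classify` are all hypothetical, and the crude suffix stripper merely stands in for a real stemmer. Spell checking and Part-of-Speech tagging are omitted because they need external resources. The threshold negative parameter is not modelled, since the abstract does not define it. Treating a zero score as positive is likewise an assumption made only to keep the output binary.

```python
import re

# Hypothetical opinion verbs with illustrative scores (not from the article).
RAW_VERB_SCORES = {"love": 1.0, "support": 0.5, "hate": -1.0, "oppose": -0.5}

# Tiny illustrative stop word list.
STOP_WORDS = {"i", "the", "this", "a", "an", "and", "but", "they", "it", "we"}


def toy_stem(token):
    """Crude suffix stripping; a stand-in for a real stemming algorithm."""
    for suffix in ("ing", "ed", "es", "e", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


# Key the toy dictionary by stem so that lookups match stemmed tokens.
TOY_OVD = {toy_stem(verb): score for verb, score in RAW_VERB_SCORES.items()}


def sentiment_score(text):
    """Lower-case, tokenize, drop stop words, stem, then sum dictionary scores."""
    tokens = re.findall(r"[a-z']+", text.lower())
    content = [t for t in tokens if t not in STOP_WORDS]
    stems = [toy_stem(t) for t in content]
    return sum(TOY_OVD.get(s, 0.0) for s in stems)


def classify(text):
    """Rule-based binary decision; a zero score is assumed to be positive."""
    return "positive" if sentiment_score(text) >= 0 else "negative"


print(classify("I LOVED this policy and supported it"))
print(classify("They hated and opposed the tax"))
```

Stemming the dictionary keys with the same function used on the input keeps inflected forms such as "loved" and "loves" on a single entry, which is the main reason a stemming step precedes score calculation in a pipeline of this kind.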
Notes
Negation words are words that reverse the polarity of a positive or negative opinion word when they precede it within the same sentence.
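The effect of a negation word can be sketched in a few lines of Python. The word lists, the scores, and the function name `score_with_negation` are hypothetical. The rule that a negation flips only the next opinion word in the same sentence is one possible reading of the note, not necessarily the rule used in the article. Contractions such as "don't" are not handled.

```python
import re

# Illustrative word lists; neither is taken from the article.
NEGATION_WORDS = {"not", "no", "never", "cannot"}
OPINION_SCORES = {"like": 1.0, "support": 1.0, "hate": -1.0, "oppose": -1.0}


def score_with_negation(text):
    """Sum opinion scores, flipping the sign of an opinion word that
    follows a negation word within the same sentence."""
    total = 0.0
    for sentence in re.split(r"[.!?]+", text.lower()):
        negate = False  # negation does not carry over sentence boundaries
        for token in re.findall(r"[a-z]+", sentence):
            if token in NEGATION_WORDS:
                negate = True
            elif token in OPINION_SCORES:
                score = OPINION_SCORES[token]
                total += -score if negate else score
                negate = False  # assumed: one negation flips one opinion word
    return total


print(score_with_negation("I do not like this tax"))
print(score_with_negation("I never hate it. I support it."))
```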
Acknowledgements
The authors would like to thank Sant Longowal Institute of Engineering and Technology, Punjab, India, for providing the systems and infrastructure to support this research work. They also thank Dr. Sanjeev Prakash and Smt. Preetpal Kaur Buttar for proofreading.
Ethics declarations
Conflict of interest
The authors declare no conflict of interest.
About this article
Cite this article
Singh, S.K., Sachan, M.K. SentiVerb system: classification of social media text using sentiment analysis. Multimed Tools Appl 78, 32109–32136 (2019). https://doi.org/10.1007/s11042-019-07995-2
DOI: https://doi.org/10.1007/s11042-019-07995-2