Abstract
People use social media channels to express their opinions about products, celebrities, services and the like. Analysing the sentiment of this textual content is a gold mine for marketing experts as well as for research in the humanities, so automatic sentiment analysis is a popular area of applied artificial intelligence. The chief objective of this paper is to investigate automatic sentiment analysis of social media content across various text sources and languages. The comparative findings may give useful insights to artificial intelligence researchers who develop sentiment analyzers for a new textual source. To this end, we describe supervised machine-learning-based sentiment analysis systems and evaluate them comparatively on seven publicly available English and Hungarian databases containing documents taken from Twitter and product review sites. We discuss the differences among these text genres and languages in terms of document-level and target-level sentiment analysis.
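The abstract distinguishes document-level from target-level supervised sentiment analysis but does not name the learning algorithm or features. As a rough illustration only, and not the authors' system, the sketch below assumes a multinomial Naive Bayes classifier over bag-of-words features with add-one smoothing. Target-level classification is approximated by a crude heuristic that keeps only the tokens within a fixed window of the target mention. The names `tokenize`, `target_window` and `NaiveBayesPolarity`, and the toy training sentences, are hypothetical and are not taken from the seven databases.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    # Crude illustrative tokenizer: lowercase, keep words, hashtags and mentions.
    return re.findall(r"[#@]?\w+", text.lower())


def target_window(tokens, target, size=3):
    # Hypothetical target-level features: keep tokens within `size` positions
    # of any occurrence of `target`; fall back to all tokens if it is absent.
    keep = set()
    for i, tok in enumerate(tokens):
        if tok == target:
            keep.update(range(max(0, i - size), min(len(tokens), i + size + 1)))
    return [tokens[i] for i in sorted(keep)] or tokens


class NaiveBayesPolarity:
    """Multinomial Naive Bayes over bag-of-words features, add-one smoothing."""

    def fit(self, docs, labels):
        self.word_counts = defaultdict(Counter)  # label -> token counts
        self.doc_counts = Counter(labels)        # label -> number of documents
        for tokens, label in zip(docs, labels):
            self.word_counts[label].update(tokens)
        self.vocab = {t for c in self.word_counts.values() for t in c}
        return self

    def predict(self, tokens):
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.doc_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)  # log prior
            for tok in tokens:
                if tok in self.vocab:  # tokens unseen in training are ignored
                    score += math.log((counts[tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Invented toy training data, for illustration only.
train = [
    ("I love this phone, great battery", "positive"),
    ("great service and friendly staff", "positive"),
    ("terrible screen, I hate it", "negative"),
    ("awful battery and bad support", "negative"),
]
model = NaiveBayesPolarity().fit(
    [tokenize(text) for text, _ in train], [label for _, label in train]
)

review = tokenize("great battery but awful screen")
print(model.predict(review))                                   # negative (document level)
print(model.predict(target_window(review, "battery", size=1)))  # positive
print(model.predict(target_window(review, "screen", size=1)))   # negative
```

The same review receives one polarity at the document level but different polarities for the targets "battery" and "screen", which is the distinction the abstract draws. A fixed token window is only a stand-in; target-level systems typically rely on richer cues about which words relate to the target.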
Cite this article
Hangya, V., Farkas, R. A comparative empirical study on social media sentiment analysis over various genres and languages. Artif Intell Rev 47, 485–505 (2017). https://doi.org/10.1007/s10462-016-9489-3