
Computational approaches for mining user's opinions on the Web 2.0

Published: 01 November 2014

Abstract

We carry out an empirical analysis to determine characteristics of social media channels. User generated content is "noisy" and contains mistakes, emoticons, etc. We evaluate text preprocessing algorithms regarding user generated content. Discussion of improvements to opinion mining process.

The emerging research area of opinion mining deals with computational methods to find, extract and systematically analyze people's opinions, attitudes and emotions towards certain topics. While it provides interesting market research information, the user generated content on the Web 2.0 presents numerous challenges for systematic analysis, one of them being the differences and unique characteristics of the various social media channels. This article reports on the determination of such particularities and deduces their impact on text preprocessing and opinion mining algorithms. The effectiveness of different algorithms is evaluated in order to determine their applicability to the various social media channels. Our research shows that text preprocessing algorithms are mandatory for mining opinions on the Web 2.0 and that some of these algorithms are sensitive to errors and mistakes contained in the user generated content.


Published In

Information Processing and Management: an International Journal, Volume 50, Issue 6
November 2014
104 pages

Publisher

Pergamon Press, Inc.

United States


Author Tags

  1. Data mining
  2. Noisy text
  3. Opinion mining
  4. Text preprocessing
  5. User generated content

Qualifiers

  • Research-article


Cited By

  • (2023) Prediction of the customers' interests using sentiment analysis in e-commerce data for comparison of Arabic, English, and Turkish languages. Journal of King Saud University - Computer and Information Sciences 35:3 (227-237). doi:10.1016/j.jksuci.2023.02.017. Online publication date: 1-Mar-2023
  • (2022) Systematic literature review of arabic aspect-based sentiment analysis. Journal of King Saud University - Computer and Information Sciences 34:9 (6524-6551). doi:10.1016/j.jksuci.2022.07.001. Online publication date: 1-Oct-2022
  • (2022) Integrating the sentiments of multiple news providers for stock market index movement prediction. Information Sciences: an International Journal 615:C (529-556). doi:10.1016/j.ins.2022.10.029. Online publication date: 1-Nov-2022
  • (2022) A process-centric performance management in a call center. Applied Intelligence 53:3 (3304-3317). doi:10.1007/s10489-022-03740-9. Online publication date: 27-May-2022
  • (2022) Better safe than sorry: a study on older adults’ credibility judgments and spreading of health misinformation. Universal Access in the Information Society 22:3 (957-966). doi:10.1007/s10209-022-00899-3. Online publication date: 4-Aug-2022
  • (2022) How does WeChat’s active engagement with health information contribute to psychological well-being through social capital? Universal Access in the Information Society 21:3 (657-673). doi:10.1007/s10209-021-00795-2. Online publication date: 1-Aug-2022
  • (2021) Community detection in social recommender systems: a survey. Applied Intelligence 51:6 (3975-3995). doi:10.1007/s10489-020-01962-3. Online publication date: 1-Jun-2021
  • (2020) Different platforms for different patients’ needs. International Journal of Human-Computer Studies 137:C. doi:10.1016/j.ijhcs.2019.102386. Online publication date: 1-Jul-2020
  • (2020) Recursive Neural Text Classification Using Discourse Tree Structure for Argumentation Mining and Sentiment Analysis Tasks. Foundations of Intelligent Systems (90-101). doi:10.1007/978-3-030-59491-6_9. Online publication date: 23-Sep-2020
  • (2019) A framework for fake review detection in online consumer electronics retailers. Information Processing and Management: an International Journal 56:4 (1234-1244). doi:10.1016/j.ipm.2019.03.002. Online publication date: 1-Jul-2019
