
An automatic non-English sentiment lexicon builder using unannotated corpus

Published: 01 April 2019

Abstract

Sentiment lexicons are widely available for English, but in many other languages such resources are severely lacking. Current sentiment analysis techniques focus mainly on English, while other languages are neglected for lack of resources. To overcome the challenges of building non-English lexicons, we propose a language-independent method that automatically builds non-English sentiment lexicons from currently available English lexicons and an unannotated corpus in the target language. The method first develops initial seed lexicons by translating three reliable English lexicons, and then automatically recognizes and extracts new polarity words from the unannotated corpus based on these seeds. Experimental results on the test datasets confirm that the developed non-English sentiment lexicon significantly improves non-English sentiment classification compared with other methods and lexicons. The lexicon developed for Arabic outperformed other commonly used methods for building non-English lexicons, achieving an F-measure of 0.74. The approach adopted in this study is language independent and can be applied to other languages as well. This paper also contributes to the understanding of approaches to developing sentiment resources.
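
The abstract does not specify how the translated seed lexicons are merged or how new polarity words are scored in the corpus, so the following is only a minimal sketch of one plausible pipeline, not the authors' implementation. It assumes that the three translated lexicons are merged by majority vote, that candidate words from the unannotated corpus are scored by sentence-level pointwise mutual information (PMI) with the seed words (a Turney-style SO-PMI score, swapped in here for illustration), that documents are classified by summing word polarities, and that results are evaluated with a macro-averaged F-measure. All function names, thresholds, and toy data are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def build_seed_lexicon(translated_lexicons, min_votes=2):
    """Merge translated lexicons (word -> +1/-1) by majority vote.

    A word is kept only if at least `min_votes` lexicons assign it the same
    polarity and that polarity strictly outnumbers the opposite one.
    """
    votes = {}
    for lexicon in translated_lexicons:
        for word, polarity in lexicon.items():
            votes.setdefault(word, Counter())[polarity] += 1
    seed = {}
    for word, counts in votes.items():
        polarity, n = counts.most_common(1)[0]
        if n >= min_votes and n > counts.get(-polarity, 0):
            seed[word] = polarity
    return seed


def expand_lexicon(seed, sentences, threshold=1.0, min_count=2):
    """Add corpus words whose summed PMI with the seeds is strongly signed.

    `sentences` is a list of token lists from the unannotated corpus;
    co-occurrence is counted at the sentence level (hypothetical choice).
    """
    n = len(sentences)
    word_freq = Counter()
    pair_freq = Counter()
    for tokens in sentences:
        vocab = sorted(set(tokens))
        word_freq.update(vocab)
        pair_freq.update(combinations(vocab, 2))  # pairs stored in sorted order

    def pmi(a, b):
        joint = pair_freq[tuple(sorted((a, b)))]
        if joint == 0:
            return 0.0  # treat unseen pairs as uninformative rather than -inf
        return math.log2(joint * n / (word_freq[a] * word_freq[b]))

    expanded = dict(seed)
    for word, freq in word_freq.items():
        if word in seed or freq < min_count:
            continue
        # SO-PMI style score: association with positive minus negative seeds
        score = sum(polarity * pmi(word, s) for s, polarity in seed.items())
        if score >= threshold:
            expanded[word] = 1
        elif score <= -threshold:
            expanded[word] = -1
    return expanded


def classify(tokens, lexicon):
    """Lexicon-based classifier: sign of the summed word polarities (0 = neutral)."""
    score = sum(lexicon.get(t, 0) for t in tokens)
    return 1 if score > 0 else -1 if score < 0 else 0


def f_measure(gold, pred, label):
    """F1 score for a single class label."""
    tp = sum(1 for g, p in zip(gold, pred) if g == label and p == label)
    fp = sum(1 for g, p in zip(gold, pred) if g != label and p == label)
    fn = sum(1 for g, p in zip(gold, pred) if g == label and p != label)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f_measure(gold, pred, labels=(1, -1)):
    """Unweighted mean of per-class F1 over the positive and negative classes."""
    return sum(f_measure(gold, pred, label) for label in labels) / len(labels)


if __name__ == "__main__":
    # Toy stand-ins for three translated English lexicons (hypothetical data)
    translated = [{"good": 1, "bad": -1, "nice": 1},
                  {"good": 1, "bad": -1, "nice": -1},
                  {"good": 1, "awful": -1}]
    corpus = [["good", "great", "food"], ["good", "great", "service"],
              ["bad", "slow", "service"], ["bad", "slow", "food"]]
    lexicon = expand_lexicon(build_seed_lexicon(translated), corpus)
    print(lexicon)
```

In this toy run, the majority vote keeps only the words on which at least two translated lexicons agree, and a corpus word that co-occurs mainly with positive (or negative) seeds enters the expanded lexicon with that polarity. The `threshold` parameter trades coverage for precision: a higher value admits fewer, but more reliable, new polarity words.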

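Cited By

  • (2023) Cyberbullying detection and machine learning: a systematic literature review. Artificial Intelligence Review 56(Suppl 1):1375-1416. doi:10.1007/s10462-023-10553-w. Online publication date: 1 Oct 2023
  • (2021) Research and Application of English Corpus Digitization Using Intelligent Numerical Method and Big Data Technology. 2021 3rd International Conference on Artificial Intelligence and Advanced Manufacture, pp 2237-2240. doi:10.1145/3495018.3501082. Online publication date: 23 Oct 2021
  • (2020) Sentiment Lexicon for Chinese College Students to Build and Apply. Proceedings of the 4th International Conference on Computer Science and Application Engineering, pp 1-7. doi:10.1145/3424978.3425088. Online publication date: 20 Oct 2020
  • (2020) A Link Prediction Approach for Accurately Mapping a Large-scale Arabic Lexical Resource to English WordNet. ACM Transactions on Asian and Low-Resource Language Information Processing 19(6):1-38. doi:10.1145/3404854. Online publication date: 13 Oct 2020
  • (2020) An integrated semi-automated framework for domain-based polarity words extraction from an unannotated non-English corpus. The Journal of Supercomputing 76(12):9772-9799. doi:10.1007/s11227-020-03222-0. Online publication date: 3 Mar 2020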

      Published In

      The Journal of Supercomputing, Volume 75, Issue 4
      April 2019
      542 pages

      Publisher

      Kluwer Academic Publishers

      United States

      Author Tags

      1. Building resources
      2. Natural language processing
      3. Sentiment analysis
      4. Sentiment lexicon
      5. Text analysis

      Qualifiers

      • Article
