Abstract
Social media platforms give users an opportunity to express their views and emotions on any topic. Researchers have successfully used content posted on these platforms to capture people's emotions about a given event or topic. During the COVID-19 pandemic, Indians used Twitter extensively owing to an increased need for virtual interaction. In this work, we analyse tweets posted in India during the COVID-19 outbreak to understand how individuals in India reacted to the pandemic. We identified the timelines of three major COVID-19 waves from May 2020 to March 2022 and retrieved 13,818 tweets from the COV19Tweets dataset, available at IEEE DataPort, for the duration of each of the three waves. Lexicon-based sentiment analysis of the tweets indicated a positive mindset among the Indian population during the pandemic. Further, visual analysis through word clouds revealed that a few words were common to all waves, whereas some words were wave-specific. We observed that individual words used in tweets cannot always be associated with positive or negative emotions, as the context, or the set of words taken together, may be a better indicator. Hence, we followed a machine learning approach to identify sentiments, extracting BoW (Bag-of-Words) and TF–IDF (Term Frequency–Inverse Document Frequency) features from the tweet text. Comparative performance analysis of four classification algorithms, namely Decision Tree (DT), Logistic Regression (LR), Naive Bayes (NB), and Support Vector Machines (SVM), and two ensemble methods, AdaBoost and Random Forest, revealed that LR applied to the BoW feature set was the best performer. Finally, we performed Latent Dirichlet Allocation (LDA) based topic modeling on the COVID-19 tweets to identify topics of discussion in each of the waves.
The topics evolved from informative messages related to the pandemic during the first wave, to wider discussions of the impact of COVID-19 on the Nifty stock index, tourism, etc. in the second wave, and to the Omicron variant and the availability of beds and ventilators in the third wave. This study can be of great interest to governments, which may undertake similar studies to understand human behavior when natural calamities or pandemics occur at the local or global level. The automated capture of public sentiment and identification of topics may expedite the execution of preventive measures by governments and help address citizens' concerns almost instantly.
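As an illustration of the two feature representations named in the abstract, the following is a minimal sketch of how BoW and TF–IDF vectors could be computed from tweet text. It is not the authors' implementation: the whitespace tokenizer, the unsmoothed weighting tf × log(N / df), the helper names `tokenize`, `bag_of_words` and `tf_idf`, and the three example tweets are all assumptions made for illustration. Library implementations such as scikit-learn's vectorizers use different tokenization and a smoothed IDF by default, so their numbers will differ.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (real tweets need heavier cleaning)."""
    return text.lower().split()


def bag_of_words(docs):
    """Return (vocabulary, count vectors) for a list of documents."""
    tokenized = [tokenize(d) for d in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        vectors.append([counts[term] for term in vocab])
    return vocab, vectors


def tf_idf(docs):
    """Return (vocabulary, TF-IDF vectors) using tf * log(N / df), unsmoothed."""
    vocab, counts = bag_of_words(docs)
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = [sum(1 for row in counts if row[j] > 0) for j in range(len(vocab))]
    vectors = []
    for row in counts:
        total = sum(row) or 1  # guard against empty documents
        vectors.append([
            (row[j] / total) * math.log(n_docs / df[j])
            for j in range(len(vocab))
        ])
    return vocab, vectors


# Hypothetical example tweets, not taken from the COV19Tweets dataset.
tweets = [
    "stay home stay safe",
    "vaccine drive starts today",
    "stay safe get vaccine",
]
vocab, bow = bag_of_words(tweets)
_, tfidf = tf_idf(tweets)
```

In this sketch BoW keeps raw term counts, while TF–IDF down-weights terms that occur in many tweets, so a word appearing in every tweet would receive a weight of zero. Either matrix could then be passed to a classifier such as logistic regression.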
Notes
These were the keywords used to identify the tweets and construct the dataset.
Acknowledgements
This research follows from the project work done as part of the Summer Internship Programme (SIP) 2021–22 organized by the Centre for Research, Maitreyi College, University of Delhi.
Funding
This research received no external funding.
Ethics declarations
Conflict of interest
The authors have no conflicts of interest to declare that are relevant to the content of this article.
About this article
Cite this article
Bhardwaj, M., Mishra, P., Badhani, S. et al. Sentiment analysis and topic modeling of COVID-19 tweets of India. Int J Syst Assur Eng Manag 15, 1756–1776 (2024). https://doi.org/10.1007/s13198-023-02082-0
DOI: https://doi.org/10.1007/s13198-023-02082-0