TBCOV: Two Billion Multilingual COVID-19 Tweets with Sentiment, Entity, Geo, and Gender Labels
Figure 1. Weekly distribution of 2,014,792,896 tweets from 1 February 2020 to 31 March 2021.
Figure 2. Distribution of languages with more than 10K tweets. The y-axis indicates the number of tweets in log scale.
Figure 3. Country and city distributions across months sorted by overall tweet proportions.
Figure 4. Geotagged tweets worldwide normalized by country’s population (per 100,000 persons). Tweets geotagged using user location, user profile description, and GPS-coordinates are included.
Figure 5. Weekly distribution representing public sentiment based on worldwide tweets in all languages.
Figure 6. Worldwide sentiment based on normalized classifier scores of the representative sentiment in each country. Numbers on countries are z-scores computed using the representative sentiment tweets normalized by total tweets from all countries.
Figure 7. Weekly distribution of sentiment labels for the top-six countries.
Figure 8. Monthly distributions of positive and negative sentiment tweets for the top-six countries.
Figure 9. Sentiment across US counties. Tweets geotagged using user location, user profile description, and GPS-coordinates are included after normalizing by the total number of tweets from each county.
Figure 10. Weekly distribution of sentiment labels of tweets in top-five languages excluding English (i.e., Spanish, Portuguese, French, Indonesian, and Arabic).
Figure 11. Percentage of female users for countries meeting representative sampling criteria (confidence interval = 95%; margin of error ≤ 1%). Gray color indicates the countries excluded due to under representation (N = 85).
Figure 12. Weekly trends of important issues related to personal and social lives of users linked to COVID-19.
Figure 13. Global digital divide estimated through the type of device used for tweeting. Representative device type penetration (percentage) is shown on top of each country.
Figure 14. The evolution of natural cities in the mainland US during the pandemic. Note that the red patches correspond to the natural cities while the gray lines indicate the triangulated irregular network of geo-referenced tweets.
Abstract
1. Introduction
2. Methods
2.1. Data Collection and Description
2.2. Named-Entity Recognition
2.3. Geographic Information
Algorithm 1: Pseudo-code for processing toponyms from text.
Algorithm 2: Pseudo-code for geotagging place object.
Algorithm 3: Pseudo-code for the overall processing of all attributes.
2.4. Sentiment Classification
2.5. User Type and Gender Classification
3. Results
3.1. Named-Entities Results
3.2. Geotagging Results
3.3. Sentiment Analysis Results
3.4. User Type Identification and Gender Classification Results
4. Analysis and Applications
4.1. Trend Analysis
4.2. Global Digital Divide
4.3. Evolution of Natural Cities in the US
4.4. Potential Applications
- Disease forecasting and surveillance enable the early detection and prevention of an outbreak. Moreover, early warning systems alert authorities and healthcare providers so that they can prepare for and respond to outbreaks in a timely fashion. TBCOV’s broad topical coverage, particularly of self-reported symptoms and deaths, can serve as a strong indicator for such early warning systems;
- Identification of fake information is essential to counter its negative influence on societies, especially during health emergencies. Tweets’ temporal information, re-sharing and retweeting patterns, and the tone of the textual content can potentially lead to the identification of rumors and fake information. With more than two billion tweets, the TBCOV dataset is a goldmine for detecting conspiracies, rumors, and misinformation circulated on social media (e.g., the claim that drinking bleach can cure COVID-19). More importantly, the data can be used to develop robust models for fake news and rumor detection;
- Understanding communities’ knowledge gaps during emergency situations, such as the COVID-19 pandemic, is crucial for authorities to deal with the surge of uncertainties. TBCOV’s comprehensive geographic and temporal coverage can be analyzed to understand public questions and queries;
- Identification of shortages of important items, such as Personal Protective Equipment (PPE), oxygen, and face masks, becomes a top priority for governments during health emergencies. Building models to identify pertinent social media reports could help authorities plan for and prevent the devastating consequences of shortages;
- Understanding public sentiment and reactions to government policies, such as lockdowns and closures of businesses, as well as to slow responses or slow vaccination rates, can be performed using social media data, such as TBCOV;
- Rapid needs assessment informs humanitarian organizations’ and governments’ response operations and determines relief priorities for an affected population during emergencies, such as the COVID-19 pandemic. Our trend analysis results highlighted the effectiveness of TBCOV for mining the priority needs of the population in terms of food, cash, medicines, and more;
- Identification of self-reported symptoms, such as fever, cough, and loss of taste, through social media data could indicate a likely future hot-spot when reports spike in a geographical area. TBCOV tweets geotagged with fine-grained locations, such as counties and cities, can be useful to build models for symptom detection and hot-spot prediction;
- Finding correlations between variables is an important way to measure their relationship. The TBCOV dataset can be used to perform various types of correlation analysis to detect patterns and generate hypotheses. These analyses include, but are not limited to, finding correlations between COVID-19 cases and self-reported symptoms on Twitter, or between COVID-19 cases and death reports. Correlations between COVID-19 cases and negative sentiment in a geographical location, between the surge of messages showing anxiety and the unemployment rate, or between daily negative tweets and the rate of food insufficiency in an area can open new avenues for interesting analyses.
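The correlation analysis described in the last item can be illustrated with a minimal sketch. It assumes that daily counts of symptom-reporting tweets and of reported cases have already been aggregated for a single location. The function names `pearson` and `lagged_correlations` and the toy numbers are hypothetical and are not part of TBCOV or its tooling. The sketch computes the Pearson coefficient at several time lags to check whether tweet counts tend to precede case counts.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


def lagged_correlations(tweets, cases, max_lag):
    """Correlate tweets[t] with cases[t + lag] for lag = 0..max_lag.

    Both series are assumed to cover the same days, in the same order.
    """
    result = {}
    for lag in range(max_lag + 1):
        n = len(tweets) - lag
        result[lag] = pearson(tweets[:n], cases[lag:lag + n])
    return result


# Invented toy series (NOT taken from TBCOV): the case counts
# repeat the tweet counts two days later.
symptom_tweets = [5, 8, 13, 21, 34, 30, 22, 15, 9, 6]
reported_cases = [1, 1, 5, 8, 13, 21, 34, 30, 22, 15]

correlations = lagged_correlations(symptom_tweets, reported_cases, max_lag=3)
best_lag = max(correlations, key=correlations.get)
print(f"strongest correlation at a lag of {best_lag} day(s)")
```

On real data, a strong correlation at a positive lag would only suggest a hypothesis for further examination, in line with the hypothesis-generating use described above.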
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
Appendix A. Keywords
Japan Coronavirus, Turkey Coronavirus, #coronavid19, #新冠肺炎, #socialdistancing Connecticut, Belgium Coronavirus, withings.com kg, #earth, Haiti COVID-19, #COVID19, wuhan virus, #socialdistancing Washington, Hungary Coronavirus, Togo COVID-19, Covid19DE, Botswana COVID-19, #Coronavirustexas, #coronavirusnobrasil, #COVIDPakistan, Mexico COVID-19, coronavirus china, #chinavirus, Corona Ausbruch, Poland COVID-19, #pandemic, coronavirus outbreak, mehl, korona, #coronavirusoutbreak, Kuwait Coronavirus, #COVD, Namibia Coronavirus, #socialdistancing Indiana, #coronavirusuk, Saudi Arabia COVID-19, #NouveauCoronavirus, Vatican City Coronavirus, Cape Verde Coronavirus, Niger Coronavirus, #socialdistancing Alabama, United Arab Emirates Coronavirus, Finland COVID-19, Roma, pandemic, CoronaVirus Japan, corona vairus, #covid19uk, #China, Bangladesh COVID-19, #Lockdown, United Arab Emirates COVID-19, Austria Coronavirus, Uganda COVID-19, New Zealand Coronavirus, local food, Covid19 US, Lockdown Switzerland, coronavirus bio-weapon, #Koronavirus, #CoronaSchlager, food scarcity, Tanzania COVID-19, coronavirusupdate, #facemask, #LockdownNow, meat shortage, #socialdistancing Kansas, #socialdistancing Montana, #socialdistancing Wyoming, coronga virus, Bosnia Herzegovina Coronavirus, Suffolk Hardship, queue, recesión económica, Cameroon Coronavirus, #Coronavírus, Bahamas Coronavirus, Sierra Leone Coronavirus, Lithuania COVID-19, Algeria Coronavirus, CoronaSymptoms, #socialdistancing, Afghanistan Coronavirus, Corona, Oman Coronavirus, San Marino COVID-19, Sierra Leone COVID-19, كورونا_الأردن#, Suffolk hopeless, Kazakhstan COVID-19, Denmark Coronavirus, Kiribati Coronavirus, #veyekow, Jordan COVID-19, mask, #oustduterte, Morocco Coronavirus, virus, #chinacoronavirus, statistics, #YoMeQuedoEnCasa, Ukraine COVID-19, Torino, Chile Coronavirus, #coronavirusrd, #coronavirusupdates, Hungary COVID-19, Suffolk Corona Depressed, #vizagcovid19 #covid19vizag #indiavizagcovid19 #covid19vskp, 
#caronavirus, Rwanda COVID-19, Uzbekistan Coronavirus, #caronavirususa, Loss of Smell, علاج_كورونا#, Pakistan Coronavirus, #vizag, Slovakia COVID-19, Netherlands COVID-19, #coronaapocolypse, #新冠病毒, Burma COVID-19, Benin COVID-19, #coronaviruscalifornia, #socialdistancing Delaware, Napoli, Vanuatu COVID-19, abdominal pain, mass testing, #CoronavirusOutbreak, Belize COVID-19, breathing issues, corona virus, Guatemala COVID-19, #wuhanvirus, security, Suffolk lockdown, virus corona, #coronaviruses, Outdoor Masks, Cuarentena Colombia, #COVID19, #COVID_19uk, Thailand COVID-19, Mali COVID-19, covid-19 doctors, Norway Coronavirus, Coronavirus Vaccine, Finland Coronavirus, #FlattenTheCurve, Latvia Coronavirus, #coronavirusbrasil, eggs shortage, #socialdistancing Colorado, Paraguay Coronavirus, #coronavirusmumbai, Jordan Coronavirus, #socialdistancing North Carolina, #Pandemic, Ivory Coast Coronavirus, curfew news, Mauritius COVID-19, #masks4all, #Swiss, #socialdistancing Maryland, South Sudan Coronavirus, #Outbreak, #socialdistancing us, Uganda Coronavirus, #codvid19, #covid-19 #covid19, Djibouti COVID-19, #coronaviruscolombia, China Coronavirus, Philippines COVID-19, coronaviridae, Argentina Coronavirus, #CoronaUpdate, #Piacenza, coronavirus transmission, Corona Riverhead NY, Covid-19 Suffolk NY, respiradores Colombia, فيروس كورونا, #fakenewscovid19, Panama COVID-19, China COVID-19, Corona East Hampton NY, Indonesia COVID-19, Denmark COVID-19, #socialdistancing Iowa, كورونا#, Covid19Deutschland, #coronavirusmexico, Austria COVID-19, Armenia Coronavirus, Kenya Coronavirus, #코로나, Coronavirus, #Coronavirus, France COVID-19, #socialdistancing Kentucky, فيروس_كورونا_المستجد#, Cameroon COVID-19, Guinea COVID-19, 新冠病毒, Kyrgyzstan Coronavirus, #coronavirusargentina, Economic recession Switzerland, Tonga COVID-19, #shelterinplace, #socialdistancing Hawaii, Bergamo, #caronavirusoutbreak #Quarantined, Mauritius Coronavirus, Zambia COVID-19, bread shortage, Montenegro Coronavirus, 
Cyprus Coronavirus, CORONA VIRUS, covid-19 usa, #safety, COVID-19-Pandemie, #CoronaVirusitaly, Saudi Arabia coronavirus update, CSA, #2019nCoV, Peru COVID-19, #supermercato #quarantena, Serbia Coronavirus, Sweden Coronavirus, sore throat, Italy COVID-19, #socialdistancing Tennessee, #coronaviruspandemic, Barbados Coronavirus, #coronaVirus, CoronaVirusInNigeria, Somalia Coronavirus, #kamitidaktakutviruscorona, #socialdistancing Wisconsin, #2019ncov, ayuda gobierno, Corona virus كورونا, 加油武汉, Djibouti Coronavirus, 武汉肺炎, #CoronavirusFR, Social distance, Saudi Arabia Coronavirus, #IStayHome, Guinea Coronavirus, Australia COVID-19, #mask4all, Slovenia Coronavirus, Brunei COVID-19, Nicaragua COVID-19, Sudan COVID-19, Korea South Coronavirus, #covid19Canada, Eritrea COVID-19, #coronavirusIndonesia, Bologna, Kosovo COVID-19, #covid-19, Coronavirus Switzerland, Coronavirus Geneva, covid recovered, Kyrgyzstan COVID-19, #CoronavirusEnColombia, #Wuhan, #caronavirusindia, Corona Brookhaven NY, novelcoronavirus, #corona, Turkey COVID-19, Mauritania Coronavirus, #2019nCov, Bulgaria COVID-19, Oman COVID-19, Bulgaria Coronavirus, #coronaviruspuertorico, #coronavirus, #socialdistancing California, #coronavirusnewyork, coronavirus weapon, Cuba COVID-19, Suffolk covid, #coronaflu, #COVIDpain, #Covid, #socialdistancing Rhode Island, إشاعات_كورونا#, #covid19vizag #vizagcovid19, #covid19Indonesia, Outdoor, Switzerland COVID-19, Slovenia COVID-19, #socialdistancing Minnesota, 2019nCoV, Kosovo Coronavirus, covidiot, #marchapelocorona, dry beans shortage, 武漢肺炎, #socialdistancing Mississippi, Bahrain COVID-19, Serbia COVID-19, mascherina #Covid, United States COVID-19, food supply chain, #covid, #covid19italia, CoronaVirus Iran, Emergency food supply, Togo Coronavirus, Latvia COVID-19, muscle pain, brot, Ecuador Coronavirus, Covid19, Laos COVID-19, diarrhoea, #ncoV2019, Swaziland COVID-19, #COVID, Colombia Coronavirus, Online ordering, Suffolk Pandemic, People, UAE Coronavirus, Iraq 
COVID-19, Palau COVID-19, Korea South COVID-19, Coronavírus brasil, #coronavirususa, East Timor Coronavirus, #Corona virus, #COVID-19, Bolivia Coronavirus, COVID -19., Corona Southold NY, #コロナ,Benin Coronavirus, #COVID–19, test kits, Qatar COVID-19, Congo COVID-19, Comoros COVID-19, Coronavirus-Pandemie, #ForcaCoronaVirus, #socialdistancing New Mexico, #武汉加油, Mask, Tajikistan Coronavirus, maske, #coronaviruskerala, #Covid-19 United States, Myanmar, #myfitnesspal, Piacenza, #covid haiti, Libya COVID-19, supplies shortage, #新型コロナウイルス, Firenze, 코로나바이러스, Philippines Coronavirus, Grenada COVID-19, Israel Coronavirus, #covid-19 brasil, Cuba Coronavirus, Turkmenistan Coronavirus, #MASKS, #PánicoPorCoranovirus, Germany Coronavirus, #Ncov, Dominican Republic Coronavirus, Norway COVID-19, South Hampton NY, Syria COVID-19, #CoronaVirusSeattle, UK Coronavirus, flour shortage, Tunisia Coronavirus, Nicaragua Coronavirus, suffolk sick, Samoa COVID-19, Italia, #iorestoacasa, #coronavirusdelhi, Papua New Guinea Coronavirus, #ohiocoronavirus, COVID19NIGERIA, #武汉疫情, #coronafest, #Covid19Switzerland, Bhutan Coronavirus, Somalia COVID-19, #Sinophobia, #Covid_19india, #Corvid19virus, Luxembourg COVID-19, #socialdistancing Missouri, Malaysia COVID-19, #socialdistancing Illinois, Chad Coronavirus, #2019_ncov, #socialdistancing Georgia, Cutremur, Mongolia Coronavirus, Sudan Coronavirus, covid-19 healthcare, N95, #coronaviruschile, Madagascar Coronavirus, Syria Coronavirus, Solomon Islands Coronavirus,كورونا_قطر#, Spain COVID-19, Tonga Coronavirus, #DoingMyPartCO, Suffolk unemployment, quédate en casa Colombia, Liechtenstein COVID-19, Nauru COVID-19, #NeuerCoronavirus, #caronavirusoutbreak, #socialdistancing Arkansas, Ethiopia Coronavirus, Guatemala Coronavirus, Pakistan COVID-19, Dominica COVID-19, CORONA, Treatment, #CoronaLockdown, Coronavirus usa, walk, #SARSCoV2, Suffolk loss, corvid-19, Portugal Coronavirus, #coronacure, Chile COVID-19, COVID19 USA, Sweden COVID-19, France 
Coronavirus, #Foodbank, #kowona, Botswana Coronavirus, extension, Lithuania Coronavirus, Albania Coronavirus, Burkina Coronavirus, #WuhanCoronavirus, Ecuador COVID-19, Tajikistan COVID-19, Lebanon Coronavirus, Cambodia Coronavirus, #ncov19, CoronaVirus Korean, Seychelles Coronavirus, Honduras Coronavirus, Nariño Covid19, #socialdistancing Virginia, safety,
Brazil COVID-19, Micronesia Coronavirus, Coronavirus crisis, #socialdistancing Nevada, Mongolia COVID-19, Malta Coronavirus, Estonia Coronavirus, #Briefing_COVID19, Burundi Coronavirus, Canada COVID-19, Ghana Coronavirus, Iceland Coronavirus, #PhysicalDistancing, #emergency, كورونا_مصر#, Peru Coronavirus, Mexico Coronavirus, photo, Equatorial Guinea Coronavirus, #CoronaVirusCA, Estonia COVID-19, coronavírus brasil, Gabon COVID-19, Canada Coronavirus, #coronavirusperu, Bangladesh Coronavirus, Belarus Coronavirus, Suriname Coronavirus, Iran Coronavirus, #coronavirusinindia, #2019_nocv, Namibia COVID-19, Corona, #PutusRantaiCovid19, 武汉加油, image, Armenia COVID-19, Liberia COVID-19, Maldives COVID-19, Taiwan COVID-19, Nepal Coronavirus, Bhutan COVID-19, Ethiopia COVID-19, Jamaica Coronavirus, #dontgoviral, فيروس_كورونا#, Coronavirus US, Andorra COVID-19, Poland Coronavirus, Liberia Coronavirus, Tunisia COVID-19, Suffolk worry, #virus, Georgia Coronavirus, #武漢肺炎, nCov2019, كورونا_الكويت#, Central African Rep Coronavirus, Dominica Coronavirus, food shortage, Fiji COVID-19, Belarus COVID-19, Palau Coronavirus, #covid19france, government, Singapore COVID-19, #Corona, Papua New Guinea COVID-19, #Coronavirusnyc, carona virus, #mascherina, 2 week food supply, Bahamas COVID-19, Libya Coronavirus, Ireland Coronavirus, shopping, Thailand Coronavirus, Tuvalu Coronavirus, corona, #Coronavirusireland, Bahrain Coronavirus, coronavirus conspiracy, 2019-nCoV, Venezuela COVID-19, Burundi COVID-19, #socialdistancing Pennsylvania, Sri Lanka COVID-19, coronavirus new york, lombardia, #코로나바이러스, #CDC, Guinea-Bissau COVID-19, coronavirus pandemic, كورونا_لبنان#, COVID, El Salvador Coronavirus, coronavirus wuhan, #CoronaAlert, #Epidemic, Czech Republic COVID-19, #coronavirusmadrid, #covid19, Colombia COVID-19, NeuerCoronavirus, #QuarantineAndChill, #Coronapanik, South Africa COVID-19, Romania Coronavirus, Afghanistan COVID-19, corongavirus, covid-19, Grenada Coronavirus, Liechtenstein 
Coronavirus, test kit, Bosnia Herzegovina COVID-19, quarantena, Angola COVID-19, Greece COVID-19, Lesotho Coronavirus, covid19, corona virus news, groceries, muertes Colombia, #coronavirusu, #socialdistancing West Virginia, #DuringMy14DayQuarantine, park, activities, Lesotho COVID-19, Gambia COVID-19, Yemen Coronavirus, cutremur, covid, #socialdistancing Arizona, Uruguay COVID-19, mascarilla, #socialdistancing South Dakota, Micronesia COVID-19, Brescia, East Timor COVID-19, Masks4all, #MyPandemicSurvivalPlan, Croatia COVID-19, Turkmenistan COVID-19, Covid-19 US, Vanuatu Coronavirus, #socialdistancing North Dakota, Moldova COVID-19, Samoa Coronavirus, magnitude, nCoV, Nigeria COVID-19, recuperados covid19 Colombia, supermercato, #coronavirusbrazil, Monaco COVID-19, Mozambique Coronavirus, Mozambique COVID-19, #socialdistancing New Jersey, Measures, Malaysia Coronavirus, potatoes shortage, Niger COVID-19, Greece Coronavirus, Croatia Coronavirus, San Marino Coronavirus, Corona Suffolk NY, Haiti Coronavirus, #coronapocalypse, Ukraine Coronavirus, food supply, Guyana COVID-19, Senegal Coronavirus, Costa Rica COVID-19, CoV, Australia Coronavirus, #covid_19, #Coronaferien, #FarmersMarket, Nauru Coronavirus, Lebanon COVID-19, Vietnam Coronavirus, ecq, Spain Coronavirus, Cambodia COVID-19, #socialdistancing Louisiana, United Kingdom Coronavirus, Vietnam COVID-19, Kenya COVID-19, Macedonia Coronavirus, optimista, #coronavirustelangana, Zambia Coronavirus, withings.com st, #NovelCorona, Cape Verde COVID-19, Suffolk corona, #corona haiti, Macedonia COVID-19, Honduras COVID-19, #Covid_19Colombia, Burkina COVID-19, #PresidentCuomo, Ireland COVID-19, #firenze, #africacoronacure, #wuhan, #COVID19NIGERIA, Covid19_DE, #疫情, Covid-19 nurses, #Coronavirusmexico, Equatorial Guinea COVID-19, #CoronavirusSwitzerland, Swaziland Coronavirus, St Lucia Coronavirus, Egypt COVID-19, Paraguay COVID-19, Belgium COVID-19, #coronapandemic, Guyana Coronavirus, ncov-19, Nariño Coronavirus, Central 
African Rep COVID-19, masque, #coronavirusespana, #fakenewscorona, #covid2019pt, Iceland COVID-19, Andorra Coronavirus, Luxembourg Coronavirus, Nouveau coronavirus, Rwanda Coronavirus, Madagascar COVID-19, Masks, Saudi Arabia Covid-19 Update, #yellowalert #chinavirus, ncov, #socialdistancing Texas, emergency, Kiribati COVID-19, Korea North Coronavirus, cough, covid 19, #Sars-cov-2, #socialdistancing Idaho, #coronavirusuruguay, #武汉肺炎, picture, #socialdistancing Michigan, Covid19 Switzerland, Iraq Coronavirus, #socialdistancing Florida, Eritrea Coronavirus, breathing difficulties, Venezuela Coronavirus, #coronavirusafrica, Smithtown NY, #coronavirus, Fiji Coronavirus, Covid-19 brasil, rice shortage, Slovakia Coronavirus, UK COVID-19, كورونا_إيران#, #Wuhancoronavirus, #Anakapalli #covid19, Azerbaijan COVID-19, seism, Germany COVID-19, #Wuhanlockdown, Chad COVID-19, Italy Coronavirus, Nigeria Coronavirus, Russian Federation Coronavirus, Portugal COVID-19, St Lucia COVID-19, #socialdistancing Vermont, #coronaviruscure, #CoronavirusPandemic, كورونا_البحرين#, CoV2019 WHO 2019CoV coronovirus PHAC Canada Toronto, South Africa Coronavirus, 疫情, fatigue, Uruguay Coronavirus, Zimbabwe Coronavirus, Dominican Republic COVID-19, earthquake, Azerbaijan Coronavirus, Georgia COVID-19, India Coronavirus, Montenegro COVID-19, #CoronaVirusInNigeria, #covid19espana, India COVID-19, mala gestión, #socialdistancing Oregon, #StayHomeSaveLives, Long Island Corona, bohnen, lock down, outside, Malawi COVID-19, #Coronavirus, #socialdistancing South Carolina, #covid-19uk, Mauritania COVID-19, #N95, corona recovered, Costa Rica Coronavirus, novel coronavirus, testkit. 
#StayAtHome, Albania COVID-19, Marshall Islands COVID-19, coronavirus, fever, #coronavirustruth, Egypt Coronavirus, #socialdistancing Utah, Malta COVID-19, كورونا_الجديد#, Virus Corona, #COVID19Pandemic, #WuhanVirus, #coronaoutbreak, Comoros Coronavirus, #Chile, Cyprus COVID-19, Russian Federation COVID-19, Jamaica COVID-19, #coronavirusUP, Sri Lanka Coronavirus, Kuwait COVID-19, Netherlands Coronavirus, safe, CoronaTreatment, #coronavirusoutbreak, Belize Coronavirus, COVID-19, #socialdistancing Oklahoma, covid19 recovered, #CoronaVirusDE, Moldova Coronavirus, CoronavirusFR, nCov, #socialdistancing Massachusetts, Korea North COVID-19, 코로나, #COVID19PT, #shortage, Switzerland Coronavirus, Tuvalu COVID-19, outdoor, #covd19, Coronavirus Colombia, withings.com lb, #加油武汉, #staysafe, New Zealand COVID-19, UCI disponibles, disaster, El Salvador COVID-19, Seychelles COVID-19, コロナ, Malawi Coronavirus, Vatican City COVID-19, Angola Coronavirus, Kazakhstan Coronavirus, #infocoronavirus, #nCoV, Czech Republic Coronavirus, #covid_19uk, #socialdistancing New York, 新型冠状病毒, #新型冠状病毒, #Distancing, Marshall Islands Coronavirus, #CoronaVirusCanada, Argentina COVID-19, Mali Coronavirus, coronavirus news, #Covid19, 新冠肺炎, #Kungflu, #codvid_19, masks, nCoV2019, Covid_19, Wuhan virus, images, testkits, كورونا_السعودية#, Nepal COVID-19, Corona Islip NY, vegetables shortage, #socialdistancing Ohio, #coronavirus #covid-19, Tanzania Coronavirus, Brazil Coronavirus, COVID-19 USA, #coronavirusnyc, #socialdistancing Maine, #コロナウイルス, social distancing, #sentom, #Covid-19, UAE COVID-19, corona virus outbreak, Taiwan Coronavirus, Monaco Coronavirus, Israel COVID-19, #socialdistancing New Hampshire, #coronaday, Indonesia Coronavirus, Symptoms, loss of smell, #CoronaVirusIreland, Panama Coronavirus, Barbados COVID-19, #covid19ireland, Burma Coronavirus, Covid19 Suffolk NY, Milano, Gabon Coronavirus, #security, #africa, Iran COVID-19, #covid19india, Ivory Coast COVID-19, #Covid19 united states, #masks, 
Senegal COVID-19, South Sudan COVID-19, coronavirus epidemic, #Connecting, Gambia Coronavirus, mascherina, Corona Babylon NY, social distance, United Kingdom COVID-19, Singapore Coronavirus, restrictions, #Covid-19brasil, tapabocas Colombia, #CoronavirusAustralia, Suriname COVID-19, Algeria COVID-19, #coronavirusecuador, #conronaviruspandemic, Yemen COVID-19, #Covid-19 US, #coronaviruscure, cereals shortage, Morocco COVID-19, #socialdistancing Nebraska, ncov19, Bolivia COVID-19, Japan COVID-19, Solomon Islands COVID-19, Corona Huntington NY, United States Coronavirus, #socialdistancing Alaska, Guinea-Bissau Coronavirus, #ncov2019, duterte, novel corona virus, Romania COVID-19, #quarantine, #covid2019, #socialdistancing usa, #Socialdistancing, Brunei Coronavirus, Qatar Coronavirus, #milano, #코로나19, Maldives Coronavirus, #coronavirusmaharashtra, Congo Coronavirus, Uzbekistan COVID-19, كورونا_العراق#, Ghana COVID-19, Laos Coronavirus, Zimbabwe COVID-19, #coronadeutschland
References
- Castillo, C. Big Crisis Data; Cambridge University Press: Cambridge, UK, 2016. [Google Scholar]
- Fraustino, J.D.; Liu, B.F.; Jin, Y. Social media use during disasters. Soc. Media Crisis Commun. 2017, 283, 32–47. [Google Scholar]
- Starbird, K.; Palen, L.; Hughes, A.L.; Vieweg, S. Chatter on the red: What hazards threat reveals about the social life of microblogged information. In ACM Conference on Computer Supported Cooperative Work; Association for Computing Machinery: New York, NY, USA, 2010; pp. 241–250. [Google Scholar]
- Sinnenberg, L.; Buttenheim, A.M.; Padrez, K.; Mancheno, C.; Ungar, L.; Merchant, R.M. Twitter as a tool for health research: A systematic review. Am. J. Public Health 2017, 107, e1–e8. [Google Scholar] [CrossRef] [PubMed]
- Zadeh, A.H.; Zolbanin, H.M.; Sharda, R.; Delen, D. Social media for nowcasting flu activity: Spatio-temporal big data analysis. Inf. Syst. Front. 2019, 21, 743–760. [Google Scholar] [CrossRef]
- Broniatowski, D.A.; Paul, M.J.; Dredze, M. National and local influenza surveillance through Twitter: An analysis of the 2012–2013 influenza epidemic. PLoS ONE 2013, 8, e83672. [Google Scholar] [CrossRef] [Green Version]
- Lamsal, R. Corona Virus (COVID-19) Geolocation-based Sentiment Data. IEEE Dataport 2020. [Google Scholar] [CrossRef]
- Lamsal, R. Corona Virus (COVID-19) Tweets Dataset. IEEE Dataport 2020. [Google Scholar] [CrossRef]
- Alqurashi, S.; Alhindi, A.; Alanazi, E. Large Arabic Twitter Dataset on COVID-19. arXiv 2020, arXiv:2004.04315. [Google Scholar]
- Haouari, F.; Hasanain, M.; Suwaileh, R.; Elsayed, T. ArCOV-19: The First Arabic COVID-19 Twitter Dataset with Propagation Networks. arXiv 2020, arXiv:2004.05861. [Google Scholar]
- Kang, Y.; Gao, S.; Liang, Y.; Li, M.; Rao, J.; Kruse, J. Multiscale dynamic human mobility flow dataset in the US during the COVID-19 epidemic. Sci. Data 2020, 7, 1–13. [Google Scholar] [CrossRef]
- Park, S.; Han, S.; Kim, J.; Molaie, M.M.; Vu, H.D.; Singh, K.; Han, J.; Lee, W.; Cha, M. COVID-19 Discourse on Twitter in Four Asian Countries: Case Study of Risk Communication. J. Med. Internet Res. 2021, 23, e23272. [Google Scholar] [CrossRef] [PubMed]
- Banda, J.M.; Tekumalla, R.; Wang, G.; Yu, J.; Liu, T.; Ding, Y.; Chowell, G. A large-scale COVID-19 Twitter chatter dataset for open scientific research—An international collaboration. arXiv 2020, arXiv:2004.03688. [Google Scholar] [CrossRef]
- Gohil, S.; Vuik, S.; Darzi, A. Sentiment analysis of health care tweets: Review of the methods used. JMIR Public Health Surveill. 2018, 4, e43. [Google Scholar] [CrossRef]
- Gui, X.; Kou, Y.; Pine, K.H.; Chen, Y. Managing uncertainty: Using social media for risk assessment during a public health crisis. In Proceedings of the 2017 CHI Conference on Human Factors in Computing Systems, Denver, CO, USA, 2 May 2017; pp. 4520–4533. [Google Scholar]
- Alamoodi, A.; Zaidan, B.; Zaidan, A.; Albahri, O.; Mohammed, K.; Malik, R.; Almahdi, E.; Chyad, M.; Tareq, Z.; Albahri, A.; et al. Sentiment analysis and its applications in fighting COVID-19 and infectious diseases: A systematic review. Expert Syst. Appl. 2020, 167, 114155. [Google Scholar] [CrossRef]
- Barbieri, F.; Espinosa-Anke, L.; Camacho-Collados, J. A Multilingual Language Model Toolkit for Twitter. arXiv 2021, arXiv:2104.12250. [Google Scholar]
- Geotagging. 2021. Available online: https://en.wikipedia.org/wiki/Geotagging (accessed on 20 June 2021).
- Boulos, M.N.K.; Geraghty, E.M. Geographical Tracking and Mapping of Coronavirus Disease COVID-19/Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2) Epidemic and Associated Events around the World: How 21st Century GIS Technologies Are Supporting the Global Fight against Outbreaks and Epidemics. Int. J. Health Geogr. 2020, 19, 8. [Google Scholar]
- Haworth, B. Emergency management perspectives on volunteered geographic information: Opportunities, challenges and change. Comput. Environ. Urban Syst. 2016, 57, 189–198. [Google Scholar] [CrossRef]
- Tzavella, K.; Fekete, A.; Fiedrich, F. Opportunities provided by geographic information systems and volunteered geographic information for a timely emergency response during flood events in Cologne, Germany. Nat. Hazards 2018, 91, 29–57. [Google Scholar] [CrossRef]
- Marrero, M.; Urbano, J.; Sánchez-Cuadrado, S.; Morato, J.; Gómez-Berbís, J.M. Named entity recognition: Fallacies, challenges and opportunities. Comput. Stand. Interfaces 2013, 35, 482–489. [Google Scholar] [CrossRef]
- Sekine, S.; Ranchhod, E. Named Entities: Recognition, Classification and Use; John Benjamins Publishing: Amsterdam, The Netherlands, 2009; Volume 19. [Google Scholar]
- Farmakiotou, D.; Karkaletsis, V.; Koutsias, J.; Sigletos, G.; Spyropoulos, C.D.; Stamatopoulos, P. Rule-based named entity recognition for Greek financial texts. In Proceedings of the Workshop on Computational Lexicography and Multimedia Dictionaries (COMLEX 2000), Kato Achaia, Greece, 22–23 September 2000; pp. 75–78. [Google Scholar]
- Finkel, J.R.; Manning, C.D. Nested named entity recognition. In Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing, Singapore, 6 August 2009; pp. 141–150. [Google Scholar]
- Manierre, M.J. Gaps in knowledge: Tracking and explaining gender differences in health information seeking. Soc. Sci. Med. 2015, 128, 151–158. [Google Scholar] [CrossRef]
- Antonio, A.; Tuffley, D. The gender digital divide in developing countries. Future Internet 2014, 6, 673–687. [Google Scholar] [CrossRef] [Green Version]
- Johnson, J.L.; Greaves, L.; Repta, R. Better science with sex and gender: Facilitating the use of a sex and gender-based analysis in health research. Int. J. Equity Health 2009, 8, 14. [Google Scholar] [CrossRef] [Green Version]
- Lawrence, K.; Rieder, A. Methodologic and ethical ramifications of sex and gender differences in public health research. Gender Med. 2007, 4, S96–S105. [Google Scholar] [CrossRef]
- CrisisNLP. TBCOV Data Repository. 2021. Available online: https://crisisnlp.qcri.org/tbcov (accessed on 9 November 2021).
- Thara, S.; Poornachandran, P. Code-mixing: A brief survey. In Proceedings of the 2018 International Conference on Advances in Computing, Communications and Informatics (ICACCI), Bangalore, India, 19–22 September 2018; pp. 2382–2388. [Google Scholar]
- Qazi, U.; Imran, M.; Ofli, F. GeoCoV19: A dataset of hundreds of millions of multilingual COVID-19 tweets with location information. Sigspatial Spec. 2020, 12, 6–15. [Google Scholar] [CrossRef]
- MacKinlay, A.; Aamer, H.; Yepes, A.J. Detection of adverse drug reactions using medical named entities on Twitter. In AMIA Annual Symposium Proceedings; American Medical Informatics Association: Bethesda, MD, USA, 2017; Volume 2017, p. 1215. [Google Scholar]
- Stefanidis, A.; Vraga, E.; Lamprianidis, G.; Radzikowski, J.; Delamater, P.L.; Jacobsen, K.H.; Pfoser, D.; Croitoru, A.; Crooks, A. Zika in Twitter: Temporal variations of locations, actors, and concepts. JMIR Public Health Surveill. 2017, 3, e22. [Google Scholar] [CrossRef]
- Li, J.; Sun, A.; Han, J.; Li, C. A survey on deep learning for named entity recognition. IEEE Trans. Knowl. Data Eng. 2020, 34, 50–70. [Google Scholar] [CrossRef] [Green Version]
- spaCy. Trained Models & Pipelines. 2021. Available online: https://spacy.io/models (accessed on 7 December 2021).
- Grace, R. Toponym usage in social media in emergencies. Int. J. Disaster Risk Reduct. 2021, 52, 101923. [Google Scholar] [CrossRef]
- Zade, H.; Shah, K.; Rangarajan, V.; Kshirsagar, P.; Imran, M.; Starbird, K. From Situational Awareness to Actionability: Towards Improving the Utility of Social Media Data for Crisis Response. Proc. ACM-Hum.-Comput. Interact. 2018, 2, 195. [Google Scholar] [CrossRef]
- Hindustan Times. Inundated, COVID-19 Helplines Crumble. 2021. Available online: https://www.hindustantimes.com/india-news/inundated-covid-helplines-crumble-101618684641863.html (accessed on 20 June 2021).
- Times of India. Social Media Is the New Helpline. 2021. Available online: https://timesofindia.indiatimes.com/viral-news/covid-19-india-social-media-is-the-new-helpline-for-a-crisis-hit-country/articleshow/82345645.cms (accessed on 20 June 2021).
- Sloan, L.; Morgan, J.; Burnap, P.; Williams, M. Who tweets? Deriving the demographic characteristics of age, occupation and social class from Twitter user meta-data. PLoS ONE 2015, 10, e0115545.
- Ajao, O.; Hong, J.; Liu, W. A survey of location inference techniques on Twitter. J. Inf. Sci. 2015, 41, 855–864.
- Carley, K.M.; Malik, M.; Landwehr, P.M.; Pfeffer, J.; Kowalchuck, M. Crowd sourcing disaster management: The complex nature of Twitter usage in Padang Indonesia. Saf. Sci. 2016, 90, 48–61.
- Haklay, M.; Weber, P. OpenStreetMap: User-generated street maps. IEEE Pervasive Comput. 2008, 7, 12–18.
- Huang, H.; Chen, W.; Xie, T.; Wei, Y.; Feng, Z.; Wu, W. The Impact of Individual Behaviors and Governmental Guidance Measures on Pandemic-Triggered Public Sentiment: Based on System Dynamics and Cross-Validation. Int. J. Environ. Res. Public Health 2021, 18, 4245.
- Zhang, T.; Cheng, C. Temporal and Spatial Evolution and Influencing Factors of Public Sentiment in Natural Disasters—A Case Study of Typhoon Haiyan. ISPRS Int. J. Geo-Inf. 2021, 10, 299.
- O’Connor, B.; Balasubramanyan, R.; Routledge, B.; Smith, N. From tweets to polls: Linking text sentiment to public opinion time series. In Proceedings of the International AAAI Conference on Web and Social Media, Washington, DC, USA, 23 May 2010.
- Burnap, P.; Williams, M.L. Cyber hate speech on twitter: An application of machine classification and statistical modeling for policy and decision making. Policy Internet 2015, 7, 223–242.
- Beigi, G.; Hu, X.; Maciejewski, R.; Liu, H. An overview of sentiment analysis in social media and its applications in disaster relief. Sentim. Anal. Ontol. Eng. 2016, 313–340.
- Aday, S.; Farrell, H.; Lynch, M.; Sides, J.; Freelon, D. New media and conflict after the Arab Spring. U. S. Inst. Peace 2012, 80, 1–24.
- Liu, B. Sentiment analysis and opinion mining. Synth. Lect. Hum. Lang. Technol. 2012, 5, 1–167.
- Medhat, W.; Hassan, A.; Korashy, H. Sentiment analysis algorithms and applications: A survey. Ain Shams Eng. J. 2014, 5, 1093–1113.
- Zhang, L.; Wang, S.; Liu, B. Deep learning for sentiment analysis: A survey. Wiley Interdiscip. Rev. Data Min. Knowl. Discov. 2018, 8, e1253.
- Yue, L.; Chen, W.; Li, X.; Zuo, W.; Yin, M. A survey of sentiment analysis in social media. Knowl. Inf. Syst. 2019, 60, 617–663.
- Ceron, A.; Curini, L.; Iacus, S.M.; Porro, G. Every tweet counts? How sentiment analysis of social media can improve our knowledge of citizens’ political preferences with an application to Italy and France. New Media Soc. 2014, 16, 340–358.
- Conneau, A.; Khandelwal, K.; Goyal, N.; Chaudhary, V.; Wenzek, G.; Guzmán, F.; Grave, É.; Ott, M.; Zettlemoyer, L.; Stoyanov, V. Unsupervised Cross-lingual Representation Learning at Scale. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, Online, 5 July 2020; pp. 8440–8451.
- Twitter Statistics. 2021. Available online: https://www.businessofapps.com/data/twitter-statistics/ (accessed on 22 June 2021).
- Zhang, Z.; Bors, G. “Less is more”: Mining useful features from Twitter user profiles for Twitter user classification in the public health domain. Online Inf. Rev. 2019, 44, 213–237.
- Uddin, M.M.; Imran, M.; Sajjad, H. Understanding types of users on Twitter. arXiv 2014, arXiv:1406.1335.
- Okazaki, S.; Díaz-Martín, A.M.; Rozano, M.; Menéndez-Benito, H.D. Using Twitter to engage with customers: A data mining approach. Internet Res. 2015, 25, 416–434.
- Hannon, J.; Bennett, M.; Smyth, B. Recommending twitter users to follow using content and collaborative filtering approaches. In Proceedings of the Fourth ACM Conference on Recommender Systems, Barcelona, Spain, 26 September 2010; pp. 199–206.
- Garcia Esparza, S.; O’Mahony, M.P.; Smyth, B. Catstream: Categorising tweets for user profiling and stream filtering. In Proceedings of the 2013 International Conference on Intelligent User Interfaces, Santa Monica, CA, USA, 19 March 2013; pp. 25–36.
- Ali, M. The Morphological Gender Assignment for English Personal Names. Ph.D. Thesis, California State University, Northridge, CA, USA, 2019.
- Slepian, M.L.; Galinsky, A.D. The voiced pronunciation of initial phonemes predicts the gender of names. J. Personal. Soc. Psychol. 2016, 110, 509.
- Babu, A. Data World: Gender-by-Names Dataset. 2018. Available online: https://data.world/arunbabu/gender-by-names (accessed on 21 January 2021).
- Kantrowitz, M. CMU: Name Gender Dataset. 1995. Available online: http://www.cs.cmu.edu/afs/cs/project/ai-repository/ai/areas/nlp/corpora/names/ (accessed on 21 January 2021).
- Howard, D. Data World: Gender-by-Name Dataset. 2017. Available online: https://data.world/howarder/gender-by-name (accessed on 21 January 2021).
- Rish, I. An empirical study of the naive Bayes classifier. In Proceedings of the IJCAI Workshop on Empirical Methods in Artificial Intelligence, Seattle, WA, USA, 4 August 2001.
- Quinlan, J.R. Induction of decision trees. Mach. Learn. 1986, 1, 81–106.
- Breiman, L. Random forests. Mach. Learn. 2001, 45, 5–32.
- Cochran, W.G. Sampling Techniques, 2nd ed.; John Wiley & Sons: Hoboken, NJ, USA, 1963.
- Centers for Disease Control and Prevention. Symptoms of COVID-19. 2021. Available online: https://www.cdc.gov/coronavirus/2019-ncov/symptoms-testing/symptoms.html (accessed on 9 November 2021).
- Jiang, B.; Miao, Y. The Evolution of Natural Cities from the Perspective of Location-Based Social Media. Prof. Geogr. 2015, 67, 295–306.
- Jiang, B. Head/tail breaks: A new classification scheme for data with a heavy-tailed distribution. Prof. Geogr. 2013, 65, 482–494.
- Jiang, B.; Ma, D.; Yin, J.; Sandberg, M. Spatial Distribution of City Tweets and Their Densities. Geogr. Anal. 2016, 48, 337–351.
Argentina Coronavirus, Armenia Coronavirus, Australia Coronavirus, Austria Coronavirus, Azerbaijan Coronavirus, Bahamas Coronavirus, Bahrain Coronavirus, Bangladesh Coronavirus, Barbados Coronavirus, Belarus Coronavirus, Belgium Coronavirus, Belize Coronavirus, Benin Coronavirus, Bhutan Coronavirus, Bolivia Coronavirus, Bosnia Herzegovina Coronavirus, Botswana Coronavirus, Brazil Coronavirus, Brunei Coronavirus, Bulgaria Coronavirus, Burkina Coronavirus, Burundi Coronavirus, Cambodia Coronavirus, Cameroon Coronavirus, Canada Coronavirus, COVID-19, Congo COVID-19, Congo COVID-19, Costa Rica COVID-19, Croatia COVID-19, Cuba COVID-19, Cyprus COVID-19, Czech Republic COVID-19, Denmark COVID-19, Djibouti COVID-19, Dominica COVID-19, Dominican Republic COVID-19, East Timor COVID-19, Ecuador COVID-19, Egypt COVID-19, El Salvador COVID-19, Equatorial Guinea COVID-19, Eritrea COVID-19, Estonia COVID-19, Ethiopia COVID-19, Fiji COVID-19, Finland COVID-19, France COVID-19, Gabon COVID-19, Gambia COVID-19, Georgia COVID-19, Germany COVID-19, Ghana COVID-19, #socialdistancing us, #socialdistancing usa, #socialdistancing Alabama, #socialdistancing Alaska, #socialdistancing Arizona, #socialdistancing Arkansas, #socialdistancing California, #socialdistancing Colorado, #socialdistancing Connecticut, #socialdistancing Delaware, #socialdistancing Florida, #socialdistancing Georgia, #socialdistancing Hawaii, #socialdistancing Idaho, #socialdistancing Illinois, #socialdistancing Indiana, #socialdistancing Iowa, #socialdistancing Kansas, #socialdistancing Kentucky, #socialdistancing Louisiana, #socialdistancing Maine, #socialdistancing Maryland, #socialdistancing Massachusetts, #socialdistancing Michigan, económica, quédate en casa Colombia, respiradores Colombia, tapabocas Colombia, UCI disponibles, recuperados covid19 Colombia, muertes Colombia, Nariño Coronavirus, Nariño Covid19, #coronavirus, #Corona, #COVID19, #WuhanCoronavirus, #ncoV2019, #coronavirus, Italia, lombardia, 
#covid19italia, #COVID19Pandemic, Covid, #CoronavirusAustralia, #pandemic, Covid-19 USA
Model | Precision | Recall | F1-Score |
---|---|---|---|
English | 0.85 | 0.85 | 0.85 |
Spanish | 0.90 | 0.90 | 0.90 |
Portuguese | 0.90 | 0.90 | 0.90 |
French | 0.84 | 0.84 | 0.84 |
Italian | 0.86 | 0.85 | 0.86 |
Multilingual | 0.84 | 0.83 | 0.84 |
Language | Person | Organization | Location | Miscellaneous |
---|---|---|---|---|
English (U) | 14,796,271 | 18,887,285 | 2,930,148 | 10,798,850 |
English (A) | 409,794,668 | 611,669,779 | 483,680,780 | 1,690,122,455 |
Spanish (U) | 3,777,463 | 2,230,017 | 3,265,204 | 14,968,547 |
Spanish (A) | 98,561,105 | 69,581,078 | 169,903,131 | 301,512,355 |
Portuguese (U) | 1,439,192 | 932,504 | 1,006,396 | 2,845,321 |
Portuguese (A) | 27,577,759 | 15,896,880 | 40,090,891 | 52,440,351 |
French (U) | 1,374,884 | 804,336 | 719,896 | 3,894,968 |
French (A) | 23,595,420 | 17,256,551 | 34,064,424 | 63,010,283 |
Total (U) | 55,721,884 | 33,324,173 | 10,336,415 | 40,767,983 |
Total (A) | 803,832,752 | 814,205,050 | 805,175,906 | 2,320,195,791 |
Attribute | Country | State | County | City |
---|---|---|---|---|
Place | 0.988 (7990) | 0.967 (7871) | 0.771 (7394) | 0.967 (4903) |
Metric | Country | State | County | City |
---|---|---|---|---|
Precision | 0.868 | 0.839 | 0.648 | 0.802 |
Recall | 1.000 | 0.968 | 0.922 | 0.656 |
F1-score | 0.929 | 0.899 | 0.761 | 0.722 |
Metric | Country | State | County | City |
---|---|---|---|---|
Precision | 0.888 | 0.781 | 0.056 | 0.430 |
Recall | 0.732 | 0.640 | 0.462 | 0.184 |
F1-score | 0.803 | 0.703 | 0.100 | 0.258 |
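
The F1-scores in the two Country/State/County/City metric tables above can be recomputed from the listed precision and recall values. The following is a minimal sketch, assuming the standard definition of F1 as the harmonic mean of precision and recall. The function name `f1_score` and the labels `first_table` and `second_table` are illustrative and are not part of the TBCOV release.

```python
def f1_score(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall (0.0 if both are 0)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) per granularity, copied from the tables above.
# The keys "first_table" and "second_table" are hypothetical labels.
reported = {
    "first_table": {
        "Country": (0.868, 1.000, 0.929),
        "State": (0.839, 0.968, 0.899),
        "County": (0.648, 0.922, 0.761),
        "City": (0.802, 0.656, 0.722),
    },
    "second_table": {
        "Country": (0.888, 0.732, 0.803),
        "State": (0.781, 0.640, 0.703),
        "County": (0.056, 0.462, 0.100),
        "City": (0.430, 0.184, 0.258),
    },
}

for table, rows in reported.items():
    for level, (p, r, f1_reported) in rows.items():
        f1 = f1_score(p, r)
        print(f"{table:12s} {level:8s} recomputed={f1:.3f} reported={f1_reported:.3f}")
```

Because the inputs are already rounded to three decimals, each recomputed score agrees with the reported one only to within about 0.001.
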
Attribute | Occurrences | Geotagged (Yield) | Countries | States | Counties | Cities |
---|---|---|---|---|---|---|
Coordinates | 2,799,378 | 2,799,378 (100%) | 211 | 1912 | 9037 | 8079 |
Place | 51,411,442 | 51,061,938 (99%) | 215 | 1906 | 13,343 | 9932 |
User location | 1,284,668,011 | 1,132,595,646 (88%) | 218 | 2511 | 24,806 | 20,648 |
User prof. desc. | 1,642,116,879 | 180,508,901 (11%) | 218 | 2485 | 18,588 | 14,600 |
Tweet text | 2,014,792,896 | 515,802,081 (26%) | 218 | 2513 | 24,235 | 20,549 |
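
The yield percentages in the table above appear to be the share of occurrences of each attribute that received a geotag. The sketch below recomputes them under that assumption; the helper name `yield_percent` and the dictionary `attribute_counts` are hypothetical and only hold the counts copied from the table.

```python
def yield_percent(occurrences: int, geotagged: int) -> float:
    """Percentage of attribute occurrences that were successfully geotagged."""
    if occurrences == 0:
        raise ValueError("occurrences must be positive")
    return 100.0 * geotagged / occurrences


# attribute -> (occurrences, geotagged), copied from the table above.
attribute_counts = {
    "Coordinates": (2_799_378, 2_799_378),
    "Place": (51_411_442, 51_061_938),
    "User location": (1_284_668_011, 1_132_595_646),
    "User prof. desc.": (1_642_116_879, 180_508_901),
    "Tweet text": (2_014_792_896, 515_802_081),
}

for attribute, (occurrences, geotagged) in attribute_counts.items():
    pct = yield_percent(occurrences, geotagged)
    print(f"{attribute:18s} {pct:6.2f}% (rounded: {round(pct)}%)")
```

Rounded to whole percentages, these ratios reproduce the values in the Geotagged (Yield) column.
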
Metric | Female | Male | Macro Avg. | Weighted Avg. |
---|---|---|---|---|
Precision | 0.872 | 0.816 | 0.844 | 0.850 |
Recall | 0.885 | 0.797 | 0.841 | 0.851 |
F1-score | 0.878 | 0.807 | 0.843 | 0.850 |
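
The Macro Avg. column above is consistent with an unweighted mean of the Female and Male scores. A minimal sketch of that check follows, assuming the usual definition of a macro average; the names `macro_average` and `per_class` are illustrative. The Weighted Avg. column cannot be recomputed the same way because it needs the per-class sample counts, which this table does not list.

```python
def macro_average(scores: list[float]) -> float:
    """Unweighted mean of per-class scores."""
    return sum(scores) / len(scores)


# metric -> ((female, male), reported macro average), copied from the table above.
per_class = {
    "Precision": ((0.872, 0.816), 0.844),
    "Recall": ((0.885, 0.797), 0.841),
    "F1-score": ((0.878, 0.807), 0.843),
}

for metric, (scores, reported_macro) in per_class.items():
    macro = macro_average(list(scores))
    print(f"{metric:10s} recomputed={macro:.4f} reported={reported_macro:.3f}")
```
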
Main Topics | Sub-Topics |
---|---|
COVID-19 symptoms | Fever, cough, shortness of breath, headache, loss of taste and smell |
COVID deaths mentions | Parents, siblings, grandparents, relatives, and close connections |
Food shortages | Food availability, food access, food adequacy, and food acceptability |
Anxiety and depression | Anger, sleepless, fearful, upset, restless, and anxious |
Mask usage and importance | Mask violation, masks are important, wear masks, masks save lives |
Willingness to take vaccine | Reactions, harmfulness, got vaccine, covid jab taken |
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Imran, M.; Qazi, U.; Ofli, F. TBCOV: Two Billion Multilingual COVID-19 Tweets with Sentiment, Entity, Geo, and Gender Labels. Data 2022, 7, 8. https://doi.org/10.3390/data7010008