Short paper
DOI: 10.1145/3328413.3328415

Understanding Demographic Bias and Representation in Social Media Health Data

Published: 26 June 2019

Abstract

Text, images, geotags and other data from social media sites lend researchers a unique window into population health trends and disease spread. While these data provide the opportunity to track and measure health outcomes across geographic regions, over extended periods of time, and through complex social networks, they also present challenges. Most notably, these data carry significant biases due to demographic differences in who chooses to use each platform, and what they choose to share. While several publications have discussed the limitations of leveraging social media data for public health research, the amount of literature systematically investigating their demographic bias and exploring mitigation strategies is limited and ripe for interdisciplinary contributions. In this discussion paper, we highlight that understanding the strengths and limitations of these data sources would enable a rigorous assessment of their usefulness for public health research and provide a means for quantifying uncertainty in research findings.

Published In

WebSci '19: Companion Publication of the 10th ACM Conference on Web Science
June 2019
34 pages
ISBN: 9781450361743
DOI: 10.1145/3328413

Publisher

Association for Computing Machinery, New York, NY, United States

Author Tags

1. demographics
2. disease surveillance
3. public health
4. social media

Funding Sources

• Robert Wood Johnson Foundation

Conference

WebSci '19: 11th ACM Conference on Web Science
June 30 - July 3, 2019
Boston, Massachusetts, USA

Acceptance Rates

WebSci '19 Paper Acceptance Rate: 41 of 130 submissions, 32%
Overall Acceptance Rate: 245 of 933 submissions, 26%
