Lightweight methods to estimate influenza rates and alcohol sales volume from Twitter messages

Published: 01 March 2013

Abstract

We analyze over 570 million Twitter messages from an eight-month period and find that tracking a small number of keywords allows us to estimate influenza rates and alcohol sales volume with high accuracy. We validate our approach against government statistics and find strong correlations with influenza-like illnesses reported by the U.S. Centers for Disease Control and Prevention (r(14) = .964, p < .001) and with alcohol sales volume reported by the U.S. Census Bureau (r(5) = .932, p < .01). We analyze the robustness of this approach to spurious keyword matches, and we propose a document classification component to filter out these misleading messages. We find that this document classifier can reduce error rates by more than half in simulated false-alarm experiments, though more research is needed to develop methods that remain robust under extremely high noise.
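
To make the keyword-tracking idea concrete, the following is a minimal sketch and not the authors' implementation. It assumes that the signal for each time window is simply the fraction of messages containing any tracked keyword, and that validation is a Pearson correlation against a reference series; the abstract does not specify the estimation model, so both choices are stand-ins. The function names `keyword_fraction` and `pearson_r`, the keyword list, and all data are hypothetical. Under the usual reporting convention, r(14) would indicate 14 degrees of freedom, that is, 16 paired observations.

```python
import math


def keyword_fraction(messages, keywords):
    """Return the fraction of messages that contain at least one keyword.

    Matching is a case-insensitive substring test, so "flu" also matches
    "influence". This is the kind of spurious match the abstract warns about.
    """
    if not messages:
        return 0.0
    lowered = [k.lower() for k in keywords]
    hits = sum(1 for m in messages if any(k in m.lower() for k in lowered))
    return hits / len(messages)


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical toy data: three weeks of messages and made-up reference rates.
weekly_messages = [
    ["feeling fine today", "great game last night", "home with the flu", "coffee time"],
    ["I think I have the flu", "flu shot done", "traffic is terrible", "new phone"],
    ["flu again", "the whole office has the flu", "flu season is here", "lunch"],
]
reference_rates = [1.0, 2.1, 2.9]  # illustrative values only, not real statistics

# One keyword-fraction value per week, then its correlation with the reference.
weekly_fraction = [keyword_fraction(week, ["flu"]) for week in weekly_messages]
r = pearson_r(weekly_fraction, reference_rates)
print(weekly_fraction, round(r, 3))
```

With real data, each inner list would hold all messages collected in one reporting period, and the reference series would come from the corresponding government statistics. A filtering step such as the document classifier mentioned in the abstract would sit between message collection and `keyword_fraction`.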

Published In

Language Resources and Evaluation, Volume 47, Issue 1
March 2013
263 pages

Publisher

Springer-Verlag

Berlin, Heidelberg

Author Tags

  1. Classification
  2. Regression
  3. Social media

Qualifiers

  • Article

Cited By

  • (2022) Twitter-aided decision making: a review of recent developments. Applied Intelligence 52:12 (13839-13854). doi:10.1007/s10489-022-03241-9. Online publication date: 1-Sep-2022
  • (2019) Drinks & Crowds. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies 3:2 (1-30). doi:10.1145/3328930. Online publication date: 21-Jun-2019
  • (2018) Using four different online media sources to forecast the crude oil price. Journal of Information Science 44:3 (408-421). doi:10.1177/0165551517698298. Online publication date: 1-Jun-2018
  • (2018) Twitter data analysis to understand societal response to air quality. Proceedings of the 9th International Conference on Social Media and Society (82-90). doi:10.1145/3217804.3217900. Online publication date: 18-Jul-2018
  • (2017) Geo-Social Analytics Based on Spatio-Temporal Dynamics of Marijuana-Related Tweets. Proceedings of the 2017 International Conference on Information System and Data Mining (28-38). doi:10.1145/3077584.3077588. Online publication date: 1-Apr-2017
  • (2017) Enhancing Feature Selection Using Word Embeddings. Proceedings of the 26th International Conference on World Wide Web (695-704). doi:10.1145/3038912.3052622. Online publication date: 3-Apr-2017
  • (2017) Measuring Global Disease with Wikipedia. Proceedings of the 2017 ACM Conference on Computer Supported Cooperative Work and Social Computing (1812-1834). doi:10.1145/2998181.2998183. Online publication date: 25-Feb-2017
  • (2016) On Infectious Intestinal Disease Surveillance using Social Media Content. Proceedings of the 6th International Conference on Digital Health Conference (157-161). doi:10.1145/2896338.2896372. Online publication date: 11-Apr-2016
  • (2015) Social media analytics and research test-bed (SMART dashboard). Proceedings of the 2015 International Conference on Social Media & Society (1-7). doi:10.1145/2789187.2789196. Online publication date: 27-Jul-2015
  • (2015) You Tweet What You Eat. Proceedings of the 33rd Annual ACM Conference on Human Factors in Computing Systems (3197-3206). doi:10.1145/2702123.2702153. Online publication date: 18-Apr-2015
