Abstract
Seasonal flu is reported to cause 250,000–500,000 deaths around the world every year, and other diseases such as chickenpox and malaria are also serious threats to people’s physical and mental health. Proper techniques for disease surveillance are therefore in high demand. Social media analysis has recently been regarded as an efficient way to achieve this goal, because a growing number of people post their health information on social media such as blogs and personal websites. Previous work on social media analysis focused mainly on English content and hardly considered Chinese content, which hinders the application of such techniques to Chinese people. In this paper, we propose a new method of Chinese social media analysis for disease surveillance. More specifically, we compare different classification methods and then propose a new way to process Chinese text data. Chinese Sina micro-blog data collected from September to December 2013 are used to validate the effectiveness of the proposed method. The results show an average classification precision of 87.49%. Compared with data from the Chinese National Influenza Center, our method predicts the outbreak time of flu 5 days earlier.
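The abstract states that the micro-blog signal anticipates the Chinese National Influenza Center data by 5 days. One common way to quantify such a lead, assumed here and not necessarily the procedure used in this paper, is lagged Pearson correlation: shift the official daily series against the social-media series and pick the lag with the highest correlation. The helpers `pearson` and `estimate_lead_days`, the `max_lag` parameter, and the synthetic epidemic curve are hypothetical illustrations, not data or code from the paper.

```python
import math
from statistics import mean, pstdev


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    mx, my = mean(x), mean(y)
    sx, sy = pstdev(x), pstdev(y)
    if sx == 0 or sy == 0:
        return 0.0
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)
    return cov / (sx * sy)


def estimate_lead_days(social, official, max_lag=14):
    """Hypothetical helper: find the lag (days) maximizing corr(social[t], official[t + lag]).

    A positive result means the social-media series tends to move that many
    days before the official surveillance series.
    """
    if len(social) != len(official):
        raise ValueError("series must have the same length")
    if max_lag >= len(social) - 1:
        raise ValueError("max_lag leaves too few overlapping days")
    best_lag, best_r = 0, float("-inf")
    for lag in range(max_lag + 1):
        # Align social-media day t with official day t + lag.
        x = social[: len(social) - lag]
        y = official[lag:]
        r = pearson(x, y)
        if r > best_r:
            best_lag, best_r = lag, r
    return best_lag, best_r


# Synthetic toy data (NOT from the paper): one epidemic-shaped curve,
# with the social-media copy running 5 days ahead of the official copy.
base = [100 * math.exp(-((t - 40) ** 2) / (2 * 6 ** 2)) for t in range(65)]
official = base[:60]
social = base[5:65]

lag, r = estimate_lead_days(social, official)
print(lag, round(r, 3))  # prints: 5 1.0
```

On real data the daily counts of flu-related posts would first need to come from a classifier such as the one the paper evaluates, and would usually be smoothed (for example, with a weekly moving average) before the lag search, since raw daily counts are noisy.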
Acknowledgments
This research is supported in part by the National Natural Science Foundation of China (No. 61440054), the Fundamental Research Funds for the Central Universities of China (No. 216-274213), the Natural Science Foundation of Hubei, China (No. 2014CFA048), and the Outstanding Academic Talents Startup Funds of Wuhan University (Nos. 216-410100003 and 216-410100004).
About this article
Cite this article
Cui, X., Yang, N., Wang, Z. et al. Chinese social media analysis for disease surveillance. Pers Ubiquit Comput 19, 1125–1132 (2015). https://doi.org/10.1007/s00779-015-0877-5
DOI: https://doi.org/10.1007/s00779-015-0877-5