Abstract
We present the multivariate Bayesian scan statistic (MBSS), a general framework for event detection and characterization in multivariate spatial time series data. MBSS integrates prior information and observations from multiple data streams in a principled Bayesian framework, computing the posterior probability of each type of event in each space-time region. MBSS learns a multivariate Gamma-Poisson model from historical data, and models the effects of each event type on each stream using expert knowledge or labeled training examples. We evaluate MBSS on various disease surveillance tasks, detecting and characterizing outbreaks injected into three streams of Pennsylvania medication sales data. We demonstrate that MBSS can be used both as a “general” event detector, with high detection power across a variety of event types, and as a “specific” detector that incorporates prior knowledge of an event’s effects to achieve much higher detection power. MBSS has several other advantages over previous event detection approaches, including faster computation and easy interpretation and visualization of results; by integrating information from multiple streams, it also enables faster and more accurate event detection. Most importantly, MBSS can model and differentiate between multiple event types, thus distinguishing between events requiring urgent responses and other, less relevant patterns in the data.
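To make the idea of "posterior probability of each event type in each space-time region" concrete, here is a minimal Python sketch under simplifying assumptions that are ours, not necessarily the paper's exact model. Each stream m's aggregate count C_m in a candidate region S is modeled as Poisson(x_m q B_m) with q ~ Gamma(α_m, β_m) in the rate parameterization, where B_m is the baseline (expected count) and x_m is a fixed multiplicative effect of the event type on that stream (x_m = 1 means no effect). Streams are treated as independent given the hypothesis, and each event type's prior is spread uniformly over the candidate regions. Integrating out q gives a closed-form likelihood ratio, and Bayes' rule over the null and every (event type, region) pair yields the posteriors. The function names, the `regions` list, the event types, and all numbers are hypothetical; α and β are simply given here rather than learned from historical data.

```python
import math
from typing import Dict, Sequence, Tuple

Hypothesis = Tuple[str, int]  # (event type name, index into the region list)


def log_lr_stream(count: float, baseline: float, alpha: float, beta: float, x: float) -> float:
    """Log of Pr(C | effect x) / Pr(C | no event) for one stream in one region.

    Simplified model (an assumption): C ~ Poisson(x * q * B), q ~ Gamma(alpha, rate=beta).
    Integrating out q, the factors shared by both hypotheses cancel, leaving
    x**C * ((beta + B) / (beta + x * B)) ** (alpha + C).
    """
    return count * math.log(x) + (alpha + count) * (
        math.log(beta + baseline) - math.log(beta + x * baseline)
    )


def mbss_posteriors(
    counts: Sequence[Sequence[float]],      # counts[m][i]: stream m, location i
    baselines: Sequence[Sequence[float]],   # expected counts, e.g. estimated from history
    regions: Sequence[Sequence[int]],       # hypothetical candidate regions (location indices)
    alphas: Sequence[float],
    betas: Sequence[float],
    effects: Dict[str, Sequence[float]],    # event type -> effect x_m on each stream
    event_priors: Dict[str, float],         # event type -> prior probability it is occurring
) -> Tuple[float, Dict[Hypothesis, float]]:
    """Return Pr(H0 | D) and Pr(H1(S, E) | D) for every region S and event type E."""
    p_null = 1.0 - sum(event_priors.values())
    if p_null <= 0.0:
        raise ValueError("event priors must sum to less than 1")
    log_terms: Dict[Hypothesis, float] = {}
    for name, xs in effects.items():
        for r, region in enumerate(regions):
            llr = 0.0
            for m, x in enumerate(xs):  # streams assumed independent given the hypothesis
                c = sum(counts[m][i] for i in region)
                b = sum(baselines[m][i] for i in region)
                llr += log_lr_stream(c, b, alphas[m], betas[m], x)
            # Uniform prior over regions for each event type.
            log_terms[(name, r)] = math.log(event_priors[name] / len(regions)) + llr
    # H0 has likelihood ratio 1, so its unnormalized log posterior is just log Pr(H0).
    log_null = math.log(p_null)
    shift = max([log_null, *log_terms.values()])  # log-sum-exp for numerical stability
    z = math.exp(log_null - shift) + sum(math.exp(v - shift) for v in log_terms.values())
    posts = {h: math.exp(v - shift) / z for h, v in log_terms.items()}
    return math.exp(log_null - shift) / z, posts


# Hypothetical example: 2 streams, 4 locations, an excess in stream 0 at location 2.
counts = [[10, 10, 30, 10], [5, 5, 5, 5]]
baselines = [[10.0, 10.0, 10.0, 10.0], [5.0, 5.0, 5.0, 5.0]]
regions = [(0,), (1,), (2,), (3,), (0, 1), (2, 3)]
effects = {"stream0_only": [2.0, 1.0], "both_streams": [2.0, 2.0]}
priors = {"stream0_only": 0.01, "both_streams": 0.01}
post_null, posts = mbss_posteriors(
    counts, baselines, regions, [50.0, 50.0], [50.0, 50.0], effects, priors
)
best = max(posts, key=posts.get)
print(f"Pr(H0|D) = {post_null:.3f}; most probable: {best} with {posts[best]:.3f}")
```

In this toy example most of the posterior mass goes to the stream-0-only event type at location 2, because the second stream shows no excess. Setting x_m = 1 makes a stream's likelihood ratio exactly 1, which is how the sketch lets a "specific" event type ignore streams it does not affect while a "general" type must explain all of them.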
Additional information
Editors: Dragos Margineantu, Denver Dash, and Weng-Keen Wong.
About this article
Cite this article
Neill, D.B., Cooper, G.F. A multivariate Bayesian scan statistic for early event detection and characterization. Mach Learn 79, 261–282 (2010). https://doi.org/10.1007/s10994-009-5144-4