DOI: 10.1145/1150402.1150410

Spatial scan statistics: approximations and performance study

Published: 20 August 2006

Abstract

Spatial scan statistics are used to determine hotspots in spatial data, and are widely used in epidemiology and biosurveillance. In recent years, there has been much effort invested in designing efficient algorithms for finding such "high discrepancy" regions, with methods ranging from fast heuristics for special cases, to general grid-based methods, to efficient approximation algorithms with provable guarantees on performance and quality.

In this paper, we make a number of contributions to the computational study of spatial scan statistics. First, we describe a simple exact algorithm for finding the largest discrepancy region in a domain. Second, we propose a new approximation algorithm for a large class of discrepancy functions (including the Kulldorff scan statistic) that improves the approximation versus run time trade-off of prior methods. Third, we extend our simple exact and our approximation algorithms to data sets which lie naturally on a grid or are accumulated onto a grid. Fourth, we conduct a detailed experimental comparison of these methods with a number of known methods, demonstrating that our approximation algorithm has far superior performance in practice to prior methods, and exhibits a good performance-accuracy trade-off.

All extant methods (including those in this paper) are suitable for data sets that are modestly sized; if data sets are of the order of millions of data points, none of these methods scales well. For such massive data settings, it is natural to examine whether small-space streaming algorithms might yield accurate answers. Here, we provide some negative results, showing that any streaming algorithm that provides even approximately optimal answers to the discrepancy maximization problem must use space linear in the input.
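To make the discrepancy maximization problem concrete, the following is a minimal sketch of a brute-force scan over axis-parallel rectangles for data accumulated onto a grid. It is not the exact or approximation algorithm of the paper; it is a naive baseline written under stated assumptions. It assumes the commonly stated form of the Kulldorff discrepancy, d(m, b) = m log(m/b) + (1 - m) log((1 - m)/(1 - b)) when m > b and 0 otherwise, where m and b are the fractions of measured and baseline mass falling inside a region. The function names (`kulldorff`, `prefix_sums`, `rect_sum`, `max_discrepancy_rectangle`) and the list-of-lists grid layout are hypothetical choices made for illustration.

```python
import math


def kulldorff(m, b):
    """Kulldorff discrepancy for measured fraction m and baseline fraction b.

    Assumed form: m*log(m/b) + (1-m)*log((1-m)/(1-b)) if m > b, else 0.
    Degenerate baselines (b <= 0 or b >= 1) are scored 0 in this sketch.
    """
    if not (0.0 < b < 1.0) or m <= b:
        return 0.0
    d = m * math.log(m / b)
    if m < 1.0:  # the second term tends to 0 as m approaches 1
        d += (1.0 - m) * math.log((1.0 - m) / (1.0 - b))
    return d


def prefix_sums(grid):
    """2D prefix sums: P[i][j] is the sum of grid[0..i-1][0..j-1]."""
    rows, cols = len(grid), len(grid[0])
    P = [[0.0] * (cols + 1) for _ in range(rows + 1)]
    for i in range(rows):
        for j in range(cols):
            P[i + 1][j + 1] = grid[i][j] + P[i][j + 1] + P[i + 1][j] - P[i][j]
    return P


def rect_sum(P, r0, c0, r1, c1):
    """Sum over rows r0..r1 and columns c0..c1 (inclusive) in O(1)."""
    return P[r1 + 1][c1 + 1] - P[r0][c1 + 1] - P[r1 + 1][c0] + P[r0][c0]


def max_discrepancy_rectangle(measured, baseline):
    """Try every axis-parallel rectangle on the grid; O(rows^2 * cols^2)."""
    rows, cols = len(measured), len(measured[0])
    PM, PB = prefix_sums(measured), prefix_sums(baseline)
    total_m, total_b = PM[rows][cols], PB[rows][cols]
    best, best_rect = 0.0, None
    for r0 in range(rows):
        for r1 in range(r0, rows):
            for c0 in range(cols):
                for c1 in range(c0, cols):
                    m = rect_sum(PM, r0, c0, r1, c1) / total_m
                    b = rect_sum(PB, r0, c0, r1, c1) / total_b
                    d = kulldorff(m, b)
                    if d > best:
                        best, best_rect = d, (r0, c0, r1, c1)
    return best, best_rect


# Illustrative data (made up): uniform baseline, one elevated cell.
measured = [[1, 1, 1], [1, 10, 1], [1, 1, 1]]
baseline = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]
score, rect = max_discrepancy_rectangle(measured, baseline)
print(rect, round(score, 3))
```

The quartic cost in the grid side length is what makes such exhaustive scans practical only for modestly sized inputs, which is the regime the abstract describes for extant methods.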



    Published In

    KDD '06: Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining
    August 2006
    986 pages
    ISBN:1595933395
    DOI:10.1145/1150402

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. Kulldorff scan statistic
    2. discrepancy
    3. spatial scan statistics

    Qualifiers

    • Article

    Conference

    KDD06

    Acceptance Rates

    Overall Acceptance Rate 1,133 of 8,635 submissions, 13%
