Abstract
Modeling spatially distributed phenomena in terms of their controlling factors is a recurring problem in geoscience. Most efforts concentrate on predicting the value of a response variable from controlling variables, through either a physical model or a regression model. However, many geospatial systems comprise complex, nonlinear, and spatially non-uniform relationships, making it difficult even to formulate a viable model. This paper focuses on spatial partitioning of controlling variables that are attributed to a particular range of a response variable. Thus, the presented method surveys spatially distributed relationships between predictors and response. The method is based on the association analysis technique of identifying emerging patterns, which are extended so that they can be applied more effectively to geospatial data sets. The outcome of the method is a list of spatial footprints, each characterized by a unique “controlling pattern”: a list of specific values of predictors that locally correlate with a specified value of the response variable. Mapping the controlling footprints reveals the geographic regionalization of the relationship between predictors and response. The data mining underpinnings of the method are given, and its application to a real-world problem is demonstrated using an expository example that determines the variety of environmental associations of high vegetation density across the continental United States.
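To make the notion of an emerging pattern concrete, the following is a minimal sketch of the basic growth-rate definition (a pattern is "emerging" when its support in a target class is much larger than in the background class). It is not the extended method the paper presents. The grid cells, the binned predictor labels such as `precip=high`, the thresholds, and all function names are hypothetical and chosen only for illustration.

```python
from itertools import combinations


def support(pattern, cells):
    """Fraction of cells whose item set contains every item of the pattern."""
    if not cells:
        return 0.0
    return sum(1 for cell in cells if pattern <= cell) / len(cells)


def growth_rate(pattern, background, target):
    """Ratio of the pattern's support in the target class to the background class."""
    s_target = support(pattern, target)
    s_background = support(pattern, background)
    if s_target == 0.0:
        return 0.0
    if s_background == 0.0:
        return float("inf")
    return s_target / s_background


def emerging_patterns(background, target, min_growth=2.0, min_support=0.5, max_len=2):
    """Brute-force search over small patterns; fine for a toy example only."""
    items = sorted(set().union(*target))
    found = []
    for size in range(1, max_len + 1):
        for combo in combinations(items, size):
            pattern = frozenset(combo)
            s = support(pattern, target)
            g = growth_rate(pattern, background, target)
            if s >= min_support and g >= min_growth:
                found.append((combo, s, g))
    return found


def footprint(pattern, cells):
    """Indices of the cells in which the pattern holds (its spatial footprint)."""
    return [i for i, cell in enumerate(cells) if frozenset(pattern) <= cell]


# Hypothetical cells: each is a set of binned predictor values.
# `target` holds cells with a high response value, `background` the rest.
target = [
    frozenset({"precip=high", "temp=mid", "elev=low"}),
    frozenset({"precip=high", "temp=mid", "elev=mid"}),
    frozenset({"precip=high", "temp=low", "elev=low"}),
    frozenset({"precip=mid", "temp=mid", "elev=low"}),
]
background = [
    frozenset({"precip=low", "temp=mid", "elev=low"}),
    frozenset({"precip=low", "temp=high", "elev=mid"}),
    frozenset({"precip=high", "temp=high", "elev=low"}),
    frozenset({"precip=mid", "temp=low", "elev=mid"}),
]

patterns = emerging_patterns(background, target)
for combo, s, g in patterns:
    print(combo, "support:", s, "growth:", g, "footprint:", footprint(combo, target))
```

In this toy data, `elev=low` alone is common in both classes and is filtered out, while the combination of `precip=high` and `temp=mid` never occurs in the background and therefore has an unbounded growth rate. Mapping the footprint indices back to cell locations is what would turn such a pattern into a region on a map.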
Acknowledgements
The work is supported in part by the National Science Foundation under Grant IIS-0812271. A portion of this research was conducted at the Lunar and Planetary Institute, which is operated by the USRA under contract CAN-NCC5-679 with NASA. This is LPI Contribution No. 1532.
Cite this article
Stepinski, T.F., Ding, W. & Eick, C.F. Controlling patterns of geospatial phenomena. Geoinformatica 15, 399–416 (2011). https://doi.org/10.1007/s10707-010-0107-2
DOI: https://doi.org/10.1007/s10707-010-0107-2