Leveraging the power of local spatial autocorrelation in geophysical interpolative clustering

Published: 01 September 2014

Abstract

Sensor stations are now deployed worldwide to measure geophysical variables (e.g. temperature, humidity, light) for a growing number of ecological and industrial processes. Although these variables are generally measured over large areas and long (potentially unbounded) periods of time, stations cannot cover every location in space. Moreover, the data produced are too voluminous to be recorded in full for future analysis. In this scenario, summarization, i.e. the computation of aggregates of data, can be used to reduce the amount of data stored on disk, while interpolation, i.e. the estimation of unknown data at each location of interest, can be used to supplement station records. We illustrate a novel data mining solution, named interpolative clustering, that addresses both tasks in time-evolving, multivariate geophysical applications. It yields a time-evolving clustering model to summarize geophysical data, and it computes a weighted linear combination of cluster prototypes to predict data. Clustering accounts for the local presence of spatial autocorrelation in the geophysical data. The weights of the linear combination are defined to reflect the inverse distance of the unseen data to each cluster geometry. The cluster geometry is represented through shape-dependent sampling of the geographic coordinates of clustered stations. Experiments performed with several data collections investigate the trade-off between the summarization capability and the predictive accuracy of the presented interpolative clustering algorithm.
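The prediction step described in the abstract, a weighted linear combination of cluster prototypes with weights reflecting inverse distance to each cluster geometry, can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `idw_predict`, the `power` exponent, the scalar prototypes, and the choice to measure the distance to a cluster as the distance to its nearest sampled coordinate are all assumptions made for illustration only.

```python
import math


def idw_predict(query, clusters, power=2.0):
    """Estimate a value at `query` from cluster prototypes.

    `clusters` is a list of (prototype, sample_points) pairs, where
    `prototype` is the cluster's aggregate value and `sample_points` is a
    list of (x, y) coordinates sampled from the clustered stations.
    All names and the distance rule are hypothetical, for illustration.
    """
    weighted = []
    for prototype, points in clusters:
        # Assumption: distance to a cluster geometry is the distance
        # to the nearest of its sampled coordinates.
        d = min(math.dist(query, p) for p in points)
        if d == 0.0:
            # The query coincides with a sampled station location.
            return prototype
        weighted.append((1.0 / d ** power, prototype))
    total = sum(w for w, _ in weighted)
    # Normalized inverse-distance-weighted combination of prototypes.
    return sum(w * v for w, v in weighted) / total


# Two toy clusters: prototype 10.0 near the origin, prototype 20.0 at x = 4.
toy_clusters = [
    (10.0, [(0.0, 0.0), (1.0, 0.0)]),
    (20.0, [(4.0, 0.0)]),
]
estimate = idw_predict((2.0, 0.0), toy_clusters)
```

In this toy setup the query is twice as far from the second cluster as from the first, so with `power=2.0` the first prototype receives four times the weight. A multivariate version would apply the same weights to each component of a prototype vector.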

Published In

Data Mining and Knowledge Discovery, Volume 28, Issue 5-6
September 2014
482 pages

Publisher

Kluwer Academic Publishers

United States

Author Tags

  1. Clustering
  2. Geophysical data stream
  3. Inverse distance weighting
  4. Spatial autocorrelation

Qualifiers

  • Article
