Spatial Prediction for Multivariate Non-Gaussian Data

Published: 27 March 2017

Abstract

With the ever-increasing volume of geo-referenced datasets, there is a real need for better statistical estimation and prediction techniques for spatial analysis. Most existing approaches focus on predicting multivariate Gaussian spatial processes, but as the data may consist of non-Gaussian (or mixed-type) variables, this creates two challenges: (1) how to accurately capture the dependencies among different data types, both Gaussian and non-Gaussian; and (2) how to efficiently predict multivariate non-Gaussian spatial processes. In this article, we propose a generic approach for predicting multiple response variables of mixed types. The proposed approach accurately captures cross-spatial dependencies among response variables and reduces the computational burden by projecting the spatial process to a lower-dimensional space with knot-based techniques. Efficient approximations are provided to estimate posterior marginals of latent variables for the predictive process, and extensive experimental evaluations based on both simulation and real-life datasets are provided to demonstrate the effectiveness and efficiency of this new approach.
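
To make the knot-based projection mentioned in the abstract concrete, the following is a minimal sketch of a standard low-rank predictive-process covariance, not the authors' implementation. It assumes an exponential covariance function, a hypothetical 5 x 5 grid of knots on the unit square, and arbitrary parameter values (`sigma2`, `phi`, `jitter`); none of these choices come from the article. The full n x n covariance is replaced by C(s, s*) C(s*, s*)^{-1} C(s*, s), whose rank is at most the number of knots.

```python
import numpy as np

def exp_cov(a, b, sigma2=1.0, phi=3.0):
    """Exponential covariance sigma2 * exp(-phi * distance) between two point sets."""
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    return sigma2 * np.exp(-phi * d)

def predictive_process_cov(sites, knots, jitter=1e-8):
    """Low-rank covariance C(s, s*) C(s*, s*)^{-1} C(s*, s) induced by the knots."""
    c_sk = exp_cov(sites, knots)                                 # n x m cross-covariance
    c_kk = exp_cov(knots, knots) + jitter * np.eye(len(knots))   # m x m knot covariance
    # Solve a linear system rather than forming the explicit inverse.
    return c_sk @ np.linalg.solve(c_kk, c_sk.T)

# Illustrative data only: 200 random sites and 25 knots (assumed values).
rng = np.random.default_rng(0)
sites = rng.uniform(0.0, 1.0, size=(200, 2))
g = np.linspace(0.1, 0.9, 5)
knots = np.array([(x, y) for x in g for y in g])

full = exp_cov(sites, sites)                  # dense 200 x 200 covariance
low_rank = predictive_process_cov(sites, knots)
rank = np.linalg.matrix_rank(low_rank)        # bounded by the number of knots
```

In this construction only the m x m knot covariance has to be factorized, which is where the computational saving comes from when m is much smaller than n. The diagonal of `full - low_rank` is non-negative, reflecting the variance that the low-rank projection does not capture.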

Supplementary Material

a36-liu-apndx.pdf (liu.zip)
Supplemental movie, appendix, image, and software files for Spatial Prediction for Multivariate Non-Gaussian Data

Information

Published In

ACM Transactions on Knowledge Discovery from Data, Volume 11, Issue 3
August 2017
372 pages
ISSN: 1556-4681
EISSN: 1556-472X
DOI: 10.1145/3058790
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 27 March 2017
Accepted: 01 November 2016
Revised: 01 May 2015
Received: 01 November 2014
Published in TKDD Volume 11, Issue 3

Author Tags

  1. Approximate Bayesian inference
  2. Gaussian and non-Gaussian processes
  3. Laplace approximation
  4. computational statistics
  5. geostatistics
  6. predictive process model

Qualifiers

  • Research-article
  • Research
  • Refereed

Cited By

  • (2023) TechPat: Technical Phrase Extraction for Patent Mining. ACM Transactions on Knowledge Discovery from Data 17:9 (1-31). DOI: 10.1145/3596603. Online publication date: 15-Jun-2023
  • (2023) Modeling Within-Basket Auxiliary Item Recommendation with Matchability and Ubiquity. ACM Transactions on Intelligent Systems and Technology 14:3 (1-19). DOI: 10.1145/3574157. Online publication date: 17-Feb-2023
  • (2023) Joint spatial modelling of malaria incidence and vector's abundance shows heterogeneity in malaria‐vector geographical relationships. Journal of Applied Ecology 61:2 (365-378). DOI: 10.1111/1365-2664.14565. Online publication date: 22-Dec-2023
  • (2023) Fraud detection in the distributed graph database. Cluster Computing 26:1 (515-537). DOI: 10.1007/s10586-022-03540-3. Online publication date: 1-Feb-2023
  • (2022) DNformer: Temporal Link Prediction with Transfer Learning in Dynamic Networks. ACM Transactions on Knowledge Discovery from Data 17:3 (1-21). DOI: 10.1145/3551892. Online publication date: 2-Aug-2022
  • (2022) Survey on the Objectives of Recommender Systems: Measures, Solutions, Evaluation Methodology, and New Perspectives. ACM Computing Surveys 55:5 (1-38). DOI: 10.1145/3527449. Online publication date: 3-Dec-2022
  • (2021) Multi-affect(ed): improving recommendation with similarity-enhanced user reliability and influence propagation. Frontiers of Computer Science: Selected Publications from Chinese Universities 15:5. DOI: 10.1007/s11704-020-9511-4. Online publication date: 1-Oct-2021
  • (2021) RNe2Vec: information diffusion popularity prediction based on repost network embedding. Computing 103:2 (271-289). DOI: 10.1007/s00607-020-00858-x. Online publication date: 1-Feb-2021
  • (2020) A Survey of Researches on Personalized Bundle Recommendation Techniques. Machine Learning for Cyber Security (290-304). DOI: 10.1007/978-3-030-62460-6_26. Online publication date: 8-Oct-2020
  • (2018) Context-aware trust network extraction in large-scale trust-oriented social networks. World Wide Web 21:3 (713-738). DOI: 10.1007/s11280-017-0485-6. Online publication date: 1-May-2018
