Abstract
Spatial prediction (SP) based on machine learning (ML) has been applied to soil water quality, air quality, the marine environment, and other domains. However, it still handles small samples poorly: ML normally requires a large number of training samples to prevent underfitting, and data augmentation (DA) methods such as mixup and the synthetic minority over-sampling technique (SMOTE) ignore the similarity of geographic information. This paper therefore proposes a modified upsampling method and combines it with random forest spatial interpolation (RFSI) to address the small-sample problem in geographic space. The modification has two aspects. First, after the samples are classified, the nearest points are selected from within the same category, that is, from points with similar geographic information in some respect. Second, the difference used to generate new samples is the difference within each category. To verify the effectiveness of the proposed method, we use daily precipitation data for Chongqing in January 2018. The experimental results show that combining the modified upsampling method with RFSI effectively improves the accuracy of SP.
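The category-constrained upsampling described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `augment_within_category`, its parameters, and the assumption that each sample already carries a category label (the abstract does not say how the classification is done) are all hypothetical. The sketch only shows the general idea of SMOTE-style interpolation in which neighbours and differences are taken within one category.

```python
import numpy as np


def augment_within_category(coords, values, labels, n_new, k=3, seed=0):
    """SMOTE-style interpolation restricted to samples sharing a category.

    Hypothetical helper for illustration only. `labels` is assumed to come
    from some prior classification of the samples.
    """
    rng = np.random.default_rng(seed)
    coords = np.asarray(coords, dtype=float)
    values = np.asarray(values, dtype=float)
    labels = np.asarray(labels)

    # Only categories with at least two samples can supply a neighbour.
    uniq, counts = np.unique(labels, return_counts=True)
    eligible = np.flatnonzero(np.isin(labels, uniq[counts >= 2]))
    if eligible.size == 0:
        raise ValueError("no category contains two or more samples")

    index = np.arange(len(labels))
    new_coords, new_values, new_labels = [], [], []
    for _ in range(n_new):
        i = rng.choice(eligible)
        # Candidate neighbours: same category as sample i, excluding i itself.
        same = np.flatnonzero((labels == labels[i]) & (index != i))
        dist = np.linalg.norm(coords[same] - coords[i], axis=1)
        nearest = same[np.argsort(dist)[:k]]
        j = rng.choice(nearest)
        # Interpolate location and target using the within-category difference.
        lam = rng.uniform()
        new_coords.append(coords[i] + lam * (coords[j] - coords[i]))
        new_values.append(values[i] + lam * (values[j] - values[i]))
        new_labels.append(labels[i])
    return np.array(new_coords), np.array(new_values), np.array(new_labels)
```

Under these assumptions, the synthetic samples would be appended to the original training set before fitting a spatial interpolation model such as RFSI. Because each new point lies on a segment between two samples of the same category, its value stays within that category's observed range.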
Data availability
For the data and materials used in this paper, please contact 2020112038@chd.edu.cn.
Funding
This work was supported in part by the National Natural Science Foundation of China under Grant 42071316, the Science and Technology Project of Inner Mongolia Autonomous Region under Grant 2021ZD0045, the Project of Chongqing Agricultural Industry Digital Map under Grant 21C00346, the Key Research and Development Program of Shaanxi under Grant 2021NY-170, the National Key Research and Development Program under Grants 2021YFB3900905 and 2021YFB3901300, the National Natural Science Foundation of China under Grant 12001057, and the Fundamental Research Funds for the Central Universities, CHD, under Grants 300102122101 and 300102269103.
Author information
Contributions
Jiao Sijia carried out the data preparation, performed the experiments and the experimental analysis, and wrote the manuscript. Wu Tianjun outlined the research topic, proposed the research methodology, and designed the experiments. All authors have read and agreed to the published version of the manuscript.
Ethics declarations
Conflict of interest
There is no conflict of interest.
Additional information
Communicated by: H. Babaie
About this article
Cite this article
Sijia, J., Tianjun, W., Jiancheng, L. et al. Spatial prediction using random forest spatial interpolation with sample augmentation: a case study for precipitation mapping. Earth Sci Inform 16, 863–875 (2023). https://doi.org/10.1007/s12145-023-00936-6
DOI: https://doi.org/10.1007/s12145-023-00936-6