An Improved Gradient Boosting Regression Tree Estimation Model for Soil Heavy Metal (Arsenic) Pollution Monitoring Using Hyperspectral Remote Sensing
<p>Study area locations. Study area I is Daye. Study area II is Honghu.</p> "> Figure 2
<p>Characteristic band selection algorithm flowchart.</p> "> Figure 3
<p>Histograms of the As concentrations of the soil samples. (<b>a</b>) Soil samples from Honghu. (<b>b</b>) Soil samples from Daye.</p> "> Figure 4
<p>The spectra of the soil samples from the different regions. (<b>a</b>) Honghu. (<b>b</b>) Daye.</p> "> Figure 5
<p>(<b>a</b>) Change in the number of selected variables with the change in the number of samples runs for the CARS algorithm in Daye. (<b>b</b>) Change in the 10-fold root-mean-square error of cross-validation value with the change in the number of sample runs for the CARS algorithm in Daye. (<b>c</b>) Change in the regression coefficient variable with the change in the number of samples runs for the CARS algorithm in Daye.</p> "> Figure 6
<p>(<b>a</b>) Change in the number of selected variables with the change in the number of samples runs for the CARS algorithm in Honghu. (<b>b</b>) Change in the 10-fold root-mean-square error of cross-validation value with the change in the number of sample runs for the CARS algorithm in Honghu. (<b>c</b>) Change in the regression coefficient variable with the change in the number of samples runs for the CARS algorithm in Honghu.</p> "> Figure 7
<p>Variation of the correlation coefficient after the different pretreatments. (<b>a</b>) Honghu; (<b>b</b>) Daye.</p> "> Figure 8
<p>A comparison between the measured values and predicted values of the different regression models for Honghu. (<b>a</b>) Gradient boosting regression tree model. (<b>b</b>) Random forest regression model. (<b>c</b>) Linear regression model. (<b>d</b>) Support vector machine regression model.</p> "> Figure 9
<p>A comparison between the measured values and predicted values of the different regression models for Daye. (<b>a</b>) Gradient boosting regression tree model. (<b>b</b>) Random forest regression model. (<b>c</b>) Linear regression model. (<b>d</b>) Support vector machine regression model.</p> "> Figure 9 Cont.
<p>A comparison between the measured values and predicted values of the different regression models for Daye. (<b>a</b>) Gradient boosting regression tree model. (<b>b</b>) Random forest regression model. (<b>c</b>) Linear regression model. (<b>d</b>) Support vector machine regression model.</p> ">
Abstract
:1. Introduction
2. Materials and Methods
2.1. Study Areas
2.2. Research Methods
2.2.1. The CARS-SCA Characteristic Band Selection Algorithm
2.2.2. Gradient Boosting Regression Tree
2.2.3. Support Vector Machine Regression
2.2.4. Random Forest Regression
2.3. Accuracy Evaluation
2.4. Experimental Procedures
2.4.1. Soil Collection and Preparation
2.4.2. Soil Spectral Reflectance Measurement
2.4.3. Spectral Pretreatment
2.4.4. Calibration Set and Validation Set
3. Results and Discussion
3.1. The CARS-SCA Characteristic Band Selection Algorithm
3.2. Characteristic Band Improvement
3.3. Comparative Analysis
3.3.1. Analysis of the Results of the Feature Selection Algorithm Based on CARS
3.3.2. Analysis of the Results of the Feature Selection Algorithm Based on CARS-SCA
3.4. Regression Model
4. Conclusions
Author Contributions
Funding
Acknowledgments
Conflicts of Interest
References
- Li, Y.-m.; Ma, J.-h.; Liu, D.-x.; Sun, Y.-l.; CHEN, Y.-f. Assessment of Heavy Metal Pollution and Potential Ecological Risks of Urban Soils in Kaifeng City, China. Environ. Sci. 2015, 3, 1037–1044. [Google Scholar]
- Rehman, Z.U.; Khan, S.; Brusseau, M.L.; Shah, M.T. Lead and cadmium contamination and exposure risk assessment via consumption of vegetables grown in agricultural soils of five-selected regions of Pakistan. Chemosphere 2017, 168, 1589–1596. [Google Scholar] [CrossRef]
- St Luce, M.; Ziadi, N.; Gagnon, B.; Karam, A. Visible near infrared reflectance spectroscopy prediction of soil heavy metal concentrations in paper mill biosolid- and liming by-product-amended agricultural soils. Geoderma 2017, 288, 23–36. [Google Scholar] [CrossRef]
- Marín, A.; Andrades, M.; Iñigo, V.; Jiménez-Ballesta, B. Lead and Cadmium in Soils of La Rioja Vineyards, Spain. Land Degrad. Dev. 2016, 27, 1286–1294. [Google Scholar] [CrossRef]
- Hu, B.; Chen, S.; Hu, J.; Xia, F.; Xu, J.-f.; Li, Y.; Shi, Z. Application of portable XRF and VNIR sensors for rapid assessment of soil heavy metal pollution. PLoS ONE 2017, 12, e0172438. [Google Scholar] [CrossRef] [PubMed]
- Singh, A.N.; Zhuang, D.-f.; Singh, A.N.; Pan, J.-j.; Qiu, D.-s.; Shi, R.-h. Estimation of As and Cu Contamination in Agricultural Soils Around a Mining Area by Reflectance Spectroscopy: A Case Study. Pedosphere 2009, 19, 719–726. [Google Scholar]
- Gholizadeh, A.; Boruvka, L.; Saberioon, M.M.; KOZÁK, J.; Vašát, R.; Němeček, K. Comparing different data preprocessing methods for monitoring soil heavy metals based on soil spectral features. Soil Water Res. 2015, 10, 218–227. [Google Scholar] [CrossRef]
- Tian, Y.; Shen, R.-p.; Ding, G.-x. Application of Support Vector Machine on Soil Magnesium Content Estimation Based on Hyper-Spectra. Soils 2015, 47, 602–607. [Google Scholar]
- Ma, W.-b.; Tan, K.; Li, H.-d.; Yan, Q.-w. Hyperspectral inversion of heavy metals in soil of a mining area using extreme learning machine. J. Ecol. Rural Environ. 2016, 32, 213–218. [Google Scholar]
- Sun, W.-c.; Zhang, X.; Sun, X.-j.; Cen, Y. Predicting nickel concentration in soil using reflectance spectroscopy associated with organic matter and clay minerals. Geoderma 2018, 327, 25–35. [Google Scholar] [CrossRef]
- Tan, K.; Wang, H.-m.; Zhang, Q.-q.; Jia, X.-p. An improved estimation model for soil heavy metal(loid) concentration retrieval in mining areas using reflectance spectroscopy. J. Soils Sediments 2018, 18, 2008–2022. [Google Scholar] [CrossRef]
- Zhang, Q.-x.; Zhang, H.-b.; Zhang, H.-j.; Wang, X.-s.; Liu, W.-k. Hybrid Inversion Model of Heavy Metals with Hyperspectral Reflectance in Cultivated Soils of Main Grain Producing Areas. Trans. Chin. Soc. Agric. Mach. 2017, 48, 148–155. [Google Scholar]
- Sun, Q.-b.; Yin, C.-q.; Deng, J.-f.; Zhang, D.-f. Characteristics of soil-vegetable pollution of heavy metals and health risk assessment in Daye mining area. Environ. Chem. 2013, 32, 671–677. [Google Scholar]
- Li, H.-d.; Liang, Y.-z.; Xu, Q.-s.; Cao, D.-s. Key wavelengths screening using competitive adaptive reweighted sampling method for multivariate calibration. Anal. Chim. Acta 2009, 648, 77–84. [Google Scholar] [CrossRef] [PubMed]
- Wang, X.-h.; Chen, J.-y.; Zheng, X.-l.; Zhu, C.; Wang, X.-l.; Shan, C.-z. Inversion of Cadmium Content in Agriculture Soil Based on SGA-RF Algorithm. Trans. Chin. Soc. Agric. Mach. 2018. [Google Scholar] [CrossRef]
- Friedman, J.; Hastie, T.; Tibshirani, R. Additive logistic regression: A statistical view of boosting (With discussion and a rejoinder by the authors). Ann. Stat. 2000, 28, 337–407. [Google Scholar] [CrossRef]
- Zheng, C.; Kasprowicz, C.G.; Saunders, C. Customized Routing Optimization Based on Gradient Boost Regressor Model. arXiv 2017, arXiv:171011118. [Google Scholar]
- Ismail, S.; Shabri, A.; Samsudin, R. A hybrid model of self-organizing maps (SOM) and least square support vector machine (LSSVM) for time-series forecasting. Expert Syst. Appl. 2011, 38, 10574–10578. [Google Scholar] [CrossRef]
- Naghibi, S.A.; Pourghasemi, H.R.; Dixon, B. GIS-based groundwater potential mapping using boosted regression tree, classification and regression tree, and random forest machine learning models in Iran. Environ. Monit. Assess. 2016, 188, 44. [Google Scholar] [CrossRef]
- Tan, K.; Ye, Y.-y.; Du, P.-j.; Zhang, Q.-q. Estimation of heavy metal concentrations in reclaimed mining soils using reflectance spectroscopy. Spectrosc. Spectr. Anal. 2014, 34, 3317. [Google Scholar]
- Wang, Y.-y.; Qi, Y.-b.; Chen, Y.; Xie, F. Prediction of Soil Organic Matter Based on Multi-resolution Remote Sensing Data and Random Forest Algorithm. Acta Pedol. Sin. 2016. [Google Scholar] [CrossRef]
- Zhang, X.; Sun, W.-c.; Cen, Y.; Zhang, L.-f.; Wang, N. Predicting cadmium concentration in soils using laboratory and field reflectance spectroscopy. Sci. Total Environ. 2019, 650, 321–334. [Google Scholar] [CrossRef] [PubMed]
- Zhong, Y.-f.; Wang, X.-y.; Xu, Y.; Wang, S.-y.; Jia, T.-y.; Hu, X.; Zhao, J.; Wei, L.-f.; Zhang, L.-p. Mini-UAV-Borne Hyperspectral Remote Sensing: From Observation and Processing to Applications. IEEE Geosci. Remote Sens. Mag. 2018, 6, 46–62. [Google Scholar] [CrossRef]
- Shi, T.-z.; Chen, Y.-y.; Liu, Y.-l.; Wu, G.-f. Visible and near-infrared reflectance spectroscopy—An alternative for monitoring soil contamination by heavy metals. J. Hazard. Mater. 2013, 265C, 166–176. [Google Scholar] [CrossRef]
- Haubrock, S.N.; Chabrillat, S.; Kuhnert, M.; Hostert, P.; Kaufmann, H. Surface soil moisture quantification and validation based on hyperspectral data and field measurements. J. Appl. Remote Sens. 2008, 2, 183–198. [Google Scholar] [CrossRef]
- Sun, W.-w.; Yang, G.; Wu, K.; Li, W.-y.; Zhang, D.-f. Pure endmember extraction using robust kernel archetypoid analysis for hyperspectral imagery. ISPRS J. Photogramm. Remote Sens. 2017, 131, 147–159. [Google Scholar] [CrossRef]
- Sun, W.-w.; Ma, J.; Yang, G.; Du, B.; Zhang, L.-p. A Poisson nonnegative matrix factorization method with parameter subspace clustering constraint for endmember extraction in hyperspectral imagery. ISPRS J. Photogramm. Remote Sens. 2017, 128, 27–39. [Google Scholar] [CrossRef]
- Sun, W.-c.; Zhang, X. Estimating soil zinc concentrations using reflectance spectroscopy. Int. J. Appl. Earth Obs. Geoinf. 2017, 58, 126–133. [Google Scholar] [CrossRef]
- State Environmental Protection Administration of China. Environmental Quality Standard for Soils (GB15618-1995); Standards Press of China: Beijing, China, 1995. [Google Scholar]
- Angelopoulou, T.; Dimitrakos, A.; Terzopoulou, E.; Zalidis, G.; Theocharis, J.; Stafilov, T.; Zouboulis, A. Reflectance Spectroscopy (Vis-NIR) for Assessing Soil Heavy Metals Concentrations Determined by two Different Analytical Protocols, Based on ISO 11466 and ISO 14869-1. Water Air Soil Pollut. 2017, 228, 436. [Google Scholar] [CrossRef]
- Sun, W.-w.; Du, Q. Graph-Regularized Fast and Robust Principal Component Analysis for Hyperspectral Band Selection. IEEE Trans. Geosci. Remote Sens. 2018, PP, 1–11. [Google Scholar] [CrossRef]
Study Area | Sample Type | Sample Size | Minimum (ug/g) | Maximum (ug/g) | Mean (ug/g) | SD | CV (%) | Skewness | Kurtosis |
---|---|---|---|---|---|---|---|---|---|
Honghu | Calibration | 19 | 27.33 | 39.21 | 33.44 | 3.34 | 9.98% | 0.03 | −0.49 |
Validation | 10 | 26.54 | 37.09 | 31.7 | 3.72 | 11.7% | 0.05 | −1.17 | |
Daye | Calibration | 41 | 7.19 | 12.84 | 9.37 | 1.17 | 12.5% | 0.61 | 0.37 |
Validation | 22 | 7.04 | 11.24 | 9.12 | 0.99 | 10.8% | 0.31 | 0.13 |
Study Area | Characteristic Band Wavelengths (nm) | Modeling Method | |||
---|---|---|---|---|---|
Honghu | 401.9, 405, 408.1, 437.3, 813.6, 817.3, 818.6, 822.3, 823.5, 827.2, 978.9,997.6,998.5, 999.4, 1000.3, 1896.8, 1929.2, 1942.9, 2303, 2357.7, 2362.4, 2376.4, 2385.7 | GBRT | −0.9364 | 4.9147 | 3.6577 |
RF | −0.7230 | 4.6360 | 3.7935 | ||
LR | −0.3553 | 4.1116 | 3.2008 | ||
SVMR | −0.5136 | 4.3452 | 3.5838 | ||
Daye | 401, 402, 420, 566, 577, 676, 677, 688, 689, 690, 697, 705, 706, 707, 822, 826, 827, 871, 957, 984, 1826, 1906, 2224, 2232, 2344, 2382, 2393, 2394 | GBRT | 0.2140 | 0.8555 | 0.7089 |
RF | 0.4131 | 0.7392 | 0.6488 | ||
LR | 0.3028 | 0.8057 | 0.6568 | ||
SVMR | 0.0046 | 0.9628 | 0.7804 |
Study Area | Characteristic Band Wavelengths (nm) | Modeling Method | |||
---|---|---|---|---|---|
Honghu | 827.2, 1000.3, 1929.2, 1949.2, 2357.7 | GBRT | 0.9711 | 0.6007 | 0.4990 |
RF | 0.9557 | 0.7432 | 0.6714 | ||
LR | 0.8914 | 1.1638 | 0.9212 | ||
SVMR | 0.9111 | 1.0531 | 0.8139 | ||
Daye | 402, 420, 566, 577, 689, 690, 697, 2382, 2393 | GBRT | 0.9796 | 0.1379 | 0.1078 |
RF | 0.9542 | 0.2035 | 0.1562 | ||
LR | 0.9018 | 0.3023 | 0.2269 | ||
SVMR | 0.8691 | 0.3491 | 0.1935 |
© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
Share and Cite
Wei, L.; Yuan, Z.; Zhong, Y.; Yang, L.; Hu, X.; Zhang, Y. An Improved Gradient Boosting Regression Tree Estimation Model for Soil Heavy Metal (Arsenic) Pollution Monitoring Using Hyperspectral Remote Sensing. Appl. Sci. 2019, 9, 1943. https://doi.org/10.3390/app9091943
Wei L, Yuan Z, Zhong Y, Yang L, Hu X, Zhang Y. An Improved Gradient Boosting Regression Tree Estimation Model for Soil Heavy Metal (Arsenic) Pollution Monitoring Using Hyperspectral Remote Sensing. Applied Sciences. 2019; 9(9):1943. https://doi.org/10.3390/app9091943
Chicago/Turabian StyleWei, Lifei, Ziran Yuan, Yanfei Zhong, Lanfang Yang, Xin Hu, and Yangxi Zhang. 2019. "An Improved Gradient Boosting Regression Tree Estimation Model for Soil Heavy Metal (Arsenic) Pollution Monitoring Using Hyperspectral Remote Sensing" Applied Sciences 9, no. 9: 1943. https://doi.org/10.3390/app9091943
APA StyleWei, L., Yuan, Z., Zhong, Y., Yang, L., Hu, X., & Zhang, Y. (2019). An Improved Gradient Boosting Regression Tree Estimation Model for Soil Heavy Metal (Arsenic) Pollution Monitoring Using Hyperspectral Remote Sensing. Applied Sciences, 9(9), 1943. https://doi.org/10.3390/app9091943