Abstract
The purpose of the current study is to produce landslide susceptibility maps using different data mining models. Four modeling techniques, namely random forest (RF), boosted regression tree (BRT), classification and regression tree (CART), and general linear model (GLM), are used, and their results are compared for landslide susceptibility mapping at the Wadi Tayyah Basin, Asir Region, Saudi Arabia. Landslide locations were identified and mapped by interpreting different data types, including high-resolution satellite images, topographic maps, historical records, and extensive field surveys. In total, 125 landslide locations were mapped using ArcGIS 10.2, and the locations were divided into two groups: training (70 %) and validation (30 %). Eleven layers of landslide-conditioning factors were prepared, including slope aspect, altitude, distance from faults, lithology, plan curvature, profile curvature, rainfall, distance from streams, distance from roads, slope angle, and land use. The relationships between the landslide-conditioning factors and the landslide inventory map were calculated using the four models mentioned above (RF, BRT, CART, and GLM). The models’ results were compared with landslide locations that were not used during model training. Receiver operating characteristic (ROC) curves and the area under the curve (AUC) were used to assess the accuracy of the models. Success rate (training data) and prediction rate (validation data) curves were calculated. The results showed that the AUC values for the success rates are 0.783 (78.3 %), 0.958 (95.8 %), 0.816 (81.6 %), and 0.821 (82.1 %) for the RF, BRT, CART, and GLM models, respectively. The AUC values for the prediction rates are 0.812 (81.2 %), 0.856 (85.6 %), 0.862 (86.2 %), and 0.769 (76.9 %) for the RF, BRT, CART, and GLM models, respectively. Subsequently, the landslide susceptibility maps were divided into four classes: low, moderate, high, and very high susceptibility. The results revealed that the RF, BRT, CART, and GLM models produced reasonable accuracy in landslide susceptibility mapping. The outcome maps would be useful for general planned development activities in the future, such as choosing new urban areas and infrastructural activities, as well as for environmental protection.
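The validation scheme summarized in the abstract (hold out part of the inventory, then score each model by AUC on the training and on the held-out locations) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the random seed, and the toy scores are hypothetical, and the AUC is computed through its rank interpretation (the probability that a randomly chosen landslide cell scores higher than a randomly chosen non-landslide cell) rather than with any particular software package.

```python
import random


def split_inventory(points, train_fraction=0.7, seed=42):
    """Shuffle an inventory and split it into training and validation subsets.

    train_fraction=0.7 follows the 70 % training share stated in the abstract;
    the seed is an arbitrary assumption that only makes the split repeatable.
    """
    rng = random.Random(seed)
    shuffled = list(points)
    rng.shuffle(shuffled)
    n_train = int(len(shuffled) * train_fraction)
    return shuffled[:n_train], shuffled[n_train:]


def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    Counts the share of (landslide, non-landslide) pairs in which the
    landslide sample has the higher score; ties count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical inventory: 125 landslide IDs, as many as the abstract reports.
landslide_ids = list(range(125))
train_ids, valid_ids = split_inventory(landslide_ids)
print(len(train_ids), len(valid_ids))

# Toy susceptibility scores (made up) with 1 = landslide, 0 = non-landslide.
toy_scores = [0.9, 0.2, 0.8, 0.1]
toy_labels = [1, 1, 0, 0]
print(auc(toy_scores, toy_labels))
```

Applied to the training locations, such an AUC corresponds to what the abstract calls the success rate; applied to the held-out locations, it corresponds to the prediction rate.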
Acknowledgments
The authors would like to thank the editor and the anonymous reviewers for their helpful comments on the previous version of the manuscript.
Cite this article
Youssef, A.M., Pourghasemi, H.R., Pourtaghi, Z.S. et al. Landslide susceptibility mapping using random forest, boosted regression tree, classification and regression tree, and general linear models and comparison of their performance at Wadi Tayyah Basin, Asir Region, Saudi Arabia. Landslides 13, 839–856 (2016). https://doi.org/10.1007/s10346-015-0614-1
DOI: https://doi.org/10.1007/s10346-015-0614-1