Abstract
Rivers, as one of the freshwater resources, are generally jeopardized in both quantity and quality by industrial, agricultural, and urban development. Water quality management is inextricably bound up with reliable prediction of the Water Quality Index (WQI) for various purposes. Accurate estimation of the WQI is therefore one of the most challenging issues in water quality studies of surface water resources. There is a broad range of traditional methodologies for WQI evaluation. Owing to the intrinsic limitations of conventional models, Data-Driven Models (DDMs) have frequently been employed to assess the WQI of natural streams. In the present research, WQI values and their typical classifications were obtained following the guidelines of the National Sanitation Foundation (NSF). Four well-known DDMs, namely Evolutionary Polynomial Regression (EPR), M5 Model Tree (MT), Gene-Expression Programming (GEP), and Multivariate Adaptive Regression Spline (MARS), were employed to predict the WQI in the Karun River. To this end, 12 Water Quality Parameters (i.e., Dissolved Oxygen, Chemical Oxygen Demand, Biochemical Oxygen Demand, Electrical Conductivity, Nitrate, Nitrite, Phosphate, Turbidity, pH, Calcium, Magnesium, and Sodium) were collected from nine hydrometry stations, and missing values of water temperature were extracted from image analysis of Landsat-7 ETM+. Furthermore, the Gamma Test (GT), Forward Selection (FS), Polynomial Chaotic Expression (PCE), and Principal Component Analysis (PCA) were used to reduce the number of input variables fed to the DDMs. The results demonstrated that FS-M5 MT had the best performance for the estimation of WQI classification. WQI values for the Karun River were then assessed in a reliability-based probabilistic framework to account for uncertainty and randomness in the input parameters. To this end, the Monte Carlo scenario sampling technique was used to evaluate the limit state function from the DDM-based WQI formulation. Based on the qualitative description of the WQI, the Karun River is classified as “Relatively Bad” quality. Moreover, based on the reliability analysis, there is only a 19% chance that a specimen from the Karun River will have a better quality index.
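The Monte Carlo evaluation of a limit state function mentioned above can be illustrated with a minimal sketch. It is not the authors' implementation: the limit state form g(x) = WQI(x) − WQI_AC, the linear surrogate `toy_wqi`, the input distributions in `toy_sampler`, and the threshold of 50 are all hypothetical placeholders chosen only to make the example runnable.

```python
import numpy as np


def failure_probability(wqi_model, sample_inputs, wqi_acceptable,
                        n_samples=100_000, seed=0):
    """Crude Monte Carlo estimate of pf = P[g(x) < 0].

    Assumed limit state function: g(x) = WQI(x) - WQI_acceptable,
    so "failure" means the predicted WQI falls below the acceptable value.
    """
    rng = np.random.default_rng(seed)
    x = sample_inputs(rng, n_samples)      # array of shape (n_samples, n_inputs)
    g = wqi_model(x) - wqi_acceptable      # limit state value for each sample
    return float(np.mean(g < 0.0))         # fraction of samples that fail


def toy_wqi(x):
    """Hypothetical linear surrogate; NOT the paper's fitted model."""
    do, bod = x[:, 0], x[:, 1]
    return 40.0 + 4.0 * do - 2.0 * bod


def toy_sampler(rng, n):
    """Hypothetical input distributions for DO and BOD (mg/L)."""
    do = rng.normal(6.0, 1.5, size=n)
    bod = rng.lognormal(mean=1.0, sigma=0.4, size=n)
    return np.column_stack([do, bod])


if __name__ == "__main__":
    pf = failure_probability(toy_wqi, toy_sampler, wqi_acceptable=50.0)
    print(f"estimated pf = {pf:.3f}, reliability = {1.0 - pf:.3f}")
```

In practice, `toy_wqi` would be replaced by the fitted DDM formulation and `toy_sampler` by distributions fitted to the measured parameters. The fixed seed only makes the estimate reproducible.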
Abbreviations
- AI: Artificial Intelligence
- ANFIS: Adaptive Neuro-Fuzzy Inference System
- A: Slope parameter, which normally carries useful information about the complexity of the relationship between input and output variables
- aij: Weighting coefficients of principal components
- AL: Offset, which has a specific value for each band
- ANNs: Artificial Neural Networks
- APLR: Adaptive Piecewise Linear Regression
- b1, b2, b3, b4: Weighting coefficients of the multivariate linear equation given by MT
- BFs: Basis Functions
- BOD: Biochemical Oxygen Demand
- C: A closed bounded set
- c1, c2, c3, …, c13: Weighting coefficients of the multivariate linear equation given by MT
- Ca2+: Calcium
- CCME: Canadian Council of Ministers of the Environment
- CL: Confidence Level
- COD: Chemical Oxygen Demand
- DDMs: Data-Driven Models
- DN: Digital Number
- Dn: Absolute difference between the numerical and theoretical cumulative distributions associated with the parameter
- \(D_{n}^{u}\): Acceptable limit for Dn
- DO: Dissolved Oxygen
- DOsat: Dissolved Oxygen in the saturated state
- e: Model error, known as the uncertainty parameter
- EC: Electrical Conductivity
- EPR: Evolutionary Polynomial Regression
- ET: Expression Tree
- ETM+: Enhanced Thematic Mapper Plus
- F0: F ratio
- FC: Fecal Coliform
- FORM: First-Order Reliability Method
- FOSM: First-Order Second Moment
- FS: Forward Selection
- FX(r): Theoretical cumulative distribution associated with the parameter r
- G: Green Spectral Band
- GA: Genetic Algorithm
- GEP: Gene-Expression Programming
- GMDH: Group Method of Data Handling
- Gn(r): Numerical cumulative distribution associated with the parameter r
- GP: Genetic Programming
- GT: Gamma Test
- GT0: Intercept on the vertical axis (δ = 0)
- h: A function for establishing a relationship among WQPs and WQI
- I: Unit matrix
- IOA: Index of Agreement
- k: Number of nearest neighbors
- k′: Number of elements in input variables
- K1, K2: Band-specific thermal conversion constants
- KMO: Kaiser–Meyer–Olkin
- K-S: Kolmogorov–Smirnov
- LS: Least Squares
- LSF: Limit-State Function
- Lλ: Top-of-Atmosphere Radiance
- M: Maximum number of mathematical terms
- MAE: Mean Absolute Error
- MARS: Multivariate Adaptive Regression Spline
- Mg2+: Magnesium
- ML: Gain coefficient
- MMSE: Minimum Mean Square Error
- MOGA: Multi-Objective Genetic Algorithm
- MSE: Mean Squared Error
- MT: Model Tree
- n: Number of input variables
- n′: Number of observations
- Na+: Sodium
- NDWI: Normalized Difference Water Index
- NH4: Ammonium
- NIR: Near-Infrared Spectral Band
- \(NO_{3}^{ - }\): Nitrate Nitrogen
- NSF: National Sanitation Foundation
- p: Maximum number of input variables
- PCA: Principal Component Analysis
- PCC: Positive Coefficient of Correlation
- PCE: Polynomial Chaotic Expression
- PE: Probability of Exceedance
- pf: Probability of Failure
- pH: Potential of Hydrogen
- \(PO_{4}^{3 - }\): Phosphate
- Qcal: Value of DN
- R: Coefficient of correlation
- RAE: Relative Absolute Error
- RE: Relative Error
- RMSE: Root Mean Square Error
- ROI: Region of Interest
- RRSE: Root Relative Squared Error
- s: Number of basis functions
- SORM: Second-Order Reliability Method
- SSE: Sum of Squared Errors
- SST: Sea Surface Temperature
- SVM: Support Vector Machine
- T: Temperature
- TB: Brightness Temperature
- TH: Total Hardness
- Tu: Turbidity
- u: Significance level
- USGS: United States Geological Survey
- VCM: Variance–covariance matrix
- WCs: Weighting Coefficients
- WQI: Water Quality Index
- WQIAC: Acceptable values of WQI
- WQIME: Measured values of WQI
- WQPs: Water Quality Parameters
- WST: Water Surface Temperature
- x: Input vectors, known as WQPs
- X1, X2, X3, …, Xn: Input variables associated with the limit state function
- y: Output vector, known as WQI
- α: Significance level used in the F test
- δ: The function associated with the Euclidean distance
- δ′: A collection of coefficients used in the EPR formulation
- θ: Input variables vector for a specific problem
- λ: Eigenvalues
- μ(x): Basis function
- π: Overall formulation by EPR
- ρ: Weighting coefficients used in the formulation obtained by the MARS model
- ϕ(x): Formulation obtained by the MARS model
- ϕ1, ϕ2, ϕ3: Functions for establishing a relationship among WQPs
- ω: User-defined function with various mathematical structures
- \(\underset{\raise0.3em\hbox{$\smash{\scriptscriptstyle\thicksim}$}}{x}\): Vector of random variables
Cite this article
Najafzadeh, M., Homaei, F. & Farhadi, H. Reliability assessment of water quality index based on guidelines of national sanitation foundation in natural streams: integration of remote sensing and data-driven models. Artif Intell Rev 54, 4619–4651 (2021). https://doi.org/10.1007/s10462-021-10007-1
DOI: https://doi.org/10.1007/s10462-021-10007-1