Abstract
The Haihe River Basin is the most water-scarce and water-polluted area in China, so water quality prediction is urgently needed for its water resource management. Long short-term memory (LSTM) networks have been widely used for water quality forecasting in recent years. The performance and adaptability of LSTM for predicting different water quality indicators need to be examined before it is adopted in a specific basin. However, the literature contains very few comparative studies of the prediction accuracy achieved for different water quality indicators and of the causes of the differences, especially for the Haihe River Basin. In this study, LSTM was employed to predict biochemical oxygen demand (BOD), permanganate index (CODMn), dissolved oxygen (DO), ammonia nitrogen (NH3–N), total phosphorus (TP), hydrogen ion concentration (pH), and chemical oxygen demand digested by potassium dichromate (CODCr). Results under 24 different input conditions show that LSTM predicts BOD, CODMn, CODCr, and TP (median Nash–Sutcliffe efficiency of 0.766, 0.835, 0.837, and 0.711, respectively) better than NH3–N, DO, and pH (median Nash–Sutcliffe efficiency of 0.638, 0.625, and 0.229, respectively). In addition, the prediction performance of LSTM is linearly related to the maximum of the temporal autocorrelation and cross-correlation coefficients of the water quality indicators, calculated with the maximal information coefficient, with coefficients of determination between 0.79 and approximately 0.80. This study provides new knowledge and support for the practical application and improvement of LSTM in water quality prediction.
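As an illustration of the accuracy metric quoted above, the Nash–Sutcliffe efficiency (Nash and Sutcliffe 1970) can be sketched as follows. This is a minimal sketch of the standard definition, not the authors' code; the function name and the sample values are hypothetical.

```python
import numpy as np

def nash_sutcliffe_efficiency(observed, predicted):
    """Standard NSE: 1 - sum of squared errors / sum of squared deviations from the observed mean."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    squared_error = np.sum((observed - predicted) ** 2)
    squared_deviation = np.sum((observed - observed.mean()) ** 2)
    return 1.0 - squared_error / squared_deviation

# Hypothetical observations, used only to show the two reference points of the metric.
obs = [1.0, 2.0, 3.0, 4.0]
nse_perfect = nash_sutcliffe_efficiency(obs, obs)          # perfect prediction gives 1
nse_mean = nash_sutcliffe_efficiency(obs, [2.5] * 4)       # predicting the observed mean gives 0
```

A value of 1 indicates a perfect match, 0 indicates a model no better than the observed mean, and negative values indicate a model worse than the observed mean.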
Data availability
Not applicable.
References
Albanese D, Filosi M, Visintainer R, Riccadonna S, Jurman G, Furlanello C (2013) Minerva and minepy: a C engine for the MINE suite and its R, Python and MATLAB wrappers. Bioinformatics 29:407–408. https://doi.org/10.1093/bioinformatics/bts707
Antanasijević D, Pocajt V, Povrenović D, Perić-Grujić A, Ristić M (2013) Modelling of dissolved oxygen content using artificial neural networks: Danube River, North Serbia, case study. Environ Sci Pollut Res 20:9006–9013. https://doi.org/10.1007/s11356-013-1876-6
Babel MS, Badgujar GB, Shinde VR (2015) Using the mutual information technique to select explanatory variables in artificial neural networks for rainfall forecasting. Meteorol Appl 22:610–616. https://doi.org/10.1002/met.1495
Bao Z, Zhang J, Wang G, Fu G, He R, Yan X, Jin J, Liu Y, Zhang A (2012) Attribution for decreasing streamflow of the Haihe River basin, northern China: climate variability or human activities? J Hydrol 460–461:117–129. https://doi.org/10.1016/j.jhydrol.2012.06.054
Bennett ND, Croke BFW, Guariso G, Guillaume JHA, Hamilton SH, Jakeman AJ, Marsili-Libelli S, Newham LTH, Norton JP, Perrin C, Pierce SA, Robson B, Seppelt R, Voinov AA, Fath BD, Andreassian V (2013) Characterising performance of environmental models. Environ Model Softw 40:1–20. https://doi.org/10.1016/j.envsoft.2012.09.011
Cao Q, Yu G, Sun S, Dou Y, Li H, Qiao Z (2021) Monitoring water quality of the Haihe River based on ground-based hyperspectral remote sensing. Water 14:22. https://doi.org/10.3390/w14010022
Dang B, Mao D, Xu Y, Luo Y (2017) Conjugative multi-resistant plasmids in Haihe River and their impacts on the abundance and spatial distribution of antibiotic resistance genes. Water Res 111:81–91. https://doi.org/10.1016/j.watres.2016.12.046
Fernando TMKG, Maier HR, Dandy GC (2009) Selection of input variables for data driven models: an average shifted histogram partial mutual information estimator approach. J Hydrol 367:165–176. https://doi.org/10.1016/j.jhydrol.2008.10.019
Galelli S, Humphrey GB, Maier HR, Castelletti A, Dandy GC, Gibbs MS (2014) An evaluation framework for input variable selection algorithms for environmental data-driven models. Environ Model Softw 62:33–51. https://doi.org/10.1016/j.envsoft.2014.08.015
Gnanadesikan R, Kettenring JR (1972) Robust estimates, residuals, and outlier detection with multiresponse data. Biometrics 28:81. https://doi.org/10.2307/2528963
Graves A, Jaitly N (2014) Towards end-to-end speech recognition with recurrent neural networks. In: Xing EP, Jebara T (eds) Proceedings of the 31st International Conference on Machine Learning. PMLR, Beijing, China, pp 1764–1772
Hamrick JM (1992) A three-dimensional environmental fluid dynamics computer code: theoretical and computational aspects. Special report in applied marine science and ocean engineering, no. 317. Virginia Institute of Marine Science, College of William and Mary. 64.
Hochreiter S, Schmidhuber J (1997) Long short-term memory. Neural Comput 9:1735–1780. https://doi.org/10.1162/neco.1997.9.8.1735
Hu C, Wu Q, Li H, Jian S, Li N, Lou Z (2018) Deep learning with a long short-term memory networks approach for rainfall-runoff simulation. Water (Switzerland) 10. https://doi.org/10.3390/w10111543
Hunter JD (2007) Matplotlib: a 2D graphics environment. Comput Sci Eng 9:90–95. https://doi.org/10.1109/MCSE.2007.55
Jiang Y, Li C, Sun L, Guo D, Zhang Y, Wang W (2021) A deep learning algorithm for multi-source data fusion to predict water quality of urban sewer networks. J Clean Prod 318:128533. https://doi.org/10.1016/j.jclepro.2021.128533
Kim M, Gerba CP, Choi CY (2010) Assessment of physically-based and data-driven models to predict microbial water quality in open channels. J Environ Sci 22:851–857. https://doi.org/10.1016/S1001-0742(09)60188-1
Kratzert F, Klotz D, Herrnegger M, Sampson AK, Hochreiter S, Nearing GS (2019) Toward improved predictions in Ungauged Basins: exploiting the power of machine learning. Water Resour Res 55:11344–11354. https://doi.org/10.1029/2019WR026065
Le XH, Ho HV, Lee G, Jung S (2019) Application of long short-term memory (LSTM) neural network for flood forecasting. Water (Switzerland) 11. https://doi.org/10.3390/w11071387
Li L, Jiang P, Xu H, Lin G, Guo D, Wu H (2019) Water quality prediction based on recurrent neural network and improved evidence theory: a case study of Qiantang River, China. Environ Sci Pollut Res 26:19879–19896. https://doi.org/10.1007/s11356-019-05116-y
Li R (2018) Water quality forecasting of Haihe River based on improved fuzzy time series model. DWT 106:285–291. https://doi.org/10.5004/dwt.2018.22085
Liang N, Zou Z, Wei Y (2019) Regression models (SVR, EMD and FastICA) in forecasting water quality of the Haihe River of China. DWT 154:147–159. https://doi.org/10.5004/dwt.2019.24034
Liang Z, Zou R, Chen X, Ren T, Su H, Liu Y (2020) Simulate the forecast capacity of a complicated water quality model using the long short-term memory approach. J Hydrol 581. https://doi.org/10.1016/j.jhydrol.2019.124432
Liu X-b, Peng W-q, He G-j, Liu J-l, Wang Y-c (2008) A coupled model of hydrodynamics and water quality for Yuqiao Reservoir in Haihe River Basin. J Hydrodyn 20:574–582. https://doi.org/10.1016/S1001-6058(08)60097-9
Lv N, Liang X, Chen C, Zhou Y, Li J, Wei H, Wang H (2020) A long short-term memory cyclic model with mutual information for hydrology forecasting: a case study in the xixian basin. Adv Water Resour 141. https://doi.org/10.1016/j.advwatres.2020.103622
Maier HR, Dandy GC (2000) Neural networks for the prediction and forecasting of water resources variables: a review of modelling issues and applications. Environ Model Softw 15:101–124. https://doi.org/10.1016/S1364-8152(99)00007-9
Maier HR, Dandy GC (1996) The use of artificial neural networks for the prediction of water quality parameters. Water Resour Res 32:1013–1022. https://doi.org/10.1029/96WR03529
Maier HR, Jain A, Dandy GC, Sudheer KP (2010) Methods used for the development of neural networks for the prediction of water resource variables in river systems: current status and future directions. Environ Model Softw 25:891–909. https://doi.org/10.1016/j.envsoft.2010.02.003
May RJ, Dandy GC, Maier HR, Nixon JB (2008) Application of partial mutual information variable selection to ANN forecasting of water quality in water distribution systems. Environ Model Softw 23:1289–1299. https://doi.org/10.1016/j.envsoft.2008.03.008
Mikolov T, Karafiát M, Burget L, Jan C, Khudanpur S (2010) Recurrent neural network based language model, in: Proceedings of the 11th Annual Conference of the International Speech Communication Association, INTERSPEECH 2010. pp 1045–1048. https://doi.org/10.21437/interspeech.2010-343
Moriasi DN, Arnold JG, Van Liew MW, Bingner RL, Harmel RD, Veith TL (2007) Model evaluation guidelines for systematic quantification of accuracy in watershed simulations. Trans ASABE 50:885–900. https://doi.org/10.13031/2013.23153
Najah Ahmed A, Binti Othman F, Abdulmohsin Afan H, Khaleel Ibrahim R, Ming Fai C, Shabbir Hossain M, Ehteram M, Elshafie A (2019) Machine learning methods for better water quality prediction. J Hydrol 578. https://doi.org/10.1016/j.jhydrol.2019.124084
Nash JE, Sutcliffe JV (1970) River flow forecasting through conceptual models part I - a discussion of principles. J Hydrol 10:282–290. https://doi.org/10.1016/0022-1694(70)90255-6
Palani S, Liong SY, Tkalich P (2008) An ANN application for water quality forecasting. Mar Pollut Bull 56:1586–1597. https://doi.org/10.1016/j.marpolbul.2008.05.021
Pearson K, Lee A (1900) Mathematical contributions to the theory of evolution. VIII. On the Inheritance of Characters not Capable of Exact Quantitative Measurement. Part I. Introductory. Part II. On the Inheritance of Coat-Colour in Horses. Part III. On the Inheritance of Eye-Co. Philosophical Transactions of the Royal Society of London. Series a, Containing Papers of a Mathematical or Physical Character 195:79–150
Quilty J, Adamowski J, Khalil B, Rathinasamy M (2016) Bootstrap rank-ordered conditional mutual information (broCMI): a nonlinear input variable selection method for water resources modeling. Water Resour Res 52:2299–2326. https://doi.org/10.1002/2015WR016959
Read JS, Jia X, Willard J, Appling AP, Zwart JA, Oliver SK, Karpatne A, Hansen GJA, Hanson PC, Watkins W, Steinbach M, Kumar V (2019) Process-guided deep learning predictions of lake water temperature. Water Resour Res 55:9173–9190. https://doi.org/10.1029/2019WR024922
Reichstein M, Camps-Valls G, Stevens B, Jung M, Denzler J, Carvalhais N, Prabhat (2019) Deep learning and process understanding for data-driven Earth system science. Nature 566:195–204. https://doi.org/10.1038/s41586-019-0912-1
Reshef DN, Reshef YA, Finucane HK, Grossman SR, McVean G, Turnbaugh PJ, Lander ES, Mitzenmacher M, Sabeti PC (2011) Detecting novel associations in large data sets. Science 334:1518–1524. https://doi.org/10.1126/science.1205438
Ritter A, Muñoz-Carpena R (2013) Performance evaluation of hydrological models: statistical significance for reducing subjectivity in goodness-of-fit assessments. J Hydrol 480:33–45. https://doi.org/10.1016/j.jhydrol.2012.12.004
Santhi C, Srinivasan R, Arnold JG, Williams JR (2006) A modeling approach to evaluate the impacts of water quality management plans implemented in a watershed in Texas. Environ Model Softw 21:1141–1157. https://doi.org/10.1016/j.envsoft.2005.05.013
Song C, Yao L, Hua C, Ni Q (2021) A novel hybrid model for water quality prediction based on synchrosqueezed wavelet transform technique and improved long short-term memory. J Hydrol 603:126879. https://doi.org/10.1016/j.jhydrol.2021.126879
Sudriani Y, Ridwansyah I, Rustini HA (2019) Long short term memory (LSTM) recurrent neural network (RNN) for discharge level prediction and forecast in Cimandiri river, Indonesia, in: IOP Conference Series: Earth and Environmental Science. Institute of Physics Publishing, p 012037. https://doi.org/10.1088/1755-1315/299/1/012037
Sutskever I, Vinyals O, Le QV (2014) Sequence to sequence learning with neural networks, in: Advances in Neural Information Processing Systems. Neural information processing systems foundation, pp 3104–3112.
Tiyasha, Minh Tung T, MundherYaseen Z (2020) A survey on river water quality modelling using artificial intelligence models: 2000–2020. J Hydrol 585:124670. https://doi.org/10.1016/j.jhydrol.2020.124670
Votruba L (ed) (1988) Systems in water resource management. In: Developments in Water Science. Elsevier, pp 38–86. https://doi.org/10.1016/S0167-5648(08)70921-3
Wang C, Shan B, Zhang H, Zhao Y (2014) Limitation of spatial distribution of ammonia-oxidizing microorganisms in the Haihe River, China, by heavy metals. J Environ Sci 26:502–511. https://doi.org/10.1016/S1001-0742(13)60443-X
Wang Y, Zhou J, Chen K, Wang Y, Liu L (2017) Water quality prediction method based on LSTM neural network. In: Li T, Lopez LM, Li Y (eds) 2017 12th International Conference on Intelligent Systems and Knowledge Engineering (IEEE ISKE).
Xiang Z, Demir I (2020) Distributed long-term hourly streamflow predictions using deep learning – a case study for State of Iowa. Environ Model Softw 131:104761. https://doi.org/10.1016/j.envsoft.2020.104761
Xiang Z, Yan J, Demir I (2020) A rainfall‐runoff model with LSTM‐based sequence‐to‐sequence learning. Water Resour Res 56. https://doi.org/10.1029/2019WR025326
Zhang J, Zhu Y, Zhang X, Ye M, Yang J (2018) Developing a long short-term memory (LSTM) based model for predicting water table depth in agricultural areas. J Hydrol 561:918–929. https://doi.org/10.1016/j.jhydrol.2018.04.065
Zhang L, Zou ZH, Zhao YF (2016) Application of chaotic prediction model based on wavelet transform on water quality prediction. IOP Conf Ser Earth Environ Sci 39:012001. https://doi.org/10.1088/1755-1315/39/1/012001
Zhang X, Jiang HL, Zhang YZ (2012) The hybrid method to predict biochemical oxygen demand of Haihe River in China. AMR 610–613:1066–1069. https://doi.org/10.4028/www.scientific.net/AMR.610-613.1066
Zhang Y, Li C, Jiang Y, Sun L, Zhao R, Yan K, Wang W (2022) Accurate prediction of water quality in urban drainage network with integrated EMD-LSTM model. J Clean Prod 354:131724. https://doi.org/10.1016/j.jclepro.2022.131724
Zheng M, Zheng H, Wu Y, Xiao Y, Du Y, Xu W, Lu F, Wang X, Ouyang Z (2015) Changes in nitrogen budget and potential risk to the environment over 20 years (1990–2010) in the agroecosystems of the Haihe Basin, China. J Environ Sci 28:195–202. https://doi.org/10.1016/j.jes.2014.05.053
Zhou Y (2020) Real-time probabilistic forecasting of river water quality under data missing situation: deep learning plus post-processing techniques. J Hydrol 589. https://doi.org/10.1016/j.jhydrol.2020.125164
Zhu Y, Drake S, Lü H, Xia J (2010) Analysis of temporal and spatial differences in eco-environmental carrying capacity related to water in the Haihe river basins, China. Water Resour Manage 24:1089–1105. https://doi.org/10.1007/s11269-009-9487-1
Acknowledgements
The authors would like to acknowledge the Hebei Provincial Academy of Ecological Environmental Science, China (http://www.hebhky.cn/) for their data.
Funding
This work was supported by the National Natural Science Foundation of China (no. 41807471) and the Open Research Fund Program of MNR Key Laboratory for Geo-Environmental Monitoring of Great Bay Area (SZU51029202010).
Author information
Contributions
All authors contributed to the study’s conception and design. QL and YW designed all experiments. QL conducted all experiments and analyzed the results. The first draft of the manuscript was written by QL, and all authors commented on previous versions of the manuscript. All authors read and approved the final manuscript.
Ethics declarations
Ethics approval
The authors express their ethical approval of the contents of the submitted work.
Consent to participate
The authors express their consent to have participated in the submitted work.
Consent for publication
The authors state that the data used is in the public domain and may be published.
Competing interests
The authors declare no competing interests.
Additional information
Responsible Editor: Xianliang Yi
Publisher's note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Highlights
1. Different water quality indicators in the Haihe River Basin are predicted using LSTM.
2. LSTM is more accurate when increasing the time steps of input variables properly.
3. LSTM performs better for BOD, CODMn, CODCr, and TP than for NH3–N, DO, and pH.
4. LSTM performance is linearly related to autocorrelation/cross-correlation of inputs.
Appendix A LSTM model structure
A memory block, whose state at time \(t\) is illustrated in Fig. 3, consists of a forget gate, an input gate, a memory cell, and an output gate (Hochreiter and Schmidhuber 1997). After the previous computation at time \(t-1\), both the cell state \(C_{t-1}\) and the output \(h_{t-1}\) are stored by the memory block; the initial values \(C_{0}\) and \(h_{0}\) are zero. At time \(t\), new inputs \(X_{t}\) become available. First, the forget gate, which determines what information to remove, passes the combination of \(h_{t-1}\) and \(X_{t}\) through a sigmoid function to generate a value \(f_{t}\) between 0 and 1, which determines how much of \(C_{t-1}\) is allowed to pass (Eq. (10)). Meanwhile, a new candidate cell state \(\widetilde{C}_{t}\) and its coefficient are generated by Eqs. (11) and (12), respectively. Thereafter, the new cell state \(C_{t}\) is determined according to Eq. (13). Next, the output gate produces a value \(o_{t}\) that determines which parts of the cell state to output, based on Eq. (14). Finally, the output \(h_{t}\) is calculated by Eq. (15).
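The update described above corresponds to the standard LSTM formulation (Hochreiter and Schmidhuber 1997), which can be written as follows. The symbol \(i_{t}\) for the coefficient of the candidate cell state, the concatenation notation \([h_{t-1}, X_{t}]\), the element-wise product \(\odot\), and the subscripts on \(W\) and \(b\) are assumptions here and may differ from the authors' exact notation.

```latex
\begin{align}
f_{t} &= \sigma\left(W_{f}\,[h_{t-1}, X_{t}] + b_{f}\right) \tag{10} \\
\widetilde{C}_{t} &= \tanh\left(W_{C}\,[h_{t-1}, X_{t}] + b_{C}\right) \tag{11} \\
i_{t} &= \sigma\left(W_{i}\,[h_{t-1}, X_{t}] + b_{i}\right) \tag{12} \\
C_{t} &= f_{t} \odot C_{t-1} + i_{t} \odot \widetilde{C}_{t} \tag{13} \\
o_{t} &= \sigma\left(W_{o}\,[h_{t-1}, X_{t}] + b_{o}\right) \tag{14} \\
h_{t} &= o_{t} \odot \tanh\left(C_{t}\right) \tag{15}
\end{align}
```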
In Eqs. (10), (11), (12), and (14), \(W\) denotes the weight matrices of the gates or cells indicated by the corresponding subscripts, and \(b\) denotes the learnable biases. \(\sigma\) and \(\tanh\) denote the sigmoid function and the hyperbolic tangent function, respectively.
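The memory-block update can also be sketched in code. The following is a minimal NumPy sketch of one standard LSTM time step under stated assumptions, not the authors' implementation: the function name `lstm_step`, the dictionary layout of `W` and `b` (keys `"f"`, `"c"`, `"i"`, `"o"`), the layer sizes, and the random inputs are all hypothetical.

```python
import numpy as np

def sigmoid(z):
    """Logistic function, mapping any real value into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One memory-block update from time t-1 to time t.

    W and b are dicts keyed by 'f', 'c', 'i', 'o' (a hypothetical layout);
    each W[k] has shape (n_hidden, n_hidden + n_in).
    """
    z = np.concatenate([h_prev, x_t])        # combine h_{t-1} and X_t
    f_t = sigmoid(W["f"] @ z + b["f"])       # forget gate
    c_tilde = np.tanh(W["c"] @ z + b["c"])   # candidate cell state
    i_t = sigmoid(W["i"] @ z + b["i"])       # coefficient of the candidate (input gate)
    c_t = f_t * c_prev + i_t * c_tilde       # new cell state
    o_t = sigmoid(W["o"] @ z + b["o"])       # output gate
    h_t = o_t * np.tanh(c_t)                 # output of the block
    return h_t, c_t

# Hypothetical sizes and random data, for illustration only.
rng = np.random.default_rng(0)
n_in, n_hidden = 3, 4
W = {k: rng.normal(scale=0.1, size=(n_hidden, n_hidden + n_in)) for k in "fcio"}
b = {k: np.zeros(n_hidden) for k in "fcio"}

h, c = np.zeros(n_hidden), np.zeros(n_hidden)  # C_0 and h_0 start at zero
for x_t in rng.normal(size=(5, n_in)):         # five time steps of inputs
    h, c = lstm_step(x_t, h, c, W, b)
```

Because the output gate lies in (0, 1) and tanh lies in (-1, 1), every element of the output stays strictly inside (-1, 1), whatever the inputs.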
Rights and permissions
Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Li, Q., Yang, Y., Yang, L. et al. Comparative analysis of water quality prediction performance based on LSTM in the Haihe River Basin, China. Environ Sci Pollut Res 30, 7498–7509 (2023). https://doi.org/10.1007/s11356-022-22758-7
DOI: https://doi.org/10.1007/s11356-022-22758-7