Comparative analysis of water quality prediction performance based on LSTM in the Haihe River Basin, China

  • Research Article
Environmental Science and Pollution Research

Abstract

The Haihe River Basin is the most water-scarce and water-polluted region in China, so water quality prediction is urgently needed there for water resource management. Long short-term memory (LSTM) networks have been widely used for water quality forecasting in recent years. Before LSTM is adopted in a specific basin, its performance and adaptability for predicting different water quality indicators need to be examined. However, very few studies have compared the prediction accuracy achieved for different water quality indicators or analyzed the causes of the differences, especially in the Haihe River Basin. In this study, LSTM was employed to predict biochemical oxygen demand (BOD), permanganate index (CODMn), dissolved oxygen (DO), ammonia nitrogen (NH3–N), total phosphorus (TP), hydrogen ion concentration (pH), and chemical oxygen demand digested by potassium dichromate (CODCr). Results under 24 different input conditions show that LSTM predicts BOD, CODMn, CODCr, and TP (median Nash–Sutcliffe efficiency of 0.766, 0.835, 0.837, and 0.711, respectively) better than NH3–N, DO, and pH (median Nash–Sutcliffe efficiency of 0.638, 0.625, and 0.229, respectively). In addition, the prediction performance of LSTM is linearly related to the maximum of the temporal autocorrelation and cross-correlation coefficients of the water quality indicators, calculated with the maximal information coefficient, with coefficients of determination of 0.79 to approximately 0.80. This study provides new knowledge and support for the practical application and improvement of LSTM in water quality prediction.
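
The abstract reports accuracy as the Nash–Sutcliffe efficiency (NSE). As a reminder of what that score measures, the following is a minimal sketch of the standard NSE definition, one minus the ratio of the squared prediction error to the squared deviation of the observations from their mean. The function name and the sample values are illustrative assumptions and are not taken from the study.

```python
def nash_sutcliffe_efficiency(observed, simulated):
    """Return NSE = 1 - sum((obs - sim)^2) / sum((obs - mean(obs))^2)."""
    if len(observed) != len(simulated) or not observed:
        raise ValueError("series must be non-empty and of equal length")
    mean_obs = sum(observed) / len(observed)
    # Squared error of the predictions.
    sse = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    # Squared deviation of the observations from their own mean.
    spread = sum((o - mean_obs) ** 2 for o in observed)
    if spread == 0:
        raise ValueError("NSE is undefined for a constant observed series")
    return 1.0 - sse / spread


# Made-up values for illustration only; not data from the study.
observed = [2.0, 3.0, 4.0, 5.0]
perfect = list(observed)   # a prediction identical to the observations
mean_only = [3.5] * 4      # always predicting the observed mean

nse_perfect = nash_sutcliffe_efficiency(observed, perfect)
nse_mean = nash_sutcliffe_efficiency(observed, mean_only)
```

A perfect prediction scores 1, and a model that only predicts the observed mean scores 0, which gives a scale for reading the reported medians.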

Data availability

Not applicable.

Acknowledgements

The authors would like to acknowledge the Hebei Provincial Academy of Ecological Environmental Science, China (http://www.hebhky.cn/) for their data.

Funding

This work was supported by the National Natural Science Foundation of China (no. 41807471) and the Open Research Fund Program of MNR Key Laboratory for Geo-Environmental Monitoring of Great Bay Area (SZU51029202010).

Author information

Contributions

All authors contributed to the study’s conception and design. QL and YW designed all experiments. QL conducted all experiments and analyzed the results. The first draft of the manuscript was written by QL, and all authors commented on previous versions of the manuscript. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Yonggui Wang.

Ethics declarations

Ethics approval

The authors express their ethical approval of the contents of the submitted work.

Consent to participate

The authors express their consent to have participated in the submitted work.

Consent for publication

The authors state that the data used is in the public domain and may be published.

Competing interests

The authors declare no competing interests.

Additional information

Responsible Editor: Xianliang Yi

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Highlights

1. Different water quality indicators in the Haihe River Basin are predicted using LSTM.

2. LSTM becomes more accurate when the number of time steps of the input variables is increased appropriately.

3. LSTM performs better for BOD, CODMn, CODCr, and TP than for NH3–N, DO, and pH.

4. LSTM performance is linearly related to autocorrelation/cross-correlation of inputs.
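
Highlight 2 refers to the number of time steps of the input variables. The sketch below illustrates one common way of turning a series into fixed-length input windows. It assumes a single indicator and a one-step-ahead target. The function name `make_windows` and the sample series are hypothetical, and the input configurations actually used in the study are not described in this excerpt.

```python
def make_windows(series, time_steps):
    """Split a series into (input window, next value) training pairs."""
    if time_steps < 1:
        raise ValueError("time_steps must be at least 1")
    return [
        (series[i:i + time_steps], series[i + time_steps])
        for i in range(len(series) - time_steps)
    ]


# Made-up series for illustration only; not data from the study.
series = [1.0, 2.0, 3.0, 4.0, 5.0]
pairs_2 = make_windows(series, time_steps=2)
pairs_3 = make_windows(series, time_steps=3)
```

Longer windows give the model more history per sample but leave fewer samples: here two time steps yield three pairs, and three time steps yield two.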

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary file1 (1.21 MB)

Appendix A LSTM model structure

A memory block, whose state at time \(t\) is illustrated in Fig. 3, consists of a forget gate, an input gate, a memory cell, and an output gate (Hochreiter and Schmidhuber 1997). After the computation at time \(t-1\), both the cell state \(C_{t-1}\) and the output \(h_{t-1}\) are stored by the memory block; the initial values \(C_{0}\) and \(h_{0}\) are zero. At time \(t\), new inputs \(X_{t}\) become available. First, the forget gate, which determines what information to remove, combines \(h_{t-1}\) and \(X_{t}\) in a sigmoid function to generate a value \(f_{t}\) between 0 and 1 that sets how much of \(C_{t-1}\) is allowed to pass (Eq. (10)). Meanwhile, a new candidate cell state \(\widetilde{c}_{t}\) and its coefficient \(i_{t}\) are generated by Eqs. (11) and (12), respectively. Thereafter, the new cell state \(C_{t}\) is determined according to Eq. (13). Next, the output gate produces a value \(o_{t}\) that determines which parts of the cell state to output, based on Eq. (14). Finally, the output \(h_{t}\) is calculated by Eq. (15).

$$f_{t} = \sigma\left(W_{xf} X_{t} + W_{hf} h_{t-1} + b_{f}\right)$$
(10)
$$\widetilde{c}_{t} = \tanh\left(W_{xc} X_{t} + W_{hc} h_{t-1} + b_{c}\right)$$
(11)
$$i_{t} = \sigma\left(W_{xi} X_{t} + W_{hi} h_{t-1} + b_{i}\right)$$
(12)
$$C_{t} = f_{t} * C_{t-1} + i_{t} * \widetilde{c}_{t}$$
(13)
$$o_{t} = \sigma\left(W_{xo} X_{t} + W_{ho} h_{t-1} + b_{o}\right)$$
(14)
$$h_{t} = o_{t} * \tanh\left(C_{t}\right)$$
(15)

In Eqs. (10), (11), (12), and (14), \(W\) denotes the weight matrices of the gates or cells indicated by the corresponding subscripts, and \(b\) denotes the learnable biases. \(\sigma\) and \(\tanh\) denote the sigmoid function and the hyperbolic tangent function, respectively, and \(*\) in Eqs. (13) and (15) denotes element-wise multiplication.
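
The update described by Eqs. (10) to (15) can be sketched in plain Python as a single time step of one memory block. This is an illustrative sketch, not the authors' implementation: the function names, the dictionary layout of the parameters, and the helper `zero_params` are assumptions, and the keys `b_i` and `b_o` are this sketch's labels for the input-gate and output-gate biases. Matrices are stored as lists of rows so that the block needs only the standard library.

```python
import math


def sigmoid(z):
    """Logistic sigmoid, the sigma of Eqs. (10), (12), and (14)."""
    return 1.0 / (1.0 + math.exp(-z))


def affine(W_x, x, W_h, h, b):
    """Return W_x x + W_h h + b for matrices stored as lists of rows."""
    return [
        sum(w * v for w, v in zip(row_x, x))
        + sum(w * v for w, v in zip(row_h, h))
        + bias
        for row_x, row_h, bias in zip(W_x, W_h, b)
    ]


def lstm_step(x_t, h_prev, c_prev, p):
    """Advance one memory block from time t-1 to time t."""
    # Eq. (10): forget gate.
    f_t = [sigmoid(z) for z in affine(p["W_xf"], x_t, p["W_hf"], h_prev, p["b_f"])]
    # Eq. (11): candidate cell state.
    c_tilde = [math.tanh(z) for z in affine(p["W_xc"], x_t, p["W_hc"], h_prev, p["b_c"])]
    # Eq. (12): input gate, the coefficient of the candidate.
    i_t = [sigmoid(z) for z in affine(p["W_xi"], x_t, p["W_hi"], h_prev, p["b_i"])]
    # Eq. (13): new cell state (element-wise products).
    c_t = [f * c + i * g for f, c, i, g in zip(f_t, c_prev, i_t, c_tilde)]
    # Eq. (14): output gate.
    o_t = [sigmoid(z) for z in affine(p["W_xo"], x_t, p["W_ho"], h_prev, p["b_o"])]
    # Eq. (15): block output.
    h_t = [o * math.tanh(c) for o, c in zip(o_t, c_t)]
    return h_t, c_t


def zero_params(n_in, n_hidden):
    """Hypothetical helper: all-zero weights and biases, for the demo only."""
    p = {}
    for gate in "fcio":
        p["W_x" + gate] = [[0.0] * n_in for _ in range(n_hidden)]
        p["W_h" + gate] = [[0.0] * n_hidden for _ in range(n_hidden)]
        p["b_" + gate] = [0.0] * n_hidden
    return p


# Made-up input and a non-zero previous cell state, for illustration only.
params = zero_params(n_in=3, n_hidden=2)
h_1, c_1 = lstm_step([0.4, 1.2, 7.0], [0.0, 0.0], [1.0, -2.0], params)
```

With all-zero parameters every gate evaluates to 0.5 and the candidate cell state to 0, so the step simply halves the previous cell state. In a trained model the weights and biases would instead be learned from data, and the first step would start from zero cell state and output as stated above.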

Rights and permissions

Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Li, Q., Yang, Y., Yang, L. et al. Comparative analysis of water quality prediction performance based on LSTM in the Haihe River Basin, China. Environ Sci Pollut Res 30, 7498–7509 (2023). https://doi.org/10.1007/s11356-022-22758-7

  • DOI: https://doi.org/10.1007/s11356-022-22758-7

Keywords