Article

Catchment Attributes Influencing Performance of Global Streamflow Reanalysis

Southern Marine Science and Engineering Guangdong Laboratory (Zhuhai) and Key Laboratory for Water Security in the Guangdong-Hongkong-Macao Greater Bay Area, School of Civil Engineering, Sun Yat-sen University, Guangzhou 510275, China
Water 2024, 16(24), 3582; https://doi.org/10.3390/w16243582
Submission received: 18 October 2024 / Revised: 29 November 2024 / Accepted: 9 December 2024 / Published: 12 December 2024
(This article belongs to the Section Hydrology)
Figure 1. Spatial distributions of KGE (a) and its three components (b–d).
Figure 2. Heatmap of the correlation between KGE and its components (r, γ, β) and catchment attributes. The cells with hatching indicate a p-value of <0.05.
Figure 3. Summary plots. Each dot corresponds to a catchment, and the vertical distribution indicates the density.
Figure 4. Additive effects of two attributes. The first box shows the SHAP value of the first attribute. The following three boxes represent the SHAP values of the first attribute combined with the low, intermediate, and high values of the second attribute.
Figure 5. SHAP dependence plots depicting the mechanism by which the SHAP values change along with catchment attributes. Each subfigure represents a scatter plot of the SHAP value of the attribute versus its corresponding attribute value. Each dot represents a catchment and the density indicates the concentration of the dots (a–l).
Figure 6. Spatial pattern of the key catchment attributes based on SHAP values. The spatial distribution of the clusters for primary drivers (a); and the boxplots of key catchment attributes’ SHAP values of each cluster (b).
Figure 7. Relative contribution of key catchment attributes for seasonal KGE: (a) December–January–February (DJF); (b) March–April–May (MAM); (c) June–July–August (JJA); and (d) September–October–November (SON).
Figure 8. SHAP dependence plots depicting the mechanism by which the SHAP values change with each key catchment attribute by season. Each subfigure represents a scatter plot of the SHAP value for the attribute across the four seasons versus its corresponding attribute value. Each dot represents a catchment (a–l).
Figure 9. Heatmap of SHAP interaction values across four seasons.

Abstract

Performance plays a critical role in the practical use of global streamflow reanalysis. This paper presents the combined use of random forest and Shapley additive explanations (SHAP) to examine the mechanism by which catchment attributes influence the accuracy of streamflow estimates in reanalysis products. In particular, the streamflow reanalysis generated by the Global Flood Awareness System is validated against streamflow observations provided by the Catchment Attributes and MEteorology for Large-sample Studies dataset. Results highlight that, in terms of the Kling–Gupta efficiency, the reanalysis surpasses the mean flow benchmark in 93% of catchments across the continental United States. In addition, twelve catchment attributes are identified as major controlling factors, with spatial patterns categorized into five clusters. Topographic characteristics and climatic indices are also observed to exhibit pronounced influences. Streamflow reanalysis performs better in catchments with low precipitation seasonality and steep slopes or in wet catchments with a low frequency of precipitation events. For most key attributes, the direction of the partial dependence plot slopes is consistent across the four seasons, but their magnitudes vary. Seasonal snow exhibits positive effects during snow melting from March to August and negative effects associated with snowpack accumulation from September to February. Catchments with very low precipitation seasonality (values less than −1) show strong seasonal variation in streamflow estimations, with negative effects from June to November and positive effects from December to May. Overall, this paper provides useful information for applications of global streamflow reanalysis and lays the groundwork for further research into understanding the seasonal effects of catchment attributes.

1. Introduction

Global hydrological models (GHMs) have become an established tool to simulate water resources worldwide [1,2,3,4]. They are increasingly adopted for the assessment of climate change impacts, serving as the basis of adaptive management [5]. In practice, GHMs are used to investigate alterations in streamflow trends, variability, and timing; to assess flooding risk and drought occurrence; and to evaluate impacts on environmental flow [6,7,8,9]. Global streamflow reanalysis, generated by using climate reanalysis data to drive GHMs, plays a key role in supporting the management of water resources at catchment, regional, and continental scales because of its ability to provide spatially and temporally continuous streamflow information [5,10]. However, producing accurate streamflow hydrographs remains challenging because of the heterogeneity of landscape properties and the complexity of interactions between climatic inputs and catchment characteristics [11,12].
The performance of GHMs plays a critical role in their practical applications [13,14,15]. To facilitate the applications of GHMs, end users must assess their performance by comparing their outputs against observed data to identify their strengths and weaknesses [16]. Previous studies in large-sample hydrology, which examined data from numerous catchments across diverse regions and climates, have investigated the links between hydrological processes, catchment characteristics, and climate factors [17,18,19]. Hydrological regimes are identified by clustering analysis, and then hydrological processes are related to different geophysical and climatic drivers [12,20]. Meanwhile, the meta-analysis results highlight the existence of some catchment attributes, for example, catchment area, aridity, and elevation, which commonly influence the predictive performance of hydrological models [19]. However, the intensive computation required by the large number of parameters in GHMs often hinders the analysis of key drivers of their performance over large domains [21]. Moreover, the influence of catchment attributes on the accuracy of streamflow estimates in global reanalysis has seasonal variations, which are yet to be fully explored [22,23]. In addition, the effects of various catchment attributes are cross-dependent and they can interact nonlinearly [20].
Machine learning (ML) and explainable artificial intelligence (XAI) are effective in extracting nonlinear relationships between catchment attributes and hydrological signatures [24]. ML has demonstrated its power to identify the key catchment attributes and to provide a comprehensive understanding of their role in hydrological processes. Although the aforementioned studies highlight that catchment attributes influence the performance of hydrological models, relatively few studies focus on GHMs and on the specific influence of individual catchment attribute values [13]. Additionally, the influence of seasonal variations and of interactions between attributes has not yet been investigated. As an XAI approach, Shapley additive explanations (SHAP) reverse-engineer the output of a machine learning model to understand the relationships between inputs and outputs learned by the model [25]. SHAP values not only reflect the global importance of catchment attributes for the entire dataset but also evaluate individual samples [26]. In addition, partial dependence plots (PDPs) show the marginal effect of a given predictor variable on the prediction, which enables an understanding of how specific catchment attribute values influence the accuracy of the reanalysis hydrographs. SHAP also enables the analysis of interactions between attributes.
In this study, the combined use of ML and XAI is presented to identify the key catchment attributes influencing the performance of global streamflow reanalysis [27]. In particular, a random forest was utilized to fit nonlinear relationships between the performance of the global streamflow reanalysis and catchment attributes [28]; SHAP values were used to measure the contribution of each feature [26]; and PDPs were utilized to visualize the sensitivity of the prediction to each feature [29]. The main objectives of this study are as follows: (1) to evaluate the simulated hydrographs in the global streamflow reanalysis across 671 catchments over the continental United States (CONUS); (2) to identify the key catchment attributes that influence the performance; and (3) to detect the seasonal effects of key catchment attributes. The results illustrate which catchment attributes influence the ability of global streamflow reanalysis to produce reliable hydrographs.

2. Data

2.1. Global Streamflow Reanalysis

The Global Flood Awareness System (GloFAS) streamflow reanalysis v2.1 is produced by coupling the land surface model runoff component of the ECMWF ERA5 global reanalysis [30] with the LISFLOOD hydrological and channel routing model [4,31]. The ERA5 atmospheric reanalysis provides essential variables for simulating runoff, including precipitation, daily mean surface air temperature, relative humidity, incoming solar radiation, net longwave radiation, and mean wind speed. ERA5 runoff is first generated for each grid cell and then routed by LISFLOOD to generate streamflow. The parameters underpinning operational GloFAS were calibrated using daily streamflow observations from 1287 stations across the globe, 41.6% of which are located in North America [32]. The GloFAS reanalysis covers the period from 1 January 1979 to near real-time at a daily time step and with a spatial resolution of 0.1°. Previous studies have shown that the GloFAS reanalysis hydrographs are representative of observed flows, outperforming a benchmark of mean annual discharge in 86% of 1801 case study catchments [27,33].

2.2. CAMELS Dataset

Streamflow observations are sourced from the Catchment Attributes and MEteorology for Large-sample Studies (CAMELS) dataset [24]. The CAMELS dataset, comprising a large number of catchments and a diverse range of attributes, is suitable for large-sample studies. For each basin, the CAMELS dataset provides a daily time series of meteorological forcing and streamflow and additional quantitative attributes. The daily streamflow observations derived from the United States Geological Survey streamflow gages cover the time period from 1980 to 2014. All catchments have a minimum of 20 years of continuous discharge records. This dataset is built upon the Model Parameter Estimation Experiment dataset to cover 671 headwater-type basins with minimal human influence. The size of the 671 catchments ranges from 4 km² to 25,000 km² and the mean elevation varies from 15 m to 3529 m.
Catchment attributes are descriptors of landscape features that likely influence hydrological processes [24]. The CAMELS dataset covers a total of 43 climatic and physiographic attributes [34]. This large-sample dataset of catchment attributes is an extension of the N15 dataset, and it is obtained by integrating multiple available attribute datasets [18,34]. In general, the catchment attributes fall into five classes: topographic characteristics, climatic indices, land cover characteristics, soil characteristics, and geological characteristics. Addor et al. (2018) excluded some highly correlated attributes, for example, the leaf area index difference and green vegetation fraction, which are highly correlated with the leaf area index maximum, as well as soil porosity and conductivity, which are highly correlated with the sand fraction [24]. They then constructed a random forest model to evaluate the predictability of hydrological signatures using catchment attributes. Ultimately, 29 of the 43 attributes were selected as having a strong influence. These 29 attributes, presented in Table 1, were utilized in this study to investigate their influence on the performance of the GloFAS reanalysis.

3. Methods

3.1. Assessment Performance

First, streamflows from the GloFAS reanalysis product were aligned with those in the CAMELS observation dataset. Considering the possible mislocation of the river network used for streamflow routing [37], the neighboring grid cells were used for correction, and the time series of the GloFAS reanalysis was processed as follows: (1) the initial station cell was located in accordance with the latitude and longitude of the catchment gauge station; (2) the Kling–Gupta efficiency (KGE) was calculated between the GloFAS reanalysis and the observed catchment streamflow time series for the initial cell and its eight surrounding cells; (3) the cell with the largest KGE was taken as the target cell, and its latitude and longitude were saved; and (4) the streamflow reanalysis time series of the target cell was extracted based on its latitude and longitude.
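The four-step cell correction can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the gridded reanalysis is assumed to be a NumPy array of shape (time, rows, cols), the station cell is given by hypothetical row and column indexes rather than latitude and longitude, and the data are synthetic.

```python
import numpy as np

def kge(sim, obs):
    """Kling-Gupta efficiency between a simulated and an observed series."""
    r = np.corrcoef(sim, obs)[0, 1]
    beta = sim.mean() / obs.mean()
    gamma = (sim.std() / sim.mean()) / (obs.std() / obs.mean())
    return 1.0 - np.sqrt((r - 1) ** 2 + (beta - 1) ** 2 + (gamma - 1) ** 2)

def best_neighbor_cell(grid_flow, row, col, obs, score_fn=kge):
    """Return (score, row, col) of the best cell in the 3x3 window around (row, col)."""
    n_rows, n_cols = grid_flow.shape[1:]
    best = None
    for d_row in (-1, 0, 1):
        for d_col in (-1, 0, 1):
            r_idx, c_idx = row + d_row, col + d_col
            if not (0 <= r_idx < n_rows and 0 <= c_idx < n_cols):
                continue  # window cell falls outside the grid
            score = score_fn(grid_flow[:, r_idx, c_idx], obs)
            if not np.isfinite(score):
                continue  # e.g. a constant series has no defined correlation
            if best is None or score > best[0]:
                best = (score, r_idx, c_idx)
    return best

# Synthetic demo (assumption): the river is actually represented one cell to the east.
rng = np.random.default_rng(0)
n_days = 365
obs = rng.gamma(2.0, 5.0, size=n_days)
grid = rng.gamma(2.0, 5.0, size=(n_days, 5, 5))
grid[:, 2, 3] = 1.05 * obs  # neighboring cell that tracks the observations

score, target_row, target_col = best_neighbor_cell(grid, row=2, col=2, obs=obs)
target_series = grid[:, target_row, target_col]  # step (4): extract the series
print(target_row, target_col, round(score, 3))
```

Passing the score as a function keeps the search independent of the metric, so the same loop would work with another skill score.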
The KGE coefficient accounts for the bias ratio, variability ratio, and correlation in the evaluation of the GloFAS reanalysis, which is calculated as follows:
$$\mathrm{KGE} = 1 - \sqrt{(r - 1)^2 + (\beta - 1)^2 + (\gamma - 1)^2} \tag{1}$$
where r is Pearson’s linear correlation between observations and reanalysis; β is the measure of mean error; and γ is the measure of variability error. As shown in Equation (2), the correlation measures how well the reanalysis can capture the temporal pattern:
$$r = \frac{\sum_{t=1}^{T} (sim_t - \mu_{sim})(obs_t - \mu_{obs})}{\sqrt{\sum_{t=1}^{T} (sim_t - \mu_{sim})^2 \cdot \sum_{t=1}^{T} (obs_t - \mu_{obs})^2}} \tag{2}$$
The bias ratio measures how well the model captures the water balance.
$$\beta = \frac{\mu_{sim}}{\mu_{obs}} \tag{3}$$
The variability ratio quantifies the model’s ability to capture the streamflow variability.
$$\gamma = \frac{\sigma_{sim} / \mu_{sim}}{\sigma_{obs} / \mu_{obs}} \tag{4}$$
For the three components of KGE, the optimal value for the correlation coefficient is 1, indicating a perfect match between the trends of reanalysis and observations. The ideal value for the bias ratio is also 1, signifying that the mean values of reanalysis and observations are identical. Similarly, the variability ratio achieves its best value at 1, representing equal coefficients of variation for reanalysis and observations [38,39].
To ease the interpretation of a large sample dataset and to avoid bias introduced by the highly negative KGE values, the C2M transformation method proposed by Mathevet et al. (2006) was utilized [40].
$$\mathrm{C2M}: \quad \mathrm{KGE}^{*} = \frac{\mathrm{KGE}}{2 - \mathrm{KGE}} \tag{5}$$
In general, the transformed KGE* is bounded within (−1, 1], with 1 being both the optimal and the maximum possible value. Notably, because the transformation is monotonically increasing, it does not alter the performance rankings among different catchments.
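As a worked illustration of the KGE decomposition and the C2M transformation, the sketch below computes r, β, γ, KGE, and KGE* for a toy series. The function names and the toy numbers are assumptions made for this example only.

```python
import numpy as np

def kge_components(sim, obs):
    """Return (KGE, r, beta, gamma) for simulated and observed series."""
    sim = np.asarray(sim, dtype=float)
    obs = np.asarray(obs, dtype=float)
    r = np.corrcoef(sim, obs)[0, 1]                               # correlation
    beta = sim.mean() / obs.mean()                                # bias ratio
    gamma = (sim.std() / sim.mean()) / (obs.std() / obs.mean())   # variability ratio (CV ratio)
    kge = 1.0 - np.sqrt((r - 1) ** 2 + (beta - 1) ** 2 + (gamma - 1) ** 2)
    return kge, r, beta, gamma

def c2m(kge):
    """C2M transformation: maps KGE in (-inf, 1] to (-1, 1]."""
    return kge / (2.0 - kge)

# Toy case: the simulation has the right timing and relative variability,
# but only half of the observed volume.
obs = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
sim = 0.5 * obs
kge, r, beta, gamma = kge_components(sim, obs)
print(kge, r, beta, gamma, c2m(kge))

# Very negative scores are compressed toward -1 while the ordering is kept.
raw = np.array([-100.0, -0.41, 0.17, 1.0])
transformed = c2m(raw)
print(transformed)
```

In the toy case only the bias term is non-zero, so KGE equals 1 − |β − 1| = 0.5 and KGE* equals 0.5 / 1.5.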

3.2. Random Forest Modeling

Random forest is a popular ML algorithm that extends the standard classification and regression tree by creating a collection of trees with binary divisions [28,41]. The focus in this study was on catchments where streamflow reanalysis hydrographs were deemed comparable to the observed values. The criterion used to select catchments was a KGE value > −0.41, indicating that the streamflow reanalysis hydrograph performs better than simply using the observed mean streamflow, or benchmark, as a predictor [27]. To circumvent overfitting, the number of model predictors must be reduced to as few as possible while retaining predictive accuracy [42]. The Recursive Feature Elimination algorithm was used to extract the optimal features by removing predictors one by one and evaluating the influence on prediction accuracy. This process selected 12 key attributes from the 29 considered attributes. To further investigate the influence of catchment attributes on performance, this study also explored the seasonal effects of these 12 key attributes on the KGE prediction. For implementation, a 10-fold cross-validation was used to select the random forest parameters with the highest validation R² value. The final model has 200 estimators, a random state (seed) of 42, and a maximum tree depth of 20, and its average validation R² is 0.55.
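The workflow described above can be sketched with scikit-learn. The sketch rests on several assumptions: the data are synthetic stand-ins for the 29 attributes and the KGE target, `RFE` is scikit-learn's implementation of recursive feature elimination (the source does not name the software used), the elimination step uses a smaller forest to keep the example fast, and only the reported final setting is evaluated rather than a full parameter search.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_selection import RFE
from sklearn.model_selection import KFold, cross_val_score

# Synthetic stand-in (assumption): 300 catchments, 29 attributes, and a target
# that depends on only a few of them.
rng = np.random.default_rng(0)
n_catchments, n_attributes = 300, 29
X = rng.normal(size=(n_catchments, n_attributes))
y = (2.0 * X[:, 0] - 1.5 * X[:, 1] + X[:, 2] * X[:, 3]
     + rng.normal(scale=0.1, size=n_catchments))

# Recursive feature elimination: drop one predictor per round until 12 remain.
selector = RFE(
    estimator=RandomForestRegressor(n_estimators=50, random_state=42),
    n_features_to_select=12,
    step=1,
)
selector.fit(X, y)
selected = np.flatnonzero(selector.support_)

# 10-fold cross-validation of a forest with the reported final setting.
model = RandomForestRegressor(n_estimators=200, max_depth=20, random_state=42)
cv = KFold(n_splits=10, shuffle=True, random_state=42)
scores = cross_val_score(model, X[:, selected], y, cv=cv, scoring="r2")

print("selected attribute indexes:", selected)
print("mean validation R2:", scores.mean())
```

A parameter search in the spirit of the text would repeat the cross-validation for each candidate setting and keep the one with the highest mean validation R².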

3.3. Shapley Additive Explanations

As an important XAI method, the TreeExplainer-based SHAP framework was adopted to quantify the importance of catchment attributes [26]. SHAP leverages the principles of Shapley values to quantify the contribution of each feature [43]. In particular, the method begins by obtaining the model’s “base value”, which is computed as the average of all predicted values. Subsequently, SHAP breaks down the final prediction into the contributions of the individual input features [44].
$$g(z) = \phi_0 + \sum_{j=1}^{M} \phi_j z_j \tag{6}$$
and:
$$\phi_i = \sum_{S \subseteq N \setminus \{i\}} \frac{|S|!\,(M - |S| - 1)!}{M!} \left[ f_x(S \cup \{i\}) - f_x(S) \right] \tag{7}$$
where $g$ is the explanation model; $z \in \{0, 1\}^M$ is the coalition vector; $\phi_0$ is the mean of the predicted values (the base value); $M$ is the number of input features; $N$ is the set of all input features; $\phi_j \in \mathbb{R}$ is the feature attribution for a feature $j$; $S$ is a subset of features, that is, the set of non-zero indexes in $z$; and $f_x$ is the expected value of the function conditioned on a subset of input features.
When using the SHAP values, individual samples can have their own set of variable importance, that is, local variable importance [25]. Variables that consistently demonstrate importance in the predictions across all samples are referred to as variables of global importance. The global importance can be obtained as the average of absolute SHAP values for each catchment attribute. Catchment attributes with larger absolute Shapley values have higher predictive power.
$$I_j = \frac{1}{n} \sum_{i=1}^{n} \left| \phi_j^{(i)} \right| \tag{8}$$
where $I_j$ is the SHAP feature importance; $j$ and $i$ index the input variable and data sample, respectively; and $n$ is the number of samples. The SHAP dependence plots were used to investigate the mechanism by which the catchment attributes’ importance changes as their values vary. If the SHAP value is greater than 0, then the feature has a positive effect on the model output and vice versa.
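To make Equations (6)–(8) concrete, the sketch below evaluates the Shapley sum by brute-force enumeration of all coalitions for a toy three-feature model. It is an illustration only: TreeExplainer uses a tree-specific algorithm rather than enumeration, the toy `predict` function is hypothetical, and f_x(S) is approximated here by averaging predictions over a background sample with the features in S fixed to the values of x.

```python
import itertools
import math
import numpy as np

def shapley_values(predict, x, background):
    """Exact Shapley values for one sample x by enumerating all coalitions."""
    n_features = x.shape[0]

    def f_x(subset):
        # Expected prediction with the features in `subset` fixed to x.
        data = background.copy()
        idx = list(subset)
        if idx:
            data[:, idx] = x[idx]
        return predict(data).mean()

    phi = np.zeros(n_features)
    for i in range(n_features):
        others = [j for j in range(n_features) if j != i]
        for size in range(n_features):
            weight = (math.factorial(size) * math.factorial(n_features - size - 1)
                      / math.factorial(n_features))
            for subset in itertools.combinations(others, size):
                phi[i] += weight * (f_x(subset + (i,)) - f_x(subset))
    return phi, f_x(())  # attributions and base value phi_0

def predict(data):
    """Hypothetical model with one interaction term."""
    return 2.0 * data[:, 0] - data[:, 1] + data[:, 0] * data[:, 2]

rng = np.random.default_rng(0)
background = rng.normal(size=(200, 3))
samples = rng.normal(size=(5, 3))

results = [shapley_values(predict, x, background) for x in samples]
phis = np.array([phi for phi, _ in results])
base_value = results[0][1]

# Equation (8): global importance as the mean absolute Shapley value per feature.
importance = np.abs(phis).mean(axis=0)
print("base value:", base_value)
print("global importance:", importance)
```

For every sample, the attributions sum to the difference between the prediction and the base value, which is the additive property that Equation (6) expresses.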

3.4. Self-Organizing Map Clustering

The self-organizing map (SOM) network is an unsupervised method that automatically identifies patterns in the samples [45]. It autonomously organizes and adjusts network parameters without the need for manual intervention during training. The network consists only of an input layer and an output layer. Input vectors are fed into the network through the input layer, and the network performs competitive learning based on a clustering rule to determine the winning neuron. Using the neighborhood radius and learning rate, the weight vectors are automatically adjusted, moving closer to or further away from the input vectors. Through continuous learning and adjustment, all input samples eventually converge to the appropriate positions. The SOM clustering method is used to classify the CAMELS data based on SHAP values of key catchment attributes.
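The competitive-learning loop described above can be sketched in a few lines of NumPy. This is a minimal one-dimensional SOM under stated assumptions: the input matrix is a synthetic stand-in for the SHAP values of the 12 key attributes, the output layer has five units to mirror the five clusters, and the learning-rate and neighborhood schedules are illustrative choices that the source does not report.

```python
import numpy as np

def train_som(data, n_units=5, n_iter=2000, lr0=0.5, sigma0=1.5, seed=0):
    """Train a one-dimensional SOM and return the unit weight vectors."""
    rng = np.random.default_rng(seed)
    # Initialize the weights from randomly chosen input samples.
    weights = data[rng.choice(len(data), n_units, replace=False)].astype(float)
    positions = np.arange(n_units)  # unit coordinates on the output layer
    for step in range(n_iter):
        frac = step / n_iter
        lr = lr0 * (1.0 - frac)                    # decaying learning rate
        sigma = sigma0 * (1.0 - frac) + 1e-3       # shrinking neighborhood radius
        x = data[rng.integers(len(data))]
        winner = np.argmin(((weights - x) ** 2).sum(axis=1))   # competitive step
        influence = np.exp(-((positions - winner) ** 2) / (2.0 * sigma ** 2))
        weights += lr * influence[:, None] * (x - weights)     # move toward the input
    return weights

def assign_clusters(data, weights):
    """Label each sample with the index of its nearest (winning) unit."""
    dist = ((data[:, None, :] - weights[None, :, :]) ** 2).sum(axis=2)
    return dist.argmin(axis=1)

# Synthetic stand-in (assumption): 600 catchments x 12 SHAP values.
rng = np.random.default_rng(1)
shap_matrix = rng.normal(scale=0.05, size=(600, 12))
weights = train_som(shap_matrix)
labels = assign_clusters(shap_matrix, weights)
print(np.bincount(labels, minlength=5))
```

Selecting the number of units with the silhouette score, as done in the study, would repeat this procedure for several values of `n_units` and compare the resulting labelings.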

4. Results

4.1. Performance of Streamflow Reanalysis

The performance of the GloFAS reanalysis is shown in Figure 1. High KGE values are observed in the West and the East. By contrast, relatively low KGE values are observed in the semi-arid Great Plains region, the arid Southwest, the Central region around the Great Lakes, and Florida. The KGE value for the mean flow benchmark, that is, using the observed mean streamflow as a baseline, is approximately −0.41 [46], and a KGE higher than −0.41 is observed across 93% of catchments. The median KGE across all catchments is 0.17 with an interquartile range of −0.01 to 0.33. Poor KGE values are generally due to erroneous variability and bias ratios. For example, in the Mississippi River Basin (as shown in Figure 1c), many catchments are represented by red dots, indicating a variability ratio close to 0, which suggests that the variability in the streamflow reanalysis hydrograph is much smaller than that of the observed data. All catchments show a positive correlation, with a median Pearson’s correlation coefficient of 0.35 (Figure 1b). This indicates a relatively good alignment between the trends of the streamflow reanalysis hydrograph and observations. The GloFAS reanalysis shows lower variability than observations in 78% of catchments, that is, a variability ratio of <1, with a median variability ratio of 0.71. By contrast, the variability of the GloFAS reanalysis is higher than that of the observations in the western mountain area (Figure 1c). In 65% of the catchments, the GloFAS reanalysis underestimates the mean flow (bias ratio of <1), and the median bias ratio is 0.88 (Figure 1d).
Spearman’s rank correlation between the KGE, its three components, and the considered catchment attributes is presented in Figure 2. The cells with hatching indicate a p-value of <0.05. Compared with soil, land cover, and geological characteristics, topographic characteristics and climatic indices exhibit higher correlation coefficients. The KGE is positively correlated with the mean slope, mean elevation, and forest fraction. By contrast, the KGE is negatively correlated with precipitation seasonality, aridity, frequency of high precipitation, and depth to the bedrock. Catchment attributes with a Spearman’s rank correlation coefficient greater than 0.2 do not show statistical significance. A p-value greater than 0.05 indicates no significant linear or monotonic relationship. For the three components of the KGE, the correlation patterns with catchment attributes are almost consistent with the pattern of the KGE, particularly in the sign and relative magnitude of the correlations.

4.2. Effects of Catchment Attributes

The relative importance of the key catchment attributes influencing the accuracy of streamflow estimates in the GloFAS reanalysis product is illustrated in Figure 3. The attributes are ranked along the y-axis by their SHAP feature importance, while the direction of their effects is shown on the x-axis. Positive SHAP values indicate more accurate streamflow estimates in the GloFAS reanalysis, that is, a higher KGE. The color gradient indicates the values of the catchment attributes. Climatic indices and topographic characteristics have a remarkable influence, whereas attributes related to soil, land cover, and geology exhibit limited influence. The two primary drivers are precipitation seasonality and mean slope, both with long right tails. Therefore, catchments with a precipitation peak in winter or steeper slopes have more accurate streamflow reanalysis hydrographs. Similarly, the catchment area has tails on both sides. This result confirms previous findings, that is, streamflow in smaller catchments is generally more difficult to simulate, whereas larger catchments have better performance [27]. In addition, the LAI maximum, soil, and subsurface permeability do not have remarkable effects.
The combined effects of the catchment attributes with tailing effects are shown in Figure 4. Figure 3 demonstrates a tailing effect for precipitation seasonality, mean slope, and catchment area; for example, there is a positive correlation between catchment area and the KGE, with a few small catchments leading to very low SHAP values. Some catchments, e.g., a small basin with a low slope, high seasonality, and high aridity, would therefore have poor KGE values. The additive effects of two attributes are utilized to analyze how a pair of catchment attributes jointly influences the accuracy of streamflow estimates in reanalysis products. Catchments are divided into the smallest third, intermediate third, and largest third based on the values of catchment attributes. Figure 4 shows that as aridity increases, the SHAP values for catchments with high seasonality decrease. As the slope increases, the SHAP values for small catchments increase. This indicates that in catchments with high seasonality, higher aridity exacerbates the degradation of the accuracy of streamflow estimates in streamflow reanalysis. In small catchments, a flatter slope leads to more severe degradation of streamflow reanalysis estimate accuracy.
PDPs are used to analyze how key catchment attributes enhance or deteriorate model performance, as shown in Figure 5. The red horizontal line is used to differentiate between positive and negative effects. The locally weighted scatterplot smoothing (LOWESS), a non-parametric regression method for smoothing data, is used to fit the nonlinear trends in these scatterplots. The influence patterns of the drivers are divided into three types: (1) increasing PDP slopes; (2) decreasing PDP slopes; and (3) nonlinear PDP slopes. The LOWESS curves decrease as the aridity, precipitation seasonality, frequency of high-precipitation events, and subsurface permeability increase. As shown in Figure 5c, when the catchment mean slope is less than 100 m/km, its effect increases monotonically and approximately linearly. The performance is highly sensitive to a small catchment area and aridity (Figure 5c,e). The SHAP values are greater than 0 when the precipitation seasonality is less than −0.3, the slope is greater than 47 m/km, the aridity is less than 0.87, the frequency of high-precipitation events is less than 22 days/year, the catchment area is greater than 480 km², and the forest fraction is greater than 0.52, indicating that attribute values within these ranges enhance model performance. In addition, nonmonotonic trends are observed between performance and catchment attributes such as clay fraction, sand fraction, and LAI maximum.
The spatial patterns of key drivers for each catchment are shown in Figure 6. SOM is applied to the SHAP values of key catchment attributes, with the number of clusters determined to be five using the silhouette score. Streamflow reanalysis performs better in cluster 1 and cluster 2. Cluster 1 is located in the northwest of the CONUS, where precipitation seasonality and catchment slope have positive effects, as shown in the boxplots on the left. Massman (2020) found that the hydrological model’s performance and precipitation have a positive and significant correlation in this area [20]. Cluster 2 is situated in the northern Appalachian Mountains and is characterized by high SHAP values for extreme precipitation frequency, aridity, elevation, forest cover, and soil properties compared with other clusters. Massman (2020) also highlights a correlation between the hydrological model’s performance and the baseflow index in this region [20]. Cluster 3 is found in the Southeastern U.S., featuring low elevation, gentle slopes, and high forest cover. Compared with cluster 2, cluster 3 experiences more intense precipitation [25]. Cluster 4 is located in the high-latitude regions near the Great Lakes, where the fraction of precipitation falling as snow, the clay fraction, and LAI maximum have a negative effect on the performance of the streamflow reanalysis. Cluster 5 is mainly distributed in the Southwestern and Central U.S., dominated by aridity, with the lowest SHAP values for aridity, frequency of high-precipitation events, and forest fraction among all clusters.

4.3. Seasonality Effects of Key Catchment Attributes

The relative contribution of catchment attributes to the performance of streamflow-reanalysis-simulated hydrographs is investigated by season using the SHAP values. As shown in Figure 7, topographic characteristics and climatic indices collectively account for nearly three-quarters of the total contribution. The catchment area and mean slope are the factors within the topographic characteristics that exhibit the greatest seasonal variation, particularly with a substantial increase during March, April, and May (MAM). An increased contribution from mean elevation is observed during September, October, and November (SON), as well as December, January, and February (DJF). Regarding climatic indices, the contribution patterns vary notably by season. Aridity maintains a relatively high contribution across all seasons, with marked effects in SON. The contribution of precipitation seasonality is more pronounced in DJF and MAM, whereas the frequency of high-precipitation events plays a more substantial role in June, July, and August (JJA), as well as SON. Furthermore, the proportion of precipitation falling as snow has an increased influence in JJA and DJF. Compared with other seasons, the LAI maximum shows increased contributions in SON and DJF, and the geological factors are more influential in MAM and JJA.
The relationships between catchment attributes and SHAP values are shown by season in Figure 8. Different colors are used to represent different seasons. Although the magnitudes vary by season, the direction of the effects of most catchment attributes is consistent across the four seasons. Catchments with very low precipitation seasonality (values less than −1) show strong seasonal variation in streamflow estimations, with lower accuracy from June to November and higher accuracy from December to May (Figure 8a). In snow-dominated catchments with a high fraction of precipitation falling as snow (values greater than 0.5), snow exhibits positive effects during snow melting from March to August and negative effects associated with snowpack accumulation from September to February (Figure 8i). In catchments with a high frequency of high-precipitation events, model performance deteriorates most strongly in JJA (Figure 8d). Meanwhile, in catchments with higher elevations (above 2000 m), the negative influence on model performance during DJF becomes more pronounced as elevation increases (Figure 8f). Moreover, catchment attributes related to soil and geological characteristics exhibit a greater sensitivity in affecting model performance during the wet season. In particular, subsurface permeability is more sensitive in MAM and JJA, and the sand fraction shows a different pattern in JJA compared with other seasons (Figure 8j). The LAI maximum is more sensitive in DJF and SON (Figure 8k) and a low forest fraction amplifies sensitivity in DJF (Figure 8g).
SHAP interaction values reveal the extent of pairwise contributions of key catchment attributes to the performance (Figure 9). Most of the SHAP interaction values across the four seasons exhibit small magnitudes, indicating that individual contributions dominate within each season. Nevertheless, specific catchment attribute interactions are highly prominent, and they vary with the seasons. During DJF, the aridity and fraction of precipitation falling as snow have the most prominent interactions. The values in the first five rows are larger than in the other seasons, indicating that snow makes the interactions among topographic characteristics more influential [47]. The interaction between area and mean slope has the largest magnitude in MAM. In addition, precipitation seasonality and subsurface permeability exhibit a substantial interaction. The interacting effects of the frequency of high-precipitation events with mean slope, and of the frequency of high-precipitation events with soil characteristics, are more pronounced in JJA. This suggests that the modeling of infiltration excess overland flow is affected by the interaction between high summer precipitation and mean slope [48]. By contrast, the aridity and LAI maximum have a remarkable interaction effect with topographic characteristics and precipitation seasonality in SON. This suggests that the partitioning of precipitation into runoff, evaporation, and storage becomes more variable during the arid season [20].

5. Discussion

5.1. Key Control Factors on Streamflow Reanalysis Discharge Simulations

The GloFAS-ERA5 streamflow reanalysis has great application potential for local water management [27]. Building upon previous studies that assess its performance against observed data, this paper presents a novel XAI-based investigation that links the performance with catchment attributes without relying on computationally intensive hydrological modeling. The results confirm previous findings that the key drivers of hydrological model performance are the seasonality and timing of precipitation, aridity, catchment area, and fraction of precipitation falling as snow [19,20]. Meanwhile, this paper highlights the effects of the catchment’s mean slope, forest fraction, and soil, which differ from findings typically associated with lumped models. This difference arises primarily because the input data for distributed GHMs include soil texture, vegetation, and river routing, among other factors. This result suggests that more accurate DEM and vegetation data play an important role in improving hydrograph simulations by GHMs [1,49,50]. Moreover, the spatial patterns of the control factors indicate that streamflow reanalysis performs better in catchments with low precipitation seasonality and steep slopes or in wet catchments with a low frequency of high-precipitation events.

5.2. Relationship Between Streamflow Reanalysis Performance and Catchment Attributes

The relationships between the performance and catchment attributes are revealed by PDPs, and the results of this paper agree with previous studies [20,51]. In particular, the performance decreases as the aridity and frequency of high-precipitation events increase, whereas it increases as the catchment area and mean slope increase. Notably, this paper extends previous research by revealing the seasonal effects of catchment attributes on the performance. The effects of the seasonality and timing of precipitation, as well as of the fraction of precipitation falling as snow, exhibit significant seasonal variations. Snow has a positive effect during snowmelt in MAM and JJA and a negative effect associated with snowpack accumulation in SON and DJF. Catchments with very low precipitation seasonality (values less than −1) show strong seasonal variations in streamflow estimations, highlighting the relationship between model performance and the seasonal soil water content of the catchment. Catchments with smaller LAI maximum values exhibit lower SHAP values in DJF and SON, indicating that GHMs inadequately capture vegetation changes in regions with a low LAI maximum. PDPs allow us to determine SHAP values under specific attribute conditions. Leveraging the additive principle of SHAP [26], it is further possible to infer SHAP values for various complex combinations of catchment attributes, offering valuable guidance for applying the GloFAS-ERA5 streamflow reanalysis in ungauged regions. The seasonal variations in the influence of catchment attributes provide valuable insights for improving the process descriptions in GHMs.
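The additive use of SHAP values described above can be sketched as follows: per-attribute SHAP values are read off dependence curves by interpolation and summed with the base value to give a rough performance estimate for a catchment without observations. Everything numeric here is hypothetical (the base value, the curve points, and the attribute values are not taken from the figures), and summing main-effect curves ignores interaction effects, so this is an approximation under stated assumptions rather than the procedure used in the paper.

```python
from bisect import bisect_right


def interpolate(curve, x):
    """Piecewise-linear lookup on sorted (attribute value, SHAP value) points.

    Values outside the sampled range are clamped to the nearest end point.
    """
    xs = [p[0] for p in curve]
    ys = [p[1] for p in curve]
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    k = bisect_right(xs, x)
    x0, x1, y0, y1 = xs[k - 1], xs[k], ys[k - 1], ys[k]
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


def estimate_performance(base_value, curves, attributes):
    """Additive estimate: base value plus one SHAP contribution per attribute."""
    contributions = {name: interpolate(curves[name], value)
                     for name, value in attributes.items()}
    return base_value + sum(contributions.values()), contributions


# HYPOTHETICAL numbers for illustration only; the directions follow the text
# (performance decreases with aridity and increases with mean slope).
BASE_VALUE = 0.40  # assumed mean model output over the training catchments
CURVES = {
    "aridity": [(0.2, 0.10), (1.0, 0.00), (3.0, -0.20)],
    "mean_slope": [(0.0, -0.05), (50.0, 0.00), (200.0, 0.06)],
}

ungauged = {"aridity": 2.0, "mean_slope": 125.0}
estimate, parts = estimate_performance(BASE_VALUE, CURVES, ungauged)
print(parts, round(estimate, 3))
```

Because the contributions are simply added, an estimate of this kind should be treated with caution in seasons or attribute ranges where the SHAP interaction values are large.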

5.3. Seasonal Variations in the Effects of Topographic Characteristics

Seasonal variations in catchment conditions, such as soil water content, vegetation changes, and freeze–thaw cycles, can induce seasonal changes in runoff generation processes and the seasonal water balance, which in turn affect the accuracy of hydrograph simulations across different seasons [23]. Topographic characteristics and soils do not change from season to season, yet their effects show seasonal variation. Rainfall and snowmelt dominate the hydrological budget during spring and summer, which may increase infiltration in excess of field capacity, leading to greater subsurface drainage [52]. Seasonal freezing and thawing of soil alters its hydrological properties, which can affect catchment soil water drainage [53]. In addition, the mean slope and area of the catchment contribute more during spring than in other seasons. Much of the spring runoff comes from melting snow, and its magnitude depends on the slope of the catchment and the spatial variability of the snowpack within it [54]. Moreover, SHAP interaction values reveal that topographic characteristics interact with climatic indices and that these interactions vary seasonally. Therefore, incorporating seasonal variations in hydrological model parameters is a practical approach to address deficiencies in GHMs and improve their performance [22].

6. Conclusions

This paper examines how catchment attributes influence the performance of global streamflow reanalysis. In particular, the performance was investigated across the CONUS based on the CAMELS dataset. The results indicate that the GloFAS reanalysis surpasses the mean flow benchmark in 93.4% of the catchments. The precipitation seasonality, mean slope, aridity, frequency of high-precipitation events, area, mean elevation, forest fraction, clay fraction, sand fraction, subsurface permeability, LAI maximum, and fraction of precipitation falling as snow are identified as key catchment attributes. The relationships between key catchment attributes and the performance revealed by PDPs are consistent with the previous literature, and this paper extends previous research by revealing the seasonal effects of key catchment attributes. The streamflow reanalysis performs better under low precipitation seasonality with steep slopes or in wet catchments with a low frequency of high-precipitation events, conditions found in cluster 1 and cluster 2, respectively. For most key attributes, the direction of the PDP slopes is consistent across the four seasons, but the magnitudes vary. Some catchment attributes, namely precipitation seasonality, frequency of high-precipitation events, LAI maximum, and fraction of precipitation falling as snow, show pronounced seasonal differences in their effects within certain ranges. Notably, the topographic characteristics themselves remain unchanged, but their effects on the performance show seasonal variations. Collectively, this paper provides useful information for practical applications of streamflow reanalysis and lays the groundwork for further research into understanding the seasonal effects of catchment attributes.

Funding

This research is supported by the Ministry of Science and Technology of China (2023YFF0804900), the National Natural Science Foundation of China (52379033 and 52409120), and the Guangdong Provincial Department of Science and Technology (2019ZT08G090).

Data Availability Statement

The original contributions presented in the study are included in the article; further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare no conflict of interest.

Open Research

The GloFAS-ERA5 streamflow reanalysis v2.1 can be downloaded from the Copernicus Climate Data Store (https://cds.climate.copernicus.eu/, accessed on 10 December 2024). The Catchment Attributes and Meteorology for Large-sample Studies (CAMELS) dataset can be sourced from the University Corporation for Atmospheric Research (https://ral.ucar.edu/solutions/products/camels, accessed on 10 December 2024).

References

  1. Alcamo, J.; Döll, P.; Henley, B.J.; Kaspar, F.; Lehner, B.; Rösch, T.; Siebert, S. Development and Testing of the WaterGAP 2 Global Model of Water Use and Availability. Hydrol. Sci. J. 2003, 48, 317–337.
  2. Döll, P.; Kaspar, F.; Lehner, B. A Global Hydrological Model for Deriving Water Availability Indicators: Model Tuning and Validation. J. Hydrol. 2003, 270, 105–134.
  3. Greve, P.; Burek, P.; Wada, Y. Using the Budyko Framework for Calibrating a Global Hydrological Model. Water Resour. Res. 2020, 56, e2019WR026280.
  4. Alfieri, L.; Lorini, V.; Hirpa, F.A.; Harrigan, S.; Zsoter, E.; Prudhomme, C.; Salamon, P. A Global Streamflow Reanalysis for 1980–2018. J. Hydrol. X 2020, 6, 100049.
  5. Burek, P.; Satoh, Y.; Kahil, T.; Tang, T.; Greve, P.; Smilovic, M.; Guillaumot, L.; Zhao, F.; Wada, Y. Development of the Community Water Model (CWatM v1.04)—A High-Resolution Hydrological Model for Global and Regional Assessment of Integrated Water Resources Management. Geosci. Model Dev. 2020, 13, 3267–3298.
  6. Gosling, S.N.; Zaherpour, J.; Mount, N.J.; Hattermann, F.F.; Dankers, R.; Arheimer, B.; Breuer, L.; Ding, J.; Haddeland, I.; Kumar, R.; et al. A Comparison of Changes in River Runoff from Multiple Global and Catchment-Scale Hydrological Models under Global Warming Scenarios of 1 °C, 2 °C and 3 °C. Clim. Change 2017, 141, 577–595.
  7. van der Wiel, K.; Wanders, N.; Selten, F.M.; Bierkens, M.F.P. Added Value of Large Ensemble Simulations for Assessing Extreme River Discharge in a 2 °C Warmer World. Geophys. Res. Lett. 2019, 46, 2093–2102.
  8. Yang, T.; Sun, F.; Gentine, P.; Liu, W.; Wang, H.; Yin, J.; Du, M.; Liu, C. Evaluation and Machine Learning Improvement of Global Hydrological Model-Based Flood Simulations. Environ. Res. Lett. 2019, 14, 114027.
  9. Messager, M.L.; Lehner, B.; Cockburn, C.; Lamouroux, N.; Pella, H.; Snelder, T.; Tockner, K.; Trautmann, T.; Watt, C.; Datry, T. Global Prevalence of Non-Perennial Rivers and Streams. Nature 2021, 594, 391–397.
  10. Emerton, R.E.; Stephens, E.M.; Pappenberger, F.; Pagano, T.C.; Weerts, A.H.; Wood, A.W.; Salamon, P.; Brown, J.D.; Hjerdt, N.; Donnelly, C.; et al. Continental and Global Scale Flood Forecasting Systems. WIREs Water 2016, 3, 391–418.
  11. McDonnell, J.J.; Sivapalan, M.; Vaché, K.; Dunn, S.; Grant, G.; Haggerty, R.; Hinz, C.; Hooper, R.; Kirchner, J.; Roderick, M.L.; et al. Moving beyond Heterogeneity and Process Complexity: A New Vision for Watershed Hydrology. Water Resour. Res. 2007, 43.
  12. Kuentz, A.; Arheimer, B.; Hundecha, Y.; Wagener, T. Understanding Hydrologic Variability across Europe through Catchment Classification. Hydrol. Earth Syst. Sci. 2017, 21, 2863–2879.
  13. Andersson, J.; Pechlivanidis, I.; Gustafsson, D.; Donnelly, C.; Arheimer, B. Key Factors for Improving Large-Scale Hydrological Model Performance. Eur. Water 2015, 49, 77–88.
  14. Zaherpour, J.; Gosling, S.N.; Mount, N.; Schmied, H.M.; Veldkamp, T.I.E.; Dankers, R.; Eisner, S.; Gerten, D.; Gudmundsson, L.; Haddeland, I.; et al. Worldwide Evaluation of Mean and Extreme Runoff from Six Global-Scale Hydrological Models That Account for Human Impacts. Environ. Res. Lett. 2018, 13, 065015.
  15. Merz, R.; Miniussi, A.; Basso, S.; Petersen, K.-J.; Tarasova, L. More Complex Is Not Necessarily Better in Large Scale Hydrological Modelling—A Model Complexity Experiment across the Contiguous United States. Bull. Am. Meteorol. Soc. 2022, 103, E1947–E1967.
  16. Ward, P.J.; Jongman, B.; Salamon, P.; Simpson, A.; Bates, P.; De Groeve, T.; Muis, S.; de Perez, E.C.; Rudari, R.; Trigg, M.A.; et al. Usefulness and Limitations of Global Flood Risk Models. Nat. Clim. Change 2015, 5, 712–715.
  17. Addor, N.; Do, H.X.; Alvarez-Garreton, C.; Coxon, G.; Fowler, K.; Mendoza, P.A. Large-Sample Hydrology: Recent Progress, Guidelines for New Datasets and Grand Challenges. Hydrol. Sci. J. 2020, 65, 712–725.
  18. Newman, A.J.; Clark, M.P.; Sampson, K.; Wood, A.; Hay, L.E.; Bock, A.; Viger, R.J.; Blodgett, D.; Brekke, L.; Arnold, J.R.; et al. Development of a Large-Sample Watershed-Scale Hydrometeorological Data Set for the Contiguous USA: Data Set Characteristics and Assessment of Regional Variability in Hydrologic Model Performance. Hydrol. Earth Syst. Sci. 2015, 19, 209–223.
  19. Parajka, J.; Viglione, A.; Rogger, M.; Salinas, J.L.; Sivapalan, M.; Blöschl, G. Comparative Assessment of Predictions in Ungauged Basins – Part 1: Runoff-Hydrograph Studies. Hydrol. Earth Syst. Sci. 2013, 17, 1783–1795.
  20. Massmann, C. Identification of Factors Influencing Hydrologic Model Performance Using a Top-down Approach in a Large Number of U.S. Catchments. Hydrol. Process. 2020, 34, 4–20.
  21. Yan, H.; Sun, N.; Eldardiry, H.; Thurber, T.B.; Reed, P.M.; Malek, K.; Gupta, R.; Kennedy, D.; Swenson, S.C.; Hou, Z.; et al. Large Ensemble Diagnostic Evaluation of Hydrologic Parameter Uncertainty in the Community Land Model Version 5 (CLM5). J. Adv. Model. Earth Syst. 2023, 15, e2022MS003312.
  22. Lan, T.; Lin, K.; Xu, C.-Y.; Liu, Z.; Cai, H. A Framework for Seasonal Variations of Hydrological Model Parameters: Impact on Model Results and Response to Dynamic Catchment Characteristics. Hydrol. Earth Syst. Sci. 2020, 24, 5859–5874.
  23. Deng, C.; Liu, P.; Wang, D.; Wang, W. Temporal Variation and Scaling of Parameters for a Monthly Hydrologic Model. J. Hydrol. 2018, 558, 290–300.
  24. Addor, N.; Nearing, G.; Prieto, C.; Newman, A.J.; Le Vine, N.; Clark, M.P. A Ranking of Hydrological Signatures Based on Their Predictability in Space. Water Resour. Res. 2018, 54, 8792–8812.
  25. Lin, X.; Fan, J.; Hou, Z.J.; Wang, J. Machine Learning of Key Variables Impacting Extreme Precipitation in Various Regions of the Contiguous United States. J. Adv. Model. Earth Syst. 2023, 15, e2022MS003334.
  26. Lundberg, S.M.; Erion, G.; Chen, H.; DeGrave, A.; Prutkin, J.M.; Nair, B.; Katz, R.; Himmelfarb, J.; Bansal, N.; Lee, S.-I. From Local Explanations to Global Understanding with Explainable AI for Trees. Nat. Mach. Intell. 2020, 2, 56–67.
  27. Harrigan, S.; Zsoter, E.; Alfieri, L.; Prudhomme, C.; Salamon, P.; Wetterhall, F.; Barnard, C.; Cloke, H.; Pappenberger, F. GloFAS-ERA5 Operational Global River Discharge Reanalysis 1979–Present. Earth Syst. Sci. Data 2020, 12, 2043–2060.
  28. Breiman, L. Random Forests. Mach. Learn. 2001, 45, 5–32.
  29. Huang, F.; Zhang, Y.; Zhang, Y.; Nourani, V.; Li, Q.; Li, L.; Shangguan, W. Towards Interpreting Machine Learning Models for Predicting Soil Moisture Droughts. Environ. Res. Lett. 2023, 18, 074002.
  30. Hersbach, H.; Bell, B.; Berrisford, P.; Hirahara, S.; Horányi, A.; Muñoz-Sabater, J.; Nicolas, J.; Peubey, C.; Radu, R.; Schepers, D.; et al. The ERA5 Global Reanalysis. Q. J. R. Meteorol. Soc. 2020, 146, 1999–2049.
  31. Van Der Knijff, J.M.; Younis, J.; De Roo, A.P.J. LISFLOOD: A GIS-based Distributed Model for River Basin Scale Water Balance and Flood Simulation. Int. J. Geogr. Inf. Sci. 2010, 24, 189–212.
  32. Hirpa, F.A.; Salamon, P.; Beck, H.E.; Lorini, V.; Alfieri, L.; Zsoter, E.; Dadson, S.J. Calibration of the Global Flood Awareness System (GloFAS) Using Daily Streamflow Data. J. Hydrol. 2018, 566, 595–606.
  33. Zhao, T.; Chen, Z.; Tu, T.; Yan, D.; Chen, X. Unravelling the Potential of Global Streamflow Reanalysis in Characterizing Local Flow Regime. Sci. Total Environ. 2022, 838, 156125.
  34. Addor, N.; Newman, A.J.; Mizukami, N.; Clark, M.P. The CAMELS Data Set: Catchment Attributes and Meteorology for Large-Sample Studies. Hydrol. Earth Syst. Sci. 2017, 21, 5293–5313.
  35. Pelletier, J.D.; Broxton, P.D.; Hazenberg, P.; Zeng, X.; Troch, P.A.; Niu, G.-Y.; Williams, Z.; Brunke, M.A.; Gochis, D. A Gridded Global Data Set of Soil, Intact Regolith, and Sedimentary Deposit Thicknesses for Regional and Global Land Surface Modeling. J. Adv. Model. Earth Syst. 2016, 8, 41–65.
  36. Miller, D.A.; White, R.A. A Conterminous United States Multilayer Soil Characteristics Dataset for Regional Climate and Hydrology Modeling. Earth Interact. 1998, 2, 1–26.
  37. Chen, H.; Liu, J.; Mao, G.; Wang, Z.; Zeng, Z.; Chen, A.; Wang, K.; Chen, D. Intercomparison of Ten ISI-MIP Models in Simulating Discharges along the Lancang-Mekong River Basin. Sci. Total Environ. 2021, 765, 144494.
  38. Gupta, H.V.; Kling, H.; Yilmaz, K.K.; Martinez, G.F. Decomposition of the Mean Squared Error and NSE Performance Criteria: Implications for Improving Hydrological Modelling. J. Hydrol. 2009, 377, 80–91.
  39. Kling, H.; Fuchs, M.; Paulin, M. Runoff Conditions in the Upper Danube Basin under an Ensemble of Climate Change Scenarios. J. Hydrol. 2012, 424, 264–277.
  40. Mathevet, T.; Michel, C.; Andréassian, V.; Perrin, C. A Bounded Version of the Nash-Sutcliffe Criterion for Better Model Assessment on Large Sets of Basins. IAHS-AISH Publ. 2006, 307, 211–219.
  41. Li, W.; Migliavacca, M.; Forkel, M.; Denissen, J.M.C.; Reichstein, M.; Yang, H.; Duveiller, G.; Weber, U.; Orth, R. Widespread Increasing Vegetation Sensitivity to Soil Moisture. Nat. Commun. 2022, 13, 3959.
  42. Gregorutti, B.; Michel, B.; Saint-Pierre, P. Correlation and Variable Importance in Random Forests. Stat. Comput. 2017, 27, 659–678.
  43. Shapley, L.S. Stochastic Games. Proc. Natl. Acad. Sci. 1953, 39, 1095–1100.
  44. Cai, X.; Li, L.; Fisher, J.B.; Zeng, Z.; Zhou, S.; Tan, X.; Liu, B.; Chen, X. The Responses of Ecosystem Water Use Efficiency to CO2, Nitrogen Deposition, and Climatic Drivers across China. J. Hydrol. 2023, 622, 129696.
  45. Kohonen, T. The Self-Organizing Map. Proc. IEEE 1990, 78, 1464–1480.
  46. Knoben, W.J.M.; Freer, J.E.; Woods, R.A. Technical Note: Inherent Benchmark or Not? Comparing Nash–Sutcliffe and Kling–Gupta Efficiency Scores. Hydrol. Earth Syst. Sci. 2019, 23, 4323–4331.
  47. Gudmundsson, L.; Wagener, T.; Tallaksen, L.M.; Engeland, K. Evaluation of Nine Large-Scale Hydrological Models with Respect to the Seasonal Runoff Climatology in Europe. Water Resour. Res. 2012, 48.
  48. Mai, J.; Craig, J.R.; Tolson, B.A.; Arsenault, R. The Sensitivity of Simulated Streamflow to Individual Hydrologic Processes across North America. Nat. Commun. 2022, 13, 455.
  49. Hoch, J.M.; Sutanudjaja, E.H.; Wanders, N.; van Beek, R.L.P.H.; Bierkens, M.F.P. Hyper-Resolution PCR-GLOBWB: Opportunities and Challenges from Refining Model Spatial Resolution to 1 km over the European Continent. Hydrol. Earth Syst. Sci. 2023, 27, 1383–1401.
  50. Moges, D.M.; Virro, H.; Kmoch, A.; Cibin, R.; Rohith, A.N.; Martínez-Salvador, A.; Conesa-García, C.; Uuemaa, E. How Does the Choice of DEMs Affect Catchment Hydrological Modeling? Sci. Total Environ. 2023, 892, 164627.
  51. Poncelet, C.; Merz, R.; Merz, B.; Parajka, J.; Oudin, L.; Andréassian, V.; Perrin, C. Process-Based Interpretation of Conceptual Hydrological Model Performance Using a Multinational Catchment Set. Water Resour. Res. 2017, 53, 7247–7268.
  52. Barnhart, T.B.; Molotch, N.P.; Livneh, B.; Harpold, A.A.; Knowles, J.F.; Schneider, D. Snowmelt Rate Dictates Streamflow. Geophys. Res. Lett. 2016, 43, 8006–8016.
  53. Niu, G.-Y.; Yang, Z.-L. Effects of Frozen Soil on Snowmelt Runoff and Soil Water Storage at a Continental Scale. J. Hydrometeorol. 2006, 7, 937–952.
  54. Freudiger, D.; Kohn, I.; Seibert, J.; Stahl, K.; Weiler, M. Snow Redistribution for the Hydrological Modeling of Alpine Catchments. WIREs Water 2017, 4, e1232.

Figure 1. Spatial distributions of KGE (a) and its three components (b–d).
Figure 2. Heatmap of the correlation between KGE and its components (r, γ, β) and catchment attributes. The cells with hatching indicate a p-value of <0.05.
Figure 3. Summary plots. Each dot corresponds to a catchment, and the vertical distribution indicates the density.
Figure 4. Additive effects of two attributes. The first box shows the SHAP value of the first attribute. The following three boxes represent the SHAP values of the first attribute combined with the low, intermediate, and high values of the second attribute.
Figure 5. SHAP dependence plots depicting how the SHAP values change with catchment attributes (a–l). Each subfigure is a scatter plot of the SHAP value of the attribute versus its corresponding attribute value. Each dot represents a catchment, and the density indicates the concentration of the dots.
Figure 6. Spatial pattern of the key catchment attributes based on SHAP values: (a) the spatial distribution of the clusters for primary drivers; and (b) the boxplots of the key catchment attributes’ SHAP values for each cluster.
Figure 7. Relative contribution of key catchment attributes for seasonal KGE: (a) December–January–February (DJF); (b) March–April–May (MAM); (c) June–July–August (JJA); and (d) September–October–November (SON).
Figure 8. SHAP dependence plots depicting how the SHAP values change with each key catchment attribute by season (a–l). Each subfigure is a scatter plot of the SHAP value for the attribute across the four seasons versus its corresponding attribute value. Each dot represents a catchment.
Figure 9. Heatmap of SHAP interaction values across four seasons.
Table 1. Considered catchment attributes.

| Type | Name | Unit | Data Source |
|---|---|---|---|
| Topographic characteristics | Area | km² | N15—USGS data |
| | Mean elevation | m | N15—USGS data |
| | Mean slope | m/km | N15—USGS data |
| Climatic indices | Precipitation seasonality | | N15—Daymet |
| | Fraction of precipitation falling as snow | | N15—Daymet |
| | Aridity | | N15—Daymet |
| | Frequency of high-precipitation events | days/year | N15—Daymet |
| | Duration of high-precipitation events | days | N15—Daymet |
| | Timing of high-precipitation events | season | N15—Daymet |
| | Timing of low-precipitation events | season | N15—Daymet |
| Soil characteristics | Depth to bedrock | m | Pelletier et al. (2016) [35] |
| | Soil depth | m | Miller and White (1998)—STATSGO [36] |
| | Sand fraction | % | Miller and White (1998)—STATSGO [36] |
| | Silt fraction | % | Miller and White (1998)—STATSGO [36] |
| | Clay fraction | % | Miller and White (1998)—STATSGO [36] |
| | Water fraction | % | Miller and White (1998)—STATSGO [36] |
| | Other fraction | % | Miller and White (1998)—STATSGO [36] |
| Land cover characteristics | Forest fraction | | N15—USGS data |
| | LAI maximum | | MODIS |
| | Green vegetation fraction difference | | MODIS |
| | Fraction of dominant land cover | | MODIS |
| | Dominant land cover | | MODIS |
| | Root depth 50% | m | MODIS |
| | Root depth 99% | m | MODIS |
| Geological characteristics | Dominant geological class | | GLiM |
| | Fraction of dominant geological class | | GLiM |
| | Fraction of carbonate rocks | | GLiM |
| | Subsurface porosity | | GLHYMPS |
| | Subsurface permeability | m² | GLHYMPS |

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Ding, X. Catchment Attributes Influencing Performance of Global Streamflow Reanalysis. Water 2024, 16, 3582. https://doi.org/10.3390/w16243582
