Data Descriptor
Open access
Published: 24 September 2020

Global terrestrial carbon fluxes of 1999–2019 estimated by upscaling eddy covariance data with a random forest

Jiye Zeng¹,
Tsuneo Matsunaga¹,
Zheng-Hong Tan²,
Nobuko Saigusa¹,
Tomoko Shirai¹,
Yanhong Tang³,
Shushi Peng ORCID: orcid.org/0000-0001-5098-726X⁴ &
…
Yoko Fukuda ORCID: orcid.org/0000-0002-6548-3619¹

Scientific Data volume 7, Article number: 313 (2020)

Subjects

Carbon cycle

Abstract

The terrestrial biosphere is a key player in slowing the accumulation of carbon dioxide in the atmosphere. While quantifying carbon fluxes at the global land scale is important for climate and carbon mitigation policy, measurements are available only at sites sparsely distributed around the world. Various methods have therefore been used to upscale site measurements to the whole terrestrial biosphere. This article reports a product obtained by using a Random Forest to upscale terrestrial net ecosystem exchange, gross primary production, and ecosystem respiration from FLUXNET 2015. Our product covers land from 60°S to 80°N with a spatial resolution of 0.1° × 0.1° every 10 days during the period 1999–2019. It was compared with four existing products. A distinguishing feature of our method is the use of three variables derived from leaf area index to represent plant functional type (PFT) so that measurements from different PFTs can be mixed better by the model. This product can be valuable for the carbon-cycle community to validate terrestrial biosphere models and cross-check datasets.

Measurement(s)	ecosystem-wide respiration • gross primary production • net ecosystem exchange • carbon flux • carbon dioxide
Technology Type(s)	Random forest
Factor Type(s)	year of data collection
Sample Characteristic - Environment	terrestrial biome • climate system
Sample Characteristic - Location	Earth (planet)

Machine-accessible metadata file describing the reported data: https://doi.org/10.6084/m9.figshare.12932882

Background & Summary

Terrestrial ecosystems are a major forcing that regulates climate through exchanges of energy and carbon with the atmosphere^1,2. As a large and persistent carbon sink^3,4, they have served to slow the accumulation of anthropogenic CO₂ in the atmosphere⁵. Over the 10 years from 2008 to 2017, the terrestrial carbon sink was estimated to have removed about 30% of fossil CO₂ emissions⁶. Because direct estimation of the sink from observations is difficult owing to the high variability of photosynthesis and biases in flux measurements⁷, the Global Carbon Project estimated the sink in early years as the residual of fossil CO₂ emissions minus the CO₂ accumulated in the atmosphere and removed by the oceans⁸, and more recently from an ensemble of process and inverse models⁶. Nevertheless, flux measurements by the eddy covariance technique⁹ have served as a benchmark for model validation^10,11,12,13 and provided fundamental knowledge on global carbon cycling^14,15,16.

Data-driven machine learning (DDML) methods have been used to combine flux tower measurements, remote sensing observations, and climate model data to upscale forest fluxes^{17,18,19,20,21,22,23,24,25,26,27,28}. The key characteristic that distinguishes a DDML method from others is that the functional relationships are not assumed but learned from patterns in the measurements. A DDML model is more objective than other types of models in that it does not subjectively impose conditions on the valid ranges of model parameters in the optimization process. In the early 2000s, Papale and Valentini¹⁷ used observations of the EUROFLUX project to train a neural network simulator to estimate carbon fluxes of European forests at the continental scale. The same method was later used to examine the effect of spatial sampling on the extrapolation of the neural network¹⁸. Other DDML methods were subsequently introduced, including support vector machines^19,20,21, model tree ensembles^22,23,24,25, and random forests^26,27,28. Results of DDML have been used to evaluate terrestrial biosphere models^9,10,11 and empirical data based on remote sensing¹².

This study presents a data-driven global gridded product for terrestrial net ecosystem exchange, gross primary production, and ecosystem respiration, obtained by using a Random Forest method to upscale FLUXNET 2015 to the land from 60°S to 80°N for the period 1999–2019. The spatial and temporal resolutions are 0.1° × 0.1° and 10 days, respectively. Although similar data-driven products using FLUXNET 2015 have been reported^27,28,29, datasets obtained by a different approach are valuable given the sources of uncertainty in data-driven methods, which include extrapolation to conditions unrepresented in the training data^17,18, selection of predictor variables^26,30, and product-specific biases of predictor variables³¹. The product can also be valuable for diagnosing the large disparities that exist among different types of models³².

Methods

Model setup

As illustrated in Fig. 1, the target variables of gross primary production (GPP), ecosystem respiration (RECO), and net ecosystem exchange (NEE) were modelled as nonlinear functions of leaf area index (LAI), fraction of absorbed photosynthetically active radiation (FAPAR), downward shortwave solar radiation at the surface (DSSR), air temperature (T2M) and relative humidity (RH2M) at 2 meters above the surface, and three variables derived from LAI to indicate plant functional type (PFT): the minimum (LAI_MIN) and maximum (LAI_MAX) of LAI in a year, and the number of LAI values in a year that are larger than the mean of LAI_MIN and LAI_MAX (LAI_COUNT). These derived variables directly reflect the spatial distributions of seasonality and leaf biomass. Replacing PFT with the three derived LAI variables is a new idea of this study.
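The three LAI-derived variables can be illustrated with a minimal sketch. The function name and the two 12-value series below are hypothetical and only serve the example; the actual product uses 10-day LAI from the Copernicus Global Land Service.

```python
def derive_lai_variables(lai_year):
    """Return (LAI_MIN, LAI_MAX, LAI_COUNT) for one grid cell and one year.

    lai_year is the sequence of LAI values available for that year.
    """
    lai_min = min(lai_year)
    lai_max = max(lai_year)
    # LAI_COUNT: how many values exceed the midpoint of the annual range.
    threshold = (lai_min + lai_max) / 2.0
    lai_count = sum(1 for value in lai_year if value > threshold)
    return lai_min, lai_max, lai_count


# Hypothetical series (illustration only, not taken from the product).
deciduous = [0.4, 0.4, 0.5, 1.0, 2.5, 4.5, 5.0, 4.8, 3.0, 1.0, 0.5, 0.4]
evergreen = [5.0, 5.1, 5.2, 5.4, 5.6, 5.8, 6.0, 5.9, 5.7, 5.4, 5.2, 5.0]

print(derive_lai_variables(deciduous))
print(derive_lai_variables(evergreen))
```

In this toy case the strongly seasonal series has a low LAI_MIN and a wide range, whereas the evergreen-like series has a high LAI_MIN and a narrow range, which is the kind of contrast the derived variables are meant to capture.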

The freeware Ranger³³, which implements the regression algorithm of random forest³⁴ (RF), was used to model the relationships between carbon fluxes and the independent variables listed in Fig. 1. An RF includes many binary decision trees³⁵, which are grown independently using a two-stage randomization procedure. First, each tree is assigned a subset of the training data randomly sampled with replacement; then each tree is recursively split into binary nodes until the number of data points in the terminal nodes is not larger than a specified number. In each split, the RF randomly selects a subset of predictor variables and searches them for the splitting point that maximizes the decrease in node impurity³⁵, which for regression is equivalent to minimizing the weighted variance of the target in the child nodes³⁶. To make a prediction, a new set of predictors is passed through each tree of the trained RF, following branches according to the splitting points until it reaches a terminal node, and the mean of the target variable in that node is taken as the tree's estimate. The mean of the estimates from all trees is then used as the prediction.
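The split search and the final averaging can be sketched in a few lines. This is a simplified illustration of a variance-based regression split on a single predictor, not Ranger's implementation; the function names and the toy data are assumptions made for the example.

```python
def weighted_child_variance(left, right):
    """Variance of the target in the two child nodes, weighted by node size."""
    def sum_sq_dev(values):
        m = sum(values) / len(values)
        return sum((v - m) ** 2 for v in values)
    return (sum_sq_dev(left) + sum_sq_dev(right)) / (len(left) + len(right))


def best_split(x, y):
    """Search one predictor for the threshold minimising weighted child variance.

    Returns (threshold, score), or None if all predictor values are equal.
    """
    pairs = sorted(zip(x, y))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i][0] == pairs[i - 1][0]:
            continue  # no threshold can separate identical predictor values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2.0
        left = [target for _, target in pairs[:i]]
        right = [target for _, target in pairs[i:]]
        score = weighted_child_variance(left, right)
        if best is None or score < best[1]:
            best = (threshold, score)
    return best


def forest_predict(tree_estimates):
    """Average the per-tree estimates (terminal-node means)."""
    return sum(tree_estimates) / len(tree_estimates)


# Toy data (hypothetical): the target jumps when the predictor exceeds about 6.
x = [1, 2, 3, 10, 11, 12]
y = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8]
print(best_split(x, y))
print(forest_predict([1.0, 2.0, 3.0]))
```

A full random forest repeats this search over a random subset of predictors at every node and over a bootstrap sample for every tree; the sketch shows only the core criterion.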

FLUXNET data

We extracted GPP, RECO, and NEE from the tier-1 sites of FLUXNET 2015⁹, specifically daily GPP from the GPP_NT_VUT_REF column, RECO from RECO_NT_VUT_REF, and NEE from NEE_VUT_REF. Daily NEE is the sum of hourly measurements of ecosystem exchange; RECO is the ecosystem respiration estimated by extending night-time hourly measurements (when photosynthesis stopped) to the whole day³⁷, and GPP was calculated from NEE and RECO. The quality flags of these variables were used to exclude points with less than 90% of measured and good-quality gap-filled data. We found, however, that this quality control measure was not sufficient to ensure consistency between GPP − RECO and −NEE. Tramontana et al.²⁷ used a robust regression method to select data, which would effectively filter out suspicious data points. We simply excluded those points for which the absolute difference between GPP − RECO and −NEE was larger than 0.1 gC m⁻² d⁻¹. About 7.5% of the data fell into this category.
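The consistency filter can be expressed as a short check. The sketch below assumes the sign convention GPP = RECO − NEE used elsewhere in this article, so that GPP − RECO should equal −NEE; the function name and the sample records are hypothetical.

```python
def is_consistent(gpp, reco, nee, tolerance=0.1):
    """True if GPP - RECO and -NEE agree within the tolerance (gC m-2 d-1)."""
    return abs((gpp - reco) - (-nee)) <= tolerance


# Hypothetical daily records: (GPP, RECO, NEE) in gC m-2 d-1.
records = [
    (5.0, 3.0, -2.0),  # GPP - RECO = 2.0 and -NEE = 2.0: kept
    (5.0, 3.0, -1.5),  # differs by 0.5: excluded
]
kept = [r for r in records if is_consistent(*r)]
print(kept)
```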

The length of the data record varied widely from site to site. To balance the representation of sites in the RF, we used at most the three most recent years of data from each site. The daily fluxes were binned into 10-day means corresponding to the periods of the remote sensing data, i.e., the first 10-day mean of a month included data from the 1st to the 10th day, the second from the 11th to the 20th day, and the third from the remaining days. The binned RECO and NEE were used to recalculate GPP. This process resulted in a total of 16,939 records from 204 flux tower sites.
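The 10-day binning rule can be sketched as follows. The helper names are hypothetical, and the toy input simply uses the day of the month as the flux value so that the bin means are easy to check.

```python
from collections import defaultdict
from datetime import date, timedelta


def dekad_of(day):
    """Return 1, 2, or 3: days 1-10, days 11-20, or day 21 to the end of month."""
    return min((day.day - 1) // 10, 2) + 1


def bin_to_dekads(daily):
    """Average (date, value) pairs into {(year, month, dekad): mean}."""
    groups = defaultdict(list)
    for day, value in daily:
        groups[(day.year, day.month, dekad_of(day))].append(value)
    return {key: sum(vals) / len(vals) for key, vals in groups.items()}


# Toy series for January 2014: the value equals the day of the month.
start = date(2014, 1, 1)
daily = [(start + timedelta(days=i), float(i + 1)) for i in range(31)]
means = bin_to_dekads(daily)
print(means)
```

The third bin absorbs the 8 to 11 days left in the month, so its length varies, which matches the description above.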

Remote sensing data

The remote sensing data were derived from the Copernicus Global Land Service. LAI and FAPAR³⁸ were available at 1 km spatial resolution every 10 days from 1999 to the present (https://land.copernicus.eu/global/themes/vegetation). We evaluated FAPAR by analysing the correlation between GPP and FAPAR × DSSR, a relationship that is the basis of many light-use efficiency models^12,39,40,41 for GPP. The results show that, between GPP and FAPAR × DSSR, 62% of the sites have an R² larger than 0.5 and 27% have an R² larger than 0.7. Plots of LAI against flux observations for all sites show good correlations in terms of amplitude and seasonal pattern. These results indicate that the extracted LAI and FAPAR are good predictors.
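The per-site check can be sketched as a squared Pearson correlation between GPP and the product of FAPAR and DSSR. The article does not state how R² was computed, so the squared correlation is an assumption here, and all values below are invented for illustration.

```python
def r_squared(x, y):
    """Squared Pearson correlation coefficient between two equal-length series."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return (sxy * sxy) / (sxx * syy)


# Hypothetical 10-day values for one site (illustration only).
fapar = [0.2, 0.4, 0.6, 0.8, 0.7, 0.3]
dssr = [100.0, 150.0, 220.0, 250.0, 200.0, 120.0]
gpp = [0.5, 1.4, 3.0, 4.6, 3.1, 0.9]

absorbed = [f * d for f, d in zip(fapar, dssr)]  # FAPAR x DSSR
print(r_squared(absorbed, gpp))
```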

Climate data

The predictor variables T2M, RH2M, and DSSR came from the fifth-generation ECMWF atmospheric reanalysis of the global climate (ERA5⁴²). The spatial resolution of the hourly data on single levels is 0.25° × 0.25°. Analysis and forecast data at three-hour intervals were obtained for T2M and RH2M. Their daily means were calculated first and then used to calculate the 10-day means. Hourly accumulated DSSR data were used to calculate the daily accumulated DSSR, which was then used to calculate the 10-day mean. T2M and DSSR were checked by comparing them with the air temperature (TA) and photosynthetic photon flux density (PPFD) of FLUXNET 2015. The R² between T2M and TA is larger than 0.7 for 98% of the sites, and the percentage of sites with R² > 0.7 between DSSR and PPFD is 91%.

Data Records

The product is available at https://doi.org/10.17595/20200227.001⁴³. Data files in NetCDF format are named VARIABLE.YEAR.ver.NUMBER.nc, in which VARIABLE can be GPP, NEE, or RECO, YEAR is the year of the fluxes, and the version NUMBER is usually the year in which the dataset was created or updated. The metadata inside each file describe the method, software, and data sources.
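A file name following this convention can be split into its parts with a small helper. The function name and the example file name are hypothetical; the pattern only encodes the VARIABLE.YEAR.ver.NUMBER.nc scheme described above and assumes a four-digit year.

```python
import re

# VARIABLE.YEAR.ver.NUMBER.nc, with VARIABLE restricted to the three fluxes.
_PATTERN = re.compile(r"(GPP|NEE|RECO)\.(\d{4})\.ver\.(.+)\.nc")


def parse_product_filename(name):
    """Split a product file name into variable, year, and version."""
    match = _PATTERN.fullmatch(name)
    if match is None:
        raise ValueError(f"unexpected file name: {name!r}")
    variable, year, version = match.groups()
    return {"variable": variable, "year": int(year), "version": version}


# Hypothetical file name built from the stated convention.
print(parse_product_filename("GPP.2014.ver.2020.nc"))
```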

Figures 2–4 show the distributions of the annual means and mean uncertainties of GPP, RECO, and NEE in 2014, respectively. An uncertainty is the standard deviation of flux values in the terminal nodes of the 500 trees used to make a prediction. The spatial patterns are similar to those of other existing products^44,45. The annual GPP increased from 134.3 PgC yr⁻¹ in 1999 to 142.2 PgC yr⁻¹ in 2019 at a rate of 0.49 PgC yr⁻¹ (Fig. 5). The estimate is slightly higher than those by Copernicus⁴¹ and Jung et al.²⁹ (referred to as Jung-2019 hereafter), but lower than a recent estimate⁴⁶. The RECO was estimated to be 115.6 PgC yr⁻¹ in 1999 and 121.3 PgC yr⁻¹ in 2019, both higher than that of Jung-2019. The trend of RECO was 0.33 PgC yr⁻¹. However, the estimated NEE in this study, which was −20.3 PgC yr⁻¹ in 1999 and −22.8 PgC yr⁻¹ in 2019, is lower than Jung-2019’s estimate. The trend of NEE is 0.14 PgC yr⁻¹, which is smaller than some recent estimates^45,47. Our GPP and RECO trends are larger than those of Li et al.⁴⁸.
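One possible reading of this uncertainty definition is sketched below: the prediction is the mean of the per-tree terminal-node means, and the uncertainty is the standard deviation of all training flux values pooled from the terminal nodes reached in each tree. The pooling step and the use of the population standard deviation are assumptions, not details confirmed by the article, and the node contents are invented.

```python
from statistics import mean, pstdev


def predict_with_uncertainty(terminal_nodes):
    """Return (prediction, uncertainty) from the terminal node reached in each tree.

    terminal_nodes holds one list per tree with the training flux values
    stored in the node that the new predictor set ends up in.
    """
    tree_estimates = [mean(node) for node in terminal_nodes]
    prediction = mean(tree_estimates)
    # Assumption: pool the node values of all trees before taking the SD.
    pooled = [value for node in terminal_nodes for value in node]
    return prediction, pstdev(pooled)


# Hypothetical terminal nodes from a three-tree forest.
nodes = [[1.0, 3.0], [2.0, 2.0], [2.0, 4.0]]
prediction, uncertainty = predict_with_uncertainty(nodes)
print(prediction, uncertainty)
```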

There is a 1.6 ± 0.3 PgC yr⁻¹ discrepancy between GPP − RECO and −NEE, resulting from the node-splitting scheme of the RF. Although GPP was recalculated as RECO − NEE and the values of the predictor variables are the same in the training datasets, the trained tree structures can differ significantly from each other because the node-splitting rule is based on the variances of the target in parent and child nodes. Therefore, the predicted GPP, RECO, and NEE for a new set of predictors can come from different records of the training datasets. Although the discrepancy is only about 1% of the annual GPP, it is large relative to the global terrestrial carbon sink⁶.

Our dataset is distinguished by its use of derived LAI variables (LAI_MAX, LAI_MIN, LAI_COUNT) to represent PFT. While different PFTs are independent and have discrete spatial patterns (Figs. S1–S11 in Supplementary File 1, data extracted from the MCD12C1 MODIS/Terra + Aqua dataset⁴⁹), the derived LAI variables are related to leaf biomass and seasonality and show rather continuous spatial transitions (Fig. S12 in Supplementary File 1). Unbalanced sampling becomes worse when measurements are grouped into different PFTs. We therefore expect that using the derived LAI variables should improve the spatial interpolation of the random forest, as measurements of different PFTs can be mixed more logically. However, evaluating the advantages and disadvantages requires a dedicated inter-comparison study, which is beyond the scope of this study.

Technical Validation

We tested the performance of the RF with different numbers of trees (250–1000) and target data points (3–10) in the terminal nodes by a 10-fold cross-validation. The experiments indicated that raising or lowering these numbers did not change the goodness of fit for GPP, RECO, and NEE. Therefore, we adopted the default configuration of Ranger, which has 500 trees and 5 target data points in the terminal nodes. With this configuration, the cross-validation obtained an R² of 0.86 ± 0.01 between the modelled GPP and observations. With all the training data, we obtained an R² of 0.97 for GPP, 0.96 for RECO, and 0.94 for NEE.

The random data partition scheme of cross-validation may prevent the RF from modelling a false relationship, but it gives little information on site-specific performance. To investigate this, we conducted leave-one-site-out validations. Sites were excluded one by one from the training data, and each excluded site was used for validation. The statistics of R² are summarized in Table 1. The performance order is GPP > RECO > NEE, which indicates the order of uncertainties in their estimates. Figure 6 shows the spatial pattern of R² for each site. Generally, the RF performed better for forests that had large seasonal variations, as fluxes can be associated with the variations of predictors. In areas where seasonal variation was small, site-specific uncertainty factors were more likely to blur the relationship between the target and predictors.
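The leave-one-site-out partition can be sketched as a generator that holds out all records of one site at a time. The record layout and the site identifiers are hypothetical; model training and scoring are left out.

```python
def leave_one_site_out(records):
    """Yield (site, train, test), holding out every record of one site in turn."""
    sites = sorted({record["site"] for record in records})
    for held_out in sites:
        train = [r for r in records if r["site"] != held_out]
        test = [r for r in records if r["site"] == held_out]
        yield held_out, train, test


# Hypothetical records: a site identifier plus one flux value.
records = [
    {"site": "A", "gpp": 1.0},
    {"site": "A", "gpp": 1.2},
    {"site": "B", "gpp": 3.0},
    {"site": "C", "gpp": 5.0},
]
for site, train, test in leave_one_site_out(records):
    print(site, len(train), len(test))
```

Unlike a random 10-fold partition, no record from the held-out site is ever seen during training, so the score reflects how well the model transfers to an unsampled location.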

Table 1 Percentages of sites in ranges of R² obtained by the leave-one-site-out validation.

A concern about using short-term data to train an RF for long-term predictions is whether the bias would increase significantly with time owing to disturbances to ecosystems. We analysed the fitting of all sites’ data and summarised the results in Tables S1–S3. More statistical details are available in Supplementary File 2. Indeed, the p-values show that the correlation of the bias with the year was significant for some sites.

Of all 204 sites, the site named DK-SOR (55.4859°N, 11.6446°E) is the only one that has data in all 16 years from 1999 (start year of this study) to 2014 (end year of FLUXNET2015) and its p-value is smaller than 0.05 for GPP, RECO, and NEE. We present its data fitting in Fig. 7. The trend is negligible considering the large variation of measurements. We provide plots and statistical details for all sites in Supplementary Files 3–5. They show that a significant bias trend was caused most likely by incomplete data or special events in some years.

We compared this product (labelled as NIES) with those of Copernicus⁴¹, Tramontana et al.²⁷ (Tramontana-2016), Bodesheim et al.²⁸ (Bodesheim-2018), and Jung et al.²⁹ (Jung-2019) (Fig. 8). For cross-checking among products, we only compared data in the period 2001–2013, when all products were available. The temporal and spatial resolutions of the Copernicus GPP are 10 days and 1 km, respectively. The Bodesheim-2018 GPP and RECO are on 0.5° × 0.5° grids; their annual fluxes were calculated from the half-hourly fluxes of 12 months. The spatial resolution of the annual products of Tramontana-2016 and Jung-2019 is also 0.5° × 0.5°. Jung-2019 included GPP and RECO from both night-time and daytime partition methods. We used the datasets from the night-time partition.

All products show similar latitudinal variations. The differences in GPP are the smallest, followed by those of RECO and NEE. Table 2 summarises the integrated fluxes. The GPP difference between NIES and Copernicus is the smallest, with the NIES GPP being about 1% larger. The difference in NIES GPP is about 4%, 2%, and 3% when compared with the values of Tramontana-2016, Bodesheim-2018, and Jung-2019, respectively. While the NIES RECO is 8% larger than that of Tramontana-2016 and 6% larger than that of Bodesheim-2018, the NIES NEE is 13% smaller than that of Tramontana-2016, 4% smaller than that of Bodesheim-2018, and 10% smaller than that of Jung-2019.

Table 2 Inter-comparison of global fluxes per year.

Bias and Uncertainty

Although the RF method usually produces an unbiased fit to measurements, it cannot avoid the problem of unbalanced sampling. As the RF makes predictions using values of the training samples, extrapolations to unsampled domains could lead to large biases and uncertainties. To investigate the problem, we compared the histograms of T2M and LAI in global grids with the histograms of T2M and LAI associated with the training samples. Photosynthesis is determined mainly by these two variables. In constructing the global histograms, the count of grid cells was weighted by the grid area so that the count would reflect the area correctly. The weighting was not used when counting samples, as sites were considered to represent the same area of their surroundings no matter where they were located.
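The area weighting of the global histograms can be sketched as follows. On a regular latitude-longitude grid the area of a small cell is approximately proportional to the cosine of its central latitude; the article does not state the exact weighting formula, so this approximation, the function names, and the two-cell example are assumptions.

```python
import math


def cell_area_weight(lat_center_deg):
    """Relative area of a regular lat-lon cell, approximated by cos(latitude)."""
    return math.cos(math.radians(lat_center_deg))


def weighted_histogram(values, weights, edges):
    """Normalised histogram in which each value contributes its weight."""
    counts = [0.0] * (len(edges) - 1)
    for value, weight in zip(values, weights):
        for i in range(len(edges) - 1):
            if edges[i] <= value < edges[i + 1]:
                counts[i] += weight
                break
    total = sum(counts)
    return [c / total for c in counts] if total > 0 else counts


# Hypothetical grid cells: one warm cell at the equator, one cold cell at 60 degrees N.
t2m = [25.0, -10.0]
lats = [0.0, 60.0]
weights = [cell_area_weight(lat) for lat in lats]
hist = weighted_histogram(t2m, weights, edges=[-20.0, 0.0, 20.0, 40.0])
print(hist)
```

The cell at 60°N covers about half the area of the equatorial cell, so it contributes half as much to its temperature bin. The site histogram, by contrast, would use a weight of 1 for every sample.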

Figure 9a shows that the sampling frequency of T2M was much smaller than that of the global grids in low- and high-temperature bins. This indicates that areas with a cold or hot climate are under-represented by the measurement sites; therefore, predictions for cold areas are likely to be biased toward warmer areas, although the exact impact is extremely difficult to diagnose unless the RF implementation is designed specifically for such a purpose. Similar biases exist in hot areas. Unbalanced sampling is also shown in the histogram of LAI (Fig. 9b), especially in the low LAI bins. This is partly related to air temperature as areas with a very cold climate tend to have a small LAI.

Unbalanced sampling might not be the main factor for the NEE overestimate as the modelled GPP agrees well with that of Copernicus, which was obtained by a different method. By analysing the variations of observed GPP, RECO, and NEE, we found that the main cause of the large NEE bias was that the mean NEE of any site in any season was relatively small in comparison with its variation, as shown in Table 3. For GPP and RECO, the mean flux was larger than the standard deviation (SD) and about three times as much as the model SD; but for the NEE, the mean flux was smaller than both the SD and the model SD. The bias of prediction was much smaller than all the mean fluxes, but the SD of the bias for NEE was nearly as large as the mean flux.

Table 3 Summary of flux, bias, and model. Unit is PgC per year.

Values in Table 3 are summaries of the statistics in Tables S1 to S3 with each site being considered as a unit no matter how many years of measurements it includes. Also, for each site, all years were considered equal in calculating the overall mean even if some years had a small number of data points. This could have led to the seasonality bias. However, if we only select the years with no missing data, many sites would have been excluded from the summary in Table 3, which would have increased the geographical bias.

Code availability

We used the software by Wright and Ziegler³³, available at https://github.com/imbs-hl/ranger. The code for data processing was written in ZeScript (https://www.zegraph.com/z-script/) and is available upon request.

References

1. Bonan, G. B. Forests and Climate Change: Forcings, Feedbacks, and the Climate Benefits of Forests. Science 320, 1444–1449 (2008).
2. Shevliakova, E. et al. Historical warming reduced due to enhanced land carbon uptake. Proceedings of the National Academy of Sciences 110, 16730–16735 (2013).
3. Pan, Y. et al. A Large and Persistent Carbon Sink in the World’s Forests. Science 333, 988–993 (2011).
4. Ballantyne, A. P., Alden, C. B., Miller, J. B., Tans, P. P. & White, J. W. C. Increase in observed net carbon dioxide uptake by land and oceans during the past 50 years. Nature 488, 70–72 (2012).
5. Keenan, T. F. et al. Recent pause in the growth rate of atmospheric CO₂ due to enhanced terrestrial carbon uptake. Nat Commun 7, 13428 (2016).
6. Le Quéré, C. et al. Global Carbon Budget 2018. Earth Syst. Sci. Data 10, 2141–2194 (2018).
7. Keenan, T. F. & Williams, C. A. The Terrestrial Carbon Sink. Annu. Rev. Environ. Resour. 43, 219–243 (2018).
8. Le Quéré, C. et al. Global Carbon Budget 2016. Earth Syst. Sci. Data 8, 605–649 (2016).
9. Pastorello, G. et al. A New Data Set to Keep a Sharper Eye on Land-Air Exchanges. Eos, https://doi.org/10.1029/2017EO071597 (2017).
10. Bonan, G. B. et al. Improving canopy processes in the Community Land Model version 4 (CLM4) using global flux fields empirically inferred from FLUXNET data. J. Geophys. Res. 116, G02014 (2011).
11. Slevin, D., Tett, S. F. B., Exbrayat, J.-F., Bloom, A. A. & Williams, M. Global evaluation of gross primary productivity in the JULES land surface model v3.4.1. Geosci. Model Dev. 10, 2651–2670 (2017).
12. Wang, L. et al. Evaluation of the Latest MODIS GPP Products across Multiple Biomes Using Global Eddy Covariance Flux Data. Remote Sensing 9, 418 (2017).
13. Barman, R., Jain, A. K. & Liang, M. Climate-driven uncertainties in modeling terrestrial gross primary production: a site level to global-scale analysis. Glob Change Biol 20, 1394–1411 (2014).
14. Beer, C. et al. Terrestrial Gross Carbon Dioxide Uptake: Global Distribution and Covariation with Climate. Science 329, 834–838 (2010).
15. Jung, M. et al. Compensatory water effects link yearly global land CO₂ sink changes to temperature. Nature 541, 516–520 (2017).
16. Zhang, Z. et al. Effect of climate warming on the annual terrestrial net ecosystem CO₂ exchange globally in the boreal and temperate regions. Sci Rep 7, 3108 (2017).
17. Papale, D. & Valentini, R. A new assessment of European forests carbon exchanges by eddy fluxes and artificial neural network spatialization. Global Change Biol 9, 525–535 (2003).
18. Papale, D. et al. Effect of spatial sampling from European flux towers for estimating carbon and water fluxes with artificial neural networks: Sampling Effect on Fluxes Upscaling. J. Geophys. Res. Biogeosci. 120, 1941–1957 (2015).
19. Yang, F. et al. Developing a continental-scale measure of gross primary production by combining MODIS and AmeriFlux data through Support Vector Machine approach. Remote Sensing of Environment 110, 109–122 (2007).
20. Ueyama, M. et al. Upscaling terrestrial carbon dioxide fluxes in Alaska with satellite remote sensing and support vector regression: Upscaling CO₂ Fluxes in Alaska. J. Geophys. Res. Biogeosci. 118, 1266–1281 (2013).
21. Ichii, K. et al. New data-driven estimation of terrestrial CO₂ fluxes in Asia using a standardized database of eddy covariance measurements, remote sensing data, and support vector regression: Data-Driven CO₂ Fluxes in Asia. J. Geophys. Res. Biogeosci. 122, 767–795 (2017).
22. Jung, M., Reichstein, M. & Bondeau, A. Towards global empirical upscaling of FLUXNET eddy covariance observations: validation of a model tree ensemble approach using a biosphere model. Biogeosciences 6, 2001–2013 (2009).
23. Jung, M. et al. Global patterns of land-atmosphere fluxes of carbon dioxide, latent heat, and sensible heat derived from eddy covariance, satellite, and meteorological observations. J. Geophys. Res. 116, G00J07 (2011).
24. Xiao, J. et al. Estimation of net ecosystem carbon exchange for the conterminous United States by combining MODIS and AmeriFlux data. Agricultural and Forest Meteorology 148, 1827–1847 (2008).
25. Xiao, J. et al. A continuous measure of gross primary production for the conterminous United States derived from MODIS and AmeriFlux data. Remote Sensing of Environment 114, 576–591 (2010).
26. Tramontana, G., Ichii, K., Camps-Valls, G., Tomelleri, E. & Papale, D. Uncertainty analysis of gross primary production upscaling using Random Forests, remote sensing and eddy covariance data. Remote Sensing of Environment 168, 360–373 (2015).
27. Tramontana, G. et al. Predicting carbon dioxide and energy fluxes across global FLUXNET sites with regression algorithms. Biogeosciences 13, 4291–4313 (2016).
28. Bodesheim, P., Jung, M., Gans, F., Mahecha, M. D. & Reichstein, M. Upscaled diurnal cycles of land–atmosphere fluxes: a new global half-hourly data product. Earth Syst. Sci. Data 10, 1327–1365 (2018).
29. Jung, M. et al. Scaling carbon fluxes from eddy covariance sites to globe: Synthesis and evaluation of the FLUXCOM approach. https://www.biogeosciences-discuss.net/bg-2019-368/bg-2019-368.pdf (2019).
30. Garnaud, C., Sushama, L. & Arora, V. K. The effect of driving climate data on the simulated terrestrial carbon pools and fluxes over North America: Effect of climate on terrestrial carbon pools. Int. J. Climatol. 34, 1098–1110 (2014).
31. Zhao, Y. et al. How errors on meteorological variables impact simulated ecosystem fluxes: a case study for six French sites. Biogeosciences 9, 2537–2564 (2012).
32. Chen, M. et al. Regional contribution to variability and trends of global gross primary productivity. Environ. Res. Lett. 12, 105005 (2017).
33. Wright, M. N. & Ziegler, A. ranger: A Fast Implementation of Random Forests for High Dimensional Data in C++ and R. J. Stat. Soft. 77, (2017).
34. Breiman, L. Random forests. Machine Learning 45, 5–32 (2001).
35. Breiman, L., Friedman, J., Stone, C. J. & Olshen, R. A. Classification and Regression Trees. (Belmont, CA, Wadsworth, 1984).
36. Ishwaran, H. The effect of splitting on random forests. Mach Learn 99, 75–118 (2015).
37. Reichstein, M. et al. On the separation of net ecosystem exchange into assimilation and ecosystem respiration: review and improved algorithm. Global Change Biol 11, 1424–1439 (2005).
38. Camacho, F., Cernicharo, J., Lacaze, R., Baret, F. & Weiss, M. GEOV1: LAI, FAPAR essential climate variables and FCOVER global time series capitalizing over existing products. Part 2: Validation and intercomparison with reference products. Remote Sensing of Environment 137, 310–329 (2013).
39. Joiner, J. et al. Estimation of Terrestrial Global Gross Primary Production (GPP) with Satellite Data-Driven Models and Eddy Covariance Flux Data. Remote Sensing 10, 1346 (2018).
40. Zhang, Y. et al. A global moderate resolution dataset of gross primary production of vegetation for 2000–2016. Sci Data 4, 170165 (2017).
41. Monteith, J. L. Solar Radiation and Productivity in Tropical Ecosystems. The Journal of Applied Ecology 9, 747 (1972).
42. Copernicus Climate Change Service (C3S). ERA5: Fifth generation of ECMWF atmospheric reanalyses of the global climate. Copernicus Climate Change Service Climate Data Store (CDS), date of access. https://cds.climate.copernicus.eu/cdsapp#!/home (2017).
43. Zeng, J. A Data-driven Upscale Product of Global Gross Primary Production, Net Ecosystem Exchange and Ecosystem Respiration. National Institute for Environmental Studies https://doi.org/10.17595/20200227.001 (2020).
44. Zhang, L. et al. Upscaling carbon fluxes over the Great Plains grasslands: Sinks and sources. J. Geophys. Res. 116, G00J03 (2011).
45. Norton, A. J. et al. Estimating global gross primary productivity using chlorophyll fluorescence and a data assimilation system with the BETHY-SCOPE model. Biogeosciences 16, 3069–3093 (2019).
46. Badgley, G., Anderegg, L. D. L., Berry, J. A. & Field, C. B. Terrestrial gross primary production: Using NIR V to scale from site to globe. Glob Change Biol 25, 3731–3740 (2019).
47. Ciais, P. et al. Five decades of northern land carbon uptake revealed by the interhemispheric CO₂ gradient. Nature 568, 221–225 (2019).
48. Li, W. et al. Recent Changes in Global Photosynthesis and Terrestrial Ecosystem Respiration Constrained From Multiple Observations. Geophys. Res. Lett. 45, 1058–1068 (2018).
49. M. Friedl, D. S. MCD12C1 MODIS/Terra+Aqua Land Cover Type Yearly L3 Global 0.05Deg CMG V006. NASA EOSDIS Land Processes DAAC https://doi.org/10.5067/MODIS/MCD12C1.006 (2015).

Acknowledgements

This work used eddy covariance data from the FLUXNET community (https://fluxnet.fluxdata.org/), climate data from ECMWF ERA5 (https://cds.climate.copernicus.eu/#!/home), and remote sensing data from Copernicus Global Land Service (https://land.copernicus.eu/global/). We would like to thank these organizations for providing great data-sharing platforms. We would also like to express our appreciation to the data providers of the FLUXNET 2015 dataset (https://fluxnet.fluxdata.org/data/fluxnet2015-dataset/). Their contributions made this work possible. We would like to thank each individual explicitly here but cannot do so due to limited space.

Author information

Authors and Affiliations

National Institute for Environmental Studies, Tsukuba, Ibaraki, 305-8506, Japan
Jiye Zeng, Tsuneo Matsunaga, Nobuko Saigusa, Tomoko Shirai & Yoko Fukuda
Department of Environmental Science, Hainan University, Haikou, 570228, China
Zheng-Hong Tan
Department of Ecology, Peking University, Beijing, China
Yanhong Tang
College of Urban and Environmental Sciences, Peking University, Beijing, China
Shushi Peng

Contributions

Jiye Zeng: data processing, modelling, and manuscript writing. Tsuneo Matsunaga: issues of satellite data. Zheng-Hong Tan, Nobuko Saigusa, and Yanhong Tang: inputs on issues of eddy covariance measurement data. Shushi Peng: inputs on issues of global fluxes. Tomoko Shirai and Yoko Fukuda: data repository and other support. All authors reviewed and edited the manuscript.

Corresponding author

Correspondence to Jiye Zeng.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

The Creative Commons Public Domain Dedication waiver http://creativecommons.org/publicdomain/zero/1.0/ applies to the metadata files associated with this article.

About this article

Cite this article

Zeng, J., Matsunaga, T., Tan, ZH. et al. Global terrestrial carbon fluxes of 1999–2019 estimated by upscaling eddy covariance data with a random forest. Sci Data 7, 313 (2020). https://doi.org/10.1038/s41597-020-00653-5

Received: 04 March 2020
Accepted: 25 August 2020
Published: 24 September 2020
DOI: https://doi.org/10.1038/s41597-020-00653-5
