
www.nature.com/npjclimatsci

ARTICLE OPEN

Predicting global patterns of long-term climate change from short-term simulations using machine learning
L. A. Mansfield1,2 ✉, P. J. Nowack1,3,4,5, M. Kasoar1,3,6, R. G. Everitt7, W. J. Collins8 and A. Voulgarakis1,6,9

Understanding and estimating regional climate change under different anthropogenic emission scenarios is pivotal for informing
societal adaptation and mitigation measures. However, the high computational complexity of state-of-the-art climate models
remains a central bottleneck in this endeavour. Here we introduce a machine learning approach, which utilises a unique dataset of
existing climate model simulations to learn relationships between short-term and long-term temperature responses to different
climate forcing scenarios. This approach not only has the potential to accelerate climate change projections by reducing the costs
of scenario computations, but also helps uncover early indicators of modelled long-term climate responses, which is of relevance to
climate change detection, predictability, and attribution. Our results highlight challenges and opportunities for data-driven climate
modelling, especially concerning the incorporation of even larger model datasets in the future. We therefore encourage extensive
data sharing among research institutes to build ever more powerful climate response emulators, and thus to enable faster climate
change projections.
npj Climate and Atmospheric Science (2020) 3:44; https://doi.org/10.1038/s41612-020-00148-5

INTRODUCTION

To achieve long-term climate change mitigation and adaptation goals, such as limiting global warming to 1.5 or 2 °C, there must be a global effort to decide and act upon effective but realistic emission pathways1. This requires an understanding of the consequences of such pathways, which are often diverse and involve changes in multiple climate forcers1–3. In particular, different emission scenarios of, for example, greenhouse gases and aerosols are responsible for diverse changes in regional climate, which are not always well captured by a metric such as global temperature-change potential4–9. Exploring more detailed relationships between emissions and multiregional climate responses still requires the application of Global Climate Models (GCMs) that allow the behaviour of the climate to be simulated under various conditions (e.g. different atmospheric greenhouse gas and aerosol concentrations or emissions fields)10–12 on decadal to multi-centennial timescales (e.g. refs. 5,13–16). However, modelling climate at increasingly high spatial resolutions has significantly increased the computational complexity of GCMs2, a tendency that has been accelerated by the incorporation and enhancement of a number of new Earth system model components and processes17–20. This high computational cost has driven us to investigate how machine learning methods can help accelerate estimates of global and regional climate change under different climate forcing scenarios.

Our work is further motivated by studies that have suggested links between characteristic short-term and long-term response patterns to different climate forcing agents5,21,22. Here, we seek a fast ‘surrogate model’23 to find a mapping from short-term to long-term response patterns within a given GCM (Fig. 1). Once learned, this surrogate model can be used to rapidly predict other outputs (long-term responses) given new unseen inputs (short-term responses, i.e. the results of easier to perform short-term simulations). While data science methods are increasingly used within climate science (e.g. refs. 24–30), no study has attempted the application we present here, i.e. to predict the magnitude and patterns of long-term climate response to a wide range of global and regional forcing scenarios.

Building surrogate climate models

To train our learning algorithms, we take advantage of a unique set of GCM simulations performed in recent years using the Hadley Centre Global Environment Model 3 (HadGEM3). In these, step-wise perturbations were applied to various forcing agents to explore characteristic short- and long-term climate responses to them5,7,8,14,16,31–34. The set of simulations includes global perturbations of long-lived greenhouse gases such as carbon dioxide (CO2) and methane (CH4), as well as global and local perturbations to key short-lived pollutants such as sulfate (SO4) and black carbon (BC) particles, amongst others (Supplementary Table 1). A key difference between these two types of perturbations is that long-lived forcers are homogeneously distributed in the atmosphere so that the region of emission is effectively inconsequential for the global temperature response pattern. In contrast, the response pattern does depend on the region of emission for short-lived forcers.

The evolution of the GCM’s global mean temperature response to some example forcing scenarios is highlighted in Fig. 1a. All scenarios show an initial sudden response in the first few years, which we label the ‘short-term response’. The global mean temperature then converges towards a new (approximately) equilibrated steady state, which we label the ‘long-term response’.
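For illustration, these two response windows can be computed from an annual-mean temperature series in a few lines. The sketch below uses an idealised synthetic series and assumes the perturbation-minus-control differencing detailed in the Methods; the array names and the shape of the synthetic response are placeholders, not output from the actual HadGEM3 runs.

```python
import numpy as np

# Idealised annual global-mean surface temperature (degC) for a step-perturbation
# run and its control, each 100 years long (synthetic placeholders).
years = np.arange(100)
t_perturbed = 14.0 + 3.0 * (1.0 - np.exp(-years / 20.0))
t_control = np.full(100, 14.0)

# Response = perturbation minus control.
response = t_perturbed - t_control

# 'Short-term response': average over the first 10 years of the simulation.
short_term = response[:10].mean()

# 'Long-term response': discard the first 70 transient years and average years 70-100.
long_term = response[70:].mean()

print(f"short-term: {short_term:.2f} K, long-term: {long_term:.2f} K")
```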

1Department of Physics, Imperial College London, South Kensington Campus, London SW7 2BW, UK. 2School of Mathematics and Statistics, University of Reading, Whiteknights,
Berkshire RG6 6AX, UK. 3Grantham Institute, Imperial College London, South Kensington Campus, London SW7 2AZ, UK. 4Climatic Research Unit, Data Science Institute, Imperial
College London, South Kensington Campus, London SW7 2AZ, UK. 5School of Environmental Sciences, University of East Anglia, Norwich, Norfolk NR4 7TJ, UK. 6Leverhulme
Centre for Wildfires, Environment and Society, Department of Physics, Imperial College London, South Kensington Campus, London SW7 2BW, UK. 7Department of Statistics,
University of Warwick, Coventry CV4 7AL, UK. 8Department of Meteorology, University of Reading, Whiteknights, Berkshire RG6 6ET, UK. 9School of Environmental Engineering,
Technical University of Crete, Chania, Crete 73100, Greece. ✉email: laura.mansfield@pgr.reading.ac.uk




Fig. 1 Data-driven approach to learning relationships between short-term and long-term climate response patterns. a Global mean
surface temperature response of a GCM (HadGEM3) to selected global and regional sudden step perturbations, e.g. to changes in long-lived
greenhouse gases (CO2, CH4), the solar constant and short-lived aerosols (SO4, BC). b Example of the short-term and long-term surface
temperature response patterns for the 2xCO2 scenario, defined as an average over the first 10 years and years 70–100, respectively. c Process
diagram highlighting the training and prediction stages. In the training stage, a regression function is learned for pairs of short-term and long-
term response maps, where the data are obtained from existing HadGEM3 simulations. In the prediction stage, the long-term response for a
new unseen scenario is predicted by applying the already learned function to the short-term response to this new scenario, which is cheaper
to obtain (here only 10 climate model years).

We are interested in not just the global mean response but, more importantly, in the global response patterns, such as the example shown in Fig. 1b for the 2xCO2 scenario.

In essence, GCMs map the initial state of the climate system and its boundary conditions, such as emission fields, to a state of the climate at a later time, using complicated functions representing the model physics, chemistry, and biology17. Our statistical model approximates the behaviour of the full GCM for a specific target climate variable of interest; here we choose surface temperature at each GCM grid cell, a central variable of interest in climate science and impact studies. This model is trained on simulations from the full global climate model (supervised learning35), in order to predict the long-term surface temperature response of the GCM from the short-term temperature responses to perturbations (Fig. 1c). Then we can make effectively instantaneous predictions using results from new short-term simulations as input so that repeated long GCM runs can be avoided. Based on the available GCM data, we define the ‘long-term’ as the quasi-equilibrium response after removing the initial transient response (first 70 years) and averaging over the remaining years of the simulations, similarly to previous studies (see Methods)5,14,36. We define ‘short-term’ as the response over the first 10 years of each simulation. The task is to learn the function f(x) that maps these short-term responses (x) to the long-term responses (y) (‘TRAINING’ in Fig. 1c).
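As a minimal sketch of this supervised-learning task (not the study's actual pipeline, which is linked in the Code Availability section), the snippet below fits the mapping f with Ridge regression on hypothetical training arrays and applies it to a new short-term response map. The dimensions are reduced for illustration, and a single fixed regularisation strength is used, whereas the study selects it by cross-validation and also uses Gaussian Process Regression.

```python
import numpy as np
from sklearn.linear_model import Ridge

n_scenarios, n_cells = 20, 2000        # illustrative sizes; the real grid has 27,840 cells
rng = np.random.default_rng(0)

# Hypothetical training data: one row per forcing scenario.
# X[s, :] = short-term (first 10 years) response at every grid cell
# Y[s, :] = long-term (years 70-100) response at every grid cell
X = rng.normal(size=(n_scenarios, n_cells))
Y = 2.0 * X + rng.normal(scale=0.1, size=(n_scenarios, n_cells))

# 'TRAINING': with a 2-D target, scikit-learn fits an independent linear model
# per output column, i.e. one regression of the long-term response per grid cell.
f = Ridge(alpha=1.0)
f.fit(X, Y)

# 'PREDICTION': apply the learned mapping to the short-term response x* of a
# new, unseen scenario obtained from a cheap 10-year simulation.
x_star = rng.normal(size=(1, n_cells))
y_star = f.predict(x_star)             # predicted long-term response map
print(y_star.shape)                    # (1, n_cells)
```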

We use an independent regression model of the long-term response for each grid cell. Each one depends on the short-term response at all grid cells, so that predictions are not only based on local information but can also draw predictive capability from any changes in surface temperature worldwide. We present Ridge regression37 and Gaussian Process Regression (GPR)38 with a linear kernel (see Methods) as approaches for constructing this mapping. Then, the learned regression functions can be used to predict the long-term response for new, unseen inputs (x*) (‘PREDICTION’ in Fig. 1c). We choose Ridge regression and GPR because these two methods handle well the limited sample size (number of simulations available) for training, which also limits how effectively the number of free parameters for other approaches such as deep learning, including convolutional neural networks, could be constrained. Future data collaborations, discussed below, could make the adaptation of our methodology to incorporate deep learning an option. For the learning process, we use all but one of the available simulations at a time for training and cross-validation. The trained model is then used to make a temperature response prediction for the simulation that was left out each time. Finally, we assess the prediction skill of our machine learning models by comparing the predicted response maps f(x*) to the results of the complex GCM simulations. This is repeated so that each simulation is predicted once based on the information learned from all other independent simulations (Methods).

Fig. 2 Distribution of predicted grid-point scale surface temperature responses in °C for all methods for one short-lived forcing, No_SO2_Europe, and one long-lived forcing, 3xCH4. The central vertical boxes indicate the interquartile range shown on a standard box plot, the horizontal line shows the median and the black point shows the mean. The horizontal width shows the distribution of temperature values over all grid points, i.e. the wider regions highlight that more grid points have this value of predicted temperature response. Note the different vertical scales.

RESULTS AND DISCUSSION

Overall method performance

We evaluate the performance of the two different machine learning methods (Ridge, GPR) by benchmarking them against a traditional pattern scaling approach36,39, often used for estimating future patterns of climate change40–42. The latter relies on multiplying the long-term response pattern for the 2xCO2 scenario by the relative magnitude of global mean response for each individual climate forcer. This is approximated as the ratio of global mean effective radiative forcing (ERF) between the forcer and the 2xCO2 scenario (Methods)36. Alternative approaches are discussed in Methods and Supplementary.

We compare the predictions of long-term regional surface temperature changes with those produced by the complex GCM. From analysis at a grid-cell level, both Ridge regression and GPR capture some broad features that pattern scaling is also known to predict effectively, such as enhanced warming over the Northern Hemisphere, particularly over land, and Arctic amplification43 (Supplementary Figs. 1 and 2). However, the key advantage of both machine learning methods is that they capture regional patterns and diversity in the response not predicted by pattern scaling. In particular, aerosol forcing scenarios show highly specific regional imprints on surface temperature due to the spatial heterogeneity of the emissions and their short lifetimes4,7,33. It is the ability to learn these patterns that gives data-driven methods the edge over any pattern scaling method for such predictions. The example in Fig. 2 shows the distribution of predicted temperature responses over all individual grid boxes for one short-lived and one long-lived forcing scenario. For the long-lived forcings all three types of model predictions produce a similar distribution of surface temperature responses to the GCM. However, for short-lived forcing scenarios, the range and variability of responses is highly underestimated in the case of pattern scaling. This is consistent across short-lived forcing scenario predictions (Supplementary Fig. 3) and exists because pattern scaling is constrained to the same pattern, regardless of the scaling factor used to estimate the global mean response (Methods, Supplementary Fig. 4).

In the following, we quantify how well the two machine learning models and pattern scaling perform on different spatial scales. At the grid-scale level, we calculate the Root Mean Squared Error (RMSE) by comparing the prediction and GCM response at every grid point (Methods). We highlight that grid-scale error metrics need to be interpreted with care because they can present misleading results, particularly for higher resolution models. For example, they penalize patterns that—as broad features—are predicted correctly but displaced marginally on the spatial grid44. This issue is necessarily more prevalent for the machine learning approaches where smaller scale patterns are more frequently predicted, while pattern scaling predicts more consistently smooth, cautious patterns with reduced spatial variability (Supplementary Fig. 1). This consideration is a key reason why predictions for larger scale domains are often selected in impact studies11,12. We therefore also compare the absolute errors in global mean temperature and in regional mean temperature over ten broad regions (Fig. 3); four of which are the main emission regions (North America, Europe, South Asia, and East Asia) and the remaining cover primarily land areas where responses affect the majority of the world’s population. The boxplots in Fig. 3 show how these errors are distributed over all predicted scenarios for each regression method.

Both Ridge and GPR generally outperform the pattern scaling approach, but we find that, in most cases, it is GPR errors that are lowest. Note that scenario-specific pattern scaling errors are necessarily dependent on the approach chosen to scale the global CO2-response pattern (Methods, Supplementary Fig. 4), but all pattern scaling approaches share their fundamental limitation in predicting spatial variability (Fig. 2). The large spread in absolute errors in Fig. 3 is due to the large spread in response magnitude for the different scenarios. Specifically, the large errors (e.g. 1–2 °C for the machine learning models and >3 °C for pattern scaling) come mostly from regions/scenarios with a large magnitude of response, which expectedly tend to be for strong forcings (e.g. strong solar or greenhouse gas forcings), but these errors can be small relative to the overall magnitude of scenario response. In contrast, small absolute errors can be large relative to the magnitude of response (Supplementary Fig. 5), making prediction more challenging for weakly forced scenarios. This is also consistent with the finding that regional aerosol perturbations, with typically weaker forcings, are more difficult to predict compared to long-lived pollutant perturbations (Fig. 2).
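The two complementary error measures used in Fig. 3 (and defined formally in the Methods) can be sketched as follows, with the prediction, GCM response and region mask standing in as hypothetical placeholders; the cosine-latitude area weights are an assumption consistent with the grid-cell area weighting described in the Methods.

```python
import numpy as np

nlat, nlon = 145, 192                                      # 1.875 x 1.25 degree grid, 27,840 cells
rng = np.random.default_rng(1)
y_gcm = rng.normal(size=(nlat, nlon))                      # long-term response from the GCM
y_pred = y_gcm + rng.normal(scale=0.2, size=(nlat, nlon))  # e.g. a Ridge or GPR prediction

# Normalised grid-cell weights, proportional to cos(latitude).
w = np.cos(np.deg2rad(np.linspace(-90, 90, nlat)))[:, None] * np.ones((1, nlon))
w /= w.sum()

# Grid-scale skill: area-weighted root-mean-squared error over all grid cells.
rmse = np.sqrt(np.sum(w * (y_pred - y_gcm) ** 2))

# Regional skill: absolute error of the area-weighted mean response over a region,
# here an arbitrary rectangular mask standing in for e.g. Europe.
region = np.zeros((nlat, nlon), dtype=bool)
region[100:120, :30] = True
w_region = np.where(region, w, 0.0)
w_region /= w_region.sum()
abs_error = abs(np.sum(w_region * y_pred) - np.sum(w_region * y_gcm))

print(f"grid RMSE: {rmse:.3f} K, regional absolute error: {abs_error:.3f} K")
```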


Fig. 3 Prediction skill comparison for entire globe and ten major world regions. RMSE at grid-cell level and global/regional absolute errors
in °C for all scenarios, calculated by averaging the predicted response over each region and taking the difference between the GCM output
and the prediction using three methods: R = Ridge regression, G = Gaussian Process Regression, and P = Pattern scaling. Boxplots show the
distribution of errors across scenario predictions. Boxes show the interquartile range, whiskers show the extrema, lines show the medians and
black diamonds show the mean. The dots indicate the errors for each individual scenario. Note the different scale for the Arctic and that
points exceed the scale in Arctic (9.5), Northwest Asia (4.7), East Asia (3.7) and the Grid RMSE (3.8).
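The per-scenario errors shown as dots in Fig. 3 come from the leave-one-out procedure described earlier, and the coefficients of a model fitted to all simulations feed the ‘Learning early indicators’ analysis in the next section. The sketch below illustrates both steps with Ridge regression, hypothetical data and a fixed regularisation strength; the study tunes the regularisation by cross-validation and evaluates GPR and pattern scaling in the same leave-one-out fashion.

```python
import numpy as np
from sklearn.linear_model import Ridge

n_scenarios, n_cells = 21, 2000                      # 21 scenarios; reduced grid for illustration
rng = np.random.default_rng(2)
X = rng.normal(size=(n_scenarios, n_cells))          # short-term response maps (hypothetical)
Y = 1.5 * X + rng.normal(scale=0.1, size=(n_scenarios, n_cells))  # long-term response maps
w = np.full(n_cells, 1.0 / n_cells)                  # grid-cell area weights (uniform here)

# Leave-one-out evaluation: predict each scenario from all of the others.
rmse_per_scenario = []
for s in range(n_scenarios):
    train = np.arange(n_scenarios) != s
    model = Ridge(alpha=1.0).fit(X[train], Y[train])
    y_star = model.predict(X[s:s + 1])[0]            # predicted long-term map for the held-out scenario
    rmse_per_scenario.append(np.sqrt(np.sum(w * (y_star - Y[s]) ** 2)))
print(f"mean leave-one-out RMSE: {np.mean(rmse_per_scenario):.3f}")

# Coefficient inspection: fit on all scenarios and average the coefficient
# magnitudes over the output grid cells to map which input regions are most
# influential (cf. the global mean coefficient map discussed in the next section).
full_model = Ridge(alpha=1.0).fit(X, Y)
mean_coef_map = np.abs(full_model.coef_).mean(axis=0)   # one value per input grid cell
```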

Learning early indicators

As well as advancing our predictability skills, the machine learning methods inform us about regions that experience the earliest indicators of long-term climate change in the GCM. By assessing the structure of learned Ridge regression coefficients, we find patterns in the short-term response that consistently indicate the long-term temperature response (Supplementary Fig. 6). In some regions (e.g. East Asia) the dominant coefficients appear in regions close to the predicted grid cell, whereas in other regions (e.g. Europe) predictions are strongly influenced by the short-term responses over relatively remote areas, such as sea-ice regions over the Arctic. This highlights the fact that climate model response predictability varies strongly depending on the region of interest, and often involves interactions with regions very far from the region of interest as well as from the emission region.

We also examine which areas are overall the most influential for long-term predictability, by averaging magnitude of coefficients across all grid cells to find a global mean coefficient map (Supplementary Fig. 6c, f). This coefficient map mimics warming patterns seen in previous studies (enhanced at high latitudes, over land and over the subtropics)14 but also shows amplified coefficient weights in sea-ice regions, high-altitude regions, primary emission regions and mid-latitude jet stream regions. Arctic and high-altitude regions are known to warm more rapidly due to ice and snow albedo feedbacks45 and faster upper tropospheric warming11,46 respectively. These regions exhibit accelerated warming in the simulation compared to their surroundings, making them robust harbingers of long-term change within the model. We highlight the implications for future studies that attempt to interpret already observed warming patterns from a climate change perspective.

Data constraints and future directions

We identify more extensive training data (additional simulations and forcing scenarios) as key to further improving the skill of our machine learning methods. In Fig. 4 it is demonstrated that as the number of data training samples increases, the mean prediction accuracy significantly increases and becomes more consistent. We therefore expect significant potential for further improvements in predictions with even more training data. More simulations would better constrain parameters of the statistical models and improve the chances that a predicted scenario contains features previously seen by the statistical model (e.g. refs. 38,47, Methods).

Since obtaining training data from the GCM is expensive, sensible choices can also be made about how to increase the dataset by choosing which new scenarios will benefit the accuracy of the method the most, e.g. to address some complex regional aspects of the responses to short-lived pollutants. We recommend increasing the dataset to include more short-lived pollutant scenarios, noting that those with large forcings may reduce the noise in the training data so as to better constrain learned relationships (e.g. Supplementary Fig. 5). Some regions stand out as particularly challenging for our machine learning approaches, with Europe being a prominent example (Supplementary Fig. 2). This is partly due to large variations in the long-term response across the training data over Europe relative to other regions, which means predictions are less well constrained and would benefit more from increased training data.

Additionally, the variability in the GCM-predicted temperature time series is generally larger over Europe compared to other regions in both the control and perturbation simulations (Supplementary Fig. 8). This gives rise to a weaker signal-to-noise ratio for both short- and long-term responses in this region, increasing the difficulty of learning meaningful predictive relationships. It is also noteworthy that Ridge regression predictions for Europe depend strongly on remote parts of the Arctic where the short-term response is stronger but also highly variable (Supplementary Figs. 5 and 6). This points to the issue that internal variability can introduce noise to the inputs and outputs of the regression. This is partially addressed with multidecadal averages in the definitions of the short- and long-term responses, under the limitation that we have only a single realization of each simulation available. If, in future work, we have available an ensemble of simulations for each perturbation, an average over these would more effectively separate the internal variability from the response. The use of several diverse simulations in the training dataset also allows the noise in the inputs and outputs to be treated as random noise in the regression, which would be even better determined with increased training data.

Fig. 4 Prediction skill for Gaussian Process Regression trained on an increasing number of simulations. Mean of absolute errors in °C across all predicted scenarios against number of training simulations, with each line representing a different region (Fig. 3). RMSE at the grid-scale level is also shown in black with white dots. For a fixed number of training data points, the process of training and predicting is repeated several times over different combinations of training data to obtain multiple prediction errors for each scenario. Full boxplots showing the distribution of errors across scenario predictions given these different combinations of training simulations can be found in Supplementary Fig. 7.

A key challenge of working with the climate model information here is its high dimensionality (27,840 grid cells) given the small scenario sample size of 21 simulations. We note that we tried sensible approaches to dimension reduction for decreasing the number of points in both inputs and outputs, including physical dimension reduction by regional averaging, and statistical dimension reduction with principal component analysis (PCA)47. However, the resulting regressions generated larger prediction errors (Supplementary Fig. 9). Furthermore, we explored the use of different variables as the short-term predictors, such as air temperature at 500 hPa, geopotential height at 500 hPa (as an indicator of the large-scale dynamical responses), radiative forcing or sea level pressure. Surface temperature consistently outperforms other predictors, although a similar degree of accuracy is achieved with 500 hPa air temperature and geopotential height, suggesting the information encoded by these is similar (Supplementary Fig. 10). Throughout, we have selected the first 10 years of the GCM simulations as the inputs to our regression, but we find promising results for even shorter periods, e.g. the first 5 years (Supplementary Fig. 11). Finally, we also tested other linear (e.g. LASSO47) and nonlinear (e.g. Random Forest) methods for the same learning task. However, these provided weaker results so that we focused our discussion on Ridge and GPR here. We have explored the use of these methods in the context of predicting temperature responses; however, we leave open the topic of predicting other variables such as precipitation, which we expect to be more challenging due to its spatial and temporal variability48,49, but for which pattern scaling approaches are well-known to perform particularly poorly36,41,43,50.

We also wish to highlight another long-term perspective in which the framework presented here could be useful. ‘Emulators’ that approximate model output given specific inputs are a popular tool of choice for prediction, sensitivity analysis, uncertainty quantification and calibration, and have great potential for climate prediction and impact studies23,51–59. However, long-term, spatially resolved climate prediction for diverse forcings has not yet been addressed due to the cost of training such emulators. A major implication of the approach presented here is that it can catalyse designing long-term climate emulators, by using a combination of the short-term/long-term relationships presented here and trained emulators of the short-term climate response to different forcings (i.e. multilevel emulation52,59). Training an emulator that predicts the spatial patterns of long-term response to a range of forcings would be an extremely challenging task, as it would require tens of simulations, all of them multidecadal in length, in order to train the emulator. Our method drastically accelerates this process by reducing the length of such simulations to be of the order of 5–10 years, with subsequent use of the relationships presented here for translating short-term responses to long-term responses.

Our study made use of existing simulations from a single global climate model. However, it opens the door for similar approaches to be taken with datasets from other individual climate models. The same GCMs are typically run by several different research centres across the world, so that additional simulation data should be an effort of (inter)national collaboration. We therefore encourage widespread data sharing to test the limits of our approach as an important part of future research efforts in this direction. We hope that our work will catalyse developments for coordinated efforts in which carefully selected perturbation experiments will be performed in a multi-model framework. Increased availability of training datasets through model intercomparison exercises, along with increasing access to powerful computing hardware, can only help with this endeavour, leading to further advances in climate model emulation.

METHODS

Available simulations

To learn the regression models, we use data from long-term simulations from the Hadley Centre Global Environment Model 3 (HadGEM3), a climate model developed by the UK Met Office17. HadGEM3 is a GCM for the atmosphere, land18, ocean19, and sea-ice20. In the configuration used here, the horizontal resolution is 1.875° by 1.25°, giving grid boxes ~140 km wide in the mid-latitudes17. The simulations were run in previous academic studies and model intercomparison projects, namely the Precipitation Driver and Response Model Intercomparison Project (PDRMIP)16,31,32, Evaluating the Climate and Air Quality Impacts of Short-lived pollutants (ECLIPSE)7,8,33 and Kasoar et al. (2018)5,14,34. There are 21 such simulations for a range of forcings, including long-lived greenhouse gas perturbations (e.g. carbon dioxide (CO2), methane (CH4), CFC-12), short-lived pollutant perturbations (e.g. sulfur dioxide emissions (SO2, the precursor to sulfate aerosol (SO4)), black carbon (BC), organic carbon (OC)) and a solar forcing perturbation. For the short-lived pollutants, regional perturbations exist, to account for the influence of the emission region on the response4,60.

The long-lived greenhouse gas (CO2, CH4, CFC-12) simulations were performed by altering the atmospheric mixing ratios.

The short-lived pollutant experiments were performed by abruptly scaling present-day emission fields in simulations performed by ECLIPSE7,8,33 and Kasoar et al. (2018)5,14,34 or by scaling multi-model mean concentration fields in PDRMIP16,31,32. The solar forcing experiment was performed by changing the solar irradiance constant31. The GCM is run until it converges towards a new climate state, to reach an approximate equilibrium (70–100 years). The response is calculated by differencing this with its corresponding control simulation (independent control simulations were run for each project5,7,8,14,16,31–34). For the long-term response, we discard the transient response and average from year 70–100 for PDRMIP and Kasoar et al. (2018) to smooth out internal variability over the 30-year period36. For the 5 ECLIPSE simulations, we average from year 70 to year 80, since this is the full temporal extent of the ECLIPSE simulations. For the short-term response, we average over the first 10 years of the simulation to reduce the influence of natural variability of the GCM36.

The experiments from PDRMIP consist of simulations with a doubling of CO2 concentration, a tripling of CH4 concentration, a 10× increase in CFC-12 concentration, a 2% increase in total solar irradiance, a 5× increase in sulfate concentrations (SO4), a 10× increase in black carbon (BC) concentrations, a 10× increase in SO4 concentrations over Europe only, a 10× increase in SO4 concentrations over Asia only, and a reduction to preindustrial SO4 concentrations16,31. From ECLIPSE project simulations, we use a 20% reduction in CH4 emissions, a doubling in CO2 concentration, a 100% reduction in BC emissions, a 100% reduction in SO2 emissions, and a 100% reduction in carbon monoxide (CO) emissions7,8,33. The simulations performed by Kasoar et al. (2018) consist of a 100% reduction in SO2 over the Northern Hemisphere mid-latitudes (NHML), a 100% reduction in BC over the NHML, a 100% reduction in SO2 over China only, a 100% reduction in SO2 over East Asia, a 100% reduction in SO2 over Europe and a 100% reduction in SO2 over the US5,14,34. Additional simulations had also been performed by the groups, but we only consider simulations where the global mean response exceeds natural variability, calculated as the standard deviation among the control simulations. This is because we want to limit the noise in the small dataset we have. Scenarios that we did not use for this reason were the global removals of organic carbon, volatile organic compounds and nitrogen oxides (ECLIPSE7,8,33) and the removal of SO2 over India (Kasoar et al. (2018)5,14,34).

Regression methods

We construct the mapping between short-term temperature response (x) and long-term temperature response (y) described in Fig. 1b using Ridge regression37 and Gaussian Process Regression (GPR)38. These were found to be strongest from a range of machine learning methods tested, including Random Forest and Lasso.

Ridge regression

Given output variable y and input variable x, linear regression uses the mapping

$y = \beta_0 + \sum_{j} \beta_j x_j$   (1)

where there are p predictors, indexed by j = 1, …, p. The parameters to fit are the intercept, β0, and the coefficients, βj, associated with each predictor xj. The method of least squares is used to fit the parameters by minimising the sum of the residual squared error for the training data pairs (xi, yi), i = 1, …, N:

$\sum_{i} \left[ y_i - \left( \beta_0 + \sum_{j} \beta_j x_{ij} \right) \right]^2$   (2)

When the number of samples exactly equals the number of parameters, N = p + 1, this can be minimised to give a unique solution. When N > p + 1 the parameters are overdetermined and this is an optimisation problem in βj. In contrast, when N < p + 1, there are more free parameters, βj, than there are observed data points to constrain them47. There are many possible values of βj that make (2) equal to zero, making this an under-determined problem. Our problem falls under this regime since we have many predictors (one for each grid point, i.e. p = 27,840) but few training simulations (N = 20). This is why we introduce a regularisation constraint which penalises large values of βj. Thus, we minimise47,61:

$\sum_{i} \left[ y_i - \left( \beta_0 + \sum_{j} \beta_j x_{ij} \right) \right]^2 + \lambda \sum_{j} \beta_j^2$   (3)

The last term shrinks many of the βj coefficients close to zero, so that the remaining large coefficients can be viewed as stronger predictors of y. This introduces a bias but lowers the variance5. The regularisation parameter λ controls the amount of shrinkage and is chosen through cross-validation, described below. Once β0 and βj have been learned, we can use (1) to make predictions. We carried out the regression with and without inputs x normalised to zero mean and unit variance, with very little difference in results. We use the Python package scikit-learn to implement Ridge regression and cross-validation62.

Cross-validation

Cross-validation is used here to estimate the best value of λ for prediction based on the available training data. First, we split the training dataset (of size N) into a chosen number of subsets of size NCV. We use three subsets, so NCV is around 6–7. Then, we iterate through a list of possible values of λ, and for each one, the following steps are taken.

(1) Set λ from the list.

(a) Set aside one of the smaller datasets as the validation data (size NCV).
(b) Train the regression model with the remaining data (N − NCV) by minimising (3).
(c) Use the inputs of the validation dataset on the trained model to make predictions on the outputs using (1) and call this y*.
(d) Compare these predictions with the true outputs of the validation dataset using an error metric such as root-mean-squared error (RMSE), accounting for all grid cells i = 1, …, p and weighting by the grid-cell area, wi,

$\mathrm{RMSE}_{cv,\lambda} = \left( \sum_{i=1}^{p} w_i \left( y_i^* - y_i \right)^2 \right)^{1/2}$   (4)

(e) Repeat steps a–d for the other subsets of validation data (we use 3 in total).

(2) Calculate the cross-validation score as the mean RMSE for this value of λ over all three subsets,

$\mathrm{RMSE}_{\lambda} = \frac{1}{3} \sum_{cv=1}^{3} \mathrm{RMSE}_{cv,\lambda}$   (5)

This process is repeated for all values of λ in the list. The value of λ that produces the lowest RMSEλ is selected as the parameter for use in the final stage of training of the model, where all training data is used.

Gaussian Process Regression

Rather than learning the parameters β0 and βj, Gaussian Process Regression is a non-parametric approach, where we seek a distribution over possible functions that fit the data. This is done from a Bayesian perspective, where we define a prior distribution over the possible functions. Then, after observing the data, we use Bayes’ theorem to obtain a posterior distribution over possible functions. The prior distribution is a Gaussian process,

$y \sim \mathcal{GP}\left( \mu_0(x), C_0(x, x') \right)$   (6)

where μ0 is the prior mean function, which we assume to be linear with slope β, μ0(x) = βx, and C0(x, x′) is the prior covariance function, which describes the covariance between two points, x and x′38. We choose the following squared exponential covariance function,

$C_0(x, x') = \sigma^2 \exp\left( -\frac{|x - x'|^2}{2 l^2} \right)$   (7)

where σ2 and l are the output variance and lengthscale, respectively, which reflect the sensitivity of the outputs to changes in inputs38.

The prior Gaussian process is combined with the data using Bayes’ theorem to obtain a posterior distribution over functions. This is another Gaussian process, with an updated mean function, μ*(x), and covariance function, C*(x, x′),

$y \sim \mathcal{GP}\left( \mu^*(x), C^*(x, x') \right)$   (8)

The details can be found in relevant textbooks38. Predictions of the output can then be made at unseen values of x, where the Gaussian process provides both an expected value and the variance around this value.
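A minimal GPy sketch along these lines is shown below for a single output (one grid cell), using toy data and the squared exponential (RBF) covariance of Eq. (7); the linear prior mean function used in the study is omitted here for brevity, and the dimensions, data and hyperparameter initialisations are illustrative assumptions rather than the study's actual configuration.

```python
import numpy as np
import GPy

n_scenarios, n_features = 20, 50       # toy sizes; the study uses full response maps as inputs
rng = np.random.default_rng(4)
X = rng.normal(size=(n_scenarios, n_features))
y = (X @ rng.normal(size=n_features)).reshape(-1, 1) + rng.normal(scale=0.1, size=(n_scenarios, 1))

# Squared exponential covariance with output variance sigma^2 and lengthscale l, cf. Eq. (7).
kernel = GPy.kern.RBF(input_dim=n_features, variance=1.0, lengthscale=1.0)

# GP regression for a single output; GPy expects 2-D arrays for X and y.
m = GPy.models.GPRegression(X, y, kernel)
m.optimize()                           # maximise the marginal likelihood (GPy's default optimiser is a scipy L-BFGS variant)

# Posterior prediction at a new input x*: GPy returns the expected value and
# the predictive variance around it, as described above.
x_star = rng.normal(size=(1, n_features))
mean, var = m.predict(x_star)
print(mean[0, 0], var[0, 0])
```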
Since the prediction is effectively built on correlations between the new inputs and the training data inputs, this variance will be lower for predictions at values of x that are closer to values already seen in the training data. We follow these steps with the framework provided by GPy in Python. The values of β, σ2, and l are learned through optimisation (the L-BFGS optimiser) in GPy63.

Pattern scaling

We benchmark our machine learning models against pattern scaling, a traditional method for obtaining spatial response patterns to forcings without running a full GCM36,39. It has been widely used for conducting regional climate change projections40–42 in impact studies64 and to extend simplified models to predict spatial outputs58,65. Pattern scaling requires one previous GCM run to obtain the long-term response of the variable of interest for a reference scenario. Typically, a strong greenhouse gas perturbation, such as a doubling of CO2, is used as this reference response pattern on the longitude-latitude grid, Vref(lat, lon). We use the 2xCO2 scenario from PDRMIP (since more than half of the simulations are from PDRMIP we expect this to be a more valid reference pattern than the 2xCO2 ECLIPSE scenario)16,31,32. Then, the variable of interest is estimated at each grid point for a new scenario, V*(lat, lon), by multiplying the reference pattern by a scalar value s, i.e.

$V^*(\mathrm{lat}, \mathrm{lon}) = s \times V_{\mathrm{ref}}(\mathrm{lat}, \mathrm{lon})$   (9)

The scalar value s is the ratio of long-term global mean temperature response between the prediction and reference scenario. This can be derived from either a simplified climate model, such as a global energy balance model43,66; a statistical model58; or a mathematical relationship, such as the assumed linear relationship between long-term temperature response and effective radiative forcing (ERF)64,67. We take the latter approach due to the availability of variables required to calculate ERF for the relevant perturbations studied here.

ERF is defined as the energy imbalance between the surface and the top of the atmosphere in a GCM run in which the atmosphere is allowed to respond, while sea-surface temperatures are kept fixed (i.e. no ocean coupling)1,5,8,33. These simulations were run for 5 years in previous studies5,7,8,14,16,31–34 and therefore we average over the first 5 years of the simulations to reduce noise in the estimate of global mean ERFs.

Pattern scaling is generally considered as a fair approximation36,43,66 but it assumes that the magnitude of the response scales linearly with the amount of radiative forcing, which is not necessarily true, particularly for climate forcings of a different type to the reference scenario36. Furthermore, it cannot necessarily predict the highly inhomogeneous effects of certain types of climate forcings such as from aerosol emissions. There are alternative approaches for obtaining a sensible scalar value s, such as using the ratio of short-term temperature response between the predicted and reference scenarios (see Supplementary Fig. 4). We note that such a method can sometimes achieve a higher performance in predicting the mean response in some regions than our machine learning approach. However, it suffers the same limitations as the method presented here, in that the spatial variability in the response is not captured, particularly for short-lived pollutants (Supplementary Fig. 3). This limitation will be true regardless of the choice of scalar value, since the spatial variability is fixed based on the reference pattern.

Prediction errors

We predict the long-term climate response, y*, for each scenario following the three methods described above. We calculate the Root Mean Squared Error (RMSE) at the grid-cell level with

$\mathrm{RMSE} = \left( \sum_{i=1}^{p} w_i \left( y_i^* - y_i \right)^2 \right)^{1/2}$   (10)

where subscript i = 1, …, p indexes the grid cell and wi is the normalised weight of grid cell i. We note that measuring errors at these scales can introduce unintended biases in the evaluation of our methods. For example, even small spatial offsets in climate response patterns can lead to large, nonphysical quantitative errors44. We also show the absolute error in mean response over ten world regions that cover a broader spatial scale (Fig. 3). These are the four main emission regions: North America, Europe, South Asia and East Asia, as defined in the Hemispheric Transport of Air Pollution experiments68; and six remaining regions: the Arctic, Northwest Asia, Northern Africa, Southern Africa, South America and Australia. These cover the land regions where climate responses are of interest due to societal relevance. Here we defined the prediction error as the absolute difference between the predicted response in each region, y*r, and the response from the complex GCM in the same region, yr:

$E_{\mathrm{abs}} = \left| y_r^* - y_r \right|$   (11)

where subscript r indicates the mean response over all grid boxes in that region, weighted by the grid box area. We also calculate the absolute error for the global mean response in the same way. These RMSE, regional and global error metrics are presented in Fig. 3 for all prediction methods.

DATA AVAILABILITY

Data used in this manuscript were originally produced in previous studies5,7,8,14,16,31–34. Postprocessed data used to produce results in this study is available at 10.5281/zenodo.3971024.

CODE AVAILABILITY

Code to produce results is publicly available on github.com/lm2612/Ridge_3 and github.com/lm2612/GPRegression. Use of the HadGEM3-GA4 climate model was provided by the Met Office through the Joint Weather and Climate Research Programme, and the model source code is not generally available. For more information on accessing the model, see http://www.metoffice.gov.uk/research/collaboration/um-collaboration.

Received: 17 March 2020; Accepted: 11 October 2020;

REFERENCES

1. IPCC Climate Change 2014: Synthesis Report (eds. Pachauri, R. K. & Meyer, L. A.) (Cambridge Univ. Press, 2015).
2. Collins, M. et al. Quantifying future climate change. Nat. Clim. Change 2, 403 EP (2012).
3. Rogelj, J., Mccollum, D. L., O'Neill, B. C. & Riahi, K. 2020 emissions levels required to limit warming to below 2 °C. Nat. Clim. Change 3, 405–412 (2013).
4. Shindell, D. & Faluvegi, G. Climate response to regional radiative forcing during the twentieth century. Nat. Geosci. 2, 294 EP (2009).
5. Kasoar, M., Shawki, D. & Voulgarakis, A. Similar spatial patterns of global climate response to aerosols from different regions. npj Clim. Atmos. Sci. 1, 12 (2018).
6. Shine, K. P., Fuglestvedt, J. S., Hailemariam, K. & Stuber, N. Alternatives to the Global Warming Potential for comparing climate impacts of emissions of greenhouse gases. Clim. Change 68, 281–302 (2005).
7. Baker, L. H. et al. Climate responses to anthropogenic emissions of short-lived climate pollutants. Atmos. Chem. Phys. 15, 8201–8216 (2015).
8. Aamaas, B., Berntsen, T. K., Fuglestvedt, J. S., Shine, K. P. & Collins, W. J. Regional temperature change potentials for short-lived climate forcers based on radiative forcing from multiple models. Atmos. Chem. Phys. 17, 10795–10809 (2017).
9. Collins, W. J. et al. Global and regional temperature-change potentials for near-term climate forcers. Atmos. Chem. Phys. 13, 2471–2485 (2013).
10. Bitz, C. M. & Polvani, L. M. Antarctic climate response to stratospheric ozone depletion in a fine resolution ocean climate model. Geophys. Res. Lett. 39, L20705 (2012).
11. Nowack, P. J., Braesicke, P., Luke Abraham, N. & Pyle, J. A. On the role of ozone feedback in the ENSO amplitude response under global warming. Geophys. Res. Lett. 44, 3858–3866 (2017).
12. Hartmann, D. L., Blossey, P. N. & Dygert, B. D. Convection and climate: what have we learned from simple models and simplified settings? Curr. Clim. Chang. Rep. 5, 196–206 (2019).
13. Persad, G. G. & Caldeira, K. Divergent global-scale temperature effects from identical aerosols emitted in different regions. Nat. Commun. 9, 3289 (2018).
14. Shawki, D., Voulgarakis, A., Chakraborty, A., Kasoar, M. & Srinivasan, J. The South Asian monsoon response to remote aerosols: global and regional mechanisms. J. Geophys. Res. Atmos. 123, 11585–11601 (2018).
15. Conley, A. J. et al. Multimodel surface temperature responses to removal of U.S. sulfur dioxide emissions. J. Geophys. Res. Atmos. 123, 2773–2796 (2018).
16. Liu, L. et al. A PDRMIP multimodel study on the impacts of regional aerosol forcings on global and regional precipitation. J. Clim. 31, 4429–4447 (2018).
17. Williams, K. D. et al. The Met Office Global Coupled Model 3.0 and 3.1 (GC3.0 and GC3.1) configurations. J. Adv. Model. Earth Syst. 10, 357–380 (2018).
18. Walters, D. et al. The Met Office Unified Model Global Atmosphere 7.0/7.1 and JULES Global Land 7.0 configurations. Geosci. Model Dev. 12, 1909–1963 (2019).

19. Storkey, D. et al. UK Global Ocean GO6 and GO7: a traceable hierarchy of model resolutions. Geosci. Model Dev. 11, 3187–3213 (2018).
20. Ridley, J. K. et al. The sea ice model component of HadGEM3-GC3.1. Geosci. Model Dev. 11, 713–723 (2018).
21. Ceppi, P., Zappa, G., Shepherd, T. G. & Gregory, J. M. Fast and slow components of the extratropical atmospheric circulation response to CO2 forcing. J. Clim. 31, 1091–1105 (2017).
22. Persad, G. G., Ming, Y., Shen, Z. & Ramaswamy, V. Spatially similar surface energy flux perturbations due to greenhouse gases and aerosols. Nat. Commun. 9, 3247 (2018).
23. Ryan, E., Wild, O., Voulgarakis, A. & Lee, L. Fast sensitivity analysis methods for computationally expensive models with multi-dimensional output. Geosci. Model Dev. 11, 3131–3146 (2018).
24. Bracco, A., Falasca, F., Nenes, A., Fountalis, I. & Dovrolis, C. Advancing climate science with knowledge-discovery through data mining. npj Clim. Atmos. Sci. 1, 20174 (2018).
25. Kretschmer, M., Runge, J. & Coumou, D. Early prediction of extreme stratospheric polar vortex states based on causal precursors. Geophys. Res. Lett. 44, 8592–8600 (2017).
26. Nowack, P. et al. Using machine learning to build temperature-based ozone parameterizations for climate sensitivity simulations. Environ. Res. Lett. 13, 104016 (2018).
27. Sippel, S. et al. Uncovering the forced climate response from a single ensemble member using statistical learning. J. Clim. 32, 5677–5699 (2019).
28. Runge, J. et al. Inferring causation from time series in Earth system sciences. Nat. Commun. 10, 2553 (2019).
29. Knüsel, B. et al. Applying big data beyond small problems in climate research. Nat. Clim. Change 9, 196–202 (2019).
30. Nowack, P., Runge, J., Eyring, V. & Haigh, J. D. Causal networks for climate model evaluation and constrained projections. Nat. Commun. 11, 1415 (2020).
31. Myhre, G. et al. PDRMIP: a precipitation driver and response model intercomparison project-protocol and preliminary results. Bull. Am. Meteorol. Soc. 98, 1185–1198 (2017).
32. Samset, B. H. et al. Fast and slow precipitation responses to individual climate forcers: a PDRMIP multimodel study. Geophys. Res. Lett. 43, 2782–2791 (2016).
33. Stohl, A. et al. Evaluating the climate and air quality impacts of short-lived pollutants. Atmos. Chem. Phys. 15, 10529–10566 (2015).
34. Kasoar, M. et al. Regional and global temperature response to anthropogenic SO2 emissions from China in three climate models. Atmos. Chem. Phys. 16, 9785–9804 (2016).
35. Bishop, C. M. Pattern Recognition and Machine Learning (Information Science and Statistics) (Springer-Verlag, 2006).
36. Mitchell, T. D. Pattern scaling: an examination of the accuracy of the technique for describing future climates. Clim. Change 60, 217–242 (2003).
37. Hoerl, A. E. & Kennard, R. W. Ridge regression: applications to nonorthogonal problems. Technometrics 12, 69–82 (1970).
38. Rasmussen, C. E. & Williams, C. K. I. Gaussian Processes for Machine Learning (MIT Press, 2006).
39. Santer, B. D., Wigley, T. M. L., Schlesinger, M. E. & Mitchell, J. F. B. Developing Climate Scenarios from Equilibrium GCM Results. Report 47 (Max Planck Institut für Meteorologie, Hamburg, 1990).
40. Hulme, M., Raper, S. C. B. & Wigley, T. M. L. An integrated framework to address climate change (ESCAPE) and further developments of the global and regional climate modules (MAGICC). Energy Policy 23, 347–355 (1995).
41. Murphy, J. M. et al. A methodology for probabilistic predictions of regional climate change from perturbed physics ensembles. Philos. Trans. R. Soc. A Math. Phys. Eng. Sci. 365, 1993–2028 (2007).
42. Watterson, I. G. Calculation of probability density functions for temperature and precipitation change under global warming. J. Geophys. Res. Atmos. 113, D12106 (2008).
43. Tebaldi, C. & Arblaster, J. M. Pattern scaling: its strengths and limitations, and an update on the latest model simulations. Clim. Change 122, 459–471 (2014).
44. Rougier, J. Ensemble averaging and mean squared error. J. Clim. 29, 8865–8870 (2016).
45. Hall, A., Cox, P., Huntingford, C. & Klein, S. Progressing emergent constraints on future climate change. Nat. Clim. Change 9, 269–278 (2019).
46. Fu, Q., Manabe, S. & Johanson, C. M. On the warming in the tropical upper troposphere: models versus observations. Geophys. Res. Lett. 38, L15704 (2011).
47. Hastie, T., Tibshirani, R. & Friedman, J. The Elements of Statistical Learning (Springer New York Inc., 2001).
48. Pendergrass, A. G., Knutti, R., Lehner, F., Deser, C. & Sanderson, B. M. Precipitation variability increases in a warmer climate. Sci. Rep. 7, 17966 (2017).
49. Pendergrass, A. G. & Knutti, R. The uneven nature of daily precipitation and its change. Geophys. Res. Lett. 45, 11980–11988 (2018).
50. Pendergrass, A. G., Lehner, F., Sanderson, B. M. & Xu, Y. Does extreme precipitation intensity depend on the emissions scenario? Geophys. Res. Lett. 42, 8767–8774 (2015).
51. Williamson, D., Blaker, A. T., Hampton, C. & Salter, J. Identifying and removing structural biases in climate models with history matching. Clim. Dyn. 45, 1299–1324 (2015).
52. Cumming, J. & Goldstein, M. Bayes linear uncertainty analysis for oil reservoirs based on multiscale computer experiments. In Oxford Handbook of Applied Bayesian Analysis (eds. O’Hagan, A. & West, M.) 241–270 (Oxford University Press, 2010).
53. McNeall, D. J., Challenor, P. G., Gattiker, J. R. & Stone, E. J. The potential of an observational data set for calibration of a computationally expensive computer model. Geosci. Model Dev. Discuss. 6, 1715–1728 (2013).
54. Salter, J. M. & Williamson, D. A comparison of statistical emulation methodologies for multi-wave calibration of environmental models. Environmetrics 27, 507–523 (2016).
55. Rougier, J., Sexton, D. M. H., Murphy, J. M. & Stainforth, D. Analyzing the climate sensitivity of the HadSM3 climate model using ensembles from different but related experiments. J. Clim. 22, 3540–3557 (2009).
56. Lee, L. A., Reddington, C. L. & Carslaw, K. S. On the relationship between aerosol model uncertainty and radiative forcing uncertainty. Proc. Natl Acad. Sci. USA 113, 5820–5827 (2016).
57. Edwards, T. L. et al. Revisiting Antarctic ice loss due to marine ice-cliff instability. Nature 566, 58–64 (2019).
58. Castruccio, S. et al. Statistical emulation of climate model projections based on precomputed GCM runs. J. Clim. 27, 1829–1844 (2014).
59. Tran, G. T. et al. Building a traceable climate model hierarchy with multi-level emulators. Adv. Stat. Climatol. Meteorol. Oceanogr. 2, 17–37 (2016).
60. Shindell, D. T., Voulgarakis, A., Faluvegi, G. & Milly, G. Precipitation response to regional radiative forcing. Atmos. Chem. Phys. 12, 6969–6982 (2012).
61. Tibshirani, R. Regression shrinkage and selection via the lasso. J. R. Stat. Soc. Ser. B 58, 267–288 (1996).
62. Pedregosa, F. et al. Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011).
63. GPy. GPy: a Gaussian process framework in Python. http://github.com/SheffieldML/GPy (2014).
64. Huntingford, C. & Cox, P. M. An analogue model to derive additional climate change scenarios from existing GCM simulations. Clim. Dyn. 16, 575–586 (2000).
65. Harris, G. R. et al. Frequency distributions of transient regional climate change from perturbed physics ensembles of general circulation model simulations. Clim. Dyn. 27, 357–375 (2006).
66. Ishizaki, Y. et al. Temperature scaling pattern dependence on representative concentration pathway emission scenarios. Clim. Change 112, 535–546 (2012).
67. Gregory, J. M. et al. A new method for diagnosing radiative forcing and climate sensitivity. Geophys. Res. Lett. 31, L03205 (2004).
68. Sanderson, M. G. et al. A multi-model study of the hemispheric transport and deposition of oxidised nitrogen. Geophys. Res. Lett. 35, L17815 (2008).

ACKNOWLEDGEMENTS

L.A.M.’s work was funded through EPSRC grant EP/L016613/1. P.J.N. is supported through an Imperial College Research Fellowship. A.V. is partially funded by the Leverhulme Centre for Wildfires, Environment and Society through the Leverhulme Trust, grant RC-2018-023. Simulations with HadGEM3-GA4 were performed using the MONSooN system, a collaborative facility supplied under the Joint Weather and Climate Research Programme, which is a strategic partnership between the Met Office and the Natural Environment Research Council.

AUTHOR CONTRIBUTIONS

This work was initiated by A.V. L.A.M. carried out the analyses and wrote the manuscript. A.V. and P.J.N. supervised and contributed to writing. P.J.N. and R.G.E. advised on statistical methods. M.K. and B.C. performed the simulations.

COMPETING INTERESTS

The authors declare no competing interests.

ADDITIONAL INFORMATION

Supplementary information is available for this paper at https://doi.org/10.1038/s41612-020-00148-5.

Correspondence and requests for materials should be addressed to L.A.M.

Reprints and permission information is available at http://www.nature.com/reprints

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

© CROWN 2020

