Connecting Dynamic Vegetation Models To Data - An Inverse Perspective
SPECIAL ISSUE
http://wileyonlinelibrary.com/journal/jbi
doi:10.1111/j.1365-2699.2012.02745.x
F. Hartig et al.
[Figure: schematic of a dynamic vegetation model across three scales (individual scale, plot scale, geographical scale). Labels: carbon allocation; seeds; upscaling; projection into geographic space. Niche panels for species 1 and species 2 (axes: Factor 1, Factor 2): fundamental niche, growth rate > 0 in isolation; realized niche, growth rate > 0 in community. Map panel (axes: longitude, latitude): realized distribution, map of realized niche to environmental conditions.]
properties (Köhler et al., 2000; see also Huth & Ditzer, 2000;
FORMIX model; Sitch et al., 2003; LPJ model; Scheiter &
Higgins, 2009; aDGVM model). It should be noted, however,
that the number of PFTs that are locally co-occurring in global
vegetation models is typically much lower than in their more
local counterparts (e.g. Huth & Ditzer, 2000).
Central to most DVMs are also interactions between plants,
for example through competition for resources such as light,
water and nutrients that are available in a common resource
pool (Fig. 1). These competition processes are major determinants of predicted community composition, non-equilibrium
dynamics and succession. Examples are the dry limit of trees,
which is determined in many DGVMs by competition with
grasses and fire (Sitch et al., 2003; Scheiter & Higgins, 2009),
and successional dynamics in forest gap models, which are
based on vertical light competition between trees of different
sizes (Bugmann, 1996; Huth & Ditzer, 2000). Finally, geographical distribution and productivity maps can be created by
upscaling model predictions to the scale at which environmental information is available (Fig. 1).
Thus, DVMs take a bottom-up approach to predict from
basic processes the responses of plants to environmental
conditions and competition. There are, however, limitations
to constraining all necessary parameters and processes bottom-up. For example, although many processes in DGVMs
are relatively well understood, small differences in parameterization and model design can propagate to create large
uncertainties (Cramer et al., 2001; Sitch et al., 2008; Galbraith
et al., 2010). These uncertainties are unlikely to have been
reduced by the newer generation of DVMs (Moorcroft et al.,
2001; Smith et al., 2001; Sato et al., 2007). Moreover, the desire
to include additional processes, such as dispersal and migration
(Lischke et al., 2006), more detailed plant physiology (Higgins
et al., 2012), seed competition (Bohn et al., 2011), environmental modulation (Linder et al., 2012) or phenotypic plasticity
and evolution (e.g. Parmesan, 2006; Kramer et al., 2010), is
bound to add to the problem of constraining uncertainty about
parameters and processes. A further issue in global vegetation
models is the practice of summarizing a large number of species
or individuals by an average functional type. The Amazonian
Basin, for example, is home to c. 11,000 tree species (Hubbell
et al., 2008), but many global vegetation models summarize this
diversity by only two functional types. While we concede that it
may not always be necessary to model each species explicitly,
there is substantial evidence that interspecific and intraspecific
diversity in species traits may be important for understanding
and correctly predicting community dynamics (Díaz & Cabido,
2001; Clark, 2010; McMahon et al., 2011).
To address these challenges, we have to find additional
information to constrain parameters and processes. To that
end, it seems natural to explore the potential of inverse
modelling approaches, which allow for the use of a large range
of data types from different sources and scales that cannot be
used for direct parameter estimation. Modern Bayesian
methods provide the means to combine such inverse parameter estimates with any direct parameter estimates that are
available, and thus provide a method to synthesize all available
data sources. In the following section we review the general
technical prerequisites of Bayesian methods for parameter
estimation and model selection. This will be useful for
Journal of Biogeography 39, 2240–2252
© 2012 Blackwell Publishing Ltd
[Equation fragment: $\frac{(M(\Theta) - D)^2}{2\sigma^2}$]
Bayes' theorem
By specifying the likelihood function, we have established a
statistically interpretable measure of fit, which could, for
example, be used to search for the model parameters Θ with
the highest likelihood (maximum likelihood estimation).
Often, however, there is additional, independent information
about parameters or processes available, for example from field
data that provide direct parameter estimates. Bayes' theorem
offers the possibility of merging such independent information
with the inversely generated information that is contained in
the likelihood. The theorem states that
$$p(\Theta \mid D) = \frac{p(D \mid \Theta)\, p(\Theta)}{p(D)}$$
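To make the roles of the terms in Bayes' theorem concrete, the following minimal sketch evaluates a posterior on a grid for a single parameter. It is an illustration under stated assumptions, not a method taken from this review: the function name `grid_posterior`, the toy model M(Θ) = Θ, the single observation and the Gaussian error and prior widths are all hypothetical.

```python
import numpy as np

def grid_posterior(theta_grid, log_likelihood, log_prior):
    """Normalized posterior density on an equally spaced parameter grid."""
    log_post = log_likelihood(theta_grid) + log_prior(theta_grid)
    post = np.exp(log_post - log_post.max())  # subtract max for numerical stability
    step = theta_grid[1] - theta_grid[0]
    # Dividing by the integral plays the role of the evidence p(D)
    return post / (post.sum() * step)

# Hypothetical toy setup (assumption): the "model" output is M(theta) = theta,
# one observation D = 2.0 with Gaussian error (sd 1.0), Gaussian prior N(0, 1).
theta = np.linspace(-5.0, 7.0, 2401)
D, sigma, prior_mean, prior_sd = 2.0, 1.0, 0.0, 1.0

log_lik = lambda th: -0.5 * ((th - D) / sigma) ** 2
log_pri = lambda th: -0.5 * ((th - prior_mean) / prior_sd) ** 2

posterior = grid_posterior(theta, log_lik, log_pri)
step = theta[1] - theta[0]
posterior_mean = float((theta * posterior).sum() * step)
```

With equally informative prior and likelihood, the posterior mean in this toy case lies halfway between the prior mean and the observation. Grid evaluation is only practical for one or two parameters, which is why sampling algorithms are used for models with many parameters.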
[Figure: prior panel labelled "Strong / informative prior" (axes: parameter value, probability density), and a workflow diagram of Bayesian inference for a dynamic vegetation model. Measurements (e.g. specific leaf area) give a prior parameter estimate; the model produces predictions (stem diameter, abundance, NPP predicted); an error model compares them with observed data (stem diameter, abundance, NPP observed) to give the likelihood of the observed data given the model; the result is a posterior parameter estimate, used for forecasting, quantification of predictive uncertainty, and model selection and improvement; repeat to add new information.]
MATLAB or Python, there are packages that provide
implementations for the most common algorithms. Also,
there are stand-alone implementations such as OpenBUGS or
JAGS.
A point of practical importance is the number of model
calculations needed for these algorithms: an inverse Bayesian
parameter estimation typically requires about 5 × 10⁴ to
5 × 10⁵ model runs for a model of the order of 20 parameters.
Given that there are limits to the parallelizability particularly of
MCMC algorithms, it follows that a single model run should
not take longer than a minute (a runtime of 1 min for a single
simulation and 10⁵ steps of the algorithm result in a total
runtime of c. 70 days). For models with longer runtimes,
model simplification, model parallelization or model emulation (Conti & O'Hagan, 2010) can be considered.
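The runtime argument can be illustrated with a minimal sketch of a random-walk Metropolis sampler, one of the simplest MCMC algorithms. This is not the implementation of any particular package: the function names, the toy "model" that predicts a single constant, the simulated observations and the tuning values are all assumptions chosen for illustration. Each iteration needs one evaluation of the posterior, and hence one model run, and iterations must be executed one after another.

```python
import numpy as np

def serial_runtime_days(n_runs, minutes_per_run):
    """Total runtime in days if model runs are executed one after another."""
    return n_runs * minutes_per_run / (60.0 * 24.0)

def metropolis(log_posterior, theta0, n_steps, proposal_sd, rng):
    """Random-walk Metropolis sampler; one posterior evaluation per step."""
    theta = np.asarray(theta0, dtype=float)
    current = log_posterior(theta)
    chain = np.empty((n_steps, theta.size))
    for i in range(n_steps):
        proposal = theta + rng.normal(0.0, proposal_sd, size=theta.size)
        candidate = log_posterior(proposal)  # in practice: one full model run
        # Accept with probability min(1, posterior ratio)
        if np.log(rng.uniform()) < candidate - current:
            theta, current = proposal, candidate
        chain[i] = theta
    return chain

# Hypothetical toy problem (assumption): the "model" predicts a constant
# theta[0]; observations are simulated with known Gaussian error (sd 0.5).
rng = np.random.default_rng(1)
observations = rng.normal(3.0, 0.5, size=50)

def log_posterior(theta):
    log_lik = -0.5 * np.sum(((observations - theta[0]) / 0.5) ** 2)
    log_prior = -0.5 * (theta[0] / 10.0) ** 2  # weak Gaussian prior, sd 10
    return log_lik + log_prior

chain = metropolis(log_posterior, [0.0], 20000, 0.2, rng)
posterior_sample = chain[5000:, 0]  # discard the first 5000 steps as burn-in

days = serial_runtime_days(1e5, 1.0)  # 10^5 runs of 1 min each
```

Because every proposal depends on the current state of the chain, the loop cannot simply be distributed over many processors, which is the parallelization limit referred to above.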
SELECTING FIELD DATA FOR INVERSE
MODELLING OF VEGETATION MODELS
The final part of this paper is devoted to data and applications.
We discuss what we perceive as the most promising data types
for constraining parameters and processes of DVMs, and argue
that combining multiple data types at different scales provides
a promising general strategy for obtaining strong, independent
data sets.
Data from local vegetation inventories
One traditional source of vegetation data is local vegetation
inventories. Such observations usually describe the vegetation
on sampling plots of up to a few hectares, listing species,
abundance, sizes or biomass, growth and sometimes also the
spatial location of plant individuals. Direct vegetation observations are comparatively abundant for forests [Condit et al.,
2000; Jenkins et al., 2001; ter Steege et al., 2006; see also the
Forest Inventory Analysis program (FIA) and the Center for
Tropical Forest Science (CTFS) forest plot network], but there
are also good datasets available for grasslands (Roscher et al.,
2004), savannas (Sankaran et al., 2005) and from general
databases such as the Global Index of Vegetation-Plot
Passive: Sensors such as Landsat or MODIS (Moderate Resolution Imaging Spectroradiometer) observe the visible, infrared and microwave spectrum. This allows estimation of leaf area via the normalized difference vegetation index, NDVI (see, e.g. Zhang et al., 2003), as well as water content and temperature profiles of canopies (infrared spectrum, Fensholt et al., 2010), soil moisture and temperature (microwave spectrum, Kerr, 2007) and even chemical and taxonomic diversity (Asner & Martin, 2009).
Laser (lidar): A lidar device creates a laser beam and measures the reflection created by all optically reflecting material, i.e. ground, leaves etc. May be used to estimate vegetation height and leaf area index, LAI (Asner et al., 2010; Lefsky, 2010).
Radar: A radar device creates an electromagnetic signal, typically radio waves or microwaves, and measures reflections of this signal. Depending on the wavelength, reflections are triggered particularly by the massive parts of the vegetation, i.e. branches and trunks. Used to estimate aboveground biomass and height/vertical structure of the vegetation (Krieger et al., 2010; Köhler & Huth, 2010), and also soil moisture (Kerr, 2007) and temperature.
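As a small illustration of how a passive-sensor product is derived, the sketch below computes the NDVI from near-infrared and red reflectance using its standard definition, NDVI = (NIR − Red) / (NIR + Red). The function name `ndvi` and the example reflectance values are hypothetical; real products involve additional corrections that are not shown here.

```python
import numpy as np

def ndvi(nir, red):
    """Normalized difference vegetation index from NIR and red reflectance."""
    nir = np.asarray(nir, dtype=float)
    red = np.asarray(red, dtype=float)
    total = nir + red
    out = np.full(total.shape, np.nan)  # pixels with zero signal stay NaN
    np.divide(nir - red, total, out=out, where=total != 0)
    return out

# Hypothetical reflectance values: vegetated pixel, bare pixel, no-signal pixel
values = ndvi([0.5, 0.3, 0.0], [0.1, 0.3, 0.0])
```

Values close to 1 indicate dense green vegetation, and values near 0 indicate little or no green vegetation.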
models, which may increase the computational burden and lead
to instabilities in inference if the correlation structure is not
captured adequately. Thus, information gains have to be
weighed up against potential problems that come with new data.
Correlations, however, may not only be influenced by the
types of data that are chosen, but also by the way the data are
aggregated and represented. Data can be transformed, changed
in scale (up/downscaling) or summarized. A sensible choice
here may considerably reduce correlations. Thus, we believe
that independence of data is a technical problem, but we still
recommend considering the full spectrum of vegetation data
for inference. In particular, we recommend including data that
are measured at different scales (Fig. 5) because errors will
tend to be less correlated and information more complementary than for data obtained at the same scale (Grimm &
Railsback, 2012). Using all available data sources will not only
facilitate a synthesis of our empirical information as a product
that is more useful for ecological theory and prediction, it will
also facilitate the further development of DVMs by systematically assessing their ability to match multiple empirical
observations at the same time.
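The simplest way to match several observations at once is to sum the log-likelihoods of the individual data streams. The sketch below does this under the assumption, made here for illustration only, that errors are Gaussian and independent within and between data streams. As discussed above, this assumption fails when data are correlated, in which case the correlation structure would have to be modelled explicitly. All names, data streams and numbers are hypothetical.

```python
import numpy as np

def gaussian_log_likelihood(predicted, observed, sd):
    """Log-likelihood of observations under independent Gaussian errors."""
    predicted = np.asarray(predicted, dtype=float)
    observed = np.asarray(observed, dtype=float)
    resid = observed - predicted
    return float(np.sum(-0.5 * np.log(2.0 * np.pi * sd ** 2)
                        - 0.5 * (resid / sd) ** 2))

def joint_log_likelihood(predictions, observations, error_sd):
    """Sum over data streams; valid only if streams are mutually independent."""
    return sum(
        gaussian_log_likelihood(predictions[name], observations[name], error_sd[name])
        for name in observations
    )

# Hypothetical data streams measured at different scales (assumption)
predictions = {"stem_diameter": [10.0, 12.0], "npp": [5.0]}
observations = {"stem_diameter": [11.0, 12.0], "npp": [5.0]}
error_sd = {"stem_diameter": 1.0, "npp": 1.0}

total = joint_log_likelihood(predictions, observations, error_sd)
```

The error standard deviation of each stream controls how strongly that stream constrains the parameters, so a stream with large observation error contributes little to the joint fit.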
CONCLUSIONS
We have discussed the principles of the Bayesian modelling
approach for inferring parameters and structure of DVMs.
Inverse parameterization of DVMs resembles the correlative
species distribution modelling approach. The importance of
prior knowledge about parameters, and also about model
structure, however, will remain an area where DVMs differ
significantly from correlative modelling approaches. We
therefore think that inverse modelling methods will not, as
one might fear, reduce DVMs to merely a very complicated
version of a correlative model that is blindly adjusted to data.
Rather, Bayesian methods offer a state-of-the-art technique
that allows us to use all available data (on model inputs and
model outputs) to improve our knowledge about the
processes that govern vegetation dynamics, and hence our
ability to predict the future development of the Earth's
vegetation.
The additional information that can be gained by applying
inverse modelling methods to heterogeneous data types such as
species distributions, local vegetation inventories, eddy flux
measurements, and remote sensing data offers tremendous
scientific potential (see also Luo et al., 2011). Using many or
all of these data in parallel will allow us to build models that
provide a better representation of the dynamics and functional
diversity of the terrestrial vegetation. Bayesian methods will
also improve our knowledge about the uncertainty of model
predictions, which is an important factor for informing policy.
The benefits of such a route lie not only in the potential to
generate substantially better predictions, making DVMs more
relevant for applied questions, but also in the ability to test
ecological theory with DVMs at different temporal and spatial
scales, which allows fundamental questions of evolution,
biogeography and community ecology to be addressed. For
Hartig, F., Calabrese, J.M., Reineking, B., Wiegand, T. & Huth, A. (2011) Statistical inference for stochastic simulation models – theory and application. Ecology Letters, 14, 816–827.
Hickler, T., Vohland, K., Feehan, J., Miller, P.A., Smith, B., Costa, L., Giesecke, T., Fronzek, S., Carter, T., Cramer, W., Kühn, I. & Sykes, M.T. (2012) Projecting the future distribution of European potential natural vegetation zones with a generalized, tree species-based dynamic vegetation model. Global Ecology and Biogeography, 21, 50–63.
Higgins, S.I., Scheiter, S. & Sankaran, M. (2010) The stability of African savannas: insights from the indirect estimation of the parameters of a dynamic model. Ecology, 91, 1682–1692.
Higgins, S.I., O'Hara, R.B., Bykova, O., Cramer, M.D., Chuine, I., Gerstner, E.-M., Hickler, T., Morin, X., Kearney, M.R., Midgley, G.F. & Scheiter, S. (2012) A physiological analogy of the niche for projecting the potential distribution of plants. Journal of Biogeography, 39, 2132–2145.
Hubbell, S.P., He, F., Condit, R., Borda-de-Água, L., Kellner, J. & ter Steege, H. (2008) How many tree species are there in the Amazon and how many of them will go extinct? Proceedings of the National Academy of Sciences USA, 105, 11498–11504.
Hurtt, G.C., Fisk, J., Thomas, R.Q., Dubayah, R., Moorcroft, P.R. & Shugart, H.H. (2010) Linking models and data on vegetation structure. Journal of Geophysical Research, 115, G00E10.
Huth, A. & Ditzer, T. (2000) Simulation of the growth of a lowland Dipterocarp rain forest with FORMIX3. Ecological Modelling, 134, 1–25.
Jeltsch, F., Moloney, K.A., Schurr, F.M., Köchy, M. & Schwager, M. (2008) The state of plant population modelling in light of environmental change. Perspectives in Plant Ecology, Evolution and Systematics, 9, 171–189.
Jenkins, J.C., Birdsey, R.A. & Pan, Y. (2001) Biomass and NPP estimation for the mid-Atlantic region (USA) using plot-level forest inventory data. Ecological Applications, 11, 1174–1193.
Johnson, J.B. & Omland, K.S. (2004) Model selection in ecology and evolution. Trends in Ecology and Evolution, 19, 101–108.
Kadane, J.B. & Lazar, N.A. (2004) Methods and criteria for model selection. Journal of the American Statistical Association, 99, 279–290.
Kass, R.E. & Wasserman, L. (1996) The selection of prior distributions by formal rules. Journal of the American Statistical Association, 91, 1343–1370.
Kattge, J., Díaz, S., Lavorel, S. et al. (2011) TRY – a global database of plant traits. Global Change Biology, 17, 2905–2935.
Kerr, Y.H. (2007) Soil moisture from space: where are we? Hydrogeology Journal, 15, 117–120.
Knorr, W. & Kattge, J. (2005) Inversion of terrestrial ecosystem model parameter values against eddy covariance measurements by Monte Carlo sampling. Global Change Biology, 11, 1333–1351.
Xiaodong, Y. & Shugart, H.H. (2005) FAREAST: a forest gap model to simulate dynamics and patterns of eastern Eurasian forests. Journal of Biogeography, 32, 1641–1658.
Zhang, X., Friedl, M.A., Schaaf, C.B., Strahler, A.H., Hodges, J.C.F., Gao, F., Reed, B.C. & Huete, A. (2003) Monitoring vegetation phenology using MODIS. Remote Sensing of Environment, 84, 471–475.
Zimmermann, N.E., Edwards, T.C., Graham, C.H., Pearman, P.B. & Svenning, J.-C. (2010) New trends in species distribution modelling. Ecography, 33, 985–989.
Zurell, D., Berger, U., Cabral, J.S., Jeltsch, F., Meynard, C.N., Münkemüller, T., Nehrbass, N., Pagel, J., Reineking, B., Schröder, B. & Grimm, V. (2009) The virtual ecologist approach: simulating data and observers. Oikos, 119, 622–635.
BIOSKETCH
Florian Hartig is interested in the processes that govern the
assembly, dynamics, distribution and evolution of ecological
communities. His current research focuses on diversity
patterns of tropical rain forests. All authors of this paper
share a common interest in vegetation modelling and/or
statistical methodology.
Author contributions: All authors worked together in designing and writing this review. F.H. coordinated the discussion
and led the writing.