Geostatistical Analysis: Lectures 6-8, Feb 25 - Mar 10, 2008
Purposes
A q-q plot is a plot of the quantiles of the first data set against the quantiles of the second
data set. By a quantile, we mean the fraction (or percent) of points below the given
value. That is, the 0.3 quantile is the point at which 30% of the data fall
below and 70% fall above that value.
The 0.3 quantile is also called the 30th percentile; the 0.4 quantile is the 40th percentile.
The 25th percentile is called the first quartile.
The 50th percentile is called the second quartile (equal to the median value).
The 75th percentile is called the third quartile.
The 100th percentile is called the fourth quartile.
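The quantile definitions above can be checked numerically. A minimal sketch using NumPy; the sample values are made up for illustration:

```python
import numpy as np

# Hypothetical sample data (e.g., measured elevations)
data = np.array([12.0, 15.0, 9.0, 22.0, 18.0, 14.0, 30.0, 11.0])

# The 0.3 quantile: 30% of the data fall below this value
q30 = np.quantile(data, 0.3)

# The quartiles: 25th, 50th (median), and 75th percentiles
q1, q2, q3 = np.quantile(data, [0.25, 0.5, 0.75])

print(q30, q1, q2, q3)
```

The second quartile returned here always matches `np.median(data)`.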
Normal QQPlot
General QQPlot
1.3 Voronoi map
• The semivariogram/covariance
cloud shows the empirical
semivariogram (half of the
difference squared) and covariance
for all pairs of locations within a
dataset and plots them as a
function of the distance between
the two locations.
• the empirical semivariogram for the
(i,j)th pair is simply
0.5*(z(si) - z(sj))^2, and the empirical
covariance is the cross-product
(z(si) - z̄)(z(sj) - z̄), where z̄ is the
average of the data. The
semivariogram/covariance cloud
can be used to examine the local
characteristics of spatial
autocorrelation within a dataset
and look for outliers.
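The cloud described above can be sketched in a few lines. The coordinates and values below are made up for illustration:

```python
import numpy as np
from itertools import combinations

# Hypothetical sample: coordinates (x, y) and measured values z
coords = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [2.0, 2.0]])
z = np.array([10.0, 12.0, 11.0, 20.0])
zbar = z.mean()

cloud = []  # (distance, semivariance, covariance) for every pair of locations
for i, j in combinations(range(len(z)), 2):
    h = np.linalg.norm(coords[i] - coords[j])   # separation distance
    gamma = 0.5 * (z[i] - z[j]) ** 2            # empirical semivariance
    cov = (z[i] - zbar) * (z[j] - zbar)         # empirical covariance
    cloud.append((h, gamma, cov))

for h, gamma, cov in sorted(cloud):
    print(f"h={h:.2f}  gamma={gamma:.2f}  cov={cov:.2f}")
```

Plotting `gamma` (or `cov`) against `h` for all pairs gives the semivariogram/covariance cloud; isolated high values at short distances suggest outliers.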
Creating Variography
The distance at which the model first flattens out is known as the range.
The value the model attains at the range is called the sill.
The value at which the model intercepts the y-axis is called the nugget.
Fitting the semivariogram
• Circular, spherical, exponential,
Gaussian, and linear.
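As a sketch of how range, sill, and nugget shape one of these models, here is the standard spherical form as a small function; the parameter values are made up for illustration:

```python
import numpy as np

def spherical(h, nugget, sill, rng):
    """Spherical semivariogram model: rises from the nugget and
    flattens at the sill once the distance h reaches the range."""
    h = np.asarray(h, dtype=float)
    inside = nugget + (sill - nugget) * (1.5 * h / rng - 0.5 * (h / rng) ** 3)
    return np.where(h < rng, inside, sill)

# Within the range the curve rises; beyond the range it stays at the sill
print(spherical([0.0, 5.0, 10.0, 20.0], nugget=1.0, sill=5.0, rng=10.0))
```

The same structure (nugget, partial sill, range) carries over to the circular, exponential, and Gaussian models; only the curve shape changes.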
Making a prediction
We usually use power values greater than 1.
A power of 2 is known as inverse distance
squared weighted interpolation.
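A minimal sketch of inverse distance weighting with power p (the sample coordinates and values are made up; p=2 gives the inverse distance squared case):

```python
import numpy as np

def idw(coords, values, target, p=2.0):
    """Inverse distance weighted prediction at `target`.
    p=2.0 is inverse distance squared weighting."""
    d = np.linalg.norm(coords - target, axis=1)
    if np.any(d == 0):                # exact interpolator at sample points
        return values[np.argmin(d)]
    w = 1.0 / d ** p                  # weights fall off with distance
    return np.sum(w * values) / np.sum(w)

coords = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
values = np.array([10.0, 20.0, 30.0])
print(idw(coords, values, np.array([0.5, 0.5])))
```

Because the prediction is a weighted average of the data, IDW never predicts outside the range of the measured values.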
Advantages and disadvantages
3. Global Polynomial interpolator
First order: a plane, b0 + b1*x + b2*y
Second order: adds the x^2, x*y, and y^2 terms
Third order: adds the x^3, x^2*y, x*y^2, and y^3 terms
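Fitting a global polynomial is an ordinary least-squares problem. A sketch for the first-order case; the sample points are made up and lie exactly on a plane so the fit recovers the coefficients:

```python
import numpy as np

# Hypothetical samples: coordinates (x, y) and values z lying on a plane
x = np.array([0.0, 1.0, 0.0, 1.0, 2.0])
y = np.array([0.0, 0.0, 1.0, 1.0, 2.0])
z = 2.0 + 3.0 * x - 1.0 * y        # exact first-order trend

# First-order design matrix: z = b0 + b1*x + b2*y
A = np.column_stack([np.ones_like(x), x, y])
b, *_ = np.linalg.lstsq(A, z, rcond=None)
print(b)   # recovers the plane coefficients

# A second-order surface would add x**2, x*y, and y**2 columns to A
```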
4. Local Polynomial interpolator
5. Radial basis functions
weight:
- regularized: 0, 0.001, 0.01, 0.1, 0.5
the higher the weight, the smoother the surface
- tension: 0, 1, 5, 10
the higher the tension, the coarser the surface
number of points:
- used in the calculation; the more points,
the smoother the surface
6. Kriging
• Kriging is a moderately quick interpolator that can be
exact or smoothed depending on the measurement error
model. It is very flexible and allows you to investigate
graphs of spatial autocorrelation. Kriging uses statistical
models that allow a variety of map outputs, including
predictions, prediction standard errors, probability, etc.
The flexibility of kriging can require a lot of decision-
making. Kriging assumes the data come from a
stationary stochastic process (1. constant mean
throughout the region, and 2. the variance of differences
between any two samples is independent of position and
depends only on their separation distance), and some methods
assume normally-distributed data.
Trend and error
• In Kriging, a predicted value depends on two factors: a trend and an additional
element of variability. This is an intuitive idea with plenty of analogies in the real
world. For instance, if you go from the ocean to the top of a mountain, you
have an upward trend in elevation. However, there is likely to be variation on
the way—you will go both up and down when crossing valleys, streams, knobs
and other features.
• In Kriging, the large-scale part of a prediction is called the trend. The fluctuation part
is called spatially-autocorrelated random error. "Error" doesn't mean a
mistake; it just means a fluctuation from the trend. Z(s) = μ(s) + ε(s)
– Assumption one: the expected value of ε(s) is zero (positive errors and
negative errors balance out)
– Assumption two: the autocorrelation of the error is purely spatial; it
depends only on distance and not on any other property (such as position)
of a location.
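A minimal ordinary-kriging sketch based on the decomposition above, assuming a constant (unknown) mean and an exponential covariance C(h) = sill * exp(-h / range); the coordinates, values, and parameter choices are made up for illustration:

```python
import numpy as np

def ordinary_kriging(coords, z, target, sill=1.0, rng=5.0):
    """Ordinary kriging with an exponential covariance model.
    Solves for weights that minimize prediction variance subject
    to the unbiasedness constraint (weights sum to 1)."""
    n = len(z)
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
    C = sill * np.exp(-d / rng)                    # sample-to-sample covariances
    c0 = sill * np.exp(-np.linalg.norm(coords - target, axis=1) / rng)

    # Augmented kriging system with a Lagrange multiplier for the constraint
    K = np.ones((n + 1, n + 1))
    K[:n, :n] = C
    K[n, n] = 0.0
    rhs = np.append(c0, 1.0)
    sol = np.linalg.solve(K, rhs)
    w = sol[:n]                                    # kriging weights
    return w @ z, w

coords = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
z = np.array([10.0, 12.0, 14.0])
pred, w = ordinary_kriging(coords, z, np.array([0.3, 0.3]))
print(pred, w.sum())
```

With no nugget in the covariance model, the predictor is exact: at a sampled location it returns the measured value.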
• Before producing a final surface, you should know how well the model predicts the
values at unknown locations. Cross-validation and validation help you make an
informed decision as to which model provides the best predictions.
• Cross-validation uses all of the data to estimate the autocorrelation model. Then
it removes each data location, one at a time, and predicts the associated data
value; the predicted value is compared with the measured value.
• Validation first removes part of the data (the test dataset) using the Create Subset
tool, and then uses the rest of the data (the training dataset) to develop the trend and
autocorrelation models to be used for prediction.
• In both methods, the graphs and summary statistics used for diagnostics are the
same: predicted, prediction error (predicted minus measured), standardized error
(error divided by the estimated kriging standard error), and the normal QQPlot (standardized
error vs. the standard normal distribution)
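The leave-one-out procedure described above can be sketched with any interpolator; here it is applied to a simple inverse-distance predictor on made-up data:

```python
import numpy as np

def idw_predict(coords, values, target, p=2.0):
    """Inverse distance weighted prediction (no zero-distance guard
    needed here: the target point is always held out of the data)."""
    d = np.linalg.norm(coords - target, axis=1)
    w = 1.0 / d ** p
    return np.sum(w * values) / np.sum(w)

coords = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
z = np.array([10.0, 12.0, 14.0, 16.0])

# Leave-one-out cross-validation: remove each point, predict it from the rest
errors = []
for i in range(len(z)):
    mask = np.arange(len(z)) != i
    pred = idw_predict(coords[mask], z[mask], coords[i])
    errors.append(pred - z[i])        # prediction error = predicted - measured

rmse = np.sqrt(np.mean(np.square(errors)))
print(rmse)
```

Comparing the RMSE (and the distribution of standardized errors) across candidate models is what supports the "informed decision" between them.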
Create subset for validation
demo
Basic rules for good predictions
• Kriging performs statistical analysis of the error in its predictions. This allows it to
create four kinds of surfaces: prediction, standard error, quantile, and probability.
• Prediction maps estimate values at locations where measurements have not
been taken. (All interpolators make prediction maps.)
• Standard error maps show the distribution of prediction error for a surface. Error
tends to be highest in places where there is little or no sample data.
• Quantile maps show the values that the true values are unlikely to exceed.
• Probability maps show the odds that the true value at a location is greater than a
threshold value.
• The various interpolation methods (Inverse Distance Weighting, Global
Polynomial, Local Polynomial, Radial Basis Functions, and Kriging) offer trade-
offs in speed, flexibility, and their advantages and disadvantages.
• Fast interpolators produce output surfaces quickly, but are not as good at
capturing subtle surface variations.
• Exact interpolators predict values equal to the observed value at all sampled
locations. Smooth interpolators do not.
• Flexible interpolators allow users to fine-tune the output, while inflexible
interpolators allow users to avoid making lots of choices.