\( \sigma_1^2 > \sigma_2^2 \) for an upper one-tailed test
\( \sigma_1^2 \neq \sigma_2^2 \) for a two-tailed test
Test Statistic: \( F = s_1^2 / s_2^2 \)
where \( s_1^2 \) and \( s_2^2 \) are the sample variances. The more this ratio deviates from 1, the stronger the evidence for unequal population variances.
Significance Level: α
Critical Region:
The hypothesis that the two variances are equal is rejected if
\( F > F_{\alpha,\,N_1-1,\,N_2-1} \) for an upper one-tailed test
\( F < F_{1-\alpha,\,N_1-1,\,N_2-1} \) for a lower one-tailed test
\( F < F_{1-\alpha/2,\,N_1-1,\,N_2-1} \) for a two-tailed test
or \( F > F_{\alpha/2,\,N_1-1,\,N_2-1} \)
where \( F_{\alpha,\,N_1-1,\,N_2-1} \) is the critical value of the F distribution with \( N_1-1 \) and \( N_2-1 \) degrees of freedom and a significance level of α.
In the above formulas for the critical regions, the Handbook follows the convention that \( F_\alpha \) is the upper critical value from the F distribution and \( F_{1-\alpha} \) is the lower critical value from the F distribution. Note that this is the opposite of the designation used by some texts and software programs.
F Test
Example
The following F-test was generated for the AUTO83B.DAT
data set. The data set contains 480 ceramic strength
measurements for two batches of material. The summary
statistics for each batch are shown below.
BATCH 1:
NUMBER OF OBSERVATIONS = 240
MEAN = 688.9987
STANDARD DEVIATION = 65.54909
BATCH 2:
NUMBER OF OBSERVATIONS = 240
MEAN = 611.1559
STANDARD DEVIATION = 61.85425
We are testing the null hypothesis that the variances for the
two batches are equal.
H0: \( \sigma_1^2 = \sigma_2^2 \)
Ha: \( \sigma_1^2 \neq \sigma_2^2 \)

Test statistic: F = 1.123037
Numerator degrees of freedom: N1 - 1 = 239
Denominator degrees of freedom: N2 - 1 = 239
Significance level: α = 0.05
Critical values: F(1-α/2, N1-1, N2-1) = 0.7756
                 F(α/2, N1-1, N2-1) = 1.2894
Rejection region: Reject H0 if F < 0.7756 or F > 1.2894
The F test indicates that there is not enough evidence to reject the null hypothesis that the two batch variances are equal at the 0.05 significance level.
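As a rough illustration of the calculation (a Python sketch rather than the Dataplot or R code referenced under Software below), the two-tailed F-test above can be reproduced as follows; the arrays `batch1` and `batch2` are assumed to hold the 240 strength measurements for each batch.

```python
import numpy as np
from scipy import stats

def f_test_two_sided(batch1, batch2, alpha=0.05):
    """Two-sided F-test for equality of two variances."""
    s1 = np.var(batch1, ddof=1)              # sample variance of batch 1
    s2 = np.var(batch2, ddof=1)              # sample variance of batch 2
    n1, n2 = len(batch1), len(batch2)
    f_stat = s1 / s2                          # F = s1^2 / s2^2
    # ppf uses the lower-tail probability, so ppf(alpha/2) is the lower
    # critical value (the Handbook's F(1-alpha/2)) and ppf(1-alpha/2) is
    # the upper critical value (the Handbook's F(alpha/2)).
    lower = stats.f.ppf(alpha / 2, n1 - 1, n2 - 1)
    upper = stats.f.ppf(1 - alpha / 2, n1 - 1, n2 - 1)
    reject = (f_stat < lower) or (f_stat > upper)
    return f_stat, lower, upper, reject
```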
Questions The F-test can be used to answer the following questions:
1. Do two samples come from populations with equal variances?
2. Does a new process, treatment, or test reduce the
variability of the current process?
Related
Techniques
Quantile-Quantile Plot
Bihistogram
Chi-Square Test
Bartlett's Test
Levene Test
Case Study Ceramic strength data.
Software The F-test for equality of two variances is available in many
general purpose statistical software programs. Both Dataplot
code and R code can be used to generate the analyses in this
section.
1.3.5.10. Levene Test for Equality of Variances
Purpose:
Test for
Homogeneity
of Variances
Levene's test (Levene, 1960) is used to test if k samples have
equal variances. Equal variances across samples is called
homogeneity of variance. Some statistical tests, for example
the analysis of variance, assume that variances are equal
across groups or samples. The Levene test can be used to
verify that assumption.
Levene's test is an alternative to the Bartlett test. The Levene
test is less sensitive than the Bartlett test to departures from
normality. If you have strong evidence that your data do in
fact come from a normal, or nearly normal, distribution, then
Bartlett's test has better performance.
Definition The Levene test is defined as:
H0: \( \sigma_1^2 = \sigma_2^2 = \cdots = \sigma_k^2 \)
Ha: \( \sigma_i^2 \neq \sigma_j^2 \) for at least one pair (i, j).
Test Statistic: Given a variable Y with sample of size N divided into k subgroups, where \(N_i\) is the sample size of the ith subgroup, the Levene test statistic is defined as:
\[ W = \frac{(N-k)}{(k-1)} \, \frac{\sum_{i=1}^{k} N_i (\bar{Z}_{i\cdot} - \bar{Z}_{\cdot\cdot})^2}{\sum_{i=1}^{k} \sum_{j=1}^{N_i} (Z_{ij} - \bar{Z}_{i\cdot})^2} \]
where \(Z_{ij}\) can have one of the following three definitions:
1. \( Z_{ij} = |Y_{ij} - \bar{Y}_{i\cdot}| \), where \( \bar{Y}_{i\cdot} \) is the mean of the ith subgroup.
2. \( Z_{ij} = |Y_{ij} - \tilde{Y}_{i\cdot}| \), where \( \tilde{Y}_{i\cdot} \) is the median of the ith subgroup.
3. \( Z_{ij} = |Y_{ij} - \bar{Y}'_{i\cdot}| \), where \( \bar{Y}'_{i\cdot} \) is the 10% trimmed mean of the ith subgroup.
\( \bar{Z}_{i\cdot} \) are the group means of the \(Z_{ij}\) and \( \bar{Z}_{\cdot\cdot} \) is the overall mean of the \(Z_{ij}\).
The three choices for defining Z
ij
determine the
robustness and power of Levene's test. By
robustness, we mean the ability of the test to
not falsely detect unequal variances when the
underlying data are not normally distributed and
the variances are in fact equal. By power, we
mean the ability of the test to detect unequal
variances when the variances are in fact
unequal.
Levene's original paper only proposed using the mean. Brown and Forsythe (1974) extended Levene's test to use either the median or the trimmed mean in addition to the mean. They performed Monte Carlo studies that indicated that using the trimmed mean performed best when the underlying data followed a Cauchy distribution (i.e., heavy-tailed) and the median performed best when the underlying data followed a \( \chi^2_4 \) (i.e., skewed) distribution. Using the mean provided the best power for symmetric, moderate-tailed, distributions.
Although the optimal choice depends on the
underlying distribution, the definition based on
the median is recommended as the choice that
provides good robustness against many types of
non-normal data while retaining good power. If
you have knowledge of the underlying
distribution of the data, this may indicate using
one of the other choices.
Significance Level: α
Critical Region:
The Levene test rejects the hypothesis that the variances are equal if
\[ W > F_{\alpha,\,k-1,\,N-k} \]
where \( F_{\alpha,\,k-1,\,N-k} \) is the upper critical value of the F distribution with k-1 and N-k degrees of freedom at a significance level of α.
In the above formulas for the critical regions, the Handbook follows the convention that \( F_\alpha \) is the upper critical value from the F distribution and \( F_{1-\alpha} \) is the lower critical value. Note that this is the opposite of some texts and software programs.
Levene's Test
Example
Levene's test, based on the median, was performed for the
GEAR.DAT data set. The data set includes ten measurements
of gear diameter for each of ten batches for a total of 100
measurements.
H0: \( \sigma_1^2 = \sigma_2^2 = \cdots = \sigma_{10}^2 \)
Ha: \( \sigma_1^2 \neq \sigma_2^2 \neq \cdots \neq \sigma_{10}^2 \)

Test statistic: W = 1.705910
Degrees of freedom: k - 1 = 10 - 1 = 9
                    N - k = 100 - 10 = 90
Significance level: α = 0.05
Critical value (upper tail): F(α, k-1, N-k) = 1.9855
Critical region: Reject H0 if W > 1.9855
We are testing the hypothesis that the group variances are
equal. We fail to reject the null hypothesis at the 0.05
significance level since the value of the Levene test statistic is
less than the critical value. We conclude that there is
insufficient evidence to claim that the variances are not equal.
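For comparison with the output above, a minimal Python sketch (assuming the ten batches of gear-diameter measurements are available as a list of arrays named `groups`) could use `scipy.stats.levene` with the median-centering option, which corresponds to the median-based definition used here.

```python
import numpy as np
from scipy import stats

def levene_median(groups, alpha=0.05):
    """Levene test based on the median (Brown-Forsythe variant)."""
    w_stat, p_value = stats.levene(*groups, center='median')
    k = len(groups)                                # number of subgroups
    n = sum(len(g) for g in groups)                # total sample size
    critical = stats.f.ppf(1 - alpha, k - 1, n - k)  # upper F(alpha, k-1, N-k)
    return w_stat, critical, p_value
```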
Question Levene's test can be used to answer the following question:
Is the assumption of equal variances valid?
Related
Techniques
Standard Deviation Plot
Box Plot
Bartlett Test
Chi-Square Test
Analysis of Variance
Software The Levene test is available in some general purpose
statistical software programs. Both Dataplot code and R code
can be used to generate the analyses in this section.
1.3.5.11. Measures of Skewness and Kurtosis
Skewness
and
Kurtosis
A fundamental task in many statistical analyses is to
characterize the location and variability of a data set. A
further characterization of the data includes skewness and
kurtosis.
Skewness is a measure of symmetry, or more precisely, the
lack of symmetry. A distribution, or data set, is symmetric if
it looks the same to the left and right of the center point.
Kurtosis is a measure of whether the data are peaked or flat
relative to a normal distribution. That is, data sets with high
kurtosis tend to have a distinct peak near the mean, decline
rather rapidly, and have heavy tails. Data sets with low
kurtosis tend to have a flat top near the mean rather than a
sharp peak. A uniform distribution would be the extreme
case.
The histogram is an effective graphical technique for showing
both the skewness and kurtosis of a data set.
Definition
of Skewness
For univariate data \(Y_1, Y_2, \ldots, Y_N\), the formula for skewness is:
\[ \text{skewness} = \frac{\sum_{i=1}^{N} (Y_i - \bar{Y})^3 / N}{s^3} \]
where \(\bar{Y}\) is the mean, s is the standard deviation, and N is the
number of data points. The skewness for a normal
distribution is zero, and any symmetric data should have a
skewness near zero. Negative values for the skewness
indicate data that are skewed left and positive values for the
skewness indicate data that are skewed right. By skewed left,
we mean that the left tail is long relative to the right tail.
Similarly, skewed right means that the right tail is long
relative to the left tail. Some measurements have a lower
bound and are skewed right. For example, in reliability
studies, failure times cannot be negative.
Definition
of Kurtosis
For univariate data \(Y_1, Y_2, \ldots, Y_N\), the formula for kurtosis is:
\[ \text{kurtosis} = \frac{\sum_{i=1}^{N} (Y_i - \bar{Y})^4 / N}{s^4} \]
where \(\bar{Y}\) is the mean, s is the standard deviation, and N is the number of data points.
Alternative
Definition
of Kurtosis
The kurtosis for a standard normal distribution is three. For
this reason, some sources use the following definition of
kurtosis (often referred to as "excess kurtosis"):
\[ \text{excess kurtosis} = \frac{\sum_{i=1}^{N} (Y_i - \bar{Y})^4 / N}{s^4} - 3 \]
This definition is used so that the standard normal
distribution has a kurtosis of zero. In addition, with the
second definition positive kurtosis indicates a "peaked"
distribution and negative kurtosis indicates a "flat"
distribution.
Which definition of kurtosis is used is a matter of convention
(this handbook uses the original definition). When using
software to compute the sample kurtosis, you need to be
aware of which convention is being followed. Many sources
use the term kurtosis when they are actually computing
"excess kurtosis", so it may not always be clear.
Examples The following example shows histograms for 10,000 random
numbers generated from a normal, a double exponential, a
Cauchy, and a Weibull distribution.
Normal
Distribution
The first histogram is a sample from a normal distribution.
The normal distribution is a symmetric distribution with well-
behaved tails. This is indicated by the skewness of 0.03. The
kurtosis of 2.96 is near the expected value of 3. The
histogram verifies the symmetry.
Double
Exponential
Distribution
The second histogram is a sample from a double exponential
distribution. The double exponential is a symmetric
distribution. Compared to the normal, it has a stronger peak,
more rapid decay, and heavier tails. That is, we would expect
a skewness near zero and a kurtosis higher than 3. The
skewness is 0.06 and the kurtosis is 5.9.
Cauchy
Distribution
The third histogram is a sample from a Cauchy distribution.
For better visual comparison with the other data sets, we
restricted the histogram of the Cauchy distribution to values
between -10 and 10. The full data set for the Cauchy data in
fact has a minimum of approximately -29,000 and a
maximum of approximately 89,000.
The Cauchy distribution is a symmetric distribution with
heavy tails and a single peak at the center of the distribution.
Since it is symmetric, we would expect a skewness near zero.
Due to the heavier tails, we might expect the kurtosis to be
larger than for a normal distribution. In fact the skewness is
69.99 and the kurtosis is 6,693. These extremely high values
can be explained by the heavy tails. Just as the mean and
standard deviation can be distorted by extreme values in the
tails, so too can the skewness and kurtosis measures.
Weibull
Distribution
The fourth histogram is a sample from a Weibull distribution
with shape parameter 1.5. The Weibull distribution is a
skewed distribution with the amount of skewness depending
on the value of the shape parameter. The degree of decay as
we move away from the center also depends on the value of
the shape parameter. For this data set, the skewness is 1.08
and the kurtosis is 4.46, which indicates moderate skewness
and kurtosis.
Dealing
with
Skewness
and
Kurtosis
Many classical statistical tests and intervals depend on
normality assumptions. Significant skewness and kurtosis
clearly indicate that data are not normal. If a data set exhibits
significant skewness or kurtosis (as indicated by a histogram
or the numerical measures), what can we do about it?
One approach is to apply some type of transformation to try
to make the data normal, or more nearly normal. The Box-
Cox transformation is a useful technique for trying to
normalize a data set. In particular, taking the log or square
root of a data set is often useful for data that exhibit moderate
right skewness.
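As a brief illustrative sketch (Python; the array `y` is assumed to be a positive, right-skewed data set), both the log transform and the Box-Cox transform are single calls in common statistical libraries.

```python
import numpy as np
from scipy import stats

# y: assumed positive, right-skewed data; a lognormal sample stands in here
y = np.random.default_rng(1).lognormal(size=500)

y_log = np.log(y)                        # simple log transform
y_bc, lam = stats.boxcox(y)              # Box-Cox transform; lam is the estimated lambda

print(stats.skew(y), stats.skew(y_log), stats.skew(y_bc))  # skewness before and after
```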
Another approach is to use techniques based on distributions
other than the normal. For example, in reliability studies, the
exponential, Weibull, and lognormal distributions are
typically used as a basis for modeling rather than using the
normal distribution. The probability plot correlation
coefficient plot and the probability plot are useful tools for
determining a good distributional model for the data.
Software The skewness and kurtosis coefficients are available in most
general purpose statistical software programs.
1.3.5.12. Autocorrelation
Purpose:
Detect Non-
Randomness,
Time Series
Modeling
The autocorrelation (Box and Jenkins, 1976) function
can be used for the following two purposes:
1. To detect non-randomness in data.
2. To identify an appropriate time series model if the
data are not random.
Definition Given measurements \(Y_1, Y_2, \ldots, Y_N\) at times \(X_1, X_2, \ldots, X_N\), the lag k autocorrelation function is defined as
\[ r_k = \frac{\sum_{i=1}^{N-k} (Y_i - \bar{Y})(Y_{i+k} - \bar{Y})}{\sum_{i=1}^{N} (Y_i - \bar{Y})^2} \]
Although the time variable, X, is not used in the formula for autocorrelation, the assumption is that the observations are equi-spaced.
Autocorrelation is a correlation coefficient. However, instead of correlation between two different variables, the correlation is between two values of the same variable at times \(X_i\) and \(X_{i+k}\).
When the autocorrelation is used to detect non-
randomness, it is usually only the first (lag 1)
autocorrelation that is of interest. When the
autocorrelation is used to identify an appropriate time
series model, the autocorrelations are usually plotted for
many lags.
Autocorrelation
Example
Autocorrelations were computed for the LEW.DAT data set.
lag autocorrelation
0. 1.00
1. -0.31
2. -0.74
3. 0.77
4. 0.21
5. -0.90
6. 0.38
7. 0.63
8. -0.77
9. -0.12
10. 0.82
11. -0.40
12. -0.55
13. 0.73
14. 0.07
15. -0.76
16. 0.40
17. 0.48
18. -0.70
19. -0.03
20. 0.70
21. -0.41
22. -0.43
23. 0.67
24. 0.00
25. -0.66
26. 0.42
27. 0.39
28. -0.65
29. 0.03
30. 0.63
31. -0.42
32. -0.36
33. 0.64
34. -0.05
35. -0.60
36. 0.43
37. 0.32
38. -0.64
39. 0.08
40. 0.58
41. -0.45
42. -0.28
43. 0.62
44. -0.10
45. -0.55
46. 0.45
47. 0.25
48. -0.61
49. 0.14
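A short Python sketch of the lag-k autocorrelation defined above follows; the array `y` is assumed to hold the LEW.DAT deflection values, and most statistical packages provide an equivalent built-in autocorrelation function.

```python
import numpy as np

def autocorr(y, k):
    """Lag-k autocorrelation r_k as defined above (k >= 1)."""
    y = np.asarray(y, dtype=float)
    dev = y - y.mean()
    return np.sum(dev[:-k] * dev[k:]) / np.sum(dev ** 2)

# r = [1.0] + [autocorr(y, k) for k in range(1, 50)]   # lags 0 through 49
```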
Questions The autocorrelation function can be used to answer the
following questions.
1. Was this sample data set generated from a random
process?
2. Would a non-linear or time series model be a more
appropriate model for these data than a simple
constant plus error model?
Importance Randomness is one of the key assumptions in
determining if a univariate statistical process is in control.
If the assumptions of constant location and scale,
randomness, and fixed distribution are reasonable, then
the univariate process can be modeled as:
\[ Y_i = A_0 + E_i \]
where \(E_i\) is an error term.
If the randomness assumption is not valid, then a different
model needs to be used. This will typically be either a
time series model or a non-linear model (with time as the
independent variable).
Related
Techniques
Autocorrelation Plot
Run Sequence Plot
Lag Plot
Runs Test
Case Study The heat flow meter data demonstrate the use of
autocorrelation in determining if the data are from a
random process.
Software The autocorrelation capability is available in most general
purpose statistical software programs. Both Dataplot code
and R code can be used to generate the analyses in this
section.
1.3.5.13. Runs Test for Detecting Non-
randomness
Purpose:
Detect Non-
Randomness
The runs test (Bradley, 1968) can be used to decide if a data
set is from a random process.
A run is defined as a series of increasing values or a series of
decreasing values. The number of increasing, or decreasing,
values is the length of the run. In a random data set, the
probability that the (I+1)th value is larger or smaller than the
Ith value follows a binomial distribution, which forms the basis
of the runs test.
Typical
Analysis
and Test
Statistics
The first step in the runs test is to count the number of runs in
the data sequence. There are several ways to define runs in the
literature, however, in all cases the formulation must produce a
dichotomous sequence of values. For example, a series of 20
coin tosses might produce the following sequence of heads (H)
and tails (T).
H H T T H T H H H H T H H T T T T T H H
The number of runs for this series is nine. There are 11 heads
and 9 tails in the sequence.
Definition We will code values above the median as positive and values
below the median as negative. A run is defined as a series of
consecutive positive (or negative) values. The runs test is
defined as:
H0: the sequence was produced in a random manner
Ha: the sequence was not produced in a random manner
Test
Statistic:
The test statistic is
\[ Z = \frac{R - \bar{R}}{s_R} \]
where R is the observed number of runs, \(\bar{R}\) is the expected number of runs, and \(s_R\) is the standard deviation of the number of runs. The values of \(\bar{R}\) and \(s_R\) are computed as follows:
\[ \bar{R} = \frac{2 n_1 n_2}{n_1 + n_2} + 1 \]
\[ s_R^2 = \frac{2 n_1 n_2 \,(2 n_1 n_2 - n_1 - n_2)}{(n_1 + n_2)^2 \,(n_1 + n_2 - 1)} \]
where \(n_1\) and \(n_2\) are the number of positive and negative values in the series.
Significance Level: α
Critical Region:
The runs test rejects the null hypothesis if
\[ |Z| > Z_{1-\alpha/2} \]
For a large-sample runs test (where \(n_1 > 10\) and \(n_2 > 10\)), the test statistic is compared to a standard normal table. That is, at the 5 % significance level, a test statistic with an absolute value greater than 1.96 indicates non-randomness. For a small-sample runs test, there are tables to determine critical values that depend on values of \(n_1\) and \(n_2\) (Mendenhall, 1982).
Runs Test
Example
A runs test was performed for 200 measurements of beam
deflection contained in the LEW.DAT data set.
H0: the sequence was produced in a random manner
Ha: the sequence was not produced in a random manner

Test statistic: Z = 2.6938
Significance level: α = 0.05
Critical value (upper tail): Z(1-α/2) = 1.96
Critical region: Reject H0 if |Z| > 1.96
Since the test statistic is greater than the critical value, we
conclude that the data are not random at the 0.05 significance
level.
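A minimal Python sketch of the large-sample runs test described above follows; the array `y` is assumed to hold the 200 beam-deflection values, and values equal to the median are dropped before coding.

```python
import numpy as np
from scipy import stats

def runs_test(y):
    """Large-sample runs test based on values above/below the median."""
    y = np.asarray(y, dtype=float)
    med = np.median(y)
    coded = y[y != med] > med                    # True for values above the median
    n1 = np.sum(coded)                           # number of positive values
    n2 = np.sum(~coded)                          # number of negative values
    runs = 1 + np.sum(coded[1:] != coded[:-1])   # observed number of runs R
    r_bar = 2.0 * n1 * n2 / (n1 + n2) + 1        # expected number of runs
    s_r = np.sqrt(2.0 * n1 * n2 * (2 * n1 * n2 - n1 - n2) /
                  ((n1 + n2) ** 2 * (n1 + n2 - 1)))
    z = (runs - r_bar) / s_r
    return z, stats.norm.ppf(0.975)              # Z statistic and the 1.96 critical value
```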
Question The runs test can be used to answer the following question:
Were these sample data generated from a random
process?
Importance Randomness is one of the key assumptions in determining if a
univariate statistical process is in control. If the assumptions of
constant location and scale, randomness, and fixed distribution
are reasonable, then the univariate process can be modeled as:
\[ Y_i = A_0 + E_i \]
where \(E_i\) is an error term.
If the randomness assumption is not valid, then a different
model needs to be used. This will typically be either a time
series model or a non-linear model (with time as the
independent variable).
Related
Techniques
Autocorrelation
Run Sequence Plot
Lag Plot
Case Study Heat flow meter data
Software Most general purpose statistical software programs support a
runs test. Both Dataplot code and R code can be used to
generate the analyses in this section.
1.3.5.14. Anderson-Darling Test
Purpose:
Test for
Distributional
Adequacy
The Anderson-Darling test (Stephens, 1974) is used to test if a
sample of data came from a population with a specific distribution.
It is a modification of the Kolmogorov-Smirnov (K-S) test and
gives more weight to the tails than does the K-S test. The K-S test
is distribution free in the sense that the critical values do not depend
on the specific distribution being tested. The Anderson-Darling test
makes use of the specific distribution in calculating critical values.
This has the advantage of allowing a more sensitive test and the
disadvantage that critical values must be calculated for each
distribution. Currently, tables of critical values are available for the
normal, lognormal, exponential, Weibull, extreme value type I, and
logistic distributions. We do not provide the tables of critical values
in this Handbook (see Stephens 1974, 1976, 1977, and 1979) since
this test is usually applied with a statistical software program that
will print the relevant critical values.
The Anderson-Darling test is an alternative to the chi-square and
Kolmogorov-Smirnov goodness-of-fit tests.
Definition The Anderson-Darling test is defined as:
H0: The data follow a specified distribution.
Ha: The data do not follow the specified distribution.
Test Statistic: The Anderson-Darling test statistic is defined as
\[ A^2 = -N - S \]
where
\[ S = \sum_{i=1}^{N} \frac{2i-1}{N} \left[ \ln F(Y_i) + \ln\bigl(1 - F(Y_{N+1-i})\bigr) \right] \]
F is the cumulative distribution function of the specified distribution. Note that the \(Y_i\) are the ordered data.
Significance Level: α
Critical Region:
The critical values for the Anderson-Darling test are
dependent on the specific distribution that is being
tested. Tabulated values and formulas have been
published (Stephens, 1974, 1976, 1977, 1979) for a
few specific distributions (normal, lognormal,
exponential, Weibull, logistic, extreme value type 1).
The test is a one-sided test and the hypothesis that the
distribution is of a specific form is rejected if the test
statistic, A, is greater than the critical value.
Note that for a given distribution, the Anderson-
Darling statistic may be multiplied by a constant
(which usually depends on the sample size, n). These
constants are given in the various papers by Stephens.
In the sample output below, the test statistic values are
adjusted. Also, be aware that different constants (and
therefore critical values) have been published. You
just need to be aware of what constant was used for a
given set of critical values (the needed constant is
typically given with the critical values).
Sample
Output
We generated 1,000 random numbers for normal, double
exponential, Cauchy, and lognormal distributions. In all four cases,
the Anderson-Darling test was applied to test for a normal
distribution.
The normal random numbers were stored in the variable Y1, the
double exponential random numbers were stored in the variable Y2,
the Cauchy random numbers were stored in the variable Y3, and the
lognormal random numbers were stored in the variable Y4.
Distribution               Mean       Standard Deviation
-------------------------  ---------  ------------------
Normal (Y1)                 0.004360    1.001816
Double Exponential (Y2)     0.020349    1.321627
Cauchy (Y3)                 1.503854   35.130590
Lognormal (Y4)              1.518372    1.719969

H0: the data are normally distributed
Ha: the data are not normally distributed

Y1 adjusted test statistic: A² = 0.2576
Y2 adjusted test statistic: A² = 5.8492
Y3 adjusted test statistic: A² = 288.7863
Y4 adjusted test statistic: A² = 83.3935

Significance level: α = 0.05
Critical value: 0.752
Critical region: Reject H0 if A² > 0.752
When the data were generated using a normal distribution, the test
statistic was small and the hypothesis of normality was not rejected.
When the data were generated using the double exponential,
Cauchy, and lognormal distributions, the test statistics were large,
and the hypothesis of an underlying normal distribution was
rejected at the 0.05 significance level.
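As an illustrative Python sketch (not the Dataplot or R code referenced below), `scipy.stats.anderson` computes an adjusted A² statistic for the normal case; note that different published adjustment constants exist, so its tabulated critical values may differ slightly from the 0.752 value quoted above.

```python
import numpy as np
from scipy import stats

y = np.random.default_rng(2).normal(size=1000)    # stand-in for the Y1 sample

result = stats.anderson(y, dist='norm')           # A-D test for normality
print(result.statistic)                            # adjusted A^2 test statistic
print(result.critical_values)                      # critical values ...
print(result.significance_level)                   # ... at these significance levels (%)
```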
Questions The Anderson-Darling test can be used to answer the following
questions:
Are the data from a normal distribution?
Are the data from a log-normal distribution?
Are the data from a Weibull distribution?
Are the data from an exponential distribution?
Are the data from a logistic distribution?
Importance Many statistical tests and procedures are based on specific
distributional assumptions. The assumption of normality is
particularly common in classical statistical tests. Much reliability
modeling is based on the assumption that the data follow a Weibull
distribution.
There are many non-parametric and robust techniques that do not
make strong distributional assumptions. However, techniques based
on specific distributional assumptions are in general more powerful
than non-parametric and robust techniques. Therefore, if the
distributional assumptions can be validated, they are generally
preferred.
Related
Techniques
Chi-Square goodness-of-fit Test
Kolmogorov-Smirnov Test
Shapiro-Wilk Normality Test
Probability Plot
Probability Plot Correlation Coefficient Plot
Case Study Josephson junction cryothermometry case study.
Software The Anderson-Darling goodness-of-fit test is available in some
general purpose statistical software programs. Both Dataplot code
and R code can be used to generate the analyses in this section.
1.3.5.15. Chi-Square Goodness-of-Fit Test
Purpose:
Test for
distributional
adequacy
The chi-square test (Snedecor and Cochran, 1989) is used
to test if a sample of data came from a population with a
specific distribution.
An attractive feature of the chi-square goodness-of-fit test
is that it can be applied to any univariate distribution for
which you can calculate the cumulative distribution
function. The chi-square goodness-of-fit test is applied to
binned data (i.e., data put into classes). This is actually not a
restriction since for non-binned data you can simply
calculate a histogram or frequency table before generating
the chi-square test. However, the value of the chi-square
test statistic is dependent on how the data are binned.
Another disadvantage of the chi-square test is that it
requires a sufficient sample size in order for the chi-square
approximation to be valid.
The chi-square test is an alternative to the Anderson-
Darling and Kolmogorov-Smirnov goodness-of-fit tests.
The chi-square goodness-of-fit test can be applied to
discrete distributions such as the binomial and the Poisson.
The Kolmogorov-Smirnov and Anderson-Darling tests are
restricted to continuous distributions.
Additional discussion of the chi-square goodness-of-fit test
is contained in the product and process comparisons chapter
(chapter 7).
Definition The chi-square test is defined for the hypothesis:
H0: The data follow a specified distribution.
Ha: The data do not follow the specified distribution.
Test Statistic: For the chi-square goodness-of-fit computation, the data are divided into k bins and the test statistic is defined as
\[ \chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i} \]
where \(O_i\) is the observed frequency for bin i and \(E_i\) is the expected frequency for bin i. The expected frequency is calculated by
\[ E_i = N \bigl( F(Y_u) - F(Y_l) \bigr) \]
where F is the cumulative distribution function for the distribution being tested, \(Y_u\) is the upper limit for class i, \(Y_l\) is the lower limit for class i, and N is the sample size.
This test is sensitive to the choice of bins.
There is no optimal choice for the bin width
(since the optimal bin width depends on the
distribution). Most reasonable choices should
produce similar, but not identical, results. For
the chi-square approximation to be valid, the
expected frequency should be at least 5. This
test is not valid for small samples, and if some
of the counts are less than five, you may need
to combine some bins in the tails.
Significance Level: α.
Critical Region:
The test statistic follows, approximately, a
chi-square distribution with (k - c) degrees of
freedom where k is the number of non-empty
cells and c = the number of estimated
parameters (including location and scale
parameters and shape parameters) for the
distribution + 1. For example, for a 3-
parameter Weibull distribution, c = 4.
Therefore, the hypothesis that the data are from a population with the specified distribution is rejected if
\[ \chi^2 > \chi^2_{1-\alpha,\,k-c} \]
where \( \chi^2_{1-\alpha,\,k-c} \) is the chi-square critical value with k - c degrees of freedom and significance level α.
Chi-Square
Test Example
We generated 1,000 random numbers for normal, double
exponential, t with 3 degrees of freedom, and lognormal
distributions. In all cases, a chi-square test with k = 32 bins
was applied to test for normally distributed data. Because
the normal distribution has two parameters, c = 2 + 1 = 3.
The normal random numbers were stored in the variable
Y1, the double exponential random numbers were stored in
the variable Y2, the t random numbers were stored in the
variable Y3, and the lognormal random numbers were
stored in the variable Y4.
H0: the data are normally distributed
Ha: the data are not normally distributed

Y1 test statistic: χ² = 32.256
Y2 test statistic: χ² = 91.776
Y3 test statistic: χ² = 101.488
Y4 test statistic: χ² = 1085.104

Significance level: α = 0.05
Degrees of freedom: k - c = 32 - 3 = 29
Critical value: χ²(1-α, k-c) = 42.557
Critical region: Reject H0 if χ² > 42.557
As we would hope, the chi-square test fails to reject the null
hypothesis for the normally distributed data set and rejects
the null hypothesis for the three non-normal data sets.
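A rough Python sketch of this procedure (binning the data, computing expected counts from a fitted normal, and comparing against the χ² critical value with k - 3 degrees of freedom) is shown below; the array `y` stands in for one of the generated samples, and in practice bins with expected counts below five would need to be merged.

```python
import numpy as np
from scipy import stats

y = np.random.default_rng(3).normal(size=1000)    # stand-in for the Y1 sample
k = 32                                            # number of bins

# bin the data and compute expected counts from the fitted normal CDF
edges = np.linspace(y.min(), y.max(), k + 1)
observed, _ = np.histogram(y, bins=edges)
cdf = stats.norm.cdf(edges, loc=y.mean(), scale=y.std(ddof=1))
cdf[0], cdf[-1] = 0.0, 1.0                        # treat the outer bins as open-ended tails
expected = len(y) * np.diff(cdf)                  # E_i = N * (F(Y_u) - F(Y_l))

# two estimated parameters -> remove 2 extra degrees of freedom (df = k - 3)
chi2_stat, p_value = stats.chisquare(observed, expected, ddof=2)
critical = stats.chi2.ppf(0.95, k - 3)            # upper 5 % critical value
print(chi2_stat, critical, p_value)
```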
Questions The chi-square test can be used to answer the following
types of questions:
Are the data from a normal distribution?
Are the data from a log-normal distribution?
Are the data from a Weibull distribution?
Are the data from an exponential distribution?
Are the data from a logistic distribution?
Are the data from a binomial distribution?
Importance Many statistical tests and procedures are based on specific
distributional assumptions. The assumption of normality is
particularly common in classical statistical tests. Much
reliability modeling is based on the assumption that the
distribution of the data follows a Weibull distribution.
There are many non-parametric and robust techniques that
are not based on strong distributional assumptions. By non-
parametric, we mean a technique, such as the sign test, that
is not based on a specific distributional assumption. By
robust, we mean a statistical technique that performs well
under a wide range of distributional assumptions. However,
techniques based on specific distributional assumptions are
in general more powerful than these non-parametric and
robust techniques. By power, we mean the ability to detect a
difference when that difference actually exists. Therefore, if
the distributional assumption can be confirmed, the
parametric techniques are generally preferred.
If you are using a technique that makes a normality (or
some other type of distributional) assumption, it is important
to confirm that this assumption is in fact justified. If it is,
the more powerful parametric techniques can be used. If the
distributional assumption is not justified, a non-parametric
or robust technique may be required.
Related
Techniques
Anderson-Darling Goodness-of-Fit Test
Kolmogorov-Smirnov Test
Shapiro-Wilk Normality Test
Probability Plots
Probability Plot Correlation Coefficient Plot
Software Some general purpose statistical software programs provide
a chi-square goodness-of-fit test for at least some of the
common distributions. Both Dataplot code and R code can
be used to generate the analyses in this section.
1.3.5.16. Kolmogorov-Smirnov Goodness-of-Fit Test
Purpose:
Test for
Distributional
Adequacy
The Kolmogorov-Smirnov test (Chakravart, Laha, and Roy,
1967) is used to decide if a sample comes from a population with
a specific distribution.
The Kolmogorov-Smirnov (K-S) test is based on the empirical distribution function (ECDF). Given N ordered data points \(Y_1, Y_2, \ldots, Y_N\), the ECDF is defined as
\[ E_N = \frac{n(i)}{N} \]
where n(i) is the number of points less than \(Y_i\), and the \(Y_i\) are ordered from smallest to largest value. This is a step function that increases by 1/N at the value of each ordered data point.
The graph below is a plot of the empirical distribution function
with a normal cumulative distribution function for 100 normal
random numbers. The K-S test is based on the maximum distance
between these two curves.
Characteristics and Limitations of the K-S Test
An attractive feature of this test is that the distribution of the K-S test statistic itself does not depend on the underlying cumulative distribution function being tested. Another advantage is that it is an exact test (the chi-square goodness-of-fit test depends on an adequate sample size for the approximations to be valid). Despite these advantages, the K-S test has several important limitations:
1. It only applies to continuous distributions.
2. It tends to be more sensitive near the center of the
distribution than at the tails.
3. Perhaps the most serious limitation is that the distribution
must be fully specified. That is, if location, scale, and shape
parameters are estimated from the data, the critical region
of the K-S test is no longer valid. It typically must be
determined by simulation.
Due to limitations 2 and 3 above, many analysts prefer to use the
Anderson-Darling goodness-of-fit test. However, the Anderson-
Darling test is only available for a few specific distributions.
Definition The Kolmogorov-Smirnov test is defined by:
H0: The data follow a specified distribution.
Ha: The data do not follow the specified distribution.
Test Statistic: The Kolmogorov-Smirnov test statistic is defined as
\[ D = \max_{1 \le i \le N} \left( F(Y_i) - \frac{i-1}{N},\; \frac{i}{N} - F(Y_i) \right) \]
where F is the theoretical cumulative distribution of the distribution being tested, which must be a continuous distribution (i.e., no discrete distributions such as the binomial or Poisson), and it must be fully specified (i.e., the location, scale, and shape parameters cannot be estimated from the data).
Significance Level: α.
Critical Values:
The hypothesis regarding the distributional form is
rejected if the test statistic, D, is greater than the
critical value obtained from a table. There are
several variations of these tables in the literature
that use somewhat different scalings for the K-S
test statistic and critical regions. These alternative
formulations should be equivalent, but it is
necessary to ensure that the test statistic is
calculated in a way that is consistent with how the
critical values were tabulated.
We do not provide the K-S tables in the Handbook
since software programs that perform a K-S test
will provide the relevant critical values.
Technical Note Previous editions of e-Handbook gave the following formula for
the computation of the Kolmogorov-Smirnov goodness of fit
statistic:
\[ D = \max_{1 \le i \le N} \left| F(Y_i) - \frac{i}{N} \right| \]
This formula is in fact not correct. Note that this formula can be rewritten as:
\[ D = \max_{1 \le i \le N} \left( F(Y_i) - \frac{i-1}{N} - \frac{1}{N},\; \frac{i}{N} - F(Y_i) \right) \]
This form makes it clear that an upper bound on the difference between these two formulas is 1/N. For actual data, the difference is likely to be less than the upper bound.
For example, for N = 20, the upper bound on the difference
between these two formulas is 0.05 (for comparison, the 5%
critical value is 0.294). For N = 100, the upper bound is 0.01. In
practice, if you have moderate to large sample sizes (say N ≥ 50),
these formulas are essentially equivalent.
Kolmogorov-
Smirnov Test
Example
We generated 1,000 random numbers for normal, double
exponential, t with 3 degrees of freedom, and lognormal
distributions. In all cases, the Kolmogorov-Smirnov test was
applied to test for a normal distribution.
The normal random numbers were stored in the variable Y1, the
double exponential random numbers were stored in the variable
Y2, the t random numbers were stored in the variable Y3, and the
lognormal random numbers were stored in the variable Y4.
H0: the data are normally distributed
Ha: the data are not normally distributed

Y1 test statistic: D = 0.0241492
Y2 test statistic: D = 0.0514086
Y3 test statistic: D = 0.0611935
Y4 test statistic: D = 0.5354889

Significance level: α = 0.05
Critical value: 0.04301
Critical region: Reject H0 if D > 0.04301
As expected, the null hypothesis is not rejected for the normally
distributed data, but is rejected for the remaining three data sets
that are not normally distributed.
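As a small Python sketch (with `y` standing in for one of the generated samples), `scipy.stats.kstest` computes D against a fully specified null distribution; remember that the tabulated critical values are not valid if the parameters are estimated from the same data.

```python
import numpy as np
from scipy import stats

y = np.random.default_rng(4).normal(size=1000)    # stand-in for the Y1 sample

# fully specified null distribution: normal with known mean 0 and sd 1
d_stat, p_value = stats.kstest(y, 'norm', args=(0, 1))

critical = 1.36 / np.sqrt(len(y))   # approximate large-sample 5 % critical value
print(d_stat, critical, p_value)
```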
Questions The Kolmogorov-Smirnov test can be used to answer the
following types of questions:
Are the data from a normal distribution?
Are the data from a log-normal distribution?
Are the data from a Weibull distribution?
Are the data from an exponential distribution?
Are the data from a logistic distribution?
Importance Many statistical tests and procedures are based on specific
distributional assumptions. The assumption of normality is
particularly common in classical statistical tests. Much reliability
modeling is based on the assumption that the data follow a
Weibull distribution.
There are many non-parametric and robust techniques that are not
based on strong distributional assumptions. By non-parametric,
we mean a technique, such as the sign test, that is not based on a
specific distributional assumption. By robust, we mean a
statistical technique that performs well under a wide range of
distributional assumptions. However, techniques based on specific
distributional assumptions are in general more powerful than
these non-parametric and robust techniques. By power, we mean
the ability to detect a difference when that difference actually
exists. Therefore, if the distributional assumptions can be
confirmed, the parametric techniques are generally preferred.
If you are using a technique that makes a normality (or some
other type of distributional) assumption, it is important to confirm
that this assumption is in fact justified. If it is, the more powerful
parametric techniques can be used. If the distributional
assumption is not justified, using a non-parametric or robust
technique may be required.
Related
Techniques
Anderson-Darling goodness-of-fit Test
Chi-Square goodness-of-fit Test
Shapiro-Wilk Normality Test
Probability Plots
Probability Plot Correlation Coefficient Plot
Software Some general purpose statistical software programs support the
Kolmogorov-Smirnov goodness-of-fit test, at least for the more
common distributions. Both Dataplot code and R code can be
used to generate the analyses in this section.
1.3.5.17. Detection of Outliers
Introduction An outlier is an observation that appears to deviate
markedly from other observations in the sample.
Identification of potential outliers is important for the
following reasons.
1. An outlier may indicate bad data. For example, the
data may have been coded incorrectly or an
experiment may not have been run correctly. If it
can be determined that an outlying point is in fact
erroneous, then the outlying value should be deleted
from the analysis (or corrected if possible).
2. In some cases, it may not be possible to determine if
an outlying point is bad data. Outliers may be due to
random variation or may indicate something
scientifically interesting. In any event, we typically
do not want to simply delete the outlying
observation. However, if the data contains
significant outliers, we may need to consider the use
of robust statistical techniques.
Labeling,
Accommodation,
Identification
Iglewicz and Hoaglin distinguish the three following
issues with regards to outliers.
1. outlier labeling - flag potential outliers for further
investigation (i.e., are the potential outliers
erroneous data, indicative of an inappropriate
distributional model, and so on).
2. outlier accommodation - use robust statistical techniques that will not be unduly affected by outliers. That is, if we cannot determine that potential outliers are erroneous observations, do we need to modify our statistical analysis to more appropriately account for these observations?
3. outlier identification - formally test whether
observations are outliers.
This section focuses on the labeling and identification
issues.
Normality
Assumption
Identifying an observation as an outlier depends on the
underlying distribution of the data. In this section, we limit
the discussion to univariate data sets that are assumed to
follow an approximately normal distribution. If the
normality assumption for the data being tested is not valid,
then a determination that there is an outlier may in fact be
due to the non-normality of the data rather than the
presence of an outlier.
For this reason, it is recommended that you generate a
normal probability plot of the data before applying an
outlier test. Although you can also perform formal tests for
normality, the presence of one or more outliers may
cause the tests to reject normality when it is in fact a
reasonable assumption for applying the outlier test.
In addition to checking the normality assumption, the
lower and upper tails of the normal probability plot can be
a useful graphical technique for identifying potential
outliers. In particular, the plot can help determine whether
we need to check for a single outlier or whether we need
to check for multiple outliers.
The box plot and the histogram can also be useful
graphical tools in checking the normality assumption and
in identifying potential outliers.
Single Versus
Multiple
Outliers
Some outlier tests are designed to detect the presence of a single outlier while other tests are designed to detect the presence of multiple outliers. It is not appropriate to
apply a test for a single outlier sequentially in order to
detect multiple outliers.
In addition, some tests that detect multiple outliers may
require that you specify the number of suspected outliers
exactly.
Masking and
Swamping
Masking can occur when we specify too few outliers in the
test. For example, if we are testing for a single outlier
when there are in fact two (or more) outliers, these
additional outliers may influence the value of the test
statistic enough so that no points are declared as outliers.
On the other hand, swamping can occur when we specify
too many outliers in the test. For example, if we are testing
for two or more outliers when there is in fact only a single
outlier, both points may be declared outliers (many tests
will declare either all or none of the tested points as
outliers).
Due to the possibility of masking and swamping, it is
useful to complement formal outlier tests with graphical
methods. Graphics can often help identify cases where
masking or swamping may be an issue. Swamping and
masking are also the reason that many tests require that the
exact number of outliers being tested must be specified.
Also, masking is one reason that trying to apply a single
outlier test sequentially can fail. For example, if there are
multiple outliers, masking may cause the outlier test for
the first outlier to return a conclusion of no outliers (and
so the testing for any additional outliers is not performed).
Z-Scores and
Modified Z-
Scores
The Z-score of an observation is defined as
\[ Z_i = \frac{Y_i - \bar{Y}}{s} \]
with \(\bar{Y}\) and s denoting the sample mean and sample standard deviation, respectively. In other words, the data are given in units of how many standard deviations they are from the mean.
Although it is common practice to use Z-scores to identify possible outliers, this can be misleading (particularly for small sample sizes) due to the fact that the maximum Z-score is at most \((N-1)/\sqrt{N}\).
Iglewicz and Hoaglin recommend using the modified Z-score
\[ M_i = \frac{0.6745\,(Y_i - \tilde{Y})}{\mathrm{MAD}} \]
with MAD denoting the median absolute deviation and \(\tilde{Y}\) denoting the median.
These authors recommend that modified Z-scores with an absolute value of greater than 3.5 be labeled as potential outliers.
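A minimal Python sketch of this modified Z-score labeling rule follows; the array `y` stands in for the data being screened.

```python
import numpy as np

def modified_z_scores(y):
    """Modified Z-scores based on the median and the MAD, as defined above."""
    y = np.asarray(y, dtype=float)
    med = np.median(y)
    mad = np.median(np.abs(y - med))     # median absolute deviation
    return 0.6745 * (y - med) / mad

# label points with |M_i| > 3.5 as potential outliers
# potential_outliers = y[np.abs(modified_z_scores(y)) > 3.5]
```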
Formal
Outlier Tests
A number of formal outlier tests have been proposed in the literature. These can be grouped by the following
characteristics:
What is the distributional model for the data? We
restrict our discussion to tests that assume the data
follow an approximately normal distribution.
Is the test designed for a single outlier or is it
designed for multiple outliers?
If the test is designed for multiple outliers, does the
number of outliers need to be specified exactly or
can we specify an upper bound for the number of
outliers?
The following are a few of the more commonly used
outlier tests for normally distributed data. This list is not
exhaustive (a large number of outlier tests have been
proposed in the literature). The tests given here are
essentially based on the criterion of "distance from the
mean". This is not the only criterion that could be used.
For example, the Dixon test, which is not discussed here, is based on a value being too large (or small) compared to its nearest neighbor.
1. Grubbs' Test - this is the recommended test when
testing for a single outlier.
2. Tietjen-Moore Test - this is a generalization of the
Grubbs' test to the case of more than one outlier. It
has the limitation that the number of outliers must
be specified exactly.
3. Generalized Extreme Studentized Deviate (ESD)
Test - this test requires only an upper bound on the
suspected number of outliers and is the
recommended test when the exact number of outliers
is not known.
Lognormal
Distribution
The tests discussed here are specifically based on the
assumption that the data follow an approximately normal
distribution. If your data follow an approximately
lognormal distribution, you can transform the data to
normality by taking the logarithms of the data and then
applying the outlier tests discussed here.
Further
Information
Iglewicz and Hoaglin provide an extensive discussion of
the outlier tests given above (as well as some not given
above) and also give a good tutorial on the subject of
outliers. Barnett and Lewis provide a book length
treatment of the subject.
In addition to discussing additional tests for data that
follow an approximately normal distribution, these sources
also discuss the case where the data are not normally
distributed.
1.3.5.18. Yates Algorithm
Purpose:
Estimate
Factor Effects
in a 2-Level
Factorial
Design
Full factorial and fractional factorial designs are common
in designed experiments for engineering and scientific
applications.
In these designs, each factor is assigned two levels. These
are typically called the low and high levels. For
computational purposes, the factors are scaled so that the
low level is assigned a value of -1 and the high level is
assigned a value of +1. These are also commonly referred
to as "-" and "+".
A full factorial design contains all possible combinations
of low/high levels for all the factors. A fractional factorial
design contains a carefully chosen subset of these
combinations. The criterion for choosing the subsets is
discussed in detail in the process improvement chapter.
The Yates algorithm exploits the special structure of these
designs to generate least squares estimates for factor
effects for all factors and all relevant interactions.
The mathematical details of the Yates algorithm are given
in chapter 10 of Box, Hunter, and Hunter (1978). Natrella
(1963) also provides a procedure for testing the
significance of effect estimates.
The effect estimates are typically complemented by a
number of graphical techniques such as the DOE mean
plot and the DOE contour plot ("DOE" represents "design
of experiments"). These are demonstrated in the eddy
current case study.
Yates Order Before performing the Yates algorithm, the data should be
arranged in "Yates order". That is, given k factors, the kth
column consists of 2
k-1
minus signs (i.e., the low level of
the factor) followed by 2
k-1
plus signs (i.e., the high level
of the factor). For example, for a full factorial design with
three factors, the design matrix is
- - -
+ - -
- + -
+ + -
- - +
+ - +
- + +
+ + +
Determining the Yates order for fractional factorial
designs requires knowledge of the confounding structure
of the fractional factorial design.
Yates
Algorithm
The Yates algorithm is demonstrated for the eddy current
data set. The data set contains eight measurements from a
two-level, full factorial design with three factors. The
purpose of the experiment is to identify factors that have
the most effect on eddy current measurements.
In the "Effect" column, we list the main effects and
interactions from our factorial experiment in standard
order. In the "Response" column, we list the measurement
results from our experiment in Yates order.
Effect      Response   Col 1   Col 2   Col 3   Estimate
--------    --------   -----   -----   -----   --------
Mean          1.70      6.27   10.21   21.27    2.65875
X1            4.57      3.94   11.06   12.41    1.55125
X2            0.55      6.10    5.71   -3.47   -0.43375
X1*X2         3.39      4.96    6.70    0.51    0.06375
X3            1.51      2.87   -2.33    0.85    0.10625
X1*X3         4.59      2.84   -1.14    0.99    0.12375
X2*X3         0.67      3.08   -0.03    1.19    0.14875
X1*X2*X3      4.29      3.62    0.54    0.57    0.07125

Sum of responses: 21.27
Sum-of-squared responses: 77.7707
Sum-of-squared Col 3: 622.1656
The first four values in Col 1 are obtained by adding
adjacent pairs of responses, for example 4.57 + 1.70 =
6.27, and 3.39 + 0.55 = 3.94. The second four values in
Col 1 are obtained by subtracting the same adjacent pairs
of responses, for example, 4.57 - 1.70 = 2.87, and 3.39 -
0.55 = 2.84. The values in Col 2 are calculated in the same
way, except that we are adding and subtracting adjacent
values from Col 1. Col 3 is computed using adjacent
values from Col 2. Finally, we obtain the "Estimate"
column by dividing the values in Col 3 by the total number
of responses, 8.
We can check our calculations by making sure that the
first value in Col 3 (21.27) is the sum of all the responses.
In addition, the sum-of-squared responses (77.7707)
should equal the sum-of-squared Col 3 values divided by 8
(622.1656/8 = 77.7707).
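The column operations described above are easy to automate. The following Python sketch applies the Yates algorithm to the eddy current responses and reproduces the Estimate column of the table above.

```python
import numpy as np

def yates(responses):
    """Yates algorithm for a two-level full factorial design.

    `responses` must be in Yates order; the estimates are the final
    column divided by the number of runs, as in the table above.
    """
    col = np.asarray(responses, dtype=float)
    n = len(col)
    k = int(np.log2(n))                      # number of factors
    for _ in range(k):
        pairs = col.reshape(-1, 2)
        # sums of adjacent pairs followed by differences (second minus first)
        col = np.concatenate([pairs.sum(axis=1), pairs[:, 1] - pairs[:, 0]])
    return col / n

eddy = [1.70, 4.57, 0.55, 3.39, 1.51, 4.59, 0.67, 4.29]   # Response column above
print(yates(eddy))   # 2.65875, 1.55125, -0.43375, 0.06375, 0.10625, 0.12375, 0.14875, 0.07125
```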
Practical
Considerations
The Yates algorithm provides a convenient method for
computing effect estimates; however, the same
information is easily obtained from statistical software
using either an analysis of variance or regression
procedure. The methods for analyzing data from a
designed experiment are discussed more fully in the
chapter on Process Improvement.
Graphical
Presentation
The following plots may be useful to complement the
quantitative information from the Yates algorithm.
1. Ordered data plot
2. Ordered absolute effects plot
3. Cumulative residual standard deviation plot
Questions The Yates algorithm can be used to answer the following
question.
1. What is the estimated effect of a factor on the
response?
Related
Techniques
Multi-factor analysis of variance
DOE mean plot
Block plot
DOE contour plot
Case Study The analysis of a full factorial design is demonstrated in
the eddy current case study.
Software All statistical software packages are capable of estimating
effects using an analysis of variance or least squares
regression procedure.
1.3.5.18.1. Defining Models and Prediction
Equations
For
Orthogonal
Designs,
Parameter
Estimates
Don't
Change as
Additional
Terms Are
Added
In most cases of least-squares fitting, the model coefficients
for previously added terms change depending on what was
successively added. For example, the X1 coefficient might
change depending on whether or not an X2 term was included
in the model. This is not the case when the design is
orthogonal, as is a \(2^3\) full factorial design. For orthogonal
designs, the estimates for the previously included terms do not
change as additional terms are added. This means that the ranked list of parameter estimates gives the least-squares coefficient estimates for progressively more complicated models.
Example
Prediction
Equation
We use the parameter estimates derived from a least-squares
analysis for the eddy current data set to create an example
prediction equation.
Parameter Estimate
--------- --------
Mean 2.65875
X1 1.55125
X2 -0.43375
X1*X2 0.06375
X3 0.10625
X1*X3 0.12375
X2*X3 0.14875
X1*X2*X3 0.07125
A prediction equation predicts a value of the response variable for given values of the factors. The equation we select can include all the factors shown above, or it can include a subset of the factors. For example, one possible prediction equation using only the two factors X1 and X2 is:
\[ \hat{Y} = 2.65875 + 1.55125\,X_1 - 0.43375\,X_2 \]
The least-squares parameter estimates in the prediction
equation reflect the change in response for a one-unit change
in the factor value. To obtain "full" effect estimates (as
computed using the Yates algorithm) for the change in factor
levels from -1 to +1, the effect estimates (except for the
intercept) would be multiplied by two.
Remember that the Yates algorithm is just a convenient method for computing effects; any statistical software package with least-squares regression capabilities will produce the same effects, as well as many other useful analyses.
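To illustrate that point, the following Python sketch fits the full model by ordinary least squares using the coded design matrix; the responses are the eddy current values (in Yates order) from the parent section, and the resulting coefficients match the parameter estimates listed above.

```python
import numpy as np

# coded levels in Yates order for a 2^3 full factorial design
x1 = np.array([-1, 1, -1, 1, -1, 1, -1, 1])
x2 = np.array([-1, -1, 1, 1, -1, -1, 1, 1])
x3 = np.array([-1, -1, -1, -1, 1, 1, 1, 1])
y = np.array([1.70, 4.57, 0.55, 3.39, 1.51, 4.59, 0.67, 4.29])  # responses in Yates order

# design matrix with intercept, main effects, and all interactions
X = np.column_stack([np.ones(8), x1, x2, x1 * x2, x3, x1 * x3, x2 * x3, x1 * x2 * x3])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
print(coef)   # 2.65875, 1.55125, -0.43375, 0.06375, 0.10625, 0.12375, 0.14875, 0.07125
```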
Model
Selection
We want to select the most appropriate model for our data
while balancing the following two goals.
1. We want the model to include all important factors.
2. We want the model to be parsimonious. That is, the
model should be as simple as possible.
Note that the residual standard deviation alone is insufficient
for determining the most appropriate model as it will always
be decreased by adding additional factors. The next section
describes a number of approaches for determining which
factors (and interactions) to include in the model.
1. Exploratory Data Analysis
1.3. EDA Techniques
1.3.5. Quantitative Techniques
1.3.5.18. Yates Algorithm
1.3.5.18.2. Important Factors
Identify
Important
Factors
We want to select the most appropriate model to represent our data. This requires balancing
the following two goals.
1. We want the model to include all important factors.
2. We want the model to be parsimonious. That is, the model should be as simple as
possible.
In short, we want our model to include all the important factors and interactions and to omit
the unimportant factors and interactions.
Seven criteria are used to define important factors. These seven criteria are not all equally
important, nor will they always yield identical subsets; when they disagree, a consensus subset or a
weighted consensus subset must be extracted. In practice, some of these criteria may not apply
in all situations.
These criteria will be examined in the context of the eddy current data set. The parameter
estimates computed using least-squares analysis are shown below.
Parameter Estimate
--------- --------
Mean 2.65875
X1 1.55125
X2 -0.43375
X1*X2 0.06375
X3 0.10625
X1*X3 0.12375
X2*X3 0.14875
X1*X2*X3 0.07125
In practice, not all of these criteria will be used with every analysis (and some analysts may
have additional criteria). These criteria are given as useful guidelines. Most analysts will
focus on those criteria that they find most useful.
Criteria for Including Terms in the Model
The seven criteria that we can use in determining whether to keep a factor in the model can be
summarized as follows.
1. Parameters: Engineering Significance
2. Parameters: Order of Magnitude
3. Parameters: Statistical Significance
4. Parameters: Probability Plots
5. Effects: Youden Plot
6. Residual Standard Deviation: Engineering Significance
7. Residual Standard Deviation: Statistical Significance
The first four criteria focus on parameter estimates, with three numeric criteria and one
graphical criterion. The fifth criterion focuses on effects, which are twice the parameter
estimates. The last two criteria focus on the residual standard deviation of the model. We
discuss each of these seven criteria in detail in the sections that follow.
Parameters:
Engineering
Significance
The minimum engineering significant difference criterion is defined as
    |β| > Δ
where |β| is the absolute value of the parameter estimate and Δ is the minimum engineering
significant difference.
That is, declare a factor as "important" if the parameter estimate is greater than some a priori
declared engineering difference. This implies that the engineering staff have in fact stated
what a minimum difference will be. Oftentimes this is not the case. In the absence of an a
priori difference, a good rough rule for the minimum engineering significant difference is to keep only
those factors whose parameter estimate is greater than, say, 10 % of the current production
average. In this case, let's say that the average detector has a sensitivity of 2.5 ohms. This
would suggest that we would declare all factors whose parameter is greater than 10 % of 2.5
ohms = 0.25 ohm to be significant (from an engineering point of view).
Based on this minimum engineering significant difference criterion, we conclude that we
should keep two terms: X1 and X2.
Parameters:
Order of
Magnitude
The order of magnitude criterion is defined as
    |β| < 0.10 · max|β|
That is, exclude any factor whose parameter estimate is less than 10 % of the maximum parameter size. We may or
may not keep the other factors. This criterion is neither engineering nor statistical, but it does
offer some additional numerical insight. For the current example, the largest parameter is from
X1 (1.55125 ohms), and so 10 % of that is 0.155 ohms, which suggests keeping all factors
whose parameters exceed 0.155 ohms.
Based on the order-of-magnitude criterion, we thus conclude that we should keep two terms:
X1 and X2. A third term, X2*X3 (0.14875), is just slightly under the cutoff level, so we may
consider keeping it based on the other criteria.
Parameters:
Statistical
Significance
Statistical significance is defined as
    |β| > 2·s(β)
That is, declare a factor as important if its parameter estimate is more than 2 standard deviations away
from 0 (0, by definition, meaning "no effect").
The "2" comes from normal theory (more specifically, a value of 1.96 yields a 95 %
confidence interval). More precise values would come from t-distribution theory.
The difficulty with this is that in order to invoke this criterion we need the standard deviation,
σ, of an observation. This is problematic because
1. the engineer may not know σ;
2. the experiment might not have replication, and so a model-free estimate of σ is not
obtainable;
3. obtaining an estimate of σ by invoking the sometimes-employed assumption of
ignoring 3-term interactions and higher may be incorrect from an engineering point of
view.
For the eddy current example:
1. the engineer did not know σ;
2. the design (a 2^3 full factorial) did not have replication;
3. ignoring 3-term interactions and higher interactions leads to an estimate of σ based on
omitting only a single term: the X1*X2*X3 interaction.
For the eddy current example, if one assumes that the 3-term interaction is nil and hence
represents a single drawing from a population centered at zero, then an estimate of the
standard deviation of a parameter is simply the estimate of the 3-factor interaction (0.07125).
Two standard deviations is thus 0.1425. For this example, the rule is thus to keep all terms whose
|parameter estimate| > 0.1425.
This results in keeping three terms: X1 (1.55125), X2 (-0.43375), and X2*X3 (0.14875).
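In R, this cutoff rule can be applied directly to the parameter estimates listed above; the sketch below simply stores the table entries in a named vector.

    est <- c(X1 = 1.55125, X2 = -0.43375, "X1*X2" = 0.06375, X3 = 0.10625,
             "X1*X3" = 0.12375, "X2*X3" = 0.14875, "X1*X2*X3" = 0.07125)
    cutoff <- 2 * abs(est["X1*X2*X3"])   # two standard deviations, using the
                                         # 3-factor interaction as the sd estimate
    names(est)[abs(est) > cutoff]        # "X1" "X2" "X2*X3"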
Parameters:
Probability
Plots
Probability plots can be used in the following manner.
1. Normal Probability Plot: Keep a factor as "important" if it is well off the line through
zero on a normal probability plot of the parameter estimates.
2. Half-Normal Probability Plot: Keep a factor as "important" if it is well off the line near
zero on a half-normal probability plot of the absolute value of parameter estimates.
Both of these methods are based on the fact that the least-squares estimates of parameters for
these two-level orthogonal designs are simply half the difference of averages and so the
central limit theorem, loosely applied, suggests that (if no factor were important) the
parameter estimates should have approximately a normal distribution with mean zero and the
absolute value of the estimates should have a half-normal distribution.
Since the half-normal probability plot is only concerned with parameter magnitudes as opposed
to signed parameters (which are subject to the vagaries of how the initial factor codings +1
and -1 were assigned), the half-normal probability plot is preferred by some over the normal
probability plot.
Normal Probability Plot of Parameters
The following normal probability plot shows the parameter estimates for the eddy current
data.
For the example at hand, the probability plot clearly shows two factors (X1 and X2) displaced
off the line. All of the remaining five parameters are behaving like random drawings from a
normal distribution centered at zero, and so are deemed to be statistically non-significant. In
conclusion, this rule keeps two factors: X1 (1.55125) and X2 (-0.43375).
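A rough version of this plot can be produced with base R; this is a sketch in which the vector simply holds the seven non-intercept parameter estimates from the table above.

    est <- c(1.55125, -0.43375, 0.06375, 0.10625, 0.12375, 0.14875, 0.07125)
    qqnorm(est); qqline(est)                        # normal probability plot of the estimates
    # Half-normal version: sorted |estimates| against half-normal quantiles
    plot(qnorm((1 + ppoints(7)) / 2), sort(abs(est)))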
Averages:
Youden Plot
A Youden plot can be used in the following way. Keep a factor as "important" if it is
displaced away from the central-tendency "bunch" in a Youden plot of high and low averages.
By definition, a factor is important when its average response for the low (-1) setting is
significantly different from its average response for the high (+1) setting. (Note that effects are
twice the parameter estimates.) Conversely, if the low and high averages are about the same,
then it makes little difference which setting is used, and such a factor would not be
considered important. This fact, in combination with the intrinsic benefits of the Youden plot
for comparing pairs of items leads to the technique of generating a Youden plot of the low
and high averages.
Youden Plot
of Effect
Estimates
The following is the Youden plot of the effect estimates for the eddy current data.
For the example at hand, the Youden plot clearly shows a cluster of points near the grand
average (2.65875) with two displaced points above (factor 1) and below (factor 2). Based on
the Youden plot, we conclude that two factors should be kept: X1 (1.55125) and X2 (-0.43375).
Residual Standard Deviation: Engineering Significance
This criterion is defined as
Residual Standard Deviation > Cutoff
That is, declare a factor as "important" if the cumulative model that includes the factor (and
all larger factors) has a residual standard deviation smaller than an a priori engineering-
specified minimum residual standard deviation.
This criterion is different from the others in that it is model focused. In practice, this criterion
states that starting with the largest parameter, we cumulatively keep adding terms to the model
and monitor how the residual standard deviation for each progressively more complicated
model becomes smaller. At some point, the cumulative model will become complicated
enough and comprehensive enough that the resulting residual standard deviation will drop
below the pre-specified engineering cutoff for the residual standard deviation. At that point,
we stop adding terms and declare all of the model-included terms to be "important" and
everything not in the model to be "unimportant".
This approach implies that the engineer has considered what a minimum residual standard
deviation should be. In effect, this relates to what the engineer can tolerate for the magnitude
of the typical residual (the difference between the raw data and the predicted value from the
model). In other words, how good does the engineer want the prediction equation to be?
Unfortunately, this engineering specification has not always been formulated and so this
criterion can become moot.
In the absence of a prior specified cutoff, a good rough rule for the minimum engineering
residual standard deviation is to keep adding terms until the residual standard deviation just
dips below, say, 5 % of the current production average. For the eddy current data, let's say
that the average detector has a sensitivity of 2.5 ohms. Then this would suggest that we would
keep adding terms to the model until the residual standard deviation falls below 5 % of 2.5
ohms = 0.125 ohms.
Model                                                   Residual Std. Dev.
------------------------------------------------------- ------------------
Mean + X1 0.57272
Mean + X1 + X2 0.30429
Mean + X1 + X2 + X2*X3 0.26737
Mean + X1 + X2 + X2*X3 + X1*X3 0.23341
Mean + X1 + X2 + X2*X3 + X1*X3 + X3 0.19121
Mean + X1 + X2 + X2*X3 + X1*X3 + X3 + X1*X2*X3 0.18031
Mean + X1 + X2 + X2*X3 + X1*X3 + X3 + X1*X2*X3 + X1*X2 NA
Based on the minimum residual standard deviation criterion, we would include all terms in
order to drive the residual standard deviation below 0.125. Again, the 5 % rule is a rough-
and-ready rule that has no basis in engineering or statistics; it is simply a numerical convenience.
Ideally, the engineer has a better cutoff for the residual standard deviation that is based on
how well he/she wants the equation to perform in practice. If such a number were available,
then for this criterion and data set we would select something less than the entire collection of
terms.
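The residual standard deviations in the table above come from least-squares fits of progressively larger models. The sketch below assumes a data frame dat containing the coded columns X1, X2, X3 and the response Y (the eddy current data); the name dat is a placeholder.

    # Residual standard deviation of progressively more complicated models,
    # added in order of decreasing |parameter estimate|.
    forms <- c("Y ~ X1",
               "Y ~ X1 + X2",
               "Y ~ X1 + X2 + X2:X3",
               "Y ~ X1 + X2 + X2:X3 + X1:X3",
               "Y ~ X1 + X2 + X2:X3 + X1:X3 + X3")
    sapply(forms, function(f) summary(lm(as.formula(f), data = dat))$sigma)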
Residual Standard Deviation: Statistical Significance
This criterion is defined as
Residual Standard Deviation > σ
where σ is the standard deviation of an observation under replicated conditions.
That is, keep declaring terms as "important" until the cumulative model that includes them has a
residual standard deviation smaller than σ. In essence, we are acknowledging that we cannot demand
a model fit any better than what we would obtain if we had replicated data; that is, we cannot
demand that the residual standard deviation from any fitted model be any smaller than the
(theoretical or actual) replication standard deviation. We can drive the fitted standard
deviation down (by adding terms) until it achieves a value close to σ, but to attempt to drive it
down further means that we are, in effect, trying to fit noise.
In practice, this criterion may be difficult to apply because
1. the engineer may not know σ;
2. the experiment might not have replication, and so a model-free estimate of σ is not
obtainable.
For the current case study:
1. the engineer did not know σ;
2. the design (a 2^3 full factorial) did not have replication. The most common way of
having replication in such designs is to have replicated center points at the center of the
cube ((X1,X2,X3) = (0,0,0)).
Thus, for the current case, this criterion could not be used to yield a subset of "important"
factors.
Conclusions In summary, the seven criteria for specifying "important" factors yielded the following for the
eddy current data:
1. Parameters, Engineering Significance: X1, X2
2. Parameters, Numerically Significant: X1, X2
3. Parameters, Statistically Significant: X1, X2, X2*X3
4. Parameters, Probability Plots: X1, X2
5. Effects, Youden Plot: X1, X2
6. Residual SD, Engineering Significance: all 7 terms
7. Residual SD, Statistical Significance: not applicable
Such conflicting results are common. Arguably, the three most important criteria (listed in
order of importance) are:
4. Parameters, Probability Plots: X1, X2
1. Parameters, Engineering Significance: X1, X2
6. Residual SD, Engineering Significance: all 7 terms
Scanning all of the above, we thus declare the following consensus for the eddy current data:
1. Important Factors: X1 and X2
2. Parsimonious Prediction Equation:
       Ŷ = 2.65875 + 1.55125·X1 - 0.43375·X2
   (with a residual standard deviation of 0.30429 ohms)
Note that this is the initial model selection. We still need to perform model validation with a
residual analysis.
1. Exploratory Data Analysis
1.3. EDA Techniques
1.3.6. Probability Distributions
Probability
Distributions
Probability distributions are a fundamental concept in
statistics. They are used both on a theoretical level and a
practical level.
Some practical uses of probability distributions are:
To calculate confidence intervals for parameters and
to calculate critical regions for hypothesis tests.
For univariate data, it is often useful to determine a
reasonable distributional model for the data.
Statistical intervals and hypothesis tests are often
based on specific distributional assumptions. Before
computing an interval or test based on a distributional
assumption, we need to verify that the assumption is
justified for the given data set. In this case, the
distribution does not need to be the best-fitting
distribution for the data, but an adequate enough
model so that the statistical technique yields valid
conclusions.
Simulation studies with random numbers generated
from a specific probability distribution are often
needed.
Table of
Contents
1. What is a probability distribution?
2. Related probability functions
3. Families of distributions
4. Location and scale parameters
5. Estimating the parameters of a distribution
6. A gallery of common distributions
7. Tables for probability distributions
1. Exploratory Data Analysis
1.3. EDA Techniques
1.3.6. Probability Distributions
1.3.6.1. What is a Probability Distribution
Discrete
Distributions
The mathematical definition of a discrete probability
function, p(x), is a function that satisfies the following
properties.
1. The probability that x can take a specific value is p(x).
That is
    P[X = x] = p(x) = p_j
2. p(x) is non-negative for all real x.
3. The sum of p(x) over all possible values of x is 1, that
is
    Σ p_j = 1   (summed over all possible values j)
where j represents all possible values that x can have
and p_j is the probability at x_j.
One consequence of properties 2 and 3 is that 0 <=
p(x) <= 1.
What does this actually mean? A discrete probability
function is a function that can take a discrete number of
values (not necessarily finite). This is most often the non-
negative integers or some subset of the non-negative
integers. There is no mathematical restriction that discrete
probability functions only be defined at integers, but in
practice this is usually what makes sense. For example, if
you toss a coin 6 times, you can get 2 heads or 3 heads but
not 2 1/2 heads. Each of the discrete values has a certain
probability of occurrence that is between zero and one. That
is, a discrete function that allows negative values or values
greater than one is not a probability function. The condition
that the probabilities sum to one means that at least one of
the values has to occur.
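These properties are easy to check numerically; for instance, in R the binomial probabilities for the number of heads in 6 coin tosses satisfy both conditions.

    p <- dbinom(0:6, size = 6, prob = 0.5)   # P(0, 1, ..., 6 heads in 6 tosses)
    all(p >= 0 & p <= 1)                     # TRUE: each probability is in [0, 1]
    sum(p)                                   # 1: the probabilities sum to one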
Continuous
Distributions
The mathematical definition of a continuous probability
function, f(x), is a function that satisfies the following
properties.
1. The probability that x is between two points a and b is
    P[a ≤ x ≤ b] = ∫ f(x) dx   (integrated from a to b)
2. It is non-negative for all real x.
3. The integral of the probability function is one, that is
    ∫ f(x) dx = 1   (integrated over the entire real line)
What does this actually mean? Since continuous probability
functions are defined for an infinite number of points over a
continuous interval, the probability at a single point is
always zero. Probabilities are measured over intervals, not
single points. That is, the area under the curve between two
distinct points defines the probability for that interval. This
means that the height of the probability function can in fact
be greater than one. The property that the integral must
equal one is equivalent to the property for discrete
distributions that the sum of all the probabilities must equal
one.
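The same points can be illustrated numerically with the normal density in R.

    integrate(dnorm, -Inf, Inf)$value        # total area under the pdf is 1
    integrate(dnorm, -1, 1)$value            # P(-1 < X < 1), about 0.683
    dnorm(0, sd = 0.1)                       # about 3.99: a density can exceed 1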
Probability Mass Functions Versus Probability Density Functions
Discrete probability functions are referred to as probability
mass functions and continuous probability functions are
referred to as probability density functions. The term
probability functions covers both discrete and continuous
distributions. When we are referring to probability functions
in generic terms, we may use the term probability density
functions to mean both discrete and continuous probability
functions.
1. Exploratory Data Analysis
1.3. EDA Techniques
1.3.6. Probability Distributions
1.3.6.2. Related Distributions
Probability distributions are typically defined in terms of the
probability density function. However, there are a number of
probability functions used in applications.
Probability
Density
Function
For a continuous distribution, the probability density function
(pdf) describes the relative likelihood that the variate takes a
value near x. Since for continuous distributions the probability
at a single point is zero, probabilities are expressed in terms of
an integral of the pdf between two points:
    P[a ≤ X ≤ b] = ∫ f(x) dx   (integrated from a to b)
For a discrete distribution, the pdf is the probability that the
variate takes the value x:
    f(x) = P[X = x]
The following is the plot of the normal probability density
function.
Cumulative
Distribution
Function
The cumulative distribution function (cdf) is the probability
that the variable takes a value less than or equal to x. That is
    F(x) = P[X ≤ x]
For a continuous distribution, this can be expressed
mathematically as
    F(x) = ∫ f(t) dt   (integrated from minus infinity to x)
For a discrete distribution, the cdf can be expressed as
    F(x) = Σ p(x_i)   (summed over all x_i ≤ x)
The following is the plot of the normal cumulative
distribution function.
The horizontal axis is the allowable domain for the given
probability function. Since the vertical axis is a probability, it
must fall between zero and one. It increases from zero to one
as we go from left to right on the horizontal axis.
Percent
Point
Function
The percent point function (ppf) is the inverse of the
cumulative distribution function. For this reason, the percent
point function is also commonly referred to as the inverse
distribution function. That is, for a distribution function we
calculate the probability that the variable is less than or equal
to x for a given x. For the percent point function, we start
with the probability and compute the corresponding x for the
cumulative distribution. Mathematically, this can be
expressed as
    Pr[X ≤ G(p)] = p
or alternatively
    G(p) = F^(-1)(p)
The following is the plot of the normal percent point
function.
Since the horizontal axis is a probability, it goes from zero to
one. The vertical axis goes from the smallest to the largest
value of the cumulative distribution function.
Hazard
Function
The hazard function is the ratio of the probability density
function to the survival function, S(x):
    h(x) = f(x) / S(x) = f(x) / (1 - F(x))
The following is the plot of the normal distribution hazard
function.
Hazard plots are most commonly used in reliability
applications. Note that Johnson, Kotz, and Balakrishnan refer
to this as the conditional failure density function rather than
the hazard function.
Cumulative
Hazard
Function
The cumulative hazard function is the integral of the hazard
function:
    H(x) = ∫ h(t) dt   (integrated from minus infinity to x)
This can alternatively be expressed as
    H(x) = -ln(1 - F(x))
The following is the plot of the normal cumulative hazard
function.
Cumulative hazard plots are most commonly used in
reliability applications. Note that Johnson, Kotz, and
Balakrishnan refer to this as the hazard function rather than
the cumulative hazard function.
Survival
Function
Survival functions are most often used in reliability and
related fields. The survival function is the probability that the
variate takes a value greater than x:
    S(x) = P[X > x] = 1 - F(x)
The following is the plot of the normal distribution survival
function.
For a survival function, the y value on the graph starts at 1
and monotonically decreases to zero. The survival function
should be compared to the cumulative distribution function.
Inverse
Survival
Function
Just as the percent point function is the inverse of the
cumulative distribution function, the survival function also
has an inverse function. The inverse survival function can be
defined in terms of the percent point function:
    Z(p) = G(1 - p)
The following is the plot of the normal distribution inverse
survival function.
As with the percent point function, the horizontal axis is a
probability. Therefore the horizontal axis goes from 0 to 1
regardless of the particular distribution. The appearance is
similar to the percent point function. However, instead of
going from the smallest to the largest value on the vertical
axis, it goes from the largest to the smallest value.
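For a distribution whose pdf and cdf are available, all of these related functions can be computed from one another; a sketch in R for the standard normal distribution, with arbitrary values of x and p:

    x <- 1.5; p <- 0.95
    dnorm(x)                      # probability density function f(x)
    pnorm(x)                      # cumulative distribution function F(x)
    qnorm(p)                      # percent point function G(p)
    1 - pnorm(x)                  # survival function S(x)
    dnorm(x) / (1 - pnorm(x))     # hazard function h(x)
    -log(1 - pnorm(x))            # cumulative hazard function H(x)
    qnorm(1 - p)                  # inverse survival function Z(p)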
1. Exploratory Data Analysis
1.3. EDA Techniques
1.3.6. Probability Distributions
1.3.6.3. Families of Distributions
Shape
Parameters
Many probability distributions are not a single distribution,
but are in fact a family of distributions. This is due to the
distribution having one or more shape parameters.
Shape parameters allow a distribution to take on a variety of
shapes, depending on the value of the shape parameter. These
distributions are particularly useful in modeling applications
since they are flexible enough to model a variety of data sets.
Example:
Weibull
Distribution
The Weibull distribution is an example of a distribution that
has a shape parameter. The following graph plots the Weibull
pdf with the following values for the shape parameter: 0.5,
1.0, 2.0, and 5.0.
The shapes above include an exponential distribution, a right-
skewed distribution, and a relatively symmetric distribution.
The Weibull distribution has a relatively simple distributional
form. However, the shape parameter allows the Weibull to
assume a wide variety of shapes. This combination of
simplicity and flexibility in the shape of the Weibull
distribution has made it an effective distributional model in
reliability applications. This ability to model a wide variety
of distributional shapes using a relatively simple
distributional form is possible with many other distributional
families as well.
PPCC Plots The PPCC plot is an effective graphical tool for selecting the
member of a distributional family with a single shape
parameter that best fits a given set of data.
1. Exploratory Data Analysis
1.3. EDA Techniques
1.3.6. Probability Distributions
1.3.6.4. Location and Scale Parameters
Normal
PDF
A probability distribution is characterized by location and
scale parameters. Location and scale parameters are typically
used in modeling applications.
For example, the following graph is the probability density
function for the standard normal distribution, which has the
location parameter equal to zero and scale parameter equal to
one.
Location
Parameter
The next plot shows the probability density function for a
normal distribution with a location parameter of 10 and a
scale parameter of 1.
The effect of the location parameter is to translate the graph,
relative to the standard normal distribution, 10 units to the
right on the horizontal axis. A location parameter of -10
would have shifted the graph 10 units to the left on the
horizontal axis.
That is, a location parameter simply shifts the graph left or
right on the horizontal axis.
Scale
Parameter
The next plot has a scale parameter of 3 (and a location
parameter of zero). The effect of the scale parameter is to
stretch out the graph. The maximum y value is approximately
0.13 as opposed to 0.4 in the previous graphs. The y value, i.e.,
the vertical axis value, approaches zero at about (+/-) 9 as
opposed to (+/-) 3 with the first graph.
In contrast, the next graph has a scale parameter of 1/3
(=0.333). The effect of this scale parameter is to squeeze the
pdf. That is, the maximum y value is approximately 1.2 as
opposed to 0.4 and the y value is near zero at (+/-) 1 as
opposed to (+/-) 3.
The effect of a scale parameter greater than one is to stretch
the pdf. The greater the magnitude, the greater the stretching.
The effect of a scale parameter less than one is to compress
the pdf. The compressing approaches a spike as the scale
parameter goes to zero. A scale parameter of 1 leaves the pdf
unchanged (if the scale parameter is 1 to begin with) and
non-positive scale parameters are not allowed.
Location
and Scale
Together
The following graph shows the effect of both a location and
a scale parameter. The plot has been shifted right 10 units
and stretched by a factor of 3.
Standard
Form
The standard form of any distribution is the form that has
location parameter zero and scale parameter one.
It is common in statistical software packages to only
compute the standard form of the distribution. There are
formulas for converting from the standard form to the form
with other location and scale parameters. These formulas are
independent of the particular probability distribution.
Formulas for Location and Scale Based on the Standard Form
The following are the formulas for computing various
probability functions based on the standard form of the
distribution. The parameter a refers to the location parameter
and the parameter b refers to the scale parameter. Shape
parameters are not included.
Cumulative Distribution Function    F(x;a,b) = F((x-a)/b;0,1)
Probability Density Function        f(x;a,b) = (1/b)f((x-a)/b;0,1)
Percent Point Function              G(p;a,b) = a + bG(p;0,1)
Hazard Function                     h(x;a,b) = (1/b)h((x-a)/b;0,1)
Cumulative Hazard Function          H(x;a,b) = H((x-a)/b;0,1)
Survival Function                   S(x;a,b) = S((x-a)/b;0,1)
Inverse Survival Function           Z(p;a,b) = a + bZ(p;0,1)
Random Numbers                      Y(a,b) = a + bY(0,1)
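These relationships can be checked numerically; for example, with the normal distribution in R (location a, scale b, with arbitrary values):

    a <- 10; b <- 3; x <- 12; p <- 0.9
    all.equal(pnorm(x, mean = a, sd = b), pnorm((x - a)/b))        # cdf
    all.equal(dnorm(x, mean = a, sd = b), dnorm((x - a)/b) / b)    # pdf
    all.equal(qnorm(p, mean = a, sd = b), a + b * qnorm(p))        # percent point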
Relationship to Mean and Standard Deviation
For the normal distribution, the location and scale parameters
correspond to the mean and standard deviation, respectively.
However, this is not necessarily true for other distributions.
In fact, it is not true for most distributions.
1. Exploratory Data Analysis
1.3. EDA Techniques
1.3.6. Probability Distributions
1.3.6.5. Estimating the Parameters of a
Distribution
Model a univariate data set with a probability distribution
One common application of probability distributions is
modeling univariate data with a specific probability
distribution. This involves the following two steps:
1. Determination of the "best-fitting" distribution.
2. Estimation of the parameters (shape, location, and scale
parameters) for that distribution.
Various
Methods
There are various methods, both numerical and graphical, for
estimating the parameters of a probability distribution.
1. Method of moments
2. Maximum likelihood
3. Least squares
4. PPCC and probability plots
1. Exploratory Data Analysis
1.3. EDA Techniques
1.3.6. Probability Distributions
1.3.6.5. Estimating the Parameters of a Distribution
1.3.6.5.1. Method of Moments
Method of
Moments
The method of moments equates sample moments to their
theoretical counterparts and solves for the parameter
estimates. When moment methods are available, they have the
advantage of simplicity. The disadvantage is that they are often
not available and they do not have the desirable optimality
properties of maximum likelihood and least squares estimators.
The primary use of moment estimates is as starting values for
the more precise maximum likelihood and least squares
estimates.
Software Most general purpose statistical software does not include
explicit method of moments parameter estimation commands.
However, when utilized, the method of moments formulas tend
to be straightforward and can be easily implemented in most
statistical software programs.
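As a sketch of the idea in R, a gamma distribution (mean = shape·scale, variance = shape·scale²) can be fit by equating the sample mean and variance to these expressions; the simulated data and true parameter values are purely illustrative.

    x <- rgamma(500, shape = 2, scale = 3)   # illustrative data
    scale_hat <- var(x) / mean(x)            # solve var = shape*scale^2, mean = shape*scale
    shape_hat <- mean(x) / scale_hat
    c(shape = shape_hat, scale = scale_hat)  # method of moments estimates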
1. Exploratory Data Analysis
1.3. EDA Techniques
1.3.6. Probability Distributions
1.3.6.5. Estimating the Parameters of a Distribution
1.3.6.5.2. Maximum Likelihood
Maximum
Likelihood
Maximum likelihood estimation begins with the
mathematical expression known as a likelihood function of
the sample data. Loosely speaking, the likelihood of a set
of data is the probability of obtaining that particular set of
data given the chosen probability model. This expression
contains the unknown parameters. Those values of the
parameter that maximize the sample likelihood are known
as the maximum likelihood estimates.
The reliability chapter contains some examples of the
likelihood functions for a few of the commonly used
distributions in reliability analysis.
Advantages The advantages of this method are:
Maximum likelihood provides a consistent approach
to parameter estimation problems. This means that
maximum likelihood estimates can be developed for
a large variety of estimation situations. For example,
they can be applied in reliability analysis to
censored data under various censoring models.
Maximum likelihood methods have desirable
mathematical and optimality properties. Specifically,
1. They become minimum variance unbiased
estimators as the sample size increases. By
unbiased, we mean that if we take (a very
large number of) random samples with
replacement from a population, the average
value of the parameter estimates will be
theoretically exactly equal to the population
value. By minimum variance, we mean that
the estimator has the smallest variance, and
thus the narrowest confidence interval, of all
estimators of that type.
2. They have approximate normal distributions
and approximate sample variances that can be
used to generate confidence bounds and
hypothesis tests for the parameters.
Several popular statistical software packages provide
excellent algorithms for maximum likelihood
estimates for many of the commonly used
distributions. This helps mitigate the computational
complexity of maximum likelihood estimation.
Disadvantages The disadvantages of this method are:
The likelihood equations need to be specifically
worked out for a given distribution and estimation
problem. The mathematics is often non-trivial,
particularly if confidence intervals for the
parameters are desired.
The numerical estimation is usually non-trivial.
Except for a few cases where the maximum
likelihood formulas are in fact simple, it is generally
best to rely on high quality statistical software to
obtain maximum likelihood estimates. Fortunately,
high quality maximum likelihood software is
becoming increasingly common.
Maximum likelihood estimates can be heavily biased
for small samples. The optimality properties may not
apply for small samples.
Maximum likelihood can be sensitive to the choice
of starting values.
Software Most general purpose statistical software programs support
maximum likelihood estimation (MLE) in some form.
MLE can be supported in two ways.
1. A software program may provide a generic function
minimization (or equivalently, maximization)
capability. This is also referred to as function
optimization. Maximum likelihood estimation is
essentially a function optimization problem.
This type of capability is particularly common in
mathematical software programs.
2. A software program may provide MLE
computations for a specific problem. For example, it
may generate ML estimates for the parameters of a
Weibull distribution.
Statistical software programs will often provide ML
estimates for many specific problems even when
they do not support general function optimization.
The advantage of function minimization software is that it
can be applied to many different MLE problems. The
drawback is that you have to specify the maximum
likelihood equations to the software. As the functions can
be non-trivial, there is potential for error in entering the
equations.
The advantage of the specific MLE procedures is that
greater efficiency and better numerical stability can often
be obtained by taking advantage of the properties of the
specific estimation problem. The specific methods often
return explicit confidence intervals. In addition, you do not
have to know or specify the likelihood equations to the
software. The disadvantage is that each MLE problem
must be specifically coded.
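A minimal sketch of the generic function-optimization approach in R, using a normal model so the answer is easy to check against the sample mean and standard deviation; the simulated data and starting values are illustrative only.

    x <- rnorm(200, mean = 5, sd = 2)                  # illustrative data
    negloglik <- function(par)                         # negative log-likelihood
      -sum(dnorm(x, mean = par[1], sd = par[2], log = TRUE))
    fit <- optim(c(0, 1), negloglik, method = "L-BFGS-B",
                 lower = c(-Inf, 1e-6))                # keep the sd positive
    fit$par                                            # ML estimates of mean and sd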
1. Exploratory Data Analysis
1.3. EDA Techniques
1.3.6. Probability Distributions
1.3.6.5. Estimating the Parameters of a Distribution
1.3.6.5.3. Least Squares
Least Squares Non-linear least squares provides an alternative to
maximum likelihood.
Advantages The advantages of this method are:
Non-linear least squares software may be available
in many statistical software packages that do not
support maximum likelihood estimates.
It can be applied more generally than maximum
likelihood. That is, if your software provides non-
linear fitting and it has the ability to specify the
probability function you are interested in, then you
can generate least squares estimates for that
distribution. This will allow you to obtain reasonable
estimates for distributions even if the software does
not provide maximum likelihood estimates.
Disadvantages The disadvantages of this method are:
It is not readily applicable to censored data.
It is generally considered to have less desirable
optimality properties than maximum likelihood.
It can be quite sensitive to the choice of starting
values.
Software Non-linear least squares fitting is available in many
general purpose statistical software programs.
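One common version of this approach regresses plotting positions on the theoretical cdf with non-linear least squares. The sketch below does this in R for the Weibull distribution; the simulated data and starting values are illustrative.

    x <- sort(rweibull(200, shape = 1.5, scale = 2))   # illustrative data
    p <- ppoints(length(x))                            # plotting positions
    fit <- nls(p ~ pweibull(x, shape, scale),
               start = list(shape = 1, scale = 1))
    coef(fit)                                          # least squares estimates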
1. Exploratory Data Analysis
1.3. EDA Techniques
1.3.6. Probability Distributions
1.3.6.5. Estimating the Parameters of a Distribution
1.3.6.5.4. PPCC and Probability Plots
PPCC and
Probability
Plots
The PPCC plot can be used to estimate the shape
parameter of a distribution with a single shape parameter.
After finding the best value of the shape parameter, the
probability plot can be used to estimate the location and
scale parameters of a probability distribution.
Advantages The advantages of this method are:
It is based on two well-understood concepts.
1. The linearity (i.e., straightness) of the
probability plot is a good measure of the
adequacy of the distributional fit.
2. The correlation coefficient between the points
on the probability plot is a good measure of
the linearity of the probability plot.
It is an easy technique to implement for a wide
variety of distributions with a single shape
parameter. The basic requirement is to be able to
compute the percent point function, which is needed
in the computation of both the probability plot and
the PPCC plot.
The PPCC plot provides insight into the sensitivity
of the shape parameter. That is, if the PPCC plot is
relatively flat in the neighborhood of the optimal
value of the shape parameter, this is a strong
indication that the fitted model will not be sensitive
to small deviations, or even large deviations in some
cases, in the value of the shape parameter.
The maximum correlation value provides a method
for comparing across distributions as well as
identifying the best value of the shape parameter for
a given distribution. For example, we could use the
PPCC and probability fits for the Weibull,
lognormal, and possibly several other distributions.
Comparing the maximum correlation coefficient
achieved for each distribution can help in selecting
which is the best distribution to use.
Disadvantages The disadvantages of this method are:
It is limited to distributions with a single shape
parameter.
PPCC plots are not widely available in statistical
software packages other than Dataplot (Dataplot
provides PPCC plots for 40+ distributions).
Probability plots are generally available. However,
many statistical software packages only provide
them for a limited number of distributions.
Significance levels for the correlation coefficient
(i.e., if the maximum correlation value is above a
given value, then the distribution provides an
adequate fit for the data with a given confidence
level) have only been worked out for a limited
number of distributions.
Other
Graphical
Methods
For reliability applications, the hazard plot and the Weibull
plot are alternative graphical methods that are commonly
used to estimate parameters.
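A bare-bones PPCC computation is straightforward to script. The sketch below does it in R for the Weibull shape parameter; the simulated data and the grid of candidate shape values are arbitrary.

    x <- sort(rweibull(200, shape = 1.5, scale = 2))   # illustrative data
    pp <- ppoints(length(x))                           # plotting positions
    shapes <- seq(0.5, 5, by = 0.05)                   # candidate shape values
    ppcc <- sapply(shapes, function(g) cor(x, qweibull(pp, shape = g)))
    shapes[which.max(ppcc)]                            # PPCC estimate of the shape
    max(ppcc)                                          # maximum correlation achieved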
1. Exploratory Data Analysis
1.3. EDA Techniques
1.3.6. Probability Distributions
1.3.6.6. Gallery of Distributions
Gallery of
Common
Distributions
Detailed information on a few of the most common
distributions is available below. There are a large number of
distributions used in statistical applications. It is beyond the
scope of this Handbook to discuss more than a few of these.
Two excellent sources for additional detailed information on
a large array of distributions are Johnson, Kotz, and
Balakrishnan and Evans, Hastings, and Peacock. Equations
for the probability functions are given for the standard form
of the distribution. Formulas exist for defining the functions
with location and scale parameters in terms of the standard
form of the distribution.
The sections on parameter estimation are restricted to the
method of moments and maximum likelihood. This is
because the least squares and PPCC and probability plot
estimation procedures are generic. The maximum likelihood
equations are not listed if they involve solving simultaneous
equations. This is because these methods require
sophisticated computer software to solve. Except where the
maximum likelihood estimates are trivial, you should depend
on a statistical software program to compute them.
References are given for those who are interested.
Be aware that different sources may give formulas that are
different from those shown here. In some cases, these are
simply mathematically equivalent formulations. In other
cases, a different parameterization may be used.
Continuous
Distributions
Normal Distribution, Uniform Distribution, Cauchy Distribution,
t Distribution, F Distribution, Chi-Square Distribution,
Exponential Distribution, Weibull Distribution, Lognormal Distribution,
Birnbaum-Saunders (Fatigue Life) Distribution, Gamma Distribution,
Double Exponential Distribution, Power Normal Distribution,
Power Lognormal Distribution, Tukey-Lambda Distribution,
Extreme Value Type I Distribution, Beta Distribution
Discrete
Distributions
Binomial Distribution, Poisson Distribution
1. Exploratory Data Analysis
1.3. EDA Techniques
1.3.6. Probability Distributions
1.3.6.6. Gallery of Distributions
1.3.6.6.1. Normal Distribution
Probability
Density
Function
The general formula for the probability density function of
the normal distribution is
    f(x) = exp(-(x - μ)²/(2σ²)) / (σ·√(2π))
where μ is the location parameter and σ is the scale
parameter. The case where μ = 0 and σ = 1 is called the
standard normal distribution. The equation for the standard
normal distribution is
    f(x) = exp(-x²/2) / √(2π)
Since the general form of probability functions can be
expressed in terms of the standard distribution, all subsequent
formulas in this section are given for the standard form of the
function.
The following is the plot of the standard normal probability
density function.
Cumulative
Distribution
Function
The formula for the cumulative distribution function of the
normal distribution does not exist in a simple closed formula.
It is computed numerically.
The following is the plot of the normal cumulative
distribution function.
Percent
Point
Function
The formula for the percent point function of the normal
distribution does not exist in a simple closed formula. It is
computed numerically.
The following is the plot of the normal percent point
function.
Hazard
Function
The formula for the hazard function of the normal
distribution is
    h(x) = φ(x) / (1 - Φ(x))
where Φ is the cumulative distribution function of the
standard normal distribution and φ is the probability density
function of the standard normal distribution.
The following is the plot of the normal hazard function.
Cumulative
Hazard
Function
The normal cumulative hazard function can be computed
from the normal cumulative distribution function.
The following is the plot of the normal cumulative hazard
function.
Survival
Function
The normal survival function can be computed from the
normal cumulative distribution function.
The following is the plot of the normal survival function.
Inverse
Survival
Function
The normal inverse survival function can be computed from
the normal percent point function.
The following is the plot of the normal inverse survival
function.
Common
Statistics
Mean The location parameter μ.
Median The location parameter μ.
Mode The location parameter μ.
Range Infinity in both directions.
Standard
Deviation
The scale parameter σ.
Coefficient of
Variation
σ/μ
Skewness 0
Kurtosis 3
Parameter
Estimation
The location and scale parameters of the normal distribution
can be estimated with the sample mean and sample standard
deviation, respectively.
Comments For both theoretical and practical reasons, the normal
distribution is probably the most important distribution in
statistics. For example,
Many classical statistical tests are based on the
assumption that the data follow a normal distribution.
This assumption should be tested before applying these
tests.
In modeling applications, such as linear and non-linear
regression, the error term is often assumed to follow a
normal distribution with fixed location and scale.
The normal distribution is used to find significance
levels in many hypothesis tests and confidence
intervals.
Theoretical Justification - Central Limit Theorem
The normal distribution is widely used. Part of the appeal is
that it is well behaved and mathematically tractable.
However, the central limit theorem provides a theoretical
basis for why it has wide applicability.
The central limit theorem basically states that as the sample
size (N) becomes large, the following occur:
1. The sampling distribution of the mean becomes
approximately normal regardless of the distribution of
the original variable.
2. The sampling distribution of the mean is centered at
the population mean, μ, of the original variable. In
addition, the standard deviation of the sampling
distribution of the mean approaches σ/√N, where σ is
the standard deviation of the original variable.
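A quick simulation in R illustrates both statements for a skewed (exponential) population; the sample size and number of replications are arbitrary.

    means <- replicate(5000, mean(rexp(50, rate = 1)))  # 5000 sample means, N = 50
    hist(means, breaks = 40)                            # looks approximately normal
    c(sd(means), 1 / sqrt(50))                          # close to sigma/sqrt(N) ~= 0.141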
Software Most general purpose statistical software programs support at
least some of the probability functions for the normal
distribution.
1. Exploratory Data Analysis
1.3. EDA Techniques
1.3.6. Probability Distributions
1.3.6.6. Gallery of Distributions
1.3.6.6.2. Uniform Distribution
Probability
Density
Function
The general formula for the probability density function of the uniform
distribution is
    f(x) = 1/(B - A)   for A ≤ x ≤ B
where A is the location parameter and (B - A) is the scale parameter. The
case where A = 0 and B = 1 is called the standard uniform distribution.
The equation for the standard uniform distribution is
    f(x) = 1   for 0 ≤ x ≤ 1
Since the general form of probability functions can be expressed in terms of
the standard distribution, all subsequent formulas in this section are given
for the standard form of the function.
The following is the plot of the uniform probability density function.
Cumulative
Distribution
Function
The formula for the cumulative distribution function of the uniform
distribution is
    F(x) = x   for 0 ≤ x ≤ 1
The following is the plot of the uniform cumulative distribution function.
Percent
Point
Function
The formula for the percent point function of the uniform distribution is
    G(p) = p   for 0 ≤ p ≤ 1
The following is the plot of the uniform percent point function.
Hazard
Function
The formula for the hazard function of the uniform distribution is
    h(x) = 1/(1 - x)   for 0 ≤ x < 1
The following is the plot of the uniform hazard function.
Cumulative
Hazard
Function
The formula for the cumulative hazard function of the uniform distribution
is
    H(x) = -ln(1 - x)   for 0 ≤ x < 1
The following is the plot of the uniform cumulative hazard function.
Survival
Function
The uniform survival function can be computed from the uniform
cumulative distribution function.
The following is the plot of the uniform survival function.
Inverse
Survival
Function
The uniform inverse survival function can be computed from the uniform
percent point function.
The following is the plot of the uniform inverse survival function.
Common
Statistics
Mean (A + B)/2
Median (A + B)/2
Range B - A
Standard Deviation (B - A)/√12
Coefficient of
Variation
(B - A)/(√3·(A + B))
Skewness 0
Kurtosis 9/5
Parameter
Estimation
The method of moments estimators for A and B are
    A = x̄ - √3·s
    B = x̄ + √3·s
where x̄ is the sample mean and s is the sample standard deviation.
The maximum likelihood estimators are usually given in terms of the
parameters a and h where
A = a - h
B = a + h
The maximum likelihood estimators for a and h are
    a = midrange(X1, ..., Xn) = (X(1) + X(n))/2
    h = range(X1, ..., Xn)/2 = (X(n) - X(1))/2
This gives the following maximum likelihood estimators for A and B
    A = X(1), the sample minimum
    B = X(n), the sample maximum
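A short R check of both sets of estimators; the simulated data and the endpoints 2 and 7 are illustrative.

    x <- runif(100, min = 2, max = 7)          # illustrative data
    c(A = mean(x) - sqrt(3) * sd(x),           # method of moments
      B = mean(x) + sqrt(3) * sd(x))
    c(A = min(x), B = max(x))                  # maximum likelihood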
Comments The uniform distribution defines equal probability over a given range for a
continuous distribution. For this reason, it is important as a reference
distribution.
One of the most important applications of the uniform distribution is in the
generation of random numbers. That is, almost all random number
generators generate random numbers on the (0,1) interval. For other
distributions, some transformation is applied to the uniform random
numbers.
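The usual transformation is the inverse-transform method: uniform(0,1) draws pushed through a distribution's percent point function become draws from that distribution. A sketch in R, using the exponential distribution as the target:

    u <- runif(1000)            # uniform(0,1) random numbers
    x <- qexp(u, rate = 2)      # exponential random numbers via the percent point function
    # equivalent in distribution to rexp(1000, rate = 2)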
Software Most general purpose statistical software programs support at least some of
the probability functions for the uniform distribution.
1. Exploratory Data Analysis
1.3. EDA Techniques
1.3.6. Probability Distributions
1.3.6.6. Gallery of Distributions
1.3.6.6.3. Cauchy Distribution
Probability
Density
Function
The general formula for the probability density function of
the Cauchy distribution is
    f(x) = 1 / (s·π·(1 + ((x - t)/s)²))
where t is the location parameter and s is the scale parameter.
The case where t = 0 and s = 1 is called the standard
Cauchy distribution. The equation for the standard Cauchy
distribution reduces to
    f(x) = 1 / (π·(1 + x²))
Since the general form of probability functions can be
expressed in terms of the standard distribution, all subsequent
formulas in this section are given for the standard form of the
function.
The following is the plot of the standard Cauchy probability
density function.
Cumulative
Distribution
Function
The formula for the cumulative distribution function for the
Cauchy distribution is
    F(x) = 0.5 + arctan(x)/π
The following is the plot of the Cauchy cumulative
distribution function.
Percent
Point
Function
The formula for the percent point function of the Cauchy
distribution is
    G(p) = tan(π·(p - 0.5))
The following is the plot of the Cauchy percent point
function.
Hazard
Function
The Cauchy hazard function can be computed from the
Cauchy probability density and cumulative distribution
functions.
The following is the plot of the Cauchy hazard function.
Cumulative
Hazard
Function
The Cauchy cumulative hazard function can be computed
from the Cauchy cumulative distribution function.
The following is the plot of the Cauchy cumulative hazard
function.
Survival
Function
The Cauchy survival function can be computed from the
Cauchy cumulative distribution function.
The following is the plot of the Cauchy survival function.
Inverse
Survival
Function
The Cauchy inverse survival function can be computed from
the Cauchy percent point function.
The following is the plot of the Cauchy inverse survival
function.
Common
Statistics
Mean The mean is undefined.
Median The location parameter t.
Mode The location parameter t.
Range Infinity in both directions.
Standard
Deviation
The standard deviation is undefined.
Coefficient of
Variation
The coefficient of variation is undefined.
Skewness The skewness is undefined.
Kurtosis The kurtosis is undefined.
Parameter
Estimation
The likelihood functions for the Cauchy maximum likelihood
estimates are given in chapter 16 of Johnson, Kotz, and
Balakrishnan. These equations typically must be solved
numerically on a computer.
Comments The Cauchy distribution is important as an example of a
pathological case. Cauchy distributions look similar to a
normal distribution. However, they have much heavier tails.
When studying hypothesis tests that assume normality, seeing
how the tests perform on data from a Cauchy distribution is a
good indicator of how sensitive the tests are to heavy-tail
departures from normality. Likewise, it is a good check for
robust techniques that are designed to work well under a wide
variety of distributional assumptions.
The mean and standard deviation of the Cauchy distribution
are undefined. The practical meaning of this is that collecting
1,000 data points gives no more accurate an estimate of the
mean and standard deviation than does a single point.
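A small simulation in R makes the point: the running mean of Cauchy data never settles down the way the running mean of normal data does. The seed and sample size are arbitrary.

    set.seed(1)
    x <- rcauchy(1000)                          # standard Cauchy data
    plot(cumsum(x) / seq_along(x), type = "l",
         xlab = "number of observations", ylab = "running mean")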
Software Many general purpose statistical software programs support
at least some of the probability functions for the Cauchy
distribution.
1. Exploratory Data Analysis
1.3. EDA Techniques
1.3.6. Probability Distributions
1.3.6.6. Gallery of Distributions
1.3.6.6.4. t Distribution
Probability
Density
Function
The formula for the probability density function of the t
distribution is
    f(x) = (1 + x²/ν)^(-(ν+1)/2) / (√ν · B(1/2, ν/2))
where B is the beta function and ν is a positive integer shape
parameter. The formula for the beta function is
    B(α, β) = ∫ t^(α-1) · (1 - t)^(β-1) dt   (integrated from 0 to 1)
In a testing context, the t distribution is treated as a
"standardized distribution" (i.e., no location or scale
parameters). However, in a distributional modeling context
(as with other probability distributions), the t distribution
itself can be transformed with a location parameter, μ, and a
scale parameter, σ.
The following is the plot of the t probability density function
for 4 different values of the shape parameter.
These plots all have a similar shape. The difference is in the
heaviness of the tails. In fact, the t distribution with ν equal
to 1 is a Cauchy distribution. The t distribution approaches a
normal distribution as ν becomes large. The approximation is
quite good for values of ν > 30.
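Both limiting statements are easy to confirm numerically in R; the grid of x values is arbitrary.

    x <- seq(-4, 4, by = 0.5)
    all.equal(dt(x, df = 1), dcauchy(x))    # TRUE: t with 1 df is the Cauchy
    max(abs(dt(x, df = 60) - dnorm(x)))     # small: close to normal for large df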
Cumulative
Distribution
Function
The formula for the cumulative distribution function of the t
distribution is complicated and is not included here. It is
given in the Evans, Hastings, and Peacock book.
The following are the plots of the t cumulative distribution
function with the same values of ν as the pdf plots above.
Percent
Point
Function
The formula for the percent point function of the t
distribution does not exist in a simple closed form. It is
computed numerically.
The following are the plots of the t percent point function
with the same values of ν as the pdf plots above.
Other
Probability
Functions
Since the t distribution is typically used to develop hypothesis
tests and confidence intervals and rarely for modeling
applications, we omit the formulas and plots for the hazard,
cumulative hazard, survival, and inverse survival probability
functions.
Common
Statistics
Mean 0 (It is undefined for ν equal to 1.)
Median 0
Mode 0
Range Infinity in both directions.
Standard
Deviation
It is undefined for ν equal to 1 or 2.
Coefficient of
Variation
Undefined
Skewness 0. It is undefined for ν less than or equal to 3. However, the t distribution is symmetric in all cases.
Kurtosis
It is undefined for ν less than or equal to 4.
Parameter
Estimation
Since the t distribution is typically used to develop hypothesis
tests and confidence intervals and rarely for modeling
applications, we omit any discussion of parameter estimation.
Comments The t distribution is used in many cases for the critical
regions for hypothesis tests and in determining confidence
intervals. The most common example is testing if data are
consistent with the assumed process mean.
Software Most general purpose statistical software programs support at
least some of the probability functions for the t distribution.
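For reference (not part of the Handbook text), the t probability functions discussed above are available in R as dt, pt, and qt; the choice ν = 5 below is an arbitrary illustration:

  nu <- 5
  dt(0, df = nu)       # probability density at 0
  pt(2, df = nu)       # cumulative probability P(T <= 2)
  qt(0.975, df = nu)   # percent point (quantile) function
  qnorm(0.975)         # normal value; the two quantiles agree closely once nu > 30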
1.3.6.6.5. F Distribution
Probability
Density
Function
The F distribution is the ratio of two chi-square distributions with ν1 and ν2 degrees of freedom, respectively, where each chi-square has first been divided by its degrees of freedom.
The formula for the probability density function of the F
distribution is
where ν1 and ν2 are the shape parameters and Γ is the gamma function. The formula for the gamma function is
In a testing context, the F distribution is treated as a
"standardized distribution" (i.e., no location or scale
parameters). However, in a distributional modeling context
(as with other probability distributions), the F distribution itself can be transformed with a location parameter, μ, and a scale parameter, σ.
The following is the plot of the F probability density function
for 4 different values of the shape parameters.
Cumulative
Distribution
Function
The formula for the cumulative distribution function of the F distribution is
F(x) = 1 - I_k(ν2/2, ν1/2)   with   k = ν2/(ν2 + ν1·x)
where I_k is the incomplete beta function: the integral of t^(a-1) (1-t)^(b-1) from 0 to k, divided by B(a, b), where B is the beta function.
The following is the plot of the F cumulative distribution
function with the same values of and as the pdf plots
above.
Percent
Point
Function
The formula for the percent point function of the F
distribution does not exist in a simple closed form. It is
computed numerically.
The following is the plot of the F percent point function with
the same values of and as the pdf plots above.
Other
Probability
Functions
Since the F distribution is typically used to develop
hypothesis tests and confidence intervals and rarely for
modeling applications, we omit the formulas and plots for the
hazard, cumulative hazard, survival, and inverse survival
probability functions.
Common
Statistics
The formulas below are for the case where the location
parameter is zero and the scale parameter is one.
Mean
Mode
Range 0 to positive infinity
Standard
Deviation
Coefficient of
Variation
Skewness
Parameter
Estimation
Since the F distribution is typically used to develop
hypothesis tests and confidence intervals and rarely for
modeling applications, we omit any discussion of parameter
estimation.
Comments The F distribution is used in many cases for the critical
regions for hypothesis tests and in determining confidence
intervals. Two common examples are the analysis of variance
and the F test to determine if the variances of two populations
are equal.
Software Most general purpose statistical software programs support at
least some of the probability functions for the F distribution.
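As an illustration (not from the Handbook; the degrees of freedom are chosen arbitrarily), the F probability functions are available in R as df, pf, and qf, and the defining ratio of scaled chi-square variables can be checked by simulation:

  nu1 <- 10; nu2 <- 20
  qf(0.95, nu1, nu2)                  # upper 5 % critical value
  pf(1.5, nu1, nu2)                   # cumulative probability at 1.5
  set.seed(2)
  r <- (rchisq(1e5, nu1) / nu1) / (rchisq(1e5, nu2) / nu2)
  mean(r <= 1.5)                      # should be close to pf(1.5, nu1, nu2)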
1.3.6.6.6. Chi-Square Distribution
Probability
Density
Function
The chi-square distribution results when ν independent
variables with standard normal distributions are squared and
summed. The formula for the probability density function of
the chi-square distribution is
where ν is the shape parameter and Γ is the gamma function.
The formula for the gamma function is
In a testing context, the chi-square distribution is treated as a
"standardized distribution" (i.e., no location or scale
parameters). However, in a distributional modeling context
(as with other probability distributions), the chi-square
distribution itself can be transformed with a location parameter, μ, and a scale parameter, σ.
The following is the plot of the chi-square probability density
function for 4 different values of the shape parameter.
Cumulative
Distribution
Function
The formula for the cumulative distribution function of the
chi-square distribution is
where Γ is the gamma function defined above and γ is the incomplete gamma function. The formula for the incomplete gamma function is
The following is the plot of the chi-square cumulative
distribution function with the same values of as the pdf
plots above.
Percent
Point
Function
The formula for the percent point function of the chi-square
distribution does not exist in a simple closed form. It is
computed numerically.
The following is the plot of the chi-square percent point
function with the same values of as the pdf plots above.
Other
Probability
Functions
Since the chi-square distribution is typically used to develop
hypothesis tests and confidence intervals and rarely for
modeling applications, we omit the formulas and plots for the
hazard, cumulative hazard, survival, and inverse survival
probability functions.
Common
Statistics
Mean
Median approximately ν - 2/3 for large ν
Mode
Range 0 to positive infinity
Standard
Deviation
Coefficient of
Variation
Skewness
Kurtosis
Parameter
Estimation
Since the chi-square distribution is typically used to develop
hypothesis tests and confidence intervals and rarely for
modeling applications, we omit any discussion of parameter
estimation.
Comments The chi-square distribution is used in many cases for the
critical regions for hypothesis tests and in determining
confidence intervals. Two common examples are the chi-
square test for independence in an RxC contingency table
and the chi-square test to determine if the standard deviation
of a population is equal to a pre-specified value.
Software Most general purpose statistical software programs support at
least some of the probability functions for the chi-square
distribution.
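As an illustration (not from the Handbook; ν = 4 is an arbitrary choice), the chi-square functions are available in R as dchisq, pchisq, and qchisq, and the definition as a sum of ν squared standard normal variables can be checked by simulation:

  nu <- 4
  qchisq(0.95, df = nu)                    # upper 5 % critical value
  set.seed(3)
  x <- replicate(1e4, sum(rnorm(nu)^2))    # sums of nu squared standard normals
  mean(x <= qchisq(0.95, df = nu))         # should be close to 0.95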
1.3.6.6.7. Exponential Distribution
Probability
Density
Function
The general formula for the probability density function of
the exponential distribution is
where μ is the location parameter and β is the scale parameter (the scale parameter is often referred to as λ, which equals 1/β). The case where μ = 0 and β = 1 is called the standard exponential distribution. The equation for the standard exponential distribution is
The general form of probability functions can be expressed
in terms of the standard distribution. Subsequent formulas in
this section are given for the 1-parameter (i.e., with scale
parameter) form of the function.
The following is the plot of the exponential probability
density function.
Cumulative
Distribution
Function
The formula for the cumulative distribution function of the exponential distribution is
The following is the plot of the exponential cumulative
distribution function.
Percent
Point
Function
The formula for the percent point function of the exponential
distribution is
The following is the plot of the exponential percent point
function.
Hazard
Function
The formula for the hazard function of the exponential
distribution is
The following is the plot of the exponential hazard function.
Cumulative
Hazard
Function
The formula for the cumulative hazard function of the
exponential distribution is
The following is the plot of the exponential cumulative
hazard function.
Survival
Function
The formula for the survival function of the exponential
distribution is
The following is the plot of the exponential survival function.
Inverse
Survival
Function
The formula for the inverse survival function of the
exponential distribution is
The following is the plot of the exponential inverse survival
function.
Common
Statistics
Mean
Median
Mode Zero
Range Zero to plus infinity
Standard
Deviation
Coefficient of
Variation
1
Skewness 2
Kurtosis 9
Parameter
Estimation
For the full sample case, the maximum likelihood estimator
of the scale parameter is the sample mean. Maximum
likelihood estimation for the exponential distribution is
discussed in the chapter on reliability (Chapter 8). It is also
discussed in chapter 19 of Johnson, Kotz, and Balakrishnan.
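A minimal sketch in R (simulated data; the true scale of 5 and the sample size are arbitrary), using the fact stated above that the maximum likelihood estimate of the scale parameter is the sample mean. Note that R parameterizes the exponential by the rate, which is the reciprocal of the scale:

  set.seed(4)
  x <- rexp(200, rate = 1/5)      # data with true scale beta = 5
  beta.hat <- mean(x)             # maximum likelihood estimate of the scale
  pexp(10, rate = 1/beta.hat)     # fitted cumulative probability at 10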
Comments The exponential distribution is primarily used in reliability
applications. The exponential distribution is used to model data with a constant failure rate (indicated by a hazard function that is simply equal to a constant).
Software Most general purpose statistical software programs support at
least some of the probability functions for the exponential
distribution.
1.3.6.6.8. Weibull Distribution
Probability
Density
Function
The formula for the probability density function of the general Weibull
distribution is
where γ is the shape parameter, μ is the location parameter and α is the scale parameter. The case where μ = 0 and α = 1 is called the standard Weibull distribution. The case where μ = 0 is called the 2-parameter Weibull distribution. The equation for the standard Weibull distribution
reduces to
Since the general form of probability functions can be expressed in
terms of the standard distribution, all subsequent formulas in this
section are given for the standard form of the function.
The following is the plot of the Weibull probability density function.
Cumulative
Distribution
Function
The formula for the cumulative distribution function of the Weibull
distribution is
The following is the plot of the Weibull cumulative distribution
function with the same values of as the pdf plots above.
Percent
Point
Function
The formula for the percent point function of the Weibull distribution is
The following is the plot of the Weibull percent point function with the
same values of as the pdf plots above.
Hazard
Function
The formula for the hazard function of the Weibull distribution is
The following is the plot of the Weibull hazard function with the same
values of as the pdf plots above.
Cumulative
Hazard
Function
The formula for the cumulative hazard function of the Weibull
distribution is
The following is the plot of the Weibull cumulative hazard function
with the same values of as the pdf plots above.
Survival
Function
The formula for the survival function of the Weibull distribution is
The following is the plot of the Weibull survival function with the same
values of as the pdf plots above.
Inverse
Survival
Function
The formula for the inverse survival function of the Weibull
distribution is
The following is the plot of the Weibull inverse survival function with
the same values of as the pdf plots above.
Common
Statistics
The formulas below are with the location parameter equal to zero and
the scale parameter equal to one.
Mean
where Γ is the gamma function
Median
Mode
Range Zero to positive infinity.
Standard Deviation
Coefficient of
Variation
Parameter
Estimation
Maximum likelihood estimation for the Weibull distribution is
discussed in the Reliability chapter (Chapter 8). It is also discussed in
Chapter 21 of Johnson, Kotz, and Balakrishnan.
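A small sketch of numerical maximum likelihood fitting in R (not from the Handbook; the simulated sample and its true parameters are arbitrary). R's dweibull/pweibull/rweibull use the shape and scale parameterization of the 2-parameter form described above, and fitdistr comes from the MASS package that ships with R:

  library(MASS)
  set.seed(5)
  x <- rweibull(100, shape = 1.5, scale = 2)
  fitdistr(x, densfun = "weibull")    # numerical ML estimates of shape and scale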
Comments The Weibull distribution is used extensively in reliability applications
to model failure times.
Software Most general purpose statistical software programs support at least
some of the probability functions for the Weibull distribution.
1.3.6.6.9. Lognormal Distribution
Probability
Density
Function
A variable X is lognormally distributed if Y = LN(X) is
normally distributed with "LN" denoting the natural
logarithm. The general formula for the probability density
function of the lognormal distribution is
where σ is the shape parameter, θ is the location parameter and m is the scale parameter. The case where θ = 0 and m = 1 is called the standard lognormal distribution. The case where θ equals zero is called the 2-parameter lognormal
distribution.
The equation for the standard lognormal distribution is
Since the general form of probability functions can be
expressed in terms of the standard distribution, all subsequent
formulas in this section are given for the standard form of the
function.
The following is the plot of the lognormal probability density
function for four values of .
There are several common parameterizations of the
lognormal distribution. The form given here is from Evans,
Hastings, and Peacock.
Cumulative
Distribution
Function
The formula for the cumulative distribution function of the
lognormal distribution is
where Φ is the cumulative distribution function of the normal
distribution.
The following is the plot of the lognormal cumulative
distribution function with the same values of as the pdf
plots above.
Percent
Point
Function
The formula for the percent point function of the lognormal
distribution is
where Φ^-1 is the percent point function of the normal
distribution.
The following is the plot of the lognormal percent point
function with the same values of as the pdf plots above.
Hazard
Function
The formula for the hazard function of the lognormal
distribution is
where φ is the probability density function of the normal distribution and Φ is the cumulative distribution function of
the normal distribution.
The following is the plot of the lognormal hazard function
with the same values of as the pdf plots above.
Cumulative
Hazard
Function
The formula for the cumulative hazard function of the
lognormal distribution is
where is the cumulative distribution function of the normal
distribution.
The following is the plot of the lognormal cumulative hazard
function with the same values of as the pdf plots above.
Survival
Function
The formula for the survival function of the lognormal
distribution is
where is the cumulative distribution function of the normal
distribution.
The following is the plot of the lognormal survival function
with the same values of as the pdf plots above.
Inverse
Survival
Function
The formula for the inverse survival function of the
lognormal distribution is
where is the percent point function of the normal
distribution.
The following is the plot of the lognormal inverse survival
function with the same values of as the pdf plots above.
Common
Statistics
The formulas below are with the location parameter equal to
zero and the scale parameter equal to one.
Mean
Median Scale parameter m (= 1 if scale parameter
not specified).
Mode
Range Zero to positive infinity
Standard
Deviation
Skewness
Kurtosis
Coefficient of
Variation
Parameter
Estimation
The maximum likelihood estimates for the scale parameter,
m, and the shape parameter, σ, are
and
where
If the location parameter is known, it can be subtracted from
the original data points before computing the maximum
likelihood estimates of the shape and scale parameters.
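These estimates can be computed directly in R; a minimal sketch with simulated data (the true parameters are arbitrary), using the divisor N as in the maximum likelihood formulas:

  set.seed(6)
  x <- rlnorm(500, meanlog = 1, sdlog = 0.5)
  m.hat <- exp(mean(log(x)))                          # scale estimate
  sigma.hat <- sqrt(mean((log(x) - log(m.hat))^2))    # shape estimate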
Comments The lognormal distribution is used extensively in reliability
applications to model failure times. The lognormal and
Weibull distributions are probably the most commonly used
distributions in reliability applications.
Software Most general purpose statistical software programs support at
least some of the probability functions for the lognormal
distribution.
1.3.6.6.10. Birnbaum-Saunders (Fatigue Life)
Distribution
Probability
Density
Function
The Birnbaum-Saunders distribution is also commonly known as the
fatigue life distribution. There are several alternative formulations of
the Birnbaum-Saunders distribution in the literature.
The general formula for the probability density function of the
Birnbaum-Saunders distribution is
where γ is the shape parameter, μ is the location parameter, β is the scale parameter, φ is the probability density function of the standard normal distribution, and Φ is the cumulative distribution function of the standard normal distribution. The case where μ = 0 and β = 1 is called the standard Birnbaum-Saunders distribution. The equation
for the standard Birnbaum-Saunders distribution reduces to
Since the general form of probability functions can be expressed in
terms of the standard distribution, all subsequent formulas in this
section are given for the standard form of the function.
The following is the plot of the Birnbaum-Saunders probability density
function.
Cumulative
Distribution
Function
The formula for the cumulative distribution function of the Birnbaum-
Saunders distribution is
where Φ is the cumulative distribution function of the standard normal
distribution. The following is the plot of the Birnbaum-Saunders
cumulative distribution function with the same values of as the pdf
plots above.
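A sketch of this cumulative distribution function in R (not from the Handbook), assuming the common standard-form expression F(x; γ) = Φ((√x − 1/√x)/γ), where γ is the shape parameter; the function name is illustrative:

  pfatigue <- function(x, gamma) pnorm((sqrt(x) - 1/sqrt(x)) / gamma)
  pfatigue(c(0.5, 1, 2), gamma = 0.5)   # CDF at a few points; equals 0.5 at x = 1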
Percent
Point
Function
The formula for the percent point function of the Birnbaum-Saunders
distribution is
where Φ^-1 is the percent point function of the standard normal
distribution. The following is the plot of the Birnbaum-Saunders
percent point function with the same values of as the pdf plots
above.
Hazard
Function
The Birnbaum-Saunders hazard function can be computed from the
Birnbaum-Saunders probability density and cumulative distribution
functions.
The following is the plot of the Birnbaum-Saunders hazard function
with the same values of as the pdf plots above.
Cumulative
Hazard
Function
The Birnbaum-Saunders cumulative hazard function can be computed
from the Birnbaum-Saunders cumulative distribution function.
The following is the plot of the Birnbaum-Saunders cumulative hazard
function with the same values of as the pdf plots above.
Survival
Function
The Birnbaum-Saunders survival function can be computed from the
Birnbaum-Saunders cumulative distribution function.
The following is the plot of the Birnbaum-Saunders survival function
with the same values of as the pdf plots above.
Inverse
Survival
Function
The Birnbaum-Saunders inverse survival function can be computed
from the Birnbaum-Saunders percent point function.
The following is the plot of the Birnbaum-Saunders inverse survival function with
the same values of as the pdf plots above.
Common
Statistics
The formulas below are with the location parameter equal to zero and
the scale parameter equal to one.
Mean
Range Zero to positive infinity.
Standard Deviation
Coefficient of
Variation
Parameter
Estimation
Maximum likelihood estimation for the Birnbaum-Saunders
distribution is discussed in the Reliability chapter.
Comments The Birnbaum-Saunders distribution is used extensively in reliability
applications to model failure times.
Software Some general purpose statistical software programs, including
Dataplot, support at least some of the probability functions for the
Birnbaum-Saunders distribution. Support for this distribution is likely
to be available for statistical programs that emphasize reliability
applications.
The "bs" package implements support for the Birnbaum-Saunders
distribution for the R package. See
Leiva, V., Hernandez, H., and Riquelme, M. (2006). A New
Package for the Birnbaum-Saunders Distribution. Rnews, 6/4,
35-40. (http://www.r-project.org)
1.3.6.6.11. Gamma Distribution
Probability
Density
Function
The general formula for the probability density function of
the gamma distribution is
where γ is the shape parameter, μ is the location parameter, β is the scale parameter, and Γ is the gamma function which has the formula
The case where μ = 0 and β = 1 is called the standard
gamma distribution. The equation for the standard gamma
distribution reduces to
Since the general form of probability functions can be
expressed in terms of the standard distribution, all subsequent
formulas in this section are given for the standard form of the
function.
The following is the plot of the gamma probability density
function.
Cumulative
Distribution
Function
The formula for the cumulative distribution function of the
gamma distribution is
where Γ is the gamma function defined above and γ is the incomplete gamma function. The incomplete gamma
function has the formula
The following is the plot of the gamma cumulative
distribution function with the same values of as the pdf
plots above.
Percent
Point
Function
The formula for the percent point function of the gamma
distribution does not exist in a simple closed form. It is
computed numerically.
The following is the plot of the gamma percent point function
with the same values of as the pdf plots above.
Hazard
Function
The formula for the hazard function of the gamma
distribution is
The following is the plot of the gamma hazard function with
the same values of as the pdf plots above.
Cumulative
Hazard
Function
The formula for the cumulative hazard function of the
gamma distribution is
where is the gamma function defined above and is
the incomplete gamma function defined above.
The following is the plot of the gamma cumulative hazard
function with the same values of as the pdf plots above.
Survival
Function
The formula for the survival function of the gamma
distribution is
where is the gamma function defined above and is
the incomplete gamma function defined above.
The following is the plot of the gamma survival function with
the same values of as the pdf plots above.
Inverse
Survival
Function
The gamma inverse survival function does not exist in simple
closed form. It is computed numerically.
The following is the plot of the gamma inverse survival
function with the same values of as the pdf plots above.
Common
Statistics
The formulas below are with the location parameter equal to
zero and the scale parameter equal to one.
Mean
Mode
Range Zero to positive infinity.
Standard
Deviation
Skewness
Kurtosis
Coefficient of
Variation
Parameter
Estimation
The method of moments estimators of the gamma distribution
are
where x̄ and s are the sample mean and standard deviation,
respectively.
The equations for the maximum likelihood estimation of the
shape and scale parameters are given in Chapter 18 of Evans,
Hastings, and Peacock and Chapter 17 of Johnson, Kotz, and
Balakrishnan. These equations need to be solved numerically;
this is typically accomplished by using statistical software
packages.
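A minimal R sketch of the method of moments estimators mentioned above (not from the Handbook), assuming the usual expressions γ̂ = (x̄/s)² for the shape and β̂ = s²/x̄ for the scale; the simulated sample and its true parameters are arbitrary:

  set.seed(7)
  x <- rgamma(300, shape = 2, scale = 3)
  shape.mom <- (mean(x) / sd(x))^2    # method of moments estimate of the shape
  scale.mom <- var(x) / mean(x)       # method of moments estimate of the scale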
Software Some general purpose statistical software programs support
at least some of the probability functions for the gamma
distribution.
1.3.6.6.12. Double Exponential Distribution
Probability
Density
Function
The general formula for the probability density function of
the double exponential distribution is
where μ is the location parameter and β is the scale parameter. The case where μ = 0 and β = 1 is called the
standard double exponential distribution. The equation for
the standard double exponential distribution is
Since the general form of probability functions can be
expressed in terms of the standard distribution, all subsequent
formulas in this section are given for the standard form of the
function.
The following is the plot of the double exponential
probability density function.
Cumulative
Distribution
Function
The formula for the cumulative distribution function of the double exponential distribution is
The following is the plot of the double exponential
cumulative distribution function.
Percent
Point
Function
The formula for the percent point function of the double
exponential distribution is
The following is the plot of the double exponential percent
point function.
Hazard
Function
The formula for the hazard function of the double exponential
distribution is
The following is the plot of the double exponential hazard
function.
Cumulative
Hazard
Function
The formula for the cumulative hazard function of the double
exponential distribution is
The following is the plot of the double exponential
cumulative hazard function.
Survival
Function
The double exponential survival function can be computed
from the cumulative distribution function of the double
exponential distribution.
The following is the plot of the double exponential survival
function.
Inverse
Survival
Function
The formula for the inverse survival function of the double
exponential distribution is
The following is the plot of the double exponential inverse
survival function.
Common
Statistics
Mean
Median
Mode
Range Negative infinity to positive infinity
Standard
Deviation
Skewness 0
Kurtosis 6
Coefficient of
Variation
Parameter
Estimation
The maximum likelihood estimators of the location and scale
parameters of the double exponential distribution are
where x̃ is the sample median.
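A minimal R sketch of these estimators (not from the Handbook), assuming the standard results that the maximum likelihood location estimate is the sample median and the scale estimate is the mean absolute deviation about that median; the simulated sample is arbitrary:

  set.seed(8)
  mu <- 10; beta <- 2
  x <- mu + ifelse(runif(500) < 0.5, 1, -1) * rexp(500, rate = 1/beta)  # double exponential sample
  mu.hat <- median(x)                 # location estimate
  beta.hat <- mean(abs(x - mu.hat))   # scale estimate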
Software Some general purpose statistical software programs support
at least some of the probability functions for the double
exponential distribution.
1.3.6.6.13. Power Normal Distribution
Probability
Density
Function
The formula for the probability density function of the
standard form of the power normal distribution is
where p is the shape parameter (also referred to as the power parameter), Φ is the cumulative distribution function of the standard normal distribution, and φ is the probability density function of the standard normal distribution.
As with other probability distributions, the power normal
distribution can be transformed with a location parameter, μ, and a scale parameter, σ. We omit the equation for the
general form of the power normal distribution. Since the
general form of probability functions can be expressed in
terms of the standard distribution, all subsequent formulas in
this section are given for the standard form of the function.
The following is the plot of the power normal probability
density function with four values of p.
Cumulative
Distribution
Function
The formula for the cumulative distribution function of the
power normal distribution is
where Φ is the cumulative distribution function of the
standard normal distribution.
The following is the plot of the power normal cumulative
distribution function with the same values of p as the pdf
plots above.
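A sketch of this cumulative distribution function in R (not from the Handbook), assuming the form F(x; p) = 1 − (Φ(−x))^p implied by the description above; the function name is illustrative, and p = 1 recovers the standard normal:

  ppowernorm <- function(x, p) 1 - pnorm(-x)^p
  ppowernorm(0, p = 1)            # 0.5, the standard normal case
  ppowernorm(0, p = c(2, 5))      # larger p shifts probability toward smaller x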
Percent
Point
Function
The formula for the percent point function of the power
normal distribution is
where is the percent point function of the standard
normal distribution.
The following is the plot of the power normal percent point
function with the same values of p as the pdf plots above.
Hazard
Function
The formula for the hazard function of the power normal distribution is
The following is the plot of the power normal hazard function
with the same values of p as the pdf plots above.
Cumulative
Hazard
Function
The formula for the cumulative hazard function of the power
normal distribution is
The following is the plot of the power normal cumulative
hazard function with the same values of p as the pdf plots
above.
Survival
Function
The formula for the survival function of the power normal
distribution is
The following is the plot of the power normal survival
function with the same values of p as the pdf plots above.
Inverse
Survival
Function
The formula for the inverse survival function of the power
normal distribution is
The following is the plot of the power normal inverse
survival function with the same values of p as the pdf plots
above.
Common
Statistics
The statistics for the power normal distribution are
complicated and require tables. Nelson discusses the mean,
median, mode, and standard deviation of the power normal
distribution and provides references to the appropriate tables.
Software Most general purpose statistical software programs do not
support the probability functions for the power normal
distribution.
1.3.6.6.14. Power Lognormal Distribution
Probability
Density
Function
The formula for the probability density function of the standard form of
the power lognormal distribution is
where p (also referred to as the power parameter) and σ are the shape parameters, Φ is the cumulative distribution function of the standard normal distribution, and φ is the probability density function of the standard normal distribution.
As with other probability distributions, the power lognormal distribution
can be transformed with a location parameter, , and a scale parameter,
B. We omit the equation for the general form of the power lognormal
distribution. Since the general form of probability functions can be
expressed in terms of the standard distribution, all subsequent formulas
in this section are given for the standard form of the function.
The following is the plot of the power lognormal probability density
function with four values of p and σ set to 1.
Cumulative
Distribution
Function
The formula for the cumulative distribution function of the power
lognormal distribution is
where is the cumulative distribution function of the standard normal
distribution.
The following is the plot of the power lognormal cumulative
distribution function with the same values of p as the pdf plots above.
Percent
Point
Function
The formula for the percent point function of the power lognormal
distribution is
where is the percent point function of the standard normal
distribution.
The following is the plot of the power lognormal percent point function
with the same values of p as the pdf plots above.
Hazard
Function
The formula for the hazard function of the power lognormal distribution is
where Φ is the cumulative distribution function of the standard normal distribution, and φ is the probability density function of the standard
normal distribution.
Note that this is simply a multiple (p) of the lognormal hazard function.
The following is the plot of the power lognormal hazard function with
the same values of p as the pdf plots above.
Cumulative
Hazard
Function
The formula for the cumulative hazard function of the power lognormal
distribution is
The following is the plot of the power lognormal cumulative hazard
function with the same values of p as the pdf plots above.
Survival
Function
The formula for the survival function of the power lognormal
distribution is
The following is the plot of the power lognormal survival function with
the same values of p as the pdf plots above.
Inverse
Survival
Function
The formula for the inverse survival function of the power lognormal
distribution is
The following is the plot of the power lognormal inverse survival
function with the same values of p as the pdf plots above.
Common
Statistics
The statistics for the power lognormal distribution are complicated and
require tables. Nelson discusses the mean, median, mode, and standard
deviation of the power lognormal distribution and provides references to
the appropriate tables.
Parameter
Estimation
Nelson discusses maximum likelihood estimation for the power
lognormal distribution. These estimates need to be performed with
computer software. Software for maximum likelihood estimation of the
parameters of the power lognormal distribution is not as readily
available as for other reliability distributions such as the exponential,
Weibull, and lognormal.
Software Most general purpose statistical software programs do not support the
probability functions for the power lognormal distribution.
1.3.6.6.15. Tukey-Lambda Distribution
Probability
Density
Function
The Tukey-Lambda density function does not have a simple,
closed form. It is computed numerically.
The Tukey-Lambda distribution has the shape parameter λ.
As with other probability distributions, the Tukey-Lambda
distribution can be transformed with a location parameter, μ, and a scale parameter, σ. Since the general form of
probability functions can be expressed in terms of the
standard distribution, all subsequent formulas in this section
are given for the standard form of the function.
The following is the plot of the Tukey-Lambda probability
density function for four values of .
Cumulative
Distribution
Function
The Tukey-Lambda cumulative distribution function does not have a simple, closed form. It is computed numerically.
The following is the plot of the Tukey-Lambda cumulative
distribution function with the same values of as the pdf
plots above.
Percent
Point
Function
The formula for the percent point function of the standard
form of the Tukey-Lambda distribution is
The following is the plot of the Tukey-Lambda percent point
function with the same values of as the pdf plots above.
Other
Probability
Functions
The Tukey-Lambda distribution is typically used to identify
an appropriate distribution (see the comments below) and not
used in statistical models directly. For this reason, we omit
the formulas, and plots for the hazard, cumulative hazard,
survival, and inverse survival functions. We also omit the
common statistics and parameter estimation sections.
Comments The Tukey-Lambda distribution is actually a family of
distributions that can approximate a number of common
distributions. For example,
λ = -1: approximately Cauchy
λ = 0: exactly logistic
λ = 0.14: approximately normal
λ = 0.5: U-shaped
λ = 1: exactly uniform (from -1 to +1)
The most common use of this distribution is to generate a
Tukey-Lambda PPCC plot of a data set. Based on the PPCC plot, an appropriate model for the data is suggested. For example, if the maximum correlation occurs for a value of λ at or near 0.14, then the data can be modeled with a normal distribution. Values of λ less than this imply a heavy-tailed distribution (with λ = -1 approximating a Cauchy). That is, as the optimal value of λ goes from 0.14 to -1, increasingly heavy tails are implied. Similarly, as the optimal value of λ becomes greater than 0.14, shorter tails are implied.
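A rough sketch of the PPCC idea in R (not from the Handbook; the data are simulated normal values and the small grid of λ values is arbitrary, whereas a real PPCC plot would use a much finer grid):

  set.seed(9)
  x <- rnorm(100)
  pp <- ppoints(length(x))                 # plotting positions
  lambdas <- c(-1, -0.5, 0.14, 0.5, 1)
  ppcc <- sapply(lambdas, function(l) cor(sort(x), (pp^l - (1 - pp)^l) / l))
  lambdas[which.max(ppcc)]                 # expected to be near 0.14 for normal data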
As the Tukey-Lambda distribution is a symmetric
distribution, the use of the Tukey-Lambda PPCC plot to
determine a reasonable distribution to model the data only
applies to symmetric distributions. A histogram of the data
should provide evidence as to whether the data can be
reasonably modeled with a symmetric distribution.
Software Most general purpose statistical software programs do not
support the probability functions for the Tukey-Lambda
distribution.
1.3.6.6.16. Extreme Value Type I Distribution
Probability
Density
Function
The extreme value type I distribution has two forms. One is
based on the smallest extreme and the other is based on the
largest extreme. We call these the minimum and maximum
cases, respectively. Formulas and plots for both cases are
given. The extreme value type I distribution is also referred to
as the Gumbel distribution.
The general formula for the probability density function of
the Gumbel (minimum) distribution is
where μ is the location parameter and β is the scale parameter. The case where μ = 0 and β = 1 is called the
standard Gumbel distribution. The equation for the
standard Gumbel distribution (minimum) reduces to
The following is the plot of the Gumbel probability density
function for the minimum case.
The general formula for the probability density function of
the Gumbel (maximum) distribution is
where μ is the location parameter and β is the scale parameter. The case where μ = 0 and β = 1 is called the
standard Gumbel distribution. The equation for the
standard Gumbel distribution (maximum) reduces to
The following is the plot of the Gumbel probability density
function for the maximum case.
Since the general form of probability functions can be
expressed in terms of the standard distribution, all subsequent
formulas in this section are given for the standard form of the
function.
Cumulative
Distribution
Function
The formula for the cumulative distribution function of the
Gumbel distribution (minimum) is
The following is the plot of the Gumbel cumulative
distribution function for the minimum case.
The formula for the cumulative distribution function of the
Gumbel distribution (maximum) is
The following is the plot of the Gumbel cumulative
distribution function for the maximum case.
Percent
Point
Function
The formula for the percent point function of the Gumbel
distribution (minimum) is
The following is the plot of the Gumbel percent point
function for the minimum case.
The formula for the percent point function of the Gumbel
distribution (maximum) is
The following is the plot of the Gumbel percent point
function for the maximum case.
Hazard
Function
The formula for the hazard function of the Gumbel
distribution (minimum) is
The following is the plot of the Gumbel hazard function for
the minimum case.
The formula for the hazard function of the Gumbel
distribution (maximum) is
The following is the plot of the Gumbel hazard function for
the maximum case.
Cumulative
Hazard
Function
The formula for the cumulative hazard function of the
Gumbel distribution (minimum) is
The following is the plot of the Gumbel cumulative hazard
function for the minimum case.
The formula for the cumulative hazard function of the
Gumbel distribution (maximum) is
The following is the plot of the Gumbel cumulative hazard
function for the maximum case.
Survival
Function
The formula for the survival function of the Gumbel
distribution (minimum) is
The following is the plot of the Gumbel survival function for
the minimum case.
The formula for the survival function of the Gumbel
distribution (maximum) is
The following is the plot of the Gumbel survival function for
the maximum case.
Inverse
Survival
Function
The formula for the inverse survival function of the Gumbel
distribution (minimum) is
The following is the plot of the Gumbel inverse survival
function for the minimum case.
The formula for the inverse survival function of the Gumbel
distribution (maximum) is
The following is the plot of the Gumbel inverse survival
function for the maximum case.
Common
Statistics
The formulas below are for the maximum order statistic case.
Mean
The constant 0.5772 is the Euler-Mascheroni constant.
Median
Mode
Range Negative infinity to positive infinity.
Standard
Deviation
Skewness 1.13955
Kurtosis 5.4
Coefficient of
Variation
Parameter
Estimation
The method of moments estimators of the Gumbel
(maximum) distribution are
where x̄ and s are the sample mean and standard deviation,
respectively.
The equations for the maximum likelihood estimation of the location and scale parameters are discussed in Chapter 15 of
Evans, Hastings, and Peacock and Chapter 22 of Johnson,
Kotz, and Balakrishnan. These equations need to be solved
numerically and this is typically accomplished by using
statistical software packages.
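A minimal R sketch of the method of moments estimators for the maximum case (not from the Handbook), assuming the standard expressions β̂ = s·√6/π and μ̂ = x̄ − 0.5772·β̂; the simulated sample and its true parameters are arbitrary:

  set.seed(10)
  x <- 10 - 2 * log(-log(runif(500)))     # Gumbel (maximum) sample, mu = 10, beta = 2
  beta.mom <- sd(x) * sqrt(6) / pi        # method of moments scale estimate
  mu.mom <- mean(x) - 0.5772 * beta.mom   # method of moments location estimate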
Software Some general purpose statistical software programs support
at least some of the probability functions for the extreme
value type I distribution.
1.3.6.6.17. Beta Distribution
Probability
Density
Function
The general formula for the probability density function of the beta
distribution is
where p and q are the shape parameters, a and b are the lower and upper
bounds, respectively, of the distribution, and B(p,q) is the beta function.
The beta function has the formula
The case where a = 0 and b = 1 is called the standard beta distribution.
The equation for the standard beta distribution is
Typically we define the general form of a distribution in terms of location
and scale parameters. The beta is different in that we define the general
distribution in terms of the lower and upper bounds. However, the location
and scale parameters can be defined in terms of the lower and upper limits
as follows:
location = a
scale = b - a
Since the general form of probability functions can be expressed in terms
of the standard distribution, all subsequent formulas in this section are
given for the standard form of the function.
The following is the plot of the beta probability density function for four
different values of the shape parameters.
Cumulative
Distribution
Function
The formula for the cumulative distribution function of the beta distribution is also called the incomplete beta function ratio (commonly denoted by I_x) and is defined as
where B is the beta function defined above.
The following is the plot of the beta cumulative distribution function with
the same values of the shape parameters as the pdf plots above.
Percent
Point
Function
The formula for the percent point function of the beta distribution does not
exist in a simple closed form. It is computed numerically.
The following is the plot of the beta percent point function with the same
values of the shape parameters as the pdf plots above.
Other
Probability
Functions
Since the beta distribution is not typically used for reliability applications,
we omit the formulas and plots for the hazard, cumulative hazard, survival,
and inverse survival probability functions.
Common
Statistics
The formulas below are for the case where the lower limit is zero and the
upper limit is one.
Mean
Mode
Range 0 to 1
Standard Deviation
Coefficient of
Variation
Skewness
Parameter
Estimation
First consider the case where a and b are assumed to be known. For this
case, the method of moments estimates are
where x̄ is the sample mean and s² is the sample variance. If a and b are not 0 and 1, respectively, then replace x̄ with (x̄ - a)/(b - a) and s² with s²/(b - a)² in the above equations.
For the case when a and b are known, the maximum likelihood estimates
can be obtained by solving the following set of equations
The maximum likelihood equations for the case when a and b are not
known are given in pages 221-235 of Volume II of Johnson, Kotz, and
Balakrishnan.
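A minimal R sketch of the method of moments estimates for the case a = 0 and b = 1 (not from the Handbook), assuming the usual expressions p̂ = x̄[x̄(1 − x̄)/s² − 1] and q̂ = (1 − x̄)[x̄(1 − x̄)/s² − 1]; the simulated parameters are arbitrary:

  set.seed(11)
  x <- rbeta(400, shape1 = 2, shape2 = 5)
  xbar <- mean(x); s2 <- var(x)
  p.mom <- xbar * (xbar * (1 - xbar) / s2 - 1)        # estimate of p
  q.mom <- (1 - xbar) * (xbar * (1 - xbar) / s2 - 1)  # estimate of q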
Software Most general purpose statistical software programs support at least some of
the probability functions for the beta distribution.
1.3.6.6.18. Binomial Distribution
Probability
Mass
Function
The binomial distribution is used when there are exactly two
mutually exclusive outcomes of a trial. These outcomes are
appropriately labeled "success" and "failure". The binomial
distribution is used to obtain the probability of observing x successes
in N trials, with the probability of success on a single trial denoted
by p. The binomial distribution assumes that p is fixed for all trials.
The formula for the binomial probability mass function is
where
The following is the plot of the binomial probability mass function for four values of p and N = 100.
Cumulative
Distribution
Function
The formula for the binomial cumulative probability function is
The following is the plot of the binomial cumulative distribution
function with the same values of p as the pdf plots above.
Percent
Point
Function
The binomial percent point function does not exist in simple closed
form. It is computed numerically. Note that because this is a discrete
distribution that is only defined for integer values of x, the percent
point function is not smooth in the way the percent point function
typically is for a continuous distribution.
The following is the plot of the binomial percent point function with
the same values of p as the pdf plots above.
Common
Statistics
Mean
Mode
Range 0 to N
Standard Deviation
Coefficient of
Variation
Skewness
Kurtosis
Comments The binomial distribution is probably the most commonly used
discrete distribution.
Parameter
Estimation
The maximum likelihood estimator of p (N is fixed) is the sample proportion of successes, x/N.
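In R the binomial functions are dbinom, pbinom, and qbinom; a short illustration (not from the Handbook; the values of N and p are arbitrary):

  dbinom(3, size = 10, prob = 0.2)    # P(X = 3) for N = 10, p = 0.2
  pbinom(3, size = 10, prob = 0.2)    # P(X <= 3)
  set.seed(12)
  x <- rbinom(1, size = 100, prob = 0.3)
  x / 100                             # maximum likelihood estimate of p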
Software Most general purpose statistical software programs support at least
some of the probability functions for the binomial distribution.
1.3.6.6.19. Poisson Distribution
Probability
Mass
Function
The Poisson distribution is used to model the number of
events occurring within a given time interval.
The formula for the Poisson probability mass function is
where λ is the shape parameter, which indicates the average number of events in the given time interval.
The following is the plot of the Poisson probability mass function for four values of λ.
Cumulative
Distribution
Function
The formula for the Poisson cumulative probability function
is
The following is the plot of the Poisson cumulative
distribution function with the same values of as the pdf
plots above.
Percent
Point
Function
The Poisson percent point function does not exist in simple
closed form. It is computed numerically. Note that because
this is a discrete distribution that is only defined for integer
values of x, the percent point function is not smooth in the
way the percent point function typically is for a continuous
distribution.
The following is the plot of the Poisson percent point
function with the same values of as the pdf plots above.
Common
Statistics
Mean
Mode For non-integer λ, it is the largest integer less than λ. For integer λ, x = λ and x = λ - 1 are both modes.
Range 0 to positive infinity
Standard
Deviation
Coefficient of
Variation
Skewness
Kurtosis
Parameter
Estimation
The maximum likelihood estimator of λ is the sample mean x̄.
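In R the Poisson functions are dpois, ppois, and qpois; a short illustration (not from the Handbook; λ = 4 and the sample size are arbitrary):

  dpois(2, lambda = 4)     # P(X = 2)
  ppois(5, lambda = 4)     # P(X <= 5)
  set.seed(13)
  x <- rpois(60, lambda = 4)
  mean(x)                  # maximum likelihood estimate of lambda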
Software Most general purpose statistical software programs support at
least some of the probability functions for the Poisson
distribution.
1.3.6.7. Tables for Probability Distributions
Tables Several commonly used tables for probability distributions can
be referenced below.
The values from these tables can also be obtained from most
general purpose statistical software programs. Most
introductory statistics textbooks (e.g., Snedecor and Cochran)
contain more extensive tables than are included here. These
tables are included for convenience.
1. Cumulative distribution function for the standard normal
distribution
2. Upper critical values of Student's t-distribution with ν degrees of freedom
3. Upper critical values of the F-distribution with ν1 and ν2 degrees of freedom
4. Upper critical values of the chi-square distribution with ν degrees of freedom
5. Critical values of the t* distribution for testing the output of a linear calibration line at 3 points
6. Upper critical values of the normal PPCC distribution
1.3.6.7.1. Cumulative Distribution Function of
the Standard Normal Distribution
How to
Use This
Table
The table below contains the area under the standard normal
curve from 0 to z. This can be used to compute the cumulative
distribution function values for the standard normal
distribution.
The table utilizes the symmetry of the normal distribution, so what is in fact given is P[0 ≤ x ≤ |a|], where a is the value of interest. This is demonstrated in the
graph below for a = 0.5. The shaded area of the curve
represents the probability that x is between 0 and a.
This can be clarified by a few simple examples.
1. What is the probability that x is less than or equal to
1.53? Look for 1.5 in the X column, go right to the 0.03
column to find the value 0.43699. Now add 0.5 (for the
probability less than zero) to obtain the final result of
0.93699.
2. What is the probability that x is less than or equal to
-1.53? For negative values, use the relationship

    P[x <= -a] = 1 - P[x <= a]

From the first example, this gives 1 - 0.93699 =
0.06301.
3. What is the probability that x is between -1 and 0.5?
Look up the values for 0.5 (0.5 + 0.19146 = 0.69146)
and -1 (1 - (0.5 + 0.34134) = 0.15866). Then subtract
the results (0.69146 - 0.15866) to obtain the result
0.5328.
To use this table with a non-standard normal distribution
(either the location parameter is not 0 or the scale parameter is
not 1), standardize your value by subtracting the mean and
dividing the result by the standard deviation. Then look up the
value for this standardized value.
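As a quick check (an illustrative sketch, not from the Handbook), the three
worked examples and the standardization step can be reproduced with the base
R function pnorm, which returns the full cumulative probability directly; the
mean 100 and standard deviation 10 in the last line are arbitrary example values:

    pnorm(1.53)                       # example 1: P(x <= 1.53)  = 0.93699
    pnorm(-1.53)                      # example 2: P(x <= -1.53) = 0.06301
    pnorm(0.5) - pnorm(-1)            # example 3: P(-1 <= x <= 0.5) = 0.53280
    pnorm(105, mean = 100, sd = 10)   # non-standard normal; same as pnorm((105 - 100)/10)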
A few particularly important numbers derived from the table
below, specifically numbers that are commonly used in
significance tests, are summarized in the following table:
p     0.001    0.005    0.010    0.025    0.050    0.100
Z_p   -3.090   -2.576   -2.326   -1.960   -1.645   -1.282

p     0.999    0.995    0.990    0.975    0.950    0.900
Z_p   +3.090   +2.576   +2.326   +1.960   +1.645   +1.282
These are critical values for the normal distribution.
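The same quantiles can be obtained, for example, from the base R function
qnorm (an illustrative sketch):

    qnorm(0.025)    # -1.960
    qnorm(0.975)    #  1.960
    qnorm(c(0.001, 0.005, 0.010, 0.025, 0.050, 0.100))   # the lower-tail values Z_p above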
Area under the Normal Curve from 0
to X
X 0.00 0.01 0.02 0.03 0.04 0.05 0.06
0.07 0.08 0.09
0.0 0.00000 0.00399 0.00798 0.01197 0.01595 0.01994 0.02392
0.02790 0.03188 0.03586
0.1 0.03983 0.04380 0.04776 0.05172 0.05567 0.05962 0.06356
0.06749 0.07142 0.07535
0.2 0.07926 0.08317 0.08706 0.09095 0.09483 0.09871 0.10257
0.10642 0.11026 0.11409
0.3 0.11791 0.12172 0.12552 0.12930 0.13307 0.13683 0.14058
0.14431 0.14803 0.15173
0.4 0.15542 0.15910 0.16276 0.16640 0.17003 0.17364 0.17724
0.18082 0.18439 0.18793
0.5 0.19146 0.19497 0.19847 0.20194 0.20540 0.20884 0.21226
0.21566 0.21904 0.22240
0.6 0.22575 0.22907 0.23237 0.23565 0.23891 0.24215 0.24537
0.24857 0.25175 0.25490
0.7 0.25804 0.26115 0.26424 0.26730 0.27035 0.27337 0.27637
0.27935 0.28230 0.28524
0.8 0.28814 0.29103 0.29389 0.29673 0.29955 0.30234 0.30511
0.30785 0.31057 0.31327
0.9 0.31594 0.31859 0.32121 0.32381 0.32639 0.32894 0.33147
0.33398 0.33646 0.33891
1.0 0.34134 0.34375 0.34614 0.34849 0.35083 0.35314 0.35543
0.35769 0.35993 0.36214
1.1 0.36433 0.36650 0.36864 0.37076 0.37286 0.37493 0.37698
0.37900 0.38100 0.38298
1.2 0.38493 0.38686 0.38877 0.39065 0.39251 0.39435 0.39617
0.39796 0.39973 0.40147
1.3 0.40320 0.40490 0.40658 0.40824 0.40988 0.41149 0.41308
0.41466 0.41621 0.41774
1.4 0.41924 0.42073 0.42220 0.42364 0.42507 0.42647 0.42785
0.42922 0.43056 0.43189
1.5 0.43319 0.43448 0.43574 0.43699 0.43822 0.43943 0.44062
0.44179 0.44295 0.44408
1.6 0.44520 0.44630 0.44738 0.44845 0.44950 0.45053 0.45154
0.45254 0.45352 0.45449
1.7 0.45543 0.45637 0.45728 0.45818 0.45907 0.45994 0.46080
0.46164 0.46246 0.46327
1.8 0.46407 0.46485 0.46562 0.46638 0.46712 0.46784 0.46856
0.46926 0.46995 0.47062
1.9 0.47128 0.47193 0.47257 0.47320 0.47381 0.47441 0.47500
0.47558 0.47615 0.47670
2.0 0.47725 0.47778 0.47831 0.47882 0.47932 0.47982 0.48030
0.48077 0.48124 0.48169
2.1 0.48214 0.48257 0.48300 0.48341 0.48382 0.48422 0.48461
0.48500 0.48537 0.48574
2.2 0.48610 0.48645 0.48679 0.48713 0.48745 0.48778 0.48809
0.48840 0.48870 0.48899
2.3 0.48928 0.48956 0.48983 0.49010 0.49036 0.49061 0.49086
0.49111 0.49134 0.49158
2.4 0.49180 0.49202 0.49224 0.49245 0.49266 0.49286 0.49305
0.49324 0.49343 0.49361
2.5 0.49379 0.49396 0.49413 0.49430 0.49446 0.49461 0.49477
0.49492 0.49506 0.49520
2.6 0.49534 0.49547 0.49560 0.49573 0.49585 0.49598 0.49609
0.49621 0.49632 0.49643
2.7 0.49653 0.49664 0.49674 0.49683 0.49693 0.49702 0.49711
0.49720 0.49728 0.49736
2.8 0.49744 0.49752 0.49760 0.49767 0.49774 0.49781 0.49788
0.49795 0.49801 0.49807
2.9 0.49813 0.49819 0.49825 0.49831 0.49836 0.49841 0.49846
0.49851 0.49856 0.49861
3.0 0.49865 0.49869 0.49874 0.49878 0.49882 0.49886 0.49889
0.49893 0.49896 0.49900
3.1 0.49903 0.49906 0.49910 0.49913 0.49916 0.49918 0.49921
0.49924 0.49926 0.49929
3.2 0.49931 0.49934 0.49936 0.49938 0.49940 0.49942 0.49944
0.49946 0.49948 0.49950
3.3 0.49952 0.49953 0.49955 0.49957 0.49958 0.49960 0.49961
0.49962 0.49964 0.49965
3.4 0.49966 0.49968 0.49969 0.49970 0.49971 0.49972 0.49973
0.49974 0.49975 0.49976
3.5 0.49977 0.49978 0.49978 0.49979 0.49980 0.49981 0.49981
0.49982 0.49983 0.49983
3.6 0.49984 0.49985 0.49985 0.49986 0.49986 0.49987 0.49987
0.49988 0.49988 0.49989
3.7 0.49989 0.49990 0.49990 0.49990 0.49991 0.49991 0.49992
0.49992 0.49992 0.49992
3.8 0.49993 0.49993 0.49993 0.49994 0.49994 0.49994 0.49994
0.49995 0.49995 0.49995
3.9 0.49995 0.49995 0.49996 0.49996 0.49996 0.49996 0.49996
0.49996 0.49997 0.49997
4.0 0.49997 0.49997 0.49997 0.49997 0.49997 0.49997 0.49998
0.49998 0.49998 0.49998
1. Exploratory Data Analysis
1.3. EDA Techniques
1.3.6. Probability Distributions
1.3.6.7. Tables for Probability Distributions
1.3.6.7.2. Critical Values of the Student's t
Distribution
How to
Use This
Table
This table contains critical values of the Student's t
distribution computed using the cumulative distribution
function. The t distribution is symmetric so that

    t_{1-α, ν} = -t_{α, ν}.

The t table can be used for both one-sided (lower and upper)
and two-sided tests using the appropriate value of α.
The significance level, α, is demonstrated in the graph below,
which displays a t distribution with 10 degrees of freedom.
The most commonly used significance level is α = 0.05. For a
two-sided test, we compute 1 - α/2, or 1 - 0.05/2 = 0.975 when
α = 0.05. If the absolute value of the test statistic is greater
than the critical value t_{0.975, ν}, then we reject the null
hypothesis. Due to the symmetry of the t distribution, we only
tabulate the positive critical values in the table below.
Given a specified value for α:
1. For a two-sided test, find the column corresponding to
1 - α/2 and reject the null hypothesis if the absolute value
of the test statistic is greater than the value of t_{1-α/2, ν} in
the table below.
2. For an upper, one-sided test, find the column
corresponding to 1 - α and reject the null hypothesis if the
test statistic is greater than the table value.
3. For a lower, one-sided test, find the column
corresponding to 1 - α and reject the null hypothesis if the
test statistic is less than the negative of the table value.
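As an illustrative sketch (not part of the Handbook), the same decision rules
can be evaluated with the base R function qt; ν = 10 and α = 0.05 below are
example values:

    nu <- 10; alpha <- 0.05
    qt(1 - alpha/2, nu)   #  2.228: two-sided critical value, compare |t| to this
    qt(1 - alpha, nu)     #  1.812: upper one-sided critical value
    -qt(1 - alpha, nu)    # -1.812: lower one-sided critical value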
Critical values of Student's t distribution with ν degrees of
freedom

Probability less than the critical value (t_{1-α, ν})

  ν       0.90      0.95      0.975     0.99      0.995     0.999
1. 3.078 6.314 12.706 31.821 63.657
318.313
2. 1.886 2.920 4.303 6.965 9.925
22.327
3. 1.638 2.353 3.182 4.541 5.841
10.215
4. 1.533 2.132 2.776 3.747 4.604
7.173
5. 1.476 2.015 2.571 3.365 4.032
5.893
6. 1.440 1.943 2.447 3.143 3.707
5.208
7. 1.415 1.895 2.365 2.998 3.499
4.782
8. 1.397 1.860 2.306 2.896 3.355
4.499
9. 1.383 1.833 2.262 2.821 3.250
4.296
10. 1.372 1.812 2.228 2.764 3.169
4.143
11. 1.363 1.796 2.201 2.718 3.106
4.024
12. 1.356 1.782 2.179 2.681 3.055
3.929
13. 1.350 1.771 2.160 2.650 3.012
3.852
14. 1.345 1.761 2.145 2.624 2.977
3.787
15. 1.341 1.753 2.131 2.602 2.947
3.733
16. 1.337 1.746 2.120 2.583 2.921
3.686
17. 1.333 1.740 2.110 2.567 2.898
3.646
18. 1.330 1.734 2.101 2.552 2.878
3.610
19. 1.328 1.729 2.093 2.539 2.861
3.579
20. 1.325 1.725 2.086 2.528 2.845
3.552
21. 1.323 1.721 2.080 2.518 2.831
3.527
22. 1.321 1.717 2.074 2.508 2.819
3.505
23. 1.319 1.714 2.069 2.500 2.807
3.485
24. 1.318 1.711 2.064 2.492 2.797
3.467
25. 1.316 1.708 2.060 2.485 2.787
3.450
26. 1.315 1.706 2.056 2.479 2.779
3.435
27. 1.314 1.703 2.052 2.473 2.771
3.421
28. 1.313 1.701 2.048 2.467 2.763
3.408
29. 1.311 1.699 2.045 2.462 2.756
3.396
30. 1.310 1.697 2.042 2.457 2.750
3.385
31. 1.309 1.696 2.040 2.453 2.744
3.375
32. 1.309 1.694 2.037 2.449 2.738
3.365
33. 1.308 1.692 2.035 2.445 2.733
3.356
34. 1.307 1.691 2.032 2.441 2.728
3.348
35. 1.306 1.690 2.030 2.438 2.724
3.340
36. 1.306 1.688 2.028 2.434 2.719
3.333
37. 1.305 1.687 2.026 2.431 2.715
3.326
38. 1.304 1.686 2.024 2.429 2.712
3.319
39. 1.304 1.685 2.023 2.426 2.708
3.313
40. 1.303 1.684 2.021 2.423 2.704
3.307
41. 1.303 1.683 2.020 2.421 2.701
3.301
42. 1.302 1.682 2.018 2.418 2.698
3.296
43. 1.302 1.681 2.017 2.416 2.695
3.291
44. 1.301 1.680 2.015 2.414 2.692
3.286
45. 1.301 1.679 2.014 2.412 2.690
3.281
46. 1.300 1.679 2.013 2.410 2.687
3.277
47. 1.300 1.678 2.012 2.408 2.685
3.273
48. 1.299 1.677 2.011 2.407 2.682
3.269
49. 1.299 1.677 2.010 2.405 2.680
3.265
50. 1.299 1.676 2.009 2.403 2.678
3.261
51. 1.298 1.675 2.008 2.402 2.676
3.258
52. 1.298 1.675 2.007 2.400 2.674
3.255
53. 1.298 1.674 2.006 2.399 2.672
3.251
54. 1.297 1.674 2.005 2.397 2.670
3.248
55. 1.297 1.673 2.004 2.396 2.668
3.245
56. 1.297 1.673 2.003 2.395 2.667
3.242
57. 1.297 1.672 2.002 2.394 2.665
3.239
58. 1.296 1.672 2.002 2.392 2.663
3.237
59. 1.296 1.671 2.001 2.391 2.662
3.234
60. 1.296 1.671 2.000 2.390 2.660
3.232
61. 1.296 1.670 2.000 2.389 2.659
3.229
62. 1.295 1.670 1.999 2.388 2.657
3.227
63. 1.295 1.669 1.998 2.387 2.656
3.225
64. 1.295 1.669 1.998 2.386 2.655
3.223
65. 1.295 1.669 1.997 2.385 2.654
3.220
66. 1.295 1.668 1.997 2.384 2.652
3.218
67. 1.294 1.668 1.996 2.383 2.651
3.216
68. 1.294 1.668 1.995 2.382 2.650
3.214
69. 1.294 1.667 1.995 2.382 2.649
3.213
70. 1.294 1.667 1.994 2.381 2.648
3.211
71. 1.294 1.667 1.994 2.380 2.647
3.209
72. 1.293 1.666 1.993 2.379 2.646
3.207
73. 1.293 1.666 1.993 2.379 2.645
3.206
74. 1.293 1.666 1.993 2.378 2.644
3.204
75. 1.293 1.665 1.992 2.377 2.643
3.202
76. 1.293 1.665 1.992 2.376 2.642
3.201
77. 1.293 1.665 1.991 2.376 2.641
3.199
78. 1.292 1.665 1.991 2.375 2.640
3.198
79. 1.292 1.664 1.990 2.374 2.640
3.197
80. 1.292 1.664 1.990 2.374 2.639
3.195
81. 1.292 1.664 1.990 2.373 2.638
3.194
82. 1.292 1.664 1.989 2.373 2.637
3.193
83. 1.292 1.663 1.989 2.372 2.636
3.191
84. 1.292 1.663 1.989 2.372 2.636
3.190
85. 1.292 1.663 1.988 2.371 2.635
3.189
86. 1.291 1.663 1.988 2.370 2.634
3.188
87. 1.291 1.663 1.988 2.370 2.634
3.187
88. 1.291 1.662 1.987 2.369 2.633
3.185
89. 1.291 1.662 1.987 2.369 2.632
3.184
90. 1.291 1.662 1.987 2.368 2.632
3.183
91. 1.291 1.662 1.986 2.368 2.631
3.182
92. 1.291 1.662 1.986 2.368 2.630
3.181
93. 1.291 1.661 1.986 2.367 2.630
3.180
94. 1.291 1.661 1.986 2.367 2.629
3.179
95. 1.291 1.661 1.985 2.366 2.629
3.178
96. 1.290 1.661 1.985 2.366 2.628
3.177
97. 1.290 1.661 1.985 2.365 2.627
3.176
98. 1.290 1.661 1.984 2.365 2.627
3.175
99. 1.290 1.660 1.984 2.365 2.626
3.175
100. 1.290 1.660 1.984 2.364 2.626
3.174
∞    1.282 1.645 1.960 2.326 2.576
3.090
1. Exploratory Data Analysis
1.3. EDA Techniques
1.3.6. Probability Distributions
1.3.6.7. Tables for Probability Distributions
1.3.6.7.3. Upper Critical Values of the F
Distribution
How to
Use This
Table
This table contains the upper critical values of the F
distribution. This table is used for one-sided F tests at the α =
0.05, 0.10, and 0.01 significance levels.
More specifically, a test statistic is computed with ν1 and ν2
degrees of freedom, and the result is compared to this table.
For a one-sided test, the null hypothesis is rejected when the
test statistic is greater than the tabled value. This is
demonstrated with the graph of an F distribution with ν1 = 10
and ν2 = 10. The shaded area of the graph indicates the
rejection region at the α significance level. Since this is a one-
sided test, we have probability α in the upper tail of exceeding
the critical value and zero in the lower tail. Because the F
distribution is asymmetric, a two-sided test requires a set of
tables (not included here) that contain the rejection regions for
both the lower and upper tails.
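For reference (an illustrative sketch, not part of the Handbook), the tabled
upper critical values can be reproduced with the base R function qf; for
example, with ν1 = 10 and ν2 = 10:

    qf(0.95, df1 = 10, df2 = 10)   # 2.978: 5% upper critical value
    qf(0.90, df1 = 10, df2 = 10)   # 2.323: 10% upper critical value
    qf(0.99, df1 = 10, df2 = 10)   # 4.849: 1% upper critical value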
Contents The following tables for ν2 from 1 to 100 are included:
1. One sided, 5% significance level, ν1 = 1 - 10
2. One sided, 5% significance level, ν1 = 11 - 20
3. One sided, 10% significance level, ν1 = 1 - 10
4. One sided, 10% significance level, ν1 = 11 - 20
5. One sided, 1% significance level, ν1 = 1 - 10
6. One sided, 1% significance level, ν1 = 11 - 20
Upper critical values of the F distribution
for ν1 numerator degrees of freedom and ν2 denominator
degrees of freedom

5% significance level

ν2 \ ν1        1        2        3        4        5
               6        7        8        9       10
1 161.448 199.500 215.707 224.583 230.162
233.986 236.768 238.882 240.543 241.882
2 18.513 19.000 19.164 19.247 19.296
19.330 19.353 19.371 19.385 19.396
3 10.128 9.552 9.277 9.117 9.013
8.941 8.887 8.845 8.812 8.786
4 7.709 6.944 6.591 6.388 6.256
6.163 6.094 6.041 5.999 5.964
5 6.608 5.786 5.409 5.192 5.050
4.950 4.876 4.818 4.772 4.735
6 5.987 5.143 4.757 4.534 4.387
4.284 4.207 4.147 4.099 4.060
7 5.591 4.737 4.347 4.120 3.972
3.866 3.787 3.726 3.677 3.637
8 5.318 4.459 4.066 3.838 3.687
3.581 3.500 3.438 3.388 3.347
9 5.117 4.256 3.863 3.633 3.482
3.374 3.293 3.230 3.179 3.137
10 4.965 4.103 3.708 3.478 3.326
3.217 3.135 3.072 3.020 2.978
11 4.844 3.982 3.587 3.357 3.204
3.095 3.012 2.948 2.896 2.854
12 4.747 3.885 3.490 3.259 3.106
2.996 2.913 2.849 2.796 2.753
13 4.667 3.806 3.411 3.179 3.025
2.915 2.832 2.767 2.714 2.671
14 4.600 3.739 3.344 3.112 2.958
2.848 2.764 2.699 2.646 2.602
15 4.543 3.682 3.287 3.056 2.901
2.790 2.707 2.641 2.588 2.544
16 4.494 3.634 3.239 3.007 2.852
2.741 2.657 2.591 2.538 2.494
17 4.451 3.592 3.197 2.965 2.810
2.699 2.614 2.548 2.494 2.450
18 4.414 3.555 3.160 2.928 2.773
2.661 2.577 2.510 2.456 2.412
19 4.381 3.522 3.127 2.895 2.740
2.628 2.544 2.477 2.423 2.378
20 4.351 3.493 3.098 2.866 2.711
2.599 2.514 2.447 2.393 2.348
21 4.325 3.467 3.072 2.840 2.685
2.573 2.488 2.420 2.366 2.321
22 4.301 3.443 3.049 2.817 2.661
2.549 2.464 2.397 2.342 2.297
23 4.279 3.422 3.028 2.796 2.640
2.528 2.442 2.375 2.320 2.275
24 4.260 3.403 3.009 2.776 2.621
2.508 2.423 2.355 2.300 2.255
25 4.242 3.385 2.991 2.759 2.603
2.490 2.405 2.337 2.282 2.236
26 4.225 3.369 2.975 2.743 2.587
2.474 2.388 2.321 2.265 2.220
27 4.210 3.354 2.960 2.728 2.572
2.459 2.373 2.305 2.250 2.204
28 4.196 3.340 2.947 2.714 2.558
2.445 2.359 2.291 2.236 2.190
29 4.183 3.328 2.934 2.701 2.545
2.432 2.346 2.278 2.223 2.177
30 4.171 3.316 2.922 2.690 2.534
2.421 2.334 2.266 2.211 2.165
31 4.160 3.305 2.911 2.679 2.523
2.409 2.323 2.255 2.199 2.153
32 4.149 3.295 2.901 2.668 2.512
2.399 2.313 2.244 2.189 2.142
33 4.139 3.285 2.892 2.659 2.503
2.389 2.303 2.235 2.179 2.133
34 4.130 3.276 2.883 2.650 2.494
2.380 2.294 2.225 2.170 2.123
35 4.121 3.267 2.874 2.641 2.485
2.372 2.285 2.217 2.161 2.114
36 4.113 3.259 2.866 2.634 2.477
2.364 2.277 2.209 2.153 2.106
37 4.105 3.252 2.859 2.626 2.470
2.356 2.270 2.201 2.145 2.098
38 4.098 3.245 2.852 2.619 2.463
2.349 2.262 2.194 2.138 2.091
39 4.091 3.238 2.845 2.612 2.456
2.342 2.255 2.187 2.131 2.084
40 4.085 3.232 2.839 2.606 2.449
2.336 2.249 2.180 2.124 2.077
41 4.079 3.226 2.833 2.600 2.443
2.330 2.243 2.174 2.118 2.071
42 4.073 3.220 2.827 2.594 2.438
2.324 2.237 2.168 2.112 2.065
43 4.067 3.214 2.822 2.589 2.432
2.318 2.232 2.163 2.106 2.059
44 4.062 3.209 2.816 2.584 2.427
2.313 2.226 2.157 2.101 2.054
45 4.057 3.204 2.812 2.579 2.422
2.308 2.221 2.152 2.096 2.049
46 4.052 3.200 2.807 2.574 2.417
2.304 2.216 2.147 2.091 2.044
47 4.047 3.195 2.802 2.570 2.413
2.299 2.212 2.143 2.086 2.039
48 4.043 3.191 2.798 2.565 2.409
2.295 2.207 2.138 2.082 2.035
49 4.038 3.187 2.794 2.561 2.404
2.290 2.203 2.134 2.077 2.030
50 4.034 3.183 2.790 2.557 2.400
2.286 2.199 2.130 2.073 2.026
51 4.030 3.179 2.786 2.553 2.397
2.283 2.195 2.126 2.069 2.022
52 4.027 3.175 2.783 2.550 2.393
2.279 2.192 2.122 2.066 2.018
53 4.023 3.172 2.779 2.546 2.389
2.275 2.188 2.119 2.062 2.015
54 4.020 3.168 2.776 2.543 2.386
2.272 2.185 2.115 2.059 2.011
55 4.016 3.165 2.773 2.540 2.383
2.269 2.181 2.112 2.055 2.008
56 4.013 3.162 2.769 2.537 2.380
2.266 2.178 2.109 2.052 2.005
57 4.010 3.159 2.766 2.534 2.377
2.263 2.175 2.106 2.049 2.001
58 4.007 3.156 2.764 2.531 2.374
2.260 2.172 2.103 2.046 1.998
59 4.004 3.153 2.761 2.528 2.371
2.257 2.169 2.100 2.043 1.995
60 4.001 3.150 2.758 2.525 2.368
2.254 2.167 2.097 2.040 1.993
61 3.998 3.148 2.755 2.523 2.366
2.251 2.164 2.094 2.037 1.990
62 3.996 3.145 2.753 2.520 2.363
2.249 2.161 2.092 2.035 1.987
63 3.993 3.143 2.751 2.518 2.361
2.246 2.159 2.089 2.032 1.985
64 3.991 3.140 2.748 2.515 2.358
2.244 2.156 2.087 2.030 1.982
65 3.989 3.138 2.746 2.513 2.356
2.242 2.154 2.084 2.027 1.980
66 3.986 3.136 2.744 2.511 2.354
2.239 2.152 2.082 2.025 1.977
67 3.984 3.134 2.742 2.509 2.352
2.237 2.150 2.080 2.023 1.975
68 3.982 3.132 2.740 2.507 2.350
2.235 2.148 2.078 2.021 1.973
69 3.980 3.130 2.737 2.505 2.348
2.233 2.145 2.076 2.019 1.971
70 3.978 3.128 2.736 2.503 2.346
2.231 2.143 2.074 2.017 1.969
71 3.976 3.126 2.734 2.501 2.344
2.229 2.142 2.072 2.015 1.967
72 3.974 3.124 2.732 2.499 2.342
2.227 2.140 2.070 2.013 1.965
73 3.972 3.122 2.730 2.497 2.340
2.226 2.138 2.068 2.011 1.963
74 3.970 3.120 2.728 2.495 2.338
2.224 2.136 2.066 2.009 1.961
75 3.968 3.119 2.727 2.494 2.337
2.222 2.134 2.064 2.007 1.959
76 3.967 3.117 2.725 2.492 2.335
2.220 2.133 2.063 2.006 1.958
77 3.965 3.115 2.723 2.490 2.333
2.219 2.131 2.061 2.004 1.956
78 3.963 3.114 2.722 2.489 2.332
2.217 2.129 2.059 2.002 1.954
79 3.962 3.112 2.720 2.487 2.330
2.216 2.128 2.058 2.001 1.953
80 3.960 3.111 2.719 2.486 2.329
2.214 2.126 2.056 1.999 1.951
81 3.959 3.109 2.717 2.484 2.327
2.213 2.125 2.055 1.998 1.950
82 3.957 3.108 2.716 2.483 2.326
2.211 2.123 2.053 1.996 1.948
83 3.956 3.107 2.715 2.482 2.324
2.210 2.122 2.052 1.995 1.947
84 3.955 3.105 2.713 2.480 2.323
2.209 2.121 2.051 1.993 1.945
85 3.953 3.104 2.712 2.479 2.322
2.207 2.119 2.049 1.992 1.944
86 3.952 3.103 2.711 2.478 2.321
2.206 2.118 2.048 1.991 1.943
87 3.951 3.101 2.709 2.476 2.319
2.205 2.117 2.047 1.989 1.941
88 3.949 3.100 2.708 2.475 2.318
2.203 2.115 2.045 1.988 1.940
89 3.948 3.099 2.707 2.474 2.317
2.202 2.114 2.044 1.987 1.939
90 3.947 3.098 2.706 2.473 2.316
2.201 2.113 2.043 1.986 1.938
91 3.946 3.097 2.705 2.472 2.315
2.200 2.112 2.042 1.984 1.936
92 3.945 3.095 2.704 2.471 2.313
2.199 2.111 2.041 1.983 1.935
93 3.943 3.094 2.703 2.470 2.312
2.198 2.110 2.040 1.982 1.934
94 3.942 3.093 2.701 2.469 2.311
2.197 2.109 2.038 1.981 1.933
95 3.941 3.092 2.700 2.467 2.310
2.196 2.108 2.037 1.980 1.932
96 3.940 3.091 2.699 2.466 2.309
2.195 2.106 2.036 1.979 1.931
97 3.939 3.090 2.698 2.465 2.308
2.194 2.105 2.035 1.978 1.930
98 3.938 3.089 2.697 2.465 2.307
2.193 2.104 2.034 1.977 1.929
99 3.937 3.088 2.696 2.464 2.306
2.192 2.103 2.033 1.976 1.928
100 3.936 3.087 2.696 2.463 2.305
2.191 2.103 2.032 1.975 1.927
ν2 \ ν1       11       12       13       14       15
              16       17       18       19       20
1 242.983 243.906 244.690 245.364 245.950
246.464 246.918 247.323 247.686 248.013
2 19.405 19.413 19.419 19.424 19.429
19.433 19.437 19.440 19.443 19.446
3 8.763 8.745 8.729 8.715 8.703
8.692 8.683 8.675 8.667 8.660
4 5.936 5.912 5.891 5.873 5.858
5.844 5.832 5.821 5.811 5.803
5 4.704 4.678 4.655 4.636 4.619
4.604 4.590 4.579 4.568 4.558
6 4.027 4.000 3.976 3.956 3.938
3.922 3.908 3.896 3.884 3.874
7 3.603 3.575 3.550 3.529 3.511
3.494 3.480 3.467 3.455 3.445
8 3.313 3.284 3.259 3.237 3.218
3.202 3.187 3.173 3.161 3.150
9 3.102 3.073 3.048 3.025 3.006
2.989 2.974 2.960 2.948 2.936
10 2.943 2.913 2.887 2.865 2.845
2.828 2.812 2.798 2.785 2.774
11 2.818 2.788 2.761 2.739 2.719
2.701 2.685 2.671 2.658 2.646
12 2.717 2.687 2.660 2.637 2.617
2.599 2.583 2.568 2.555 2.544
13 2.635 2.604 2.577 2.554 2.533
2.515 2.499 2.484 2.471 2.459
14 2.565 2.534 2.507 2.484 2.463
2.445 2.428 2.413 2.400 2.388
15 2.507 2.475 2.448 2.424 2.403
2.385 2.368 2.353 2.340 2.328
16 2.456 2.425 2.397 2.373 2.352
2.333 2.317 2.302 2.288 2.276
17 2.413 2.381 2.353 2.329 2.308
2.289 2.272 2.257 2.243 2.230
18 2.374 2.342 2.314 2.290 2.269
2.250 2.233 2.217 2.203 2.191
19 2.340 2.308 2.280 2.256 2.234
2.215 2.198 2.182 2.168 2.155
20 2.310 2.278 2.250 2.225 2.203
2.184 2.167 2.151 2.137 2.124
21 2.283 2.250 2.222 2.197 2.176
2.156 2.139 2.123 2.109 2.096
22 2.259 2.226 2.198 2.173 2.151
2.131 2.114 2.098 2.084 2.071
23 2.236 2.204 2.175 2.150 2.128
2.109 2.091 2.075 2.061 2.048
24 2.216 2.183 2.155 2.130 2.108
2.088 2.070 2.054 2.040 2.027
25 2.198 2.165 2.136 2.111 2.089
2.069 2.051 2.035 2.021 2.007
26 2.181 2.148 2.119 2.094 2.072
2.052 2.034 2.018 2.003 1.990
27 2.166 2.132 2.103 2.078 2.056
2.036 2.018 2.002 1.987 1.974
28 2.151 2.118 2.089 2.064 2.041
2.021 2.003 1.987 1.972 1.959
29 2.138 2.104 2.075 2.050 2.027
2.007 1.989 1.973 1.958 1.945
30 2.126 2.092 2.063 2.037 2.015
1.995 1.976 1.960 1.945 1.932
31 2.114 2.080 2.051 2.026 2.003
1.983 1.965 1.948 1.933 1.920
32 2.103 2.070 2.040 2.015 1.992
1.972 1.953 1.937 1.922 1.908
33 2.093 2.060 2.030 2.004 1.982
1.961 1.943 1.926 1.911 1.898
34 2.084 2.050 2.021 1.995 1.972
1.952 1.933 1.917 1.902 1.888
35 2.075 2.041 2.012 1.986 1.963
1.942 1.924 1.907 1.892 1.878
36 2.067 2.033 2.003 1.977 1.954
1.934 1.915 1.899 1.883 1.870
37 2.059 2.025 1.995 1.969 1.946
1.926 1.907 1.890 1.875 1.861
38 2.051 2.017 1.988 1.962 1.939
1.918 1.899 1.883 1.867 1.853
39 2.044 2.010 1.981 1.954 1.931
1.911 1.892 1.875 1.860 1.846
40 2.038 2.003 1.974 1.948 1.924
1.904 1.885 1.868 1.853 1.839
41 2.031 1.997 1.967 1.941 1.918
1.897 1.879 1.862 1.846 1.832
42 2.025 1.991 1.961 1.935 1.912
1.891 1.872 1.855 1.840 1.826
43 2.020 1.985 1.955 1.929 1.906
1.885 1.866 1.849 1.834 1.820
44 2.014 1.980 1.950 1.924 1.900
1.879 1.861 1.844 1.828 1.814
45 2.009 1.974 1.945 1.918 1.895
1.874 1.855 1.838 1.823 1.808
46 2.004 1.969 1.940 1.913 1.890
1.869 1.850 1.833 1.817 1.803
47 1.999 1.965 1.935 1.908 1.885
1.864 1.845 1.828 1.812 1.798
48 1.995 1.960 1.930 1.904 1.880
1.859 1.840 1.823 1.807 1.793
49 1.990 1.956 1.926 1.899 1.876
1.855 1.836 1.819 1.803 1.789
50 1.986 1.952 1.921 1.895 1.871
1.850 1.831 1.814 1.798 1.784
51 1.982 1.947 1.917 1.891 1.867
1.846 1.827 1.810 1.794 1.780
52 1.978 1.944 1.913 1.887 1.863
1.842 1.823 1.806 1.790 1.776
53 1.975 1.940 1.910 1.883 1.859
1.838 1.819 1.802 1.786 1.772
54 1.971 1.936 1.906 1.879 1.856
1.835 1.816 1.798 1.782 1.768
55 1.968 1.933 1.903 1.876 1.852
1.831 1.812 1.795 1.779 1.764
56 1.964 1.930 1.899 1.873 1.849
1.828 1.809 1.791 1.775 1.761
57 1.961 1.926 1.896 1.869 1.846
1.824 1.805 1.788 1.772 1.757
58 1.958 1.923 1.893 1.866 1.842
1.821 1.802 1.785 1.769 1.754
59 1.955 1.920 1.890 1.863 1.839
1.818 1.799 1.781 1.766 1.751
60 1.952 1.917 1.887 1.860 1.836
1.815 1.796 1.778 1.763 1.748
61 1.949 1.915 1.884 1.857 1.834
1.812 1.793 1.776 1.760 1.745
62 1.947 1.912 1.882 1.855 1.831
1.809 1.790 1.773 1.757 1.742
63 1.944 1.909 1.879 1.852 1.828
1.807 1.787 1.770 1.754 1.739
64 1.942 1.907 1.876 1.849 1.826
1.804 1.785 1.767 1.751 1.737
65 1.939 1.904 1.874 1.847 1.823
1.802 1.782 1.765 1.749 1.734
66 1.937 1.902 1.871 1.845 1.821
1.799 1.780 1.762 1.746 1.732
67 1.935 1.900 1.869 1.842 1.818
1.797 1.777 1.760 1.744 1.729
68 1.932 1.897 1.867 1.840 1.816
1.795 1.775 1.758 1.742 1.727
69 1.930 1.895 1.865 1.838 1.814
1.792 1.773 1.755 1.739 1.725
70 1.928 1.893 1.863 1.836 1.812
1.790 1.771 1.753 1.737 1.722
71 1.926 1.891 1.861 1.834 1.810
1.788 1.769 1.751 1.735 1.720
72 1.924 1.889 1.859 1.832 1.808
1.786 1.767 1.749 1.733 1.718
73 1.922 1.887 1.857 1.830 1.806
1.784 1.765 1.747 1.731 1.716
74 1.921 1.885 1.855 1.828 1.804
1.782 1.763 1.745 1.729 1.714
75 1.919 1.884 1.853 1.826 1.802
1.780 1.761 1.743 1.727 1.712
76 1.917 1.882 1.851 1.824 1.800
1.778 1.759 1.741 1.725 1.710
77 1.915 1.880 1.849 1.822 1.798
1.777 1.757 1.739 1.723 1.708
78 1.914 1.878 1.848 1.821 1.797
1.775 1.755 1.738 1.721 1.707
79 1.912 1.877 1.846 1.819 1.795
1.773 1.754 1.736 1.720 1.705
80 1.910 1.875 1.845 1.817 1.793
1.772 1.752 1.734 1.718 1.703
81 1.909 1.874 1.843 1.816 1.792
1.770 1.750 1.733 1.716 1.702
82 1.907 1.872 1.841 1.814 1.790
1.768 1.749 1.731 1.715 1.700
83 1.906 1.871 1.840 1.813 1.789
1.767 1.747 1.729 1.713 1.698
84 1.905 1.869 1.838 1.811 1.787
1.765 1.746 1.728 1.712 1.697
85 1.903 1.868 1.837 1.810 1.786
1.764 1.744 1.726 1.710 1.695
86 1.902 1.867 1.836 1.808 1.784
1.762 1.743 1.725 1.709 1.694
87 1.900 1.865 1.834 1.807 1.783
1.761 1.741 1.724 1.707 1.692
88 1.899 1.864 1.833 1.806 1.782
1.760 1.740 1.722 1.706 1.691
89 1.898 1.863 1.832 1.804 1.780
1.758 1.739 1.721 1.705 1.690
90 1.897 1.861 1.830 1.803 1.779
1.757 1.737 1.720 1.703 1.688
91 1.895 1.860 1.829 1.802 1.778
1.756 1.736 1.718 1.702 1.687
92 1.894 1.859 1.828 1.801 1.776
1.755 1.735 1.717 1.701 1.686
93 1.893 1.858 1.827 1.800 1.775
1.753 1.734 1.716 1.699 1.684
94 1.892 1.857 1.826 1.798 1.774
1.752 1.733 1.715 1.698 1.683
95 1.891 1.856 1.825 1.797 1.773
1.751 1.731 1.713 1.697 1.682
96 1.890 1.854 1.823 1.796 1.772
1.750 1.730 1.712 1.696 1.681
97 1.889 1.853 1.822 1.795 1.771
1.749 1.729 1.711 1.695 1.680
98 1.888 1.852 1.821 1.794 1.770
1.748 1.728 1.710 1.694 1.679
99 1.887 1.851 1.820 1.793 1.769
1.747 1.727 1.709 1.693 1.678
100 1.886 1.850 1.819 1.792 1.768
1.746 1.726 1.708 1.691 1.676
Upper critical values of the F distribution
for ν1 numerator degrees of freedom and ν2 denominator
degrees of freedom

10% significance level

ν2 \ ν1        1        2        3        4        5
               6        7        8        9       10
1 39.863 49.500 53.593 55.833 57.240
58.204 58.906 59.439 59.858 60.195
2 8.526 9.000 9.162 9.243 9.293
9.326 9.349 9.367 9.381 9.392
3 5.538 5.462 5.391 5.343 5.309
5.285 5.266 5.252 5.240 5.230
4 4.545 4.325 4.191 4.107 4.051
4.010 3.979 3.955 3.936 3.920
5 4.060 3.780 3.619 3.520 3.453
3.405 3.368 3.339 3.316 3.297
6 3.776 3.463 3.289 3.181 3.108
3.055 3.014 2.983 2.958 2.937
7 3.589 3.257 3.074 2.961 2.883
2.827 2.785 2.752 2.725 2.703
8 3.458 3.113 2.924 2.806 2.726
2.668 2.624 2.589 2.561 2.538
9 3.360 3.006 2.813 2.693 2.611
2.551 2.505 2.469 2.440 2.416
10 3.285 2.924 2.728 2.605 2.522
2.461 2.414 2.377 2.347 2.323
11 3.225 2.860 2.660 2.536 2.451
2.389 2.342 2.304 2.274 2.248
12 3.177 2.807 2.606 2.480 2.394
2.331 2.283 2.245 2.214 2.188
13 3.136 2.763 2.560 2.434 2.347
2.283 2.234 2.195 2.164 2.138
14 3.102 2.726 2.522 2.395 2.307
2.243 2.193 2.154 2.122 2.095
15 3.073 2.695 2.490 2.361 2.273
2.208 2.158 2.119 2.086 2.059
16 3.048 2.668 2.462 2.333 2.244
2.178 2.128 2.088 2.055 2.028
17 3.026 2.645 2.437 2.308 2.218
2.152 2.102 2.061 2.028 2.001
18 3.007 2.624 2.416 2.286 2.196
2.130 2.079 2.038 2.005 1.977
19 2.990 2.606 2.397 2.266 2.176
2.109 2.058 2.017 1.984 1.956
20 2.975 2.589 2.380 2.249 2.158
2.091 2.040 1.999 1.965 1.937
21 2.961 2.575 2.365 2.233 2.142
2.075 2.023 1.982 1.948 1.920
22 2.949 2.561 2.351 2.219 2.128
2.060 2.008 1.967 1.933 1.904
23 2.937 2.549 2.339 2.207 2.115
2.047 1.995 1.953 1.919 1.890
24 2.927 2.538 2.327 2.195 2.103
2.035 1.983 1.941 1.906 1.877
25 2.918 2.528 2.317 2.184 2.092
2.024 1.971 1.929 1.895 1.866
26 2.909 2.519 2.307 2.174 2.082
2.014 1.961 1.919 1.884 1.855
27 2.901 2.511 2.299 2.165 2.073
2.005 1.952 1.909 1.874 1.845
28 2.894 2.503 2.291 2.157 2.064
1.996 1.943 1.900 1.865 1.836
29 2.887 2.495 2.283 2.149 2.057
1.988 1.935 1.892 1.857 1.827
30 2.881 2.489 2.276 2.142 2.049
1.980 1.927 1.884 1.849 1.819
31 2.875 2.482 2.270 2.136 2.042
1.973 1.920 1.877 1.842 1.812
32 2.869 2.477 2.263 2.129 2.036
1.967 1.913 1.870 1.835 1.805
33 2.864 2.471 2.258 2.123 2.030
1.961 1.907 1.864 1.828 1.799
34 2.859 2.466 2.252 2.118 2.024
1.955 1.901 1.858 1.822 1.793
35 2.855 2.461 2.247 2.113 2.019
1.950 1.896 1.852 1.817 1.787
36 2.850 2.456 2.243 2.108 2.014
1.945 1.891 1.847 1.811 1.781
37 2.846 2.452 2.238 2.103 2.009
1.940 1.886 1.842 1.806 1.776
38 2.842 2.448 2.234 2.099 2.005
1.935 1.881 1.838 1.802 1.772
39 2.839 2.444 2.230 2.095 2.001
1.931 1.877 1.833 1.797 1.767
40 2.835 2.440 2.226 2.091 1.997
1.927 1.873 1.829 1.793 1.763
41 2.832 2.437 2.222 2.087 1.993
1.923 1.869 1.825 1.789 1.759
42 2.829 2.434 2.219 2.084 1.989
1.919 1.865 1.821 1.785 1.755
43 2.826 2.430 2.216 2.080 1.986
1.916 1.861 1.817 1.781 1.751
44 2.823 2.427 2.213 2.077 1.983
1.913 1.858 1.814 1.778 1.747
45 2.820 2.425 2.210 2.074 1.980
1.909 1.855 1.811 1.774 1.744
46 2.818 2.422 2.207 2.071 1.977
1.906 1.852 1.808 1.771 1.741
47 2.815 2.419 2.204 2.068 1.974
1.903 1.849 1.805 1.768 1.738
48 2.813 2.417 2.202 2.066 1.971
1.901 1.846 1.802 1.765 1.735
49 2.811 2.414 2.199 2.063 1.968
1.898 1.843 1.799 1.763 1.732
50 2.809 2.412 2.197 2.061 1.966
1.895 1.840 1.796 1.760 1.729
51 2.807 2.410 2.194 2.058 1.964
1.893 1.838 1.794 1.757 1.727
52 2.805 2.408 2.192 2.056 1.961
1.891 1.836 1.791 1.755 1.724
53 2.803 2.406 2.190 2.054 1.959
1.888 1.833 1.789 1.752 1.722
54 2.801 2.404 2.188 2.052 1.957
1.886 1.831 1.787 1.750 1.719
55 2.799 2.402 2.186 2.050 1.955
1.884 1.829 1.785 1.748 1.717
56 2.797 2.400 2.184 2.048 1.953
1.882 1.827 1.782 1.746 1.715
57 2.796 2.398 2.182 2.046 1.951
1.880 1.825 1.780 1.744 1.713
58 2.794 2.396 2.181 2.044 1.949
1.878 1.823 1.779 1.742 1.711
59 2.793 2.395 2.179 2.043 1.947
1.876 1.821 1.777 1.740 1.709
60 2.791 2.393 2.177 2.041 1.946
1.875 1.819 1.775 1.738 1.707
61 2.790 2.392 2.176 2.039 1.944
1.873 1.818 1.773 1.736 1.705
62 2.788 2.390 2.174 2.038 1.942
1.871 1.816 1.771 1.735 1.703
63 2.787 2.389 2.173 2.036 1.941
1.870 1.814 1.770 1.733 1.702
64 2.786 2.387 2.171 2.035 1.939
1.868 1.813 1.768 1.731 1.700
65 2.784 2.386 2.170 2.033 1.938
1.867 1.811 1.767 1.730 1.699
66 2.783 2.385 2.169 2.032 1.937
1.865 1.810 1.765 1.728 1.697
67 2.782 2.384 2.167 2.031 1.935
1.864 1.808 1.764 1.727 1.696
68 2.781 2.382 2.166 2.029 1.934
1.863 1.807 1.762 1.725 1.694
69 2.780 2.381 2.165 2.028 1.933
1.861 1.806 1.761 1.724 1.693
70 2.779 2.380 2.164 2.027 1.931
1.860 1.804 1.760 1.723 1.691
71 2.778 2.379 2.163 2.026 1.930
1.859 1.803 1.758 1.721 1.690
72 2.777 2.378 2.161 2.025 1.929
1.858 1.802 1.757 1.720 1.689
73 2.776 2.377 2.160 2.024 1.928
1.856 1.801 1.756 1.719 1.687
74 2.775 2.376 2.159 2.022 1.927
1.855 1.800 1.755 1.718 1.686
75 2.774 2.375 2.158 2.021 1.926
1.854 1.798 1.754 1.716 1.685
76 2.773 2.374 2.157 2.020 1.925
1.853 1.797 1.752 1.715 1.684
77 2.772 2.373 2.156 2.019 1.924
1.852 1.796 1.751 1.714 1.683
78 2.771 2.372 2.155 2.018 1.923
1.851 1.795 1.750 1.713 1.682
79 2.770 2.371 2.154 2.017 1.922
1.850 1.794 1.749 1.712 1.681
80 2.769 2.370 2.154 2.016 1.921
1.849 1.793 1.748 1.711 1.680
81 2.769 2.369 2.153 2.016 1.920
1.848 1.792 1.747 1.710 1.679
82 2.768 2.368 2.152 2.015 1.919
1.847 1.791 1.746 1.709 1.678
83 2.767 2.368 2.151 2.014 1.918
1.846 1.790 1.745 1.708 1.677
84 2.766 2.367 2.150 2.013 1.917
1.845 1.790 1.744 1.707 1.676
85 2.765 2.366 2.149 2.012 1.916
1.845 1.789 1.744 1.706 1.675
86 2.765 2.365 2.149 2.011 1.915
1.844 1.788 1.743 1.705 1.674
87 2.764 2.365 2.148 2.011 1.915
1.843 1.787 1.742 1.705 1.673
88 2.763 2.364 2.147 2.010 1.914
1.842 1.786 1.741 1.704 1.672
89 2.763 2.363 2.146 2.009 1.913
1.841 1.785 1.740 1.703 1.671
90 2.762 2.363 2.146 2.008 1.912
1.841 1.785 1.739 1.702 1.670
91 2.761 2.362 2.145 2.008 1.912
1.840 1.784 1.739 1.701 1.670
92 2.761 2.361 2.144 2.007 1.911
1.839 1.783 1.738 1.701 1.669
93 2.760 2.361 2.144 2.006 1.910
1.838 1.782 1.737 1.700 1.668
94 2.760 2.360 2.143 2.006 1.910
1.838 1.782 1.736 1.699 1.667
95 2.759 2.359 2.142 2.005 1.909
1.837 1.781 1.736 1.698 1.667
96 2.759 2.359 2.142 2.004 1.908
1.836 1.780 1.735 1.698 1.666
97 2.758 2.358 2.141 2.004 1.908
1.836 1.780 1.734 1.697 1.665
98 2.757 2.358 2.141 2.003 1.907
1.835 1.779 1.734 1.696 1.665
99 2.757 2.357 2.140 2.003 1.906
1.835 1.778 1.733 1.696 1.664
100 2.756 2.356 2.139 2.002 1.906
1.834 1.778 1.732 1.695 1.663
ν2 \ ν1       11       12       13       14       15
              16       17       18       19       20
1 60.473 60.705 60.903 61.073 61.220
61.350 61.464 61.566 61.658 61.740
2 9.401 9.408 9.415 9.420 9.425
9.429 9.433 9.436 9.439 9.441
3 5.222 5.216 5.210 5.205 5.200
5.196 5.193 5.190 5.187 5.184
4 3.907 3.896 3.886 3.878 3.870
3.864 3.858 3.853 3.849 3.844
5 3.282 3.268 3.257 3.247 3.238
3.230 3.223 3.217 3.212 3.207
6 2.920 2.905 2.892 2.881 2.871
2.863 2.855 2.848 2.842 2.836
7 2.684 2.668 2.654 2.643 2.632
2.623 2.615 2.607 2.601 2.595
8 2.519 2.502 2.488 2.475 2.464
2.455 2.446 2.438 2.431 2.425
9 2.396 2.379 2.364 2.351 2.340
2.329 2.320 2.312 2.305 2.298
10 2.302 2.284 2.269 2.255 2.244
2.233 2.224 2.215 2.208 2.201
11 2.227 2.209 2.193 2.179 2.167
2.156 2.147 2.138 2.130 2.123
12 2.166 2.147 2.131 2.117 2.105
2.094 2.084 2.075 2.067 2.060
13 2.116 2.097 2.080 2.066 2.053
2.042 2.032 2.023 2.014 2.007
14 2.073 2.054 2.037 2.022 2.010
1.998 1.988 1.978 1.970 1.962
15 2.037 2.017 2.000 1.985 1.972
1.961 1.950 1.941 1.932 1.924
16 2.005 1.985 1.968 1.953 1.940
1.928 1.917 1.908 1.899 1.891
17 1.978 1.958 1.940 1.925 1.912
1.900 1.889 1.879 1.870 1.862
18 1.954 1.933 1.916 1.900 1.887
1.875 1.864 1.854 1.845 1.837
19 1.932 1.912 1.894 1.878 1.865
1.852 1.841 1.831 1.822 1.814
20 1.913 1.892 1.875 1.859 1.845
1.833 1.821 1.811 1.802 1.794
21 1.896 1.875 1.857 1.841 1.827
1.815 1.803 1.793 1.784 1.776
22 1.880 1.859 1.841 1.825 1.811
1.798 1.787 1.777 1.768 1.759
23 1.866 1.845 1.827 1.811 1.796
1.784 1.772 1.762 1.753 1.744
24 1.853 1.832 1.814 1.797 1.783
1.770 1.759 1.748 1.739 1.730
25 1.841 1.820 1.802 1.785 1.771
1.758 1.746 1.736 1.726 1.718
26 1.830 1.809 1.790 1.774 1.760
1.747 1.735 1.724 1.715 1.706
27 1.820 1.799 1.780 1.764 1.749
1.736 1.724 1.714 1.704 1.695
28 1.811 1.790 1.771 1.754 1.740
1.726 1.715 1.704 1.694 1.685
29 1.802 1.781 1.762 1.745 1.731
1.717 1.705 1.695 1.685 1.676
30 1.794 1.773 1.754 1.737 1.722
1.709 1.697 1.686 1.676 1.667
31 1.787 1.765 1.746 1.729 1.714
1.701 1.689 1.678 1.668 1.659
32 1.780 1.758 1.739 1.722 1.707
1.694 1.682 1.671 1.661 1.652
33 1.773 1.751 1.732 1.715 1.700
1.687 1.675 1.664 1.654 1.645
34 1.767 1.745 1.726 1.709 1.694
1.680 1.668 1.657 1.647 1.638
35 1.761 1.739 1.720 1.703 1.688
1.674 1.662 1.651 1.641 1.632
36 1.756 1.734 1.715 1.697 1.682
1.669 1.656 1.645 1.635 1.626
37 1.751 1.729 1.709 1.692 1.677
1.663 1.651 1.640 1.630 1.620
38 1.746 1.724 1.704 1.687 1.672
1.658 1.646 1.635 1.624 1.615
39 1.741 1.719 1.700 1.682 1.667
1.653 1.641 1.630 1.619 1.610
40 1.737 1.715 1.695 1.678 1.662
1.649 1.636 1.625 1.615 1.605
41 1.733 1.710 1.691 1.673 1.658
1.644 1.632 1.620 1.610 1.601
42 1.729 1.706 1.687 1.669 1.654
1.640 1.628 1.616 1.606 1.596
43 1.725 1.703 1.683 1.665 1.650
1.636 1.624 1.612 1.602 1.592
44 1.721 1.699 1.679 1.662 1.646
1.632 1.620 1.608 1.598 1.588
45 1.718 1.695 1.676 1.658 1.643
1.629 1.616 1.605 1.594 1.585
46 1.715 1.692 1.672 1.655 1.639
1.625 1.613 1.601 1.591 1.581
47 1.712 1.689 1.669 1.652 1.636
1.622 1.609 1.598 1.587 1.578
48 1.709 1.686 1.666 1.648 1.633
1.619 1.606 1.594 1.584 1.574
49 1.706 1.683 1.663 1.645 1.630
1.616 1.603 1.591 1.581 1.571
50 1.703 1.680 1.660 1.643 1.627
1.613 1.600 1.588 1.578 1.568
51 1.700 1.677 1.658 1.640 1.624
1.610 1.597 1.586 1.575 1.565
52 1.698 1.675 1.655 1.637 1.621
1.607 1.594 1.583 1.572 1.562
53 1.695 1.672 1.652 1.635 1.619
1.605 1.592 1.580 1.570 1.560
54 1.693 1.670 1.650 1.632 1.616
1.602 1.589 1.578 1.567 1.557
55 1.691 1.668 1.648 1.630 1.614
1.600 1.587 1.575 1.564 1.555
56 1.688 1.666 1.645 1.628 1.612
1.597 1.585 1.573 1.562 1.552
57 1.686 1.663 1.643 1.625 1.610
1.595 1.582 1.571 1.560 1.550
58 1.684 1.661 1.641 1.623 1.607
1.593 1.580 1.568 1.558 1.548
59 1.682 1.659 1.639 1.621 1.605
1.591 1.578 1.566 1.555 1.546
60 1.680 1.657 1.637 1.619 1.603
1.589 1.576 1.564 1.553 1.543
61 1.679 1.656 1.635 1.617 1.601
1.587 1.574 1.562 1.551 1.541
62 1.677 1.654 1.634 1.616 1.600
1.585 1.572 1.560 1.549 1.540
63 1.675 1.652 1.632 1.614 1.598
1.583 1.570 1.558 1.548 1.538
64 1.673 1.650 1.630 1.612 1.596
1.582 1.569 1.557 1.546 1.536
65 1.672 1.649 1.628 1.610 1.594
1.580 1.567 1.555 1.544 1.534
66 1.670 1.647 1.627 1.609 1.593
1.578 1.565 1.553 1.542 1.532
67 1.669 1.646 1.625 1.607 1.591
1.577 1.564 1.552 1.541 1.531
68 1.667 1.644 1.624 1.606 1.590
1.575 1.562 1.550 1.539 1.529
69 1.666 1.643 1.622 1.604 1.588
1.574 1.560 1.548 1.538 1.527
70 1.665 1.641 1.621 1.603 1.587
1.572 1.559 1.547 1.536 1.526
71 1.663 1.640 1.619 1.601 1.585
1.571 1.557 1.545 1.535 1.524
72 1.662 1.639 1.618 1.600 1.584
1.569 1.556 1.544 1.533 1.523
73 1.661 1.637 1.617 1.599 1.583
1.568 1.555 1.543 1.532 1.522
74 1.659 1.636 1.616 1.597 1.581
1.567 1.553 1.541 1.530 1.520
75 1.658 1.635 1.614 1.596 1.580
1.565 1.552 1.540 1.529 1.519
76 1.657 1.634 1.613 1.595 1.579
1.564 1.551 1.539 1.528 1.518
77 1.656 1.632 1.612 1.594 1.578
1.563 1.550 1.538 1.527 1.516
78 1.655 1.631 1.611 1.593 1.576
1.562 1.548 1.536 1.525 1.515
79 1.654 1.630 1.610 1.592 1.575
1.561 1.547 1.535 1.524 1.514
80 1.653 1.629 1.609 1.590 1.574
1.559 1.546 1.534 1.523 1.513
81 1.652 1.628 1.608 1.589 1.573
1.558 1.545 1.533 1.522 1.512
82 1.651 1.627 1.607 1.588 1.572
1.557 1.544 1.532 1.521 1.511
83 1.650 1.626 1.606 1.587 1.571
1.556 1.543 1.531 1.520 1.509
84 1.649 1.625 1.605 1.586 1.570
1.555 1.542 1.530 1.519 1.508
85 1.648 1.624 1.604 1.585 1.569
1.554 1.541 1.529 1.518 1.507
86 1.647 1.623 1.603 1.584 1.568
1.553 1.540 1.528 1.517 1.506
87 1.646 1.622 1.602 1.583 1.567
1.552 1.539 1.527 1.516 1.505
88 1.645 1.622 1.601 1.583 1.566
1.551 1.538 1.526 1.515 1.504
89 1.644 1.621 1.600 1.582 1.565
1.550 1.537 1.525 1.514 1.503
90 1.643 1.620 1.599 1.581 1.564
1.550 1.536 1.524 1.513 1.503
91 1.643 1.619 1.598 1.580 1.564
1.549 1.535 1.523 1.512 1.502
92 1.642 1.618 1.598 1.579 1.563
1.548 1.534 1.522 1.511 1.501
93 1.641 1.617 1.597 1.578 1.562
1.547 1.534 1.521 1.510 1.500
94 1.640 1.617 1.596 1.578 1.561
1.546 1.533 1.521 1.509 1.499
95 1.640 1.616 1.595 1.577 1.560
1.545 1.532 1.520 1.509 1.498
96 1.639 1.615 1.594 1.576 1.560
1.545 1.531 1.519 1.508 1.497
97 1.638 1.614 1.594 1.575 1.559
1.544 1.530 1.518 1.507 1.497
98 1.637 1.614 1.593 1.575 1.558
1.543 1.530 1.517 1.506 1.496
99 1.637 1.613 1.592 1.574 1.557
1.542 1.529 1.517 1.505 1.495
100 1.636 1.612 1.592 1.573 1.557
1.542 1.528 1.516 1.505 1.494
Upper critical values of the F distribution
for ν1 numerator degrees of freedom and ν2 denominator
degrees of freedom

1% significance level

ν2 \ ν1        1        2        3        4        5
               6        7        8        9       10
1 4052.19 4999.52 5403.34 5624.62 5763.65
5858.97 5928.33 5981.10 6022.50 6055.85
2 98.502 99.000 99.166 99.249 99.300
99.333 99.356 99.374 99.388 99.399
3 34.116 30.816 29.457 28.710 28.237
27.911 27.672 27.489 27.345 27.229
4 21.198 18.000 16.694 15.977 15.522
15.207 14.976 14.799 14.659 14.546
5 16.258 13.274 12.060 11.392 10.967
10.672 10.456 10.289 10.158 10.051
6 13.745 10.925 9.780 9.148 8.746
8.466 8.260 8.102 7.976 7.874
7 12.246 9.547 8.451 7.847 7.460
7.191 6.993 6.840 6.719 6.620
8 11.259 8.649 7.591 7.006 6.632
6.371 6.178 6.029 5.911 5.814
9 10.561 8.022 6.992 6.422 6.057
5.802 5.613 5.467 5.351 5.257
10 10.044 7.559 6.552 5.994 5.636
5.386 5.200 5.057 4.942 4.849
11 9.646 7.206 6.217 5.668 5.316
5.069 4.886 4.744 4.632 4.539
12 9.330 6.927 5.953 5.412 5.064
4.821 4.640 4.499 4.388 4.296
13 9.074 6.701 5.739 5.205 4.862
4.620 4.441 4.302 4.191 4.100
14 8.862 6.515 5.564 5.035 4.695
4.456 4.278 4.140 4.030 3.939
15 8.683 6.359 5.417 4.893 4.556
4.318 4.142 4.004 3.895 3.805
16 8.531 6.226 5.292 4.773 4.437
4.202 4.026 3.890 3.780 3.691
17 8.400 6.112 5.185 4.669 4.336
4.102 3.927 3.791 3.682 3.593
18 8.285 6.013 5.092 4.579 4.248
4.015 3.841 3.705 3.597 3.508
19 8.185 5.926 5.010 4.500 4.171
3.939 3.765 3.631 3.523 3.434
20 8.096 5.849 4.938 4.431 4.103
3.871 3.699 3.564 3.457 3.368
21 8.017 5.780 4.874 4.369 4.042
3.812 3.640 3.506 3.398 3.310
22 7.945 5.719 4.817 4.313 3.988
3.758 3.587 3.453 3.346 3.258
23 7.881 5.664 4.765 4.264 3.939
3.710 3.539 3.406 3.299 3.211
24 7.823 5.614 4.718 4.218 3.895
3.667 3.496 3.363 3.256 3.168
25 7.770 5.568 4.675 4.177 3.855
3.627 3.457 3.324 3.217 3.129
26 7.721 5.526 4.637 4.140 3.818
3.591 3.421 3.288 3.182 3.094
27 7.677 5.488 4.601 4.106 3.785
3.558 3.388 3.256 3.149 3.062
28 7.636 5.453 4.568 4.074 3.754
3.528 3.358 3.226 3.120 3.032
29 7.598 5.420 4.538 4.045 3.725
3.499 3.330 3.198 3.092 3.005
30 7.562 5.390 4.510 4.018 3.699
3.473 3.305 3.173 3.067 2.979
31 7.530 5.362 4.484 3.993 3.675
3.449 3.281 3.149 3.043 2.955
32 7.499 5.336 4.459 3.969 3.652
3.427 3.258 3.127 3.021 2.934
33 7.471 5.312 4.437 3.948 3.630
3.406 3.238 3.106 3.000 2.913
34 7.444 5.289 4.416 3.927 3.611
3.386 3.218 3.087 2.981 2.894
35 7.419 5.268 4.396 3.908 3.592
3.368 3.200 3.069 2.963 2.876
36 7.396 5.248 4.377 3.890 3.574
3.351 3.183 3.052 2.946 2.859
37 7.373 5.229 4.360 3.873 3.558
3.334 3.167 3.036 2.930 2.843
38 7.353 5.211 4.343 3.858 3.542
3.319 3.152 3.021 2.915 2.828
39 7.333 5.194 4.327 3.843 3.528
3.305 3.137 3.006 2.901 2.814
40 7.314 5.179 4.313 3.828 3.514
3.291 3.124 2.993 2.888 2.801
41 7.296 5.163 4.299 3.815 3.501
3.278 3.111 2.980 2.875 2.788
42 7.280 5.149 4.285 3.802 3.488
3.266 3.099 2.968 2.863 2.776
43 7.264 5.136 4.273 3.790 3.476
3.254 3.087 2.957 2.851 2.764
44 7.248 5.123 4.261 3.778 3.465
3.243 3.076 2.946 2.840 2.754
45 7.234 5.110 4.249 3.767 3.454
3.232 3.066 2.935 2.830 2.743
46 7.220 5.099 4.238 3.757 3.444
3.222 3.056 2.925 2.820 2.733
47 7.207 5.087 4.228 3.747 3.434
3.213 3.046 2.916 2.811 2.724
48 7.194 5.077 4.218 3.737 3.425
3.204 3.037 2.907 2.802 2.715
49 7.182 5.066 4.208 3.728 3.416
3.195 3.028 2.898 2.793 2.706
50 7.171 5.057 4.199 3.720 3.408
3.186 3.020 2.890 2.785 2.698
51 7.159 5.047 4.191 3.711 3.400
3.178 3.012 2.882 2.777 2.690
52 7.149 5.038 4.182 3.703 3.392
3.171 3.005 2.874 2.769 2.683
53 7.139 5.030 4.174 3.695 3.384
3.163 2.997 2.867 2.762 2.675
54 7.129 5.021 4.167 3.688 3.377
3.156 2.990 2.860 2.755 2.668
55 7.119 5.013 4.159 3.681 3.370
3.149 2.983 2.853 2.748 2.662
56 7.110 5.006 4.152 3.674 3.363
3.143 2.977 2.847 2.742 2.655
57 7.102 4.998 4.145 3.667 3.357
3.136 2.971 2.841 2.736 2.649
58 7.093 4.991 4.138 3.661 3.351
3.130 2.965 2.835 2.730 2.643
59 7.085 4.984 4.132 3.655 3.345
3.124 2.959 2.829 2.724 2.637
60 7.077 4.977 4.126 3.649 3.339
3.119 2.953 2.823 2.718 2.632
61 7.070 4.971 4.120 3.643 3.333
3.113 2.948 2.818 2.713 2.626
62 7.062 4.965 4.114 3.638 3.328
3.108 2.942 2.813 2.708 2.621
63 7.055 4.959 4.109 3.632 3.323
3.103 2.937 2.808 2.703 2.616
64 7.048 4.953 4.103 3.627 3.318
3.098 2.932 2.803 2.698 2.611
65 7.042 4.947 4.098 3.622 3.313
3.093 2.928 2.798 2.693 2.607
66 7.035 4.942 4.093 3.618 3.308
3.088 2.923 2.793 2.689 2.602
67 7.029 4.937 4.088 3.613 3.304
3.084 2.919 2.789 2.684 2.598
68 7.023 4.932 4.083 3.608 3.299
3.080 2.914 2.785 2.680 2.593
69 7.017 4.927 4.079 3.604 3.295
3.075 2.910 2.781 2.676 2.589
70 7.011 4.922 4.074 3.600 3.291
3.071 2.906 2.777 2.672 2.585
71 7.006 4.917 4.070 3.596 3.287
3.067 2.902 2.773 2.668 2.581
72 7.001 4.913 4.066 3.591 3.283
3.063 2.898 2.769 2.664 2.578
73 6.995 4.908 4.062 3.588 3.279
3.060 2.895 2.765 2.660 2.574
74 6.990 4.904 4.058 3.584 3.275
3.056 2.891 2.762 2.657 2.570
75 6.985 4.900 4.054 3.580 3.272
3.052 2.887 2.758 2.653 2.567
76 6.981 4.896 4.050 3.577 3.268
3.049 2.884 2.755 2.650 2.563
77 6.976 4.892 4.047 3.573 3.265
3.046 2.881 2.751 2.647 2.560
78 6.971 4.888 4.043 3.570 3.261
3.042 2.877 2.748 2.644 2.557
79 6.967 4.884 4.040 3.566 3.258
3.039 2.874 2.745 2.640 2.554
80 6.963 4.881 4.036 3.563 3.255
3.036 2.871 2.742 2.637 2.551
81 6.958 4.877 4.033 3.560 3.252
3.033 2.868 2.739 2.634 2.548
82 6.954 4.874 4.030 3.557 3.249
3.030 2.865 2.736 2.632 2.545
83 6.950 4.870 4.027 3.554 3.246
3.027 2.863 2.733 2.629 2.542
84 6.947 4.867 4.024 3.551 3.243
3.025 2.860 2.731 2.626 2.539
85 6.943 4.864 4.021 3.548 3.240
3.022 2.857 2.728 2.623 2.537
86 6.939 4.861 4.018 3.545 3.238
3.019 2.854 2.725 2.621 2.534
87 6.935 4.858 4.015 3.543 3.235
3.017 2.852 2.723 2.618 2.532
88 6.932 4.855 4.012 3.540 3.233
3.014 2.849 2.720 2.616 2.529
89 6.928 4.852 4.010 3.538 3.230
3.012 2.847 2.718 2.613 2.527
90 6.925 4.849 4.007 3.535 3.228
3.009 2.845 2.715 2.611 2.524
91 6.922 4.846 4.004 3.533 3.225
3.007 2.842 2.713 2.609 2.522
92 6.919 4.844 4.002 3.530 3.223
3.004 2.840 2.711 2.606 2.520
93 6.915 4.841 3.999 3.528 3.221
3.002 2.838 2.709 2.604 2.518
94 6.912 4.838 3.997 3.525 3.218
3.000 2.835 2.706 2.602 2.515
95 6.909 4.836 3.995 3.523 3.216
2.998 2.833 2.704 2.600 2.513
96 6.906 4.833 3.992 3.521 3.214
2.996 2.831 2.702 2.598 2.511
97 6.904 4.831 3.990 3.519 3.212
2.994 2.829 2.700 2.596 2.509
98 6.901 4.829 3.988 3.517 3.210
2.992 2.827 2.698 2.594 2.507
99 6.898 4.826 3.986 3.515 3.208
2.990 2.825 2.696 2.592 2.505
100 6.895 4.824 3.984 3.513 3.206
2.988 2.823 2.694 2.590 2.503
ν2 \ ν1       11       12       13       14       15
              16       17       18       19       20
1. 6083.35 6106.35 6125.86 6142.70 6157.28
6170.12 6181.42 6191.52 6200.58 6208.74
2. 99.408 99.416 99.422 99.428 99.432
99.437 99.440 99.444 99.447 99.449
3. 27.133 27.052 26.983 26.924 26.872
26.827 26.787 26.751 26.719 26.690
4. 14.452 14.374 14.307 14.249 14.198
14.154 14.115 14.080 14.048 14.020
5. 9.963 9.888 9.825 9.770 9.722
9.680 9.643 9.610 9.580 9.553
6. 7.790 7.718 7.657 7.605 7.559
7.519 7.483 7.451 7.422 7.396
7. 6.538 6.469 6.410 6.359 6.314
6.275 6.240 6.209 6.181 6.155
8. 5.734 5.667 5.609 5.559 5.515
5.477 5.442 5.412 5.384 5.359
9. 5.178 5.111 5.055 5.005 4.962
4.924 4.890 4.860 4.833 4.808
10. 4.772 4.706 4.650 4.601 4.558
4.520 4.487 4.457 4.430 4.405
11. 4.462 4.397 4.342 4.293 4.251
4.213 4.180 4.150 4.123 4.099
12. 4.220 4.155 4.100 4.052 4.010
3.972 3.939 3.909 3.883 3.858
13. 4.025 3.960 3.905 3.857 3.815
3.778 3.745 3.716 3.689 3.665
14. 3.864 3.800 3.745 3.698 3.656
3.619 3.586 3.556 3.529 3.505
15. 3.730 3.666 3.612 3.564 3.522
3.485 3.452 3.423 3.396 3.372
16. 3.616 3.553 3.498 3.451 3.409
3.372 3.339 3.310 3.283 3.259
17. 3.519 3.455 3.401 3.353 3.312
3.275 3.242 3.212 3.186 3.162
18. 3.434 3.371 3.316 3.269 3.227
3.190 3.158 3.128 3.101 3.077
19. 3.360 3.297 3.242 3.195 3.153
3.116 3.084 3.054 3.027 3.003
20. 3.294 3.231 3.177 3.130 3.088
3.051 3.018 2.989 2.962 2.938
21. 3.236 3.173 3.119 3.072 3.030
2.993 2.960 2.931 2.904 2.880
22. 3.184 3.121 3.067 3.019 2.978
2.941 2.908 2.879 2.852 2.827
23. 3.137 3.074 3.020 2.973 2.931
2.894 2.861 2.832 2.805 2.781
24. 3.094 3.032 2.977 2.930 2.889
2.852 2.819 2.789 2.762 2.738
25. 3.056 2.993 2.939 2.892 2.850
2.813 2.780 2.751 2.724 2.699
26. 3.021 2.958 2.904 2.857 2.815
2.778 2.745 2.715 2.688 2.664
27. 2.988 2.926 2.871 2.824 2.783
2.746 2.713 2.683 2.656 2.632
28. 2.959 2.896 2.842 2.795 2.753
2.716 2.683 2.653 2.626 2.602
29. 2.931 2.868 2.814 2.767 2.726
2.689 2.656 2.626 2.599 2.574
30. 2.906 2.843 2.789 2.742 2.700
2.663 2.630 2.600 2.573 2.549
31. 2.882 2.820 2.765 2.718 2.677
2.640 2.606 2.577 2.550 2.525
32. 2.860 2.798 2.744 2.696 2.655
2.618 2.584 2.555 2.527 2.503
33. 2.840 2.777 2.723 2.676 2.634
2.597 2.564 2.534 2.507 2.482
34. 2.821 2.758 2.704 2.657 2.615
2.578 2.545 2.515 2.488 2.463
35. 2.803 2.740 2.686 2.639 2.597
2.560 2.527 2.497 2.470 2.445
36. 2.786 2.723 2.669 2.622 2.580
2.543 2.510 2.480 2.453 2.428
37. 2.770 2.707 2.653 2.606 2.564
2.527 2.494 2.464 2.437 2.412
38. 2.755 2.692 2.638 2.591 2.549
2.512 2.479 2.449 2.421 2.397
39. 2.741 2.678 2.624 2.577 2.535
2.498 2.465 2.434 2.407 2.382
40. 2.727 2.665 2.611 2.563 2.522
2.484 2.451 2.421 2.394 2.369
41. 2.715 2.652 2.598 2.551 2.509
2.472 2.438 2.408 2.381 2.356
42. 2.703 2.640 2.586 2.539 2.497
2.460 2.426 2.396 2.369 2.344
43. 2.691 2.629 2.575 2.527 2.485
2.448 2.415 2.385 2.357 2.332
44. 2.680 2.618 2.564 2.516 2.475
2.437 2.404 2.374 2.346 2.321
45. 2.670 2.608 2.553 2.506 2.464
2.427 2.393 2.363 2.336 2.311
46. 2.660 2.598 2.544 2.496 2.454
2.417 2.384 2.353 2.326 2.301
47. 2.651 2.588 2.534 2.487 2.445
2.408 2.374 2.344 2.316 2.291
48. 2.642 2.579 2.525 2.478 2.436
2.399 2.365 2.335 2.307 2.282
49. 2.633 2.571 2.517 2.469 2.427
2.390 2.356 2.326 2.299 2.274
50. 2.625 2.562 2.508 2.461 2.419
2.382 2.348 2.318 2.290 2.265
51. 2.617 2.555 2.500 2.453 2.411
2.374 2.340 2.310 2.282 2.257
52. 2.610 2.547 2.493 2.445 2.403
2.366 2.333 2.302 2.275 2.250
53. 2.602 2.540 2.486 2.438 2.396
2.359 2.325 2.295 2.267 2.242
54. 2.595 2.533 2.479 2.431 2.389
2.352 2.318 2.288 2.260 2.235
55. 2.589 2.526 2.472 2.424 2.382
2.345 2.311 2.281 2.253 2.228
56. 2.582 2.520 2.465 2.418 2.376
2.339 2.305 2.275 2.247 2.222
57. 2.576 2.513 2.459 2.412 2.370
2.332 2.299 2.268 2.241 2.215
58. 2.570 2.507 2.453 2.406 2.364
2.326 2.293 2.262 2.235 2.209
59. 2.564 2.502 2.447 2.400 2.358
2.320 2.287 2.256 2.229 2.203
60. 2.559 2.496 2.442 2.394 2.352
2.315 2.281 2.251 2.223 2.198
61. 2.553 2.491 2.436 2.389 2.347
2.309 2.276 2.245 2.218 2.192
62. 2.548 2.486 2.431 2.384 2.342
2.304 2.270 2.240 2.212 2.187
63. 2.543 2.481 2.426 2.379 2.337
2.299 2.265 2.235 2.207 2.182
64. 2.538 2.476 2.421 2.374 2.332
2.294 2.260 2.230 2.202 2.177
65. 2.534 2.471 2.417 2.369 2.327
2.289 2.256 2.225 2.198 2.172
66. 2.529 2.466 2.412 2.365 2.322
2.285 2.251 2.221 2.193 2.168
67. 2.525 2.462 2.408 2.360 2.318
2.280 2.247 2.216 2.188 2.163
68. 2.520 2.458 2.403 2.356 2.314
2.276 2.242 2.212 2.184 2.159
69. 2.516 2.454 2.399 2.352 2.310
2.272 2.238 2.208 2.180 2.155
70. 2.512 2.450 2.395 2.348 2.306
2.268 2.234 2.204 2.176 2.150
71. 2.508 2.446 2.391 2.344 2.302
2.264 2.230 2.200 2.172 2.146
72. 2.504 2.442 2.388 2.340 2.298
2.260 2.226 2.196 2.168 2.143
73. 2.501 2.438 2.384 2.336 2.294
2.256 2.223 2.192 2.164 2.139
74. 2.497 2.435 2.380 2.333 2.290
2.253 2.219 2.188 2.161 2.135
75. 2.494 2.431 2.377 2.329 2.287
2.249 2.215 2.185 2.157 2.132
76. 2.490 2.428 2.373 2.326 2.284
2.246 2.212 2.181 2.154 2.128
77. 2.487 2.424 2.370 2.322 2.280
2.243 2.209 2.178 2.150 2.125
78. 2.484 2.421 2.367 2.319 2.277
2.239 2.206 2.175 2.147 2.122
79. 2.481 2.418 2.364 2.316 2.274
2.236 2.202 2.172 2.144 2.118
80. 2.478 2.415 2.361 2.313 2.271
2.233 2.199 2.169 2.141 2.115
81. 2.475 2.412 2.358 2.310 2.268
2.230 2.196 2.166 2.138 2.112
82. 2.472 2.409 2.355 2.307 2.265
2.227 2.193 2.163 2.135 2.109
83. 2.469 2.406 2.352 2.304 2.262
2.224 2.191 2.160 2.132 2.106
84. 2.466 2.404 2.349 2.302 2.259
2.222 2.188 2.157 2.129 2.104
85. 2.464 2.401 2.347 2.299 2.257
2.219 2.185 2.154 2.126 2.101
86. 2.461 2.398 2.344 2.296 2.254
2.216 2.182 2.152 2.124 2.098
87. 2.459 2.396 2.342 2.294 2.252
2.214 2.180 2.149 2.121 2.096
88. 2.456 2.393 2.339 2.291 2.249
2.211 2.177 2.147 2.119 2.093
89. 2.454 2.391 2.337 2.289 2.247
2.209 2.175 2.144 2.116 2.091
90. 2.451 2.389 2.334 2.286 2.244
2.206 2.172 2.142 2.114 2.088
91. 2.449 2.386 2.332 2.284 2.242
2.204 2.170 2.139 2.111 2.086
92. 2.447 2.384 2.330 2.282 2.240
2.202 2.168 2.137 2.109 2.083
93. 2.444 2.382 2.327 2.280 2.237
2.200 2.166 2.135 2.107 2.081
94. 2.442 2.380 2.325 2.277 2.235
2.197 2.163 2.133 2.105 2.079
95. 2.440 2.378 2.323 2.275 2.233
2.195 2.161 2.130 2.102 2.077
96. 2.438 2.375 2.321 2.273 2.231
2.193 2.159 2.128 2.100 2.075
97. 2.436 2.373 2.319 2.271 2.229
2.191 2.157 2.126 2.098 2.073
98. 2.434 2.371 2.317 2.269 2.227
2.189 2.155 2.124 2.096 2.071
99. 2.432 2.369 2.315 2.267 2.225
2.187 2.153 2.122 2.094 2.069
100. 2.430 2.368 2.313 2.265 2.223
2.185 2.151 2.120 2.092 2.067
1. Exploratory Data Analysis
1.3. EDA Techniques
1.3.6. Probability Distributions
1.3.6.7. Tables for Probability Distributions
1.3.6.7.4. Critical Values of the Chi-Square
Distribution
How to
Use This
Table
This table contains the critical values of the chi-square
distribution. Because of the lack of symmetry of the chi-
square distribution, separate tables are provided for the upper
and lower tails of the distribution.
A test statistic with ν degrees of freedom is computed from
the data. For upper-tail one-sided tests, the test statistic is
compared with a value from the table of upper-tail critical
values. For two-sided tests, the test statistic is compared with
values from both the table for the upper-tail critical values and
the table for the lower-tail critical values.
The significance level, α, is demonstrated with the graph
below which shows a chi-square distribution with 3 degrees of
freedom for a two-sided test at significance level α = 0.05. If
the test statistic is greater than the upper-tail critical value or
less than the lower-tail critical value, we reject the null
hypothesis. Specific instructions are given below.
Given a specified value of α:
1. For a two-sided test, find the column corresponding to
1 - α/2 in the table for upper-tail critical values and reject
the null hypothesis if the test statistic is greater than the
tabled value. Similarly, find the column corresponding
to α/2 in the table for lower-tail critical values and
reject the null hypothesis if the test statistic is less than
the tabled value.
2. For an upper-tail one-sided test, find the column
corresponding to 1 - α in the table containing upper-tail
critical values and reject the null hypothesis if the test statistic
is greater than the tabled value.
3. For a lower-tail one-sided test, find the column
corresponding to α in the lower-tail critical values table
and reject the null hypothesis if the computed test
statistic is less than the tabled value.
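As an illustrative sketch (not part of the Handbook), base R's qchisq gives
the same critical values; ν = 3 and α = 0.05 below match the two-sided case
described above:

    nu <- 3; alpha <- 0.05
    qchisq(1 - alpha/2, nu)   # 9.348: upper-tail critical value for a two-sided test
    qchisq(alpha/2, nu)       # 0.216: lower-tail critical value for a two-sided test
    qchisq(1 - alpha, nu)     # 7.815: upper-tail critical value for a one-sided test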
Upper-tail critical values of the chi-square distribution with ν
degrees of freedom

Probability less than the critical value

  ν       0.90      0.95      0.975     0.99      0.999
1 2.706 3.841 5.024 6.635
10.828
2 4.605 5.991 7.378 9.210
13.816
3 6.251 7.815 9.348 11.345
16.266
4 7.779 9.488 11.143 13.277
18.467
5 9.236 11.070 12.833 15.086
20.515
6 10.645 12.592 14.449 16.812
22.458
7 12.017 14.067 16.013 18.475
24.322
8 13.362 15.507 17.535 20.090
26.125
9 14.684 16.919 19.023 21.666
27.877
10 15.987 18.307 20.483 23.209
29.588
11 17.275 19.675 21.920 24.725
31.264
12 18.549 21.026 23.337 26.217
32.910
13 19.812 22.362 24.736 27.688
34.528
14 21.064 23.685 26.119 29.141
36.123
15 22.307 24.996 27.488 30.578
37.697
16 23.542 26.296 28.845 32.000
39.252
17 24.769 27.587 30.191 33.409
40.790
18 25.989 28.869 31.526 34.805
42.312
19 27.204 30.144 32.852 36.191
43.820
20 28.412 31.410 34.170 37.566
45.315
21 29.615 32.671 35.479 38.932
46.797
22 30.813 33.924 36.781 40.289
48.268
23 32.007 35.172 38.076 41.638
49.728
24 33.196 36.415 39.364 42.980
51.179
25 34.382 37.652 40.646 44.314
52.620
26 35.563 38.885 41.923 45.642
54.052
27 36.741 40.113 43.195 46.963
55.476
28 37.916 41.337 44.461 48.278
56.892
29 39.087 42.557 45.722 49.588
58.301
30 40.256 43.773 46.979 50.892
59.703
31 41.422 44.985 48.232 52.191
61.098
32 42.585 46.194 49.480 53.486
62.487
33 43.745 47.400 50.725 54.776
63.870
34 44.903 48.602 51.966 56.061
65.247
35 46.059 49.802 53.203 57.342
66.619
36 47.212 50.998 54.437 58.619
67.985
37 48.363 52.192 55.668 59.893
69.347
38 49.513 53.384 56.896 61.162
70.703
39 50.660 54.572 58.120 62.428
72.055
40 51.805 55.758 59.342 63.691
73.402
41 52.949 56.942 60.561 64.950
74.745
42 54.090 58.124 61.777 66.206
76.084
43 55.230 59.304 62.990 67.459
77.419
44 56.369 60.481 64.201 68.710
78.750
45 57.505 61.656 65.410 69.957
80.077
46 58.641 62.830 66.617 71.201
81.400
47 59.774 64.001 67.821 72.443
82.720
48 60.907 65.171 69.023 73.683
84.037
49 62.038 66.339 70.222 74.919
85.351
50 63.167 67.505 71.420 76.154
86.661
51 64.295 68.669 72.616 77.386
87.968
52 65.422 69.832 73.810 78.616
89.272
53 66.548 70.993 75.002 79.843
90.573
54 67.673 72.153 76.192 81.069
91.872
55 68.796 73.311 77.380 82.292
93.168
56 69.919 74.468 78.567 83.513
94.461
57 71.040 75.624 79.752 84.733
95.751
58 72.160 76.778 80.936 85.950
97.039
59 73.279 77.931 82.117 87.166
98.324
60 74.397 79.082 83.298 88.379
99.607
61 75.514 80.232 84.476 89.591
100.888
62 76.630 81.381 85.654 90.802
102.166
63 77.745 82.529 86.830 92.010
103.442
64 78.860 83.675 88.004 93.217
104.716
65 79.973 84.821 89.177 94.422
105.988
66 81.085 85.965 90.349 95.626
107.258
67 82.197 87.108 91.519 96.828
108.526
68 83.308 88.250 92.689 98.028
109.791
69 84.418 89.391 93.856 99.228
111.055
70 85.527 90.531 95.023 100.425
112.317
71 86.635 91.670 96.189 101.621
113.577
72 87.743 92.808 97.353 102.816
114.835
73 88.850 93.945 98.516 104.010
116.092
74 89.956 95.081 99.678 105.202
117.346
75 91.061 96.217 100.839 106.393
118.599
76 92.166 97.351 101.999 107.583
119.850
77 93.270 98.484 103.158 108.771
121.100
78 94.374 99.617 104.316 109.958
122.348
79 95.476 100.749 105.473 111.144
123.594
80 96.578 101.879 106.629 112.329
124.839
81 97.680 103.010 107.783 113.512
126.083
82 98.780 104.139 108.937 114.695
127.324
83 99.880 105.267 110.090 115.876
128.565
84 100.980 106.395 111.242 117.057
129.804
85 102.079 107.522 112.393 118.236
131.041
86 103.177 108.648 113.544 119.414
132.277
87 104.275 109.773 114.693 120.591
133.512
88 105.372 110.898 115.841 121.767
134.746
89 106.469 112.022 116.989 122.942
135.978
90 107.565 113.145 118.136 124.116
137.208
91 108.661 114.268 119.282 125.289
138.438
92 109.756 115.390 120.427 126.462
139.666
93 110.850 116.511 121.571 127.633
140.893
94 111.944 117.632 122.715 128.803
142.119
95 113.038 118.752 123.858 129.973
143.344
96 114.131 119.871 125.000 131.141
144.567
97 115.223 120.990 126.141 132.309
145.789
98 116.315 122.108 127.282 133.476
147.010
99 117.407 123.225 128.422 134.642
148.230
100 118.498 124.342 129.561 135.807
149.449
Lower-tail critical values of the chi-square distribution with ν degrees of freedom
(column headings give the probability less than the critical value)
ν        0.10      0.05      0.025     0.01      0.001
1. .016 .004 .001 .000
.000
2. .211 .103 .051 .020
.002
3. .584 .352 .216 .115
.024
4. 1.064 .711 .484 .297
.091
5. 1.610 1.145 .831 .554
.210
6. 2.204 1.635 1.237 .872
.381
7. 2.833 2.167 1.690 1.239
.598
8. 3.490 2.733 2.180 1.646
.857
9. 4.168 3.325 2.700 2.088
1.152
10. 4.865 3.940 3.247 2.558
1.479
11. 5.578 4.575 3.816 3.053
1.834
12. 6.304 5.226 4.404 3.571
2.214
13. 7.042 5.892 5.009 4.107
2.617
14. 7.790 6.571 5.629 4.660
3.041
15. 8.547 7.261 6.262 5.229
3.483
16. 9.312 7.962 6.908 5.812
3.942
17. 10.085 8.672 7.564 6.408
4.416
18. 10.865 9.390 8.231 7.015
4.905
19. 11.651 10.117 8.907 7.633
5.407
20. 12.443 10.851 9.591 8.260
5.921
21. 13.240 11.591 10.283 8.897
6.447
22. 14.041 12.338 10.982 9.542
6.983
23. 14.848 13.091 11.689 10.196
7.529
24. 15.659 13.848 12.401 10.856
8.085
25. 16.473 14.611 13.120 11.524
8.649
26. 17.292 15.379 13.844 12.198
9.222
27. 18.114 16.151 14.573 12.879
9.803
28. 18.939 16.928 15.308 13.565
10.391
29. 19.768 17.708 16.047 14.256
10.986
30. 20.599 18.493 16.791 14.953
11.588
31. 21.434 19.281 17.539 15.655
12.196
32. 22.271 20.072 18.291 16.362
12.811
33. 23.110 20.867 19.047 17.074
13.431
34. 23.952 21.664 19.806 17.789
14.057
35. 24.797 22.465 20.569 18.509
14.688
36. 25.643 23.269 21.336 19.233
15.324
37. 26.492 24.075 22.106 19.960
15.965
38. 27.343 24.884 22.878 20.691
16.611
39. 28.196 25.695 23.654 21.426
17.262
40. 29.051 26.509 24.433 22.164
17.916
41. 29.907 27.326 25.215 22.906
18.575
42. 30.765 28.144 25.999 23.650
19.239
43. 31.625 28.965 26.785 24.398
19.906
44. 32.487 29.787 27.575 25.148
20.576
45. 33.350 30.612 28.366 25.901
21.251
46. 34.215 31.439 29.160 26.657
21.929
47. 35.081 32.268 29.956 27.416
22.610
48. 35.949 33.098 30.755 28.177
23.295
49. 36.818 33.930 31.555 28.941
23.983
50. 37.689 34.764 32.357 29.707
24.674
51. 38.560 35.600 33.162 30.475
25.368
52. 39.433 36.437 33.968 31.246
26.065
53. 40.308 37.276 34.776 32.018
26.765
54. 41.183 38.116 35.586 32.793
27.468
55. 42.060 38.958 36.398 33.570
28.173
56. 42.937 39.801 37.212 34.350
28.881
57. 43.816 40.646 38.027 35.131
29.592
58. 44.696 41.492 38.844 35.913
30.305
59. 45.577 42.339 39.662 36.698
31.020
60. 46.459 43.188 40.482 37.485
31.738
61. 47.342 44.038 41.303 38.273
32.459
62. 48.226 44.889 42.126 39.063
33.181
63. 49.111 45.741 42.950 39.855
33.906
64. 49.996 46.595 43.776 40.649
34.633
65. 50.883 47.450 44.603 41.444
35.362
66. 51.770 48.305 45.431 42.240
36.093
67. 52.659 49.162 46.261 43.038
36.826
68. 53.548 50.020 47.092 43.838
37.561
69. 54.438 50.879 47.924 44.639
38.298
70. 55.329 51.739 48.758 45.442
39.036
71. 56.221 52.600 49.592 46.246
39.777
72. 57.113 53.462 50.428 47.051
40.519
73. 58.006 54.325 51.265 47.858
41.264
74. 58.900 55.189 52.103 48.666
42.010
75. 59.795 56.054 52.942 49.475
42.757
76. 60.690 56.920 53.782 50.286
43.507
77. 61.586 57.786 54.623 51.097
44.258
78. 62.483 58.654 55.466 51.910
45.010
79. 63.380 59.522 56.309 52.725
45.764
80. 64.278 60.391 57.153 53.540
46.520
81. 65.176 61.261 57.998 54.357
47.277
82. 66.076 62.132 58.845 55.174
48.036
83. 66.976 63.004 59.692 55.993
48.796
84. 67.876 63.876 60.540 56.813
49.557
85. 68.777 64.749 61.389 57.634
50.320
86. 69.679 65.623 62.239 58.456
51.085
87. 70.581 66.498 63.089 59.279
51.850
88. 71.484 67.373 63.941 60.103
52.617
89. 72.387 68.249 64.793 60.928
53.386
90. 73.291 69.126 65.647 61.754
54.155
91. 74.196 70.003 66.501 62.581
54.926
92. 75.100 70.882 67.356 63.409
55.698
93. 76.006 71.760 68.211 64.238
56.472
94. 76.912 72.640 69.068 65.068
57.246
95. 77.818 73.520 69.925 65.898
58.022
96. 78.725 74.401 70.783 66.730
58.799
97. 79.633 75.282 71.642 67.562
59.577
98. 80.541 76.164 72.501 68.396
60.356
99. 81.449 77.046 73.361 69.230
61.137
100. 82.358 77.929 74.222 70.065
61.918
1. Exploratory Data Analysis
1.3. EDA Techniques
1.3.6. Probability Distributions
1.3.6.7. Tables for Probability Distributions
1.3.6.7.5. Critical Values of the t* Distribution
How to
Use This
Table
This table contains upper critical values of the t* distribution
that are appropriate for determining whether or not a
calibration line is in a state of statistical control from
measurements on a check standard at three points in the
calibration interval. A test statistic with ν degrees of freedom
is compared with the critical value. If the absolute value of the
test statistic exceeds the tabled value, the calibration of the
instrument is judged to be out of control.
Upper critical values of t* distribution at significance level 0.05
for testing the output of a linear calibration line at 3 points
1 37.544 61 2.455
2 7.582 62 2.454
3 4.826 63 2.453
4 3.941 64 2.452
5 3.518 65 2.451
6 3.274 66 2.450
7 3.115 67 2.449
8 3.004 68 2.448
9 2.923 69 2.447
10 2.860 70 2.446
11 2.811 71 2.445
12 2.770 72 2.445
13 2.737 73 2.444
14 2.709 74 2.443
15 2.685 75 2.442
16 2.665 76 2.441
17 2.647 77 2.441
18 2.631 78 2.440
19 2.617 79 2.439
20 2.605 80 2.439
21 2.594 81 2.438
22 2.584 82 2.437
23 2.574 83 2.437
24 2.566 84 2.436
25 2.558 85 2.436
26 2.551 86 2.435
27 2.545 87 2.435
28 2.539 88 2.434
29 2.534 89 2.434
30 2.528 90 2.433
31 2.524 91 2.432
32 2.519 92 2.432
33 2.515 93 2.431
34 2.511 94 2.431
35 2.507 95 2.431
36 2.504 96 2.430
37 2.501 97 2.430
38 2.498 98 2.429
39 2.495 99 2.429
40 2.492 100 2.428
41 2.489 101 2.428
42 2.487 102 2.428
43 2.484 103 2.427
44 2.482 104 2.427
45 2.480 105 2.426
46 2.478 106 2.426
47 2.476 107 2.426
48 2.474 108 2.425
49 2.472 109 2.425
50 2.470 110 2.425
51 2.469 111 2.424
52 2.467 112 2.424
53 2.466 113 2.424
54 2.464 114 2.423
55 2.463 115 2.423
56 2.461 116 2.423
57 2.460 117 2.422
58 2.459 118 2.422
59 2.457 119 2.422
60 2.456 120 2.422
1. Exploratory Data Analysis
1.3. EDA Techniques
1.3.6. Probability Distributions
1.3.6.7. Tables for Probability Distributions
1.3.6.7.6. Critical Values of the Normal PPCC
Distribution
How to
Use This
Table
This table contains the critical values of the normal probability
plot correlation coefficient (PPCC) distribution that are
appropriate for determining whether or not a data set came
from a population with approximately a normal distribution. It
is used in conjunction with a normal probability plot. The test
statistic is the correlation coefficient of the points that make up
a normal probability plot. This test statistic is compared with
the critical value below. If the test statistic is less than the
tabulated value, the null hypothesis that the data came from a
population with a normal distribution is rejected.
For example, suppose a set of 50 data points had a correlation
coefficient of 0.985 from the normal probability plot. At the
5% significance level, the critical value is 0.9761. Since 0.985
is greater than 0.9761, we cannot reject the null hypothesis that
the data came from a population with a normal distribution.
Since perfect normality implies perfect correlation (i.e., a
correlation value of 1), we are only interested in rejecting
normality for correlation values that are too low. That is, this
is a lower one-tailed test.
The values in this table were determined from simulation
studies by Filliben and Devaney.
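For readers who want to compute the statistic directly, the following R sketch (an
illustration, not the handbook's Dataplot code) computes an approximate normal PPCC as the
correlation between the ordered data and normal plotting positions from ppoints, and compares
it with the tabled 5 % critical value for n = 50; the simulated data and the use of ppoints
rather than Filliben's order statistic medians are assumptions of the sketch.

    # Sketch in R: normal probability plot correlation coefficient (PPCC)
    set.seed(1)
    x <- rnorm(50)                            # stand-in data; replace with your own sample
    n <- length(x)
    ppcc <- cor(sort(x), qnorm(ppoints(n)))   # correlation of the normal probability plot
    ppcc
    ppcc < 0.9761                             # TRUE would reject normality at the 5 % level (n = 50)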
Critical values of the normal PPCC for testing if data come
from a normal distribution
N 0.01 0.05
3 0.8687 0.8790
4 0.8234 0.8666
5 0.8240 0.8786
6 0.8351 0.8880
7 0.8474 0.8970
8 0.8590 0.9043
9 0.8689 0.9115
10 0.8765 0.9173
11 0.8838 0.9223
12 0.8918 0.9267
13 0.8974 0.9310
14 0.9029 0.9343
15 0.9080 0.9376
16 0.9121 0.9405
17 0.9160 0.9433
18 0.9196 0.9452
19 0.9230 0.9479
20 0.9256 0.9498
21 0.9285 0.9515
22 0.9308 0.9535
23 0.9334 0.9548
24 0.9356 0.9564
25 0.9370 0.9575
26 0.9393 0.9590
27 0.9413 0.9600
28 0.9428 0.9615
29 0.9441 0.9622
30 0.9462 0.9634
31 0.9476 0.9644
32 0.9490 0.9652
33 0.9505 0.9661
34 0.9521 0.9671
35 0.9530 0.9678
36 0.9540 0.9686
37 0.9551 0.9693
38 0.9555 0.9700
39 0.9568 0.9704
40 0.9576 0.9712
41 0.9589 0.9719
42 0.9593 0.9723
43 0.9609 0.9730
44 0.9611 0.9734
45 0.9620 0.9739
46 0.9629 0.9744
47 0.9637 0.9748
48 0.9640 0.9753
49 0.9643 0.9758
50 0.9654 0.9761
55 0.9683 0.9781
60 0.9706 0.9797
65 0.9723 0.9809
70 0.9742 0.9822
75 0.9758 0.9831
80 0.9771 0.9841
85 0.9784 0.9850
90 0.9797 0.9857
95 0.9804 0.9864
100 0.9814 0.9869
110 0.9830 0.9881
120 0.9841 0.9889
130 0.9854 0.9897
140 0.9865 0.9904
150 0.9871 0.9909
160 0.9879 0.9915
170 0.9887 0.9919
180 0.9891 0.9923
190 0.9897 0.9927
200 0.9903 0.9930
210 0.9907 0.9933
220 0.9910 0.9936
230 0.9914 0.9939
240 0.9917 0.9941
250 0.9921 0.9943
260 0.9924 0.9945
270 0.9926 0.9947
280 0.9929 0.9949
290 0.9931 0.9951
300 0.9933 0.9952
310 0.9936 0.9954
320 0.9937 0.9955
330 0.9939 0.9956
340 0.9941 0.9957
350 0.9942 0.9958
360 0.9944 0.9959
370 0.9945 0.9960
380 0.9947 0.9961
390 0.9948 0.9962
400 0.9949 0.9963
410 0.9950 0.9964
420 0.9951 0.9965
430 0.9953 0.9966
440 0.9954 0.9966
450 0.9954 0.9967
460 0.9955 0.9968
470 0.9956 0.9968
480 0.9957 0.9969
490 0.9958 0.9969
500 0.9959 0.9970
525 0.9961 0.9972
550 0.9963 0.9973
575 0.9964 0.9974
600 0.9965 0.9975
625 0.9967 0.9976
650 0.9968 0.9977
675 0.9969 0.9977
700 0.9970 0.9978
725 0.9971 0.9979
750 0.9972 0.9980
775 0.9973 0.9980
800 0.9974 0.9981
825 0.9975 0.9981
850 0.9975 0.9982
875 0.9976 0.9982
900 0.9977 0.9983
925 0.9977 0.9983
950 0.9978 0.9984
975 0.9978 0.9984
1000 0.9979 0.9984
1. Exploratory Data Analysis
1.4. EDA Case Studies
Summary This section presents a series of case studies that demonstrate
the application of EDA methods to specific problems. In some
cases, we have focused on just one EDA technique that
uncovers virtually all there is to know about the data. For other
case studies, we need several EDA techniques, the selection of
which is dictated by the outcome of the previous step in the
analysis sequence. Note in these case studies how the flow of
the analysis is motivated by the focus on underlying
assumptions and general EDA principles.
Table of
Contents
for Section
4
1. Introduction
2. By Problem Category
1. Exploratory Data Analysis
1.4. EDA Case Studies
1.4.1. Case Studies Introduction
Purpose The purpose of the first eight case studies is to show how
EDA graphics and quantitative measures and tests are
applied to data from scientific processes and to critique
those data with regard to the following assumptions that
typically underlie a measurement process; namely, that the
data behave like:
random drawings
from a fixed distribution
with a fixed location
with a fixed standard deviation
Case studies 9 and 10 show the use of EDA techniques in
distributional modeling and the analysis of a designed
experiment, respectively.
If the above assumptions are satisfied, the process is said
to be statistically "in control" with the core characteristic
of having "predictability". That is, probability statements
can be made about the process, not only in the past, but
also in the future.
An appropriate model for an "in control" process is
Y_i = C + E_i
where C is a constant (the "deterministic" or "structural"
component), and where E_i is the error term (or "random"
component).
The constant C is the average value of the process--it is the
primary summary number which shows up on any report.
Although C is (assumed) fixed, it is unknown, and so a
primary analysis objective of the engineer is to arrive at an
estimate of C.
This goal partitions into 4 sub-goals:
1. Is the most common estimator of C, the sample mean Ȳ, the
best estimator for C? What does "best" mean?
2. If Ȳ is best, what is the uncertainty for Ȳ? In
particular, is the usual formula for the uncertainty of Ȳ,
s(Ȳ) = s/√N,
valid? Here, s is the standard deviation of the data
and N is the sample size.
3. If Ȳ is not the best estimator for C, what is a better
estimator for C (for example, median, midrange,
midmean)?
4. If there is a better estimator, what is its
uncertainty? That is, what is the standard deviation of
that estimator?
EDA and the routine checking of underlying assumptions
provide insight into all of the above.
1. Location and variation checks provide information
as to whether C is really constant.
2. Distributional checks indicate whether Ȳ is the best
estimator. Techniques for distributional checking
include histograms, normal probability plots, and
probability plot correlation coefficient plots.
3. Randomness checks ascertain whether the usual
s(Ȳ) = s/√N is valid.
4. Distributional tests assist in determining a better
estimator, if needed.
5. Simulator tools (namely bootstrapping) provide
values for the uncertainty of alternative estimators.
Assumptions
not satisfied
If one or more of the above assumptions is not satisfied,
then we use EDA techniques, or some mix of EDA and
classical techniques, to find a more appropriate model for
the data. That is,
Y_i = D + E_i
where D is the deterministic part and E is an error
component.
If the data are not random, then we may investigate fitting
some simple time series models to the data. If the constant
location and scale assumptions are violated, we may need
to investigate the measurement process to see if there is an
explanation.
The assumptions on the error term are still quite relevant
in the sense that for an appropriate model the error
component should follow the assumptions. The criterion
for validating the model, or comparing competing models,
is framed in terms of these assumptions.
Multivariable
data
Although the case studies in this chapter utilize univariate
data, the assumptions above are relevant for multivariable
data as well.
If the data are not univariate, then we are trying to find a
model
Y_i = F(X_1, ..., X_k) + E_i
where F is some function based on one or more variables.
The error component, which is a univariate data set, of a
good model should satisfy the assumptions given above.
The criterion for validating and comparing models is based
on how well the error component follows these
assumptions.
The load cell calibration case study in the process
modeling chapter shows an example of this in the
regression context.
First three
case studies
utilize data
with known
characteristics
The first three case studies utilize data that are randomly
generated from the following distributions:
normal distribution with mean 0 and standard deviation 1
uniform distribution with mean 0.5 and standard deviation
1/√12 ≈ 0.289 (uniform over the interval (0,1))
random walk
The other univariate case studies utilize data from
scientific processes. The goal is to determine if
Y_i = C + E_i
is a reasonable model. This is done by testing the
underlying assumptions. If the assumptions are satisfied,
then an estimate of C and an estimate of the uncertainty of
C are computed. If the assumptions are not satisfied, we
attempt to find a model where the error component does
satisfy the underlying assumptions.
Graphical
methods that
are applied to
the data
To test the underlying assumptions, each data set is
analyzed using four graphical methods that are particularly
suited for this purpose:
1. run sequence plot which is useful for detecting shifts
of location or scale
2. lag plot which is useful for detecting non-
randomness in the data
3. histogram which is useful for trying to determine the
underlying distribution
4. normal probability plot for deciding whether the data
follow the normal distribution
There are a number of other techniques for addressing the
underlying assumptions. However, the four plots listed
above provide an excellent opportunity for addressing all
of the assumptions on a single page of graphics.
Additional graphical techniques are used in certain case
studies to develop models that do have error components
that satisfy the underlying assumptions.
Quantitative
methods that
are applied to
the data
The normal and uniform random number data sets are also
analyzed with the following quantitative techniques, which
are explained in more detail in an earlier section:
1. Summary statistics which include:
mean
standard deviation
autocorrelation coefficient to test for
randomness
normal and uniform probability plot
correlation coefficients (ppcc) to test for a
normal or uniform distribution, respectively
Wilk-Shapiro test for a normal distribution
2. Linear fit of the data as a function of time to assess
drift (test for fixed location)
3. Bartlett test for fixed variance
4. Autocorrelation plot and coefficient to test for
randomness
5. Runs test to test for lack of randomness
6. Anderson-Darling test for a normal distribution
7. Grubbs test for outliers
8. Summary report
Although the graphical methods applied to the normal and
uniform random numbers are sufficient to assess the
validity of the underlying assumptions, the quantitative
techniques are used to show the different flavor of the
graphical and quantitative approaches.
The remaining case studies intermix one or more of these
quantitative techniques into the analysis where appropriate.
1. Exploratory Data Analysis
1.4. EDA Case Studies
1.4.2. Case Studies
Univariate
Y_i = C + E_i
    Normal Random Numbers
    Uniform Random Numbers
    Random Walk
    Josephson Junction Cryothermometry
    Beam Deflections
    Filter Transmittance
    Standard Resistor
    Heat Flow Meter 1
Reliability
    Fatigue Life of Aluminum Alloy Specimens
Multi-Factor
    Ceramic Strength
1. Exploratory Data Analysis
1.4. EDA Case Studies
1.4.2. Case Studies
1.4.2.1. Normal Random Numbers
Normal
Random
Numbers
This example illustrates the univariate analysis of a set of
normal random numbers.
1. Background and Data
2. Graphical Output and Interpretation
3. Quantitative Output and Interpretation
4. Work This Example Yourself
1. Exploratory Data Analysis
1.4. EDA Case Studies
1.4.2. Case Studies
1.4.2.1. Normal Random Numbers
1.4.2.1.1. Background and Data
Generation The normal random numbers used in this case study are from
a Rand Corporation publication.
The motivation for studying a set of normal random numbers
is to illustrate the ideal case where all four underlying
assumptions hold.
Software The analyses used in this case study can be generated using
both Dataplot code and R code.
Data The following is the set of normal random numbers used for
this case study.
-1.2760 -1.2180 -0.4530 -0.3500 0.7230
0.6760 -1.0990 -0.3140 -0.3940 -0.6330
-0.3180 -0.7990 -1.6640 1.3910 0.3820
0.7330 0.6530 0.2190 -0.6810 1.1290
-1.3770 -1.2570 0.4950 -0.1390 -0.8540
0.4280 -1.3220 -0.3150 -0.7320 -1.3480
2.3340 -0.3370 -1.9550 -0.6360 -1.3180
-0.4330 0.5450 0.4280 -0.2970 0.2760
-1.1360 0.6420 3.4360 -1.6670 0.8470
-1.1730 -0.3550 0.0350 0.3590 0.9300
0.4140 -0.0110 0.6660 -1.1320 -0.4100
-1.0770 0.7340 1.4840 -0.3400 0.7890
-0.4940 0.3640 -1.2370 -0.0440 -0.1110
-0.2100 0.9310 0.6160 -0.3770 -0.4330
1.0480 0.0370 0.7590 0.6090 -2.0430
-0.2900 0.4040 -0.5430 0.4860 0.8690
0.3470 2.8160 -0.4640 -0.6320 -1.6140
0.3720 -0.0740 -0.9160 1.3140 -0.0380
0.6370 0.5630 -0.1070 0.1310 -1.8080
-1.1260 0.3790 0.6100 -0.3640 -2.6260
2.1760 0.3930 -0.9240 1.9110 -1.0400
-1.1680 0.4850 0.0760 -0.7690 1.6070
-1.1850 -0.9440 -1.6040 0.1850 -0.2580
-0.3000 -0.5910 -0.5450 0.0180 -0.4850
0.9720 1.7100 2.6820 2.8130 -1.5310
-0.4900 2.0710 1.4440 -1.0920 0.4780
1.2100 0.2940 -0.2480 0.7190 1.1030
1.0900 0.2120 -1.1850 -0.3380 -1.1340
2.6470 0.7770 0.4500 2.2470 1.1510
-1.6760 0.3840 1.1330 1.3930 0.8140
0.3980 0.3180 -0.9280 2.4160 -0.9360
1.0360 0.0240 -0.5600 0.2030 -0.8710
0.8460 -0.6990 -0.3680 0.3440 -0.9260
-0.7970 -1.4040 -1.4720 -0.1180 1.4560
0.6540 -0.9550 2.9070 1.6880 0.7520
-0.4340 0.7460 0.1490 -0.1700 -0.4790
0.5220 0.2310 -0.6190 -0.2650 0.4190
0.5580 -0.5490 0.1920 -0.3340 1.3730
-1.2880 -0.5390 -0.8240 0.2440 -1.0700
0.0100 0.4820 -0.4690 -0.0900 1.1710
1.3720 1.7690 -1.0570 1.6460 0.4810
-0.6000 -0.5920 0.6100 -0.0960 -1.3750
0.8540 -0.5350 1.6070 0.4280 -0.6150
0.3310 -0.3360 -1.1520 0.5330 -0.8330
-0.1480 -1.1440 0.9130 0.6840 1.0430
0.5540 -0.0510 -0.9440 -0.4400 -0.2120
-1.1480 -1.0560 0.6350 -0.3280 -1.2210
0.1180 -2.0450 -1.9770 -1.1330 0.3380
0.3480 0.9700 -0.0170 1.2170 -0.9740
-1.2910 -0.3990 -1.2090 -0.2480 0.4800
0.2840 0.4580 1.3070 -1.6250 -0.6290
-0.5040 -0.0560 -0.1310 0.0480 1.8790
-1.0160 0.3600 -0.1190 2.3310 1.6720
-1.0530 0.8400 -0.2460 0.2370 -1.3120
1.6030 -0.9520 -0.5660 1.6000 0.4650
1.9510 0.1100 0.2510 0.1160 -0.9570
-0.1900 1.4790 -0.9860 1.2490 1.9340
0.0700 -1.3580 -1.2460 -0.9590 -1.2970
-0.7220 0.9250 0.7830 -0.4020 0.6190
1.8260 1.2720 -0.9450 0.4940 0.0500
-1.6960 1.8790 0.0630 0.1320 0.6820
0.5440 -0.4170 -0.6660 -0.1040 -0.2530
-2.5430 -1.3330 1.9870 0.6680 0.3600
1.9270 1.1830 1.2110 1.7650 0.3500
-0.3590 0.1930 -1.0230 -0.2220 -0.6160
-0.0600 -1.3190 0.7850 -0.4300 -0.2980
0.2480 -0.0880 -1.3790 0.2950 -0.1150
-0.6210 -0.6180 0.2090 0.9790 0.9060
-0.0990 -1.3760 1.0470 -0.8720 -2.2000
-1.3840 1.4250 -0.8120 0.7480 -1.0930
-0.4630 -1.2810 -2.5140 0.6750 1.1450
1.0830 -0.6670 -0.2230 -1.5920 -1.2780
0.5030 1.4340 0.2900 0.3970 -0.8370
-0.9730 -0.1200 -1.5940 -0.9960 -1.2440
-0.8570 -0.3710 -0.2160 0.1480 -2.1060
-1.4530 0.6860 -0.0750 -0.2430 -0.1700
-0.1220 1.1070 -1.0390 -0.6360 -0.8600
-0.8950 -1.4580 -0.5390 -0.1590 -0.4200
1.6320 0.5860 -0.4680 -0.3860 -0.3540
0.2030 -1.2340 2.3810 -0.3880 -0.0630
2.0720 -1.4450 -0.6800 0.2240 -0.1200
1.7530 -0.5710 1.2230 -0.1260 0.0340
-0.4350 -0.3750 -0.9850 -0.5850 -0.2030
-0.5560 0.0240 0.1260 1.2500 -0.6150
0.8760 -1.2270 -2.6470 -0.7450 1.7970
-1.2310 0.5470 -0.6340 -0.8360 -0.7190
0.8330 1.2890 -0.0220 -0.4310 0.5820
0.7660 -0.5740 -1.1530 0.5200 -1.0180
-0.8910 0.3320 -0.4530 -1.1270 2.0850
-0.7220 -1.5080 0.4890 -0.4960 -0.0250
0.6440 -0.2330 -0.1530 1.0980 0.7570
-0.0390 -0.4600 0.3930 2.0120 1.3560
0.1050 -0.1710 -0.1100 -1.1450 0.8780
-0.9090 -0.3280 1.0210 -1.6130 1.5600
-1.1920 1.7700 -0.0030 0.3690 0.0520
0.6470 1.0290 1.5260 0.2370 -1.3280
-0.0420 0.5530 0.7700 0.3240 -0.4890
-0.3670 0.3780 0.6010 -1.9960 -0.7380
0.4980 1.0720 1.5670 0.3020 1.1570
-0.7200 1.4030 0.6980 -0.3700 -0.5510
1. Exploratory Data Analysis
1.4. EDA Case Studies
1.4.2. Case Studies
1.4.2.1. Normal Random Numbers
1.4.2.1.2. Graphical Output and Interpretation
Goal The goal of this analysis is threefold:
1. Determine if the univariate model
   Y_i = C + E_i
   is appropriate and valid.
2. Determine if the typical underlying assumptions for
   an "in control" measurement process are valid. These
   assumptions are:
   1. random drawings;
   2. from a fixed distribution;
   3. with the distribution having a fixed location; and
   4. the distribution having a fixed scale.
3. Determine if the confidence interval
   Ȳ ± 2s/√N
   is appropriate and valid where s is the standard
   deviation of the original data.
4-Plot of
Data
Interpretation The assumptions are addressed by the graphics shown
above:
1. The run sequence plot (upper left) indicates that the
data do not have any significant shifts in location or
scale over time. The run sequence plot does not show
any obvious outliers.
2. The lag plot (upper right) does not indicate any non-
random pattern in the data.
3. The histogram (lower left) shows that the data are
reasonably symmetric, there do not appear to be
significant outliers in the tails, and that it is
reasonable to assume that the data are from
approximately a normal distribution.
4. The normal probability plot (lower right) verifies that
an assumption of normality is in fact reasonable.
From the above plots, we conclude that the underlying
assumptions are valid and the data follow approximately a
normal distribution. Therefore, the confidence interval form
given previously is appropriate for quantifying the
uncertainty of the population mean. The numerical values
for this model are given in the Quantitative Output and
Interpretation section.
Individual
Plots
Although it is usually not necessary, the plots can be
generated individually to give more detail.
Run
Sequence
Plot
Lag Plot
Histogram
(with
overlaid
Normal PDF)
Normal
Probability
Plot
1. Exploratory Data Analysis
1.4. EDA Case Studies
1.4.2. Case Studies
1.4.2.1. Normal Random Numbers
1.4.2.1.3. Quantitative Output and
Interpretation
Summary
Statistics
As a first step in the analysis, common summary statistics
are computed from the data.
Sample size = 500
Mean = -0.2935997E-02
Median = -0.9300000E-01
Minimum = -0.2647000E+01
Maximum = 0.3436000E+01
Range = 0.6083000E+01
Stan. Dev. = 0.1021041E+01
Location One way to quantify a change in location over time is to fit
a straight line to the data using an index variable as the
independent variable in the regression. For our data, we
assume that data are in sequential run order and that the
data were collected at equally spaced time intervals. In our
regression, we use the index variable X = 1, 2, ..., N, where
N is the number of observations. If there is no significant
drift in the location over time, the slope parameter should
be zero.
Coefficient    Estimate         Stan. Error    t-Value
B0              0.699127E-02    0.9155E-01      0.0764
B1             -0.396298E-04    0.3167E-03     -0.1251
Residual Standard Deviation = 1.02205
Residual Degrees of Freedom = 498
The absolute value of the t-value for the slope parameter is
smaller than the critical value of t(0.975, 498) = 1.96. Thus, we
conclude that the slope is not different from zero at the 0.05
significance level.
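A minimal R sketch of this drift check (assuming the 500 values have been read into a vector
named y, in run order) fits the same straight line against the index and examines the slope's
t-value:

    # Sketch in R: test for drift in location by regressing the data on its index
    x   <- seq_along(y)
    fit <- lm(y ~ x)
    summary(fit)$coefficients          # slope row gives the estimate, standard error, t-value
    qt(0.975, df = length(y) - 2)      # two-sided 0.05 critical value (about 1.96 for 498 df)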
Variation One simple way to detect a change in variation is with
Bartlett's test, after dividing the data set into several equal-
sized intervals. The choice of the number of intervals is
somewhat arbitrary, although values of four or eight are
reasonable. We will divide our data into four intervals.
H0: σ1² = σ2² = σ3² = σ4²
Ha: At least one σi² is not equal to the others.
Test statistic: T = 2.373660
Degrees of freedom: k - 1 = 3
Significance level: α = 0.05
Critical value: χ²(1-α, k-1) = 7.814728
Critical region: Reject H0 if T > 7.814728
In this case, Bartlett's test indicates that the variances are
not significantly different in the four intervals.
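In R, a comparable check (a sketch, again assuming the data are in a vector y) splits the
series into four equal intervals and applies bartlett.test:

    # Sketch in R: Bartlett's test for equal variances across four quarters of the data
    group <- cut(seq_along(y), breaks = 4, labels = FALSE)   # interval membership 1..4
    bartlett.test(y, factor(group))                          # compare statistic with qchisq(0.95, 3)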
Randomness There are many ways in which data can be non-random.
However, most common forms of non-randomness can be
detected with a few simple tests including the lag plot
shown on the previous page.
Another check is an autocorrelation plot that shows the
autocorrelations for various lags. Confidence bands can be
plotted at the 95 % and 99 % confidence levels. Points
outside this band indicate statistically significant values (lag
0 is always 1).
The lag 1 autocorrelation, which is generally the one of
most interest, is 0.045. The critical values at the 5 %
significance level are -0.087 and 0.087. Since 0.045 lies
within these limits, the lag 1 autocorrelation is not
statistically significant, so there is no evidence of non-
randomness.
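A short R sketch of this autocorrelation check (assuming the data are in y; the ±1.96/√N band
is the usual large-sample approximation):

    # Sketch in R: lag 1 autocorrelation with approximate 95 % bounds
    r  <- acf(y, lag.max = 1, plot = FALSE)$acf[2]   # lag 1 autocorrelation
    ci <- 1.96 / sqrt(length(y))                     # about 0.087 for N = 500
    c(r, -ci, ci)                                    # no evidence of non-randomness if r lies between the bounds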
A common test for randomness is the runs test.
H0: the sequence was produced in a random manner
Ha: the sequence was not produced in a random manner
Test statistic: Z = -1.0744
Significance level: α = 0.05
Critical value: Z(1-α/2) = 1.96
Critical region: Reject H0 if |Z| > 1.96
The runs test fails to reject the null hypothesis that the data
were produced in a random manner.
Distributional
Analysis
Probability plots are a graphical test for assessing if a
particular distribution provides an adequate fit to a data set.
A quantitative enhancement to the probability plot is the
correlation coefficient of the points on the probability plot,
or PPCC. For this data set the PPCC based on a normal
distribution is 0.996. Since the PPCC is greater than the
critical value of 0.987 (this is a tabulated value), the
normality assumption is not rejected.
Chi-square and Kolmogorov-Smirnov goodness-of-fit tests
are alternative methods for assessing distributional
adequacy. The Wilk-Shapiro and Anderson-Darling tests
can be used to test for normality. The results of the
Anderson-Darling test follow.
H0: the data are normally distributed
Ha: the data are not normally distributed
Adjusted test statistic: A² = 1.0612
Significance level: α = 0.05
Critical value: 0.787
Critical region: Reject H0 if A² > 0.787
The Anderson-Darling test rejects the normality assumption
at the 0.05 significance level.
Outlier
Analysis
A test for outliers is the Grubbs test.
H0: there are no outliers in the data
Ha: the maximum value is an outlier
Test statistic: G = 3.368068
Significance level: α = 0.05
Critical value for an upper one-tailed test: 3.863087
Critical region: Reject H0 if G > 3.863087
For this data set, Grubbs' test does not detect any outliers at
the 0.05 significance level.
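The Grubbs statistic itself is simple to compute; a hedged R sketch follows (the critical
value 3.863087 is taken from the text above, not computed here, and y is assumed to hold the
data):

    # Sketch in R: Grubbs' test statistic for the maximum value
    G <- (max(y) - mean(y)) / sd(y)
    G > 3.863087                       # TRUE would flag the maximum as an outlier at the 0.05 level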
Model Since the underlying assumptions were validated both
graphically and analytically, we conclude that a reasonable
model for the data is:
Y_i = C + E_i
where C is the estimated value of the mean, -0.00294. We
can express the uncertainty for C as a 95 % confidence
interval (-0.09266, 0.08678).
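The confidence interval quoted for C can be reproduced with a couple of lines of R (a sketch
assuming the data are in y):

    # Sketch in R: 95 % confidence interval for the mean of an "in control" process
    n  <- length(y)
    se <- sd(y) / sqrt(n)                              # standard deviation of the mean
    mean(y) + c(-1, 1) * qt(0.975, df = n - 1) * se    # approximately (-0.0927, 0.0868) for these data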
Univariate
Report
It is sometimes useful and convenient to summarize the
above results in a report.
Analysis of 500 normal random numbers
1: Sample Size                              = 500
2: Location
   Mean                                     = -0.00294
   Standard Deviation of Mean               = 0.045663
   95% Confidence Interval for Mean         = (-0.09266, 0.086779)
   Drift with respect to location?          = NO
3: Variation
   Standard Deviation                       = 1.021042
   95% Confidence Interval for SD           = (0.961437, 1.088585)
   Drift with respect to variation?
   (based on Bartlett's test on quarters
   of the data)                             = NO
4: Data are Normal?
   (as tested by Anderson-Darling)          = YES
5: Randomness
   Autocorrelation                          = 0.045059
   Data are Random?
   (as measured by autocorrelation)         = YES
6: Statistical Control
   (i.e., no drift in location or scale,
   data are random, distribution is fixed,
   here we are testing only for fixed
   normal)
   Data Set is in Statistical Control?      = YES
7: Outliers?
   (as determined by Grubbs' test)          = NO
1. Exploratory Data Analysis
1.4. EDA Case Studies
1.4.2. Case Studies
1.4.2.1. Normal Random Numbers
1.4.2.1.4. Work This Example Yourself
View
Dataplot
Macro for
this Case
Study
This page allows you to repeat the analysis outlined in the
case study description on the previous page using Dataplot. It
is required that you have already downloaded and installed
Dataplot and configured your browser to run Dataplot. Output
from each analysis step below will be displayed in one or
more of the Dataplot windows. The four main windows are the
Output window, the Graphics window, the Command History
window, and the data sheet window. Across the top of the
main windows there are menus for executing Dataplot
commands. Across the bottom is a command entry window
where commands can be typed in.
Data Analysis Steps
Results and
Conclusions
Click on the links below to start Dataplot and
run this case study yourself. Each step may use
results from previous steps, so please be patient.
Wait until the software verifies that the current
step is complete before clicking on the next step.
The links in this column
will connect you with
more detailed
information about each
analysis step from the
case study description.
1. Invoke Dataplot and read data.
   1. Read in the data.
   Results: You have read 1 column of numbers into Dataplot, variable Y.

2. 4-plot of the data.
   1. 4-plot of Y.
   Results: Based on the 4-plot, there are no shifts in location or scale,
   and the data seem to follow a normal distribution.

3. Generate the individual plots.
   1. Generate a run sequence plot.
   2. Generate a lag plot.
   3. Generate a histogram with an overlaid normal pdf.
   4. Generate a normal probability plot.
   Results:
   1. The run sequence plot indicates that there are no shifts of location or scale.
   2. The lag plot does not indicate any significant patterns (which would show the
      data were not random).
   3. The histogram indicates that a normal distribution is a good distribution
      for these data.
   4. The normal probability plot verifies that the normal distribution is a
      reasonable distribution for these data.
4. Generate summary statistics, quantitative analysis, and print a univariate report.
   1. Generate a table of summary statistics.
   2. Generate the mean, a confidence interval for the mean, and compute a linear
      fit to detect drift in location.
   3. Generate the standard deviation, a confidence interval for the standard
      deviation, and detect drift in variation by dividing the data into quarters
      and computing Bartlett's test for equal standard deviations.
   4. Check for randomness by generating an autocorrelation plot and a runs test.
   5. Check for normality by computing the normal probability plot correlation
      coefficient.
   6. Check for outliers using Grubbs' test.
   7. Print a univariate report (this assumes steps 2 thru 6 have already been run).
   Results:
   1. The summary statistics table displays 25+ statistics.
   2. The mean is -0.00294 and a 95% confidence interval is (-0.093, 0.087). The
      linear fit indicates no drift in location since the slope parameter is
      statistically not significant.
   3. The standard deviation is 1.02 with a 95% confidence interval of (0.96, 1.09).
      Bartlett's test indicates no significant change in variation.
   4. The lag 1 autocorrelation is 0.04. From the autocorrelation plot, this is
      within the 95% confidence interval bands.
   5. The normal probability plot correlation coefficient is 0.996. At the 5% level,
      we cannot reject the normality assumption.
   6. Grubbs' test detects no outliers at the 5% level.
   7. The results are summarized in a convenient report.
1. Exploratory Data Analysis
1.4. EDA Case Studies
1.4.2. Case Studies
1.4.2.2. Uniform Random Numbers
Uniform
Random
Numbers
This example illustrates the univariate analysis of a set of
uniform random numbers.
1. Background and Data
2. Graphical Output and Interpretation
3. Quantitative Output and Interpretation
4. Work This Example Yourself
1. Exploratory Data Analysis
1.4. EDA Case Studies
1.4.2. Case Studies
1.4.2.2. Uniform Random Numbers
1.4.2.2.1. Background and Data
Generation The uniform random numbers used in this case study are from
a Rand Corporation publication.
The motivation for studying a set of uniform random numbers
is to illustrate the effects of a known underlying non-normal
distribution.
Software The analyses used in this case study can be generated using
both Dataplot code and R code.
Data The following is the set of uniform random numbers used for
this case study.
.100973 .253376 .520135 .863467 .354876
.809590 .911739 .292749 .375420 .480564
.894742 .962480 .524037 .206361 .040200
.822916 .084226 .895319 .645093 .032320
.902560 .159533 .476435 .080336 .990190
.252909 .376707 .153831 .131165 .886767
.439704 .436276 .128079 .997080 .157361
.476403 .236653 .989511 .687712 .171768
.660657 .471734 .072768 .503669 .736170
.658133 .988511 .199291 .310601 .080545
.571824 .063530 .342614 .867990 .743923
.403097 .852697 .760202 .051656 .926866
.574818 .730538 .524718 .623885 .635733
.213505 .325470 .489055 .357548 .284682
.870983 .491256 .737964 .575303 .529647
.783580 .834282 .609352 .034435 .273884
.985201 .776714 .905686 .072210 .940558
.609709 .343350 .500739 .118050 .543139
.808277 .325072 .568248 .294052 .420152
.775678 .834529 .963406 .288980 .831374
.670078 .184754 .061068 .711778 .886854
.020086 .507584 .013676 .667951 .903647
.649329 .609110 .995946 .734887 .517649
.699182 .608928 .937856 .136823 .478341
.654811 .767417 .468509 .505804 .776974
.730395 .718640 .218165 .801243 .563517
.727080 .154531 .822374 .211157 .825314
.385537 .743509 .981777 .402772 .144323
.600210 .455216 .423796 .286026 .699162
.680366 .252291 .483693 .687203 .766211
.399094 .400564 .098932 .050514 .225685
.144642 .756788 .962977 .882254 .382145
.914991 .452368 .479276 .864616 .283554
.947508 .992337 .089200 .803369 .459826
.940368 .587029 .734135 .531403 .334042
.050823 .441048 .194985 .157479 .543297
.926575 .576004 .088122 .222064 .125507
.374211 .100020 .401286 .074697 .966448
.943928 .707258 .636064 .932916 .505344
.844021 .952563 .436517 .708207 .207317
.611969 .044626 .457477 .745192 .433729
.653945 .959342 .582605 .154744 .526695
.270799 .535936 .783848 .823961 .011833
.211594 .945572 .857367 .897543 .875462
.244431 .911904 .259292 .927459 .424811
.621397 .344087 .211686 .848767 .030711
.205925 .701466 .235237 .831773 .208898
.376893 .591416 .262522 .966305 .522825
.044935 .249475 .246338 .244586 .251025
.619627 .933565 .337124 .005499 .765464
.051881 .599611 .963896 .546928 .239123
.287295 .359631 .530726 .898093 .543335
.135462 .779745 .002490 .103393 .598080
.839145 .427268 .428360 .949700 .130212
.489278 .565201 .460588 .523601 .390922
.867728 .144077 .939108 .364770 .617429
.321790 .059787 .379252 .410556 .707007
.867431 .715785 .394118 .692346 .140620
.117452 .041595 .660000 .187439 .242397
.118963 .195654 .143001 .758753 .794041
.921585 .666743 .680684 .962852 .451551
.493819 .476072 .464366 .794543 .590479
.003320 .826695 .948643 .199436 .168108
.513488 .881553 .015403 .545605 .014511
.980862 .482645 .240284 .044499 .908896
.390947 .340735 .441318 .331851 .623241
.941509 .498943 .548581 .886954 .199437
.548730 .809510 .040696 .382707 .742015
.123387 .250162 .529894 .624611 .797524
.914071 .961282 .966986 .102591 .748522
.053900 .387595 .186333 .253798 .145065
.713101 .024674 .054556 .142777 .938919
.740294 .390277 .557322 .709779 .017119
.525275 .802180 .814517 .541784 .561180
.993371 .430533 .512969 .561271 .925536
.040903 .116644 .988352 .079848 .275938
.171539 .099733 .344088 .461233 .483247
.792831 .249647 .100229 .536870 .323075
.754615 .020099 .690749 .413887 .637919
.763558 .404401 .105182 .161501 .848769
.091882 .009732 .825395 .270422 .086304
.833898 .737464 .278580 .900458 .549751
.981506 .549493 .881997 .918707 .615068
.476646 .731895 .020747 .677262 .696229
.064464 .271246 .701841 .361827 .757687
.649020 .971877 .499042 .912272 .953750
.587193 .823431 .540164 .405666 .281310
.030068 .227398 .207145 .329507 .706178
.083586 .991078 .542427 .851366 .158873
.046189 .755331 .223084 .283060 .326481
.333105 .914051 .007893 .326046 .047594
.119018 .538408 .623381 .594136 .285121
.590290 .284666 .879577 .762207 .917575
.374161 .613622 .695026 .390212 .557817
.651483 .483470 .894159 .269400 .397583
.911260 .717646 .489497 .230694 .541374
.775130 .382086 .864299 .016841 .482774
.519081 .398072 .893555 .195023 .717469
.979202 .885521 .029773 .742877 .525165
.344674 .218185 .931393 .278817 .570568
1. Exploratory Data Analysis
1.4. EDA Case Studies
1.4.2. Case Studies
1.4.2.2. Uniform Random Numbers
1.4.2.2.2. Graphical Output and Interpretation
Goal The goal of this analysis is threefold:
1. Determine if the univariate model
   Y_i = C + E_i
   is appropriate and valid.
2. Determine if the typical underlying assumptions for
   an "in control" measurement process are valid. These
   assumptions are:
   1. random drawings;
   2. from a fixed distribution;
   3. with the distribution having a fixed location; and
   4. the distribution having a fixed scale.
3. Determine if the confidence interval
   Ȳ ± 2s/√N
   is appropriate and valid where s is the standard
   deviation of the original data.
4-Plot of
Data
Interpretation The assumptions are addressed by the graphics shown
above:
1. The run sequence plot (upper left) indicates that the
data do not have any significant shifts in location or
scale over time.
2. The lag plot (upper right) does not indicate any non-
random pattern in the data.
3. The histogram shows that the frequencies are
relatively flat across the range of the data. This
suggests that the uniform distribution might provide a
better distributional fit than the normal distribution.
4. The normal probability plot verifies that an
assumption of normality is not reasonable. In this
case, the 4-plot should be followed up by a uniform
probability plot to determine if it provides a better fit
to the data. This is shown below.
From the above plots, we conclude that the underlying
assumptions are valid. Therefore, the model Y_i = C + E_i is
valid. However, since the data are not normally distributed,
using the mean as an estimate of C and the confidence
interval cited above for quantifying its uncertainty are not
valid or appropriate.
Individual
Plots
Although it is usually not necessary, the plots can be
generated individually to give more detail.
Run
Sequence
Plot
Lag Plot
Histogram
(with
overlaid
Normal PDF)
This plot shows that a normal distribution is a poor fit. The
flatness of the histogram suggests that a uniform
distribution might be a better fit.
Histogram
(with
overlaid
Uniform
PDF)
Since the histogram from the 4-plot suggested that the
uniform distribution might be a good fit, we overlay a
uniform distribution on top of the histogram. This indicates
a much better fit than a normal distribution.
Normal
Probability
Plot
As with the histogram, the normal probability plot shows
that the normal distribution does not fit these data well.
Uniform
Probability
Plot
Since the above plots suggested that a uniform distribution
might be appropriate, we generate a uniform probability
plot. This plot shows that the uniform distribution provides
an excellent fit to the data.
Better Model Since the data follow the underlying assumptions, but with
a uniform distribution rather than a normal distribution, we
would still like to characterize C by a typical value plus or
minus a confidence interval. In this case, we would like to
find a location estimator with the smallest variability.
The bootstrap plot is an ideal tool for this purpose. The
following plots show the bootstrap plot, with the
corresponding histogram, for the mean, median, mid-range,
and median absolute deviation.
Bootstrap
Plots
Mid-Range is
Best
From the above histograms, it is obvious that for these data,
the mid-range is far superior to the mean or median as an
estimate for location.
Using the mean, the location estimate is 0.507 and a 95%
confidence interval for the mean is (0.482,0.534). Using the
mid-range, the location estimate is 0.499 and the 95%
confidence interval for the mid-range is (0.497,0.503).
Although the values for the location are similar, the
difference in the uncertainty intervals is quite large.
Note that in the case of a uniform distribution it is known
theoretically that the mid-range is the best linear unbiased
estimator for location. However, in many applications, the
most appropriate estimator will not be known or it will be
mathematically intractable to determine a valid confidence
interval. The bootstrap provides a method for determining
(and comparing) confidence intervals in these cases.
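A minimal R sketch of the bootstrap comparison (assuming the uniform data are in a vector y;
the number of resamples and the percentile-interval summary are illustrative choices, not the
handbook's exact settings):

    # Sketch in R: bootstrap percentile intervals for the mean and the mid-range
    set.seed(1)
    boot_stat <- function(stat, B = 1000)
      replicate(B, stat(sample(y, replace = TRUE)))
    midrange <- function(x) (min(x) + max(x)) / 2
    quantile(boot_stat(mean),     c(0.025, 0.975))   # wider interval for the mean
    quantile(boot_stat(midrange), c(0.025, 0.975))   # much narrower interval for uniform data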
1. Exploratory Data Analysis
1.4. EDA Case Studies
1.4.2. Case Studies
1.4.2.2. Uniform Random Numbers
1.4.2.2.3. Quantitative Output and
Interpretation
Summary
Statistics
As a first step in the analysis, common summary statistics
are computed for the data.
Sample size = 500
Mean = 0.5078304
Median = 0.5183650
Minimum = 0.0024900
Maximum = 0.9970800
Range = 0.9945900
Stan. Dev. = 0.2943252
Because the graphs of the data indicate the data may not be
normally distributed, we also compute two other statistics
for the data, the normal PPCC and the uniform PPCC.
Normal PPCC = 0.9771602
Uniform PPCC = 0.9995682
The uniform probability plot correlation coefficient (PPCC)
value is larger than the normal PPCC value. This is
evidence that the uniform distribution fits these data better
than does a normal distribution.
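These two PPCC values can be approximated in R as the correlations of the ordered data with
normal and uniform plotting positions (a sketch assuming the data are in y; the handbook's
Dataplot computation may use slightly different plotting positions):

    # Sketch in R: normal and uniform probability plot correlation coefficients
    p <- ppoints(length(y))                 # approximate plotting positions
    cor(sort(y), qnorm(p))                  # normal PPCC  (about 0.977 here)
    cor(sort(y), qunif(p))                  # uniform PPCC (about 0.9996 here)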
Location One way to quantify a change in location over time is to fit
a straight line to the data using an index variable as the
independent variable in the regression. For our data, we
assume that data are in sequential run order and that the
data were collected at equally spaced time intervals. In our
regression, we use the index variable X = 1, 2, ..., N, where
N is the number of observations. If there is no significant
drift in the location over time, the slope parameter should
be zero.
Coefficient    Estimate         Stan. Error    t-Value
B0              0.522923        0.2638E-01      19.82
B1             -0.602478E-04    0.9125E-04      -0.66
Residual Standard Deviation = 0.2944917
Residual Degrees of Freedom = 498
The absolute value of the t-value for the slope parameter, 0.66,
is smaller than the critical value of t(0.975, 498) = 1.96. Thus,
we conclude that the slope is not different from zero at the 0.05
significance level.
Variation One simple way to detect a change in variation is with a
Bartlett test after dividing the data set into several equal-
sized intervals. However, the Bartlett test is not robust for
non-normality. Since we know this data set is not
approximated well by the normal distribution, we use the
alternative Levene test. In particular, we use the Levene
test based on the median rather than the mean. The choice of the
number of intervals is somewhat arbitrary, although values
of four or eight are reasonable. We will divide our data into
four intervals.
H0: σ1² = σ2² = σ3² = σ4²
Ha: At least one σi² is not equal to the others.
Test statistic: W = 0.07983
Degrees of freedom: k - 1 = 3
Significance level: α = 0.05
Critical value: F(α, k-1, N-k) = 2.623
Critical region: Reject H0 if W > 2.623
In this case, the Levene test indicates that the variances are
not significantly different in the four intervals.
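The median-based Levene test (often called the Brown-Forsythe variant) can be sketched in
base R without extra packages; this is an illustration under the assumption that the data are
in a vector y, not the handbook's Dataplot code:

    # Sketch in R: Levene test (median-based) across four quarters of the data
    group <- factor(cut(seq_along(y), breaks = 4, labels = FALSE))
    z     <- abs(y - ave(y, group, FUN = median))    # absolute deviations from each group median
    anova(lm(z ~ group))                             # F statistic; compare with qf(0.95, 3, length(y) - 4)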
Randomness There are many ways in which data can be non-random.
However, most common forms of non-randomness can be
detected with a few simple tests including the lag plot
shown on the previous page.
Another check is an autocorrelation plot that shows the
autocorrelations for various lags. Confidence bands can be
plotted using 95% and 99% confidence levels. Points
outside this band indicate statistically significant values (lag
0 is always 1).
The lag 1 autocorrelation, which is generally the one of
most interest, is 0.03. The critical values at the 5 %
significance level are -0.087 and 0.087. This indicates that
the lag 1 autocorrelation is not statistically significant, so
there is no evidence of non-randomness.
A common test for randomness is the runs test.
H0: the sequence was produced in a random manner
Ha: the sequence was not produced in a random manner
Test statistic: Z = 0.2686
Significance level: α = 0.05
Critical value: Z(1-α/2) = 1.96
Critical region: Reject H0 if |Z| > 1.96
The runs test fails to reject the null hypothesis that the data
were produced in a random manner.
Distributional
Analysis
Probability plots are a graphical test of assessing whether a
particular distribution provides an adequate fit to a data set.
A quantitative enhancement to the probability plot is the
correlation coefficient of the points on the probability plot,
or PPCC. For this data set the PPCC based on a normal
distribution is 0.977. Since the PPCC is less than the critical
value of 0.987 (this is a tabulated value), the normality
assumption is rejected.
Chi-square and Kolmogorov-Smirnov goodness-of-fit tests
are alternative methods for assessing distributional
adequacy. The Wilk-Shapiro and Anderson-Darling tests
can be used to test for normality. The results of the
Anderson-Darling test follow.
H0: the data are normally distributed
Ha: the data are not normally distributed
Adjusted test statistic: A² = 5.765
Significance level: α = 0.05
Critical value: 0.787
Critical region: Reject H0 if A² > 0.787
The Anderson-Darling test rejects the normality assumption
because the value of the test statistic, 5.765, is larger than
the critical value of 0.787 at the 0.05 significance level.
Model Based on the graphical and quantitative analysis, we use the
model
Y_i = C + E_i
where C is estimated by the mid-range and the uncertainty
interval for C is based on a bootstrap analysis. Specifically,
C = 0.499
95% confidence limit for C = (0.497,0.503)
Univariate
Report
It is sometimes useful and convenient to summarize the
above results in a report.
Analysis for 500 uniform random numbers

1: Sample Size                              = 500

2: Location
   Mean                                     = 0.50783
   Standard Deviation of Mean               = 0.013163
   95% Confidence Interval for Mean         = (0.48197, 0.533692)
   Drift with respect to location?          = NO

3: Variation
   Standard Deviation                       = 0.294326
   95% Confidence Interval for SD           = (0.277144, 0.313796)
   Drift with respect to variation?
   (based on Levene's test on quarters
   of the data)                             = NO

4: Distribution
   Normal PPCC                              = 0.9771602
   Data are Normal?
   (as measured by Normal PPCC)             = NO
   Uniform PPCC                             = 0.9995682
   Data are Uniform?
   (as measured by Uniform PPCC)            = YES

5: Randomness
   Autocorrelation                          = -0.03099
   Data are Random?
   (as measured by autocorrelation)         = YES

6: Statistical Control
   (i.e., no drift in location or scale,
   data is random, distribution is fixed,
   here we are testing only for fixed uniform)
   Data Set is in Statistical Control?      = YES
1. Exploratory Data Analysis
1.4. EDA Case Studies
1.4.2. Case Studies
1.4.2.2. Uniform Random Numbers
1.4.2.2.4. Work This Example Yourself
View
Dataplot
Macro for
this Case
Study
This page allows you to repeat the analysis outlined in the case study
description on the previous page using Dataplot. It is required that you
have already downloaded and installed Dataplot and configured your
browser to run Dataplot. Output from each analysis step below will be
displayed in one or more of the Dataplot windows. The four main windows
are the Output window, the Graphics window, the Command History window,
and the data sheet window. Across the top of the main windows there are
menus for executing Dataplot commands. Across the bottom is a command
entry window where commands can be typed in.
Data Analysis Steps / Results and Conclusions

Click on the links below to start Dataplot and run this case study
yourself. Each step may use results from previous steps, so please be
patient. Wait until the software verifies that the current step is
complete before clicking on the next step. The links in the Results and
Conclusions column will connect you with more detailed information about
each analysis step from the case study description.

1. Invoke Dataplot and read data.

   1. Read in the data.

      1. You have read 1 column of numbers into Dataplot, variable Y.

2. 4-plot of the data.

   1. 4-plot of Y.

      1. Based on the 4-plot, there are no shifts in location or scale,
         and the data do not seem to follow a normal distribution.

3. Generate the individual plots.

   1. Generate a run sequence plot.
   2. Generate a lag plot.
   3. Generate a histogram with an overlaid normal pdf.
   4. Generate a histogram with an overlaid uniform pdf.
   5. Generate a normal probability plot.
   6. Generate a uniform probability plot.

      1. The run sequence plot indicates that there are no shifts of
         location or scale.
      2. The lag plot does not indicate any significant patterns (which
         would show the data were not random).
      3. The histogram indicates that a normal distribution is not a good
         distribution for these data.
      4. The histogram indicates that a uniform distribution is a good
         distribution for these data.
      5. The normal probability plot verifies that the normal
         distribution is not a reasonable distribution for these data.
      6. The uniform probability plot verifies that the uniform
         distribution is a reasonable distribution for these data.

4. Generate the bootstrap plot.

   1. Generate a bootstrap plot.

      1. The bootstrap plot clearly shows the superiority of the
         mid-range over the mean and median as the location estimator of
         choice for this problem.

5. Generate summary statistics, quantitative analysis, and print a
   univariate report.

   1. Generate a table of summary statistics.
   2. Generate the mean, a confidence interval for the mean, and compute
      a linear fit to detect drift in location.
   3. Generate the standard deviation, a confidence interval for the
      standard deviation, and detect drift in variation by dividing the
      data into quarters and computing Bartlett's test for equal standard
      deviations.
   4. Check for randomness by generating an autocorrelation plot and a
      runs test.
   5. Check for normality by computing the normal probability plot
      correlation coefficient.
   6. Print a univariate report (this assumes steps 2 thru 6 have already
      been run).

      1. The summary statistics table displays 25+ statistics.
      2. The mean is 0.5078 and a 95% confidence interval is
         (0.482,0.534). The linear fit indicates no drift in location
         since the slope parameter is statistically not significant.
      3. The standard deviation is 0.29 with a 95% confidence interval of
         (0.277,0.314). Levene's test indicates no significant drift in
         variation.
      4. The lag 1 autocorrelation is -0.03. From the autocorrelation
         plot, this is within the 95% confidence interval bands.
      5. The uniform probability plot correlation coefficient is 0.9995.
         This indicates that the uniform distribution is a good fit.
      6. The results are summarized in a convenient report.
1. Exploratory Data Analysis
1.4. EDA Case Studies
1.4.2. Case Studies
1.4.2.3. Random Walk
Random
Walk
This example illustrates the univariate analysis of a set of
numbers derived from a random walk.
1. Background and Data
2. Test Underlying Assumptions
3. Develop Better Model
4. Validate New Model
5. Work This Example Yourself
1. Exploratory Data Analysis
1.4. EDA Case Studies
1.4.2. Case Studies
1.4.2.3. Random Walk
1.4.2.3.1. Background and Data
Generation A random walk can be generated from a set of uniform random
numbers by the formula

    Yi = SUM[j = 1 to i] (Uj - 0.5)

where U is a set of uniform random numbers; that is, the walk is the
cumulative sum of the centered uniform values.
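A minimal R sketch of this construction is shown below. Centering the uniforms at zero is an assumption, consistent with the later description of the random walk as a cumulative sum of uniformly distributed values.

    # Sketch: generate a 500-point random walk from uniform random numbers.
    # The centering at zero and the seed are assumptions.
    set.seed(42)                  # arbitrary seed, for reproducibility
    u <- runif(500)               # uniform random numbers on (0, 1)
    y <- cumsum(u - 0.5)          # random walk values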
The motivation for studying a set of random walk data is to
illustrate the effects of a known underlying autocorrelation
structure (i.e., non-randomness) in the data.
Software The analyses used in this case study can be generated using
both Dataplot code and R code.
Data The following is the set of random walk numbers used for this
case study.
-0.399027
-0.645651
-0.625516
-0.262049
-0.407173
-0.097583
0.314156
0.106905
-0.017675
-0.037111
0.357631
0.820111
0.844148
0.550509
0.090709
0.413625
-0.002149
0.393170
0.538263
0.070583
0.473143
0.132676
0.109111
-0.310553
0.179637
-0.067454
-0.190747
-0.536916
-0.905751
-0.518984
-0.579280
-0.643004
-1.014925
-0.517845
-0.860484
-0.884081
-1.147428
-0.657917
-0.470205
-0.798437
-0.637780
-0.666046
-1.093278
-1.089609
-0.853439
-0.695306
-0.206795
-0.507504
-0.696903
-1.116358
-1.044534
-1.481004
-1.638390
-1.270400
-1.026477
-1.123380
-0.770683
-0.510481
-0.958825
-0.531959
-0.457141
-0.226603
-0.201885
-0.078000
0.057733
-0.228762
-0.403292
-0.414237
-0.556689
-0.772007
-0.401024
-0.409768
-0.171804
-0.096501
-0.066854
0.216726
0.551008
0.660360
0.194795
-0.031321
0.453880
0.730594
1.136280
0.708490
1.149048
1.258757
1.102107
1.102846
0.720896
0.764035
1.072312
0.897384
0.965632
0.759684
0.679836
0.955514
1.290043
1.753449
1.542429
1.873803
2.043881
1.728635
1.289703
1.501481
1.888335
1.408421
1.416005
0.929681
1.097632
1.501279
1.650608
1.759718
2.255664
2.490551
2.508200
2.707382
2.816310
3.254166
2.890989
2.869330
3.024141
3.291558
3.260067
3.265871
3.542845
3.773240
3.991880
3.710045
4.011288
4.074805
4.301885
3.956416
4.278790
3.989947
4.315261
4.200798
4.444307
4.926084
4.828856
4.473179
4.573389
4.528605
4.452401
4.238427
4.437589
4.617955
4.370246
4.353939
4.541142
4.807353
4.706447
4.607011
4.205943
3.756457
3.482142
3.126784
3.383572
3.846550
4.228803
4.110948
4.525939
4.478307
4.457582
4.822199
4.605752
5.053262
5.545598
5.134798
5.438168
5.397993
5.838361
5.925389
6.159525
6.190928
6.024970
5.575793
5.516840
5.211826
4.869306
4.912601
5.339177
5.415182
5.003303
4.725367
4.350873
4.225085
3.825104
3.726391
3.301088
3.767535
4.211463
4.418722
4.554786
4.987701
4.993045
5.337067
5.789629
5.726147
5.934353
5.641670
5.753639
5.298265
5.255743
5.500935
5.434664
5.588610
6.047952
6.130557
5.785299
5.811995
5.582793
5.618730
5.902576
6.226537
5.738371
5.449965
5.895537
6.252904
6.650447
7.025909
6.770340
7.182244
6.941536
7.368996
7.293807
7.415205
7.259291
6.970976
7.319743
6.850454
6.556378
6.757845
6.493083
6.824855
6.533753
6.410646
6.502063
6.264585
6.730889
6.753715
6.298649
6.048126
5.794463
5.539049
5.290072
5.409699
5.843266
5.680389
5.185889
5.451353
5.003233
5.102844
5.566741
5.613668
5.352791
5.140087
4.999718
5.030444
5.428537
5.471872
5.107334
5.387078
4.889569
4.492962
4.591042
4.930187
4.857455
4.785815
5.235515
4.865727
4.855005
4.920206
4.880794
4.904395
4.795317
5.163044
4.807122
5.246230
5.111000
5.228429
5.050220
4.610006
4.489258
4.399814
4.606821
4.974252
5.190037
5.084155
5.276501
4.917121
4.534573
4.076168
4.236168
3.923607
3.666004
3.284967
2.980621
2.623622
2.882375
3.176416
3.598001
3.764744
3.945428
4.408280
4.359831
4.353650
4.329722
4.294088
4.588631
4.679111
4.182430
4.509125
4.957768
4.657204
4.325313
4.338800
4.720353
4.235756
4.281361
3.795872
4.276734
4.259379
3.999663
3.544163
3.953058
3.844006
3.684740
3.626058
3.457909
3.581150
4.022659
4.021602
4.070183
4.457137
4.156574
4.205304
4.514814
4.055510
3.938217
4.180232
3.803619
3.553781
3.583675
3.708286
4.005810
4.419880
4.881163
5.348149
4.950740
5.199262
4.753162
4.640757
4.327090
4.080888
3.725953
3.939054
3.463728
3.018284
2.661061
3.099980
3.340274
3.230551
3.287873
3.497652
3.014771
3.040046
3.342226
3.656743
3.698527
3.759707
4.253078
4.183611
4.196580
4.257851
4.683387
4.224290
3.840934
4.329286
3.909134
3.685072
3.356611
2.956344
2.800432
2.761665
2.744913
3.037743
2.787390
2.387619
2.424489
2.247564
2.502179
2.022278
2.213027
2.126914
2.264833
2.528391
2.432792
2.037974
1.699475
2.048244
1.640126
1.149858
1.475253
1.245675
0.831979
1.165877
1.403341
1.181921
1.582379
1.632130
2.113636
2.163129
2.545126
2.963833
3.078901
3.055547
3.287442
2.808189
2.985451
3.181679
2.746144
2.517390
2.719231
2.581058
2.838745
2.987765
3.459642
3.458684
3.870956
4.324706
4.411899
4.735330
4.775494
4.681160
4.462470
3.992538
3.719936
3.427081
3.256588
3.462766
3.046353
3.537430
3.579857
3.931223
3.590096
3.136285
3.391616
3.114700
2.897760
2.724241
2.557346
2.971397
2.479290
2.305336
1.852930
1.471948
1.510356
1.633737
1.727873
1.512994
1.603284
1.387950
1.767527
2.029734
2.447309
2.321470
2.435092
2.630118
2.520330
2.578147
2.729630
2.713100
3.107260
2.876659
2.774242
3.185503
3.403148
3.392646
3.123339
3.164713
3.439843
3.321929
3.686229
3.203069
3.185843
3.204924
3.102996
3.496552
3.191575
3.409044
3.888246
4.273767
3.803540
4.046417
4.071581
3.916256
3.634441
4.065834
3.844651
3.915219
1. Exploratory Data Analysis
1.4. EDA Case Studies
1.4.2. Case Studies
1.4.2.3. Random Walk
1.4.2.3.2. Test Underlying Assumptions
Goal The goal of this analysis is threefold:

1. Determine if the univariate model

       Yi = C + Ei

   is appropriate and valid.

2. Determine if the typical underlying assumptions for an "in control"
   measurement process are valid. These assumptions are:
   1. random drawings;
   2. from a fixed distribution;
   3. with the distribution having a fixed location; and
   4. the distribution having a fixed scale.

3. Determine if the confidence interval

       Ybar ± 2s/sqrt(N)

   is appropriate and valid, with s denoting the standard deviation of
   the original data.
4-Plot of Data
Interpretation The assumptions are addressed by the graphics shown
above:
1. The run sequence plot (upper left) indicates
significant shifts in location over time.
2. The lag plot (upper right) indicates significant non-
randomness in the data.
3. When the assumptions of randomness and constant
location and scale are not satisfied, the
distributional assumptions are not meaningful.
Therefore we do not attempt to make any
interpretation of the histogram (lower left) or the
normal probability plot (lower right).
From the above plots, we conclude that the underlying assumptions are
seriously violated. Therefore the Yi = C + Ei model is not valid.

When the randomness assumption is seriously violated, a time series
model may be appropriate. The lag plot often suggests a reasonable
model. For example, in this case the strongly linear appearance of the
lag plot suggests a model fitting Yi versus Yi-1 might be appropriate.
When the data are non-random, it is helpful to supplement the lag plot
with an autocorrelation plot and a spectral plot. Although in this case
the lag plot is enough to suggest an appropriate model, we provide the
autocorrelation and spectral plots for comparison.
Autocorrelation
Plot
When the lag plot indicates significant non-randomness, it
can be helpful to follow up with an autocorrelation plot.
This autocorrelation plot shows significant autocorrelation
at lags 1 through 100 in a linearly decreasing fashion.
Spectral Plot Another useful plot for non-random data is the spectral
plot.
This spectral plot shows a single dominant low frequency
peak.
Quantitative
Output
Although the 4-plot above clearly shows the violation of
the assumptions, we supplement the graphical output with
some quantitative measures.
Summary
Statistics
As a first step in the analysis, common summary statistics
are computed from the data.
Sample size = 500
Mean = 3.216681
Median = 3.612030
Minimum = -1.638390
Maximum = 7.415205
Range = 9.053595
Stan. Dev. = 2.078675
We also computed the autocorrelation to be 0.987, which
is evidence of a very strong autocorrelation.
Location One way to quantify a change in location over time is to
fit a straight line to the data using an index variable as the
independent variable in the regression. For our data, we
assume that data are in sequential run order and that the
data were collected at equally spaced time intervals. In
our regression, we use the index variable X = 1, 2, ..., N,
where N is the number of observations. If there is no
significant drift in the location over time, the slope
parameter should be zero.
  Coefficient     Estimate        Stan. Error     t-Value
  B0              1.83351         0.1721          10.650
  B1              0.552164E-02    0.5953E-03       9.275

  Residual Standard Deviation = 1.9214
  Residual Degrees of Freedom = 498

The t-value of the slope parameter, 9.275, is larger than the critical
value of t(0.975, 498) = 1.96. Thus, we conclude that the slope is
different from zero at the 0.05 significance level.
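A minimal R sketch of this regression, assuming the random walk values are in a vector y:

    # Sketch: fit Y against the index 1..N to test for drift in location.
    x   <- seq_along(y)
    fit <- lm(y ~ x)
    summary(fit)$coefficients   # slope row gives the estimate and t-value
    # Compare the slope t-value with qt(0.975, df = length(y) - 2), about 1.96.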
Variation One simple way to detect a change in variation is with a
Bartlett test after dividing the data set into several equal-
sized intervals. However, the Bartlett test is not robust for
non-normality. Since we know this data set is not
approximated well by the normal distribution, we use the
alternative Levene test. In particular, we use the Levene
test based on the median rather than the mean. The choice of
the number of intervals is somewhat arbitrary, although
values of four or eight are reasonable. We will divide our
data into four intervals.
H0: σ1² = σ2² = σ3² = σ4²
Ha: At least one σi² is not equal to the others.

Test statistic: W = 10.459
Degrees of freedom: k - 1 = 3
Significance level: α = 0.05
Critical value: F(α, k-1, N-k) = 2.623
Critical region: Reject H0 if W > 2.623
In this case, the Levene test indicates that the variances
are significantly different in the four intervals since the
test statistic of 10.459 is greater than the 95 % critical
value of 2.623. Therefore we conclude that the scale is not
constant.
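Because the median-based Levene statistic W is the one-way ANOVA F statistic computed on the absolute deviations from each interval's median, it can be sketched directly in R. The split into four contiguous, equal-sized intervals below mirrors the description above and assumes the sample size is divisible by four.

    # Sketch: Levene (Brown-Forsythe) test on four equal intervals.
    g <- factor(rep(1:4, each = length(y) / 4))   # four contiguous intervals
    z <- unlist(tapply(y, g, function(v) abs(v - median(v))))
    W <- anova(lm(z ~ g))["g", "F value"]         # Levene test statistic
    W
    qf(0.95, df1 = 3, df2 = length(y) - 4)        # critical value, about 2.62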
Randomness Although the lag 1 autocorrelation coefficient above
clearly shows the non-randomness, we show the output
from a runs test as well.
H0: the sequence was produced in a random manner
Ha: the sequence was not produced in a random manner

Test statistic: Z = -20.3239
Significance level: α = 0.05
Critical value: Z(1-α/2) = 1.96
Critical region: Reject H0 if |Z| > 1.96
The runs test rejects the null hypothesis that the data were
produced in a random manner at the 0.05 significance
level.
Distributional
Assumptions
Since the quantitative tests show that the assumptions of
randomness and constant location and scale are not met,
the distributional measures will not be meaningful.
Therefore these quantitative tests are omitted.
1. Exploratory Data Analysis
1.4. EDA Case Studies
1.4.2. Case Studies
1.4.2.3. Random Walk
1.4.2.3.3. Develop A Better Model
Lag Plot
Suggests
Better
Model
Since the underlying assumptions did not hold, we need to
develop a better model.
The lag plot showed a distinct linear pattern. Given the definition of
the lag plot, Yi versus Yi-1, a good candidate model is a model of the
form

    Yi = A0 + A1*Yi-1 + Ei
Fit Output The results of a linear fit of this model generated the
following results.

  Coefficient     Estimate     Stan. Error     t-Value
  A0              0.050165     0.024171          2.075
  A1              0.987087     0.006313        156.350

  Residual Standard Deviation = 0.2931
  Residual Degrees of Freedom = 497

The slope parameter, A1, has a t value of 156.350 which is statistically
significant. Also, the residual standard deviation is 0.2931. This can
be compared to the standard deviation shown in the summary table, which
is 2.078675. That is, the fit to the autoregressive model has reduced
the variability by a factor of 7.
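A minimal R sketch of this fit, regressing each value on the preceding one (assuming the series is in a vector y):

    # Sketch: fit Y[i] = A0 + A1 * Y[i-1] + E[i] by ordinary least squares.
    n      <- length(y)
    y_curr <- y[-1]                      # Y[i],   i = 2..n
    y_prev <- y[-n]                      # Y[i-1]
    ar_fit <- lm(y_curr ~ y_prev)
    summary(ar_fit)$coefficients         # estimates, standard errors, t-values
    summary(ar_fit)$sigma                # residual standard deviation, about 0.29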
Time
Series
Model
This model is an example of a time series model. More
extensive discussion of time series is given in the Process
Monitoring chapter.
1. Exploratory Data Analysis
1.4. EDA Case Studies
1.4.2. Case Studies
1.4.2.3. Random Walk
1.4.2.3.4. Validate New Model
Plot
Predicted
with Original
Data
The first step in verifying the model is to plot the predicted
values from the fit with the original data.
This plot indicates a reasonably good fit.
Test
Underlying
Assumptions
on the
Residuals
In addition to the plot of the predicted values, the residual
standard deviation from the fit also indicates a significant
improvement for the model. The next step is to validate the
underlying assumptions for the error component, or
residuals, from this model.
4-Plot of
Residuals
Interpretation The assumptions are addressed by the graphics shown
above:
1. The run sequence plot (upper left) indicates no
significant shifts in location or scale over time.
2. The lag plot (upper right) exhibits a random
appearance.
3. The histogram shows a relatively flat appearance.
This indicates that a uniform probability distribution
may be an appropriate model for the error component
(or residuals).
4. The normal probability plot clearly shows that the
normal distribution is not an appropriate model for
the error component.
A uniform probability plot can be used to further test the
suggestion that a uniform distribution might be a good
model for the error component.
Uniform
Probability
Plot of
Residuals
Since the uniform probability plot is nearly linear, this
verifies that a uniform distribution is a good model for the
error component.
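A uniform probability plot of the residuals, and its correlation coefficient, can be sketched in R as below; using the observed residual range as the assumed uniform limits is a simplification.

    # Sketch: uniform probability plot of the residuals from ar_fit above.
    res <- resid(ar_fit)
    p   <- ppoints(length(res))
    q   <- qunif(p, min = min(res), max = max(res))   # theoretical quantiles
    plot(q, sort(res), xlab = "Uniform quantiles", ylab = "Ordered residuals")
    cor(q, sort(res))    # a value near 1 supports the uniform model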
Conclusions Since the residuals from our model satisfy the underlying
assumptions, we conclude that

    Yi = A0 + A1*Yi-1 + Ei

where the Ei follow a uniform distribution is a good model for this data
set. We could simplify this model to

    Yi = Yi-1 + Ei

This has the advantage of simplicity (the current point is simply the
previous point plus a uniformly distributed error term).
Using
Scientific and
Engineering
Knowledge
In this case, the above model makes sense based on our
definition of the random walk. That is, a random walk is
the cumulative sum of uniformly distributed data points. It
makes sense that modeling the current point as the previous
point plus a uniformly distributed error term is about as
good as we can do. Although this case is a bit artificial in
that we knew how the data were constructed, it is common
and desirable to use scientific and engineering knowledge
of the process that generated the data in formulating and
testing models for the data. Quite often, several competing
models will produce nearly equivalent mathematical results.
In this case, selecting the model that best approximates the
scientific understanding of the process is a reasonable
choice.
Time Series
Model
This model is an example of a time series model. More
extensive discussion of time series is given in the Process
Monitoring chapter.
1. Exploratory Data Analysis
1.4. EDA Case Studies
1.4.2. Case Studies
1.4.2.3. Random Walk
1.4.2.3.5. Work This Example Yourself
View
Dataplot
Macro for
this Case
Study
This page allows you to repeat the analysis outlined in the case study
description on the previous page using Dataplot. It is required that you
have already downloaded and installed Dataplot and configured your
browser to run Dataplot. Output from each analysis step below will be
displayed in one or more of the Dataplot windows. The four main windows
are the Output window, the Graphics window, the Command History window,
and the data sheet window. Across the top of the main windows there are
menus for executing Dataplot commands. Across the bottom is a command
entry window where commands can be typed in.
Data Analysis Steps / Results and Conclusions

Click on the links below to start Dataplot and run this case study
yourself. Each step may use results from previous steps, so please be
patient. Wait until the software verifies that the current step is
complete before clicking on the next step. The links in the Results and
Conclusions column will connect you with more detailed information about
each analysis step from the case study description.

1. Invoke Dataplot and read data.

   1. Read in the data.

      1. You have read 1 column of numbers into Dataplot, variable Y.

2. Validate assumptions.

   1. 4-plot of Y.
   2. Generate a table of summary statistics.
   3. Generate a linear fit to detect drift in location.
   4. Detect drift in variation by dividing the data into quarters and
      computing Levene's test for equal standard deviations.
   5. Check for randomness by generating a runs test.

      1. Based on the 4-plot, there are shifts in location and scale and
         the data are not random.
      2. The summary statistics table displays 25+ statistics.
      3. The linear fit indicates drift in location since the slope
         parameter is statistically significant.
      4. Levene's test indicates significant drift in variation.
      5. The runs test indicates significant non-randomness.

3. Generate the randomness plots.

   1. Generate an autocorrelation plot.
   2. Generate a spectral plot.

      1. The autocorrelation plot shows significant autocorrelation at
         lag 1.
      2. The spectral plot shows a single dominant low frequency peak.

4. Fit Yi = A0 + A1*Yi-1 + Ei and validate.

   1. Generate the fit.
   2. Plot fitted line with original data.
   3. Generate a 4-plot of the residuals from the fit.
   4. Generate a uniform probability plot of the residuals.

      1. The residual standard deviation from the fit is 0.29 (compared
         to the standard deviation of 2.08 from the original data).
      2. The plot of the predicted values with the original data
         indicates a good fit.
      3. The 4-plot indicates that the assumptions of constant location
         and scale are valid. The lag plot indicates that the data are
         random. However, the histogram and normal probability plot
         indicate that the uniform distribution might be a better model
         for the residuals than the normal distribution.
      4. The uniform probability plot verifies that the residuals can be
         fit by a uniform distribution.
1. Exploratory Data Analysis
1.4. EDA Case Studies
1.4.2. Case Studies
1.4.2.4. Josephson Junction Cryothermometry
Josephson
Junction
Cryothermometry
This example illustrates the univariate analysis of
Josephson junction cryothermometry.
1. Background and Data
2. Graphical Output and Interpretation
3. Quantitative Output and Interpretation
4. Work This Example Yourself
1. Exploratory Data Analysis
1.4. EDA Case Studies
1.4.2. Case Studies
1.4.2.4. Josephson Junction Cryothermometry
1.4.2.4.1. Background and Data
Generation This data set was collected by Bob Soulen of NIST in
October, 1971 as a sequence of observations collected equi-
spaced in time from a volt meter to ascertain the process
temperature in a Josephson junction cryothermometry (low
temperature) experiment. The response variable is voltage
counts.
Motivation The motivation for studying this data set is to illustrate the
case where there is discreteness in the measurements, but the
underlying assumptions hold. In this case, the discreteness is
due to the data being integers.
Software The analyses used in this case study can be generated using
both Dataplot code and R code.
Data The following are the data used for this case study.
2899 2898 2898 2900 2898
2901 2899 2901 2900 2898
2898 2898 2898 2900 2898
2897 2899 2897 2899 2899
2900 2897 2900 2900 2899
2898 2898 2899 2899 2899
2899 2899 2898 2899 2899
2899 2902 2899 2900 2898
2899 2899 2899 2899 2899
2899 2900 2899 2900 2898
2901 2900 2899 2899 2899
2899 2899 2900 2899 2898
2898 2898 2900 2896 2897
2899 2899 2900 2898 2900
2901 2898 2899 2901 2900
2898 2900 2899 2899 2897
2899 2898 2899 2899 2898
2899 2897 2899 2899 2897
2899 2897 2899 2897 2897
2899 2897 2898 2898 2899
2897 2898 2897 2899 2899
2898 2898 2897 2898 2895
2897 2898 2898 2896 2898
2898 2897 2896 2898 2898
2897 2897 2898 2898 2896
2898 2898 2896 2899 2898
2898 2898 2899 2899 2898
2898 2899 2899 2899 2900
2900 2901 2899 2898 2898
2900 2899 2898 2901 2897
2898 2898 2900 2899 2899
2898 2898 2899 2898 2901
2900 2897 2897 2898 2898
2900 2898 2899 2898 2898
2898 2896 2895 2898 2898
2898 2898 2897 2897 2895
2897 2897 2900 2898 2896
2897 2898 2898 2899 2898
2897 2898 2898 2896 2900
2899 2898 2896 2898 2896
2896 2896 2897 2897 2896
2897 2897 2896 2898 2896
2898 2896 2897 2896 2897
2897 2898 2897 2896 2895
2898 2896 2896 2898 2896
2898 2898 2897 2897 2898
2897 2899 2896 2897 2899
2900 2898 2898 2897 2898
2899 2899 2900 2900 2900
2900 2899 2899 2899 2898
2900 2901 2899 2898 2900
2901 2901 2900 2899 2898
2901 2899 2901 2900 2901
2898 2900 2900 2898 2900
2900 2898 2899 2901 2900
2899 2899 2900 2900 2899
2900 2901 2899 2898 2898
2899 2896 2898 2897 2898
2898 2897 2897 2897 2898
2897 2899 2900 2899 2897
2898 2900 2900 2898 2898
2899 2900 2898 2900 2900
2898 2900 2898 2898 2898
2898 2898 2899 2898 2900
2897 2899 2898 2899 2898
2897 2900 2901 2899 2898
2898 2901 2898 2899 2897
2899 2897 2896 2898 2898
2899 2900 2896 2897 2897
2898 2899 2899 2898 2898
2897 2897 2898 2897 2897
2898 2898 2898 2896 2895
2898 2898 2898 2896 2898
2898 2898 2897 2897 2899
2896 2900 2897 2897 2898
2896 2897 2898 2898 2898
2897 2897 2898 2899 2897
2898 2899 2897 2900 2896
2899 2897 2898 2897 2900
2899 2900 2897 2897 2898
2897 2899 2899 2898 2897
2901 2900 2898 2901 2899
2900 2899 2898 2900 2900
2899 2898 2897 2900 2898
2898 2897 2899 2898 2900
2899 2898 2899 2897 2900
2898 2902 2897 2898 2899
2899 2899 2898 2897 2898
2897 2898 2899 2900 2900
2899 2898 2899 2900 2899
2900 2899 2899 2899 2899
2899 2898 2899 2899 2900
2902 2899 2900 2900 2901
2899 2901 2899 2899 2902
2898 2898 2898 2898 2899
2899 2900 2900 2900 2898
2899 2899 2900 2899 2900
2899 2900 2898 2898 2898
2900 2898 2899 2900 2899
2899 2900 2898 2898 2899
2899 2899 2899 2898 2898
2897 2898 2899 2897 2897
2901 2898 2897 2898 2899
2898 2897 2899 2898 2897
2898 2898 2897 2898 2899
2899 2899 2899 2900 2899
2899 2897 2898 2899 2900
2898 2897 2901 2899 2901
2898 2899 2901 2900 2900
2899 2900 2900 2900 2900
2901 2900 2901 2899 2897
2900 2900 2901 2899 2898
2900 2899 2899 2900 2899
2900 2899 2900 2899 2901
2900 2900 2899 2899 2898
2899 2900 2898 2899 2899
2901 2898 2898 2900 2899
2899 2898 2897 2898 2897
2899 2899 2899 2898 2898
2897 2898 2899 2897 2897
2899 2898 2898 2899 2899
2901 2899 2899 2899 2897
2900 2896 2898 2898 2900
2897 2899 2897 2896 2898
2897 2898 2899 2896 2899
2901 2898 2898 2896 2897
2899 2897 2898 2899 2898
2898 2898 2898 2898 2898
2899 2900 2899 2901 2898
2899 2899 2898 2900 2898
2899 2899 2901 2900 2901
2899 2901 2899 2901 2899
2900 2902 2899 2898 2899
2900 2899 2900 2900 2901
2900 2899 2901 2901 2899
2898 2901 2897 2898 2901
2900 2902 2899 2900 2898
2900 2899 2900 2899 2899
2899 2898 2900 2898 2899
2899 2899 2899 2898 2900
1. Exploratory Data Analysis
1.4. EDA Case Studies
1.4.2. Case Studies
1.4.2.4. Josephson Junction Cryothermometry
1.4.2.4.2. Graphical Output and Interpretation
Goal The goal of this analysis is threefold:

1. Determine if the univariate model

       Yi = C + Ei

   is appropriate and valid.

2. Determine if the typical underlying assumptions for an "in control"
   measurement process are valid. These assumptions are:
   1. random drawings;
   2. from a fixed distribution;
   3. with the distribution having a fixed location; and
   4. the distribution having a fixed scale.

3. Determine if the confidence interval

       Ybar ± 2s/sqrt(N)

   is appropriate and valid where s is the standard deviation of the
   original data.
4-Plot of
Data
Interpretation The assumptions are addressed by the graphics shown
above:
1. The run sequence plot (upper left) indicates that the
data do not have any significant shifts in location or
scale over time.
2. The lag plot (upper right) does not indicate any non-
random pattern in the data.
3. The histogram (lower left) shows that the data are reasonably
symmetric, that there do not appear to be any significant outliers in
the tails, and that it is reasonable to assume that the data can be fit
with a normal distribution.
4. The normal probability plot (lower right) is difficult
to interpret due to the fact that there are only a few
distinct values with many repeats.
The integer data with only a few distinct values and many
repeats accounts for the discrete appearance of several of
the plots (e.g., the lag plot and the normal probability plot).
In this case, the nature of the data makes the normal
probability plot difficult to interpret, especially since each
number is repeated many times. However, the histogram
indicates that a normal distribution should provide an
adequate model for the data.
From the above plots, we conclude that the underlying
assumptions are valid and the data can be reasonably
approximated with a normal distribution. Therefore, the
commonly used uncertainty standard is valid and
appropriate. The numerical values for this model are given
in the Quantitative Output and Interpretation section.
Individual
Plots
Although it is normally not necessary, the plots can be
generated individually to give more detail.
Run
Sequence
Plot
Lag Plot
Histogram
(with
overlaid
Normal PDF)
Normal
Probability
Plot
1. Exploratory Data Analysis
1.4. EDA Case Studies
1.4.2. Case Studies
1.4.2.4. Josephson Junction Cryothermometry
1.4.2.4.3. Quantitative Output and
Interpretation
Summary
Statistics
As a first step in the analysis, common summary statistics
were computed from the data.
Sample size = 700
Mean = 2898.562
Median = 2899.000
Minimum = 2895.000
Maximum = 2902.000
Range = 7.000
Stan. Dev. = 1.305
Because of the discrete nature of the data, we also compute
the normal PPCC.
Normal PPCC = 0.97484
Location One way to quantify a change in location over time is to fit
a straight line to the data using an index variable as the
independent variable in the regression. For our data, we
assume that data are in sequential run order and that the
data were collected at equally spaced time intervals. In our
regression, we use the index variable X = 1, 2, ..., N, where
N is the number of observations. If there is no significant
drift in the location over time, the slope parameter should
be zero.
  Coefficient     Estimate     Stan. Error     t-Value
  B0              2.898E+03    9.745E-02       29739.288
  B1              1.071E-03    2.409E-04           4.445

  Residual Standard Deviation = 1.288
  Residual Degrees of Freedom = 698

The slope parameter, B1, has a t value of 4.445 which is statistically
significant (the critical value is 1.96). However, the value of the
slope is 1.071E-03. Given that the slope is nearly zero, the assumption
of constant location is not seriously violated even though it is
statistically significant.
Variation One simple way to detect a change in variation is with a
Bartlett test after dividing the data set into several equal-sized
intervals. However, the Bartlett test is not robust for non-normality.
Since the nature of the data (a few distinct points repeated many times)
makes the normality assumption questionable, we use the alternative
Levene test. In particular, we use the Levene test based on the median
rather than the mean. The choice of the number of intervals is somewhat
arbitrary, although values of four or eight are reasonable. We will
divide our data into four intervals.
H0: σ1² = σ2² = σ3² = σ4²
Ha: At least one σi² is not equal to the others.

Test statistic: W = 1.43
Degrees of freedom: k - 1 = 3
Significance level: α = 0.05
Critical value: F(α, k-1, N-k) = 2.618
Critical region: Reject H0 if W > 2.618
Since the Levene test statistic value of 1.43 is less than the
95 % critical value of 2.618, we conclude that the variances
are not significantly different in the four intervals.
Randomness There are many ways in which data can be non-random.
However, most common forms of non-randomness can be
detected with a few simple tests. The lag plot in the
previous section is a simple graphical technique.
Another check is an autocorrelation plot that shows the
autocorrelations for various lags. Confidence bands can be
plotted at the 95 % and 99 % confidence levels. Points
outside this band indicate statistically significant values (lag
0 is always 1).
The lag 1 autocorrelation, which is generally the one of
most interest, is 0.31. The critical values at the 5 % level of
significance are -0.087 and 0.087. This indicates that the
lag 1 autocorrelation is statistically significant, so there is
some evidence for non-randomness.
A common test for randomness is the runs test.
H0: the sequence was produced in a random manner
Ha: the sequence was not produced in a random manner

Test statistic: Z = -13.4162
Significance level: α = 0.05
Critical value: Z(1-α/2) = 1.96
Critical region: Reject H0 if |Z| > 1.96
The runs test indicates non-randomness.
Although the runs test and lag 1 autocorrelation indicate some mild
non-randomness, it is not sufficient to reject the Yi = C + Ei model. At
least part of the non-randomness can be explained by the discrete nature
of the data.
Distributional
Analysis
Probability plots are a graphical test for assessing if a
particular distribution provides an adequate fit to a data set.
A quantitative enhancement to the probability plot is the
correlation coefficient of the points on the probability plot,
or PPCC. For this data set the PPCC based on a normal
distribution is 0.975. Since the PPCC is less than the critical
value of 0.987 (this is a tabulated value), the normality
assumption is rejected.
Chi-square and Kolmogorov-Smirnov goodness-of-fit tests
are alternative methods for assessing distributional
adequacy. The Wilk-Shapiro and Anderson-Darling tests
can be used to test for normality. The results of the
Anderson-Darling test follow.
H0: the data are normally distributed
Ha: the data are not normally distributed

Adjusted test statistic: A² = 16.858
Significance level: α = 0.05
Critical value: 0.787
Critical region: Reject H0 if A² > 0.787
The Anderson-Darling test rejects the normality assumption
because the test statistic, 16.858, is greater than the 95 %
critical value 0.787.
Although the data are not strictly normal, the violation of the
normality assumption is not severe enough to conclude that the
Yi = C + Ei model is unreasonable. At least part of the non-normality
can be explained by the discrete nature of the data.
Outlier
Analysis
A test for outliers is the Grubbs test.
H0: there are no outliers in the data
Ha: the maximum value is an outlier

Test statistic: G = 2.729201
Significance level: α = 0.05
Critical value for a one-tailed test: 3.950619
Critical region: Reject H0 if G > 3.950619
For this data set, Grubbs' test does not detect any outliers at
the 0.05 significance level.
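The Grubbs statistic itself is simple to sketch in R; the critical value 3.950619 is taken from the output above rather than recomputed here. The vector y is assumed to hold the 700 voltage counts.

    # Sketch: Grubbs statistic for the most extreme observation.
    # The critical value is quoted from the Handbook output, not computed.
    G <- max(abs(y - mean(y))) / sd(y)
    G              # about 2.73 for these data
    G > 3.950619   # FALSE: no outlier detected at the 0.05 level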
Model Although the randomness and normality assumptions were mildly
violated, we conclude that a reasonable model for the data is:

    Yi = 2898.562 + Ei

In addition, a 95 % confidence interval for the mean value is
(2898.515, 2898.928).
Univariate
Report
It is sometimes useful and convenient to summarize the
above results in a report.
Analysis for Josephson Junction Cryothermometry Data

1: Sample Size                              = 700

2: Location
   Mean                                     = 2898.562
   Standard Deviation of Mean               = 0.049323
   95% Confidence Interval for Mean         = (2898.465, 2898.658)
   Drift with respect to location?          = YES
   (Further analysis indicates that the
   drift, while statistically significant,
   is not practically significant)

3: Variation
   Standard Deviation                       = 1.30497
   95% Confidence Interval for SD           = (1.240007, 1.377169)
   Drift with respect to variation?
   (based on Levene's test on quarters
   of the data)                             = NO

4: Distribution
   Normal PPCC                              = 0.97484
   Data are Normal?
   (as measured by Normal PPCC)             = NO

5: Randomness
   Autocorrelation                          = 0.314802
   Data are Random?
   (as measured by autocorrelation)         = NO

6: Statistical Control
   (i.e., no drift in location or scale,
   data are random, distribution is fixed,
   here we are testing only for fixed normal)
   Data Set is in Statistical Control?      = NO

   Note: Although we have violations of the assumptions, they are mild
   enough, and at least partially explained by the discrete nature of
   the data, so we may model the data as if it were in statistical
   control.

7: Outliers?
   (as determined by Grubbs test)           = NO
1. Exploratory Data Analysis
1.4. EDA Case Studies
1.4.2. Case Studies
1.4.2.4. Josephson Junction Cryothermometry
1.4.2.4.4. Work This Example Yourself
View
Dataplot
Macro for
this Case
Study
This page allows you to repeat the analysis outlined in the case study
description on the previous page using Dataplot. It is required that you
have already downloaded and installed Dataplot and configured your
browser to run Dataplot. Output from each analysis step below will be
displayed in one or more of the Dataplot windows. The four main windows
are the Output window, the Graphics window, the Command History window,
and the data sheet window. Across the top of the main windows there are
menus for executing Dataplot commands. Across the bottom is a command
entry window where commands can be typed in.
Data Analysis Steps / Results and Conclusions

Click on the links below to start Dataplot and run this case study
yourself. Each step may use results from previous steps, so please be
patient. Wait until the software verifies that the current step is
complete before clicking on the next step. The links in the Results and
Conclusions column will connect you with more detailed information about
each analysis step from the case study description.

1. Invoke Dataplot and read data.

   1. Read in the data.

      1. You have read 1 column of numbers into Dataplot, variable Y.

2. 4-plot of the data.

   1. 4-plot of Y.

      1. Based on the 4-plot, there are no shifts in location or scale.
         Due to the nature of the data (a few distinct points with many
         repeats), the normality assumption is questionable.

3. Generate the individual plots.

   1. Generate a run sequence plot.
   2. Generate a lag plot.
   3. Generate a histogram with an overlaid normal pdf.
   4. Generate a normal probability plot.

      1. The run sequence plot indicates that there are no shifts of
         location or scale.
      2. The lag plot does not indicate any significant patterns (which
         would show the data were not random).
      3. The histogram indicates that a normal distribution is a good
         distribution for these data.
      4. The discrete nature of the data masks the normality or
         non-normality of the data somewhat. The plot indicates that a
         normal distribution provides a rough approximation for the data.

4. Generate summary statistics, quantitative analysis, and print a
   univariate report.

   1. Generate a table of summary statistics.
   2. Generate the mean, a confidence interval for the mean, and compute
      a linear fit to detect drift in location.
   3. Generate the standard deviation, a confidence interval for the
      standard deviation, and detect drift in variation by dividing the
      data into quarters and computing Levene's test for equal standard
      deviations.
   4. Check for randomness by generating an autocorrelation plot and a
      runs test.
   5. Check for normality by computing the normal probability plot
      correlation coefficient.
   6. Check for outliers using Grubbs' test.
   7. Print a univariate report (this assumes steps 2 thru 6 have already
      been run).

      1. The summary statistics table displays 25+ statistics.
      2. The mean is 2898.56 and a 95% confidence interval is
         (2898.46,2898.66). The linear fit indicates no meaningful drift
         in location since the value of the slope parameter is near zero.
      3. The standard deviation is 1.30 with a 95% confidence interval of
         (1.24,1.38). Levene's test indicates no significant drift in
         variation.
      4. The lag 1 autocorrelation is 0.31. This indicates some mild
         non-randomness.
      5. The normal probability plot correlation coefficient is 0.975. At
         the 5% level, we reject the normality assumption.
      6. Grubbs' test detects no outliers at the 5% level.
      7. The results are summarized in a convenient report.
1. Exploratory Data Analysis
1.4. EDA Case Studies
1.4.2. Case Studies
1.4.2.5. Beam Deflections
Beam
Deflection
This example illustrates the univariate analysis of beam
deflection data.
1. Background and Data
2. Test Underlying Assumptions
3. Develop a Better Model
4. Validate New Model
5. Work This Example Yourself
1. Exploratory Data Analysis
1.4. EDA Case Studies
1.4.2. Case Studies
1.4.2.5. Beam Deflections
1.4.2.5.1. Background and Data
Generation This data set was collected by H. S. Lew of NIST in 1969 to
measure steel-concrete beam deflections. The response
variable is the deflection of a beam from the center point.
The motivation for studying this data set is to show how the
underlying assumptions are affected by periodic data.
Data The following are the data used for this case study.
-213
-564
-35
-15
141
115
-420
-360
203
-338
-431
194
-220
-513
154
-125
-559
92
-21
-579
-52
99
-543
-175
162
-457
-346
204
-300
-474
164
-107
-572
-8
83
-541
-224
180
-420
-374
201
-236
-531
83
27
-564
-112
131
-507
-254
199
-311
-495
143
-46
-579
-90
136
-472
-338
202
-287
-477
169
-124
-568
17
48
-568
-135
162
-430
-422
172
-74
-577
-13
92
-534
-243
194
-355
-465
156
-81
-578
-64
139
-449
-384
193
-198
-538
110
-44
-577
-6
66
-552
-164
161
-460
-344
205
-281
-504
134
-28
-576
-118
156
-437
-381
200
-220
-540
83
11
-568
-160
172
-414
-408
188
-125
-572
-32
139
-492
-321
205
-262
-504
142
-83
-574
0
48
-571
-106
137
-501
-266
190
-391
-406
194
-186
-553
83
-13
-577
-49
103
-515
-280
201
300
-506
131
-45
-578
-80
138
-462
-361
201
-211
-554
32
74
-533
-235
187
-372
-442
182
-147
-566
25
68
-535
-244
194
-351
-463
174
-125
-570
15
72
-550
-190
172
-424
-385
198
-218
-536
96
1. Exploratory Data Analysis
1.4. EDA Case Studies
1.4.2. Case Studies
1.4.2.5. Beam Deflections
1.4.2.5.2. Test Underlying Assumptions
Goal The goal of this analysis is threefold:

1. Determine if the univariate model

       Yi = C + Ei

   is appropriate and valid.

2. Determine if the typical underlying assumptions for an "in control"
   measurement process are valid. These assumptions are:
   1. random drawings;
   2. from a fixed distribution;
   3. with the distribution having a fixed location; and
   4. the distribution having a fixed scale.

3. Determine if the confidence interval

       Ybar ± 2s/sqrt(N)

   is appropriate and valid where s is the standard deviation of the
   original data.
4-Plot of Data
Interpretation The assumptions are addressed by the graphics shown
above:
1. The run sequence plot (upper left) indicates that the
data do not have any significant shifts in location or
scale over time.
2. The lag plot (upper right) shows that the data are
not random. The lag plot further indicates the
presence of a few outliers.
3. When the randomness assumption is thus seriously
violated, the histogram (lower left) and normal
probability plot (lower right) are ignored since
determining the distribution of the data is only
meaningful when the data are random.
From the above plots we conclude that the underlying randomness
assumption is not valid. Therefore, the Yi = C + Ei model is not
appropriate.

We need to develop a better model. Non-random data can frequently be
modeled using time series methodology. Specifically, the circular
pattern in the lag plot indicates that a sinusoidal model might be
appropriate. The sinusoidal model will be developed in the next section.
Individual Plots The plots can be generated individually for more detail. In
this case, only the run sequence plot and the lag plot are
drawn since the distributional plots are not meaningful.
Run Sequence
Plot
Lag Plot
We have drawn some lines and boxes on the plot to better
isolate the outliers. The following data points appear to be
outliers based on the lag plot.
INDEX Y(i-1) Y(i)
158 -506.00 300.00
157 300.00 201.00
3 -15.00 -35.00
5 115.00 141.00
That is, the third, fifth, 157th, and 158th points appear to
be outliers.
Autocorrelation
Plot
When the lag plot indicates significant non-randomness, it
can be helpful to follow up with an autocorrelation plot.
This autocorrelation plot shows a distinct cyclic pattern.
As with the lag plot, this suggests a sinusoidal model.
Spectral Plot Another useful plot for non-random data is the spectral
plot.
This spectral plot shows a single dominant peak at a
frequency of 0.3. This frequency of 0.3 will be used in
fitting the sinusoidal model in the next section.
Quantitative
Results
Although the lag plot, autocorrelation plot, and spectral
plot clearly show the violation of the randomness
assumption, we supplement the graphical output with
some quantitative measures.
Summary
Statistics
As a first step in the analysis, summary statistics are
computed from the data.
Sample size = 200
Mean = -177.4350
Median = -162.0000
Minimum = -579.0000
Maximum = 300.0000
Range = 879.0000
Stan. Dev. = 277.3322
Location One way to quantify a change in location over time is to
fit a straight line to the data set using the index variable X
= 1, 2, ..., N, with N denoting the number of observations.
If there is no significant drift in the location, the slope
parameter should be zero.
  Coefficient     Estimate      Stan. Error     t-Value
  A0              -178.175      39.47           -4.514
  A1              0.7366E-02    0.34             0.022

  Residual Standard Deviation = 278.0313
  Residual Degrees of Freedom = 198

The slope parameter, A1, has a t value of 0.022 which is statistically
not significant. This indicates that the slope can in fact be considered
zero.
Variation One simple way to detect a change in variation is with a
Bartlett test after dividing the data set into several equal-sized
intervals. However, the Bartlett test is not robust for non-normality,
and the non-randomness of these data does not allow us to assume
normality, so we use the alternative Levene test. In particular, we use
the Levene test based on the median rather than the mean. The choice of
the number of intervals is somewhat arbitrary, although values of 4 or 8
are reasonable.
H0: σ1² = σ2² = σ3² = σ4²
Ha: At least one σi² is not equal to the others.

Test statistic: W = 0.09378
Degrees of freedom: k - 1 = 3
Sample size: N = 200
Significance level: α = 0.05
Critical value: F(α, k-1, N-k) = 2.651
Critical region: Reject H0 if W > 2.651
In this case, the Levene test indicates that the variances are not
significantly different in the four intervals since the test statistic
value, 0.09378, is less than the critical value of 2.651.
Randomness A runs test is used to check for randomness.

H0: the sequence was produced in a random manner
Ha: the sequence was not produced in a random manner

Test statistic: Z = 2.6938
Significance level: α = 0.05
Critical value: Z(1-α/2) = 1.96
Critical region: Reject H0 if |Z| > 1.96
The absolute value of the test statistic is larger than the
critical value at the 5 % significance level, so we conclude
that the data are not random.
Distributional
Assumptions
Since the quantitative tests show that the randomness assumption is not
met, the distributional measures will not be meaningful. Therefore these
quantitative tests are omitted.
1. Exploratory Data Analysis
1.4. EDA Case Studies
1.4.2. Case Studies
1.4.2.5. Beam Deflections
1.4.2.5.3. Develop a Better Model
Sinusoidal
Model
The lag plot and autocorrelation plot in the previous section strongly
suggested a sinusoidal model might be appropriate. The basic sinusoidal
model is:

    Yi = C + α*sin(2πωTi + φ) + Ei

where C is a constant defining a mean level, α is an amplitude for the
sine function, ω is the frequency, Ti is a time variable, and φ is the
phase. This sinusoidal model can be fit using non-linear least squares.

To obtain a good fit, sinusoidal models require good starting values for
C, the amplitude, and the frequency.
Good Starting
Value for C
A good starting value for C can be obtained by calculating the mean of
the data. If the data show a trend, i.e., the assumption of constant
location is violated, we can replace C with a linear or quadratic least
squares fit. That is, the model becomes

    Yi = (B0 + B1*Ti) + α*sin(2πωTi + φ) + Ei

or

    Yi = (B0 + B1*Ti + B2*Ti²) + α*sin(2πωTi + φ) + Ei

Since our data did not have any meaningful change of location, we can
fit the simpler model with C equal to the mean. From the summary output
in the previous page, the mean is -177.44.
Good Starting
Value for
Frequency
The starting value for the frequency can be obtained from the spectral
plot, which shows the dominant frequency is about 0.3.
Complex
Demodulation
Phase Plot
The complex demodulation phase plot can be used to refine this initial
estimate for the frequency.
For the complex demodulation plot, if the lines slope from left to right,
the frequency should be increased. If the lines slope from right to left, it
should be decreased. A relatively flat (i.e., horizontal) slope indicates a
good frequency. We could generate the demodulation phase plot for 0.3
and then use trial and error to obtain a better estimate for the frequency.
To simplify this, we generate 16 of these plots on a single page starting
with a frequency of 0.28, increasing in increments of 0.0025, and
stopping at 0.3175.
Interpretation The plots start with lines sloping from left to right but
gradually change to a right to left slope. The relatively flat slope
occurs for frequency 0.3025 (third row, second column). The complex
demodulation phase plot restricts the phase to the range from -π to π.
This is why the plot appears to show some breaks.
Good Starting
Values for
Amplitude
The complex demodulation amplitude plot is used to find a good starting
value for the amplitude. In addition, this plot indicates whether or not
the amplitude is constant over the entire range of the data or if it
varies. If the plot is essentially flat, i.e., zero slope, then it is
reasonable to assume a constant amplitude in the non-linear model.
However, if the slope varies over the range of the plot, we may need to
adjust the model to be:

    Yi = C + (B0 + B1*Ti)*sin(2πωTi + φ) + Ei

That is, we replace α with a function of time. A linear fit is specified
in the model above, but this can be replaced with a more elaborate
function if needed.
Complex
Demodulation
Amplitude
Plot
The complex demodulation amplitude plot for this data shows that:
1. The amplitude is fixed at approximately 390.
2. There is a short start-up effect.
3. There is a change in amplitude at around x=160 that should be
investigated for an outlier.
In terms of a non-linear model, the plot indicates that fitting a single
constant for α should be adequate for this data set.
Fit Results Using starting estimates of 0.3025 for the frequency, 390 for the
amplitude, and -177.44 for C, the following parameters were estimated.
Coefficient Estimate Stan. Error t-Value
C -178.786 11.02 -16.22
AMP -361.766 26.19 -13.81
FREQ 0.302596 0.1510E-03 2005.00
PHASE 1.46536 0.4909E-01 29.85
Residual Standard Deviation = 155.8484
Residual Degrees of Freedom = 196
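A corresponding non-linear least-squares fit can be sketched in R with nls.
The starting values below are the ones developed above; the starting phase of
zero is an extra assumption, and convergence can be sensitive to these choices.

# Sketch (R): fit Y = C + alpha*sin(2*pi*omega*t + phi) by non-linear least squares.
t   <- seq_along(y)
fit <- nls(y ~ C + amp * sin(2 * pi * freq * t + phase),
           start = list(C = -177.44, amp = 390, freq = 0.3025, phase = 0))
summary(fit)          # parameter estimates, standard errors, t-values
summary(fit)$sigma    # residual standard deviation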
Model From the fit results, our proposed model is:

    Ŷ_i = -178.786 - 361.766 sin(2π(0.302596) T_i + 1.46536)

We will evaluate the adequacy of this model in the next section.
1.4.2.5.4. Validate New Model
4-Plot of
Residuals
The first step in evaluating the fit is to generate a 4-plot of the residuals.
Interpretation The assumptions are addressed by the graphics shown above:
1. The run sequence plot (upper left) indicates that the data do not
have any significant shifts in location. There do seem to be
some shifts in scale. A start-up effect was detected previously by
the complex demodulation amplitude plot. There also appear to be
a few outliers.
2. The lag plot (upper right) shows that the data are random. The
outliers also appear in the lag plot.
3. The histogram (lower left) and the normal probability plot (lower
right) do not show any serious non-normality in the residuals.
However, the bend in the left portion of the normal probability
plot shows some cause for concern.
The 4-plot indicates that this fit is reasonably good. However, we will
attempt to improve the fit by removing the outliers.
Fit Results
with Outliers
Removed
The following parameter estimates were obtained after removing three
outliers.
Coefficient Estimate Stan. Error t-Value
C -178.788 10.57 -16.91
AMP -361.759 25.45 -14.22
FREQ 0.302597 0.1457E-03 2077.00
PHASE 1.46533 0.4715E-01 31.08
Residual Standard Deviation = 148.3398
Residual Degrees of Freedom = 193
New Fit to
Edited Data
The original fit, with a residual standard deviation of 155.84, was:

    Ŷ_i = -178.786 - 361.766 sin(2π(0.302596) T_i + 1.46536)

The new fit, with a residual standard deviation of 148.34, is:

    Ŷ_i = -178.788 - 361.759 sin(2π(0.302597) T_i + 1.46533)
There is minimal change in the parameter estimates and about a 5 %
reduction in the residual standard deviation. In this case, removing the
outliers has a modest benefit in terms of reducing the variability of the
model.
4-Plot for
New Fit
This plot shows that the underlying assumptions are satisfied and
therefore the new fit is a good descriptor of the data.
In this case, it is a judgment call whether to use the fit with or without
the outliers removed.
1.4.2.5.5. Work This Example Yourself
View
Dataplot
Macro for
this Case
Study
This page allows you to repeat the analysis outlined in the
case study description on the previous page using Dataplot. It
is required that you have already downloaded and installed
Dataplot and configured your browser to run Dataplot. Output
from each analysis step below will be displayed in one or
more of the Dataplot windows. The four main windows are the
Output window, the Graphics window, the Command History
window, and the data sheet window. Across the top of the
main windows there are menus for executing Dataplot
commands. Across the bottom is a command entry window
where commands can be typed in.
Data Analysis Steps and Results

Click on the links below to start Dataplot and run this case study
yourself. Each step may use results from previous steps, so please be
patient. Wait until the software verifies that the current step is
complete before clicking on the next step. The links also connect you
with more detailed information about each analysis step from the case
study description.

1. Invoke Dataplot and read data.

   1. Read in the data.
      Results: You have read 1 column of numbers into Dataplot, variable Y.

2. Validate assumptions.

   1. 4-plot of Y.
      Results: Based on the 4-plot, there are no obvious shifts in location
      and scale, but the data are not random.

   2. Generate a run sequence plot.
      Results: Based on the run sequence plot, there are no obvious shifts
      in location and scale.

   3. Generate a lag plot.
      Results: Based on the lag plot, the data are not random.

   4. Generate an autocorrelation plot.
      Results: The autocorrelation plot shows significant autocorrelation
      at lag 1.

   5. Generate a spectral plot.
      Results: The spectral plot shows a single dominant low-frequency peak.

   6. Generate a table of summary statistics.
      Results: The summary statistics table displays 25+ statistics.

   7. Generate a linear fit to detect drift in location.
      Results: The linear fit indicates no drift in location since the
      slope parameter is not statistically significant.

   8. Detect drift in variation by dividing the data into quarters and
      computing Levene's test statistic for equal standard deviations.
      Results: Levene's test indicates no significant drift in variation.

   9. Check for randomness by generating a runs test.
      Results: The runs test indicates significant non-randomness.

3. Fit Y(i) = C + A*SIN(2*PI*omega*t(i) + phi).

   1. Generate a complex demodulation phase plot.
      Results: The complex demodulation phase plot indicates a starting
      frequency of 0.3025.

   2. Generate a complex demodulation amplitude plot.
      Results: The complex demodulation amplitude plot indicates an
      amplitude of 390 (but there is a short start-up effect).

   3. Fit the non-linear model.
      Results: The non-linear fit generates final parameter estimates. The
      residual standard deviation from the fit is 155.85 (compared to the
      standard deviation of 277.73 from the original data).

4. Validate fit.

   1. Generate a 4-plot of the residuals from the fit.
      Results: The 4-plot indicates that the assumptions of constant
      location and scale are valid. The lag plot indicates that the data
      are random. The histogram and normal probability plot indicate that
      the normality assumption for the residuals is not seriously
      violated, although there is a bend on the probability plot that
      warrants attention.

   2. Generate a nonlinear fit with the outliers removed.
      Results: The fit after removing 3 outliers shows some marginal
      improvement in the model (a 5 % reduction in the residual standard
      deviation).

   3. Generate a 4-plot of the residuals from the fit with the outliers
      removed.
      Results: The 4-plot of the model fit after the 3 outliers are
      removed shows marginal improvement in satisfying the model
      assumptions.
1.4.2.6. Filter Transmittance
Filter
Transmittance
This example illustrates the univariate analysis of filter
transmittance data.
1. Background and Data
2. Graphical Output and Interpretation
3. Quantitative Output and Interpretation
4. Work This Example Yourself
1.4.2.6.1. Background and Data
Generation This data set was collected by NIST chemist Radu
Mavrodineanu in the 1970s from an automatic data
acquisition system for a filter transmittance experiment. The
response variable is transmittance.
The motivation for studying this data set is to show how the
underlying autocorrelation structure in a relatively small data
set helped the scientist detect problems with his automatic
data acquisition system.
Software The analyses used in this case study can be generated using
both Dataplot code and R code.
Data The following are the data used for this case study.
2.00180
2.00170
2.00180
2.00190
2.00180
2.00170
2.00150
2.00140
2.00150
2.00150
2.00170
2.00180
2.00180
2.00190
2.00190
2.00210
2.00200
2.00160
2.00140
2.00130
2.00130
2.00150
2.00150
2.00160
2.00150
2.00140
2.00130
2.00140
2.00150
2.00140
2.00150
2.00160
2.00150
2.00160
2.00190
2.00200
2.00200
2.00210
2.00220
2.00230
2.00240
2.00250
2.00270
2.00260
2.00260
2.00260
2.00270
2.00260
2.00250
2.00240
1.4.2.6.2. Graphical Output and Interpretation
Goal The goal of this analysis is threefold:
1. Determine if the univariate model Y_i = C + E_i is
appropriate and valid.
2. Determine if the typical underlying assumptions for
an "in control" measurement process are valid. These
assumptions are:
1. random drawings;
2. from a fixed distribution;
3. with the distribution having a fixed location;
and
4. the distribution having a fixed scale.
3. Determine if the confidence interval Ȳ ± 2s/√N is
appropriate and valid, where s is the standard deviation
of the original data.
4-Plot of
Data
Interpretation The assumptions are addressed by the graphics shown
above:
1. The run sequence plot (upper left) indicates a
significant shift in location around x=35.
2. The linear appearance in the lag plot (upper right)
indicates a non-random pattern in the data.
3. Since the lag plot indicates significant non-
randomness, we do not make any interpretation of
either the histogram (lower left) or the normal
probability plot (lower right).
The serious violation of the randomness assumption means
that the univariate model Y_i = C + E_i is not valid. Given the
linear appearance of the lag plot, the first step might be to
consider a model of the type

    Y_i = B0 + B1 Y_{i-1} + E_i
However, in this case discussions with the scientist revealed
that non-randomness was entirely unexpected. An
examination of the experimental process revealed that the
sampling rate for the automatic data acquisition system was
too fast. That is, the equipment did not have sufficient time
to reset before the next sample started, resulting in the
current measurement being contaminated by the previous
measurement. The solution was to rerun the experiment
allowing more time between samples.
Simple graphical techniques can be quite effective in
revealing unexpected results in the data. When this occurs,
it is important to investigate whether the unexpected result
is due to problems in the experiment and data collection or
is indicative of unexpected underlying structure in the data.
This determination cannot be made on the basis of statistics
alone. The role of the graphical and statistical analysis is to
detect problems or unexpected results in the data. Resolving
the issues requires the knowledge of the scientist or
engineer.
Individual
Plots
Although it is generally unnecessary, the plots can be
generated individually to give more detail. Since the lag
plot indicates significant non-randomness, we omit the
distributional plots.
Run
Sequence
Plot
Lag Plot
1.4.2.6.3. Quantitative Output and
Interpretation
Summary
Statistics
As a first step in the analysis, common summary statistics
are computed from the data.
Sample size = 50
Mean = 2.0019
Median = 2.0018
Minimum = 2.0013
Maximum = 2.0027
Range = 0.0014
Stan. Dev. = 0.0004
Location One way to quantify a change in location over time is to fit
a straight line to the data using an index variable as the
independent variable in the regression. For our data, we
assume that data are in sequential run order and that the
data were collected at equally spaced time intervals. In our
regression, we use the index variable X = 1, 2, ..., N, where
N is the number of observations. If there is no significant
drift in the location over time, the slope parameter should
be zero.
Coefficient    Estimate      Stan. Error    t-Value
B0             2.00138       0.9695E-04     0.2064E+05
B1             0.185E-04     0.3309E-05     5.582
Residual Standard Deviation = 0.3376404E-03
Residual Degrees of Freedom = 48
The slope parameter, B1, has a t value of 5.582, which is
statistically significant. Although the estimated slope,
0.185E-04, is nearly zero, the range of data (2.0013 to
2.0027) is also very small. In this case, we conclude that
there is drift in location, although it is relatively small.
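A sketch of this drift check in R (with the 50 transmittance values assumed to
be in a vector y):

# Sketch (R): regress the response on its run order to test for drift in location.
x   <- seq_along(y)
fit <- lm(y ~ x)
summary(fit)   # a statistically significant slope coefficient indicates drift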
Variation One simple way to detect a change in variation is with a
Bartlett test after dividing the data set into several equal
sized intervals. However, the Bartlett test is not robust for
non-normality. Since the normality assumption is
questionable for these data, we use the alternative Levene
test. In particular, we use the Levene test based on the
median rather than the mean. The choice of the number of
intervals is somewhat arbitrary, although values of four or
eight are reasonable. We will divide our data into four
intervals.
H0: σ1² = σ2² = σ3² = σ4²
Ha: At least one σi² is not equal to the others.
Test statistic: W = 0.971
Degrees of freedom: k - 1 = 3
Significance level: α = 0.05
Critical value: F(α, k-1, N-k) = 2.806
Critical region: Reject H0 if W > 2.806
In this case, since the Levene test statistic value of 0.971 is
less than the critical value of 2.806 at the 5 % level, we
conclude that there is no evidence of a change in variation.
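The median-based Levene statistic is easy to compute directly; the sketch
below uses base R (the car package's leveneTest function gives the same test
if it is available). The grouping into four equal-sized intervals mirrors the
description above.

# Sketch (R): Levene test (median version) for constant variation across 4 intervals.
k     <- 4
group <- factor(cut(seq_along(y), k, labels = FALSE))   # four equal-sized intervals
z     <- abs(y - ave(y, group, FUN = median))            # deviations from group medians
anova(lm(z ~ group))                                     # the F statistic is Levene's W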
Randomness There are many ways in which data can be non-random.
However, most common forms of non-randomness can be
detected with a few simple tests. The lag plot in the 4-plot
in the previous section is a simple graphical technique.
One check is an autocorrelation plot that shows the
autocorrelations for various lags. Confidence bands can be
plotted at the 95 % and 99 % confidence levels. Points
outside this band indicate statistically significant values (lag
0 is always 1).
The lag 1 autocorrelation, which is generally the one of
most interest, is 0.93. The critical values at the 5 % level
are -0.277 and 0.277. This indicates that the lag 1
autocorrelation is statistically significant, so there is strong
evidence of non-randomness.
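In R, the same check can be sketched with acf (y again denotes the data
vector):

# Sketch (R): autocorrelation plot with approximate 95 % bands at +/- 1.96/sqrt(N).
r <- acf(y)
r$acf[2]               # lag 1 autocorrelation (about 0.93 here)
1.96 / sqrt(length(y)) # approximate critical value (0.277 for N = 50)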
A common test for randomness is the runs test.
H0: the sequence was produced in a random manner
Ha: the sequence was not produced in a random manner
Test statistic: Z = -5.3246
Significance level: α = 0.05
Critical value: Z(1-α/2) = 1.96
Critical region: Reject H0 if |Z| > 1.96
Because the test statistic falls in the critical region, we reject the
null hypothesis and conclude that the data are not random.
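Dataplot's runs test examines runs of several lengths, so the exact Z value
quoted above is not reproduced by the simplest version of the test; still, a
minimal runs-above-and-below-the-median sketch in R conveys the idea:

# Sketch (R): runs test (runs above and below the median) for randomness.
runs_test <- function(y) {
  s  <- sign(y - median(y)); s <- s[s != 0]      # drop values tied with the median
  n1 <- sum(s > 0); n2 <- sum(s < 0); n <- n1 + n2
  r  <- 1 + sum(diff(s) != 0)                    # observed number of runs
  mu <- 2 * n1 * n2 / n + 1                      # expected number of runs under randomness
  v  <- 2 * n1 * n2 * (2 * n1 * n2 - n) / (n^2 * (n - 1))
  c(runs = r, z = (r - mu) / sqrt(v))            # |z| > 1.96 indicates non-randomness
}
runs_test(y)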
Distributional
Analysis
Since we rejected the randomness assumption, the
distributional tests are not meaningful. Therefore, these
quantitative tests are omitted. We also omit Grubbs' outlier
test since it also assumes the data are approximately
normally distributed.
Univariate
Report
It is sometimes useful and convenient to summarize the
above results in a report.
Analysis for filter transmittance data

1: Sample Size                          = 50

2: Location
   Mean                                 = 2.001857
   Standard Deviation of Mean           = 0.00006
   95% Confidence Interval for Mean     = (2.001735, 2.001979)
   Drift with respect to location?      = NO

3: Variation
   Standard Deviation                   = 0.00043
   95% Confidence Interval for SD       = (0.000359, 0.000535)
   Change in variation?
   (based on Levene's test on quarters
   of the data)                         = NO

4: Distribution
   Distributional tests omitted due to
   non-randomness of the data

5: Randomness
   Lag One Autocorrelation              = 0.937998
   Data are Random?
   (as measured by autocorrelation)     = NO

6: Statistical Control
   (i.e., no drift in location or scale,
   data are random, distribution is
   fixed, here we are testing only for
   normal)
   Data Set is in Statistical Control?  = NO

7: Outliers?
   (Grubbs' test omitted)               = NO
1.4.2.6.4. Work This Example Yourself
View
Dataplot
Macro for
this Case
Study
This page allows you to repeat the analysis outlined in the
case study description on the previous page using Dataplot. It
is required that you have already downloaded and installed
Dataplot and configured your browser to run Dataplot. Output
from each analysis step below will be displayed in one or
more of the Dataplot windows. The four main windows are the
Output window, the Graphics window, the Command History
window, and the data sheet window. Across the top of the
main windows there are menus for executing Dataplot
commands. Across the bottom is a command entry window
where commands can be typed in.
Data Analysis Steps and Results

Click on the links below to start Dataplot and run this case study
yourself. Each step may use results from previous steps, so please be
patient. Wait until the software verifies that the current step is
complete before clicking on the next step. The links also connect you
with more detailed information about each analysis step from the case
study description.

1. Invoke Dataplot and read data.

   1. Read in the data.
      Results: You have read 1 column of numbers into Dataplot, variable Y.

2. 4-plot of the data.

   1. 4-plot of Y.
      Results: Based on the 4-plot, there is a shift in location and the
      data are not random.

3. Generate the individual plots.

   1. Generate a run sequence plot.
      Results: The run sequence plot indicates that there is a shift in
      location.

   2. Generate a lag plot.
      Results: The strong linear pattern of the lag plot indicates
      significant non-randomness.

4. Generate summary statistics, quantitative analysis, and print a
   univariate report.

   1. Generate a table of summary statistics.
      Results: The summary statistics table displays 25+ statistics.

   2. Compute a linear fit based on quarters of the data to detect drift
      in location.
      Results: The linear fit indicates a slight drift in location since
      the slope parameter is statistically significant, but small.

   3. Compute Levene's test based on quarters of the data to detect
      changes in variation.
      Results: Levene's test indicates no significant drift in variation.

   4. Check for randomness by generating an autocorrelation plot and a
      runs test.
      Results: The lag 1 autocorrelation is 0.94. This is outside the 95 %
      confidence interval bands, which indicates significant
      non-randomness.

   5. Print a univariate report (this assumes steps 2 thru 4 have already
      been run).
      Results: The results are summarized in a convenient report.
1.4.2.7. Standard Resistor
Standard
Resistor
This example illustrates the univariate analysis of standard
resistor data.
1. Background and Data
2. Graphical Output and Interpretation
3. Quantitative Output and Interpretation
4. Work This Example Yourself
1.4.2.7.1. Background and Data
Generation This data set was collected by Ron Dziuba of NIST over a 5-
year period from 1980 to 1985. The response variable is
resistor values.
The motivation for studying this data set is to illustrate data
that violate the assumptions of constant location and scale.
Software The analyses used in this case study can be generated using
both Dataplot code and R code.
Data The following are the data used for this case study.
27.8680
27.8929
27.8773
27.8530
27.8876
27.8725
27.8743
27.8879
27.8728
27.8746
27.8863
27.8716
27.8818
27.8872
27.8885
27.8945
27.8797
27.8627
27.8870
27.8895
27.9138
27.8931
27.8852
27.8788
27.8827
27.8939
27.8558
27.8814
27.8479
27.8479
27.8848
27.8809
27.8479
27.8611
27.8630
27.8679
27.8637
27.8985
27.8900
27.8577
27.8848
27.8869
27.8976
27.8610
27.8567
27.8417
27.8280
27.8555
27.8639
27.8702
27.8582
27.8605
27.8900
27.8758
27.8774
27.9008
27.8988
27.8897
27.8990
27.8958
27.8830
27.8967
27.9105
27.9028
27.8977
27.8953
27.8970
27.9190
27.9180
27.8997
27.9204
27.9234
27.9072
27.9152
27.9091
27.8882
27.9035
27.9267
27.9138
27.8955
27.9203
27.9239
27.9199
27.9646
27.9411
27.9345
27.8712
27.9145
27.9259
27.9317
27.9239
27.9247
27.9150
27.9444
27.9457
27.9166
27.9066
27.9088
27.9255
27.9312
27.9439
27.9210
27.9102
27.9083
27.9121
27.9113
27.9091
27.9235
27.9291
27.9253
27.9092
27.9117
27.9194
27.9039
27.9515
27.9143
27.9124
27.9128
27.9260
27.9339
27.9500
27.9530
27.9430
27.9400
27.8850
27.9350
27.9120
27.9260
27.9660
27.9280
27.9450
27.9390
27.9429
27.9207
27.9205
27.9204
27.9198
27.9246
27.9366
27.9234
27.9125
27.9032
27.9285
27.9561
27.9616
27.9530
27.9280
27.9060
27.9380
27.9310
27.9347
27.9339
27.9410
27.9397
27.9472
27.9235
27.9315
27.9368
27.9403
27.9529
27.9263
27.9347
27.9371
27.9129
27.9549
27.9422
27.9423
27.9750
27.9339
27.9629
27.9587
27.9503
27.9573
27.9518
27.9527
27.9589
27.9300
27.9629
27.9630
27.9660
27.9730
27.9660
27.9630
27.9570
27.9650
27.9520
27.9820
27.9560
27.9670
27.9520
27.9470
27.9720
27.9610
27.9437
27.9660
27.9580
27.9660
27.9700
27.9600
27.9660
27.9770
27.9110
27.9690
27.9698
27.9616
27.9371
27.9700
27.9265
27.9964
27.9842
27.9667
27.9610
27.9943
27.9616
27.9397
27.9799
28.0086
27.9709
27.9741
27.9675
27.9826
27.9676
27.9703
27.9789
27.9786
27.9722
27.9831
28.0043
27.9548
27.9875
27.9495
27.9549
27.9469
27.9744
27.9744
27.9449
27.9837
27.9585
28.0096
27.9762
27.9641
27.9854
27.9877
27.9839
27.9817
27.9845
27.9877
27.9880
27.9822
27.9836
28.0030
27.9678
28.0146
27.9945
27.9805
27.9785
27.9791
27.9817
27.9805
27.9782
27.9753
27.9792
27.9704
27.9794
27.9814
27.9794
27.9795
27.9881
27.9772
27.9796
27.9736
27.9772
27.9960
27.9795
27.9779
27.9829
27.9829
27.9815
27.9811
27.9773
27.9778
27.9724
27.9756
27.9699
27.9724
27.9666
27.9666
27.9739
27.9684
27.9861
27.9901
27.9879
27.9865
27.9876
27.9814
27.9842
27.9868
27.9834
27.9892
27.9864
27.9843
27.9838
27.9847
27.9860
27.9872
27.9869
27.9602
27.9852
27.9860
27.9836
27.9813
27.9623
27.9843
27.9802
27.9863
27.9813
27.9881
27.9850
27.9850
27.9830
27.9866
27.9888
27.9841
27.9863
27.9903
27.9961
27.9905
27.9945
27.9878
27.9929
27.9914
27.9914
27.9997
28.0006
27.9999
28.0004
28.0020
28.0029
28.0008
28.0040
28.0078
28.0065
27.9959
28.0073
28.0017
28.0042
28.0036
28.0055
28.0007
28.0066
28.0011
27.9960
28.0083
27.9978
28.0108
28.0088
28.0088
28.0139
28.0092
28.0092
28.0049
28.0111
28.0120
28.0093
28.0116
28.0102
28.0139
28.0113
28.0158
28.0156
28.0137
28.0236
28.0171
28.0224
28.0184
28.0199
28.0190
28.0204
28.0170
28.0183
28.0201
28.0182
28.0183
28.0175
28.0127
28.0211
28.0057
28.0180
28.0183
28.0149
28.0185
28.0182
28.0192
28.0213
28.0216
28.0169
28.0162
28.0167
28.0167
28.0169
28.0169
28.0161
28.0152
28.0179
28.0215
28.0194
28.0115
28.0174
28.0178
28.0202
28.0240
28.0198
28.0194
28.0171
28.0134
28.0121
28.0121
28.0141
28.0101
28.0114
28.0122
28.0124
28.0171
28.0165
28.0166
28.0159
28.0181
28.0200
28.0116
28.0144
28.0141
28.0116
28.0107
28.0169
28.0105
28.0136
28.0138
28.0114
28.0122
28.0122
28.0116
28.0025
28.0097
28.0066
28.0072
28.0066
28.0068
28.0067
28.0130
28.0091
28.0088
28.0091
28.0091
28.0115
28.0087
28.0128
28.0139
28.0095
28.0115
28.0101
28.0121
28.0114
28.0121
28.0122
28.0121
28.0168
28.0212
28.0219
28.0221
28.0204
28.0169
28.0141
28.0142
28.0147
28.0159
28.0165
28.0144
28.0182
28.0155
28.0155
28.0192
28.0204
28.0185
28.0248
28.0185
28.0226
28.0271
28.0290
28.0240
28.0302
28.0243
28.0288
28.0287
28.0301
28.0273
28.0313
28.0293
28.0300
28.0344
28.0308
28.0291
28.0287
28.0358
28.0309
28.0286
28.0308
28.0291
28.0380
28.0411
28.0420
28.0359
28.0368
28.0327
28.0361
28.0334
28.0300
28.0347
28.0359
28.0344
28.0370
28.0355
28.0371
28.0318
28.0390
28.0390
28.0390
28.0376
28.0376
28.0377
28.0345
28.0333
28.0429
28.0379
28.0401
28.0401
28.0423
28.0393
28.0382
28.0424
28.0386
28.0386
28.0373
28.0397
28.0412
28.0565
28.0419
28.0456
28.0426
28.0423
28.0391
28.0403
28.0388
28.0408
28.0457
28.0455
28.0460
28.0456
28.0464
28.0442
28.0416
28.0451
28.0432
28.0434
28.0448
28.0448
28.0373
28.0429
28.0392
28.0469
28.0443
28.0356
28.0474
28.0446
28.0348
28.0368
28.0418
28.0445
28.0533
28.0439
28.0474
28.0435
28.0419
28.0538
28.0538
28.0463
28.0491
28.0441
28.0411
28.0507
28.0459
28.0519
28.0554
28.0512
28.0507
28.0582
28.0471
28.0539
28.0530
28.0502
28.0422
28.0431
28.0395
28.0177
28.0425
28.0484
28.0693
28.0490
28.0453
28.0494
28.0522
28.0393
28.0443
28.0465
28.0450
28.0539
28.0566
28.0585
28.0486
28.0427
28.0548
28.0616
28.0298
28.0726
28.0695
28.0629
28.0503
28.0493
28.0537
28.0613
28.0643
28.0678
28.0564
28.0703
28.0647
28.0579
28.0630
28.0716
28.0586
28.0607
28.0601
28.0611
28.0606
28.0611
28.0066
28.0412
28.0558
28.0590
28.0750
28.0483
28.0599
28.0490
28.0499
28.0565
28.0612
28.0634
28.0627
28.0519
28.0551
28.0696
28.0581
28.0568
28.0572
28.0529
28.0421
28.0432
28.0211
28.0363
28.0436
28.0619
28.0573
28.0499
28.0340
28.0474
28.0534
28.0589
28.0466
28.0448
28.0576
28.0558
28.0522
28.0480
28.0444
28.0429
28.0624
28.0610
28.0461
28.0564
28.0734
28.0565
28.0503
28.0581
28.0519
28.0625
28.0583
28.0645
28.0642
28.0535
28.0510
28.0542
28.0677
28.0416
28.0676
28.0596
28.0635
28.0558
28.0623
28.0718
28.0585
28.0552
28.0684
28.0646
28.0590
28.0465
28.0594
28.0303
28.0533
28.0561
28.0585
28.0497
28.0582
28.0507
28.0562
28.0715
28.0468
28.0411
28.0587
28.0456
28.0705
28.0534
28.0558
28.0536
28.0552
28.0461
28.0598
28.0598
28.0650
28.0423
28.0442
28.0449
28.0660
28.0506
28.0655
28.0512
28.0407
28.0475
28.0411
28.0512
28.1036
28.0641
28.0572
28.0700
28.0577
28.0637
28.0534
28.0461
28.0701
28.0631
28.0575
28.0444
28.0592
28.0684
28.0593
28.0677
28.0512
28.0644
28.0660
28.0542
28.0768
28.0515
28.0579
28.0538
28.0526
28.0833
28.0637
28.0529
28.0535
28.0561
28.0736
28.0635
28.0600
28.0520
28.0695
28.0608
28.0608
28.0590
28.0290
28.0939
28.0618
28.0551
28.0757
28.0698
28.0717
28.0529
28.0644
28.0613
28.0759
28.0745
28.0736
28.0611
28.0732
28.0782
28.0682
28.0756
28.0857
28.0739
28.0840
28.0862
28.0724
28.0727
28.0752
28.0732
28.0703
28.0849
28.0795
28.0902
28.0874
28.0971
28.0638
28.0877
28.0751
28.0904
28.0971
28.0661
28.0711
28.0754
28.0516
28.0961
28.0689
28.1110
28.1062
28.0726
28.1141
28.0913
28.0982
28.0703
28.0654
28.0760
28.0727
28.0850
28.0877
28.0967
28.1185
28.0945
28.0834
28.0764
28.1129
28.0797
28.0707
28.1008
28.0971
28.0826
28.0857
28.0984
28.0869
28.0795
28.0875
28.1184
28.0746
28.0816
28.0879
28.0888
28.0924
28.0979
28.0702
28.0847
28.0917
28.0834
28.0823
28.0917
28.0779
28.0852
28.0863
28.0942
28.0801
28.0817
28.0922
28.0914
28.0868
28.0832
28.0881
28.0910
28.0886
28.0961
28.0857
28.0859
28.1086
28.0838
28.0921
28.0945
28.0839
28.0877
28.0803
28.0928
28.0885
28.0940
28.0856
28.0849
28.0955
28.0955
28.0846
28.0871
28.0872
28.0917
28.0931
28.0865
28.0900
28.0915
28.0963
28.0917
28.0950
28.0898
28.0902
28.0867
28.0843
28.0939
28.0902
28.0911
28.0909
28.0949
28.0867
28.0932
28.0891
28.0932
28.0887
28.0925
28.0928
28.0883
28.0946
28.0977
28.0914
28.0959
28.0926
28.0923
28.0950
28.1006
28.0924
28.0963
28.0893
28.0956
28.0980
28.0928
28.0951
28.0958
28.0912
28.0990
28.0915
28.0957
28.0976
28.0888
28.0928
28.0910
28.0902
28.0950
28.0995
28.0965
28.0972
28.0963
28.0946
28.0942
28.0998
28.0911
28.1043
28.1002
28.0991
28.0959
28.0996
28.0926
28.1002
28.0961
28.0983
28.0997
28.0959
28.0988
28.1029
28.0989
28.1000
28.0944
28.0979
28.1005
28.1012
28.1013
28.0999
28.0991
28.1059
28.0961
28.0981
28.1045
28.1047
28.1042
28.1146
28.1113
28.1051
28.1065
28.1065
28.0985
28.1000
28.1066
28.1041
28.0954
28.1090
1.4.2.7.2. Graphical Output and Interpretation
Goal The goal of this analysis is threefold:
1. Determine if the univariate model Y_i = C + E_i is
appropriate and valid.
2. Determine if the typical underlying assumptions for
an "in control" measurement process are valid. These
assumptions are:
1. random drawings;
2. from a fixed distribution;
3. with the distribution having a fixed location;
and
4. the distribution having a fixed scale.
3. Determine if the confidence interval Ȳ ± 2s/√N is
appropriate and valid, where s is the standard deviation
of the original data.
4-Plot of
Data
Interpretation The assumptions are addressed by the graphics shown
above:
1. The run sequence plot (upper left) indicates
significant shifts in both location and variation.
Specifically, the location is increasing with time. The
variability seems greater in the first and last third of
the data than it does in the middle third.
2. The lag plot (upper right) shows a significant non-
random pattern in the data. Specifically, the strong
linear appearance of this plot is indicative of a model
that relates Y_t to Y_{t-1}.
3. The distributional plots, the histogram (lower left)
and the normal probability plot (lower right), are not
interpreted since the randomness assumption is so
clearly violated.
The serious violation of the randomness assumption means
that the univariate model Y_i = C + E_i is not valid. Given the
linear appearance of the lag plot, the first step might be to
consider a model of the type

    Y_i = B0 + B1 Y_{i-1} + E_i
However, discussions with the scientist revealed the
following:
1. the drift with respect to location was expected.
2. the non-constant variability was not expected.
The scientist examined the data collection device and
determined that the non-constant variation was a seasonal
effect. The high variability data in the first and last thirds
was collected in winter while the more stable middle third
was collected in the summer. The seasonal effect was
determined to be caused by the amount of humidity
affecting the measurement equipment. In this case, the
solution was to modify the test equipment to be less
sensitive to environmental factors.
Simple graphical techniques can be quite effective in
revealing unexpected results in the data. When this occurs,
it is important to investigate whether the unexpected result
is due to problems in the experiment and data collection, or
is in fact indicative of an unexpected underlying structure
in the data. This determination cannot be made on the basis
of statistics alone. The role of the graphical and statistical
analysis is to detect problems or unexpected results in the
data. Resolving the issues requires the knowledge of the
scientist or engineer.
Individual
Plots
Although it is generally unnecessary, the plots can be
generated individually to give more detail. Since the lag
plot indicates significant non-randomness, we omit the
distributional plots.
Run
Sequence
Plot
Lag Plot
1.4.2.7.3. Quantitative Output and
Interpretation
Summary
Statistics
As a first step in the analysis, common summary statistics
are computed from the data.
Sample size = 1000
Mean = 28.01634
Median = 28.02910
Minimum = 27.82800
Maximum = 28.11850
Range = 0.29050
Stan. Dev. = 0.06349
Location One way to quantify a change in location over time is to fit
a straight line to the data using an index variable as the
independent variable in the regression. For our data, we
assume that data are in sequential run order and that the
data were collected at equally spaced time intervals. In our
regression, we use the index variable X = 1, 2, ..., N, where
N is the number of observations. If there is no significant
drift in the location over time, the slope parameter should
be zero.
Coefficient    Estimate       Stan. Error    t-Value
B0             27.9114        0.1209E-02     0.2309E+05
B1             0.20967E-03    0.2092E-05     100.2
Residual Standard Deviation = 0.1909796E-01
Residual Degrees of Freedom = 998
The slope parameter, B1, has a t value of 100.2, which is
statistically significant. The value of the slope parameter
estimate is 0.00021. Although this number is nearly zero,
we need to take into account that the original scale of the
data is from about 27.8 to 28.2. In this case, we conclude
that there is a drift in location.
Variation One simple way to detect a change in variation is with a
Bartlett test after dividing the data set into several equal-
sized intervals. However, the Bartlett test is not robust for
non-normality. Since the normality assumption is
questionable for these data, we use the alternative Levene
test. In particular, we use the Levene test based on the
median rather than the mean. The choice of the number of
intervals is somewhat arbitrary, although values of four or
eight are reasonable. We will divide our data into four
intervals.
H0: σ1² = σ2² = σ3² = σ4²
Ha: At least one σi² is not equal to the others.
Test statistic: W = 140.85
Degrees of freedom: k - 1 = 3
Significance level: α = 0.05
Critical value: F(α, k-1, N-k) = 2.614
Critical region: Reject H0 if W > 2.614
In this case, since the Levene test statistic value of 140.85
is greater than the 5 % significance level critical value of
2.614, we conclude that there is significant evidence of
nonconstant variation.
Randomness There are many ways in which data can be non-random.
However, most common forms of non-randomness can be
detected with a few simple tests. The lag plot in the 4-plot
in the previous section is a simple graphical technique.
One check is an autocorrelation plot that shows the
autocorrelations for various lags. Confidence bands can be
plotted at the 95 % and 99 % confidence levels. Points
outside this band indicate statistically significant values (lag
0 is always 1).
The lag 1 autocorrelation, which is generally the one of
greatest interest, is 0.97. The critical values at the 5 %
significance level are -0.062 and 0.062. This indicates that
the lag 1 autocorrelation is statistically significant, so there
is strong evidence of non-randomness.
A common test for randomness is the runs test.
H0: the sequence was produced in a random manner
Ha: the sequence was not produced in a random manner
Test statistic: Z = -30.5629
Significance level: α = 0.05
Critical value: Z(1-α/2) = 1.96
Critical region: Reject H0 if |Z| > 1.96
Because the test statistic falls in the critical region, we reject the
null hypothesis and conclude that the data are not random.
Distributional
Analysis
Since we rejected the randomness assumption, the
distributional tests are not meaningful. Therefore, these
quantitative tests are omitted. Since the Grubbs' test for
outliers also assumes the approximate normality of the data,
we omit Grubbs' test as well.
Univariate
Report
It is sometimes useful and convenient to summarize the
above results in a report.
Analysis for resistor case study

1: Sample Size                          = 1000

2: Location
   Mean                                 = 28.01635
   Standard Deviation of Mean           = 0.002008
   95% Confidence Interval for Mean     = (28.0124, 28.02029)
   Drift with respect to location?      = NO

3: Variation
   Standard Deviation                   = 0.063495
   95% Confidence Interval for SD       = (0.060829, 0.066407)
   Change in variation?
   (based on Levene's test on quarters
   of the data)                         = YES

4: Randomness
   Autocorrelation                      = 0.972158
   Data Are Random?
   (as measured by autocorrelation)     = NO

5: Distribution
   Distributional test omitted due to
   non-randomness of the data

6: Statistical Control
   (i.e., no drift in location or scale,
   data are random, distribution is
   fixed)
   Data Set is in Statistical Control?  = NO

7: Outliers?
   (Grubbs' test omitted due to
   non-randomness of the data)
1.4.2.7.4. Work This Example Yourself
View
Dataplot
Macro for
this Case
Study
This page allows you to repeat the analysis outlined in the
case study description on the previous page using Dataplot. It
is required that you have already downloaded and installed
Dataplot and configured your browser to run Dataplot. Output
from each analysis step below will be displayed in one or
more of the Dataplot windows. The four main windows are the
Output window, the Graphics window, the Command History
window, and the data sheet window. Across the top of the
main windows there are menus for executing Dataplot
commands. Across the bottom is a command entry window
where commands can be typed in.
Data Analysis Steps and Results

Click on the links below to start Dataplot and run this case study
yourself. Each step may use results from previous steps, so please be
patient. Wait until the software verifies that the current step is
complete before clicking on the next step. The links also connect you
with more detailed information about each analysis step from the case
study description.

NOTE: This case study has 1,000 points. For better performance, it is
highly recommended that you check the "No Update" box on the
Spreadsheet window for this case study. This will suppress subsequent
updating of the Spreadsheet window as the data are created or modified.

1. Invoke Dataplot and read data.

   1. Read in the data.
      Results: You have read 1 column of numbers into Dataplot, variable Y.

2. 4-plot of the data.

   1. 4-plot of Y.
      Results: Based on the 4-plot, there are shifts in location and
      variation and the data are not random.

3. Generate the individual plots.

   1. Generate a run sequence plot.
      Results: The run sequence plot indicates that there are shifts of
      location and variation.

   2. Generate a lag plot.
      Results: The lag plot shows a strong linear pattern, which indicates
      significant non-randomness.

4. Generate summary statistics, quantitative analysis, and print a
   univariate report.

   1. Generate a table of summary statistics.
      Results: The summary statistics table displays 25+ statistics.

   2. Generate the sample mean, a confidence interval for the population
      mean, and compute a linear fit to detect drift in location.
      Results: The mean is 28.0163 and a 95 % confidence interval is
      (28.0124, 28.02029). The linear fit indicates drift in location
      since the slope parameter estimate is statistically significant.

   3. Generate the sample standard deviation, a confidence interval for
      the population standard deviation, and detect drift in variation by
      dividing the data into quarters and computing Levene's test for
      equal standard deviations.
      Results: The standard deviation is 0.0635 with a 95 % confidence
      interval of (0.060829, 0.066407). Levene's test indicates a
      significant change in variation.

   4. Check for randomness by generating an autocorrelation plot and a
      runs test.
      Results: The lag 1 autocorrelation is 0.97. From the autocorrelation
      plot, this is outside the 95 % confidence interval bands, indicating
      significant non-randomness.

   5. Print a univariate report (this assumes steps 2 thru 5 have already
      been run).
      Results: The results are summarized in a convenient report.
1.4.2.8. Heat Flow Meter 1
Heat Flow
Meter
Calibration
and
Stability
This example illustrates the univariate analysis of heat flow
meter calibration data.
1. Background and Data
2. Graphical Output and Interpretation
3. Quantitative Output and Interpretation
4. Work This Example Yourself
1.4.2.8.1. Background and Data
Generation This data set was collected by Bob Zarr of NIST in January,
1990 from a heat flow meter calibration and stability analysis.
The response variable is a calibration factor.
The motivation for studying this data set is to illustrate a well-
behaved process where the underlying assumptions hold and
the process is in statistical control.
Software The analyses used in this case study can be generated using
both Dataplot code and R code.
Data The following are the data used for this case study.
9.206343
9.299992
9.277895
9.305795
9.275351
9.288729
9.287239
9.260973
9.303111
9.275674
9.272561
9.288454
9.255672
9.252141
9.297670
9.266534
9.256689
9.277542
9.248205
9.252107
9.276345
9.278694
9.267144
9.246132
9.238479
9.269058
9.248239
9.257439
9.268481
9.288454
9.258452
9.286130
9.251479
9.257405
9.268343
9.291302
9.219460
9.270386
9.218808
9.241185
9.269989
9.226585
9.258556
9.286184
9.320067
9.327973
9.262963
9.248181
9.238644
9.225073
9.220878
9.271318
9.252072
9.281186
9.270624
9.294771
9.301821
9.278849
9.236680
9.233988
9.244687
9.221601
9.207325
9.258776
9.275708
9.268955
9.257269
9.264979
9.295500
9.292883
9.264188
9.280731
9.267336
9.300566
9.253089
9.261376
9.238409
9.225073
9.235526
9.239510
9.264487
9.244242
9.277542
9.310506
9.261594
9.259791
9.253089
9.245735
9.284058
9.251122
9.275385
9.254619
9.279526
9.275065
9.261952
9.275351
9.252433
9.230263
9.255150
9.268780
9.290389
9.274161
9.255707
9.261663
9.250455
9.261952
9.264041
9.264509
9.242114
9.239674
9.221553
9.241935
9.215265
9.285930
9.271559
9.266046
9.285299
9.268989
9.267987
9.246166
9.231304
9.240768
9.260506
9.274355
9.292376
9.271170
9.267018
9.308838
9.264153
9.278822
9.255244
9.229221
9.253158
9.256292
9.262602
9.219793
9.258452
9.267987
9.267987
9.248903
9.235153
9.242933
9.253453
9.262671
9.242536
9.260803
9.259825
9.253123
9.240803
9.238712
9.263676
9.243002
9.246826
9.252107
9.261663
9.247311
9.306055
9.237646
9.248937
9.256689
9.265777
9.299047
9.244814
9.287205
9.300566
9.256621
9.271318
9.275154
9.281834
9.253158
9.269024
9.282077
9.277507
9.284910
9.239840
9.268344
9.247778
9.225039
9.230750
9.270024
9.265095
9.284308
9.280697
9.263032
9.291851
9.252072
9.244031
9.283269
9.196848
9.231372
9.232963
9.234956
9.216746
9.274107
9.273776
1.4.2.8.2. Graphical Output and Interpretation
Goal The goal of this analysis is threefold:
1. Determine if the univariate model Y_i = C + E_i is
appropriate and valid.
2. Determine if the typical underlying assumptions for
an "in control" measurement process are valid. These
assumptions are:
1. random drawings;
2. from a fixed distribution;
3. with the distribution having a fixed location;
and
4. the distribution having a fixed scale.
3. Determine if the confidence interval Ȳ ± 2s/√N is
appropriate and valid, where s is the standard deviation
of the original data.
4-Plot of
Data
Interpretation The assumptions are addressed by the graphics shown
above:
1. The run sequence plot (upper left) indicates that the
data do not have any significant shifts in location or
scale over time.
2. The lag plot (upper right) does not indicate any non-
random pattern in the data.
3. The histogram (lower left) shows that the data are
reasonably symmetric, there do not appear to be any
significant outliers in the tails, and it seems
reasonable to assume that the data are from
approximately a normal distribution.
4. The normal probability plot (lower right) verifies that
an assumption of normality is in fact reasonable.
Individual
Plots
Although it is generally unnecessary, the plots can be
generated individually to give more detail.
Run
Sequence
Plot
Lag Plot
Histogram
(with
overlaid
Normal PDF)
Normal
Probability
Plot
1.4.2.8.3. Quantitative Output and
Interpretation
Summary
Statistics
As a first step in the analysis, common summary statistics
are computed from the data.
Sample size = 195
Mean = 9.261460
Median = 9.261952
Minimum = 9.196848
Maximum = 9.327973
Range = 0.131126
Stan. Dev. = 0.022789
Location One way to quantify a change in location over time is to fit
a straight line to the data using an index variable as the
independent variable in the regression. For our data, we
assume that data are in sequential run order and that the
data were collected at equally spaced time intervals. In our
regression, we use the index variable X = 1, 2, ..., N, where
N is the number of observations. If there is no significant
drift in the location over time, the slope parameter should
be zero.
Coefficient    Estimate        Stan. Error    t-Value
B0             9.26699         0.3253E-02     2849.
B1             -0.56412E-04    0.2878E-04     -1.960
Residual Standard Deviation = 0.2262372E-01
Residual Degrees of Freedom = 193
The slope parameter, B1, has a t value of -1.96, which is
(barely) statistically significant since it is essentially equal
to the 95 % level cutoff of -1.96. However, notice that the
value of the slope parameter estimate is -0.00056. This
slope, even though statistically significant, can essentially
be considered zero.
Variation One simple way to detect a change in variation is with a
Bartlett test after dividing the data set into several equal-
sized intervals. The choice of the number of intervals is
somewhat arbitrary, although values of four or eight are
reasonable. We will divide our data into four intervals.
H0: σ1² = σ2² = σ3² = σ4²
Ha: At least one σi² is not equal to the others.
Test statistic: T = 3.147
Degrees of freedom: k - 1 = 3
Significance level: α = 0.05
Critical value: χ²(1-α, k-1) = 7.815
Critical region: Reject H0 if T > 7.815
In this case, since the Bartlett test statistic of 3.147 is less
than the critical value at the 5 % significance level of
7.815, we conclude that the variances are not significantly
different in the four intervals. That is, the assumption of
constant scale is valid.
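A sketch of this check in R (bartlett.test is part of the base stats package;
the vector y is assumed to hold the 195 calibration factors):

# Sketch (R): Bartlett test for equal variances across four equal-sized intervals.
k     <- 4
group <- factor(cut(seq_along(y), k, labels = FALSE))
bartlett.test(y, group)   # compare the statistic with qchisq(0.95, k - 1) = 7.815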
Randomness There are many ways in which data can be non-random.
However, most common forms of non-randomness can be
detected with a few simple tests. The lag plot in the
previous section is a simple graphical technique.
Another check is an autocorrelation plot that shows the
autocorrelations for various lags. Confidence bands can be
plotted at the 95 % and 99 % confidence levels. Points
outside this band indicate statistically significant values (lag
0 is always 1).
The lag 1 autocorrelation, which is generally the one of
greatest interest, is 0.281. The critical values at the 5 %
significance level are -0.087 and 0.087. This indicates that
the lag 1 autocorrelation is statistically significant, so there
is evidence of non-randomness.
A common test for randomness is the runs test.
H0: the sequence was produced in a random manner
Ha: the sequence was not produced in a random manner
Test statistic: Z = -3.2306
Significance level: α = 0.05
Critical value: Z(1-α/2) = 1.96
Critical region: Reject H0 if |Z| > 1.96
The value of the test statistic is less than -1.96, so we reject
the null hypothesis at the 0.05 significance level and
conclude that the data are not random.
Although the autocorrelation plot and the runs test indicate
some mild non-randomness, the violation of the
randomness assumption is not serious enough to warrant
developing a more sophisticated model. It is common in
practice that some of the assumptions are mildly violated
and it is a judgment call as to whether or not the
violations are serious enough to warrant developing a more
sophisticated model for the data.
Distributional
Analysis
Probability plots are a graphical test for assessing if a
particular distribution provides an adequate fit to a data set.
A quantitative enhancement to the probability plot is the
correlation coefficient of the points on the probability plot.
For this data set the correlation coefficient is 0.996. Since
this is greater than the critical value of 0.987 (this is a
tabulated value), the normality assumption is not rejected.
Chi-square and Kolmogorov-Smirnov goodness-of-fit tests
are alternative methods for assessing distributional
adequacy. The Wilk-Shapiro and Anderson-Darling tests
can be used to test for normality. The results of the
Anderson-Darling test follow.
H0: the data are normally distributed
Ha: the data are not normally distributed

Adjusted test statistic: A² = 0.129
Significance level: α = 0.05
Critical value: 0.787
Critical region: Reject H0 if A² > 0.787
The Anderson-Darling test also does not reject the
normality assumption because the test statistic, 0.129, is
less than the critical value at the 5 % significance level of
0.787.
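Sketches of these checks in R: the probability plot correlation coefficient
can be computed directly, the Wilk-Shapiro test is available as shapiro.test
in base R, and an Anderson-Darling test is provided by add-on packages such as
nortest (an assumption here; it is not part of base R).

# Sketch (R): normal probability plot correlation coefficient and normality tests.
cor(sort(y), qnorm(ppoints(length(y))))   # PPCC; compare with the tabled critical value
shapiro.test(y)                           # Wilk-Shapiro test for normality
# nortest::ad.test(y)                     # Anderson-Darling test (requires the nortest package)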
Outlier
Analysis
A test for outliers is the Grubbs' test.
H0: there are no outliers in the data
Ha: the maximum value is an outlier

Test statistic: G = 2.918673
Significance level: α = 0.05
Critical value for an upper one-tailed test: 3.597898
Critical region: Reject H0 if G > 3.597898
For this data set, Grubbs' test does not detect any outliers at
the 0.05 significance level.
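The Grubbs statistic itself is a one-line computation; the sketch below
compares it with the critical value quoted above (the outliers package also
provides a grubbs.test function, if available):

# Sketch (R): Grubbs test statistic for the most extreme observation.
g <- max(abs(y - mean(y))) / sd(y)
g                # about 2.9187 for these data
g > 3.597898     # FALSE, so no outlier is flagged at the 0.05 level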
Model Since the underlying assumptions were validated both
graphically and analytically, with a mild violation of the
randomness assumption, we conclude that a reasonable model for
the data is:

    Y_i = C + E_i

We can express the uncertainty for C, here estimated by 9.26146,
as the 95 % confidence interval (9.258242, 9.264679).
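A sketch of that interval in R, using the usual t-based formula:

# Sketch (R): 95 % confidence interval for C, the mean of the 195 readings.
n  <- length(y)
se <- sd(y) / sqrt(n)                              # 0.022789 / sqrt(195), about 0.00163
mean(y) + c(-1, 1) * qt(0.975, df = n - 1) * se    # approximately (9.2582, 9.2647)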
Univariate
Report
It is sometimes useful and convenient to summarize the
above results in a report. The report for the heat flow meter
data follows.
Analysis for heat flow meter data

1: Sample Size                          = 195

2: Location
   Mean                                 = 9.26146
   Standard Deviation of Mean           = 0.001632
   95% Confidence Interval for Mean     = (9.258242, 9.264679)
   Drift with respect to location?      = NO

3: Variation
   Standard Deviation                   = 0.022789
   95% Confidence Interval for SD       = (0.02073, 0.025307)
   Drift with respect to variation?
   (based on Bartlett's test on quarters
   of the data)                         = NO

4: Randomness
   Autocorrelation                      = 0.280579
   Data are Random?
   (as measured by autocorrelation)     = NO

5: Data are Normal?
   (as tested by Anderson-Darling)      = YES

6: Statistical Control
   (i.e., no drift in location or scale,
   data are random, distribution is
   fixed, here we are testing only for
   fixed normal)
   Data Set is in Statistical Control?  = YES

7: Outliers?
   (as determined by Grubbs' test)      = NO
1.4.2.8.4. Work This Example Yourself
View
Dataplot
Macro for
this Case
Study
This page allows you to repeat the analysis outlined in the
case study description on the previous page using Dataplot. It
is required that you have already downloaded and installed
Dataplot and configured your browser to run Dataplot. Output
from each analysis step below will be displayed in one or
more of the Dataplot windows. The four main windows are the
Output window, the Graphics window, the Command History
window, and the data sheet window. Across the top of the
main windows there are menus for executing Dataplot
commands. Across the bottom is a command entry window
where commands can be typed in.
Data Analysis Steps
Results and
Conclusions
Click on the links below to start Dataplot and
run this case study yourself. Each step may use
results from previous steps, so please be patient.
Wait until the software verifies that the current
step is complete before clicking on the next step.
The links in this column
will connect you with
more detailed
information about each
analysis step from the
case study description.
1. Invoke Dataplot and read data.
1. Read in the data.
1. You have read 1
column of numbers
into Dataplot,
variable Y.
2. 4-plot of the data.
1. 4-plot of Y. 1. Based on the 4-
plot, there are no
shifts
in location or
scale, and the data
seem to
follow a normal
distribution.
3. Generate the individual plots.
1.4.2.8.4. Work This Example Yourself
http://www.itl.nist.gov/div898/handbook/eda/section4/eda4284.htm[6/27/2012 2:03:50 PM]
1. Generate a run sequence plot.
2. Generate a lag plot.
3. Generate a histogram with an
overlaid normal pdf.
4. Generate a normal probability
plot.
1. The run sequence
plot indicates that
there are no
shifts of location or
scale.
2. The lag plot
does not indicate any
significant
patterns (which would
show the data
were not random).
3. The histogram
indicates that a
normal
distribution is a
good
distribution for
these data.
4. The normal
probability plot
verifies
that the normal
distribution is a
reasonable
distribution for
these data.
4. Generate summary statistics, quantitative analysis, and print a univariate
   report.

   1. Generate a table of summary statistics.
   2. Generate the mean, a confidence interval for the mean, and compute a
      linear fit to detect drift in location.
   3. Generate the standard deviation, a confidence interval for the standard
      deviation, and detect drift in variation by dividing the data into
      quarters and computing Bartlett's test for equal standard deviations.
   4. Check for randomness by generating an autocorrelation plot and a runs
      test.
   5. Check for normality by computing the normal probability plot
      correlation coefficient.
   6. Check for outliers using Grubbs' test.
   7. Print a univariate report (this assumes steps 2 thru 6 have already
      been run).

   Results and Conclusions:

   1. The summary statistics table displays 25+ statistics.
   2. The mean is 9.261 and a 95% confidence interval is (9.258, 9.265). The
      linear fit indicates no drift in location since the slope parameter
      estimate is essentially zero.
   3. The standard deviation is 0.023 with a 95% confidence interval of
      (0.0207, 0.0253). Bartlett's test indicates no significant change in
      variation.
   4. The lag 1 autocorrelation is 0.28. From the autocorrelation plot, this
      is statistically significant at the 95% level.
   5. The normal probability plot correlation coefficient is 0.999. At the
      5% level, we cannot reject the normality assumption.
   6. Grubbs' test detects no outliers at the 5% level.
   7. The results are summarized in a convenient report.
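The same quantitative checks can also be reproduced outside Dataplot. The
following is a minimal R sketch, not the Handbook's Dataplot macro, and it
assumes the 195 heat flow meter readings have already been read into a numeric
vector named y (the name is an assumption):

  ## Minimal R sketch of the main quantitative checks for the vector y.
  n <- length(y)
  m <- mean(y)
  s <- sd(y)

  ## 95 % confidence interval for the mean (t-based); compare with (9.258, 9.265)
  m + c(-1, 1) * qt(0.975, n - 1) * s / sqrt(n)

  ## 95 % confidence interval for the standard deviation (chi-square based);
  ## compare with (0.0207, 0.0253)
  sqrt((n - 1) * s^2 / qchisq(c(0.975, 0.025), n - 1))

  ## Lag-1 autocorrelation; compare with the reported 0.28
  acf(y, lag.max = 1, plot = FALSE)$acf[2]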
1. Exploratory Data Analysis
1.4. EDA Case Studies
1.4.2. Case Studies
1.4.2.9. Fatigue Life of Aluminum Alloy
Specimens
Fatigue Life of Aluminum Alloy Specimens
This example illustrates the univariate analysis of the fatigue
life of aluminum alloy specimens.
1. Background and Data
2. Graphical Output and Interpretation
1. Exploratory Data Analysis
1.4. EDA Case Studies
1.4.2. Case Studies
1.4.2.9. Fatigue Life of Aluminum Alloy Specimens
1.4.2.9.1. Background and Data
Generation This data set comprises measurements of fatigue life
(thousands of cycles until rupture) of rectangular strips of
6061-T6 aluminum sheeting, subjected to periodic loading
with maximum stress of 21,000 psi (pounds per square inch),
as reported by Birnbaum and Saunders (1958).
Purpose of
Analysis
The goal of this case study is to select a probabilistic model,
from among several reasonable alternatives, to describe the
dispersion of the resulting measured values of life-length.
The original study, in the field of statistical reliability analysis,
was concerned with the prediction of failure times of a
material subjected to a load varying in time. It was well-
known that a structure designed to withstand a particular static
load may fail sooner than expected under a dynamic load.
If a realistic model for the probability distribution of lifetime
can be found, then it can be used to estimate the time by
which a part or structure needs to be replaced to guarantee
that the probability of failure does not exceed some maximum
acceptable value, for example 0.1 %, while it is in service.
The chapter of this eHandbook that is concerned with the
assessment of product reliability contains additional material
on statistical methods used in reliability analysis. This case
study is meant to complement that chapter by showing the use
of graphical and other techniques in the model selection stage
of such analysis.
When there is no cogent reason to adopt a particular model, or
when none of the models under consideration seems adequate
for the purpose, one may opt for a non-parametric statistical
method, for example to produce tolerance bounds or
confidence intervals.
A non-parametric method does not rely on the assumption that
the data are like a sample from a particular probability
distribution that is fully specified up to the values of some
adjustable parameters. For example, the Gaussian probability
distribution is a parametric model with two adjustable
parameters.
The price to be paid when using non-parametric methods is
loss of efficiency, meaning that they may require more data
for statistical inference than a parametric counterpart would, if
applicable. For example, non-parametric confidence intervals
for model parameters may be considerably wider than what a
confidence interval would need to be if the underlying
distribution could be identified correctly. Such identification is
what we will attempt in this case study.
It should be noted --- a point that we will stress later in the
development of this case study --- that the very exercise of
selecting a model often contributes substantially to the
uncertainty of the conclusions derived after the selection has
been made.
Software The analyses used in this case study can be generated using R
code.
Data The following data are used for this case study.
370 1016 1235 1419 1567 1820
706 1018 1238 1420 1578 1868
716 1020 1252 1420 1594 1881
746 1055 1258 1450 1602 1890
785 1085 1262 1452 1604 1893
797 1102 1269 1475 1608 1895
844 1102 1270 1478 1630 1910
855 1108 1290 1481 1642 1923
858 1115 1293 1485 1674 1940
886 1120 1300 1502 1730 1945
886 1134 1310 1505 1750 2023
930 1140 1313 1513 1750 2100
960 1199 1315 1522 1763 2130
988 1200 1330 1522 1768 2215
990 1200 1355 1530 1781 2268
1000 1203 1390 1540 1782 2440
1010 1222 1416 1560 1792
1. Exploratory Data Analysis
1.4. EDA Case Studies
1.4.2. Case Studies
1.4.2.9. Fatigue Life of Aluminum Alloy Specimens
1.4.2.9.2. Graphical Output and Interpretation
Goal The goal of this analysis is to select a probabilistic model to describe the dispersion
of the measured values of fatigue life of specimens of an aluminum alloy described
in [1.4.2.9.1], from among several reasonable alternatives.
Initial Plots
of the Data
Simple diagrams can be very informative about location and spread, and can help
detect possibly anomalous data values or particular patterns (clustering, for
example). These include dot charts, boxplots, and histograms. Since building an
effective histogram requires a choice of bin size, and this choice can be
influential, one may also wish to examine a non-parametric estimate of the
underlying probability density.
These several plots variously show that the measurements range from a value
slightly greater than 350,000 to slightly less than 2,500,000 cycles. The boxplot
suggests that the largest measured value may be an outlier.
A recommended first step is to check consistency between the data and what is to be
expected if the data were a sample from a particular probability distribution.
Knowledge about the underlying properties of materials and of relevant industrial
processes typically offers clues as to the models that should be entertained. Graphical
diagnostic techniques can be very useful at this exploratory stage: foremost among
these, for univariate data, is the quantile-quantile plot, or QQ-plot (Wilk and
Gnanadesikan, 1968).
Each data point is represented by one point in the QQ-plot. The ordinate of each of
these points is one data value; if this data value happens to be the kth order statistic
in the sample (that is, the kth smallest value), then the corresponding abscissa is the
"typical" value that the kth smallest value should have in a sample of the same size as
the data, drawn from a particular distribution. If F denotes the cumulative
probability distribution function of interest, and the sample comprises n values, then
F^(-1)[(k - 1/2) / (n + 1/2)] is a reasonable choice for that "typical" value, because it is
an approximation to the median of the kth order statistic in a sample of size n from
this distribution.
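As an illustration of this construction, the following minimal R sketch computes
these "typical" values for a fitted Gaussian and plots them against the ordered
data; the vector name cycles for the fatigue-life data is an assumption, not part
of the original analysis:

  ## Gaussian QQ-plot built directly from the "typical values" described above,
  ## assuming the fatigue lives (thousands of cycles) are in the vector cycles.
  n <- length(cycles)
  k <- 1:n
  p <- (k - 0.5) / (n + 0.5)                         # (k - 1/2) / (n + 1/2)
  theoretical <- qnorm(p, mean(cycles), sd(cycles))  # F^(-1) for a fitted Gaussian
  plot(theoretical, sort(cycles),
       xlab = "Gaussian quantiles", ylab = "Ordered data")
  abline(0, 1)   # perfect agreement would put the points on this line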
The following figure shows a QQ-plot of our data relative to the Gaussian (or,
normal) probability distribution. If the data matched expectations perfectly, then the
points would all fall on a straight line.
In practice, one needs to gauge whether the deviations from such perfect alignment
are commensurate with the natural variability associated with sampling. This can
easily be done by examining how variable QQ-plots of samples from the target
distribution may be.
The following figure shows, superimposed on the QQ-plot of the data, the QQ-plots
of 99 samples of the same size as the data, drawn from a Gaussian distribution with
the same mean and standard deviation as the data.
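A sketch of how such an envelope can be produced in R is given below; it assumes
the data are in the vector cycles (an assumed name) and uses an arbitrary random
seed, so the resulting cloud will differ in detail from the Handbook's figure:

  ## Overlay the data's QQ-plot with QQ-plots of 99 simulated Gaussian samples
  ## of the same size, mean, and standard deviation.
  set.seed(1)    # arbitrary seed for this sketch
  qq <- qqnorm(cycles, plot.it = FALSE)
  plot(qq$x, qq$y, xlab = "Standard normal quantiles", ylab = "Ordered data")
  for (i in 1:99) {
    sim <- rnorm(length(cycles), mean(cycles), sd(cycles))
    points(qqnorm(sim, plot.it = FALSE), col = "grey", pch = ".")
  }
  points(qq$x, qq$y)   # redraw the data on top of the grey cloud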
The fact that the cloud of QQ-plots corresponding to 99 samples from the Gaussian
distribution effectively covers the QQ-plot for the data suggests that the data are
consistent with the Gaussian model: if the model were correct, a QQ-plot falling
outside such an envelope would be expected only about 1 time in 100.
This proves nothing, of course, because even the rarest of events may happen.
However, it is commonly taken to be indicative of an acceptable fit for general
purposes. In any case, one may naturally wonder if an alternative model might not
provide an even better fit.
Knowing the provenance of the data, namely that they portray the fatigue life of a
material under cyclic load, strongly suggests that one may like to examine alternative
models, because in many studies of reliability non-Gaussian models tend to be more
appropriate than Gaussian models.
Candidate
Distributions
There are many probability distributions that could reasonably be entertained as
candidate models for the data. However, we will restrict ourselves to consideration
of the following because these have proven to be useful in reliability studies.
Normal distribution
Gamma distribution
Birnbaum-Saunders distribution
3-parameter Weibull distribution
Approach A very simple approach amounts to comparing QQ-plots of the data for the
candidate models under consideration. This typically involves first fitting the models
to the data, for example employing the method of maximum likelihood [1.3.6.5.2].
The maximum likelihood estimates are the following:
Gaussian: mean 1401, standard deviation 389
Gamma: shape 11.85, rate 0.00846
Birnbaum-Saunders: shape 0.310, scale 1337
3-parameter Weibull: location 181, shape 3.43, scale 1357
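Two of these fits can be reproduced with MASS::fitdistr, as in the hedged R
sketch below; the Birnbaum-Saunders and 3-parameter Weibull fits require a custom
likelihood or an additional package and are not shown. The vector name cycles is
an assumption:

  ## Maximum likelihood fits for two of the candidate models via MASS::fitdistr.
  library(MASS)
  fit.gau <- fitdistr(cycles, "normal")  # compare with mean 1401, sd 389
  fit.gam <- fitdistr(cycles, "gamma")   # compare with shape 11.85, rate 0.00846
                                         # (if the optimizer complains, rescale the
                                         # data or supply starting values)
  fit.gau$estimate
  fit.gam$estimate

  ## AIC for the Gaussian fit and a hand-computed BIC; compare with 1495 and 1501
  c(AIC = AIC(fit.gau),
    BIC = -2 * as.numeric(logLik(fit.gau)) + 2 * log(length(cycles)))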
The following figure shows how close (or how far) the best fitting probability
densities of the four distributions approximate the non-parametric probability
density estimate. This comparison, however, takes into account neither the fact that
our sample is fairly small (101 measured values), nor that the fitted models
themselves have been estimated from the same data that the non-parametric estimate
was derived from.
These limitations notwithstanding, it is worth examining the corresponding QQ-
plots, shown below, which suggest that the Gaussian and the 3-parameter Weibull
may be the best models.
Model
Selection
A more careful comparison of the merits of the alternative models needs to take into
account the fact that the 3-parameter Weibull model (precisely because it has three
parameters), may be intrinsically more flexible than the others, which all have two
adjustable parameters only.
Two criteria can be employed for a formal comparison: Akaike's Information
Criterion (AIC) and the Bayesian Information Criterion (BIC) (Hastie et al., 2001).
The smaller the value of either model selection criterion, the better the model:
        AIC    BIC
GAU    1495   1501
GAM    1499   1504
BS     1507   1512
WEI    1498   1505
On this basis (and according both to AIC and BIC), there seems to be no cogent
reason to replace the Gaussian model by any of the other three. The values of BIC
can also be used to derive an approximate answer to the question of how strongly
the data may support each of these models. Doing this involves the application of
Bayesian statistical methods [8.1.10].
We start from an a priori assignment of equal probabilities to all four models,
indicating that we have no reason to favor one over another at the outset, and then
update these probabilities based on the measured values of lifetime. The updated
probabilities of the four models, called their posterior probabilities, are
approximately proportional to exp(-BIC(GAU)/2), exp(-BIC(GAM)/2), exp(-
BIC(BS)/2), and exp(-BIC(WEI)/2). The values are 76 % for GAU, 16 % for GAM,
0.27 % for BS, and 7.4 % for WEI.
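The following R fragment shows this calculation from the rounded BIC values in
the table above; because the tabulated BICs are rounded, the resulting
percentages differ slightly from those quoted in the text:

  ## Approximate posterior model probabilities from the rounded BIC values,
  ## starting from equal prior probabilities for the four models.
  bic <- c(GAU = 1501, GAM = 1504, BS = 1512, WEI = 1505)
  w   <- exp(-(bic - min(bic)) / 2)   # subtracting min(bic) avoids underflow
  round(100 * w / sum(w), 1)          # posterior probabilities in percent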
One possible use for the selected model is to answer the question of the age in
service by which a part or structure needs to be replaced to guarantee that the
probability of failure does not exceed some maximum acceptable value, for example
0.1 %. The answer to this question is the 0.1st percentile of the fitted distribution,
that is, G^(-1)(0.001) = 198 thousand cycles, where, in this case, G^(-1) denotes the
inverse of the fitted Gaussian probability distribution.
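For the Gaussian fit this percentile is a one-line computation; the sketch below
uses the rounded ML estimates quoted earlier, so it only approximately reproduces
the 198 thousand cycles given in the text:

  ## 0.1st percentile of the fitted Gaussian model, using the rounded estimates.
  qnorm(0.001, mean = 1401, sd = 389)   # approximately 199 thousand cycles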
To assess the uncertainty of this estimate one may employ the statistical bootstrap
[1.3.3.4]. In this case, this involves drawing a suitably large number of bootstrap
samples from the data, and for each of them applying the model fitting and model
selection exercise described above, ending with the calculation of G^(-1)(0.001) for
the best model (which may vary from sample to sample).
The bootstrap samples should be of the same size as the data, with each being drawn
uniformly at random from the data, with replacement. This process, based on 5,000
bootstrap samples, yielded a 95 % confidence interval for the 0.1st percentile
ranging from 40 to 366 thousands of cycles. The large uncertainty is not surprising
given that we are attempting to estimate the largest value that is exceeded with
probability 99.9 %, based on a sample comprising only 101 measured values.
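A simplified percentile-bootstrap sketch is shown below. To keep it short it
refits only the Gaussian model in each replicate, whereas the procedure described
in the text repeats the full model-selection exercise; the vector name cycles is
again an assumption:

  ## Simplified percentile bootstrap for the 0.1st percentile (Gaussian fit only).
  set.seed(1)                      # arbitrary seed for this sketch
  B <- 5000
  q001 <- replicate(B, {
    b <- sample(cycles, replace = TRUE)   # resample with replacement, same size
    qnorm(0.001, mean(b), sd(b))
  })
  quantile(q001, c(0.025, 0.975))  # an approximate 95 % interval for the percentile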
Prediction
Intervals
One more application in this analysis is to evaluate prediction intervals for the
fatigue life of the aluminum alloy specimens. For example, if we were to test three
new specimens using the same process, we would want to know (with 95 %
confidence) the minimum number of cycles for these three specimens. That is, we
need to find a statistical interval [L, ∞) that contains the fatigue life of all three
future specimens with 95 % confidence. The desired interval is a one-sided, lower
95 % prediction interval. Since tables of factors for constructing L are widely
available for normal models, we use the results corresponding to the normal model
here for illustration. Specifically, L is computed as L = ȳ - r s, where ȳ and s are
the sample mean and standard deviation, and the factor r is given in Table A.14 of
Hahn and Meeker (1991) or can be obtained from an R program.
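The sketch below shows the form of the computation only; the factor r is left as a
placeholder because its value must be looked up in Table A.14 of Hahn and Meeker
(1991) or computed numerically, and it is not reproduced here:

  ## One-sided lower 95 % prediction bound L = ybar - r * s for three future
  ## specimens. The factor r is a placeholder; look it up for n = 101 and
  ## three future observations.
  ybar <- mean(cycles)
  s    <- sd(cycles)
  r    <- NA            # placeholder: value not reproduced here
  L    <- ybar - r * s
  L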
1. Exploratory Data Analysis
1.4. EDA Case Studies
1.4.2. Case Studies
1.4.2.10. Ceramic Strength
Ceramic
Strength
This case study analyzes the effect of machining factors on the
strength of ceramics.
1. Background and Data
2. Analysis of the Response Variable
3. Analysis of Batch Effect
4. Analysis of Lab Effect
5. Analysis of Primary Factors
6. Work This Example Yourself
1. Exploratory Data Analysis
1.4. EDA Case Studies
1.4.2. Case Studies
1.4.2.10. Ceramic Strength
1.4.2.10.1. Background and Data
Generation The data for this case study were collected by Said Jahanmir
of the NIST Ceramics Division in 1996 in connection with a
NIST/industry ceramics consortium for strength optimization
of ceramic materials.
The motivation for studying this data set is to illustrate the
analysis of multiple factors from a designed experiment.
This case study will utilize only a subset of a full study that
was conducted by Lisa Gill and James Filliben of the NIST
Statistical Engineering Division.
The response variable is a measure of the strength of the
ceramic material (bonded Si nitride). The complete data set
contains the following variables:
1. Factor 1 = Observation ID, i.e., run number (1 to 960)
2. Factor 2 = Lab (1 to 8)
3. Factor 3 = Bar ID within lab (1 to 30)
4. Factor 4 = Test number (1 to 4)
5. Response Variable = Strength of Ceramic
6. Factor 5 = Table speed (2 levels: 0.025 and 0.125)
7. Factor 6 = Down feed rate (2 levels: 0.050 and 0.125)
8. Factor 7 = Wheel grit size (2 levels: 150 and 80)
9. Factor 8 = Direction (2 levels: longitudinal and
transverse)
10. Factor 9 = Treatment (1 to 16)
11. Factor 10 = Set of 15 within lab (2 levels: 1 and 2)
12. Factor 11 = Replication (2 levels: 1 and 2)
13. Factor 12 = Bar Batch (1 and 2)
The four primary factors of interest are:
1. Table speed (X1)
2. Down feed rate (X2)
3. Wheel grit size (X3)
4. Direction (X4)
For this case study, we are using only half the data.
Specifically, we are using the data with the direction
longitudinal. Therefore, we have only three primary factors.
In addition, we are interested in the nuisance factors
1. Lab
2. Batch
Purpose of
Analysis
The goals of this case study are:
1. Determine which of the four primary factors has the
strongest effect on the strength of the ceramic material
2. Estimate the magnitude of the effects
3. Determine the optimal settings for the primary factors
4. Determine if the nuisance factors (lab and batch) have
an effect on the ceramic strength
This case study is an example of a designed experiment. The
Process Improvement chapter contains a detailed discussion of
the construction and analysis of designed experiments. This
case study is meant to complement the material in that chapter
by showing how an EDA approach (emphasizing the use of
graphical techniques) can be used in the analysis of designed
experiments.
Software The analyses used in this case study can be generated using
both Dataplot code and R code.
Data The following are the data used for this case study.
Run Lab Batch Y X1 X2 X3
1 1 1 608.781 -1 -1 -1
2 1 2 569.670 -1 -1 -1
3 1 1 689.556 -1 -1 -1
4 1 2 747.541 -1 -1 -1
5 1 1 618.134 -1 -1 -1
6 1 2 612.182 -1 -1 -1
7 1 1 680.203 -1 -1 -1
8 1 2 607.766 -1 -1 -1
9 1 1 726.232 -1 -1 -1
10 1 2 605.380 -1 -1 -1
11 1 1 518.655 -1 -1 -1
12 1 2 589.226 -1 -1 -1
13 1 1 740.447 -1 -1 -1
14 1 2 588.375 -1 -1 -1
15 1 1 666.830 -1 -1 -1
16 1 2 531.384 -1 -1 -1
17 1 1 710.272 -1 -1 -1
18 1 2 633.417 -1 -1 -1
19 1 1 751.669 -1 -1 -1
20 1 2 619.060 -1 -1 -1
21 1 1 697.979 -1 -1 -1
22 1 2 632.447 -1 -1 -1
23 1 1 708.583 -1 -1 -1
24 1 2 624.256 -1 -1 -1
25 1 1 624.972 -1 -1 -1
26 1 2 575.143 -1 -1 -1
27 1 1 695.070 -1 -1 -1
28 1 2 549.278 -1 -1 -1
29 1 1 769.391 -1 -1 -1
30 1 2 624.972 -1 -1 -1
61 1 1 720.186 -1 1 1
62 1 2 587.695 -1 1 1
63 1 1 723.657 -1 1 1
64 1 2 569.207 -1 1 1
65 1 1 703.700 -1 1 1
66 1 2 613.257 -1 1 1
67 1 1 697.626 -1 1 1
68 1 2 565.737 -1 1 1
69 1 1 714.980 -1 1 1
70 1 2 662.131 -1 1 1
71 1 1 657.712 -1 1 1
72 1 2 543.177 -1 1 1
73 1 1 609.989 -1 1 1
74 1 2 512.394 -1 1 1
75 1 1 650.771 -1 1 1
76 1 2 611.190 -1 1 1
77 1 1 707.977 -1 1 1
78 1 2 659.982 -1 1 1
79 1 1 712.199 -1 1 1
80 1 2 569.245 -1 1 1
81 1 1 709.631 -1 1 1
82 1 2 725.792 -1 1 1
83 1 1 703.160 -1 1 1
84 1 2 608.960 -1 1 1
85 1 1 744.822 -1 1 1
86 1 2 586.060 -1 1 1
87 1 1 719.217 -1 1 1
88 1 2 617.441 -1 1 1
89 1 1 619.137 -1 1 1
90 1 2 592.845 -1 1 1
151 2 1 753.333 1 1 1
152 2 2 631.754 1 1 1
153 2 1 677.933 1 1 1
154 2 2 588.113 1 1 1
155 2 1 735.919 1 1 1
156 2 2 555.724 1 1 1
157 2 1 695.274 1 1 1
158 2 2 702.411 1 1 1
159 2 1 504.167 1 1 1
160 2 2 631.754 1 1 1
161 2 1 693.333 1 1 1
162 2 2 698.254 1 1 1
163 2 1 625.000 1 1 1
164 2 2 616.791 1 1 1
165 2 1 596.667 1 1 1
166 2 2 551.953 1 1 1
167 2 1 640.898 1 1 1
168 2 2 636.738 1 1 1
169 2 1 720.506 1 1 1
170 2 2 571.551 1 1 1
171 2 1 700.748 1 1 1
172 2 2 521.667 1 1 1
173 2 1 691.604 1 1 1
174 2 2 587.451 1 1 1
175 2 1 636.738 1 1 1
176 2 2 700.422 1 1 1
177 2 1 731.667 1 1 1
178 2 2 595.819 1 1 1
179 2 1 635.079 1 1 1
180 2 2 534.236 1 1 1
181 2 1 716.926 1 -1 -1
182 2 2 606.188 1 -1 -1
183 2 1 759.581 1 -1 -1
184 2 2 575.303 1 -1 -1
185 2 1 673.903 1 -1 -1
186 2 2 590.628 1 -1 -1
187 2 1 736.648 1 -1 -1
188 2 2 729.314 1 -1 -1
189 2 1 675.957 1 -1 -1
190 2 2 619.313 1 -1 -1
191 2 1 729.230 1 -1 -1
192 2 2 624.234 1 -1 -1
193 2 1 697.239 1 -1 -1
194 2 2 651.304 1 -1 -1
195 2 1 728.499 1 -1 -1
196 2 2 724.175 1 -1 -1
197 2 1 797.662 1 -1 -1
198 2 2 583.034 1 -1 -1
199 2 1 668.530 1 -1 -1
200 2 2 620.227 1 -1 -1
201 2 1 815.754 1 -1 -1
202 2 2 584.861 1 -1 -1
203 2 1 777.392 1 -1 -1
204 2 2 565.391 1 -1 -1
205 2 1 712.140 1 -1 -1
206 2 2 622.506 1 -1 -1
207 2 1 663.622 1 -1 -1
208 2 2 628.336 1 -1 -1
209 2 1 684.181 1 -1 -1
210 2 2 587.145 1 -1 -1
271 3 1 629.012 1 -1 1
272 3 2 584.319 1 -1 1
273 3 1 640.193 1 -1 1
274 3 2 538.239 1 -1 1
275 3 1 644.156 1 -1 1
276 3 2 538.097 1 -1 1
277 3 1 642.469 1 -1 1
278 3 2 595.686 1 -1 1
279 3 1 639.090 1 -1 1
280 3 2 648.935 1 -1 1
281 3 1 439.418 1 -1 1
282 3 2 583.827 1 -1 1
283 3 1 614.664 1 -1 1
284 3 2 534.905 1 -1 1
285 3 1 537.161 1 -1 1
286 3 2 569.858 1 -1 1
287 3 1 656.773 1 -1 1
288 3 2 617.246 1 -1 1
289 3 1 659.534 1 -1 1
290 3 2 610.337 1 -1 1
291 3 1 695.278 1 -1 1
292 3 2 584.192 1 -1 1
293 3 1 734.040 1 -1 1
294 3 2 598.853 1 -1 1
295 3 1 687.665 1 -1 1
296 3 2 554.774 1 -1 1
297 3 1 710.858 1 -1 1
298 3 2 605.694 1 -1 1
299 3 1 701.716 1 -1 1
300 3 2 627.516 1 -1 1
301 3 1 382.133 1 1 -1
302 3 2 574.522 1 1 -1
303 3 1 719.744 1 1 -1
304 3 2 582.682 1 1 -1
305 3 1 756.820 1 1 -1
306 3 2 563.872 1 1 -1
307 3 1 690.978 1 1 -1
308 3 2 715.962 1 1 -1
309 3 1 670.864 1 1 -1
310 3 2 616.430 1 1 -1
311 3 1 670.308 1 1 -1
312 3 2 778.011 1 1 -1
313 3 1 660.062 1 1 -1
314 3 2 604.255 1 1 -1
315 3 1 790.382 1 1 -1
316 3 2 571.906 1 1 -1
317 3 1 714.750 1 1 -1
318 3 2 625.925 1 1 -1
319 3 1 716.959 1 1 -1
320 3 2 682.426 1 1 -1
321 3 1 603.363 1 1 -1
322 3 2 707.604 1 1 -1
323 3 1 713.796 1 1 -1
324 3 2 617.400 1 1 -1
325 3 1 444.963 1 1 -1
326 3 2 689.576 1 1 -1
327 3 1 723.276 1 1 -1
328 3 2 676.678 1 1 -1
329 3 1 745.527 1 1 -1
330 3 2 563.290 1 1 -1
361 4 1 778.333 -1 -1 1
362 4 2 581.879 -1 -1 1
363 4 1 723.349 -1 -1 1
364 4 2 447.701 -1 -1 1
365 4 1 708.229 -1 -1 1
366 4 2 557.772 -1 -1 1
367 4 1 681.667 -1 -1 1
368 4 2 593.537 -1 -1 1
369 4 1 566.085 -1 -1 1
370 4 2 632.585 -1 -1 1
371 4 1 687.448 -1 -1 1
372 4 2 671.350 -1 -1 1
373 4 1 597.500 -1 -1 1
374 4 2 569.530 -1 -1 1
375 4 1 637.410 -1 -1 1
376 4 2 581.667 -1 -1 1
377 4 1 755.864 -1 -1 1
378 4 2 643.449 -1 -1 1
379 4 1 692.945 -1 -1 1
380 4 2 581.593 -1 -1 1
381 4 1 766.532 -1 -1 1
382 4 2 494.122 -1 -1 1
383 4 1 725.663 -1 -1 1
384 4 2 620.948 -1 -1 1
385 4 1 698.818 -1 -1 1
386 4 2 615.903 -1 -1 1
387 4 1 760.000 -1 -1 1
388 4 2 606.667 -1 -1 1
389 4 1 775.272 -1 -1 1
390 4 2 579.167 -1 -1 1
421 4 1 708.885 -1 1 -1
422 4 2 662.510 -1 1 -1
423 4 1 727.201 -1 1 -1
424 4 2 436.237 -1 1 -1
425 4 1 642.560 -1 1 -1
426 4 2 644.223 -1 1 -1
427 4 1 690.773 -1 1 -1
428 4 2 586.035 -1 1 -1
429 4 1 688.333 -1 1 -1
430 4 2 620.833 -1 1 -1
431 4 1 743.973 -1 1 -1
432 4 2 652.535 -1 1 -1
433 4 1 682.461 -1 1 -1
434 4 2 593.516 -1 1 -1
435 4 1 761.430 -1 1 -1
436 4 2 587.451 -1 1 -1
437 4 1 691.542 -1 1 -1
438 4 2 570.964 -1 1 -1
439 4 1 643.392 -1 1 -1
440 4 2 645.192 -1 1 -1
441 4 1 697.075 -1 1 -1
442 4 2 540.079 -1 1 -1
443 4 1 708.229 -1 1 -1
444 4 2 707.117 -1 1 -1
445 4 1 746.467 -1 1 -1
446 4 2 621.779 -1 1 -1
447 4 1 744.819 -1 1 -1
448 4 2 585.777 -1 1 -1
449 4 1 655.029 -1 1 -1
450 4 2 703.980 -1 1 -1
541 5 1 715.224 -1 -1 -1
542 5 2 698.237 -1 -1 -1
543 5 1 614.417 -1 -1 -1
544 5 2 757.120 -1 -1 -1
545 5 1 761.363 -1 -1 -1
546 5 2 621.751 -1 -1 -1
547 5 1 716.106 -1 -1 -1
548 5 2 472.125 -1 -1 -1
549 5 1 659.502 -1 -1 -1
550 5 2 612.700 -1 -1 -1
551 5 1 730.781 -1 -1 -1
552 5 2 583.170 -1 -1 -1
553 5 1 546.928 -1 -1 -1
554 5 2 599.771 -1 -1 -1
555 5 1 734.203 -1 -1 -1
556 5 2 549.227 -1 -1 -1
557 5 1 682.051 -1 -1 -1
558 5 2 605.453 -1 -1 -1
559 5 1 701.341 -1 -1 -1
560 5 2 569.599 -1 -1 -1
561 5 1 759.729 -1 -1 -1
562 5 2 637.233 -1 -1 -1
563 5 1 689.942 -1 -1 -1
564 5 2 621.774 -1 -1 -1
565 5 1 769.424 -1 -1 -1
566 5 2 558.041 -1 -1 -1
567 5 1 715.286 -1 -1 -1
568 5 2 583.170 -1 -1 -1
569 5 1 776.197 -1 -1 -1
570 5 2 345.294 -1 -1 -1
571 5 1 547.099 1 -1 1
572 5 2 570.999 1 -1 1
573 5 1 619.942 1 -1 1
574 5 2 603.232 1 -1 1
575 5 1 696.046 1 -1 1
576 5 2 595.335 1 -1 1
577 5 1 573.109 1 -1 1
578 5 2 581.047 1 -1 1
579 5 1 638.794 1 -1 1
580 5 2 455.878 1 -1 1
581 5 1 708.193 1 -1 1
582 5 2 627.880 1 -1 1
583 5 1 502.825 1 -1 1
584 5 2 464.085 1 -1 1
585 5 1 632.633 1 -1 1
586 5 2 596.129 1 -1 1
587 5 1 683.382 1 -1 1
588 5 2 640.371 1 -1 1
589 5 1 684.812 1 -1 1
590 5 2 621.471 1 -1 1
591 5 1 738.161 1 -1 1
592 5 2 612.727 1 -1 1
593 5 1 671.492 1 -1 1
594 5 2 606.460 1 -1 1
595 5 1 709.771 1 -1 1
596 5 2 571.760 1 -1 1
597 5 1 685.199 1 -1 1
598 5 2 599.304 1 -1 1
599 5 1 624.973 1 -1 1
600 5 2 579.459 1 -1 1
601 6 1 757.363 1 1 1
602 6 2 761.511 1 1 1
603 6 1 633.417 1 1 1
604 6 2 566.969 1 1 1
605 6 1 658.754 1 1 1
606 6 2 654.397 1 1 1
607 6 1 664.666 1 1 1
608 6 2 611.719 1 1 1
609 6 1 663.009 1 1 1
610 6 2 577.409 1 1 1
611 6 1 773.226 1 1 1
612 6 2 576.731 1 1 1
613 6 1 708.261 1 1 1
614 6 2 617.441 1 1 1
615 6 1 739.086 1 1 1
616 6 2 577.409 1 1 1
617 6 1 667.786 1 1 1
618 6 2 548.957 1 1 1
619 6 1 674.481 1 1 1
620 6 2 623.315 1 1 1
621 6 1 695.688 1 1 1
622 6 2 621.761 1 1 1
623 6 1 588.288 1 1 1
624 6 2 553.978 1 1 1
625 6 1 545.610 1 1 1
626 6 2 657.157 1 1 1
627 6 1 752.305 1 1 1
628 6 2 610.882 1 1 1
629 6 1 684.523 1 1 1
630 6 2 552.304 1 1 1
631 6 1 717.159 -1 1 -1
632 6 2 545.303 -1 1 -1
633 6 1 721.343 -1 1 -1
634 6 2 651.934 -1 1 -1
635 6 1 750.623 -1 1 -1
636 6 2 635.240 -1 1 -1
637 6 1 776.488 -1 1 -1
638 6 2 641.083 -1 1 -1
639 6 1 750.623 -1 1 -1
640 6 2 645.321 -1 1 -1
641 6 1 600.840 -1 1 -1
642 6 2 566.127 -1 1 -1
643 6 1 686.196 -1 1 -1
644 6 2 647.844 -1 1 -1
645 6 1 687.870 -1 1 -1
646 6 2 554.815 -1 1 -1
647 6 1 725.527 -1 1 -1
648 6 2 620.087 -1 1 -1
649 6 1 658.796 -1 1 -1
650 6 2 711.301 -1 1 -1
651 6 1 690.380 -1 1 -1
652 6 2 644.355 -1 1 -1
653 6 1 737.144 -1 1 -1
654 6 2 713.812 -1 1 -1
655 6 1 663.851 -1 1 -1
656 6 2 696.707 -1 1 -1
657 6 1 766.630 -1 1 -1
658 6 2 589.453 -1 1 -1
659 6 1 625.922 -1 1 -1
660 6 2 634.468 -1 1 -1
721 7 1 694.430 1 1 -1
722 7 2 599.751 1 1 -1
723 7 1 730.217 1 1 -1
724 7 2 624.542 1 1 -1
725 7 1 700.770 1 1 -1
726 7 2 723.505 1 1 -1
727 7 1 722.242 1 1 -1
728 7 2 674.717 1 1 -1
729 7 1 763.828 1 1 -1
730 7 2 608.539 1 1 -1
731 7 1 695.668 1 1 -1
732 7 2 612.135 1 1 -1
733 7 1 688.887 1 1 -1
734 7 2 591.935 1 1 -1
735 7 1 531.021 1 1 -1
736 7 2 676.656 1 1 -1
737 7 1 698.915 1 1 -1
738 7 2 647.323 1 1 -1
739 7 1 735.905 1 1 -1
740 7 2 811.970 1 1 -1
741 7 1 732.039 1 1 -1
742 7 2 603.883 1 1 -1
743 7 1 751.832 1 1 -1
744 7 2 608.643 1 1 -1
745 7 1 618.663 1 1 -1
746 7 2 630.778 1 1 -1
747 7 1 744.845 1 1 -1
748 7 2 623.063 1 1 -1
749 7 1 690.826 1 1 -1
750 7 2 472.463 1 1 -1
811 7 1 666.893 -1 1 1
812 7 2 645.932 -1 1 1
813 7 1 759.860 -1 1 1
814 7 2 577.176 -1 1 1
815 7 1 683.752 -1 1 1
816 7 2 567.530 -1 1 1
817 7 1 729.591 -1 1 1
818 7 2 821.654 -1 1 1
819 7 1 730.706 -1 1 1
820 7 2 684.490 -1 1 1
821 7 1 763.124 -1 1 1
822 7 2 600.427 -1 1 1
823 7 1 724.193 -1 1 1
824 7 2 686.023 -1 1 1
825 7 1 630.352 -1 1 1
826 7 2 628.109 -1 1 1
827 7 1 750.338 -1 1 1
828 7 2 605.214 -1 1 1
829 7 1 752.417 -1 1 1
830 7 2 640.260 -1 1 1
831 7 1 707.899 -1 1 1
832 7 2 700.767 -1 1 1
833 7 1 715.582 -1 1 1
834 7 2 665.924 -1 1 1
835 7 1 728.746 -1 1 1
836 7 2 555.926 -1 1 1
837 7 1 591.193 -1 1 1
838 7 2 543.299 -1 1 1
839 7 1 592.252 -1 1 1
840 7 2 511.030 -1 1 1
901 8 1 740.833 -1 -1 1
902 8 2 583.994 -1 -1 1
903 8 1 786.367 -1 -1 1
904 8 2 611.048 -1 -1 1
905 8 1 712.386 -1 -1 1
906 8 2 623.338 -1 -1 1
907 8 1 738.333 -1 -1 1
908 8 2 679.585 -1 -1 1
909 8 1 741.480 -1 -1 1
910 8 2 665.004 -1 -1 1
911 8 1 729.167 -1 -1 1
912 8 2 655.860 -1 -1 1
913 8 1 795.833 -1 -1 1
914 8 2 715.711 -1 -1 1
915 8 1 723.502 -1 -1 1
916 8 2 611.999 -1 -1 1
917 8 1 718.333 -1 -1 1
918 8 2 577.722 -1 -1 1
919 8 1 768.080 -1 -1 1
920 8 2 615.129 -1 -1 1
921 8 1 747.500 -1 -1 1
922 8 2 540.316 -1 -1 1
923 8 1 775.000 -1 -1 1
924 8 2 711.667 -1 -1 1
925 8 1 760.599 -1 -1 1
926 8 2 639.167 -1 -1 1
927 8 1 758.333 -1 -1 1
928 8 2 549.491 -1 -1 1
929 8 1 682.500 -1 -1 1
930 8 2 684.167 -1 -1 1
931 8 1 658.116 1 -1 -1
932 8 2 672.153 1 -1 -1
933 8 1 738.213 1 -1 -1
934 8 2 594.534 1 -1 -1
935 8 1 681.236 1 -1 -1
936 8 2 627.650 1 -1 -1
937 8 1 704.904 1 -1 -1
938 8 2 551.870 1 -1 -1
939 8 1 693.623 1 -1 -1
940 8 2 594.534 1 -1 -1
941 8 1 624.993 1 -1 -1
942 8 2 602.660 1 -1 -1
943 8 1 700.228 1 -1 -1
944 8 2 585.450 1 -1 -1
945 8 1 611.874 1 -1 -1
946 8 2 555.724 1 -1 -1
947 8 1 579.167 1 -1 -1
948 8 2 574.934 1 -1 -1
949 8 1 720.872 1 -1 -1
950 8 2 584.625 1 -1 -1
951 8 1 690.320 1 -1 -1
952 8 2 555.724 1 -1 -1
953 8 1 677.933 1 -1 -1
954 8 2 611.874 1 -1 -1
955 8 1 674.600 1 -1 -1
956 8 2 698.254 1 -1 -1
957 8 1 611.999 1 -1 -1
958 8 2 748.130 1 -1 -1
959 8 1 530.680 1 -1 -1
960 8 2 689.942 1 -1 -1
1. Exploratory Data Analysis
1.4. EDA Case Studies
1.4.2. Case Studies
1.4.2.10. Ceramic Strength
1.4.2.10.2. Analysis of the Response Variable
Numerical
Summary
As a first step in the analysis, common summary statistics are
computed for the response variable.
Sample size = 480
Mean = 650.0773
Median = 646.6275
Minimum = 345.2940
Maximum = 821.6540
Range = 476.3600
Stan. Dev. = 74.6383
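These summary statistics are straightforward to reproduce; for example, the
following R fragment computes them, assuming the 480 strength values are in a
vector named Y (the name is an assumption):

  ## Summary statistics for the response variable.
  c(n      = length(Y),
    mean   = mean(Y),
    median = median(Y),
    min    = min(Y),
    max    = max(Y),
    range  = diff(range(Y)),
    sd     = sd(Y))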
4-Plot The next step is to generate a 4-plot of the response variable.
This 4-plot shows:
1. The run sequence plot (upper left corner) shows that the
location and scale are relatively constant. It also shows a
few outliers on the low side. Most of the points are in
the range 500 to 750. However, there are about half a
dozen points in the 300 to 450 range that may require
special attention.
A run sequence plot is useful for designed experiments
in that it can reveal time effects. Time is normally a
nuisance factor. That is, the time order in which runs
are made should not have a significant effect on the
response. If a time effect does appear to exist, this
means that there is a potential bias in the experiment that
needs to be investigated and resolved.
2. The lag plot (the upper right corner) does not show any
significant structure. This is another tool for detecting
any potential time effect.
3. The histogram (the lower left corner) shows the
response appears to be reasonably symmetric, but with a
bimodal distribution.
4. The normal probability plot (the lower right corner)
shows some curvature indicating that distributions other
than the normal may provide a better fit.
1. Exploratory Data Analysis
1.4. EDA Case Studies
1.4.2. Case Studies
1.4.2.10. Ceramic Strength
1.4.2.10.3. Analysis of the Batch Effect
Batch is a
Nuisance
Factor
The two nuisance factors in this experiment are the batch
number and the lab. There are two batches and eight labs.
Ideally, these factors will have minimal effect on the
response variable.
We will investigate the batch factor first.
Bihistogram
This bihistogram shows the following.
1. There does appear to be a batch effect.
2. The batch 1 responses are centered at 700 while the
batch 2 responses are centered at 625. That is, the
batch effect is approximately 75 units.
3. The variability is comparable for the 2 batches.
4. Batch 1 has some skewness in the lower tail. Batch 2
has some skewness in the center of the distribution, but
not as much in the tails compared to batch 1.
5. Both batches have a few low-lying points.
Although we could stop with the bihistogram, we will show a
few other commonly used two-sample graphical techniques
for comparison.
Quantile-
Quantile
Plot
This q-q plot shows the following.
1. Except for a few points in the right tail, the batch 1
values have higher quantiles than the batch 2 values.
This implies that batch 1 has a greater location value
than batch 2.
2. The q-q plot is not linear. This implies that the
difference between the batches is not explained simply
by a shift in location. That is, the variation and/or
skewness varies as well. From the bihistogram, it
appears that the skewness in batch 2 is the most likely
explanation for the non-linearity in the q-q plot.
Box Plot
This box plot shows the following.
1. The median for batch 1 is approximately 700 while the
median for batch 2 is approximately 600.
2. The spread is reasonably similar for both batches,
maybe slightly larger for batch 1.
3. Both batches have a number of outliers on the low side.
Batch 2 also has a few outliers on the high side. Box
plots are a particularly effective method for identifying
the presence of outliers.
Block Plots A block plot is generated for each of the eight labs, with "1"
and "2" denoting the batch numbers. In the first plot, we do
not include any of the primary factors. The next 3 block plots
include one of the primary factors. Note that each of the 3
primary factors (table speed = X1, down feed rate = X2,
wheel grit size = X3) has 2 levels. With 8 labs and 2 levels
for the primary factor, we would expect 16 separate blocks on
these plots. The fact that some of these blocks are missing
indicates that some of the combinations of lab and primary
factor are empty.
These block plots show the following.
1. The mean for batch 1 is greater than the mean for batch
2 in all of the cases above. This is strong evidence that
the batch effect is real and consistent across labs and
primary factors.
Quantitative
Techniques
We can confirm some of the conclusions drawn from the
above graphics by using quantitative techniques. The F-test
can be used to test whether or not the variances from the two
batches are equal and the two sample t-test can be used to
test whether or not the means from the two batches are equal.
Summary statistics for each batch are shown below.
Batch 1:
NUMBER OF OBSERVATIONS = 240
MEAN = 688.9987
STANDARD DEVIATION = 65.5491
VARIANCE = 4296.6845
Batch 2:
NUMBER OF OBSERVATIONS = 240
MEAN = 611.1559
STANDARD DEVIATION = 61.8543
VARIANCE = 3825.9544
F-Test The two-sided F-test indicates that the variances for the two
batches are not significantly different at the 5 % level.
H0: σ1² = σ2²
Ha: σ1² ≠ σ2²

Test statistic:                 F = 1.123
Numerator degrees of freedom:   ν1 = 239
Denominator degrees of freedom: ν2 = 239
Significance level:             α = 0.05
Critical values:                F(1-α/2, ν1, ν2) = 0.845
                                F(α/2, ν1, ν2) = 1.289
Critical region:                Reject H0 if F < 0.845 or F > 1.289
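For reference, the same two-sided F-test can be run in R with var.test, assuming
the strength values for the two batches are stored in vectors batch1 and batch2
(the names are assumptions):

  ## Two-sided F-test for equality of the two batch variances.
  sd(batch1)^2 / sd(batch2)^2   # the F statistic; compare with 1.123
  var.test(batch1, batch2, alternative = "two.sided", conf.level = 0.95)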
Two Sample
t-Test
Since the F-test indicates that the two batch variances are
equal, we can pool the variances for the two-sided, two-
sample t-test to compare batch means.
H0: μ1 = μ2
Ha: μ1 ≠ μ2

Test statistic:             T = 13.3806
Pooled standard deviation:  sp = 63.7285
Degrees of freedom:         ν = 478
Significance level:         α = 0.05
Critical value:             t(1-α/2, ν) = 1.965
Critical region:            Reject H0 if |T| > 1.965
The t-test indicates that the mean for batch 1 is larger than
the mean for batch 2 at the 5 % significance level.
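The corresponding pooled t-test can be run in R with t.test, again assuming the
batch values are in vectors batch1 and batch2; var.equal = TRUE pools the two
variances as described above:

  ## Pooled two-sample t-test for equality of the two batch means.
  t.test(batch1, batch2, var.equal = TRUE, conf.level = 0.95)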
Conclusions We can draw the following conclusions from the above
analysis.
1. There is in fact a significant batch effect. This batch
effect is consistent across labs and primary factors.
2. The magnitude of the difference is on the order of 75 to
100 (with batch 2 being smaller than batch 1). The
standard deviations do not appear to be significantly
different.
3. There is some skewness in the batches.
This batch effect was completely unexpected by the scientific
investigators in this study.
Note that although the quantitative techniques support the
conclusions of unequal means and equal standard deviations,
they do not show the more subtle features of the data such as
the presence of outliers and the skewness of the batch 2 data.
1. Exploratory Data Analysis
1.4. EDA Case Studies
1.4.2. Case Studies
1.4.2.10. Ceramic Strength
1.4.2.10.4. Analysis of the Lab Effect
Box Plot The next matter is to determine if there is a lab effect. The
first step is to generate a box plot for the ceramic strength
based on the lab.
This box plot shows the following.
1. There is minor variation in the medians for the 8 labs.
2. The scales are relatively constant for the labs.
3. Two of the labs (3 and 5) have outliers on the low side.
Box Plot for
Batch 1
Given that the previous section showed a distinct batch
effect, the next step is to generate the box plots for the two
batches separately.
This box plot shows the following.
1. Each of the labs has a median in the 650 to 700 range.
2. The variability is relatively constant across the labs.
3. Each of the labs has at least one outlier on the low side.
Box Plot for
Batch 2
This box plot shows the following.
1. The medians are in the range 550 to 600.
2. There is a bit more variability, across the labs, for
batch 2 compared to batch 1.
3. Six of the eight labs show outliers on the high side.
Three of the labs show outliers on the low side.
Conclusions We can draw the following conclusions about a possible lab
effect from the above box plots.
1. The batch effect (of approximately 75 to 100 units) on
location dominates any lab effects.
2. It is reasonable to treat the labs as homogeneous.
1. Exploratory Data Analysis
1.4. EDA Case Studies
1.4.2. Case Studies
1.4.2.10. Ceramic Strength
1.4.2.10.5. Analysis of Primary Factors
Main effects The first step in analyzing the primary factors is to determine which factors
are the most significant. The DOE scatter plot, DOE mean plot, and the DOE
standard deviation plots will be the primary tools, with "DOE" being short
for "design of experiments".
Since the previous pages showed a significant batch effect but a minimal lab
effect, we will generate separate plots for batch 1 and batch 2. However, the
labs will be treated as equivalent.
DOE
Scatter Plot
for Batch 1
This DOE scatter plot shows the following for batch 1.
1. Most of the points are between 500 and 800.
2. There are about a dozen or so points between 300 and 500.
3. Except for the outliers on the low side (i.e., the points between 300
and 500), the distribution of the points is comparable for the 3 primary
factors in terms of location and spread.
DOE Mean
Plot for
Batch 1
This DOE mean plot shows the following for batch 1.
1. The table speed factor (X1) is the most significant factor with an
effect, the difference between the two points, of approximately 35
units.
2. The wheel grit factor (X3) is the next most significant factor with an
effect of approximately 10 units.
3. The feed rate factor (X2) has minimal effect.
DOE SD
Plot for
Batch 1
This DOE standard deviation plot shows the following for batch 1.
1. The table speed factor (X1) has a significant difference in variability
between the levels of the factor. The difference is approximately 20
units.
2. The wheel grit factor (X3) and the feed rate factor (X2) have minimal
differences in variability.
DOE
Scatter Plot
for Batch 2
This DOE scatter plot shows the following for batch 2.
1. Most of the points are between 450 and 750.
2. There are a few outliers on both the low side and the high side.
3. Except for the outliers (i.e., the points less than 450 or greater than
750), the distribution of the points is comparable for the 3 primary
factors in terms of location and spread.
DOE Mean
Plot for
Batch 2
This DOE mean plot shows the following for batch 2.
1. The feed rate (X2) and wheel grit (X3) factors have an approximately
equal effect of about 15 or 20 units.
2. The table speed factor (X1) has a minimal effect.
DOE SD
Plot for
Batch 2
This DOE standard deviation plot shows the following for batch 2.
1. The difference in the standard deviations is roughly comparable for the
three factors (slightly less for the feed rate factor).
Interaction
Effects
The above plots graphically show the main effects. An additional concern is
whether or not there are any significant interaction effects.
Main effects and 2-term interaction effects are discussed in the chapter on
Process Improvement.
In the following DOE interaction plots, the labels on the plot give the
variables and the estimated effect. For example, factor 1 is table speed and it
has an estimated effect of 30.77 (it is actually -30.77 if the direction is taken
into account).
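One way to see where such estimated effects come from is the hedged R sketch
below: for a two-level design coded -1/+1, a main effect is the difference between
the mean responses at the two levels, and a two-factor interaction uses the
product column in the same way. The data frame and column names (ceramic, Y, X1,
X3, Batch) are assumptions, and the sign and scaling conventions may differ from
Dataplot's, so the values should only roughly match those quoted below:

  ## Estimated effects for batch 1 from a -1/+1 coded design
  ## (data frame and column names are assumptions).
  b1 <- subset(ceramic, Batch == 1)

  ## Main effect of table speed (X1): mean response at +1 minus mean at -1
  with(b1, mean(Y[X1 == 1]) - mean(Y[X1 == -1]))            # compare with -30.77

  ## Two-factor interaction of table speed and wheel grit (X1*X3)
  with(b1, mean(Y[X1 * X3 == 1]) - mean(Y[X1 * X3 == -1]))  # compare with -20.25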
DOE
Interaction
Plot for
Batch 1
The ranked list of factors for batch 1 is:
1. Table speed (X1) with an estimated effect of -30.77.
2. The interaction of table speed (X1) and wheel grit (X3) with an
estimated effect of -20.25.
3. The interaction of table speed (X1) and feed rate (X2) with an
estimated effect of 9.7.
4. Wheel grit (X3) with an estimated effect of -7.18.
5. Down feed (X2) and the down feed interaction with wheel grit (X3)
are essentially zero.
DOE
Interaction
Plot for
Batch 2
The ranked list of factors for batch 2 is:
1. Down feed (X2) with an estimated effect of 18.22.
2. The interaction of table speed (X1) and wheel grit (X3) with an
estimated effect of -16.71.
3. Wheel grit (X3) with an estimated effect of -14.71.
4. Remaining main effect and 2-factor interaction effects are essentially
zero.
Conclusions From the above plots, we can draw the following overall conclusions.
1. The batch effect (of approximately 75 units) is the dominant primary
factor.
2. The most important factors differ from batch to batch. See the above
text for the ranked list of factors with the estimated effects.
1. Exploratory Data Analysis
1.4. EDA Case Studies
1.4.2. Case Studies
1.4.2.10. Ceramic Strength
1.4.2.10.6. Work This Example Yourself
View Dataplot Macro for this Case Study
This page allows you to use Dataplot to repeat the analysis
outlined in the case study description on the previous page. It
is required that you have already downloaded and installed
Dataplot and configured your browser to run Dataplot. Output
from each analysis step below will be displayed in one or
more of the Dataplot windows. The four main windows are the
Output window, the Graphics window, the Command History
window, and the Data Sheet window. Across the top of the
main windows are menus for executing Dataplot commands;
across the bottom is a command entry window where commands
can be typed in.
Data Analysis Steps / Results and Conclusions

Click on the links below to start Dataplot and run this case study yourself.
Each step may use results from previous steps, so please be patient. Wait
until the software verifies that the current step is complete before clicking
on the next step. The links in the "Results and Conclusions" column will
connect you with more detailed information about each analysis step from the
case study description.
1. Invoke Dataplot and read data.

   1. Read in the data.

   Results and Conclusions:

   1. You have read 1 column of numbers into Dataplot, variable Y.

2. Plot of the response variable.

   1. Numerical summary of Y.
   2. 4-plot of Y.

   Results and Conclusions:

   1. The summary shows the mean strength is 650.08 and the standard
      deviation of the strength is 74.64.
   2. The 4-plot shows no drift in the location and scale and a bimodal
      distribution.
3. Determine if there is a batch effect.

   1. Generate a bihistogram based on the 2 batches.
   2. Generate a q-q plot.
   3. Generate a box plot.
   4. Generate block plots.
   5. Perform a 2-sample t-test for equal means.
   6. Perform an F-test for equal standard deviations.

   Results and Conclusions:

   1. The bihistogram shows a distinct batch effect of approximately
      75 units.
   2. The q-q plot shows that batch 1 and batch 2 do not come from a common
      distribution.
   3. The box plot shows that there is a batch effect of approximately 75 to
      100 units and there are some outliers.
   4. The block plot shows that the batch effect is consistent across labs
      and levels of the primary factor.
   5. The t-test confirms the batch effect with respect to the means.
   6. The F-test does not indicate any significant batch effect with respect
      to the standard deviations.
4. Determine if there is a lab effect.

   1. Generate a box plot for the labs with the 2 batches combined.
   2. Generate a box plot for the labs for batch 1 only.
   3. Generate a box plot for the labs for batch 2 only.

   Results and Conclusions:

   1. The box plot does not show a significant lab effect.
   2. The box plot does not show a significant lab effect for batch 1.
   3. The box plot does not show a significant lab effect for batch 2.
5. Analysis of primary factors.

   1. Generate a DOE scatter plot for batch 1.
   2. Generate a DOE mean plot for batch 1.
   3. Generate a DOE sd plot for batch 1.
   4. Generate a DOE scatter plot for batch 2.
   5. Generate a DOE mean plot for batch 2.
   6. Generate a DOE sd plot for batch 2.
   7. Generate a DOE interaction effects matrix plot for batch 1.
   8. Generate a DOE interaction effects matrix plot for batch 2.

   Results and Conclusions:

   1. The DOE scatter plot shows the range of the points and the presence of
      outliers.
   2. The DOE mean plot shows that table speed is the most significant
      factor for batch 1.
   3. The DOE sd plot shows that table speed has the most variability for
      batch 1.
   4. The DOE scatter plot shows the range of the points and the presence of
      outliers.
   5. The DOE mean plot shows that feed rate and wheel grit are the most
      significant factors for batch 2.
   6. The DOE sd plot shows that the variability is comparable for all 3
      factors for batch 2.
   7. The DOE interaction effects matrix plot provides a ranked list of
      factors with the estimated effects.
   8. The DOE interaction effects matrix plot provides a ranked list of
      factors with the estimated effects.
1. Exploratory Data Analysis
1.4. EDA Case Studies
1.4.3. References For Chapter 1: Exploratory
Data Analysis
Anscombe, F. (1973), Graphs in Statistical Analysis, The American
Statistician, pp. 195-199.
Anscombe, F. and Tukey, J. W. (1963), The Examination and Analysis of
Residuals, Technometrics, pp. 141-160.
Barnett and Lewis (1994), Outliers in Statistical Data, 3rd. Ed., John
Wiley and Sons.
Birnbaum, Z. W. and Saunders, S. C. (1958), A Statistical Model for
Life-Length of Materials, Journal of the American Statistical Association,
53(281), pp. 151-160.
Bloomfield, Peter (1976), Fourier Analysis of Time Series, John Wiley
and Sons.
Box, G. E. P. and Cox, D. R. (1964), An Analysis of Transformations,
Journal of the Royal Statistical Society, pp. 211-243, discussion pp. 244-
252.
Box, G. E. P., Hunter, W. G., and Hunter, J. S. (1978), Statistics for
Experimenters: An Introduction to Design, Data Analysis, and Model
Building, John Wiley and Sons.
Box, G. E. P., and Jenkins, G. (1976), Time Series Analysis: Forecasting
and Control, Holden-Day.
Bradley, (1968). Distribution-Free Statistical Tests, Chapter 12.
Brown, M. B. and Forsythe, A. B. (1974), Journal of the American
Statistical Association, 69, pp. 364-367.
Chakravarti, Laha, and Roy, (1967). Handbook of Methods of Applied
Statistics, Volume I, John Wiley and Sons, pp. 392-394.
Chambers, John, William Cleveland, Beat Kleiner, and Paul Tukey,
(1983), Graphical Methods for Data Analysis, Wadsworth.
Chatfield, C. (1989). The Analysis of Time Series: An Introduction, Fourth
Edition, Chapman & Hall, New York, NY.
Cleveland, William (1985), Elements of Graphing Data, Wadsworth.
Cleveland, William and Marylyn McGill, Editors (1988), Dynamic
Graphics for Statistics, Wadsworth.
Cleveland, William (1993), Visualizing Data, Hobart Press.
Devaney, Judy (1997), Equation Discovery Through Global Self-
Referenced Geometric Intervals and Machine Learning, Ph.D. thesis,
George Mason University, Fairfax, VA.
Draper and Smith, (1981). Applied Regression Analysis, 2nd ed., John
Wiley and Sons.
du Toit, Steyn, and Stumpf (1986), Graphical Exploratory Data
Analysis, Springer-Verlag.
Efron and Gong (February 1983), A Leisurely Look at the Bootstrap, the
Jackknife, and Cross Validation, The American Statistician.
Evans, Hastings, and Peacock (2000), Statistical Distributions, 3rd. Ed.,
John Wiley and Sons.
Everitt, Brian (1978), Multivariate Techniques for Multivariate Data,
North-Holland.
Filliben, J. J. (February 1975), The Probability Plot Correlation
Coefficient Test for Normality, Technometrics, pp. 111-117.
Fuller Jr., E. R., Frieman, S. W., Quinn, J. B., Quinn, G. D., and Carter,
W. C. (1994), Fracture Mechanics Approach to the Design of Glass
Aircraft Windows: A Case Study, SPIE Proceedings, Vol. 2286, (Society
of Photo-Optical Instrumentation Engineers (SPIE), Bellingham, WA).
Gill, Lisa (April 1997), Summary Analysis: High Performance Ceramics
Experiment to Characterize the Effect of Grinding Parameters on
Sintered Reaction Bonded Silicon Nitride, Reaction Bonded Silicon
Nitride, and Sintered Silicon Nitride , presented at the NIST - Ceramic
Machining Consortium, 10th Program Review Meeting, April 10, 1997.
Granger and Hatanaka (1964), Spectral Analysis of Economic Time
Series, Princeton University Press.
Grubbs, Frank (1950), Sample Criteria for Testing Outlying
Observations, Annals of Mathematical Statistics, 21(1) pp. 27-58.
Grubbs, Frank (February 1969), Procedures for Detecting Outlying
Observations in Samples, Technometrics, 11(1), pp. 1-21.
Hahn, G. J. and Meeker, W. Q. (1991), Statistical Intervals, John Wiley
and Sons.
Harris, Robert L. (1996), Information Graphics, Management Graphics.
Hastie, T., Tibshirani, R. and Friedman, J. (2001), The Elements of
Statistical Learning: Data Mining, Inference, and Prediction, Springer-
Verlag, New York.
Hawkins, D. M. (1980), Identification of Outliers, Chapman and Hall.
Boris Iglewicz and David Hoaglin (1993), "Volume 16: How to Detect
and Handle Outliers", The ASQC Basic References in Quality Control:
Statistical Techniques, Edward F. Mykytka, Ph.D., Editor.
Jenkins and Watts, (1968), Spectral Analysis and Its Applications,
Holden-Day.
Johnson, Kotz, and Balakrishnan, (1994), Continuous Univariate
Distributions, Volumes I and II, 2nd. Ed., John Wiley and Sons.
Johnson, Kotz, and Kemp, (1992), Univariate Discrete Distributions,
2nd. Ed., John Wiley and Sons.
Kuo, Way and Pierson, Marcia Martens, Eds. (1993), Quality Through
Engineering Design; specifically, the article Filliben, Cetinkunt, Yu, and
Dommenz (1993), Exploratory Data Analysis Techniques as Applied to a
High-Precision Turning Machine, Elsevier, New York, pp. 199-223.
Levene, H. (1960). In Contributions to Probability and Statistics: Essays
in Honor of Harold Hotelling, I. Olkin et al. eds., Stanford University
Press, pp. 278-292.
McNeil, Donald (1977), Interactive Data Analysis, John Wiley and Sons.
Mendenhall, William and Reinmuth, James (1982), Statistics for
Management and Economics, Fourth Edition, Duxbury Press.
Mosteller, Frederick and Tukey, John (1977), Data Analysis and
Regression, Addison-Wesley.
Natrella, Mary (1963), Experimental Statistics, National Bureau of
Standards Handbook 91.
Nelson, Wayne (1982), Applied Life Data Analysis, Addison-Wesley.
Nelson, Wayne and Doganaksoy, Necip (1992), A Computer Program
POWNOR for Fitting the Power-Normal and -Lognormal Models to Life
or Strength Data from Specimens of Various Sizes, NISTIR 4760, U.S.
Department of Commerce, National Institute of Standards and
Technology.
Neter, Wasserman, and Kunter (1990). Applied Linear Statistical Models,
3rd ed., Irwin.
Pepi, John W., (1994), Failsafe Design of an All BK-7 Glass Aircraft
Window, SPIE Proceedings, Vol. 2286, (Society of Photo-Optical
Instrumentation Engineers (SPIE), Bellingham, WA).
The RAND Corporation (1955), A Million Random Digits with 100,000
Normal Deviates, Free Press.
Rosner, Bernard (May 1983), Percentage Points for a Generalized ESD
Many-Outlier Procedure, Technometrics, 25(2), pp. 165-172.
Ryan, Thomas (1997), Modern Regression Methods, John Wiley.
Scott, David (1992), Multivariate Density Estimation: Theory, Practice,
and Visualization, John Wiley and Sons.
Snedecor, George W. and Cochran, William G. (1989), Statistical
Methods, Eighth Edition, Iowa State University Press.
Stefansky, W. (1972), Rejecting Outliers in Factorial Designs,
Technometrics, 14, pp. 469-479.
Stephens, M. A. (1974). EDF Statistics for Goodness of Fit and Some
Comparisons, Journal of the American Statistical Association, 69, pp.
730-737.
Stephens, M. A. (1976). Asymptotic Results for Goodness-of-Fit
Statistics with Unknown Parameters, Annals of Statistics, 4, pp. 357-369.
Stephens, M. A. (1977). Goodness of Fit for the Extreme Value
Distribution, Biometrika, 64, pp. 583-588.
Stephens, M. A. (1977). Goodness of Fit with Special Reference to Tests
for Exponentiality , Technical Report No. 262, Department of Statistics,
Stanford University, Stanford, CA.
Stephens, M. A. (1979). Tests of Fit for the Logistic Distribution Based
on the Empirical Distribution Function, Biometrika, 66, pp. 591-595.
Tietjen and Moore (August 1972), Some Grubbs-Type Statistics for the
Detection of Outliers, Technometrics, 14(3), pp. 583-597.
Tufte, Edward (1983), The Visual Display of Quantitative Information,
Graphics Press.
Tukey, John (1977), Exploratory Data Analysis, Addison-Wesley.
Velleman, Paul and Hoaglin, David (1981), The ABC's of EDA:
Applications, Basics, and Computing of Exploratory Data Analysis,
Duxbury.
Wainer, Howard (1981), Visual Revelations, Copernicus.
Wilk, M. B. and Gnanadesikan, R. (1968), Probability Plotting Methods
for the Analysis of Data, Biometrika, 5(5), pp. 1-19.
2. Measurement Process Characterization
1. Characterization
1. Issues
2. Check standards
2. Control
1. Issues
2. Bias and long-term variability
3. Short-term variability
3. Calibration
1. Issues
2. Artifacts
3. Designs
4. Catalog of designs
5. Artifact control
6. Instruments
7. Instrument control
4. Gauge R & R studies
1. Issues
2. Design
3. Data collection
4. Variability
5. Bias
6. Uncertainty
5. Uncertainty analysis
1. Issues
2. Approach
3. Type A evaluations
4. Type B evaluations
5. Propagation of error
6. Error budget
7. Expanded uncertainties
8. Uncorrected bias
6. Case Studies
1. Gauge study
2. Check standard
3. Type A uncertainty
4. Type B uncertainty
Detailed table of contents
References for Chapter 2
2. Measurement Process Characterization - Detailed Table of
Contents
1. Characterization [2.1.]
1. What are the issues for characterization? [2.1.1.]
1. Purpose [2.1.1.1.]
2. Reference base [2.1.1.2.]
3. Bias and Accuracy [2.1.1.3.]
4. Variability [2.1.1.4.]
2. What is a check standard? [2.1.2.]
1. Assumptions [2.1.2.1.]
2. Data collection [2.1.2.2.]
3. Analysis [2.1.2.3.]
2. Statistical control of a measurement process [2.2.]
1. What are the issues in controlling the measurement process? [2.2.1.]
2. How are bias and variability controlled? [2.2.2.]
1. Shewhart control chart [2.2.2.1.]
1. EWMA control chart [2.2.2.1.1.]
2. Data collection [2.2.2.2.]
3. Monitoring bias and long-term variability [2.2.2.3.]
4. Remedial actions [2.2.2.4.]
3. How is short-term variability controlled? [2.2.3.]
1. Control chart for standard deviations [2.2.3.1.]
2. Data collection [2.2.3.2.]
3. Monitoring short-term precision [2.2.3.3.]
4. Remedial actions [2.2.3.4.]
3. Calibration [2.3.]
1. Issues in calibration [2.3.1.]
1. Reference base [2.3.1.1.]
2. Reference standards [2.3.1.2.]
2. What is artifact (single-point) calibration? [2.3.2.]
3. What are calibration designs? [2.3.3.]
1. Elimination of special types of bias [2.3.3.1.]
1. Left-right (constant instrument) bias [2.3.3.1.1.]
2. Bias caused by instrument drift [2.3.3.1.2.]
2. Solutions to calibration designs [2.3.3.2.]
1. General matrix solutions to calibration designs [2.3.3.2.1.]
3. Uncertainties of calibrated values [2.3.3.3.]
1. Type A evaluations for calibration designs [2.3.3.3.1.]
2. Repeatability and level-2 standard deviations [2.3.3.3.2.]
3. Combination of repeatability and level-2 standard deviations [2.3.3.3.3.]
4. Calculation of standard deviations for 1,1,1,1 design [2.3.3.3.4.]
5. Type B uncertainty [2.3.3.3.5.]
6. Expanded uncertainties [2.3.3.3.6.]
4. Catalog of calibration designs [2.3.4.]
1. Mass weights [2.3.4.1.]
1. Design for 1,1,1 [2.3.4.1.1.]
2. Design for 1,1,1,1 [2.3.4.1.2.]
3. Design for 1,1,1,1,1 [2.3.4.1.3.]
4. Design for 1,1,1,1,1,1 [2.3.4.1.4.]
5. Design for 2,1,1,1 [2.3.4.1.5.]
6. Design for 2,2,1,1,1 [2.3.4.1.6.]
7. Design for 2,2,2,1,1 [2.3.4.1.7.]
8. Design for 5,2,2,1,1,1 [2.3.4.1.8.]
9. Design for 5,2,2,1,1,1,1 [2.3.4.1.9.]
10. Design for 5,3,2,1,1,1 [2.3.4.1.10.]
11. Design for 5,3,2,1,1,1,1 [2.3.4.1.11.]
12. Design for 5,3,2,2,1,1,1 [2.3.4.1.12.]
13. Design for 5,4,4,3,2,2,1,1 [2.3.4.1.13.]
14. Design for 5,5,2,2,1,1,1,1 [2.3.4.1.14.]
15. Design for 5,5,3,2,1,1,1 [2.3.4.1.15.]
16. Design for 1,1,1,1,1,1,1,1 weights [2.3.4.1.16.]
17. Design for 3,2,1,1,1 weights [2.3.4.1.17.]
18. Design for 10 and 20 pound weights [2.3.4.1.18.]
2. Drift-elimination designs for gage blocks [2.3.4.2.]
1. Doiron 3-6 Design [2.3.4.2.1.]
2. Doiron 3-9 Design [2.3.4.2.2.]
3. Doiron 4-8 Design [2.3.4.2.3.]
4. Doiron 4-12 Design [2.3.4.2.4.]
5. Doiron 5-10 Design [2.3.4.2.5.]
6. Doiron 6-12 Design [2.3.4.2.6.]
7. Doiron 7-14 Design [2.3.4.2.7.]
8. Doiron 8-16 Design [2.3.4.2.8.]
9. Doiron 9-18 Design [2.3.4.2.9.]
10. Doiron 10-20 Design [2.3.4.2.10.]
11. Doiron 11-22 Design [2.3.4.2.11.]
3. Designs for electrical quantities [2.3.4.3.]
1. Left-right balanced design for 3 standard cells [2.3.4.3.1.]
2. Left-right balanced design for 4 standard cells [2.3.4.3.2.]
3. Left-right balanced design for 5 standard cells [2.3.4.3.3.]
4. Left-right balanced design for 6 standard cells [2.3.4.3.4.]
5. Left-right balanced design for 4 references and 4 test items [2.3.4.3.5.]
6. Design for 8 references and 8 test items [2.3.4.3.6.]
7. Design for 4 reference zeners and 2 test zeners [2.3.4.3.7.]
8. Design for 4 reference zeners and 3 test zeners [2.3.4.3.8.]
9. Design for 3 references and 1 test resistor [2.3.4.3.9.]
10. Design for 4 references and 1 test resistor [2.3.4.3.10.]
4. Roundness measurements [2.3.4.4.]
1. Single trace roundness design [2.3.4.4.1.]
2. Multiple trace roundness designs [2.3.4.4.2.]
5. Designs for angle blocks [2.3.4.5.]
1. Design for 4 angle blocks [2.3.4.5.1.]
2. Design for 5 angle blocks [2.3.4.5.2.]
3. Design for 6 angle blocks [2.3.4.5.3.]
6. Thermometers in a bath [2.3.4.6.]
7. Humidity standards [2.3.4.7.]
1. Drift-elimination design for 2 reference weights and 3 cylinders [2.3.4.7.1.]
5. Control of artifact calibration [2.3.5.]
1. Control of precision [2.3.5.1.]
1. Example of control chart for precision [2.3.5.1.1.]
2. Control of bias and long-term variability [2.3.5.2.]
1. Example of Shewhart control chart for mass calibrations [2.3.5.2.1.]
2. Example of EWMA control chart for mass calibrations [2.3.5.2.2.]
6. Instrument calibration over a regime [2.3.6.]
1. Models for instrument calibration [2.3.6.1.]
2. Data collection [2.3.6.2.]
3. Assumptions for instrument calibration [2.3.6.3.]
4. What can go wrong with the calibration procedure [2.3.6.4.]
1. Example of day-to-day changes in calibration [2.3.6.4.1.]
5. Data analysis and model validation [2.3.6.5.]
1. Data on load cell #32066 [2.3.6.5.1.]
6. Calibration of future measurements [2.3.6.6.]
7. Uncertainties of calibrated values [2.3.6.7.]
1. Uncertainty for quadratic calibration using propagation of error [2.3.6.7.1.]
2. Uncertainty for linear calibration using check standards [2.3.6.7.2.]
3. Comparison of check standard analysis and propagation of error [2.3.6.7.3.]
7. Instrument control for linear calibration [2.3.7.]
1. Control chart for a linear calibration line [2.3.7.1.]
4. Gauge R & R studies [2.4.]
1. What are the important issues? [2.4.1.]
2. Design considerations [2.4.2.]
3. Data collection for time-related sources of variability [2.4.3.]
1. Simple design [2.4.3.1.]
2. 2-level nested design [2.4.3.2.]
3. 3-level nested design [2.4.3.3.]
4. Analysis of variability [2.4.4.]
1. Analysis of repeatability [2.4.4.1.]
2. Analysis of reproducibility [2.4.4.2.]
3. Analysis of stability [2.4.4.3.]
1. Example of calculations [2.4.4.4.4.]
5. Analysis of bias [2.4.5.]
1. Resolution [2.4.5.1.]
2. Linearity of the gauge [2.4.5.2.]
3. Drift [2.4.5.3.]
4. Differences among gauges [2.4.5.4.]
5. Geometry/configuration differences [2.4.5.5.]
6. Remedial actions and strategies [2.4.5.6.]
6. Quantifying uncertainties from a gauge study [2.4.6.]
5. Uncertainty analysis [2.5.]
1. Issues [2.5.1.]
2. Approach [2.5.2.]
1. Steps [2.5.2.1.]
3. Type A evaluations [2.5.3.]
1. Type A evaluations of random components [2.5.3.1.]
1. Type A evaluations of time-dependent effects [2.5.3.1.1.]
2. Measurement configuration within the laboratory [2.5.3.1.2.]
2. Material inhomogeneity [2.5.3.2.]
1. Data collection and analysis [2.5.3.2.1.]
3. Type A evaluations of bias [2.5.3.3.]
1. Inconsistent bias [2.5.3.3.1.]
2. Consistent bias [2.5.3.3.2.]
3. Bias with sparse data [2.5.3.3.3.]
4. Type B evaluations [2.5.4.]
1. Standard deviations from assumed distributions [2.5.4.1.]
5. Propagation of error considerations [2.5.5.]
1. Formulas for functions of one variable [2.5.5.1.]
2. Formulas for functions of two variables [2.5.5.2.]
3. Propagation of error for many variables [2.5.5.3.]
6. Uncertainty budgets and sensitivity coefficients [2.5.6.]
1. Sensitivity coefficients for measurements on the test item [2.5.6.1.]
2. Sensitivity coefficients for measurements on a check standard [2.5.6.2.]
3. Sensitivity coefficients for measurements from a 2-level design [2.5.6.3.]
4. Sensitivity coefficients for measurements from a 3-level design [2.5.6.4.]
5. Example of uncertainty budget [2.5.6.5.]
7. Standard and expanded uncertainties [2.5.7.]
1. Degrees of freedom [2.5.7.1.]
8. Treatment of uncorrected bias [2.5.8.]
1. Computation of revised uncertainty [2.5.8.1.]
6. Case studies [2.6.]
1. Gauge study of resistivity probes [2.6.1.]
1. Background and data [2.6.1.1.]
1. Database of resistivity measurements [2.6.1.1.1.]
2. Analysis and interpretation [2.6.1.2.]
3. Repeatability standard deviations [2.6.1.3.]
4. Effects of days and long-term stability [2.6.1.4.]
5. Differences among 5 probes [2.6.1.5.]
6. Run gauge study example using Dataplot [2.6.1.6.]
7. Dataplot macros [2.6.1.7.]
2. Check standard for resistivity measurements [2.6.2.]
1. Background and data [2.6.2.1.]
1. Database for resistivity check standard [2.6.2.1.1.]
2. Analysis and interpretation [2.6.2.2.]
1. Repeatability and level-2 standard deviations [2.6.2.2.1.]
3. Control chart for probe precision [2.6.2.3.]
4. Control chart for bias and long-term variability [2.6.2.4.]
5. Run check standard example yourself [2.6.2.5.]
6. Dataplot macros [2.6.2.6.]
3. Evaluation of type A uncertainty [2.6.3.]
1. Background and data [2.6.3.1.]
1. Database of resistivity measurements [2.6.3.1.1.]
2. Measurements on wiring configurations [2.6.3.1.2.]
2. Analysis and interpretation [2.6.3.2.]
1. Difference between 2 wiring configurations [2.6.3.2.1.]
3. Run the type A uncertainty analysis using Dataplot [2.6.3.3.]
4. Dataplot macros [2.6.3.4.]
4. Evaluation of type B uncertainty and propagation of error [2.6.4.]
7. References [2.7.]
2. Measurement Process Characterization
2.1. Characterization
The primary goal of this section is to lay the groundwork for
understanding the measurement process in terms of the errors
that affect the process.
What are the issues for characterization?
1. Purpose
2. Reference base
3. Bias and Accuracy
4. Variability
What is a check standard?
1. Assumptions
2. Data collection
3. Analysis
2. Measurement Process Characterization
2.1. Characterization
2.1.1. What are the issues for characterization?
'Goodness' of
measurements
A measurement process can be thought of as a well-run
production process in which measurements are the output.
The 'goodness' of measurements is the issue, and goodness
is characterized in terms of the errors that affect the
measurements.
Bias,
variability
and
uncertainty
The goodness of measurements is quantified in terms of
Bias
Short-term variability or instrument precision
Day-to-day or long-term variability
Uncertainty
Requires
ongoing
statistical
control
program
The continuation of goodness is guaranteed by a statistical
control program that controls both
Short-term variability or instrument precision
Long-term variability which controls bias and day-
to-day variability of the process
Scope is
limited to
ongoing
processes
The techniques in this chapter are intended primarily for
ongoing processes. One-time tests and special tests or
destructive tests are difficult to characterize. Examples of
ongoing processes are:
Calibration where similar test items are measured on
a regular basis
Certification where materials are characterized on a
regular basis
Production where the metrology (tool) errors may be
significant
Special studies where data can be collected over the
life of the study
Application to
production
processes
The material in this chapter is pertinent to the study of
production processes for which the size of the metrology
(tool) error may be an important consideration. More
specific guidance on assessing metrology errors can be
found in the section on gauge studies.
2. Measurement Process Characterization
2.1. Characterization
2.1.1. What are the issues for characterization?
2.1.1.1. Purpose
Purpose is
to
understand
and
quantify
the effect
of error on
reported
values
The purpose of characterization is to develop an understanding
of the sources of error in the measurement process and how
they affect specific measurement results. This section provides
the background for:
identifying sources of error in the measurement process
understanding and quantifying errors in the
measurement process
codifying the effects of these errors on a specific
reported value in a statement of uncertainty
Important
concepts
Characterization relies upon the understanding of certain
underlying concepts of measurement systems; namely,
reference base (authority) for the measurement
bias
variability
check standard
Reported
value is a
generic
term that
identifies
the result
that is
transmitted
to the
customer
The reported value is the measurement result for a particular
test item. It can be:
a single measurement
an average of several measurements
a least-squares prediction from a model
a combination of several measurement results that are
related by a physical model
2. Measurement Process Characterization
2.1. Characterization
2.1.1. What are the issues for characterization?
2.1.1.2. Reference base
Ultimate
authority
The most critical element of any measurement process is the
relationship between a single measurement and the reference
base for the unit of measurement. The reference base is the
ultimate source of authority for the measurement unit.
For
fundamental
units
Reference bases for fundamental units of measurement
(length, mass, temperature, voltage, and time) and some
derived units (such as pressure, force, flow rate, etc.) are
maintained by national and regional standards laboratories.
Consensus values from interlaboratory tests or
instrumentation/standards as maintained in specific
environments may serve as reference bases for other units of
measurement.
For
comparison
purposes
A reference base, for comparison purposes, may be based on
an agreement among participating laboratories or
organizations and derived from
measurements made with a standard test method
measurements derived from an interlaboratory test
2. Measurement Process Characterization
2.1. Characterization
2.1.1. What are the issues for characterization?
2.1.1.3. Bias and Accuracy
Definition of
Accuracy and
Bias
Accuracy is a qualitative term referring to whether there is
agreement between a measurement made on an object and
its true (target or reference) value. Bias is a quantitative
term describing the difference between the average of
measurements made on the same object and its true value.
In particular, for a measurement laboratory, bias is the
difference (generally unknown) between a laboratory's
average value (over time) for a test item and the average
that would be achieved by the reference laboratory if it
undertook the same measurements on the same test item.
Depiction of
bias and
unbiased
measurements
Unbiased measurements relative to the target
Biased measurements relative to the target
Identification
of bias
Bias in a measurement process can be identified by:
1. Calibration of standards and/or instruments by a
reference laboratory, where a value is assigned to the
client's standard based on comparisons with the
reference laboratory's standards.
2. Check standards, where violations of the control
limits on a control chart for the check standard
suggest that re-calibration of standards or instruments
is needed.
3. Measurement assurance programs, where artifacts
from a reference laboratory or other qualified agency
are sent to a client and measured in the client's
environment as a 'blind' sample.
4. Interlaboratory comparisons, where reference
standards or materials are circulated among several
laboratories.
Reduction of
bias
Bias can be eliminated or reduced by calibration of
standards and/or instruments. Because of costs and time
constraints, the majority of calibrations are performed by
secondary or tertiary laboratories and are related to the
reference base via a chain of intercomparisons that start at
the reference laboratory.
Bias can also be reduced by corrections to in-house
measurements based on comparisons with artifacts or
instruments circulated for that purpose (reference
materials).
Caution Errors that contribute to bias can be present even where all
equipment and standards are properly calibrated and under
control. Temperature probably has the most potential for
introducing this type of bias into the measurements. For
example, a constant heat source will introduce serious
errors in dimensional measurements of metal objects.
Temperature affects chemical and electrical measurements
as well.
Generally speaking, errors of this type can be identified
only by those who are thoroughly familiar with the
measurement technology. The reader is advised to consult
the technical literature and experts in the field for guidance.
2. Measurement Process Characterization
2.1. Characterization
2.1.1. What are the issues for characterization?
2.1.1.4. Variability
Sources of
time-dependent
variability
Variability is the tendency of the measurement process to
produce slightly different measurements on the same test
item, where conditions of measurement are either stable
or vary over time, temperature, operators, etc. In this
chapter we consider two sources of time-dependent
variability:
Short-term variability ascribed to the precision of
the instrument
Long-term variability related to changes in
environment and handling techniques
Depiction of
two
measurement
processes with
the same short-
term variability
over six days
where process
1 has large
between-day
variability and
process 2 has
negligible
between-day
variability
[Figure: Process 1 (large between-day variability) and Process 2 (small
between-day variability) -- distributions of short-term measurements over
6 days where distances from the centerlines illustrate between-day
variability]
Short-term
variability
Short-term errors affect the precision of the instrument.
Even very precise instruments exhibit small changes
caused by random errors. It is useful to think in terms of
measurements performed with a single instrument over
minutes or hours; this is to be understood, normally, as
the time that it takes to complete a measurement
sequence.
Terminology Four terms are in common usage to describe short-term
phenomena. They are interchangeable.
1. precision
2. repeatability
3. within-time variability
4. short-term variability
Precision is
quantified by a
standard
deviation
The measure of precision is a standard deviation. Good
precision implies a small standard deviation. This
standard deviation is called the short-term standard
deviation of the process or the repeatability standard
deviation.
Caution --
long-term
variability may
be dominant
With very precise instrumentation, it is not unusual to
find that the variability exhibited by the measurement
process from day-to-day often exceeds the precision of
the instrument because of small changes in environmental
conditions and handling techniques which cannot be
controlled or corrected in the measurement process. The
measurement process is not completely characterized
until this source of variability is quantified.
Terminology Three terms are in common usage to describe long-term
phenomena. They are interchangeable.
1. day-to-day variability
2. long-term variability
3. reproducibility
Caution --
regarding term
'reproducibility'
The term 'reproducibility' is given very specific
definitions in some national and international standards.
However, the definitions are not always in agreement.
Therefore, it is used here only in a generic sense to
indicate variability across days.
Definitions in
this Handbook
We adopt precise definitions and provide data collection
and analysis techniques in the sections on check standards
and measurement control for estimating:
Level-1 standard deviation for short-term
variability
Level-2 standard deviation for day-to-day
variability
In the section on gauge studies, the concept of variability
is extended to include very long-term measurement
variability:
Level-1 standard deviation for short-term
variability
Level-2 standard deviation for day-to-day
variability
Level-3 standard deviation for very long-term
variability
We refer to the standard deviations associated with these
three kinds of uncertainty as "Level 1, 2, and 3 standard
deviations", respectively.
Long-term
variability is
quantified by a
standard
deviation
The measure of long-term variability is the standard
deviation of measurements taken over several days,
weeks or months.
The simplest method for doing this assessment is by
analysis of a check standard database. The measurements
on the check standards are structured to cover a long time
interval and to capture all sources of variation in the
measurement process.
2. Measurement Process Characterization
2.1. Characterization
2.1.2. What is a check standard?
A check
standard is
useful for
gathering
data on the
process
Check standard methodology is a tool for collecting data on
the measurement process to expose errors that afflict the
process over time. Time-dependent sources of error are
evaluated and quantified from the database of check
standard measurements. It is a device for controlling the
bias and long-term variability of the process once a
baseline for these quantities has been established from
historical data on the check standard.
Think in
terms of data
A check
standard can
be an artifact
or defined
quantity
The check standard should be thought of in terms of a
database of measurements. It can be defined as an artifact
or as a characteristic of the measurement process whose
value can be replicated from measurements taken over the
life of the process. Examples are:
measurements on a stable artifact
differences between values of two reference
standards as estimated from a calibration experiment
values of a process characteristic, such as a bias
term, which is estimated from measurements on
reference standards and/or test items.
An artifact check standard must be close in material
content and geometry to the test items that are measured in
the workload. If possible, it should be one of the test items
from the workload. Obviously, it should be a stable artifact
and should be available to the measurement process at all
times.
Solves the
difficulty of
sampling the
process
Measurement processes are similar to production processes
in that they are continual and are expected to produce
identical results (within acceptable limits) over time,
instruments, operators, and environmental conditions.
However, it is difficult to sample the output of the
measurement process because, normally, test items change
with each measurement sequence.
Surrogate for
unseen
measurements
Measurements on the check standard, spaced over time at
regular intervals, act as surrogates for measurements that
could be made on test items if sufficient time and resources
were available.
2. Measurement Process Characterization
2.1. Characterization
2.1.2. What is a check standard?
2.1.2.1. Assumptions
Case study:
Resistivity
check
standard
Before applying the quality control procedures
recommended in this chapter to check standard data, basic
assumptions should be examined. The basic assumptions
underlying the quality control procedures are:
1. The data come from a single statistical distribution.
2. The distribution is a normal distribution.
3. The errors are uncorrelated over time.
An easy method for checking the assumption of a single
normal distribution is to construct a histogram of the check
standard data. The histogram should follow a bell-shaped
pattern with a single hump. Types of anomalies that
indicate a problem with the measurement system are:
1. a double hump indicating that errors are being drawn
from two or more distributions;
2. long tails indicating outliers in the process;
3. flat pattern or one with humps at either end
indicating that the measurement process is not in
control or not properly specified.
Another graphical method for testing the normality
assumption is a probability plot. The points are expected to
fall approximately on a straight line if the data come from
a normal distribution. Outliers, or data from other
distributions, will produce an S-shaped curve.
A graphical method for testing for correlation among
measurements is a time-lag plot. Correlation will
frequently not be a problem if measurements are properly
structured over time. Correlation problems generally occur
when measurements are taken so close together in time that
the instrument cannot properly recover from one
measurement to the next. Correlations over time are
usually present but are often negligible.
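Sample R code
The three graphical checks described above (histogram, normal probability
plot, and lag plot) can be produced with a few lines of R. The sketch below
is illustrative only; the vector check_std and the values in it are
hypothetical placeholders, not handbook data.

# Graphical checks of the check standard assumptions.
# check_std is a hypothetical vector of check standard values.
set.seed(1)
check_std <- rnorm(50, mean = 100, sd = 0.5)    # placeholder data

par(mfrow = c(1, 3))
hist(check_std, main = "Histogram",
     xlab = "Check standard value")             # expect a single bell-shaped hump
qqnorm(check_std)                               # normal probability plot
qqline(check_std)                               # points near the line if approximately normal
plot(head(check_std, -1), tail(check_std, -1),  # lag-1 plot; no pattern expected
     xlab = "Value at time t", ylab = "Value at time t + 1",
     main = "Lag plot")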
2. Measurement Process Characterization
2.1. Characterization
2.1.2. What is a check standard?
2.1.2.2. Data collection
Schedule for
making
measurements
A schedule for making check standard measurements over time (once a
day, twice a week, or whatever is appropriate for sampling all conditions
of measurement) should be set up and adhered to. The check standard
measurements should be structured in the same way as values reported on
the test items. For example, if the reported values are averages of two
repetitions made within 5 minutes of each other, the check standard
values should be averages of the two measurements made in the same
manner.
Exception One exception to this rule is that there should be at least J = 2 repetitions
per day. Without this redundancy, there is no way to check on the short-
term precision of the measurement system.
Depiction of
schedule for
making check
standard
measurements
with four
repetitions
per day over
K days on the
surface of a
silicon wafer
with the
repetitions
randomized
at various
positions on
the wafer
K days - 4 repetitions
2-level design for measurement process
Case study:
Resistivity
check
standard for
measurements
on silicon
wafers
The values for the check standard should be recorded along with pertinent
environmental readings and identifications for all other significant
factors. The best way to record this information is in one file with one
line or row (on a spreadsheet) of information in fixed fields for each
check standard measurement. A list of typical entries follows.
1. Identification for check standard
2. Date
3. Identification for the measurement design (if applicable)
4. Identification for the instrument
5. Check standard value
6. Short-term standard deviation from J repetitions
7. Degrees of freedom
8. Operator identification
9. Environmental readings (if pertinent)
2. Measurement Process Characterization
2.1. Characterization
2.1.2. What is a check standard?
2.1.2.3. Analysis
Short-term
or level-1
standard
deviations
from J
repetitions
An analysis of the check standard data is the basis for
quantifying random errors in the measurement process --
particularly time-dependent errors.
Given that we have a database of check standard
measurements as described in data collection, where Y_{kj}
represents the jth repetition on the kth day, the mean for the
kth day is

    \bar{Y}_{k} = \frac{1}{J} \sum_{j=1}^{J} Y_{kj}

and the short-term (level-1) standard deviation with v = J - 1
degrees of freedom is

    s_{k} = \sqrt{\frac{1}{J-1} \sum_{j=1}^{J} \left( Y_{kj} - \bar{Y}_{k} \right)^{2}} .
Drawback
of short-
term
standard
deviations
An individual short-term standard deviation will not be a
reliable estimate of precision if the degrees of freedom is less
than ten, but the individual estimates can be pooled over the K
days to obtain a more reliable estimate. The pooled level-1
standard deviation estimate with v = K(J - 1) degrees of
freedom is

    s_{1} = \sqrt{\frac{1}{K} \sum_{k=1}^{K} s_{k}^{2}} .
This standard deviation can be interpreted as quantifying the
basic precision of the instrumentation used in the measurement
process.
Process
(level-2)
standard
deviation
The level-2 standard deviation of the check standard is
appropriate for representing the process variability. It is
computed with v = K - 1 degrees of freedom as:

    s_{2} = \sqrt{\frac{1}{K-1} \sum_{k=1}^{K} \left( \bar{Y}_{k} - \bar{\bar{Y}} \right)^{2}}

where

    \bar{\bar{Y}} = \frac{1}{K} \sum_{k=1}^{K} \bar{Y}_{k}

is the grand mean of the KJ check standard measurements.
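Sample R code
The level-1 and level-2 standard deviations defined above can be computed
directly from a table of check standard measurements. The sketch below is
illustrative only; the matrix Y, its dimensions, and its values are
hypothetical placeholders.

# Level-1 (repeatability) and level-2 (day-to-day) standard deviations.
# Y is a hypothetical K x J matrix: rows are days, columns are repetitions.
set.seed(2)
K <- 10; J <- 4
Y <- matrix(rnorm(K * J, mean = 100, sd = 0.5), nrow = K)  # placeholder data

day_means <- rowMeans(Y)       # daily means of the J repetitions
day_sds   <- apply(Y, 1, sd)   # level-1 standard deviation for each day (J - 1 df each)
s1 <- sqrt(mean(day_sds^2))    # pooled level-1 standard deviation, K(J - 1) df
s2 <- sd(day_means)            # level-2 standard deviation of the daily means, K - 1 df
c(level1 = s1, level2 = s2)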
Use in
quality
control
The check standard data and standard deviations that are
described in this section are used for controlling two aspects
of a measurement process:
1. Control of short-term variability
2. Control of bias and long-term variability
Case
study:
Resistivity
check
standard
For an example, see the case study for resistivity where
several check standards were measured J = 6 times per day
over several days.
2. Measurement Process Characterization
2.2. Statistical control of a measurement process
The purpose of this section is to outline the steps that can be
taken to exercise statistical control over the measurement
process and demonstrate the validity of the uncertainty
statement. Measurement processes can change both with
respect to bias and variability. A change in instrument
precision may be readily noted as measurements are being
recorded, but changes in bias or long-term variability are
difficult to catch when the process is looking at a multitude of
artifacts over time.
What are the issues for control of a measurement process?
1. Purpose
2. Assumptions
3. Role of the check standard
How are bias and long-term variability controlled?
1. Shewhart control chart
2. Exponentially weighted moving average control chart
3. Data collection and analysis
4. Control procedure
5. Remedial actions & strategies
How is short-term variability controlled?
1. Control chart for standard deviations
2. Data collection and analysis
3. Control procedure
4. Remedial actions and strategies
2. Measurement Process Characterization
2.2. Statistical control of a measurement process
2.2.1. What are the issues in controlling the
measurement process?
Purpose is to
guarantee
the
'goodness' of
measurement
results
The purpose of statistical control is to guarantee the
'goodness' of measurement results within predictable limits
and to validate the statement of uncertainty of the
measurement result.
Statistical control methods can be used to test the
measurement process for change with respect to bias and
variability from its historical levels. However, if the
measurement process is improperly specified or calibrated,
then the control procedures can only guarantee
comparability among measurements.
Assumption
of normality
is not
stringent
The assumptions that relate to measurement processes apply
to statistical control; namely that the errors of measurement
are uncorrelated over time and come from a population with
a single distribution. The tests for control depend on the
assumption that the underlying distribution is normal
(Gaussian), but the test procedures are robust to slight
departures from normality. Practically speaking, all that is
required is that the distribution of measurements be bell-
shaped and symmetric.
Check
standard is
mechanism
for
controlling
the process
Measurements on a check standard provide the mechanism
for controlling the measurement process.
Measurements on the check standard should produce
identical results except for the effect of random errors, and
tests for control are basically tests of whether or not the
random errors from the process continue to be drawn from
the same statistical distribution as the historical data on the
check standard.
Changes that can be monitored and tested with the check
standard database are:
1. Changes in bias and long-term variability
2. Changes in instrument precision or short-term
variability
2. Measurement Process Characterization
2.2. Statistical control of a measurement process
2.2.2. How are bias and variability controlled?
Bias and
variability
are controlled
by monitoring
measurements
on a check
standard over
time
Bias and long-term variability are controlled by monitoring
measurements on a check standard over time. A change in the
measurement on the check standard that persists at a constant
level over several measurement sequences indicates possible:
1. Change or damage to the reference standards
2. Change or damage to the check standard artifact
3. Procedural change that vitiates the assumptions of the
measurement process
A change in the variability of the measurements on the check
standard can be due to one of many causes such as:
1. Loss of environmental controls
2. Change in handling techniques
3. Severe degradation in instrumentation.
The control procedure monitors the progress of measurements on
the check standard over time and signals when a significant
change occurs. There are two control chart procedures that are
suitable for this purpose.
Shewhart
Chart is easy
to implement
The Shewhart control chart has the advantage of being intuitive
and easy to implement. It is characterized by a center line and
symmetric upper and lower control limits. The chart is good for
detecting large changes but not for quickly detecting small
changes (of the order of one-half to one standard deviation) in the
process.
Depiction of
Shewhart
control chart
In the simplistic illustration of a Shewhart control chart shown
below, the measurements are within the control limits with the
exception of one measurement which exceeds the upper control
limit.
EWMA Chart
is better for
detecting
small changes
The EWMA control chart (exponentially weighted moving
average) is more difficult to implement but should be considered
if the goal is quick detection of small changes. The decision
process for the EWMA chart is based on an exponentially
decreasing (over time) function of prior measurements on the
check standard while the decision process for the Shewhart chart
is based on the current measurement only.
Example of
EWMA Chart
In the EWMA control chart below, the red dots represent the
measurements. Control is exercised via the exponentially weighted
moving average (shown as the curved line) which, in this case, is
approaching its upper control limit.
Artifacts for
process
control must
be stable and
available
Case study:
Resistivity
The check standard artifacts for controlling the bias or long-term
variability of the process must be of the same type and geometry
as items that are measured in the workload. The artifacts must be
stable and available to the measurement process on a continuing
basis. Usually, one artifact is sufficient. It can be:
1. An individual item drawn at random from the workload
2. A specific item reserved by the laboratory for the purpose.
Topics covered
in this
section
The topics covered in this section include:
1. Shewhart control chart methodology
2. EWMA control chart methodology
3. Data collection & analysis
4. Monitoring
5. Remedies and strategies for dealing with out-of-control
signals.
2. Measurement Process Characterization
2.2. Statistical control of a measurement process
2.2.2. How are bias and variability controlled?
2.2.2.1. Shewhart control chart
Example of
Shewhart
control chart
for mass
calibrations
The Shewhart control chart has a baseline and upper and
lower limits, shown as dashed lines, that are symmetric
about the baseline. Measurements are plotted on the chart
versus a time line. Measurements that are outside the limits
are considered to be out of control.
Baseline is
the average
from
historical
data
The baseline for the control chart is the accepted value, an
average of the historical check standard values. A
minimum of 100 check standard values is required to
establish an accepted value.
Caution -
control limits
are computed
from the
process
standard
deviation --
not from
rational
subsets
The upper (UCL) and lower (LCL) control limits are:
UCL = Accepted value + k*process standard
deviation
LCL = Accepted value - k*process standard
deviation
where the process standard deviation is the standard
deviation computed from the check standard database.
Individual
measurements
cannot be
assessed
using the
standard
deviation
from short-
term
repetitions
This procedure is an individual observations control chart.
The previously described control charts depended on
rational subsets, which use the standard deviations
computed from the rational subsets to calculate the control
limits. For a measurement process, the subgroups would
consist of short-term repetitions which can characterize the
precision of the instrument but not the long-term variability
of the process. In measurement science, the interest is in
assessing individual measurements (or averages of short-
term repetitions). Thus, the standard deviation over time is
the appropriate measure of variability.
Choice of k
depends on
number of
measurements
we are
willing to reject
To achieve tight control of the measurement process, set
k = 2
in which case approximately 5% of the measurements from
a process that is in control will produce out-of-control
signals. This assumes that there is a sufficiently large
number of degrees of freedom (>100) for estimating the
process standard deviation.
To flag only those measurements that are egregiously out of
control, set
k = 3
in which case approximately 1% of the measurements from
an in-control process will produce out-of-control signals.
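Sample R code
The individual observations control limits described above are
straightforward to compute once an accepted value and a process standard
deviation are available. The sketch below is illustrative only; the
historical values and the new measurement are hypothetical placeholders.

# Shewhart control limits for individual check standard measurements.
set.seed(3)
check_std <- rnorm(120, mean = 100, sd = 0.5)  # hypothetical historical database
accepted  <- mean(check_std)                   # baseline (accepted value)
s_process <- sd(check_std)                     # process standard deviation
k <- 2                          # k = 2 for tight control; k = 3 flags only egregious cases
ucl <- accepted + k * s_process
lcl <- accepted - k * s_process

new_value <- 101.4                             # hypothetical new check standard measurement
out_of_control <- new_value > ucl | new_value < lcl
c(LCL = lcl, UCL = ucl, out = out_of_control)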
2. Measurement Process Characterization
2.2. Statistical control of a measurement process
2.2.2. How are bias and variability controlled?
2.2.2.1. Shewhart control chart
2.2.2.1.1. EWMA control chart
Small
changes only
become
obvious over
time
Because it takes time for the patterns in the data to emerge,
a permanent shift in the process may not immediately cause
individual violations of the control limits on a Shewhart
control chart. The Shewhart control chart is not powerful for
detecting small changes, say of the order of 1 - 1/2 standard
deviations. The EWMA (exponentially weighted moving
average) control chart is better suited to this purpose.
Example of
EWMA
control chart
for mass
calibrations
The exponentially weighted moving average (EWMA) is a
statistic for monitoring the process that averages the data in
a way that gives less and less weight to data as they are
further removed in time from the current measurement. The
data Y_1, Y_2, ... , Y_t are the check standard measurements
ordered in time. The EWMA statistic at time t is computed
recursively from individual data points, with the first EWMA
statistic, EWMA_1, being the arithmetic average of historical
data.
Control
mechanism
for EWMA
The EWMA control chart can be made sensitive to small
changes or a gradual drift in the process by the choice of the
weighting factor, λ. A weighting factor of 0.2 - 0.3 is
usually suggested for this purpose (Hunter), and 0.15 is also
a popular choice.
Limits for the
control chart
The target or center line for the control chart is the average
of historical data. The upper (UCL) and lower (LCL) limits
are

    UCL = \text{target} + k\, s\, \sqrt{\frac{\lambda}{2 - \lambda}}

    LCL = \text{target} - k\, s\, \sqrt{\frac{\lambda}{2 - \lambda}}

where s times the radical expression is a good
approximation to the standard deviation of the EWMA
statistic and the factor k is chosen in the same way as for
the Shewhart control chart -- generally to be 2 or 3.
Procedure
for
implementing
the EWMA
control chart
The implementation of the EWMA control chart is the same
as for any other type of control procedure. The procedure is
built on the assumption that the "good" historical data are
representative of the in-control process, with future data
from the same process tested for agreement with the
historical data. To start the procedure, a target (average) and
process standard deviation are computed from historical
check standard data. Then the procedure enters the
monitoring stage with the EWMA statistics computed and
tested against the control limits. The EWMA statistics are
weighted averages, and thus their standard deviations are
smaller than the standard deviations of the raw data and the
corresponding control limits are narrower than the control
limits for the Shewhart individual observations chart.
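Sample R code
A minimal sketch of the EWMA computation and its approximate control
limits is shown below. It uses the standard EWMA recursion, started at the
historical average as described above (indexing conventions vary across
texts); the data values, λ, and k are hypothetical placeholders chosen for
illustration.

# EWMA control chart statistics and approximate control limits.
set.seed(4)
historical <- rnorm(100, mean = 100, sd = 0.5)    # hypothetical historical data
new_data   <- rnorm(20,  mean = 100.3, sd = 0.5)  # hypothetical monitoring data

lambda <- 0.2                  # weighting factor (0.2 - 0.3 is a common choice)
k      <- 3                    # chosen as for the Shewhart control chart
target <- mean(historical)     # center line
s      <- sd(historical)       # process standard deviation

ewma <- numeric(length(new_data))
prev <- target                 # first EWMA statistic is the historical average
for (t in seq_along(new_data)) {
  prev    <- lambda * new_data[t] + (1 - lambda) * prev  # recursive update
  ewma[t] <- prev
}

ucl <- target + k * s * sqrt(lambda / (2 - lambda))  # approximate EWMA limits
lcl <- target - k * s * sqrt(lambda / (2 - lambda))
which(ewma > ucl | ewma < lcl)  # indices of out-of-control EWMA statistics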
2. Measurement Process Characterization
2.2. Statistical control of a measurement process
2.2.2. How are bias and variability controlled?
2.2.2.2. Data collection
Measurements
should cover
a sufficiently
long time
period to
cover all
environmental
conditions
A schedule should be set up for making measurements on the artifact
(check standard) chosen for control purposes. The measurements are
structured to sample all environmental conditions in the laboratory and all
other sources of influence on the measurement result, such as operators
and instruments.
For high-precision processes where the uncertainty of the result must be
guaranteed, a measurement on the check standard should be included
with every measurement sequence, if possible, and at least once a day.
For each occasion, J measurements are made on the check standard. If
there is no interest in controlling the short-term variability or precision of
the instrument, then one measurement is sufficient. However, a dual
purpose is served by making two or three measurements that track both
the bias and the short-term variability of the process with the same
database.
Depiction of
check
standard
measurements
with J = 4
repetitions
per day on the
surface of a
silicon wafer
over K days
where the
repetitions
are
randomized
over position
on the wafer
K days - 4 repetitions
2-level design for measurements on a check standard
Notation For J measurements on each of K days, the measurements are denoted by

    Y_{kj}, \quad j = 1, \ldots, J; \quad k = 1, \ldots, K

The check
standard
value is
defined as an
average of
short-term
repetitions
The check standard value for the kth day is

    \bar{Y}_{k} = \frac{1}{J} \sum_{j=1}^{J} Y_{kj}

Accepted
value of check
standard
The accepted value, or baseline for the control chart, is

    \bar{\bar{Y}} = \frac{1}{K} \sum_{k=1}^{K} \bar{Y}_{k}

Process
standard
deviation
The process standard deviation is

    s = \sqrt{\frac{1}{K-1} \sum_{k=1}^{K} \left( \bar{Y}_{k} - \bar{\bar{Y}} \right)^{2}}

with v = K - 1 degrees of freedom.
Caution Check standard measurements should be structured in the same way as
values reported on the test items. For example, if the reported values are
averages of two measurements made within 5 minutes of each other, the
check standard values should be averages of the two measurements made
in the same manner.
Database
Case study:
Resistivity
Averages and short-term standard deviations computed from J repetitions
should be recorded in a file along with identifications for all significant
factors. The best way to record this information is to use one file with
one line (row in a spreadsheet) of information in fixed fields for each
group. A list of typical entries follows:
1. Month
2. Day
3. Year
4. Check standard identification
5. Identification for the measurement design (if applicable)
6. Instrument identification
7. Check standard value
8. Repeatability (short-term) standard deviation from J repetitions
9. Degrees of freedom
10. Operator identification
11. Environmental readings (if pertinent)
2. Measurement Process Characterization
2.2. Statistical control of a measurement process
2.2.2. How are bias and variability controlled?
2.2.2.3. Monitoring bias and long-term variability
Monitoring
stage
Once the baseline and control limits for the control chart have been determined from
historical data, and any bad observations removed and the control limits recomputed, the
measurement process enters the monitoring stage. A Shewhart control chart and EWMA
control chart for monitoring a mass calibration process are shown below. For the
purpose of comparing the two techniques, the two control charts are based on the same
data where the baseline and control limits are computed from the data taken prior to
1985. The monitoring stage begins at the start of 1985. Similarly, the control limits for
both charts are 3-standard deviation limits. The check standard data and analysis are
explained more fully in another section.
Shewhart
control chart
of
measurements
of kilogram
check
standard
showing
outliers and a
shift in the
process that
occurred after
1985
EWMA chart
for
measurements
on kilogram
check
standard
showing
multiple
violations of
the control
limits for the
EWMA
statistics
In the EWMA control chart below, the control data after 1985 are shown in green, and
the EWMA statistics are shown as black dots superimposed on the raw data. The
EWMA statistics, and not the raw data, are of interest in looking for out-of-control
signals. Because the EWMA statistic is a weighted average, it has a smaller standard
deviation than a single control measurement, and, therefore, the EWMA control limits
are narrower than the limits for the Shewhart control chart shown above.
Measurements
that exceed
the control
limits require
action
The control strategy is based on the predictability of future measurements from
historical data. Each new check standard measurement is plotted on the control chart in
real time. These values are expected to fall within the control limits if the process has
not changed. Measurements that exceed the control limits are probably out-of-control
and require remedial action. Possible causes of out-of-control signals need to be
understood when developing strategies for dealing with outliers.
Signs of
significant
trends or
shifts
The control chart should be viewed in its entirety on a regular basis to identify drift or
shift in the process. In the Shewhart control chart shown above, only a few points
exceed the control limits. The small, but significant, shift in the process that occurred
after 1985 can only be identified by examining the plot of control measurements over
time. A re-analysis of the kilogram check standard data shows that the control limits for
the Shewhart control chart should be updated based on the data after 1985. In the
EWMA control chart, multiple violations of the control limits occur after 1986. In the
calibration environment, the incidence of several violations should alert the control
engineer that a shift in the process has occurred, possibly because of damage or change
in the value of a reference standard, and the process requires review.
2. Measurement Process Characterization
2.2. Statistical control of a measurement process
2.2.2. How are bias and variability controlled?
2.2.2.4. Remedial actions
Consider
possible
causes for
out-of-
control
signals and
take
corrective
long-term
actions
There are many possible causes of out-of-control signals.
A. Causes that do not warrant corrective action for the
process (but which do require that the current measurement
be discarded) are:
1. Chance failure where the process is actually in-
control
2. Glitch in setting up or operating the measurement
process
3. Error in recording of data
B. Changes in bias can be due to:
1. Damage to artifacts
2. Degradation in artifacts (wear or build-up of dirt and
mineral deposits)
C. Changes in long-term variability can be due to:
1. Degradation in the instrumentation
2. Changes in environmental conditions
3. Effect of a new or inexperienced operator
4-step
strategy for
short-term
An immediate strategy for dealing with out-of-control
signals associated with high precision measurement
processes should be pursued as follows:
Repeat
measurements
1. Repeat the measurement sequence to establish
whether or not the out-of-control signal was simply a
chance occurrence, glitch, or whether it flagged a
permanent change or trend in the process.
Discard
measurements
on test items
2. With high precision processes, for which a check
standard is measured along with the test items, new
values should be assigned to the test items based on
new measurement data.
Check for
drift
3. Examine the patterns of recent data. If the process is
gradually drifting out of control because of
degradation in instrumentation or artifacts, then:
Instruments may need to be repaired
Reference artifacts may need to be
recalibrated.
Reevaluate 4. Reestablish the process value and control limits from
more recent data if the measurement process cannot
be brought back into control.
2. Measurement Process Characterization
2.2. Statistical control of a measurement process
2.2.3. How is short-term variability controlled?
Emphasis
on
instruments
Short-term variability or instrument precision is controlled by
monitoring standard deviations from repeated measurements
on the instrument(s) of interest. The database can come from
measurements on a single artifact or a representative set of
artifacts.
Artifacts -
Case
study:
Resistivity
The artifacts must be of the same type and geometry as items
that are measured in the workload, such as:
1. Items from the workload
2. A single check standard chosen for this purpose
3. A collection of artifacts set aside for this specific
purpose
Concepts
covered in
this section
The concepts that are covered in this section include:
1. Control chart methodology for standard deviations
2. Data collection and analysis
3. Monitoring
4. Remedies and strategies for dealing with out-of-control
signals
2. Measurement Process Characterization
2.2. Statistical control of a measurement process
2.2.3. How is short-term variability controlled?
2.2.3.1. Control chart for standard deviations
Degradation
of
instrument
or
anomalous
behavior on
one
occasion
Changes in the precision of the instrument, particularly
anomalies and degradation, must be addressed. Changes in
precision can be detected by a statistical control procedure
based on the F-distribution where the short-term standard
deviations are plotted on the control chart.
The base line for this type of control chart is the pooled
standard deviation, s_1, as defined in Data collection and
analysis.
Example of
control
chart for a
mass
balance
Only the upper control limit, UCL, is of interest for detecting
degradation in the instrument. As long as the short-term
standard deviations fall within the upper control limit
established from historical data, there is reason for
confidence that the precision of the instrument has not
degraded (i.e., common cause variations).
The control
limit is
based on the
F-
distribution
The control limit is

    UCL = s_{1} \sqrt{F_{\alpha;\, J-1,\, K(J-1)}}

where the quantity under the radical is the upper critical
value from the F table with degrees of freedom (J - 1) and
K(J - 1). The numerator degrees of freedom, v1 = (J - 1), are
associated with the standard deviation computed from the
current measurements, and the denominator degrees of
freedom, v2 = K(J - 1), correspond to the pooled standard
deviation of the historical data. The probability α is chosen
to be small, say 0.05.
The justification for this control limit, as opposed to the
more conventional standard deviation control limit, is that we
are essentially performing the following hypothesis test:
    H_0: σ_1 = σ_2
    H_a: σ_2 > σ_1

where σ_1 is the population value for the s_1 defined above
and σ_2 is the population value for the standard deviation of
the current values being tested. Generally, s_1 is based on
sufficient historical data that it is reasonable to make the
assumption that σ_1 is a "known" value.
The upper control limit above is then derived based on the
standard F test for equal standard deviations. Justification
and details of this derivation are given in Cameron and
Hailes (1974).
Sample
Code
Sample code for computing the F value for the case where
α = 0.05, J = 6, and K = 6, is available for both Dataplot and
R.
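For reference, the same calculation can be done with a few lines of base R.
The sketch below is illustrative only; the value used for the pooled
standard deviation s1 is a hypothetical placeholder.

# Upper control limit for short-term standard deviations
# (alpha = 0.05, J = 6, K = 6, as in the case mentioned above).
alpha <- 0.05
J <- 6
K <- 6
f_crit <- qf(1 - alpha, df1 = J - 1, df2 = K * (J - 1))  # upper 5% critical F value
s1  <- 0.5                             # hypothetical pooled level-1 standard deviation
ucl <- s1 * sqrt(f_crit)               # upper control limit for short-term sds
c(F.critical = f_crit, UCL = ucl)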
2. Measurement Process Characterization
2.2. Statistical control of a measurement process
2.2.3. How is short-term variability controlled?
2.2.3.2. Data collection
Case
study:
Resistivity
A schedule should be set up for making measurements with a
single instrument (once a day, twice a week, or whatever is
appropriate for sampling all conditions of measurement).
Short-term
standard
deviations
The measurements are denoted

    Y_{kj}, \quad j = 1, \ldots, J; \quad k = 1, \ldots, K

where there are J measurements on each of K occasions. The
average for the kth occasion is:

    \bar{Y}_{k} = \frac{1}{J} \sum_{j=1}^{J} Y_{kj}

The short-term (repeatability) standard deviation for the kth
occasion is:

    s_{k} = \sqrt{\frac{1}{J-1} \sum_{j=1}^{J} \left( Y_{kj} - \bar{Y}_{k} \right)^{2}}

with (J - 1) degrees of freedom.
Pooled standard deviation
The repeatability standard deviations are pooled over the K occasions to obtain an estimate with K(J - 1) degrees of freedom of the level-1 standard deviation

    s1 = sqrt( (1/K) * sum over k of s(k)^2 )

Note: The same notation is used for the repeatability standard deviation whether it is based on one set of measurements or pooled over several sets.
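A minimal R sketch of these computations, assuming the J x K measurements are held in a matrix y (simulated here; all names are illustrative):

    J <- 6; K <- 25
    set.seed(1)
    y <- matrix(rnorm(J * K, mean = 10, sd = 0.05), nrow = J, ncol = K)  # J rows, K occasions
    s_k <- apply(y, 2, sd)        # short-term standard deviation for each occasion, (J-1) df each
    s1  <- sqrt(mean(s_k^2))      # pooled level-1 standard deviation with K(J-1) df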
Database The individual short-term standard deviations along with
identifications for all significant factors are recorded in a file.
The best way to record this information is by using one file
with one line (row in a spreadsheet) of information in fixed
fields for each group. A list of typical entries follows.
1. Identification of test item or check standard
2. Date
3. Short-term standard deviation
4. Degrees of freedom
5. Instrument
6. Operator
2.2.3.3. Monitoring short-term precision
Monitoring future precision
Once the base line and control limit for the control chart have been determined
from historical data, the measurement process enters the monitoring stage. In
the control chart shown below, the control limit is based on the data taken prior
to 1985.
Each new standard deviation is monitored on the control chart
Each new short-term standard deviation based on J measurements is plotted on
the control chart; points that exceed the control limits probably indicate lack of
statistical control. Drift over time indicates degradation of the instrument.
Points out of control require remedial action, and possible causes of out of
control signals need to be understood when developing strategies for dealing
with outliers.
Control chart for precision for a mass balance from historical standard deviations for the balance with 3 degrees of freedom each. The control chart identifies two outliers and slight degradation over time in the precision of the balance.

[Control chart plot; x-axis: TIME IN YEARS]
Monitoring where the number of measurements is different from J
There is no requirement that future standard deviations be based on J, the number of measurements in the historical database. However, a change in the number of measurements leads to a change in the test for control, and it may not be convenient to draw a control chart where the control limits are changing with each new measurement sequence.

For a new standard deviation based on J' measurements, the precision of the instrument is in control if

    s(new) <= s1 * sqrt( F(α; J'-1, K(J-1)) ) .

Notice that the numerator degrees of freedom, v1 = J' - 1, changes but the denominator degrees of freedom, v2 = K(J - 1), remains the same.
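A short R sketch of this test for control, assuming s1, K, and J are available from the historical data and using illustrative values for the new measurements:

    alpha  <- 0.05
    Jprime <- 4                                        # number of new measurements
    limit  <- s1 * sqrt(qf(1 - alpha, Jprime - 1, K * (J - 1)))
    s_new  <- 0.06                                     # new short-term standard deviation
    s_new <= limit                                     # TRUE if the precision is judged in control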
2.2.3.4. Remedial actions
Examine possible causes
A. Causes that do not warrant corrective action (but which
do require that the current measurement be discarded) are:
1. Chance failure where the precision is actually in
control
2. Glitch in setting up or operating the measurement
process
3. Error in recording of data
B. Changes in instrument performance can be due to:
1. Degradation in electronics or mechanical components
2. Changes in environmental conditions
3. Effect of a new or inexperienced operator
Repeat
measurements
Repeat the measurement sequence to establish whether or
not the out-of-control signal was simply a chance
occurrence, glitch, or whether it flagged a permanent
change or trend in the process.
Assign new
value to test
item
With high precision processes, for which the uncertainty
must be guaranteed, new values should be assigned to the
test items based on new measurement data.
Check for
degradation
Examine the patterns of recent standard deviations. If the
process is gradually drifting out of control because of
degradation in instrumentation or artifacts, instruments may
need to be repaired or replaced.
2.3. Calibration
The purpose of this section is to outline the procedures for
calibrating artifacts and instruments while guaranteeing the
'goodness' of the calibration results. Calibration is a
measurement process that assigns values to the property of an
artifact or to the response of an instrument relative to
reference standards or to a designated measurement process.
The purpose of calibration is to eliminate or reduce bias in the
user's measurement system relative to the reference base. The
calibration procedure compares an "unknown" or test item(s)
or instrument with reference standards according to a specific
algorithm.
What are the issues for calibration?
1. Artifact or instrument calibration
2. Reference base
3. Reference standard(s)
What is artifact (single-point) calibration?
1. Purpose
2. Assumptions
3. Bias
4. Calibration model
What are calibration designs?
1. Purpose
2. Assumptions
3. Properties of designs
4. Restraint
5. Check standard in a design
6. Special types of bias (left-right effect & linear drift)
7. Solutions to calibration designs
8. Uncertainty of calibrated values
Catalog of calibration designs
1. Mass weights
2. Gage blocks
3. Electrical standards - saturated standard cells, zeners,
resistors
4. Roundness standards
5. Angle blocks
6. Indexing tables
7. Humidity cylinders
Control of artifact calibration
1. Control of the precision of the calibrating instrument
2. Control of bias and long-term variability
What is instrument calibration over a regime?
1. Models for instrument calibration
2. Data collection
3. Assumptions
4. What can go wrong with the calibration procedure?
5. Data analysis and model validation
6. Calibration of future measurements
7. Uncertainties of calibrated values
1. From propagation of error for a quadratic
calibration
2. From check standard measurements for a linear
calibration
3. Comparison of check standard technique and
propagation of error
Control of instrument calibration
1. Control chart for linear calibration
2. Critical values of t* statistic
2.3.1. Issues in calibration
Calibration
reduces
bias
Calibration is a measurement process that assigns values to
the property of an artifact or to the response of an instrument
relative to reference standards or to a designated measurement
process. The purpose of calibration is to eliminate or reduce
bias in the user's measurement system relative to the reference
base.
Artifact &
instrument
calibration
The calibration procedure compares an "unknown" or test
item(s) or instrument with reference standards according to a
specific algorithm. Two general types of calibration are
considered in this Handbook:
artifact calibration at a single point
instrument calibration over a regime
Types of
calibration
not
discussed
The procedures in this Handbook are appropriate for
calibrations at secondary or lower levels of the traceability
chain where reference standards for the unit already exist.
Calibration from first principles of physics and reciprocity
calibration are not discussed.
2.3.1.1. Reference base
Ultimate
authority
The most critical element of any measurement process is the
relationship between a single measurement and the
reference base for the unit of measurement. The reference
base is the ultimate source of authority for the measurement
unit.
Base and
derived units
of
measurement
The base units of measurement in the Le Systeme
International d'Unites (SI) are (Taylor):
kilogram - mass
meter - length
second - time
ampere - electric current
kelvin - thermodynamic temperature
mole - amount of substance
candela - luminous intensity
These units are maintained by the Bureau International des
Poids et Mesures in Paris. Local reference bases for these
units and SI derived units such as:
pascal - pressure
newton - force
hertz - frequency
ohm - resistance
degrees Celsius - Celsius temperature, etc.
are maintained by national and regional standards
laboratories.
Other
sources
Consensus values from interlaboratory tests or
instrumentation/standards as maintained in specific
environments may serve as reference bases for other units of
measurement.
2.3.1.2. Reference standards
Primary
reference
standards
A reference standard for a unit of measurement is an artifact
that embodies the quantity of interest in a way that ties its
value to the reference base.
At the highest level, a primary reference standard is assigned a
value by direct comparison with the reference base. Mass is
the only unit of measurement that is defined by an artifact. The
kilogram is defined as the mass of a platinum-iridium
kilogram that is maintained by the Bureau International des
Poids et Mesures in Sevres, France.
Primary reference standards for other units come from
realizations of the units embodied in artifact standards. For
example, the reference base for length is the meter which is defined as the length of the path traveled by light in a vacuum during a time interval of 1/299,792,458 of a second.
Secondary
reference
standards
Secondary reference standards are calibrated by comparing
with primary standards using a high precision comparator and
making appropriate corrections for non-ideal conditions of
measurement.
Secondary reference standards for mass are stainless steel
kilograms, which are calibrated by comparing with a primary
standard on a high precision balance and correcting for the
buoyancy of air. In turn these weights become the reference
standards for assigning values to test weights.
Secondary reference standards for length are gage blocks,
which are calibrated by comparing with primary gage block
standards on a mechanical comparator and correcting for
temperature. In turn, these gage blocks become the reference
standards for assigning values to test sets of gage blocks.
2.3.2. What is artifact (single-point) calibration?
Purpose Artifact calibration is a measurement process that assigns
values to the property of an artifact relative to a reference
standard(s). The purpose of calibration is to eliminate or
reduce bias in the user's measurement system relative to the
reference base.
The calibration procedure compares an "unknown" or test
item(s) with a reference standard(s) of the same nominal
value (hence, the term single-point calibration) according to
a specific algorithm called a calibration design.
Assumptions The calibration procedure is based on the assumption that
individual readings on test items and reference standards are
subject to:
Bias that is a function of the measuring system or
instrument
Random error that may be uncontrollable
What is
bias?
The operational definition of bias is that it is the difference
between values that would be assigned to an artifact by the
client laboratory and the laboratory maintaining the reference
standards. Values, in this sense, are understood to be the
long-term averages that would be achieved in both
laboratories.
Calibration model for eliminating bias requires a reference standard that is very close in value to the test item
One approach to eliminating bias is to select a reference
standard that is almost identical to the test item; measure the
two artifacts with a comparator type of instrument; and take
the difference of the two measurements to cancel the bias.
The only requirement on the instrument is that it be linear
over the small range needed for the two artifacts.
The test item has value X*, as yet to be assigned, and the reference standard has an assigned value R*. Given a measurement, X, on the test item and a measurement, R, on the reference standard,

    X = X* + error_x
    R = R* + error_r

the difference between the test item and the reference is estimated by

    D = X - R

and the value of the test item is reported as

    Test value = R* + D .
Need for redundancy leads to calibration designs
A deficiency in relying on a single difference to estimate D
is that there is no way of assessing the effect of random
errors. The obvious solution is to:
Repeat the calibration measurements J times
Average the results
Compute a standard deviation from the J results
Schedules of redundant intercomparisons involving
measurements on several reference standards and test items
in a connected sequence are called calibration designs and
are discussed in later sections.
2.3.3. What are calibration designs?
Calibration designs are redundant schemes for intercomparing reference standards and test items
Calibration designs are redundant schemes for
intercomparing reference standards and test items in such
a way that the values can be assigned to the test items
based on known values of reference standards. Artifacts
that traditionally have been calibrated using calibration
designs are:
mass weights
resistors
voltage standards
length standards
angle blocks
indexing tables
liquid-in-glass thermometers, etc.
Outline of
section
The topics covered in this section are:
Designs for elimination of left-right bias and linear
drift
Solutions to calibration designs
Uncertainties of calibrated values
A catalog of calibration designs is provided in the next
section.
Assumptions for calibration designs include demands on the quality of the artifacts
The assumptions that are necessary for working with
calibration designs are that:
Random errors associated with the measurements
are independent.
All measurements come from a distribution with the
same standard deviation.
Reference standards and test items respond to the
measuring environment in the same manner.
Handling procedures are consistent from item to
item.
Reference standards and test items are stable during
the time of measurement.
Bias is canceled by taking the difference between
measurements on the test item and the reference
standard.
Important
concept -
Restraint
The restraint is the known value of the reference standard
or, for designs with two or more reference standards, the
restraint is the summation of the values of the reference
standards.
Requirements
& properties of
designs
Basic requirements are:
The differences must be nominally zero.
The design must be solvable for individual items
given the restraint.
It is possible to construct designs which do not have these
properties. This will happen, for example, if reference
standards are only compared among themselves and test
items are only compared among themselves without any
intercomparisons.
Practical considerations determine a 'good' design
We do not apply 'optimality' criteria in constructing
calibration designs because the construction of a 'good'
design depends on many factors, such as convenience in
manipulating the test items, time, expense, and the
maximum load of the instrument.
The number of measurements should be small.
The degrees of freedom should be greater than
three.
The standard deviations of the estimates for the test
items should be small enough for their intended
purpose.
Check
standard in a
design
Designs listed in this Handbook have provision for a
check standard in each series of measurements. The check
standard is usually an artifact, of the same nominal size,
type, and quality as the items to be calibrated. Check
standards are used for:
Controlling the calibration process
Quantifying the uncertainty of calibrated results
Estimates that can be computed from a design
Calibration designs are solved by a restrained least-
squares technique (Zelen) which gives the following
estimates:
Values for individual reference standards
Values for individual test items
Value for the check standard
Repeatability standard deviation and degrees of
freedom
Standard deviations associated with values for
reference standards and test items
2.3.3.1. Elimination of special types of bias
Assumptions
which may
be violated
Two of the usual assumptions relating to calibration
measurements are not always valid and result in biases.
These assumptions are:
Bias is canceled by taking the difference between the
measurement on the test item and the measurement on
the reference standard
Reference standards and test items remain stable
throughout the measurement sequence
Ideal
situation
In the ideal situation, bias is eliminated by taking the
difference between a measurement X on the test item and a
measurement R on the reference standard. However, there are
situations where the ideal is not satisfied:
Left-right (or constant instrument) bias
Bias caused by instrument drift
2.3.3.1.1. Left-right (constant instrument) bias
Left-right bias which is not eliminated by differencing
A situation can exist in which a bias, P, which is constant and independent of the direction of measurement, is introduced by the measurement instrument itself. This type of bias, which has been observed in measurements of standard voltage cells (Eicke & Cameron) and is not eliminated by reversing the direction of the current, is shown in the following equations:

    Y(1) = X - R + P + error(1)
    Y(2) = R - X + P + error(2)
Elimination of left-right bias requires two measurements in reverse direction
The difference between the test and the reference can be estimated without bias only by taking the difference between the two measurements shown above where P cancels in the differencing so that

    D = ( Y(1) - Y(2) ) / 2 .
The value of the test item depends on the known value of the reference standard, R*
The test item, X, can then be estimated without bias by

    X* = R* + ( Y(1) - Y(2) ) / 2

and P can be estimated by

    P = ( Y(1) + Y(2) ) / 2 .
Calibration designs that are left-right balanced
This type of scheme is called left-right balanced and the
principle is extended to create a catalog of left-right
balanced designs for intercomparing reference standards
among themselves. These designs are appropriate ONLY for
comparing reference standards in the same environment, or
enclosure, and are not appropriate for comparing, say,
across standard voltage cells in two boxes.
1. Left-right balanced design for a group of 3 artifacts
2. Left-right balanced design for a group of 4 artifacts
3. Left-right balanced design for a group of 5 artifacts
4. Left-right balanced design for a group of 6 artifacts
2.3.3.1.2. Bias caused by instrument drift
Bias caused by linear drift over the time of measurement
The requirement that reference standards and test items be
stable during the time of measurement cannot always be
met because of changes in temperature caused by body
heat, handling, etc.
Representation
of linear drift
Linear drift for an even number of measurements is
represented by
..., -5d, -3d, -1d, +1d, +3d, +5d, ...
and for an odd number of measurements by
..., -3d, -2d, -1d, 0d, +1d, +2d, +3d, ... .
Assumptions
for drift
elimination
The effect can be mitigated by a drift-elimination scheme
(Cameron/Hailes) which assumes:
Linear drift over time
Equally spaced measurements in time
Example of
drift-
elimination
scheme
An example is given by substitution weighing where scale
deflections on a balance are observed for X, a test weight,
and R, a reference weight.
Estimates of
drift-free
difference and
size of drift
The drift-free difference between the test and the reference
is estimated by
and the size of the drift is estimated by
Calibration
designs for
eliminating
linear drift
This principle is extended to create a catalog of drift-
elimination designs for multiple reference standards and
test items. These designs are listed under calibration
designs for gauge blocks because they have traditionally
been used to counteract the effect of temperature build-up
in the comparator during calibration.
2.3.3.2. Solutions to calibration designs
Solutions for
designs listed
in the catalog
Solutions for all designs that are cataloged in this Handbook are included
with the designs. Solutions for other designs can be computed from the
instructions on the following page given some familiarity with matrices.
Measurements
for the 1,1,1
design
The use of the tables shown in the catalog are illustrated for three
artifacts; namely, a reference standard with known value R* and a check
standard and a test item with unknown values. All artifacts are of the
same nominal size. The design is referred to as a 1,1,1 design for
n = 3 difference measurements
m = 3 artifacts
Convention for showing the measurement sequence and identifying the reference and check standards
The convention for showing the measurement sequence is shown below. Nominal values are underlined in the first line showing that this design is appropriate for comparing three items of the same nominal size such as three one-kilogram weights. The reference standard is the first artifact, the check standard is the second, and the test item is the third.
                    1      1      1
Y(1)       =        +      -
Y(2)       =        +             -
Y(3)       =               +      -

Restraint           +
Check standard             +
Limitation of
this design
This design has degrees of freedom
v = n - m + 1 = 1
Convention for showing least-squares estimates for individual items
The table shown below lists the coefficients for finding the estimates for the individual items. The estimates are computed by taking the cross-product of the appropriate column for the item of interest with the column of measurement data and dividing by the divisor shown at the top of the table.
SOLUTION MATRIX
DIVISOR = 3
OBSERVATIONS 1 1 1
Y(1) 0 -2 -1
Y(2) 0 -1 -2
Y(3) 0 1 -1
R* 3 3 3
Solutions for individual items from the table above
For example, the solution for the reference standard is shown under the first column; for the check standard under the second column; and for the test item under the third column. Notice that the estimate for the reference standard is guaranteed to be R*, regardless of the measurement results, because of the restraint that is imposed on the design. The estimates are as follows:

    Reference standard*  =  R*
    Check standard*      =  R* - ( 2*Y(1) + Y(2) - Y(3) ) / 3
    Test item*           =  R* - ( Y(1) + 2*Y(2) + Y(3) ) / 3
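The same estimates can be computed numerically. The following R sketch applies the solution matrix to illustrative values of Y and R* (the numbers are made up for illustration only):

    Y     <- c(-0.0012, 0.0018, 0.0031)        # difference measurements Y(1), Y(2), Y(3)
    Rstar <- 1.000021                          # assigned value of the reference standard
    sol   <- matrix(c(0, -2, -1,               # rows correspond to Y(1), Y(2), Y(3), R*
                      0, -1, -2,
                      0,  1, -1,
                      3,  3,  3), nrow = 4, byrow = TRUE)
    estimates <- as.vector(t(sol) %*% c(Y, Rstar)) / 3   # divide by the divisor
    names(estimates) <- c("reference", "check standard", "test item")
    estimates                                  # the first entry reproduces R* exactly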
Convention for showing standard deviations for individual items and combinations of items
The standard deviations are computed from two tables of factors as shown below. The standard deviations for combinations of items include appropriate covariance terms.
FACTORS FOR REPEATABILITY STANDARD DEVIATIONS
WT   FACTOR
     K1        1     1     1
1    0.0000    +
1    0.8165          +
1    0.8165                +
2    1.4142          +     +
1    0.8165          +

FACTORS FOR BETWEEN-DAY STANDARD DEVIATIONS
WT   FACTOR
     K2        1     1     1
1    0.0000    +
1    1.4142          +
1    1.4142                +
2    2.4495          +     +
1    1.4142          +
Unifying equation
The standard deviation for each item is computed using the unifying equation:

    s(item)^2 = K1^2 * s1^2 + K2^2 * s_days^2
Standard deviations for 1,1,1 design from the tables of factors
For the 1,1,1 design, the standard deviations are:
Process standard deviations must be known from historical data
In order to apply these equations, we need an estimate of the standard deviation, s_days, that describes day-to-day changes in the measurement process. This standard deviation is in turn derived from the level-2 standard deviation, s2, for the check standard. This standard deviation is estimated from historical data on the check standard; it can be negligible, in which case the calculations are simplified.

The repeatability standard deviation, s1, is estimated from historical data, usually from data of several designs.
Steps in computing standard deviations
The steps in computing the standard deviation for a test item are:
Compute the repeatability standard deviation from the design or
historical data.
Compute the standard deviation of the check standard from
historical data.
Locate the factors, K1 and K2, for the check standard; for the 1,1,1 design the factors are 0.8165 and 1.4142, respectively, where the check standard entries are last in the tables.
Apply the unifying equation to the check standard to estimate the standard deviation for days. Notice that the standard deviation of the check standard is the same as the level-2 standard deviation, s2, that is referred to on some pages. The equation for the between-days standard deviation from the unifying equation is

    s_days = sqrt( ( s2^2 - K1^2 * s1^2 ) / K2^2 ) .

Thus, for the example above

    s_days = sqrt( s2^2/2 - s1^2/3 ) .
This is the number that is entered into the NIST mass calibration
software as the between-time standard deviation. If you are using
this software, this is the only computation that you need to make
because the standard deviations for the test items are computed
automatically by the software.
If the computation under the radical sign gives a negative number, set s_days = 0. (This is possible and indicates that there is no contribution to uncertainty from day-to-day effects.)
For completeness, the computations of the standard deviations for
the test item and for the sum of the test and the check standard
using the appropriate factors are shown below.
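A compact R sketch of these computations for the 1,1,1 design, using the factors in the tables above and illustrative values for s1 and s2:

    s1 <- 0.0020                                         # repeatability standard deviation (historical)
    s2 <- 0.0035                                         # level-2 (check standard) standard deviation
    sdays  <- sqrt(max((s2^2 - 0.8165^2 * s1^2) / 1.4142^2, 0))   # 0 if negative under the radical
    s_test <- sqrt(0.8165^2 * s1^2 + 1.4142^2 * sdays^2)          # test item
    s_sum  <- sqrt(1.4142^2 * s1^2 + 2.4495^2 * sdays^2)          # test item plus check standard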
2.3.3.2.1. General matrix solutions to calibration designs
Requirements Solutions for all designs that are cataloged in this Handbook are included with the
designs. Solutions for other designs can be computed from the instructions below
given some familiarity with matrices. The matrix manipulations that are required for
the calculations are:
transposition (indicated by ')
multiplication
inversion
Notation n = number of difference measurements
m = number of artifacts
(n - m + 1) = degrees of freedom
X= (nxm) design matrix
r'= (mx1) vector identifying the restraint
= (mx1) vector identifying ith item of interest consisting of a 1 in the ith
position and zeros elsewhere
R*= value of the reference standard
Y = (nx1) vector of observed difference measurements
Convention
for showing
the
measurement
sequence
The convention for showing the measurement sequence is illustrated with the three
measurements that make up a 1,1,1 design for 1 reference standard, 1 check
standard, and 1 test item. Nominal values are underlined in the first line .
1 1 1
Y(1) = + -
Y(2) = + -
Y(3) = + -
Matrix algebra for solving a design
The (nxm) design matrix X is constructed by replacing the pluses (+), minuses (-) and blanks with the entries 1, -1, and 0 respectively.

The (mxm) matrix of normal equations, X'X, is formed and augmented by the restraint vector to form an (m+1)x(m+1) matrix, A:

    A = [ X'X   r ]
        [ r'    0 ]
Inverse of design matrix
The A matrix is inverted and shown in the form:

where Q is an mxm matrix that, when multiplied by s^2, yields the usual variance-covariance matrix.
Estimates of values of individual artifacts
The least-squares estimates for the values of the individual artifacts are contained in the (mx1) matrix, B, where

    B = QX'Y + h'R*

where Q is the upper left element of the A^-1 matrix shown above. The structure of the individual estimates is contained in the QX' matrix; i.e., the estimate for the ith item can be computed from XQ and Y by
Cross multiplying the ith column of XQ with Y
And adding R*(nominal test)/(nominal restraint)
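The following R sketch carries out these matrix steps for the 1,1,1 design shown earlier, with a single reference standard supplying the restraint; the numerical values are illustrative only.

    X <- matrix(c(1, -1,  0,
                  1,  0, -1,
                  0,  1, -1), nrow = 3, byrow = TRUE)   # (nxm) design matrix
    r <- c(1, 0, 0)                                     # restraint vector (reference standard first)
    Rstar <- 1.000021                                   # value of the reference standard (illustrative)
    Y <- c(-0.0012, 0.0018, 0.0031)                     # difference measurements (illustrative)
    m <- ncol(X)
    A <- rbind(cbind(t(X) %*% X, r), c(r, 0))           # augmented normal equations
    Ainv <- solve(A)
    Q <- Ainv[1:m, 1:m]                                 # upper-left block of the inverse
    h <- Ainv[1:m, m + 1]
    B <- Q %*% t(X) %*% Y + h * Rstar                   # least-squares estimates of the m artifacts
    B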
Clarify with an example
We will clarify the above discussion with an example from the mass calibration process at NIST. In this example, two NIST kilograms are compared with a customer's unknown kilogram.

The design matrix, X, is

    X = [  1  -1   0 ]
        [  1   0  -1 ]
        [  0   1  -1 ]

The first two columns represent the two NIST kilograms while the third column represents the customer's kilogram (i.e., the kilogram being calibrated).
The measurements obtained, i.e., the Y matrix, are
The measurements are the differences between two measurements, as specified by
the design matrix, measured in grams. That is, Y(1) is the difference in measurement
between NIST kilogram one and NIST kilogram two, Y(2) is the difference in
measurement between NIST kilogram one and the customer kilogram, and Y(3) is
the difference in measurement between NIST kilogram two and the customer
kilogram.
The value of the reference standard, R*, is 0.82329.
Then
If there are three weights with known values for weights one and two, then
r = [ 1 1 0 ]
Thus
and so
From A^-1, we have

We then compute QX'

We then compute B = QX'Y + h'R*
This yields the following least-squares coefficient estimates:
Standard
deviations of
estimates
The standard deviation for the ith item is:
where

The process standard deviation, which is a measure of the overall precision of the (NIST) mass calibration process,

is the residual standard deviation from the design, and s_days is the standard deviation for days, which can only be estimated from check standard measurements.
Example
We continue the example started above. Since n = 3 and m = 3, the formula reduces to:

Substituting the values shown above for X, Y, and Q results in

and

    Y'(I - XQX')Y = 0.0000083333

Finally, taking the square root gives

    s1 = 0.002887
The next step is to compute the standard deviation of item 3 (the customer's kilogram), that is s_item3. We start by substituting the values for X and Q and computing D

Next, we substitute = [0 0 1] and = 0.02111^2 (this value is taken from a check standard and not computed from the values given in this example).
We obtain the following computations
and
and
2.3.3.3. Uncertainties of calibrated values
Uncertainty analysis follows the ISO principles
This section discusses the calculation of uncertainties of
calibrated values from calibration designs. The discussion
follows the guidelines in the section on classifying and
combining components of uncertainty. Two types of
evaluations are covered.
1. type A evaluations of time-dependent sources of
random error
2. type B evaluations of other sources of error
The latter includes, but is not limited to, uncertainties from
sources that are not replicated in the calibration design such
as uncertainties of values assigned to reference standards.
Uncertainties
for test items
Uncertainties associated with calibrated values for test items
from designs require calculations that are specific to the
individual designs. The steps involved are outlined below.
Outline for the section on uncertainty analysis
Historical perspective
Assumptions
Example of more realistic model
Computation of repeatability standard deviations
Computation of level-2 standard deviations
Combination of repeatability and level-2 standard
deviations
Example of computations for 1,1,1,1 design
Type B uncertainty associated with the restraint
Expanded uncertainty of calibrated values
2.3.3.3.1. Type A evaluations for calibration designs
Change over
time
Type A evaluations for calibration processes must take into
account changes in the measurement process that occur
over time.
Historically, uncertainties considered only instrument imprecision
Historically, computations of uncertainties for calibrated
values have treated the precision of the comparator
instrument as the primary source of random uncertainty in
the result. However, as the precision of instrumentation has
improved, effects of other sources of variability have begun
to show themselves in measurement processes. This is not
universally true, but for many processes, instrument
imprecision (short-term variability) cannot explain all the
variation in the process.
Effects of
environmental
changes
Effects of humidity, temperature, and other environmental
conditions which cannot be closely controlled or corrected
must be considered. These tend to exhibit themselves over
time, say, as between-day effects. The discussion of
between-day (level-2) effects relating to gauge studies
carries over to the calibration setting, but the computations
are not as straightforward.
Assumptions
which are
specific to
this section
The computations in this section depend on specific
assumptions:
1. Short-term effects associated with instrument
response
come from a single distribution
vary randomly from measurement to
measurement within a design.
2. Day-to-day effects
come from a single distribution
vary from artifact to artifact but remain
constant for a single calibration
vary from calibration to calibration
These assumptions have proved useful but may need to be expanded in the future
These assumptions have proved useful for characterizing
high precision measurement processes, but more
complicated models may eventually be needed which take
the relative magnitudes of the test items into account. For
example, in mass calibration, a 100 g weight can be
compared with a summation of 50g, 30g and 20 g weights
in a single measurement. A sophisticated model might
consider the size of the effect as relative to the nominal
masses or volumes.
Example of the two models for a design for calibrating a test item using 1 reference standard
To contrast the simple model with the more complicated
model, a measurement of the difference between X, the test
item, with unknown and yet to be determined value, X*,
and a reference standard, R, with known value, R*, and the
reverse measurement are shown below.
Model (1) takes into account only instrument imprecision
so that:
(1)
with the error terms random errors that come from the
imprecision of the measuring instrument.
Model (2) allows for both instrument imprecision and
level-2 effects such that:
(2)
where the delta terms explain small changes in the values
of the artifacts that occur over time. For both models, the
value of the test item is estimated as
Standard deviations from both models
For model (1), the standard deviation of the test item is

For model (2), the standard deviation of the test item is
.
Note on relative contributions of both components to uncertainty
In both cases, s1 is the repeatability standard deviation that describes the precision of the instrument and s_days is the level-2 standard deviation that describes day-to-day changes. One thing to notice in the standard deviation for the test item is the contribution of s_days relative to the total uncertainty. If s_days is large relative to s1, or dominates, the uncertainty will not be appreciably reduced by adding measurements to the calibration design.
2.3.3.3.2. Repeatability and level-2 standard deviations
Repeatability standard deviation comes from the data of a single design
The repeatability standard deviation of the instrument can
be computed in two ways.
1. It can be computed as the residual standard deviation
from the design and should be available as output
from any software package that reduces data from
calibration designs. The matrix equations for this
computation are shown in the section on solutions to
calibration designs. The standard deviation has
degrees of freedom
v = n - m + 1
for n difference measurements and m items.
Typically the degrees of freedom are very small. For two difference measurements on a reference standard and test item, the degrees of freedom is v = 1.
A more reliable estimate comes from pooling over historical data
2. A more reliable estimate of the standard deviation
can be computed by pooling variances from K
calibrations (and then taking its square root) using
the same instrument (assuming the instrument is in
statistical control). The formula for the pooled
estimate is
Level-2 standard deviation is estimated from check standard measurements
The level-2 standard deviation cannot be estimated from
the data of the calibration design. It cannot generally be
estimated from repeated designs involving the test items.
The best mechanism for capturing the day-to-day effects is
a check standard, which is treated as a test item and
included in each calibration design. Values of the check
standard, estimated over time from the calibration design,
are used to estimate the standard deviation.
Assumptions The check standard value must be stable over time, and the
measurements must be in statistical control for this
procedure to be valid. For this purpose, it is necessary to
keep a historical record of values for a given check
standard, and these values should be kept by instrument
and by design.
Computation of level-2 standard deviation
Given K historical check standard values, C(1), ..., C(K), the standard deviation of the check standard values is computed as

    s2 = sqrt( (1/(K-1)) * sum over k of ( C(k) - Cbar )^2 )

where

    Cbar = (1/K) * sum over k of C(k)

with degrees of freedom v = K - 1.
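In R this is simply the sample standard deviation of the stored check standard values (numbers illustrative):

    check_values <- c(0.51023, 0.51018, 0.51027, 0.51011, 0.51025, 0.51020)
    K  <- length(check_values)
    s2 <- sd(check_values)       # level-2 standard deviation, v = K - 1 degrees of freedom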
2.3.3.3.3. Combination of repeatability and level-2 standard deviations
Standard deviation of test item depends on several factors
The final question is how to combine the repeatability
standard deviation and the standard deviation of the check
standard to estimate the standard deviation of the test item.
This computation depends on:
structure of the design
position of the check standard in the design
position of the reference standards in the design
position of the test item in the design
Derivations
require
matrix
algebra
Tables for estimating standard deviations for all test items are
reported along with the solutions for all designs in the catalog.
The use of the tables for estimating the standard deviations
for test items is illustrated for the 1,1,1,1 design. Matrix
equations can be used for deriving estimates for designs that
are not in the catalog.
The check standard for each design is either an additional test
item in the design, other than the test items that are submitted
for calibration, or it is a construction, such as the difference
between two reference standards as estimated by the design.
2.3.3.3.4. Calculation of standard deviations for 1,1,1,1 design
Design with 2 reference standards and 2 test items
An example is shown below for a 1,1,1,1 design for two reference standards, R1 and R2, and two test items, X1 and X2, and six difference measurements.
The restraint, R*, is the sum of values of the two reference standards, and the
check standard, which is independent of the restraint, is the difference
between the values of the reference standards. The design and its solution are
reproduced below.
Check standard is the difference between the 2 reference standards

OBSERVATIONS        1      1      1      1

Y(1)                +      -
Y(2)                +             -
Y(3)                +                    -
Y(4)                       +      -
Y(5)                       +             -
Y(6)                              +      -

RESTRAINT           +      +
CHECK STANDARD      +      -

DEGREES OF FREEDOM = 3
SOLUTION MATRIX
DIVISOR = 8
OBSERVATIONS 1 1 1 1
Y(1) 2 -2 0 0
Y(2) 1 -1 -3 -1
Y(3) 1 -1 -1 -3
Y(4) -1 1 -3 -1
Y(5) -1 1 -1 -3
Y(6) 0 0 2 -2
R* 4 4 4 4
Explanation of solution matrix
The solution matrix gives values for the test items of

    X1*  =  R*/2 + ( -3*Y(2) - Y(3) - 3*Y(4) - Y(5) + 2*Y(6) ) / 8
    X2*  =  R*/2 + ( -Y(2) - 3*Y(3) - Y(4) - 3*Y(5) - 2*Y(6) ) / 8
Factors for computing contributions of repeatability and level-2 standard deviations to uncertainty

FACTORS FOR REPEATABILITY STANDARD DEVIATIONS
WT   FACTOR
     K1        1     1     1     1
1    0.3536    +
1    0.3536          +
1    0.6124                +
1    0.6124                      +
0    0.7071    +     -

FACTORS FOR LEVEL-2 STANDARD DEVIATIONS
WT   FACTOR
     K2        1     1     1     1
1    0.7071    +
1    0.7071          +
1    1.2247                +
1    1.2247                      +
0    1.4141    +     -

The first table shows factors for computing the contribution of the repeatability standard deviation to the total uncertainty. The second table shows factors for computing the contribution of the between-day standard deviation to the uncertainty. Notice that the check standard is the last entry in each table.
Unifying equation
The unifying equation is:

    s(item)^2 = K1^2 * s1^2 + K2^2 * s_days^2
Standard deviations are computed using the factors from the tables with the unifying equation
The steps in computing the standard deviation for a test item are:

Compute the repeatability standard deviation from historical data.
Compute the standard deviation of the check standard from historical data.
Locate the factors, K1 and K2, for the check standard.
Compute the between-day variance (using the unifying equation for the check standard). For this example,

    s_days^2 = ( s2^2 - (0.7071)^2 * s1^2 ) / (1.4141)^2 .

If this variance estimate is negative, set s_days^2 = 0. (This is possible and indicates that there is no contribution to uncertainty from day-to-day effects.)
Locate the factors, K1 and K2, for the test items, and compute the standard deviations using the unifying equation. For this example,

    s(X1) = sqrt( (0.6124)^2 * s1^2 + (1.2247)^2 * s_days^2 )

and

    s(X2) = sqrt( (0.6124)^2 * s1^2 + (1.2247)^2 * s_days^2 ) .
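A short R sketch of these steps for the 1,1,1,1 design, with illustrative values for s1 and s2 and the factors listed above:

    s1 <- 0.0020                                          # repeatability standard deviation (historical)
    s2 <- 0.0030                                          # check standard (level-2) standard deviation
    sdays  <- sqrt(max((s2^2 - 0.7071^2 * s1^2) / 1.4141^2, 0))   # between-day standard deviation
    s_test <- sqrt(0.6124^2 * s1^2 + 1.2247^2 * sdays^2)          # same for both test items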
2.3.3.3.5. Type B uncertainty
Type B uncertainty associated with the restraint
The reference standard is assumed to have known value, R*, for the purpose of solving the calibration design. For the purpose of computing a standard uncertainty, it has a type B uncertainty that contributes to the uncertainty of the test item.

The value of R* comes from a higher-level calibration laboratory or process, and its value is usually reported along with its uncertainty, U. If the laboratory also reports the k factor for computing U, then the standard deviation of the restraint is

    s(R*) = U / k .

If k is not reported, then a conservative way of proceeding is to assume k = 2.
Situation where the test is different in size from the reference
Usually, a reference standard and test item are of the same
nominal size and the calibration relies on measuring the small
difference between the two; for example, the intercomparison
of a reference kilogram compared with a test kilogram. The
calibration may also consist of an intercomparison of the
reference with a summation of artifacts where the summation
is of the same nominal size as the reference; for example, a
reference kilogram compared with 500 g + 300 g + 200 g test
weights.
Type B uncertainty for the test artifact
The type B uncertainty that accrues to the test artifact from the uncertainty of the reference standard is proportional to their nominal sizes; i.e.,

    s(test, type B) = ( nominal test / nominal restraint ) * U / k .
2.3.3.3.6. Expanded uncertainties
Standard uncertainty
The standard uncertainty for the test item is

    u = sqrt( s(test)^2 + ( (nominal test)/(nominal restraint) )^2 * s(R*)^2 )

Expanded uncertainty
The expanded uncertainty is computed as

    U = k * u

where k is either the critical value from the t table for degrees of freedom v or k is set equal to 2.
Problem of
the degrees of
freedom
The calculation of degrees of freedom, v, can be a problem. Sometimes it
can be computed using the Welch-Satterthwaite approximation and the
structure of the uncertainty of the test item. Degrees of freedom for the
standard deviation of the restraint is assumed to be infinite. The coefficients
in the Welch-Satterthwaite formula must all be positive for the
approximation to be reliable.
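As a hedged sketch of the Welch-Satterthwaite computation described here (component uncertainties and degrees of freedom are illustrative; the restraint component is given infinite degrees of freedom):

    u_i  <- c(0.0021, 0.0008)        # uncertainty components (test-item part, restraint part)
    nu_i <- c(3, Inf)                # degrees of freedom for each component
    u    <- sqrt(sum(u_i^2))         # combined standard uncertainty
    nu_eff <- u^4 / sum(u_i^4 / nu_i)
    k    <- qt(0.975, df = nu_eff)   # coverage factor from the t table
    U    <- k * u                    # expanded uncertainty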
Standard deviation for test item from the 1,1,1,1 design
For the 1,1,1,1 design, the standard deviation of the test items can be
rewritten by substituting in the equation
so that the degrees of freedom depends only on the degrees of freedom in
the standard deviation of the check standard. This device may not work
satisfactorily for all designs.
Standard uncertainty from the 1,1,1,1 design
To complete the calculation shown in the equation at the top of the page,
the nominal value of the test item (which is equal to 1) is divided by the
nominal value of the restraint (which is also equal to 1), and the result is
squared. Thus, the standard uncertainty is
Degrees of freedom using the Welch-Satterthwaite approximation
Therefore, the degrees of freedom is approximated as
where n - 1 is the degrees of freedom associated with the check standard
uncertainty. Notice that the standard deviation of the restraint drops out of
the calculation because of an infinite degrees of freedom.
2.3.4. Catalog of calibration designs
Important
concept -
Restraint
The designs are constructed for measuring differences
among reference standards and test items, singly or in
combinations. Values for individual standards and test
items can be computed from the design only if the value
(called the restraint = R*) of one or more reference
standards is known. The methodology for constructing and
solving calibration designs is described briefly in matrix
solutions and in more detail in a NIST publication.
(Cameron et al.).
Designs
listed in this
catalog
Designs are listed by traditional subject area although many
of the designs are appropriate generally for
intercomparisons of artifact standards.
Designs for mass weights
Drift-eliminating designs for gage blocks
Left-right balanced designs for electrical standards
Designs for roundness standards
Designs for angle blocks
Drift-eliminating design for thermometers in a bath
Drift-eliminating designs for humidity cylinders
Properties of
designs in
this catalog
Basic requirements are:
1. The differences must be nominally zero.
2. The design must be solvable for individual items
given the restraint.
Other desirable properties are:
1. The number of measurements should be small.
2. The degrees of freedom should be greater than zero.
3. The standard deviations of the estimates for the test
items should be small enough for their intended
purpose.
Information: Design / Solution / Factors for computing standard deviations
Given
n = number of difference measurements
m = number of artifacts (reference standards + test items) to be calibrated
the following information is shown for each design:
Design matrix -- (n x m)
Vector that identifies standards in the restraint -- (1 x
m)
Degrees of freedom = (n - m + 1)
Solution matrix for given restraint -- (n x m)
Table of factors for computing standard deviations
Convention for showing the measurement sequence
Nominal sizes of standards and test items are shown at the
top of the design. Pluses (+) indicate items that are
measured together; and minuses (-) indicate items are not
measured together. The difference measurements are
constructed from the design of pluses and minuses. For
example, a 1,1,1 design for one reference standard and two
test items of the same nominal size with three
measurements is shown below:
                    1      1      1
Y(1)       =        +      -
Y(2)       =        +             -
Y(3)       =               +      -
Solution matrix: Example and interpretation
The cross-product of the column of difference
measurements and R* with a column from the solution
matrix, divided by the named divisor, gives the value for an
individual item. For example,
Solution matrix
Divisor = 3
1 1 1
Y(1) 0 -2 -1
Y(2) 0 -1 -2
Y(3) 0 +1 -1
R* +3 +3 +3
implies that estimates for the restraint and the two test items
are:
Interpretation of table of factors
The factors in this table provide information on precision. The repeatability standard deviation, s1, is multiplied by the appropriate factor to obtain the standard deviation for an individual item or combination of items. For example,
Sum   Factor     1     1     1
1     0.0000     +
1     0.8166           +
1     0.8166                 +
2     1.4142           +     +

implies that the standard deviations for the estimates are:
2.3.4.1. Mass weights
Tie to kilogram reference standards
Near-accurate mass measurements require a sequence of
designs that relate the masses of individual weights to a
reference kilogram(s) standard ( Jaeger & Davis). Weights
generally come in sets, and an entire set may require several
series to calibrate all the weights in the set.
Example
of weight
set
A 5,3,2,1 weight set would have the following weights:
1000 g
500g, 300g, 200g, 100g
50g, 30g, 20g, 10g
5g, 3g, 2g, 1g
0.5g, 0.3g, 0.2g, 0.1g
Depiction of a design with three series for calibrating a 5,3,2,1 weight set with weights between 1 kg and 10 g

[Graphic: three calibration series for the 5,3,2,1 weight set]
First series using 1,1,1,1 design
The calibrations start with a comparison of the one kilogram test weight with the reference kilograms (see the graphic above). The 1,1,1,1 design requires two kilogram reference
standards with known values, R1* and R2*. The fourth
kilogram in this design is actually a summation of the 500,
300, 200 g weights which becomes the restraint in the next
series.
The restraint for the first series is the known average mass of
the reference kilograms,
The design assigns values to all weights including the
individual reference standards. For this design, the check
standard is not an artifact standard but is defined as the
difference between the values assigned to the reference
kilograms by the design; namely,
2nd series
using
5,3,2,1,1,1
design
The second series is a 5,3,2,1,1,1 design where the restraint
over the 500g, 300g and 200g weights comes from the value
assigned to the summation in the first series; i.e.,
The weights assigned values by this series are:
500g, 300g, 200 g and 100g test weights
100 g check standard (2nd 100g weight in the design)
Summation of the 50g, 30g, 20g weights.
Other
starting
points
The calibration sequence can also start with a 1,1,1 design.
This design has the disadvantage that it does not have
provision for a check standard.
Better
choice of
design
A better choice is a 1,1,1,1,1 design which allows for two
reference kilograms and a kilogram check standard which
occupies the 4th position among the weights. This is preferable
to the 1,1,1,1 design but has the disadvantage of requiring the
laboratory to maintain three kilogram standards.
Important
detail
The solutions are only applicable for the restraints as shown.
Designs
for
decreasing
weight sets
1. 1,1,1 design
2. 1,1,1,1 design
3. 1,1,1,1,1 design
4. 1,1,1,1,1,1 design
5. 2,1,1,1 design
6. 2,2,1,1,1 design
7. 2,2,2,1,1 design
8. 5,2,2,1,1,1 design
9. 5,2,2,1,1,1,1 design
10. 5,3,2,1,1,1 design
11. 5,3,2,1,1,1,1 design
12. 5,3,2,2,1,1,1 design
13. 5,4,4,3,2,2,1,1 design
14. 5,5,2,2,1,1,1,1 design
15. 5,5,3,2,1,1,1 design
16. 1,1,1,1,1,1,1,1 design
17. 3,2,1,1,1 design
Design for
pound
weights
1. 1,2,2,1,1 design
Designs
for
increasing
weight sets
1. 1,1,1 design
2. 1,1,1,1 design
3. 5,3,2,1,1 design
4. 5,3,2,1,1,1 design
5. 5,2,2,1,1,1 design
6. 3,2,1,1,1 design
2.3.4.1.1. Design for 1,1,1
Design 1,1,1
OBSERVATIONS        1      1      1

Y(1)                +      -
Y(2)                +             -
Y(3)                       +      -

RESTRAINT           +
CHECK STANDARD             +
DEGREES OF FREEDOM = 1
SOLUTION MATRIX
DIVISOR = 3
OBSERVATIONS 1 1 1
Y(1) 0 -2 -1
Y(2) 0 -1 -2
Y(3) 0 1 -1
R* 3 3 3
R* = value of reference weight
FACTORS FOR REPEATABILITY STANDARD DEVIATIONS
WT FACTOR
1 1 1
1 0.0000 +
1 0.8165 +
1 0.8165 +
2 1.4142 + +
1 0.8165 +
FACTORS FOR BETWEEN-DAY STANDARD DEVIATIONS
WT FACTOR
1 1 1
1 0.0000 +
1 1.4142 +
1 1.4142 +
2 2.4495 + +
1 1.4142 +
Explanation of notation and interpretation of tables
2.3.4.1.2. Design for 1,1,1,1
Design 1,1,1,1
OBSERVATIONS 1 1 1 1
Y(1) + -
Y(2) + -
Y(3) + -
Y(4) + -
Y(5) + -
Y(6) + -
RESTRAINT + +
CHECK STANDARD + -
DEGREES OF FREEDOM = 3
SOLUTION MATRIX
DIVISOR = 8
OBSERVATIONS 1 1 1 1
Y(1) 2 -2 0 0
Y(2) 1 -1 -3 -1
Y(3) 1 -1 -1 -3
Y(4) -1 1 -3 -1
Y(5) -1 1 -1 -3
Y(6) 0 0 2 -2
R* 4 4 4 4
R* = sum of two reference standards
FACTORS FOR REPEATABILITY STANDARD DEVIATIONS
WT FACTOR
K1 1 1 1 1
1 0.3536 +
1 0.3536 +
1 0.6124 +
1 0.6124 +
0 0.7071 + -
FACTORS FOR BETWEEN-DAY STANDARD DEVIATIONS
WT FACTOR
K2 1 1 1 1
1 0.7071 +
1 0.7071 +
1 1.2247 +
1 1.2247 +
0 1.4141 + -
Explanation of notation and interpretation of tables
2.3.4.1.3. Design for 1,1,1,1,1
CASE 1: CHECK STANDARD = DIFFERENCE BETWEEN FIRST TWO WEIGHTS
OBSERVATIONS 1 1 1 1 1
Y(1) + -
Y(2) + -
Y(3) + -
Y(4) + -
Y(5) + -
Y(6) + -
Y(7) + -
Y(8) + -
Y(9) + -
Y(10) + -
RESTRAINT + +
CHECK STANDARD + -
DEGREES OF FREEDOM = 6
SOLUTION MATRIX
DIVISOR = 10
OBSERVATIONS 1 1 1 1 1
Y(1) 2 -2 0 0 0
Y(2) 1 -1 -3 -1 -1
Y(3) 1 -1 -1 -3 -1
Y(4) 1 -1 -1 -1 -3
Y(5) -1 1 -3 -1 -1
Y(6) -1 1 -1 -3 -1
Y(7) -1 1 -1 -1 -3
Y(8) 0 0 2 -2 0
Y(9) 0 0 2 0 -2
Y(10) 0 0 0 2 -2
R* 5 5 5 5 5
R* = sum of two reference standards
FACTORS FOR REPEATABILITY STANDARD DEVIATIONS
WT FACTOR
K1 1 1 1 1 1
1 0.3162 +
1 0.3162 +
1 0.5477 +
1 0.5477 +
1 0.5477 +
2 0.8944 + +
3 1.2247 + + +
0 0.6325 + -
FACTORS FOR BETWEEN-DAY STANDARD DEVIATIONS
WT FACTOR
K2 1 1 1 1 1
1 0.7071 +
1 0.7071 +
1 1.2247 +
1 1.2247 +
1 1.2247 +
2 2.0000 + +
3 2.7386 + + +
0 1.4142 + -
CASE 2: CHECK STANDARD = FOURTH WEIGHT
OBSERVATIONS 1 1 1 1 1
Y(1) + -
Y(2) + -
Y(3) + -
Y(4) + -
Y(5) + -
Y(6) + -
Y(7) + -
Y(8) + -
Y(9) + -
Y(10) + -
RESTRAINT + +
CHECK STANDARD +
DEGREES OF FREEDOM = 6
SOLUTION MATRIX
DIVISOR = 10
OBSERVATIONS 1 1 1 1 1
Y(1) 2 -2 0 0 0
Y(2) 1 -1 -3 -1 -1
Y(3) 1 -1 -1 -3 -1
Y(4) 1 -1 -1 -1 -3
Y(5) -1 1 -3 -1 -1
Y(6) -1 1 -1 -3 -1
Y(7) -1 1 -1 -1 -3
Y(8) 0 0 2 -2 0
Y(9) 0 0 2 0 -2
Y(10) 0 0 0 2 -2
R* 5 5 5 5 5
R* = sum of two reference standards
FACTORS FOR REPEATABILITY STANDARD DEVIATIONS
WT FACTOR
K1 1 1 1 1 1
1 0.3162 +
1 0.3162 +
1 0.5477 +
1 0.5477 +
1 0.5477 +
2 0.8944 + +
3 1.2247 + + +
1 0.5477 +
FACTORS FOR BETWEEN-DAY STANDARD DEVIATIONS
WT FACTOR
K2 1 1 1 1 1
1 0.7071 +
1 0.7071 +
1 1.2247 +
1 1.2247 +
1 1.2247 +
2 2.0000 + +
3 2.7386 + + +
1 1.2247 +
Explanation of notation and interpretation of tables
2.3.4.1.4. Design for 1,1,1,1,1,1
Design 1,1,1,1,1,1
OBSERVATIONS 1 1 1 1 1 1
X(1) + -
X(2) + -
X(3) + -
X(4) + -
X(5) + -
X(6) + -
X(7) + -
X(8) + -
X(9) + -
X(10) + -
X(11) + -
X(12) + -
X(13) + -
X(14) + -
X(15) + -
RESTRAINT + +
CHECK STANDARD +
DEGREES OF FREEDOM = 10
SOLUTION MATRIX
DIVISOR = 8
OBSERVATIONS 1 1 1 1 1 1
Y(1) 1 -1 0 0 0 0
Y(2) 1 0 -1 0 0 0
Y(3) 1 0 0 -1 0 0
Y(4) 1 0 0 0 -1 0
Y(5) 2 1 1 1 1 0
Y(6) 0 1 -1 0 0 0
Y(7) 0 1 0 -1 0 0
Y(8) 0 1 0 0 -1 0
Y(9) 1 2 1 1 1 0
Y(10) 0 0 1 -1 0 0
Y(11) 0 0 1 0 -1 0
Y(12) 1 1 2 1 1 0
Y(13) 0 0 0 1 -1 0
Y(14) 1 1 1 2 1 0
Y(15) 1 1 1 1 2 0
R* 6 6 6 6 6 6
R* = sum of two reference standards
FACTORS FOR COMPUTING REPEATABILITY STANDARD DEVIATIONS
WT FACTOR
1 1 1 1 1 1
1 0.2887 +
1 0.2887 +
1 0.5000 +
1 0.5000 +
1 0.5000 +
1 0.5000 +
2 0.8165 + +
3 1.1180 + + +
4 1.4142 + + + +
1 0.5000 +
FACTORS FOR COMPUTING BETWEEN-DAY STANDARD DEVIATIONS
WT FACTOR
1 1 1 1 1 1
1 0.7071 +
1 0.7071 +
1 1.2247 +
1 1.2247 +
1 1.2247 +
1 1.2247 +
2 2.0000 + +
3 2.7386 + + +
4 3.4641 + + + +
1 1.2247 +
Explanation of notation and interpretation of tables
2.3.4.1.5. Design for 2,1,1,1
Design 2,1,1,1
OBSERVATIONS 2 1 1 1
Y(1) + - -
Y(2) + - -
Y(3) + - -
Y(4) + -
Y(5) + -
Y(6) + -
RESTRAINT +
CHECK STANDARD +
DEGREES OF FREEDOM = 3
SOLUTION MATRIX
DIVISOR = 4
OBSERVATIONS 2 1 1 1
Y(1) 0 -1 0 -1
Y(2) 0 0 -1 -1
Y(3) 0 -1 -1 0
Y(4) 0 1 0 -1
Y(5) 0 1 -1 0
Y(6) 0 0 1 -1
R* 4 2 2 2
R* = value of the reference standard
FACTORS FOR REPEATABILITY STANDARD DEVIATIONS
WT FACTOR
2 1 1 1
2 0.0000 +
1 0.5000 +
1 0.5000 +
1 0.5000 +
2 0.7071 + +
3 0.8660 + + +
1 0.5000 +
FACTORS FOR BETWEEN-DAY STANDARD DEVIATIONS
WT FACTOR
2 1 1 1
2 0.0000 +
1 1.1180 +
1 1.1180 +
1 1.1180 +
2 1.7321 + +
3 2.2913 + + +
1 1.1180 +
Explanation of notation and interpretation of tables
2.3.4.1.6. Design for 2,2,1,1,1
Design 2,2,1,1,1
OBSERVATIONS 2 2 1 1 1
Y(1) + - - +
Y(2) + - - +
Y(3) + - + -
Y(4) + -
Y(5) + - -
Y(6) + - -
Y(7) + - -
Y(8) + - -
Y(9) + - -
Y(10) + - -
RESTRAINT + + +
CHECK STANDARD +
DEGREES OF FREEDOM = 6
SOLUTION MATRIX
DIVISOR = 275
OBSERVATIONS 2 2 1 1 1
Y(1) 47 -3 -44 66 11
Y(2) 25 -25 0 -55 55
Y(3) 3 -47 44 -11 -66
Y(4) 25 -25 0 0 0
Y(5) 29 4 -33 -33 22
Y(6) 29 4 -33 22 -33
Y(7) 7 -18 11 -44 -44
Y(8) 4 29 -33 -33 22
Y(9) 4 29 -33 22 -33
Y(10) -18 7 11 -44 -44
R* 110 110 55 55 55
R* = sum of three reference standards
FACTORS FOR REPEATABILITY STANDARD DEVIATIONS
WT FACTOR
2 2 1 1 1
2 0.2710 +
2 0.2710 +
1 0.3347 +
1 0.4382 +
1 0.4382 +
2 0.6066 + +
3 0.5367 + + +
1 0.4382 +
FACTORS FOR BETWEEN-DAY STANDARD DEVIATIONS
WT FACTOR
2 2 1 1 1
2 0.8246 +
2 0.8246 +
1 0.8485 +
1 1.0583 +
1 1.0583 +
2 1.5748 + +
3 1.6971 + + +
1 1.0583 +
Explanation of notation and interpretation of tables
2.3.4.1.7. Design for 2,2,2,1,1
Design 2,2,2,1,1
OBSERVATIONS 2 2 2 1 1
Y(1) + -
Y(2) + -
Y(3) + -
Y(4) + - -
Y(5) + - -
Y(6) + - -
Y(7) + -
RESTRAINT + +
CHECK STANDARD +
DEGREES OF FREEDOM = 3
SOLUTION MATRIX
DIVISOR = 16
OBSERVATIONS 2 2 2 1 1
Y(1) 4 -4 0 0 0
Y(2) 2 -2 -6 -1 -1
Y(3) -2 2 -6 -1 -1
Y(4) 2 -2 -2 -3 -3
Y(5) -2 2 -2 -3 -3
Y(6) 0 0 4 -2 -2
Y(7) 0 0 0 8 -8
R* 8 8 8 4 4
R* = sum of the two reference standards
FACTORS FOR REPEATABILITY STANDARD DEVIATIONS
WT FACTOR
2 2 2 1 1
2 0.3536 +
2 0.3536 +
2 0.6124 +
1 0.5863 +
1 0.5863 +
2 0.6124 + +
4 1.0000 + + +
1 0.5863 +
FACTORS FOR BETWEEN-DAY STANDARD DEVIATIONS
WT FACTOR
2 2 2 1 1
2 0.7071 +
2 0.7071 +
2 1.2247 +
1 1.0607 +
1 1.0607 +
2 1.5811 + +
4 2.2361 + + +
1 1.0607 +
Explanation of notation and interpretation of tables
2.3.4.1.8. Design for 5,2,2,1,1,1
Design 5,2,2,1,1,1
OBSERVATIONS 5 2 2 1 1 1
Y(1) + - - - - +
Y(2) + - - - + -
Y(3) + - - + - -
Y(4) + - - - -
Y(5) + - - - -
Y(6) + - + -
Y(7) + - - +
Y(8) + - + -
RESTRAINT + + + +
CHECK STANDARD +
DEGREES OF FREEDOM = 3
SOLUTION MATRIX
DIVISOR = 70
OBSERVATIONS 5 2 2 1 1 1
Y(1) 15 -8 -8 1 1 21
Y(2) 15 -8 -8 1 21 1
Y(3) 5 -12 -12 19 -1 -1
Y(4) 0 2 12 -14 -14 -14
Y(5) 0 12 2 -14 -14 -14
Y(6) -5 8 -12 9 -11 -1
Y(7) 5 12 -8 -9 1 11
Y(8) 0 10 -10 0 10 -10
R* 35 14 14 7 7 7
R* = sum of the four reference standards
FACTORS FOR REPEATABILITY STANDARD DEVIATIONS
WT FACTOR
5 2 2 1 1 1
5 0.3273 +
2 0.3854 +
2 0.3854 +
1 0.4326 +
1 0.4645 +
1 0.4645 +
1 0.4645 +
FACTORS FOR BETWEEN-DAY STANDARD DEVIATIONS
WT FACTOR
5 2 2 1 1 1
5 1.0000 +
2 0.8718 +
2 0.8718 +
1 0.9165 +
1 1.0198 +
1 1.0198 +
1 1.0198 +
Explanation of notation and interpretation of tables
2.3.4.1.9. Design for 5,2,2,1,1,1,1
Design 5,2,2,1,1,1,1
OBSERVATIONS 5 2 2 1 1 1 1
Y(1) + - - -
Y(2) + - - -
Y(3) + - - -
Y(4) + - - -
Y(5) + + - - -
Y(6) + + - - -
Y(7) + + - - - -
Y(8) + -
Y(9) + -
Y(10) + -
RESTRAINT + + + +
CHECK STANDARD +
DEGREES OF FREEDOM = 4
SOLUTION MATRIX
DIVISOR = 60
OBSERVATIONS 5 2 2 1 1 1 1
Y(1) 12 0 0 -12 0 0 0
Y(2) 6 -4 -4 2 -12 3 3
Y(3) 6 -4 -4 2 3 -12 3
Y(4) 6 -4 -4 2 3 3 -12
Y(5) -6 28 -32 10 -6 -6 -6
Y(6) -6 -32 28 10 -6 -6 -6
Y(7) 6 8 8 -22 -6 -6 -6
Y(8) 0 0 0 0 15 -15 0
Y(9) 0 0 0 0 15 0 -15
Y(10) 0 0 0 0 0 15 -15
R* 30 12 12 6 6 6 6
R* = sum of the four reference standards
FACTORS FOR REPEATABILITY STANDARD DEVIATIONS
WT FACTOR
5 2 2 1 1 1 1
5 0.3162 +
2 0.7303 +
2 0.7303 +
1 0.4830 +
1 0.4472 +
1 0.4472 +
1 0.4472 +
2 0.5477 + +
3 0.5477 + + +
1 0.4472 +
FACTORS FOR BETWEEN-DAY STANDARD DEVIATIONS
WT FACTOR
5 2 2 1 1 1 1
5 1.0000 +
2 0.8718 +
2 0.8718 +
1 0.9165 +
1 1.0198 +
1 1.0198 +
1 1.0198 +
2 1.4697 + +
3 1.8330 + + +
1 1.0198 +
Explanation of notation and interpretation of tables
2.3.4.1.10. Design for 5,3,2,1,1,1
OBSERVATIONS 5 3 2 1 1 1
Y(1) + - - + -
Y(2) + - - + -
Y(3) + - - - +
Y(4) + - -
Y(5) + - - - -
Y(6) + - + - -
Y(7) + - - + -
Y(8) + - - - +
Y(9) + - -
Y(10) + - -
Y(11) + - -
RESTRAINT + + +
CHECK STANDARD +
DEGREES OF FREEDOM = 6
SOLUTION MATRIX
DIVISOR = 920
OBSERVATIONS 5 3 2 1 1 1
Y(1) 100 -68 -32 119 -111 4
Y(2) 100 -68 -32 4 119 -111
Y(3) 100 -68 -32 -111 4 119
Y(4) 100 -68 -32 4 4 4
Y(5) 60 -4 -56 -108 -108 -108
Y(6) -20 124 -104 128 -102 -102
Y(7) -20 124 -104 -102 128 -102
Y(8) -20 124 -104 -102 -102 128
Y(9) -20 -60 80 -125 -125 -10
Y(10) -20 -60 80 -125 -10 -125
Y(11) -20 -60 80 -10 -125 -125
R* 460 276 184 92 92 92
R* = sum of the three reference standards
FACTORS FOR REPEATABILITY STANDARD DEVIATIONS
WT FACTOR
5 3 2 1 1 1
5 0.2331 +
3 0.2985 +
2 0.2638 +
1 0.3551 +
1 0.3551 +
1 0.3551 +
2 0.5043 + +
3 0.6203 + + +
1 0.3551 +
FACTORS FOR BETWEEN-DAY STANDARD DEVIATIONS
WT FACTOR
5 3 2 1 1 1
5 0.8660 +
3 0.8185 +
2 0.8485 +
1 1.0149 +
1 1.0149 +
1 1.0149 +
2 1.4560 + +
3 1.8083 + + +
1 1.0149 +
Explanation of notation and interpretation of tables
2.3.4.1.11. Design for 5,3,2,1,1,1,1
Design 5,3,2,1,1,1,1
OBSERVATIONS 5 3 2 1 1 1 1
Y(1) + - -
Y(2) + - - -
Y(3) + - - -
Y(4) + - - - -
Y(5) + - - - -
Y(6) + - - - -
Y(7) + - - - -
Y(8) + - -
Y(9) + - -
Y(10) + - -
Y(11) + - -
RESTRAINT + + +
CHECK STANDARD +
DEGREES OF FREEDOM = 5
SOLUTION MATRIX
DIVISOR = 40
OBSERVATIONS 5 3 2 1 1 1 1
Y(1) 20 -4 -16 12 12 12 12
Y(2) 0 -4 4 -8 -8 2 2
Y(3) 0 -4 4 2 2 -8 -8
Y(4) 0 0 0 -5 -5 -10 10
Y(5) 0 0 0 -5 -5 10 -10
Y(6) 0 0 0 -10 10 -5 -5
Y(7) 0 0 0 10 -10 -5 -5
Y(8) 0 4 -4 -12 8 3 3
Y(9) 0 4 -4 8 -12 3 3
Y(10) 0 4 -4 3 3 -12 8
Y(11) 0 4 -4 3 3 8 -12
R* 20 12 8 4 4 4 4
R* = sum of the three reference standards
FACTORS FOR REPEATABILITY STANDARD DEVIATIONS
WT FACTOR
5 3 2 1 1 1 1
5 0.5000 +
3 0.2646 +
2 0.4690 +
1 0.6557 +
1 0.6557 +
1 0.6557 +
1 0.6557 +
2 0.8485 + +
3 1.1705 + + +
4 1.3711 + + + +
1 0.6557 +
FACTORS FOR LEVEL-2 STANDARD DEVIATIONS
WT FACTOR
5 3 2 1 1 1 1
5 0.8660 +
3 0.8185 +
2 0.8485 +
1 1.0149 +
1 1.0149 +
1 1.0149 +
1 1.0149 +
2 1.4560 + +
3 1.8083 + + +
4 2.1166 + + + +
1 1.0149 +
Explanation of notation and interpretation of tables
2.3.4.1.12. Design for 5,3,2,2,1,1,1
OBSERVATIONS 5 3 2 2 1 1 1
Y(1) + - -
Y(2) + - -
Y(3) + - - -
Y(4) + - - -
Y(5) + - - -
Y(6) + - -
Y(7) + - -
Y(8) + - -
Y(9) + - - -
Y(10) + -
Y(11) + -
Y(12) - +
RESTRAINT + + +
CHECK STANDARDS +
DEGREES OF FREEDOM = 6
SOLUTION MATRIX
DIVISOR = 10
OBSERVATIONS 5 3 2 2 1 1 1
Y(1) 2 0 -2 2 0 0 0
Y(2) 0 -6 6 -4 -2 -2 -2
Y(3) 1 1 -2 0 -1 1 1
Y(4) 1 1 -2 0 1 -1 1
Y(5) 1 1 -2 0 1 1 -1
Y(6) -1 1 0 -2 -1 1 1
Y(7) -1 1 0 -2 1 -1 1
Y(8) -1 1 0 -2 1 1 -1
Y(9) 0 -2 2 2 -4 -4 -4
Y(10) 0 0 0 0 2 -2 0
Y(11) 0 0 0 0 0 2 -2
Y(12) 0 0 0 0 -2 0 2
R* 5 3 2 2 1 1 1
R* = sum of the three reference standards
FACTORS FOR REPEATABILITY STANDARD DEVIATIONS
WT FACTOR
5 3 2 2 1 1 1
5 0.3162 +
3 0.6782 +
2 0.7483 +
2 0.6000 +
1 0.5831 +
1 0.5831 +
1 0.5831 +
3 0.8124 + +
4 1.1136 + + +
1 0.5831 +
FACTORS FOR BETWEEN-DAY STANDARD DEVIATIONS
WT FACTOR
5 3 2 2 1 1 1
5 0.8660 +
3 0.8185 +
2 0.8485 +
2 1.0583 +
1 1.0149 +
1 1.0149 +
1 1.0149 +
3 1.5067 + +
4 1.8655 + + +
1 1.0149 +
Explanation of notation and interpretation of tables
2.3.4.1.13. Design for 5,4,4,3,2,2,1,1
OBSERVATIONS 5 4 4 3 2 2 1 1
Y(1) + + - - - - -
Y(2) + + - - - - -
Y(3) + - -
Y(4) + - -
Y(5) + - -
Y(6) + - -
Y(7) + - - -
Y(8) + - - -
Y(9) + - -
Y(10) + - -
Y(11) + - -
Y(12) + - -
RESTRAINT + +
CHECK STANDARD + -
DEGREES OF FREEDOM = 5
SOLUTION MATRIX
DIVISOR = 916
OBSERVATIONS 5 4 4 3 2 2 1 1
Y(1) 232 325 123 8 -37 135 -1 1
Y(2) 384 151 401 108 73 105 101 -101
Y(3) 432 84 308 236 168 204 -144 144
Y(4) 608 220 196 400 440 -120 408 -408
Y(5) 280 258 30 136 58 234 -246 246
Y(6) 24 -148 68 64 -296 164 -8 8
Y(7) -104 -122 -142 28 214 -558 -118 118
Y(8) -512 -354 -382 -144 -250 -598 18 -18
Y(9) 76 -87 139 -408 55 443 51 -51
Y(10) -128 26 -210 -36 -406 194 -110 110
Y(11) -76 87 -139 -508 -55 473 -51 51
Y(12) -300 -440 -392 116 36 -676 100 -100
R* 1224 696 720 516 476 120 508 408
R* = sum of the two reference standards (for going-up
calibrations)
FACTORS FOR REPEATABILITY STANDARD DEVIATIONS
WT FACTOR
5 4 4 3 2 2 1 1
5 1.2095 +
4 0.8610 +
4 0.9246 +
3 0.9204 +
2 0.8456 +
2 1.4444 +
1 0.5975 +
1 0.5975 +
4 1.5818 + +
7 1.7620 + + +
11 2.5981 + + + +
15 3.3153 + + + + +
20 4.4809 + + + + + +
0 1.1950 + -
FACTORS FOR BETWEEN-DAY STANDARD DEVIATIONS
WT FACTOR
5 4 4 3 2 2 1 1
5 2.1380 +
4 1.4679 +
4 1.4952 +
3 1.2785 +
2 1.2410 +
2 1.0170 +
1 0.7113 +
1 0.7113 +
4 1.6872 + +
7 2.4387 + + +
11 3.4641 + + + +
15 4.4981 + + + + +
20 6.2893 + + + + + +
0 1.4226 + -
Explanation of notation and interpretation of tables
2.3.4.1.14. Design for 5,5,2,2,1,1,1,1
Design 5,5,2,2,1,1,1,1
OBSERVATIONS 5 5 2 2 1 1 1 1
Y(1) + - - -
Y(2) + - - -
Y(3) + - - -
Y(4) + - - -
Y(5) + + - - - -
Y(6) + - -
Y(7) + - -
Y(8) + - -
Y(9) + - -
Y(10) + -
Y(11) + -
RESTRAINT + +
CHECK STANDARD +
DEGREES OF FREEDOM = 4
SOLUTION MATRIX
DIVISOR = 120
OBSERVATIONS 5 5 2 2 1 1 1 1
Y(1) 30 -30 -12 -12 -22 -10 10 -2
Y(2) -30 30 -12 -12 -10 -22 -2 10
Y(3) 30 -30 -12 -12 10 -2 -22 -10
Y(4) -30 30 -12 -12 -2 10 -10 -22
Y(5) 0 0 6 6 -12 -12 -12 -12
Y(6) -30 30 33 -27 -36 24 -36 24
Y(7) 30 -30 33 -27 24 -36 24 -36
Y(8) 0 0 -27 33 -18 6 6 -18
Y(9) 0 0 -27 33 6 -18 -18 6
Y(10) 0 0 0 0 32 8 -32 -8
Y(11) 0 0 0 0 8 32 -8 -32
R* 60 60 24 24 12 12 12 12
R* = sum of the two reference standards
FACTORS FOR COMPUTING REPEATABILITY STANDARD DEVIATIONS
WT FACTOR
5 5 2 2 1 1 1 1
5 0.6124 +
5 0.6124 +
2 0.5431 +
2 0.5431 +
1 0.5370 +
1 0.5370 +
1 0.5370 +
1 0.5370 +
2 0.6733 + +
4 0.8879 + + +
6 0.8446 + + + +
11 1.0432 + + + + +
16 0.8446 + + + + + +
1 0.5370 +
FACTORS FOR COMPUTING LEVEL-2 STANDARD DEVIATIONS
WT FACTOR
5 5 2 2 1 1 1 1
5 0.7071 +
5 0.7071 +
2 1.0392 +
2 1.0392 +
1 1.0100 +
1 1.0100 +
1 1.0100 +
1 1.0100 +
2 1.4422 + +
4 1.8221 + + +
6 2.1726 + + + +
11 2.2847 + + + + +
16 2.1726 + + + + + +
1 1.0100 +
Explanation of notation and interpretation of tables
2.3.4.1.15. Design for 5,5,3,2,1,1,1
OBSERVATIONS 5 5 3 2 1 1 1
Y(1) + - -
Y(2) + - -
Y(3) + - - - -
Y(4) + - - - -
Y(5) + - - -
Y(6) + - - -
Y(7) + - - -
Y(8) + - - -
Y(9) + - - -
Y(10) + - - -
RESTRAINT + +
CHECK STANDARD +
DEGREES OF FREEDOM = 4
SOLUTION MATRIX
DIVISOR = 10
OBSERVATIONS 5 5 3 2 1 1 1
Y(1) 1 -1 -2 -3 1 1 1
Y(2) -1 1 -2 -3 1 1 1
Y(3) 1 -1 2 -2 -1 -1 -1
Y(4) -1 1 2 -2 -1 -1 -1
Y(5) 1 -1 -1 1 -2 -2 3
Y(6) 1 -1 -1 1 -2 3 -2
Y(7) 1 -1 -1 1 3 -2 -2
Y(8) -1 1 -1 1 -2 -2 3
Y(9) -1 1 -1 1 -2 3 -2
Y(10) -1 1 -1 1 3 -2 -2
R* 5 5 3 2 1 1 1
R* = sum of the two reference standards
FACTORS FOR REPEATABILITY STANDARD DEVIATIONS
WT FACTOR
5 5 3 2 1 1 1
5 0.3162 +
5 0.3162 +
3 0.4690 +
2 0.5657 +
1 0.6164 +
1 0.6164 +
1 0.6164 +
3 0.7874 + +
6 0.8246 + + +
11 0.8832 + + + +
16 0.8246 + + + + +
1 0.6164 +
FACTORS FOR BETWEEN-DAY STANDARD DEVIATIONS
WT FACTOR
5 5 3 2 1 1 1
5 0.7071 +
5 0.7071 +
3 1.0863 +
2 1.0392 +
1 1.0100 +
1 1.0100 +
1 1.0100 +
3 1.4765 + +
6 1.9287 + + +
11 2.0543 + + + +
16 1.9287 + + + + +
1 1.0100 +
Explanation of notation and interpretation of tables
2.3.4.1.16. Design for 1,1,1,1,1,1,1,1 weights
OBSERVATIONS 1 1 1 1 1 1 1 1
Y(1) + -
Y(2) + -
Y(3) + -
Y(4) + -
Y(5) + -
Y(6) + -
Y(7) + -
Y(8) + -
Y(9) + -
Y(10) + -
Y(11) + -
Y(12) + -
RESTRAINT + +
CHECK STANDARD +
DEGREES OF FREEDOM = 5
SOLUTION MATRIX
DIVISOR = 12
OBSERVATIONS 1 1 1 1 1 1 1 1
Y(1) 1 -1 -6 0 0 0 0 0
Y(2) 1 -1 0 -6 0 0 0 0
Y(3) 1 -1 0 0 -6 0 0 0
Y(4) 1 -1 0 0 0 -6 0 0
Y(5) 1 -1 0 0 0 0 -6 0
Y(6) 1 -1 0 0 0 0 0 -6
Y(7) -1 1 -6 0 0 0 0 0
Y(8) -1 1 0 -6 0 0 0 0
Y(9) -1 1 0 0 -6 0 0 0
Y(10) -1 1 0 0 0 -6 0 0
Y(11) -1 1 0 0 0 0 -6 0
Y(12) -1 1 0 0 0 0 0 -6
R* 6 6 6 6 6 6 6 6
R* = sum of the two reference standards
FACTORS FOR REPEATABILITY STANDARD DEVIATIONS
WT K1 1 1 1 1 1 1 1 1
1 0.2887 +
1 0.2887 +
1 0.7071 +
1 0.7071 +
1 0.7071 +
1 0.7071 +
1 0.7071 +
1 0.7071 +
2 1.0000 + +
3 1.2247 + + +
4 1.4142 + + + +
5 1.5811 + + + + +
6 1.7321 + + + + + +
1 0.7071 +
FACTORS FOR BETWEEN-DAY STANDARD DEVIATIONS
WT K2 1 1 1 1 1 1 1 1
1 0.7071 +
1 0.7071 +
1 1.2247 +
1 1.2247 +
1 1.2247 +
1 1.2247 +
1 1.2247 +
1 1.2247 +
2 2.0000 + +
3 2.7386 + + +
4 3.4641 + + + +
5 4.1833 + + + + +
6 4.8990 + + + + + +
1 1.2247 +
Explanation of notation and interpretation of tables
2.3.4.1.17. Design for 3,2,1,1,1 weights
OBSERVATIONS 3 2 1 1 1
Y(1) + - -
Y(2) + - -
Y(3) + - -
Y(4) + - - -
Y(5) + - -
Y(6) + - -
Y(7) + - -
Y(8) + -
Y(9) + -
Y(10) + -
RESTRAINT + +
CHECK STANDARD +
DEGREES OF FREEDOM = 6
SOLUTION MATRIX
DIVISOR = 25
OBSERVATIONS 3 2 1 1 1
Y(1) 3 -3 -4 1 1
Y(2) 3 -3 1 -4 1
Y(3) 3 -3 1 1 -4
Y(4) 1 -1 -3 -3 -3
Y(5) -2 2 -4 -4 1
Y(6) -2 2 -4 1 -4
Y(7) -2 2 1 -4 -4
Y(8) 0 0 5 -5 0
Y(9) 0 0 5 0 -5
Y(10) 0 0 0 5 -5
R* 15 10 5 5 5
R* = sum of the two reference standards
FACTORS FOR REPEATABILITY STANDARD DEVIATIONS
WT K1 3 2 1 1 1
3 0.2530 +
2 0.2530 +
1 0.4195 +
1 0.4195 +
1 0.4195 +
2 0.5514 + +
3 0.6197 + + +
1 0.4195 +
FACTORS FOR BETWEEN-DAY STANDARD DEVIATIONS
WT K2 3 2 1 1 1
3 0.7211 +
2 0.7211 +
1 1.0392 +
1 1.0392 +
1 1.0392 +
2 1.5232 + +
3 1.9287 + + +
1 1.0392 +
Explanation of notation and interpretation of tables
2.3.4.1.18. Design for 10- and 20-pound weights
OBSERVATIONS 1 2 2 1 1
Y(1) + -
Y(2) + -
Y(3) + - +
Y(4) + - +
Y(5) + - +
Y(6) + - +
Y(7) + -
RESTRAINT +
CHECK STANDARD +
DEGREES OF FREEDOM = 3
SOLUTION MATRIX
DIVISOR = 24
OBSERVATIONS 1 2 2 1 1
Y(1) 0 -12 -12 -16 -8
Y(2) 0 -12 -12 -8 -16
Y(3) 0 -9 -3 -4 4
Y(4) 0 -3 -9 4 -4
Y(5) 0 -9 -3 4 -4
Y(6) 0 -3 -9 -4 4
Y(7) 0 6 -6 0 0
R* 24 48 48 24 24
R* = Value of the reference standard
FACTORS FOR REPEATABILITY STANDARD DEVIATIONS
WT K1 1 2 2 1 1
2 0.9354 +
2 0.9354 +
1 0.8165 +
1 0.8165 +
4 1.7321 + +
5 2.3805 + + +
6 3.0000 + + + +
1 0.8165 +
FACTORS FOR BETWEEN-DAY STANDARD DEVIATIONS
WT K2 1 2 2 1 1
2 2.2361 +
2 2.2361 +
1 1.4142 +
1 1.4142 +
4 4.2426 + +
5 5.2915 + + +
6 6.3246 + + + +
1 1.4142 +
Explanation of notation and interpretation of tables
2.3.4.2. Drift-elimination designs for gauge
blocks
Tie to the
defined
unit of
length
The unit of length in many industries is maintained and
disseminated by gauge blocks. The highest accuracy
calibrations of gauge blocks are done by laser interferometry
which allows the transfer of the unit of length to a gauge
piece. Primary standards laboratories maintain master sets of
English gauge blocks and metric gauge blocks which are
calibrated in this manner. Gauge blocks ranging in size from
0.1 to 20 inches are required to support industrial processes in
the United States.
Mechanical
comparison
of gauge
blocks
However, the majority of gauge blocks are calibrated by
comparison with master gauges using a mechanical
comparator specifically designed for measuring the small
difference between two blocks of the same nominal length.
The measurements are temperature corrected from readings
taken directly on the surfaces of the blocks. Measurements on
2 to 20 inch blocks require special handling techniques to
minimize thermal effects. A typical calibration involves a set
of 81 gauge blocks which are compared one-by-one with
master gauges of the same nominal size.
Calibration
designs for
gauge
blocks
Calibration designs allow comparison of several gauge blocks
of the same nominal size to one master gauge in a manner
that promotes economy of operation and minimizes wear on
the master gauge. The calibration design is repeated for each
size until measurements on all the blocks in the test sets are
completed.
Problem of
thermal
drift
Measurements on gauge blocks are subject to drift from heat
build-up in the comparator. This drift must be accounted for
in the calibration experiment or the lengths assigned to the
blocks will be contaminated by the drift term.
Elimination
of linear
drift
The designs in this catalog are constructed so that the
solutions are immune to linear drift if the measurements are
equally spaced over time. The size of the drift is the average
of the n difference measurements. Keeping track of drift from
design to design is useful because a marked change from its
usual range of values may indicate a problem with the
measurement system.
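A minimal sketch of this drift estimate (the difference
measurements below are made-up numbers):

    import numpy as np

    # Hypothetical difference measurements (nm) from one
    # drift-eliminating design, assumed equally spaced in time.
    y = np.array([-3.2, -1.5, 0.4, 1.1, 2.6, 3.9])

    # The drift is estimated by the average of the n difference
    # measurements; a marked change from its usual range of values
    # may indicate a problem with the measurement system.
    drift = y.mean()
    print(drift)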
Assumption
for Doiron
designs
Mechanical measurements on gauge blocks take place
successively with one block being inserted into the
comparator followed by a second block and so on. This
scenario leads to the assumption that the individual
measurements are subject to drift (Doiron). Doiron lists
designs meeting this criterion which also allow for:
two master blocks, R1 and R2
one check standard = difference between R1 and R2
one to nine test blocks
Properties
of drift-
elimination
designs
that use 1
master
block
The designs are constructed to:
Be immune to linear drift
Minimize the standard deviations for test blocks (as
much as possible)
Spread the measurements on each block throughout the
design
Be completed in 5-10 minutes to keep the drift at the 5
nm level
Caution Because of the large number of gauge blocks that are being
intercompared and the need to eliminate drift, the Doiron
designs are not completely balanced with respect to the test
blocks. Therefore, the standard deviations are not equal for all
blocks. If all the blocks are being calibrated for use in one
facility, it is easiest to quote the largest of the standard
deviations for all blocks rather than try to maintain a separate
record on each block.
Definition
of master
block and
check
standard
At the National Institute of Standards and Technology
(NIST), the first two blocks in the design are NIST masters
which are designated R1 and R2, respectively. The R1 block
is a steel block, and the R2 block is a chrome-carbide block.
If the test blocks are steel, the reference is R1; if the test
blocks are chrome-carbide, the reference is R2. The check
standard is always the difference between R1 and R2 as
estimated from the design and is independent of R1 and R2.
The designs are listed in this section of the catalog as:
1. Doiron design for 3 gauge blocks - 6 measurements
2. Doiron design for 3 gauge blocks - 9 measurements
3. Doiron design for 4 gauge blocks - 8 measurements
4. Doiron design for 4 gauge blocks - 12 measurements
5. Doiron design for 5 gauge blocks - 10 measurements
6. Doiron design for 6 gauge blocks - 12 measurements
7. Doiron design for 7 gauge blocks - 14 measurements
8. Doiron design for 8 gauge blocks - 16 measurements
9. Doiron design for 9 gauge blocks - 18 measurements
10. Doiron design for 10 gauge blocks - 20 measurements
11. Doiron design for 11 gauge blocks - 22 measurements
Properties
of designs
that use 2
master
blocks
Historical designs for gauge blocks (Cameron and Hailes)
work on the assumption that the difference measurements are
contaminated by linear drift. This assumption is more
restrictive and covers the case of drift in successive
measurements but produces fewer designs. The
Cameron/Hailes designs meeting this criterion allow for:
two reference (master) blocks, R1 and R2
check standard = difference between the two master
blocks
and assign equal uncertainties to values of all test blocks.
The designs are listed in this section of the catalog as:
1. Cameron-Hailes design for 2 masters + 2 test blocks
2. Cameron-Hailes design for 2 masters + 3 test blocks
3. Cameron-Hailes design for 2 masters + 4 test blocks
4. Cameron-Hailes design for 2 masters + 5 test blocks
Important
concept -
check
standard
The check standards for the designs in this section are not
artifact standards but constructions from the design. The value
of one master block or the average of two master blocks is the
restraint for the design, and values for the masters, R1 and R2,
are estimated from a set of measurements taken according to
the design. The check standard value is the difference
between the estimates, R1 and R2. Measurement control is
exercised by comparing the current value of the check
standard with its historical average.
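A minimal sketch of such a control check, assuming a simple rule
that flags the run when the new check-standard value falls more
than k historical standard deviations from the historical
average (the data and the choice k = 3 are illustrative only):

    import numpy as np

    # Hypothetical history of check-standard values (R1 - R2, in nm)
    # and the value from the current calibration run.
    history = np.array([12.1, 11.8, 12.4, 12.0, 11.9, 12.3])
    current = 12.9

    k = 3
    mean = history.mean()
    sd = history.std(ddof=1)          # historical standard deviation
    in_control = abs(current - mean) <= k * sd
    print(mean, sd, in_control)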
2.3.4.2.1. Doiron 3-6 Design
Doiron 3-6 design
OBSERVATIONS 1 1 1
Y(1) + -
Y(2) - +
Y(3) + -
Y(4) - +
Y(5) - +
Y(6) + -
RESTRAINT +
CHECK STANDARD +
DEGREES OF FREEDOM = 4
SOLUTION MATRIX
DIVISOR = 6
OBSERVATIONS 1 1 1
Y(1) 0 -2 -1
Y(2) 0 1 2
Y(3) 0 1 -1
Y(4) 0 2 1
Y(5) 0 -1 1
Y(6) 0 -1 -2
R* 6 6 6
R* = Value of the reference standard
FACTORS FOR REPEATABILITY STANDARD DEVIATIONS
NOM FACTOR
1 1 1
1 0.0000 +
1 0.5774 +
1 0.5774 +
1 0.5774 +
Explanation of notation and interpretation of tables
2.3.4.2.2. Doiron 3-9 Design
Doiron 3-9 Design
OBSERVATIONS 1 1 1
Y(1) + -
Y(2) - +
Y(3) + -
Y(4) - +
Y(5) - +
Y(6) + -
Y(7) - +
Y(8) - +
Y(9) + -
RESTRAINT +
CHECK STANDARD +
DEGREES OF FREEDOM = 7
SOLUTION MATRIX
DIVISOR = 9
OBSERVATIONS 1 1 1
Y(1) 0 -2 -1
Y(2) 0 -1 1
Y(3) 0 -1 -2
Y(4) 0 2 1
Y(5) 0 1 2
Y(6) 0 1 -1
Y(7) 0 2 1
Y(8) 0 -1 1
Y(9) 0 -1 -2
R(1) 9 9 9
FACTORS FOR COMPUTING REPEATABILITY STANDARD
DEVIATIONS
NOM FACTOR
1 1 1
1 0.0000 +
1 0.4714 +
1 0.4714 +
1 0.4714 +
Explanation of notation and interpretation of tables
2.3.4.2.3. Doiron 4-8 Design
Doiron 4-8 Design
OBSERVATIONS 1 1 1 1
Y(1) + -
Y(2) + -
Y(3) - +
Y(4) + -
Y(5) - +
Y(6) - +
Y(7) + -
Y(8) - +
RESTRAINT +
CHECK STANDARD +
DEGREES OF FREEDOM = 5
SOLUTION MATRIX
DIVISOR = 8
OBSERVATIONS 1 1 1 1
Y(1) 0 -3 -2 -1
Y(2) 0 1 2 -1
Y(3) 0 1 2 3
Y(4) 0 1 -2 -1
Y(5) 0 3 2 1
Y(6) 0 -1 -2 1
Y(7) 0 -1 -2 -3
Y(8) 0 -1 2 1
R* 8 8 8 8
R* = Value of the reference standard
FACTORS FOR REPEATABILITY STANDARD DEVIATIONS
NOM FACTOR
1 1 1 1
1 0.0000 +
1 0.6124 +
1 0.7071 +
1 0.6124 +
1 0.6124 +
Explanation of notation and interpretation of tables
2.3.4.2.4. Doiron 4-12 Design
Doiron 4-12 Design
OBSERVATIONS 1 1 1 1
Y(1) + -
Y(2) + +
Y(3) + -
Y(4) - +
Y(5) + -
Y(6) - +
Y(7) + -
Y(8) + -
Y(9) + -
Y(10) - +
Y(11) - +
Y(12) - +
RESTRAINT +
CHECK STANDARD +
DEGREES OF FREEDOM = 9
SOLUTION MATRIX
DIVISOR = 8
OBSERVATIONS 1 1 1 1
Y(1) 0 -2 -1 -1
Y(2) 0 1 1 2
Y(3) 0 0 1 -1
Y(4) 0 2 1 1
Y(5) 0 1 -1 0
Y(6) 0 -1 0 1
Y(7) 0 -1 -2 -1
Y(8) 0 1 0 -1
Y(9) 0 -1 -1 -2
Y(10) 0 -1 1 0
Y(11) 0 1 2 1
Y(12) 0 0 -1 1
R* 6 6 6 4
R* = Value of the reference standard
FACTORS FOR REPEATABILITY STANDARD DEVIATIONS
NOM FACTOR
1 1 1 1
1 0.0000 +
1 0.5000 +
1 0.5000 +
1 0.5000 +
1 0.5000 +
Explanation of notation and interpretation of tables
2.3.4.2.5. Doiron 5-10 Design
Doiron 5-10 Design
OBSERVATIONS 1 1 1 1 1
Y(1) + -
Y(2) - +
Y(3) + -
Y(4) - +
Y(5) - +
Y(6) + -
Y(7) - +
Y(8) + -
Y(9) - +
Y(10) + -
RESTRAINT +
CHECK STANDARD +
DEGREES OF FREEDOM = 6
SOLUTION MATRIX
DIVISOR = 90
OBSERVATIONS 1 1 1 1 1
Y(1) 0 -50 -10 -10 -30
Y(2) 0 20 4 -14 30
Y(3) 0 -10 -29 -11 -15
Y(4) 0 -20 5 5 15
Y(5) 0 0 -18 18 0
Y(6) 0 -10 -11 -29 -15
Y(7) 0 10 29 11 15
Y(8) 0 -20 14 -4 -30
Y(9) 0 10 11 29 15
Y(10) 0 20 -5 -5 -15
R* 90 90 90 90 90
R* = Value of the reference standard
FACTORS FOR REPEATABILITY STANDARD DEVIATIONS
NOM FACTOR
1 1 1 1 1
1 0.0000 +
1 0.7454 +
1 0.5676 +
1 0.5676 +
1 0.7071 +
1 0.7454 +
Explanation of notation and interpretation of tables
2.3.4.2.6. Doiron 6-12 Design
Doiron 6-12 Design
OBSERVATIONS 1 1 1 1 1 1
Y(1) + -
Y(2) - +
Y(3) - +
Y(4) - +
Y(5) - +
Y(6) + -
Y(7) + -
Y(8) + -
Y(9) + -
Y(10) - +
Y(11) + -
Y(12) - +
RESTRAINT +
CHECK STANDARD +
DEGREES OF FREEDOM = 7
SOLUTION MATRIX
DIVISOR = 360
OBSERVATIONS 1 1 1 1 1 1
Y(1) 0 -136 -96 -76 -72 -76
Y(2) 0 -4 -24 -79 72 11
Y(3) 0 -20 -120 -35 0 55
Y(4) 0 4 24 -11 -72 79
Y(5) 0 -60 0 75 0 -15
Y(6) 0 20 120 -55 0 35
Y(7) 0 -76 -96 -61 -72 -151
Y(8) 0 64 24 4 -72 4
Y(9) 0 40 -120 -20 0 -20
Y(10) 0 72 72 72 144 72
Y(11) 0 60 0 15 0 -75
Y(12) 0 76 96 151 72 61
R* 360 360 360 360 360 360
R* = Value of the reference standard
FACTORS FOR REPEATABILITY STANDARD DEVIATIONS
NOM FACTOR
1 1 1 1 1 1
1 0.0000 +
1 0.6146 +
1 0.7746 +
1 0.6476 +
1 0.6325 +
1 0.6476 +
1 0.6146 +
Explanation of notation and interpretation of tables
2.3.4.2.7. Doiron 7-14 Design
Doiron 7-14 Design
OBSERVATIONS 1 1 1 1 1 1 1
Y(1) + -
Y(2) - +
Y(3) + -
Y(4) + -
Y(5) + -
Y(6) - +
Y(7) + -
Y(8) + -
Y(9) + -
Y(10) - +
Y(11) - +
Y(12) - +
Y(13) - +
Y(14) - +
RESTRAINT +
CHECK STANDARD +
DEGREES OF FREEDOM = 8
PARAMETER VALUES
DIVISOR = 1015
OBSERVATIONS 1 1 1 1 1 1 1
Y(1) 0 -406 -203 -203 -203 -203 -203
Y(2) 0 0 -35 -210 35 210 0
Y(3) 0 0 175 35 -175 -35 0
Y(4) 0 203 -116 29 -116 29 -261
Y(5) 0 -203 -229 -214 -264 -424 -174
Y(6) 0 0 -175 -35 175 35 0
Y(7) 0 203 -61 -221 -26 -11 29
Y(8) 0 0 305 90 130 55 -145
Y(9) 0 0 220 15 360 -160 145
Y(10) 0 203 319 174 319 174 464
Y(11) 0 -203 26 11 61 221 -29
Y(12) 0 0 -360 160 -220 -15 -145
Y(13) 0 203 264 424 229 214 174
Y(14) 0 0 -130 -55 -305 -90 145
R* 1015 1015 1015 1015 1015 1015 1015
R* = Value of the reference standard
FACTORS FOR REPEATABILITY STANDARD DEVIATIONS
NOM FACTOR
1 1 1 1 1 1 1
1 0.0000 +
1 0.6325 +
1 0.7841 +
1 0.6463 +
1 0.7841 +
1 0.6463 +
1 0.6761 +
1 0.6325 +
Explanation of notation and interpretation of tables
2.3.4.2.8. Doiron 8-16 Design
Doiron 8-16 Design
OBSERVATIONS 1 1 1 1 1 1 1 1
Y(1) + -
Y(2) + -
Y(3) - +
Y(4) - +
Y(5) + -
Y(6) - +
Y(7) - +
Y(8) - +
Y(9) - +
Y(10) - +
Y(11) + -
Y(12) - +
Y(13) - +
Y(14) - +
Y(15) + -
Y(16) + -
RESTRAINT +
CHECK STANDARD +
DEGREES OF FREEDOM = 9
SOLUTION MATRIX
DIVISOR = 2852
OBSERVATIONS 1 1 1 1 1 1 1 1
Y(1) 0 -1392 -620 -472 -516 -976 -824 -916
Y(2) 0 60 248 -78 96 878 -112 -526
Y(3) 0 352 124 -315 278 255 864 289
Y(4) 0 516 992 470 1396 706 748 610
Y(5) 0 -356 620 35 286 -979 -96 -349
Y(6) 0 92 0 23 -138 253 -552 667
Y(7) 0 -148 -992 335 -522 -407 -104 -81
Y(8) 0 -416 372 113 190 995 16 177
Y(9) 0 308 -248 170 -648 134 756 342
Y(10) 0 472 620 955 470 585 640 663
Y(11) 0 476 -124 -191 -94 -117 -128 -703
Y(12) 0 -104 -620 -150 404 -286 4 -134
Y(13) 0 472 620 955 470 585 640 663
Y(14) 0 444 124 -292 140 508 312 956
Y(15) 0 104 620 150 -404 286 -4 134
Y(16) 0 568 -124 -168 -232 136 -680 -36
R* 2852 2852 2852 2852 2852 2852 2852 2852
R* = value of reference block
FACTORS FOR REPEATABILITY STANDARD DEVIATIONS
WT FACTOR
1 1 1 1 1 1 1 1
1 0.0000 +
1 0.6986 +
1 0.7518 +
1 0.5787 +
1 0.6996 +
1 0.8313 +
1 0.7262 +
1 0.7534 +
1 0.6986 +
Explanation of notation and interpretation of tables
2.3.4.2.9. Doiron 9-18 Design
Doiron 9-18 Design
OBSERVATIONS 1 1 1 1 1 1 1 1 1
Y(1) + -
Y(2) - +
Y(3) + -
Y(4) - +
Y(5) + -
Y(6) - +
Y(7) + -
Y(8) + -
Y(9) - +
Y(10) + -
Y(11) - +
Y(12) - +
Y(13) - +
Y(14) + -
Y(15) - +
Y(16) + -
Y(17) - +
Y(18) + -
RESTRAINT +
CHECK STANDARD +
DEGREES OF FREEDOM = 10
SOLUTION MATRIX
DIVISOR = 8247
OBSERVATIONS 1 1 1 1 1 1 1 1 1
Y(1) 0 -3680 -2305 -2084 -1175 -1885 -1350 -1266 -654
Y(2) 0 -696 -1422 -681 -1029 -984 -2586 -849 1203
Y(3) 0 1375 -3139 196 -491 -1279 -1266 -894 -540
Y(4) 0 -909 -222 -1707 1962 -432 675 633 327
Y(5) 0 619 1004 736 -329 2771 -378 -1674 -513
Y(6) 0 -1596 -417 1140 342 303 42 186 57
Y(7) 0 955 2828 496 -401 971 -1689 -411 -525
Y(8) 0 612 966 741 1047 1434 852 2595 -1200
Y(9) 0 1175 1666 1517 3479 1756 2067 2085 1038
Y(10) 0 199 -1276 1036 -239 -3226 -801 -1191 -498
Y(11) 0 654 1194 711 1038 1209 1719 1722 2922
Y(12) 0 91 494 -65 -1394 887 504 2232 684
Y(13) 0 2084 1888 3224 1517 2188 1392 1452 711
Y(14) 0 1596 417 -1140 -342 -303 -42 -186 -57
Y(15) 0 175 950 -125 -1412 437 2238 486 681
Y(16) 0 -654 -1194 -711 -1038 -1209 -1719 -1722 -2922
Y(17) 0 -420 -2280 300 90 2250 -423 483 15
Y(18) 0 84 456 -60 -18 -450 1734 -1746 -3
R* 8247 8247 8247 8247 8247 8247 8247 8247 8247
R* = Value of the reference standard
FACTORS FOR REPEATABILITY STANDARD DEVIATIONS
NOM FACTOR
1 1 1 1 1 1 1 1 1
1 0.0000 +
1 0.6680 +
1 0.8125 +
1 0.6252 +
1 0.6495 +
1 0.8102 +
1 0.7225 +
1 0.7235 +
1 0.5952 +
1 0.6680 +
Explanation of notation and interpretation of tables
2.3.4.2.10. Doiron 10-20 Design
Doiron 10-20 Design
OBSERVATIONS 1 1 1 1 1 1 1 1 1 1
Y(1) + -
Y(2) + -
Y(3) - +
Y(4) + -
Y(5) + -
Y(6) + -
Y(7) + -
Y(8) - +
Y(9) + -
Y(10) + -
Y(11) + -
Y(12) + -
Y(13) + -
Y(14) - +
Y(15) + -
Y(16) + -
Y(17) - +
Y(18) + -
Y(19) - +
Y(20) - +
RESTRAINT +
CHECK STANDARD +
DEGREES OF FREEDOM = 11
SOLUTION MATRIX
DIVISOR = 33360
OBSERVATIONS 1 1 1 1 1 1 1 1 1
Y(1) 0 -15300 -9030 -6540 -5970 -9570 -7770 -6510 -9240
Y(2) 0 1260 1594 1716 3566 3470 9078 -5678 -24
Y(3) 0 -960 -2856 -7344 -2664 -1320 -1992 -1128 336
Y(4) 0 -3600 -1536 816 5856 -9120 -1632 -1728 -3744
Y(5) 0 6060 306 -1596 -906 -1050 -978 -2262 -8376
Y(6) 0 2490 8207 -8682 -1187 1165 2769 2891 588
Y(7) 0 -2730 809 -1494 -869 -2885 903 6557 -8844
Y(8) 0 5580 7218 11412 6102 6630 6366 5514 8472
Y(9) 0 1800 -2012 -408 -148 7340 -7524 -1916 1872
Y(10) 0 3660 1506 -3276 774 3990 2382 3258 9144
Y(11) 0 -1800 -3548 408 5708 -1780 -9156 -3644 -1872
Y(12) 0 6270 -9251 -3534 -1609 455 -3357 -3023 516
Y(13) 0 960 2856 7344 2664 1320 1992 1128 -336
Y(14) 0 -330 -391 186 -2549 -7925 -2457 1037 6996
Y(15) 0 2520 8748 3432 1572 1380 1476 -5796 -48
Y(16) 0 -5970 -7579 -8766 -15281 -9425 -9573 -6007 -6876
Y(17) 0 -1260 -7154 -1716 1994 2090 7602 118 24
Y(18) 0 570 2495 9990 -6515 -1475 -1215 635 1260
Y(19) 0 6510 9533 6642 6007 7735 9651 15329 8772
Y(20) 0 -5730 85 1410 3455 8975 3435 1225 1380
R* 33360 33360 33360 33360 33360 33360 33360 33360 33360
R* = Value of the reference standard
FACTORS FOR REPEATABILITY STANDARD DEVIATIONS
NOM FACTOR
1 1 1 1 1 1 1 1 1 1
1 0.0000 +
1 0.6772 +
1 0.7403 +
1 0.7498 +
1 0.6768 +
1 0.7456 +
1 0.7493 +
1 0.6779 +
1 0.7267 +
1 0.6961 +
1 0.6772 +
Explanation of notation and interpretation of tables
2.3.4.2.11. Doiron 11-22 Design
Doiron 11-22 Design
OBSERVATIONS 1 1 1 1 1 1 1 1 1 1 1
Y(1) + -
Y(2) + -
Y(3) + -
Y(4) + -
Y(5) + -
Y(6) + -
Y(7) - +
Y(8) - +
Y(9) + -
Y(10) + -
Y(11) + -
Y(12) - +
Y(13) + -
Y(14) - +
Y(15) + -
Y(16) + -
Y(17) + -
Y(18) - +
Y(19) + -
Y(20) - +
Y(21) - +
Y(22) + -
RESTRAINT +
CHECK STANDARD +
DEGREES OF FREEDOM = 12
SOLUTION MATRIX
DIVISOR = 55858
OBSERVATIONS 1 1 1 1 1 1 1 1 1 1 1
Y(1) 0 -26752 -18392 -15532 -9944 -8778 -14784 -15466 -16500 -10384 -17292
Y(2) 0 1166 1119 3976 12644 -11757 -1761 2499 1095 -2053 1046
Y(3) 0 5082 4446 3293 4712 160 5882 15395 3527 -9954 487
Y(4) 0 -968 -1935 10496 2246 -635 -4143 -877 -13125 -643 -1060
Y(5) 0 8360 -18373 -8476 -3240 -3287 -8075 -1197 -9443 -1833 -2848
Y(6) 0 -6908 -7923 -9807 -2668 431 -4753 -1296 -10224 9145 -18413
Y(7) 0 1716 3084 6091 404 -2452 -10544 -2023 15073 332 5803
Y(8) 0 9944 13184 15896 24476 11832 13246 14318 13650 9606 12274
Y(9) 0 2860 12757 -11853 -2712 145 3585 860 578 -293 -2177
Y(10) 0 -8778 -12065 -11920 -11832 -23589 -15007 -11819 -12555 -11659 -11228
Y(11) 0 11286 1729 -271 -4374 -3041 -3919 -14184 -180 -3871 1741
Y(12) 0 -3608 -13906 -4734 62 2942 11102 2040 -2526 604 -2566
Y(13) 0 -6006 -10794 -7354 -1414 8582 -18954 -6884 -10862 -1162 -6346
Y(14) 0 -9460 1748 6785 2330 2450 2790 85 6877 4680 16185
Y(15) 0 5588 10824 19965 -8580 88 6028 1485 11715 2904 10043
Y(16) 0 -792 5803 3048 1376 1327 5843 1129 15113 -1911 -10100
Y(17) 0 -682 6196 3471 -1072 3188 15258 -10947 6737 -1434 2023
Y(18) 0 10384 12217 12510 9606 11659 12821 14255 13153 24209 15064
Y(19) 0 1892 10822 -1357 -466 -490 -558 -17 -12547 -936 -3237
Y(20) 0 5522 3479 -93 -10158 -13 5457 15332 3030 4649 3277
Y(21) 0 1760 -3868 -13544 -3622 -692 -1700 -252 -1988 2554 11160
Y(22) 0 -1606 -152 -590 2226 11930 2186 -2436 -598 -12550 -3836
R* 55858 55858 55858 55858 55858 55858 55858 55858 55858 55858 55858
R* = Value of the reference standard
FACTORS FOR REPEATABILITY STANDARD DEVIATIONS
NOM FACTOR
1 1 1 1 1 1 1 1 1 1 1
1 0.0000 +
1 0.6920 +
1 0.8113 +
1 0.8013 +
1 0.6620 +
1 0.6498 +
1 0.7797 +
1 0.7286 +
1 0.8301 +
1 0.6583 +
1 0.6920 +
Explanation of notation and interpretation of tables
2.3.4.3. Designs for electrical quantities
Standard
cells
Banks of saturated standard cells that are nominally one volt
are the basis for maintaining the unit of voltage in many
laboratories.
Bias
problem
It has been observed that potentiometer measurements of the
difference between two saturated standard cells, connected in
series opposition, are affected by a thermal emf which remains
constant even when the direction of the circuit is reversed.
Designs
for
eliminating
bias
A calibration design for comparing standard cells can be
constructed to be left-right balanced so that:
A constant bias, P, does not contaminate the estimates
for the individual cells.
P is estimated as the average of the difference
measurements (see the sketch below).
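A minimal sketch of the estimate of P (the difference
measurements are made-up numbers):

    import numpy as np

    # Hypothetical series-opposition difference measurements
    # (microvolts) from a left-right balanced design.
    y = np.array([1.8, 2.1, 1.7, 2.4, 2.0, 1.9])

    # The constant bias P is estimated by the average of the
    # difference measurements and does not contaminate the
    # estimates for the individual cells.
    p_hat = y.mean()
    print(p_hat)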
Designs
for
electrical
quantities
Designs are given for the following classes of electrical
artifacts. These designs are left-right balanced and may be
appropriate for artifacts other than electrical standards.
Saturated standard reference cells
Saturated standard test cells
Zeners
Resistors
Standard
cells in a
single box
Left-right balanced designs for comparing standard cells
among themselves where the restraint is over all reference
cells are listed below. These designs are not appropriate for
assigning values to test cells.
Estimates for individual standard cells and the bias term, P,
are shown under the heading, 'SOLUTION MATRIX'. These
designs also have the advantage of requiring a change of
connections to only one cell at a time.
1. Design for 3 standard cells
2. Design for 4 standard cells
3. Design for 5 standard cells
4. Design for 6 standard cells
Test cells Calibration designs for assigning values to test cells in a
common environment on the basis of comparisons with
reference cells with known values are shown below. The
designs in this catalog are left-right balanced.
1. Design for 4 test cells and 4 reference cells
2. Design for 8 test cells and 8 reference cells
Zeners Increasingly, zeners are replacing saturated standard cells as
artifacts for maintaining and disseminating the volt. Values are
assigned to test zeners, based on a group of reference zeners,
using calibration designs.
1. Design for 4 reference zeners and 2 test zeners
2. Design for 4 reference zeners and 3 test zeners
Standard
resistors
Designs for comparing standard resistors that are used for
maintaining and disseminating the ohm are listed in this
section.
1. Design for 3 reference resistors and 1 test resistor
2. Design for 4 reference resistors and 1 test resistor
2.3.4.3.1. Left-right balanced design for 3
standard cells
Design 1,1,1
CELLS
OBSERVATIONS 1 1 1
Y(1) + -
Y(2) + -
Y(3) + -
Y(4) - +
Y(5) - +
Y(6) - +
RESTRAINT + + +
DEGREES OF FREEDOM = 3
SOLUTION MATRIX
DIVISOR = 6
OBSERVATIONS 1 1 1 P
Y(1) 1 -1 0 1
Y(2) 1 0 -1 1
Y(3) 0 1 -1 1
Y(4) -1 1 0 1
Y(5) -1 0 1 1
Y(6) 0 -1 1 1
R* 2 2 2 0
R* = AVERAGE VALUE OF 3 REFERENCE CELLS
P = LEFT-RIGHT BIAS
FACTORS FOR COMPUTING STANDARD DEVIATIONS
V FACTOR CELLS
1 1 1
1 0.3333 +
1 0.3333 +
1 0.3333 +
Explanation of notation and interpretation of tables
2.3.4.3.2. Left-right balanced design for 4
standard cells
Design 1,1,1,1
OBSERVATIONS 1 1 1 1
Y(1) + -
Y(2) + -
Y(3) + -
Y(4) + -
Y(5) + -
Y(6) - +
Y(7) - +
Y(8) - +
Y(9) - +
Y(10) - +
Y(11) - +
Y(12) + -
RESTRAINT + + + +
DEGREES OF FREEDOM = 8
SOLUTION MATRIX
DIVISOR = 8
OBSERVATIONS 1 1 1 1 P
Y(1) 1 -1 0 0 1
Y(2) 1 0 -1 0 1
Y(3) 0 1 -1 0 1
Y(4) 0 1 0 -1 1
Y(5) 0 0 1 -1 1
Y(6) -1 0 1 0 1
Y(7) 0 -1 1 0 1
Y(8) 0 -1 0 1 1
Y(9) -1 0 0 1 1
Y(10) 0 0 -1 1 1
Y(11) -1 1 0 0 1
Y(12) 1 0 0 -1 1
R* 2 2 2 2 0
R* = AVERAGE VALUE OF 4 REFERENCE CELLS
P = LEFT-RIGHT BIAS
FACTORS FOR COMPUTING STANDARD DEVIATIONS
V FACTOR CELLS
1 1 1 1
1 0.3062 +
1 0.3062 +
1 0.3062 +
1 0.3062 +
Explanation of notation and interpretation of tables
2.3.4.3.3. Left-right balanced design for 5
standard cells
Design 1,1,1,1,1
OBSERVATIONS 1 1 1 1 1
Y(1) + -
Y(2) + -
Y(3) + -
Y(4) + -
Y(5) + -
Y(6) + -
Y(7) + -
Y(8) - +
Y(9) - +
Y(10) - +
RESTRAINT + + + + +
DEGREES OF FREEDOM = 5
SOLUTION MATRIX
DIVISOR = 5
OBSERVATIONS 1 1 1 1 1 P
Y(1) 1 -1 0 0 0 1
Y(2) 1 0 -1 0 0 1
Y(3) 0 1 -1 0 0 1
Y(4) 0 1 0 -1 0 1
Y(5) 0 0 1 -1 0 1
Y(6) 0 0 1 0 -1 1
Y(7) 0 0 0 1 -1 1
Y(8) -1 0 0 1 0 1
Y(9) -1 0 0 0 1 1
Y(10) 0 -1 0 0 1 1
R* 1 1 1 1 1 0
R* = AVERAGE VALUE OF 5 REFERENCE CELLS
P = LEFT-RIGHT BIAS
FACTORS FOR COMPUTING REPEATABILITY STANDARD DEVIATIONS
V FACTOR CELLS
1 1 1 1 1
1 0.4000 +
1 0.4000 +
1 0.4000 +
1 0.4000 +
1 0.4000 +
Explanation of notation and interpretation of tables
2.3.4.3.4. Left-right balanced design for 6
standard cells
Design 1,1,1,1,1,1
CELLS
OBSERVATIONS 1 1 1 1 1 1
Y(1) + -
Y(2) + -
Y(3) + -
Y(4) + -
Y(5) + -
Y(6) + -
Y(7) + -
Y(8) + -
Y(9) + -
Y(10) - +
Y(11) - +
Y(12) - +
Y(13) + -
Y(14) + -
Y(15) + -
RESTRAINT + + + + + +
DEGREES OF FREEDOM = 9
SOLUTION MATRIX
DIVISOR = 6
OBSERVATIONS 1 1 1 1 1 1 P
Y(1) 1 -1 0 0 0 0 1
Y(2) 1 0 -1 0 0 0 1
Y(3) 0 1 -1 0 0 0 1
Y(4) 0 1 0 -1 0 0 1
Y(5) 0 0 1 -1 0 0 1
Y(6) 0 0 1 0 -1 0 1
Y(7) 0 0 0 1 -1 0 1
Y(8) 0 0 0 1 0 -1 1
Y(9) 0 0 0 0 1 -1 1
Y(10) -1 0 0 0 1 0 1
Y(11) -1 0 0 0 0 1 1
Y(12) 0 -1 0 0 0 1 1
Y(13) 1 0 0 -1 0 0 1
Y(14) 0 1 0 0 -1 0 1
Y(15) 0 0 1 0 0 -1 1
R* 1 1 1 1 1 1 0
R* = AVERAGE VALUE OF 6 REFERENCE CELLS
P = LEFT-RIGHT BIAS
FACTORS FOR COMPUTING STANDARD DEVIATIONS
V FACTOR CELLS
1 1 1 1 1 1
1 0.3727 +
1 0.3727 +
1 0.3727 +
1 0.3727 +
1 0.3727 +
1 0.3727 +
Explanation of notation and interpretation of tables
2.3.4.3.5. Left-right balanced design for 4
references and 4 test items
Design for 4 references and 4 test items.
OBSERVATIONS 1 1 1 1 1 1 1 1
Y(1) + -
Y(2) + -
Y(3) + -
Y(4) + -
Y(5) + -
Y(6) + -
Y(7) + -
Y(8) + -
Y(9) - +
Y(10) - +
Y(11) - +
Y(12) - +
Y(13) - +
Y(14) - +
Y(15) - +
Y(16) - +
RESTRAINT + + + +
DEGREES OF FREEDOM = 8
SOLUTION MATRIX
DIVISOR = 16
OBSERVATIONS 1 1 1 1 1 1 1 1 P
Y(1) 3 -1 -1 -1 -4 0 0 0 1
Y(2) 3 -1 -1 -1 0 0 -4 0 1
Y(3) -1 -1 3 -1 0 0 -4 0 1
Y(4) -1 -1 3 -1 -4 0 0 0 1
Y(5) -1 3 -1 -1 0 -4 0 0 1
Y(6) -1 3 -1 -1 0 0 0 -4 1
Y(7) -1 -1 -1 3 0 0 0 -4 1
Y(8) -1 -1 -1 3 0 -4 0 0 1
Y(9) -3 1 1 1 0 4 0 0 1
Y(10) -3 1 1 1 0 0 0 4 1
Y(11) 1 1 -3 1 0 0 0 4 1
Y(12) 1 1 -3 1 0 4 0 0 1
Y(13) 1 -3 1 1 4 0 0 0 1
Y(14) 1 -3 1 1 0 0 4 0 1
Y(15) 1 1 1 -3 0 0 4 0 1
Y(16) 1 1 1 -3 4 0 0 0 1
R* 4 4 4 4 4 4 4 4 0
R* = AVERAGE VALUE OF REFERENCE CELLS
P = ESTIMATE OF LEFT-RIGHT BIAS
FACTORS FOR COMPUTING STANDARD DEVIATIONS
V FACTORS CELLS
1 1 1 1 1 1 1 1
1 0.4330 +
1 0.4330 +
1 0.4330 +
1 0.4330 +
1 0.5000 +
1 0.5000 +
1 0.5000 +
1 0.5000 +
Explanation of notation and interpretation of tables
2.3.4.3.6. Design for 8 references and 8 test
items
Design for 8 references and 8 test items.
TEST CELLS    REFERENCE CELLS
OBSERVATIONS 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
Y(1) + -
Y(2) - +
Y(3) - +
Y(4) + -
Y(5) + -
Y(6) - +
Y(7) - +
Y(8) + -
Y(9) + -
Y(10) + -
Y(11) - +
Y(12) - +
Y(13) + -
Y(14) + -
Y(15) - +
Y(16) - +
RESTRAINT + + + + + + + +
DEGREES OF FREEDOM = 0
SOLUTION MATRIX FOR TEST CELLS
DIVISOR = 16
OBSERVATIONS 1 1 1 1 1 1 1 1
Y(1) 8 4 0 -4 -6 6 2 -2
Y(2) -8 4 0 -4 -6 6 2 -2
Y(3) 4 -8 -4 0 2 6 -6 -2
Y(4) 4 8 -4 0 2 6 -6 -2
Y(5) 0 -4 8 4 2 -2 -6 6
Y(6) 0 -4 -8 4 2 -2 -6 6
Y(7) -4 0 4 -8 -6 -2 2 6
Y(8) -4 0 4 8 -6 -2 2 6
Y(9) -6 -2 2 6 8 -4 0 4
Y(10) -6 6 2 -2 -4 8 4 0
Y(11) -6 6 2 -2 -4 -8 4 0
Y(12) 2 6 -6 -2 0 4 -8 -4
Y(13) 2 6 -6 -2 0 4 8 -4
Y(14) 2 -2 -6 6 4 0 -4 8
Y(15) 2 -2 -6 6 4 0 -4 -8
Y(16) -6 -2 2 6 -8 -4 0 4
R 2 2 2 2 2 2 2 2
SOLUTION MATRIX FOR REFERENCE CELLS
DIVISOR = 16
OBSERVATIONS 1 1 1 1 1 1 1 1 P
Y(1) -7 7 5 3 1 -1 -3 -5 1
Y(2) -7 7 5 3 1 -1 -3 -5 1
Y(3) 3 5 7 -7 -5 -3 -1 1 1
Y(4) 3 5 7 -7 -5 -3 -1 1 1
Y(5) 1 -1 -3 -5 -7 7 5 3 1
Y(6) 1 -1 -3 -5 -7 7 5 3 1
Y(7) -5 -3 -1 1 3 5 7 -7 1
Y(8) -5 -3 -1 1 3 5 7 -7 1
Y(9) -7 -5 -3 -1 1 3 5 7 1
Y(10) -5 -7 7 5 3 1 -1 -3 1
Y(11) -5 -7 7 5 3 1 -1 -3 1
Y(12) 1 3 5 7 -7 -5 -3 -1 1
Y(13) 1 3 5 7 -7 -5 -3 -1 1
Y(14) 3 1 -1 -3 -5 -7 7 5 1
Y(15) 3 1 -1 -3 -5 -7 7 5 1
Y(16) -7 -5 -3 -1 1 3 5 7 1
R* 2 2 2 2 2 2 2 2 0
R* = AVERAGE VALUE OF 8 REFERENCE CELLS
P = ESTIMATE OF LEFT-RIGHT BIAS
FACTORS FOR COMPUTING STANDARD DEVIATIONS FOR TEST CELLS
V FACTORS TEST CELLS
1 1 1 1 1 1 1 1
1 1.1726 +
1 1.1726 +
1 1.1726 +
1 1.1726 +
1 1.1726 +
1 1.1726 +
1 1.1726 +
1 1.1726 +
Explanation of notation and interpretation of tables
2.3.4.3.7. Design for 4 reference zeners and 2 test zeners
Design for 4 reference zeners and 2 test zeners.
ZENERS
OBSERVATIONS 1 1 1 1 1 1
Y(1) + -
Y(2) + -
Y(3) + -
Y(4) + -
Y(5) + -
Y(6) + -
Y(7) + -
Y(8) + -
Y(9) - +
Y(10) - +
Y(11) - +
Y(12) - +
Y(13) - +
Y(14) - +
Y(15) - +
Y(16) - +
RESTRAINT + + + +
CHECK STANDARD + -
DEGREES OF FREEDOM = 10
SOLUTION MATRIX
DIVISOR = 16

OBSERVATIONS       1    1    1    1    1    1    P

Y(1)               3   -1   -1   -1   -2    0    1
Y(2)               3   -1   -1   -1    0   -2    1
Y(3)              -1    3   -1   -1   -2    0    1
Y(4)              -1    3   -1   -1    0   -2    1
Y(5)              -1   -1    3   -1   -2    0    1
Y(6)              -1   -1    3   -1    0   -2    1
Y(7)              -1   -1   -1    3   -2    0    1
Y(8)              -1   -1   -1    3    0   -2    1
Y(9)               1    1    1   -3    2    0    1
Y(10)              1    1    1   -3    0    2    1
Y(11)              1    1   -3    1    2    0    1
Y(12)              1    1   -3    1    0    2    1
Y(13)              1   -3    1    1    2    0    1
Y(14)              1   -3    1    1    0    2    1
Y(15)             -3    1    1    1    2    0    1
Y(16)             -3    1    1    1    0    2    1

R*                 4    4    4    4    4    4    0
R* = AVERAGE VALUE OF 4 REFERENCE STANDARDS
P = LEFT-RIGHT EFFECT
FACTORS FOR COMPUTING STANDARD DEVIATIONS
V FACTORS ZENERS
1 1 1 1 1 1 P
1 0.4330 +
1 0.4330 +
1 0.4330 +
1 0.4330 +
1 0.3536 +
1 0.3536 +
1 0.2500 +
Explanation of notation and interpretation of tables
2.3.4.3.8. Design for 4 reference zeners and 3 test zeners
Design for 4 reference zeners and 3 test zeners.
ZENERS
OBSERVATIONS 1 1 1 1 1 1 1
Y(1) - +
Y(2) - +
Y(3) + -
Y(4) + -
Y(5) + -
Y(6) + -
Y(7) - +
Y(8) - +
Y(9) - +
Y(10) - +
Y(11) - +
Y(12) - +
Y(13) + -
Y(14) + -
Y(15) + -
Y(16) + -
Y(17) + -
Y(18) - +
RESTRAINT + + + +
CHECK STANDARD + -
DEGREES OF FREEDOM = 11
SOLUTION MATRIX
DIVISOR = 1260

OBSERVATIONS       1     1     1     1     1     1     1     P

Y(1)            -196   196   -56    56     0     0     0    70
Y(2)            -160   -20   160    20     0     0     0    70
Y(3)              20   160   -20  -160     0     0     0    70
Y(4)             143   -53   -17   -73     0     0  -315    70
Y(5)             143   -53   -17   -73     0  -315     0    70
Y(6)             143   -53   -17   -73  -315     0     0    70
Y(7)              53  -143    73    17   315     0     0    70
Y(8)              53  -143    73    17     0   315     0    70
Y(9)              53  -143    73    17     0     0   315    70
Y(10)             17    73  -143    53     0     0   315    70
Y(11)             17    73  -143    53     0   315     0    70
Y(12)             17    73  -143    53   315     0     0    70
Y(13)            -73   -17   -53   143  -315     0     0    70
Y(14)            -73   -17   -53   143     0  -315     0    70
Y(15)            -73   -17   -53   143     0     0  -315    70
Y(16)             56   -56   196  -196     0     0     0    70
Y(17)             20   160   -20  -160     0     0     0    70
Y(18)           -160   -20   160    20     0     0     0    70

R*               315   315   315   315   315   315   315     0
R* = Average value of the 4 reference zeners
P = left-right effect
FACTORS FOR REPEATABILITY STANDARD DEVIATIONS
V K1 1 1 1 1 1 1 1
1 0.5000 +
1 0.5000 +
1 0.5000 +
2 0.7071 + +
3 0.8660 + + +
0 0.5578 + -
Explanation of notation and interpretation of tables
2.3.4.3.9. Design for 3 references and 1 test resistor
Design 1,1,1,1
OBSERVATIONS 1 1 1 1
Y(1) + -
Y(2) + -
Y(3) + -
Y(4) - +
Y(5) - +
Y(6) - +
RESTRAINT + + +
DEGREES OF FREEDOM = 3
SOLUTION MATRIX
DIVISOR = 6
OBSERVATIONS 1 1 1 1
Y(1) 1 -2 1 1
Y(2) 1 1 -2 1
Y(3) 0 0 0 -3
Y(4) 0 0 0 3
Y(5) -1 -1 2 -1
Y(6) -1 2 -1 -1
R 2 2 2 2
R = AVERAGE VALUE OF 3 REFERENCE RESISTORS
FACTORS FOR COMPUTING STANDARD DEVIATIONS
OHM FACTORS RESISTORS
1 1 1 1
1 0.3333 +
1 0.5270 +
1 0.5270 +
1 0.7817 +
Explanation of notation and interpretation of tables
2.3.4.3.10. Design for 4 references and 1 test resistor
Design 1,1,1,1,1
OBSERVATIONS 1 1 1 1 1
Y(1) + -
Y(2) + -
Y(3) + -
Y(4) + -
Y(5) - +
Y(6) - +
Y(7) - +
Y(8) - +
RESTRAINT + + + +
DEGREES OF FREEDOM = 4
SOLUTION MATRIX
DIVISOR = 8
OBSERVATIONS 1 1 1 1 1
Y(1) 3 -1 -1 -1 -1
Y(2) -1 3 -1 -1 -1
Y(3) -1 -1 3 -1 -1
Y(4) -1 -1 -1 3 -1
Y(5) 1 1 1 -3 1
Y(6) 1 1 -3 1 1
Y(7) 1 -3 1 1 1
Y(8) -3 1 1 1 1
R 2 2 2 2 2
R = AVERAGE VALUE OF REFERENCE RESISTORS
FACTORS FOR COMPUTING STANDARD DEVIATIONS
OHM FACTORS
1 1 1 1 1
1 0.6124 +
1 0.6124 +
1 0.6124 +
1 0.6124 +
1 0.3536 +
Explanation of notation and interpretation of tables
2.3.4.4. Roundness measurements
Roundness
measurements
Measurements of roundness require 360° traces of the workpiece made
with a turntable-type instrument or a stylus-type instrument. A least
squares fit of points on the trace to a circle defines the parameters of
noncircularity of the workpiece. A diagram of the measurement method
is shown below.
The diagram shows the trace and Y, the distance from the spindle center to the
trace at the angle. A least squares circle fit to data at equally spaced angles
gives estimates of P - R, the noncircularity, where R = radius of the circle and
P = distance from the center of the circle to the trace.
Low precision
measurements
Some measurements of roundness do not require a high level of
precision, such as measurements on cylinders, spheres, and ring gages
where roundness is not of primary importance. For this purpose, a
single trace is made of the workpiece.
Weakness of
single trace
method
The weakness of this method is that the deviations contain both the
spindle error and the workpiece error, and these two errors cannot be
separated with the single trace. Because the spindle error is usually
small and within known limits, its effect can be ignored except when
the most precise measurements are needed.
High precision
measurements
High precision measurements of roundness are appropriate where an
object, such as a hemisphere, is intended to be used primarily as a
roundness standard.
Measurement
method
The measurement sequence involves making multiple traces of the
roundness standard where the standard is rotated between traces. Least-
squares analysis of the resulting measurements enables the
noncircularity of the spindle to be separated from the profile of the
standard.
Choice of
measurement
method
A synopsis of the measurement method and the estimation technique
is given in this chapter for:
Single-trace method
Multiple-trace method
The reader is encouraged to obtain a copy of the publication on
roundness (Reeve) for a more complete description of the measurement
method and analysis.
2.3.4.4.1. Single-trace roundness design
Low precision
measurements
Some measurements of roundness do not require a high
level of precision, such as measurements on cylinders,
spheres, and ring gages where roundness is not of primary
importance. The diagram of the measurement method
shows the trace and Y, the distance from the spindle center
to the trace at the angle. A least-squares circle fit to data at
equally spaced angles gives estimates of P - R, the
noncircularity, where R = radius of the circle and P =
distance from the center of the circle to the trace.
Single trace
method
For this purpose, a single trace covering exactly 360° is
made of the workpiece, and measurements of the distance
between the center of the spindle and the trace are made at
equally spaced angles. A least-squares circle fit to the data
gives the following estimators of the parameters of the
circle.
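A minimal sketch of these estimators, assuming n measurements Y_i taken at
equally spaced angles θ_i and the small-deviation (limaçon) approximation
commonly used in roundness work, is:

R̂ = (1/n) Σ_{i=1}^{n} Y_i,   â = (2/n) Σ_{i=1}^{n} Y_i cos θ_i,   b̂ = (2/n) Σ_{i=1}^{n} Y_i sin θ_i

where R is the radius of the least-squares circle and (a, b) locate its center
relative to the spindle center.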
Noncircularity
of workpiece
The deviation of the trace from the circle at each angle,
which defines the noncircularity of the workpiece, is
estimated by:
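(A standard form of this estimator, under the same small-deviation assumptions
as the sketch above, would be Δ̂_i = Y_i − R̂ − â cos θ_i − b̂ sin θ_i, i.e., the
residual of the trace from the fitted circle at the i-th angle.)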
Weakness of
single trace
method
The weakness of this method is that the deviations contain
both the spindle error and the workpiece error, and these
two errors cannot be separated with the single trace.
Because the spindle error is usually small and within
known limits, its effect can be ignored except when the
most precise measurements are needed.
2.3.4.4.2. Multiple-trace roundness designs
High
precision
measurements
High precision roundness measurements are required when
an object, such as a hemisphere, is intended to be used
primarily as a roundness standard. The method outlined on
this page is appropriate for either a turntable-type
instrument or a spindle-type instrument.
Measurement
method
The measurement sequence involves making multiple
traces of the roundness standard where the standard is
rotated between traces. Least-squares analysis of the
resulting measurements enables the noncircularity of the
spindle to be separated from the profile of the standard.
The reader is referred to the publication on the subject
(Reeve) for details covering measurement techniques and
analysis.
Method of n
traces
The number of traces that are made on the workpiece is
arbitrary but should not be less than four. The workpiece is
centered as well as possible under the spindle. The mark on
the workpiece which denotes the zero angular position is
aligned with the zero position of the spindle as shown in
the graph. A trace is made with the workpiece in this
position. The workpiece is then rotated clockwise by 360/n
degrees and another trace is made. This process is
continued until n traces have been recorded.
Mathematical
model for
estimation
For i = 1,...,n, the ith angular position is denoted by
Definition of
terms relating
to distances
to the least
squares circle
The deviation from the least squares circle (LSC) of the
workpiece at the position is .
The deviation of the spindle from its LSC at the
position is .
Terms
relating to
parameters of
least squares
circle
For the jth graph, let the three parameters that define the
LSC be given by
defining the radius R, a, and b as shown in the graph. In an
idealized measurement system these parameters would be
constant for all j. In reality, each rotation of the workpiece
causes it to shift a small amount vertically and horizontally.
To account for this shift, separate parameters are needed
for each trace.
Correction
for
obstruction to
stylus
Let be the observed distance (in polar graph units) from
the center of the jth graph to the point on the curve that
corresponds to the position of the spindle. If K is the
magnification factor of the instrument in microinches/polar
graph unit and is the angle between the lever arm of the
stylus and the tangent to the workpiece at the point of
contact (which normally can be set to zero if there is no
obstruction), the transformed observations to be used in the
estimation equations are:
.
Estimates for
parameters
The estimation of the individual parameters is obtained as a
least-squares solution that requires six restraints which
essentially guarantee that the sum of the vertical and
horizontal deviations of the spindle from the center of the
LSC are zero. The expressions for the estimators are as
follows:
where
Finally, the standard deviations of the profile estimators are
given by:
Computation
of standard
deviation
The computation of the residual standard deviation of the
fit requires, first, the computation of the predicted values,
The residual standard deviation with ν = n² - 5n + 6
degrees of freedom is
2.3.4.5. Designs for angle blocks
Purpose The purpose of this section is to explain why calibration of angle
blocks of the same size in groups is more efficient than calibration
of angle blocks individually.
Calibration schematic for five angle blocks showing the reference as block 1 in
the center of the diagram, the check standard as block 2 at the top, and the
test blocks as blocks 3, 4, and 5.
A schematic of a calibration scheme for 1 reference block, 1 check
standard, and three test blocks is shown below. The reference
block, R, is shown in the center of the diagram and the check
standard, C, is shown at the top of the diagram.
Block sizes Angle blocks normally come in sets of
1, 3, 5, 20, and 30 seconds
1, 3, 5, 20, 30 minutes
1, 3, 5, 15, 30, 45 degrees
and blocks of the same nominal size from 4, 5 or 6 different sets
can be calibrated simultaneously using one of the designs shown in
this catalog.
Design for 4 angle blocks
Design for 5 angle blocks
Design for 6 angle blocks
Restraint The solution to the calibration design depends on the known value
of a reference block, which is compared with the test blocks. The
reference block is designated as block 1 for the purpose of this
discussion.
Check
standard
It is suggested that block 2 be reserved for a check standard that is
maintained in the laboratory for quality control purposes.
Calibration
scheme
A calibration scheme developed by Charles Reeve (Reeve) at the
National Institute of Standards and Technology for calibrating
customer angle blocks is explained on this page. The reader is
encouraged to obtain a copy of the publication for details on the
calibration setup and quality control checks for angle block
calibrations.
Series of
measurements
for calibrating
4, 5, and 6
angle blocks
simultaneously
For all of the designs, the measurements are made in groups of
seven starting with the measurements of blocks in the following
order: 2-3-2-1-2-4-2. Schematically, the calibration design is
completed by counter-clockwise rotation of the test blocks about
the reference block, one-at-a-time, with 7 readings for each series
reduced to 3 difference measurements. For n angle blocks
(including the reference block), this amounts to n - 1 series of 7
readings. The series for 4, 5, and 6 angle blocks are shown below.
Measurements
for 4 angle
blocks
Series 1: 2-3-2-1-2-4-2
Series 2: 4-2-4-1-4-3-4
Series 3: 3-4-3-1-3-2-3
Measurements
for 5 angle
blocks (see
diagram)
Series 1: 2-3-2-1-2-4-2
Series 2: 5-2-5-1-5-3-5
Series 3: 4-5-4-1-4-2-4
Series 4: 3-4-3-1-3-5-3
Measurements
for 6 angle
blocks
Series 1: 2-3-2-1-2-4-2
Series 2: 6-2-6-1-6-3-6
Series 3: 5-6-5-1-5-2-5
Series 4: 4-5-4-1-4-6-4
Series 5: 3-4-3-1-3-5-3
Equations for
the
measurements
in the first
series showing
error sources
The equations explaining the seven measurements for the first
series in terms of the errors in the measurement system are:
Z_{11} = B + X_1 + error_{11}
Z_{12} = B + X_2 + d + error_{12}
Z_{13} = B + X_3 + 2d + error_{13}
Z_{14} = B + X_4 + 3d + error_{14}
Z_{15} = B + X_5 + 4d + error_{15}
Z_{16} = B + X_6 + 5d + error_{16}
Z_{17} = B + X_7 + 6d + error_{17}
where B is a bias associated with the instrument, d is a linear drift
factor, X is the value of the angle block to be determined, and the
error terms represent random errors of measurement.
Calibration
procedure
depends on
difference
measurements
The check block, C, is measured before and after each test block,
and the difference measurements (which are not the same as the
difference measurements for calibrations of mass weights, gage
blocks, etc.) are constructed to take advantage of this situation.
Thus, the 7 readings are reduced to 3 difference measurements for
the first series as follows:
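The exact reductions are given in the Reeve publication; one form consistent
with the first three rows of the design matrix shown below (the sign convention
here is an assumption) averages the two check-block readings that bracket each
test reading:

Y_{11} = (Z_{11} + Z_{13})/2 − Z_{12}
Y_{12} = (Z_{13} + Z_{15})/2 − Z_{14}
Y_{13} = (Z_{15} + Z_{17})/2 − Z_{16}

Each such difference cancels the instrument bias B and the linear drift d
because the two bracketing readings are symmetric in time about the test
reading.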
For all series, there are 3(n - 1) difference measurements, with the
first subscript in the equations above referring to the series number.
The difference measurements are free of drift and instrument bias.
Design matrix As an example, the design matrix for n = 4 angle blocks is shown
below.
1 1 1 1
0 1 -1 0
-1 1 0 0
0 1 0 -1
0 -1 0 1
-1 0 0 1
0 0 -1 1
0 0 1 -1
-1 0 1 0
0 -1 1 0
The design matrix is shown with the solution matrix for
identification purposes only because the least-squares solution is
weighted (Reeve) to account for the fact that test blocks are
measured twice as many times as the reference block. The weight
matrix is not shown.
Solutions to
the calibration
designs
measurements
Solutions to the angle block designs are shown on the following
pages. The solution matrix and factors for the repeatability standard
deviation are to be interpreted as explained in solutions to
calibration designs . As an example, the solution for the design for
n=4 angle blocks is as follows:
The solution for the reference standard is shown under the first
column of the solution matrix; for the check standard under the
second column; for the first test block under the third column; and
for the second test block under the fourth column. Notice that the
estimate for the reference block is guaranteed to be R*, regardless
of the measurement results, because of the restraint that is imposed
on the design. Specifically,
Solutions are correct only for the restraint as shown.
Calibrations
can be run for
top and
bottom faces
of blocks
The calibration series is run with the blocks all face "up" and is
then repeated with the blocks all face "down", and the results
averaged. The difference between the two series can be large
compared to the repeatability standard deviation, in which case a
between-series component of variability must be included in the
calculation of the standard deviation of the reported average.
Calculation of
standard
deviations
when the
blocks are
measured in
two
orientations
For n blocks, the differences between the values for the blocks
measured in the top (denoted by "t") and bottom (denoted by "b")
positions are denoted by:
The standard deviation of the average (for each block) is calculated
from these differences to be:
Standard
deviations
when the
blocks are
measured in
only one
orientation
If the blocks are measured in only one orientation, there is no way
to estimate the between-series component of variability and the
standard deviation for the value of each block is computed as

s_test = K_1 s_1

where K_1 is shown under "Factors for computing repeatability
standard deviations" for each design and s_1 is the repeatability
standard deviation as estimated from the design.
standard deviation may seriously underestimate the uncertainty, a
better approach is to estimate the standard deviation from the data
on the check standard over time. An expanded uncertainty is
computed according to the ISO guidelines.
2.3.4.5.1. Design for 4 angle blocks
DESIGN MATRIX
1 1 1 1
Y(1) 0 1 -1 0
Y(2) -1 1 0 0
Y(3) 0 1 0 -1
Y(4) 0 -1 0 1
Y(5) -1 0 0 1
Y(6) 0 0 -1 1
Y(7) 0 0 1 -1
Y(8) -1 0 1 0
Y(9) 0 -1 1 0
REFERENCE +
CHECK STANDARD +
DEGREES OF FREEDOM = 6
SOLUTION MATRIX
DIVISOR = 24

OBSERVATIONS          1           1           1           1

Y(11)                 0   2.2723000  -5.0516438  -1.2206578
Y(12)                 0   9.3521166   7.3239479   7.3239479
Y(13)                 0   2.2723000  -1.2206578  -5.0516438
Y(21)                 0  -5.0516438  -1.2206578   2.2723000
Y(22)                 0   7.3239479   7.3239479   9.3521166
Y(23)                 0  -1.2206578  -5.0516438   2.2723000
Y(31)                 0  -1.2206578   2.2723000  -5.0516438
Y(32)                 0   7.3239479   9.3521166   7.3239479
Y(33)                 0  -5.0516438   2.2723000  -1.2206578

R*                    1.         1.          1.          1.
R* = VALUE OF REFERENCE ANGLE BLOCK
FACTORS FOR REPEATABILITY STANDARD DEVIATIONS
SIZE K1
1 1 1 1
1 0.0000 +
1 0.9749 +
1 0.9749 +
1 0.9749 +
1 0.9749 +
Explanation of notation and interpretation of tables
2.3.4.5.2. Design for 5 angle blocks
DESIGN MATRIX
1 1 1 1 1
0 1 -1 0 0
-1 1 0 0 0
0 1 0 -1 0
0 -1 0 0 1
-1 0 0 0 1
0 0 -1 0 1
0 0 0 1 -1
-1 0 0 1 0
0 -1 0 1 0
0 0 1 -1 0
-1 0 1 0 0
0 0 1 0 -1
REFERENCE +
CHECK STANDARD +
DEGREES OF FREEDOM = 8
SOLUTION MATRIX
DIVISOR = 24
OBSERVATIONS 1 1 1 1 1
Y(11) 0.00000 3.26463 -5.48893 -0.21200 -1.56370
Y(12) 0.00000 7.95672 5.38908 5.93802 4.71618
Y(13) 0.00000 2.48697 -0.89818 -4.80276 -0.78603
Y(21) 0.00000 -5.48893 -0.21200 -1.56370 3.26463
Y(22) 0.00000 5.38908 5.93802 4.71618 7.95672
Y(23) 0.00000 -0.89818 -4.80276 -0.78603 2.48697
Y(31) 0.00000 -0.21200 -1.56370 3.26463 -5.48893
Y(32) 0.00000 5.93802 4.71618 7.95672 5.38908
Y(33) 0.00000 -4.80276 -0.78603 2.48697 -0.89818
Y(41) 0.00000 -1.56370 3.26463 -5.48893 -0.21200
Y(42) 0.00000 4.71618 7.95672 5.38908 5.93802
Y(43) 0.00000 -0.78603 2.48697 -0.89818 -4.80276
R* 1. 1. 1. 1. 1.
R* = VALUE OF REFERENCE ANGLE BLOCK
FACTORS FOR REPEATABILITY STANDARD DEVIATIONS
SIZE K1
1 1 1 1 1
1 0.0000 +
1 0.7465 +
1 0.7465 +
1 0.7456 +
1 0.7456 +
1 0.7465 +
Explanation of notation and interpretation of tables
2.3.4.5.3. Design for 6 angle blocks
DESIGN MATRIX
1 1 1 1 1 1
0 1 -1 0 0 0
-1 1 0 0 0 0
0 1 0 -1 0 0
0 -1 0 0 0 1
-1 0 0 0 0 1
0 0 -1 0 0 1
0 0 0 0 1 -1
-1 0 0 0 1 0
0 -1 0 0 1 0
0 0 0 1 -1 0
-1 0 0 1 0 0
0 0 0 1 0 -1
0 0 1 -1 0 0
-1 0 1 0 0 0
0 0 1 0 -1 0
REFERENCE +
CHECK STANDARD +
DEGREES OF FREEDOM = 10
SOLUTION MATRIX
DIVISOR = 24

OBSERVATIONS       1        1        1        1        1        1

Y(11)         0.0000   3.2929  -5.2312  -0.7507  -0.6445  -0.6666
Y(12)         0.0000   6.9974   4.6324   4.6495   3.8668   3.8540
Y(13)         0.0000   3.2687  -0.7721  -5.2098  -0.6202  -0.6666
Y(21)         0.0000  -5.2312  -0.7507  -0.6445  -0.6666   3.2929
Y(22)         0.0000   4.6324   4.6495   3.8668   3.8540   6.9974
Y(23)         0.0000  -0.7721  -5.2098  -0.6202  -0.6666   3.2687
Y(31)         0.0000  -0.7507  -0.6445  -0.6666   3.2929  -5.2312
Y(32)         0.0000   4.6495   3.8668   3.8540   6.9974   4.6324
Y(33)         0.0000  -5.2098  -0.6202  -0.6666   3.2687  -0.7721
Y(41)         0.0000  -0.6445  -0.6666   3.2929  -5.2312  -0.7507
Y(42)         0.0000   3.8668   3.8540   6.9974   4.6324   4.6495
Y(43)         0.0000  -0.6202  -0.6666   3.2687  -0.7721  -5.2098
Y(51)         0.0000  -0.6666   3.2929  -5.2312  -0.7507  -0.6445
Y(52)         0.0000   3.8540   6.9974   4.6324   4.6495   3.8668
Y(53)         0.0000  -0.6666   3.2687  -0.7721  -5.2098  -0.6202

R*                1.       1.       1.       1.       1.       1.
R* = VALUE OF REFERENCE ANGLE BLOCK
FACTORS FOR REPEATABILITY STANDARD DEVIATIONS
SIZE K1
1 1 1 1 1 1
1 0.0000 +
1 0.7111 +
1 0.7111 +
1 0.7111 +
1 0.7111 +
1 0.7111 +
1 0.7111 +
Explanation of notation and interpretation of tables
2.3.4.6. Thermometers in a bath
Measurement
sequence
Calibration of liquid in glass thermometers is usually carried
out in a controlled bath where the temperature in the bath is
increased steadily over time to calibrate the thermometers
over their entire range. One way of accounting for the
temperature drift is to measure the temperature of the bath
with a standard resistance thermometer at the beginning,
middle and end of each run of K test thermometers. The test
thermometers themselves are measured twice during the run
in the following time sequence:
where R_1, R_2, R_3 represent the measurements on the
standard resistance thermometer and T_1, T_2, ..., T_K and
T'_1, T'_2, ..., T'_K represent the pair of measurements on the K test
thermometers.
Assumptions
regarding
temperature
The assumptions for the analysis are that:
Equal time intervals are maintained between
measurements on the test items.
Temperature increases by with each interval.
A temperature change of is allowed for the reading
of the resistance thermometer in the middle of the
run.
Indications
for test
thermometers
It can be shown (Cameron and Hailes) that the average
reading for a test thermometer is its indication at the
temperature implied by the average of the three resistance
readings. The standard deviation associated with this
indication is calculated from difference readings where
is the difference for the ith thermometer. This difference is
an estimate of .
Estimates of
drift
The estimates of the shift due to the resistance thermometer
and temperature drift are given by:
Standard
deviations
The residual variance is given by
.
The standard deviation of the indication assigned to the ith
test thermometer is
and the standard deviation for the estimates of shift and
drift are
respectively.
2.3.4.7. Humidity standards
Humidity
standards
The calibration of humidity standards usually involves the
comparison of reference weights with cylinders containing
moisture. The designs shown in this catalog are drift-
eliminating and may be suitable for artifacts other than
humidity cylinders.
List of
designs
2 reference weights and 3 cylinders
2.3.4.7.1. Drift-elimination design for 2 reference weights and 3 cylinders
OBSERVATIONS 1 1 1 1 1
Y(1) + -
Y(2) + -
Y(3) + -
Y(4) + -
Y(5) - +
Y(6) - +
Y(7) + -
Y(8) + -
Y(9) - +
Y(10) + -
RESTRAINT + +
CHECK STANDARD + -
DEGREES OF FREEDOM = 6
SOLUTION MATRIX
DIVISOR = 10
OBSERVATIONS 1 1 1 1 1
Y(1) 2 -2 0 0 0
Y(2) 0 0 0 2 -2
Y(3) 0 0 2 -2 0
Y(4) -1 1 -3 -1 -1
Y(5) -1 1 1 1 3
Y(6) -1 1 1 3 1
Y(7) 0 0 2 0 -2
Y(8) -1 1 -1 -3 -1
Y(9) 1 -1 1 1 3
Y(10) 1 -1 -3 -1 -1
R* 5 5 5 5 5
R* = average value of the two reference weights
FACTORS FOR REPEATABILITY STANDARD DEVIATIONS
WT K1 1 1 1 1 1
1 0.5477 +
1 0.5477 +
1 0.5477 +
2 0.8944 + +
3 1.2247 + + +
0 0.6325 + -
Explanation of notation and interpretation of tables
2.3.5. Control of artifact calibration
Purpose The purpose of statistical control in the calibration process is
to guarantee the 'goodness' of calibration results within
predictable limits and to validate the statement of uncertainty
of the result. Two types of control can be imposed on a
calibration process that makes use of statistical designs:
1. Control of instrument precision or short-term variability
2. Control of bias and long-term variability
Example of a Shewhart control chart
Example of an EWMA control chart
Short-term
standard
deviation
The short-term standard deviation from each design is the
basis for controlling instrument precision. Because the
measurements for a single design are completed in a short
time span, this standard deviation estimates the basic precision
of the instrument. Designs should be chosen to have enough
measurements so that the standard deviation from the design
has at least 3 degrees of freedom where the degrees of
freedom are (n - m + 1) with
n = number of difference measurements
m = number of artifacts.
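For example, the 1,1,1,1 design used in the mass calibration example later in
this section has n = 6 difference measurements on m = 4 artifacts, giving
6 - 4 + 1 = 3 degrees of freedom for each run.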
Check
standard
Measurements on a check standard provide the mechanism for
controlling the bias and long-term variability of the calibration
process. The check standard is treated as one of the test items
in the calibration design, and its value as computed from each
calibration run is the basis for accepting or rejecting the
calibration. All designs cataloged in this Handbook have
provision for a check standard.
The check standard should be of the same type and geometry
as items that are measured in the designs. These artifacts must
be stable and available to the calibration process on a
continuing basis. There should be a check standard at each
critical level of measurement. For example, for mass
calibrations there should be check standards at the 1 kg; 100 g,
10 g, 1 g, 0.1 g levels, etc. For gage blocks, there should be
check standards at all nominal lengths.
A check standard can also be a mathematical construction,
such as the computed difference between the calibrated values
of two reference standards in a design.
Database
of check
standard
values
The creation and maintenance of the database of check
standard values is an important aspect of the control process.
The results from each calibration run are recorded in the
database. The best way to record this information is in one file
with one line (row in a spreadsheet) of information in fixed
fields for each calibration run. A list of typical entries follows:
1. Date
2. Identification for check standard
3. Identification for the calibration design
4. Identification for the instrument
5. Check standard value
6. Repeatability standard deviation from design
7. Degrees of freedom
8. Operator identification
9. Flag for out-of-control signal
10. Environmental readings (if pertinent)
2.3.5.1. Control of precision
Control
parameters
from
historical
data
A modified control chart procedure is used for controlling
instrument precision. The procedure is designed to be
implemented in real time after a baseline and control limit for
the instrument of interest have been established from the
database of short-term standard deviations. A separate control
chart is required for each instrument -- except where
instruments are of the same type with the same basic
precision, in which case they can be treated as one.
The baseline is the process standard deviation that is pooled
from the k = 1, ..., K individual repeatability standard deviations,
s_k, in the database, each having ν_k degrees of freedom. The
pooled repeatability standard deviation is

s_1 = √[ Σ_{k=1}^{K} ν_k s_k² / Σ_{k=1}^{K} ν_k ]

with degrees of freedom

ν_1 = Σ_{k=1}^{K} ν_k .
Control
procedure
is invoked
in real-
time for
each
calibration
run
The control procedure compares each new repeatability
standard deviation that is recorded for the instrument with an
upper control limit, UCL. Usually, only the upper control limit
is of interest because we are primarily interested in detecting
degradation in the instrument's precision. A possible
complication is that the control limit is dependent on the
degrees of freedom in the new standard deviation and is
computed as follows:

UCL = s_1 √[ F_α(ν_new, ν_1) ] .
The quantity under the radical is the upper α percentage point
from the F table where α is chosen small, say, 0.05. The
other two terms refer to the degrees of freedom in the new
standard deviation and the degrees of freedom in the process
standard deviation.
Limitation
of
graphical
method
The graphical method of plotting every new estimate of
repeatability on a control chart does not work well when the
UCL can change with each calibration design, depending on
the degrees of freedom. The algebraic equivalent is to test if
the new standard deviation exceeds its control limit, in which
case the short-term precision is judged to be out of control
and the current calibration run is rejected. For more guidance,
see Remedies and strategies for dealing with out-of-control
signals.
As long as the repeatability standard deviations are in control,
there is reason for confidence that the precision of the
instrument has not degraded.
Case
study:
Mass
balance
precision
It is recommended that the repeatability standard deviations be
plotted against time on a regular basis to check for gradual
degradation in the instrument. Individual failures may not
trigger a suspicion that the instrument is in need of adjustment
or tuning.
2.3.5.1.1. Example of control chart for precision
Example of a
control chart
for precision
of a mass
balance
Mass calibrations usually start with the comparison of kilogram standards using a high
precision balance as a comparator. Many of the measurements at the kilogram level that were
made at NIST between 1975 and 1989 were made on balance #12 using a 1,1,1,1 calibration
design. The redundancy in the calibration design produces estimates for the individual
kilograms and a repeatability standard deviation with three degrees of freedom for each
calibration run. These standard deviations estimate the precision of the balance.
Need for
monitoring
precision
The precision of the balance is monitored to check for:
1. Slow degradation in the balance
2. Anomalous behavior at specific times
Monitoring
technique for
standard
deviations
The standard deviations over time and many calibrations are tracked and monitored using a
control chart for standard deviations. The database and control limits are updated on a yearly
or bi-yearly basis and standard deviations for each calibration run in the next cycle are
compared with the control limits. In this case, the standard deviations from 117 calibrations
between 1975 and 1985 were pooled to obtain a repeatability standard deviation with v =
3*117 = 351 degrees of freedom, and the control limits were computed at the 1 %
significance level.
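As a rough illustration of how such a limit could be computed in R (the pooled
value of s_1 below is hypothetical; the F-based form of the limit is the one
sketched in the previous section):

  ## hypothetical pooled repeatability standard deviation and its degrees of freedom
  s1    <- 0.020      # mg (hypothetical value)
  nu1   <- 351        # 3 degrees of freedom from each of 117 runs
  nunew <- 3          # degrees of freedom in each new standard deviation
  alpha <- 0.01       # significance level used for the control limit
  ## upper control limit for a newly recorded repeatability standard deviation
  UCL <- s1 * sqrt(qf(1 - alpha, nunew, nu1))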
Control chart
for precision
The following control chart for precision for balance #12 can be generated using both
Dataplot code and R code.
Interpretation
of the control
chart
The control chart shows that the precision of the balance remained in control through the first
five months of 1988 with only two violations of the control limits. For those occasions, the
calibrations were discarded and repeated. Clearly, for the second violation, something
significant occurred that invalidated the calibration results.
Further
interpretation
of the control
chart
However, it is also clear from the pattern of standard deviations over time that the precision of
the balance was gradually degrading and more and more points were approaching the control
limits. This finding led to a decision to replace this balance for high accuracy calibrations.
2.3.5.2. Control of bias and long-term variability
Control
parameters
are estimated
using
historical
data
A control chart procedure is used for controlling bias and
long-term variability. The procedure is designed to be
implemented in real time after a baseline and control limits
for the check standard of interest have been established
from the database of check standard values. A separate
control chart is required for each check standard. The
control procedure outlined here is based on a Shewhart
control chart with upper and lower control limits that are
symmetric about the average. The EWMA control
procedure that is sensitive to small changes in the process is
discussed on another page.
For a
Shewhart
control
procedure,
the average
and standard
deviation of
historical
check
standard
values are
the
parameters of
interest
The check standard values are denoted by

C_1, C_2, ..., C_K

The baseline is the process average which is computed from
the check standard values as

C̄ = (1/K) Σ_{k=1}^{K} C_k

The process standard deviation is

s_C = √[ Σ_{k=1}^{K} (C_k - C̄)² / (K - 1) ]

with K - 1 degrees of freedom.
The control
limits depend
on the t
distribution
and the
degrees of
freedom in
the process
standard
deviation
If the process standard deviation has been computed from historical data, the upper and
lower control limits are:

UCL = C̄ + t_{1-α/2, K-1} s_C
LCL = C̄ - t_{1-α/2, K-1} s_C

where t_{1-α/2, K-1} denotes the 1-α/2 critical value from the t
table with ν = K - 1 degrees of freedom.
Sample code Sample code for computing the t value for a conservative
case where α = 0.05, J = 6, and K = 6, is available for both
Dataplot and R.
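A minimal R sketch of this kind of computation is given below; the check
standard summary values are hypothetical, and the degrees of freedom are taken
as K - 1 per the formula above (the handbook's own sample code may define them
differently in terms of J and K):

  alpha <- 0.05
  K     <- 6
  Cbar  <- -0.074     # hypothetical average of historical check standard values (mg)
  sC    <- 0.0312     # hypothetical process standard deviation (mg)
  tcrit <- qt(1 - alpha/2, df = K - 1)   # two-sided critical value from the t table
  UCL   <- Cbar + tcrit * sC
  LCL   <- Cbar - tcrit * sC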
Simplification
for large
degrees of
freedom
It is standard practice to use a value of 3 instead of a
critical value from the t table when the process standard
deviation has large degrees of freedom, say, ν > 15.
The control
procedure is
invoked in
real-time and
a failure
implies that
the current
calibration
should be
rejected
The control procedure compares the check standard value,
C, from each calibration run with the upper and lower
control limits. This procedure should be implemented in
real time and does not necessarily require a graphical
presentation. The check standard value can be compared
algebraically with the control limits. The calibration run is
judged to be out-of-control if either:
C > UCL
or
C < LCL
Actions to be
taken
If the check standard value exceeds one of the control
limits, the process is judged to be out of control and the
current calibration run is rejected. The best strategy in this
situation is to repeat the calibration to see if the failure was
a chance occurrence. Check standard values that remain in
control, especially over a period of time, provide
confidence that no new biases have been introduced into the
measurement process and that the long-term variability of
the process has not changed.
Out-of-
control
signals that
recur require
investigation
Out-of-control signals, particularly if they recur, can be
symptomatic of one of the following conditions:
Change or damage to the reference standard(s)
Change or damage to the check standard
Change in the long-term variability of the calibration
process
For more guidance, see Remedies and strategies for dealing
with out-of-control signals.
Caution - be
sure to plot
the data
If the tests for control are carried out algebraically, it is
recommended that, at regular intervals, the check standard
values be plotted against time to check for drift or
anomalies in the measurement process.
2.3.5.2.1. Example of Shewhart control chart for mass calibrations
Example of a
control chart
for mass
calibrations
at the
kilogram
level
Mass calibrations usually start with the comparison of four kilogram standards using a high
precision balance as a comparator. Many of the measurements at the kilogram level that were
made at NIST between 1975 and 1989 were made on balance #12 using a 1,1,1,1 calibration
design. The restraint for this design is the known average of two kilogram reference standards.
The redundancy in the calibration design produces individual estimates for the two test
kilograms and the two reference standards.
Check
standard
There is no slot in the 1,1,1,1 design for an artifact check standard when the first two
kilograms are reference standards; the third kilogram is a test weight; and the fourth is a
summation of smaller weights that act as the restraint in the next series. Therefore, the check
standard is a computed difference between the values of the two reference standards as
estimated from the design. The convention with mass calibrations is to report the correction to
nominal, in this case the correction to 1000 g, as shown in the control charts below.
Need for
monitoring
The kilogram check standard is monitored to check for:
1. Long-term degradation in the calibration process
2. Anomalous behavior at specific times
Monitoring
technique for
check
standard
values
Check standard values over time and many calibrations are tracked and monitored using a
Shewhart control chart. The database and control limits are updated when needed and check
standard values for each calibration run in the next cycle are compared with the control limits.
In this case, the values from 117 calibrations between 1975 and 1985 were averaged to obtain
a baseline and process standard deviation with v = 116 degrees of freedom. Control limits are
computed with a factor of
k = 3 to identify truly anomalous data points.
Control chart
of kilogram
check
standard
measurements
showing a
change in the
process after
1985
Interpretation
of the control
chart
The control chart shows only two violations of the control limits. For those occasions, the
calibrations were discarded and repeated. The configuration of points is unacceptable if many
points are close to a control limit and there is an unequal distribution of data points on the two
sides of the control chart -- indicating a change in either:
process average which may be related to a change in the reference standards
or
variability which may be caused by a change in the instrument precision or may be the
result of other factors on the measurement process.
Small
changes only
become
obvious over
time
Unfortunately, it takes time for the patterns in the data to emerge because individual violations
of the control limits do not necessarily point to a permanent shift in the process. The Shewhart
control chart is not powerful for detecting small changes, say of the order of at most one
standard deviation, which appears to be approximately the case in this application. This level
of change might seem insignificant, but the calculation of uncertainties for the calibration
process depends on the control limits.
Re-
establishing
the limits
based on
recent data
and EWMA
option
If the limits for the control chart are re-calculated based on the data after 1985, the extent of
the change is obvious. Because the exponentially weighted moving average (EWMA) control
chart is capable of detecting small changes, it may be a better choice for a high precision
process that is producing many control values.
Revised
control chart
based on
check
standard
measurements
after 1985
Sample code The original and revised Shewhart control charts can be generated using both Dataplot code
and R code.
2.3.5.2.2. Example of EWMA control chart for mass calibrations
Small
changes only
become
obvious over
time
Unfortunately, it takes time for the patterns in the data to emerge because individual violations
of the control limits do not necessarily point to a permanent shift in the process. The Shewhart
control chart is not powerful for detecting small changes, say of the order of at most one
standard deviation, which appears to be the case for the calibration data shown on the
previous page. The EWMA (exponentially weighted moving average) control chart is better
suited for this purpose.
Explanation
of EWMA
statistic at
the kilogram
level
The exponentially weighted moving average (EWMA) is a statistic for monitoring the process
that averages the data in a way that gives less and less weight to data as they are further
removed in time from the current measurement. The EWMA statistic at time t is computed
recursively from individual data points Y_t, which are ordered in time, to be

EWMA_t = λ Y_t + (1 - λ) EWMA_{t-1},   t = 1, 2, ..., n

where the starting value, EWMA_0, is the average of historical data.
Control
mechanism
for EWMA
The EWMA control chart can be made sensitive to small changes or a gradual drift in the
process by the choice of the weighting factor, λ. A weighting factor between 0.2 and 0.3 has
been suggested for this purpose (Hunter), and 0.15 is another popular choice.
Limits for the
control chart
The target or center line for the control chart is the average of historical data. The upper
(UCL) and lower (LCL) limits are

UCL = target + k s √[ λ / (2 - λ) ]
LCL = target - k s √[ λ / (2 - λ) ]

where s is the standard deviation of the historical data; the function under the radical is a good
approximation to the component of the standard deviation of the EWMA statistic that is a
function of time; and k is the multiplicative factor, defined in the same manner as for the
Shewhart control chart, which is usually taken to be 3.
Example of
EWMA chart
for check
standard data
for kilogram
calibrations
showing
multiple
violations of
the control
limits for the
EWMA
statistics
The target (average) and process standard deviation are computed from the check standard
data taken prior to 1985. The computation of the EWMA statistic begins with the data taken at
the start of 1985. In the control chart below, the control data after 1985 are shown in green,
and the EWMA statistics are shown as black dots superimposed on the raw data. The control
limits are calculated according to the equation above where the process standard deviation, s
= 0.03065 mg and k = 3. The EWMA statistics, and not the raw data, are of interest in looking
for out-of-control signals. Because the EWMA statistic is a weighted average, it has a smaller
standard deviation than a single control measurement, and, therefore, the EWMA control
limits are narrower than the limits for a Shewhart control chart.
The EWMA control chart for mass calibrations can be generated using both Dataplot code and
R code.
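A minimal R sketch of the EWMA computation is shown below; the check standard
values and the target are hypothetical, while s = 0.03065 mg and k = 3 are the
values quoted above and lambda = 0.2 is one choice from the suggested range:

  lambda <- 0.2
  k      <- 3
  s      <- 0.03065                                # process standard deviation (mg)
  target <- -0.074                                 # hypothetical average of historical data (mg)
  y      <- c(-0.07, -0.12, -0.10, -0.14, -0.11)   # hypothetical new check standard values (mg)

  ewma    <- numeric(length(y))
  ewma[1] <- lambda * y[1] + (1 - lambda) * target # recursion starts from the historical average
  for (t in 2:length(y)) {
    ewma[t] <- lambda * y[t] + (1 - lambda) * ewma[t - 1]
  }

  UCL <- target + k * s * sqrt(lambda / (2 - lambda))
  LCL <- target - k * s * sqrt(lambda / (2 - lambda))
  out_of_control <- ewma > UCL | ewma < LCL        # flags for the EWMA statistics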
Interpretation
of the control
chart
The EWMA control chart shows many violations of the control limits starting at
approximately the mid-point of 1986. This pattern emerges because the process average has
actually shifted about one standard deviation, and the EWMA control chart is sensitive to
small changes.
2.3.6. Instrument calibration over a regime
Topics This section discusses the creation of a calibration curve for calibrating instruments (gauges)
whose responses cover a large range. Topics are:
Models for instrument calibration
Data collection
Assumptions
Conditions that can invalidate the calibration procedure
Data analysis and model validation
Calibration of future measurements
Uncertainties of calibrated values
Purpose of
instrument
calibration
Instrument calibration is intended to eliminate or reduce bias in an instrument's readings over
a range for all continuous values. For this purpose, reference standards with known values for
selected points covering the range of interest are measured with the instrument in question.
Then a functional relationship is established between the values of the standards and the
corresponding measurements. There are two basic situations.
Instruments
which require
correction for
bias
The instrument reads in the same units as the reference standards. The purpose of the
calibration is to identify and eliminate any bias in the instrument relative to the defined
unit of measurement. For example, optical imaging systems that measure the width of
lines on semiconductors read in micrometers, the unit of interest. Nonetheless, these
instruments must be calibrated to values of reference standards if line width
measurements across the industry are to agree with each other.
Instruments
whose
measurements
act as
surrogates for
other
measurements
The instrument reads in different units than the reference standards. The purpose of the
calibration is to convert the instrument readings to the units of interest. An example is
densitometer measurements that act as surrogates for measurements of radiation dosage.
For this purpose, reference standards are irradiated at several dosage levels and then
measured by radiometry. The same reference standards are measured by densitometer.
The calibrated results of future densitometer readings on medical devices are the basis
for deciding if the devices have been sterilized at the proper radiation level.
Basic steps
for correcting
the
instrument for
bias
The calibration method is the same for both situations and requires the following basic steps:
Selection of reference standards with known values to cover the range of interest.
Measurements on the reference standards with the instrument to be calibrated.
Functional relationship between the measured and known values of the reference
standards (usually a least-squares fit to the data) called a calibration curve.
Correction of all measurements by the inverse of the calibration curve.
Schematic
example of a
calibration
curve and
resulting
value
A schematic explanation is provided by the figure below for load cell calibration. The load cell
measurements (shown as *) are plotted on the y-axis against the corresponding values of
known load shown on the x-axis.
A quadratic fit to the load cell data produces the calibration curve that is shown as the solid
line. For a future measurement with the load cell, Y' = 1.344 on the y-axis, a dotted line is
drawn through Y' parallel to the x-axis. At the point where it intersects the calibration curve,
another dotted line is drawn parallel to the y-axis. Its point of intersection with the x-axis at X'
= 13.417 is the calibrated value.
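A minimal R sketch of this procedure is given below; the load and response
values are hypothetical, and the inversion is done numerically with uniroot,
which is one way to apply the inverse of a fitted quadratic:

  ## hypothetical calibration data: known loads (x) and load cell readings (y)
  load     <- c(2, 4, 6, 8, 10, 12, 14, 16, 18, 20)
  response <- c(0.20, 0.40, 0.61, 0.81, 1.01, 1.22, 1.42, 1.63, 1.84, 2.04)

  fit <- lm(response ~ load + I(load^2))   # quadratic calibration curve

  ## calibrated value for a future reading Y': find X' where the curve equals Y'
  yprime <- 1.344
  f      <- function(x) predict(fit, newdata = data.frame(load = x)) - yprime
  xprime <- uniroot(f, interval = range(load))$root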
2.3.6.1. Models for instrument calibration
Notation The following notation is used in this chapter in discussing
models for calibration curves.
Y denotes a measurement on a reference standard
X denotes the known value of a reference standard
ε denotes measurement error.
a, b and c denote coefficients to be determined
Possible forms
for calibration
curves
There are several models for calibration curves that can be
considered for instrument calibration. They fall into the
following classes:
Linear: Y = a + bX + ε
Quadratic: Y = a + bX + cX² + ε
Power: Y = a X^b ε
Non-linear: Y = g(X) + ε, where g(X) is a function that is non-linear in the coefficients
Special case
of linear
model - no
calibration
required
An instrument requires no calibration if
a = 0 and b = 1,
i.e., if measurements on the reference standards agree with
their known values, given an allowance for measurement
error, the instrument is already calibrated. Guidance on
collecting data, estimating and testing the coefficients is
given on other pages.
Advantages of
the linear
model
The linear model (ISO 11095) is widely applied to
instrument calibration because it has several advantages
over more complicated models.
Computation of coefficients and standard deviations
is easy.
Correction for bias is easy.
There is often a theoretical basis for the model.
The analysis of uncertainty is tractable.
Warning on
excluding the
intercept term
from the
model
It is often tempting to exclude the intercept, a, from the
model because a zero stimulus on the x-axis should lead to
a zero response on the y-axis. However, the correct
procedure is to fit the full model and test for the
significance of the intercept term.
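A minimal R sketch of that procedure (with hypothetical data) is shown below;
the t statistic and p-value for the intercept appear in the "(Intercept)" row
of the coefficient table:

  x <- c(1, 2, 3, 4, 5, 6)                    # known values of the reference standards
  y <- c(1.02, 1.98, 3.05, 3.99, 5.03, 6.01)  # instrument readings (hypothetical)
  fit <- lm(y ~ x)                            # full linear model, intercept included
  summary(fit)$coefficients                   # test the significance of the intercept term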
Quadratic
model and
higher order
polynomials
Responses of instruments or measurement systems which
cannot be linearized, and for which no theoretical model
exists, can sometimes be described by a quadratic model
(or higher-order polynomial). An example is a load cell
where force exerted on the cell is a non-linear function of
load.
Disadvantages
of quadratic
models
Disadvantages of quadratic and higher-order polynomials
are:
They may require more reference standards to
capture the region of curvature.
There is rarely a theoretical justification; however,
the adequacy of the model can be tested statistically.
The correction for bias is more complicated than for
the linear model.
The uncertainty analysis is difficult.
Warning A plot of the data, although always recommended, is not
sufficient for identifying the correct model for the
calibration curve. Instrument responses may not appear
non-linear over a large interval. If the response and the
known values are in the same units, differences from the
known values should be plotted versus the known values.
Power model
treated as a
linear model
The power model is appropriate when the measurement
error is proportional to the response rather than being
additive. It is frequently used for calibrating instruments
that measure dosage levels of irradiated materials.
The power model is a special case of a non-linear model
that can be linearized by a natural logarithm
transformation to

ln(Y) = ln(a) + b ln(X) + ln(ε)

so that the model to be fit to the data is of the familiar
linear form

W = a' + bZ + e

where W, Z and e are the transforms of the variables, Y, X
and the measurement error, respectively, and a' is the
natural logarithm of a.
Non-linear
models and
their
limitations
Instruments whose responses are not linear in the
coefficients can sometimes be described by non-linear
models. In some cases, there are theoretical foundations for
the models; in other cases, the models are developed by
trial and error. Two classes of non-linear functions that
have been shown to have practical value as calibration
functions are:
1. Exponential
2. Rational
Non-linear models are an important class of calibration
models, but they have several significant limitations.
The model itself may be difficult to ascertain and
verify.
There can be severe computational difficulties in
estimating the coefficients.
Correction for bias cannot be applied algebraically
and can only be approximated by interpolation.
Uncertainty analysis is very difficult.
Example of an
exponential
function
An exponential function is shown in the equation below.
Instruments for measuring the ultrasonic response of
reference standards with various levels of defects (holes)
that are submerged in a fluid are described by this
function.
Example of a
rational
function
A rational function is shown in the equation below.
Scanning electron microscope measurements of line widths
on semiconductors are described by this function (Kirby).
2.3.6.2. Data collection
http://www.itl.nist.gov/div898/handbook/mpc/section3/mpc362.htm[6/27/2012 1:51:20 PM]
2. Measurement Process Characterization
2.3. Calibration
2.3.6. Instrument calibration over a regime
2.3.6.2. Data collection
Data
collection
The process of collecting data for creating the calibration
curve is critical to the success of the calibration program.
General rules for designing calibration experiments apply, and
guidelines that are adequate for the calibration models in this
chapter are given below.
Selection
of
reference
standards
A minimum of five reference standards is required for a linear
calibration curve, and ten reference standards should be
adequate for more complicated calibration models.
The optimal strategy in selecting the reference standards is to
space the reference standards at points corresponding to equal
increments on the y-axis, covering the range of the instrument.
Frequently, this strategy is not realistic because the person
producing the reference materials is often not the same as the
person who is creating the calibration curve. Spacing the
reference standards at equal intervals on the x-axis is a good
alternative.
Exception
to the rule
above -
bracketing
If the instrument is not to be calibrated over its entire range,
but only over a very short range for a specific application,
then it may not be necessary to develop a complete calibration
curve, and a bracketing technique (ISO 11095) will provide
satisfactory results. The bracketing technique assumes that the
instrument is linear over the interval of interest, and, in this
case, only two reference standards are required -- one at each
end of the interval.
Number of
repetitions
on each
reference
standard
A minimum of two measurements on each reference standard
is required and four is recommended. The repetitions should
be separated in time by days or weeks. These repetitions
provide the data for determining whether a candidate model is
adequate for calibrating the instrument.
2.3.6.3. Assumptions for instrument calibration
http://www.itl.nist.gov/div898/handbook/mpc/section3/mpc363.htm[6/27/2012 1:51:21 PM]
2. Measurement Process Characterization
2.3. Calibration
2.3.6. Instrument calibration over a regime
2.3.6.3. Assumptions for instrument calibration
Assumption
regarding
reference
values
The basic assumption regarding the reference values of
artifacts that are measured in the calibration experiment is
that they are known without error. In reality, this condition
is rarely met because these values themselves usually come
from a measurement process. Systematic errors in the
reference values will always bias the results, and random
errors in the reference values can bias the results.
Rule of
thumb
It has been shown by Bruce Hoadley, in an internal NIST
publication, that the best way to mitigate the effect of
random fluctuations in the reference values is to plan for a
large spread of values on the x-axis relative to the precision
of the instrument.
Assumptions
regarding
measurement
errors
The basic assumptions regarding measurement errors
associated with the instrument are that they are:
free from outliers
independent
of equal precision
from a normal distribution.
2.3.6.4. What can go wrong with the calibration procedure
http://www.itl.nist.gov/div898/handbook/mpc/section3/mpc364.htm[6/27/2012 1:51:21 PM]
2. Measurement Process Characterization
2.3. Calibration
2.3.6. Instrument calibration over a regime
2.3.6.4. What can go wrong with the calibration
procedure
Calibration
procedure
may fail to
eliminate
bias
There are several circumstances where the calibration curve
will not reduce or eliminate bias as intended. Some are
discussed on this page. A critical exploratory analysis of the
calibration data should expose such problems.
Lack of
precision
Poor instrument precision or unsuspected day-to-day effects
may result in standard deviations that are large enough to
jeopardize the calibration. There is nothing intrinsic to the
calibration procedure that will improve precision, and the best
strategy, before committing to a particular instrument, is to
estimate the instrument's precision in the environment of
interest to decide if it is good enough for the precision
required.
Outliers in
the
calibration
data
Outliers in the calibration data can seriously distort the
calibration curve, particularly if they lie near one of the
endpoints of the calibration interval.
Isolated outliers (single points) should be deleted from
the calibration data.
An entire day's results which are inconsistent with the
other data should be examined and rectified before
proceeding with the analysis.
Systematic
differences
among
operators
It is possible for different operators to produce measurements
with biases that differ in sign and magnitude. This is not
usually a problem for automated instrumentation, but for
instruments that depend on line of sight, results may differ
significantly by operator. To diagnose this problem,
measurements by different operators on the same artifacts are
plotted and compared. Small differences among operators can
be accepted as part of the imprecision of the measurement
process, but large systematic differences among operators
require resolution. Possible solutions are to retrain the
operators or maintain separate calibration curves by operator.
Lack of
system
control
The calibration procedure, once established, relies on the
instrument continuing to respond in the same way over time.
If the system drifts or takes unpredictable excursions, the
calibrated values may not be properly corrected for bias, and
depending on the direction of change, the calibration may
further degrade the accuracy of the measurements. To assure
that future measurements are properly corrected for bias, the
calibration procedure should be coupled with a statistical
control procedure for the instrument.
Example of
differences
among
repetitions
in the
calibration
data
An important point, but one that is rarely considered, is that
there can be differences in responses from repetition to
repetition that will invalidate the analysis. A plot of the
aggregate of the calibration data may not identify changes in
the instrument response from day-to-day. What is needed is a
plot of the fine structure of the data that exposes any day to
day differences in the calibration data.
Warning -
calibration
can fail
because of
day-to-day
changes
A straight-line fit to the aggregate data will produce a
'calibration curve'. However, if straight lines fit separately to
each day's measurements show very disparate responses, the
instrument, at best, will require calibration on a daily basis
and, at worst, may be sufficiently lacking in control to be
unusable.
2.3.6.4.1. Example of day-to-day changes in calibration
http://www.itl.nist.gov/div898/handbook/mpc/section3/mpc3641.htm[6/27/2012 1:51:22 PM]
2. Measurement Process Characterization
2.3. Calibration
2.3.6. Instrument calibration over a regime
2.3.6.4. What can go wrong with the calibration procedure
2.3.6.4.1. Example of day-to-day changes in calibration
Calibration
data over 4
days
Line width measurements on 10 NIST reference standards were made
with an optical imaging system on each of four days. The four data points
for each reference value appear to overlap in the plot because of the wide
spread in reference values relative to the precision. The plot suggests that
a linear calibration line is appropriate for calibrating the imaging system.
This plot
shows
measurements
made on 10
reference
materials
repeated on
four days with
the 4 points
for each day
overlapping
[Plot: measurements on the 10 reference standards versus reference values (μm)]
This plot
shows the
differences
between each
measurement
and the
corresponding
reference
value.
Because days are not identified, the plot gives no indication of
problems in the control of the imaging system from day to day.
[Plot: differences between measurements and reference values versus reference values (μm)]
This plot, with
linear
calibration
lines fit to
each day's
measurements
individually,
shows how
the response
of the imaging
system
changes
dramatically
from day to
day. Notice
that the slope
of the
calibration
line goes from
positive on
day 1 to
negative on
day 3.
[Plot: daily linear calibration fits versus reference values (μm)]
Interpretation
of calibration
findings
Given the lack of control for this measurement process, any calibration
procedure built on the average of the calibration data will fail to properly
correct the system on some days and invalidate resulting measurements.
There is no good solution to this problem except daily calibration.
2.3.6.5. Data analysis and model validation
http://www.itl.nist.gov/div898/handbook/mpc/section3/mpc365.htm[6/27/2012 1:51:23 PM]
2. Measurement Process Characterization
2.3. Calibration
2.3.6. Instrument calibration over a regime
2.3.6.5. Data analysis and model validation
First step -
plot the
calibration
data
If the model for the calibration curve is not known from
theoretical considerations or experience, it is necessary to
identify and validate a model for the calibration curve. To
begin this process, the calibration data are plotted as a
function of known values of the reference standards; this
plot should suggest a candidate model for describing the
data. A linear model should always be a consideration. If
the responses and their known values are in the same units,
a plot of differences between responses and known values
is more informative than a plot of the data for exposing
structure in the data.
Warning -
regarding
statistical
software
Once an initial model has been chosen, the coefficients in
the model are estimated from the data using a statistical
software package. It is impossible to over-emphasize the
importance of using reliable and documented software for
this analysis.
Output
required from
a software
package
The software package will use the method of least squares
for estimating the coefficients. The software package
should also be capable of performing a 'weighted' fit for
situations where errors of measurement are non-constant
over the calibration interval. The choice of weights is
usually the responsibility of the user. The software
package should, at the minimum, provide the following
information:
Coefficients of the calibration curve
Standard deviations of the coefficients
Residual standard deviation of the fit
F-ratio for goodness of fit (if there are repetitions on
the y-axis at each reference value)
Typical
analysis of a
quadratic fit
Load cell measurements are modeled as a quadratic
function of known loads as shown below. There are three
repetitions at each load level for a total of 33
measurements.
Parameter estimates for model y = a + b*x + c*x*x + e:

Parameter   Estimate     Std. Error   t-value      Pr(>|t|)
a           -1.840e-05   2.451e-05        -0.751   0.459
b            1.001e-01   4.839e-06     20687.891   <2e-16
c            7.032e-06   2.014e-07        34.922   <2e-16

Residual standard error = 3.764e-05 (30 degrees of freedom)
Multiple R-squared = 1
Adjusted R-squared = 1

Analysis of variance table:

Source of       Degrees of   Sum of       Mean         F-Ratio    Pr(>F)
Variation       Freedom      Squares      Square
Model               2        12.695       6.3475       4.48e+09   <2.2e-16
Residual           30        4.2504e-08   1.4170e-09
(Lack of fit)       8        4.7700e-09   5.9625e-10   0.3477     0.9368
(Pure error)       22        3.7733e-08   1.7151e-09
Total              32        12.695
The analyses shown above can be reproduced using
Dataplot code and R code.
Note: Dataplot reports a probability associated with the F-
ratio (for example, 6.334 % for the lack-of-fit test), where
a probability greater than 95 % indicates an F-ratio that is
significant at the 5 % level. R reports a p-value that
corresponds to the probability greater than the F-ratio, so a
value less than 0.05 would indicate significance at the 5 %
level. Other software may report in other ways; therefore,
it is necessary to check the interpretation for each package.
The F-ratio is
used to test
the goodness
of the fit to
the data
The F-ratio provides information on the model as a good
descriptor of the data. The F-ratio is compared with a
critical value from the F-table. An F-ratio smaller than the
critical value indicates that all significant structure has
been captured by the model.
F-ratio < 1
always
indicates a
good fit
For the load cell analysis, a plot of the data suggests a
linear fit. However, the linear fit gives a very large F-ratio.
For the quadratic fit, the F-ratio is 0.3477 with ν1 = 8 and
ν2 = 22 degrees of freedom. The critical value of
F(0.05, 8, 22) ≈ 2.40 indicates that the quadratic function is
sufficient for describing the data. A fact to keep in mind is
that an F-ratio < 1 does not need to be checked against a
critical value; it always indicates a good fit to the data.
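As a minimal sketch, the critical value and the lack-of-fit p-value quoted above can be checked in R:

# Lack-of-fit F-ratio and degrees of freedom from the quadratic fit
F.ratio <- 0.3477
nu1 <- 8        # lack-of-fit degrees of freedom
nu2 <- 22       # pure-error degrees of freedom

qf(0.95, nu1, nu2)         # critical value, approximately 2.40
1 - pf(F.ratio, nu1, nu2)  # p-value, approximately 0.94 (no evidence of lack of fit)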
The t-values
are used to
test the
significance of
individual
coefficients
The t-values can be compared with critical values from a
t-table. However, for a test at the 5 % significance level, a
t-value < 2 is a good indicator of non-significance. The
t-value for the intercept term, a, is < 2, indicating that the
intercept term is not significantly different from zero. The
t-values for the linear and quadratic terms are significant,
indicating that these coefficients are needed in the model.
If the intercept is dropped from the model, the analysis is
repeated to obtain new estimates for the coefficients, b and
c.
Residual
standard
deviation
The residual standard deviation estimates the standard
deviation of a single measurement with the load cell.
Further
considerations
and tests of
assumptions
The residuals (differences between the measurements and
their fitted values) from the fit should also be examined for
outliers and structure that might invalidate the calibration
curve. They are also a good indicator of whether basic
assumptions of normality and equal precision for all
measurements are valid.
If the initial model proves inappropriate for the data, a
strategy for improving the model is followed.
2.3.6.5.1. Data on load cell #32066
http://www.itl.nist.gov/div898/handbook/mpc/section3/mpc3651.htm[6/27/2012 1:51:23 PM]
2. Measurement Process Characterization
2.3. Calibration
2.3.6. Instrument calibration over a regime
2.3.6.5. Data analysis and model validation
2.3.6.5.1. Data on load cell #32066
Three
repetitions on
a load cell at
eleven known
loads
X Y
2. 0.20024
2. 0.20016
2. 0.20024
4. 0.40056
4. 0.40045
4. 0.40054
6. 0.60087
6. 0.60075
6. 0.60086
8. 0.80130
8. 0.80122
8. 0.80127
10. 1.00173
10. 1.00164
10. 1.00173
12. 1.20227
12. 1.20218
12. 1.20227
14. 1.40282
14. 1.40278
14. 1.40279
16. 1.60344
16. 1.60339
16. 1.60341
18. 1.80412
18. 1.80409
18. 1.80411
20. 2.00485
20. 2.00481
20. 2.00483
21. 2.10526
21. 2.10524
21. 2.10524
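A minimal R sketch that reproduces the quadratic fit and the lack-of-fit test for these data (the lack-of-fit comparison below uses a model with a separate mean at each load; variable names are arbitrary):

# Load cell data: known loads x, responses y, three repetitions per load
x <- rep(c(2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 21), each = 3)
y <- c(0.20024, 0.20016, 0.20024, 0.40056, 0.40045, 0.40054,
       0.60087, 0.60075, 0.60086, 0.80130, 0.80122, 0.80127,
       1.00173, 1.00164, 1.00173, 1.20227, 1.20218, 1.20227,
       1.40282, 1.40278, 1.40279, 1.60344, 1.60339, 1.60341,
       1.80412, 1.80409, 1.80411, 2.00485, 2.00481, 2.00483,
       2.10526, 2.10524, 2.10524)

# Quadratic calibration model  y = a + b*x + c*x^2 + e
fit <- lm(y ~ x + I(x^2))
summary(fit)

# Lack-of-fit test: compare against a model with a separate mean at each load
anova(fit, lm(y ~ factor(x)))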
2.3.6.6. Calibration of future measurements
http://www.itl.nist.gov/div898/handbook/mpc/section3/mpc366.htm[6/27/2012 1:51:24 PM]
2. Measurement Process Characterization
2.3. Calibration
2.3.6. Instrument calibration over a regime
2.3.6.6. Calibration of future measurements
Purpose The purpose of creating the calibration curve is to correct
future measurements made with the same instrument to the
correct units of measurement. The calibration curve can be
applied many, many times before it is discarded or reworked
as long as the instrument remains in statistical control.
Chemical measurements are an exception where frequently the
calibration curve is used only for a single batch of
measurements, and a new calibration curve is created for the
next batch.
Notation The notation for this section is as follows:
Y' denotes a future measurement.
X' denotes the associated calibrated value.
â, b̂, ĉ are the estimates of the coefficients a, b, c.
s_â, s_b̂, s_ĉ are the standard deviations of the estimated
coefficients a, b, c.
Procedure To apply a correction to a future measurement, Y', to obtain
the calibrated value, X', requires the inverse of the calibration
curve.
Linear
calibration
line
The inverse of the calibration line for the linear model,
Y = a + bX + ε, gives the calibrated value
X' = (Y' - â) / b̂
Tests for
the
intercept
and slope
of
calibration
curve -- If
both
conditions
hold, no
calibration
is needed.
Before correcting for the calibration line by the equation
above, the intercept and slope should be tested for a = 0 and
b = 1. If both
|â| ≤ t(1-α/2, ν) · s_â   and   |b̂ - 1| ≤ t(1-α/2, ν) · s_b̂
there is no need for calibration. If, on the other hand, only the
test for a = 0 fails, the error is constant; if only the test for
b = 1 fails, the errors are related to the size of the reference
standards.
Table
look-up
for t-
factor
The factor, t(1-α/2, ν), is found in the t-table, where ν is the
degrees of freedom for the residual standard deviation from
the calibration curve, and α is chosen to be small, say, 0.05.
Quadratic
calibration
curve
The inverse of the calibration curve for the quadratic model
requires a root of the quadratic equation,
X' = [ -b̂ ± sqrt( b̂² - 4ĉ(â - Y') ) ] / (2ĉ)
The correct root (+ or -) can usually be identified from
practical considerations.
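A small R sketch of this inversion, assuming the calibrated value lies on the branch selected by the sign argument (the example uses the load cell coefficient estimates from the data analysis example earlier in this chapter):

# Invert the quadratic calibration curve  Y = a + b*X + c*X^2
# for a new instrument response yprime; sign = +1 or -1 selects the root
calibrate.quadratic <- function(yprime, a, b, c, sign = +1) {
  (-b + sign * sqrt(b^2 - 4 * c * (a - yprime))) / (2 * c)
}

# Example with the load cell coefficient estimates
calibrate.quadratic(1.0, a = -1.840e-05, b = 1.001e-01, c = 7.032e-06)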
Power
curve
The inverse of the calibration curve for the power model
gives the calibrated value
X' = exp[ (ln(Y') - â') / b̂ ]
where b̂ and â', the natural logarithm of a, are estimated from the
power model transformed to a linear function.
Non-linear
and other
calibration
curves
For more complicated models, the inverse for the calibration
curve is obtained by interpolation from a graph of the function
or from predicted values of the function.
2.3.6.7. Uncertainties of calibrated values
http://www.itl.nist.gov/div898/handbook/mpc/section3/mpc367.htm[6/27/2012 1:51:25 PM]
2. Measurement Process Characterization
2.3. Calibration
2.3.6. Instrument calibration over a regime
2.3.6.7. Uncertainties of calibrated values
Purpose The purpose is to quantify the uncertainty of a 'future' result
that has been corrected by the calibration curve. In principle,
the uncertainty quantifies any possible difference between the
calibrated value and its reference base (which normally
depends on reference standards).
Explanation
in terms of
reference
artifacts
Measurements of interest are future measurements on
unknown artifacts, but one way to look at the problem is to
ask: If a measurement is made on one of the reference
standards and the calibration curve is applied to obtain the
calibrated value, how well will this value agree with the
'known' value of the reference standard?
Difficulties The answer is not easy because of the intersection of two
uncertainties associated with
1. the calibration curve itself because of limited data
2. the 'future' measurement
If the calibration experiment were to be repeated, a slightly
different calibration curve would result even for a system in
statistical control. An exposition of the intersection of the
two uncertainties is given for the calibration of proving rings
( Hockersmith and Ku).
ISO
approach to
uncertainty
can be
based on
check
standards
or
propagation
of error
General procedures for computing an uncertainty based on
ISO principles of uncertainty analysis are given in the
chapter on modeling.
Type A uncertainties for calibrated values from calibration
curves can be derived from
check standard values
propagation of error
An example of type A uncertainties of calibrated values from
a linear calibration curve are analyzed from measurements on
linewidth check standards. Comparison of the uncertainties
from check standards and propagation of error for the
linewidth calibration data are also illustrated.
An example of the derivation of propagation of error type A
uncertainties for calibrated values from a quadratic
calibration curve for loadcells is discussed on the next page.
2.3.6.7.1. Uncertainty for quadratic calibration using propagation of error
http://www.itl.nist.gov/div898/handbook/mpc/section3/mpc3671.htm[6/27/2012 1:51:26 PM]
2. Measurement Process Characterization
2.3. Calibration
2.3.6. Instrument calibration over a regime
2.3.6.7. Uncertainties of calibrated values
2.3.6.7.1. Uncertainty for quadratic calibration using propagation of
error
Propagation
of error for
uncertainty
of
calibrated
values of
loadcells
The purpose of this page is to show the propagation of error for calibrated values of a loadcell
based on a quadratic calibration curve where the model for instrument response is
The calibration data are instrument responses at known loads (psi), and estimates of the
quadratic coefficients, a, b, c, and their associated standard deviations are shown with the
analysis.
A graph of the calibration curve showing a measurement Y' corrected to X', the proper load
(psi), is shown below.
Uncertainty of
the calibrated
value X'
The uncertainty to be evaluated is the uncertainty of the calibrated value, X', computed for any
future measurement, Y', made with the calibrated instrument, where X' is obtained by inverting
the quadratic calibration curve.
Partial
derivatives
The partial derivatives of X' with respect to Y' and the coefficients a, b, and c are needed to
compute the uncertainty.
The variance
of the
calibrated
value from
propagation of
error
The variance of X' is defined from propagation of error as follows:
The values of the coefficients and their respective standard deviations from the quadratic fit to
the calibration curve are substituted in the equation. The standard deviation of the
measurement, Y, may not be the same as the standard deviation from the fit to the calibration
data if the measurements to be corrected are taken with a different system; here we assume
that the instrument to be calibrated has a standard deviation that is essentially the same as the
instrument used for collecting the calibration data and the residual standard deviation from the
quadratic fit is the appropriate estimate.
a = -0.183980e-04
sa = 0.2450e-04
b = 0.100102
sb = 0.4838e-05
c = 0.703186e-05
sc = 0.2013e-06
sy = 0.0000376353
Graph
showing the
standard
deviations of
calibrated
values X' for
given
instrument
responses Y'
ignoring
covariance
terms in the
propagation of
error
The standard deviation expressed above is not easily interpreted but it is easily graphed. A
graph showing standard deviations of calibrated values, X', as a function of instrument
response, Y', is shown below.
Problem with
propagation of
error
The propagation of errors shown above is not complete because it ignores the covariances
among the coefficients, a, b, c. Unfortunately, some statistical software packages do not
display these covariance terms with the other output from the analysis.
Covariance
terms for
loadcell data
The variance-covariance terms for the loadcell data set are shown below.
            a                b                c
a    6.0049021e-10
b   -1.0759599e-10    2.3408589e-11
c    4.0191106e-12   -9.5051441e-13    4.0538705e-14
The diagonal elements are the variances of the coefficients, a, b, c, respectively, and the off-
diagonal elements are the covariance terms.
Recomputation
of the
standard
deviation of X'
To account for the covariance terms, the variance of X' is redefined by adding the covariance
terms. Appropriate substitutions are made; the standard deviations are recomputed and
graphed as a function of instrument response.
sab = -1.0759599e-10
sac = 4.0191106e-12
sbc = -9.5051441e-13
The graph below shows the correct estimates for the standard deviation of X' and gives a
means for assessing the loss of accuracy that can be incurred by ignoring covariance terms. In
this case, the uncertainty is reduced by including covariance terms, some of which are
negative.
Graph
showing the
standard
deviations of
calibrated
values, X', for
given
instrument
responses, Y',
with
covariance
terms included
in the
propagation of
error
Sample code The results in this section can be generated using R code.
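As a rough sketch, the standard deviations with and without the covariance terms can be computed in R, assuming the calibrated value X' is the positive root of the quadratic and obtaining the partial derivatives by implicit differentiation of Y' = a + bX' + cX'^2:

# Coefficient estimates, standard deviations, and covariances from the quadratic fit
a.hat <- -0.183980e-04;  sa <- 0.2450e-04
b.hat <-  0.100102;      sb <- 0.4838e-05
c.hat <-  0.703186e-05;  sc <- 0.2013e-06
sy  <- 0.0000376353
sab <- -1.0759599e-10;  sac <- 4.0191106e-12;  sbc <- -9.5051441e-13

# Standard deviation of the calibrated value X' for instrument response yprime
sd.xprime <- function(yprime, covar = TRUE) {
  xp <- (-b.hat + sqrt(b.hat^2 - 4 * c.hat * (a.hat - yprime))) / (2 * c.hat)
  d  <- b.hat + 2 * c.hat * xp            # dY/dX evaluated at X'
  dY <- 1 / d; da <- -1 / d; db <- -xp / d; dc <- -xp^2 / d
  v  <- dY^2 * sy^2 + da^2 * sa^2 + db^2 * sb^2 + dc^2 * sc^2
  if (covar) v <- v + 2 * (da * db * sab + da * dc * sac + db * dc * sbc)
  sqrt(v)
}

yprime <- seq(0.2, 2.1, by = 0.1)
cbind(yprime,
      without.cov = sapply(yprime, sd.xprime, covar = FALSE),
      with.cov    = sapply(yprime, sd.xprime, covar = TRUE))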
2.3.6.7.2. Uncertainty for linear calibration using check standards
http://www.itl.nist.gov/div898/handbook/mpc/section3/mpc3672.htm[6/27/2012 1:51:27 PM]
2. Measurement Process Characterization
2.3. Calibration
2.3.6. Instrument calibration over a regime
2.3.6.7. Uncertainties of calibrated values
2.3.6.7.2. Uncertainty for linear calibration
using check standards
Check
standards
provide a
mechanism
for
calculating
uncertainties
The easiest method for calculating type A uncertainties for
calibrated values from a calibration curve requires periodic
measurements on check standards. The check standards, in
this case, are artifacts at the lower, mid-point and upper
ends of the calibration curve. The measurements on the
check standard are made in a way that randomly samples
the output of the calibration procedure.
Calculation of
check
standard
values
The check standard values are the raw measurements on
the artifacts corrected by the calibration curve. The
standard deviation of these values should estimate the
uncertainty associated with calibrated values. The success
of this method of estimating the uncertainties depends on
adequate sampling of the measurement process.
Measurements
corrected by a
linear
calibration
curve
As an example, consider measurements of linewidths on
photomask standards, made with an optical imaging system
and corrected by a linear calibration curve. The three
control measurements were made on reference standards
with values at the lower, mid-point, and upper end of the
calibration interval.
Compute the
calibration
standard
deviation
For the linewidth data, the regression equation from the
calibration experiment is
Y = a + bX + ε
and the estimated regression coefficients are â = 0.2357623
and b̂ = 0.9870377.
Next, we calculate the difference between the "predicted" X
from the regression fit and the observed X.
Finally, we find the calibration standard deviation by
calculating the standard deviation of the computed
differences.
The calibration standard deviation for the linewidth data is
0.119 μm.
The calculations in this section can be completed using
Dataplot code and R code.
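For illustration, the calculation can be reproduced in R with the control measurements tabulated in the control chart example of Section 2.3.7.1 and the intercept and slope estimates quoted above:

# Known values (X) and control measurements (Y) on the three check standards
x <- rep(c(0.76, 3.29, 8.89), times = 6)
y <- c(1.12, 3.49, 9.11,  0.99, 3.53, 8.89,  1.05, 3.46, 9.02,
       0.76, 3.75, 9.30,  0.96, 3.53, 9.05,  1.03, 3.52, 9.02)

# Estimated intercept and slope of the linear calibration curve
a <- 0.2357623
b <- 0.9870377

# Calibrated values, differences from the known values, and their standard deviation
x.cal <- (y - a) / b
sd(x.cal - x)        # approximately 0.119 micrometers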
Comparison
with
propagation
of error
The standard deviation, 0.119 μm, can be compared with a
propagation of error analysis.
Other sources
of uncertainty
In addition to the type A uncertainty, there may be other
contributors to the uncertainty such as the uncertainties of
the values of the reference materials from which the
calibration curve was derived.
2.3.6.7.3. Comparison of check standard analysis and propagation of error
http://www.itl.nist.gov/div898/handbook/mpc/section3/mpc3673.htm[6/27/2012 1:51:28 PM]
2. Measurement Process Characterization
2.3. Calibration
2.3.6. Instrument calibration over a regime
2.3.6.7. Uncertainties of calibrated values
2.3.6.7.3. Comparison of check standard analysis and propagation of
error
Propagation
of error for
the linear
calibration
The analysis of uncertainty for calibrated values from a linear calibration line can be
addressed using propagation of error. On the previous page, the uncertainty was estimated
from check standard values.
Estimates
from
calibration
data
The calibration data consist of 40 measurements with an optical imaging system on 10
linewidth artifacts. A linear fit to the data gives a calibration curve with the following
estimates for the intercept, a, and the slope, b:
Parameter Estimate Std. Error t-value Pr(>|t|)
a 0.2357623 0.02430034 9.702014 7.860745e-12
b 0.9870377 0.00344058 286.881171 5.354121e-65
with the following covariance matrix.
a b
a 5.905067e-04 -7.649453e-05
b -7.649453e-05 1.183759e-05
The results shown above can be generated with R code.
Propagation
of error
The propagation of error is performed for the equation
X' = (Y' - â) / b̂
so that the squared uncertainty of a calibrated value, X', is computed from the partial
derivatives of X' with respect to Y', a, and b; the standard deviations of Y' and of the estimated
coefficients; and the covariance of the estimated intercept and slope.
The uncertainty of the calibrated value, X', depends on the value of the instrument response, Y'.
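A minimal R sketch of this computation, assuming X' = (Y' - a)/b and taking the residual standard deviation of the calibration fit, 0.06203 micrometers, as the standard deviation of Y':

# Estimates and covariance matrix from the linear calibration fit shown above
a   <- 0.2357623;      b   <- 0.9870377
sa2 <- 5.905067e-04    # variance of the intercept estimate
sb2 <- 1.183759e-05    # variance of the slope estimate
sab <- -7.649453e-05   # covariance of intercept and slope
sy  <- 0.06203         # residual standard deviation of the fit (micrometers)

# Propagation of error for X' = (Y' - a)/b
sd.xprime <- function(yprime) {
  sqrt((sy^2 + sa2) / b^2 +
       ((yprime - a) / b^2)^2 * sb2 +
       2 * ((yprime - a) / b^3) * sab)
}

yprime <- seq(0, 10, by = 0.05)
max(sd.xprime(yprime))   # maximum standard deviation, approximately 0.068 micrometers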
Graph
showing
standard
deviation of
calibrated
value X'
plotted as a
function of
instrument
response Y'
for a linear
calibration
Comparison
of check
standard
analysis and
propagation
of error
Comparison of the analysis of check standard data, which gives a standard deviation of 0.119
μm, and propagation of error, which gives a maximum standard deviation of 0.068 μm,
suggests that the propagation of error may underestimate the type A uncertainty. The check
standard measurements are undoubtedly sampling some sources of variability that do not
appear in the formal propagation of error formula.
2.3.7. Instrument control for linear calibration
http://www.itl.nist.gov/div898/handbook/mpc/section3/mpc37.htm[6/27/2012 1:51:29 PM]
2. Measurement Process Characterization
2.3. Calibration
2.3.7. Instrument control for linear calibration
Purpose The purpose of the control program is to guarantee that the
calibration of an instrument does not degrade over time.
Approach This is accomplished by exercising quality control on the
instrument's output in much the same way that quality control
is exercised on components in a process using a modification
of the Shewhart control chart.
Check
standards
needed for
the control
program
For linear calibration, it is sufficient to control the end-points
and the middle of the calibration interval to ensure that the
instrument does not drift out of calibration. Therefore, check
standards are required at three points; namely,
at the lower-end of the regime
at the mid-range of the regime
at the upper-end of the regime
Data
collection
One measurement is needed on each check standard for each
checking period. It is advisable to start by making control
measurements at the start of each day or as often as
experience dictates. The time between checks can be
lengthened if the instrument continues to stay in control.
Definition
of control
value
To conform to the notation in the section on instrument
corrections, X* denotes the known value of a standard, and X
denotes the measurement on the standard.
A control value is defined as the difference between the known
value and the measurement corrected by the calibration curve.
If the calibration is perfect, control values will be randomly
distributed about zero and fall within appropriate upper and
lower limits on a control chart.
Calculation
of control
limits
The upper and lower control limits (Croarkin and Varner)
are, respectively,
where s is the residual standard deviation of the fit from the
calibration experiment, and b̂ is the estimated slope of the
linear calibration curve.
Values t* The critical value, t*, can be found in the t* table; ν is the
degrees of freedom for the residual standard deviation; and α
is equal to 0.05.
Determining
t*
For the case where = 0.05 and v = 38, the critical value of
the t* statistic is 2.497575.
R code and Dataplot code can be used to determine t*
critical values using a standard t-table for the 1 - α*/2
quantile and ν degrees of freedom, where α* is computed as
α* = 1 - (1 - α)^(1/m)
and m is the number of check standards.
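A minimal R sketch of this look-up, under the adjustment described above:

# t* critical value for the calibration control chart
alpha <- 0.05    # significance level
m     <- 3       # number of check standards
v     <- 38      # degrees of freedom of the residual standard deviation

a.star <- 1 - (1 - alpha)^(1/m)    # adjusted significance level
qt(1 - a.star/2, v)                # t* = 2.497575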
Sensitivity
to departure
from
linearity
If the control values fall within the upper and lower control limits,
the instrument is in statistical control. Statistical control in
this context implies not only that measurements are
repeatable within certain limits but also that instrument
response remains linear. The test is sensitive to departures
from linearity.
Control
chart for a
system
corrected
by a linear
calibration
curve
An example of measurements of line widths on photomask
standards, made with an optical imaging system and
corrected by a linear calibration curve, are shown as an
example. The three control measurements were made on
reference standards with values at the lower, mid-point, and
upper end of the calibration interval.
2.3.7.1. Control chart for a linear calibration line
http://www.itl.nist.gov/div898/handbook/mpc/section3/mpc371.htm[6/27/2012 1:51:29 PM]
2. Measurement Process Characterization
2.3. Calibration
2.3.7. Instrument control for linear calibration
2.3.7.1. Control chart for a linear calibration line
Purpose Line widths of three photomask reference standards (at the low, middle and high end of the
calibration line) were measured on six days with an optical imaging system that had been
calibrated from similar measurements on 10 reference artifacts. The control values and limits
for the control chart, which depend on the intercept and slope of the linear calibration line,
monitor the calibration and linearity of the optical imaging system.
Initial
calibration
experiment
The initial calibration experiment consisted of 40 measurements (not shown here) on 10
artifacts and produced a linear calibration line with:
Intercept = 0.2357
Slope = 0.9870
Residual standard deviation = 0.06203 micrometers
Degrees of freedom = 38
Line width
measurements
made with an
optical
imaging
system
The control measurements, Y, and known values, X, for the three artifacts at the upper, mid-
range, and lower end (U, M, L) of the calibration line are shown in the following table:
DAY POSITION X Y
1 L 0.76 1.12
1 M 3.29 3.49
1 U 8.89 9.11
2 L 0.76 0.99
2 M 3.29 3.53
2 U 8.89 8.89
3 L 0.76 1.05
3 M 3.29 3.46
3 U 8.89 9.02
4 L 0.76 0.76
4 M 3.29 3.75
4 U 8.89 9.30
5 L 0.76 0.96
5 M 3.29 3.53
5 U 8.89 9.05
6 L 0.76 1.03
6 M 3.29 3.52
6 U 8.89 9.02
Control chart The control chart shown below can be generated using both Dataplot code and R code.
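A rough R sketch of the computation is given below; it assumes that the control values are the differences between the calibrated values, (Y - a)/b, and the known values, X, and that the control limits are plus and minus t*·s/b̂, with t* = 2.497575 from the previous page.

# Calibration estimates and t* from the initial calibration experiment
a <- 0.2357;  b <- 0.9870;  s <- 0.06203;  tstar <- 2.497575

day <- rep(1:6, each = 3)
x   <- rep(c(0.76, 3.29, 8.89), times = 6)   # known values (L, M, U)
y   <- c(1.12, 3.49, 9.11,  0.99, 3.53, 8.89,  1.05, 3.46, 9.02,
         0.76, 3.75, 9.30,  0.96, 3.53, 9.05,  1.03, 3.52, 9.02)

w     <- (y - a) / b - x        # control values
limit <- tstar * s / b          # approximately 0.157

plot(day, w, ylim = range(c(w, -limit, limit)),
     xlab = "Day", ylab = "Control value (micrometers)")
abline(h = c(-limit, 0, limit), lty = c(2, 1, 2))

day[abs(w) > limit]             # days with out-of-control values (day 4)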
Interpretation
of control
chart
The control measurements show no evidence of drift and are within the control limits except
on the fourth day when all three control values are outside the limits. The cause of the
problem on that day cannot be diagnosed from the data at hand, but all measurements made on
that day, including workload items, should be rejected and remeasured.
2.4. Gauge R & R studies
http://www.itl.nist.gov/div898/handbook/mpc/section4/mpc4.htm[6/27/2012 1:51:30 PM]
2. Measurement Process Characterization
2.4. Gauge R & R studies
The purpose of this section is to outline the steps that can be
taken to characterize the performance of gauges and
instruments used in a production setting in terms of errors that
affect the measurements.
What are the issues for a gauge R & R study?
What are the design considerations for the study?
1. Artifacts
2. Operators
3. Gauges, parameter levels, configurations
How do we collect data for the study?
How do we quantify variability of measurements?
1. Repeatability
2. Reproducibility
3. Stability
How do we identify and analyze bias?
1. Resolution
2. Linearity
3. Hysteresis
4. Drift
5. Differences among gauges
6. Differences among geometries, configurations
Remedies and strategies
How do we quantify uncertainties of measurements made with
the gauges?
2.4.1. What are the important issues?
http://www.itl.nist.gov/div898/handbook/mpc/section4/mpc41.htm[6/27/2012 1:51:30 PM]
2. Measurement Process Characterization
2.4. Gauge R & R studies
2.4.1. What are the important issues?
Basic
issues
The basic issue for the study is the behavior of gauges in a
particular environment with respect to:
Repeatability
Reproducibility
Stability
Bias
Strategy The strategy is to conduct and analyze a study that examines
the behavior of similar gauges to see if:
They exhibit different levels of precision;
Instruments in the same environment produce equivalent
results;
Operators in the same environment produce equivalent
results;
Responses of individual gauges are affected by
configuration or geometry changes or changes in setup
procedures.
Other
goals
Other goals are to:
Test the resolution of instruments
Test the gauges for linearity
Estimate differences among gauges (bias)
Estimate differences caused by geometries,
configurations
Estimate operator biases
Incorporate the findings in an uncertainty budget
2.4.2. Design considerations
http://www.itl.nist.gov/div898/handbook/mpc/section4/mpc42.htm[6/27/2012 1:51:31 PM]
2. Measurement Process Characterization
2.4. Gauge R & R studies
2.4.2. Design considerations
Design
considerations
Design considerations for a gauge study are choices of:
Artifacts (check standards)
Operators
Gauges
Parameter levels
Configurations, etc.
Selection of
artifacts or
check
standards
The artifacts for the study are check standards or test items
of a type that are typically measured with the gauges under
study. It may be necessary to include check standards for
different parameter levels if the gauge is a multi-response
instrument. The discussion of check standards should be
reviewed to determine the suitability of available artifacts.
Number of
artifacts
The number of artifacts for the study should be Q (Q > 2).
Check standards for a gauge study are needed only for the
limited time period (two or three months) of the study.
Selection of
operators
Only those operators who are trained and experienced with
the gauges should be enlisted in the study, with the
following constraints:
If there is a small number of operators who are
familiar with the gauges, they should all be included
in the study.
If the study is intended to be representative of a
large pool of operators, then a random sample of L
(L > 2) operators should be chosen from the pool.
If there is only one operator for the gauge type, that
operator should make measurements on K (K > 2)
days.
Selection of
gauges
If there is only a small number of gauges in the facility,
then all gauges should be included in the study.
If the study is intended to represent a larger pool of
gauges, then a random sample of I (I > 3) gauges should
be chosen for the study.
Limit the
initial study
If the gauges operate at several parameter levels (for
example, frequencies), an initial study should be carried
out at 1 or 2 levels before a larger study is undertaken.
If there are differences in the way that the gauge can be
operated, an initial study should be carried out for one or
two configurations before a larger study is undertaken.
2.4.3. Data collection for time-related sources of variability
http://www.itl.nist.gov/div898/handbook/mpc/section4/mpc43.htm[6/27/2012 1:51:32 PM]
2. Measurement Process Characterization
2.4. Gauge R & R studies
2.4.3. Data collection for time-related sources of
variability
Time-
related
analysis
The purpose of this page is to present several options for
collecting data for estimating time-dependent effects in a
measurement process.
Time
intervals
The following levels of time-dependent errors are considered
in this section based on the characteristics of many
measurement systems and should be adapted to a specific
measurement situation as needed.
1. Level-1 Measurements taken over a short time to
capture the precision of the gauge
2. Level-2 Measurements taken over days (or other
appropriate time increment)
3. Level-3 Measurements taken over runs separated by
months
Time
intervals
Simple design for 2 levels of random error
Nested design for 2 levels of random error
Nested design for 3 levels of random error
In all cases, data collection and analysis are straightforward,
and there is no reason to estimate interaction terms when
dealing with time-dependent errors. Two levels should be
sufficient for characterizing most measurement systems. Three
levels are recommended for measurement systems where
sources of error are not well understood and have not
previously been studied.
2.4.3.1. Simple design
http://www.itl.nist.gov/div898/handbook/mpc/section4/mpc431.htm[6/27/2012 1:51:32 PM]
2. Measurement Process Characterization
2.4. Gauge R & R studies
2.4.3. Data collection for time-related sources of variability
2.4.3.1. Simple design
Constraints
on time and
resources
In planning a gauge study, particularly for the first time, it is
advisable to start with a simple design and progress to more
complicated and/or labor intensive designs after acquiring
some experience with data collection and analysis. The
design recommended here is appropriate as a preliminary
study of variability in the measurement process that occurs
over time. It requires about two days of measurements
separated by about a month with two repetitions per day.
Relationship
to 2-level
and 3-level
nested
designs
The disadvantage of this design is that there is minimal data
for estimating variability over time. A 2-level nested design
and a 3-level nested design, both of which require
measurements over time, are discussed on other pages.
Plan of
action
Choose at least Q = 10 work pieces or check standards that
are essentially identical with respect to their expected
responses to the measurement method. Measure each of the
check standards twice with the same gauge, being careful to
randomize the order of the check standards.
After about a month, repeat the measurement sequence,
randomizing anew the order in which the check standards are
measured.
Notation Measurements on the check standards are designated Y_kj,
with the first index, k, identifying the month of measurement
and the second index, j, identifying the repetition number.
Analysis of
data
The level-1 standard deviation, which describes the basic
precision of the gauge, is
with ν1 = 2Q degrees of freedom.
The level-2 standard deviation, which describes the
variability of the measurement process over time, is
with ν2 = Q degrees of freedom.
Relationship
to
uncertainty
for a test
item
The standard deviation that defines the uncertainty for a
single measurement on a test item, often referred to as the
reproducibility standard deviation (ASTM), is given by
The time-dependent component is
There may be other sources of uncertainty in the
measurement process that must be accounted for in a formal
analysis of uncertainty.
2.4.3.2. 2-level nested design
http://www.itl.nist.gov/div898/handbook/mpc/section4/mpc432.htm[6/27/2012 1:51:33 PM]
2. Measurement Process Characterization
2.4. Gauge R & R studies
2.4.3. Data collection for time-related sources of variability
2.4.3.2. 2-level nested design
Check
standard
measurements
for estimating
time-
dependent
sources of
variability
Measurements on a check standard are recommended for studying the
effect of sources of variability that manifest themselves over time. Data
collection and analysis are straightforward, and there is no reason to
estimate interaction terms when dealing with time-dependent errors. The
measurements can be made at one of two levels. Two levels should be
sufficient for characterizing most measurement systems. Three levels are
recommended for measurement systems for which sources of error are
not well understood and have not previously been studied.
Time intervals
in a nested
design
The following levels are based on the characteristics of many
measurement systems and should be adapted to a specific measurement
situation as needed.
Level-1 Measurements taken over a short term to estimate gauge
precision
Level-2 Measurements taken over days (or other appropriate time
increment)
Definition of
number of
measurements
at each level
The following symbols are defined for this chapter:
Level-1 J (J > 1) repetitions
Level-2 K (K > 2) days
Schedule for
making
measurements
A schedule for making check standard measurements over time (once a
day, twice a week, or whatever is appropriate for sampling all conditions
of measurement) should be set up and adhered to. The check standard
measurements should be structured in the same way as values reported on
the test items. For example, if the reported values are averages of two
repetitions made within 5 minutes of each other, the check standard
values should be averages of the two measurements made in the same
manner.
Exception One exception to this rule is that there should be at least J = 2 repetitions
per day, etc. Without this redundancy, there is no way to check on the
short-term precision of the measurement system.
Depiction of
schedule for
making check
standard
measurements
with 4
repetitions per
day over K
days on the
surface of a
silicon wafer
[Diagram: 2-level design for check standard measurements -- 4 repetitions per day over K days]
Operator
considerations
The measurements should be taken with ONE operator. Operator is not
usually a consideration with automated systems. However, systems that
require decisions regarding line edge or other feature delineations may be
operator dependent.
Case Study:
Resistivity
check
standard
Results should be recorded along with pertinent environmental readings
and identifications for significant factors. The best way to record this
information is in one file with one line or row (on a spreadsheet) of
information in fixed fields for each check standard measurement.
Data analysis
of gauge
precision
The check standard measurements are represented by
for the jth repetition on the kth day. The mean for the kth day is
and the (level-1) standard deviation for gauge precision with v = J - 1
degrees of freedom is
.
Pooling
increases the
reliability of
the estimate of
the standard
deviation
The pooled level-1 standard deviation with v = K(J - 1) degrees of
freedom is
.
Data analysis
of process
(level-2)
standard
deviation
The level-2 standard deviation of the check standard represents the
process variability. It is computed with ν = K - 1 degrees of freedom as
where
Relationship
to uncertainty
for a test item
The standard deviation that defines the uncertainty for a single
measurement on a test item, often referred to as the reproducibility
standard deviation (ASTM), is given by
The time-dependent component is
There may be other sources of uncertainty in the measurement process
that must be accounted for in a formal analysis of uncertainty.
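As a rough sketch of these computations in R, assuming the check standard measurements are arranged in a matrix with K rows (days) and J columns (repetitions), and that the reproducibility standard deviation combines the two levels as s_R^2 = s_days^2 + s_1^2 with s_days^2 = s_2^2 - s_1^2/J (set to zero if negative):

# Hypothetical check standard measurements: K = 6 days (rows), J = 2 repetitions (columns)
y <- matrix(c(10.02, 10.04,
              10.01,  9.99,
              10.05, 10.03,
               9.98, 10.00,
              10.03, 10.06,
              10.00, 10.02), nrow = 6, byrow = TRUE)
J <- ncol(y);  K <- nrow(y)

# Level-1 (repeatability): pooled within-day standard deviation, K*(J - 1) df
s1 <- sqrt(mean(apply(y, 1, var)))

# Level-2 (day-to-day): standard deviation of the daily means, K - 1 df
s2 <- sd(rowMeans(y))

# Reproducibility standard deviation for a single measurement on a test item
s.days <- sqrt(max(s2^2 - s1^2 / J, 0))
sR <- sqrt(s.days^2 + s1^2)
c(s1 = s1, s2 = s2, sR = sR)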
2.4.3.3. 3-level nested design
http://www.itl.nist.gov/div898/handbook/mpc/section4/mpc433.htm[6/27/2012 1:51:34 PM]
2. Measurement Process Characterization
2.4. Gauge R & R studies
2.4.3. Data collection for time-related sources of variability
2.4.3.3. 3-level nested design
Advantages
of nested
designs
A nested design is recommended for studying the effect of
sources of variability that manifest themselves over time. Data
collection and analysis are straightforward, and there is no
reason to estimate interaction terms when dealing with time-
dependent errors. Nested designs can be run at several levels.
Three levels are recommended for measurement systems
where sources of error are not well understood and have not
previously been studied.
Time
intervals in
a nested
design
The following levels are based on the characteristics of many
measurement systems and should be adapted to a specific
measurement situation as need be. A typical design is shown
below.
Level-1 Measurements taken over a short-time to
capture the precision of the gauge
Level-2 Measurements taken over days (or other
appropriate time increment)
Level-3 Measurements taken over runs separated by
months
[Diagram: typical 3-level nested design with L runs, K days per run, and J repetitions per day]
Definition of
number of
measurements
at each level
The following symbols are defined for this chapter:
Level-1 J (J > 1) repetitions
Level-2 K (K > 2) days
Level-3 L (L > 2) runs
For the design shown above, J = 4; K = 3 and L = 2. The
design can be repeated for:
Q (Q > 2) check standards
I (I > 3) gauges if the intent is to characterize
several similar gauges
2-level nested
design
The design can be truncated at two levels to estimate
repeatability and day-to-day variability if there is no
reason to estimate longer-term effects. The analysis
remains the same through the first two levels.
Advantages This design has advantages in ease of use and
computation. The number of repetitions at each level need
not be large because information is being gathered on
several check standards.
Operator
considerations
The measurements should be made with ONE operator.
Operator is not usually a consideration with automated
systems. However, systems that require decisions regarding
line edge or other feature delineations may be operator
dependent. If there is reason to believe that results might
differ significantly by operator, 'operators' can be
substituted for 'runs' in the design. Choose L (L > 2)
operators at random from the pool of operators who are
capable of making measurements at the same level of
precision. (Conduct a small experiment with operators
making repeatability measurements, if necessary, to verify
comparability of precision among operators.) Then
complete the data collection and analysis as outlined. In
this case, the level-3 standard deviation estimates operator
effect.
Caution Be sure that the design is truly nested; i.e., that each
operator reports results for the same set of circumstances,
particularly with regard to day of measurement so that
each operator measures every day, or every other day, and
so forth.
Randomize on
gauges
Randomize with respect to gauges for each check standard;
i.e., choose the first check standard and randomize the
gauges; choose the second check standard and randomize
gauges; and so forth.
Record results
in a file
Record the average and standard deviation from each
group of J repetitions by:
check standard
gauge
Case Study:
Resistivity
Gauges
Results should be recorded along with pertinent
environmental readings and identifications for significant
factors. The best way to record this information is in one
file with one line or row (on a spreadsheet) of information
in fixed fields for each check standard measurement. A list
of typical entries follows.
1. Month
2. Day
3. Year
4. Operator identification
5. Check standard identification
6. Gauge identification
7. Average of J repetitions
8. Short-term standard deviation from J repetitions
9. Degrees of freedom
10. Environmental readings (if pertinent)
2.4.4. Analysis of variability
http://www.itl.nist.gov/div898/handbook/mpc/section4/mpc44.htm[6/27/2012 1:51:35 PM]
2. Measurement Process Characterization
2.4. Gauge R & R studies
2.4.4. Analysis of variability
Analysis of
variability
from a nested
design
The purpose of this section is to show the effect of various
levels of time-dependent effects on the variability of the
measurement process with standard deviations for each level
of a 3-level nested design.
Level 1 - repeatability/short-term precision
Level 2 - reproducibility/day-to-day
Level 3 - stability/run-to-run
The graph below depicts possible scenarios for a 2-level
design (short-term repetitions and days) to illustrate the
concepts.
Depiction of 2
measurement
processes with
the same
short-term
variability
over 6 days
where process
1 has large
between-day
variability and
process 2 has
negligible
between-day
variability
[Figure: distributions of short-term measurements over 6 days for two processes, where
distances from the centerlines illustrate between-day variability -- Process 1: large
between-day variability; Process 2: small between-day variability]
Hint on using
tabular
method of
analysis
An easy way to begin is with a 2-level table with J columns
and K rows for the repeatability/reproducibility measurements
and proceed as follows:
1. Compute an average for each row and put it in the J+1
column.
2. Compute the level-1 (repeatability) standard deviation
for each row and put it in the J+2 column.
3. Compute the grand average and the level-2 standard
deviation from data in the J+1 column.
4. Repeat the table for each of the L runs.
5. Compute the level-3 standard deviation from the L
grand averages.
Level-1: LK
repeatability
standard
deviations can
be computed
from the data
The measurements from the nested design are denoted by
Equations corresponding to the tabular analysis are shown
below. Level-1 repeatability standard deviations, s_1lk, are
pooled over the K days and L runs. Individual standard
deviations with (J - 1) degrees of freedom each are computed
from J repetitions as
where
Level-2: L
reproducibility
standard
deviations can
be computed
from the data
The level-2 standard deviation, s_2l, is pooled over the L runs.
Individual standard deviations with (K - 1) degrees of
freedom each are computed from K daily averages as
where
Level-3: A
single global
standard
deviation can
be computed
from the L-
run averages
A level-3 standard deviation with (L - 1) degrees of freedom
is computed from the L-run averages as
where
Relationship
to uncertainty
for a test item
The standard deviation that defines the uncertainty for a
single measurement on a test item is given by
where the pooled values, s_1 and s_2, are the usual pooled variances
s_1^2 = (1/(LK)) Σ_l Σ_k s_1lk^2   and   s_2^2 = (1/L) Σ_l s_2l^2
There may be other sources of uncertainty in the
measurement process that must be accounted for in a formal
analysis of uncertainty.
2.4.4.1. Analysis of repeatability
http://www.itl.nist.gov/div898/handbook/mpc/section4/mpc441.htm[6/27/2012 1:51:36 PM]
2. Measurement Process Characterization
2.4. Gauge R & R studies
2.4.4. Analysis of variability
2.4.4.1. Analysis of repeatability
Case study:
Resistivity
probes
The repeatability quantifies the basic precision for the gauge. A level-1
repeatability standard deviation is computed for each group of J
repetitions, and a graphical analysis is recommended for deciding if
repeatability is dependent on the check standard, the operator, or the
gauge. Two graphs are recommended. These should show:
Plot of repeatability standard deviations versus check standard with
day coded
Plot of repeatability standard deviations versus check standard with
gauge coded
Typically, we expect the standard deviation to be gauge dependent -- in
which case there should be a separate standard deviation for each gauge.
If the gauges are all at the same level of precision, the values can be
combined over all gauges.
Repeatability
standard
deviations
can be
pooled over
operators,
runs, and
check
standards
A repeatability standard deviation from J repetitions is not a reliable
estimate of the precision of the gauge. Fortunately, these standard
deviations can be pooled over days, runs, and check standards, if
appropriate, to produce a more reliable precision measure. The table
below shows a mechanism for pooling. The pooled repeatability standard
deviation, s_1, has LK(J - 1) degrees of freedom for measurements taken
over:
J repetitions
K days
L runs
Basic
pooling rules
The table below gives the mechanism for pooling repeatability standard
deviations over days and runs. The pooled value is an average of
weighted variances and is shown as the last entry in the right-hand
column of the table. The pooling can also cover check standards, if
appropriate.
View of
entire
dataset from
the nested
design
To illustrate the calculations, a subset of data collected in a nested design
for one check standard (#140) and one probe (#2362) is shown below.
The measurements are resistivity (ohm.cm) readings with six repetitions
per day. The individual level-1 standard deviations from the six
repetitions and degrees of freedom are recorded in the last two columns
of the database.
Run  Wafer  Probe  Month  Day  Op   Temp   Average  Stddev  df
  1    140   2362      3   15   1  23.08   96.0771  0.1024   5
  1    140   2362      3   17   1  23.00   95.9976  0.0943   5
  1    140   2362      3   18   1  23.01   96.0148  0.0622   5
  1    140   2362      3   22   1  23.27   96.0397  0.0702   5
  1    140   2362      3   23   2  23.24   96.0407  0.0627   5
  1    140   2362      3   24   2  23.13   96.0445  0.0622   5
  2    140   2362      4   12   1  22.88   96.0793  0.0996   5
  2    140   2362      4   18   2  22.76   96.1115  0.0533   5
  2    140   2362      4   19   2  22.79   96.0803  0.0364   5
  2    140   2362      4   19   1  22.71   96.0411  0.0768   5
  2    140   2362      4   20   2  22.84   96.0988  0.1042   5
  2    140   2362      4   21   1  22.94   96.0482  0.0868   5
Pooled repeatability standard deviations over days, runs

Source of Variability    Degrees of Freedom   Standard Deviations   Sum of Squares (SS)
Probe 2362
  run 1 - day 1                   5                 0.1024               0.05243
  run 1 - day 2                   5                 0.0943               0.04446
  run 1 - day 3                   5                 0.0622               0.01934
  run 1 - day 4                   5                 0.0702               0.02464
  run 1 - day 5                   5                 0.0627               0.01966
  run 1 - day 6                   5                 0.0622               0.01934
  run 2 - day 1                   5                 0.0996               0.04960
  run 2 - day 2                   5                 0.0533               0.01420
  run 2 - day 3                   5                 0.0364               0.00662
  run 2 - day 4                   5                 0.0768               0.02949
  run 2 - day 5                   5                 0.1042               0.05429
  run 2 - day 6                   5                 0.0868               0.03767
Total (degrees of freedom
and sum of squares for s_1)      60                                      0.37176

The pooled value of s_1 is given by sqrt(0.37176/60) = 0.07871.
The calculations displayed in the table above can be generated using both
Dataplot code and R code.
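The pooling shown in the table can also be checked with a few lines of R; the vector s1 below simply repeats the twelve level-1 standard deviations from the table, and the variable names are illustrative.

  s1 <- c(0.1024, 0.0943, 0.0622, 0.0702, 0.0627, 0.0622,
          0.0996, 0.0533, 0.0364, 0.0768, 0.1042, 0.0868)
  df <- rep(5, length(s1))

  ss        <- df * s1^2                 # sum-of-squares column
  s1.pooled <- sqrt(sum(ss) / sum(df))   # pooled repeatability standard deviation
  c(total.df = sum(df), total.ss = sum(ss), s1.pooled = s1.pooled)
  # total df = 60, total SS = 0.37176, pooled s1 = 0.0787, as in the table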
2.4.4.2. Analysis of reproducibility
Case study:
Resistivity
gauges
Day-to-day variability can be assessed by a graph of check standard
values (averaged over J repetitions) versus day with a separate graph
for each check standard. Graphs for all check standards should be
plotted on the same page to obtain an overall view of the
measurement situation.
Pooling
results in
more
reliable
estimates
The level-2 standard deviations with (K - 1) degrees of freedom are
computed from the check standard values for days and pooled over
runs as shown in the table below. The pooled level-2 standard
deviation has degrees of freedom
L(K - 1) for measurements made over:
K days
L runs
Mechanism
for pooling
The table below gives the mechanism for pooling level-2 standard
deviations over runs. The pooled value is an average of weighted
variances and is the last entry in the right-hand column of the table.
The pooling can be extended in the same manner to cover check
standards, if appropriate.
The table was generated using a subset of data (shown on previous
page) collected in a nested design on one check standard (#140) with
probe (#2362) over six days. The data are analyzed for between-day
effects. The level-2 standard deviations and pooled level-2 standard
deviations over runs 1 and 2 are:
Level-2 standard deviations for a single gauge pooled over runs

Source of variability   Standard deviations   Degrees of freedom   Sum of squares
Days
  Run 1                      0.027280                  5               0.003721
  Run 2                      0.027560                  5               0.003798
Sum                                                   10               0.007519
Pooled value                 0.02742
Relationship
to day effect
The level-2 standard deviation is related to the standard deviation for between-day precision and gauge precision by

   s_2^2 = s_days^2 + s_1^2 / J

The size of the day effect can be calculated by subtraction using the formula above once the other two standard deviations have been estimated reliably.
Computation
of variance
component
for days
For our example, the variance component for between days is -0.00028072. The negative number for the variance is interpreted as
meaning that the variance component for days is zero. However, with
only 10 degrees of freedom for the level-2 standard deviation, this
estimate is not necessarily reliable. The standard deviation for days
over the entire database shows a significant component for days.
Sample code The calculations included in this section can be implemented using
both
Dataplot code and R code.
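A short R sketch of the day-component subtraction, using the pooled values from this example (variable names are illustrative):

  s1 <- 0.07871   # pooled level-1 (repeatability) standard deviation
  s2 <- 0.02742   # pooled level-2 standard deviation
  J  <- 6         # repetitions per day

  var.days <- s2^2 - s1^2 / J
  var.days
  # -0.00028 -- a negative estimate, interpreted as a zero variance component for days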
2.4.4.3. Analysis of stability
Case study:
Resistivity
probes
Run-to-run variability can be assessed graphically by a plot of check
standard values (averaged over J repetitions) versus time with a separate
graph for each check standard. Data on all check standards should be
plotted on one page to obtain an overall view of the measurement
situation.
Advantage
of pooling
A level-3 standard deviation with (L - 1) degrees of freedom is computed
from the run averages. Because there will rarely be more than two runs
per check standard, resulting in one degree of freedom per check
standard, it is prudent to have three or more check standards in the design
to take advantage of pooling. The mechanism for pooling over check
standards is shown in the table below. The pooled standard deviation has
Q(L - 1) degrees of freedom and is shown as the last entry in the right-hand column
of the table.
Example of
pooling
The following table shows how the level-3 standard deviations for a
single gauge (probe #2362) are pooled over check standards. The table
can be reproduced using
R code.
Level-3 standard deviations for a single gauge pooled over check standards

Source of variability   Standard deviation   Degrees of freedom   Sum of squares
Level-3
  Chk std 138                0.0223                   1               0.0004973
  Chk std 139                0.0027                   1               0.0000073
  Chk std 140                0.0289                   1               0.0008352
  Chk std 141                0.0133                   1               0.0001769
  Chk std 142                0.0205                   1               0.0004203
Sum                                                   5               0.0019370
Pooled value                 0.0197
Level-3
standard
deviations
A subset of data collected in a nested design on one check standard
(#140) with probe (#2362) for six days and two runs is analyzed for
between-run effects. The level-3 standard deviation, computed from the
averages of two runs, is 0.02885 with one degree of freedom. Dataplot
code and R code can be used to perform the calculations for this data.
Relationship
to long-
term
changes,
days and
gauge
precision
The size of the between-run effect can be calculated by subtraction using the standard deviations for days and gauge precision as

   s_runs^2 = s_3^2 - (1/K) * s_days^2 - (1/(K*J)) * s_1^2
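A short R sketch of this subtraction; the form used here, s_runs^2 = s_3^2 - s_2^2/K, is algebraically equivalent to removing the day and gauge-precision contributions from s_3^2, and the numbers repeat the example for probe #2362 (variable names are illustrative):

  s2 <- 0.02742    # pooled level-2 standard deviation
  s3 <- 0.02885    # level-3 standard deviation from the two run averages
  K  <- 6          # days per run

  var.runs <- s3^2 - s2^2 / K          # between-run variance component
  s.runs   <- sqrt(max(var.runs, 0))   # negative estimates are set to zero, as for the day effect
  s.runs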
2.4.4.4.4. Example of calculations
Example of
repeatability
calculations
Short-term standard deviations based on
J = 6 repetitions with 5 degrees of freedom
K = 6 days
L = 2 runs
were recorded with a probing instrument on Q = 5 wafers.
The standard deviations were pooled over K = 6 days and L
= 2 runs to give 60 degrees of freedom for each wafer. The
pooling of repeatability standard deviations over the 5 wafers
is demonstrated in the table below.
Pooled repeatability standard deviation for a single gauge

Source of variability   Sum of Squares (SS)   Degrees of freedom (DF)   Std Devs
Repeatability
  Wafer #138                 0.48115                    60
  Wafer #139                 0.69209                    60
  Wafer #140                 0.48483                    60
  Wafer #141                 1.21752                    60
  Wafer #142                 0.30076                    60
SUM                          3.17635                   300              0.10290
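The pooling over wafers can be checked with a short R sketch (variable names are illustrative; the numbers repeat the table above):

  ss <- c(0.48115, 0.69209, 0.48483, 1.21752, 0.30076)   # per-wafer sums of squares
  df <- rep(60, length(ss))                               # 60 degrees of freedom per wafer

  s1.gauge <- sqrt(sum(ss) / sum(df))                     # pooled repeatability standard deviation
  s1.gauge
  # 0.1029 with 300 degrees of freedom, as in the table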
2.4.5. Analysis of bias
Definition
of bias
The terms 'bias' and 'systematic error' have the same meaning
in this handbook. Bias is defined (VIM) as the difference
between the measurement result and its unknown 'true value'.
It can often be estimated and/or eliminated by calibration to a
reference standard.
Potential
problem
Calibration relates output to 'true value' in an ideal
environment. However, it may not assure that the gauge reacts
properly in its working environment. Temperature, humidity,
operator, wear, and other factors can introduce bias into the
measurements. There is no single method for dealing with this
problem, but the gauge study is intended to uncover biases in
the measurement process.
Sources of
bias
Sources of bias that are discussed in this Handbook include:
Lack of gauge resolution
Lack of linearity
Drift
Hysteresis
Differences among gauges
Differences among geometries
Differences among operators
Remedial actions and strategies
2.4.5.1. Resolution
Resolution Resolution (MSA) is the ability of the measurement
system to detect and faithfully indicate small changes in
the characteristic of the measurement result.
Definition
from (MSA)
manual
The resolution of the instrument is δ if there is an equal probability that the indicated value of any artifact, which differs from a reference standard by less than δ, will be the same as the indicated value of the reference.
Good versus
poor
A small δ implies good resolution -- the measurement system can discriminate between artifacts that are close together in value.

A large δ implies poor resolution -- the measurement system can only discriminate between artifacts that are far apart in value.
Warning The number of digits displayed does not indicate the
resolution of the instrument.
Manufacturer's
statement of
resolution
Resolution as stated in the manufacturer's specifications is
usually a function of the least-significant digit (LSD) of
the instrument and other factors such as timing
mechanisms. This value should be checked in the
laboratory under actual conditions of measurement.
Experimental
determination
of resolution
To make a determination in the laboratory, select several
artifacts with known values over a range from close in
value to far apart. Start with the two artifacts that are
farthest apart and make measurements on each artifact.
Then, measure the two artifacts with the second largest
difference, and so forth, until two artifacts are found
which repeatedly give the same result. The difference
between the values of these two artifacts estimates the
resolution.
Consequence
of poor
resolution
No useful information can be gained from a study on a
gauge with poor resolution relative to measurement needs.
2.4.5.2. Linearity of the gauge
Definition
of linearity
for gauge
studies
Linearity is given a narrow interpretation in this Handbook to
indicate that gauge response increases in equal increments to
equal increments of stimulus, or, if the gauge is biased, that
the bias remains constant throughout the course of the
measurement process.
Data
collection
and
repetitions
A determination of linearity requires Q (Q > 4) reference
standards that cover the range of interest in fairly equal
increments and J (J > 1) measurements on each reference
standard. One measurement is made on each of the reference
standards, and the process is repeated J times.
Plot of the
data
A test of linearity starts with a plot of the measured values
versus corresponding values of the reference standards to
obtain an indication of whether or not the points fall on a
straight line with slope equal to 1 -- indicating linearity.
Least-
squares
estimates
of bias and
slope
A least-squares fit of the data to the model
Y = a + bX + measurement error
where Y is the measurement result and X is the value of the
reference standard, produces an estimate of the intercept, a,
and the slope, b.
Output
from
software
package
The intercept and bias are estimated using a statistical
software package that should provide the following
information:
Estimates of the intercept and slope,
Standard deviations of the intercept and slope
Residual standard deviation of the fit
F-test for goodness of fit
Test for
linearity
Tests for the slope and bias are described in the section on
instrument calibration. If the slope is different from one, the
gauge is non-linear and requires calibration or repair. If the
intercept is different from zero, the gauge has a bias.
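A minimal R sketch of such a linearity check follows; the simulated data, the 0.05 significance level, and the variable names are illustrative assumptions, not the Handbook's case-study data.

  # Simulated linearity data: 5 hypothetical reference standards, 3 readings each
  set.seed(3)
  x <- rep(c(1, 2, 3, 4, 5), each = 3)                   # reference values
  y <- 0.02 + 1.001 * x + rnorm(length(x), sd = 0.01)    # measured values

  fit <- lm(y ~ x)
  est <- coef(summary(fit))                              # estimates and standard errors

  # t statistics for intercept = 0 (bias) and slope = 1 (non-linearity)
  t.intercept <- est["(Intercept)", "Estimate"] / est["(Intercept)", "Std. Error"]
  t.slope     <- (est["x", "Estimate"] - 1) / est["x", "Std. Error"]
  t.crit      <- qt(0.975, df = fit$df.residual)         # 0.05 level, two-sided
  c(t.intercept = t.intercept, t.slope = t.slope, t.crit = t.crit)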
Causes of
non-
linearity
The reference manual on Measurement Systems Analysis
(MSA) lists possible causes of gauge non-linearity that should
be investigated if the gauge shows symptoms of non-linearity.
1. Gauge not properly calibrated at the lower and upper
ends of the operating range
2. Error in the value of X at the maximum or minimum
range
3. Worn gauge
4. Internal design problems (electronics)
Note - on
artifact
calibration
The requirement of linearity for artifact calibration is not so
stringent. Where the gauge is used as a comparator for
measuring small differences among test items and reference
standards of the same nominal size, as with calibration
designs, the only requirement is that the gauge be linear over
the small on-scale range needed to measure both the reference
standard and the test item.
Situation
where the
calibration
of the
gauge is
neglected
Sometimes it is not economically feasible to correct for the
calibration of the gauge (Turgel and Vecchia). In this case, the
bias that is incurred by neglecting the calibration is estimated
as a component of uncertainty.
2.4.5.3. Drift
Definition Drift can be defined (VIM) as a slow change in the response
of a gauge.
Instruments
used as
comparators
for
calibration
Short-term drift can be a problem for comparator
measurements. The cause is frequently heat build-up in the
instrument during the time of measurement. It would be
difficult, and probably unproductive, to try to pinpoint the
extent of such drift with a gauge study. The simplest solution
is to use drift-free designs for collecting calibration data.
These designs mitigate the effect of linear drift on the results.
Long-term drift should not be a problem for comparator
measurements because such drift would be constant during a
calibration design and would cancel in the difference
measurements.
Instruments
corrected by
linear
calibration
For instruments whose readings are corrected by a linear
calibration line, drift can be detected using a control chart
technique and measurements on three or more check
standards.
Drift in
direct
reading
instruments
and
uncertainty
analysis
For other instruments, measurements can be made on a daily
basis on two or more check standards over a preset time
period, say, one month. These measurements are plotted on a
time scale to determine the extent and nature of any drift.
Drift rarely continues unabated at the same rate and in the
same direction for a long time period.
Thus, the expectation from such an experiment is to
document the maximum change that is likely to occur during
a set time period and plan adjustments to the instrument
accordingly. A further impact of the findings is that
uncorrected drift is treated as a type A component in the
uncertainty analysis.
2.4.5.4. Differences among gauges
Purpose A gauge study should address whether gauges agree with
one another and whether the agreement (or disagreement) is
consistent over artifacts and time.
Data
collection
For each gauge in the study, the analysis requires
measurements on
Q (Q > 2) check standards
K (K > 2) days
The measurements should be made by a single operator.
Data
reduction
The steps in the analysis are:
1. Measurements are averaged over days by
artifact/gauge configuration.
2. For each artifact, an average is computed over
gauges.
3. Differences from this average are then computed for
each gauge.
4. If the design is run as a 3-level design, the statistics
are computed separately for each run.
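A sketch of these reduction steps in R, assuming a long-format data frame with columns probe, wafer, day, and resistivity; the simulated data and column names are illustrative only.

  # Simulated long-format gauge-study data (columns and values are hypothetical)
  set.seed(2)
  d <- expand.grid(probe = c("1", "181", "2362"),
                   wafer = c("138", "139", "140"),
                   day   = 1:6)
  d$resistivity <- 96 + rnorm(nrow(d), sd = 0.05)

  # Step 1: average over days for each probe/wafer configuration
  cell <- aggregate(resistivity ~ probe + wafer, data = d, FUN = mean)

  # Step 2: average over probes for each wafer
  wafer.mean <- aggregate(resistivity ~ wafer, data = cell, FUN = mean)

  # Step 3: difference of each probe's average from the wafer average (the bias estimate)
  cell$bias <- cell$resistivity -
               wafer.mean$resistivity[match(cell$wafer, wafer.mean$wafer)]
  cell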
Data from a
gauge study
The data in the table below come from resistivity (ohm.cm)
measurements on Q = 5 artifacts on K = 6 days. Two runs
were made which were separated by about a month's time.
The artifacts are silicon wafers and the gauges are four-
point probes specifically designed for measuring resistivity
of silicon wafers. Differences from the wafer means are
shown in the table.
Biases for 5
probes from
a gauge study
with 5
artifacts on 6
days
Table of biases for probes and silicon wafers (ohm.cm)

                              Wafers
Probe        138        139        140        141        142
-------------------------------------------------------------
    1    0.02476   -0.00356    0.04002    0.03938    0.00620
  181    0.01076    0.03944    0.01871   -0.01072    0.03761
  182    0.01926    0.00574   -0.02008    0.02458   -0.00439
 2062   -0.01754   -0.03226   -0.01258   -0.02802   -0.00110
 2362   -0.03725   -0.00936   -0.02608   -0.02522   -0.03830
Plot of
differences
among
probes
A graphical analysis can be more effective for detecting
differences among gauges than a table of differences. The
differences are plotted versus artifact identification with
each gauge identified by a separate plotting symbol. For
ease of interpretation, the symbols for any one gauge can
be connected by dotted lines.
Interpretation Because the plots show differences from the average by
artifact, the center line is the zero-line, and the differences
are estimates of bias. Gauges that are consistently above or
below the other gauges are biased high or low, respectively,
relative to the average. The best estimate of bias for a
particular gauge is its average bias over the Q artifacts. For
this data set, notice that probe #2362 is consistently biased
low relative to the other probes.
Strategies for
dealing with
differences
among
gauges
Given that the gauges are a random sample of like-kind
gauges, the best estimate in any situation is an average over
all gauges. In the usual production or metrology setting,
however, it may only be feasible to make the measurements
on a particular piece with one gauge. Then, there are two
methods of dealing with the differences among gauges.
1. Correct each measurement made with a particular
gauge for the bias of that gauge and report the
standard deviation of the correction as a type A
uncertainty.
2. Report each measurement as it occurs and assess a
type A uncertainty for the differences among the
gauges.
2.4.5.5. Geometry/configuration differences
How to deal
with
configuration
differences
The mechanism for identifying and/or dealing with
differences among geometries or configurations in an
instrument is basically the same as dealing with differences
among the gauges themselves.
Example of
differences
among wiring
configurations
An example is given of a study of configuration
differences for a single gauge. The gauge, a 4-point probe
for measuring resistivity of silicon wafers, can be wired in
several ways. Because it was not possible to test all wiring
configurations during the gauge study, measurements were
made in only two configurations as a way of identifying
possible problems.
Data on
wiring
configurations
and a plot of
differences
between the 2
wiring
configurations
Measurements were made on six wafers over six days
(except for 5 measurements on wafer 39) with probe #2062
wired in two configurations. This sequence of
measurements was repeated after about a month resulting
in two runs. Differences between measurements in the two
configurations on the same day are shown in the following
table.
Differences between wiring configurations

 Wafer   Day   Probe     Run 1     Run 2
   17.     1   2062.   -0.0108    0.0088
   17.     2   2062.   -0.0111    0.0062
   17.     3   2062.   -0.0062    0.0074
   17.     4   2062.    0.0020    0.0047
   17.     5   2062.    0.0018    0.0049
   17.     6   2062.    0.0002    0.0000
   39.     1   2062.   -0.0089    0.0075
   39.     3   2062.   -0.0040   -0.0016
   39.     4   2062.   -0.0022    0.0052
   39.     5   2062.   -0.0012    0.0085
   39.     6   2062.   -0.0034   -0.0018
   63.     1   2062.   -0.0016    0.0092
   63.     2   2062.   -0.0111    0.0040
   63.     3   2062.   -0.0059    0.0067
   63.     4   2062.   -0.0078    0.0016
   63.     5   2062.   -0.0007    0.0020
   63.     6   2062.    0.0006    0.0017
  103.     1   2062.   -0.0050    0.0076
  103.     2   2062.   -0.0140    0.0002
  103.     3   2062.   -0.0048    0.0025
  103.     4   2062.    0.0018    0.0045
  103.     5   2062.    0.0016   -0.0025
  103.     6   2062.    0.0044    0.0035
  125.     1   2062.   -0.0056    0.0099
  125.     2   2062.   -0.0155    0.0123
  125.     3   2062.   -0.0010    0.0042
  125.     4   2062.   -0.0014    0.0098
  125.     5   2062.    0.0003    0.0032
  125.     6   2062.   -0.0017    0.0115
Test of
difference
between
configurations
Because there are only two configurations, a t-test is used to decide if there is a difference. If

   | t | = sqrt(N) * | Average | / (Std dev)  >  t_(1-alpha/2, N-1)

the difference between the two configurations is statistically significant.
The average and standard deviation computed from the 29
differences in each run are shown in the table below along
with the t-values which confirm that the differences are
significant for both runs.
Average differences between wiring configurations

 Run   Probe    Average   Std dev    N      t
   1    2062   -0.00383   0.00514   29   -4.0
   2    2062   +0.00489   0.00400   29   +6.6
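The t values in the table can be checked with a one-sample t statistic computed from the summary numbers above; the 0.05 significance level in this R sketch is an illustrative choice.

  avg <- c(-0.00383, 0.00489)   # run 1 and run 2 average differences
  sdv <- c(0.00514, 0.00400)
  n   <- 29

  t.stat <- avg * sqrt(n) / sdv
  t.crit <- qt(0.975, df = n - 1)       # two-sided test at the 0.05 level
  round(t.stat, 1)                      # -4.0 and 6.6, matching the table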
Unexpected
result
The data reveal a wiring bias for both runs that changes
direction between runs. This is a somewhat disturbing
finding, and further study of the gauges is needed. Because
neither wiring configuration is preferred or known to give
the 'correct' result, the differences are treated as a
component of the measurement uncertainty.
2.4.5.6. Remedial actions and strategies
Variability The variability of the gauge in its normal operating mode
needs to be examined in light of measurement
requirements.
If the standard deviation is too large, relative to
requirements, the uncertainty can be reduced by making
repeated measurements and taking advantage of the
standard deviation of the average (which is reduced by a
factor of 1/sqrt(n) when n measurements are averaged).
Causes of
excess
variability
If multiple measurements are not economically feasible in
the workload, then the performance of the gauge must be
improved. Causes of variability which should be examined
are:
Wear
Environmental effects such as humidity
Temperature excursions
Operator technique
Resolution There is no remedy for a gauge with insufficient resolution.
The gauge will need to be replaced with a better gauge.
Lack of
linearity
Lack of linearity can be dealt with by correcting the output
of the gauge to account for bias that is dependent on the
level of the stimulus. Lack of linearity can be tolerated
(left uncorrected) if it does not increase the uncertainty of
the measurement result beyond its requirement.
Drift It would be very difficult to correct a gauge for drift unless
there is sufficient history to document the direction and
size of the drift. Drift can be tolerated if it does not
increase the uncertainty of the measurement result beyond
its requirement.
Differences
among gauges
or
configurations
Significant differences among gauges/configurations can
be treated in one of two ways:
1. By correcting each measurement for the bias of the
specific gauge/configuration.
2. By accepting the difference as part of the uncertainty
of the measurement process.
Differences
among
operators
Differences among operators can be viewed in the same
way as differences among gauges. However, an operator
who is incapable of making measurements to the required
precision because of an untreatable condition, such as a
vision problem, should be re-assigned to other tasks.
2.4.6. Quantifying uncertainties from a gauge
study
Gauge
studies can
be used as
the basis for
uncertainty
assessment
One reason for conducting a gauge study is to quantify
uncertainties in the measurement process that would be
difficult to quantify under conditions of actual measurement.
This is a reasonable approach to take if the results are truly
representative of the measurement process in its working
environment. Consideration should be given to all sources of
error, particularly those sources of error which do not
exhibit themselves in the short-term run.
Potential
problem with
this
approach
The potential problem with this approach is that the
calculation of uncertainty depends totally on the gauge
study. If the measurement process changes its characteristics
over time, the standard deviation from the gauge study will
not be the correct standard deviation for the uncertainty
analysis. One way to try to avoid such a problem is to carry
out a gauge study both before and after the measurements
that are being characterized for uncertainty. The 'before' and
'after' results should indicate whether or not the
measurement process changed in the interim.
Uncertainty
analysis
requires
information
about the
specific
measurement
The computation of uncertainty depends on the particular
measurement that is of interest. The gauge study gathers the
data and estimates standard deviations for sources that
contribute to the uncertainty of the measurement result.
However, specific formulas are needed to relate these
standard deviations to the standard deviation of a
measurement result.
General
guidance
The following sections outline the general approach to
uncertainty analysis and give methods for combining the
standard deviations into a final uncertainty:
1. Approach
2. Methods for type A evaluations
3. Methods for type B evaluations
4. Propagation of error
5. Error budgets and sensitivity coefficients
6. Standard and expanded uncertainties
7. Treatment of uncorrected biases
Type A
evaluations
of random
error
Data collection methods and analyses of random sources of
uncertainty are given for the following:
1. Repeatability of the gauge
2. Reproducibility of the measurement process
3. Stability (very long-term) of the measurement process
Biases - Rule
of thumb
The approach for biases is to estimate the maximum bias
from a gauge study and compute a standard uncertainty
from the maximum bias assuming a suitable distribution.
The formulas shown below assume a uniform distribution
for each bias.
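As a hedged illustration of the uniform-distribution rule: if a bias is known only to be bounded by plus or minus a, the standard deviation of a uniform distribution on (-a, a) is a/sqrt(3). The half-width a appropriate to each source (resolution, linearity, hysteresis, drift) should be taken from the definitions that follow; the value in this R sketch is arbitrary.

  a <- 0.010                 # illustrative maximum (half-width) of the bias
  u <- a / sqrt(3)           # standard deviation of a uniform distribution on (-a, a)
  u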
Determining
resolution
If the resolution of the gauge is δ, the standard uncertainty
for resolution is
Determining
non-linearity
If the maximum departure from linearity for the gauge has
been determined from a gauge study, and it is reasonable to
assume that the gauge is equally likely to be engaged at any
point within the range tested, the standard uncertainty for
linearity is
Hysteresis Hysteresis, as a performance specification, is defined (NCSL
RP-12) as the maximum difference between the upscale and
downscale readings on the same artifact during a full range
traverse in each direction. The standard uncertainty for
hysteresis is
Determining
drift
Drift in direct reading instruments is defined for a specific
time interval of interest. The standard uncertainty for drift is
where Y_0 and Y_t are measurements at time zero and t, respectively.
Other biases Other sources of bias are discussed as follows:
1. Differences among gauges
2. Differences among configurations
Case study: Type A uncertainties from a gauge study
A case study on type A uncertainty analysis from a gauge study is recommended as a guide for bringing together the principles and elements discussed in this section. The study in question characterizes the uncertainty of resistivity measurements made on silicon wafers.
2.5. Uncertainty analysis
Uncertainty
measures
'goodness'
of a test
result
This section discusses the uncertainty of measurement results.
Uncertainty is a measure of the 'goodness' of a result.
Without such a measure, it is impossible to judge the fitness
of the value as a basis for making decisions relating to health,
safety, commerce or scientific excellence.
Contents 1. What are the issues for uncertainty analysis?
2. Approach to uncertainty analysis
1. Steps
3. Type A evaluations
1. Type A evaluations of random error
1. Time-dependent components
2. Measurement configurations
2. Type A evaluations of material inhomogeneities
1. Data collection and analysis
3. Type A evaluations of bias
1. Treatment of inconsistent bias
2. Treatment of consistent bias
3. Treatment of bias with sparse data
4. Type B evaluations
1. Assumed distributions
5. Propagation of error considerations
1. Functions of a single variable
2. Functions of two variables
3. Functions of several variables
6. Error budgets and sensitivity coefficients
1. Sensitivity coefficients for measurements on the
test item
2. Sensitivity coefficients for measurements on a
check standard
3. Sensitivity coefficients for measurements with a
2-level design
4. Sensitivity coefficients for measurements with a
3-level design
5. Example of error budget
7. Standard and expanded uncertainties
1. Degrees of freedom
8. Treatment of uncorrected bias
1. Computation of revised uncertainty
2.5.1. Issues
Issues for
uncertainty
analysis
Evaluation of uncertainty is an ongoing process that can
consume time and resources. It can also require the
services of someone who is familiar with data analysis
techniques, particularly statistical analysis. Therefore, it
is important for laboratory personnel who are
approaching uncertainty analysis for the first time to be
aware of the resources required and to carefully lay out a
plan for data collection and analysis.
Problem areas Some laboratories, such as test laboratories, may not
have the resources to undertake detailed uncertainty
analyses even though, increasingly, quality management
standards such as the ISO 9000 series are requiring that
all measurement results be accompanied by statements of
uncertainty.
Other situations where uncertainty analyses are
problematical are:
One-of-a-kind measurements
Dynamic measurements that depend strongly on
the application for the measurement
Directions being
pursued
What can be done in these situations? There is no
definitive answer at this time. Several organizations,
such as the National Conference of Standards
Laboratories (NCSL) and the International Standards
Organization (ISO) are investigating methods for dealing
with this problem, and there is a document in draft that
will recommend a simplified approach to uncertainty
analysis based on results of interlaboratory tests.
Relationship to
interlaboratory
test results
Many laboratories or industries participate in
interlaboratory studies where the test method itself is
evaluated for:
repeatability within laboratories
reproducibility across laboratories
These evaluations do not lead to uncertainty statements
because the purpose of the interlaboratory test is to
evaluate, and then improve, the test method as it is
applied across the industry. The purpose of uncertainty
analysis is to evaluate the result of a particular
measurement, in a particular laboratory, at a particular
time. However, the two purposes are related.
Default
recommendation
for test
laboratories
If a test laboratory has been party to an interlaboratory
test that follows the recommendations and analyses of an
American Society for Testing Materials standard (ASTM
E691) or an ISO standard (ISO 5725), the laboratory
can, as a default, represent its standard uncertainty for a
single measurement as the reproducibility standard
deviation as defined in ASTM E691 and ISO 5725. This
standard deviation includes components for within-
laboratory repeatability common to all laboratories and
between-laboratory variation.
Drawbacks of
this procedure
The standard deviation computed in this manner
describes a future single measurement made at a
laboratory randomly drawn from the group and leads to a
prediction interval (Hahn & Meeker) rather than a
confidence interval. It is not an ideal solution and may
produce either an unrealistically small or unacceptably
large uncertainty for a particular laboratory. The
procedure can reward laboratories with poor performance
or those that do not follow the test procedures to the
letter and punish laboratories with good performance.
Further, the procedure does not take into account sources
of uncertainty other than those captured in the
interlaboratory test. Because the interlaboratory test is a
snapshot at one point in time, characteristics of the
measurement process over time cannot be accurately
evaluated. Therefore, it is a strategy to be used only
where there is no possibility of conducting a realistic
uncertainty investigation.
2.5.2. Approach
Procedures
in this
chapter
The procedures in this chapter are intended for test
laboratories, calibration laboratories, and scientific
laboratories that report results of measurements from
ongoing or well-documented processes.
Pertinent
sections
The following pages outline methods for estimating the
individual uncertainty components, which are consistent
with materials presented in other sections of this Handbook,
and rules and equations for combining them into a final
expanded uncertainty. The general framework is:
1. ISO Approach
2. Outline of steps to uncertainty analysis
3. Methods for type A evaluations
4. Methods for type B evaluations
5. Propagation of error considerations
6. Uncertainty budgets and sensitivity coefficients
7. Standard and expanded uncertainties
8. Treatment of uncorrected bias
Specific
situations are
outlined in
other places
in this
chapter
Methods for calculating uncertainties for specific results are
explained in the following sections:
Calibrated values of artifacts
Calibrated values from calibration curves
From propagation of error
From check standard measurements
Comparison of check standards and
propagation of error
Gauge R & R studies
Type A components for resistivity measurements
Type B components for resistivity measurements
ISO
definition of
uncertainty
Uncertainty, as defined in the ISO Guide to the Expression
of Uncertainty in Measurement (GUM) and the
International Vocabulary of Basic and General Terms in
Metrology (VIM), is a
"parameter, associated with the result of a
measurement, that characterizes the dispersion
of the values that could reasonably be
attributed to the measurand."
Consistent
with
historical
view of
uncertainty
This definition is consistent with the well-established
concept that an uncertainty statement assigns credible limits
to the accuracy of a reported value, stating to what extent
that value may differ from its reference value (Eisenhart).
In some cases, reference values will be traceable to a
national standard, and in certain other cases, reference
values will be consensus values based on measurements
made according to a specific protocol by a group of
laboratories.
Accounts for
both random
error and
bias
The estimation of a possible discrepancy takes into account
both random error and bias in the measurement process.
The distinction to keep in mind with regard to random error
and bias is that random errors cannot be corrected, and
biases can, theoretically at least, be corrected or eliminated
from the measurement result.
Relationship
to precision
and bias
statements
Precision and bias are properties of a measurement method.
Uncertainty is a property of a specific result for a single
test item that depends on a specific measurement
configuration (laboratory/instrument/operator, etc.). It
depends on the repeatability of the instrument; the
reproducibility of the result over time; the number of
measurements in the test result; and all sources of random
and systematic error that could contribute to disagreement
between the result and its reference value.
Handbook
follows the
ISO
approach
This Handbook follows the ISO approach (GUM) to stating
and combining components of uncertainty. To this basic
structure, it adds a statistical framework for estimating
individual components, particularly those that are classified
as type A uncertainties.
Basic ISO
tenets
The ISO approach is based on the following rules:
Each uncertainty component is quantified by a
standard deviation.
All biases are assumed to be corrected and any
uncertainty is the uncertainty of the correction.
Zero corrections are allowed if the bias cannot be
corrected and an uncertainty is assessed.
All uncertainty intervals are symmetric.
ISO
approach to
classifying
sources of
error
Components are grouped into two major categories,
depending on the source of the data and not on the type of
error, and each component is quantified by a standard
deviation. The categories are:
Type A - components evaluated by statistical
methods
Type B - components evaluated by other means (or in
other laboratories)
Interpretation
of this
classification
One way of interpreting this classification is that it
distinguishes between information that comes from sources
local to the measurement process and information from
other sources -- although this interpretation does not always
hold. In the computation of the final uncertainty it makes no
difference how the components are classified because the
ISO guidelines treat type A and type B evaluations in the
same manner.
Rule of
quadrature
All uncertainty components (standard deviations) are
combined by root-sum-squares (quadrature) to arrive at a
'standard uncertainty', u, which is the standard deviation of
the reported value, taking into account all sources of error,
both random and systematic, that affect the measurement
result.
Expanded
uncertainty
for a high
degree of
confidence
If the purpose of the uncertainty statement is to provide
coverage with a high level of confidence, an expanded
uncertainty is computed as
U = k u
where k is chosen to be the t_(1-alpha/2, nu) critical value from the t-table with nu degrees of freedom.
For large degrees of freedom, it is suggested to use k = 2
to approximate 95% coverage. Details for these calculations
are found under degrees of freedom.
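A small R sketch that combines hypothetical standard deviations by quadrature and expands the result with a t-based coverage factor; the component values and degrees of freedom are illustrative, and in practice each component carries its own sensitivity coefficient as described in the sections on error budgets.

  s.components <- c(0.05, 0.02, 0.01)      # illustrative type A and type B standard deviations
  u <- sqrt(sum(s.components^2))           # standard uncertainty by root-sum-squares (quadrature)

  nu    <- 60                              # illustrative degrees of freedom for u
  alpha <- 0.05
  k <- qt(1 - alpha / 2, df = nu)          # coverage factor from the t-table
  U <- k * u                               # expanded uncertainty
  c(u = u, k = k, U = U)                   # k is close to 2 for large nu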
Type B
evaluations
Type B evaluations apply to random errors and biases for
which there is little or no data from the local process, and
to random errors and biases from other measurement
processes.
2.5.2.1. Steps
Steps in
uncertainty
analysis -
define the
result to
be
reported
The first step in the uncertainty evaluation is the definition of
the result to be reported for the test item for which an
uncertainty is required. The computation of the standard
deviation depends on the number of repetitions on the test
item and the range of environmental and operational
conditions over which the repetitions were made, in addition
to other sources of error, such as calibration uncertainties for
reference standards, which influence the final result. If the
value for the test item cannot be measured directly, but must
be calculated from measurements on secondary quantities, the
equation for combining the various quantities must be defined.
The steps to be followed in an uncertainty analysis are
outlined for two situations:
Outline of
steps to be
followed in
the
evaluation
of
uncertainty
for a
single
quantity
A. Reported value involves measurements on one quantity.
1. Compute a type A standard deviation for random
sources of error from:
Replicated results for the test item.
Measurements on a check standard.
Measurements made according to a 2-level
designed experiment
Measurements made according to a 3-level
designed experiment
2. Make sure that the collected data and analysis cover all
sources of random error such as:
instrument imprecision
day-to-day variation
long-term variation
and bias such as:
differences among instruments
operator differences.
3. Compute a standard deviation for each type B
component of uncertainty.
4. Combine type A and type B standard deviations into a
standard uncertainty for the reported result using
sensitivity factors.
5. Compute an expanded uncertainty.
Outline of
steps to be
followed in
the
evaluation
of
uncertainty
involving
several
secondary
quantities
B. Reported value involves more than one quantity.
1. Write down the equation showing the relationship
between the quantities.
Write out the propagation of error equation and
do a preliminary evaluation, if possible, based on
propagation of error.
2. If the measurement result can be replicated directly,
regardless of the number of secondary quantities in the
individual repetitions, treat the uncertainty evaluation as
in (A.1) to (A.5) above, being sure to evaluate all
sources of random error in the process.
3. If the measurement result cannot be replicated
directly, treat each measurement quantity as in (A.1)
and (A.2) and:
Compute a standard deviation for each
measurement quantity.
Combine the standard deviations for the
individual quantities into a standard deviation for
the reported result via propagation of error.
4. Compute a standard deviation for each type B
component of uncertainty.
5. Combine type A and type B standard deviations into a
standard uncertainty for the reported result.
6. Compute an expanded uncertainty.
7. Compare the uncertainty derived by propagation of error
with the uncertainty derived by data analysis techniques.
2.5.3. Type A evaluations
Type A
evaluations
apply to
both error
and bias
Type A evaluations can apply to both random error and bias.
The only requirement is that the calculation of the uncertainty
component be based on a statistical analysis of data. The
distinction to keep in mind with regard to random error and
bias is that:
random errors cannot be corrected
biases can, theoretically at least, be corrected or
eliminated from the result.
Caveat for
biases
The ISO guidelines are based on the assumption that all biases
are corrected and that the only uncertainty from this source is
the uncertainty of the correction. The section on type A
evaluations of bias gives guidance on how to assess, correct
and calculate uncertainties related to bias.
Random
error and
bias
require
different
types of
analyses
How the source of error affects the reported value and the
context for the uncertainty determines whether an analysis of
random error or bias is appropriate.
Consider a laboratory with several instruments that can
reasonably be assumed to be representative of all similar
instruments. Then the differences among these instruments
can be considered to be a random effect if the uncertainty
statement is intended to apply to the result of any instrument,
selected at random, from this batch.
If, on the other hand, the uncertainty statement is intended to
apply to one specific instrument, then the bias of this
instrument relative to the group is the component of interest.
The following pages outline methods for type A evaluations
of:
1. Random errors
2. Bias
2.5.3.1. Type A evaluations of random
components
Type A
evaluations of
random
components
Type A sources of uncertainty fall into three main
categories:
1. Uncertainties that reveal themselves over time
2. Uncertainties caused by specific conditions of
measurement
3. Uncertainties caused by material inhomogeneities
Time-dependent
changes are a
primary source
of random
errors
One of the most important indicators of random error is
time, with the root cause perhaps being environmental
changes over time. Three levels of time-dependent
effects are discussed in this section.
Many possible
configurations
may exist in a
laboratory for
making
measurements
Other sources of uncertainty are related to measurement
configurations within the laboratory. Measurements on
test items are usually made on a single day, with a single
operator, on a single instrument, etc. If the intent of the
uncertainty is to characterize all measurements made in
the laboratory, the uncertainty should account for any
differences due to:
1. instruments
2. operators
3. geometries
4. other
Examples of
causes of
differences
within a
laboratory
Examples of causes of differences within a well-
maintained laboratory are:
1. Differences among instruments for measurements
of derived units, such as sheet resistance of silicon,
where the instruments cannot be directly calibrated
to a reference base
2. Differences among operators for optical
measurements that are not automated and depend
strongly on operator sightings
3. Differences among geometrical or electrical
configurations of the instrumentation
Calibrated
instruments do
not fall in this
class
Calibrated instruments do not normally fall in this class
because uncertainties associated with the instrument's
calibration are reported as type B evaluations, and the
instruments in the laboratory should agree within the
calibration uncertainties. Instruments whose responses are
not directly calibrated to the defined unit are candidates
for type A evaluations. This covers situations in which
the measurement is defined by a test procedure or
standard practice using a specific instrument type.
Evaluation
depends on the
context for the
uncertainty
How these differences are treated depends primarily on
the context for the uncertainty statement. The differences,
depending on the context, will be treated either as
random differences, or as bias differences.
Uncertainties
due to
inhomogeneities
Artifacts, electrical devices, and chemical substances, etc.
can be inhomogeneous relative to the quantity that is
being characterized by the measurement process. If this
fact is known beforehand, it may be possible to measure
the artifact very carefully at a specific site and then direct
the user to also measure at this site. In this case, there is
no contribution to measurement uncertainty from
inhomogeneity.
However, this is not always possible, and measurements
may be destructive. As an example, compositions of
chemical compounds may vary from bottle to bottle. If
the reported value for the lot is established from
measurements on a few bottles drawn at random from the
lot, this variability must be taken into account in the
uncertainty statement.
Methods for testing for inhomogeneity and assessing the
appropriate uncertainty are discussed on another page.
2.5.3.1.1. Type A evaluations of time-dependent
effects
Time-
dependent
changes are a
primary
source of
random
errors
One of the most important indicators of random error is
time. Effects not specifically studied, such as
environmental changes, exhibit themselves over time.
Three levels of time-dependent errors are discussed in this
section. These can be usefully characterized as:
1. Level-1 or short-term errors (repeatability,
imprecision)
2. Level-2 or day-to-day errors (reproducibility)
3. Level-3 or long-term errors (stability - which may
not be a concern for all processes)
Day-to-day
errors can be
the dominant
source of
uncertainty
With instrumentation that is exceedingly precise in the
short run, changes over time, often caused by small
environmental effects, are frequently the dominant source
of uncertainty in the measurement process. The uncertainty
statement is not 'true' to its purpose if it describes a
situation that cannot be reproduced over time. The
customer for the uncertainty is entitled to know the range
of possible results for the measurement result, independent
of the day or time of year when the measurement was
made.
Two levels
may be
sufficient
Two levels of time-dependent errors are probably
sufficient for describing the majority of measurement
processes. Three levels may be needed for new
measurement processes or processes whose characteristics
are not well understood.
Measurements
on test item
are used to
assess
uncertainty
only when no
other data are
available
Repeated measurements on the test item generally do not
cover a sufficient time period to capture day-to-day
changes in the measurement process. The standard
deviation of these measurements is quoted as the estimate
of uncertainty only if no other data are available for the
assessment. For J short-term measurements, this standard
deviation has v = J - 1 degrees of freedom.
A check
standard is
the best
device for
capturing all
sources of
random error
The best approach for capturing information on time-
dependent sources of uncertainties is to intersperse the
workload with measurements on a check standard taken at
set intervals over the life of the process. The standard
deviation of the check standard measurements estimates
the overall temporal component of uncertainty directly --
thereby obviating the estimation of individual components.
Nested design
for estimating
type A
uncertainties
Case study:
Temporal
uncertainty
from a 3-level
nested design
A less-efficient method for estimating time-dependent
sources of uncertainty is a designed experiment.
Measurements can be made specifically for estimating two
or three levels of errors. There are many ways to do this,
but the easiest method is a nested design where J short-
term measurements are replicated on K days and the entire
operation is then replicated over L runs (months, etc.). The
analysis of these data leads to:
s_1 = standard deviation with (J - 1) degrees of freedom for short-term errors
s_2 = standard deviation with (K - 1) degrees of freedom for day-to-day errors
s_3 = standard deviation with (L - 1) degrees of freedom for very long-term errors
Approaches
given in this
chapter
The computation of the uncertainty of the reported value
for a test item is outlined for situations where temporal
sources of uncertainty are estimated from:
1. measurements on the test item itself
2. measurements on a check standard
3. measurements from a 2-level nested design (gauge
study)
4. measurements from a 3-level nested design (gauge
study)
2.5.3.1.2. Measurement configuration within the
laboratory
Purpose of
this page
The purpose of this page is to outline options for estimating
uncertainties related to the specific measurement
configuration under which the test item is measured, given
other possible measurement configurations. Some of these
may be controllable and some of them may not, such as:
instrument
operator
temperature
humidity
The effect of uncontrollable environmental conditions in
the laboratory can often be estimated from check standard
data taken over a period of time, and methods for
calculating components of uncertainty are discussed on
other pages. Uncertainties resulting from controllable
factors, such as operators or instruments chosen for a
specific measurement, are discussed on this page.
First, decide
on context for
uncertainty
The approach depends primarily on the context for the
uncertainty statement. For example, if instrument effect is
the question, one approach is to regard, say, the instruments
in the laboratory as a random sample of instruments of the
same type and to compute an uncertainty that applies to all
results regardless of the particular instrument on which the
measurements are made. The other approach is to compute
an uncertainty that applies to results using a specific
instrument.
Next,
evaluate
whether or
not there are
differences
To treat instruments as a random source of uncertainty
requires that we first determine if differences due to
instruments are significant. The same can be said for
operators, etc.
Plan for
collecting
data
To evaluate the measurement process for instruments,
select a random sample of I (I > 4) instruments from those
available. Make measurements on Q (Q >2) artifacts with
each instrument.
Graph
showing
differences
among
instruments
For a graphical analysis, differences from the average for
each artifact can be plotted versus artifact, with instruments
individually identified by a special plotting symbol. The
plot is examined to determine if some instruments always
read high or low relative to the other instruments and if this
behavior is consistent across artifacts. If there are
systematic and significant differences among instruments, a
type A uncertainty for instruments is computed. Notice that
in the graph for resistivity probes, there are differences
among the probes with probes #4 and #5, for example,
consistently reading low relative to the other probes. A
standard deviation that describes the differences among the
probes is included as a component of the uncertainty.
Standard
deviation for
instruments
Given the measurements on each of Q artifacts with each of
I instruments, the pooled standard deviation that describes the
differences among instruments is computed as follows.
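A sketch of the usual pooled estimator, assuming the notation Y_iq for the
measurement made with instrument i on artifact q:

    s_{instruments} = \sqrt{ \frac{1}{Q(I-1)} \sum_{q=1}^{Q} \sum_{i=1}^{I}
    \left( Y_{iq} - \bar{Y}_{\cdot q} \right)^{2} },
    \qquad \bar{Y}_{\cdot q} = \frac{1}{I} \sum_{i=1}^{I} Y_{iq}

with Q(I - 1) degrees of freedom, consistent with the example below
(I = 5 probes and Q = 5 wafers give 20 degrees of freedom).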
Example of
resistivity
measurements
on silicon
wafers
A two-way table of resistivity measurements (ohm.cm)
with 5 probes on 5 wafers (identified as: 138, 139, 140, 141,
142) is shown below. Standard deviations for probes with 4
degrees of freedom each are shown for each wafer. The
pooled standard deviation over all wafers, with 20 degrees
of freedom, is the type A standard deviation for
instruments.
                               Wafers
 Probe        138         139         140         141         142
 -------------------------------------------------------------------
     1    95.1548     99.3118     96.1018    101.1248     94.2593
   281    95.1408     99.3548     96.0805    101.0747     94.2907
   283    95.1493     99.3211     96.0417    101.1100     94.2487
  2062    95.1125     99.2831     96.0492    101.0574     94.2520
  2362    95.0928     99.3060     96.0357    101.0602     94.2148

 Std dev  0.02643     0.02612     0.02826     0.03038     0.02711
 DF             4           4           4           4           4

 Pooled standard deviation = 0.02770       DF = 20
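A minimal R sketch of this computation, assuming the wafer columns are entered
as vectors in probe order 1, 281, 283, 2062, 2362:

    # resistivity (ohm.cm) of each wafer measured with the five probes
    wafers <- data.frame(
      w138 = c(95.1548, 95.1408, 95.1493, 95.1125, 95.0928),
      w139 = c(99.3118, 99.3548, 99.3211, 99.2831, 99.3060),
      w140 = c(96.1018, 96.0805, 96.0417, 96.0492, 96.0357),
      w141 = c(101.1248, 101.0747, 101.1100, 101.0574, 101.0602),
      w142 = c(94.2593, 94.2907, 94.2487, 94.2520, 94.2148)
    )
    sds    <- sapply(wafers, sd)   # per-wafer standard deviations, 4 df each
    pooled <- sqrt(mean(sds^2))    # pooled over wafers, 5 x 4 = 20 df
    round(sds, 5)                  # approximately 0.0264, 0.0261, 0.0283, 0.0304, 0.0271
    round(pooled, 5)               # approximately 0.0277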
2.5.3.2. Material inhomogeneity
Purpose of this
page
The purpose of this page is to outline methods for
assessing uncertainties related to material
inhomogeneities. Artifacts, electrical devices, and
chemical substances, etc. can be inhomogeneous relative
to the quantity that is being characterized by the
measurement process.
Effect of
inhomogeneity
on the
uncertainty
Inhomogeneity can be a factor in the uncertainty analysis
where
1. an artifact is characterized by a single value and
the artifact is inhomogeneous over its surface, etc.
2. a lot of items is assigned a single value from a few
samples from the lot and the lot is inhomogeneous
from sample to sample.
An unfortunate aspect of this situation is that the
uncertainty from inhomogeneity may dominate the
uncertainty. If the measurement process itself is very
precise and in statistical control, the total uncertainty may
still be unacceptable for practical purposes because of
material inhomogeneities.
Targeted
measurements
can eliminate
the effect of
inhomogeneity
It may be possible to measure an artifact very carefully at
a specific site and direct the user to also measure at this
site. In this case there is no contribution to measurement
uncertainty from inhomogeneity.
Example Silicon wafers are doped with boron to produce desired
levels of resistivity (ohm.cm). Manufacturing processes
for semiconductors are not yet capable (at least at the
time this was originally written) of producing 2" diameter
wafers with constant resistivity over the surfaces.
However, because measurements made at the center of a
wafer by a certification laboratory can be reproduced in
the industrial setting, the inhomogeneity is not a factor in
the uncertainty analysis -- as long as only the center-
point of the wafer is used for future measurements.
Random
inhomogeneities
Random inhomogeneities are assessed using statistical
methods for quantifying random errors. An example of
inhomogeneity is a chemical compound which cannot be
sufficiently homogenized with respect to isotopes of
interest. Isotopic ratio determinations, which are
destructive, must be determined from measurements on a
few bottles drawn at random from the lot.
Best strategy The best strategy is to draw a sample of bottles from the
lot for the purpose of identifying and quantifying
between-bottle variability. These measurements can be
made with a method that lacks the accuracy required to
certify isotopic ratios, but is precise enough to allow
between-bottle comparisons. A second sample is drawn
from the lot and measured with an accurate method for
determining isotopic ratios, and the reported value for the
lot is taken to be the average of these determinations.
There are therefore two components of uncertainty
assessed:
1. component that quantifies the imprecision of the
average
2. component that quantifies how much an individual
bottle can deviate from the average.
Systematic
inhomogeneities
Systematic inhomogeneities require a somewhat different
approach. Roughness can vary systematically over the
surface of a 2" square metal piece lathed to have a
specific roughness profile. The certification laboratory
can measure the piece at several sites, but unless it is
possible to characterize roughness as a mathematical
function of position on the piece, inhomogeneity must be
assessed as a source of uncertainty.
Best strategy In this situation, the best strategy is to compute the
reported value as the average of measurements made over
the surface of the piece and assess an uncertainty for
departures from the average. The component of
uncertainty can be assessed by one of several methods
for evaluating bias -- depending on the type of
inhomogeneity.
Standard
method
The simplest approach to the computation of uncertainty
for systematic inhomogeneity is to compute the
maximum deviation from the reported value and,
assuming a uniform, normal or triangular distribution for
the distribution of inhomogeneity, compute the
appropriate standard deviation. Sometimes the
approximate shape of the distribution can be inferred
from the inhomogeneity measurements. The standard
deviation for inhomogeneity assuming a uniform
distribution is:
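A sketch of the standard result, with a denoting the maximum deviation from
the reported value and the inhomogeneity assumed uniform on (-a, +a):

    s_{inhomogeneity} = \frac{a}{\sqrt{3}}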
2.5.3.2.1. Data collection and analysis
Purpose of
this page
The purpose of this page is to outline methods for:
collecting data
testing for inhomogeneity
quantifying the component of uncertainty
Balanced
measurements
at 2-levels
The simplest scheme for identifying and quantifying the effect of
inhomogeneity of a measurement result is a balanced (equal number of
measurements per cell) 2-level nested design. For example, K bottles
of a chemical compound are drawn at random from a lot and J (J > 1)
measurements are made per bottle. The measurements are denoted by Y_jk,
where the k index runs over bottles and the j index runs over
repetitions within a bottle.
Analysis of
measurements
The between (bottle) variance is calculated using an analysis of
variance technique that is repeated here for convenience.
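A sketch using the standard one-way random-effects estimator, assuming Y_jk
denotes repetition j on bottle k, with bottle means Ybar_k and grand mean Ybar:

    s_{bottle}^{2} = \frac{1}{J}\left( s_{B}^{2} - s_{W}^{2} \right),
    \qquad
    s_{B}^{2} = \frac{J}{K-1} \sum_{k=1}^{K} \left( \bar{Y}_{k} - \bar{Y} \right)^{2},
    \qquad
    s_{W}^{2} = \frac{1}{K(J-1)} \sum_{k=1}^{K} \sum_{j=1}^{J}
    \left( Y_{jk} - \bar{Y}_{k} \right)^{2}

Here s_B^2 is the between-bottle mean square and s_W^2 the within-bottle
(repeatability) mean square; their difference can be negative, as noted below.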
Between
bottle
variance may
be negative
If this variance is negative, there is no contribution to uncertainty, and
the bottles are equivalent with regard to their chemical compositions.
Even if the variance is positive, inhomogeneity still may not be
statistically significant, in which case it is not required to be included
as a component of the uncertainty.
If the between-bottle variance is statistically significant (i.e., judged
to be significantly greater than zero), then inhomogeneity contributes to the
uncertainty of the reported value.
Certification,
reported
value and
associated
uncertainty
The purpose of assessing inhomogeneity is to be able to assign a value
to the entire batch based on the average of a few bottles, and the
determination of inhomogeneity is usually made by a less accurate
method than the certification method. The reported value for the batch
would be the average of N repetitions on Q bottles using the
certification method.
The uncertainty calculation is summarized below for the case where
the only contribution to uncertainty from the measurement method
itself is the repeatability standard deviation, s1, associated with the
certification method. For more complicated scenarios, see the pages on
uncertainty budgets.
If the between-bottle component is not statistically significant, only the
repeatability standard deviation of the certification method contributes to the
uncertainty of the reported value. If it is significant, we need to distinguish
two cases and their interpretations:
1. A standard deviation that leads to an interval covering the
difference between the reported value and the average for a bottle
selected at random from the batch.
2. A standard deviation that allows one to test the instrument using a
single measurement; it leads to a prediction interval for the difference
between the reported value and a single measurement, made with the
same precision as the certification measurements, on a bottle selected
at random from the batch. This is appropriate when the instrument under
test is similar to the certification instrument. If the difference is
not within the interval, the user's instrument is in need of
calibration.
Relationship
to prediction
intervals
When the standard deviation for inhomogeneity is included in the
calculation, as in the last two cases above, the uncertainty interval
becomes a prediction interval ( Hahn & Meeker) and is interpreted as
characterizing a future measurement on a bottle drawn at random from
the lot.
2.5.3.3. Type A evaluations of bias
Sources of
bias relate to
the specific
measurement
environment
The sources of bias discussed on this page cover specific
measurement configurations. Measurements on test items
are usually made on a single day, with a single operator,
with a single instrument, etc. Even if the intent of the
uncertainty is to characterize only those measurements made
in one specific configuration, the uncertainty must account
for any significant differences due to:
1. instruments
2. operators
3. geometries
4. other
Calibrated
instruments
do not fall in
this class
Calibrated instruments do not normally fall in this class
because uncertainties associated with the instrument's
calibration are reported as type B evaluations, and the
instruments in the laboratory should agree within the
calibration uncertainties. Instruments whose responses are
not directly calibrated to the defined unit are candidates for
type A evaluations. This covers situations where the
measurement is defined by a test procedure or standard
practice using a specific instrument type.
The best
strategy is to
correct for
bias and
compute the
uncertainty
of the
correction
This problem was treated on the foregoing page as an
analysis of random error for the case where the uncertainty
was intended to apply to all measurements for all
configurations. If measurements for only one configuration
are of interest, such as measurements made with a specific
instrument, or if a smaller uncertainty is required, the
differences among, say, instruments are treated as biases.
The best strategy in this situation is to correct all
measurements made with a specific instrument to the
average for the instruments in the laboratory and compute a
type A uncertainty for the correction. This strategy, of
course, relies on the assumption that the instruments in the
laboratory represent a random sample of all instruments of a
specific type.
Only limited
comparisons
can be made
among
sources of
possible bias
However, suppose that it is possible to make comparisons
among, say, only two instruments and neither is known to
be 'unbiased'. This scenario requires a different strategy
because the average will not necessarily be an unbiased
result. The best strategy, if there is a significant difference
between the instruments (and this should be tested), is to
apply a 'zero' correction and assess a type A uncertainty of
the correction.
Guidelines
for treatment
of biases
The discussion above is intended to point out that there are
many possible scenarios for biases and that they should be
treated on a case-by-case basis. A plan is needed for:
gathering data
testing for bias (graphically and/or statistically)
estimating biases
assessing uncertainties associated with significant
biases caused by:
instruments
operators
configurations, geometries, etc.
inhomogeneities
Plan for
testing and
assessing
bias
Measurements needed for assessing biases among
instruments, say, require a random sample of I (I > 1)
instruments from those available and measurements on Q
(Q > 2) artifacts with each instrument. The same can be said for
the other sources of possible bias. General strategies for
dealing with significant biases are given in the table below.
Data collection and analysis for assessing biases related to:
lack of resolution of instrument
non-linearity of instrument
drift
are addressed in the section on gauge studies.
Sources of
data for
evaluating
this type of
bias
Databases for evaluating bias may be available from:
check standards
gauge R and R studies
control measurements
Strategies for assessing corrections and uncertainties associated with significant biases

 Type of bias                 Examples                        Type of correction            Uncertainty
 1. Inconsistent              Sign change (+ to -),           Zero                          Based on maximum bias
                              varying magnitude
 2. Consistent                Instrument bias ~ same          Bias (for a single            Standard deviation of
                              magnitude over many             instrument) = difference      correction
                              artifacts                       from average over
                                                              several instruments
 3. Not correctable because   Limited testing; e.g., only     Zero                          Standard deviation of
    of sparse data --         2 instruments, operators,                                     correction
    consistent or             configurations, etc.
    inconsistent
 4. Not correctable --        Lack of resolution,             Zero                          Based on maximum bias
    consistent                non-linearity, drift,
                              material inhomogeneity
Strategy
for no
significant
bias
If there is no significant bias over time, there is no correction
and no contribution to uncertainty.
2.5.3.3.1. Inconsistent bias
Strategy for
inconsistent
bias -- apply
a zero
correction
If there is significant bias but it changes direction over time,
a zero correction is assumed and the standard deviation of
the correction is reported as a type A uncertainty, as sketched below.
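A sketch of the correction standard deviation, assuming the biases are bounded
in absolute value by the largest observed bias:

    s_{correction} = \frac{\max |bias|}{\sqrt{3}}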
Computations
based on
uniform or
normal
distribution
The equation for estimating the standard deviation of the
correction assumes that biases are uniformly distributed
between {-max |bias|, + max |bias|}. This assumption is
quite conservative. It gives a larger uncertainty than the
assumption that the biases are normally distributed. If
normality is a more reasonable assumption, substitute the
number '3' for the 'square root of 3' in the equation above.
Example of
change in
bias over
time
The results of resistivity measurements with five probes on
five silicon wafers are shown below for probe #283, which
is the probe of interest at this level with the artifacts being
1 ohm.cm wafers. The bias for probe #283 is negative for
run 1 and positive for run 2 with the runs separated by a
two-month time period. The correction is taken to be zero.
Table of biases (ohm.cm) for probe 283
Wafer Probe Run 1 Run 2
-----------------------------------
11 283 0.0000340 -0.0001841
26 283 -0.0001000 0.0000861
42 283 0.0000181 0.0000781
131 283 -0.0000701 0.0001580
208 283 -0.0000240 0.0001879
Average 283 -0.0000284 0.0000652
A conservative assumption is that the bias could fall
somewhere within the limits +/- a, with a = maximum bias, or
0.0000652 ohm.cm. The standard deviation of the
correction is included as a type A systematic component of
the uncertainty.
2.5.3.3.2. Consistent bias
Consistent
bias
Bias that is significant and persists consistently over time for a
specific instrument, operator, or configuration should be corrected if it
can be reliably estimated from repeated measurements. Results with
the instrument of interest are then corrected to:
Corrected result = Measurement - Estimate of bias
The example below shows how bias can be identified graphically
from measurements on five artifacts with five instruments and
estimated from the differences among the instruments.
Graph
showing
consistent
bias for
probe #5
An analysis of bias for five instruments based on measurements on
five artifacts shows differences from the average for each artifact
plotted versus artifact with instruments individually identified by a
special plotting symbol. The plot is examined to determine if some
instruments always read high or low relative to the other instruments,
and if this behavior is consistent across artifacts. Notice that on the
graph for resistivity probes, probe #2362, (#5 on the graph), which is
the instrument of interest for this measurement process, consistently
reads low relative to the other probes. This behavior is consistent over
2 runs that are separated by a two-month time period.
Strategy -
correct for
bias
Because there is significant and consistent bias for the instrument of
interest, the measurements made with that instrument should be
corrected for its average bias relative to the other instruments.
Computation
of bias
Given the measurements on Q artifacts with I instruments, the average bias
for instrument I', say, is computed as follows.
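A sketch, assuming Y_iq denotes the measurement with instrument i on artifact
q and Ybar_q the average over the I instruments for artifact q:

    Bias_{I'} = \frac{1}{Q} \sum_{q=1}^{Q} \left( Y_{I'q} - \bar{Y}_{\cdot q} \right),
    \qquad \bar{Y}_{\cdot q} = \frac{1}{I} \sum_{i=1}^{I} Y_{iq}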
Computation
of correction
The correction that should be made to measurements made with
instrument I' is given below.
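A sketch, following the sign convention "corrected result = measurement -
estimate of bias" given above:

    Correction = - Bias_{I'}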
Type A
uncertainty
of the
correction
The type A uncertainty of the correction is the standard deviation of
the average bias, sketched below.
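A sketch, where s_bias is the standard deviation of the Q per-artifact
differences for instrument I':

    s_{correction} = \frac{s_{bias}}{\sqrt{Q}}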
Example of
consistent
bias for
probe #2362
used to
measure
resistivity of
silicon
wafers
The table below comes from the table of resistivity measurements
from a type A analysis of random effects with the average for each
wafer subtracted from each measurement. The differences, as shown,
represent the biases for each probe with respect to the other probes.
Probe #2362 has an average bias, over the five wafers, of -0.02724
ohm.cm. If measurements made with this probe are corrected for this
bias, the standard deviation of the correction is a type A uncertainty.
Table of biases for probes and silicon wafers (ohm.cm)

                               Wafers
 Probe       138        139        140        141        142
 -------------------------------------------------------------
     1    0.02476   -0.00356    0.04002    0.03938    0.00620
   281    0.01076    0.03944    0.01871   -0.01072    0.03761
   283    0.01926    0.00574   -0.02008    0.02458   -0.00439
  2062   -0.01754   -0.03226   -0.01258   -0.02802   -0.00110
  2362   -0.03725   -0.00936   -0.02608   -0.02522   -0.03830

 Average bias for probe #2362     = -0.02724
 Standard deviation of bias       =  0.01171 with 4 degrees of freedom
 Standard deviation of correction =  0.01171/sqrt(5) = 0.00523
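A minimal R sketch reproducing the probe #2362 numbers from the table above:

    bias_2362 <- c(-0.03725, -0.00936, -0.02608, -0.02522, -0.03830)  # biases on wafers 138-142
    mean(bias_2362)          # average bias, about -0.02724
    sd(bias_2362)            # standard deviation of bias, about 0.01171 (4 df)
    sd(bias_2362) / sqrt(5)  # standard deviation of the correction, about 0.00523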
Note on
different
approaches
to
instrument
bias
The analysis on this page considers the case where only one
instrument is used to make the certification measurements; namely
probe #2362, and the certified values are corrected for bias due to this
probe. The analysis in the section on type A analysis of random effects
considers the case where any one of the probes could be used to make
the certification measurements.
2.5.3.3.3. Bias with sparse data
Strategy for
dealing with
limited data
The purpose of this discussion is to outline methods for dealing with biases that may be
real but which cannot be estimated reliably because of the sparsity of the data. For
example, a test between two, of many possible, configurations of the measurement
process cannot produce a reliable enough estimate of bias to permit a correction, but it
can reveal problems with the measurement process. The strategy for a significant bias is
to apply a 'zero' correction. The type A uncertainty component is the standard deviation
of the correction, and the calculation depends on whether the bias is
inconsistent
consistent
The analyses in this section can be produced using both Dataplot code and R code.
Example of
differences
among wiring
settings
An example is given of a study of wiring settings for a single gauge. The gauge, a 4-
point probe for measuring resistivity of silicon wafers, can be wired in several ways.
Because it was not possible to test all wiring configurations during the gauge study,
measurements were made in only two configurations as a way of identifying possible
problems.
Data on
wiring
configurations
Measurements were made on six wafers over six days (except for 5 measurements on
wafer 39) with probe #2062 wired in two configurations. This sequence of
measurements was repeated after about a month, resulting in two runs. A database of
differences between measurements in the two configurations on the same day is
analyzed for significance.
Plot the
differences
between the
two wiring
configurations
A plot of the differences between the two configurations shows that the differences for
run 1 are, for the most part, less than zero, and the differences for run 2 are greater than
zero.
Statistical test
for difference
between two
configurations
A t-statistic is used as an approximate test where we are
assuming the differences are approximately normal. The
average difference and standard deviation of the differences
are required for this test. If the test statistic exceeds the
critical value from the t-table (see the sketch below), the
difference between the two configurations is statistically
significant.
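A sketch of the criterion, assuming dbar and s_d are the average and standard
deviation of the N within-day differences:

    |t| = \frac{\sqrt{N}\,|\bar{d}|}{s_{d}} > t_{1-\alpha/2,\; N-1}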
The average and standard deviation computed from the N =
29 differences in each run from the table above are shown
along with corresponding t-values which confirm that the
differences are significant, but in opposite directions, for
both runs.
Average differences between wiring configurations

 Run   Probe     Average     Std dev    N       t
  1    2062     -0.00383     0.00514   29     -4.0
  2    2062     +0.00489     0.00400   29     +6.6
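A minimal R sketch of the test statistic computed from the summary values in
the table (run 1 shown; run 2 is analogous):

    d_bar <- -0.00383; s_d <- 0.00514; N <- 29   # run 1 summary statistics
    t_stat <- sqrt(N) * d_bar / s_d               # about -4.0
    t_crit <- qt(0.975, df = N - 1)               # two-sided 5 % critical value, about 2.05
    abs(t_stat) > t_crit                          # TRUE: difference is significant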
Case of
inconsistent
bias
The data reveal a significant wiring bias for both runs that
changes direction between runs. Because of this
inconsistency, a 'zero' correction is applied to the results,
and the type A uncertainty is taken to be the standard
deviation of the correction, computed below for this study.
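A sketch, assuming the inconsistent-bias rule above is applied with the larger
of the two average differences taken as the maximum bias:

    s_{correction} = \frac{\max |\bar{d}|}{\sqrt{3}}
    = \frac{0.00489}{\sqrt{3}} \approx 0.0028 \;\; \text{ohm.cm}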
Case of
consistent
bias
Even if the bias is consistent over time, a 'zero' correction is
applied to the results. For a single run, the estimated standard
deviation of the correction is computed from that run's average
difference; for two runs (1 and 2), it is computed from the
average differences of both runs.
2.5.4. Type B evaluations
Type B
evaluations
apply to both
error and
bias
Type B evaluations can apply to both random error and bias.
The distinguishing feature is that the calculation of the
uncertainty component is not based on a statistical analysis
of data. The distinction to keep in mind with regard to
random error and bias is that:
random errors cannot be corrected
biases can, theoretically at least, be corrected or
eliminated from the result.
Sources of
type B
evaluations
Some examples of sources of uncertainty that lead to type B
evaluations are:
Reference standards calibrated by another laboratory
Physical constants used in the calculation of the
reported value
Environmental effects that cannot be sampled
Possible configuration/geometry misalignment in the
instrument
Lack of resolution of the instrument
Documented
sources of
uncertainty
from other
processes
Documented sources of uncertainty, such as calibration
reports for reference standards or published reports of
uncertainties for physical constants, pose no difficulties in
the analysis. The uncertainty will usually be reported as an
expanded uncertainty, U, which is converted to the standard
uncertainty,
u = U/k
If the k factor is not known or documented, it is probably
conservative to assume that k = 2.
Sources of
uncertainty
that are
local to the
measurement
process
Sources of uncertainty that are local to the measurement
process but which cannot be adequately sampled to allow a
statistical analysis require type B evaluations. One
technique, which is widely used, is to estimate the worst-
case effect, a, for the source of interest, from
experience
scientific judgment
scant data
A standard deviation, assuming that the effect is two-sided,
can then be computed based on a uniform, triangular, or
normal distribution of possible effects.
Following the Guide to the Expression of Uncertainty in
Measurement (GUM), the convention is to assign infinite
degrees of freedom to standard deviations derived in this
manner.
2.5.4.1. Standard deviations from assumed
distributions
Difficulty
of
obtaining
reliable
uncertainty
estimates
The methods described on this page attempt to avoid the
difficulty of allowing for sources of error for which reliable
estimates of uncertainty do not exist. The methods are based
on assumptions that may, or may not, be valid and require the
experimenter to consider the effect of the assumptions on the
final uncertainty.
Difficulty
of
obtaining
reliable
uncertainty
estimates
The ISO guidelines do not allow worst-case estimates of bias
to be added to the other components, but require they in some
way be converted to equivalent standard deviations. The
approach is to consider that any error or bias, for the situation
at hand, is a random draw from a known statistical
distribution. Then the standard deviation is calculated from
known (or assumed) characteristics of the distribution.
Distributions that can be considered are:
Uniform
Triangular
Normal (Gaussian)
Standard
deviation
for a
uniform
distribution
The uniform distribution leads to the most conservative
estimate of uncertainty; i.e., it gives the largest standard
deviation. The calculation of the standard deviation is based
on the assumption that the end-points, a, of the distribution
are known. It also embodies the assumption that all effects on
the reported value, between -a and +a, are equally likely for
the particular source of uncertainty.
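A sketch of the standard result for a uniform distribution on (-a, +a):

    s_{uniform} = \frac{a}{\sqrt{3}}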
Standard
deviation
for a
triangular
distribution
The triangular distribution leads to a less conservative
estimate of uncertainty; i.e., it gives a smaller standard
deviation than the uniform distribution. The calculation of the
standard deviation is based on the assumption that the end-
points, a, of the distribution are known and the mode of the
triangular distribution occurs at zero.
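A sketch of the standard result for a symmetric triangular distribution on
(-a, +a) with mode at zero:

    s_{triangular} = \frac{a}{\sqrt{6}}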
Standard
deviation
for a
normal
distribution
The normal distribution leads to the least conservative
estimate of uncertainty; i.e., it gives the smallest standard
deviation. The calculation of the standard deviation is based
on the assumption that the end-points, a, encompass 99.7
percent of the distribution.
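A sketch of the standard result when +/- a is taken to cover 99.7 percent
(three standard deviations) of a normal distribution:

    s_{normal} = \frac{a}{3}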
Degrees of
freedom
In the context of using the Welch-Satterthwaite formula with
the above distributions, the degrees of freedom is assumed to
be infinite.
2.5.5. Propagation of error considerations
Top-down
approach
consists of
estimating the
uncertainty
from direct
repetitions of
the
measurement
result
The approach to uncertainty analysis that has been followed up to this point
in the discussion has been what is called a top-down approach. Uncertainty
components are estimated from direct repetitions of the measurement result.
To contrast this with a propagation of error approach, consider the simple
example where we estimate the area of a rectangle from replicate
measurements of length and width. The area
area = length x width
can be computed from each replicate. The standard deviation of the reported
area is estimated directly from the replicates of area.
Advantages of
top-down
approach
This approach has the following advantages:
proper treatment of covariances between measurements of length and
width
proper treatment of unsuspected sources of error that would emerge if
measurements covered a range of operating conditions and a
sufficiently long time period
independence from propagation of error model
Propagation
of error
approach
combines
estimates from
individual
auxiliary
measurements
The formal propagation of error approach is to compute:
1. standard deviation from the length measurements
2. standard deviation from the width measurements
and combine the two into a standard deviation for area using the
approximation for products of two variables (ignoring a possible covariance
between length and width), as sketched below.
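A sketch of the usual product approximation, with s_length and s_width the
standard deviations of the length and width measurements:

    s_{area}^{2} \approx \text{width}^{2}\, s_{length}^{2}
    + \text{length}^{2}\, s_{width}^{2}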
Exact formula Goodman (1960) derived an exact formula for the variance of the
product of two random variables. Given two random variables, x and y
(corresponding to width and length in the above approximate formula), the
exact formula for the variance is:
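A sketch of Goodman's exact result, in the notation defined just below:

    V(xy) = X^{2} V(y) + Y^{2} V(x) + 2XY E_{11} + 2X E_{12} + 2Y E_{21}
    + E_{22} - E_{11}^{2}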
with
X = E(x) and Y = E(y) (corresponding to width and length, respectively,
in the approximate formula)
V(x) = variance of x and V(y) = variance of y (corresponding to s^2 for
width and length, respectively, in the approximate formula)
E_ij = E{(Δx)^i (Δy)^j} where Δx = x - X and Δy = y - Y
To obtain the standard deviation, simply take the square root of the above
formula. Also, an estimate of the statistic is obtained by substituting sample
estimates for the corresponding population values on the right hand side of
the equation.
Approximate
formula
assumes
independence
The approximate formula assumes that length and width are independent.
The exact formula assumes that length and width are not independent.
Disadvantages
of
propagation
of error
approach
In the ideal case, the propagation of error estimate above will not differ from
the estimate made directly from the area measurements. However, in
complicated scenarios, they may differ because of:
unsuspected covariances
disturbances that affect the reported value and not the elementary
measurements (usually a result of mis-specification of the model)
mistakes in propagating the error through the defining formulas
Propagation
of error
formula
Sometimes the measurement of interest cannot be replicated directly and it is
necessary to estimate its uncertainty via propagation of error formulas (Ku).
The propagation of error formula for
Y = f(X, Z, ... )
a function of one or more variables with measurements, X, Z, ... gives the
following estimate for the standard deviation of Y:
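A sketch of the first-order (Taylor series) propagation formula for two
measured variables X and Z; additional variables add analogous terms:

    s_{Y} \approx \sqrt{ \left(\frac{\partial Y}{\partial X}\right)^{2} s_{X}^{2}
    + \left(\frac{\partial Y}{\partial Z}\right)^{2} s_{Z}^{2}
    + 2\,\frac{\partial Y}{\partial X}\frac{\partial Y}{\partial Z}\, s_{XZ} }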
where
s_X is the standard deviation of the X measurements
s_Z is the standard deviation of the Z measurements
s_Y is the standard deviation of the Y measurements
dY/dX is the partial derivative of the function Y with respect to X, etc.
s_XZ is the estimated covariance between the X and Z measurements
Treatment of
covariance
terms
Covariance terms can be difficult to estimate if measurements are not made
in pairs. Sometimes, these terms are omitted from the formula. Guidance on
when this is acceptable practice is given below:
1. If the measurements of X, Z are independent, the associated covariance
term is zero.
2. Generally, reported values of test items from calibration designs have
non-zero covariances that must be taken into account if Y is a
summation such as the mass of two weights, or the length of two gage
blocks end-to-end, etc.
3. Practically speaking, covariance terms should be included in the
computation only if they have been estimated from sufficient data. See
Ku (1966) for guidance on what constitutes sufficient data.
Sensitivity
coefficients
The partial derivatives are the sensitivity coefficients for the associated
components.
Examples of
propagation
of error
analyses
Examples of propagation of error that are shown in this chapter are:
Case study of propagation of error for resistivity measurements
Comparison of check standard analysis and propagation of error for
linear calibration
Propagation of error for quadratic calibration showing effect of
covariance terms
Specific
formulas
Formulas for specific functions can be found in the following sections:
functions of a single variable
functions of two variables
functions of many variables
2.5.5.1. Formulas for functions of one variable
Case:
Y = f(X)
Standard deviations of reported values that are functions of a
single variable are reproduced from a paper by H. Ku (Ku).
The reported value, Y, is a function of the average of N
measurements on a single variable.
The table in the original gives, for each function of the average of N
measurements on X, the corresponding standard deviation of Y in terms of the
standard deviation of X, together with notes: the approximation can be
seriously in error if N is small; some entries are not directly derived from
the propagation formulas; and the original data are assumed to follow an
approximately normal distribution.
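A hedged illustration (not one of the tabulated entries) of how such formulas
arise: if Y = f(Xbar), where Xbar is the average of N measurements with
standard deviation s_X, then to first order

    s_{Y} \approx \left| f'(\bar{X}) \right| \frac{s_{X}}{\sqrt{N}},
    \qquad \text{e.g.} \quad Y = a\bar{X} \;\Rightarrow\;
    s_{Y} \approx |a| \frac{s_{X}}{\sqrt{N}}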
2.5.5.2. Formulas for functions of two variables
Case:
Y=f(X,Z)
Standard deviations of reported values that are functions of
measurements on two variables are reproduced from a paper
by H. Ku (Ku).
The reported value, Y, is a function of averages of N
measurements on two variables.
The table in the original gives, for each function of the averages of the X
and Z measurements, the standard deviation of Y in terms of the standard
deviation of X, the standard deviation of Z, and the covariance of X and Z.
Two notes accompany the table: the covariance term is to be included only if
there is a reliable estimate, and the entries are approximations; for a
product, the exact result could be obtained starting from the exact formula
for the standard deviation of a product derived by Goodman (1960).
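A hedged illustration for a product of two averages, Y = Xbar * Zbar, assuming
paired measurements so that the covariance of the averages is s_XZ / N:

    s_{Y}^{2} \approx \frac{1}{N}\left( \bar{Z}^{2} s_{X}^{2}
    + \bar{X}^{2} s_{Z}^{2} + 2\bar{X}\bar{Z}\, s_{XZ} \right)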
2.5.5.3. Propagation of error for many variables
Example
from fluid
flow with a
nonlinear
function
Computing uncertainty for measurands based on more complicated functions
can be done using basic propagation of errors principles. For example,
suppose we want to compute the uncertainty of the discharge coefficient for
fluid flow (Whetstone et al.), starting from the measurement equation for the
discharge coefficient. Assuming the variables in the measurement equation are
uncorrelated, the squared uncertainty of the discharge coefficient is the sum
of the squared partial derivatives of the measurement equation, each
multiplied by the squared uncertainty of the corresponding variable.
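A sketch of the generic form, with X_1, ..., X_n standing for the variables in
the measurement equation and u(X_i) their standard uncertainties:

    u^{2}(C_{d}) = \sum_{i=1}^{n}
    \left( \frac{\partial C_{d}}{\partial X_{i}} \right)^{2} u^{2}(X_{i})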
Software can
simplify
propagation
of error
Propagation of error for more complicated functions can be done reliably with
software capable of symbolic computations or algebraic representations.
Symbolic computation software can also be used to combine the partial
derivatives with the appropriate standard deviations, and then the standard
deviation for the discharge coefficient can be evaluated and plotted for
specific values of the secondary variables, as shown in the comparison of
check standard analysis and propagation of error.
Simplification
for dealing
with
multiplicative
variables
Propagation of error for several variables can be simplified considerably for
the special case where:
the function, Y, is a simple multiplicative function of secondary
variables, and
uncertainty is evaluated as a percentage.
For three variables, X, Z, W, the function
has a standard deviation in absolute units of
In percent units, the standard deviation can be written as
if all covariances are negligible. These formulas are easily extended to more
than three variables.
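A sketch for the simple product Y = X * Z * W, assuming negligible covariances:

    s_{Y} = \sqrt{ Z^{2} W^{2} s_{X}^{2} + X^{2} W^{2} s_{Z}^{2}
    + X^{2} Z^{2} s_{W}^{2} },
    \qquad
    \left(\frac{s_{Y}}{Y}\right)^{2} = \left(\frac{s_{X}}{X}\right)^{2}
    + \left(\frac{s_{Z}}{Z}\right)^{2} + \left(\frac{s_{W}}{W}\right)^{2}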
2.5.6. Uncertainty budgets and sensitivity
coefficients
Case study
showing
uncertainty
budget
Uncertainty components are listed in a table along with their
corresponding sensitivity coefficients, standard deviations and
degrees of freedom. A table of typical entries illustrates the
concept.
Typical budget of type A and type B uncertainty components

 Type A components            Sensitivity coefficient       Standard deviation   Degrees freedom
 1. Time (repeatability)                                                          v1
 2. Time (reproducibility)                                                        v2
 3. Time (long-term)                                                              v3

 Type B components
 5. Reference standard        nominal test / nominal ref                          v4
Sensitivity
coefficients
show how
components
are related
to result
The sensitivity coefficient shows the relationship of the
individual uncertainty component to the standard deviation
of the reported value for a test item. The sensitivity
coefficient relates to the result that is being reported and not
to the method of estimating uncertainty components, where
the uncertainty, u, is computed as sketched below.
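A sketch of the combination referred to here, with a_i the sensitivity
coefficients and s_i the standard deviations of the R components:

    u = \sqrt{ \sum_{i=1}^{R} a_{i}^{2} s_{i}^{2} }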
Sensitivity
coefficients
for type A
components
of
uncertainty
This section defines sensitivity coefficients that are
appropriate for type A components estimated from repeated
measurements. The pages on type A evaluations, particularly
the pages related to estimation of repeatability and
reproducibility components, should be reviewed before
continuing on this page. The convention for the notation for
sensitivity coefficients for this section is that:
1. a1 refers to the sensitivity coefficient for the
repeatability standard deviation,
2. a2 refers to the sensitivity coefficient for the
reproducibility standard deviation,
3. a3 refers to the sensitivity coefficient for the stability
standard deviation,
with some of the coefficients possibly equal to zero.
Note on
long-term
errors
Even if no day-to-day nor run-to-run measurements were
made in determining the reported value, the sensitivity
coefficient is non-zero if that standard deviation proved to
be significant in the analysis of data.
Sensitivity
coefficients
for other
type A
components
of random
error
Procedures for estimating differences among instruments,
operators, etc., which are treated as random components of
uncertainty in the laboratory, show how to estimate the
standard deviations so that the sensitivity coefficients = 1.
Sensitivity
coefficients
for type A
components
for bias
This Handbook follows the ISO guidelines in that biases are
corrected (correction may be zero), and the uncertainty
component is the standard deviation of the correction.
Procedures for dealing with biases show how to estimate the
standard deviation of the correction so that the sensitivity
coefficients are equal to one.
Sensitivity
coefficients
for specific
applications
The following pages outline methods for computing
sensitivity coefficients where the components of uncertainty
are derived in the following manner:
1. From measurements on the test item itself
2. From measurements on a check standard
3. From measurements in a 2-level design
4. From measurements in a 3-level design
and give an example of an uncertainty budget with
sensitivity coefficients from a 3-level design.
Sensitivity
coefficients
for type B
evaluations
The majority of sensitivity coefficients for type B
evaluations will be one with a few exceptions. The
sensitivity coefficient for the uncertainty of a reference
standard is the nominal value of the test item divided by the
nominal value of the reference standard.
Case study-
sensitivity
coefficients
for
propagation
of error
If the uncertainty of the reported value is calculated from
propagation of error, the sensitivity coefficients are the
multipliers of the individual variance terms in the
propagation of error formula. Formulas are given for
selected functions of:
1. functions of a single variable
2. functions of two variables
3. functions of several variables
2.5.6.1. Sensitivity coefficients for
measurements on the test item
From data
on the test
item itself
If the temporal component is estimated from N short-term
readings on the test item itself, Y1, Y2, ..., YN, and the reported
value is their average, the standard deviation of the reported value
and its degrees of freedom are sketched below.
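A sketch, assuming s1 is the standard deviation computed from the N readings:

    \bar{Y} = \frac{1}{N}\sum_{i=1}^{N} Y_{i},
    \qquad
    s_{reported\ value} = \frac{s_{1}}{\sqrt{N}},
    \qquad \nu = N - 1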
Sensitivity
coefficients
The sensitivity coefficient is sqrt(1/N). The risk in using this
method is that it may seriously underestimate the uncertainty.
To
improve
the
reliability
of the
uncertainty
calculation
If possible, the measurements on the test item should be
repeated over M days and averaged to estimate the reported
value. The standard deviation for the reported value is
computed from the daily averages, and the standard
deviation for the temporal component is computed from the
deviations of the daily averages from their grand average,
with M - 1 degrees of freedom, as sketched below.
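A sketch, assuming Ybar_m (m = 1, ..., M) are the daily averages and Ybar
their grand average:

    s_{days} = \sqrt{ \frac{1}{M-1} \sum_{m=1}^{M}
    \left( \bar{Y}_{m} - \bar{Y} \right)^{2} },
    \qquad \nu = M - 1,
    \qquad s_{reported\ value} = \frac{s_{days}}{\sqrt{M}}

Under this structure the repeatability is already reflected in the scatter of
the daily averages, which is why a1 = 0 below; the day-to-day coefficient is
then plausibly a2 = sqrt(1/M).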
The sensitivity coefficients are: a1 = 0; a2 = sqrt(1/M).
Note on
long-term
errors
Even if no day-to-day nor run-to-run measurements were
made in determining the reported value, the sensitivity
coefficient is non-zero if that standard deviation proved to be
significant in the analysis of data.
2.5.6.2. Sensitivity coefficients for
measurements on a check standard
From
measurements
on check
standards
If the temporal component of the measurement process is
evaluated from measurements on a check standard and
there are M days (M = 1 is permissible) of measurements
on the test item that are structured in the same manner as
the measurements on the check standard, the standard
deviation for the reported value is sketched below, with K - 1 degrees
of freedom from the K entries in the check standard database.
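A sketch, assuming s_C is the standard deviation computed from the K check
standard values:

    s_{reported\ value} = \sqrt{\frac{1}{M}}\; s_{C},
    \qquad \nu = K - 1

Because the check standard values already reflect both instrument precision
and day-to-day variability, the repeatability coefficient is plausibly a1 = 0
and the temporal coefficient a2 = sqrt(1/M), as reflected below.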
Standard
deviation
from check
standard
measurements
The computation of the standard deviation from the check
standard values and its relationship to components of
instrument precision and day-to-day variability of the
process are explained in the section on two-level nested
designs using check standards.
Sensitivity
coefficients
The sensitivity coefficients are: a1 = 0; a2 = sqrt(1/M).
2.5.6.3. Sensitivity coefficients for measurements
from a 2-level design
Sensitivity
coefficients
from a 2-
level
design
If the temporal components are estimated from a 2-level
nested design, and the reported value for a test item is an
average over
N short-term repetitions
M (M = 1 is permissible) days
of measurements on the test item, the standard deviation for
the reported value is:
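A sketch in terms of the underlying components, with s1 the repeatability
standard deviation and s_days the pure between-day standard deviation from the
2-level nested design:

    s_{reported\ value} = \sqrt{ \frac{1}{M} s_{days}^{2}
    + \frac{1}{MN} s_{1}^{2} }

If this is rewritten in terms of the estimated standard deviations s1 and s2
from the design (where s2^2 = s_days^2 + s1^2/J), the coefficients become
a1 = sqrt(1/(MN) - 1/(MJ)) and a2 = sqrt(1/M); this is a hedged reconstruction
consistent with the discussion of degrees of freedom below.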
See the relationships in the section on 2-level nested design
for definitions of the standard deviations and their respective
degrees of freedom.
Problem
with
estimating
degrees of
freedom
If degrees of freedom are required for the uncertainty of the
reported value, the formula above cannot be used directly and
must be rewritten in terms of the standard deviations s1 and s2.
Sensitivity
coefficients
The sensitivity coefficients are: a1 = sqrt(1/(MN) - 1/(MJ));
a2 = sqrt(1/M) (as sketched above). Specific sensitivity coefficients
are shown in the table below for selections of N, M.
Sensitivity coefficients for two components of uncertainty

The table in the original gives the short-term and day-to-day sensitivity
coefficients for the cases (N, M) = (1, 1), (N, 1), and (N, M); the coefficient
entries are formulas in N, M, and J and are not reproduced here.
2.5.6.4. Sensitivity coefficients for
measurements from a 3-level design
Sensitivity
coefficients
from a 3-
level
design
Case study
showing
sensitivity
coefficients
for 3-level
design
If the temporal components are estimated from a 3-level
nested design and the reported value is an average over
N short-term repetitions
M days
P runs
of measurements on the test item, the standard deviation for
the reported value is:
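A sketch in terms of the underlying components, with s1, s_days, and s_runs
the repeatability, between-day, and between-run standard deviations from the
3-level nested design:

    s_{reported\ value} = \sqrt{ \frac{1}{P} s_{runs}^{2}
    + \frac{1}{PM} s_{days}^{2} + \frac{1}{PMN} s_{1}^{2} }

If this is rewritten in terms of the estimated standard deviations s1, s2, and
s3 from the design, the coefficients become a1 = sqrt(1/(NMP) - 1/(JMP)),
a2 = sqrt(1/(MP) - 1/(KP)), and a3 = sqrt(1/P); this is a hedged
reconstruction consistent with the example budget later in this section.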
See the section on analysis of variability for definitions and
relationships among the standard deviations shown in the
equation above.
Problem
with
estimating
degrees of
freedom
If degrees of freedom are required for the uncertainty, the
formula above cannot be used directly and must be rewritten
in terms of the standard deviations , , and .
Sensitivity
coefficients
The sensitivity coefficients are: a1 = sqrt(1/(NMP) - 1/(JMP));
a2 = sqrt(1/(MP) - 1/(KP)); a3 = sqrt(1/P) (as sketched above).
Specific sensitivity coefficients are shown in the table below
for selections of N, M, P. In addition, the following
constraints must be observed:
J >= N and K >= M
Sensitivity coefficients for three components of uncertainty

The table in the original gives the short-term, day-to-day, and run-to-run
sensitivity coefficients for the cases (N, M, P) = (1, 1, 1), (N, 1, 1),
(N, M, 1), and (N, M, P); the coefficient entries are formulas in N, M, P, J,
and K and are not reproduced here.
2.5.6.5. Example of uncertainty budget
Example of
uncertainty
budget for
three
components
of temporal
uncertainty
An uncertainty budget that illustrates several principles of
uncertainty analysis is shown below. The reported value for a
test item is the average of N short-term measurements where
the temporal components of uncertainty were estimated from
a 3-level nested design with J short-term repetitions over K
days.
The number of measurements made on the test item is the
same as the number of short-term measurements in the
design; i.e., N = J. Because there were no repetitions over
days or runs on the test item, M = 1; P = 1. The sensitivity
coefficients for this design are shown on the foregoing page.
Example of
instrument
bias
This example also illustrates the case where the measuring
instrument is biased relative to the other instruments in the
laboratory, with a bias correction applied accordingly. The
sensitivity coefficient, given that the bias correction is based
on measurements on Q artifacts, is defined as a4 = 1, and the
standard deviation, s4, is the standard deviation of the
correction.
Example of error budget for type A and type B uncertainties

 Type A components        Sensitivity coefficient    Standard deviation    Degrees freedom
 1. Repeatability          a1 = 0                                           J - 1
 2. Reproducibility        a2                                               K - 1
 3. Stability              a3 = 1                                           L - 1
 4. Instrument bias        a4 = 1                                           Q - 1
2.5.7. Standard and expanded uncertainties
Definition of
standard
uncertainty
The sensitivity coefficients and standard deviations are
combined by root sum of squares to obtain a 'standard
uncertainty'. Given R components, the standard uncertainty
is:
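A sketch of the root-sum-of-squares combination, with a_i the sensitivity
coefficients and s_i the standard deviations:

    u = \sqrt{ \sum_{i=1}^{R} a_{i}^{2} s_{i}^{2} }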
Expanded
uncertainty
assures a
high level of
confidence
If the purpose of the uncertainty statement is to provide
coverage with a high level of confidence, an expanded
uncertainty is computed as sketched below, where k is chosen to
be the t(1-alpha/2, nu) critical value from the t-table with nu
degrees of freedom. For large degrees of freedom, k = 2
approximates 95 % coverage.
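A sketch of the expanded uncertainty, with u the standard uncertainty defined
above:

    U = k\,u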
Interpretation
of uncertainty
statement
The expanded uncertainty defined above is assumed to
provide a high level of coverage for the unknown true value
of the measurement of interest so that, for any measurement
result, Y, an interval of the type sketched below is obtained.
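A sketch of the intended coverage statement, with Y the measurement result and
U the expanded uncertainty:

    Y - U \;\le\; \text{true value} \;\le\; Y + U
    \qquad \text{(with high probability)}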
2.5.7.1. Degrees of freedom
Degrees of
freedom for
individual
components
of
uncertainty
Degrees of freedom for type A uncertainties are the degrees
of freedom for the respective standard deviations. Degrees of
freedom for Type B evaluations may be available from
published reports or calibration certificates. Special cases
where the standard deviation must be estimated from
fragmentary data or scientific judgment are assumed to have
infinite degrees of freedom; for example,
Worst-case estimate based on a robustness study or
other evidence
Estimate based on an assumed distribution of possible
errors
Type B uncertainty component for which degrees of
freedom are not documented
Degrees of
freedom for
the
standard
uncertainty
Degrees of freedom for the standard uncertainty, u, which
may be a combination of many standard deviations, is not
generally known. This is particularly troublesome if there are
large components of uncertainty with small degrees of
freedom. In this case, the degrees of freedom is approximated
by the Welch-Satterthwaite formula (Brownlee).
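A sketch of the Welch-Satterthwaite approximation, with a_i, s_i, and nu_i the
sensitivity coefficients, standard deviations, and degrees of freedom of the
individual components:

    \nu_{eff} = \frac{ \left( \sum_{i} a_{i}^{2} s_{i}^{2} \right)^{2} }
    { \sum_{i} a_{i}^{4} s_{i}^{4} / \nu_{i} }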
Case study:
Uncertainty
and
degrees of
freedom
A case study of type A uncertainty analysis shows the
computations of temporal components of uncertainty;
instrument bias; geometrical bias; standard uncertainty;
degrees of freedom; and expanded uncertainty.
2.5.8. Treatment of uncorrected bias
Background The ISO Guide ( ISO) for expressing measurement
uncertainties assumes that all biases are corrected and that the
uncertainty applies to the corrected result. For measurements
at the factory floor level, this approach has several
disadvantages. It may not be practical, may be expensive and
may not be economically sound to correct for biases that do
not impact the commercial value of the product (Turgel and
Vecchia).
Reasons for
not
correcting
for bias
Corrections may be expensive to implement if they require
modifications to existing software and "paper and pencil"
corrections can be both time consuming and prone to error.
In the scientific or metrology laboratory, biases may be
documented in certain situations, but the mechanism that
causes the bias may not be fully understood, or repeatable,
which makes it difficult to argue for correction. In these
cases, the best course of action is to report the measurement
as taken and adjust the uncertainty to account for the "bias".
The
question is
how to
adjust the
uncertainty
A method needs to be developed which assures that the
resulting uncertainty has the following properties (Phillips
and Eberhardt):
1. The final uncertainty must be greater than or equal to
the uncertainty that would be quoted if the bias were
corrected.
2. The final uncertainty must reduce to the same
uncertainty given that the bias correction is applied.
3. The level of coverage that is achieved by the final
uncertainty statement should be at least the level
obtained for the case of corrected bias.
4. The method should be transferable so that both the
uncertainty and the bias can be used as components of
uncertainty in another uncertainty statement.
5. The method should be easy to implement.
2.5.8.1. Computation of revised uncertainty
Definition of
the bias and
corrected
measurement
If the bias is denoted by delta and the corrected measurement is
defined by removing delta from the measured result, the corrected
value of Y has the usual expanded uncertainty interval, which is
symmetric around the unknown true value for the measurement
process and is of the following type:
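A sketch of the usual symmetric interval, with U the expanded uncertainty:

    Y_{corrected} - U \;\le\; \text{true value} \;\le\; Y_{corrected} + U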
Definition of
asymmetric
uncertainty
interval to
account for
uncorrected
measurement
If no correction is made for the bias, the uncertainty interval
is contaminated by the effect of the bias term as follows:
and can be rewritten in terms of upper and lower endpoints
that are asymmetric around the true value; namely,
Conditions
on the
relationship
between the
bias and U
The definition above can lead to a negative uncertainty
limit; e.g., if the bias is positive and greater than U, the
upper endpoint becomes negative. The requirement that the
uncertainty limits be greater than or equal to zero for all
values of the bias guarantees non-negative uncertainty
limits and is accepted at the cost of somewhat wider
uncertainty intervals. This leads to the following set of
restrictions on the uncertainty limits:
Situation
where bias is
not known
exactly but
must be
If the bias is not known exactly, its magnitude is estimated
from repeated measurements, from sparse data or from
theoretical considerations, and the standard deviation is
estimated from repeated measurements or from an assumed
distribution. The standard deviation of the bias becomes an
estimated component in the uncertainty analysis with the standard
uncertainty restructured to be:
and the expanded uncertainty limits become:
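A hedged sketch, assuming s_bias is the standard deviation assigned to the
bias estimate delta and u is the standard uncertainty computed without the
bias term; the expanded limits then shift by the bias as in the asymmetric
interval described above:

    u^{*} = \sqrt{ u^{2} + s_{bias}^{2} },
    \qquad
    \text{limits: } \; -\left( k\,u^{*} + \delta \right)
    \;\; \text{to} \;\; +\left( k\,u^{*} - \delta \right)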
Interpretation The uncertainty intervals described above have the
desirable properties outlined on a previous page. For more
information on theory and industrial examples, the reader
should consult the paper by the authors of this technique
(Phillips and Eberhardt).
2.6. Case studies
http://www.itl.nist.gov/div898/handbook/mpc/section6/mpc6.htm[6/27/2012 1:52:08 PM]
2. Measurement Process Characterization
2.6. Case studies
Contents The purpose of this section is to illustrate the planning,
procedures, and analyses outlined in the various sections of
this chapter with data taken from measurement processes at
the National Institute of Standards and Technology.
1. Gauge study of resistivity probes
2. Check standard study for resistivity measurements
3. Type A uncertainty analysis
4. Type B uncertainty analysis and propagation of
error
2.6.1. Gauge study of resistivity probes
http://www.itl.nist.gov/div898/handbook/mpc/section6/mpc61.htm[6/27/2012 1:52:08 PM]
2. Measurement Process Characterization
2.6. Case studies
2.6.1. Gauge study of resistivity probes
Purpose The purpose of this case study is to outline the analysis of a
gauge study that was undertaken to identify the sources of
uncertainty in resistivity measurements of silicon wafers.
Outline 1. Background and data
2. Analysis and interpretation
3. Graphs showing repeatability standard deviations
4. Graphs showing day-to-day variability
5. Graphs showing differences among gauges
6. Run this example yourself with Dataplot
7. Dataplot macros
2.6.1.1. Background and data
http://www.itl.nist.gov/div898/handbook/mpc/section6/mpc611.htm[6/27/2012 1:52:09 PM]
2. Measurement Process Characterization
2.6. Case studies
2.6.1. Gauge study of resistivity probes
2.6.1.1. Background and data
Description
of
measurements
Measurements of resistivity on 100 ohm.cm wafers were
made according to an ASTM Standard Test Method
(ASTM F84) to assess the sources of uncertainty in the
measurement system. Resistivity measurements have been
studied over the years, and it is clear from those data that
there are sources of variability affecting the process beyond
the basic imprecision of the gauges. Changes in
measurement results have been noted over days and over
months and the data in this study are structured to quantify
these time-dependent changes in the measurement process.
Gauges The gauges for the study were five probes used to measure
resistivity of silicon wafers. The five gauges are assumed to
represent a random sample of typical 4-point gauges for
making resistivity measurements. There is a question of
whether or not the gauges are essentially equivalent or
whether biases among them are possible.
Check
standards
The check standards for the study were five wafers selected
at random from the batch of 100 ohm.cm wafers.
Operators The effect of operator was not considered to be significant
for this study.
Database of
measurements
The 3-level nested design consisted of:
J = 6 measurements at the center of each wafer per
day
K = 6 days
L = 2 runs
To characterize the probes and the influence of wafers on
the measurements, the design was repeated over:
Q = 5 wafers (check standards 138, 139, 140, 141,
142)
I = 5 probes (1, 281, 283, 2062, 2362)
The runs were separated by about one month in time. The J
= 6 measurements at the center of each wafer are reduced
to an average and repeatability standard deviation and
recorded in a database with identifications for wafer, probe,
and day.
Software The analyses used in this case study can be generated using
both Dataplot code and R code.
2.6.1.1.1. Database of resistivity measurements
http://www.itl.nist.gov/div898/handbook/mpc/section6/mpc6111.htm[6/27/2012 1:52:09 PM]
2. Measurement Process Characterization
2.6. Case studies
2.6.1. Gauge study of resistivity probes
2.6.1.1. Background and data
2.6.1.1.1. Database of resistivity measurements
The check
standards are
five wafers
chosen at
random from
a batch of
wafers
Measurements of resistivity (ohm.cm) were made
according to an ASTM Standard Test Method (F84) at NIST
to assess the sources of uncertainty in the measurement
system. The gauges for the study were five probes owned
by NIST; the check standards for the study were five
wafers selected at random from a batch of wafers cut from
one silicon crystal doped with phosphorous to give a
nominal resistivity of 100 ohm.cm.
Measurements
on the check
standards are
used to
estimate
repeatability,
day effect,
and run effect
The effect of operator was not considered to be significant
for this study; therefore, 'day' replaces 'operator' as a factor
in the nested design. Averages and standard deviations
from J = 6 measurements at the center of each wafer are
shown in the table.
J = 6 measurements at the center of the wafer per
day
K = 6 days (one operator) per repetition
L = 2 runs (complete)
Q = 5 wafers (check standards 138, 139, 140, 141,
142)
I = 5 probes (1, 281, 283, 2062, 2362)
Run Wafer Probe Month Day Op Temp Average
Std Dev
1 138. 1. 3. 15. 1. 22.98 95.1772
0.1191
1 138. 1. 3. 17. 1. 23.02 95.1567
0.0183
1 138. 1. 3. 18. 1. 22.79 95.1937
0.1282
1 138. 1. 3. 21. 1. 23.17 95.1959
0.0398
1 138. 1. 3. 23. 2. 23.25 95.1442
0.0346
1 138. 1. 3. 23. 1. 23.20 95.0610
0.1539
1 138. 281. 3. 16. 1. 22.99 95.1591
0.0963
1 138. 281. 3. 17. 1. 22.97 95.1195
0.0606
1 138. 281. 3. 18. 1. 22.83 95.1065
0.0842
1 138. 281. 3. 21. 1. 23.28 95.0925
0.0973
1 138. 281. 3. 23. 2. 23.14 95.1990
0.1062
1 138. 281. 3. 23. 1. 23.16 95.1682
0.1090
1 138. 283. 3. 16. 1. 22.95 95.1252
0.0531
1 138. 283. 3. 17. 1. 23.08 95.1600
0.0998
1 138. 283. 3. 18. 1. 23.13 95.0818
0.1108
1 138. 283. 3. 21. 1. 23.28 95.1620
0.0408
1 138. 283. 3. 22. 1. 23.36 95.1735
0.0501
1 138. 283. 3. 24. 2. 22.97 95.1932
0.0287
1 138. 2062. 3. 16. 1. 22.97 95.1311
0.1066
1 138. 2062. 3. 17. 1. 22.98 95.1132
0.0415
1 138. 2062. 3. 18. 1. 23.16 95.0432
0.0491
1 138. 2062. 3. 21. 1. 23.16 95.1254
0.0603
1 138. 2062. 3. 22. 1. 23.28 95.1322
0.0561
1 138. 2062. 3. 24. 2. 23.19 95.1299
0.0349
1 138. 2362. 3. 15. 1. 23.08 95.1162
0.0480
1 138. 2362. 3. 17. 1. 23.01 95.0569
0.0577
1 138. 2362. 3. 18. 1. 22.97 95.0598
0.0516
1 138. 2362. 3. 22. 1. 23.23 95.1487
0.0386
1 138. 2362. 3. 23. 2. 23.28 95.0743
0.0256
1 138. 2362. 3. 24. 2. 23.10 95.1010
0.0420
1 139. 1. 3. 15. 1. 23.01 99.3528
0.1424
1 139. 1. 3. 17. 1. 23.00 99.2940
0.0660
1 139. 1. 3. 17. 1. 23.01 99.2340
0.1179
1 139. 1. 3. 21. 1. 23.20 99.3489
0.0506
1 139. 1. 3. 23. 2. 23.22 99.2625
0.1111
1 139. 1. 3. 23. 1. 23.22 99.3787
0.1103
1 139. 281. 3. 16. 1. 22.95 99.3244
0.1134
1 139. 281. 3. 17. 1. 22.98 99.3378
0.0949
1 139. 281. 3. 18. 1. 22.86 99.3424
0.0847
1 139. 281. 3. 22. 1. 23.17 99.4033
0.0801
1 139. 281. 3. 23. 2. 23.10 99.3717
0.0630
1 139. 281. 3. 23. 1. 23.14 99.3493
0.1157
1 139. 283. 3. 16. 1. 22.94 99.3065
0.0381
1 139. 283. 3. 17. 1. 23.09 99.3280
0.1153
1 139. 283. 3. 18. 1. 23.11 99.3000
0.0818
1 139. 283. 3. 21. 1. 23.25 99.3347
0.0972
1 139. 283. 3. 22. 1. 23.36 99.3929
0.1189
1 139. 283. 3. 23. 1. 23.18 99.2644
0.0622
1 139. 2062. 3. 16. 1. 22.94 99.3324
0.1531
1 139. 2062. 3. 17. 1. 23.08 99.3254
0.0543
1 139. 2062. 3. 18. 1. 23.15 99.2555
0.1024
1 139. 2062. 3. 18. 1. 23.18 99.1946
0.0851
1 139. 2062. 3. 22. 1. 23.27 99.3542
0.1227
1 139. 2062. 3. 24. 2. 23.23 99.2365
0.1218
1 139. 2362. 3. 15. 1. 23.08 99.2939
0.0818
1 139. 2362. 3. 17. 1. 23.02 99.3234
0.0723
1 139. 2362. 3. 18. 1. 22.93 99.2748
0.0756
1 139. 2362. 3. 22. 1. 23.29 99.3512
0.0475
1 139. 2362. 3. 23. 2. 23.25 99.2350
0.0517
1 139. 2362. 3. 24. 2. 23.05 99.3574
0.0485
1 140. 1. 3. 15. 1. 23.07 96.1334
0.1052
1 140. 1. 3. 17. 1. 23.08 96.1250
0.0916
1 140. 1. 3. 18. 1. 22.77 96.0665
0.0836
1 140. 1. 3. 21. 1. 23.18 96.0725
0.0620
1 140. 1. 3. 23. 2. 23.20 96.1006
0.0582
1 140. 1. 3. 23. 1. 23.21 96.1131
0.1757
1 140. 281. 3. 16. 1. 22.94 96.0467
0.0565
1 140. 281. 3. 17. 1. 22.99 96.1081
0.1293
1 140. 281. 3. 18. 1. 22.91 96.0578
0.1148
1 140. 281. 3. 22. 1. 23.15 96.0700
0.0495
1 140. 281. 3. 22. 1. 23.33 96.1052
0.1722
1 140. 281. 3. 23. 1. 23.19 96.0952
0.1786
1 140. 283. 3. 16. 1. 22.89 96.0650
0.1301
1 140. 283. 3. 17. 1. 23.07 96.0870
0.0881
1 140. 283. 3. 18. 1. 23.07 95.8906
0.1842
1 140. 283. 3. 21. 1. 23.24 96.0842
0.1008
1 140. 283. 3. 22. 1. 23.34 96.0189
0.0865
1 140. 283. 3. 23. 1. 23.19 96.1047
0.0923
1 140. 2062. 3. 16. 1. 22.95 96.0379
0.2190
1 140. 2062. 3. 17. 1. 22.97 96.0671
0.0991
1 140. 2062. 3. 18. 1. 23.15 96.0206
0.0648
1 140. 2062. 3. 21. 1. 23.14 96.0207
0.1410
1 140. 2062. 3. 22. 1. 23.32 96.0587
0.1634
1 140. 2062. 3. 24. 2. 23.17 96.0903
0.0406
1 140. 2362. 3. 15. 1. 23.08 96.0771
0.1024
1 140. 2362. 3. 17. 1. 23.00 95.9976
0.0943
1 140. 2362. 3. 18. 1. 23.01 96.0148
0.0622
1 140. 2362. 3. 22. 1. 23.27 96.0397
0.0702
1 140. 2362. 3. 23. 2. 23.24 96.0407
0.0627
1 140. 2362. 3. 24. 2. 23.13 96.0445
0.0622
1 141. 1. 3. 15. 1. 23.01 101.2124
0.0900
1 141. 1. 3. 17. 1. 23.08 101.1018
0.0820
1 141. 1. 3. 18. 1. 22.75 101.1119
0.0500
1 141. 1. 3. 21. 1. 23.21 101.1072
0.0641
1 141. 1. 3. 23. 2. 23.25 101.0802
0.0704
1 141. 1. 3. 23. 1. 23.19 101.1350
0.0699
1 141. 281. 3. 16. 1. 22.93 101.0287
0.0520
1 141. 281. 3. 17. 1. 23.00 101.0131
0.0710
1 141. 281. 3. 18. 1. 22.90 101.1329
0.0800
1 141. 281. 3. 22. 1. 23.19 101.0562
0.1594
1 141. 281. 3. 23. 2. 23.18 101.0891
0.1252
1 141. 281. 3. 23. 1. 23.17 101.1283
0.1151
1 141. 283. 3. 16. 1. 22.85 101.1597
0.0990
1 141. 283. 3. 17. 1. 23.09 101.0784
0.0810
1 141. 283. 3. 18. 1. 23.08 101.0715
0.0460
1 141. 283. 3. 21. 1. 23.27 101.0910
0.0880
1 141. 283. 3. 22. 1. 23.34 101.0967
0.0901
1 141. 283. 3. 24. 2. 23.00 101.1627
0.0888
1 141. 2062. 3. 16. 1. 22.97 101.1077
0.0970
1 141. 2062. 3. 17. 1. 22.96 101.0245
0.1210
1 141. 2062. 3. 18. 1. 23.19 100.9650
0.0700
1 141. 2062. 3. 18. 1. 23.18 101.0319
0.1070
1 141. 2062. 3. 22. 1. 23.34 101.0849
0.0960
1 141. 2062. 3. 24. 2. 23.21 101.1302
0.0505
1 141. 2362. 3. 15. 1. 23.08 101.0471
0.0320
1 141. 2362. 3. 17. 1. 23.01 101.0224
0.1020
1 141. 2362. 3. 18. 1. 23.05 101.0702
0.0580
1 141. 2362. 3. 22. 1. 23.22 101.0904
0.1049
1 141. 2362. 3. 23. 2. 23.29 101.0626
0.0702
1 141. 2362. 3. 24. 2. 23.15 101.0686
0.0661
1 142. 1. 3. 15. 1. 23.02 94.3160
0.1372
1 142. 1. 3. 17. 1. 23.04 94.2808
0.0999
1 142. 1. 3. 18. 1. 22.73 94.2478
0.0803
1 142. 1. 3. 21. 1. 23.19 94.2862
0.0700
1 142. 1. 3. 23. 2. 23.25 94.1859
0.0899
1 142. 1. 3. 23. 1. 23.21 94.2389
0.0686
1 142. 281. 3. 16. 1. 22.98 94.2640
0.0862
1 142. 281. 3. 17. 1. 23.00 94.3333
0.1330
1 142. 281. 3. 18. 1. 22.88 94.2994
0.0908
1 142. 281. 3. 21. 1. 23.28 94.2873
0.0846
1 142. 281. 3. 23. 2. 23.07 94.2576
0.0795
1 142. 281. 3. 23. 1. 23.12 94.3027
0.0389
1 142. 283. 3. 16. 1. 22.92 94.2846
0.1021
1 142. 283. 3. 17. 1. 23.08 94.2197
0.0627
1 142. 283. 3. 18. 1. 23.09 94.2119
0.0785
1 142. 283. 3. 21. 1. 23.29 94.2536
0.0712
1 142. 283. 3. 22. 1. 23.34 94.2280
0.0692
1 142. 283. 3. 24. 2. 22.92 94.2944
0.0958
1 142. 2062. 3. 16. 1. 22.96 94.2238
0.0492
1 142. 2062. 3. 17. 1. 22.95 94.3061
0.2194
1 142. 2062. 3. 18. 1. 23.16 94.1868
0.0474
1 142. 2062. 3. 21. 1. 23.11 94.2645
0.0697
1 142. 2062. 3. 22. 1. 23.31 94.3101
0.0532
1 142. 2062. 3. 24. 2. 23.24 94.2204
0.1023
1 142. 2362. 3. 15. 1. 23.08 94.2437
0.0503
1 142. 2362. 3. 17. 1. 23.00 94.2115
0.0919
1 142. 2362. 3. 18. 1. 22.99 94.2348
0.0282
1 142. 2362. 3. 22. 1. 23.26 94.2124
0.0513
1 142. 2362. 3. 23. 2. 23.27 94.2214
0.0627
1 142. 2362. 3. 24. 2. 23.08 94.1651
0.1010
2 138. 1. 4. 13. 1. 23.12 95.1996
0.0645
2 138. 1. 4. 15. 1. 22.73 95.1315
0.1192
2 138. 1. 4. 18. 2. 22.76 95.1845
0.0452
2 138. 1. 4. 19. 1. 22.73 95.1359
0.1498
2 138. 1. 4. 20. 2. 22.73 95.1435
0.0629
2 138. 1. 4. 21. 2. 22.93 95.1839
0.0563
2 138. 281. 4. 14. 2. 22.46 95.2106
0.1049
2 138. 281. 4. 18. 2. 22.80 95.2505
0.0771
2 138. 281. 4. 18. 2. 22.77 95.2648
0.1046
2 138. 281. 4. 20. 2. 22.80 95.2197
0.1779
2 138. 281. 4. 20. 2. 22.87 95.2003
0.1376
2 138. 281. 4. 21. 2. 22.95 95.0982
0.1611
2 138. 283. 4. 18. 2. 22.83 95.1211
0.0794
2 138. 283. 4. 13. 1. 23.17 95.1327
0.0409
2 138. 283. 4. 18. 1. 22.67 95.2053
0.1525
2 138. 283. 4. 19. 2. 23.00 95.1292
0.0655
2 138. 283. 4. 21. 2. 22.91 95.1669
0.0619
2 138. 283. 4. 21. 2. 22.96 95.1401
0.0831
2 138. 2062. 4. 15. 1. 22.64 95.2479
0.2867
2 138. 2062. 4. 15. 1. 22.67 95.2224
0.1945
2 138. 2062. 4. 19. 2. 22.99 95.2810
0.1960
2 138. 2062. 4. 19. 1. 22.75 95.1869
0.1571
2 138. 2062. 4. 21. 2. 22.84 95.3053
0.2012
2 138. 2062. 4. 21. 2. 22.92 95.1432
0.1532
2 138. 2362. 4. 12. 1. 22.74 95.1687
0.0785
2 138. 2362. 4. 18. 2. 22.75 95.1564
0.0430
2 138. 2362. 4. 19. 2. 22.88 95.1354
0.0983
2 138. 2362. 4. 19. 1. 22.73 95.0422
0.0773
2 138. 2362. 4. 20. 2. 22.86 95.1354
0.0587
2 138. 2362. 4. 21. 2. 22.94 95.1075
0.0776
2 139. 1. 4. 13. 2. 23.14 99.3274
0.0220
2 139. 1. 4. 15. 2. 22.77 99.5020
0.0997
2 139. 1. 4. 18. 2. 22.80 99.4016
0.0704
2 139. 1. 4. 19. 1. 22.68 99.3181
0.1245
2 139. 1. 4. 20. 2. 22.78 99.3858
0.0903
2 139. 1. 4. 21. 2. 22.93 99.3141
0.0255
2 139. 281. 4. 14. 2. 23.05 99.2915
0.0859
2 139. 281. 4. 15. 2. 22.71 99.4032
0.1322
2 139. 281. 4. 18. 2. 22.79 99.4612
0.1765
2 139. 281. 4. 20. 2. 22.74 99.4001
0.0889
2 139. 281. 4. 20. 2. 22.91 99.3765
0.1041
2 139. 281. 4. 21. 2. 22.92 99.3507
0.0717
2 139. 283. 4. 13. 2. 23.11 99.3848
0.0792
2 139. 283. 4. 18. 2. 22.84 99.4952
0.1122
2 139. 283. 4. 18. 2. 22.76 99.3220
0.0915
2 139. 283. 4. 19. 2. 23.03 99.4165
0.0503
2 139. 283. 4. 21. 2. 22.87 99.3791
0.1138
2 139. 283. 4. 21. 2. 22.98 99.3985
0.0661
2 139. 2062. 4. 14. 2. 22.43 99.4283
0.0891
2 139. 2062. 4. 15. 2. 22.70 99.4139
0.2147
2 139. 2062. 4. 19. 2. 22.97 99.3813
0.1143
2 139. 2062. 4. 19. 1. 22.77 99.4314
0.1685
2 139. 2062. 4. 21. 2. 22.79 99.4166
0.2080
2 139. 2062. 4. 21. 2. 22.94 99.4052
0.2400
2 139. 2362. 4. 12. 1. 22.82 99.3408
0.1279
2 139. 2362. 4. 18. 2. 22.77 99.3116
0.1131
2 139. 2362. 4. 19. 2. 22.82 99.3241
0.0519
2 139. 2362. 4. 19. 1. 22.74 99.2991
0.0903
2 139. 2362. 4. 20. 2. 22.88 99.3049
0.0783
2 139. 2362. 4. 21. 2. 22.94 99.2782
0.0718
2 140. 1. 4. 13. 1. 23.10 96.0811
0.0463
2 140. 1. 4. 15. 2. 22.75 96.1460
0.0725
2 140. 1. 4. 18. 2. 22.78 96.1582
0.1428
2 140. 1. 4. 19. 1. 22.70 96.1039
0.1056
2 140. 1. 4. 20. 2. 22.75 96.1262
0.0672
2 140. 1. 4. 21. 2. 22.93 96.1478
0.0562
2 140. 281. 4. 15. 2. 22.71 96.1153
0.1097
2 140. 281. 4. 14. 2. 22.49 96.1297
0.1202
2 140. 281. 4. 18. 2. 22.81 96.1233
0.1331
2 140. 281. 4. 20. 2. 22.78 96.1731
0.1484
2 140. 281. 4. 20. 2. 22.89 96.0872
0.0857
2 140. 281. 4. 21. 2. 22.91 96.1331
0.0944
2 140. 283. 4. 13. 2. 23.22 96.1135
0.0983
2 140. 283. 4. 18. 2. 22.85 96.1111
0.1210
2 140. 283. 4. 18. 2. 22.78 96.1221
0.0644
2 140. 283. 4. 19. 2. 23.01 96.1063
0.0921
2 140. 283. 4. 21. 2. 22.91 96.1155
0.0704
2 140. 283. 4. 21. 2. 22.94 96.1308
0.0258
2 140. 2062. 4. 15. 2. 22.60 95.9767
0.2225
2 140. 2062. 4. 15. 2. 22.66 96.1277
0.1792
2 140. 2062. 4. 19. 2. 22.96 96.1858
0.1312
2 140. 2062. 4. 19. 1. 22.75 96.1912
0.1936
2 140. 2062. 4. 21. 2. 22.82 96.1650
0.1902
2 140. 2062. 4. 21. 2. 22.92 96.1603
0.1777
2 140. 2362. 4. 12. 1. 22.88 96.0793
0.0996
2 140. 2362. 4. 18. 2. 22.76 96.1115
0.0533
2 140. 2362. 4. 19. 2. 22.79 96.0803
0.0364
2 140. 2362. 4. 19. 1. 22.71 96.0411
0.0768
2 140. 2362. 4. 20. 2. 22.84 96.0988
0.1042
2 140. 2362. 4. 21. 1. 22.94 96.0482
0.0868
2 141. 1. 4. 13. 1. 23.07 101.1984
0.0803
2 141. 1. 4. 15. 2. 22.72 101.1645
0.0914
2 141. 1. 4. 18. 2. 22.75 101.2454
0.1109
2 141. 1. 4. 19. 1. 22.69 101.1096
0.1376
2 141. 1. 4. 20. 2. 22.83 101.2066
0.0717
2 141. 1. 4. 21. 2. 22.93 101.0645
0.1205
2 141. 281. 4. 15. 2. 22.72 101.1615
0.1272
2 141. 281. 4. 14. 2. 22.40 101.1650
0.0595
2 141. 281. 4. 18. 2. 22.78 101.1815
0.1393
2 141. 281. 4. 20. 2. 22.73 101.1106
0.1189
2 141. 281. 4. 20. 2. 22.86 101.1420
0.0713
2 141. 281. 4. 21. 2. 22.94 101.0116
0.1088
2 141. 283. 4. 13. 2. 23.26 101.1554
0.0429
2 141. 283. 4. 18. 2. 22.85 101.1267
0.0751
2 141. 283. 4. 18. 2. 22.76 101.1227
0.0826
2 141. 283. 4. 19. 2. 22.82 101.0635
0.1715
2 141. 283. 4. 21. 2. 22.89 101.1264
0.1447
2 141. 283. 4. 21. 2. 22.96 101.0853
0.1189
2 141. 2062. 4. 15. 2. 22.65 101.1332
0.2532
2 141. 2062. 4. 15. 1. 22.68 101.1487
0.1413
2 141. 2062. 4. 19. 2. 22.95 101.1778
0.1772
2 141. 2062. 4. 19. 1. 22.77 101.0988
0.0884
2 141. 2062. 4. 21. 2. 22.87 101.1686
0.2940
2 141. 2062. 4. 21. 2. 22.94 101.3289
0.2072
2 141. 2362. 4. 12. 1. 22.83 101.1353
0.0585
2 141. 2362. 4. 18. 2. 22.83 101.1201
0.0868
2 141. 2362. 4. 19. 2. 22.91 101.0946
0.0855
2 141. 2362. 4. 19. 1. 22.71 100.9977
0.0645
2 141. 2362. 4. 20. 2. 22.87 101.0963
0.0638
2 141. 2362. 4. 21. 2. 22.94 101.0300
0.0549
2 142. 1. 4. 13. 1. 23.07 94.3049
0.1197
2 142. 1. 4. 15. 2. 22.73 94.3153
0.0566
2 142. 1. 4. 18. 2. 22.77 94.3073
0.0875
2 142. 1. 4. 19. 1. 22.67 94.2803
0.0376
2 142. 1. 4. 20. 2. 22.80 94.3008
0.0703
2 142. 1. 4. 21. 2. 22.93 94.2916
0.0604
2 142. 281. 4. 14. 2. 22.90 94.2557
0.0619
2 142. 281. 4. 18. 2. 22.83 94.3542
0.1027
2 142. 281. 4. 18. 2. 22.80 94.3007
0.1492
2 142. 281. 4. 20. 2. 22.76 94.3351
0.1059
2 142. 281. 4. 20. 2. 22.88 94.3406
0.1508
2 142. 281. 4. 21. 2. 22.92 94.2621
0.0946
2 142. 283. 4. 13. 2. 23.25 94.3124
0.0534
2 142. 283. 4. 18. 2. 22.85 94.3680
0.1643
2 142. 283. 4. 18. 1. 22.67 94.3442
0.0346
2 142. 283. 4. 19. 2. 22.80 94.3391
0.0616
2 142. 283. 4. 21. 2. 22.91 94.2238
0.0721
2 142. 283. 4. 21. 2. 22.95 94.2721
0.0998
2 142. 2062. 4. 14. 2. 22.49 94.2915
0.2189
2 142. 2062. 4. 15. 2. 22.69 94.2803
0.0690
2 142. 2062. 4. 19. 2. 22.94 94.2818
0.0987
2 142. 2062. 4. 19. 1. 22.76 94.2227
0.2628
2 142. 2062. 4. 21. 2. 22.74 94.4109
0.1230
2 142. 2062. 4. 21. 2. 22.94 94.2616
0.0929
2 142. 2362. 4. 12. 1. 22.86 94.2052
0.0813
2 142. 2362. 4. 18. 2. 22.83 94.2824
0.0605
2 142. 2362. 4. 19. 2. 22.85 94.2396
0.0882
2 142. 2362. 4. 19. 1. 22.75 94.2087
0.0702
2 142. 2362. 4. 20. 2. 22.86 94.2937
0.0591
2 142. 2362. 4. 21. 1. 22.93 94.2330
0.0556
2.6.1.2. Analysis and interpretation
http://www.itl.nist.gov/div898/handbook/mpc/section6/mpc612.htm[6/27/2012 1:52:11 PM]
2. Measurement Process Characterization
2.6. Case studies
2.6.1. Gauge study of resistivity probes
2.6.1.2. Analysis and interpretation
Graphs of
probe effect
on
repeatability
A graphical analysis shows repeatability standard deviations
plotted by wafer and probe. Probes are coded by numbers
with probe #2362 coded as #5. The plots show that for both
runs the precision of this probe is better than for the other
probes.
Probe #2362, because of its superior precision, was chosen
as the tool for measuring all 100 ohm.cm resistivity wafers at
NIST. Therefore, the remainder of the analysis focuses on
this probe.
Plot of
repeatability
standard
deviations
for probe
#2362 from
the nested
design over
days,
wafers, runs
The precision of probe #2362 is first checked for consistency
by plotting the repeatability standard deviations over days,
wafers and runs. Days are coded by letter. The plots verify
that, for both runs, probe repeatability is not dependent on
wafers or days although the standard deviations on days D,
E, and F of run 2 are larger in some instances than for the
other days. This is not surprising because repeated probing
on the wafer surfaces can cause slight degradation. Then the
repeatability standard deviations are pooled over:
K = 6 days for K(J - 1) = 30 degrees of freedom
L = 2 runs for LK(J - 1) = 60 degrees of freedom
Q = 5 wafers for QLK(J - 1) = 300 degrees of freedom
The results of pooling are shown below. Intermediate steps
are not shown, but the section on repeatability standard
deviations shows an example of pooling over wafers.
Pooled level-1 standard deviations (ohm.cm)
Probe     Run 1     DF     Run 2     DF     Pooled     DF
2362.     0.0658    150    0.0758    150    0.0710     300
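As an illustration, the pooling just described can be reproduced in R. The fragment
below is only a sketch: the file name mpc61.dat and the column order are taken from
the Dataplot macros given later in this case study and are assumed to apply here as
well.

dat <- read.table("mpc61.dat",
                  col.names = c("run", "wafer", "probe", "mo", "day",
                                "op", "hum", "y", "sw"))
s2362 <- subset(dat, probe == 2362)    # daily averages and repeatability sd's for probe #2362
pool <- function(s) sqrt(mean(s^2))    # equal df (5 each), so pooling reduces to a root mean square
pool(s2362$sw[s2362$run == 1])         # run 1, 150 df
pool(s2362$sw[s2362$run == 2])         # run 2, 150 df
pool(s2362$sw)                         # both runs, 300 df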
Graphs of
reproducibility
and stability for
probe #2362
Averages of the 6 center measurements on each wafer
are plotted on a single graph for each wafer. The points
(connected by lines) on the left side of each graph are
averages at the wafer center plotted over 5 days; the
points on the right are the same measurements repeated
after one month as a check on the stability of the
measurement process. The plots show day-to-day
variability as well as slight variability from run-to-run.
Earlier work discounts long-term drift in the gauge as the
cause of these changes. A reasonable conclusion is that
day-to-day and run-to-run variations come from random
fluctuations in the measurement process.
Level-2
(reproducibility)
standard
deviations
computed from
day averages
and pooled over
wafers and runs
Level-2 standard deviations (with K - 1 = 5 degrees of
freedom each) are computed from the daily averages that
are recorded in the database. Then the level-2 standard
deviations are pooled over:
L = 2 runs for L(K - 1) = 10 degrees of freedom
Q = 5 wafers for QL(K - 1) = 50 degrees of
freedom
as shown in the table below. The table shows that the
level-2 standard deviations are consistent over wafers
and runs.
Level-2 standard deviations (ohm.cm) for 5 wafers
                          Run 1                      Run 2
Wafer   Probe     Average     Stddev   DF     Average     Stddev   DF
138.    2362.      95.0928    0.0359   5       95.1243    0.0453   5
139.    2362.      99.3060    0.0472   5       99.3098    0.0215   5
140.    2362.      96.0357    0.0273   5       96.0765    0.0276   5
141.    2362.     101.0602    0.0232   5      101.0790    0.0537   5
142.    2362.      94.2148    0.0274   5       94.2438    0.0370   5
        2362.      Pooled     0.0333   25                  0.0388   25
                   (over 2 runs)                            0.0362   50
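A similar R sketch, under the same assumptions about the data file, computes the
level-2 standard deviations from the daily averages and pools them over wafers and
runs.

dat <- read.table("mpc61.dat",
                  col.names = c("run", "wafer", "probe", "mo", "day",
                                "op", "hum", "y", "sw"))
s2362 <- subset(dat, probe == 2362)
lvl2 <- aggregate(y ~ wafer + run, data = s2362, FUN = sd)   # one sd per wafer and run, 5 df each
sqrt(mean(lvl2$y[lvl2$run == 1]^2))    # pooled over wafers, run 1 (25 df)
sqrt(mean(lvl2$y[lvl2$run == 2]^2))    # pooled over wafers, run 2 (25 df)
sqrt(mean(lvl2$y^2))                   # pooled over wafers and runs (50 df)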
Level-3
(stability)
standard
deviations
computed
from run
averages
and pooled
over wafers
Level-3 standard deviations are computed from the averages
of the two runs. Then the level-3 standard deviations are
pooled over the five wafers to obtain a standard deviation with
5 degrees of freedom as shown in the table below.
Level-3 standard deviations (ohm.cm) for 5 wafers
                   Run 1        Run 2
Wafer   Probe      Average      Average      Diff       Stddev   DF
138.    2362.       95.0928      95.1243    -0.0315     0.0223   1
139.    2362.       99.3060      99.3098    -0.0038     0.0027   1
140.    2362.       96.0357      96.0765    -0.0408     0.0289   1
141.    2362.      101.0602     101.0790    -0.0188     0.0133   1
142.    2362.       94.2148      94.2438    -0.0290     0.0205   1
        2362.      Pooled                               0.0197   5
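For the level-3 computation, each wafer contributes the standard deviation of its two
run averages, which for two values is simply |difference|/sqrt(2) with one degree of
freedom. The short R fragment below reproduces the pooled value from the differences
in the table; it is an illustrative check, not a prescribed procedure.

diffs <- c(-0.0315, -0.0038, -0.0408, -0.0188, -0.0290)   # run 1 average minus run 2 average, by wafer
s3 <- abs(diffs) / sqrt(2)     # level-3 sd for each wafer, 1 df each
sqrt(mean(s3^2))               # pooled over the five wafers, 5 df (approximately 0.0197)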
Graphs of
probe
biases
A graphical analysis shows the relative biases among the 5
probes. For each wafer, differences from the wafer average
by probe are plotted versus wafer number. The graphs verify
that probe #2362 (coded as 5) is biased low relative to the
other probes. The bias shows up more strongly after the
probes have been in use (run 2).
Formulas
for
computation
of biases for
probe
#2362
Biases by probe are shown in the following table.
Differences from the mean for each wafer
Wafer Probe Run 1 Run 2
138. 1. 0.0248 -0.0119
138. 281. 0.0108 0.0323
138. 283. 0.0193 -0.0258
138. 2062. -0.0175 0.0561
138. 2362. -0.0372 -0.0507
139. 1. -0.0036 -0.0007
139. 281. 0.0394 0.0050
139. 283. 0.0057 0.0239
139. 2062. -0.0323 0.0373
139. 2362. -0.0094 -0.0657
140. 1. 0.0400 0.0109
140. 281. 0.0187 0.0106
140. 283. -0.0201 0.0003
140. 2062. -0.0126 0.0182
140. 2362. -0.0261 -0.0398
141. 1. 0.0394 0.0324
141. 281. -0.0107 -0.0037
141. 283. 0.0246 -0.0191
141. 2062. -0.0280 0.0436
141. 2362. -0.0252 -0.0534
142. 1. 0.0062 0.0093
142. 281. 0.0376 0.0174
142. 283. -0.0044 0.0192
142. 2062. -0.0011 0.0008
142. 2362. -0.0383 -0.0469
How to
deal with
bias due to
the probe
Probe #2362 was chosen for the certification process because
of its superior precision, but its bias relative to the other
probes creates a problem. There are two possibilities for
handling this problem:
1. Correct all measurements made with probe #2362 to the
average of the probes.
2. Include the standard deviation for the difference among
probes in the uncertainty budget.
The better choice is (1) if we can assume that the probes in the
study represent a random sample of probes of this type. This is
particularly true when the unit (resistivity) is defined by a test
method.
2.6.1.3. Repeatability standard deviations
http://www.itl.nist.gov/div898/handbook/mpc/section6/mpc613.htm[6/27/2012 1:52:12 PM]
2. Measurement Process Characterization
2.6. Case studies
2.6.1. Gauge study of resistivity probes
2.6.1.3. Repeatability standard deviations
Run 1 -
Graph of
repeatability
standard
deviations
for probe
#2362 -- 6
days and 5
wafers
showing
that
repeatability
is constant
across
wafers and
days
Run 2 -
Graph of
repeatability
standard
deviations
for probe
#2362 -- 6
days and 5
wafers
showing
that
repeatability
is constant
across
wafers and
days
Run 1 -
Graph
showing
repeatability
standard
deviations
for five
probes as a
function of
wafers and
probes
Symbols for codes: 1 = #1; 2 = #281; 3 = #283; 4 = #2062; 5 =
#2362
Run 2 -
Graph
showing
repeatability
standard
deviations
for 5 probes
as a
function of
wafers and
probes
Symbols for probes: 1 = #1; 2 = #281; 3 = #283; 4 = #2062; 5 =
#2362
2.6.1.4. Effects of days and long-term stability
http://www.itl.nist.gov/div898/handbook/mpc/section6/mpc614.htm[6/27/2012 1:52:13 PM]
2. Measurement Process Characterization
2.6. Case studies
2.6.1. Gauge study of resistivity probes
2.6.1.4. Effects of days and long-term stability
Effects of
days and
long-term
stability on
the
measurements
The data points that are plotted in the five graphs shown below are averages of
resistivity measurements at the center of each wafer for wafers #138, 139, 140, 141,
142. Data for each of two runs are shown on each graph. The six days of
measurements for each run are shown, with the two runs separated by approximately
one month; with the exception of wafer #139, there is a very slight shift upwards between
run 1 and run 2. The size of the effect is estimated as a level-3 standard deviation in
the analysis of the data.
Wafer 138
Wafer 139
Wafer 140
Wafer 141
Wafer 142
2.6.1.5. Differences among 5 probes
http://www.itl.nist.gov/div898/handbook/mpc/section6/mpc615.htm[6/27/2012 1:52:14 PM]
2. Measurement Process Characterization
2.6. Case studies
2.6.1. Gauge study of resistivity probes
2.6.1.5. Differences among 5 probes
Run 1 -
Graph of
differences
from
wafer
averages
for each of
5 probes
showing
that
probes
#2062 and
#2362 are
biased low
relative to
the other
probes
Symbols for probes: 1 = #1; 2 = #281; 3 = #283; 4 = #2062; 5 =
#2362
Run 2 -
Graph of
differences
from
wafer
averages
for each of
5 probes
showing
that probe
#2362
continues
to be
biased low
relative to
the other
probes
Symbols for probes: 1 = #1; 2 = #281; 3 = #283; 4 = #2062; 5 =
#2362
2.6.1.6. Run gauge study example using Dataplot
http://www.itl.nist.gov/div898/handbook/mpc/section6/mpc616.htm[6/27/2012 1:52:14 PM]
2. Measurement Process Characterization
2.6. Case studies
2.6.1. Gauge study of resistivity probes
2.6.1.6. Run gauge study example using
Dataplot
View of
Dataplot
macros for
this case
study
This page allows you to repeat the analysis outlined in the
case study description on the previous page using Dataplot. It
is required that you have already downloaded and installed
Dataplot and configured your browser to run Dataplot. Output
from each analysis step below will be displayed in one or
more of the Dataplot windows. The four main windows are the
Output Window, the Graphics window, the Command History
window, and the data sheet window. Across the top of the
main windows there are menus for executing Dataplot
commands. Across the bottom is a command entry window
where commands can be typed in.
Data Analysis Steps Results and Conclusions
Click on the links below to start Dataplot
and run this case study yourself. Each
step may use results from previous steps,
so please be patient. Wait until the
software verifies that the current step is
complete before clicking on the next step.
The links in this column will connect you
with more detailed information about
each analysis step from the case study
description.
Graphical analyses of variability
Graphs to test for:
1. Wafer/day effect on repeatability
(run 1)
2. Wafer/day effect on repeatability
(run 2)
3. Probe effect on repeatability (run 1)
4. Probe effect on repeatability (run 2)
5. Reproducibility and stability
1. and 2. Interpretation: The plots verify
that, for both runs, the repeatability of
probe #2362 is not dependent on wafers
or days, although the standard deviations
on days D, E, and F of run 2 are larger in
some instances than for the other days.
3. and 4. Interpretation: Probe #2362
appears as #5 in the plots which show
that, for both runs, the precision of this
probe is better than for the other probes.
5. Interpretation: There is a separate plot
for each wafer. The points on the left side
of each plot are averages at the wafer
center plotted over 5 days; the points on
the right are the same measurements
repeated after one month to check on the
stability of the measurement process. The
plots show day-to-day variability as well
as slight variability from run-to-run.
Table of estimates for probe #2362
1. Level-1 (repeatability)
2. Level-2 (reproducibility)
3. Level-3 (stability)
1., 2. and 3.: Interpretation: The
repeatability of the gauge (level-1
standard deviation) dominates the
imprecision associated with
measurements and days and runs are less
important contributors. Of course, even if
the gauge has high precision, biases may
contribute substantially to the uncertainty
of measurement.
Bias estimates
1. Differences among probes - run 1
2. Differences among probes - run 2
1. and 2. Interpretation: The graphs show
the relative biases among the 5 probes.
For each wafer, differences from the
wafer average by probe are plotted versus
wafer number. The graphs verify that
probe #2362 (coded as 5) is biased low
relative to the other probes. The bias
shows up more strongly after the probes
have been in use (run 2).
2.6.1.7. Dataplot macros
http://www.itl.nist.gov/div898/handbook/mpc/section6/mpc617.htm[6/27/2012 1:52:15 PM]
2. Measurement Process Characterization
2.6. Case studies
2.6.1. Gauge study of resistivity probes
2.6.1.7. Dataplot macros
Plot of wafer
and day effect
on
repeatability
standard
deviations for
run 1
reset data
reset plot control
reset i/o
dimension 500 30
label size 3
read mpc61.dat run wafer probe mo day op hum y
sw
y1label ohm.cm
title GAUGE STUDY
lines blank all
let z = pattern 1 2 3 4 5 6 for I = 1 1 300
let z2 = wafer + z/10 -0.25
characters a b c d e f
X1LABEL WAFERS
X2LABEL REPEATABILITY STANDARD DEVIATIONS BY
WAFER AND DAY
X3LABEL CODE FOR DAYS: A, B, C, D, E, F
TITLE RUN 1
plot sw z2 day subset run 1
Plot of wafer
and day effect
on
repeatability
standard
deviations for
run 2
reset data
reset plot control
reset i/o
dimension 500 30
label size 3
read mpc61.dat run wafer probe mo day op hum y
sw
y1label ohm.cm
title GAUGE STUDY
lines blank all
let z = pattern 1 2 3 4 5 6 for I = 1 1 300
let z2 = wafer + z/10 -0.25
characters a b c d e f
X1LABEL WAFERS
X2LABEL REPEATABILITY STANDARD DEVIATIONS BY
WAFER AND DAY
X3LABEL CODE FOR DAYS: A, B, C, D, E, F
TITLE RUN 2
plot sw z2 day subset run 2
Plot of
repeatability
standard
deviations for
5 probes - run
1
reset data
reset plot control
reset i/o
dimension 500 30
label size 3
read mpc61.dat run wafer probe mo day op hum y
sw
y1label ohm.cm
title GAUGE STUDY
lines blank all
let z = pattern 1 2 3 4 5 6 for I = 1 1 300
let z2 = wafer + z/10 -0.25
characters 1 2 3 4 5
X1LABEL WAFERS
X2LABEL REPEATABILITY STANDARD DEVIATIONS BY
WAFER AND PROBE
X3LABEL CODE FOR PROBES: 1= SRM1; 2= 281; 3=283;
4=2062; 5=2362
TITLE RUN 1
plot sw z2 probe subset run 1
Plot of
repeatability
standard
deviations for
5 probes - run
2
reset data
reset plot control
reset i/o
dimension 500 30
label size 3
read mpc61.dat run wafer probe mo day op hum y
sw
y1label ohm.cm
title GAUGE STUDY
lines blank all
let z = pattern 1 2 3 4 5 6 for I = 1 1 300
let z2 = wafer + z/10 -0.25
characters 1 2 3 4 5
X1LABEL WAFERS
X2LABEL REPEATABILITY STANDARD DEVIATIONS BY
WAFER AND PROBE
X3LABEL CODE FOR PROBES: 1= SRM1; 2= 281; 3=283;
4=2062; 5=2362
TITLE RUN 2
plot sw z2 probe subset run 2
Plot of
differences
from the wafer
mean for 5
probes - run 1
reset data
reset plot control
reset i/o
dimension 500 30
read mpc61a.dat wafer probe d1 d2
let biasrun1 = mean d1 subset probe 2362
print biasrun1
title GAUGE STUDY FOR 5 PROBES
Y1LABEL OHM.CM
lines dotted dotted dotted dotted dotted solid
characters 1 2 3 4 5 blank
xlimits 137 143
let zero = pattern 0 for I = 1 1 30
x1label DIFFERENCES AMONG PROBES VS WAFER (RUN
1)
plot d1 wafer probe and
plot zero wafer
Plot of
differences
from the wafer
mean for 5
probes - run 2
reset data
reset plot control
reset i/o
dimension 500 30
read mpc61a.dat wafer probe d1 d2
let biasrun2 = mean d2 subset probe 2362
print biasrun2
title GAUGE STUDY FOR 5 PROBES
Y1LABEL OHM.CM
lines dotted dotted dotted dotted dotted solid
characters 1 2 3 4 5 blank
xlimits 137 143
let zero = pattern 0 for I = 1 1 30
x1label DIFFERENCES AMONG PROBES VS WAFER (RUN
2)
plot d2 wafer probe and
plot zero wafer
Plot of
averages by
day showing
reproducibility
and stability
for
measurements
made with
probe #2362
on 5 wafers
reset data
reset plot control
reset i/o
dimension 300 50
label size 3
read mpc61b.dat wafer probe mo1 day1 y1 mo2
day2 y2 diff
let t = mo1+(day1-1)/31.
let t2= mo2+(day2-1)/31.
x3label WAFER 138
multiplot 3 2
plot y1 t subset wafer 138 and
plot y2 t2 subset wafer 138
x3label wafer 139
plot y1 t subset wafer 139 and
plot y2 t2 subset wafer 139
x3label WAFER 140
plot y1 t subset wafer 140 and
plot y2 t2 subset wafer 140
x3label WAFER 141
plot y1 t subset wafer 141 and
plot y2 t2 subset wafer 141
x3label WAFER 142
plot y1 t subset wafer 142 and
plot y2 t2 subset wafer 142
2.6.2. Check standard for resistivity measurements
http://www.itl.nist.gov/div898/handbook/mpc/section6/mpc62.htm[6/27/2012 1:52:16 PM]
2. Measurement Process Characterization
2.6. Case studies
2.6.2. Check standard for resistivity
measurements
Purpose The purpose of this page is to outline the analysis of check
standard data with respect to controlling the precision and
long-term variability of the process.
Outline 1. Background and data
2. Analysis and interpretation
3. Run this example yourself using Dataplot
2.6.2.1. Background and data
http://www.itl.nist.gov/div898/handbook/mpc/section6/mpc621.htm[6/27/2012 1:52:17 PM]
2. Measurement Process Characterization
2.6. Case studies
2.6.2. Check standard for resistivity measurements
2.6.2.1. Background and data
Explanation
of check
standard
measurements
The process involves the measurement of resistivity
(ohm.cm) of individual silicon wafers cut from a single
crystal (# 51939). The wafers were doped with
phosphorous to give a nominal resistivity of 100 ohm.cm.
A single wafer (#137), chosen at random from a batch of
130 wafers, was designated as the check standard for this
process.
Design of
data
collection and
Database
The measurements were carried out according to an ASTM
Test Method (F84) with NIST probe #2362. The
measurements on the check standard duplicate certification
measurements that were being made, during the same time
period, on individual wafers from crystal #51939. For the
check standard there were:
J = 6 repetitions at the center of the wafer on each
day
K = 25 days
The K = 25 days cover the time during which the
individual wafers were being certified at the National
Institute of Standards and Technology.
Software The analyses used in this case study can be generated using
both Dataplot code and R code.
2.6.2.1.1. Database for resistivity check standard
http://www.itl.nist.gov/div898/handbook/mpc/section6/mpc6211.htm[6/27/2012 1:52:17 PM]
2. Measurement Process Characterization
2.6. Case studies
2.6.2. Check standard for resistivity measurements
2.6.2.1. Background and data
2.6.2.1.1. Database for resistivity check
standard
Description
of check
standard
A single wafer (#137), chosen at random from a batch of
130 wafers, is the check standard for resistivity
measurements at the 100 ohm.cm level at the National
Institute of Standards and Technology. The average of six
measurements at the center of the wafer is the check
standard value for one occasion, and the standard deviation
of the six measurements is the short-term standard
deviation. The columns of the database contain the
following:
1. Crystal ID
2. Check standard ID
3. Month
4. Day
5. Hour
6. Minute
7. Operator
8. Humidity
9. Probe ID
10. Temperature
11. Check standard value
12. Short-term standard deviation
13. Degrees of freedom
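The database can be read into R with something like the fragment below; the file
name, the 14 header lines to skip, and the column order are taken from the Dataplot
macros later in this section and should be treated as assumptions.

cs <- read.table("mpc62.dat", skip = 14,
                 col.names = c("crystal", "wafer", "mo", "day", "hour", "min",
                               "op", "hum", "probe", "temp", "y", "sw", "df"))
head(cs)   # y is the check standard value, sw the short-term standard deviation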
Database of
measurements
on check
standard
Crystal Waf Mo Da Hr Mn Op Hum Probe Temp Avg
Stddev DF
51939 137 03 24 18 01 drr 42 2362 23.003 97.070
0.085 5
51939 137 03 25 12 41 drr 35 2362 23.115 97.049
0.052 5
51939 137 03 25 15 57 drr 33 2362 23.196 97.048
0.038 5
51939 137 03 28 10 10 JMT 47 2362 23.383 97.084
0.036 5
51939 137 03 28 13 31 JMT 44 2362 23.491 97.106
0.049 5
51939 137 03 28 17 33 drr 43 2362 23.352 97.014
0.036 5
51939 137 03 29 14 40 drr 36 2362 23.202 97.047
0.052 5
51939 137 03 29 16 33 drr 35 2362 23.222 97.078
0.117 5
51939 137 03 30 05 45 JMT 32 2362 23.337 97.065
0.085 5
51939 137 03 30 09 26 JMT 33 2362 23.321 97.061
0.052 5
51939 137 03 25 14 59 drr 34 2362 22.993 97.060
0.060 5
51939 137 03 31 10 10 JMT 37 2362 23.164 97.102
0.048 5
51939 137 03 31 13 00 JMT 37 2362 23.169 97.096
0.026 5
51939 137 03 31 15 32 JMT 35 2362 23.156 97.035
0.088 5
51939 137 04 01 13 05 JMT 34 2362 23.097 97.114
0.031 5
51939 137 04 01 15 32 JMT 34 2362 23.127 97.069
0.037 5
51939 137 04 01 10 32 JMT 48 2362 22.963 97.095
0.032 5
51939 137 04 06 14 38 JMT 49 2362 23.454 97.088
0.056 5
51939 137 04 07 10 50 JMT 34 2362 23.285 97.079
0.067 5
51939 137 04 07 15 46 JMT 33 2362 23.123 97.016
0.116 5
51939 137 04 08 09 37 JMT 33 2362 23.373 97.051
0.046 5
51939 137 04 08 12 53 JMT 33 2362 23.296 97.070
0.078 5
51939 137 04 08 15 03 JMT 33 2362 23.218 97.065
0.040 5
51939 137 04 11 09 30 JMT 36 2362 23.415 97.111
0.038 5
51939 137 04 11 11 34 JMT 35 2362 23.395 97.073
0.039 5
2.6.2.2. Analysis and interpretation
http://www.itl.nist.gov/div898/handbook/mpc/section6/mpc622.htm[6/27/2012 1:52:18 PM]
2. Measurement Process Characterization
2.6. Case studies
2.6.2. Check standard for resistivity measurements
2.6.2.2. Analysis and interpretation
Estimates of
the
repeatability
standard
deviation and
level-2
standard
deviation
The level-1 standard deviations (with J - 1 = 5 degrees of
freedom each) from the database are pooled over the K = 25
days to obtain a reliable estimate of repeatability. This
pooled value is
s1 = 0.06139 ohm.cm
with K(J - 1) = 125 degrees of freedom. The level-2
standard deviation is computed from the daily averages to
be
s2 = 0.02680 ohm.cm
with K - 1 = 24 degrees of freedom.
Relationship
to uncertainty
calculations
These standard deviations are appropriate for estimating the
uncertainty of the average of six measurements on a wafer
that is of the same material and construction as the check
standard. The computations are explained in the section on
sensitivity coefficients for check standard measurements.
For other numbers of measurements on the test wafer, the
computations are explained in the section on sensitivity
coefficients for level-2 designs.
Illustrative
table showing
computations
of
repeatability
and level-2
standard
deviations
A tabular presentation of a subset of check standard data (J
= 6 repetitions and K = 6 days) illustrates the computations.
The pooled repeatability standard deviation with K(J - 1) =
30 degrees of freedom from this limited database is shown
in the next to last row of the table. A level-2 standard
deviation with K - 1= 5 degrees of freedom is computed
from the center averages and is shown in the last row of the
table.
Control chart
for probe
#2362
The control chart for monitoring the precision of probe
#2362 is constructed as discussed in the section on control
charts for standard deviations. The upper control limit
(UCL) for testing for degradation of the probe is computed
using the critical value from the F table with numerator
degrees of freedom J - 1 = 5 and denominator degrees of
freedom K(J - 1) = 125. For a 0.05 significance level, the UCL
is the pooled repeatability standard deviation multiplied by the
square root of this F critical value.
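A one-line R version of this computation, using the pooled repeatability standard
deviation quoted above, is sketched below; it mirrors the Dataplot macro given later
in this section.

spool <- 0.06139                                   # pooled level-1 sd, 125 df
ucl <- spool * sqrt(qf(0.95, df1 = 5, df2 = 125))  # 95th percentile of F(5, 125)
ucl                                                # upper control limit for the standard deviations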
Interpretation
of control
chart for
probe #2362
The control chart shows two points exceeding the upper
control limit. We expect 5 % of the standard deviations to
exceed the UCL for a measurement process that is in-
control. Two outliers are not indicative of significant
problems with the repeatability for the probe, but the probe
should be monitored closely in the future.
Control chart
for bias and
variability
The control limits for monitoring the bias and long-term
variability of resistivity with a Shewhart control chart are
given by
UCL = Average + 2*s2 = 97.1234 ohm.cm
Centerline = Average = 97.0698 ohm.cm
LCL = Average - 2*s2 = 97.0162 ohm.cm
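In R, the same limits can be reproduced from the average and the level-2 standard
deviation reported above; this is only an illustrative sketch.

avg <- 97.0698    # average check standard value (ohm.cm)
s2  <- 0.02680    # level-2 standard deviation (ohm.cm)
c(LCL = avg - 2 * s2, Center = avg, UCL = avg + 2 * s2)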
Interpretation
of control
chart for bias
The control chart shows that the points scatter randomly
about the center line with no serious problems, although one
point exceeds the upper control limit and one point exceeds
the lower control limit by a small amount. The conclusion is
that there is:
No evidence of bias, change or drift in the
measurement process.
No evidence of long-term lack of control.
Future measurements that exceed the control limits must be
evaluated for long-term changes in bias and/or variability.
2.6.2.2.1. Repeatability and level-2 standard deviations
http://www.itl.nist.gov/div898/handbook/mpc/section6/mpc6221.htm[6/27/2012 1:52:19 PM]
2. Measurement Process Characterization
2.6. Case studies
2.6.2. Check standard for resistivity measurements
2.6.2.2. Analysis and interpretation
2.6.2.2.1. Repeatability and level-2 standard
deviations
Example The table below illustrates the computation of repeatability and
level-2 standard deviations from measurements on a check standard.
The check standard measurements are resistivities at the center of a
100 ohm.cm wafer. There are J = 6 repetitions per day and K = 6
days for this example.
Table of
data,
averages,
and
repeatability
standard
deviations
Measurements on check standard #137
Repetitions per day
Days 1 2 3 4 5 6
1 96.920 97.054 97.057 97.035 97.189 96.965
2 97.118 96.947 97.110 97.047 96.945 97.013
3 97.034 97.084 97.023 97.045 97.061 97.074
4 97.047 97.099 97.087 97.076 97.117 97.070
5 97.127 97.067 97.106 96.995 97.052 97.121
6 96.995 96.984 97.053 97.065 96.976 96.997
Averages 97.040 97.039 97.073 97.044 97.057 97.037
Repeatability Standard Deviations   0.0777 0.0602 0.0341 0.0281 0.0896 0.0614
Pooled Repeatability Standard Deviation   0.0625 (30 df)
Level-2 Standard Deviation   0.0139 (5 df)
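The two summary rows can be checked with a short R calculation on the values in the
table, entered here so that the columns correspond to the K = 6 days.

x <- matrix(c(96.920, 97.054, 97.057, 97.035, 97.189, 96.965,
              97.118, 96.947, 97.110, 97.047, 96.945, 97.013,
              97.034, 97.084, 97.023, 97.045, 97.061, 97.074,
              97.047, 97.099, 97.087, 97.076, 97.117, 97.070,
              97.127, 97.067, 97.106, 96.995, 97.052, 97.121,
              96.995, 96.984, 97.053, 97.065, 96.976, 96.997),
            nrow = 6, byrow = TRUE)    # rows follow the table; columns correspond to days
sqrt(mean(apply(x, 2, var)))           # pooled repeatability over the 6 days, 30 df
sd(colMeans(x))                        # level-2 sd of the 6 daily averages, 5 df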
2.6.2.3. Control chart for probe precision
http://www.itl.nist.gov/div898/handbook/mpc/section6/mpc623.htm[6/27/2012 1:52:20 PM]
2. Measurement Process Characterization
2.6. Case studies
2.6.2. Check standard for resistivity measurements
2.6.2.3. Control chart for probe precision
Control
chart for
probe
#2362
showing
violations
of the
control
limits --
all
standard
deviations
are based
on 6
repetitions
and the
control
limits are
95%
limits
2.6.2.4. Control chart for bias and long-term variability
http://www.itl.nist.gov/div898/handbook/mpc/section6/mpc624.htm[6/27/2012 1:52:20 PM]
2. Measurement Process Characterization
2.6. Case studies
2.6.2. Check standard for resistivity measurements
2.6.2.4. Control chart for bias and long-term variability
Shewhart
control chart
for
measurements
on a
resistivity
check
standard
showing that
the process is
in-control --
all
measurements
are averages
of 6
repetitions
2.6.2.5. Run check standard example yourself
http://www.itl.nist.gov/div898/handbook/mpc/section6/mpc625.htm[6/27/2012 1:52:21 PM]
2. Measurement Process Characterization
2.6. Case studies
2.6.2. Check standard for resistivity measurements
2.6.2.5. Run check standard example yourself
View of
Dataplot
macros for
this case
study
This page allows you to repeat the analysis outlined in the
case study description on the previous page using Dataplot. It
is required that you have already downloaded and installed
Dataplot and configured your browser to run Dataplot. Output
from each analysis step below will be displayed in one or
more of the Dataplot windows. The four main windows are the
Output Window, the Graphics window, the Command History
window, and the data sheet window. Across the top of the
main windows there are menus for executing Dataplot
commands. Across the bottom is a command entry window
where commands can be typed in.
Data Analysis Steps Results and Conclusions
Click on the links below to start Dataplot
and run this case study yourself. Each
step may use results from previous steps,
so please be patient. Wait until the
software verifies that the current step is
complete before clicking on the next step.
The links in this column will connect you
with more detailed information about
each analysis step from the case study
description.
Graphical tests of assumptions
Histogram
Normal probability plot
The histogram and normal probability
plots show no evidence of non-normality.
Control chart for precision
Control chart for probe #2362
Computations:
1. Pooled repeatability standard
deviation
2. Control limit
The precision control chart shows two
points exceeding the upper control limit.
We expect 5% of the standard deviations
to exceed the UCL even when the
measurement process is in-control.
Control chart for check standard
Control chart for check standard #137
Computations:
1. Average check standard value
2. Process standard deviation
3. Upper and lower control limits
The Shewhart control chart shows that the
points scatter randomly about the center
line with no serious problems, although
one point exceeds the upper control limit
and one point exceeds the lower control
limit by a small amount. The conclusion
is that there is no evidence of bias or lack
of long-term control.
2.6.2.6. Dataplot macros
http://www.itl.nist.gov/div898/handbook/mpc/section6/mpc626.htm[6/27/2012 1:52:21 PM]
2. Measurement Process Characterization
2.6. Case studies
2.6.2. Check standard for resistivity measurements
2.6.2.6. Dataplot macros
Histogram
for check
standard
#137 to test
assumption
of normality
reset data
reset plot control
reset i/o
dimension 500 30
skip 14
read mpc62.dat crystal wafer mo day hour min op
hum probe temp y sw df
histogram y
Normal
probability
plot for
check
standard
#137 to test
assumption
of normality
reset data
reset plot control
reset i/o
dimension 500 30
skip 14
read mpc62.dat crystal wafer mo day hour min op
hum probe temp y sw df
normal probability plot y
Control
chart for
precision of
probe
#2362 and
computation
of control
parameter
estimates
reset data
reset plot control
reset i/o
dimension 500 30
skip 14
read mpc62.dat crystal wafer mo day hour min op
hum probe temp y sw df
let time = mo +(day-1)/31.
let s = sw*sw
let spool = mean s
let spool = spool**.5
print spool
let f = fppf(.95, 5, 125)
let ucl = spool*(f)**.5
print ucl
title Control chart for precision
characters blank blank O
lines solid dashed blank
y1label ohm.cm
x1label Time in days
x2label Standard deviations with probe #2362
x3label 5% upper control limit
let center = sw - sw + spool
let cl = sw - sw + ucl
plot center cl sw vs time
Shewhart
control
chart for
check
standard
#137 with
computation
of control chart parameters
reset data
reset plot control
reset i/o
dimension 500 30
skip 14
read mpc62.dat crystal wafer mo day hour min op
hum probe temp y sw df
let time = mo +(day-1)/31.
let avg = mean y
let sprocess = standard deviation y
let ucl = avg + 2*sprocess
let lcl = avg - 2*sprocess
print avg
print sprocess
print ucl lcl
title Shewhart control chart
characters O blank blank blank
lines blank dashed solid dashed
y1label ohm.cm
x1label Time in days
x2label Check standard 137 with probe 2362
x3label 2-sigma control limits
let ybar = y - y + avg
let lc1 = y - y + lcl
let lc2 = y - y + ucl
plot y lc1 ybar lc2 vs time
2.6.3. Evaluation of type A uncertainty
http://www.itl.nist.gov/div898/handbook/mpc/section6/mpc63.htm[6/27/2012 1:52:22 PM]
2. Measurement Process Characterization
2.6. Case studies
2.6.3. Evaluation of type A uncertainty
Purpose The purpose of this case study is to demonstrate the
computation of uncertainty for a measurement process with
several sources of uncertainty from data taken during a gauge
study.
Outline 1. Background and data for the study
2. Graphical and quantitative analyses and interpretations
3. Run this example yourself with Dataplot
2.6.3.1. Background and data
http://www.itl.nist.gov/div898/handbook/mpc/section6/mpc631.htm[6/27/2012 1:52:23 PM]
2. Measurement Process Characterization
2.6. Case studies
2.6.3. Evaluation of type A uncertainty
2.6.3.1. Background and data
Description
of
measurements
The measurements in question are resistivities (ohm.cm) of
silicon wafers. The intent is to calculate an uncertainty
associated with the resistivity measurements of
approximately 100 silicon wafers that were certified with
probe #2362 in wiring configuration A, according to
ASTM Method F84 (ASTM F84) which is the defined
reference for this measurement. The reported value for each
wafer is the average of six measurements made at the
center of the wafer on a single day. Probe #2362 is one of
five probes owned by the National Institute of Standards
and Technology that is capable of making the
measurements.
Sources of
uncertainty in
NIST
measurements
The uncertainty analysis takes into account the following
sources of variability:
Repeatability of measurements at the center of the
wafer
Day-to-day effects
Run-to-run effects
Bias due to probe #2362
Bias due to wiring configuration
Database of
3-level nested
design -- for
estimating
time-
dependent
sources of
uncertainty
The certification measurements themselves are not the
primary source for estimating uncertainty components
because they do not yield information on day-to-day effects
and long-term effects. The standard deviations for the three
time-dependent sources of uncertainty are estimated from a
3-level nested design. The design was replicated on each of
Q = 5 wafers which were chosen at random, for this
purpose, from the lot of wafers. The certification
measurements were made between the two runs in order to
check on the long-term stability of the process. The data
consist of repeatability standard deviations (with J - 1 = 5
degrees of freedom each) from measurements at the wafer
center.
Software The analyses used in this case study can be generated using
both Dataplot code and R code.
2.6.3.1.1. Database of resistivity measurements
http://www.itl.nist.gov/div898/handbook/mpc/section6/mpc6311.htm[6/27/2012 1:52:23 PM]
2. Measurement Process Characterization
2.6. Case studies
2.6.3. Evaluation of type A uncertainty
2.6.3.1. Background and data
2.6.3.1.1. Database of resistivity measurements
Check
standards are
five wafers
chosen at
random from
a batch of
wafers
Measurements of resistivity (ohm.cm) were made
according to an ASTM Standard Test Method (F84) at the
National Institute of Standards and Technology to assess
the sources of uncertainty in the measurement system. The
gauges for the study were five probes owned by NIST; the
check standards for the study were five wafers selected at
random from a batch of wafers cut from one silicon crystal
doped with phosphorous to give a nominal resistivity of
100 ohm.cm.
Measurements
on the check
standards are
used to
estimate
repeatability,
day effect, run
effect
The effect of operator was not considered to be significant
for this study. Averages and standard deviations from J =
6 measurements at the center of each wafer are shown in
the table.
J = 6 measurements at the center of the wafer per
day
K = 6 days (one operator) per repetition
L = 2 runs (complete)
Q = 5 wafers (check standards 138, 139, 140, 141,
142)
I = 5 probes (1, 281, 283, 2062, 2362)
Run Wafer Probe Month Day Operator Temp Average Standard Deviation
1 138. 1. 3. 15. 1. 22.98 95.1772
0.1191
1 138. 1. 3. 17. 1. 23.02 95.1567
0.0183
1 138. 1. 3. 18. 1. 22.79 95.1937
0.1282
1 138. 1. 3. 21. 1. 23.17 95.1959
0.0398
1 138. 1. 3. 23. 2. 23.25 95.1442
0.0346
1 138. 1. 3. 23. 1. 23.20 95.0610
0.1539
1 138. 281. 3. 16. 1. 22.99 95.1591
0.0963
1 138. 281. 3. 17. 1. 22.97 95.1195
0.0606
1 138. 281. 3. 18. 1. 22.83 95.1065
0.0842
1 138. 281. 3. 21. 1. 23.28 95.0925
0.0973
1 138. 281. 3. 23. 2. 23.14 95.1990
0.1062
1 138. 281. 3. 23. 1. 23.16 95.1682
0.1090
1 138. 283. 3. 16. 1. 22.95 95.1252
0.0531
1 138. 283. 3. 17. 1. 23.08 95.1600
0.0998
1 138. 283. 3. 18. 1. 23.13 95.0818
0.1108
1 138. 283. 3. 21. 1. 23.28 95.1620
0.0408
1 138. 283. 3. 22. 1. 23.36 95.1735
0.0501
1 138. 283. 3. 24. 2. 22.97 95.1932
0.0287
1 138. 2062. 3. 16. 1. 22.97 95.1311
0.1066
1 138. 2062. 3. 17. 1. 22.98 95.1132
0.0415
1 138. 2062. 3. 18. 1. 23.16 95.0432
0.0491
1 138. 2062. 3. 21. 1. 23.16 95.1254
0.0603
1 138. 2062. 3. 22. 1. 23.28 95.1322
0.0561
1 138. 2062. 3. 24. 2. 23.19 95.1299
0.0349
1 138. 2362. 3. 15. 1. 23.08 95.1162
0.0480
1 138. 2362. 3. 17. 1. 23.01 95.0569
0.0577
1 138. 2362. 3. 18. 1. 22.97 95.0598
0.0516
1 138. 2362. 3. 22. 1. 23.23 95.1487
0.0386
1 138. 2362. 3. 23. 2. 23.28 95.0743
0.0256
1 138. 2362. 3. 24. 2. 23.10 95.1010
0.0420
1 139. 1. 3. 15. 1. 23.01 99.3528
0.1424
1 139. 1. 3. 17. 1. 23.00 99.2940
0.0660
1 139. 1. 3. 17. 1. 23.01 99.2340
0.1179
1 139. 1. 3. 21. 1. 23.20 99.3489
0.0506
1 139. 1. 3. 23. 2. 23.22 99.2625
0.1111
1 139. 1. 3. 23. 1. 23.22 99.3787
0.1103
1 139. 281. 3. 16. 1. 22.95 99.3244
0.1134
1 139. 281. 3. 17. 1. 22.98 99.3378
0.0949
1 139. 281. 3. 18. 1. 22.86 99.3424
0.0847
1 139. 281. 3. 22. 1. 23.17 99.4033
0.0801
1 139. 281. 3. 23. 2. 23.10 99.3717
0.0630
1 139. 281. 3. 23. 1. 23.14 99.3493
0.1157
1 139. 283. 3. 16. 1. 22.94 99.3065
0.0381
1 139. 283. 3. 17. 1. 23.09 99.3280
0.1153
1 139. 283. 3. 18. 1. 23.11 99.3000
0.0818
1 139. 283. 3. 21. 1. 23.25 99.3347
0.0972
1 139. 283. 3. 22. 1. 23.36 99.3929
0.1189
1 139. 283. 3. 23. 1. 23.18 99.2644
0.0622
1 139. 2062. 3. 16. 1. 22.94 99.3324
0.1531
1 139. 2062. 3. 17. 1. 23.08 99.3254
0.0543
1 139. 2062. 3. 18. 1. 23.15 99.2555
0.1024
1 139. 2062. 3. 18. 1. 23.18 99.1946
0.0851
1 139. 2062. 3. 22. 1. 23.27 99.3542
0.1227
1 139. 2062. 3. 24. 2. 23.23 99.2365
0.1218
1 139. 2362. 3. 15. 1. 23.08 99.2939
0.0818
1 139. 2362. 3. 17. 1. 23.02 99.3234
0.0723
1 139. 2362. 3. 18. 1. 22.93 99.2748
0.0756
1 139. 2362. 3. 22. 1. 23.29 99.3512
0.0475
1 139. 2362. 3. 23. 2. 23.25 99.2350
0.0517
1 139. 2362. 3. 24. 2. 23.05 99.3574
0.0485
1 140. 1. 3. 15. 1. 23.07 96.1334
0.1052
1 140. 1. 3. 17. 1. 23.08 96.1250
0.0916
1 140. 1. 3. 18. 1. 22.77 96.0665
0.0836
1 140. 1. 3. 21. 1. 23.18 96.0725
0.0620
1 140. 1. 3. 23. 2. 23.20 96.1006
0.0582
1 140. 1. 3. 23. 1. 23.21 96.1131
0.1757
1 140. 281. 3. 16. 1. 22.94 96.0467
0.0565
1 140. 281. 3. 17. 1. 22.99 96.1081
0.1293
1 140. 281. 3. 18. 1. 22.91 96.0578
0.1148
1 140. 281. 3. 22. 1. 23.15 96.0700
0.0495
1 140. 281. 3. 22. 1. 23.33 96.1052
0.1722
1 140. 281. 3. 23. 1. 23.19 96.0952
0.1786
1 140. 283. 3. 16. 1. 22.89 96.0650
0.1301
1 140. 283. 3. 17. 1. 23.07 96.0870
0.0881
1 140. 283. 3. 18. 1. 23.07 95.8906
0.1842
1 140. 283. 3. 21. 1. 23.24 96.0842
0.1008
1 140. 283. 3. 22. 1. 23.34 96.0189
0.0865
1 140. 283. 3. 23. 1. 23.19 96.1047
0.0923
1 140. 2062. 3. 16. 1. 22.95 96.0379
0.2190
1 140. 2062. 3. 17. 1. 22.97 96.0671
0.0991
1 140. 2062. 3. 18. 1. 23.15 96.0206
0.0648
1 140. 2062. 3. 21. 1. 23.14 96.0207
0.1410
1 140. 2062. 3. 22. 1. 23.32 96.0587
0.1634
1 140. 2062. 3. 24. 2. 23.17 96.0903
0.0406
1 140. 2362. 3. 15. 1. 23.08 96.0771
0.1024
1 140. 2362. 3. 17. 1. 23.00 95.9976
0.0943
1 140. 2362. 3. 18. 1. 23.01 96.0148
0.0622
1 140. 2362. 3. 22. 1. 23.27 96.0397
0.0702
1 140. 2362. 3. 23. 2. 23.24 96.0407
0.0627
1 140. 2362. 3. 24. 2. 23.13 96.0445
0.0622
1 141. 1. 3. 15. 1. 23.01 101.2124
0.0900
1 141. 1. 3. 17. 1. 23.08 101.1018
0.0820
1 141. 1. 3. 18. 1. 22.75 101.1119
0.0500
1 141. 1. 3. 21. 1. 23.21 101.1072
0.0641
1 141. 1. 3. 23. 2. 23.25 101.0802
0.0704
1 141. 1. 3. 23. 1. 23.19 101.1350
0.0699
1 141. 281. 3. 16. 1. 22.93 101.0287
0.0520
1 141. 281. 3. 17. 1. 23.00 101.0131
0.0710
1 141. 281. 3. 18. 1. 22.90 101.1329
0.0800
1 141. 281. 3. 22. 1. 23.19 101.0562
0.1594
1 141. 281. 3. 23. 2. 23.18 101.0891
0.1252
1 141. 281. 3. 23. 1. 23.17 101.1283
0.1151
1 141. 283. 3. 16. 1. 22.85 101.1597
0.0990
1 141. 283. 3. 17. 1. 23.09 101.0784
0.0810
1 141. 283. 3. 18. 1. 23.08 101.0715
0.0460
1 141. 283. 3. 21. 1. 23.27 101.0910
0.0880
1 141. 283. 3. 22. 1. 23.34 101.0967
0.0901
1 141. 283. 3. 24. 2. 23.00 101.1627
0.0888
1 141. 2062. 3. 16. 1. 22.97 101.1077
0.0970
1 141. 2062. 3. 17. 1. 22.96 101.0245
0.1210
1 141. 2062. 3. 18. 1. 23.19 100.9650
0.0700
1 141. 2062. 3. 18. 1. 23.18 101.0319
0.1070
1 141. 2062. 3. 22. 1. 23.34 101.0849
0.0960
1 141. 2062. 3. 24. 2. 23.21 101.1302
0.0505
1 141. 2362. 3. 15. 1. 23.08 101.0471
0.0320
1 141. 2362. 3. 17. 1. 23.01 101.0224
0.1020
1 141. 2362. 3. 18. 1. 23.05 101.0702
0.0580
1 141. 2362. 3. 22. 1. 23.22 101.0904
0.1049
1 141. 2362. 3. 23. 2. 23.29 101.0626
0.0702
1 141. 2362. 3. 24. 2. 23.15 101.0686
0.0661
1 142. 1. 3. 15. 1. 23.02 94.3160
0.1372
1 142. 1. 3. 17. 1. 23.04 94.2808
0.0999
1 142. 1. 3. 18. 1. 22.73 94.2478
0.0803
1 142. 1. 3. 21. 1. 23.19 94.2862
0.0700
1 142. 1. 3. 23. 2. 23.25 94.1859
0.0899
1 142. 1. 3. 23. 1. 23.21 94.2389
0.0686
1 142. 281. 3. 16. 1. 22.98 94.2640
0.0862
1 142. 281. 3. 17. 1. 23.00 94.3333
0.1330
1 142. 281. 3. 18. 1. 22.88 94.2994
0.0908
1 142. 281. 3. 21. 1. 23.28 94.2873
0.0846
1 142. 281. 3. 23. 2. 23.07 94.2576
0.0795
1 142. 281. 3. 23. 1. 23.12 94.3027
0.0389
1 142. 283. 3. 16. 1. 22.92 94.2846
0.1021
1 142. 283. 3. 17. 1. 23.08 94.2197
0.0627
1 142. 283. 3. 18. 1. 23.09 94.2119
0.0785
1 142. 283. 3. 21. 1. 23.29 94.2536
0.0712
1 142. 283. 3. 22. 1. 23.34 94.2280
0.0692
1 142. 283. 3. 24. 2. 22.92 94.2944
0.0958
1 142. 2062. 3. 16. 1. 22.96 94.2238
0.0492
1 142. 2062. 3. 17. 1. 22.95 94.3061
0.2194
1 142. 2062. 3. 18. 1. 23.16 94.1868
0.0474
1 142. 2062. 3. 21. 1. 23.11 94.2645
0.0697
1 142. 2062. 3. 22. 1. 23.31 94.3101
0.0532
1 142. 2062. 3. 24. 2. 23.24 94.2204
0.1023
1 142. 2362. 3. 15. 1. 23.08 94.2437
0.0503
1 142. 2362. 3. 17. 1. 23.00 94.2115
0.0919
1 142. 2362. 3. 18. 1. 22.99 94.2348
0.0282
1 142. 2362. 3. 22. 1. 23.26 94.2124
0.0513
1 142. 2362. 3. 23. 2. 23.27 94.2214
0.0627
1 142. 2362. 3. 24. 2. 23.08 94.1651
0.1010
2 138. 1. 4. 13. 1. 23.12 95.1996
0.0645
2 138. 1. 4. 15. 1. 22.73 95.1315
0.1192
2 138. 1. 4. 18. 2. 22.76 95.1845
0.0452
2 138. 1. 4. 19. 1. 22.73 95.1359
0.1498
2 138. 1. 4. 20. 2. 22.73 95.1435
0.0629
2 138. 1. 4. 21. 2. 22.93 95.1839
0.0563
2 138. 281. 4. 14. 2. 22.46 95.2106
0.1049
2 138. 281. 4. 18. 2. 22.80 95.2505
0.0771
2 138. 281. 4. 18. 2. 22.77 95.2648
0.1046
2 138. 281. 4. 20. 2. 22.80 95.2197
0.1779
2 138. 281. 4. 20. 2. 22.87 95.2003
0.1376
2 138. 281. 4. 21. 2. 22.95 95.0982
0.1611
2 138. 283. 4. 18. 2. 22.83 95.1211
0.0794
2 138. 283. 4. 13. 1. 23.17 95.1327
0.0409
2 138. 283. 4. 18. 1. 22.67 95.2053
0.1525
2 138. 283. 4. 19. 2. 23.00 95.1292
0.0655
2 138. 283. 4. 21. 2. 22.91 95.1669
0.0619
2 138. 283. 4. 21. 2. 22.96 95.1401
0.0831
2 138. 2062. 4. 15. 1. 22.64 95.2479
0.2867
2 138. 2062. 4. 15. 1. 22.67 95.2224
0.1945
2 138. 2062. 4. 19. 2. 22.99 95.2810
0.1960
2 138. 2062. 4. 19. 1. 22.75 95.1869
0.1571
2 138. 2062. 4. 21. 2. 22.84 95.3053
0.2012
2 138. 2062. 4. 21. 2. 22.92 95.1432
0.1532
2 138. 2362. 4. 12. 1. 22.74 95.1687
0.0785
2 138. 2362. 4. 18. 2. 22.75 95.1564
0.0430
2 138. 2362. 4. 19. 2. 22.88 95.1354
0.0983
2 138. 2362. 4. 19. 1. 22.73 95.0422
0.0773
2 138. 2362. 4. 20. 2. 22.86 95.1354
0.0587
2 138. 2362. 4. 21. 2. 22.94 95.1075
0.0776
2 139. 1. 4. 13. 2. 23.14 99.3274
0.0220
2 139. 1. 4. 15. 2. 22.77 99.5020
0.0997
2 139. 1. 4. 18. 2. 22.80 99.4016
0.0704
2 139. 1. 4. 19. 1. 22.68 99.3181
0.1245
2 139. 1. 4. 20. 2. 22.78 99.3858
0.0903
2 139. 1. 4. 21. 2. 22.93 99.3141
0.0255
2 139. 281. 4. 14. 2. 23.05 99.2915
0.0859
2 139. 281. 4. 15. 2. 22.71 99.4032
0.1322
2 139. 281. 4. 18. 2. 22.79 99.4612
0.1765
2 139. 281. 4. 20. 2. 22.74 99.4001
0.0889
2 139. 281. 4. 20. 2. 22.91 99.3765
0.1041
2 139. 281. 4. 21. 2. 22.92 99.3507
0.0717
2 139. 283. 4. 13. 2. 23.11 99.3848
0.0792
2 139. 283. 4. 18. 2. 22.84 99.4952
0.1122
2 139. 283. 4. 18. 2. 22.76 99.3220
0.0915
2 139. 283. 4. 19. 2. 23.03 99.4165
0.0503
2 139. 283. 4. 21. 2. 22.87 99.3791
0.1138
2 139. 283. 4. 21. 2. 22.98 99.3985
0.0661
2 139. 2062. 4. 14. 2. 22.43 99.4283
0.0891
2 139. 2062. 4. 15. 2. 22.70 99.4139
0.2147
2 139. 2062. 4. 19. 2. 22.97 99.3813
0.1143
2 139. 2062. 4. 19. 1. 22.77 99.4314
0.1685
2 139. 2062. 4. 21. 2. 22.79 99.4166
0.2080
2 139. 2062. 4. 21. 2. 22.94 99.4052
0.2400
2 139. 2362. 4. 12. 1. 22.82 99.3408
0.1279
2 139. 2362. 4. 18. 2. 22.77 99.3116
0.1131
2 139. 2362. 4. 19. 2. 22.82 99.3241
0.0519
2 139. 2362. 4. 19. 1. 22.74 99.2991
0.0903
2 139. 2362. 4. 20. 2. 22.88 99.3049
0.0783
2 139. 2362. 4. 21. 2. 22.94 99.2782
0.0718
2 140. 1. 4. 13. 1. 23.10 96.0811
0.0463
2 140. 1. 4. 15. 2. 22.75 96.1460
0.0725
2 140. 1. 4. 18. 2. 22.78 96.1582
0.1428
2 140. 1. 4. 19. 1. 22.70 96.1039
0.1056
2 140. 1. 4. 20. 2. 22.75 96.1262
0.0672
2 140. 1. 4. 21. 2. 22.93 96.1478
0.0562
2 140. 281. 4. 15. 2. 22.71 96.1153
0.1097
2 140. 281. 4. 14. 2. 22.49 96.1297
0.1202
2 140. 281. 4. 18. 2. 22.81 96.1233
0.1331
2 140. 281. 4. 20. 2. 22.78 96.1731
0.1484
2 140. 281. 4. 20. 2. 22.89 96.0872
0.0857
2 140. 281. 4. 21. 2. 22.91 96.1331
0.0944
2 140. 283. 4. 13. 2. 23.22 96.1135
0.0983
2 140. 283. 4. 18. 2. 22.85 96.1111
0.1210
2 140. 283. 4. 18. 2. 22.78 96.1221
0.0644
2 140. 283. 4. 19. 2. 23.01 96.1063
0.0921
2 140. 283. 4. 21. 2. 22.91 96.1155
0.0704
2 140. 283. 4. 21. 2. 22.94 96.1308
0.0258
2 140. 2062. 4. 15. 2. 22.60 95.9767
0.2225
2 140. 2062. 4. 15. 2. 22.66 96.1277
0.1792
2 140. 2062. 4. 19. 2. 22.96 96.1858
0.1312
2 140. 2062. 4. 19. 1. 22.75 96.1912
0.1936
2 140. 2062. 4. 21. 2. 22.82 96.1650
0.1902
2 140. 2062. 4. 21. 2. 22.92 96.1603
0.1777
2 140. 2362. 4. 12. 1. 22.88 96.0793
0.0996
2 140. 2362. 4. 18. 2. 22.76 96.1115
0.0533
2 140. 2362. 4. 19. 2. 22.79 96.0803
0.0364
2 140. 2362. 4. 19. 1. 22.71 96.0411
0.0768
2 140. 2362. 4. 20. 2. 22.84 96.0988
0.1042
2 140. 2362. 4. 21. 1. 22.94 96.0482
0.0868
2 141. 1. 4. 13. 1. 23.07 101.1984
0.0803
2 141. 1. 4. 15. 2. 22.72 101.1645
0.0914
2 141. 1. 4. 18. 2. 22.75 101.2454
0.1109
2 141. 1. 4. 19. 1. 22.69 101.1096
0.1376
2 141. 1. 4. 20. 2. 22.83 101.2066
0.0717
2 141. 1. 4. 21. 2. 22.93 101.0645
0.1205
2 141. 281. 4. 15. 2. 22.72 101.1615
0.1272
2 141. 281. 4. 14. 2. 22.40 101.1650
0.0595
2 141. 281. 4. 18. 2. 22.78 101.1815
0.1393
2 141. 281. 4. 20. 2. 22.73 101.1106
0.1189
2 141. 281. 4. 20. 2. 22.86 101.1420
0.0713
2 141. 281. 4. 21. 2. 22.94 101.0116
0.1088
2 141. 283. 4. 13. 2. 23.26 101.1554
0.0429
2 141. 283. 4. 18. 2. 22.85 101.1267
0.0751
2 141. 283. 4. 18. 2. 22.76 101.1227
0.0826
2 141. 283. 4. 19. 2. 22.82 101.0635
0.1715
2 141. 283. 4. 21. 2. 22.89 101.1264
0.1447
2 141. 283. 4. 21. 2. 22.96 101.0853
0.1189
2 141. 2062. 4. 15. 2. 22.65 101.1332
0.2532
2 141. 2062. 4. 15. 1. 22.68 101.1487
0.1413
2 141. 2062. 4. 19. 2. 22.95 101.1778
0.1772
2 141. 2062. 4. 19. 1. 22.77 101.0988
0.0884
2 141. 2062. 4. 21. 2. 22.87 101.1686
0.2940
2 141. 2062. 4. 21. 2. 22.94 101.3289
0.2072
2 141. 2362. 4. 12. 1. 22.83 101.1353
0.0585
2 141. 2362. 4. 18. 2. 22.83 101.1201
0.0868
2 141. 2362. 4. 19. 2. 22.91 101.0946
0.0855
2 141. 2362. 4. 19. 1. 22.71 100.9977
0.0645
2 141. 2362. 4. 20. 2. 22.87 101.0963
0.0638
2 141. 2362. 4. 21. 2. 22.94 101.0300
0.0549
2 142. 1. 4. 13. 1. 23.07 94.3049
0.1197
2 142. 1. 4. 15. 2. 22.73 94.3153
0.0566
2 142. 1. 4. 18. 2. 22.77 94.3073
0.0875
2 142. 1. 4. 19. 1. 22.67 94.2803
0.0376
2 142. 1. 4. 20. 2. 22.80 94.3008
0.0703
2 142. 1. 4. 21. 2. 22.93 94.2916
0.0604
2 142. 281. 4. 14. 2. 22.90 94.2557
0.0619
2 142. 281. 4. 18. 2. 22.83 94.3542
0.1027
2 142. 281. 4. 18. 2. 22.80 94.3007
0.1492
2 142. 281. 4. 20. 2. 22.76 94.3351
0.1059
2 142. 281. 4. 20. 2. 22.88 94.3406
0.1508
2 142. 281. 4. 21. 2. 22.92 94.2621
0.0946
2 142. 283. 4. 13. 2. 23.25 94.3124
0.0534
2 142. 283. 4. 18. 2. 22.85 94.3680
0.1643
2 142. 283. 4. 18. 1. 22.67 94.3442
0.0346
2 142. 283. 4. 19. 2. 22.80 94.3391
0.0616
2 142. 283. 4. 21. 2. 22.91 94.2238
0.0721
2 142. 283. 4. 21. 2. 22.95 94.2721
0.0998
2 142. 2062. 4. 14. 2. 22.49 94.2915
0.2189
2 142. 2062. 4. 15. 2. 22.69 94.2803
0.0690
2 142. 2062. 4. 19. 2. 22.94 94.2818
0.0987
2 142. 2062. 4. 19. 1. 22.76 94.2227
0.2628
2 142. 2062. 4. 21. 2. 22.74 94.4109
0.1230
2 142. 2062. 4. 21. 2. 22.94 94.2616
0.0929
2 142. 2362. 4. 12. 1. 22.86 94.2052
0.0813
2 142. 2362. 4. 18. 2. 22.83 94.2824
0.0605
2 142. 2362. 4. 19. 2. 22.85 94.2396
0.0882
2 142. 2362. 4. 19. 1. 22.75 94.2087
0.0702
2 142. 2362. 4. 20. 2. 22.86 94.2937
0.0591
2 142. 2362. 4. 21. 1. 22.93 94.2330
0.0556
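The Dataplot macros later in this case study read these data with a fixed-format statement. As a rough illustration only (not part of the original Handbook), the same kind of summary could be produced in Python, assuming the table above has been saved as a plain whitespace-delimited file, here called nested_design.dat, with one record per line and the nine columns listed above:

# Illustrative sketch (assumed file name and layout): read the nested-design
# database and pool the repeatability standard deviations by run for probe 2362.
import numpy as np
import pandas as pd

cols = ["run", "wafer", "probe", "month", "day", "operator",
        "temp", "average", "stddev"]
df = pd.read_csv("nested_design.dat", sep=r"\s+", names=cols)

for run, grp in df[df["probe"] == 2362].groupby("run"):
    # Each within-day standard deviation carries J - 1 = 5 degrees of freedom,
    # so the pooled value is the root mean square of the stddev column.
    s1 = np.sqrt(np.mean(grp["stddev"] ** 2))
    print(f"run {run:.0f}: pooled repeatability = {s1:.4f} ohm.cm, df = {5 * len(grp)}")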
2. Measurement Process Characterization
2.6. Case studies
2.6.3. Evaluation of type A uncertainty
2.6.3.1. Background and data
2.6.3.1.2. Measurements on wiring
configurations
Check wafers were measured with the probe wired in two configurations
Measurements of resistivity (ohm.cm) were made
according to an ASTM Standard Test Method (F4) to
identify differences between 2 wiring configurations for
probe #2362. The check standards for the study were five
wafers selected at random from a batch of wafers cut from
one silicon crystal doped with phosphorous to give a
nominal resistivity of 100 ohm.cm.
Description of database
The data are averages of K = 6 days' measurements and J
= 6 repetitions at the center of each wafer. There are L = 2
complete runs, separated by two months time, on each
wafer.
The data recorded in the 10 columns are:
1. Wafer
2. Probe
3. Average - configuration A; run 1
4. Standard deviation - configuration A; run 1
5. Average - configuration B; run 1
6. Standard deviation - configuration B; run 1
7. Average - configuration A; run 2
8. Standard deviation - configuration A; run 2
9. Average - configuration B; run 2
10. Standard deviation - configuration B; run 2
Wafer Probe Config A-run1 Config B-run1 Config A-run2 Config B-run2
(each configuration column gives the average followed by its standard deviation)
138. 2362. 95.1162 0.0480 95.0993 0.0466 95.1687 0.0785
95.1589 0.0642
138. 2362. 95.0569 0.0577 95.0657 0.0450 95.1564 0.0430
95.1705 0.0730
138. 2362. 95.0598 0.0516 95.0622 0.0664 95.1354 0.0983
95.1221 0.0695
138. 2362. 95.1487 0.0386 95.1625 0.0311 95.0422 0.0773
95.0513 0.0840
138. 2362. 95.0743 0.0256 95.0599 0.0488 95.1354 0.0587
95.1531 0.0482
138. 2362. 95.1010 0.0420 95.0944 0.0393 95.1075 0.0776
95.1537 0.0230
139. 2362. 99.2939 0.0818 99.3018 0.0905 99.3408 0.1279
99.3637 0.1025
139. 2362. 99.3234 0.0723 99.3488 0.0350 99.3116 0.1131
99.3881 0.0451
139. 2362. 99.2748 0.0756 99.3571 0.1993 99.3241 0.0519
99.3737 0.0699
139. 2362. 99.3512 0.0475 99.3512 0.1286 99.2991 0.0903
99.3066 0.0709
139. 2362. 99.2350 0.0517 99.2255 0.0738 99.3049 0.0783
99.3040 0.0744
139. 2362. 99.3574 0.0485 99.3605 0.0459 99.2782 0.0718
99.3680 0.0470
140. 2362. 96.0771 0.1024 96.0915 0.1257 96.0793 0.0996
96.1041 0.0890
140. 2362. 95.9976 0.0943 96.0057 0.0806 96.1115 0.0533
96.0774 0.0983
140. 2362. 96.0148 0.0622 96.0244 0.0833 96.0803 0.0364
96.1004 0.0758
140. 2362. 96.0397 0.0702 96.0422 0.0738 96.0411 0.0768
96.0677 0.0663
140. 2362. 96.0407 0.0627 96.0738 0.0800 96.0988 0.1042
96.0585 0.0960
140. 2362. 96.0445 0.0622 96.0557 0.1129 96.0482 0.0868
96.0062 0.0895
141. 2362. 101.0471 0.0320 101.0241 0.0670 101.1353 0.0585
101.1156 0.1027
141. 2362. 101.0224 0.1020 101.0660 0.1030 101.1201 0.0868
101.1077 0.1141
141. 2362. 101.0702 0.0580 101.0509 0.0710 101.0946 0.0855
101.0455 0.1070
141. 2362. 101.0904 0.1049 101.0983 0.0894 100.9977 0.0645
101.0274 0.0666
141. 2362. 101.0626 0.0702 101.0614 0.0849 101.0963 0.0638
101.1106 0.0788
141. 2362. 101.0686 0.0661 101.0811 0.0490 101.0300 0.0549
101.1073 0.0663
142. 2362. 94.2437 0.0503 94.2088 0.0815 94.2052 0.0813
94.2487 0.0719
142. 2362. 94.2115 0.0919 94.2043 0.1176 94.2824 0.0605
94.2886 0.0499
142. 2362. 94.2348 0.0282 94.2324 0.0519 94.2396 0.0882
94.2739 0.1075
142. 2362. 94.2124 0.0513 94.2347 0.0694 94.2087 0.0702
94.2023 0.0416
142. 2362. 94.2214 0.0627 94.2416 0.0757 94.2937 0.0591
94.2600 0.0731
142. 2362. 94.1651 0.1010 94.2287 0.0919 94.2330 0.0556
94.2406 0.0651
2. Measurement Process Characterization
2.6. Case studies
2.6.3. Evaluation of type A uncertainty
2.6.3.2. Analysis and interpretation
Purpose of this page
The purpose of this page is to outline an analysis of data taken
during a gauge study to quantify the type A uncertainty
component for resistivity (ohm.cm) measurements on silicon
wafers made with a gauge that was part of the initial study.
Summary of standard deviations at three levels
The level-1, level-2, and level-3 standard deviations for the
uncertainty analysis are summarized in the table below from the
gauge case study.
Standard deviations for probe #2362
Level     Symbol   Estimate   DF
Level-1   s1       0.0729     300
Level-2   s2       0.0362     50
Level-3   s3       0.0197     5
Calculation of individual components for days and runs
The standard deviation that estimates the day effect is
The standard deviation that estimates the run effect is
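The two estimators themselves are shown graphically in the Handbook. A minimal sketch of the calculation is given below, assuming the usual 3-level nested-design relations (day variance = s2^2 - s1^2/J, run variance = s3^2 - s2^2/K); these relations are carried over from the gauge-study chapter and are an assumption here, not a formula stated on this page.

# Sketch (assumed nested-design relations): day and run components from the
# level-1, level-2, and level-3 standard deviations for probe #2362.
import math

s1, s2, s3 = 0.0729, 0.0362, 0.0197   # level-1, -2, -3 estimates (ohm.cm)
J, K = 6, 6                           # repetitions per day, days per run

s_days = math.sqrt(s2**2 - s1**2 / J)  # roughly 0.021 ohm.cm
s_runs = math.sqrt(s3**2 - s2**2 / K)  # roughly 0.013 ohm.cm
print(s_days, s_runs)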
Calculation of the standard deviation of the certified value showing sensitivity coefficients
The certified value for each wafer is the average of N = 6
repeatability measurements at the center of the wafer on M = 1
days and over P = 1 runs. Notice that N, M and P are not
necessarily the same as the number of measurements in the
gauge study per wafer; namely, J, K and L. The standard
deviation of a certified value (for time-dependent sources of
error), is
Standard deviations for days and runs are included in this
calculation, even though there were no replications over days or
runs for the certification measurements. These factors contribute
to the overall uncertainty of the measurement process even
though they are not sampled for the particular measurements of
interest.
The equation must be rewritten to calculate degrees of freedom
Degrees of freedom cannot be calculated from the equation
above because the calculations for the individual components
involve differences among variances. The table of sensitivity
coefficients for a 3-level design shows that for
N = J, M = 1, P = 1
the equation above can be rewritten in the form
Then the degrees of freedom can be approximated using the
Welch-Satterthwaite method.
Probe bias - graphs of probe biases
A graphical analysis shows the relative biases among the 5
probes. For each wafer, differences from the wafer average by
probe are plotted versus wafer number. The graphs verify that
probe #2362 (coded as 5) is biased low relative to the other
probes. The bias shows up more strongly after the probes have
been in use (run 2).
How to deal with bias due to the probe
Probe #2362 was chosen for the certification process because of
its superior precision, but its bias relative to the other probes
creates a problem. There are two possibilities for handling this
problem:
1. Correct all measurements made with probe #2362 to the
average of the probes.
2. Include the standard deviation for the difference among
probes in the uncertainty budget.
The best strategy, as followed in the certification process, is to
correct all measurements for the average bias of probe #2362 and
take the standard deviation of the correction as a type A
component of uncertainty.
Correction for bias of probe #2362 and uncertainty
Biases by probe and wafer are shown in the gauge case study.
Biases for probe #2362 are summarized in table below for the
two runs. The correction is taken to be the negative of the
average bias. The standard deviation of the correction is the
standard deviation of the average of the ten biases.
Estimated biases for probe #2362
Wafer Probe Run 1 Run 2 All
138 2362 -0.0372 -0.0507
139 2362 -0.0094 -0.0657
140 2362 -0.0261 -0.0398
141 2362 -0.0252 -0.0534
142 2362 -0.0383 -0.0469
Average -0.0272 -0.0513 -0.0393
Standard deviation 0.0162
(10 values)
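For illustration (in Python rather than the Handbook's Dataplot code), the correction and its standard deviation can be reproduced directly from the ten tabulated biases:

# Sketch: correction for probe #2362 and its standard deviation from the
# ten wafer-by-run biases in the table above.
import numpy as np

run1 = [-0.0372, -0.0094, -0.0261, -0.0252, -0.0383]
run2 = [-0.0507, -0.0657, -0.0398, -0.0534, -0.0469]
biases = np.array(run1 + run2)

correction = -biases.mean()                        # about +0.0393 ohm.cm
sd_biases = biases.std(ddof=1)                     # about 0.0162 (10 values)
sd_correction = sd_biases / np.sqrt(len(biases))   # about 0.0051 ohm.cm
print(correction, sd_biases, sd_correction)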
Configurations - database and plot of differences
Measurements on the check wafers were made with the probe
wired in two different configurations (A, B). A plot of
differences between configuration A and configuration B shows
no bias between the two configurations.
Test for difference between configurations
This finding is consistent over runs 1 and 2 and is confirmed by
the t-statistics in the table below where the average differences
and standard deviations are computed from 6 days of
measurements on 5 wafers. A t-statistic < 2 indicates no
significant difference. The conclusion is that there is no bias due
to wiring configuration and no contribution to uncertainty from
this source.
Differences between configurations
Status Average Std dev DF t
Pre -0.00858 0.0242 29 1.9
Post -0.0110 0.0354 29 1.7
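The t-statistics can be checked from the summary statistics alone; a short sketch (Python, not the Handbook's Dataplot macro) with n = 30 wafer-day differences per run:

# Sketch: t = sqrt(n) * (average difference) / (standard deviation of the
# differences), reproducing the table entries from their summaries.
import math

n = 30   # 6 days x 5 wafers
for label, avg, sd in [("Pre", -0.00858, 0.0242), ("Post", -0.0110, 0.0354)]:
    t = math.sqrt(n) * avg / sd
    print(label, round(abs(t), 1))   # roughly 1.9 and 1.7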
Error budget showing sensitivity coefficients, standard deviations and degrees of freedom
The error budget showing sensitivity coefficients for computing
the standard uncertainty and degrees of freedom is outlined
below.
Error budget for resistivity (ohm.cm)
Source                    Type   Sensitivity   Standard Deviation   DF
Repeatability             A      a1 = 0        0.0729               300
Reproducibility           A      a2 =          0.0362               50
Run-to-run                A      a3 = 1        0.0197               5
Probe #2362               A      a4 =          0.0162               5
Wiring Configuration A    A      a5 = 1        0                    --
Standard uncertainty includes components for repeatability, days, runs and probe
The standard uncertainty is computed from the error budget as
Approximate degrees of freedom and expanded uncertainty
The degrees of freedom associated with u are approximated by the Welch-Satterthwaite formula as:
where the ν_i are the degrees of freedom given in the rightmost column of the table.
The critical value at the 0.05 significance level with 42 degrees
of freedom, from the t-table, is 2.018 so the expanded
uncertainty is
U = 2.018 u = 0.078 ohm.cm
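A sketch of the same machinery outside of Dataplot is given below. The error-budget values would be filled in from the table above; the numbers used in the demonstration are back-computed from the reported results and are illustrative only.

# Sketch: Welch-Satterthwaite effective degrees of freedom and expanded
# uncertainty.  contrib_i = (a_i * s_i)^2 are the variance contributions from
# the error budget; df_i are the degrees of freedom from its last column.
from scipy.stats import t

def welch_satterthwaite(contrib, df):
    u2 = sum(contrib)
    return u2 ** 2 / sum(c ** 2 / d for c, d in zip(contrib, df))

def expanded_uncertainty(u, eff_df, alpha=0.05):
    k = t.ppf(1 - alpha / 2, eff_df)   # coverage factor from the t distribution
    return k, k * u

# Demonstration with the reported effective df (42) and a standard uncertainty
# back-computed from U = 0.078 ohm.cm (assumed here purely for illustration):
k, U = expanded_uncertainty(u=0.078 / 2.018, eff_df=42)
print(k, U)   # roughly 2.018 and 0.078 ohm.cm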
2. Measurement Process Characterization
2.6. Case studies
2.6.3. Evaluation of type A uncertainty
2.6.3.2. Analysis and interpretation
2.6.3.2.1. Difference between 2 wiring
configurations
Measurements with the probe configured in two ways
The graphs below are constructed from resistivity
measurements (ohm.cm) on five wafers where the probe
(#2362) was wired in two different configurations, A and
B. The probe is a 4-point probe with many possible wiring
configurations. For this experiment, only two
configurations were tested as a means of identifying large
discrepancies.
Artifacts for the study
The five wafers; namely, #138, #139, #140, #141, and #142
are coded 1, 2, 3, 4, 5, respectively, in the graphs. These
wafers were chosen at random from a batch of
approximately 100 wafers that were being certified for
resistivity.
Interpretation Differences between measurements in configurations A
and B, made on the same day, are plotted over six days for
each wafer. The two graphs represent two runs separated
by approximately two months time. The dotted line in the
center is the zero line. The pattern of data points scatters
fairly randomly above and below the zero line -- indicating
no difference between configurations for probe #2362. The
conclusion applies to probe #2362 and cannot be extended
to all probes of this type.
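The Handbook produces these graphs with the Dataplot macro shown in a later section; a rough Python equivalent for run 1 is sketched below, assuming the wiring-configuration data file mpc633k.dat has a plain whitespace-delimited layout with the ten columns described earlier.

# Sketch: plot configuration A minus configuration B differences for run 1,
# one symbol per wafer, with a zero reference line (assumed file layout).
import pandas as pd
import matplotlib.pyplot as plt

cols = ["wafer", "probe", "a1", "s1", "b1", "s2", "a2", "s3", "b2", "s4"]
df = pd.read_csv("mpc633k.dat", sep=r"\s+", names=cols)
df["diff1"] = df["a1"] - df["b1"]          # config A - config B, run 1

for wafer, grp in df.groupby("wafer"):
    plt.plot(range(1, len(grp) + 1), grp["diff1"], "o", label=f"wafer {wafer:.0f}")
plt.axhline(0.0, linestyle=":")            # zero line
plt.xlabel("day")
plt.ylabel("config A - config B (ohm.cm)")
plt.legend()
plt.show()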
2. Measurement Process Characterization
2.6. Case studies
2.6.3. Evaluation of type A uncertainty
2.6.3.3. Run the type A uncertainty analysis
using Dataplot
View of Dataplot macros for this case study
This page allows you to repeat the analysis outlined in the case study description on the previous page using Dataplot. It is required that you have already downloaded and installed Dataplot and configured your browser to run Dataplot. Output from each analysis step below will be displayed in one or more of the Dataplot windows. The four main windows are the Output window, the Graphics window, the Command History window, and the Data Sheet window. Across the top of the main windows there are menus for executing Dataplot commands. Across the bottom is a command entry window where commands can be typed in.
Data Analysis Steps Results and Conclusions
Click on the links below to start Dataplot
and run this case study yourself. Each
step may use results from previous steps,
so please be patient. Wait until the
software verifies that the current step is
complete before clicking on the next step.
The links in this column will connect you
with more detailed information about
each analysis step from the case study
description.
Time-dependent components from 3-level nested design
Pool repeatability standard deviations for:
1. Run 1
2. Run 2
Compute level-2 standard deviations for:
3. Run 1
4. Run 2
5. Pool level-2 standard deviations
6. Compute level-3 standard deviations
Database of measurements with probe #2362
1. The repeatability standard deviation is 0.0658 ohm.cm for run 1 and 0.0758 ohm.cm for run 2. This represents the basic precision of the measuring instrument.
2. The level-2 standard deviation pooled over 5 wafers and 2 runs is 0.0362 ohm.cm. This is significant in the calculation of uncertainty.
3. The level-3 standard deviation pooled over 5 wafers is 0.0197 ohm.cm. This is small compared to the other components but is included in the uncertainty calculation for completeness.
Bias due to probe #2362
1. Plot biases for 5 NIST probes
2. Compute wafer bias and average
bias for probe #2362
3. Correction for bias and standard
deviation
Database of measurements with 5 probes
1. The plot shows that probe #2362 is
biased low relative to the other
probes and that this bias is
consistent over 5 wafers.
2. The bias correction is 0.0393 ohm.cm, the negative of the average bias over the 5 wafers; equivalently, the average bias (-0.0393 ohm.cm) is subtracted from all measurements made with probe #2362.
3. The uncertainty of the bias
correction = 0.0051 ohm.cm is
computed from the standard
deviation of the biases for the 5
wafers.
Bias due to wiring configuration A
1. Plot differences between wiring
configurations
2. Averages, standard deviations and
t-statistics
Database of wiring configurations A and
B
1. The plot of measurements in wiring
configurations A and B shows no
difference between A and B.
2. The statistical test confirms that
there is no difference between the
wiring configurations.
Uncertainty
1. Standard uncertainty, df, t-value
and expanded uncertainty
Elements of error budget
1. The uncertainty is computed from
the error budget. The uncertainty
for an average of 6 measurements
on one day with probe #2362 is
0.078 with 42 degrees of freedom.
2. Measurement Process Characterization
2.6. Case studies
2.6.3. Evaluation of type A uncertainty
2.6.3.4. Dataplot macros
Reads data and plots the repeatability standard deviations for probe #2362 and pools standard deviations over days, wafers -- run 1
reset data
reset plot control
reset i/o
dimension 500 rows
label size 3
set read format f1.0,f6.0,f8.0,32x,f10.4,f10.4
read mpc633a.dat run wafer probe y sr
retain run wafer probe y sr subset probe = 2362
let df = sr - sr + 5.
y1label ohm.cm
characters * all
lines blank all
x2label Repeatability standard deviations for probe 2362 - run 1
plot sr subset run 1
let var = sr*sr
let df11 = sum df subset run 1
let s11 = sum var subset run 1
. repeatability standard deviation for run 1
let s11 = (5.*s11/df11)**(1/2)
print s11 df11
. end of calculations
Reads data and plots repeatability standard deviations for probe #2362 and pools standard deviations over days, wafers -- run 2
reset data
reset plot control
reset i/o
dimension 500 30
label size 3
set read format f1.0,f6.0,f8.0,32x,f10.4,f10.4
read mpc633a.dat run wafer probe y sr
retain run wafer probe y sr subset probe 2362
let df = sr - sr + 5.
y1label ohm.cm
characters * all
lines blank all
x2label Repeatability standard deviations for probe 2362 - run 2
plot sr subset run 2
let var = sr*sr
let df11 = sum df subset run 1
let df12 = sum df subset run 2
let s11 = sum var subset run 1
let s12 = sum var subset run 2
let s11 = (5.*s11/df11)**(1/2)
let s12 = (5.*s12/df12)**(1/2)
print s11 df11
print s12 df12
let s1 = ((s11**2 + s12**2)/2.)**(1/2)
let df1=df11+df12
. pooled repeatability standard deviation and df over runs 1 and 2
print s1 df1
. end of calculations
Computes level-2 standard deviations from daily averages and pools over wafers -- run 1
reset data
reset plot control
reset i/o
dimension 500 rows
label size 3
set read format f1.0,f6.0,f8.0,32x,f10.4,f10.4
read mpc633a.dat run wafer probe y sr
retain run wafer probe y sr subset probe 2362
sd plot y wafer subset run 1
let s21 = yplot
let wafer1 = xplot
retain s21 wafer1 subset tagplot = 1
let nwaf = size s21
let df21 = 5 for i = 1 1 nwaf
. level-2 standard deviations and df for 5
wafers - run 1
print wafer1 s21 df21
. end of calculations
Computes level-2 standard deviations from daily averages and pools over wafers -- run 2
reset data
reset plot control
reset i/o
dimension 500 rows
label size 3
set read format f1.0,f6.0,f8.0,32x,f10.4,f10.4
read mpc633a.dat run wafer probe y sr
retain run wafer probe y sr subset probe 2362
sd plot y wafer subset run 2
let s22 = yplot
let wafer1 = xplot
retain s22 wafer1 subset tagplot = 1
let nwaf = size s22
let df22 = 5 for i = 1 1 nwaf
. level-2 standard deviations and df for 5 wafers - run 2
print wafer1 s22 df22
. end of calculations
Pools level-2 standard deviations over wafers and runs
reset data
reset plot control
reset i/o
dimension 500 30
label size 3
set read format f1.0,f6.0,f8.0,32x,f10.4,f10.4
read mpc633a.dat run wafer probe y sr
retain run wafer probe y sr subset probe 2362
sd plot y wafer subset run 1
let s21 = yplot
let wafer1 = xplot
sd plot y wafer subset run 2
let s22 = yplot
retain s21 s22 wafer1 subset tagplot = 1
let nwaf = size wafer1
let df21 = 5 for i = 1 1 nwaf
let df22 = 5 for i = 1 1 nwaf
let s2a = (s21**2)/5 + (s22**2)/5
let s2 = sum s2a
let s2 = sqrt(s2/2)
let df2a = df21 + df22
let df2 = sum df2a
. pooled level-2 standard deviation and df
across wafers and runs
print s2 df2
. end of calculations
Computes level-3 standard deviations from run averages and pools over wafers
reset data
reset plot control
reset i/o
dimension 500 rows
label size 3
set read format f1.0,f6.0,f8.0,32x,f10.4,f10.4
read mpc633a.dat run wafer probe y sr
retain run wafer probe y sr subset probe 2362
.
mean plot y wafer subset run 1
let m31 = yplot
let wafer1 = xplot
mean plot y wafer subset run 2
let m32 = yplot
retain m31 m32 wafer1 subset tagplot = 1
let nwaf = size m31
let s31 =(((m31-m32)**2)/2.)**(1/2)
let df31 = 1 for i = 1 1 nwaf
. level-3 standard deviations and df for 5
wafers
print wafer1 s31 df31
let s31 = (s31**2)/5
let s3 = sum s31
let s3 = sqrt(s3)
let df3=sum df31
. pooled level-3 std deviation and df over 5
wafers
print s3 df3
. end of calculations
Plot differences from the average wafer value for each probe showing bias for probe #2362
reset data
reset plot control
reset i/o
dimension 500 30
read mpc61a.dat wafer probe d1 d2
let biasrun1 = mean d1 subset probe 2362
let biasrun2 = mean d2 subset probe 2362
print biasrun1 biasrun2
title GAUGE STUDY FOR 5 PROBES
Y1LABEL OHM.CM
lines dotted dotted dotted dotted dotted solid
characters 1 2 3 4 5 blank
xlimits 137 143
let zero = pattern 0 for I = 1 1 30
x1label DIFFERENCES AMONG PROBES VS WAFER (RUN 1)
plot d1 wafer probe and
plot zero wafer
let biasrun2 = mean d2 subset probe 2362
print biasrun2
title GAUGE STUDY FOR 5 PROBES
Y1LABEL OHM.CM
lines dotted dotted dotted dotted dotted solid
characters 1 2 3 4 5 blank
xlimits 137 143
let zero = pattern 0 for I = 1 1 30
x1label DIFFERENCES AMONG PROBES VS WAFER (RUN 2)
plot d2 wafer probe and
plot zero wafer
. end of calculations
Compute bias for probe #2362 by wafer
reset data
reset plot control
reset i/o
dimension 500 30
label size 3
set read format f1.0,f6.0,f8.0,32x,f10.4,f10.4
read mpc633a.dat run wafer probe y sr
set read format
.
cross tabulate mean y run wafer
retain run wafer probe y sr subset probe 2362
skip 1
read dpst1f.dat runid wafid ybar
print runid wafid ybar
let ngroups = size ybar
skip 0
.
let m3 = y - y
feedback off
loop for k = 1 1 ngroups
let runa = runid(k)
let wafera = wafid(k)
let ytemp = ybar(k)
let m3 = ytemp subset run = runa subset wafer = wafera
end of loop
feedback on
.
let d = y - m3
let bias1 = average d subset run 1
let bias2 = average d subset run 2
.
mean plot d wafer subset run 1
let b1 = yplot
let wafer1 = xplot
mean plot d wafer subset run 2
let b2 = yplot
retain b1 b2 wafer1 subset tagplot = 1
let nwaf = size b1
. biases for run 1 and run 2 by wafers
print wafer1 b1 b2
. average biases over wafers for run 1 and run 2
print bias1 bias2
. end of calculations
Compute correction for bias for measurements with probe #2362 and the standard deviation of the correction
reset data
reset plot control
reset i/o
dimension 500 30
label size 3
set read format f1.0,f6.0,f8.0,32x,f10.4,f10.4
read mpc633a.dat run wafer probe y sr
set read format
.
cross tabulate mean y run wafer
retain run wafer probe y sr subset probe 2362
skip 1
read dpst1f.dat runid wafid ybar
let ngroups = size ybar
skip 0
.
let m3 = y - y
feedback off
loop for k = 1 1 ngroups
let runa = runid(k)
let wafera = wafid(k)
let ytemp = ybar(k)
let m3 = ytemp subset run = runa subset wafer = wafera
end of loop
feedback on
.
let d = y - m3
let bias1 = average d subset run 1
let bias2 = average d subset run 2
.
mean plot d wafer subset run 1
let b1 = yplot
let wafer1 = xplot
mean plot d wafer subset run 2
let b2 = yplot
retain b1 b2 wafer1 subset tagplot = 1
.
extend b1 b2
let sd = standard deviation b1
let sdcorr = sd/(10**(1/2))
let correct = -(bias1+bias2)/2.
. correction for probe #2362, standard dev, and
standard dev of corr
print correct sd sdcorr
. end of calculations
Plot differences between wiring configurations A and B
reset data
reset plot control
reset i/o
dimension 500 30
label size 3
read mpc633k.dat wafer probe a1 s1 b1 s2 a2 s3 b2 s4
let diff1 = a1 - b1
let diff2 = a2 - b2
let t = sequence 1 1 30
lines blank all
characters 1 2 3 4 5
y1label ohm.cm
x1label Config A - Config B -- Run 1
x2label over 6 days and 5 wafers
x3label legend for wafers 138, 139, 140, 141, 142: 1, 2, 3, 4, 5
plot diff1 t wafer
x1label Config A - Config B -- Run 2
plot diff2 t wafer
. end of calculations
Compute average differences between configuration A and B; standard deviations and t-statistics for testing significance
reset data
reset plot control
reset i/o
separator character @
dimension 500 rows
label size 3
read mpc633k.dat wafer probe a1 s1 b1 s2 a2 s3 b2 s4
let diff1 = a1 - b1
let diff2 = a2 - b2
let d1 = average diff1
let d2 = average diff2
let s1 = standard deviation diff1
let s2 = standard deviation diff2
let t1 = (30.)**(1/2)*(d1/s1)
let t2 = (30.)**(1/2)*(d2/s2)
. Average config A-config B; std dev difference;
t-statistic for run 1
print d1 s1 t1
. Average config A-config B; std dev difference;
t-statistic for run 2
print d2 s2 t2
separator character ;
. end of calculations
Compute standard uncertainty, effective degrees of freedom, t value and expanded uncertainty
reset data
reset plot control
reset i/o
dimension 500 rows
label size 3
read mpc633m.dat sz a df
let c = a*sz*sz
let d = c*c
let e = d/(df)
let sume = sum e
let u = sum c
let u = u**(1/2)
let effdf=(u**4)/sume
let tvalue=tppf(.975,effdf)
let expu=tvalue*u
.
. uncertainty, effective degrees of freedom, tvalue and expanded uncertainty
print u effdf tvalue expu
. end of calculations
2. Measurement Process Characterization
2.6. Case studies
2.6.4. Evaluation of type B uncertainty and
propagation of error
Focus of this case study
The purpose of this case study is to demonstrate uncertainty
analysis using statistical techniques coupled with type B analyses
and propagation of error. It is a continuation of the case study of
type A uncertainties.
Background - description of measurements and constraints
The measurements in question are volume resistivities (ohm.cm)
of silicon wafers which have the following definition:
ρ = Xo · Ka · FT · t · Ft/s
with explanations of the quantities and their nominal values shown below:
ρ = resistivity = 0.00128 ohm.cm
X = voltage/current (ohm)
t = thickness of the wafer (cm) = 0.628 cm
Ka = electrical scale factor = 4.50 ohm.cm
FT = correction for temperature
Ft/s = thickness/separation scale factor = 1.0
Type A evaluations
The resistivity measurements, discussed in the case study of type
A evaluations, were replicated to cover the following sources of
uncertainty in the measurement process, and the associated
uncertainties are reported in units of resistivity (ohm.cm).
Repeatability of measurements at the center of the wafer
Day-to-day effects
Run-to-run effects
Bias due to probe #2362
Bias due to wiring configuration
Need for propagation of error
Not all factors could be replicated during the gauge experiment.
Wafer thickness and measurements required for the scale
corrections were measured off-line. Thus, the type B evaluation
of uncertainty is computed using propagation of error. The
propagation of error formula in units of resistivity is as follows:
Standard deviations for type B evaluations
Standard deviations for the type B components are summarized
here. For a complete explanation, see the publication (Ehrstein
and Croarkin).
Electrical measurements
There are two basic sources of uncertainty for the electrical
measurements. The first is the least-count of the digital volt
meter in the measurement of X with a maximum bound of
a = 0.0000534 ohm
which is assumed to be the half-width of a uniform distribution.
The second is the uncertainty of the electrical scale factor. This
has two sources of uncertainty:
1. error in the solution of the transcendental equation for
determining the factor
2. errors in measured voltages
The maximum bounds to these errors are assumed to be half-
widths of
a = 0.0001 ohm.cm and a = 0.00038 ohm.cm
respectively, from uniform distributions. The corresponding
standard deviations are shown below.
s_x = 0.0000534/√3 = 0.0000308 ohm
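For a uniform distribution the standard deviation is the half-width divided by the square root of 3. A sketch of both electrical components follows; the quadrature combination of the two scale-factor half-widths is an assumption made here so that the result can be compared with the 0.000227 ohm.cm entry in the error budget.

# Sketch: type B standard deviations for the electrical measurements,
# converting uniform half-widths a to standard deviations a / sqrt(3).
import math

s_x = 0.0000534 / math.sqrt(3)                      # DVM least count, ~0.0000308 ohm
s_scale = math.sqrt((0.0001**2 + 0.00038**2) / 3)   # electrical scale factor, ~0.000227 ohm.cm
print(s_x, s_scale)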
Thickness The standard deviation for thickness, t, accounts for two sources
of uncertainty:
1. calibration of the thickness measuring tool with precision
gauge blocks
2. variation in thicknesses of the silicon wafers
The maximum bounds to these errors are assumed to be half-
widths of
a = 0.000015 cm and a = 0.000001 cm
respectively, from uniform distributions. Thus, the standard
deviation for thickness is
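The thickness formula itself is shown graphically in the Handbook; a sketch of the same conversion, combining the two half-widths in quadrature (consistent with the 0.00000868 entry in the error budget), is:

# Sketch: standard deviation for wafer thickness from the two assumed
# uniform half-widths (tool calibration and wafer-to-wafer variation).
import math

s_t = math.sqrt((0.000015**2 + 0.000001**2) / 3)    # about 8.7e-6 cm
print(s_t)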
Temperature correction
The standard deviation for the temperature correction is
calculated from its defining equation as shown below. Thus, the
standard deviation for the correction is the standard deviation
associated with the measurement of temperature multiplied by
the temperature coefficient, C(t) = 0.0083. The maximum
bound to the error of the temperature measurement is assumed to
be the half-width
a = 0.13 C
of a triangular distribution. Thus the standard deviation of the correction for temperature is
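For a triangular distribution the standard deviation is the half-width divided by the square root of 6; multiplying by the temperature coefficient 0.0083 gives the entry used in the error budget. A sketch:

# Sketch: standard deviation of the temperature correction from the assumed
# triangular half-width of 0.13 C and the temperature coefficient 0.0083.
import math

s_temp = 0.13 / math.sqrt(6)       # about 0.053 C
s_correction = 0.0083 * s_temp     # about 0.00044
print(s_temp, s_correction)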
Thickness scale factor
The standard deviation for the thickness scale factor is negligible.
Associated sensitivity coefficients
Sensitivity coefficients for translating the standard deviations for
the type B components into units of resistivity (ohm.cm) from
the propagation of error equation are listed below and in the
error budget. The sensitivity coefficient for a source is the
multiplicative factor associated with the standard deviation in the
formula above; i.e., the partial derivative with respect to that
variable from the propagation of error equation.
a6 = (ρ/X) = 100/0.111 = 900.901
a7 = (ρ/Ka) = 100/4.50 = 22.222
a8 = (ρ/t) = 100/0.628 = 159.24
a9 = (ρ/FT) = 100
a10 = (ρ/Ft/s) = 100
Sensitivity coefficients and degrees of freedom
Sensitivity coefficients for the type A components are shown in
the case study of type A uncertainty analysis and repeated below.
Degrees of freedom for type B uncertainties based on assumed
distributions, according to the convention, are assumed to be
infinite.
Error budget showing sensitivity coefficients, standard deviations and degrees of freedom
The error budget showing sensitivity coefficients for computing the relative standard uncertainty of volume resistivity (ohm.cm) with degrees of freedom is outlined below.
Error budget for volume resistivity (ohm.cm)
Source                    Type   Sensitivity      Standard Deviation   DF
Repeatability             A      a1 = 0           0.0729               300
Reproducibility           A      a2 =             0.0362               50
Run-to-run                A      a3 = 1           0.0197               5
Probe #2362               A      a4 =             0.0162               5
Wiring Configuration A    A      a5 = 1           0                    --
Resistance ratio          B      a6 = 900.901     0.0000308
Electrical scale          B      a7 = 22.222      0.000227
Thickness                 B      a8 = 159.20      0.00000868
Temperature correction    B      a9 = 100         0.000441
Thickness scale           B      a10 = 100        0                    --
Standard uncertainty
The standard uncertainty is computed as:
Approximate degrees of freedom and expanded uncertainty
The degrees of freedom associated with u are approximated by the Welch-Satterthwaite formula as:
This calculation is not affected by components with infinite
degrees of freedom, and therefore, the degrees of freedom for the
standard uncertainty is the same as the degrees of freedom for the
type A uncertainty. The critical value at the 0.05 significance
level with 42 degrees of freedom, from the t-table, is 2.018 so
the expanded uncertainty is
U = 2.018 u = 0.13 ohm.cm
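A rough numerical cross-check (not the Handbook's own calculation) can be sketched as follows: because the type B components carry infinite degrees of freedom, the combined standard uncertainty is simply the type A standard uncertainty, here back-computed as 0.078/2.018 from the previous case study, added in quadrature to the sensitivity-weighted type B standard deviations.

# Hedged cross-check of the combined standard uncertainty and expanded
# uncertainty using the type B rows of the error budget above.
import math

u_type_a = 0.078 / 2.018            # back-computed type A standard uncertainty
type_b = [
    (900.901, 0.0000308),           # resistance ratio
    (22.222, 0.000227),             # electrical scale
    (159.20, 0.00000868),           # thickness
    (100.0, 0.000441),              # temperature correction
    (100.0, 0.0),                   # thickness scale
]
u = math.sqrt(u_type_a**2 + sum((a * s) ** 2 for a, s in type_b))
print(u, 2.018 * u)                 # roughly 0.065 and 0.13 ohm.cm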
2. Measurement Process Characterization
2.7. References
Degrees of
freedom
K. A. Brownlee (1960). Statistical Theory and
Methodology in Science and Engineering, John Wiley &
Sons, Inc., New York, p. 236.
Calibration
designs
J. M. Cameron, M. C. Croarkin and R. C. Raybold (1977).
Designs for the Calibration of Standards of Mass, NBS
Technical Note 952, U.S. Dept. Commerce, 58 pages.
Calibration
designs for
eliminating
drift
J. M. Cameron and G. E. Hailes (1974). Designs for the
Calibration of Small Groups of Standards in the
Presence of Drift, Technical Note 844, U.S. Dept.
Commerce, 31 pages.
Measurement
assurance for
measurements
on ICs
Carroll Croarkin and Ruth Varner (1982). Measurement Assurance for Dimensional Measurements on Integrated-circuit Photomasks, NBS Technical Note 1164, U.S. Dept. Commerce, 44 pages.
Calibration
designs for
gauge blocks
Ted Doiron (1993). Drift Eliminating Designs for Non-
Simultaneous Comparison Calibrations, J Research
National Institute of Standards and Technology, 98, pp.
217-224.
Type A & B
uncertainty
analyses for
resistivities
J. R. Ehrstein and M. C. Croarkin (1998). Standard
Reference Materials: The Certification of 100 mm
Diameter Silicon Resistivity SRMs 2541 through 2547
Using Dual-Configuration Four-Point Probe
Measurements, NIST Special Publication 260-131,
Revised, 84 pages.
Calibration
designs for
electrical
standards
W. G. Eicke and J. M. Cameron (1967). Designs for
Surveillance of the Volt Maintained By a Group of
Saturated Standard Cells, NBS Technical Note 430, U.S.
Dept. Commerce 19 pages.
Theory of
uncertainty
analysis
Churchill Eisenhart (1962). Realistic Evaluation of the Precision and Accuracy of Instrument Calibration Systems, J Research National Bureau of Standards-C. Engineering and Instrumentation, Vol. 67C, No. 2, pp. 161-187.
Confidence, prediction, and tolerance intervals
Gerald J. Hahn and William Q. Meeker (1991). Statistical Intervals: A Guide for Practitioners, John Wiley & Sons, Inc., New York.
Original
calibration
designs for
weighings
J. A. Hayford (1893). On the Least Square Adjustment of
Weighings, U.S. Coast and Geodetic Survey Appendix 10,
Report for 1892.
Uncertainties
for values from
a calibration
curve
Thomas E. Hockersmith and Harry H. Ku (1993).
Uncertainties associated with proving ring calibrations,
NBS Special Publication 300: Precision Measurement and
Calibration, Statistical Concepts and Procedures, Vol. 1,
pp. 257-263, H. H. Ku, editor.
EWMA control
charts
J. Stuart Hunter (1986). The Exponentially Weighted
Moving Average, J Quality Technology, Vol. 18, No. 4,
pp. 203-207.
Fundamentals
of mass
metrology
K. B. Jaeger and R. S. Davis (1984). A Primer for Mass
Metrology, NBS Special Publication 700-1, 85 pages.
Fundamentals
of propagation
of error
Harry Ku (1966). Notes on the Use of Propagation of
Error Formulas, J Research of National Bureau of
Standards-C. Engineering and Instrumentation, Vol. 70C,
No.4, pp. 263-273.
Handbook of
statistical
methods
Mary Gibbons Natrella (1963). Experimental Statistics, NBS Handbook 91, US Department of Commerce.
Omnitab Sally T. Peavy, Shirley G. Bremer, Ruth N. Varner, David Hogben (1986). OMNITAB 80: An Interpretive System for Statistical and Numerical Data Analysis, NBS Special Publication 701, US Department of Commerce.
Uncertainties
for
uncorrected
bias
Steve D. Phillips and Keith R. Eberhardt (1997).
Guidelines for Expressing the Uncertainty of
Measurement Results Containing Uncorrected Bias,
NIST Journal of Research, Vol. 102, No. 5.
Calibration of
roundness
artifacts
Charles P. Reeve (1979). Calibration designs for
roundness standards, NBSIR 79-1758, 21 pages.
Calibration
designs for
angle blocks
Charles P. Reeve (1967). The Calibration of Angle
Blocks by Comparison, NBSIR 80-19767, 24 pages.
SI units Barry N. Taylor (1991). Interpretation of the SI for the United States and Metric Conversion Policy for Federal Agencies, NIST Special Publication 841, U.S. Department of Commerce.
Uncertainties
for calibrated
values
Raymond Turgel and Dominic Vecchia (1987). Precision
Calibration of Phase Meters, IEEE Transactions on
Instrumentation and Measurement, Vol. IM-36, No. 4., pp.
918-922.
Example of
propagation of
error for flow
measurements
James R. Whetstone et al. (1989). Measurements of
Coefficients of Discharge for Concentric Flange-Tapped
Square-Edged Orifice Meters in Water Over the
Reynolds Number Range 600 to 2,700,000, NIST
Technical Note 1264. pp. 97.
Mathematica
software
Stephen Wolfram (1993). Mathematica, A System of
Doing Mathematics by Computer, 2nd edition, Addison-
Wesley Publishing Co., New York.
Restrained
least squares
Marvin Zelen (1962). "Linear Estimation and Related
Topics" in Survey of Numerical Analysis edited by John
Todd, McGraw-Hill Book Co. Inc., New York, pp. 558-
577.
ASTM F84 for resistivity
ASTM Method F84-93, Standard Test Method for Measuring Resistivity of Silicon Wafers With an In-line Four-Point Probe. Annual Book of ASTM Standards, 10.05, West Conshohocken, PA 19428.
ASTM E691 for interlaboratory testing
ASTM Method E691-92, Standard Practice for Conducting an Interlaboratory Study to Determine the Precision of a Test Method. Annual Book of ASTM Standards, 10.05, West Conshohocken, PA 19428.
Guide to uncertainty analysis
Guide to the Expression of Uncertainty of Measurement (1993). ISBN 91-67-10188-9, 1st ed. ISO, Case postale 56, CH-1211, Genève 20, Switzerland, 101 pages.
ISO 5725 for interlaboratory testing
ISO 5725: 1997. Accuracy (trueness and precision) of measurement results, Part 2: Basic method for repeatability and reproducibility of a standard measurement method, ISO, Case postale 56, CH-1211, Genève 20, Switzerland.
ISO 11095 on linear calibration
ISO 11095: 1997. Linear Calibration using Reference Materials, ISO, Case postale 56, CH-1211, Genève 20, Switzerland.
MSA gauge
studies manual
Measurement Systems Analysis Reference Manual, 2nd
ed., (1995). Chrysler Corp., Ford Motor Corp., General
Motors Corp., 120 pages.
NCSL RP on uncertainty analysis
Determining and Reporting Measurement Uncertainties, National Conference of Standards Laboratories RP-12, (1994), Suite 305B, 1800 30th St., Boulder, CO 80301.
ISO Vocabulary for metrology
International Vocabulary of Basic and General Terms in Metrology, 2nd ed., (1993). ISO, Case postale 56, CH-1211, Genève 20, Switzerland, 59 pages.
Exact variance
for length and
width
Leo Goodman (1960). "On the Exact Variance of
Products" in Journal of the American Statistical
Association, December, 1960, pp. 708-713.
3. Production Process Characterization
The goal of this chapter is to learn how to plan and conduct a Production
Process Characterization Study (PPC) on manufacturing processes. We will
learn how to model manufacturing processes and use these models to design
a data collection scheme and to guide data analysis activities. We will look
in detail at how to analyze the data collected in characterization studies and
how to interpret and report the results. The accompanying Case Studies
provide detailed examples of several process characterization studies.
1. Introduction
1. Definition
2. Uses
3. Terminology/Concepts
4. PPC Steps
2. Assumptions
1. General Assumptions
2. Specific PPC Models
3. Data Collection
1. Set Goals
2. Model the Process
3. Define Sampling Plan
4. Analysis
1. First Steps
2. Exploring Relationships
3. Model Building
4. Variance Components
5. Process Stability
6. Process Capability
7. Checking Assumptions
5. Case Studies
1. Furnace Case Study
2. Machine Case Study
Detailed Chapter Table of Contents
References
3. Production Process Characterization - Detailed Table of
Contents [3.]
1. Introduction to Production Process Characterization [3.1.]
1. What is PPC? [3.1.1.]
2. What are PPC Studies Used For? [3.1.2.]
3. Terminology/Concepts [3.1.3.]
1. Distribution (Location, Spread and Shape) [3.1.3.1.]
2. Process Variability [3.1.3.2.]
1. Controlled/Uncontrolled Variation [3.1.3.2.1.]
3. Propagating Error [3.1.3.3.]
4. Populations and Sampling [3.1.3.4.]
5. Process Models [3.1.3.5.]
6. Experiments and Experimental Design [3.1.3.6.]
4. PPC Steps [3.1.4.]
2. Assumptions / Prerequisites [3.2.]
1. General Assumptions [3.2.1.]
2. Continuous Linear Model [3.2.2.]
3. Analysis of Variance Models (ANOVA) [3.2.3.]
1. One-Way ANOVA [3.2.3.1.]
1. One-Way Value-Splitting [3.2.3.1.1.]
2. Two-Way Crossed ANOVA [3.2.3.2.]
1. Two-way Crossed Value-Splitting Example [3.2.3.2.1.]
3. Two-Way Nested ANOVA [3.2.3.3.]
1. Two-Way Nested Value-Splitting Example [3.2.3.3.1.]
4. Discrete Models [3.2.4.]
3. Data Collection for PPC [3.3.]
1. Define Goals [3.3.1.]
2. Process Modeling [3.3.2.]
3. Define Sampling Plan [3.3.3.]
1. Identifying Parameters, Ranges and Resolution [3.3.3.1.]
2. Choosing a Sampling Scheme [3.3.3.2.]
3. Selecting Sample Sizes [3.3.3.3.]
4. Data Storage and Retrieval [3.3.3.4.]
5. Assign Roles and Responsibilities [3.3.3.5.]
4. Data Analysis for PPC [3.4.]
1. First Steps [3.4.1.]
2. Exploring Relationships [3.4.2.]
1. Response Correlations [3.4.2.1.]
2. Exploring Main Effects [3.4.2.2.]
3. Exploring First Order Interactions [3.4.2.3.]
3. Building Models [3.4.3.]
1. Fitting Polynomial Models [3.4.3.1.]
2. Fitting Physical Models [3.4.3.2.]
4. Analyzing Variance Structure [3.4.4.]
5. Assessing Process Stability [3.4.5.]
6. Assessing Process Capability [3.4.6.]
7. Checking Assumptions [3.4.7.]
5. Case Studies [3.5.]
1. Furnace Case Study [3.5.1.]
1. Background and Data [3.5.1.1.]
2. Initial Analysis of Response Variable [3.5.1.2.]
3. Identify Sources of Variation [3.5.1.3.]
4. Analysis of Variance [3.5.1.4.]
5. Final Conclusions [3.5.1.5.]
6. Work This Example Yourself [3.5.1.6.]
2. Machine Screw Case Study [3.5.2.]
1. Background and Data [3.5.2.1.]
2. Box Plots by Factors [3.5.2.2.]
3. Analysis of Variance [3.5.2.3.]
4. Throughput [3.5.2.4.]
5. Final Conclusions [3.5.2.5.]
6. Work This Example Yourself [3.5.2.6.]
6. References [3.6.]
3. Production Process Characterization
3.1. Introduction to Production Process
Characterization
Overview
Section
The goal of this section is to provide an introduction to PPC.
We will define PPC and the terminology used and discuss
some of the possible uses of a PPC study. Finally, we will
look at the steps involved in designing and executing a PPC
study.
Contents:
Section 1
1. What is PPC?
2. What are PPC studies used for?
3. What terminology is used in PPC?
1. Location, Spread and Shape
2. Process Variability
3. Propagating Error
4. Populations and Sampling
5. Process Models
6. Experiments and Experimental Design
4. What are the steps of a PPC?
1. Plan PPC
2. Collect Data
3. Analyze and Interpret Data
4. Report Conclusions
3. Production Process Characterization
3.1. Introduction to Production Process Characterization
3.1.1. What is PPC?
In PPC, we build data-based models
Process characterization is an activity in which we:
identify the key inputs and outputs of a process
collect data on their behavior over the entire operating
range
estimate the steady-state behavior at optimal operating
conditions
and build models describing the parameter relationships
across the operating range
The result of this activity is a set of mathematical process
models that we can use to monitor and improve the process.
This is a three-step process
This activity is typically a three-step process.
The Screening Step
In this phase we identify all possible significant process
inputs and outputs and conduct a series of screening
experiments in order to reduce that list to the key inputs
and outputs. These experiments will also allow us to
develop initial models of the relationships between those
inputs and outputs.
The Mapping Step
In this step we map the behavior of the key outputs over
their expected operating ranges. We do this through a
series of more detailed experiments called Response
Surface experiments.
The Passive Step
In this step we allow the process to run at nominal
conditions and estimate the process stability and
capability.
Not all of the steps need to be performed
The first two steps are only needed for new processes or when
the process has undergone some significant engineering
change. There are, however, many times throughout the life
of a process when the third step is needed. Examples might
be: initial process qualification, control chart development,
after minor process adjustments, after scheduled equipment maintenance, etc.
3. Production Process Characterization
3.1. Introduction to Production Process Characterization
3.1.2. What are PPC Studies Used For?
PPC is the core of any CI program
Process characterization is an integral part of any
continuous improvement program. There are many steps
in that program for which process characterization is
required. These might include:
When process characterization is required
when we are bringing a new process or tool into
use.
when we are bringing a tool or process back up
after scheduled/unscheduled maintenance.
when we want to compare tools or processes.
when we want to check the health of our process
during the monitoring phase.
when we are troubleshooting a bad process.
The techniques described in this chapter are equally
applicable to the other chapters covered in this
Handbook. These include:
Process
characterization
techniques are
applicable in
other areas
calibration
process monitoring
process improvement
process/product comparison
reliability
3.1.3. Terminology/Concepts
There are just a few fundamental concepts needed
for PPC. This section will review these ideas briefly
and provide links to other sections in the Handbook
where they are covered in more detail.
Distribution(location,
spread, shape)
For basic data analysis, we will need to understand
how to estimate location, spread and shape from the
data. These three measures comprise what is known
as the distribution of the data. We will look at both
graphical and numerical techniques.
Process variability We need to thoroughly understand the concept of
process variability. This includes how variation
explains the possible range of expected data values,
the various classifications of variability, and the role
that variability plays in process stability and
capability.
Error propagation We also need to understand how variation
propagates through our manufacturing processes
and how to decompose the total observed variation
into components attributable to the contributing
sources.
Populations and
sampling
It is important to have an understanding of the
various issues related to sampling. We will define a
population and discuss how to acquire
representative random samples from the population
of interest. We will also discuss a useful formula
for estimating the number of observations required
to answer specific questions.
Modeling For modeling, we will need to know how to identify
important factors and responses. We will also need
to know how to graphically and quantitatively build
models of the relationships between the factors and
responses.
Experiments Finally, we will need to know about the basics of
designed experiments including screening designs
and response surface designs so that we can
quantify these relationships. This topic will receive
only a cursory treatment in this chapter. It is
covered in detail in the process improvement
chapter. However, examples of its use are in the
case studies.
3.1.3.1. Distribution (Location, Spread and Shape)
Distributions
are
characterized
by location,
spread and
shape
A fundamental concept in representing any of the outputs
from a production process is that of a distribution.
Distributions arise because any manufacturing process
output will not yield the same value every time it is
measured. There will be a natural scattering of the
measured values about some central tendency value. This
scattering about a central value is known as a distribution.
A distribution is characterized by three values:
Location
The location is the expected value of the output being
measured. For a stable process, this is the value
around which the process has stabilized.
Spread
The spread is the expected amount of variation
associated with the output. This tells us the range of
possible values that we would expect to see.
Shape
The shape shows how the variation is distributed
about the location. This tells us if our variation is
symmetric about the mean or if it is skewed or
possibly multimodal.
A primary
goal of PPC
is to estimate
the
distributions
of the
process
outputs
One of the primary goals of a PPC study is to characterize
our process outputs in terms of these three measurements. If
we can demonstrate that our process is stabilized about a
constant location, with a constant variance and a known
stable shape, then we have a process that is both predictable
and controllable. This is required before we can set up
control charts or conduct experiments.
Click on
each item to
read more
detail
The table below shows the most common numerical and
graphical measures of location, spread and shape.
Parameter   Numerical                               Graphical
Location    mean, median                            scatter plot, boxplot, histogram
Spread      variance, range, inter-quartile range   boxplot, histogram
Shape       skewness, kurtosis                      boxplot, histogram, probability plot
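As a quick illustration of these numerical measures, here is a minimal Python sketch using a small, hypothetical sample (NumPy and SciPy are assumed to be available):

import numpy as np
from scipy import stats

# Hypothetical sample of a measured process output.
x = np.array([0.125, 0.127, 0.124, 0.126, 0.128, 0.123, 0.126, 0.125])

print("location: mean =", np.mean(x), " median =", np.median(x))
print("spread:   variance =", np.var(x, ddof=1),
      " range =", np.ptp(x),
      " IQR =", np.percentile(x, 75) - np.percentile(x, 25))
print("shape:    skewness =", stats.skew(x), " kurtosis =", stats.kurtosis(x))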
3.1.3.2. Process Variability
Variability
is present
everywhere
All manufacturing and measurement processes exhibit variation. For example, when
we take sample data on the output of a process, such as critical dimensions, oxide
thickness, or resistivity, we observe that all the values are NOT the same. This results
in a collection of observed values distributed about some location value. This is what
we call spread or variability. We represent variability numerically with the variance
calculation and graphically with a histogram.
How does
the
standard
deviation
describe the
spread of
the data?
The standard deviation (square root of the variance) gives insight into the spread of the
data through the use of what is known as the Empirical Rule. This rule (shown in the
graph below) is:
Approximately 60-78% of the data are within a distance of one standard deviation from the average.
Approximately 90-98% of the data are within a distance of two standard deviations from the average.
More than 99% of the data are within a distance of three standard deviations from the average.
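A minimal sketch of checking the Empirical Rule on a sample (the data below are simulated and purely illustrative):

import numpy as np

rng = np.random.default_rng(0)
# Simulated process output (e.g., film thickness readings); purely illustrative.
data = rng.normal(loc=990, scale=5, size=500)

mean, std = data.mean(), data.std(ddof=1)
for k in (1, 2, 3):
    within = np.mean(np.abs(data - mean) <= k * std)
    print(f"fraction within {k} standard deviation(s) of the average: {within:.3f}")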
Variability
accumulates
from many
sources
This observed variability is an accumulation of many different sources of variation that
have occurred throughout the manufacturing process. One of the more important
activities of process characterization is to identify and quantify these various sources
of variation so that they may be minimized.
There are
also
different
types
There are not only different sources of variation, but there are also different types of
variation. Two important classifications of variation for the purposes of PPC are
controlled variation and uncontrolled variation.
Click here
to see
examples
CONTROLLED VARIATION
Variation that is characterized by a stable and consistent pattern of variation
over time. This type of variation will be random in nature and will be exhibited
by a uniform fluctuation about a constant level.
UNCONTROLLED VARIATION
Variation that is characterized by a pattern of variation that changes over time
and hence is unpredictable. This type of variation will typically contain some
structure.
Stable
processes
only exhibit
controlled
variation
This concept of controlled/uncontrolled variation is important in determining if a
process is stable. A process is deemed stable if it runs in a consistent and predictable
manner. This means that the average process value is constant and the variability is
controlled. If the variation is uncontrolled, then either the process average is changing
or the process variation is changing or both. The first process in the example above is
stable; the second is not.
In the course of process characterization we should endeavor to eliminate all sources
of uncontrolled variation.
3.1.3.2.1. Controlled/Uncontrolled Variation
Two trend
plots
The two figures below are two trend plots from two different oxide
growth processes. Thirty wafers were sampled from each process: one
per day over 30 days. Thickness at the center was measured on each
wafer. The x-axis of each graph is the wafer number and the y-axis is the
film thickness in angstroms.
Examples of "in control" and "out of control" processes
The first process is an example of a process that is "in control" with
random fluctuation about a process location of approximately 990. The
second process is an example of a process that is "out of control" with a
process location trending upward after observation 20.
This process exhibits controlled variation. Note the random fluctuation about a constant mean.
This process exhibits uncontrolled variation. Note the structure in the variation in the form of a linear trend.
3.1.3.3. Propagating Error
The
variation we
see can
come from
many
sources
When we estimate the variance at a particular process step, this
variance is typically not just a result of the current step, but rather is
an accumulation of variation from previous steps and from
measurement error. Therefore, an important question that we need
to answer in PPC is how the variation from the different sources
accumulates. This will allow us to partition the total variation and
assign the parts to the various sources. Then we can attack the
sources that contribute the most.
How do I
partition the
error?
Usually we can model the contribution of the various sources of
error to the total error through a simple linear relationship. If we
have a simple linear relationship between two variables, say

    y = a*x1 + b*x2 ,

then the variance associated with y is given by

    Var(y) = a^2*Var(x1) + b^2*Var(x2) + 2*a*b*Cov(x1, x2) .

If the variables are not correlated, then there is no covariance and
the last term in the above equation drops off. A good example of
this is the case in which we have both process error and
measurement error. Since these are usually independent of each
other, the total observed variance is just the sum of the variances for
process and measurement. Remember never to add standard
deviations; always add variances.
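A small simulation makes the point: when two independent error sources are added, their variances add but their standard deviations do not. This sketch uses made-up sigmas and assumes NumPy is available:

import numpy as np

rng = np.random.default_rng(1)
n = 100_000

# Hypothetical independent error sources: process variation and measurement error.
process_error = rng.normal(0.0, 2.0, n)       # sigma = 2.0, variance = 4.0
measurement_error = rng.normal(0.0, 1.5, n)   # sigma = 1.5, variance = 2.25

observed = process_error + measurement_error
print("sum of the two variances:", process_error.var() + measurement_error.var())
print("variance of the observed total:", observed.var())   # both are close to 6.25
print("sum of the standard deviations:", process_error.std() + measurement_error.std())
# The last number (about 3.5) does not match the observed standard deviation (about 2.5):
# variances add for independent sources, standard deviations do not.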
How do I
calculate the
individual
components?
Of course, we rarely have the individual components of variation
and wish to know the total variation. Usually, we have an estimate
of the overall variance and wish to break that variance down into its
individual components. This is known as components of variance
estimation and is dealt with in detail in the analysis of variance
page later in this chapter.
3.1.3.4. Populations and Sampling
We take
samples
from a
target
population
and make
inferences
In survey sampling, if you want to know what everyone thinks
about a particular topic, you can just ask everyone and record
their answers. Depending on how you define the term,
everyone (all the adults in a town, all the males in the USA,
etc.), it may be impossible or impractical to survey everyone.
The other option is to survey a small group (Sample) of the
people whose opinions you are interested in (Target
Population) , record their opinions and use that information to
make inferences about what everyone thinks. Opinion pollsters
have developed a whole body of tools for doing just that and
many of those tools apply to manufacturing as well. We can
use these sampling techniques to take a few measurements
from a process and make statements about the behavior of that
process.
Facts
about a
sample are
not
necessarily
facts about
a
population
If it weren't for process variation we could just take one
sample and everything would be known about the target
population. Unfortunately this is never the case. We cannot
take facts about the sample to be facts about the population.
Our job is to reach appropriate conclusions about the
population despite this variation. The more observations we
take from a population, the more our sample data resembles
the population. When we have reached the point at which facts
about the sample are reasonable approximations of facts about
the population, then we say the sample is adequate.
Four
attributes
of samples
Adequacy of a sample depends on the following four
attributes:
Representativeness of the sample (is it random?)
Size of the sample
Variability in the population
Desired precision of the estimates
We will learn about choosing representative samples of
adequate size in the section on defining sampling plans.
3.1.3.5. Process Models
Black box
model and
fishbone
diagram
As we will see in Section 3 of this chapter, one of the first steps in PPC is to
model the process that is under investigation. Two very useful tools for
doing this are the black-box model and the fishbone diagram.
We use the
black-box
model to
describe
our
processes
We can use the simple black-box model, shown below, to describe most of
the tools and processes we will encounter in PPC. The process will be
stimulated by inputs. These inputs can either be controlled (such as recipe or
machine settings) or uncontrolled (such as humidity, operators, power
fluctuations, etc.). These inputs interact with our process and produce
outputs. These outputs are usually some characteristic of our process that we
can measure. The measurable inputs and outputs can be sampled in order to
observe and understand how they behave and relate to each other.
Diagram
of the
black box
model
These inputs and outputs are also known as Factors and Responses,
respectively.
Factors
Observed inputs used to explain response behavior (also called
explanatory variables). Factors may be fixed-level controlled inputs or
sampled uncontrolled inputs.
Responses
Sampled process outputs. Responses may also be functions of sampled
outputs such as average thickness or uniformity.
Factors
and
Responses
are further
classified
by
variable
type
We further categorize factors and responses according to their Variable
Type, which indicates the amount of information they contain. As the name
implies, this classification is useful for data modeling activities and is
critical for selecting the proper analysis technique. The table below
summarizes this categorization. The types are listed in order of the amount
of information they contain with Measurement containing the most
information and Nominal containing the least.
Table
describing
the
different
variable
types
Type          Description                                      Example
Measurement   discrete/continuous, order is important,         particle count, oxide thickness, pressure, temperature
              infinite range
Ordinal       discrete, order is important, finite range       run #, wafer #, site, bin
Nominal       discrete, no order, very few possible values     good/bad, bin, high/medium/low, shift, operator
Fishbone
diagrams
help to
decompose
complexity
We can use the fishbone diagram to further refine the modeling process.
Fishbone diagrams are very useful for decomposing the complexity of our
manufacturing processes. Typically, we choose a process characteristic
(either Factors or Responses) and list out the general categories that may
influence the characteristic (such as material, machine, method, environment,
etc.), and then provide more specific detail within each category. Examples
of how to do this are given in the section on Case Studies.
Sample
fishbone
diagram
3.1.3.6. Experiments and Experimental Design
Factors and
responses
Besides just observing our processes for evidence of stability
and capability, we quite often want to know about the
relationships between the various Factors and Responses.
We look for
correlations
and causal
relationships
There are generally two types of relationships that we are
interested in for purposes of PPC. They are:
Correlation
Two variables are said to be correlated if an observed
change in the level of one variable is accompanied by
a change in the level of another variable. The change
may be in the same direction (positive correlation) or
in the opposite direction (negative correlation).
Causality
There is a causal relationship between two variables if
a change in the level of one variable causes a change
in the other variable.
Note that correlation does not imply causality. It is possible
for two variables to be associated with each other without
one of them causing the observed behavior in the other.
When this is the case it is usually because there is a third
(possibly unknown) causal factor.
Our goal is
to find
causal
relationships
Generally, our ultimate goal in PPC is to find and quantify
causal relationships. Once this is done, we can then take
advantage of these relationships to improve and control our
processes.
Find
correlations
and then try
to establish
causal
relationships
Generally, we first need to find and explore correlations and
then try to establish causal relationships. It is much easier to
find correlations as these are just properties of the data. It is
much more difficult to prove causality as this additionally
requires sound engineering judgment. There is a systematic
procedure we can use to accomplish this in an efficient
manner. We do this through the use of designed
experiments.
First we screen, then we build models
When we have many potential factors and we want to see
which ones are correlated and have the potential to be
involved in causal relationships with the responses, we use
screening designs to reduce the number of candidates. Once
we have a reduced set of influential factors, we can use
response surface designs to model the causal relationships
with the responses across the operating range of the process
factors.
Techniques
discussed in
process
improvement
chapter
The techniques are covered in detail in the process
improvement section and will not be discussed much in this
chapter. Examples of how the techniques are used in PPC
are given in the Case Studies.
3.1.4. PPC Steps
Follow
these 4
steps to
ensure
efficient
use of
resources
The primary activity of a PPC is to collect and analyze data
so that we may draw conclusions about and ultimately
improve our production processes. In many industrial
applications, access to production facilities for the purposes of
conducting experiments is very limited. Thus we must be
very careful in how we go about these activities so that we
can be sure of doing them in a cost-effective manner.
Step 1:
Plan
The most important step by far is the planning step. By
faithfully executing this step, we will ensure that we only
collect data in the most efficient manner possible and still
support the goals of the PPC. Planning should generate the
following:
a statement of the goals
a descriptive process model (a list of process inputs and
outputs)
a description of the sampling plan (including a
description of the procedure and settings to be used to
run the process during the study with clear assignments
for each person involved)
a description of the method of data collection, tasks and
responsibilities, formatting, and storage
an outline of the data analysis
All decisions that affect how the characterization will be
conducted should be made during the planning phase. The
process characterization should be conducted according to
this plan, with all exceptions noted.
Step 2:
Collect
Data collection is essentially just the execution of the
sampling plan part of the previous step. If a good job were
done in the planning step, then this step should be pretty
straightforward. It is important to execute to the plan as
closely as possible and to note any exceptions.
Step 3:
Analyze
and
interpret
This is the combination of quantitative (regression, ANOVA,
correlation, etc.) and graphical (histograms, scatter plots, box
plots, etc.) analysis techniques that are applied to the collected
data in order to accomplish the goals of the PPC.
Step 4:
Report
Reporting is an important step that should not be overlooked.
By creating an informative report and archiving it in an
accessible place, we can ensure that others have access to the
information generated by the PPC. Often, the work involved
in a PPC can be minimized by using the results of other,
similar studies. Examples of PPC reports can be found in the
Case Studies section.
Further
information
The planning and data collection steps are described in detail
in the data collection section. The analysis and interpretation
steps are covered in detail in the analysis section. Examples
of the reporting step can be seen in the Case Studies.
3.2. Assumptions / Prerequisites
Primary
goal is to
identify
and
quantify
sources of
variation
The primary goal of PPC is to identify and quantify sources of
variation. Only by doing this will we be able to define an
effective plan for variation reduction and process
improvement. Sometimes, in order to achieve this goal, we
must first build mathematical/statistical models of our
processes. In these models we will identify influential factors
and the responses on which they have an effect. We will use
these models to understand how the sources of variation are
influenced by the important factors. This subsection will
review many of the modeling tools we have at our disposal to
accomplish these tasks. In particular, the models covered in
this section are linear models, Analysis of Variance (ANOVA)
models and discrete models.
Contents:
Section 2
1. General Assumptions
2. Continuous Linear
3. Analysis of Variance
1. One-Way
2. Crossed
3. Nested
4. Discrete
3.2.1. General Assumptions
Assumption:
process is
sum of a
systematic
component
and a random
component
In order to employ the modeling techniques described in
this section, there are a few assumptions about the process
under study that must be made. First, we must assume that
the process can adequately be modeled as the sum of a
systematic component and a random component. The
systematic component is the mathematical model part and
the random component is the error or noise present in the
system. We also assume that the systematic component is
fixed over the range of operating conditions and that the
random component has a constant location, spread and
distributional form.
Assumption:
data used to
fit these
models are
representative
of the process
being
modeled
Finally, we assume that the data used to fit these models
are representative of the process being modeled. As a
result, we must additionally assume that the measurement
system used to collect the data has been studied and proven
to be capable of making measurements to the desired
precision and accuracy. If this is not the case, refer to the
Measurement Capability Section of this Handbook.
3.2.2. Continuous Linear Model
Description The continuous linear model (CLM) is probably the most
commonly used model in PPC. It is applicable in many instances
ranging from simple control charts to response surface models.
The CLM is a mathematical function that relates explanatory
variables (either discrete or continuous) to a single continuous
response variable. It is called linear because the coefficients of
the terms are expressed as a linear sum. The terms themselves do
not have to be linear.
Model The general form of the CLM is:

    y = a0 + f1(x1) + f2(x2) + ... + fp(xp) + e
This equation just says that if we have p explanatory variables
then the response is modeled by a constant term plus a sum of
functions of those explanatory variables, plus some random error
term. This will become clear as we look at some examples below.
Estimation The coefficients for the parameters in the CLM are estimated by
the method of least squares. This is a method that gives estimates
which minimize the sum of the squared distances from the
observations to the fitted line or plane. See the chapter on Process
Modeling for a more complete discussion on estimating the
coefficients for these models.
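To make the least-squares step concrete, here is a minimal sketch (with made-up data, not a handbook example) that estimates the coefficients of a CLM with two continuous explanatory variables:

import numpy as np

rng = np.random.default_rng(2)

# Made-up continuous explanatory variables (for example, temperature and time)
# and a response generated from a known linear model plus Gaussian noise.
x1 = rng.uniform(900, 1100, 50)
x2 = rng.uniform(10, 30, 50)
y = 5.0 + 0.02 * x1 - 0.4 * x2 + rng.normal(0, 0.5, 50)

# Least-squares estimation of the CLM coefficients in y = b0 + b1*x1 + b2*x2 + error.
X = np.column_stack([np.ones_like(x1), x1, x2])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
print("estimated coefficients (b0, b1, b2):", coef)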
Testing The tests for the CLM involve testing that the model as a whole is
a good representation of the process and whether any of the
coefficients in the model are zero or have no effect on the overall
fit. Again, the details for testing are given in the chapter on
Process Modeling.
Assumptions For estimation purposes, there are no additional assumptions
necessary for the CLM beyond those stated in the assumptions
section. For testing purposes, however, it is necessary to assume
that the error term is adequately modeled by a Gaussian
distribution.
Uses The CLM has many uses such as building predictive process
models over a range of process settings that exhibit linear
behavior, control charts, process capability, building models from
the data produced by designed experiments, and building response
surface models for automated process control applications.
Examples Shewhart Control Chart - The simplest example of a very
common usage of the CLM is the underlying model used for
Shewhart control charts. This model assumes that the process
parameter being measured is a constant with additive Gaussian
noise and is given by:

    y = a0 + e
Diffusion Furnace - Suppose we want to model the average wafer
sheet resistance as a function of the location or zone in a furnace
tube, the temperature, and the anneal time. In this case, let there
be 3 distinct zones (front, center, back) and temperature and time
are continuous explanatory variables. This model is given by the
CLM:
Diffusion Furnace (cont.) - Usually, the fitted line for the average
wafer sheet resistance is not straight but has some curvature to it.
This can be accommodated by adding a quadratic term for the
time parameter as follows:
3.2.3. Analysis of Variance Models (ANOVA)
ANOVA
allows us
to compare
the effects
of multiple
levels of
multiple
factors
One of the most common analysis activities in PPC is
comparison. We often compare the performance of similar
tools or processes. We also compare the effect of different
treatments such as recipe settings. When we compare two
things, such as two tools running the same operation, we use
comparison techniques. When we want to compare multiple
things, like multiple tools running the same operation or
multiple tools with multiple operators running the same
operation, we turn to ANOVA techniques to perform the
analysis.
ANOVA
splits the
data into
components
The easiest way to understand ANOVA is through a concept
known as value splitting. ANOVA splits the observed data
values into components that are attributable to the different
levels of the factors. Value splitting is best explained by
example.
Example:
Turned
Pins
The simplest example of value splitting is when we just have
one level of one factor. Suppose we have a turning operation
in a machine shop where we are turning pins to a diameter of
.125 +/- .005 inches. Throughout the course of a day we take
five samples of pins and obtain the following measurements:
.125, .127, .124, .126, .128.
We can split these data values into a common value (mean)
and residuals (what's left over) as follows:
.125 .127 .124 .126 .128
=
.126 .126 .126 .126 .126
+
-.001 .001 -.002 .000 .002
From these tables, also called overlays, we can easily
calculate the location and spread of the data as follows:
mean = .126
std. deviation = .0016.
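The same splitting can be reproduced in a few lines of Python (illustrative only):

import numpy as np

pins = np.array([0.125, 0.127, 0.124, 0.126, 0.128])

common = pins.mean()            # common value (mean) = 0.126
residuals = pins - common       # what's left over
print("common value:", common)
print("residuals:   ", residuals)
print("std deviation:", pins.std(ddof=1))   # approximately 0.0016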
Other
layouts
While the above example is a trivial structural layout, it
illustrates how we can split data values into their components.
In the next sections, we will look at more complicated
structural layouts for the data. In particular we will look at
multiple levels of one factor (One-Way ANOVA) and
multiple levels of two factors (Two-Way ANOVA) where the
factors are crossed and nested.
3.2.3.1. One-Way ANOVA
Description A one-way layout consists of a single factor with several levels and multiple
observations at each level. With this kind of layout we can calculate the mean of the
observations within each level of our factor. The residuals will tell us about the
variation within each level. We can also average the means of each level to obtain a
grand mean. We can then look at the deviation of the mean of each level from the
grand mean to understand something about the level effects. Finally, we can compare
the variation within levels to the variation across levels. Hence the name analysis of
variance.
Model It is easy to model all of this with an equation of the form:

    y_ij = m + a_i + e_ij
The equation indicates that the jth data value, from level i, is the sum of three
components: the common value (grand mean), the level effect (the deviation of each
level mean from the grand mean), and the residual (what's left over).
Estimation
click here to
see details
of one-way
value
splitting
Estimation for the one-way layout can be performed one of two ways. First, we can
calculate the total variation, within-level variation and across-level variation. These can
be summarized in a table as shown below and tests can be made to determine if the
factor levels are significant. The value splitting example illustrates the calculations
involved.
ANOVA table for one-way case
In general, the ANOVA table for the one-way case is given by:

Source        Sum of Squares                             Deg. of Freedom   Mean Square
Factor        SS(factor) = J * sum_i (ybar_i - ybar)^2   I - 1             SS(factor)/(I - 1)
Residual      SS(resid)  = sum_ij (y_ij - ybar_i)^2      I(J - 1)          SS(resid)/(I(J - 1))
Corr. Total   SS(total)  = sum_ij (y_ij - ybar)^2        IJ - 1

where I is the number of levels, J is the number of observations at each level, ybar_i is the mean of the observations in level i, and ybar is the grand mean.
The row labeled "Corr. Total" in the ANOVA table contains the corrected total sum of squares and the associated degrees of freedom (DoF).
Level effects
must sum to
zero
The second way to estimate effects is through the use of CLM techniques. If you look at
the model above you will notice that it is in the form of a CLM. The only problem is
that the model is saturated and no unique solution exists. We overcome this problem by
applying a constraint to the model. Since the level effects are just deviations from the
grand mean, they must sum to zero. By applying the constraint that the level effects
must sum to zero, we can now obtain a unique solution to the CLM equations. Most
analysis programs will handle this for you automatically. See the chapter on Process
Modeling for a more complete discussion on estimating the coefficients for these
models.
Testing We are testing to see if the observed data support the hypothesis that the levels of the
factor are significantly different from each other. The way we do this is by comparing
the within-level variance to the between-level variance.
If we assume that the observations within each level have the same variance, we can
calculate the variance within each level and pool these together to obtain an estimate of
the overall population variance. This works out to be the mean square of the residuals.
Similarly, if there really were no level effect, the mean square across levels would be an
estimate of the overall variance. Therefore, if there really were no level effect, these
two estimates would be just two different ways to estimate the same parameter and
should be close numerically. However, if there is a level effect, the level mean square
will be higher than the residual mean square.
It can be shown that given the assumptions about the data stated below, the ratio of the
level mean square and the residual mean square follows an F distribution with degrees
of freedom as shown in the ANOVA table. If the F0 value is significant at a given
significance level (greater than the cut-off value in an F table), then there is a level effect
present in the data.
Assumptions For estimation purposes, we assume the data can adequately be modeled as the sum of
a deterministic component and a random component. We further assume that the fixed
(deterministic) component can be modeled as the sum of an overall mean and some
contribution from the factor level. Finally, it is assumed that the random component can
be modeled with a Gaussian distribution with fixed location and spread.
Uses The one-way ANOVA is useful when we want to compare the effect of multiple levels
of one factor and we have multiple observations at each level. The factor can be either
discrete (different machine, different plants, different shifts, etc.) or continuous
(different gas flows, temperatures, etc.).
Example Let's extend the machining example by assuming that we have five different machines
making the same part and we take five random samples from each machine to obtain the
following diameter data:
Machine
1 2 3 4 5
0.125 0.118 0.123 0.126 0.118
0.127 0.122 0.125 0.128 0.129
0.125 0.120 0.125 0.126 0.127
0.126 0.124 0.124 0.127 0.120
0.128 0.119 0.126 0.129 0.121
Analyze Using ANOVA software or the techniques of the value-splitting example, we
summarize the data in an ANOVA table as follows:
Source            Sum of Squares   Deg. of Freedom   Mean Square   F0
Factor            0.000137         4                 0.000034      4.86
Residual          0.000132         20                0.000007
Corrected Total   0.000269         24
Test By dividing the factor-level mean square by the residual mean square, we obtain an F0
value of 4.86 which is greater than the cut-off value of 2.87 from the F distribution with
4 and 20 degrees of freedom and a significance level of 0.05. Therefore, there is
sufficient evidence to reject the hypothesis that the levels are all the same.
Conclusion From the analysis of these data we can conclude that the factor "machine" has an effect.
There is a statistically significant difference in the pin diameters across the machines on
which they were manufactured.
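For readers who want to reproduce this test, here is a minimal sketch using scipy.stats.f_oneway on the machine columns of the table above (an illustration, not the handbook's own code; small differences from the rounded table entries are possible):

from scipy import stats

# Pin diameters from the five machines (the columns of the table above).
machine = [
    [0.125, 0.127, 0.125, 0.126, 0.128],
    [0.118, 0.122, 0.120, 0.124, 0.119],
    [0.123, 0.125, 0.125, 0.124, 0.126],
    [0.126, 0.128, 0.126, 0.127, 0.129],
    [0.118, 0.129, 0.127, 0.120, 0.121],
]

f0, p_value = stats.f_oneway(*machine)
# F0 for the test of equal machine means; compare against the 0.05 cut-off of 2.87
# for 4 and 20 degrees of freedom.
print(f"F0 = {f0:.2f}, p-value = {p_value:.4f}")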
3.2.3.1.1. One-Way Value-Splitting
Example Let's use the data from the machining example to illustrate
how to use the techniques of value-splitting to break each data
value into its component parts. Once we have the component
parts, it is then a trivial matter to calculate the sums of squares
and form the F-value for the test.
Machine
1 2 3 4 5
.125 .118 .123 .126 .118
.127 .122 .125 .128 .129
.125 .120 .125 .126 .127
.126 .124 .124 .127 .120
.128 .119 .126 .129 .121
Calculate level means
Remember from our model, y_ij = m + a_i + e_ij, we say each
observation is the sum of a common value, a level effect and a
residual value. Value-splitting just breaks each observation
into its component parts. The first step in value-splitting is to
calculate the mean values (rounding to the nearest thousandth)
within each machine to get the level means.
Machine
1 2 3 4 5
.1262 .1206 .1246 .1272 .123
Sweep
level
means
We can then sweep (subtract the level mean from each
associated data value) the means through the original data
table to get the residuals:
Machine
       1        2        3        4        5
  -.0012   -.0026   -.0016   -.0012   -.005
   .0008    .0014    .0004    .0008    .006
  -.0012   -.0006    .0004   -.0012    .004
  -.0002    .0034   -.0006   -.0002   -.003
   .0018   -.0016    .0014    .0018   -.002
Calculate
the grand
mean
The next step is to calculate the grand mean from the
individual machine means as:
Grand
Mean
.12432
Sweep the
grand
mean
through
the level
means
Finally, we can sweep the grand mean through the individual
level means to obtain the level effects:
Machine
        1         2         3         4         5
   .00188   -.00372    .00028    .00288   -.00132
It is easy to verify that the original data table can be
constructed by adding the overall mean, the machine effect
and the appropriate residual.
Calculate
ANOVA
values
Now that we have the data values split and the overlays
created, the next step is to calculate the various values in the
One-Way ANOVA table. We have three values to calculate
for each overlay. They are the sums of squares, the degrees of
freedom, and the mean squares.
Total sum
of squares
The total sum of squares is calculated by summing the squares
of all the data values and subtracting from this number the
square of the grand mean times the total number of data
values. We usually don't calculate the mean square for the
total sum of squares because we don't use this value in any
statistical test.
Residual sum of squares, degrees of freedom and mean square
The residual sum of squares is calculated by summing the
squares of the residual values. This is equal to .000132. The
degrees of freedom is the number of unconstrained values.
Since the residuals for each level of the factor must sum to
zero, once we know four of them, the last one is determined.
This means we have four unconstrained values for each level,
or 20 degrees of freedom. This gives a mean square of
.000007.
Level sum
of squares,
degrees of
freedom
and mean
square
Finally, to obtain the sum of squares for the levels, we sum the
squares of each value in the level effect overlay and multiply
the sum by the number of observations for each level (in this
case 5) to obtain a value of .000137. Since the deviations from
the level means must sum to zero, we have only four
unconstrained values so the degrees of freedom for level
effects is 4. This produces a mean square of .000034.
Calculate
F-value
The last step is to calculate the F-value and perform the test of
equal level means. The F-value is just the level mean square
divided by the residual mean square. In this case the
F-value = 4.86. If we look in an F-table for 4 and 20 degrees of
freedom at 95% confidence, we see that the critical value is
2.87, which means that we have a significant result and that
there is thus evidence of a strong machine effect. By looking
at the level-effect overlay we see that this is driven by
machines 2 and 4.
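The value-splitting arithmetic above can also be carried out directly with NumPy; the sketch below (illustrative only, using the same diameter data) computes the overlays and the resulting mean squares:

import numpy as np

# Same diameter data: rows are the five samples, columns are the five machines.
data = np.array([
    [0.125, 0.118, 0.123, 0.126, 0.118],
    [0.127, 0.122, 0.125, 0.128, 0.129],
    [0.125, 0.120, 0.125, 0.126, 0.127],
    [0.126, 0.124, 0.124, 0.127, 0.120],
    [0.128, 0.119, 0.126, 0.129, 0.121],
])
n_obs, n_levels = data.shape

level_means = data.mean(axis=0)            # one mean per machine
grand_mean = level_means.mean()
effects = level_means - grand_mean         # level-effect overlay
residuals = data - level_means             # residual overlay

ss_levels = n_obs * np.sum(effects**2)     # about 0.000137
ss_resid = np.sum(residuals**2)            # about 0.000132
ms_levels = ss_levels / (n_levels - 1)             # 4 degrees of freedom
ms_resid = ss_resid / (n_levels * (n_obs - 1))     # 20 degrees of freedom
print("F =", ms_levels / ms_resid)   # ratio of the level mean square to the residual mean square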
3.2.3.2. Two-Way Crossed ANOVA
Description When we have two factors with at least two levels and one or more observations at each level, we say we have a
two-way layout. We say that the two-way layout is crossed when every level of Factor A occurs with every level
of Factor B. With this kind of layout we can estimate the effect of each factor (Main Effects) as well as any
interaction between the factors.
Model If we assume that we have K observations at each combination of I levels of Factor A and J levels of Factor B,
then we can model the two-way layout with an equation of the form:

    y_ijk = m + a_i + b_j + (ab)_ij + e_ijk
This equation just says that the kth data value for the jth level of Factor B and the ith level of Factor A is the sum
of five components: the common value (grand mean), the level effect for Factor A, the level effect for Factor B,
the interaction effect, and the residual. Note that (ab) does not mean multiplication; rather that there is interaction
between the two factors.
Estimation Like the one-way case, the estimation for the two-way layout can be done either by calculating the variance
components or by using CLM techniques.
Click here
for the
value
splitting
example
For the two-way ANOVA, we display the data in a two-dimensional table with the levels of Factor A in columns
and the levels of Factor B in rows. The replicate observations fill each cell. We can sweep out the common
value, the row effects, the column effects, the interaction effects and the residuals using value-splitting
techniques. Sums of squares can be calculated and summarized in an ANOVA table as shown below.
The row labeled, "Corr. Total", in the ANOVA table contains the corrected total sum of squares and the
associated degrees of freedom (DoF).
We can use CLM techniques to do the estimation. We still have the problem that the model is saturated and no
unique solution exists. We overcome this problem by applying the constraints to the model that the two main
effects and interaction effects each sum to zero.
Testing Like testing in the one-way case, we are testing that two main effects and the interaction are zero. Again we just
form a ratio of each main effect mean square and the interaction mean square to the residual mean square. If the
assumptions stated below are true then those ratios follow an F distribution and the test is performed by
comparing the F0 ratios to values in an F table with the appropriate degrees of freedom and confidence level.
Assumptions For estimation purposes, we assume the data can be adequately modeled as described in the model above. It is
assumed that the random component can be modeled with a Gaussian distribution with fixed location and spread.
Uses The two-way crossed ANOVA is useful when we want to compare the effect of multiple levels of two factors
and we can combine every level of one factor with every level of the other factor. If we have multiple
observations at each level, then we can also estimate the effects of interaction between the two factors.
Example Let's extend the one-way machining example by assuming that we want to test if there are any differences in pin
diameters due to different types of coolant. We still have five different machines making the same part and we
take five samples from each machine for each coolant type to obtain the following data:
Machine
Coolant
A
1 2 3 4 5
0.125 0.118 0.123 0.126 0.118
0.127 0.122 0.125 0.128 0.129
0.125 0.120 0.125 0.126 0.127
0.126 0.124 0.124 0.127 0.120
0.128 0.119 0.126 0.129 0.121
Coolant
B
0.124 0.116 0.122 0.126 0.125
0.128 0.125 0.121 0.129 0.123
0.127 0.119 0.124 0.125 0.114
0.126 0.125 0.126 0.130 0.124
0.129 0.120 0.125 0.124 0.117
Analyze For analysis details see the crossed two-way value splitting example. We can summarize the analysis results in
an ANOVA table as follows:
Source            Sum of Squares   Deg. of Freedom   Mean Square   F0
machine           0.000303         4                 0.000076      8.8
coolant           0.00000392       1                 0.00000392    0.45
interaction       0.00001468       4                 0.00000367    0.42
residuals         0.000346         40                0.0000087
corrected total   0.000668         49
Test By dividing the mean square for machine by the mean square for residuals we obtain an F0 value of 8.8 which is
greater than the critical value of 2.61 based on 4 and 40 degrees of freedom and a 0.05 significance level.
Likewise the F0 values for Coolant and Interaction, obtained by dividing their mean squares by the residual mean
square, are less than their respective critical values of 4.08 and 2.61 (0.05 significance level).
Conclusion From the ANOVA table we can conclude that machine is the most important factor and is statistically
significant. Coolant is not significant and neither is the interaction. These results would lead us to believe that
some tool-matching efforts would be useful for improving this process.
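One way to reproduce this analysis in Python is with a formula-based fit; the sketch below assumes pandas and statsmodels are available and is illustrative rather than the handbook's own code:

import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

# Pin diameters listed per machine for coolants A and B (taken from the tables above).
data = {
    "A": [[0.125, 0.127, 0.125, 0.126, 0.128],
          [0.118, 0.122, 0.120, 0.124, 0.119],
          [0.123, 0.125, 0.125, 0.124, 0.126],
          [0.126, 0.128, 0.126, 0.127, 0.129],
          [0.118, 0.129, 0.127, 0.120, 0.121]],
    "B": [[0.124, 0.128, 0.127, 0.126, 0.129],
          [0.116, 0.125, 0.119, 0.125, 0.120],
          [0.122, 0.121, 0.124, 0.126, 0.125],
          [0.126, 0.129, 0.125, 0.130, 0.124],
          [0.125, 0.123, 0.114, 0.124, 0.117]],
}
rows = [{"coolant": c, "machine": m + 1, "diameter": d}
        for c, machines in data.items()
        for m, samples in enumerate(machines)
        for d in samples]
df = pd.DataFrame(rows)

# Two-way crossed ANOVA with interaction; machine comes out significant,
# coolant and the interaction do not.
model = smf.ols("diameter ~ C(machine) * C(coolant)", data=df).fit()
print(sm.stats.anova_lm(model, typ=2))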
3.2.3.2.1. Two-way Crossed Value-Splitting Example
Example:
Coolant is
completely
crossed
with
machine
The data table below is five samples each collected from five
different lathes each running two different types of coolant.
The measurement is the diameter of a turned pin.
Machine
Coolant
A
1 2 3 4 5
.125 .118 .123 .126 .118
.127 .122 .125 .128 .129
.125 .120 .125 .126 .127
.126 .124 .124 .127 .120
.128 .119 .126 .129 .121
Coolant
B
.124 .116 .122 .126 .125
.128 .125 .121 .129 .123
.127 .119 .124 .125 .114
.126 .125 .126 .130 .124
.129 .120 .125 .124 .117
For the crossed two-way case, the first thing we need to do is
to sweep the cell means from the data table to obtain the
residual values. This is shown in the tables below.
The first
step is to
sweep out
the cell
means to
obtain the
residuals
and means
Machine
1 2 3 4 5
A .1262 .1206 .1246 .1272 .123
B .1268 .121 .1236 .1268 .1206
Coolant A
       1        2        3        4        5
  -.0012   -.0026   -.0016   -.0012   -.005
   .0008    .0014    .0004    .0008    .006
  -.0012   -.0006    .0004   -.0012    .004
  -.0002    .0034   -.0006   -.0002   -.003
   .0018   -.0016    .0014    .0018   -.002

Coolant B
       1        2        3        4        5
  -.0028   -.005    -.0016   -.0008    .0044
   .0012    .004    -.0026    .0022    .0024
   .0002   -.002     .0004   -.0018   -.0066
  -.0008    .004     .0024    .0032    .0034
   .0022   -.001     .0014   -.0028   -.0036
Sweep the
row means
The next step is to sweep out the row means. This gives the
table below.
Machine
      (row mean)       1        2        3        4        5
  A   .1243        .0019   -.0037    .0003    .0029   -.0013
  B   .1238        .003    -.0028   -.0002    .003    -.0032
Sweep the
column
means
Finally, we sweep the column means to obtain the grand mean,
row (coolant) effects, column (machine) effects and the
interaction effects.
Machine
                         1        2        3        4        5
  (grand)  .1241     .0025   -.0033   .00005    .003    -.0023
  A        .0003    -.0006   -.0005   .00025    .0000    .001
  B       -.0003     .0006    .0005  -.00025    .0000   -.001
What do
these
tables tell
us?
By looking at the table of residuals, we see that the residuals
for coolant B tend to be a little higher than for coolant A. This
implies that there may be more variability in diameter when
we use coolant B. From the effects table above, we see that
machines 2 and 5 produce smaller pin diameters than the other
machines. There is also a very slight coolant effect but the
machine effect is larger. Finally, there also appears to be slight
interaction effects. For instance, machines 1 and 2 had smaller
diameters with coolant A but the opposite was true for
machines 3, 4 and 5.
Calculate sums of squares and mean squares
We can calculate the values for the ANOVA table according to the formulae in the table on the crossed two-way page. This
gives the table below. From the F-values we see that the
machine effect is significant but the coolant and the
interaction are not.
Source            Sums of Squares   Degrees of Freedom   Mean Square   F-value
Machine           .000303           4                    .000076       8.8 > 2.61
Coolant           .00000392         1                    .00000392     .45 < 4.08
Interaction       .00001468         4                    .00000367     .42 < 2.61
Residual          .000346           40                   .0000087
Corrected Total   .000668           49
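The sweeps described on this page can be written compactly with NumPy; the sketch below (illustrative only) reproduces the sums of squares from the same coolant-by-machine data:

import numpy as np

# Diameters as a 2 x 5 x 5 array: coolant (A, B) x machine x replicate.
y = np.array([
    [[0.125, 0.127, 0.125, 0.126, 0.128],
     [0.118, 0.122, 0.120, 0.124, 0.119],
     [0.123, 0.125, 0.125, 0.124, 0.126],
     [0.126, 0.128, 0.126, 0.127, 0.129],
     [0.118, 0.129, 0.127, 0.120, 0.121]],
    [[0.124, 0.128, 0.127, 0.126, 0.129],
     [0.116, 0.125, 0.119, 0.125, 0.120],
     [0.122, 0.121, 0.124, 0.126, 0.125],
     [0.126, 0.129, 0.125, 0.130, 0.124],
     [0.125, 0.123, 0.114, 0.124, 0.117]],
])
I, J, K = y.shape   # I coolants, J machines, K replicates

grand = y.mean()
cell = y.mean(axis=2)                                  # cell means
row = cell.mean(axis=1) - grand                        # coolant effects
col = cell.mean(axis=0) - grand                        # machine effects
inter = cell - grand - row[:, None] - col[None, :]     # interaction effects
resid = y - cell[:, :, None]                           # residual overlay

ss = {"machine": I * K * np.sum(col**2),               # about 0.000303
      "coolant": J * K * np.sum(row**2),               # about 0.0000039
      "interaction": K * np.sum(inter**2),             # about 0.0000147
      "residual": np.sum(resid**2)}                    # about 0.000346
print(ss)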
3.2.3.3. Two-Way Nested ANOVA
Description Sometimes, constraints prevent us from crossing every level of one factor with every level of the
other factor. In these cases we are forced into what is known as a nested layout. We say we have
a nested layout when fewer than all levels of one factor occur within each level of the other
factor. An example of this might be if we want to study the effects of different machines and
different operators on some output characteristic, but we can't have the operators change the
machines they run. In this case, each operator is not crossed with each machine but rather only
runs one machine.
Model If Factor B is nested within Factor A, then a level of Factor B can only occur within one level of
Factor A and there can be no interaction. This gives the following model:

    y_ijk = m + a_i + b_j(i) + e_ijk
This equation indicates that each data value is the sum of a common value (grand mean), the
level effect for Factor A, the level effect of Factor B nested within Factor A, and the residual.
Estimation For a nested design we typically use variance components methods to perform the analysis. We
can sweep out the common value, the Factor A effects, the Factor B within A effects and the
residuals using value-splitting techniques. Sums of squares can be calculated and summarized in
an ANOVA table as shown below.
Click here for nested value-splitting example
It is important to note that with this type of layout, since each level of one factor is only present
with one level of the other factor, we can't estimate interaction between the two.
ANOVA
table for
nested case
The row labeled, "Corr. Total", in the ANOVA table contains the corrected total sum of squares
and the associated degrees of freedom (DoF).
As with the crossed layout, we can also use CLM techniques. We still have the problem that the
model is saturated and no unique solution exists. We overcome this problem by applying to the
model the constraints that the two main effects sum to zero.
Testing We are testing that two main effects are zero. Again we just form a ratio (F0) of each main effect
mean square to the appropriate mean-squared error term. (Note that the error term for Factor A is
not MSE, but is MSB.) If the assumptions stated below are true then those ratios follow an F
distribution and the test is performed by comparing the F0 ratios to values in an F table with the
appropriate degrees of freedom and confidence level.
Assumptions For estimation purposes, we assume the data can be adequately modeled by the model above and
that there is more than one variance component. It is assumed that the random component can be
modeled with a Gaussian distribution with fixed location and spread.
Uses The two-way nested ANOVA is useful when we are constrained from combining all the levels of
one factor with all of the levels of the other factor. These designs are most useful when we have
what is called a random effects situation. When the levels of a factor are chosen at random rather
than selected intentionally, we say we have a random effects model. An example of this is when
we select lots from a production run, then select units from the lot. Here the units are nested
within lots and the effect of each factor is random.
Example Let's change the two-way machining example slightly by assuming that we have five different
machines making the same part and each machine has two operators, one for the day shift and
one for the night shift. We take five samples from each machine for each operator to obtain the
following data:
Machine
Operator
Day
1 2 3 4 5
0.125 0.118 0.123 0.126 0.118
0.127 0.122 0.125 0.128 0.129
0.125 0.120 0.125 0.126 0.127
0.126 0.124 0.124 0.127 0.120
0.128 0.119 0.126 0.129 0.121
Operator
Night
0.124 0.116 0.122 0.126 0.125
0.128 0.125 0.121 0.129 0.123
0.127 0.119 0.124 0.125 0.114
0.126 0.125 0.126 0.130 0.124
0.129 0.120 0.125 0.124 0.117
Analyze For analysis details see the nested two-way value splitting example. We can summarize the
analysis results in an ANOVA table as follows:
Source              Sum of Squares   Deg. of Freedom   Mean Square   F0
Machine             3.03e-4          4                 7.58e-5       20.38
Operator(Machine)   1.86e-5          5                 3.72e-6       0.428
Residuals           3.46e-4          40                8.70e-6
Corrected Total     6.68e-4          49
Test By dividing the mean square for Machine by the mean square for Operator within Machine, or
Operator(Machine), we obtain an F0 value of 20.38 which is greater than the critical value of
5.19 for 4 and 5 degrees of freedom at the 0.05 significance level. The F0 value for
Operator(Machine), obtained by dividing its mean square by the residual mean square, is less than
the critical value of 2.45 for 5 and 40 degrees of freedom at the 0.05 significance level.
Conclusion From the ANOVA table we can conclude that the Machine is the most important factor and is
statistically significant. The effect of Operator nested within Machine is not statistically
significant. Again, any improvement activities should be focused on the tools.
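A minimal NumPy sketch of the nested computation (illustrative only; note that the F ratio for Machine uses the Operator(Machine) mean square as its denominator):

import numpy as np

# Diameters as a 5 x 2 x 5 array: machine x operator (day, night) x sample.
y = np.array([
    [[0.125, 0.127, 0.125, 0.126, 0.128], [0.124, 0.128, 0.127, 0.126, 0.129]],
    [[0.118, 0.122, 0.120, 0.124, 0.119], [0.116, 0.125, 0.119, 0.125, 0.120]],
    [[0.123, 0.125, 0.125, 0.124, 0.126], [0.122, 0.121, 0.124, 0.126, 0.125]],
    [[0.126, 0.128, 0.126, 0.127, 0.129], [0.126, 0.129, 0.125, 0.130, 0.124]],
    [[0.118, 0.129, 0.127, 0.120, 0.121], [0.125, 0.123, 0.114, 0.124, 0.117]],
])
I, J, K = y.shape

grand = y.mean()
machine_means = y.mean(axis=(1, 2))
operator_means = y.mean(axis=2)

ss_machine = J * K * np.sum((machine_means - grand) ** 2)                  # about 3.03e-4
ss_operator = K * np.sum((operator_means - machine_means[:, None]) ** 2)   # about 1.86e-5
ss_resid = np.sum((y - operator_means[:, :, None]) ** 2)                   # about 3.46e-4

ms_machine = ss_machine / (I - 1)
ms_operator = ss_operator / (I * (J - 1))
ms_resid = ss_resid / (I * J * (K - 1))
print("F machine           =", ms_machine / ms_operator)   # tested against MS Operator(Machine), about 20.4
print("F operator(machine) =", ms_operator / ms_resid)     # about 0.43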
3.2.3.3.1. Two-Way Nested Value-Splitting Example
Example:
Operator
is nested
within
machine.
The data table below contains data collected from five different lathes, each run by two
different operators. Note we are concerned here with the effect of operators, so the
layout is nested. If we were concerned with shift instead of operator, the layout would
be crossed. The measurement is the diameter of a turned pin.
Machine Operator
Sample
1 2 3 4 5
1
Day .125 .127 .125 .126 .128
Night .124 .128 .127 .126 .129
2
Day .118 .122 .120 .124 .119
Night .116 .125 .119 .125 .120
3
Day .123 .125 .125 .124 .126
Night .122 .121 .124 .126 .125
4
Day .126 .128 .126 .127 .129
Night .126 .129 .125 .130 .124
5
Day .118 .129 .127 .120 .121
Night .125 .123 .114 .124 .117
For the nested two-way case, just as in the crossed case, the first thing we need to do is
to sweep the cell means from the data table to obtain the residual values. We then
sweep the nested factor (Operator) and the top level factor (Machine) to obtain the
table below.
Machine   Operator   Common    Machine    Operator                      Sample
                               effect     effect       1        2        3        4        5
1         Day        .12404    .00246     -.0003   -.0012    .0008   -.0012   -.0002    .0018
1         Night                            .0003   -.0028    .0012    .0002   -.0008    .0022
2         Day                  -.00324    -.0002   -.0026    .0014   -.0006    .0034   -.0016
2         Night                            .0002   -.005     .004    -.002     .004    -.001
3         Day                   .00006     .0005   -.0016    .0004    .0004   -.0006    .0014
3         Night                           -.0005   -.0016   -.0026    .0004    .0024    .0014
4         Day                   .00296     .0002   -.0012    .0008   -.0012   -.0002    .0018
4         Night                           -.0002   -.0008    .0022   -.0018    .0032   -.0028
5         Day                  -.00224     .0012   -.005     .006     .004    -.003    -.002
5         Night                           -.0012    .0044    .0024   -.0066    .0034   -.0036
What
does this
table tell
us?
By looking at the residuals we see that machines 2 and 5 have the greatest variability.
There does not appear to be much of an operator effect but there is clearly a strong
machine effect.
Calculate
sums of
squares
and mean
squares
We can calculate the values for the ANOVA table according to the formulae in the
table on the nested two-way page. This produces the table below. From the F-values
we see that the machine effect is significant but the operator effect is not. (Here it is
assumed that both factors are fixed).
Source              Sums of Squares   Degrees of Freedom   Mean Square   F-value
Machine             .000303            4                   .0000758      8.77 > 2.61
Operator(Machine)   .0000186           5                   .00000372     .428 < 2.45
Residual            .000346           40                   .0000087
Corrected Total     .000668           49
3.2.4. Discrete Models
Description There are many instances when we are faced with the
analysis of discrete data rather than continuous data.
Examples of this are yield (good/bad), speed bins
(slow/fast/faster/fastest), survey results (favor/oppose), etc.
We then try to explain the discrete outcomes with some
combination of discrete and/or continuous explanatory
variables. In this situation the modeling techniques we have
learned so far (CLM and ANOVA) are no longer appropriate.
Contingency
table
analysis and
log-linear
model
There are two primary methods available for the analysis of
discrete response data. The first one applies to situations in
which we have discrete explanatory variables and discrete
responses and is known as Contingency Table Analysis. The
model for this is covered in detail in this section. The second
model applies when we have both discrete and continuous
explanatory variables and is referred to as a Log-Linear
Model. That model is beyond the scope of this Handbook,
but interested readers should refer to the reference section of
this chapter for a list of useful books on the topic.
Model Suppose we have n individuals that we classify according to
two criteria, A and B. Suppose there are r levels of criterion
A and s levels of criterion B. These responses can be
displayed in an r x s table. For example, suppose we have a
box of manufactured parts that we classify as good or bad
and whether they came from supplier 1, 2 or 3.
Now, each cell of this table will have a count of the
individuals who fall into its particular combination of
classification levels. Let's call this count N_ij. The sum of all
of these counts will be equal to the total number of
individuals, N. Also, each row of the table will sum to N_i.
and each column will sum to N_.j.
Under the assumption that there is no interaction between the
two classifying variables (e.g., the number of good or bad
parts does not depend on which supplier they came from),
we can calculate the counts we would expect to see in each
cell. Let's call the expected count for any cell E_ij. Then the
expected value for a cell is E_ij = N_i. * N_.j / N. All we need to
do then is to compare the expected counts to the observed
counts. If there is a considerable difference between the
observed counts and the expected values, then the two
variables interact in some way.
Estimation The estimation is very simple. All we do is make a table of
the observed counts and then calculate the expected counts as
described above.
Testing The test is performed using a Chi-Square goodness-of-fit
test according to the following formula:
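In the notation defined above, the statistic takes the usual Pearson form (restated here in standard notation):

$$\chi^2 \;=\; \sum_{i=1}^{r}\sum_{j=1}^{s}\frac{(N_{ij}-E_{ij})^2}{E_{ij}}$$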
where the summation is across all of the cells in the table.
Given the assumptions stated below, this statistic has
approximately a chi-square distribution and is therefore
compared against a chi-square table with (r-1)(s-1) degrees
of freedom, with r and s as previously defined. If the value
of the test statistic is less than the chi-square value for a
given level of confidence, then the classifying variables are
declared independent, otherwise they are judged to be
dependent.
Assumptions The estimation and testing results above hold regardless of
whether the sample model is Poisson, multinomial, or
product-multinomial. The chi-square results start to break
down if the counts in any cell are small, say < 5.
Uses The contingency table method is really just a test of
interaction between discrete explanatory variables for
discrete responses. The example given below is for two
factors. The methods are equally applicable to more factors,
but as with any interaction, as you add more factors the
interpretation of the results becomes more difficult.
Example Suppose we are comparing the yield from two manufacturing
processes. We want to know if one process has a higher yield.
Make table
of counts
Good Bad Totals
Process A 86 14 100
Process B 80 20 100
Totals 166 34 200
Table 1. Yields for two production processes
We obtain the expected values by the formula given above.
This gives the table below.
Calculate
expected
counts
Good Bad Totals
Process A 83 17 100
Process B 83 17 100
Totals 166 34 200
Table 2. Expected values for two production processes
Calculate
chi-square
statistic and
compare to
table value
The chi-square statistic is 1.276. This is below the chi-square
value of 2.71 for 1 degree of freedom and 90% confidence.
Therefore, we conclude that there is not a (significant)
difference in process yield.
Conclusion Therefore, we conclude that there is no statistically
significant difference between the two processes.
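As a cross-check, a few lines of Python (an illustrative sketch assuming SciPy is available; not the Handbook's code) reproduce the expected counts and the chi-square statistic for the yield table.

import numpy as np
from scipy.stats import chi2_contingency, chi2

observed = np.array([[86, 14],     # Process A: good, bad
                     [80, 20]])    # Process B: good, bad

stat, p_value, dof, expected = chi2_contingency(observed, correction=False)
print(expected)                    # [[83. 17.], [83. 17.]], as in Table 2
print(round(stat, 3), dof)         # 1.276 with 1 degree of freedom
print(chi2.ppf(0.90, dof))         # about 2.71, the 90% cutoff used above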
3.3. Data Collection for PPC
Start with
careful
planning
The data collection process for PPC starts with careful
planning. The planning consists of the definition of clear and
concise goals, developing process models and devising a
sampling plan.
Many
things can
go wrong
in the data
collection
This activity of course ends with the actual collection of
the data, which is usually not as straightforward as it might
appear. Many things can go wrong in the execution of the
sampling plan. The problems can be mitigated with the use of
checklists and by carefully documenting all exceptions to the
original sampling plan.
Table of
Contents
1. Set Goals
2. Modeling Processes
1. Black-Box Models
2. Fishbone Diagrams
3. Relationships and Sensitivities
3. Define the Sampling Plan
1. Identify the parameters, ranges and
resolution
2. Design sampling scheme
3. Select sample sizes
4. Design data storage formats
5. Assign roles and responsibilities
3.3.1. Define Goals
State concise
goals
The goal statement is one of the most important parts of the
characterization plan. With clearly and concisely stated
goals, the rest of the planning process falls naturally into
place.
Goals
usually
defined in
terms of key
specifications
The goals are usually defined in terms of key specifications
or manufacturing indices. We typically want to characterize
a process and compare the results against these
specifications. However, this is not always the case. We
may, for instance, just want to quantify key process
parameters and use our estimates of those parameters in
some other activity like controller design or process
improvement.
Example
goal
statements
Click on each of the links below to see Goal Statements for
each of the case studies.
1. Furnace Case Study (Goal)
2. Machine Case Study (Goal)
3.3.2. Process Modeling
Identify
influential
parameters
Process modeling begins by identifying all of the important
factors and responses. This is usually best done as a team
effort and is limited to the scope set by the goal statement.
Document
with black-
box
models
This activity is best documented in the form of a black-box
model as seen in the figure below. In this figure all of the
outputs are shown on the right and all of the controllable
inputs are shown on the left. Any inputs or factors that may be
observable but not controllable are shown on the top or
bottom.
Model
relationships
using
fishbone
diagrams
The next step is to model relationships of the previously
identified factors and responses. In this step we choose a
parameter and identify all of the other parameters that may
have an influence on it. This process is easily documented
with fishbone diagrams as illustrated in the figure below.
The influenced parameter is put on the center line and the
influential factors are listed off of the centerline and can be
grouped into major categories like Tool, Material, Work
Methods and Environment.
Document
relationships
and
sensitivities
The final step is to document all known information about
the relationships and sensitivities between the inputs and
outputs. Some of the inputs may be correlated with each
other as well as with the outputs. There may be detailed
mathematical models available from other studies, or the
available information may be vague; for a machining
process, for example, we may know only that as the feed rate
increases, the quality of the finish decreases.
It is best to document this kind of information in a table
with all of the inputs and outputs listed both on the left
column and on the top row. Then, correlation information
can be filled in for each of the appropriate cells. See the case
studies for an example.
Examples Click on each of the links below to see the process models
for each of the case studies.
1. Case Study 1 (Process Model)
2. Case Study 2 (Process Model)
3.3.3. Define Sampling Plan
Sampling
plan is
detailed
outline of
measurements
to be taken
A sampling plan is a detailed outline of which
measurements will be taken at what times, on which
material, in what manner, and by whom. Sampling plans
should be designed in such a way that the resulting data
will contain a representative sample of the parameters of
interest and allow for all questions, as stated in the goals, to
be answered.
Steps in the
sampling plan
The steps involved in developing a sampling plan are:
1. identify the parameters to be measured, the range of
possible values, and the required resolution
2. design a sampling scheme that details how and when
samples will be taken
3. select sample sizes
4. design data storage formats
5. assign roles and responsibilities
Verify and
execute
Once the sampling plan has been developed, it can be
verified and then passed on to the responsible parties for
execution.
3.3.3.1. Identifying Parameters, Ranges and Resolution
Our goals and the models we built in the previous steps
should provide all of the information needed for selecting
parameters and determining the expected ranges and the
required measurement resolution.
Goals will
tell us what
to measure
and how
The first step is to carefully examine the goals. This will tell
you which response variables need to be sampled and how.
For instance, if our goal states that we want to determine if
an oxide film can be grown on a wafer to within 10
Angstroms of the target value with a uniformity of <2%,
then we know we have to measure the film thickness on the
wafers to an accuracy of at least +/- 3 Angstroms and we
must measure at multiple sites on the wafer in order to
calculate uniformity.
The goals and the models we build will also indicate which
explanatory variables need to be sampled and how. Since
the fishbone diagrams define the known important
relationships, these will be our best guide as to which
explanatory variables are candidates for measurement.
Ranges help
screen
outliers
Defining the expected ranges of values is useful for
screening outliers. In the machining example, we would not
expect to see many values that vary more than +/- .005"
from nominal. Therefore, we know that any values much
beyond this interval are highly suspect and should be
remeasured.
Resolution
helps choose
measurement
equipment
Finally, the required resolution for the measurements should
be specified. This specification will help guide the choice of
metrology equipment and help define the measurement
procedures. As a rule of thumb, we would like our
measurement resolution to be at least 1/10 of our tolerance.
For the oxide growth example, this means that we want to
measure with an accuracy of 2 Angstroms. Similarly, for the
turning operation we would need to measure the diameter
within .001". This means that vernier calipers would be
adequate as the measurement device for this application.
Examples Click on each of the links below to see the parameter
descriptions for each of the case studies.
1. Case Study 1 (Sampling Plan)
2. Case Study 2 (Sampling Plan)
3.3.3.2. Choosing a Sampling Scheme
A sampling
scheme
defines what
data will be
obtained and
how
A sampling scheme is a detailed description of what data
will be obtained and how this will be done. In PPC we are
faced with two different situations for developing
sampling schemes. The first is when we are conducting a
controlled experiment. There are very efficient and exact
methods for developing sampling schemes for designed
experiments and the reader is referred to the Process
Improvement chapter for details.
Passive data
collection
The second situation is when we are conducting a passive
data collection (PDC) study to learn about the inherent
properties of a process. These types of studies are usually
for comparison purposes when we wish to compare
properties of processes against each other or against some
hypothesis. This is the situation that we will focus on here.
There are two
principles that
guide our
choice of
sampling
scheme
Once we have selected our response parameters, it would
seem to be a rather straightforward exercise to take some
measurements, calculate some statistics and draw
conclusions. There are, however, many things which can
go wrong along the way that can be avoided with careful
planning and knowing what to watch for. There are two
overriding principles that will guide the design of our
sampling scheme.
The first is
precision
The first principle is that of precision. If the sampling
scheme is properly laid out, the difference between our
estimate of some parameter of interest and its true value
will be due only to random variation. The size of this
random variation is measured by a quantity called
standard error. The magnitude of the standard error is
known as precision. The smaller the standard error, the
more precise are our estimates.
Precision of
an estimate
depends on
several factors
The precision of any estimate will depend on:
the inherent variability of the process
the measurement error
the number of independent replications (sample size)
the efficiency of the sampling scheme.
The second is
systematic
sampling error
(or
confounded
effects)
The second principle is the avoidance of systematic errors.
Systematic sampling error occurs when the levels of one
explanatory variable are the same as some other
unaccounted for explanatory variable. This is also referred
to as confounded effects. Systematic sampling error is best
seen by example.
Example 1: We want to compare the effect of
two different coolants on the resulting surface
finish from a turning operation. It is decided
to run one lot, change the coolant and then
run another lot. With this sampling scheme,
there is no way to distinguish the coolant
effect from the lot effect or from tool wear
considerations. There is systematic sampling
error in this sampling scheme.
Example 2: We wish to examine the effect of
two pre-clean procedures on the uniformity of
an oxide growth process. We clean one
cassette of wafers with one method and
another cassette with the other method. We
load one cassette in the front of the furnace
tube and the other cassette in the middle. To
complete the run, we fill the rest of the tube
with other lots. With this sampling scheme,
there is no way to distinguish between the
effect of the different pre-clean methods and
the cassette effect or the tube location effect.
Again, we have systematic sampling errors.
Stratification
helps to
overcome
systematic
error
The way to combat systematic sampling errors (and at the
same time increase precision) is through stratification and
randomization. Stratification is the process of segmenting
our population across levels of some factor so as to
minimize variability within those segments or strata. For
instance, if we want to try several different process recipes
to see which one is best, we may want to be sure to apply
each of the recipes to each of the three work shifts. This
will ensure that we eliminate any systematic errors caused
by a shift effect. This is where the ANOVA designs are
particularly useful.
Randomization
helps too
Randomization is the process of randomly applying the
various treatment combinations. In the above example, we
would not want to apply recipe 1, 2 and 3 in the same
order for each of the three shifts but would instead
randomize the order of the three recipes in each shift. This
will avoid any systematic errors caused by the order of the
recipes.
Examples The issues here are many and complicated. Click on each
of the links below to see the sampling schemes for each of
the case studies.
1. Case Study 1 (Sampling Plan)
2. Case Study 2 (Sampling Plan)
3.3.3.3. Selecting Sample Sizes
Consider
these things
when
selecting a
sample size
When choosing a sample size, we must consider the
following issues:
What population parameters we want to estimate
Cost of sampling (importance of information)
How much is already known
Spread (variability) of the population
Practicality: how hard is it to collect data
How precise we want the final estimates to be
Cost of
taking
samples
The cost of sampling issue helps us determine how precise
our estimates should be. As we will see below, when
choosing sample sizes we need to select risk values. If the
decisions we will make from the sampling activity are very
valuable, then we will want low risk values and hence
larger sample sizes.
Prior
information
If our process has been studied before, we can use that prior
information to reduce sample sizes. This can be done by
using prior mean and variance estimates and by stratifying
the population to reduce variation within groups.
Inherent
variability
We take samples to form estimates of some characteristic
of the population of interest. The variance of that estimate
is proportional to the inherent variability of the population
divided by the sample size, with θ denoting the parameter we are trying to estimate.
This means that if the variability of the population is large,
then we must take many samples. Conversely, a small
population variance means we don't have to take as many
samples.
Practicality Of course the sample size you select must make sense. This
is where the trade-offs usually occur. We want to take
enough observations to obtain reasonably precise estimates
of the parameters of interest but we also want to do this
within a practical resource budget. The important thing is to
quantify the risks associated with the chosen sample size.
Sample size
determination
In summary, the steps involved in estimating a sample size
are:
1. There must be a statement about what is expected of
the sample. We must determine what is it we are
trying to estimate, how precise we want the estimate
to be, and what are we going to do with the estimate
once we have it. This should easily be derived from
the goals.
2. We must find some equation that connects the desired
precision of the estimate with the sample size. This is
a probability statement. A couple are given below;
see your statistician if these are not appropriate for
your situation.
3. This equation may contain unknown properties of the
population such as the mean or variance. This is
where prior information can help.
4. If you are stratifying the population in order to reduce
variation, sample size determination must be
performed for each stratum.
5. The final sample size should be scrutinized for
practicality. If it is unacceptable, the only way to
reduce it is to accept less precision in the sample
estimate.
Sampling
proportions
When we are sampling proportions we start with a
probability statement about the desired precision. This is
given by:
where
p̂ is the estimated proportion
P is the unknown population parameter
δ is the specified precision of the estimate
α is the probability value (usually low)
This equation simply says that we want the probability that our
estimate p̂ misses the true value P by more than the desired precision δ
to be only α. Of course, we like to set α low, usually 0.1 or less.
Using some assumptions about the proportion being
approximately normally distributed we can obtain an
estimate of the required sample size as:
where z is the ordinate on the Normal curve corresponding
to α.
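One common way to write the probability statement and the resulting sample-size estimate, consistent with the definitions above (the exact form of the displayed equations is an assumption here), is

$$ P\left(\,|\hat{p}-P|\ge\delta\,\right)=\alpha, \qquad n \approx \frac{z^2\,\hat{p}\,(1-\hat{p})}{\delta^2}. $$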
Example Let's say we have a new process we want to try. We plan to
run the new process and sample the output for yield
(good/bad). Our current process has been yielding 65%
(p=.65, q=.35). We decide that we want the estimate of the
new process yield to be accurate to within δ = .10 at 95%
confidence (α = .05).
        Estimate   Standard Error
φ1      -0.3198    0.1202
φ2       0.1797    0.1202
δ = 51.1286
Residual standard deviation = 10.9599
Test randomness of residuals:
Standardized Runs Statistic Z = 0.4887, p-value =
0.625
Forecasting Using our AR(2) model, we forecast values six time periods
into the future.
Period Prediction Standard Error
71 60.6405 10.9479
72 43.0317 11.4941
73 55.4274 11.9015
74 48.2987 12.0108
75 52.8061 12.0585
76 50.0835 12.0751
The "historical" data and forecasted values (with 90 %
confidence limits) are shown in the graph below.
6.4.4.10. Box-Jenkins Analysis on Seasonal Data
Series G This example illustrates a Box-Jenkins time series analysis
for seasonal data using the series G data set in Box, Jenkins,
and Reinsel, 1994. A plot of the 144 observations is shown
below.
Non-constant variance can be removed by performing a
natural log transformation.
Next, we remove trend in the series by taking first
differences. The resulting series is shown below.
Analyzing
Autocorrelation
Plot for
Seasonality
To identify an appropriate model, we plot the ACF of the
time series.
If very large autocorrelations are observed at lags spaced n
periods apart, for example at lags 12 and 24, then there is
evidence of periodicity. That effect should be removed, since
the objective of the identification stage is to reduce the
autocorrelation throughout. So if simple differencing is not
enough, try seasonal differencing at a selected period, such
as 4, 6, or 12. In our example, the seasonal period is 12.
A plot of Series G after taking the natural log, first
differencing, and seasonal differencing is shown below.
The number of seasonal terms is rarely more than one. If you
know the shape of your forecast function, or you wish to
assign a particular shape to the forecast function, you can
select the appropriate number of terms for seasonal AR or
seasonal MA models.
The book by Box and Jenkins, Time Series Analysis
Forecasting and Control (the later edition is Box, Jenkins
and Reinsel, 1994) has a discussion on these forecast
functions on pages 326 - 328. Again, if you have only a faint
notion, but you do know that there was a trend upwards
before differencing, pick a seasonal MA term and see what
comes out in the diagnostics.
An ACF plot of the seasonal and first differenced natural log
of series G is shown below.
The plot has a few spikes, but most autocorrelations are near
zero, indicating that a seasonal MA(1) model is appropriate.
Model Fitting We fit an MA(1) model to the data.
The model fitting results are shown below.
Estimate          MA(1)     Seasonal MA(1)
--------          -------   --------------
Parameter         -0.4018   -0.5569
Standard Error     0.0896    0.0731
Residual standard deviation = 0.0367
Log likelihood = 244.7
AIC = -483.4
Test the randomness of the residuals up to 30 lags using the
Box-Ljung test. Recall that the degrees of freedom for the
critical region must be adjusted to account for two estimated
parameters.
H0: The residuals are random.
Ha: The residuals are not random.
Test statistic: Q = 29.4935
Significance level: α = 0.05
Degrees of freedom: h = 30 - 2 = 28
Critical value: χ²(1-α, h) = 41.3371
Critical region: Reject H0 if Q > 41.3371
Since the null hypothesis of the Box-Ljung test is not
rejected we conclude that the fitted model is adequate.
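A minimal sketch of this fit in Python is given below (it assumes the statsmodels package and that the 144 Series G observations are passed in by the caller; it is not the Handbook's own code). The log transform and the first and seasonal differences are folded into an ARIMA(0,1,1)x(0,1,1)12 specification, and the residuals are checked with a Ljung-Box test as above.

import numpy as np
from statsmodels.tsa.statespace.sarimax import SARIMAX
from statsmodels.stats.diagnostic import acorr_ljungbox

def fit_seasonal_ma(g):
    """g: array-like holding the 144 monthly Series G observations."""
    fit = SARIMAX(np.log(g), order=(0, 1, 1),
                  seasonal_order=(0, 1, 1, 12)).fit(disp=False)
    print(fit.summary())                                      # MA(1) and seasonal MA(1) estimates
    print(acorr_ljungbox(fit.resid, lags=[30], model_df=2))   # Box-Ljung check on residuals
    fc = fit.get_forecast(steps=12)
    # Back-transform the log-scale forecasts and 90% limits to the original scale.
    return np.exp(fc.predicted_mean), np.exp(fc.conf_int(alpha=0.10))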
Forecasting Using our seasonal MA(1) model, we forecast values 12
periods into the future and compute 90 % confidence limits.
Lower Upper
Period Limit Forecast Limit
------ -------- -------- --------
145 424.0234 450.7261 478.4649
146 396.7861 426.0042 456.7577
147 442.5731 479.3298 518.4399
148 451.3902 492.7365 537.1454
149 463.3034 509.3982 559.3245
150 527.3754 583.7383 645.2544
151 601.9371 670.4625 745.7830
152 595.7602 667.5274 746.9323
153 495.7137 558.5657 628.5389
154 439.1900 497.5430 562.8899
155 377.7598 430.1618 489.1730
156 417.3149 477.5643 545.7760
6.4.5. Multivariate Time Series Models
If each time
series
observation
is a vector
of numbers,
you can
model them
using a
multivariate
form of the
Box-Jenkins
model
The multivariate form of the Box-Jenkins univariate models
is sometimes called the ARMAV model, for AutoRegressive
Moving Average Vector or simply vector ARMA process.
The ARMAV model for a stationary multivariate time series,
with a zero mean vector, represented by x_t, is of the form given below,
where
x_t and a_t are n x 1 column vectors, with a_t representing multivariate white noise
Φ_1, ..., Φ_p and Θ_1, ..., Θ_q are n x n matrices of autoregressive and moving average parameters
E[a_t] = 0 and E[a_t a_t'] = Σ_a, where Σ_a is the dispersion or covariance matrix of a_t
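With these definitions the model can be written in the conventional vector-ARMA notation (the Φ and Θ labels are the standard ones, supplied here rather than quoted from the displayed equation):

$$ x_t = \Phi_1 x_{t-1} + \Phi_2 x_{t-2} + \cdots + \Phi_p x_{t-p} + a_t - \Theta_1 a_{t-1} - \cdots - \Theta_q a_{t-q} $$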
As an example, for a bivariate series with n = 2, p = 2, and q
= 1, the ARMAV(2,1) model is:
with
Estimation
of
parameters
and
covariance
matrix
difficult
The estimation of the matrix parameters and covariance
matrix is complicated and very difficult without computer
software. The estimation of the Moving Average matrices is
especially an ordeal. If we opt to ignore the MA
component(s) we are left with the ARV model given by:
where
x_t is a vector of observations, x_1t, x_2t, ..., x_nt, at time t
a_t is a vector of white noise, a_1t, a_2t, ..., a_nt, at time t
Φ is an n x n matrix of autoregressive parameters
E[a_t] = 0 and E[a_t a_t'] = Σ_a, where Σ_a is the dispersion or covariance matrix
A model with p autoregressive matrix parameters is an
ARV(p) model or a vector AR model.
The parameter matrices may be estimated by multivariate
least squares, but there are other methods such as maximum
likelihood estimation.
Interesting
properties
of
parameter
matrices
There are a few interesting properties associated with the phi
or AR parameter matrices. Consider the following example
for a bivariate series with n =2, p = 2, and q = 0. The
ARMAV(2,0) model is:
Without loss of generality, assume that the X series is input and the Y series
is output and that the mean vector μ = (0,0).
Therefore, transform the observations by subtracting their respective averages.
Diagonal
terms of
Phi matrix
The diagonal terms of each Phi matrix are the scalar estimates for each
series, in this case:
φ1.11, φ2.11 for the input series X
φ1.22, φ2.22 for the output series Y.
Transfer
mechanism
The lower off-diagonal elements represent the influence of the input on the
output.
This is called the "transfer" mechanism or transfer-function model as
discussed by Box and Jenkins in Chapter 11. The terms here correspond to
their terms.
The upper off-diagonal terms represent the influence of the output on the
input.
Feedback This is called "feedback". The presence of feedback can also be seen as a
high value for a coefficient in the correlation matrix of the residuals. A "true"
transfer model exists when there is no feedback.
This can be seen by expressing the matrix form into scalar form:
Delay Finally, delay or "dead' time can be measured by studying the lower off-
diagonal elements again.
If, for example,
1.21
is non-significant, the delay is 1 time period.
6.4.5.1. Example of Multivariate Time Series Analysis
Bivariate Gas Furnace Example
The gas furnace data from Box, Jenkins, and Reinsel, 1994 is used to
illustrate the analysis of a bivariate time series. Inside the gas furnace, air and
methane were combined in order to obtain a mixture of gases containing
CO2 (carbon dioxide). The input series is the methane gas feedrate described by
Methane Gas Input Feed = 0.60 - 0.04 X(t)
The CO2 concentration was the output series, Y(t). In this experiment 296
successive pairs of observations (Xt, Yt) were collected from continuous
records at 9-second intervals. For the analysis described here, only the first
60 pairs were used. We fit an ARV(2) model as described in 6.4.5.
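A rough sketch of this fit in Python is shown below (it assumes the statsmodels package and that the caller supplies the first 60 gas-rate/CO2 pairs as a 60 x 2 NumPy array; it is not the Handbook's own code).

import numpy as np
from statsmodels.tsa.api import VAR

def fit_arv2(data, steps=6):
    """data: 60 x 2 array, columns = (gas rate X, CO2 concentration Y)."""
    centered = data - data.mean(axis=0)        # work with deviations from the means
    results = VAR(centered).fit(2)             # two autoregressive matrices: ARV(2)
    print(results.summary())                   # per-equation estimates, as tabulated below
    # Forecast the next 6 observations with 90% intervals, then add the means back.
    point, lower, upper = results.forecast_interval(centered[-2:], steps, alpha=0.10)
    m = data.mean(axis=0)
    return point + m, lower + m, upper + m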
Plots of
input and
output
series
The plots of the input and output series are displayed below.
Model
Fitting
The scalar form of the ARV(2) model is the following.
The equation for x_t corresponds to gas rate while the equation for y_t
corresponds to CO2 concentration.
The parameter estimates for the equation associated with gas rate are the
following.
Estimate Std. Err. t value Pr(>|t|)
a
1t
0.003063 0.035769 0.086 0.932
1.11
1.683225 0.123128 13.671 < 2e-16
2.11
-0.860205 0.165886 -5.186 3.44e-06
1.12
-0.076224 0.096947 -0.786 0.435
2.12
0.044774 0.082285 0.544 0.589
Residual standard error: 0.2654 based on 53 degrees of freedom
Multiple R-Squared: 0.9387
Adjusted R-squared: 0.9341
F-statistic: 203.1 based on 4 and 53 degrees of freedom
p-value: < 2.2e-16
The parameter estimates for the equation associated with CO2 concentration
are the following.
          Estimate   Std. Err.   t value   Pr(>|t|)
a2t       -0.03372   0.01615     -2.088    0.041641
φ1.22      1.22630   0.04378     28.013    < 2e-16
φ2.22     -0.40927   0.03716    -11.015    2.57e-15
φ1.21      0.22898   0.05560      4.118    0.000134
φ2.21     -0.80532   0.07491    -10.751    6.29e-15
Residual standard error: 0.1198 based on 53 degrees of freedom
Multiple R-Squared: 0.9985
Adjusted R-squared: 0.9984
F-statistic: 8978 based on 4 and 53 degrees of freedom
p-value: < 2.2e-16
Box-Ljung tests performed for each series to test the randomness of the first
24 residuals were not significant. The p-values for the tests using CO2
concentration residuals and gas rate residuals were 0.4 and 0.6, respectively.
Forecasting The forecasting method is an extension of the model and follows the theory
outlined in the previous section. The forecasted values of the next six
observations (61-66) and the associated 90 % confidence limits are shown
below for each series.
90% Lower Concentration 90% Upper
Observation Limit Forecast Limit
----------- --------- -------- ---------
61 51.0 51.2 51.4
62 51.0 51.3 51.6
63 50.6 51.0 51.4
64 49.8 50.5 51.1
65 48.7 50.0 51.3
66 47.6 49.7 51.8
90% Lower Rate 90% Upper
Observation Limit Forecast Limit
----------- --------- -------- ---------
61 0.795 1.231 1.668
62 0.439 1.295 2.150
63 0.032 1.242 2.452
64 -0.332 1.128 2.588
65 -0.605 1.005 2.614
66 -0.776 0.908 2.593
6.5. Tutorials
Tutorial
contents
1. What do we mean by "Normal" data?
2. What do we do when data are "Non-normal"?
3. Elements of Matrix Algebra
1. Numerical Examples
2. Determinant and Eigenstructure
4. Elements of Multivariate Analysis
1. Mean vector and Covariance Matrix
2. The Multivariate Normal Distribution
3. Hotelling's T²
1. Example of Hotelling's T² Test
2. Example 1 (continued)
3. Example 2 (multiple groups)
4. Hotelling's T² Chart
5. Principal Components
1. Properties of Principal Components
2. Numerical Example
6.5.1. What do we mean by "Normal" data?
http://www.itl.nist.gov/div898/handbook/pmc/section5/pmc51.htm[6/27/2012 2:36:46 PM]
6. Process or Product Monitoring and Control
6.5. Tutorials
6.5.1. What do we mean by "Normal" data?
The Normal
distribution
model
"Normal" data are data that are drawn (come from) a
population that has a normal distribution. This distribution is
inarguably the most important and the most frequently used
distribution in both the theory and application of statistics. If
X is a normal random variable, then the probability
distribution of X is
Normal
probability
distribution
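A standard way to write this density (supplied here; the parameters μ and σ are defined just below) is

$$ f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(x-\mu)^2/(2\sigma^2)}, \qquad -\infty < x < \infty. $$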
Parameters of normal distribution
The parameters of the normal distribution are the mean μ and
the standard deviation σ (or the variance σ²). A special
notation is employed to indicate that X is normally distributed
with these parameters, namely
X ~ N(μ, σ) or X ~ N(μ, σ²).
Shape is
symmetric
and unimodal
The shape of the normal distribution is symmetric and
unimodal. It is called the bell-shaped or Gaussian
distribution after its inventor, Gauss (although De Moivre
also deserves credit).
The visual appearance is given below.
6.5.1. What do we mean by "Normal" data?
http://www.itl.nist.gov/div898/handbook/pmc/section5/pmc51.htm[6/27/2012 2:36:46 PM]
Property of
probability
distributions
is that area
under curve
equals one
A property of a special class of non-negative functions,
called probability distributions, is that the area under the
curve equals unity. One finds the area under any portion of
the curve by integrating the distribution between the specified
limits. The area under the bell-shaped curve of the normal
distribution can be shown to be equal to 1, and therefore the
normal distribution is a probability distribution.
Interpretation of σ
There is a simple interpretation of σ:
68.27% of the population fall between μ +/- 1σ
95.45% of the population fall between μ +/- 2σ
99.73% of the population fall between μ +/- 3σ
The
cumulative
normal
distribution
The cumulative normal distribution is defined as the
probability that the normal variate is less than or equal to
some value v. Unfortunately this integral cannot be evaluated in closed
form and one has to resort to numerical methods. But even
so, tables for all possible values of μ and σ would be
required. A change of variables rescues the situation. We let
z = (v - μ)/σ. Now the evaluation can be made independently of μ and σ;
that is,
P(X ≤ v) = Φ((v - μ)/σ)
where Φ(·) is the cumulative distribution function of the
standard normal distribution (μ = 0, σ = 1).
Tables for the cumulative standard normal distribution
Tables of the cumulative standard normal distribution are
given in every statistics textbook and in the handbook. A rich
variety of approximations can be found in the literature on
numerical methods.
For example, if μ = 0 and σ = 1 then the area under the curve
from μ - 1σ to μ + 1σ is the area from 0 - 1 to 0 + 1, which
is 0.6827. Since most standard normal tables give area to the
left of the lookup value, they will have for z = 1 an area of
.8413 and for z = -1 an area of .1587. By subtraction we
obtain the area between -1 and +1 to be .8413 - .1587 =
.6826.
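A quick numeric check of these areas (using SciPy, which the Handbook does not require) is shown below.

from scipy.stats import norm

print(norm.cdf(1))                 # 0.8413...
print(norm.cdf(-1))                # 0.1587...
print(norm.cdf(1) - norm.cdf(-1))  # 0.6827, the area within +/- 1 sigma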
6.5.2. What to do when data are non-normal
Often it is
possible to
transform non-
normal data
into
approximately
normal data
Non-normality is a way of life, since no characteristic (height,
weight, etc.) will have exactly a normal distribution. One
strategy to make non-normal data resemble normal data is by
using a transformation. There is no dearth of transformations in
statistics; the issue is which one to select for the situation at
hand. Unfortunately, the choice of the "best" transformation is
generally not obvious.
This was recognized in 1964 by G.E.P. Box and D.R. Cox. They
wrote a paper in which a useful family of power transformations
was suggested. These transformations are defined only for
positive data values. This should not pose any problem because
a constant can always be added if the set of observations
contains one or more negative values.
The Box-Cox Transformation
The Box-Cox power transformations are given by
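the standard form of the Box-Cox family (restated here):

$$ x(\lambda) = \begin{cases} \dfrac{x^{\lambda}-1}{\lambda}, & \lambda \neq 0 \\[4pt] \ln x, & \lambda = 0 \end{cases} $$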
Given the vector of data observations x = x1, x2, ..., xn, one way
to select the power λ is to use the λ that maximizes the
logarithm of the likelihood function
The logarithm of the likelihood function
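A common way to write this (profile) log-likelihood, consistent with the discussion here, is

$$ f(x,\lambda) = -\frac{n}{2}\,\ln\!\left[\frac{1}{n}\sum_{i=1}^{n}\bigl(x_i(\lambda)-\bar{x}(\lambda)\bigr)^2\right] + (\lambda-1)\sum_{i=1}^{n}\ln x_i $$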
where x̄(λ) is the arithmetic mean of the transformed data.
Confidence bound for λ
In addition, a confidence bound (based on the likelihood ratio
statistic) can be constructed for λ as follows: A set of λ values
that represent an approximate 100(1-α)% confidence bound for
λ is formed from those λ that satisfy
where λ̂ denotes the maximum likelihood estimator for λ and
χ²(1-α, 1) is the 100(1-α) percentile of the chi-square distribution
with 1 degree of freedom.
Example of the
Box-Cox
scheme
To illustrate the procedure, we used the data from Johnson and
Wichern's textbook (Prentice Hall 1988), Example 4.14. The
observations are microwave radiation measurements.
Sample data
.15 .09 .18 .10 .05 .12 .08
.05 .08 .10 .07 .02 .01 .10
.10 .10 .02 .10 .01 .40 .10
.05 .03 .05 .15 .10 .15 .09
.08 .18 .10 .20 .11 .30 .02
.20 .20 .30 .30 .40 .30 .05
Table of log-likelihood values for various values of λ
The values of the log-likelihood function obtained by varying λ
from -2.0 to 2.0 are given below.
 λ      LLF        λ      LLF         λ      LLF
-2.0    7.1146    -0.6    89.0587    0.7    103.0322
-1.9    14.1877   -0.5    92.7855    0.8    101.3254
-1.8    21.1356   -0.4    96.0974    0.9     99.3403
-1.7    27.9468   -0.3    98.9722    1.0     97.1030
-1.6    34.6082   -0.2   101.3923    1.1     94.6372
-1.5    41.1054   -0.1   103.3457    1.2     91.9643
-1.4    47.4229    0.0   104.8276    1.3     89.1034
-1.3    53.5432    0.1   105.8406    1.4     86.0714
-1.2    59.4474    0.2   106.3947    1.5     82.8832
-1.1    65.1147    0.3   106.5069    1.6     79.5521
-0.9    75.6471    0.4   106.1994    1.7     76.0896
-0.8    80.4625    0.5   105.4985    1.8     72.5061
-0.7    84.9421    0.6   104.4330    1.9     68.8106
This table shows that λ = 0.3 maximizes the log-likelihood
function (LLF). This becomes λ = 0.28 if a second digit of accuracy
is calculated.
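A short sketch (SciPy assumed; not the Handbook's code) reproduces this maximization for the 42 microwave readings.

import numpy as np
from scipy.stats import boxcox, boxcox_llf

x = np.array([.15, .09, .18, .10, .05, .12, .08,
              .05, .08, .10, .07, .02, .01, .10,
              .10, .10, .02, .10, .01, .40, .10,
              .05, .03, .05, .15, .10, .15, .09,
              .08, .18, .10, .20, .11, .30, .02,
              .20, .20, .30, .30, .40, .30, .05])

for lam in (-2.0, 0.0, 0.3, 1.0, 2.0):
    print(lam, round(boxcox_llf(lam, x), 4))   # compare with the table above

_, lam_hat = boxcox(x)                          # maximum-likelihood lambda
print(round(lam_hat, 2))                        # about 0.28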
The Box-Cox transform is also discussed in Chapter 1 under the
Box Cox Linearity Plot and the Box Cox Normality Plot. The
Box-Cox normality plot discussion provides a graphical method
for choosing λ to transform a data set to normality. The criterion
used to choose λ for the Box-Cox linearity plot is the value of λ
that maximizes the correlation between the transformed x-values
and the y-values when making a normal probability plot of the
(transformed) data.
6.5.3. Elements of Matrix Algebra
Elementary Matrix Algebra
Basic
definitions
and
operations of
matrix
algebra -
needed for
multivariate
analysis
Vectors and matrices are arrays of numbers. The algebra
for symbolic operations on them is different from the
algebra for operations on scalars, or single numbers. For
example there is no division in matrix algebra, although
there is an operation called "multiplying by an inverse". It
is possible to express the exact equivalent of matrix algebra
equations in terms of scalar algebra expressions, but the
results look rather messy.
It can be said that the matrix algebra notation is shorthand
for the corresponding scalar longhand.
Vectors A vector is a column of numbers
The scalars a_i are the elements of vector a.
Transpose The transpose of a, denoted by a', is the row arrangement
of the elements of a.
Sum of two
vectors
The sum of two vectors (say, a and b) is the vector of sums
of corresponding elements.
The difference of two vectors is the vector of differences of
corresponding elements.
Product of
a'b
The product a'b is a scalar formed by summing the products of
corresponding elements, which may be written in shortcut notation as
a'b = Σ a_i b_i
where a_i and b_i are the ith elements of vectors a and b,
respectively.
Product of
ab'
The product ab' is a square matrix
Product of
scalar times a
vector
The product of a scalar k, times a vector a is k times each
element of a
A matrix is a
rectangular
table of
numbers
A matrix is a rectangular table of numbers, with p rows and
n columns. It is also referred to as an array of n column
vectors of length p. Thus
is a p by n matrix. The typical element of A is a_ij, denoting
the element of row i and column j.
Matrix addition and subtraction
Matrices are added and subtracted on an element-by-element basis. Thus
Matrix
multiplication
Matrix multiplication involves the computation of the sum
of the products of elements from a row of the first matrix
(the premultiplier on the left) and a column of the second
matrix (the postmultiplier on the right). This sum of
products is computed for every combination of rows and
columns. For example, if A is a 2 x 3 matrix and B is a 3 x
2 matrix, the product AB is
Thus, the product is a 2 x 2 matrix. This came about as
follows: The number of columns of A must be equal to the
number of rows of B. In this case this is 3. If they are not
equal, multiplication is impossible. If they are equal, then
the number of rows of the product AB is equal to the
number of rows of A and the number of columns is equal to
the number of columns of B.
Example of
3x2 matrix
multiplied by
a 2x3
It follows that the result of the product BA is a 3 x 3 matrix
General case
for matrix
multiplication
In general, if A is a k x p matrix and B is a p x n matrix, the
product AB is a k x n matrix. If k = n, then the product BA
can also be formed. We say that matrices conform for the
operations of addition, subtraction or multiplication when
their respective orders (numbers of row and columns) are
such as to permit the operations. Matrices that do not
conform for addition or subtraction cannot be added or
subtracted. Matrices that do not conform for multiplication
cannot be multiplied.
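A small numeric illustration of these conformability rules (NumPy assumed; the matrices below are arbitrary examples, not taken from the Handbook):

import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6]])          # 2 x 3
B = np.array([[1, 0],
              [0, 1],
              [1, 1]])             # 3 x 2

print(A @ B)                        # the 2 x 2 product AB
print(B @ A)                        # the 3 x 3 product BA
print((A @ B).shape, (B @ A).shape)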
6.5.3.1. Numerical Examples
Numerical
examples of
matrix
operations
Numerical examples of the matrix operations described on
the previous page are given here to clarify these operations.
Sample
matrices
If
then
Matrix addition, subtraction, and multiplication
Multiply matrix by a scalar
To multiply a matrix by a given scalar, each element of
the matrix is multiplied by that scalar.
Pre-
multiplying
matrix by
transpose of
a vector
Pre-multiplying a p x n matrix by the transpose of a p-element
vector yields the transpose of an n-element vector (a row vector)
Post-
multiplying
matrix by
vector
Post-multiplying a p x n matrix by an n-element vector
yields a p-element vector
Quadratic
form
It is not possible to pre-multiply a matrix by a column
vector, nor to post-multiply a matrix by a row vector. The
matrix product a'Ba yields a scalar and is called a quadratic
form. Note that B must be a square matrix if a'Ba is to
conform to multiplication. Here is an example of a quadratic
form
Inverting a
matrix
The matrix analog of division involves an operation called
inverting a matrix. Only square matrices can be inverted.
Inversion is a tedious numerical procedure and it is best
performed by computers. There are many ways to invert a
matrix, but ultimately whichever method is selected by a
program is immaterial. If you wish to try one method by
hand, a very popular numerical method is the Gauss-Jordan
method.
Identity matrix
To augment the notion of the inverse of a matrix, A^-1 (A
inverse), we notice the following relation
A^-1 A = A A^-1 = I
I is a matrix of form
I is called the identity matrix and is a special case of a
diagonal matrix. Any matrix that has zeros in all of the off-
diagonal positions is a diagonal matrix.
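A quick numeric illustration of inversion and the identity relation (NumPy assumed; the matrix is an arbitrary example, not one from the Handbook):

import numpy as np

A = np.array([[4.0, 7.0],
              [2.0, 6.0]])
A_inv = np.linalg.inv(A)

print(A_inv)
print(A_inv @ A)    # approximately the 2 x 2 identity matrix
print(np.eye(2))    # the identity matrix I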
6.5.3.2. Determinant and Eigenstructure
A matrix
determinant is
difficult to
define but a
very useful
number
Unfortunately, not every square matrix has an inverse
(although most do). Associated with any square matrix is a
single number that represents a unique function of the
numbers in the matrix. This scalar function of a square
matrix is called the determinant. The determinant of a
matrix A is denoted by |A|. A formal definition for the
determinant of a square matrix A = (a_ij) is somewhat
beyond the scope of this Handbook. Consult any good
linear algebra textbook if you are interested in the
mathematical details.
Singular
matrix
As is the case of inversion of a square matrix, calculation
of the determinant is tedious and computer assistance is
needed for practical calculations. If the determinant of the
(square) matrix is exactly zero, the matrix is said to be
singular and it has no inverse.
Determinant
of variance-
covariance
matrix
Of great interest in statistics is the determinant of a square
symmetric matrix D whose diagonal elements are sample
variances and whose off-diagonal elements are sample
covariances. Symmetry means that the matrix and its
transpose are identical (i.e., A = A'). An example is
where s_1 and s_2 are sample standard deviations and r_ij is
the sample correlation.
D is the sample variance-covariance matrix for
observations of a multivariate vector of p elements. The
determinant of D, in this case, is sometimes called the
generalized variance.
Characteristic
equation
In addition to a determinant and possibly an inverse, every
square matrix has associated with it a characteristic
equation. The characteristic equation of a matrix is formed
by subtracting some particular value, usually denoted by
the Greek letter λ (lambda), from each diagonal element of
the matrix, such that the determinant of the resulting
matrix is equal to zero. For example, the characteristic
equation of a second order (2 x 2) matrix A may be
written as
Definition of
the
characteristic
equation for
2x2 matrix
Eigenvalues of
a matrix
For a matrix of order p, there may be as many as p
different values for λ that will satisfy the equation. These
different values are called the eigenvalues of the matrix.
Eigenvectors
of a matrix
Associated with each eigenvalue is a vector, v, called the
eigenvector. The eigenvector satisfies the equation
Av = λv
Eigenstructure
of a matrix
If the complete set of eigenvalues is arranged in the
diagonal positions of a diagonal matrix L, the following
relationship holds
AV = VL
This equation specifies the complete eigenstructure of A.
Eigenstructures and the associated theory figure heavily in
multivariate procedures and the numerical evaluation of L
and V is a central computing problem.
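A numeric sketch of these quantities (NumPy assumed; the matrix is an arbitrary example) computes the determinant, the eigenvalues, and the eigenvector matrix, and checks the relation AV = VL.

import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

print(np.linalg.det(A))              # determinant |A| = 3.0
eigvals, V = np.linalg.eig(A)        # eigenvalues and eigenvector matrix V
L = np.diag(eigvals)                 # eigenvalues on the diagonal of L

print(eigvals)                       # 3.0 and 1.0 for this matrix
print(np.allclose(A @ V, V @ L))     # True: AV = VL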
6.5.4. Elements of Multivariate Analysis
Multivariate
analysis
Multivariate analysis is a branch of statistics concerned
with the analysis of multiple measurements, made on one or
several samples of individuals. For example, we may wish
to measure length, width and weight of a product.
Multiple
measurement,
or
observation,
as row or
column
vector
A multiple measurement or observation may be expressed
as
x = [4 2 0.6]
referring to the physical properties of length, width and
weight, respectively. It is customary to denote multivariate
quantities with bold letters. The collection of measurements
on x is called a vector. In this case it is a row vector. We
could have written x as a column vector.
Matrix to
represent
more than
one multiple
measurement
If we take several such measurements, we record them in a
rectangular array of numbers. For example, the X matrix
below represents 5 observations, on each of three variables.
By convention, rows typically represent observations and columns represent variables
In this case the number of rows, (n = 5), is the number of
observations, and the number of columns, (p = 3), is the
number of variables that are measured. The rectangular
array is an assembly of n row vectors of length p. This array
is called a matrix, or, more specifically, a n by p matrix. Its
name is X. The names of matrices are usually written in
bold, uppercase letters, as in Section 6.5.3. We could just as
well have written X as a p (variables) by n (measurements)
matrix as follows:
Definition of
Transpose
A matrix with rows and columns exchanged in this manner
is called the transpose of the original matrix.
6.5.4.1. Mean Vector and Covariance Matrix
The first step in analyzing multivariate data is computing the
mean vector and the variance-covariance matrix.
Sample
data
matrix
Consider the following matrix:
The set of 5 observations, measuring 3 variables, can be
described by its mean vector and variance-covariance matrix.
The three variables, from left to right are length, width, and
height of a certain object, for example. Each row vector X_i is
another observation of the three variables (or components).
Definition
of mean
vector and
variance-
covariance
matrix
The mean vector consists of the means of each variable and
the variance-covariance matrix consists of the variances of the
variables along the main diagonal and the covariances between
each pair of variables in the other matrix positions.
The formula for computing the covariance of the variables X
and Y is
cov(X, Y) = Σ (X_i - X̄)(Y_i - Ȳ) / (n - 1)
with X̄ and Ȳ denoting the means of X and Y, respectively.
Mean
vector and
variance-
covariance
matrix for
sample
data
matrix
The results are:
where the mean vector contains the arithmetic averages of the
three variables and the (unbiased) variance-covariance matrix
S is calculated by
where n = 5 for this example.
Thus, 0.025 is the variance of the length variable, 0.0075 is the
covariance between the length and the width variables,
0.00175 is the covariance between the length and the height
variables, 0.007 is the variance of the width variable, 0.00135
is the covariance between the width and height variables and
.00043 is the variance of the height variable.
Centroid, dispersion matrix
The mean vector is often referred to as the centroid and the
variance-covariance matrix as the dispersion or dispersion
matrix. Also, the terms variance-covariance matrix and
covariance matrix are used interchangeably.
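The sketch below (NumPy assumed; not the Handbook's code) computes a centroid and dispersion matrix. The data matrix used here is an illustration chosen to be consistent with the variances and covariances quoted above (0.025, 0.0075, 0.00175, and so on); it is an assumption, not a quotation of the Handbook's displayed matrix.

import numpy as np

X = np.array([[4.0, 2.0, 0.60],
              [4.2, 2.1, 0.59],
              [3.9, 2.0, 0.58],
              [4.3, 2.1, 0.62],
              [4.1, 2.2, 0.63]])   # columns: length, width, height

mean_vector = X.mean(axis=0)
S = np.cov(X, rowvar=False)        # unbiased: divides by n - 1

print(mean_vector)                  # [4.1, 2.08, 0.604]
print(S)                            # diagonal: 0.025, 0.007, 0.00043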
6.5.4.2. The Multivariate Normal Distribution
Multivariate
normal
model
When multivariate data are analyzed, the multivariate normal model is
the most commonly used model.
The multivariate normal distribution model extends the univariate normal
distribution model to fit vector observations.
Definition
of
multivariate
normal
distribution
A p-dimensional vector of random variables, X = (X_1, ..., X_p), is
said to have a multivariate normal distribution if its density function
f(X) is of the form
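A standard way to write this density, with m and Σ as defined next, is

$$ f(X) = (2\pi)^{-p/2}\,|\Sigma|^{-1/2}\exp\!\left[-\tfrac{1}{2}(X-m)'\,\Sigma^{-1}(X-m)\right]. $$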
where m = (m_1, ..., m_p) is the vector of means and Σ is the variance-
covariance matrix of the multivariate normal distribution. The shortcut
notation for this density is
X ~ N_p(m, Σ)
Univariate
normal
distribution
When p = 1, the one-dimensional vector X = X_1 has the normal
distribution with mean m and variance σ².
Bivariate
normal
distribution
When p = 2, X = (X_1, X_2) has the bivariate normal distribution with a
two-dimensional vector of means, m = (m_1, m_2), and covariance matrix
The correlation between the two random variables is given by
6. Process or Product Monitoring and Control
6.5. Tutorials
6.5.4. Elements of Multivariate Analysis
6.5.4.3. Hotelling's T squared
Hotelling's T² distribution
A multivariate method that is the multivariate counterpart of
Student's-t and which also forms the basis for certain
multivariate control charts is based on Hotelling's T
2
distribution, which was introduced by Hotelling (1947).
Univariate
t-test for
mean
Recall, from Section 1.3.5.2, that

    t = (x̄ - μ) / (s / √n)

has a t distribution provided that X is normally distributed, and can
be used as long as X doesn't differ greatly from a normal
distribution. If we wanted to test the hypothesis that μ = μ_0, we
would then have

    t = (x̄ - μ_0) / (s / √n)

so that

    t² = (x̄ - μ_0)² / (s²/n) = n (x̄ - μ_0) (s²)^(-1) (x̄ - μ_0)
Generalize
to p
variables
When t² is generalized to p variables it becomes

    T² = n (x̄ - μ_0)' S^(-1) (x̄ - μ_0)

with x̄ = (x̄_1, x̄_2, ..., x̄_p) and μ_0 = (μ_01, μ_02, ..., μ_0p).
S^(-1) is the inverse of the sample variance-covariance matrix, S, and
n is the sample size upon which each x̄_i, i = 1, 2, ..., p, is based.
(The diagonal elements of S are the variances and the off-diagonal
elements are the covariances for the p variables. This is discussed
further in Section 6.5.4.3.1.)
Distribution of T²
It is well known that when μ = μ_0,

    T² ~ [ p(n-1) / (n-p) ] F_(p, n-p)

with F_(p, n-p) representing the F distribution with p degrees of
freedom for the numerator and n - p for the denominator. Thus, if μ
were specified to be μ_0, this could be tested by taking a single
p-variate sample of size n, then computing T² and comparing it with

    [ p(n-1) / (n-p) ] F_(α; p, n-p)

for a suitably chosen α.
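The following R sketch shows the test procedure just described for a
single p-variate sample; the data are randomly generated for
illustration only, and the critical value uses the scaled F quantile
given above.

    set.seed(42)
    n   <- 25
    p   <- 3
    X   <- matrix(rnorm(n * p, mean = 10), ncol = p)   # illustrative sample
    mu0 <- c(10, 10, 10)                                # hypothesized mean vector

    xbar <- colMeans(X)
    S    <- cov(X)
    T2   <- n * t(xbar - mu0) %*% solve(S) %*% (xbar - mu0)

    alpha <- 0.05
    crit  <- (p * (n - 1) / (n - p)) * qf(1 - alpha, p, n - p)
    c(T2 = T2, critical.value = crit)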
Result does
not apply
directly to
multivariate
Shewhart-
type charts
Although this result applies to hypothesis testing, it does not
apply directly to multivariate Shewhart-type charts (for
which there is no μ_0), although the result might be used as an
approximation when a large sample is used and data are in
subgroups, with the upper control limit (UCL) of a chart
based on the approximation.
Three-
sigma limits
from
univariate
control
chart
When a univariate control chart is used for Phase I (analysis
of historical data), and subsequently for Phase II (real-time
process monitoring), the general form of the control limits is
the same for each phase, although this need not be the case.
Specifically, three-sigma limits are used in the univariate
case, which skirts the relevant distribution theory for each
Phase.
Selection of
different
control
limit forms
for each
Phase
Three-sigma units are generally not used with multivariate
charts, however, which makes the selection of different
control limit forms for each Phase (based on the relevant
distribution theory), a natural choice.
6. Process or Product Monitoring and Control
6.5. Tutorials
6.5.4. Elements of Multivariate Analysis
6.5.4.3. Hotelling's T squared
6.5.4.3.1. T² Chart for Subgroup Averages -- Phase I
Estimate Σ with S_p

Since Σ is generally unknown, it is necessary to estimate Σ
analogously to the way that σ is estimated when an x̄ chart is used.
Specifically, when there are rational subgroups, Σ is estimated by
the pooled matrix S_p, the average of the subgroup variance-covariance
matrices, with

    S_p = (1/k) Σ_{l=1}^{k} S_l
Obtaining the x̄_i

Each x̄_i, i = 1, 2, ..., p, is obtained the same way as with an x̄
chart, namely, by taking k subgroups of size n and computing

    x̄_i = (1/k) Σ_{l=1}^{k} x̄_il .

Here x̄_il is used to denote the average for the lth subgroup of the
ith variable. That is,

    x̄_il = (1/n) Σ_{r=1}^{n} x_ilr

with x_ilr denoting the rth observation (out of n) for the ith
variable in the lth subgroup.
Estimating the variances and covariances

The variances and covariances are similarly averaged over the
subgroups. Specifically, the s_ij elements of the variance-covariance
matrix S are obtained as

    s_ij = (1/k) Σ_{l=1}^{k} s_ijl
with s_ijl for i ≠ j denoting the sample covariance between variables
X_i and X_j for the lth subgroup, and s_ij for i = j denoting the
sample variance of X_i. The variances (= s_iil) for subgroup l and for
variables i = 1, 2, ..., p are computed as

    s_iil = (1/(n-1)) Σ_{r=1}^{n} (x_ilr - x̄_il)² .

Similarly, the covariances s_ijl between variables X_i and X_j for
subgroup l are computed as

    s_ijl = (1/(n-1)) Σ_{r=1}^{n} (x_ilr - x̄_il)(x_jlr - x̄_jl) .
Compare T² against control values

As with an x̄ chart (or any other chart), the k subgroups would be
tested for control by computing k values of T² and comparing each
against the UCL. If any value falls above the UCL (there is no lower
control limit), the corresponding subgroup would be investigated.
Formula for plotted T² values

Thus, one would plot

    T²_j = n (x̄_j - x̿)' S_p^(-1) (x̄_j - x̿)

for the jth subgroup (j = 1, 2, ..., k), with x̄_j denoting a vector
with p elements that contains the subgroup averages for each of the p
characteristics for the jth subgroup. (S_p^(-1) is the inverse matrix
of the "pooled" variance-covariance matrix, S_p, which is obtained by
averaging the subgroup variance-covariance matrices over the k
subgroups, and x̿ denotes the vector of overall averages of the
subgroup means.)
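A minimal R sketch of this computation is shown below, assuming the
subgroup data are held in a k x n x p array (k subgroups, n
observations per subgroup, p characteristics); the array layout and
function name are illustrative choices, not part of the original text.

    # x is assumed to be a k x n x p array of subgroup data
    plot_T2 <- function(x) {
      k <- dim(x)[1]; n <- dim(x)[2]; p <- dim(x)[3]
      xbar  <- t(apply(x, 1, colMeans))                  # k x p subgroup means
      Sp    <- Reduce("+", lapply(1:k, function(l) cov(x[l, , ]))) / k
      xbb   <- colMeans(xbar)                            # overall means
      Spinv <- solve(Sp)
      sapply(1:k, function(j) {                          # one T^2 per subgroup
        d <- xbar[j, ] - xbb
        n * t(d) %*% Spinv %*% d
      })
    }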
Formula for the upper control limit

Each of the k values of T²_j given in the equation above would be
compared with

    UCL = [ p(k-1)(n-1) / (kn - k - p + 1) ] F_(α; p, kn-k-p+1)
Lower
control
limits
A lower control limit is generally not used in multivariate control
chart applications, although some control chart methods do utilize an
LCL. Although a small value for T²_j might seem desirable, a value
that is very small would likely indicate a problem of some type, as we
would not expect every element of x̄_j to be virtually equal to every
element in x̿.
Delete out-of-control points once cause discovered and corrected
As with any Phase I control chart procedure, if there are any
points that plot above the UCL and can be identified as
corresponding to out-of-control conditions that have been
corrected, the point(s) should be deleted and the UCL
recomputed. The remaining points would then be compared
with the new UCL and the process continued as long as
necessary, remembering that points should be deleted only if
their correspondence with out-of-control conditions can be
identified and the cause(s) of the condition(s) were removed.
6. Process or Product Monitoring and Control
6.5. Tutorials
6.5.4. Elements of Multivariate Analysis
6.5.4.3. Hotelling's T squared
6.5.4.3.2. T² Chart for Subgroup Averages -- Phase II
Phase II requires recomputing S_p and x̿, and different control limits

Determining the UCL that is to be subsequently applied to future
subgroups entails recomputing, if necessary, S_p and x̿, and using a
constant and an F-value that are different from the form given for the
Phase I control limits. The form is different because different
distribution theory is involved, since future subgroups are assumed to
be independent of the "current" set of subgroups that is used in
calculating S_p and x̿. (The same thing happens with x̄ charts; the
problem is simply ignored through the use of 3-sigma limits, although
a different approach should be used when there is a small number of
subgroups -- and the necessary theory has been worked out.)
Illustration To illustrate, assume that a subgroups had been discarded (with
possibly a = 0) so that k - a subgroups are used in obtaining S_p and
x̿. We shall let these two values be represented by S_p* and x̿* (the
asterisk here simply marking the recomputed quantities) to distinguish
them from the original values, S_p and x̿, before any subgroups are
deleted. Future values to be plotted on the multivariate chart would
then be obtained from

    T² = n (x̄_future - x̿*)' (S_p*)^(-1) (x̄_future - x̿*)

with x̄_future denoting an arbitrary vector containing the averages for
the p characteristics for a single subgroup obtained in the future.
Each of these future values would be plotted on the multivariate chart
and compared with
Phase II control limits

with a denoting the number of the original subgroups that are deleted
before computing S_p* and x̿*. Notice that the equation for the
control limits for Phase II given here does not reduce to the equation
for the control limits for Phase I when a = 0, nor should we expect it
to, since the Phase I UCL is used when testing for control of the
entire set of subgroups that is used in computing S_p and x̿.
6. Process or Product Monitoring and Control
6.5. Tutorials
6.5.4. Elements of Multivariate Analysis
6.5.4.3. Hotelling's T squared
6.5.4.3.3. Chart for Individual Observations --
Phase I
Multivariate
individual
control
charts
Control charts for multivariate individual observations can
be constructed, just as charts can be constructed for
univariate individual observations.
Constructing
the control
chart
Assume there are m historical multivariate observations to be tested
for control, so that Q_j, j = 1, 2, ..., m, are computed, with

    Q_j = (X_j - x̄)' S^(-1) (X_j - x̄)
Control
limits
Each value of Q_j is compared against control limits of

    LCL = [ (m-1)² / m ] B(α/2; p/2, (m-p-1)/2)
    UCL = [ (m-1)² / m ] B(1-α/2; p/2, (m-p-1)/2)

with B(·) denoting the beta distribution with parameters p/2
and (m-p-1)/2. These limits are due to Tracy, Young and
Mason (1992). Note that a LCL is stated, unlike the other
multivariate control chart procedures given in this section.
Although interest will generally be centered at the UCL, a
value of Q below the LCL should also be investigated, as
this could signal problems in data recording.
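The sketch below computes the Q_j values and beta-based limits for m
historical observations in R. The (m-1)²/m scaling constant follows
the Tracy, Young and Mason form assumed above, and the data are purely
illustrative.

    set.seed(7)
    m <- 40; p <- 3
    X <- matrix(rnorm(m * p), ncol = p)       # illustrative historical data

    xbar <- colMeans(X)
    Sinv <- solve(cov(X))
    Q <- apply(X, 1, function(xj) t(xj - xbar) %*% Sinv %*% (xj - xbar))

    alpha <- 0.05
    scale <- (m - 1)^2 / m
    LCL <- scale * qbeta(alpha / 2,     p / 2, (m - p - 1) / 2)
    UCL <- scale * qbeta(1 - alpha / 2, p / 2, (m - p - 1) / 2)
    which(Q > UCL | Q < LCL)                  # points to investigate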
Delete
points if
special
cause(s) are
identified
and
corrected
As in the case when subgroups are used, if any points plot
outside these control limits and special cause(s) that were
subsequently removed can be identified, the point(s) would
be deleted and the control limits recomputed, making the
appropriate adjustments on the degrees of freedom, and re-
testing the remaining points against the new limits.
6. Process or Product Monitoring and Control
6.5. Tutorials
6.5.4. Elements of Multivariate Analysis
6.5.4.3. Hotelling's T squared
6.5.4.3.4. Chart for Individual Observations --
Phase II
Control
limits
In Phase II, each value of Q_j would be plotted against the UCL of

    UCL = [ p(m+1)(m-1) / (m² - mp) ] F_(α; p, m-p)
with, as before, p denoting the number of characteristics.
Further
Information
The control limit expressions given in this section and the
immediately preceding sections are given in Ryan (2000,
Chapter 9).
6. Process or Product Monitoring and Control
6.5. Tutorials
6.5.4. Elements of Multivariate Analysis
6.5.4.3. Hotelling's T squared
6.5.4.3.5. Charts for Controlling Multivariate
Variability
No
satisfactory
charts for
multivariate
variability
Unfortunately, there are no charts for controlling multivariate
variability, with either subgroups or individual observations,
that are simple, easy-to-understand and implement, and
statistically defensible. Methods based on the generalized
variance have been proposed for subgroup data, but such
methods have been criticized by Ryan (2000, Section 9.4)
and some references cited therein. For individual
observations, the multivariate analogue of a univariate
moving range chart might be considered as an estimator of
the variance-covariance matrix for Phase I, although the
distribution of the estimator is unknown.
6. Process or Product Monitoring and Control
6.5. Tutorials
6.5.4. Elements of Multivariate Analysis
6.5.4.3. Hotelling's T squared
6.5.4.3.6. Constructing Multivariate Charts
Multivariate
control
charts not
commonly
available in
statistical
software
Although control charts were originally constructed and
maintained by hand, it would be extremely impractical to try
to do that with the chart procedures that were presented in
Sections 6.5.4.3.1-6.5.4.3.4. Unfortunately, the well-known
statistical software packages do not have capability for the
four procedures just outlined. However, Dataplot, which is
used for case studies and tutorials throughout this e-
Handbook, does have that capability.
6. Process or Product Monitoring and Control
6.5. Tutorials
6.5.5. Principal Components
Dimension
reduction tool
A Multivariate Analysis problem could start out with a
substantial number of correlated variables. Principal
Component Analysis is a dimension-reduction tool that
can be used advantageously in such situations. Principal
component analysis aims at reducing a large set of
variables to a small set that still contains most of the
information in the large set.
Principal
factors
The technique of principal component analysis enables us
to create and use a reduced set of variables, which are
called principal factors. A reduced set is much easier to
analyze and interpret. To study a data set that results in the
estimation of roughly 500 parameters may be difficult, but
if we could reduce these to 5 it would certainly make our
day. We will show in what follows how to achieve
substantial dimension reduction.
Inverse transformation not possible
While these principal factors represent or replace one or
more of the original variables, it should be noted that they
are not just a one-to-one transformation, so inverse
transformations are not possible.
Original data
matrix
To shed light on the structure of principal components
analysis, let us consider a multivariate data matrix X, with
n rows and p columns. The p elements of each row are
scores or measurements on a subject such as height, weight
and age.
Linear
function that
maximizes
variance
Next, standardize the X matrix so that each column mean is
0 and each column variance is 1. Call this matrix Z. Each
column is a vector variable, z_i, i = 1, ..., p. The main idea behind
principal component analysis is to derive a linear function y for each
of the vector variables z_i. This linear
function possesses an extremely important property;
namely, its variance is maximized.
Linear
function is
component of
z
This linear function is referred to as a component of z. To illustrate
the computation of a single element for the jth y vector, consider the
product y = zv' where v' is a column vector of V and V is a p x p
coefficient matrix that carries the p-element variable z into the
derived n-element variable y. V is known as the eigenvector matrix.
The dimension of z is 1 x p, the dimension of v' is p x 1. The scalar
algebra for the component score for the ith individual of y_j,
j = 1, ..., p is:

    y_ji = v'_1 z_1i + v'_2 z_2i + ... + v'_p z_pi

This becomes in matrix notation for all of the y:

    Y = ZV
Mean and
dispersion
matrix of y
The mean of y is m_y = V'm_z = 0, because m_z = 0.

The dispersion matrix of y is

    D_y = V'D_z V = V'RV
R is
correlation
matrix
Now, it can be shown that the dispersion matrix D_z of a standardized
variable is a correlation matrix. Thus R is the correlation matrix
for z.
Number of
parameters to
estimate
increases
rapidly as p
increases
At this juncture you may be tempted to say: "so what?". To
answer this let us look at the intercorrelations among the
elements of a vector variable. The number of parameters to
be estimated for a p-element variable is
    p means
    p variances
    (p² - p)/2 covariances

for a total of 2p + (p² - p)/2 parameters.
So
If p = 2, there are 5 parameters
If p = 10, there are 65 parameters
If p = 30, there are 495 parameters
Uncorrelated
variables
require no
covariance
estimation
All these parameters must be estimated and interpreted.
That is a herculean task, to say the least. Now, if we could
transform the data so that we obtain a vector of
uncorrelated variables, life becomes much more bearable,
since there are no covariances.
6. Process or Product Monitoring and Control
6.5. Tutorials
6.5.5. Principal Components
6.5.5.1. Properties of Principal Components
Orthogonalizing Transformations
Transformation
from z to y
The equation y = V'z represents a transformation, where y
is the transformed variable, z is the original standardized
variable and V is the premultiplier to go from z to y.
Orthogonal
transformations
simplify things
To produce a transformation vector for y for which the
elements are uncorrelated is the same as saying that we
want V such that D_y is a diagonal matrix. That is, all the
off-diagonal elements of D_y must be zero. This is called an
orthogonalizing transformation.
Infinite number
of values for V
There are an infinite number of values for V that will produce a
diagonal D_y for any correlation matrix R. Thus the mathematical
problem "find a unique V such that D_y is diagonal" cannot be solved
as it stands. A number of
famous statisticians such as Karl Pearson and Harold
Hotelling pondered this problem and suggested a
"variance maximizing" solution.
Principal
components
maximize
variance of the
transformed
elements, one
by one
Hotelling (1933) derived the "principal components"
solution. It proceeds as follows: for the first principal
component, which will be the first element of y and be
defined by the coefficients in the first column of V (denoted by v_1),
we want a solution such that the variance of y_1 will be maximized.
Constrain v to
generate a
unique solution
The constraint on the numbers in v_1 is that the sum of the squares of
the coefficients equals 1. Expressed mathematically, we wish to
maximize

    (1/n) Σ_{i=1}^{n} y_1i²

where

    y_1i = v_1' z_i

and v_1'v_1 = 1 (this is called "normalizing" v_1).
Computation of first principal component from R and v_1

Substituting the middle equation in the first yields

    (1/n) Σ_{i=1}^{n} y_1i² = v_1' R v_1

where R is the correlation matrix of Z, which, in turn, is the
standardized matrix of X, the original data matrix. Therefore, we want
to maximize v_1' R v_1 subject to v_1'v_1 = 1.
The eigenstructure
Lagrange
multiplier
approach
Let

    φ_1 = v_1' R v_1 - λ (v_1'v_1 - 1),

introducing the restriction on v_1 via the Lagrange multiplier
approach. It can be shown (T.W. Anderson, 1958, page 347, theorem 8)
that the vector of partial derivatives is

    ∂φ_1 / ∂v_1 = 2 R v_1 - 2 λ v_1

and setting this equal to zero, dividing out 2 and factoring gives

    (R - λI) v_1 = 0

This is known as "the problem of the eigenstructure of R".
Set of p
homogeneous
equations
The partial differentiation resulted in a set of p
homogeneous equations, which may be written in matrix
form as follows
The characteristic equation
Characteristic equation of R is a polynomial of degree p

The characteristic equation of R is a polynomial of degree p, which is
obtained by expanding the determinant of

    |R - λI| = 0

and solving for the roots λ_j, j = 1, 2, ..., p.
Largest
eigenvalue
Specifically, the largest eigenvalue, λ_1, and its associated vector,
v_1, are required. Solving for this eigenvalue and
vector is another mammoth numerical task that can
realistically only be performed by a computer. In general,
software is involved and the algorithms are complex.
Remaining p
eigenvalues
After obtaining the first eigenvalue, the process is
repeated until all p eigenvalues are computed.
Full
eigenstructure
of R
To succinctly define the full eigenstructure of R, we introduce
another matrix L, which is a diagonal matrix with λ_j in the jth
position on the diagonal. Then the full eigenstructure of R is given
as

    RV = VL

where

    V'V = VV' = I

and

    V'RV = L = D_y
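In practice the eigenstructure is obtained numerically; a short R
sketch using the built-in eigen() function on an illustrative
correlation matrix is given below.

    R <- matrix(c(1.0, 0.6, 0.3,
                  0.6, 1.0, 0.4,
                  0.3, 0.4, 1.0), nrow = 3)   # illustrative correlation matrix

    e <- eigen(R)
    V <- e$vectors                  # columns are the eigenvectors
    L <- diag(e$values)             # diagonal matrix of eigenvalues

    R %*% V - V %*% L               # RV = VL (zero up to rounding)
    t(V) %*% V                      # V'V = I
    t(V) %*% R %*% V                # V'RV = L = D_y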
Principal Factors
Scale to zero
means and unit
variances
It was mentioned before that it is helpful to scale any
transformation y of a vector variable z so that its elements
have zero means and unit variances. Such a standardized
transformation is called a factoring of z, or of R, and
each linear component of the transformation is called a
factor.
Deriving unit
variances for
principal
components
Now, the principal components already have zero means,
but their variances are not 1; in fact, they are the
eigenvalues, comprising the diagonal elements of L. It is
possible to derive the principal factor with unit variance
from the principal component as follows (writing λ_j for the jth
eigenvalue):

    f_j = y_j / √λ_j

or for all factors:

    f = L^(-1/2) y

substituting V'z for y we have

    f = L^(-1/2) V'z = B'z

where

    B = VL^(-1/2)
B matrix The matrix B is then the matrix of factor score
coefficients for principal factors.
How many Eigenvalues?
Dimensionality
of the set of
factor scores
The number of eigenvalues, N, used in the final set
determines the dimensionality of the set of factor scores.
For example, if the original test consisted of 8
measurements on 100 subjects, and we extract 2
eigenvalues, the set of factor scores is a matrix of 100
rows by 2 columns.
Eigenvalues
greater than
unity
Each column or principal factor should represent a
number of original variables. Kaiser (1966) suggested a
rule-of-thumb that takes as a value for N, the number of
eigenvalues larger than unity.
Factor Structure
Factor
structure
matrix S
The primary interpretative device in principal components
is the factor structure, computed as
    S = VL^(1/2)
S is a matrix whose elements are the correlations between
the principal components and the variables. If we retain,
for example, two eigenvalues, meaning that there are two
principal components, then the S matrix consists of two
columns and p (number of variables) rows.
Table showing
relation
between
variables and
principal
components
                 Principal Component
    Variable       1        2
       1          r_11     r_12
       2          r_21     r_22
       3          r_31     r_32
       4          r_41     r_42

The r_ij are the correlation coefficients between variable i and
principal component j, where i ranges from 1 to 4 and j from 1 to 2.
The
communality
SS' is the source of the "explained" correlations among
the variables. Its diagonal is called "the communality".
Rotation
Factor analysis If this correlation matrix, i.e., the factor structure matrix,
does not help much in the interpretation, it is possible to
rotate the axis of the principal components. This may
result in the polarization of the correlation coefficients.
Some practitioners refer to rotation after generating the
factor structure as factor analysis.
Varimax
rotation
A popular scheme for rotation was suggested by Henry
Kaiser in 1958. He produced a method for orthogonal
rotation of factors, called the varimax rotation, which
cleans up the factors as follows:
for each factor, high loadings (correlations) will
result for a few variables; the rest will be near
zero.
Example The following computer output from a principal
component analysis on a 4-variable data set, followed by
varimax rotation of the factor structure, will illustrate this point.
              Before Rotation           After Rotation
  Variable   Factor 1   Factor 2      Factor 1   Factor 2
     1         .853      -.989          .997       .058
     2         .634       .762          .089       .987
     3         .858      -.498          .989       .076
     4         .633       .736          .103       .965
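Rotation of this kind can be reproduced with the varimax() function in
base R (package stats); the sketch below applies it to the unrotated
loadings from the table above, though the exact rotated values depend
on normalization options and need not match the table digit for digit.

    # unrotated factor structure (loadings) from the example above
    S <- matrix(c( .853, -.989,
                   .634,  .762,
                   .858, -.498,
                   .633,  .736), ncol = 2, byrow = TRUE)

    rot <- varimax(S)
    rot$loadings    # rotated loadings
    rot$rotmat      # orthogonal rotation matrix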
Communality
Formula for
communality
statistic
A measure of how well the selected factors (principal
components) "explain" the variance of each of the
variables is given by a statistic called communality. This is defined,
for variable k, as the sum of the squared correlations of that
variable with the retained factors:

    communality_k = Σ_i r_ki²
Explanation of
communality
statistic
That is: the square of the correlation of variable k with factor i
gives the part of the variance accounted for by that factor. The sum
of these squares over the N retained factors is the communality, or
explained variance, for that variable (row).
Roadmap to solve the V matrix
Main steps to
obtaining
eigenstructure
for a
correlation
matrix
In summary, here are the main steps to obtain the
eigenstructure for a correlation matrix.
1. Compute R, the correlation matrix of the original
data. R is also the correlation matrix of the
standardized data.
2. Obtain the characteristic equation of R, which is a polynomial of
   degree p (the number of variables), obtained from expanding the
   determinant of |R - λI| = 0 and solving for the roots λ_i, that is:
   λ_1, λ_2, ..., λ_p.
3. Then solve for the columns of the V matrix, (v_1, v_2, ..., v_p).
   The roots, λ_i, are called the eigenvalues (or latent values). The
   columns of V are called the eigenvectors.
6. Process or Product Monitoring and Control
6.5. Tutorials
6.5.5. Principal Components
6.5.5.2. Numerical Example
Calculation
of principal
components
example
A numerical example may clarify the mechanics of principal
component analysis.
Sample data
set
Let us analyze the following 3-variate dataset with 10 observations. Each
observation consists of 3 measurements on a wafer: thickness, horizontal
displacement and vertical displacement.
Compute the
correlation
matrix
First compute the correlation matrix
Solve for the
roots of R
Next solve for the roots of R, using software
                        cumulative
            value       proportion
      1     1.769          .590
      2      .927          .899
      3      .304         1.000
Notice that

    Each eigenvalue satisfies |R - λI| = 0.
    The sum of the eigenvalues = 3 = p, which is equal to the trace of
    R (i.e., the sum of the main diagonal elements).
    The determinant of R is the product of the eigenvalues. The
    product is λ_1 x λ_2 x λ_3 = .499.
Compute the
first column
of the V
matrix
Substituting the first eigenvalue of 1.769 and R in the equation
(R - λI)v = 0, we obtain
This is the matrix expression for 3 homogeneous equations with 3
unknowns and yields the first column of V: .64 .69 -.34 (again, a
computerized solution is indispensable).
Compute the
remaining
columns of
the V matrix
Repeating this procedure for the other 2 eigenvalues yields the matrix V
Notice that if you multiply V by its transpose, the result is an identity
matrix, V'V=I.
Compute the L^(1/2) matrix

Now form the matrix L^(1/2), which is a diagonal matrix whose elements
are the square roots of the eigenvalues of R. Then obtain S, the
factor structure, using

    S = V L^(1/2)
So, for example, .91 is the correlation between variable 2 and the first
principal component.
Compute the
communality
Next compute the communality, using the first two eigenvalues only
Diagonal elements report how much of the variability is explained

Communality consists of the diagonal elements.

    var    communality
     1       .8662
     2       .8420
     3       .9876
This means that the first two principal components "explain" 86.62% of
the first variable, 84.20 % of the second variable, and 98.76% of the
third.
Compute the
coefficient
matrix
The coefficient matrix, B, is formed using the reciprocals of the
diagonals of L^(1/2)
Compute the
principal
factors
Finally, we can compute the factor scores from ZB, where Z is X
converted to standard score form. These columns are the principal
factors.
Principal
factors
control
chart
These factors can be plotted against the indices, which could be times. If
time is used, the resulting plot is an example of a principal factors
control chart.
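The complete sequence of calculations in this example can be scripted
in a few lines of R; the wafer measurements themselves are not
reproduced above, so the sketch below uses a placeholder data matrix X
(10 rows, 3 columns) and simply mirrors the steps.

    # X is assumed to be the 10 x 3 data matrix (thickness, horizontal
    # displacement, vertical displacement); placeholder values here
    set.seed(1)
    X <- matrix(rnorm(30), ncol = 3)

    R  <- cor(X)                         # correlation matrix
    e  <- eigen(R)                       # roots (eigenvalues) and V
    V  <- e$vectors
    Lh <- diag(sqrt(e$values))           # L^(1/2)

    S  <- V %*% Lh                       # factor structure
    communality <- rowSums(S[, 1:2]^2)   # using the first two eigenvalues

    B  <- V %*% diag(1 / sqrt(e$values)) # factor score coefficients
    Z  <- scale(X)                       # standardized data
    factors <- Z %*% B                   # principal factor scores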
6. Process or Product Monitoring and Control
6.6. Case Studies in Process Monitoring
Detailed
Examples
The general points of the first five sections are illustrated in
this section using data from physical science and engineering
applications. Each example is presented step-by-step in the
text, and is often cross-linked with the relevant sections of the
chapter describing the analysis in general.
Contents:
Section 6
1. Lithography Process Example
2. Aerosol Particle Size Example
6. Process or Product Monitoring and Control
6.6. Case Studies in Process Monitoring
6.6.1. Lithography Process
Lithography
Process
This case study illustrates the use of control charts in
analyzing a lithography process.
1. Background and Data
2. Graphical Representation of the Data
3. Subgroup Analysis
4. Shewhart Control Chart
5. Work This Example Yourself
6. Process or Product Monitoring and Control
6.6. Case Studies in Process Monitoring
6.6.1. Lithography Process
6.6.1.1. Background and Data
Case Study for SPC in Batch Processing Environment
Semiconductor
processing
creates
multiple
sources of
variability to
monitor
One of the assumptions in using classical Shewhart SPC charts
is that the only source of variation is from part to part (or
within subgroup variation). This is the case for most continuous
processing situations. However, many of today's processing
situations have different sources of variation. The
semiconductor industry is one of the areas where the
processing creates multiple sources of variation.
In semiconductor processing, the basic experimental unit is a
silicon wafer. Operations are performed on the wafer, but
individual wafers can be grouped multiple ways. In the
diffusion area, up to 150 wafers are processed at one time in a
diffusion tube. In the etch area, single wafers are processed
individually. In the lithography area, the light exposure is done
on sub-areas of the wafer. There are many times during the
production of a computer chip where the experimental unit
varies and thus there are different sources of variation in this
batch processing environment.
The following is a case study of a lithography process. Five
sites are measured on each wafer, three wafers are measured in
a cassette (typically a grouping of 24 - 25 wafers) and thirty
cassettes of wafers are used in the study. The width of a line is
the measurement under study. There are two line width
variables. The first is the original data and the second has been
cleaned up somewhat. This case study uses the raw data. The
entire data table is 450 rows long with six columns.
Software The analyses used in this case study can be generated using
both Dataplot code and R code.
Case study
data: wafer
line width
measurements
                              Raw                Cleaned
                              Line               Line
Cassette  Wafer  Site         Width     Sequence Width
=====================================================
1 1 Top 3.199275 1
3.197275
1 1 Lef 2.253081 2
2.249081
1 1 Cen 2.074308 3
2.068308
1 1 Rgt 2.418206 4
2.410206
1 1 Bot 2.393732 5
2.383732
1 2 Top 2.654947 6
2.642947
1 2 Lef 2.003234 7
1.989234
1 2 Cen 1.861268 8
1.845268
1 2 Rgt 2.136102 9
2.118102
1 2 Bot 1.976495 10
1.956495
1 3 Top 2.887053 11
2.865053
1 3 Lef 2.061239 12
2.037239
1 3 Cen 1.625191 13
1.599191
1 3 Rgt 2.304313 14
2.276313
1 3 Bot 2.233187 15
2.203187
2 1 Top 3.160233 16
3.128233
2 1 Lef 2.518913 17
2.484913
2 1 Cen 2.072211 18
2.036211
2 1 Rgt 2.287210 19
2.249210
2 1 Bot 2.120452 20
2.080452
2 2 Top 2.063058 21
2.021058
2 2 Lef 2.217220 22
2.173220
2 2 Cen 1.472945 23
1.426945
2 2 Rgt 1.684581 24
1.636581
2 2 Bot 1.900688 25
1.850688
2 3 Top 2.346254 26
2.294254
2 3 Lef 2.172825 27
2.118825
2 3 Cen 1.536538 28
1.480538
2 3 Rgt 1.966630 29
1.908630
2 3 Bot 2.251576 30
2.191576
3 1 Top 2.198141 31
2.136141
3 1 Lef 1.728784 32
1.664784
3 1 Cen 1.357348 33
1.291348
3 1 Rgt 1.673159 34
1.605159
3 1 Bot 1.429586 35
1.359586
3 2 Top 2.231291 36
2.159291
3 2 Lef 1.561993 37
1.487993
3 2 Cen 1.520104 38
1.444104
3 2 Rgt 2.066068 39
1.988068
3 2 Bot 1.777603 40
1.697603
3 3 Top 2.244736 41
2.162736
3 3 Lef 1.745877 42
1.661877
3 3 Cen 1.366895 43
1.280895
3 3 Rgt 1.615229 44
1.527229
3 3 Bot 1.540863 45
1.450863
4 1 Top 2.929037 46
2.837037
4 1 Lef 2.035900 47
1.941900
4 1 Cen 1.786147 48
1.690147
4 1 Rgt 1.980323 49
1.882323
4 1 Bot 2.162919 50
2.062919
4 2 Top 2.855798 51
2.753798
4 2 Lef 2.104193 52
2.000193
4 2 Cen 1.919507 53
1.813507
4 2 Rgt 2.019415 54
1.911415
4 2 Bot 2.228705 55
2.118705
4 3 Top 3.219292 56
3.107292
4 3 Lef 2.900430 57
2.786430
4 3 Cen 2.171262 58
2.055262
4 3 Rgt 3.041250 59
2.923250
4 3 Bot 3.188804 60
3.068804
5 1 Top 3.051234 61
2.929234
5 1 Lef 2.506230 62
2.382230
5 1 Cen 1.950486 63
1.824486
5 1 Rgt 2.467719 64
2.339719
5 1 Bot 2.581881 65
2.451881
5 2 Top 3.857221 66
3.725221
5 2 Lef 3.347343 67
3.213343
5 2 Cen 2.533870 68
2.397870
5 2 Rgt 3.190375 69
3.052375
5 2 Bot 3.362746 70
3.222746
5 3 Top 3.690306 71
3.548306
5 3 Lef 3.401584 72
3.257584
5 3 Cen 2.963117 73
2.817117
5 3 Rgt 2.945828 74
2.797828
5 3 Bot 3.466115 75
3.316115
6 1 Top 2.938241 76
2.786241
6 1 Lef 2.526568 77
2.372568
6 1 Cen 1.941370 78
1.785370
6 1 Rgt 2.765849 79
2.607849
6 1 Bot 2.382781 80
2.222781
6 2 Top 3.219665 81
3.057665
6 2 Lef 2.296011 82
2.132011
6 2 Cen 2.256196 83
2.090196
6 2 Rgt 2.645933 84
2.477933
6 2 Bot 2.422187 85
2.252187
6 3 Top 3.180348 86
3.008348
6 3 Lef 2.849264 87
2.675264
6 3 Cen 1.601288 88
1.425288
6 3 Rgt 2.810051 89
2.632051
6 3 Bot 2.902980 90
2.722980
7 1 Top 2.169679 91
1.987679
7 1 Lef 2.026506 92
1.842506
7 1 Cen 1.671804 93
1.485804
7 1 Rgt 1.660760 94
1.472760
7 1 Bot 2.314734 95
2.124734
7 2 Top 2.912838 96
2.720838
7 2 Lef 2.323665 97
2.129665
7 2 Cen 1.854223 98
1.658223
7 2 Rgt 2.391240 99 2.19324
7 2 Bot 2.196071 100
1.996071
7 3 Top 3.318517 101
3.116517
7 3 Lef 2.702735 102
2.498735
7 3 Cen 1.959008 103
1.753008
7 3 Rgt 2.512517 104
2.304517
7 3 Bot 2.827469 105
2.617469
8 1 Top 1.958022 106
1.746022
8 1 Lef 1.360106 107
1.146106
8 1 Cen 0.971193 108
0.755193
8 1 Rgt 1.947857 109
1.729857
8 1 Bot 1.643580 110 1.42358
8 2 Top 2.357633 111
2.135633
8 2 Lef 1.757725 112
1.533725
8 2 Cen 1.165886 113
0.939886
8 2 Rgt 2.231143 114
2.003143
8 2 Bot 1.311626 115
1.081626
8 3 Top 2.421686 116
2.189686
8 3 Lef 1.993855 117
1.759855
8 3 Cen 1.402543 118
1.166543
8 3 Rgt 2.008543 119
1.770543
8 3 Bot 2.139370 120
1.899370
9 1 Top 2.190676 121
1.948676
9 1 Lef 2.287483 122
2.043483
9 1 Cen 1.698943 123
1.452943
9 1 Rgt 1.925731 124
1.677731
9 1 Bot 2.057440 125
1.807440
9 2 Top 2.353597 126
2.101597
9 2 Lef 1.796236 127
1.542236
9 2 Cen 1.241040 128
0.985040
9 2 Rgt 1.677429 129
1.419429
9 2 Bot 1.845041 130
1.585041
9 3 Top 2.012669 131
1.750669
9 3 Lef 1.523769 132
1.259769
9 3 Cen 0.790789 133
0.524789
9 3 Rgt 2.001942 134
1.733942
9 3 Bot 1.350051 135
1.080051
10 1 Top 2.825749 136
2.553749
10 1 Lef 2.502445 137
2.228445
10 1 Cen 1.938239 138
1.662239
10 1 Rgt 2.349497 139
2.071497
10 1 Bot 2.310817 140
2.030817
10 2 Top 3.074576 141
2.792576
10 2 Lef 2.057821 142
1.773821
10 2 Cen 1.793617 143
1.507617
10 2 Rgt 1.862251 144
1.574251
10 2 Bot 1.956753 145
1.666753
10 3 Top 3.072840 146
2.780840
10 3 Lef 2.291035 147
1.997035
10 3 Cen 1.873878 148
1.577878
10 3 Rgt 2.475640 149
2.177640
10 3 Bot 2.021472 150
1.721472
11 1 Top 3.228835 151
2.926835
11 1 Lef 2.719495 152
2.415495
11 1 Cen 2.207198 153
1.901198
11 1 Rgt 2.391608 154
2.083608
11 1 Bot 2.525587 155
2.215587
11 2 Top 2.891103 156
2.579103
11 2 Lef 2.738007 157
2.424007
11 2 Cen 1.668337 158
1.352337
11 2 Rgt 2.496426 159
2.178426
11 2 Bot 2.417926 160
2.097926
11 3 Top 3.541799 161
3.219799
11 3 Lef 3.058768 162
2.734768
11 3 Cen 2.187061 163
1.861061
11 3 Rgt 2.790261 164
2.462261
11 3 Bot 3.279238 165
2.949238
12 1 Top 2.347662 166
2.015662
12 1 Lef 1.383336 167
1.049336
12 1 Cen 1.187168 168
0.851168
12 1 Rgt 1.693292 169
1.355292
12 1 Bot 1.664072 170
1.324072
12 2 Top 2.385320 171
2.043320
12 2 Lef 1.607784 172
1.263784
12 2 Cen 1.230307 173
0.884307
12 2 Rgt 1.945423 174
1.597423
12 2 Bot 1.907580 175
1.557580
12 3 Top 2.691576 176
2.339576
12 3 Lef 1.938755 177
1.584755
12 3 Cen 1.275409 178
0.919409
12 3 Rgt 1.777315 179
1.419315
12 3 Bot 2.146161 180
1.786161
13 1 Top 3.218655 181
2.856655
13 1 Lef 2.912180 182
2.548180
13 1 Cen 2.336436 183
1.970436
13 1 Rgt 2.956036 184
2.588036
13 1 Bot 2.423235 185
2.053235
13 2 Top 3.302224 186
2.930224
13 2 Lef 2.808816 187
2.434816
13 2 Cen 2.340386 188
1.964386
13 2 Rgt 2.795120 189
2.417120
13 2 Bot 2.865800 190
2.485800
13 3 Top 2.992217 191
2.610217
13 3 Lef 2.952106 192
2.568106
13 3 Cen 2.149299 193
1.763299
13 3 Rgt 2.448046 194
2.060046
13 3 Bot 2.507733 195
2.117733
14 1 Top 3.530112 196
3.138112
14 1 Lef 2.940489 197
2.546489
14 1 Cen 2.598357 198
2.202357
14 1 Rgt 2.905165 199
2.507165
14 1 Bot 2.692078 200
2.292078
14 2 Top 3.764270 201
3.362270
14 2 Lef 3.465960 202
3.061960
14 2 Cen 2.458628 203
2.052628
14 2 Rgt 3.141132 204
2.733132
14 2 Bot 2.816526 205
2.406526
14 3 Top 3.217614 206
2.805614
14 3 Lef 2.758171 207
2.344171
14 3 Cen 2.345921 208
1.929921
14 3 Rgt 2.773653 209
2.355653
14 3 Bot 3.109704 210
2.689704
15 1 Top 2.177593 211
1.755593
15 1 Lef 1.511781 212
1.087781
15 1 Cen 0.746546 213
0.320546
15 1 Rgt 1.491730 214
1.063730
15 1 Bot 1.268580 215
0.838580
15 2 Top 2.433994 216
2.001994
15 2 Lef 2.045667 217
1.611667
15 2 Cen 1.612699 218
1.176699
15 2 Rgt 2.082860 219
1.644860
15 2 Bot 1.887341 220
1.447341
15 3 Top 1.923003 221
1.481003
15 3 Lef 2.124461 222
1.680461
15 3 Cen 1.945048 223
1.499048
15 3 Rgt 2.210698 224
1.762698
15 3 Bot 1.985225 225
1.535225
16 1 Top 3.131536 226
2.679536
16 1 Lef 2.405975 227
1.951975
16 1 Cen 2.206320 228
1.750320
16 1 Rgt 3.012211 229
2.554211
16 1 Bot 2.628723 230
2.168723
16 2 Top 2.802486 231
2.340486
16 2 Lef 2.185010 232
1.721010
16 2 Cen 2.161802 233
1.695802
16 2 Rgt 2.102560 234
1.634560
16 2 Bot 1.961968 235
1.491968
16 3 Top 3.330183 236
2.858183
16 3 Lef 2.464046 237
1.990046
16 3 Cen 1.687408 238
1.211408
16 3 Rgt 2.043322 239
1.565322
16 3 Bot 2.570657 240
2.090657
17 1 Top 3.352633 241
2.870633
17 1 Lef 2.691645 242
2.207645
17 1 Cen 1.942410 243
1.456410
17 1 Rgt 2.366055 244
1.878055
17 1 Bot 2.500987 245
2.010987
17 2 Top 2.886284 246
2.394284
17 2 Lef 2.292503 247
1.798503
17 2 Cen 1.627562 248
1.131562
17 2 Rgt 2.415076 249
1.917076
17 2 Bot 2.086134 250
1.586134
17 3 Top 2.554848 251
2.052848
17 3 Lef 1.755843 252
1.251843
17 3 Cen 1.510124 253
1.004124
17 3 Rgt 2.257347 254
1.749347
17 3 Bot 1.958592 255
1.448592
18 1 Top 2.622733 256
2.110733
18 1 Lef 2.321079 257
1.807079
18 1 Cen 1.169269 258
0.653269
18 1 Rgt 1.921457 259
1.403457
18 1 Bot 2.176377 260
1.656377
18 2 Top 3.313367 261
2.791367
18 2 Lef 2.559725 262
2.035725
18 2 Cen 2.404662 263
1.878662
18 2 Rgt 2.405249 264
1.877249
18 2 Bot 2.535618 265
2.005618
18 3 Top 3.067851 266
2.535851
18 3 Lef 2.490359 267
1.956359
18 3 Cen 2.079477 268
1.543477
18 3 Rgt 2.669512 269
2.131512
18 3 Bot 2.105103 270
1.565103
19 1 Top 4.293889 271
3.751889
19 1 Lef 3.888826 272
3.344826
19 1 Cen 2.960655 273
2.414655
19 1 Rgt 3.618864 274
3.070864
19 1 Bot 3.562480 275
3.012480
19 2 Top 3.451872 276
2.899872
19 2 Lef 3.285934 277
2.731934
19 2 Cen 2.638294 278
2.082294
19 2 Rgt 2.918810 279
2.360810
19 2 Bot 3.076231 280
2.516231
19 3 Top 3.879683 281
3.317683
19 3 Lef 3.342026 282
2.778026
19 3 Cen 3.382833 283
2.816833
19 3 Rgt 3.491666 284
2.923666
19 3 Bot 3.617621 285
3.047621
20 1 Top 2.329987 286
1.757987
20 1 Lef 2.400277 287
1.826277
20 1 Cen 2.033941 288
1.457941
20 1 Rgt 2.544367 289
1.966367
20 1 Bot 2.493079 290
1.913079
20 2 Top 2.862084 291
2.280084
20 2 Lef 2.404703 292
1.820703
20 2 Cen 1.648662 293
1.062662
20 2 Rgt 2.115465 294
1.527465
20 2 Bot 2.633930 295
2.043930
20 3 Top 3.305211 296
2.713211
20 3 Lef 2.194991 297
1.600991
20 3 Cen 1.620963 298
1.024963
20 3 Rgt 2.322678 299
1.724678
20 3 Bot 2.818449 300
2.218449
21 1 Top 2.712915 301
2.110915
21 1 Lef 2.389121 302
1.785121
21 1 Cen 1.575833 303
0.969833
21 1 Rgt 1.870484 304
1.262484
21 1 Bot 2.203262 305
1.593262
21 2 Top 2.607972 306
1.995972
21 2 Lef 2.177747 307
1.563747
21 2 Cen 1.246016 308
0.630016
21 2 Rgt 1.663096 309
1.045096
21 2 Bot 1.843187 310
1.223187
21 3 Top 2.277813 311
1.655813
21 3 Lef 1.764940 312
1.140940
21 3 Cen 1.358137 313
0.732137
21 3 Rgt 2.065713 314
1.437713
21 3 Bot 1.885897 315
1.255897
22 1 Top 3.126184 316
2.494184
22 1 Lef 2.843505 317
2.209505
22 1 Cen 2.041466 318
1.405466
22 1 Rgt 2.816967 319
2.178967
22 1 Bot 2.635127 320
1.995127
22 2 Top 3.049442 321
2.407442
22 2 Lef 2.446904 322
1.802904
22 2 Cen 1.793442 323
1.147442
22 2 Rgt 2.676519 324
2.028519
22 2 Bot 2.187865 325
1.537865
22 3 Top 2.758416 326
2.106416
22 3 Lef 2.405744 327
1.751744
22 3 Cen 1.580387 328
0.924387
22 3 Rgt 2.508542 329
1.850542
22 3 Bot 2.574564 330
1.914564
23 1 Top 3.294288 331
2.632288
23 1 Lef 2.641762 332
1.977762
23 1 Cen 2.105774 333
1.439774
23 1 Rgt 2.655097 334
1.987097
23 1 Bot 2.622482 335
1.952482
23 2 Top 4.066631 336
3.394631
23 2 Lef 3.389733 337
2.715733
23 2 Cen 2.993666 338
2.317666
23 2 Rgt 3.613128 339
2.935128
23 2 Bot 3.213809 340
2.533809
23 3 Top 3.369665 341
2.687665
23 3 Lef 2.566891 342
1.882891
23 3 Cen 2.289899 343
1.603899
23 3 Rgt 2.517418 344
1.829418
23 3 Bot 2.862723 345
2.172723
24 1 Top 4.212664 346
3.520664
24 1 Lef 3.068342 347
2.374342
24 1 Cen 2.872188 348
2.176188
24 1 Rgt 3.040890 349
2.342890
24 1 Bot 3.376318 350
2.676318
24 2 Top 3.223384 351
2.521384
24 2 Lef 2.552726 352
1.848726
24 2 Cen 2.447344 353
1.741344
24 2 Rgt 3.011574 354
2.303574
24 2 Bot 2.711774 355
2.001774
24 3 Top 3.359505 356
2.647505
24 3 Lef 2.800742 357
2.086742
24 3 Cen 2.043396 358
1.327396
24 3 Rgt 2.929792 359
2.211792
24 3 Bot 2.935356 360
2.215356
25 1 Top 2.724871 361
2.002871
25 1 Lef 2.239013 362
1.515013
25 1 Cen 2.341512 363
1.615512
25 1 Rgt 2.263617 364
1.535617
25 1 Bot 2.062748 365
1.332748
25 2 Top 3.658082 366
2.926082
25 2 Lef 3.093268 367
2.359268
25 2 Cen 2.429341 368
1.693341
25 2 Rgt 2.538365 369
1.800365
25 2 Bot 3.161795 370
2.421795
25 3 Top 3.178246 371
2.436246
25 3 Lef 2.498102 372
1.754102
25 3 Cen 2.445810 373
1.699810
25 3 Rgt 2.231248 374
1.483248
25 3 Bot 2.302298 375
1.552298
26 1 Top 3.320688 376
2.568688
26 1 Lef 2.861800 377
2.107800
26 1 Cen 2.238258 378
1.482258
26 1 Rgt 3.122050 379
2.364050
26 1 Bot 3.160876 380
2.400876
26 2 Top 3.873888 381
3.111888
26 2 Lef 3.166345 382
2.402345
26 2 Cen 2.645267 383
1.879267
26 2 Rgt 3.309867 384
2.541867
26 2 Bot 3.542882 385
2.772882
26 3 Top 2.586453 386
1.814453
26 3 Lef 2.120604 387
1.346604
26 3 Cen 2.180847 388
1.404847
26 3 Rgt 2.480888 389
1.702888
26 3 Bot 1.938037 390
1.158037
27 1 Top 4.710718 391
3.928718
27 1 Lef 4.082083 392
3.298083
27 1 Cen 3.533026 393
2.747026
27 1 Rgt 4.269929 394
3.481929
27 1 Bot 4.038166 395
3.248166
27 2 Top 4.237233 396
3.445233
27 2 Lef 4.171702 397
3.377702
27 2 Cen 3.04394 398
2.247940
27 2 Rgt 3.91296 399
3.114960
27 2 Bot 3.714229 400
2.914229
27 3 Top 5.168668 401
4.366668
27 3 Lef 4.823275 402
4.019275
27 3 Cen 3.764272 403
2.958272
27 3 Rgt 4.396897 404
3.588897
27 3 Bot 4.442094 405
3.632094
28 1 Top 3.972279 406
3.160279
28 1 Lef 3.883295 407
3.069295
28 1 Cen 3.045145 408
2.229145
28 1 Rgt 3.51459 409
2.696590
28 1 Bot 3.575446 410
2.755446
28 2 Top 3.024903 411
2.202903
28 2 Lef 3.099192 412
2.275192
28 2 Cen 2.048139 413
1.222139
28 2 Rgt 2.927978 414
2.099978
28 2 Bot 3.15257 415
2.322570
28 3 Top 3.55806 416
2.726060
28 3 Lef 3.176292 417
2.342292
28 3 Cen 2.852873 418
2.016873
28 3 Rgt 3.026064 419
2.188064
28 3 Bot 3.071975 420
2.231975
29 1 Top 3.496634 421
2.654634
29 1 Lef 3.087091 422
2.243091
29 1 Cen 2.517673 423
1.671673
29 1 Rgt 2.547344 424
1.699344
29 1 Bot 2.971948 425
2.121948
29 2 Top 3.371306 426
2.519306
29 2 Lef 2.175046 427
1.321046
29 2 Cen 1.940111 428
1.084111
29 2 Rgt 2.932408 429
2.074408
29 2 Bot 2.428069 430
1.568069
29 3 Top 2.941041 431
2.079041
29 3 Lef 2.294009 432
1.430009
29 3 Cen 2.025674 433
1.159674
29 3 Rgt 2.21154 434
1.343540
29 3 Bot 2.459684 435
1.589684
30 1 Top 2.86467 436
1.992670
30 1 Lef 2.695163 437
1.821163
30 1 Cen 2.229518 438
1.353518
30 1 Rgt 1.940917 439
1.062917
30 1 Bot 2.547318 440
1.667318
30 2 Top 3.537562 441
2.655562
30 2 Lef 3.311361 442
2.427361
30 2 Cen 2.767771 443
1.881771
30 2 Rgt 3.388622 444
2.500622
30 2 Bot 3.542701 445
2.652701
30 3 Top 3.184652 446
2.292652
30 3 Lef 2.620947 447
1.726947
30 3 Cen 2.697619 448
1.801619
30 3 Rgt 2.860684 449
1.962684
30 3 Bot 2.758571 450
1.858571
6. Process or Product Monitoring and Control
6.6. Case Studies in Process Monitoring
6.6.1. Lithography Process
6.6.1.2. Graphical Representation of the Data
The first step in analyzing the data is to generate some
simple plots of the response and then of the response versus
the various factors.
4-Plot of
Data
Interpretation This 4-plot shows the following.
1. The run sequence plot (upper left) indicates that the
location and scale are not constant over time. This
indicates that the three factors do in fact have an
effect of some kind.
2. The lag plot (upper right) indicates that there is some
mild autocorrelation in the data. This is not
unexpected as the data are grouped in a logical order
of the three factors (i.e., not randomly) and the run
sequence plot indicates that there are factor effects.
3. The histogram (lower left) shows that most of the
data fall between 1 and 5, with the center of the data
at about 2.2.
4. Due to the non-constant location and scale and
autocorrelation in the data, distributional inferences
from the normal probability plot (lower right) are not
meaningful.
The run sequence plot is shown at full size to show greater
detail. In addition, a numerical summary of the data is
generated.
Run
Sequence
Plot of Data
Numerical
Summary
Sample size = 450
Mean = 2.53228
Median = 2.45334
Minimum = 0.74655
Maximum = 5.16867
Range = 4.42212
Stan. Dev. = 0.69376
Autocorrelation = 0.60726
We are primarily interested in the mean and standard
deviation. From the summary, we see that the mean is 2.53
and the standard deviation is 0.69.
Plot response
against
individual
factors
The next step is to plot the response against each individual
factor. For comparison, we generate both a scatter plot and
a box plot of the data. The scatter plot shows more detail.
However, comparisons are usually easier to see with the
box plot, particularly as the number of data points and groups
becomes larger.
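A few lines of R suffice to generate plots of this kind; the column
names below (CASSETTE, WAFER, SITE, WIDTH) follow the variable names
used later in this case study, and the data frame name df is simply an
assumption about how the data have been read in.

    # df is assumed to be a data frame holding the case study data
    # with columns CASSETTE, WAFER, SITE, WIDTH
    plot(df$CASSETTE, df$WIDTH,
         xlab = "Cassette", ylab = "Line Width")        # scatter plot
    boxplot(WIDTH ~ CASSETTE, data = df,
            xlab = "Cassette", ylab = "Line Width")     # box plot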
Scatter plot
of width
versus
cassette
Box plot of
width versus
cassette
Interpretation We can make the following conclusions based on the above
scatter and box plots.
1. There is considerable variation in the location for the
various cassettes. The medians vary from about 1.7 to
4.
2. There is also some variation in the scale.
3. There are a number of outliers.
Scatter plot
of width
versus wafer
Box plot of
width versus
wafer
Interpretation We can make the following conclusions based on the above
scatter and box plots.
1. The locations for the three wafers are relatively
constant.
2. The scales for the three wafers are relatively constant.
3. There are a few outliers on the high side.
4. It is reasonable to treat the wafer factor as
homogeneous.
Scatter plot
of width
versus site
Box plot of
width versus
site
Interpretation We can make the following conclusions based on the above
scatter and box plots.
1. There is some variation in location based on site. The
center site in particular has a lower median.
2. The scales are relatively constant across sites.
3. There are a few outliers.
DOE mean
and sd plots
We can use the DOE mean plot and the DOE standard
deviation plot to show the factor means and standard
deviations together for better comparison.
DOE mean
plot
DOE sd plot
Summary The above graphs show that there are differences between
the lots and the sites.
There are various ways we can create subgroups of this
dataset: each lot could be a subgroup, each wafer could be
a subgroup, or each site measured could be a subgroup
(with only one data value in each subgroup).
Recall that for a classical Shewhart means chart, the
average within subgroup standard deviation is used to
calculate the control limits for the means chart. However,
with a means chart you are monitoring the subgroup mean-
to-mean variation. There is no problem if you are in a
continuous processing situation - this becomes an issue if
you are operating in a batch processing environment.
We will look at various control charts based on different
subgroupings in 6.6.1.3.
6. Process or Product Monitoring and Control
6.6. Case Studies in Process Monitoring
6.6.1. Lithography Process
6.6.1.3. Subgroup Analysis
Control
charts for
subgroups
The resulting classical Shewhart control charts for each
possible subgroup are shown below.
Site as
subgroup
The first pair of control charts use the site as the subgroup.
However, since site has a subgroup size of one we use the
control charts for individual measurements. A moving
average and a moving range chart are shown.
Moving
average
control chart
Moving
range control
chart
Wafer as
subgroup
The next pair of control charts use the wafer as the
subgroup. In this case, the subgroup size is five. A mean
and a standard deviation control chart are shown.
Mean control
chart
SD control
chart
There is no LCL for the standard deviation chart because of
the small subgroup size.
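If the qcc package is available, control charts of this kind can be
sketched in R as follows; qcc and qcc.groups are assumed to be
installed, and the grouping follows the wafer-within-cassette and
cassette structures described above.

    library(qcc)                         # assumed to be installed
    # df as before, with columns CASSETTE, WAFER, WIDTH
    wafer.groups <- qcc.groups(df$WIDTH,
                               interaction(df$CASSETTE, df$WAFER))
    qcc(wafer.groups, type = "xbar")     # mean chart, subgroup size 5
    qcc(wafer.groups, type = "S")        # standard deviation chart

    cassette.groups <- qcc.groups(df$WIDTH, df$CASSETTE)
    qcc(cassette.groups, type = "xbar")  # mean chart, subgroup size 15
    qcc(cassette.groups, type = "S")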
Cassette as
subgroup
The next pair of control charts use the cassette as the
subgroup. In this case, the subgroup size is 15. A mean and
a standard deviation control chart are shown.
Mean control
chart
SD control
chart
Interpretation Which of these subgroupings of the data is correct? As you
can see, each subgrouping produces a different chart. Part of
the answer lies in the manufacturing requirements for this
process. Another aspect that can be statistically determined
is the magnitude of each of the sources of variation. In
order to understand our data structure and how much
variation each of our sources contribute, we need to
perform a variance component analysis. The variance
component analysis for this data set is shown below.
                   Variance
    Component      Component Estimate
    ---------      ------------------
    Cassette             0.2645
    Wafer                0.0500
    Site                 0.1755
Variance
Component
Estimation
If your software does not generate the variance components
directly, they can be computed from a standard analysis of
variance output by equating mean squares (MS) to expected
mean squares (EMS).
The sum of squares and mean squares for a nested, random
effects model are shown below.
                              Degrees of    Sum of       Mean
    Source                     Freedom      Squares      Squares
    ----------------------    ----------   ----------   ---------
    Cassette                      29       127.40293     4.3932
    Wafer(Cassette)               60        25.52089     0.4253
    Site(Cassette, Wafer)        360        63.17865     0.1755
The expected mean squares for cassette, wafer within
cassette, and site within cassette and wafer, along with their
associated mean squares, are the following.
    4.3932 = (3*5)*Var(cassettes) + 5*Var(wafer) + Var(site)
    0.4253 = 5*Var(wafer) + Var(site)
    0.1755 = Var(site)
Solving these equations, we obtain the variance component
estimates 0.2645, 0.04997, and 0.1755 for cassettes, wafers,
and sites, respectively.
All of the analyses in this section can be completed using R
code.
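The back-solving described above takes only a few lines of R, using
the mean squares from the nested ANOVA table.

    ms.cassette <- 4.3932
    ms.wafer    <- 0.4253
    ms.site     <- 0.1755

    var.site     <- ms.site                             # 0.1755
    var.wafer    <- (ms.wafer - ms.site) / 5            # about 0.0500
    var.cassette <- (ms.cassette - ms.wafer) / (3 * 5)  # about 0.2645

    c(cassette = var.cassette, wafer = var.wafer, site = var.site)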
6. Process or Product Monitoring and Control
6.6. Case Studies in Process Monitoring
6.6.1. Lithography Process
6.6.1.4. Shewhart Control Chart
Choosing
the right
control
charts to
monitor
the
process
The largest source of variation in this data is the lot-to-lot
variation. So, using classical Shewhart methods, if we specify
our subgroup to be anything other than lot, we will be ignoring
the known lot-to-lot variation and could get out-of-control
points that already have a known, assignable cause - the data
comes from different lots. However, in the lithography
processing area the measurements of most interest are the site
level measurements, not the lot means. How can we get
around this seeming contradiction?
Chart
sources of
variation
separately
One solution is to chart the important sources of variation
separately. We would then be able to monitor the variation of
our process and truly understand where the variation is coming
from and if it changes. For this dataset, this approach would
require having two sets of control charts, one for the
individual site measurements and the other for the lot means.
This would double the number of charts necessary for this
process (we would have 4 charts for line width instead of 2).
Chart only
most
important
source of
variation
Another solution would be to have one chart on the largest
source of variation. This would mean we would have one set
of charts that monitor the lot-to-lot variation. From a
manufacturing standpoint, this would be unacceptable.
Use
boxplot
type chart
We could create a non-standard chart that would plot all the
individual data values and group them together in a boxplot
type format by lot. The control limits could be generated to
monitor the individual data values while the lot-to-lot variation
would be monitored by the patterns of the groupings. This
would take special programming and management intervention
to implement non-standard charts in most shop floor control
systems.
Alternate
form for
mean
control
chart
A commonly applied solution is the first option: have multiple
charts on this process. When creating the control limits for the
lot means, care must be taken to use the lot-to-lot variation
instead of the within lot variation. The resulting control charts
are: the standard individuals/moving range charts (as seen
previously), and a control chart on the lot means that is
different from the previous lot means chart. This new chart
uses the lot-to-lot variation to calculate control limits instead
of the average within-lot standard deviation. The
accompanying standard deviation chart is the same as seen
previously.
Mean
control
chart
using lot-
to-lot
variation
The control limits labeled with "UCL" and "LCL" are the
standard control limits. The control limits labeled with "UCL:
LL" and "LCL: LL" are based on the lot-to-lot variation.
6. Process or Product Monitoring and Control
6.6. Case Studies in Process Monitoring
6.6.1. Lithography Process
6.6.1.5. Work This Example Yourself
View
Dataplot
Macro for
this Case
Study
This page allows you to repeat the analysis outlined in the
case study description on the previous page using Dataplot. It
is required that you have already downloaded and installed
Dataplot and configured your browser to run Dataplot. Output
from each analysis step below will be displayed in one or
more of the Dataplot windows. The four main windows are the
Output window, the Graphics window, the Command History
window, and the Data Sheet window. Across the top of the
main windows there are menus for executing Dataplot
commands. Across the bottom is a command entry window
where commands can be typed in.
Data Analysis Steps
Results and
Conclusions
Click on the links below to start Dataplot and
run this case study yourself. Each step may use
results from previous steps, so please be patient.
Wait until the software verifies that the current
step is complete before clicking on the next step.
The links in this column
will connect you with
more detailed
information about each
analysis step from the
case study description.
1. Invoke Dataplot and read data.
1. Read in the data. 1. You have read 5
columns of numbers
into Dataplot,
variables CASSETTE,
WAFER, SITE,
WIDTH, and RUNSEQ.
2. Plot of the response variable
1. Numerical summary of WIDTH.
2. 4-Plot of WIDTH.
3. Run sequence plot of WIDTH.
1. The summary shows
the mean line width
is 2.53 and the
standard deviation
of the line
width is 0.69.
2. The 4-plot shows
non-constant
location and
scale and moderate
autocorrelation.
3. The run sequence
plot shows
non-constant
location and scale.
3. Generate scatter and box plots
against
individual factors.
1. Scatter plot of WIDTH versus
CASSETTE.
2. Box plot of WIDTH versus
CASSETTE.
3. Scatter plot of WIDTH versus
WAFER.
4. Box plot of WIDTH versus
WAFER.
5. Scatter plot of WIDTH versus
SITE.
6. Box plot of WIDTH versus
SITE.
7. DOE mean plot of WIDTH versus
CASSETTE, WAFER, and SITE.
8. DOE sd plot of WIDTH versus
CASSETTE, WAFER, and SITE.
1. The scatter plot
shows considerable
variation in
location.
2. The box plot
shows considerable
variation in
location and scale
and the presence
of some outliers.
3. The scatter plot
shows minimal
variation in
location and scale.
4. The box plot
shows minimal
variation in
location and scale.
It also shows
some outliers.
5. The scatter plot
shows some
variation in
location.
6. The box plot
shows some
variation in
location. Scale
seems relatively
constant.
Some outliers.
7. The DOE mean
plot shows effects
for CASSETTE and
SITE, no effect
for WAFER.
8. The DOE sd plot
shows effects
for CASSETTE and
SITE, no effect
for WAFER.
4. Subgroup analysis.

   1. Generate a moving mean control chart.
   2. Generate a moving range control chart.
   3. Generate a mean control chart for WAFER.
   4. Generate a sd control chart for WAFER.
   5. Generate a mean control chart for CASSETTE.
   6. Generate a sd control chart for CASSETTE.
   7. Generate an analysis of variance. This is not
      currently implemented in DATAPLOT for nested datasets.
   8. Generate a mean control chart using lot-to-lot variation.

   Results:

   1. The moving mean plot shows a large number of out-of-control points.
   2. The moving range plot shows a large number of out-of-control points.
   3. The mean control chart shows a large number of out-of-control points.
   4. The sd control chart shows no out-of-control points.
   5. The mean control chart shows a large number of out-of-control points.
   6. The sd control chart shows no out-of-control points.
   7. The analysis of variance and components of variance calculations
      show that cassette-to-cassette variation is 54% of the total and
      site-to-site variation is 36% of the total.
   8. The mean control chart shows one point that is on the boundary of
      being out of control.
6. Process or Product Monitoring and Control
6.6. Case Studies in Process Monitoring
6.6.2. Aerosol Particle Size
Box-
Jenkins
Modeling
of Aerosol
Particle
Size
This case study illustrates the use of Box-Jenkins modeling
with aerosol particle size data.
1. Background and Data
2. Model Identification
3. Model Estimation
4. Model Validation
5. Work This Example Yourself
6. Process or Product Monitoring and Control
6.6. Case Studies in Process Monitoring
6.6.2. Aerosol Particle Size
6.6.2.1. Background and Data
Data Source The source of the data for this case study is Antuan Negiz
who analyzed these data while he was a post-doc in the
NIST Statistical Engineering Division from the Illinois
Institute of Technology.
Data
Collection
These data were collected from an aerosol mini-spray dryer
device. The purpose of this device is to convert a slurry
stream into deposited particles in a drying chamber. The
device injects the slurry at high speed. The slurry is
pulverized as it enters the drying chamber when it comes into
contact with a hot gas stream at low humidity. The liquid
contained in the pulverized slurry particles is vaporized, then
transferred to the hot gas stream leaving behind dried small-
sized particles.
The response variable is particle size, which is collected
equidistant in time. There are a variety of associated
variables that may affect the injection process itself and
hence the size and quality of the deposited particles. For this
case study, we restrict our analysis to the response variable.
Applications Such deposition process operations have many applications
from powdered laundry detergents at one extreme to ceramic
molding at the other. In ceramic molding, the distribution
and homogeneity of the particle sizes are particularly
important because, after the molds are baked and cured, the
properties of the final molded ceramic product are strongly
affected by the intermediate uniformity of the base
ceramic particles, which in turn is directly reflective of the
quality of the initial atomization process in the aerosol
injection device.
Aerosol
Particle
Size
Dynamic
Modeling
and Control
The data set consists of particle sizes collected over time.
The basic distributional properties of this process are of
interest in terms of distributional shape, constancy of size,
and variation in size. In addition, this time series may be
examined for autocorrelation structure to determine a
prediction model of particle size as a function of time--such
a model is frequently autoregressive in nature. Such a high-
quality prediction equation would be essential as a first step
in developing a predictor-corrective recursive feedback
mechanism which would serve as the core in developing and
implementing real-time dynamic corrective algorithms. The
net effect of such algorithms is, of course, a particle size
distribution that is much less variable, much more stable in
nature, and of much higher quality. All of this results in final
ceramic mold products that are more uniform and predictable
across a wide range of important performance
characteristics.
For the purposes of this case study, we restrict the analysis to
determining an appropriate Box-Jenkins model of the particle
size.
Software The analyses used in this case study can be generated using
both Dataplot code and R code.
Case study
data
115.36539
114.63150
114.63150
116.09940
116.34400
116.09940
116.34400
116.83331
116.34400
116.83331
117.32260
117.07800
117.32260
117.32260
117.81200
117.56730
118.30130
117.81200
118.30130
117.81200
118.30130
118.30130
118.54590
118.30130
117.07800
116.09940
118.30130
118.79060
118.05661
118.30130
118.54590
118.30130
118.54590
118.05661
118.30130
118.54590
118.30130
118.30130
118.30130
118.30130
118.05661
118.30130
117.81200
118.30130
117.32260
117.32260
117.56730
117.81200
117.56730
117.81200
117.81200
117.32260
116.34400
116.58870
116.83331
116.58870
116.83331
116.83331
117.32260
116.34400
116.09940
115.61010
115.61010
115.61010
115.36539
115.12080
115.61010
115.85471
115.36539
115.36539
115.36539
115.12080
114.87611
114.87611
115.12080
114.87611
114.87611
114.63150
114.63150
114.14220
114.38680
114.14220
114.63150
114.87611
114.38680
114.87611
114.63150
114.14220
114.14220
113.89750
114.14220
113.89750
113.65289
113.65289
113.40820
113.40820
112.91890
113.40820
112.91890
113.40820
113.89750
113.40820
113.65289
113.89750
113.65289
113.65289
113.89750
113.65289
113.16360
114.14220
114.38680
113.65289
113.89750
113.89750
113.40820
113.65289
113.89750
113.65289
113.65289
114.14220
114.38680
114.63150
115.61010
115.12080
114.63150
114.38680
113.65289
113.40820
113.40820
113.16360
113.16360
113.16360
113.16360
113.16360
112.42960
113.40820
113.40820
113.16360
113.16360
113.16360
113.16360
111.20631
112.67420
112.91890
112.67420
112.91890
113.16360
112.91890
112.67420
112.91890
112.67420
112.91890
113.16360
112.67420
112.67420
112.91890
113.16360
112.67420
112.91890
111.20631
113.40820
112.91890
112.67420
113.16360
113.65289
113.40820
114.14220
114.87611
114.87611
116.09940
116.34400
116.58870
116.09940
116.34400
116.83331
117.07800
117.07800
116.58870
116.83331
116.58870
116.34400
116.83331
116.83331
117.07800
116.58870
116.58870
117.32260
116.83331
118.79060
116.83331
117.07800
116.58870
116.83331
116.34400
116.58870
116.34400
116.34400
116.34400
116.09940
116.09940
116.34400
115.85471
115.85471
115.85471
115.61010
115.61010
115.61010
115.36539
115.12080
115.61010
115.85471
115.12080
115.12080
114.87611
114.87611
114.38680
114.14220
114.14220
114.38680
114.14220
114.38680
114.38680
114.38680
114.38680
114.38680
114.14220
113.89750
114.14220
113.65289
113.16360
112.91890
112.67420
112.42960
112.42960
112.42960
112.18491
112.18491
112.42960
112.18491
112.42960
111.69560
112.42960
112.42960
111.69560
111.94030
112.18491
112.18491
112.18491
111.94030
111.69560
111.94030
111.94030
112.42960
112.18491
112.18491
111.94030
112.18491
112.18491
111.20631
111.69560
111.69560
111.69560
111.94030
111.94030
112.18491
111.69560
112.18491
111.94030
111.69560
112.18491
110.96170
111.69560
111.20631
111.20631
111.45100
110.22771
109.98310
110.22771
110.71700
110.22771
111.20631
111.45100
111.69560
112.18491
112.18491
112.18491
112.42960
112.67420
112.18491
112.42960
112.18491
112.91890
112.18491
112.42960
111.20631
112.42960
112.42960
112.42960
112.42960
113.16360
112.18491
112.91890
112.91890
112.67420
112.42960
112.42960
112.42960
112.91890
113.16360
112.67420
113.16360
112.91890
112.42960
112.67420
112.91890
112.18491
112.91890
113.16360
112.91890
112.91890
112.91890
112.67420
112.42960
112.42960
113.16360
112.91890
112.67420
113.16360
112.91890
113.16360
112.91890
112.67420
112.91890
112.67420
112.91890
112.91890
112.91890
113.16360
112.91890
112.91890
112.18491
112.42960
112.42960
112.18491
112.91890
112.67420
112.42960
112.42960
112.18491
112.42960
112.67420
112.42960
112.42960
112.18491
112.67420
112.42960
112.42960
112.67420
112.42960
112.42960
112.42960
112.67420
112.91890
113.40820
113.40820
113.40820
112.91890
112.67420
112.67420
112.91890
113.65289
113.89750
114.38680
114.87611
114.87611
115.12080
115.61010
115.36539
115.61010
115.85471
116.09940
116.83331
116.34400
116.58870
116.58870
116.34400
116.83331
116.83331
116.83331
117.32260
116.83331
117.32260
117.56730
117.32260
117.07800
117.32260
117.81200
117.81200
117.81200
118.54590
118.05661
118.05661
117.56730
117.32260
117.81200
118.30130
118.05661
118.54590
118.05661
118.30130
118.05661
118.30130
118.30130
118.30130
118.05661
117.81200
117.32260
118.30130
118.30130
117.81200
117.07800
118.05661
117.81200
117.56730
117.32260
117.32260
117.81200
117.32260
117.81200
117.07800
117.32260
116.83331
117.07800
116.83331
116.83331
117.07800
115.12080
116.58870
116.58870
116.34400
115.85471
116.34400
116.34400
115.85471
116.58870
116.34400
115.61010
115.85471
115.61010
115.85471
115.12080
115.61010
115.61010
115.85471
115.61010
115.36539
114.87611
114.87611
114.63150
114.87611
115.12080
114.63150
114.87611
115.12080
114.63150
114.38680
114.38680
114.87611
114.63150
114.63150
114.63150
114.63150
114.63150
114.14220
113.65289
113.65289
113.89750
113.65289
113.40820
113.40820
113.89750
113.89750
113.89750
113.65289
113.65289
113.89750
113.40820
113.40820
113.65289
113.89750
113.89750
114.14220
113.65289
113.40820
113.40820
113.65289
113.40820
114.14220
113.89750
114.14220
113.65289
113.65289
113.65289
113.89750
113.16360
113.16360
113.89750
113.65289
113.16360
113.65289
113.40820
112.91890
113.16360
113.16360
113.40820
113.40820
113.65289
113.16360
113.40820
113.16360
113.16360
112.91890
112.91890
112.91890
113.65289
113.65289
113.16360
112.91890
112.67420
113.16360
112.91890
112.67420
112.91890
112.91890
112.91890
111.20631
112.91890
113.16360
112.42960
112.67420
113.16360
112.42960
112.67420
112.91890
112.67420
111.20631
112.42960
112.67420
112.42960
113.16360
112.91890
112.67420
112.91890
112.42960
112.67420
112.18491
112.91890
112.42960
112.18491
6. Process or Product Monitoring and Control
6.6. Case Studies in Process Monitoring
6.6.2. Aerosol Particle Size
6.6.2.2. Model Identification
Check for
Stationarity,
Outliers,
Seasonality
The first step in the analysis is to generate a run sequence
plot of the response variable. A run sequence plot can
indicate stationarity (i.e., constant location and scale), the
presence of outliers, and seasonal patterns.
Non-stationarity can often be removed by differencing the
data or fitting some type of trend curve. We would then
attempt to fit a Box-Jenkins model to the differenced data or
to the residuals after fitting a trend curve.
Although Box-Jenkins models can estimate seasonal
components, the analyst needs to specify the seasonal period
(for example, 12 for monthly data). Seasonal components are
common for economic time series. They are less common for
engineering and scientific data.
Run Sequence
Plot
Interpretation
of the Run
Sequence Plot
We can make the following conclusions from the run
sequence plot.
1. The data show strong and positive autocorrelation.
2. There does not seem to be a significant trend or any
obvious seasonal pattern in the data.
The next step is to examine the sample autocorrelations using
the autocorrelation plot.
Autocorrelation
Plot
Interpretation
of the
Autocorrelation
Plot
The autocorrelation plot has a 95% confidence band, which
is constructed based on the assumption that the process is a
moving average process. The autocorrelation plot shows that
the sample autocorrelations are very strong and positive and
decay very slowly.
The autocorrelation plot indicates that the process is non-
stationary and suggests an ARIMA model. The next step is to
difference the data.
Run Sequence
Plot of
Differenced
Data
Interpretation
of the Run
Sequence Plot
The run sequence plot of the differenced data shows that the
mean of the differenced data is around zero, with the
differenced data less autocorrelated than the original data.
The next step is to examine the sample autocorrelations of
the differenced data.
Autocorrelation
Plot of the
Differenced
Data
Interpretation
of the
Autocorrelation
Plot of the
Differenced
Data
The autocorrelation plot of the differenced data with a 95%
confidence band shows that only the autocorrelation at lag 1
is significant. The autocorrelation plot together with run
sequence of the differenced data suggest that the differenced
data are stationary. Based on the autocorrelation plot, an
MA(1) model is suggested for the differenced data.
To examine other possible models, we produce the partial
autocorrelation plot of the differenced data.
Partial
Autocorrelation
Plot of the
Differenced
Data
Interpretation
of the Partial
Autocorrelation
Plot of the
Differenced
Data
The partial autocorrelation plot of the differenced data with
95% confidence bands shows that only the partial
autocorrelations of the first and second lag are significant.
This suggests an AR(2) model for the differenced data.
Akaike
Information
Criterion (AIC
and AICC)
Information-based criteria, such as the AIC or AICC (see
Brockwell and Davis (2002), pp. 171-174), can be used to
automate the choice of an appropriate model. Many software
programs for time series analysis will generate the AIC or
AICC for a broad range of models.
Whatever method is used for model identification, model
diagnostics should be performed on the selected model.
Based on the plots in this section, we will examine the
ARIMA(2,1,0) and ARIMA(0,1,1) models in detail.
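The plots discussed above can be reproduced with a few lines of base R. This is only a sketch: the vector name y and the file name are placeholders, not the Handbook's code.

    y <- scan("aerosol_particle_size.dat")    # hypothetical file holding the series above
    plot(y, type = "l")                       # run sequence plot
    acf(y)                                    # slow decay suggests non-stationarity
    dy <- diff(y)                             # first difference
    plot(dy, type = "l")                      # run sequence plot of the differenced data
    acf(dy)                                   # only lag 1 significant -> MA(1) candidate
    pacf(dy)                                  # lags 1 and 2 significant -> AR(2) candidate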
6. Process or Product Monitoring and Control
6.6. Case Studies in Process Monitoring
6.6.2. Aerosol Particle Size
6.6.2.3. Model Estimation
AR(2)
Model
Parameter
Estimates
The following parameter estimates were computed for the AR(2) model based on the
differenced data.
Source      Parameter   Standard   95 % Confidence
            Estimate    Error      Interval
---------   ---------   --------   ------------------
Intercept    -0.0050     0.0119
AR1          -0.4064     0.0419    (-0.4884, -0.3243)
AR2          -0.1649     0.0419    (-0.2469, -0.0829)
Number of Observations: 558
Degrees of Freedom: 558 - 3 = 555
Residual Standard Deviation: 0.4423
Both AR parameters are significant since the confidence intervals do not contain zero.
The model for the differenced data, Y(t), is an AR(2) model:

    Y(t) = -0.0050 - 0.4064 Y(t-1) - 0.1649 Y(t-2) + a(t)

with residual standard deviation 0.4423, where a(t) denotes the random error term.

It is often more convenient to express the model in terms of the original data, X(t), rather
than the differenced data. From the definition of the difference, Y(t) = X(t) - X(t-1), we can
make the appropriate substitutions into the above equation:

    X(t) - X(t-1) = -0.0050 - 0.4064 [X(t-1) - X(t-2)] - 0.1649 [X(t-2) - X(t-3)] + a(t)

to arrive at the model in terms of the original series:

    X(t) = -0.0050 + 0.5936 X(t-1) + 0.2415 X(t-2) + 0.1649 X(t-3) + a(t)
MA(1)
Model
Parameter
Estimates
Alternatively, the parameter estimates for an MA(1) model based on the differenced data
are the following.
Source      Parameter   Standard   95 % Confidence
            Estimate    Error      Interval
---------   ---------   --------   ------------------
Intercept    -0.0051     0.0114
MA1          -0.3921     0.0366    (-0.4638, -0.3205)
Number of Observations: 558
Degrees of Freedom: 558 - 2 = 556
Residual Standard Deviation: 0.4434
The model for the differenced data, Y(t), is an ARIMA(0,1,1) model:

    Y(t) = -0.0051 - 0.3921 a(t-1) + a(t)

with residual standard deviation 0.4434.

It is often more convenient to express the model in terms of the original data, X(t), rather
than the differenced data. Making the substitution Y(t) = X(t) - X(t-1) in the above equation,
we arrive at the model in terms of the original series:

    X(t) = X(t-1) - 0.0051 - 0.3921 a(t-1) + a(t)
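A minimal R sketch of the estimation step follows. Fitting the AR(2) and MA(1) models to the differenced series keeps the intercept term reported in the tables above; the exact estimates may differ slightly from those shown, depending on the fitting method. The vector y is assumed to hold the original particle-size series.

    dy      <- diff(y)                        # first difference of the original series
    fit_ar2 <- arima(dy, order = c(2, 0, 0))  # AR(2) on the differences = ARIMA(2,1,0)
    fit_ma1 <- arima(dy, order = c(0, 0, 1))  # MA(1) on the differences = ARIMA(0,1,1)
    fit_ar2                                   # coefficients, standard errors, sigma^2
    fit_ma1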
6. Process or Product Monitoring and Control
6.6. Case Studies in Process Monitoring
6.6.2. Aerosol Particle Size
6.6.2.4. Model Validation
Residuals After fitting the model, we should check whether the model
is appropriate.
As with standard non-linear least squares fitting, the primary
tool for model diagnostic checking is residual analysis.
4-Plot of
Residuals from
ARIMA(2,1,0)
Model
The 4-plot is a convenient graphical technique for model
validation in that it tests the assumptions for the residuals on
a single graph.
Interpretation
of the 4-Plot
We can make the following conclusions based on the above
4-plot.
1. The run sequence plot shows that the residuals do not
violate the assumption of constant location and scale. It
also shows that most of the residuals are in the range (-
1, 1).
2. The lag plot indicates that the residuals are not
autocorrelated at lag 1.
3. The histogram and normal probability plot indicate that
the normal distribution provides an adequate fit for this
model.
Autocorrelation
Plot of
Residuals from
ARIMA(2,1,0)
Model
In addition, the autocorrelation plot of the residuals from the
ARIMA(2,1,0) model was generated.
Interpretation
of the
Autocorrelation
Plot
The autocorrelation plot shows that for the first 25 lags, all
sample autocorrelations except those at lags 7 and 18 fall
inside the 95 % confidence bounds indicating the residuals
appear to be random.
Test the
Randomness of
Residuals From
the
ARIMA(2,1,0)
Model Fit
We apply the Box-Ljung test to the residuals from the
ARIMA(2,1,0) model fit to determine whether residuals are
random. In this example, the Box-Ljung test does not reject
the hypothesis that the first 24 lag autocorrelations among
the residuals are zero (p-value = 0.080), indicating that the residuals are random and
that the model provides an adequate fit to the data.
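A short R sketch of these diagnostics follows; it assumes fit_ar2 is the AR(2) fit to the differenced data from the estimation step, and fitdf = 2 discounts the two estimated AR coefficients.

    res <- residuals(fit_ar2)
    acf(res)                                                 # residual autocorrelation plot
    Box.test(res, lag = 24, type = "Ljung-Box", fitdf = 2)   # Box-Ljung randomness test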
4-Plot of
Residuals from
ARIMA(0,1,1)
Model
The 4-plot is a convenient graphical technique for model
validation in that it tests the assumptions for the residuals on
a single graph.
Interpretation
of the 4-Plot
from the
ARIMA(0,1,1)
Model
We can make the following conclusions based on the above
4-plot.
1. The run sequence plot shows that the residuals do not
violate the assumption of constant location and scale. It
also shows that most of the residuals are in the range (-
1, 1).
2. The lag plot indicates that the residuals are not
autocorrelated at lag 1.
3. The histogram and normal probability plot indicate that
the normal distribution provides an adequate fit for this
model.
This 4-plot of the residuals indicates that the fitted model is
adequate for the data.
Autocorrelation
Plot of
Residuals from
ARIMA(0,1,1)
Model
The autocorrelation plot of the residuals from ARIMA(0,1,1)
was generated.
Interpretation
of the
Autocorrelation
Plot
Similar to the result for the ARIMA(2,1,0) model, it shows
that for the first 25 lags, all sample autocorrelations except
those at lags 7 and 18 fall inside the 95 % confidence bounds,
indicating that the residuals appear to be random.
Test the
Randomness of
Residuals From
the
ARIMA(0,1,1)
Model Fit
The Box-Ljung test is also applied to the residuals from the
ARIMA(0,1,1) model. The test indicates that there is at least
one non-zero autocorrelation among the first 24 lags. We
conclude that there is not enough evidence to claim that the
residuals are random (p-value = 0.026).
Summary Overall, the ARIMA(0,1,1) is an adequate model. However,
the ARIMA(2,1,0) is a little better than the ARIMA(0,1,1).
6. Process or Product Monitoring and Control
6.6. Case Studies in Process Monitoring
6.6.2. Aerosol Particle Size
6.6.2.5. Work This Example Yourself
View
Dataplot
Macro for
this Case
Study
This page allows you to repeat the analysis outlined in the
case study description on the previous page using Dataplot. It
is required that you have already downloaded and installed
Dataplot and configured your browser to run Dataplot. Output
from each analysis step below will be displayed in one or
more of the Dataplot windows. The four main windows are the
Output window, the Graphics window, the Command History
window, and the Data Sheet window. Across the top of the
main windows there are menus for executing Dataplot
commands. Across the bottom is a command entry window
where commands can be typed in.
Data Analysis Steps
Results and
Conclusions
Click on the links below to start Dataplot and
run this case study yourself. Each step may use
results from previous steps, so please be patient.
Wait until the software verifies that the current
step is complete before clicking on the next step.
The links in this column
will connect you with
more detailed
information about each
analysis step from the
case study description.
1. Invoke Dataplot and read data.
1. Read in the data. 1. You have read
one column of numbers
into Dataplot,
variable Y.
2. Model identification plots

   1. Run sequence plot of Y.
   2. Autocorrelation plot of Y.
   3. Run sequence plot of the differenced data of Y.
   4. Autocorrelation plot of the differenced data of Y.
   5. Partial autocorrelation plot of the differenced data of Y.

   Results:

   1. The run sequence plot shows that the data show strong and
      positive autocorrelation.
   2. The autocorrelation plot indicates significant autocorrelation
      and that the data are not stationary.
   3. The run sequence plot shows that the differenced data appear
      to be stationary and do not exhibit seasonality.
   4. The autocorrelation plot of the differenced data suggests an
      ARIMA(0,1,1) model may be appropriate.
   5. The partial autocorrelation plot suggests an ARIMA(2,1,0)
      model may be appropriate.
3. Estimate the model.
1. ARIMA(2,1,0) fit of Y.
2. ARIMA(0,1,1) fit of Y.
1. The ARMA fit
generates parameter
estimates for the
ARIMA(2,1,0)
model.
2. The ARMA fit
generates parameter
estimates for the
ARIMA(0,1,1)
model.
4. Model validation.
1. Generate a 4-plot of the
residuals from the ARIMA(2,1,0)
model.
2. Generate an autocorrelation plot
of the residuals from the
ARIMA(2,1,0) model.
3. Perform a Ljung-Box test of
randomness for the residuals from
the ARIMA(2,1,0) model.
4. Generate a 4-plot of the
residuals from the ARIMA(0,1,1)
model.
5. Generate an autocorrelation plot
of the residuals from the
ARIMA(0,1,1) model.
6. Perform a Ljung-Box test of
randomness for the residuals from
the ARIMA(0,1,1) model.
1. The 4-plot shows
that the
assumptions for
the residuals
are satisfied.
2. The
autocorrelation plot
of the
residuals
indicates that the
residuals are
random.
3. The Ljung-Box
test indicates
that the
residuals are
random.
4. The 4-plot shows
that the
assumptions for
the residuals
are satisfied.
5. The
autocorrelation plot
of the
residuals
indicates that the
residuals are
random.
6. The Ljung-Box test indicates
   that the residuals are not
   random at the 0.05
   significance level, but are
   random at the 0.01 level.
6. Process or Product Monitoring and Control
6.7. References
Selected References
Time Series Analysis
Abraham, B. and Ledolter, J. (1983). Statistical Methods for Forecasting,
Wiley, New York, NY.
Box, G. E. P., Jenkins, G. M., and Reinsel, G. C. (1994). Time Series
Analysis, Forecasting and Control, 3rd ed., Prentice Hall, Englewood Cliffs,
NJ.
Box, G. E. P. and MacGregor, J. F. (1974). "The Analysis of Closed-Loop
Dynamic Stochastic Systems", Technometrics, Vol. 16-3.
Brockwell, Peter J. and Davis, Richard A. (1987). Time Series: Theory and
Methods, Springer-Verlag.
Brockwell, Peter J. and Davis, Richard A. (2002). Introduction to Time
Series and Forecasting, 2nd ed., Springer-Verlag.
Chatfield, C. (1996). The Analysis of Time Series, 5th ed., Chapman & Hall,
New York, NY.
DeLurgio, S. A. (1998). Forecasting Principles and Applications, Irwin
McGraw-Hill, Boston, MA.
Ljung, G. and Box, G. (1978). "On a Measure of Lack of Fit in Time Series
Models", Biometrika, 65, 297-303.
Nelson, C. R. (1973). Applied Time Series Analysis for Managerial
Forecasting, Holden-Day, Boca-Raton, FL.
Makridakis, S., Wheelwright, S. C. and McGee, V. E. (1983). Forecasting:
Methods and Applications, 2nd ed., Wiley, New York, NY.
Statistical Process and Quality Control
Army Chemical Corps (1953). Master Sampling Plans for Single, Duplicate,
Double and Multiple Sampling, Manual No. 2.
Bissell, A. F. (1990). "How Reliable is Your Capability Index?", Applied
Statistics, 39, 331-340.
Champ, C.W., and Woodall, W.H. (1987). "Exact Results for Shewhart
Control Charts with Supplementary Runs Rules", Technometrics, 29, 393-
399.
Duncan, A. J. (1986). Quality Control and Industrial Statistics, 5th ed.,
Irwin, Homewood, IL.
Hotelling, H. (1947). Multivariate Quality Control. In C. Eisenhart, M. W.
Hastay, and W. A. Wallis, eds. Techniques of Statistical Analysis. New
York: McGraw-Hill.
Juran, J. M. (1997). "Early SQC: A Historical Supplement", Quality
Progress, 30(9) 73-81.
Montgomery, D. C. (2000). Introduction to Statistical Quality Control, 4th
ed., Wiley, New York, NY.
Kotz, S. and Johnson, N. L. (1992). Process Capability Indices, Chapman &
Hall, London.
Lowry, C. A., Woodall, W. H., Champ, C. W., and Rigdon, S. E. (1992). "A
Multivariate Exponentially Weighted Moving Average Chart",
Technometrics, 34, 46-53.
Lucas, J. M. and Saccucci, M. S. (1990). "Exponentially weighted moving
average control schemes: Properties and enhancements", Technometrics 32,
1-29.
Ott, E. R. and Schilling, E. G. (1990). Process Quality Control, 2nd ed.,
McGraw-Hill, New York, NY.
Quesenberry, C. P. (1993). "The effect of sample size on estimated limits for
X-bar and X control charts", Journal of Quality Technology, 25(4), 237-247.
Ryan, T.P. (2000). Statistical Methods for Quality Improvement, 2nd ed.,
Wiley, New York, NY.
Ryan, T. P. and Schwertman, N. C. (1997). "Optimal limits for attributes
control charts", Journal of Quality Technology, 29 (1), 86-98.
Schilling, E. G. (1982). Acceptance Sampling in Quality Control, Marcel
Dekker, New York, NY.
Tracy, N. D., Young, J. C. and Mason, R. L. (1992). "Multivariate Control
Charts for Individual Observations", Journal of Quality Technology, 24(2),
88-95.
Woodall, W. H. (1997). "Control Charting Based on Attribute Data:
Bibliography and Review", Journal of Quality Technology, 29, 172-183.
Woodall, W. H., and Adams, B. M. (1993); "The Statistical Design of
CUSUM Charts", Quality Engineering, 5(4), 559-570.
Zhang, Stenback, and Wardrop (1990). "Interval Estimation of the Process
Capability Index", Communications in Statistics: Theory and Methods,
19(21), 4455-4470.
Statistical Analysis
Anderson, T. W. (1984). Introduction to Multivariate Statistical Analysis,
2nd ed., Wiley New York, NY.
Johnson, R. A. and Wichern, D. W. (1998). Applied Multivariate Statistical
Analysis, Fourth Ed., Prentice Hall, Upper Saddle River, NJ.
7. Product and Process
Comparisons
This chapter presents the background and specific analysis techniques
needed to compare the performance of one or more processes against known
standards or one another.
1. Introduction
1. Scope
2. Assumptions
3. Statistical Tests
4. Confidence Intervals
5. Equivalence of Tests and
Intervals
6. Outliers
7. Trends
2. Comparisons: One Process
1. Comparing to a Distribution
2. Comparing to a Nominal
Mean
3. Comparing to Nominal
Variability
4. Fraction Defective
5. Defect Density
6. Location of Population
Values
3. Comparisons: Two Processes
1. Means: Normal Data
2. Variability: Normal Data
3. Fraction Defective
4. Failure Rates
5. Means: General Case
4. Comparisons: Three +
Processes
1. Comparing Populations
2. Comparing Variances
3. Comparing Means
4. Variance Components
5. Comparing Categorical
Datasets
6. Comparing Fraction
Defectives
7. Multiple Comparisons
Detailed table of contents
References for Chapter 7
7. Product and Process Comparisons - Detailed Table of Contents [7.]
1. Introduction [7.1.]
1. What is the scope? [7.1.1.]
2. What assumptions are typically made? [7.1.2.]
3. What are statistical tests? [7.1.3.]
1. Critical values and p values [7.1.3.1.]
4. What are confidence intervals? [7.1.4.]
5. What is the relationship between a test and a confidence interval? [7.1.5.]
6. What are outliers in the data? [7.1.6.]
7. What are trends in sequential process or product data? [7.1.7.]
2. Comparisons based on data from one process [7.2.]
1. Do the observations come from a particular distribution? [7.2.1.]
1. Chi-square goodness-of-fit test [7.2.1.1.]
2. Kolmogorov- Smirnov test [7.2.1.2.]
3. Anderson-Darling and Shapiro-Wilk tests [7.2.1.3.]
2. Are the data consistent with the assumed process mean? [7.2.2.]
1. Confidence interval approach [7.2.2.1.]
2. Sample sizes required [7.2.2.2.]
3. Are the data consistent with a nominal standard deviation? [7.2.3.]
1. Confidence interval approach [7.2.3.1.]
2. Sample sizes required [7.2.3.2.]
4. Does the proportion of defectives meet requirements? [7.2.4.]
1. Confidence intervals [7.2.4.1.]
2. Sample sizes required [7.2.4.2.]
5. Does the defect density meet requirements? [7.2.5.]
6. What intervals contain a fixed percentage of the population values? [7.2.6.]
1. Approximate intervals that contain most of the population values [7.2.6.1.]
2. Percentiles [7.2.6.2.]
3. Tolerance intervals for a normal distribution [7.2.6.3.]
4. Tolerance intervals based on the largest and smallest observations [7.2.6.4.]
3. Comparisons based on data from two processes [7.3.]
1. Do two processes have the same mean? [7.3.1.]
1. Analysis of paired observations [7.3.1.1.]
2. Confidence intervals for differences between means [7.3.1.2.]
2. Do two processes have the same standard deviation? [7.3.2.]
3. How can we determine whether two processes produce the same proportion of defectives? [7.3.3.]
4. Assuming the observations are failure times, are the failure rates (or Mean Times To Failure) for two
distributions the same? [7.3.4.]
5. Do two arbitrary processes have the same central tendency? [7.3.5.]
4. Comparisons based on data from more than two processes [7.4.]
1. How can we compare several populations with unknown distributions (the Kruskal-Wallis
test)? [7.4.1.]
2. Assuming the observations are normal, do the processes have the same variance? [7.4.2.]
3. Are the means equal? [7.4.3.]
1. 1-Way ANOVA overview [7.4.3.1.]
2. The 1-way ANOVA model and assumptions [7.4.3.2.]
3. The ANOVA table and tests of hypotheses about means [7.4.3.3.]
4. 1-Way ANOVA calculations [7.4.3.4.]
5. Confidence intervals for the difference of treatment means [7.4.3.5.]
6. Assessing the response from any factor combination [7.4.3.6.]
7. The two-way ANOVA [7.4.3.7.]
8. Models and calculations for the two-way ANOVA [7.4.3.8.]
4. What are variance components? [7.4.4.]
5. How can we compare the results of classifying according to several categories? [7.4.5.]
6. Do all the processes have the same proportion of defects? [7.4.6.]
7. How can we make multiple comparisons? [7.4.7.]
1. Tukey's method [7.4.7.1.]
2. Scheffe's method [7.4.7.2.]
3. Bonferroni's method [7.4.7.3.]
4. Comparing multiple proportions: The Marascuillo procedure [7.4.7.4.]
5. References [7.5.]
7. Product and Process Comparisons
7.1. Introduction
Goals of
this
section
The primary goal of this section is to lay a foundation for
understanding statistical tests and confidence intervals that are
useful for making decisions about processes and comparisons
among processes. The materials covered are:
Scope
Assumptions
Introduction to hypothesis testing
Introduction to confidence intervals
Relationship between hypothesis testing and confidence
intervals
Outlier detection
Detection of sequential trends in data or processes
Hypothesis
testing and
confidence
intervals
This chapter explores the types of comparisons which can be
made from data and explains hypothesis testing, confidence
intervals, and the interpretation of each.
7. Product and Process Comparisons
7.1. Introduction
7.1.1. What is the scope?
Data from
one
process
This section deals with introductory material related to
comparisons that can be made on data from one process for
cases where the process standard deviation may be known or
unknown.
7. Product and Process Comparisons
7.1. Introduction
7.1.2. What assumptions are typically made?
Validity of
tests
The validity of the tests described in this chapter depends
on the following assumptions:
1. The data come from a single process that can be
represented by a single statistical distribution.
2. The distribution is a normal distribution.
3. The data are uncorrelated over time.
An easy method for checking the assumption of a single
normal distribution is to construct a histogram of the data.
Clarification The tests described in this chapter depend on the
assumption of normality, and the data should be examined
for departures from normality before the tests are applied.
However, the tests are robust to small departures from
normality; i.e., they work fairly well as long as the data are
bell-shaped and the tails are not heavy. Quantitative
methods for checking the normality assumption are
discussed in the next section.
Another graphical method for testing the normality
assumption is the normal probability plot.
A graphical method for testing for correlation among
measurements is a time-lag plot. Correlation may not be a
problem if measurements are properly structured over time.
Correlation problems often occur when measurements are
made close together in time.
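In R, these graphical checks take one line each; the vector name x is a placeholder for the measurements.

    hist(x)                            # look for a single, roughly bell-shaped distribution
    qqnorm(x); qqline(x)               # normal probability plot
    plot(head(x, -1), tail(x, -1),     # time-lag plot: x[t] versus x[t+1]
         xlab = "x[t]", ylab = "x[t+1]")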
7. Product and Process Comparisons
7.1. Introduction
7.1.3. What are statistical tests?
What is
meant by a
statistical
test?
A statistical test provides a mechanism for making
quantitative decisions about a process or processes. The
intent is to determine whether there is enough evidence to
"reject" a conjecture or hypothesis about the process. The
conjecture is called the null hypothesis. Not rejecting may be
a good result if we want to continue to act as if we "believe"
the null hypothesis is true. Or it may be a disappointing
result, possibly indicating we may not yet have enough data
to "prove" something by rejecting the null hypothesis.
For more discussion about the meaning of a statistical
hypothesis test, see Chapter 1.
Concept of
null
hypothesis
A classic use of a statistical test occurs in process control
studies. For example, suppose that we are interested in
ensuring that photomasks in a production process have mean
linewidths of 500 micrometers. The null hypothesis, in this
case, is that the mean linewidth is 500 micrometers. Implicit
in this statement is the need to flag photomasks which have
mean linewidths that are either much greater or much less
than 500 micrometers. This translates into the alternative
hypothesis that the mean linewidths are not equal to 500
micrometers. This is a two-sided alternative because it guards
against alternatives in opposite directions; namely, that the
linewidths are too small or too large.
The testing procedure works this way. Linewidths at random
positions on the photomask are measured using a scanning
electron microscope. A test statistic is computed from the
data and tested against pre-determined upper and lower
critical values. If the test statistic is greater than the upper
critical value or less than the lower critical value, the null
hypothesis is rejected because there is evidence that the mean
linewidth is not 500 micrometers.
One-sided
tests of
hypothesis
Null and alternative hypotheses can also be one-sided. For
example, to ensure that a lot of light bulbs has a mean
lifetime of at least 500 hours, a testing program is
implemented. The null hypothesis, in this case, is that the
mean lifetime is greater than or equal to 500 hours. The
complement or alternative hypothesis that is being guarded
against is that the mean lifetime is less than 500 hours. The
test statistic is compared with a lower critical value, and if it
is less than this limit, the null hypothesis is rejected.
Thus, a statistical test requires a pair of hypotheses; namely,
H0: a null hypothesis
Ha: an alternative hypothesis.
Significance
levels
The null hypothesis is a statement about a belief. We may
doubt that the null hypothesis is true, which might be why we
are "testing" it. The alternative hypothesis might, in fact, be
what we believe to be true. The test procedure is constructed
so that the risk of rejecting the null hypothesis, when it is in
fact true, is small. This risk, α, is often referred to as the
significance level of the test. By having a test with a small
value of α, we feel that we have actually "proved" something
when we reject the null hypothesis.
Errors of
the second
kind
The risk of failing to reject the null hypothesis when it is in
fact false is not chosen by the user but is determined, as one
might expect, by the magnitude of the real discrepancy. This
risk, β, is usually referred to as the error of the second kind.
Large discrepancies between reality and the null hypothesis
are easier to detect and lead to small errors of the second
kind, while small discrepancies are more difficult to detect
and lead to large errors of the second kind. Also, the β risk
increases as the α risk decreases. The risks of errors of the
second kind are usually summarized by an operating
characteristic curve (OC) for the test. OC curves for several
types of tests are shown in (Natrella, 1962).
Guidance in
this chapter
This chapter gives methods for constructing test statistics and
their corresponding critical values for both one-sided and
two-sided tests for the specific situations outlined under the
scope. It also provides guidance on the sample sizes required
for these tests.
Further guidance on statistical hypothesis testing,
significance levels and critical regions, is given in Chapter 1.
7. Product and Process Comparisons
7.1. Introduction
7.1.3. What are statistical tests?
7.1.3.1. Critical values and p values
Determination
of critical
values
Critical values for a test of hypothesis depend upon a test
statistic, which is specific to the type of test, and the
significance level, , which defines the sensitivity of the
test. A value of = 0.05 implies that the null hypothesis is
rejected 5% of the time when it is in fact true. The choice
of is somewhat arbitrary, although in practice values of
0.1, 0.05, and 0.01 are common. Critical values are
essentially cut-off values that define regions where the test
statistic is unlikely to lie; for example, a region where the
critical value is exceeded with probability if the null
hypothesis is true. The null hypothesis is rejected if the test
statistic lies within this region which is often referred to as
the rejection region(s). Critical values for specific tests of
hypothesis are tabled in chapter 1.
Information in
this chapter
This chapter gives formulas for the test statistics and points
to the appropriate tables of critical values for tests of
hypothesis regarding means, standard deviations, and
proportion defectives.
P values Another quantitative measure for reporting the result of a
test of hypothesis is the p-value. The p-value is the
probability of the test statistic being at least as extreme as
the one observed given that the null hypothesis is true. A
small p-value is an indication that the null hypothesis is
false.
Good practice It is good practice to decide in advance of the test how
small a p-value is required to reject the null hypothesis. This
is exactly analogous to choosing a significance level, α, for
the test. For example, we decide either to reject the null
hypothesis if the test statistic exceeds the critical value (for
α = 0.05) or, analogously, to reject the null hypothesis if the
p-value is smaller than 0.05. It is important to understand the
relationship between the two concepts because some
statistical software packages report p-values rather than
critical values.
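The equivalence is easy to see numerically. The following R sketch uses an illustrative (hypothetical) test statistic for a two-sided z test.

    z     <- 2.17                        # hypothetical observed test statistic
    alpha <- 0.05
    crit  <- qnorm(1 - alpha / 2)        # upper critical value, 1.96
    p_val <- 2 * (1 - pnorm(abs(z)))     # two-sided p-value
    abs(z) > crit                        # TRUE: test statistic exceeds the critical value
    p_val < alpha                        # TRUE: equivalently, the p-value is below alpha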
7. Product and Process Comparisons
7.1. Introduction
7.1.4. What are confidence intervals?
How do we
form a
confidence
interval?
The purpose of taking a random sample from a lot or
population and computing a statistic, such as the mean from
the data, is to approximate the mean of the population. How
well the sample statistic estimates the underlying population
value is always an issue. A confidence interval addresses this
issue because it provides a range of values which is likely to
contain the population parameter of interest.
Confidence
levels
Confidence intervals are constructed at a confidence level,
such as 95%, selected by the user. What does this mean? It
means that if the same population is sampled on numerous
occasions and interval estimates are made on each occasion,
the resulting intervals would bracket the true population
parameter in approximately 95 % of the cases. A confidence
level stated as 1 - α can be thought of as the inverse of a
significance level, α.
One and
two-sided
confidence
intervals
In the same way that statistical tests can be one or two-sided,
confidence intervals can be one or two-sided. A two-sided
confidence interval brackets the population parameter from
above and below. A one-sided confidence interval brackets
the population parameter either from above or below and
furnishes an upper or lower bound to its magnitude.
Example of
a two-
sided
confidence
interval
For example, a 100(1 - α) % confidence interval for the mean
of a normal population is

    x̄ ± z(1-α/2) · σ/√N ,

where x̄ is the sample mean, z(1-α/2) is the 1 - α/2 critical value
of the standard normal distribution (found in the table of the
standard normal distribution), σ is the known population
standard deviation, and N is the sample size.
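For illustration, a short R sketch of this interval with made-up numbers (the sample mean, standard deviation, and sample size below are hypothetical):

    xbar  <- 500.2       # sample mean
    sigma <- 2.0         # known population standard deviation
    N     <- 25
    alpha <- 0.05
    half  <- qnorm(1 - alpha / 2) * sigma / sqrt(N)
    c(lower = xbar - half, upper = xbar + half)   # 95 % confidence interval for the mean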
Guidance
in this
chapter
This chapter provides methods for estimating the population
parameters and confidence intervals for the situations
described under the scope.
Problem
with
unknown
standard
deviation
In the normal course of events, population standard deviations
are not known, and must be estimated from the data.
Confidence intervals, given the same confidence level, are by
necessity wider if the standard deviation is estimated from
limited data because of the uncertainty in this estimate.
Procedures for creating confidence intervals in this situation
are described fully in this chapter.
More information on confidence intervals can also be found in
Chapter 1.
7. Product and Process Comparisons
7.1. Introduction
7.1.5. What is the relationship between a test
and a confidence interval?
There is a
correspondence
between
hypothesis
testing and
confidence
intervals
In general, for every test of hypothesis there is an
equivalent statement about whether the hypothesized
parameter value is included in a confidence interval. For
example, consider the previous example of linewidths
where photomasks are tested to ensure that their
linewidths have a mean of 500 micrometers. The null and
alternative hypotheses are:
H0: mean linewidth = 500 micrometers
Ha: mean linewidth ≠ 500 micrometers
Hypothesis test
for the mean For the test, the sample mean, x̄, is calculated from N
linewidths chosen at random positions on each
photomask. For the purpose of the test, it is assumed that
the standard deviation, σ, is known from a long history
of this process. A test statistic is calculated from these
sample statistics, and the null hypothesis is rejected if

    x̄ < 500 + z(α/2) · σ/√N    or    x̄ > 500 + z(1-α/2) · σ/√N ,

where z(α/2) and z(1-α/2) are tabled values from the normal
distribution.
Equivalent
confidence
interval
With some algebra, it can be seen that the null hypothesis
is rejected if and only if the value 500 micrometers is not
in the confidence interval

    ( x̄ + z(α/2) · σ/√N ,  x̄ + z(1-α/2) · σ/√N ).
Equivalent
confidence
interval
In fact, all values bracketed by this interval would be
accepted as null values for a given set of test data.
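A small R sketch makes the correspondence concrete for the linewidth example; the sample values below are hypothetical.

    xbar  <- 499.1; sigma <- 2.0; N <- 25; alpha <- 0.05
    z     <- (xbar - 500) / (sigma / sqrt(N))          # test statistic for H0: mean = 500
    half  <- qnorm(1 - alpha / 2) * sigma / sqrt(N)
    reject_by_test <- abs(z) > qnorm(1 - alpha / 2)    # two-sided z test decision
    outside_ci     <- (500 < xbar - half) | (500 > xbar + half)
    c(reject_by_test, outside_ci)                      # the two decisions always agree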
7. Product and Process Comparisons
7.1. Introduction
7.1.6. What are outliers in the data?
Definition
of outliers
An outlier is an observation that lies an abnormal distance from other values in a
random sample from a population. In a sense, this definition leaves it up to the
analyst (or a consensus process) to decide what will be considered abnormal. Before
abnormal observations can be singled out, it is necessary to characterize normal
observations.
Ways to
describe
data
Two activities are essential for characterizing a set of data:
1. Examination of the overall shape of the graphed data for important features,
including symmetry and departures from assumptions. The chapter on
Exploratory Data Analysis (EDA) discusses assumptions and summarization
of data in detail.
2. Examination of the data for unusual observations that are far removed from
the mass of data. These points are often referred to as outliers. Two graphical
techniques for identifying outliers, scatter plots and box plots, along with an
analytic procedure for detecting outliers when the distribution is normal
(Grubbs' Test), are also discussed in detail in the EDA chapter.
Box plot
construction
The box plot is a useful graphical display for describing the behavior of the data in
the middle as well as at the ends of the distributions. The box plot uses the median
and the lower and upper quartiles (defined as the 25th and 75th percentiles). If the
lower quartile is Q1 and the upper quartile is Q3, then the difference (Q3 - Q1) is
called the interquartile range or IQ.
Box plots
with fences
A box plot is constructed by drawing a box between the upper and lower quartiles
with a solid line drawn across the box to locate the median. The following quantities
(called fences) are needed for identifying extreme values in the tails of the
distribution:
1. lower inner fence: Q1 - 1.5*IQ
2. upper inner fence: Q3 + 1.5*IQ
3. lower outer fence: Q1 - 3*IQ
4. upper outer fence: Q3 + 3*IQ
Outlier
detection
criteria
A point beyond an inner fence on either side is considered a mild outlier. A point
beyond an outer fence is considered an extreme outlier.
Example of
an outlier
The data set of N = 90 ordered observations as shown below is examined for
outliers:
30, 171, 184, 201, 212, 250, 265, 270, 272, 289, 305, 306, 322, 322, 336, 346, 351,
370, 390, 404, 409, 411, 436, 437, 439, 441, 444, 448, 451, 453, 470, 480, 482, 487,
494, 495, 499, 503, 514, 521, 522, 527, 548, 550, 559, 560, 570, 572, 574, 578, 585,
592, 592, 607, 616, 618, 621, 629, 637, 638, 640, 656, 668, 707, 709, 719, 737, 739,
752, 758, 766, 792, 792, 794, 802, 818, 830, 832, 843, 858, 860, 869, 918, 925, 953,
991, 1000, 1005, 1068, 1441
The computations are as follows:
Median = (n+1)/2 largest data point = the average of the 45th and 46th
ordered points = (559 + 560)/2 = 559.5
Lower quartile = .25(N+1)th ordered point = 22.75th ordered point = 411 +
.75(436-411) = 429.75
Upper quartile = .75(N+1)th ordered point = 68.25th ordered point = 739
+.25(752-739) = 742.25
Interquartile range = 742.25 - 429.75 = 312.5
Lower inner fence = 429.75 - 1.5 (312.5) = -39.0
Upper inner fence = 742.25 + 1.5 (312.5) = 1211.0
Lower outer fence = 429.75 - 3.0 (312.5) = -507.75
Upper outer fence = 742.25 + 3.0 (312.5) = 1679.75
From an examination of the fence points and the data, one point (1441) exceeds the
upper inner fence and stands out as a mild outlier; there are no extreme outliers.
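The same computations can be reproduced in R. The sketch below assumes the 90 ordered observations listed above are in a numeric vector x, and uses quantile type 6, which implements the (N+1)p interpolation rule used here.

    q1 <- unname(quantile(x, 0.25, type = 6))   # 429.75
    q3 <- unname(quantile(x, 0.75, type = 6))   # 742.25
    iq <- q3 - q1                               # interquartile range, 312.5
    fences <- c(lower_outer = q1 - 3.0 * iq, lower_inner = q1 - 1.5 * iq,
                upper_inner = q3 + 1.5 * iq, upper_outer = q3 + 3.0 * iq)
    fences                                      # -507.75, -39.0, 1211.0, 1679.75
    x[x < fences["lower_inner"] | x > fences["upper_inner"]]   # 1441 is a mild outlier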
Histogram
with box
plot
A histogram with an overlaid box plot is shown below.
The outlier is identified as the largest value in the data set, 1441, and appears as the
circle to the right of the box plot.
Outliers
may contain
important
information
Outliers should be investigated carefully. Often they contain valuable information
about the process under investigation or the data gathering and recording process.
Before considering the possible elimination of these points from the data, one should
try to understand why they appeared and whether it is likely similar values will
continue to appear. Of course, outliers are often bad data points.
7. Product and Process Comparisons
7.1. Introduction
7.1.7. What are trends in sequential process or
product data?
Detecting
trends by
plotting
the data
points to
see if a
line with
an
obviously
non-zero
slope fits
the points
Detecting trends is equivalent to comparing the process values
to what we would expect a series of numbers to look like if
there were no trends. If we see a significant departure from a
model where the next observation is equally likely to go up or
down, then we would reject the hypothesis of "no trend".
A common way of investigating for trends is to fit a straight
line to the data and observe the line's direction (or slope). If
the line looks horizontal, then there is no evidence of a trend;
otherwise there is. Formally, this is done by testing whether
the slope of the line is significantly different from zero. The
methodology for this is covered in Chapter 4.
Other
trend tests
A non-parametric approach for detecting significant trends
known as the Reverse Arrangement Test is described in
Chapter 8.
7. Product and Process Comparisons
7.2. Comparisons based on data from one
process
Questions
answered in
this section
For a single process, the current state of the process can be
compared with a nominal or hypothesized state. This
section outlines techniques for answering the following
questions from data gathered from a single process:
1. Do the observations come from a particular
distribution?
1. Chi-Square Goodness-of-Fit test for a
continuous or discrete distribution
2. Kolmogorov-Smirnov test for a continuous
distribution
3. Anderson-Darling and Shapiro-Wilk tests for
a continuous distribution
2. Are the data consistent with the assumed process
mean?
1. Confidence interval approach
2. Sample sizes required
3. Are the data consistent with a nominal standard
deviation?
1. Confidence interval approach
2. Sample sizes required
4. Does the proportion of defectives meet
requirements?
1. Confidence intervals
2. Sample sizes required
5. Does the defect density meet requirements?
6. What intervals contain a fixed percentage of the
data?
1. Approximate intervals that contain most of the
population values
2. Percentiles
3. Tolerance intervals
4. Tolerance intervals based on the smallest and
largest observations
General forms
of testing
These questions are addressed either by an hypothesis test
or by a confidence interval.
Parametric vs. non-parametric testing
All hypothesis-testing procedures can be broadly described
as either parametric or non-parametric/distribution-free.
Parametric test procedures are those that:
1. Involve hypothesis testing of specified parameters
(such as "the population mean=50 grams"...).
2. Require a stringent set of assumptions about the
underlying sampling distributions.
When to use
nonparametric
methods?
When do we require non-parametric or distribution-free
methods? Here are a few circumstances that may be
candidates:
1. The measurements are only categorical; i.e., they are
nominally scaled, or ordinally (in ranks) scaled.
2. The assumptions underlying the use of parametric
methods cannot be met.
3. The situation at hand requires an investigation of
such features as randomness, independence,
symmetry, or goodness of fit rather than the testing
of hypotheses about specific values of particular
population parameters.
Difference
between non-
parametric
and
distribution-
free
Some authors distinguish between non-parametric and
distribution-free procedures.
Distribution-free test procedures are broadly defined as:
1. Those whose test statistic does not depend on the
form of the underlying population distribution from
which the sample data were drawn, or
2. Those for which the data are nominally or ordinally
scaled.
Nonparametric test procedures are defined as those that
are not concerned with the parameters of a distribution.
Advantages of
nonparametric
methods.
Distribution-free or nonparametric methods have several
advantages, or benefits:
1. They may be used on all types of data: categorical data, which
are nominally scaled or are in rank form (ordinally scaled), as
well as interval- or ratio-scaled data.
2. For small sample sizes they are easy to apply.
3. They make fewer and less stringent assumptions
than their parametric counterparts.
4. Depending on the particular procedure they may be
almost as powerful as the corresponding parametric
procedure when the assumptions of the latter are
met, and when this is not the case, they are generally
more powerful.
Disadvantages
of
nonparametric
methods
Of course there are also disadvantages:
1. If the assumptions of the parametric methods can be
met, it is generally more efficient to use them.
2. For large sample sizes, data manipulations tend to
become more laborious, unless computer software is
available.
3. Often special tables of critical values are needed for
the test statistic, and these values cannot always be
generated by computer software. On the other hand,
the critical values for the parametric tests are readily
available and generally easy to incorporate in
computer programs.
7.2.1. Do the observations come from a
particular distribution?
Data are
often
assumed to
come from
a particular
distribution.
Goodness-of-fit tests indicate whether or not it is reasonable
to assume that a random sample comes from a specific
distribution. Statistical techniques often rely on observations
having come from a population that has a distribution of a
specific form (e.g., normal, lognormal, Poisson, etc.).
Standard control charts for continuous measurements, for
instance, require that the data come from a normal
distribution. Accurate lifetime modeling requires specifying
the correct distributional model. There may be historical or
theoretical reasons to assume that a sample comes from a
particular population, as well. Past data may have
consistently fit a known distribution, for example, or theory
may predict that the underlying population should be of a
specific form.
Hypothesis
Test model
for
Goodness-
of-fit
Goodness-of-fit tests are a form of hypothesis testing where
the null and alternative hypotheses are
H0: Sample data come from the stated distribution.
HA: Sample data do not come from the stated distribution.
Parameters
may be
assumed or
estimated
from the
data
One needs to consider whether a simple or composite
hypothesis is being tested. For a simple hypothesis, values of
the distribution's parameters are specified prior to drawing
the sample. For a composite hypothesis, one or more of the
parameters is unknown. Often, these parameters are estimated
using the sample observations.
A simple hypothesis would be:
H0: Data are from a normal distribution with μ = 0 and σ = 1.
A composite hypothesis would be:
H0: Data are from a normal distribution with μ and σ unknown.
Composite hypotheses are more common because they allow
us to decide whether a sample comes from any distribution of
a specific type. In this situation, the form of the distribution
is of interest, regardless of the values of the parameters.
Unfortunately, composite hypotheses are more difficult to
work with because the critical values are often hard to
compute.
Problems
with
censored
data
A second issue that affects a test is whether the data are
censored. When data are censored, sample values are in some
way restricted. Censoring occurs if the range of potential
values is limited such that values from one or both tails of
the distribution are unavailable (e.g., right and/or left
censoring - where high and/or low values are missing).
Censoring frequently occurs in reliability testing, when either
the testing time or the number of failures to be observed is
fixed in advance. A thorough treatment of goodness-of-fit
testing under censoring is beyond the scope of this document.
See D'Agostino & Stephens (1986) for more details.
Three types
of tests will
be covered
Three goodness-of-fit tests are examined in detail:
1. Chi-square test for continuous and discrete
distributions;
2. Kolmogorov-Smirnov test for continuous distributions
based on the empirical distribution function (EDF);
3. Anderson-Darling test for continuous distributions.
A more extensive treatment of goodness-of-fit techniques is
presented in D'Agostino & Stephens (1986). Along with the
tests mentioned above, other general and specific tests are
examined, including tests based on regression and graphical
techniques.
7.2.1.1. Chi-square goodness-of-fit test
Choice of number of groups for "Goodness of Fit" tests is important - but only useful rules of thumb can be given
The test requires that the data first be grouped. The actual
number of observations in each group is compared to the
expected number of observations and the test statistic is
calculated as a function of this difference. The number of
groups and how group membership is defined will affect the
power of the test (i.e., how sensitive it is to detecting
departures from the null hypothesis). Power will not only be
affected by the number of groups and how they are defined,
but by the sample size and shape of the null and underlying
(true) distributions. Despite the lack of a clear "best
method", some useful rules of thumb can be given.
Group
Membership
When data are discrete, group membership is unambiguous.
Tabulation or cross tabulation can be used to categorize the
data. Continuous data present a more difficult challenge.
One defines groups by segmenting the range of possible
values into non-overlapping intervals. Group membership
can then be defined by the endpoints of the intervals. In
general, power is maximized by choosing endpoints such
that group membership is equiprobable (i.e., the probabilities
associated with an observation falling into a given group are
divided as evenly as possible across the intervals). Many
commercial software packages follow this procedure.
Rule-of-
thumb for
number of
groups
One rule-of-thumb suggests using the value 2n^(2/5) as a good
starting point for choosing the number of groups. Another
well known rule-of-thumb requires every group to have at
least 5 data points.
Computation
of the chi-
square
goodness-
of-fit test
The formulas for the computation of the chi-square goodness-
of-fit test are given in the EDA chapter.
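A minimal R sketch of these rules of thumb, using simulated data and a fully specified (simple) null distribution; the cell boundaries are chosen to be equiprobable under the null:

  set.seed(42)
  x <- rnorm(100)                                  # simulated sample; null hypothesis is N(0, 1)
  k <- ceiling(2 * length(x)^(2/5))                # rule-of-thumb number of groups (13 here)
  breaks   <- qnorm(seq(0, 1, length.out = k + 1)) # equiprobable cells under the null
  observed <- table(cut(x, breaks))
  chisq.test(observed, p = rep(1/k, k))            # df = k - 1 for a simple hypothesis
  # for a composite hypothesis, reduce the degrees of freedom by the number of estimated parameters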
7.2.1.2. Kolmogorov-Smirnov test
The K-S
test is a
good
alternative
to the chi-
square
test.
The Kolmogorov-Smirnov (K-S) test was originally proposed
in the 1930's in papers by Kolmogorov (1933) and Smirnov
(1936). Unlike the Chi-Square test, which can be used for
testing against both continuous and discrete distributions, the
K-S test is only appropriate for testing data against a
continuous distribution, such as the normal or Weibull
distribution. It is one of a number of tests that are based on the
empirical cumulative distribution function (ECDF).
K-S
procedure
Details on the construction and interpretation of the K-S test
statistic, D, and examples for several distributions are outlined
in Chapter 1.
The
probability
associated
with the
test
statistic is
difficult to
compute.
Critical values associated with the test statistic, D, are difficult
to compute for finite sample sizes, often requiring Monte
Carlo simulation. However, some general purpose statistical
software programs support the Kolmogorov-Smirnov test at
least for some of the more common distributions. Tabled
values can be found in Birnbaum (1952). A correction factor
can be applied if the parameters of the distribution are
estimated with the same data that are being tested. See
D'Agostino and Stephens (1986) for details.
7.2.1.3. Anderson-Darling and Shapiro-Wilk
tests
Purpose:
Test for
distributional
adequacy
The Anderson-Darling Test
The Anderson-Darling test (Stephens, 1974) is used to test
if a sample of data comes from a specific distribution. It is a
modification of the Kolmogorov-Smirnov (K-S) test and
gives more weight to the tails of the distribution than does
the K-S test. The K-S test is distribution free in the sense
that the critical values do not depend on the specific
distribution being tested.
Requires
critical
values for
each
distribution
The Anderson-Darling test makes use of the specific
distribution in calculating critical values. This has the
advantage of allowing a more sensitive test and the
disadvantage that critical values must be calculated for each
distribution. Tables of critical values are not given in this
handbook (see Stephens 1974, 1976, 1977, and 1979)
because this test is usually applied with a statistical
software program that produces the relevant critical values.
Currently, Dataplot computes critical values for the
Anderson-Darling test for the following distributions:
normal
lognormal
Weibull
extreme value type I.
Anderson-
Darling
procedure
Details on the construction and interpretation of the
Anderson-Darling test statistic, A^2, and examples for
several distributions are outlined in Chapter 1.
Shapiro-Wilk
test for
normality
The Shapiro-Wilk Test For Normality
The Shapiro-Wilk test, proposed in 1965, calculates a W
statistic that tests whether a random sample, x1, x2, ..., xn,
comes from (specifically) a normal distribution. Small
values of W are evidence of departure from normality and
percentage points for the W statistic, obtained via Monte
Carlo simulations, were reproduced by Pearson and Hartley
(1972, Table 16). This test has done very well in
comparison studies with other goodness of fit tests.
The W statistic is calculated as follows:
    W = ( Σ a_i x_(i) )² / Σ ( x_i - x̄ )²
where the x_(i) are the ordered sample values (x_(1) is the
smallest) and the a_i are constants generated from the means,
variances and covariances of the order statistics of a sample
of size n from a normal distribution (see Pearson and
Hartley, 1972, Table 15).
For more information about the Shapiro-Wilk test the reader
is referred to the original Shapiro and Wilk (1965) paper
and the tables in Pearson and Hartley (1972).
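Both tests are available in base R; a brief sketch on simulated data (the parameter values are illustrative only):

  set.seed(123)
  x <- rnorm(50, mean = 10, sd = 2)
  shapiro.test(x)                          # Shapiro-Wilk W statistic and p-value
  ks.test(x, "pnorm", mean = 10, sd = 2)   # K-S test against a fully specified normal
  # if the mean and sd are estimated from the same data, the tabled K-S critical values
  # no longer apply and a correction is needed, as noted earlier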
7.2.2. Are the data consistent with the assumed
process mean?
The testing of H0 for a single population mean
Given a random sample of measurements, Y1, ..., YN, there
are three types of questions regarding the true mean of the
population that can be addressed with the sample data. They
are:
1. Does the true mean agree with a known standard or
assumed mean?
2. Is the true mean of the population less than a given
standard?
3. Is the true mean of the population at least as large as a
given standard?
Typical null
hypotheses
The corresponding null hypotheses that test the true mean, μ,
against the standard or assumed mean, μ0, are:
1. H0: μ = μ0
2. H0: μ <= μ0
3. H0: μ >= μ0
Test
statistic
where the
standard
deviation is
not known
The basic statistics for the test are the sample mean and the
standard deviation. The form of the test statistic depends on
whether the population standard deviation, σ, is known or is
estimated from the data at hand. The more typical case is
where the standard deviation must be estimated from the
data, and the test statistic is
where the sample mean is
and the sample standard deviation is
with N - 1 degrees of freedom.
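Written out, the standard forms of these quantities are:

\[
t = \frac{\bar{Y} - \mu_0}{s/\sqrt{N}}, \qquad
\bar{Y} = \frac{1}{N}\sum_{i=1}^{N} Y_i, \qquad
s = \sqrt{\frac{\sum_{i=1}^{N} (Y_i - \bar{Y})^2}{N-1}} .
\]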
Comparison
with critical
values
For a test at significance level α, where α is chosen to be
small, typically 0.01, 0.05 or 0.10, the hypothesis associated
with each case enumerated above is rejected if:
1. | t | >= t_(1-α/2, N-1)
2. t >= t_(1-α, N-1)
3. t <= t_(α, N-1)
where t_(1-α/2, N-1) is the 1-α/2 critical value from the t
distribution with N - 1 degrees of freedom and similarly for
cases (2) and (3). Critical values can be found in the t-table
in Chapter 1.
Test
statistic
where the
standard
deviation is
known
If the standard deviation is known, the form of the test
statistic is
    z = (Ȳ - μ0) / (σ / √N) .
For case (1), the test statistic is compared with z_(1-α/2), which
is the 1-α/2 critical value from the standard normal
distribution, and similarly for cases (2) and (3).
Caution If the standard deviation is assumed known for the purpose
of this test, this assumption should be checked by a test of
hypothesis for the standard deviation.
An
illustrative
example of
the t-test
The following numbers are particle (contamination) counts
for a sample of 10 semiconductor silicon wafers:
50 48 44 56 61 52 53 55 67 51
The mean = 53.7 counts and the standard deviation = 6.567
counts.
The test is
two-sided
Over a long run the process average for wafer particle counts
has been 50 counts per wafer, and on the basis of the sample,
we want to test whether a change has occurred. The null
hypothesis that the process mean is 50 counts is tested against
the alternative hypothesis that the process mean is not equal
to 50 counts. The purpose of the two-sided alternative is to
rule out a possible process change in either direction.
Critical
values
For a significance level of α = 0.05, the chances of
erroneously rejecting the null hypothesis when it is true are 5
% or less. (For a review of hypothesis testing basics, see
Chapter 1).
Even though there is a history on this process, it has not been
stable enough to justify the assumption that the standard
deviation is known. Therefore, the appropriate test statistic is
the t-statistic. Substituting the sample mean, sample standard
deviation, and sample size into the formula for the test
statistic gives a value of
t = 1.782
with degrees of freedom N - 1 = 9. This value is tested
against the critical value
t_(1-0.025, 9) = 2.262
from the t-table where the critical value is found under the
column labeled 0.975 for the probability of exceeding the
critical value and in the row for 9 degrees of freedom. The
critical value is based on α/2 instead of α because of the
two-sided alternative (two-tailed test) which requires equal
probabilities in each tail of the distribution that add to α.
Conclusion Because the value of the test statistic falls in the interval (-
2.262, 2.262), we cannot reject the null hypothesis and,
therefore, we may continue to assume the process mean is 50
counts.
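The same numbers can be reproduced with a short R sketch (t.test() reports the identical statistic together with a p-value):

  counts <- c(50, 48, 44, 56, 61, 52, 53, 55, 67, 51)
  (mean(counts) - 50) / (sd(counts) / sqrt(length(counts)))  # test statistic, 1.782
  qt(0.975, df = 9)                                          # critical value, 2.262
  t.test(counts, mu = 50)                                    # two-sided one-sample t test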
7.2.2.1. Confidence interval approach
Testing using
a confidence
interval
The hypothesis test results in a "yes" or "no" answer. The null
hypothesis is either rejected or not rejected. There is another way of
testing a mean and that is by constructing a confidence interval about
the true but unknown mean.
General form
of confidence
intervals
where the
standard
deviation is
unknown
Tests of hypotheses that can be made from a single sample of data
were discussed on the foregoing page. As with null hypotheses,
confidence intervals can be two-sided or one-sided, depending on the
question at hand. The general form of confidence intervals, for the
three cases discussed earlier, where the standard deviation is unknown
are:
1. Two-sided confidence interval for μ:
2. Lower one-sided confidence interval for μ:
3. Upper one-sided confidence interval for μ:
where t_(α/2, N-1) is the α/2 critical value from the t distribution with N - 1
degrees of freedom and similarly for cases (2) and (3). Critical values
can be found in the t table in Chapter 1.
Confidence
level
The confidence intervals are constructed so that the probability of the
interval containing the mean is 1 - α. Such intervals are referred to as
100(1-α) % confidence intervals.
A 95 % confidence interval for the example
The corresponding confidence interval for the test of hypothesis
example on the foregoing page is shown below. A 95 % confidence
interval for the population mean of particle counts per wafer is given by
Interpretation The 95 % confidence interval includes the null hypothesis if, and only
if, it would be accepted at the 5 % level. This interval includes the null
hypothesis of 50 counts so we cannot reject the hypothesis that the
process mean for particle counts is 50. The confidence interval includes
all null hypothesis values for the population mean that would be
accepted by an hypothesis test at the 5 % significance level. This
assumes, of course, a two-sided alternative.
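A minimal R sketch of this interval, using the summary statistics given earlier:

  counts <- c(50, 48, 44, 56, 61, 52, 53, 55, 67, 51)
  mean(counts) + c(-1, 1) * qt(0.975, df = 9) * sd(counts) / sqrt(length(counts))
  # approximately (49.0, 58.4); the interval contains 50, consistent with not rejecting H0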
7.2.2.2. Sample sizes required
The
computation
of sample
sizes depends
on many
things, some
of which
have to be
assumed in
advance
Perhaps one of the most frequent questions asked of a statistician is,
"How many measurements should be included in the sample?
"
Unfortunately, there is no correct answer without additional
information (or assumptions). The sample size required for an
experiment designed to investigate the behavior of an unknown
population mean will be influenced by the following:
value selected for α, the risk of rejecting a true hypothesis
value of β, the risk of accepting a false null hypothesis when
a particular value of the alternative hypothesis is true.
value of the population standard deviation.
Application -
estimating a
minimum
sample size,
N, for
limiting the
error in the
estimate of
the mean
For example, suppose that we wish to estimate the average daily
yield, μ, of a chemical process by the mean of a sample, Y1, ..., YN,
such that the error of estimation is less than δ with a probability of
95 %. This means that a 95 % confidence interval centered at the
sample mean should be
and if the standard deviation is known,
The critical value from the normal distribution for 1-α/2 = 0.975 is
1.96. Therefore,
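In symbols, the requirement on the half-width of the interval leads to (a standard normal-theory sketch):

\[
\frac{1.96\,\sigma}{\sqrt{N}} \le \delta
\qquad\Longrightarrow\qquad
N \ge \left(\frac{1.96\,\sigma}{\delta}\right)^{2} .
\]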
Limitation
and
interpretation
A restriction is that the standard deviation must be known. Lacking
an exact value for the standard deviation requires some
accommodation, perhaps the best estimate available from a previous
experiment.
Controlling
the risk of
accepting a
false
hypothesis
To control the risk of accepting a false hypothesis, we set not only
α, the probability of rejecting the null hypothesis when it is true,
but also β, the probability of accepting the null hypothesis when in
fact the population mean is μ0 + δ, where δ is the difference or shift
we want to detect.
Standard
deviation
assumed to
be known
The minimum sample size, N, is shown below for two- and one-
sided tests of hypotheses with σ assumed to be known:
1. For a two-sided test, N >= (z_(1-α/2) + z_(1-β))² σ² / δ²
2. For a one-sided test, N >= (z_(1-α) + z_(1-β))² σ² / δ²
The quantities z_(1-α/2) and z_(1-β) are critical values from the normal
distribution.
Note that it is usual to state the shift, δ, in units of the standard
deviation, thereby simplifying the calculation.
Example
where the
shift is stated
in terms of
the standard
deviation
For a one-sided hypothesis test where we wish to detect an increase
in the population mean of one standard deviation, the following
information is required: α, the significance level of the test, and β,
the probability of failing to detect a shift of one standard deviation.
For a test with α = 0.05 and β = 0.10, the minimum sample size
required for the test is
    N = (1.645 + 1.282)² = 8.567 ~ 9.
More often
we must
compute the
sample size
with the
population
standard
deviation
being
unknown
The procedures for computing sample sizes when the standard
deviation is not known are similar to, but more complex than, those
used when the standard deviation is known. The formulation depends on the t
distribution where the minimum sample size is given by
The drawback is that critical values of the t distribution depend on
known degrees of freedom, which in turn depend upon the sample
size which we are trying to estimate.
Iterate on the initial estimate using critical values from the t table
Therefore, the best procedure is to start with an initial estimate based
on a sample standard deviation and iterate. Take the example
discussed above where the minimum sample size is computed to
be N = 9. This estimate is low. Now use the formula above with
degrees of freedom N - 1 = 8 which gives a second estimate of
N = (1.860 + 1.397)² = 10.6 ~ 11.
It is possible to apply another iteration using degrees of freedom 10,
but in practice one iteration is usually sufficient. For the purpose of
this example, results have been rounded to the closest integer;
however, computer programs for finding critical values from the t
distribution allow non-integer degrees of freedom.
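A short R sketch of the iteration just described, for the one-sided test with a shift of one standard deviation (qnorm() and qt() supply the critical values):

  alpha <- 0.05; beta <- 0.10; delta <- 1   # shift expressed in standard deviation units
  N0 <- ceiling((qnorm(1 - alpha) + qnorm(1 - beta))^2 / delta^2)                      # 9
  N1 <- ceiling((qt(1 - alpha, df = N0 - 1) + qt(1 - beta, df = N0 - 1))^2 / delta^2)  # 11
  c(N0, N1)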
Table
showing
minimum
sample sizes
for a two-
sided test
The table below gives sample sizes for a two-sided test of
hypothesis that the mean is a given value, with the shift to be
detected a multiple of the standard deviation. For a one-sided test at
significance level α, look under the value of 2α in column 1. Note
that this table is based on the normal approximation (i.e., the
standard deviation is known).
Sample Size Table for Two-Sided Tests

  α     β     δ = 0.5σ   δ = 1.0σ   δ = 1.5σ
 .01   .01       98         25         11
 .01   .05       73         18          8
 .01   .10       61         15          7
 .01   .20       47         12          6
 .01   .50       27          7          3
 .05   .01       75         19          9
 .05   .05       53         13          6
 .05   .10       43         11          5
 .05   .20       33          8          4
 .05   .50       16          4          3
 .10   .01       65         16          8
 .10   .05       45         11          5
 .10   .10       35          9          4
 .10   .20       25          7          3
 .10   .50       11          3          3
 .20   .01       53         14          6
 .20   .05       35          9          4
 .20   .10       27          7          3
 .20   .20       19          5          3
 .20   .50        7          3          3
7.2.3. Are the data consistent with a nominal
standard deviation?
The testing of H0 for a single population standard deviation
Given a random sample of measurements, Y1, ..., YN, there
are three types of questions regarding the true standard
deviation of the population that can be addressed with the
sample data. They are:
1. Does the true standard deviation agree with a
nominal value?
2. Is the true standard deviation of the population less
than or equal to a nominal value?
3. Is the true standard deviation of the population at
least as large as a nominal value?
Corresponding
null
hypotheses
The corresponding null hypotheses that test the true
standard deviation, σ, against the nominal value, σ0, are:
1. H0: σ = σ0
2. H0: σ <= σ0
3. H0: σ >= σ0
Test statistic The basic test statistic is the chi-square statistic
with N - 1 degrees of freedom, where s is the sample
standard deviation.
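In its usual form the statistic is

\[
\chi^2 = \frac{(N-1)\,s^2}{\sigma_0^2} ,
\]

where σ0 is the nominal value of the standard deviation.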
Comparison
with critical
values
For a test at significance level α, where α is chosen to be
small, typically 0.01, 0.05 or 0.10, the hypothesis
associated with each case enumerated above is rejected if:
where χ²_(α/2) is the critical value from the chi-square
distribution with N - 1 degrees of freedom and similarly
for cases (2) and (3). Critical values can be found in the
chi-square table in Chapter 1.
Warning Because the chi-square distribution is a non-negative,
asymmetrical distribution, care must be taken in looking
up critical values from tables. For two-sided tests, critical
values are required for both tails of the distribution.
Example
A supplier of 100 ohm·cm silicon wafers claims that his
fabrication process can produce wafers with sufficient
consistency so that the standard deviation of resistivity for
the lot does not exceed 10 ohm·cm. A sample of N = 10
wafers taken from the lot has a standard deviation of 13.97
ohm·cm. Is the supplier's claim reasonable? This question
falls under null hypothesis (2) above. For a test at
significance level α = 0.05, the test statistic,
    χ² = (N - 1) s² / σ0² = 9 (13.97)² / (10)² = 17.56 ,
is compared with the critical value, χ²_(0.95, 9) = 16.92.
Since the test statistic (17.56) exceeds the critical value
(16.92) of the chi-square distribution with 9 degrees of
freedom, the supplier's claim is rejected.
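The example can be reproduced with a few lines of R:

  N <- 10; s <- 13.97; sigma0 <- 10
  (N - 1) * s^2 / sigma0^2        # test statistic, about 17.56
  qchisq(0.95, df = N - 1)        # critical value, about 16.92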
7.2.3.1. Confidence interval approach
Confidence
intervals
for the
standard
deviation
Confidence intervals for the true standard deviation can be
constructed using the chi-square distribution. The 100(1-α) %
confidence intervals that correspond to the tests of hypothesis
on the previous page are given by
1. Two-sided confidence interval for σ
2. Lower one-sided confidence interval for σ
3. Upper one-sided confidence interval for σ
where for case (1), χ²_(α/2) is the critical value from the
chi-square distribution with N - 1 degrees of freedom and
similarly for cases (2) and (3). Critical values can be found in
the chi-square table in Chapter 1.
Choice of
risk level
can
change the
conclusion
Confidence interval (1) is equivalent to a two-sided test for the
standard deviation. That is, if the hypothesized or nominal
value, σ0, is not contained within these limits, then the
hypothesis that the standard deviation is equal to the nominal
value is rejected.
A dilemma
of
hypothesis
testing
A change in α can lead to a change in the conclusion. This
poses a dilemma. What should α be? Unfortunately, there is
no clear-cut answer that will work in all situations. The usual
strategy is to set α small so as to guarantee that the null
hypothesis is wrongly rejected in only a small number of
cases. The risk, β, of failing to reject the null hypothesis when
it is false depends on the size of the discrepancy, and also
depends on α. The discussion on the next page shows how to
choose the sample size so that this risk is kept small for
specific discrepancies.
7.2.3.2. Sample sizes required
Sample sizes
to minimize
risk of false
acceptance
The following procedure for computing sample sizes for
tests involving standard deviations follows W. Diamond
(1989). The idea is to find a sample size that is large
enough to guarantee that the risk, β, of accepting a false
hypothesis is small.
Alternatives
are specific
departures
from the null
hypothesis
This procedure is stated in terms of changes in the variance,
not the standard deviation, which makes it somewhat
difficult to interpret. Tests that are generally of interest are
stated in terms of δ, a discrepancy from the hypothesized
variance. For example:
1. Is the true variance larger than its hypothesized value
by δ?
2. Is the true variance smaller than its hypothesized
value by δ?
That is, the tests of interest are:
1. H0: σ² = σ0² + δ
2. H0: σ² = σ0² - δ
Interpretation The experimenter wants to assure that the probability of
erroneously accepting the null hypothesis of unchanged
variance is at most β. The sample size, N, required for this
type of detection depends on the factor, δ; the significance
level, α; and the risk, β.
First choose
the level of
significance
and beta risk
The sample size is determined by first choosing appropriate
values of α and β and then following the directions below
to find the degrees of freedom, ν, from the chi-square
distribution.
The calculations should be done by creating a table or spreadsheet
First compute
Then generate a table of degrees of freedom, ν, say
between 1 and 200. For case (1) or (2) above, calculate
is called the cumulative distribution function (CDF).
Example Consider the case where the variance for resistivity
measurements on a lot of silicon wafers is claimed to be
100 (ohm·cm)². A buyer is unwilling to accept a shipment
if δ is greater than 55 (ohm·cm)² for a particular lot. This
problem falls under case (1) above. How many samples are
needed to assure risks of α = 0.05 and β = 0.01?
Calculations If software is available to compute the roots (or zero
values) of a univariate function, then we can determine the
sample size by finding the roots of a function that calculates
C
7.2.4. Does the proportion of defectives meet requirements?
3. If z >= z_(1-α), the null hypothesis is rejected.
Example of a
one-sided test
for proportion
defective
After a new method of processing wafers was introduced
into a fabrication process, two hundred wafers were tested,
and twenty-six showed some type of defect. Thus, for N =
200, the proportion defective is estimated to be p̂ = 26/200
= 0.13. In the past, the fabrication process was capable of
producing wafers with a proportion defective of at most
0.10. The issue is whether the new process has degraded
the quality of the wafers. The relevant test is the one-sided
test (3) which guards against an increase in proportion
defective from its historical level.
Calculations
for a one-
sided test of
proportion
defective
For a test at significance level α = 0.05, the hypothesis of
no degradation is validated if the test statistic z is less than
the critical value, z_(0.95) = 1.645. The test statistic is
computed to be
Interpretation Because the test statistic is less than the critical value
(1.645), we cannot reject hypothesis (3) and, therefore, we
cannot conclude that the new fabrication method is
degrading the quality of the wafers. The new process may,
indeed, be worse, but more evidence would be needed to
reach that conclusion at the 95% confidence level.
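A minimal R sketch of this calculation, assuming the usual normal-approximation statistic with the hypothesized value p0 in the standard error:

  phat <- 26 / 200; p0 <- 0.10; N <- 200
  (phat - p0) / sqrt(p0 * (1 - p0) / N)   # test statistic, about 1.414
  qnorm(0.95)                             # critical value, 1.645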
7.2.4.1. Confidence intervals
Confidence
intervals
using the
method of
Agresti and
Coull
The method recommended by Agresti and Coull (1998) and also by Brown, Cai
and DasGupta (2001) (the methodology was originally developed by Wilson in
1927) is to use the form of the confidence interval that corresponds to the
hypothesis test given in Section 7.2.4. That is, solve for the two values of p0
(say, p_upper and p_lower) that result from setting z = z_(1-α/2) and solving for
p0 = p_upper, and then setting z = z_(α/2) and solving for p0 = p_lower. (Here, as
in Section 7.2.4, z_(α/2) denotes the variate value from the standard normal
distribution such that the area to the left of the value is α/2.) Although solving
for the two values of p0 might sound complicated, the appropriate expressions can
be obtained by straightforward but slightly tedious algebra. Such algebraic
manipulation isn't necessary, however, as the appropriate expressions are given
in various sources. Specifically, we have
Formulas
for the
confidence
intervals
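With p̂ the observed proportion defective and z = z_(1-α/2), one common way of writing the two limits (the Wilson form) is:

\[
p_{\mathrm{lower}},\; p_{\mathrm{upper}}
= \frac{\hat{p} + \dfrac{z^2}{2N} \mp z\sqrt{\dfrac{\hat{p}(1-\hat{p})}{N} + \dfrac{z^2}{4N^2}}}
       {1 + \dfrac{z^2}{N}} .
\]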
Procedure
does not
strongly
depend on
values of p
and n
This approach can be substantiated on the grounds that it is the exact algebraic
counterpart to the (large-sample) hypothesis test given in section 7.2.4 and is also
supported by the research of Agresti and Coull. One advantage of this procedure
is that its worth does not strongly depend upon the value of n and/or p, and
indeed was recommended by Agresti and Coull for virtually all combinations of
n and p.
Another
advantage
is that the
lower limit
cannot be
negative
Another advantage is that the lower limit cannot be negative. That is not true for
the confidence expression most frequently used:
A confidence limit approach that produces a lower limit which is an impossible
value for the parameter for which the interval is constructed is an inferior
approach. This also applies to limits for the control charts that are discussed in
Chapter 6.
One-sided
confidence
intervals
A one-sided confidence interval can also be constructed simply by replacing each α/2
by α in the expression for the lower or upper limit, whichever is desired.
The 95% one-sided interval for p for the example in the preceding section is:
Example
Conclusion
from the
example
Since the lower confidence bound does not exceed the hypothesized value of 0.10, the
null hypothesis that the proportion defective is at most 0.10, which was given in the
preceding section, would not be rejected if we used the confidence interval to test the
hypothesis. Of course a confidence interval has
value in its own right and does not have to be used for hypothesis testing.
Exact Intervals for Small Numbers of Failures and/or Small Sample Sizes
Construction
of exact
two-sided
confidence
intervals
based on
the
binomial
distribution
If the number of failures is very small or if the sample size N is very small,
symmetrical confidence limits that are approximated using the normal distribution
may not be accurate enough for some applications. An exact method based on the
binomial distribution is shown next. To construct a two-sided confidence interval
at the 100(1-α) % confidence level for the true proportion defective p where N_d
defects are found in a sample of size N follow the steps below.
1. Solve the equation
for p_U to obtain the upper 100(1-α) % limit for p.
2. Next solve the equation
for p_L to obtain the lower 100(1-α) % limit for p.
Note The interval (p_L, p_U) is an exact 100(1-α) % confidence interval for p. However,
it is not symmetric about the observed proportion defective, p̂.
Binomial
confidence
interval
example
The equations above that determine p_L and p_U can be solved using readily
available functions. Take as an example the situation where twenty units are
sampled from a continuous production line and four items are found to be
defective. The proportion defective is estimated to be p̂ = 4/20 = 0.20. The steps
for calculating a 90 % confidence interval for the true proportion defective, p
follow.
1. Initialize constants.
alpha = 0.10
Nd = 4
N = 20
2. Define a function for upper limit (fu) and a function
for the lower limit (fl).
fu = F(Nd,pu,20) - alpha/2
fl = F(Nd-1,pl,20) - (1-alpha/2)
F is the cumulative distribution function for the
binomial distribution.
3. Find the value of pu that corresponds to fu = 0 and
the value of pl that corresponds to fl = 0 using software
to find the roots of a function.
The values of pu and pl for our example are:
pu = 0.401029
pl = 0.071354
Thus, a 90 % confidence interval for the proportion defective, p, is (0.071,
0.400). Whether or not the interval is truly "exact" depends on the software.
The calculations used in this example can be performed using both Dataplot code
and R code.
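A sketch of the root-finding step in R (uniroot() locates the zero of each function; the qbeta() line gives the same limits directly through the binomial-beta relationship):

  alpha <- 0.10; Nd <- 4; N <- 20
  fu <- function(p) pbinom(Nd, N, p) - alpha / 2            # upper-limit equation
  fl <- function(p) pbinom(Nd - 1, N, p) - (1 - alpha / 2)  # lower-limit equation
  pu <- uniroot(fu, c(1e-8, 1 - 1e-8))$root                 # about 0.401
  pl <- uniroot(fl, c(1e-8, 1 - 1e-8))$root                 # about 0.071
  c(qbeta(alpha / 2, Nd, N - Nd + 1), qbeta(1 - alpha / 2, Nd + 1, N - Nd))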
7.2.4.2. Sample sizes required
Derivation of
formula for
required
sample size
when testing
proportions
The method of determining sample sizes for testing proportions is similar
to the method for determining sample sizes for testing the mean. Although
the sampling distribution for proportions actually follows a binomial
distribution, the normal approximation is used for this derivation.
Minimum
sample size
If we are interested in detecting a change in the proportion defective of
size δ in either direction, the minimum sample size is
1. For a two-sided test
2. For a one-sided test
Interpretation
and sample
size for high
probability of
detecting a
change
This requirement on the sample size only guarantees that a change of size
δ is detected with 50 % probability. The derivation of the sample size
when we are interested in protecting against a change δ with probability 1
- β (where β is small) is
1. For a two-sided test
2. For a one-sided test
where z_(1-β) is the critical value from the normal distribution that is
exceeded with probability β.
Value for the
true
proportion
defective
The equations above require that p be known. Usually, this is not the case.
If we are interested in detecting a change relative to an historical or
hypothesized value, this value is taken as the value of p for this purpose.
Note that taking the value of the proportion defective to be 0.5 leads to the
largest possible sample size.
Example of
calculating
sample size
for testing
proportion
defective
Suppose that a department manager needs to be able to detect any change
above 0.10 in the current proportion defective of his product line, which is
running at approximately 10% defective. He is interested in a one-sided
test and does not want to stop the line except when the process has clearly
degraded and, therefore, he chooses a significance level for the test of 5%.
Suppose, also, that he is willing to take a risk of 10% of failing to detect a
change of this magnitude. With these criteria:
1. z_(0.95) = 1.645; z_(0.90) = 1.282
2. δ = 0.10
3. p = 0.10
and the minimum sample size for a one-sided test procedure is
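A sketch of this calculation in R, assuming the one-sided normal-approximation formula N = p(1 - p)(z_(1-α) + z_(1-β))² / δ² suggested by the discussion above:

  alpha <- 0.05; beta <- 0.10
  p <- 0.10; delta <- 0.10
  ceiling(p * (1 - p) * (qnorm(1 - alpha) + qnorm(1 - beta))^2 / delta^2)  # about 78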
7.2.5. Does the defect density meet requirements?
Testing defect
densities is
based on the
Poisson
distribution
The number of defects observed in an area of size A units is often
assumed to have a Poisson distribution with parameter A x D,
where D is the actual process defect density (D is defects per
unit area). In other words:
The questions of primary interest for quality control are:
1. Is the defect density within prescribed limits?
2. Is the defect density less than a prescribed limit?
3. Is the defect density greater than a prescribed limit?
Normal
approximation
to the Poisson
We assume that AD is large enough so that the normal
approximation to the Poisson applies (in other words, AD > 10
for a reasonable approximation and AD > 20 for a good one).
That translates to
where Φ is the standard normal distribution function.
Test statistic
based on a
normal
approximation
If, for a sample of area A with a defect density target of D0, a
defect count of C is observed, then the test statistic
can be used exactly as shown in the discussion of the test
statistic for fraction defectives in the preceding section.
Testing the
hypothesis
that the
process defect
density is less
than or equal
to D0
For example, after choosing a sample size of area A (see below
for sample size calculation) we can reject that the process defect
density is less than or equal to the target D0 if the number of
defects C in the sample is greater than C_A, where
and z_(1-α) is the 100(1-α) percentile of the standard normal
distribution. The test significance level is 100(1-α). For a 90 %
significance level use z_(0.90) = 1.282 and for a 95 % test use z_(0.95)
= 1.645. α is the maximum risk that an acceptable process with a
defect density at least as low as D0 "fails" the test.
Choice of
sample size
(or area) to
examine for
defects
In order to determine a suitable area A to examine for defects,
you first need to choose an unacceptable defect density level.
Call this unacceptable defect density D1 = kD0, where k > 1.
We want to have a probability of less than or equal to β of
"passing" the test (and not rejecting the hypothesis that the true
level is D0 or better) when, in fact, the true defect level is D1 or
worse. Typically β will be 0.2, 0.1 or 0.05. Then we need to
count defects in a sample size of area A, where A is equal to
Example Suppose the target is D0 = 4 defects per wafer and we want to
verify a new process meets that target. We choose α = 0.1 to be
the chance of failing the test if the new process is as good as D0
(α = the Type I error probability or the "producer's risk") and we
choose β = 0.1 for the chance of passing the test if the new
process is as bad as 6 defects per wafer (β = the Type II error
probability or the "consumer's risk"). That means z_(1-α) = 1.282
and z_(β) = -1.282.
The sample size needed is A wafers, where
which we round up to 9.
The test criterion is to "accept" that the new process meets target
unless the number of defects in the sample of 9 wafers exceeds C_A.
In other words, the rejection criterion for the test of the new process
is 44 or more defects in the sample of 9 wafers.
Note: Technically, all we can say if we run this test and end up
not rejecting is that we do not have statistically significant
evidence that the new process exceeds target. However, the way
we chose the sample size for this test assures us we most likely
would have had statistically significant evidence for rejection if
the process had been as bad as 1.5 times the target.
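The numbers in this example can be reproduced in R under the normal approximation described above (the variable names are ours):

  D0 <- 4; D1 <- 6                  # target and unacceptably bad defect densities per wafer
  alpha <- 0.10; beta <- 0.10
  A <- (qnorm(1 - alpha) * sqrt(D0) + qnorm(1 - beta) * sqrt(D1))^2 / (D1 - D0)^2
  A; ceiling(A)                     # 8.1, rounded up to 9 wafers
  ceiling(A) * D0 + qnorm(1 - alpha) * sqrt(ceiling(A) * D0)   # C_A, about 43.7 defects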
7.2.6. What intervals contain a fixed percentage
of the population values?
Observations
tend to
cluster
around the
median or
mean
Empirical studies have demonstrated that it is typical for a
large number of the observations in any study to cluster near
the median. In right-skewed data this clustering takes place
to the left of (i.e., below) the median and in left-skewed
data the observations tend to cluster to the right (i.e., above)
the median. In symmetrical data, where the median and the
mean are the same, the observations tend to distribute
equally around these measures of central tendency.
Various
methods
Several types of intervals about the mean that contain a
large percentage of the population values are discussed in
this section.
Approximate intervals that contain most of the
population values
Percentiles
Tolerance intervals for a normal distribution
Tolerance intervals based on the smallest and largest
observations
7.2.6.1. Approximate intervals that contain
most of the population values
Empirical
intervals
A rule of thumb is that where there is no evidence of
significant skewness or clustering, two out of every three
observations (67%) should be contained within a distance of
one standard deviation of the mean; 90% to 95% of the
observations should be contained within a distance of two
standard deviations of the mean; 99-100% should be
contained within a distance of three standard deviations. This
rule can help identify outliers in the data.
Intervals
that apply
to any
distribution
The Bienayme-Chebyshev rule states that regardless of how
the data are distributed, the percentage of observations that are
contained within a distance of k standard deviations of the
mean is at least (1 - 1/k²)100 %.
Exact
intervals
for the
normal
distribution
The Bienayme-Chebyshev rule is conservative because it
applies to any distribution. For a normal distribution, a higher
percentage of the observations are contained within k standard
deviations of the mean as shown in the following table.
Percentage of observations contained within k standard deviations of the mean

k, No. of Standard    Empirical Rule    Bienayme-Chebyshev    Normal Distribution
Deviations
1                     67%               N/A                   68.26%
2                     90-95%            at least 75%          95.44%
3                     99-100%           at least 88.89%       99.73%
4                     N/A               at least 93.75%       99.99%
7.2.6.2. Percentiles
Definitions of
order
statistics and
ranks
For a series of measurements Y1, ..., YN, denote the data
ordered in increasing order of magnitude by Y[1], ..., Y[N].
These ordered data are called order statistics. If Y[j] is the
order statistic that corresponds to the measurement Yi, then
the rank for Yi is j.
Definition of
percentiles
Order statistics provide a way of estimating proportions of
the data that should fall above and below a given value,
called a percentile. The pth percentile is a value, Y(p), such
that at most 100p % of the measurements are less than
this value and at most 100(1-p) % are greater. The 50th
percentile is called the median.
Percentiles split a set of ordered data into hundredths.
(Deciles split ordered data into tenths.) For example, 70 %
of the data should fall below the 70th percentile.
Estimation of
percentiles
Percentiles can be estimated from N measurements as
follows: for the pth percentile, set p(N+1) equal to k + d for
k an integer, and d, a fraction greater than or equal to 0 and
less than 1.
1. For 0 < k < N, Y(p) = Y[k] + d(Y[k+1] - Y[k])
2. For k = 0, Y(p) = Y[1]
3. For k = N, Y(p) = Y[N]
Example and
interpretation
For the purpose of illustration, twelve measurements from a
gage study are shown below. The measurements are
resistivities of silicon wafers measured in ohm·cm.
i Measurements Order stats Ranks
1 95.1772 95.0610 9
2 95.1567 95.0925 6
3 95.1937 95.1065 10
4 95.1959 95.1195 11
5 95.1442 95.1442 5
6 95.0610 95.1567 1
7 95.1591 95.1591 7
8 95.1195 95.1682 4
9 95.1065 95.1772 3
10 95.0925 95.1937 2
11 95.1990 95.1959 12
12 95.1682 95.1990 8
To find the 90th percentile, p(N+1) = 0.9(13) =11.7; k = 11,
and d = 0.7. From condition (1) above, Y(0.90) is estimated
to be 95.1981 ohm·cm. This percentile, although it is an
estimate from a small sample of resistivity measurements,
gives an indication of the percentile for a population of
resistivity measurements.
Note that
there are
other ways of
calculating
percentiles in
common use
Some software packages set 1+p(N-1) equal to k + d, then
proceed as above. The two methods give fairly similar
results.
A third way of calculating percentiles (given in some
elementary textbooks) starts by calculating pN. If that is not
an integer, round up to the next highest integer k and use
Y[k] as the percentile estimate. If pN is an integer k, use
0.5(Y[k] + Y[k+1]).
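In R, quantile() covers both conventions; a short sketch using the twelve resistivities above:

  y <- c(95.1772, 95.1567, 95.1937, 95.1959, 95.1442, 95.0610,
         95.1591, 95.1195, 95.1065, 95.0925, 95.1990, 95.1682)
  quantile(y, 0.90, type = 6)   # p(N+1) convention used above, about 95.1981
  quantile(y, 0.90, type = 7)   # 1 + p(N-1) convention (R's default)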
Definition of
Tolerance
Interval
An interval covering population percentiles can be
interpreted as "covering a proportion p of the population
with a level of confidence, say, 90 %." This is known as a
tolerance interval.
7.2.6.3. Tolerance intervals for a normal distribution
Definition of
a tolerance
interval
A confidence interval covers a population parameter with a stated confidence, that
is, a certain proportion of the time. There is also a way to cover a fixed proportion
of the population with a stated confidence. Such an interval is called a tolerance
interval. The endpoints of a tolerance interval are called tolerance limits. An
application of tolerance intervals to manufacturing involves comparing specification
limits prescribed by the client with tolerance limits that cover a specified proportion
of the population.
Difference
between
confidence
and tolerance
intervals
Confidence limits are limits within which we expect a given population parameter,
such as the mean, to lie. Statistical tolerance limits are limits within which we
expect a stated proportion of the population to lie.
Not related to
engineering
tolerances
Statistical tolerance intervals have a probabilistic interpretation. Engineering
tolerances are specified outer limits of acceptability which are usually prescribed by
a design engineer and do not necessarily reflect a characteristic of the actual
measurements.
Three types of
tolerance
intervals
Three types of questions can be addressed by tolerance intervals. Question (1) leads
to a two-sided interval; questions (2) and (3) lead to one-sided intervals.
1. What interval will contain p percent of the population measurements?
2. What interval guarantees that p percent of population measurements will not
fall below a lower limit?
3. What interval guarantees that p percent of population measurements will not
exceed an upper limit?
Tolerance
intervals for
measurements
from a
normal
distribution
For the questions above, the corresponding tolerance intervals are defined by lower
(L) and upper (U) tolerance limits which are computed from a series of
measurements Y1, ..., YN:
1. (Ȳ - k2 s, Ȳ + k2 s)
2. lower limit L = Ȳ - k1 s
3. upper limit U = Ȳ + k1 s
where the k factors are determined so that the intervals cover at least a proportion p
of the population with confidence, γ.
Calculation of k factor for a two-sided tolerance limit for a normal distribution
If the data are from a normally distributed population, an approximate value for the
k2 factor as a function of p and γ for a two-sided tolerance interval (Howe, 1969) is
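In the notation defined just below, Howe's approximation is commonly written as:

\[
k_2 = \sqrt{\frac{(N-1)\left(1 + \dfrac{1}{N}\right) z^2_{1-(1-p)/2}}{\chi^2_{1-\gamma,\;N-1}}}
\]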
where χ²_(1-γ, N-1) is the critical value of the chi-square distribution with degrees of
freedom, N - 1, that is exceeded with probability γ, and z_(1-(1-p)/2) is the critical value
of the normal distribution which is exceeded with probability (1-p)/2.
Example of
calculation
For example, suppose that we take a sample of N = 43 silicon wafers from a lot and
measure their thicknesses in order to find tolerance limits within which a proportion
p = 0.90 of the wafers in the lot fall with probability γ = 0.99.
Use of tables
in calculating
two-sided
tolerance
intervals
Values of the k factor as a function of p and γ are tabulated in some textbooks, such
as Dixon and Massey (1969). To use the tables in this handbook, follow the steps
outlined below:
1. Calculate (1 - p)/2 = 0.05.
2. Go to the page describing critical values of the normal distribution and in the
summary table under the column labeled 0.95 find z_(1-(1-p)/2) = z_(0.95) = 1.645.
3. Go to the table of lower critical values of the chi-square distribution and
under the column labeled 0.01 in the row labeled degrees of freedom = 42,
find χ²_(1-γ, N-1) = χ²_(0.01, 42) = 23.650.
4. Calculate
   k2 = sqrt( 42 (1 + 1/43) (1.645)² / 23.650 ) = 2.217.
The tolerance limits are then computed from the sample mean, Ȳ, and standard
deviation, s, according to case (1).
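A short R sketch of the same calculation (qchisq() and qnorm() supply the two critical values):

  N <- 43; p <- 0.90; gamma <- 0.99
  zp   <- qnorm(1 - (1 - p) / 2)           # 1.645
  chi2 <- qchisq(1 - gamma, df = N - 1)    # 23.650, exceeded with probability gamma
  k2 <- sqrt((N - 1) * (1 + 1 / N) * zp^2 / chi2)   # about 2.217
  # two-sided limits: mean(y) - k2 * sd(y) and mean(y) + k2 * sd(y)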
Important
notes
The notation for the critical value of the chi-square distribution can be confusing.
Values as tabulated are, in a sense, already squared; whereas the critical value for
the normal distribution must be squared in the formula above.
Some software is capable of computing a tolerance interval for a given set of data
so that the user does not need to perform all the calculations. All the tolerance
intervals shown in this section can be computed using both Dataplot code and R
code. R and Dataplot examples include the case where a tolerance interval is
computed automatically from a data set.
Calculation of a one-sided tolerance interval for a normal distribution
The calculation of an approximate k factor for one-sided tolerance intervals comes
directly from the following set of formulas (Natrella, 1963):
A one-sided
tolerance
interval
example
For the example above, it may also be of interest to guarantee with 0.99 probability
(or 99 % confidence) that 90 % of the wafers have thicknesses less than an upper
tolerance limit. This problem falls under case (3). The calculations for the k_1 factor
for a one-sided tolerance interval are: a = 0.9356, b = 1.5165, and k_1 = 1.8752.
Tolerance
factor based
on the non-
central t
distribution
The value of k_1 can also be computed using the inverse cumulative distribution
function for the non-central t distribution. This method may give more accurate
results for small values of N. The value of k_1 using the non-central t distribution
(using the same example as above) is

    k_1 = t_{γ; N-1, δ} / sqrt(N) = 1.8740

where δ = z_p sqrt(N) is the non-centrality parameter.
In this case, the difference between the two computations is negligible (1.8752
versus 1.8740). However, the difference becomes more pronounced as the value of
N gets smaller (in particular, for N ≤ 10). For example, if N = 43 is replaced with N
= 6, the non-central t method returns a value of 4.4111 for k_1 while the method
based on the Natrella formulas returns a value of 5.2808.
The disadvantage of the non-central t method is that it depends on the inverse
cumulative distribution function for the non-central t distribution. This function is
not available in many statistical and spreadsheet software programs, but it is
available in Dataplot and R (see Dataplot code and R code). The Natrella formulas
only depend on the inverse cumulative distribution function for the normal
distribution (which is available in just about all statistical and spreadsheet software
programs). Unless you have small samples (say N ≤ 10), the difference in the
methods should not have much practical effect.
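A hedged sketch in R of both one-sided calculations discussed above, assuming the Natrella form given earlier and the non-central t quantile via qt; the reported values are approximate:

  N <- 43; p <- 0.90; gamma <- 0.99
  zp <- qnorm(p); zg <- qnorm(gamma)
  # Natrella (1963) approximation
  a  <- 1 - zg^2 / (2 * (N - 1))
  b  <- zp^2 - zg^2 / N
  k1_natrella <- (zp + sqrt(zp^2 - a * b)) / a        # about 1.8752
  # non-central t method
  k1_nct <- qt(gamma, df = N - 1, ncp = zp * sqrt(N)) / sqrt(N)   # about 1.874
  c(k1_natrella, k1_nct)
  # repeating with N = 6 shows the larger gap quoted in the text (about 5.28 versus 4.41)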
7.2.6.4. Tolerance intervals based on the largest and smallest observations
http://www.itl.nist.gov/div898/handbook/prc/section2/prc264.htm[6/27/2012 2:42:38 PM]
7. Product and Process Comparisons
7.2. Comparisons based on data from one process
7.2.6. What intervals contain a fixed percentage of the population values?
7.2.6.4. Tolerance intervals based on the largest and smallest
observations
Tolerance
intervals can
be constructed
for a
distribution of
any form
The methods on the previous pages for computing tolerance limits are based on the
assumption that the measurements come from a normal distribution. If the distribution is
not normal, tolerance intervals based on this assumption will not provide coverage for the
intended proportion p of the population. However, there are methods for achieving the
intended coverage if the form of the distribution is not known, but these methods may
produce substantially wider tolerance intervals.
Risks
associated
with making
assumptions
about the
distribution
There are situations where it would be particularly dangerous to make unwarranted
assumptions about the exact shape of the distribution, for example, when testing the
strength of glass for airplane windshields where it is imperative that a very large
proportion of the population fall within acceptable limits.
Tolerance
intervals
based on
largest and
smallest
observations
One obvious choice for a two-sided tolerance interval for an unknown distribution is the
interval between the smallest and largest observations from a sample of Y_1, ..., Y_N
measurements. Given the sample size N and coverage p, an equation from Hahn and
Meeker (p. 91) allows us to calculate the confidence of the tolerance interval. For example,
the confidence levels for selected coverages between 0.5 and 0.9999 are shown below for
N = 25.
Confidence Coverage
1.000 0.5000
0.993 0.7500
0.729 0.9000
0.358 0.9500
0.129 0.9750
0.026 0.9900
0.007 0.9950
0.0 0.9990
0.0 0.9995
0.0 0.9999
Note that if 99 % confidence is required, the interval that covers the entire sample data
set is guaranteed to achieve a coverage of only 75 % of the population values.
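A hedged sketch in R; the closed form below is assumed to be the Hahn and Meeker relation for the min/max interval (confidence = 1 - N p^(N-1) + (N-1) p^N), and it reproduces the tabled values above:

  conf_minmax <- function(p, N) 1 - N * p^(N - 1) + (N - 1) * p^N
  conf_minmax(c(0.50, 0.75, 0.90, 0.95), N = 25)   # about 1.000, 0.993, 0.729, 0.358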
What is the
optimal
sample size?
Another question of interest is, "How large should a sample be so that one can be
assured with probability γ that the tolerance interval will contain at least a proportion p of
the population?"
Approximation
for N
A rather good approximation for the required sample size is given by

    N ≈ (1/4) χ²_{γ, 4} (1 + p) / (1 - p) + 1/2

where χ²_{γ, 4} is the critical value of the chi-square distribution with 4 degrees of freedom
that is exceeded with probability 1 - γ.
Example of
the effect of p
on the sample
size
Suppose we want to know how many measurements to make in order to guarantee that
the interval between the smallest and largest observations covers a proportion p of the
population with probability γ = 0.95. From the table for the upper critical value of the
chi-square distribution, look under the column labeled 0.95 in the row for 4 degrees of
freedom. The value is found to be χ²_{0.95, 4} = 9.488 and calculations are shown below for
p equal to 0.90 and 0.99.
For p = 0.90, γ = 0.95:  N ≈ (9.488/4)(1.90/0.10) + 1/2 = 45.6, so use N = 46.
For p = 0.99, γ = 0.95:  N ≈ (9.488/4)(1.99/0.01) + 1/2 = 472.5, so use N = 473.
These calculations demonstrate that requiring the tolerance interval to cover a very large
proportion of the population may lead to an unacceptably large sample size.
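A minimal sketch in R of the sample-size approximation just used, assuming the form N ≈ (χ²_{γ,4}/4)(1+p)/(1-p) + 1/2:

  n_minmax <- function(p, gamma) ceiling(qchisq(gamma, 4) / 4 * (1 + p) / (1 - p) + 0.5)
  n_minmax(0.90, 0.95)   # 46
  n_minmax(0.99, 0.95)   # 473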
7.3. Comparisons based on data from two processes
http://www.itl.nist.gov/div898/handbook/prc/section3/prc3.htm[6/27/2012 2:42:39 PM]
7. Product and Process Comparisons
7.3. Comparisons based on data from two
processes
Outline for
this section
In many manufacturing environments it is common to have
two or more processes performing the same task or generating
similar products. The following pages describe tests covering
several of the most common and useful cases for two
processes.
1. Do two processes have the same mean?
1. Tests when the standard deviations are equal
2. Tests when the standard deviations are unequal
3. Tests for paired data
2. Do two processes have the same standard deviation?
3. Do two processes produce the same proportion of
defectives?
4. If the observations are failure times, are the failure rates
(or mean times to failure) the same?
5. Do two arbitrary processes have the same central
tendency?
Example of
a dual
track
process
For example, in an automobile manufacturing plant, there may
exist several assembly lines producing the same part. If one
line goes down for some reason, parts can still be produced
and production will not be stopped. For example, if the parts
are piston rings for a particular model car, the rings produced
by either line should conform to a given set of specifications.
How does one confirm that the two processes are in fact
producing rings that are similar? That is, how does one
determine if the two processes are similar?
The goal is
to
determine
if the two
processes
are similar
In order to answer this question, data on piston rings are
collected for each process. For example, on a particular day,
data on the diameters of ten piston rings from each process
are measured over a one-hour time frame.
To determine if the two processes are similar, we are
interested in answering the following questions:
1. Do the two processes produce piston rings with the
same diameter?
2. Do the two processes have similar variability in the
diameters of the rings produced?
Unknown
standard
deviation
The second question assumes that one does not know the
standard deviation of either process and therefore it must be
estimated from the data. This is usually the case, and the tests
in this section assume that the population standard deviations
are unknown.
Assumption
of a
normal
distribution
The statistical methodology used (i.e., the specific test to be
used) to answer these two questions depends on the
underlying distribution of the measurements. The tests in this
section assume that the data are normally distributed.
7.3.1. Do two processes have the same mean?
http://www.itl.nist.gov/div898/handbook/prc/section3/prc31.htm[6/27/2012 2:42:40 PM]
7. Product and Process Comparisons
7.3. Comparisons based on data from two processes
7.3.1. Do two processes have the same mean?
Testing
hypotheses
related to
the means of
two
processes
Given two random samples of measurements,
Y_1, ..., Y_{N1}  and  Z_1, ..., Z_{N2}
from two independent processes (the Y's are sampled from process 1
and the Z's are sampled from process 2), there are three types of
questions regarding the true means of the processes that are often
asked. They are:
1. Are the means from the two processes the same?
2. Is the mean of process 1 less than or equal to the mean of
process 2?
3. Is the mean of process 1 greater than or equal to the mean of
process 2?
Typical null
hypotheses
The corresponding null hypotheses that test the true mean of the first
process, μ_1, against the true mean of the second process, μ_2, are:
1. H_0: μ_1 = μ_2
2. H_0: μ_1 ≤ μ_2
3. H_0: μ_1 ≥ μ_2
Note that as previously discussed, our choice of which null hypothesis
to use is typically made based on one of the following considerations:
1. When we are hoping to prove something new with the sample
data, we make that the alternative hypothesis, whenever
possible.
2. When we want to continue to assume a reasonable or traditional
hypothesis still applies, unless very strong contradictory
evidence is present, we make that the null hypothesis, whenever
possible.
Basic
statistics
from the two
processes
The basic statistics for the test are the sample means Ybar and Zbar and the
sample standard deviations s_1 and s_2, with degrees of freedom
ν_1 = N_1 - 1 and ν_2 = N_2 - 1, respectively.
Form of the
test statistic
where the
two
processes
have
equivalent
standard
deviations
If the standard deviations from the two processes are equivalent, and
this should be tested before this assumption is made, the test statistic
is

    t = ( Ybar - Zbar ) / ( s_p sqrt( 1/N_1 + 1/N_2 ) )

where the pooled standard deviation is estimated as

    s_p = sqrt( ( (N_1 - 1) s_1² + (N_2 - 1) s_2² ) / ( N_1 + N_2 - 2 ) )

with degrees of freedom ν = N_1 + N_2 - 2.
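A minimal sketch in R of the pooled statistic under the assumed equal-variance case; y and z are placeholder vectors:

  pooled_t <- function(y, z) {
    n1 <- length(y); n2 <- length(z)
    sp <- sqrt(((n1 - 1) * var(y) + (n2 - 1) * var(z)) / (n1 + n2 - 2))  # pooled sd
    (mean(y) - mean(z)) / (sp * sqrt(1 / n1 + 1 / n2))
  }
  # t.test(y, z, var.equal = TRUE) gives the same statistic on N1 + N2 - 2 degrees of freedom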
Form of the
test statistic
where the
two
processes do
NOT have
equivalent
standard
deviations
If it cannot be assumed that the standard deviations from the two
processes are equivalent, the test statistic is

    t = ( Ybar - Zbar ) / sqrt( s_1²/N_1 + s_2²/N_2 )

The degrees of freedom are not known exactly but can be estimated
using the Welch-Satterthwaite approximation

    ν ≈ ( s_1²/N_1 + s_2²/N_2 )² / [ (s_1²/N_1)² / (N_1 - 1) + (s_2²/N_2)² / (N_2 - 1) ]
Test
strategies
The strategy for testing the hypotheses under (1), (2) or (3) above is to
calculate the appropriate t statistic from one of the formulas above,
and then perform a test at significance level α, where α is chosen to
be small, typically .01, .05 or .10. The hypothesis associated with each
case enumerated above is rejected if:
1. |t| ≥ t_{1-α/2, ν}
2. t ≥ t_{1-α, ν}
3. t ≤ t_{α, ν}
Explanation
of critical
values
The critical values from the t table depend on the significance level
and the degrees of freedom in the standard deviation. For hypothesis
(1), t_{1-α/2, ν} is the 1-α/2 critical value from the t table with ν degrees
of freedom, and similarly for hypotheses (2) and (3).
Example of
unequal
number of
data points
A new procedure (process 2) to assemble a device is introduced and
tested for possible improvement in time of assembly. The question
being addressed is whether the mean, μ_2, of the new assembly process
is smaller than the mean, μ_1, for the old assembly process (process 1).
We choose to test hypothesis (2) in the hope that we will reject this
null hypothesis and thereby feel we have a strong degree of
confidence that the new process is an improvement worth
implementing. Data (in minutes required to assemble a device) for
both the new and old processes are listed below along with their
relevant statistics.
Device Process 1 (Old) Process 2 (New)
1 32 36
2 37 31
3 35 30
4 28 31
5 41 34
6 44 36
7 35 29
8 31 32
9 34 31
10 38
11 42
Mean 36.0909 32.2222
Standard deviation 4.9082 2.5386
No. measurements 11 9
Degrees freedom 10 8
Computation
of the test
statistic
From this table we generate the test statistic

    t = ( 36.0909 - 32.2222 ) / sqrt( (4.9082)²/11 + (2.5386)²/9 ) = 2.269

with the degrees of freedom approximated by the Welch-Satterthwaite formula as
ν ≈ 15.5, which is rounded to 16.
Decision
process
For a one-sided test at the 5% significance level, go to the t table for
the 0.95 significance level, and look up the critical value for degrees of
freedom ν = 16. The critical value is 1.746. Thus, hypothesis (2) is
rejected because the test statistic (t = 2.269) is greater than 1.746 and,
therefore, we conclude that process 2 has improved assembly time
(smaller mean) over process 1.
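A hedged sketch in R reproducing this comparison with the built-in Welch test; the data are the assembly times listed above and the results are approximate:

  old <- c(32, 37, 35, 28, 41, 44, 35, 31, 34, 38, 42)   # process 1
  new <- c(36, 31, 30, 31, 34, 36, 29, 32, 31)           # process 2
  t.test(old, new, alternative = "greater", var.equal = FALSE)
  # t is about 2.27 with roughly 16 approximate degrees of freedom, so hypothesis (2) is rejected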
7.3.1.1. Analysis of paired observations
http://www.itl.nist.gov/div898/handbook/prc/section3/prc311.htm[6/27/2012 2:42:42 PM]
7. Product and Process Comparisons
7.3. Comparisons based on data from two processes
7.3.1. Do two processes have the same mean?
7.3.1.1. Analysis of paired observations
Definition of
paired
comparisons
Given two random samples,
Y_1, ..., Y_N  and  Z_1, ..., Z_N
from two populations, the data are said to be paired if the ith
measurement on the first sample is naturally paired with the
ith measurement on the second sample. For example, if N
supposedly identical products are chosen from a production
line, and each one, in turn, is tested with first one measuring
device and then with a second measuring device, it is
possible to decide whether the measuring devices are
compatible; i.e., whether there is a difference between the
two measurement systems. Similarly, if "before" and "after"
measurements are made with the same device on N objects, it
is possible to decide if there is a difference between "before"
and "after"; for example, whether a cleaning process changes
an important characteristic of an object. Each "before"
measurement is paired with the corresponding "after"
measurement, and the differences d_i = Y_i - Z_i are calculated.
Basic
statistics for
the test
The mean and standard deviation for the differences are
calculated as

    dbar = ( Σ d_i ) / N   and   s_d = sqrt( Σ ( d_i - dbar )² / (N - 1) )

with ν = N - 1 degrees of freedom.
Test statistic based on the t distribution
The paired-sample t test is used to test for the difference of
two means before and after a treatment. The test statistic is:

    t = dbar / ( s_d / sqrt(N) )

The hypotheses described on the foregoing page are rejected
if:
1. |t| ≥ t_{1-α/2, ν}
2. t ≥ t_{1-α, ν}
3. t ≤ t_{α, ν}
where for hypothesis (1) t_{1-α/2, ν} is the 1-α/2 critical value
from the t distribution with ν degrees of freedom, and
similarly for cases (2) and (3). Critical values can be found
in the t table in Chapter 1.
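A minimal sketch in R of the paired calculation; the before/after numbers below are hypothetical and serve only to show the call:

  before <- c(10.2, 9.8, 10.5, 10.1, 9.9)
  after  <- c(9.7, 9.5, 10.0, 9.8, 9.6)
  d <- before - after
  mean(d) / (sd(d) / sqrt(length(d)))       # paired t statistic on length(d) - 1 df
  t.test(before, after, paired = TRUE)      # equivalent built-in call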
7.3.1.2. Confidence intervals for differences between means
http://www.itl.nist.gov/div898/handbook/prc/section3/prc312.htm[6/27/2012 2:42:43 PM]
7. Product and Process Comparisons
7.3. Comparisons based on data from two processes
7.3.1. Do two processes have the same mean?
7.3.1.2. Confidence intervals for differences between means
Definition of
confidence
interval for
difference
between
population
means
Given two random samples,
Y_1, ..., Y_N  and  Z_1, ..., Z_N
from two populations, two-sided confidence intervals with 100(1-α) % coverage for the
difference between the unknown population means, μ_1 and μ_2, are shown in the table
below. Relevant statistics for paired observations and for unpaired observations are
shown elsewhere.
Two-sided confidence intervals with 100(1-α) % coverage for μ_1 - μ_2:
Paired observations:    dbar ± t_{1-α/2, ν} ( s_d / sqrt(N) )
Unpaired observations:  ( Ybar - Zbar ) ± t_{1-α/2, ν} sqrt( s_1²/N_1 + s_2²/N_2 )
Interpretation
of confidence
interval
One interpretation of the confidence interval for means is that if zero is contained within
the confidence interval, the two population means are equivalent.
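A minimal sketch in R computing such an interval for the assembly-time example from the previous page; the result is approximate:

  old <- c(32, 37, 35, 28, 41, 44, 35, 31, 34, 38, 42)
  new <- c(36, 31, 30, 31, 34, 36, 29, 32, 31)
  t.test(old, new, conf.level = 0.95)$conf.int   # zero outside the interval suggests unequal means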
7.3.2. Do two processes have the same standard deviation?
http://www.itl.nist.gov/div898/handbook/prc/section3/prc32.htm[6/27/2012 2:42:44 PM]
7. Product and Process Comparisons
7.3. Comparisons based on data from two processes
7.3.2. Do two processes have the same standard
deviation?
Testing
hypotheses
related to
standard
deviations
from two
processes
Given two random samples of measurements,
Y_1, ..., Y_{N1}  and  Z_1, ..., Z_{N2}
from two independent processes, there are three types of
from two independent processes, there are three types of
questions regarding the true standard deviations of the
processes that can be addressed with the sample data. They
are:
1. Are the standard deviations from the two processes the
same?
2. Is the standard deviation of one process less than the
standard deviation of the other process?
3. Is the standard deviation of one process greater than
the standard deviation of the other process?
Typical null
hypotheses
The corresponding null hypotheses that test the true standard
deviation of the first process, σ_1, against the true standard
deviation of the second process, σ_2, are:
1. H_0: σ_1 = σ_2
2. H_0: σ_1 ≤ σ_2
3. H_0: σ_1 ≥ σ_2
Basic
statistics
from the two
processes
The basic statistics for the test are the sample variances s_1² and s_2²,
with degrees of freedom ν_1 = N_1 - 1 and ν_2 = N_2 - 1, respectively.
Form of the
test statistic
The test statistic is

    F = s_1² / s_2²
Test
strategies
The strategy for testing the hypotheses under (1), (2) or (3)
above is to calculate the F statistic from the formula above,
and then perform a test at significance level α, where α is
chosen to be small, typically 0.01, 0.05 or 0.10. The
hypothesis associated with each case enumerated above is
rejected if:
1. F < F_{1-α/2; ν_1, ν_2} or F > F_{α/2; ν_1, ν_2}
2. F > F_{α; ν_1, ν_2}
3. F < F_{1-α; ν_1, ν_2}
Explanation
of critical
values
The critical values from the F table depend on the
significance level and the degrees of freedom in the standard
deviations from the two processes. For hypothesis (1):
F_{α/2; ν_1, ν_2} is the upper critical value from the F table
with
ν_1 = N_1 - 1 degrees of freedom for the numerator and
ν_2 = N_2 - 1 degrees of freedom for the denominator,
and
F_{α/2; ν_2, ν_1} is the upper critical value from the F table
with
ν_2 degrees of freedom for the numerator and
ν_1 degrees of freedom for the denominator.
Caution on
looking up
critical
values
The F distribution has the property that

    F_{1-α; ν_1, ν_2} = 1 / F_{α; ν_2, ν_1}

which means that only upper critical values are required for
two-sided tests. However, note that the degrees of freedom
are interchanged in the ratio. For example, for a two-sided
test at significance level 0.05, go to the F table labeled "2.5%
significance level".
For F_{1-α/2; ν_1, ν_2}, reverse the order of the degrees of
freedom; i.e., look across the top of the table for ν_2
and down the table for ν_1.
For F_{α/2; ν_1, ν_2}, look across the top of the table for ν_1
and down the table for ν_2.
Critical values for cases (2) and (3) are defined similarly,
except that the critical values for the one-sided tests are
based on α rather than on α/2.
Two-sided
confidence
interval
The two-sided confidence interval for the ratio of the two
unknown variances (squares of the standard deviations) is
shown below.
Two-sided confidence interval with 100(1-α) % coverage
for σ_1²/σ_2²:
One interpretation of the confidence interval is that if the
quantity "one" is contained within the interval, the standard
deviations are equivalent.
Example of
unequal
number of
data points
A new procedure to assemble a device is introduced and
tested for possible improvement in time of assembly. The
question being addressed is whether the standard deviation,
σ_2, of the new assembly process is better (i.e., smaller) than
the standard deviation, σ_1, for the old assembly process.
Therefore, we test the null hypothesis that σ_1 ≤ σ_2. We form
the hypothesis in this way because we hope to reject it, and
therefore accept the alternative that σ_2 is less than σ_1. This is
hypothesis (2). Data (in minutes required to assemble a
device) for both the old and new processes are listed on an
earlier page. Relevant statistics are shown below:
Process 1 Process 2
Mean 36.0909 32.2222
Standard deviation 4.9082 2.5874
No. measurements 11 9
Degrees freedom 10 8
Computation
of the test
statistic
From this table we generate the test statistic

    F = (4.9082)² / (2.5874)² = 3.60
Decision
process
For a test at the 5% significance level, go to the F table for
the 5% significance level, and look up the critical value for
numerator degrees of freedom ν_1 = 10 and
denominator degrees of freedom ν_2 = 8. The critical
value is 3.35. Thus, hypothesis (2) can be rejected because
the test statistic (F = 3.60) is greater than 3.35. Therefore, we
accept the alternative hypothesis that process 2 has better
precision (smaller standard deviation) than process 1.
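A minimal sketch in R using the summary statistics quoted above; the same test can be run from the raw data with var.test:

  s1 <- 4.9082; s2 <- 2.5874; n1 <- 11; n2 <- 9
  s1^2 / s2^2                             # F, about 3.60
  qf(0.95, df1 = n1 - 1, df2 = n2 - 1)    # upper 5 % critical value, about 3.35
  # var.test(old, new, alternative = "greater") performs the equivalent test on the data vectors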
7.3.3. How can we determine whether two processes produce the same proportion of defectives?
http://www.itl.nist.gov/div898/handbook/prc/section3/prc33.htm[6/27/2012 2:42:46 PM]
7. Product and Process Comparisons
7.3. Comparisons based on data from two processes
7.3.3. How can we determine whether two
processes produce the same proportion of
defectives?
Case 1: Large Samples (Normal Approximation to
Binomial)
The
hypothesis of
equal
proportions
can be tested
using a z
statistic
If the samples are reasonably large we can use the normal
approximation to the binomial to develop a test similar to
testing whether two normal means are equal.
Let sample 1 have x_1 defects out of n_1 and sample 2 have
x_2 defects out of n_2. Calculate the proportion of defects for
each sample, phat_1 = x_1/n_1 and phat_2 = x_2/n_2, and the z statistic below:

    z = ( phat_1 - phat_2 ) / sqrt( phat (1 - phat) ( 1/n_1 + 1/n_2 ) )

where phat = ( x_1 + x_2 ) / ( n_1 + n_2 ) is the pooled proportion of defects.
Compare |z| to the normal z_{1-α/2} table value for a two-
sided test. For a one-sided test, assuming the alternative
hypothesis is p_1 > p_2, compare z to the normal z_{1-α} table
value. If the alternative hypothesis is p_1 < p_2, compare z to
z_α.
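A hedged sketch in R of this large-sample statistic, assuming the pooled estimate of the common proportion; the counts in the example call are hypothetical and only illustrate usage:

  two_prop_z <- function(x1, n1, x2, n2) {
    p1 <- x1 / n1; p2 <- x2 / n2
    p  <- (x1 + x2) / (n1 + n2)                        # pooled proportion of defects
    (p1 - p2) / sqrt(p * (1 - p) * (1 / n1 + 1 / n2))
  }
  two_prop_z(15, 200, 8, 180)      # compare |z| with qnorm(1 - alpha/2)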
Case 2: An Exact Test for Small Samples
The Fisher
Exact
Probability
test is an
excellent
choice for
small samples
The Fisher Exact Probability Test is an excellent
nonparametric technique for analyzing discrete data (either
nominal or ordinal), when the two independent samples are
small in size. It is used when the results from two
independent random samples fall into one or the other of
two mutually exclusive classes (i.e., defect versus good, or
successes vs failures).
Example of a 2x2 contingency table
In other words, every subject in each group has one of two
possible scores. These scores are represented by frequencies
in a 2x2 contingency table. The following discussion, using
a 2x2 contingency table, illustrates how the test operates.
We are working with two independent groups, such as
experimental and control groups, males and females, the Chicago
Bulls and the New York Knicks, etc.
- + Total
Group
I
A B A+B
Group
II
C D C+D
Total A+C B+D N
The column headings, here arbitrarily indicated as plus and
minus, may be of any two classifications, such as: above
and below the median, passed and failed, Democrat and
Republican, agree and disagree, etc.
Determine
whether two
groups differ
in the
proportion
with which
they fall into
two
classifications
Fisher's test determines whether the two groups differ in
the proportion with which they fall into the two
classifications. For the table above, the test would
determine whether Group I and Group II differ significantly
in the proportion of plusses and minuses attributed to them.
The method proceeds as follows:
The exact probability of observing a particular set of
frequencies in a 2x2 table, when the marginal totals are
regarded as fixed, is given by the hypergeometric
distribution

    P = [ (A+B)! (C+D)! (A+C)! (B+D)! ] / [ N! A! B! C! D! ]
But the test does not just look at the observed case. If
needed, it also computes the probability of more extreme
outcomes, with the same marginal totals. By "more
extreme", we mean relative to the null hypothesis of equal
proportions.
7.3.3. How can we determine whether two processes produce the same proportion of defectives?
http://www.itl.nist.gov/div898/handbook/prc/section3/prc33.htm[6/27/2012 2:42:46 PM]
Example of
Fisher's test
This will become clear in the next illustrative example.
Consider the following set of 2 x 2 contingency tables:
Observed Data
More extreme outcomes with same
marginals
        (a)             (b)             (c)
      2   5 |  7      1   6 |  7      0   7 |  7
      3   2 |  5      4   1 |  5      5   0 |  5
      5   7 | 12      5   7 | 12      5   7 | 12
Table (a) shows the observed frequencies and tables (b)
and (c) show the two more extreme distributions of
frequencies that could occur with the same marginal totals
7, 5. Given the observed data in table (a), we wish to test
the null hypothesis at, say, α = 0.05.
Applying the previous formula to tables (a), (b), and (c),
we obtain p_a = .26515, p_b = .04419, and p_c = .00126.
The probability associated with the occurrence of values as
extreme as the observed results under H_0 is given by
adding these three p's:
.26515 + .04419 + .00126 = .31060
So p = 0.31060 is the probability that we get from Fisher's
test. Since 0.31060 is larger than α, we cannot reject the
null hypothesis.
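A minimal sketch in R reproducing these probabilities through the hypergeometric distribution (fix the column total of 5 and count how many of those observations fall in Group I):

  dhyper(2, 7, 5, 5)    # table (a): 0.26515
  dhyper(1, 7, 5, 5)    # table (b): 0.04419
  dhyper(0, 7, 5, 5)    # table (c): 0.00126
  phyper(2, 7, 5, 5)    # one-tailed p = 0.31060
  # fisher.test(matrix(c(2, 3, 5, 2), 2, 2), alternative = "less") should give the same p-value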
Tocher's Modification
Tocher's
modification
makes
Fisher's test
less
conservative
Tocher (1950) showed that a slight modification of the
Fisher test makes it a more useful test. Tocher starts by
isolating the probability of all cases more extreme than the
observed one. In this example that is

    p_b + p_c = .04419 + .00126 = .04545
Now, if this probability is larger than α, we cannot reject
H_0. But if this probability is less than α, while the
probability that we got from Fisher's test is greater than α
(as is the case in our example), then Tocher advises to
compute the following ratio:

    ( α - (p_b + p_c) ) / p_a

For the data in the example, that would be

    ( 0.05 - 0.04545 ) / 0.26515 = 0.0172
Now we go to a table of random numbers and at random
draw a number between 0 and 1. If this random number is
smaller than the ratio above of 0.0172, we reject H_0. If it is
larger we cannot reject H_0. This added small probability of
rejecting H_0 brings the test procedure Type I error (i.e., the α
value) to exactly 0.05 and makes the Fisher test less
conservative.
The test is a one-tailed test. For a two-tailed test, the value
of p obtained from the formula must be doubled.
A difficulty with the Tocher procedure is that someone else
analyzing the same data would draw a different random
number and possibly make a different decision about the
validity of H_0.
7.3.4. Assuming the observations are failure times, are the failure rates (or Mean Times To Failure) for two distributions the same?
http://www.itl.nist.gov/div898/handbook/prc/section3/prc34.htm[6/27/2012 2:42:47 PM]
7. Product and Process Comparisons
7.3. Comparisons based on data from two processes
7.3.4. Assuming the observations are failure
times, are the failure rates (or Mean
Times To Failure) for two distributions
the same?
Comparing
two
exponential
distributions
is to
compare the
means or
hazard rates
The comparison of two (or more) life distributions is a
common objective when performing statistical analyses of
lifetime data. Here we look at the one-parameter exponential
distribution case.
In this case, comparing two exponential distributions is
equivalent to comparing their means (or the reciprocal of
their means, known as their hazard rates).
Type II Censored data
Definition
of Type II
censored
data
Definition: Type II censored data occur when a life test is
terminated exactly when a pre-specified number of failures
have occurred. The remaining units have not yet failed. If n
units were on test, and the pre-specified number of failures is
r (where r is less than or equal to n), then the test ends at t_r
= the time of the r-th failure.
Two exponential samples ordered by time
Suppose we have Type II censored data from two
exponential distributions with means θ_1 and θ_2. We have two
samples from these distributions, of sizes n_1 on test with r_1
failures and n_2 on test with r_2 failures, respectively. The
observations are time to failure and are therefore ordered by
time.
Test of equality of θ_1 and θ_2 and confidence interval for θ_1/θ_2
Letting

    T_1 = (sum of the r_1 observed failure times in sample 1) + (n_1 - r_1) t_{1,(r1)}
    T_2 = (sum of the r_2 observed failure times in sample 2) + (n_2 - r_2) t_{2,(r2)}

(the total times on test), then 2T_1/θ_1 and 2T_2/θ_2 are distributed as chi-square
with 2r_1 and 2r_2 degrees of freedom, respectively,
with T_1 and T_2 independent. Thus

    U = ( T_1 / r_1 ) / ( T_2 / r_2 )

where U (θ_2/θ_1)
has an F distribution with (2r_1, 2r_2) degrees of freedom.
Tests of equality of θ_1 and θ_2 can be performed using tables
of the F distribution or computer programs. Confidence
intervals for θ_1/θ_2, which is the ratio of the means or the
hazard rates for the two distributions, are also readily
obtained.
Numerical
example
A numerical application will illustrate the concepts outlined
above.
For this example,
H_0: θ_1/θ_2 = 1
H_a: θ_1/θ_2 ≠ 1
Two samples of size 10 from exponential distributions were
put on life test. The first sample was censored after 7 failures
and the second sample was censored after 5 failures. The
times to failure were:
Sample 1: 125 189 210 356 468 550 610
Sample 2: 170 234 280 350 467
So r_1 = 7, r_2 = 5 and t_{1,(r1)} = 610, t_{2,(r2)} = 467.
Then T_1 = 4338 and T_2 = 3836.
The estimator for θ_1 is 4338 / 7 = 619.71 and the estimator
for θ_2 is 3836 / 5 = 767.20.
The ratio of the estimators = U = 619.71 / 767.20 = .808.
If the means are the same, the ratio of the estimators, U,
follows an F distribution with 2r_1, 2r_2 degrees of freedom.
The P(F < .808) = .348. The associated p-value is 2(.348) =
.696. Based on this p-value, we find no evidence to reject the
null hypothesis (that the true but unknown ratio = 1). Note
that this is a two-sided test, and we would reject the null
hypothesis if the tail probability were either too small (i.e., less than or
equal to .025) or too large (i.e., greater than or equal to .975)
for a test at the 0.05 significance level.
We can also put a 95% confidence interval around the ratio
of the two means. Since the .025 and .975 quantiles of
F_(14, 10) are 0.3178 and 3.5504, respectively, we have

    Pr( U/3.5504 < θ_1/θ_2 < U/.3178 ) = .95

and (.228, 2.542) is a 95% confidence interval for the ratio
of the unknown means. The value of 1 is within this range,
which is another way of showing that we cannot reject the
null hypothesis at the 0.05 significance level.
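A hedged sketch in R of the calculations above for the censored-exponential example; the numbers are approximate:

  t1 <- c(125, 189, 210, 356, 468, 550, 610); t2 <- c(170, 234, 280, 350, 467)
  n1 <- 10; n2 <- 10; r1 <- length(t1); r2 <- length(t2)
  T1 <- sum(t1) + (n1 - r1) * max(t1)      # total time on test, 4338
  T2 <- sum(t2) + (n2 - r2) * max(t2)      # 3836
  U  <- (T1 / r1) / (T2 / r2)              # 0.808
  2 * pf(U, 2 * r1, 2 * r2)                # two-sided p-value, about 0.70
  U / qf(c(0.975, 0.025), 2 * r1, 2 * r2)  # 95 % interval for theta1/theta2, about (0.228, 2.542)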
7.3.5. Do two arbitrary processes have the same central tendency?
http://www.itl.nist.gov/div898/handbook/prc/section3/prc35.htm[6/27/2012 2:42:48 PM]
7. Product and Process Comparisons
7.3. Comparisons based on data from two processes
7.3.5. Do two arbitrary processes have the same central
tendency?
The
nonparametric
equivalent of
the t test is
due to Mann
and Whitney,
called the U
test
By "arbitrary" we mean that we make no underlying assumptions about
normality or any other distribution. The test is called the Mann-Whitney U
Test, which is the nonparametric equivalent of the t test for means.
The U-test (like the majority of nonparametric tests) uses the rank sums of the
two samples.
Procedure The test is implemented as follows.
1. Rank all (n_1 + n_2) observations in ascending order. Ties receive the
average of the ranks they would otherwise occupy.
2. Calculate the sum of the ranks for each sample; call these T_a and T_b.
3. Calculate the U statistic,
   U_a = n_1 n_2 + 0.5 n_1 (n_1 + 1) - T_a
   or
   U_b = n_1 n_2 + 0.5 n_2 (n_2 + 1) - T_b
   where U_a + U_b = n_1 n_2.
Null
Hypothesis
The null hypothesis is: the two populations have the same central tendency.
The alternative hypothesis is: The central tendencies are NOT the same.
Test statistic The test statistic, U, is the smaller of U_a and U_b. For sample sizes larger than
20, we can use the normal z as follows:

    z = [ U - E(U) ] / σ_U

where

    E(U) = n_1 n_2 / 2   and   σ_U = sqrt( n_1 n_2 ( n_1 + n_2 + 1 ) / 12 )

The critical value is the normal tabled z for α/2 for a two-tailed test, or z at the α
level for a one-tail test.
For small samples, tables are readily available in most textbooks on
nonparametric statistics.
Example
An illustrative
example of the
U test
Two processing systems were used to clean wafers. The following data
represent the (coded) particle counts. The null hypothesis is that there is no
difference between the central tendencies of the particle counts; the alternative
hypothesis is that there is a difference. The solution shows the typical kind of
output that software for this procedure would generate, based on the large-sample
approximation.
Group A Rank Group B Rank
.55 8 .49 5
.67 15.5 .68 17
.43 1 .59 9.5
.51 6 .72 19
.48 3.5 .67 15.5
.60 11 .75 20.5
.71 18 .65 13.5
.53 7 .77 22
.44 2 .62 12
.65 13.5 .48 3.5
.75 20.5 .59 9.5
N Sum of Ranks U Std. Dev of U Median
A 11 106.000 81.000 15.229 0.540
B 11 147.000 40.000 15.229 0.635
For U = 40.0 and E[U] = 0.5 (n_1)(n_2) = 60.5, the test statistic is

    z = ( 40.0 - 60.5 ) / 15.23 = -1.346

where σ_U = 15.23 is the standard deviation of U shown in the table above.
For a two-sided test with significance level α = 0.05, the critical value is z_{1-α/2}
= 1.96. Since |z| is less than the critical value, we do not reject the null
hypothesis and conclude that there is not enough evidence to claim that the two
groups have different central tendencies.
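A hedged sketch in R for the particle-count example; wilcox.test is the rank-sum form of the U test, and with tied values it falls back on the normal approximation used above:

  a <- c(.55, .67, .43, .51, .48, .60, .71, .53, .44, .65, .75)
  b <- c(.49, .68, .59, .72, .67, .75, .65, .77, .62, .48, .59)
  wilcox.test(a, b, correct = FALSE)
  # hand calculation: U = 40, E(U) = 60.5, sd(U) about 15.2, so z is about -1.35 (not significant)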
7.4. Comparisons based on data from more than two processes
http://www.itl.nist.gov/div898/handbook/prc/section4/prc4.htm[6/27/2012 2:42:49 PM]
7. Product and Process Comparisons
7.4. Comparisons based on data from more
than two processes
Introduction This section begins with a nonparametric procedure for
comparing several populations with unknown distributions.
Then the following topics are discussed:
Comparing variances
Comparing means (ANOVA technique)
Estimating variance components
Comparing categorical data
Comparing population proportion defectives
Making multiple comparisons
7.4.1. How can we compare several populations with unknown distributions (the Kruskal-Wallis test)?
http://www.itl.nist.gov/div898/handbook/prc/section4/prc41.htm[6/27/2012 2:42:49 PM]
7. Product and Process Comparisons
7.4. Comparisons based on data from more than two processes
7.4.1. How can we compare several populations
with unknown distributions (the Kruskal-
Wallis test)?
The Kruskal-Wallis (KW) Test for Comparing
Populations with Unknown Distributions
A
nonparametric
test for
comparing
population
medians by
Kruskal and
Wallis
The KW procedure tests the null hypothesis that k samples
from possibly different populations actually originate from
similar populations, at least as far as their central
tendencies, or medians, are concerned. The test assumes
that the variables under consideration have underlying
continuous distributions.
In what follows assume we have k samples, and the
sample size of the i-th sample is n_i, i = 1, 2, . . ., k.
Test based on
ranks of
combined data
In the computation of the KW statistic, each observation is
replaced by its rank in an ordered combination of all the k
samples. By this we mean that the data from the k samples
combined are ranked in a single series. The minimum
observation is replaced by a rank of 1, the next-to-the-
smallest by a rank of 2, and the largest or maximum
observation is replaced by the rank of N, where N is the
total number of observations in all the samples (N is the
sum of the n_i).
Compute the
sum of the
ranks for each
sample
The next step is to compute the sum of the ranks for each
of the original samples. The KW test determines whether
these sums of ranks are so different by sample that they are
not likely to have all come from the same population.
Test statistic follows a χ² distribution
It can be shown that if the k samples come from the same
population, that is, if the null hypothesis is true, then the
test statistic, H, used in the KW procedure is distributed
approximately as a chi-square statistic with df = k - 1,
provided that the sample sizes of the k samples are not too
small (say, n_i > 4, for all i). H is defined as follows:

    H = [ 12 / ( N (N + 1) ) ] Σ_{i=1}^{k} ( R_i² / n_i ) - 3 (N + 1)
where
k = number of samples (groups)
n_i = number of observations for the i-th sample or group
N = total number of observations (sum of all the n_i)
R_i = sum of ranks for group i
Example
An illustrative
example
The following data are from a comparison of four
investment firms. The observations represent percentage of
growth during a three-month period for recommended
funds.
A B C D
4.2 3.3 1.9 3.5
4.6 2.4 2.4 3.1
3.9 2.6 2.1 3.7
4.0 3.8 2.7 4.1
2.8 1.8 4.4
Step 1: Express the data in terms of their ranks
A B C D
17 10 2 11
19 4.5 4.5 9
14 6 3 12
15 13 7 16
8 1 18
SUM 65 41.5 17.5 66
Compute the test statistic
The corresponding H test statistic is

    H = [ 12 / ( 19 · 20 ) ] ( 65²/4 + 41.5²/5 + 17.5²/5 + 66²/5 ) - 3 (20) = 13.678

From the chi-square table in Chapter 1, the critical value
for 1-α = 0.95 with df = k-1 = 3 is 7.812. Since 13.678 >
7.812, we reject the null hypothesis.
Note that the rejection region for the KW procedure is one-
sided, since we only reject the null hypothesis when the H
statistic is too large.
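A minimal sketch in R for the investment-firm example; kruskal.test applies a small tie correction, so its value differs slightly from the hand calculation:

  growth <- list(A = c(4.2, 4.6, 3.9, 4.0),
                 B = c(3.3, 2.4, 2.6, 3.8, 2.8),
                 C = c(1.9, 2.4, 2.1, 2.7, 1.8),
                 D = c(3.5, 3.1, 3.7, 4.1, 4.4))
  kruskal.test(growth)       # H near 13.68 on 3 df; compare with qchisq(0.95, 3), about 7.81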
7.4.2. Assuming the observations are normal, do the processes have the same variance?
http://www.itl.nist.gov/div898/handbook/prc/section4/prc42.htm[6/27/2012 2:42:50 PM]
7. Product and Process Comparisons
7.4. Comparisons based on data from more than two processes
7.4.2. Assuming the observations are normal, do
the processes have the same variance?
Before
comparing
means, test
whether the
variances
are equal
Techniques for comparing means of normal populations
generally assume the populations have the same variance.
Before using these ANOVA techniques, it is advisable to test
whether this assumption of homogeneity of variance is
reasonable. The following procedure is widely used for this
purpose.
Bartlett's Test for Homogeneity of Variances
Null
hypothesis
Bartlett's test is a commonly used test for equal variances.
Let's examine the null and alternative hypotheses:

    H_0: σ_1² = σ_2² = ... = σ_k²

against

    H_a: σ_i² ≠ σ_j² for at least one pair (i, j).
Test
statistic
Assume we have samples of size n_i from the i-th population,
i = 1, 2, . . . , k, and the usual variance estimates from each
sample, s_1², s_2², ..., s_k², where

    s_i² = Σ_j ( x_ij - xbar_i )² / ( n_i - 1 ).

Now introduce the following notation: ν_j = n_j - 1 (the ν_j are
the degrees of freedom), ν = Σ ν_j, and the pooled variance

    s_p² = Σ_j ν_j s_j² / ν.

The Bartlett's test statistic M is defined by

    M = ν ln( s_p² ) - Σ_j ν_j ln( s_j² ).
Distribution
of the test
statistic
When none of the degrees of freedom is small, Bartlett
showed that M is distributed approximately as χ²_{k-1}. The chi-
square approximation is generally acceptable if all the n_i are
at least 5.
Bias
correction
This is a slightly biased test, according to Bartlett. It can be
improved by dividing M by the factor

    C = 1 + [ 1 / ( 3 (k - 1) ) ] ( Σ_j 1/ν_j - 1/ν ).

Instead of M, it is suggested to use M/C for the test statistic.
Bartlett's
test is not
robust
This test is not robust; it is very sensitive to departures from
normality.
An alternative description of Bartlett's test appears in Chapter
1.
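A minimal sketch in R of the call; the measurements below are hypothetical and serve only to show usage:

  y <- c(2.1, 2.4, 1.9, 2.3,  3.0, 3.4, 2.8, 3.1,  2.2, 2.0, 2.5, 2.3)
  batch <- factor(rep(c("b1", "b2", "b3"), each = 4))
  bartlett.test(y, batch)    # reports the bias-corrected statistic with a chi-square p-value on k - 1 df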
Gear Data Example (from Chapter 1):
An
illustrative
example of
Bartlett's
test
Gear diameter measurements were made on 10 batches of
product. The complete set of measurements appears in
Chapter 1. Bartlett's test was applied to this dataset, leading to
a rejection of the assumption of equal batch variances at the
0.05 significance level.
The Levene Test for Homogeneity of Variances
The Levene
test for
equality of
variances
Levene's test offers a more robust alternative to Bartlett's
procedure. That means it will be less likely to reject a true
hypothesis of equality of variances just because the
distributions of the sampled populations are not normal.
When non-normality is suspected, Levene's procedure is a
better choice than Bartlett's.
Levene's test is described in Chapter 1. This description also
includes an example where the test is applied to the gear
data. Levene's test does not reject the assumption of equality
of batch variances for these data. This differs from the
conclusion drawn from Bartlett's test and is a better answer
if, indeed, the batch population distributions are non-normal.
7.4.3. Are the means equal?
http://www.itl.nist.gov/div898/handbook/prc/section4/prc43.htm[6/27/2012 2:42:51 PM]
7. Product and Process Comparisons
7.4. Comparisons based on data from more than two processes
7.4.3. Are the means equal?
Test
equality of
means
The procedure known as the Analysis of Variance or ANOVA
is used to test hypotheses concerning means when we have
several populations.
The Analysis of Variance (ANOVA)
The ANOVA
procedure
is one of the
most
powerful
statistical
techniques
ANOVA is a general technique that can be used to test the
hypothesis that the means among two or more groups are
equal, under the assumption that the sampled populations are
normally distributed.
A couple of questions come immediately to mind: which
means are being compared? And why analyze variances in order to derive
conclusions about the means?
Both questions will be answered as we delve further into the
subject.
Introduction
to ANOVA
To begin, let us study the effect of temperature on a passive
component such as a resistor. We select three different
temperatures and observe their effect on the resistors. This
experiment can be conducted by measuring all the
participating resistors before placing n resistors each in three
different ovens.
Each oven is heated to a selected temperature. Then we
measure the resistors again after, say, 24 hours and analyze
the responses, which are the differences between before and
after being subjected to the temperatures. The temperature is
called a factor. The different temperature settings are called
levels. In this example there are three levels or settings of the
factor Temperature.
What is a
factor?
A factor is an independent treatment variable whose
settings (values) are controlled and varied by the
experimenter. The intensity setting of a factor is the level.
Levels may be quantitative numbers or, in many
cases, simply "present" or "not present" ("0" or
"1").
The 1-way ANOVA
In the experiment above, there is only one factor,
temperature, and the analysis of variance that we will be
using to analyze the effect of temperature is called a one-way
or one-factor ANOVA.
The 2-way
or 3-way
ANOVA
We could have opted to also study the effect of positions in
the oven. In this case there would be two factors, temperature
and oven position. Here we speak of a two-way or two-
factor ANOVA. Furthermore, we may be interested in a third
factor, the effect of time. Now we deal with a three-way or
three-factor ANOVA. In each of these ANOVAs we test a
variety of hypotheses of equality of means (or average
responses when the factors are varied).
Hypotheses
that can be
tested in an
ANOVA
First consider the one-way ANOVA. The null hypothesis is:
there is no difference in the population means of the different
levels of factor A (the only factor).
The alternative hypothesis is: the means are not the same.
For the 2-way ANOVA, the possible null hypotheses are:
1. There is no difference in the means of factor A
2. There is no difference in means of factor B
3. There is no interaction between factors A and B
The alternative hypothesis for cases 1 and 2 is: the means are
not equal.
The alternative hypothesis for case 3 is: there is an
interaction between A and B.
For the 3-way ANOVA: The main effects are factors A, B
and C. The 2-factor interactions are: AB, AC, and BC. There
is also a three-factor interaction: ABC.
For each of the seven cases the null hypothesis is the same:
there is no difference in means, and the alternative hypothesis
is the means are not equal.
The n-way
ANOVA
In general, the number of main effects and interactions can
be found by the following expression:

    C(n, 0) + C(n, 1) + C(n, 2) + ... + C(n, n) = 2^n

The first term is for the overall mean, and is always 1. The
second term is for the number of main effects. The third term
is for the number of 2-factor interactions, and so on. The last
term is for the n-factor interaction and is always 1.
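A minimal sketch in R of the count, using binomial coefficients, for the n = 3 case discussed above:

  n <- 3
  choose(n, 0:n)        # 1 overall mean, 3 main effects, 3 two-factor interactions, 1 three-factor
  sum(choose(n, 0:n))   # 2^n = 8 terms in all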
In what follows, we will discuss only the 1-way and 2-way
ANOVA.
7.4.3.1. 1-Way ANOVA overview
http://www.itl.nist.gov/div898/handbook/prc/section4/prc431.htm[6/27/2012 2:42:52 PM]
7. Product and Process Comparisons
7.4. Comparisons based on data from more than two processes
7.4.3. Are the means equal?
7.4.3.1. 1-Way ANOVA overview
Overview and
principles
This section gives an overview of the one-way ANOVA.
First we explain the principles involved in the 1-way
ANOVA.
Partition
response into
components
In an analysis of variance the variation in the response
measurements is partitioned into components that
correspond to different sources of variation.
The goal in this procedure is to split the total variation in
the data into a portion due to random error and portions
due to changes in the values of the independent
variable(s).
Variance of n
measurements
The variance of n measurements is given by

    s² = Σ_{i=1}^{n} ( y_i - ybar )² / ( n - 1 )

where ybar is the mean of the n measurements.
Sums of
squares and
degrees of
freedom
The numerator part is called the sum of squares of
deviations from the mean, and the denominator is called
the degrees of freedom.
The variance, after some algebra, can be rewritten as:

    s² = [ Σ y_i² - ( Σ y_i )² / n ] / ( n - 1 )
The first term in the numerator is called the "raw sum of
squares" and the second term is called the "correction term
for the mean". Another name for the numerator is the
"corrected sum of squares", and this is usually abbreviated
by Total SS or SS(Total).
The SS in a 1-way ANOVA can be split into two
components, called the "sum of squares of treatments" and
"sum of squares of error", abbreviated as SST and SSE,
respectively.
The guiding
principle
behind
ANOVA is the
decomposition
of the sums of
squares, or
Total SS
Algebraically, this is expressed by

    SS(Total) = Σ_i Σ_j ( y_ij - ybar.. )²
              = Σ_i n_i ( ybar_i. - ybar.. )² + Σ_i Σ_j ( y_ij - ybar_i. )²
              = SST + SSE

where k is the number of treatments and the bar over the
y.. denotes the "grand" or "overall" mean. Each n_i is the
number of observations for treatment i. The total number of
observations is N (the sum of the n_i).
Note on
subscripting
Don't be alarmed by the double subscripting. The total SS
can be written single or double subscripted. The double
subscript stems from the way the data are arranged in the
data table. The table is usually a rectangular array with k
columns and each column consists of n_i rows (however, the
lengths of the columns, or the n_i, may be unequal).
Definition of
"Treatment"
We introduced the concept of treatment. The definition is:
A treatment is a specific combination of factor levels
whose effect is to be compared with other treatments.
7.4.3.2. The 1-way ANOVA model and assumptions
http://www.itl.nist.gov/div898/handbook/prc/section4/prc432.htm[6/27/2012 2:42:53 PM]
7. Product and Process Comparisons
7.4. Comparisons based on data from more than two processes
7.4.3. Are the means equal?
7.4.3.2. The 1-way ANOVA model and
assumptions
A model
that
describes
the
relationship
between the
response
and the
treatment
(between
the
dependent
and
independent
variables)
The mathematical model that describes the relationship
between the response and treatment for the one-way ANOVA
is given by

    Y_ij = μ + τ_i + ε_ij

where Y_ij represents the j-th observation (j = 1, 2, ..., n_i) on the
i-th treatment (i = 1, 2, ..., k levels). So, Y_23 represents the
third observation using level 2 of the factor. μ is the common
effect for the whole experiment, τ_i represents the i-th
treatment effect and ε_ij represents the random error present in
the j-th observation on the i-th treatment.
Fixed
effects
model
The errors ε_ij are assumed to be normally and independently
(NID) distributed, with mean zero and variance σ_ε². μ is
always a fixed parameter, and τ_1, τ_2, ..., τ_k are considered to
be fixed parameters if the levels of the treatment are fixed,
and not a random sample from a population of possible
levels. It is also assumed that μ is chosen so that

    Σ τ_i = 0

holds. This is the fixed effects model.
Random
effects
model
If the k levels of treatment are chosen at random, the model
equation remains the same. However, now the τ_i's are
random variables assumed to be NID(0, σ_τ²). This is the
random effects model.
Whether the levels are fixed or random depends on how these
levels are chosen in a given experiment.
7.4.3.3. The ANOVA table and tests of hypotheses about means
http://www.itl.nist.gov/div898/handbook/prc/section4/prc433.htm[6/27/2012 2:42:54 PM]
7. Product and Process Comparisons
7.4. Comparisons based on data from more than two processes
7.4.3. Are the means equal?
7.4.3.3. The ANOVA table and tests of
hypotheses about means
Sums of
Squares help
us compute
the variance
estimates
displayed in
ANOVA
Tables
The sums of squares SST and SSE previously computed for
the one-way ANOVA are used to form two mean squares,
one for treatments and the second for error. These mean
squares are denoted by MST and MSE, respectively. These
are typically displayed in a tabular form, known as an
ANOVA Table. The ANOVA table also shows the statistics
used to test hypotheses about the population means.
Ratio of MST
and MSE
When the null hypothesis of equal means is true, the two
mean squares estimate the same quantity (error variance),
and should be of approximately equal magnitude. In other
words, their ratio should be close to 1. If the null hypothesis
is false, MST should be larger than MSE.
Divide sum of
squares by
degrees of
freedom to
obtain mean
squares
The mean squares are formed by dividing the sum of
squares by the associated degrees of freedom.
Let N = Σ n_i. Then, the degrees of freedom for treatment,
DFT = k - 1, and the degrees of freedom for error, DFE =
N - k.
The corresponding mean squares are:
MST = SST / DFT
MSE = SSE / DFE
The F-test The test statistic, used in testing the equality of treatment
means is: F = MST / MSE.
The critical value is the tabular value of the F distribution,
based on the chosen level and the degrees of freedom
DFT and DFE.
The calculations are displayed in an ANOVA table, as
follows:
ANOVA table
Source             SS    DF    MS            F
Treatments         SST   k-1   SST / (k-1)   MST/MSE
Error              SSE   N-k   SSE / (N-k)
Total (corrected)  SS    N-1
The word "source" stands for source of variation. Some
authors prefer to use "between" and "within" instead of
"treatments" and "error", respectively.
ANOVA Table Example
A numerical
example
The data below resulted from measuring the difference in
resistance resulting from subjecting identical resistors to
three different temperatures for a period of 24 hours. The
sample size of each group was 5. In the language of Design
of Experiments, we have an experiment in which each of
three treatments was replicated 5 times.
Level 1 Level 2 Level 3
6.9 8.3 8.0
5.4 6.8 10.5
5.8 7.8 8.1
4.6 9.2 6.9
4.0 6.5 9.3
means 5.34 7.72 8.56
The resulting ANOVA table is
Example
ANOVA table
Source SS DF MS F
Treatments 27.897 2 13.949 9.59
Error 17.452 12 1.454
Total (corrected) 45.349 14
Correction Factor 779.041 1
Interpretation
of the
ANOVA table
The test statistic is the F value of 9.59. Using an α of .05,
we have that F_{.05; 2, 12} = 3.89 (see the F distribution table in
Chapter 1). Since the test statistic is much larger than the
critical value, we reject the null hypothesis of equal
population means and conclude that there is a (statistically)
significant difference among the population means. The p-
value for 9.59 is .00325, so the test statistic is significant at
that level.
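A minimal sketch in R reproducing this table for the three-temperature resistor data listed above; the factor-level names are placeholders:

  y <- c(6.9, 5.4, 5.8, 4.6, 4.0,  8.3, 6.8, 7.8, 9.2, 6.5,  8.0, 10.5, 8.1, 6.9, 9.3)
  temperature <- factor(rep(c("level1", "level2", "level3"), each = 5))
  anova(lm(y ~ temperature))   # F = 9.59 on (2, 12) degrees of freedom, p about 0.0033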
Techniques
for further
analysis
The populations here are resistor readings while operating
under the three different temperatures. What we do not
know at this point is whether the three means are all
different or which of the three means is different from the
other two, and by how much.
There are several techniques we might use to further
analyze the differences. These are:
constructing confidence intervals around the
difference of two means,
estimating combinations of factor levels with
confidence bounds
multiple comparisons of combinations of factor levels
tested simultaneously.
7.4.3.4. 1-Way ANOVA calculations
http://www.itl.nist.gov/div898/handbook/prc/section4/prc434.htm[6/27/2012 2:42:55 PM]
7. Product and Process Comparisons
7.4. Comparisons based on data from more than two processes
7.4.3. Are the means equal?
7.4.3.4. 1-Way ANOVA calculations
Formulas
for 1-way
ANOVA
hand
calculations
Although computer programs that do ANOVA calculations
now are common, for reference purposes this page describes
how to calculate the various entries in an ANOVA table.
Remember, the goal is to produce two variances (of
treatments and error) and their ratio. The various
computational formulas will be shown and applied to the data
from the previous example.
Step 1:
compute
CM
STEP 1 Compute CM, the correction for the mean.

    CM = ( Σ y )² / N = (108.1)² / 15 = 779.041
Step 2:
compute
total SS
STEP 2 Compute the total SS.
The total SS = sum of squares of all observations - CM

    Total SS = 824.390 - 779.041 = 45.349

The 824.390 SS is called the "raw" or "uncorrected" sum of
squares.
Step 3:
compute
SST
STEP 3 Compute SST, the treatment sum of squares.
First we compute the total (sum) for each treatment.

    T_1 = (6.9) + (5.4) + ... + (4.0) = 26.7
    T_2 = (8.3) + (6.8) + ... + (6.5) = 38.6
    T_3 = (8.0) + (10.5) + ... + (9.3) = 42.8

Then

    SST = ( 26.7² + 38.6² + 42.8² ) / 5 - CM = 806.938 - 779.041 = 27.897
Step 4:
compute
SSE
STEP 4 Compute SSE, the error sum of squares.
Here we utilize the property that the treatment sum of squares
plus the error sum of squares equals the total sum of squares.
Hence, SSE = SS Total - SST = 45.349 - 27.897 = 17.452.
Step 5:
Compute
MST, MSE,
and F
STEP 5 Compute MST, MSE and their ratio, F.
MST is the mean square of treatments, MSE is the mean
square of error (MSE is also frequently denoted by s_e², the
estimate of the error variance).
MST = SST / (k-1) = 27.897 / 2 = 13.949
MSE = SSE / (N-k) = 17.452 / 12 = 1.454
where N is the total number of observations and k is the
number of treatments. Finally, compute F as
F = MST / MSE = 9.59
That is it. These numbers are the quantities that are
assembled in the ANOVA table that was shown previously.
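A minimal sketch in R following the five steps above for the same data; the values are approximate:

  y <- c(6.9, 5.4, 5.8, 4.6, 4.0,  8.3, 6.8, 7.8, 9.2, 6.5,  8.0, 10.5, 8.1, 6.9, 9.3)
  g <- rep(1:3, each = 5); N <- length(y); k <- 3
  CM    <- sum(y)^2 / N                         # 779.041
  SStot <- sum(y^2) - CM                        # 45.349
  SST   <- sum(tapply(y, g, sum)^2 / 5) - CM    # 27.897
  SSE   <- SStot - SST                          # 17.452
  (SST / (k - 1)) / (SSE / (N - k))             # F = 9.59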
7.4.3.5. Confidence intervals for the difference of treatment means
http://www.itl.nist.gov/div898/handbook/prc/section4/prc435.htm[6/27/2012 2:42:56 PM]
7. Product and Process Comparisons
7.4. Comparisons based on data from more than two processes
7.4.3. Are the means equal?
7.4.3.5. Confidence intervals for the difference
of treatment means
Confidence
intervals for
the
difference
between two
means
This page shows how to construct a confidence interval
around (μ_i - μ_j) for the one-way ANOVA by continuing the
example shown on a previous page.
Formula for the confidence interval
The formula for a (1-α) 100% confidence interval for the
difference between two treatment means is:

    ( ybar_i. - ybar_j. ) ± t_{1-α/2, N-k} sqrt( s_e² ( 1/n_i + 1/n_j ) )

where s_e² = MSE.
Computation of the confidence interval for μ_3 - μ_1
For the example, we have the following quantities for the
formula:
    ybar_3. = 8.56
    ybar_1. = 5.34
    t_{0.975, 12} = 2.179
Substituting these values yields (8.56 - 5.34) ± 2.179(0.763)
or 3.22 ± 1.616.
That is, the confidence interval is from 1.604 to 4.836.
Additional
95%
confidence
intervals
A 95% confidence interval for μ_3 - μ_2 is: from -1.787 to
3.467.
A 95% confidence interval for μ_2 - μ_1 is: from -0.247 to
5.007.
Contrasts discussed later
Later on, the topic of estimating more general linear
combinations of means (primarily contrasts) will be
discussed, including how to put confidence bounds around
contrasts.
7.4.3.6. Assessing the response from any factor combination
Contrasts This page treats how to estimate and put confidence bounds
around the response to different combinations of factors.
Primary focus is on the combinations that are known as
contrasts. We begin, however, with the simple case of a
single factor-level mean.
Estimation of a Factor Level Mean With Confidence
Bounds
Estimating factor level means
An unbiased estimator of the factor level mean μi in the 1-way ANOVA model is given by the sample mean of the observations at that level:

    Ȳi. = (1/ni) Σj Yij

where ni is the number of observations at level i.
Variance of the factor level means
The variance of this sample mean estimator is σ²/ni, which is estimated by MSE/ni.
Confidence intervals for the factor level means
It can be shown that:

    (Ȳi. - μi) / sqrt(MSE/ni)

has a t distribution with (N - k) degrees of freedom for the ANOVA model under consideration, where N is the total number of observations and k is the number of factor levels or groups. The degrees of freedom are the same as were used to calculate the MSE in the ANOVA table. That is: dfe (degrees of freedom for error) = N - k. From this we can calculate (1-α)100% confidence limits for each μi. These are given by:

    Ȳi. ± t(1-α/2; N-k) · sqrt(MSE/ni)
Example 1
Example for
a 4-level
treatment (or
4 different
treatments)
The data in the accompanying table resulted from an
experiment run in a completely randomized design in which
each of four treatments was replicated five times.
                 Observations                     Total    Mean
    Group 1      6.9   5.4   5.8   4.6   4.0      26.70    5.34
    Group 2      8.3   6.8   7.8   9.2   6.5      38.60    7.72
    Group 3      8.0  10.5   8.1   6.9   9.3      42.80    8.56
    Group 4      5.8   3.8   6.1   5.6   6.2      27.50    5.50
    All Groups                                   135.60    6.78
1-Way ANOVA table layout
This experiment can be illustrated by the table layout for this 1-way ANOVA experiment shown below:

    Level               Sample j
      i        1      2     ...     5       Sum    Mean   N
      1       Y11    Y12    ...    Y15      Y1.    Ȳ1.    n1
      2       Y21    Y22    ...    Y25      Y2.    Ȳ2.    n2
      3       Y31    Y32    ...    Y35      Y3.    Ȳ3.    n3
      4       Y41    Y42    ...    Y45      Y4.    Ȳ4.    n4
     All                                    Y..    Ȳ..    nt
ANOVA table
The resulting ANOVA table is

    Source               SS        DF     MS       F
    Treatments          38.820      3    12.940   9.724
    Error               21.292     16     1.331
    Total (Corrected)   60.112     19
    Mean               919.368      1
    Total (Raw)        979.480     20
The estimate for the mean of group 1 is 5.34, and the sample size is n1 = 5.
Computing the confidence interval
Since the confidence interval is two-sided, the entry (1 - α/2) value for the t table is (1 - 0.05/2) = 0.975, and the associated degrees of freedom is N - 4, or 20 - 4 = 16.

From the t table in Chapter 1, we obtain t(0.975; 16) = 2.120.

Next we need the standard error of the mean for group 1:

    sqrt(MSE / n1) = sqrt(1.331 / 5) = 0.5159

Hence, we obtain confidence limits 5.34 ± 2.120 (0.5159), and the confidence interval is

    4.246 ≤ μ1 ≤ 6.434
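The same interval can be reproduced with a short R sketch; the data are those in the 4-treatment table above, and the variable names are illustrative only.

y <- c(6.9, 5.4, 5.8, 4.6, 4.0,
       8.3, 6.8, 7.8, 9.2, 6.5,
       8.0, 10.5, 8.1, 6.9, 9.3,
       5.8, 3.8, 6.1, 5.6, 6.2)
group <- factor(rep(1:4, each = 5))
fit <- aov(y ~ group)
mse <- deviance(fit) / df.residual(fit)          # 1.331
se1 <- sqrt(mse / 5)                             # 0.5159
mean(y[group == 1]) +
  c(-1, 1) * qt(0.975, df.residual(fit)) * se1   # about (4.25, 6.43)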
Definition and Estimation of Contrasts
Definition of contrasts and orthogonal contrasts
Definitions

A contrast is a linear combination of 2 or more factor level means with coefficients that sum to zero.

Two contrasts are orthogonal if the sum of the products of corresponding coefficients (i.e., coefficients for the same means) adds to zero.

Formally, the definition of a contrast is expressed below, using the notation μi for the i-th treatment mean:

    C = c1·μ1 + c2·μ2 + ... + cj·μj + ... + ck·μk

where

    c1 + c2 + ... + cj + ... + ck = Σ cj = 0
Simple contrasts include the case of the difference between two factor means, such as μ1 - μ2. If one wishes to compare treatments 1 and 2 with treatment 3, one way of expressing this is by: μ1 + μ2 - 2μ3. Note that

    μ1 - μ2 has coefficients +1, -1
    μ1 + μ2 - 2μ3 has coefficients +1, +1, -2.

These coefficients sum to zero.
An example of orthogonal contrasts
As an example of orthogonal contrasts, note the three contrasts defined by the table below, where the rows denote coefficients for the column treatment means.

            μ1    μ2    μ3    μ4
    c1      +1     0     0    -1
    c2       0    +1    -1     0
    c3      +1    -1    -1    +1
Some
properties of
orthogonal
contrasts
The following is true:
1. The sum of the coefficients for each contrast is zero.
2. The sum of the products of coefficients of each pair
of contrasts is also 0 (orthogonality property).
3. The first two contrasts are simply pairwise
comparisons, the third one involves all the treatments.
Estimation of contrasts
As might be expected, contrasts are estimated by taking the same linear combination of treatment mean estimators. In other words:

    Ĉ = c1·Ȳ1. + c2·Ȳ2. + ... + ck·Ȳk. = Σ ci Ȳi.

and its estimated variance is

    s²(Ĉ) = MSE · Σ (ci² / ni)

Note: These formulas hold for any linear combination of treatment means, not just for contrasts.
Confidence Interval for a Contrast
Confidence intervals for contrasts
An unbiased estimator for a contrast C is given by

    Ĉ = Σ ci Ȳi.

The estimator of its variance is

    s²(Ĉ) = MSE · Σ (ci² / ni)

The estimator Ĉ is normally distributed because it is a linear combination of independent normal random variables. It can be shown that:

    (Ĉ - C) / s(Ĉ)

is distributed as t(N-r) for the one-way ANOVA model under discussion. Therefore, the 1-α confidence limits for C are:

    Ĉ ± t(1-α/2; N-r) · s(Ĉ)
Example 2 (estimating contrast)

Contrast to estimate
We wish to estimate, in our previous example, the following contrast:

    C = (μ1 + μ2)/2 - (μ3 + μ4)/2

and construct a 95 % confidence interval for C.

Computing the point estimate and standard error
The point estimate is:

    Ĉ = (Ȳ1. + Ȳ2.)/2 - (Ȳ3. + Ȳ4.)/2 = (5.34 + 7.72)/2 - (8.56 + 5.50)/2 = -0.5

Applying the formulas above we obtain

    Σ ci²/ni = 4(0.5)²/5 = 0.2

and

    s²(Ĉ) = MSE · Σ ci²/ni = 0.2661

and the standard error is s(Ĉ) = 0.5159.
Confidence interval
For a confidence coefficient of 95 % and df = 20 - 4 = 16, t(0.975; 16) = 2.12. Therefore, the desired 95 % confidence interval is -0.5 ± 2.12(0.5159), or

    (-1.594, 0.594).
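A minimal R sketch of this contrast estimate and interval, reusing the group means and MSE from the table above (names are illustrative only):

means <- c(5.34, 7.72, 8.56, 5.50)
coefs <- c(0.5, 0.5, -0.5, -0.5)        # C = (mu1 + mu2)/2 - (mu3 + mu4)/2
mse   <- 1.331
n     <- 5
Chat  <- sum(coefs * means)             # -0.5
seC   <- sqrt(mse * sum(coefs^2 / n))   # about 0.516
Chat + c(-1, 1) * qt(0.975, 16) * seC   # about (-1.59, 0.59)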
Estimation of Linear Combinations
Estimating linear combinations
Sometimes we are interested in a linear combination of the factor-level means that is not a contrast. Assume that in our sample experiment certain costs are associated with each group. For example, there might be costs associated with each factor as follows:

    Factor    Cost in $
      1          3
      2          5
      3          2
      4          1

The following linear combination might then be of interest:

    C = 3·μ1 + 5·μ2 + 2·μ3 + 1·μ4
Coefficients do not have to sum to zero for linear combinations
This resembles a contrast, but the coefficients ci do not sum to zero. A linear combination is given by the definition:

    C = c1·μ1 + c2·μ2 + ... + ck·μk

with no restrictions on the coefficients ci.
Confidence
interval
identical to
contrast
Confidence limits for a linear combination C are obtained in
precisely the same way as those for a contrast, using the
same calculation for the point estimator and estimated
variance.
7.4.3.7. The two-way ANOVA
Definition
of a
factorial
experiment
The 2-way ANOVA is probably the most popular layout in
the Design of Experiments. To begin with, let us define a
factorial experiment:
An experiment that utilizes every combination of factor
levels as treatments is called a factorial experiment.
Model for the two-way factorial experiment
In a factorial experiment with factor A at a levels and factor B at b levels, the model for the general layout can be written as

    Yijk = μ + αi + βj + (αβ)ij + εijk

where μ is the overall mean response, αi is the effect due to the i-th level of factor A, βj is the effect due to the j-th level of factor B, (αβ)ij is the effect due to any interaction between the i-th level of A and the j-th level of B, and εijk is the random error term.
Fixed factors and fixed effects models
At this point, consider the levels of factor A and of factor B chosen for the experiment to be the only levels of interest to the experimenter, such as predetermined levels for temperature settings or the length of time for a process step. The factors A and B are said to be fixed factors and the model is a fixed-effects model. Random factors will be discussed later.
When an a x b factorial experiment is conducted with an
equal number of observations per treatment combination, the
total (corrected) sum of squares is partitioned as:
SS(total) = SS(A) + SS(B) + SS(AB) + SSE
where AB represents the interaction between A and B.
For reference, computational formulas for the sums of squares are given in the next section.
7.4.3.7. The two-way ANOVA
http://www.itl.nist.gov/div898/handbook/prc/section4/prc437.htm[6/27/2012 2:42:58 PM]
The breakdown of the total (corrected for the mean) sums of squares
The resulting ANOVA table for an a x b factorial experiment is
    Source              SS          df            MS
    Factor A            SS(A)       (a - 1)       MS(A) = SS(A)/(a-1)
    Factor B            SS(B)       (b - 1)       MS(B) = SS(B)/(b-1)
    Interaction AB      SS(AB)      (a-1)(b-1)    MS(AB) = SS(AB)/[(a-1)(b-1)]
    Error               SSE         (N - ab)      MSE = SSE/(N - ab)
    Total (Corrected)   SS(Total)   (N - 1)
The
ANOVA
table can
be used to
test
hypotheses
about the
effects and
interactions
The various hypotheses that can be tested using this ANOVA
table concern whether the different levels of Factor A, or
Factor B, really make a difference in the response, and
whether the AB interaction is significant (see previous
discussion of ANOVA hypotheses).
7.4.3.8. Models and calculations for the two-way ANOVA
Basic Layout
The balanced 2-way factorial layout
Factor A has 1, 2, ..., a levels. Factor B has 1, 2, ..., b levels. There are ab treatment combinations (or cells) in a complete factorial layout. Assume that each treatment cell has r independent observations (known as replications). When each cell has the same number of replications, the design is a balanced factorial. In this case, the abr data points {yijk} can be shown pictorially as follows:
                                      Factor B
                      1                        2              ...              b
          1   y111, y112, ..., y11r    y121, y122, ..., y12r   ...   y1b1, y1b2, ..., y1br
 Factor   2   y211, y212, ..., y21r    y221, y222, ..., y22r   ...   y2b1, y2b2, ..., y2br
   A      .            ...                      ...            ...            ...
          a   ya11, ya12, ..., ya1r    ya21, ya22, ..., ya2r   ...   yab1, yab2, ..., yabr
How to obtain sums of squares for the balanced factorial layout
Next, we will calculate the sums of squares needed for the ANOVA table.

    Let Ai be the sum of all observations of level i of factor A, i = 1, ..., a. The Ai are the row sums.

    Let Bj be the sum of all observations of level j of factor B, j = 1, ..., b. The Bj are the column sums.

    Let (AB)ij be the sum of all observations of level i of A and level j of B. These are the cell sums.

    Let r be the number of replicates in the experiment; that is, the number of times each factorial treatment combination appears in the experiment.

Then the total number of observations for each level of factor A is rb, the total number of observations for each level of factor B is ra, and the total number of observations for each interaction is r.
Finally, the total number of observations n in the experiment is abr.

With the help of these expressions we arrive (omitting derivations) at the computational formulas

    CM      = (sum of all observations)² / (abr)
    SS(A)   = Σi Ai²/(rb) - CM
    SS(B)   = Σj Bj²/(ra) - CM
    SS(AB)  = Σi Σj (AB)ij²/r - CM - SS(A) - SS(B)
    SS(Total, corrected) = Σi Σj Σk yijk² - CM
    SSE     = SS(Total) - SS(A) - SS(B) - SS(AB)

These expressions are used to calculate the ANOVA table entries for the (fixed effects) 2-way ANOVA.
Two-Way ANOVA Example:
Data
An evaluation of a new coating applied to 3 different materials was conducted at 2 different laboratories. Each laboratory tested 3 samples from each of the treated materials. The results are given in the next table:

                   Materials (B)
    LABS (A)      1       2       3
                 4.1     3.1     3.5
        1        3.9     2.8     3.2
                 4.3     3.3     3.6
                 2.7     1.9     2.7
        2        3.1     2.2     2.3
                 2.6     2.3     2.5
Row and column sums
The preliminary part of the analysis yields a table of row and column sums.

                      Material (B)
    Lab (A)        1       2       3     Total (Ai)
       1         12.3     9.2    10.3       31.8
       2          8.4     6.4     7.5       22.3
    Total (Bj)   20.7    15.6    17.8       54.1
ANOVA table
From this table we generate the ANOVA table.

    Source          SS       df     MS        F       p-value
    A              5.0139     1    5.0139   100.28      0
    B              2.1811     2    1.0906    21.81     .0001
    AB             0.1344     2    0.0672     1.34     .298
    Error          0.6000    12    0.0500
    Total (Corr)   7.9294    17
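The same table can be obtained directly with aov() in R; the sketch below assumes the 18 observations are entered lab by lab and material by material, exactly as in the data table above.

y <- c(4.1, 3.9, 4.3,  3.1, 2.8, 3.3,  3.5, 3.2, 3.6,   # lab 1, materials 1-3
       2.7, 3.1, 2.6,  1.9, 2.2, 2.3,  2.7, 2.3, 2.5)   # lab 2, materials 1-3
lab      <- factor(rep(1:2, each = 9))
material <- factor(rep(rep(1:3, each = 3), times = 2))
summary(aov(y ~ lab * material))   # reproduces the SS, df, MS and F values above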
7.4.4. What are variance components?
Fixed and Random Factors and Components of
Variance
A fixed level
of a factor or
variable
means that
the levels in
the
experiment
are the only
ones we are
interested in
In the previous example, the levels of the factor
temperature were considered as fixed; that is, the three
temperatures were the only ones that we were interested in
(this may sound somewhat unlikely, but let us accept it
without opposition). The model employed for fixed levels is
called a fixed model. When the levels of a factor are
random, such as operators, days, lots or batches, where the
levels in the experiment might have been chosen at random
from a large number of possible levels, the model is called
a random model, and inferences are to be extended to all
levels of the population.
Random
levels are
chosen at
random from
a large or
infinite set of
levels
In a random model the experimenter is often interested in
estimating components of variance. Let us run an example
that analyzes and interprets a component of variance or
random model.
Components of Variance Example for Random Factors
Data for the example
A company supplies a customer with a large number of batches of raw materials. The customer makes three sample determinations from each of 5 randomly selected batches to control the quality of the incoming material. The model is the one-way random-effects model

    Yij = μ + bi + εij

and the k levels (e.g., the batches) are chosen at random from a population with variance σb². The data are shown below.
                Batch
      1     2     3     4     5
     74    68    75    72    79
     76    71    77    74    81
     75    72    77    73    79
ANOVA table for example
A 1-way ANOVA is performed on the data with the following results:

    ANOVA
    Source                   SS      df     MS        EMS
    Treatment (batches)    147.74     4    36.935    σ² + 3σb²
    Error                   17.99    10     1.799    σ²
    Total (corrected)      165.73    14
Interpretation of the ANOVA table
The computations that produce the SS are the same for both the fixed and the random effects model. For the random model, however, the treatment mean square is an estimate of {σ² + 3σb²}. This is shown in the EMS (Expected Mean Squares) column of the ANOVA table.

The test statistic from the ANOVA table is F = 36.94 / 1.80 = 20.5.

If we had chosen an α value of .01, then the F value from the table in Chapter 1 for 4 degrees of freedom in the numerator and 10 in the denominator is 5.99.
Method of moments
Since the test statistic is larger than the critical value, we reject the hypothesis of equal means. Since these batches were chosen via a random selection process, it may be of interest to find out how much of the variance in the experiment might be attributed to batch differences and how much to random error. In order to answer these questions, we can use the EMS column. The estimate of σ² is 1.80 and the computed treatment mean square of 36.94 is an estimate of σ² + 3σb². Setting the MS values equal to the EMS values (this is called the Method of Moments), we obtain

    s² = 1.80
    s² + 3·sb² = 36.94

where we use s² since these are estimators of the corresponding σ²'s.
Computation of the components of variance
Solving these expressions gives the estimate of the batch-to-batch variance component:

    sb² = (36.94 - 1.80) / 3 = 11.71

The total variance can be estimated as

    s² + sb² = 1.80 + 11.71 = 13.51
Interpretation
In terms of percentages, we see that 11.71/13.51 = 86.7 percent of the total variance is attributable to batch differences and 13.3 percent to error variability within the batches.
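A short R sketch of the method-of-moments calculation for this example (the observations are read column by column from the batch table above; names are illustrative only):

y <- c(74, 76, 75,  68, 71, 72,  75, 77, 77,  72, 74, 73,  79, 81, 79)
batch <- factor(rep(1:5, each = 3))
tab <- anova(lm(y ~ batch))
MST <- tab["batch", "Mean Sq"]        # about 36.94
MSE <- tab["Residuals", "Mean Sq"]    # about 1.80
s2_batch <- (MST - MSE) / 3           # 3 determinations per batch: 11.71
s2_total <- MSE + s2_batch            # 13.51
100 * s2_batch / s2_total             # about 86.7 percent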
7.4.5. How can we compare the results of classifying according to several categories?
Contingency
Table
approach
When items are classified according to two or more criteria, it is often of
interest to decide whether these criteria act independently of one another.
For example, suppose we wish to classify defects found in wafers produced
in a manufacturing plant, first according to the type of defect and, second,
according to the production shift during which the wafers were produced. If
the proportions of the various types of defects are constant from shift to
shift, then classification by defects is independent of the classification by
production shift. On the other hand, if the proportions of the various defects
vary from shift to shift, then the classification by defects depends upon or is
contingent upon the shift classification and the classifications are dependent.
In the process of investigating whether one method of classification is
contingent upon another, it is customary to display the data by using a cross
classification in an array consisting of r rows and c columns called a
contingency table. A contingency table consists of r x c cells representing
the r x c possible outcomes in the classification process. Let us construct an
industrial case:
Industrial
example
A total of 309 wafer defects were recorded and the defects were classified as
being one of four types, A, B, C, or D. At the same time each wafer was
identified according to the production shift in which it was manufactured, 1,
2, or 3.
Contingency table classifying defects in wafers according to type and production shift
These counts are presented in the following table.

                          Type of Defects
    Shift        A            B            C            D         Total
      1      15 (22.51)   21 (20.99)   45 (38.94)   13 (11.56)      94
      2      26 (22.90)   31 (21.44)   34 (39.77)    5 (11.81)      96
      3      33 (28.50)   17 (26.57)   49 (49.29)   20 (14.63)     119
    Total       74           69          128           38          309
(Note: the numbers in parentheses are the expected cell frequencies).
Column probabilities
Let pA be the probability that a defect will be of type A. Likewise, define pB, pC, and pD as the probabilities of observing the other three types of defects. These probabilities, which are called the column probabilities, will satisfy the requirement

    pA + pB + pC + pD = 1
Row probabilities
By the same token, let pi (i = 1, 2, or 3) be the row probability that a defect will have occurred during shift i, where

    p1 + p2 + p3 = 1
Multiplicative
Law of
Probability
Then if the two classifications are independent of each other, a cell
probability will equal the product of its respective row and column
probabilities in accordance with the Multiplicative Law of Probability.
Example of obtaining column and row probabilities
For example, the probability that a particular defect will occur in shift 1 and is of type A is (p1)(pA). While the numerical values of the cell probabilities are unspecified, the null hypothesis states that each cell probability will equal the product of its respective row and column probabilities. This condition implies independence of the two classifications. The alternative hypothesis is that this equality does not hold for at least one cell.

In other words, we state the null hypothesis as H0: the two classifications are independent, while the alternative hypothesis is Ha: the classifications are dependent.

To obtain the observed column probability, divide the column total by the grand total, n. Denoting the total of column j as cj, we get

    p̂A = 74/309 = 0.239,  p̂B = 69/309 = 0.223,  p̂C = 128/309 = 0.414,  p̂D = 38/309 = 0.123
Similarly, the row probabilities p1, p2, and p3 are estimated by dividing the row totals r1, r2, and r3 by the grand total n, respectively:

    p̂1 = 94/309 = 0.304,  p̂2 = 96/309 = 0.311,  p̂3 = 119/309 = 0.385
Expected cell frequencies
Denote the observed frequency of the cell in row i and column j of the contingency table by nij. Then, under the null hypothesis of independence, the estimated expected cell frequency is

    êij = (ri · cj) / n
Estimated expected cell frequency when H0 is true
In other words, when the row and column classifications are independent, the
estimated expected value of the observed cell frequency n
ij
in an r x c
contingency table is equal to its respective row and column totals divided by
the total frequency.
The estimated cell frequencies are shown in parentheses in the contingency
table above.
Test statistic
From here we use the expected and observed frequencies shown in the table to calculate the value of the test statistic

    χ² = Σi Σj (nij - êij)² / êij = 19.18
df = (r-1)(c-1)
The next step is to find the appropriate number of degrees of freedom
associated with the test statistic. Leaving out the details of the derivation, we
state the result:
The number of degrees of freedom associated with a contingency
table consisting of r rows and c columns is (r-1) (c-1).
So for our example we have (3-1) (4-1) = 6 d.f.
Testing the null hypothesis
In order to test the null hypothesis, we compare the test statistic with the critical value of χ²(1-α) at a selected value of α. Let us use α = 0.05. Then the critical value is χ²(0.95, 6) = 12.5916 (see the chi-square table in Chapter 1). Since the test statistic of 19.18 exceeds the critical value, we reject the null hypothesis and conclude that there is significant evidence that the proportions of the different defect types vary from shift to shift. In this case, the p-value of the test statistic is 0.00387.
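The same test can be run with chisq.test() in R; a minimal sketch using the observed counts above:

defects <- matrix(c(15, 21, 45, 13,
                    26, 31, 34,  5,
                    33, 17, 49, 20),
                  nrow = 3, byrow = TRUE,
                  dimnames = list(shift = c("1", "2", "3"),
                                  type  = c("A", "B", "C", "D")))
out <- chisq.test(defects)
out            # X-squared about 19.18, df = 6, p-value about 0.0039
out$expected   # the expected cell frequencies shown in parentheses above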
7.4.6. Do all the processes have the same proportion of defects?
The contingency table approach
Testing for homogeneity of proportions using the chi-square distribution via contingency tables
When we have samples from n populations (i.e., lots, vendors, production runs, etc.), we can test whether there are significant differences in the proportion defectives for these populations using a contingency table approach. The contingency table we construct has two rows and n columns.

To test the null hypothesis of no difference in the proportions among the n populations

    H0: p1 = p2 = ... = pn

against the alternative that not all n population proportions are equal

    H1: Not all pi are equal (i = 1, 2, ..., n)
The chi-square test statistic
we use the following test statistic:

    χ² = Σ (fo - fc)² / fc

where fo is the observed frequency in a given cell of a 2 x n contingency table, and fc is the theoretical count or expected frequency in a given cell if the null hypothesis were true.
The critical value
The critical value is obtained from the χ² distribution table with degrees of freedom (2-1)(n-1) = n-1, at a given level of significance.
An illustrative example
Data for the example
Diodes used on a printed circuit board are produced in lots of size 4000. To study the homogeneity of lots with respect to a demanding specification, we take random samples of size 300 from 5 consecutive lots and test the diodes. The results are:

                               Lot
    Results           1     2     3     4     5    Totals
    Nonconforming    36    46    42    63    38      225
    Conforming      264   254   258   237   262     1275
    Totals          300   300   300   300   300     1500
Computation of the overall proportion of nonconforming units
Assuming the null hypothesis is true, we can estimate the single overall proportion of nonconforming diodes by pooling the results of all the samples as

    p̂ = 225 / 1500 = 0.15
Computation
of the overall
proportion of
conforming
units
We estimate the proportion of conforming ("good") diodes
by the complement 1 - 0.15 = 0.85. Multiplying these two
proportions by the sample sizes used for each lot results in
the expected frequencies of nonconforming and
conforming diodes. These are presented below:
Table of expected frequencies
                               Lot
    Results           1     2     3     4     5    Totals
    Nonconforming    45    45    45    45    45      225
    Conforming      255   255   255   255   255     1275
    Totals          300   300   300   300   300     1500
Null and alternate hypotheses
To test the null hypothesis of homogeneity or equality of proportions

    H0: p1 = p2 = ... = p5

against the alternative that not all 5 population proportions are equal

    H1: Not all pi are equal (i = 1, 2, ..., 5)
Table for computing the test statistic
we use the observed and expected values from the tables above to compute the χ² test statistic. The calculations are presented below:

     fo     fc    (fo - fc)   (fo - fc)²   (fo - fc)²/fc
     36     45       -9           81          1.800
     46     45        1            1          0.022
     42     45       -3            9          0.200
     63     45       18          324          7.200
     38     45       -7           49          1.089
    264    255        9           81          0.318
    254    255       -1            1          0.004
    258    255        3            9          0.035
    237    255      -18          324          1.271
    262    255        7           49          0.192
                                      Sum = 12.131
Conclusions
If we choose a .05 level of significance, the critical value of χ² with 4 degrees of freedom is 9.488 (see the chi-square distribution table in Chapter 1). Since the test statistic (12.131) exceeds this critical value, we reject the null hypothesis.
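The same conclusion follows from chisq.test() applied to the 2 x 5 table of counts; a minimal R sketch:

counts <- rbind(nonconforming = c(36, 46, 42, 63, 38),
                conforming    = c(264, 254, 258, 237, 262))
chisq.test(counts)   # X-squared = 12.131, df = 4, p-value about 0.016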
7.4.7. How can we make multiple comparisons?
What to do after equality of means is rejected
When processes are compared and the null hypothesis of equality (or homogeneity) is rejected, all we know at that point is that they are not all equal. But we do not know the form of the inequality.
Typical questions
Questions concerning the reason for the rejection of the null hypothesis arise in the form of:

    "Which mean(s) or proportion(s) differ from a standard or from each other?"

    "Does the mean of treatment 1 differ from that of treatment 2?"

    "Does the average of treatments 1 and 2 differ from the average of treatments 3 and 4?"
Multiple
Comparison
test
procedures
are needed
One popular way to investigate the cause of rejection of the
null hypothesis is a Multiple Comparison Procedure. These
are methods which examine or compare more than one pair
of means or proportions at the same time.
Note: Doing pairwise comparison procedures over and over
again for all possible pairs will not, in general, work. This is
because the overall significance level is not as specified for
a single pair comparison.
ANOVA F test is a preliminary test
The ANOVA uses the F test to determine whether there exists a significant difference among treatment means or interactions. In this sense it is a preliminary test that informs us if we should continue the investigation of the data at hand.

If the null hypothesis (no difference among treatments or interactions) is accepted, there is an implication that no relation exists between the factor levels and the response. There is not much we can learn, and we are finished with the analysis.

When the F test rejects the null hypothesis, we usually want to undertake a thorough analysis of the nature of the factor-level effects.
Procedures
for
examining
factor-level
effects
Previously, we discussed several procedures for examining
particular factor-level effects. These were
Estimation of the Difference Between Two Factor
Means
Estimation of Factor Level Effects
Confidence Intervals For A Contrast
Determine
contrasts in
advance of
observing
the
experimental
results
These types of investigations should be done on
combinations of factors that were determined in advance of
observing the experimental results, or else the confidence
levels are not as specified by the procedure. Also, doing
several comparisons might change the overall confidence
level (see note above). This can be avoided by carefully
selecting contrasts to investigate in advance and making sure
that:
the number of such contrasts does not exceed the
number of degrees of freedom between the treatments
only orthogonal contrasts are chosen.
However, there are also several powerful multiple
comparison procedures we can use after observing the
experimental results.
Tests on Means after Experimentation
Procedures for performing multiple comparisons
If the decision on what comparisons to make is withheld until after the data are examined, the following procedures can be used:

    Tukey's Method to test all possible pairwise differences of means to determine if at least one difference is significantly different from 0.

    Scheffé's Method to test all possible contrasts at the same time, to see if at least one is significantly different from 0.

    Bonferroni Method to test, or put simultaneous confidence intervals around, a pre-selected group of contrasts.
Multiple Comparisons Between Proportions
Procedure
for
proportion
defective
data
When we are dealing with population proportion defective
data, the Marascuilo procedure can be used to
simultaneously examine comparisons between all groups
after the data have been collected.
7.4.7.1. Tukey's method
Tukey's method considers all possible pairwise differences of means at the same time
The Tukey method applies simultaneously to the set of all pairwise comparisons

    { μi - μj }

The confidence coefficient for the set, when all sample sizes are equal, is exactly 1-α. For unequal sample sizes, the confidence coefficient is greater than 1-α. In other words, the Tukey method is conservative when there are unequal sample sizes.
Studentized Range Distribution
The studentized range q
The Tukey method uses the studentized range distribution. Suppose we have r independent observations y1, ..., yr from a normal distribution with mean μ and variance σ². Let w be the range for this set, i.e., the maximum minus the minimum. Now suppose that we have an estimate s² of the variance σ² which is based on ν degrees of freedom and is independent of the yi. The studentized range is defined as

    q(r, ν) = w / s
The distribution of q is tabulated in many textbooks and can be calculated using Dataplot
The distribution of q has been tabulated and appears in many textbooks on statistics. In addition, Dataplot has a CDF function (SRACDF) and a percentile function (SRAPPF) for q.

As an example, let r = 5 and ν = 10. The 95th percentile is q(.05; 5, 10) = 4.65. This means:

    P( w/s ≤ 4.65 ) = 0.95

So, if we have five observations from a normal distribution, the probability is .95 that their range is not more than 4.65 times as great as an independent sample standard deviation estimate for which the estimator has 10 degrees of freedom.
Tukey's Method
Confidence limits for Tukey's method
The Tukey confidence limits for all pairwise comparisons with confidence coefficient of at least 1-α are:

    Ȳi. - Ȳj.  ±  (1/√2) · q(α; r, N-r) · sqrt( MSE (1/ni + 1/nj) )

Notice that the point estimator and the estimated variance are the same as those for a single pairwise comparison that was illustrated previously. The only difference between the confidence limits for simultaneous comparisons and those for a single comparison is the multiple of the estimated standard deviation.

Also note that the sample sizes must be equal when using the studentized range approach.
Example
Data We use the data from a previous example.
Set of all pairwise comparisons
The set of all pairwise comparisons consists of:

    μ2 - μ1,  μ3 - μ1,  μ1 - μ4,  μ2 - μ3,  μ2 - μ4,  μ3 - μ4
Confidence intervals for each pair
Assume we want a confidence coefficient of 95 percent, or .95. Since r = 4 and nt = 20, the required percentile of the studentized range distribution is q(.05; 4, 16). Using the Tukey method for each of the six comparisons yields the six simultaneous intervals summarized in the sketch below.
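A minimal R sketch of these six simultaneous intervals, using qtukey() together with the group means and MSE carried over from the earlier 4-treatment example (names are illustrative only):

means <- c(g1 = 5.34, g2 = 7.72, g3 = 8.56, g4 = 5.50)
mse <- 1.331; n <- 5
q    <- qtukey(0.95, nmeans = 4, df = 16)     # about 4.05
half <- q * sqrt(mse / n)                     # about 2.09
pr   <- combn(names(means), 2)
d    <- means[pr[1, ]] - means[pr[2, ]]
data.frame(pair = paste(pr[1, ], "-", pr[2, ]), diff = d,
           lower = d - half, upper = d + half)
# TukeyHSD(aov(y ~ group)) gives the same intervals when the raw data are used.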
Conclusions
The simultaneous pairwise comparisons indicate that the differences μ1 - μ4 and μ2 - μ3 are not significantly different from 0 (their confidence intervals include 0), and all the other pairs are significantly different.
Unequal sample sizes
It is possible to work with unequal sample sizes. In this case, one has to calculate the estimated standard deviation for each pairwise comparison. The Tukey procedure for unequal sample sizes is sometimes referred to as the Tukey-Kramer Method.
7.4.7.2. Scheffe's method
Scheffé's method tests all possible contrasts at the same time
Scheffé's method applies to the set of estimates of all possible contrasts among the factor level means, not just the pairwise differences considered by Tukey's method.
Definition of contrast
An arbitrary contrast is defined by

    C = Σ ci μi

where

    Σ ci = 0
Infinite number of contrasts
Technically there is an infinite number of contrasts. The simultaneous confidence coefficient is exactly 1-α, whether the factor level sample sizes are equal or unequal.
Estimate and variance for C
As was described earlier, we estimate C by:

    Ĉ = Σ ci Ȳi.

for which the estimated variance is:

    s²(Ĉ) = MSE · Σ (ci² / ni)
Simultaneous confidence interval
It can be shown that the probability is 1-α that all confidence limits of the type

    Ĉ ± sqrt( (r-1) · F(1-α; r-1, N-r) ) · s(Ĉ)

are correct simultaneously.
Scheffe method example
Contrasts to estimate
We wish to estimate, in our previous experiment, the following contrasts

    C1 = (μ1 + μ2)/2 - (μ3 + μ4)/2
    C2 = (μ1 + μ3)/2 - (μ2 + μ4)/2

and construct 95 percent confidence intervals for them.
Compute the point estimates of the individual contrasts
The point estimates are:

    Ĉ1 = (5.34 + 7.72)/2 - (8.56 + 5.50)/2 = -0.50
    Ĉ2 = (5.34 + 8.56)/2 - (7.72 + 5.50)/2 = 0.34
Compute the point estimate and variance of C
Applying the formulas above we obtain in both cases:

    Σ (ci²/ni) = 4(0.5)²/5 = 0.2

and

    s²(Ĉ) = MSE · Σ (ci²/ni) = 0.2661

where MSE = 1.331 was computed in our previous example. The standard error is .5158 (the square root of .2661).
Scheffé confidence interval
For a confidence coefficient of 95 percent and degrees of freedom in the numerator of r - 1 = 4 - 1 = 3, and in the denominator of 20 - 4 = 16, we have:

    sqrt( (r-1) · F(1-α; r-1, N-r) ) = sqrt( 3 · F(0.95; 3, 16) ) = 3.12

The confidence limits for C1 are -.5 ± 3.12(.5158) = -.5 ± 1.608, and for C2 they are .34 ± 1.608.

The desired simultaneous 95 percent confidence intervals are

    -2.108 ≤ C1 ≤ 1.108
    -1.268 ≤ C2 ≤ 1.948
Comparison to confidence interval for a single contrast
Recall that when we constructed a confidence interval for a single contrast, we found the 95 percent confidence interval:

    -1.594 ≤ C ≤ 0.594

As expected, the Scheffé confidence interval procedure that generates simultaneous intervals for all contrasts is considerably wider.
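A minimal R sketch of the Scheffé limits for the two contrasts above, reusing the group means and MSE assumed from the 4-treatment example:

means <- c(5.34, 7.72, 8.56, 5.50)
mse <- 1.331; n <- 5; r <- 4; N <- 20
scheffe_ci <- function(coefs) {
  est  <- sum(coefs * means)
  se   <- sqrt(mse * sum(coefs^2 / n))
  mult <- sqrt((r - 1) * qf(0.95, r - 1, N - r))   # about 3.12
  c(estimate = est, lower = est - mult * se, upper = est + mult * se)
}
rbind(C1 = scheffe_ci(c(0.5, 0.5, -0.5, -0.5)),
      C2 = scheffe_ci(c(0.5, -0.5, 0.5, -0.5)))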
Comparison of Scheffé's Method with Tukey's Method
Tukey preferred when only pairwise comparisons are of interest
If only pairwise comparisons are to be made, the Tukey method will result in a narrower confidence limit, which is preferable.

Consider for example the comparison between μ3 and μ1:

    Tukey:   1.13 < μ3 - μ1 < 5.31
    Scheffé: 0.95 < μ3 - μ1 < 5.49

which gives Tukey's method the edge. The normalized contrast, using sums, for the Scheffé method is 4.413, which is close to the maximum contrast.
Scheffé preferred when many contrasts are of interest
In the general case when many or all contrasts might be of interest, the Scheffé method tends to give narrower confidence limits and is therefore the preferred method.
7.4.7.3. Bonferroni's method
Simple
method
The Bonferroni method is a simple method that allows
many comparison statements to be made (or confidence
intervals to be constructed) while still assuring an overall
confidence coefficient is maintained.
Applies for a finite number of contrasts
This method applies to an ANOVA situation when the analyst has picked out a particular set of pairwise comparisons or contrasts or linear combinations in advance. This set is not infinite, as in the Scheffé case, but may exceed the set of pairwise comparisons specified in the Tukey procedure.
Valid for
both equal
and unequal
sample sizes
The Bonferroni method is valid for equal and unequal
sample sizes. We restrict ourselves to only linear
combinations or comparisons of treatment level means
(pairwise comparisons and contrasts are special cases of
linear combinations). We denote the number of statements
or comparisons in the finite set by g.
Bonferroni general inequality
Formally, the Bonferroni general inequality is presented by:

    P( A1 ∩ A2 ∩ ... ∩ Ag ) ≥ 1 - Σ P( Āi )

where Ai and its complement Āi are any events.
Interpretation of Bonferroni inequality
In particular, if each Ai is the event that a calculated confidence interval for a particular linear combination of treatments includes the true value of that combination, then the left-hand side of the inequality is the probability that all the confidence intervals simultaneously cover their respective true values. The right-hand side is one minus the sum of the probabilities of each of the intervals missing their true values. Therefore, if simultaneous multiple interval estimates are desired with an overall confidence coefficient 1-α, one can construct each interval with confidence coefficient (1-α/g), and the Bonferroni inequality insures that the overall confidence coefficient is at least 1-α.
Formula for Bonferroni confidence interval
In summary, the Bonferroni method states that the confidence coefficient is at least 1-α that simultaneously all the following confidence limits for the g linear combinations Ci are "correct" (or capture their respective true values):

    Ĉi ± t(1-α/(2g); N-r) · s(Ĉi)

where s(Ĉi) is the estimated standard error of Ĉi, as before.
Example using Bonferroni method
Contrasts to estimate
We wish to estimate, as we did using the Scheffé method, the following linear combinations (contrasts):

    C1 = (μ1 + μ2)/2 - (μ3 + μ4)/2
    C2 = (μ1 + μ3)/2 - (μ2 + μ4)/2

and construct 95 % confidence intervals around the estimates.
Compute the point estimates of the individual contrasts
The point estimates are:

    Ĉ1 = -0.50
    Ĉ2 = 0.34
Compute the point estimate and variance of C
As before, for both contrasts, we have

    Σ (ci²/ni) = 0.2

and

    s²(Ĉ) = MSE · Σ (ci²/ni) = 0.2661

where MSE = 1.331 was computed in our previous example. The standard error is .5158 (the square root of .2661).
Compute the Bonferroni simultaneous confidence interval
For a 95 % overall confidence coefficient using the Bonferroni method, the t value is t(1-0.05/(2·2); 16) = t(0.9875; 16) = 2.473 (from the t table in Chapter 1). Now we can calculate the confidence intervals for the two contrasts. For C1 we have confidence limits -0.5 ± 2.473 (.5158) and for C2 we have confidence limits 0.34 ± 2.473 (0.5158).

Thus, the confidence intervals are:

    -1.776 ≤ C1 ≤ 0.776
    -0.936 ≤ C2 ≤ 1.616
Comparison to Scheffé interval
Notice that the Scheffé interval for C1 is:

    -2.108 ≤ C1 ≤ 1.108

which is wider and therefore less attractive.
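A minimal R sketch of the Bonferroni intervals for the two contrasts, again reusing the group means and MSE from the 4-treatment example:

means <- c(5.34, 7.72, 8.56, 5.50)
mse <- 1.331; n <- 5; g <- 2; dfe <- 16
cmat <- rbind(C1 = c(0.5, 0.5, -0.5, -0.5),
              C2 = c(0.5, -0.5, 0.5, -0.5))
est  <- drop(cmat %*% means)
se   <- sqrt(mse * rowSums(cmat^2) / n)
tval <- qt(1 - 0.05 / (2 * g), dfe)               # 2.473
data.frame(estimate = est, lower = est - tval * se, upper = est + tval * se)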
Comparison of Bonferroni Method with Scheffé and Tukey Methods
No one comparison method is uniformly best - each has its uses
1. If all pairwise comparisons are of interest, Tukey has the edge. If only a subset of pairwise comparisons are required, Bonferroni may sometimes be better.
2. When the number of contrasts to be estimated is small (about as many as there are factors), Bonferroni is better than Scheffé. Actually, unless the number of desired contrasts is at least twice the number of factors, Scheffé will always show wider confidence bands than Bonferroni.
3. Many computer packages include all three methods. So, study the output and select the method with the smallest confidence band.
4. No single method of multiple comparisons is uniformly best among all the methods.
7.4.7.4. Comparing multiple proportions: The Marascuilo procedure
Testing for
equal
proportions of
defects
Earlier, we discussed how to test whether several
populations have the same proportion of defects. The
example given there led to rejection of the null hypothesis
of equality.
Marascuilo
procedure
allows
comparison of
all possible
pairs of
proportions
Rejecting the null hypothesis only allows us to conclude
that not (in this case) all lots are equal with respect to the
proportion of defectives. However, it does not tell us which
lot or lots caused the rejection.
The Marascuilo procedure enables us to simultaneously test
the differences of all pairs of proportions when there are
several populations under investigation.
The Marascuilo Procedure
Step 1: compute differences pi - pj
Assume we have samples of size ni (i = 1, 2, ..., k) from k populations. The first step of this procedure is to compute the differences pi - pj (where i is not equal to j) among all k(k-1)/2 pairs of proportions.

The absolute values of these differences are the test statistics.
Step 2: compute critical values
Step 2 is to pick a significance level α and compute the corresponding critical values for the Marascuilo procedure from

    rij = sqrt( χ²(1-α; k-1) ) · sqrt( pi(1-pi)/ni + pj(1-pj)/nj )
Step 3: compare test statistics against corresponding critical values
The third and last step is to compare each of the k(k-1)/2 test statistics against its corresponding critical rij value. Those pairs that have a test statistic that exceeds the critical value are significant at the α level.
Example
Sample proportions
To illustrate the Marascuilo procedure, we use the data from the previous example. Since there were 5 lots, there are (5 x 4)/2 = 10 possible pairwise comparisons to be made and ten critical ranges to compute. The five sample proportions are:

    p1 = 36/300 = .120
    p2 = 46/300 = .153
    p3 = 42/300 = .140
    p4 = 63/300 = .210
    p5 = 38/300 = .127
Table of critical values
For an overall level of significance of 0.05, the critical value of the chi-square distribution having four degrees of freedom is χ²(0.95, 4) = 9.488 and the square root of 9.488 is 3.080. Calculating the 10 absolute differences and the 10 critical values leads to the following summary table.

    contrast      value    critical range   significant
    |p1 - p2|     .033         0.086            no
    |p1 - p3|     .020         0.085            no
    |p1 - p4|     .090         0.093            no
    |p1 - p5|     .007         0.083            no
    |p2 - p3|     .013         0.089            no
    |p2 - p4|     .057         0.097            no
    |p2 - p5|     .026         0.087            no
    |p3 - p4|     .070         0.095            no
    |p3 - p5|     .013         0.086            no
    |p4 - p5|     .083         0.094            no

The table of critical values can be generated using both Dataplot code and R code.
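One possible R sketch (not the Handbook's own script) that generates the value and critical-range columns of this table from the observed counts:

p <- c(36, 46, 42, 63, 38) / 300
n <- rep(300, 5)
crit <- sqrt(qchisq(0.95, df = length(p) - 1))    # sqrt(9.488) = 3.08
pr <- combn(seq_along(p), 2)
value <- abs(p[pr[1, ]] - p[pr[2, ]])
crit_range <- crit * sqrt(p[pr[1, ]] * (1 - p[pr[1, ]]) / n[pr[1, ]] +
                          p[pr[2, ]] * (1 - p[pr[2, ]]) / n[pr[2, ]])
data.frame(contrast = paste0("|p", pr[1, ], " - p", pr[2, ], "|"),
           value = round(value, 3), critical.range = round(crit_range, 3),
           significant = value > crit_range)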
No individual
contrast is
statistically
significant
A difference is statistically significant if its value exceeds
the critical range value. In this example, even though the
null hypothesis of equality was rejected earlier, there is not
enough data to conclude any particular difference is
significant. Note, however, that all the comparisons
involving population 4 come the closest to significance -
leading us to suspect that more data might actually show
that population 4 does have a significantly higher
proportion of defects.
7.5. References
Primary References
Agresti, A. and Coull, B. A. (1998). "Approximate is better than 'exact' for interval estimation of binomial proportions", The American Statistician, 52(2), 119-126.
Berenson M.L. and Levine D.M. (1996) Basic Business
Statistics, Prentice-Hall, Englewood Cliffs, New Jersey.
Bhattacharyya, G. K., and R. A. Johnson, (1997). Statistical
Concepts and Methods, John Wiley and Sons, New York.
Birnbaum, Z. W. (1952). "Numerical tabulation of the
distribution of Kolmogorov's statistic for finite sample size",
Journal of the American Statistical Association, 47, page 425.
Brown, L. D., Cai, T. T. and DasGupta, A. (2001). "Interval estimation for a binomial proportion", Statistical Science, 16(2), 101-133.
Diamond, W. J. (1989). Practical Experiment Designs, Van-
Nostrand Reinhold, New York.
Dixon, W. J. and Massey, F.J. (1969). Introduction to
Statistical Analysis, McGraw-Hill, New York.
Draper, N. and Smith, H., (1981). Applied Regression
Analysis, John Wiley & Sons, New York.
Fleiss, J. L., Levin, B. and Paik, M. C. (2003). Statistical Methods for Rates and Proportions, Third Edition, John Wiley & Sons, New York.
Hahn, G. J. and Meeker, W. Q. (1991). Statistical Intervals: A
Guide for Practitioners, John Wiley & Sons, New York.
Hicks, C. R. (1973). Fundamental Concepts in the Design of
Experiments, Holt, Rinehart and Winston, New York.
Hollander, M. and Wolfe, D. A. (1973). Nonparametric
Statistical Methods, John Wiley & Sons, New York.
Howe, W. G. (1969). "Two-sided Tolerance Limits for Normal Populations - Some Improvements", Journal of the American Statistical Association, 64, pages 610-620.
Kendall, M. and Stuart, A. (1979). The Advanced Theory of
Statistics, Volume 2: Inference and Relationship. Charles
Griffin & Co. Limited, London.
Mendenhall, W., Reinmuth, J. E. and Beaver, R. J. Statistics
for Management and Economics, Duxbury Press, Belmont,
CA.
Montgomery, D. C. (1991). Design and Analysis of
Experiments, John Wiley & Sons, New York.
Moore, D. S. (1986). "Tests of Chi-Square Type". From
Goodness-of-Fit Techniques (D'Agostino & Stephens eds.).
Myers, R. H., (1990). Classical and Modern Regression with
Applications, PWS-Kent, Boston, MA.
Neter, J., Wasserman, W. and Kutner, M. H. (1990). Applied
Linear Statistical Models, 3rd Edition, Irwin, Boston, MA.
Lawless, J. F., (1982). Statistical Models and Methods for
Lifetime Data, John Wiley & Sons, New York.
Pearson, A. V., and Hartley, H. O. (1972). Biometrika Tables for Statisticians, Vol 2, Cambridge, England, Cambridge University Press.
Sarhan, A. E. and Greenberg, B. G. (1956). "Estimation of
location and scale parameters by order statistics from singly
and double censored samples," Part I, Annals of Mathematical
Statistics, 27, 427-451.
Searle, S. S., Casella, G. and McCulloch, C. E. (1992).
Variance Components, John Wiley & Sons, New York.
Siegel, S. (1956). Nonparametric Statistics, McGraw-Hill,
New York.
Shapiro, S. S. and Wilk, M. B. (1965). "An analysis of
variance test for normality (complete samples)", Biometrika,
52, 3 and 4, pages 591-611.
Some Additional References and Bibliography
Books
D'Agostino, R. B. and Stephens, M. A. (1986). Goodness-of-Fit Techniques, Marcel Dekker, Inc., New York.
Hicks, C. R. (1973). Fundamental Concepts in the Design of Experiments, Holt, Rinehart and Winston, New York.
Miller, R. G., Jr. (1981). Simultaneous Statistical Inference,
Springer-Verlag, New York.
Neter, Wasserman, and Whitmore (1993). Applied Statistics,
4th Edition, Allyn and Bacon, Boston, MA.
Neter, J., Wasserman, W. and Kutner, M. H. (1990). Applied
Linear Statistical Models, 3rd Edition, Irwin, Boston, MA.
Scheffe, H. (1959). The Analysis of Variance, John Wiley, New York.
Articles
Begun, J. M. and Gabriel, K. R. (1981). "Closure of the
Newman-Keuls Multiple Comparisons Procedure", Journal of
the American Statistical Association, 76, page 374.
Carmer, S. G. and Swanson, M. R. (1973). "Evaluation of Ten
Pairwise Multiple Comparison Procedures by Monte-Carlo
Methods", Journal of the American Statistical Association, 68,
pages 66-74.
Duncan, D. B. (1975). "t-Tests and Intervals for Comparisons
suggested by the Data" Biometrics, 31, pages 339-359.
Dunnett, C. W. (1980). "Pairwise Multiple Comparisons in the
Homogeneous Variance for Unequal Sample Size Case",
Journal of the American Statistical Association, 75, page 789.
Einot, I. and Gabriel, K. R. (1975). "A Study of the Powers of
Several Methods of Multiple Comparison", Journal of the
American Statistical Association, 70, page 351.
Gabriel, K. R. (1978). "A Simple Method of Multiple
Comparisons of Means", Journal of the American Statistical
Association, 73, page 364.
Hochberg, Y. (1974). "Some Conservative Generalizations of
the T-Method in Simultaneous Inference", Journal of
Multivariate Analysis, 4, pages 224-234.
Kramer, C. Y. (1956). "Extension of Multiple Range Tests to
Group Means with Unequal Sample Sizes", Biometrics, 12,
pages 307-310.
Marcus, R., Peritz, E. and Gabriel, K. R. (1976). "On Closed
Testing Procedures with Special Reference to Ordered
Analysis of Variance", Biometrics, 63, pages 655-660.
Ryan, T. A. (1959). "Multiple Comparisons in Psychological
Research", Psychological Bulletin, 56, pages 26-47.
Ryan, T. A. (1960). "Significance Tests for Multiple
Comparisons of Proportions, Variances, and Other Statistics",
Psychological Bulletin, 57, pages 318-328.
Scheffe, H. (1953). "A Method for Judging All Contrasts in the
Analysis of Variance", Biometrika,40, pages 87-104.
Sidak, Z., (1967). "Rectangular Confidence Regions for the
Means of Multivariate Normal Distributions", Journal of the
American Statistical Association, 62, pages 626-633.
Tukey, J. W. (1953). The Problem of Multiple Comparisons,
Unpublished Manuscript.
Waller, R. A. and Duncan, D. B. (1969). "A Bayes Rule for the
Symmetric Multiple Comparison Problem", Journal of the
American Statistical Association 64, pages 1484-1504.
Waller, R. A. and Kemp, K. E. (1976). "Computations of
Bayesian t-Values for Multiple Comparisons", Journal of
Statistical Computation and Simulation, 75, pages 169-172.
Welsch, R. E. (1977). "Stepwise Multiple Comparison
Procedure", Journal of the American Statistical Association,
72, page 359.
8. Assessing Product Reliability
This chapter describes the terms, models and techniques used to evaluate and predict product reliability.

1. Introduction
    1. Why important?
    2. Basic terms and models
    3. Common difficulties
    4. Modeling "physical acceleration"
    5. Common acceleration models
    6. Basic non-repairable lifetime distributions
    7. Basic models for repairable systems
    8. Evaluate reliability "bottom-up"
    9. Modeling reliability growth
    10. Bayesian methodology

2. Assumptions/Prerequisites
    1. Choosing appropriate life distribution
    2. Plotting reliability data
    3. Testing assumptions
    4. Choosing a physical acceleration model
    5. Models and assumptions for Bayesian methods

3. Reliability Data Collection
    1. Planning reliability assessment tests

4. Reliability Data Analysis
    1. Estimating parameters from censored data
    2. Fitting an acceleration model
    3. Projecting reliability at use conditions
    4. Comparing reliability between two or more populations
    5. Fitting system repair rate models
    6. Estimating reliability using a Bayesian gamma prior
References for Chapter 8
8. Assessing Product Reliability - Detailed Table of Contents [8.]
1. Introduction [8.1.]
1. Why is the assessment and control of product reliability important? [8.1.1.]
1. Quality versus reliability [8.1.1.1.]
2. Competitive driving factors [8.1.1.2.]
3. Safety and health considerations [8.1.1.3.]
2. What are the basic terms and models used for reliability evaluation? [8.1.2.]
1. Repairable systems, non-repairable populations and lifetime distribution models [8.1.2.1.]
2. Reliability or survival function [8.1.2.2.]
3. Failure (or hazard) rate [8.1.2.3.]
4. "Bathtub" curve [8.1.2.4.]
5. Repair rate or ROCOF [8.1.2.5.]
3. What are some common difficulties with reliability data and how are they overcome? [8.1.3.]
1. Censoring [8.1.3.1.]
2. Lack of failures [8.1.3.2.]
4. What is "physical acceleration" and how do we model it? [8.1.4.]
5. What are some common acceleration models? [8.1.5.]
1. Arrhenius [8.1.5.1.]
2. Eyring [8.1.5.2.]
3. Other models [8.1.5.3.]
6. What are the basic lifetime distribution models used for non-repairable populations? [8.1.6.]
1. Exponential [8.1.6.1.]
2. Weibull [8.1.6.2.]
3. Extreme value distributions [8.1.6.3.]
4. Lognormal [8.1.6.4.]
5. Gamma [8.1.6.5.]
6. Fatigue life (Birnbaum-Saunders) [8.1.6.6.]
7. Proportional hazards model [8.1.6.7.]
7. What are some basic repair rate models used for repairable systems? [8.1.7.]
1. Homogeneous Poisson Process (HPP) [8.1.7.1.]
2. Non-Homogeneous Poisson Process (NHPP) - power law [8.1.7.2.]
3. Exponential law [8.1.7.3.]
8. How can you evaluate reliability from the "bottom-up" (component failure mode to system failure
rate)? [8.1.8.]
1. Competing risk model [8.1.8.1.]
2. Series model [8.1.8.2.]
3. Parallel or redundant model [8.1.8.3.]
4. R out of N model [8.1.8.4.]
5. Standby model [8.1.8.5.]
6. Complex systems [8.1.8.6.]
9. How can you model reliability growth? [8.1.9.]
1. NHPP power law [8.1.9.1.]
2. Duane plots [8.1.9.2.]
3. NHPP exponential law [8.1.9.3.]
10. How can Bayesian methodology be used for reliability evaluation? [8.1.10.]
2. Assumptions/Prerequisites [8.2.]
1. How do you choose an appropriate life distribution model? [8.2.1.]
1. Based on failure mode [8.2.1.1.]
2. Extreme value argument [8.2.1.2.]
3. Multiplicative degradation argument [8.2.1.3.]
4. Fatigue life (Birnbaum-Saunders) model [8.2.1.4.]
5. Empirical model fitting - distribution free (Kaplan-Meier) approach [8.2.1.5.]
2. How do you plot reliability data? [8.2.2.]
1. Probability plotting [8.2.2.1.]
2. Hazard and cum hazard plotting [8.2.2.2.]
3. Trend and growth plotting (Duane plots) [8.2.2.3.]
3. How can you test reliability model assumptions? [8.2.3.]
1. Visual tests [8.2.3.1.]
2. Goodness of fit tests [8.2.3.2.]
3. Likelihood ratio tests [8.2.3.3.]
4. Trend tests [8.2.3.4.]
4. How do you choose an appropriate physical acceleration model? [8.2.4.]
5. What models and assumptions are typically made when Bayesian methods are used for reliability
evaluation? [8.2.5.]
3. Reliability Data Collection [8.3.]
1. How do you plan a reliability assessment test? [8.3.1.]
1. Exponential life distribution (or HPP model) tests [8.3.1.1.]
2. Lognormal or Weibull tests [8.3.1.2.]
3. Reliability growth (Duane model) [8.3.1.3.]
4. Accelerated life tests [8.3.1.4.]
5. Bayesian gamma prior model [8.3.1.5.]
4. Reliability Data Analysis [8.4.]
1. How do you estimate life distribution parameters from censored data? [8.4.1.]
1. Graphical estimation [8.4.1.1.]
2. Maximum likelihood estimation [8.4.1.2.]
3. A Weibull maximum likelihood estimation example [8.4.1.3.]
2. How do you fit an acceleration model? [8.4.2.]
1. Graphical estimation [8.4.2.1.]
2. Maximum likelihood [8.4.2.2.]
3. Fitting models using degradation data instead of failures [8.4.2.3.]
3. How do you project reliability at use conditions? [8.4.3.]
4. How do you compare reliability between two or more populations? [8.4.4.]
5. How do you fit system repair rate models? [8.4.5.]
1. Constant repair rate (HPP/exponential) model [8.4.5.1.]
2. Power law (Duane) model [8.4.5.2.]
3. Exponential law model [8.4.5.3.]
6. How do you estimate reliability using the Bayesian gamma prior model? [8.4.6.]
7. References For Chapter 8: Assessing Product Reliability [8.4.7.]
8.1. Introduction
This section introduces the terminology and models that will
be used to describe and quantify product reliability. The
terminology, probability distributions and models used for
reliability analysis differ in many cases from those used in
other statistical applications.
Detailed
contents of
Section 1
1. Introduction
1. Why is the assessment and control of product
reliability important?
1. Quality versus reliability
2. Competitive driving factors
3. Safety and health considerations
2. What are the basic terms and models used for
reliability evaluation?
1. Repairable systems, non-repairable
populations and lifetime distribution
models
2. Reliability or survival function
3. Failure (or hazard) rate
4. "Bathtub" curve
5. Repair rate or ROCOF
3. What are some common difficulties with
reliability data and how are they overcome?
1. Censoring
2. Lack of failures
4. What is "physical acceleration" and how do we
model it?
5. What are some common acceleration models?
1. Arrhenius
2. Eyring
3. Other models
6. What are the basic lifetime distribution models
used for non-repairable populations?
1. Exponential
2. Weibull
3. Extreme value distributions
4. Lognormal
5. Gamma
6. Fatigue life (Birnbaum-Saunders)
7. Proportional hazards model
7. What are some basic repair rate models used for
repairable systems?
1. Homogeneous Poisson Process (HPP)
2. Non-Homogeneous Poisson Process
(NHPP) with power law
3. Exponential law
8. How can you evaluate reliability from the
"bottom- up" (component failure mode to system
failure rates)?
1. Competing risk model
2. Series model
3. Parallel or redundant model
4. R out of N model
5. Standby model
6. Complex systems
9. How can you model reliability growth?
1. NHPP power law
2. Duane plots
3. NHPP exponential law
10. How can Bayesian methodology be used for
reliability evaluation?
8.1.1. Why is the assessment and control of
product reliability important?
We depend on, demand, and expect reliable products
In today's technological world nearly everyone depends upon
the continued functioning of a wide array of complex
machinery and equipment for their everyday health, safety,
mobility and economic welfare. We expect our cars,
computers, electrical appliances, lights, televisions, etc. to
function whenever we need them - day after day, year after
year. When they fail the results can be catastrophic: injury,
loss of life and/or costly lawsuits can occur. More often,
repeated failure leads to annoyance, inconvenience and a
lasting customer dissatisfaction that can play havoc with the
responsible company's marketplace position.
Shipping unreliable products can destroy a company's reputation
It takes a long time for a company to build up a reputation for
reliability, and only a short time to be branded as "unreliable"
after shipping a flawed product. Continual assessment of new
product reliability and ongoing control of the reliability of
everything shipped are critical necessities in today's
competitive business arena.
8.1.1.1. Quality versus reliability
Reliability is "quality changing over time"
The everyday usage term "quality of a product" is loosely
taken to mean its inherent degree of excellence. In industry,
this is made more precise by defining quality to be
"conformance to requirements at the start of use". Assuming
the product specifications adequately capture customer
requirements, the quality level can now be precisely
measured by the fraction of units shipped that meet
specifications.
A motion picture instead of a snapshot
But how many of these units still meet specifications after a
week of operation? Or after a month, or at the end of a one
year warranty period? That is where "reliability" comes in.
Quality is a snapshot at the start of life and reliability is a
motion picture of the day-by-day operation. Time zero
defects are manufacturing mistakes that escaped final test.
The additional defects that appear over time are "reliability
defects" or reliability fallout.
Life distributions model fraction fallout over time
The quality level might be described by a single fraction
defective. To describe reliability fallout a probability model
that describes the fraction fallout over time is needed. This is
known as the life distribution model.
8.1.1.2. Competitive driving factors
Reliability is a major economic factor in determining a product's success
Accurate prediction and control of reliability plays an
important role in the profitability of a product. Service costs
for products within the warranty period or under a service
contract are a major expense and a significant pricing factor.
Proper spare part stocking and support personnel hiring and
training also depend upon good reliability fallout predictions.
On the other hand, missing reliability targets may invoke
contractual penalties and cost future business.
Companies that can economically design and market products
that meet their customers' reliability expectations have a
strong competitive advantage in today's marketplace.
8.1.1.3. Safety and health considerations
Some failures have serious social consequences and this should be taken into account when planning reliability studies
Sometimes equipment failure can have a major impact on
human safety and/or health. Automobiles, planes, life
support equipment, and power generating plants are a few
examples.
From the point of view of "assessing product reliability", we
treat these kinds of catastrophic failures no differently from
the failure that occurs when a key parameter measured on a
manufacturing tool drifts slightly out of specification,
calling for an unscheduled maintenance action.
It is up to the reliability engineer (and the relevant
customer) to define what constitutes a failure in any
reliability study. More resource (test time and test units)
should be planned for when an incorrect reliability
assessment could negatively impact safety and/or health.
8.1.2. What are the basic terms and models used
for reliability evaluation?
Reliability methods and terminology began with 19th century insurance companies
Reliability theory developed apart from the mainstream of
probability and statistics, and was used primarily as a tool to
help nineteenth century maritime and life insurance
companies compute profitable rates to charge their customers.
Even today, the terms "failure rate" and "hazard rate" are
often used interchangeably.
The following sections will define some of the concepts,
terms, and models we need to describe, estimate and predict
reliability.
8.1.2.1. Repairable systems, non-repairable populations
and lifetime distribution models
Life distribution models describe how non-repairable populations fail over time
A repairable system is one which can be restored to satisfactory operation by any action, including parts replacements or changes to adjustable settings. When discussing the rate at which failures occur during system operation time (and are then repaired) we will define a Rate Of Occurrence Of Failure (ROCOF) or "repair rate". It would be incorrect to talk about failure rates or hazard rates for repairable systems, as these terms apply only to the first failure times for a population of non-repairable components.
A non-repairable population is one for which individual items that fail are
removed permanently from the population. While the system may be
repaired by replacing failed units from either a similar or a different
population, the members of the original population dwindle over time until
all have eventually failed.
We begin with models and definitions for non-repairable populations. Repair
rates for repairable populations will be defined in a later section.
The theoretical population models used to describe unit lifetimes are known
as Lifetime Distribution Models. The population is generally considered to
be all of the possible unit lifetimes for all of the units that could be
manufactured based on a particular design and choice of materials and
manufacturing process. A random sample of size n from this population is
the collection of failure times observed for a randomly selected group of n
units.
Any continuous PDF defined only for non-negative values can be a lifetime distribution model
A lifetime distribution model can be any probability density function (or
PDF) f(t) defined over the range of time from t = 0 to t = infinity. The
corresponding cumulative distribution function (or CDF) F(t) is a very
useful function, as it gives the probability that a randomly selected unit will
fail by time t. The figure below shows the relationship between f(t) and F(t)
and gives three descriptions of F(t).
1. F(t) = the area under the PDF f(t) to the left of
t.
2. F(t) = the probability that a single randomly
chosen new unit will fail by time t.
3. F(t) = the proportion of the entire population
that fails by time t.
The figure above also shows a shaded area under f(t) between the two times t1 and t2. This area is [F(t2) - F(t1)] and represents the proportion of the population that fails between times t1 and t2 (or the probability that a brand new randomly chosen unit will survive to time t1 but fail before time t2).
Note that the PDF f(t) has only non-negative values and eventually either
becomes 0 as t increases, or decreases towards 0. The CDF F(t) is
monotonically increasing and goes from 0 to 1 as t approaches infinity. In
other words, the total area under the curve is always 1.
The Weibull model is a good example of a life distribution
The 2-parameter Weibull distribution is an example of a popular F(t). It has the CDF and PDF equations given by:

   F(t) = 1 - e^(-(t/α)^γ),   f(t) = (γ/t)(t/α)^γ e^(-(t/α)^γ)

where γ is the "shape" parameter and α is a scale parameter called the characteristic life.

Example: A company produces automotive fuel pumps that fail according to a Weibull life distribution model with shape parameter γ = 1.5 and scale parameter α = 8,000 (time measured in use hours). If a typical pump is used 800 hours a year, what proportion are likely to fail within 5 years?

Solution: The probability associated with the 800*5 quantile of a Weibull distribution with γ = 1.5 and α = 8000 is 0.298. Thus about 30% of the pumps will fail in the first 5 years.
Functions for computing Weibull PDF and CDF values are available in both Dataplot code and R code.
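A minimal R sketch of this calculation (an illustration, not the Handbook's own listing) uses the base pweibull function, with shape = γ = 1.5 and scale = α = 8000:

   # Proportion of pumps expected to fail within 5 years of use (800 hours/year)
   t <- 800 * 5                               # 4000 use hours
   pweibull(t, shape = 1.5, scale = 8000)     # CDF F(4000), approximately 0.298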
8.1.2.2. Reliability or survival function
Survival is the complementary event to failure

The Reliability Function R(t), also known as the Survival Function S(t), is defined by:

   R(t) = S(t) = the probability a unit survives beyond time t.

Since a unit either fails or survives, and one of these two mutually exclusive alternatives must occur, we have

   R(t) = 1 - F(t),   F(t) = 1 - R(t)

Calculations using R(t) often occur when building up from single components to subsystems with many components. For example, if one microprocessor comes from a population with reliability function Rm(t) and two of them are used for the CPU in a system, then the system CPU has a reliability function given by

   Rcpu(t) = [Rm(t)]^2
The reliability of the system is the product of the reliability functions of the components
since both must survive in order for the system to survive.
This building up to the system from the individual
components will be discussed in detail when we look at
the "Bottom-Up" method. The general rule is: to calculate
the reliability of a system of independent components,
multiply the reliability functions of all the components
together.
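As a small sketch of this rule (assumed values, not a Handbook example), suppose each of two independent microprocessors follows an exponential reliability function with an assumed failure rate of 1e-5 per hour; in R:

   # System CPU reliability as the product of two identical component reliabilities
   lambda <- 1e-5               # assumed component failure rate (per hour)
   t <- 10000                   # time of interest (hours)
   R_m <- exp(-lambda * t)      # component reliability R_m(t)
   R_cpu <- R_m^2               # both components must survive
   R_cpu                        # approximately 0.819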
8.1.2.3. Failure (or hazard) rate
The failure rate is the rate at which the population survivors at any given instant are "falling over the cliff"
The failure rate is defined for non-repairable populations as the (instantaneous) rate of failure for the survivors to time t during the next instant of time. It is a rate per unit of time similar in meaning to reading a car speedometer at a particular instant and seeing 45 mph. The next instant the failure rate may change and the units that have already failed play no further role since only the survivors count.

The failure rate (or hazard rate) is denoted by h(t) and calculated from

   h(t) = f(t) / (1 - F(t)) = f(t) / R(t)

The failure rate is sometimes called a "conditional failure rate" since the denominator 1 - F(t) (i.e., the population survivors) converts the expression into a conditional rate, given survival past time t.

Since h(t) is also equal to the negative of the derivative of ln{R(t)}, we have the useful identity:

   F(t) = 1 - exp( -∫0^t h(s) ds )

If we let

   H(t) = ∫0^t h(s) ds

be the Cumulative Hazard Function, we then have F(t) = 1 - e^(-H(t)). Two other useful identities that follow from these formulas are:

   H(t) = - ln R(t)
   h(t) = dH(t)/dt
It is also sometimes useful to define an average failure rate over any interval (T1, T2) that "averages" the failure rate over that interval. This rate, denoted by AFR(T1, T2), is a single number that can be used as a specification or target for the population failure rate over that interval. If T1 is 0, it is dropped from the expression. Thus, for example, AFR(40,000) would be the average failure rate for the population over the first 40,000 hours of operation.

The formulas for calculating AFR's are:

   AFR(T1, T2) = [H(T2) - H(T1)] / (T2 - T1) = [ln R(T1) - ln R(T2)] / (T2 - T1)

and, when T1 = 0,

   AFR(T) = H(T) / T = - ln R(T) / T
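These formulas can be sketched in R for an assumed Weibull population (an illustration; the shape and characteristic life below are hypothetical):

   # AFR(0, T) = H(T)/T = -ln R(T) / T for an assumed Weibull life distribution
   afr_weibull <- function(t_end, shape, scale) {
     H <- -pweibull(t_end, shape, scale, lower.tail = FALSE, log.p = TRUE)  # H(T) = -ln R(T)
     H / t_end
   }
   afr_weibull(40000, shape = 1.5, scale = 8000)   # average failure rate over the first 40,000 hours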
8.1.2.4. "Bathtub" curve
http://www.itl.nist.gov/div898/handbook/apr/section1/apr124.htm[6/27/2012 2:49:01 PM]
8. Assessing Product Reliability
8.1. Introduction
8.1.2. What are the basic terms and models used for reliability evaluation?
8.1.2.4. "Bathtub" curve
A plot of the failure rate over time for most products yields a curve that looks like a drawing of a bathtub
If enough units from a given population are observed operating and failing over time, it is relatively easy to compute week-by-week (or month-by-month) estimates of the failure rate h(t). For example, if N12 units survive to start the 13th month of life and r13 of them fail during the next month (or 720 hours) of life, then a simple empirical estimate of h(t) averaged across the 13th month of life (or between 8640 hours and 9360 hours of age), is given by r13 / (N12 * 720). Similar estimates are discussed in detail in the section on Empirical Model Fitting.
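A one-line version of this estimate in R, using assumed (hypothetical) counts:

   # Empirical failure rate averaged over the 13th month of life
   N12 <- 480; r13 <- 6; hours <- 720       # assumed survivors, failures, and hours in the month
   r13 / (N12 * hours)                      # failures per unit per hour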
Over many years, and across a wide variety of mechanical and electronic
components and systems, people have calculated empirical population failure
rates as units age over time and repeatedly obtained a graph such as shown
below. Because of the shape of this failure rate curve, it has become widely
known as the "Bathtub" curve.
The initial region that begins at time zero when a customer first begins to use the
product is characterized by a high but rapidly decreasing failure rate. This region
is known as the Early Failure Period (also referred to as Infant Mortality
Period, from the actuarial origins of the first bathtub curve plots). This
decreasing failure rate typically lasts several weeks to a few months.
Next, the failure rate levels off and remains roughly constant for (hopefully) the
majority of the useful life of the product. This long period of a level failure rate
is known as the Intrinsic Failure Period (also called the Stable Failure
Period) and the constant failure rate level is called the Intrinsic Failure Rate.
Note that most systems spend most of their lifetimes operating in this flat portion of the bathtub curve.
Finally, if units from the population remain in use long enough, the failure rate
begins to increase as materials wear out and degradation failures occur at an ever
increasing rate. This is the Wearout Failure Period.
8.1.2.4. "Bathtub" curve
http://www.itl.nist.gov/div898/handbook/apr/section1/apr124.htm[6/27/2012 2:49:01 PM]
NOTE: The Bathtub Curve also applies (based on much empirical evidence) to
Repairable Systems. In this case, the vertical axis is the Repair Rate or the Rate
of Occurrence of Failures (ROCOF).
8.1.2.5. Repair rate or ROCOF
Repair Rate models are based on counting the cumulative number of failures over time
A different approach is used for modeling the rate of occurrence of failure incidences for a repairable system. In this chapter, these rates are called repair rates (not to be confused with the length of time for a repair, which is not discussed in this chapter). Time is measured by system power-on-hours from initial turn-on at time zero, to the end of system life. Failures occur as a given system ages and the system is repaired to a state that may be the same as new, or better, or worse. The frequency of repairs may be increasing, decreasing, or staying at a roughly constant rate.
Let N(t) be a counting function that keeps track of the
cumulative number of failures a given system has had from
time zero to time t. N(t) is a step function that jumps up one
every time a failure occurs and stays at the new level until the
next failure.
Every system will have its own observed N(t) function over
time. If we observed the N(t) curves for a large number of
similar systems and "averaged" these curves, we would have
an estimate of M(t) = the expected number (average number)
of cumulative failures by time t for these systems.
The Repair Rate (or ROCOF) is the mean rate of failures per unit time
The derivative of M(t), denoted m(t), is defined to be the
Repair Rate or the Rate Of Occurrence Of Failures at Time
t or ROCOF.
Models for N(t), M(t) and m(t) will be described in the section
on Repair Rate Models.
8.1.3. What are some common difficulties with
reliability data and how are they
overcome?
The Paradox of Reliability Analysis: The more reliable a product is, the harder it is to get the failure data needed to "prove" it is reliable!
There are two closely related problems that are typical with
reliability data and not common with most other forms of
statistical data. These are:
Censoring (when the observation period ends, not all
units have failed - some are survivors)
Lack of Failures (if there is too much censoring, even
though a large number of units may be under
observation, the information in the data is limited due to
the lack of actual failures)
These problems cause considerable practical difficulty when
planning reliability assessment tests and analyzing failure data.
Some solutions are discussed in the next two sections.
Typically, the solutions involve making additional assumptions
and using complicated models.
8.1.3.1. Censoring
When not all units on test fail we have censored data
Consider a situation in which we are reliability testing n (non repairable) units
taken randomly from a population. We are investigating the population to
determine if its failure rate is acceptable. In the typical test scenario, we have a
fixed time T to run the units to see if they survive or fail. The data obtained are
called Censored Type I data.
Censored Type I Data
During the T hours of test we observe r failures (where r can be any number from 0 to n). The (exact) failure times are t1, t2, ..., tr and there are (n - r) units that survived the entire T-hour test without failing. Note that T is fixed in advance and r is random, since we don't know how many failures will occur until the test is run. Note also that we assume the exact times of failure are recorded when there are failures.
This type of censoring is also called "right censored" data since the times of
failure to the right (i.e., larger than T) are missing.
Another (much less common) way to test is to decide in advance that you want to see exactly r failure times and then test until they occur. For example, you might put 100 units on test and decide you want to see at least half of them fail. Then r = 50, but T is unknown until the 50th failure occurs. This is called Censored Type II data.
Censored Type II Data
We observe t1, t2, ..., tr, where r is specified in advance. The test ends at time T = tr, and (n - r) units have survived. Again we assume it is possible to observe the exact time of failure for failed units.
Type II censoring has the significant advantage that you know in advance how
many failure times your test will yield - this helps enormously when planning
adequate tests. However, an open-ended random test time is generally
impractical from a management point of view and this type of testing is rarely
seen.
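A short R simulation contrasts the two schemes (a sketch with assumed exponential lifetimes, not Handbook data):

   # Type I versus Type II censoring of n = 100 assumed exponential lifetimes
   set.seed(1)
   life <- rexp(100, rate = 0.001)          # assumed failure times, MTTF = 1000 hours
   # Type I: fixed test time; the number of failures r is random
   T_end <- 500
   sum(life <= T_end)                       # r, not known until the test is run
   # Type II: fixed number of failures r; the test length T = t_r is random
   r <- 50
   sort(life)[r]                            # T = t_r, not known in advance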
Sometimes we don't even know the exact time of failure
Readout or Interval Data
Sometimes exact times of failure are not known; only an interval of time in
which the failure occurred is recorded. This kind of data is called Readout or
Interval data and the situation is shown in the figure below:
Multicensored Data
In the most general case, every unit observed yields exactly one of the following
three types of information:
a run-time if the unit did not fail while under observation
an exact failure time
an interval of time during which the unit failed.
The units may all have different run-times and/or readout intervals.
Many special methods have been developed to handle censored data
How do we handle censored data?
Many statistical methods can be used to fit models and estimate failure rates,
even with censored data. In later sections we will discuss the Kaplan-Meier
approach, Probability Plotting, Hazard Plotting, Graphical Estimation, and
Maximum Likelihood Estimation.
Separating out Failure Modes
Note that when a data set consists of failure times that can be sorted into several
different failure modes, it is possible (and often necessary) to analyze and model
each mode separately. Consider all failures due to modes other than the one
being analyzed as censoring times, with the censored run-time equal to the time
it failed due to the different (independent) failure mode. This is discussed further
in the competing risk section and later analysis sections.
8.1.3.2. Lack of failures
Failure data is needed to accurately assess and improve reliability - this poses problems when testing highly reliable parts
When fitting models and estimating failure rates from
reliability data, the precision of the estimates (as measured by
the width of the confidence intervals) tends to vary inversely
with the square root of the number of failures observed - not
the number of units on test or the length of the test. In other
words, a test where 5 fail out of a total of 10 on test gives
more information than a test with 1000 units but only 2
failures.
Since the number of failures r is critical, and not the sample
size n on test, it becomes increasingly difficult to assess the
failure rates of highly reliable components. Parts like memory
chips, that in typical use have failure rates measured in parts
per million per thousand hours, will have few or no failures
when tested for reasonable time periods with affordable
sample sizes. This gives little or no information for
accomplishing the two primary purposes of reliability testing,
namely:
accurately assessing population failure rates
obtaining failure mode information to feedback for
product improvement.
Testing at much higher than typical stresses can yield failures but models are then needed to relate these back to use stress
How can tests be designed to overcome an expected lack of
failures?
The answer is to make failures occur by testing at much higher
stresses than the units would normally see in their intended
application. This creates a new problem: how can these
failures at higher-than-normal stresses be related to what
would be expected to happen over the course of many years at
normal use stresses? The models that relate high stress
reliability to normal use reliability are called acceleration
models.
8.1.4. What is "physical acceleration" and how
do we model it?
When changing stress is equivalent to multiplying time to fail by a constant, we have true (physical) acceleration
Physical Acceleration (sometimes called True
Acceleration or just Acceleration) means that operating a
unit at high stress (i.e., higher temperature or voltage or
humidity or duty cycle, etc.) produces the same failures that
would occur at typical-use stresses, except that they happen
much quicker.
Failure may be due to mechanical fatigue, corrosion,
chemical reaction, diffusion, migration, etc. These are the
same causes of failure under normal stress; the time scale is
simply different.
An Acceleration Factor is the constant multiplier between the two stress levels
When there is true acceleration, changing stress is equivalent
to transforming the time scale used to record when failures
occur. The transformations commonly used are linear,
which means that time-to-fail at high stress just has to be
multiplied by a constant (the acceleration factor) to obtain
the equivalent time-to-fail at use stress.
We use the following notation:

   ts = time-to-fail at stress           tu = corresponding time-to-fail at use
   Fs(t) = CDF at stress                 Fu(t) = CDF at use
   fs(t) = PDF at stress                 fu(t) = PDF at use
   hs(t) = failure rate at stress        hu(t) = failure rate at use

Then, an acceleration factor AF between stress and use means the following relationships hold:

Linear Acceleration Relationships

   Time-to-Fail:              tu = AF * ts
   Failure Probability:       Fu(t) = Fs(t/AF)
   Reliability:               Ru(t) = Rs(t/AF)
   PDF or Density Function:   fu(t) = (1/AF) fs(t/AF)
   Failure Rate:              hu(t) = (1/AF) hs(t/AF)
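A numerical sketch of these relationships in R (the acceleration factor and the stress-cell Weibull parameters below are assumed for illustration):

   # Translate a stress-cell CDF to use conditions: F_u(t) = F_s(t/AF)
   AF <- 40                                                    # assumed acceleration factor
   F_s <- function(t) pweibull(t, shape = 1.5, scale = 2000)   # assumed CDF at stress
   F_u <- function(t) F_s(t / AF)                              # implied CDF at use conditions
   F_u(100000)                                                 # probability of failure by 100,000 use hours
   # Time-to-fail scales the same way: t_u = AF * t_s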
8.1.4. What is "physical acceleration" and how do we model it?
http://www.itl.nist.gov/div898/handbook/apr/section1/apr14.htm[6/27/2012 2:49:05 PM]
Each failure mode has its own acceleration factor

Failure data should be separated by failure mode when analyzed, if acceleration is relevant

Probability plots of data from different stress cells have the same slope (if there is acceleration)
Note: Acceleration requires that there be a stress dependent physical process causing change or degradation that leads to failure. In general, different failure modes will be affected differently by stress and have different acceleration factors. Therefore, it is unlikely that a single acceleration factor will apply to more than one failure mechanism. Separate out different types of failure when analyzing failure data.
Also, a consequence of the linear acceleration relationships
shown above (which follows directly from "true
acceleration") is the following:
The Shape Parameter for the key life
distribution models (Weibull, Lognormal) does
not change for units operating under different
stresses. Probability plots of data from different
stress cells will line up roughly parallel.
These distributions and probability plotting will be
discussed in later sections.
8.1.5. What are some common acceleration
models?
Acceleration models predict time to fail as a function of stress
Acceleration factors show how time-to-fail at a particular
operating stress level (for one failure mode or mechanism)
can be used to predict the equivalent time to fail at a
different operating stress level.
A model that predicts time-to-fail as a function of stress would be even better than a collection of acceleration factors. If we write tf = G(S), with G(S) denoting the model equation for an arbitrary stress level S, then the acceleration factor between two stress levels S1 and S2 can be evaluated simply by AF = G(S1)/G(S2). Now we can test at the higher stress S2, obtain a sufficient number of failures to fit life distribution models and evaluate failure rates, and use the Linear Acceleration Relationships Table to predict what will occur at the lower use stress S1.
A model that predicts time-to-fail as a function of operating
stresses is known as an acceleration model.
Acceleration models are often derived from physics or kinetics models related to the failure mechanism
Acceleration models are usually based on the physics or
chemistry underlying a particular failure mechanism.
Successful empirical models often turn out to be
approximations of complicated physics or kinetics models,
when the theory of the failure mechanism is better
understood. The following sections will consider a variety of
powerful and useful models:
Arrhenius
Eyring
Other Models
8.1.5.1. Arrhenius
The Arrhenius model predicts failure acceleration due to temperature increase
One of the earliest and most successful acceleration models predicts how time-to-fail varies with temperature. This empirically based model is known as the Arrhenius equation. It takes the form

   tf = A exp( ΔH / (kT) )

with T denoting temperature measured in degrees Kelvin (273.16 + degrees Celsius) at the point when the failure process takes place and k is Boltzmann's constant (8.617 x 10^-5 in eV/K). The constant A is a scaling factor that drops out when calculating acceleration factors, with ΔH (pronounced "Delta H") denoting the activation energy, which is the critical parameter in the model.
The Arrhenius activation energy, ΔH, is all you need to know to calculate temperature acceleration
The value of ΔH depends on the failure mechanism and the materials involved, and typically ranges from .3 or .4 up to 1.5, or even higher. Acceleration factors between two temperatures increase exponentially as ΔH increases.

The acceleration factor between a higher temperature T2 and a lower temperature T1 is given by

   AF = exp[ (ΔH/k) (1/T1 - 1/T2) ]

Using the value of k given above, this can be written in terms of T in degrees Celsius as

   AF = exp[ ΔH * 11605 * (1/(T1 + 273.16) - 1/(T2 + 273.16)) ]

Note that the only unknown parameter in this formula is ΔH.

Example: The acceleration factor between 25°C and 125°C is 133 if ΔH = .5 and 17,597 if ΔH = 1.0.
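The example can be checked with a few lines of R implementing the acceleration-factor formula above (a sketch, not the Handbook's own listing):

   # Arrhenius acceleration factor between use temperature T1 and stress temperature T2 (Celsius)
   arrhenius_af <- function(dH, T1, T2, k = 8.617e-5) {
     exp((dH / k) * (1 / (T1 + 273.16) - 1 / (T2 + 273.16)))
   }
   arrhenius_af(0.5, 25, 125)   # approximately 133
   arrhenius_af(1.0, 25, 125)   # approximately 17,600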
The Arrhenius model has been used successfully for failure
mechanisms that depend on chemical reactions, diffusion processes or migration processes. This covers many of the non-mechanical (or non-material-fatigue) failure modes that cause electronic equipment failure.
8.1.5.2. Eyring
The Eyring model has a theoretical basis in chemistry and quantum mechanics and can be used to model acceleration when many stresses are involved
Henry Eyring's contributions to chemical reaction rate theory
have led to a very general and powerful model for
acceleration known as the Eyring Model. This model has
several key features:
It has a theoretical basis from chemistry and quantum
mechanics.
If a chemical process (chemical reaction, diffusion,
corrosion, migration, etc.) is causing degradation
leading to failure, the Eyring model describes how the
rate of degradation varies with stress or, equivalently,
how time to failure varies with stress.
The model includes temperature and can be expanded
to include other relevant stresses.
The temperature term by itself is very similar to the Arrhenius empirical model, explaining why that model has been so successful in establishing the connection between the ΔH parameter and the quantum theory concept of "activation energy needed to cross an energy barrier and initiate a reaction".
The model for temperature and one additional stress takes the general form:

   tf = A T^α exp[ ΔH/(kT) + (B + C/T) S1 ]

for which S1 could be some function of voltage or current or any other relevant stress and the parameters α, ΔH, B, and C determine acceleration between stress combinations. As with the Arrhenius Model, k is Boltzmann's constant and temperature is in degrees Kelvin.

If we want to add an additional non-thermal stress term, the model becomes

   tf = A T^α exp[ ΔH/(kT) + (B + C/T) S1 + (D + E/T) S2 ]

and as many stresses as are relevant can be included by adding similar terms.
Models with multiple stresses generally have no interaction terms - which means you can multiply acceleration factors due to different stresses
Note that the general Eyring model includes terms that have
stress and temperature interactions (in other words, the
effect of changing temperature varies, depending on the
levels of other stresses). Most models in actual use do not
include any interaction terms, so that the relative change in
acceleration factors when only one stress changes does not
depend on the level of the other stresses.
In models with no interaction, you can compute acceleration
factors for each stress and multiply them together. This
would not be true if the physical mechanism required
interaction terms - but, at least to first approximations, it
seems to work for most examples in the literature.
The Eyring model can also be used to model rate of degradation leading to failure as a function of stress
Advantages of the Eyring Model

Can handle many stresses.
Can be used to model degradation data as well as failure data.
The ΔH parameter has a physical meaning and has been studied and estimated for many well known failure mechanisms and materials.
In practice, the Eyring Model is usually too complicated to use in its most general form and must be "customized" or simplified for any particular failure mechanism
Disadvantages of the Eyring Model

Even with just two stresses, there are 5 parameters to estimate. Each additional stress adds 2 more unknown parameters.
Many of the parameters may have only a second-order effect. For example, setting α = 0 works quite well since the temperature term then becomes the same as in the Arrhenius model. Also, the constants C and E are only needed if there is a significant temperature interaction effect with respect to the other stresses.
The form in which the other stresses appear is not specified by the general model and may vary according to the particular failure mechanism. In other words, S1 may be voltage or ln (voltage) or some other function of voltage.
Many well-known models are simplified versions of the Eyring model with appropriate functions of relevant stresses chosen for S1 and S2. Some of these will be shown in the Other Models section. The trick is to find the right simplification to use for a particular failure mechanism.
8.1.5.3. Other models
Many useful 1, 2 and 3 stress models are simple Eyring models. Six are described
This section will discuss several acceleration models whose
successful use has been described in the literature.
The (Inverse) Power Rule for Voltage
The Exponential Voltage Model
Two Temperature/Voltage Models
The Electromigration Model
Three Stress Models (Temperature, Voltage and
Humidity)
The Coffin-Manson Mechanical Crack Growth Model
The (Inverse) Power Rule for Voltage

This model, used for capacitors, has only voltage dependency and takes the form:

   tf = A V^(-B)

This is a very simplified Eyring model with α, ΔH, and C all 0, S = ln V, and the coefficient of S equal to -B.

The Exponential Voltage Model

In some cases, voltage dependence is modeled better with an exponential model:

   tf = A e^(-B V)
Two Temperature/Voltage Models
Temperature/Voltage models are common in the literature and
take one of the two forms given below:
Again, these are just simplified two stress Eyring models with
the appropriate choice of constants and functions of voltage.
The Electromigration Model

Electromigration is a semiconductor failure mechanism where open failures occur in metal thin film conductors due to the movement of ions toward the anode. This ionic movement is accelerated by high temperatures and high current density. The (modified Eyring) model takes the form

   tf = A J^(-n) exp( ΔH / (kT) )

with J denoting the current density. ΔH is typically between .5 and 1.2 electron volts, while an n around 2 is common.
Three-Stress Models (Temperature, Voltage and Humidity)

Humidity plays an important role in many failure mechanisms that depend on corrosion or ionic movement. A common 3-stress model takes the form

   tf = A V^(-B) (RH)^(-C) exp( ΔH / (kT) )

Here RH is percent relative humidity. Other obvious variations on this model would be to use an exponential voltage term and/or an exponential RH term.

Even this simplified Eyring 3-stress model has 4 unknown parameters and an extensive experimental setup would be required to fit the model and calculate acceleration factors.
The Coffin-Manson Model is a useful non-Eyring model for crack growth or material fatigue
The Coffin-Manson Mechanical Crack Growth Model
Models for mechanical failure, material fatigue or material
deformation are not forms of the Eyring model. These models
typically have terms relating to cycles of stress or frequency of
use or change in temperatures. A model of this type known as
the (modified) Coffin-Manson model has been used
successfully to model crack growth in solder and other metals
due to repeated temperature cycling as equipment is turned on
and off. This model takes the form
with

   Nf = the number of cycles to fail
   f = the cycling frequency
   ΔT = the temperature range during a cycle

and G(Tmax) is an Arrhenius term evaluated at the maximum temperature reached in each cycle.

Typical values for the cycling frequency exponent and the temperature range exponent are around -1/3 and 2, respectively (note that reducing the cycling frequency reduces the number of cycles to failure). The ΔH activation energy term in G(Tmax) is around 1.25.
8.1.6. What are the basic lifetime distribution
models used for non-repairable
populations?
A handful of lifetime distribution models have enjoyed great practical success
There are a handful of parametric models that have
successfully served as population models for failure times
arising from a wide range of products and failure
mechanisms. Sometimes there are probabilistic arguments
based on the physics of the failure mode that tend to justify
the choice of model. Other times the model is used solely
because of its empirical success in fitting actual failure data.
Seven models will be described in this section:
1. Exponential
2. Weibull
3. Extreme Value
4. Lognormal
5. Gamma
6. Birnbaum-Saunders
7. Proportional hazards
8.1.6.1. Exponential
All the key formulas for using the exponential model
Formulas and Plots

The exponential model, with only one unknown parameter, is the simplest of all life distribution models. The key equations for the exponential are shown below:

   PDF:            f(t) = λ e^(-λt)
   CDF:            F(t) = 1 - e^(-λt)
   Reliability:    R(t) = e^(-λt)
   Failure Rate:   h(t) = λ
   Mean:           1/λ
   Median:         ln 2 / λ

Note that the failure rate reduces to the constant λ for any time. The exponential distribution is the only distribution to have a constant failure rate. Also, another name for the exponential mean is the Mean Time To Fail or MTTF and we have MTTF = 1/λ.

The cumulative hazard function for the exponential is just the integral of the failure rate or H(t) = λt.

The PDF for the exponential has the familiar shape shown below.
The Exponential distribution 'shape'
The
Exponential
CDF
Below is an example of typical exponential lifetime data displayed in Histogram
form with corresponding exponential PDF drawn through the histogram.
Histogram of Exponential Data
The Exponential models the flat portion of the "bathtub" curve - where most systems spend most of their 'lives'
Uses of the Exponential Distribution Model
1. Because of its constant failure rate property, the exponential distribution is
an excellent model for the long flat "intrinsic failure" portion of the
Bathtub Curve. Since most components and systems spend most of their
lifetimes in this portion of the Bathtub Curve, this justifies frequent use of
the exponential distribution (when early failures or wear out is not a
concern).
2. Just as it is often useful to approximate a curve by piecewise straight line
segments, we can approximate any failure rate curve by week-by-week or
month-by-month constant rates that are the average of the actual changing
rate during the respective time durations. That way we can approximate
any model by piecewise exponential distribution segments patched
together.
3. Some natural phenomena have a constant failure rate (or occurrence rate) property; for example, the arrival rate of cosmic ray alpha particles or Geiger counter ticks. The exponential model works well for inter-arrival times (while the Poisson distribution describes the total number of events in a given period). When these events trigger failures, the exponential life distribution model will naturally apply.
Exponential
probability
plot
We can generate a probability plot of normalized exponential data, so that a perfect exponential fit is a diagonal line with slope 1. The probability plot for 100 normalized random exponential observations (λ = 0.01) is shown below.
We can calculate the exponential PDF and CDF at 100 hours for the case where λ = 0.01. The PDF value is 0.0037 and the CDF value is 0.6321.
Functions for computing exponential PDF values, CDF values, and for producing
probability plots, are found in both Dataplot code and R code.
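For instance, the base R calls below reproduce the values just quoted (a sketch, not the Handbook's own listing):

   dexp(100, rate = 0.01)   # exponential PDF at 100 hours, approximately 0.0037
   pexp(100, rate = 0.01)   # exponential CDF at 100 hours, approximately 0.6321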
8.1.6.2. Weibull
Weibull
Formulas
Formulas and Plots

The Weibull is a very flexible life distribution model with two parameters. It has CDF and PDF and other key formulas given by:

   CDF:            F(t) = 1 - e^(-(t/α)^γ)
   PDF:            f(t) = (γ/t) (t/α)^γ e^(-(t/α)^γ)
   Reliability:    R(t) = e^(-(t/α)^γ)
   Failure Rate:   h(t) = (γ/t) (t/α)^γ
   Mean:           α Γ(1 + 1/γ)
   Median:         α (ln 2)^(1/γ)

with α the scale parameter (the Characteristic Life), γ (gamma) the Shape Parameter, and Γ the Gamma function with Γ(N) = (N-1)! for integer N.

The cumulative hazard function for the Weibull is the integral of the failure rate or

   H(t) = (t/α)^γ

A more general three-parameter form of the Weibull includes an additional waiting time parameter μ (sometimes called a shift or location parameter). The formulas for the 3-parameter Weibull are easily obtained from the above formulas by replacing t by (t - μ) wherever t appears. No failure can occur before μ hours, so the time scale starts at μ, and not 0. If a shift parameter μ is known (based, perhaps, on the physics of the failure mode), then all you have to do is subtract μ from all the observed failure times and/or readout times and
analyze the resulting shifted data with a two-parameter Weibull.
NOTE: Various texts and articles in the literature use a variety of different symbols for the same Weibull parameters. For example, the characteristic life is sometimes called c (or ν = nu or η = eta) and the shape parameter is also called m (or β = beta). To add to the confusion, some software uses β as the characteristic life parameter and α as the shape parameter. Some authors even parameterize the density function differently, using a scale parameter θ = α^γ.

Special Case: When γ = 1, the Weibull reduces to the Exponential Model, with α = 1/λ = the mean time to fail (MTTF).

Depending on the value of the shape parameter γ, the Weibull model can empirically fit a wide range of data histogram shapes. This is shown by the PDF example curves below.
Weibull
data
'shapes'
From a failure rate model viewpoint, the Weibull is a natural extension of the constant failure rate exponential model since the Weibull has a polynomial failure rate with exponent {γ - 1}. This makes all the failure rate curves shown in the following plot possible.
Weibull
failure rate
'shapes'
The Weibull is very flexible and also has theoretical justification in many applications
Uses of the Weibull Distribution Model
1. Because of its flexible shape and ability to model a wide range of failure rates, the
Weibull has been used successfully in many applications as a purely empirical model.
2. The Weibull model can be derived theoretically as a form of Extreme Value
Distribution, governing the time to occurrence of the "weakest link" of many competing
failure processes. This may explain why it has been so successful in applications such
as capacitor, ball bearing, relay and material strength failures.
3. Another special case of the Weibull occurs when the shape parameter is 2. The
distribution is called the Rayleigh Distribution and it turns out to be the theoretical
probability model for the magnitude of radial error when the x and y coordinate errors
are independent normals with 0 mean and the same standard deviation.
Weibull
probability
plot
We generated 100 Weibull random variables using T = 1000, γ = 1.5 and α = 5000. To see how well these random Weibull data points are actually fit by a Weibull distribution, we generated the probability plot shown below. Note the log scale used is base 10.
If the data follow a Weibull distribution, the points should follow a straight line.
We can compute the PDF and CDF values for failure time T = 1000, using the example Weibull distribution with γ = 1.5 and α = 5000. The PDF value is 0.000123 and the CDF value is 0.08556.
Functions for computing Weibull PDF values, CDF values, and for producing probability
plots, are found in both Dataplot code and R code.
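For instance, the base R calls below reproduce the simulated data and the quoted values (a sketch, not the Handbook's own listing):

   set.seed(1)
   x <- rweibull(100, shape = 1.5, scale = 5000)   # 100 random Weibull failure times
   dweibull(1000, shape = 1.5, scale = 5000)       # PDF at T = 1000, approximately 0.000123
   pweibull(1000, shape = 1.5, scale = 5000)       # CDF at T = 1000, approximately 0.08556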
8.1.6.3. Extreme value distributions
The Extreme Value Distribution usually refers to the distribution of the minimum of a large number of unbounded random observations
Description, Formulas and Plots
We have already referred to Extreme Value Distributions when describing the uses of the
Weibull distribution. Extreme value distributions are the limiting distributions for the
minimum or the maximum of a very large collection of random observations from the same
arbitrary distribution. Gumbel (1958) showed that for any well-behaved initial distribution
(i.e., F(x) is continuous and has an inverse), only a few models are needed, depending on
whether you are interested in the maximum or the minimum, and also if the observations are
bounded above or below.
In the context of reliability modeling, extreme value distributions for the minimum are
frequently encountered. For example, if a system consists of n identical components in series,
and the system fails when the first of these components fails, then system failure times are the
minimum of n random component failure times. Extreme value theory says that, independent
of the choice of component model, the system model will approach a Weibull as n becomes
large. The same reasoning can also be applied at a component level, if the component failure
occurs when the first of many similar competing failure processes reaches a critical level.
The distribution often referred to as the Extreme Value Distribution (Type I) is the limiting distribution of the minimum of a large number of unbounded identically distributed random variables. The PDF and CDF are given by:

   f(x) = (1/β) e^((x - μ)/β) exp[ -e^((x - μ)/β) ],   -∞ < x < ∞, β > 0
   F(x) = 1 - exp[ -e^((x - μ)/β) ],                   -∞ < x < ∞, β > 0
Extreme Value Distribution formulas and PDF shapes
If the x values are bounded below (as is the case with times of failure) then the limiting
distribution is the Weibull. Formulas and uses of the Weibull have already been discussed.
PDF Shapes for the (minimum) Extreme Value Distribution (Type I) are shown in the
following figure.
The natural log of Weibull data is extreme value data
Uses of the Extreme Value Distribution Model
1. In any modeling application for which the variable of interest is the minimum of many
random factors, all of which can take positive or negative values, try the extreme value
distribution as a likely candidate model. For lifetime distribution modeling, since failure
times are bounded below by zero, the Weibull distribution is a better choice.
2. The Weibull distribution and the extreme value distribution have a useful mathematical relationship. If t1, t2, ..., tn are a sample of random times of fail from a Weibull distribution, then ln t1, ln t2, ..., ln tn are random observations from the extreme value distribution. In other words, the natural log of a Weibull random time is an extreme value random observation.
Because of this relationship, computer programs designed for the extreme value
distribution can be used to analyze Weibull data. The situation exactly parallels using
normal distribution programs to analyze lognormal data, after first taking natural
logarithms of the data points.
Probability plot for the extreme value distribution
Assume μ = ln 200,000 = 12.206 and β = 1/2 = 0.5. The extreme value distribution associated with these parameters could be obtained by taking natural logarithms of data from a Weibull population with characteristic life α = 200,000 and shape γ = 2.

We generate 100 random numbers from this extreme value distribution and construct the following probability plot.
Data from an extreme value distribution will line up approximately along a straight line when this kind of plot is constructed. The slope of the line is an estimate of β, and the "y-axis" value on the line corresponding to the "x-axis" 0 point is an estimate of μ. For the graph above, these turn out to be very close to the actual values of β and μ.

For the example extreme value distribution with μ = ln 200,000 = 12.206 and β = 1/2 = 0.5, the PDF values corresponding to the points 5, 8, 10, 12, 12.8 are 0.110E-5, 0.444E-3, 0.024, 0.683 and 0.247, and the CDF values corresponding to the same points are 0.551E-6, 0.222E-3, 0.012, 0.484 and 0.962.
Functions for computing extreme value distribution PDF values, CDF values, and for
producing probability plots, are found in both Dataplot code and R code.
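Base R has no built-in extreme value distribution functions, but the quoted values can be reproduced by coding the minimum Type I formulas above directly (a sketch, not the Handbook's own listing):

   # Minimum extreme value (Type I) PDF and CDF with mu = 12.206, beta = 0.5
   mu <- log(200000); beta <- 0.5
   dev_min <- function(x) (1 / beta) * exp((x - mu) / beta) * exp(-exp((x - mu) / beta))
   pev_min <- function(x) 1 - exp(-exp((x - mu) / beta))
   x <- c(5, 8, 10, 12, 12.8)
   dev_min(x)   # approximately 1.1e-06, 4.4e-04, 0.024, 0.683, 0.247
   pev_min(x)   # approximately 5.5e-07, 2.2e-04, 0.012, 0.484, 0.962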
8.1.6.4. Lognormal
Lognormal Formulas and relationship to the normal distribution
Formulas and Plots

The lognormal life distribution, like the Weibull, is a very flexible model that can empirically fit many types of failure data. The two-parameter form has parameters σ = the shape parameter and T50 = the median (a scale parameter).

Note: If time to failure, tf, has a lognormal distribution, then the (natural) logarithm of time to failure has a normal distribution with mean μ = ln T50 and standard deviation σ. This makes lognormal data convenient to work with; just take natural logarithms of all the failure times and censoring times and analyze the resulting normal data. Later on, convert back to real time and lognormal parameters using σ as the lognormal shape and T50 = e^μ.

... so a plot of y versus x on a log-log scale should resemble a straight line with slope γ if the Weibull model is appropriate. The cumulative hazard plot
for the Weibull distribution is shown below.
A least-squares regression fit of the data (using base 10 logarithms to
transform columns (1) and (6)) indicates that the estimated slope for the
Weibull distribution is 1.27, which is fairly similar to the exponential
model slope of 1. The Weibull fit looks somewhat better than the
exponential fit; however, with a sample of just 10, and only 6 failures, it
is difficult to pick a model from the data alone.
Software The analyses in this section can be implemented using both Dataplot code and R code.
8.2.2.3. Trend and growth plotting (Duane plots)
Repair rates
are typically
either nearly
constant
over time or
else
consistently
follow a
good or a
bad trend
Models for repairable systems were described earlier. These models are
for the cumulative number of failures (or the repair rate) over time. The
two models used with most success throughout industry are the HPP
(constant repair rate or "exponential" system model) and the NHPP
Power Law process (the repair rate is the polynomial m(t) = αt^(-β)).
Before constructing a Duane Plot, there are a few simple trend plots that
often convey strong evidence of the presence or absence of a trend in the
repair rate over time. If there is no trend, an HPP model is reasonable. If
there is an apparent improvement or degradation trend, a Duane Plot will
provide a visual check for whether the NHPP Power law model is
consistent with the data.
A few simple
plots can
help us
decide
whether
trends are
present
These simple visual graphical tests for trends are
1. Plot cumulative failures versus system age (a step function that
goes up every time there is a new failure). If this plot looks linear,
there is no obvious improvement (or degradation) trend. A bending
downward indicates improvement; bending upward indicates
degradation.
2. Plot the inter-arrival times between new failures (in other words,
the waiting times between failures, with the time to the first failure
used as the first "inter-arrival" time). If these trend up, there is
improvement; if they trend down, there is degradation.
3. Plot the reciprocals of the inter-arrival times. Each reciprocal is a
new failure rate estimate based only on the waiting time since the
last failure. If these trend down, there is improvement; an upward
trend indicates degradation.
Trend plots
and a Duane
Plot for
actual
Reliability
Improvement
Test data
Case Study 1: Use of Trend Plots and Duane Plots with Reliability
Improvement Test Data
A prototype of a new, complex piece of equipment went through a 1500-hour
operational Reliability Improvement Test. During the test there
were 10 failures. As part of the improvement process, a cross-functional
Failure Review Board made sure every failure was analyzed down to the
root cause and design and parts selection fixes were implemented on the
prototype. The observed failure times were: 5, 40, 43, 175, 389, 712, 747,
795, 1299 and 1478 hours, with the test ending at 1500 hours. The
reliability engineer on the Failure Review Board first made trend plots as
described above, then made a Duane plot. These plots follow.
Time Cum MTBF
5 5
40 20
43 14.3
175 43.75
389 77.8
712 118.67
747 106.7
795 99.4
1299 144.3
1478 147.8
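The trend plots and the Duane plot for these data can be reproduced with a few lines of R; the cumulative MTBF at the i-th failure is simply the failure time divided by i:

fail <- c(5, 40, 43, 175, 389, 712, 747, 795, 1299, 1478)   # failure times (hours)
n    <- length(fail)
plot(fail, 1:n, type = "s", xlab = "System Age (hours)", ylab = "Cumulative Failures")
inter <- diff(c(0, fail))                       # inter-arrival times between failures
plot(1:n, inter, type = "b", ylab = "Inter-arrival Time (hours)")
plot(1:n, 1/inter, type = "b", ylab = "Reciprocal Inter-arrival Time")
cum.mtbf <- fail / (1:n)                        # the Cum MTBF column above
plot(fail, cum.mtbf, log = "xy", xlab = "System Age (hours)",
     ylab = "Cum MTBF (hours)")                 # Duane plot on log-log scales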
Comments: The three trend plots all show an improvement trend. The
reason it might be useful to try all three trend plots is that a trend might
show up more clearly on one plot than the others. Formal statistical tests
on the significance of this visual evidence of a trend will be shown in the
section on Trend Tests.
The points on the Duane Plot line up roughly as a straight line, indicating
the NHPP Power Law model is consistent with the data.
Estimates for the reliability growth slope and the MTBF at the end of
this test for this case study will be given in a later section.
8.2.3. How can you test reliability model
assumptions?
Models
are
frequently
necessary
- but
should
always be
checked
Since reliability models are often used to project (extrapolate)
failure rates or MTBF's that are well beyond the range of the
reliability data used to fit these models, it is very important to
"test" whether the models chosen are consistent with whatever
data are available. This section describes several ways of
deciding whether a model under examination is acceptable.
These are:
1. Visual Tests
2. Goodness of Fit Tests
3. Likelihood Ratio Tests
4. Trend Tests
8.2.3.1. Visual tests
A visual
test of a
model is a
simple plot
that tells us
at a glance
whether the
model is
consistent
with the
data
We have already seen many examples of visual tests of
models. These were: Probability Plots, Cum hazard Plots,
Duane Plots and Trend Plots. In all but the Trend Plots, the
model was "tested' by how well the data points followed a
straight line. In the case of the Trend Plots, we looked for
curvature away from a straight line (cum repair plots) or
increasing or decreasing size trends (inter arrival times and
reciprocal inter-arrival times).
These simple plots are a powerful diagnostic tool since the
human eye can often detect patterns or anomalies in the data
by studying graphs. That kind of invaluable information
would be lost if the analyst only used quantitative statistical
tests to check model fit. Every analysis should include as
many visual tests as are applicable.
Advantages of Visual Tests
1. Easy to understand and explain.
2. Can occasionally reveal patterns or anomalies in the
data.
3. When a model "passes" a visual test, it is somewhat
unlikely any quantitative statistical test will "reject" it
(the human eye is less forgiving and more likely to
detect spurious trends)
Combine
visual tests
with formal
quantitative
tests for the
"best of
both
worlds"
approach
Disadvantages of Visual Tests
1. Visual tests are subjective.
2. They do not quantify how well or how poorly a model
fits the data.
3. They are of little help in choosing between two or more
competing models that both appear to fit the data.
4. Simulation studies have shown that correct models may
often appear to not fit well by sheer chance - it is hard
to know when visual evidence is strong enough to
reject what was previously believed to be a correct
model.
You can retain the advantages of visual tests and remove
their disadvantages by combining data plots with formal
statistical tests of goodness of fit or trend.
8.2.3.2. Goodness of fit tests
A
Goodness
of Fit test
checks on
whether
your data
are
reasonable
or highly
unlikely,
given an
assumed
distribution
model
General tests for checking the hypothesis that your data are
consistent with a particular model are discussed in Chapter 7.
Details and examples of the Chi-Square Goodness of Fit test
and the Kolmogorov-Smirnov (K-S) test are given in
Chapter 1. The Chi-Square test can be used with Type I or
Type II censored data and readout data if there are enough
failures and readout times. The K-S test generally requires
complete samples, which limits its usefulness in reliability
analysis.
These tests control the probability of rejecting a valid model
as follows:
the analyst chooses a confidence level designated by
100 × (1 - α).
a test statistic is calculated from the data and compared
to likely values for this statistic, assuming the model is
correct.
if the test statistic has a very unlikely value, or less than
or equal to an α probability of occurring, where α is a
small value like 0.1 or 0.05 or even 0.01, then the model is
rejected.
So the risk of rejecting the right model is kept to α or less,
and the choice of α usually takes into account the potential
loss or difficulties incurred if the model is rejected.
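As a small illustration (not one of the Handbook's own examples), the R sketch below applies the K-S test to a hypothetical complete sample against a fitted exponential model; because the rate is estimated from the same data, the resulting p-value is only approximate:

set.seed(2)
x <- rexp(30, rate = 1/1000)          # hypothetical complete failure data
rate.hat <- 1 / mean(x)               # maximum likelihood estimate of the failure rate
ks.test(x, "pexp", rate = rate.hat)   # a large p-value means the data are consistent with the model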
8.2.3.3. Likelihood ratio tests
Likelihood
Ratio Tests
are a
powerful,
very general
method of
testing
model
assumptions.
However,
they require
special
software,
not always
readily
available.
Likelihood functions for reliability data are described in
Section 4. Two ways we use likelihood functions to choose
models or verify/validate assumptions are:
1. Calculate the maximum likelihood of the sample data
based on an assumed distribution model (the maximum
occurs when unknown parameters are replaced by their
maximum likelihood estimates). Repeat this calculation for
other candidate distribution models that also appear to fit the
data (based on probability plots). If all the models have the
same number of unknown parameters, and there is no
convincing reason to choose one particular model over
another based on the failure mechanism or previous
successful analyses, then pick the model with the largest
likelihood value.
2. Many model assumptions can be viewed as putting
restrictions on the parameters in a likelihood expression that
effectively reduce the total number of unknown parameters.
Some common examples are:
Examples
where
assumptions
can be
tested by the
Likelihood
Ratio Test
i) It is suspected that a type of data, typically
modeled by a Weibull distribution, can be fit
adequately by an exponential model. The
exponential distribution is a special case of the
Weibull, with the shape parameter set to 1. If
we write the Weibull likelihood function for the
data, the exponential model likelihood function
is obtained by setting the shape parameter γ to 1, and the number of
unknown parameters has been reduced from two
to one.
ii) Assume we have n cells of data from an
acceleration test, with each cell having a
different operating temperature. We assume a
lognormal population model applies in every
cell. Without an acceleration model assumption,
the likelihood of the experimental data would be
the product of the likelihoods from each cell
and there would be 2n unknown parameters (a
different T50 and σ for each cell). If we assume
an Arrhenius model applies, the total number of
parameters drops from 2n to just 3, the single
common σ and the Arrhenius A and ΔH
parameters. This acceleration assumption
"saves" (2n-3) parameters.
iii) We life test samples of product from two
vendors. The product is known to have a failure
mechanism modeled by the Weibull distribution,
and we want to know whether there is a
difference in reliability between the vendors.
The unrestricted likelihood of the data is the
product of the two likelihoods, with 4 unknown
parameters (the shape and characteristic life for
each vendor population). If, however, we
assume no difference between vendors, the
likelihood reduces to having only two unknown
parameters (the common shape and the common
characteristic life). Two parameters are "lost" by
the assumption of "no difference".
Clearly, we could come up with many more examples like
these three, for which an important assumption can be
restated as a reduction or restriction on the number of
parameters used to formulate the likelihood function of the
data. In all these cases, there is a simple and very useful way
to test whether the assumption is consistent with the data.
The Likelihood Ratio Test Procedure
Details of
the
Likelihood
Ratio Test
procedure
In general,
calculations
are difficult
and need to
be built into
the software
you use
Let L1 be the maximum value of the likelihood of the data
without the additional assumption. In other words, L1 is the
likelihood of the data with all the parameters unrestricted
and maximum likelihood estimates substituted for these
parameters.
Let L0 be the maximum value of the likelihood when the
parameters are restricted (and reduced in number) based on
the assumption. Assume k parameters were lost (i.e., L0 has
k fewer parameters than L1).
Form the ratio λ = L0/L1. This ratio is always between 0
and 1 and the less likely the assumption is, the smaller λ
will be. This can be quantified at a given confidence level as
follows:
1. Calculate χ² = -2 ln λ. The smaller λ is, the larger χ²
will be.
2. We can tell when χ² is significantly large by
comparing it to the 100 × (1 - α) percentile point of a
Chi-Square distribution with k degrees of freedom.
χ² has an approximate Chi-Square distribution with k
degrees of freedom and the approximation is usually
good, even for small sample sizes.
3. The likelihood ratio test computes χ² and rejects the
assumption if χ² is larger than a Chi-Square
percentile with k degrees of freedom, where the
percentile corresponds to the confidence level chosen
by the analyst.
Note: While Likelihood Ratio test procedures are very
useful and widely applicable, the computations are difficult
to perform by hand, especially for censored data, and
appropriate software is necessary.
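As a concrete illustration of case (i) above (the exponential as a Weibull with shape restricted to 1), the following R sketch, which assumes complete (uncensored) failure data and uses the MASS package, computes the likelihood ratio statistic:

library(MASS)
set.seed(3)
x <- rweibull(40, shape = 2, scale = 1000)   # hypothetical complete failure times
fit1 <- fitdistr(x, "weibull")        # unrestricted Weibull fit, log likelihood ln L1
fit0 <- fitdistr(x, "exponential")    # restricted fit (shape fixed at 1), log likelihood ln L0
chisq <- -2 * (fit0$loglik - fit1$loglik)   # -2 ln(lambda), here k = 1 parameter lost
chisq
qchisq(0.95, df = 1)    # reject the exponential restriction if chisq exceeds this percentile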
8.2.3.4. Trend tests
Formal
Trend Tests
should
accompany
Trend Plots
and Duane
Plots. Three
are given in
this section
In this section we look at formal statistical tests that can
allow us to quantitatively determine whether or not the
repair times of a system show a significant trend (which
may be an improvement or a degradation trend). The
section on trend and growth plotting contained a discussion
of visual tests for trends - this section complements those
visual tests as several numerical tests are presented.
Three statistical test procedures will be described:
1. The Reverse Arrangement Test (a simple and useful
test that has the advantage of making no assumptions
about a model for the possible trend)
2. The Military Handbook Test (optimal for
distinguishing between "no trend" and a trend
following the NHPP Power Law or Duane model)
3. The Laplace Test (optimal for distinguishing between
"no trend' and a trend following the NHPP
Exponential Law model)
The Reverse
Arrangement
Test (RAT
test) is simple
and makes no
assumptions
about what
model a trend
might follow
The Reverse Arrangement Test
Assume there are r repairs during the observation period
and they occurred at system ages T1, T2, T3, ..., Tr (we set the
start of the observation period to T = 0). Let I1 = T1,
I2 = T2 - T1, I3 = T3 - T2, ..., Ir = Tr - Tr-1 be the inter-
arrival times for repairs (i.e., the sequence of waiting times
between failures). Assume the observation period ends at
time Tend > Tr.
Previously, we plotted this sequence of inter-arrival times
to look for evidence of trends. Now, we calculate how
many instances we have of a later inter-arrival time being
strictly greater than an earlier inter-arrival time. Each time
that happens, we call it a reversal. If there are a lot of
reversals (more than are likely from pure chance with no
trend), we have significant evidence of an improvement
trend. If there are too few reversals we have significant
evidence of degradation.
A formal definition of the reversal count and some
properties of this count are:
count a reversal every time Ij < Ik for some j and k
with j < k
this reversal count is the total number of reversals R
for r repair times, the maximum possible number of
reversals is r(r-1)/2
if there are no trends, on the average one would
expect to have r(r-1)/4 reversals.
As a simple example, assume we have 5 repair times at
system ages 22, 58, 71, 156 and 225, and the observation
period ended at system age 300. First calculate the inter-
arrival times and obtain: 22, 36, 13, 85, 69. Next, count
reversals by "putting your finger" on the first inter-arrival
time, 22, and counting how many later inter-arrival times
are greater than that. In this case, there are 3. Continue by
"moving your finger" to the second time, 36, and counting
how many later times are greater. There are exactly 2.
Repeating this for the third and fourth inter-arrival times
(with many repairs, your finger gets very tired!) we obtain 2
and 0 reversals, respectively. Adding 3 + 2 + 2 + 0 = 7, we
see that R = 7. The total possible number of reversals is
5x4/2 = 10 and an "average" number is half this, or 5.
In the example, we saw 7 reversals (2 more than average).
Is this strong evidence for an improvement trend? The
following table allows us to answer that at a 90% or 95%
or 99% confidence level - the higher the confidence, the
stronger the evidence of improvement (or the less likely that
pure chance alone produced the result).
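The reversal count for this small example can be checked with a few lines of R:

inter <- c(22, 36, 13, 85, 69)    # inter-arrival times between repairs
r     <- length(inter)
R     <- sum(combn(r, 2, function(jk) inter[jk[1]] < inter[jk[2]]))   # later time exceeds earlier time
R                   # 7 reversals
r * (r - 1) / 2     # maximum possible number of reversals (10)
r * (r - 1) / 4     # expected number of reversals when there is no trend (5)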
A useful table
to check
whether a
reliability
test has
demonstrated
significant
improvement
Value of R Indicating Significant Improvement (One-Sided
Test)
Number of Repairs   Minimum R for 90% Evidence of Improvement   Minimum R for 95% Evidence of Improvement   Minimum R for 99% Evidence of Improvement
4 6 6 -
5 9 9 10
6 12 13 14
7 16 17 19
8 20 22 24
9 25 27 30
10 31 33 36
11 37 39 43
12 43 46 50
One-sided test means before looking at the data we
expected improvement trends, or, at worst, a constant repair
rate. This would be the case if we know of actions taken to
improve reliability (such as occur during reliability
improvement tests).
For the r = 5 repair times example above where we had R =
7, the table shows we do not (yet) have enough evidence to
demonstrate a significant improvement trend. That does not
mean that an improvement model is incorrect - it just means
it is not yet "proved" statistically. With small numbers of
repairs, it is not easy to obtain significant results.
For numbers of repairs beyond 12, there is a good
approximation formula that can be used to determine
whether R is large enough to be significant. Calculate
Use this
formula when
there are
more than 12
repairs in the
data set
z = [R - r(r-1)/4] / sqrt[(2r^3 + 3r^2 - 5r)/72]
and if z > 1.282, we have at least 90% significance. If z >
1.645, we have 95% significance, and z > 2.33 indicates
99% significance, since z has an approximate standard
normal distribution.
That covers the (one-sided) test for significant
improvement trends. If, on the other hand, we believe there
may be a degradation trend (the system is wearing out or
being over stressed, for example) and we want to know if
the data confirms this, then we expect a low value for R and
we need a table to determine when the value is low enough
to be significant. The table below gives these critical values
for R.
Value of R Indicating Significant Degradation Trend (One-Sided Test)
Number of Repairs   Maximum R for 90% Evidence of Degradation   Maximum R for 95% Evidence of Degradation   Maximum R for 99% Evidence of Degradation
4 0 0 -
5 1 1 0
6 3 2 1
7 5 4 2
8 8 6 4
9 11 9 6
10 14 12 9
11 18 16 12
12 23 20 16
For numbers of repairs r >12, use the approximation
formula above, with R replaced by [r(r-1)/2 - R].
Because of
the success of
the Duane
model with
industrial
improvement
test data, this
Trend Test is
recommended
The Military Handbook Test
This test is better at finding significance when the choice is
between no trend and a NHPP Power Law (Duane) model.
In other words, if the data come from a system following
the Power Law, this test will generally do better than any
other test in terms of finding significance.
As before, we have r times of repair T1, T2, T3, ..., Tr with
the observation period ending at time Tend > Tr. Calculate
χ² = 2 Σ ln(Tend/Ti), with the sum running over i = 1 to r,
and compare this to percentiles of the chi-square
distribution with 2r degrees of freedom. For a one-sided
improvement test, reject no trend (or HPP) in favor of an
improvement trend if the chi square value is beyond the 90
(or 95, or 99) percentile. For a one-sided degradation test,
reject no trend if the chi-square value is less than the 10 (or
5, or 1) percentile.
Applying this test to the 5 repair times example, the test
statistic has value 13.28 with 10 degrees of freedom, and
the chi-square percentile is 79%.
The Laplace Test
This test is better at finding significance when the choice is
between no trend and a NHPP Exponential model. In other
words, if the data come from a system following the
Exponential Law, this test will generally do better than any
other test in terms of finding significance.
As before, we have r times of repair T1, T2, T3, ..., Tr with
the observation period ending at time Tend > Tr. Calculate
z = [r × Tend/2 - (T1 + T2 + ... + Tr)] / [Tend × sqrt(r/12)]
and compare this to high (for improvement) or low (for
degradation) percentiles of the standard normal distribution.
Formal tests Case Study 1: Reliability Test Improvement Data
generally
confirm the
subjective
information
conveyed by
trend plots
(Continued from earlier work)
The failure data and Trend plots and Duane plot were
shown earlier. The observed failure times were: 5, 40, 43,
175, 389, 712, 747, 795, 1299 and 1478 hours, with the test
ending at 1500 hours.
Reverse Arrangement Test: The inter-arrival times are: 5,
35, 3, 132, 214, 323, 35, 48, 504 and 179. The number of
reversals is 33, which, according to the table above, is just
significant at the 95% level.
The Military Handbook Test: The Chi-Square test
statistic, using the formula given above, is 37.23 with 20
degrees of freedom and has significance level 98.9%. Since
the Duane Plot looked very reasonable, this test probably
gives the most precise significance assessment of how
unlikely it is that sheer chance produced such an apparent
improvement trend (only about 1.1% probability).
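These case study results are easy to reproduce in R. The sketch below computes the Military Handbook statistic exactly as described above; the last two lines also evaluate a Laplace-type statistic in the time-truncated form sketched above (large positive values pointing to improvement), which should be treated as illustrative:

fail <- c(5, 40, 43, 175, 389, 712, 747, 795, 1299, 1478)   # failure times (hours)
Tend <- 1500                                                # end of the observation period
r    <- length(fail)
chisq <- 2 * sum(log(Tend / fail))   # Military Handbook statistic, chi-square with 2r df
c(chisq = chisq, df = 2 * r, significance = pchisq(chisq, df = 2 * r))   # 37.23, 20, about 0.989
z <- (r * Tend / 2 - sum(fail)) / (Tend * sqrt(r / 12))     # Laplace-type statistic
pnorm(z)   # compare to high percentiles of the standard normal for improvement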
8.2.4. How do you choose an appropriate
physical acceleration model?
Choosing a
good
acceleration
model is
part science
and part art
- but start
with a good
literature
search
Choosing a physical acceleration model is a lot like choosing
a life distribution model. First identify the failure mode and
what stresses are relevant (i.e., will accelerate the failure
mechanism). Then check to see if the literature contains
examples of successful applications of a particular model for
this mechanism.
If the literature offers little help, try the models described in
earlier sections:
Arrhenius
The (inverse) power rule for voltage
The exponential voltage model
Two temperature/voltage models
The electromigration model
Three stress models (temperature, voltage and
humidity)
Eyring (for more than three stresses or when the above
models are not satisfactory)
The Coffin-Manson mechanical crack growth model
All but the last model (the Coffin-Manson) apply to chemical
or electronic failure mechanisms, and since temperature is
almost always a relevant stress for these mechanisms, the
Arrhenius model is nearly always a part of any more general
model. The Coffin-Manson model works well for many
mechanical fatigue-related mechanisms.
Sometimes models have to be adjusted to include a
threshold level for some stresses. In other words, failure
might never occur due to a particular mechanism unless a
particular stress (temperature, for example) is beyond a
threshold value. A model for a temperature-dependent
mechanism with a threshold at T = T0 might look like
time to fail = f(T)/(T - T0)
for which f(T) could be Arrhenius. As the temperature
decreases towards T0, time to fail increases toward infinity
in this (deterministic) acceleration model.
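A minimal numerical sketch of such a threshold model in R, with a hypothetical Arrhenius f(T) and purely illustrative constants, shows the time to fail growing without bound as T approaches T0:

k  <- 8.617e-5    # Boltzmann's constant in eV/K
A  <- 1e-4        # hypothetical prefactor (hours)
dH <- 0.8         # hypothetical activation energy (eV)
T0 <- 60          # hypothetical threshold temperature (degrees C)
f   <- function(TC) A * exp(dH / (k * (TC + 273.16)))   # Arrhenius time to fail at temperature TC
ttf <- function(TC) f(TC) / (TC - T0)                   # threshold model: f(T)/(T - T0)
sapply(c(125, 100, 85, 65, 61), ttf)   # time to fail increases rapidly as TC approaches T0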
Models
derived
theoretically
have been
very
successful
and are
convincing
In some cases, a mathematical/physical description of the
failure mechanism can lead to an acceleration model. Some
of the models above were originally derived that way.
Simple
models are
often the
best
In general, use the simplest model (fewest parameters) you
can. When you have chosen a model, use visual tests and
formal statistical fit tests to confirm the model is consistent
with your data. Continue to use the model as long as it gives
results that "work," but be quick to look for a new model
when it is clear the old one is no longer adequate.
There are some good quotes that apply here:
Quotes from
experts on
models
"All models are wrong, but some are useful." - George Box,
and the principle of Occam's Razor (attributed to the 14th
century logician William of Occam who said Entities should
not be multiplied unnecessarily - or something equivalent to
that in Latin).
A modern version of Occam's Razor is: If you have two
theories that both explain the observed facts then you should
use the simplest one until more evidence comes along - also
called the Law of Parsimony.
Finally, for those who feel the above quotes place too much
emphasis on simplicity, there are several appropriate quotes
from Albert Einstein:
"Make your theory as simple as possible, but no
simpler"
"For every complex question there is a simple
and wrong solution."
8.2.5. What models and assumptions are
typically made when Bayesian methods
are used for reliability evaluation?
The basics of Bayesian methodology were explained earlier,
along with some of the advantages and disadvantages of
using this approach. Here we only consider the models and
assumptions that are commonplace when applying Bayesian
methodology to evaluate system reliability.
Bayesian
assumptions
for the
gamma
exponential
system
model
Assumptions:
1. Failure times for the system under investigation can be
adequately modeled by the exponential distribution. For
repairable systems, this means the HPP model applies and the
system is operating in the flat portion of the bathtub curve.
While Bayesian methodology can also be applied to non-
repairable component populations, we will restrict ourselves
to the system application in this Handbook.
2. The MTBF for the system can be regarded as chosen from
a prior distribution model that is an analytic representation of
our previous information or judgments about the system's
reliability. The form of this prior model is the gamma
distribution (the conjugate prior for the exponential model).
The prior model is actually defined for λ = 1/MTBF since it
is easier to do the calculations this way.
3. Our prior knowledge is used to choose the gamma
parameters a and b for the prior distribution model for λ.
There are many possible ways to convert "knowledge" to
gamma parameters, depending on the form of the
"knowledge" - we will describe three approaches.
Several
ways to
choose the
prior
gamma
parameter
values
i) If you have actual data from previous testing done
on the system (or a system believed to have the same
reliability as the one under investigation), this is the
most credible prior knowledge, and the easiest to use.
Simply set the gamma parameter a equal to the total
number of failures from all the previous data, and set
the parameter b equal to the total of all the previous
test hours.
ii) A consensus method for determining a and b that
works well is the following: Assemble a group of
engineers who know the system and its sub-
components well from a reliability viewpoint.
Have the group reach agreement on a reasonable
MTBF they expect the system to have. They
could each pick a number they would be willing
to bet even money that the system would either
meet or miss, and the average or median of these
numbers would be their 50% best guess for the
MTBF. Or they could just discuss even-money
MTBF candidates until a consensus is reached.
Repeat the process again, this time reaching
agreement on a low MTBF they expect the
system to exceed. A "5%" value that they are
"95% confident" the system will exceed (i.e.,
they would give 19 to 1 odds) is a good choice.
Or a "10%" value might be chosen (i.e., they
would give 9 to 1 odds the actual MTBF exceeds
the low MTBF). Use whichever percentile choice
the group prefers.
Call the reasonable MTBF MTBF50 and the low
MTBF you are 95% confident the system will
exceed MTBF05. These two numbers uniquely
determine gamma parameters a and b that have
percentile values at the right locations.
We call this method of specifying gamma prior
parameters the 50/95 method (or the 50/90
method if we use MTBF10, etc.). A simple way
to calculate a and b for this method is described
below.
iii) A third way of choosing prior parameters starts the
same way as the second method. Consensus is reached
on a reasonable MTBF, MTBF50. Next, however, the
group decides they want a somewhat weak prior that
will change rapidly, based on new test information. If
the prior parameter "a" is set to 1, the gamma has a
standard deviation equal to its mean, which makes it
spread out, or "weak". To ensure the 50th percentile is
set at λ50 = 1/MTBF50, we have to choose
b = (ln 2) × MTBF50, which is approximately 0.6931 × MTBF50.
Note: As we will see when we plan Bayesian tests, this
weak prior is actually a very friendly prior in terms of
saving test time.
Many variations are possible, based on the above three
methods. For example, you might have prior data from
sources that you don't completely trust. Or you might
question whether the data really apply to the system under
investigation. You might decide to "weight" the prior data by
.5, to "weaken" it. This can be implemented by setting a = .5
x the number of fails in the prior data and b = .5 times the
number of test hours. That spreads out the prior distribution
more, and lets it react quicker to new test data.
Consequences
After a new
test is run,
the
posterior
gamma
parameters
are easily
obtained
from the
prior
parameters
by adding
the new
number of
fails to "a"
and the new
test time to
"b"
No matter how you arrive at values for the gamma prior
parameters a and b, the method for incorporating new test
information is the same. The new information is combined
with the prior model to produce an updated or posterior
distribution model for .
Under assumptions 1 and 2, when a new test is run with T
system operating hours and r failures, the posterior
distribution for is still a gamma, with new parameters:
a' = a + r, b' = b + T
In other words, add to a the number of new failures and add
to b the number of new test hours to obtain the new
parameters for the posterior distribution.
Use of the posterior distribution to estimate the system
MTBF (with confidence, or prediction, intervals) is described
in the section on estimating reliability using the Bayesian
gamma model.
Obtaining Gamma Parameters
An example
using the
"50/95"
consensus
method
A group of engineers, discussing the reliability of a new
piece of equipment, decide to use the 50/95 method to
convert their knowledge into a Bayesian gamma prior.
Consensus is reached on a likely MTBF50 value of 600 hours
and a low MTBF05 value of 250 hours. The ratio RT is 600/250 = 2.4.
(Note: if the group felt that 250 was a MTBF10 value, instead of a
MTBF05 value, then the only change needed would be to
replace 0.95 in the B1 equation by 0.90. This would be the
"50/90" method.)
Using software to find the root of a univariate function, the
gamma prior parameters were found to be a = 2.863 and b =
1522.46. The parameters will have (approximately) a
probability of 50% of λ being below 1/600 = 0.001667 and a
probability of 95% of λ being below 1/250 = 0.004. (These
probabilities are the values of the gamma cumulative
distribution function, with shape parameter a = 2.863 and
scale parameter b = 1522.46, evaluated at 0.001667 and 0.004.)
The gamma parameter estimates in this example can be
produced using R code.
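A minimal sketch of that root-finding step, assuming the 50/95 construction described above (the shape a is chosen so that the ratio of the gamma 95th to 50th percentiles equals MTBF50/MTBF05, and b then places the 50th percentile of λ at 1/MTBF50), is:

MTBF50 <- 600
MTBF05 <- 250
RT <- MTBF50 / MTBF05        # 2.4
g <- function(a) qgamma(0.95, shape = a) / qgamma(0.50, shape = a) - RT
a <- uniroot(g, c(0.5, 20))$root          # about 2.863
b <- qgamma(0.50, shape = a) * MTBF50     # about 1522.46
c(a = a, b = b)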
This example will be continued in Section 3, in which the
Bayesian test time needed to confirm a 500 hour MTBF at
80% confidence will be derived.
8.3. Reliability Data Collection
In order to assess or improve reliability, it is usually necessary
to have failure data. Failure data can be obtained from field
studies of system performance or from planned reliability
tests, sometimes called Life Tests. This section focuses on how
to plan reliability tests. The aim is to answer questions such
as: how long should you test, what sample size do you need
and what test conditions or stresses need to be run?
Detailed
contents of
Section 8.3
The section detailed outline follows.
3. Reliability Data Collection
1. How do you plan a reliability assessment test?
1. Exponential life distribution (or HPP model) tests
2. Lognormal or Weibull tests
3. Reliability growth tests (Duane model)
4. Accelerated life tests
5. Bayesian gamma prior model tests
8.3.1. How do you plan a reliability assessment
test?
The Plan
for a
reliability
test ends
with a
detailed
description
of the
mechanics
of the test
and starts
with stating
your
assumptions
and what
you want to
discover or
prove
Planning a reliability test means:
How long should you test?
How many units have to be put on test?
For repairable systems, this is often limited to 1.
If acceleration modeling is part of the experimental
plan
What combination of stresses and how many
experimental cells?
How many units go in each cell?
The answers to these questions depend on:
What models are you assuming?
What decisions or conclusions do you want to make
after running the test and analyzing the data?
What risks are you willing to take of making wrong
decisions or conclusions?
It is not always possible, or practical, to completely answer
all of these questions for every model we might want to use.
This section looks at answers, or guidelines, for the following
models:
exponential or HPP Model
Weibull or lognormal model
Duane or NHPP Power Law model
acceleration models
Bayesian gamma prior model
8.3.1.1. Exponential life distribution (or HPP
model) tests
Using an
exponential
(or HPP)
model to
test whether
a system
meets its
MTBF
requirement
is common
in industry
Exponential tests are common in industry for verifying that
tools, systems or equipment are meeting their reliability
requirements for Mean Time Between Failure (MTBF). The
assumption is that the system has a constant failure (or repair)
rate, which is the reciprocal of the MTBF. The waiting time
between failures follows the exponential distribution model.
A typical test situation might be: a new complex piece of
equipment or tool is installed in a factory and monitored
closely for a period of several weeks to several months. If it
has no more than a pre-specified number of failures during
that period, the equipment "passes" its reliability acceptance
test.
This kind of reliability test is often called a Qualification
Test or a Product Reliability Acceptance Test (PRAT).
Contractual penalties may be invoked if the equipment fails
the test. Everything is pegged to meeting a customer MTBF
requirement at a specified confidence level.
How Long Must You Test A Piece of Equipment or a
System In order to Assure a Specified MTBF at a Given
Confidence?
You start with a given MTBF objective, say M, and a
confidence level, say 100 × (1 - α). You need one more piece
of information to determine the test length: how many fails
do you want to allow and still "pass" the equipment? The
more fails allowed, the longer the test required. However, a
longer test allowing more failures has the desirable feature of
making it less likely a good piece of equipment will be
rejected because of random "bad luck" during the test period.
The recommended procedure is to iterate on r = the number
of allowable fails until a larger r would require an
unacceptable test length. For any choice of r, the
corresponding test length is quickly calculated by multiplying
M (the objective) by the factor in the table below
corresponding to the r-th row and the desired confidence
level column.
For example, to confirm a 200-hour MTBF objective at 90%
confidence, allowing up to 4 failures on the test, the test
length must be 200 × 7.99 = 1598 hours. If this is
unacceptably long, try allowing only 3 fails for a test length
of 200 × 6.68 = 1336 hours. The shortest test would allow no
fails and last 200 × 2.3 = 460 hours. All these tests guarantee
a 200-hour MTBF at 90% confidence, when the equipment
passes. However, the shorter tests are much less "fair" to the
supplier in that they have a large chance of failing a
marginally acceptable piece of equipment.
Use the
Test length
Table to
determine
how long to
test
Test Length Guide Table
NUMBER OF FAILURES ALLOWED (r) AND FACTOR FOR GIVEN CONFIDENCE LEVELS
r   50%   60%   75%   80%   90%   95%
0 .693 .916 1.39 1.61 2.30 3.00
1 1.68 2.02 2.69 2.99 3.89 4.74
2 2.67 3.11 3.92 4.28 5.32 6.30
3 3.67 4.18 5.11 5.52 6.68 7.75
4 4.67 5.24 6.27 6.72 7.99 9.15
5 5.67 6.29 7.42 7.90 9.28 10.51
6 6.67 7.35 8.56 9.07 10.53 11.84
7 7.67 8.38 9.68 10.23 11.77 13.15
8 8.67 9.43 10.80 11.38 13.00 14.43
9 9.67 10.48 11.91 12.52 14.21 15.70
10 10.67 11.52 13.02 13.65 15.40 16.96
15 15.67 16.69 18.48 19.23 21.29 23.10
20 20.68 21.84 23.88 24.73 27.05 29.06
The formula to calculate the factors in the table is the
following: the factor in row r and confidence-level column 100 × (1 - α)
is χ²(1 - α; 2(r + 1)) / 2, where χ²(1 - α; d) denotes the 100 × (1 - α)
percentile of the chi-square distribution with d degrees of freedom. The
required test length is this factor multiplied by the MTBF objective M.
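In R, any entry of the table can be reproduced with one line; for example, the factor for r = 4 allowed failures at 90% confidence:

r <- 4; conf <- 0.90
qchisq(conf, df = 2 * (r + 1)) / 2   # 7.99; multiply by the MTBF objective M to get the test length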
Example: A new factory tool must meet a 400-hour MTBF
requirement at 80% confidence. You have up to two months
of 3-shift operation to decide whether the tool is acceptable.
What is a good test plan?
Two months of around-the-clock operation, with some time
off for maintenance and repairs, amounts to a maximum of
about 1300 hours. The 80% confidence factor for r = 1 is
2.99, so a test of 400 × 2.99 = about 1200 hours (with up to 1
fail allowed) is the best that can be done.
Shorten
required
test times
by testing
more than
one system
NOTE: Exponential test times can be shortened significantly
if several similar tools or systems can be put on test at the
same time. Test time means the same as "tool hours" and one
tool operating for 1000 hours is equivalent (as far as the
exponential model is concerned) to 2 tools operating for 500
hours each, or 10 tools operating for 100 hours each. Just
count all the fails from all the tools and add up the test
hours from all the tools.
8.3.1.2. Lognormal or Weibull tests
Planning
reliability
tests for
distributions
other than
the
exponential
is difficult
and involves
a lot of
guesswork
Planning a reliability test is not simple and straightforward
when the assumed model is lognormal or Weibull. Since
these models have two parameters, no estimates are possible
without at least two test failures, and good estimates require
considerably more than that. Because of censoring, without a
good guess ahead of time at what the unknown parameters
are, any test plan may fail.
However, it is often possible to make a good guess ahead of
time about at least one of the unknown parameters -
typically the "shape" parameter ( for the lognormal or
for the Weibull). With one parameter assumed known, test
plans can be derived that assure the reliability or failure rate
of the product tested will be acceptable.
Lognormal Case (shape parameter known): The
lognormal model is used for many microelectronic wear-out
failure mechanisms, such as electromigration. As a
production monitor, samples of microelectronic chips taken
randomly from production lots might be tested at levels of
voltage and temperature that are high enough to significantly
accelerate the occurrence of electromigration failures.
Acceleration factors are known from previous testing and
range from several hundred to several thousand.
Lognormal
test plans,
assuming
sigma and
the
acceleration
factor are
known
The goal is to construct a test plan (put n units on stress test
for T hours and accept the lot if no more than r failures
occur). The following assumptions are made:
The life distribution model is lognormal
Sigma = σ is known from past testing and does not
vary appreciably from lot to lot
Lot reliability varies because T50's (the lognormal
median or 50th percentile) differ from lot to lot
The acceleration factor from high stress to use stress is
a known quantity "A"
A stress time of T hours is practical as a line monitor
A nominal use T50 of Tu (combined with σ) produces
an acceptable use CDF (or use reliability function).
This is equivalent to specifying an acceptable use
CDF at, say, 100,000 hours to be a given value p0 and
calculating Tu via
Tu = 100,000 × exp(-σ × Φ^-1(p0))
where Φ^-1 is the inverse of the standard normal
distribution
An unacceptable use CDF of p1 leads to a "bad" use
T50 of Tb, using the same equation as above with p0
replaced by p1
The acceleration factor A is used to calculate a "good" or
acceptable proportion of failures pa at stress and a "bad" or
unacceptable proportion of failures pb:
pa = Φ(ln(A×T/Tu)/σ),  pb = Φ(ln(A×T/Tb)/σ)
where Φ is the standard normal CDF. This reduces the
reliability problem to a well-known Lot Acceptance
Sampling Plan (LASP) problem, which was covered in
Chapter 6.
If the sample size required to distinguish between pa and pb
turns out to be too large, it may be necessary to increase T
or test at a higher stress. The important point is that the
above assumptions and equations give a methodology for
planning ongoing reliability tests under a lognormal model
assumption.
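One way to carry out these calculations in R, with every number below purely illustrative (the target use CDFs, shape, acceleration factor, and stress time are assumptions, not Handbook values), is:

p0    <- 0.001   # acceptable use CDF at 100,000 hours (illustrative)
p1    <- 0.01    # unacceptable use CDF at 100,000 hours (illustrative)
sigma <- 0.8     # known lognormal shape (illustrative)
A     <- 2000    # known acceleration factor (illustrative)
Tstr  <- 168     # stress test time in hours (illustrative)
T50.u <- 100000 * exp(-sigma * qnorm(p0))   # "good" use T50 implied by p0
T50.b <- 100000 * exp(-sigma * qnorm(p1))   # "bad" use T50 implied by p1
p.a <- pnorm(log(A * Tstr / T50.u) / sigma) # acceptable proportion failing at stress by Tstr
p.b <- pnorm(log(A * Tstr / T50.b) / sigma) # unacceptable proportion failing at stress by Tstr
c(p.a = p.a, p.b = p.b)   # feed these into a LASP sample-size calculation (Chapter 6)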
Weibull test
plans,
assuming
gamma and
the
acceleration
factor are
known
Weibull Case (shape parameter known): The assumptions
and calculations are similar to those made for the
lognormal:
The life distribution model is Weibull
Gamma = γ is known from past testing and does not
vary appreciably from lot to lot
Lot reliability varies because α's (the Weibull
characteristic life or 63.2 percentile) differ from lot to
lot
The acceleration factor from high stress to use stress is
a known quantity "A"
A stress time of T hours is practical as a line monitor
A nominal use α of αu (combined with γ) produces
an acceptable use CDF (or use reliability function).
This is equivalent to specifying an acceptable use
CDF at, say, 100,000 hours to be a given value p0 and
calculating αu via
αu = 100,000 / [-ln(1 - p0)]^(1/γ)
An unacceptable use CDF of p1 leads to a "bad" use
α of αb, using the same equation as above with p0
replaced by p1
The acceleration factor A is used next to calculate a "good"
or acceptable proportion of failures pa at stress and a "bad"
or unacceptable proportion of failures pb:
pa = 1 - exp[-(A×T/αu)^γ],  pb = 1 - exp[-(A×T/αb)^γ]
This reduces the reliability problem to a Lot Acceptance
Sampling Plan (LASP) problem, which was covered in
Chapter 6.
If the sample size required to distinguish between pa and pb
turns out to be too large, it may be necessary to increase T
or test at a higher stress. The important point is that the
above assumptions and equations give a methodology for
planning ongoing reliability tests under a Weibull model
assumption.
Planning Tests to Estimate Both Weibull or Both
Lognormal Parameters
Rules-of-
thumb for
general
lognormal
or Weibull
life test
planning
All that can be said here are some general rules-of-thumb:
1. If you can observe at least 10 exact times of failure,
estimates are usually reasonable - below 10 failures
the critical shape parameter may be hard to estimate
accurately. Below 5 failures, estimates are often very
inaccurate.
2. With readout data, even with more than 10 total
failures, you need failures in three or more readout
intervals for accurate estimates.
3. When guessing how many units to put on test and for
how long, try various reasonable combinations of
distribution parameters to see if the corresponding
calculated proportion of failures expected during the
test, multiplied by the sample size, gives a reasonable
number of failures.
4. As an alternative to the last rule, simulate test data
from reasonable combinations of distribution
parameters and see if your estimates from the
simulated data are close to the parameters used in the
simulation. If a test plan doesn't work well with
simulated data, it is not likely to work well with real
data.
8.3.1.3. Reliability growth (Duane model)
Guidelines
for
planning
how long to
run a
reliability
growth test
A reliability improvement test usually takes a large resource
commitment, so it is important to have a way of estimating
how long a test will be required. The following procedure
gives a starting point for determining a test time:
1. Guess a starting value for α, the growth slope. Some
guidelines were previously discussed. Pick something
close to 0.3 for a conservative estimate (perhaps a new
cross-functional team will be working on the
improvement test or the system to be improved has
many new parts with possibly unknown failure
mechanisms), or close to 0.5 for a more optimistic
estimate.
2. Use current data and engineering estimates to arrive at
a consensus for what the starting MTBF for the system
is. Call this M1.
3. Let MT be the target MTBF (the customer
requirement). Then the improvement needed on the test
is given by
IM = MT/M1
4. A first pass estimate of the test time needed is
T = (IM)^(1/α)
This estimate comes from using the starting MTBF of M1 as
the MTBF after 1 hour on test and using the fact that the
improvement from 1 hour to T hours is just T^α.
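For example, a minimal R sketch of this first-pass relationship, with purely illustrative values for the slope guess, starting MTBF, and target MTBF, is:

alpha <- 0.3     # conservative guess at the growth slope (illustrative)
M1    <- 50      # consensus starting MTBF in hours (illustrative)
MT    <- 500     # target MTBF in hours (illustrative)
IM <- MT / M1            # improvement needed
T  <- IM^(1 / alpha)     # first-pass estimate of required test hours (about 2154)
T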
Make sure
test time
makes
engineering
sense
The reason the above is just a first pass estimate is it will give
unrealistic (too short) test times when a high α is assumed.
A very short reliability improvement test makes little sense
because a minimal number of failures must be observed
before the improvement team can determine design and parts
changes that will "grow" reliability. And it takes time to
implement these changes and observe an improved repair
rate.
Iterative Simulation methods can also be used to see if a planned test
simulation
is an aid
for test
planning
is likely to generate data that will demonstrate an assumed
growth rate.
8.3.1.4. Accelerated life tests
Accelerated
testing is
needed when
testing even
large sample
sizes at use
stress would
yield few or
no failures
within a
reasonable
time
Accelerated life tests are component life tests with
components operated at high stresses and failure data
observed. While high stress testing can be performed for the
sole purpose of seeing where and how failures occur and
using that information to improve component designs or
make better component selections, we will focus in this
section on accelerated life testing for the following two
purposes:
1. To study how failure is accelerated by stress and fit an
acceleration model to data from multiple stress cells
2. To obtain enough failure data at high stress to
accurately project (extrapolate) what the CDF at use
will be.
If we already know the acceleration model (or the
acceleration factor to typical use conditions from high stress
test conditions), then the methods described two pages ago
can be used. We assume, therefore, that the acceleration
model is not known in advance.
Test
planning
means
picking
stress levels
and sample
sizes and
test times to
produce
enough data
to fit models
and make
projections
Test planning and operation for a (multiple) stress cell life
test experiment consists of the following:
Pick several combinations of the relevant stresses (the
stresses that accelerate the failure mechanism under
investigation). Each combination is a "stress cell".
Note that you are planning for only one mechanism of
failure at a time. Failures on test due to any other
mechanism will be considered censored run times.
Make sure stress levels used are not too high - to the
point where new failure mechanisms that would never
occur at use stress are introduced. Picking a maximum
allowable stress level requires experience and/or good
engineering judgment.
Put random samples of components in each stress cell
and run the components in each cell for fixed (but
possibly different) lengths of time.
Gather the failure data from each cell and use the data
to fit an acceleration model and a life distribution
model and use these models to project reliability at
use stress conditions.
Test planning would be similar to topics already covered in
the chapters that discussed modeling and experimental
design except for one important point. When you test
components in a stress cell for a fixed length test, it is
typical that some (or possibly many) of the components end
the test without failing. This is the censoring problem, and it
greatly complicates experimental design to the point at
which it becomes almost as much of an art (based on
engineering judgment) as a statistical science.
An example will help illustrate the design issues. Assume a
metal migration failure mode is believed to follow the 2-
stress temperature voltage model given by
Normal use conditions are 4 volts and 25 degrees Celsius,
and the high stress levels under consideration are 6, 8, and 12
volts and 85°C, 105°C and 125°C. It probably would be a waste
of resources to test at (6v, 85°C), or even possibly (8v, 85°C)
or (6v, 105°C), since these cells are not likely to have enough
stress acceleration to yield a reasonable number of failures
within typical test times.
If you write all the 9 possible stress cell combinations in a
3x3 matrix with voltage increasing by rows and temperature
increasing by columns, the result would look like the matrix
below:
Matrix Leading to "Backward L Design"
 6v, 85°C    6v, 105°C    6v, 125°C
 8v, 85°C    8v, 105°C    8v, 125°C
12v, 85°C   12v, 105°C   12v, 125°C
"Backwards
L" designs
are common
in
accelerated
life testing.
Put more
experimental
units in
lower stress
cells.
The combinations in bold are the most likely design choices
covering the full range of both stresses, but still hopefully
having enough acceleration to produce failures. This is the
so-called "backwards L" design commonly used for
acceleration modeling experiments.
Note: It is good design practice to put more of your test
units in the lower stress cells, to make up for the fact that
these cells will have a smaller proportion of units failing.
Sometimes
simulation is
the best way
to learn
whether a
test plan has
a chance of
working
Design by Simulation:
A lengthy, but better way to choose a test matrix is the
following:
Pick an acceleration model and a life distribution
model (as usual).
Guess at the shape parameter value of the life
distribution model based on literature studies or earlier
experiments. The shape parameter should remain the
same for all stress cells. Choose a scale parameter
value at use so that the use stress CDF exactly meets
requirements (i.e., for the lognormal, pick a use T
50
that gives the desired use reliability - for a Weibull
model choice, do the same for the characteristic life
parameter).
Guess at the acceleration model parameter values (ΔH
and β, for the 2-stress model shown above). Again,
use whatever is in the literature for similar failure
mechanisms or data from earlier experiments.
Calculate acceleration factors from any proposed test
cells to use stress and divide the use scale parameter
by these acceleration factors to obtain "trial" cell scale
parameters.
Simulate cell data for each proposed stress cell using
the derived cell scale parameters and the guessed
shape parameter.
Check that every proposed cell has sufficient failures
to give good estimates.
Adjust the choice of stress cells and the sample size
allocations until you are satisfied that, if everything
goes as expected, the experiment will yield enough
data to provide good estimates of the model
parameters.
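The simulation loop just described can be sketched in a few lines of R; every number here (shape guess, use-condition scale, cell acceleration factors, sample sizes, and test length) is an assumed illustration rather than a recommended design:

set.seed(4)
sigma   <- 0.9                       # guessed lognormal shape, assumed common to all cells
T50.use <- 2e6                       # use-condition T50 chosen to meet requirements (illustrative)
AF      <- c(200, 600, 1500, 4000)   # assumed acceleration factors for the proposed stress cells
n       <- c(100, 75, 50, 25)        # more units allocated to the lower stress cells
Ttest   <- 1000                      # fixed cell test length in hours (illustrative)
T50.cell <- T50.use / AF             # "trial" cell scale parameters
fails <- sapply(seq_along(AF), function(i) {
  t <- rlnorm(n[i], meanlog = log(T50.cell[i]), sdlog = sigma)   # simulated cell data
  sum(t <= Ttest)                    # failures observed before censoring at Ttest
})
fails   # check that every proposed cell yields enough failures for good estimates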
After you
make
advance
estimates, it
is sometimes
possible to
construct an
optimal
experimental
design - but
software for
this is
scarce
Optimal Designs:
Recent work on designing accelerated life tests has shown it
is possible, for a given choice of models and assumed values
of the unknown parameters, to construct an optimal design
(one which will have the best chance of providing good
sample estimates of the model parameters). These optimal
designs typically select stress levels as far apart as possible
and heavily weight the allocation of sample units to the
lower stress cells. However, unless the experimenter can find
software that incorporates these optimal methods for his or
her particular choice of models, the methods described
above are the most practical way of designing acceleration
experiments.
8.3.1.5. Bayesian gamma prior model
How to
plan a
Bayesian
test to
confirm a
system
meets its
MTBF
objective
Review Bayesian Basics and assumptions, if needed. We start
at the point when gamma prior parameters a and b have
already been determined. Assume we have a given MTBF
objective, M, and a desired confidence level of 100 × (1 - α).
We want to confirm the system will have an MTBF of at least
M at the 100 × (1 - α) confidence level. As in the section on
classical (HPP) test plans, we pick a number of failures, r, that
we can allow on the test. We need a test time T such that we
can observe up to r failures and still "pass" the test. If the test
time is too long (or too short), we can iterate with a different
choice of r.
When the test ends, the posterior gamma distribution will have
(worst case - assuming exactly r failures) new parameters of
a' = a + r, b' = b + T
and passing the test means that the failure rate percentile λ_(1-α), the
upper 100(1-α) percentile for the posterior gamma, has to equal
the target failure rate 1/M. But this percentile is, by definition,
G^-1(1-α; a', b'), with G^-1 denoting the inverse of the gamma
distribution with parameters a', b'. We can find the value of T
that satisfies G^-1(1-α; a', b') = 1/M by trial and error.
However, based on the properties of the gamma distribution, it
turns out that we can calculate T directly by using

T = M × G^-1(1-α; a', 1) - b
Special Case: The Prior Has a = 1 (The "Weak" Prior)
When the prior is a weak prior with a = 1, the Bayesian test is always shorter than the classical test
There is a very simple way to calculate the required Bayesian
test time when the prior is a weak prior with a = 1. Just use
the Test Length Guide Table to calculate the classical test
time. Call this Tc. The Bayesian test time T is just Tc minus
the prior parameter b (i.e., T = Tc - b). If the b parameter was
set equal to (ln 2) × MTBF50 (where MTBF50 is the consensus
choice for an "even money" MTBF), then

T = Tc - (ln 2) × MTBF50

This shows that when a weak prior is used, the Bayesian test
time is always less than the corresponding classical test time.
That is why this prior is also known as a friendly prior.
Note: In general, Bayesian test times can be shorter, or longer,
than the corresponding classical test times, depending on the
choice of prior parameters. However, the Bayesian time will
always be shorter when the prior parameter a is less than, or
equal to, 1.
Example: Calculating a Bayesian Test Time
Example A new piece of equipment has to meet an MTBF requirement
of 500 hours at 80 % confidence. A group of engineers decide
to use their collective experience to determine a Bayesian
gamma prior using the 50/95 method described in Section 2.
They think 600 hours is a likely MTBF value and they are
very confident that the MTBF will exceed 250. Following the
example in Section 2, they determine that the gamma prior
parameters are a = 2.863 and b = 1522.46.
Now they want to determine an appropriate test time so that
they can confirm an MTBF of 500 with at least 80 %
confidence, provided they have no more than two failures.
We obtain a test time of 1756.117 hours using

T = 500 × G^-1(1-0.2; 2.863+2, 1) - 1522.46
To compare this result to the classical test time required, use
the Test Length Guide Table. The table factor is 4.28, so the
test time needed is 500 × 4.28 = 2140 hours for a non-
Bayesian test. The Bayesian test saves about 384 hours, or an
18 % savings. If the test is run for 1756 hours, with no more
than two failures, then an MTBF of at least 500 hours has
been confirmed at 80 % confidence.
If, instead, the engineers had decided to use a weak prior with
an MTBF50 of 600, the required test time would have been

2140 - 600 × ln 2 = 1724 hours
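The Bayesian test-time formula T = M × G^-1(1-α; a', 1) - b is easy to evaluate with any statistical library; the gamma percent point function plays the role of G^-1. The Python sketch below is our own illustration of the calculation in this example (the Handbook itself supplies Dataplot and R versions of such computations).

from scipy.stats import gamma
import numpy as np

M, conf = 500.0, 0.80            # MTBF objective and confidence level (alpha = 0.2)
a, b = 2.863, 1522.46            # gamma prior parameters from the 50/95 method
r = 2                            # maximum number of failures allowed on the test

T = M * gamma.ppf(conf, a + r) - b      # T = M * Ginv(1-alpha; a', 1) - b
print(round(T, 1))                      # about 1756.1 hours

# Weak-prior (a = 1) comparison: classical test time minus (ln 2) * MTBF50
T_classical = 500.0 * 4.28              # table factor 4.28 from the Test Length Guide
print(round(T_classical - np.log(2) * 600.0))   # about 1724 hours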
8. Assessing Product Reliability
8.4. Reliability Data Analysis
After you have obtained component or system reliability data,
how do you fit life distribution models, reliability growth
models, or acceleration models? How do you estimate failure
rates or MTBF's and project component or system reliability
at use conditions? This section answers these kinds of
questions.
Detailed
outline for
Section 4
The detailed outline for section 4 follows.
4. Reliability Data Analysis
1. How do you estimate life distribution parameters from
censored data?
1. Graphical estimation
2. Maximum Likelihood Estimation (MLE)
3. A Weibull MLE example
2. How do you fit an acceleration model?
1. Graphical estimation
2. Maximum likelihood
3. Fitting models using degradation data instead of
failures
3. How do you project reliability at use conditions?
4. How do you compare reliability between two or more
populations?
5. How do you fit system repair rate models?
1. Constant repair rate (HPP/Exponential) model
2. Power law (Duane) model
3. Exponential law model
6. How do you estimate reliability using the Bayesian
gamma prior model?
8. Assessing Product Reliability
8.4. Reliability Data Analysis
8.4.1. How do you estimate life distribution
parameters from censored data?
Graphical estimation methods (aided by computer line fits) are easy and quick

Maximum likelihood methods are usually more precise - but require special software
Two widely used general methods will be described in this
section:
Graphical estimation
Maximum Likelihood Estimation (MLE)
Recommendation On Which Method to Use
Maximum likelihood estimation (except when the failure data
are very sparse - i.e., only a few failures) is a more precise and
flexible method. However, with censored data, the method of
maximum likelihood estimation requires special computer
programs for distributions other than the exponential. This is
no longer an obstacle since, in recent years, many statistical
software packages have added reliability platforms that will
calculate MLE's and most of these packages will estimate
acceleration model parameters and give confidence bounds as
well.
If important business decisions are based on reliability
projections made from life test data and acceleration
modeling, then it pays to obtain state-of-the-art MLE
reliability software. Otherwise, for monitoring and tracking
reliability, estimation methods based on computer-augmented
graphical procedures will often suffice.
8. Assessing Product Reliability
8.4. Reliability Data Analysis
8.4.1. How do you estimate life distribution parameters from censored data?
8.4.1.1. Graphical estimation
The line on a probability plot uniquely identifies distributional parameters
Once you have calculated plotting positions from your
failure data, and have generated the probability plot for your
chosen model, parameter estimation follows easily. But
along with the mechanics of graphical estimation, be aware
of both the advantages and the disadvantages of graphical
estimation methods.
Most probability plots have simple procedures to calculate underlying distribution parameter estimates
Graphical Estimation Mechanics:
If you draw a line through points on a probability plot, there
are usually simple rules to find estimates of the slope (or
shape parameter) and the scale parameter. On a lognormal
probability plot with time on the x-axis and cumulative
percent on the y-axis, draw horizontal lines from the 16th
and the 50th percentiles across to the fitted line, and drop
vertical lines to the time axis from these intersection points.
The time corresponding to the 50th percentile is the T50
estimate. Divide T50 by the time corresponding to the 16th
percentile (this is called T16; it lies approximately one sigma
below the median). The natural logarithm of that ratio is the
estimate of sigma, or the slope of the line (σ = ln(T50 / T16)).
For a Weibull probability plot, draw a horizontal line from
the y-axis to the fitted line at the 63.2 percentile point. That
estimation line intersects the line through the points at a
time that is the estimate of the characteristic life parameter
α. In order to estimate the slope of the fitted line (or the
shape parameter γ), choose any two points on the fitted line
and divide the change in the y variable by the change in the
x variable.
Using a computer-generated line-fitting routine removes subjectivity and can lead directly to computer parameter estimates based on the plotting positions
To remove the subjectivity of drawing a line through the
points, a least-squares (regression) fit can be performed
using the equations described in the section on probability
plotting. An example of this for the Weibull was also shown
in that section. Another example of a Weibull plot for the
same data appears later in this section.
Finally, if you have exact times and complete samples (no
censoring), many software packages have built-in
Probability Plotting functions. Examples were shown in the
sections describing various life distribution models.
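For readers who want to script the line fit themselves, the following Python sketch (our own illustration, using an assumed complete sample of failure times and median-rank plotting positions) performs the lognormal probability-plot regression and recovers T50 and sigma from the slope and intercept.

import numpy as np
from scipy.stats import norm

# Illustrative complete sample of failure times (hours); censoring is ignored here
t = np.sort(np.array([55., 187., 216., 240., 244., 335., 361., 373., 375., 386.]))
n = len(t)
pp = (np.arange(1, n + 1) - 0.3) / (n + 0.4)     # median-rank plotting positions

# Lognormal probability plot: x = ln(time), y = standard normal quantile of the plotting position
x, y = np.log(t), norm.ppf(pp)
slope, intercept = np.polyfit(x, y, 1)

sigma_hat = 1.0 / slope                   # the plot's slope is 1/sigma
T50_hat = np.exp(-intercept / slope)      # time where the fitted line crosses the 50th percentile
print(sigma_hat, T50_hat)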
Do probability plots even if you use some other method for the final estimates
Advantages of Graphical Methods of Estimation:
Graphical methods are quick and easy to use and
make visual sense.
Calculations can be done with little or no special
software needed.
Visual test of model (i.e., how well the points line up)
is an additional benefit.
Disadvantages of Graphical Methods of Estimation
Perhaps the worst drawback of graphical estimation is you cannot get legitimate confidence intervals for the estimates
The statistical properties of graphical estimates (i.e., how
precise they are on average) are not good:
they are biased,
even with large samples, they are not minimum
variance (i.e., most precise) estimates,
graphical methods do not give confidence intervals
for the parameters (intervals generated by a regression
program for this kind of data are incorrect), and
formal statistical tests about model fit or parameter
values cannot be performed with graphical methods.
As we will see in the next section, Maximum Likelihood
Estimates overcome all these disadvantages - at least for
reliability data sets with a reasonably large number of
failures - at a cost of losing all the advantages listed above
for graphical estimation.
8. Assessing Product Reliability
8.4. Reliability Data Analysis
8.4.1. How do you estimate life distribution parameters from censored data?
8.4.1.2. Maximum likelihood estimation
There is nothing visual about the maximum likelihood method - but it is a powerful method and, at least for large samples, very precise
Maximum likelihood estimation begins with writing a
mathematical expression known as the Likelihood Function
of the sample data. Loosely speaking, the likelihood of a set
of data is the probability of obtaining that particular set of
data, given the chosen probability distribution model. This
expression contains the unknown model parameters. The
values of these parameters that maximize the sample
likelihood are known as the Maximum Likelihood Estimates
or MLE's.
Maximum likelihood estimation is a totally analytic
maximization procedure. It applies to every form of censored
or multicensored data, and it is even possible to use the
technique across several stress cells and estimate acceleration
model parameters at the same time as life distribution
parameters. Moreover, MLE's and Likelihood Functions
generally have very desirable large sample properties:
they become unbiased minimum variance estimators as
the sample size increases
they have approximate normal distributions and
approximate sample variances that can be calculated
and used to generate confidence bounds
likelihood functions can be used to test hypotheses
about models and parameters
With small samples, MLE's may not be very precise and may even generate a line that lies above or below the data points
There are only two drawbacks to MLE's, but they are
important ones:
With small numbers of failures (fewer than 5, and
sometimes fewer than 10), MLE's can be heavily
biased and the large sample optimality properties do not
apply.
Calculating MLE's often requires specialized software
for solving complex non-linear equations. This is
becoming less of a problem as more statistical
packages add MLE analysis capability every year.
Additional information about maximum likelihood
estimation can be found in Chapter 1.
Likelihood equation for censored data
Likelihood Function Examples for Reliability Data:
Let f(t) be the PDF and F(t) the CDF for the chosen life
distribution model. Note that these are functions of t and the
unknown parameters of the model. If n units are on test, r of them
fail at exact times t1, ..., tr, and the remaining n-r survive to the
fixed end-of-test time T, the likelihood function for Type I Censored
data is:

L = C × [ f(t1) × f(t2) × ... × f(tr) ] × [1 - F(T)]^(n-r)

with C denoting a constant that plays no role when solving for
the MLE's. Note that with no censoring, the likelihood reduces
to just the product of the densities, each evaluated at a failure
time. For Type II Censored Data, just replace T above by the
random end of test time tr.

The likelihood function for readout data, with ri failures observed
between readout times Ti-1 and Ti (i = 1, ..., k) and n - Σri units
surviving past the last readout Tk, is:

L = C × [ ∏(i=1 to k) [F(Ti) - F(Ti-1)]^ri ] × [1 - F(Tk)]^(n - Σri)

with F(T0) defined to be 0.
In general, any multicensored data set likelihood will be a
constant times a product of terms, one for each unit in the
sample, that look like either f(ti), [F(Ti) - F(Ti-1)], or [1 - F(ti)],
depending on whether the unit was an exact time failure at
time ti, failed between two readouts Ti-1 and Ti, or survived to
time ti and was not observed any longer.
The general mathematical technique for solving for MLE's
involves setting partial derivatives of ln L (the derivatives are
taken with respect to the unknown parameters) equal to zero
and solving the resulting (usually non-linear) equations. The
equation for the exponential model can easily be solved,
however.
MLE for the exponential model parameter λ turns out to be just (total # of failures) divided by (total unit test time)
MLE's for the Exponential Model (Type I Censoring):
For n units on test with r failures at exact times t1, ..., tr and the
remaining n-r units censored at the fixed test time T, the MLE is

λ-hat = r / [ t1 + t2 + ... + tr + (n-r)T ]

Note: The MLE of the failure rate (or repair rate) in the
exponential case turns out to be the total number of failures
observed divided by the total unit test time. For the MLE of
the MTBF, take the reciprocal of this or use the total unit test
hours divided by the total observed failures.
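A minimal Python sketch of this closed-form estimate, using made-up Type I censored data purely for illustration:

import numpy as np

# Hypothetical data: r exact failure times, n units on test, fixed test time T
failure_times = np.array([112., 305., 480., 790.])
n, T = 20, 1000.0
r = len(failure_times)

total_unit_test_time = failure_times.sum() + (n - r) * T
lambda_hat = r / total_unit_test_time        # MLE of the exponential failure rate
mtbf_hat = 1.0 / lambda_hat                  # MLE of the MTBF
print(lambda_hat, mtbf_hat)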
There are examples of Weibull and lognormal MLE analysis
later in this section.
8. Assessing Product Reliability
8.4. Reliability Data Analysis
8.4.1. How do you estimate life distribution parameters from censored data?
8.4.1.3. A Weibull maximum likelihood estimation
example
Reliability analysis using Weibull data
We will plot Weibull censored data and estimate parameters using data
from a previous example (8.2.2.1).
The recorded failure times were 54, 187, 216, 240, 244, 335, 361, 373,
375, and 386 hours, and 10 units that did not fail were removed from the
test at 500 hours. The data are summarized in the following table.
Time Censored Frequency
54 0 1
187 0 1
216 0 1
240 0 1
244 0 1
335 0 1
361 0 1
373 0 1
375 0 1
386 0 1
500 1 10
The column labeled "Time" contains failure and censoring times, the
"Censored" column contains a variable to indicate whether the time in
column one is a failure time or a censoring time, and the "Frequency"
column shows how many units failed or were censored at that time.
First, we generate a survival curve using the Kaplan-Meier method and a
Weibull probability plot. Note: Some software packages might use the
name "Product Limit Method" or "Product Limit Survival Estimates"
instead of the equivalent name "Kaplan-Meier".
Next, we perform a regression analysis for a survival model assuming
that failure times have a Weibull distribution. The Weibull characteristic
life parameter (α) estimate is 606.5280 and the shape parameter (γ)
estimate is 1.7208.
The log-likelihood and Akaike's Information Criterion (AIC) from the
model fit are -75.135 and 154.27. For comparison, we computed the AIC
for the lognormal distribution and found that it was only slightly larger
than the Weibull AIC.
Lognormal AIC Weibull AIC
154.39 154.27
When comparing values of AIC, smaller is better. The probability
density of the fitted Weibull distribution is shown below.
Based on the estimates of α and γ, the lifetime expected value and
standard deviation are the following:

E[T] = α Γ(1 + 1/γ)    and    SD[T] = α √( Γ(1 + 2/γ) - Γ²(1 + 1/γ) )

The Greek letter Γ represents the gamma function.
Discussion Maximum likelihood estimation (MLE) is an accurate and easy way to
estimate life distribution parameters, provided that a good software
analysis package is available. The package should also calculate
confidence bounds and log-likelihood values.
The analyses in this section can be implemented using R code.
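For readers without a reliability package, the censored-data likelihood can also be maximized directly. The Python sketch below is our own illustration (it uses scipy.optimize rather than the R code referenced above); under the Type I censoring described in this example it should give estimates close to the quoted values, a characteristic life of about 606.5 and a shape of about 1.72.

import numpy as np
from scipy.optimize import minimize
from scipy.special import gamma as gamma_fn

fail = np.array([54., 187., 216., 240., 244., 335., 361., 373., 375., 386.])
n_cens, t_cens = 10, 500.0               # ten unfailed units removed (censored) at 500 hours

def neg_log_lik(params):
    shape, alpha = params                # Weibull shape (gamma) and characteristic life (alpha)
    if shape <= 0 or alpha <= 0:
        return np.inf
    z = fail / alpha
    log_f = np.log(shape) - np.log(alpha) + (shape - 1) * np.log(z) - z**shape
    log_S_cens = -(t_cens / alpha) ** shape          # log survival at the censoring time
    return -(log_f.sum() + n_cens * log_S_cens)

fit = minimize(neg_log_lik, x0=[1.0, 400.0], method="Nelder-Mead")
shape_hat, alpha_hat = fit.x
print(shape_hat, alpha_hat, -fit.fun)    # shape, characteristic life, maximized log-likelihood

# Lifetime mean and standard deviation from the fitted parameters
mean_hat = alpha_hat * gamma_fn(1 + 1 / shape_hat)
sd_hat = alpha_hat * np.sqrt(gamma_fn(1 + 2 / shape_hat) - gamma_fn(1 + 1 / shape_hat) ** 2)
print(mean_hat, sd_hat)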
8. Assessing Product Reliability
8.4. Reliability Data Analysis
8.4.2. How do you fit an acceleration model?
Acceleration models can be fit by either graphical procedures or maximum likelihood methods
As with estimating life distribution model parameters, there
are two general approaches for estimating acceleration model
parameters:
Graphical estimation (or computer procedures based
on a graphical approach)
Maximum Likelihood Estimation (an analytic
approach based on writing the likelihood of all the data
across all the cells, incorporating the acceleration
model).
The same comments and recommendations concerning these
methods still apply. Note that it is even harder, however, to
find useful software programs that will do maximum
likelihood estimation across stress cells and fit and test
acceleration models.
Sometimes it is possible to fit a model using degradation data
Another promising method of fitting acceleration models is
sometimes possible when studying failure mechanisms
characterized by a stress-induced gradual degradation
process that causes the eventual failure. This approach fits
models based on degradation data and has the advantage of
not actually needing failures. This overcomes censoring
limitations by providing measurement data at consecutive
time intervals for every unit in every stress cell.
8. Assessing Product Reliability
8.4. Reliability Data Analysis
8.4.2. How do you fit an acceleration model?
8.4.2.1. Graphical estimation
This section will discuss the following:
1. How to fit an Arrhenius model with graphical estimation
2. Graphical estimation: an Arrhenius model example
3. Fitting more complicated models
Estimate acceleration model parameters by estimating cell T50 values (or α values) and then using regression to fit the model across the cells
How to fit an Arrhenius Model with Graphical Estimation
Graphical methods work best (and are easiest to describe) for a simple
one-stress model like the widely used Arrhenius model

tf = A · e^(ΔH/kT)

with T denoting temperature measured in degrees Kelvin (273.16 +
degrees Celsius) and k denoting Boltzmann's constant (8.617 x 10^-5 in eV/K).
When applying an acceleration model to a distribution of failure times,
we interpret the deterministic model equation to apply at any distribution
percentile we want. This is equivalent to setting the life distribution scale
parameter equal to the model equation (T50 for the lognormal, α for the
Weibull and the MTBF or 1/λ for the exponential). For the lognormal,
for example, we have

T50 = A · e^(ΔH/kT)
So, if we run several stress cells and compute T50 values for each cell, a
plot of the natural log of these T50 values versus the corresponding 1/kT
values should be roughly linear with a slope of ΔH and an intercept of
ln A. In practice, a computer fit of a line through these points is typically
used to obtain the Arrhenius model estimates. Remember that T is in
Kelvin in the above equations. For temperature in Celsius, use the
following for 1/kT: 11605/(t °C + 273.16).
An example will illustrate the procedure.
Graphical Estimation: An Arrhenius Model Example:
Arrhenius
model
example
Component life tests were run at three temperatures: 85 °C, 105 °C and
125 °C. The lowest temperature cell was populated with 100
components; the 105 °C cell had 50 components and the highest stress
cell had 25 components. All tests were run until either all the units in the
cell had failed or 1000 hours was reached. Acceleration was assumed to
follow an Arrhenius model and the life distribution model for the failure
mode was believed to be lognormal. The normal operating temperature
for the components is 25 °C and it is desired to project the use CDF at
100,000 hours.
Test results:
Cell 1 (85 °C): 5 failures at 401, 428, 695, 725 and 738 hours. Ninety-
five units were censored at 1000 hours running time.
Cell 2 (105 °C): 35 failures at 171, 187, 189, 266, 275, 285, 301, 302,
305, 316, 317, 324, 349, 350, 386, 405, 480, 493, 530, 534, 536, 567,
589, 598, 599, 614, 620, 650, 668, 685, 718, 795, 854, 917, and 926
hours. Fifteen units were censored at 1000 hours running time.
Cell 3 (125 °C): 24 failures at 24, 42, 92, 93, 141, 142, 143, 159, 181,
188, 194, 199, 207, 213, 243, 256, 259, 290, 294, 305, 392, 454, 502 and
696. One unit was censored at 1000 hours running time.
Failure analysis confirmed that all failures were due to the same failure
mechanism (if any failures due to another mechanism had occurred, they
would have been considered censored run times in the Arrhenius
analysis).
Steps to Fitting the Distribution Model and the Arrhenius Model:
Do plots for each cell and estimate T50 and sigma as previously
discussed.
Plot all the cells on the same graph and check whether the lines are
roughly parallel (a necessary consequence of true acceleration).
If probability plots indicate that the lognormal model is appropriate
and that sigma is consistent among cells, plot ln T50 versus
11605/(t °C + 273.16) for each cell, check for linearity and fit a
straight line through the points. Since the points have different
degrees of precision, due to different numbers of failures in each
cell, it is recommended that the number of failures in each cell be
used as weights in a regression when fitting a line through the
points.
Use the slope of the line as the ΔH estimate and calculate the
Arrhenius A constant from the intercept using A = e^intercept.
Estimate the common sigma across all the cells by the weighted
average of the individual cell sigma estimates. Use the number of
failures in a cell divided by the total number of failures in all cells
as that cell's weight. This will allow cells with more failures to
play a bigger role in the estimation process.
Solution for
Arrhenius
model
example
Analysis of Multicell Arrhenius Model Data:
The following lognormal probability plot was generated for our data so
that all three stress cells are plotted on the same graph.
Note that the lines are somewhat straight (a check on the lognormal
model) and the slopes are approximately equal (a check on the
acceleration assumption).
The cell ln T50 and sigma estimates are obtained from linear regression
fits for each cell using the data from the probability plot. Each fit will
yield a cell A0, the ln T50 estimate, and A1, the cell sigma estimate.
These are summarized in the table below.
Summary of Least Squares Estimation of Cell Lognormal Parameters

Cell Number        ln T50    Sigma
1 (t °C = 85)       8.168    0.908
2 (t °C = 105)      6.415    0.663
3 (t °C = 125)      5.319    0.805
The three cells have 11605/(t °C + 273.16) values of 32.40, 30.69 and
29.15 respectively, in cell number order. The Arrhenius plot is
With only three cells, it is unlikely a straight line through the points will
present obvious visual lack of fit. However, in this case, the points
appear to line up very well.
Finally, the model coefficients are computed from a weighted linear fit
of ln T50 versus 11605/(t °C + 273.16), using weights of 5, 35, and 24
for each cell. This will yield a ln A estimate of -18.312 (A = e^-18.312 =
0.1115 x 10^-7) and a ΔH estimate of 0.808. With this value of ΔH, the
acceleration between the lowest stress cell of 85 °C and the highest of
125 °C is

e^(0.808 × (32.40 - 29.15)) = e^2.63

which is almost a 14× acceleration. Acceleration from 125 °C to the use
condition of 25 °C is 3708. The use T50 is e^-18.312 ×
e^(0.808 × 11605 × 1/298.16) = e^13.137 = 507383.

A single sigma estimate for all stress conditions can be calculated as a
weighted average of the three sigma estimates obtained from the
experimental cells. The weighted average is (5/64) × 0.908 + (35/64) ×
0.663 + (24/64) × 0.805 = 0.74.
The analyses in this section can be implemented using both Dataplot
code and R code.
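The weighted straight-line fit just described takes only a few lines in any numerical environment. The Python sketch below is our own illustration (the Handbook's own scripts are the Dataplot and R code referenced above); it uses the cell summaries from the table and the failure counts as weights, and should reproduce ΔH ≈ 0.808, ln A ≈ -18.3, and the pooled sigma of about 0.74.

import numpy as np

lnT50 = np.array([8.168, 6.415, 5.319])                   # cell ln T50 estimates
x = 11605.0 / (np.array([85., 105., 125.]) + 273.16)      # 11605/(t C + 273.16) per cell
n_fail = np.array([5., 35., 24.])                         # failures per cell, used as weights

# np.polyfit minimizes sum((w * residual)**2), so pass sqrt of the desired weights
slope, intercept = np.polyfit(x, lnT50, 1, w=np.sqrt(n_fail))
deltaH_hat, lnA_hat = slope, intercept

sigma_hat = np.sum(n_fail * np.array([0.908, 0.663, 0.805])) / n_fail.sum()
accel_85_to_125 = np.exp(deltaH_hat * (x[0] - x[2]))      # about 14
T50_use = np.exp(lnA_hat + deltaH_hat * 11605.0 / (25.0 + 273.16))
print(deltaH_hat, lnA_hat, sigma_hat, accel_85_to_125, T50_use)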
Fitting More Complicated Models

Models involving several stresses can be fit using multiple regression

Two stress models, such as the temperature/voltage model (in which
ln T50 is linear in both 1/kT and ln V),
need at least four or five carefully chosen stress cells to estimate all the
parameters. The Backwards L design previously described is an example
of a design for this model. The bottom row of the "backward L" could be
used for a plot testing the Arrhenius temperature dependence, similar to
the above Arrhenius example. The right hand column could be plotted
using y = ln T50 and x = ln V, to check the voltage term in the model.
The overall model estimates should be obtained from fitting the multiple
regression model

Y = b0 + b1·X1 + b2·X2,  with  Y = ln T50,  X1 = 1/kT,  X2 = ln V

Fitting this model, after setting up the Y, X1, X2 data vectors, provides
estimates for b0, b1 and b2 (b0 estimates ln A and b1 estimates ΔH).
Three stress models, and even Eyring models with interaction terms, can
be fit by a direct extension of these methods. Graphical plots to test the
model, however, are less likely to be meaningful as the model becomes
more complex.
8. Assessing Product Reliability
8.4. Reliability Data Analysis
8.4.2. How do you fit an acceleration model?
8.4.2.2. Maximum likelihood
The maximum likelihood method can be used to estimate distribution and acceleration model parameters at the same time
The likelihood equation for a multi-cell acceleration model utilizes the likelihood function for
each cell, as described in section 8.4.1.2. Each cell will have unknown life distribution
parameters that, in general, are different. For example, if a lognormal model is used, each cell
might have its own T50 and sigma.

Under an acceleration assumption, however, all the cells contain samples from populations
that have the same value of sigma (the slope does not change for different stress cells). Also,
the T50 values are related to one another by the acceleration model; they all can be written
using the acceleration model equation that includes the proper cell stresses.

To form the likelihood equation under the acceleration model assumption, simply rewrite each
cell likelihood by replacing each cell T50 with its acceleration model equation equivalent and
replacing each cell sigma with the same overall sigma. Then, multiply all these modified cell
likelihoods together to obtain the overall likelihood equation.
Once the overall likelihood equation has been created, the maximum likelihood estimates
(MLE) of sigma and the acceleration model parameters are the values that maximize this
likelihood. In most cases, these values are obtained by setting partial derivatives of the log
likelihood to zero and solving the resulting (non-linear) set of equations.
The method is complicated and requires specialized software
As you can see, the procedure is complicated, computationally intensive, and is only practical
if appropriate software is available. MLE does have many desirable features.
The method can, in theory at least, be used for any distribution model and acceleration
model and type of censored data.
Estimates have "optimal" statistical properties as sample sizes (i.e., numbers of failures)
become large.
Approximate confidence bounds can be calculated.
Statistical tests of key assumptions can be made using the likelihood ratio test. Some
common tests are:
the life distribution model versus another simpler model with fewer parameters
(i.e., a 3-parameter Weibull versus a 2-parameter Weibull, or a 2-parameter
Weibull versus an exponential),
the constant slope from cell to cell requirement of typical acceleration models,
and
the fit of a particular acceleration model.
In general, the recommendations made when comparing methods of estimating life
distribution model parameters also apply here. Software incorporating acceleration model
analysis capability, while rare just a few years ago, is now readily available and many
companies and universities have developed their own proprietary versions.
Steps For Fitting The Arrhenius Model
Use MLE to fit an Arrhenius model to example data
Data from the Arrhenius example given in section 8.4.2.1 were analyzed using MLE. The
analyses in this section can be implemented using R code.
1. We generate survival curves for each cell. All plots and estimates are based on individual
cell data, without the Arrhenius model assumption.
2. The results of lognormal survival regression modeling for the three data cells are shown
below.
Cell 1 - 85 C
Parameter Estimate Stan. Dev z Value
--------- -------- --------- -------
Intercept 8.891 0.890 9.991
ln(scale) 0.192 0.406 0.473
sigma = exp(ln(scale)) = 1.21
ln likelihood = -53.4
Cell 2 - 105 C
Parameter Estimate Stan. Dev z Value
--------- -------- --------- -------
Intercept 6.470 0.108 60.14
ln(scale) -0.336 0.129 -2.60
sigma = exp(ln(scale)) = 0.715
ln likelihood = -265.2
Cell 3 - 125 C
Parameter Estimate Stan. Dev z Value
--------- -------- --------- -------
Intercept 5.33 0.163 32.82
ln(scale) -0.21 0.146 -1.44
sigma = exp(ln(scale)) = 0.81
ln likelihood = -156.5
The cell ln likelihood values are -53.4, -265.2 and -156.5, respectively. Adding them together
yields a total ln likelihood of -475.1 for all the data fit with separate lognormal parameters for
each cell (no Arrhenius model assumption).
3. Fit the Arrhenius model to all data using MLE.
Parameter Estimate Stan. Dev z Value
--------- -------- --------- -------
Intercept -19.906 2.3204 -8.58
1/kT 0.863 0.0761 11.34
ln(scale) -0.259 0.0928 -2.79
sigma = exp(ln(scale)) = 0.772
ln likelihood = -476.7
4. The likelihood ratio test statistic for the Arrhenius model fit (which also incorporates the
single sigma acceleration assumption) is -2 ln λ, where λ denotes the ratio of the likelihood
values with (L0), and without (L1), the Arrhenius model assumption, so that

-2 ln λ = -2 ln (L0/L1) = -2 (ln L0 - ln L1).

Using the results from steps 2 and 3, we have -2 ln λ = -2(-476.7 - (-475.1)) = 3.2. The
degrees of freedom for the Chi-Square test statistic is 6 - 3 = 3, since six parameters were
reduced to three under the acceleration model assumption. The chance of obtaining a value 3.2
or higher is 36.3% for a Chi-Square distribution with 3 degrees of freedom, which indicates
an acceptable model (no significant lack of fit).
This completes the Arrhenius model analysis of the three cells of data. If different cells of data
have different voltages, then a new variable "ln V" could be added as an effect to fit the
Inverse Power Law voltage model. In fact, several effects can be included at once if more
than one stress varies across cells. Cross product stress terms could also be included by adding
these columns to the spreadsheet and adding them in the model as additional "effects".
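The likelihood ratio comparison in step 4 is easy to verify once the two log-likelihoods are in hand. A minimal Python sketch (our own check, assuming scipy is available):

from scipy.stats import chi2

lnL_separate = -475.1       # separate lognormal parameters in each cell (step 2)
lnL_arrhenius = -476.7      # Arrhenius model with a single sigma (step 3)

test_stat = -2.0 * (lnL_arrhenius - lnL_separate)    # -2 ln(L0/L1) = 3.2
df = 6 - 3                                           # parameters reduced from six to three
p_value = chi2.sf(test_stat, df)                     # about 0.36
print(test_stat, p_value)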
Example Comparing Graphical Estimates and MLE
Arrhenius example comparing graphical and MLE method results
The results from the three-stress-cell Arrhenius example using graphical and MLE methods
for estimating parameters are shown in the table below.
              Graphical Estimates        MLE
              ln T50      Sigma       ln T50      Sigma
Cell 1         8.17        0.91        8.89        1.21
Cell 2         6.42        0.66        6.47        0.71
Cell 3         5.32        0.81        5.33        0.81
Acceleration Model Overall Estimates

              ΔH         Sigma      ln A
Graphical    0.808        0.74     -18.312
MLE          0.863        0.77     -19.91
Note that when there are a lot of failures and little censoring, the two methods are in fairly
close agreement. Both methods are also in close agreement on the Arrhenius model results.
However, even small differences can be important when projecting reliability numbers at use
conditions. In this example, the CDF at 25 C and 100,000 hours projects to 0.014 using the
graphical estimates and only 0.003 using the MLE.
MLE method tests models and gives confidence intervals
The maximum likelihood method allows us to test whether parallel lines (a single sigma) are
reasonable and whether the Arrhenius model is acceptable. The likelihood ratio tests for the
three example data cells indicated that a single sigma and the Arrhenius model are
appropriate. In addition, we can compute confidence intervals for all estimated parameters
based on the MLE results.
8. Assessing Product Reliability
8.4. Reliability Data Analysis
8.4.2. How do you fit an acceleration model?
8.4.2.3. Fitting models using degradation data instead of
failures
If you can fit models using degradation data, you don't need actual test failures
When failure can be related directly to a change over time in a measurable
product parameter, it opens up the possibility of measuring degradation over
time and using that data to extrapolate when failure will occur. That allows us to
fit acceleration models and life distribution models without actually waiting for
failures to occur.
This overview of degradation modeling assumes you have chosen a life
distribution model and an acceleration model and offers an alternative to the
accelerated testing methodology based on failure data, previously described. The
following topics are covered.
Common assumptions
Advantages
Drawbacks
A simple method
A more accurate approach for a special case
Example
More details can be found in Nelson (1990, pages 521-544) or Tobias and
Trindade (1995, pages 197-203).
Common Assumptions When Modeling Degradation Data
You need a measurable parameter that drifts (degrades) linearly to a critical failure value
Two common assumptions typically made when degradation data are modeled
are the following:
1. A parameter D, that can be measured over time, drifts monotonically
(upwards, or downwards) towards a specified critical value DF. When it
reaches DF, failure occurs.
2. The drift, measured in terms of D, is linear over time with a slope (or rate
of degradation) R, that depends on the relevant stress the unit is operating
under and also the (random) characteristics of the unit being measured.
Note: It may be necessary to define D as a transformation of some
standard parameter in order to obtain linearity - logarithms or powers are
sometimes needed.
The figure below illustrates these assumptions by showing degradation plots of
five units on test. Degradation readings for each unit are taken at the same four
time points and straight lines fit through these readings on a unit-by-unit basis.
These lines are then extended up to a critical (failure) degradation value. The
projected times of failure for these units are then read off the plot. These are t1, t2, ..., t5.
Plot of linear degradation trends for five units read out at four time points
In many practical situations, D starts at 0 at time zero, and all the linear
theoretical degradation lines start at the origin. This is the case when D is a "%
change" parameter, or failure is defined as a change of a specified magnitude in
a parameter, regardless of its starting value. Lines all starting at the origin
simplify the analysis since we don't have to characterize the population starting
value for D, and the "distance" any unit "travels" to reach failure is always the
constant DF. For these situations, the degradation lines would look as follows.
Often, the degradation lines go through the origin - as when % change is the measurable parameter increasing to a failure level
It is also common to assume the effect of measurement error, when reading
values of D, has relatively little impact on the accuracy of model estimates.
Advantages of Modeling Based on Degradation Data
Modeling based on complete samples of measurement data, even with low stress cells, offers many advantages
1. Every degradation readout for every test unit contributes a data point. This
leads to large amounts of useful data, even if there are very few failures.
2. You don't have to run tests long enough to obtain significant numbers of
failures.
3. You can run low stress cells that are much closer to use conditions and
obtain meaningful degradation data. The same cells would be a waste of
time to run if failures were needed for modeling. Since these cells are
more typical of use conditions, it makes sense to have them influence
model parameters.
4. Simple plots of degradation vs time can be used to visually test the linear
degradation assumption.
Drawbacks to Modeling Based on Degradation Data
Degradation may not proceed in a smooth, linear fashion towards what the customer calls "failure"
1. For many failure mechanisms, it is difficult or impossible to find a
measurable parameter that degrades to a critical value in such a way that
reaching that critical value is equivalent to what the customer calls a
failure.
2. Degradation trends may vary erratically from unit to unit, with no
apparent way to transform them into linear trends.
3. Sometimes degradation trends are reversible and a few units appear to
"heal themselves" or get better. This kind of behavior does not follow
typical assumptions and is difficult to model.
4. Measurement error may be significant and overwhelm small degradation
trends, especially at low stresses.
5. Even when degradation trends behave according to assumptions and the
chosen models fit well, the final results may not be consistent with an
analysis based on actual failure data. This probably means that the failure
mechanism depends on more than a simple continuous degradation
process.
Because of the last listed drawback, it is a good idea to have at least one high-
stress cell where enough real failures occur to do a standard life distribution
model analysis. The parameter estimates obtained can be compared to the
predictions from the degradation data analysis, as a "reality" check.
A Simple Method For Modeling Degradation Data
A simple approach is to extend each unit's degradation line until a projected "failure time" is obtained
1. As shown in the figures above, fit a line through each unit's degradation
readings. This can be done by hand, but using a least squares regression
program is better.
2. Take the equation of the fitted line, substitute DF for Y and solve for X.
This value of X is the "projected time of fail" for that unit.
3. Repeat for every unit in a stress cell until a complete sample of
(projected) times of failure is obtained for the cell.
4. Use the failure times to compute life distribution parameter estimates for a
cell. Under the fairly typical assumption of a lognormal model, this is very
simple. Take natural logarithms of all failure times and treat the resulting
data as a sample from a normal distribution. Compute the sample mean
and the sample standard deviation. These are estimates of ln T50 and σ,
respectively, for the cell.
5. Assuming there are k cells with varying stress, fit an appropriate
acceleration model using the cell ln T50 values, as described in the
graphical estimation section. A single sigma estimate is obtained by taking
the square root of the average of the cell σ² estimates (assuming the same
number of units in each cell). If the cells have nj units on test, where the nj
values are not all equal, use the pooled sum-of-squares estimate across all
k cells, i.e., take the square root of the weighted average of the cell σ²
estimates with weights proportional to the cell sample sizes nj. A short
sketch of these steps appears after this list.
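A small Python sketch of the simple method (our own illustration; the readings are patterned after the 65 °C cell of the example later in this section) fits a line to each unit's readouts, solves for the projected failure time, and computes the cell lognormal estimates:

import numpy as np

DF = 30.0                                    # critical degradation value (percent change)
readout_times = np.array([200., 500., 1000.])

# Degradation readings, one row per unit in a single stress cell
readings = np.array([[0.87, 1.48, 2.81],
                     [0.33, 0.96, 2.13],
                     [0.94, 2.91, 5.67],
                     [0.72, 1.98, 4.28],
                     [0.66, 0.99, 2.14]])

proj_fail_times = []
for unit in readings:
    slope, intercept = np.polyfit(readout_times, unit, 1)    # least-squares line per unit
    proj_fail_times.append((DF - intercept) / slope)         # time at which the line reaches DF
proj_fail_times = np.array(proj_fail_times)

ln_t = np.log(proj_fail_times)
lnT50_hat, sigma_hat = ln_t.mean(), ln_t.std(ddof=1)         # lognormal cell estimates
print(proj_fail_times, lnT50_hat, sigma_hat)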
A More Accurate Regression Approach For the Case When D = 0 at time 0
and the "Distance To Fail" DF is the Same for All Units
Models can be fit using all the degradation readings and linear regression
Let the degradation measurement for the i-th unit at the j-th readout time in the
k-th stress cell be given by Dijk, and let the corresponding readout time be
denoted by tjk. That readout gives a degradation rate (or slope) estimate of
Dijk / tjk. This follows from the linear assumption or:

(Rate of degradation) × (Time on test) = (Amount of degradation)

Based on that readout alone, an estimate of the natural logarithm of the time to
fail for that unit is

yijk = ln DF - (ln Dijk - ln tjk).

This follows from the basic formula connecting linear degradation with failure
time

(rate of degradation) × (time of failure) = DF

by solving for (time of failure) and taking natural logarithms.
For an Arrhenius model analysis, fit the straight-line model

yijk = a + b·xijk

with the xk values equal to 1/KT. Here T is the temperature of the k-th cell,
measured in Kelvin (273.16 + degrees Celsius) and K is Boltzmann's constant
(8.617 × 10^-5 in eV/ unit Kelvin). Use a linear regression program to estimate a
= ln A and b = ΔH. If we further assume tf has a lognormal distribution, the
mean square residual error from the regression fit is an estimate of σ² (with σ
the lognormal sigma).
One way to think about this model is as follows: each unit has a random rate R
of degradation. Since tf = DF/R, it follows from a characterization property of
the normal distribution that if tf is lognormal, then R must also have a lognormal
distribution (assuming DF and R are independent). After we take logarithms, ln
R has a normal distribution with a mean determined by the acceleration model
parameters. The randomness in R comes from the variability in physical
characteristics from unit to unit, due to material and processing differences.
Note: The estimate of sigma based on this simple graphical approach might tend
to be too large because it includes an adder due to the measurement error that
occurs when making the degradation readouts. This is generally assumed to have
only a small impact.
Example: Arrhenius Degradation Analysis
An example using the regression approach to fit an Arrhenius model
A component has a critical parameter that studies show degrades linearly over
time at a rate that varies with operating temperature. A component failure based
on this parameter occurs when the parameter value changes by 30% or more.
Fifteen components were tested under 3 different temperature conditions (5 at 65
°C, 5 at 85 °C and the last 5 at 105 °C). Degradation percent values were read
out at 200, 500 and 1000 hours. The readings are given by unit in the following
three temperature cell tables.
65 C
200 hr 500 hr 1000 hr
Unit 1 0.87 1.48 2.81
Unit 2 0.33 0.96 2.13
Unit 3 0.94 2.91 5.67
Unit 4 0.72 1.98 4.28
Unit 5 0.66 0.99 2.14
85 C
200 hr 500 hr 1000 hr
Unit 1 1.41 2.47 5.71
Unit 2 3.61 8.99 17.69
Unit 3 2.13 5.72 11.54
Unit 4 4.36 9.82 19.55
Unit 5 6.91 17.37 34.84
105 C
200 hr 500 hr 1000 hr
Unit 1 24.58 62.02 124.10
Unit 2 9.73 24.07 48.06
Unit 3 4.74 11.53 23.72
Unit 4 23.61 58.21 117.20
Unit 5 10.90 27.85 54.97
8.4.2.3. Fitting models using degradation data instead of failures
http://www.itl.nist.gov/div898/handbook/apr/section4/apr423.htm[6/27/2012 2:50:03 PM]
Note that one unit failed in the 85 °C cell and four units failed in the 105 °C
cell. Because there were so few failures, it would be impossible to fit a life
distribution model in any cell but the 105 °C cell, and therefore no acceleration
model can be fit using failure data. We will fit an Arrhenius/lognormal model,
using the degradation data.
Solution:
Fit the model to the degradation data
From the above tables, first create a variable (DEG) with 45 degradation values
starting with the first row in the first table and proceeding to the last row in the
last table. Next, create a temperature variable (TEMP) that has 15 repetitions of
65, followed by 15 repetitions of 85 and then 15 repetitions of 105. Finally,
create a time variable (TIME) that corresponds to readout times.
Fit the Arrhenius/lognormal equation, yijk = a + b·xijk, where

yijk = ln(30) - (ln(DEG) - ln(TIME))

and

xijk = 100000 / [8.617·(TEMP + 273.16)].
The linear regression results are the following.
Parameter Estimate Stan. Dev t Value
--------- -------- --------- -------
a -18.94337 1.83343 -10.33
b 0.81877 0.05641 14.52
Residual standard deviation = 0.5611
Residual degrees of freedom = 45
The Arrhenius model parameter estimates are: ln A = -18.94 and ΔH = 0.82. An
estimate of the lognormal sigma is σ = 0.56.
The analyses in this section can be implemented using both Dataplot code
and R code.
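The construction of DEG, TEMP, and TIME and the regression described above can be reproduced in a few lines. The Python sketch below is our own version (the Handbook supplies Dataplot and R scripts); it should give estimates close to ln A = -18.94, ΔH = 0.82, and σ = 0.56.

import numpy as np

deg_65  = [0.87, 1.48, 2.81, 0.33, 0.96, 2.13, 0.94, 2.91, 5.67, 0.72, 1.98, 4.28, 0.66, 0.99, 2.14]
deg_85  = [1.41, 2.47, 5.71, 3.61, 8.99, 17.69, 2.13, 5.72, 11.54, 4.36, 9.82, 19.55, 6.91, 17.37, 34.84]
deg_105 = [24.58, 62.02, 124.10, 9.73, 24.07, 48.06, 4.74, 11.53, 23.72, 23.61, 58.21, 117.20, 10.90, 27.85, 54.97]

DEG  = np.array(deg_65 + deg_85 + deg_105)          # 45 readings, row by row through the tables
TEMP = np.repeat([65.0, 85.0, 105.0], 15)           # 15 readings per temperature cell
TIME = np.tile([200.0, 500.0, 1000.0], 15)          # readout times matching each reading

y = np.log(30.0) - (np.log(DEG) - np.log(TIME))
x = 100000.0 / (8.617 * (TEMP + 273.16))

b, a = np.polyfit(x, y, 1)                          # slope estimates DeltaH, intercept estimates ln A
resid = y - (a + b * x)
sigma_hat = np.sqrt(np.sum(resid**2) / (len(y) - 2))
print(a, b, sigma_hat)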
8. Assessing Product Reliability
8.4. Reliability Data Analysis
8.4.3. How do you project reliability at use
conditions?
When projecting from high stress to use conditions, having a correct acceleration model and life distribution model is critical
General Considerations
Reliability projections based on failure data from high stress
tests are based on assuming we know the correct acceleration
model for the failure mechanism under investigation and we
are also using the correct life distribution model. This is
because we are extrapolating "backwards" - trying to
describe failure behavior in the early tail of the life
distribution, where we have little or no actual data.
For example, with an acceleration factor of 5000 (and some
are much larger than this), the first 100,000 hours of use life
is "over" by 20 hours into the test. Most, or all, of the test
failures typically come later in time and are used to fit a life
distribution model with only the first 20 hours or less being
of practical use. Many distributions may be flexible enough
to adequately fit the data at the percentiles where the points
are, and yet differ from the data by orders of magnitude in
the very early percentiles (sometimes referred to as the early
"tail" of the distribution).
However, it is frequently necessary to test at high stress (to
obtain any failures at all!) and project backwards to use.
When doing this bear in mind two important points:
Project for each failure mechanism separately
Distribution models, and especially acceleration
models, should be applied only to a single failure
mechanism at a time. Separate out failure mechanisms
when doing the data analysis and use the competing
risk model to build up to a total component failure rate.
Try to find theoretical justification for the chosen
models, or at least a successful history of their use for
the same or very similar mechanisms. (Choosing
models solely based on empirical fit is like
extrapolating from quicksand to a mirage.)
How to Project from High Stress to Use Stress
Two types of use-condition reliability projections are
common:
1. Projection to use conditions after completing a multiple
stress cell experiment and successfully fitting both a
life distribution model and an acceleration model
2. Projection to use conditions after a single cell at high
stress is run as a line reliability monitor.
Arrhenius
model
projection
example
The Arrhenius example from the graphical estimation and the
MLE estimation sections ended by comparing use projections
of the CDF at 100,000 hours. This is a projection of the first
type. We know from the Arrhenius model assumption that the
T50 at 25 °C is just

A × e^(ΔH × 11605/298.16)

Using the graphical model estimates for ln A and ΔH we have

T50 at use = e^-18.312 × e^(0.808 × 11605/298.16) = e^13.137 = 507383
and combining this T50 with the estimate of the common
sigma of 0.74 allows us to easily estimate the CDF or failure
rate after any number of hours of operation at use conditions.
In particular, the CDF value of a lognormal at T/T50 (where
time T = 100,000, T50 = 507383, and sigma = 0.74) is 0.014,
which matches the answer given in the MLE estimation
section as the graphical projection of the CDF at 100,000
hours at a use temperature of 25 °C.
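The projection arithmetic is a one-liner once T50 and sigma are in hand. A minimal Python sketch of the calculation just described (our own illustration):

import numpy as np
from scipy.stats import norm

lnA, deltaH, sigma = -18.312, 0.808, 0.74                      # graphical estimates from the example
T50_use = np.exp(lnA + deltaH * 11605.0 / (25.0 + 273.16))     # about 507,000 hours

cdf_100k = norm.cdf(np.log(100000.0 / T50_use) / sigma)        # about 0.014
print(T50_use, cdf_100k)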
If the life distribution model had been Weibull, the same type
of analysis would be performed by letting the characteristic
life parameter vary with stress according to the acceleration
model, while the shape parameter is constant for all stress
conditions.
The second type of use projection was used in the section on
lognormal and Weibull tests, in which we judged new lots of
product by looking at the proportion of failures in a sample
tested at high stress. The assumptions we made were:
we knew the acceleration factor between use and high
stress
the shape parameter (sigma for the lognormal, gamma
for the Weibull) is also known and does not change
significantly from lot to lot.
With these assumptions, we can take any proportion of
failures we see from a high stress test and project a use CDF
or failure rate. For a T-hour high stress test and an
acceleration factor of A from high stress to use stress, an
observed proportion p is converted to a use CDF at 100,000
hours for a lognormal model using:
T50Stress = T / G^-1(p; 0, σ)
CDF = G(100000/(A × T50Stress); 0, σ)

where G(q; μ, σ) is the lognormal distribution function with
mean μ and standard deviation σ (of the natural logarithms of
the failure times).
If the model is Weibull, we can find the use CDF or failure
rate with:

AStress = T / W^-1(p; γ, 1)
CDF = W(100000/(A × AStress); γ, 1)

where W(q; γ, α) is the Weibull distribution function with
shape parameter γ and scale parameter α.
The analyses in this section can be implemented using
both Dataplot code and R code.
8. Assessing Product Reliability
8.4. Reliability Data Analysis
8.4.4. How do you compare reliability between
two or more populations?
Several methods for comparing reliability between populations are described
Comparing reliability among populations based on samples of
failure data usually means asking whether the samples came
from populations with the same reliability function (or CDF).
Three techniques already described can be used to answer
this question for censored reliability data. These are:
Comparing sample proportion failures
Likelihood ratio test comparisons
Lifetime regression comparisons
Comparing Sample Proportion Failures
Assume each sample is a random sample from possibly a
different lot, vendor or production plant. All the samples are
tested under the same conditions. Each has an observed
proportion of failures on test. Call these sample proportions of
failures p1, p2, p3, ..., pn. Could these all have come from
equivalent populations?
This is a question covered in Chapter 7 for two populations,
and for more than two populations, and the techniques
described there apply equally well here.
Likelihood Ratio Test Comparisons
The Likelihood Ratio test was described earlier. In this
application, the Likelihood ratio has as a denominator the
product of all the Likelihoods of all the samples assuming
each population has its own unique set of parameters. The
numerator is the product of the Likelihoods assuming the
parameters are exactly the same for each population. The test
looks at whether -2 ln λ is unusually large, in which case it is
unlikely the populations have the same parameters (or
reliability functions).
This procedure is very effective if, and only if, it is built into
the analysis software package being used and this software
covers the models and situations of interest to the analyst.
Lifetime Regression Comparisons
Lifetime regression is similar to maximum likelihood and
likelihood ratio test methods. Each sample is assumed to have
come from a population with the same shape parameter and a
wide range of questions about the scale parameter (which is
often assumed to be a "measure" of lot-to-lot or vendor-to-
vendor quality) can be formulated and tested for significance.
For a complicated, but realistic example, assume a company
manufactures memory chips and can use chips with some
known defects ("partial goods") in many applications.
However, there is a question of whether the reliability of
"partial good" chips is equivalent to "all good" chips. There
exists lots of customer reliability data to answer this question.
However the data are difficult to analyze because they contain
several different vintages with known reliability differences
as well as chips manufactured at many different locations.
How can the partial good vs all good question be resolved?
A lifetime regression model can be constructed with variables
included that change the scale parameter based on vintage,
location, partial versus all good, and any other relevant
variables. Then, a good lifetime regression program will sort
out which, if any, of these factors are significant and, in
particular, whether there is a significant difference between
"partial good" and "all good".
8. Assessing Product Reliability
8.4. Reliability Data Analysis
8.4.5. How do you fit system repair rate
models?
Fitting
models
discussed
earlier
This subsection describes how to fit system repair rate models
when you have actual failure data. The data could come from
observing a system in normal operation or from running
tests such as Reliability Improvement tests.
The three models covered are the constant repair rate
(HPP/exponential) model, the power law (Duane) model and
the exponential law model.
8. Assessing Product Reliability
8.4. Reliability Data Analysis
8.4.5. How do you fit system repair rate models?
8.4.5.1. Constant repair rate (HPP/exponential) model
This section covers estimating MTBF's and calculating upper and lower confidence bounds
The HPP or exponential model is widely used for two reasons:
Most systems spend most of their useful lifetimes operating in the flat
constant repair rate portion of the bathtub curve
It is easy to plan tests, estimate the MTBF and calculate confidence
intervals when assuming the exponential model.
This section covers the following:
1. Estimating the MTBF (or repair rate/failure rate)
2. How to use the MTBF confidence interval factors
3. Tables of MTBF confidence interval factors
4. Confidence interval equation and "zero fails" case
5. Calculation of confidence intervals
6. Example
Estimating the MTBF (or repair rate/failure rate)
For the HPP system model, as well as for the non-repairable exponential
population model, there is only one unknown parameter λ (or equivalently,
the MTBF = 1/λ). The method used for estimation is the same for the HPP
model and for the exponential population model.
The best estimate of the MTBF is just "Total Time" divided by "Total Failures"
The estimate of the MTBF is

    MTBF-hat = T / r ,

where T is the total unit test time and r is the total number of failures.
This estimate is the maximum likelihood estimate whether the data are
censored or complete, or from a repairable system or a non-repairable
population.
Confidence Interval Factors multiply the estimated MTBF to obtain lower and upper bounds on the true MTBF
How To Use the MTBF Confidence Interval Factors
1. Estimate the MTBF by the standard estimate (total unit test hours
divided by total failures).
2. Pick a confidence level (i.e., pick 100×(1-α)). For 95 %, α = 0.05; for
90 %, α = 0.1; for 80 %, α = 0.2; and for 60 %, α = 0.4.
3. Read off a lower and an upper factor from the confidence interval
tables for the given confidence level and number of failures r.
4. Multiply the MTBF estimate by the lower and upper factors to obtain
MTBF_lower and MTBF_upper.
5. When r (the number of failures) = 0, multiply the total unit test hours
by the "0 row" lower factor to obtain a 100(1-α/2) % one-sided
lower bound for the MTBF. There is no upper bound when r = 0.
6. Use (MTBF_lower, MTBF_upper) as a 100(1-α) % confidence interval
for the MTBF (r > 0).
7. Use MTBF_lower as a (one-sided) lower 100(1-α/2) % limit for the
MTBF.
8. Use MTBF_upper as a (one-sided) upper 100(1-α/2) % limit for the
MTBF.
9. Use (1/MTBF_upper, 1/MTBF_lower) as a 100(1-α) % confidence
interval for λ.
10. Use 1/MTBF_upper as a (one-sided) lower 100(1-α/2) % limit for λ.
11. Use 1/MTBF_lower as a (one-sided) upper 100(1-α/2) % limit for λ.
Tables of MTBF Confidence Interval Factors
Confidence bound factor tables for 60, 80, 90 and 95 % confidence
Confidence Interval Factors to Multiply MTBF Estimate
                 60 % confidence                 80 % confidence
Num Fails r   Lower for MTBF  Upper for MTBF  Lower for MTBF  Upper for MTBF
0 0.6213 - 0.4343 -
1 0.3340 4.4814 0.2571 9.4912
2 0.4674 2.4260 0.3758 3.7607
3 0.5440 1.9543 0.4490 2.7222
4 0.5952 1.7416 0.5004 2.2926
5 0.6324 1.6184 0.5391 2.0554
6 0.6611 1.5370 0.5697 1.9036
7 0.6841 1.4788 0.5947 1.7974
8 0.7030 1.4347 0.6156 1.7182
9 0.7189 1.4000 0.6335 1.6567
10 0.7326 1.3719 0.6491 1.6074
11 0.7444 1.3485 0.6627 1.5668
12 0.7548 1.3288 0.6749 1.5327
13 0.7641 1.3118 0.6857 1.5036
14 0.7724 1.2970 0.6955 1.4784
15 0.7799 1.2840 0.7045 1.4564
20 0.8088 1.2367 0.7395 1.3769
25 0.8288 1.2063 0.7643 1.3267
30 0.8436 1.1848 0.7830 1.2915
35 0.8552 1.1687 0.7978 1.2652
40 0.8645 1.1560 0.8099 1.2446
45 0.8722 1.1456 0.8200 1.2280
50 0.8788 1.1371 0.8286 1.2142
75 0.9012 1.1090 0.8585 1.1694
100 0.9145 1.0929 0.8766 1.1439
500 0.9614 1.0401 0.9436 1.0603
Confidence Interval Factors to Multiply MTBF Estimate
                 90 % confidence                 95 % confidence
Num Fails r   Lower for MTBF  Upper for MTBF  Lower for MTBF  Upper for MTBF
0 0.3338 - 0.2711 -
1 0.2108 19.4958 0.1795 39.4978
2 0.3177 5.6281 0.2768 8.2573
3 0.3869 3.6689 0.3422 4.8491
4 0.4370 2.9276 0.3906 3.6702
5 0.4756 2.5379 0.4285 3.0798
6 0.5067 2.2962 0.4594 2.7249
7 0.5324 2.1307 0.4853 2.4872
8 0.5542 2.0096 0.5075 2.3163
9 0.5731 1.9168 0.5268 2.1869
10 0.5895 1.8432 0.5438 2.0853
11 0.6041 1.7831 0.5589 2.0032
12 0.6172 1.7330 0.5725 1.9353
13 0.6290 1.6906 0.5848 1.8781
14 0.6397 1.6541 0.5960 1.8291
15 0.6494 1.6223 0.6063 1.7867
20 0.6882 1.5089 0.6475 1.6371
25 0.7160 1.4383 0.6774 1.5452
30 0.7373 1.3893 0.7005 1.4822
35 0.7542 1.3529 0.7190 1.4357
40 0.7682 1.3247 0.7344 1.3997
45 0.7800 1.3020 0.7473 1.3710
50 0.7901 1.2832 0.7585 1.3473
75 0.8252 1.2226 0.7978 1.2714
100 0.8469 1.1885 0.8222 1.2290
500 0.9287 1.0781 0.9161 1.0938
Confidence Interval Equation and "Zero Fails" Case
Formulas for confidence bound factors - even for "zero fails" case
Confidence bounds for the typical Type I censoring situation are obtained
from chi-square distribution tables or programs. The formula for calculating
confidence intervals is

    MTBF-hat × 2r / χ²(1-α/2; 2(r+1))  ≤  true MTBF  ≤  MTBF-hat × 2r / χ²(α/2; 2r)

In this formula, χ²(α/2; 2r) is a value that the chi-square statistic with 2r
degrees of freedom is less than with probability α/2. In other words, the
left-hand tail of the distribution has probability α/2. An even simpler
version of this formula can be written using T = the total unit test time:

    2T / χ²(1-α/2; 2(r+1))  ≤  true MTBF  ≤  2T / χ²(α/2; 2r)
These bounds are exact for the case of one or more repairable systems on
test for a fixed time. They are also exact when non repairable units are on
test for a fixed time and failures are replaced with new units during the
course of the test. For other situations, they are approximate.
When there are zero failures during the test or operation time, only a (one-
sided) MTBF lower bound exists, and this is given by

    MTBF_lower = T / (-ln α)

The interpretation of this bound is the following: if the true MTBF were
any lower than MTBF_lower, we would have seen at least one failure during
T hours of test with probability at least 1-α. Therefore, we are 100(1-α) %
confident that the true MTBF is not lower than MTBF_lower.
Calculation of confidence limits
A one-sided, lower 100(1-α/2) % confidence bound for the MTBF is given
by

    LOWER = 2T / G⁻¹(1-α/2, [2(r+1)])

where T is the total unit or system test time, r is the total number of
failures, and G(q, ν) is the χ² distribution function with shape parameter ν.
A one-sided, upper 100(1-α/2) % confidence bound for the MTBF is given
by

    UPPER = 2T / G⁻¹(α/2, [2r])

The two bounds together, (LOWER, UPPER), are a 100(1-α) % two-sided
confidence interval for the true MTBF.
Please use caution when using CDF and inverse CDF functions in
commercial software because some functions require left-tail probabilities
and others require right-tail probabilities. In the left-tail case, α/2 is used
for the upper bound because 2T is being divided by the smaller percentile,
and 1-α/2 is used for the lower bound because 2T is divided by the larger
percentile. For the right-tail case, 1-α/2 is used to compute the upper bound
and α/2 is used to compute the lower bound. Our formulas for G⁻¹(q, ν)
assume the inverse CDF function requires left-tail probabilities.
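As an illustration, a minimal R sketch of these bounds (R's qchisq takes
left-tail probabilities, so it plays the role of G⁻¹ here; the function name
mtbf.bounds is only a placeholder):

# Exact MTBF bounds for a system (or units) on test for a fixed time T.total
# with r observed failures; conf is the two-sided confidence level, and each
# bound by itself is a one-sided 100(1 - alpha/2) % limit.
mtbf.bounds <- function(T.total, r, conf = 0.90) {
  alpha <- 1 - conf
  lower <- 2 * T.total / qchisq(1 - alpha / 2, df = 2 * (r + 1))
  upper <- if (r > 0) 2 * T.total / qchisq(alpha / 2, df = 2 * r) else NA
  c(lower = lower, upper = upper)
}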
Example
Example showing how to calculate confidence limits
A system was observed for two calendar months of operation, during which
time it was in operation for 800 hours and had 2 failures.
The MTBF estimate is 800/2 = 400 hours. A 90 %, two-sided confidence
interval is given by (400 × 0.3177, 400 × 5.6281) = (127, 2251). The same
interval could have been obtained using

    LOWER = 1600 / G⁻¹(0.95, 6)
    UPPER = 1600 / G⁻¹(0.05, 4)

Note that 127 is a 95 % lower limit for the true MTBF. The customer is
usually only concerned with the lower limit, and one-sided lower limits are
often used for statements of contractual requirements.
Zero fails confidence limit calculation
What could we have said if the system had no failures? For a 95 % lower
confidence limit on the true MTBF, we either use the 0 failures factor from
the 90 % confidence interval table and calculate 800 × 0.3338 = 267, or we
use T/(-ln α) = 800/(-ln 0.05) = 267.
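For reference, a short R sketch that reproduces the numbers in this example
(object names are illustrative):

T.total <- 800                                # unit test hours
r <- 2                                        # observed failures
T.total / r                                   # MTBF estimate: 400 hours
2 * T.total / qchisq(0.95, df = 2 * (r + 1))  # 90 % interval, lower bound: 127
2 * T.total / qchisq(0.05, df = 2 * r)        # 90 % interval, upper bound: 2251
T.total / (-log(0.05))                        # zero-failures 95 % lower bound: 267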
The analyses in this section can be implemented using both Dataplot
code and R code.
8. Assessing Product Reliability
8.4. Reliability Data Analysis
8.4.5. How do you fit system repair rate models?
8.4.5.2. Power law (Duane) model
The Power Law (Duane) model has been very successful in modeling industrial reliability improvement data
Brief Review of Power Law Model and Duane Plots
Recall that the Power Law is a NHPP with the expected number of
fails, M(t), and the repair rate, M'(t) = m(t), given by:

    M(t) = a t^b ,   m(t) = M'(t) = a b t^(b-1) .

The parameter β = 1 - b is called the Reliability Growth Slope, and
typical industry values for growth slopes during reliability
improvement tests are in the 0.3 to 0.6 range.
If a system is observed for a fixed time of T hours and failures
occur at times t_1, t_2, t_3, ..., t_r (with the start of the test or
observation period being time 0), a Duane plot is a plot of (t_i / i)
versus t_i on log-log graph paper. If the data are consistent with a
Power Law model, the points in a Duane Plot will roughly follow a
straight line with slope β and intercept (where t = 1 on the log-log
paper) of -log10 a.
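As an illustration, a minimal R sketch of a Duane plot, using the Case
Study 1 failure times analyzed later in this section:

t <- c(5, 40, 43, 175, 389, 712, 747, 795, 1299, 1478)  # failure times (hours)
cum.mtbf <- t / seq_along(t)                            # t_i / i
plot(t, cum.mtbf, log = "xy",
     xlab = "Time of failure (hours)", ylab = "Cumulative MTBF (hours)")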
MLE's for the Power Law model are given
Estimates for the Power Law Model
Computer aided graphical estimates can easily be obtained by
doing a regression fit of Y = ln (t_i / i) versus X = ln t_i. The slope is
the β-hat estimate and e^(-intercept) is the a estimate. The estimate
of b is 1 - β-hat.
However, better estimates can easily be calculated. These are
modified maximum likelihood estimates (corrected to eliminate
bias). The formulas are given below for a fixed time of T hours,
and r failures occurring at times t_1, t_2, t_3, ..., t_r:

    b-hat = (r - 1) / [ln(T/t_1) + ln(T/t_2) + ... + ln(T/t_r)]
    β-hat = 1 - b-hat
    a-hat = r / T^(b-hat)
The estimated MTBF at the end of the test (or observation) period
is

    MTBF-hat at end of test = T / (r × b-hat) = T / [r × (1 - β-hat)] .
Approximate confidence bounds for the MTBF at end of test are given
Approximate Confidence Bounds for the MTBF at End of Test
We give an approximate 100(1-α) % confidence interval (M_L, M_U) for
the MTBF at the end of the test. Note that M_L is a 100(1-α/2) %
one-sided lower confidence bound and M_U is a 100(1-α/2) % one-sided
upper confidence bound. The formulas are functions of z_(1-α/2), the
100(1-α/2) percentile point of the standard normal distribution.
Case Study 1: Reliability Improvement Test Data Continued
Fitting the power law model to case study 1 failure data
This case study was introduced in section 2, where we did various
plots of the data, including a Duane Plot. The case study was
continued when we discussed trend tests and verified that
significant improvement had taken place. Now we will complete
the case study data analysis.
The observed failure times were: 5, 40, 43, 175, 389, 712, 747, 795,
1299 and 1478 hours, with the test ending at 1500 hours. We
estimate β, a, and the MTBF at the end of test, along with a
100(1-α) % confidence interval for the true MTBF at the end of
test (assuming, of course, that the Power Law model holds). The
parameters and confidence intervals for the power law model were
estimated to be the following:
Estimate of β = 0.5165
Estimate of a = 0.2913
Estimate of MTBF at the end of the test = 310.234
80 % two-sided confidence interval:
(157.7139 , 548.5565)
90 % one-sided lower confidence limit = 157.7139
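A minimal R sketch of the bias-corrected estimates described above, applied
to these failure times (it reproduces the point estimates; the approximate
confidence bounds are not computed here):

t <- c(5, 40, 43, 175, 389, 712, 747, 795, 1299, 1478)  # failure times (hours)
T.end <- 1500                                           # test ended at 1500 hours
r <- length(t)
b.hat    <- (r - 1) / sum(log(T.end / t))               # about 0.4835
beta.hat <- 1 - b.hat                                   # growth slope: 0.5165
a.hat    <- r / T.end^b.hat                             # 0.2913
mtbf.end <- T.end / (r * b.hat)                         # MTBF at end of test: 310.2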
The analyses in this section can be implemented using R code.
8. Assessing Product Reliability
8.4. Reliability Data Analysis
8.4.5. How do you fit system repair rate models?
8.4.5.3. Exponential law model
Estimates of the parameters of the Exponential Law model can be obtained from either a graphical procedure or maximum likelihood estimation
Recall from section 1 that the Exponential Law refers to a
NHPP process with repair rate M'(t) = m(t) = e^(α + βt). This
model has not been used nearly as much in industrial
applications as the Power Law model, and it is more difficult
to analyze. Only a brief description will be given here.
Since the expected number of failures is given by
M(t) = (e^α / β)(e^(βt) - 1), so that ln M(t) is approximately
α - ln β + βt, a plot of the cum fails versus time of failure on a
log-linear scale should roughly follow a straight line with slope β.
Doing a regression fit of y = ln cum fails versus x = time of failure
will provide estimates of the slope β and the intercept α - ln β.
Alternatively, maximum likelihood estimates can be obtained
by solving a pair of likelihood equations.
The first equation is non-linear and must be solved iteratively
to obtain the maximum likelihood estimate for β. Then, this
estimate is substituted into the second equation to solve for
the maximum likelihood estimate for α.
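As a sketch only, assuming the log-linear intensity m(t) = e^(α + βt) given
above, the likelihood equations can be reduced to a single non-linear
equation in β and solved numerically in R; the function name explaw.mle
and the search interval are illustrative choices, not Handbook code.

# Failure times t observed over a fixed period [0, T.end].
# The profile likelihood equation for beta is solved with uniroot,
# then alpha follows from the remaining likelihood equation.
explaw.mle <- function(t, T.end) {
  r <- length(t)
  eq.beta <- function(b) {
    sum(t) - r * (T.end * exp(b * T.end) / (exp(b * T.end) - 1) - 1 / b)
  }
  # the starting interval may need to be widened for a given data set
  beta.hat  <- uniroot(eq.beta, interval = c(-0.01, 0.01), extendInt = "yes")$root
  alpha.hat <- log(r * beta.hat / (exp(beta.hat * T.end) - 1))
  c(alpha = alpha.hat, beta = beta.hat)
}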
8. Assessing Product Reliability
8.4. Reliability Data Analysis
8.4.6. How do you estimate reliability using the Bayesian gamma
prior model?
The Bayesian paradigm was introduced in Section 1 and Section 2 described the assumptions
underlying the gamma/exponential system model (including several methods to transform
prior data and engineering judgment into gamma prior parameters "a" and "b"). Finally, we
saw in Section 3 how to use this Bayesian system model to calculate the required test time
needed to confirm a system MTBF at a given confidence level.
Review of Bayesian procedure for the gamma exponential system model
The goal of Bayesian reliability procedures is to obtain as accurate a posterior distribution as
possible, and then use this distribution to calculate failure rate (or MTBF) estimates with
confidence intervals (called credibility intervals by Bayesians). The figure below
summarizes the steps in this process.
How to estimate the MTBF with bounds, based on the posterior distribution
Once the test has been run, and r failures observed, the posterior gamma parameters are:

    a' = a + r,  b' = b + T

and a (median) estimate for the MTBF is calculated by

    1 / G⁻¹(0.5, a', (1/b'))
where G(q, a, b) represents the gamma distribution function with shape parameter a and scale
parameter b. Some people prefer to use the reciprocal of the mean of the posterior distribution
as their estimate for the MTBF. The mean is the minimum mean square error (MSE)
estimator of λ, but using the reciprocal of the mean to estimate the MTBF is always more
conservative than the "even money" 50 % estimator.
A lower 80 % bound for the MTBF is obtained from

    1 / G⁻¹(0.8, a', (1/b'))

and, in general, a lower 100(1-α) % bound is given by

    1 / G⁻¹((1-α), a', (1/b')).

A two-sided 100(1-α) % credibility interval for the MTBF is

    [1 / G⁻¹((1-α/2), a', (1/b')),  1 / G⁻¹((α/2), a', (1/b'))].

Finally, G((1/M), a', (1/b')) calculates the probability that the MTBF is greater than M.
Example
A Bayesian example to estimate the MTBF and calculate upper and lower bounds
A system has completed a reliability test aimed at confirming a 600 hour MTBF at an 80 %
confidence level. Before the test, a gamma prior with a = 2, b = 1400 was agreed upon, based
on testing at the vendor's location. Bayesian test planning calculations, allowing up to 2 new
failures, called for a test of 1909 hours. When that test was run, there actually were exactly
two failures. What can be said about the system?
The posterior gamma CDF has parameters a' = 4 and b' = 3309. The plot below shows CDF
values on the y-axis, plotted against 1/λ = MTBF on the x-axis. By going from a probability on
the y-axis, across to the curve and down to the MTBF, we can estimate any MTBF percentile
point.
The MTBF values are shown below.

    1 / G⁻¹(0.9, 4, (1/3309)) = 495 hours
    1 / G⁻¹(0.8, 4, (1/3309)) = 600 hours (as expected)
    1 / G⁻¹(0.5, 4, (1/3309)) = 901 hours
    1 / G⁻¹(0.1, 4, (1/3309)) = 1897 hours
The test has confirmed a 600 hour MTBF at 80 % confidence, a 495 hour MTBF at 90 %
confidence, and (495, 1897) is an 80 % credibility interval for the MTBF. A single number
(point) estimate for the system MTBF would be 901 hours. Alternatively, you might want to
use the reciprocal of the mean of the posterior distribution (b'/a') = 3309/4 = 827 hours as a
single estimate. The reciprocal mean is more conservative; in this case it is a 57 % lower
bound (G((4/3309), 4, (1/3309)) = 0.57).
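A minimal R sketch reproducing these posterior quantities (qgamma and
pgamma are given the shape a' and the rate b', which is equivalent to a
scale of 1/b'):

a.post <- 2 + 2        # a' = a + r
b.post <- 1400 + 1909  # b' = b + T = 3309
1 / qgamma(c(0.9, 0.8, 0.5, 0.1), shape = a.post, rate = b.post)  # 495, 600, 901, 1897 hours
b.post / a.post                                                   # reciprocal-mean estimate: 827 hours
pgamma(a.post / b.post, shape = a.post, rate = b.post)            # about 0.57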
The analyses in this section can be implemented using R code.
8. Assessing Product Reliability
8.4. Reliability Data Analysis
8.4.7. References For Chapter 8: Assessing
Product Reliability
Aitchison, J. and Brown, J. A. C. (1957), The Lognormal Distribution,
Cambridge University Press, New York and London.
Ascher, H. (1981), "Weibull Distribution vs Weibull Process,"
Proceedings Annual Reliability and Maintainability Symposium, pp. 426-431.
Ascher, H. and Feingold, H. (1984), Repairable Systems Reliability,
Marcel Dekker, Inc., New York.
Bain, L.J. and Englehardt, M. (1991), Statistical Analysis of Reliability
and Life-Testing Models: Theory and Methods, 2nd ed., Marcel Dekker,
New York.
Barlow, R. E. and Proschan, F. (1975), Statistical Theory of Reliability
and Life Testing, Holt, Rinehart and Winston, New York.
Birnbaum, Z.W., and Saunders, S.C. (1968), "A Probabilistic
Interpretation of Miner's Rule," SIAM Journal of Applied Mathematics,
Vol. 16, pp. 637-652.
Birnbaum, Z.W., and Saunders, S.C. (1969), "A New Family of Life
Distributions," Journal of Applied Probability, Vol. 6, pp. 319-327.
Cox, D.R. and Lewis, P.A.W. (1966), The Statistical Analysis of Series
of Events, John Wiley & Sons, Inc., New York.
Cox, D.R. (1972), "Regression Models and Life Tables," Journal of the
Royal Statistical Society, B 34, pp. 187-220.
Cox, D. R., and Oakes, D. (1984), Analysis of Survival Data, Chapman
and Hall, London, New York.
Crow, L.H. (1974), "Reliability Analysis for Complex Repairable
Systems", Reliability and Biometry, F. Proschan and R.J. Serfling, eds.,
SIAM, Philadelphia, pp. 379-410.
Crow, L.H. (1975), "On Tracking Reliability Growth," Proceedings
Annual Reliability and Maintainability Symposium, pp. 438-443.
Crow, L.H. (1982), "Confidence Interval Procedures for the Weibull
Process With Applications to Reliability Growth," Technometrics,
24(1):67-72.
Crow, L.H. (1990), "Evaluating the Reliability of Repairable Systems,"
Proceedings Annual Reliability and Maintainability Symposium, pp. 275-279.
Crow, L.H. (1993), "Confidence Intervals on the Reliability of
Repairable Systems," Proceedings Annual Reliability and Maintainability
Symposium, pp. 126-134.
Duane, J.T. (1964), "Learning Curve Approach to Reliability
Monitoring," IEEE Transactions On Aerospace, 2, pp. 563-566.
Gumbel, E. J. (1954), Statistical Theory of Extreme Values and Some
Practical Applications, National Bureau of Standards Applied
Mathematics Series 33, U.S. Government Printing Office, Washington,
D.C.
Hahn, G.J., and Shapiro, S.S. (1967), Statistical Models in Engineering,
John Wiley & Sons, Inc., New York.
Hoyland, A., and Rausand, M. (1994), System Reliability Theory, John
Wiley & Sons, Inc., New York.
Johnson, N.L., Kotz, S. and Balakrishnan, N. (1994), Continuous
Univariate Distributions Volume 1, 2nd edition, John Wiley & Sons, Inc.,
New York.
Johnson, N.L., Kotz, S. and Balakrishnan, N. (1995), Continuous
Univariate Distributions Volume 2, 2nd edition, John Wiley & Sons, Inc.,
New York.
Kaplan, E.L., and Meier, P. (1958), "Nonparametric Estimation From
Incomplete Observations," Journal of the American Statistical
Association, 53: 457-481.
Kalbfleisch, J.D., and Prentice, R.L. (1980), The Statistical Analysis of
Failure Data, John Wiley & Sons, Inc., New York.
Kielpinski, T.J., and Nelson, W. (1975), "Optimum Accelerated Life-Tests
for the Normal and Lognormal Life Distributions," IEEE Transactions on
Reliability, Vol. R-24, 5, pp. 310-320.
Klinger, D.J., Nakada, Y., and Menendez, M.A. (1990), AT&T Reliability
Manual, Van Nostrand Reinhold, Inc., New York.
Kolmogorov, A.N. (1941), "On A Logarithmic Normal Distribution Law
Of The Dimensions Of Particles Under Pulverization," Dokl. Akad Nauk,
USSR 31, 2, pp. 99-101.
Kovalenko, I.N., Kuznetsov, N.Y., and Pegg, P.A. (1997), Mathematical
Theory of Reliability of Time Dependent Systems with Practical
Applications, John Wiley & Sons, Inc., New York.
Landzberg, A.H., and Norris, K.C. (1969), "Reliability of Controlled
Collapse Interconnections." IBM Journal Of Research and Development,
Vol. 13, 3.
Lawless, J.F. (1982), Statistical Models and Methods For Lifetime Data,
John Wiley & Sons, Inc., New York.
Leon, R. (1997-1999), JMP Statistical Tutorials on the Web at
http://www.nist.gov/cgi-bin/exit_nist.cgi?
url=http://web.utk.edu/~leon/jmp/.
Mann, N.R., Schafer, R.E. and Singpurwalla, N.D. (1974), Methods For
Statistical Analysis Of Reliability & Life Data, John Wiley & Sons, Inc.,
New York.
Martz, H.F., and Waller, R.A. (1982), Bayesian Reliability Analysis,
Krieger Publishing Company, Malabar, Florida.
Meeker, W.Q., and Escobar, L.A. (1998), Statistical Methods for
Reliability Data, John Wiley & Sons, Inc., New York.
Meeker, W.Q., and Hahn, G.J. (1985), "How to Plan an Accelerated Life
Test - Some Practical Guidelines," ASC Basic References In Quality
Control: Statistical Techniques - Vol. 10, ASQC, Milwaukee,
Wisconsin.
Meeker, W.Q., and Nelson, W. (1975), "Optimum Accelerated Life-Tests
for the Weibull and Extreme Value Distributions," IEEE Transactions on
Reliability, Vol. R-24, 5, pp. 321-322.
Michael, J.R., and Schucany, W.R. (1986), "Analysis of Data From
Censored Samples," Goodness of Fit Techniques, ed. by D'Agostino,
R.B., and Stephens, M.A., Marcel Dekker, New York.
MIL-HDBK-189 (1981), Reliability Growth Management, U.S.
Government Printing Office.
MIL-HDBK-217F (1986), Reliability Prediction of Electronic
Equipment, U.S. Government Printing Office.
MIL-STD-1635 (EC) (1978), Reliability Growth Testing, U.S.
Government Printing Office.
Nelson, W. (1990), Accelerated Testing, John Wiley & Sons, Inc., New
York.
Nelson, W. (1982), Applied Life Data Analysis, John Wiley & Sons, Inc.,
New York.
O'Connor, P.D.T. (1991), Practical Reliability Engineering (Third
Edition), John Wiley & Sons, Inc., New York.
Peck, D., and Trapp, O.D. (1980), Accelerated Testing Handbook,
Technology Associates and Bell Telephone Laboratories, Portola, Calif.
Pore, M., and Tobias, P. (1998), "How Exact are 'Exact' Exponential
System MTBF Confidence Bounds?", 1998 Proceedings of the Section on
Physical and Engineering Sciences of the American Statistical
Association.
SEMI E10-0701, (2001), Standard For Definition and Measurement of
Equipment Reliability, Availability and Maintainability (RAM),
Semiconductor Equipment and Materials International, Mountain View,
CA.
Tobias, P. A., and Trindade, D. C. (1995), Applied Reliability, 2nd
edition, Chapman and Hall, London, New York.