One Way ANOVA PDF
Summary of Video
A vase filled with coins takes center stage as the video begins. Students will be taking part
in an experiment organized by psychology professor John Kelly in which they will guess the
amount of money in the vase. As a subterfuge for the real purpose of the experiment, students
are told that they are taking part in a study to test the theory of the “Wisdom of the Crowd,”
which is that the average of all of the guesses will probably be more accurate than most of the
individual guesses. However, the real purpose of the study is to see whether holding heavier
or lighter clipboards while estimating the amount of money in the jar will have an impact on
students’ guesses. The idea being tested is that physical experience can influence our thinking
in ways we are unaware of – this phenomenon is called embodied cognition.
The sheet on which students will record their monetary guesses is clipped onto a clipboard.
For the actual experiment, clipboards, each holding varying amounts of paper, weigh either
one pound, two pounds or three pounds. Students are randomly assigned to clipboards and
are unaware of any difference in the clipboards. After the data are collected, guesses are
entered into a computer program and grouped according to the weights of the clipboards. The
mean guess for each group is computed and the output is shown in Table 31.1.
Money Guesses
Clipboard weight (lb)   Mean       N     Std. deviation
1                       $106.56    75    $100.62
2                       $129.79    75    $204.95
3                       $143.29    75    $213.13
Total                   $126.55    225   $180.16
Table 31.1. Average guesses by clipboard weight.
In this case, F = 0.796 with a p-value of 0.45. That means there is a 45% chance of getting an
F value at least this extreme when there is no difference between the population means. So,
the data from this experiment do not provide sufficient evidence to reject the null hypothesis.
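The reported F can be checked from the summary statistics alone. The sketch below is a minimal illustration, not the software the class used. It reads the three numeric columns of Table 31.1 as group mean, group size, and standard deviation, which is an interpretation of the table. The function names are made up for this example, and the p-value shortcut is a closed form that holds only when there are three groups (numerator degrees of freedom equal to 2).

```python
def f_from_summary(ns, means, sds):
    """Return (F, df1, df2) for a one-way ANOVA from per-group summaries."""
    k = len(ns)
    n_total = sum(ns)
    grand_mean = sum(n * m for n, m in zip(ns, means)) / n_total
    # Between-groups mean square: spread of group means about the grand mean
    msg = sum(n * (m - grand_mean) ** 2 for n, m in zip(ns, means)) / (k - 1)
    # Within-groups mean square: pooled sample variance
    mse = sum((n - 1) * s ** 2 for n, s in zip(ns, sds)) / (n_total - k)
    return msg / mse, k - 1, n_total - k


def p_value_three_groups(f_stat, df2):
    """Upper-tail area of F(2, df2); this closed form is valid only for df1 = 2."""
    return (1 + 2 * f_stat / df2) ** (-df2 / 2)


# Values read from Table 31.1 (column meanings are an assumption)
ns = [75, 75, 75]
means = [106.56, 129.79, 143.29]
sds = [100.62, 204.95, 213.13]

f_stat, df1, df2 = f_from_summary(ns, means, sds)
p = p_value_three_groups(f_stat, df2)
print(f"F = {f_stat:.3f} on {df1} and {df2} df, p = {p:.2f}")
```

Under these assumptions the computed values should agree with the F and p-value quoted above.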
One of the underlying assumptions of ANOVA is that the data in each group are normally
distributed. However, the boxplots in Figure 31.2 indicate that the data are skewed and include
some rather extreme outliers. John’s students tried some statistical manipulations on the data to
make them more normal and reran the ANOVA. However, the conclusion remained the same.
[Figure 31.2: boxplots of MoneyGuess (vertical axis, $0.00 to $1,600.00) by Clipboard Weight (1, 2, 3).]
But what if we used the data displayed in Figure 31.3 instead? The sample means are the same,
around $107, $130, and $143, but this time the data are less spread out about those means.
[Figure 31.3: plot of MoneyGuess (vertical axis, 50 to 200) by Clipboard Weight (1, 2, 3).]
In this case, after running ANOVA, the result is F = 33.316 with a p-value that is essentially
zero. Our conclusion is to reject the null hypothesis and conclude that the population means
are significantly different.
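To see why less spread yields a larger F when the sample means stay the same, the sketch below holds the Table 31.1 means fixed and recomputes F for several common within-group standard deviations. The standard deviations tried here and the group size of 75 are assumptions chosen for illustration; the data behind Figure 31.3 are not given.

```python
means = [106.56, 129.79, 143.29]   # sample means from Table 31.1
n = 75                             # assumed group size, as in Table 31.1
k = len(means)

grand_mean = sum(means) / k        # equal group sizes, so a simple average works
msg = n * sum((m - grand_mean) ** 2 for m in means) / (k - 1)

f_by_sd = {}
for sd in (200, 100, 50, 25):      # hypothetical common standard deviations
    mse = sd ** 2                  # equal sizes and equal sds: MSE equals sd squared
    f_by_sd[sd] = msg / mse
    print(f"sd = {sd:>3}: F = {f_by_sd[sd]:.2f}")
```

The between-groups mean square never changes in this loop; only the denominator shrinks, so halving the within-group standard deviation multiplies F by four.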
In John’s experiment, the harsh reality of a rigorous statistical analysis has shot down the idea
that holding something heavy causes people, unconsciously, to make larger estimates, at least
in this particular study. But if the real experiment didn’t work, what about the cover story – the
theory of the Wisdom of the Crowd? The actual amount in the vase is $237.52. Figure 31.4
shows a histogram of all the guesses. The mean of the estimates is $129.22 – more than $100
off, but still better than about three-quarters of the individual guesses. So, the crowd was wiser
than the people in it.
B. Be able to identify the factor(s) and response variable from a description of an experiment.
D. Know how to compute the F statistic and determine its degrees of freedom given the
following summary statistics: sample sizes, sample means and sample standard deviations.
Be able to use technology to compute the p-value for F.
F. Recognize that statistically significant differences among population means depend on the
size of the differences among the sample means, the amount of variation within the samples,
and the sample sizes.
G. Recognize when underlying assumptions for ANOVA are reasonably met so that it is
appropriate to run an ANOVA.
For example, suppose a statistics class wanted to test whether or not the amount of caffeine
consumed affected memory. The variable caffeine is called a factor and students wanted
to study how three levels of that factor affected the response variable, memory. Fifteen
students were recruited to take part in the study. The participants were divided into three
groups of five and randomly assigned to one of the following drinks:
After drinking the caffeinated beverage, the participants were given a memory test (words
remembered from a list). The results are given in Table 31.2.
Table 31.2. Number of words recalled in memory test.
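Random assignment of participants to the levels of a factor can be sketched in a few lines. This is only an illustration: the participant IDs, the function name, and the seed are hypothetical, and the total of 15 participants is inferred from the 12 error degrees of freedom reported for this experiment (N − 3 = 12).

```python
import random


def assign_groups(participants, labels, seed=None):
    """Shuffle participants and split them into equal-sized groups, one per label."""
    rng = random.Random(seed)
    shuffled = list(participants)
    rng.shuffle(shuffled)
    size = len(shuffled) // len(labels)
    return {
        label: shuffled[i * size:(i + 1) * size]
        for i, label in enumerate(labels)
    }


participants = [f"student_{i}" for i in range(1, 16)]   # hypothetical IDs
groups = assign_groups(participants, ["A", "B", "C"], seed=31)
for label, members in groups.items():
    print(label, members)
```

Shuffling first and then slicing gives every participant the same chance of landing in each group, which is what the randomization assumption of ANOVA requires.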
For an ANOVA, the null hypothesis is that the population means among the groups are the
same. In this case, $H_0: \mu_A = \mu_B = \mu_C$, where $\mu_A$ is the population mean number of words
recalled after people drink Coca Cola and similarly for $\mu_B$ and $\mu_C$. The alternative or research
hypothesis is that there is some inequality among the three means. Notice that there is a lot of
variation in the number of words remembered by the participants. We break that variation into
two components:
(1) variation in the number of words recalled among the three groups also called
between-groups variation
Unit 31: One-Way ANOVA | Student Guide | Page 5
(2) variation in number of words among participants within each group also called
within-groups variation.
To measure each of these components, we’ll compute two different variances, the mean
square for groups (MSG) and the mean square error (MSE). The basic idea in gathering
evidence to reject the null hypothesis is to show that the between-groups variation is
substantially larger than the within-groups variation and we do that by forming the ratio, which
we call F:

$$F = \frac{\text{MSG}}{\text{MSE}}$$
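In the caffeine example, we have three groups. More generally, suppose there were k different
groups (each assigned to consume varying amounts of caffeine) with sample sizes $n_1, n_2, \ldots, n_k$.
Then the null hypothesis is $H_0: \mu_1 = \mu_2 = \cdots = \mu_k$ and the alternative hypothesis is that
at least two of the population means differ. The formulas for computing the between-groups
variation and within-groups variation are given below:

$$\text{MSG} = \frac{n_1(\bar{x}_1 - \bar{x})^2 + n_2(\bar{x}_2 - \bar{x})^2 + \cdots + n_k(\bar{x}_k - \bar{x})^2}{k - 1}$$

$$\text{MSE} = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2 + \cdots + (n_k - 1)s_k^2}{N - k}$$

where $\bar{x}$ is the mean of all the observations, $\bar{x}_1, \bar{x}_2, \ldots, \bar{x}_k$ are the
sample means for each group, $s_1, s_2, \ldots, s_k$ are the sample standard deviations for each group,
and $N = n_1 + n_2 + \cdots + n_k$ is the total number of observations.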
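The two mean squares can also be computed directly from raw data. MSG weights each group mean's squared deviation from the grand mean by the group size and divides by k − 1; MSE pools the group sample variances, weighting each by its group size minus one, and divides by N − k. The sketch below is one way to do this with Python's standard library. The function name is hypothetical and the scores are made up for illustration; they are not the data in Table 31.2.

```python
from statistics import mean, variance


def one_way_anova(groups):
    """Return (MSG, MSE, F) for a list of samples, one list per group."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    # Between-groups variation: group means about the grand mean
    msg = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups) / (k - 1)
    # Within-groups variation: pooled sample variance
    mse = sum((len(g) - 1) * variance(g) for g in groups) / (n_total - k)
    return msg, mse, msg / mse


# Made-up scores for illustration only
groups = [[4, 5, 6], [6, 7, 8], [8, 9, 10]]
msg, mse, f_stat = one_way_anova(groups)
print(f"MSG = {msg}, MSE = {mse}, F = {f_stat}")
```

In this toy data set the group means are 5, 7, and 9 and every group has variance 1, so MSG is 12, MSE is 1, and F is 12.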
We return to our three-group caffeine experiment to see how this works. To begin, we
calculate the sample means and standard deviations (See Table 31.3.).
Table 31.3. Group means and standard deviations.
All that is left is to find the p-value. If the null hypothesis is true, then the F-statistic has the F
distribution with 2 and 12 degrees of freedom. We use software to see how likely it would be
to get an F value at least as extreme as 5.78. Figure 31.5 shows the result giving a p-value of
around 0.017. Since p < 0.05, we conclude that the amount of caffeine consumed affected the
mean memory score.
[Figure 31.5: density curve of the F distribution with df1 = 2 and df2 = 12; the area to the right of F = 5.78 is 0.01746.]
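The tail area for F = 5.78 can be approximated without a statistics package. The sketch below is a rough illustration under stated assumptions, not the software used in the text. It relies on the identity that the upper-tail probability of an F(df1, df2) variable equals a regularized incomplete beta function, and it evaluates that function by simple midpoint-rule integration. The function name and the step count are choices made for this example; in practice a library routine such as SciPy's `scipy.stats.f.sf` would normally be used instead.

```python
import math


def f_upper_tail(f_stat, df1, df2, steps=20000):
    """Approximate P(F >= f_stat) for an F(df1, df2) distribution.

    Uses P(F >= f) = I_t(df2/2, df1/2) with t = df2 / (df2 + df1 * f),
    where I_t is the regularized incomplete beta function, computed here
    by midpoint-rule integration of the Beta(df2/2, df1/2) density on [0, t].
    """
    a, b = df2 / 2, df1 / 2
    t_max = df2 / (df2 + df1 * f_stat)
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    h = t_max / steps
    total = 0.0
    for i in range(steps):
        t = (i + 0.5) * h   # midpoint of the i-th subinterval
        total += math.exp(
            log_norm + (a - 1) * math.log(t) + (b - 1) * math.log(1 - t)
        )
    return total * h


p = f_upper_tail(5.78, 2, 12)
print(f"p = {p:.4f}")
```

For 2 and 12 degrees of freedom the result should land close to the tail area of about 0.017 reported in the text.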
It takes a lot of work to compute F and find the p-value. Here’s where technology can help.
Statistical software such as Minitab, spreadsheet software such as Excel, and even graphing
calculators can calculate ANOVA tables. Table 31.4 shows output from Minitab. Now, match
the calculations above with the values in Table 31.4. Check out where you can find the values
for MSG, MSE, F, the degrees of freedom for F, and the p-value directly from the output of
ANOVA. That will be a time saver!
Table 31.4. ANOVA output from Minitab.
It is important to understand that ANOVA does not tell you which population means differ,
only that at least two of the means differ. We would have to use other tests to help us decide
which of the three population means are significantly different from each other. However,
we can also get a clue by plotting the data. Figure 31.6 shows comparative dotplots for the
number of words for each group. The sample means are marked with triangles. Notice that
the biggest difference in sample means is between groups A (34 mg caffeine) and C (160 mg
of caffeine). The sample means for groups B and C are quite close together. So, it looks as
if consuming Coca Cola doesn’t give the memory boost you could expect from consuming
coffee or Jolt Energy.
[Figure: comparative dotplots of Number of Words (horizontal axis, 6 to 18) for Groups A, B, and C.]
Figure 31.6. Comparative dotplots.
There is one last detail before jumping into running an ANOVA: there are some underlying
assumptions that need to be checked in order for the results of the analysis to be valid. What
we should have done first with our caffeine experiment, we will do last. Here are the three
things to check.
1. Each group’s data need to be an independent random sample from that
population. In the case of an experiment, the subjects need to be randomly
assigned to the levels of the factor.
Check: The subjects in the caffeine-memory experiment were divided into groups. Groups
were then randomly assigned to the level of caffeine.
2. The data in each group need to come from a normally distributed population.
Check: The normal quantile plots of Words Recalled for each group are shown in Figure
31.7. Based on these plots, it seems reasonable to assume these data are from a Normal
distribution.
[Figure 31.7: normal quantile plots of Words Recalled for each of the three groups (vertical axis: Percent).]
3. All populations have the same standard deviation. The results from ANOVA will be
approximately correct as long as the ratio of the largest standard deviation to the smallest
standard deviation is less than 2.
Check: The ratio of the largest to the smallest standard deviation is 2.236/1.789 or around
1.25, which is less than 2.
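The equal-standard-deviation rule of thumb is easy to automate. The sketch below uses a hypothetical helper name. It is applied first to the largest and smallest caffeine-group standard deviations quoted above, and then to the three values in the last column of Table 31.1, read here as the clipboard-group standard deviations (an interpretation of that table).

```python
def sd_ratio_check(sds, limit=2.0):
    """Return (ratio, ok): largest sd over smallest sd, and whether it is below limit."""
    ratio = max(sds) / min(sds)
    return ratio, ratio < limit


# Largest and smallest standard deviations from the caffeine experiment
caffeine_ratio, caffeine_ok = sd_ratio_check([2.236, 1.789])
print(f"caffeine: ratio = {caffeine_ratio:.2f}, rule satisfied: {caffeine_ok}")

# Clipboard groups, assuming the last column of Table 31.1 holds standard deviations
clip_ratio, clip_ok = sd_ratio_check([100.62, 204.95, 213.13])
print(f"clipboard: ratio = {clip_ratio:.2f}, rule satisfied: {clip_ok}")
```

Only the largest and smallest values matter for the ratio, so any standard deviation between them does not change the outcome. Under the stated reading of Table 31.1, the clipboard data fall just outside the rule, which fits the earlier remark that those data are skewed with extreme outliers.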
An analysis of variance or ANOVA is a method of inference used to test whether or not three
or more population means are equal. In a one-way ANOVA there is one factor that is thought
to be related to the response variable.
An analysis of variance tests the equality of means by comparing two types of variation,
between-groups variation and within-groups variation. Between-groups variation deals
with the spread of the group sample means about the grand mean, the mean of all the
observations. It is measured by the mean square for groups, MSG. Within-groups variation
deals with the spread of individual data values within a group about the group mean. It is
measured by the mean square error, MSE.
2. What was different about the clipboards that students were holding?
4. What is the name of the test statistic that results from ANOVA?
5. Was the professor able to conclude from the F-statistic that the population means differed
depending on the weight of the clipboard? Explain.
You will use the Wafer Thickness tool to collect data for this activity. There are three control
settings that affect wafer thickness during the manufacture of polished wafers used in the
production of microchips.
1. Leave Controls 2 and 3 set at level 2. Your first task will be to perform an experiment to
collect data and determine whether settings for Control 1 affect the mean thickness of polished
wafers.
a. Open the Wafer Thickness tool. Set Control 1 to level 1, and Controls 2 and 3 to level 2 (the
middle setting). In Real Time mode, collect data from 10 polished wafers. Store the data in a
statistical package or Excel spreadsheet or in a calculator list. Make a sketch of the histogram
produced by the interactive tool.
b. Set Control 1 to level 2. Leave Controls 2 and 3 set at level 2. Repeat (a). Sketch the second
histogram using the same scales as were used on the first. Store the data in your spreadsheet
or a calculator list.
c. Set Control 1 to level 3. Leave Controls 2 and 3 set at level 2. Repeat (a). Sketch your third
histogram, again using the same scales as were used on the first histogram. Store the data in
your spreadsheet or a calculator list.
d. Calculate the means and standard deviations for each of your three samples. Based on
the sample means and on your histograms, do you think that there is sufficient evidence that
changing the level of Control 1 changes the mean thickness of the polished wafers produced?
Or might these sample-mean differences be due simply to chance variation? Explain your
thoughts.
e. Use technology to run an ANOVA. State the null hypothesis being tested, the value of F, the
p-value, and your conclusion.
2. Your next task will be to perform an experiment to collect data and determine whether
settings for Control 2 affect the mean thickness of polished wafers.
a. Leave Controls 1 and 3 set at level 2. Adapt the process used in question 1(a – c) to collect
the data on Control 2.
c. Provided you answered yes to (b), use technology to run an ANOVA. State the null
hypothesis being tested, the value of F, the p-value, and your conclusion. (If you answered no
to (b), skip this part.)
3. Your final task will be to perform an experiment to collect data and determine whether
settings for Control 3 affect the mean thickness of polished wafers.
a. Leave Controls 1 and 2 set at level 2. Adapt the process used in question 1(a – c) to collect
the data on Control 3.
b. Compute the standard deviations for the three samples. Is the underlying assumption of
equal standard deviations reasonably satisfied? Explain.
c. Provided you answered yes to (b), use technology to run an ANOVA. State the null
hypothesis being tested, the value of F, the p-value, and your conclusion. (If you answered no
to (b), skip this part.)
Table 31.5. Test results.
a. Calculate the mean test score for each group. Calculate the standard deviation of the test
scores for each group.
b. Make comparative dotplots for the test results of the three groups. Do you think that the
dotplots give sufficient evidence that there is a difference in population mean test results
depending on the type of noise? Explain.
c. Run an ANOVA. State the hypotheses you are testing. Show the calculations for the
F-statistic. What are the degrees of freedom associated with this F-statistic?
2. Not all hotdogs have the same calories. Table 31.6 contains calorie data on a random
sample of Beef, Poultry, and Veggie dogs. (One extreme outlier for Veggie dogs was omitted
from the data.) Does the mean calorie count differ depending on the type of hotdog? You first
encountered this topic in Unit 5, Boxplots.
Table 31.6. Calorie content of hotdogs.
Table 31.7. First-year college GPA by high school rating.
a. Verify that the standard deviations allow the use of ANOVA to compare population means.
b. Use technology to run an ANOVA. State the value of the F-statistic, the degrees of freedom
for F, the p-value of the test, and your conclusion.
c. Make boxplots that compare the calorie data for each type of hotdog. Add a dot to each
boxplot to mark the sample means. Do your plots help confirm your conclusion in (b)?
3. Many states rate their high schools using factors such as students’ performance, teachers’
educational backgrounds, and socioeconomic conditions. High school ratings for one state
have been boiled down into three categories: high, medium, and low. The question for one of
the state universities is whether or not college grade performance differs depending on high
school rating. Table 31.7 contains random samples of students from each high school rating
level and their first-year cumulative college grade point averages (GPA).
a. Calculate the sample means for the GPAs in each group. Based on the sample means
alone, does high school rating appear to have an impact on mean college GPA? Explain.
b. Check to see that underlying assumptions for ANOVA are reasonably satisfied.
a. The sample mean ACL scores for nursing, other health professional students, and
education majors were 46.44, 45.58, and 48.59, respectively. Do these sample means provide
sufficient evidence to conclude that there was some difference in population mean ACL scores
among these three majors? Explain.
b. A one-way analysis of variance was run to determine if there was a difference among the
three groups on mean ACL scores. Assuming that all students answered the NSSE questions
related to ACL, what were the degrees of freedom of the F-test?
c. The results from the ANOVA gave F = 8.382. Determine the p-value. What can you
conclude?
Ratings for A Ratings for B Ratings for C Ratings for A Ratings for B Ratings for C
8 4 6 8 4 6
10 5 5 10 5 5
7 7 7 7 6 8
8 8 5 8 9 2
6 7 6 3 6 7
7 8 5 7 8 3
4 6 6 4 6 7
7 5 5 9 5 5
6 5 4 6 4 4
8 6 2 8 7 2
6 6 3 5 6 3
5 7 4 4 9 4
6 3 5 6 3 3
7 8 5 8 9 7
8 5 6 10 3 8
a. Find the sample means of each candy type based on the ratings in Table 31.8. Then do the
same for the ratings in Table 31.9. Based on these results, can you tell if there is a significant
difference in population mean ratings among the different types of candies? Explain.
b. Make comparative boxplots for the data in Table 31.8. Then do the same for the data in
Table 31.9. For both sets of plots, mark the mean with a dot on each boxplot. For which data
set is it more likely that the results from a one-way ANOVA will be significant? Explain.
c. Run an ANOVA based on Data Set #1. Report the value of the F-statistic, the p-value, and
your conclusion. Then do the same for Data Set #2. Explain why you should not be surprised
by the results.
2. The data in Table 31.10 were part of a study to investigate online questionnaire design.
The researcher was interested in the effect that type of answer entry and type of question-to-
question navigation would have on the time it would take to complete online surveys. Twenty-
Display Type Navigation Type Time (sec) Display Type Navigation Type Time (sec)
1 1 97 1 3 117
3 3 83 2 2 74
1 1 102 1 3 66
3 3 85 3 1 62
1 1 92 1 2 93
3 3 71 3 1 62
1 2 105 1 2 64
3 3 92 3 1 48
1 2 67 1 2 57
3 3 71 3 1 96
1 2 54 2 3 68
3 3 66 3 1 90
1 3 63 2 3 71
2 1 61 3 1 74
1 3 101 2 3 74
2 1 117 3 2 78
1 3 124 2 3 92
2 1 97 3 2 71
2 1 126 2 3 80
3 2 83 3 2 49
2 1 107 2 3 67
3 2 88 1 1 101
2 1 88 2 2 111
3 2 62 1 1 103
2 2 55 2 2 80
1 3 73 1 1 103
2 2 126 2 2 111
b. Make comparative boxplots of the times for each level of Display Type. Mark the location
of the means on your boxplot. Do you see anything unusual in the data that might make it not
appropriate to use ANOVA? If so, follow up with normal quantile plots to check the assumption
of normality.
c. Run an ANOVA using Display Type as the factor. State the null hypothesis you are testing.
Report the value of the F-statistic, the p-value, and your conclusion.
d. Make comparative boxplots of the times for each level of Navigation Type. Mark the location
of the means on your boxplot. Do you see anything unusual in these data that might make
it not appropriate to use ANOVA? If so, follow up with normal quantile plots to check the
assumption of normality.
e. Run an ANOVA using Navigation Type as the factor. Report the value of the F-statistic, the
degrees of freedom of the F-statistic, the p-value, and your conclusion.
f. Based on this study, what recommendations would you make to online questionnaire
designers?
3. A group researching wage discrepancies among the four regions of the U.S. focused on full-
time, hourly-wage workers between the ages of 20 and 40. Researchers randomly selected
200 workers meeting the age criteria from the northeast, midwest, south and west and
recorded their hourly pay rates. The mean hourly rate for the combined regions was $15.467. A
summary of the data is given in Table 31.11. The researchers ran an ANOVA on these data.
Table 31.11. Summary of hourly rate data.
c. Calculate the value of the F-statistic and give its degrees of freedom. Show calculations.
e. Based on the evidence in Table 31.11 and your answers to (a – d), what conclusions can the
researchers make?
4. A study focusing on women’s wages was investigating whether there was a significant
difference in salaries in four occupations commonly (but not exclusively) held by women –
cashier, customer service representative, receptionist, and secretary/administrative assistant.
Weekly wages from 50 women working in each occupation are recorded in Table 31.12.
Table 31.12. Weekly wages of women in four occupations.
Data from 2012 March Supplement, Current Population Survey.
c. Run an ANOVA. Record the ANOVA table and highlight the value of F and the p-value.