SolExamW08 PDF
$1.80 (b)
(d)
$2.10
(e)
$2.20
5. Which of the following is the most likely value of the standard deviation
of the prices recorded?
(a)
(c)
$0.50
(d)
$1.50
(e)
$2.00
Use the four scatterplots below for questions 6-9. The four scatterplots
below are, as labelled, respectively plots of Y1 vs X1, Y2 vs X2, Y3 vs
X3 and Y4 vs X4.
For each scatterplot, choose the value of the associated correlation from
the five listed below. Note that you will not need to use all the five
listed values, and it is possible that some values may be used more than
once.
(a) 0.958
(b) 0.504
(c) 0.032
(d) 0.217
(e) 0.830
6. Y1 vs X1 (d)
7. Y2 vs X2 (e)
8. Y3 vs X3 (b)
9. Y4 vs X4 (a)
18. To test the null hypothesis that SFU undergraduates and UBC undergraduates tend to sleep the same, on average, during exam period, we
would need which one of the following?
(a) Tables for the Chi-squared distribution on 9 degrees of freedom.
(b) A table for the t distribution with 8 degrees of freedom.
(c) A table for the t distribution with 19 degrees of freedom.
(d) A table for an F distribution with degrees of freedom of the denominator equal to 19.
(e) A table for an F distribution with degrees of freedom of
the denominator equal to 18.
Use the following information for questions 19, 20 and 21:
The owner of a small clothing store is concerned that her average sales
each day are only $149, not enough to cover rent and salary. She decides
to try out some new window displays, to see if these will increase her
average sales. She buys the new window displays on trial. To decide
if she should keep the new displays, she collects sales data for 20 days
to test the null hypothesis that the daily expected sales are unchanged
(equal to $149) versus the alternative hypothesis that expected daily
sales are greater than $149.
19. Suppose that the displays really do work. If the store owner extends her
trial period from 20 days to 30 days, which statement most precisely
describes what can be said about the power of her test?
(a) The power would increase.
(b) The power would stay the same.
(c) The power would decrease.
(d) The power would remain zero.
(e) The power could be chosen to be 5%.
20. Suppose that, based on the data collected in the trial, the owner calculates a p-value of 0.04. This means
(a) there is a 4% chance that sales increased during the trial period.
(b) there is a 4% chance that sales decreased during the trial period.
(c) during the trial period, sales increased by 4%.
(d) during the trial period, sales decreased by 4%.
(e) during the trial period, her sales figures were pretty high,
if indeed the new displays typically would have no effect.
21. Suppose that, based on the data collected in the trial, the owner of the
store decides to keep the new displays. Then
(a) she is in danger of making a Type I error.
(b) she is in danger of making a Type II error.
(c) she is in danger of making a Type III error.
(d) she will get a bigger .
(e) she will get a smaller .
24. Three different labs tested two types of cream, A and B, recording the
percentage of solubility in some liquid. Each lab repeated each experiment, and the data are given below:
Lab    Cream type A    Cream type B
1      6.8, 6.6        5.3, 6.1
2      7.5, 7.4        7.2, 6.5
3      7.8, 9.1        8.8, 9.1
For questions 25-28, consider studying whether gender and the highest academic qualification obtained (none, high school diploma, bachelor's degree, post-graduate degree) are independent.
25. True or false? To study independence of gender and the highest qualification obtained, it would be useful to compare the four conditional distributions:
the conditional distribution of gender given no qualification was obtained,
the conditional distribution of gender given the highest qualification is a high school diploma,
the conditional distribution of gender given the highest qualification is a bachelor's degree,
the conditional distribution of gender given the highest qualification is a post-graduate degree.
(a) True
(b) False
26. True or false? To study independence of gender and the highest academic qualification obtained, it would be useful to construct a scatterplot.
(a) True
(b) False
27. True or false? To study independence of gender and the highest academic qualification obtained, it would be useful to calculate a correlation coefficient.
(a) True
(b) False
28. True or false? To study independence of gender and the highest academic qualification obtained, it would be useful to calculate a chi-square
statistic.
(a) True
(b) False
29. (6 marks) In his twenty seasons playing in the National Hockey League
(NHL), Wayne Gretzky played the following number of games per season:
79, 80, 80, 80, 74, 80, 80, 79, 64, 78,
73, 78, 74, 45, 81, 48, 80, 82, 82, 70
(a) Create a stem-and-leaf plot for these data.
8 | 0 0 0 0 0 1 0 2 2
7 | 9 4 9 8 3 8 4 0
6 | 4
5 |
4 | 5 8
(3 marks. 1 if empty stem omitted. 1 for any data points missing or misplaced. No need to order leaves on stems. Splitting
stems accepted.)
(b) Which of the following best describes the distribution? (Circle
one)
Symmetric
Left skewed
Right skewed
Uniform
(c) Identify any apparent outliers in the data, and provide a plausible
explanation for the value(s).
The values 45 and 48 are apparent outliers, being much lower
than the others. (1 mark. 0.5 if 64 included.) Possibly Gretzky was
injured in those seasons. (1 mark, and in fact correct. Other vaguely
plausible explanations, such as his being dropped from the team, were
also permitted.)
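The stem-and-leaf construction in part (a) can be sketched in Python as a cross-check. This is only an illustration: the function name `stem_and_leaf` is hypothetical, and the leaves are sorted here even though the marking scheme does not require it. Every stem between the smallest and largest is kept, so the empty stem 5 still appears.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Map each tens digit (stem) to its list of units digits (leaves).

    Every stem between the smallest and largest is kept, even when empty,
    so gaps in the data stay visible.
    """
    leaves = defaultdict(list)
    for v in values:
        leaves[v // 10].append(v % 10)
    stems = range(max(leaves), min(leaves) - 1, -1)  # largest stem first
    return {s: sorted(leaves.get(s, [])) for s in stems}

# Games played per season, copied from the question.
games = [79, 80, 80, 80, 74, 80, 80, 79, 64, 78,
         73, 78, 74, 45, 81, 48, 80, 82, 82, 70]

plot = stem_and_leaf(games)
for stem, leaf_list in plot.items():
    print(f"{stem} | {' '.join(map(str, leaf_list))}")
```

The long run of leaves on stems 7 and 8 with a thin tail down to stem 4 is what makes the two low seasons stand out in part (c).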
30. (10 marks) Recall that a Roulette wheel has 38 slots, labelled 0, 1, 2,
..., 36, and 00. I will play Roulette by betting on the slot labelled 00.
For one play of Roulette, I pay $1. If 00 comes up on the wheel, I
get my dollar back, plus $35, for a net gain of $35. If 00 does not
come up on the wheel, I lose my dollar, for a net gain of -$1. Let X
be my net gain in one play of Roulette.
(a) Find the probability distribution of X.
x           -1       35
P(X = x)    37/38    1/38
(2 marks)
(b) Find E (X) , the expected value of X.
\[ E(X) = 35 \cdot \frac{1}{38} + (-1) \cdot \frac{37}{38} = -\frac{2}{38} = -\frac{1}{19}. \]
(2 marks)
(c) Find Var(X) , the variance of X.
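Now
\[ E(X^2) = 35^2 \cdot \frac{1}{38} + (-1)^2 \cdot \frac{37}{38} = \frac{631}{19}. \]
Hence
\[ \mathrm{Var}(X) = \frac{631}{19} - \left(-\frac{1}{19}\right)^2 = \frac{11\,988}{361} \approx 33.21. \]
(3 marks. Alternatively Var(X) can be found directly via
\( E\left[\left(X + \frac{1}{19}\right)^2\right] \).)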
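As a check on parts (b) and (c), the mean and variance can be computed exactly with Python's `fractions` module. This is a minimal sketch: the dictionary `dist` simply encodes the distribution from part (a), and the variable names are illustrative.

```python
from fractions import Fraction

# Net gain x mapped to its probability, as in part (a).
dist = {35: Fraction(1, 38), -1: Fraction(37, 38)}

mean = sum(x * p for x, p in dist.items())               # E(X)
second_moment = sum(x**2 * p for x, p in dist.items())   # E(X^2)
variance = second_moment - mean**2                        # shortcut formula
variance_direct = sum((x - mean)**2 * p for x, p in dist.items())  # E[(X - E(X))^2]

print(mean, second_moment, variance, round(float(variance), 2))
```

Both routes to the variance agree, which is the point of the alternative mentioned in the marking note.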
(d) Suppose that I play Roulette 100 times, each time betting on 00.
Let W be my total winnings. What is E (W )? What is Var(W )?
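Think of
\[ W = X_1 + X_2 + \cdots + X_{100} \]
where each X_i has the distribution of X above. Then
\[ E(W) = 100\,E(X) = 100 \cdot \left(-\frac{1}{19}\right) = -\frac{100}{19}. \]
(1 mark)
Since the plays are independent,
\[ \mathrm{Var}(W) = 100\,\mathrm{Var}(X) = \frac{1\,198\,800}{361} \approx 3320.8. \]
(2 marks. 1 if 100^2 used instead of 100)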
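The two results in part (d) can be cross-checked by a different route. If K is the number of wins in 100 independent plays, then K is Binomial(100, 1/38) and W = 35K - (100 - K) = 36K - 100. The sketch below sums over the exact distribution of K. The helper name `pmf_wins` is hypothetical, and independence of the plays is an assumption of the model.

```python
from fractions import Fraction
from math import comb

n = 100                  # number of plays
p = Fraction(1, 38)      # chance that 00 comes up on one play

def pmf_wins(k):
    """P(K = k) for K ~ Binomial(n, p), computed exactly."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# k wins give a total net gain of 35k - (n - k) = 36k - n dollars.
e_w = sum((36 * k - n) * pmf_wins(k) for k in range(n + 1))
e_w2 = sum((36 * k - n)**2 * pmf_wins(k) for k in range(n + 1))
var_w = e_w2 - e_w**2

print(e_w, var_w, round(float(var_w), 1))
```

Scaling the variance by 100 rather than 100^2 is what distinguishes a sum of 100 independent plays from a single play with the stake multiplied by 100.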
31. (5 marks) Every day Lucky Louie plays a die roll game. He rolls a die
five times and counts the number of ones. If he rolls exactly two ones,
then he treats himself and buys a Barstucks Macchiato. That is the
only way he treats himself. Let X be the number of Macchiatos Lucky
Louie buys in the month of June, a month with thirty days. Then
X has a Binomial distribution defined by two parameters, denoted as
usual n and p.
(a) What is the value of n here?
30 (2 marks)
(b) What is the value of p?
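\[ p = P(\text{rolling exactly 2 ones from 5}) = \binom{5}{2}\left(\frac{1}{6}\right)^2\left(\frac{5}{6}\right)^3 = 10 \cdot \frac{1}{36} \cdot \frac{125}{216} = \frac{625}{3888} \approx 0.16075. \]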
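The value of p is itself a binomial probability (exactly 2 ones in 5 rolls), so it can be verified with a few lines of Python. The helper `binomial_pmf` is an illustrative name, not something defined in the exam.

```python
from fractions import Fraction
from math import comb

def binomial_pmf(k, n, p):
    """P(exactly k successes in n independent trials with success chance p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Daily chance of a treat: exactly 2 ones in 5 rolls of a fair die.
p_treat = binomial_pmf(2, 5, Fraction(1, 6))

print(p_treat, round(float(p_treat), 5))
```

Note the two layers: the inner binomial (5 rolls) gives p, while the outer binomial for X uses n = 30 days with that p.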
(a) Why did the researchers only allow the subjects to attempt the
task one at a time?
The dowsers would probably influence each other in their decisions,
the results then no longer being independent. (2 marks)
(b) Briefly explain why this experiment was not double-blind.
It is not possible for the experimenters to be blind to the knowledge
of which containers held water. (2 marks, or note that there is no
possible ambiguity in the scoring of the dowsers, so no argument
for the investigators being blinded.)
(c) One of the eight dowsers successfully determined the presence or
absence of water in all twelve containers. This proves false the
hypothesis that no-one has the genuine ability to dowse for water.
True or false? (Circle one)
True
False
(1 mark)
34. (9 marks) One definition of obesity is in terms of body mass index.
In a study of obesity in Vancouver fourth graders, random samples
of fourth graders were taken from each school and body mass indices
recorded. Here is a summary of the body mass index data from two
schools, Laura Secord and Charles Dickens.
School             Number of children measured    Average body mass    SD of the body masses
Laura Secord       7                              24.3                 3.1
Charles Dickens    6                              21.0                 2.9
(a) Carry out a hypothesis test to determine if the average body mass
of fourth graders at Laura Secord is equal to the average at Charles
Dickens. Test at the 0.05 significance level. Clearly state your test
statistic, the tables you use and show all calculations. State your
conclusion in the context of this problem.
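Use a two-sample t test with pooled standard deviation, since 7 and
6 are small and the two sample standard deviations are similar. (1
mark) The test statistic is
\[ t = \frac{\bar{x}_1 - \bar{x}_2}{SE(\bar{x}_1 - \bar{x}_2)}. \]
To calculate the SE:
\[ s_p^2 = \frac{(7-1)\,3.1^2 + (6-1)\,2.9^2}{7+6-2} = \frac{99.71}{11} = 9.064545, \qquad s_p = 3.010738, \]
\[ SE = s_p\sqrt{\frac{1}{7} + \frac{1}{6}} = 1.675020, \qquad t = \frac{\bar{x}_1 - \bar{x}_2}{SE} = \frac{24.3 - 21.0}{1.675020} = 1.970126. \]
With 7 + 6 - 2 = 11 degrees of freedom, the 0.05 two-sided rule rejects
the null hypothesis if |t| > 2.201, so we do not reject the null hypothesis.
Alternatively, without pooling,
\[ SE = \sqrt{\frac{3.1^2}{7} + \frac{2.9^2}{6}} = 1.665690, \qquad t = \frac{\bar{x}_1 - \bar{x}_2}{SE} = \frac{24.3 - 21.0}{1.665690} = 1.9811. \]
(2 marks) The 0.05 two-sided rule (t table, 5 degrees of freedom) rejects
the null hypothesis if |t| > 2.570582. (1 mark) Once again, we do not reject
the null hypothesis. (1 mark for conclusion as above.)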
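The arithmetic behind the two test statistics can be reproduced with the short sketch below. It assumes the sample sizes 7 and 6 that appear in the solution's formulas, and it takes the critical value 2.570582 from the solution rather than computing it, since the standard library has no t distribution.

```python
from math import sqrt

# Summary statistics; sample sizes 7 and 6 are those used in the solution.
n1, mean1, sd1 = 7, 24.3, 3.1   # Laura Secord
n2, mean2, sd2 = 6, 21.0, 2.9   # Charles Dickens

# Pooled version: one common variance estimate, n1 + n2 - 2 degrees of freedom.
sp2 = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2)
se_pooled = sqrt(sp2) * sqrt(1 / n1 + 1 / n2)
t_pooled = (mean1 - mean2) / se_pooled

# Unpooled version: each sample keeps its own variance.
se_unpooled = sqrt(sd1**2 / n1 + sd2**2 / n2)
t_unpooled = (mean1 - mean2) / se_unpooled

t_crit = 2.570582  # two-sided 0.05 cutoff quoted in the solution (5 df)
print(round(sp2, 6), round(t_pooled, 4), round(t_unpooled, 4))
print("reject H0:", abs(t_unpooled) > t_crit)
```

Both statistics fall just short of 2, so the conclusion does not depend on whether the variances are pooled.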
(b) Find a 95% confidence interval for the average body mass index
of fourth graders at Laura Secord School.
The 95% confidence interval is
\[ \bar{x} \pm t^{*}\,\frac{s}{\sqrt{7}} \]
where t* is from the t-table for a 95% confidence interval, degrees
of freedom = 6. So t* = 2.447 (1 mark) and the confidence interval
is
\[ 24.3 \pm 2.447 \cdot \frac{3.1}{\sqrt{7}} = 24.3 \pm 2.447 \cdot 1.171690 = 24.3 \pm 2.867 \]
or (21.4, 27.2). (2 marks. 1 if wrong school chosen.)
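A small sketch of the interval calculation follows. The multiplier 2.447 is taken from the solution's t-table lookup (6 degrees of freedom) rather than computed, and the variable names are illustrative.

```python
from math import sqrt

n, mean, sd = 7, 24.3, 3.1   # Laura Secord summary statistics
t_star = 2.447               # t-table value for 95% confidence, 6 df (from the solution)

standard_error = sd / sqrt(n)
margin = t_star * standard_error
lower, upper = mean - margin, mean + margin

print(round(standard_error, 6), round(lower, 1), round(upper, 1))
```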
35. (9 marks) What affects how a person chooses at random? Each of
92 randomly sampled university students was given a slip of paper that
said
Randomly choose one of the letters S or Q.
Of these 92 students, 61 chose S. The remaining 31 students chose Q.
Another 98 randomly sampled university students were given a slip of
paper that said
Randomly choose one of the letters Q or S.
Of these 98 students, 45 chose S. The remaining 53 students chose Q.
Is there an association between how the students responded and the
ordering of the letters in the question? Carry out the appropriate test
at level 0.05. Clearly show the calculation of your test statistic and
your rejection rule (in particular, clarify which of the tables provided
you have used, if any). State your conclusion, in the context of this
problem.
This is a chi-squared test for homogeneity/independence, studying the
two variables: order of letters on the paper, response to question.
The table of observed counts is
                     Responded "Q"    Responded "S"    Total
Received "S or Q"    31               61               92
Received "Q or S"    53               45               98
Total                84               106              190
The table of expected counts is
                     Responded "Q"            Responded "S"             Total
Received "S or Q"    92*84/190 = 40.67368     92*106/190 = 51.32632     92
Received "Q or S"    98*84/190 = 43.32632     98*106/190 = 54.67368     98
Total                84                       106                       190
(4 marks, -1 for each error. If rounding to nearest integers, -1) The chi-squared statistic is
\[ \chi^2 = \frac{(31 - 40.67368)^2}{40.67368} + \frac{(61 - 51.32632)^2}{51.32632} + \frac{(53 - 43.32632)^2}{43.32632} + \frac{(45 - 54.67368)^2}{54.67368} = 7.9955 \]
(1 mark. No deduction for minor rounding errors.) We reject the null hypothesis of no relationship if the statistic is too large. We use the chi-squared
table, df =(2-1)(2-1)=1.
We can make our decision two ways. One way is to determine the level
0.05 rejection rule. From the chi-squared table, we see that we reject the null
hypothesis if the chi-squared statistic is bigger than 3.84. Since 7.9955 > 3.84,
we reject the null hypothesis. Alternatively, we can calculate the p-value. From
the table, the p-value is between 0.0025 and 0.005. Since the p-value is less
than 0.05, we reject the null hypothesis. (1 mark)
We conclude that what was written on the paper is associated with the response. (1 mark)
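The expected counts and the chi-squared statistic can be reproduced with the sketch below. The row order (slip reading "S or Q" first) and the variable names are choices made here for illustration, and the cutoff 3.84 is taken from the solution's chi-squared table lookup rather than computed.

```python
# Observed counts: rows are the slip received, columns are the response (Q, S).
observed = [
    [31, 61],   # slip said "S or Q"
    [53, 45],   # slip said "Q or S"
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
total = sum(row_totals)

# Expected count under independence: row total * column total / grand total.
expected = [[r * c / total for c in col_totals] for r in row_totals]

chi2 = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)
df = (len(observed) - 1) * (len(observed[0]) - 1)

critical = 3.84   # level 0.05 cutoff for 1 df, as quoted in the solution
print(round(chi2, 4), df, "reject H0:", chi2 > critical)
```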