ST104A 03 June
UNIVERSITY OF LONDON
ST104A
Statistics 1
Page 1 of 21
D1
SECTION A
Answer all parts of Question 1 (50 marks in total).
(a) Classify each one of the following variables as either measurable (continuous) or
categorical. If a variable is categorical, further classify it as either nominal or ordinal.
Justify your answer. (Note that no marks will be awarded without a justification.)
i. The manufacturer of a car.
ii. The amount of money in a bank account.
iii. The Gross Domestic Product (GDP) of a country.
iv. The rating of a hotel according to the number of stars it has.
[8 marks]
x, 8, 7,
(c) The salaries of the employees of a company are normally distributed with mean
25,000 and a standard deviation of 10,000.
i. What is the proportion of employees with a salary of at least 20,000?
ii. What is the proportion of employees with salaries between 15,000 and 35,000?
[4 marks]
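Here is a quick numerical cross-check of (c), sketched in Python with only the standard library. Standardising with Z = (X − μ)/σ gives z = −0.5 for 20,000 and z = ±1 for 15,000 and 35,000. The sketch evaluates the normal CDF through `math.erf` in place of the printed normal tables, so it checks the answer rather than reproducing the expected written method.

```python
from math import erf, sqrt

def normal_cdf(x: float, mu: float, sigma: float) -> float:
    """P(X <= x) for X ~ N(mu, sigma^2), computed via the error function."""
    return 0.5 * (1.0 + erf((x - mu) / (sigma * sqrt(2.0))))

MU, SIGMA = 25_000, 10_000

# i. P(X >= 20,000) = 1 - P(X < 20,000), i.e. P(Z >= -0.5)
p_at_least_20k = 1.0 - normal_cdf(20_000, MU, SIGMA)

# ii. P(15,000 < X < 35,000), i.e. P(-1 < Z < 1)
p_15k_to_35k = normal_cdf(35_000, MU, SIGMA) - normal_cdf(15_000, MU, SIGMA)

print(f"{p_at_least_20k:.4f}")  # 0.6915
print(f"{p_15k_to_35k:.4f}")    # 0.6827
```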
i. $\sum_{i=3}^{5} 2x_i$
ii. $\sum_{i=2}^{4} 3(y_i - 3)$
iii. $y_4^2 \sum_{i=1}^{3} (2x_i + y_i^2)$.
[6 marks]
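The index ranges in the three summation quantities above are easy to misread. The sketch below evaluates each one with 1-based indices, so that x[0] stands for x_1. The question's actual values of x_i and y_i are not reproduced in this copy, so the lists here are HYPOTHETICAL placeholders. Part iii is read as the product y_4² × Σ(2x_i + y_i²), which is also an assumption.

```python
def q_i(x: list[float]) -> float:
    # i. sum_{i=3}^{5} 2 x_i  (x[i - 1] holds x_i)
    return sum(2 * x[i - 1] for i in range(3, 6))

def q_ii(y: list[float]) -> float:
    # ii. sum_{i=2}^{4} 3 (y_i - 3)
    return sum(3 * (y[i - 1] - 3) for i in range(2, 5))

def q_iii(x: list[float], y: list[float]) -> float:
    # iii. y_4^2 times sum_{i=1}^{3} (2 x_i + y_i^2)  (product reading assumed)
    return y[3] ** 2 * sum(2 * x[i - 1] + y[i - 1] ** 2 for i in range(1, 4))

# HYPOTHETICAL data for illustration only; not the values given in the exam
x = [1, 2, 3, 4, 5]
y = [2, 3, 4, 5, 6]

print(q_i(x), q_ii(y), q_iii(x, y))  # 24 9 1025
```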
UL15/0217
Page 2 of 6
D00
UL15/0850
Page 2 of 21
(e) The variable X takes the values 2, 4, 6 and 8 according to the following distribution:
x         2     4     6     8
p_X(x)    0.3   0.2   0.1   0.4
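The formula sheet's expression for the standard deviation of a discrete random variable, σ = √(Σ p_i (x_i − μ)²) with μ = Σ p_i x_i, can be applied directly to this distribution. The sub-questions of (e) are not shown in this copy, so the Python sketch below only illustrates those formulas and does not claim to answer a specific part.

```python
from math import sqrt

values = [2, 4, 6, 8]
probs = [0.3, 0.2, 0.1, 0.4]

assert abs(sum(probs) - 1.0) < 1e-12  # a valid probability distribution sums to 1

mu = sum(p * x for x, p in zip(values, probs))               # E(X) = sum of p_i x_i
var = sum(p * (x - mu) ** 2 for x, p in zip(values, probs))  # Var(X) = sum of p_i (x_i - mu)^2
sigma = sqrt(var)                                            # standard deviation

print(f"E(X) = {mu:.2f}, Var(X) = {var:.2f}, sd = {sigma:.3f}")  # E(X) = 5.20, Var(X) = 6.56, sd = 2.561
```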
(g) It is stated in a consumer magazine that the average price of football shirts in
London is 19.00. A random sample is taken by obtaining a single football shirt
from each of 16 randomly chosen London retailers. The sample mean is 20.20
and the sample standard deviation is 2.40. Carry out a hypothesis test, at two
appropriate significance levels, to determine whether the price of football shirts in
London is higher than the price stated in the consumer magazine. State
your hypotheses, the test statistic and its distribution under the null hypothesis,
and your conclusion in the context of the problem.
[7 marks]
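This is a minimal Python sketch of the one-sample t test in (g). It assumes H0: μ = 19.00 against the one-sided H1: μ > 19.00, and it assumes shirt prices are approximately normal, so that T = (X̄ − μ0)/(S/√n) follows a t distribution with n − 1 = 15 degrees of freedom under H0. The critical values 1.753 (5%) and 2.602 (1%) are the standard upper-tail points of t₁₅.

```python
from math import sqrt

mu0 = 19.00   # mean price claimed by the magazine
xbar = 20.20  # sample mean
s = 2.40      # sample standard deviation
n = 16        # one shirt from each of 16 retailers

t_stat = (xbar - mu0) / (s / sqrt(n))  # ~ t with n - 1 = 15 df under H0

# Upper-tail critical values of t with 15 degrees of freedom (standard t tables)
critical = {0.05: 1.753, 0.01: 2.602}

for alpha, c in critical.items():
    decision = "reject H0" if t_stat > c else "do not reject H0"
    print(f"alpha = {alpha}: t = {t_stat:.2f} vs {c} -> {decision}")
```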
(h) State whether the following are true or false and give a brief explanation. (Note
that no marks will be awarded for a simple true/false answer.)
i. The chance that a normal random variable is less than two standard deviations
from its mean is 99%.
ii. The lower the regression coefficient in absolute value, the weaker the correlation.
iii. Increasing the sample size will increase the width of a confidence interval for a
population mean (assuming that everything else remains constant).
iv. When testing a hypothesis, we use a two-tailed test if we want to test whether
the parameter is greater than what is stated in the null hypothesis.
v. A population list is needed in order to conduct quota sampling.
vi. The regression of the variable Y on the variable X will always have the same
slope as the regression of the variable X on the variable Y.
[12 marks]
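Two of the statements in (h), i and iii, can be checked numerically. The Python sketch below computes P(|Z| < 2) for a standard normal Z. It also shows how the half-width z·σ/√n of a confidence interval for a mean changes as n grows. The choices σ = 1 and z = 1.96 are illustrative assumptions.

```python
from math import erf, sqrt

def std_normal_cdf(z: float) -> float:
    """P(Z <= z) for a standard normal Z."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

# Statement i: probability of lying within two standard deviations of the mean
p_within_2sd = std_normal_cdf(2.0) - std_normal_cdf(-2.0)
print(f"P(|Z| < 2) = {p_within_2sd:.4f}")  # 0.9545

# Statement iii: half-width of a CI for a mean with sigma known (illustrative sigma and z)
def half_width(n: int, sigma: float = 1.0, z: float = 1.96) -> float:
    return z * sigma / sqrt(n)

for n in (25, 100, 400):
    print(n, round(half_width(n), 3))
```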
SECTION B
Answer two questions from this section (25 marks each).
2. (a) Questionnaires were mailed to 300 households, in three different areas of a city,
to assess the level of local sporting facilities. The collected data are shown in
the table below:
          Area 1   Area 2   Area 3   Total
Total      100      100      100      300
i. Based on the data in the table, and without conducting a significance test,
would you say there is an association between areas and level of local
sporting facilities?
ii. Calculate the χ² statistic and use it to test for independence, using two
appropriate significance levels. What do you conclude?
[14 marks]
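The χ² statistic in 2(a)(ii) compares each observed count O_ij with the count expected under independence, E_ij = (row total × column total)/n, and has (r − 1)(c − 1) degrees of freedom. The body of the table is not reproduced in this copy. The 2 × 3 counts in this Python sketch are therefore HYPOTHETICAL; only the column totals of 100 per area match the question.

```python
def chi_squared_statistic(observed: list[list[float]]) -> tuple[float, int]:
    """Return (chi-squared statistic, degrees of freedom) for an r x c table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n  # expected count under independence
            stat += (o - e) ** 2 / e
    df = (len(observed) - 1) * (len(col_totals) - 1)
    return stat, df

# HYPOTHETICAL counts: rows = two levels of facilities, columns = Areas 1 to 3
table = [[40, 50, 60],
         [60, 50, 40]]

stat, df = chi_squared_statistic(table)
print(f"chi-squared = {stat:.2f} on {df} degrees of freedom")  # chi-squared = 8.00 on 2 degrees of freedom
```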
(b)
3. The following data show the recorded times (y), in seconds, taken by 10 international
athletes to run 100 metres, together with the corresponding wind speeds (x) at
the time of running. A positive wind speed indicates that the wind is in the direction
of running and is therefore considered helpful, whereas a negative wind speed
indicates that the wind is against the runner.
Athlete   #1     #2     #3     #4     #5     #6     #7     #8     #9     #10
x         -2.45  -1.23  -0.78  -0.33  -0.37  0.34   0.53   1.17   2.35   2.91
y         10.52  10.47  10.41  10.25  10.54  10.09  10.30  9.99   9.92   9.87
i. Draw a scatter diagram of these data on the graph paper provided. Label
the diagram carefully.
ii. Calculate the sample correlation coefficient. Interpret your findings.
iii. Calculate the least squares line of y on x and draw the line on the scatter
diagram.
iv. Based on the regression equation in part (iii.), what will be the predicted
time for a runner for a wind speed of 1.5? Will you trust this value? Justify
your answer.
[13 marks]
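This Python sketch covers the calculations in 3(ii) to (iv), using the ten (x, y) pairs from the table. It follows the formula sheet's "sum of products minus n times the means" form: r = S_xy/√(S_xx S_yy), with least squares slope b = S_xy/S_xx and intercept a = ȳ − b x̄.

```python
from math import sqrt

x = [-2.45, -1.23, -0.78, -0.33, -0.37, 0.34, 0.53, 1.17, 2.35, 2.91]  # wind speed
y = [10.52, 10.47, 10.41, 10.25, 10.54, 10.09, 10.30, 9.99, 9.92, 9.87]  # time (s)
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n

sxy = sum(xi * yi for xi, yi in zip(x, y)) - n * xbar * ybar
sxx = sum(xi ** 2 for xi in x) - n * xbar ** 2
syy = sum(yi ** 2 for yi in y) - n * ybar ** 2

r = sxy / sqrt(sxx * syy)  # sample correlation coefficient
b = sxy / sxx              # least squares slope of y on x
a = ybar - b * xbar        # intercept

pred = a + b * 1.5         # predicted time for a wind speed of 1.5
print(f"r = {r:.3f}, y = {a:.3f} + ({b:.3f})x, prediction at x = 1.5: {pred:.2f}")
```

A wind speed of 1.5 lies inside the observed range of x (−2.45 to 2.91), so the prediction is an interpolation rather than an extrapolation.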
Sample size    22      25
Sample mean    65.33   61.58
4. (a) The following data show the length (in inches) of fish caught in one day in a
river:
10.1  11.2  12.1  12.4  13.2  14.3
10.4  11.2  12.1  12.5  13.4  14.5
10.5  11.5  12.2  12.6  13.5  14.8
10.9  11.7  12.2  12.8  13.6  15.2
11.1  11.9  12.3  12.9  13.7  15.5
i. Carefully construct, draw and label a histogram of these data on the graph
paper provided.
ii. Find the mean (given that the sum of the data is 376.3), the median and
the modal group.
iii. Comment on the data given the shape of the histogram and the measures
you have calculated.
iv. Name two other types of graphical displays that would be suitable to
represent the data.
[12 marks]
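For 4(a)(ii), here is a Python sketch using the standard `statistics` module. The median of 30 values is the average of the 15th and 16th ordered values. The question does not specify class intervals, so the width-1 classes [10, 11), [11, 12) and so on used to find the modal group are an ASSUMPTION; a different grouping could give a different modal group.

```python
from collections import Counter
from math import floor
from statistics import mean, median

lengths = [
    10.1, 11.2, 12.1, 12.4, 13.2, 14.3,
    10.4, 11.2, 12.1, 12.5, 13.4, 14.5,
    10.5, 11.5, 12.2, 12.6, 13.5, 14.8,
    10.9, 11.7, 12.2, 12.8, 13.6, 15.2,
    11.1, 11.9, 12.3, 12.9, 13.7, 15.5,
]
assert abs(sum(lengths) - 376.3) < 1e-9  # agrees with the total given in part ii

x_mean = mean(lengths)      # 376.3 / 30
x_median = median(lengths)  # average of the 15th and 16th ordered values

# ASSUMED grouping: width-1 classes keyed by lower limit, e.g. 12 means [12, 13)
freq = Counter(floor(v) for v in lengths)
modal_lower, modal_count = freq.most_common(1)[0]

print(f"mean = {x_mean:.2f}, median = {x_median:.2f}")  # mean = 12.54, median = 12.35
print(f"modal group = [{modal_lower}, {modal_lower + 1}) with {modal_count} fish")
```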
(b) In order to estimate the percentage of city households that have high speed
internet access, a random sample of 140 city households was taken. Of these,
70 had high speed internet access. A similar sample of 170 rural households
was also taken and it was found that 61 of them had high speed internet access.
The data are summarised in the table below:
                   With high speed internet   Total
City Households    70                         140
Rural Households   61                         170
i. Give a 95% confidence interval for the difference between the proportions
of high speed internet access in city and rural households.
ii. Carry out a hypothesis test, at two suitable significance levels, to determine
whether city households are more likely to have high speed internet access
compared to rural households. State the test hypotheses, and specify your
test statistic and its distribution under the null hypothesis. Comment on
your findings.
iii. State any assumptions you made in (ii.).
[13 marks]
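This Python sketch covers 4(b). The confidence interval uses the unpooled standard error √(p₁(1 − p₁)/n₁ + p₂(1 − p₂)/n₂). The test statistic uses the pooled proportion P = (R₁ + R₂)/(n₁ + n₂), as on the formula sheet, under H0: π₁ = π₂ against the one-sided H1: π₁ > π₂. The values 1.96, 1.645 and 2.326 are the standard N(0, 1) points for a 95% interval and for upper-tail 5% and 1% tests.

```python
from math import sqrt

r1, n1 = 70, 140  # city households: with high speed internet, sample size
r2, n2 = 61, 170  # rural households
p1, p2 = r1 / n1, r2 / n2
diff = p1 - p2

# i. 95% confidence interval for pi1 - pi2 (unpooled standard error)
se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
ci = (diff - 1.96 * se, diff + 1.96 * se)

# ii. Test of H0: pi1 = pi2 against H1: pi1 > pi2 (pooled proportion)
p_pool = (r1 + r2) / (n1 + n2)
z = diff / sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))

print(f"difference = {diff:.4f}, 95% CI = ({ci[0]:.4f}, {ci[1]:.4f})")
print(f"z = {z:.2f}; upper-tail critical values 1.645 (5%), 2.326 (1%)")
```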
END OF PAPER
ST104a Statistics 1
Examination Formula Sheet
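Standard deviation of a discrete random variable:
$\sigma = \sqrt{\sum_{i=1}^{N} p_i (x_i - \mu)^2}$, where $\mu = \sum_{i=1}^{N} p_i x_i$

$Z = \frac{X - \mu}{\sigma}$

$\bar{x} \pm z_{\alpha/2} \frac{\sigma}{\sqrt{n}}$

$p \pm z_{\alpha/2} \sqrt{\frac{p(1 - p)}{n}}$

$\bar{x} \pm t_{\alpha/2,\, n-1} \frac{s}{\sqrt{n}}$

Sample size determination:
$n \geq \frac{(z_{\alpha/2})^2 \sigma^2}{e^2}$

Sample size determination for proportion:
$n \geq \frac{(z_{\alpha/2})^2 p(1 - p)}{e^2}$

$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$

$T = \frac{\bar{X} - \mu_0}{S/\sqrt{n}}$

$Z = \frac{\bar{X}_1 - \bar{X}_2 - (\mu_1 - \mu_2)}{\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}}$

$T = \frac{\bar{X}_d - \mu_d}{S_d/\sqrt{n}}$

$\bar{x}_d \pm t_{\alpha/2,\, n-1} \frac{s_d}{\sqrt{n}}$

$Z = \frac{(P_1 - P_2) - (\pi_1 - \pi_2)}{\sqrt{P(1 - P)(1/n_1 + 1/n_2)}}$, where $P = \frac{R_1 + R_2}{n_1 + n_2}$

$\chi^2$ test of association:
$\sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}$

$r = \frac{\sum_{i=1}^{n} x_i y_i - n\bar{x}\bar{y}}{\sqrt{\left(\sum_{i=1}^{n} x_i^2 - n\bar{x}^2\right)\left(\sum_{i=1}^{n} y_i^2 - n\bar{y}^2\right)}}$

$r_s = 1 - \frac{6\sum_{i=1}^{n} d_i^2}{n(n^2 - 1)}$

$a = \bar{y} - b\bar{x}$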
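The sample size determination formulas on the formula sheet give a lower bound for n, so in practice the result is rounded up to the next whole number. This is a minimal Python sketch of that step. The values of σ, p and e in the usage lines are HYPOTHETICAL, and z = 1.96 (95% confidence) is an assumed default.

```python
from math import ceil

def n_for_mean(sigma: float, e: float, z: float = 1.96) -> int:
    """Smallest whole n satisfying n >= z^2 * sigma^2 / e^2."""
    return ceil((z * sigma / e) ** 2)

def n_for_proportion(p: float, e: float, z: float = 1.96) -> int:
    """Smallest whole n satisfying n >= z^2 * p * (1 - p) / e^2."""
    return ceil(z ** 2 * p * (1 - p) / e ** 2)

# HYPOTHETICAL inputs for illustration
print(n_for_mean(sigma=10, e=2))        # 96.04 rounded up: 97
print(n_for_proportion(p=0.5, e=0.03))  # about 1067.1 rounded up: 1068
```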
Dennis V. Lindley, William F. Scott, New Cambridge Statistical Tables, (1995) Cambridge University Press, reproduced with permission.