Practice Material Big Data
3. A statistics professor kept attendance records and recorded the number of absent students per
class. This data is displayed in the following histogram with the frequency of each number of absent
students shown above the bars.
4. Suppose we were to run a linear regression using the data in the following scatter plot.
What are the most reasonable values for y-intercept, b, and the slope m?
A. b = 120 and m = −2
B. b = 45 and m = 2
C. b = 45 and m = −20
D. b = 45 and m = −2
6. What price would we expect to pay for a 3-bedroom, 1,000-square-foot house three miles from the
beach?
A. $201,422
B. $229,198
C. $243,850
D. $177,243
7. Which of the following statements is true?
A. The age of a driver is not statistically significant in explaining damage.
B. The Significance F tells us that this model is statistically significant in explaining variation in
damage.
C. We should not put much faith in these results since the value of R² is less than 10%.
D. At α = 0.05, there is no evidence of a linear relationship between age and damage.
9. On average, what would be the dollar value of an accident involving a 25-year-old driver?
A. $10,795.47
B. $2,474.90
C. $13,372.58
D. $11,836.56
10. Which of the following statements is the best explanation of the R²?
A. 3.5% of the variation in accident damage can be explained by variation in the age of the
driver.
B. 3.5% of accident damage can be explained by variation in the age of the driver.
C. 3.5% of accident damage is explained by the age of the driver.
D. 3.5% of the time, the amount of damage is explained by the age of the driver.
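As a reminder of what R² measures, the sketch below computes it as 1 − SSE/SST for a simple least-squares line. The ages and damage figures are invented toy values, not the data behind this exam's regression output, and the function name `r_squared` is only illustrative.

```python
# Toy data, invented for illustration only (not the exam's data set).
ages = [18, 25, 32, 40, 55, 63]
damage = [12.0, 9.5, 11.0, 8.0, 9.0, 7.5]  # hypothetical, in $1000


def r_squared(x, y):
    """Return R^2 of the least-squares line fitted to (x, y)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # SSE: variation left unexplained by the line; SST: total variation in y.
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    sst = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - sse / sst


print(f"R^2 = {r_squared(ages, damage):.3f}")
```

The ratio compares variation in the response with variation accounted for by the predictor, which is why R² is read as a share of variation rather than a share of the response itself.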
END OF EXAM
ANSWER PAPER: NOT TO BE SEEN BY STUDENTS
1) D
2) A
3) A
4) D
5) C
6) C
7) B
8) D
9) B
10) A
All questions carry equal marks. Each correct answer is worth 10 marks and any incorrect
answer is worth zero marks.
1. Defining hypotheses is a useful way of approaching research because:
E. it allows the development of testable propositions.
F. it allows for the development of indisputable proof to be established in research findings.
G. it will impress the reader.
H. it looks suitably scientific.
2. The ________ is used to test the significance of the population correlation coefficient.
A. normal distribution
B. Student's t-distribution
C. F-distribution
D. chi-square distribution
3. The purpose of hypothesis statements is to draw a conclusion about the population parameters for
which we do not have complete knowledge.
A. True
B. False
4. The National Center for Education Statistics would like to test the hypothesis that the proportion of
Bachelor's degrees that were earned by women equals 0.60. A random sample of 140 college
graduates with Bachelor's degrees found that 75 were women. The National Center for Education
Statistics would like to set α = 0.10. The correct hypothesis statement for this hypothesis test would
be _______________________.
E. H0: p ≤ 0.60; H1: p > 0.60
F. H0: p ≠ 0.60; H1: p = 0.60
G. H0: p > 0.60; H1: p ≤ 0.60
H. H0: p = 0.60; H1: p ≠ 0.60
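For readers who want to carry the test through, here is a minimal sketch of a one-sample z-test for a proportion using the figures in the question (75 of 140, α = 0.10). It assumes a two-tailed test of a null value of 0.60; the function name `proportion_z_test` is illustrative and not part of the exam.

```python
from math import sqrt
from statistics import NormalDist


def proportion_z_test(successes, n, p0, alpha):
    """Two-tailed one-sample z-test for a proportion.

    Returns (z statistic, critical value, reject H0?).
    """
    p_hat = successes / n
    std_error = sqrt(p0 * (1 - p0) / n)  # standard error under H0
    z = (p_hat - p0) / std_error
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return z, z_crit, abs(z) > z_crit


z, z_crit, reject = proportion_z_test(successes=75, n=140, p0=0.60, alpha=0.10)
print(f"z = {z:.3f}, critical value = ±{z_crit:.3f}, reject H0: {reject}")
```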
5. Apple is considering offering the new version of the iPad in color options other than black and
white. Before deciding, Apple would like to survey potential iPad users on their preferences.
Management feels that the gender of the person will affect their response. As a result, Apple would
like to ensure the sample is composed of an equal number of male and female respondents. This is
an example of cluster sampling.
A. True
B. False
6. Two variables have a correlation coefficient equal to +0.55 from a sample size of 8. Which one of
the following statements describes the results of the hypothesis test that the population
correlation coefficient is greater than zero using α = 0.05?
A. Because the test statistic is greater than the critical value, we fail to reject the null
hypothesis and conclude that the population correlation coefficient is not greater than zero.
B. Because the test statistic is greater than the critical value, we can reject the null hypothesis
and conclude that the population correlation coefficient is greater than zero.
C. Because the test statistic is less than the critical value, we fail to reject the null hypothesis
and conclude that the population correlation coefficient is not greater than zero.
D. Because the test statistic is less than the critical value, we can reject the null hypothesis and
conclude that the population correlation coefficient is not greater than zero.
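The test statistic for a correlation coefficient is t = r·√(n − 2) / √(1 − r²) with n − 2 degrees of freedom. The sketch below evaluates it for the values in the question (r = 0.55, n = 8). The critical value 1.943 is the one-tail t-table entry for α = 0.05 and 6 degrees of freedom, hard-coded here as an assumption so that no statistics library is needed.

```python
from math import sqrt


def correlation_t_statistic(r, n):
    """t statistic for testing H0: rho <= 0, with n - 2 degrees of freedom."""
    return r * sqrt(n - 2) / sqrt(1 - r ** 2)


# One-tail critical value for alpha = 0.05 and df = 6, taken from a
# standard t table (hard-coded to avoid a third-party dependency).
T_CRITICAL = 1.943

t_stat = correlation_t_statistic(r=0.55, n=8)
reject_null = t_stat > T_CRITICAL
print(f"t = {t_stat:.3f}, critical value = {T_CRITICAL}, reject H0: {reject_null}")
```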
9. According to these regression results, the average number of hours of sleep per night for a person
who is 40 years old is ________.
E. 6.8
F. 7.1
G. 7.7
H. 8.4
10. The percentage of the variation in hours of sleep per night that is explained by the age of the adult
is ________.
E. 11.23
F. 15.27
G. 23.60
H. 39.07
END OF EXAM
ANSWER PAPER: NOT TO BE SEEN BY STUDENTS
1 A
2 B
3 A
4 D
5 B
6 C
7 D
8 C
9 C
10 D
All questions carry equal marks. Each correct answer is worth 10 marks and any incorrect
answer is worth zero marks.
11. A ________ sample is a sample in which every member of the population has an equal chance of
being chosen:
I. Stratified.
J. Probability.
K. Simple random.
L. Systematic.
13. There are five rows of students seated in a statistics class. The following table shows the number of
students in each row and the average score of the most recent exam for that row.
14. Susan would like to conduct a survey of homeowners in the Meadowbrook neighbourhood to get
their opinions on proposed road modifications in the area. Which of the following is an example of
a systematic sample?
I. Susan selects every third house on each street in the neighbourhood.
J. Susan ensures that her sample contains an equal number of two-story, split-level, and ranch
homes in her sample.
K. Susan randomly chooses two streets in the neighbourhood and selects every home on these
streets.
L. Susan selects the first 20 homes that she passes as she walks into the entrance of the
neighbourhood.
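A systematic sample takes every k-th unit from an ordered list, usually after a random starting point. The sketch below is one way to express that idea; the list of 12 house numbers and the helper `systematic_sample` are hypothetical and serve only to illustrate the selection rule.

```python
import random


def systematic_sample(items, step, seed=None):
    """Pick a random start within the first `step` items, then every `step`-th item."""
    rng = random.Random(seed)
    start = rng.randrange(step)
    return items[start::step]


# Hypothetical street of 12 houses, numbered 1..12.
houses = list(range(1, 13))
sample = systematic_sample(houses, step=3, seed=0)
print(sample)
```

Only the starting point is random; after that the selection follows a fixed interval, which is what separates this design from simple random, stratified, and cluster sampling.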
15. Porter Automotive is a car dealership that sells Buicks and Hondas. The following data shows the
number of buyers this month according to the brand of car they purchased as well as their age
group.
16. A professor would like to test the hypothesis that the average number of minutes that a student
needs to complete a statistics exam is equal to 45 minutes. The correct hypothesis statement would
be?
I. H0: μ = 45; H1: μ > 45.
J. H0: μ ≠ 45; H1: μ = 45.
K. H0: μ = 45; H1: μ < 45.
L. H0: μ = 45; H1: μ ≠ 45.
17. A regression analysis between sales (in $1000) and advertising (in $) resulted in the following least
squares line, ŷ = 80,000 + 4x. This implies that:
E. An increase of $4 in advertising is expected to result in an increase of $4,000 in sales.
F. An increase of $1 in advertising is expected to result in an increase of $4,000 in sales.
G. An increase of $1 in advertising is expected to result in an increase of $80,004 in sales.
H. An increase of $1 in advertising is expected to result in an increase of $4 in sales.
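Interpreting a slope requires tracking the units of both variables. The short sketch below applies the fitted line from the question and converts the change in predicted sales from thousands of dollars into dollars. The constant and function names are illustrative.

```python
SALES_UNIT_DOLLARS = 1_000  # sales are recorded in thousands of dollars


def predicted_sales_units(advertising_dollars):
    """Fitted line from the question: y-hat = 80,000 + 4x (y in $1000)."""
    return 80_000 + 4 * advertising_dollars


# Change in predicted sales for one extra dollar of advertising.
change_units = predicted_sales_units(101) - predicted_sales_units(100)
change_dollars = change_units * SALES_UNIT_DOLLARS
print(f"+$1 advertising -> +${change_dollars:,} predicted sales")
```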
19. Suppose you are interested in examining the determinants of earnings. You have information on
the age of the individual as well as their level of education: high school graduate, college graduate
or graduate degree. Let Y = earnings, X1 = age, X2 = 1 if the person has only a high school degree
and 0 otherwise, X3 = 1 if the person has a college degree and 0 otherwise, X4 = 1 if the person
has a graduate degree and 0 otherwise. Which of the following model specifications would not
work?
I. Y = β0 + β1X1 + β2X2 + β3X3 + β4X4
J. Y = β1X1 + β2X2 + β3X3
K. Y = β0 + β1X1 + β2X2 + β3X3
L. None of the above
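When a set of dummy variables covers every category, the dummies sum to one for each observation and so duplicate the intercept column. The sketch below checks this on a few invented respondents. The data and the helper `design_row` are hypothetical and only illustrate the collinearity.

```python
# Hypothetical respondents: (age, education level). Invented for illustration.
people = [(25, "high school"), (40, "college"), (33, "graduate"), (51, "college")]


def design_row(age, education):
    """Return [intercept, X1, X2, X3, X4] for one person."""
    return [
        1,                                 # intercept column
        age,                               # X1
        int(education == "high school"),   # X2
        int(education == "college"),       # X3
        int(education == "graduate"),      # X4
    ]


rows = [design_row(age, edu) for age, edu in people]

# In every row the three dummies sum to the intercept column, so the columns
# are perfectly collinear and least squares has no unique solution.
collinear = all(row[2] + row[3] + row[4] == row[0] for row in rows)
print("X2 + X3 + X4 equals the intercept column in every row:", collinear)
```

Dropping either the intercept or one of the dummies removes the exact linear dependence.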
20. Suppose we were to run a linear regression using the data in the following scatter plot.
What are the most reasonable values for y-intercept b and the slope m?
I. b = 120 and m = −2
J. b = 45 and m = −20
K. b = 45 and m = −2
L. b = 45 and m = 2
END OF EXAM
ANSWER PAPER: NOT TO BE SEEN BY STUDENTS
1) C
2) D
3) D
4) A
5) C
6) D
7) B
8) C
9) A
10) A