
Practice Material Big Data


All questions carry equal marks. Each correct answer is worth 10 marks and any incorrect
answer is worth zero marks.
1. Deanna has been hired to visit the local shopping mall to conduct a survey about the upcoming
political election. She needs to select respondents at the mall and ask them questions about their
voting tendencies. Deanna decides to position herself by the only entrance to the mall and select
every 10th shopper entering the mall to participate. Which of the following sampling techniques
best describes Deanna's method?
A. Probability
B. Cluster
C. Simple random
D. Systematic
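
As an illustration of selecting every 10th shopper, here is a minimal Python sketch. The function name `systematic_sample` and the numbered list of shoppers are hypothetical; they are not part of the exam.

```python
def systematic_sample(population, k, start=0):
    """Return every k-th item of `population`, beginning at index `start`."""
    if k <= 0:
        raise ValueError("k must be a positive integer")
    return population[start::k]


# Hypothetical stream of 45 shoppers, numbered in order of arrival.
shoppers = list(range(1, 46))

# Start at index 9 so that the 10th, 20th, 30th, ... shoppers are chosen.
selected = systematic_sample(shoppers, k=10, start=9)
print(selected)  # [10, 20, 30, 40]
```

The fixed interval between selected shoppers is what makes the sample systematic rather than simple random.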

2. Which of the following is an example of quantitative data?


A. Apple's closing stock price today
B. The zip code of your home address
C. Your gender
D. Your telephone number

3. A statistics professor kept attendance records and recorded the number of absent students per
class. This data is displayed in the following histogram with the frequency of each number of absent
students shown above the bars.

How many statistics classes had two or fewer students absent?


A. 85
B. 9
C. 40
D. 42

4. Suppose we were to run a linear regression using the data in the following scatter plot.

What are the most reasonable values for y-intercept, b, and the slope m?
A. b = 120 and m = −2
B. b = 45 and m = 2
C. b = 45 and m = −20
D. b = 45 and m = −2

QUESTIONS 5 AND 6 ARE BASED ON THE FOLLOWING INFORMATION:


A real estate appraiser is interested in determining the factors that determine the price of a house. She
wants to run the following regression: Y = β0 + β1X1 + β2X2 + β3X3 where Y = price of the house in
$1,000s, X1 = number of bedrooms, X2 = square footage of living space, and X3 = number of miles from the
beach. Taking a sample of 30 houses, the appraiser runs a multiple regression and gets the following
results: Ŷ = 123.2 + 4.59X1 + 0.125X2 - 6.04X3 and R² = 0.47.
5. What should the null and alternative hypotheses be for β1?
A. H0 : β1 = 0, H1 : β1 < 0
B. H0 : β1 ≠ 0, H1 : β1 = 0
C. H0 : β1 = 0, H1 : β1 ≠ 0
D. H0 : β1 = 0, H1 : β1 > 0

6. What price would we expect to pay on a 3 bedroom, 1,000 square foot house three miles from the
beach?
A. $201,422
B. $229,198
C. $243,850
D. $177,243
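
The prediction in question 6 follows from substituting the house's characteristics into the fitted equation. A short Python sketch of that arithmetic is given here; the function name is hypothetical and the coefficients are the ones quoted in the regression results.

```python
def predict_price_thousands(bedrooms, square_feet, miles_from_beach):
    """Predicted house price in $1,000s from the fitted regression equation."""
    return 123.2 + 4.59 * bedrooms + 0.125 * square_feet - 6.04 * miles_from_beach


price_k = predict_price_thousands(bedrooms=3, square_feet=1000, miles_from_beach=3)

# Y is measured in $1,000s, so convert to dollars before reporting.
price_dollars = price_k * 1000
print(f"${price_dollars:,.0f}")
```

Because Y is in $1,000s, the result must be multiplied by 1,000 to compare it with the listed dollar amounts.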

QUESTIONS 7, 8, 9 AND 10 ARE BASED ON THE FOLLOWING INFORMATION:


An insurance company analyst is interested in analyzing the dollar value of damage in automobile
accidents. She collects data from 115 accidents, and records the amount of damage as well as the age of
the driver. The results of her regression analysis are listed below.

7. Which of the following statements is true?
A. The age of a driver is not statistically significant in explaining damage.
B. The Significance F tells us that this model is statistically significant in explaining variation in
damage.
C. We should not put much faith in these results since the value of R² is less than 10%.
D. At α= 0.05, there is no evidence of a linear relationship between age and damage.

8. How would you best explain the y-intercept in this situation?


A. For each additional 1-year increase in the age of the driver, we would expect damage to
increase by $10,726.
B. For each additional 1-year increase in the age of the driver, we would expect damage to
increase by $70.
C. The average amount of damage was $10,726.
D. It makes no sense to explain the intercept in this situation, since we cannot have a driver
with age of zero.

9. On average, what would be the dollar value of an accident involving a 25-year-old driver?
A. $10,795.47
B. $2,474.90
C. $13,372.58
D. $11,836.56

10. Which of the following statements is the best explanation of the R²?
A. 3.5% of the variation in accident damage can be explained by variation in the age of the
driver.
B. 3.5% of accident damage can be explained by variation in the age of the driver.
C. 3.5% of accident damage is explained by the age of the driver.
D. 3.5% of the time, the amount of damage is explained by the age of the driver.

END OF EXAM

ANSWER PAPER: NOT TO BE SEEN BY STUDENTS

1) D
2) A
3) A
4) D
5) C
6) C
7) B
8) D
9) B
10) A

All questions carry equal marks. Each correct answer is worth 10 marks and any incorrect
answer is worth zero marks.
1. Defining hypotheses is a useful way of approaching research because:
A. it allows the development of testable propositions.
B. it allows for the development of indisputable proof to be established in research findings.
C. it will impress the reader.
D. it looks suitably scientific.

2. The ________ is used to test the significance of the population correlation coefficient.
A. normal distribution
B. Student's t-distribution
C. F-distribution
D. chi-square distribution

3. The purpose of hypothesis statements is to draw a conclusion about the population parameters for
which we do not have complete knowledge.
A. True
B. False

4. The National Center for Education Statistics would like to test the hypothesis that the proportion of
Bachelor's degrees that were earned by women equals 0.60. A random sample of 140 college
graduates with Bachelor's degrees found that 75 were women. The National Center for Education
Statistics would like to set α = 0.10. The correct hypothesis statement for this hypothesis test would
be _______________________.

A. H0 : p ≤ 0.60; H1 : p > 0.60
B. H0 : p ≠ 0.60; H1 : p = 0.60
C. H0 : p > 0.60; H1 : p ≤ 0.60
D. H0 : p = 0.60; H1 : p ≠ 0.60
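
Question 4 asks only for the hypothesis statement, but the figures given are enough to carry the test through. The sketch below assumes a two-tailed one-sample z-test for a proportion (normal approximation); the function name is hypothetical and the critical value 1.645 is the standard normal table value for α = 0.10, two-tailed.

```python
import math


def one_sample_proportion_z(successes, n, p0):
    """z statistic for testing a sample proportion against the hypothesized value p0."""
    p_hat = successes / n
    # The standard error uses the hypothesized proportion, not the sample proportion.
    standard_error = math.sqrt(p0 * (1 - p0) / n)
    return (p_hat - p0) / standard_error


z = one_sample_proportion_z(successes=75, n=140, p0=0.60)

Z_CRITICAL = 1.645  # two-tailed critical value at alpha = 0.10 (table value)
reject_null = abs(z) > Z_CRITICAL
print(round(z, 2), reject_null)
```

Under these assumptions the statistic falls inside the critical region boundaries, so the hypothesis that the proportion equals 0.60 would not be rejected.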

5. Apple is considering offering the new version of the iPad in color options other than black and
white. Before deciding, Apple would like to survey potential iPad users on their preferences.
Management feels that the gender of the person will affect their response. As a result, Apple would
like to ensure the sample is composed of an equal number of male and female respondents. This is
an example of cluster sampling.
A. True
B. False
6. Two variables have a correlation coefficient equal to +0.55 from a sample size of 8. Which one of
the following statements describes the results of the hypothesis test that the population
correlation coefficient is greater than zero using α = 0.05?
A. Because the test statistic is greater than the critical value, we fail to reject the null
hypothesis and conclude that the population correlation coefficient is not greater than zero.
B. Because the test statistic is greater than the critical value, we can reject the null hypothesis
and conclude that the population correlation coefficient is greater than zero.
C. Because the test statistic is less than the critical value, we fail to reject the null hypothesis
and conclude that the population correlation coefficient is not greater than zero.
D. Because the test statistic is less than the critical value, we can reject the null hypothesis and
conclude that the population correlation coefficient is not greater than zero.
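
The comparison in question 6 can be checked numerically. This is a minimal sketch using the usual t statistic for a correlation coefficient, t = r·sqrt(n − 2) / sqrt(1 − r²), with n − 2 degrees of freedom. The function name is hypothetical, and the critical value 1.943 is taken from a t-table (one-tailed, α = 0.05, 6 degrees of freedom).

```python
import math


def correlation_t_statistic(r, n):
    """t statistic for testing whether the population correlation is zero."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)


t_stat = correlation_t_statistic(r=0.55, n=8)

T_CRITICAL = 1.943  # one-tailed, alpha = 0.05, df = n - 2 = 6 (table value)
reject_null = t_stat > T_CRITICAL
print(round(t_stat, 3), reject_null)
```

With only 8 observations, even a moderate sample correlation of +0.55 does not produce a test statistic above the critical value.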

7. Which of the following is NOT part of the sampling design process?


A. Determining the relevant sample frame.
B. Selection of the sampling technique.
C. Specifying the sampling unit.
D. Refining the research question.
NOTE – Last 3 questions all relate to the information provided below
Some research has indicated that as people age they need less sleep. A random sample of adults was
selected and the number of hours that they slept last night was recorded. Simple regression analysis was
performed on this sample and the results are shown below.

8. The size of the sample is ________.


A. 21
B. 22
C. 23
D. 24

9. According to these regression results, the average number of hours of sleep for a person who is
40 years old is ________.
A. 6.8
B. 7.1
C. 7.7
D. 8.4

10. The percentage of the variation in hours of sleep per night that is explained by the age of the adult
is ________.
A. 11.23
B. 15.27
C. 23.60
D. 39.07

END OF EXAM

ANSWER PAPER: NOT TO BE SEEN BY STUDENTS

1) A
2) B
3) A
4) D
5) B
6) C
7) D
8) C
9) C
10) D

All questions carry equal marks. Each correct answer is worth 10 marks and any incorrect
answer is worth zero marks.
11. A ________ sample is a sample in which every member of the population has an equal chance of
being chosen:
A. Stratified.
B. Probability.
C. Simple random.
D. Systematic.

12. Which of the following is an example of a discrete random variable?


A. The percentage of people living below the poverty level in Boston.
B. The monthly electric bill for a local business.
C. The amount of time it takes for a worker to complete a complex task.
D. The number of people eating at a local café between noon and 2:00 p.m.

13. There are five rows of students seated in a statistics class. The following table shows the number of
students in each row and the average score of the most recent exam for that row.

Row    Number of Students    Row Average
1      6                     82.3
2      7                     91.4
3      4                     85.0
4      5                     78.3
5      6                     89.1

What is the average exam score for this class?
A. 84.7
B. 86.8
C. 83.0
D. 85.7
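
Because the rows have different numbers of students, the class average is a weighted mean of the row averages, not a plain average of the five row values. A minimal Python sketch using the figures from the table (variable names are illustrative):

```python
# (number of students, row average) for each of the five rows in the table.
rows = [(6, 82.3), (7, 91.4), (4, 85.0), (5, 78.3), (6, 89.1)]

total_students = sum(n for n, _ in rows)
# Multiplying each row average by its row size recovers the row's total points.
total_points = sum(n * avg for n, avg in rows)

class_average = total_points / total_students
print(round(class_average, 1))  # 85.7
```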

14. Susan would like to conduct a survey of homeowners in the Meadowbrook neighbourhood to get
their opinions on proposed road modifications in the area. Which of the following is an example of
a systematic sample?
A. Susan selects every third house on each street in the neighbourhood.
B. Susan ensures that her sample contains an equal number of two-story, split-level, and ranch
homes.
C. Susan randomly chooses two streets in the neighbourhood and selects every home on these
streets.
D. Susan selects the first 20 homes that she passes as she walks into the entrance of the
neighbourhood.

15. Porter Automotive is a car dealership that sells Buicks and Hondas. The following data shows the
number of buyers this month according to the brand of car they purchased as well as their age
group.

Age                   Buick    Honda
Under 40 years old    6        17
40 years or older     19       8

The percentage of buyers who are under 40 years old in this data is _____.
A. 20%
B. 50%
C. 46%
D. 54%
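
The percentage in question 15 is the row total for buyers under 40 divided by the grand total of all buyers. A short Python sketch of that calculation, with the table stored in a hypothetical nested dictionary:

```python
# Counts of buyers from the table, keyed by age group and then by brand.
buyers = {
    "under_40": {"Buick": 6, "Honda": 17},
    "40_or_older": {"Buick": 19, "Honda": 8},
}

under_40_total = sum(buyers["under_40"].values())
grand_total = sum(sum(group.values()) for group in buyers.values())

percent_under_40 = 100 * under_40_total / grand_total
print(under_40_total, grand_total, percent_under_40)  # 23 50 46.0
```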

16. A professor would like to test the hypothesis that the average number of minutes that a student
needs to complete a statistics exam is equal to 45 minutes. The correct hypothesis statement would
be?
A. H0: μ = 45; H1: μ > 45.
B. H0: μ ≠ 45; H1: μ = 45.
C. H0: μ = 45; H1: μ < 45.
D. H0: μ = 45; H1: μ ≠ 45.

17. A regression analysis between sales (in $1000) and advertising (in $) resulted in the following least
squares line, ŷ = 80,000 + 4x. This implies that:
A. An increase of $4 in advertising is expected to result in an increase of $4,000 in sales.
B. An increase of $1 in advertising is expected to result in an increase of $4,000 in sales.
C. An increase of $1 in advertising is expected to result in an increase of $80,004 in sales.
D. An increase of $1 in advertising is expected to result in an increase of $4 in sales.
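
The key to question 17 is the units: sales are recorded in $1,000s while advertising is in dollars, so the slope of 4 is measured in thousands of dollars of sales per advertising dollar. A minimal sketch (the function name and the advertising levels 100 and 101 are arbitrary illustrations):

```python
def predicted_sales_thousands(advertising_dollars):
    """Predicted sales in $1,000s from the least squares line 80,000 + 4x."""
    return 80_000 + 4 * advertising_dollars


# Effect of one extra advertising dollar, measured in $1,000s of sales.
change_thousands = predicted_sales_thousands(101) - predicted_sales_thousands(100)

# Convert from $1,000s to dollars.
change_dollars = change_thousands * 1000
print(change_thousands, change_dollars)  # 4 4000
```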

18. An indication of no linear relationship between two variables would be a:


A. Coefficient of determination equal to 1
B. Coefficient of correlation equal to -1
C. Coefficient of correlation of 0
D. Coefficient of determination equal to -1

19. Suppose you are interested in examining the determinants of earnings. You have information on
the age of the individual as well as their level of education: high school graduate, college graduate
or graduate degree. Let Y = earnings, X1 = age, X2 = 1 if the person has only a high school degree
and 0 otherwise, X3 = 1 if the person has a college degree and 0 otherwise, X4 = 1 if the person
has a graduate degree and 0 otherwise. Which of the following model specifications would not
work?
A. Y = β0 + β1X1 + β2X2 + β3X3 + β4X4
B. Y = β1X1 + β2X2 + β3X3
C. Y = β0 + β1X1 + β2X2 + β3X3
D. None of the above
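
Question 19 turns on the dummy variable trap: when every education category has its own dummy and the model also has an intercept, the dummies sum to the intercept column in every row, so the coefficients cannot be estimated separately. The sketch below builds such a design matrix for a few made-up people; the data and the helper `design_row` are hypothetical.

```python
# Hypothetical individuals as (age, education level).
people = [(25, "high school"), (40, "college"), (33, "graduate"), (51, "college")]
levels = ["high school", "college", "graduate"]


def design_row(age, education, dummy_levels):
    """Row of the design matrix: intercept, age, then one dummy per level."""
    dummies = [1 if education == level else 0 for level in dummy_levels]
    return [1, age] + dummies


# Columns: intercept, X1 (age), X2, X3, X4.
full_model = [design_row(age, edu, levels) for age, edu in people]

# In every row the three dummies add up to the intercept column (always 1),
# which is exact linear dependence among the regressors.
dummies_equal_intercept = all(
    row[2] + row[3] + row[4] == row[0] for row in full_model
)
print(dummies_equal_intercept)  # True
```

Dropping one dummy, or dropping the intercept, removes the dependence.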

20. Suppose we were to run a linear regression using the data in the following scatter plot.

What are the most reasonable values for y-intercept b and the slope m?
A. b = 120 and m = −2
B. b = 45 and m = −20
C. b = 45 and m = −2
D. b = 45 and m = 2

END OF EXAM

ANSWER PAPER: NOT TO BE SEEN BY STUDENTS

11) C
12) D
13) D
14) A
15) C
16) D
17) B
18) C
19) A
20) A
