STA467/567 Fall 2022
Tyler Drellishak
a. Construct a scatter plot of Sunday circulation versus daily circulation. Does the
plot suggest a linear relationship between daily and Sunday circulation? Do
you think this is a plausible relationship?
The plot suggests a linear relationship between daily and Sunday circulation. I
believe this is a plausible relationship, as papers that are more popular on
weekdays are also likely to be more popular on Sundays.
$\hat{\sigma}^2 = 109.4^2 = 11968.36$
$H_0: \beta_1 = 0$
$H_1: \beta_1 \neq 0$
$T = 18.93$
$p < 2 \times 10^{-16}$
g. Provide an interval estimate (based on 95% level) for the average Sunday
circulation of newspapers with daily circulation of 500,000.
We should not predict beyond the range of the observed predictor values; the
regression model cannot be extrapolated. The prediction could be accurate, but we
do not know how the variables are related outside the range of the data used to
fit the model.
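The caution about extrapolation can be made mechanical. The following is a minimal sketch, not the analysis used for this assignment: the helper names `fit_line` and `predict_in_range` and the circulation figures are hypothetical, chosen only to show a prediction being refused when the new value lies outside the observed range.

```python
def fit_line(x, y):
    """Least-squares intercept and slope for y = b0 + b1 * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1


def predict_in_range(x, y, x0):
    """Predict at x0, refusing to extrapolate outside the observed x range."""
    if not (min(x) <= x0 <= max(x)):
        raise ValueError(
            f"x0={x0} is outside the observed range [{min(x)}, {max(x)}]"
        )
    b0, b1 = fit_line(x, y)
    return b0 + b1 * x0


# Hypothetical circulation figures (in thousands), NOT the assignment data.
daily = [100.0, 200.0, 300.0, 400.0, 500.0]
sunday = [150.0, 290.0, 410.0, 560.0, 690.0]

inside = predict_in_range(daily, sunday, 300.0)  # allowed: 300 is in range
```

Calling `predict_in_range(daily, sunday, 2000.0)` with these made-up values raises a `ValueError` instead of returning an extrapolated number.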
2. One may wonder if people of similar heights tend to marry each other. For this
purpose, a sample of newly married couples was selected. Let X be the height of
the husband and Y be the height of the wife. The heights (in centimeters) of
husbands and wives are found in the file named (P052.txt).
a. Compute the covariance between the heights of the husbands and wives.
b. What would the covariance be if heights were measured in inches rather than
in centimeters?
$69.413\ \text{cm}^2 \times \left(\frac{1\ \text{in}}{2.54\ \text{cm}}\right)^2 = 10.759\ \text{in}^2$
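The unit conversion can be checked numerically. This is a rough sketch under stated assumptions: the heights below are made up (they are not the P052.txt data), and `sample_cov` is a hypothetical helper using the usual n - 1 divisor. It shows that converting both variables from centimeters to inches divides the covariance by 2.54 squared.

```python
CM_PER_INCH = 2.54


def sample_cov(x, y):
    """Sample covariance with the n - 1 divisor."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    return sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)


# Hypothetical heights in centimeters, NOT the assignment data.
husband_cm = [186.0, 180.0, 160.0, 175.0, 170.0]
wife_cm = [175.0, 168.0, 154.0, 166.0, 162.0]

# Convert every height to inches.
husband_in = [h / CM_PER_INCH for h in husband_cm]
wife_in = [w / CM_PER_INCH for w in wife_cm]

cov_cm = sample_cov(husband_cm, wife_cm)  # units: cm^2
cov_in = sample_cov(husband_in, wife_in)  # units: in^2

# The two covariances differ by a factor of 2.54 ** 2.
print(cov_cm / CM_PER_INCH ** 2, cov_in)
```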
c. Compute the correlation coefficient between the heights of the husband and
wife.
r = 0.763
d. What would the correlation be if heights were measured in inches rather than
in centimeters?
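A short derivation, assuming both heights are rescaled by the same positive constant $a = 1/2.54$ and writing $s_X$, $s_Y$ for the sample standard deviations (notation introduced here for illustration), suggests the correlation does not change with the units:

```latex
\[
r_{\text{in}}
  = \frac{\operatorname{cov}(aX, aY)}{s_{aX}\, s_{aY}}
  = \frac{a^{2}\operatorname{cov}(X, Y)}{(a\, s_X)(a\, s_Y)}
  = \frac{\operatorname{cov}(X, Y)}{s_X\, s_Y}
  = r_{\text{cm}}
\]
```

Under this argument the correlation in inches equals the value from part (c), r = 0.763.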
f. We wish to fit a regression model relating the heights of husbands and wives.
Which one of the two variables would you choose as the response variable?
Justify your answer.
I would choose the wife's height as the response variable. In the prior question,
a man chooses a wife exactly 5 cm shorter than himself, which makes the husband's
height the predictor variable and the wife's height the response.
g. Using your choice of the response variable in part (f), test the null hypothesis
that the slope is zero.
$H_0: \beta_1 = 0$
$H_1: \beta_1 \neq 0$
$T = 11.458$
$p < 2 \times 10^{-16}$
h. Using your choice of the response variable in part (f), test the null hypothesis
that the intercept is zero.
$H_0: \beta_0 = 0$
$H_1: \beta_0 \neq 0$
$T = 3.933$
$p = 0.000161$
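The test statistics in parts (g) and (h) are each an estimate divided by its standard error. The sketch below shows one way they could be computed by hand; it uses made-up heights rather than the P052.txt data, and `t_statistics` is a hypothetical helper, so its output will not match the values reported above.

```python
import math


def t_statistics(x, y):
    """t statistics for H0: b0 = 0 and H0: b1 = 0 in y = b0 + b1 * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

    # Least-squares estimates.
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar

    # Residual standard error on n - 2 degrees of freedom.
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    sigma_hat = math.sqrt(sse / (n - 2))

    # Standard errors of the slope and the intercept.
    se_b1 = sigma_hat / math.sqrt(sxx)
    se_b0 = sigma_hat * math.sqrt(1.0 / n + x_bar ** 2 / sxx)

    return {"b0": b0, "b1": b1, "t0": b0 / se_b0, "t1": b1 / se_b1, "df": n - 2}


# Hypothetical heights in centimeters, NOT the assignment data.
husband_cm = [186.0, 180.0, 160.0, 175.0, 170.0]
wife_cm = [175.0, 168.0, 154.0, 166.0, 162.0]

result = t_statistics(husband_cm, wife_cm)
print(result)
```

Each statistic is compared with a t distribution on n - 2 degrees of freedom to obtain the two-sided p-value.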
i. What is the coefficient of determination for the model you fitted in (f), and
how do you interpret its value? How is this coefficient of determination related
to the correlation coefficient in part (c)?
j. If Y and X were reversed in the above regression, what would you expect $R^2$ to
be? Why?
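I would expect $R^2 = 0.5828$, since $R^2 = r^2$ and $r = \frac{\operatorname{cov}(x, y)}{s_x s_y}$, and
$\operatorname{cov}(x, y) = \operatorname{cov}(y, x)$.
So we know $R^2 = r^2 = \frac{\operatorname{cov}(x, y)^2}{s_x^2 s_y^2} = \frac{\operatorname{cov}(y, x)^2}{s_y^2 s_x^2} = r^2\ (\text{reversed}) = R^2\ (\text{reversed})$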
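The symmetry argument can also be checked numerically. This is a minimal sketch with made-up heights (not the P052.txt data) and hypothetical helpers `r_squared` and `correlation`: regressing y on x and x on y should give the same coefficient of determination, equal to the squared correlation.

```python
import math


def r_squared(x, y):
    """R^2 from regressing y on x, computed as 1 - SSE / SST."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    sst = sum((yi - y_bar) ** 2 for yi in y)
    return 1.0 - sse / sst


def correlation(x, y):
    """Pearson correlation coefficient."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    return sxy / math.sqrt(sxx * syy)


# Hypothetical heights in centimeters, NOT the assignment data.
husband_cm = [186.0, 180.0, 160.0, 175.0, 170.0]
wife_cm = [175.0, 168.0, 154.0, 166.0, 162.0]

r2_wife_on_husband = r_squared(husband_cm, wife_cm)
r2_husband_on_wife = r_squared(wife_cm, husband_cm)
r = correlation(husband_cm, wife_cm)
print(r2_wife_on_husband, r2_husband_on_wife, r ** 2)
```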
3. Name one or more graphs that can be used to validate each of the following
assumptions. For each graph, sketch an example where the corresponding
assumption is valid and an example where the assumption is clearly invalid.
4. The Expanded Computer Repair Times Data: Length of Service Calls (Minutes)
and Number of Units Repaired (Units). You can find the data in the file named
“P124.txt”.
1) It is a linear model.
To check this assumption, we will look at the scatterplot of length of call vs.
units repaired to see whether a linear model appears appropriate.