Simple Linear Regression
Brandon Stewart
Princeton
These slides are heavily influenced by Matt Blackwell, Adam Glynn and Jens Hainmueller. Illustrations by Shay O'Brien.
Where We’ve Been and Where We’re Going...
Last Week
  - hypothesis testing
  - what is regression
This Week
  - Monday:
    - mechanics of OLS
    - properties of OLS
  - Wednesday:
    - hypothesis tests for regression
    - confidence intervals for regression
    - goodness of fit
Next Week
  - mechanics with two regressors
  - omitted variables, multicollinearity
Long Run
  - probability → inference → regression
Questions?
Macrostructure
1 Mechanics of OLS
4 Properties Continued
7 Goodness of fit
The population linear regression function
The sample linear regression function
û_i = Y_i − Ŷ_i
Overall Goals for the Week
What is OLS?
An estimator for the slope and the intercept of the regression line
We talked last week about ways to derive this estimator and we
settled on deriving it by minimizing the squared prediction errors of
the regression, or in other words, minimizing the sum of the squared
residuals:
Ordinary Least Squares (OLS):
(β̂0, β̂1) = arg min_{b0, b1} Σ_{i=1}^n (Yi − b0 − b1 Xi)^2
Graphical Example
How do we fit the regression line Ŷ = β̂0 + β̂1 X to the data?
Answer: We will minimize the sum of squared residuals
The residual û_i is the "part" of Y_i not predicted by the line: û_i = Y_i − Ŷ_i
min_{b0, b1} Σ_{i=1}^n û_i^2
Deriving the OLS estimator
Let’s think about n pairs of sample observations:
(Y1 , X1 ), (Y2 , X2 ), . . . , (Yn , Xn )
Let {b0 , b1 } be possible values for {β0 , β1 }
Define the least squares objective function:
S(b0, b1) = Σ_{i=1}^n (Yi − b0 − b1 Xi)^2
The OLS estimator
β̂0 = Ȳ − β̂1 X̄

β̂1 = Σ_{i=1}^n (Xi − X̄)(Yi − Ȳ) / Σ_{i=1}^n (Xi − X̄)^2
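As a quick check on these formulas, here is a minimal R sketch on simulated data (the data, seed, and names are illustrative, not from the slides):

# Closed-form OLS estimates compared with lm()
set.seed(1)
x <- rnorm(100)
y <- 1 + 0.5 * x + rnorm(100)
beta1_hat <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
beta0_hat <- mean(y) - beta1_hat * mean(x)
c(beta0_hat, beta1_hat)
coef(lm(y ~ x))   # same two numbers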
Intuition of the OLS estimator
The intercept equation tells us that the regression line goes through the point (Ȳ, X̄):
Ȳ = β̂0 + β̂1 X̄
The slope for the regression line can be written as the following:
β̂1 = Σ_{i=1}^n (Xi − X̄)(Yi − Ȳ) / Σ_{i=1}^n (Xi − X̄)^2 = (Sample Covariance between X and Y) / (Sample Variance of X)
The higher the covariance between X and Y, the higher the slope will be.
Negative covariances → negative slopes; positive covariances → positive slopes
What happens when Xi doesn't vary?
What happens when Yi doesn't vary?
A Visual Intuition for the OLS Estimator
[Figure: illustration of the OLS fit, with points contributing positively (+) and negatively (−)]
Mechanical properties of OLS
Later we’ll see that under certain assumptions, OLS will have nice
statistical properties.
But some properties are mechanical since they can be derived from
the first order conditions of OLS.
1. The residuals will be 0 on average:
   (1/n) Σ_{i=1}^n û_i = 0
2. The residuals will be uncorrelated with the predictor (ĉov is the sample covariance):
   ĉov(X_i, û_i) = 0
3. The residuals will be uncorrelated with the fitted values:
   ĉov(Ŷ_i, û_i) = 0
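These three facts are easy to verify numerically; a minimal R sketch on simulated data (illustrative data and names):

set.seed(1)
x <- rnorm(100)
y <- 1 + 0.5 * x + rnorm(100)
fit <- lm(y ~ x)
u_hat <- resid(fit)
mean(u_hat)               # ~ 0: residuals average to zero
cov(x, u_hat)             # ~ 0: residuals uncorrelated with the predictor
cov(fitted(fit), u_hat)   # ~ 0: residuals uncorrelated with the fitted values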
OLS slope as a weighted sum of the outcomes
One useful derivation is to write the OLS estimator for the slope as a
weighted sum of the outcomes.
β̂1 = Σ_{i=1}^n W_i Y_i,   where   W_i = (X_i − X̄) / Σ_{i=1}^n (X_i − X̄)^2
This is important for two reasons. First, it'll make derivations later much easier. And second, it shows that β̂1 is just a weighted sum of the outcomes Yi, which are random variables. Therefore it is also a random variable.
To the board!
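A quick numerical check of this weighted-sum representation (simulated data, illustrative names):

set.seed(1)
x <- rnorm(100)
y <- 1 + 0.5 * x + rnorm(100)
w <- (x - mean(x)) / sum((x - mean(x))^2)   # the weights W_i
sum(w * y)                                  # equals the OLS slope...
coef(lm(y ~ x))["x"]                        # ...reported by lm()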
Sampling distribution of the OLS estimator
Remember: OLS is an estimator—it’s a machine that we plug data
into and we get out estimates.
Sample 1:   {(Y1, X1), ..., (Yn, Xn)}  →  OLS  →  (β̂0, β̂1)_1
Sample 2:   {(Y1, X1), ..., (Yn, Xn)}  →  OLS  →  (β̂0, β̂1)_2
   ...
Sample k−1: {(Y1, X1), ..., (Yn, Xn)}  →  OLS  →  (β̂0, β̂1)_{k−1}
Sample k:   {(Y1, X1), ..., (Yn, Xn)}  →  OLS  →  (β̂0, β̂1)_k
Just like the sample mean, sample difference in means, or the sample
variance
It has a sampling distribution, with a sampling variance/standard
error, etc.
Let’s take a simulation approach to demonstrate:
  - Pretend that the AJR data represents the population of interest
  - See how the line varies from sample to sample
Simulation procedure
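A sketch of this procedure in R; the data frame ajr, its column names, the sample size, and the number of replications are all assumptions for illustration:

n_reps <- 1000
n <- 30
intercepts <- slopes <- numeric(n_reps)
for (r in 1:n_reps) {
  samp <- ajr[sample(nrow(ajr), n, replace = TRUE), ]   # draw a "sample" from the "population"
  fit  <- lm(log_gdp ~ mortality, data = samp)          # hypothetical column names
  intercepts[r] <- coef(fit)[1]
  slopes[r]     <- coef(fit)[2]
}
hist(intercepts)   # sampling distribution of the intercept
hist(slopes)       # sampling distribution of the slope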
Population Regression
[Figure: log GDP per capita growth plotted against the regressor for the full AJR data, with the population regression line]
Randomly sample from AJR
[Figure: a random sample from the AJR data with its estimated regression line]
Sampling distribution of OLS
You can see that the estimated slopes and intercepts vary from sample
to sample, but that the “average” of the lines looks about right.
[Figure: histograms of the sampling distributions of the intercept (β̂0) and slope (β̂1) estimates across repeated samples]
Is this unique?
Assumptions for unbiasedness of the sample mean
What assumptions did we make to prove that the sample mean was
unbiased?
E[X̄] = µ
Just one: random sample
We’ll need more than this for the regression case
Our goal
β̂1 ∼ ?(?, ?)
OLS Assumptions Preview
Hierarchy of OLS Assumptions
[Figure: hierarchy of OLS assumptions, contrasting the Gauss-Markov (BLUE) assumptions with the classical LM (BUE) assumptions and what each buys (data description, identification, unbiasedness, consistency, asymptotic vs. small-sample inference), with homoskedasticity and normality of errors as the added assumptions]
OLS Assumption I
Assumption (I. Linearity in Parameters)
The population regression model is linear in its parameters and correctly
specified as:
Y = β0 + β1 X1 + u
Potential Violations:
Time series data (regressor values may exhibit persistence)
Sample selection problems (sample not representative of the
population)
OLS Assumption III
Assumption (III. Variation in X ; a.k.a. No Perfect Collinearity)
The observed data:
xi for i = 1, ..., n
are not all the same value.
Satisfied as long as there is some variation in the regressor X in the
sample.
OLS Assumption IV
Assumption (IV. Zero Conditional Mean)
The population error term has a zero conditional mean: E[u | X] = 0.
Violations:
Recall that u represents all unobserved factors that influence Y
If such unobserved factors are also correlated with X, then Cov(X, u) ≠ 0
Violating the zero conditional mean assumption
How does this assumption get violated? Let’s generate data from the
following model:
Yi = 1 + 0.5Xi + ui
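One way to see the problem is to simulate from this model twice, once with E[u | X] = 0 and once without; the violated version below (u correlated with X) is only illustrative, since the exact data-generating process behind the figure is not shown:

set.seed(2)
x <- rnorm(200)
u_ok  <- rnorm(200)                   # E[u | X] = 0 holds
u_bad <- rnorm(200, mean = 0.8 * x)   # E[u | X] depends on X: assumption violated
y_ok  <- 1 + 0.5 * x + u_ok
y_bad <- 1 + 0.5 * x + u_bad
coef(lm(y_ok ~ x))    # slope close to the true 0.5
coef(lm(y_bad ~ x))   # slope biased away from 0.5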
[Figure: two scatterplots of Y against X with fitted lines, illustrating a violation of the zero conditional mean assumption]
Unbiasedness (to the blackboard)
With Assumptions 1-4, we can show that the OLS estimator for the slope is unbiased, that is, E[β̂1] = β1.
Unbiasedness of OLS
The sampling distributions of the estimators β̂1 and β̂0 are centered about
the true population parameter values β1 and β0 .
Where are we?
β̂1 ∼ ?(β1, ?)
Sampling variance of estimated slope
1 Linearity
2 Random (iid) sample
3 Variation in Xi
4 Zero conditional mean of the errors
5 Homoskedasticity
Variance of OLS Estimators
How can we derive Var[β̂0 ] and Var[β̂1 ]? Let’s make the following additional
assumption:
Var[u | X] = σ_u^2
Variance of OLS Estimators
Theorem (Variance of OLS Estimators)
Given OLS Assumptions I–V (Gauss-Markov Assumptions):
Var[β̂1 | X] = σ_u^2 / Σ_{i=1}^n (x_i − x̄)^2 = σ_u^2 / SST_x

Var[β̂0 | X] = σ_u^2 ( 1/n + x̄^2 / Σ_{i=1}^n (x_i − x̄)^2 )
Understanding the sampling variance
var[β̂1 | X1, ..., Xn] = σ_u^2 / Σ_{i=1}^n (Xi − X̄)^2
Estimating the Variance of OLS Estimators
How can we estimate the unobserved error variance Var[u] = σ_u^2?
We can derive an estimator based on the residuals:
Recall: The errors ui are NOT the same as the residuals ûi .
Intuitively, the scatter of the residuals around the fitted regression line should
reflect the unseen scatter about the true population regression line.
We can measure scatter with the mean squared deviation:
MSD(û) ≡ (1/n) Σ_{i=1}^n (û_i − mean(û))^2 = (1/n) Σ_{i=1}^n û_i^2
Estimating the Variance of OLS Estimators
By construction, the regression line is closer since it is drawn to fit the
actual sample we have
Specifically, the regression line is drawn so as to minimize the sum of the
squares of the distances between it and the observations
So the spread of the residuals MSD(û) will slightly underestimate the error
variance Var[u] = σu2 on average
In fact, we can show that with a single regressor X we have:
E[MSD(û)] = ((n − 2)/n) σ_u^2    (degrees of freedom adjustment)
We plug this estimate into the variance estimators for β̂0 and β̂1 .
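Concretely, this suggests the adjusted estimator σ̂_u^2 = Σ û_i^2 / (n − 2); a minimal R sketch on simulated data (illustrative names):

set.seed(1)
x <- rnorm(100)
y <- 1 + 0.5 * x + rnorm(100)
fit <- lm(y ~ x)
n <- length(x)
sigma2_hat <- sum(resid(fit)^2) / (n - 2)              # df-adjusted error variance estimate
se_beta1   <- sqrt(sigma2_hat / sum((x - mean(x))^2))  # plug into Var[beta1-hat | X]
se_beta1
summary(fit)$coefficients["x", "Std. Error"]           # matches lm()'s standard error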
Where are we?
β̂1 ∼ ?(β1, σ_u^2 / Σ_{i=1}^n (Xi − X̄)^2)
Example: Epstein and Mershon SCOTUS data
[Figure: Supreme Court justices' civil liberties liberalism (CLlib) plotted against SCscore, with fitted line y = 27.6 + 41.2x + u and the slope shown as Rise over Run]
How to get β0 and β1
β̂1 = Σ_{i=1}^n (x_i − x̄)(y_i − ȳ) / Σ_{i=1}^n (x_i − x̄)^2,    β̂0 = ȳ − β̂1 x̄
Where are we?
[Figure: the hierarchy of OLS assumptions (Gauss-Markov BLUE vs. classical LM BUE), repeated as a progress check]
Where are we?
β̂1 ∼ ?(β1, σ_u^2 / Σ_{i=1}^n (Xi − X̄)^2)
OLS is BLUE :(
Theorem (Gauss-Markov)
Given OLS Assumptions I–V, the OLS estimator is BLUE, i.e. the
1 Best: Lowest variance in class
2 Linear: Among Linear estimators
3 Unbiased: Among Linear Unbiased estimators
4 Estimator.
Gauss-Markov Theorem
OLS is efficient in the class of unbiased, linear estimators.
[Figure: nested classes of estimators — all estimators, unbiased estimators, linear estimators]
β̂1 ∼ ?(β1, σ_u^2 / Σ_{i=1}^n (Xi − X̄)^2)
And we know that σ_u^2 / Σ_{i=1}^n (Xi − X̄)^2 is the lowest variance of any linear estimator of β1
What about the last question mark? What's the form of the distribution? Uniform? t? Normal? Exponential? Hypergeometric?
Large-sample distribution of OLS estimators
Remember that the OLS estimator is the sum of independent r.v.’s:
β̂1 = Σ_{i=1}^n W_i Y_i
Where are we?
Under Assumptions 1-5 and in large samples, we know that
β̂1 ∼ N(β1, σ_u^2 / Σ_{i=1}^n (Xi − X̄)^2)
Sampling distribution in small samples
1 Linearity
2 Random (iid) sample
3 Variation in Xi
4 Zero conditional mean of the errors
5 Homoskedasticity
6 Errors are conditionally Normal
OLS Assumptions VI
Assumption (VI. Normality)
The population error term is independent of the explanatory variable, u ⊥ X, and is normally distributed with mean zero and variance σ_u^2: u ∼ N(0, σ_u^2)
Sampling Distribution for β̂1
Theorem (Sampling Distribution of β̂1)
Under Assumptions I–VI,
β̂1 ∼ N(β1, Var[β̂1 | X])
where
Var[β̂1 | X] = σ_u^2 / Σ_{i=1}^n (x_i − x̄)^2
which implies
(β̂1 − β1) / sqrt(Var[β̂1 | X]) = (β̂1 − β1) / SE(β̂1) ∼ N(0, 1)
Proof.
Given Assumptions I–VI, β̂1 is a linear combination of the i.i.d. normal random variables:
β̂1 = β1 + Σ_{i=1}^n ((x_i − x̄) / SST_x) u_i,   where u_i ∼ N(0, σ_u^2).
Any linear combination of independent normals is normal, and we can transform/standardize any normal random variable into a standard normal by subtracting off its mean and dividing by its standard deviation.
Sampling distribution of OLS slope
If Yi given Xi is distributed N(β0 + β1 Xi, σ_u^2), then we have the following at any sample size:
(β̂1 − β1) / SE[β̂1] ∼ N(0, 1)
(β̂1 − β1) / ŜE[β̂1] ∼ t_{n−2}
Proof.
The logic is perfectly analogous to the t-value for the population mean — because we
are estimating the denominator, we need a distribution that has fatter tails than N(0, 1)
to take into account the additional uncertainty.
This time, σ̂_u^2 contains two estimated parameters (β̂0 and β̂1) instead of one, hence the degrees of freedom = n − 2.
Where are we?
β̂1 ∼ N(β1, σ_u^2 / Σ_{i=1}^n (Xi − X̄)^2)

(β̂1 − β1) / ŜE[β̂1] ∼ t_{n−2}
Large Sample Properties: Consistency
We just looked formally at the small sample properties of the OLS
estimator, i.e., how (β̂0 , β̂1 ) behaves in repeated samples of a given n.
Now let’s take a more rigorous look at the large sample properties, i.e., how
(β̂0 , β̂1 ) behaves when n → ∞.
plim_{n→∞} β̂1 = β1 + Cov[X, u] / Var[X]    (by the law of large numbers)
              = β1                           (since Cov[X, u] = 0 and Var[X] > 0)
Large Sample Properties: Asymptotic Normality
For statistical inference, we need to know the sampling distribution of β̂
when n → ∞.
Large Sample Inference
Proof.
Proof is similar to the small-sample normality proof:
β̂1 = β1 + Σ_{i=1}^n ((x_i − x̄) / SST_x) u_i

√n (β̂1 − β1) = [ √n · (1/n) Σ_{i=1}^n (x_i − x̄) u_i ] / [ (1/n) Σ_{i=1}^n (x_i − x̄)^2 ]
where the numerator converges in distribution to a normal random variable by CLT.
Then, rearranging the terms, etc. gives you the right formula given in the theorem.
For a more formal and detailed proof, see Wooldridge Appendix 5A.
For 2 and 3, we need to know more than just the mean and the variance of
the sampling distribution of β̂1 . We need to know the full shape of the
sampling distribution of our estimators β̂0 and β̂1 .
Null and alternative hypotheses review
Null: H0 : β1 = 0
  - The null is the straw man we want to knock down.
  - With regression, almost always null of no relationship
Alternative: Ha : β1 ≠ 0
  - Claim we want to test
  - Almost always "some effect"
  - Could do one-sided test, but you shouldn't
Notice these are statements about the population parameters, not the
OLS estimates.
Test statistic
Under the null of H0 : β1 = c, we can use the following familiar test
statistic:
T = (β̂1 − c) / ŜE[β̂1]
As we saw in the last section, if the errors are conditionally Normal,
then under the null hypothesis we have:
T ∼ t_{n−2}
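A minimal R sketch of this test on simulated data (names and numbers are illustrative); the same t statistic and p-value appear in summary(lm(...)):

set.seed(6)
x <- rnorm(50)
y <- 1 + 0.5 * x + rnorm(50)
fit   <- lm(y ~ x)
b1    <- coef(fit)["x"]
se1   <- summary(fit)$coefficients["x", "Std. Error"]
tstat <- (b1 - 0) / se1                                        # test statistic under H0: beta1 = 0
pval  <- 2 * pt(abs(tstat), df = 50 - 2, lower.tail = FALSE)   # two-sided p-value
c(tstat, pval)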
Rejection region
Choose a level of the test, α, and find rejection regions that
correspond to that value under the null distribution:
P(−t_{α/2,n−2} < T < t_{α/2,n−2}) = 1 − α
This is exactly the same as with sample means and sample differences
in means, except that the degrees of freedom on the t distribution
have changed.
[Figure: the null distribution with two-sided rejection regions — retain between −1.96 and 1.96, reject in either tail, with 0.025 in each tail]
p-value
Confidence intervals
Very similar to the approach with sample means. By the sampling
distribution of the OLS estimator, we know that we can find t-values
such that:
P( −t_{α/2,n−2} ≤ (β̂1 − β1) / ŜE[β̂1] ≤ t_{α/2,n−2} ) = 1 − α
If we rearrange this as before, we can get an expression for confidence intervals:
P( β̂1 − t_{α/2,n−2} ŜE[β̂1] ≤ β1 ≤ β̂1 + t_{α/2,n−2} ŜE[β̂1] ) = 1 − α
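A minimal R sketch computing this interval by hand and checking it against confint() (simulated data, illustrative names):

set.seed(7)
x <- rnorm(50)
y <- 1 + 0.5 * x + rnorm(50)
fit   <- lm(y ~ x)
b1    <- coef(fit)["x"]
se1   <- summary(fit)$coefficients["x", "Std. Error"]
tcrit <- qt(0.975, df = 50 - 2)             # t_{alpha/2, n-2} for a 95% interval
c(b1 - tcrit * se1, b1 + tcrit * se1)       # matches confint(fit)["x", ]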
Sampling distribution of interval estimates
CIs Simulation Example
Returning to the simulation example, we can simulate the sampling distributions of the 95% interval estimates for β̂0 and β̂1.
[Figure: 95% confidence intervals computed across repeated simulated samples]
When we repeat the process over and over, we expect 95% of the confidence intervals to contain the true parameters.
Note that, in a given sample, one CI may cover its true value and the other may not.
Prediction error
How do we judge how well a line fits the data?
One way is to find out how much better we do at predicting Y once
we include X into the regression model.
Prediction errors without X : best prediction is the mean, so our
squared errors, or the total sum of squares (SStot ) would be:
SStot = Σ_{i=1}^n (Yi − Ȳ)^2
Sum of Squares
[Figures: the AJR data (log GDP per capita growth) showing, first, deviations of Yi from the mean Ȳ (the total sum of squares) and, second, the residuals around the fitted regression line]
R-square
Is R-squared useful?
[Figures: scatterplots with fitted lines and reported values of R-squared = 0.66 and R-squared = 0.96, plus four panels of Y against X, illustrating that similar R-squared values can arise from very different data patterns]
Why r^2?
To calculate r^2, we need to think about the following two quantities:
1 TSS: Total sum of squares
2 SSE: Sum of squared errors
TSS = Σ_{i=1}^n (y_i − ȳ)^2
SSE = Σ_{i=1}^n û_i^2
r^2 = 1 − SSE / TSS
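A minimal R sketch computing r^2 from these pieces on simulated data and checking it against lm():

set.seed(8)
x <- rnorm(50)
y <- 1 + 0.5 * x + rnorm(50)
fit <- lm(y ~ x)
tss <- sum((y - mean(y))^2)   # total sum of squares
sse <- sum(resid(fit)^2)      # sum of squared residuals
1 - sse / tss
summary(fit)$r.squared        # same value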
[Figures: the Supreme Court example (CLlib against SCscore) annotated first with the deviations from the mean that make up TSS, and then with the residuals around the fitted line that make up SSE]
Derivation
Σ_{i=1}^n (y_i − ȳ)^2 = Σ_{i=1}^n {û_i + (ŷ_i − ȳ)}^2
                      = Σ_{i=1}^n {û_i^2 + 2 û_i (ŷ_i − ȳ) + (ŷ_i − ȳ)^2}
                      = Σ_{i=1}^n û_i^2 + 2 Σ_{i=1}^n û_i (ŷ_i − ȳ) + Σ_{i=1}^n (ŷ_i − ȳ)^2
                      = Σ_{i=1}^n û_i^2 + Σ_{i=1}^n (ŷ_i − ȳ)^2
(the cross term drops out because the residuals sum to zero and are uncorrelated with the fitted values)
TSS = SSE + RegSS
Coefficient of Determination
SSE/TSS + RegSS/TSS = 1

r^2 = RegSS/TSS = 1 − SSE/TSS

r^2 is a measure of how much of the variation in Y is accounted for by X.
OLS Assumptions Summary
[Figure: the hierarchy of OLS assumptions (Gauss-Markov BLUE vs. classical LM BUE), shown again as a summary]
What Do the Regression Coefficients Mean Substantively?
So far, we have learned the statistical properties of the OLS estimator
However, these properties do not tell us what types of inference we
can draw from the estimates
Note that Assumption I would make OLS the best, not just best linear,
predictor, so it is certainly desired
State Legislators and African American Population
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -1.31489 0.32775 -4.012 0.000264 ***
bpop 0.35848 0.02519 14.232 < 2e-16 ***
---
Signif. codes: 0 *** 0.001 ** 0.01 * 0.05 . 0.1 1
“A one percentage point increase in the African American population is associated with
a 0.35 percentage point increase in the fraction of African American state legislators
(p < 0.001).”
Ground Rules: Interpretation of the Slope
Reporting Statistical Significance
Reporting Substantive Significance
Statistical significance and substantive significance are not the same: with a
large enough sample size even truly microscopic differences can be
statistically significant!
Examples:
Earnings on Schooling: The standard deviation is 2.5 years for schooling and
$50,000 for annual earnings. Thus, the slope estimates suggest that a one
standard deviation increase in schooling is associated with a .8 standard
deviation increase in earnings.
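The arithmetic behind that statement, as a sketch (the slope value below is hypothetical; the slides give only the standard deviations and the resulting 0.8):

beta1 <- 16000        # hypothetical slope: dollars of earnings per additional year of schooling
sd_x  <- 2.5          # standard deviation of schooling (years)
sd_y  <- 50000        # standard deviation of earnings (dollars)
beta1 * sd_x / sd_y   # standardized effect: 0.8 sd of earnings per sd of schooling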
Next Week
Fun with Non-Linearities
The linear regression model can accommodate non-linearity in X (but
not in β)
We do this by first transforming X appropriately
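A minimal sketch of the idea on simulated data (the log transformation and numbers here are illustrative, not the slides' example):

set.seed(9)
x <- rexp(100, rate = 0.1) + 1
y <- 2 + 3 * log(x) + rnorm(100)
fit_levels <- lm(y ~ x)        # linear in x: misses the curvature
fit_logx   <- lm(y ~ log(x))   # still linear in the parameters, non-linear in x
coef(fit_logx)                 # recovers the intercept (~2) and slope (~3)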
Example from the American War Library
[Figure: number of American soldiers wounded in action plotted against the number killed, by conflict, with a fitted regression line]
β̂1 = 1.23 → One additional soldier killed predicts 1.23 additional soldiers wounded on average
Wounded (Scale in Levels)
[Figure: dot plot of the number of wounded by conflict, plotted in levels]
Wounded (Logarithmic Scale)
[Figure: the same dot plot of the number of wounded by conflict, plotted on a logarithmic scale (10 to 1,000,000)]
Regression: Log-Level
[Figure: log of the number wounded plotted against the number killed, by conflict, with a fitted regression line]
β̂1 = 0.0000237 → One additional soldier killed predicts a 0.0023 percent increase in the number of soldiers wounded on average
Regression: Log-Log
[Figure: log of the number wounded plotted against the log of the number killed, by conflict, with a fitted regression line]