
ECON0019 Week1 SLR OLS


ECON0019

The SLR Model: Properties of OLS estimators

Professor Dennis Kristensen

UCL

Dennis Kristensen (UCL) ECON0019 1 / 33


Recap

In this week’s video you learned about


1 Definition and interpretation of the Simple Linear Regression (SLR) model
2 The importance of mean-independence between the error term and the
regressor
3 Estimation of coefficients by OLS



Outline

This lecture covers Sections 2.5–2.6 of Wooldridge and pp. 10–14 of the
lecture notes:
1 Unbiasedness of OLS estimators
2 Variance of the OLS estimators
3 Summary and tasks left undone



Statistical analysis of OLS estimators

We have seen how to estimate the SLR model by OLS. But is the
estimated version useful?
Is it informative about the population version?
Can it be used to draw inference?
To answer these questions, we wish to study the statistical properties
of the OLS estimators.
Mathematical statistics: How do our estimators behave across
different samples of data?
For example, on average, would we get the right answer if we could
repeatedly sample?
That is, is the expected value of the OLS estimators equal to the
population values - in effect, the average outcome across all possible
random samples?
This leads to the notion of unbiasedness.



The four main assumptions used in our analysis

Assumption SLR.1 (Linear in Parameters). The population model can


be written as
y = β0 + β1 x + u
where β0 and β1 are the (unknown) population parameters.

We view x and u as random variables; thus, y is of course random,
while β0 and β1 are fixed numbers.
Without further assumptions on u, SLR.1 is always true.



Assumption SLR.2 (Random Sampling). We have a random sample of
size n, {(xi, yi) : i = 1, ..., n}, following the population model.

We use these data to estimate β0 and β1 by OLS.


Because each i is a draw from the population, we can write

yi = β0 + β1 xi + ui

for each i.
Because of SLR.2, we can treat (yi, xi, ui), i = 1, ..., n, as independently
and identically distributed (i.i.d.) random variables.
N.B. ui is the unobserved error for observation i
It is not the residual ûi that we compute from the data!



Assumption SLR.3 (Sample Variation in the Explanatory Variable).
The sample outcomes on xi are not all the same value.

This is the same as saying the sample variance of {xi : i = 1, ..., n} is
not zero.
In practice, this is hardly an assumption at all.
If in the population x does not change then we are not asking an
interesting question.
If the xi ’s are all the same value in the sample, we are unlucky and
cannot proceed.



Assumption SLR.4 (Zero Conditional Mean). In the population, the
error term has zero mean given any value of the explanatory
variable:
E[u|x] = 0 for all x.

This is the key assumption for showing that OLS is unbiased


The zero value is a normalization.
The important requirement is that E[u|x] does not change with x.
Note: We can compute the OLS estimates whether or not this
assumption holds
even if there is no underlying population model
but the estimates may not be meaningful if SLR.4 fails.



Distribution of slope estimator

Remember β1 is an unknown constant in the population and so is the
same in any given sample.
In contrast, its estimator, β̂1, varies across different samples and is a
random outcome
before we collect our data, we do not know which value β̂1 will take.
We then wish to understand better the distribution of β̂1
This will allow us to, e.g., derive confidence intervals for β1 - the
parameter we are trying to learn about.
As a first step, we will show that β̂1's distribution is centered at β1.



Theorem
(Unbiasedness of OLS) Under Assumptions SLR.1–SLR.4 and conditional
on Xn = {x1, x2, ..., xn},

E[β̂0|Xn] = β0 and E[β̂1|Xn] = β1.

By the Law of Iterated Expectations,

E[β̂1] = E[E[β̂1|Xn]] = E[β1] = β1.

So on average, the OLS estimators will "get it right".


The problem is we do not know which kind of sample we have.
We can never know whether we are close to the population value.
We could hope that our sample is “good” so that β̂1 is “close” to β1
but the opposite could equally be true
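The "right on average" property can be checked by simulation. A minimal Monte Carlo sketch (simulated data; the parameter values, sample size and number of replications are hypothetical, and numpy is assumed):

```python
import numpy as np

# Draw many independent samples from the same SLR population and
# average the OLS slope estimates across them.
rng = np.random.default_rng(42)
beta0, beta1, n, reps = 1.0, 2.0, 50, 5000

estimates = np.empty(reps)
for r in range(reps):
    x = rng.normal(size=n)
    u = rng.normal(size=n)              # E[u|x] = 0 by construction (SLR.4)
    y = beta0 + beta1 * x + u           # the population model (SLR.1)
    estimates[r] = np.sum((x - x.mean()) * y) / np.sum((x - x.mean()) ** 2)

mean_b1 = estimates.mean()              # close to beta1 = 2.0 on average
```

Each individual estimate misses β1, but the average across samples sits on top of it - exactly the unbiasedness statement.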



Unbiasedness is a property of the statistical procedure or rule.
After estimating an equation like

lwage-hat = 1.142 + .099 educ,

it is tempting to say 9.9% is an “unbiased estimate” of the return to
education.
This statement is incorrect:
The rule used to get β̂0 = 1.142 and β̂1 = .099 is unbiased (if we
believe E[u|educ] = 0).
The actual numerical estimates are “just” a particular draw from the
underlying distribution of the estimators.



Proof of unbiasedness in 3 steps

1 Obtain a convenient expression for the estimator
2 Write estimator = population parameter + sampling error
3 Show that E[sampling error] = 0.



Step 1: Write down a formula for estimator

It is convenient to use

β̂1 = ∑_{i=1}^n (xi − x̄)(yi − ȳ) / ∑_{i=1}^n (xi − x̄)²
   = ∑_{i=1}^n (xi − x̄)yi / ∑_{i=1}^n (xi − x̄)²

where we have used that ∑_{i=1}^n (xi − x̄)ȳ = 0.

With SSTx = ∑_{i=1}^n (xi − x̄)² - the total variation in the xi - we can
write

β̂1 = ∑_{i=1}^n (xi − x̄)yi / SSTx     (1)

Under SLR.3, SSTx > 0 and so β̂1 exists/is well-defined.
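Equation (1) can be computed directly from data. A minimal sketch on simulated data (the data-generating values are hypothetical; numpy assumed):

```python
import numpy as np

# Simulate one sample from an SLR model and apply equation (1).
rng = np.random.default_rng(0)
n = 200
beta0, beta1 = 1.0, 0.5
x = rng.normal(size=n)
u = rng.normal(size=n)
y = beta0 + beta1 * x + u

sst_x = np.sum((x - x.mean()) ** 2)              # total variation in x
beta1_hat = np.sum((x - x.mean()) * y) / sst_x   # equation (1)
beta0_hat = y.mean() - beta1_hat * x.mean()      # intercept estimate

# The two-deviation form gives the identical number, since
# sum((xi - xbar) * ybar) = 0.
alt = np.sum((x - x.mean()) * (y - y.mean())) / sst_x
```

Under SLR.3 the denominator `sst_x` is strictly positive, so the division is well-defined.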



Step 2: Write estimator as parameter + sampling error

Replace each yi with yi = β0 + β1 xi + ui (which uses SLR.1–SLR.2).

The numerator of (1) becomes

∑_{i=1}^n (xi − x̄)yi = ∑_{i=1}^n (xi − x̄)(β0 + β1 xi + ui)
   = β0 ∑_{i=1}^n (xi − x̄) + β1 ∑_{i=1}^n (xi − x̄)xi + ∑_{i=1}^n (xi − x̄)ui
   = β1 SSTx + ∑_{i=1}^n (xi − x̄)ui,

since ∑_{i=1}^n (xi − x̄) = 0 and ∑_{i=1}^n (xi − x̄)xi = SSTx. Thus,

β̂1 = ∑_{i=1}^n (xi − x̄)yi / SSTx = β1 + ∑_{i=1}^n (xi − x̄)ui / SSTx,

where the last term is the sampling error.
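The decomposition is an exact algebraic identity, which can be verified numerically in a simulation where, unlike in practice, the ui are known (a sketch; all values hypothetical, numpy assumed):

```python
import numpy as np

# One simulated sample where we KNOW the true errors u, so the
# identity beta1_hat = beta1 + sampling error can be checked exactly.
rng = np.random.default_rng(1)
n = 100
beta0, beta1 = 2.0, -1.5
x = rng.uniform(0, 10, size=n)
u = rng.normal(size=n)
y = beta0 + beta1 * x + u

sst_x = np.sum((x - x.mean()) ** 2)
beta1_hat = np.sum((x - x.mean()) * y) / sst_x
sampling_error = np.sum((x - x.mean()) * u) / sst_x  # needs the true u's
```

In real data this check is infeasible because the ui are unobserved - which is exactly why the sampling error is a theoretical object.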



β̂1 = β1 + ∑_{i=1}^n (xi − x̄)ui / SSTx,   where the last term is the sampling error.

The sampling error is a theoretical object derived under SLR.1.

Its actual value in a given sample is unknown to us because {ui}_{i=1}^n
are unobserved.
It could now be tempting to try to compute E[sampling error].
But this is complicated because the sampling error is a highly non-linear
function of {xi}_{i=1}^n.
But it is a linear function of {ui}_{i=1}^n - let us utilise this!



β̂1 = β1 + ∑_{i=1}^n (xi − x̄)ui / SSTx,   where the last term is the sampling error.

Let us condition on the n values that x took in our sample,

Xn := {xi}_{i=1}^n = {x1, x2, ..., xn}.

Conditional on Xn, SSTx and (xi − x̄) can now be treated as constants
while {ui}_{i=1}^n remain random.
We will now compute E[β̂1|Xn].



Step 3: Derive (conditional) mean of estimator

E[β̂1|Xn] = E[ β1 + ∑_{i=1}^n (xi − x̄)ui / SSTx | Xn ]
         = β1 + E[ ∑_{i=1}^n (xi − x̄)ui / SSTx | Xn ]
         = β1 + (1/SSTx) ∑_{i=1}^n E[(xi − x̄)ui | Xn]
         = β1 + (1/SSTx) ∑_{i=1}^n (xi − x̄) E[ui | Xn]



E[β̂1|Xn] = β1 + (1/SSTx) ∑_{i=1}^n (xi − x̄) E[ui|Xn]

ui is independent of {xj : j ≠ i} under SLR.2. Thus,

E[ui|Xn] = E[ui|xi] = 0 for all i,

where the first equality uses SLR.2 and the second uses SLR.4.

This would not be true if, in the population, u and x were correlated.
Using the above, we obtain

E[β̂1|Xn] = β1 + (1/SSTx) ∑_{i=1}^n (xi − x̄) E[ui|Xn] = β1,

since each E[ui|Xn] = 0.



Importance of SLR.1–SLR.4

SLR.1: y = β0 + β1 x + u
SLR.2: random sampling from the population
SLR.3: some sample variation in the xi
SLR.4: E[u|x] = 0
SLR.4 is critical, so you should always think hard about whether it is
reasonable in a given application:
What are the omitted factors?
Are they likely to be correlated with x?
If yes, then SLR.4 fails and OLS will be biased.



EXAMPLE: Student Performance and Student-Teacher
Ratios

Using data from MEAP98.DTA,

math4-hat = 75.03 − 0.616 str,

math4 is the percentage of students passing a math test in each school
str is the student–teacher ratio in the school (class size).
β̂1 = −0.616 < 0: An increase in class size decreases the pass rate!
But do we believe OLS is unbiased in this setting?
“ability” likely affects student performance and so is contained in u
And E[ability|str] = 0 seems very unlikely. Rather, we expect
Cov(ability, str) < 0
OLS is likely picking up this negative correlation
We will study how to determine the sign of the possible bias later
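The suspected failure of SLR.4 can be illustrated by simulation: generate "ability" negatively correlated with str, let ability (not str) drive performance, and omit ability from the regression. A sketch; every number below is hypothetical and not estimated from MEAP98.DTA (numpy assumed):

```python
import numpy as np

# Omitted-variable illustration: the true effect of str is ZERO, but
# ability is negatively correlated with str and sits in the error term.
rng = np.random.default_rng(7)
n, reps = 500, 2000

slopes = np.empty(reps)
for r in range(reps):
    ability = rng.normal(size=n)                        # unobserved
    str_ = 20.0 - 2.0 * ability + rng.normal(size=n)    # Cov(ability, str) < 0
    math4 = 70.0 + 5.0 * ability + rng.normal(size=n)   # str has no true effect
    slopes[r] = (np.sum((str_ - str_.mean()) * math4)
                 / np.sum((str_ - str_.mean()) ** 2))

mean_slope = slopes.mean()   # near -2: OLS picks up the omitted "ability"
```

Even on average the OLS slope is negative, although the true causal effect of str was set to zero: the estimator is biased because E[u|str] ≠ 0.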



Variance of OLS estimators

Under SLR.1–SLR.4, the OLS estimators are unbiased.


This tells us that, on average, the estimates will equal the population
values.
But we also want to measure the dispersion (spread) of the
estimators.
We use the variance (and, ultimately, the standard deviation) as our measure.



Homoskedasticity assumption

We could characterize the variance of the OLS estimators under


SLR.1–SLR.4 (and we will later).
For now, we introduce an assumption that simplifies calculations:
Assumption SLR.5 (Homoskedasticity, or Constant Variance). The
error has the same variance given any value of the
explanatory variable x:

Var(u|x) = σ² > 0 for all x,

where σ² is (virtually always) unknown.

Under SLR.1, SLR.4 and SLR.5:

E[y|x] = β0 + β1 x,  Var(y|x) = σ²

SLR.5 may not be realistic; it must be assessed on a case-by-case


basis.
EXAMPLE: Savings and income

Suppose y = sav, x = inc and we think

E[sav|inc] = β0 + β1 inc

β1 captures the effect of an income change on average family savings

If we impose SLR.5 then

Var(sav|inc) = σ²

i.e. the variability in savings does not change with income.

But it is more reasonable that savings would be more variable as income
increases:

Var(sav|inc = 100,000) > Var(sav|inc = 10,000)



Variances of OLS estimators

Theorem
(Sampling Variances of OLS) Under Assumptions SLR.1–SLR.5, and
conditional on Xn,

Var(β̂1|Xn) = σ² / ∑_{i=1}^n (xi − x̄)² = σ² / SSTx

Var(β̂0|Xn) = σ² n⁻¹ ∑_{i=1}^n xi² / SSTx

This is the “standard” formula for the variance of the OLS slope
estimator.
It is not valid if Assumption SLR.5 is violated.
The homoskedasticity assumption was not used to show unbiasedness
of the OLS estimators.
That requires only SLR.1–SLR.4.
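The slope-variance formula can be verified by Monte Carlo. Holding the xi fixed across replications mimics conditioning on Xn; all design values below are hypothetical (a sketch, numpy assumed):

```python
import numpy as np

# Fixed-design Monte Carlo: same x's in every replication, fresh
# homoskedastic errors each time, then compare the simulated variance
# of beta1_hat with sigma^2 / SSTx.
rng = np.random.default_rng(3)
n, reps, sigma = 40, 20000, 1.5
beta0, beta1 = 0.0, 1.0

x = rng.uniform(0, 4, size=n)                 # fixed design: condition on Xn
sst_x = np.sum((x - x.mean()) ** 2)
w = (x - x.mean()) / sst_x                    # OLS weights on the y's

u = rng.normal(scale=sigma, size=(reps, n))   # homoskedastic errors (SLR.5)
y = beta0 + beta1 * x + u                     # each row is one sample
b1s = y @ w                                   # slope estimate per sample

theory = sigma ** 2 / sst_x                   # Var(beta1_hat | Xn), theorem
mc_var = b1s.var()                            # Monte Carlo counterpart
```

With 20,000 replications the simulated variance lands within a few percent of σ²/SSTx.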
Factors determining variance

Var(β̂1|Xn) = σ² / SSTx

1 The more “noise” in the relationship between y and x - that is, the
larger the variability in u - the harder it is to learn about β1:

σ² ↑  ⇒  Var(β̂1|Xn) ↑

2 By contrast, more variation in {xi} is a good thing:

SSTx ↑  ⇒  Var(β̂1|Xn) ↓



Notice that SSTx/n is the sample variance of x.
This will get close to the population variance of x, σx², as n gets large:

SSTx ≈ nσx².

Thus, as n grows,

Var(β̂1|Xn) ≈ (1/n) σ²/σx²

and so Var(β̂1|Xn) shrinks at the rate 1/n.
This shows why more data is a good thing: it shrinks the sampling
variance of our estimators.
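The 1/n rate is visible in simulation: quadrupling the sample size should roughly quarter the variance of β̂1. A sketch with hypothetical values (numpy assumed):

```python
import numpy as np

rng = np.random.default_rng(5)

def var_b1(n, reps=4000):
    """Monte Carlo variance of the OLS slope at sample size n."""
    b1s = np.empty(reps)
    for r in range(reps):
        x = rng.normal(scale=2.0, size=n)   # sigma_x = 2
        u = rng.normal(size=n)              # sigma = 1
        y = 1.0 + 0.5 * x + u
        b1s[r] = np.sum((x - x.mean()) * y) / np.sum((x - x.mean()) ** 2)
    return b1s.var()

# Going from n = 50 to n = 200 multiplies n by 4, so the variance
# should fall by roughly a factor of 4.
ratio = var_b1(50) / var_b1(200)
```

The ratio comes out near 4, consistent with Var(β̂1) ≈ (1/n)·σ²/σx².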



Sketch of proof of theorem

Recall that
β̂1 = β1 + ∑_{i=1}^n (xi − x̄)ui / SSTx
Again, conditional on Xn , we can treat the parts involving fxi gni=1 as
non–random.
In addition, we will use the following fact: For uncorrelated random
variables, the variance of the sum is the sum of the variances.



ui is independent of {xj : j ≠ i} under SLR.2. This combined with
SLR.5 yields

Var(ui|Xn) = Var(ui|xi) = σ²,

where the first equality uses SLR.2 and the second uses SLR.5.

SLR.2 also implies that any two errors are uncorrelated:

Cov(ui, uj|Xn) = 0 for i ≠ j.

Therefore,

Var(β̂1|Xn) = Var( β1 + ∑_{i=1}^n (xi − x̄)ui / SSTx | Xn )
           = (1/SSTx²) ∑_{i=1}^n (xi − x̄)² Var(ui|Xn)
           = (1/SSTx²) ∑_{i=1}^n (xi − x̄)² σ²
           = σ² / SSTx
Estimating the error variance

Var(β̂1|Xn) = σ² / SSTx

We can compute SSTx from the observed data {xi : i = 1, ..., n}.
We need to estimate σ² since it is unknown. By the LIE and SLR.4,

σ² = E[u²].

If we could observe {ui}_{i=1}^n, then an unbiased estimator of σ² would
be (1/n) ∑_{i=1}^n ui².
But this estimator is infeasible - we do not observe {ui}_{i=1}^n.
We could instead replace ui with its “estimate,” the OLS residual ûi:
σ̃² = (1/n) ∑_{i=1}^n ûi² = SSR/n.
However, this estimator is biased: E[σ̃²] < σ².
The bias is due to us using residuals.
The bias is easy to correct:

σ̂² = SSR/(n − 2) = (1/(n − 2)) ∑_{i=1}^n ûi².

Replacing n by n − 2 is a so-called degrees-of-freedom adjustment.

Theorem
(Unbiased Estimator of σ²) Under Assumptions SLR.1–SLR.5,

E[σ̂²|Xn] = σ².
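The effect of the degrees-of-freedom adjustment is easy to see by simulation in a small sample; a sketch comparing SSR/n with SSR/(n − 2) (all design values hypothetical, numpy assumed):

```python
import numpy as np

# With n = 10, dividing SSR by n instead of n - 2 understates sigma^2
# noticeably; averaging over many samples makes the bias visible.
rng = np.random.default_rng(11)
n, reps, sigma2 = 10, 20000, 4.0

tilde = np.empty(reps)   # SSR / n       (biased)
hat = np.empty(reps)     # SSR / (n - 2) (unbiased)
for r in range(reps):
    x = rng.normal(size=n)
    u = rng.normal(scale=np.sqrt(sigma2), size=n)
    y = 1.0 + 2.0 * x + u
    b1 = np.sum((x - x.mean()) * y) / np.sum((x - x.mean()) ** 2)
    b0 = y.mean() - b1 * x.mean()
    ssr = np.sum((y - b0 - b1 * x) ** 2)
    tilde[r] = ssr / n
    hat[r] = ssr / (n - 2)

# E[SSR] = (n - 2) * sigma2, so tilde averages about 0.8 * sigma2 here,
# while hat averages sigma2.
```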



Standard error of the regression

σ̂ = √(SSR/(n − 2)) is called the standard error of the regression
it is an estimate of the standard deviation of the error in the regression
Stata calls it the root mean squared error.
Given σ̂, we can now, for example, estimate sd(β̂1|Xn) = σ/√SSTx by

se(β̂1) = σ̂/√SSTx.

This is called the standard error of β̂1. We will use these a lot.
Almost all regression packages report the standard errors in a column
next to the coefficient estimates.
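Putting the pieces together, σ̂ and se(β̂1) can be computed from a single sample; a minimal sketch on simulated data (numpy assumed, parameter values hypothetical):

```python
import numpy as np

# One sample: estimate the line, form residuals, then sigma_hat and
# the standard error of the slope.
rng = np.random.default_rng(2)
n = 100
x = rng.uniform(0, 10, size=n)
y = 1.0 + 0.3 * x + rng.normal(size=n)     # true error sd = 1

sst_x = np.sum((x - x.mean()) ** 2)
b1 = np.sum((x - x.mean()) * y) / sst_x
b0 = y.mean() - b1 * x.mean()
resid = y - b0 - b1 * x                    # OLS residuals u_hat

ssr = np.sum(resid ** 2)
sigma_hat = np.sqrt(ssr / (n - 2))         # standard error of the regression
se_b1 = sigma_hat / np.sqrt(sst_x)         # standard error of beta1_hat
```

Here `sigma_hat` plays the role of Stata's "Root MSE", and `se_b1` is the number a regression package would print next to the slope estimate.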



EXAMPLE: Returns to education

Using WAGE2.DTA

lwage-hat = 1.142 + .0993 educ
                   (.109)   (.0081)

For reasons we will see, it is useful to report the standard errors below
the corresponding coefficients, usually in parentheses.
In this regression, σ̂ = .5383 (see “Root MSE” in the Stata output).
Sometimes σ̂ is reported, but not usually.



Summary and tasks left undone

Under SLR.1–SLR.4, we have shown that OLS is unbiased.

And we have derived the sampling variance of the OLS estimators
(which underlies the standard error calculations) when SLR.5 also
holds.
We could continue by discussing statistical inference for the population
parameters.
But the mechanics are very similar to multiple regression, and so we
hold off.
Also, how do we know the OLS approach is a good one?
There are a lot of ways to combine data to estimate a slope and
intercept.
We will argue that OLS is “best” in a certain sense under
SLR.1–SLR.5.

