
Short Guides to Microeconometrics Kurt Schmidheiny

Fall 2019 Universität Basel

The Multiple Linear Regression Model

1 Introduction

The multiple linear regression model and its estimation using ordinary
least squares (OLS) is doubtless the most widely used tool in econometrics.
It allows us to estimate the relation between a dependent variable and a
set of explanatory variables. Prototypical examples in econometrics are:

• Wage of an employee as a function of her education and her work
  experience (the so-called Mincer equation).

• Price of a house as a function of its number of bedrooms and its age
  (an example of hedonic price regressions).

The dependent variable is an interval variable, i.e. its values represent
a natural order and differences of two values are meaningful. In practice,
this means that the variable needs to be observed with some precision
and that all observed values are far from ranges which are theoretically
excluded. Strictly speaking, wages do not qualify, as they cannot take
values with more than two decimal digits (cents) or negative values. In
practice, monthly wages in dollars in a sample of full-time workers are
perfectly fine with OLS, whereas wages measured in three categories (low,
middle, high) for a sample that includes the unemployed (with zero wages)
call for other estimation tools.

Version: 17-9-2019, 16:16



2 The Econometric Model

The multiple linear regression model assumes a linear (in parameters)
relationship between a dependent variable $y_i$ and a set of explanatory
variables $x_i' = (x_{i0}, x_{i1}, ..., x_{iK})$. $x_{ik}$ is also called an independent
variable, a covariate or a regressor. The first regressor $x_{i0} = 1$ is a constant
unless otherwise specified.

Consider a sample of $N$ observations $i = 1, ..., N$. Every single observation $i$ follows

$$y_i = x_i'\beta + u_i$$

where $\beta$ is a $(K+1)$-dimensional column vector of parameters, $x_i'$ is a
$(K+1)$-dimensional row vector and $u_i$ is a scalar called the error term.
The whole sample of $N$ observations can be expressed in matrix notation,

$$y = X\beta + u$$

where $y$ is an $N$-dimensional column vector, $X$ is an $N \times (K+1)$ matrix
and $u$ is an $N$-dimensional column vector of error terms, i.e.

$$
\underbrace{\begin{pmatrix} y_1 \\ y_2 \\ y_3 \\ \vdots \\ y_N \end{pmatrix}}_{N \times 1}
=
\underbrace{\begin{pmatrix}
1 & x_{11} & \cdots & x_{1K} \\
1 & x_{21} & \cdots & x_{2K} \\
1 & x_{31} & \cdots & x_{3K} \\
\vdots & \vdots & & \vdots \\
1 & x_{N1} & \cdots & x_{NK}
\end{pmatrix}}_{N \times (K+1)}
\underbrace{\begin{pmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_K \end{pmatrix}}_{(K+1) \times 1}
+
\underbrace{\begin{pmatrix} u_1 \\ u_2 \\ u_3 \\ \vdots \\ u_N \end{pmatrix}}_{N \times 1}
$$
The data generation process (dgp) is fully described by a set of
assumptions. Several of the following assumptions are formulated in
different alternatives. Different sets of assumptions will lead to different
properties of the OLS estimator.

OLS1: Linearity

$$y_i = x_i'\beta + u_i \quad \text{and} \quad E[u_i] = 0$$

OLS1 assumes that the functional relationship between dependent and
explanatory variables is linear in parameters, that the error term enters
additively and that the parameters are constant across individuals $i$.

OLS2: Independence

$$\{x_i, y_i\}_{i=1}^{N} \ \text{i.i.d. (independent and identically distributed)}$$

OLS2 means that the observations are independently and identically dis-
tributed. This assumption is in practice guaranteed by random sampling.

OLS3: Exogeneity

a) $u_i \,|\, x_i \sim N(0, \sigma_i^2)$
b) $u_i \perp x_i$ (independent)
c) $E[u_i \,|\, x_i] = 0$ (mean independent)
d) $Cov[x_i, u_i] = 0$ (uncorrelated)
OLS3a assumes that the error term is normally distributed conditional
on the explanatory variables. OLS3b means that the error term is
independent of the explanatory variables. OLS3c states that the mean of
the error term is independent of the explanatory variables. OLS3d means
that the error term and the explanatory variables are uncorrelated. Either
OLS3a or OLS3b implies OLS3c and OLS3d. OLS3c implies OLS3d.

OLS4: Error Variance

a) $V[u_i \,|\, x_i] = \sigma^2 < \infty$ (homoscedasticity)
b) $V[u_i \,|\, x_i] = \sigma_i^2 = g(x_i) < \infty$ (conditional heteroscedasticity)

OLS4a (homoscedasticity) means that the variance of the error term is
a constant. OLS4b (conditional heteroscedasticity) allows the variance of
the error term to depend on the explanatory variables.

OLS5: Identifiability

$E[x_i x_i'] = Q_{XX}$ is positive definite and finite
$\text{rank}(X) = K + 1 < N$

OLS5 assumes that the regressors are not perfectly collinear, i.e. no
variable is a linear combination of the others. For example, there can only
be one constant. Intuitively, OLS5 means that every explanatory variable
adds additional information. OLS5 also assumes that all regressors (but
the constant) have strictly positive variance both in expectation and in
the sample, and not too many extreme values.

3 Estimation with OLS

Ordinary least squares (OLS) minimizes the squared distances between
the observed and the predicted dependent variable $y$:

$$S(\beta) = \sum_{i=1}^{N} (y_i - x_i'\beta)^2 = (y - X\beta)'(y - X\beta) \to \min_\beta$$
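Setting the derivative of $S(\beta)$ with respect to $\beta$ to zero yields the normal equations (a standard intermediate step, spelled out here for clarity):

$$\frac{\partial S(\beta)}{\partial \beta} = -2\, X'(y - X\hat\beta) = 0 \quad\Longleftrightarrow\quad X'X\,\hat\beta = X'y$$

By OLS5, $X'X$ has full rank and can be inverted.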

The resulting OLS estimator of $\beta$ is:

$$\hat\beta = (X'X)^{-1} X'y$$

Given the OLS estimator, we can predict the dependent variable by
$\hat y_i = x_i'\hat\beta$ and the error term by $\hat u_i = y_i - x_i'\hat\beta$. $\hat u_i$ is called the residual.
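As an illustration, the estimator can be computed by hand. The following is a minimal R sketch on simulated data (the data and all variable names are ours, not from the handout):

set.seed(42)
N  <- 100
x1 <- runif(N, 0, 10)
x2 <- rnorm(N)
y  <- 1 + 0.5*x1 - 2*x2 + rnorm(N)

X        <- cbind(1, x1, x2)               # N x (K+1) regressor matrix
beta_hat <- solve(t(X) %*% X, t(X) %*% y)  # (X'X)^{-1} X'y
y_hat    <- X %*% beta_hat                 # predicted values
u_hat    <- y - y_hat                      # residuals

coef(lm(y ~ x1 + x2))                      # lm() returns the same numbers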

4 Goodness-of-fit

The goodness-of-fit of an OLS regression can be measured as

$$R^2 = 1 - \frac{SSR}{SST} = \frac{SSE}{SST}$$

where $SST = \sum_{i=1}^{N} (y_i - \bar y)^2$ is the total sum of squares, $SSR = \sum_{i=1}^{N} \hat u_i^2$
the residual sum of squares and $SSE = \sum_{i=1}^{N} (\hat y_i - \bar y)^2$ is called
the explained sum of squares if the regression contains a constant and
therefore $\bar y = \bar{\hat y}$. In this case, $R^2$ lies by definition between 0 and 1 and
reports the fraction of the sample variation in $y$ that is explained by the $x$s.

[Figure 1: The linear regression model with one regressor; the plot shows the data, the OLS fit and $E(y|x)$. Parameters: $\beta_0 = -2$, $\beta_1 = 0.5$, $\sigma^2 = 1$, $x \sim \text{uniform}(0, 10)$, $u \sim N(0, \sigma^2)$.]
Note: $R^2$ increases by construction with every additional (even
irrelevant) regressor and is therefore not a good criterion for the selection
of regressors. The adjusted $R^2$ is a modified version that does not
necessarily increase with additional regressors:

$$\text{adj. } R^2 = 1 - \frac{N-1}{N-K-1} \cdot \frac{SSR}{SST}.$$
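The following R sketch (ours, on simulated data) verifies both formulas against the values reported by summary():

set.seed(1)
N <- 100; K <- 2
x1 <- runif(N, 0, 10); x2 <- rnorm(N)
y  <- 1 + 0.5*x1 - 2*x2 + rnorm(N)

fm  <- lm(y ~ x1 + x2)
SSR <- sum(resid(fm)^2)               # residual sum of squares
SST <- sum((y - mean(y))^2)           # total sum of squares

R2     <- 1 - SSR/SST                 # equals SSE/SST with a constant
adj_R2 <- 1 - (N - 1)/(N - K - 1) * SSR/SST
c(R2,     summary(fm)$r.squared)      # identical
c(adj_R2, summary(fm)$adj.r.squared)  # identical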

5 Small Sample Properties

Assuming OLS1, OLS2, OLS3a, OLS4, and OLS5, the following proper-
ties can be established for finite, i.e. even small, samples.

• The OLS estimator of $\beta$ is unbiased:

$$E[\hat\beta \,|\, X] = \beta$$

• The OLS estimator is (multivariate) normally distributed:

$$\hat\beta \,|\, X \sim N\!\left(\beta, V[\hat\beta \,|\, X]\right)$$

with variance $V[\hat\beta \,|\, X] = \sigma^2 (X'X)^{-1}$ under homoscedasticity (OLS4a)
and $V[\hat\beta \,|\, X] = \sigma^2 (X'X)^{-1} X'\Omega X (X'X)^{-1}$ under known heteroscedasticity (OLS4b). Under homoscedasticity (OLS4a) the variance $V$
can be unbiasedly estimated as

$$\hat V(\hat\beta \,|\, X) = \hat\sigma^2 (X'X)^{-1} \quad \text{with} \quad \hat\sigma^2 = \frac{\hat u'\hat u}{N - K - 1}$$

(a hand computation of these standard errors is sketched after this list).

• Gauß-Markov theorem: under homoscedasticity (OLS4a),
$\hat\beta$ is BLUE (best linear unbiased estimator).
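A minimal R sketch (ours, with simulated data) of the unbiased variance estimate and the implied classical standard errors:

set.seed(7)
N <- 100; K <- 2
x1 <- runif(N, 0, 10); x2 <- rnorm(N)
y  <- 1 + 0.5*x1 - 2*x2 + rnorm(N)

X        <- cbind(1, x1, x2)
XXinv    <- solve(t(X) %*% X)
beta_hat <- XXinv %*% t(X) %*% y
u_hat    <- y - X %*% beta_hat

sigma2_hat <- sum(u_hat^2)/(N - K - 1)  # unbiased estimate of sigma^2
V_hat      <- sigma2_hat * XXinv        # estimated V(beta_hat | X)
sqrt(diag(V_hat))                       # classical standard errors; match
                                        # summary(lm(y ~ x1 + x2))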

6 Tests in Small Samples

Assume OLS1, OLS2, OLS3a, OLS4a, and OLS5.

A simple null hypothesis of the form $H_0: \beta_k = q$ is tested with the
t-test. If the null hypothesis is true, the t-statistic

$$t = \frac{\hat\beta_k - q}{\hat{se}[\hat\beta_k]} \sim t_{N-K-1}$$

follows a t-distribution with $N - K - 1$ degrees of freedom. The standard
error $\hat{se}[\hat\beta_k]$ is the square root of the element in the $(k+1)$-th row and
$(k+1)$-th column of $\hat V[\hat\beta \,|\, X]$. For example, to perform a two-sided test of
$H_0$ against the alternative hypothesis $H_A: \beta_k \neq q$ at the 5% significance
level, we calculate the t-statistic and compare its absolute value to the
0.975-quantile of the t-distribution. With $N = 30$ and $K = 2$, $H_0$ is
rejected if $|t| > 2.052$.
A null hypothesis of the form $H_0: R\beta = q$ with $J$ linear restrictions
is jointly tested with the F-test. If the null hypothesis is true, the F-statistic

$$F = \frac{(R\hat\beta - q)' \left[ R\, \hat V(\hat\beta \,|\, X)\, R' \right]^{-1} (R\hat\beta - q)}{J} \sim F_{J,\, N-K-1}$$

follows an F-distribution with $J$ numerator degrees of freedom and $N - K - 1$ denominator degrees of freedom. For example, to perform a two-sided test of $H_0$ against the alternative hypothesis $H_A: R\beta \neq q$ at the
5% significance level, we calculate the F-statistic and compare it to the
0.95-quantile of the F-distribution. With $N = 30$, $K = 2$ and $J = 2$, $H_0$
is rejected if $F > 3.35$. We cannot perform one-sided F-tests.
Only under homoscedasticity (OLS4a), the F-statistic can also be
computed as

$$F = \frac{(SSR_{restricted} - SSR)/J}{SSR/(N-K-1)} = \frac{(R^2 - R^2_{restricted})/J}{(1 - R^2)/(N-K-1)} \sim F_{J,\, N-K-1}$$

where $SSR_{restricted}$ and $R^2_{restricted}$ are, respectively, estimated by restricted least squares, which minimizes $S(\beta)$ s.t. $R\beta = q$. Exclusionary
restrictions of the form $H_0: \beta_k = 0, \beta_m = 0, ...$ are a special case of
$H_0: R\beta = q$. In this case, restricted least squares is simply estimated as
a regression where the explanatory variables $k, m, ...$ are excluded.
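Both tests are easy to replicate by hand; a minimal R sketch on simulated data (all names are ours):

set.seed(3)
N <- 30; K <- 2
x1 <- runif(N, 0, 10); x2 <- rnorm(N)
y  <- 1 + 0.5*x1 - 2*x2 + rnorm(N)
fm <- lm(y ~ x1 + x2)

# t-test of H0: beta_1 = 0
t_stat <- coef(fm)["x1"]/sqrt(diag(vcov(fm)))["x1"]
abs(t_stat) > qt(0.975, N - K - 1)        # reject at the 5% level?

# F-test of H0: beta_1 = 0 and beta_2 = 0 via restricted least squares
fm_r   <- lm(y ~ 1)                       # regression without x1, x2
J      <- 2
SSR    <- sum(resid(fm)^2); SSR_r <- sum(resid(fm_r)^2)
F_stat <- ((SSR_r - SSR)/J)/(SSR/(N - K - 1))
F_stat > qf(0.95, J, N - K - 1)           # compare to the 0.95-quantile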

7 Confidence Intervals in Small Samples

Assuming OLS1, OLS2, OLS3a, OLS4a, and OLS5, we can construct
confidence intervals for a particular coefficient $\beta_k$. The $(1-\alpha)$ confidence
interval is given by

$$\left[\, \hat\beta_k - t_{(1-\alpha/2),(N-K-1)}\, \hat{se}[\hat\beta_k] \,,\; \hat\beta_k + t_{(1-\alpha/2),(N-K-1)}\, \hat{se}[\hat\beta_k] \,\right]$$

where $t_{(1-\alpha/2),(N-K-1)}$ is the $(1-\alpha/2)$ quantile of the t-distribution with
$N - K - 1$ degrees of freedom. For example, the 95% confidence interval
with $N = 30$ and $K = 2$ is $\left[\, \hat\beta_k - 2.052\, \hat{se}[\hat\beta_k] \,,\; \hat\beta_k + 2.052\, \hat{se}[\hat\beta_k] \,\right]$.
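In R, this can be checked against confint(); a sketch with simulated data (names are ours):

set.seed(5)
N <- 30; K <- 2
x1 <- runif(N, 0, 10); x2 <- rnorm(N)
y  <- 1 + 0.5*x1 - 2*x2 + rnorm(N)
fm <- lm(y ~ x1 + x2)

b    <- coef(fm)["x1"]
se_b <- sqrt(diag(vcov(fm)))["x1"]
tq   <- qt(0.975, N - K - 1)          # = 2.052 for N = 30, K = 2

c(b - tq*se_b, b + tq*se_b)           # matches confint(fm)["x1", ]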

8 Asymptotic Properties of the OLS Estimator

Assuming OLS1, OLS2, OLS3d, OLS4a or OLS4b, and OLS5, the following properties can be established for large samples.

• The OLS estimator is consistent:

$$\text{plim}\; \hat\beta = \beta$$

• The OLS estimator is asymptotically normally distributed under
OLS4a as

$$\sqrt{N}(\hat\beta - \beta) \overset{d}{\longrightarrow} N\!\left(0, \sigma^2 Q_{XX}^{-1}\right)$$

and under OLS4b as

$$\sqrt{N}(\hat\beta - \beta) \overset{d}{\longrightarrow} N\!\left(0, Q_{XX}^{-1} Q_{X\Omega X} Q_{XX}^{-1}\right)$$

where $Q_{XX} = E[x_i x_i']$ and $Q_{X\Omega X} = E[u_i^2 x_i x_i']$ is assumed positive
definite (see handout on “Heteroskedasticity in the Linear Model”).

• The OLS estimator is approximately normally distributed,

$$\hat\beta \overset{A}{\sim} N\!\left(\beta, Avar[\hat\beta]\right)$$

where the asymptotic variance $Avar[\hat\beta]$ can be consistently estimated under OLS4a (homoscedasticity) as

$$\widehat{Avar}[\hat\beta] = \hat\sigma^2 (X'X)^{-1}$$

with $\hat\sigma^2 = \hat u'\hat u / N$, and under OLS4b (heteroscedasticity) as the robust or Eicker-Huber-White estimator (see handout on “Heteroscedasticity in the Linear Model”)

$$\widehat{Avar}[\hat\beta] = (X'X)^{-1} \left( \sum_{i=1}^{N} \hat u_i^2\, x_i x_i' \right) (X'X)^{-1}.$$

Note: In practice we can almost never be sure that the errors are
homoscedastic and should therefore always use robust standard errors.
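The Eicker-Huber-White estimator is straightforward to compute directly. A minimal R sketch with simulated heteroscedastic data (ours), checked against the sandwich package used later in this guide:

set.seed(9)
N  <- 200
x1 <- runif(N, 0, 10)
y  <- 1 + 0.5*x1 + rnorm(N, sd = 0.5 + 0.3*x1)  # error variance depends on x

X     <- cbind(1, x1)
bread <- solve(t(X) %*% X)
u_hat <- y - X %*% (bread %*% t(X) %*% y)
meat  <- t(X) %*% (X * as.vector(u_hat^2))      # sum of u_i^2 x_i x_i'
Avar  <- bread %*% meat %*% bread
sqrt(diag(Avar))                                # robust standard errors

library(sandwich)                               # same as HC0 in sandwich
sqrt(diag(vcovHC(lm(y ~ x1), type = "HC0")))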

9 Asymptotic Tests

Assume OLS1, OLS2, OLS3d, OLS4a or OLS4b, and OLS5.

A simple null hypothesis of the form $H_0: \beta_k = q$ is tested with the
z-test. If the null hypothesis is true, the z-statistic

$$z = \frac{\hat\beta_k - q}{\hat{se}[\hat\beta_k]} \overset{A}{\sim} N(0, 1)$$

follows approximately the standard normal distribution. The standard
error $\hat{se}[\hat\beta_k]$ is the square root of the element in the $(k+1)$-th row and
$(k+1)$-th column of $\widehat{Avar}[\hat\beta]$. For example, to perform a two-sided
test of $H_0$ against the alternative hypothesis $H_A: \beta_k \neq q$ at the 5%
significance level, we calculate the z-statistic and compare its absolute
value to the 0.975-quantile of the standard normal distribution. $H_0$ is
rejected if $|z| > 1.96$.

A null hypothesis of the form $H_0: R\beta = q$ with $J$ linear restrictions is
jointly tested with the Wald test. If the null hypothesis is true, the Wald
statistic

$$W = (R\hat\beta - q)' \left[ R\, \widehat{Avar}[\hat\beta]\, R' \right]^{-1} (R\hat\beta - q) \overset{A}{\sim} \chi^2_J$$

follows approximately a $\chi^2$ distribution with $J$ degrees of freedom. For
example, to perform a test of $H_0$ against the alternative hypothesis $H_A: R\beta \neq q$ at the 5% significance level, we calculate the Wald statistic and
compare it to the 0.95-quantile of the $\chi^2$-distribution. With $J = 2$, $H_0$ is
rejected if $W > 5.99$. We cannot perform one-sided Wald tests.
Under OLS4a (homoscedasticity) only, the Wald statistic can also be
computed as

$$W = \frac{SSR_{restricted} - SSR}{SSR/N} = \frac{R^2 - R^2_{restricted}}{(1 - R^2)/N} \overset{A}{\sim} \chi^2_J$$

where $SSR_{restricted}$ and $R^2_{restricted}$ are, respectively, estimated by restricted least squares, which minimizes $S(\beta)$ s.t. $R\beta = q$. Exclusionary
restrictions of the form $H_0: \beta_k = 0, \beta_m = 0, ...$ are a special case of
$H_0: R\beta = q$. In this case, restricted least squares is simply estimated as
a regression where the explanatory variables $k, m, ...$ are excluded.
Note: the Wald statistic can also be calculated as

$$W = J \cdot F \overset{A}{\sim} \chi^2_J$$

where $F$ is the small sample F-statistic. This formulation differs by a
factor $(N - K - 1)/N$ but has the same asymptotic distribution.
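A sketch of the Wald test in R with a robust covariance estimate (simulated data; names are ours):

set.seed(11)
N  <- 200
x1 <- runif(N, 0, 10); x2 <- rnorm(N)
y  <- 1 + 0.5*x1 + rnorm(N)                 # true beta_2 = 0
fm <- lm(y ~ x1 + x2)

library(sandwich)
Avar <- vcovHC(fm, type = "HC0")            # robust estimate of Avar

R <- rbind(c(0, 1, 0),                      # H0: beta_1 = 0.5
           c(0, 0, 1))                      #     and beta_2 = 0  (J = 2)
q <- c(0.5, 0)
d <- R %*% coef(fm) - q
W <- as.numeric(t(d) %*% solve(R %*% Avar %*% t(R)) %*% d)
W > qchisq(0.95, df = 2)                    # reject at the 5% level?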

10 Confidence Intervals in Large Samples

Assuming OLS1, OLS2, OLS3d, OLS5, and OLS4a or OLS4b, we can
construct confidence intervals for a particular coefficient $\beta_k$. The $(1-\alpha)$
confidence interval is given by

$$\left[\, \hat\beta_k - z_{(1-\alpha/2)}\, \hat{se}[\hat\beta_k] \,,\; \hat\beta_k + z_{(1-\alpha/2)}\, \hat{se}[\hat\beta_k] \,\right]$$

where $z_{(1-\alpha/2)}$ is the $(1-\alpha/2)$ quantile of the standard normal distribution. For example, the 95% confidence interval is $\left[\, \hat\beta_k - 1.96\, \hat{se}[\hat\beta_k] \,,\; \hat\beta_k + 1.96\, \hat{se}[\hat\beta_k] \,\right]$.

11 Small Sample vs. Asymptotic Properties

The t-test, F -test and confidence interval for small samples depend on the
normality assumption OLS3a (see Table 1). This assumption is strong and
unlikely to be satisfied. The asymptotic z-test, Wald test and the con-
fidence interval for large samples rely on much weaker assumptions. Al-
though most statistical software packages report the small sample results
by default, we would typically prefer the large sample approximations. In
practice, small sample and asymptotic tests and confidence intervals are
very similar already for relatively small samples, i.e. for (N − K) > 30.
Large sample tests also have the advantage that they can be based on
heteroscedasticity robust standard errors.

12 More Known Issues

Non-linear functional form: The true relationship between the dependent
variable and the explanatory variables is often not linear and thus in
violation of assumption OLS1. The multiple linear regression model allows
for many forms of non-linear relationships by transforming both dependent
and explanatory variables. See the handout on “Functional Form in the
Linear Model” for details.
Aggregate regressors: Some explanatory variables may be constant
within groups (clusters) of individual observations. For example, wages of
individual workers are regressed on state-level unemployment rates. This
is a violation of the independence across individual observations (OLS2).
In this case, the usual standard errors will be too small and t-statistics too
large by a factor of up to $\sqrt{M}$, where $M$ is the average number of individual
observations per group (cluster), for example, the average number of
workers per state. Cluster-robust standard errors will provide asymptotically consistent standard errors for the usual OLS point estimates. See
the handout on “Clustering in the Linear Model” for more details and
generalizations.
Omitted variables: Omitting explanatory variables in the regression
generally violates the exogeneity assumption (OLS3 ) and leads to biased
and inconsistent estimates of the coefficients for the included variables.
This omitted-variable bias does not occur if the omitted variables are
uncorrelated with all included explanatory variables.
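A small simulation makes the mechanics visible. In this R sketch (ours), the omitted regressor is correlated with the included one, so the short regression is biased:

set.seed(13)
N  <- 10000
x1 <- rnorm(N)
x2 <- 0.8*x1 + rnorm(N)        # omitted variable, correlated with x1
y  <- 1 + 1*x1 + 1*x2 + rnorm(N)

coef(lm(y ~ x1 + x2))["x1"]    # close to the true value 1
coef(lm(y ~ x1))["x1"]         # roughly 1.8: picks up part of the x2 effect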
Irrelevant regressors: Including irrelevant explanatory variables, i.e.
variables which do not have an effect on the dependent variable, does not
lead to biased or inconsistent estimates of the coefficients for the other
included variables. However, including too many irrelevant regressors may
lead to very imprecise estimates, i.e. very large standard errors, in small
datasets.
Reverse causality: A reverse causal effect of the dependent variable
on one or several explanatory variables is a violation of the exogeneity
assumption (OLS3 ) and leads to biased and inconsistent estimates. See
the handout on “Instrumental Variables” for a potential solution.
Measurement error : Imprecise measurement of the explanatory vari-
ables is a violation of OLS3 and leads to biased and inconsistent estimates.
See the handout on “Instrumental Variables” for a potential solution.
Multicollinearity: Perfectly correlated explanatory variables violate
the identifiability assumption (OLS5 ) and their effects cannot be esti-
mated separately. The effects of highly but not perfectly correlated vari-
ables can in principle be separately estimated. However, the estimated
coefficients will be very imprecise, i.e. the standard errors will be very
large. If variables are (almost) perfectly correlated in all conceivable states
of the world, there is no theoretical meaning of separate effects. If mul-
ticollinearity is only a feature of a specific sample, collecting more data
may provide the necessary variation to estimate separate effects.
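A quick R illustration (ours) of how near-perfect collinearity inflates standard errors while leaving a shorter regression precise:

set.seed(17)
N  <- 100
x1 <- rnorm(N)
x2 <- x1 + rnorm(N, sd = 0.01)   # almost an exact copy of x1
y  <- 1 + 0.5*x1 + 0.5*x2 + rnorm(N)

summary(lm(y ~ x1 + x2))$coefficients[, "Std. Error"]  # huge for x1 and x2
summary(lm(y ~ x1))$coefficients[, "Std. Error"]       # far smaller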

13 Summary of OLS Properties

Case                            [1]  [2]  [3]  [4]  [5]  [6]

Assumptions
OLS1: linearity                  ✓    ✓    ✓    ✓    ✓    ✓
OLS2: independence               ✓    ✓    ✓    ✓    ✓    ✓
OLS3: exogeneity
- OLS3a: normality               ✓    ×    ×    ×    ✓    ×
- OLS3b: independent             ✓    ✓    ×    ×    ×    ×
- OLS3c: mean indep.             ✓    ✓    ✓    ✓    ✓    ×
- OLS3d: uncorrelated            ✓    ✓    ✓    ✓    ✓    ✓
OLS4: error variance
- OLS4a: homoscedastic           ✓    ✓    ✓
- OLS4b: heteroscedastic                        ✓    ✓    ✓
OLS5: identifiability            ✓    ✓    ✓    ✓    ✓    ✓

Small sample properties of β̂
unbiased                         ✓    ✓    ✓    ✓    ✓    ×
normally distributed             ✓    ×    ×    ×    ✓    ×
efficient                        ✓    ✓    ✓    ×    ×    ×
t-test, F-test                   ✓    ×    ×    ×    ×    ×

Large sample properties of β̂
consistent                       ✓    ✓    ✓    ✓    ✓    ✓
approx. normal                   ✓    ✓    ✓    ✓    ✓    ✓
asymptotically efficient         ✓    ✓    ✓    ×    ×    ×
z-test, Wald test                ✓    ✓    ✓    ✓*   ✓*   ✓*

Notes: ✓ = fulfilled, × = violated, * = corrected standard errors.

Implementation in Stata 14

The multiple linear regression model is estimated by OLS with the regress
command. For example,
webuse auto.dta
regress mpg weight displacement

regresses the mileage of a car (mpg) on weight and displacement (see the
annotated output below). A constant is automatically added if not
suppressed by the option noconstant

regress mpg weight displacement, noconstant

Estimation based on a subsample is performed as


regress mpg weight displacement if weight>3000

where only cars heavier than 3000 lb are considered. Transformations of
variables are included with new variables
generate logmpg = log(mpg)
generate weight2 = weight^2
regress logmpg weight weight2 displacement

The Eicker-Huber-White covariance is reported with the option vce(robust)

regress mpg weight displacement, vce(robust)

F -tests for one or more restrictions are calculated with the post-estimation
command test. For example
test weight

tests $H_0: \beta_1 = 0$ against $H_A: \beta_1 \neq 0$, and

test weight displacement

tests $H_0: \beta_1 = 0$ and $\beta_2 = 0$ against $H_A: \beta_1 \neq 0$ or $\beta_2 \neq 0$.


New variables with residuals and fitted values are generated by

predict uhat if e(sample), resid
predict mpghat if e(sample)
The annotated output below reproduces the regression table for regress mpg weight displacement. The margin notes in the original mark, among others: SSE, SSR and SST with their degrees of freedom $K$, $N-K-1$ and $N-1$ (so that MS Residual $= \hat\sigma^2$ and Root MSE $= \hat\sigma$), the F-test of $H_0: \beta_1 = 0$ and $\beta_2 = 0$ with its p-value, $R^2$ and adj. $R^2$, the coefficient estimates $\hat\beta_1$, $\hat\beta_2$, $\hat\beta_0$ with standard errors $\hat{se}(\hat\beta_k)$, t-tests of $H_0: \beta_k = 0$ with p-values, and 95% confidence intervals.

      Source |       SS       df       MS              Number of obs =      74
-------------+------------------------------           F(  2,    71) =   66.79
       Model |  1595.40969     2  797.704846           Prob > F      =  0.0000
    Residual |  848.049768    71  11.9443629           R-squared     =  0.6529
-------------+------------------------------           Adj R-squared =  0.6432
       Total |  2443.45946    73  33.4720474           Root MSE      =  3.4561

------------------------------------------------------------------------------
         mpg |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      weight |  -.0065671   .0011662    -5.63   0.000    -.0088925   -.0042417
displacement |   .0052808   .0098696     0.54   0.594    -.0143986    .0249602
       _cons |   40.08452    2.02011    19.84   0.000     36.05654    44.11251
------------------------------------------------------------------------------

Implementation in R

The multiple linear regression model is estimated by OLS with the lm
function. For example,
> library(foreign)
> auto <- read.dta("http://www.stata-press.com/data/r11/auto.dta")
> fm <- lm(mpg~weight+displacement, data=auto)
> summary(fm)

regresses the mileage of a car (mpg) on weight and displacement. A
constant is automatically added if not suppressed by -1

> lm(mpg~weight+displacement-1, data=auto)

Estimation based on a subsample is performed as

> lm(mpg~weight+displacement, subset=(weight>3000), data=auto)

where only cars heavier than 3000 lb are considered. Transformations of
variables are directly included with the I() function

> lm(I(log(mpg))~weight+I(weight^2)+displacement, data=auto)

The Eicker-Huber-White covariance is reported after estimation with


> library(sandwich)
> library(lmtest)
> coeftest(fm, vcov=sandwich)

F -tests for one or more restrictions are calculated with the command
waldtest which also uses the two packages sandwich and lmtest
> waldtest(fm, "weight", vcov=sandwich)

tests $H_0: \beta_1 = 0$ against $H_A: \beta_1 \neq 0$ with Eicker-Huber-White, and


> waldtest(fm, .~.-weight-displacement, vcov=sandwich)

tests $H_0: \beta_1 = 0$ and $\beta_2 = 0$ against $H_A: \beta_1 \neq 0$ or $\beta_2 \neq 0$.


New variables with residuals and fitted values are generated by
> auto$uhat <- resid(fm)
> auto$mpghat <- fitted(fm)

