Kanika Mahajan
Ashoka University
Specification
Interpretation
Estimation
Partialling out interpretation
Assumptions: Putting a structure on MLR
Omitted Variable Bias
Inference
This assumption implies that all other factors affecting y are fixed. In a
single-variable regression this is difficult to argue. Multiple regression
allows us to control for other variables.
Examples:
y = β0 + β1 x1 + β2 x2 + u (2)
β0 : Intercept
β1 : Change in y with respect to x1 holding other factors constant
β2 : Change in y with respect to x2 holding other factors constant
Specification:
y = β0 + β1 x + β2 x² + u (3)
The marginal effect of x can be written as:

∂y/∂x = β1 + 2β2 x (4)
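Since the marginal effect depends on x, it must be evaluated at specific values. A minimal sketch with hypothetical coefficients (β1 = 0.30, β2 = −0.02, not taken from the text):

```python
# Evaluate the marginal effect beta1 + 2*beta2*x at a few values of x.
# The coefficient values are invented, purely for illustration.
beta1, beta2 = 0.30, -0.02

for x in (0.0, 5.0, 10.0):
    print(f"x = {x}: dy/dx = {beta1 + 2 * beta2 * x:.2f}")
```

With β2 < 0, the marginal effect of x falls as x grows (a diminishing-returns shape).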
y = β0 + β1 x1 + β2 x2 + ... + βk xk + u (5)
Example: the estimated college GPA equation:

colGPÂ = 1.29 + 0.453 hsGPA + 0.0094 ACT
Intercept: 1.29 is the predicted college GPA when high school GPA and
the ACT score are both zero.
Holding ACT fixed, another point on hsGPA is associated with .453 of a
point on the college GPA. For example, if two students, A and B, have the
same ACT score but Student A's high school GPA is one point higher than
Student B's, then we predict Student A's college GPA to be .453 points
higher than Student B's. A change in ACT has a very small effect.
In terms of changes:

∆colGPÂ = 0.453 ∆hsGPA + 0.0094 ∆ACT
Estimated change in college GPA when high school GPA increases by 2
points and the ACT score increases by 10 units:

∆colGPÂ = 0.453 × 2 + 0.0094 × 10 = 0.906 + 0.094 = 1.0
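The same arithmetic in a short script, using the coefficients reported above:

```python
# Predicted change in colGPA for a 2-point rise in hsGPA and a
# 10-unit rise in ACT, using the estimated coefficients above.
d_hsGPA, d_ACT = 2, 10
d_colGPA = 0.453 * d_hsGPA + 0.0094 * d_ACT
print(d_colGPA)  # 0.906 + 0.094 = 1.0
```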
The First Order Conditions are sample counterparts of the below moment
conditions:
E(u) = 0
E(xj u) = 0, where j = 1, 2, ..., k
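Written out in full, the sample counterparts are the normal equations that the OLS residuals û must satisfy:

```latex
\frac{1}{n}\sum_{i=1}^{n}\hat{u}_i
  = \frac{1}{n}\sum_{i=1}^{n}\left(y_i-\hat\beta_0-\hat\beta_1 x_{i1}-\cdots-\hat\beta_k x_{ik}\right)=0,
\qquad
\frac{1}{n}\sum_{i=1}^{n}x_{ij}\,\hat{u}_i=0,\quad j=1,\dots,k.
```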
y = β0 + β1 x1 + β2 x2 + u
Estimates given by:

β̂1 = Σ_{i=1}^n r̂i1 yi / Σ_{i=1}^n r̂i1²;   β̂2 = Σ_{i=1}^n r̂i2 yi / Σ_{i=1}^n r̂i2²

where r̂i1 are the OLS residuals from a simple regression of x1 on x2, and
r̂i2 are the OLS residuals from a simple regression of x2 on x1, using the
sample.
Then do a simple regression of y on r̂1 to obtain β̂1; similarly for β̂2.
y = β0 + β1 x1 + β2 x2 + ... + βk xk + u
Then r̂1 is the residual obtained by regressing x1 on x2, ..., xk, using the
sample.
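A simulated sketch of this partialling-out (Frisch-Waugh) result for the two-regressor case; the data-generating values below are invented for illustration:

```python
# Partialling out: the slope from regressing y on the residuals r1
# equals the multiple-regression coefficient on x1.
import numpy as np

rng = np.random.default_rng(0)
n = 1000
x2 = rng.normal(size=n)
x1 = 0.5 * x2 + rng.normal(size=n)        # x1 correlated with x2
y = 1.0 + 2.0 * x1 - 1.0 * x2 + rng.normal(size=n)

# Full multiple regression of y on (1, x1, x2)
X = np.column_stack([np.ones(n), x1, x2])
beta_full = np.linalg.lstsq(X, y, rcond=None)[0]

# Residuals r1 from regressing x1 on (1, x2)
Z = np.column_stack([np.ones(n), x2])
r1 = x1 - Z @ np.linalg.lstsq(Z, x1, rcond=None)[0]

# Simple regression of y on r1 (no intercept needed: r1 has mean zero)
beta1_fw = (r1 @ y) / (r1 @ r1)

print(beta_full[1], beta1_fw)   # the two estimates of beta1 coincide
```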
Algebraic Properties
R-square = 1 − SSR/SST

where

SSR = Σ_{i=1}^n (yi − β̃1 xi1 − ... − β̃k xik)²

(the β̃j here are estimates from a regression through the origin, i.e. with no intercept)
When there is no intercept, it is possible that SSR > SST, and thus
R-square is negative. A negative R-square has no intuitive meaning; an
intercept is included precisely so that R-square has a meaning. If the
true β0 = 0 then this is fine, but if this assumption is wrong, the slope
estimates are biased as well, since the specification is wrong. If we
include an intercept when its true value is zero, the only penalty is a
larger variance of the slope estimates.
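A simulated sketch of this point (the data-generating values are invented): forcing the intercept to zero when the true intercept is large makes SSR exceed SST, so R-square computed as 1 − SSR/SST turns negative.

```python
# Regression through the origin when the true intercept is nonzero:
# SSR can exceed SST, making 1 - SSR/SST negative.
import numpy as np

rng = np.random.default_rng(1)
n = 200
x = rng.normal(size=n)
y = 5.0 + 0.2 * x + rng.normal(size=n)     # true intercept is large

b_no_const = (x @ y) / (x @ x)             # slope through the origin
ssr = np.sum((y - b_no_const * x) ** 2)
sst = np.sum((y - y.mean()) ** 2)
print(1 - ssr / sst)                       # negative: SSR > SST here
```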
Assumptions: Properties of the OLS estimators
y = β0 + β1 x1 + β2 x2 + ... + βk xk + u
MLR.1 Linear in parameters
MLR.2 Random sampling
MLR.3 No perfect collinearity
MLR.4 Zero conditional mean:
E(u|x1, x2, ..., xk) = 0
Two important cases where the above assumption fails:
1) Omitted Variables Bias
2) Reverse causality
Endogenous vs Exogenous explanatory variables.
Unbiasedness of OLS under MLR.1-MLR.4: E(β̂j) = βj, for j = 0, 1, ..., k
y = β0 + β1 x1 + β2 x2 + u
The above (true) model satisfies assumptions MLR.1-MLR.4. The estimated
model omits x2:
y = β̃0 + β̃1 x1 + ũ
β̃1 = Σ_{i=1}^n (xi1 − x̄1) yi / Σ_{i=1}^n (xi1 − x̄1)²
Substituting the true model yi = β0 + β1 xi1 + β2 xi2 + ui:

β̃1 = Σ_{i=1}^n (xi1 − x̄1)(β0 + β1 xi1 + β2 xi2 + ui) / Σ_{i=1}^n (xi1 − x̄1)²
On further simplification:

E(β̃1|x1, x2) = β1 + β2 [Σ_{i=1}^n (xi1 − x̄1) xi2 / Σ_{i=1}^n (xi1 − x̄1)²]
Now the term in brackets, Σ_{i=1}^n (xi1 − x̄1) xi2 / Σ_{i=1}^n (xi1 − x̄1)², is
exactly the OLS slope δ̃1 from the regression of x2 on x1:

x2 = δ̃0 + δ̃1 x1 + e

Therefore,

E(β̃1|x1, x2) = β1 + β2 δ̃1

and the bias in β̃1 is β2 δ̃1: it vanishes only if β2 = 0 or if x1 and x2 are
uncorrelated in the sample (δ̃1 = 0).
Important Terminology:
1) Downward bias: when β1 > 0 and β2 δ̃1 < 0, or when β1 < 0 and β2 δ̃1 > 0
2) Upward bias: when β1 > 0 and β2 δ̃1 > 0, or when β1 < 0 and β2 δ̃1 < 0
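A simulated sketch of the bias formula; the coefficients and the degree of correlation are invented. With β1 = 2, β2 = 1.5, and δ̃1 ≈ 0.8, the short-regression slope should center near β1 + β2 δ̃1 = 3.2 (an upward bias, since β1 > 0 and β2 δ̃1 > 0):

```python
# Omitted variable bias: the short regression of y on x1 alone
# converges to beta1 + beta2*delta1.
import numpy as np

rng = np.random.default_rng(2)
n = 100_000
x1 = rng.normal(size=n)
x2 = 0.8 * x1 + rng.normal(size=n)           # delta1 ~ 0.8
y = 1.0 + 2.0 * x1 + 1.5 * x2 + rng.normal(size=n)

# Short regression of y on x1 only (x2 omitted)
b1_short = np.sum((x1 - x1.mean()) * y) / np.sum((x1 - x1.mean()) ** 2)
print(b1_short)        # close to 2.0 + 1.5*0.8 = 3.2 (upward bias)
```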
True model:
y = β0 + β1 x1 + β2 x2 + β3 x3 + u
If x3 is omitted, what is the sign of the bias?
Sign of the Bias is difficult to determine when there are multiple
regressors in the estimated model.
Notable point: correlation between a single explanatory variable and
the error generally results in all OLS estimators being biased.
As an approximation, assume that x1 and x2 are uncorrelated; then we
can sign the bias using the same derivation as before.
An additional assumption:
MLR.5 Homoskedastic errors: Var(u|x) = σ²
Example:
Savings = β0 + β1 Income + u
Var(u|Income) = σ². If the variance changes with any of the explanatory
variables, then heteroskedasticity is present.
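A simulated sketch of this kind of heteroskedasticity; the savings equation and the variance function are invented for illustration:

```python
# Heteroskedasticity: the error spread grows with income, so
# Var(u|Income) is not constant.
import numpy as np

rng = np.random.default_rng(6)
income = rng.uniform(10, 100, size=2000)
u = rng.normal(scale=0.1 * income)         # error sd rises with income
savings = 2.0 + 0.15 * income + u

for lo, hi in [(10, 40), (70, 100)]:
    m = (income >= lo) & (income < hi)
    print(lo, hi, u[m].std())              # spread differs by income group
```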
Gauss-Markov Assumptions: MLR.1-MLR.5
y = β0 + β1 x1 + β2 x2 + ... + βk xk + u
Var(β̂j) = σ² / [SSTj (1 − Rj²)]

for j = 1, 2, ..., k, where
SSTj = Σ_{i=1}^n (xij − x̄j)² (total sample variation in xj)
Rj² = R-square from regressing xj on all other explanatory variables
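A sketch computing the pieces of this formula on simulated data (all numbers invented); note how a higher Rj² (more collinearity) inflates the variance:

```python
# Compute SST_1, R_1^2 from regressing x1 on the other regressor,
# and plug into Var(beta1) = sigma^2 / (SST_1 * (1 - R_1^2)).
import numpy as np

rng = np.random.default_rng(3)
n, sigma2 = 500, 1.0
x2 = rng.normal(size=n)
x1 = 0.6 * x2 + rng.normal(size=n)

sst1 = np.sum((x1 - x1.mean()) ** 2)

# R_1^2 from regressing x1 on (1, x2)
Z = np.column_stack([np.ones(n), x2])
fit = Z @ np.linalg.lstsq(Z, x1, rcond=None)[0]
r2_1 = 1 - np.sum((x1 - fit) ** 2) / sst1

var_beta1 = sigma2 / (sst1 * (1 - r2_1))
print(var_beta1)   # larger R_1^2 inflates the variance of beta1_hat
```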
y = β0 + β1 x1 + β2 x2 + u
Consider the below two estimators for β1. Estimate the true model, which
yields β̂1:

y = β̂0 + β̂1 x1 + β̂2 x2 + û

and estimate the model that omits x2, which yields β̃1:

y = β̃0 + β̃1 x1 + ũ
Case I: β2 ≠ 0
We clearly prefer β̂1 since it is unbiased.
But note that Var(β̃1) < Var(β̂1) when there is correlation between
x1 and x2 and the population variance of the errors is known. As the
sample size increases, the bias does not go away, but the variance
advantage shrinks.
Also, when we do not know the population σ², we estimate it from the
sample, and that estimate can be larger when β2 ≠ 0.
Case II: β2 = 0
In this case we prefer β̃1, because including x2 gains us nothing in terms
of bias but costs us variance (if there is correlation between x1 and x2).
Var(β̃1) < Var(β̂1)

One can see the above from a direct application of the variance formula.
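A small Monte Carlo sketch of Case II (all numbers invented): with β2 = 0 and x1, x2 correlated, the short-regression estimator β̃1 has the smaller sampling variance:

```python
# Compare the sampling variance of beta1 from the short regression
# (y on x1) and the long regression (y on x1, x2) when beta2 = 0.
import numpy as np

rng = np.random.default_rng(7)
n, reps = 100, 2000
b_short, b_full = [], []
for _ in range(reps):
    x2 = rng.normal(size=n)
    x1 = 0.9 * x2 + rng.normal(size=n)                  # correlated regressors
    y = 1.0 + 2.0 * x1 + 0.0 * x2 + rng.normal(size=n)  # beta2 = 0
    X1 = np.column_stack([np.ones(n), x1])
    X12 = np.column_stack([np.ones(n), x1, x2])
    b_short.append(np.linalg.lstsq(X1, y, rcond=None)[0][1])
    b_full.append(np.linalg.lstsq(X12, y, rcond=None)[0][1])

print(np.var(b_short), np.var(b_full))   # Var(beta1_tilde) < Var(beta1_hat)
```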
Estimating the Variance of the Errors
σ̂² = Σ_{i=1}^n ûi² / (n − k − 1)

The denominator reflects the degrees of freedom = n − (k + 1).
This means that, given n − (k + 1) of the residuals, the remaining (k + 1)
residuals are known.
Terminology for σ̂: the standard error of the regression, or the root mean
squared error (RMSE). Notably, while SSR must fall when another
explanatory variable is added, the degrees of freedom also fall by one, so
the RMSE of a regression can increase or decrease when another variable
is added.
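A sketch of this estimator on simulated data (numbers invented), with k = 2 regressors:

```python
# Estimate sigma^2 as SSR/(n - k - 1) from the OLS residuals.
import numpy as np

rng = np.random.default_rng(4)
n, k = 300, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(size=n)

beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
u_hat = y - X @ beta_hat
sigma2_hat = np.sum(u_hat ** 2) / (n - k - 1)
print(np.sqrt(sigma2_hat))   # the RMSE of the regression, near the true sd of 1
```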
A note on terminology:
s.d.(β̂j) = σ / √[SSTj (1 − Rj²)]

s.e.(β̂j) = σ̂ / √[SSTj (1 − Rj²)]
Under Assumptions MLR.1 through MLR.5, the OLS estimator β̂j for βj is
the best linear unbiased estimator (BLUE).
Linear in this context has a specific meaning: each β̂j is a linear function
of the data on the dependent variable,

β̂j = Σ_{i=1}^n wij yi

where each weight wij is a function of the explanatory variables only.
Best: this implies minimum variance among the class of linear unbiased
estimators.
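A sketch verifying this linearity on simulated data (all numbers invented): the rows of (X′X)⁻¹X′ supply the weights wij, and applying them to y reproduces the OLS coefficients.

```python
# Each OLS coefficient is a weighted sum of the y_i, with weights
# taken from the rows of (X'X)^{-1} X'.
import numpy as np

rng = np.random.default_rng(5)
n = 50
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = X @ np.array([1.0, 0.5, -0.3]) + rng.normal(size=n)

W = np.linalg.inv(X.T @ X) @ X.T      # row j holds the weights w_ij
beta_hat = W @ y                      # beta_hat_j = sum_i w_ij * y_i
print(np.allclose(beta_hat, np.linalg.lstsq(X, y, rcond=None)[0]))  # True
```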