Kanika Mahajan
Ashoka University
Specification
Interpretation
Estimation
Partialling out interpretation
Assumptions: Putting a structure on MLR
Omitted Variable Bias
Inference
This assumption implies that all other factors affecting y are fixed. In a
single-variable regression this is difficult to argue. Multiple regression
allows us to control for other variables.
Examples:
y = β0 + β1 x1 + β2 x2 + u (2)
β0 : Intercept
β1 : Change in y with respect to x1 holding other factors constant
β2 : Change in y with respect to x2 holding other factors constant
Specification:
y = β0 + β1 x + β2 x² + u (3)
The marginal effect of x can be written as:

∂y/∂x = β1 + 2β2 x (4)
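Since the marginal effect depends on x, it must be evaluated at specific values. A minimal sketch with hypothetical coefficients (β1 = 0.30, β2 = −0.02, not taken from the text):

```python
# Evaluate the marginal effect beta1 + 2*beta2*x at a few values of x.
# The coefficient values are invented, purely for illustration.
beta1, beta2 = 0.30, -0.02

for x in (0.0, 5.0, 10.0):
    print(f"x = {x}: dy/dx = {beta1 + 2 * beta2 * x:.2f}")
```

With β2 < 0, the marginal effect of x falls as x grows (a diminishing-returns shape).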
y = β0 + β1 x1 + β2 x2 + ... + βk xk + u (5)
Example: the estimated college GPA equation:

colGPÂ = 1.29 + 0.453 hsGPA + 0.0094 ACT
Intercept: 1.29 is the predicted college GPA when high school GPA and
the ACT score are both zero.
Holding ACT fixed, another point on hsGPA is associated with .453 of a
point on the college GPA. For example, if two students, A and B, have the
same ACT score but Student A's high school GPA is one point higher than
Student B's, then we predict Student A's college GPA to be .453 points
higher than Student B's. A change in ACT has a very small effect.
In terms of changes:

∆colGPÂ = 0.453 ∆hsGPA + 0.0094 ∆ACT
Estimated change in college GPA when high school GPA increases by 2
points and the ACT score increases by 10 units:

∆colGPÂ = 0.453 × 2 + 0.0094 × 10 = 0.906 + 0.094 = 1.0
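The same arithmetic in a short script, using the coefficients reported above:

```python
# Predicted change in colGPA for a 2-point rise in hsGPA and a
# 10-unit rise in ACT, using the estimated coefficients above.
d_hsGPA, d_ACT = 2, 10
d_colGPA = 0.453 * d_hsGPA + 0.0094 * d_ACT
print(d_colGPA)  # 0.906 + 0.094 = 1.0
```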
The First Order Conditions are sample counterparts of the below moment
conditions:
E(u) = 0
E(xj u) = 0, where j = 1, 2, ..., k
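Written out in full, the sample counterparts are the normal equations that the OLS residuals û must satisfy:

```latex
\frac{1}{n}\sum_{i=1}^{n}\hat{u}_i
  = \frac{1}{n}\sum_{i=1}^{n}\left(y_i-\hat\beta_0-\hat\beta_1 x_{i1}-\cdots-\hat\beta_k x_{ik}\right)=0,
\qquad
\frac{1}{n}\sum_{i=1}^{n}x_{ij}\,\hat{u}_i=0,\quad j=1,\dots,k.
```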
y = β0 + β1 x1 + β2 x2 + u
Estimates given by:

β̂1 = Σ_{i=1}^n r̂i1 yi / Σ_{i=1}^n r̂i1²;   β̂2 = Σ_{i=1}^n r̂i2 yi / Σ_{i=1}^n r̂i2²

where r̂i1 are the OLS residuals from a simple regression of x1 on x2, and
r̂i2 are the OLS residuals from a simple regression of x2 on x1, using the
sample.
Then do a simple regression of y on r̂1 to obtain β̂1; similarly for β̂2.
y = β0 + β1 x1 + β2 x2 + ... + βk xk + u
Then r̂1 is the residual obtained by regressing x1 on x2, ..., xk, using the
sample.
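A simulated sketch of this partialling-out (Frisch-Waugh) result for the two-regressor case; the data-generating values below are invented for illustration:

```python
# Partialling out: the slope from regressing y on the residuals r1
# equals the multiple-regression coefficient on x1.
import numpy as np

rng = np.random.default_rng(0)
n = 1000
x2 = rng.normal(size=n)
x1 = 0.5 * x2 + rng.normal(size=n)        # x1 correlated with x2
y = 1.0 + 2.0 * x1 - 1.0 * x2 + rng.normal(size=n)

# Full multiple regression of y on (1, x1, x2)
X = np.column_stack([np.ones(n), x1, x2])
beta_full = np.linalg.lstsq(X, y, rcond=None)[0]

# Residuals r1 from regressing x1 on (1, x2)
Z = np.column_stack([np.ones(n), x2])
r1 = x1 - Z @ np.linalg.lstsq(Z, x1, rcond=None)[0]

# Simple regression of y on r1 (no intercept needed: r1 has mean zero)
beta1_fw = (r1 @ y) / (r1 @ r1)

print(beta_full[1], beta1_fw)   # the two estimates of beta1 coincide
```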
Algebraic Properties
R-square = 1 − SSR/SST

where

SSR = Σ_{i=1}^n (yi − β̃1 xi1 − ... − β̃k xik)²

(the β̃j here are estimates from a regression through the origin, i.e. with no intercept)
When there is no intercept, it is possible that SSR > SST, and thus
R-square is negative. A negative R-square has no intuitive meaning; an
intercept is included precisely so that R-square has a meaning. If the
true β0 = 0 then this is fine, but if this assumption is wrong, the slope
estimates are biased as well, since the specification is wrong. If we
include an intercept when its true value is zero, the only penalty is a
larger variance of the slope estimates.
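A simulated sketch of this point (the data-generating values are invented): forcing the intercept to zero when the true intercept is large makes SSR exceed SST, so R-square computed as 1 − SSR/SST turns negative.

```python
# Regression through the origin when the true intercept is nonzero:
# SSR can exceed SST, making 1 - SSR/SST negative.
import numpy as np

rng = np.random.default_rng(1)
n = 200
x = rng.normal(size=n)
y = 5.0 + 0.2 * x + rng.normal(size=n)     # true intercept is large

b_no_const = (x @ y) / (x @ x)             # slope through the origin
ssr = np.sum((y - b_no_const * x) ** 2)
sst = np.sum((y - y.mean()) ** 2)
print(1 - ssr / sst)                       # negative: SSR > SST here
```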
Assumptions: Properties of the OLS estimators
y = β0 + β1 x1 + β2 x2 + ... + βk xk + u
MLR.1 Linear in parameters
MLR.2 Random sampling
MLR.3 No perfect collinearity
MLR.4 Zero conditional mean:
E(u|x1, x2, ..., xk) = 0
Two important cases where the above assumption fails:
1) Omitted Variables Bias
2) Reverse causality
Endogenous vs Exogenous explanatory variables.
Unbiasedness of OLS under MLR.1-MLR.4: E(β̂j) = βj, for j = 0, 1, ..., k
y = β0 + β1 x1 + β2 x2 + u
The above (true) model satisfies assumptions MLR.1-MLR.4. The estimated
model omits x2:
y = β̃0 + β̃1 x1 + ũ
β̃1 = Σ_{i=1}^n (xi1 − x̄1) yi / Σ_{i=1}^n (xi1 − x̄1)²
Substituting the true model yi = β0 + β1 xi1 + β2 xi2 + ui:

β̃1 = Σ_{i=1}^n (xi1 − x̄1)(β0 + β1 xi1 + β2 xi2 + ui) / Σ_{i=1}^n (xi1 − x̄1)²
On further simplification:

E(β̃1|x1, x2) = β1 + β2 [Σ_{i=1}^n (xi1 − x̄1) xi2 / Σ_{i=1}^n (xi1 − x̄1)²]
Now the term in brackets, Σ_{i=1}^n (xi1 − x̄1) xi2 / Σ_{i=1}^n (xi1 − x̄1)², is
exactly the OLS slope δ̃1 from the regression of x2 on x1:

x2 = δ̃0 + δ̃1 x1 + e

Therefore,

E(β̃1|x1, x2) = β1 + β2 δ̃1

and the bias in β̃1 is β2 δ̃1: it vanishes only if β2 = 0 or if x1 and x2 are
uncorrelated in the sample (δ̃1 = 0).
Important Terminology:
1) Downward bias: when β1 > 0 and β2 δ̃1 < 0, or when β1 < 0 and β2 δ̃1 > 0
2) Upward bias: when β1 > 0 and β2 δ̃1 > 0, or when β1 < 0 and β2 δ̃1 < 0
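A simulated sketch of the bias formula; the coefficients and the degree of correlation are invented. With β1 = 2, β2 = 1.5, and δ̃1 ≈ 0.8, the short-regression slope should center near β1 + β2 δ̃1 = 3.2 (an upward bias, since β1 > 0 and β2 δ̃1 > 0):

```python
# Omitted variable bias: the short regression of y on x1 alone
# converges to beta1 + beta2*delta1.
import numpy as np

rng = np.random.default_rng(2)
n = 100_000
x1 = rng.normal(size=n)
x2 = 0.8 * x1 + rng.normal(size=n)           # delta1 ~ 0.8
y = 1.0 + 2.0 * x1 + 1.5 * x2 + rng.normal(size=n)

# Short regression of y on x1 only (x2 omitted)
b1_short = np.sum((x1 - x1.mean()) * y) / np.sum((x1 - x1.mean()) ** 2)
print(b1_short)        # close to 2.0 + 1.5*0.8 = 3.2 (upward bias)
```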
True model:
y = β0 + β1 x1 + β2 x2 + β3 x3 + u
If x3 is omitted, what is the sign of the bias?
Sign of the Bias is difficult to determine when there are multiple
regressors in the estimated model.
Notable point: correlation between a single explanatory variable and
the error generally results in all OLS estimators being biased.
As an approximation, assume that x1 and x2 are uncorrelated; then we
can sign the bias using the same derivation as before.
An additional assumption:
MLR.5 Homoskedastic errors: Var(u|x) = σ²
Example:
Savings = β0 + β1 Income + u
Var(u|Income) = σ². If the variance changes with any of the explanatory
variables, then heteroskedasticity is present.
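A simulated sketch of this kind of heteroskedasticity; the savings equation and the variance function are invented for illustration:

```python
# Heteroskedasticity: the error spread grows with income, so
# Var(u|Income) is not constant.
import numpy as np

rng = np.random.default_rng(6)
income = rng.uniform(10, 100, size=2000)
u = rng.normal(scale=0.1 * income)         # error sd rises with income
savings = 2.0 + 0.15 * income + u

for lo, hi in [(10, 40), (70, 100)]:
    m = (income >= lo) & (income < hi)
    print(lo, hi, u[m].std())              # spread differs by income group
```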
Gauss-Markov Assumptions: MLR.1-MLR.5
y = β0 + β1 x1 + β2 x2 + ... + βk xk + u
Var(β̂j) = σ² / [SSTj (1 − Rj²)]

for j = 1, 2, ..., k, where
SSTj = Σ_{i=1}^n (xij − x̄j)² (total sample variation in xj)
Rj² = R-square from regressing xj on all other explanatory variables
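A sketch computing the pieces of this formula on simulated data (all numbers invented); note how a higher Rj² (more collinearity) inflates the variance:

```python
# Compute SST_1, R_1^2 from regressing x1 on the other regressor,
# and plug into Var(beta1) = sigma^2 / (SST_1 * (1 - R_1^2)).
import numpy as np

rng = np.random.default_rng(3)
n, sigma2 = 500, 1.0
x2 = rng.normal(size=n)
x1 = 0.6 * x2 + rng.normal(size=n)

sst1 = np.sum((x1 - x1.mean()) ** 2)

# R_1^2 from regressing x1 on (1, x2)
Z = np.column_stack([np.ones(n), x2])
fit = Z @ np.linalg.lstsq(Z, x1, rcond=None)[0]
r2_1 = 1 - np.sum((x1 - fit) ** 2) / sst1

var_beta1 = sigma2 / (sst1 * (1 - r2_1))
print(var_beta1)   # larger R_1^2 inflates the variance of beta1_hat
```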
y = β0 + β1 x1 + β2 x2 + u
Consider the below two estimators for β1. Estimate the true model, which
yields β̂1:

y = β̂0 + β̂1 x1 + β̂2 x2 + û

and estimate the model that omits x2, which yields β̃1:

y = β̃0 + β̃1 x1 + ũ
Case I: β2 ≠ 0
We clearly prefer β̂1 since it is unbiased.
But note that Var(β̃1) < Var(β̂1) when there is correlation between
x1 and x2 and the population variance of the errors is known. As the
sample size increases, the bias does not go away, but the variance
advantage shrinks.
Also, when we do not know the population σ², we estimate it from the
sample, and that estimate can be larger when β2 ≠ 0.
Case II: β2 = 0
In this case we prefer β̃1, because including x2 gains us nothing in terms
of bias but costs us variance (if there is correlation between x1 and x2).
Var(β̃1) < Var(β̂1)

One can see the above from a direct application of the variance formula.
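A small Monte Carlo sketch of Case II (all numbers invented): with β2 = 0 and x1, x2 correlated, the short-regression estimator β̃1 has the smaller sampling variance:

```python
# Compare the sampling variance of beta1 from the short regression
# (y on x1) and the long regression (y on x1, x2) when beta2 = 0.
import numpy as np

rng = np.random.default_rng(7)
n, reps = 100, 2000
b_short, b_full = [], []
for _ in range(reps):
    x2 = rng.normal(size=n)
    x1 = 0.9 * x2 + rng.normal(size=n)                  # correlated regressors
    y = 1.0 + 2.0 * x1 + 0.0 * x2 + rng.normal(size=n)  # beta2 = 0
    X1 = np.column_stack([np.ones(n), x1])
    X12 = np.column_stack([np.ones(n), x1, x2])
    b_short.append(np.linalg.lstsq(X1, y, rcond=None)[0][1])
    b_full.append(np.linalg.lstsq(X12, y, rcond=None)[0][1])

print(np.var(b_short), np.var(b_full))   # Var(beta1_tilde) < Var(beta1_hat)
```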
Estimating the Variance of the Errors
σ̂² = Σ_{i=1}^n ûi² / (n − k − 1)

The denominator reflects the degrees of freedom = n − (k + 1).
This means that, given n − (k + 1) of the residuals, the remaining (k + 1)
residuals are known.
Terminology for σ̂: the standard error of the regression, or the root mean
squared error (RMSE). Notably, while SSR must fall when another
explanatory variable is added, the degrees of freedom also fall by one, so
the RMSE of a regression can increase or decrease when another variable
is added.
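A sketch of this estimator on simulated data (numbers invented), with k = 2 regressors:

```python
# Estimate sigma^2 as SSR/(n - k - 1) from the OLS residuals.
import numpy as np

rng = np.random.default_rng(4)
n, k = 300, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(size=n)

beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
u_hat = y - X @ beta_hat
sigma2_hat = np.sum(u_hat ** 2) / (n - k - 1)
print(np.sqrt(sigma2_hat))   # the RMSE of the regression, near the true sd of 1
```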
A note on terminology:
s.d.(β̂j) = σ / √[SSTj (1 − Rj²)]

s.e.(β̂j) = σ̂ / √[SSTj (1 − Rj²)]
Under Assumptions MLR.1 through MLR.5, the OLS estimator β̂j for βj is
the best linear unbiased estimator (BLUE).
Linear in this context has a specific meaning: each β̂j is a linear function
of the data on the dependent variable,

β̂j = Σ_{i=1}^n wij yi

where each weight wij is a function of the explanatory variables only.
Best: this implies minimum variance among the class of linear unbiased
estimators.
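A sketch verifying this linearity on simulated data (all numbers invented): the rows of (X′X)⁻¹X′ supply the weights wij, and applying them to y reproduces the OLS coefficients.

```python
# Each OLS coefficient is a weighted sum of the y_i, with weights
# taken from the rows of (X'X)^{-1} X'.
import numpy as np

rng = np.random.default_rng(5)
n = 50
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = X @ np.array([1.0, 0.5, -0.3]) + rng.normal(size=n)

W = np.linalg.inv(X.T @ X) @ X.T      # row j holds the weights w_ij
beta_hat = W @ y                      # beta_hat_j = sum_i w_ij * y_i
print(np.allclose(beta_hat, np.linalg.lstsq(X, y, rcond=None)[0]))  # True
```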