The Assumptions of the CLRM
General
In the previous section we described the desirable properties of estimators. However, we need to make clear that there is no
guarantee that the OLS estimators will
possess any of these properties unless a number of assumptions – which this section
presents – hold.
In general, when we calculate estimators of population parameters from sample data
we are bound to make some initial assumptions about the population distribution.
Usually, they amount to a set of statements about the distribution of the variables we are
investigating, without which our model and estimates cannot be justified. Therefore,
it is important not only to present the assumptions but also to move beyond them,
to the extent that we will at least study what happens when they go wrong, and how
we may test whether they have gone wrong. This will be examined in the third part of
this book.
The assumptions
The CLRM consists of eight basic assumptions about the ways in which the observations
are generated; two short simulation sketches after the list illustrate them in practice:
1 Linearity. The first assumption is that the dependent variable can be calculated as a
linear function of a specific set of independent variables, plus a disturbance term.
This can be expressed mathematically as follows: the regression model is linear in
the unknown coefficients α and β, so that Yt = α + βXt + ut, for t = 1, 2, ..., n.
2 Xt has some variation. By this assumption we mean that not all observations of Xt
are the same; at least one has to be different, so that the sample Var(X) is not zero. It
is important to distinguish between the sample variance, which simply shows how
much X varies over the particular sample, and the stochastic nature of X. In many
places in this book we shall make the assumption that X is non-stochastic (see point 3
below). This means that the variance of X at any point in time is zero, so Var(Xt) = 0,
and if we could somehow repeat the world over again X would always take exactly
the same values. But, of course, over any sample there will (indeed must) be some
variation in X.
3 Xt is non-stochastic and fixed in repeated samples. By this assumption we mean, first,
that Xt is a variable whose values are not determined by some chance mechanism but
are set by an experimenter or investigator; and, second, that it is possible to repeat
the sample with the same independent variable values. This implies that
Cov(Xs, ut) = 0 for all s and t = 1, 2, ..., n; that is, Xt and ut are uncorrelated.
4 The expected value of the disturbance term is zero. This means that the disturbance is a
genuine disturbance, so that if we took a large number of samples the mean disturbance
would be zero. This can be denoted as E(ut) = 0. We need this assumption in order to
interpret the deterministic part of a regression model, α + βXt, as a ‘statistical
average’ relation.
5 Homoskedasticity. This requires that all disturbance terms have the same variance, so
that Var(ut) = σ² = constant for all t.
6 Serial independence. This requires that all disturbance terms be independently
distributed or, more simply, not correlated with one another, so that Cov(ut, us) =
E[(ut − E(ut))(us − E(us))] = E(utus) = 0 for all t ≠ s. This assumption has a special
significance in economics; to grasp what it means
in practice, recall that we nearly always
obtain our data from time series in which each t is one year, or one quarter, or one
week ahead of the last. The condition means, therefore, that the disturbance in one
period should not be related to a disturbance in the next or previous periods. This
condition is frequently violated since, if there is a disturbing effect at one time, it is
likely to persist. In this discussion we shall be studying violations of this assumption
quite carefully.
7 Normality of residuals. The disturbances u1, u2, ..., un are assumed to be
independently and identically normally distributed, with mean zero and common
variance σ².
8 n > 2 and no multicollinearity. This assumption says that the number of observations
must be greater than two or, in general, greater than the number of independent
variables, and that there are no exact linear relationships among the independent
variables.
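
To make these assumptions concrete, the following minimal Python/numpy sketch
generates a sample that satisfies all eight of them. The parameter values
(α = 2, β = 0.5, σ = 1, n = 50) are purely illustrative assumptions, not values
taken from the text, and the closed-form OLS formulas are the standard ones for
the simple regression model.

    import numpy as np

    rng = np.random.default_rng(42)

    # Illustrative (hypothetical) parameter values, not from the text.
    alpha, beta, sigma, n = 2.0, 0.5, 1.0, 50

    # Assumptions 2, 3 and 8: X is a fixed (non-stochastic) grid with
    # sample variation, and n > 2.
    X = np.linspace(0.0, 10.0, n)

    # Assumptions 4-7: disturbances are i.i.d. normal with mean zero and
    # common variance sigma^2 (zero mean, homoskedastic, serially independent).
    u = rng.normal(loc=0.0, scale=sigma, size=n)

    # Assumption 1: Y is linear in the unknown coefficients plus a disturbance.
    Y = alpha + beta * X + u

    # OLS estimates: slope = sample Cov(X, Y) / sample Var(X).
    beta_hat = np.cov(X, Y, bias=True)[0, 1] / np.var(X)
    alpha_hat = Y.mean() - beta_hat * X.mean()
    print(f"alpha_hat = {alpha_hat:.3f}, beta_hat = {beta_hat:.3f}")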
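
The phrase ‘fixed in repeated samples’ (assumption 3) can also be illustrated with
a small Monte Carlo sketch under the same assumed parameter values: the X values
are held fixed while fresh disturbances are drawn for each sample, so the average
OLS slope should settle close to the true β.

    import numpy as np

    rng = np.random.default_rng(0)
    alpha, beta, sigma, n, reps = 2.0, 0.5, 1.0, 50, 10_000

    X = np.linspace(0.0, 10.0, n)      # the same X values in every repeated sample

    beta_hats = np.empty(reps)
    for r in range(reps):
        u = rng.normal(0.0, sigma, n)  # only the disturbances change across samples
        Y = alpha + beta * X + u
        beta_hats[r] = np.cov(X, Y, bias=True)[0, 1] / np.var(X)

    # With X fixed, Cov(X, u) = 0 and E(u) = 0, so beta_hat is centred on beta.
    print(f"mean beta_hat over {reps} samples: {beta_hats.mean():.4f} (true beta = {beta})")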