Classical Least Squares Theory - Lecture Notes
CHUNG-MING KUAN
C.-M. Kuan (Finance & CRETA, NTU) Classical Least Squares Theory October 18, 2014 1 / 100
Lecture Outline
1 The Method of Ordinary Least Squares (OLS)
Simple Linear Regression
Multiple Linear Regression
Geometric Interpretations
Measures of Goodness of Fit
Example: Analysis of Suicide Rate
2 Statistical Properties of the OLS Estimator
Classical Conditions
Without the Normality Condition
With the Normality Condition
3 Hypothesis Testing
Tests for Linear Hypotheses
Power of the Tests
Alternative Interpretation of the F Test
Confidence Regions
Example: Analysis of Suicide Rate
Lecture Outline (cont’d)
4 Multicollinearity
Near Multicollinearity
Regression with Dummy Variables
Example: Analysis of Suicide Rate
Simple Linear Regression
Together we write:
\[
y = \underbrace{\alpha + \beta x}_{\text{linear function}} + \underbrace{e(\alpha, \beta)}_{\text{error}}.
\]
For the specification α + βx, the objective is to find the “best” fit of the data (y_t, x_t), t = 1, ..., T.
1 Minimizing a least-squares (LS) criterion function with respect to α and β:
\[
Q_T(\alpha, \beta) := \frac{1}{T} \sum_{t=1}^{T} (y_t - \alpha - \beta x_t)^2 .
\]
The OLS Estimators
The estimated regression line is ŷ = α̂_T + β̂_T x, which is the linear function evaluated at α̂_T and β̂_T; ê = y − ŷ is the error evaluated at α̂_T and β̂_T, also known as the residual.
The t-th fitted value of the regression line is ŷ_t = α̂_T + β̂_T x_t.
The t-th residual is ê_t = y_t − ŷ_t = e_t(α̂_T, β̂_T).
No other linear function of the form a + bx can provide a better fit of the data in terms of the sum of squared errors.
Algebraic Properties
ȳ = α̂T + β̂T x̄; that is, the estimated regression line must pass
through the point (x̄, ȳ ).
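As a minimal sketch of the simple regression case, the block below uses the standard closed-form solution of the first-order conditions (slope equal to the sample covariance of x and y over the sample variance of x; the surviving text does not print these formulas, so they are stated here as an assumption). The function name `simple_ols` and the data are hypothetical.

```python
import numpy as np

def simple_ols(x, y):
    """Closed-form OLS estimates for y = alpha + beta * x + e."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_bar, y_bar = x.mean(), y.mean()
    beta_hat = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)
    alpha_hat = y_bar - beta_hat * x_bar
    return alpha_hat, beta_hat

# Hypothetical data, for illustration only.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

alpha_hat, beta_hat = simple_ols(x, y)
residuals = y - (alpha_hat + beta_hat * x)

# The fitted line evaluated at x_bar should reproduce y_bar.
fitted_at_mean = alpha_hat + beta_hat * x.mean()
```

Comparing `fitted_at_mean` with `y.mean()` illustrates that the estimated line passes through (x̄, ȳ).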
Example: Analysis of Suicide Rate
Suppose we want to know how the suicide rate (s) in Taiwan can be explained by the unemployment rate (u), the GDP growth rate (g), or time (t). The suicide rate is measured per 100,000 persons.
Data (1981–2013): s̄ = 12.05 with s.d. 3.91; ḡ = 5.64 with s.d. 3.16;
ū = 3.09 with s.d. 1.33.
Estimation results:
[Figure: time-series plots of the suicide rate together with the real GDP growth rate and, in panel (b), the suicide and unemployment rates.]
Multiple Linear Regression
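\[
y = \beta_1 x_1 + \cdots + \beta_k x_k + e(\beta_1, \ldots, \beta_k).
\]
\[
y = X\beta + e(\beta), \tag{1}
\]
where $\beta = (\beta_1 \ \beta_2 \ \cdots \ \beta_k)'$,
\[
y = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_T \end{bmatrix}, \quad
X = \begin{bmatrix}
x_{11} & x_{12} & \cdots & x_{1k} \\
x_{21} & x_{22} & \cdots & x_{2k} \\
\vdots & \vdots & \ddots & \vdots \\
x_{T1} & x_{T2} & \cdots & x_{Tk}
\end{bmatrix}, \quad
e(\beta) = \begin{bmatrix} e_1(\beta) \\ e_2(\beta) \\ \vdots \\ e_T(\beta) \end{bmatrix}.
\]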
Least-squares criterion function:
\[
Q_T(\beta) := \frac{1}{T}\, e(\beta)' e(\beta) = \frac{1}{T}\,(y - X\beta)'(y - X\beta). \tag{2}
\]
The FOCs of minimizing Q_T(β) are −2X'(y − Xβ)/T = 0, leading to the normal equations:
\[
X'X\beta = X'y.
\]
Given [ID-1], X'X is positive definite and hence invertible. The unique solution to the normal equations is known as the OLS estimator of β:
\[
\hat\beta_T = (X'X)^{-1} X'y.
\]
The result below requires only the identification requirement and does not depend on the statistical properties of y and X.
Theorem 3.1
Given specification (1), suppose [ID-1] holds. Then, the OLS estimator β̂_T = (X'X)^{-1}X'y uniquely minimizes the criterion function (2).
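The normal equations can be solved numerically without forming the inverse explicitly. The sketch below assumes that [ID-1] amounts to X having full column rank; the function names `ols` and `criterion` and the simulated data are hypothetical.

```python
import numpy as np

def ols(X, y):
    """Solve the normal equations X'X b = X'y for the OLS estimator."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    if np.linalg.matrix_rank(X) < X.shape[1]:
        raise ValueError("X must have full column rank for a unique solution")
    return np.linalg.solve(X.T @ X, X.T @ y)

def criterion(X, y, b):
    """Least-squares criterion Q_T(b) = (y - Xb)'(y - Xb) / T."""
    e = y - X @ b
    return e @ e / len(y)

# Simulated data, for illustration only.
rng = np.random.default_rng(0)
T = 50
X = np.column_stack([np.ones(T), rng.normal(size=(T, 2))])
y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(size=T)

beta_hat = ols(X, y)
q_at_ols = criterion(X, y, beta_hat)
q_perturbed = criterion(X, y, beta_hat + 0.01)   # any other b gives a larger Q_T
```

Solving the linear system with `np.linalg.solve` is generally preferable to computing (X'X)^{-1} directly; for ill-conditioned X a least-squares routine such as `np.linalg.lstsq` is the safer choice.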
The magnitude of β̂ T is affected by the measurement units of the
dependent and explanatory variables.
A larger coefficient does not imply that the associated regressor is more
important.
The so-called “beta coefficients” (see homework) do not depend on the
measurement units, and hence their magnitudes are comparable.
Given β̂_T, the vector of the OLS fitted values is ŷ = Xβ̂_T, and the vector of the OLS residuals is ê = y − ŷ = e(β̂_T).
Plugging β̂_T into the FOCs, X'(y − Xβ) = 0, we have:
X'ê = 0.
When X contains a vector of ones, \sum_{t=1}^{T} ê_t = 0.
ŷ'ê = β̂_T' X'ê = 0.
These are all algebraic results under the OLS method.
Geometric Interpretations
[Figure: y is decomposed into its orthogonal projection Py = x1 β̂1 + x2 β̂2 onto the space spanned by x1 and x2, and the residual vector ê = (I − P)y.]
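Theorem 3.3 (Frisch-Waugh-Lovell)
Given y = X1 β1 + X2 β2 + e, the OLS estimators of β1 and β2 are
\[
\hat\beta_{1,T} = [X_1'(I - P_2)X_1]^{-1} X_1'(I - P_2)y, \qquad
\hat\beta_{2,T} = [X_2'(I - P_1)X_2]^{-1} X_2'(I - P_1)y,
\]
where P_1 = X_1(X_1'X_1)^{-1}X_1' and P_2 = X_2(X_2'X_2)^{-1}X_2'.
Proof: Writing y = X1 β̂_{1,T} + X2 β̂_{2,T} + (I − P)y, where P = X(X'X)^{-1}X' with X = [X1 X2], we have
\[
X_1'(I - P_2)y = X_1'(I - P_2)X_1\hat\beta_{1,T} + X_1'(I - P_2)X_2\hat\beta_{2,T} + X_1'(I - P_2)(I - P)y
= X_1'(I - P_2)X_1\hat\beta_{1,T},
\]
because (I − P_2)X_2 = 0, (I − P_2)(I − P) = I − P, and X_1'(I − P) = 0. The result for β̂_{2,T} follows in the same way.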
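The Frisch-Waugh-Lovell result can be checked numerically: the coefficients on X1 from the full regression should match those from regressing on X1 after X2 has been partialled out. This is a sketch on simulated data; the helper `projection` is a hypothetical name.

```python
import numpy as np

def projection(A):
    """Orthogonal projection matrix onto the column space of A."""
    return A @ np.linalg.solve(A.T @ A, A.T)

# Simulated data, for illustration only.
rng = np.random.default_rng(1)
T = 40
X1 = np.column_stack([np.ones(T), rng.normal(size=T)])
X2 = rng.normal(size=(T, 2))
y = rng.normal(size=T)

# Full regression of y on X = [X1 X2].
X = np.hstack([X1, X2])
beta_full = np.linalg.solve(X.T @ X, X.T @ y)

# Partialled-out regression: purge X2 from y and X1, then regress.
M2 = np.eye(T) - projection(X2)
beta1_fwl = np.linalg.solve(X1.T @ M2 @ X1, X1.T @ M2 @ y)
```

The first two entries of `beta_full` and the vector `beta1_fwl` agree up to floating-point error.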
Some Implications of the FWL Theorem
Observe that (I − P1 )y = (I − P1 )X2 β̂ 2,T + (I − P1 )(I − P)y.
[Figure: geometric illustration of y, its projections Py and P1 y, the vectors (I − P1)y and (P − P1)y, and the residual ê = (I − P)y.]
Measures of Goodness of Fit
\[
R^2 = \frac{\text{RSS}}{\text{TSS}} = 1 - \frac{\text{ESS}}{\text{TSS}}, \tag{4}
\]
measures the proportion of the total variation of y_t that can be explained by the model.
It is invariant with respect to the measurement units of the dependent variable but not invariant to the addition of a constant.
It is a relative measure such that 0 ≤ R² ≤ 1.
It is nondecreasing in the number of regressors. (Why?)
Centered R²
When the specification contains a constant term,
\[
\underbrace{\sum_{t=1}^{T} (y_t - \bar y)^2}_{\text{centered TSS}}
= \underbrace{\sum_{t=1}^{T} (\hat y_t - \bar{\hat y})^2}_{\text{centered RSS}}
+ \underbrace{\sum_{t=1}^{T} \hat e_t^2}_{\text{ESS}} .
\]
The centered coefficient of determination (or centered R²),
\[
R^2 = \frac{\sum_{t=1}^{T} (\hat y_t - \bar y)^2}{\sum_{t=1}^{T} (y_t - \bar y)^2}
= \frac{\text{Centered RSS}}{\text{Centered TSS}}
= 1 - \frac{\text{ESS}}{\text{Centered TSS}},
\]
Adjusted R²
\[
\bar R^2 = 1 - \frac{\hat e'\hat e/(T - k)}{(y'y - T\bar y^2)/(T - 1)} .
\]
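A short sketch of how the centered and adjusted R² might be computed from the formulas above, where ESS denotes the error (residual) sum of squares as in the text. The function name `r_squared` and the simulated data are hypothetical; X is assumed to contain a column of ones.

```python
import numpy as np

def r_squared(X, y):
    """Centered and adjusted R^2; X is assumed to include a column of ones."""
    T, k = X.shape
    beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
    e_hat = y - X @ beta_hat
    ess = e_hat @ e_hat                     # error sum of squares
    tss = y @ y - T * y.mean() ** 2         # centered total sum of squares
    r2 = 1.0 - ess / tss
    r2_adj = 1.0 - (ess / (T - k)) / (tss / (T - 1))
    return r2, r2_adj

# Simulated data, for illustration only.
rng = np.random.default_rng(2)
T = 40
x = rng.normal(size=T)
y = 1.0 + 0.5 * x + rng.normal(size=T)

X_small = np.column_stack([np.ones(T), x])
X_big = np.column_stack([X_small, rng.normal(size=T)])   # adds an irrelevant regressor

r2_small, adj_small = r_squared(X_small, y)
r2_big, adj_big = r_squared(X_big, y)
```

Adding the irrelevant regressor cannot lower the centered R², whereas the adjusted R² penalizes the loss of a degree of freedom and may fall.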
Example: Analysis of Suicide Rate
Estimation results with t but without g :
Estimation results with t and t²:
Classical Conditions
[A1] X is non-stochastic.
[A2] y is a random vector such that
(i) E(y) = Xβ_o for some β_o;
(ii) var(y) = σ_o² I_T for some σ_o² > 0.
[A3] y is a random vector such that y ∼ N(Xβ_o, σ_o² I_T) for some β_o and σ_o² > 0.
The specification (1) with [A1] and [A2] is known as the classical linear model, whereas (1) with [A1] and [A3] is the classical normal linear model.
When var(y) = σ_o² I_T, the elements of y are homoskedastic and (serially) uncorrelated.
Without Normality
Theorem 3.4
Consider the linear specification (1).
(a) Given [A1] and [A2](i), β̂_T is unbiased for β_o.
(b) Given [A1] and [A2], σ̂_T² is unbiased for σ_o².
(c) Given [A1] and [A2], var(β̂_T) = σ_o² (X'X)^{-1}.
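The three claims of Theorem 3.4 can be illustrated by simulation. The sketch below assumes σ̂_T² = ê'ê/(T − k), holds X fixed across replications in the spirit of [A1], and draws normal errors purely for convenience (normality is not required by [A2]). All parameter values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(42)
T, k, sigma2 = 30, 3, 4.0
X = np.column_stack([np.ones(T), rng.normal(size=(T, k - 1))])  # fixed design
beta_o = np.array([1.0, -2.0, 0.5])
XtX_inv = np.linalg.inv(X.T @ X)

n_rep = 20000
# Each row of Y is one draw of y with mean X beta_o and variance sigma2 * I_T.
Y = X @ beta_o + np.sqrt(sigma2) * rng.normal(size=(n_rep, T))
B = Y @ X @ XtX_inv                 # row r holds the OLS estimate of replication r
E = Y - B @ X.T                     # row r holds the residual vector of replication r
s2 = np.sum(E ** 2, axis=1) / (T - k)

mean_beta = B.mean(axis=0)          # should be close to beta_o
mean_s2 = s2.mean()                 # should be close to sigma2
cov_beta = np.cov(B, rowvar=False)  # should be close to sigma2 * (X'X)^{-1}
```

The Monte Carlo averages only approximate the population moments; increasing `n_rep` tightens the agreement.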
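Proof: By [A1], E(β̂_T) = E[(X'X)^{-1}X'y] = (X'X)^{-1}X'E(y). [A2](i) gives E(y) = Xβ_o, so that E(β̂_T) = (X'X)^{-1}X'Xβ_o = β_o, proving (a). For (b), note that ê = (I_T − P)y = (I_T − P)(y − Xβ_o) because (I_T − P)X = 0. Then
\[
\begin{aligned}
E(\hat e'\hat e) &= E[\operatorname{trace}((y - X\beta_o)'(I_T - P)(y - X\beta_o))] \\
&= E[\operatorname{trace}((y - X\beta_o)(y - X\beta_o)'(I_T - P))] \\
&= \operatorname{trace}(E[(y - X\beta_o)(y - X\beta_o)'](I_T - P)) \\
&= \operatorname{trace}(\sigma_o^2 I_T (I_T - P)) \\
&= \sigma_o^2 \operatorname{trace}(I_T - P),
\end{aligned}
\]
where the 4-th equality follows from [A2](ii) that var(y) = σ_o² I_T.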
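Proof (cont’d): As trace(I_T − P) = rank(I_T − P) = T − k, we have E(ê'ê) = σ_o² (T − k) and
\[
E(\hat\sigma_T^2) = \frac{E(\hat e'\hat e)}{T - k} = \sigma_o^2,
\]
proving (b). For (c), by [A1] and [A2](ii),
\[
\operatorname{var}(\hat\beta_T) = (X'X)^{-1}X' \operatorname{var}(y)\, X(X'X)^{-1} = \sigma_o^2 (X'X)^{-1}.
\]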
Theorem 3.4 establishes unbiasedness of the OLS estimators β̂_T and σ̂_T² but does not address the issue of efficiency.
By Theorem 3.4(c), the elements of β̂_T can be more precisely estimated (i.e., with a smaller variance) when X has larger variation. To see this, consider the simple linear regression y = α + βx + e; it can be verified that
\[
\operatorname{var}(\hat\beta_T) = \sigma_o^2 \, \frac{1}{\sum_{t=1}^{T} (x_t - \bar x)^2} .
\]
Thus, the larger the (squared) variation of x_t (i.e., \sum_{t=1}^{T} (x_t − x̄)²), the smaller the variance of β̂_T.
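The claim can be checked directly: the (2,2) element of σ_o²(X'X)^{-1} for X = [1 x] should equal σ_o² divided by the squared variation of x. The regressor values and error variance below are hypothetical.

```python
import numpy as np

x = np.array([1.0, 2.0, 4.0, 7.0, 11.0])   # hypothetical regressor values
sigma2 = 2.0                               # hypothetical error variance

def slope_variance(x, sigma2):
    """Slope variance taken from the matrix formula sigma2 * (X'X)^{-1}."""
    X = np.column_stack([np.ones_like(x), x])
    return (sigma2 * np.linalg.inv(X.T @ X))[1, 1]

var_from_matrix = slope_variance(x, sigma2)
var_direct = sigma2 / np.sum((x - x.mean()) ** 2)

# Doubling the spread of x quadruples the squared variation,
# so the slope variance falls to one quarter.
var_wider = slope_variance(2.0 * x, sigma2)
```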
The result below establishes efficiency of β̂ T among all unbiased
estimators of β o that are linear in y.
Proof (cont’d): The condition CX = 0 implies cov(β̂_T, Cy) = 0. Thus,
\[
\operatorname{var}(\check\beta_T) = \operatorname{var}(\hat\beta_T) + \operatorname{var}(Cy) = \operatorname{var}(\hat\beta_T) + \sigma_o^2 CC'.
\]
This shows that var(β̌_T) − var(β̂_T) is a p.s.d. matrix σ_o² CC', so that β̂_T is more efficient than any linear unbiased estimator β̌_T.
Example: E(y) = X1 b1 and var(y) = σ_o² I_T. Two specifications:
y = X1 β1 + e,
with the OLS estimator b̂_{1,T}, and
y = Xβ + e = X1 β1 + X2 β2 + e,
with the OLS estimator β̂_T = (β̂_{1,T}' β̂_{2,T}')'. Clearly, b̂_{1,T} is the BLUE of b1 with var(b̂_{1,T}) = σ_o² (X1'X1)^{-1}. By the Frisch-Waugh-Lovell Theorem,
\[
\operatorname{var}(\hat\beta_{1,T}) = \sigma_o^2 [X_1'(I - P_2)X_1]^{-1}.
\]
Example (cont’d): As X1'X1 − X1'(I − P_2)X1 = X1'P_2 X1 is p.s.d., it follows that
\[
[X_1'(I - P_2)X_1]^{-1} - (X_1'X_1)^{-1}
\]
is p.s.d. This shows that b̂_{1,T} is more efficient than β̂_{1,T}, as it ought to be.
With Normality
Proof (cont’d): Let C orthogonally diagonalize I_T − P such that C'(I_T − P)C = Λ. Since rank(I_T − P) = T − k, Λ contains T − k eigenvalues equal to one and k eigenvalues equal to zero. Then,
\[
y^{*\prime}(I_T - P)y^{*} = y^{*\prime} C[C'(I_T - P)C]C' y^{*}
= \eta' \begin{bmatrix} I_{T-k} & 0 \\ 0 & 0 \end{bmatrix} \eta,
\]
with η = C'y*. When η ∼ N(0, I_T), this quadratic form is distributed as χ²(T − k), proving (b). (c) is a direct consequence of (b) and the facts that χ²(T − k) has mean T − k and variance 2(T − k).
Theorem 3.8
Given the linear specification (1), suppose that [A1] and [A3] hold. Then
the OLS estimators β̂_T and σ̂_T² are the best unbiased estimators (BUE) for β_o and σ_o², respectively.
Proof (cont’d):
By the information matrix equality, −E[H(β_o, σ_o²)] is the information matrix. Then, its inverse,
\[
-E[H(\beta_o, \sigma_o^2)]^{-1} =
\begin{bmatrix}
\sigma_o^2 (X'X)^{-1} & 0 \\
0 & \dfrac{2\sigma_o^4}{T}
\end{bmatrix},
\]
Tests for Linear Hypotheses
so that
When q = 1, Rβ̂_T and R(X'X)^{-1}R' are scalars. Under the null hypothesis,
\[
\frac{R\hat\beta_T - r}{\sigma_o [R(X'X)^{-1}R']^{1/2}}
= \frac{R(\hat\beta_T - \beta_o)}{\sigma_o [R(X'X)^{-1}R']^{1/2}} \sim N(0, 1).
\]
Replacing σ_o with σ̂_T yields the t statistic:
\[
\tau = \frac{R\hat\beta_T - r}{\hat\sigma_T [R(X'X)^{-1}R']^{1/2}} .
\]
Theorem 3.9
Given the linear specification (1), suppose that [A1] and [A3] hold. When R is 1 × k, τ ∼ t(T − k) under the null hypothesis.
Note: The normality condition [A3] is crucial for this t distribution result.
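A minimal sketch of the t statistic for a single linear hypothesis Rβ = r, assuming σ̂_T² = ê'ê/(T − k). The function name `t_statistic` and the simulated data are hypothetical; comparing τ against a t(T − k) critical value is left to a statistics library.

```python
import numpy as np

def t_statistic(X, y, R, r):
    """t statistic for the single linear hypothesis R beta = r (R has k entries)."""
    T, k = X.shape
    R = np.asarray(R, dtype=float)
    XtX_inv = np.linalg.inv(X.T @ X)
    beta_hat = XtX_inv @ X.T @ y
    e_hat = y - X @ beta_hat
    sigma_hat = np.sqrt(e_hat @ e_hat / (T - k))
    return (R @ beta_hat - r) / (sigma_hat * np.sqrt(R @ XtX_inv @ R))

# Simulated data, for illustration only.
rng = np.random.default_rng(11)
T = 50
X = np.column_stack([np.ones(T), rng.normal(size=(T, 2))])
y = X @ np.array([1.0, 0.0, 2.0]) + rng.normal(size=T)

tau_single = t_statistic(X, y, [0.0, 1.0, 0.0], 0.0)   # H0: beta_2 = 0
tau_sum = t_statistic(X, y, [0.0, 1.0, 1.0], 2.0)      # H0: beta_2 + beta_3 = 2
```

Choosing R as a unit vector tests one coefficient; a row with two non-zero entries tests a linear combination of two coefficients.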
Proof: We write the statistic τ as
\[
\tau = \frac{R\hat\beta_T - r}{\sigma_o [R(X'X)^{-1}R']^{1/2}} \Bigg/ \sqrt{\frac{(T - k)\hat\sigma_T^2/\sigma_o^2}{T - k}},
\]
= 0.
Examples
\[
\tau = \frac{\hat\beta_{i,T} - c}{\hat\sigma_T \sqrt{m^{ii}}} \sim t(T - k),
\]
\[
\tau = \frac{a\hat\beta_{i,T} + b\hat\beta_{j,T} - c}{\hat\sigma_T \sqrt{a^2 m^{ii} + b^2 m^{jj} + 2ab\, m^{ij}}} \sim t(T - k).
\]
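When R is a q × k matrix with full row rank q (q > 1), we have under the null hypothesis: [R(X'X)^{-1}R']^{-1/2}(Rβ̂_T − r)/σ_o ∼ N(0, I_q). Hence,
\[
(R\hat\beta_T - r)'[R(X'X)^{-1}R']^{-1}(R\hat\beta_T - r)/\sigma_o^2 \sim \chi^2(q).
\]
Replacing σ_o² with σ̂_T² and dividing by q yields the F statistic:
\[
\varphi = \frac{(R\hat\beta_T - r)'[R(X'X)^{-1}R']^{-1}(R\hat\beta_T - r)}{\hat\sigma_T^2\, q} .
\]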
Theorem 3.10
Given the linear specification (1), suppose that [A1] and [A3] hold. When
R is q × k with full row rank, ϕ ∼ F (q, T − k) under the null hypothesis.
Notes:
1 When q = 1, ϕ ∼ F(1, T − k), and this distribution is the same as that of τ².
2 The t distribution is symmetric about zero, so one may consider a one- or two-sided t test. The F distribution, by contrast, is supported on the non-negative half-line and is asymmetric, so it is more typical to consider only a one-sided F test.
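A sketch of the F statistic for q joint linear restrictions, written as the quadratic form in Rβ̂_T − r divided by σ̂_T² q (the same scaling as the 1/(2σ̂_T²) factor in the two-restriction example). The function name `f_statistic` and the data are hypothetical; σ̂_T² = ê'ê/(T − k) is assumed.

```python
import numpy as np

def f_statistic(X, y, R, r):
    """F statistic for R beta = r, where R is q x k with full row rank."""
    T, k = X.shape
    R = np.atleast_2d(np.asarray(R, dtype=float))
    r = np.atleast_1d(np.asarray(r, dtype=float))
    q = R.shape[0]
    XtX_inv = np.linalg.inv(X.T @ X)
    beta_hat = XtX_inv @ X.T @ y
    e_hat = y - X @ beta_hat
    sigma2_hat = e_hat @ e_hat / (T - k)
    d = R @ beta_hat - r
    return d @ np.linalg.solve(R @ XtX_inv @ R.T, d) / (sigma2_hat * q)

# Simulated data, for illustration only.
rng = np.random.default_rng(12)
T = 50
X = np.column_stack([np.ones(T), rng.normal(size=(T, 2))])
y = X @ np.array([1.0, 0.0, 0.0]) + rng.normal(size=T)

phi_joint = f_statistic(X, y, [[0, 1, 0], [0, 0, 1]], [0, 0])   # q = 2
phi_single = f_statistic(X, y, [0, 1, 0], 0.0)                   # q = 1
```

With a single restriction, `phi_single` coincides with the square of the corresponding t statistic, in line with Note 1.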
Example: H_o: β1 = b1 and β2 = b2. The F statistic,
\[
\varphi = \frac{1}{2\hat\sigma_T^2}
\begin{pmatrix} \hat\beta_{1,T} - b_1 \\ \hat\beta_{2,T} - b_2 \end{pmatrix}'
\begin{bmatrix} m^{11} & m^{12} \\ m^{21} & m^{22} \end{bmatrix}^{-1}
\begin{pmatrix} \hat\beta_{1,T} - b_1 \\ \hat\beta_{2,T} - b_2 \end{pmatrix},
\]
Test Power
Proof: When Rβ_o = r + δ,
\[
(R\hat\beta_T - r)'[R(X'X)^{-1}R']^{-1}(R\hat\beta_T - r)/\sigma_o^2 \sim \chi^2(q;\ \delta' D^{-1}\delta),
\]
Test power is determined by the non-centrality parameter δ'D^{-1}δ, where δ signifies the deviation from the null. When Rβ_o deviates farther from the hypothetical value r (i.e., δ is “large”), the non-centrality parameter δ'D^{-1}δ increases, and so does the power.
Example: The null distribution is F(2, 20), and its critical value at the 5% level is 3.49. Then for F(2, 20; ν1, 0) with the non-centrality parameter ν1 = 1, 3, 5, the probabilities that ϕ exceeds 3.49 are approximately 12.1%, 28.2%, and 44.3%, respectively.
Example: The null distribution is F(5, 60), and its critical value at the 5% level is 2.37. Then for F(5, 60; ν1, 0) with ν1 = 1, 3, 5, the probabilities that ϕ exceeds 2.37 are approximately 9.4%, 20.5%, and 33.2%, respectively.
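The way power rises with the non-centrality parameter can be illustrated by simulation. The sketch below assumes that F(q, m; ν, 0) is the ratio of a non-central χ²(q; ν)/q to an independent central χ²(m)/m, and that ν matches NumPy's `nonc` convention (the sum of squared means). The function name `simulated_power` is hypothetical, and the estimates carry Monte Carlo error.

```python
import numpy as np

def simulated_power(q, df2, nu, crit, n_draws=200_000, seed=0):
    """Monte Carlo estimate of P(F > crit) for a non-central F(q, df2; nu, 0)."""
    rng = np.random.default_rng(seed)
    if nu > 0:
        num = rng.noncentral_chisquare(q, nu, size=n_draws) / q
    else:
        num = rng.chisquare(q, size=n_draws) / q     # central case: the null
    den = rng.chisquare(df2, size=n_draws) / df2
    return np.mean(num / den > crit)

# Rejection probabilities at the 5% critical value 3.49 of F(2, 20).
powers = [simulated_power(2, 20, nu, 3.49) for nu in (0.0, 1.0, 3.0, 5.0)]
```

The first entry of `powers` estimates the size of the test (about 5%), and the remaining entries increase with ν.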
Alternative Interpretation
The sum of squared, constrained OLS residuals is:
\[
\ddot e'\ddot e = \hat e'\hat e + (R\hat\beta_T - r)'[R(X'X)^{-1}R']^{-1}(R\hat\beta_T - r),
\]
where the 2nd term on the RHS is the numerator of the F statistic. Letting ESS_c = ë'ë and ESS_u = ê'ê, we have
\[
\varphi = \frac{(\text{ESS}_c - \text{ESS}_u)/q}{\text{ESS}_u/(T - k)} .
\]
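The two forms of the F statistic can be compared numerically. The sketch below uses the standard constrained OLS estimator, β̈ = β̂ − (X'X)^{-1}R'[R(X'X)^{-1}R']^{-1}(Rβ̂ − r), which is stated here as an assumption because the surviving text does not print it. The data and the restrictions are hypothetical.

```python
import numpy as np

# Simulated data, for illustration only.
rng = np.random.default_rng(3)
T, k = 60, 4
X = np.column_stack([np.ones(T), rng.normal(size=(T, k - 1))])
y = X @ np.array([1.0, 0.5, 0.0, 0.0]) + rng.normal(size=T)

# Hypothesis: the last two coefficients are zero (q = 2).
R = np.array([[0.0, 0.0, 1.0, 0.0],
              [0.0, 0.0, 0.0, 1.0]])
r = np.zeros(2)
q = R.shape[0]

XtX_inv = np.linalg.inv(X.T @ X)
beta_u = XtX_inv @ X.T @ y                                  # unconstrained OLS
d = R @ beta_u - r
A = R @ XtX_inv @ R.T
beta_c = beta_u - XtX_inv @ R.T @ np.linalg.solve(A, d)     # constrained OLS

ess_u = np.sum((y - X @ beta_u) ** 2)
ess_c = np.sum((y - X @ beta_c) ** 2)

f_from_ess = ((ess_c - ess_u) / q) / (ess_u / (T - k))
f_from_quadratic_form = d @ np.linalg.solve(A, d) / (q * ess_u / (T - k))
```

The two values agree up to rounding, and `beta_c` satisfies the restrictions exactly.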
Confidence Regions
\[
P\{\, \underline{g}_\alpha \le \beta_{i,o} \le \bar g_\alpha \,\} = 1 - \alpha,
\]
= 1 − α.
The confidence region for a vector of parameters can be constructed by resorting to the F statistic.
For (β_{1,o} = b1, β_{2,o} = b2)', suppose T − k = 30 and α = 0.05. Then, F_{0.05}(2, 30) = 3.32, and
\[
P\left\{ \frac{1}{2\hat\sigma_T^2}
\begin{pmatrix} \hat\beta_{1,T} - b_1 \\ \hat\beta_{2,T} - b_2 \end{pmatrix}'
\begin{bmatrix} m^{11} & m^{12} \\ m^{21} & m^{22} \end{bmatrix}^{-1}
\begin{pmatrix} \hat\beta_{1,T} - b_1 \\ \hat\beta_{2,T} - b_2 \end{pmatrix}
\le 3.32 \right\}
\]
Example: Analysis of Suicide Rate
Part II: Estimation results with t and g
F tests for the joint significance of the coefficients of g and t: 0.03 (Model
2) and 0.13 (Model 4), which are insignificant even at 10% level.
Part III: Estimation results with t and t²
const  u_t  u_{t−1}  g_t  g_{t−1}  t  t²  R̄²/F
F tests for the joint significance of the coefficients of g and t: 5.45* (Model 3) and 4.29* (Model 5), which are significant at the 5% level.
Selected estimation results (with more precise estimates):
Near Multicollinearity
\[
\operatorname{var}(\hat\beta_{i,T}) = \sigma_o^2 [x_i'(I - P_i)x_i]^{-1}
= \frac{\sigma_o^2}{\sum_{t=1}^{T} (x_{ti} - \bar x_i)^2 \,(1 - R^2(i))},
\]
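The variance formula can be verified numerically. The sketch below assumes that P_i projects onto the remaining regressors (including the constant) and that R²(i) is the centered R² from regressing x_i on them. The data are simulated so that two regressors are nearly collinear.

```python
import numpy as np

# Simulated regressors, for illustration only.
rng = np.random.default_rng(5)
T = 200
x2 = rng.normal(size=T)
x3 = 0.95 * x2 + 0.1 * rng.normal(size=T)      # nearly collinear with x2
X = np.column_stack([np.ones(T), x2, x3])

i = 2                                           # column of interest (x3)
xi = X[:, i]
others = np.delete(X, i, axis=1)

# Auxiliary regression of x_i on the remaining regressors gives R^2(i).
fitted = others @ np.linalg.lstsq(others, xi, rcond=None)[0]
r2_i = 1.0 - np.sum((xi - fitted) ** 2) / np.sum((xi - xi.mean()) ** 2)

lhs = np.linalg.inv(X.T @ X)[i, i]              # var(beta_i) / sigma_o^2
rhs = 1.0 / (np.sum((xi - xi.mean()) ** 2) * (1.0 - r2_i))
```

As R²(i) approaches one, the factor 1/(1 − R²(i)) inflates the variance of the i-th coefficient estimate.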
How do we circumvent the problems from near multicollinearity?
Digression: Regression with Dummy Variables
Example: Let y_t denote the wage of the t-th individual and x_t the working experience (in years). Consider the following specification:
y_t = α0 + α1 D_t + β0 x_t + e_t,
We may also consider the specification with a dummy variable and its interaction with a regressor:
y_t = α0 + α1 D_t + β0 x_t + β1 (x_t D_t) + e_t.
Then, the slopes of the regressions for females and males are, respectively, β0 and β0 + β1. These two regressions coincide if α1 = 0 and β1 = 0. In this case, testing for no wage discrimination against females amounts to testing the joint hypothesis of α1 = 0 and β1 = 0.
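A sketch of how the dummy and its interaction enter the design matrix. It assumes the coding D_t = 1 for males and D_t = 0 for females, which is consistent with the slopes stated above; the data are simulated and the helper `fit_line` is a hypothetical name.

```python
import numpy as np

# Simulated wages, for illustration only.
rng = np.random.default_rng(7)
T = 80
D = (np.arange(T) % 2).astype(float)          # assumed coding: 1 = male, 0 = female
x = rng.uniform(0, 20, size=T)                # years of experience
y = 10 + 2 * D + 0.8 * x + 0.3 * x * D + rng.normal(size=T)

# Pooled specification with the dummy and its interaction with x.
X = np.column_stack([np.ones(T), D, x, x * D])
a0, a1, b0, b1 = np.linalg.lstsq(X, y, rcond=None)[0]

def fit_line(xg, yg):
    """Intercept and slope from a simple regression on one group."""
    Xg = np.column_stack([np.ones(len(xg)), xg])
    return np.linalg.lstsq(Xg, yg, rcond=None)[0]

int_f, slope_f = fit_line(x[D == 0], y[D == 0])
int_m, slope_m = fit_line(x[D == 1], y[D == 1])
```

The pooled estimates reproduce the two group regressions: (a0, b0) for the D = 0 group and (a0 + a1, b0 + b1) for the D = 1 group.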
Example: Consider two dummy variables:
D_{1,t} = 1 if high school is t's highest degree and D_{1,t} = 0 otherwise;
D_{2,t} = 1 if college or graduate school is t's highest degree and D_{2,t} = 0 otherwise.
The specification below in effect puts together 3 regressions:
Example: Analysis of Suicide Rate
The “before-change” regression has the intercept α0 and slope β0, and the “after-change” regression has the intercept α0 + δ and slope β0 + γ. Testing for a structural change at T* amounts to testing δ = 0 and γ = 0 (Chow test).
Alternatively, we can estimate the specification:
s_t = α0 (1 − D_t) + α1 D_t + β0 u_{t−1} (1 − D_t) + β1 u_{t−1} D_t + e_t,
Part I: Estimation results with a known change: Without t
Part II: Estimation results with a known change: With t
F test of the coefficients of D_t and tD_t being zero: 15.28** (’92); 15.83** (’93); 15.52** (’94); 14.51** (’95)
We do not know T*, the year of change, and hence tried estimating with different T*:
Limitation of the Classical Conditions
When var(y) ≠ σ_o² I_T
β̂_T is not the BLUE for β_o, and it is not the BUE for β_o under normality.
The estimator \widehat{var}(β̂_T) = σ̂_T² (X'X)^{-1} is a biased estimator for var(β̂_T). Consequently, the t and F tests do not have t and F distributions, even when y is normally distributed.
The GLS Estimator
which is still linear and unbiased. It would be the BLUE provided that G is chosen such that GΣ_o G' = σ_o² I_T.
Setting G = Σ_o^{-1/2}, where Σ_o^{-1/2} = CΛ^{-1/2}C' and C orthogonally diagonalizes Σ_o: C'Σ_o C = Λ, we have Σ_o^{-1/2} Σ_o Σ_o^{-1/2}' = I_T.
With y* = Σ_o^{-1/2} y and X* = Σ_o^{-1/2} X, we have the GLS estimator:
\[
\hat\beta_{GLS} = (X^{*\prime}X^{*})^{-1}X^{*\prime}y^{*} = (X'\Sigma_o^{-1}X)^{-1}X'\Sigma_o^{-1}y .
\]
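A minimal sketch of GLS as OLS on transformed data, using the eigendecomposition Σ_o = CΛC' to form Σ_o^{-1/2} = CΛ^{-1/2}C'. The function names `inv_sqrt_psd` and `gls` are hypothetical, and Σ_o is treated as known, which is rarely the case in practice.

```python
import numpy as np

def inv_sqrt_psd(Sigma):
    """Symmetric inverse square root via the eigendecomposition Sigma = C Lambda C'."""
    lam, C = np.linalg.eigh(Sigma)
    return C @ np.diag(lam ** -0.5) @ C.T

def gls(X, y, Sigma):
    """GLS computed as OLS on y* = Sigma^{-1/2} y and X* = Sigma^{-1/2} X."""
    G = inv_sqrt_psd(Sigma)
    X_star, y_star = G @ X, G @ y
    return np.linalg.solve(X_star.T @ X_star, X_star.T @ y_star)

# Simulated data with a hypothetical positive definite covariance matrix.
rng = np.random.default_rng(21)
T = 25
X = np.column_stack([np.ones(T), rng.normal(size=T)])
A = rng.normal(size=(T, T))
Sigma = A @ A.T + T * np.eye(T)
y = X @ np.array([1.0, 2.0]) + np.linalg.cholesky(Sigma) @ rng.normal(size=T)

beta_gls = gls(X, y, Sigma)

# Direct formula (X' Sigma^{-1} X)^{-1} X' Sigma^{-1} y for comparison.
Sigma_inv = np.linalg.inv(Sigma)
beta_direct = np.linalg.solve(X.T @ Sigma_inv @ X, X.T @ Sigma_inv @ y)
```

Both routes give the same estimate; the transformation view makes clear why the classical results carry over to the transformed specification.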
Stochastic Properties of the GLS Estimator
\[
\log L(\beta; \Sigma_o) = -\frac{T}{2}\log(2\pi) - \frac{1}{2}\log(\det(\Sigma_o)) - \frac{1}{2}(y - X\beta)'\Sigma_o^{-1}(y - X\beta),
\]
with the FOC: X'Σ_o^{-1}(y − Xβ) = 0. Thus, the GLS estimator is also the MLE under normality.
Under normality, the information matrix is
\[
E[X'\Sigma_o^{-1}(y - X\beta)(y - X\beta)'\Sigma_o^{-1}X]\Big|_{\beta = \beta_o} = X'\Sigma_o^{-1}X .
\]
Thus, the GLS estimator is the BUE for β_o, because its covariance matrix reaches the Cramér-Rao lower bound.
Under the null hypothesis Rβ_o = r, we have
The Feasible GLS Estimator
\[
\hat\beta_{FGLS} = (X'\hat\Sigma_T^{-1}X)^{-1}X'\hat\Sigma_T^{-1}y,
\]
where Σ̂_T is an estimator of Σ_o.
Further difficulties in FGLS estimation:
The number of parameters in Σ_o is T(T + 1)/2. Estimating Σ_o without some prior restrictions on Σ_o is practically infeasible.
Even when an estimator Σ̂_T is available under certain assumptions, β̂_FGLS is a complex function of the data y and X. As such, the finite-sample properties of the FGLS estimator are typically difficult to derive.
Some Remarks on the Feasible GLS Estimation
Tests for Heteroskedasticity
A simple form of Σ_o is
\[
\Sigma_o = \begin{bmatrix} \sigma_1^2 I_{T_1} & 0 \\ 0 & \sigma_2^2 I_{T_2} \end{bmatrix},
\]
More generally, for some constants c0, c1 > 0, σ_t² = c0 + c1 x_{tj}².
The Goldfeld-Quandt test:
(1) Rearrange the observations according to the values of x_j in descending order.
(2) Divide the rearranged data set into three groups with T1, Tm, and T2 observations, respectively.
(3) Drop the Tm observations in the middle group and perform separate OLS regressions using the data in the first and third groups.
(4) The statistic is the ratio of the variance estimates:
\[
\hat\sigma_{T_1}^2 / \hat\sigma_{T_2}^2 ,
\]
which is distributed as F(T1 − k, T2 − k) under the null hypothesis of homoskedasticity in the classical normal setting.
Some questions:
Can we estimate the model with all observations and then compute σ̂_{T1}² and σ̂_{T2}² based on T1 and T2 residuals?
If Σ_o is not diagonal, does the F test above still work?
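The four steps of the Goldfeld-Quandt test can be sketched as follows. The function name `goldfeld_quandt`, the choice of how many middle observations to drop, and the simulated data are all hypothetical; the degrees of freedom assume each sub-regression uses all k regressors.

```python
import numpy as np

def goldfeld_quandt(X, y, j, n_drop):
    """Goldfeld-Quandt statistic: ratio of variance estimates from the two outer groups."""
    T, k = X.shape
    order = np.argsort(-X[:, j])               # step (1): descending in x_j
    Xs, ys = X[order], y[order]
    T1 = (T - n_drop) // 2                     # step (2): group sizes T1, Tm = n_drop, T2
    T2 = T - n_drop - T1

    def sigma2_hat(Xg, yg):
        b = np.linalg.lstsq(Xg, yg, rcond=None)[0]
        e = yg - Xg @ b
        return e @ e / (len(yg) - k)

    s2_1 = sigma2_hat(Xs[:T1], ys[:T1])        # step (3): first group (largest x_j)
    s2_2 = sigma2_hat(Xs[-T2:], ys[-T2:])      # third group (smallest x_j)
    return s2_1 / s2_2, (T1 - k, T2 - k)       # step (4): statistic and F degrees of freedom

# Simulated heteroskedastic data: sigma_t^2 = c0 + c1 * x_t^2 with c0 = c1 = 1.
rng = np.random.default_rng(8)
T = 90
x = rng.uniform(1.0, 10.0, size=T)
X = np.column_stack([np.ones(T), x])
sigma_t = np.sqrt(1.0 + 1.0 * x ** 2)
y = 2.0 + 0.5 * x + sigma_t * rng.normal(size=T)

gq_stat, dof = goldfeld_quandt(X, y, j=1, n_drop=30)
```

A value of `gq_stat` well above one, judged against the F distribution with `dof` degrees of freedom, points to a variance that increases with x_j.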
GLS and FGLS Estimation
Clearly, var(Σ_o^{-1/2} y) = I_T. The GLS estimator is:
\[
\hat\beta_{GLS} = \left[ \frac{X_1'X_1}{\sigma_1^2} + \frac{X_2'X_2}{\sigma_2^2} \right]^{-1}
\left[ \frac{X_1'y_1}{\sigma_1^2} + \frac{X_2'y_2}{\sigma_2^2} \right].
\]
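With σ̂_{T1}² and σ̂_{T2}² from separate regressions, an estimator of Σ_o is
\[
\hat\Sigma = \begin{bmatrix} \hat\sigma_{T_1}^2 I_{T_1} & 0 \\ 0 & \hat\sigma_{T_2}^2 I_{T_2} \end{bmatrix}.
\]
If σ_t² = c x_{tj}² for some constant c > 0, dividing each observation by x_{tj} gives the transformed specification
\[
\frac{y_t}{x_{tj}} = \beta_j + \beta_1 \frac{1}{x_{tj}} + \cdots + \beta_{j-1}\frac{x_{t,j-1}}{x_{tj}} + \beta_{j+1}\frac{x_{t,j+1}}{x_{tj}} + \cdots + \beta_k \frac{x_{tk}}{x_{tj}} + \frac{e_t}{x_{tj}},
\]
where var(y_t/x_{tj}) = c := σ_o². Here, the GLS estimator is readily computed as the OLS estimator for the transformed specification.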
Discussion and Remarks
Serial Correlation
When time series data yt are correlated over time, they are said to
exhibit serial correlation. For cross-section data, the correlations of yt
are known as spatial correlation.
A general form of Σ_o is that its diagonal elements (variances of y_t) are a constant σ_o², and the off-diagonal elements (cov(y_t, y_{t−i})) are non-zero.
In the time series context, cov(y_t, y_{t−i}) are known as the autocovariances of y_t, and the autocorrelations of y_t are
\[
\operatorname{corr}(y_t, y_{t-i}) = \frac{\operatorname{cov}(y_t, y_{t-i})}{\sigma_o^2}, \quad i = 1, 2, \ldots
\]
Simple Model: AR(1) Disturbances
By recursive substitution,
\[
\epsilon_t = \sum_{i=0}^{\infty} \psi_1^i u_{t-i},
\]
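The variance-covariance matrix var(y) is thus
\[
\Sigma_o = \sigma_o^2
\begin{bmatrix}
1 & \psi_1 & \psi_1^2 & \cdots & \psi_1^{T-1} \\
\psi_1 & 1 & \psi_1 & \cdots & \psi_1^{T-2} \\
\psi_1^2 & \psi_1 & 1 & \cdots & \psi_1^{T-3} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
\psi_1^{T-1} & \psi_1^{T-2} & \psi_1^{T-3} & \cdots & 1
\end{bmatrix},
\]
with σ_o² = σ_u²/(1 − ψ1²). Note that all off-diagonal elements of this matrix are non-zero, but there are only two unknown parameters.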
A transformation matrix for GLS estimation is the following Σ_o^{-1/2}:
\[
\Sigma_o^{-1/2} = \frac{1}{\sigma_o}
\begin{bmatrix}
1 & 0 & 0 & \cdots & 0 & 0 \\
-\frac{\psi_1}{\sqrt{1-\psi_1^2}} & \frac{1}{\sqrt{1-\psi_1^2}} & 0 & \cdots & 0 & 0 \\
0 & -\frac{\psi_1}{\sqrt{1-\psi_1^2}} & \frac{1}{\sqrt{1-\psi_1^2}} & \cdots & 0 & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\
0 & 0 & 0 & \cdots & \frac{1}{\sqrt{1-\psi_1^2}} & 0 \\
0 & 0 & 0 & \cdots & -\frac{\psi_1}{\sqrt{1-\psi_1^2}} & \frac{1}{\sqrt{1-\psi_1^2}}
\end{bmatrix}.
\]
Any matrix proportional to Σ_o^{-1/2} can also serve as a legitimate transformation matrix for GLS estimation.
The Cochrane-Orcutt Transformation is based on:
\[
V_o^{-1/2} = \sigma_o \sqrt{1 - \psi_1^2}\; \Sigma_o^{-1/2} =
\begin{bmatrix}
\sqrt{1 - \psi_1^2} & 0 & 0 & \cdots & 0 & 0 \\
-\psi_1 & 1 & 0 & \cdots & 0 & 0 \\
0 & -\psi_1 & 1 & \cdots & 0 & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\
0 & 0 & 0 & \cdots & 1 & 0 \\
0 & 0 & 0 & \cdots & -\psi_1 & 1
\end{bmatrix},
\]
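The AR(1) covariance matrix and the Cochrane-Orcutt transformation can be built and checked in a few lines. The function names and the parameter values are hypothetical; the check relies on the fact that the transformed disturbances are the innovations u_t, so the transformed covariance matrix should equal σ_u² I_T.

```python
import numpy as np

def ar1_covariance(T, psi, sigma_u2):
    """Sigma_o for AR(1) disturbances: sigma_o^2 * psi^{|i-j|}."""
    sigma_o2 = sigma_u2 / (1.0 - psi ** 2)
    idx = np.arange(T)
    return sigma_o2 * psi ** np.abs(idx[:, None] - idx[None, :])

def cochrane_orcutt_matrix(T, psi):
    """V_o^{-1/2}: first row scaled by sqrt(1 - psi^2), then quasi-differences."""
    V = np.eye(T)
    V[0, 0] = np.sqrt(1.0 - psi ** 2)
    V[np.arange(1, T), np.arange(0, T - 1)] = -psi
    return V

T, psi, sigma_u2 = 6, 0.6, 1.5                 # hypothetical values
Sigma_o = ar1_covariance(T, psi, sigma_u2)
V = cochrane_orcutt_matrix(T, psi)
transformed_cov = V @ Sigma_o @ V.T            # should equal sigma_u2 * I_T
```

In practice ψ1 is unknown and must be replaced by an estimate, which turns the procedure into a feasible GLS method.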
Model Extensions
\[
\epsilon_t = \psi_1 \epsilon_{t-1} + \cdots + \psi_p \epsilon_{t-p} + u_t,
\]
Tests for AR(1) Disturbances
Under AR(1), the null hypothesis is ψ1 = 0. A natural estimator of ψ1 is the OLS estimator of regressing ê_t on ê_{t−1}:
\[
\hat\psi_T = \frac{\sum_{t=2}^{T} \hat e_t \hat e_{t-1}}{\sum_{t=2}^{T} \hat e_{t-1}^2} .
\]
For 0 < ψ̂_T ≤ 1 (−1 ≤ ψ̂_T < 0), 0 ≤ d < 2 (2 < d ≤ 4), there may be positive (negative) serial correlation. Hence, d essentially checks whether ψ̂_T is “close” to zero (i.e., d is “close” to 2).
Difficulty: The exact null distribution of d holds only under the classical conditions [A1] and [A3] and depends on the data matrix X. Thus, the critical values for d cannot be tabulated, and this test is not pivotal.
The null distribution of d lies between a lower bound (d_L) and an upper bound (d_U):
\[
d^{*}_{L,\alpha} < d^{*}_{\alpha} < d^{*}_{U,\alpha} .
\]
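The link between ψ̂_T and d can be illustrated on simulated AR(1) disturbances. The definition of d is not printed in the surviving text, so the sketch assumes the standard Durbin-Watson form, the sum of squared first differences of the residuals over the sum of squared residuals, which is approximately 2(1 − ψ̂_T). Function names and parameter values are hypothetical.

```python
import numpy as np

def ar1_coefficient(e_hat):
    """OLS estimate from regressing e_t on e_{t-1} (no intercept)."""
    return np.sum(e_hat[1:] * e_hat[:-1]) / np.sum(e_hat[:-1] ** 2)

def durbin_watson(e_hat):
    """Standard Durbin-Watson statistic (assumed definition, see the lead-in)."""
    return np.sum(np.diff(e_hat) ** 2) / np.sum(e_hat ** 2)

# Simulated regression with AR(1) disturbances, for illustration only.
rng = np.random.default_rng(4)
T, psi = 500, 0.7
u = rng.normal(size=T)
eps = np.zeros(T)
for t in range(1, T):
    eps[t] = psi * eps[t - 1] + u[t]

X = np.column_stack([np.ones(T), rng.normal(size=T)])
y = X @ np.array([1.0, 2.0]) + eps
e_hat = y - X @ np.linalg.lstsq(X, y, rcond=None)[0]

psi_hat = ar1_coefficient(e_hat)
d = durbin_watson(e_hat)
```

With positively correlated disturbances, `psi_hat` is positive and `d` falls below 2.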
Durbin-Watson test:
(1) Reject the null if d < d*_{L,α} (d > 4 − d*_{L,α}).
(2) Do not reject the null if d > d*_{U,α} (d < 4 − d*_{U,α}).
(3) Test is inconclusive if d*_{L,α} < d < d*_{U,α} (4 − d*_{L,α} > d > 4 − d*_{U,α}).
For the specification y_t = β1 + β2 x_{t2} + · · · + βk x_{tk} + γ y_{t−1} + e_t, Durbin’s h statistic is
\[
h = \hat\gamma_T \sqrt{\frac{T}{1 - T\,\widehat{\operatorname{var}}(\hat\gamma_T)}} \approx N(0, 1),
\]
FGLS Estimation
Application: Linear Probability Model
An FGLS estimator may be obtained using
Application: Seemingly Unrelated Regressions
y_i = X_i β_i + e_i, i = 1, 2, ..., N.
where y is TN × 1, X is TN × \sum_{i=1}^{N} k_i, and β is \sum_{i=1}^{N} k_i × 1.
Suppose y_{it} and y_{jt} are contemporaneously correlated, but y_{it} and y_{jτ} are serially uncorrelated, i.e., cov(y_i, y_j) = σ_{ij} I_T.
For this system, Σ_o = S_o ⊗ I_T with
\[
S_o = \begin{bmatrix}
\sigma_1^2 & \sigma_{12} & \cdots & \sigma_{1N} \\
\sigma_{21} & \sigma_2^2 & \cdots & \sigma_{2N} \\
\vdots & \vdots & \ddots & \vdots \\
\sigma_{N1} & \sigma_{N2} & \cdots & \sigma_N^2
\end{bmatrix};
\]
that is, the SUR system has both serial and spatial correlations.
As Σ_o^{-1} = S_o^{-1} ⊗ I_T, then
\[
\hat\beta_{GLS} = [X'(S_o^{-1} \otimes I_T)X]^{-1} X'(S_o^{-1} \otimes I_T)y .
\]
Remarks:
When σ_{ij} = 0 for i ≠ j, S_o is diagonal, and so is Σ_o. Then, the GLS estimator for each β_i reduces to the corresponding OLS estimator, so that joint estimation of N equations is not necessary.
If all equations in the system have the same regressors, i.e., X_i = X_0 (say) and X = I_N ⊗ X_0, the GLS estimator is also the same as the OLS estimator.
More generally, there would not be much efficiency gain for GLS estimation if y_i and y_j are less correlated and/or X_i and X_j are highly correlated.
The FGLS estimator can be computed as
\[
\hat\beta_{FGLS} = [X'(\hat S_{TN}^{-1} \otimes I_T)X]^{-1} X'(\hat S_{TN}^{-1} \otimes I_T)y .
\]
Ŝ_{TN} is an N × N matrix:
\[
\hat S_{TN} = \frac{1}{T}
\begin{bmatrix} \hat e_1' \\ \hat e_2' \\ \vdots \\ \hat e_N' \end{bmatrix}
\begin{bmatrix} \hat e_1 & \hat e_2 & \cdots & \hat e_N \end{bmatrix},
\]
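A sketch of FGLS for a SUR system, assuming the ê_i are the residuals from equation-by-equation OLS. The function names `block_diag` and `sur_fgls` and the two-equation data are hypothetical. The example uses identical regressors in both equations, a case in which the estimate should coincide with OLS.

```python
import numpy as np

def block_diag(blocks):
    """Stack the per-equation regressor matrices X_i into a block-diagonal X."""
    rows = sum(b.shape[0] for b in blocks)
    cols = sum(b.shape[1] for b in blocks)
    out = np.zeros((rows, cols))
    r = c = 0
    for b in blocks:
        out[r:r + b.shape[0], c:c + b.shape[1]] = b
        r += b.shape[0]
        c += b.shape[1]
    return out

def sur_fgls(X_list, y_list):
    """FGLS for a SUR system, with S estimated from equation-by-equation OLS residuals."""
    T = X_list[0].shape[0]
    resid = []
    for Xi, yi in zip(X_list, y_list):
        bi = np.linalg.lstsq(Xi, yi, rcond=None)[0]
        resid.append(yi - Xi @ bi)
    E = np.column_stack(resid)                 # T x N matrix [e_1 ... e_N]
    S_hat = E.T @ E / T                        # N x N estimate of S_o
    W = np.kron(np.linalg.inv(S_hat), np.eye(T))
    X = block_diag(X_list)
    y = np.concatenate(y_list)
    return np.linalg.solve(X.T @ W @ X, X.T @ W @ y)

# Simulated two-equation system with common regressors, for illustration only.
rng = np.random.default_rng(6)
T = 30
X0 = np.column_stack([np.ones(T), rng.normal(size=T)])
common = rng.normal(size=T)                    # shared shock: cross-equation correlation
y1 = X0 @ np.array([1.0, 2.0]) + common + rng.normal(size=T)
y2 = X0 @ np.array([-1.0, 0.5]) + common + rng.normal(size=T)

beta_fgls = sur_fgls([X0, X0], [y1, y2])
beta_ols = np.concatenate([np.linalg.lstsq(X0, yi, rcond=None)[0] for yi in (y1, y2)])
```

Forming the TN × TN weight matrix with `np.kron` is fine for small systems; larger problems would exploit the Kronecker structure instead of building it explicitly.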