Lecture 2

Autoregressive Processes

Dennis Sun
Stats 253

June 25, 2014

Outline of Lecture

1 Last Class
2 Bootstrap Standard Errors
3 Maximum Likelihood Estimation
4 Spatial Autoregression
    Case Study
    Simultaneous vs. Conditional Autoregression
    Non-Gaussian Data
5 Wrapping Up

Last Class

Motivation for AR processes

The linear regression model

$$y = X\beta + \epsilon, \quad \epsilon \sim N(0, \sigma^2 I)$$

assumes the observations $y_t$ are independent.

We can introduce dependence by adding a lag term:

$$y_t = x_t^T \beta + \phi y_{t-1} + \epsilon_t$$

Least Squares Estimation

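We can still estimate $\beta$ and $\phi$ by least squares:

$$\begin{bmatrix} y_2 \\ \vdots \\ y_n \end{bmatrix} = \begin{bmatrix} X_{2:n} & \begin{matrix} y_1 \\ \vdots \\ y_{n-1} \end{matrix} \end{bmatrix} \begin{bmatrix} \beta \\ \phi \end{bmatrix} + \epsilon$$

Advantages: consistent estimates of $\beta$ and $\phi$

Disadvantages: discards 1 observation, standard errors are incorrect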
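
The lagged regression above can be sketched in a few lines of R. This is a minimal illustration on simulated data, not code from the lecture: the series length, the true values of beta and phi, the single covariate, and the start-up value for the first observation are all assumptions made for the example.

```r
set.seed(1)                       # arbitrary seed, for reproducibility only
n    <- 500                       # series length (illustrative)
beta <- 2                         # true coefficient on the covariate (illustrative)
phi  <- 0.6                       # true lag coefficient (illustrative)
x    <- rnorm(n)                  # one made-up covariate
eps  <- rnorm(n)                  # noise with sigma = 1

y <- numeric(n)
y[1] <- beta * x[1] + eps[1]      # start-up value: no lag is available at t = 1
for (t in 2:n) {
  y[t] <- beta * x[t] + phi * y[t - 1] + eps[t]
}

# Regress y_2..y_n on x_2..x_n and the lagged values y_1..y_{n-1}.
# The first observation is discarded because it has no lag.
fit <- lm(y[-1] ~ 0 + x[-1] + y[-n])
est <- coef(fit)
names(est) <- c("beta", "phi")
print(est)
```

The point estimates should land near the true values; the standard errors that `summary(fit)` would report are the ones the slide warns about.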

Simulation Study
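Simulated many instances of a length-1000 random walk

$$y_t = \phi y_{t-1} + \epsilon_t, \quad \phi = 1$$

Estimate $\phi$ by autoregression.

[Figure: histogram of the estimates $\hat\phi$; horizontal axis from 0.90 to 1.10]

$\mathrm{Var}(\hat\phi) = .003$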

[Figure: two panels labeled "Good" and "Bad"]

Million dollar question

How do we obtain correct standard errors?

Bootstrap Standard Errors

The (Parametric) Bootstrap

In the simulation, we knew $\phi$ and so were able to simulate many instances of

$$y_t = \phi y_{t-1} + \epsilon_t$$

to estimate $\mathrm{Var}(\hat\phi)$.

In practice, we do not know $\phi$ (that's why we're estimating it!).

Idea: We have a (pretty good) estimate $\hat\phi$ of $\phi$. Why not simulate many instances of

$$y_t = \hat\phi y_{t-1} + \epsilon_t$$

to estimate $\mathrm{Var}(\hat\phi)$?

This is the (parametric) bootstrap.
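
A minimal sketch of the parametric bootstrap for an AR(1) series follows. The helper names `simulate_ar1` and `estimate_phi`, the "observed" series (itself simulated here as a stand-in), the start-up value, and the number of bootstrap replicates are all assumptions of this example rather than anything specified in the lecture.

```r
set.seed(2)                                  # arbitrary seed

# Hypothetical helper: simulate y_t = phi * y_{t-1} + eps_t
simulate_ar1 <- function(n, phi, sigma = 1) {
  y <- numeric(n)
  y[1] <- rnorm(1, sd = sigma)               # start-up value (an assumption)
  for (t in 2:n) y[t] <- phi * y[t - 1] + rnorm(1, sd = sigma)
  y
}

# Hypothetical helper: least squares estimate of phi (no intercept)
estimate_phi <- function(y) {
  n <- length(y)
  sum(y[-1] * y[-n]) / sum(y[-n]^2)
}

y_obs   <- simulate_ar1(200, phi = 0.5)      # stand-in for real data
phi_hat <- estimate_phi(y_obs)
resid   <- y_obs[-1] - phi_hat * y_obs[-length(y_obs)]
sigma_hat <- sqrt(mean(resid^2))             # plug-in noise scale

# Simulate from the fitted model and re-estimate phi each time
B <- 500
phi_boot <- replicate(
  B, estimate_phi(simulate_ar1(length(y_obs), phi_hat, sigma_hat))
)
se_boot <- sd(phi_boot)                      # bootstrap standard error
print(c(phi_hat = phi_hat, se_boot = se_boot))
```

The only change from the earlier simulation study is that the unknown $\phi$ (and noise scale) are replaced by their estimates.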

Maximum Likelihood Estimation

Review of the MLE

Another general approach for estimating parameters is maximum likelihood estimation.

The likelihood is the probability distribution, viewed as a function of $\theta$:

$$L(\theta) = p(y_1, ..., y_n \mid \theta)$$

The MLE estimates $\theta$ by choosing the $\theta$ that maximizes $L$ for the observed data:

$$\hat\theta_{mle} = \arg\max_\theta \log L(\theta)$$

MLE of an AR process
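We need to calculate $p(y_1, ..., y_n \mid \phi)$.

$$p(y_1, ..., y_n) = p(y_1) \, p(y_2 \mid y_1) \, p(y_3 \mid y_1, y_2) \cdots p(y_n \mid y_1, ..., y_{n-1})$$

Recall that for an AR process, we have $y_t = \phi y_{t-1} + \epsilon_t$. So

$$p(y_t \mid y_1, ..., y_{t-1}) = p(y_t \mid y_{t-1}) \quad \text{for } t = 2, ..., n$$

is the density of a $N(\phi y_{t-1}, \sigma^2)$:

$$p(y_t \mid y_{t-1}) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left\{ -\frac{1}{2\sigma^2} (y_t - \phi y_{t-1})^2 \right\}$$

Putting it all together, we have:

$$p(y_1, ..., y_n) = p(y_1) \left( \frac{1}{\sigma\sqrt{2\pi}} \right)^{n-1} \exp\left\{ -\frac{1}{2\sigma^2} \sum_{t=2}^{n} (y_t - \phi y_{t-1})^2 \right\}$$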

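The log-likelihood is:

$$\log p(y_1) - (n - 1) \log(\sigma\sqrt{2\pi}) - \frac{1}{2\sigma^2} \sum_{t=2}^{n} (y_t - \phi y_{t-1})^2$$

and we maximize this over $\phi$.

How does this compare with regression (least squares)?
In least squares, we minimize

$$\sum_{t=2}^{n} (y_t - \phi y_{t-1})^2.$$

Treating $p(y_1)$ as fixed, only the last term of the log-likelihood involves $\phi$, so maximizing it is the same as minimizing the sum of squares.

Maximum likelihood and least squares are identical for AR time series!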
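
The equivalence can be checked numerically. The sketch below simulates one AR(1) series (the length, true coefficient, and seed are arbitrary choices for illustration), computes the least squares estimate in closed form, and separately maximizes the log-likelihood over $\phi$ with `optimize()`. The term $\log p(y_1)$ is dropped and $\sigma$ is held at 1 here; both are assumptions of the sketch, and neither affects the maximizing $\phi$.

```r
set.seed(3)                                   # arbitrary seed
n <- 300
y <- numeric(n)                               # y[1] = 0 as the start-up value
for (t in 2:n) y[t] <- 0.7 * y[t - 1] + rnorm(1)

# Log-likelihood from the slide, without the log p(y_1) term
loglik <- function(phi, sigma = 1) {
  r <- y[-1] - phi * y[-n]                    # y_t - phi * y_{t-1}, t = 2..n
  -(n - 1) * log(sigma * sqrt(2 * pi)) - sum(r^2) / (2 * sigma^2)
}

# Least squares estimate in closed form
phi_ls <- sum(y[-1] * y[-n]) / sum(y[-n]^2)

# Maximum likelihood estimate by one-dimensional numerical search
phi_mle <- optimize(loglik, interval = c(-2, 2),
                    maximum = TRUE, tol = 1e-8)$maximum

print(c(least_squares = phi_ls, mle = phi_mle))
```

The two numbers should agree up to the tolerance of the numerical search.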

Maximum likelihood is another recipe for coming up with a good estimator.

The MLE for an AR process turns out to be the same as the least squares estimator:

$$\hat\phi = \hat\phi_{mle}$$

The parametric bootstrap is a general way to get an estimate of the standard error of an estimator.

Spatial Autoregression

Graphical Representation of AR(1) process

AR(1) process: $y_t = \phi y_{t-1} + \epsilon_t$

An edge between $y_i$ and $y_j$ indicates that $y_i$ and $y_j$ are dependent, conditional on the rest.

Spatial Autoregression Case Study

North Carolina SIDS Data

Sudden infant death syndrome (SIDS): unexplained infant deaths.
Is it genetic? environmental? random?
Number of SIDS cases $S_i$, $i = 1, ..., 100$, collected for 100 North Carolina counties.

Freeman-Tukey transformed data:

$$y_i = (1000 S_i / n_i)^{1/2} + (1000 (S_i + 1) / n_i)^{1/2}$$

where $n_i$ is the number of live births in county $i$.
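
As a small worked example, the transformation can be written as an R function. The function name `freeman_tukey` and the counts below are made up for illustration, and the sketch assumes $n_i$ is the number of births in county $i$.

```r
# Freeman-Tukey transform of a count `cases` out of `births`
# (hypothetical helper; births plays the role of n_i)
freeman_tukey <- function(cases, births) {
  sqrt(1000 * cases / births) + sqrt(1000 * (cases + 1) / births)
}

S        <- c(0, 1, 5)            # made-up SIDS counts
n_births <- c(1000, 1000, 2500)   # made-up birth counts
ft <- freeman_tukey(S, n_births)
print(ft)
```

Note that a county with zero cases still gets a positive transformed value, because of the $S_i + 1$ term.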
An Autoregressive Model
Let's try to model this as a spatial process.

Let $N(i)$ denote the neighbors of county $i$. Consider the model:

$$y_i - \mu_i = \phi \frac{1}{|N(i)|} \sum_{j \in N(i)} (y_j - \mu_j) + \epsilon_i,$$

where e.g., $\mu_i = x_i^T \beta$. What happens if $\phi = 0$?

Estimating Parameters

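$$y_i - \mu_i = \phi \frac{1}{|N(i)|} \sum_{j \in N(i)} (y_j - \mu_j) + \epsilon_i$$

Should we estimate parameters by least squares? No! It's inconsistent. (Whittle 1954)

Let's try maximum likelihood.

First, write in vector notation as

$$y - \mu = \phi W (y - \mu) + \epsilon$$

where $W_{ij} = 1/|N(i)|$ if $j \in N(i)$ and $0$ otherwise. Then

$$(I - \phi W)(y - \mu) = \epsilon$$

so $y = \mu + (I - \phi W)^{-1} \epsilon \sim N(\mu, (I - \phi W)^{-1} \sigma^2 I (I - \phi W^T)^{-1})$.

Now we can write down the likelihood and maximize it.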
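
To make "write down the likelihood and maximize it" concrete, here is a minimal sketch in base R. It is not how any particular package fits the model. The neighbor structure (sites on a ring with two neighbors each), the constant mean, the true parameter values, and the choice to profile out the mean and variance are all assumptions of this example. The log-likelihood used is $\log|\det(I - \phi W)| - \frac{n}{2}\log\sigma^2 - \frac{1}{2\sigma^2}\|(I - \phi W)(y - \mu)\|^2$ up to a constant, which follows from the change of variables $\epsilon = (I - \phi W)(y - \mu)$.

```r
set.seed(4)                                    # arbitrary seed
n <- 100

# Hypothetical neighbor structure: a ring, each site has two neighbors
W <- matrix(0, n, n)
for (i in 1:n) {
  left  <- if (i == 1) n else i - 1
  right <- if (i == n) 1 else i + 1
  W[i, c(left, right)] <- 1 / 2                # 1 / |N(i)|
}

phi_true <- 0.5                                # illustrative values
mu_true  <- 3
A_true <- diag(n) - phi_true * W
y <- mu_true + solve(A_true, rnorm(n))         # y = mu + (I - phi W)^{-1} eps

# Log-likelihood in phi, with mu and sigma^2 replaced by their
# maximizing values for that phi (up to an additive constant)
profile_loglik <- function(phi) {
  A  <- diag(n) - phi * W
  Ay <- A %*% y
  A1 <- A %*% rep(1, n)
  mu_hat     <- sum(A1 * Ay) / sum(A1^2)       # least squares for mu given phi
  sigma2_hat <- mean((Ay - A1 * mu_hat)^2)     # ML variance given phi
  logdet <- as.numeric(determinant(A, logarithm = TRUE)$modulus)
  logdet - n / 2 * log(sigma2_hat)
}

phi_hat <- optimize(profile_loglik, interval = c(-0.99, 0.99),
                    maximum = TRUE)$maximum
print(phi_hat)
```

The log-determinant term is what separates this from least squares: dropping it would amount to regressing each $y_i$ on its neighbor average.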

Data Analysis

R Code:
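# spautolm() lived in spdep when these slides were written; newer releases provide it in spatialreg.
model <- spautolm(ft.SID74 ~ 1, data = nc,
                  listw = nb2listw(neighbors, zero.policy = TRUE))
summary(model)  # coefficient table and fit statistics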
R Output:
            Estimate Std. Error z value  Pr(>|z|)
(Intercept)   2.8597     0.1445  19.791 < 2.2e-16

Lambda: 0.38891 LR test value: 11.286 p-value: 0.00078095

Numerical Hessian standard error of lambda: 0.10761

Log likelihood: -133.3255

ML residual variance (sigma squared): 0.80589, (sigma: 0.89771)
Number of observations: 100
Number of parameters estimated: 3
AIC: 272.65

Data Analysis
R Code:
model <- spautolm(ft.SID74 ~ ft.NWBIR74, data = nc,
                  listw = nb2listw(neighbors, zero.policy = TRUE))
summary(model)
R Output:
             Estimate Std. Error z value  Pr(>|z|)
(Intercept) 1.5444201  0.2161106  7.1464 8.906e-13
ft.NWBIR74  0.0416524  0.0060981  6.8303 8.471e-12

Lambda: 0.083728 LR test value: 0.38241 p-value: 0.53632

Numerical Hessian standard error of lambda: 0.13428

Log likelihood: -117.7629

ML residual variance (sigma squared): 0.616, (sigma: 0.78486)
Number of observations: 100
Number of parameters estimated: 4
AIC: 243.53
Spatial Autoregression Simultaneous vs. Conditional Autoregression

Different Specifications?

Previously, we considered the simultaneous specification:

$$y_i - \mu_i = \phi \frac{1}{|N(i)|} \sum_{j \in N(i)} (y_j - \mu_j) + \epsilon_i$$

We might also consider the conditional specification:

$$y_i \mid (y_j, j \in N(i)) \sim N\left( \mu_i + \phi \frac{1}{|N(i)|} \sum_{j \in N(i)} (y_j - \mu_j), \; \sigma^2 \right)$$

Are the two specifications equivalent?
Is the conditional specification even well defined?

Difficulties with the Conditional Specification

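Recall that with temporal data, we had the conditional specification

$$y_t \mid (y_1, ..., y_{t-1}) \sim N(\mu_t + \phi y_{t-1}, \sigma^2)$$

We were able to write the joint distribution in terms of these conditionals using:

$$p(y_1, ..., y_n) = p(y_1) \, p(y_2 \mid y_1) \cdots p(y_n \mid y_1, ..., y_{n-1})$$

This formula doesn't help us here: spatial data have no natural ordering, and each conditional involves all the neighbors of $y_i$, not just the observations that come before it.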

In general, given a set of conditionals $p(y_i \mid y_j, j \neq i)$, there does not necessarily exist a joint distribution $p(y_1, ..., y_n)$ with those conditionals.

However, in this case, we can show that

$$y \sim N(\mu, (I - \phi W)^{-1} \sigma^2 I)$$

Data Analysis
R Code:
model <- spautolm(ft.SID74 ~ ft.NWBIR74, data = nc,
                  listw = nb2listw(neighbors, zero.policy = TRUE), family = "CAR")
summary(model)
R Output:
             Estimate Std. Error z value  Pr(>|z|)
(Intercept) 1.5446517  0.2156409  7.1631 7.889e-13
ft.NWBIR74  0.0416498  0.0060856  6.8440 7.704e-12

Lambda: 0.078486 LR test value: 0.3631 p-value: 0.54679

Numerical Hessian standard error of lambda: 0.12741

Log likelihood: -117.7726

ML residual variance (sigma squared): 0.6151, (sigma: 0.78428)
Number of observations: 100
Number of parameters estimated: 4
AIC: 243.55
Spatial Autoregression Non-Gaussian Data

What to do about non-Gaussian data?

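What if instead of

$$y_i \mid (y_j, j \in N(i)) \sim N\left( \mu_i + \phi \frac{1}{|N(i)|} \sum_{j \in N(i)} (y_j - \mu_j), \; \sigma^2 \right)$$

we had

$$y_i \mid (y_j, j \in N(i)) \sim \mathrm{Pois}\left( \mu_i + \phi \frac{1}{|N(i)|} \sum_{j \in N(i)} (y_j - \mu_j) \right)?$$

Impossible to write down joint distribution.
Challenging to simulate.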

Some Preliminary Solutions

Simulation: Gibbs sampler

Start with an initial $(y_1, ..., y_n)$, simulate sequentially:

$$y_1 \mid y_j, j \neq 1$$
$$y_2 \mid y_j, j \neq 2$$
$$\vdots$$
$$y_n \mid y_j, j \neq n$$

and repeat.

In the long run, the samples $y = (y_1, ..., y_n)$ will be samples from the joint distribution.

Estimation: coding and pseudo-likelihood
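
The mechanics of a Gibbs sweep for the Poisson conditional specification can be sketched as follows. Everything concrete here is an assumption made for illustration: the sites sit on a ring with two neighbors each, all sites share one mean parameter, and $\phi$ is restricted to $0 \le \phi < 1$ so that the conditional rate $(1 - \phi)\mu + \phi \cdot (\text{neighbor average})$ stays positive. The sketch shows only how the updates are carried out; it does not settle whether a joint distribution with these conditionals exists.

```r
set.seed(5)                      # arbitrary seed
n   <- 30                        # number of sites on a ring (hypothetical layout)
mu  <- 4                         # common mean parameter (illustrative)
phi <- 0.5                       # dependence parameter, assumed in [0, 1)
left  <- c(n, 1:(n - 1))         # index of each site's left neighbor
right <- c(2:n, 1)               # index of each site's right neighbor

y <- rpois(n, mu)                # initial configuration
n_sweeps <- 200
draws <- matrix(NA_integer_, n_sweeps, n)

for (s in 1:n_sweeps) {
  for (i in 1:n) {
    # Update y_i given the current values of all other sites;
    # only its two neighbors enter the conditional distribution.
    nbr_mean <- (y[left[i]] + y[right[i]]) / 2
    rate <- mu + phi * (nbr_mean - mu)
    y[i] <- rpois(1, rate)
  }
  draws[s, ] <- y                # store the configuration after each sweep
}
```

In practice the early sweeps would be discarded, since they still depend on the starting configuration.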

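Coding: color the nodes black and white so that no two black nodes are neighbors (as on a checkerboard).

Consider maximizing the pseudo-likelihood $L(\theta) = p(y_{black} \mid y_{white})$.

This is easy because the $y_i$'s at the black nodes are independent, given the $y_i$'s at the white nodes.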
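
A minimal sketch of the coding idea for the Poisson conditional model is given below. The setup is assumed for illustration only: sites on a ring with an even number of nodes, odd-numbered sites taken as "black" (so all their neighbors are even-numbered, "white"), a common mean treated as known, and data generated by a few Gibbs sweeps as a stand-in for observations. Only $\phi$ is estimated here to keep the example short.

```r
set.seed(6)                                   # arbitrary seed
n <- 200; mu <- 4; phi_true <- 0.5            # ring with an even number of sites
left  <- c(n, 1:(n - 1))
right <- c(2:n, 1)

# Stand-in data: run some Gibbs sweeps of the conditional Poisson model
y <- rpois(n, mu)
for (s in 1:100) for (i in 1:n) {
  y[i] <- rpois(1, mu + phi_true * ((y[left[i]] + y[right[i]]) / 2 - mu))
}

black <- seq(1, n, by = 2)                    # odd sites; neighbors are all even

# Log of p(y_black | y_white): a sum of independent Poisson log-densities
coding_loglik <- function(phi) {
  nbr_mean <- (y[left[black]] + y[right[black]]) / 2
  sum(dpois(y[black], mu + phi * (nbr_mean - mu), log = TRUE))
}

phi_hat <- optimize(coding_loglik, interval = c(0, 0.99),
                    maximum = TRUE)$maximum
print(phi_hat)
```

The price of this simplicity is that only half the sites contribute as responses; swapping the roles of black and white gives a second estimate from the same data.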
Wrapping Up

What We've Learned

The (parametric) bootstrap can be used to get valid standard errors.

The MLE is a general way of coming up with an estimator: equivalent
to least squares in the temporal case, but better in the spatial case.
There are two similar, but different formulations of spatial
autoregression: simultaneous and conditional.
Things are easiest in the Gaussian setting, but Gibbs sampling and
coding can be used with non-Gaussian data.


Enrollment cap?
Homework 1: autoregression and bootstrap
Will be posted by tomorrow night.
Remember that you can work in pairs! (Hand in only one problem set
per pair.)
Will be graded check, resubmit, or zero.
Edgar will be lecturing next Monday on R for spatial data.
Jingshu and Edgar will be holding workshops starting next week.

