Nothing Special   »   [go: up one dir, main page]

Verbeek e Nijman - Testing For Selectivity Bias in Panel Data Models

Download as pdf or txt
Download as pdf or txt
You are on page 1of 24

Economics Department of the University of Pennsylvania

Institute of Social and Economic Research -- Osaka University

Testing for Selectivity Bias in Panel Data Models


Author(s): Marno Verbeek and Theo Nijman
Source: International Economic Review, Vol. 33, No. 3 (Aug., 1992), pp. 681-703
Published by: Wiley for the Economics Department of the University of Pennsylvania and
Institute of Social and Economic Research -- Osaka University
Stable URL: http://www.jstor.org/stable/2527133
Accessed: 21-08-2015 12:53 UTC

Your use of the JSTOR archive indicates your acceptance of the Terms & Conditions of Use, available at http://www.jstor.org/page/
info/about/policies/terms.jsp

JSTOR is a not-for-profit service that helps scholars, researchers, and students discover, use, and build upon a wide range of content
in a trusted digital archive. We use information technology and tools to increase productivity and facilitate new forms of scholarship.
For more information about JSTOR, please contact support@jstor.org.

Wiley, Economics Department of the University of Pennsylvania and Institute of Social and Economic Research -
- Osaka University are collaborating with JSTOR to digitize, preserve and extend access to International Economic Review.

http://www.jstor.org

This content downloaded from 132.211.1.50 on Fri, 21 Aug 2015 12:53:36 UTC
All use subject to JSTOR Terms and Conditions
INTERNATIONAL ECONOMIC REVIEW
Vol. 33, No. 3, August 1992

TESTING FOR SELECTIVITY BIAS IN PANEL DATA MODELS*

BY MARNO VERBEEKAND THEO NIJMAN1

We discuss several tests to check for the presence of selectivity bias in


estimatorsbased on panel data. One approachto test for selectivity bias is to
specify the selection mechanismexplicitly and estimate it jointly with the
model of interest. Alternatively,one can derive the asymptoticallyefficient
LM test. Both approachesare computationallydemanding.In this paper, we
propose the use of simple variableadditionand (quasi-) Hausmantests for
selectivitybias thatdo not requireany knowledgeof the responseprocess. We
comparethe power of these tests with the asymptoticallyefficienttest using
Monte Carlomethods.

1. INTRODUCTION
Missing observations are a rule rather than an exception in panel data sets. It is
common practice in applied economic analysis of panel data to analyze only the
observations on units for which a complete time series is available. Since the
seminal contributions of Heckman (1976, 1979) and Hausman and Wise (1979) it is
well known that inferences based on either the balanced sub-panel (with the
complete observations only) or the unbalanced panel without correcting for
selectivity bias, may be subject to bias if the nonresponse is endogenously
determined. Even if the response process is known, estimation of the full model
including a response equation explaining the missing observations, is, in general,
rather cumbersome (compare Ridder 1990, Verbeek 1990). Therefore, it is worth-
while to have some simple tests to check for the presence of selectivity bias which
can be performed first. An obvious choice for such a test is the Lagrange Multiplier
test, which requires estimation of the model under the null hypothesis only. As will
be shown in this paper, the computation of the LM test statistic is still rather
cumbersome and, in addition, its value is highly dependent on the specification of
the response mechanism and the distributionalassumptions. In this paper we will
therefore consider several simpler tests to check for the presence of selectivity bias
without the necessity of having to estimate the full model or to specify a response
equation. A consequential advantage of these tests is that they can be performed in
a simple way in cases with wave nonresponse, where all observations on the
variables of the model are missing for some individuals in some periods, as well as
item nonresponse, where only information on the endogenous variable is missing.
For ease of presentation we will in this paper restrict attention to the linear

* Manuscriptreceived March 1990; final revision received September 1991.


i Helpful comments of Bertrand Melenberg, Arie Kapteyn, Peter Kooreman, Arthur van Soest and
two anonymous referees are gratefully acknowledged. The authors have benefittedfrom financialsupport
of the Netherlands Organizationfor Scientific Research (N.W.O.) and the Royal Netherlands Academy
of Arts and Sciences (K.N.A.W.), respectively.

681

This content downloaded from 132.211.1.50 on Fri, 21 Aug 2015 12:53:36 UTC
All use subject to JSTOR Terms and Conditions
682 MARNO VERBEEK AND THEO NIJMAN

regression model, although several of the tests can straightforwardlybe generalized


to nonlinear models. Consider
(1) yit = xit, + a i + cit, t = 1, , T; i = 1, ., N.
where xit is a k dimensional row vector of exogenous variables relating to the ith
cross sectional unit at period t, 13is a column vector of unknown parameters of
interest, ai and 8it are unobserved i.i.d. random variables with expectation zero
and variance o-2 and o-2, respectively, which are mutually independent. The
variables in xit are assumed to be strictly exogenous, i.e., Etci Ix}it}= 0 for all i,
s, t and E{cai~xit}= 0 for all i, t. For simplicity we assume that the model does not
contain an intercept term and that means have been removed from all data. T and
N denote the number of periods and the number of cross sectional units (individ-
uals, households, firms) in the panel, respectively.
Whether or not observations for yit are available is denoted by the dummy
variable rit, such that rit = 1 if yit is observed and rit = 0 otherwise. In addition,
we define c = t=1 rit, so that ci = 1 if and only if Yit is observed for all t.
Observations on xit are assumed to be available when rit = 1. A commonly used
assumption to describe the process generating rit is based on a latent variable
specification. In that case, rit is determined by the sign of r*t, given by, for
example,

(2) r* =ZitY + g+ n1it, t =1,, T; i= 1, , N,

with zit a row vector of exogenous variables, possibly containing (partly) the same
variables as xit, and qitan unobserved random variable. The term g accounts for
unobserved time-invariant individual-specific effects. Now, rit = 1 if rt > 0 and
zero otherwise. For the moment however, we shall not use additional assumptions
on the process that determines rit. Only in Section 4, where the LM test is
discussed, we shall assume that specification (2) holds.
When estimating 13in (1) using the available observations one is implicitly
conditioning upon the outcome of the selection process, i.e., upon rit = 1. The
problem of selectivity bias arises from the fact that this conditioning may affect the
unobserved determinants of yit, in particular, this may occur if the indicator
variable rit is not independent of the individual effect ai or the error term cit.
Similar problems arise if one concentrates attention to the complete observations
only, i.e., to those cross-sectional units for which a complete time series is available
(forming a balanced sub-panel). In this case one is implicitly conditioning upon
ci = 1 (rij = ... riT = 1).
In this paper attention will be paid to several simple testing procedures that can
be used to check whether selectivity bias is seriously present. First, in Section 2,
we analyze two well known estimators, the fixed effects (FE) and the random
effects (RE) estimator, and discuss the conditions for no selectivity bias in these
estimators. It appears that the condition that rit is independent of both ai and cit
in (1) is not necessary (though sufficient)for consistency. Moreover, it is shown that
the fixed effects estimator is more robust for selectivity bias than the random effects
estimator. Section 3 shows how differences between the FE and RE estimators

This content downloaded from 132.211.1.50 on Fri, 21 Aug 2015 12:53:36 UTC
All use subject to JSTOR Terms and Conditions
TESTING FOR SELECTIVITY BIAS 683

from a balanced and unbalanced design can be used to construct simple (quasi-)
Hausman tests of selectivity bias. Moreover, some simple variable addition tests
are suggested. Neither of these tests does require knowledge of the process that
determines rit.
In Section 4 we introduce and specify a latent variable specification to describe
the selection process rit. If this description is correct and data are available to
identify its unknown parameters, the Lagrange Multipliertest for independence of
rit and ac + sit can be computed and is asymptotically efficient. Moreover, it is
possible to use a two step estimation and testing procedure based on the results of
Heckman (1976, 1979). Both of these tests are computationally not very attractive.
To illustrate the findings of Section 2 and to obtain some idea about the power of
the tests proposed in Section 3, we perform a Monte Carlo study, the results of
which are reported in Sections 5 and 6. Finally, Section 7 contains some concluding
remarks.

2. SELECTIVITYBIAS IN THE FIXED AND RANDOMEFFECTS ESTIMATORS

In this section we derive conditions for consistency of the fixed effects (or
"within") estimator for the regression coefficients /3 in (1). Subsequently, we
consider the random effects estimator. Since most panel data sets are characterized
by a large number of cross sectional observations covering a fairly short time
period, we shall concentrate on consistency for N -> oo and keep T fixed. T is
assumed to be strictly larger than one.
If we define fit as the value of xit in deviation from its (observed) individual
mean, i.e.
T /T T

(3) -it =x it- isris ri, if is > 0


s= I s=1 s=1

= 0 otherwise,
and analogously for Yit, the FE estimator based on the unbalanced panel is given
by (compare Hsiao 1986, p. 31)2
N T \ SN T

(4) !3FE(U) = ( E ;itfitrit (| E E i'tysitrit


and t=t o b d t t=b

and the one based on the balanced sub-panel by


|N T j fN T
(5)~~~ 13 FE (B Yt -ftvi 4,1 Ytyi

2
This estimator is only defined if at least one individualis observed more than once; for finite samples
there will generally be a small but nonzero probabilitythat this is not the case, but for practical purposes
this can be ignored. Similar remarks hold for all other estimators presented below.

This content downloaded from 132.211.1.50 on Fri, 21 Aug 2015 12:53:36 UTC
All use subject to JSTOR Terms and Conditions
684 MARNO VERBEEK AND THEO NIJMAN

Obviously, FE(.) is unbiased and consistent3 for 8 if selection is determined


independently of ai and 8it. Using Yit = Pit,8+ pit, one immediately sees that this
condition is too strong, since independence of ri = (ri, ..., riT)' and the
transformed error term fit also guarantees unbiasedness and consistency. It is
straightforwardto show that an even weaker condition for consistency of FE(U)
and PFE(B) is that4
(6) E=?itlrilrit 0, t= 1, , T; i= 1, , N

or

(7) E{fitcij}ci = 0, t= 1, , T; i= 1, ., N,

respectively. Consequently, a sufficient condition5for both conditions (6) and (7) to


hold is that
(8) E{fitri}= 0 t= 1,..., T; i= 1,..., N.

First of all, it should be noted that (8) does not involve aj. Thus, the fact that
selection (indicated by rit) depends upon the individual effects ai in the model of
interest does not introduce a selectivity bias in the fixed effects estimators. In
addition, if selection affects the conditional expectation of each of the error terms
Eil, -.., 8iT in the same way, selectivity bias will also not occur. In all these cases
selectivity may have an effect on the structuralequation (1), but since this effect is
fixed for a given individual over all periods in which its dependent variable is
observed, it is absorbed in the fixed effect and no consistency problems arise for the
FE estimator. In Section 4 some more attention to condition (8) will be paid in the
context of the latent variable equation (2) explaining rit.
Next we consider the random effects estimator (compare Hsiao 1986, p. 34 ff.).
First, we stack the observations for each cross sectional unit into vectors and
matrices, i.e.

lYii lXil \ /Il

i =. i. (?).
YiT XiT ? iT

Let Tj denote the number of periods unit i is observed, i.e. Tj = rit . For each
cross sectional unit we define a Tj x T matrix Ri transforming yi into the
Ti-dimensional vector of observed values Y!lbs, say. This matrix Ri is obtained by
deleting the rows of the T-dimensional identity matrix corresponding to the
unobserved elements. Now we can write Y pbs = Rjy . Defining t = (1, 1, , 1)'

3Throughout the paper, we assume that the usual regularityconditions are met.
4
The conditional expectations in the sequel are also conditional on the exogenous variables, but for
the sake of notation these are omitted.
5A case in which this sufficient condition is not necessarily met, but condition (6) holds, is the
situation where observations are missing deterministically(given xi,) (E{ritxi} = ri= 0), for example,
if being on vacation implies nonresponse.

This content downloaded from 132.211.1.50 on Fri, 21 Aug 2015 12:53:36 UTC
All use subject to JSTOR Terms and Conditions
TESTING FOR SELECTIVITY BIAS 685

of dimension T, the variance covariance matrix of the error term in (1) can be
written as
0 = V{tai + cE} = a + OjI.

Writingfi = R i Ri and XfpbS = R1X1, the random effects estimator based on the
unbalanced panel is given by6

X
(9) RE(U) = (jE Xb5(Q.)1Xobs ( xobsf(?)-lYbs)-

If only the complete observations in the panel are used the randomeffects estimator
is given by

(I10) xPRE(B)= Xnz jj (EXtQ 1ly Jo |

Note that these estimators can easily be computed using OLS on transformed data
even if the unbalanced panel is used (see, e.g., Baltagi 1985 or Wansbeek and
Kapteyn 1989).
The estimators PRE(.) are consistent if
(11) E{ai + itcri} = 0, t = 1, ..., T; i = 1, ..., N.

Clearly, this condition is stronger than condition (8) needed for consistency of the
fixed effects estimator and consequently, we can conclude that the fixed effects
estimator is more robust with respect to nonrandom selectivity than the random
effects estimator. This may be a reason to prefer the fixed effects estimator although
of course some efficiency is lost by this choice if in fact condition (11) holds.
Assuming normality of the error terms in (1) and a probit model to describe the
selection process rit, this point is further elaborated in Section 4.
Before we propose several simple tests to check for the presence of selectivity
bias, it is important to note two things. First, the conditions for consistency of the
fixed effects and random effects estimators are different and, second, there is no
reason why the inconsistencies in estimators based on the balanced sub-panel and
those on the unbalanced panel would coincide. These two points enable us to
construct tests for the presence of selectivity bias (or, in fact, for consistency of the
FE of RE estimators) using only the four simple estimators presented above. This
will be the main theme of the next section.

3. SIMPLETESTS FOR SELECTIVITYBIAS

In Section 2 four estimators of p have been presented which are all consistent in
the absence of nonrandomselection (i.e. if rit is independent of ai and Eit),namely

6 For expository
purposes we ignore the fact that in practice unknown variances have to be replaced
by consistent estimates.

This content downloaded from 132.211.1.50 on Fri, 21 Aug 2015 12:53:36 UTC
All use subject to JSTOR Terms and Conditions
686 MARNO VERBEEK AND THEO NIJMAN

the fixed effects estimators based on the balanced sub-panel and the unbalanced
panel and the random effects estimators based on the balanced and unbalanced
panel. In general, it is quite unlikely that the pseudo true values, i.e. the probability
limits under the true data generating process, of either two estimators are identical,
unless both estimators are consistent. Therefore, it is possible to construct a test for
selectivity bias based on the differences between either two, three or four
estimators.
Let us stack all four estimators into a 4k dimensional vector 8 as follows.

(12) [= (f3E (B)' , 3FE (U), ,P RE (B) , J RE (U)T)

Under weak regularity assumptions 8 is asymptotically normally distributed


according to
L

(13) N(f3
-N - [3) N(O, V),
where p denotes the vector of pseudo true values. From (13) it immediately follows
that the hypothesis D/3 = 0 can be tested using

(14) (D = Nf3 'D'(DVD') -D3,


which is asymptotically distributed as a central Chi-square with d degrees of
freedom under the null hypothesis D/3 = 0, where A - denotes a generalized
inverse of A and d is the rank of DVD'.
In order to be able to compute the test statistics in (14) for appropriatechoices
of D, an estimator for the full matrix V is needed. Using the definitions of the four
estimators given in (4), (5), (9) and (10), it is a straightforwardexercise to determine
their variances and their covariances. Denoting VI, = V{I3FE(B)}, V22 =
V{I3FE(U)}, V33 = V{I3RE(B)}and V44 = V{I3RE(U)},it follows that all blocks in
the matrix V are a function of the variance covariance matrices of the four
estimators in p only. In particular, it holds that

(Vi V22 V33 V44


V= V22 44
(15) V22V1.lV33

V44

Using (15) any test statistic given in (14) can easily be computed. Two obvious
candidates from the tests that compare two out of four possible estimators, are
those comparingthe fixed or randomeffects estimators from the balanced sub-panel
and the unbalanced panel, where D = DI = [I - I 0] or D = D2 [ ?0 I - I],
respectively. Two other choices, D3 = [I 0 - I 0] and D4 = [0 I 0 - I], result
in the standard Hausman specification test for uncorrelated individual effects (see,
e.g., Hsiao 1986, p. 48) and its generalization to an unbalanced panel, respectively.
A fifth test compares the FE estimator in the balanced sub-panel and the RE
estimator in the unbalanced panel (D5 = [I 0 0 - I]), while for the last possible
test D6 = [0 I - I 0]. Obviously, alternative tests which compare three or more
estimators of 6 are possible.

This content downloaded from 132.211.1.50 on Fri, 21 Aug 2015 12:53:36 UTC
All use subject to JSTOR Terms and Conditions
TESTING FOR SELECTIVITY BIAS 687

Since the tests proposed above are based on the comparison of two estimators for
the same parametervector and since some special cases correspond to well known
Hausman tests in the literature we shall refer to them as (quasi-) Hausman tests.
Unlike in the standard case our tests are based on estimators which are all
inconsistent under the alternative. In the very unlikely case where all estimators
would have identical asymptotic biases these tests will have no power at all.
Keeping this in mind the null hypotheses (Ho: Di06 = 0) of the tests above can be
translated into hypotheses in terms of estimator consistency.
Let us define

HoE:E{?i Iri} 0 (the fixed effects estimators are consistent),

and

HoE:E{ai + Eilri} = 0 (the RE and FE estimators are consistent).

The null hypothesis (denoted by Ho) of nonrandom selection, i.e. the hypothesis
that rit and ai and sit are independent, is the strongest hypothesis (since it implies
all the others). However, for conducting inferences it is not relevant whether Ho is
true or not, but whether Ho E or HJFEare correct, since inferences will be based on
either the random effects or the fixed effects estimator. Notice that the latter
hypothesis is implied by the former, i.e. whenever the random effects estimator is
consistent, the fixed effects estimator is consistent as well. The (quasi-) Hausman
tests may be appropriate instruments for checking the consistency of these
estimators, although they are only able to test for the weaker hypotheses Ho.
Consequently, a rejection of Ho (for some i = 1, ..., 6) by the corresponding
Hausman test, implies that HoE should be rejected. If Ho is rejected, HFE should
be rejected as well. However, the converse is not true.
Note that if both HoREand HFE are false, all estimators are inconsistent. In that
case knowledge of the selection process can be used to model selection simulta-
neously with model (1) to obtain consistent random effects or fixed effects
estimators correcting for selectivity. However, the joint estimation of a selection
process and model (1) may be computationally demanding, unless some simplifying
distributional assumptions are made. See, for example, Ridder (1990) or Verbeek
(1990). In addition, the restrictions needed to identify ,8 may be stronger than one
would like, while the resulting estimates will depend heavily on the available prior
information (compare Manski 1989, 1990).
Note that only the first test statistic (based on D 1) is appropriate for checking
HFE, while any other test statistic can be used for HoRE. The optimal testing
procedure seems to be to test for the stronger hypothesis first (HoE), and, if this
test rejects, test subsequently for the weaker one (HFE). Of course, it is preferable
to use the most powerful test out of all possible tests for the hypothesis HoE.
However, the analysis of statistical power is extremely difficult if not impossible,
not only because the test statistics are not mutually independent, but also because
we are working with Hausman specification tests for which the null hypotheses Ho
cannot be written down in a simple parametricform. Therefore, standardresults on

This content downloaded from 132.211.1.50 on Fri, 21 Aug 2015 12:53:36 UTC
All use subject to JSTOR Terms and Conditions
688 MARNO VERBEEK AND THEO NIJMAN

the power of Hausman tests (compare Holly 1982) and on sequential testing (see,
e.g., Mizon 1977, Holly 1987) are not directly applicable in this situation.
Of course alternative tests for selectivity can be constructed. Remember that
selectivity bias in model (1) occurs because the conditional expectation of the error
term ai + Eit does not equal zero. If this conditional expectation E{fa + Eitiri}
were known (possibly apart from one or more proportionality factors) one could
add it as an extra regressor (or combination of regressors) in (1) such that the new
error term has expectation zero (given xit and ri). Subsequently, the parameters in
the extended model can be estimated consistently using standard methods. This is
the essence of the well known two step estimation procedure in the cross sectional
sample selection model proposed by Heckman (1976, 1979)and the simple two step
estimators for models with censored endogenous regressors and sample selection
suggested by Vella (1990). An application for the case of nonresponse in panel data
is presented by Nijman and Verbeek (1990).
Of course, the conditional expectation E{cai + itc ri} is not known (or identifi-
able) unless the selection process is known (or identifiable), and therefore this
procedure will have the same drawbacks as joint estimation of the model and the
selection process, although the computational burden may be somewhat less. As a
testing procedure it may be worthwhile to try to approximate the conditional
expectation in a simple way and to check whether it enters model (1) significantly.
Since E{cai + cit IriJ will be a function of ri, the functional form of which depends
upon the joint distribution of ai + Eit and ri, one can think of two more or less
distinct ways of approximatingit. Firstly, one can have one or more variables, zit,
say, that are likely to determine the probability of selection (i.e. affect the
distribution of ri), and enter these variables in a convenient form, for example as
a low order polynomial. The resulting test would then be a joint test of the
hypothesis that, conditional on xit, yit does not depend on (this function of) zit and
the hypothesis of no selectivity bias. Alternatively, one can choose some function
of ri itself, from which it is known that it should not enter the model significantly
under the hypothesis of no selectivity bias. The resulting test is a test of the
selectivity bias hypothesis only. In the sequel we shall concentrate on this second
approach and consider three possible variables that can be included in the
regression equation. First, Ti = 1 ri, the number of waves individual i
participates, second ci = =Iff ris, a 0-1 variable equal to 1 if and only if individual
i is observed in all periods and third, rit- , indicating whether individual i is
observed in the previous period or not. Note that ri,0 = 0 by assumption. To test
the significance of these variables in (1) we are forced to use the unbalanced panel
since in the balanced panel the added variables are identical for all individuals and
thus incorporated in the intercept term. Since the additional variables are constant
over time for each individual in the first two cases, the corresponding parameters
are not identified in the case where the individual effects ai are treated as fixed. We
shall therefore concentrate attention to random effects estimators.
Although one could expect that the added variables have an influence on the
relationship between Yit and xit if there is selective nonresponse, there is no reason
why this effect would be linear and thus the power of the tests may be doubtful. If
we denote the coefficient for the added variable w, say, by y, then the null

This content downloaded from 132.211.1.50 on Fri, 21 Aug 2015 12:53:36 UTC
All use subject to JSTOR Terms and Conditions
TESTING FOR SELECTIVITY BIAS 689

hypothesis of the variable addition test is Ho: Yiv = 0. Note that Ho implies Ho'
but that the converse is not true.

4. SPECIFICATIONOF THE RESPONSEMECHANISMAND THE LM TEST FOR SELECTIVITY


BIAS

In this section we assume that response rit is determined by a random effects


probit model, an assumption which is often made in empirical applications
(compare Hausman and Wise 1979, Nijman and Verbeek 1990, Ridder 1990). Under
this assumption and assuming normality of the error terms in (1) it is possible to
derive the LM test statistic for the null hypothesis that rit is independent of the
unobserved determinants of yit (ai and Eit). Furthermore, we pay some more
attention to the conditions for consistency of the FE and RE estimators in the
context of this example.
Suppose rit is determined by the sign of a latent variable r*t, which is generated
by

(16) r= zity + *+ Tqit, t= 1, ..., T; i = 1, ..., N,

where zit is a row vector of exogenous variables, usually containing partly the same
variables as xit, Tit denotes an unobserved random variable and g is an
individual-specific effect. In order to account for possible correlation between g
and the explanatory variables zit, we follow Chamberlain(1984) in assuming that,

(17) g = Zil IT + Zi2IT2 + + ZiT'T + si,

where gi is independent of all zit's. Substitution in (16) yields

(18) r* = Zity + ZilT1T + Zi272 + + ZiTIT + (i + Nit.

To be able to identify the parameters in (18) it is essential to assume that


observations on zit are available for both rit = 1 and rit = 0. Note that this
assumption is not required when performingthe (quasi-) Hausman tests or variable
addition tests proposed in Section 3. The unobserved random variables in (1) and
(18) are assumed to be normally distributed according to

(19) N!09
EE,7
JI 2 )

where Ei = (Eil, ... , EiT)' and mqi= (rmil, . TiT)'. For identification of the
probit model we will impose (as usual) 2- + 0r = 1. Of course, one can test the
model assumptions implied by (18) and (19) along the lines discussed in, for
example, Lee (1984) and Lee and Maddala (1985).
Under these assumptions the expectation of Eit given selection is given by (see
the Appendix)

This content downloaded from 132.211.1.50 on Fri, 21 Aug 2015 12:53:36 UTC
All use subject to JSTOR Terms and Conditions
690 MARNO VERBEEK AND THEO NIJMAN

(20) EfEitril} = 2 (iz{i +


7)f(itri}- 2_~2
+ E{fi + -qis ri})v

while the conditional expectation of ai given selection is given by (see the


Appendix)
T
(21) E{ailri}- 2
0 + Tor~
> E{f; + qi1slri}.
s =1I

The conditional expectation E{fi + Thitri} is a complicated function (see the


Appendix) of the variables in zit and reduces to "Heckman's (1979) lambda" if
there is no individual effect in the probit model (o-2 = 0).
Under the normality assumption the independence of ri and (ai, Eit) is
equivalent to orag = cr,,, = 0. Clearly, this condition implies that (11) holds,
implying consistency of both the random effects and the fixed effects estimators.
For the transformed error term sit (20) implies that
T /T

(22) E{fitIri}I= 2 Efi + rqitIE - risEfi + 7qisIri} E ris.


0r 71 s =1/s=1

From this it immediately follows that condition (8) is fulfilled and the fixed effects
estimator is consistent if either o-,, = 0 or E{ i + Bit IriI does not vary over time.
The latter condition implies (see the Appendix) that there is no selectivity bias if the
probability of an individual of being observed is constant over time, even if o-E7 =
0. This will occur when zit y is constant over time. Since (22) does not contain oaf,
a correlation between the individual effects in the structural equation (1) and the
probit equation (18) does not result in a bias in the fixed effects estimator.
The condition that E{ i + itIri} does not vary with t is clearly not sufficient for
consistency of PRE. For the latter we either need that E{ i + Bit rir}is constant and
T -> o (since the FE-estimator and the RE-estimator are equivalent when T tends
to infinity)7or that E{ i + qitIril is constant and orap+ crEq = 0, which does not
seem to be very likely in practice.
The actual magnitude of the inconsistencies of the estimators is determined by
the projection of the conditional expectations derived above on the (transformed)
xit's. Although it is possible to analyze the effects of changes in model parameters
on the conditional expectation of the (transformed) error term analytically (com-
pare Ridder 1990), it is, in general, virtually impossible to give analytical expres-
sions in terms of the model parameters for projections of these expectations on the
explanatory variables, i.e. of the biases in the estimators. To obtain some insight in
the numerical importance of the bias in the four estimators discussed above, we will
present some numerical results in the next section.
Given the model in (1) and (18) and the assumed normality of the error terms in
(19) is it possible to write down the likelihood function (compare Ridder 1990) and

7 This equivalence also holds when the model is not correctly specified, as in our case.

This content downloaded from 132.211.1.50 on Fri, 21 Aug 2015 12:53:36 UTC
All use subject to JSTOR Terms and Conditions
TESTING FOR SELECTIVITY BIAS 691

to derive the Lagrange Multiplier test statistic for the null hypothesis HO: =
crag = 0. The loglikelihood function involves the joint distribution of the observed
y-values in Y!,bs and the response indicator ri. In particular, the loglikelihood
contribution of individual i is given by
(23) Li = log f(y obs, ri) = log f(riIyrYbs) + log f(yobs),

where we are using f(.) as generic notation for any density/mass function. The
second term in the right-handside of (23) is the log of a Ti-variate normal density
function, while the first term is the loglikelihood function of a (conditional)
T-variate probit model (see the Appendix for details).
Denoting the full vector of parameters involved in (23) (including oragand o-,,) by
0, the Lagrange Multiplier test statistic is given by
N L'N dL L' N
(24) (LMI -=
Z i I a

where 00 is the ML estimate for 0 under HO:ora,6=ocre = 0. Since there does not
appear to be any form of block diagonality of the Fisher information matrix under
the null, the scores with respect to all parameters in the model are required to
compute this test statistic from the first derivatives of the loglikelihood. For the
cross sectional case the LM test for selectivity is discussed in Lee and Maddala
(1985).
Because under HO the two terms in the right-hand side of (23) depend on
nonoverlapping subsets of the vector of parameters, the score contributions with
respect to the parameters in (1) can be found in Hsiao (1986, p. 39),8 while those for
the parameters in (18) can be derived from a standard random effects probit
likelihood (see the Appendix). The most difficultscore contributions are those with
respect to the two covariances oragand o-re; the latter even requires double
numerical integration (see the Appendix). Because estimation under HO requires
numerical integration (for each individual) for the probit part of the model and
computation of each score contribution also requires numerical integration over
one or two dimensions, the LM test is rather unattractive in applied work.
For the cross sectional sample selection model Heckman (1976, 1979) proposed
a simple way to test for selectivity bias and to obtain consistent estimators. As
discussed in Ridder (1990) this method can be generalized to the case of panel data,
where two correction terms to equation (1) are added instead of just the one
variable known as Heckman's lambda (or the inverted Mill's ratio). These two
correction terms are the conditional expectations of the two error terms (ai and Eit)
given the sampling scheme, as given in (20) and (21) evaluated at the (consistent)
parameterestimates of the probit model under the null hypothesis (see Nijman and
Verbeek 1990 for an application). The two unknown covariances 0ragand o-,, are
not included in these correction terms but are the corresponding true coefficients in

8 Note that (3.3.20) in Hsiao (1986) contains a printingerror:the first - sign on the second line should
read a + sign.

This content downloaded from 132.211.1.50 on Fri, 21 Aug 2015 12:53:36 UTC
All use subject to JSTOR Terms and Conditions
692 MARNO VERBEEK AND THEO NIJMAN

equation (1). Obviously, consistent estimation of these coefficients orag and 0rn

allows one to check whether nonresponse is selective or not. Since estimation of


the parameters in the response equation as well as computation of the conditional
expectation of (i + rqit in (20) and (21) requires numerical integration, these
generalized Heckman (1979) method is still computationally unattractive. There-
fore, it may be worthwhile to have some simple variables that can be used instead
to approximate the true correction terms to check for selective nonresponse, for
example those suggested in the previous section.
If the specification of the response process in (18) is correct, the Lagrange
Multipliertest is known to be asymptotically efficient for testing the null hypothesis
Ho. To obtain some idea about the power of the alternative simple tests we
performed a Monte Carlo study under this assumption, the results of which are
presented in the next two sections. In Section 5 we introduce the Monte Carlo
model and present estimates for the pseudo true values of the four estimators in
(12), giving insight in the importance of the selectivity bias in these estimators. In
Section 6 some numerical results on the power of the simple tests in comparison
with the Lagrange Multiplier test are presented.

5. NUMERICALRESULTS ON THE PSEUDO TRUE VALUES OF THE RE AND FE


ESTIMATORS

In this section we will present some numerical results on the pseudo true values
of PFE and IRE, defined as the probabilitylimits of these estimators under the true
data generating process. For expository purposes we consider a simple model
consisting of equations (1) and (18) with only one exogenous variable included
besides the constant term.
This exogenous variable (zit = xit) is assumed to be generated by a Gaussian
AR(1) process with mean zero, autocorrelationcoefficient Px and variance o-$ For
simplicity we have imposed equality of all wt's in (17). The model used for
simulation is thus given by

(25) yit f38xit + ai + Eit


(26) rt = KYo+ YlXit + ITXi + (i + 7)it

where xi is the average value of the xit's over time. We concentrate on a model
with only one explanatory variable, since it elucidates the discussion most clearly.
Including an additional variable in (25) that is uncorrelated with xit essentially
would not change the results, while inclusion of a variable that is correlated with
xit would result in biases that depend heavily on the sign and magnitude of this
correlation. Similar remarks hold for the inclusion of additional variables in (26).
We consider two possible specifications for the selection equation, one in which
XTis a priori set to zero (in which case the probability of selection in period t is
determined by xit), and one in which Yi is a priori set to zero such that the average
value of xit over time determines the selection probability. Given this choice of
specification, the relative biases of the estimators for 13lin this model, defined as
(,81 - I381)/,31,
where ,3i is the pseudo true value of the respective estimators for ,31,

This content downloaded from 132.211.1.50 on Fri, 21 Aug 2015 12:53:36 UTC
All use subject to JSTOR Terms and Conditions
TESTING FOR SELECTIVITY BIAS 693

TABLE 1
RELATIVE INCONSISTENCIES (IN PERCENT) IN THE FE AND RE ESTIMATORS FROM A BALANCED AND
UNBALANCED PANEL

Reference situation (REF): T = 3, R2 = 0.1, R = 0.9, Pa = 0.1, Px = 0.7, po 0.5,


p =0.1 and Pae =0. 5
Ry2 - R r=
2 Pa= Px PO Pe Pa
estimator REF 0.9 0.1 0.9 0.3 ((1) 0.9 0.9
A: 7T=0,OpE = 0.9
FE(B) -78 -8 -49 -25 -90 -61 -28 -77
RE(B) -79 -9 -49 -27 -93 -61 -39 -81
FE(U) -98 -10 -50 -33 -101 -77 -37 -98
RE(U) -116 -13 -53 -39 -115 -88 -56 -121

B: IT = 0, PE6, = 0
RE(B) -6 -1 -5 -2 -6 -6 -17 -11
RE(U) -6 -1 -4 -6 -7 -5 -19 -12

C: yi = 0, PE" = 0.9
RE(B) -34 -3 -38 -1 -17 -27 -17 -35
RE(U) -74 -7 -44 -4 -41 -61 -32 -75

1. Relative inconsistency of an estimator is defined as its pseudo true value minus the true value
divided by the true value (multiplied by 100 percent).
2. The number of replications in each situation is chosen such that all (Monte Carlo) standard
errors are smaller than 0.5 percent.
3. All simulation results are obtained using the NAG-library subroutines GOSCCFand GO5DDF.
4. From the analytical results we know that the fixed effects estimators are consistent in panels
B and C, which was confirmed by the Monte Carlo results.

depend on T, the number of time periods, and the following eight hyper-
parameters.
2
Pa = (cr2 + o-2)-), the importance of the individual effect in equation (25);
f - 2, the importance of the individual effect in the selection equation;
Px, the autocorrelation coefficient of xit;
po = F(Dyo),the (unconditional) probability of observation when xit = 0 for
all t;
R 2 = P2 2(3?2o2+ + 2+ 2) - 1, the (theoretical) R 2 of equation (1);
Rr, the (theoretical) R2 of the selection equation;
R2 = 2o2 /2 2 + 1) - 1 if T = O, or
R72 = T2 r(V2c2 + 1)-1 if Yj = 0, with - 2Q + 4Px + 2P2)/9
ol(3
(the variance of xi);
PETE
= o-J bI(-8,O'(the correlation between the error shocks in (25) and (26);
Pad, the correlation between the individual effects in (25) and (26).
If we assume that all correlations are nonnegative, all of the hyperparametersare
restricted to the interval [0, 1], so that one has some more feeling what "small" and
"large" values for these parametersmean. Without loss of generality, it is assumed
that Yi ' 0 or X- 0. In Table 1 estimated relative biases (relative differences
between the estimated pseudo true values and the true values) of the four
estimators discussed above are given for several combinations of parametervalues

This content downloaded from 132.211.1.50 on Fri, 21 Aug 2015 12:53:36 UTC
All use subject to JSTOR Terms and Conditions
694 MARNO VERBEEK AND THEO NIJMAN

and T = 3. The number of replications is chosen in such a way that all (Monte
Carlo) standard errors are smaller than 0.005. In the table the parametervalues are
chosen as follows. There is one "reference situation" characterized by T = 3,
R = 0.1, R 2 = 0.9, pa = 0.1, Px = 0.7, po = 0.5, pf = 0.1 and pad = 0.5.
Three alternative combinations of X and p,, are considered given in panels A, B
and C. The columns in the table correspond to the reference situation (REF) or this
situation with only one of the parametervalues changed. For example, the column
with heading Px = 0. 3 refers to the reference situation given above with Px = 0. 3
instead of 0.7. If X = 0 and p,, = 0.9 (panel A) we see in this column that the fixed
effects estimator based on the balanced panel suffers from an inconsistency of -90
percent, while the same figurefor the random effects estimator from the unbalanced
panel is -115 percent. The standarderrors implied by the Monte Carlo experiment
are such that the true relative inconsistencies are with a 95 percent probability
within a 1 percent point range of the reported values.
Although, as always, it is difficultto draw definitive conclusions from results for
specific parameter values the results in Table 1 suggest the following points.
The biases in the estimators can be substantial. In some cases it is even possible
that the sign of the pseudo true value is opposite to the sign of the true value of 3k1.
Moreover, like other simulation results (not reported in this paper) suggest, if the
true 13iparameter is equal to zero (which implies that R 2 = 0), a significant effect
of the explanatory variable on Yit can be found. This phenomenon is also known
from the standard (cross section) sample selection model of Heckman (1979).
Although the fact that the conditions for the fixed effects estimator to be
consistent are weaker than those for the random effects estimator does not
necessarily imply that the bias in the latter is always larger than that in the first, our
simulations show that this is in fact the case. If there is a difference between the RE
and FE pseudo true values, it is in favor of the latter estimator. This result is caused
by the fact that we have assumed that Pad> 0. In the not very likely situation where
Pad< 0 and p,, > 0, the bias in the random effects estimator may in fact be smaller.
If the amount of bias is used as criterion for choosing an estimator, it is obvious
from our analytical and numerical results that the fixed effects estimator is likely to
be preferable to the random effects estimator.
For almost all situations we consider, the bias in the estimator based on the
unbalanced panel is larger (in absolute value) than that in the same estimator based
on the balanced panel; if it is smaller the difference between the two estimates is
negligible given the size of the Monte Carlo experiment. This somewhat surprising
result suggests that a balanced panel may be preferred to an unbalanced panel. A
possible explanation for this result might be that the individuals that are not
observed in all periods have on average a lower probabilityof being observed, thus
also a lower probability in those periods they are observed, implying a larger
correction term in the regression equation. In the standard sample selection model
of Heckman this would mean that for those individuals Heckman's lambda deviates
more from zero.
Keeping all parameters fixed at some level except one, it may be possible to say
something about the change of the bias if that one parameter is changed. It is
evident from the analytical results and also from the numerical results above that a

This content downloaded from 132.211.1.50 on Fri, 21 Aug 2015 12:53:36 UTC
All use subject to JSTOR Terms and Conditions
TESTING FOR SELECTIVITY BIAS 695

rise in R y2 will cause a decrease in the absolute value of the bias, simply because a
rising R 2 diminishes the role of the error terms ai and Eit. On the other hand, a rise
in R,2 increases the absolute value of the bias, since it increases the correlation
between the probabilitiesof being observed and the explanatory variables) xit. For
P0 2 2(yo ? 0), an increase in p0 diminishes this correlation and therefore
decreases the absolute value of the bias. Obviously, increasing the (absolute values
of the) correlation coefficients p871or pad(already being nonnegative) causes a rise
in the absolute value of the bias of all estimators. A more importantindividualeffect
in equation (25), Pa, seems to reduce the absolute value of the bias; the effect of Px
and p6 is ambiguous.

6. NUMERICALRESULTS ON THE POWEROF THE TESTS

In Section 3 a number of tests were proposed which can be used to check


whether selectivity bias is present or not. In this section we present numerical
results on the power properties of the quasi-Hausman tests, the variable addition
tests and the LM test of Section 4 for the Monte Carlo model introduced in Section
5. We shall not consider the generalized Heckman test because it is as hard to
compute but asymptotically less powerful than the asymptotically optimal La-
grange Multiplier test.
For simplicity we restrict ourselves to an analysis of the asymptotic local power.
That is, we consider the power of our tests under a sequence of local alternatives,
in general 0 = 00 + 8/NV for some vector 8, where 00 denotes the parameter
value under the null hypothesis (compare Engle 1984 or Holly 1987). Under such a
sequence of local alternatives our tests (or their x2 equivalents) are asymptotically
noncentrally x2 distributed, with a decentrality parameter A determined by 8. For
the (quasi-) Hausman tests, for example, and a sequence of local alternatives given
by (3 = ,3 + /VNIit holds that
L
(27) (R = Nf3'R'(RVR')-R3 - d(85'R'(RVR')-R) = Xd(AR), N -- o.

Since the power of a test is a direct function of its decentrality parameter, we report
decentrality parameters only.
We interpretthe particularalternative implied by the Monte Carlo model as being
one in a sequence of local alternatives. For all cases in the Monte Carlo set-up we
choose a sample size9 of N = 25,000 to estimate the pseudo true values 0 by 0. We
estimate 8 by 8 = N/<(6 - 00), which gives us (an estimate for) the decentrality
parameter for sample size n. In Table 2 decentrality parameters for n = 500 are
reported. From these decentrality parameters one can compute the probability of
rejection of the null hypothesis in a sample of 500 observations based on an
approximation by the asymptotic distribution. Considering, for example, the
reference situation in panel A (7T = 0, pE7 = 0.9), we see that the Hausman test
comparing the RE estimators from the unbalanced panel and the balanced sub-

9 Sample size refers to the number of individuals in the panel, including those that are observed only
once or twice.

This content downloaded from 132.211.1.50 on Fri, 21 Aug 2015 12:53:36 UTC
All use subject to JSTOR Terms and Conditions
TABLE 2
DECENTRALITY PARAMETERS OF THE CHI-SQUARE DISTRIBUTIONS OF SEVERAL TESTS FOR
SELECTIVITY BIAS AT n = 500 AND T = 3

Reference situation(REF): T = 3, R 2 = 0.1, R2 =0.9, Pa= 0.1, Px= 0.7, po =0.5,


p= 0. and Pa, = 0.5
R2 R2 Pa PX P= Pe Pae
test DF REF 0.9 0.1 0.9 0.3 '(1) 0.9 0.9
A: 7r= 0, pE7 = 0.9
Quasi-Hausmantests:
1 1 1.41 1.27 0.07 2.00 0.31 1.52 0.26 1.05
2 1 7.23 6.00 0.06 3.55 1.53 7.48 1.84 7.33
3 1 0.85 0.72 0.03 1.76 0.60 0.43 1.13 0.72
4 1 2.07 1.81 0.01 3.55 0.85 1.43 1.37 1.66
5 2 2.04 1.64 0.04 2.02 0.89 1.83 1.39 2.49
6 2 7.27 6.04 0.10 4.25 1.69 7.48 2.44 7.34
Variableadditiontests:
7 1 0.01 0.01 0.04 0.14 0.03 0.11 0.10 0.04
8 1 0.03 0.03 0.00 0.24 0.04 0.04 0.17 0.14
9 1 0.02 0.01 0.01 0.02 0.00 0.14 0.03 0.02
LagrangeMultipliertest:
LM 2 55.1 49.2 5.46 31.3 58.5 57.3 14.1 66.3
B: 7T= 0, PE-q = 0
Quasi-Hausmantests:
2 1 0.07 0.06 0.00 0.02 0.00 0.02 0.00 0.00
3 1 0.12 0.35 0.06 0.72 0.09 0.01 0.81 0.41
4 1 0.06 0.45 0.04 0.18 0.04 0.00 0.81 0.38
5 2 0.17 0.44 0.00 0.79 0.12 0.02 0.98 0.57
6 2 0.15 0.36 0.07 0.73 0.11 0.05 0.84 0.46
Variableadditiontests:
7 1 0.09 0.07 1.88 0.61 0.32 0.22 1.23 0.59
8 1 0.06 0.09 1.31 0.39 0.17 0.21 0.98 0.64
9 1 0.00 0.12 0.16 0.00 0.14 0.04 0.27 0.15
LagrangeMultipliertest:
LM 2t 1.33 0.13 4.12 4.95 1.06 1.15 5.92 3.74

C: Y 0, P"'E = 0.9
Quasi-Hausmantests:
2 1 19.6 19.4 0.10 1.47 11.4 20.7 8.96 17.7
3 1 19.9 18.3 3.73 6.68 22.4 15.2 4.35 19.3
4 1 16.0 14.7 1.56 2.50 15.1 12.7 3.93 15.3
5 2 30.6 29.3 3.73 7.60 27.1 28.3 11.9 29.2
6 2 29.4 28.4 3.74 6.86 24.5 27.5 11.4 27.9
Variableadditiontests:
7 1 29.9 27.6 0.09 3.92 36.7 27.0 16.0 24.9
8 1 22.7 21.6 0.08 3.16 30.6 21.6 13.9 18.5
9 1 2.80 2.29 0.05 0.04 5.85 2.10 0.59 2.19
LagrangeMultipliertest:
LM 2 75.9 73.6 13.7 20.1 83.8 66.8 12.1 83.0
1. Fixed Effects (Balancedvs. Unbalanced)
2. RandomEffects (Balancedvs. Unbalanced)
3. Unbalanced(RandomEffects vs. Fixed Effects)
4. Fixed Effects, Balancedvs. RandomEffects, Unbalanced
5. Balanced(FE vs. RE) and Unbalanced(FE vs. RE)
6. RE (Balancedvs. Unbalanced)and FE, Balancedvs. RE, Unbalanced
7. Et rit
8. Il. rit
9. ri, t- I
= 0 is imposeda priori,this test has one degree of freedom.
tlf the restrictionpE71
Estimateddecentralityparametersare based on 25,000 individualobservations.Estimatesfor
decentralityparametersfor sample size n can be obtainedby multiplyingthe numbersby n/500.

This content downloaded from 132.211.1.50 on Fri, 21 Aug 2015 12:53:36 UTC
All use subject to JSTOR Terms and Conditions
TESTING FOR SELECTIVITY BIAS 697

TABLE 3
PROBABILITIES OF REJECTION (AT 5 PERCENT) FOR SEVERAL DECENTRALITY PARAMETERS

Decentrality parameter
DF 0 1 2 3 4 5 10 20
1 0.05 0.17 0.29 0.41 0.52 0.61 0.89 0.99
2 0.05 0.13 0.23 0.32 0.42 0.50 0.82 0.99

panel has a decentrality parameter of 7.23, implying a 77 percent probability of


rejection at a nominal size of 5 percent (if n = 500). If the available sample contains
1000 individuals, the decentrality parameteris twice as large (14.46) corresponding
to a 97 percent probability of rejection. Similarly, the implied probabilities of
rejection (at a nominal size of 5 percent) for six (quasi-) Hausman tests, three
variable addition tests and the LM test for any number of observations can be
computed using Table 3.
Note that the estimated decentrality parameters in Table 2 are not normally but
(noncentrally) Chi-square distributed, which makes computation of confidence
intervals difficult. Based on the asymptotic normality of the parameter estimators
the variance of A approximately satisfies

n2 l N
(28) VfA^}=~ N d+-A

where d is the number of degrees of freedom, and where we use the fact that N/nA&
is Chi-square distributed. It is important to note that this variance increases with
the true value A. For large enough A the corresponding standard error for N =
25,000 and n = 500 is (approximately) given by 0.283VA.
Looking at panel A of Table 2 first, where both Ho E and Ho E are false, we see
that in this case none of the variable addition tests has any power. Obviously, these
variables are under these data generating processes not capable of approximating
the Heckman (1979) like correction terms. This is probably due to the fact that our
simple variables are not capable of capturingthe time variation in these correction
terms (due to zi y). With regardto the Hausman tests, the results in Table 2 suggest
that the test based on comparison of the random effects estimators in the balanced
and the unbalanced panel (the second test) is more powerful than all other tests
based on comparison of two estimators. Looking at the tests that compare two pairs
of estimators (the fifth and the sixth test in Table 2), the latter seems to perform
relatively well, although it is not performing uniformly better than the best one
degree of freedom test. The test statistic based on comparing all four estimators
(which is not reported in the Table) does not result in a very powerful test compared
to those tests based on two pairs of estimators, since the additional degree of
freedom has a much more dominant effect on the power than a (fairly small) rise in
the decentrality parameter. For panel A of Table 2, the LM test is obviously far
more powerful then any Hausman test. Note that the power of all tests reduces
substantially if the R 2 of the selection equation is reduced from 0.9 to 0. 1; the bias

This content downloaded from 132.211.1.50 on Fri, 21 Aug 2015 12:53:36 UTC
All use subject to JSTOR Terms and Conditions
698 MARNO VERBEEK AND THEO NIJMAN

in the estimators is however still substantial (53 percent for the random effects
estimator from the unbalanced panel).
If 0-E, = 0, i.e. if the error shocks in the structural equation and the selection
equation are uncorrelated, but oag =#0 (so HoE is true and Ho E is not; panel B) all
tests seem to have limited power only. Even the power of the LM test is very
limited in this case, in which, of course, the null hypothesis Ho is only violated in
one direction orag=#0). Since the bias in the fixed effects estimators is zero in this
case, while that in the random effects estimators is small (compare Table 1), this
does not seem to be a situation to worry about.
As shown in panel C of Table 2, the power of all tests appears to be larger in the
case where the response is determined by an individual effect which is correlated
with the regressor (I =#0 and Yi = 0) than in the case where the regressor itself
determines the response (IT = 0 and Yi =A0). Note that for the Hausman tests
comparing FE and RE estimators we have a standard situation in which one of the
estimators in the test statistic is consistent even if the null hypothesis does not hold.
Remarkably, the variable addition tests have fairly good power properties as well,
especially the one based on adding the number of waves an individual is partici-
pating (Et rit). The one based on including ri t_1 has only very limited power.
Concerning the Hausman tests, the one comparing the RE and FE estimator in the
unbalanced panel, which is the standard Hausman test for uncorrelated individual
effects, has the largest power of the one degree of freedom tests. In some cases it
is worthwhile to combine two restrictions and perform a two degrees of freedom
test. It should be clear from the simulation results in the table that it is well possible
that the standard Hausman (1978) specification test for testing the hypothesis that
the individual effects are uncorrelated with the explanatory variables rejects due to
the presence of selectivity bias.
Unfortunately, none of the simple tests seems to have uniformly better power
properties than the others, so we cannot recommend one particulartest. The power
of all tests seems to depend crucially on the fact whether Ho E is false or, if it is
true, why HoE is true ((E = 0 or yl = 0?). In the lattercase (yl = 0) the power
of most simple tests is quite reasonable, while it is not if (TEN= 0. In line with the
Monte Carlo results above, we are tempted to say that both the second and the third
Hausman test (RE, balanced versus unbalanced, and unbalanced, FE versus RE,
respectively) perform relatively well and may be a good choice in applied work.
The best choice for a variable addition test seems to be to include Et rit in the
structural equation.
So far, we have only considered numerical analyses for a three wave panel (T =
3). If T increases, the number of individuals in the balanced subpanel (keeping all
parameters fixed) will decrease, which may increase the differences found between
the estimators from the balanced and the unbalanced panel. Moreover, the
difference between the fixed effects estimator and the random effects estimator for
a given sample will get smaller, since the weight of the between estimator in the
random effects estimator is inversely related with T (compare Hsiao 1986, p. 36).
This suggests that the power of the Hausman tests comparing estimators from the
balanced and unbalanced panel will increase with T and that of the standard
Hausman specification tests will decrease with T. For larger T the second Hausman

This content downloaded from 132.211.1.50 on Fri, 21 Aug 2015 12:53:36 UTC
All use subject to JSTOR Terms and Conditions
TESTING FOR SELECTIVITY BIAS 699

test (comparing the random effects estimators from the balanced and unbalanced
panel) is probably the most attractive way to test hypothesis Ho E.

7. CONCLUDINGREMARKS

In this paper we suggested several simple tests to check the presence of selective
nonresponse in a panel data model. We considered the selectivity bias of the fixed
and random effects estimators and showed that the FE estimator is more robust to
nonresponse biases than the RE estimator. Several simple Hausman tests have
been suggested which are based on the differences in the pseudo true values of
these estimators. Furthermore, some variable addition tests are proposed which
can be used to test for selectivity bias. Neither of these tests requires estimation of
the model under selectivity nor a specification of the response mechanism.
Our theoretical results show that the conditions for consistency of a fixed effects
estimator are weaker than that for a consistent random effects estimator. In
addition, a Monte Carlo study shows that the bias of the FE estimator is likely to
be smaller than that of the RE estimator in cases where both estimators are
inconsistent. The numerical results also indicate that the bias resulting from a
balanced sub-panel is likely to be smaller than that from the unbalanced panel.
Although the proposed Hausman and variable addition tests have poor power
properties in some cases, they may be a good instrument for checking the
importance of the selectivity problem. In particular, when response is partly determined by an individual effect that is correlated with the regressor, the power of several Hausman tests and variable addition tests is quite reasonable in
comparison with the Lagrange Multiplier test. For practical purposes at least two
Hausman tests can be recommended: the one comparing the random effects
estimators from the balanced and unbalanced panel, and the one comparing the RE
and FE estimators in the unbalanced panel (the standard Hausman test for
correlated individual effects). A test that is even simpler is the variable addition test
including T_i = Σ_t r_it in the specification of equation (1). This test also seems to perform quite reasonably in practice.
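For the two recommended Hausman comparisons, the test statistic itself is just a quadratic form in the difference of the two coefficient vectors. A minimal sketch is given below; it assumes the two sets of estimates and covariance matrices (for example, the RE estimates from the balanced sub-panel and from the unbalanced panel) have been obtained elsewhere, and it approximates the variance of the difference by the usual difference of the covariance matrices, which need not coincide with the exact expressions derived in the paper.

    import numpy as np
    from scipy import stats


    def quasi_hausman(b1, V1, b2, V2):
        """Quasi-Hausman statistic for two estimators of the same parameter
        vector; under the null hypothesis their difference converges to zero."""
        d = np.asarray(b1) - np.asarray(b2)
        # Variance of the difference under the usual efficiency shortcut; the
        # pseudo-inverse guards against a (near-)singular difference matrix.
        Vd = np.asarray(V1) - np.asarray(V2)
        stat = float(d @ np.linalg.pinv(Vd) @ d)
        return stat, stats.chi2.sf(stat, df=len(d))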
For ease of presentation attention in this paper was restricted to the linear
regression model, although several of the tests can straightforwardly be generalized
to nonlinear models. For example, for any model that is identified from both the
unbalanced panel and the balanced sub-panel, it is possible to compute a simple
Hausman test comparing the corresponding two estimators. Moreover, adding T_i or c_i as an additional explanatory variable is possible in virtually any kind of model
and consequently, its significance can be tested straightforwardly, yielding very
simple checks for the presence of selectivity bias.

Tilburg University, The Netherlands


APPENDIX
SOME TECHNICAL DETAILS

The Derivation of (20) and (21). From (19) it is readily verified that

(29)   \begin{pmatrix} \varepsilon_{it} \\ \alpha_i \\ \xi_i + \eta_{it} \end{pmatrix} \sim N\left( \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix},\; \begin{pmatrix} \sigma_\varepsilon^2 & 0 & \sigma_{\varepsilon\eta} \\ 0 & \sigma_\alpha^2 & \sigma_{\alpha\xi} \\ \sigma_{\varepsilon\eta} & \sigma_{\alpha\xi} & \sigma_\xi^2 + \sigma_\eta^2 \end{pmatrix} \right),

which yields

(30)   E\{\varepsilon_{it} \mid \xi_i + \eta_{i1}, \ldots, \xi_i + \eta_{iT}\} = \frac{\sigma_{\varepsilon\eta}}{\sigma_\eta^2}\left[ (\xi_i + \eta_{it}) - \frac{T\sigma_\xi^2}{\sigma_\eta^2 + T\sigma_\xi^2}\,(\xi_i + \bar\eta_i) \right],

and proves (20) and (22) if we use the definition of η̄_i and take expectations conditional upon r_i1, ..., r_iT. It also follows that

(31)   E\{\alpha_i \mid \xi_i + \eta_{i1}, \ldots, \xi_i + \eta_{iT}\} = \frac{T\sigma_{\alpha\xi}}{\sigma_\eta^2}\left[ 1 - \frac{T\sigma_\xi^2}{\sigma_\eta^2 + T\sigma_\xi^2} \right] (\xi_i + \bar\eta_i),

which proves (21) after taking expectations conditional upon r_i1, ..., r_iT.
Moreover, since E{ξ_i | r_i} is fixed over time and since (suppressing the conditioning variables for notational convenience)

(32)   E\{\eta_{it} \mid r_i\} = \int \sigma_\eta\, \frac{\phi\bigl((z_{it}\gamma + \xi_i)/\sigma_\eta\bigr)}{\Phi\bigl((z_{it}\gamma + \xi_i)/\sigma_\eta\bigr)}\, f(\xi_i \mid r_i)\, d\xi_i \qquad \text{if } r_{it} = 1,

where φ and Φ are the standard normal density and distribution function, respectively, and f(ξ_i | r_i) is the conditional density of ξ_i given selection (see Ridder 1990), it is evident that there is no selectivity bias if z_itγ is constant over time, i.e. if the probability of an individual being observed is constant for all t.
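As reconstructed above, (30) is simply the linear projection of ε_it on the T selection-equation errors ξ_i + η_is. The following check, with hypothetical parameter values, verifies that projection algebra numerically.

    import numpy as np

    T, t = 4, 1                                   # panel length and the period considered
    s_eta2, s_xi2, s_epseta = 0.6, 0.4, 0.25      # hypothetical sigma_eta^2, sigma_xi^2, sigma_{eps,eta}

    # Direct projection coefficients: Cov(eps_it, w_i)' Sigma_w^{-1}, with w_is = xi_i + eta_is.
    Sigma_w = s_eta2 * np.eye(T) + s_xi2 * np.ones((T, T))
    e_t = np.eye(T)[t]
    direct = s_epseta * e_t @ np.linalg.inv(Sigma_w)

    # Coefficients implied by the closed form in (30).
    closed = (s_epseta / s_eta2) * (e_t - s_xi2 / (s_eta2 + T * s_xi2) * np.ones(T))
    print(np.allclose(direct, closed))            # True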

The Lagrange Multiplier Test Statistic for Selectivity Bias. The loglikelihood
contribution of an individual i in the full model is given by

(33)   L_i = \log\bigl[ f(r_i \mid R_i y_i)\, f(R_i y_i) \bigr],

where f(r_i | R_i y_i) is the likelihood function of a (conditional) T-variate probit model


and f(Riyi) is the likelihood function of a Ti-dimensional linear error components
model (compare Hsiao 1986, p. 38). The second term is simple and can be written
as

(34)   \log f(R_i y_i) = -\frac{T_i}{2}\log 2\pi - \frac{T_i - 1}{2}\log\sigma_\varepsilon^2 - \frac{1}{2}\log(\sigma_\varepsilon^2 + T_i\sigma_\alpha^2)
        - \frac{1}{2\sigma_\varepsilon^2}\sum_t r_{it}\bigl[(y_{it} - x_{it}\beta) - (\bar y_i - \bar x_i\beta)\bigr]^2 - \frac{T_i}{2(\sigma_\varepsilon^2 + T_i\sigma_\alpha^2)}\,(\bar y_i - \bar x_i\beta)^2.
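A quick numerical sanity check of (34), with hypothetical variance values and u denoting the T_i-vector of error components residuals y_it − x_itβ over the observed periods, is to compare the closed form with the direct multivariate normal log-density.

    import numpy as np
    from scipy.stats import multivariate_normal


    def loglik_re(u, sig_eps2, sig_alpha2):
        """Closed-form log-density of a T_i-vector of error components
        residuals, as in (34)."""
        Ti = len(u)
        ubar = u.mean()
        within = np.sum((u - ubar) ** 2)
        return (-0.5 * Ti * np.log(2 * np.pi)
                - 0.5 * (Ti - 1) * np.log(sig_eps2)
                - 0.5 * np.log(sig_eps2 + Ti * sig_alpha2)
                - within / (2 * sig_eps2)
                - Ti * ubar ** 2 / (2 * (sig_eps2 + Ti * sig_alpha2)))


    rng = np.random.default_rng(0)
    u = rng.normal(size=4)
    s_e2, s_a2 = 1.3, 0.7
    Sigma = s_e2 * np.eye(4) + s_a2 * np.ones((4, 4))
    print(np.isclose(loglik_re(u, s_e2, s_a2),
                     multivariate_normal(mean=np.zeros(4), cov=Sigma).logpdf(u)))  # True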

The first term in (33) is somewhat more complicated because we have to derive the
conditional distribution of the error term in the probit model. From (19) and
defining v_it = r_it(α_i + ε_it) (where r_it is treated as nonstochastic), the conditional expectation of the error term ξ_i + η_it is given by
(35)   E\{\xi_i + \eta_{it} \mid v_{i1}, \ldots, v_{iT}\} = r_{it}\,\frac{\sigma_{\varepsilon\eta}}{\sigma_\varepsilon^2}\left( v_{it} - \frac{\sigma_\alpha^2}{\sigma_\varepsilon^2 + T_i\sigma_\alpha^2}\sum_{s=1}^{T} v_{is} \right) + \frac{\sigma_{\alpha\xi}}{\sigma_\varepsilon^2 + T_i\sigma_\alpha^2}\sum_{s=1}^{T} v_{is} = c_{it}, \text{ say.}

Using (19) the conditional variance of ξ_i + η_it can also be derived. It is straightforward to show that the conditional distribution of ξ_i + η_it given v_i1, ..., v_iT corresponds with the (unconditional) distribution of the sum of three normal variables u_it + v_1i + r_it v_2i whose distribution is characterized by

E\{v_{1i}\} = E\{v_{2i}\} = 0, \qquad E\{u_{it}\} = c_{it},

V\{u_{it}\} = \sigma_\eta^2 - r_{it}\,\sigma_{\varepsilon\eta}^2\,\sigma_\varepsilon^{-2} = s_t^2, \text{ say,}

V\{v_{1i}\} = \sigma_\xi^2 - \sigma_{\alpha\xi}^2\, T_i\,(\sigma_\varepsilon^2 + T_i\sigma_\alpha^2)^{-1} = \omega_1^2, \text{ say,}

V\{v_{2i}\} = \sigma_{\varepsilon\eta}^2\,\sigma_\alpha^2\,\sigma_\varepsilon^{-2}\,(\sigma_\varepsilon^2 + T_i\sigma_\alpha^2)^{-1} = \omega_2^2, \text{ say,}

\mathrm{Cov}\{v_{1i}, v_{2i}\} = -\sigma_{\alpha\xi}\,\sigma_{\varepsilon\eta}\,(\sigma_\varepsilon^2 + T_i\sigma_\alpha^2)^{-1} = \omega_{12}, \text{ say,}

and all other covariances equal to zero. For notational convenience we do not explicitly add an index i to the (co)variances s_t² and ω. Note that c_it = 0, s_t² = σ_η², ω_1² = σ_ξ² and ω_2² = 0 under H_0. Like in the unconditional error components probit
model (compare Heckman 1981), the likelihood function can be written as (suppressing the conditioning variables for notational convenience)

(36)   f(r_i \mid R_i y_i) = E\left\{ \prod_{t=1}^{T} \Phi\left( d_{it}\,\frac{z_{it}\gamma + c_{it} + v_{1i} + r_{it} v_{2i}}{s_t} \right) \right\},

where the expectation is taken over v_1i and v_2i, and d_it = 2r_it − 1. It is this likelihood function that has to be differentiated w.r.t. the unknown parameters γ, σ_ξ², σ_{αξ} and σ_{εη}. However, the expectation operator depends on the unknown parameter vector θ (because the density of v_1i and v_2i is not defined with respect
to the same measure under Ho and the alternative), implying that the order of
taking expectations and differentiating is not interchangeable. This problem can
easily be solved by defining two new integration variables that are both standard
normally distributed (under the null and the alternative), τ_1 and τ_2, say. Then we obtain

(37)   f(r_i \mid R_i y_i) = \int\!\!\int \prod_{t=1}^{T} \Phi\left( d_{it}\,\frac{z_{it}\gamma + c_{it} + a_{it}\tau_1 + b_{it}\tau_2}{s_t} \right) \phi(\tau_1)\,\phi(\tau_2)\, d\tau_1\, d\tau_2,

where

a_{it} = (\omega_1^2)^{1/2} + r_{it}\,\omega_{12}\,(\omega_1^2)^{-1/2} \qquad \text{and} \qquad b_{it} = r_{it}\,(\omega_2^2 - \omega_{12}^2\,\omega_1^{-2})^{1/2}.

Since f(R_i y_i) does not depend on σ_{αξ} and σ_{εη}, differentiating the log of the expression above and evaluating the result under H_0 yields the scores w.r.t. the two covariances. Using the fact that for any element ψ of the parameter vector (γ', σ_ξ², σ_{αξ}, σ_{εη})',

(38)   \frac{\partial L_i}{\partial\psi} = \frac{\partial f(r_i \mid R_i y_i)/\partial\psi}{f(r_i \mid R_i y_i)},

with

(39)   \frac{\partial f(r_i \mid R_i y_i)}{\partial\psi} = \int\!\!\int \sum_{t=1}^{T} \frac{\partial \Phi_t(\cdot)}{\partial\psi} \prod_{s=1,\, s\neq t}^{T} \Phi_s(\cdot)\; \phi(\tau_1)\,\phi(\tau_2)\, d\tau_1\, d\tau_2,

where Φ_t(·) denotes the t-th factor of the product in (37),

the score w.r.t. σ_{αξ} can easily be derived using the following equality (under H_0)

(40)   \frac{\partial \Phi_t(\cdot)}{\partial\sigma_{\alpha\xi}} = \phi\left( d_{it}\,\frac{z_{it}\gamma + \sigma_\xi\tau_1}{\sigma_\eta} \right) \frac{d_{it}}{\sigma_\eta} \left( \frac{\partial c_{it}}{\partial\sigma_{\alpha\xi}} + \tau_1\,\frac{\partial(\omega_1^2)^{1/2}}{\partial\sigma_{\alpha\xi}} \right).

Similarly, for σ_{εη} we use

(41)   \frac{\partial \Phi_t(\cdot)}{\partial\sigma_{\varepsilon\eta}} = \phi\left( d_{it}\,\frac{z_{it}\gamma + \sigma_\xi\tau_1}{\sigma_\eta} \right) \frac{d_{it}}{\sigma_\eta} \left( \frac{\partial c_{it}}{\partial\sigma_{\varepsilon\eta}} + r_{it}\,\tau_2\,\sigma_\alpha\,\sigma_\varepsilon^{-1}\,(\sigma_\varepsilon^2 + T_i\sigma_\alpha^2)^{-1/2} \right),

from which the score w.r.t. σ_{εη} under H_0 can easily be derived. Note that both τ_1 and τ_2 occur in the integrand, so that numerical integration over two dimensions
will be required.
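In practice the two-dimensional integral in (37), and hence these scores, is usually approximated by a tensor-product Gauss-Hermite rule. A minimal sketch follows; the numerical values of d_it, z_itγ, c_it, a_it, b_it and s_t are hypothetical, and only a single probit factor is shown, whereas the actual contribution puts the product over t inside the integrand.

    import numpy as np
    from scipy.stats import norm


    def gh_expectation(g, n=10):
        """Approximate E{g(tau1, tau2)} for independent standard normal tau1,
        tau2 with an n-by-n tensor-product Gauss-Hermite rule."""
        x, w = np.polynomial.hermite.hermgauss(n)
        nodes = np.sqrt(2.0) * x          # change of variables to N(0, 1)
        W = np.outer(w, w) / np.pi        # normalised product weights
        T1, T2 = np.meshgrid(nodes, nodes, indexing="ij")
        return np.sum(W * g(T1, T2))


    # One factor of the integrand in (37), with hypothetical values.
    d, zg, c, a, b, s = 1.0, 0.3, 0.1, 0.5, 0.2, 0.8
    factor = lambda t1, t2: norm.cdf(d * (zg + c + a * t1 + b * t2) / s)
    print(gh_expectation(factor))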
For the scores w.r.t. γ and σ_ξ² = 1 − σ_η² it suffices under H_0 to look at ∂f(r_i)/∂γ and ∂f(r_i)/∂σ_ξ, where (compare Heckman 1981)

(42)   f(r_i) = \int \prod_{t=1}^{T} \Phi\left( d_{it}\,\frac{z_{it}\gamma + \sigma_\xi\tau_1}{\sigma_\eta} \right) \phi(\tau_1)\, d\tau_1.
Both scores will require numerical integration over one dimension.


REFERENCES

BALTAGI, B. H., "Pooling Cross-Sections with Unequal Time-Series Lengths," Economics Letters 18
(1985), 133-136.
CHAMBERLAIN, G., "Panel Data," in Z. Griliches and M. D. Intriligator, eds., Handbook of Econometrics,
Vol. 2 (Amsterdam: North Holland, 1984), 1247-1318.
ENGLE, R. F., "Wald, Likelihood Ratio and Lagrange Multiplier Tests in Econometrics," in Z. Griliches
and M. D. Intriligator, eds., Handbook of Econometrics, Vol. 2 (Amsterdam: North Holland, 1984),
775-826.
HAUSMAN, J. A., "Specification Tests in Econometrics," Econometrica 46 (1978), 1251-1271.
AND D. A. WISE, "Attrition Bias in Experimental and Panel Data: The Gary Income Maintenance
Experiment," Econometrica 47 (1979), 455-473.
HECKMAN, J. J., "The Common Structure of Statistical Models of Truncation, Sample Selection and
Limited Dependent Variables and a Simple Estimator for Such Models," The Annals of Economic
and Social Measurement 5 (1976), 475-492.
, "Sample Selection Bias as a Specification Error," Econometrica 47 (1979), 153-161.
, "Statistical Models for Discrete Panel Data," in C. F. Manski and D. McFadden, eds., Structural
Analysis of Discrete Data with Econometric Applications (Cambridge: MIT Press, 1981), 114-178.
HOLLY, A., "A Remark on Hausman's Specification Test," Econometrica 50 (1982), 749-759.
, "Specification Tests: An Overview," in T. F. Bewley, ed., Advances in Econometrics, Fifth
World Congress, Vol. 1 (Cambridge: Cambridge University Press, 1987), 59-97.
HSIAO, C., Analysis of Panel Data (Cambridge: Cambridge University Press, 1986).
LEE, L. F., "Tests for the Bivariate Normal Distribution in Econometric Models with Selectivity,"
Econometrica 52 (1984), 843-863.
AND G. S. MADDALA, "The Common Structure of Tests for Selectivity Bias, Serial Correlation,
Heteroskedasticity and Non-normality in the Tobit Model," International Economic Review 26
(1985), 1-20.
MANSKI, C. F., "Anatomy of the Selection Problem," The Journal of Human Resources 24 (1989),
343-360.
, "The Selection Problem," Working Paper No. 9012, Social Systems Research Institute,
University of Wisconsin, 1990.
MIZON, G. E., "Inferential Procedures in Nonlinear Models: An Application to a UK Industrial Cross
Section Study of Factor Substitution and Returns to Scale," Econometrica 45 (1977), 1221-1242.
NIJMAN, T. E. AND M. VERBEEK, "Nonresponse in Panel Data: The Impact on Estimates of a Life Cycle
Consumption Function," mimeo, Tilburg University, 1990.
RIDDER, G., "Attrition in Multi-Wave Panel Data," in J. Hartog, G. Ridder and J. Theeuwes, eds., Panel
Data and Labor Market Studies (Elsevier: North-Holland, 1990), 45-67.
VELLA, F., "A Simple Estimator for Simultaneous Models with Censored Endogenous Regressors,"
mimeo, Rice University, 1990.
VERBEEK, M., "On the Estimation of a Fixed Effects Model with Selectivity Bias," Economics Letters
34 (1990), 267-270.
WANSBEEK, T. J. AND A. KAPTEYN, "Estimation of the Error Components Model with Incomplete
Panels," Journal of Econometrics 41 (1989), 341-361.
