Topic 5: Probability Bounds and The Distribution of Sample Statistics

Course 003: Basic Econometrics, 2011-2012
Topic 5: Probability Bounds and the Distribution of Sample Statistics

Rohini Somanathan Course 003, 2011-2012
Page 0
Rohini Somanathan
'
The Inference Problem

So far, our starting point has been a given probability space. The likelihood of dierent outcomes in this space is determined by our probability measure- weve discussed dierent types of sample spaces and a variety of measures that can be used to assign probabilities to events. Well now look at how we can generate information about the probability space by analyzing a sample of outcomes. This process is referred to as statistical inference. Inference procedures are parametric when we make assumptions about the probability space from which our sample is drawn (for example, each sample observation represents an outcome of a normally distributed random variable with unknown mean and unit variance). If we make no such assumptions our procedures are nonparametric. Well discuss parametric inference. This involves both the estimation of population parameters and testing hypotheses about them.
&
Page 1 Rohini Somanathan
'
Dening a Statistic
Denition: A function of one or more random variables that does not depend on any unknown parameters is called a statistic. Notice that: a statistic is itself a random variable weve considered several functions of random variables, whose distributions are well dened such as : Y= Y=
i=1 X n
where
X N(, 2 ). We showed that Y N(0, 1).
Xi , where each Xi has a bernoulli distribution with parameter p was shown to have a
n
binomial distribution with parameters n and p. Y=

i=1
X2 where each Xi has a standard normal distribution was shown to have a 2 distribution n i
etc... Only some of these functions of random variables are statistics ( why?) This distinction is important because statistics have sample counterparts. In a problem of estimating an unknown parameter, , our estimator will be a statistic whose value can be regarded as an estimate of . It turns out that for large samples, the distributions of some statistics, such as the sample mean, are easily identied.
&
'
Markovs Inequality
We begin with some useful inequalities which provide us with distribution-free bounds on the probability of certain events and are useful in proving the law of large numbers, one of our two main large sample theorems. Markovs Inequality: Let X be a random variable with density function f(x) such that P(X 0) = 1. Then for any given number t > 0, P(X t) E(X) t
Proof. (for discrete distributions) E(X) = x xf(x) = x<t xf(x) + xt xf(x) All terms in these summations are non-negative by assumption, so we have E(X) xf(x) tf(x) = tP(X t)
xt xt
This inequality obviously holds for t E(X) (why?). Its main interest is in bounding the probability in the tails. For example, if the mean of X is 1, the probability of X taking values bigger than 100 is less than .01. This is true irrespective of the distribution of X- this is what makes the result powerful.
&
'
Chebyshevs Inequality
This is a special case of Markovs inequality and relates the variance of a distribution to the probability associated with deviations from the mean. Chebyshevs Inequality: Let X be a random variable such that the distribution of X has a nite variance 2 and mean . Then, for every t > 0, 2 P(|X | t) 2 t or equivalently, P(|X | < t) 1 2 t2
Proof. Use Markovs inequality with Y = (X )2 and use t2 in place of the constant t. Then Y takes only non-negative values and E(Y) = Var(X) = 2 . In particular, this tells us that for any random variable, the probability that values taken by the 1 variable will be more than 3 standard deviations away from the mean cannot exceed 9 P(|X | 3) 1 9
For most distributions, this upper bound is considerably higher than the actual probability of this event.
&
'
Probability bounds ..an example

Chebyshevs Inequality can, in principle be used for computing bounds for the probabilities of certain events. In practice this is not often used because the bounds it provides are quite dierent from actual probabilities as seen in the following example: Let the density function of X be given by f(x) =
3 If t = 2 , then 1 (x). I (2 3) ( 3, 3)
3 2
In this case = 0 and 2 = 1.
3 3 Pr(|X | ) = Pr(|X| ) = 1 2 2 Chebyshevs inequality gives us while our bound is

1 4. 1 t2
3 2
1 3 = .13 dx = 1 2 2 3
4 9
which is much higher. If t = 2, the exact probability is 0,
&
'
The sample mean and its properties

Our estimate for the mean of a population is typically the sample mean. We now dene this formally and derive its distribution. We will further justify the use of this estimator when we move on to discuss estimation. Denition: Suppose the random variables X1 , . . . , Xn form a random sample of size n from a distribution with mean and variance 2 . The arithmetic average of these sample observations, Xn is known as the sample mean: Xn = 1 (X1 + + Xn ) n
Since Xn is the sum of i.i.d. random variables, it is also a random variable E(Xn ) =
1 n n
E(Xi ) =
i=1
1 n .n
=
1 n2 n
Var(Xn ) =
n 1 Var( Xi ) n2 i=1
Var(Xi ) =
i=1
1 n2 n2
2 n
Weve therefore learned something about the distribution of a mean of a sample: Its expectation is equal to that of the population. It is more concentrated around its mean value than was the original distribution. The larger the sample, the lower the variance of Xn .
&
'
Sample size and precision of the sample mean

We can use Chebyshevs Inequality to ask how big a sample we should take, if we want to ensure a certain level of precision in our estimate of the sample mean. Suppose the random sample is picked from a distribution which unknown mean and variance equal to 4 and we want to ensure an estimate which is within 1 unit of the real mean with probability .99. So we want Pr(|X | 1) .01. Applying Chebyshevs Inequality we get Pr(|X | 1) n = 400.
4 n.
Since we want
4 n
= .01 we take
This calculation does not use any information on the distribution of X and therefore often gives us a much larger number than we would get if this information was available. Example:
each Xi followed a bernoulli distribution with p = 1 , then the total number of successes T = 2
n
Xi follows
i=1
T a binomial, Xn = n , E(T ) = n and Var(T ) = n 2 4 wed like our sample mean to lie within .1 of the population mean, i.e. in the interval [.4, .6], with probability equal to .7. Using Chebyshevs Inequality, we have n 25 P(.4 Xn .6) = P(.4n < T < .6n) = P(|T n | .1n) 1 2 = 1 n . This gives us n = 84. 2 4(0.1n)
If we compute these probabilities directly from the binomial distribution, we get F(9) F(6) = .7 when n = 15, so if we knew that Xi followed a Bernoulli distribution we would take this sample size for the desired level of precision in our estimate of Xn .
&
'
Convergence of Real Sequences

We would like our estimate of the sample mean (and other parameters) to be well-behaved. What does this mean? One desirable property is that it gets closer to the parameter that we are trying to estimate as our sample gets larger. Were going to make precise this notion of getting closer. Recall that a sequence is just a function from the set of natural numbers N to any set A 1 (Examples: yn = 2n , yn = n ) A real number sequence {yn } converges to y if for every > 0, there exists N( ) for which n N( ) = |yn y| < . In such as case, we say that {yn } y Which of the above functions converge? If we have a sequence of functions {fn }, the sequence is said to converge to a function f if fn (x) f(x) for all x in the domain of f. In the case of matrices, a sequence of matrices converges if each the sequences formed by (i, j)th elements converge, i.e. Yn [i, j] Y[i, j].
&
'
Sequences of Random Variables

A sequence of random variables is a sequence for which the set A is a collection of random variables and the function dening the sequence puts these random variables in a specic order. Examples: Yn =
1 n n i=1
Xi , where Xi N(, 2 ), or Yn =
n i=1
Xi , where Xi Bernoulli(p)
We now need to modify our notion of convergence, since the sequence {Yn } no longer denes a given sequence of real numbers, but rather, many dierent real number sequences, depending on the realizations of X1 , . . . , Xn . Convergence questions can no longer be veried unequivocally since we are not referring to a given real sequence, but they can be assigned a probability of occurrence based on the probability space for random variables involved. There are several types of random variable convergence discussed in the literature. Well focus on two of these: Convergence in Distribution Convergence in Probability
&
'
Convergence in Distribution
Denition: Let {Yn } be a sequence of random variables, and let {Fn } be the associated sequence of cumulative distribution functions. If there exists a cumulative distribution function F such that Fn (y) F(y) y at which F is continuous, then F is called the limiting CDF of {Yn }. Letting Y have the distribution function F, we say that Yn converges in distribution to the random variable Y and denote this by Yn Y. The notation Yn F is also used to denote Yn Y F Convergence in distribution holds if there is convergence in the sequence of densities (fn (y) f(y)) or in the sequence of MGFs (MYn (t) MY (t)). In some cases, it may be easier to use these to show convergence in distribution. Result: Let Xn X, and let the random variable g(X) be dened by a function g(.) that is continuous, except perhaps on a set of points assigned probability zero by the probability distribution of X. Then g(Xn ) g(X) Example: Suppose Zn Z N(0, 1), then 2Zn + 5 2Z + 5 N(5, 4) (why?)
d d d d d d d
&
'
Convergence in Probability
This concept formalizes the idea that we can bring the outcomes of the random variable Yn arbitrarily close to the outcomes of the random variable Y for large enough n. Denition: The sequence of random variables {Yn } converges in probability to the random varaible Y i limn P(|yn y| < ) = 1
p
>0
We denote this by Yn Y or plim Yn = Y. This justies using outcomes of Y as an approximation for outcomes of Yn since the two are very close for large n. Notice that while convergence in distribution is a statement about the distribution functions of Yn and Y whereas convergence in probability is a statement about the joint density of outcomes, yn and y. Distribution functions of very dierent experiments may be the same: an even number on a fair die and a head on a fair coin have the same distribution function, but the outcomes of these random variables are unrelated. Therefore Yn Y implies Yn Y In the special case where Yn c, we also have p Yn c and the two are equivalent.
p d d
&
'
Properties of the plim operator

plimAXn = AplimXn the plim of a sum is the sum of the plims the plim of a product is the product of the plims Example: Yn = (2 +
1 n )X + 3
and X N(1, 2). Using the properties of the plim operator, we have N(5, 8) Since convergence in probability implies
plim Yn = plim(2 + convergence in
1 n )X + plim(3) = 2X + 3 d distribution, Yn N(5, 8)
&
'
The Weak Law of Large Numbers

Consider now the convergence of the random variable sequence whose nth term is given by: 1 Xn = n
n
Xi
i=1
WLLN: Let {Xn } be a sequence of i.i.d. random variables with nite mean and variance 2 . Then Xn . Proof. Using Chebyshevs Inequality, P(|X | < ) 1 Hence
n p
2 n 2
lim P(|X | < ) = 1 or plimX =
The WLLN will allow us to use the sample mean as an estimate of the population mean, under very general conditions.
&
'
Central Limit Theorems

Central limit theorems specify conditions under which sequences of random variables converge in distribution to known families of distributions. These are very useful in deriving asymptotic distributions of test statistics whose exact distributions are either cumbersome or dicult to derive. There are a large number of theorems which vary by the assumptions they place on the random variables -scalar or multivariate, dependent or independent, identically or non-indentically distributed. The Lindberg-Levy CLT: Let {Xn } be a sequence of i.i.d. random variables with EXi = and var(Xi ) = 2 (0, ) i. Then Xn
n
n(Xn ) d = N(0, 1)
&
'
Lindberg-Levy CLT..applications
Approximating Binomial Probabilities via the Normal Distribution: Let {Xn } be a sequence of i.i.d. Bernoulli random variables. Then, by the LLCLT:
n
Xi np
i=1
np(1 p)
n
N(0, 1) and
i=1
Xi N(np, np(1 p))
In this case,
i=1
Xi is of the form aZn + b with a =
np(1 p) and b = np and since Zn

n
converges to Z in distribution, the asymptotic distribution

i=1
Xi is normal with mean and
variance given above (based on our results on normally distributed variables). Approximating 2 Probabilities via the Normal Distribution: Let {Xn } be a sequence of i.i.d. chi-square random variables with 1 degree of freedom. Using the additivity property
n
of variables with gamma distributions, we have distribution is and its variance is by the LLCLT:
n i=1
Xi 2 Recall that the mean of gamma n random variable, =

v 2
2 .
For
i=1 a 2 n
and = 2. Then,
Xi n d N(0, 1) and 2n
n i=1
Xi N(n, 2n)
&

Topic 5: Probability Bounds and The Distribution of Sample Statistics

Uploaded by

Copyright:

Available Formats

Topic 5: Probability Bounds and The Distribution of Sample Statistics

Uploaded by

Document Information

Original Description:

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Topic 5: Probability Bounds and The Distribution of Sample Statistics

Uploaded by

Copyright:

Available Formats

Course 003: Basic Econometrics, 2011-2012