Lecture - 9 EstimationRM (ECON 1005 2011-2012)

ECON 1005
INTRODUCTION TO STATISTICS
ESTIMATION
Introduction
• In the last two lectures, we discussed the characteristics and
properties of the probability distributions of random variables
• These characteristics were the parameters:

– n and p for the Binomial Distribution
– λ for the Poisson Distribution
– μ and σ for the Normal Distribution
• In the real world we frequently do not know the values of these parameters and will have to estimate them.
• Three words are central to this chapter: estimation (the process), estimate (the result) and estimator (the facilitator, i.e. the formula applied to the sample data).
Three approaches to estimating unknown population parameters.
First Approach -
• Perform a complete enumeration of the population
(also known as a census) and calculate the mean and
variance from the dataset so derived. Unfortunately:
– It can be expensive
– It can be time consuming
– It consumes large quantities of resources
– It can be destructive to the elements of the population
– It may yield a level of accuracy that is not cost effective
when compared with the results of an appropriately sized
sample.
Second Approach -
• Guess the average value or proportion from our knowledge of the population.
Unfortunately, this approach:
– is very unscientific
– may produce results that vary widely
– leaves the researcher with no method of judging
how close the ‘guesstimate’ is to the real value of
the population parameter.
Third Approach -
• The preferred method: draw a random sample of appropriate size from the population, then use the sample data and a suitable formula (called a sample statistic) to estimate the unknown population parameter.
Definition of Estimation
Estimation, then, is the process of estimating the value of an unknown population parameter using the data from a random sample drawn from that population.
THE ESTIMATION PROCESS
• Identify the Unknown Population Parameter
• Decide on the Size of the Random Sample - n
• Select the Random Sample of Size n
• Choose an Appropriate Sample Statistic [Estimator]
• Substitute the Sample Data into the Sample Statistic
• Calculate the estimate and interpret
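The steps of the estimation process can be sketched in Python. This is a minimal illustration under stated assumptions, not a prescribed procedure: the population of 2,000 student ages is hypothetical (generated at random so that the "unknown" mean can be printed for comparison), and the sample mean is used as the estimator.

```python
import random
import statistics

random.seed(1)

# Hypothetical population of 2,000 student ages (illustration only, not lecture data).
population = [random.randint(17, 30) for _ in range(2000)]

# Step 1: the unknown parameter is the population mean age.
# Step 2: decide on the sample size n.
n = 100

# Step 3: select a simple random sample of size n (without replacement).
sample = random.sample(population, n)

# Step 4: choose an estimator; here, the sample mean.
estimator = statistics.mean

# Steps 5 and 6: substitute the sample data and calculate the estimate.
estimate = estimator(sample)

print(f"point estimate of the mean age: {estimate:.2f}")
print(f"population mean (unknown in practice): {statistics.mean(population):.2f}")
```

In a real study only the sample is available; printing the population mean here simply shows how close one estimate lands.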

Two Types of Estimates
Suppose we seek to estimate the mean age of
Level I students on the Campus. We may draw
a random sample of 100 Level I students from
the Campus, record their ages, substitute the
100 values into the formula for the mean of a
sample (also called the sample statistic or
estimator), and read off the estimate.
The resulting estimate can be a single value, e.g. 20, or an interval of values, e.g. (18, 22).
Two Types of Estimates
• The single-valued estimate of an unknown population parameter derived through estimation is called a point estimate.
• On the other hand, an estimate consisting of an interval of values derived through estimation is called an interval estimate of the unknown population parameter.
Estimators
• How do we use the data from our random
sample to arrive at an estimate?
• We substitute the sample data into a formula
better known as a sample statistic.
• These sample statistics are called estimators.
• A point estimator for an unknown population
parameter is a sample statistic into which the
data from the random sample is substituted,
so as to yield a point estimate of that
parameter.
Commonly Used Point Estimators
Population Parameter     Sample Statistic (Point Estimator)
Mean μ                   Sample Mean x̄
                         Sample Median
                         Sample Mode
Std Deviation σ          Sample Std Deviation s
Proportion p             Sample Proportion p̂
Example
The mean and standard deviation of the teaching experience of faculty members in a department at a University are unknown. A random sample of 5 faculty members was selected; their teaching experience, in years, was as follows: 7 8 14 7 20
• Identify suitable point estimators for the mean teaching experience of the entire faculty
• Identify suitable point estimators for the standard deviation of teaching experience of the entire faculty
• Find a point estimate of the mean teaching experience of the entire faculty
• Find a point estimate of the std deviation of the teaching experience of the entire faculty.
Solution
1. We can use any of three point estimators to estimate the population mean, namely the sample mean, the sample median or the sample mode.
2. We can use the sample standard deviation (s) as the point estimator for the population standard deviation.
Using the three estimators listed in 1. above, we obtain three point estimates:
– Sample Mean = 1/5 (7 + 8 + 14 + 7 + 20) = 11.2
– Sample Mode = 7
– Sample Median = 8 (the middle value of the ordered data 7, 7, 8, 14, 20)
– The point estimate of the population standard deviation is the value of s, computed from the deviations x − x̄ = −4.2, −3.2, 2.8, −4.2, 8.8 with divisor n − 1 = 4:
$s = \sqrt{\tfrac{1}{4}\left[(-4.2)^2 + (-3.2)^2 + 2.8^2 + (-4.2)^2 + 8.8^2\right]} = \sqrt{\tfrac{130.8}{4}} = \sqrt{32.7} \approx 5.718$
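The four point estimates can be checked with Python's standard `statistics` module. This is a small sketch for verification only; note that `statistics.stdev` uses the n − 1 divisor, matching the formula for s.

```python
import statistics

# Teaching experience (years) of the 5 sampled faculty members.
experience = [7, 8, 14, 7, 20]

mean_est = statistics.mean(experience)      # 11.2
median_est = statistics.median(experience)  # 8
mode_est = statistics.mode(experience)      # 7
s_est = statistics.stdev(experience)        # sample SD with divisor n - 1

print(mean_est, median_est, mode_est, round(s_est, 3))
```

Three different estimators of μ give three different point estimates from the same sample.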
Some Realities
• Since we must estimate population parameters from
samples, it is inevitable that we make errors.
• Different sample sizes can give rise to different point estimates when the same estimator is used.
• Different estimators can give rise to different point estimates when the same sample is used.
• Different estimators and different sample sizes can give rise to different point estimates.
• Some estimates will agree with the true value of the population parameter; others will not.
Error in Estimation
• The difference between the point estimate and the true
value of the population parameter is known as the total
error in the estimate.
• This total error between the point estimate and the true
value of the population parameter can be the result of both
sampling error and non-sampling error.
• Sampling error occurs by chance: a random sample rarely mirrors the population exactly.
• Other errors arise from human mistakes rather than chance, and they distort the results. Such errors are called non-sampling errors.
• TOTAL ERROR IN THE ESTIMATE = SAMPLING ERROR + NON-SAMPLING ERROR
Sources of Non-Sampling Error
There are many sources of non-sampling error. Some of these sources are:
• Inability to obtain all the required information from
all elements of the sample
• Difficulties in defining terms
• Differences in interpretation of questions
• Errors in the data collection such as in recording or
coding
• Errors made in the data tabulation activity.
Example
Consider a history class of five students. Their exam scores were 70, 78, 80,
80 & 95.
• Find the population mean. (μ = 80.6)
Suppose that a random sample of three students was drawn, with scores 70, 80 & 95.
• Use the sample data and the sample mean to estimate the population mean. (x̄ = 81.67)
• What is the difference due to chance? (81.67 − 80.6 = 1.07)
Now suppose that we mistakenly recorded 82 instead of 80.
• What would be the new estimate of the population mean? (x̄ = 82.33)
• What is the new difference between the population mean and the point estimate? (82.33 − 80.6 = 1.73)
Example (cont’d)
• It is this difference of 1.73 that we call the total error in
the estimate. It is subdivided into two components:
– The sampling error of 1.07
– The non-sampling error of 0.66
• As this error grows, the sample statistic will become less useful as an estimator of the population parameter.
• We must therefore be able to determine the impact of the error on the inferences that we will be making by subjecting the estimators to specific tests. These are discussed in the next chapter.
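The error decomposition in the history-class example can be reproduced exactly with Python's `fractions.Fraction`, which avoids rounding. This sketch shows that the sampling error is 16/15 ≈ 1.067, the non-sampling error is 2/3 ≈ 0.667 (the 2-point recording mistake divided by n = 3) and the total error is 26/15 ≈ 1.733; the 0.66 quoted above comes from subtracting the already-rounded figures 1.73 − 1.07.

```python
from fractions import Fraction
from statistics import mean

# Exam scores of the whole history class (the population).
population = [Fraction(x) for x in (70, 78, 80, 80, 95)]
mu = mean(population)                      # 403/5 = 80.6

# The sample as it really is, and as it was (mis)recorded.
actual_sample = [Fraction(x) for x in (70, 80, 95)]
recorded_sample = [Fraction(x) for x in (70, 82, 95)]   # 80 recorded as 82

clean_estimate = mean(actual_sample)       # 245/3, about 81.67
recorded_estimate = mean(recorded_sample)  # 247/3, about 82.33

sampling_error = clean_estimate - mu                     # due to chance alone
non_sampling_error = recorded_estimate - clean_estimate  # due to the recording mistake
total_error = recorded_estimate - mu

print(f"sampling error:     {float(sampling_error):.4f}")
print(f"non-sampling error: {float(non_sampling_error):.4f}")
print(f"total error:        {float(total_error):.4f}")
```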
Unbiased Point Estimators
SAMPLING DISTRIBUTION OF THE MEAN
Return to our population of history scores for the class comprising five
students A, B, C, D and E.
A = 70, B = 78, C = 80, D = 80, E = 95
Population Mean = 80.6 Population Std Deviation = 8.09
We will now perform the following activities.
1. Consider all possible samples of three scores drawn without replacement from this population; there are 5C3 = 10 such samples.
2. Compute the sample mean for each of the 10 samples.
3. Construct the Frequency Distribution of Sample Means.
4. Construct the Relative Frequency Distribution of Sample Means.
5. Rename Relative Frequency as Probability to create the Probability Distribution of the Sample Means
1 & 2. Generating All 10 Possible Samples of Size 3
Sample   Scores in the Sample   Sample Mean x̄
ABC      70, 78, 80             76.00
ABD      70, 78, 80             76.00
ABE      70, 78, 95             81.00
ACD      70, 80, 80             76.67
ACE      70, 80, 95             81.67
ADE      70, 80, 95             81.67
BCD      78, 80, 80             79.33
BCE      78, 80, 95             84.33
BDE      78, 80, 95             84.33
CDE      80, 80, 95             85.00
3. The Frequency Distribution of Sample Means
x̄ f
76.00 2
76.67 1
79.33 1
81.00 1
81.67 2
84.33 2
85.00 1
∑f = 10
4. The Relative Frequency Distribution of
Sample Means
x̄ Relative Frequency
76.00 0.2
76.67 0.1
79.33 0.1
81.00 0.1
81.67 0.2
84.33 0.2
85.00 0.1
∑Rel. Freq. = 1
5. The Probability Distribution of Sample Means
(or The Sampling Distribution of the Mean)
x̄ Probability
76.00 0.2
76.67 0.1
79.33 0.1
81.00 0.1
81.67 0.2
84.33 0.2
85.00 0.1
∑Probability = 1
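Steps 1 to 5 can be sketched in a few lines of Python: enumerate every sample of size 3 with `itertools.combinations`, tally the sample means, and convert the frequencies to probabilities. Exact fractions are used so that, for example, 76.67 is kept as 230/3 rather than a rounded decimal.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations

scores = {"A": 70, "B": 78, "C": 80, "D": 80, "E": 95}

# Steps 1 and 2: all samples of size 3 drawn without replacement, with their means.
sample_means = {
    "".join(names): Fraction(sum(scores[k] for k in names), 3)
    for names in combinations(sorted(scores), 3)
}

# Step 3: frequency distribution of the sample means.
freq = Counter(sample_means.values())

# Steps 4 and 5: relative frequency, read as probability since every sample is equally likely.
total = sum(freq.values())
prob = {m: Fraction(f, total) for m, f in freq.items()}

for m in sorted(prob):
    print(f"{float(m):6.2f}   f = {freq[m]}   P = {float(prob[m]):.1f}")
```

The printed rows reproduce the frequency and probability tables, including ∑Probability = 1.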
Sampling Distributions in this Course
• In general, the probability distribution of a
Sample Statistic is called its sampling distribution.
• We will focus on two sampling distributions:
– Sampling Distribution of the Mean
– Sampling Distribution of the Proportion
• In the Sampling Distribution of the Mean, the random variable is the sample mean x̄.
• In the Sampling Distribution of the Proportion, the random variable is the sample proportion p̂.
The Mean of the Sampling Distribution of the Mean
• The mean of the sampling distribution of the mean is equal to the population mean μ; that is, the sample mean x̄ is an unbiased estimator of μ.
• Class Activity
Compute the mean of the Sampling Distribution of
the Mean History Score based on the ten random
samples of size 3.
Show that it is indeed equal to the population mean.
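One way to carry out the Class Activity is to weight each sample mean by its probability from the Sampling Distribution of the Mean and add the products (a worked computation using the rounded means from the table):

$$E(\bar{x}) = \sum \bar{x}\,P(\bar{x}) = 76.00(0.2) + 76.67(0.1) + 79.33(0.1) + 81.00(0.1) + 81.67(0.2) + 84.33(0.2) + 85.00(0.1) = 80.6 = \mu$$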
The Standard Deviation of the Sampling Distribution
of the Mean
• The Standard Deviation of the Sampling Distribution of the Mean is given by σx̄, where
$\sigma_{\bar{x}} = \dfrac{\sigma}{\sqrt{n}}$
(exactly when sampling with replacement; approximately when the population is much larger than the sample).
• σx̄ is also called the standard error, or the standard error of the mean.
• The spread of the Sampling Distribution of the Mean is smaller than the spread of the corresponding population distribution.
• The standard deviation of the Sampling Distribution of the Mean decreases as the sample size increases.
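The history-class example is a useful check on σ/√n, with one caveat: it samples without replacement from only N = 5 scores, so the exact standard deviation of the 10 sample means is smaller than σ/√n. The sketch below compares the two and applies the finite population correction √((N − n)/(N − 1)), a standard result that is not part of this lecture; with it, the formula matches the exact value.

```python
import math
import statistics
from itertools import combinations

population = [70, 78, 80, 80, 95]
N, n = len(population), 3

sigma = statistics.pstdev(population)   # population SD (divisor N), about 8.09

# Exact SD of the sampling distribution: the 10 sample means are equally likely.
means = [statistics.mean(c) for c in combinations(population, n)]
se_exact = statistics.pstdev(means)

se_formula = sigma / math.sqrt(n)       # sigma / sqrt(n)
fpc = math.sqrt((N - n) / (N - 1))      # finite population correction
se_corrected = se_formula * fpc

print(f"sigma                  = {sigma:.4f}")
print(f"SD of the sample means = {se_exact:.4f}")
print(f"sigma / sqrt(n)        = {se_formula:.4f}")
print(f"sigma / sqrt(n) * FPC  = {se_corrected:.4f}")
```

For the large populations usually sampled in practice, the correction factor is close to 1 and σ/√n can be used as stated.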
What kind of distribution will the Sampling
Distribution of the Mean have?
• If the population from which the samples are drawn is normally distributed with mean μ and standard deviation σ, then the Sampling Distribution of the Mean will also be normally distributed with mean μ and standard deviation σx̄ (irrespective of the sample size).
• Does the above result hold true if the population is not normally distributed?
What kind of Probability Distribution does the Sampling
Distribution of the Mean possess when the population is not
Normal ?
The Central Limit Theorem assures us that
• If the sample size is large, the Sampling Distribution of the Mean will be approximately normally distributed with mean μ and standard deviation σx̄ = σ/√n, irrespective of the distribution of the population.
• As a rule of thumb, large is taken to mean n ≥ 30.
• What happens when the sample size is small, i.e. n < 30?
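The Central Limit Theorem can be illustrated by simulation. This sketch assumes a hypothetical, strongly skewed population (exponential with mean 1 and standard deviation 1), draws many samples of size n = 30, and checks that the sample means centre on μ with spread close to σ/√n, with roughly 95% of them within 1.96 standard errors of μ as a normal distribution would imply.

```python
import math
import random
import statistics

random.seed(42)

# Hypothetical, strongly right-skewed population: exponential with mean 1 and SD 1.
mu, sigma = 1.0, 1.0
n = 30          # a "large" sample under the n >= 30 rule of thumb
reps = 10_000   # number of repeated random samples

sample_means = [
    statistics.fmean(random.expovariate(1.0) for _ in range(n))
    for _ in range(reps)
]

mean_of_means = statistics.fmean(sample_means)
sd_of_means = statistics.stdev(sample_means)
se = sigma / math.sqrt(n)

# Share of sample means within 1.96 standard errors of mu (about 0.95 if close to normal).
inside = sum(abs(m - mu) <= 1.96 * se for m in sample_means) / reps

print(f"mean of sample means: {mean_of_means:.3f} (CLT: {mu:.3f})")
print(f"SD of sample means:   {sd_of_means:.3f} (CLT: {se:.3f})")
print(f"share within 1.96 SE: {inside:.3f}")
```

The choice of population and seed is arbitrary; repeating the run with a smaller n (say 3) makes the skewness of the sample means much more visible.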
Example
• The population mean and standard deviation of the final exam grades of ECON1005 students are known to be 65 and 20 respectively. If a sample of 100 students is drawn at random, what is the probability that the sample mean will be greater than 70?
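A sketch of the calculation for this example, using Python's `statistics.NormalDist` in place of a normal table: since n = 100 ≥ 30, the sample mean is treated as approximately normal with mean 65 and standard error 20/√100 = 2, so z = (70 − 65)/2 = 2.5.

```python
import math
from statistics import NormalDist

mu, sigma, n = 65, 20, 100
se = sigma / math.sqrt(n)          # standard error: 20 / 10 = 2
z = (70 - mu) / se                 # z = (70 - 65) / 2 = 2.5

# n = 100 >= 30, so by the CLT the sample mean is approximately N(65, 2).
p = 1 - NormalDist(mu, se).cdf(70)
print(f"z = {z:.2f}, P(sample mean > 70) = {p:.4f}")
```

The result, about 0.0062, agrees with 1 − Φ(2.5) read from a standard normal table.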
What kind of Probability Distribution does the Sampling Distribution of the Mean possess when σ is unknown and the sample size is small, i.e. n < 30?
• We must look to the Student t Distribution.
• The Student t Distribution is a bell-shaped distribution, symmetric about 0, with a lower peak and a wider spread than the Standard Normal Distribution.
• The Student t Distribution has only one parameter, the number of degrees of freedom, abbreviated df.
• The number of degrees of freedom is the number of observations that can be freely chosen.
• The mean of the Student t Distribution is 0 (for df > 1).
• The standard deviation of the Student t Distribution is √(df/(df − 2)) for df > 2; its variance is df/(df − 2).
• As the degrees of freedom increase, the Student t Distribution approaches the Standard Normal Distribution.
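A small sketch of how the spread of the t distribution depends on df: its variance is df/(df − 2) and its standard deviation is the square root of that, finite only for df > 2 (a standard result). The helper name `t_sd` is hypothetical, chosen for this illustration.

```python
import math

def t_sd(df: int) -> float:
    """Standard deviation of the Student t distribution; finite only for df > 2."""
    if df <= 2:
        raise ValueError("the t standard deviation is finite only for df > 2")
    return math.sqrt(df / (df - 2))

# The spread shrinks towards 1 (the standard normal SD) as df grows.
for df in (3, 5, 10, 30, 100, 1000):
    print(f"df = {df:4d}: SD = {t_sd(df):.4f}")
```

The values are always above 1, which is why the t curve is wider and flatter than the standard normal curve.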
• If the population from which the samples are drawn is normally distributed with mean μ, but its standard deviation σ is unknown and is replaced by the sample standard deviation s, then the standardized sample mean follows the Student t Distribution with n − 1 degrees of freedom. (For small samples from a clearly non-normal population, neither the Normal nor the t result is guaranteed.)
• The random variable of the Student t Distribution is given by T, where
$T = \dfrac{\bar{x} - \mu}{s/\sqrt{n}}$
Activity
• The mean age of all diabetes patients in Tobago is 50, with a standard deviation of 15. If a sample of 25 patients is drawn, determine the probability that the mean age of these patients will be less than 60 years.
The Sampling Distribution of Proportion
• The probability distribution of the sample
proportion is called the Sampling Distribution
of the Proportion.
• The random variable of the Sampling Distribution of the Proportion is p̂.
• The mean of the Sampling Distribution of the Proportion is the population proportion p.
• The standard deviation of the Sampling Distribution of the Proportion is given by √(pq/n), where q = 1 − p.
What is the shape of the Sampling Distribution
of the Proportion?
The Central Limit Theorem assures us that
• If the sample size is sufficiently large, the Sampling Distribution of the Proportion will be approximately normally distributed with mean p and standard deviation √(pq/n).
• Sufficiently large means np > 5 and nq > 5.
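A simulation sketch of the Sampling Distribution of the Proportion, assuming a hypothetical population proportion p = 0.3 and sample size n = 100 (so np = 30 and nq = 70, both above 5). Each simulated sample is n independent yes/no draws, and p̂ is the share of "yes" outcomes.

```python
import math
import random
import statistics

random.seed(7)

p, n = 0.3, 100      # hypothetical population proportion and sample size
q = 1 - p
reps = 10_000

# Each sample: n independent trials with success probability p; p_hat = successes / n.
p_hats = [sum(random.random() < p for _ in range(n)) / n for _ in range(reps)]

mean_p_hat = statistics.fmean(p_hats)
sd_p_hat = statistics.stdev(p_hats)

print(f"np = {n * p:.0f}, nq = {n * q:.0f}")
print(f"mean of p-hat: {mean_p_hat:.4f} (theory: {p})")
print(f"SD of p-hat:   {sd_p_hat:.4f} (theory: {math.sqrt(p * q / n):.4f})")
```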

Interval Estimates: Confidence Intervals
• So far we have been discussing (unbiased) point estimators.
• Instead of assigning a single value to an unknown population parameter, we can construct an interval of values around the point estimate and make a probabilistic statement that the interval contains the value of the corresponding population parameter.
• This activity is called interval estimation, and interval estimators are called Confidence Intervals.
• These estimators, when applied to the data from a random sample, define an interval that is likely to contain the true value of the population parameter being estimated.
Interval Estimates
• The probability attached to this statement is called the confidence level; it is written as a percentage.
• An interval that is constructed based on the confidence level is called a confidence interval.
• A 90% Confidence Interval corresponds to a 10% significance level, i.e. α = 10%; in general, the confidence level is 100(1 − α)%.
• A 95% Confidence Interval corresponds to a 5% significance level, i.e. α = 5%.
• Confidence Interval Estimates in this course are as follows:
– For the population mean based on large samples (σ known)
– For the population mean based on small samples (σ known)
– For the population mean based on large samples with σ unknown
– For the population mean based on small samples with σ unknown
– For the population proportion
A 100(1 − α)% Confidence Interval Estimate for the Population Mean μ
Population Mean μ when σ is unknown.
• Let X ~ N(μ, σ) where σ is unknown. A single sample of size n was drawn and the sample mean x̄ was computed. On the basis of this single sample mean, find a 100(1 − α)% Confidence Interval Estimate for μ.
• Here we substitute s for the unknown σ.
• However, it matters whether
– n is large, i.e. n ≥ 30
– n is small, i.e. n < 30
• If n ≥ 30, the CLT allows us to use the Normal Distribution N(μ, s/√n) as the Sampling Distribution.
• If n < 30, the normality of the population allows us to use the Student t Distribution with n − 1 df as the Sampling Distribution of (x̄ − μ)/(s/√n).
Population Mean μ when σ is unknown and n ≥ 30.
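A 100(1 − α)% interval estimate for the population mean μ when n ≥ 30 and σ is unknown is given by
$\bar{x} - z_{\alpha/2}\,\dfrac{s}{\sqrt{n}} \le \mu \le \bar{x} + z_{\alpha/2}\,\dfrac{s}{\sqrt{n}}$
or
$\left(\bar{x} - z_{\alpha/2}\,\dfrac{s}{\sqrt{n}},\; \bar{x} + z_{\alpha/2}\,\dfrac{s}{\sqrt{n}}\right)$
where $z_{\alpha/2}$ comes from the Std Normal Distribution and s is the sample standard deviation.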
Population Mean μ when σ is unknown and n < 30.
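A 100(1 − α)% interval estimate for the population mean μ when n < 30 and σ is unknown is given by
$\bar{x} - t_{\alpha/2}\,\dfrac{s}{\sqrt{n}} \le \mu \le \bar{x} + t_{\alpha/2}\,\dfrac{s}{\sqrt{n}}$
or
$\left(\bar{x} - t_{\alpha/2}\,\dfrac{s}{\sqrt{n}},\; \bar{x} + t_{\alpha/2}\,\dfrac{s}{\sqrt{n}}\right)$
where $t_{\alpha/2}$ comes from the Student t Distribution with (n − 1) degrees of freedom and s is the sample standard deviation.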
Class Exercise 1
The standard deviation for a population is 14.8. A sample of 100 observations selected from this population gave a mean of 143.72.
a. Construct a 99% confidence interval for μ.
b. Construct a 95% confidence interval for μ.
c. Construct a 90% confidence interval for μ.
d. Does the width of the confidence intervals constructed in parts a. to c. decrease as the confidence level decreases? Explain.
Interpretation
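• If the experiment of drawing a random sample of 100 observations were repeated a very large number of times, and a 99% confidence interval were constructed from each sample mean, about 99% of those intervals would contain the true value of the population mean (insert what you are studying). Our sample mean of 143.72 gives the interval 139.91 to 147.53, and we are 99% confident that it is one of the intervals that contain μ.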
Answer to Class Exercise 1
• 99% CI is (139.91, 147.53)
• 95% CI is (140.82, 146.62)
• 90% CI is (141.29, 146.15)
• Notice that the width of the Confidence Interval decreases as the Confidence level decreases.
• It makes sense, right? Why?
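A sketch of the Class Exercise 1 arithmetic. Because σ = 14.8 is known here, the interval is x̄ ± z_{α/2}·σ/√n. The helper `z_interval` is a hypothetical name for this illustration, and z_{α/2} comes from Python's `statistics.NormalDist` rather than a printed table, so the last digit can differ slightly from table-based answers.

```python
import math
from statistics import NormalDist

def z_interval(xbar: float, sd: float, n: int, confidence: float) -> tuple[float, float]:
    """Normal-based CI: xbar +/- z_{alpha/2} * sd / sqrt(n)."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)   # e.g. about 2.576 for 99%
    margin = z * sd / math.sqrt(n)
    return xbar - margin, xbar + margin

for level in (0.99, 0.95, 0.90):
    lo, hi = z_interval(143.72, 14.8, 100, level)
    print(f"{level:.0%} CI: ({lo:.2f}, {hi:.2f}), width = {hi - lo:.2f}")
```

The widths shrink from about 7.62 to 5.80 to 4.87: a lower confidence level needs a smaller z_{α/2}, so the margin of error falls.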
Another Class Exercise
A sample of 10 observations taken from a
normally distributed population produced the
following data:
44 52 31 48 46 39 47 36 41 57
a. What is the point estimate of μ?
Sample Mean x̄ = 44.1 and s = 7.67
b. Construct a 95% confidence interval for μ.
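A sketch of this exercise in Python. With n = 10 < 30, a normal population and σ unknown, the interval uses t_{0.025} with 9 degrees of freedom. Python's standard library has no t quantile function, so the table value 2.262 is hard-coded; if SciPy is installed, `scipy.stats.t.ppf(0.975, 9)` should return the same value.

```python
import math
import statistics

data = [44, 52, 31, 48, 46, 39, 47, 36, 41, 57]
n = len(data)
xbar = statistics.mean(data)   # point estimate of mu: 44.1
s = statistics.stdev(data)     # sample SD (divisor n - 1): about 7.67

# t_{0.025} with n - 1 = 9 degrees of freedom, read from a t table.
T_CRIT_95_DF9 = 2.262

margin = T_CRIT_95_DF9 * s / math.sqrt(n)
lo, hi = xbar - margin, xbar + margin
print(f"xbar = {xbar:.1f}, s = {s:.2f}, 95% CI = ({lo:.2f}, {hi:.2f})")
```

This gives a 95% interval of roughly 38.62 to 49.58.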

Activity
• A sample of 49 households in Grenada yielded an average monthly income of $3000 with a standard deviation of $450. Construct an 80% confidence interval for the average monthly income of all households in Grenada, assuming the underlying population is normally distributed.
Population Proportion p.
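A 100(1 − α)% interval estimate for the population proportion p is given by
$\hat{p} - z_{\alpha/2}\sqrt{\dfrac{\hat{p}\hat{q}}{n}} \le p \le \hat{p} + z_{\alpha/2}\sqrt{\dfrac{\hat{p}\hat{q}}{n}}$
or
$\left(\hat{p} - z_{\alpha/2}\sqrt{\dfrac{\hat{p}\hat{q}}{n}},\; \hat{p} + z_{\alpha/2}\sqrt{\dfrac{\hat{p}\hat{q}}{n}}\right)$
where q̂ = 1 − p̂ (the unknown p and q in √(pq/n) are replaced by their sample estimates) and $z_{\alpha/2}$ comes from the Std Normal Distribution.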

Exercise
• In a sample of 100 patients from a clinic, 30% were found to have a history of heart disease in their family. Compute a 90% confidence interval for the proportion of all patients with a family history of heart disease.
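A sketch of this exercise: p̂ = 0.30, n = 100, and z_{0.05} for a 90% interval taken from `statistics.NormalDist` (about 1.645). The estimated standard error uses p̂ and q̂ = 1 − p̂ in place of the unknown p and q; np̂ = 30 and nq̂ = 70 are both above 5, so the normal approximation is reasonable.

```python
import math
from statistics import NormalDist

n, p_hat = 100, 0.30
q_hat = 1 - p_hat
z = NormalDist().inv_cdf(0.95)          # z_{0.05} for a 90% interval: about 1.645
se = math.sqrt(p_hat * q_hat / n)       # estimated standard error of p-hat
lo, hi = p_hat - z * se, p_hat + z * se
print(f"90% CI for p: ({lo:.4f}, {hi:.4f})")
```

The interval is roughly 0.2246 to 0.3754, i.e. between about 22.5% and 37.5% of all patients.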
Working out Confidence Intervals: Minitab Version
Exhibit I
Variable   N     Mean    Median   TrMean   StDev   SE Mean
Group A    135   51.44   51.00    51.40    9.37    *
Group B    108   51.80   53.00    52.05    **      1.168

Variable   Minimum   Maximum   Q1      Q3
Group A    18.00     71.00     44.00   57.00
Group B    13.00     81.00     43.00   61.00
Question: Assuming normality, calculate a 91% confidence interval for the mean Group B score and give an interpretation of the result obtained.
The 91% Confidence Interval for the mean Group B score is given by:
$\left(\bar{x} - z_{\alpha/2}\,\sigma_{\bar{x}},\; \bar{x} + z_{\alpha/2}\,\sigma_{\bar{x}}\right)$, where $z_{0.09/2} = z_{0.045} \approx 1.70$ and σx̄ is estimated by the SE Mean of 1.168.
Substituting our known values:
$51.8 - 1.70(1.168) < \mu < 51.8 + 1.70(1.168)$
$49.814 < \mu < 53.786$
Therefore, if random samples of size 108 were drawn a large number of times and a 91% confidence interval were computed from each sample, about 91% of those intervals would contain the true mean Group B score. Our sample (x̄ = 51.8) gives the interval 49.8 to 53.8.
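A 91% level is awkward to look up in most tables, so this sketch computes z_{0.045} with `statistics.NormalDist` instead. It gives about 1.695 rather than the rounded 1.70, so the endpoints differ from the hand calculation only in the second decimal place. The SE Mean of 1.168 is taken directly from Exhibit I.

```python
from statistics import NormalDist

xbar, se_mean = 51.80, 1.168            # Group B values from Exhibit I
confidence = 0.91
z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)   # z_{0.045}, about 1.695

lo, hi = xbar - z * se_mean, xbar + z * se_mean
print(f"z = {z:.3f}; 91% CI: ({lo:.2f}, {hi:.2f})")
```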
End of Lecture 8
• We have reviewed the Confidence Intervals
that form an integral part of the 5 stages of a
statistical analysis.
• Next we move on to another level of
investigation with respect to sample data.
• This involves Hypothesis testing.
