
Sampling CH-2


Chapter -2

Simple Random Sampling

2.1.Basic Concepts

Simple random sampling (SRS) is a method of selecting a sample of n sampling units out of a population of N sampling units such that every sampling unit has an equal chance of being chosen.

The samples can be drawn in two possible ways.

• The sampling units are chosen without replacement, in the sense that units once chosen are not placed back in the population.
• The sampling units are chosen with replacement, in the sense that the chosen units are placed back in the population.

Sampling without replacement

Sampling without replacement (wor) means that once a unit has been selected, it cannot be selected again. In other words, no unit can appear more than once in the sample. If n sample units are to be selected from a population of N units, then there are $\binom{N}{n}$ ways of selecting n units out of a total of N units without replacement, disregarding the order of the n units. Hence, simple random sampling is equivalent to the selection of one of the $\binom{N}{n}$ possible samples with an equal probability $1/\binom{N}{n}$ assigned to each sample.

In simple random sampling without replacement, the probability of a specified unit of the population being selected at any given draw is equal to the probability of its being selected at the first draw, that is, $1/N$. For a sample of size n, the sum of the probabilities of these n mutually exclusive events is $n/N$.

Sampling with replacement:

The process of sampling with replacement (wr) allows a unit to be selected on more than one draw. There are $N^n$ ways of selecting n units out of a total of N units with replacement. In this case, the order of selection is taken into account. All selections are independent, since the selected unit is returned to the population before the next selection is made. Thus, the probability is $1/N$ for any specific element on each of the n draws.

Sampling Theory | Chapter Two | Simple Random Sampling |Nigisti G.


Simple random sampling with and without replacement are practically identical if the sample size is a very small fraction of the population size. Generally, sampling without replacement yields more precise results and is operationally more convenient.
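The two ways of drawing a sample can be sketched in a few lines of Python. This is only an illustration under assumed names (`draw_srs`, `units`, and the seed values are not from the text); it uses the standard `random` module for the draws and `math.comb` for the number of unordered samples.

```python
import math
import random

def draw_srs(population, n, replace=False, seed=None):
    """Draw a simple random sample of size n from a list of units."""
    rng = random.Random(seed)
    if replace:
        # With replacement: each draw is independent, so units may repeat.
        return [rng.choice(population) for _ in range(n)]
    # Without replacement: no unit can appear more than once.
    return rng.sample(population, n)

N, n = 5, 3
units = list(range(1, N + 1))

n_samples_wor = math.comb(N, n)  # unordered samples without replacement
n_samples_wr = N ** n            # ordered samples with replacement

print(draw_srs(units, n, replace=False, seed=1))
print(draw_srs(units, n, replace=True, seed=1))
print(n_samples_wor, n_samples_wr)
```

For N = 5 and n = 3 the two counts are 10 and 125, so each without-replacement sample has probability 1/10 and each ordered with-replacement sample has probability 1/125.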

2.2.Simple Random Sample Selection Procedures

In a sample survey, selecting units by a non-random method can bias the selection procedure, because the selection is consciously or unconsciously influenced by subjective human judgment. Such bias can be avoided by using a random selection method. True randomness can be ensured only by a method of selection that cannot be affected by human influence.
There are different random sample selection methods. The important aspect of random selection in each method is that the selection of each unit is based purely on chance. This chance is known as the probability of selection, and it eliminates selection bias. If there is bias in the selection, it may prevent the sample from being representative of the population. Representative means that probability samples permit scientific approaches in which the samples give accurate estimates of the total population. We consider here two basic and common procedures of random selection.

Lottery Method: This is a very common method of taking a random sample. Under this method, we label each member of the population with an identifiable disc, ticket, or piece of paper. The discs or tickets must be of identical size, color and shape. They are placed in a container (urn/bowl) and well mixed before each draw, and then, without looking into the container, the designated labels are drawn with or without replacement. The series of draws continues until a sample of the required size is selected. This procedure ensures that the selection of each item depends entirely on chance.

For example, if we want to take a sample of 18 persons out of a population of 90 persons, the
procedure is to write the names of all the 90 persons on separate slips (tickets) of paper. The slips
(tickets) of paper must be of identical size, color and shape. The next step is to fold these slips,
mix them thoroughly and then make a blindfold selection of 18 slips one at a time without
replacement.
This lottery method becomes quite cumbersome and time consuming to use as the sizes of
sample and population increase. To avoid such problems and to reduce the labor of selection
process, another method known as a random number table selection process can be used.

Use of Random Numbers: A table of random numbers consists of digits from 0 to 9, which are
equally represented with no pattern or order, produced by a computer random number generator.

The members of the population are numbered from 1 to N and n numbers are selected from one
of the random tables in any convenient and systematic way.

The procedure of selection is outlined as follows.

• Identify the population units (N) and give them serial numbers from 1 to N. The total number N determines how many random digits we need to read at a time when selecting the sample elements. This requires the preparation of an accurate sampling frame.
• Decide the sample size (n) to be selected, which indicates how many serial numbers are to be selected.
• Select a starting point in the table of random numbers; you can start from any one of the columns, which can be determined randomly.
• Since each digit has an equal chance of being selected at any draw, you may read down the columns of digits in the table.
• Depending on the population size N, read the numbers in pairs, three at a time, four at a time, and so on.
• If a selected number is less than or equal to the population size N, it is taken as a sample serial number.
• All selected numbers greater than N are ignored.
• For sampling without replacement, reject numbers that come up a second time.
• The selection process continues until n distinct units are obtained.

For example, consider a population of size N = 5000. Suppose it is desired to take a sample of 25 items out of 5000 without replacement. Since N = 5000, we need four-digit numbers. All items should be numbered from 1 to 5000. We can start anywhere in the table and read digits four at a time. Thus, using the random number table found at the end of this chapter, if we start from column five and read down the columns, we obtain 2913, 2108, 2993, 2425, 1365, 1760, 2104, 1266, 4033, 4147, 0334, 4225, 0150, 2940, 1836, 1322, 2362, 3942, 3172, 2893, 3933, 2514, 1578, 3649, 0784, ignoring all numbers greater than 5000.
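The reading rules above can be sketched in Python. Because the chapter's random number table is not reproduced here, the sketch reads from any stream of digits; the helper names `select_from_digits` and `random_digits` and the seed are assumptions for illustration, and a pseudo-random generator stands in for the printed table.

```python
import random

def select_from_digits(digit_stream, N, n):
    """Read fixed-width numbers from a stream of digits and keep the first
    n distinct values in 1..N (sampling without replacement)."""
    width = len(str(N))              # N = 5000 -> read four digits at a time
    digits = iter(digit_stream)
    chosen = []
    while len(chosen) < n:
        block = [next(digits) for _ in range(width)]
        number = int("".join(str(d) for d in block))
        if number < 1 or number > N:
            continue                 # ignore numbers outside 1..N
        if number in chosen:
            continue                 # reject numbers that come up again
        chosen.append(number)
    return chosen

def random_digits(seed):
    """Endless stream of digits 0-9, standing in for a printed table."""
    rng = random.Random(seed)
    while True:
        yield rng.randrange(10)

sample = select_from_digits(random_digits(seed=42), N=5000, n=25)
print(sample)
```

Reading the digits 2913 9999 2913 0150 with N = 5000 and n = 2 keeps 2913, skips 9999 (too large) and the repeated 2913, and then keeps 150.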

2.3.Review of sampling distribution

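Terminologies
$N$ = Population size
$n$ = Sample size
$Y_i$ = Value of the character under study for the ith unit in the population
$y_i$ = Value of the character under study for the ith unit in the sample
$Y = \sum_{i=1}^{N} Y_i$ = Population total
$y = \sum_{i=1}^{n} y_i$ = Sample total
$\bar{Y} = \sum_{i=1}^{N} Y_i / N$ = Population mean
$\bar{y} = \sum_{i=1}^{n} y_i / n$ = Sample mean

$\sigma^2 = \dfrac{\sum_{i=1}^{N} (Y_i - \bar{Y})^2}{N}$, $S^2 = \dfrac{\sum_{i=1}^{N} (Y_i - \bar{Y})^2}{N-1}$ = Population variance

The relationship between these two variances can be established by expressing each variance in terms of the other, i.e., $\sigma^2 = \dfrac{N-1}{N} S^2$ or $S^2 = \dfrac{N}{N-1}\sigma^2$.

$s^2 = \dfrac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n-1}$ = Sample variance, and its square root, denoted by $s$, is the standard deviation of the sample elements.

$\sigma_{XY} = \dfrac{\sum_{i=1}^{N} (X_i - \bar{X})(Y_i - \bar{Y})}{N}$, or $S_{XY} = \dfrac{\sum_{i=1}^{N} (X_i - \bar{X})(Y_i - \bar{Y})}{N-1}$, is the covariance of the random variables X and Y.

$\rho = \dfrac{\sigma_{XY}}{\sigma_X \sigma_Y}$ or $\dfrac{S_{XY}}{S_X S_Y}$ is the population correlation coefficient.

$s_{xy} = \dfrac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{n-1}$ is the sample covariance.

$r = \dfrac{s_{xy}}{s_x s_y}$ is the sample correlation coefficient.

$f = n/N$ is the sampling fraction.

$w = N/n$ is the sampling weight (expansion factor).

CV = coefficient of variation

cv = estimated coefficient of variation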

The sample statistics are computed from the results of sample surveys since the primary
objective of a sample survey is to provide estimates of the population parameters, because the
reality shows that almost all population parameters are unknown.

Theorem 1: Prove that the probability of selecting a specified unit of the population at any given
draw is equal to the probability of its being selected at the first draw.

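Proof: In simple random sampling an equal probability of selection is assigned to each unit of the population at the first draw. Thus, in SRS from a population of N units, the probability of drawing any unit at the first draw is $1/N$, the probability of drawing any unit at the second draw from among the remaining $N-1$ units is $1/(N-1)$, and so on.

Let $y_i$ be the event that a specified unit is selected at the ith draw. Then $P(y_i)$ = Prob.{the specified unit is not selected at any one of the previous $(i-1)$ draws and is then selected at the ith draw}

$$P(y_i) = \left(1 - \frac{1}{N}\right)\left(1 - \frac{1}{N-1}\right)\cdots\left(1 - \frac{1}{N-i+2}\right)\frac{1}{N-i+1}$$

$$= \frac{N-1}{N}\cdot\frac{N-2}{N-1}\cdots\frac{N-i+1}{N-i+2}\cdot\frac{1}{N-i+1} = \frac{1}{N}$$

That means $P(y_i) = P(y_1) = 1/N$ for every draw i.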

Theorem 2: The probability that a specified unit is selected in the sample of size n is $n/N$.
Proof: A specified unit can be included in a sample of size n in n mutually exclusive ways: it can be selected at the ith draw (i = 1, 2, …, n), and the probability that it is selected at the ith draw is $P(y_i) = 1/N$ for each i.

Therefore, the probability that a specified unit is included in the sample is the sum of the probabilities of inclusion at the 1st draw, 2nd draw, …, nth draw. Thus, by the addition theorem of probability, we get

$$P\left(\bigcup_{i=1}^{n} y_i\right) = \sum_{i=1}^{n} P(y_i) = \sum_{i=1}^{n} \frac{1}{N} = \frac{n}{N}$$
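Theorems 1 and 2 can be checked by brute force on a small population. The sketch below is an illustration only; the function names are assumptions. It enumerates every equally likely ordered sample for the draw-by-draw probability and every unordered sample for the inclusion probability, using exact fractions.

```python
from fractions import Fraction
from itertools import combinations, permutations

def prob_selected_at_draw(N, n, unit, draw):
    """P(unit is selected at the given draw) under SRSWOR, found by
    enumerating all equally likely ordered samples."""
    ordered = list(permutations(range(1, N + 1), n))
    hits = sum(1 for sample in ordered if sample[draw - 1] == unit)
    return Fraction(hits, len(ordered))

def prob_included(N, n, unit):
    """P(unit is in the sample) under SRSWOR, found by enumerating
    all equally likely unordered samples."""
    unordered = list(combinations(range(1, N + 1), n))
    hits = sum(1 for sample in unordered if unit in sample)
    return Fraction(hits, len(unordered))

for draw in (1, 2, 3):
    print(draw, prob_selected_at_draw(N=5, n=3, unit=2, draw=draw))
print(prob_included(N=5, n=3, unit=2))
```

With N = 5 and n = 3, each draw gives 1/5 and the inclusion probability is 3/5, in line with $1/N$ and $n/N$.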

Theorem 3: The number of possible samples of size n from a population of size N, if sampling is done with replacement, is $N^n$.
Proof: The first unit can be drawn from N units in N ways. Similarly, the second unit can also be drawn in N ways, because the first selected unit is returned to the population, and so on up to the selection of the nth unit. Thus, the total number of ways is

$$N \times N \times \cdots \times N \ (n \text{ times}) = N^n$$

The probability of drawing any one sample is $1/N^n$.

Example: For demonstration purpose we will consider a very small hypothetical population of 5
farmers, who use fertilizer in their farming. Suppose the amount of fertilizer used (in kg) by each
farmer is 70, 78, 80, 80, and 95. Then, the following parameters of the population and sample
values (statistics) are computed to justify the basic idea behind estimation.

Let $Y_i$ denote the amount of fertilizer used by the ith farmer (i = 1, 2, …, 5). The population size is 5, i.e. N = 5. The total amount of fertilizer used by all farmers and the average fertilizer consumption per farmer are computed as follows.

The total amount of fertilizer used is $Y = \sum_{i=1}^{5} Y_i = 70 + 78 + 80 + 80 + 95 = 403$ kg.

The mean consumption of fertilizer per farmer is $\bar{Y} = Y/N = 403/5 = 80.6$ kg. Regarding the variability of fertilizer consumption among farmers, both types of population variance and their corresponding standard deviations are calculated.

$$\sigma^2 = \frac{\sum_{i=1}^{N} (Y_i - \bar{Y})^2}{N} = \frac{327.2}{5} = 65.44$$

$$S^2 = \frac{\sum_{i=1}^{N} (Y_i - \bar{Y})^2}{N-1} = \frac{327.2}{4} = 81.8$$

Taking the square root of each variance gives the standard deviation of the population, that is, $\sigma = 8.09$ and $S = 9.04$. In reality these population characteristics are mostly unknown for relatively large populations and must be estimated from survey results collected and summarized from sample elements.
Now we want to estimate these population values from sample elements assuming that
population parameters are unknown. In the following sampling distribution we will examine all
possible samples.

Assume that a sample of three farmers is selected from all the farmers to estimate the population parameters. The total number of possible samples is $\binom{N}{n} = \binom{5}{3} = 10$. The following table shows the ten possible samples with their corresponding values and sample means. Let $F_i$ represent the ith farmer, i = 1, 2, …, 5.

For each possible sample, dividing the sum of the amounts of fertilizer used by the sample size gives the sample mean ($\bar{y}$). For instance, the mean of the first sample is $(70 + 78 + 80)/3 = 76.00$, and the remaining sample means can be calculated in a similar way.
From the values of the random variable $\bar{y}$, we can construct the frequency distribution shown below. From this frequency distribution we obtain the probabilities of the random variable $\bar{y}$ by dividing the frequency of each value of $\bar{y}$ by the sum of the frequencies.
Table 1: Possible samples with corresponding values and sample means

No.   Sample units   Value for each sample element   Sample mean
1     F1 F2 F3       70, 78, 80                      76.00
2     F1 F2 F4       70, 78, 80                      76.00
3     F1 F2 F5       70, 78, 95                      81.00
4     F1 F3 F4       70, 80, 80                      76.67
5     F1 F3 F5       70, 80, 95                      81.67
6     F1 F4 F5       70, 80, 95                      81.67
7     F2 F3 F4       78, 80, 80                      79.33
8     F2 F3 F5       78, 80, 95                      84.33
9     F2 F4 F5       78, 80, 95                      84.33
10    F3 F4 F5       80, 80, 95                      85.00

This table gives the sampling distribution of $\bar{y}$. If we draw just one sample of three farmers from the population of five farmers, we may draw any one of the 10 possible samples of farmers. Hence, the sample mean $\bar{y}$ can assume any one of the values listed above with the corresponding probabilities.

For instance, the probability of the mean 81.67 is $P(\bar{y} = 81.67) = 2/10 = 0.2$. This shows that the sample average $\bar{y}$ is a random variable that depends on which sample is selected. Its values vary from 76.00 to 85.00, and some of these values are lower and some higher than the population mean $\bar{Y} = 80.6$.

Table 2: Frequency distribution of the sample mean and its probabilities

Value of sample mean   Frequency (f)   Probability
76.00                  2               2/10 = 0.2
76.67                  1               1/10 = 0.1
79.33                  1               1/10 = 0.1
81.00                  1               1/10 = 0.1
81.67                  2               2/10 = 0.2
84.33                  2               2/10 = 0.2
85.00                  1               1/10 = 0.1
Total                  10              1.00

The overall mean, calculated from all possible samples, is equal to the true population mean. That is, the expected value of $\bar{y}$, denoted by $E(\bar{y})$, taken over all possible samples equals the true mean of the population. From the table, $E(\bar{y}) = \dfrac{\sum_{i=1}^{10} \bar{y}_i}{10} = \dfrac{806}{10} = 80.6$, which is the same as $\bar{Y}$. It can also be calculated using the probability concept, that is, $E(\bar{y}) = \sum \bar{y}_i P(\bar{y}_i) = 76.00(0.2) + 76.67(0.1) + \cdots + 85.00(0.1) = 80.6$.
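The sampling distribution in Tables 1 and 2 can be reproduced by enumerating all $\binom{5}{3}$ samples. The following is a minimal sketch with assumed variable names; exact fractions are used so that the equality $E(\bar{y}) = \bar{Y}$ holds without rounding.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations

fertilizer = [70, 78, 80, 80, 95]     # kg used by farmers F1..F5
N, n = len(fertilizer), 3
pop_mean = Fraction(sum(fertilizer), N)

# combinations() treats the two farmers with 80 kg as distinct units,
# so all 10 possible samples are generated.
sample_means = [Fraction(sum(s), n) for s in combinations(fertilizer, n)]

# Frequency of each distinct mean divided by the number of samples.
distribution = {m: Fraction(c, len(sample_means))
                for m, c in Counter(sample_means).items()}

expected_mean = sum(m * p for m, p in distribution.items())

for m in sorted(distribution):
    print(f"{float(m):6.2f}  {float(distribution[m]):.1f}")
print(float(expected_mean), float(pop_mean))
```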

What is the deviation of the sample mean from the true population mean? The sample mean is either equal to or different from the true population mean. This deviation can be assessed in terms of probability. We continue with the same example to explain the properties of this deviation, and consider only deviations of at most one unit, two units, or four units from the true population mean.

$P(-1 \le \bar{y} - \bar{Y} \le 1) = P(79.6 \le \bar{y} \le 81.6) = 1/10$

$P(-2 \le \bar{y} - \bar{Y} \le 2) = P(78.6 \le \bar{y} \le 82.6) = 4/10$

$P(-4 \le \bar{y} - \bar{Y} \le 4) = P(76.6 \le \bar{y} \le 84.6) = 7/10$

This indicates that the greater the demands we make of being close to "true" value, the smaller
the chance we have of fulfilling it.

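Variability of the mean: The sampling variance of the mean, $\mathrm{Var}(\bar{y})$, is defined as the average of the squared deviations of the sample means from the true mean, that is,

$$\mathrm{Var}(\bar{y}) = E(\bar{y} - \bar{Y})^2 = \frac{\sum_{i=1}^{k} (\bar{y}_i - \bar{Y})^2}{k}$$

where k is the total number of possible samples, $\bar{y}_i$ is the mean of the ith sample and $\bar{Y}$ is the true mean of the population. The square root of the sampling variance, $\sqrt{\mathrm{Var}(\bar{y})}$, is called the standard error (S.E.) of the mean of the sample. The smaller the standard error of the mean, the greater is its reliability. For each possible ith sample, we can compute the sample variance $s_i^2$. Then, the mean of the sample variances, $E(s^2)$, is equal to the population variance $S^2$, i.e.

$$E(s^2) = \frac{\sum_{i=1}^{k} s_i^2}{k} = S^2$$

where k is the total number of possible samples. Consider again example 2.1, the population consisting of 5 farmers. The sample variances for all 10 possible samples of size 3 can be computed as

$$s_i^2 = \frac{\sum_{j=1}^{n} (y_{ij} - \bar{y}_i)^2}{n-1}, \quad \text{where } \bar{y}_i = \frac{\sum_{j=1}^{n} y_{ij}}{n},$$

for the ith sample with sample size n = 3 (j = 1, 2, 3).

A summary of the calculated sample variances, in the order of the samples in Table 1, is listed below.

28.00, 28.00, 163.00, 33.33, 158.33, 158.33, 1.33, 86.33, 86.33, 75.00

Therefore, the mean of the sample variances is computed as

$$E(s^2) = \frac{28.00 + 28.00 + \cdots + 75.00}{10} = \frac{818}{10} = 81.8$$

We know that the population variance is $S^2 = 81.8$, and this shows that $E(s^2) = S^2$, up to rounding errors. But the sampling variance, $\mathrm{Var}(\bar{y})$, is not the same as the population variance $S^2$, that is, $\mathrm{Var}(\bar{y}) \ne S^2$. The two are linked by the following relationship:

$\mathrm{Var}(\bar{y}) = (1 - f)\dfrac{S^2}{n}$, where $1 - f = \dfrac{N-n}{N}$ is the finite population correction (fpc).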
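The two facts used here, $E(s^2) = S^2$ and $\mathrm{Var}(\bar{y}) = (1-f)S^2/n$, can be verified numerically for the five-farmer population. The sketch below uses assumed helper names (`mean`, `variance`) written out by hand rather than a statistics library, so the divisors are explicit.

```python
from itertools import combinations

def mean(values):
    return sum(values) / len(values)

def variance(values, divisor_offset):
    """Sum of squared deviations divided by len(values) - divisor_offset."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values) / (len(values) - divisor_offset)

fertilizer = [70, 78, 80, 80, 95]
N, n = len(fertilizer), 3
samples = list(combinations(fertilizer, n))

S2 = variance(fertilizer, 1)                         # population S^2
sample_variances = [variance(s, 1) for s in samples]
mean_s2 = mean(sample_variances)                     # average of the s_i^2

sample_means = [mean(s) for s in samples]
pop_mean = mean(fertilizer)
# Sampling variance: average squared deviation of the sample means
# from the population mean, taken over all possible samples.
var_of_mean = sum((m - pop_mean) ** 2 for m in sample_means) / len(samples)

fpc = 1 - n / N
print(round(S2, 2), round(mean_s2, 2))
print(round(var_of_mean, 4), round(fpc * S2 / n, 4))
```

Both printed pairs agree: the average sample variance equals 81.8, and the sampling variance of the mean equals $(1 - 3/5)(81.8/3) \approx 10.91$.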

2.4.Properties of simple random sampling

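Theorem 4: In SRSWOR the sample mean $\bar{y}$ is an unbiased estimator of the population mean $\bar{Y}$.

Proof: We have

$$E(\bar{y}) = E\left[\frac{1}{n}\sum_{i=1}^{n} y_i\right] = E\left[\frac{1}{n}\sum_{i=1}^{N} a_i Y_i\right]$$

where $a_i = 1$ if the ith unit is included in the sample and $a_i = 0$ otherwise.

Since $a_i$ takes only the two values 1 and 0,

$$E(a_i) = 1 \cdot P(a_i = 1) + 0 \cdot P(a_i = 0) = \frac{n}{N}$$

Hence, $E(\bar{y}) = \dfrac{1}{n}\sum_{i=1}^{N} E(a_i) Y_i = \dfrac{1}{n}\sum_{i=1}^{N} \dfrac{n}{N} Y_i = \dfrac{1}{N}\sum_{i=1}^{N} Y_i = \bar{Y}$

Theorem 5: In SRSWR, the sample mean $\bar{y}$ is an unbiased estimator of the population mean $\bar{Y}$.

Proof: Each $y_i$ takes the values $Y_1, \ldots, Y_N$ with probability $1/N$, so $E(y_i) = \bar{Y}$. We have

$$E(\bar{y}) = E\left[\frac{1}{n}\sum_{i=1}^{n} y_i\right] = \frac{1}{n}\sum_{i=1}^{n} E(y_i) = \frac{1}{n}\sum_{i=1}^{n} \bar{Y} = \frac{1}{n}\, n\bar{Y} = \bar{Y}$$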

2.5.Variance and standard error of the estimate

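Theorem 6: In SRSWOR, the variance of the sample mean is given by

$$\mathrm{Var}(\bar{y}) = \frac{N-n}{Nn} S^2 = (1-f)\frac{S^2}{n}$$

Proof: We have

$$\mathrm{Var}(\bar{y}) = E[\bar{y} - E(\bar{y})]^2 = E(\bar{y}^2) - \bar{Y}^2$$

Now

$$E(\bar{y}^2) = E\left[\frac{1}{n}\sum_{i=1}^{n} y_i\right]^2 = \frac{1}{n^2} E\left[\sum_{i=1}^{n} y_i^2 + \sum_{i \ne j}^{n} y_i y_j\right] = \frac{1}{n^2}\left[\sum_{i=1}^{n} E(y_i^2) + \sum_{i \ne j}^{n} E(y_i y_j)\right]$$

where

$$\sum_{i=1}^{n} E(y_i^2) = \frac{n}{N}\sum_{i=1}^{N} Y_i^2$$

But $(N-1)S^2 = \sum_{i=1}^{N} (Y_i - \bar{Y})^2 = \sum_{i=1}^{N} Y_i^2 - N\bar{Y}^2$, so that

$$\sum_{i=1}^{N} Y_i^2 = \sum_{i=1}^{N} (Y_i - \bar{Y})^2 + N\bar{Y}^2 = (N-1)S^2 + N\bar{Y}^2$$

Therefore

$$\sum_{i=1}^{n} E(y_i^2) = \frac{n}{N}\left[(N-1)S^2 + N\bar{Y}^2\right]$$

Also, since $\left(\sum_{i=1}^{N} Y_i\right)^2 = \sum_{i=1}^{N} Y_i^2 + \sum_{i \ne j}^{N} Y_i Y_j$,

$$\sum_{i \ne j}^{n} E(y_i y_j) = \frac{n(n-1)}{N(N-1)}\sum_{i \ne j}^{N} Y_i Y_j = \frac{n(n-1)}{N(N-1)}\left[\left(\sum_{i=1}^{N} Y_i\right)^2 - \sum_{i=1}^{N} Y_i^2\right]$$

$$= \frac{n(n-1)}{N(N-1)}\left[N^2\bar{Y}^2 - (N-1)S^2 - N\bar{Y}^2\right]$$

$$= \frac{n(n-1)}{N(N-1)}\left[N(N-1)\bar{Y}^2 - (N-1)S^2\right]$$

$$= \frac{n(n-1)}{N}\left[N\bar{Y}^2 - S^2\right]$$

Substituting both results,

$$E(\bar{y}^2) = \frac{1}{n^2}\left\{\frac{n}{N}\left[(N-1)S^2 + N\bar{Y}^2\right] + \frac{n(n-1)}{N}\left[N\bar{Y}^2 - S^2\right]\right\}$$

$$= \bar{Y}^2 + \frac{(N-1) - (n-1)}{Nn} S^2 = \bar{Y}^2 + \frac{N-n}{Nn} S^2$$

So, $\mathrm{Var}(\bar{y}) = E(\bar{y}^2) - \bar{Y}^2 = \dfrac{N-n}{Nn} S^2$, that is,

$$\mathrm{Var}(\bar{y}) = (1-f)\frac{S^2}{n} = \left(\frac{1}{n} - \frac{1}{N}\right) S^2$$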

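Theorem 7: In SRSWR, the variance of the sample mean is given by

$$\mathrm{Var}(\bar{y}) = \frac{N-1}{Nn} S^2 = \frac{\sigma^2}{n}$$

Proof: We have $\bar{y} = \dfrac{1}{n}\sum_{i=1}^{n} y_i$, so

$$\mathrm{Var}(\bar{y}) = \mathrm{Var}\left(\frac{1}{n}\sum_{i=1}^{n} y_i\right) = \frac{1}{n^2}\sum_{i=1}^{n} \mathrm{Var}(y_i) + \frac{1}{n^2}\sum_{i \ne j}^{n} \mathrm{Cov}(y_i, y_j)$$

Since in SRSWR the observations are independent, $\mathrm{Cov}(y_i, y_j) = 0$ for $i \ne j$.

But $\mathrm{Var}(y_i) = E(y_i - \bar{Y})^2 = \dfrac{1}{N}\sum_{i=1}^{N} (Y_i - \bar{Y})^2 = \sigma^2 = \dfrac{N-1}{N} S^2$.

Then, $\mathrm{Var}(\bar{y}) = \dfrac{1}{n^2}\, n\sigma^2 = \dfrac{\sigma^2}{n} = \dfrac{N-1}{Nn} S^2$.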

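Eg. From a population of 50 units, a random sample of size 10 is drawn without replacement. From the sample, the values of $\sum y_i$ and $\sum y_i^2$ are obtained.

Calculate the sample mean and its variance.

Solution: $\bar{y} = \dfrac{\sum y_i}{n}$

$s^2 = \dfrac{\sum y_i^2 - n\bar{y}^2}{n-1}$, which is the estimated value of $S^2$.

Therefore, $\mathrm{var}(\bar{y}) = \dfrac{N-n}{Nn}\, s^2 = \left(\dfrac{1}{10} - \dfrac{1}{50}\right) s^2 = 0.08\, s^2$.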

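Eg. Draw all possible samples of size 2 from the population {8, 12, 16}, verify that $E(\bar{y}) = \bar{Y}$, and find the variance of the estimate of the population mean.

Solution: In SRSWOR the number of samples is $\binom{N}{n} = \binom{3}{2} = 3$.

$E(\bar{y}) = \dfrac{1}{3}\sum \bar{y}_i = \dfrac{36}{3} = 12$

$\bar{Y} = \dfrac{8 + 12 + 16}{3} = 12$. Therefore $E(\bar{y}) = \bar{Y}$.

Table 3: Possible samples and sample means

No.     Sample     Mean
1       (8, 12)    10
2       (8, 16)    12
3       (12, 16)   14
Total              36

Again, the estimator of the population mean is the sample mean, and so its variance is $\mathrm{Var}(\bar{y}) = \dfrac{N-n}{Nn} S^2$, with

$S^2 = \dfrac{\sum (Y_i - \bar{Y})^2}{N-1} = \dfrac{16 + 0 + 16}{2} = 16$

Then $\mathrm{Var}(\bar{y}) = \dfrac{3-2}{3 \times 2} \times 16 = 2.67$.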
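The same check can be automated for any small population. The function below is a sketch under assumed names: it enumerates all SRSWOR samples, computes the mean and variance of the sample means directly, and compares the latter with the formula $\frac{N-n}{Nn}S^2$ from Theorem 6.

```python
from fractions import Fraction
from itertools import combinations

def enumerate_srswor(population, n):
    """Return (E(ybar), Var(ybar) by enumeration, Var(ybar) by formula)."""
    N = len(population)
    means = [Fraction(sum(s), n) for s in combinations(population, n)]
    expected = sum(means) / len(means)
    variance = sum((m - expected) ** 2 for m in means) / len(means)

    pop_mean = Fraction(sum(population), N)
    S2 = sum((y - pop_mean) ** 2 for y in population) / (N - 1)
    formula = Fraction(N - n, N * n) * S2
    return expected, variance, formula

expected, variance, formula = enumerate_srswor([8, 12, 16], 2)
print(expected, variance, formula)
```

For the population {8, 12, 16} with n = 2 the three values are 12, 8/3 and 8/3, matching the hand calculation.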

2.6.Estimation of Variance and standard error from a Sample

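The expressions for the variance of the sample mean involve $S^2$, which is based on population values, so they cannot be used directly in real-life applications. In order to estimate the variance of $\bar{y}$ on the basis of a sample, an estimator of $S^2$ (or equivalently $\sigma^2$) is needed. Consider $s^2$ as an estimator of $S^2$ or $\sigma^2$; we investigate its bias for $S^2$ in the case of SRSWOR and SRSWR.

Consider

$$s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (y_i - \bar{y})^2 = \frac{1}{n-1}\sum_{i=1}^{n} \left[(y_i - \bar{Y}) - (\bar{y} - \bar{Y})\right]^2$$

$$= \frac{1}{n-1}\left[\sum_{i=1}^{n} (y_i - \bar{Y})^2 - n(\bar{y} - \bar{Y})^2\right]$$

$$E(s^2) = \frac{1}{n-1}\left[\sum_{i=1}^{n} E(y_i - \bar{Y})^2 - nE(\bar{y} - \bar{Y})^2\right]$$

$$= \frac{1}{n-1}\left[\sum_{i=1}^{n} \mathrm{Var}(y_i) - n\,\mathrm{Var}(\bar{y})\right] = \frac{n}{n-1}\left[\sigma^2 - \mathrm{Var}(\bar{y})\right]$$

In case of SRSWOR

$\mathrm{Var}(\bar{y}) = \dfrac{N-n}{Nn} S^2$, and so

$$E(s^2) = \frac{n}{n-1}\left[\sigma^2 - \frac{N-n}{Nn} S^2\right] = \frac{n}{n-1}\left[\frac{N-1}{N} S^2 - \frac{N-n}{Nn} S^2\right] = S^2$$

In case of SRSWR

$\mathrm{Var}(\bar{y}) = \dfrac{N-1}{Nn} S^2 = \dfrac{\sigma^2}{n}$, and so

$$E(s^2) = \frac{n}{n-1}\left[\sigma^2 - \frac{\sigma^2}{n}\right] = \sigma^2 = \frac{N-1}{N} S^2$$

Hence, $E(s^2) = S^2$ in SRSWOR and $E(s^2) = \sigma^2$ in SRSWR.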

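Standard errors: The standard error of $\bar{y}$ is defined as $\sqrt{\mathrm{Var}(\bar{y})}$. In order to estimate the standard error, one simple option is to take the square root of the estimate of the variance of the sample mean.

Under SRSWOR, a possible estimator is $\widehat{S.E.}(\bar{y}) = \sqrt{\dfrac{N-n}{Nn}}\, s = \sqrt{1-f}\,\dfrac{s}{\sqrt{n}}$

Under SRSWR, a possible estimator is $\widehat{S.E.}(\bar{y}) = \sqrt{\dfrac{s^2}{n}} = \dfrac{s}{\sqrt{n}}$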
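As a sketch of how these estimators are computed from a single sample, the function below (the name `estimated_standard_error` and the example data are assumptions for illustration) applies the finite population correction when the population size is supplied and uses the with-replacement form $s/\sqrt{n}$ otherwise.

```python
import math

def estimated_standard_error(sample, N=None):
    """Estimate the standard error of the sample mean from one sample.

    If N is given, the SRSWOR form sqrt((1 - n/N) * s^2 / n) is used;
    otherwise the SRSWR form s / sqrt(n) is used.
    """
    n = len(sample)
    ybar = sum(sample) / n
    s2 = sum((y - ybar) ** 2 for y in sample) / (n - 1)
    fpc = 1.0 if N is None else (N - n) / N
    return math.sqrt(fpc * s2 / n)

# First sample of the farmer example (F1, F2, F3), drawn from N = 5.
se_wor = estimated_standard_error([70, 78, 80], N=5)
se_wr = estimated_standard_error([70, 78, 80])
print(round(se_wor, 3), round(se_wr, 3))
```

The without-replacement estimate is the smaller of the two because of the factor $\sqrt{1-f}$.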

If we look at these expressions, we can observe that as n increases, the value of $\sqrt{n}$ also increases and hence the standard error decreases. The standard error from a sample is used for various purposes. It is mainly used:
• To compare the precision of an estimate from SRS with that from other sampling methods,
• To determine the sample size required in a survey, and
• To estimate the actual precision of the survey.

Consistency: An estimate is consistent if its values tend to concentrate increasingly around the
true value as the sample size increases. In other words, the estimate assumes the population value
with probability approaching unity as the sample size tends to infinity. This definition of
consistency strictly applies to estimates based on samples drawn from an infinite population. We
use the following definition in the case of a finite population. An estimate is said to be a
consistent estimate of the parameter Y if it takes the population value when n = N.

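Theorem 8: An estimator is said to be consistent if it tends to the population value as the sample size increases. Let $\bar{y}$ be an estimator of a population parameter denoted by $\bar{Y}$. Then $\bar{y}$ is a consistent estimator of $\bar{Y}$ if:

For any positive number $\varepsilon$, $\lim_{n \to \infty} P(|\bar{y} - \bar{Y}| < \varepsilon) = 1$. This indicates that $\bar{y}$ approaches $\bar{Y}$ as n approaches ∞.

Equivalently, $\lim_{n \to \infty} P(|\bar{y} - \bar{Y}| \ge \varepsilon) = 0$.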

Example: An estimator is said to be consistent if it tends to the population value with increasing sample size. As the size of the sample increases, the sample estimates concentrate around the population value. By considering the population of 5 farmers, we can find all possible samples of size 2, 3, and 4 without replacement and compute the sample results. The sampling distribution has already been calculated for sample size three, and in a similar way the sampling distributions can be calculated for sample sizes two and four. The following ranges of possible sample means are observed for the three sample sizes.

$74.00 \le \bar{y} \le 87.50$, when the sample size n = 2 with 10 possible samples.
$76.00 \le \bar{y} \le 85.00$, when the sample size n = 3 with 10 possible samples.
$77.00 \le \bar{y} \le 83.25$, when the sample size n = 4 with 5 possible samples.

This example shows that as the sample size increases, the sample mean closes in on the population mean ($\bar{Y} = 80.6$) from both directions.
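The three ranges can be reproduced by enumerating every sample of each size. This is a small sketch with an assumed helper name, `mean_range`.

```python
from itertools import combinations

fertilizer = [70, 78, 80, 80, 95]

def mean_range(population, n):
    """Smallest and largest sample mean over all SRSWOR samples of size n."""
    means = [sum(s) / n for s in combinations(population, n)]
    return min(means), max(means)

for n in (2, 3, 4):
    low, high = mean_range(fertilizer, n)
    print(n, low, high)
```

The interval of possible sample means narrows around 80.6 as n grows, and for n = N = 5 it collapses to the population mean itself.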

Efficiency: A particular sampling scheme is said to be more "efficient" than another if, for a fixed sample size, the sampling variance of survey estimates for the first scheme is less than that for the second. For the same population, comparisons of efficiency are often made with simple random sampling as the basic scheme, using the ratio of the variances.

For example, if $\hat{\theta}_1$ and $\hat{\theta}_2$ are two estimators of $\theta$, based on equal sample sizes and having variances $V(\hat{\theta}_1)$ and $V(\hat{\theta}_2)$ respectively, the efficiency of $\hat{\theta}_1$ relative to $\hat{\theta}_2$ is given as follows.

Efficiency $= \dfrac{V(\hat{\theta}_2)}{V(\hat{\theta}_1)}$. Thus, if this ratio is greater than one, then $\hat{\theta}_1$ is a better estimator than $\hat{\theta}_2$.

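Theorem 9: The variance of the sample mean is larger in SRSWR than in SRSWOR, i.e. $\mathrm{Var}_{WR}(\bar{y}) > \mathrm{Var}_{WOR}(\bar{y})$.

Proof: We have $\mathrm{Var}_{WOR}(\bar{y}) = \dfrac{N-n}{Nn} S^2$ and $\mathrm{Var}_{WR}(\bar{y}) = \dfrac{N-1}{Nn} S^2$.

Therefore,

$$\mathrm{Var}_{WR}(\bar{y}) - \mathrm{Var}_{WOR}(\bar{y}) = \left[\frac{N-1}{Nn} - \frac{N-n}{Nn}\right] S^2 = \frac{n-1}{Nn} S^2 > 0 \quad \text{for } n > 1$$

That implies $\mathrm{Var}_{WR}(\bar{y}) > \mathrm{Var}_{WOR}(\bar{y})$.

That means the variance of the sample mean is larger in SRSWR than in SRSWOR. In other words, SRSWOR provides a more efficient estimate of the population mean than SRSWR.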

Example: A population has 7 units: 1, 2, 3, 4, 5, 6, 7. Write down all possible samples of size 2 (without replacement) which can be drawn from the given population and verify that the sample mean is an unbiased estimate of the population mean. Also calculate its sampling variance and verify that $\mathrm{Var}_{WR}(\bar{y}) > \mathrm{Var}_{WOR}(\bar{y})$.

Table 4: Possible samples with corresponding sample means and squared deviations

Sample No.   Sample values   Sample mean   Deviation from population mean   Squared deviation
1            (1,2)           1.5           -2.5                             6.25
2            (1,3)           2.0           -2.0                             4.00
3            (1,4)           2.5           -1.5                             2.25
4            (1,5)           3.0           -1.0                             1.00
5            (1,6)           3.5           -0.5                             0.25
6            (1,7)           4.0            0                               0
7            (2,3)           2.5           -1.5                             2.25
8            (2,4)           3.0           -1.0                             1.00
9            (2,5)           3.5           -0.5                             0.25
10           (2,6)           4.0            0                               0
11           (2,7)           4.5           +0.5                             0.25
12           (3,4)           3.5           -0.5                             0.25
13           (3,5)           4.0            0                               0
14           (3,6)           4.5           +0.5                             0.25
15           (3,7)           5.0           +1.0                             1.00
16           (4,5)           4.5           +0.5                             0.25
17           (4,6)           5.0           +1.0                             1.00
18           (4,7)           5.5           +1.5                             2.25
19           (5,6)           5.5           +1.5                             2.25
20           (5,7)           6.0           +2.0                             4.00
21           (6,7)           6.5           +2.5                             6.25
Total                        84.0                                           35.00
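Solution: We have Y = 1, 2, 3, 4, 5, 6, 7, so

$\bar{Y} = \dfrac{\sum Y_i}{N} = \dfrac{28}{7} = 4$, $\sigma^2 = \dfrac{\sum (Y_i - \bar{Y})^2}{N} = \dfrac{28}{7} = 4$, and $S^2 = \dfrac{\sum (Y_i - \bar{Y})^2}{N-1} = \dfrac{28}{6} = 4.67$

From the table, we have $\sum \bar{y}_i = 84$ and $\sum (\bar{y}_i - \bar{Y})^2 = 35$, over the $\binom{7}{2} = 21$ possible samples.

$E(\bar{y}) = \dfrac{\sum \bar{y}_i}{21} = \dfrac{84}{21} = 4 = \bar{Y}$, so the sample mean is unbiased.

$\mathrm{Var}(\bar{y}) = \dfrac{\sum (\bar{y}_i - \bar{Y})^2}{21} = \dfrac{35}{21} = 1.67$

Verification: In SRSWOR the variance of the sample mean is given by

$\mathrm{Var}_{WOR}(\bar{y}) = \dfrac{N-n}{Nn} S^2 = \dfrac{7-2}{7 \times 2} \times 4.67 = 1.67$

In SRSWR the variance of the sample mean is given by

$\mathrm{Var}_{WR}(\bar{y}) = \dfrac{N-1}{Nn} S^2 = \dfrac{7-1}{7 \times 2} \times 4.67 = 2$

Hence, $\mathrm{Var}_{WR}(\bar{y}) > \mathrm{Var}_{WOR}(\bar{y})$.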
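The comparison can also be made by direct enumeration of both designs: all 21 unordered samples without replacement and all $7^2 = 49$ ordered samples with replacement. The sketch below uses an assumed helper name, `variance_of_mean`, and exact fractions.

```python
from fractions import Fraction
from itertools import combinations, product

def variance_of_mean(samples, n):
    """Variance of the sample mean over a list of equally likely samples."""
    means = [Fraction(sum(s), n) for s in samples]
    center = sum(means) / len(means)
    return sum((m - center) ** 2 for m in means) / len(means)

population = range(1, 8)
n = 2

v_wor = variance_of_mean(list(combinations(population, n)), n)    # SRSWOR
v_wr = variance_of_mean(list(product(population, repeat=n)), n)   # SRSWR
print(v_wor, v_wr)
```

The enumeration gives 5/3 without replacement and 2 with replacement, the same values as the formulas.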

2.7. Confidence Interval

In practice surveys are conducted only once for one specific objective. In other words, one does
not draw all possible samples to calculate the variance or the standard error of an estimate.
However, if probability-sampling methods are used, the sample estimates and their associated
measures of sampling error can be determined on the basis of a single sample.

Therefore, any specific value or estimate obtained from sample observations may differ from the population parameter: the estimate from a sample could be less than, greater than, or equal to the population value. Because of this discrepancy, an assessment must be made of the accuracy of the estimate. The question is "How can we be reasonably confident that our inference is correct?" Estimates are often presented in terms of confidence intervals to express precision in a meaningful way. A confidence interval constitutes a statement on the level of confidence that the true value for the population lies within a specified range of values.

A 95% confidence interval can be described as follows. If sampling is repeated indefinitely, each sample leads to a new confidence interval. Then in 95% of the samples the interval will cover the true population value. For example, consider a sample mean $\bar{y}$, which is an unbiased estimate of the population mean $\mu_y = \bar{Y}$. The confidence interval for $\mu_y$ is $\mu_y = \bar{y} \pm$ sampling error, where the sampling error depends on the sampling distribution of $\bar{y}$. Translating this into a description based on the normal distribution, an approximate $100(1-\alpha)\%$ confidence interval for $\bar{Y}$ is:

$$\left(\bar{y} - Z_{\alpha/2}\, S.E.(\bar{y}),\ \ \bar{y} + Z_{\alpha/2}\, S.E.(\bar{y})\right)$$

where $\mu_y$ is an unknown population parameter, $1-\alpha$ is the confidence level, and $\alpha$ is the permissible level of error, or the percentage of the time that one is willing to be wrong, known as the significance level. $Z_{\alpha/2}$ is a critical value of the normal distribution, $\bar{y} + Z_{\alpha/2}\, S.E.(\bar{y})$ is the upper confidence limit, and $\bar{y} - Z_{\alpha/2}\, S.E.(\bar{y})$ is the lower confidence limit.
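A minimal sketch of this interval in Python follows. The function name and the example sample are assumptions; the critical value is taken from the standard library's `statistics.NormalDist`, and the unknown S.E. is replaced by the SRSWOR estimate computed from the sample. With a sample of only three units the normal approximation is rough, so the numbers serve only to show the mechanics.

```python
import math
from statistics import NormalDist

def confidence_interval(sample, N, alpha=0.05):
    """Approximate 100(1 - alpha)% confidence interval for the population
    mean under SRSWOR, using the estimated standard error."""
    n = len(sample)
    ybar = sum(sample) / n
    s2 = sum((y - ybar) ** 2 for y in sample) / (n - 1)
    se = math.sqrt((1 - n / N) * s2 / n)
    z = NormalDist().inv_cdf(1 - alpha / 2)   # critical value Z_{alpha/2}
    return ybar - z * se, ybar + z * se

lower, upper = confidence_interval([70, 78, 80], N=5, alpha=0.05)
print(round(lower, 2), round(upper, 2))
```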

2.8.Estimator of population total

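Sometimes it is also of interest to estimate the population total, e.g. total household income, total expenditure, etc. Let $Y$ denote the population total:

$Y = \sum_{i=1}^{N} Y_i = N\bar{Y}$. This can be estimated by $\hat{Y} = N\hat{\bar{Y}} = N\bar{y}$.

Obviously

$$E(\hat{Y}) = N E(\bar{y}) = N\bar{Y} = Y$$

$$\mathrm{Var}(\hat{Y}) = N^2\, \mathrm{Var}(\bar{y}) = \begin{cases} N^2\, \dfrac{N-n}{Nn} S^2 = \dfrac{N(N-n)}{n} S^2 & \text{for SRSWOR} \\[2ex] N^2\, \dfrac{N-1}{Nn} S^2 = \dfrac{N(N-1)}{n} S^2 & \text{for SRSWR} \end{cases}$$

Similarly, for the population total (parameter) the confidence limits are given as:

$\hat{Y} \pm Z_{\alpha/2}\, S.E.(\hat{Y}) = N\bar{y} \pm Z_{\alpha/2}\, N\, S.E.(\bar{y})$. Since $S.E.(\bar{y})$ is not known, we substitute the sample standard error, $s.e.(\bar{y})$, computed from the sample observations.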
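The expansion estimator of the total and its estimated standard error can be sketched as follows, assuming SRSWOR; the function name `estimate_total` and the example sample are illustrative only.

```python
import math

def estimate_total(sample, N):
    """Estimate the population total and its standard error under SRSWOR."""
    n = len(sample)
    ybar = sum(sample) / n
    s2 = sum((y - ybar) ** 2 for y in sample) / (n - 1)
    total_hat = N * ybar                              # N times sample mean
    se_total = N * math.sqrt((1 - n / N) * s2 / n)    # N times s.e. of mean
    return total_hat, se_total

total_hat, se_total = estimate_total([70, 78, 80], N=5)
print(total_hat, round(se_total, 2))
```

For the sample (70, 78, 80) from the five farmers, the estimated total is 380 kg against a true total of 403 kg.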

Estimation for Sub-populations


Sometimes needs arise to estimate population parameters not only for the entire population, but
also for its “subdivision” or “subpopulations” known as domain of study. Such division could be
by residence, age, sex, geographical area, income group, etc. Note that in some cases study
domains may coincide with strata or may differ.

Notation:
N = the number of elements in the population
Nj = the number of elements in the jth domain
nj = the number of sample elements in a SRS of size n that happen to fall in the jth domain.
Yjk are measurements on the kth element in jth domain, for k = 1, 2, - - -, nj for sample and k =1,
2, - - -, Nj for population
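The objective is to estimate subpopulation parameters such as the mean, $\bar{Y}_j$, and the total, $Y_j$, for the jth domain. These parameters and their estimators are computed as follows.

i) Subpopulation mean ($\bar{Y}_j$): The subpopulation mean is defined as $\bar{Y}_j = \dfrac{\sum_{k=1}^{N_j} Y_{jk}}{N_j}$ and its sample estimator is given by $\bar{y}_j = \dfrac{\sum_{k=1}^{n_j} y_{jk}}{n_j}$.

a. $E(\bar{y}_j) = \bar{Y}_j$, b. $\mathrm{Var}(\bar{y}_j) = (1 - f_j)\dfrac{S_j^2}{n_j}$, where $S_j^2 = \dfrac{\sum_{k=1}^{N_j} (Y_{jk} - \bar{Y}_j)^2}{N_j - 1}$ and $f_j = \dfrac{n_j}{N_j}$ is the sampling fraction for the jth domain.

The sample variance is given by $\mathrm{var}(\bar{y}_j) = (1 - f_j)\dfrac{s_j^2}{n_j}$, where $s_j^2 = \dfrac{\sum_{k=1}^{n_j} (y_{jk} - \bar{y}_j)^2}{n_j - 1}$, and its standard error is $s.e.(\bar{y}_j) = \sqrt{(1 - f_j)\dfrac{s_j^2}{n_j}}$. If $f_j$ is not known, use $f = n/N$ in place of $f_j$.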


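ii) Subpopulation total $Y_j$: It is given by $Y_j = \sum_{k=1}^{N_j} Y_{jk}$; consider two cases to get its estimator $\hat{Y}_j$.
Case 1 is when $N_j$ is known: a) $\hat{Y}_j = N_j \bar{y}_j$, b) $\mathrm{Var}(\hat{Y}_j) = N_j^2 (1 - f_j)\dfrac{S_j^2}{n_j}$.

Case 2 is when $N_j$ is unknown: estimate $N_j$ by $\hat{N}_j = N\dfrac{n_j}{n}$. Then the total estimate will be: a) $\hat{Y}_j = \hat{N}_j \bar{y}_j = \dfrac{N}{n}\sum_{k=1}^{n_j} y_{jk}$, b) $\mathrm{Var}(\hat{Y}_j) = N^2 (1 - f)\dfrac{S'^2}{n}$, where

$$S'^2 = \frac{1}{N-1}\left(\sum_{k=1}^{N_j} Y_{jk}^2 - \frac{\left(\sum_{k=1}^{N_j} Y_{jk}\right)^2}{N}\right)$$

The sample estimate is given by a) $\mathrm{var}(\hat{Y}_j) = N_j^2 (1 - f_j)\dfrac{s_j^2}{n_j}$, if $N_j$ is known;

b) $\mathrm{var}(\hat{Y}_j) = N^2 (1 - f)\dfrac{s'^2}{n}$, if $N_j$ is unknown, where $s'^2 = \dfrac{1}{n-1}\left(\sum_{k=1}^{n_j} y_{jk}^2 - \dfrac{\left(\sum_{k=1}^{n_j} y_{jk}\right)^2}{n}\right)$ and $f = n/N$.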

2.9.Sample size determination

In the planning of a sample survey, one of the first considerations is sample size determination.
Since every survey is different, there can be no hard and fast rules for determining sample size.
Generally, the factors, which decide the scale of the survey operations, have to do with cost,
time, operational constraints and the desired precision of the results. Once these points have been
appraised and individually assessed, the investigators are in a better position to decide the size of
the sample.

One of the major considerations in deciding sample size has to do with the level of error that one
deems tolerable and acceptable. We know that measures of sampling error such as standard error
or coefficient of variation are frequently used to indicate the precision of sample estimates. Since
it is desirable to have high levels of precision, it is also desirable to have large sample sizes,
since the larger the sample, the more precise estimates will be. The sample size can be
determined by specifying the precision required for each major finding to be produced from the
survey.

The sample size required under simple random sampling for estimation of the population mean $\bar{Y}$ is as follows. Suppose that the sample estimate $\bar{y}$ should differ in absolute value from the true unknown mean $\bar{Y}$ by no more than d, i.e., an absolute error $|\bar{y} - \bar{Y}| \le d$, or a relative error $\dfrac{|\bar{y} - \bar{Y}|}{\bar{Y}} \le r$, in which $d = r\bar{Y}$. Specifying the maximum allowable difference between $\bar{y}$ and $\bar{Y}$, and allowing for a small probability $\alpha$ that the error may exceed that difference, choose a sample size n such that $P(|\bar{y} - \bar{Y}| > d) \le \alpha$.

With SRS we can show that, assuming the estimate $\bar{y}$ is approximately normally distributed, the sample size n must satisfy the relation

$$n = \frac{n_0}{1 + \dfrac{n_0}{N}}, \quad \text{where } n_0 = \frac{Z^2 S^2}{d^2}$$

and Z is the reliability coefficient, which denotes the upper $\alpha/2$ point of the standard normal distribution.

If the population size N is very much greater than the required sample size n, the relation above can be approximated by $n \approx n_0 = \dfrac{Z^2 S^2}{d^2}$ or $\left(\dfrac{ZS}{d}\right)^2$. As a first approximation calculate $n_0$. If $n_0/N$, the sampling fraction, is very small, say less than 5%, we may consider $n_0$ a satisfactory approximation to the required sample size n. Otherwise calculate n using the given formula, $n = \dfrac{n_0}{1 + n_0/N}$. If we use the relative error $r = d/\bar{Y}$, then we get

$$n_0 = \left(\frac{ZS}{r\bar{Y}}\right)^2 = \left(\frac{Z \cdot CV}{r}\right)^2$$

where $CV = S/\bar{Y}$ is the coefficient of variation.
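The two-step rule (first $n_0$, then the correction for a finite population) can be sketched as a small function. The name `required_sample_size` and the input values in the example are hypothetical, chosen only to show the effect of the correction.

```python
def required_sample_size(N, S, d, Z):
    """Return (n0, n): the first approximation n0 = (Z*S/d)^2 and the
    finite-population value n = n0 / (1 + n0/N)."""
    n0 = (Z * S / d) ** 2
    n = n0 / (1 + n0 / N)
    return n0, n

# Hypothetical inputs: N = 1000 units, S = 10, margin d = 2, Z = 1.96.
n0, n = required_sample_size(N=1000, S=10, d=2, Z=1.96)
print(round(n0, 2), round(n, 2), round(n0 / 1000, 3))
```

Here $n_0/N$ is about 9.6%, above the 5% guideline, so the corrected n (about 88) rather than $n_0$ (about 96) would be used.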

2.10. Relative error

Often we wish to consider not the absolute value of the standard error, but its value in relation to the magnitude of the statistic (mean, total, etc.) being estimated. For this purpose, one can express the standard error as a proportion (or a percent) of the value being estimated. This form is called the relative standard error or coefficient of variation and is denoted by the symbol CV.

Statistical measures such as the standard deviation and the standard error are expressed in the units of measurement of the variables. Such measurement units may cause difficulties in making some comparisons. Relative measures, such as coefficients of variation, can be used to overcome these problems. The element coefficient of variation can be expressed as $CV = \dfrac{S}{\bar{Y}}$ and estimated by $cv = \dfrac{s}{\bar{y}}$. For the mean ($\bar{y}$), the coefficient of variation is given by $CV(\bar{y}) = \dfrac{S.E.(\bar{y})}{\bar{Y}}$, estimated by $cv(\bar{y}) = \dfrac{s.e.(\bar{y})}{\bar{y}}$. For the total ($\hat{Y}$), the coefficient of variation is given by $CV(\hat{Y}) = \dfrac{S.E.(\hat{Y})}{Y} = \dfrac{N\, S.E.(\bar{y})}{N\bar{Y}} = \dfrac{S.E.(\bar{y})}{\bar{Y}}$, which is the same as the coefficient of variation of the mean.

Example: A sample survey of retail outlets is to be conducted in a city that contains 2,500 outlets. The objective is to estimate the average retail price of 20 items of a commonly used food. An estimate is needed that is within 10% of the true average retail price in the city. An SRS will be taken from an available list of all outlets. Another survey from the same population showed an average price of $7.00 for 20 items with a standard deviation of $1.4. Assuming a 99.7% confidence interval, determine the sample size.

Solution: N = 2500, s = 1.4, $s^2 = (1.4)^2 = 1.96$, r = 0.1, $\bar{y} = 7.00$, and Z = 3 for 99.7% confidence.

$cv = \dfrac{s}{\bar{y}} = \dfrac{1.4}{7.00} = 0.2$,

$n_0 = \left(\dfrac{Z \cdot cv}{r}\right)^2 = \left(\dfrac{3 \times 0.2}{0.1}\right)^2 = 36$,

Therefore, $\dfrac{n_0}{N} = \dfrac{36}{2500} = 0.0144 < 0.05$, so $n_0 = 36$ is a good approximation for the sample size. If you calculate n exactly, you get $n = \dfrac{36}{1 + 36/2500} \approx 35.5$, which also rounds up to 36.
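The arithmetic of this example can be checked with a few lines of Python. The variable names are assumptions; the inputs are the ones stated in the example, with Z = 3 taken as the normal critical value for 99.7% confidence.

```python
import math

N = 2500        # outlets in the city
s = 1.4         # standard deviation from the earlier survey
ybar = 7.00     # average price from the earlier survey
r = 0.10        # allowed relative error (within 10%)
Z = 3           # reliability coefficient for 99.7% confidence

cv = s / ybar                      # estimated coefficient of variation
n0 = (Z * cv / r) ** 2             # first approximation
n = n0 / (1 + n0 / N)              # finite-population correction
n_required = math.ceil(n)          # round up to a whole outlet

print(round(n0, 2), round(n0 / N, 4), round(n, 2), n_required)
```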

Importance and Limitations of simple Random Sampling

Simple random sampling is very important as a basis for development of the theory of sampling.
It serves as a central reference for all other sampling designs. Under simple random sampling
any particular sample of n elements from a population of N elements can be chosen and in
addition, is as likely to be chosen as any other sample. In this sense, it is conceptually the
simplest possible method, and hence it is one against which all other methods can be compared.
However, despite such importance, simple random sampling has the following limitations:

• It can be expensive and is often not feasible in practice, since it requires that all elements be identified and labeled prior to the sampling. This prior identification is often not possible, and in that case a simple random sample of elements cannot be drawn.
• Since it gives each element in the population an equal chance of being chosen in the sample, it may result in samples that are spread out over a large geographic area. Such a geographic distribution of the sample would be very costly to implement.
• It is not well suited to surveys in which interest is focused on subgroups that comprise a small proportion of the population. For example, it is not likely to be an efficient design for rare characteristics such as disability and special crops.
