Chapter 4 Sample Size
A survey without proper planning and sampling is the introduction of
intentional error into research analysis and a recipe for inefficient sample
estimates and a disastrous policy.
-Abdi-Khalil Edriss-
Two important questions in the design of any sample survey are the total cost of the
survey and the precision of the main estimates. Both are related to the size of the
sample, given the variability of the data, the type of sampling, and the method of
estimation. Keep in mind the ultimate goal: the survey should be designed to provide
estimates with minimum sampling error (that is, with maximum precision) when the total
cost is fixed. A sample size that fulfills these conditions is called the optimal
sample size.
This guarantees that the sample is normally distributed; however, depending on
the type of research and considering other factors, the actual sample size should
be determined using one of the formulae discussed below.
For sample size estimation, we must initially obtain or estimate some basic figures. If the
sample size is too small the desired precision might not be achieved; if the sample size is
too large unnecessary costs may be incurred. Here are some optimal sample size
estimation methods.
$$n = \frac{z^2 (1 - p)\, p}{e^2}$$
For a 95% level of confidence (z = 1.96), this becomes
$$n = \frac{(1.96)^2 (1 - p)\, p}{e^2}$$
where z is the z-value yielding the desired degree of confidence, p is an estimate of the
population proportion, and e is the absolute size of the error in estimating p that the
researcher is willing to permit.
POINTS TO PONDER
If no previous estimate of p is available, using p = 0.5 in the sample size formula will
yield the maximum size n required for any given e and z; any other value of p yields a
smaller n. This is because the product p(1-p) increases as p approaches 0.5 and reaches
its maximum of 0.25 at p = 0.5. A value of p nearer to 0.5 therefore gives a safer (more
conservative) estimate of n.
NUMERICAL EXAMPLE
Economics & Business: To calculate a representative sample size for a micro-enterprise
survey, the study focused on micro-enterprises supported by five special rural banking
groups, which comprise about 92% (p = 0.92) of the total micro-enterprises supported by
both rural banking groups and streamline commercial banks in the area. Hence, for a 95%
level of confidence (Z = 1.96), a 5% margin of error (e = 0.05), and taking into account
the proportion of micro-enterprises supported by rural banking groups only, the sample
size, n, was obtained as follows:
$$n = \frac{(1.96)^2 (1 - 0.92)(0.92)}{(0.05)^2} \approx 113$$
Adding 5% for possible non-respondents, the sample size is 119 (113 + 113 x 0.05 ≈ 119)
businesswomen.
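The calculation above can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed implementation; the function name `sample_size_proportion` is a label chosen here for convenience.

```python
import math

def sample_size_proportion(p, e, z=1.96):
    """n = z^2 * p * (1 - p) / e^2 for a proportion p and absolute error e."""
    return z ** 2 * p * (1 - p) / e ** 2

# Micro-enterprise example: p = 0.92, e = 0.05, 95% confidence (z = 1.96)
n = sample_size_proportion(p=0.92, e=0.05)
n_rounded = round(n)                      # nearest whole respondent
n_final = math.ceil(n_rounded * 1.05)     # add 5% for possible non-respondents
print(n_rounded, n_final)

# p = 0.5 gives the largest n for the same e and z
print(sample_size_proportion(0.5, 0.05) > sample_size_proportion(0.92, 0.05))
```

Rounding the non-response allowance upward is a design choice made here so that the inflated sample never falls short of the target.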
Suppose we plan to collect data on households that had access to rural credit and on
those that had no access to rural credit. How do we determine a representative and
adequate sample that does not introduce bias and inefficiency into the sample
characteristics or estimates?
For example, a data set may be stratified into two groups: those who benefit from a
particular project and those who do not; those who adopt a technology and those who do
not; those who have access to credit and those who do not; and so on. If the intention is
to compare two groups or strata, the sample sizes of the two groups should be equal, or
at least in a 60:40 ratio (beneficiaries versus non-beneficiaries).
NUMERICAL EXAMPLE
Health: Estimate the sample size required to determine the prevalence of iron deficiency
in women (Data: Courtesy of World Vision International – Malawi).
Step 1
Guess or anticipate the proportion we are about to measure. If we expect to find 45
anemic pregnant women out of every 100 pregnant women, our anticipated
proportion will be 0.45. The usual margin of error on estimates is set at ±5%.
However, for sub-national estimates we may be satisfied with ±10%.
Calculations
p = previously known prevalence = 0.45
e = margin of error = ±5%. For a confidence level of 95%, Z = 1.96 (2-tailed test)
Therefore,
$$\text{Sample size, } n = \frac{(1.96)^2 (1 - 0.45)(0.45)}{(0.05)^2} = 380.3$$
Step 2
Inflate by 10% to account for non-responders, that is, 1.10 x 380.3 ≈ 418
Step 3
It is very difficult to predict what the design effect will be before carrying out the
study. After the survey is finished, the design effect can be calculated more
accurately and can be used to calculate the actual margin of error.
A design effect of 2 is usually used for most variables (unless the literature
suggests otherwise). Hence,
2 x 418 = 836
Step 4
Now, how do we estimate the number of households we must visit? To estimate
the number of households that are required for the survey, we must know first the
average household size (6 persons per household is the national estimate for
Malawi) and the proportion of women of childbearing age in the population.
If
o The average household size is 6 persons
o 5% of the population is made up of pregnant women
o 836 women should be screened
then the number of households to visit is 836/(6 x 0.05) ≈ 2787.
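The four steps of this example can be chained in a short Python sketch. The function names are illustrative labels, and the household step assumes the same rule used later in the chapter: divide the number of women to be screened by (household size x proportion of pregnant women).

```python
import math

def simple_random_n(p, e, z=1.96):
    """Step 1: n = z^2 * p * (1 - p) / e^2."""
    return z ** 2 * p * (1 - p) / e ** 2

def households_to_visit(n_required, household_size, target_share):
    """Step 4: households needed to find n_required eligible persons."""
    # round(..., 6) guards against floating-point noise before rounding up
    return math.ceil(round(n_required / (household_size * target_share), 6))

n0 = simple_random_n(p=0.45, e=0.05)     # Step 1: about 380.3
n1 = round(n0 * 1.10)                    # Step 2: inflate by 10% for non-response
n2 = 2 * n1                              # Step 3: design effect of 2
hh = households_to_visit(n2, 6, 0.05)    # Step 4: households to visit
print(round(n0, 1), n1, n2, hh)
```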
POINTS TO PONDER
We can reduce the sample size by lowering the level of confidence to 90%, for which the
Z-score is 1.64. This tells us that the lower the confidence level, the smaller the
sample size needed for the actual survey.
The following formula deals with sample size determination when two proportions are
given for the same indicator or variable:
$$n = \frac{\left[ v \sqrt{2 p_0 (1 - p_0)} + u \sqrt{p_1 (1 - p_1) + p_2 (1 - p_2)} \right]^2}{d^2}$$
where $p_0 = \frac{p_1 + p_2}{2}$, $d = p_1 - p_2$ is the difference to be detected, u is
the one-tailed Z-value of the normal distribution corresponding to a power¹ of 80%, and
v is the one-tailed Z-value of the normal distribution corresponding to the 95%
confidence level².
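As a rough sketch, the two-proportion formula can be written as a Python function. The default values v = 1.645 (one-tailed, 95% confidence) and u = 0.84 (one-tailed, 80% power) are assumptions taken from standard normal tables; the chapter does not state the numerical values it uses, so results from this sketch need not match the worked figures exactly.

```python
import math

def sample_size_two_proportions(p1, p2, v=1.645, u=0.84):
    """n = [v*sqrt(2*p0*(1-p0)) + u*sqrt(p1*(1-p1) + p2*(1-p2))]^2 / d^2."""
    p0 = (p1 + p2) / 2            # average of the two proportions
    d = abs(p1 - p2)              # difference to be detected
    term_v = v * math.sqrt(2 * p0 * (1 - p0))
    term_u = u * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    return (term_v + term_u) ** 2 / d ** 2

# Hypothetical call: detecting a change from 45% to 35% under the assumed z-values
n_two = sample_size_two_proportions(0.45, 0.35)
print(round(n_two, 1))
```

Keeping u and v as parameters makes it easy to see how the required sample grows with higher power or a stricter confidence level.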
NUMERICAL EXAMPLE
Economics & Health: Estimate the sample size required to determine whether the
prevalence of anemia in pregnant women has decreased within the previous 12-month
intervention period.
Step 1
Calculations
Previously, 45% of pregnant women were iron deficient. After 12 months of
intervention, the expected decrease in the prevalence of anemia would be 10
percentage points, resulting in a predicted new prevalence of 35%. Calculate the
sample size required to demonstrate the difference between the proportions.
¹ The probability of making a Type II error, denoted by β, arises from a decision to accept a false null
hypothesis. The complement (1 - β) of the probability of a Type II error measures the probability of
rejecting a false null hypothesis, and is known as the power of a statistical test.
² Similarly, the probability of making a Type I error, denoted by α, is the probability of rejecting a true
null hypothesis, and is referred to as the level of significance. The complement (1 - α) of the probability of
a Type I error measures the probability of not rejecting a true null hypothesis, and is known as the
confidence level.
The difference, d = 0.45 - 0.35 = 0.1
Now, $p_0 = \frac{p_1 + p_2}{2} = \frac{0.45 + 0.35}{2} = 0.4$, and then the sample size is
$$n = \frac{\left[ v \sqrt{2 p_0 (1 - p_0)} + u \sqrt{p_1 (1 - p_1) + p_2 (1 - p_2)} \right]^2}{d^2}$$
Step 2
Inflate by 10% to account for non-respondents, which gives 350.9.
Step 3
Choose an appropriate design effect
Again, use a design effect of 2 for the anemia variable to adjust the sample size as
follows.
2 x 350.9 = 701.8
Therefore to detect a true difference of 10% (i.e., a reduction from 45% to 35%)
with a confidence level of 95%, a survey would require 702 pregnant women.
Step 4
Estimate the number of households that must be visited
n = [702/(6x0.05)] = 2340
Note that these calculations must be repeated for each of the indicators being measured
in the survey. Alternatively, we should pick the indicator that requires the largest
sample size so that it covers the other indicators.
NUMERICAL EXAMPLE
Economics & Agribusiness: The various Malawi Social Action Fund (MASAF) projects have
been implemented in 27 districts of Malawi since the late 1990s. Using a two-stage
sampling method, MASAF enumeration areas (EAs) within a given district, and then
households within the selected MASAF EAs, will be randomly chosen to obtain a
representative sample of households for the survey.
The sample (that is, the number of households sampled, P) per EA will be determined using
the following formula.
$$P_{1i} = \frac{a M_i}{\sum M_i} \times \frac{c}{a}$$
$$P_{2i} = \frac{b_i}{L_i}$$
Where
a is the number of EAs to be selected in each district,
c is the number of EAs selected in each district sample in the 2004
Malawi Demographic and Health Survey (MDHS),
$M_i$ is the number of households in the ith EA in each district according to the 1998
population census,
$\sum M_i$ is the total number of households in each district according to the 1998
population census,
$b_i$ is the number of households sampled in each EA, and
$L_i$ is the total number of households listed in the selected ith EA during the 2004
MDHS listing operation.
Before the final household selection, a complete household listing operation will be
carried out for each selected EA if a listing is not readily available. However, if
household listings are available from NSO, the selected households will be checked
against them to see whether matching is possible, so as to create a panel of households.
This will help to evaluate the impacts of MASAF 3 APL 1 effectively.
For example,
estimate the sample size required to determine whether 'poor households receiving a daily
transfer of US$0.3' has reduced the number of people living on less than US$1 per day
since the inception of MASAF 3 in 2003.
NUMERICAL EXAMPLE
Economics: Estimate the sample size required to determine whether 'poor households
receiving a daily transfer of US$0.3' has reduced the number of people living on less than
US$1 per day from 2003 to 2007.
Step 1
Calculations
Previously, 55% of the Malawi population lived below US$1 per day. After 3-4
years of intervention, the expected decrease in the number of poor households would
be 10 percentage points, resulting in a predicted new poverty rate of 45%. Calculate
the sample size required to demonstrate the difference between the proportions.
Now, $p_0 = \frac{p_1 + p_2}{2} = \frac{0.55 + 0.45}{2} = 0.5$, and then the sample size is
$$n = \frac{\left[ v \sqrt{2 p_0 (1 - p_0)} + u \sqrt{p_1 (1 - p_1) + p_2 (1 - p_2)} \right]^2}{d^2} = \frac{3.341}{0.01} \approx 334$$
Step 2
Inflate by 10% to account for non-respondents (sub-national level), that is, 1.10 x 334 = 367.4
Step 3
Choose an appropriate design effect. Again, using a design effect of 2 for most
indicators/variables,
2 x 367.4 = 734.8
Therefore, to detect a true difference of 10% (i.e., a reduction from 55% to 45%)
with a confidence level of 95%, a survey would require, on average, 735
households per district³.
Step 4
Taking into account the proportion of MASAF 3 beneficiaries throughout the 27
districts, the estimated number of households that must be visited for the survey in
the 27 districts is -
Note that these calculations must be repeated for each of the indicators being
measured in the survey.
Note that due to the diversity of the indicators to be studied in the survey, the
sample size will vary for different indicators. Therefore, the variable with the
highest sample size will be taken as the sample size for all other variables. Although
this will increase the number of households for other indicators, it is statistically
better to have a large representative sample than a reduced one, and it is also easier
to administer the whole questionnaire at every selected household during the
survey.
IMPORTANT NOTE - Given the number of MASAF beneficiaries in the 27 districts, which was
6,841,055, this sample size appears appropriate for analysing the impacts of the
projects using panel data, as recommended by the World Bank and the MASAF team. It is
a sound sample size, as it is close to the sample size⁴ (n = 14,000) used in MDHS 2000,
whose households this survey intends to match to create the panel data, which will
enable us to obtain the real impact of MASAF.
³ Proportional probability sampling is applied among the districts as the population size varies.
⁴ Refer to MDHS 2000 sample design technique, Appendix A, page 197.
IV - Sample Size with Variance – Formula C
$$n = \frac{(1.96)^2 \sigma^2}{e^2}$$
The value of σ² must frequently be estimated; if it is not known, use the sample
variance. A rule of thumb for estimating σ when no similar studies have been done is
that σ is approximately equal to 1/6 of the difference between the highest and lowest
values in the population.
NUMERICAL EXAMPLE
Economics & Business: A small agricultural credit institution wanted to estimate the
sample size to draw from its 1,200,000 clients. It has information that the standard
deviation of the sample mean is 5 Malawi Kwacha (MK), or that the maximum loan given was
MK2000 and the minimum was MK500. With an allowable error of 10%, estimate the sample
size, n, that is required at the moment.
Solution
$$n = \frac{(1.96)^2 \sigma^2}{e^2} = \frac{(1.96)^2 (5)^2}{(0.1)^2} = 9604 \text{ clients}$$
NUMERICAL EXAMPLE
Economics: Past experience indicates that the standard deviation of the amount of maize
consumed per month by households in a certain village is 50 bags. How large a sample
must be taken for the estimate of the mean consumption to be within 5 bags of maize of
the true mean with 95% probability?
Solution
$$n = \frac{(1.96)^2 (50)^2}{(5)^2} \approx 384 \text{ households}$$
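The variance-based formula lends itself to a small helper. The sketch below is illustrative only; the function names are chosen here, and `sigma_from_range` simply encodes the rule of thumb that σ is roughly one sixth of the range.

```python
def sample_size_mean(sigma, e, z=1.96):
    """n = z^2 * sigma^2 / e^2 for estimating a mean within +/- e."""
    return (z * sigma / e) ** 2

def sigma_from_range(highest, lowest):
    """Rule-of-thumb estimate of sigma: one sixth of the range."""
    return (highest - lowest) / 6

# Maize example: sigma = 50 bags, e = 5 bags, 95% confidence
n_maize = sample_size_mean(sigma=50, e=5)
# Credit example: sigma = 5, e = 0.1
n_credit = sample_size_mean(sigma=5, e=0.1)
print(round(n_maize), round(n_credit))
```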
NUMERICAL EXAMPLE
Life Sciences: A nutritionist is interested in the effectiveness of fortified food in the
villages, and a pre-study record showed a monthly consumption of 2 kg among children in
a household throughout the country. How large a sample must be taken for the estimate of
the mean consumption to be within 250 grams of fortified food of the true mean with 99%
probability?
Solution
$$n = \frac{(1.96)^2 \sigma^2}{e^2} = \frac{(1.96)^2 (0.25)^2}{(0.01)^2} = 2401 \text{ households}$$
4.7. When proportions and population size are known, how do you
determine a sample size?
The following formula can be employed to calculate the sample size (Kothari, 2004)
when the population size and the population proportion of major interest are available.
$$n = \frac{z^2 \cdot p \cdot q \cdot N}{e^2 (N - 1) + z^2 \cdot p \cdot q}$$
where q = 1 - p.
NUMERICAL EXAMPLE
Economics & Agribusiness: The above formula was employed to come up with an
appropriate sample size of Cotton Smallholder farmers in Malawi (2009).
In the Malawi case, the number of cotton farmers is N = 156,023, with p = 0.5, q = 0.5,
Z = 1.65 at α = 0.1, and e = 0.08. Therefore,
$$n = \frac{(1.65)^2 (0.5)(0.5)(156023)}{(0.08)^2 (156023 - 1) + (1.65)^2 (0.5)(0.5)} = 106.3$$
This results in a sample of 106 cotton farmers. An additional 10% was added to cater for
non-response and spoilt questionnaires. Thus, a total of 117 cotton farmers were
randomly sampled for the interview.
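The cotton-farmer calculation can be reproduced with a short Python sketch. The function name is an illustrative label, and rounding the 10% reserve upward is a choice made here.

```python
import math

def sample_size_finite(N, p, e, z):
    """n = z^2 * p * q * N / (e^2 * (N - 1) + z^2 * p * q), with q = 1 - p."""
    q = 1 - p
    return z ** 2 * p * q * N / (e ** 2 * (N - 1) + z ** 2 * p * q)

n_cotton = sample_size_finite(N=156023, p=0.5, e=0.08, z=1.65)
n_base = round(n_cotton)                   # 106 farmers
n_with_reserve = math.ceil(n_base * 1.10)  # add 10% for non-response
print(round(n_cotton, 1), n_base, n_with_reserve)
```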
VI - Sample size for Different Strata – Formula E
Given a total sample size n, its allocation or distribution to the different strata or
groups would be based on the following principles: (1) a specified total cost of
surveying the sample, (2) minimum sampling variance of the strata, and/or (3)
proportional allocation (or PPS).
The total cost of the survey, C, is
$$C = c_0 + \sum_{i=1}^{m} n_i c_i$$
Where c0 is the overhead cost, and ci the average cost of taking a sample unit in
the ith stratum, which may vary from stratum to stratum, depending on field
condition (rough roads, distance, mountains, crossing valleys and rivers, weather,
etc.)
If the cost per unit $c_i$ is assumed to be the same, c, in all the strata, then the
previous function becomes
$$C = c_0 + nc$$
Therefore, given the total cost, the total sample size can be determined as
$$n = (C - c_0) / c$$
For continuous data, if σ² (written $s_{ss}^2$ below) is the desired variance of a sample
estimator, y, then the required sample size n for a specified variance becomes
$$n = \frac{\sum_{i=1}^{m} N_i s_i^2}{s_{ss}^2}$$
Also, the sample size for each stratum or group, given a fixed cost, is:
$$n_i = \frac{n N_i \sigma_i}{\sum N_i \sigma_i}$$
This optimum allocation is known as the Neyman allocation.
Note that the population standard deviation $\sigma_i$ will usually not be known, and
estimates such as $s_i$ (the sample standard deviation) would have to be obtained
from a previous study or a reconnaissance/pilot survey relating to the desired variable.
However, if such information on the sample standard deviation is lacking, the
alternative is to use the range of the variable to determine the sample standard
deviation and use it in the formula.
(3) Proportional allocation: if $s_{ss}^2$ is the desired stratified variance of the proportion p,
then the required sample size n for the Neyman allocation is
$$n = \frac{\sum_{i=1}^{m} N_i (p_i q_i)}{N^2 s_{ss}^2}$$
$$n = \frac{\sum_{i=1}^{m} (N_i p_i q_i)}{N s_{ss}^2}$$
Now, in the absence of any information on the stratified sample variance, $s_i^2$, for each
stratum, and if it can be assumed to be the same in all the strata, the Neyman allocation
takes the simple form:
$$n_i = \frac{n N_i}{N}$$
where $n_i$ is proportional to $N_i$; that is, the sample is allocated to the different
strata in proportion to the number of sub-population units $N_i$.
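The two allocation rules can be compared side by side in a brief Python sketch. The stratum sizes and standard deviations below are hypothetical numbers chosen only for illustration; they are not taken from any table in this chapter.

```python
def neyman_allocation(n, sizes, sds):
    """n_i = n * N_i * sigma_i / sum(N_i * sigma_i)."""
    weights = [N_i * s_i for N_i, s_i in zip(sizes, sds)]
    total = sum(weights)
    return [n * w / total for w in weights]

def proportional_allocation(n, sizes):
    """n_i = n * N_i / N (Neyman allocation with equal stratum variances)."""
    N = sum(sizes)
    return [n * N_i / N for N_i in sizes]

# Hypothetical strata (illustrative values only)
sizes = [400, 300, 300]        # N_i
sds = [10.0, 20.0, 30.0]       # s_i from a pilot survey, say

neyman = neyman_allocation(380, sizes, sds)
proportional = proportional_allocation(380, sizes)
print(neyman, proportional)
```

With equal standard deviations the two functions return the same allocation, which is the simplification described in the text; in practice the fractional results would still need to be rounded to whole units.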
NUMERICAL EXAMPLE
Economics and Business: The data on the number of women who received rural credit
from a certain credit institution in certain villages of Malawi for the years 2005 and
2010 are given in Table 6.1, in 5 strata according to the total amount of loan they
had received, along with the present number of households in the villages.
Using the data in Table 6.1, determine the allocation of the sample to the different
strata according to the following principles: (i) Neyman allocation, (ii) proportional
allocation, and (iii) allocation proportional to the total number of women in the
different strata.
After calculating the sample size, n = 380, the allocation of the sample of 380
households to the different strata is shown in Table 6.2.
VII - Sample size and a model – Formula F
A bigger controversy arises when determining an adequate sample size to run a regression
model (refer to Chapter 7 for details on regression models). For example, if one considers
that n = 40 is sufficient to run ordinary statistical tests, does it mean this sample size
is adequate to run a model that has 5 or more independent variables? The answer is no.
So far we have discussed sample size in the context of precision and confidence with
respect to one variable only. In research, however, the theoretical framework has several
variables of interest, and the question arises how one should come up with a sample size
when all the factors are taken into account. Krejcie and Morgan (1970) greatly simplified
the sample size decision by providing a table that ensures a good decision model. Table
4.3 provides the generalized scientific guideline for sample size decisions. The interested
student is advised to read Krejcie and Morgan (1970), as well as Cohen (1969), for
decisions on sample size⁵.
Furthermore, Roscoe (1975) proposes the following rules of thumb for determining
sample size:
o Sample sizes larger than 30 and less than 500 are appropriate for most research.
o Where samples are to be broken into sub-samples (male-headed/female-headed
household, urban/rural area, etc.), a minimum sample size of 30 for each category is
necessary.
⁵ As a precaution, note that this table suggests that a specific value of sample size, n, is always
appropriate for a given population size, N, ignoring some of the statistical parameters used in model
estimation. Hence, the suggested sample sizes should be used with caution for simple surveys and
statistical parameter estimates.
Table 4.3: Sample Size (n) for a given Population Size (N)
N n N n N n
10 10 220 140 1200 291
15 14 230 144 1300 297
20 19 240 148 1400 302
25 24 250 152 1500 306
30 28 260 155 1600 310
35 32 270 159 1700 313
40 36 280 162 1800 317
45 40 290 165 1900 320
50 44 300 169 2000 322
55 48 320 175 2200 327
60 52 340 181 2400 331
65 56 360 186 2600 335
70 59 380 191 2800 338
75 63 400 196 3000 341
80 66 420 201 3500 346
85 70 440 205 4000 351
90 73 460 210 4500 354
95 76 480 214 5000 357
100 80 500 217 6000 361
110 86 550 226 7000 364
120 92 600 234 8000 367
130 97 650 242 9000 368
140 103 700 248 10000 370
150 108 750 254 15000 375
160 113 800 260 20000 377
170 118 850 265 30000 379
180 123 900 269 40000 380
190 127 950 274 50000 381
200 132 1000 278 75000 382
210 136 1100 285 1000000 384
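The entries of Table 4.3 appear consistent with the finite-population formula of Section 4.7 evaluated at p = 0.5 and e = 0.05, with z² replaced by the chi-square value 3.841. These parameter values are an assumption based on the formula published by Krejcie and Morgan (1970), not something stated in this chapter. Under that assumption, a minimal Python sketch is:

```python
def krejcie_morgan(N, chi2=3.841, p=0.5, d=0.05):
    """Sample size for population N (assumed parameters: chi2, p, d)."""
    numerator = chi2 * N * p * (1 - p)
    denominator = d ** 2 * (N - 1) + chi2 * p * (1 - p)
    return round(numerator / denominator)

# Spot-check a few population sizes against the table
for N in (10, 100, 1000, 10000):
    print(N, krejcie_morgan(N))
```

Such a function is handy when the population size of interest falls between two rows of the table.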
POINTS TO PONDER
In sum, the sample size, n, is a function of: (1) the variability in the population, (2) the precision
or accuracy needed, (3) the confidence level desired, (4) the type of sampling plan used (for
example, simple random sampling versus stratified random sampling), and (5) the number of
independent variables in a model. Note that these are not considered in Table 4.3, and hence
an appropriate sample size should be estimated, when conducting a survey, using the various
formulae given previously.
IX - Determination of a Population size – Formula H
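Important Formulas
$$p_{estimated} = \frac{s}{n}$$
$$N_{estimated} = \frac{t}{p_{estimated}} = \frac{nt}{s}$$
$$V_{estimated} = \frac{t^2 n (n - s)}{s^3}$$
Bound on the error of estimation:
$$2\sqrt{V_{estimated}} = 2\sqrt{\frac{t^2 n (n - s)}{s^3}}$$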
NUMERICAL EXAMPLE
Economics & Business: Suppose an officer from Wildlife Malawi is concerned about the
apparent decline in the number of mountain antelopes in Nyka park. Estimates of the
population size are available from previous years. To determine whether or not there
has been a decline, first a random sample of 100 antelopes is caught (t = 100), tagged
and then released. A month later a second sample of 50 is taken (n = 50), and twenty
tagged antelopes are recaptured in the second sample (s = 20). Estimate the population
size, N. (Assume that tagging does not affect the likelihood of recapture.)
Solution
Using the equations given in method 1, we have
$$N_{estimated} = \frac{nt}{s} = \frac{50(100)}{20} = 250$$
$$2\sqrt{V_{estimated}} = 2\sqrt{\frac{(100)^2 (50)(50 - 20)}{(20)^3}} = 2\sqrt{1875} \approx 87$$
Thus, the officer estimates the total number of mountain antelopes to be 250, with a
bound on the error of estimation of approximately 87 mountain antelopes. Note that we
might be concerned about the high bound on the error. This could have been improved if
we had a larger sample size.
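The direct-sampling estimate and its bound can be checked with a few lines of Python. This is a sketch only; the function name is an illustrative label.

```python
import math

def direct_capture_recapture(t, n, s):
    """Return (N_estimated, bound) with V = t^2 * n * (n - s) / s^3."""
    N_hat = n * t / s
    variance = t ** 2 * n * (n - s) / s ** 3
    return N_hat, 2 * math.sqrt(variance)

# Antelope example: t = 100 tagged, n = 50 in second sample, s = 20 recaptured
N_antelope, bound_antelope = direct_capture_recapture(t=100, n=50, s=20)
print(N_antelope, round(bound_antelope, 1))
```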
Important Formulas
Estimation of N (note that t is the initial sample, n is the second sample, and s is the
number of recaptured units within n):
$$N_{estimated} = \frac{nt}{s}$$
$$V_{estimated} = \frac{t^2 n (n - s)}{s^2 (s + 1)}$$
Bound on the error of estimation:
$$2\sqrt{V_{estimated}} = 2\sqrt{\frac{t^2 n (n - s)}{s^2 (s + 1)}}$$
NUMERICAL EXAMPLE
Economics: Authorities in Liwonde National Park are interested in the total number of
birds of a particular species that inhabit the park. A random sample of t = 200 birds is
trapped, tagged and then released. In the same month a second sample is drawn until 30
tagged birds are recaptured (s = 30). In total, 100 birds are captured in order to find
30 tagged ones (n = 100). Estimate N, and place a bound on the error of estimation.
Solution
Using the formulas in method 2, we estimate N by
$$N_{estimated} = \frac{nt}{s} = \frac{100(200)}{30} = 666.67$$
$$2\sqrt{V_{estimated}} = 2\sqrt{\frac{(200)^2 (100)(100 - 30)}{(30)^2 (30 + 1)}} \approx 200$$
Hence, we estimate that 667 birds of the particular species inhabit Liwonde National
Park. We are quite confident that our estimate is within approximately 200 birds of the
true population size.
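For the inverse-sampling case, a comparable sketch is given below. It assumes the variance takes the form t²n(n - s) / (s²(s + 1)), which is the usual inverse-sampling expression; the function name is an illustrative label.

```python
import math

def inverse_capture_recapture(t, n, s):
    """Return (N_estimated, bound), assuming V = t^2 * n * (n - s) / (s^2 * (s + 1))."""
    N_hat = n * t / s
    variance = t ** 2 * n * (n - s) / (s ** 2 * (s + 1))
    return N_hat, 2 * math.sqrt(variance)

# Bird example: t = 200 tagged, sampling continued until s = 30 recaptures (n = 100)
N_birds, bound_birds = inverse_capture_recapture(t=200, n=100, s=30)
print(round(N_birds, 2), round(bound_birds, 1))
```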
Method 3 – Quadrat
The third technique involves estimating population density and size from quadrat
samples (plots, volumes, or intervals of time). That is, the number of elements in a
defined area or volume can be estimated by first estimating the number of elements
per unit area (that is, the density of the elements) and then multiplying the
estimated density by the size of the area under study.
It may seem that there is nothing new here. However, it is often the case that the
elements being counted (diseased trees, bacteria colonies, traffic accidents, etc.)
are themselves randomly distributed over area, volume, or time.
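Important Formulas
Thus, under the assumption of randomly dispersed elements (assuming the quadrat counts
to have a Poisson distribution), we have the following estimators of the density λ and
the total M, where $\bar{m}$ is the average number of elements per quadrat, a is the
area of each quadrat, q is the number of quadrats sampled, and A is the total area.
Estimator of the density λ is
$$\lambda_{estimated} = \frac{\bar{m}}{a}$$
Estimated variance of λ is
$$V_{estimated} = \frac{\lambda_{estimated}}{aq}$$
Bound on the error of estimation:
$$2\sqrt{V_{estimated}} = 2\sqrt{\frac{\lambda_{estimated}}{aq}}$$
Estimator of the total M is
$$M_{estimated} = A \lambda_{estimated}$$
Estimated variance of M is
$$V_M = \frac{A^2 \lambda_{estimated}}{aq}$$
Bound on the error of estimation:
$$2\sqrt{V_M} = 2A\sqrt{\frac{\lambda_{estimated}}{aq}}$$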
NUMERICAL EXAMPLE
Economics: The Department of Forestry is investigating the density of trees having
fusiform rust on a Northern tree plantation of 500 acres. The density is to be estimated
from a sample of q = 20 quadrats, where each quadrat is a = 0.5 acre. The 20 sample plots
had an average of $\bar{m}$ = 2.0 infected trees per quadrat.
i. Estimate the density of infected trees, and place a bound on the error of estimation.
ii. Estimate the total number of infected trees in the 500-acre plantation, and place a
bound on the error of estimation.
Solution
(i) Using the equation in method 3 with a = 0.5, we determine the estimated density as
$$\lambda_{estimated} = \frac{\bar{m}}{a} = \frac{2}{0.5} = 4 \text{ trees per acre}$$
$$2\sqrt{\frac{\lambda_{estimated}}{aq}} = 2\sqrt{\frac{4}{(0.5)(20)}} = 1.26$$
Thus, we estimate the density as 4.00 ± 1.26, or from 2.74 to 5.26 infected trees per
acre.
(ii) The estimated total number of infected trees in the 500-acre area is
$$M_{estimated} = A \lambda_{estimated} = 500(4) = 2000$$
$$2\sqrt{V_M} = 2A\sqrt{\frac{\lambda_{estimated}}{aq}} = 2(500)\sqrt{\frac{4}{(0.5)(20)}} = 632.45$$
Thus, we estimate the total number of infected trees as 2000 ± 633, or 1367 to 2633 in the
500-acre area of Northern Plantation.
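The quadrat estimators can be sketched in Python as follows. The symbols λ (density) and a (quadrat area) and the function name are notational assumptions made for this illustration.

```python
import math

def quadrat_estimates(m_bar, a, q, A):
    """Density and total from quadrat counts, with bounds 2*sqrt(variance)."""
    density = m_bar / a                              # elements per unit area
    density_bound = 2 * math.sqrt(density / (a * q))
    total = A * density                              # elements in the whole area
    total_bound = A * density_bound                  # equals 2*A*sqrt(density/(a*q))
    return density, density_bound, total, total_bound

# Fusiform rust example: m_bar = 2.0, a = 0.5 acre, q = 20 quadrats, A = 500 acres
density, density_bound, total, total_bound = quadrat_estimates(2.0, 0.5, 20, 500)
print(density, round(density_bound, 2), total, round(total_bound, 2))
```

The bound on the total is simply A times the bound on the density, which gives a quick consistency check on hand calculations.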
To explain the notion of stocked quadrats, let y denote the number of sampled
quadrats that are not stocked in a sample of q quadrats, each of area a, from a
population of area A. Under the assumption of randomness of elements, the
proportion of unstocked quadrats in the population is approximately
$e^{-\lambda a}$. We know from our previous discussions that the sample
proportion of unstocked quadrats is a good estimator of the population
proportion. Thus y/q is an estimator of $e^{-\lambda a}$, and this result leads
to the following estimators of λ and M.
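Important Formulas
Estimator of the density λ is
$$\lambda_{estimated} = -\frac{1}{a} \ln\left(\frac{y}{q}\right), \text{ where ln denotes the natural logarithm}$$
Estimated variance of λ is
$$V = \frac{1}{q a^2}\left(e^{\lambda a} - 1\right)$$
Bound on the error of estimation:
$$2\sqrt{V} = 2\sqrt{\frac{1}{q a^2}\left(e^{\lambda a} - 1\right)}$$
Estimator of the total M is
$$M_{estimated} = A \lambda_{estimated}$$
Estimated variance of M is
$$V_M = A^2 V = \frac{A^2}{q a^2}\left(e^{\lambda a} - 1\right)$$
Bound on the error of estimation:
$$2\sqrt{V_M} = 2A\sqrt{\frac{1}{q a^2}\left(e^{\lambda a} - 1\right)}$$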
NUMERICAL EXAMPLE
Economics: Recall the previous problem statement of the 500-acre Northern Plantation.
Now, for estimation of the density of trees infected by fusiform rust, q = 30 quadrats of
a = 0.5 acre each will be sampled, but only the presence or absence of infected trees will
be noted for each sampled quadrat, rather than counting the number of trees, which is
sometimes cumbersome. Suppose y = 6 of the 30 quadrats show no signs of fusiform rust.
Estimate the density and number of infected trees, placing bounds on the error of
estimation in both cases.
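Solution
Using the formulas in method 4, the density is estimated by
$$\lambda_{estimated} = -\frac{1}{a} \ln\left(\frac{y}{q}\right) = -\frac{1}{0.5} \ln\left(\frac{6}{30}\right) = (-2)(-1.609) \approx 3.2 \text{ trees per acre}$$
$$2\sqrt{V} = 2\sqrt{\frac{1}{q a^2}\left(e^{\lambda a} - 1\right)} = 2\sqrt{\frac{1}{30(0.5)^2}\left(e^{3.2(0.5)} - 1\right)} \approx 1.45$$
We then estimate the density as 3.2 ± 1.5, or 1.7 to 4.7 infected trees per acre.
$$2\sqrt{V_M} = 2A\sqrt{\frac{1}{q a^2}\left(e^{\lambda a} - 1\right)} = 2(500)\sqrt{\frac{1}{30(0.5)^2}\left(e^{3.2(0.5)} - 1\right)} \approx 726$$
Now, our estimate of the total number of infected trees is 1600 ± 726, or 874 to
2326 in the 500-acre Northern Plantation.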
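The stocked-quadrat estimators can also be sketched in Python. The symbols λ and a and the function name are notational assumptions for this illustration. The function keeps the unrounded density (about 3.22), so its bounds differ slightly from figures worked by hand with the density rounded to 3.2.

```python
import math

def stocked_quadrat_estimates(y, q, a, A):
    """Density and total from presence/absence data on q quadrats of area a."""
    density = -math.log(y / q) / a                       # -(1/a) * ln(y/q)
    variance = (math.exp(density * a) - 1) / (q * a ** 2)
    density_bound = 2 * math.sqrt(variance)
    return density, density_bound, A * density, A * density_bound

# Example: y = 6 unstocked quadrats out of q = 30, a = 0.5 acre, A = 500 acres
lam, lam_bound, total_trees, total_trees_bound = stocked_quadrat_estimates(6, 30, 0.5, 500)
print(round(lam, 2), round(lam_bound, 2), round(total_trees), round(total_trees_bound))
```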
$$\hat{p} = p_{estimated} = \frac{\sum a_j}{\sum m_j}$$
$$Var(\hat{p}) = Var(p_{estimated}) = \left(\frac{N - n}{N n\, m_{mean}^2}\right) \frac{\sum (a_j - \hat{p}\, m_j)^2}{n - 1}$$
Bound on the error of estimation:
$$2\sqrt{Var(\hat{p})}$$
NUMERICAL EXAMPLE
Life Sciences: Of the total 415 village residents, a sample of 25 residents were asked
whether they have sanitation facilities or not. The data set is given in Table 6.4.
  j   m_j   a_j   |    j   m_j   a_j
  1     8     4   |   14    10     5
  2    12     7   |   15     9     4
  3     4     1   |   16     3     1
  4     5     3   |   17     6     4
  5     6     3   |   18     5     2
  6     6     4   |   19     5     3
  7     7     4   |   20     4     1
  8     5     2   |   21     6     3
  9     8     3   |   22     8     3
 10     3     2   |   23     7     4
 11     2     1   |   24     3     0
 12     6     3   |   25     8     3
 13     5     2   |
Σ m_j = 151, Σ a_j = 72
Solution
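The best estimate of the population proportion of households with sanitation facilities is
$\hat{p}$ (or $p_{estimated}$):
$$\hat{p} = p_{estimated} = \frac{\sum a_j}{\sum m_j} = \frac{72}{151} = 0.48 = 48\%$$
$$\sum (a_j - m_j \hat{p})^2 = \sum a_j^2 - 2\hat{p} \sum a_j m_j + \hat{p}^2 \sum m_j^2$$
and
$$m_{mean} = \frac{\sum m_j}{n} = \frac{151}{25} = 6.04$$
$$Var(\hat{p}) = \left(\frac{N - n}{N n\, m_{mean}^2}\right) \frac{\sum (a_j - \hat{p}\, m_j)^2}{n - 1} = \left(\frac{415 - 25}{415(25)(6.04)^2}\right) \left(\frac{12.729}{24}\right) = 0.00055$$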
Thus, the best estimate of the proportion of people who have sanitation facilities is 0.48
or 48%. The error of estimation should be less than 5% with probability of approximately
95%.
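The whole calculation can be reproduced from the raw data with a short Python sketch. The lists below are transcribed from the data table of this example (the m_j and a_j columns for j = 1 to 25); the function name is an illustrative label.

```python
import math

# m_j and a_j for j = 1..25, transcribed from the example's data table
m = [8, 12, 4, 5, 6, 6, 7, 5, 8, 3, 2, 6, 5, 10, 9, 3, 6, 5, 5, 4, 6, 8, 7, 3, 8]
a = [4, 7, 1, 3, 3, 4, 4, 2, 3, 2, 1, 3, 2, 5, 4, 1, 4, 2, 3, 1, 3, 3, 4, 0, 3]

def cluster_proportion(a, m, N):
    """Ratio estimate of a proportion with its variance and error bound."""
    n = len(m)
    p_hat = sum(a) / sum(m)
    m_mean = sum(m) / n
    ss = sum((a_j - p_hat * m_j) ** 2 for a_j, m_j in zip(a, m))
    variance = (N - n) / (N * n * m_mean ** 2) * ss / (n - 1)
    return p_hat, variance, 2 * math.sqrt(variance)

p_hat, var_p, bound_p = cluster_proportion(a, m, N=415)
print(round(p_hat, 2), round(var_p, 5), round(bound_p, 3))
```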
===============================================================
MENTAL GYMNASTICS
CHAPTER FOUR
===============================================================
3. Given a total sample size n, its allocation or distribution to the different strata or
groups would be based on mainly three principles. State and explain these principles.
5. The data on the number of food-secure households whose income ranges from zero
to above 25,000 Malawi Kwacha for the years 2007 and 2010 are given in the following
Table, in 6 strata according to the total amount of on- and off-farm income, along
with the present number of households in project areas of some districts.
Using the data in the previous Table, determine the allocations of the sample in the
different strata according to the following principles; (i) Neyman allocation, (ii)
proportional allocation, and (iii) allocation proportional to total number of
households in different strata.