Chapter Five Sampling Design: Infinite
SAMPLING DESIGN
5.1. CENSUS AND SAMPLE SURVEY
In most cases, undertaking a census survey (studying every item in the population) is not possible. Sufficiently accurate results can often be obtained by studying only a part of the total population, technically called a sample. The process of selecting a sample is called sampling. The sample selected should, however, be as representative of the total population as possible, so that it forms a miniature cross-section of that population. A researcher must therefore prepare a sample design for his study, i.e., he must plan how the sample will be selected and how large it will be. The sample design is determined before any data are collected.
While developing a sample design, the researcher must pay attention to the following points.
• Type of universe: The first step in developing any sample design is to define the universe (the population of interest). The universe can be finite or infinite. In a finite universe, the number of items is certain; examples include the population of a city or the number of workers in a factory. In an infinite universe, the number of items is infinite, i.e., we cannot know the total number of items; examples include the listeners of a specific radio program or the stars in the sky.
• Sampling unit: A decision has to be taken concerning the sampling unit before selecting the sample. The sampling unit may be a geographical one such as a district, kebele, or village, a social unit such as a family or school, or an individual.
• Source list: Also known as the sampling frame, this is the list from which the sample is drawn. For a finite universe, it contains the names of all items of the universe. A source list should be comprehensive, correct, reliable, and appropriate, and it is extremely important that it be as representative of the population as possible.
• Size of sample: This refers to the number of items to be selected from the universe to constitute a sample. The sample should be neither excessively large nor too small; it should be optimum. In deciding on the sample size, a researcher must take into consideration the population variance, the size of the population, the parameters of interest in the study, and budgetary constraints.
• Parameters of interest: In determining the sample design, one must consider which specific population parameters are of interest.
• Budgetary constraint: From a practical point of view, cost considerations have a major impact on decisions about not only the size of the sample but also the type of sample. Cost constraints can even lead the researcher to use non-probability samples.
• Sampling procedure: Finally, the researcher must decide the type of sample he will use, i.e., the technique for selecting the items for the sample. Of the several sample designs available, he should choose the one which, for a given sample size and a given cost, has the smallest sampling error.
In this context, one must remember that two costs are involved in a sampling analysis: the cost of collecting the data and the cost of an incorrect inference resulting from the data. There are two causes of incorrect inferences, namely systematic bias and sampling error.
1. Systematic bias: Systematic bias results from errors in the sampling procedures, and it cannot be reduced or eliminated by increasing the sample size. However, the causes responsible for these errors can be detected and corrected. Bias enters when a sample fails to represent the population it was intended to represent. Usually a systematic bias is the result of one or more of the following factors:
i) Inappropriate sampling frame: If the sampling frame is inappropriate, i.e., a biased representation of the universe, it will result in a systematic bias.
ii) Defective measuring device: In survey work, systematic bias can result if the questionnaire or
the interviewer is biased. Similarly, if the physical measuring device is defective there will be
systematic bias in the data collected through such a measuring device.
iii) Non-respondents: If we are unable to obtain responses from all the individuals initially included in the sample, a systematic bias may arise.
iv) Indeterminacy principle: Individuals sometimes act differently when they know they are being observed than they do when unobserved. For instance, if workers know that somebody is observing them during a work study that will determine the average time needed to complete a task, and hence the quota for piece work, they generally tend to work more slowly than they would if unobserved.
v) Natural bias in the reporting of data: This also leads to a systematic bias. There is usually a downward bias in income data collected by the government taxation department, whereas we find an upward bias in income data collected by some social organizations.
2. Sampling errors: Sampling errors are the random variations of the sample estimates around the true population parameters. Since they occur randomly and are equally likely to be in either direction, they tend to compensate for one another, and their expected value is zero. Sampling error decreases as the sample size increases, and it is smaller when the population is homogeneous.
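The compensating nature of sampling error, and the way it shrinks as the sample grows, can be seen in a quick simulation. This is only a sketch: it assumes a hypothetical, normally distributed population (mean 50, standard deviation 10) and uses the sample mean as the estimate; none of these choices come from the text.

```python
import random
import statistics

def sampling_errors(population, n, trials, seed=0):
    """Draw `trials` simple random samples of size n (without replacement) and
    return each sample mean minus the true population mean."""
    rng = random.Random(seed)
    mu = statistics.fmean(population)
    return [statistics.fmean(rng.sample(population, n)) - mu for _ in range(trials)]

# Hypothetical population of 10,000 measurements (assumed for illustration).
gen = random.Random(42)
population = [gen.gauss(50, 10) for _ in range(10_000)]

for n in (10, 100, 1000):
    errors = sampling_errors(population, n, trials=1000)
    # Mean error stays close to zero; the spread of the errors shrinks as n grows.
    print(f"n={n:5d}  mean error={statistics.fmean(errors):+.3f}  "
          f"spread (std. dev.)={statistics.stdev(errors):.3f}")
```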
Sampling error can be measured for a given sample design and size. This measure is usually called the ‘precision of the sampling plan’. Precision can be improved by increasing the sample size, but doing so has its own limitations: a larger sample increases the cost of collecting data and may also increase the systematic bias. The more effective way to increase precision is usually to select a better sampling design, one with a smaller sampling error for a given sample size at a given cost. In brief, when selecting a sampling procedure, the researcher must ensure that it causes a relatively small sampling error and helps to control the systematic bias.
Sample designs are basically of two types: probability sampling and non-probability sampling. Probability sampling is based on the concept of random selection, whereas non-probability sampling relies on non-random selection.
To take a simple random sample, we can use the lottery method. We first write the name of each element of a finite population on a slip of paper, put the slips into a box, mix them thoroughly, and then draw the required number of slips one after the other without replacement. Suppose we want to draw a sample of two elements from a population of four. The probability that the first draw yields one of the two elements of a particular sample is 2/4, and, since the first slip is not replaced, the probability that the second draw yields the other element is 1/3. By the multiplication rule, the probability of obtaining that particular sample is the product 2/4 × 1/3 = 1/6. This agrees with the fact that there are C(4, 2) = 6 possible samples of size two, each equally likely.
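The lottery method and the 1/6 calculation can be checked with a short simulation. This is a sketch: the four elements A to D are hypothetical, and shuffling a list with Python's `random` module stands in for mixing slips in a box.

```python
import itertools
import random
from collections import Counter
from fractions import Fraction

population = ["A", "B", "C", "D"]  # four hypothetical elements

# Every possible sample of size 2 drawn without replacement (order ignored).
possible_samples = list(itertools.combinations(population, 2))

# Probability of one particular pair via the multiplication rule:
# 2/4 on the first draw, then 1/3 on the second (the first slip is not replaced).
p_pair = Fraction(2, 4) * Fraction(1, 3)

def lottery_draw(slips, k, rng):
    """Mix the slips and draw k of them one after another without replacement."""
    box = list(slips)
    rng.shuffle(box)
    return tuple(sorted(box[:k]))

rng = random.Random(1)
counts = Counter(lottery_draw(population, 2, rng) for _ in range(60_000))

print(len(possible_samples), p_pair)  # 6 1/6
for pair in possible_samples:
    print(pair, counts[pair] / 60_000)  # each relative frequency should be close to 1/6
```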
Another method of selecting random samples is to use random number tables, which are printed in many standard statistics books. Each element of the population is given a number, and numbers are then read from the table (skipping those that do not correspond to any element) until the required sample size is obtained.
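Reading a random number table can be sketched as follows. Because no actual table is reproduced here, the sketch assumes a string of pseudorandom digits in its place; the function name and the rule of reading fixed-width groups of digits are illustrative choices.

```python
import random

def sample_from_digit_table(digits, population_size, sample_size):
    """Read consecutive fixed-width groups of digits as unit numbers, as one would
    from a random number table. Numbers outside 1..population_size and repeats
    are skipped."""
    width = len(str(population_size))  # e.g. 2 digits for a population of 50
    chosen = []
    for start in range(0, len(digits) - width + 1, width):
        number = int(digits[start:start + width])
        if 1 <= number <= population_size and number not in chosen:
            chosen.append(number)
            if len(chosen) == sample_size:
                return chosen
    raise ValueError("table exhausted before the sample was complete")

# A pseudorandom digit string stands in for a printed table (an assumption).
rng = random.Random(7)
table = "".join(str(rng.randrange(10)) for _ in range(500))
print(sample_from_digit_table(table, population_size=50, sample_size=5))
```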
(i) Systematic sampling: In some instances, the most practical way of sampling is to select every Ith item on a list (I being the skip interval). An element of randomness is introduced into this kind of sampling by using a random number to pick the unit with which to start. The following steps are used in systematic sampling:
- Assign a sequence number to each member of the population.
- Determine the skip interval by dividing the number of units in the population by the sample size: I = P/S, where I is the skip interval, P is the population size, and S is the sample size.
- Select a random starting point between 1 and I (for example, from a random digit table).
- Include that item in the sample and select every Ith item thereafter until the total sample has been selected.
For example, if we want to take a sample of 20 from a population of 100 members, our skip interval is 5 (i.e., 100/20). Our starting point must be selected randomly from 1 to 5, and then every fifth item is included in the sample. If our starting point is 2, the sample consists of the members with sequence numbers 2, 7, 12, 17, 22, 27, …, 97.
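The systematic sampling steps can be sketched in Python. The function name is hypothetical, and the sketch assumes integer division for the skip interval; when P is not a multiple of S, it simply keeps the first S items selected.

```python
import random

def systematic_sample(population_size, sample_size, start=None, rng=random):
    """Return 1-based sequence numbers chosen by systematic sampling."""
    interval = population_size // sample_size   # skip interval I = P / S
    if start is None:
        start = rng.randint(1, interval)        # random start between 1 and I
    if not 1 <= start <= interval:
        raise ValueError("start must lie between 1 and the skip interval")
    # Take the start and every Ith sequence number after it.
    return list(range(start, population_size + 1, interval))[:sample_size]

print(systematic_sample(100, 20, start=2))  # 2, 7, 12, ..., 97 (20 sequence numbers)
```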
The advantage of this sampling technique is that the sample is spread evenly over the entire population. It is also an easier and less costly method of sampling and can be conveniently used even for large populations. However, if there is a hidden periodicity in the population, systematic sampling will prove to be an inefficient method of sampling.
(ii) Stratified sampling: If the population from which a sample is to be drawn is not a homogeneous group, stratified sampling is generally applied. Under stratified sampling, the population is divided into several subpopulations (strata) that are individually more homogeneous than the total population, and items are then selected from each stratum to constitute the sample. Stratified sampling yields more reliable and detailed information. The basic steps for stratified sampling are:
• Divide the population to be surveyed into strata of similar study units, or into areas in which similar social, environmental, or economic conditions exist.
• Make a separate, complete list of the units in each stratum, and draw a separate random sample of study units from each stratum using these lists.
• Carry out the same survey on the sample of study units in each stratum, i.e., use the same questionnaire.
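Drawing a separate random sample from each stratum's list might look like the sketch below. The function name, stratum names, unit labels, and per-stratum sizes are all hypothetical.

```python
import random

def stratified_sample(strata, sizes, rng=random):
    """Draw a separate simple random sample from each stratum.

    strata: dict mapping stratum name -> complete list of its study units
    sizes:  dict mapping stratum name -> number of units to draw from it
    """
    return {name: rng.sample(units, sizes[name]) for name, units in strata.items()}

# Hypothetical strata (e.g. areas with similar conditions) for illustration.
strata = {
    "urban": [f"U{i}" for i in range(1, 41)],
    "rural": [f"R{i}" for i in range(1, 61)],
}
sample = stratified_sample(strata, {"urban": 4, "rural": 6}, rng=random.Random(0))
print(sample)
```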
The main advantages of stratified sampling are: (i) for the same sample size, stratifying the population yields more reliable information than sampling the population as a whole; and (ii) comparisons between strata are easy, because a separate but similar survey is done in each stratum.
The following questions are highly relevant in the context of stratified sampling:
How to form strata?
The strata are formed on the basis of common characteristic(s) of the items to be put in each stratum. This means that the strata should be formed so that elements are as homogeneous as possible within each stratum and as heterogeneous as possible between strata. Strata are usually formed purposively, based on past experience and the personal judgment of the researcher. At times, a pilot study may be conducted to determine a more appropriate and efficient stratification plan.
How should items be selected from each stratum?
For selection of items from each stratum, we may use simple random sampling. Systematic
sampling can also be used if it is considered more appropriate in certain situations.
How many items should be selected from each stratum, i.e., how should the sample size be allocated to each stratum?
We usually follow the method of proportional allocation, under which the sizes of the samples from the different strata are kept proportional to the sizes of the strata. That is, if $P_i$ represents the proportion of the population included in stratum i and n represents the total sample size, the number of elements selected from stratum i is $n \cdot P_i$.
For example, suppose a sample of size $n = 30$ is to be drawn from a population of size $N = 800$ that is divided into three strata of sizes $N_1 = 400$, $N_2 = 240$, and $N_3 = 160$.
The sample size for the stratum with $N_1 = 400$ is $n_1 = 30(400/800) = 15$.
The sample size for the stratum with $N_2 = 240$ is $n_2 = 30(240/800) = 9$.
The sample size for the stratum with $N_3 = 160$ is $n_3 = 30(160/800) = 6$.
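Proportional allocation is simple arithmetic; this sketch reproduces the example. The function name is hypothetical, and the results are left unrounded (in practice, non-integer allocations must be rounded so that they still add up to n).

```python
def proportional_allocation(n, stratum_sizes):
    """n_i = n * N_i / N, i.e. n times the proportion P_i of the population in stratum i."""
    total = sum(stratum_sizes)
    return [n * size / total for size in stratum_sizes]

print(proportional_allocation(30, [400, 240, 160]))  # [15.0, 9.0, 6.0]
```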
Where strata differ not only in size but also in variability, and it is considered reasonable to take larger samples from the more variable strata and smaller samples from the less variable strata, we can account for both differences (in stratum size and in stratum variability) by using a disproportionate sampling design with the formula:
$$n_i = n \cdot \frac{N_i \sigma_i}{N_1 \sigma_1 + N_2 \sigma_2 + \cdots + N_k \sigma_k}, \qquad i = 1, 2, \ldots, k$$
where $\sigma_i$ denotes the standard deviation of the ith stratum, $N_i$ the size of the ith stratum, $n_i$ the sample size of the ith stratum, n the total sample size, and k the number of strata.
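For example, assume a population is divided into three strata so that $N_1 = 5000$, $N_2 = 2000$, and $N_3 = 3000$, with respective standard deviations $\sigma_1$, $\sigma_2$, and $\sigma_3$. How should a sample of size $n = 84$ be allocated to the three strata if we want optimum allocation using a disproportionate sampling design?
Substituting into the optimum-allocation formula gives $n_i = 84 \cdot \frac{N_i \sigma_i}{5000\sigma_1 + 2000\sigma_2 + 3000\sigma_3}$ for $i = 1, 2, 3$.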
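The optimum-allocation formula can be evaluated the same way. The standard deviations 15, 18, and 5 used below are assumed purely for illustration; they are not given in the text. With equal standard deviations in every stratum, the formula reduces to proportional allocation, which the second call checks. The function name is hypothetical.

```python
def optimum_allocation(n, stratum_sizes, std_devs):
    """Disproportionate allocation: n_i = n * N_i*sigma_i / sum_j(N_j*sigma_j)."""
    weights = [size * sd for size, sd in zip(stratum_sizes, std_devs)]
    total = sum(weights)
    return [n * w / total for w in weights]

# ASSUMED sigma values (15, 18, 5); they are not given in the text.
print(optimum_allocation(84, [5000, 2000, 3000], [15, 18, 5]))  # [50.0, 24.0, 10.0]
# Equal sigmas reproduce the proportional-allocation example.
print(optimum_allocation(30, [400, 240, 160], [1, 1, 1]))       # [15.0, 9.0, 6.0]
```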
(iii) Cluster sampling: If the total area of interest is large, a convenient way to take a sample is to divide the area into a number of smaller non-overlapping areas and then randomly select a number of these smaller areas (clusters); the ultimate sample consists of all units in the selected clusters.
In cluster sampling, the total population is divided into a number of relatively small subdivisions, which are themselves clusters of still smaller units, and some of these clusters are then randomly selected for inclusion in the overall sample. Cluster sampling reduces cost by concentrating surveys in the selected clusters, but for the same sample size it is generally less precise than simple random sampling.
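A minimal sketch of cluster sampling: whole clusters are chosen at random, and every unit in a chosen cluster enters the sample. The function name, blocks, and households are hypothetical.

```python
import random

def cluster_sample(clusters, num_clusters, rng=random):
    """Randomly select whole clusters and include every unit in each selected cluster."""
    chosen = rng.sample(sorted(clusters), num_clusters)
    return {name: list(clusters[name]) for name in chosen}

# Hypothetical area divided into non-overlapping blocks (clusters) of households.
clusters = {
    "block1": ["h1", "h2", "h3"],
    "block2": ["h4", "h5"],
    "block3": ["h6", "h7", "h8", "h9"],
    "block4": ["h10", "h11"],
}
selected = cluster_sample(clusters, 2, rng=random.Random(5))
print(selected)
```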
Differences between stratified sampling and cluster sampling
If, instead of taking a census of all health stations within the selected districts, we randomly sample health stations from each selected district, it becomes a four-stage sampling plan. If we select randomly at all stages, we have what is known as a multi-stage random sampling design.
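Multi-stage random sampling applies random selection at each level of a nested frame. The sketch below assumes a hypothetical three-level frame (zones, districts, health stations) and a recursive helper; the names and the number selected at each stage are illustrative only and do not reproduce the four-stage plan mentioned above.

```python
import random

def multistage_sample(frame, sizes, rng=random):
    """Select at random at every stage of a nested sampling frame.

    frame: nested dicts whose innermost values are lists of final units.
    sizes: how many units to select at each stage, outermost stage first.
    """
    if isinstance(frame, list):                   # last stage: the final units
        return rng.sample(frame, sizes[0])
    chosen = rng.sample(sorted(frame), sizes[0])  # random selection at this stage
    return {name: multistage_sample(frame[name], sizes[1:], rng) for name in chosen}

# Hypothetical frame: zones -> districts -> health stations.
frame = {
    "Zone A": {"District 1": ["HS1", "HS2", "HS3"], "District 2": ["HS4", "HS5"]},
    "Zone B": {"District 3": ["HS6", "HS7", "HS8"], "District 4": ["HS9", "HS10"]},
    "Zone C": {"District 5": ["HS11", "HS12"], "District 6": ["HS13", "HS14", "HS15"]},
}
# Stage 1: 2 zones; stage 2: 1 district per zone; stage 3: 2 health stations per district.
result = multistage_sample(frame, [2, 1, 2], rng=random.Random(11))
print(result)
```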
Sampling with probability proportional to cluster size: If the clusters do not contain the same, or approximately the same, number of elements, it is considered appropriate to use a random selection process in which the probability of each cluster being included in the sample is proportional to its size. For this purpose, we list the number of elements in each cluster, in any order, and compute the cumulative totals. We then sample systematically the appropriate number of elements from the cumulative totals.
For illustration, suppose the numbers of departmental stores in 10 cities are 35, 17, 10, 32, 80, 18, 26, 19, 26, and 57. If we want to select a sample of 8 stores, using cities as clusters and selecting within clusters proportional to size, how many stores should be chosen from each city? (Use a starting point of 8.)
Let us put the information as follows:
City number   Number of departmental stores   Cumulative total   Sample (selection numbers)
1             35                              35                 8
2             17                              52                 48
3             10                              62                 -
4             32                              94                 88
5             80                              174                128, 168
6             18                              192                -
7             26                              218                208
8             19                              237                -
9             26                              263                248
10            57                              320                288
Since there are 320 departmental stores in total and we must select a sample of 8, the appropriate sampling interval is 320/8 = 40. Starting from 8, we add successive increments of 40 until 8 numbers have been selected: 8, 48, 88, 128, 168, 208, 248, and 288. Each number falls in the city whose cumulative total first reaches or exceeds it, so two stores should be selected randomly from city number 5 and one each from cities 1, 2, 4, 7, 9, and 10.
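The cumulative-total procedure can be reproduced in a few lines. In this sketch (function name hypothetical), `bisect_left` finds, for each selection number, the first city whose cumulative total reaches or exceeds it.

```python
import bisect
import itertools

def pps_systematic(sizes, sample_size, start):
    """Systematic selection from cumulative totals (probability proportional to size).
    Returns the selection numbers and, for each cluster, how many of them fall in it."""
    cumulative = list(itertools.accumulate(sizes))
    interval = cumulative[-1] // sample_size
    picks = [start + k * interval for k in range(sample_size)]
    counts = [0] * len(sizes)
    for number in picks:
        # Index of the first cluster whose cumulative total is >= the selection number.
        counts[bisect.bisect_left(cumulative, number)] += 1
    return picks, counts

stores = [35, 17, 10, 32, 80, 18, 26, 19, 26, 57]
picks, counts = pps_systematic(stores, sample_size=8, start=8)
print(picks)   # [8, 48, 88, 128, 168, 208, 248, 288]
print(counts)  # [1, 1, 0, 1, 2, 0, 1, 0, 1, 1]
```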
Cluster sampling: Clusters of members are selected from the larger population of clusters. All members of a selected cluster are included in the sample, but not all clusters are included. Disadvantage: often lower statistical efficiency (more error), because the subgroups (clusters) tend to be homogeneous rather than heterogeneous.
While useful for many studies, non-probability sampling procedures provide only a weak basis for generalization. In practice, the conclusions drawn from a study of a non-probability sample are limited to that sample and cannot be generalized further. In this type of sampling, items for the sample are selected deliberately by the researcher, and his choice of items remains supreme. For instance, if the economic conditions of people living in a state are to be studied, a few towns and villages may be purposively selected for intensive study on the principle that they are representative of the entire state. Thus the judgment of the organizers of the study plays an important part in this sampling design.
In such a design, the personal element has a great chance of entering into the selection of the sample. Sampling error in this type of sampling cannot be estimated, and some element of bias, great or small, is always present. Some of the major non-probability sampling techniques are the following:
a) Judgment (purposive) sampling: The researcher uses his judgment to select people who he feels are representative of the population or who have particular expertise or knowledge that makes them suitable.
d) Referral sampling: This is a non-probability sampling technique (also known as snowball sampling) that relies on referrals: respondents who are initially contacted are asked to supply the names and addresses of other members of the target population.
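Referral sampling can be pictured as following names outward from the first respondents. The sketch below assumes a hypothetical referral network and visits referred people in the order they are named (breadth-first); real referral chains depend on who actually responds, so this is an illustration, not a procedure.

```python
from collections import deque

def referral_sample(referrals, seeds, target_size):
    """Start from the initially contacted respondents and follow the names they
    supply until target_size members have been reached."""
    sample, queue = [], deque(seeds)
    seen = set(seeds)
    while queue and len(sample) < target_size:
        person = queue.popleft()
        sample.append(person)
        for named in referrals.get(person, []):
            if named not in seen:  # contact each referred person only once
                seen.add(named)
                queue.append(named)
    return sample

# Hypothetical referral network: who each respondent names.
referrals = {"A": ["C", "D"], "B": ["D", "E"], "C": ["F"], "E": ["G", "A"]}
print(referral_sample(referrals, seeds=["A", "B"], target_size=5))  # ['A', 'B', 'C', 'D', 'E']
```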