
Sampling Issues in Research


Sampling Methods in Epidemiologic Research

Sampling
1. Introduction and concepts in sampling
- Sampling is an important issue in research and in day-to-day life.
Census vs sample
• In a census, every animal in the population is evaluated.
• In a sample, data are collected only from a subset of the
population.
• Taking measurements or collecting data on a sample of the
population is more convenient than collecting data on the entire
population.
• In a census, the only source of error is the measurement itself.
• In a sample, error can arise from both measurement and sampling.
• A census requires more time and more resources, and its
observations may be less reliable.

Sampling
• Sampling unit: the smallest division of the population (e.g. a
house, a calf, a herd). Sampling units are the basic elements of the
population that is sampled.

• Sampling frame: the complete list of sampling units from which the
sample is drawn.

Parameters vs statistics

• A value calculated from a sample is a statistic (S: sample, statistic).

• A value calculated for the whole population is a parameter
(P: population, parameter).

Descriptive versus analytic studies

• Samples are drawn to support both descriptive studies (surveys)
and analytic studies (observational studies).

Hierarchy of populations

1. The external population

2. The target population

3. The study population

• The external population is the population to which it might be
possible to extrapolate results from a study.

• It is often not defined and might vary depending on the
perspective of the individual interpreting the results of the study.

• The target population is the immediate population to which the
study results will be extrapolated.

• The animals included in the study are derived from the target
population, which is also the population at risk.

• The study population is the population of individuals selected to
participate in the study (regardless of whether or not they
actually participate).
Figure: the sampling process

Types of error
• In a study based on a sample of observations, the variability of the
outcome being measured, measurement error, and sample-to-sample
variability all affect the results we obtain.
• Hence, when we make inferences based on the sample data, they are
subject to error.
• In analytic studies there are two types of error:
• Type I (α) error: you conclude that the outcomes are different when in
fact they are not.
• Type II (β) error: you conclude that the outcomes are not different when
in fact they are.

• The α level is used as the significance level.
• 1 - β is the power of the test.

• Statistical tests reported in the medical literature are aimed at disproving the
null hypothesis (i.e. that there is no difference among groups).

• If differences are found, they are reported with a P-value, which is the
probability of observing differences at least this large by chance alone,
that is, if the factor being evaluated had no effect.

• The P-value is compared with α, the accepted probability of making a
Type I error.

• When P < 0.05, we are 'reasonably' sure that any effect detected is not due to
chance.

Accuracy and precision

• In sampling procedures, accuracy and precision are two different
statistical indicators. Their meaning is worth clarifying here, as
both terms are used frequently in the following sections.

• Together they describe how valid the estimates are and how well the
sample was selected.

• In a survey, the picture of the population obtained from the sample
should be both accurate and precise.

Sampling accuracy

• Sampling accuracy is usually expressed as a relative index in
percentage form (i.e. between 0 and 100%) and indicates the
closeness of a sample-based estimate to the true population value.

• When sample size increases and samples are representative,
sampling accuracy also increases.

• Its rate of growth is very sharp for small samples and becomes
slower beyond a certain sample size.

Sampling precision

• Sampling precision is related to the variability of the samples
used.

• It is measured inversely by the coefficient of variation (CV), a
relative index of variability that uses the sample variance and the
sample mean: the lower the CV, the higher the precision.

• Estimates can be of high precision (that is, with narrow
confidence limits) but of low accuracy.

• When sample size increases, precision also increases as a result
of decreasing variability.
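
As a rough illustration of the CV described above, the following Python sketch computes it as the sample standard deviation divided by the sample mean, expressed as a percentage. The function name and the two sets of body weights are hypothetical and serve only to show that a tighter sample gives a lower CV, i.e. higher precision.

```python
import statistics

def coefficient_of_variation(values):
    """Return the sample CV in percent: 100 * sample SD / sample mean."""
    mean = statistics.mean(values)
    if mean == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return 100 * statistics.stdev(values) / mean

# Hypothetical body weights (kg); both samples have a mean of 50 kg.
tight = [50, 51, 49, 50, 50]
spread = [30, 70, 45, 60, 45]

print(f"CV, tight sample:  {coefficient_of_variation(tight):.1f}%")
print(f"CV, spread sample: {coefficient_of_variation(spread):.1f}%")
```

The sample with the smaller CV is the more precise of the two, although precision alone says nothing about accuracy.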

Sampling
• The aim of the sampling process is to draw a sample that is a true
representation of the population and that leads to estimates of
population characteristics with acceptable precision and accuracy.

• The choice of the appropriate target population is determined by four
factors:

1. Population representativeness

2. Access required

3. Population data accuracy

4. The sample size

Sampling methods / Types of sampling

• Non-probability sampling

• Probability sampling

Non-random (non-probability) sampling
• A type of sample that is not produced by random selection.

• Selection is based on judgement, convenience, or purpose, without
a formal process of random selection.

• This method of sampling is inappropriate for descriptive studies
(use of non-probability samples might be misleading).

• However, non-probability sampling procedures are often used in
analytic studies.

1. Judgement sample

• This type of sample is chosen because, in the judgement
of the investigator, it is 'representative' of the target
population.

• This is almost impossible to justify because the criteria
for inclusion and the process of selection are largely
implicit, not explicit.

2. Convenience sample
• A convenience sample is chosen because it is easy to obtain.
• For instance, nearby herds, herds with good handling facilities,
herds with easily accessible records, or volunteer herds
might be selected for study.

3. Purposive sample
• Selection of this type of sample is based on the elements
possessing one or more attributes, such as known exposure to a risk
factor or a specific disease status.
• This approach is often used in observational analytic studies.

Random sampling / probability sampling
• A probability sample is one in which every element in the population
has a known, non-zero probability of being included in the sample.

• This approach implies that a formal process of random selection has
been applied to the sampling frame.
• Types of probability / random sampling:
1. Simple random sampling
2. Systematic random sampling
3. Stratified random sampling
4. Cluster sampling
5. Multistage sampling

1. Simple random sampling

• Each member of the population has an equal chance of being
selected.

• A complete list of all individuals in the population must be available.

• This list is called the sampling frame.

• Random does not mean haphazard: selection must follow a formal
random procedure, which can be carried out in different ways.

• Examples are a table of random numbers or a lottery.

• A disadvantage of the technique is that it may result in
large variation of the estimate, thereby requiring larger
sample sizes.

• In practice, simple random sampling is seldom used
because a complete list of the population is rarely available.

• If the population is heterogeneous with respect to the problem
being studied, the result will lack precision.
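
Simple random sampling from a sampling frame can be sketched in a few lines of Python. This is a minimal illustration, assuming the frame is available as a list. The animal identifiers, the frame size of 200 and the sample size of 20 are hypothetical.

```python
import random

def simple_random_sample(sampling_frame, n, seed=None):
    """Draw n units without replacement; every unit has the same chance."""
    rng = random.Random(seed)  # a seed makes the draw reproducible
    return rng.sample(sampling_frame, n)

# Hypothetical sampling frame: a complete list of 200 cows.
frame = [f"cow_{i:03d}" for i in range(1, 201)]
sample = simple_random_sample(frame, 20, seed=1)
print(sample)
```

Using a seeded generator stands in for the random number table or lottery mentioned above and lets the selection be documented and repeated.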

2. Systematic random sampling
• In a systematic random sample, a complete list of the population
to be sampled is not required, provided an estimate of the total
number of animals is available.

• The sampling interval (j) is computed as the study population size
divided by the required sample size: j = N/n, the reciprocal of the
sampling fraction (n/N).

• The first unit is selected at random from among the first j units,
and every j-th unit after it is then included.
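
The sampling interval can be turned into a selection procedure as in the sketch below. It assumes the usual convention of a random start within the first j units followed by every j-th unit, and it rounds j down to a whole number. The function name and the figures (1,000 animals, sample of 50) are hypothetical.

```python
import random

def systematic_sample(N, n, seed=None):
    """Return the 1-based positions of the units selected systematically."""
    if not 0 < n <= N:
        raise ValueError("n must be between 1 and N")
    j = N // n                      # sampling interval, rounded down
    rng = random.Random(seed)
    start = rng.randint(1, j)       # random start among the first j units
    return [start + k * j for k in range(n)]

# Hypothetical example: 1,000 animals passing through a chute, sample of 50.
positions = systematic_sample(1000, 50, seed=3)
print(positions[:5])
```

Because only positions are needed, the animals can be selected as they are encountered, which is why a full list is not required.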

3. Stratified random sample
• Prior to sampling, the population is divided into mutually exclusive
strata based on factors likely to affect the outcome.
• Then, within each stratum, a simple or systematic random sample is
chosen.
• The simplest form of stratified random sampling is called proportional
(the number sampled within each stratum is proportional to the total
number in the stratum).
There are three advantages of stratified random sampling.
1. It ensures that all strata are represented in the sample.
2. The precision of overall estimates might be greater than those derived
from a simple random sample.
3. It produces estimates of stratum-specific outcomes, although the
precision of these estimates will be lower than the precision of the
overall estimate.

For example,

• Assume you believe that cats are less likely to be up to date on
vaccines than dogs, and that a total sample of 500 patients is required.

• You would make up two lists, one of cats and one of dogs, and
sample from each list.

• If 40% of the patients are cats, then 500 × 0.4 = 200 cats would be
selected, and 300 dogs would be selected.
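
The cat and dog example can be sketched as proportional allocation followed by a simple random sample within each stratum. The sketch assumes that 500 is the total sample size and uses a hypothetical clinic of 2,000 cats and 3,000 dogs (40% cats). All names and figures other than the 40% and the 500 are illustrative.

```python
import random

def proportional_allocation(stratum_sizes, total_n):
    """Split total_n across strata in proportion to stratum size."""
    population = sum(stratum_sizes.values())
    # Rounding can make the parts sum to slightly more or less than total_n.
    return {name: round(total_n * size / population)
            for name, size in stratum_sizes.items()}

def stratified_sample(strata, total_n, seed=None):
    """Draw a simple random sample within each stratum."""
    rng = random.Random(seed)
    sizes = {name: len(units) for name, units in strata.items()}
    allocation = proportional_allocation(sizes, total_n)
    return {name: rng.sample(units, allocation[name])
            for name, units in strata.items()}

# Hypothetical patient lists, one per stratum.
strata = {
    "cat": [f"cat_{i}" for i in range(2000)],
    "dog": [f"dog_{i}" for i in range(3000)],
}
sample = stratified_sample(strata, 500, seed=42)
print({name: len(units) for name, units in sample.items()})
```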

4. Cluster sampling

A cluster is a natural or convenient collection of elements with one
or more characteristics in common.

For example:

• a litter is a cluster of piglets

• a dairy herd is a cluster of cattle

• a pen in a feedlot is a cluster of cattle, and

• a county is a cluster of farms.

• Cluster sampling is a probability sampling technique in which
sampling is applied at an aggregated level (a group) of
individual units.
• Typically, the individual remains the unit of interest (for
example, its disease status), but
• the sampling unit becomes a grouping of individual animals, such
as the herd they belong to.
• All elements within each randomly selected group are then
included in the sample.
• Therefore, this technique requires a sampling frame only for
the groups, not for the members within the groups.
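
A one-stage cluster sample as described above can be sketched as follows: herds are drawn at random and every animal in a chosen herd is included. The herd and animal identifiers are hypothetical, and simple random selection of the clusters is assumed.

```python
import random

def one_stage_cluster_sample(clusters, g, seed=None):
    """Select g clusters at random and keep every element in each of them.

    clusters: dict mapping a cluster id (e.g. herd) to its list of animals.
    """
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), g)   # the frame lists clusters only
    return {cluster_id: list(clusters[cluster_id]) for cluster_id in chosen}

# Hypothetical population: 10 herds of 5 cattle each.
herds = {f"herd_{h}": [f"herd_{h}_cow_{c}" for c in range(5)] for h in range(10)}
sample = one_stage_cluster_sample(herds, 3, seed=7)
print(sorted(sample))
```

Only the list of herds is needed up front. The animals are enumerated once a herd has been selected.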

Sampling

• The random selection of the clusters as the sampling units can be
performed using simple random, systematic or stratified random
sampling.

• The variance is largely influenced by the number of clusters.

• The technique assumes that the elements within each cluster are
heterogeneous (unlike the strata in stratified sampling).

5. Multistage sampling
• Involves several levels of random sampling (one at each stage). For
example, within herds selected nationally, a further random
selection is used to determine which animals are studied
(two-stage sampling).

• The sample could also be selected in three stages.

• Example: within a local government area, rural areas are selected at
random; from each rural area, farms are selected at random; and from
each farm, individual animals are selected at random.
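
Two-stage sampling can be sketched by adding a second random draw inside each selected herd. The sketch assumes simple random selection at both stages and a fixed number of animals per herd. The names and sizes are hypothetical.

```python
import random

def two_stage_sample(clusters, g, m, seed=None):
    """Stage 1: select g clusters at random.
    Stage 2: select up to m elements at random within each chosen cluster."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), g)
    return {cluster_id: rng.sample(clusters[cluster_id],
                                   min(m, len(clusters[cluster_id])))
            for cluster_id in chosen}

# Hypothetical population: 20 herds of 30 animals each.
herds = {f"herd_{h}": [f"herd_{h}_animal_{a}" for a in range(30)]
         for h in range(20)}
sample = two_stage_sample(herds, g=4, m=10, seed=11)
print({herd: len(animals) for herd, animals in sample.items()})
```

A third stage, such as rural areas above farms, would simply wrap this in one more level of random selection.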

Summary and comparison of different sampling methods

Importance of sample size calculation (objectives)
• Usually a researcher would like to show a statistically significant
difference, but the difference should also be meaningful.

• Thus the researcher has to define what a meaningful difference is.

• If the goal is only to find a statistically significant result,
increasing the sample size is what matters. However, there are other
considerations:

• Minimize study cost

• Reduce the number of individuals put at risk

• Avoid statistical significance with no clinical or population
meaning

Sample size determination
Sample size is determined once the target population has been selected.
Empirical method
• Uses the sample size of a similar previous study
• Has no scientific basis
Analytical method
• Is a scientific approach
• Depends on the purpose of the study
• Depends on the desired accuracy
• Note that there is a very close relation between the sample size
calculation in the planning phase of a study and the statistical
test used in the analysis phase.

Sample size determination
• The required sample size is based on different calculations
depending on whether estimates for a categorical or a continuous
variable are to be calculated.

• The factors to take into consideration include:

• the accuracy required,

• the sampling method to be used,

• the size of the smallest subgroup, and

• the actual variability of the variable of interest in the population.

Sample size determination
Definitions and some concepts
• Null hypothesis (H0): no difference between groups
H0: p1 = p2        H0: μ1 = μ2

• Alternative hypothesis (HA): there is a difference between groups
HA: p1 ≠ p2        HA: μ1 ≠ μ2

• P-value: chance of obtaining the observed result or one more
extreme when groups are equal (under H0)

– Test of significance of H0
– Based on the distribution of a test statistic assuming H0 is true
– It is NOT the probability that H0 is true

Sample size determination
Definitions and some concepts
• Type I error: rejecting H0 when H0 is true

• α: the type I error rate

• Maximum p-value considered statistically significant

• Type II error: failing to reject H0 when H0 is false

• β: the type II error rate

• Power (1 - β): probability of detecting a group effect given
the size of the effect (Δ) and the sample size of the trial (N)

Sample size issues in research
The key pieces of information required to calculate sample size:
• Maximum tolerable error / desired absolute precision
• This refers to the precision we need for our estimate.
– It is usually given as an absolute percentage, e.g. ±5% desired
absolute precision.
– The higher the precision we need, the larger the sample size required.
• Confidence level: the level of certainty that the
investigator wants to have in the estimate.
• The most common confidence level is 95%, but 90% and 99% are
also used fairly often.
– The higher the confidence level desired, the larger the
sample size.

• Variance (the variability of the measurement on the study
subjects) determines the size of the sample we take.

• If the population is variable, we need a larger sample;
if it is uniform, a small sample is enough.

– This variability is expressed by the variance, which is P(1 - P)
for a proportion (prevalence) and SD² for a mean.

• Population size: it has little effect on sample size if the
population is large relative to the sample size required
(i.e. n < 0.1N).

Sample size determination for prevalence and incidence studies

• For simple random sampling (at the 95% confidence level),

$$ n = \frac{1.96^2 \, P_{exp} (1 - P_{exp})}{d^2} $$

where,
n = required sample size
Pexp = expected prevalence
d = desired absolute precision

• If there is no previous study from which to guess the expected
prevalence, 50% is used: this gives the largest sample size, so
the investigator errs on the safe side.
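
The calculation can be sketched in Python, assuming the usual normal-approximation formula n = z² Pexp (1 - Pexp) / d², where z is the standard normal quantile for the chosen confidence level (about 1.96 at 95%). The function name is hypothetical. The result is returned unrounded so that the reader can choose how to round.

```python
from statistics import NormalDist

def sample_size_prevalence(p_exp, d, confidence=0.95):
    """Sample size to estimate a prevalence p_exp to within +/- d.

    Assumes simple random sampling from a large population and the
    normal approximation to the binomial.
    """
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96 at 95%
    return z ** 2 * p_exp * (1 - p_exp) / d ** 2

# Expected prevalence 20%, desired absolute precision 5%, 95% confidence.
print(round(sample_size_prevalence(0.20, 0.05)))

# No prior estimate: 50% gives the largest sample size for a given d.
print(round(sample_size_prevalence(0.50, 0.05)))
```

Rounding to the nearest whole animal is used here. Rounding up is the more conservative choice.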

• If N is small relative to n (that is, the calculated sample is a
large fraction of the population), the sample size obtained needs
adjustment as follows:

$$ n_{adj} = \frac{N \times n}{N + n} $$

where,

nadj = sample size for a finite (small) population

n = sample size for a theoretically infinite population

N = population size

• For systematic and stratified sampling techniques, the same
formulas as for simple random sampling can be used.
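
The finite-population adjustment nadj = N × n / (N + n) is a one-line calculation. The sketch below applies it to a hypothetical population of 1,000 animals and an unadjusted sample size of 384. Both figures and the function name are illustrative.

```python
def adjust_for_finite_population(n, N):
    """Return n_adj = N * n / (N + n) for a population of size N."""
    return (N * n) / (N + n)

# Hypothetical example: n = 384 from the infinite-population formula,
# but the population holds only 1,000 animals.
n_adj = adjust_for_finite_population(384, 1000)
print(f"{n_adj:.1f}")
```

The adjusted value is always smaller than both n and N, and it approaches n as N grows.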

Table: the approximate sample size required to estimate
prevalence in a large population with the desired fixed-width
confidence limits. (Modified from Cannon and Roe, 1982.)
Columns give the level of confidence; d = desired absolute precision.

Expected      90% confidence          95% confidence          99% confidence
prevalence    d=10%   d=5%    d=1%    d=10%   d=5%    d=1%    d=10%   d=5%    d=1%
10%           24      97      2435    35      138     3457    60      239     5971
20%           43      173     4329    61      246     6147    106     425     10616
30%           57      227     5682    81      323     8067    139     557     13933
40%           65      260     6494    92      369     9220    159     637     15923
50%           68      271     6764    96      384     9604    166     663     16587
60%           65      260     6494    92      369     9220    159     637     15923
70%           57      227     5682    81      323     8067    139     557     13933
80%           43      173     4329    61      246     6147    106     425     10616
90%           24      97      2435    35      138     3457    60      239     5971

Sample size for cluster sampling

• One-stage cluster sampling:

• The first step in determining sample size in one-stage
cluster sampling is prediction of the average number
of animals per cluster.

One-stage cluster sampling
• The appropriate formula for a 95% confidence interval is then:

$$ g = \frac{1.96^2 \, \{ n V_c + P_{exp} (1 - P_{exp}) \}}{n d^2} $$

where:
g = number of clusters to be sampled;
n = predicted average number of animals per cluster;
Pexp = expected prevalence;
d = desired absolute precision;
Vc = between-cluster variance.
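
The one-stage formula can be evaluated as in the following sketch. The input values (20 animals per cluster, expected prevalence 30%, precision 5%, between-cluster variance 0.01) are hypothetical and chosen only to show the arithmetic. In practice Vc would come from earlier surveys.

```python
import math

def clusters_one_stage(n_per_cluster, p_exp, d, v_c, z=1.96):
    """Number of clusters g for one-stage cluster sampling (z=1.96 for 95%)."""
    numerator = z ** 2 * (n_per_cluster * v_c + p_exp * (1 - p_exp))
    return numerator / (n_per_cluster * d ** 2)

g = clusters_one_stage(n_per_cluster=20, p_exp=0.30, d=0.05, v_c=0.01)
print(f"g = {g:.2f}, rounded up to {math.ceil(g)} clusters")
```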

Two-stage cluster sampling

• Determination of sample size for two-stage cluster sampling
depends on whether:

• the total sample size is fixed, and the number of clusters to be
sampled is required; or

• the number of clusters is fixed, and the number of animals to be
sampled is required.

(1) Number of clusters to be sampled when the total sample size is
fixed: the appropriate formula for a 95% confidence interval is:

$$ g = \frac{1.96^2 \, T_s V_c}{d^2 T_s - 1.96^2 \, P_{exp} (1 - P_{exp})} $$

where,
g = number of clusters to be sampled;
Pexp = expected prevalence;
d = desired absolute precision;
Ts = total number of animals to be sampled;
Vc = between-cluster variance
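
A sketch of this calculation is given below. The inputs (1,000 animals in total, expected prevalence 30%, precision 5%, between-cluster variance 0.01) are hypothetical. The function raises an error when the denominator is not positive, because the requested precision cannot then be reached with that total sample size.

```python
def clusters_two_stage(t_s, p_exp, d, v_c, z=1.96):
    """Number of clusters g when the total sample size t_s is fixed."""
    denominator = d ** 2 * t_s - z ** 2 * p_exp * (1 - p_exp)
    if denominator <= 0:
        raise ValueError("total sample size too small for this precision")
    return z ** 2 * t_s * v_c / denominator

g = clusters_two_stage(t_s=1000, p_exp=0.30, d=0.05, v_c=0.01)
print(f"g = {g:.2f}")
```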

(2) Number of animals to be sampled when the number of clusters is
fixed:

To determine the number of animals to be sampled when the number
of clusters is fixed, the appropriate formula for a 95% confidence
interval is:

$$ T_s = \frac{1.96^2 \, g \, P_{exp} (1 - P_{exp})}{g d^2 - 1.96^2 \, V_c} $$
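
The same kind of sketch applies when the number of clusters is fixed. The inputs (23 clusters, expected prevalence 30%, precision 5%, between-cluster variance 0.01) are hypothetical. A non-positive denominator means that this number of clusters cannot deliver the requested precision, however many animals are sampled.

```python
def animals_two_stage(g, p_exp, d, v_c, z=1.96):
    """Total number of animals Ts when the number of clusters g is fixed."""
    denominator = g * d ** 2 - z ** 2 * v_c
    if denominator <= 0:
        raise ValueError("too few clusters for this precision")
    return z ** 2 * g * p_exp * (1 - p_exp) / denominator

t_s = animals_two_stage(g=23, p_exp=0.30, d=0.05, v_c=0.01)
print(f"Ts = {t_s:.1f}")
```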

If you are sampling from a finite population (e.g. <1,000 animals), then the
formula to determine the required sample size is:

$$ n = \left( 1 - \alpha^{1/D} \right) \left( N - \frac{D - 1}{2} \right) $$

where

n = required sample size

α = 1 - confidence level (usually α = 0.05)

D = estimated minimum number of diseased animals in the group
(population size × minimum expected prevalence)

N = population size
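
The sketch below evaluates a finite-population sample size using the symbols defined above. It assumes the widely used approximation n = (1 - α^(1/D)) × (N - (D - 1)/2). That formula is an assumption here and should be checked against the source before being relied on. The population of 500 animals and the 5% minimum expected prevalence are hypothetical.

```python
import math

def n_detect_finite(N, min_prevalence, alpha=0.05):
    """Sample size to detect disease in a finite population of size N.

    Assumed approximation: n = (1 - alpha**(1/D)) * (N - (D - 1) / 2),
    with D the minimum expected number of diseased animals.
    """
    D = max(1, round(N * min_prevalence))
    return (1 - alpha ** (1 / D)) * (N - (D - 1) / 2)

n = n_detect_finite(N=500, min_prevalence=0.05)
print(f"n = {n:.1f}, rounded up to {math.ceil(n)} animals")
```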

• If you are sampling from an infinite population, then the following
approximate formula can be used:

$$ n = \frac{\ln \alpha}{\ln q} $$

where
n = the required sample size,
α is usually set to 0.05 or 0.01,
q = (1 - minimum expected prevalence).
• If you take the required sample and get no positive results (assuming that
you set α to 0.05), then you can say that you are 95% confident that the
prevalence of the disease in the population is below the minimum
threshold that you specified.
• Thus, you accept this as sufficient evidence of the absence of the disease.
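
For a large population, the sample size can be sketched from the requirement that the chance of drawing n negative animals in a row, q^n, is no more than α, which gives n = ln α / ln q. The function name and the 5% minimum expected prevalence are hypothetical.

```python
import math

def n_detect_infinite(min_prevalence, alpha=0.05):
    """Sample size to detect disease in a large population: ln(alpha)/ln(q)."""
    q = 1 - min_prevalence          # probability that one animal is negative
    return math.log(alpha) / math.log(q)

n = n_detect_infinite(0.05)
print(f"n = {n:.1f}, rounded up to {math.ceil(n)} animals")
```

With 59 animals all testing negative, the chance of having missed a disease present at 5% prevalence is just under 5%, assuming a perfect test.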

The sample size required for rate studies

- With a specified degree of precision

• If we wish not only to detect disease but also to estimate its prevalence,
then a somewhat more complex calculation is used to estimate sample size.

• As you might expect, the sample size is larger than that needed to detect only the
presence of disease.

• Sample size for an infinite population (ninf) is estimated by the formula

$$ n_{inf} = \frac{Z^2 \, P (1 - P)}{d^2} $$

where P = the estimated prevalence of infection (as a decimal),
Z corresponds to the degree of confidence in our estimate
(usually Z = 1.96 for 95% confidence) and d = the maximum
difference between observed and true prevalence that we are
willing to accept (as a decimal).

• Although it is useful to understand the principles behind these
calculations, the required sample sizes can be obtained much more
quickly from tables or specialised epidemiological computer software
such as EpiInfo or EpiScope.

• Such tables give the sample sizes at a 95% confidence level
for different prevalence and absolute precision levels.
• For example, to estimate the prevalence of disease in a large
population to within ±5% at the 95% confidence level with an
expected prevalence of 20%, it is necessary to examine a random
sample of 246 animals.

Sampling to prove that disease is not present
• The probability of failing to detect disease (when it actually
exists) is given by:

Where:

• N: the population size

• d: the number of diseased animals present

• n: the number of animals tested
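
One way to compute this probability exactly is with the hypergeometric distribution: the chance that a sample of n animals drawn without replacement from N contains none of the d diseased animals. The sketch below uses that exact form, which may differ from the approximation intended on the slide, and it assumes a perfectly sensitive test. The function name and example figures are hypothetical.

```python
from math import comb

def prob_miss_disease(N, d, n):
    """Exact probability that a sample of n from N misses all d diseased
    animals (sampling without replacement, perfect test assumed)."""
    return comb(N - d, n) / comb(N, n)

# Hypothetical herd: 500 animals, 25 diseased, 56 tested.
p_miss = prob_miss_disease(N=500, d=25, n=56)
print(f"Probability of failing to detect disease: {p_miss:.3f}")
```

If more animals are tested than there are healthy animals in the population, the function returns 0, because at least one diseased animal must then be in the sample.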

Sample size: comparing proportions or means
• n = sample size per group needed to compare two proportions or two
means
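
As a hedged illustration, the sketch below uses the standard normal-approximation formulas for a two-sided comparison of two groups of equal size. These are common textbook forms and are not necessarily the ones shown on the slide. The function names, the default α = 0.05 and power = 0.80, and the example inputs are all assumptions.

```python
from math import sqrt
from statistics import NormalDist

def n_per_group_means(sd, delta, alpha=0.05, power=0.80):
    """n per group to detect a difference delta between two means,
    assuming a common standard deviation sd."""
    z_a = NormalDist().inv_cdf(1 - alpha / 2)
    z_b = NormalDist().inv_cdf(power)
    return 2 * sd ** 2 * (z_a + z_b) ** 2 / delta ** 2

def n_per_group_proportions(p1, p2, alpha=0.05, power=0.80):
    """n per group to detect a difference between proportions p1 and p2
    (no continuity correction)."""
    z_a = NormalDist().inv_cdf(1 - alpha / 2)
    z_b = NormalDist().inv_cdf(power)
    p_bar = (p1 + p2) / 2
    numerator = (z_a * sqrt(2 * p_bar * (1 - p_bar))
                 + z_b * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return numerator / (p1 - p2) ** 2

print(f"means:       {n_per_group_means(sd=10, delta=5):.1f} per group")
print(f"proportions: {n_per_group_proportions(0.20, 0.40):.1f} per group")
```

Both results should be rounded up to whole animals. Higher power or a smaller difference to detect increases the required n.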

Sample size for comparing proportions
