
Introduction To Sampling and Sampling Designs: University of Baguio


University of Baguio

RESEARCH AND DEVELOPMENT CENTER


IRC-MATHEMATICS AND STATISTICS

INTRODUCTION TO SAMPLING AND SAMPLING DESIGNS

STATISTICS WITH SPSS LECTURE NOTES


Dr. Victor Hafalla Jr., RME, REE, MAAS, Ph.D.
University Statistician (UB)
UB Graduate School
“We can't solve
problems by using the
same kind of thinking
we used when we
created them.”

Albert Einstein
1879-1955
What is Sampling?
• Sampling is the process of selecting units (e.g. experimental
elements, persons, entities) from a population of interest so
that, by studying the sample, we may fairly generalize our
findings about the parameter of interest to the population
from which the sample was chosen.
• The process of gathering information from all elements in
the population of study is called census or complete
enumeration.

• Usually when we do a statistical inquiry, we do not gather
information from all the elements of the population but
only from a small portion or subset of it. This is called a
sample survey.
What are the advantages of Sample
Surveys over the Census?
• Doing sample surveys gives the researchers advantages
over the census. The advantages of sample surveys are
(1) reduced cost,
(2) greater speed,
(3) greater scope and
(4) greater accuracy.
What is a Sampling Design?
• A sampling design determines how to select a sample from
a sampling frame.

• Sampling designs are either probability or non-probability.


• A probability sampling design is a procedure wherein every
element of the population is given a (known) nonzero
chance of being selected in the sample.

• A non-probability sampling design, in contrast, is a procedure
wherein not all the elements in the population are given a
chance of being included in the sample. This suggests
that the procedure for selecting the sample from the
population elements is biased.
What are the Non-probability
Sampling Designs?
• Sometimes non-probability sampling, with its inherent
bias, may be essential and the only recourse for sampling.

• Some methods of non-probability sampling are:


purposive sampling
quota sampling
convenience sampling
judgment sampling
accidental sampling
snowball sampling
Purposive Sampling:
sets out to make the sample agree with the population with
regard to certain characteristics

Quota Sampling:
a specific number of particular types of elements are
selected

Convenience Sampling:
choose units which come to hand or are convenient
Judgment Sampling:
select sample in accordance with an expert’s judgment

Accidental Sampling:
used for cases where the sample is difficult to obtain or is
obtained by chance only

Snowball Sampling:
in this procedure the samples are gathered through the use of
new information from the previous batch of samples until the
required number of samples is completed
Exercise:
Which non-probability sampling technique is depicted in
each situation?

1. For a study on the occurrence of syphilis among young
males, the researcher visited some of the hospitals in
Region 1 and gave interview materials to the nurses and
staff of doctors treating this disease, with instructions
that patients consulting the doctors must answer the
survey prior to consultation or diagnosis. From among 4
regional hospitals, he was able to obtain 25 cases
afflicted with the disease.
2. A dentist wants to study the effect of a new
mouthwash containing an organic compound on the occurrence of
tooth decay. For this purpose, he hired a research
assistant to find his experimental units (patients with
severe tooth decay). He gave him a quota of 42
respondents. The research assistant went to interview
some of the prospective patients in the market and asked
them to refer others for interview.
3. Because of clashes between government forces and rebels in
Sulu, a market researcher chose only Davao and
Zamboanga for a province-wide survey on a brand of
toothpaste.
4. In a manufacturing firm, the quality control engineer
samples a fixed 5% of the products in a batch for testing
before shipment.
What are the Probability
Sampling Designs?
• On the other hand, the different probability sampling
designs include:
simple random sampling,
stratified random sampling,
systematic random sampling,
cluster sampling and
multi-stage sampling.
Simple Random Sampling:
Simple random sampling is the process of selecting a
sample of size n, giving each sample unit an equal chance
of being included in the sample.
An SRS of n observations from the population is a sample
chosen in such a way that every subset of n observations
of the population has the same chance of being selected (e.g.
by the lottery method).
Simple random sampling may be with replacement
(SRSWR) or without replacement (SRSWOR).
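The two variants can be sketched in a few lines of Python. This is only an illustration: the frame of 250 numbered units and the function names are hypothetical, and the standard library's `random.sample` and `random.choices` stand in for the lottery draw.

```python
import random

def srs_without_replacement(population, n, seed=None):
    """SRSWOR: each unit can appear at most once in the sample."""
    rng = random.Random(seed)
    return rng.sample(list(population), n)

def srs_with_replacement(population, n, seed=None):
    """SRSWR: a unit is returned to the frame and may be drawn again."""
    rng = random.Random(seed)
    return rng.choices(list(population), k=n)

# Hypothetical sampling frame: 250 units numbered 1..250
frame = list(range(1, 251))
srswor = srs_without_replacement(frame, 46, seed=1)
srswr = srs_with_replacement(frame, 46, seed=1)
```

In the SRSWOR sample all 46 units are distinct; in the SRSWR sample repeats are possible.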
Stratified Random Sampling
There are cases wherein the
population consists of items which
are heterogeneous with respect to
the characteristic under study.
In such situations the population
should be divided, or stratified, into
more or less homogeneous
subpopulations or strata before
sampling is done.
Stratified random sampling
consists of selecting an SRS from
each of the strata into which the
population has been divided.
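A minimal sketch of this idea in Python, assuming the strata and the number of units to draw from each stratum are already decided. The stratum names, sizes, and the function `stratified_sample` are hypothetical; how to allocate the sample across strata is not covered here.

```python
import random

def stratified_sample(strata, sizes, seed=None):
    """Draw an SRS (without replacement) independently from each stratum."""
    rng = random.Random(seed)
    return {name: rng.sample(units, sizes[name])
            for name, units in strata.items()}

# Hypothetical population split into two strata
strata = {
    "urban": [f"U{i}" for i in range(60)],
    "rural": [f"R{i}" for i in range(40)],
}
sizes = {"urban": 6, "rural": 4}  # assumed allocation, for illustration only
sample = stratified_sample(strata, sizes, seed=7)
```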
Systematic Random Sampling
Systematic sampling is a method of selecting a sample by
taking every kth unit from an ordered population, the first
unit being selected at random from 1 to k. Here k is called
the sampling interval.
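The procedure can be sketched as follows. The list of 120 ordered units and the function name are hypothetical; the random start is drawn from 1 to k as described above.

```python
import random

def systematic_sample(ordered_units, k, seed=None):
    """Take every kth unit after a random start r, with 1 <= r <= k."""
    rng = random.Random(seed)
    start = rng.randint(1, k)            # randint includes both endpoints
    return ordered_units[start - 1::k]   # positions r, r + k, r + 2k, ...

# Hypothetical ordered frame of 120 units, sampling interval k = 12
units = list(range(1, 121))
sample = systematic_sample(units, k=12, seed=3)
```

With 120 units and k = 12, every possible start yields exactly 10 units.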
Cluster Sampling
When we have to sample a population over a wide geographic area, simple
random sampling would require covering a large geographic area in order to
gather the units in the sample. Because of this, cluster sampling is preferable.
Cluster Sampling is a method of selecting a sample of distinct groups, or
clusters, of smaller units called elements. The sample clusters may be chosen
by SRS or by systematic sampling.
Similar to strata in stratified sampling, clusters are mutually exclusive
subpopulations which together comprise the entire population. Unlike strata,
however, clusters are preferably formed with heterogeneous elements so that
each cluster will be typical of the population.
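A rough sketch of one-stage cluster sampling, under the assumption that clusters are chosen by SRS and that every element in a chosen cluster is observed. The cluster names and the function `cluster_sample` are hypothetical.

```python
import random

def cluster_sample(clusters, m, seed=None):
    """Select m whole clusters by SRS; keep all elements of each chosen cluster."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), m)
    return {name: clusters[name] for name in chosen}

# Hypothetical population: 14 areas (clusters), 5 elements in each
clusters = {
    f"area_{i}": [f"area_{i}_unit_{j}" for j in range(5)]
    for i in range(1, 15)
}
sample = cluster_sample(clusters, 6, seed=11)
```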
Multi-stage Sampling
In multistage sampling the selection of the sample is
accomplished in two or more steps. The population is first
divided into a number of first-stage or primary units, from
which a sample is drawn. Within the sampled first-stage
units, a sample of second-stage or secondary units is then
drawn, and so on, giving a hierarchy of sampling units
corresponding to the different sampling stages.
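A two-stage version can be sketched as below, assuming SRS is used at both stages and the same number of secondary units is taken from each sampled primary unit. The barangay and household labels and the function name are hypothetical.

```python
import random

def two_stage_sample(primary_units, n_primary, n_secondary, seed=None):
    """Stage 1: SRS of primary units. Stage 2: SRS of secondary units within each."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(primary_units), n_primary)
    return {p: rng.sample(primary_units[p], n_secondary) for p in chosen}

# Hypothetical frame: 10 barangays (primary units), 30 households each
frame = {
    f"brgy_{i}": [f"brgy_{i}_hh_{j}" for j in range(30)]
    for i in range(1, 11)
}
sample = two_stage_sample(frame, n_primary=4, n_secondary=5, seed=5)
```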
Exercise:
Which probability sampling technique is depicted in
each situation?

1. The experiment is to gather information about a
certain type of insect in the Sierra Madre where the
insect is known to reside in a wide area of the forest. The
entomologist breaks the Sierra Madre into 14
comparable areas and selects a sample of 6 areas to
survey.
2. A survey to determine the palatability of a brand of
infant formula (milk) to infants 2-2.5 years old, 2.5-3.25
years old, and 3.25-4.5 years old. The nutritionist selects
a number of samples from each of the age groups in a list
given by the local civil registry.
3. A librarian wants to study the usage ratio of the
medical books in the library. She selects from among the
books in the library every 12th book using the Dewey
Decimal System number of the books from their
database.
4. An agriculturist wants to study coconut trees planted
on 26 experimental plots. He selects 12 plots for this
purpose knowing that each plot is heterogeneous with respect to
the characteristics he wishes to investigate.
5. From 250 families in Brgy. Bagumbayan, a nurse
interviewed 46 households determined through lottery
from the list given by the barangay secretary by
numbering each household in the list and drawing the 46
using a stained glass bowl.
How do we determine the
sample size n?
Methods for determining the sample size (n) in your study
include:
sample sizes from similar studies
using published tables
use of computing formulas
calculating sample size from effect sizes and
statistical power
SAMPLE SIZE FROM SIMILAR STUDIES
Similar studies can provide hints on the sample size that is
needed for a particular study or a particular test
procedure.
The sample size of an earlier version of the same study you wish to
conduct may also be a basis for the sample size. However,
adopting it also means repeating any errors committed in its
sampling design and in the computation of its sample size.
It is therefore vital that the researcher thoroughly
review the literature concerning the study to
discover and avoid those errors.
SAMPLE SIZE USING PUBLISHED TABLES
Tables from research journals and statistics books provide
sample sizes for specific studies, statistical tests, or a set of
predetermined criteria (e.g. power of tests). Before using a
published table, it is essential that the researcher
understands the underlying assumptions of its use to
ensure its applicability in the study.
USING COMPUTING FORMULAS

USING FORMULA FOR PROPORTIONS

$$ n = \frac{z^{2}\,p\,q}{e^{2}} $$

where: n = sample size
z = the z-value corresponding to α/2, where α is
the level of significance
(for α = 0.05, z = 1.96; for α = 0.01, z = 2.575)
e = specified amount of error
p = population proportion or its point
estimate
q = 1 − p
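As a quick sketch, the proportion formula n = z²pq/e² can be wrapped in a small Python helper. The function name is hypothetical, and rounding up to the next whole unit is an assumption made so that the specified error is not exceeded.

```python
import math

def sample_size_proportion(z, p, e):
    """n = z^2 * p * q / e^2, rounded up to a whole unit."""
    q = 1 - p
    return math.ceil(z**2 * p * q / e**2)

# 95% confidence (z = 1.96), p = 0.5, error of 0.05
n = sample_size_proportion(z=1.96, p=0.5, e=0.05)
print(n)  # 385
```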
COMPUTING SAMPLE SIZES FROM THE MEAN

$$ n = \frac{z^{2}\,\sigma^{2}}{e^{2}} $$

where: n = sample size
z = the z-value corresponding to α/2,
where α is the level of significance
(for α = 0.05, z = 1.96, and for α = 0.01, z = 2.575)
σ² = population variance
e = specified amount of error
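A minimal sketch of the computation for the mean, n = z²σ²/e², in Python. The function name and the illustrative values (σ = 15, e = 3) are hypothetical and not taken from the notes; the result is rounded up to a whole unit.

```python
import math

def sample_size_mean(z, sigma, e):
    """n = (z * sigma / e)^2, rounded up to a whole unit."""
    return math.ceil((z * sigma / e) ** 2)

# Hypothetical inputs: 95% confidence, sigma = 15, error of 3 units
n = sample_size_mean(z=1.96, sigma=15, e=3)
print(n)  # 97
```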
USING YAMANE’S FORMULA
If the population size (N) is known, a simplified formula is
Yamane’s formula. This is commonly used in social science
research.

$$ n = \frac{N}{1 + N e^{2}} $$

where: n = sample size
N = population size
e = desired degree of error at a given
confidence level (e.g. at the 95% confidence level,
e = 0.05)

Note: Yamane’s formula can only be used when estimating the population proportion
and is optimal when p is close to 0.5.
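Yamane's formula, n = N / (1 + Ne²), is simple enough to sketch directly. The function name and the population of 1,000 are hypothetical, and rounding up is an assumption.

```python
import math

def yamane_sample_size(N, e):
    """n = N / (1 + N * e^2), rounded up to a whole unit."""
    return math.ceil(N / (1 + N * e**2))

# Hypothetical population of 1,000 with e = 0.05
n = yamane_sample_size(N=1000, e=0.05)
print(n)  # 286
```

Because N appears in both numerator and denominator, the result never exceeds N and levels off near 1/e² for very large populations.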
Examples:
1. Suppose we wish to study the learning styles (e.g.
converger) among college seniors. Assume that the population
of seniors is large. Since we do not know what proportion of
the population units exhibit the learning style we wish to
study, we assume maximum variability and take p = 0.5 (the
most conservative choice). We desire a 95% confidence level
(Z0.05/2 = 1.96) and a level of precision of ±5%.

Solution:
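A worked sketch of the computation, assuming the formula for proportions n = z²pq/e² with z = 1.96, p = q = 0.5 and e = 0.05, and rounding up to the next whole unit:

```latex
n = \frac{z^{2}\,p\,q}{e^{2}}
  = \frac{(1.96)^{2}(0.5)(0.5)}{(0.05)^{2}}
  = \frac{0.9604}{0.0025}
  = 384.16
  \;\Rightarrow\; n = 385
```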
2. We wish to study the cadet trainees’ leadership
qualities in a certain academy. The population size is 415 and
we choose a level of precision of 5%. What is the
sample size?

Solution:
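A worked sketch, assuming Yamane's formula n = N / (1 + Ne²) with N = 415 and e = 0.05, and rounding up to the next whole unit:

```latex
n = \frac{N}{1 + N e^{2}}
  = \frac{415}{1 + 415\,(0.05)^{2}}
  = \frac{415}{2.0375}
  \approx 203.68
  \;\Rightarrow\; n = 204
```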
3. An electrical firm manufactures light bulbs that have
a life that is approximately normally distributed with a
standard deviation of 40 hours. How large a sample is
needed if we wish to be 96% confident that our sample
mean will be within 10 hours of the true mean?

Solution:
Here, e = 10, σ = 40, and α = 0.04.
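A worked sketch of the remaining step, assuming the formula for the mean n = z²σ²/e² and a table value of z ≈ 2.05 for α/2 = 0.02 (a more precise value is about 2.054, which leads to the same rounded answer):

```latex
n = \left(\frac{z_{\alpha/2}\,\sigma}{e}\right)^{2}
  = \left(\frac{(2.05)(40)}{10}\right)^{2}
  = (8.2)^{2}
  = 67.24
  \;\Rightarrow\; n = 68
```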
USING EFFECT SIZES AND STATISTICAL POWER
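Statistical Power:
The statistical power of a test is the probability that it will lead you
to reject the null hypothesis when that hypothesis is in fact false. It
equals 1 − β, where β is the probability of a Type II error (failing to
reject a false null hypothesis). Power usually lies between 0.01 and 0.99
(Cohen suggests a minimum statistical power of 0.80).

Level of Significance (α):
is the probability of rejecting the null hypothesis when it is in fact
true (a Type I error).
The rule of thumb is either α = 0.05 or α = 0.01.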

Effect size:
Different statistical tests use different effect size measures; for simplicity
we follow Cohen’s conventions.

Source: Cohen, J. (1988). Statistical power analysis for the behavioral
sciences (2nd ed.). Hillsdale, NJ: Erlbaum.
Chi-Square, One- and Two-Way:
For a Chi-square test, Cohen considered a w of .10 to constitute a small effect,
.3 a medium effect, and .5 a large effect. The required total sample size
depends on the degrees of freedom, as shown in the table below:

Pearson r (correlation):
Cohen considered a ρ of .1 to be small, .3 medium, and .5 large.
You need 783 pairs of scores for a small effect, 85 for a medium
effect, and 28 for a large effect. In terms of percentage of
variance explained, small is 1%, medium is 9%, and large is 25%.
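These figures can be roughly reproduced with the Fisher z approximation, which is a swapped-in textbook approximation and not necessarily how Cohen's tables were built. The sketch assumes a two-tailed test at α = .05 and 80% power; the function name is hypothetical.

```python
import math
from statistics import NormalDist

def n_pairs_for_correlation(r, alpha=0.05, power=0.80):
    """Approximate n via Fisher's z: ((z_alpha/2 + z_beta) / atanh(r))^2 + 3."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-tailed critical value
    z_beta = NormalDist().inv_cdf(power)           # quantile for the target power
    return math.ceil(((z_alpha + z_beta) / math.atanh(r)) ** 2 + 3)

small = n_pairs_for_correlation(0.1)   # 783
medium = n_pairs_for_correlation(0.3)  # 85
large = n_pairs_for_correlation(0.5)
```

For the small and medium effects the approximation matches the tabled values; for the large effect it comes out slightly above the tabled 28, which is expected of an approximation.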

One-Sample t-test:
A d of .2 is considered small, .5 medium, and .8 large. For 80%
power you need 196 scores for small effect, 33 for medium, and
14 for large.

Independent Samples t-test, Pooled Variances:
A d of .2 is considered small, .5 medium, and .8 large. For 80% power
you need, in each of the two groups, 393 scores for small effect, 64 for
medium, and 26 for large.
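A normal-approximation sketch gives numbers close to these. It assumes a two-tailed test at α = .05 and equal group sizes, and it ignores the small t-distribution correction, so it can fall one unit short of the tabled values. The function name is hypothetical.

```python
import math
from statistics import NormalDist

def n_per_group_two_sample(d, alpha=0.05, power=0.80):
    """Approximate n per group: 2 * ((z_alpha/2 + z_beta) / d)^2."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-tailed critical value
    z_beta = NormalDist().inv_cdf(power)           # quantile for the target power
    return math.ceil(2 * ((z_alpha + z_beta) / d) ** 2)

small = n_per_group_two_sample(0.2)   # 393
medium = n_per_group_two_sample(0.5)  # 63
large = n_per_group_two_sample(0.8)   # 25
```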

Correlated Samples t-Test (Paired Samples):


Using Howell’s table and G*Power, assuming nondirectional
hypotheses (two-tailed test) and α =.05 criterion of significance.

One-Way Independent Samples ANOVA:
Cohen considered an f of .10 to be a small effect, .25 a medium effect, and .40
a large effect. Using this to translate Cohen’s guidelines into proportions of
variance, a small effect is one which accounts for about 1% of the variance, a
medium effect 6%, and a large effect 14%. The required sample size per
group varies with treatment degrees of freedom, as shown below:

What are other considerations in
determining the sample size?
The equations for determining the sample size all assume simple
random sampling and may not be applicable for other sampling
designs such as cluster sampling and multi-stage sampling.

Also, researchers often allocate sampling costs in their research
budgets and this must be considered in determining the sample
size and procedure.

Other more complicated procedures such as multiple regression
analysis and other multivariate statistical methods (such as
factor analysis, discriminant analysis, cluster analysis, etc.)
employ a ‘rule-of-thumb’ minimum number of samples per
variable in the study (e.g. 10-20 samples per variable).
Furthermore, skewed distributions in the sample taken
may call for additional samples to satisfy the assumption of
normality in most statistical tests.

Finally, the equations above yield the estimated number of
samples that need to be obtained, and many researchers
add 5% or more to this number to compensate for
non-responses to the survey, incomplete and biased
responses, and coding errors which cannot be rectified.
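The adjustment is simple arithmetic, sketched below. The function name is hypothetical, the 5% default comes from the rule of thumb above, and the starting value of 385 is only an example.

```python
import math

def inflate_for_nonresponse(n, extra=0.05):
    """Increase a computed sample size by a fraction `extra`, rounding up."""
    return math.ceil(n * (1 + extra))

# Example: a computed sample size of 385 with a 5% allowance
n_adjusted = inflate_for_nonresponse(385)
print(n_adjusted)  # 405
```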
