
LR 2 Sampling


ECON 1005 Lecture #1

An Introduction to Statistics
(based largely on PS Mann Appendix A)

Dr. Henry Bailey


We saw some important definitions earlier. Can you remember
the key ideas for each of these terms?

Inferential statistics:
Methods that use sample results to help make decisions or predictions
about a population.

Population:
Consists of all elements that are being studied.

Sample:
The portion of the population selected for study.

Census:
A survey that includes every member of the population.

Sampling frame:
The list of elements from which a sample is to be drawn. E.g. telephone
directory; UWI student registration database; population census data.

An element:
of a sample or population is a specific subject or object (for example, a
person, firm, item, state, or country) about which the information is
collected.

A variable:
a characteristic under study that assumes different values for different
elements. In contrast to a variable, the value of a constant is fixed.

An observation (or measurement):
the value of a variable for an element

A Data set:
a collection of observations on one or more variables.
• Why not take a census?
• Impossibility of conducting a census due to:
– time
– cost
– lack of access to the entire population etc.
Random sample:
A sample drawn in such a way that each element of the population has an
equal chance of being selected

Simple random sampling:
All samples of the same size selected from a population have the same
chance of being selected.
The probability of drawing a specific element on the r-th draw is the same
as on the 1st draw.

Sampling with replacement:
Each time an element is selected from the population, it is put back in the
population before the next element is selected.

Sampling without replacement:
The selected element is not put back in the population.
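The difference between the two schemes can be seen in a minimal sketch using Python's standard `random` module. The population of ten ID numbers and the sample size of 4 are made up for illustration only.

```python
import random

# Hypothetical population: ID numbers for 10 elements.
population = list(range(1, 11))

rng = random.Random(42)  # fixed seed so the run is repeatable

# Without replacement: a selected element is not put back,
# so it can appear at most once in the sample.
without_replacement = rng.sample(population, k=4)

# With replacement: each selected element is put back before the
# next draw, so the same element may be drawn more than once.
with_replacement = rng.choices(population, k=4)

print("without replacement:", without_replacement)
print("with replacement:   ", with_replacement)
```

Repeats can only ever show up in the second sample; the first always contains four distinct elements.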
Advantage:
– eliminates selection bias

Limitations:
– Can be costly, especially for a widely dispersed population
– It is possible that the resulting sample may not be representative
of the population
• Especially where the population is stratified

Example:
– Suppose we are interested in the average age of students in
ECON 1005
– What is the population?
– Suppose we take this class as the sample, and find the average age
– Will the sample be representative of the population?
– What has gone wrong?
• Divide the population into categories or strata, using a
stratification factor. These factors must be:
– Mutually exclusive
– Each member of the population must be assigned to only 1
stratum
– E.g. gender, age group, income group
• A simple random sample is then drawn from each
stratum to ensure adequate representation of all
strata in the sample.
• The collection of simple random samples constitutes
the stratified random sample.
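The procedure above can be sketched in a few lines of Python. This is only an illustration under assumed inputs: the function name `stratified_sample`, the class list, the two age groups, and the choice of 3 students per stratum are all hypothetical.

```python
import random
from collections import defaultdict


def stratified_sample(elements, stratum_of, n_per_stratum, rng):
    """Draw a simple random sample of up to n_per_stratum from each stratum."""
    strata = defaultdict(list)
    for element in elements:
        # Each element is assigned to exactly one stratum.
        strata[stratum_of(element)].append(element)

    sample = []
    for label in sorted(strata):
        members = strata[label]
        k = min(n_per_stratum, len(members))
        sample.extend(rng.sample(members, k))  # random selection within the stratum
    return sample


# Hypothetical class list of (student_id, age_group) pairs.
students = [(i, "20-29" if i % 3 == 0 else "under 20") for i in range(1, 31)]

rng = random.Random(0)  # fixed seed so the run is repeatable
sample = stratified_sample(students, lambda s: s[1], 3, rng)
print(sample)
```

Every stratum contributes to the final sample, which is what guarantees that no group is left out. Taking the same number from each stratum is just one simple allocation; sampling in proportion to stratum size is another common choice.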
The idea behind Stratified Random Sampling is that a sample
must be truly representative of the population if the inferences
from the sample about the population are to be reliable.
• Think of some stratification factors for the following:
– The ECON1005 Class
– The residents of the capital city in your country
– Licensed Motor Vehicles in your country
– West Indies Cricket Fans across the Caribbean
• Advantages:
– uses knowledge of the population to increase the
precision of the results obtained from the sample.
– The chance of any individual being drawn is still
measurable and all possible samples of equal size still have
the same chance of selection.
• Although the choice of the stratification factor depends
on judgment, the procedure still maintains an element
of randomness, especially since the mode of selection within
each stratum is random.
– Efficiency: useful results can be obtained from smaller samples
– Allows analysis at the sub-stratum level.
• Disadvantage:
– each element of the population now does not have an
equal chance of being drawn.
• Works best when:
– the variation within each stratum/group is small compared
to the variation between strata/groups.
• When within-group variation is small, it can provide results
nearly identical to those of Simple random sampling.
• E.g. the health status of people by age group:
<20; 20-29; 30-39; 40-49; 50-59; 60-69; 70-79; 80-89; 90+
• Cluster sampling (aka multi-stage sampling)
– The population is divided into groups (clusters)
– A simple random sampling process is then applied
to select a sample of clusters.
– All members of the selected cluster(s) constitute the
cluster sample.
• Example:
– Survey to investigate students’ attitudes to
University policy on consultations.
– Divide the student body into faculties
– Use simple random sampling to pick 2 faculties
– Survey each student in the two faculties.
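The faculty example above might look like this in Python. The faculty names and student IDs are invented for illustration; only the steps (group, pick 2 clusters at random, take everyone in them) come from the example.

```python
import random

# Hypothetical student body, already divided into faculties (the clusters).
faculties = {
    "Engineering": ["E1", "E2", "E3"],
    "Humanities": ["H1", "H2"],
    "Medical Sciences": ["M1", "M2", "M3", "M4"],
    "Social Sciences": ["S1", "S2", "S3"],
}

rng = random.Random(1)  # fixed seed so the run is repeatable

# Step 1: simple random sampling applied to the clusters, not to students.
chosen_faculties = rng.sample(sorted(faculties), k=2)

# Step 2: every student in each chosen faculty is surveyed.
cluster_sample = [s for f in chosen_faculties for s in faculties[f]]

print(chosen_faculties)
print(cluster_sample)
```

Note that the final sample size is not fixed in advance: it depends on how large the selected clusters happen to be.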
• Cluster sampling may seem to resemble stratified
random sampling in that both sample designs
involve a grouping of the members of the
population. The similarity stops there!
– When we stratify, every stratum is sampled, whereas
when we cluster, we select among the clusters, with the
resultant cluster(s) constituting the sample.
– Further, when we stratify we use a simple random
sample of each stratum; when we cluster we select each
member of each of the selected clusters.
• Advantages
– Cluster sampling is particularly useful when it is difficult or
costly to develop a sampling frame.
– It is also useful when the population elements are widely
dispersed geographically.
– The selection process remains a random one, since we select
among the clusters in much the same way that we select
among individual population members in random sampling.
• Limitation
– Under both the simple random and stratified random
sampling designs, one member at a time (of the population
or stratum) is selected. The selection of any one member is
independent of the selection of another.
• Cluster sampling does not share this characteristic.
• There are many ways of selecting the clusters
themselves, but one general rule always applies:
– clusters should be formed so that the variation within
each cluster is large relative to the variation between
clusters.
– I.e. the clusters should be similar to one another, while the
members within each cluster are as heterogeneous as can be.
– This is how we obtain a sample that is truly
representative of the population as a whole.
– So, would you use cluster sampling to investigate the
health status of adults by age group?
<20; 20-29; 30-39; 40-49; 50-59; 60-69; 70-79; 80-89; 90+
– In a non-random sample, some members of the
population may not have any chance of being selected in
the sample.
• Convenience sample
– E.g. students at a university / shoppers at a store

• Judgment sample (or purposive sample)
– Based on the ‘judgment’ of the researcher
– E.g. specific doctors within a specialty

• Quota sample
– Divide the population into subpopulations and sample within each
– E.g. sampling by gender: 480 men and 520 women
– Does not require a sampling frame
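One way to picture a quota sample is as a tally that fills up as people become available. The sketch below is an assumption about how a researcher might proceed, not a prescribed method; the function name `quota_sample`, the stream of arrivals, and the small quotas are hypothetical.

```python
def quota_sample(arrivals, quotas):
    """Accept people in the order they turn up until every quota is filled."""
    remaining = dict(quotas)  # copy so the caller's quotas are not modified
    sample = []
    for person, group in arrivals:
        if remaining.get(group, 0) > 0:
            sample.append((person, group))
            remaining[group] -= 1
        if not any(remaining.values()):
            break  # all quotas filled
    return sample


# Hypothetical stream of people passing the interviewer: (name, group).
arrivals = [(f"P{i}", "man" if i % 2 else "woman") for i in range(1, 21)]

sample = quota_sample(arrivals, {"man": 3, "woman": 4})
print(sample)
```

No list of the whole population is consulted and no random draw is made, which is why the method needs no sampling frame and why it is non-random.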
• Simple random sampling
• Stratified random sampling
• Cluster sampling

• Convenience sampling
• Judgment sampling (or purposive sampling)
• Quota sampling
• Sampling Error
– the difference between the result obtained from a sample survey
and the result that would have been obtained if the whole
population had been included in the survey.
– E.g. a simple random sample of the heights of students in a class
might, by chance, produce a sample of the 10 tallest students!
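A small simulation can make the idea concrete. The heights below are randomly generated stand-ins, not real data, and the sampling error is taken here as the sample mean minus the population mean, which is one common convention.

```python
import random
from statistics import mean

rng = random.Random(7)  # fixed seed so the run is repeatable

# Hypothetical population: heights (cm) of the 40 students in a class.
heights = [rng.randint(150, 190) for _ in range(40)]
population_mean = mean(heights)

# One simple random sample of 10 students.
sample = rng.sample(heights, k=10)
sampling_error = mean(sample) - population_mean

# The unlucky extreme: the sample happens to be the 10 tallest students.
tallest = sorted(heights)[-10:]
worst_case_error = mean(tallest) - population_mean

print("population mean:", population_mean)
print("sampling error of this sample:", round(sampling_error, 2))
print("error if the 10 tallest are drawn:", round(worst_case_error, 2))
```

Re-running with a different seed gives a different sampling error each time, which is the point: the error arises purely from which elements happen to be drawn.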
• Examples of non-sampling errors:
– Selection error: the sampling frame excludes some types of member (e.g. a telephone directory)
– Non-response error: many people don’t respond or participate
– Response error: respondents provide incorrect answers
– Voluntary response error: only some people respond (especially in call-in or mail-in surveys)
• Recall the steps of Statistical Investigation:
– Question
– Sampling procedure
– Data collection
– Generalization
– Decide on reliability
• Direct Observation
• Experiments
• Surveys
– Personal Interviews
– Telephone Interviews
– Self Administered Survey
– Mail Survey
– Internet Survey
• Questionnaire
– Keep it short and simple; avoid open-ended questions;
leave options for “Other”; should be anonymous
• Observation schedules
• Experiments
– control and treatment group
• Interviews
– formal or informal; predetermined list of questions
• Once we have collected our data, we
assemble a team to begin entering the data
into our statistical software.
• We may have to code the data for it to be
relevant to the software.
• From the software, we can now begin to
derive descriptive statistics and display them
in the various forms discussed in the next unit.
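Coding the data usually means replacing text answers with numbers that the software can store and tabulate. A minimal sketch, with a made-up question and a made-up code book:

```python
# Hypothetical answers to one questionnaire item.
responses = ["Yes", "No", "No", "Other", "Yes"]

# Hypothetical code book: each possible answer gets a numeric code.
code_book = {"Yes": 1, "No": 2, "Other": 9}

# Replace every answer with its code.
coded = [code_book[answer] for answer in responses]

print(coded)  # [1, 2, 2, 9, 1]
```

The codes are only labels; for a categorical variable like this one, averaging them would be meaningless.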
• Now on to your reading!
• PS Mann, Introductory Statistics.
• Read Chapter 1 and Appendix A before the
next class.
