Chapter 3

CHAPTER III: SAMPLING AND SAMPLING DISTRIBUTIONS

1.1 Sampling Theory


Sampling theory is the study of the relationship that exists between a population (the universe) and the samples drawn from it.

Sampling theory is applicable only to random samples.

1.1.1 Definitions of terms:

1. Parameter: Characteristic or measure obtained from a population.

2. Statistic: Characteristic or measure obtained from a sample.

3. Sampling: The process or method of sample selection from the population.

4. Sampling unit: The ultimate unit to be sampled, i.e., an element of the population that may be selected.

Examples:

- If somebody studies the socio-economic status of households, the household is the sampling unit.

- If one studies the performance of freshman students in some college, the student is the sampling unit.

5. Sampling frame: The list of all elements in the population under study.

Examples:

- List of households.
- List of students in the registrar office.

1.1.3 The Need for Sampling

Compared with a complete enumeration (census) of the population, sampling offers:

- Reduced cost

- Greater speed

- Greater accuracy

- Greater scope

1.1.4 Bias and errors in sampling and non-sampling errors

There are two types of errors.

1.1.4.1 Sampling error:

- It is the discrepancy between the population value (parameter) and the sample value (statistic).

- It arises because only part of the population is observed, and it may be made worse by inappropriate sampling techniques.

1.1.4.2 Non-sampling errors: errors due to procedural bias, such as:

- Incorrect responses

- Measurement errors

- Errors at different stages in processing the data


1.1.5 Types of sampling

1.1.5.1 Random Sampling or probability sampling


- It is a method of sampling in which every element in the population has a pre-assigned non-zero probability of being included in the sample.
Examples:

- Simple random sampling

- Stratified random sampling

- Cluster sampling

- Systematic sampling

i. Simple Random Sampling:
- It is a method of selecting items from a population such that every possible sample of a
specific size has an equal chance of being selected. Sampling may be with or without
replacement.
- Consequently, all elements in the population have the same pre-assigned non-zero probability
of being included in the sample.
- Simple random sampling can be done using either the lottery method or a table of random
numbers.
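
As an illustration only, the lottery method can be imitated with a pseudo-random number generator. The sketch below uses Python's standard `random` module. The function name `simple_random_sample` and the household IDs are made up for this example and do not come from the chapter.

```python
import random

def simple_random_sample(frame, n, replace=False, seed=None):
    """Draw a simple random sample of size n from a sampling frame (a list)."""
    rng = random.Random(seed)  # seed is optional, used only for reproducibility
    if replace:
        # With replacement: the same unit may be drawn more than once.
        return [rng.choice(frame) for _ in range(n)]
    # Without replacement: every subset of size n is equally likely.
    return rng.sample(frame, n)

# Hypothetical sampling frame: 20 household IDs.
households = [f"HH{i:02d}" for i in range(1, 21)]
sample_wor = simple_random_sample(households, 5, seed=1)
sample_wr = simple_random_sample(households, 5, replace=True, seed=1)
```

Without replacement the five selected households are always distinct; with replacement a household may appear more than once.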
ii. Stratified Random Sampling:
- The population is divided into non-overlapping but exhaustive groups called strata.
- A simple random sample is chosen from each stratum.
- Elements within the same stratum should be more or less homogeneous, while elements in
different strata should differ.
- It is applied if the population is heterogeneous.
- Some of the criteria for dividing a population into strata are: sex (male, female); age
(under 18, 18 to 28, 29 to 39, ...); occupation (blue-collar, professional, other).
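
The procedure can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed method: it assumes the same number of units is taken from every stratum (the chapter does not say how the sample should be allocated), and the student frame and the name `stratified_sample` are hypothetical.

```python
import random
from collections import defaultdict

def stratified_sample(units, stratum_of, n_per_stratum, seed=None):
    """Draw a simple random sample of n_per_stratum units from every stratum."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for unit in units:
        # Each unit falls in exactly one stratum (non-overlapping, exhaustive).
        strata[stratum_of(unit)].append(unit)
    return {
        name: rng.sample(members, min(n_per_stratum, len(members)))
        for name, members in strata.items()
    }

# Hypothetical frame of (student id, sex) pairs.
students = [(i, "female" if i % 2 else "male") for i in range(1, 41)]
picked = stratified_sample(students, stratum_of=lambda s: s[1], n_per_stratum=3, seed=2)
```

The result is a dictionary with one simple random sample per stratum.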

iii. Cluster Sampling:


- The population is divided into non-overlapping groups called clusters.
- A simple random sample of clusters is chosen, and all the sampling units in the selected
clusters are surveyed.
- Clusters are formed in such a way that elements within a cluster are heterogeneous, i.e.,
observations in each cluster should be more or less dissimilar.
- Cluster sampling is useful when it is difficult or costly to generate a simple random sample.
For example, to estimate the average annual household income in a large city, simple random
sampling would require a complete list of households in the city from which to sample, and
so would stratified random sampling. A less expensive way is to let each block within the city
represent a cluster. A sample of clusters could then be randomly selected, and every
household within these clusters could be interviewed to find the average annual household
income.
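
The city-block example might be sketched as follows. The block names and income figures are invented purely for illustration, and `cluster_sample` is a hypothetical helper name.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Pick n_clusters clusters at random and survey every unit inside them."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)   # simple random sample of cluster names
    surveyed = [unit for name in chosen for unit in clusters[name]]
    return chosen, surveyed

# Hypothetical annual household incomes (Birr), grouped by city block.
city_blocks = {
    "block A": [12000, 15500, 9800],
    "block B": [20100, 18750],
    "block C": [7600, 8900, 11200, 10400],
    "block D": [16300, 14100, 13900],
}
chosen_blocks, incomes = cluster_sample(city_blocks, 2, seed=3)
mean_income = sum(incomes) / len(incomes)
```

Only a list of blocks is needed to draw the sample; a list of households is needed only inside the selected blocks.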
iv. Systematic Sampling:
- A complete list of all elements within the population (sampling frame) is required.
- The procedure starts by determining the first element to be included in the sample.
- Then every k-th item is taken from the sampling frame.
- Let N = population size, n = sample size, and $k = N/n$ = sampling interval.
- Choose any number between 1 and k. Suppose it is j ($1 \le j \le k$).
- The j-th unit is selected first, and then the (j+k)-th, (j+2k)-th, ... units, until the
required sample size is reached.
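
The steps above can be sketched as follows. One assumption is added here that the chapter does not state: when N/n is not a whole number, the interval k is taken as its integer part. The name `systematic_sample` and the numbered frame are hypothetical.

```python
import random

def systematic_sample(frame, n, seed=None):
    """Select units j, j+k, j+2k, ... from the frame, with k = N // n."""
    N = len(frame)
    k = N // n                    # sampling interval (integer part of N/n, an assumption)
    rng = random.Random(seed)
    j = rng.randint(1, k)         # random start with 1 <= j <= k
    # Positions j, j+k, j+2k, ... are 1-based; subtract 1 for list indexing.
    return [frame[j - 1 + i * k] for i in range(n)]

# Hypothetical frame: units numbered 1 to 100, sample size 10, so k = 10.
frame = list(range(1, 101))
sample = systematic_sample(frame, 10, seed=7)
```

With N = 100 and n = 10, the selected unit numbers are always exactly 10 apart, and the first one lies between 1 and 10.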
1.1.5.2 Non Random Sampling or non-probability sampling
- It is a sampling technique in which the choice of individuals for a sample depends on
convenience, personal choice, or interest.
Examples:

- Judgment sampling

- Convenience sampling

- Quota sampling
1. Judgment Sampling
- In this case, the person taking the sample has direct or indirect control over which
items are selected for the sample.
2. Convenience Sampling
- In this method, the decision maker selects a sample from the population in a manner
that is relatively easy and convenient.
3. Quota Sampling
- In this method, the decision maker requires the sample to contain a certain number of
items with a given characteristic. Many political polls are, in part, quota sampling.

Note: Let N = population size and n = sample size.

1. Suppose simple random sampling is used.

- We have $N^n$ possible samples if sampling is with replacement.

- We have $\binom{N}{n}$ possible samples if sampling is without replacement.

2. From here on, we assume that samples are drawn from the given population
using simple random sampling.
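
As a quick check of the two counts in the note, here is a small sketch using Python's standard library. The values N = 5 and n = 2 are arbitrary, and `count_samples` is a hypothetical helper name.

```python
import math

def count_samples(N, n):
    """Number of possible samples of size n from a population of size N."""
    with_replacement = N ** n               # ordered draws, repeats allowed
    without_replacement = math.comb(N, n)   # unordered subsets of size n
    return with_replacement, without_replacement

wr, wor = count_samples(5, 2)
print(wr, wor)  # 25 10
```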

1.2 Sampling Distributions


A sampling distribution is a probability distribution for the possible values of a sample statistic, such as a sample mean.

NOTE: The normal probability distribution is used to determine probabilities for normally distributed individual measurements, given the mean and the standard deviation. Symbolically, the variable is the measurement X, with population mean µ and population standard deviation δ. In contrast to such distributions of individual measurements, a sampling distribution is a probability distribution for the possible values of a sample statistic.

1.2.1 Sampling distribution of the mean and proportion

1.2.1.1 Sampling Distribution of the Mean

The sampling distribution of the mean is the probability distribution of the means, $\bar{X}$, of all
simple random samples of a given sample size n that can be drawn from the population.

NB: The sampling distribution of the mean is not the sample distribution, which is the
distribution of the measured values of X in one random sample. Rather, the sampling
distribution of the mean is the probability distribution for $\bar{X}$, the sample mean.

For any given sample size n taken from a population with mean µ and standard deviation δ, the
value of the sample mean $\bar{X}$ would vary from sample to sample if several random samples
were obtained from the population. This variability serves as the basis for the sampling
distribution.

The sampling distribution of the mean is described by two parameters: the expected value
$E(\bar{X}) = \mu_{\bar{X}}$, or mean of the sampling distribution of the mean, and the standard deviation
of the mean, $\delta_{\bar{X}}$, called the standard error of the mean.

Properties of the Sampling Distribution of Means


1. The mean of the sampling distribution of the means is equal to the population mean:
$E(\bar{X}) = \mu_{\bar{X}} = \mu$.
2. The standard deviation of the sampling distribution of the means (the standard error) is
equal to the population standard deviation divided by the square root of the sample
size:

$\delta_{\bar{X}} = \dfrac{\delta}{\sqrt{n}}$

This holds when N is very large and n < 0.05N. If N is finite and n ≥ 0.05N,

$\delta_{\bar{X}} = \dfrac{\delta}{\sqrt{n}} \sqrt{\dfrac{N-n}{N-1}}$

The expression $\sqrt{\dfrac{N-n}{N-1}}$ is called the finite population correction factor (finite
population multiplier). In the calculation of the standard error of the mean, if the population
standard deviation δ is unknown, the standard error of the mean, $\delta_{\bar{X}}$, can be estimated by
the sample standard error of the mean, $S_{\bar{X}}$, which is calculated from the sample standard
deviation S as follows:

$S_{\bar{X}} = \dfrac{S}{\sqrt{n}}$ or $S_{\bar{X}} = \dfrac{S}{\sqrt{n}} \sqrt{\dfrac{N-n}{N-1}}$
3. The sampling distribution of means is approximately normal for sufficiently large
sample sizes (n≥ 30).
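
Property 2 can be written as a small function. This is a sketch under the rule of thumb stated above (apply the correction when n ≥ 0.05N). The name `standard_error_mean` is hypothetical, and the inputs are the figures used in the worked examples of this chapter.

```python
import math

def standard_error_mean(sd, n, N=None):
    """Standard error of the mean; sd is the population standard deviation (δ in the text)."""
    se = sd / math.sqrt(n)
    # Apply the finite population correction only when N is known and n >= 0.05N.
    if N is not None and n >= 0.05 * N:
        se *= math.sqrt((N - n) / (N - 1))
    return se

se_large_pop = standard_error_mean(2000, 30)          # about 365.15
se_finite_pop = standard_error_mean(8.3, 45, N=350)   # about 1.16
```

If S is passed in place of δ, the same function gives the estimated standard error $S_{\bar{X}}$.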

Central Limit Theorem and the Sampling Distribution of the Mean


The Central Limit Theorem (CLT) states that:

1. If the population is normally distributed, the distribution of sample means is normal regardless of the sample size.

2. If the population from which samples are taken is not normal, the distribution of sample means will be approximately normal if the sample size (n) is sufficiently large (n ≥ 30). The larger the sample size, the closer the sampling distribution is to the normal curve.

The Central Limit Theorem thus describes the relationship between the shape of the population
distribution and the shape of the sampling distribution of the mean.

The significance of the Central Limit Theorem is that it permits us to use sample statistics to
make inferences about population parameters without knowing anything about the shape of the
frequency distribution of that population other than what we can get from the sample. It also
permits us to use the normal distribution (curve) for analyzing distributions whose shape is
unknown. It creates the potential for applying the normal distribution to many problems when
the sample is sufficiently large.
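
The theorem can be illustrated by simulation. The sketch below is an assumption-laden example, not part of the chapter: it takes a strongly skewed exponential population with mean 1 and standard deviation 1, draws 5,000 samples of size 30, and summarizes the resulting sample means. The function name, the population, and the number of repetitions are all arbitrary choices.

```python
import random
import statistics

def simulate_sample_means(n, reps=5000, seed=0):
    """Means of `reps` random samples of size n from an exponential population."""
    rng = random.Random(seed)
    # expovariate(1.0) has mean 1 and standard deviation 1 and is heavily skewed.
    return [
        statistics.fmean([rng.expovariate(1.0) for _ in range(n)])
        for _ in range(reps)
    ]

means = simulate_sample_means(n=30)
center = statistics.fmean(means)   # expected to be close to the population mean, 1
spread = statistics.stdev(means)   # expected to be close to 1 / sqrt(30), roughly 0.18
```

Although the population is far from normal, the simulated means cluster around the population mean with a spread near δ/√n, as the properties of the sampling distribution predict.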

Example:

1. The distribution of annual earnings of all bank tellers with five years of experience is negatively skewed. This distribution has a mean of Birr 15,000 and a standard deviation of Birr 2,000. If we draw a random sample of 30 tellers, what is the probability that their earnings will average more than Birr 15,750 annually? Interpret the result.

Solution:

Steps:

1. Calculate µ and $\delta_{\bar{X}}$

µ = Birr 15,000

$\delta_{\bar{X}} = \delta/\sqrt{n} = 2000/\sqrt{30}$ = Birr 365.15

2. Calculate Z for $\bar{X}$

$Z_{\bar{X}} = \dfrac{\bar{X} - \mu_{\bar{X}}}{\delta_{\bar{X}}} = \dfrac{\bar{X} - \mu}{\delta_{\bar{X}}}$

$Z_{15750} = \dfrac{15750 - 15000}{365.15} = +2.05$

3. Find the area covered by the interval

P($\bar{X}$ > 15,750) = P(Z > +2.05)

= 0.5 - P(0 to +2.05)

= 0.5 - 0.4798

= 0.0202

4. Interpret the results

There is a 2.02% chance that the average earnings of a group of 30 tellers will exceed
Birr 15,750 annually.

2. Suppose that during any hour in a large department store, the average number of shoppers is
448, with a standard deviation of 21 shoppers. What is the probability of randomly selecting 49
different shopping hours, counting the shoppers, and having the sample mean fall between 441
and 446 shoppers, inclusive?

Solution:

1. Calculate µ and $\delta_{\bar{X}}$

µ = 448 shoppers

$\delta_{\bar{X}} = \delta/\sqrt{n} = 21/\sqrt{49} = 3$

2. Calculate Z for $\bar{X}$

$Z_{\bar{X}} = \dfrac{\bar{X} - \mu_{\bar{X}}}{\delta_{\bar{X}}} = \dfrac{\bar{X} - \mu}{\delta_{\bar{X}}}$

$Z_{441} = \dfrac{441 - 448}{3} = -2.33$

$Z_{446} = \dfrac{446 - 448}{3} = -0.67$

3. Find the area covered by the interval

P(441 ≤ $\bar{X}$ ≤ 446) = P(-2.33 ≤ Z ≤ -0.67)

= P(-2.33 to 0) - P(-0.67 to 0)

= 0.4901 - 0.2486

= 0.2415

4. Interpret the results

There is a 24.15% chance of randomly selecting 49 hourly periods for which the sample
mean falls between 441 and 446 shoppers.

3. A production company's 350 hourly employees average 37.6 years of age, with a standard
deviation of 8.3 years. If a random sample of 45 hourly employees is taken, what is the
probability that the sample will have an average age of less than 40 years?

Solution:

1. Calculate µ and $\delta_{\bar{X}}$

µ = 37.6 years

n/N = 45/350 = 0.129 > 0.05, so the finite population correction factor is needed.

$\delta_{\bar{X}} = \dfrac{\delta}{\sqrt{n}} \sqrt{\dfrac{N-n}{N-1}} = \dfrac{8.3}{\sqrt{45}} \sqrt{\dfrac{350-45}{350-1}} = 1.16$

2. Calculate Z for $\bar{X}$

$Z_{\bar{X}} = \dfrac{\bar{X} - \mu_{\bar{X}}}{\delta_{\bar{X}}} = \dfrac{\bar{X} - \mu}{\delta_{\bar{X}}}$

$Z_{40} = \dfrac{40 - 37.6}{1.16} = +2.07$

3. Find the area covered by the interval

P($\bar{X}$ < 40) = P(Z < +2.07)

= 0.5 + P(0 to +2.07)

= 0.5 + 0.48077

= 0.98077

4. Interpret the results

There is a 98.08% chance that the mean age of 45 randomly selected hourly employees will be
less than 40 years.
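
The table look-ups in the three examples above can be cross-checked numerically. The sketch below uses `NormalDist` from Python's standard `statistics` module and takes the standard errors as computed in the solutions (365.15, 3 and 1.16). The helper name `prob_between` is hypothetical. Because the exact Z values are used rather than Z rounded to two decimals, the results differ slightly from the table answers.

```python
from statistics import NormalDist

def prob_between(mu, se, lo=None, hi=None):
    """P(lo <= sample mean <= hi) when the sample mean is normal with mean mu and s.e. se."""
    dist = NormalDist(mu, se)
    lower = 0.0 if lo is None else dist.cdf(lo)   # no lower bound: area starts at 0
    upper = 1.0 if hi is None else dist.cdf(hi)   # no upper bound: area ends at 1
    return upper - lower

p_tellers = prob_between(15000, 365.15, lo=15750)    # about 0.020 (table: 0.0202)
p_shoppers = prob_between(448, 3, lo=441, hi=446)    # about 0.243 (table: 0.2415)
p_employees = prob_between(37.6, 1.16, hi=40)        # about 0.981 (table: 0.98077)
```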

1.2.1.2 Sampling Distribution of Proportions ($\bar{p}$)


Sometimes in statistics it is important to know the proportion of a certain characteristic in a population.
That is, there are numerous problems in business where we want to know the proportion of items in a
population that possess a certain characteristic. For example,

- A quality control engineer might want to know what proportion of the products of an
assembly line are defective.
- A labor economist might want to know what proportion of the labor force is
unemployed.

Whereas the mean is computed by averaging a set of values, the sample proportion is computed by
dividing the frequency that a given characteristic occurs in a sample by the number of items in the
sample.

$\bar{p} = \dfrac{X}{n}$

Where $\bar{p}$ = sample proportion

X = number of items in a sample that possess the characteristic

n = number of items in the sample

Like other probability distributions, the sampling distribution of the proportion is described by two
parameters: the mean of the sample proportions, $E(\bar{p})$, and the standard deviation of the
proportions, $\delta_{\bar{p}}$, which is called the standard error of the proportion.

Properties of the Sampling Distribution of $\bar{p}$

1. The mean of the sample proportions is always equal to the population proportion P, i.e., $E(\bar{p}) = P$.

2. The standard error of the proportion is equal to:

$\delta_{\bar{p}} = \sqrt{\dfrac{Pq}{n}}$

Where P = population proportion

q = 1 - P

n = sample size.

Or, for a finite population,

$\delta_{\bar{p}} = \sqrt{\dfrac{Pq}{n}} \sqrt{\dfrac{N-n}{N-1}}$, where $\sqrt{\dfrac{N-n}{N-1}}$ = finite population correction factor.

The finite population correction factor is not needed if n < 0.05N.


Central Limit Theorem (CLT) and Sampling Distribution of $\bar{p}$

How does a researcher use the sample proportion in analysis?

Answer: By applying the Central Limit Theorem. The CLT states that the normal distribution
approximates the shape of the distribution of sample proportions if nP and nq are both greater
than 5. Consequently, we solve problems involving sample proportions by using a normal
distribution whose mean and standard deviation are:

$\mu_{\bar{p}} = P$, $\delta_{\bar{p}} = \sqrt{\dfrac{Pq}{n}}$, and $Z_{\bar{p}} = \dfrac{\bar{p} - P}{\delta_{\bar{p}}}$

NB: The sampling distribution of $\bar{p}$ can be approximated by a normal distribution whenever
the sample size is large, i.e., nP and nq > 5.

Example:

1. Suppose that 60% of the electrical contractors in a region use a particular brand of wire. What is
the probability of taking a random sample of size 120 from these electrical contractors and
finding that 0.5 or less use that brand of wire?
Solution:

n = 120, P = 0.6, q = 0.4, P($\bar{p}$ ≤ 0.5) = ?

Steps:

1. Check that nP and nq > 5

120 × 0.6 = 72 and 120 × 0.4 = 48. Both are greater than 5.

2. Calculate $\delta_{\bar{p}}$

$\delta_{\bar{p}} = \sqrt{\dfrac{Pq}{n}} = \sqrt{\dfrac{0.6 \times 0.4}{120}} = 0.0447$

3. Calculate Z for $\bar{p}$

$Z_{\bar{p}} = \dfrac{\bar{p} - P}{\delta_{\bar{p}}}$

$Z_{0.5} = \dfrac{0.5 - 0.6}{0.0447} = -2.24$

4. Find the area covered by the interval

P($\bar{p}$ ≤ 0.5) = P(Z ≤ -2.24)

= 0.5 - P(-2.24 to 0)

= 0.5 - 0.48745

= 0.01255

5. Interpret the results

The probability of finding that 50% or fewer of the contractors use this particular brand of wire is
very low (1.255%) if we take a random sample of 120 contractors.

2. If 10% of a population of parts is defective, what is the probability of randomly selecting 80
parts and finding that 12 or more are defective?
Solution:

n = 80, P = 0.1, q = 0.9, X = 12

$\bar{p}$ = X/n = 12/80 = 0.15

P($\bar{p}$ ≥ 0.15) = ?

1. Check that nP and nq > 5

80 × 0.1 = 8 and 80 × 0.9 = 72. Both are greater than 5.

2. Calculate $\delta_{\bar{p}}$

$\delta_{\bar{p}} = \sqrt{\dfrac{Pq}{n}} = \sqrt{\dfrac{0.10 \times 0.90}{80}} = 0.0335$

3. Calculate Z for $\bar{p}$

$Z_{0.15} = \dfrac{0.15 - 0.1}{0.0335} = +1.49$

4. Find the area covered by the interval

P($\bar{p}$ ≥ 0.15) = P(Z ≥ +1.49)

= 0.5 - P(0 to +1.49)

= 0.5 - 0.43189 = 0.06811

5. Interpret the results

About 6.81% of the time, twelve or more defective parts would appear in a random sample of
eighty parts when the population proportion is 0.10.
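
Both proportion examples can be reproduced with a short sketch that applies the same normal approximation, including the nP and nq > 5 check. The function name `prob_sample_proportion` is hypothetical. No finite population correction or continuity correction is applied, which matches the worked solutions. Exact Z values are used, so the results differ slightly from the table answers.

```python
from math import sqrt
from statistics import NormalDist

def prob_sample_proportion(P, n, lo=None, hi=None):
    """Normal-approximation probability that the sample proportion lies in [lo, hi]."""
    q = 1 - P
    if n * P <= 5 or n * q <= 5:
        raise ValueError("normal approximation needs nP > 5 and nq > 5")
    # The standard deviation of this normal curve is the standard error sqrt(Pq/n).
    dist = NormalDist(mu=P, sigma=sqrt(P * q / n))
    lower = 0.0 if lo is None else dist.cdf(lo)
    upper = 1.0 if hi is None else dist.cdf(hi)
    return upper - lower

p_wire = prob_sample_proportion(0.6, 120, hi=0.5)         # about 0.013 (table: 0.01255)
p_defective = prob_sample_proportion(0.10, 80, lo=0.15)   # about 0.068 (table: 0.06811)
```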
