S&P Lecture Notes 4 (Chapter 7)

Chapter 7

Lecture
Sampling and Sampling Distribution of Sample Mean
and Sample Proportion

Sep. 20, 2021
7. Sampling and Sampling Distributions
 Sampling is simply the process of learning about the population on the basis of a sample drawn from it.

 Thus, in the sampling technique, only a part of the universe is studied instead of every unit, and conclusions are drawn on that basis for the entire universe.


Some practical examples of sampling
▪ Although much of the development in the theory of sampling has taken place only in recent years, the idea of sampling is very old.


▪ Since time immemorial:
✓ People have examined a handful of grains to ascertain the quality of the entire lot.
✓ A housewife examines only two or three grains of boiling rice to know whether the
pot of rice is ready or not.
✓ A doctor examines a few drops of blood and draws conclusion about the blood
constitution of the whole body.
✓ A businessman places orders for material examining only a small sample of the same.
✓ A teacher may put questions to one or two students and find out whether the class as
a whole is following the lesson.
▪ In fact there is hardly any field where the technique of sampling is not used either
consciously or unconsciously.
 It should be noted that a sample is not studied for its own sake.
 The basic objective of its study is to draw inference about the
population.
 In other words, sampling is a tool which helps to know the
characteristics of the universe or population by examining only a
small part of it.
 The values obtained from the study of a sample, such as the average
and dispersion, are known as ‘statistics’.
 On the other hand, such values for the population are called
‘parameters’.
7.1. Basic Concepts (Population, Sample, Parameter, Statistic, Sampling Frame, ...)
Definitions:
A population:
➢ It is the complete set of possible measurements for which inferences are to be
made.
➢ The population represents the target of an investigation, and
➢ The objective of the investigation is to draw conclusions about the population.
➢ Hence, we sometimes call it the target population.
Examples 7.1:
➢ Population of trees under specified climatic conditions
➢ Population of farms having a certain type of natural fertility
➢ Population of households, etc.
❑ The population could be finite or infinite (an imaginary collection of units).
❑ There are two ways of investigating the population: Census and sample survey.
Census:
➢ A complete enumeration of the population. In most real problems it cannot be carried out, hence we take a sample.

Sample:
➢ The set of measurements that are actually collected in the course of an investigation.
➢ It should be selected using some pre-defined sampling technique in such a way that it represents the population well.

Examples 7.2:

 Monthly production data of a certain factory in the past 10 years.

 Small portion of a finite population.


Parameter:
➢ Characteristic or measure obtained from a population.

Statistic:
➢ Characteristic or measure obtained from a sample.

Sampling:
➢ The process or method of sample selection from the population.

Sampling unit:
➢ The ultimate unit to be sampled or elements of the population to be sampled.

Examples 7.3:
 If somebody studies the socio-economic status of households, the household is the
sampling unit.
 If one studies performance of freshman students in some college, the student is the
sampling unit.
Sampling frame:

➢ It is the list of all elements in a population.

Sample size:

➢ The number of elements or observations to be included in the sample.

Errors in sample survey:

There are two types of errors: sampling error and non-sampling error.

a) Sampling Error:

 The error which arises because only a sample is used to estimate a population parameter. It is the discrepancy between the population value and the sample value.

 Sampling error is the difference between an estimate and the true value of the parameter being estimated.

 It may also arise from the use of inappropriate sampling techniques.


b) Non-sampling Errors:
❑ They are errors due to procedural bias, such as:

 Incorrect responses

 Measurement or recording errors

 Interviewer errors: these occur in surveys when an interviewer introduces bias into an interview or when a questionnaire is badly designed.

 Non-response errors: non-response can be due to refusals.

 Errors at different stages of processing the data.


7.2. Reasons for Sampling
 Reduced cost (cost of studying all items in the population is often prohibitive)

 Greater speed

 Greater accuracy

 Greater scope (physical impossibility of checking all items in the population)

 Avoids the destructive nature of certain tests

 Adequacy of sample results

 Saves time

 The only option when the population is infinite

 Because of the above considerations, in practice we take a sample and draw conclusions about population values such as the population mean and the population variance.
 Sometimes, however, taking a census makes more sense than using a sample, for the following reasons:

➢ Universality

➢ Qualitativeness

➢ Detailedness

➢ Non-representativeness
7.3. Different Types of Sampling (Probability vs Non-probability Sampling Techniques)
 There are two types of sampling techniques:

✓ Random sampling (probability sampling), and

✓ Non-random sampling (non-probability sampling).


A) Random Sampling or Probability Sampling

 It is a method of sampling in which all elements in the population have a pre-assigned non-zero probability of being included in the sample.

 Random sampling includes:

➢ Simple random sampling

➢ Stratified random sampling

➢ Cluster sampling

➢ Systematic sampling
❖ Advantages of Probability Sampling

The following are the basic advantages of probability sampling methods:

 Probability sampling does not depend upon the existence of detailed information about the universe for its effectiveness.

 Probability sampling provides estimates which are essentially unbiased and have measurable precision.

 It is possible to evaluate the relative efficiency of various sample designs only when probability sampling is used.
❖ Limitations of Probability Sampling

 Probability sampling requires a very high level of skill and experience for its use.

 It requires a lot of time to plan and execute a probability sample.

 The costs involved in probability sampling are generally large as compared to non-probability sampling.


1. Simple Random Sampling:

 It is a method of selecting items from a population such that every possible sample of a specific size has an equal chance of being selected.

 The sampling may be with or without replacement.

 All elements in the population have the same pre-assigned non-zero probability of being included in the sample.

 Simple random sampling can be done using either the lottery method or a table of random numbers.
I. Lottery Method:
 It is a very popular method of taking a random sample.
 All items of the universe are numbered or named on separate slips of paper of identical size and
shape.
 These slips are then folded and mixed up in a container or drum.
 A blindfold selection is then made of the number of slips required to constitute the desired sample
size.
 The selection of items thus depends entirely on chance.
For Instance,
 If we want to take a sample of 10 persons out of a population of 100, the procedure is to write the
names of the 100 persons on separate slips of paper, fold these slips, mix them thoroughly and then
make a blindfold selection of 10 slips.
 It is very popular in lottery draws where a decision about prizes is to be made.
 However, while adopting the lottery method it is absolutely essential to see that the slips are of identical size, shape and color; otherwise there is a real possibility of personal prejudice and bias affecting the results.
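The lottery draw described above can be imitated in software. The following is a minimal sketch, assuming Python's standard `random` module plays the role of the drum; `lottery_sample` and the seed value are hypothetical names and settings chosen only for illustration.

```python
import random


def lottery_sample(population, n, seed=None):
    """Draw n items without replacement; every subset of size n is equally likely."""
    rng = random.Random(seed)  # a private generator, so the draw is reproducible
    return rng.sample(list(population), n)


# 100 persons labelled 1..100, as in the example above; pick 10 of them.
people = list(range(1, 101))
chosen = lottery_sample(people, 10, seed=42)
print(sorted(chosen))
```

Because the generator treats every label alike, the concern about slips of unequal size or color does not arise here, although the quality of the draw still depends on the random number generator used.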
II. Table of Random Numbers
 A table of random numbers is a table of the digits 0, 1, 2, ..., 9, each digit having an equal chance of being selected at any draw.

 For convenience, the numbers are put in blocks.

 In using these tables to select a simple random sample, the steps are:

Step 1: Number each element of the population; for example, for a population of size 500 we assign the labels 001 to 500.

Step 2: Select a random starting point in the table.

Step 3: Read off only as many digits as the labels have (three digits for labels 001 to 500), skipping numbers outside the label range, and proceed in this fashion until the required number of sample units has been selected.
Note: If sampling is without replacement, reject all numbers that come up more than once.
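The three steps can be sketched in Python as follows. This is only an illustration under stated assumptions: the "table" is a string of digits generated by the `random` module rather than a printed table, and `select_from_digit_table` is a hypothetical helper, not a standard routine.

```python
import random


def select_from_digit_table(digits, population_size, sample_size, start=0, replace=False):
    """Read fixed-width labels from a string of random digits.

    Labels outside 1..population_size are skipped; when sampling without
    replacement, labels that have already been drawn are skipped as well.
    """
    width = len(str(population_size))  # e.g. 500 -> 3 digits per label
    chosen = []
    for pos in range(start, len(digits) - width + 1, width):
        label = int(digits[pos:pos + width])
        if not 1 <= label <= population_size:
            continue  # not a valid label (for example 000 or 731)
        if not replace and label in chosen:
            continue  # repeated number, rejected when sampling without replacement
        chosen.append(label)
        if len(chosen) == sample_size:
            return chosen
    raise ValueError("digit table exhausted before the sample was complete")


# Stand-in for a printed table: 3000 random digits (an assumption of this sketch).
rng = random.Random(7)
table = "".join(rng.choice("0123456789") for _ in range(3000))
sample = select_from_digit_table(table, population_size=500, sample_size=10)
print(sample)
```

Passing a different `start` corresponds to Step 2, the choice of a random starting point in the table.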
2. Stratified Random Sampling:
 The population is divided into non-overlapping but exhaustive groups called strata.

 Simple random samples are chosen from each stratum.

 Elements in the same stratum should be more or less homogeneous, while elements in different strata should differ.

 It is applied if the population is heterogeneous.

 Some of the criteria for dividing a population into strata are:

➢ Sex (male, female);

➢ Age (under 18, 18 to 28, 29 to 39);

➢ Occupation (blue-collar, professional, other).

➢ Economic status (low, middle, high), etc.
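A minimal Python sketch of stratified random sampling follows. The strata, the unit labels and the number of units taken from each stratum are made up for illustration; how many units to take per stratum is left to the caller, since the notes do not fix an allocation rule. `stratified_sample` is a hypothetical helper name.

```python
import random


def stratified_sample(strata, sizes, seed=None):
    """Take a simple random sample of sizes[name] units from each stratum."""
    rng = random.Random(seed)
    return {name: rng.sample(units, sizes[name]) for name, units in strata.items()}


# Hypothetical population of 100 households split by economic status.
strata = {
    "low": [f"L{i}" for i in range(50)],
    "middle": [f"M{i}" for i in range(30)],
    "high": [f"H{i}" for i in range(20)],
}
sizes = {"low": 5, "middle": 3, "high": 2}  # illustrative choice, not a rule from the notes
picked = stratified_sample(strata, sizes, seed=1)
print(picked)
```

Every stratum is guaranteed to appear in the sample, which is the main difference from drawing 10 households from the pooled list.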


3. Cluster Sampling:
 The population is divided into non-overlapping groups called clusters.

 A simple random sample of clusters is chosen, and all the sampling units in the selected clusters are surveyed.

 Clusters are formed in a way that elements within a cluster are heterogeneous, i.e.
observations in each cluster should be more or less dissimilar.

 Cluster sampling is useful when it is difficult or costly to generate a simple random sample.

For example,

 To estimate the average annual household income in a large city we use cluster sampling,
because to use simple random sampling we need a complete list of households in the city
from which to sample. To use stratified random sampling, we would again need the list of
households. A less expensive way is to let each block within the city represent a cluster. A
sample of clusters could then be randomly selected, and every household within these
clusters could be interviewed to find the average annual household income.
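The city-block procedure can be sketched in Python as below. The block names and income figures are invented for illustration, and `cluster_sample` is a hypothetical helper; the point is only that whole clusters are drawn at random and every unit inside a chosen cluster is kept.

```python
import random


def cluster_sample(clusters, n_clusters, seed=None):
    """Pick n_clusters clusters at random and return every unit inside them."""
    rng = random.Random(seed)
    chosen_ids = rng.sample(sorted(clusters), n_clusters)
    units = [unit for cid in chosen_ids for unit in clusters[cid]]
    return chosen_ids, units


# Hypothetical city blocks with annual household incomes (in thousands).
blocks = {
    "block_A": [31, 45, 52],
    "block_B": [28, 60],
    "block_C": [40, 41, 39, 55],
    "block_D": [70, 35, 48],
}
ids, incomes = cluster_sample(blocks, 2, seed=3)
mean_income = sum(incomes) / len(incomes)
print(ids, mean_income)
```

Only a list of blocks is needed to draw the sample, not a list of all households, which is why the approach is cheaper in the example above.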
4. Systematic Sampling:
 A complete list of all elements within the population (sampling frame) is
required.

 The procedure starts by determining the first element to be included in the sample.

 Then the technique is to take every kth item from the sampling frame.
Let
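A minimal sketch of systematic selection in Python. The interval k = N // n (population size over sample size, rounded down) and the random start among the first k elements are common conventions assumed here, not definitions taken from these notes; `systematic_sample` is a hypothetical helper name.

```python
import random


def systematic_sample(frame, n, seed=None):
    """Take every k-th element of the frame after a random start, with k = len(frame) // n."""
    k = len(frame) // n  # sampling interval (assumed convention)
    if k < 1:
        raise ValueError("sample size exceeds the size of the sampling frame")
    start = random.Random(seed).randrange(k)  # random position among the first k elements
    return [frame[start + i * k] for i in range(n)]


frame = list(range(1, 101))  # sampling frame of 100 numbered units
picked = systematic_sample(frame, 10, seed=0)
print(picked)
```

With 100 units and a sample of 10, the interval is 10, so the selected labels are always 10 apart.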
B) Non Random Sampling or Non-probability Sampling.

 It is a sampling technique in which the choice of individuals for a sample depends on convenience, personal choice or interest. It includes:

➢ Judgment sampling

➢ Convenience sampling

➢ Quota Sampling.
1. Judgment Sampling
 In judgment sampling, the person taking the sample has direct or indirect
control over which items are selected for the sample.

2. Convenience Sampling

 The decision maker selects a sample from the population in a manner that
is relatively easy and convenient.

3. Quota Sampling

 The decision maker requires the sample to contain a certain number of items with a given characteristic.

 Many political polls are, in part, quota sampling.


7.4. Sampling Distribution of the Sample Mean and Proportion

❖ Sampling Distribution

 Given a variable X, if we arrange its values in ascending order and assign a probability to each of the values, or if we present the values Xi in the form of a relative frequency distribution, the result is called the sampling distribution of X.
❖ Sampling Distribution of the Sample Mean
 In addition to knowing how individual data values vary about the
mean for a population, researchers are interested in knowing how the
means of samples of the same size taken from the same population
vary about the population mean.
Suppose a researcher selects a sample of 30 adult males and finds the mean of
the measure of the triglyceride levels for the sample subjects to be 187
milligrams/deciliter. Then suppose a second sample is selected, and the mean of
that sample is found to be 192 milligrams/deciliter. Continue the process for 100
samples.
What happens then is that the mean becomes a random variable, and the sample
means 187, 192, 184, ..., 196 constitute a sampling distribution of sample means.
❖ Definition
A sampling distribution of sample means is a probability distribution using the
means computed from all possible random samples of a specific size n taken from
a population.
 It is a theoretical probability distribution that shows the functional relationship between the possible values of a given sample mean based on samples of size n and the probability associated with each value, for all possible samples of size n drawn from that particular population.

 If the samples are randomly selected with replacement, the sample means, for the most part, will be somewhat different from the population mean $\mu$. These differences are caused by sampling error.
 When all possible samples of a specific size n are selected with replacement from a population, the distribution of the sample means for a variable has two important properties:
I. The mean of the sample means will be the same as the population mean.
II. The standard deviation of the sample means will be smaller than the standard deviation of the population, and it will be equal to the population standard deviation divided by the square root of the sample size.
 The following example illustrates these two properties. Suppose a
professor gave an 8-point quiz to a small class of four students. The results
of the quiz were 2, 6, 4, and 8. For the sake of discussion, assume that the
four students constitute the population. Hence the mean of the population
will be:

$$\mu = \frac{2 + 6 + 4 + 8}{4} = 5$$
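And the population standard deviation, with N = 4 values in the population, is:

$$\sigma = \sqrt{\frac{\sum_{i=1}^{N}(x_i - \mu)^2}{N}} = \sqrt{\frac{(2-5)^2 + (6-5)^2 + (4-5)^2 + (8-5)^2}{4}} = \sqrt{5} \approx 2.236$$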
Now, if all samples of size 2 are taken with replacement and the mean of each sample is
found, the distribution is as shown below.
Sample   Mean     Sample   Mean
2, 2     2        6, 2     4
2, 4     3        6, 4     5
2, 6     4        6, 6     6
2, 8     5        6, 8     7
4, 2     3        8, 2     5
4, 4     4        8, 4     6
4, 6     5        8, 6     7
4, 8     6        8, 8     8
A frequency distribution of sample means is as follows.

Sample mean (x̄)   Frequency (f)
2                 1
3                 2
4                 3
5                 4
6                 3
7                 2
8                 1
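This implies that the mean of the sample means, $\mu_{\bar{x}}$, will be:

$$\mu_{\bar{x}} = \frac{\sum_{i} \bar{x}_i f_i}{\sum_{i} f_i} = \frac{2 \cdot 1 + 3 \cdot 2 + \cdots + 8 \cdot 1}{16} = \frac{80}{16} = 5 = \mu$$

That is, $\mu_{\bar{x}} = \mu$: the mean of the sample means equals the population mean.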
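- The standard deviation of the sample means, denoted by $\sigma_{\bar{x}}$, is

$$\sigma_{\bar{x}} = \sqrt{\frac{\sum_{i} f_i (\bar{x}_i - \mu_{\bar{x}})^2}{\sum_{i} f_i}} = \sqrt{\frac{(2-5)^2 + (3-5)^2 + \cdots + (8-5)^2}{16}} = \sqrt{\frac{40}{16}} \approx 1.581$$

where the sum in the numerator runs over all 16 sample means. This is the same as the population standard deviation divided by $\sqrt{2}$:

$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{2.236}{\sqrt{2}} \approx 1.581$$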
Example 7.4
In summary, if all possible samples of size n are taken with replacement from the
same population, the mean of the sample means, denoted by $\mu_{\bar{x}}$, equals the
population mean $\mu$; and the standard deviation of the sample means, denoted
by $\sigma_{\bar{x}}$, equals $\sigma/\sqrt{n}$. The standard deviation of the sample means is called the
standard error of the mean.
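The two properties can be checked by brute force for the quiz scores 2, 6, 4 and 8. The sketch below uses only the Python standard library and enumerates all 16 ordered samples of size 2 drawn with replacement; the variable names are chosen here for illustration.

```python
from itertools import product
from math import isclose, sqrt
from statistics import mean, pstdev

population = [2, 6, 4, 8]  # quiz scores, treated as the whole population
n = 2                      # sample size

# All ordered samples of size n drawn with replacement, and their means.
sample_means = [mean(sample) for sample in product(population, repeat=n)]

mu = mean(population)              # population mean
sigma = pstdev(population)         # population standard deviation (divides by N)
mu_xbar = mean(sample_means)       # mean of the sample means
sigma_xbar = pstdev(sample_means)  # standard deviation of the sample means

print(len(sample_means), mu, mu_xbar)
print(round(sigma, 3), round(sigma_xbar, 3), round(sigma / sqrt(n), 3))
```

`pstdev` is used rather than `stdev` because both the four scores and the 16 sample means are treated as complete populations, matching the divisor used in the formulas of this section.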
- A third property of the sampling distribution of sample means pertains to the
shape of the distribution and is explained by the central limit theorem.
The Central Limit Theorem
As the sample size n increases without limit, the shape of the distribution of the
sample means taken with replacement from a population with mean $\mu$ and
standard deviation $\sigma$ will approach a normal distribution. As previously shown,
this distribution will have a mean $\mu$ and a standard deviation $\sigma/\sqrt{n}$.
- If the sample size is sufficiently large, the central limit theorem can be used
to answer questions about sample means in the same manner that a normal
distribution can be used to answer questions about individual values. The only
difference here is that a new formula must be used for the z values. It is:

$$z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}$$
- If a large number of samples of a given size are selected from a normally
distributed population, or if a large number of samples of a given size that is
greater than or equal to 30 are selected from a population that is not normally
distributed, and the sample means are computed, then the distribution of
sample means will look like the normal distribution.
- It’s important to remember two things when you use the central
limit theorem:
1. When the original variable is normally distributed, the
distribution of the sample means will be normally distributed, for
any sample size n.
2. When the distribution of the original variable is not normal, a
sample size of 30 or more is needed to use a normal distribution
to approximate the distribution of the sample means. The larger
the sample, the better the approximation will be.
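The second point can be illustrated with a small simulation. The sketch below is an assumption-laden illustration, not part of the original notes: it draws samples of size 30 from an exponential population with mean 1 and standard deviation 1 (a clearly non-normal, right-skewed choice made here), and the seed and number of repetitions are arbitrary.

```python
import random
from math import sqrt

rng = random.Random(0)  # fixed seed so the run is reproducible
n, reps = 30, 5000      # sample size and number of repeated samples
mu, sigma = 1.0, 1.0    # mean and standard deviation of an exponential(1) population

# Mean of each of the repeated samples.
sample_means = [sum(rng.expovariate(1.0) for _ in range(n)) / n for _ in range(reps)]

grand_mean = sum(sample_means) / reps
sd_of_means = sqrt(sum((m - grand_mean) ** 2 for m in sample_means) / reps)
standard_error = sigma / sqrt(n)

# Under a normal approximation about 95% of sample means fall within 1.96 standard errors.
share_within = sum(abs(m - mu) <= 1.96 * standard_error for m in sample_means) / reps

print(grand_mean, sd_of_means, standard_error, share_within)
```

If the approximation is reasonable, the printed mean of the sample means should be close to 1, their standard deviation close to $\sigma/\sqrt{n}$, and the share within 1.96 standard errors close to 0.95.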
- The following examples show you how the standard normal
distribution can be used to answer questions about sample
means.
Example 7.5
A. C. Nielsen, a research group, reported that children between the ages of
2 and 5 watch an average of 25 hours of television per week. Assume the
variable is normally distributed and the standard deviation is 3 hours. If 20
children between the ages of 2 and 5 are randomly selected, find the
probability that the mean of the number of hours they watch television
will be greater than 26.3 hours.
Solution
Since the variable is approximately normally distributed, the distribution of
sample means will be approximately normal, with a mean of 25. The
standard deviation of the sample means is

$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{3}{\sqrt{20}} \approx 0.671$$
Step 1: Draw a normal curve and shade the desired area. The distribution of
the means is shown in the Figure below, with the appropriate area shaded.

Step 2: Convert the value to a z value. The z value is

$$z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} = \frac{26.3 - 25}{3/\sqrt{20}} = \frac{1.3}{0.671} \approx 1.94$$
Step 3: Find the corresponding area for the z value. The area to the
right of 1.94 is 1.000 - 0.9738 = 0.0262 or 2.62%

Step 4: Conclusion
One can conclude that the probability of obtaining a sample mean
larger than 26.3 hours is 2.62% [that is, $P(\bar{x} > 26.3) = 0.0262$].
Specifically, the probability that the 20 children selected between the
ages of 2 and 5 watch, on average, more than 26.3 hours of television per week is
2.62%.
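The arithmetic of Example 7.5 can be reproduced with `statistics.NormalDist` from the Python standard library in place of a printed z table. This is a sketch for checking the numbers; the variable names are chosen here for illustration.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 25, 3, 20   # population mean, population sd, sample size
x_bar = 26.3               # sample mean of interest

standard_error = sigma / sqrt(n)   # sd of the sample means
z = (x_bar - mu) / standard_error  # z value for the sample mean

# Area to the right of z under the standard normal curve.
p_exact = 1 - NormalDist().cdf(z)
# Same area after rounding z to two decimals, as a printed table requires.
p_table = 1 - NormalDist().cdf(round(z, 2))

print(round(standard_error, 3), round(z, 2), round(p_table, 4), round(p_exact, 4))
```

The table-based value and the unrounded value differ only slightly, because the table forces z to be rounded to two decimal places.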
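Solution
Let $x$ be the amount of uric acid in normal adult males, with
$\mu = 5.7$, $\sigma = 1$ and $n = 9$. Then

$$\bar{x} \sim N\left(\mu, \frac{\sigma^2}{n}\right) = N\left(5.7, \frac{1}{9}\right)$$

$$\Rightarrow Z = \frac{\bar{x} - \mu}{\mathrm{sd}(\bar{x})} = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} \sim N(0, 1)$$

Then,
i. $P(\bar{x} > 6) = ?$

$$P(\bar{x} > 6) = P\left(\frac{\bar{x} - \mu}{\sigma/\sqrt{n}} > \frac{6 - 5.7}{1/\sqrt{9}}\right) = P\left(z > \frac{0.3}{0.3333}\right) = P(z > 0.90) = 1 - P(z < 0.90) = 1 - 0.8159 = 0.1841$$

ii. $P(5 < \bar{x} < 6) = ?$

$$P(5 < \bar{x} < 6) = P\left(\frac{5 - 5.7}{1/\sqrt{9}} < \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} < \frac{6 - 5.7}{1/\sqrt{9}}\right) = P\left(\frac{-0.7}{0.3333} < z < \frac{0.3}{0.3333}\right)$$

$$= P(-2.10 < z < 0.90) = P(z < 0.90) - P(z < -2.10) = 0.8159 - 0.01786 = 0.7980$$

iii. $P(\bar{x} < 5.2) = ?$

$$P(\bar{x} < 5.2) = P\left(\frac{\bar{x} - \mu}{\sigma/\sqrt{n}} < \frac{5.2 - 5.7}{1/\sqrt{9}}\right) = P\left(z < \frac{-0.5}{0.3333}\right) = P(z < -1.50) = 0.0668$$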
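The three probabilities in this solution can be checked numerically. The sketch below assumes, as the solution does, that the sample mean of n = 9 observations is normal with mean 5.7 and standard deviation 1/√9; it uses `statistics.NormalDist` from the Python standard library, and the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 5.7, 1, 9

# Distribution of the sample mean: normal with mean mu and sd sigma / sqrt(n).
xbar_dist = NormalDist(mu, sigma / sqrt(n))

p_above_6 = 1 - xbar_dist.cdf(6)                 # part i
p_between = xbar_dist.cdf(6) - xbar_dist.cdf(5)  # part ii
p_below_52 = xbar_dist.cdf(5.2)                  # part iii

print(round(p_above_6, 4), round(p_between, 4), round(p_below_52, 4))
```

Working directly with the distribution of the sample mean avoids the separate standardization step, but it is the same calculation as converting to z values first.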
End of Chapter 7
