Handouts STA632
Sampling Techniques
Virtual University of Pakistan
STA632-Sampling Techniques
BASIC CONCEPTS
Q. Define Sampling Techniques?
Ans: Data are essential for decision making, so an appropriate method of data collection is required. Sampling techniques address two questions: how best to obtain the data (the sample), and how best to use the sample to estimate the characteristics of the whole population. Accordingly, any sampling strategy has two parts: the selection procedure and the estimation procedure.
Q. Define sampling?
Ans: Sampling is the selection of a part of the population in order to study the whole. The method of selecting the sample is called the sampling design.
Q. Define sample?
Ans: A perfect sample would be a scaled-down version of the population, mirroring every characteristic of the population (Lohr, 2nd edition). A good sample is a small but representative part of the population.
Q . What is difference between Observation and Sampling Units?
Ans: A unit that can be selected for a sample is called sampling unit. An object on which a
measurement is taken is called observation unit.
Q . Define sampling frame?
Ans: The sampling frame is the list or map from which the potential sampling units are drawn, for example a list of all the classrooms or a map of an area containing farms.
Q. Distinguish between parameter and statistic?
Ans: A parameter is any summary number describing the whole population. A statistic is any summary number obtained from a sample.
[Figure: the population (entire class) is summarised by a parameter; the sample (subset of class) is summarised by a statistic.]
Example: P(White) = (number of white balls) / (total number of balls) = 1/2
Q. Define randomness?
Ans: Randomness means that every unit has some chance of being selected in the sample, and that we cannot be certain in advance about the selection of any unit.
Q . How many types of probability sampling?
Ans: Simple random sampling, stratified sampling, cluster sampling, systematic sampling,
multistage /multiphase sampling.
Q. Define Non-probability sampling?
Ans: No random process is followed for the selection of units. It is assumed that the population is evenly distributed with respect to the characteristics of interest.
Q . What is convenience sampling?
Ans: Respondents are selected according to convenience of researcher. This is also called
accidental sampling, opportunity sampling or grab sampling.
Q. Define purposive sampling?
Ans: The units that are most appropriate for meeting the objective of the study are selected. The selection is based on the judgment of the researcher.
Q. What is quota sampling?
Ans: The population is divided into subgroups on the basis of similar characteristics. The subgroups are called “quotas”. A nonrandom selection is made within each quota.
Q. Define snow-ball sampling?
Ans: This is also called chain sampling because it works like a chain: each selected subject is asked for assistance in reaching other subjects.
Q. Define simple random sampling?
Ans: Each member of the population has an equal probability of being included in the sample.
Each sample of size ‘n’ has an equal probability of being selected in the sample.
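Q. What is the difference between sampling with replacement and without replacement?
Ans: Sampling With Replacement: If the unit is replaced before the selection of the next unit, then sampling is with replacement (WR).
$$\text{Total samples} = N^n, \qquad P(S) = \frac{1}{N^n}$$
Sampling Without Replacement: If the unit is not replaced before the selection of the next unit, then sampling is without replacement (WOR).
$$\text{Total samples} = {}^{N}C_n = \frac{N!}{n!\,(N-n)!}, \qquad P(S) = \frac{1}{{}^{N}C_n}$$
For example, with N = 3 and n = 2 there are ${}^{3}C_2 = 3$ possible samples, so
$$P(S_1) = P(S_2) = P(S_3) = \frac{1}{{}^{3}C_2} = \frac{1}{3}$$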
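The counting rules for WR and WOR sampling can be checked in R. The following is a minimal sketch, assuming a hypothetical toy population `Y` of three units; the object names are illustrative and not part of the handout.

```r
# Hypothetical toy population (values chosen only for illustration)
Y <- c(10, 20, 30)
N <- length(Y)
n <- 2

# With replacement: N^n ordered samples, each with probability 1 / N^n
n_wr <- N^n
p_wr <- 1 / n_wr

# Without replacement: choose(N, n) unordered samples, each equally likely
n_wor <- choose(N, n)
p_wor <- 1 / n_wor

# combn() lists every WOR sample; each column is one sample
samples_wor <- combn(Y, n)
```

Printing `samples_wor` shows the three possible samples of size 2 as columns.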
Q . Prove that sample mean is unbiased estimator for population mean under simple
random sampling with/without replacement.
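Ans: Unbiased Estimator: An estimator is unbiased if its expected value is equal to the population parameter,
$$E(\text{estimator}) = \text{Parameter}$$
The expectation of a random variable is defined as the sum of the products of its values and their probabilities,
$$E(y_i) = \sum_{i=1}^{N} p_i Y_i$$
Theorem: In simple random sampling with replacement and without replacement, the sample mean $\bar{y}$ is an unbiased estimator of the population mean $\bar{Y}$.
Under SRSWR: $\bar{y}$ is an unbiased estimate of $\bar{Y}$ when $E(\bar{y}) = \bar{Y}$. Here n = sample size, N = population size and $y_i$ = value of the variable observed on the ith sample unit.
$$E(\bar{y}) = E\left(\frac{1}{n}\sum_{i=1}^{n} y_i\right) = \frac{1}{n}\sum_{i=1}^{n} E(y_i)$$
Each draw selects any population unit with probability $p_i = 1/N$, so
$$E(y_i) = \sum_{i=1}^{N} p_i Y_i = \sum_{i=1}^{N} \frac{1}{N} Y_i = \frac{1}{N}\sum_{i=1}^{N} Y_i = \bar{Y}$$
Therefore
$$E(\bar{y}) = \frac{1}{n}\, n\, \bar{Y} = \bar{Y}$$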
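Under SRSWOR: Since there are $m = {}^{N}C_n$ possible distinct samples for sampling without replacement, each with probability 1/m,
$$E(\bar{y}) = \frac{1}{{}^{N}C_n}\sum_{k=1}^{m}\left(\frac{1}{n}\sum_{i=1}^{n} y_{ki}\right)$$
where $y_{ki}$ is the ith observation in the kth sample. Now, in the ${}^{N}C_n$ possible samples, each unit appears ${}^{N-1}C_{n-1}$ times, so
$$E(\bar{y}) = \frac{1}{n\,{}^{N}C_n}\sum_{k=1}^{m}\sum_{i=1}^{n} y_{ki} = \frac{{}^{N-1}C_{n-1}}{n\,{}^{N}C_n}\sum_{i=1}^{N} Y_i = \frac{1}{N}\sum_{i=1}^{N} Y_i = \bar{Y}$$
since ${}^{N-1}C_{n-1} / {}^{N}C_n = n/N$.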
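Unbiasedness can also be verified by brute force on a small population. The sketch below assumes a hypothetical population `Y` of five values; it enumerates every possible sample under both schemes and averages the sample means.

```r
# Hypothetical small population (illustrative values only)
Y <- c(3, 7, 8, 12, 15)
N <- length(Y)
n <- 2

# SRSWOR: the mean of every one of the choose(N, n) equally likely samples
means_wor <- combn(Y, n, FUN = mean)

# SRSWR: all N^n ordered index tuples, each equally likely
idx <- expand.grid(rep(list(seq_len(N)), n))
means_wr <- apply(idx, 1, function(i) mean(Y[i]))

# Expected value of the sample mean = average over all equally likely samples
E_wor <- mean(means_wor)
E_wr  <- mean(means_wr)
pop_mean <- mean(Y)
```

Both `E_wor` and `E_wr` coincide with `pop_mean`, as the theorem states.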
Q . Derive the Variance of Sample Mean under Simple Random Sampling (SRS) with
replacement and without replacement?
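Ans: Variance of Sample Mean Under SRSWOR:
The variance of the sample mean for simple random sampling without replacement, $\bar{y}_{wor}$, is
$$\mathrm{Var}(\bar{y}_{wor}) = \frac{N-n}{N}\,\frac{S^2}{n} = (1-f)\,\frac{S^2}{n}, \qquad \text{where } f = \frac{n}{N}$$
Proof:
We know that
$$\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i$$
The variance of this estimator is
$$\mathrm{Var}(\bar{y}) = \frac{1}{n^2}\,\mathrm{Var}\left(\sum_{i=1}^{n} y_i\right)$$
Since the variance of a sum of random variables is equal to the sum of the variances plus the sum of the covariances,
$$\mathrm{Var}(\bar{y}) = \frac{1}{n^2}\left[\sum_{i=1}^{n}\mathrm{Var}(y_i) + \sum_{i \ne j}^{n}\mathrm{Cov}(y_i, y_j)\right]$$
For a single draw,
$$\mathrm{Var}(y_i) = E(y_i - \bar{Y})^2 = E(y_i^2) - \bar{Y}^2 = \frac{1}{N}\sum_{i=1}^{N} Y_i^2 - \bar{Y}^2 = \frac{1}{N}\left[\sum_{i=1}^{N} Y_i^2 - N\bar{Y}^2\right] = \frac{1}{N}\sum_{i=1}^{N}(Y_i - \bar{Y})^2 = \frac{N-1}{N} S^2$$
For two different draws,
$$\mathrm{Cov}(y_i, y_j) = E(y_i y_j) - \bar{Y}^2 = \frac{1}{N(N-1)}\sum_{i \ne j}^{N} Y_i Y_j - \bar{Y}^2 = \frac{1}{N(N-1)}\left[\left(\sum_{i=1}^{N} Y_i\right)^2 - \sum_{i=1}^{N} Y_i^2\right] - \bar{Y}^2$$
$$\mathrm{Cov}(y_i, y_j) = -\frac{1}{N(N-1)}\left[\sum_{i=1}^{N} Y_i^2 - \frac{\left(\sum_{i=1}^{N} Y_i\right)^2}{N}\right] = -\frac{S^2}{N}$$
Substituting both results,
$$\mathrm{Var}(\bar{y}) = \frac{1}{n^2}\left[\sum_{i=1}^{n}\frac{N-1}{N} S^2 - \sum_{i \ne j}^{n}\frac{S^2}{N}\right] = \frac{1}{n^2}\left[n\,\frac{N-1}{N} S^2 - n(n-1)\frac{S^2}{N}\right] = \frac{N-n}{N}\,\frac{S^2}{n}$$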
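Variance of Sample Mean Under SRSWR:
$$\mathrm{Var}(\bar{y}_{wr}) = \frac{N-1}{N}\,\frac{S^2}{n} = \left(1 - \frac{1}{N}\right)\frac{S^2}{n}$$
Proof: The variance of a sum of random variables is equal to the sum of the variances plus the sum of the covariances. Under sampling with replacement the draws are independent, so the covariances are zero and
$$\mathrm{Var}(\bar{y}) = \frac{1}{n^2}\sum_{i=1}^{n}\mathrm{Var}(y_i)$$
$$\mathrm{Var}(y_i) = E(y_i - \bar{Y})^2 = E(y_i^2) - \bar{Y}^2 = \frac{1}{N}\sum_{i=1}^{N} Y_i^2 - \bar{Y}^2 = \frac{N-1}{N} S^2$$
$$\mathrm{Var}(\bar{y}) = \frac{1}{n^2}\sum_{i=1}^{n}\frac{N-1}{N} S^2 = \frac{N-1}{Nn}\, S^2$$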
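Both variance formulas can be confirmed exactly by enumerating all samples from a small population. This is a sketch under the assumption of a hypothetical five-unit population `Y`; `var(Y)` in R uses the divisor N - 1 and therefore equals S².

```r
# Hypothetical small population (illustrative values only)
Y <- c(3, 7, 8, 12, 15)
N <- length(Y)
n <- 2
S2 <- var(Y)                      # S^2 with divisor N - 1

# SRSWOR: exact variance of the sample mean over all choose(N, n) samples
means_wor <- combn(Y, n, FUN = mean)
V_wor_enum    <- mean((means_wor - mean(Y))^2)
V_wor_formula <- (N - n) / (N * n) * S2

# SRSWR: exact variance over all N^n ordered samples
idx <- expand.grid(rep(list(seq_len(N)), n))
means_wr <- apply(idx, 1, function(i) mean(Y[i]))
V_wr_enum    <- mean((means_wr - mean(Y))^2)
V_wr_formula <- (N - 1) / (N * n) * S2
```

For this population the enumerated and formula values agree (6.45 without replacement, 8.6 with replacement), and the WOR variance is the smaller of the two.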
Q. Derive unbiased estimator of variance under SRS with replacement and without
replacement.
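Ans: For simple random sampling without replacement, $s^2$ is an unbiased estimator of $S^2$, and for simple random sampling with replacement $s^2$ is an unbiased estimator of $S^2 (N-1)/N$.
Proof: For both simple random sampling without replacement and simple random sampling with replacement, we have
$$(n-1)s^2 = \sum_{i=1}^{n}(y_i - \bar{y})^2 = \sum_{i=1}^{n}\left[(y_i - \bar{Y}) - (\bar{y} - \bar{Y})\right]^2$$
$$= \sum_{i=1}^{n}(y_i - \bar{Y})^2 + \sum_{i=1}^{n}(\bar{y} - \bar{Y})^2 - 2\sum_{i=1}^{n}(y_i - \bar{Y})(\bar{y} - \bar{Y})$$
$$= \sum_{i=1}^{n}(y_i - \bar{Y})^2 + n(\bar{y} - \bar{Y})^2 - 2(\bar{y} - \bar{Y})\sum_{i=1}^{n}(y_i - \bar{Y})$$
$$= \sum_{i=1}^{n}(y_i - \bar{Y})^2 + n(\bar{y} - \bar{Y})^2 - 2n(\bar{y} - \bar{Y})^2$$
$$= \sum_{i=1}^{n}(y_i - \bar{Y})^2 - n(\bar{y} - \bar{Y})^2$$
Taking expectation,
$$(n-1)E(s^2) = n\,E(y_i - \bar{Y})^2 - n\,E(\bar{y} - \bar{Y})^2$$
As we know that $E(y_i - \bar{Y})^2 = \frac{N-1}{N} S^2$ and $E(\bar{y} - \bar{Y})^2 = \mathrm{Var}(\bar{y})$, therefore for SRSWOR
$$E(s^2) = \frac{n}{n-1}\left[\frac{N-1}{N} S^2 - \frac{N-n}{Nn} S^2\right] = S^2$$
and for SRSWR
$$E(s^2) = \frac{n}{n-1}\left[\frac{N-1}{N} S^2 - \frac{N-1}{Nn} S^2\right] = \frac{N-1}{N} S^2$$
Standard error
$$S.E.(\bar{y}) = \sqrt{\mathrm{Var}(\bar{y})}$$
$$S.E.(\bar{y}) = \frac{S}{\sqrt{n}}\sqrt{1-f} \quad \text{for srswor}$$
$$S.E.(\bar{y}) = \frac{S}{\sqrt{n}}\sqrt{1 - \frac{1}{N}} \quad \text{for srswr}$$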
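The expectation of the sample variance under both schemes can be checked by enumeration as well. A minimal sketch, again assuming a hypothetical five-unit population `Y`:

```r
# Hypothetical small population (illustrative values only)
Y <- c(3, 7, 8, 12, 15)
N <- length(Y)
n <- 2
S2 <- var(Y)

# SRSWOR: average of s^2 over all choose(N, n) samples
E_s2_wor <- mean(combn(Y, n, FUN = var))

# SRSWR: average of s^2 over all N^n ordered samples
idx <- expand.grid(rep(list(seq_len(N)), n))
E_s2_wr <- mean(apply(idx, 1, function(i) var(Y[i])))

target_wr <- (N - 1) / N * S2
```

Here `E_s2_wor` equals `S2`, while `E_s2_wr` equals `target_wr`, the slightly smaller quantity (N - 1)S²/N.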
Q . What is R?
Ans: R is a language and environment for statistical computing and graphics, closely related to the S language and the S-Plus environment. It was initially written by Ross Ihaka and Robert Gentleman at the Department of Statistics of the University of Auckland.
Downloading R: https://cran.r-project.org/
R Commands
y <- c(1,2,3,4,5)
# indexing (R is case-sensitive, so the vector is y, not Y)
y[2]
# mean
mean(y)
# standard deviation
sd(y)
# variance
var(y)
Sampling without replacement (SRSWOR)
yp <- c(11,150, 121, 198, 12, 136, 14, 129, 17, 115, 186, 110, 121, 15, 14)
# population size
N <- length(yp)
# sample of size 5 (sample() draws without replacement by default)
ys <- sample(yp,5)
# sample size
n <- length(ys)
# mean of the sample
mys <- mean(ys)
# variance of the sample
vys <- var(ys)
# variance of ybar (uses the population variance var(yp))
vybar <- (1-n/N)*var(yp)/n
# standard error
sdr <- sqrt(vybar)
# estimate of population total
ept <- N*mys
# variance of the estimated total
vept <- N^2*vybar
# standard error of the estimated total
sdept <- sqrt(vept)
Sampling with replacement (SRSWR)
yp <- c(11,150, 121, 198, 12, 136, 14, 129, 17, 115, 186, 110, 121, 15, 14)
# sample of size 5
ys <- sample(yp,5, replace=TRUE)
Q. Define the Simulation Study for Sampling Strategy?
Ans: A simulation study is useful for evaluating a sampling strategy. We generate a population reflecting the specific situation of interest, draw a sample of size ‘n’ from it ‘k’ times, and compute the estimator from each sample. The variance of the ‘k’ estimates is then calculated to examine the efficiency of the strategy.
Consider a population consisting of
111, 150, 121, 198, 112, 136, 114, 129, 117, 115, 186, 110, 121, 115, 114
Enter population data in R
yp <- c(111,150, 121, 198, 112, 136, 114, 129, 117, 115, 186, 110, 121, 115, 114)
Taking sample by SRSWOR in R ys <- sample(yp,5)
mean(ys)
Now we will take ‘k’ samples in R.
R program for k Samples
Suppose, k=10, n=5.
The ‘for loop’ is used to repeat the statements.
m1<-c();
yp <- c(111,150, 121, 198, 112, 136, 114, 129, 117, 115, 186, 110,
121, 115, 114)
for (i in 1:10){
s <- sample(yp,5)
m1[i] <- mean(s)
}
Output
Sample means with 10 repetitions
149.8 129.2 140.8 132.4 118.2 117.6 118.4 118.0 132.6 132.6
k Samples
Suppose, k=10000, n=5.
The ‘for loop’ is used to repeat the statements.
m1<-c();
yp <- c(111,150, 121, 198, 112, 136, 114, 129, 117, 115, 186, 110, 121, 115, 114)
for (i in 1:10000){
s <- sample(yp,5)
m1[i] <- mean(s)
}
var(m1)
Output
On first run, the result was 101.7741
On second run, the result was 102.1403
On third run, the result was 100.2455
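The simulated values near 101 can be compared with the theoretical SRSWOR variance (1 - n/N)S²/n. The sketch below is one way to do this; the seed value and the use of `replicate()` in place of the `for` loop are choices made here for reproducibility and brevity, not part of the handout.

```r
set.seed(632)   # arbitrary seed, only so that the run is reproducible

yp <- c(111, 150, 121, 198, 112, 136, 114, 129, 117, 115,
        186, 110, 121, 115, 114)
N <- length(yp)
n <- 5
k <- 10000

# k sample means under SRSWOR (equivalent to the for loop above)
m1 <- replicate(k, mean(sample(yp, n)))
v_sim <- var(m1)

# Theoretical variance of the sample mean under SRSWOR
v_theory <- (1 - n / N) * var(yp) / n
```

For this population `v_theory` is about 100.71, which is the value the simulated variances fluctuate around.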
Q . Obtain the 10000 random sample of size 6 under SRSWOR using the following
Population and find the mean.
111, 150, 121, 198, 112, 136, 114, 129, 117, 115, 186, 110, 121, 115, 114
Ans:
Given population is
111, 150, 121, 198, 112, 136, 114, 129, 117, 115, 186, 110, 121, 115, 114
Enter population data in R
yp <- c(111,150, 121, 198, 112, 136, 114, 129, 117, 115, 186, 110, 121, 115, 114)
Here, k=10000, n=6.
The ‘for loop’ is used to repeat the statements.
m1<-c();
yp <- c(111,150, 121, 198, 112, 136, 114, 129, 117, 115, 186, 110, 121, 115, 114)
for (i in 1:10000){
s <- sample(yp,6)
m1[i] <- mean(s)
}
var(m1)
Output
On first run, the result was 75.4034
On second run, the result was 75.9143
On third run, the result was 75.6871
Q . Obtain 1000 random number through normal distribution with mean 0 and variance 1
as population. Obtain the 10000 random sample of size 6 under SRSWOR using the
population and find the mean.
Ans: Generate 1000 values with mean=0 and standard deviation=1 using
rnorm(n,mean,sd)
yp <- rnorm(1000,0,1)
Here k=10000 and n=5 (the code and the outputs below correspond to samples of size 5).
m1<-c();
yp <- rnorm(1000,0,1)
for (i in 1:10000){
s <- sample(yp,5)
m1[i] <- mean(s)
}
var(m1)
Output
On first run, the result was 0.2044265
On second run, the result was 0.1914794
On third run, the result was 0.198996
Q . Explain the Estimation of Sample Size for Mean Estimation by using an example?
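Ans: Sample Size Estimation
Let ‘d’ be the margin of error, i.e. the maximum allowable difference between the estimate and the true population value, and let α be the small probability with which this margin may be exceeded. The probability of the error being less than d is given by
$$P(|\bar{y} - \bar{Y}| \le d) = 1 - \alpha$$
Alternatively, the sample size can be chosen by minimising a loss function that combines the cost of sampling with the squared error of the estimator. Here $C_0$ and $C_1$ are taken to be the fixed and per-unit cost terms, and d now denotes the weight attached to the squared error:
$$C_0 + C_1 E(n) + d\,E(\bar{y} - \bar{Y})^2$$
$$L(n) = C_0 + C_1 n + d\,\mathrm{Var}(\bar{y})$$
$$L(n) = C_0 + C_1 n + d\,\frac{S^2}{n}$$
Differentiating with respect to n and equating to zero, $C_1 - dS^2/n^2 = 0$, produces the optimum value
$$n = \sqrt{dS^2 / C_1}$$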
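The closed-form optimum can be compared with a numerical minimisation of the loss function. This is only a sketch: the values of `C0`, `C1`, `d` and `S2` below are hypothetical and were chosen so that the optimum is a round number.

```r
# Hypothetical cost and variance settings (illustration only)
C0 <- 100    # fixed cost
C1 <- 2      # cost per sampled unit
d  <- 50     # weight attached to the variance term
S2 <- 16     # population variance

# Loss function L(n) = C0 + C1 * n + d * S2 / n
loss <- function(n) C0 + C1 * n + d * S2 / n

# Closed-form optimum from setting the derivative to zero
n_formula <- sqrt(d * S2 / C1)

# Numerical minimiser over a wide interval, for comparison
n_numeric <- optimize(loss, interval = c(1, 1000))$minimum
```

With these settings `n_formula` is 20, and `optimize()` returns essentially the same value.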
Q . Estimation of sample size for estimation of proportion?
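Ans: Suppose we have N population units, i.e. Y1, Y2, …, Yi, …, YN, where
Yi = 1 if the ith unit possesses a certain attribute and 0 otherwise.
The population proportion is defined as
$$\bar{Y} = \sum_{i=1}^{N} Y_i / N = A/N = P$$
The sample proportion is
$$\bar{y} = \sum_{i=1}^{n} y_i / n = a/n = p$$
Because each $Y_i$ is 0 or 1, $Y_i^2 = Y_i$ and
$$\sum_{i=1}^{N} Y_i^2 = A = NP$$
The same is the case for the sample:
$$\sum_{i=1}^{n} y_i^2 = a = np$$
$$\sum_{i=1}^{N}(Y_i - \bar{Y})^2 = \sum_{i=1}^{N} Y_i^2 - N\bar{Y}^2 = NP - NP^2 = NPQ, \qquad Q = 1 - P$$
$$S^2 = \frac{\sum_{i=1}^{N}(Y_i - \bar{Y})^2}{N-1} = \frac{N}{N-1}P(1-P) = \frac{NPQ}{N-1}$$
Similarly $s^2 = npq/(n-1)$.
The estimated variances of the sample proportion are
$$\text{For SRSWOR:} \quad \mathrm{var}(p_{wor}) = \frac{N-n}{N}\,\frac{pq}{n-1}$$
$$\text{For SRSWR:} \quad \mathrm{var}(p_{wr}) = \frac{N-1}{N}\,\frac{pq}{n-1}$$
The variance of the sample proportion follows from the variance of the sample mean. For SRSWOR,
$$\mathrm{VAR}(\bar{y}_{wor}) = \frac{N-n}{Nn}\,S^2$$
$$\mathrm{VAR}(p_{wor}) = \frac{N-n}{Nn}\,\frac{NPQ}{N-1} = \frac{N-n}{N-1}\,\frac{PQ}{n}$$
Standard Error of Proportion
$$S.E.(p_{wor}) = \sqrt{\frac{N-n}{N-1}\,\frac{PQ}{n}}$$
For a large sample size,
$$\mathrm{var}(p) = pq/n$$
and the standard error is
$$S.E.(p) = \sqrt{pq/n}$$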
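Sample Size Estimation
$$t = \frac{p - P}{S.E.(p)} = \frac{d}{S.E.(p)}, \qquad d = t \times S.E.(p)$$
The resulting sample size is
$$n = \frac{t^2 PQ / d^2}{1 + \frac{1}{N}\left(\frac{t^2 PQ}{d^2} - 1\right)}$$
and, when the sample size is large,
$$n = \frac{t^2 pq}{d^2}$$
Derivation:
$$d = t \times S.E.(p)$$
$$d^2 = t^2\,\mathrm{VAR}(p)$$
$$d^2 = t^2\,\frac{N-n}{N-1}\,\frac{PQ}{n}$$
$$d^2 = \frac{t^2 NPQ}{n(N-1)} - \frac{t^2 PQ}{N-1}$$
$$\frac{t^2 NPQ}{n(N-1)} = d^2 + \frac{t^2 PQ}{N-1}$$
$$n = \frac{t^2 NPQ/(N-1)}{d^2 + t^2 PQ/(N-1)}$$
$$n = \frac{t^2 NPQ/(N-1)}{\frac{1}{N-1}\left[(N-1)d^2 + t^2 PQ\right]}$$
$$n = \frac{t^2 NPQ}{(N-1)d^2 + t^2 PQ}$$
Dividing the numerator and denominator by $N d^2$,
$$n = \frac{t^2 PQ / d^2}{\frac{N-1}{N} + \frac{1}{N}\,\frac{t^2 PQ}{d^2}}$$
$$n = \frac{t^2 PQ / d^2}{1 + \frac{1}{N}\left(\frac{t^2 PQ}{d^2} - 1\right)}$$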
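The sample-size formula for a proportion translates directly into a small R function. The sketch below uses a hypothetical helper name `n_prop` and illustrative inputs (N = 470, P = 0.5, d = 0.05); none of these come from the handout's examples.

```r
# Hypothetical helper: sample size for estimating a proportion under SRSWOR
#   N = population size, P = anticipated proportion,
#   d = margin of error, t = normal deviate (1.96 for 95% confidence)
n_prop <- function(N, P, d, t = 1.96) {
  Q  <- 1 - P
  n0 <- t^2 * P * Q / d^2          # large-population sample size
  n0 / (1 + (n0 - 1) / N)          # finite-population adjustment
}

# Illustrative inputs
N <- 470; P <- 0.5; d <- 0.05; t <- 1.96
n_finite <- n_prop(N, P, d, t)
n_large  <- t^2 * P * (1 - P) / d^2

# The intermediate form from the derivation gives the same value
n_alt <- t^2 * N * P * (1 - P) / ((N - 1) * d^2 + t^2 * P * (1 - P))
```

In practice the result would be rounded up with `ceiling()`; the finite-population value is always smaller than the large-population one.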
Q. Derive the Confidence Interval for mean and proportion. Also explain with the help of
R.
Ans: Confidence Interval for Mean (Known variance)
The interval estimate of the population mean is
$$\bar{y} \pm z_{1-\alpha/2}\; S.E.(\bar{y})$$
Lower Limit
$$\bar{y} - z_{1-\alpha/2}\; S.E.(\bar{y})$$
Upper Limit
$$\bar{y} + z_{1-\alpha/2}\; S.E.(\bar{y})$$
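Example: The XYZ Company produces diet cold drink cans. The standard deviation of the amount poured into the cans by the automatic filling machine is 1.4 ml (milliliters). A random sample of cans was taken and the amounts filled were 281, 278, 276, 282, 280, 279, 278, 280. Suppose that the population of filling amounts follows a normal distribution. Determine a 95% confidence interval for the mean amount in all cans filled by the machine.
Solution:
Given $\sigma = 1.4$ and n = 8,
$$\bar{y} = \frac{\sum y}{n} = 279.25$$
$$S.E.(\bar{y}) = \frac{\sigma}{\sqrt{n}} = \frac{1.4}{\sqrt{8}} = 0.495$$
so the 95% confidence interval is 279.25 ± 1.96(0.495), i.e. from 278.28 to 280.22.
Confidence Interval for Proportion
$$p \pm z_{1-\alpha/2}\; S.E.(p)$$
Example: A simple random sample of 80 students is taken from a population of 470 students in a department. The total number of smokers in the sample was 22.
Confidence Interval of Smokers
The number of smokers is 22, so
$$p = \frac{22}{80} = 0.275, \qquad q = 1 - p = 0.725$$
$$S.E.(p) = \sqrt{\frac{pq}{n}} = 0.0499$$
The 95% confidence interval is p ± 1.96 S.E.(p).
The Lower Limit is 0.275 - 1.96(0.0499) = 0.177 and the Upper Limit is 0.275 + 1.96(0.0499) = 0.373.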
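Both confidence intervals in these examples can be reproduced in R. The sketch below uses `qnorm(0.975)` for the 95% normal quantile (approximately 1.96) and, like the worked example, the large-sample standard error for the proportion without a finite population correction; the object names are illustrative.

```r
z <- qnorm(0.975)    # 95% two-sided normal quantile, about 1.96

# Mean with known standard deviation (cold drink cans)
y <- c(281, 278, 276, 282, 280, 279, 278, 280)
sigma <- 1.4
se_mean <- sigma / sqrt(length(y))
ci_mean <- mean(y) + c(-1, 1) * z * se_mean

# Proportion of smokers (22 out of 80 sampled students)
n <- 80
p <- 22 / n
se_p <- sqrt(p * (1 - p) / n)
ci_p <- p + c(-1, 1) * z * se_p
```

`ci_mean` and `ci_p` each hold the lower and upper limit, in that order.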
STRATIFIED SAMPLING
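Notations
The population of N units is divided into k strata, with $N_h$ units in stratum h:
$$N = N_1 + N_2 + N_3 + \dots + N_k$$
Similarly, for the sample,
$$n = n_1 + n_2 + n_3 + \dots + n_k$$
Stratum mean
$$\bar{Y}_h = \frac{\sum_{i=1}^{N_h} Y_{hi}}{N_h}$$
Sample mean of stratum h
$$\bar{y}_h = \frac{\sum_{i=1}^{n_h} y_{hi}}{n_h}$$
Stratum variance and its sample counterpart
$$S_h^2 = \frac{1}{N_h - 1}\sum_{i=1}^{N_h}(Y_{hi} - \bar{Y}_h)^2, \qquad s_h^2 = \frac{1}{n_h - 1}\sum_{i=1}^{n_h}(y_{hi} - \bar{y}_h)^2$$
Estimated population total
$$\hat{Y}_{st} = N\bar{y}_{st}$$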
Q. Prove that sample mean is unbiased estimator of population mean under stratified sampling.
Ans: In stratified sampling, the sample mean is an unbiased estimator of the population mean, i.e.
$$E(\bar{y}_{st}) = \bar{Y}$$
As in simple random sampling, within each stratum
$$E(\bar{y}_h) = \bar{Y}_h$$
so
$$E(\bar{y}_{st}) = \frac{1}{N}\sum_{h=1}^{k} N_h E(\bar{y}_h) = \frac{1}{N}\sum_{h=1}^{k} N_h \bar{Y}_h = \bar{Y}$$
Similarly, it can be proved that the estimated population total is unbiased for the population total Y:
$$E(\hat{Y}_{st}) = Y$$
Q . In how many allocation methods variance of sample mean can be derived in stratified
sampling?
Ans: Allocation of Sample Size
The variance of y can be derived by using following allocation methods:
st
i. Arbitrary Allocation,
ii. Proportional Allocation, and
iii. Optimum Allocation.
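Q. Derive the variance of sample mean in stratified sampling using arbitrary allocation?
Ans: Arbitrary Allocation
The total sample is allocated arbitrarily among the strata.
Theorem
The variance of the sample mean $\bar{y}_{st}$ for stratified random sampling from a finite population is
$$\mathrm{Var}(\bar{y}_{st}) = \frac{1}{N^2}\sum_{h=1}^{k} N_h (N_h - n_h)\frac{S_h^2}{n_h}$$
which can equivalently be written as
$$\mathrm{Var}(\bar{y}_{st}) = \frac{1}{N^2}\sum_{h=1}^{k} N_h^2\,\frac{N_h - n_h}{N_h n_h}\,S_h^2 = \sum_{h=1}^{k}\frac{N_h^2}{N^2}\,S_h^2\,\frac{N_h - n_h}{N_h n_h}$$
Proof
We know that the mean for the stratified random sampling design is
$$\bar{y}_{st} = \frac{1}{N}\sum_{h=1}^{k} N_h \bar{y}_h$$
Since the samples are drawn independently from the strata,
$$\mathrm{Var}(\bar{y}_{st}) = \frac{1}{N^2}\sum_{h=1}^{k} N_h^2\,\mathrm{Var}(\bar{y}_h)$$
and, from simple random sampling without replacement within each stratum,
$$\mathrm{Var}(\bar{y}_h) = \frac{N_h - n_h}{N_h n_h}\,S_h^2$$
Hence, with $W_h = N_h / N$,
$$\mathrm{Var}(\bar{y}_{st}) = \sum_{h=1}^{k}\frac{N_h^2}{N^2}\,S_h^2\,\frac{N_h - n_h}{N_h n_h} = \sum_{h=1}^{k} W_h^2 S_h^2\,\frac{N_h - n_h}{N_h n_h}$$
Expanding the bracket gives
$$\mathrm{Var}(\bar{y}_{st}) = \frac{1}{N^2}\sum_{h=1}^{k}\frac{N_h^2 S_h^2}{n_h} - \frac{1}{N^2}\sum_{h=1}^{k} N_h S_h^2$$
and for the estimated population total
$$\mathrm{Var}(\hat{Y}_{st}) = \sum_{h=1}^{k}\frac{N_h^2 S_h^2}{n_h} - \sum_{h=1}^{k} N_h S_h^2$$
Proportional Allocation
Under proportional allocation the sample size in each stratum is $n_h = nN_h/N$. Substituting in
$$\mathrm{Var}(\bar{y}_{st}) = \sum_{h=1}^{k} W_h^2 S_h^2\,\frac{N_h - n_h}{N_h n_h}$$
gives
$$\mathrm{Var}_{prop}(\bar{y}_{st}) = \sum_{h=1}^{k} W_h^2 S_h^2\,\frac{N_h - nN_h/N}{N_h\, nN_h/N} = \sum_{h=1}^{k} W_h^2 S_h^2\,\frac{N - n}{nN_h} = \frac{N-n}{Nn}\sum_{h=1}^{k} W_h S_h^2$$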
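As a numerical check of the proportional-allocation result, the general stratified variance can be coded as a small function and compared with the simplified form. This is a sketch only: the helper name `var_st`, the stratum variances `Sh2` and the sample size n = 70 are hypothetical values chosen so that nN_h/N is a whole number in every stratum.

```r
# Hypothetical helper: Var(ybar_st) for arbitrary allocation
#   Nh = stratum sizes, nh = stratum sample sizes, Sh2 = stratum variances
var_st <- function(Nh, nh, Sh2) {
  Wh <- Nh / sum(Nh)
  sum(Wh^2 * Sh2 * (Nh - nh) / (Nh * nh))
}

# Illustrative strata
Nh  <- c(100, 250, 350)
Sh2 <- c(4, 9, 16)
N   <- sum(Nh)
n   <- 70

# Proportional allocation: nh = n * Nh / N  (10, 25, 35 here)
nh_prop <- n * Nh / N

V_general <- var_st(Nh, nh_prop, Sh2)
V_prop    <- (N - n) / (N * n) * sum((Nh / N) * Sh2)
```

`V_general` and `V_prop` agree, which is the simplification stated for proportional allocation.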
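Q. Derive the variance of sample mean in stratified sampling using optimum allocation
Ans: The purpose of optimum allocation is to allocate $n_h$ in such a way that the minimum variance is achieved for a given cost. The $n_h$ are chosen either to minimize $\mathrm{Var}(\bar{y}_{st})$ for a fixed sample size or cost, or to minimize the cost for a given variance.
The two aspects of optimum allocation are
i. Sample size is proportional to stratum size and standard deviation of stratum (Neyman Allocation).
ii. Sample size is inversely proportional to the square root of the cost per unit.
Sample Size for Minimum Variance with Fixed Cost
In stratified random sampling, $\mathrm{Var}(\bar{y}_{st})$ is minimum subject to the cost when $n_h$ is proportional to $W_h S_h / \sqrt{c_h}$, i.e.
$$n_h = n\,\frac{W_h S_h / \sqrt{c_h}}{\sum_{h=1}^{k} W_h S_h / \sqrt{c_h}}$$
where
$$\mathrm{Var}(\bar{y}_{st}) = \sum_{h=1}^{k}\frac{W_h^2 S_h^2}{n_h} - \sum_{h=1}^{k}\frac{W_h S_h^2}{N}$$
and C = total cost,
$$C = C_0 + \sum_{h=1}^{k} c_h n_h$$
The sample size follows from the proportionality. Writing K for the constant of proportionality,
$$n_h \propto \frac{W_h S_h}{\sqrt{c_h}}, \qquad n_h = K\,\frac{W_h S_h}{\sqrt{c_h}}$$
Summing over the strata,
$$\sum_{h=1}^{k} n_h = n = K\sum_{h=1}^{k}\frac{W_h S_h}{\sqrt{c_h}}$$
so that
$$\frac{n_h}{n} = \frac{W_h S_h / \sqrt{c_h}}{\sum_{h=1}^{k} W_h S_h / \sqrt{c_h}}, \qquad n_h = n\,\frac{W_h S_h / \sqrt{c_h}}{\sum_{h=1}^{k} W_h S_h / \sqrt{c_h}}$$
Substituting this $n_h$ in the variance gives
$$\mathrm{Var}_{min}(\bar{y}_{st}) = \frac{1}{n}\left(\sum_{h=1}^{k} W_h S_h \sqrt{c_h}\right)\left(\sum_{h=1}^{k}\frac{W_h S_h}{\sqrt{c_h}}\right) - \frac{1}{N}\sum_{h=1}^{k} W_h S_h^2$$
Neyman Allocation
If the cost per unit is the same in every stratum, $c_h = c$, then
$$n_h = n\,\frac{W_h S_h / \sqrt{c}}{\sum_{h=1}^{k} W_h S_h / \sqrt{c}} = n\,\frac{W_h S_h}{\sum_{h=1}^{k} W_h S_h}$$
$$\mathrm{Var}_{min}(\bar{y}_{st}) = \frac{1}{n}\left(\sum_{h=1}^{k} W_h S_h\right)^2 - \frac{1}{N}\sum_{h=1}^{k} W_h S_h^2$$
For large N,
$$\mathrm{Var}_{min}(\bar{y}_{st}) = \frac{1}{n}\left(\sum_{h=1}^{k} W_h S_h\right)^2$$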
Example:
Consider a population of size 700 consisting of three strata such that N1=100, N2=250 and N3=350. The required sample size is 18. The sample sizes from stratum-1, stratum-2 and stratum-3 are arbitrarily decided as 4, 8 and 6, respectively.
The sample chosen from each stratum is
Stra-1   Stra-2   Stra-3
1        7        23
3        12       14
2        8        20
5        5        22
         11       24
         10       17
         9
         12
$$\bar{y}_{st} = \frac{1}{N}\left(N_1\bar{y}_1 + N_2\bar{y}_2 + N_3\bar{y}_3\right) = 13.70$$
$$\mathrm{Var}(\bar{y}_{st}) = \sum_{h=1}^{3} W_h^2 S_h^2\,\frac{N_h - n_h}{N_h n_h} = W_1^2 S_1^2\,\frac{N_1 - n_1}{N_1 n_1} + W_2^2 S_2^2\,\frac{N_2 - n_2}{N_2 n_2} + W_3^2 S_3^2\,\frac{N_3 - n_3}{N_3 n_3}$$
The standard error is 0.8463, so the 95% confidence interval is
13.70 ± 1.96(0.8463)
Lower Limit
13.70 - 1.96(0.8463) = 12.04
Upper Limit
13.70 + 1.96(0.8463) = 15.36
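The arbitrary-allocation example (stratum sizes 100, 250 and 350 with samples of size 4, 8 and 6) can be reproduced in R. This is a minimal sketch; the list `s` and the other object names are chosen here for illustration, and the stratum variances are estimated by the sample variances.

```r
# Samples drawn from the three strata in the example
s <- list(c(1, 3, 2, 5),
          c(7, 12, 8, 5, 11, 10, 9, 12),
          c(23, 14, 20, 22, 24, 17))
Nh <- c(100, 250, 350)
N  <- sum(Nh)
Wh <- Nh / N
nh <- sapply(s, length)

# Stratified estimate of the population mean
ybar_h <- sapply(s, mean)
y_st   <- sum(Wh * ybar_h)

# Estimated variance and standard error
s2_h <- sapply(s, var)
v_st <- sum(Wh^2 * s2_h * (Nh - nh) / (Nh * nh))
se_st <- sqrt(v_st)

# 95% confidence interval
ci <- y_st + c(-1, 1) * 1.96 * se_st
```

This gives a stratified mean of about 13.70 with standard error about 0.846, so the interval runs from roughly 12.04 to 15.36.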
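Example:
Ans: Consider a population of size 700 consisting of three strata such that N1=100, N2=250 and N3=350. The required sample size is 18.
Under equal allocation among the L = 3 strata, the sample size from stratum-1, stratum-2 and stratum-3 is
$$n_h = \frac{n}{L} = \frac{18}{3} = 6$$
The sample chosen from each stratum is
        Stra-1   Stra-2   Stra-3
        1        7        23
        3        12       14
        2        8        20
        5        5        22
        4        11       24
        3        10       17
mean    3        8.83     20
Nh      100      250      350
Sh      1.4142   2.6394   3.8471
nh      6        6        6
$$\bar{y}_{st} = \sum_{h=1}^{k} W_h \bar{y}_h = \sum_{h=1}^{k} N_h \bar{y}_h / N = \frac{1}{N}\left(N_1\bar{y}_1 + N_2\bar{y}_2 + N_3\bar{y}_3\right) = 13.58$$
$$\mathrm{Var}(\bar{y}_{st}) = \sum_{h=1}^{3} W_h^2 S_h^2\,\frac{N_h - n_h}{N_h n_h} = W_1^2 S_1^2\,\frac{N_1 - n_1}{N_1 n_1} + W_2^2 S_2^2\,\frac{N_2 - n_2}{N_2 n_2} + W_3^2 S_3^2\,\frac{N_3 - n_3}{N_3 n_3}$$
The standard error is 0.8701, so the 95% confidence interval is
13.58 ± 1.96(0.8701)
Lower Limit
13.58 - 1.96(0.8701) = 11.87
Upper Limit
13.58 + 1.96(0.8701) = 15.29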
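Example:
Ans: Consider a population of size 700 consisting of three strata such that N1=100, N2=250 and N3=350. The required sample size is 18. First we will allocate the sample size to each stratum according to proportional allocation, $n_h = nN_h/N$, which gives n1 = 2.57 ≈ 3, n2 = 6.43 ≈ 6 and n3 = 9.
[Table: sample observations from the three strata under proportional allocation; not recoverable.]
$$\bar{y}_{st} = \frac{1}{N}\left(N_1\bar{y}_1 + N_2\bar{y}_2 + N_3\bar{y}_3\right) = 13.46$$
$$\mathrm{Var}(\bar{y}_{st}) = W_1^2 S_1^2\,\frac{N_1 - n_1}{N_1 n_1} + W_2^2 S_2^2\,\frac{N_2 - n_2}{N_2 n_2} + W_3^2 S_3^2\,\frac{N_3 - n_3}{N_3 n_3}$$
The standard error is 0.726.
Lower Limit
13.46 - 1.96(0.726) = 12.04
Upper Limit
13.46 + 1.96(0.726) = 14.88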
Q. Give an example to explain how Sample Size is obtained in Optimum Allocation?
Ans: A manufacturing company is interested in conducting a survey about a certain product in three towns (say A, B and C) of a city. The towns differ from each other with respect to household income. The numbers of houses in Towns A, B and C are 170, 135 and 80, respectively.
The company finds that the cost of obtaining an observation from Town A or B is the same, Rs. 500 (i.e. c1 = c2 = 500). The cost per observation in Town C is Rs. 800 (i.e. c3 = 800). The stratum standard deviations are
S1 = 3, S2 = 7, S3 = 10
The overall sample size for a certain margin of error is 30. Find the sample size from each town (stratum), n1, n2, n3.
Sample Size in Optimum Allocation
        Stra-1   Stra-2   Stra-3
Sh      3        7        10
Nh      170      135      80
ch      500      500      800
$$n_h = n\,\frac{N_h S_h / \sqrt{c_h}}{\sum_{h=1}^{3} N_h S_h / \sqrt{c_h}}, \qquad \sum_{h=1}^{3}\frac{N_h S_h}{\sqrt{c_h}} = 93.35$$
$$n_1 = n\,\frac{N_1 S_1 / \sqrt{c_1}}{\sum_{h=1}^{3} N_h S_h / \sqrt{c_h}} = 30 \times \frac{22.8078}{93.35} = 7.33 \approx 7$$
$$n_2 = n\,\frac{N_2 S_2 / \sqrt{c_2}}{\sum_{h=1}^{3} N_h S_h / \sqrt{c_h}} = 30 \times \frac{42.2616}{93.35} = 13.58 \approx 14$$
$$n_3 = n\,\frac{N_3 S_3 / \sqrt{c_3}}{\sum_{h=1}^{3} N_h S_h / \sqrt{c_h}} = 30 \times \frac{28.2843}{93.35} = 9.09 \approx 9$$
n1 = 7, n2 = 14, n3 = 9
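The optimum allocation for the three towns can be computed in a few lines of R. This is a sketch using the figures given in the example; the vector names are illustrative.

```r
# Figures from the three-town example
Nh <- c(170, 135, 80)     # houses in Towns A, B, C
Sh <- c(3, 7, 10)         # stratum standard deviations
ch <- c(500, 500, 800)    # cost per observation (Rs.)
n  <- 30                  # overall sample size

# Optimum allocation: nh proportional to Nh * Sh / sqrt(ch)
a <- Nh * Sh / sqrt(ch)
nh_exact <- n * a / sum(a)
nh <- round(nh_exact)
```

The unrounded allocations are about 7.33, 13.58 and 9.09, which round to 7, 14 and 9 and add up to the required 30.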
Q . By using an example explain the Variance for Optimum Allocation
Ans: n1 = 7, n2 = 14, n3 = 9
5 10, 8 20
4 9, 12 22
3 12, 11 24
6 8, 9 17
10 11,13 23
21
19
Nh      170      135      80
nh      7        14       9
$$\bar{y}_{st} = \sum_{h=1}^{k} W_h \bar{y}_h = \sum_{h=1}^{k} N_h \bar{y}_h / N = \frac{1}{N}\left(N_1\bar{y}_1 + N_2\bar{y}_2 + N_3\bar{y}_3\right)$$
$$\bar{y}_{st} = 9.47$$
Confidence Interval
9.49 ± 1.96(0.57399)
Lower Limit
9.49 - 1.96(0.57399) = 8.36
Upper Limit
9.49 + 1.96(0.57399) = 10.61
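Sample Size in Neyman Allocation
For the same three towns, S1 = 3, S2 = 7, S3 = 10 and the overall sample size for a certain margin of error is 30. Find the sample size from each town (stratum), n1, n2, n3, when the cost per observation is treated as equal across strata.
$$n_h = n\,\frac{N_h S_h}{\sum_{h=1}^{3} N_h S_h}$$
        Stra-1   Stra-2   Stra-3
Sh      3        7        10
Nh      170      135      80
Ch      500      500      800
NhSh    510      945      800
$$\sum_{h=1}^{3} N_h S_h = 2255$$
$$n_1 = n\,\frac{N_1 S_1}{\sum N_h S_h} = 30 \times \frac{510}{2255} = 6.78 \approx 7$$
$$n_2 = n\,\frac{N_2 S_2}{\sum N_h S_h} = 30 \times \frac{945}{2255} = 12.57 \approx 12 \quad \text{(rounded down so that the total remains 30)}$$
$$n_3 = n\,\frac{N_3 S_3}{\sum N_h S_h} = 30 \times \frac{800}{2255} = 10.64 \approx 11$$
n1 = 7, n2 = 12, n3 = 11
$$\bar{y}_{st} = \sum_{h=1}^{k} W_h \bar{y}_h = \sum_{h=1}^{k} N_h \bar{y}_h / N = \frac{1}{N}\left(N_1\bar{y}_1 + N_2\bar{y}_2 + N_3\bar{y}_3\right) = 9.29$$
$$\mathrm{Var}(\bar{y}_{st}) = \sum_{h=1}^{3} W_h^2 S_h^2\,\frac{N_h - n_h}{N_h n_h} = W_1^2 S_1^2\,\frac{N_1 - n_1}{N_1 n_1} + W_2^2 S_2^2\,\frac{N_2 - n_2}{N_2 n_2} + W_3^2 S_3^2\,\frac{N_3 - n_3}{N_3 n_3}$$
$$\mathrm{Var}(\bar{y}_{st}) = 0.3184$$
Confidence Interval
The interval estimate of the population mean is
$$\bar{y}_{st} \pm z_{1-\alpha/2}\; S.E.(\bar{y}_{st})$$
9.29 ± 1.96(0.5643)
Lower Limit
9.29 - 1.96(0.5643) = 8.18
Upper Limit
9.29 + 1.96(0.5643) = 10.40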
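Variance (Random) & Variance (Prop)
$$S^2 = \frac{\sum_{i=1}^{N}(Y_i - \bar{Y})^2}{N-1} = \frac{\sum_{h=1}^{k}\sum_{i=1}^{N_h}(Y_{hi} - \bar{Y})^2}{N-1}$$
$$\mathrm{Var}_{ran}(\bar{y}) = \frac{N-n}{Nn}\,\frac{\sum_{h=1}^{k}\sum_{i=1}^{N_h}(Y_{hi} - \bar{Y})^2}{N-1}$$
Writing $Y_{hi} - \bar{Y} = (Y_{hi} - \bar{Y}_h) + (\bar{Y}_h - \bar{Y})$,
$$\sum_{h=1}^{k}\sum_{i=1}^{N_h}(Y_{hi} - \bar{Y})^2 = \sum_{h=1}^{k}\sum_{i=1}^{N_h}(Y_{hi} - \bar{Y}_h)^2 + \sum_{h=1}^{k}\sum_{i=1}^{N_h}(\bar{Y}_h - \bar{Y})^2$$
as the cross-product term vanishes, because $\sum_{i=1}^{N_h}(Y_{hi} - \bar{Y}_h) = 0$ in every stratum. Therefore
$$(N-1)S^2 = \sum_{h=1}^{k}(N_h - 1)S_h^2 + \sum_{h=1}^{k} N_h(\bar{Y}_h - \bar{Y})^2$$
For large N,
$$NS^2 = \sum_{h=1}^{k} N_h S_h^2 + \sum_{h=1}^{k} N_h(\bar{Y}_h - \bar{Y})^2$$
$$S^2 = \sum_{h=1}^{k}\frac{N_h}{N}S_h^2 + \sum_{h=1}^{k}\frac{N_h}{N}(\bar{Y}_h - \bar{Y})^2 = \sum_{h=1}^{k} W_h S_h^2 + \sum_{h=1}^{k} W_h(\bar{Y}_h - \bar{Y})^2$$
Multiplying both sides by $(N-n)/(Nn)$,
$$V_{ran} = V_{prop} + \frac{N-n}{Nn}\sum_{h=1}^{k} W_h(\bar{Y}_h - \bar{Y})^2$$
$$V_{ran} \ge V_{prop}$$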
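Variance (Prop) & Variance (Opt)
$$\mathrm{Var}_{prop}(\bar{y}_{st}) = \frac{N-n}{Nn}\sum_{h=1}^{k} W_h S_h^2 = \frac{1}{n}\sum_{h=1}^{k} W_h S_h^2 - \frac{1}{N}\sum_{h=1}^{k} W_h S_h^2$$
$$\mathrm{Var}_{opt}(\bar{y}_{st}) = \frac{1}{n}\left(\sum_{h=1}^{k} W_h S_h\right)^2 - \frac{1}{N}\sum_{h=1}^{k} W_h S_h^2$$
Subtracting,
$$\mathrm{Var}_{prop}(\bar{y}_{st}) - \mathrm{Var}_{opt}(\bar{y}_{st}) = \frac{1}{n}\left[\sum_{h=1}^{k} W_h S_h^2 - \left(\sum_{h=1}^{k} W_h S_h\right)^2\right]$$
$$= \frac{1}{n}\left[\sum_{h=1}^{k} W_h S_h^2 - 2\left(\sum_{h=1}^{k} W_h S_h\right)^2 + \left(\sum_{h=1}^{k} W_h S_h\right)^2\right]$$
$$= \frac{1}{n}\left[\sum_{h=1}^{k} W_h S_h^2 - 2\left(\sum_{h=1}^{k} W_h S_h\right)\left(\sum_{h=1}^{k} W_h S_h\right) + \sum_{h=1}^{k} W_h\left(\sum_{h=1}^{k} W_h S_h\right)^2\right]$$
since $\sum_h W_h = 1$. Hence
$$\mathrm{Var}_{prop}(\bar{y}_{st}) - \mathrm{Var}_{opt}(\bar{y}_{st}) = \frac{1}{n}\sum_{h=1}^{k} W_h\left(S_h - \sum_{h=1}^{k} W_h S_h\right)^2 \ge 0$$
Comparison
$$V_{prop} \ge V_{opt}$$
$$V_{ran} \ge V_{prop}$$
$$V_{ran} \ge V_{prop} \ge V_{opt}$$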
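The ordering of the three variances can be illustrated numerically. The sketch below uses the three farm-size strata that appear as `str1`, `str2` and `str3` in the R examples of these handouts, with n = 24; the allocations are left unrounded, so the values differ slightly from worked figures that use rounded stratum sample sizes.

```r
# Farm-expenditure strata as listed in the handout's R examples
str1 <- c(75,76,65,79,86,62,57,92,45,50,69,48,48,77,60,60,55,64,66,58)
str2 <- c(55,40,51,45,38,55,35,33,41,30,43,48,42,53,54,38,37,36,40,52,
          44,36,39,47,48,46,39,46,42,41,28,47,61,35,31,23)
str3 <- c(35,31,26,28,38,32,36,42,18,40,33,16,25,29,18,25,28,35,32,26,
          13,30,19,37)
strata <- list(str1, str2, str3)

Y  <- unlist(strata)
N  <- length(Y)
n  <- 24
Nh <- sapply(strata, length)
Wh <- Nh / N
Sh2 <- sapply(strata, var)

# Simple random sampling, proportional allocation, Neyman (optimum) allocation
V_ran  <- (N - n) / (N * n) * var(Y)
V_prop <- (N - n) / (N * n) * sum(Wh * Sh2)
V_opt  <- sum(Wh * sqrt(Sh2))^2 / n - sum(Wh * Sh2) / N
```

Comparing `V_ran`, `V_prop` and `V_opt` shows the stated ordering, with stratification giving a large reduction relative to simple random sampling for these data.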
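Example:
All 80 farms in a population are stratified by farm size. The expenditure on insecticides used during the last year by each farmer is presented in the table.
[Table: expenditure on insecticides for the 80 farms by stratum; the values are listed in the R vectors str1, str2 and str3 later in this section.]
Ans:
Population Mean
It is given that N=80,
n = 24, N1 = 20, N2 = 36, and N3 = 24.
W1 = 0.25, W2 = 0.45, and W3 = 0.30.
$$\bar{Y} = \sum_{i=1}^{N} Y_i / N = \frac{(75 + 65 + \dots + 16)}{80} = 43.79$$
Variance Under SRSWOR
The population variance is
$$S^2 = \frac{\sum_{i=1}^{N}(Y_i - \bar{Y})^2}{N-1} = 268.68$$
$$\mathrm{Var}(\bar{y}) = \frac{N-n}{Nn}\,S^2 = \frac{80 - 24}{80 \times 24} \times 268.68 = 7.84$$
Variance under Stratified Sampling
        Stra-1   Stra-2   Stra-3
Wh      0.25     0.45     0.30
Nh      20       36       24
Sh2     169.52   70.56    61.45
nh      8        8        8
$$\mathrm{Var}(\bar{y}_{st}) = \sum_{h=1}^{3} W_h^2 S_h^2\,\frac{N_h - n_h}{N_h n_h} = W_1^2 S_1^2\,\frac{N_1 - n_1}{N_1 n_1} + W_2^2 S_2^2\,\frac{N_2 - n_2}{N_2 n_2} + W_3^2 S_3^2\,\frac{N_3 - n_3}{N_3 n_3}$$
$$\mathrm{Var}(\bar{y}_{st}) = 2.64$$
Under SRSWOR
$$\mathrm{Var}(\bar{y}) = 7.84$$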
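Proportional Allocation
N1=20, N2=36, N3=24, N=80, n=24
$$n_h = \frac{n}{N}\,N_h$$
$$n_1 = \frac{n}{N}\,N_1 = \frac{24}{80} \times 20 = 6$$
$$n_2 = \frac{n}{N}\,N_2 = \frac{24}{80} \times 36 = 10.8 \approx 11$$
$$n_3 = \frac{n}{N}\,N_3 = \frac{24}{80} \times 24 = 7.2 \approx 7$$
$$\mathrm{Var}(\bar{y}_{st}) = \sum_{h=1}^{3} W_h^2 S_h^2\,\frac{N_h - n_h}{N_h n_h} = W_1^2 S_1^2\,\frac{N_1 - n_1}{N_1 n_1} + W_2^2 S_2^2\,\frac{N_2 - n_2}{N_2 n_2} + W_3^2 S_3^2\,\frac{N_3 - n_3}{N_3 n_3}$$
The relative efficiency of proportional allocation compared with SRSWOR, in percent, is
RE = 290.5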
Example:
Comparison between Neyman Allocation and Simple Random Sampling?
Ans: As before, N = 80, n = 24, N1 = 20, N2 = 36 and N3 = 24, so that W1 = 0.25, W2 = 0.45 and W3 = 0.30. The population mean is 43.79 and the population variance is 268.68, so under SRSWOR
$$\mathrm{Var}(\bar{y}) = \frac{N-n}{Nn}\,S^2 = \frac{80 - 24}{80 \times 24} \times 268.68 = 7.84$$
Variance under Neyman Allocation
$$n_1 = n\,\frac{N_1 S_1}{\sum_{h=1}^{3} N_h S_h} = 8.3, \qquad n_2 = n\,\frac{N_2 S_2}{\sum_{h=1}^{3} N_h S_h} = 9.7, \qquad n_3 = n\,\frac{N_3 S_3}{\sum_{h=1}^{3} N_h S_h} = 6$$
        Stra-1   Stra-2   Stra-3
Nh      20       36       24
nh      8        10       6
$$\mathrm{Var}(\bar{y}_{st}) = \sum_{h=1}^{3} W_h^2 S_h^2\,\frac{N_h - n_h}{N_h n_h} = W_1^2 S_1^2\,\frac{N_1 - n_1}{N_1 n_1} + W_2^2 S_2^2\,\frac{N_2 - n_2}{N_2 n_2} + W_3^2 S_3^2\,\frac{N_3 - n_3}{N_3 n_3}$$
Comparison of Variances
Under SRSWOR
$$\mathrm{Var}(\bar{y}) = 7.84$$
The relative efficiency of Neyman allocation compared with SRSWOR, in percent, is
RE = 311.2
Defining Strata in R
str1 <- c(75,76,65,79,86,62,57,92,45,50,69,48,48,77,60,60,55,64,66,58)
str2 <- c(55,40,51,45,38,55,35,33,41,30,43,48,42,53,54,38,37,36,40,52,44,36,39,47,48,46,39,46,42,41,28,47,61,35,31,23)
str3 <- c(35,31,26,28,38,32,36,42,18,40,33,16,25,29,18,25,28,35,32,26,13,30,19,37)
Parameters
Mean and Standard Deviation
y<-c(str1,str2,str3)
m_p=mean(y)
sd_p=sd(y)
Output
m_p = 43.7875
sd_p = 16.39
Defining Terms
Defining Stratum Size
N1=length(str1)
N2=length(str2)
N3=length(str3)
N=N1+N2+N3
Defining Weights
W1=N1/N;W2=N2/N;
W3=N3/N
n=24
Mean & Standard Deviation
Mean of Strata
m_st1=mean(str1)
m_st2=mean(str2)
m_st3=mean(str3)
sd_st1=sd(str1)
sd_st2=sd(str2)
sd_st3=sd(str3)
Output
64.6,42.19,28.83
13.01,8.4,7.84
Variances of Strata
var_st1=var(str1)
var_st2=var(str2)
var_st3=var(str3)
m_yst=(1/N)*
Q . Define the following data in R and perform stratified sampling using the proportional
allocation using the sample size 24. Also find the mean and variance.
Ans:
Sample Size Allocation
n1=round(n*(N1/N))
n2=round(n*(N2/N))
n3=round(n*(N3/N))
Output
24=6+11+7
0.25,0.45,0.3
Variance Under Proportional Allocation
Term1=(W1^2)*var_st1*(N1-n1)/(N1*n1)
Term2=(W2^2)*var_st2*(N2-n2)/(N2*n2)
Term3=(W3^2)*var_st3*(N3-n3)/(N3*n3)
vp_prop=Term1+Term2+Term3
Output
2.698
Sampling from Stratum
s1=sample(str1,n1)
s2=sample(str2,n2)
s3=sample(str3,n3)
ms_st1=mean(s1)
ms_st2=mean(s2)
ms_st3=mean(s3)
# stratified estimate of the mean: weighted sum of the stratum sample means
m_prop=W1*ms_st1+W2*ms_st2+W3*ms_st3
Output
73.17,42.55,27.14
45.58
Estimated Variance
vars_st1=var(s1);vars_st2=var(s2);vars_st3=var(s3)
Term1=(W1^2)*vars_st1*(N1-n1)/(N1*n1)
Term2=(W2^2)*vars_st2*(N2-n2)/(N2*n2)
Term3=(W3^2)*vars_st3*(N3-n3)/(N3*n3)
vs_prop=Term1+Term2+Term3
Output
> vs_prop
[1] 2.57464
Q . Define the following data in R and perform stratified sampling using the Neyman
allocation using the sample size 24. Also find the mean and variance.
Ans:
Defining Strata in R
str1 <- c(75,76,65,79,86,62,57,92,45,50,69,48,48,77,60,60,55,64,66,58)
str2 <- c(55,40,51,45,38,55,35,33,41,30,43,48,42,53,54,38,37,36,40,52,44,36,39,47,48,46,39,46,42,41,28,47,61,35,31,23)
str3 <- c(35,31,26,28,38,32,36,42,18,40,33,16,25,29,18,25,28,35,32,26,13,30,19,37)
Parameters
y<-c(str1,str2,str3)
m_p=mean(y)
sd_p=sd(y)
Output
m_p = 43.7875
sd_p = 16.39
Defining Terms
Defining Stratum Size
N1=length(str1)
N2=length(str2)
N3=length(str3)
N=N1+N2+N3
Defining Weights
W1=N1/N; W2=N2/N; W3=N3/N
n=24
Output
80=20+36+24
0.25,0.45,0.3
Sample Size Allocation
sd_st1=sd(str1); sd_st2=sd(str2); sd_st3=sd(str3)
var_st1=var(str1); var_st2=var(str2); var_st3=var(str3)
# denominator of the Neyman allocation (named sum_NS to avoid masking sum())
sum_NS=N1*sd_st1+N2*sd_st2+N3*sd_st3
nn1=round(n*(N1*sd_st1/sum_NS))
nn2=round(n*(N2*sd_st2/sum_NS))
nn3=round(n*(N3*sd_st3/sum_NS))
Output
24=8+10+6
Variance Under Neyman Allocation
Term1=(W1^2)*var_st1*(N1-nn1)/(N1*nn1)
Term2=(W2^2)*var_st2*(N2-nn2)/(N2*nn2)
Term3=(W3^2)*var_st3*(N3-nn3)/(N3*nn3)
vp_Ney=Term1+Term2+Term3
Output
0.79,1.03,0.69
2.52
Sampling From Strata
s1=sample(str1,nn1)
s2=sample(str2,nn2)
s3=sample(str3,nn3)
ms_st1=mean(s1)
ms_st2=mean(s2)
ms_st3=mean(s3)
# stratified estimate of the mean: weighted sum of the stratum sample means
m_Ney=W1*ms_st1+W2*ms_st2+W3*ms_st3
Output
68.13, 40.2,26.83
43.17
Estimated Variance
vars_st1=var(s1);vars_st2=var(s2);vars_st3=var(s3)
Term1=(W1^2)*vars_st1*(N1-nn1)/(N1*nn1)
Term2=(W2^2)*vars_st2*(N2-nn2)/(N2*nn2)
Term3=(W3^2)*vars_st3*(N3-nn3)/(N3*nn3)
vs_Ney=Term1+Term2+Term3
Output
> vs_Ney
Q. Define the following data in R and perform stratified sampling. Calculate the
parameters for each stratum. Also find the mean of population.
Stratum
N=16;N1=4;N2=6;N3=6; n1=2;n2=3;n3=3;n=8;
Y<-c(12,14,19,22,362,441,456,482,444,472,124,189,142,165,135,140)
Y1<-c(12,14,19,22)
Y2<-c(362,441,456,482,444,472)
Y3<-c(124,189,142,165,135,140)
w1=N1/N; w2=N2/N;w3=N3/N;
y1=c();y2=c();y3=c();yst=c();
for(i in 1:10000){
sa1=sample(Y1,n1)
sa2=sample(Y2,n2)
sa3=sample(Y3,n3)
y1[i]=mean(sa1);
y2[i]=mean(sa2);
y3[i]=mean(sa3);
yst[i]=w1*y1[i]+w2*y2[i]+w3*y3[i];
}
mean(yst); var(yst)
Output
226.138
56.25539
Population mean and theoretical variance (R is case-sensitive, so the population vector is Y)
mean(Y)
vp=(w1^2)*var(Y1)*(N1-n1)/(N1*n1) + (w2^2)*var(Y2)*(N2-n2)/(N2*n2) + (w3^2)*var(Y3)*(N3-n3)/(N3*n3)
vp
Output
226.1875
vp=56.12526
Q . Generate the stratified population consisting on three strata such that stratum-1 is
normally distributed with mean 10 and standard deviation 2 with 100 values, stratum-2 is
normally distributed with mean 100 and standard deviation 2 with 500 values, stratum-3 is
normally distributed with mean 500 and standard deviation 2 with 1000 values. Find the
mean and variance for this population using the method of stratified sampling.
Ans: Simulation Study
N1=100;N2=500;N3=1000;n=50
Y1<-rnorm(N1,mean=10,sd=2)
Y2<-rnorm(N2,mean=100,sd=2)
Y3<-rnorm(N3,mean=500,sd=2)
Y<-c(Y1,Y2,Y3)
y1=c();y2=c();y3=c();yst=c();
N=N1+N2+N3;
w1=N1/N;w2=N2/N;w3=N3/N;
n1=round(n*w1)
n2=round(n*w2)
n3=round(n*w3)
Looping
for(i in 1:10000){
sa1=sample(Y1,n1)
sa2=sample(Y2,n2)
sa3=sample(Y3,n3)
y1[i]=mean(sa1);
y2[i]=mean(sa2);
y3[i]=mean(sa3);
yst[i]=w1*y1[i]+w2*y2[i]+w3*y3[i];
}
mean(yst); var(yst)
Output
Mean(yst)=344.36
Var(yst) = 0.08
Variance of Mean Using R
mean(Y)
vp=(w1^2)*var(Y1)*(N1-n1)/(N1*n1) + (w2^2)*var(Y2)*(N2-n2)/(N2*n2) + (w3^2)*var(Y3)*(N3-n3)/(N3*n3)
Q Generate a population of size 1000 consisting on three strata such that
200 values for stratum-1 from normal distribution with mean=2 and standard
deviation=3.
300 values for stratum-2 from normal distribution with mean=10 and standard
deviation=9.
500 values for stratum-3 from normal distribution with mean=30 and standard
deviation=5.
Allocate the sample size to each stratum by Neyman Allocation where n=50.
Select the sample from each stratum and estimate the mean of population.
Ans:
Defining Population
N1=200;N2=300;N3=500 ; n=50
Y1<-rnorm(N1,mean=2,sd=3)
Y2<-rnorm(N2,mean=10,sd=9)
Y3<-rnorm(N3,mean=30,sd=5)
Y<-c(Y1,Y2,Y3)
Sample Size Under Neyman Allocation
N=N1+N2+N3;
w1=N1/N;w2=N2/N;w3=N3/N
sum=w1*3+w2*9+w3*5
n1=round(n*w1*3/sum)
n2=round(n*w2*9/sum)
n3=round(n*w3*5/sum)
Looping
y1=c();y2=c();y3=c();yst=c();
for(i in 1:10000){
sa1=sample(Y1,n1)
sa2=sample(Y2,n2)
sa3=sample(Y3,n3)
y1[i]=mean(sa1);
y2[i]=mean(sa2);
y3[i]=mean(sa3);
yst[i]=w1*y1[i]+w2*y2[i]+w3*y3[i];
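}
mean(yst); var(yst)
Output
The values depend on the randomly generated population. mean(yst) should be close to the theoretical population mean 0.2(2) + 0.3(10) + 0.5(30) = 18.4.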
Q. Find the mean and variance of Proportion in Stratified Sampling
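Ans: Suppose we have N population units, i.e. Y1, Y2, …, Yi, …, YN, where
Yi = 1 if the ith unit possesses a certain attribute and 0 otherwise.
The population proportion is defined as
$$\bar{Y} = \sum_{i=1}^{N} Y_i / N = A/N = P$$
The sample proportion is
$$\bar{y} = \sum_{i=1}^{n} y_i / n = a/n = p$$
Because each value is 0 or 1,
$$\sum_{i=1}^{N} Y_i^2 = A = NP, \qquad \sum_{i=1}^{n} y_i^2 = a = np$$
$$\sum_{i=1}^{N}(Y_i - \bar{Y})^2 = \sum_{i=1}^{N} Y_i^2 - N\bar{Y}^2 = NPQ$$
$$S^2 = \frac{\sum_{i=1}^{N}(Y_i - \bar{Y})^2}{N-1} = \frac{N}{N-1}P(1-P) = \frac{NPQ}{N-1}$$
Similarly $s^2 = npq/(n-1)$.
Variance of the Sample Proportion
For SRSWOR,
$$\mathrm{VAR}(\bar{y}_{wor}) = \frac{N-n}{Nn}\,S^2$$
$$\mathrm{VAR}(p_{wor}) = \frac{N-n}{Nn}\,\frac{NPQ}{N-1} = \frac{N-n}{N-1}\,\frac{PQ}{n}$$
Proportion Estimation
$$p_{st} = \frac{1}{N}\sum_{h=1}^{k} p_h N_h$$
For a single stratum,
$$A_h = N_h P_h, \qquad S_h^2 = \frac{N_h P_h Q_h}{N_h - 1}, \qquad \mathrm{Var}(p_h) = \frac{N_h - n_h}{N_h - 1}\,\frac{P_h Q_h}{n_h}$$
Since the strata are sampled independently,
$$\mathrm{Var}(p_{st}) = \frac{1}{N^2}\sum_{h=1}^{k} N_h^2\,\mathrm{Var}(p_h) = \frac{1}{N^2}\sum_{h=1}^{k} N_h^2\,\frac{N_h - n_h}{N_h - 1}\,\frac{P_h Q_h}{n_h}$$
$$\mathrm{Var}(p_{st}) = \sum_{h=1}^{k} W_h^2\,\frac{N_h - n_h}{N_h - 1}\,\frac{P_h Q_h}{n_h}$$
Under proportional allocation, using $S_h^2 = N_h P_h Q_h / (N_h - 1)$,
$$\mathrm{Var}_{prop}(p_{st}) = \frac{N-n}{Nn}\sum_{h=1}^{k}\frac{W_h N_h P_h Q_h}{N_h - 1}$$
Under optimum allocation, using $S_h^2 = N_h P_h Q_h / (N_h - 1)$,
$$\mathrm{Var}_{opt}(p_{st}) = \frac{1}{n}\left(\sum_{h=1}^{k} W_h\sqrt{\frac{N_h P_h Q_h}{N_h - 1}}\right)^2 - \frac{1}{N}\sum_{h=1}^{k} W_h S_h^2$$
and, when each $N_h$ is large,
$$\mathrm{Var}_{opt}(p_{st}) = \frac{1}{n}\left(\sum_{h=1}^{k} W_h\sqrt{P_h Q_h}\right)^2 - \frac{1}{N}\sum_{h=1}^{k} W_h S_h^2$$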
Example:
The management of a local newspaper is to decide whether it should continue with the publication of 'Children Column', which had been introduced on an experimental basis. For this purpose, it is imperative to estimate the proportion of readers who would favor its continuance. The frame consists of readers who had stayed with the paper for the last six months. Since different attitudes are expected from the urban and rural readers, the population is stratified into urban readers and rural readers. In the population, there are 73000 urban readers and 30280 rural readers.
The investigator selected WOR simple random samples of 718 respondents from stratum I (urban readers) and 298 readers from stratum II (rural readers). The number of individuals who favor continuation of the column was 570 from stratum I and 143 from stratum II.
Sample Size Allocation
The total sample of n = 1016 is allocated proportionally, for example
$$n_2 = \frac{1016}{103280} \times 30280 = 298$$
Proportion Estimation
$$p_{st} = \frac{1}{N}\sum_{h=1}^{k} p_h N_h = \frac{1}{N}\left(p_1 N_1 + p_2 N_2\right)$$
Output
P1=0.7939
P2=0.4799
Pst=0.7018
Example:
The management of a local newspaper is to decide whether it should continue with the
publication of 'Children Column', which had been introduced on experimental basis. For this
purpose, it is imperative to estimate the proportion of readers who would favor its continuance.
The frame consists of readers who had stayed with the paper for the last six months. Since
different attitudes are expected from the urban and rural readers, the population is stratified into
urban readers and rural readers. In the population, there are 73000 urban readers and 30280 rural
readers.
The investigator selected WOR simple random samples of 718 respondents from stratum I
(urban readers) and 298 readers from stratum II (rural readers). The number of individuals who
favor continuation of the column was 570 from stratum I and 143 from stratum II.
Sample Size Allocation
Output
P1=0.7939
P2= 0.4799
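Proportion Estimation
N1 = 73000, N2 = 30280, N = 103280, n = 1016
\[ W_1 = \frac{N_1}{N} = \frac{73000}{103280} = 0.7068, \qquad W_2 = \frac{N_2}{N} = \frac{30280}{103280} = 0.2932 \]
The sample proportions are p1 = 570/718 = 0.7939 and p2 = 143/298 = 0.4799.
\[ p_{st} = \frac{1}{N}\sum_{h=1}^{k} p_h N_h = \frac{1}{N}\left(p_1 N_1 + p_2 N_2\right) \]
Output
\[ p_{st} = 0.7018 \]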
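Estimated Variance
        Stratum 1   Stratum 2
Nh      73000       30280
nh      718         298
ph      0.7939      0.4799
Wh      0.7068      0.2932
\[ \mathrm{var}(p_{st}) = \sum_{h=1}^{k} W_h^2 \, \frac{N_h - n_h}{N_h} \, \frac{p_h q_h}{n_h - 1} \]
\[ \mathrm{var}(p_{st}) = W_1^2 \, \frac{N_1 - n_1}{N_1} \, \frac{p_1 q_1}{n_1 - 1} + W_2^2 \, \frac{N_2 - n_2}{N_2} \, \frac{p_2 q_2}{n_2 - 1} \]
Output
var(p_st) = 0.0001129 + 0.0000715 = 0.0001844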
Confidence Interval for Proportion
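The figures of the newspaper example can be turned into an interval estimate. The following is a minimal R sketch, assuming a normal approximation and a 95% level (the multiplier 1.96 is an assumption, not taken from the handout); the variable names are illustrative.

```r
# Stratum sizes, sample sizes and numbers in favour (newspaper example)
Nh <- c(73000, 30280)
nh <- c(718, 298)
ah <- c(570, 143)

Wh <- Nh / sum(Nh)          # stratum weights
ph <- ah / nh               # stratum sample proportions
p_st <- sum(Wh * ph)        # stratified estimate of the proportion

# Estimated variance with the finite population correction
v_pst <- sum(Wh^2 * ((Nh - nh) / Nh) * ph * (1 - ph) / (nh - 1))
se <- sqrt(v_pst)

z <- 1.96                   # assumed normal multiplier for a 95% interval
ci <- c(lower = p_st - z * se, upper = p_st + z * se)
round(c(p_st = p_st, var = v_pst, ci), 4)
```

Under these assumptions the interval is p_st plus or minus 1.96 times the estimated standard error.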
Systematic Sampling
Introduction:
First unit is selected randomly from first ‘k’ units and rest of the units are selected automatically.
Systematic sampling has many types but we discuss the commonly used methods i.e. the linear
and circular systematic sampling.
Group   Sample composition (unit numbers)
1       1, k+1, 2k+1, …, (n-1)k+1
…       …
k       k, 2k, 3k, …, ik, …, nk

For example, with N = 18 and n = 3 (k = 6):
Group   Units        Values
1       1, 7, 13     125, 164, 155
2       2, 8, 14     135, 169, 159
3       3, 9, 15     157, 147, 139
4       4, 10, 16    192, 150, 147
5       5, 11, 17    151, 138, 149
6       6, 12, 18    175, 167, 158

N=15, n=3
125, 135, 157, 192, 151, 175, 164, 169, 147, 150, 138, 167, 155, 159, 139
Q . Prove that sample mean is unbiased estimator of population mean
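Ans: Let y_{ri} be the ith unit of the rth systematic sample (r = 1, …, k; i = 1, …, n).

Group   Sample composition               Sample mean
1       1, k+1, 2k+1, …, (n-1)k+1        \bar{y}_1
…       …                                …
k       k, 2k, 3k, …, ik, …, nk          \bar{y}_k

Population Mean
The population mean is given by
\[ \bar{Y} = \frac{1}{nk}\sum_{r=1}^{k}\sum_{i=1}^{n} y_{ri} \]
Expectation of Sample Mean
From the table, we can see that
\[ \bar{y}_r = \frac{1}{n}\sum_{i=1}^{n} y_{ri} \]
Each of the k samples is selected with probability 1/k, so
\[ E(\bar{y}_{sy}) = \frac{1}{k}\sum_{r=1}^{k} \bar{y}_r = \frac{1}{k}\sum_{r=1}^{k}\frac{1}{n}\sum_{i=1}^{n} y_{ri} = \frac{1}{nk}\sum_{r=1}^{k}\sum_{i=1}^{n} y_{ri} = \bar{Y} \]
\[ E(\bar{y}_{sy}) = \bar{Y} \]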
Example
A certain company reports its daily production (in numbers) as
125, 135, 157, 192, 151, 175, 164, 169, 147, 150, 138, 167, 155, 159, 139, 147, 149, 158.
We are interested in selecting a systematic sample of size 3.
Here we have N=18 with n=3, so k=6.
Group   Sample           Sample mean
1       125, 164, 155    148
2       135, 169, 159    154.333
3       157, 147, 139    147.667
4       192, 150, 147    163
5       151, 138, 149    146
6       175, 167, 158    166.667
Mean of the sample means = 154.28, which equals the population mean.
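The unbiasedness result can be checked numerically by listing all k possible systematic samples of the production data. This is a small R sketch; the object names (`samples`, `group_means`) are illustrative.

```r
y <- c(125, 135, 157, 192, 151, 175, 164, 169, 147,
       150, 138, 167, 155, 159, 139, 147, 149, 158)
n <- 3; N <- length(y); k <- N / n

# Row r holds the r-th systematic sample: units r, r + k, r + 2k
samples <- t(sapply(1:k, function(r) y[seq(r, N, by = k)]))
group_means <- rowMeans(samples)
cbind(samples, mean = round(group_means, 3))

mean(group_means)   # average over the k equally likely samples
mean(y)             # population mean
```

Because each sample has probability 1/k, the plain average of the six sample means is E(ȳ_sy), and it coincides with the population mean.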
Q. Derive the Variance of Sample Mean Under Systematic Sampling?
Ans: The k possible systematic samples are

Group   Sample composition                        Sample mean
1       1, k+1, 2k+1, …, (i-1)k+1, …, (n-1)k+1    \bar{y}_1
…       …                                         …
k       k, 2k, 3k, …, ik, …, nk                   \bar{y}_k

Each sample has probability 1/k, so
\[ V(\bar{y}_{sy}) = \frac{1}{k}\sum_{r=1}^{k} (\bar{y}_r - \bar{Y})^2 \]
The population mean square is
\[ S^2 = \frac{1}{nk-1}\sum_{r=1}^{k}\sum_{i=1}^{n} (y_{ri} - \bar{Y})^2 \]
so that
\[ (nk-1)S^2 = \sum_{r=1}^{k}\sum_{i=1}^{n} (y_{ri} - \bar{Y})^2 = \sum_{r=1}^{k}\sum_{i=1}^{n} \left[(y_{ri} - \bar{y}_r) + (\bar{y}_r - \bar{Y})\right]^2 \]
The cross-product term vanishes because \sum_i (y_{ri} - \bar{y}_r) = 0, hence
\[ (nk-1)S^2 = \sum_{r=1}^{k}\sum_{i=1}^{n} (y_{ri} - \bar{y}_r)^2 + n\sum_{r=1}^{k} (\bar{y}_r - \bar{Y})^2 \]
Writing the within-sample mean square as
\[ S_w^2 = \frac{1}{k(n-1)}\sum_{r=1}^{k}\sum_{i=1}^{n} (y_{ri} - \bar{y}_r)^2 \]
we get
\[ (nk-1)S^2 = k(n-1)S_w^2 + n\sum_{r=1}^{k} (\bar{y}_r - \bar{Y})^2 = k(n-1)S_w^2 + nk\,V(\bar{y}_{sy}) \]
With N = nk,
\[ \mathrm{Var}(\bar{y}_{sy}) = \frac{N-1}{N}S^2 - \frac{k(n-1)}{N}S_w^2 \]
Example:
The heights of the 30 trees from a certain area of a forest are given by
40, 38,36,35,32,35,32,30,31,29,37,41,30,28,24,25,26,27,29,32,34,36,21,19,17,22,35,
28,29,31
Select a systematic random sample of size 5.
Estimate the mean of the population
Estimate the variance.
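We have
N=30, n=5; k=6
Group   Sample               Sample mean
1       40, 32, 30, 29, 17   29.6
2       38, 30, 28, 32, 22   30
3       36, 31, 24, 34, 35   32
4       35, 29, 25, 36, 28   30.6
5       32, 37, 26, 21, 29   29
6       35, 41, 27, 19, 31   30.6
\[ E(\bar{y}_{sy}) = \frac{1}{k}\sum_{r=1}^{k} \bar{y}_r, \qquad V(\bar{y}_{sy}) = \frac{1}{k}\sum_{r=1}^{k} (\bar{y}_r - \bar{Y})^2 \]
Sum of sample means = 181.8
Mean = 181.8/6 = 30.3
Sum of squared deviations of the sample means from 30.3 = 5.34
Variance = 5.34/6 = 0.89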
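The value 0.89 can also be reproduced from the decomposition Var(ȳ_sy) = (N-1)S²/N - k(n-1)S_w²/N. The R sketch below computes both sides for the tree heights; arranging the data in a k-row matrix is just a convenient device and the object names are illustrative.

```r
pop <- c(40, 38, 36, 35, 32, 35, 32, 30, 31, 29, 37, 41, 30, 28, 24,
         25, 26, 27, 29, 32, 34, 36, 21, 19, 17, 22, 35, 28, 29, 31)
n <- 5; N <- length(pop); k <- N / n

# matrix() fills column by column, so row r is the r-th systematic sample
grp <- matrix(pop, nrow = k)
ybar_r <- rowMeans(grp)
Ybar <- mean(pop)

v_direct <- mean((ybar_r - Ybar)^2)             # (1/k) * sum (ybar_r - Ybar)^2
S2  <- var(pop)                                 # divisor N - 1
Sw2 <- sum((grp - ybar_r)^2) / (k * (n - 1))    # within-sample mean square
v_identity <- (N - 1) / N * S2 - k * (n - 1) / N * Sw2

c(direct = v_direct, identity = v_identity)
```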
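Q . Derive the following expression for the variance of the sample mean under systematic sampling:
\[ V(\bar{y}_{sy}) = \frac{(N-1)S^2}{Nn}\left[1 + \rho_w (n-1)\right] \]
Proof:
\[ V(\bar{y}_{sy}) = \frac{1}{k}\sum_{r=1}^{k} (\bar{y}_r - \bar{Y})^2 = \frac{1}{n^2 k}\sum_{r=1}^{k}\left[\sum_{i=1}^{n} (y_{ri} - \bar{Y})\right]^2 \]
Expanding the square,
\[ V(\bar{y}_{sy}) = \frac{1}{n^2 k}\left[\sum_{r=1}^{k}\sum_{i=1}^{n} (y_{ri} - \bar{Y})^2 + \sum_{r=1}^{k}\sum_{i \ne u}^{n} (y_{ri} - \bar{Y})(y_{ru} - \bar{Y})\right] \]
\[ V(\bar{y}_{sy}) = \frac{1}{n^2 k}\left[(nk-1)S^2 + \sum_{r=1}^{k}\sum_{i \ne u}^{n} (y_{ri} - \bar{Y})(y_{ru} - \bar{Y})\right] \]
The intraclass correlation between the pairs of units that are in the same systematic sample is
\[ \rho_w = \frac{E(y_{ri} - \bar{Y})(y_{ru} - \bar{Y})}{E(y_{ri} - \bar{Y})^2} = \frac{\dfrac{1}{nk(n-1)}\sum_{r=1}^{k}\sum_{i \ne u}^{n} (y_{ri} - \bar{Y})(y_{ru} - \bar{Y})}{\dfrac{N-1}{N}S^2} \]
so that, with N = nk,
\[ \sum_{r=1}^{k}\sum_{i \ne u}^{n} (y_{ri} - \bar{Y})(y_{ru} - \bar{Y}) = \rho_w (n-1)(nk-1)S^2 \]
Substituting,
\[ V(\bar{y}_{sy}) = \frac{1}{n^2 k}\left[(nk-1)S^2 + \rho_w (n-1)(nk-1)S^2\right] = \frac{(nk-1)S^2}{n^2 k}\left[1 + \rho_w (n-1)\right] \]
OR
\[ V(\bar{y}_{sy}) = \frac{(nk-1)S^2}{nk}\cdot\frac{1}{n}\left[1 + \rho_w (n-1)\right] \]
OR
\[ V(\bar{y}_{sy}) = \frac{(N-1)S^2}{Nn}\left[1 + \rho_w (n-1)\right] \]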
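Q. Compare SRS and systematic sampling on the basis of the variance of the sample mean.
Ans:
\[ \mathrm{Var}(\bar{y}_{sy}) = \frac{N-1}{N}S^2 - \frac{k(n-1)}{N}S_w^2, \qquad \mathrm{Var}(\bar{y}_{srs}) = \frac{N-n}{Nn}S^2 \]
\[ \mathrm{Var}(\bar{y}_{srs}) - \mathrm{Var}(\bar{y}_{sy}) = \frac{N-n}{Nn}S^2 - \frac{N-1}{N}S^2 + \frac{k(n-1)}{N}S_w^2 = \left(\frac{N-n}{Nn} - \frac{N-1}{N}\right)S^2 + \frac{n-1}{n}S_w^2 \]
Since (N-n)/(Nn) - (N-1)/N = -(n-1)/n, the difference equals ((n-1)/n)(S_w^2 - S^2). Systematic sampling is therefore more precise than SRS when S_w^2 > S^2, that is, when the units within the same systematic sample are heterogeneous.
In terms of the intraclass correlation, the relative efficiency is
\[ R.E = \frac{\mathrm{Var}(\bar{y}_{srs})}{\mathrm{Var}(\bar{y}_{sy})} = \frac{\dfrac{N-n}{Nn}S^2}{\dfrac{(N-1)S^2}{Nn}\left[1 + \rho_w (n-1)\right]} = \frac{1}{\dfrac{N-1}{N-n}\left[1 + \rho_w (n-1)\right]} \]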
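To see the comparison on actual numbers, the following R sketch computes the intraclass correlation ρ_w and the relative efficiency for the tree-height population (N = 30, n = 5) used in the examples of this handout. It is only an illustration; the object names are assumptions.

```r
pop <- c(40, 38, 36, 35, 32, 35, 32, 30, 31, 29, 37, 41, 30, 28, 24,
         25, 26, 27, 29, 32, 34, 36, 21, 19, 17, 22, 35, 28, 29, 31)
n <- 5; N <- length(pop); k <- N / n
grp <- matrix(pop, nrow = k)       # row r = r-th systematic sample
S2 <- var(pop)

# Sum over r of the products (y_ri - Ybar)(y_ru - Ybar) for i != u
d <- grp - mean(pop)
cross <- sum(rowSums(d)^2) - sum(d^2)
rho_w <- (cross / (n * k * (n - 1))) / ((N - 1) / N * S2)

v_sy  <- mean((rowMeans(grp) - mean(pop))^2)
v_srs <- (N - n) / (N * n) * S2
re_direct  <- v_srs / v_sy
re_formula <- (N - n) / ((N - 1) * (1 + rho_w * (n - 1)))

c(rho_w = rho_w, RE_direct = re_direct, RE_formula = re_formula)
```

A negative ρ_w gives a relative efficiency above 1, meaning systematic sampling beats SRS for this ordering of the population.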
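Stratified Sampling in Terms of Systematic Sampling
Treat the population as n strata of k consecutive units each (the columns of the table of systematic samples) and select one unit at random from each stratum.

Group   Sample composition         Sample mean
1       1, k+1, 2k+1, …, (n-1)k+1  \bar{y}_1
…       …                          …
k       k, 2k, 3k, …, ik, …, nk    \bar{y}_k

\[ \bar{y}_{str} = \frac{1}{nk}\sum_{j=1}^{n} k\,y_j = \frac{1}{n}\sum_{j=1}^{n} y_j \]
\[ V(\bar{y}_{str}) = \frac{1}{n^2}\sum_{j=1}^{n} V(y_j) = \frac{1}{n^2}\sum_{j=1}^{n} \frac{k-1}{k}S_j^2 = \frac{k-1}{n^2 k}\sum_{j=1}^{n} S_j^2 \]
Writing S_{wst}^2 = \frac{1}{n}\sum_{j=1}^{n} S_j^2 for the average within-stratum mean square,
\[ V(\bar{y}_{str}) = \frac{k-1}{nk}S_{wst}^2 = \frac{nk-n}{n \cdot nk}S_{wst}^2 = \frac{N-n}{nN}S_{wst}^2 \]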
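Alternative form of variance
Measuring each unit from its own stratum mean \bar{y}_{.j} (note that \bar{Y} = \frac{1}{n}\sum_j \bar{y}_{.j}),
\[ V(\bar{y}_{sy}) = \frac{1}{n^2 k}\sum_{r=1}^{k}\left[\sum_{j=1}^{n} (y_{rj} - \bar{y}_{.j})\right]^2 \]
\[ V(\bar{y}_{sy}) = \frac{1}{n^2 k}\left[\sum_{r=1}^{k}\sum_{j=1}^{n} (y_{rj} - \bar{y}_{.j})^2 + \sum_{r=1}^{k}\sum_{j \ne u}^{n} (y_{rj} - \bar{y}_{.j})(y_{ru} - \bar{y}_{.u})\right] \]
\[ V(\bar{y}_{sy}) = \frac{1}{n^2 k}\left[(N-n)S_{wst}^2 + \sum_{r=1}^{k}\sum_{j \ne u}^{n} (y_{rj} - \bar{y}_{.j})(y_{ru} - \bar{y}_{.u})\right] \]
The intraclass correlation between the pairs of units that are in the same systematic sample, measured about the stratum means, is
\[ \rho_{wst} = \frac{E(y_{rj} - \bar{y}_{.j})(y_{ru} - \bar{y}_{.u})}{E(y_{rj} - \bar{y}_{.j})^2} = \frac{\dfrac{1}{nk(n-1)}\sum_{r=1}^{k}\sum_{j \ne u}^{n} (y_{rj} - \bar{y}_{.j})(y_{ru} - \bar{y}_{.u})}{\dfrac{N-n}{N}S_{wst}^2} \]
so that
\[ \sum_{r=1}^{k}\sum_{j \ne u}^{n} (y_{rj} - \bar{y}_{.j})(y_{ru} - \bar{y}_{.u}) = \rho_{wst}(n-1)(N-n)S_{wst}^2 \]
\[ V(\bar{y}_{sy}) = \frac{1}{n^2 k}\left[(N-n)S_{wst}^2 + \rho_{wst}(n-1)(N-n)S_{wst}^2\right] = \frac{(N-n)S_{wst}^2}{nN}\left[1 + \rho_{wst}(n-1)\right] \]
Since
\[ \mathrm{Var}(\bar{y}_{str}) = \frac{N-n}{Nn}S_{wst}^2, \]
the relative efficiency of systematic sampling with respect to stratified sampling is
\[ R.E = \frac{\mathrm{Var}(\bar{y}_{str})}{\mathrm{Var}(\bar{y}_{sy})} = \frac{1}{1 + \rho_{wst}(n-1)} \]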
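Q . Find the variance of the sample mean under stratified sampling (in terms of systematic sampling) for a population with linear trend.
Ans: The population values increase according to a linear trend, y_i = i (i = 1, 2, …, N). The variance of the sample mean for SRSWOR is
\[ \mathrm{Var}(\bar{y}_{wor}) = \frac{(k-1)(nk+1)}{12} \]
For Stratified Sampling in Terms of Systematic Sampling
\[ \mathrm{Var}(\bar{y}_{str}) = \frac{N-n}{Nn}S_{wst}^2, \qquad S_{wst}^2 = \frac{\sum_{r=1}^{k} (y_r - \bar{Y}_k)^2}{k-1} \]
where y_r (r = 1, …, k) are the values within a stratum and \bar{Y}_k is their mean.
\[ \sum_{r=1}^{k} (y_r - \bar{Y}_k)^2 = \sum_{r=1}^{k} \left(y_r^2 + \bar{Y}_k^2 - 2 y_r \bar{Y}_k\right) = \sum_{r=1}^{k} y_r^2 + k\bar{Y}_k^2 - 2k\bar{Y}_k^2 = \sum_{r=1}^{k} y_r^2 - k\bar{Y}_k^2 = \sum_{r=1}^{k} y_r^2 - \frac{\left(\sum_{r=1}^{k} y_r\right)^2}{k} \]
Under the linear trend the values within a stratum are, up to a constant shift, 1, 2, …, k, and
\[ \sum_{r=1}^{k} r = \frac{k(k+1)}{2}, \qquad \sum_{r=1}^{k} r^2 = \frac{k(k+1)(2k+1)}{6} \]
\[ \sum_{r=1}^{k} (y_r - \bar{Y}_k)^2 = \frac{k(k+1)(2k+1)}{6} - \frac{k^2 (k+1)^2}{4k} = \frac{k(k+1)}{2}\left[\frac{2k+1}{3} - \frac{k+1}{2}\right] = \frac{k(k+1)}{2}\cdot\frac{k-1}{6} \]
\[ S_{wst}^2 = \frac{1}{k-1}\cdot\frac{k(k+1)(k-1)}{12} = \frac{k(k+1)}{12} \]
\[ \mathrm{Var}(\bar{y}_{str}) = \frac{N-n}{Nn}S_{wst}^2 = \frac{nk-n}{n^2 k}\cdot\frac{k(k+1)}{12} = \frac{k^2-1}{12n} \]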
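Q . Describe the variance of systematic sampling for population with linear trend.
Ans: Variance of sample mean under Systematic Sampling
\[ \mathrm{Var}(\bar{y}_{sy}) = \frac{1}{k}\sum_{r=1}^{k} (\bar{y}_r - \bar{Y})^2 = \frac{1}{k}\left[\sum_{r=1}^{k} \bar{y}_r^2 - \frac{\left(\sum_{r=1}^{k} \bar{y}_r\right)^2}{k}\right] \]
With y_i = i (i = 1, 2, …, N), the k sample means differ from r = 1, 2, …, k only by a constant, so
\[ \mathrm{Var}(\bar{y}_{sy}) = \frac{1}{k}\left[\sum_{r=1}^{k} r^2 - \frac{\left(\sum_{r=1}^{k} r\right)^2}{k}\right], \qquad \sum_{r=1}^{k} r = \frac{k(k+1)}{2}, \quad \sum_{r=1}^{k} r^2 = \frac{k(k+1)(2k+1)}{6} \]
\[ \mathrm{Var}(\bar{y}_{sy}) = \frac{1}{k}\left[\frac{k(k+1)(2k+1)}{6} - \frac{k^2 (k+1)^2}{4k}\right] = \frac{k(k+1)}{2k}\left[\frac{2k+1}{3} - \frac{k+1}{2}\right] = \frac{k(k+1)}{2k}\cdot\frac{k-1}{6} = \frac{k(k+1)(k-1)}{12k} = \frac{k^2-1}{12} \]
When n=1
\[ \mathrm{Var}(\bar{y}_{wor}) = \frac{(k-1)(nk+1)}{12}, \qquad \mathrm{Var}(\bar{y}_{str}) = \frac{k^2-1}{12n}, \qquad \mathrm{Var}(\bar{y}_{sy}) = \frac{k^2-1}{12} \]
and all three variances are equal to (k^2-1)/12.
When n>1
\[ \mathrm{Var}(\bar{y}_{str}) = \frac{k^2-1}{12n} \le \mathrm{Var}(\bar{y}_{sy}) = \frac{k^2-1}{12} \le \mathrm{Var}(\bar{y}_{wor}) = \frac{(k-1)(nk+1)}{12} \]
so for a population with linear trend stratified sampling is the most precise, followed by systematic sampling and then SRSWOR.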
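The three closed-form variances for a linear-trend population can be verified numerically. The sketch below takes y_i = i with illustrative values n = 4 and k = 5 (both chosen here only for the check) and computes each variance directly.

```r
n <- 4; k <- 5; N <- n * k
y <- 1:N                              # population with linear trend

grp <- matrix(y, nrow = k)            # rows: systematic samples, columns: strata
v_sy  <- mean((rowMeans(grp) - mean(y))^2)
v_srs <- (N - n) / (N * n) * var(y)
Swst2 <- mean(apply(grp, 2, var))     # average within-stratum mean square
v_str <- (N - n) / (N * n) * Swst2

rbind(direct  = c(str = v_str, sy = v_sy, wor = v_srs),
      formula = c((k^2 - 1) / (12 * n), (k^2 - 1) / 12, (k - 1) * (n * k + 1) / 12))
```

For these values the direct and formula rows agree (0.5, 2 and 7), reproducing the ordering stratified ≤ systematic ≤ SRSWOR.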
# Daily production figures
y<-c(125,135,157,192,151,175,164,169,147,
150,138,167,155,159,139,147,149,158)
n=3;N=length(y)
k=N/n                      # sampling interval
start <- sample(1:k, 1)    # random start among the first k units
s <- seq(start, N, k)      # every k-th unit after the start
sys.sample<-y[s]
mean(sys.sample)
Output
First run=154.33
2nd run=148
Example: 2
The heights of the 30 trees from a certain area of a forest are given by
40, 38,36,35,32,35,32,30,31,29,37,41,30,28,24,25,26,27,29,32,34,36,21,19,17,22,35,28,29,31
Select a systematic random sample of size 5. Also find sample mean.
Ans:
N=30, n=5;k=6
pop<-c(40, 38,36,35,32,35,32,30,31,29,37,41,30,28,24,25,26,27,29,32,34,
36,21,19,17,22,35,28,29,31)
n=5;N=length(pop);
k=N/n
start <- sample(1:k, 1)
s <- seq(start, N, k)
sys.sample<-pop[s]
mean(sys.sample)
var(sys.sample)
sd(sys.sample)
Output
29.6(8.26)
29(sd=6.04)
Example: A certain company claims about their daily production as
125, 135, 157,192,151,175,164,169,147,150,138,167,155,159,139,147,149,158.
We are interested to select the systematic sample of size 3.
Obtain 10000 samples using systematic sampling. Find the mean of each sample. Find the mean
of means and variance of means.
Ans:
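One possible way to carry out this exercise in R is sketched below, following the same pattern as the other simulation examples in this handout. The seed value and the object names are assumptions made for illustration only.

```r
set.seed(632)   # arbitrary seed, only to make the run reproducible
y <- c(125, 135, 157, 192, 151, 175, 164, 169, 147,
       150, 138, 167, 155, 159, 139, 147, 149, 158)
n <- 3; N <- length(y); k <- N / n

m <- numeric(10000)                   # storage for the sample means
for (i in seq_along(m)) {
  start <- sample(1:k, 1)             # random start among the first k units
  s <- seq(start, N, by = k)          # systematic sample of size n
  m[i] <- mean(y[s])
}
mean(m)   # mean of means, close to the population mean
var(m)    # variance of means
```

Only six distinct samples are possible, so the simulated means take just six values; their average should settle near the population mean of 154.28.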
Example
The heights of the 30 trees from a certain area of a forest are given by
40, 38,36,35,32,35,32,30,31,29,37,41,30,28,24,25,26,27,29,32,34,36,21,19,17,22,35,
28,29,31
Select a systematic random sample of size 5.
Obtain 5000 samples using systematic sampling.
Find the mean of each sample.
Find the mean of means and variance of means.
Ans
We have
N=30, n=5;k=6
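pop<-c(40,38,36,35,32,35,32,30,31,29,37,41,30,28,24,25,26,27,29,32,34,36,21,
19,17,22,35,28,29,31)
n=5;N=length(pop);k=N/n;m=c();
for(i in 1:5000)
{
start <- sample(1:k, 1)
s <- seq(start, N, k)
sys.sample<-pop[s]
m[i]=mean(sys.sample)
}
mean(m);var(m);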
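Example: Generate a population of N = 500 values from a normal distribution with mean 20 and standard deviation 10. Select 5000 samples, each of size 50, using the systematic sampling technique and estimate the mean of each sample. Find the mean and variance of the 5000 means.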
Ans:
N=500; n=50;k=N/n;m=c();
pop<-rnorm(N,mean=20,sd=10)
for(i in 1:5000)
{
start <- sample(1:k, 1)
s <- seq(start, N, k)
sys.sample<-pop[s]
m[i]=mean(sys.sample)
}
mean(m);var(m);
Output
19.93
1.456223
Cluster Sampling
Introduction:
A cluster is a sampling unit consisting of a group of observation units.
Any sampling method can be used for the selection of clusters.
All the units within a selected cluster are studied.
(Figure: nine clusters, each of the same size.)
Clusters Settings
Cluster   1     2     3     …   j     …   M       Sample mean
1         y11   y12   y13   …   y1j   …   y1M     \bar{y}_1 = y_{1.}/M
2         y21   y22   y23   …   y2j   …   y2M     \bar{y}_2 = y_{2.}/M
…         …     …     …         …         …       …
i         yi1   yi2   yi3   …   yij   …   yiM     \bar{y}_i = y_{i.}/M
…         …     …     …         …         …       …
N         yN1   yN2   yN3   …   yNj   …   yNM     \bar{y}_N = y_{N.}/M
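Notations
Suppose we have a population of N clusters, each of size M.
NM = Total elements in population.
yij = observation value of jth element in ith cluster.
nM = Total elements in sample.
yi. = Total of ith cluster,
\[ y_{i.} = \sum_{j=1}^{M} y_{ij}, \qquad \bar{y}_i = \frac{y_{i.}}{M} \]
Unbiased mean estimator
The n clusters are selected by simple random sampling, so the sample mean of the cluster totals is unbiased for the population mean of the cluster totals, M\bar{Y}:
\[ M\,E(\bar{y}) = E\left(\frac{1}{n}\sum_{i=1}^{n} y_{i.}\right) = M\bar{Y} \]
\[ E(\bar{y}) = \bar{Y} \]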
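Q. Find the variance of Sample Mean in Cluster Sampling considering equal cluster sizes.
Ans:
Unbiased mean estimator
The unbiased mean estimator is
\[ \bar{y} = \frac{\sum_{i=1}^{n}\sum_{j=1}^{M} y_{ij}}{Mn} \]
\[ V(\bar{y}) = E(\bar{y} - \bar{Y})^2 = \frac{1-f}{n}S_b^2, \qquad S_b^2 = \frac{\sum_{i=1}^{N} (\bar{y}_i - \bar{Y})^2}{N-1} \]
\[ V(\bar{y}) = \frac{1-f}{n}\cdot\frac{\sum_{i=1}^{N} (\bar{y}_i - \bar{Y})^2}{N-1} \]
Now
\[ \sum_{i=1}^{N} (\bar{y}_i - \bar{Y})^2 = \sum_{i=1}^{N}\left[\frac{1}{M}\sum_{j=1}^{M} (y_{ij} - \bar{Y})\right]^2 = \frac{1}{M^2}\left[\sum_{i=1}^{N}\sum_{j=1}^{M} (y_{ij} - \bar{Y})^2 + \sum_{i=1}^{N}\sum_{j \ne k}^{M} (y_{ij} - \bar{Y})(y_{ik} - \bar{Y})\right] \]
\[ M^2\sum_{i=1}^{N} (\bar{y}_i - \bar{Y})^2 = (NM-1)S^2 + \sum_{i=1}^{N}\sum_{j \ne k}^{M} (y_{ij} - \bar{Y})(y_{ik} - \bar{Y}) \]
The intraclass correlation between the elements within a cluster is
\[ \rho_w = \frac{E(y_{ij} - \bar{Y})(y_{ik} - \bar{Y})}{E(y_{ij} - \bar{Y})^2} = \frac{\dfrac{1}{NM(M-1)}\sum_{i=1}^{N}\sum_{j \ne k}^{M} (y_{ij} - \bar{Y})(y_{ik} - \bar{Y})}{\dfrac{NM-1}{NM}S^2} \]
so that
\[ \sum_{i=1}^{N}\sum_{j \ne k}^{M} (y_{ij} - \bar{Y})(y_{ik} - \bar{Y}) = \rho_w (M-1)(NM-1)S^2 \]
\[ M^2\sum_{i=1}^{N} (\bar{y}_i - \bar{Y})^2 = (NM-1)S^2 + \rho_w (M-1)(NM-1)S^2 \]
Therefore
\[ V(\bar{y}) = \frac{1}{M^2}\cdot\frac{1-f}{n}\cdot\frac{(NM-1)S^2\left[1 + \rho_w (M-1)\right]}{N-1} \]
For large N, NM-1 ≈ NM and N-1 ≈ N, so that (ignoring the finite population correction)
\[ V(\bar{y}) \approx \frac{S^2}{nM}\left[1 + \rho_w (M-1)\right] \]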
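Q. Compare simple random sampling and cluster sampling in terms of the variances of the sample means, where for a simple random sample of nM elements
\[ V(\bar{y}_{srs}) = \frac{NM - nM}{NM}\cdot\frac{S^2}{nM} \]
Ans:
Comparison
\[ V(\bar{y}_{srs}) = \frac{NM - nM}{NM}\cdot\frac{S^2}{nM} = \frac{1-f}{nM}S^2, \qquad S^2 = \frac{1}{NM-1}\sum_{i=1}^{N}\sum_{j=1}^{M} (y_{ij} - \bar{Y})^2 \]
Mean sum of squares within clusters in the population
\[ S_w^2 = \frac{1}{N(M-1)}\sum_{i=1}^{N}\sum_{j=1}^{M} (y_{ij} - \bar{y}_i)^2 \]
Decomposition of the total sum of squares
\[ (NM-1)S^2 = \sum_{i=1}^{N}\sum_{j=1}^{M} (y_{ij} - \bar{Y})^2 = \sum_{i=1}^{N}\sum_{j=1}^{M} \left[(y_{ij} - \bar{y}_i) + (\bar{y}_i - \bar{Y})\right]^2 \]
\[ = \sum_{i=1}^{N}\sum_{j=1}^{M} (y_{ij} - \bar{y}_i)^2 + M\sum_{i=1}^{N} (\bar{y}_i - \bar{Y})^2 = N(M-1)S_w^2 + M(N-1)S_b^2 \]
\[ R.E = \frac{\mathrm{Var}(\bar{y}_{srs})}{\mathrm{Var}(\bar{y})} = \frac{S^2}{M S_b^2} = \frac{N(M-1)S_w^2 + M(N-1)S_b^2}{M S_b^2 (NM-1)} \]
\[ R.E = \frac{1}{NM-1}\left[\frac{N(M-1)S_w^2}{M S_b^2} + (N-1)\right] \]
This value increases when S_w is large and S_b is small. So cluster sampling will be efficient if the clusters are formed so that the variation between cluster means is as small as possible while the variation within the clusters is as large as possible.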
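Q. Compare simple random sampling and cluster sampling in terms of the intraclass correlation.
Ans:
Ignoring the finite population correction,
\[ V(\bar{y}) = \frac{S^2}{nM}\left[1 + \rho_w (M-1)\right], \qquad V(\bar{y}_{srs}) = \frac{S^2}{nM} \]
\[ R.E = \frac{\mathrm{Var}(\bar{y}_{srs})}{\mathrm{Var}(\bar{y})} = \frac{\dfrac{S^2}{nM}}{\dfrac{S^2}{nM}\left[1 + \rho_w (M-1)\right]} = \frac{1}{1 + \rho_w (M-1)} \]
Cluster sampling is therefore more efficient than SRS only when \rho_w is negative, and loses efficiency as \rho_w (M-1) grows.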
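The quantities in this comparison can be computed for the six equal-sized clusters (N = 6, M = 5) that appear in the worked examples of this handout. The R sketch below is illustrative only; the object names are assumptions, and the large-N formula 1/[1 + ρ_w(M-1)] is shown alongside the exact ratio S²/(M S_b²).

```r
clu <- rbind(c(125, 115, 129, 134, 111),
             c(134, 125, 142, 141, 131),
             c(144, 143, 122, 134, 126),
             c(114, 111, 134, 131, 146),
             c(119, 126, 122, 129, 130),
             c(140, 125, 124, 124, 115))
N <- nrow(clu); M <- ncol(clu)
Ybar   <- mean(clu)
ybar_i <- rowMeans(clu)

S2  <- var(as.vector(clu))                       # divisor NM - 1
Sb2 <- sum((ybar_i - Ybar)^2) / (N - 1)          # between cluster means
Sw2 <- sum((clu - ybar_i)^2) / (N * (M - 1))     # within clusters

d <- clu - Ybar
cross <- sum(rowSums(d)^2) - sum(d^2)            # products with j != k
rho_w <- (cross / (N * M * (M - 1))) / ((N * M - 1) / (N * M) * S2)

c(RE_exact = S2 / (M * Sb2), RE_largeN = 1 / (1 + rho_w * (M - 1)), rho_w = rho_w)
```

With only six clusters the two relative-efficiency figures need not coincide, since the second relies on NM-1 ≈ NM and N-1 ≈ N.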
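Unequal Cluster Sizes
Cluster   1     2     3     …   j     …   Mi       Sample mean
1         y11   y12   y13   …   y1j   …   y1M1     \bar{y}_1 = y_{1.}/M_1
2         y21   y22   y23   …   y2j   …   y2M2     \bar{y}_2 = y_{2.}/M_2
…         …     …     …         …         …        …
i         yi1   yi2   yi3   …   yij   …   yiMi     \bar{y}_i = y_{i.}/M_i
…         …     …     …         …         …        …
N         yN1   yN2   yN3   …   yNj   …   yNMN     \bar{y}_N = y_{N.}/M_N
Mean Estimator
Since the cluster sizes are unequal, the total size is
\[ M = \sum_{i=1}^{N} M_i \]
and the population mean is
\[ \bar{Y} = \frac{\sum_{i=1}^{N} M_i \bar{y}_i}{M} \]
Expected Value of Sample Mean
The bias of an estimator T of \theta is Bias(T) = E(T) - \theta. For the mean of cluster means, E(\bar{y}) = \frac{1}{N}\sum_{i=1}^{N} \bar{y}_i, so
\[ \mathrm{Bias}(\bar{y}) = E(\bar{y}) - \bar{Y} = \frac{\sum_{i=1}^{N} \bar{y}_i}{N} - \frac{\sum_{i=1}^{N} M_i \bar{y}_i}{M} = \frac{1}{MN}\left[M\sum_{i=1}^{N} \bar{y}_i - N\sum_{i=1}^{N} M_i \bar{y}_i\right] \]
\[ \mathrm{Bias}(\bar{y}) = -\frac{1}{M}\left[\sum_{i=1}^{N} M_i \bar{y}_i - \frac{1}{N}\sum_{i=1}^{N} M_i \sum_{i=1}^{N} \bar{y}_i\right] = -\frac{N-1}{M}\,\mathrm{Cov}(M_i, \bar{y}_i) \]
where Cov(M_i, \bar{y}_i) is the covariance (divisor N-1) between the cluster sizes and the cluster means. The mean of cluster means is therefore biased unless the cluster sizes and cluster means are uncorrelated.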
Q. Find the expression of the mean square error for the mean estimator in cluster sampling for unequal cluster sizes.
\[ MSE(\bar{y}) = \frac{1-f}{n}S_b^2 + \left[\frac{N-1}{M}\,\mathrm{Cov}(m, \bar{y})\right]^2 \]
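Q. Find the expected value of the weighted mean for unequal clusters, where the weighted mean is given by
\[ \bar{y}_w = \frac{\sum_{i=1}^{n} M_i \bar{y}_i}{n\bar{M}} \]
Answer:
Weighted Mean
Since the cluster sizes are unequal, the mean cluster size is
\[ \bar{M} = \frac{M}{N} \]
The weighted mean based on the size of the ith cluster is
\[ \bar{y}_w = \frac{\sum_{i=1}^{n} M_i \bar{y}_i}{n\bar{M}} \]
\[ E(\bar{y}_w) = E\left[\frac{\sum_{i=1}^{n} M_i \bar{y}_i}{n\bar{M}}\right] = \frac{1}{\bar{M}}\cdot\frac{1}{N}\sum_{i=1}^{N} M_i \bar{y}_i = \frac{\sum_{i=1}^{N} M_i \bar{y}_i}{M} = \bar{Y} \]
so the weighted mean is unbiased for the population mean.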
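The contrast between the weighted mean and the simple mean of cluster means can be checked by enumerating every possible sample. The R sketch below uses the six unequal clusters from the examples of this handout with n = 3; the object names are illustrative assumptions.

```r
Clu <- list(c(125, 115, 129, 134),
            c(134, 125, 142, 141, 131, 151, 164, 139, 141),
            c(144, 143, 122, 134, 126, 157),
            c(114, 111, 134, 131, 146, 152, 131),
            c(119, 126, 122, 129, 130),
            c(140, 125, 124, 124, 115, 111, 148, 157, 143, 151))
M <- lengths(Clu)                  # cluster sizes M_i
ybar_i <- sapply(Clu, mean)        # cluster means
N <- length(Clu); n <- 3; Mbar <- mean(M)
Ybar <- sum(M * ybar_i) / sum(M)   # population mean

S  <- combn(N, n)                  # all equally likely samples of n clusters
yw <- apply(S, 2, function(s) sum(M[s] * ybar_i[s]) / (n * Mbar))
ym <- apply(S, 2, function(s) mean(ybar_i[s]))

c(E_weighted = mean(yw), E_mean_of_means = mean(ym), Ybar = Ybar)
```

The average of the weighted means over all 20 samples equals the population mean, while the average of the simple means of cluster means does not.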
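Q. Find the variance expression of the weighted mean for unequal clusters, where the weighted mean is given by
\[ \bar{y}_w = \frac{\sum_{i=1}^{n} M_i \bar{y}_i}{n\bar{M}} \]
Ans:
\[ V(\bar{y}_w) = V\left[\frac{\sum_{i=1}^{n} M_i \bar{y}_i}{n\bar{M}}\right] = \frac{1-f}{n}S_{bw}^2, \qquad S_{bw}^2 = \frac{1}{N-1}\sum_{i=1}^{N}\left(\frac{M_i \bar{y}_i}{\bar{M}} - \bar{Y}\right)^2 \]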
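Example: Find the mean and the variance of the sample mean in cluster sampling using the following data set (N = 6 clusters, each of size M = 5, with n = 3 clusters sampled).
Cluster      Observations           Total
Cluster-1    125 115 129 134 111    614
Cluster-2    134 125 142 141 131    673
Cluster-3    144 143 122 134 126    669
Cluster-4    114 111 134 131 146    636
Cluster-5    119 126 122 129 130    626
Cluster-6    140 125 124 124 115    628
Total                               3846
Ans:
Population Mean
\[ \bar{Y} = \frac{\sum_{i=1}^{N}\sum_{j=1}^{M} Y_{ij}}{MN} = \frac{3846}{30} = 128.2 \]
Variance of Sample Mean Using Population
\[ \mathrm{Var}(\bar{y}) = \frac{1-f}{n}S_b^2, \qquad S_b^2 = \frac{\sum_{i=1}^{N} (\bar{y}_i - \bar{Y})^2}{N-1} \]
\[ \mathrm{Var}(\bar{y}) = \frac{N-n}{Nn}\cdot\frac{1}{N-1}\sum_{i=1}^{N} (\bar{y}_i - \bar{Y})^2 \]
Cluster   y_i.    \bar{y}_i   (\bar{y}_i - \bar{Y})^2
1         614     122.8       29.16
2         673     134.6       40.96
3         669     133.8       31.36
4         636     127.2       1.00
5         626     125.2       9.00
6         628     125.6       6.76
Total     3846                118.24
\[ \mathrm{Var}(\bar{y}) = \frac{6-3}{6 \times 3}\cdot\frac{1}{6-1}(118.24) = 3.941 \]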
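Estimated Variance
Suppose the sample consists of clusters 3, 4 and 6.
Cluster   y_i.    \bar{y}_i   (\bar{y}_i - \bar{y})^2
3         669     133.8       24.34
4         636     127.2       2.78
6         628     125.6       10.67
Total     1933                37.78
\[ \bar{y} = \frac{1933}{15} = 128.87 \]
\[ \mathrm{var}(\bar{y}) = \frac{N-n}{Nn}\cdot\frac{1}{n-1}\sum_{i=1}^{3} (\bar{y}_i - \bar{y})^2 = \frac{6-3}{6 \times 3}\cdot\frac{1}{2}(37.78) = 3.15 \]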
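Example-2
A population of 420 trees is divided into 105 clusters.
Each cluster is of size 4.
A simple random sample of 15 clusters is selected.
Estimate the mean yield by using cluster sampling.
Sample of 15 Clusters
\[ \bar{y} = \frac{\sum_{i=1}^{n}\sum_{j=1}^{M} y_{ij}}{Mn} = \frac{1142.44}{60} = 19.0407 \]
Estimated Variance
\[ V(\bar{y}) = \frac{1-f}{n}S_b^2, \qquad S_b^2 = \frac{\sum_{i=1}^{N} (\bar{y}_i - \bar{Y})^2}{N-1} \]
The sample estimate replaces S_b^2 by the mean square between the sampled cluster means:
\[ \mathrm{var}(\bar{y}) = \frac{N-n}{Nn}\cdot\frac{1}{n-1}\sum_{i=1}^{n} (\bar{y}_i - \bar{y})^2 \]
SS = \sum_{i=1}^{n} (\bar{y}_i - \bar{y})^2 = 1495.5596
\[ \mathrm{var}(\bar{y}) = \frac{90}{105 \times 15}\cdot\frac{1}{14}(1495.5596) = 6.104 \]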
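Q. Define the following data in R in the form of clusters and find the population mean and the variance of the sample mean.
Cluster      Observations           Total
Cluster-1    125 115 129 134 111    614
Cluster-2    134 125 142 141 131    673
Cluster-3    144 143 122 134 126    669
Cluster-4    114 111 134 131 146    636
Cluster-5    119 126 122 129 130    626
Cluster-6    140 125 124 124 115    628
Ans:
How to Do This in R?
#----Defining clusters----
Clu1<-c(125,115,129,134,111)
Clu2<-c(134,125,142,141,131)
Clu3<-c(144,143,122,134,126)
Clu4<-c(114,111,134,131,146)
Clu5<-c(119,126,122,129,130)
Clu6<-c(140,125,124,124,115)
pop<-c(Clu1,Clu2,Clu3,Clu4,Clu5,Clu6)
#----Sum of clusters----
sumc1<-sum(Clu1)
sumc2<-sum(Clu2)
sumc3<-sum(Clu3)
sumc4<-sum(Clu4)
sumc5<-sum(Clu5)
sumc6<-sum(Clu6)
#----Grand total----
Yi=c(sumc1,sumc2,sumc3,sumc4,sumc5,sumc6)
t.sum=sum(Yi)
n=3;N=6;M=5;
#----Mean of each cluster----
Yibar<-Yi/M
#----Population Mean----
pop.mean=sum(Yi)/(N*M)
#----Sum of Squares----
dv.p<-(Yibar-pop.mean)^2
sdv.p<-sum(dv.p)
vr.p<-sdv.p/(N-1)
#----The Variance----
cvr.p<-((N-n)/(N*n))*vr.p
The last line evaluates
\[ \mathrm{Var}(\bar{y}) = \frac{N-n}{Nn}\cdot\frac{1}{N-1}\sum_{i=1}^{N} (\bar{y}_i - \bar{Y})^2 \]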
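Q. Perform the cluster sampling using R language with following data set. Also find mean and variance.
Cluster-1    125 115 129 134 111
Cluster-2    134 125 142 141 131
Cluster-3    144 143 122 134 126
Cluster-4    114 111 134 131 146
Cluster-5    119 126 122 129 130
Cluster-6    140 125 124 124 115
Answer:
Defining Clusters
Clu1<-c(125,115,129,134,111)
Clu2<-c(134,125,142,141,131)
Clu3<-c(144,143,122,134,126)
Clu4<-c(114,111,134,131,146)
Clu5<-c(119,126,122,129,130)
Clu6<-c(140,125,124,124,115)
Sum Of Clusters
#----Population----
pop<-c(Clu1,Clu2,Clu3,Clu4,Clu5,Clu6)
#----Sum of clusters----
sumc1<-sum(Clu1)
sumc2<-sum(Clu2)
sumc3<-sum(Clu3)
sumc4<-sum(Clu4)
sumc5<-sum(Clu5)
sumc6<-sum(Clu6)
#----Grand total----
Yi=c(sumc1,sumc2,sumc3,sumc4,sumc5,sumc6)
t.sum=sum(Yi)
#------Sampling----
n=3;N=6;M=5;
yi=sample(Yi,n)
#----Estimated Mean----
yibar<-yi/M
Est.mean=sum(yi)/(n*M)
Estimated Variance
#----Sum of Squares----
dv<-(yibar-Est.mean)^2
sdv<-sum(dv)
vr<-sdv/(n-1)
#----The Variance----
cvr<-((N-n)/(N*n))*vr
The last line evaluates
\[ \mathrm{var}(\bar{y}) = \frac{N-n}{Nn}\cdot\frac{1}{n-1}\sum_{i=1}^{n} (\bar{y}_i - \bar{y})^2 \]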
Q.70 Find the mean and variance for Unequal Clusters in cluster sampling using the following data
Clus-1 125 115 129 134
Clus-2 134 125 142 141 131 151 164 139 141
Clus-3 144 143 122 134 126 157
Clus-4 114 111 134 131 146 152 131
Clus-5 119 126 122 129 130
Clus-6 140 125 124 124 115 111 148 157 143 151
Ans
Population of Six Clusters
\[ \bar{Y} = \frac{\sum_{i=1}^{N} M_i \bar{y}_i}{M} = \frac{5480}{41} = 133.66 \]
Estimator-I: Mean of Cluster Means
The mean of cluster means is
\[ \bar{y} = \frac{\sum_{i=1}^{n} \bar{y}_i}{n} \]
Its mean square error is
\[ MSE(\bar{y}) = \frac{1-f}{n}S_b^2 + \left[\frac{N-1}{M}\,\mathrm{Cov}(m, \bar{y})\right]^2 \]
Q. Find the weighted mean and variance for Unequal Clusters in cluster sampling using the following data
Answer:
Population of Six Clusters
Clus-1 125 115 129 134
Clus-2 134 125 142 141 131 151 164 139 141
Clus-3 144 143 122 134 126 157
Clus-4 114 111 134 131 146 152 131
Clus-5 119 126 122 129 130
Clus-6 140 125 124 124 115 111 148 157 143 151
Population Mean in Case of Unequal Clusters
\[ \bar{Y} = \frac{\sum_{i=1}^{N} M_i \bar{y}_i}{M} = 133.66, \qquad M = \sum_{i=1}^{N} M_i = 41, \qquad \bar{M} = \frac{M}{N} = 6.83 \]
Estimator II: Weighted Mean
The weighted mean based on the size of the ith cluster is
\[ \bar{y}_w = \frac{\sum_{i=1}^{n} M_i \bar{y}_i}{n\bar{M}} \]
with variance
\[ V(\bar{y}_w) = V\left[\frac{\sum_{i=1}^{n} M_i \bar{y}_i}{n\bar{M}}\right] = \frac{1-f}{n}S_{bw}^2, \qquad S_{bw}^2 = \frac{1}{N-1}\sum_{i=1}^{N}\left(\frac{M_i \bar{y}_i}{\bar{M}} - \bar{Y}\right)^2 \]
\bar{y}_i    M_i   M_i \bar{y}_i / \bar{M}
125.75       4     73.61
140.89       9     185.56
137.67       6     120.88
131.29       7     134.49
125.20       5     91.61
133.80       10    195.80
Q.72 Perform the cluster sampling using R language with following data set. Also find
mean and variance.
Clu1<-c(125,115,129,134)
Clu2<-c(134,125,142,141,131,151,164,139,141)
Clu3<-c(144,143,122,134,126,157)
Clu4<-c(114,111,134,131,146,152,131)
Clu5<-c(119,126,122,129,130)
Clu6<-c(140,125,124,124,115,111,148,157,143,151)
Population Mean and Sum Of Clusters
#----Population----
pop<-c(Clu1,Clu2,Clu3,Clu4,Clu5,Clu6)
pop.mean=mean(pop)
#----Sum of clusters----
sumc1<-sum(Clu1)
sumc2<-sum(Clu2)
sumc3<-sum(Clu3)
sumc4<-sum(Clu4)
sumc5<-sum(Clu5)
sumc6<-sum(Clu6)
Mean of Cluster Means
#----Grand total----
n=3;N=6;M=c(4,9,6,7,5,10);
Yi=c(sumc1,sumc2,sumc3,sumc4,sumc5,sumc6)
t.sum=sum(Yi)
# ----- Mean of cluster means using population----
clu.mean.p=Yi/M
m.c.m.p=mean(clu.mean.p)
##----For MSE----
cov.p=cov(clu.mean.p,M)   # covariance between cluster means and sizes
dv.p<-(clu.mean.p-m.c.m.p)^2
sdv.p<-sum(dv.p)
vr.p<-sdv.p/(N-1)
term1<-((N-n)/(N*n))*vr.p
term2=((-(N-1)/sum(M))*cov.p)^2   # squared bias
mse=term1+term2
####-----From Sample----
j=sample(1:6,n)
yi=Yi[j];mi=M[j]
clu.mean=yi/mi
m.clu.mean=mean(clu.mean)
Q. Perform the Simulation Study using the following data for Equal Cluster Sizes.
Cluster      Observations           Total
Cluster-1    125 115 129 134 111    614
Cluster-2    134 125 142 141 131    673
Cluster-3    144 143 122 134 126    669
Cluster-4    114 111 134 131 146    636
Cluster-5    119 126 122 129 130    626
Cluster-6    140 125 124 124 115    628
Answer:
How to Do This in R?
#----Defining clusters----
Clu1<-c(125,115,129,134,111)
Clu2<-c(134,125,142,141,131)
Clu3<-c(144,143,122,134,126)
Clu4<-c(114,111,134,131,146)
Clu5<-c(119,126,122,129,130)
Clu6<-c(140,125,124,124,115)
#----Population----
pop<-c(Clu1,Clu2,Clu3,Clu4,Clu5,Clu6)
#----Sum of clusters----
sumc1<-sum(Clu1)
sumc2<-sum(Clu2)
sumc3<-sum(Clu3)
sumc4<-sum(Clu4)
sumc5<-sum(Clu5)
sumc6<-sum(Clu6)
Sampling and Estimated Mean
#----Grand total----
Yi=c(sumc1,sumc2,sumc3,sumc4,sumc5,sumc6)
t.sum=sum(Yi)
n=3;N=6;M=5;Est.mean=c();
for(i in 1:10000)
{yi=sample(Yi,n)
Est.mean[i]=sum(yi)/(n*M) }
mean(Est.mean)
var(Est.mean)
Q. Perform the Simulation Study using the following data for Unequal Cluster Sizes.
Answer:
Defining Clusters
#----------Defining Clusters------------
Clu1<-c(125,115,129,134)
Clu2<-c(134,125,142,141,131,151,164,139,141)
Clu3<-c(144,143,122,134,126,157)
Clu4<-c(114,111,134,131,146,152,131)
Clu5<-c(119,126,122,129,130)
Clu6<-c(140,125,124,124,115,111,148,157,143,151)
Population Mean and Sum Of Clusters
#----Population----
pop<-c(Clu1,Clu2,Clu3,Clu4,Clu5,Clu6)
pop.mean=mean(pop)
#----Sum of clusters----
sumc1<-sum(Clu1)
sumc2<-sum(Clu2)
sumc3<-sum(Clu3)
sumc4<-sum(Clu4)
sumc5<-sum(Clu5)
sumc6<-sum(Clu6)
Sum of Clusters
#----Grand total----
n=3;N=6;M=c(4,9,6,7,5,10);
Yi=c(sumc1,sumc2,sumc3,sumc4,sumc5,sumc6)
t.sum=sum(Yi)
With Mean of Cluster Means
m.clu.mean=c();
for(i in 1:10000)
{j=sample(1:6,n)
yi=Yi[j];mi=M[j]
clu.mean=yi/mi
m.clu.mean[i]=mean(clu.mean) }
mean(m.clu.mean)
var(m.clu.mean)
Weighted Mean
\[ \bar{y}_w = \frac{\sum_{i=1}^{n} M_i \bar{y}_i}{n\bar{M}} \]
#----Mean of clusters----
mc1<-mean(Clu1)
mc2<-mean(Clu2)
mc3<-mean(Clu3)
mc4<-mean(Clu4)
mc5<-mean(Clu5)
mc6<-mean(Clu6)
mYi=c(mc1,mc2,mc3,mc4,mc5,mc6)
With Weighted Mean
w.m=c(); Mbar=mean(M);
for(i in 1:10000)
{j=sample(1:6,n)
myi=mYi[j];mi=M[j];
w.m[i]=sum(mi*myi)/(n*Mbar) }
mean(w.m)
var(w.m)
Introduction:
Hansen and Hurwitz (1943) were perhaps the first to discuss the theory of sampling with unequal probabilities.
The estimator for population total Y as suggested by Hansen and Hurwitz (1943) is
\[ \hat{y}_{HH} = \frac{1}{n}\sum_{i=1}^{n} \frac{y_i}{p_i}, \]
where p_i is the selection probability of the ith sampled unit.
Computing HH Estimator
5 11 11/90 80-90
Total 90
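Q. Prove that the Hansen-Hurwitz Estimator is unbiased for the population total. Also find the variance expression.
Answer:
The estimator for population total Y as suggested by Hansen and Hurwitz (1943) is
\[ \hat{y}_{HH} = \frac{1}{n}\sum_{i=1}^{n} \frac{y_i}{p_i} \]
\[ E(\hat{y}_{HH}) = \frac{1}{n}\sum_{i=1}^{n} E\left(\frac{y_i}{p_i}\right) \]
At each draw the ith population unit is selected with probability P_i, so
\[ E\left(\frac{y_i}{p_i}\right) = \sum_{i=1}^{N} \frac{Y_i}{P_i}P_i = Y \]
and hence E(\hat{y}_{HH}) = Y.
For the variance, the n draws are independent, so
\[ V\left(\frac{y_i}{p_i}\right) = \sum_{i=1}^{N} P_i\left(\frac{Y_i}{P_i} - Y\right)^2, \qquad V(\hat{y}_{HH}) = \frac{1}{n}\sum_{i=1}^{N} P_i\left(\frac{Y_i}{P_i} - Y\right)^2 \]
Another Form of Variance
\[ \mathrm{Var}(\hat{y}_{HH}) = \frac{1}{n}\left[\sum_{i=1}^{N} \frac{Y_i^2}{P_i} - Y^2\right] \]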
Q. Find all possible samples of size two using the following data and find the Hansen-Hurwitz Estimator for each. Also find the mean and variance of the Hansen-Hurwitz Estimator.
Yi   0.5   1.2   2.1   3.2
Zi   1     2     3     4
Answer:
The following is a population of four values with their respective sizes.
Yi    Zi   Pi = Zi/ΣZi
0.5   1    0.1
1.2   2    0.2
2.1   3    0.3
3.2   4    0.4
The HH estimator is calculated for all the samples, and then its mean and variance are obtained.
\[ E(\hat{y}_{HH}) = \sum \hat{y}_{HH}\, p_i p_j = 7 = Y \]
\[ \mathrm{Var}(\hat{y}_{PPS}) = E(\hat{y}_{PPS}^2) - Y^2 = 49.5 - 49 = 0.5 \]
Using the Formula
\[ \mathrm{Var}(\hat{y}_{PPS}) = \frac{1}{n}\left[\sum_{i=1}^{N} \frac{Y_i^2}{P_i} - Y^2\right] = \frac{1}{2}(50 - 49) = 0.50 \]
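The enumeration of all samples of size two can be written compactly in R. The sketch below lists the 16 ordered with-replacement samples, weights each by p_i p_j, and recovers the mean and variance of the HH estimator; the object names are illustrative.

```r
y <- c(0.5, 1.2, 2.1, 3.2)
z <- c(1, 2, 3, 4)
p <- z / sum(z)
n <- 2

idx  <- expand.grid(i = 1:4, j = 1:4)              # all ordered samples (with replacement)
est  <- (y[idx$i] / p[idx$i] + y[idx$j] / p[idx$j]) / n
prob <- p[idx$i] * p[idx$j]                        # probability of each ordered sample

E_hh <- sum(est * prob)
V_hh <- sum(est^2 * prob) - E_hh^2
V_formula <- (sum(y^2 / p) - sum(y)^2) / n

c(E = E_hh, Var = V_hh, Var_formula = V_formula)
```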
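Q. Describe Lahiri's Method of selection?
Answer:
Lahiri's Method of Selection
A pair of random numbers is chosen: one, say i, from 1 to N and the other, say R, from 1 to Zmax, where Zmax is the largest size in the population.
o If R exceeds the size Zi of the ith unit, then that unit is rejected and a new pair is drawn; otherwise the ith unit is accepted.
o The probability that the ith unit is selected on a given trial is
\[ P(\text{ith unit}) = \frac{1}{N}\cdot\frac{Z_i}{Z_{max}} \]
which is proportional to Zi.
Sr.No   Yi    Zi
1       0.5   15
2       1.2   20
3       2.1   7
4       3.2   13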
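The accept/reject rule above can be sketched in R as follows, using the sizes Zi from the table. The function name `lahiri_draw`, the seed and the number of repetitions are hypothetical choices made only for this illustration; the sizes are assumed to be positive integers.

```r
# One draw by Lahiri's method: returns the index of the selected unit
lahiri_draw <- function(z) {
  N <- length(z); zmax <- max(z)
  repeat {
    i <- sample.int(N, 1)       # random unit number between 1 and N
    R <- sample.int(zmax, 1)    # random number between 1 and Zmax
    if (R <= z[i]) return(i)    # accept; otherwise reject and draw a new pair
  }
}

set.seed(1)                     # arbitrary seed
z <- c(15, 20, 7, 13)
draws <- replicate(20000, lahiri_draw(z))
freq <- tabulate(draws, nbins = length(z)) / length(draws)

rbind(observed = round(freq, 3), target = round(z / sum(z), 3))
```

The observed selection frequencies should be close to Zi/ΣZi, without the cumulative totals ever being formed.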
Q. Example for Lahiri's Method
\[ \hat{y}_{PPS} = \frac{1}{n}\sum_{i=1}^{5} \frac{y_i}{p_i} = \frac{144179.05}{5} \approx 28836 \]
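Suppose the ith unit is not selected at the first draw but the jth unit is selected (j ≠ i). The probability of selecting the jth unit at the first draw is Pj = Zj/Z.
The conditional probability of selecting the ith unit at the second draw is
\[ \frac{P_i}{1 - P_j} \]
The probability that the ith unit is included in the sample at the second draw is the sum, over j ≠ i, of the probability that the jth unit is selected at the first draw times the probability that the ith unit is selected at the second draw given that the jth unit was selected first, i.e.
\[ \sum_{j \ne i}^{N} P_j \frac{P_i}{1 - P_j} \]
The total probability \pi_i, the probability of inclusion of the ith population unit in the sample, is
\[ \pi_i = P_i + \sum_{j \ne i}^{N} P_j \frac{P_i}{1 - P_j} = P_i\left[1 + \sum_{j=1}^{N} \frac{P_j}{1 - P_j} - \frac{P_i}{1 - P_i}\right] \]
The probability that both the ith and jth units are in the sample is denoted by \pi_{ij} and is defined as
\[ \pi_{ij} = P_i \frac{P_j}{1 - P_i} + P_j \frac{P_i}{1 - P_j} = P_i P_j\left[\frac{1}{1 - P_i} + \frac{1}{1 - P_j}\right] \]
The inclusion probabilities satisfy the following relations:
i. \sum_{i=1}^{N} \pi_i = n
ii. \sum_{j \ne i}^{N} \pi_{ij} = (n-1)\pi_i
iii. \sum_{j \ne i}^{N} (\pi_{ij} - \pi_i \pi_j) = -\pi_i (1 - \pi_i)
iv. \sum_{i=1}^{N}\sum_{j \ne i}^{N} \pi_i \pi_j = n^2 - \sum_{i=1}^{N} \pi_i^2
Relation (i)
• Let a_i = 1 if the ith unit is in the sample and 0 otherwise. We know \sum_{i=1}^{N} a_i = n.
• Taking expectation, \sum_{i=1}^{N} \pi_i = n.
Relation (ii)
\sum_{j \ne i} \pi_{ij} is the sum of the probabilities of all the samples containing the ith and jth units. For i = 1 it is the sum of the probabilities of the samples containing the first and second units; the first and third units; and so on.
Thus every P(s) containing the first unit occurs (n-1) times in this sum, as the sample has (n-1) other members in it and it occurs once for each of these members. Hence
\[ \sum_{j \ne i}^{N} \pi_{ij} = (n-1)\pi_i \]
Relation (iii)
Taking L.H.S
\[ \sum_{j \ne i}^{N} (\pi_{ij} - \pi_i \pi_j) = \sum_{j \ne i}^{N} \pi_{ij} - \pi_i \sum_{j \ne i}^{N} \pi_j = \sum_{j \ne i}^{N} \pi_{ij} - \pi_i\left(\sum_{j=1}^{N} \pi_j - \pi_i\right) \]
\[ = (n-1)\pi_i - \pi_i (n - \pi_i) = -\pi_i (1 - \pi_i) \]
Relation (iv)
\[ \sum_{i=1}^{N}\sum_{j \ne i}^{N} \pi_i \pi_j = \sum_{i=1}^{N} \pi_i \sum_{j \ne i}^{N} \pi_j = \sum_{i=1}^{N} \pi_i (n - \pi_i) = n\sum_{i=1}^{N} \pi_i - \sum_{i=1}^{N} \pi_i^2 = n^2 - \sum_{i=1}^{N} \pi_i^2 \]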
Variance of HT Estimator
With a_i = 1 if the ith unit is in the sample and 0 otherwise, the Horvitz-Thompson estimator is
\[ \hat{y}_{HT} = \sum_{i=1}^{n} \frac{y_i}{\pi_i} = \sum_{i=1}^{N} a_i \frac{Y_i}{\pi_i}, \]
with E(a_i) = \pi_i, Var(a_i) = \pi_i (1 - \pi_i) and Cov(a_i, a_j) = \pi_{ij} - \pi_i \pi_j.
\[ \mathrm{Var}(\hat{y}_{HT}) = E\left[\sum_{i=1}^{N} a_i \frac{Y_i}{\pi_i}\right]^2 - \left[\sum_{i=1}^{N} E(a_i) \frac{Y_i}{\pi_i}\right]^2 \]
\[ = \sum_{i=1}^{N} \frac{Y_i^2}{\pi_i^2}\left[E(a_i^2) - \{E(a_i)\}^2\right] + \sum_{i=1}^{N}\sum_{j \ne i}^{N} \frac{Y_i Y_j}{\pi_i \pi_j}\left[E(a_i a_j) - E(a_i)E(a_j)\right] \]
\[ = \sum_{i=1}^{N} \frac{Y_i^2}{\pi_i^2}\mathrm{Var}(a_i) + \sum_{i=1}^{N}\sum_{j \ne i}^{N} \frac{Y_i Y_j}{\pi_i \pi_j}\mathrm{Cov}(a_i, a_j) \]
\[ \mathrm{Var}(\hat{y}_{HT}) = \sum_{i=1}^{N} \frac{Y_i^2}{\pi_i^2}\pi_i (1 - \pi_i) + \sum_{i=1}^{N}\sum_{j \ne i}^{N} \frac{\pi_{ij} - \pi_i \pi_j}{\pi_i \pi_j} Y_i Y_j \]
Using relation (iii), \pi_i (1 - \pi_i) = \sum_{j \ne i} (\pi_i \pi_j - \pi_{ij}), so
\[ \mathrm{Var}(\hat{y}_{HT}) = \sum_{i=1}^{N}\sum_{j \ne i}^{N} (\pi_i \pi_j - \pi_{ij}) \frac{Y_i^2}{\pi_i^2} - \sum_{i=1}^{N}\sum_{j \ne i}^{N} \frac{\pi_i \pi_j - \pi_{ij}}{\pi_i \pi_j} Y_i Y_j \]
By symmetry in i and j the first double sum can be written with \frac{1}{2}\left(\frac{Y_i^2}{\pi_i^2} + \frac{Y_j^2}{\pi_j^2}\right), giving
\[ \mathrm{Var}(\hat{y}_{HT}) = \frac{1}{2}\sum_{i=1}^{N}\sum_{j \ne i}^{N} (\pi_i \pi_j - \pi_{ij})\left[\frac{Y_i^2}{\pi_i^2} + \frac{Y_j^2}{\pi_j^2} - 2\frac{Y_i Y_j}{\pi_i \pi_j}\right] \]
\[ \mathrm{Var}_{SYG}(\hat{y}_{HT}) = \frac{1}{2}\sum_{i=1}^{N}\sum_{j \ne i}^{N} (\pi_i \pi_j - \pi_{ij})\left(\frac{Y_i}{\pi_i} - \frac{Y_j}{\pi_j}\right)^2 \]
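Ans:
Unit No.       1     2     3     4
Yi             0.5   1.2   2.1   3.2
Zi             1     2     3     4
Pi = Zi/ΣZi    0.1   0.2   0.3   0.4
The HT estimator is calculated for all the samples of size two, and then its mean and variance are obtained.
\[ \hat{y}_{HT} = \sum_{i=1}^{n} \frac{y_i}{\pi_i}; \quad \text{for the sample of units 1 and 2, } \hat{y}_{HT} = \frac{y_1}{\pi_1} + \frac{y_2}{\pi_2} \]
\[ \pi_i = P_i\left[1 + \sum_{j=1}^{N} \frac{P_j}{1 - P_j} - \frac{P_i}{1 - P_i}\right], \qquad \pi_1 = P_1\left[1 + \sum_{j=1}^{N} \frac{P_j}{1 - P_j} - \frac{P_1}{1 - P_1}\right] \]
\[ \sum_{j=1}^{N} \frac{P_j}{1 - P_j} = 1.456 \]
\[ \pi_{ij} = P_i P_j\left[\frac{1}{1 - P_i} + \frac{1}{1 - P_j}\right] \]
\[ \pi_{12} = 0.1 \times 0.2\left[\frac{1}{0.9} + \frac{1}{0.8}\right] = 0.1 \times 0.2\,(1.111 + 1.25) = 0.0472 \]
Over all possible samples,
E(\hat{y}_{HT}) = 7.0000
Var(\hat{y}_{HT}) = 0.8229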
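The same figures can be reproduced by listing all six unordered samples of size two. The following R sketch applies the π_i and π_ij formulas for two successive PPS draws without replacement; the object names are illustrative assumptions.

```r
y <- c(0.5, 1.2, 2.1, 3.2)
p <- c(1, 2, 3, 4) / 10
N <- length(y)

A    <- sum(p / (1 - p))                        # sum of P_j / (1 - P_j)
pi_i <- p * (1 + A - p / (1 - p))               # first-order inclusion probabilities

pairs <- combn(N, 2)                            # all unordered samples {i, j}
pi_ij <- apply(pairs, 2, function(s)
  p[s[1]] * p[s[2]] * (1 / (1 - p[s[1]]) + 1 / (1 - p[s[2]])))
est   <- apply(pairs, 2, function(s) sum(y[s] / pi_i[s]))   # HT estimate per sample

E_ht <- sum(pi_ij * est)
V_ht <- sum(pi_ij * est^2) - E_ht^2
round(c(sum_pi = sum(pi_i), E = E_ht, Var = V_ht), 4)
```

Here π_ij doubles as the probability of drawing the unordered sample {i, j}, which is why it is used as the weight in the expectation.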
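Example:
Unit No.   1      2      3      4
Yi         60     60     14     1
Pi         0.05   0.05   0.02   0.01
• Yi is the count of animals from the sample of four strips.
HH Estimator
\[ \hat{y}_{HH} = \frac{1}{4}\left[\frac{60}{0.05} + \frac{60}{0.05} + \frac{14}{0.02} + \frac{1}{0.01}\right] = 800 \]
HT Estimator
With n draws made with replacement, the inclusion probability of the ith unit is
\[ \pi_i = 1 - (1 - p_i)^n \]
and the estimator, summed over the distinct units in the sample, is
\[ \hat{y}_{HT} = \sum_{i} \frac{y_i}{\pi_i} \]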
Yi   0.5   1.2   2.1   3.2
Zi   1     2     3     4
Answer:
y<-c(0.5,1.2,2.1,3.2)
zi<-c(1,2,3,4)
z<-sum(zi)
pi<-zi/z
N<-4;n=2;means=c();
#----With PPS-----
for(i in 1:10)
{s <- sample(1:N,n,replace=TRUE,prob=pi)
means[i]<-mean(y[s]) }
mean(means)
Q. Select 10000 samples with PPS Sampling With Replacement using the following
data in R. Find mean of each sample. Further find mean of means
Yi   0.5   1.2   2.1   3.2
Zi   1     2     3     4
Answer:
y<-c(0.5,1.2,2.1,3.2)
zi<-c(1,2,3,4)
z<-sum(zi)
pi<-zi/z
N<-4;n=2;means=c();
#----With PPS-----
for(i in 1:10000)
{s <- sample(1:N,n,replace=TRUE,prob=pi)
means[i]<-mean(y[s]) }
mean(means)
Q. Select 10000 samples with PPS Sampling With Replacement using the following data in R. Find the HH estimator from each sample. Further find the mean of the HH estimators.
Yi   0.5   1.2   2.1   3.2
Zi   1     2     3     4
Answer:
y<-c(0.5,1.2,2.1,3.2)
zi<-c(1,2,3,4);z<-sum(zi);pi<-zi/z
N<-4;n=2;hh=c();
for(i in 1:10000)
{s <- sample(1:N,n,replace=TRUE,prob=pi)
# mean(y/p) estimates the total; dividing by N gives the HH estimate of the mean
hh[i] <- mean(y[s]/pi[s])/N }
mean(hh)
Example:
Answer:
Population from R
Defining Terms
# -----Defining Population----
y <- trees$Volume
zi <- trees$Girth
pi<-zi/sum(zi)
pi
0.02020940 0.02093986 0.02142683 0.02556611 0.02605308 0.02629657 0.02678354
0.02678354 0.02702703 0.02727051 0.02751400 0.02775749 0.02775749 0.02848795
0.02921841 0.03140979 0.03140979 0.03238374 0.03335768 0.03360117 0.03408814
0.03457512 0.03530558 0.0389578 0.03968834 0.04212320 0.04261018 0.04358412
0.04382761 0.04382761 0.05015827
y <- trees$Volume
zi <- trees$Girth
pi<-zi/sum(zi)
N<-31;n=10;
#----Sampling with PPS----
s <- sample(1:N,n,replace=TRUE,prob=pi)
y[s]
mean(y[s])
mu <- mean(y)
With SRS
y <- trees$Volume
zi <- trees$Girth
pi<-zi/sum(zi)
N<-31;n=10;
#----Sampling with SRS (equal probabilities)----
s <- sample(1:N,n,replace=TRUE,prob=NULL)
y[s]
mean(y[s])
mu <- mean(y)
y <- trees$Volume
zi <- trees$Girth
pi<-zi/sum(zi)
N<-31;n=10;
#----With PPS-----
for(i in 1:10000)
{s <- sample(1:N,n,replace=TRUE,prob=pi)
means[i]<-mean(y[s]) }
mean(means)
y <- trees$Volume
zi <- trees$Girth
pi<-zi/sum(zi)
N<-31;n=10;
#----SRSWR----
for(i in 1:10000)
{s <- sample(1:N,n,replace=TRUE,prob=NULL)
means[i]<-mean(y[s]) }
mean(means)
Observations
Mean of Population is 30.17
Mean of means with SRS is 30.25-
Mean of means with PPS is 33.37
Q. Read the data with the following commands and select 10000 samples with PPS Sampling with replacement in R. Find the HH estimator from each sample. Further find the mean of the HH estimators.
y <- trees$Volume
zi <- trees$Girth
Answer:
Population from R
y
10.3 10.3 10.2 16.4 18.8 19.7 15.6 18.2 22.6 19.9 24.2 21.0 21.4 21.3 19.1 22.2 33.8 27.4 25.7
24.9 34.5 31.7 36.3 38.3 42.6 55.4 55.7 58.3 51.5 51.0 77.0
zi
8.3 8.6 8.8 10.5 10.7 10.8 11.0 11.0 11.1 11.2 11.3 11.4 11.4 11.7 12.0 12.9 12.9 13.3 13.7
13.8 14.0 14.2 14.5 16.0 16.3 17.3 17.5 17.9 18.0 18.0 20.6
y <- trees$Volume
zi <- trees$Girth
pi<-zi/sum(zi);N<-31;n=10;hh=c();
for(i in 1:10000)
{s <- sample(1:N,n,replace=TRUE,prob=pi)
# mean(y/p) estimates the total; dividing by N gives the HH estimate of the mean
hh[i] <- mean(y[s]/pi[s])/N }
mean(hh)
Q. Select 10000 samples with PPS Sampling with replacement using the following data in R. Find the HT estimator from each sample. Further find the mean of the HT estimators.
Yi   0.5   1.2   2.1   3.2
Zi   1     2     3     4
Answer:
# -----Defining Population----
y<-c(0.5,1.2,2.1,3.2)
zi<-c(1,2,3,4)
z<-sum(zi)
pi<-zi/z;N=4;n=2
#---Calculation of Pi----
pii <- 1 - (1-pi)^n   # inclusion probability under n draws with replacement
#---Looping-----
ht=c();
for(i in 1:10000)
{s <- sample(1:N,n,replace=TRUE,prob=pi)
yu=unique(s)          # distinct units in the sample
ht[i] <- sum(y[yu]/pii[yu])/N}
mean(ht)
Q. Select 10000 samples with PPS Sampling without replacement using the following data
in R. Find HT estimator from each sample. Further find mean of HT estimators.
Answer:
# -----Defining Population----
y<-c(0.5,1.2,2.1,3.2)
zi<-c(1,2,3,4)
z<-sum(zi)
pi<-zi/z;N=4;n=2
#---Calculation of Pi----
piinv<-1-pi
sm<-sum(pi/piinv)
pii <- pi*(1+sm-pi/(1-pi))
ith HT Estimator
#---Looping-----
hht=c();
for(i in 1:10000)
{s <- sample(1:N,n,replace=FALSE,prob=pi)
hht[i] <- sum(y[s]/pii[s])/N}
mean(hht)
Q. Read the data from the following command and select 10000 samples with PPS
Sampling with replacement using the following data in R. Find HT estimator from each
sample. Further find mean of HT estimators.
y <- trees$Volume
zi <- trees$Girth
Answer:
# -----Reading the data-----
y <- trees$Volume
zi <- trees$Girth
y
10.3 10.3 10.2 16.4 18.8 19.7 15.6 18.2 22.6 19.9 24.2 21.0 21.4 21.3 19.1 22.2 33.8 27.4 25.7
24.9 34.5 31.7 36.3 38.3 42.6 55.4 55.7 58.3 51.5 51.0 77.0
zi
8.3 8.6 8.8 10.5 10.7 10.8 11.0 11.0 11.1 11.2 11.3 11.4 11.4 11.7 12.0 12.9 12.9 13.3 13.7
13.8 14.0 14.2 14.5 16.0 16.3 17.3 17.5 17.9 18.0 18.0 20.6
# -----Defining Population----
y <- trees$Volume
zi <- trees$Girth
pi<-zi/sum(zi)
pi
0.02020940 0.02093986 0.02142683 0.02556611 0.02605308 0.02629657 0.02678354
0.02678354 0.02702703 0.02727051 0.02751400 0.02775749 0.02775749 0.02848795
0.02921841 0.03140979 0.03140979 0.03238374 0.03335768 0.03360117 0.03408814
0.03457512 0.03530558 0.0389578 0.03968834 0.04212320 0.04261018 0.04358412
0.04382761 0.04382761 0.05015827
y <- trees$Volume
zi <- trees$Girth
pi<-zi/sum(zi)
N<-31;n=10;
#---Calculation of Pi----
pii <- 1 - (1-pi)^n
pii
Values of pii
> pii
0.1846714 0.1907295 0.1947457 0.2281662 0.2320147 0.2339325 0.2377552
0.2377552 0.2396601 0.2415607 0.2434571 0.2453491 0.2453491 0.2509998
0.2566124 0.2732237 0.2732237 0.2804987 0.2877080 0.2895002 0.2930723
0.2966283 0.3019321 0.3279149 0.3330058 0.3497258 0.3530242 0.3595758
0.3612043 0.3612043 0.4022598
#---Looping-----
hht=c();
for(i in 1:10000)
{s <- sample(1:N,n,replace=TRUE,prob=pi)
yu=unique(s)
hht[i] <- sum(y[yu]/pii[yu])/N}
mean(hht)
Select the first unit with probability proportional to
\[ \frac{P_i (1 - P_i)}{1 - 2P_i} \]
Select the second unit with probability proportional to size of remaining units.
Durbin’s Procedure
Durbin (1967) suggested a selection procedure. The procedure for a sample of size 2 is given as
\[ P_j\left[\frac{1}{1 - 2P_i} + \frac{1}{1 - 2P_j}\right] \]
Q. Describe Shehbaz-Hanif-Samiuddin’s Procedure
\[ \frac{P_i (1 - P_i)}{1 - 2P_i} \]
Select second unit with probability proportional to size of
\[ P_j\left[\frac{1}{1 - 2P_i} + \frac{1}{1 - 2P_j}\right] \]
Introduction:
Auxiliary Variable
So far we have discussed the estimation of parameters on the basis of a single variable.
A supporting (supplementary) variable may be used at the design stage or at the estimation stage.
The supporting variable is used to enhance the efficiency of estimation.
The supporting variable must be correlated with the main variable.
The supplementary information is usually referred to as a benchmark variable or auxiliary variable.
Graunt (1662) was the first to use auxiliary information, to estimate the population size of England.
After Graunt (1662), Laplace used auxiliary information for the estimation of the population of France.
Q. Prove that ratio estimator is almost unbiased for large sample size.
Answer:
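Ratio Estimator
$$\hat{R} = \frac{\bar{y}}{\bar{x}} = \frac{y}{x}$$
Ratio Estimator for Mean and Total
The Ratio Estimator for population mean
$$\bar{y}_r = \frac{\bar{y}}{\bar{x}}\,\bar{X}$$
The Ratio Estimator for population total (X is the population total of x)
$$\hat{Y}_r = \frac{\bar{y}}{\bar{x}}\,X$$
Expectation of Ratio Estimator
For large sample size the expectation of the ratio estimator is approximately equal to the population ratio, i.e.
$$E(\hat{R}) \approx R$$
$$\hat{R} - R = \frac{\bar{y}}{\bar{x}} - R = \frac{\bar{y} - R\bar{x}}{\bar{x}}$$
When n is large, $\bar{x} \approx \bar{X}$, so
$$\hat{R} - R \approx \frac{\bar{y} - R\bar{x}}{\bar{X}}$$
$$E(\hat{R} - R) \approx \frac{E(\bar{y}) - R\,E(\bar{x})}{\bar{X}} = \frac{\bar{Y} - R\bar{X}}{\bar{X}} = 0$$
$$E(\hat{R}) \approx R$$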
Variance of Ratio Estimator
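$$\hat{R} - R \approx \frac{\bar{y} - R\bar{x}}{\bar{X}}$$
$$E(\hat{R} - R)^2 \approx \frac{1}{\bar{X}^2}E(\bar{y} - R\bar{x})^2$$
Let $d_i = y_i - Rx_i$, so that $\bar{d} = \bar{y} - R\bar{x}$ and $\bar{D} = \bar{Y} - R\bar{X} = 0$. Then
$$E(\hat{R} - R)^2 \approx \frac{1}{\bar{X}^2}V(\bar{d})$$
$$MSE(\hat{R}) \approx \frac{N-n}{Nn}\,\frac{1}{\bar{X}^2}\,\frac{1}{N-1}\sum_{i=1}^{N}(Y_i - RX_i)^2$$
The sample estimate, with $r = \hat{R}$, is
$$mse(\hat{R}) = \frac{N-n}{Nn}\,\frac{1}{\bar{X}^2}\,\frac{1}{n-1}\sum_{i=1}^{n}(y_i - rx_i)^2$$
Alternative Expression of Variance
$$MSE(\hat{R}) \approx \frac{N-n}{Nn}\,\frac{1}{\bar{X}^2}\,\frac{1}{N-1}\sum_{i=1}^{N}\left[(Y_i - \bar{Y}) - (RX_i - R\bar{X})\right]^2$$
$$MSE(\hat{R}) \approx \frac{N-n}{Nn\,\bar{X}^2}\left[S_Y^2 - 2RS_{YX} + R^2S_X^2\right]$$
Expression of Variance for Ratio Estimator of Mean
$$MSE(\bar{y}_r) \approx \frac{N-n}{Nn}\left[S_Y^2 - 2R\rho S_YS_X + R^2S_X^2\right]$$
$$MSE(\bar{y}_r) \approx \theta\,\bar{Y}^2\left[C_Y^2 - 2\rho C_YC_X + C_X^2\right],\qquad \theta = \frac{N-n}{Nn}$$
$$C.L.(\hat{R}):\ \hat{R} \pm t\sqrt{mse(\hat{R})}$$
$$C.L.(\bar{Y}):\ \bar{y}_r \pm t\sqrt{mse(\bar{y}_r)}$$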
Q. Find the approximate Bias expression of Ratio Estimator?
Answer:
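Ratio Estimator of Mean
$$\hat{R} = \frac{\bar{y}}{\bar{x}} = \frac{y}{x},\qquad \bar{y}_r = \frac{\bar{y}}{\bar{x}}\,\bar{X}$$
Notations
$$e_0 = \frac{\bar{y} - \bar{Y}}{\bar{Y}},\qquad e_1 = \frac{\bar{x} - \bar{X}}{\bar{X}}$$
Using these notations, with $\theta = (1-f)/n$,
$$E(e_0^2) = \theta C_y^2,\qquad E(e_1^2) = \theta C_x^2,\qquad E(e_0e_1) = \theta C_{yx},$$
where $C_{yx} = \rho_{yx}C_yC_x$.
Using Notations
$$\bar{y} = \bar{Y}(1+e_0),\qquad \bar{x} = \bar{X}(1+e_1)$$
$$\bar{y}_r = \frac{\bar{y}}{\bar{x}}\,\bar{X} = \frac{\bar{Y}(1+e_0)}{\bar{X}(1+e_1)}\,\bar{X}$$
Bias of Ratio Estimator
$$\bar{y}_r = \bar{Y}(1+e_0)(1+e_1)^{-1} \approx \bar{Y}(1+e_0)(1 - e_1 + e_1^2)$$
$$\bar{y}_r - \bar{Y} \approx \bar{Y}(e_0 - e_1 - e_0e_1 + e_1^2)$$
Taking expectations,
$$Bias(\bar{y}_r) \approx \frac{1-f}{n}\,\bar{Y}\left(C_x^2 - \rho_{yx}C_yC_x\right)$$
$$Bias(\hat{R}) \approx \frac{1-f}{n}\,R\left(C_x^2 - \rho_{yx}C_yC_x\right)$$
Comparison with the mean per unit estimator
$$Var(\bar{y}_r) \approx \theta\,\bar{Y}^2\left[C_Y^2 - 2\rho C_YC_X + C_X^2\right]$$
• The mean per unit estimator has variance
$$V(\bar{y}) = \theta\,\bar{Y}^2C_Y^2$$
Comparison
The ratio estimator is more efficient when
$$Var(\bar{y}) - Var(\bar{y}_r) > 0$$
$$\theta\bar{Y}^2C_Y^2 - \theta\bar{Y}^2\left[C_Y^2 - 2\rho C_YC_X + C_X^2\right] > 0$$
$$2\rho C_YC_X - C_X^2 > 0$$
$$\rho > \frac{1}{2}\,\frac{C_X}{C_Y}$$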
Q. Prove that Hartley-Ross is an Unbiased Ratio Estimator
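$$r_{HR} = \bar{r} + \frac{n(N-1)}{N(n-1)}\,\frac{(\bar{y} - \bar{r}\,\bar{x})}{\bar{X}}$$
Answer:
Hartley-Ross Estimator
The HR Estimator is
$$r_{HR} = \bar{r} + \frac{n(N-1)}{N(n-1)}\,\frac{(\bar{y} - \bar{r}\,\bar{x})}{\bar{X}},\qquad \text{where } \bar{r} = \frac{1}{n}\sum_{i=1}^{n}\frac{y_i}{x_i}$$
The variance of HR Estimator is
$$Var(r_{HR}) \approx \frac{1}{n\bar{X}^2}\left[S_Y^2 + R^2S_X^2 - 2R\rho S_YS_X\right]$$
• We know that, with $r_i = y_i/x_i$,
$$E(\bar{r}) = E\left(\sum_{i=1}^{n}\frac{r_i}{n}\right) = E(r_i)$$
$$E(\bar{r}) - R = \frac{E(r_i)E(x_i)}{\bar{X}} - \frac{\bar{Y}}{\bar{X}} = -\frac{1}{\bar{X}}\left[\bar{Y} - E(r_i)E(x_i)\right]$$
Since $\bar{Y} = E(y_i) = E\left(\frac{y_i}{x_i}\,x_i\right) = E(r_ix_i)$,
$$E(\bar{r}) - R = -\frac{1}{\bar{X}}\left[E(r_ix_i) - E(r_i)E(x_i)\right] = -\frac{1}{\bar{X}}\,cov(r_i, x_i)$$
$$E(\bar{r}) - R = -\frac{1}{\bar{X}}\,\frac{1}{N}\sum_{i=1}^{N}(r_i - \bar{R})(x_i - \bar{X}),\qquad \bar{R} = E(r_i)$$
$$E(\bar{r}) - R = -\frac{1}{\bar{X}}\,\frac{N-1}{N}\,\frac{1}{N-1}\sum_{i=1}^{N}(r_i - \bar{R})(x_i - \bar{X}) = -\frac{1}{\bar{X}}\,\frac{N-1}{N}\,E(s_{rx})$$
Hartley-Ross Estimator
$$s_{rx} = \frac{1}{n-1}\sum_{i=1}^{n}(r_i - \bar{r})(x_i - \bar{x}) = \frac{1}{n-1}\left(\sum_{i=1}^{n}r_ix_i - n\bar{r}\bar{x}\right) = \frac{1}{n-1}\left(\sum_{i=1}^{n}y_i - n\bar{r}\bar{x}\right) = \frac{n}{n-1}(\bar{y} - \bar{r}\bar{x})$$
$$E(\bar{r}) - R = -\frac{1}{\bar{X}}\,\frac{N-1}{N}\,E\left[\frac{n}{n-1}(\bar{y} - \bar{r}\bar{x})\right]$$
$$E\left[\bar{r} + \frac{1}{\bar{X}}\,\frac{N-1}{N}\,\frac{n}{n-1}(\bar{y} - \bar{r}\bar{x})\right] = R$$
HR Unbiased Ratio Estimator
Since E(Estimator) = Parameter, the estimator
$$r_{HR} = \bar{r} + \frac{n(N-1)}{N(n-1)}\,\frac{(\bar{y} - \bar{r}\,\bar{x})}{\bar{X}}$$
is unbiased for R.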
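The unbiasedness result can be checked numerically. The following is a minimal sketch, assuming the four-unit artificial population used in the handout's small-population ratio example and enumerating all SRSWOR samples of size 2; the helper name `hartley_ross` is hypothetical.

```r
# Four-unit artificial population (assumed here for illustration).
Y <- c(5, 12, 15, 28)
X <- c(11, 21, 32, 14)
N <- length(Y); n <- 2
R <- mean(Y) / mean(X)            # population ratio
samples <- combn(N, n)            # all 6 SRSWOR samples, one per column

# Hartley-Ross estimator for one sample (idx = selected unit numbers).
hartley_ross <- function(idx) {
  y <- Y[idx]; x <- X[idx]
  rbar <- mean(y / x)             # mean of the unit ratios
  rbar + n * (N - 1) / (N * (n - 1)) * (mean(y) - rbar * mean(x)) / mean(X)
}

r_hr      <- apply(samples, 2, hartley_ross)
r_classic <- apply(samples, 2, function(idx) mean(Y[idx]) / mean(X[idx]))

# Average over all equally likely samples = expected value.
c(R = R, E_HR = mean(r_hr), E_ratio = mean(r_classic))
```

Under these assumptions the average of the HR estimates should coincide with R, while the average of the classical ratio estimates differs from R, which illustrates the bias that the HR correction removes.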
Q. Prove that regression estimator is an unbiased estimator.
Ans:
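$$\bar{y}_{reg} = \bar{y} + \hat{\beta}_{yx}(\bar{X} - \bar{x})$$
Treating $\hat{\beta}_{yx}$ as a fixed constant,
$$E(\bar{y}_{reg}) = E(\bar{y}) + \hat{\beta}_{yx}\,E(\bar{X} - \bar{x})$$
$$E(\bar{y}_{reg}) = \bar{Y}$$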
Q. Find variance of Regression Estimator.
Ans
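$$\bar{y}_{reg} = \bar{y} + \hat{\beta}_{yx}(\bar{X} - \bar{x})$$
$$\bar{y}_{reg} = \bar{Y}(1+e_0) + \hat{\beta}_{yx}\left(\bar{X} - \bar{X}(1+e_1)\right)$$
$$\bar{y}_{reg} - \bar{Y} = \bar{Y}e_0 - \hat{\beta}_{yx}\bar{X}e_1$$
$$E(\bar{y}_{reg} - \bar{Y})^2 = E\left(\bar{Y}e_0 - \hat{\beta}_{yx}\bar{X}e_1\right)^2 = \bar{Y}^2E(e_0^2) + \hat{\beta}_{yx}^2\bar{X}^2E(e_1^2) - 2\bar{Y}\bar{X}\hat{\beta}_{yx}E(e_0e_1)$$
$$= \theta\left[\bar{Y}^2C_y^2 + \hat{\beta}_{yx}^2\bar{X}^2C_x^2 - 2\bar{Y}\bar{X}\hat{\beta}_{yx}\rho C_yC_x\right]$$
Substituting $\hat{\beta}_{yx} = \rho\,S_y/S_x$,
$$= \theta\left[\bar{Y}^2C_y^2 + \rho^2\frac{S_y^2}{S_x^2}\bar{X}^2C_x^2 - 2\bar{Y}\bar{X}\rho\frac{S_y}{S_x}\rho C_yC_x\right] = \theta\left[S_y^2 + \rho^2S_y^2 - 2\rho^2S_y^2\right]$$
$$V(\bar{y}_{reg}) = \theta\,S_y^2(1 - \rho^2)$$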
Q.Find expected value and MSE of ratio estimator by taking all possible samples of size 2
from the following population.
Unit No. 1 2 3 4
Yi 5 12 15 28
Xi 11 21 32 14
Answer:
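Small Population Example
The following is an artificial population with four values.
Unit No.   1    2    3    4
Yi         5   12   15   28
Xi        11   21   32   14
For each possible sample of size 2,
$$\bar{y}_r = \frac{\bar{y}}{\bar{x}}\,\bar{X},\qquad \bar{X} = 19.5$$
Units   ybar   xbar   ybar/xbar   yr
1,2     8.5    16.0   0.531       10.36
1,3     10.0   21.5   0.465        9.07
1,4     16.5   12.5   1.320       25.74
2,3     13.5   26.5   0.509        9.93
2,4     20.0   17.5   1.143       22.29
3,4     21.5   23.0   0.935       18.23
Population mean = 15, $E(\bar{y}_r) = 15.94$
$$MSE(\bar{y}_r) = \sum_{i=1}^{6}(\bar{y}_{ri} - \bar{Y})^2P(s) = \frac{1}{6}\sum_{i=1}^{6}(\bar{y}_{ri} - 15)^2$$
$$MSE(\bar{y}_r) = 43.54$$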
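The enumeration over all possible samples can be reproduced in R. This is a minimal sketch using only the four population values given in the question; the object names (`tab`, `E_yr`, `MSE_yr`) are illustrative.

```r
# Population values from the question.
Y <- c(5, 12, 15, 28)
X <- c(11, 21, 32, 14)
Xbar <- mean(X); Ybar <- mean(Y)

samples <- combn(4, 2)                            # all 6 samples of size 2
ybar <- apply(samples, 2, function(s) mean(Y[s]))
xbar <- apply(samples, 2, function(s) mean(X[s]))
yr   <- ybar / xbar * Xbar                        # ratio estimate per sample

tab <- data.frame(units = apply(samples, 2, paste, collapse = ","),
                  ybar, xbar, ratio = ybar / xbar, yr)
print(tab)

E_yr   <- mean(yr)                # each sample has probability 1/6
MSE_yr <- mean((yr - Ybar)^2)
round(c(E_yr = E_yr, MSE_yr = MSE_yr), 2)
```

The rounded MSE agrees with the value 43.54 quoted in the answer, and the expected value exceeds the population mean of 15, showing the bias of the ratio estimator in such a small sample.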
Q. Read the data from the following command and select 10000 samples by simple random
sampling without replacement. Find the ratio estimator from each sample. Further, find the mean
of the ratio estimators and compare it with the mean per unit estimator.
y <- trees$Volume
zi <- trees$Girth
Answer:
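Example
$$\bar{y}_r = \frac{\bar{y}}{\bar{x}}\,\bar{X}$$
Defining Population
#----Defining Population
y <- trees$Volume
x <- trees$Girth
N <- 31; n <- 4
mux <- mean(x)
r <- c(); mratio <- c(); m <- c()
#----Simulation: 10000 SRSWOR samples
for (i in 1:10000) {
  s <- sample(1:N, n)
  m[i] <- mean(y[s])                  # mean per unit estimator
  r[i] <- mean(y[s]) / mean(x[s])
  mratio[i] <- r[i] * mux             # ratio estimator of the mean
}
mean(mratio); mean(m); mean(y)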
Relative Efficiency
Q. Generate a population of size 1000 from the bivariate normal distribution for (Y, X) with
$$\mu = (2,\ 2),\qquad \Sigma = \begin{pmatrix} 1 & 0.85 \\ 0.85 & 1 \end{pmatrix}$$
Consider sample size n = 10. Select 10,000 random samples by SRSWOR
and calculate
$$t = \bar{y}\,\frac{\bar{X}}{\bar{x}}$$
Further, calculate the MSE of the above estimator and its relative efficiency with
respect to the mean per unit estimator ($\bar{y}$).
Answer:
library(mvtnorm)
N=1000;ryx=0.85; n=10;
m=c(2,2); # vector of means.
# variance covariance matrix is given below.
sig=matrix(c(1,ryx,ryx,1),ncol=2);
r=rmvnorm(N,m,sig);
x=r[,2];y=r[,1];
data=data.frame(x,y);
plot(y,x)
mux=mean(x) # population mean of the auxiliary variable.
Simulation Study
r<-c();mratio<-c();m=c();
for(i in 1:10000)
{s<-sample(1:N,n) # SRSWOR sample of size n.
m[i]<-mean(y[s]) # mean per unit estimator.
r[i] <- mean(y[s])/mean(x[s])
mratio[i] <- r[i]*mux} # ratio estimator of the mean.
var(m);var(mratio)
re<-var(m)/var(mratio);re
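The ratio estimator of the mean is
$$\bar{y}_r = \frac{\bar{y}}{\bar{x}}\,\bar{X}$$
and the product estimator of the mean is
$$\bar{y}_p = \frac{\bar{y}\,\bar{x}}{\bar{X}}$$
Notations
$$e_0 = \frac{\bar{y} - \bar{Y}}{\bar{Y}},\qquad e_1 = \frac{\bar{x} - \bar{X}}{\bar{X}}$$
Using these notations,
$$E(e_0^2) = \theta C_y^2,\qquad E(e_1^2) = \theta C_x^2,\qquad E(e_0e_1) = \theta C_{yx},$$
where $C_{yx} = \rho_{yx}C_yC_x$.
Using Notations
$$\bar{y} = \bar{Y}(1+e_0),\qquad \bar{x} = \bar{X}(1+e_1)$$
$$\bar{y}_p = \frac{\bar{Y}(1+e_0)\,\bar{X}(1+e_1)}{\bar{X}} = \bar{Y}(1+e_0)(1+e_1)$$
Bias of Product Estimator
$$\bar{y}_p = \bar{Y}(1 + e_0 + e_1 + e_0e_1)$$
$$\bar{y}_p - \bar{Y} = \bar{Y}(e_0 + e_1 + e_0e_1)$$
$$E(\bar{y}_p - \bar{Y}) = \bar{Y}\,E(e_0 + e_1 + e_0e_1)$$
$$Bias(\bar{y}_p) = \bar{Y}\,E(e_0e_1)$$
$$Bias(\bar{y}_p) = \theta\,\bar{Y}\,C_{yx},\qquad \text{where } C_{yx} = \rho_{yx}C_yC_x$$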
Q. Find Mean Square Error of Product Estimator?
Answer:
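MSE of Product Estimator
$$\bar{y}_p = \frac{\bar{y}\,\bar{x}}{\bar{X}}$$
$$e_0 = \frac{\bar{y} - \bar{Y}}{\bar{Y}},\qquad e_1 = \frac{\bar{x} - \bar{X}}{\bar{X}}$$
Using these notations,
$$E(e_0^2) = \theta C_y^2,\qquad E(e_1^2) = \theta C_x^2,\qquad E(e_0e_1) = \theta C_{yx},$$
where $C_{yx} = \rho_{yx}C_yC_x$.
Using Notations
$$\bar{y}_p = \frac{\bar{Y}(1+e_0)\,\bar{X}(1+e_1)}{\bar{X}} = \bar{Y}(1+e_0)(1+e_1) = \bar{Y}(1 + e_0 + e_1 + e_0e_1)$$
$$\bar{y}_p - \bar{Y} = \bar{Y}(e_0 + e_1 + e_0e_1)$$
Squaring and keeping terms up to the second order,
$$E(\bar{y}_p - \bar{Y})^2 \approx \bar{Y}^2E(e_0 + e_1)^2$$
$$MSE(\bar{y}_p) \approx \bar{Y}^2E(e_0^2 + e_1^2 + 2e_0e_1) = \theta\,\bar{Y}^2\left(C_y^2 + C_x^2 + 2C_{yx}\right)$$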
Q. Find expected value and MSE of product estimator by taking all possible samples of
size 2 from the following population.
Unit No. 1 2 3 4
Yi 5 12 15 28
Xi 32 21 14 11
Answer:
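Small Population Example
Unit No.   1    2    3    4
Yi         5   12   15   28
Xi        32   21   14   11
$$\bar{y}_p = \frac{\bar{y}\,\bar{x}}{\bar{X}},\qquad \bar{X} = 19.5$$
All Possible Samples
Units   ybar   xbar   yp
1,2     8.5    26.5   11.55
1,3     10.0   23.0   11.79
1,4     16.5   21.5   18.19
2,3     13.5   17.5   12.12
2,4     20.0   16.0   16.41
3,4     21.5   12.5   13.78
Expected Value
Population mean = 15, $E(\bar{y}_p) = 13.97$
$$MSE(\bar{y}_p) = \sum_{i=1}^{6}(\bar{y}_{pi} - \bar{Y})^2P(s) = \frac{1}{6}\sum_{i=1}^{6}(\bar{y}_{pi} - 15)^2$$
$$MSE(\bar{y}_p) = 7.36$$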
Q.109 Generate a population of size 1000 from the bivariate normal distribution for (Y, X) with
$$\mu = (2,\ 2),\qquad \Sigma = \begin{pmatrix} 1 & -0.85 \\ -0.85 & 1 \end{pmatrix}$$
Consider sample size n = 10. Select 10,000 random samples by SRSWOR
and calculate
$$t = \bar{y}\,\frac{\bar{x}}{\bar{X}}$$
Further, calculate the MSE of the above estimator and its relative efficiency with
respect to the mean per unit estimator ($\bar{y}$).
Answer:
Defining Population
library(mvtnorm)
N=1000;ryx=-0.85; n=10;
m=c(2,2); # vector of means.
# variance covariance matrix is given below.
sig=matrix(c(1,ryx,ryx,1),ncol=2);
r=rmvnorm(N,m,sig);
x=r[,2];y=r[,1];
data=data.frame(x,y);
mux=mean(x) # population mean of the auxiliary variable.
p<-c();mp<-c();m=c();
for(i in 1:10000)
{s<-sample(1:N,n) # SRSWOR sample of size n.
m[i]<-mean(y[s]) # mean per unit estimator.
p[i] <- mean(y[s])*mean(x[s])
mp[i] <- p[i]/mux} # product estimator of the mean.
var(m);var(mp)
re<-var(m)/var(mp);re
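Combined Ratio Estimator in Stratified Sampling
$$\bar{y}_{rst} = \frac{\bar{y}_{st}}{\bar{x}_{st}}\,\bar{X}$$
Notations
$$e_{0st} = \frac{\bar{y}_{st} - \bar{Y}}{\bar{Y}},\qquad e_{1st} = \frac{\bar{x}_{st} - \bar{X}}{\bar{X}}$$
Using these notations,
$$E(e_{0st}^2) = \sum_{h=1}^{k}W_h^2\theta_hC_{yh}^2,\qquad E(e_{1st}^2) = \sum_{h=1}^{k}W_h^2\theta_hC_{xh}^2,\qquad E(e_{0st}e_{1st}) = \sum_{h=1}^{k}W_h^2\theta_hC_{yxh}$$
$$\bar{y}_{st} = \bar{Y}(1+e_{0st}),\qquad \bar{x}_{st} = \bar{X}(1+e_{1st})$$
$$\bar{y}_{rst} = \frac{\bar{Y}(1+e_{0st})}{\bar{X}(1+e_{1st})}\,\bar{X}$$
Bias of Ratio Estimator
$$\bar{y}_{rst} = \bar{Y}(1+e_{0st})(1+e_{1st})^{-1} \approx \bar{Y}(1+e_{0st})(1 - e_{1st} + e_{1st}^2)$$
$$\bar{y}_{rst} - \bar{Y} \approx \bar{Y}\left(e_{0st} - e_{1st} - e_{0st}e_{1st} + e_{1st}^2\right)$$
$$Bias(\bar{y}_{rst}) \approx \bar{Y}\sum_{h=1}^{k}W_h^2\theta_h\left(C_{xh}^2 - C_{yxh}\right)$$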
Q. Find the expression of MSE of Combined Type Ratio Estimator?
Answer:
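Combined Ratio Estimator in Stratified Sampling
$$\bar{y}_{rst} = \frac{\bar{y}_{st}}{\bar{x}_{st}}\,\bar{X}$$
Notations
$$e_{0st} = \frac{\bar{y}_{st} - \bar{Y}}{\bar{Y}},\qquad e_{1st} = \frac{\bar{x}_{st} - \bar{X}}{\bar{X}}$$
Using these notations
$$E(e_{0st}^2) = \sum_{h=1}^{k}W_h^2\theta_hC_{yh}^2,\qquad E(e_{1st}^2) = \sum_{h=1}^{k}W_h^2\theta_hC_{xh}^2,\qquad E(e_{0st}e_{1st}) = \sum_{h=1}^{k}W_h^2\theta_hC_{yxh}$$
$$\bar{y}_{st} = \bar{Y}(1+e_{0st}),\qquad \bar{x}_{st} = \bar{X}(1+e_{1st})$$
$$\bar{y}_{rst} = \frac{\bar{Y}(1+e_{0st})}{\bar{X}(1+e_{1st})}\,\bar{X} = \bar{Y}(1+e_{0st})(1+e_{1st})^{-1}$$
To the first order of approximation,
$$\bar{y}_{rst} \approx \bar{Y}(1 + e_{0st} - e_{1st})$$
$$\bar{y}_{rst} - \bar{Y} \approx \bar{Y}(e_{0st} - e_{1st})$$
$$E(\bar{y}_{rst} - \bar{Y})^2 \approx \bar{Y}^2E(e_{0st} - e_{1st})^2 = \bar{Y}^2\left[E(e_{0st}^2) + E(e_{1st}^2) - 2E(e_{0st}e_{1st})\right]$$
$$MSE(\bar{y}_{rst}) \approx \bar{Y}^2\sum_{h=1}^{k}W_h^2\theta_h\left(C_{yh}^2 + C_{xh}^2 - 2C_{yxh}\right)$$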
Example
Consider a population of size 700 consisting on three strata such that N1=100, N2=250
and N3=350. The required sample size is 18.
The population mean for the Variable Y and X is 15 and 62.14, respectively.
The sample size from stratum-1, stratum-2 and stratum-3 is arbitrarily decided as 4, 8
and 6, respectively.
11,51 24,96
10,49 17,68
9,45
12,54
nh 4 8 6
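Sample Means
• Sample mean of variable Y
$$\bar{y}_{st} = \sum_{h=1}^{k}W_h\bar{y}_h = \sum_{h=1}^{k}N_h\bar{y}_h/N = \frac{1}{N}\left(N_1\bar{y}_1 + N_2\bar{y}_2 + N_3\bar{y}_3\right)$$
• Sample mean of variable X
$$\bar{x}_{st} = \sum_{h=1}^{k}W_h\bar{x}_h = \sum_{h=1}^{k}N_h\bar{x}_h/N = \frac{1}{N}\left(N_1\bar{x}_1 + N_2\bar{x}_2 + N_3\bar{x}_3\right)$$
Sample mean of Y
        Stra-1   Stra-2   Stra-3
mean    2.75     9.25     20
Nh      100      250      350
Sh      1.708    2.493    3.847
nh      4        8        6
$$\bar{y}_{st} = \frac{1}{700}\left(100 \times 2.75 + 250 \times 9.25 + 350 \times 20\right) = 13.70$$
Sample mean of X
$$\bar{x}_{st} = 61.32$$
Combined Ratio Estimator
$$\bar{y}_{rst} = \frac{\bar{y}_{st}}{\bar{x}_{st}}\,\bar{X} = \frac{13.70}{61.32} \times 62.14 \approx 13.89$$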
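The arithmetic of the combined ratio estimator can be sketched in R from the summary figures quoted in the example. The stratum means of X are not recoverable from the handout, so the stratified mean of X (61.32) is taken as given; variable names are illustrative.

```r
# Summary figures quoted in the example.
Nh     <- c(100, 250, 350)        # stratum sizes
ybar_h <- c(2.75, 9.25, 20)       # stratum sample means of Y
Wh     <- Nh / sum(Nh)            # stratum weights

yst  <- sum(Wh * ybar_h)          # stratified sample mean of Y
xst  <- 61.32                     # stratified sample mean of X (as reported)
Xbar <- 62.14                     # known population mean of X

yrst <- yst / xst * Xbar          # combined ratio estimator
round(c(yst = yst, yrst = yrst), 2)
```

With unrounded intermediate values the result is close to, but may differ in the last digit from, the figure reported in the example.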
Q.115 Find the expression of Bias and MSE of Separate Type Ratio Estimator in stratified
sampling.
Answer:
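Separate Type Ratio Estimator
The bias of the ratio estimator under simple random sampling is
$$Bias(\bar{y}_r) = \frac{1-f}{n}\,\bar{Y}\left(C_x^2 - \rho_{yx}C_yC_x\right)$$
• In case of Stratified Sampling, within stratum h,
$$Bias(\bar{y}_{rh}) = \frac{1-f_h}{n_h}\,\bar{Y}_h\left(C_{xh}^2 - \rho_{yxh}C_{yh}C_{xh}\right)$$
$$E(\bar{y}_{rh}) = \bar{Y}_h + \frac{1-f_h}{n_h}\,\bar{Y}_h\left(C_{xh}^2 - \rho_{yxh}C_{yh}C_{xh}\right)$$
Bias of Separate Type Ratio Estimator
$$\bar{y}_{rsst} = \sum_{h=1}^{k}W_h\bar{y}_{rh},\qquad E(\bar{y}_{rsst}) = \sum_{h=1}^{k}W_hE(\bar{y}_{rh})$$
$$E(\bar{y}_{rsst}) = \sum_{h=1}^{k}W_h\left[\bar{Y}_h + \frac{1-f_h}{n_h}\,\bar{Y}_h\left(C_{xh}^2 - \rho_{yxh}C_{yh}C_{xh}\right)\right]$$
$$E(\bar{y}_{rsst}) = \bar{Y} + \sum_{h=1}^{k}W_h\,\frac{1-f_h}{n_h}\,\bar{Y}_h\left(C_{xh}^2 - \rho_{yxh}C_{yh}C_{xh}\right)$$
$$Bias(\bar{y}_{rsst}) = \sum_{h=1}^{k}W_h\theta_h\bar{Y}_h\left(C_{xh}^2 - \rho_{yxh}C_{yh}C_{xh}\right),\qquad \theta_h = \frac{1-f_h}{n_h}$$
Expression of MSE for Ratio Estimator of Mean
$$MSE(\bar{y}_r) = \theta\,\bar{Y}^2\left(C_y^2 + C_x^2 - 2\rho_{yx}C_yC_x\right)$$
$$MSE(\bar{y}_{rh}) = \theta_h\bar{Y}_h^2\left(C_{yh}^2 + C_{xh}^2 - 2\rho_{yxh}C_{yh}C_{xh}\right)$$
MSE of Separate Type Ratio Estimator
$$\bar{y}_{rsst} = \sum_{h=1}^{k}W_h\bar{y}_{rh},\qquad MSE(\bar{y}_{rsst}) = \sum_{h=1}^{k}W_h^2\,MSE(\bar{y}_{rh})$$
$$MSE(\bar{y}_{rsst}) = \sum_{h=1}^{k}W_h^2\theta_h\bar{Y}_h^2\left(C_{yh}^2 + C_{xh}^2 - 2\rho_{yxh}C_{yh}C_{xh}\right)$$
$$MSE(\bar{y}_{rsst}) = \sum_{h=1}^{k}W_h^2\theta_h\,\frac{\sum_{i=1}^{N_h}(Y_{hi} - R_hX_{hi})^2}{N_h - 1}$$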
Example
Consider a population of size 700 consisting on three strata such that N1=100,
N2=250 and N3=350. The required sample size is 18.
The population means for the Variable Y and X is 15 and 62.14, respectively.
The sample size from stratum-1, stratum-2 and stratum-3 is arbitrarily decided as
4, 8 and 6, respectively.
The overall mean of stratum-1, stratum-2 and stratum-3 for variable X is 25, 45
and 85, respectively.
11,51 24,96
10,49 17,68
9,45
12,54
Sample Means
Sample mean of variable Y
$$\bar{y}_{st} = \sum_{h=1}^{k}W_h\bar{y}_h = \sum_{h=1}^{k}N_h\bar{y}_h/N = \frac{1}{N}\left(N_1\bar{y}_1 + N_2\bar{y}_2 + N_3\bar{y}_3\right)$$
Sample mean of variable X
$$\bar{x}_{st} = \sum_{h=1}^{k}W_h\bar{x}_h = \sum_{h=1}^{k}N_h\bar{x}_h/N = \frac{1}{N}\left(N_1\bar{x}_1 + N_2\bar{x}_2 + N_3\bar{x}_3\right)$$
Separate Type Ratio Estimator
$$\bar{y}_{rsst} = \sum_{h=1}^{3}W_h\,\frac{\bar{y}_h}{\bar{x}_h}\,\bar{X}_h = W_1\frac{\bar{y}_1}{\bar{x}_1}\bar{X}_1 + W_2\frac{\bar{y}_2}{\bar{x}_2}\bar{X}_2 + W_3\frac{\bar{y}_3}{\bar{x}_3}\bar{X}_3$$
$$\bar{y}_{rsst} = \sum_{h=1}^{3}W_h\bar{y}_{rh} = W_1\bar{y}_{r1} + W_2\bar{y}_{r2} + W_3\bar{y}_{r3},\qquad \text{where } \bar{y}_{rh} = \frac{\bar{y}_h}{\bar{x}_h}\,\bar{X}_h$$
W1 = 100/700 = 0.142857
W2 = 250/700 = 0.357143
W3 = 350/700 = 0.5
$$\bar{y}_{rsst} = W_1\bar{y}_{r1} + W_2\bar{y}_{r2} + W_3\bar{y}_{r3} = 0.143 \times 2.55 + 0.357 \times 9.12 + 0.5 \times 20.65 \approx 13.95$$
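The weighted sum above can be sketched in R. The stratum ratio estimates (2.55, 9.12, 20.65) are taken as reported in the example; the function `separate_ratio` is a hypothetical helper that shows how the estimator would be computed from stratum means when these are available.

```r
# Separate ratio estimator from stratum-level summaries (illustrative helper).
separate_ratio <- function(ybar_h, xbar_h, Xbar_h, Wh) {
  sum(Wh * ybar_h / xbar_h * Xbar_h)
}

Nh <- c(100, 250, 350)
Wh <- Nh / sum(Nh)

# Stratum ratio estimates as reported in the example.
yr_h <- c(2.55, 9.12, 20.65)
yrs  <- sum(Wh * yr_h)
round(yrs, 2)

# Tiny made-up check of the helper: two equal strata where ybar_h / xbar_h = 2.
separate_ratio(ybar_h = c(2, 4), xbar_h = c(1, 2), Xbar_h = c(1, 2), Wh = c(0.5, 0.5))
```

Because each stratum uses its own ratio, the separate estimator needs the population mean of X within every stratum, unlike the combined estimator, which needs only the overall mean.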
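$$\bar{y}_{reg} = \bar{y} + \hat{\beta}_{yx}(\bar{X} - \bar{x}),\qquad \hat{\beta}_{yx} = \frac{S_{yx}}{S_x^2}$$
Combined Type Regression Estimator
$$\bar{y}_{regc} = \sum_{h=1}^{k}W_h\bar{y}_h + \hat{\beta}_{yx}\left(\bar{X} - \sum_{h=1}^{k}W_h\bar{x}_h\right)$$
$$E(\bar{y}_{regc}) = \sum_{h=1}^{k}W_hE(\bar{y}_h) + \hat{\beta}_{yx}\left(\bar{X} - \sum_{h=1}^{k}W_hE(\bar{x}_h)\right)$$
$$E(\bar{y}_{regc}) = \sum_{h=1}^{k}W_h\bar{Y}_h + \hat{\beta}_{yx}\left(\bar{X} - \sum_{h=1}^{k}W_h\bar{X}_h\right)$$
Expectation of Regression Estimator
$$E(\bar{y}_{regc}) = \bar{Y} + \hat{\beta}_{yx}(\bar{X} - \bar{X})$$
$$E(\bar{y}_{regc}) = \bar{Y}$$
Variance of Regression Estimator
$$V(\bar{y}_{regc}) = E\left[\bar{y}_{st} + \hat{\beta}_{yx}(\bar{X} - \bar{x}_{st}) - \bar{Y}\right]^2$$
$$V(\bar{y}_{regc}) = E\left[\sum_{h=1}^{k}W_h\bar{y}_h + \hat{\beta}_{yx}\left(\bar{X} - \sum_{h=1}^{k}W_h\bar{x}_h\right) - \bar{Y}\right]^2$$
$$V(\bar{y}_{regc}) = E\left[\left(\sum_{h=1}^{k}W_h\bar{y}_h - \bar{Y}\right) - \hat{\beta}_{yx}\left(\sum_{h=1}^{k}W_h\bar{x}_h - \bar{X}\right)\right]^2$$
Variance of Combined Type Regression Estimator
With the regression coefficient fixed at $\beta_0$,
$$V(\bar{y}_{regc}) = \sum_{h=1}^{k}W_h^2\theta_h\left(S_{hy}^2 + \beta_0^2S_{hx}^2 - 2\beta_0S_{hxy}\right)$$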
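$$\bar{y}_{reg} = \bar{y} + \hat{\beta}_{yx}(\bar{X} - \bar{x}),\qquad \hat{\beta}_{yx} = \frac{S_{yx}}{S_x^2}$$
Separate Type Regression Estimator
$$\bar{y}_{regs} = \sum_{h=1}^{k}W_h\left[\bar{y}_h + \hat{\beta}_{yxh}(\bar{X}_h - \bar{x}_h)\right]$$
Expectation of Separate Type Regression Estimator
$$E(\bar{y}_{regs}) = \sum_{h=1}^{k}W_h\left[E(\bar{y}_h) + \hat{\beta}_{yxh}\left(\bar{X}_h - E(\bar{x}_h)\right)\right]$$
$$E(\bar{y}_{regs}) = \sum_{h=1}^{k}W_h\left[\bar{Y}_h + \hat{\beta}_{yxh}(\bar{X}_h - \bar{X}_h)\right]$$
$$E(\bar{y}_{regs}) = \sum_{h=1}^{k}W_h\bar{Y}_h = \bar{Y}$$
Variance of Regression Estimator
$$\bar{y}_{regs} = \sum_{h=1}^{k}W_h\bar{y}_h + \sum_{h=1}^{k}W_h\hat{\beta}_{yxh}(\bar{X}_h - \bar{x}_h)$$
$$V(\bar{y}_{regs}) = E\left(\bar{y}_{regs} - \bar{Y}\right)^2$$
$$V(\bar{y}_{regs}) = E\left[\sum_{h=1}^{k}W_h\bar{y}_h + \sum_{h=1}^{k}W_h\hat{\beta}_{yxh}(\bar{X}_h - \bar{x}_h) - \bar{Y}\right]^2$$
$$V(\bar{y}_{regs}) = E\left[\sum_{h=1}^{k}W_h(\bar{y}_h - \bar{Y}_h) - \sum_{h=1}^{k}W_h\hat{\beta}_{yxh}(\bar{x}_h - \bar{X}_h)\right]^2$$
$$V(\bar{y}_{regs}) = \sum_{h=1}^{k}W_h^2E(\bar{y}_h - \bar{Y}_h)^2 + \sum_{h=1}^{k}W_h^2\hat{\beta}_{yxh}^2E(\bar{x}_h - \bar{X}_h)^2 - 2\sum_{h=1}^{k}W_h^2\hat{\beta}_{yxh}E\left[(\bar{y}_h - \bar{Y}_h)(\bar{x}_h - \bar{X}_h)\right]$$
$$V(\bar{y}_{regs}) = \sum_{h=1}^{k}W_h^2\theta_h\left(S_{hy}^2 + \beta_{yxh}^2S_{hx}^2 - 2\beta_{yxh}S_{hxy}\right)$$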
library(mvtnorm)
#population and sample size.
N=300;N1=100;N2=100;N3=100;n1=15;n2=15;n3=15;n=45;
w1=N1/N;
w2=N2/N;
w3=N3/N;
Defining Parameters
r1=rmvnorm(N1,m1,sig1);
r2=rmvnorm(N2,m2,sig2);
r3=rmvnorm(N3,m3,sig3);
# generate random variable for #stratum 1.
y1=r1[,1];x1=r1[,2]
# generate random variable for #stratum 2.
y2=r2[,1];x2=r2[,2]
# generate random variable for #stratum 3.
y3=r3[,1];x3=r3[,2]
x<-c(x1,x2,x3)
Looping
> mean(yst);mean(xst);mean(rst);
[1] 20.01764
[1] 10.02838
[1] 20.02706
> var(yst);var(rst);var(yst)/var(rst)
Q. Describe the process of a simulation study for the combined type ratio estimator, and find the mean
and variance of the estimator using the following data
N=300;N1=100;N2=100;N3=100;n1=15;n2=15;n3=15;n=45;
Ans.
library(mvtnorm)
#population and sample size.
N=300;N1=100;N2=100;N3=100;n1=15;n2=15;n3=15;n=45;
w1=N1/N;
w2=N2/N;
w3=N3/N;
# mean vectors for stratum 1, 2 and 3.
m1=c(10,5);
m2=c(20,10);
m3=c(30,15)
#variance covariance matrix for #stratum 1, 2 and 3 given below.
sig1=matrix(c(1,0.85,0.85,1),ncol=2)
sig2=matrix(c(1,0.80,0.80,1),ncol=2); sig3=matrix(c(1,0.75,0.75,1),ncol=2);
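r1=rmvnorm(N1,m1,sig1);
r2=rmvnorm(N2,m2,sig2);
r3=rmvnorm(N3,m3,sig3);
# study (y) and auxiliary (x) variables for stratum 1, 2 and 3.
y1=r1[,1];x1=r1[,2]
y2=r2[,1];x2=r2[,2]
y3=r3[,1];x3=r3[,2]
x<-c(x1,x2,x3)
# repeated stratified sampling (SRSWOR within each stratum).
yst=c();xst=c();rst=c();
for(i in 1:10000)
{
sa1=sample(1:N1,n1)
xs1=x1[sa1];ys1=y1[sa1];
sa2=sample(1:N2,n2)
xs2=x2[sa2];ys2=y2[sa2];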
sa3=sample(1:N3,n3)
xs3=x3[sa3];ys3=y3[sa3];
yst[i]=w1*mean(ys1)+w2*mean(ys2)+w3*mean(ys3);
xst[i]=w1*mean(xs1)+w2*mean(xs2)+w3*mean(xs3);
rst[i]=yst[i]*(mean(x)/(xst[i]))
}
mean(yst);mean(xst);
mean(rst);
var(yst);var(rst);
var(yst)/var(rst)
Q. Describe the process of a simulation study for the separate type ratio estimator, and find the mean
and variance of the estimator using the following data
library(mvtnorm)
#population and sample size.
N=300;N1=100;N2=100;N3=100;n1=15;n2=15;n3=15;n=45;
# mean vectors for stratum 1, 2 and 3.
m1=c(10,5);
m2=c(20,10);
m3=c(30,15)
#variance covariance matrix for #stratum 1, 2 and 3 given #below.
sig1=matrix(c(1,0.85,0.85,1),ncol=2);
sig2=matrix(c(1,0.80,0.80,1),ncol=2); sig3=matrix(c(1,0.75,0.75,1),ncol=2);
Ans:
library(mvtnorm)
#population and sample size.
N=300;N1=100;N2=100;N3=100;n1=15;n2=15;n3=15;n=45;
w1=N1/N;
w2=N2/N;
w3=N3/N;
r2=rmvnorm(N2,m2,sig2);
r3=rmvnorm(N3,m3,sig3);
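The answer above breaks off after the populations are generated. The following is a minimal sketch of how the remaining steps might look, not the handout's own program. To keep it self-contained it uses base R only: the hypothetical helper `rbvn` generates a bivariate normal pair with unit variances and correlation `rho`, standing in for `rmvnorm`. The seed and the list layout are also assumptions.

```r
set.seed(632)                                   # arbitrary seed, for reproducibility

# Hypothetical helper: bivariate normal (y, x), unit variances, correlation rho.
rbvn <- function(n, mu, rho) {
  z1 <- rnorm(n); z2 <- rho * z1 + sqrt(1 - rho^2) * rnorm(n)
  cbind(y = mu[1] + z1, x = mu[2] + z2)
}

Nh  <- c(100, 100, 100); nh <- c(15, 15, 15)    # stratum and sample sizes
Wh  <- Nh / sum(Nh)
mu  <- list(c(10, 5), c(20, 10), c(30, 15))     # stratum mean vectors (Y, X)
rho <- c(0.85, 0.80, 0.75)                      # stratum correlations
pop <- lapply(1:3, function(h) rbvn(Nh[h], mu[[h]], rho[h]))

Xbar_h <- sapply(pop, function(p) mean(p[, "x"]))        # known stratum means of X
Ybar   <- sum(Wh * sapply(pop, function(p) mean(p[, "y"])))

B <- 10000
yst <- numeric(B); yrs <- numeric(B)
for (i in 1:B) {
  ybar_h <- numeric(3); xbar_h <- numeric(3)
  for (h in 1:3) {
    s <- sample(Nh[h], nh[h])                   # SRSWOR within stratum h
    ybar_h[h] <- mean(pop[[h]][s, "y"])
    xbar_h[h] <- mean(pop[[h]][s, "x"])
  }
  yst[i] <- sum(Wh * ybar_h)                    # stratified mean
  yrs[i] <- sum(Wh * ybar_h / xbar_h * Xbar_h)  # separate ratio estimator
}
mean(yst); mean(yrs)
var(yst); var(yrs)
var(yst) / var(yrs)                             # relative efficiency
```

Whether the ratio estimator gains over the stratified mean depends on the within-stratum correlation relative to the coefficients of variation, so the relative efficiency printed here may fall on either side of 1.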
Double Sampling
Introduction:
Ratio and regression methods of estimation require the knowledge of population mean of auxiliary
variable in advance. An estimate of mean of auxiliary variable from a large sample may be used. This
procedure of selecting a large sample for collecting information on auxiliary variable x and then selecting
a subsample from it for collecting the information on the study variable y is called double sampling or
two-phase sampling.
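The estimate is calculated by taking the sample in two phases. The expected value of the statistic
is
$$E(t) = E_1\left[E_2(t)\right]$$
and its variance is
$$Var(t) = E\left[t - E(t)\right]^2$$
$$Var(t) = E\left[t - E_2(t) + E_2(t) - E(t)\right]^2$$
$$Var(t) = E\left[t - E_2(t)\right]^2 + E\left[E_2(t) - E(t)\right]^2$$
$$Var(t) = E_1E_2\left[t - E_2(t)\right]^2 + E_1E_2\left[E_2(t) - E(t)\right]^2$$
$$Var(t) = E_1\left[V_2(t)\right] + E_1\left[E_2(t) - E_1E_2(t)\right]^2$$
$$Var(t) = E_1\left[V_2(t)\right] + V_1\left[E_2(t)\right]$$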
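The ratio estimator of the mean is
$$\bar{y}_r = \frac{\bar{y}}{\bar{x}}\,\bar{X}$$
In double sampling the unknown $\bar{X}$ is replaced by the first-phase sample mean $\bar{x}_1$:
$$\bar{y}_{rd} = \frac{\bar{y}_2}{\bar{x}_2}\,\bar{x}_1$$
Notations
$$e_0 = \frac{\bar{y}_2 - \bar{Y}}{\bar{Y}},\qquad e_1 = \frac{\bar{x}_1 - \bar{X}}{\bar{X}},\qquad e_2 = \frac{\bar{x}_2 - \bar{X}}{\bar{X}}$$
Using these notations, with $\theta_j = 1/n_j - 1/N$ for phase j,
$$E(e_0) = 0,\quad E(e_0^2) = \theta_2C_y^2,$$
$$E(e_1) = 0,\quad E(e_1^2) = \theta_1C_x^2,$$
$$E(e_2) = 0,\quad E(e_2^2) = \theta_2C_x^2,$$
$$E(e_1e_2) = \theta_1C_x^2,\quad E(e_0e_1) = \theta_1\rho_{xy}C_yC_x,\quad E(e_0e_2) = \theta_2\rho_{xy}C_yC_x$$
Proof:
$$E(e_1e_2) = \frac{E\left[(\bar{x}_1 - \bar{X})(\bar{x}_2 - \bar{X})\right]}{\bar{X}^2} = \frac{E_1E_2\left[(\bar{x}_1 - \bar{X})(\bar{x}_2 - \bar{X})\right]}{\bar{X}^2}$$
$$E(e_1e_2) = \frac{E_1\left[(\bar{x}_1 - \bar{X})(\bar{x}_1 - \bar{X})\right]}{\bar{X}^2} = \frac{E_1(\bar{x}_1 - \bar{X})^2}{\bar{X}^2} = \frac{\theta_1S_x^2}{\bar{X}^2}$$
$$E(e_1e_2) = \theta_1C_x^2$$
Writing the estimator in terms of the errors,
$$\bar{y}_{rd} = \frac{\bar{Y}(1+e_0)}{\bar{X}(1+e_2)}\,\bar{X}(1+e_1) = \frac{\bar{Y}(1+e_0)}{(1+e_2)}\,(1+e_1)$$
The Bias of Ratio Estimator in Double Sampling
$$\bar{y}_{rd} = \bar{Y}(1+e_0)(1+e_1)(1+e_2)^{-1}$$
$$\bar{y}_{rd} - \bar{Y} \approx \bar{Y}\left(e_0 + e_1 - e_2 + e_2^2 + e_0e_1 - e_0e_2 - e_1e_2\right)$$
$$Bias(\bar{y}_{rd}) \approx \bar{Y}(\theta_2 - \theta_1)\left(C_x^2 - \rho_{xy}C_yC_x\right)$$
$$E(\bar{y}_{rd} - \bar{Y})^2 \approx \bar{Y}^2E\left(e_1^2 + e_2^2 + e_0^2 + 2e_0e_1 - 2e_0e_2 - 2e_1e_2\right)$$
The Product Estimator in Double Sampling
$$e_0 = \frac{\bar{y}_2 - \bar{Y}}{\bar{Y}},\quad e_1 = \frac{\bar{x}_1 - \bar{X}}{\bar{X}},\quad e_2 = \frac{\bar{x}_2 - \bar{X}}{\bar{X}}$$
$$\bar{y}_2 = \bar{Y}(1+e_0),\quad \bar{x}_1 = \bar{X}(1+e_1),\quad \bar{x}_2 = \bar{X}(1+e_2)$$
$$\bar{y}_{pd} = \frac{\bar{y}_2}{\bar{x}_1}\,\bar{x}_2 = \frac{\bar{Y}(1+e_0)}{\bar{X}(1+e_1)}\,\bar{X}(1+e_2) = \frac{\bar{Y}(1+e_0)}{(1+e_1)}\,(1+e_2)$$
$$\bar{y}_{pd} = \bar{Y}(1+e_0)(1+e_2)(1+e_1)^{-1}$$
$$E(\bar{y}_{pd}) \approx \bar{Y}\left[1 + \theta_1C_x^2 - \theta_1C_x^2 + \theta_2\rho C_yC_x - \theta_1\rho C_yC_x\right]$$
$$Bias(\bar{y}_{pd}) \approx \bar{Y}(\theta_2 - \theta_1)\,\rho\,C_yC_x$$
The MSE of Product Estimator in Double Sampling
$$\bar{y}_{pd} = \bar{Y}(1+e_0)(1+e_2)(1+e_1)^{-1}$$
$$\bar{y}_{pd} - \bar{Y} \approx \bar{Y}(e_2 - e_1 + e_0)$$
$$E(\bar{y}_{pd} - \bar{Y})^2 \approx \bar{Y}^2E(e_2 - e_1 + e_0)^2$$
$$MSE(\bar{y}_{pd}) \approx \bar{Y}^2\left[\theta_2C_y^2 + (\theta_2 - \theta_1)\left(C_x^2 + 2C_xC_y\rho_{xy}\right)\right]$$
$$MSE(\bar{y}_{pd}) \approx \theta_2\bar{Y}^2\left(C_y^2 + C_x^2 + 2C_xC_y\rho_{xy}\right) - \theta_1\bar{Y}^2\left(C_x^2 + 2C_xC_y\rho_{xy}\right)$$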
Q. Generate a population of size 1000 from the bivariate normal distribution for (Y, X) with
$$\mu = (10,\ 2),\qquad \Sigma = \begin{pmatrix} 1 & 0.85 \\ 0.85 & 1 \end{pmatrix}$$
Consider sample sizes n1=100;n2=20. Select 10,000 random samples using double sampling and
calculate
$$t = \bar{y}_2\,\frac{\bar{x}_1}{\bar{x}_2}$$
Further, calculate the MSE of the above estimator.
Ans:
library(mvtnorm)
N=1000;ryx=0.85; n1=100;n2=20;
m=c(10,2); # vector of mean.
mean(mratio)
var(mratio)
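Only the first and last lines of the answer survive above. The following is a minimal sketch of the full simulation under stated assumptions: base R replaces `rmvnorm` (the bivariate normal pair is built from two independent standard normals), the seed is arbitrary, and the names `s1`, `s2` and `t_rd` are illustrative.

```r
set.seed(632)                                # arbitrary seed, for reproducibility
N <- 1000; n1 <- 100; n2 <- 20; rho <- 0.85

# Bivariate normal population: means (10, 2), unit variances, correlation rho.
z <- rnorm(N)
y <- 10 + z
x <- 2 + rho * z + sqrt(1 - rho^2) * rnorm(N)

B <- 10000
t_rd <- numeric(B)
for (i in 1:B) {
  s1 <- sample(N, n1)                        # first phase: only x is observed
  s2 <- sample(s1, n2)                       # second phase: subsample, y and x observed
  t_rd[i] <- mean(y[s2]) / mean(x[s2]) * mean(x[s1])   # double sampling ratio estimator
}
mean(t_rd)                                   # compare with mean(y)
var(t_rd)
mse <- mean((t_rd - mean(y))^2)              # empirical MSE about the population mean
mse
```

The second-phase sample is drawn from the first-phase units, which is what makes the covariance between the two sample means of x equal to the first-phase variance in the derivation above.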
Q. Prove that sample mean is unbiased estimator of population mean in two stage sampling
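$$\bar{y} = \sum_{i=1}^{n}\sum_{j=1}^{m}\frac{y_{ij}}{nm}$$
Ans:
The estimate is calculated by taking the sample in two stages.
The expected value and variance of the statistic are
$$E(t) = E_1\left[E_2(t)\right]$$
$$Var(t) = E_1\left[V_2(t)\right] + V_1\left[E_2(t)\right]$$
$$E(\bar{y}) = E_1\left[E_2(\bar{y})\right] = E_1E_2\left[\sum_{i=1}^{n}\sum_{j=1}^{m}\frac{y_{ij}}{nm}\right]$$
$$E(\bar{y}) = E_1\left[\sum_{i=1}^{n}\frac{1}{n}\sum_{j=1}^{m}\frac{E_2(y_{ij})}{m}\right]$$
$$E(\bar{y}) = E_1\left[\sum_{i=1}^{n}\frac{1}{n}\sum_{j=1}^{M}\frac{y_{ij}}{M}\right]$$
$$E(\bar{y}) = E_1\left[\sum_{i=1}^{n}\frac{1}{n}\,\bar{Y}_i\right] = E_1(\bar{Y}_i)$$
$$E(\bar{y}) = \sum_{i=1}^{N}\frac{\bar{Y}_i}{N}$$
$$E(\bar{y}) = \bar{Y}$$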
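Q. Find the variance of the sample mean in two stage sampling.
$$\bar{y} = \sum_{i=1}^{n}\sum_{j=1}^{m}\frac{y_{ij}}{nm}$$
Ans:
$$\bar{y} = \sum_{i=1}^{n}\sum_{j=1}^{m}\frac{y_{ij}}{nm} = \sum_{i=1}^{n}\frac{1}{n}\,\bar{y}_i$$
$$Var(\bar{y}) = E_1\left[V_2(\bar{y})\right] + V_1\left[E_2(\bar{y})\right]$$
For the second term,
$$E_2(\bar{y}) = \sum_{i=1}^{n}\frac{1}{n}\sum_{j=1}^{m}\frac{E_2(y_{ij})}{m} = \sum_{i=1}^{n}\frac{1}{n}\sum_{j=1}^{M}\frac{y_{ij}}{M} = \sum_{i=1}^{n}\frac{1}{n}\,\bar{Y}_i = \bar{Y}_n$$
$\bar{Y}_n$ is the estimator based on the 1st stage sample of size n. By SRS, we have
$$V(\bar{y}) = \frac{N-n}{Nn}\,S^2$$
$$V_1\left[E_2(\bar{y})\right] = V_1(\bar{Y}_n) = \frac{N-n}{Nn}\,S_1^2$$
For the first term,
$$E_1\left[V_2(\bar{y})\right] = E_1\left[V_2\left(\sum_{i=1}^{n}\frac{1}{n}\,\bar{y}_i\right)\right] = E_1\left[\frac{1}{n^2}\sum_{i=1}^{n}V_2(\bar{y}_i) + \frac{1}{n^2}\sum_{i \neq j}Cov(\bar{y}_i, \bar{y}_j)\right]$$
The covariance terms vanish because the second stage samples are drawn independently, so
$$E_1\left[V_2(\bar{y})\right] = E_1\left[\frac{1}{n^2}\sum_{i=1}^{n}V_2(\bar{y}_i)\right] = E_1\left[\frac{1}{n^2}\sum_{i=1}^{n}\frac{M-m}{Mm}\,S_{2i}^2\right]$$
$$E_1\left[V_2(\bar{y})\right] = \frac{1}{n^2}\,\frac{M-m}{Mm}\sum_{i=1}^{n}E_1(S_{2i}^2) = \frac{1}{n}\,\frac{M-m}{Mm}\,E_1(S_{2i}^2)$$
$$E_1\left[V_2(\bar{y})\right] = \frac{1}{n}\,\frac{M-m}{Mm}\sum_{i=1}^{N}\frac{S_{2i}^2}{N} = \frac{1}{n}\,\frac{M-m}{Mm}\,S_2^2$$
Combining the two terms,
$$Var(\bar{y}) = \frac{1}{n}\,\frac{M-m}{Mm}\,S_2^2 + \frac{N-n}{Nn}\,S_1^2$$
The sample estimate is
$$var(\bar{y}) = \frac{1}{n}\,\frac{M-m}{Mm}\,s_2^2 + \frac{N-n}{Nn}\,s_1^2$$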
Q. Explain two stage sampling for unequal first stage units, also find expected value of
following estimator.
$$\bar{y} = \sum_{i=1}^{n}\frac{M_i\bar{y}_i}{n\bar{M}} = \sum_{i=1}^{n}\frac{u_i\bar{y}_i}{n},\qquad u_i = \frac{M_i}{\bar{M}}$$
Let the population have N clusters with Mi as the size of the ith cluster. Let a sample of n first stage units
be selected from this population. Let mi units be selected at the second stage from the ith selected cluster.
Mi = Size of ith cluster
$$M_0 = \sum_{i=1}^{N}M_i = N\bar{M}$$
Mean of ith first stage unit
$$\bar{Y}_i = \frac{\sum_{j=1}^{M_i}y_{ij}}{M_i}$$
Overall population mean
$$\bar{Y} = \frac{\sum_{i=1}^{N}M_i\bar{Y}_i}{N\bar{M}}$$
The Expected Value of Mean
$$E(\bar{y}) = E_1\left[E_2(\bar{y})\right] = E_1E_2\left[\sum_{i=1}^{n}\frac{u_i\bar{y}_i}{n}\right]$$
$$E(\bar{y}) = E_1\left[\sum_{i=1}^{n}\frac{u_i\bar{Y}_i}{n}\right] = \sum_{i=1}^{n}\frac{E_1(u_i\bar{Y}_i)}{n}$$
$$E(\bar{y}) = \sum_{i=1}^{N}\frac{u_i\bar{Y}_i}{N} = \sum_{i=1}^{N}\frac{M_i\bar{Y}_i}{\bar{M}N}$$
$$E(\bar{y}) = \bar{Y}$$
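Q. Find the variance expression of following estimator in two stage sampling when first
stage units are not equal in sizes.
$$\bar{y} = \sum_{i=1}^{n}\frac{M_i\bar{y}_i}{n\bar{M}} = \sum_{i=1}^{n}\frac{u_i\bar{y}_i}{n}$$
Ans
$$Var(\bar{y}) = E_1\left[V_2(\bar{y})\right] + V_1\left[E_2(\bar{y})\right]$$
$$V_1\left[E_2(\bar{y})\right] = V_1\left[E_2\left(\sum_{i=1}^{n}\frac{u_i\bar{y}_i}{n}\right)\right] = V_1\left[\sum_{i=1}^{n}\frac{u_i\bar{Y}_i}{n}\right]$$
$$V_1\left[E_2(\bar{y})\right] = \left(\frac{1}{n} - \frac{1}{N}\right)S_1^2$$
$$E_1\left[V_2(\bar{y})\right] = E_1\left[V_2\left(\sum_{i=1}^{n}\frac{u_i\bar{y}_i}{n}\right)\right] = E_1\left[\sum_{i=1}^{n}\frac{u_i^2}{n^2}\,V_2(\bar{y}_i)\right] = E_1\left[\sum_{i=1}^{n}\frac{u_i^2}{n^2}\,\frac{M_i - m_i}{M_im_i}\,S_{2i}^2\right]$$
$$E_1\left[V_2(\bar{y})\right] = \sum_{i=1}^{N}\frac{u_i^2}{n}\,\frac{M_i - m_i}{M_im_i}\,\frac{S_{2i}^2}{N}$$
$$Var(\bar{y}) = \sum_{i=1}^{N}\frac{u_i^2}{n}\,\frac{M_i - m_i}{M_im_i}\,\frac{S_{2i}^2}{N} + \frac{N-n}{Nn}\,S_1^2$$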
Q. Write the program for Two Stage Sampling using R language considering following
data taking n1=3;n2=2. Calculate the sample mean
Ans:
#--Defining Clusters in R----
Clu1<-c(125,115,129,134,111)
Clu2<-c(134,125,142,141,131)
Clu3<-c(144,143,122,134,126)
Clu4<-c(114,111,134,131,146)
Clu5<-c(119,126,122,129,130)
Clu6<-c(140,125,124,124,115)
Clus<-data.frame(Clu1,Clu2,Clu3,Clu4,Clu5,Clu6)
pop<-t(Clus)
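#--1st Stage Sample: select n1 clusters (rows of pop)----
n1=3;n2=2;
w<-pop[sample(nrow(pop),n1),]
sc1<-w[1,]
sc2<-w[2,]
sc3<-w[3,]
#--2nd Stage Sample: select n2 units from each selected cluster----
s2_sc1<-sample(sc1,n2)
s2_sc2<-sample(sc2,n2)
s2_sc3<-sample(sc3,n2)
#--Sample mean of the final sample----
f_s=c(s2_sc1,s2_sc2,s2_sc3)
est=mean(f_s)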
Q2: Generate the three clusters such that
Cluster-1 from normal distribution with mean zero and variance 1.
Cluster-2 from normal distribution with mean 2 and variance 5.
Cluster-3 from normal distribution with mean 4 and variance 9.
Calculate the sample mean using two stage cluster sampling taking n1=2;n2=15.
Ans:
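Clu1<-rnorm(100,0,1)
Clu2<-rnorm(100,2,sqrt(5)) # rnorm() takes the standard deviation: variance 5 means sd = sqrt(5).
Clu3<-rnorm(100,4,3) # variance 9 means sd = 3.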
Clus<-data.frame(Clu1,Clu2,Clu3)
pop<-t(Clus)
#1st Stage Sampling
n1=2;n2=15;
w<-pop[sample(nrow(pop),n1),]
sc1<-w[1,]
sc2<-w[2,]
#2nd Stage Sampling
s2_sc1<-sample(sc1,n2)
s2_sc2<-sample(sc2,n2)
f_s=c(s2_sc1,s2_sc2)
est=mean(f_s)
Q.Find Bias of Ratio Estimator in Two Stage Sampling.
Ans:
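The Ratio Estimator of mean under two stage sampling when the first stage units are equal is
$$\bar{y}_{r2s} = \frac{\bar{y}}{\bar{x}}\,\bar{X}$$
$$V(\bar{y}) = \frac{1}{n}\,\frac{M-m}{Mm}\,S_2^2 + \frac{N-n}{Nn}\,S_1^2$$
$$V_{02} = \frac{V(\bar{y})}{\bar{Y}^2} = \frac{1}{\bar{Y}^2}\left[\frac{1}{n}\,\frac{M-m}{Mm}\,S_{2y}^2 + \frac{N-n}{Nn}\,S_{1y}^2\right]$$
$$V_{20} = \frac{V(\bar{x})}{\bar{X}^2} = \frac{1}{\bar{X}^2}\left[\frac{1}{n}\,\frac{M-m}{Mm}\,S_{2x}^2 + \frac{N-n}{Nn}\,S_{1x}^2\right]$$
$$V_{11} = \frac{1}{\bar{Y}\bar{X}}\left[\frac{1}{nN}\sum_{i=1}^{N}\frac{M-m}{Mm}\left(\rho_{xy2i}S_{y2i}S_{x2i}\right) + \frac{N-n}{Nn}\left(\rho_{xy1}S_{y1}S_{x1}\right)\right]$$
$$S_{y2i}^2 = \frac{1}{M-1}\sum_{j=1}^{M}(y_{ij} - \bar{Y}_i)^2,\qquad \rho_{xy2i} = \frac{S_{xy2i}}{S_{x2i}S_{y2i}}$$
Notations
$$e_0 = \frac{\bar{y} - \bar{Y}}{\bar{Y}},\quad e_1 = \frac{\bar{x} - \bar{X}}{\bar{X}},\qquad E(e_0) = E(e_1) = 0,\quad E(e_0^2) = V_{02},\quad E(e_1^2) = V_{20},\quad E(e_0e_1) = V_{11}$$
$$\bar{y} = \bar{Y}(1+e_0),\qquad \bar{x} = \bar{X}(1+e_1)$$
$$\bar{y}_{r2s} = \frac{\bar{y}}{\bar{x}}\,\bar{X} = \frac{\bar{Y}(1+e_0)}{\bar{X}(1+e_1)}\,\bar{X}$$
Bias of Ratio Estimator Under Two Stage Sampling
$$\bar{y}_{r2s} = \bar{Y}(1+e_0)(1+e_1)^{-1} \approx \bar{Y}(1+e_0)(1 - e_1 + e_1^2)$$
$$\bar{y}_{r2s} - \bar{Y} \approx \bar{Y}\left(e_0 - e_1 + e_1^2 - e_0e_1\right)$$
$$E(\bar{y}_{r2s} - \bar{Y}) \approx \bar{Y}\,E\left(e_0 - e_1 + e_1^2 - e_0e_1\right)$$
$$Bias(\bar{y}_{r2s}) \approx \bar{Y}\left(V_{20} - V_{11}\right)$$
MSE of Ratio Estimator Under Two Stage Sampling
$$\bar{y}_{r2s} = \bar{Y}(1+e_0)(1+e_1)^{-1} \approx \bar{Y}(1 + e_0 - e_1 - e_0e_1)$$
$$\bar{y}_{r2s} - \bar{Y} \approx \bar{Y}(e_0 - e_1)$$
$$E(\bar{y}_{r2s} - \bar{Y})^2 \approx \bar{Y}^2E(e_0 - e_1)^2 = \bar{Y}^2\left(V_{02} + V_{20} - 2V_{11}\right)$$
Bias of Product Estimator Under Two Stage Sampling
$$\bar{y}_{p2s} = \frac{\bar{y}\,\bar{x}}{\bar{X}} = \frac{\bar{Y}(1+e_0)(1+e_1)}{\bar{X}}\,\bar{X}$$
$$\bar{y}_{p2s} - \bar{Y} = \bar{Y}\left(e_0 + e_1 + e_0e_1\right)$$
$$E(\bar{y}_{p2s} - \bar{Y}) = \bar{Y}\,E\left(e_0 + e_1 + e_0e_1\right)$$
$$Bias(\bar{y}_{p2s}) = \bar{Y}\,V_{11}$$
Q.Find MSE of Product Estimator in Two Stage Sampling
Ans:
The Product Estimator of mean under two stage sampling when the first stage units are equal is
$$\bar{y}_{p2s} = \frac{\bar{y}\,\bar{x}}{\bar{X}}$$
Notations
$$E(e_0) = E(e_1) = 0,\qquad E(e_0e_1) = V_{11}$$
$$e_0 = \frac{\bar{y} - \bar{Y}}{\bar{Y}},\qquad e_1 = \frac{\bar{x} - \bar{X}}{\bar{X}}$$
$$\bar{y} = \bar{Y}(1+e_0),\qquad \bar{x} = \bar{X}(1+e_1)$$
$$\bar{y}_{p2s} = \frac{\bar{y}\,\bar{x}}{\bar{X}} = \frac{\bar{Y}(1+e_0)(1+e_1)}{\bar{X}}\,\bar{X}$$
MSE of Product Estimator
$$\bar{y}_{p2s} = \bar{Y}(1+e_0)(1+e_1) = \bar{Y}(1 + e_0 + e_1 + e_0e_1)$$
$$\bar{y}_{p2s} - \bar{Y} \approx \bar{Y}(e_0 + e_1)$$
$$E(\bar{y}_{p2s} - \bar{Y})^2 \approx \bar{Y}^2E(e_0 + e_1)^2 = \bar{Y}^2E\left(e_0^2 + e_1^2 + 2e_0e_1\right) = \bar{Y}^2\left(V_{02} + V_{20} + 2V_{11}\right)$$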
Introduction:
Ranked set sampling (RSS) is an alternative to simple random sampling that can sometimes offer
large improvements in precision. McIntyre (1952) introduced the basic concept of ranked set
sampling in order to estimate the population means of pasture and forage yields. The RSS
procedure is elaborated as follows
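First, a simple random sample of size k is drawn from the population and the k sampling units
are ranked with respect to the variable of interest, say X, by judgment without actual
measurement. Then the unit with rank 1 is identified and taken for the measurement of X. Next,
another simple random sample of size k is drawn and the units of the sample are ranked by
judgment; the unit with rank 2 is taken for the measurement of X and the remaining units are
discarded. The process is continued until the unit with rank k is taken from the kth sample.
For k = 6, the ranked sets of the first cycle are shown below, where $x_{i(j)1}$ is the unit with rank j in the ith set; the diagonal units $x_{i(i)1}$ are the ones measured.
$$\begin{matrix}
x_{1(1)1} & x_{1(2)1} & x_{1(3)1} & x_{1(4)1} & x_{1(5)1} & x_{1(6)1} \\
x_{2(1)1} & x_{2(2)1} & x_{2(3)1} & x_{2(4)1} & x_{2(5)1} & x_{2(6)1} \\
x_{3(1)1} & x_{3(2)1} & x_{3(3)1} & x_{3(4)1} & x_{3(5)1} & x_{3(6)1} \\
x_{4(1)1} & x_{4(2)1} & x_{4(3)1} & x_{4(4)1} & x_{4(5)1} & x_{4(6)1} \\
x_{5(1)1} & x_{5(2)1} & x_{5(3)1} & x_{5(4)1} & x_{5(5)1} & x_{5(6)1} \\
x_{6(1)1} & x_{6(2)1} & x_{6(3)1} & x_{6(4)1} & x_{6(5)1} & x_{6(6)1}
\end{matrix}$$
The mean estimator in RSS is
$$\bar{X}^* = \frac{1}{k}\sum_{i=1}^{k}X^*_{(i)}$$
The Expected Value is
$$E(\bar{X}^*) = \frac{1}{k}\sum_{i=1}^{k}E\left(X^*_{(i)}\right)$$
We have assumed perfect rankings. $X^*_{(i)}$ is distributed like the ith order statistic from a continuous
distribution with p.d.f. f(x) and c.d.f. F(x).
$$E(\bar{X}^*) = \frac{1}{k}\sum_{i=1}^{k}\int k\,x\,\frac{(k-1)!}{(i-1)!\,(k-i)!}\,F(x)^{i-1}\left[1 - F(x)\right]^{k-i}f(x)\,dx$$
$$E(\bar{X}^*) = \frac{1}{k}\sum_{i=1}^{k}\int k\,x\binom{k-1}{i-1}F(x)^{i-1}\left[1 - F(x)\right]^{k-i}f(x)\,dx$$
$$E(\bar{X}^*) = \int x\,f(x)\left\{\sum_{i=1}^{k}\binom{k-1}{i-1}F(x)^{i-1}\left[1 - F(x)\right]^{k-i}\right\}dx$$
The sum in braces equals 1 by the binomial theorem, so
$$E(\bar{X}^*) = \int x\,f(x)\,dx = \bar{X}$$
$$E(\bar{X}^*) = \bar{X}$$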
Q. Find variance expression for mean estimator under ranked set sampling.
Ans:
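$$\bar{X}^* = \frac{1}{k}\sum_{i=1}^{k}X^*_{(i)}$$
By definition, with $\mu^*_{(i)} = E\left(X^*_{(i)}\right)$,
$$E\left(X^*_{(i)} - \bar{X}\right)^2 = E\left(X^*_{(i)} - \mu^*_{(i)} + \mu^*_{(i)} - \bar{X}\right)^2$$
$$E\left(X^*_{(i)} - \bar{X}\right)^2 = E\left(X^*_{(i)} - \mu^*_{(i)}\right)^2 + \left(\mu^*_{(i)} - \bar{X}\right)^2$$
$$E\left(X^*_{(i)} - \mu^*_{(i)}\right)^2 = V\left(X^*_{(i)}\right) = E\left(X^*_{(i)} - \bar{X}\right)^2 - \left(\mu^*_{(i)} - \bar{X}\right)^2$$
$$V(\bar{X}^*) = \frac{1}{k^2}\sum_{i=1}^{k}V\left(X^*_{(i)}\right) = \frac{1}{k^2}\left[\sum_{i=1}^{k}E\left(X^*_{(i)} - \bar{X}\right)^2 - \sum_{i=1}^{k}\left(\mu^*_{(i)} - \bar{X}\right)^2\right]$$
Taking,
$$\sum_{i=1}^{k}E\left(X^*_{(i)} - \bar{X}\right)^2 = k\int(x - \bar{X})^2f(x)\sum_{i=1}^{k}\binom{k-1}{i-1}F(x)^{i-1}\left[1 - F(x)\right]^{k-i}dx = k\,\sigma_x^2$$
$$V(\bar{X}^*) = \frac{1}{k^2}\left[k\,\sigma_x^2 - \sum_{i=1}^{k}\left(\mu^*_{(i)} - \bar{X}\right)^2\right]$$
$$V(\bar{X}^*) = \frac{\sigma_x^2}{k} - \frac{1}{k^2}\sum_{i=1}^{k}\left(\mu^*_{(i)} - \bar{X}\right)^2$$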
Q.143 Generate a population of size 1000 following normal distribution with mean 0 and
variance 1. Further calculate the sample mean using ranked set sampling taking sample
size as 6.
Ans:
k=6;rssx=matrix(,k,k);
pop=rnorm(1000,0,1)
for(i in 1:k)
{
s=sample(pop,k) # i-th set of k units.
xs=sort(s,decreasing = FALSE) # rank the units within the set.
rssx[i,]=xs
}
rssx_s=diag(rssx) # unit with rank i from the i-th set.
est_srs=mean(sample(pop,k)) # SRS mean of the same size, for comparison.
est_rss=mean(rssx_s)
rssx
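A single ranked set sample gives one estimate, so the gain in precision only shows up over repeated samples. The sketch below repeats the procedure above many times and compares the variance of the RSS mean with that of an SRS mean of the same size; perfect ranking is assumed (units are ranked on their actual values), and the helper `rss_mean`, the seed and the number of replications are illustrative choices.

```r
set.seed(632)                      # arbitrary seed, for reproducibility
k <- 6
pop <- rnorm(1000, 0, 1)

# One RSS mean: for the i-th set draw k units, rank them, keep the i-th smallest.
rss_mean <- function(pop, k) {
  mean(sapply(1:k, function(i) sort(sample(pop, k))[i]))
}

B <- 5000
est_rss <- replicate(B, rss_mean(pop, k))
est_srs <- replicate(B, mean(sample(pop, k)))

c(mean_rss = mean(est_rss), mean_srs = mean(est_srs), pop_mean = mean(pop))
c(var_rss = var(est_rss), var_srs = var(est_srs))
var(est_srs) / var(est_rss)        # relative efficiency of RSS over SRS
```

Both estimators should centre on the population mean, with the RSS mean showing the smaller variance, in line with the variance expression derived for the RSS mean.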
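Q. Find Bias of Ratio Estimator in Ranked Set Sampling such that
$$E(e_0) = E(e_1) = 0$$
$$E(e_1^2) = \frac{1}{rk}\,\frac{\sigma_x^2}{\bar{X}^2} - \frac{1}{\bar{X}^2rk^2}\sum_{i=1}^{k}\left(\mu^*_{x(i)} - \bar{X}\right)^2 = \theta C_x^2 - D_{x[i]}^2$$
$$E(e_0^2) = \frac{1}{rk}\,\frac{\sigma_y^2}{\bar{Y}^2} - \frac{1}{\bar{Y}^2rk^2}\sum_{i=1}^{k}\left(\mu^*_{y(i)} - \bar{Y}\right)^2 = \theta C_y^2 - D_{y[i]}^2$$
$$E(e_0e_1) = \frac{1}{rk}\,\frac{\sigma_{yx}}{\bar{Y}\bar{X}} - \frac{1}{\bar{Y}\bar{X}rk^2}\sum_{i=1}^{k}\left(\mu^*_{x(i)} - \bar{X}\right)\left(\mu^*_{y(i)} - \bar{Y}\right) = \theta C_{xy} - D_{xy[i]}$$
where $\theta = 1/(rk)$, r is the number of cycles, and $\mu^*_{x(i)}$, $\mu^*_{y(i)}$ are the expected values of the ith ranked units.
Ans:
The ratio estimator of mean is
$$\bar{y}_{[rss]} = \frac{\bar{Y}^*}{\bar{X}^*}\,\bar{X}$$
The Variance of Sample Mean Under RSS
$$V(\bar{X}^*) = \frac{\sigma_x^2}{k} - \frac{1}{k^2}\sum_{i=1}^{k}\left(\mu^*_{x(i)} - \bar{X}\right)^2$$
Notations
$$e_0 = \frac{\bar{Y}^* - \bar{Y}}{\bar{Y}},\qquad e_1 = \frac{\bar{X}^* - \bar{X}}{\bar{X}}$$
with the moments stated in the question.
$$\bar{Y}^* = \bar{Y}(1+e_0),\qquad \bar{X}^* = \bar{X}(1+e_1)$$
$$\bar{y}_{[rss]} = \frac{\bar{Y}^*}{\bar{X}^*}\,\bar{X} = \frac{\bar{Y}(1+e_0)}{\bar{X}(1+e_1)}\,\bar{X}$$
$$E(\bar{y}_{[rss]} - \bar{Y}) \approx \bar{Y}\,E\left(e_0 - e_1 + e_1^2 - e_0e_1\right)$$
$$Bias(\bar{y}_{[rss]}) \approx \bar{Y}\left[\left(\theta C_x^2 - D_{x[i]}^2\right) - \left(\theta C_{xy} - D_{xy[i]}\right)\right]$$
Q. Find MSE of Ratio Estimator in Ranked Set Sampling, using the same moments of $e_0$ and $e_1$ as above.
Ans:
Notations
$$e_0 = \frac{\bar{Y}^* - \bar{Y}}{\bar{Y}},\qquad e_1 = \frac{\bar{X}^* - \bar{X}}{\bar{X}},\qquad E(e_0) = E(e_1) = 0$$
$$\bar{Y}^* = \bar{Y}(1+e_0),\qquad \bar{X}^* = \bar{X}(1+e_1)$$
$$\bar{y}_{[rss]} = \frac{\bar{Y}^*}{\bar{X}^*}\,\bar{X} = \frac{\bar{Y}(1+e_0)}{\bar{X}(1+e_1)}\,\bar{X}$$
$$\bar{y}_{[rss]} \approx \bar{Y}(1 - e_1 + e_0),\qquad \bar{y}_{[rss]} - \bar{Y} \approx \bar{Y}(e_0 - e_1)$$
$$E(\bar{y}_{[rss]} - \bar{Y})^2 \approx \bar{Y}^2E(e_0 - e_1)^2$$
$$MSE(\bar{y}_{[rss]}) \approx \bar{Y}^2\left[\left(\theta C_y^2 - D_{y[i]}^2\right) + \left(\theta C_x^2 - D_{x[i]}^2\right) - 2\left(\theta C_{xy} - D_{xy[i]}\right)\right]$$
$$MSE(\bar{y}_{[rss]}) \approx \bar{Y}^2\left[\theta\left(C_y^2 + C_x^2 - 2C_{xy}\right) - \left(D_{y[i]}^2 + D_{x[i]}^2 - 2D_{xy[i]}\right)\right]$$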
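The mean Estimator in RSS is
$$\bar{X}^* = \frac{1}{k}\sum_{i=1}^{k}X^*_{(i)}$$
In case of r cycles
$$\bar{X}^* = \frac{1}{rk}\sum_{c=1}^{r}\sum_{i=1}^{k}X^*_{(i)c}$$
Extreme ranked set sampling (ERSS), for even k:
$$\bar{X}^*_{ERSS(e)} = \frac{1}{rk}\sum_{c=1}^{r}\left[\sum_{i=1}^{k/2}X^*_{i(1)c} + \sum_{i=1}^{k/2}X^*_{\left(\frac{k}{2}+i\right)(k)c}\right]$$
and for odd k:
$$\bar{X}^*_{ERSS(o)} = \frac{1}{rk}\sum_{c=1}^{r}\left[\sum_{i=1}^{(k-1)/2}X^*_{i(1)c} + \sum_{i=1}^{(k-1)/2}X^*_{\left(\frac{k-1}{2}+i\right)(k)c} + X^*_{k\left(\frac{k+1}{2}\right)c}\right]$$
Median ranked set sampling (MRSS), for even k:
$$\bar{X}^*_{MRSS(e)} = \frac{1}{rk}\sum_{c=1}^{r}\left[\sum_{i=1}^{k/2}X^*_{i\left(\frac{k}{2}\right)c} + \sum_{i=1}^{k/2}X^*_{\left(\frac{k}{2}+i\right)\left(\frac{k+2}{2}\right)c}\right]$$
and for odd k:
$$\bar{X}^*_{MRSS(o)} = \frac{1}{rk}\sum_{c=1}^{r}\sum_{i=1}^{k}X^*_{i\left(\frac{k+1}{2}\right)c}$$
Pair Ranked Set Sampling and Double Ranked Set Sampling are further developments in Ranked
set sampling.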
Introduction:
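Individuals chosen for the sample may be unwilling or unable to participate in the survey. This
nonresponse (unit or item nonresponse) is a type of selection bias. The problem of non-response can be dealt with using
the following methods.
Sub-sampling of non-respondents
Randomized response technique.
Hansen and Hurwitz (1946) technique
A sub sample of the non-respondents is taken after the first mail attempt, and this sub
sample is then enumerated by personal interviews.
Hansen & Hurwitz (1946) Estimator
$$\bar{y}^* = \frac{n_1}{n}\,\bar{y}_1 + \frac{n_2}{n}\,\bar{y}_2'$$
where $\bar{y}_1$ is the mean of the $n_1$ respondents and $\bar{y}_2'$ is the mean of the sub-sampled non-respondents.
Q. Prove that following Hansen and Hurwitz estimator is unbiased to population mean.
$$\bar{y}^* = \frac{n_1}{n}\,\bar{y}_1 + \frac{n_2}{n}\,\bar{y}_2'$$
Ans:
Hansen & Hurwitz (1946) Estimator
$$\bar{y}^* = \frac{n_1}{n}\,\bar{y}_1 + \frac{n_2}{n}\,\bar{y}_2'$$
$$E(\bar{y}^*) = E\left[\frac{n_1}{n}\,\bar{y}_1 + \frac{n_2}{n}\,\bar{y}_2'\right]$$
$$E(\bar{y}^*) = E_1E_2\left[\frac{n_1}{n}\,\bar{y}_1 + \frac{n_2}{n}\,\bar{y}_2'\right]$$
Hansen & Hurwitz (1946) is Unbiased
$$E(\bar{y}^*) = \frac{n_1}{n}\,\bar{Y} + \frac{n_2}{n}\,\bar{Y}$$
$$E(\bar{y}^*) = \bar{Y}\left(\frac{n_1}{n} + \frac{n_2}{n}\right)$$
$$E(\bar{y}^*) = \bar{Y}$$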
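Q.Find Variance of following Hansen & Hurwitz (1946) Estimator.
$$\bar{y}^* = \frac{n_1}{n}\,\bar{y}_1 + \frac{n_2}{n}\,\bar{y}_2'$$
Ans:
The Hansen & Hurwitz (1946) Estimator is
$$\bar{y}^* = \frac{n_1}{n}\,\bar{y}_1 + \frac{n_2}{n}\,\bar{y}_2'$$
Variance
Writing $\bar{y}_2$ for the mean of all $n_2$ non-respondents in the sample,
$$\bar{y}^* = \frac{1}{n}\left(n_1\bar{y}_1 + n_2\bar{y}_2\right) + \frac{n_2}{n}\left(\bar{y}_2' - \bar{y}_2\right)$$
$$V(\bar{y}^*) = V\left[\frac{1}{n}\left(n_1\bar{y}_1 + n_2\bar{y}_2\right)\right] + V\left[\frac{n_2}{n}\left(\bar{y}_2' - \bar{y}_2\right)\right]$$
$$V\left[\frac{1}{n}\left(n_1\bar{y}_1 + n_2\bar{y}_2\right)\right] = \theta S_y^2,\qquad \theta = \frac{N-n}{Nn}$$
$$V\left[\frac{n_2}{n}\left(\bar{y}_2' - \bar{y}_2\right)\right] = \frac{n_2^2}{n^2}\,E\left(\bar{y}_2' - \bar{y}_2\right)^2$$
$$= \frac{n_2^2}{n^2}\left[E\left(\bar{y}_2' - \bar{Y}_2\right)^2 + E\left(\bar{y}_2 - \bar{Y}_2\right)^2 - 2E\left(\bar{y}_2' - \bar{Y}_2\right)\left(\bar{y}_2 - \bar{Y}_2\right)\right]$$
$$= \frac{n_2^2}{n^2}\left[E\left(\bar{y}_2' - \bar{Y}_2\right)^2 - E\left(\bar{y}_2 - \bar{Y}_2\right)^2\right]$$
$$= \frac{n_2^2}{n^2}\left(\frac{k}{n_2} - \frac{1}{n_2}\right)S_{y(2)}^2 = \frac{n_2}{n^2}\,(k-1)\,S_{y(2)}^2$$
where k is the inverse of the sub-sampling fraction among the non-respondents. Taking the expectation over $n_2$, with $W_2 = N_2/N$ the proportion of non-respondents,
$$V(\bar{y}^*) = \theta S_y^2 + \lambda S_{y(2)}^2,\qquad \lambda = \frac{W_2(k-1)}{n}$$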
Q.150 Generate a population consisting of 1000 values in R and calculate the Hansen &
Hurwitz (1946) Estimator such that n1=80;n2=20;r=10.
N=1000; n=100;n1=80;n2=20;r=10;
pop <- rnorm(N,0,1)
s <- sample(pop,n)
s_r<-s[1:n1] # the n1 respondents in the sample.
s_nr<-s[(n1+1):n] # the n2 non-respondents in the sample.
s2<-sample(s_nr,r) # sub-sample of r non-respondents.
m<-(n1/n)*mean(s_r)+(n2/n)*mean(s2)
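The program above computes one estimate. The sketch below repeats the procedure to illustrate the unbiasedness result under a made-up setting: the last 200 population units are treated as non-respondents with a higher mean, the sub-sampling rate `k = 2` is an assumption, and all names (`resp`, `hh_estimate`) are hypothetical.

```r
set.seed(632)                                   # arbitrary seed, for reproducibility
N <- 1000; n <- 100; k <- 2                     # k = inverse sub-sampling rate (assumed)

# Hypothetical response pattern: 800 respondents, 200 non-respondents.
resp <- rep(c(TRUE, FALSE), c(800, 200))
y <- ifelse(resp, rnorm(N, 10, 2), rnorm(N, 14, 2))

hh_estimate <- function() {
  s  <- sample.int(N, n)                        # SRSWOR of n units
  s1 <- s[resp[s]]; s2 <- s[!resp[s]]           # respondents / non-respondents
  n1 <- length(s1); n2 <- length(s2)
  naive <- mean(y[s1])                          # ignores the non-respondents
  if (n2 == 0) return(c(hh = naive, naive = naive))
  r   <- ceiling(n2 / k)                        # size of the follow-up sub-sample
  sub <- s2[sample.int(n2, r)]
  c(hh = (n1 / n) * mean(y[s1]) + (n2 / n) * mean(y[sub]), naive = naive)
}

est <- replicate(5000, hh_estimate())           # 2 x 5000 matrix
rowMeans(est)                                   # compare with mean(y)
mean(y)
```

In this setting the respondent-only mean should sit well below the population mean, while the Hansen and Hurwitz estimates average out close to it.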
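Q. Find the expression of Bias of ratio estimator in case of nonresponse on study variable
only, where
$$E(e_0^*) = E(e_1) = 0,$$
$$E(e_1^2) = \frac{N-n}{nN}\,\frac{S_x^2}{\bar{X}^2} = \theta C_x^2,$$
$$E(e_0^{*2}) = \frac{1}{\bar{Y}^2}\left(\theta S_y^2 + \lambda S_{y(2)}^2\right) = C_y^{*2},$$
$$E(e_0^*e_1) = \theta\,\frac{S_{yx}}{\bar{Y}\bar{X}} = \theta C_{yx},$$
with $\theta = (N-n)/(nN)$ and $\lambda = W_2(k-1)/n$.
Ans:
Ratio estimator when non-response occurs in study variable y is as follows
$$\bar{y}_r^* = \frac{\bar{y}^*}{\bar{x}}\,\bar{X}$$
$$V(\bar{y}^*) = \theta S_y^2 + \lambda S_{y(2)}^2$$
Notations
$$e_0^* = \frac{\bar{y}^* - \bar{Y}}{\bar{Y}},\qquad e_1 = \frac{\bar{x} - \bar{X}}{\bar{X}}$$
with the moments stated in the question.
$$\bar{y}^* = \bar{Y}(1+e_0^*),\qquad \bar{x} = \bar{X}(1+e_1)$$
$$\bar{y}_r^* = \frac{\bar{y}^*}{\bar{x}}\,\bar{X} = \frac{\bar{Y}(1+e_0^*)}{\bar{X}(1+e_1)}\,\bar{X}$$
$$\bar{y}_r^* = \bar{Y}(1+e_0^*)(1+e_1)^{-1} \approx \bar{Y}(1+e_0^*)(1 - e_1 + e_1^2)$$
$$\bar{y}_r^* \approx \bar{Y}\left(1 - e_1 + e_1^2 + e_0^* - e_0^*e_1 + e_0^*e_1^2\right)$$
$$\bar{y}_r^* - \bar{Y} \approx \bar{Y}\left(e_0^* - e_1 + e_1^2 - e_0^*e_1 + e_0^*e_1^2\right)$$
Dropping the third-order term,
$$\bar{y}_r^* - \bar{Y} \approx \bar{Y}\left(e_0^* - e_1 + e_1^2 - e_0^*e_1\right)$$
$$E(\bar{y}_r^* - \bar{Y}) \approx \bar{Y}\,E\left(e_0^* - e_1 + e_1^2 - e_0^*e_1\right)$$
$$Bias(\bar{y}_r^*) \approx \theta\,\bar{Y}\left(C_x^2 - C_{yx}\right)$$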
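Q. Find the expression of MSE of ratio estimator in case of nonresponse on study variable
only, where
$$E(e_0^*) = E(e_1) = 0,$$
$$E(e_1^2) = \frac{N-n}{nN}\,\frac{S_x^2}{\bar{X}^2} = \theta C_x^2,$$
$$E(e_0^{*2}) = \frac{1}{\bar{Y}^2}\left(\theta S_y^2 + \lambda S_{y(2)}^2\right) = C_y^{*2},$$
$$E(e_0^*e_1) = \theta\,\frac{S_{yx}}{\bar{Y}\bar{X}} = \theta C_{yx},$$
with $\theta = (N-n)/(nN)$ and $\lambda = W_2(k-1)/n$.
Ans.
$$\bar{y}_r^* = \frac{\bar{y}^*}{\bar{x}}\,\bar{X}$$
$$\bar{y}^* = \bar{Y}(1+e_0^*),\qquad \bar{x} = \bar{X}(1+e_1)$$
$$\bar{y}_r^* = \frac{\bar{Y}(1+e_0^*)}{\bar{X}(1+e_1)}\,\bar{X} = \bar{Y}(1+e_0^*)(1+e_1)^{-1}$$
To the first order of approximation,
$$\bar{y}_r^* - \bar{Y} \approx \bar{Y}\left(e_0^* - e_1\right)$$
$$E(\bar{y}_r^* - \bar{Y})^2 \approx \bar{Y}^2E\left(e_0^* - e_1\right)^2$$
$$MSE(\bar{y}_r^*) \approx \bar{Y}^2\left(C_y^{*2} + \theta C_x^2 - 2\theta C_{yx}\right)$$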
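Q. Find Bias and MSE of Ratio Estimator in Case of Non-response on Both Variables,
where
$$E(e_0^*) = E(e_1^*) = 0,$$
$$E(e_0^{*2}) = \frac{1}{\bar{Y}^2}\left(\theta S_y^2 + \lambda S_{y(2)}^2\right) = C_y^{*2},$$
$$E(e_1^{*2}) = \frac{1}{\bar{X}^2}\left(\theta S_x^2 + \lambda S_{x(2)}^2\right) = C_x^{*2},$$
$$E(e_0^*e_1^*) = \frac{1}{\bar{Y}\bar{X}}\left(\theta S_{xy} + \lambda S_{xy(2)}\right) = C_{xy}^*,$$
with $\theta = (N-n)/(nN)$ and $\lambda = W_2(k-1)/n$.
Ans
Ratio estimator when non-response occurs in both the study variable y and the auxiliary variable x is as follows
$$\bar{y}_r^{**} = \frac{\bar{y}^*}{\bar{x}^*}\,\bar{X}$$
$$V(\bar{y}^*) = \theta S_y^2 + \lambda S_{y(2)}^2$$
$$V(\bar{x}^*) = \theta S_x^2 + \lambda S_{x(2)}^2$$
$$e_0^* = \frac{\bar{y}^* - \bar{Y}}{\bar{Y}},\qquad e_1^* = \frac{\bar{x}^* - \bar{X}}{\bar{X}}$$
with the moments stated in the question.
$$\bar{y}^* = \bar{Y}(1+e_0^*),\qquad \bar{x}^* = \bar{X}(1+e_1^*)$$
$$\bar{y}_r^{**} = \frac{\bar{y}^*}{\bar{x}^*}\,\bar{X} = \frac{\bar{Y}(1+e_0^*)}{\bar{X}(1+e_1^*)}\,\bar{X}$$
Bias of Ratio Estimator Under Non-response
$$\bar{y}_r^{**} = \bar{Y}(1+e_0^*)(1+e_1^*)^{-1} \approx \bar{Y}(1+e_0^*)(1 - e_1^* + e_1^{*2})$$
Following the same steps as in the case of non-response on the study variable only,
$$\bar{y}_r^{**} - \bar{Y} \approx \bar{Y}\left(e_0^* - e_1^* + e_1^{*2} - e_0^*e_1^*\right)$$
$$Bias(\bar{y}_r^{**}) \approx \bar{Y}\left(C_x^{*2} - C_{xy}^*\right)$$
$$MSE(\bar{y}_r^{**}) \approx \bar{Y}^2E\left(e_0^* - e_1^*\right)^2 = \bar{Y}^2\left(C_y^{*2} + C_x^{*2} - 2C_{xy}^*\right)$$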
Introduction:
This technique is useful for estimating sensitive characteristics in a population. It was first
proposed by S. L. Warner in 1965. There are qualitative and quantitative response models.
Qualitative response models are used to estimate the proportion of some behavior or occurrence
in a population, for example the "proportion of people who smoke cigarettes today".
Warner introduced the following model:
$$Z = Yp + (1 - p)(1 - Y)$$
where p is the probability of answering the sensitive question, Y is the true proportion of those
interviewed bearing the sensitive property, and Z is the proportion of YES answers.
The Warner model
$$Z = Yp + (1 - p)(1 - Y)$$
can be rearranged as
$$Y = \frac{Z + p - 1}{2p - 1}$$
For Example
Statement 1: " I smoke cigarettes."
Statement 2: "I never smoke cigarettes."
$$Y = \frac{\frac{3}{4} + \frac{1}{6} - 1}{2\cdot\frac{1}{6} - 1} = \frac{1}{8}$$
Example 1.
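Statement 1: "I have falsified my tax return."
Statement 2: "I have never falsified my tax return."
The Warner model
$$Z = Yp + (1 - p)(1 - Y)$$
$$\frac{40}{50} = Y\cdot\frac{1}{6} + \left(1 - \frac{1}{6}\right)(1 - Y)$$
$$Y = \frac{\frac{4}{5} + \frac{1}{6} - 1}{2\cdot\frac{1}{6} - 1}$$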
Example 2.
Statement 1: "Have you ever used a sick day leave when you weren't really sick?"
Statement 2: "Have you never used a sick day leave when you weren't really sick?"
$$\frac{350}{500} = Y\cdot\frac{1}{6} + \left(1 - \frac{1}{6}\right)(1 - Y)$$
$$Y = \frac{\frac{7}{10} + \frac{1}{6} - 1}{2\cdot\frac{1}{6} - 1}$$
$$Y = 0.2$$
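The three Warner calculations above can be checked with a few lines of R. This is a minimal sketch: the function name warner_estimate is hypothetical, and the inputs are the proportion Z of YES answers and the design probability p = 1/6 used in the examples.

```r
# Warner estimate of the sensitive proportion Y from the proportion z of
# YES answers and the design probability p. The formula is undefined for
# p = 1/2, so that case is rejected.
warner_estimate <- function(z, p) {
  stopifnot(p != 0.5)
  (z + p - 1) / (2 * p - 1)
}

y_smoke <- warner_estimate(3 / 4, 1 / 6)      # cigarette illustration
y_tax   <- warner_estimate(40 / 50, 1 / 6)    # Example 1
y_sick  <- warner_estimate(350 / 500, 1 / 6)  # Example 2
c(y_smoke, y_tax, y_sick)
```

The three estimates work out to 1/8, 1/20 and 1/5 respectively, up to floating-point rounding.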
$$E(Z) = E(Y)$$
Variance of Response
$$V(Z) = V(Y) + V(S)$$
Q. Generate a population of size 1000 using the normal distribution with mean 10 and standard
deviation 2. Take sample size n = 100 and select 10,000 random samples using the model
Z = S + Y, where S is a normal variate with mean 0 and standard deviation 1.
(i) Calculate the following estimator under SRSWOR:
$t_z$
(ii) Calculate the variance of the above estimator.
Ans:
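# Population of the sensitive variable Y and the scrambling variable S
N <- 1000; n <- 100
y <- rnorm(N, 10, 2)
s <- rnorm(N, 0, 1)

# A single SRSWOR sample and its estimate
sa <- sample(1:N, n)        # sample() draws without replacement by default
z <- y[sa] + s[sa]          # scrambled responses Z = Y + S
mean(z)

# 10,000 repeated SRSWOR samples
mz <- numeric(10000)
for (i in 1:10000) {
  sa <- sample(1:N, n)
  mz[i] <- mean(y[sa] + s[sa])
}
mean(mz)   # (i) average of the 10,000 estimates
var(mz)    # (ii) variance of the estimator across samples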
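As a check on the simulation, the variance of the estimates across samples can be compared with the usual SRSWOR variance of a sample mean, (1 - n/N) S_z^2 / n. The sketch below is self-contained; the seed and the reduced number of replications (reps = 2000, chosen to keep the run short) are assumptions, not part of the question.

```r
set.seed(632)                 # hypothetical seed, for reproducibility only
N <- 1000; n <- 100; reps <- 2000
y <- rnorm(N, 10, 2)          # sensitive variable Y
s <- rnorm(N, 0, 1)           # scrambling variable S
z_pop <- y + s                # scrambled value Z = Y + S for every unit

# Mean of Z in each of 'reps' SRSWOR samples of size n
mz <- replicate(reps, mean(z_pop[sample(1:N, n)]))

empirical   <- var(mz)
theoretical <- (1 - n / N) * var(z_pop) / n   # var() uses the N - 1 divisor
c(empirical = empirical, theoretical = theoretical)
```

The two values should agree closely, and mean(mz) should be close to mean(z_pop), reflecting the unbiasedness of the sample mean under SRSWOR.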
Quantitative RRT with Non-Sensitive Auxiliary Variable.
For example, Y may be the monthly income of the head of a household and X her current age, or
Y may be the total value of purchase orders in a year for a company and X the total turnover of
that company in that year.
Scrambled Response
Let Y be the study variable, a sensitive variable which cannot be observed directly due to
respondent bias. Let X be a non-sensitive auxiliary variable which has a positive correlation with
Y. Let S be a scrambling variable independent of Y and X.
The respondent is asked to report a scrambled response for Y given by
$$Z = Y + S$$
The respondent is asked to provide a true response for X.
Mean & Variance of Z
The scrambling variable S is introduced such that
$$E(S) = 0$$
$$E(Z) = E(Y)$$
$$V(Z) = V(Y) + V(S)$$
$$C_z^2 = C_y^2 + \frac{V(S)}{\bar{Y}^2}$$
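The expression for $C_z^2$ follows in two steps from the independence of S and Y together with $E(S) = 0$. A short worked derivation, writing $\bar{Y} = E(Y)$:

$$E(Z) = E(Y) + E(S) = \bar{Y}, \qquad V(Z) = V(Y + S) = V(Y) + V(S)$$
$$C_z^2 = \frac{V(Z)}{[E(Z)]^2} = \frac{V(Y)}{\bar{Y}^2} + \frac{V(S)}{\bar{Y}^2} = C_y^2 + \frac{V(S)}{\bar{Y}^2}$$

Scrambling therefore leaves the mean unchanged but inflates the squared coefficient of variation by $V(S)/\bar{Y}^2$.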
Ratio Estimator of Mean for Sensitive Variable
$$\bar{y}_{rs} = \frac{\bar{z}}{\bar{x}}\,\bar{X}$$
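Q. Find Bias & MSE of Ratio Estimator Using RRT.
The ratio estimator is
$$\bar{y}_{rrt} = \frac{\bar{z}}{\bar{x}}\,\bar{X}$$
Notations
$$e_0 = \frac{\bar{z} - \bar{Z}}{\bar{Z}}, \quad e_1 = \frac{\bar{x} - \bar{X}}{\bar{X}}$$
$$\bar{y}_{rrt} = \bar{Y}(1 + e_0)(1 + e_1)^{-1}$$
$$\mathrm{Bias}(\bar{y}_{rrt}) \approx \frac{1 - f}{n}\,\bar{Y}\left(C_x^2 - \rho_{zx} C_z C_x\right)$$
$$\bar{y}_{rrt} - \bar{Y} \approx \bar{Y}(e_0 - e_1)$$
$$E(\bar{y}_{rrt} - \bar{Y})^2 \approx \bar{Y}^2\,E\left(e_0^2 + e_1^2 - 2 e_0 e_1\right)$$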
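Taking expectations term by term gives the MSE. As a sketch, assuming SRSWOR so that $E(e_0^2) = \frac{1-f}{n} C_z^2$, $E(e_1^2) = \frac{1-f}{n} C_x^2$ and $E(e_0 e_1) = \frac{1-f}{n}\rho_{zx} C_z C_x$ (the same moments that produce the bias expression above):

$$\mathrm{MSE}(\bar{y}_{rrt}) \approx \frac{1 - f}{n}\,\bar{Y}^2\left(C_z^2 + C_x^2 - 2\rho_{zx} C_z C_x\right), \qquad C_z^2 = C_y^2 + \frac{V(S)}{\bar{Y}^2}$$

The scrambling variable enters only through $C_z^2$, so a larger $V(S)$ raises the MSE.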
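$$e_0 = \frac{\bar{y} - \bar{Y}}{\bar{Y}}, \quad e_1 = \frac{\bar{x} - \bar{X}}{\bar{X}}, \quad e_2 = \frac{\bar{z} - \bar{Z}}{\bar{Z}}$$
$$E(e_0) = E(e_1) = E(e_2) = 0$$
where $C_{yx} = \rho_{yx} C_y C_x$.
$$\bar{y} = \bar{Y}(1 + e_0), \quad \bar{x} = \bar{X}(1 + e_1), \quad \bar{z} = \bar{Z}(1 + e_2)$$
$$\bar{y}_{rr} = \bar{y}\,\frac{\bar{X}}{\bar{x}}\,\frac{\bar{Z}}{\bar{z}}$$
$$\bar{y}_{rr} = \frac{\bar{Y}(1 + e_0)}{\bar{X}(1 + e_1)\,\bar{Z}(1 + e_2)}\,\bar{X}\bar{Z}$$
MSE of the Estimator
$$\bar{y}_{rr} = \bar{Y}(1 + e_0)(1 + e_1)^{-1}(1 + e_2)^{-1} \approx \bar{Y}(1 + e_0)(1 - e_1)(1 - e_2)$$
$$\bar{y}_{rr} \approx \bar{Y}(1 + e_0)(1 - e_1 - e_2 + e_1 e_2) \approx \bar{Y}(1 + e_0)(1 - e_1 - e_2)$$
$$\bar{y}_{rr} \approx \bar{Y}(1 + e_0 - e_1 - e_2) \;\Rightarrow\; \bar{y}_{rr} - \bar{Y} \approx \bar{Y}(e_0 - e_1 - e_2)$$
$$E(\bar{y}_{rr} - \bar{Y})^2 \approx \bar{Y}^2\,E(e_0 - e_1 - e_2)^2$$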
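Expanding the square makes explicit which second-order moments are needed to evaluate this MSE. This is a purely algebraic step and assumes nothing beyond the first-order approximation already made:

$$\bar{Y}^2 E(e_0 - e_1 - e_2)^2 = \bar{Y}^2\left[E(e_0^2) + E(e_1^2) + E(e_2^2) - 2E(e_0 e_1) - 2E(e_0 e_2) + 2E(e_1 e_2)\right]$$

The two covariance terms involving $e_0$ enter with a negative sign, so positive correlation of y with x and z reduces the MSE, while positive correlation between x and z increases it.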
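$$\bar{y}_{rreg} = \bar{Y}(1 + e_0) + \hat{\beta}_{yx}\left[\bar{X} - \bar{X}(1 + e_1)\right] + \hat{\beta}_{yz}\left[\bar{Z} - \bar{Z}(1 + e_2)\right]$$
$$\bar{y}_{rreg} = \bar{Y}(1 + e_0) - \hat{\beta}_{yx}\bar{X} e_1 - \hat{\beta}_{yz}\bar{Z} e_2$$
$$V(\bar{y}_{rreg}) = E\left[\bar{Y}^2 e_0^2 + \hat{\beta}_{yx}^2 \bar{X}^2 e_1^2 + \hat{\beta}_{yz}^2 \bar{Z}^2 e_2^2 - 2\bar{Y}\bar{X}\hat{\beta}_{yx} e_0 e_1 - 2\bar{Y}\bar{Z}\hat{\beta}_{yz} e_0 e_2 + 2\bar{X}\bar{Z}\hat{\beta}_{yz}\hat{\beta}_{yx} e_1 e_2\right]$$
$$V(\bar{y}_{rreg}) = \bar{Y}^2 C_y^2 + \hat{\beta}_{yx}^2 \bar{X}^2 C_x^2 + \hat{\beta}_{yz}^2 \bar{Z}^2 C_z^2 - 2\bar{Y}\bar{X}\hat{\beta}_{yx} C_{yx} - 2\bar{Y}\bar{Z}\hat{\beta}_{yz} C_{yz} + 2\bar{X}\bar{Z}\hat{\beta}_{yz}\hat{\beta}_{yx} C_{xz}$$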
Example
On a line transect of length L =100 meters, a total of y =18 birds were detected at the following
distances (in meters) from the transect line
0, 0, 1, 3, 7, 11, 11, 12, 15, 15, 18, 19, 21, 23, 28, 33, 34, 44.
It is desired to estimate the density of birds in the study region, with $w_0 = 20$:
$$\hat{D} = \frac{y_0}{2 w_0 L}$$
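The density estimate for this example can be computed directly. The sketch below assumes that y0 is the number of birds detected within w0 meters of the transect line (the narrow-strip interpretation); the variable names are illustrative.

```r
# Perpendicular distances (in meters) of the 18 detected birds
d  <- c(0, 0, 1, 3, 7, 11, 11, 12, 15, 15, 18, 19, 21, 23, 28, 33, 34, 44)
L  <- 100   # transect length in meters
w0 <- 20    # strip half-width in meters

# Assumption: y0 counts the detections lying within w0 of the line
y0 <- sum(d <= w0)

# Estimated density in birds per square meter
D_hat <- y0 / (2 * w0 * L)
c(y0 = y0, D_hat = D_hat)
```

Under this assumption 12 of the 18 birds fall inside the strip, giving 12 / 4000 = 0.003 birds per square meter.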
Sites are selected as the lines are selected in line transects.
Observations are obtained on selected sites.
Estimation is same as in line transects
163. Explain Adaptive Cluster Sampling
• Adaptive cluster sampling is used when the selection procedure depends on the observations
made during the survey.
• Select an initial sample of size n with a suitable design.
• Observe the selected units for a specified condition.
• If any initially selected unit satisfies the pre-defined condition, its adjacent
neighboring units are also sampled and investigated.
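The steps above can be sketched in R on a small grid. Everything specific here is a hypothetical illustration: the 5 x 5 grid of counts, the condition y > 0, the choice of four adjacent neighbours (up, down, left, right), the fixed initial sample of n = 3 units, and the function names. Neighbours that themselves satisfy the condition are expanded in turn.

```r
# Hypothetical 5 x 5 population of counts; the condition is y > 0
y <- matrix(0, 5, 5)
y[2, 2] <- 3; y[2, 3] <- 5; y[3, 3] <- 2; y[5, 1] <- 4

# Four adjacent neighbours of a cell that lie inside the grid
neighbours <- function(cell, nr, nc) {
  cand <- rbind(cell + c(-1, 0), cell + c(1, 0),
                cell + c(0, -1), cell + c(0, 1))
  keep <- cand[, 1] >= 1 & cand[, 1] <= nr & cand[, 2] >= 1 & cand[, 2] <= nc
  cand[keep, , drop = FALSE]
}

# Start from the initial sample and keep adding the neighbours of every
# sampled unit that satisfies the condition
adaptive_sample <- function(y, initial) {
  in_sample <- matrix(FALSE, nrow(y), ncol(y))
  queue <- initial                       # one (row, col) pair per line
  while (nrow(queue) > 0) {
    cell  <- queue[1, ]
    queue <- queue[-1, , drop = FALSE]
    if (in_sample[cell[1], cell[2]]) next
    in_sample[cell[1], cell[2]] <- TRUE
    if (y[cell[1], cell[2]] > 0) {       # condition met: add neighbours
      queue <- rbind(queue, neighbours(cell, nrow(y), ncol(y)))
    }
  }
  in_sample
}

initial <- rbind(c(2, 2), c(4, 5), c(1, 5))   # hypothetical initial sample
final <- adaptive_sample(y, initial)
which(final, arr.ind = TRUE)                  # units in the final sample
```

Starting from three units, the final sample grows to twelve because unit (2, 2) meets the condition and leads to the connected units (2, 3) and (3, 3); the isolated unit (5, 1) is never reached because no initial unit is adjacent to it.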
The sample mean is
$$\bar{w}_y = \frac{1}{n}\sum_{i=1}^{n} w_{yi}$$