Handouts STA632

Course Handouts
Sampling Techniques
Virtual University of Pakistan
STA632-Sampling Techniques
BASIC CONCEPTS
Q. Define Sampling Techniques?
Ans: Data are essential for decision making, so an appropriate method of data collection is needed. Sampling techniques address two questions: how best to obtain the data (the sample), and how best to use the sample to estimate a characteristic of the whole population. Accordingly, any sampling strategy has two parts: first the selection procedure, and second the estimation procedure.
Q. Define sampling?
Ans: Sampling is the process of selecting a sample from a population. The method of selecting the sample is called the sampling design.
Q. Define sample?
Ans: A perfect sample would be a scaled-down version of the population, mirroring every characteristic of the population (Lohr, 2nd edition). A good sample is a small but representative part of the population.
Q. What is the difference between observation units and sampling units?
Ans: A unit that can be selected for a sample is called a sampling unit. An object on which a measurement is taken is called an observation unit.
Q. Define sampling frame?
Ans: The sampling frame is the list or map from which the potential sampling units are drawn, for example a list of all the classrooms, or a map of an area containing farms.
Q. Distinguish between parameter and statistic?
Ans: A parameter is any summary number that describes the whole population. A statistic is any summary number obtained from a sample.
Population (entire class): Parameter
Sample (subset of class): Statistic
Q. Define sampling error?
Ans: Sampling error is the error in an estimate that arises because the sample is only a part of the population. It is measured as the difference between the statistic and the parameter:
Sampling Error = Statistic - Parameter.
Q. Define non-sampling error?
Ans: These errors can occur even when the whole population is studied. Non-sampling error cannot be attributed to sampling.
Q. What is selection bias?
Ans: Selection bias occurs when a random process is not used to select the sample, so that some part of the population is ignored during selection. As a result, some results of the study may not be accurate.
Q. What is measurement error?
Ans: Measurement error occurs when the response has a tendency to differ from the true value in one direction, for example because of a problem with the measurement instrument.
Q. Define Nonresponse?
Ans: Nonresponse occurs when individuals chosen for the sample are not willing to participate in the survey. It is a type of selection bias.
Q. What is a probability sample?
Ans: A probability sample is one in which each unit of the population has some chance of being included in the sample. It is obtained by a random process.
Example: A bag contains one white ball and one black ball. Then
$$P(\text{White}) = \frac{\text{Number of white balls}}{\text{Number of total balls}} = \frac{1}{2}.$$
Q. Define randomness?
Ans: Randomness means that every unit has some chance of being selected in the sample, and that we are not certain in advance about the selection of any unit.
Q. How many types of probability sampling are there?
Ans: Simple random sampling, stratified sampling, cluster sampling, systematic sampling, and multistage/multiphase sampling.
Q. Define Non-probability sampling?
Ans: No random process is followed for the selection of units. It is assumed that the population is evenly distributed with respect to the characteristics of interest.
Q. What is convenience sampling?
Ans: Respondents are selected according to the convenience of the researcher. This is also called accidental sampling, opportunity sampling or grab sampling.
Q. Define purposive sampling?
Ans: The units that are most appropriate for meeting the objective of the study are selected. The selection is based on the judgment of the researcher.
Q. What is quota sampling?
Ans: The population is divided into subgroups on the basis of similar characteristics. The subgroups are called “quotas”. A nonrandom selection is made within each quota.
Q. Define snowball sampling?
Ans: This is also called chain sampling because it works like a chain: each selected subject is asked for assistance in reaching other subjects.
Q. Define simple random sampling?
Ans: Each member of the population has an equal probability of being included in the sample, and each possible sample of size ‘n’ has an equal probability of being selected.
Q. What is the difference between sampling with replacement and without replacement?
Ans: Sampling With Replacement: If the unit is replaced before the selection of the next unit, then sampling is with replacement (WR). For a population of size N and a sample of size n,
$$\text{Total samples} = N^n, \qquad P(S) = \frac{1}{N^n}.$$
Example: Consider a population of size N=3 with units A, B, C.
The possible samples of size n=2 are
S1=(A,A), S2=(A,B), S3=(A,C), S4=(B,A), S5=(B,B), S6=(B,C), S7=(C,A), S8=(C,B), S9=(C,C)
$$P(S_i) = \frac{1}{N^n} = \frac{1}{3^2} = \frac{1}{9}.$$
Sampling Without Replacement: If the unit is not replaced before the selection of the next unit, then sampling is without replacement (WOR).
$$\text{Total samples} = {}^{N}C_n, \qquad P(S) = \frac{1}{{}^{N}C_n}, \qquad {}^{N}C_n = \frac{N!}{n!\,(N-n)!}.$$
Example: Consider a population of size N=3 with units A, B, C.
The possible samples of size n=2 are S1=(A,B), S2=(A,C), S3=(B,C)
$$P(S_1) = P(S_2) = P(S_3) = \frac{1}{{}^{3}C_2} = \frac{1}{3}.$$
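The two counts in the N=3, n=2 example can be checked by listing the samples in R. This is only a minimal sketch: the object names (pop, wr, wor) are illustrative, and expand.grid and combn are used here as one convenient way of enumerating ordered and unordered samples.

```r
# Population of size N = 3 and sample size n = 2, as in the example
pop <- c("A", "B", "C")
N <- length(pop)
n <- 2

# With replacement: all ordered samples, N^n of them
wr <- expand.grid(first = pop, second = pop, stringsAsFactors = FALSE)
n_wr <- nrow(wr)
p_wr <- 1 / n_wr        # probability of each sample under WR

# Without replacement: all unordered samples, choose(N, n) of them
wor <- t(combn(pop, n))
n_wor <- nrow(wor)
p_wor <- 1 / n_wor      # probability of each sample under WOR

print(wr)
print(wor)
```

Printing wr and wor lists the nine WR samples and the three WOR samples of the example.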
Q. Define stratified sampling?
Ans: Stratified sampling is used when the population is heterogeneous with respect to the characteristic of interest. The population is divided into homogeneous subgroups called strata, and a simple random sample is selected from each stratum.
Q. Define linear systematic sampling?
Ans: The first unit is selected randomly from the first ‘k’ units, and every kth unit thereafter is included in the sample. The interval ‘k’ is calculated by dividing the population size by the sample size, k = N/n.
Q. Define circular systematic sampling?
Ans: ‘k’ is taken as the integer nearest to N/n. The first unit is selected at random from units 1 to N, and every kth unit thereafter is included in the sample, continuing from the start of the list after unit N is reached, until n units are selected.
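The two systematic schemes can be sketched in R as follows. The function names are hypothetical (they are not part of R or of the handout), the functions return unit positions 1..N rather than data values, and the linear version assumes for simplicity that k is the integer part of N/n.

```r
# Hypothetical helpers, for illustration only
linear_systematic <- function(N, n) {
  k <- N %/% n                       # sampling interval (integer part of N/n)
  start <- sample(1:k, 1)            # random start among the first k units
  start + (0:(n - 1)) * k            # every k-th unit after the start
}

circular_systematic <- function(N, n) {
  k <- round(N / n)                  # integer nearest to N/n
  start <- sample(1:N, 1)            # random start anywhere in 1..N
  ((start - 1 + (0:(n - 1)) * k) %% N) + 1   # wrap around after unit N
}

set.seed(1)                          # arbitrary seed, for reproducibility only
lin <- linear_systematic(20, 5)
cir <- circular_systematic(23, 5)
lin
cir
```

The circular version is mainly useful when N is not a multiple of n, as with N=23 and n=5 here.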
Q. Define cluster sampling?
Ans: A cluster is a sampling unit consisting of several observation units. Any sampling method can be used to select the clusters, and all the units within a selected cluster are studied.

SIMPLE RANDOM SAMPLING

Q . Prove that sample mean is unbiased estimator for population mean under simple
random sampling with/without replacement.
Ans: Unbiased Estimator: An estimator is unbiased if its expected value is equal to the population parameter:
$$E(\text{estimator}) = \text{Parameter}.$$
The expectation of a random variable is defined as the sum of the products of its possible values and their probabilities:
$$E(y_i) = \sum_{i=1}^{N} p_i Y_i.$$
Theorem: In simple random sampling with replacement and without replacement, the sample mean $\bar{y}$ is an unbiased estimator of the population mean $\bar{Y}$.
Notation: n = sample size, N = population size, $y_i$ = observed value of the variable on the ith sample unit.
The estimator of the population mean $\bar{Y}$ is the sample mean
$$\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i.$$
An estimator of the population total is
$$\hat{y} = N\bar{y} = \frac{N}{n}\sum_{i=1}^{n} y_i.$$
Under SRSWR: $\bar{y}$ is an unbiased estimate of $\bar{Y}$ when
$$E(\bar{y}) = \bar{Y}.$$
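The expectation of $\bar{y}$, by definition, is
$$E(\bar{y}) = E\left(\frac{1}{n}\sum_{i=1}^{n} y_i\right) = \frac{1}{n}\sum_{i=1}^{n} E(y_i) = E(y_i),$$
since every draw has the same expectation. At any given draw each population unit is selected with probability $p_i = 1/N$, so
$$E(y_i) = \sum_{i=1}^{N} p_i Y_i = \sum_{i=1}^{N} \frac{1}{N} Y_i = \frac{1}{N}\sum_{i=1}^{N} Y_i = \bar{Y}.$$
Hence
$$E(\bar{y}) = \bar{Y}.$$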
Under SRSWOR: There are $m = \binom{N}{n}$ possible distinct samples without replacement, each selected with probability $1/m$, so
$$E(\bar{y}) = \frac{1}{m}\sum_{k=1}^{m}\left(\frac{1}{n}\sum_{i=1}^{n} y_i\right)_k = \frac{1}{n\binom{N}{n}}\sum_{k=1}^{m}\left(\sum_{i=1}^{n} y_i\right)_k.$$
Now, in the $\binom{N}{n}$ possible samples, each population unit appears $\binom{N-1}{n-1}$ times, so
$$E(\bar{y}) = \frac{1}{n\binom{N}{n}}\binom{N-1}{n-1}\sum_{i=1}^{N} Y_i = \frac{1}{N}\sum_{i=1}^{N} Y_i = \bar{Y},$$
because $\binom{N-1}{n-1}\big/\binom{N}{n} = n/N$.
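The unbiasedness result can be verified numerically by listing every possible sample of a small population and averaging the sample means. This is a minimal sketch: the six population values are simply the first six values of the data set used later in these handouts, and n=3 is an arbitrary choice that keeps the enumeration small.

```r
# Small illustrative population and sample size (arbitrary choices)
yp <- c(111, 150, 121, 198, 112, 136)
N <- length(yp)
n <- 3

# SRSWOR: all choose(N, n) samples, each with probability 1/choose(N, n)
means_wor <- combn(yp, n, FUN = mean)

# SRSWR: all N^n ordered samples, each with probability 1/N^n
means_wr <- rowMeans(expand.grid(rep(list(yp), n)))

# The average of the sample means equals the population mean in both cases
c(population = mean(yp), wor = mean(means_wor), wr = mean(means_wr))
```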

Q . Derive the Variance of Sample Mean under Simple Random Sampling (SRS) with
replacement and without replacement?
Ans: Variance of Sample Mean Under SRSWOR:
The variance of the sample mean for simple random sampling without replacement, $\bar{y}_{wor}$, is
$$Var(\bar{y}_{wor}) = \frac{N-n}{N}\,\frac{S^2}{n} = (1-f)\frac{S^2}{n}, \qquad \text{where } f = \frac{n}{N}.$$
Proof:
We know that
$$\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i.$$
The variance of this estimator is
$$Var(\bar{y}) = \frac{1}{n^2}\,Var\left(\sum_{i=1}^{n} y_i\right).$$
Since the variance of a sum of random variables is equal to the sum of the variances plus the sum of the covariances,
$$Var(\bar{y}) = \frac{1}{n^2}\left[\sum_{i=1}^{n} Var(y_i) + \sum_{\substack{i,j=1 \\ j \ne i}}^{n} Cov(y_i, y_j)\right].$$
For the variance term,
$$Var(y_i) = E(y_i - \bar{Y})^2 = E(y_i^2) - \bar{Y}^2 = \frac{1}{N}\sum_{i=1}^{N} Y_i^2 - \bar{Y}^2 = \frac{1}{N}\left[\sum_{i=1}^{N} Y_i^2 - N\bar{Y}^2\right] = \frac{1}{N}\sum_{i=1}^{N}(Y_i - \bar{Y})^2 = \frac{N-1}{N}S^2.$$
For the covariance term ($i \ne j$),
$$Cov(y_i, y_j) = E(y_i y_j) - \bar{Y}^2 = \frac{1}{N(N-1)}\sum_{\substack{i,j=1 \\ j \ne i}}^{N} Y_i Y_j - \bar{Y}^2$$
$$= \frac{1}{N(N-1)}\left[\left(\sum_{i=1}^{N} Y_i\right)^2 - \sum_{i=1}^{N} Y_i^2\right] - \frac{1}{N^2}\left(\sum_{i=1}^{N} Y_i\right)^2$$
$$= -\frac{1}{N(N-1)}\left[\sum_{i=1}^{N} Y_i^2 - \frac{\left(\sum_{i=1}^{N} Y_i\right)^2}{N}\right] = -\frac{S^2}{N}.$$
After substitution we get
$$Var(\bar{y}) = \frac{1}{n^2}\left[\sum_{i=1}^{n}\frac{N-1}{N}S^2 - \sum_{\substack{i,j=1 \\ j \ne i}}^{n}\frac{S^2}{N}\right] = \frac{1}{n^2}\left[n\,\frac{N-1}{N}S^2 - n(n-1)\frac{S^2}{N}\right] = \frac{N-n}{N}\,\frac{S^2}{n}.$$

Variance of sample mean for SRSWR:
The variance of the sample mean for simple random sampling with replacement, $\bar{y}_{wr}$, is
$$Var(\bar{y}_{wr}) = \frac{N-1}{N}\,\frac{S^2}{n} = \left(1 - \frac{1}{N}\right)\frac{S^2}{n}.$$
Proof: We know that
$$\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i.$$
The variance of this estimator is
$$Var(\bar{y}) = \frac{1}{n^2}\,Var\left(\sum_{i=1}^{n} y_i\right).$$
The variance of a sum of random variables is equal to the sum of the variances plus the sum of the covariances. Under sampling with replacement the draws are independent, so all the covariances are zero and
$$Var(\bar{y}) = \frac{1}{n^2}\sum_{i=1}^{n} Var(y_i).$$
As before,
$$Var(y_i) = E(y_i - \bar{Y})^2 = E(y_i^2) - \bar{Y}^2 = \frac{1}{N}\sum_{i=1}^{N} Y_i^2 - \bar{Y}^2 = \frac{N-1}{N}S^2.$$
Therefore
$$Var(\bar{y}) = \frac{1}{n^2}\sum_{i=1}^{n}\frac{N-1}{N}S^2 = \frac{N-1}{Nn}S^2.$$
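Both variance formulas can be checked exactly on a small population by enumerating all possible samples. The sketch below assumes an arbitrary six-unit population and n=3; pvar is a hypothetical helper that computes a variance with divisor equal to the number of samples, which is the variance of the sampling distribution itself.

```r
# Small illustrative population and sample size (arbitrary choices)
yp <- c(111, 150, 121, 198, 112, 136)
N <- length(yp)
n <- 3
S2 <- var(yp)                         # S^2, with divisor N - 1

# All possible sample means under SRSWOR and under SRSWR
means_wor <- combn(yp, n, FUN = mean)
means_wr  <- rowMeans(expand.grid(rep(list(yp), n)))

# Hypothetical helper: variance over the whole sampling distribution
pvar <- function(x) mean((x - mean(x))^2)

v_wor_enum    <- pvar(means_wor)
v_wor_formula <- (N - n) / N * S2 / n
v_wr_enum     <- pvar(means_wr)
v_wr_formula  <- (N - 1) / N * S2 / n

rbind(wor = c(enumerated = v_wor_enum, formula = v_wor_formula),
      wr  = c(enumerated = v_wr_enum,  formula = v_wr_formula))
```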
Q. Derive unbiased estimator of variance under SRS with replacement and without
replacement.
Ans: For simple random sampling without replacement, $s^2$ is an unbiased estimator of $S^2$, and for simple random sampling with replacement $s^2$ is an unbiased estimator of $S^2(N-1)/N$.
Proof: For both simple random sampling without replacement and simple random sampling with replacement, we have
$$(n-1)s^2 = \sum_{i=1}^{n}(y_i - \bar{y})^2 = \sum_{i=1}^{n}\left[(y_i - \bar{Y}) - (\bar{y} - \bar{Y})\right]^2$$
$$= \sum_{i=1}^{n}(y_i - \bar{Y})^2 + \sum_{i=1}^{n}(\bar{y} - \bar{Y})^2 - 2\sum_{i=1}^{n}(y_i - \bar{Y})(\bar{y} - \bar{Y})$$
$$= \sum_{i=1}^{n}(y_i - \bar{Y})^2 + n(\bar{y} - \bar{Y})^2 - 2(\bar{y} - \bar{Y})\sum_{i=1}^{n}(y_i - \bar{Y})$$
$$= \sum_{i=1}^{n}(y_i - \bar{Y})^2 + n(\bar{y} - \bar{Y})^2 - 2n(\bar{y} - \bar{Y})^2$$
$$= \sum_{i=1}^{n}(y_i - \bar{Y})^2 - n(\bar{y} - \bar{Y})^2.$$
Taking expectation,
$$(n-1)E(s^2) = n\,E(y_i - \bar{Y})^2 - n\,E(\bar{y} - \bar{Y})^2.$$
As we know that $E(y_i - \bar{Y})^2 = \frac{N-1}{N}S^2$, therefore
$$(n-1)E(s^2) = n\,\frac{N-1}{N}S^2 - n\,E(\bar{y} - \bar{Y})^2.$$
Now $E(\bar{y} - \bar{Y})^2$ is the variance of the sample mean, which is $\left(1 - \frac{n}{N}\right)\frac{S^2}{n}$ for simple random sampling without replacement and $\left(1 - \frac{1}{N}\right)\frac{S^2}{n}$ for simple random sampling with replacement.
Hence for simple random sampling without replacement
$$E(s^2) = \frac{n}{n-1}\,\frac{N-1}{N}S^2 - \frac{n}{n-1}\,\frac{N-n}{N}\,\frac{S^2}{n} = \frac{S^2}{n-1}\cdot\frac{n(N-1) - (N-n)}{N} = S^2,$$
and for simple random sampling with replacement
$$E(s^2) = \frac{n}{n-1}\,\frac{N-1}{N}S^2 - \frac{n}{n-1}\,\frac{N-1}{N}\,\frac{S^2}{n} = \frac{N-1}{N}S^2.$$
Standard error
The standard error of the sample mean $\bar{y}$ is
$$SE(\bar{y}) = \sqrt{Var(\bar{y})} = \begin{cases} \dfrac{S}{\sqrt{n}}\sqrt{1-f} & \text{for srswor} \\[2ex] \dfrac{S}{\sqrt{n}}\sqrt{1-\dfrac{1}{N}} & \text{for srswr} \end{cases}$$
Q. What is R?
Ans: R is a free software environment for statistical computing and graphics. It was initially written by Ross Ihaka and Robert Gentleman at the Department of Statistics of the University of Auckland, and it is closely related to the S language and the S-Plus environment.
Downloading R: https://cran.r-project.org/
R Commands
y <- c(1, 2, 3, 4, 5)
# indexing (R is case sensitive, so the vector is y, not Y)
y[2]
# mean
mean(y)
# standard deviation
sd(y)
# variance
var(y)
Q. How is sampling done by using R?
Ans: Sampling with replacement
sample(y, size, replace=TRUE)
Sampling without replacement
sample(y, size, replace=FALSE)
OR
sample(y, size)
Note that the function name is sample, in lowercase.
Consider a population consisting of
111, 150, 121, 198, 112, 136, 114, 129, 117, 115, 186, 110, 121, 115, 114
Enter population data in R
yp <- c(111, 150, 121, 198, 112, 136, 114, 129, 117, 115, 186, 110, 121, 115, 114)
Taking a sample by SRSWOR in R (the default of sample is replace=FALSE)
ys <- sample(yp, 5)    # Output (one possible run): 114 121 111 186 150
Mean of sample
mean(ys)               # Output: 136.4
Standard deviation
sd(ys)                 # Output: 31.74

The following commands compute the estimates under SRSWOR for a second population.
yp <- c(11, 150, 121, 198, 12, 136, 14, 129, 17, 115, 186, 110, 121, 15, 14)
# population size
N <- length(yp)
# sample of size 5
ys <- sample(yp, 5)
# sample size
n <- length(ys)
# mean of the sample
mys <- mean(ys)
# variance of the sample
vys <- var(ys)
# variance of ybar (uses the population variance, with the finite population correction)
vybar <- (1-n/N)*var(yp)/n
# standard error
sdr <- sqrt(vybar)
# estimate of population total
ept <- N*mys
# variance of the estimated total
vept <- N^2*vybar
# standard error of the estimated total
sdept <- sqrt(vept)
For SRSWR only the sampling step changes:
yp <- c(11, 150, 121, 198, 12, 136, 14, 129, 17, 115, 186, 110, 121, 15, 14)
# sample of size 5 with replacement
ys <- sample(yp, 5, replace=TRUE)
Q. Define the Simulation Study for Sampling Strategy?
Ans: A simulation study is useful for evaluating a sampling strategy. We can generate populations that reflect specific situations. The steps are: generate the population; draw a sample of size ‘n’, ‘k’ times; obtain the estimator from each sample; calculate the variance of the ‘k’ estimates to examine the efficiency.
Consider a population consisting of
111, 150, 121, 198, 112, 136, 114, 129, 117, 115, 186, 110, 121, 115, 114
yp <- c(111, 150, 121, 198, 112, 136, 114, 129, 117, 115, 186, 110, 121, 115, 114)
Taking a sample by SRSWOR in R
ys <- sample(yp, 5)
mean(ys)
Now we will take ‘k’ samples in R.
R program for k Samples
Suppose k=10, n=5.
The ‘for’ loop is used to repeat the statements.
m1 <- c()
yp <- c(111, 150, 121, 198, 112, 136, 114, 129, 117, 115, 186, 110, 121, 115, 114)
for (i in 1:10) {
  s <- sample(yp, 5)
  m1[i] <- mean(s)
}
m1
Output
Sample means with 10 repetitions
149.8 129.2 140.8 132.4 118.2 117.6 118.4 118.0 132.6 132.6
k Samples
Suppose k=10000, n=5.
m1 <- c()
yp <- c(111, 150, 121, 198, 112, 136, 114, 129, 117, 115, 186, 110, 121, 115, 114)
for (i in 1:10000) {
  s <- sample(yp, 5)
  m1[i] <- mean(s)
}
var(m1)
Output (the result differs from run to run because the samples are random)
On the first run, the result was 101.7741
On the second run, the result was 102.1403
On the third run, the result was 100.2455
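The simulated variance can be compared with the theoretical SRSWOR variance (1 - n/N) S²/n derived earlier. The following is a minimal sketch under two assumptions of convenience: a fixed seed (chosen arbitrarily) so that the run is reproducible, and replicate() in place of the explicit for loop.

```r
set.seed(632)                          # arbitrary seed, for reproducibility only
yp <- c(111, 150, 121, 198, 112, 136, 114, 129, 117, 115, 186, 110, 121, 115, 114)
N <- length(yp)
n <- 5
k <- 10000

# k sample means under SRSWOR (replicate() repeats the expression k times)
m1 <- replicate(k, mean(sample(yp, n)))

v_sim    <- var(m1)                    # simulated variance of the sample mean
v_theory <- (1 - n / N) * var(yp) / n  # theoretical variance under SRSWOR

c(simulated = v_sim, theoretical = v_theory)
```

For this population the theoretical value is about 100.71, which agrees with the three simulated results reported above.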
Q. Obtain 10000 random samples of size 6 under SRSWOR from the following population and find the mean of each sample.
111, 150, 121, 198, 112, 136, 114, 129, 117, 115, 186, 110, 121, 115, 114
Ans:
Given population is
111, 150, 121, 198, 112, 136, 114, 129, 117, 115, 186, 110, 121, 115, 114
Here, k=10000, n=6.
m1 <- c()
yp <- c(111, 150, 121, 198, 112, 136, 114, 129, 117, 115, 186, 110, 121, 115, 114)
for (i in 1:10000) {
  s <- sample(yp, 6)
  m1[i] <- mean(s)
}
var(m1)
The vector m1 holds the 10000 sample means, and var(m1) gives their variance.
Q. Obtain 1000 random numbers from a normal distribution with mean 0 and variance 1 as the population. Obtain 10000 random samples of size 6 under SRSWOR from this population and find the mean of each sample.
Ans: 1000 values with mean=0 and standard deviation=1 are generated with
rnorm(n, mean, sd)
yp <- rnorm(1000, 0, 1)
Here, k=10000, n=6.
m1 <- c()
yp <- rnorm(1000, 0, 1)
for (i in 1:10000) {
  s <- sample(yp, 6)
  m1[i] <- mean(s)
}
var(m1)
Q. Explain the Estimation of Sample Size for Mean Estimation by using an example?
Ans: Sample size Estimation
Let ‘d’ be the margin of error, i.e. the maximum allowable difference between the estimate and the true population value, and let $\alpha$ be a small probability that the difference exceeds d.
The probability of the margin of error being less than d is given by
$$P\left(|\bar{y} - \bar{Y}| \le d\right) = 1 - \alpha.$$
We assume that the sample mean is normally distributed, i.e.
$$t = \frac{\bar{y} - \bar{Y}}{S.E(\bar{y})},$$
where t is the tabulated value for the chosen confidence level. Setting $|\bar{y} - \bar{Y}| = d$ gives
$$t = \frac{d}{S.E(\bar{y})}, \qquad d = t\;S.E(\bar{y}).$$
Consider the case of sampling without replacement.
$$d = t\sqrt{\frac{N-n}{N}\,\frac{S^2}{n}}$$
$$d^2 = t^2\,\frac{N-n}{N}\,\frac{S^2}{n}$$
$$\frac{d^2 N n}{t^2} = NS^2 - nS^2$$
$$n = \frac{(tS/d)^2}{1 + \dfrac{(tS/d)^2}{N}}$$
When the population size is large
$$n = (tS/d)^2.$$
Example: A quality manager wants to estimate the mean diameter of bolts produced in the last week. Determine the sample size needed for a 90% confidence level if the error in the estimate is to be no more than 3 cm. A pilot sample yields a standard deviation of 30 cm.
Solution: For 90% confidence t = 1.645, and with S = 30 and d = 3,
$$n = (tS/d)^2 = \frac{(1.645)^2 (30)^2}{(3)^2} = 270.60,$$
so a sample of n = 271 bolts is required (rounding up).
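The sample size formulas for the mean can be wrapped in a small R function. The function name sample_size_mean and its argument names are hypothetical, chosen only for this sketch; when a finite N is supplied it applies the finite population form, otherwise the large-population form.

```r
# Hypothetical helper: sample size for estimating a mean
sample_size_mean <- function(t, S, d, N = Inf) {
  n0 <- (t * S / d)^2                 # large-population formula
  if (is.finite(N)) n0 / (1 + n0 / N) else n0
}

# Bolt example: 90% confidence (t = 1.645), S = 30, d = 3
n0 <- sample_size_mean(t = 1.645, S = 30, d = 3)
n0             # 270.6025
ceiling(n0)    # 271

# With a finite population the required sample size is smaller,
# e.g. for an assumed N = 1000 (value chosen only for illustration)
n_fin <- sample_size_mean(t = 1.645, S = 30, d = 3, N = 1000)
```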
Sample size when cost is involved
Cost plays a major role in the conduct of surveys. A simple cost function is
$$C = C_0 + nC_1,$$
where
C = Total Cost
C_0 = Fixed Cost
C_1 = Cost Per Unit
The loss incurred because the sample mean is not equal to the population mean is taken as $d(\bar{y} - \bar{Y})^2$, where d is a constant. The expected total loss is
$$L(n) = E(C_0 + nC_1) + E\left[d(\bar{y} - \bar{Y})^2\right] = C_0 + C_1 n + d\,E(\bar{y} - \bar{Y})^2$$
$$L(n) = C_0 + C_1 n + d\,Var(\bar{y})$$
$$L(n) = C_0 + C_1 n + d\,\frac{S^2}{n}$$
Differentiating with respect to n and equating to zero, $C_1 - dS^2/n^2 = 0$, gives the optimum value
$$n = \sqrt{dS^2 / C_1}.$$
Q. Estimation of sample size for estimation of proportion?
Ans: Suppose we have N population units, i.e. $Y_1, Y_2, \ldots, Y_i, \ldots, Y_N$, where
$Y_i = 1$ if the ith unit possesses a certain attribute and 0 otherwise.
The population proportion is defined as
$$\bar{Y} = \sum_{i=1}^{N} Y_i / N = A / N = P.$$
The sample proportion is
$$\bar{y} = \sum_{i=1}^{n} y_i / n = a / n = p.$$
Since $Y_i$ takes only the values 1 and 0,
$$\sum_{i=1}^{N} Y_i^2 = A = NP.$$
The same is the case for the sample:
$$\sum_{i=1}^{n} y_i^2 = a = np.$$
Then
$$\sum_{i=1}^{N}(Y_i - \bar{Y})^2 = \sum_{i=1}^{N} Y_i^2 - N\bar{Y}^2 = NP - NP^2 = NPQ, \qquad Q = 1 - P,$$
$$S^2 = \frac{\sum_{i=1}^{N}(Y_i - \bar{Y})^2}{N-1} = \frac{N}{N-1}P(1-P) = \frac{NPQ}{N-1}.$$
Similarly $s^2 = npq / (n - 1)$.
Substituting $s^2$ for $S^2$ in the variance formulas of the sample mean gives the estimated variances:
For SRSWOR
$$var(p_{wor}) = \frac{N-n}{N}\,\frac{pq}{n-1},$$
For SRSWR
$$var(p_{wr}) = \frac{N-1}{N}\,\frac{pq}{n-1},$$
When fpc is ignored
$$var(p) = pq / (n-1).$$
The true variance for SRSWOR follows from
$$VAR(\bar{y}_{wor}) = \frac{N-n}{Nn}S^2,$$
$$VAR(p_{wor}) = \frac{N-n}{Nn}\,\frac{NPQ}{N-1} = \frac{N-n}{N-1}\,\frac{PQ}{n}.$$
Standard Error of Proportion
$$S.E(p_{wor}) = \sqrt{\frac{N-n}{N-1}\,\frac{PQ}{n}}.$$
For the large sample size
$$var(p) = pq / n.$$
The standard error is
$$S.E(p) = \sqrt{pq / n}.$$
Sample size Estimation
$$t = \frac{p - P}{S.E(p)} = \frac{d}{S.E(p)}$$
$$d = t \times S.E(p)$$
$$d^2 = t^2\,VAR(p)$$
$$d^2 = t^2\,\frac{N-n}{N-1}\,\frac{PQ}{n}$$
$$d^2 = \frac{t^2 NPQ}{n(N-1)} - \frac{t^2 PQ}{N-1}$$
$$n\left[d^2 + \frac{t^2 PQ}{N-1}\right] = \frac{t^2 NPQ}{N-1}$$
$$n = \frac{t^2 NPQ}{(N-1)d^2 + t^2 PQ}$$
Dividing the numerator and the denominator by $N d^2$,
$$n = \frac{t^2 PQ / d^2}{1 + \frac{1}{N}\left(t^2 PQ / d^2 - 1\right)}.$$
When the population size is large,
$$n = \frac{t^2 pq}{d^2}.$$
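The sample size formula for a proportion can be sketched in R in the same way. The function name sample_size_prop is hypothetical, and the numbers used (t = 1.96, P = 0.5, d = 0.05, N = 1000) are illustrative assumptions, not values from the handout.

```r
# Hypothetical helper: sample size for estimating a proportion
sample_size_prop <- function(t, P, d, N = Inf) {
  Q  <- 1 - P
  n0 <- t^2 * P * Q / d^2             # large-population formula
  if (is.finite(N)) n0 / (1 + (n0 - 1) / N) else n0
}

# Illustrative values (assumed)
n_inf <- sample_size_prop(t = 1.96, P = 0.5, d = 0.05)            # 384.16
n_fin <- sample_size_prop(t = 1.96, P = 0.5, d = 0.05, N = 1000)

# The equivalent form t^2 NPQ / ((N - 1) d^2 + t^2 PQ) from the derivation
n_alt <- 1.96^2 * 1000 * 0.25 / ((1000 - 1) * 0.05^2 + 1.96^2 * 0.25)

c(large_N = n_inf, finite_N = n_fin, alternative_form = n_alt)
```

The two finite-population forms agree, which is a quick check on the algebra of the last step of the derivation.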
Q. Derive the Confidence Interval for mean and proportion. Also explain with the help of
R.
Ans: Confidence Interval for Mean (Known variance)
The interval estimate of the population mean is
$$\bar{y} \pm z_{1-\alpha/2}\;S.E(\bar{y})$$
Lower Limit
$$\bar{y} - z_{1-\alpha/2}\;S.E(\bar{y})$$
Upper Limit
$$\bar{y} + z_{1-\alpha/2}\;S.E(\bar{y})$$
Confidence Interval for Mean (Unknown variance)
The interval estimate of the population mean when the population standard deviation is not known is
$$\bar{y} \pm t_{1-\alpha/2,\,v}\;S.E(\bar{y})$$
where v is the degrees of freedom.
Example: The XYZ Company produces diet cold drink cans. The standard deviation of the amount poured into the cans by the automatic filling machine is 1.4 ml (milliliter). A random sample of cans had the filling amounts 281, 278, 276, 282, 280, 279, 278, 280. Suppose that the population of filling amounts follows a normal distribution. Determine the 95% confidence interval for the mean amount in all cans filled by the machine.
Solution:
Given $\sigma = 1.4$ and n = 8,
$$\bar{y} = \frac{\sum y}{n} = 279.25$$
$$S.E(\bar{y}) = \frac{\sigma}{\sqrt{n}} = \frac{1.4}{\sqrt{8}} = 0.495$$
The 95% confidence limits will be
$$\bar{y} \pm 1.96\;S.E(\bar{y})$$
The Lower Limit
$$279.25 - 1.96(0.495) = 278.28$$
The Upper Limit
$$279.25 + 1.96(0.495) = 280.22$$
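The same interval can be reproduced in R. This is a minimal sketch using the data of the example; qnorm(0.975) is used for the 95% multiplier, so the limits agree with the hand calculation to two decimal places.

```r
# Filling amounts from the example and the known standard deviation
y <- c(281, 278, 276, 282, 280, 279, 278, 280)
sigma <- 1.4

n    <- length(y)
ybar <- mean(y)              # 279.25
se   <- sigma / sqrt(n)      # standard error with known sigma
z    <- qnorm(0.975)         # approximately 1.96

ci <- c(lower = ybar - z * se, upper = ybar + z * se)
round(ci, 2)
```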
Confidence Interval of Mean Using R
yp <- c(111, 150, 121, 198, 112, 136, 114, 129, 117, 115, 186, 110, 121, 115, 114)
# population size
N <- length(yp)
# sample of size 5
ys <- sample(yp, 5)
# sample size
n <- length(ys)
# mean of the sample
mys <- mean(ys)
# variance of the sample
vys <- var(ys)
# variance of ybar (population variance treated as known, fpc ignored)
vybar <- var(yp)/n
# standard error
sdr <- sqrt(vybar)
The margin of error for a 95% interval, treating the population variance as known, is
error <- qnorm(0.975)*sdr
The respective lower and upper limits are
left <- mys-error
right <- mys+error
Confidence Interval for Proportion
The interval estimate of the population proportion is
$$p \pm z_{1-\alpha/2}\;S.E(p)$$
Example: A simple random sample of 80 students is taken from a population of 470 students in a department. The total number of smokers in the sample was 22.
Confidence Interval of Smokers
The number of smokers is a = 22, so
$$p = \frac{22}{80} = 0.275, \qquad q = 1 - p = 0.725$$
$$S.E(p) = \sqrt{\frac{pq}{n}} = 0.0499$$
The 95% confidence limits are
$$p \pm z_{1-\alpha/2}\;S.E(p) = p \pm 1.96\;S.E(p)$$
The Lower Limit
$$0.275 - 1.96(0.0499) = 0.1772$$
The Upper Limit
$$0.275 + 1.96(0.0499) = 0.3728$$
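The smokers example can be reproduced in R as below. The last two lines are an optional addition that is not part of the worked example: they apply the estimated SRSWOR variance formula with the finite population correction for N = 470, to show how much the standard error changes.

```r
# Smokers example: n = 80 students sampled, a = 22 smokers
n <- 80
a <- 22
p <- a / n                   # 0.275
q <- 1 - p                   # 0.725

se <- sqrt(p * q / n)        # large-sample standard error, about 0.0499
z  <- qnorm(0.975)           # approximately 1.96
ci <- c(lower = p - z * se, upper = p + z * se)
round(ci, 4)

# Optional: estimated standard error with the finite population correction
N <- 470
se_fpc <- sqrt((N - n) / N * p * q / (n - 1))
```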

STRATIFIED SAMPLING

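Notations
For a population divided into k strata, with stratum sizes $N_h$ and stratum sample sizes $n_h$ (h = 1, 2, ..., k),
N = N1 + N2 + N3 + ... + Nk
Similarly,
n = n1 + n2 + n3 + ... + nk
Stratum mean
$$\bar{Y}_h = \frac{\sum_{i=1}^{N_h} Y_{hi}}{N_h}$$
Sample mean of stratum
$$\bar{y}_h = \frac{\sum_{i=1}^{n_h} y_{hi}}{n_h}$$
As in Simple Random Sampling
$$E(\bar{y}_h) = \bar{Y}_h, \qquad Var(\bar{y}_h) = \frac{N_h - n_h}{N_h}\,\frac{S_h^2}{n_h}$$
Sample Mean
$$\bar{y}_{st} = \sum_{h=1}^{k} N_h \bar{y}_h / N = \sum_{h=1}^{k} W_h \bar{y}_h, \qquad W_h = N_h / N$$
Stratum variance and its sample counterpart
$$S_h^2 = \frac{1}{N_h - 1}\sum_{i=1}^{N_h}\left(Y_{hi} - \bar{Y}_h\right)^2, \qquad s_h^2 = \frac{1}{n_h - 1}\sum_{i=1}^{n_h}(y_{hi} - \bar{y}_h)^2$$
Estimated population total
$$y'_{st} = N\bar{y}_{st}$$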
Q. Prove that sample mean is unbiased estimator of population mean under stratified sampling.
Ans: In stratified sampling, the sample mean is an unbiased estimator of the population mean, i.e.
$$E(\bar{y}_{st}) = \bar{Y}.$$
As in Simple Random Sampling
$$E(\bar{y}_h) = \bar{Y}_h.$$
Taking expectation of the sample mean of the stratified sample, we have:
$$E(\bar{y}_{st}) = \frac{1}{N}\sum_{h=1}^{k} N_h E(\bar{y}_h) = \frac{1}{N}\sum_{h=1}^{k} N_h \bar{Y}_h = \bar{Y}.$$
Similarly it can be proved that the estimated total is unbiased for the population total Y:
$$E(y'_{st}) = Y.$$
Q. Under how many allocation methods can the variance of the sample mean be derived in stratified sampling?
Ans: Allocation of Sample Size
The variance of $\bar{y}_{st}$ can be derived by using the following allocation methods:
i. Arbitrary Allocation,
ii. Proportional Allocation, and
iii. Optimum Allocation.
Q. Derive the variance of sample mean in stratified sampling using arbitrary allocation?
Ans: Arbitrary Allocation
The total sample is allocated arbitrarily among the strata.
Theorem
The variance of the sample mean $\bar{y}_{st}$ for stratified random sampling from a finite population is
$$Var(\bar{y}_{st}) = \frac{1}{N^2}\sum_{h=1}^{k} N_h (N_h - n_h)\frac{S_h^2}{n_h} = \sum_{h=1}^{k} W_h^2 S_h^2\,\frac{N_h - n_h}{N_h n_h}.$$
Proof: We know that the mean for the stratified random sampling design is
$$\bar{y}_{st} = \frac{1}{N}\sum_{h=1}^{k} N_h \bar{y}_h.$$
Since the samples are drawn independently in the different strata,
$$Var(\bar{y}_{st}) = \frac{1}{N^2}\sum_{h=1}^{k} N_h^2\,Var(\bar{y}_h), \qquad Var(\bar{y}_h) = \frac{N_h - n_h}{N_h}\,\frac{S_h^2}{n_h}.$$
Substituting,
$$Var(\bar{y}_{st}) = \sum_{h=1}^{k}\frac{N_h^2}{N^2}S_h^2\,\frac{N_h - n_h}{N_h n_h} = \sum_{h=1}^{k} W_h^2 S_h^2\,\frac{N_h - n_h}{N_h n_h}$$
$$Var(\bar{y}_{st}) = \frac{1}{N^2}\sum_{h=1}^{k}\frac{N_h^2 S_h^2}{n_h} - \frac{1}{N^2}\sum_{h=1}^{k} N_h S_h^2 = \sum_{h=1}^{k}\frac{W_h^2 S_h^2}{n_h} - \sum_{h=1}^{k}\frac{W_h S_h^2}{N}.$$
For a large value of N, the above equation reduces to
$$Var(\bar{y}_{st}) = \sum_{h=1}^{k} W_h^2 S_h^2 / n_h.$$
Variance of total
$$Var(y'_{st}) = Var(N\bar{y}_{st}) = N^2\,Var(\bar{y}_{st}) = \sum_{h=1}^{k}\frac{N_h^2 S_h^2}{n_h} - \sum_{h=1}^{k} N_h S_h^2.$$
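Q. Derive the Variance of Sample Mean in Stratified Sampling using Proportional Allocation?
Ans: Proportional allocation was originally proposed by Bowley (1926). If the sampling fraction is the same in all the strata, then the allocation is termed proportional allocation. The sample size of the hth stratum in this case is given by
$$n_h / n = N_h / N, \qquad n_h = \frac{n}{N}N_h = nW_h.$$
Starting from the general expression
$$Var(\bar{y}_{st}) = \sum_{h=1}^{k}\frac{N_h^2}{N^2}S_h^2\,\frac{N_h - n_h}{N_h n_h} = \sum_{h=1}^{k} W_h^2 S_h^2\,\frac{N_h - n_h}{N_h n_h}$$
and substituting $n_h = nN_h/N$,
$$Var_{prop}(\bar{y}_{st}) = \sum_{h=1}^{k} W_h^2 S_h^2\,\frac{N_h - \dfrac{nN_h}{N}}{N_h\,\dfrac{nN_h}{N}} = \sum_{h=1}^{k} W_h^2 S_h^2\,\frac{N - n}{nN_h}.$$
Since $W_h^2 / N_h = W_h / N$, this simplifies to
$$Var_{prop}(\bar{y}_{st}) = \frac{N - n}{Nn}\sum_{h=1}^{k} W_h S_h^2.$$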
Q. Derive the variance of sample mean in stratified sampling using optimum allocation
Ans: The purpose of optimum allocation is to choose the $n_h$ so that the minimum variance is achieved for a given cost. The $n_h$ are chosen either to minimize $Var(\bar{y}_{st})$ for a fixed cost (or fixed sample size), or to minimize the cost for a given variance.
The two aspects of optimum allocation are
i. Sample size is proportional to the stratum size and the standard deviation of the stratum (Neyman Allocation).
ii. Sample size is inversely proportional to the square root of the cost per unit.
Sample Size for Minimum Variance at Fixed Cost
In stratified random sampling $Var(\bar{y}_{st})$ will be minimum subject to the cost when $n_h$ is proportional to $W_h S_h / \sqrt{c_h}$, i.e.
$$n_h = n\,\frac{W_h S_h / \sqrt{c_h}}{\sum_{h=1}^{k} W_h S_h / \sqrt{c_h}}.$$
Proof: The variance of the sample mean $\bar{y}_{st}$ for stratified random sampling is
$$Var(\bar{y}_{st}) = \sum_{h=1}^{k} W_h^2 S_h^2 / n_h - \sum_{h=1}^{k} W_h S_h^2 / N,$$
and the cost function is
$$C = C_0 + \sum_{h=1}^{k} c_h n_h, \qquad \text{where } C = \text{total cost.}$$
We introduce Lagrange's multiplier $\lambda$, i.e.
$$F = Var(\bar{y}_{st}) + \lambda\left(C_0 + \sum_{h=1}^{k} n_h c_h - C\right).$$
Partially differentiating w.r.t. $n_h$ and equating to zero,
$$-\frac{W_h^2 S_h^2}{n_h^2} + \lambda c_h = 0$$
$$n_h = \frac{W_h S_h}{\sqrt{\lambda}\sqrt{c_h}}.$$
Summing over the strata,
$$\sum_{h=1}^{k} n_h = n = \frac{1}{\sqrt{\lambda}}\sum_{h=1}^{k}\frac{W_h S_h}{\sqrt{c_h}},$$
so that
$$\frac{n_h}{n} = \frac{W_h S_h / \sqrt{c_h}}{\sum_{h=1}^{k} W_h S_h / \sqrt{c_h}}, \qquad n_h = \frac{n\,W_h S_h / \sqrt{c_h}}{\sum_{h=1}^{k} W_h S_h / \sqrt{c_h}}.$$
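Variance under Optimum Allocation
Minimum Variance for Fixed Cost
Substituting
$$n_h = \frac{n\,W_h S_h / \sqrt{c_h}}{\sum_{h=1}^{k} W_h S_h / \sqrt{c_h}}$$
into
$$Var(\bar{y}_{st}) = \sum_{h=1}^{k}\frac{W_h^2 S_h^2}{n_h} - \sum_{h=1}^{k}\frac{W_h S_h^2}{N}$$
gives
$$Var_{min}(\bar{y}_{st}) = \frac{1}{n}\left(\sum_{h=1}^{k} W_h S_h \sqrt{c_h}\right)\left(\sum_{h=1}^{k}\frac{W_h S_h}{\sqrt{c_h}}\right) - \frac{1}{N}\sum_{h=1}^{k} W_h S_h^2.$$
Neyman Allocation
If $c_1 = c_2 = \cdots = c_k = c$, then the cost function becomes $C = C_0 + nc$ and the common factor $\sqrt{c}$ cancels:
$$n_h = \frac{n\,W_h S_h / \sqrt{c}}{\sum_{h=1}^{k} W_h S_h / \sqrt{c}} = \frac{n\,W_h S_h}{\sum_{h=1}^{k} W_h S_h}.$$
The minimum variance is then
$$Var_{min}(\bar{y}_{st}) = \frac{1}{n}\left(\sum_{h=1}^{k} W_h S_h\right)^2 - \frac{1}{N}\sum_{h=1}^{k} W_h S_h^2.$$
For large N
$$Var_{min}(\bar{y}_{st}) = \frac{1}{n}\left(\sum_{h=1}^{k} W_h S_h\right)^2.$$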
Example:
Consider a population of size 700 consisting of three strata such that N1=100, N2=250 and N3=350. The required sample size is 18. The sample sizes from stratum-1, stratum-2 and stratum-3 are arbitrarily decided as 4, 8 and 6, respectively.
The sample from each stratum is chosen as
Stra-1   Stra-2   Stra-3
1        7        23
3        12       14
2        8        20
5        5        22
         11       24
         10       17
         9
         12

         Stra-1   Stra-2   Stra-3
mean     2.75     9.25     20
Nh       100      250      350
Wh       0.1429   0.3571   0.5
nh       4        8        6
Sh       1.7078   2.4928   3.8471
Sh2      2.9167   6.2143   14.800
$$\bar{y}_{st} = \sum_{h=1}^{k} W_h \bar{y}_h = \sum_{h=1}^{k} N_h \bar{y}_h / N = \frac{1}{N}\left(N_1\bar{y}_1 + N_2\bar{y}_2 + N_3\bar{y}_3\right)$$
$$\bar{y}_{st} = 13.70$$
$$Var(\bar{y}_{st}) = \sum_{h=1}^{3} W_h^2 s_h^2\,\frac{N_h - n_h}{N_h n_h} = W_1^2 s_1^2\frac{N_1 - n_1}{N_1 n_1} + W_2^2 s_2^2\frac{N_2 - n_2}{N_2 n_2} + W_3^2 s_3^2\frac{N_3 - n_3}{N_3 n_3}$$
$$Var(\bar{y}_{st}) = 0.7163, \qquad S.E(\bar{y}_{st}) = \sqrt{0.7163} = 0.8463$$
Confidence Interval for Mean
$$\bar{y}_{st} \pm z_{1-\alpha/2}\;S.E(\bar{y}_{st}) = 13.70 \pm 1.96(0.8463)$$
Lower Limit
13.70 - 1.96(0.8463) = 12.04
Upper Limit
13.70 + 1.96(0.8463) = 15.36
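The calculations of this example can be carried out in R with a small helper function. The name strat_est and its arguments are hypothetical (not a built-in R function), and the stratum variances are estimated from the sample with var(), as in the worked example.

```r
# Hypothetical helper: stratified mean, its estimated variance and a z-interval
strat_est <- function(samples, Nh, z = 1.96) {
  N   <- sum(Nh)
  Wh  <- Nh / N                         # stratum weights
  nh  <- sapply(samples, length)        # stratum sample sizes
  ybh <- sapply(samples, mean)          # stratum sample means
  s2h <- sapply(samples, var)           # stratum sample variances
  yst <- sum(Wh * ybh)
  v   <- sum(Wh^2 * s2h * (Nh - nh) / (Nh * nh))
  se  <- sqrt(v)
  list(yst = yst, var = v, se = se,
       ci = c(lower = yst - z * se, upper = yst + z * se))
}

# Arbitrary allocation example: n1 = 4, n2 = 8, n3 = 6
samples <- list(c(1, 3, 2, 5),
                c(7, 12, 8, 5, 11, 10, 9, 12),
                c(23, 14, 20, 22, 24, 17))
res <- strat_est(samples, Nh = c(100, 250, 350))
res
```

The same function can be applied to the samples of the other allocation examples by changing the samples list and the Nh vector.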
Example:
Ans: Consider a population of size 700 consisting of three strata such that N1=100, N2=250 and N3=350. The required sample size is 18.
Under equal allocation, the sample size from each of the L = 3 strata is
$$n_h = \frac{n}{L} = \frac{18}{3} = 6$$
The sample from each stratum is chosen as
Stra-1   Stra-2   Stra-3
1        7        23
3        12       14
2        8        20
5        5        22
4        11       24
3        10       17

         Stra-1   Stra-2   Stra-3
mean     3        8.83     20
Nh       100      250      350
Wh       0.1429   0.3571   0.5
nh       6        6        6
Sh       1.4142   2.6394   3.8471
Sh2      2        6.967    14.800
$$\bar{y}_{st} = \sum_{h=1}^{k} W_h \bar{y}_h = \frac{1}{N}\left(N_1\bar{y}_1 + N_2\bar{y}_2 + N_3\bar{y}_3\right)$$
$$\bar{y}_{st} = 13.58$$
$$Var(\bar{y}_{st}) = \sum_{h=1}^{3} W_h^2 s_h^2\,\frac{N_h - n_h}{N_h n_h} = W_1^2 s_1^2\frac{N_1 - n_1}{N_1 n_1} + W_2^2 s_2^2\frac{N_2 - n_2}{N_2 n_2} + W_3^2 s_3^2\frac{N_3 - n_3}{N_3 n_3}$$
$$Var(\bar{y}_{st}) = 0.7570, \qquad S.E(\bar{y}_{st}) = \sqrt{0.7570} = 0.8701$$
Confidence Interval for Mean
$$\bar{y}_{st} \pm z_{1-\alpha/2}\;S.E(\bar{y}_{st}) = 13.58 \pm 1.96(0.8701)$$
Lower Limit
13.58 - 1.96(0.8701) = 11.87
Upper Limit
13.58 + 1.96(0.8701) = 15.29
Example:
Ans: Consider a population of size 700 consisting of three strata such that N1=100, N2=250 and N3=350. The required sample size is 18. First we will allocate the sample size to each stratum according to proportional allocation.
Sample Size Allocation
N1=100, N2=250, N3=350, N=700, n=18
$$n_h = \frac{n}{N}N_h$$
$$n_1 = \frac{18}{700}\times 100 = 2.57 \approx 3, \qquad n_2 = \frac{18}{700}\times 250 = 6.43 \approx 6, \qquad n_3 = \frac{18}{700}\times 350 = 9$$
The sample from each stratum is chosen as
Stra-1   Stra-2   Stra-3
1        5        23
2        11       14
5        10       20
         9        22
         12       24
         6        17
                  23
                  21
                  19

         Stra-1   Stra-2   Stra-3
mean     2.67     8.83     20.33
Nh       100      250      350
nh       3        6        9
Sh2      4.333    7.767    10.5
$$\bar{y}_{st} = \sum_{h=1}^{k} W_h \bar{y}_h = \frac{1}{N}\left(N_1\bar{y}_1 + N_2\bar{y}_2 + N_3\bar{y}_3\right)$$
$$\bar{y}_{st} = 13.70$$
$$Var(\bar{y}_{st}) = \sum_{h=1}^{3} W_h^2 s_h^2\,\frac{N_h - n_h}{N_h n_h} = W_1^2 s_1^2\frac{N_1 - n_1}{N_1 n_1} + W_2^2 s_2^2\frac{N_2 - n_2}{N_2 n_2} + W_3^2 s_3^2\frac{N_3 - n_3}{N_3 n_3}$$
$$Var(\bar{y}_{st}) = 0.474, \qquad S.E(\bar{y}_{st}) = \sqrt{0.474} = 0.688$$
Confidence Interval for Mean
$$\bar{y}_{st} \pm z_{1-\alpha/2}\;S.E(\bar{y}_{st}) = 13.70 \pm 1.96(0.688)$$
Lower Limit
13.70 - 1.96(0.688) = 12.35
Upper Limit
13.70 + 1.96(0.688) = 15.05
Q. Give an example to explain how Sample Size is obtained in Optimum Allocation?
Ans: A manufacturing company is interested in conducting a survey about a certain product in three towns (say A, B, and C) of a city. The towns differ from each other with respect to household income. The numbers of houses in Town A, B, and C are 170, 135, and 80, respectively.
The company finds that the cost of obtaining an observation from town A or B is the same, Rs. 500 (i.e. c1 = c2 = 500). The cost per observation in town C is Rs. 800 (i.e. c3 = 800).
$$S_1 = 3, \quad S_2 = 7, \quad S_3 = 10$$
The overall sample size for a certain margin of error is 30. Find the sample size from each town (stratum), $n_1, n_2, n_3 = ?$
Sample Size in Optimum Allocation
$$n_h = n\,\frac{N_h S_h / \sqrt{c_h}}{\sum_{h=1}^{3} N_h S_h / \sqrt{c_h}}$$
              Town-A    Town-B    Town-C
Sh            3         7         10
Nh            170       135       80
ch            500       500       800
NhSh          510       945       800
NhSh/sqrt(ch) 22.8078   42.2616   28.2843
$$\sum_{h=1}^{3} N_h S_h / \sqrt{c_h} = 93.35$$
$$n_1 = n\,\frac{N_1 S_1 / \sqrt{c_1}}{\sum_{h=1}^{3} N_h S_h / \sqrt{c_h}} = 30\left(\frac{22.8078}{93.35}\right) = 7.33 \approx 7$$
$$n_2 = n\,\frac{N_2 S_2 / \sqrt{c_2}}{\sum_{h=1}^{3} N_h S_h / \sqrt{c_h}} = 30\left(\frac{42.2616}{93.35}\right) = 13.58 \approx 14$$
$$n_3 = n\,\frac{N_3 S_3 / \sqrt{c_3}}{\sum_{h=1}^{3} N_h S_h / \sqrt{c_h}} = 30\left(\frac{28.2843}{93.35}\right) = 9.09 \approx 9$$
$$n_1 = 7, \quad n_2 = 14, \quad n_3 = 9$$
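The optimum allocation of this example can be computed in R in a few lines. This is a minimal sketch; the vector names (Nh, Sh, ch, w) are illustrative.

```r
# Town data from the example
Nh <- c(A = 170, B = 135, C = 80)     # number of houses
Sh <- c(3, 7, 10)                     # stratum standard deviations
ch <- c(500, 500, 800)                # cost per observation
n  <- 30                              # overall sample size

w <- Nh * Sh / sqrt(ch)               # 22.8078 42.2616 28.2843
nh_exact <- n * w / sum(w)            # allocation before rounding
nh <- round(nh_exact)                 # 7 14 9

rbind(exact = nh_exact, rounded = nh)
```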
Q. By using an example explain the Variance for Optimum Allocation
Ans: With the optimum allocation $n_1 = 7$, $n_2 = 14$, $n_3 = 9$, the samples chosen from the three towns are
Town-A: 1, 2, 5, 4, 3, 6, 10
Town-B: 5, 7, 11, 6, 10, 8, 9, 12, 12, 11, 8, 9, 11, 13
Town-C: 23, 14, 20, 22, 24, 17, 23, 21, 19

         Town-A   Town-B   Town-C
Mean     4.4286   9.4286   20.3333
Nh       170      135      80
nh       7        14       9
Sh2      8.9524   5.8022   10.5
$$\bar{y}_{st} = \sum_{h=1}^{k} W_h \bar{y}_h = \sum_{h=1}^{k} N_h \bar{y}_h / N = \frac{1}{N}\left(N_1\bar{y}_1 + N_2\bar{y}_2 + N_3\bar{y}_3\right)$$
$$\bar{y}_{st} = 9.49$$
$$Var(\bar{y}_{st}) = \sum_{h=1}^{3} W_h^2 s_h^2\,\frac{N_h - n_h}{N_h n_h} = W_1^2 s_1^2\frac{N_1 - n_1}{N_1 n_1} + W_2^2 s_2^2\frac{N_2 - n_2}{N_2 n_2} + W_3^2 s_3^2\frac{N_3 - n_3}{N_3 n_3}$$
$$Var(\bar{y}_{st}) = 0.329, \qquad S.E(\bar{y}_{st}) = \sqrt{0.329} = 0.57399$$
Confidence Interval
$$\bar{y}_{st} \pm z_{1-\alpha/2}\;S.E(\bar{y}_{st}) = 9.49 \pm 1.96(0.57399)$$
Lower Limit
9.49 - 1.96(0.57399) = 8.36
Upper Limit
9.49 + 1.96(0.57399) = 10.61
Q. Give an example to explain how Sample Size is obtained in Neyman Allocation?
Ans: A manufacturing company is interested in conducting a survey about a certain product in three towns (say A, B, and C) of a city. The towns differ from each other with respect to household income. The numbers of houses in Town A, B, and C are 170, 135, and 80, respectively.
The company finds that the cost of obtaining an observation is the same in town A, B and C.
$$S_1 = 3, \quad S_2 = 7, \quad S_3 = 10$$
The overall sample size for a certain margin of error is 30. Find the sample size from each town (stratum), $n_1, n_2, n_3 = ?$
$$n_h = n\,\frac{N_h S_h}{\sum_{h=1}^{3} N_h S_h}$$
         Town-A   Town-B   Town-C
Sh       3        7        10
Nh       170      135      80
ch       500      500      500
NhSh     510      945      800
$$\sum_{h=1}^{3} N_h S_h = 2255$$
$$n_1 = n\,\frac{N_1 S_1}{\sum_{h=1}^{3} N_h S_h} = 30\left(\frac{510}{2255}\right) = 6.78 \approx 7$$
$$n_2 = n\,\frac{N_2 S_2}{\sum_{h=1}^{3} N_h S_h} = 30\left(\frac{945}{2255}\right) = 12.57 \approx 12$$
$$n_3 = n\,\frac{N_3 S_3}{\sum_{h=1}^{3} N_h S_h} = 30\left(\frac{800}{2255}\right) = 10.64 \approx 11$$
Here $n_2$ is taken as 12 rather than 13 so that the stratum sample sizes add up to the required total of 30.
$$n_1 = 7, \quad n_2 = 12, \quad n_3 = 11$$
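The Neyman allocation of this example can be computed in R as below. Simple rounding of the exact values would give 7, 13, 11, which adds up to 31, so the sketch uses a largest-remainder rule to keep the total at n = 30. That rounding rule is an assumption of this sketch, not something stated in the handout, although it reproduces the allocation 7, 12, 11 used in the example.

```r
# Town data from the example (equal cost per observation)
Nh <- c(A = 170, B = 135, C = 80)
Sh <- c(3, 7, 10)
n  <- 30

w <- Nh * Sh                          # 510 945 800
nh_exact <- n * w / sum(w)            # 6.78 12.57 10.64

# Largest-remainder rounding (assumed rule): floor first, then give the
# remaining units to the strata with the largest fractional parts
nh_floor <- floor(nh_exact)
left <- n - sum(nh_floor)
add <- order(nh_exact - nh_floor, decreasing = TRUE)[seq_len(left)]
nh <- nh_floor
nh[add] <- nh[add] + 1

rbind(exact = nh_exact, allocated = nh)
```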
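Q. Derive the Variance for Neyman Allocation by using an example?
Ans: With the Neyman allocation $n_1 = 7$, $n_2 = 12$, $n_3 = 11$, the samples chosen from the three towns are
Town-A: 1, 2, 5, 4, 3, 6, 10
Town-B: 5, 7, 11, 6, 10, 8, 9, 12, 12, 11, 8, 9
Town-C: 23, 14, 20, 22, 24, 17, 23, 21, 19, 20, 18

         Town-A   Town-B   Town-C
Mean     4.4286   9.0000   20.0909
Nh       170      135      80
nh       7        12       11
Sh2      8.9524   5.2727   8.8909
$$\bar{y}_{st} = \sum_{h=1}^{k} W_h \bar{y}_h = \sum_{h=1}^{k} N_h \bar{y}_h / N = \frac{1}{N}\left(N_1\bar{y}_1 + N_2\bar{y}_2 + N_3\bar{y}_3\right)$$
$$\bar{y}_{st} = 9.29$$
$$Var(\bar{y}_{st}) = \sum_{h=1}^{3} W_h^2 s_h^2\,\frac{N_h - n_h}{N_h n_h} = W_1^2 s_1^2\frac{N_1 - n_1}{N_1 n_1} + W_2^2 s_2^2\frac{N_2 - n_2}{N_2 n_2} + W_3^2 s_3^2\frac{N_3 - n_3}{N_3 n_3}$$
$$Var(\bar{y}_{st}) = 0.3184, \qquad S.E(\bar{y}_{st}) = \sqrt{0.3184} = 0.5643$$
Confidence Interval
$$\bar{y}_{st} \pm z_{1-\alpha/2}\;S.E(\bar{y}_{st}) = 9.29 \pm 1.96(0.5643)$$
Lower Limit
9.29 - 1.96(0.5643) = 8.18
Upper Limit
9.29 + 1.96(0.5643) = 10.40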
Q . Give Comparison of Allocation Methods?

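Ans: Variance (Rand) & Variance (Prop)
The variance of the sample mean under simple random sampling is
$Var_{ran}(\bar{y}) = \frac{N-n}{Nn} S^2$
$S^2 = \frac{\sum_{i=1}^{N} (Y_i - \bar{Y})^2}{N-1} = \frac{\sum_{h=1}^{k} \sum_{i=1}^{N_h} (Y_{hi} - \bar{Y})^2}{N-1}$
so that
$Var_{ran}(\bar{y}) = \frac{N-n}{Nn} \cdot \frac{\sum_{h=1}^{k} \sum_{i=1}^{N_h} (Y_{hi} - \bar{Y})^2}{N-1}$
Writing $Y_{hi} - \bar{Y} = (Y_{hi} - \bar{Y}_h) + (\bar{Y}_h - \bar{Y})$,
$\sum_{h=1}^{k} \sum_{i=1}^{N_h} (Y_{hi} - \bar{Y})^2 = \sum_{h=1}^{k} \sum_{i=1}^{N_h} (Y_{hi} - \bar{Y}_h)^2 + \sum_{h=1}^{k} \sum_{i=1}^{N_h} (\bar{Y}_h - \bar{Y})^2 + 2 \sum_{h=1}^{k} \sum_{i=1}^{N_h} (Y_{hi} - \bar{Y}_h)(\bar{Y}_h - \bar{Y})$
The cross-product term is zero because $\sum_{i=1}^{N_h} (Y_{hi} - \bar{Y}_h) = 0$ in every stratum, so
$(N-1) S^2 = \sum_{h=1}^{k} (N_h - 1) S_h^2 + \sum_{h=1}^{k} N_h (\bar{Y}_h - \bar{Y})^2$
For large N,
$N S^2 = \sum_{h=1}^{k} N_h S_h^2 + \sum_{h=1}^{k} N_h (\bar{Y}_h - \bar{Y})^2$
$S^2 = \sum_{h=1}^{k} \frac{N_h}{N} S_h^2 + \sum_{h=1}^{k} \frac{N_h}{N} (\bar{Y}_h - \bar{Y})^2 = \sum_{h=1}^{k} W_h S_h^2 + \sum_{h=1}^{k} W_h (\bar{Y}_h - \bar{Y})^2$
Multiplying both sides by $\frac{N-n}{Nn}$,
$\frac{N-n}{Nn} S^2 = \frac{N-n}{Nn} \sum_{h=1}^{k} W_h S_h^2 + \frac{N-n}{Nn} \sum_{h=1}^{k} W_h (\bar{Y}_h - \bar{Y})^2$
$V_{ran} = V_{prop} + \frac{N-n}{Nn} \sum_{h=1}^{k} W_h (\bar{Y}_h - \bar{Y})^2$
$V_{ran} \ge V_{prop}$
Variance (Prop) & Variance (Opt)
$Var_{prop}(\bar{y}_{st}) = \frac{N-n}{Nn} \sum_{h=1}^{k} W_h S_h^2 = \frac{1}{n} \sum_{h=1}^{k} W_h S_h^2 - \frac{1}{N} \sum_{h=1}^{k} W_h S_h^2$
$Var_{opt}(\bar{y}_{st}) = \frac{1}{n} \left( \sum_{h=1}^{k} W_h S_h \right)^2 - \frac{1}{N} \sum_{h=1}^{k} W_h S_h^2$
$Var(\bar{y}_{prop}) - Var(\bar{y}_{opt}) = \frac{1}{n} \left[ \sum_{h=1}^{k} W_h S_h^2 - \left( \sum_{h=1}^{k} W_h S_h \right)^2 \right]$
$= \frac{1}{n} \left[ \sum_{h=1}^{k} W_h S_h^2 - 2 \left( \sum_{h=1}^{k} W_h S_h \right)^2 + \left( \sum_{h=1}^{k} W_h S_h \right)^2 \right]$
$= \frac{1}{n} \left[ \sum_{h=1}^{k} W_h S_h^2 - 2 \left( \sum_{h=1}^{k} W_h S_h \right) \left( \sum_{h=1}^{k} W_h S_h \right) + \left( \sum_{h=1}^{k} W_h \right) \left( \sum_{h=1}^{k} W_h S_h \right)^2 \right]$, since $\sum_{h=1}^{k} W_h = 1$
$Var(\bar{y}_{prop}) - Var(\bar{y}_{opt}) = \frac{1}{n} \sum_{h=1}^{k} W_h \left( S_h - \sum_{h=1}^{k} W_h S_h \right)^2$
Comparison
$V_{prop} \ge V_{opt}$
$V_{ran} \ge V_{prop}$
$V_{ran} \ge V_{prop} \ge V_{opt}$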
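The ordering can be illustrated numerically. The sketch below uses the stratum weights, variances and means of the 80-farm example in this handout, together with the large-N approximation of S² from the derivation; because it uses unrounded allocations, the figures differ slightly from the worked examples that use integer stratum sample sizes.

```r
# Stratum summaries of the 80-farm population (N = 80, n = 24).
W  <- c(0.25, 0.45, 0.30)        # stratum weights
S2 <- c(169.52, 70.56, 61.45)    # stratum variances
Yb <- c(64.6, 42.19, 28.83)      # stratum means
N <- 80; n <- 24

Ybar <- sum(W * Yb)
# Large-N decomposition: S^2 = within part + between part
S2_total <- sum(W * S2) + sum(W * (Yb - Ybar)^2)

V_ran  <- (N - n) / (N * n) * S2_total
V_prop <- (N - n) / (N * n) * sum(W * S2)
V_opt  <- sum(W * sqrt(S2))^2 / n - sum(W * S2) / N

print(c(V_ran = V_ran, V_prop = V_prop, V_opt = V_opt))
# Gain of optimum over proportional allocation, from the last identity
gain <- sum(W * (sqrt(S2) - sum(W * sqrt(S2)))^2) / n
print(gain)
```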
Example:
All the 80 farms in a population are stratified by farm size. The expenditure on the insecticides
used during the last year by each farmer is presented in table
Large farmers    Medium farmers    Small farmers

75 76            55 40 51 28       35 31 26
65 79            45 38 55 47       28 38 32
86 62            35 33 41 61       36 42 18
57 92            30 43 48 35       40 33 16
45 50            42 53 54 31       25 29
69 48            38 37 36 23       18 25
48 77            40 52 44          28 35
60 60            36 39 47          32 26
55 64            48 46 39          13 30
66 58            46 42 41          19 37
• Select a stratified sample of 24 farmers by using equal allocation
• Compute the variance of sample mean under simple random sampling without
replacement.
• Compute the variance of sample mean under stratified sampling using equal allocation.
• Compare the variances

Ans:
Population Mean
It is given that N = 80, n = 24, N1 = 20, N2 = 36, and N3 = 24, so that
W1 = 0.25, W2 = 0.45, and W3 = 0.30.
$\bar{Y} = \sum_{i=1}^{N} Y_i / N = \frac{(75 + 65 + \dots + 16)}{80} = 43.79$
Variance Under SRSWOR
The overall variance is
$S^2 = \frac{\sum_{i=1}^{N} (Y_i - \bar{Y})^2}{N-1} = 268.68$
$Var(\bar{y}) = \frac{N-n}{Nn} S^2 = \left( \frac{80 - 24}{80 \times 24} \right) 268.68$
$Var(\bar{y}) = 7.84$
Variance under Stratified Sampling
With equal allocation each stratum receives $n_h = 24/3 = 8$ units.

Wh      0.25     0.45     0.30
Nh      20       36       24
Sh2     169.52   70.56    61.45
nh      8        8        8

$Var(\bar{y}_{st}) = \sum_{h=1}^{3} W_h^2 S_h^2 \frac{N_h - n_h}{N_h n_h} = W_1^2 S_1^2 \frac{N_1 - n_1}{N_1 n_1} + W_2^2 S_2^2 \frac{N_2 - n_2}{N_2 n_2} + W_3^2 S_3^2 \frac{N_3 - n_3}{N_3 n_3}$
$Var(\bar{y}_{st}) = 0.7946 + 1.3892 + 0.4609 = 2.64$
Comparison of Variances
Under SRSWOR
$Var(\bar{y}) = 7.84$
Under Stratified Sampling for equal allocation
$Var(\bar{y}_{st}) = 2.64$
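The equal-allocation comparison can be reproduced from the summary figures. This is a minimal sketch using the stratum sizes and variances quoted in the example; the relative-efficiency line is an added illustration, not a figure from the example.

```r
# Equal allocation versus SRSWOR for the 80-farm population.
N_h  <- c(20, 36, 24)              # stratum sizes
S2_h <- c(169.52, 70.56, 61.45)    # stratum variances
S2   <- 268.68                     # overall population variance
n    <- 24

N   <- sum(N_h)
W_h <- N_h / N
n_h <- rep(n / length(N_h), length(N_h))   # 8 units per stratum

v_srs <- (N - n) / (N * n) * S2
v_eq  <- sum(W_h^2 * S2_h * (N_h - n_h) / (N_h * n_h))
re    <- 100 * v_srs / v_eq        # relative efficiency in percent

print(c(v_srs = v_srs, v_eq = v_eq, RE = re))
```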

Q . Give an example to explain the Comparison between Proportional Allocation and

Simple Random Sampling?
Ans : All the 80 farms in a population are stratified by farm size. The expenditure on the
insecticides used during the last year by each farmer is presented in table below (Source:
Elements of Survey Sampling by Singh and Mangat).

75 76 55 40 51 28 35 31 26
65 79 45 38 55 47 28 38 32
86 62 35 33 41 61 36 42 18
57 92 30 43 48 35 40 33 16
45 50 42 53 54 31 25 29
69 48 38 37 36 23 18 25
48 77 40 52 44 28 35
60 60 36 39 47 32 26
55 64 48 46 39 13 30
66 58 46 42 41 19 37
Population Mean
N = 80, n = 24, N1 = 20, N2 = 36, and N3 = 24.
W1 = 0.25, W2 = 0.45, and W3 = 0.30.
$\bar{Y} = \sum_{i=1}^{N} Y_i / N = \frac{(75 + 65 + \dots + 16)}{80} = 43.79$
Variance under SRSWOR
$S^2 = \frac{\sum_{i=1}^{N} (Y_i - \bar{Y})^2}{N-1} = 268.68$
$Var(\bar{y}) = \frac{N-n}{Nn} S^2 = \left( \frac{80 - 24}{80 \times 24} \right) 268.68$
$Var(\bar{y}) = 7.84$
Proportional Allocation
N1 = 20, N2 = 36, N3 = 24, N = 80, n = 24
$n_h = \frac{n}{N} N_h$
$n_1 = \frac{n}{N} N_1 = \frac{24}{80} \times 20 = 6$
$n_2 = \frac{n}{N} N_2 = \frac{24}{80} \times 36 = 10.8 \approx 11$
$n_3 = \frac{n}{N} N_3 = \frac{24}{80} \times 24 = 7.2 \approx 7$
Variance under Proportional Allocation

Wh      0.25     0.45     0.30
Nh      20       36       24
Sh2     169.52   70.56    61.45
nh      6        11       7

$Var(\bar{y}_{st}) = \sum_{h=1}^{3} W_h^2 S_h^2 \frac{N_h - n_h}{N_h n_h} = W_1^2 S_1^2 \frac{N_1 - n_1}{N_1 n_1} + W_2^2 S_2^2 \frac{N_2 - n_2}{N_2 n_2} + W_3^2 S_3^2 \frac{N_3 - n_3}{N_3 n_3}$
$Var(\bar{y}_{st}) = 1.2361 + 0.9021 + 0.5596 = 2.698$
Comparison of Variances
Under SRSWOR
$Var(\bar{y}) = 7.84$
Under Stratified Sampling for proportional allocation
$Var(\bar{y}_{st}) = 2.698$
$RE = \frac{Var(\bar{y})}{Var(\bar{y}_{st})} \times 100 \approx 290.5$
Example:
Comparison between Neyman Allocation and Simple Random Sampling?
Ans: Population Mean
N = 80, n = 24, N1 = 20, N2 = 36, and N3 = 24.
W1 = 0.25, W2 = 0.45, and W3 = 0.30.
$\bar{Y} = \sum_{i=1}^{N} Y_i / N = \frac{(75 + 65 + \dots + 16)}{80} = 43.79$
Variance under SRSWOR
$S^2 = \frac{\sum_{i=1}^{N} (Y_i - \bar{Y})^2}{N-1} = 268.68$
$Var(\bar{y}) = \frac{N-n}{Nn} S^2 = \left( \frac{80 - 24}{80 \times 24} \right) 268.68$
$Var(\bar{y}) = 7.84$
Variance under Neyman Allocation

Wh      0.25     0.45     0.30
Nh      20       36       24
Sh2     169.52   70.56    61.45

$n_1 = n \left( \frac{N_1 S_1}{\sum_{h=1}^{3} N_h S_h} \right) = 8.3, \quad n_2 = n \left( \frac{N_2 S_2}{\sum_{h=1}^{3} N_h S_h} \right) = 9.7, \quad n_3 = n \left( \frac{N_3 S_3}{\sum_{h=1}^{3} N_h S_h} \right) = 6$

Wh      0.25     0.45     0.30
Nh      20       36       24
Sh2     169.52   70.56    61.45
nh      8        10       6

$Var(\bar{y}_{st}) = \sum_{h=1}^{3} W_h^2 S_h^2 \frac{N_h - n_h}{N_h n_h} = W_1^2 S_1^2 \frac{N_1 - n_1}{N_1 n_1} + W_2^2 S_2^2 \frac{N_2 - n_2}{N_2 n_2} + W_3^2 S_3^2 \frac{N_3 - n_3}{N_3 n_3}$
$Var(\bar{y}_{st}) = 0.7946 + 1.0320 + 0.6913 = 2.5179$
Comparison of Variances
Under SRSWOR
$Var(\bar{y}) = 7.84$
Under Stratified Sampling for Neyman allocation
$Var(\bar{y}_{st}) = 2.5179$
$RE = \frac{Var(\bar{y})}{Var(\bar{y}_{st})} \times 100 \approx 311.2$
Stratified Sampling with R

Q. Define the following data in R and perform stratified sampling. Calculate the
parameters for each stratum. Also find the mean of population.
75 76 55 40 51 28 35 31 26
65 79 45 38 55 47 28 38 32
86 62 35 33 41 61 36 42 18
57 92 30 43 48 35 40 33 16
45 50 42 53 54 31 25 29
69 48 38 37 36 23 18 25
48 77 40 52 44 28 35
60 60 36 39 47 32 26
55 64 48 46 39 13 30
66 58 46 42 41 19 37
Ans:
Defining Strata in R
str1<-c(75,76,65,79,86,62,57,92,45,50,69,48,48,77,60,60,55,64,66,58)
str2<-c(55,40,51,45,38,55,35,33,41,30,43,48,42,53,54,38,37,36,40,52,44,36,39,47,48,46,39,46,42,41,28,47,61,35,31,23)
str3<-c(35,31,26,28,38,32,36,42,18,40,33,16,25,29,18,25,28,35,32,26,13,30,19,37)
Parameters
Mean and Standard Deviation
y<-c(str1,str2,str3)
m_p=mean(y)
sd_p=sd(y)
Output
m_p = 43.7875
sd_p = 16.39
Defining Terms
Defining Stratum Size
N1=length(str1)
N2=length(str2)
N3=length(str3)
N=N1+N2+N3
Defining Weights
W1=N1/N;W2=N2/N;W3=N3/N
n=24
Mean & Standard Deviation
Mean of Strata
m_st1=mean(str1)
m_st2=mean(str2)
m_st3=mean(str3)
Standard Deviation of Strata
sd_st1=sd(str1)
sd_st2=sd(str2)
sd_st3=sd(str3)
Output
64.6, 42.19, 28.83
13.01, 8.4, 7.84
Variance & Stratified Mean
Variances of Strata
var_st1=var(str1)
var_st2=var(str2)
var_st3=var(str3)
Mean of population using stratified population
m_yst=(1/N)*(N1*m_st1 + N2*m_st2 + N3*m_st3)
Q . Define the following data in R and perform stratified sampling using the proportional
allocation using the sample size 24. Also find the mean and variance.

75 76 55 40 51 28 35 31 26
65 79 45 38 55 47 28 38 32
86 62 35 33 41 61 36 42 18
57 92 30 43 48 35 40 33 16
45 50 42 53 54 31 25 29
69 48 38 37 36 23 18 25
48 77 40 52 44 28 35
60 60 36 39 47 32 26
55 64 48 46 39 13 30
66 58 46 42 41 19 37
Ans: The strata, stratum sizes, weights, stratum variances and n = 24 are defined as in the previous question.
Sample Size under Proportional Allocation
n1=round(n*(N1/N))
n2=round(n*(N2/N))
n3=round(n*(N3/N))
Output
24=6+11+7
0.25,0.45,0.3
Variance under Proportional Allocation
Term1=(W1^2)*var_st1*(N1-n1)/(N1*n1)
Term2=(W2^2)*var_st2*(N2-n2)/(N2*n2)
Term3=(W3^2)*var_st3*(N3-n3)/(N3*n3)
vp_prop=Term1+Term2+Term3
Output
2.698
Sampling from Stratum
s1=sample(str1,n1)
s2=sample(str2,n2)
s3=sample(str3,n3)
ms_st1=mean(s1)
ms_st2=mean(s2)
ms_st3=mean(s3)
m_prop=W1*ms_st1+W2*ms_st2+W3*ms_st3
Output (the values change from run to run because the samples are random)
73.17,42.55,27.14
45.58
Estimated Variance
vars_st1=var(s1);vars_st2=var(s2);vars_st3=var(s3)
Term1=(W1^2)*vars_st1*(N1-n1)/(N1*n1)
Term2=(W2^2)*vars_st2*(N2-n2)/(N2*n2)
Term3=(W3^2)*vars_st3*(N3-n3)/(N3*n3)
vs_prop=Term1+Term2+Term3
Output
> vs_prop
[1] 2.57464
Q . Define the following data in R and perform stratified sampling using the Neyman
allocation using the sample size 24. Also find the mean and variance.

75 76 55 40 51 28 35 31 26
65 79 45 38 55 47 28 38 32
86 62 35 33 41 61 36 42 18
57 92 30 43 48 35 40 33 16
45 50 42 53 54 31 25 29
69 48 38 37 36 23 18 25
48 77 40 52 44 28 35

60 60 36 39 47 32 26
55 64 48 46 39 13 30
66 58 46 42 41 19 37
Ans:
Defining Strata in R
str1<-c(75,76,65,79,86,62,57,92,45,50,69,48,48,77,60,60,55,64,66,58)
str2<-c(55,40,51,45,38,55,35,33,41,30,43,48,42,53,54,38,37,36,40,52,44,36,39,47,48,46,39,46,42,41,28,47,61,35,31,23)
str3<-c(35,31,26,28,38,32,36,42,18,40,33,16,25,29,18,25,28,35,32,26,13,30,19,37)
Parameters
Mean and Standard Deviation
y<-c(str1,str2,str3)
m_p=mean(y)
sd_p=sd(y)
Output
m_p = 43.7875
sd_p = 16.39
Defining Terms
Defining Stratum Size
N1=length(str1)
N2=length(str2)
N3=length(str3)
N=N1+N2+N3
Defining Weights
W1=N1/N; W2=N2/N; W3=N3/N
n=24
Output
80=20+36+24
0.25,0.45,0.3
Sample Size Under Neyman Allocation
sd_st1=sd(str1);sd_st2=sd(str2);sd_st3=sd(str3)
var_st1=var(str1);var_st2=var(str2);var_st3=var(str3)
sum=N1*sd_st1+N2*sd_st2+N3*sd_st3
nn1=round(n*(N1*sd_st1/sum))
nn2=round(n*(N2*sd_st2/sum))
nn3=round(n*(N3*sd_st3/sum))
Output
24=8+10+6
Variance Under Neyman Allocation
Term1=(W1^2)*var_st1*(N1-nn1)/(N1*nn1)
Term2=(W2^2)*var_st2*(N2-nn2)/(N2*nn2)
Term3=(W3^2)*var_st3*(N3-nn3)/(N3*nn3)
Output
0.79,1.03,0.69
vp_Ney=Term1+Term2+Term3
Output
2.52
Sampling From Strata
s1=sample(str1,nn1)
s2=sample(str2,nn2)
s3=sample(str3,nn3)
ms_st1=mean(s1)
ms_st2=mean(s2)
ms_st3=mean(s3)
m_Ney=W1*ms_st1+W2*ms_st2+W3*ms_st3
Output (the values change from run to run because the samples are random)
68.13, 40.2, 26.83
43.17
Estimated Variance
vars_st1=var(s1);vars_st2=var(s2);vars_st3=var(s3)
Term1=(W1^2)*vars_st1*(N1-nn1)/(N1*nn1)
Term2=(W2^2)*vars_st2*(N2-nn2)/(N2*nn2)
Term3=(W3^2)*vars_st3*(N3-nn3)/(N3*nn3)
vs_Ney=Term1+Term2+Term3
Output
> vs_Ney
Q. Define the following data in R and perform stratified sampling. Calculate the
parameters for each stratum. Also find the mean of population.
Stratum
1 12, 14, 19, 22
2 362, 441, 456, 482, 444, 472,
3 124, 189, 142, 165, 135, 140
Y<-c(12,14,19,22,362,441,456,482,444,472,124,189,142,165,135,140)
Y1<-c(12,14,19,22)
Y2<-c(362,441,456,482,444,472)
Y3<-c(124,189,142,165,135,140)
N=16;N1=4;N2=6;N3=6; n1=2;n2=3;n3=3;n=8;

w1=N1/N; w2=N2/N;w3=N3/N;
y1=c();y2=c();y3=c();yst=c(); for(i in 1:10000){
sa1=sample(Y1,n1)
sa2=sample(Y2,n2)
sa3=sample(Y3,n3)
y1[i]=mean(sa1);
y2[i]=mean(sa2);
y3[i]=mean(sa3);
yst[i]=w1*y1[i]+w2*y2[i]+w3*y3[i];
}
mean(yst); var(yst)
Output
226.138
56.25539
mean(Y)
vp=(w1^2)*var(Y1)*(N1-n1)/(N1*n1) + (w2^2)*var(Y2)*(N2-n2)/(N2*n2) +
(w3^2)*var(Y3)*(N3-n3)/(N3*n3)
vp
Output
mean(Y)=226.1875
vp=56.12526
Q . Generate the stratified population consisting on three strata such that stratum-1 is
normally distributed with mean 10 and standard deviation 2 with 100 values, stratum-2 is
normally distributed with mean 100 and standard deviation 2 with 500 values, stratum-3 is
normally distributed with mean 500 and standard deviation 2 with 1000 values. Find the
mean and variance for this population using the method of stratified sampling.
Ans: Simulation Study
N1=100;N2=500;N3=1000;n=50
Y1<-rnorm(N1,mean=10,sd=2)
Y2<-rnorm(N2,mean=100,sd=2)
Y3<-rnorm(N3,mean=500,sd=2)
Y<-c(Y1,Y2,Y3)

y1=c();y2=c();y3=c();yst=c();
N=N1+N2+N3;
w1=N1/N;w2=N2/N;w3=N3/N;
n1=round(n*w1)
n2=round(n*w2)
n3=round(n*w3)
Looping
for(i in 1:10000){
sa1=sample(Y1,n1)
sa2=sample(Y2,n2)
sa3=sample(Y3,n3)
y1[i]=mean(sa1);
y2[i]=mean(sa2);
y3[i]=mean(sa3);
yst[i]=w1*y1[i]+w2*y2[i]+w3*y3[i];
}
mean(yst); var(yst)
Output
Mean(yst)=344.36
Var(yst) = 0.08
Variance of Mean Using R
mean(Y)
vp=(w1^2)*var(Y1)*(N1-n1)/(N1*n1) + (w2^2)*var(Y2)*(N2-n2)/(N2*n2) +
(w3^2)*var(Y3)*(N3-n3)/(N3*n3)
vp
Q Generate a population of size 1000 consisting on three strata such that
 200 values for stratum-1 from normal distribution with mean=2 and standard
deviation=3.
deviation=9.

deviation=5.
 Allocate the sample size to each stratum by Neyman Allocation where n=50.
 Select the sample from each stratum and estimate the mean of population.
Ans:
Defining Population
N1=200;N2=300;N3=500 ; n=50
Y<-c(Y1,Y2,Y3)
Sample Size Under Neyman Allocation
N=N1+N2+N3;
w1=N1/N;w2=N2/N;w3=N3/N
sum=w1*3+w2*9+w3*5
n1=round(n*w1*3/sum)
n2=round(n*w2*9/sum)
n3=round(n*w3*5/sum)
Looping
y1=c();y2=c();y3=c();yst=c();
for(i in 1:10000){
sa1=sample(Y1,n1)
sa2=sample(Y2,n2)
sa3=sample(Y3,n3)
y1[i]=mean(sa1);
y2[i]=mean(sa2);
y3[i]=mean(sa3);
yst[i]=w1*y1[i]+w2*y2[i]+w3*y3[i];
}
mean(yst); var(yst)
Output
Mean(yst)=344.36

Var(yst) = 0.08
Q. Find the mean and variance of Proportion in Stratified Sampling
Ans: Suppose we have N population units, i.e. $Y_1, Y_2, \dots, Y_i, \dots, Y_N$, where
$Y_i = 1$ if the ith unit possesses a certain attribute and 0 otherwise.
The population proportion is defined as
$\bar{Y} = \sum_{i=1}^{N} Y_i / N = A / N = P$
The sample proportion is
$\bar{y} = \sum_{i=1}^{n} y_i / n = a / n = p$
Since $Y_i$ takes only the values 1 and 0,
$\sum_{i=1}^{N} Y_i^2 = A = NP$
The same is the case for the sample:
$\sum_{i=1}^{n} y_i^2 = a = np$
$\sum_{i=1}^{N} (Y_i - \bar{Y})^2 = \sum_{i=1}^{N} Y_i^2 - N \bar{Y}^2 = NP - NP^2 = NP(1 - P)$
$S^2 = \frac{\sum_{i=1}^{N} (Y_i - \bar{Y})^2}{N - 1} = \frac{N}{N-1} P(1 - P) = \frac{NPQ}{N-1}$, where $Q = 1 - P$
Similarly $s^2 = npq / (n - 1)$
Unbiased Variance Estimator
For SRSWOR
$VAR(\bar{y}_{wor}) = \frac{N-n}{Nn} S^2$
$VAR(p_{wor}) = \frac{N-n}{Nn} \cdot \frac{NPQ}{N-1}$
$VAR(p_{wor}) = \frac{N-n}{n} \cdot \frac{PQ}{N-1}$
Proportion Estimation
$p_{st} = \frac{1}{N} \sum_{h=1}^{k} p_h N_h$
For single stratum
$A_h = N_h P_h$
$S_h^2 = N_h P_h Q_h / (N_h - 1)$
$Var(p_h) = \frac{N_h - n_h}{(N_h - 1)\, n_h} P_h Q_h$
For the stratified estimator
$Var(p_{st}) = \frac{1}{N^2} \sum_{h=1}^{k} N_h^2\, Var(p_h)$
$Var(p_{st}) = \frac{1}{N^2} \sum_{h=1}^{k} \frac{N_h^2 (N_h - n_h)}{N_h - 1} \cdot \frac{P_h Q_h}{n_h}$
$Var(p_{st}) = \sum_{h=1}^{k} W_h^2 \frac{(N_h - n_h)}{N_h - 1} \cdot \frac{P_h Q_h}{n_h}$
If $N_h - 1 \approx N_h$ and $n_h / N_h$ is ignored,
$Var(p_{st}) = \sum_{h=1}^{k} W_h^2 P_h Q_h / n_h$
Estimator of Variance
$var(p_{st}) = \sum_{h=1}^{k} W_h^2 \frac{(N_h - n_h)}{N_h} \cdot \frac{p_h q_h}{n_h - 1}$
Variance in Case Of Proportional Allocation
$Var_{prop}(\bar{y}_{st}) = \frac{N-n}{Nn} \sum_{h=1}^{k} W_h S_h^2$
$S_h^2 = N_h P_h Q_h / (N_h - 1)$
$Var_{prop}(p_{st}) = \frac{N-n}{Nn} \sum_{h=1}^{k} \frac{W_h N_h P_h Q_h}{(N_h - 1)}$
Variance in Case of Neyman Allocation
$Var_{opt}(\bar{y}_{st}) = \frac{1}{n} \left( \sum_{h=1}^{k} W_h S_h \right)^2 - \frac{1}{N} \sum_{h=1}^{k} W_h S_h^2$
$S_h^2 = N_h P_h Q_h / (N_h - 1)$
$Var_{opt}(p_{st}) = \frac{1}{n} \left( \sum_{h=1}^{k} W_h \sqrt{N_h P_h Q_h / (N_h - 1)} \right)^2 - \frac{1}{N} \sum_{h=1}^{k} W_h S_h^2$
For large stratum size, i.e. $N_h - 1 \approx N_h$, we have $S_h^2 \approx P_h Q_h$ and
$Var_{opt}(p_{st}) = \frac{1}{n} \left( \sum_{h=1}^{k} W_h \sqrt{P_h Q_h} \right)^2 - \frac{1}{N} \sum_{h=1}^{k} W_h P_h Q_h$
Example:
The management of a local newspaper is to decide whether it should continue with the
publication of 'Children Column', which had been introduced on experimental basis. For this
purpose, it is imperative to estimate the proportion of readers who would favor its continuance.
The frame consists of readers who had stayed with the paper for the last six months. Since
different attitudes are expected from the urban and rural readers, the population is stratified into
urban readers and rural readers. In the population, there are 73000 urban readers and 30280 rural
readers.

N1 = 73000, N2 = 30280, N = 103280, n = 1016
Under proportional allocation,
$n_1 = \frac{1016}{103280} \times 73000 = 718$
$n_2 = \frac{1016}{103280} \times 30280 = 298$

The investigator selected WOR simple random samples of 718 respondents from stratum I
(urban readers) and 298 readers from stratum II (rural readers). The number of individuals who
favor continuation of the column was 570 from stratum I and 143 from stratum II.
x1 = 570, x2 = 143, n1 = 718, n2 = 298

$p_1 = \frac{x_1}{n_1}, \quad p_2 = \frac{x_2}{n_2}$
Output
p1 = 0.7939
p2 = 0.4799
$p_{st} = \frac{1}{N} \sum_{h=1}^{k} p_h N_h = \frac{1}{N} \left( p_1 N_1 + p_2 N_2 \right)$
Output
pst = 0.7018
Example:
The management of a local newspaper is to decide whether it should continue with the
publication of 'Children Column', which had been introduced on experimental basis. For this
purpose, it is imperative to estimate the proportion of readers who would favor its continuance.
The frame consists of readers who had stayed with the paper for the last six months. Since
different attitudes are expected from the urban and rural readers, the population is stratified into
urban readers and rural readers. In the population, there are 73000 urban readers and 30280 rural
readers.
The investigator selected WOR simple random samples of 718 respondents from stratum I
(urban readers) and 298 readers from stratum II (rural readers). The number of individuals who
favor continuation of the column was 570 from stratum I and 143 from stratum II.

x1 = 570, x2 = 143, n1 = 718, n2 = 298

$p_1 = \frac{x_1}{n_1} = \frac{570}{718} = 0.7939$
$p_2 = \frac{x_2}{n_2} = \frac{143}{298} = 0.4799$
N1 = 73000, N2 = 30280, N = 103280, n = 1016
$W_1 = \frac{N_1}{N} = \frac{73000}{103280} = 0.7068, \quad W_2 = \frac{N_2}{N} = \frac{30280}{103280} = 0.2932$
$p_{st} = \frac{1}{N} \sum_{h=1}^{k} p_h N_h = \frac{1}{N} \left( p_1 N_1 + p_2 N_2 \right)$
$p_{st} = 0.7018$
Estimated Variance

        Stra-1    Stra-2
Nh      73000     30280
nh      718       298
ph      0.7939    0.4799
Wh      0.7068    0.2932

$var(p_{st}) = \sum_{h=1}^{k} W_h^2 \frac{(N_h - n_h)}{N_h} \cdot \frac{p_h q_h}{n_h - 1}$
$var(p_{st}) = W_1^2 \frac{(N_1 - n_1)}{N_1} \cdot \frac{p_1 q_1}{n_1 - 1} + W_2^2 \frac{(N_2 - n_2)}{N_2} \cdot \frac{p_2 q_2}{n_2 - 1}$
Output
= 0.0001129 + 0.0000715 = 0.0001844
$S.E(p_{st}) = \sqrt{0.0001844} = 0.0136$
Confidence Interval for Proportion

$p_{st} \pm 1.96\, S.E(p_{st})$

The Lower Limit
$0.7018 - 1.96(0.0136) = 0.6751$
The Upper Limit
$0.7018 + 1.96(0.0136) = 0.7285$
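The newspaper example can be verified in R. This is a minimal sketch using the counts given in the example; the variable names are illustrative.

```r
# Stratified estimate of a proportion: urban and rural readers.
N_h <- c(urban = 73000, rural = 30280)   # stratum sizes
n_h <- c(urban = 718,   rural = 298)     # stratum sample sizes
x_h <- c(urban = 570,   rural = 143)     # readers in favour

W_h <- N_h / sum(N_h)
p_h <- x_h / n_h
q_h <- 1 - p_h

p_st  <- sum(W_h * p_h)                                          # stratified proportion
v_st  <- sum(W_h^2 * (N_h - n_h) / N_h * p_h * q_h / (n_h - 1))  # estimated variance
se_st <- sqrt(v_st)
ci    <- p_st + c(-1, 1) * 1.96 * se_st                          # 95% interval

print(round(c(p_st = p_st, var = v_st, se = se_st), 6))
print(round(ci, 4))
```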
Systematic Sampling

Introduction:
In systematic sampling the first unit is selected randomly from the first 'k' units and the rest of the units are selected automatically at a fixed interval.
Systematic sampling has many types, but we discuss the commonly used methods, i.e. linear and circular systematic sampling.
Linear Systematic Sampling

Group   Sample Composition
1       1, k+1, 2k+1, …, (i-1)k+1, …, (n-1)k+1
2       2, k+2, 2k+2, …, (i-1)k+2, …, (n-1)k+2
⋮
r       r, k+r, 2k+r, …, (i-1)k+r, …, (n-1)k+r
⋮
k       k, 2k, 3k, …, ik, …, nk

Every kth unit is included in the sample.

The interval 'k' is calculated by dividing the population size by the sample size.
The sampling interval will be
k = N/n
Operational convenience is a prominent feature of the method.
Because the sample is spread over the whole frame, there is no risk of missing a large part of the population.
Example
A certain company claims about their daily production in numbers as
125, 135, 157,192,151,175,164,169,147,150,138,167,155,159,139,147,149,158.
We are interested to select the systematic sample of size 3
Here we have N=18 with n=3, so k=6
125, 135, 157,192,151,175,164,169,147,150,138,167,155,159,139,147,149,158.
Random Start Serial Number Sampled values
1 1,7,13 125,164,155
2 2,8,14 135,169,159
3 3,9,15 157,147,139
4 4,10,16 192,150,147
5 5,11,17 151,138,149
6 6,12,18 175,167,158
N=15, n=3
125, 135,
157, 192, 151,
175,164,169, 147,
150,138, 167, 155, 159,139
Q . Prove that sample mean is unbiased estimator of population mean
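Sample   Units in the sample                         Sample Mean
1        1, k+1, 2k+1, …, (i-1)k+1, …, (n-1)k+1      $\bar{y}_1$
2        2, k+2, 2k+2, …, (i-1)k+2, …, (n-1)k+2      $\bar{y}_2$
⋮
r        r, k+r, 2k+r, …, (i-1)k+r, …, (n-1)k+r      $\bar{y}_r$
⋮
k        k, 2k, 3k, …, ik, …, nk                     $\bar{y}_k$

Population Mean
The population mean is given by
$\bar{Y} = \frac{1}{nk} \sum_{r=1}^{k} \sum_{i=1}^{n} y_{ri}$
Expectation of Sample Mean
From the table, we can see that the mean of the rth systematic sample is
$\bar{y}_r = \frac{1}{n} \sum_{i=1}^{n} y_{ri}$
Each of the k possible samples is selected with probability 1/k, so
$E(\bar{y}_{sy}) = \frac{1}{k} \sum_{r=1}^{k} \bar{y}_r$
$E(\bar{y}_{sy}) = \frac{1}{k} \sum_{r=1}^{k} \frac{1}{n} \sum_{i=1}^{n} y_{ri}$
$E(\bar{y}_{sy}) = \frac{1}{nk} \sum_{r=1}^{k} \sum_{i=1}^{n} y_{ri}$
Since $\bar{Y} = \frac{1}{nk} \sum_{r=1}^{k} \sum_{i=1}^{n} y_{ri}$,
$E(\bar{y}_{sy}) = \bar{Y}$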
Example
A certain company claims about their daily production in numbers as
125, 135, 157,192,151, 175,164,169,147,150,138,167,155,159,139,147,149,158.
We are interested to select the systematic sample of size 3.
125, 135, 157,192,151, 175,164,169,147,150,138,167,155,159,139,147,149,158.

Random Start   Serial Number   Sampled values   Sample Mean
1              1,7,13          125,164,155      148
2              2,8,14          135,169,159      154.333
3              3,9,15          157,147,139      147.667
4              4,10,16         192,150,147      163
5              5,11,17         151,138,149      146
6              6,12,18         175,167,158      166.667
Mean of the sample means = 154.28, which equals the population mean.
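All k possible systematic samples can be listed at once in R, which makes the unbiasedness result easy to check. This sketch relies on `matrix()` filling column by column, so that row r of the matrix holds units r, r+k, r+2k, ….

```r
# Daily production figures from the example (N = 18, n = 3, k = 6).
y <- c(125, 135, 157, 192, 151, 175, 164, 169, 147,
       150, 138, 167, 155, 159, 139, 147, 149, 158)
n <- 3
N <- length(y)
k <- N / n

# Column-major fill: row r contains units r, r + k, r + 2k
samples <- matrix(y, nrow = k)
sample_means <- rowMeans(samples)

print(samples)
print(round(sample_means, 3))
# The average of the k sample means equals the population mean
print(c(mean_of_means = mean(sample_means), pop_mean = mean(y)))
```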
Q. Derive the Variance of Sample Mean Under Systematic Sampling?
Ans: Linear Systematic Sampling

Sample   Units in the sample                         Sample Mean
1        1, k+1, 2k+1, …, (i-1)k+1, …, (n-1)k+1      $\bar{y}_1$
2        2, k+2, 2k+2, …, (i-1)k+2, …, (n-1)k+2      $\bar{y}_2$
⋮
r        r, k+r, 2k+r, …, (i-1)k+r, …, (n-1)k+r      $\bar{y}_r$
⋮
k        k, 2k, 3k, …, ik, …, nk                     $\bar{y}_k$

Variance of Sample Mean
Expressions of Variance
$V(\bar{y}_{sy}) = \frac{1}{k} \sum_{r=1}^{k} (\bar{y}_r - \bar{Y})^2$
$Var(\bar{y}_{sy}) = \frac{N-1}{N} S^2 - \frac{k(n-1)}{N} S_w^2$
$S_w^2 = \frac{1}{k(n-1)} \sum_{r=1}^{k} \sum_{i=1}^{n} (y_{ri} - \bar{y}_r)^2$
The variance of sample mean is given by
$V(\bar{y}_{sy}) = \frac{1}{k} \sum_{r=1}^{k} (\bar{y}_r - \bar{Y})^2$
The population variance is
$S^2 = \frac{\sum_{r=1}^{k} \sum_{i=1}^{n} (y_{ri} - \bar{Y})^2}{nk - 1}$
$(nk - 1) S^2 = \sum_{r=1}^{k} \sum_{i=1}^{n} (y_{ri} - \bar{Y})^2$
$= \sum_{r=1}^{k} \sum_{i=1}^{n} (y_{ri} - \bar{y}_r + \bar{y}_r - \bar{Y})^2$
$= \sum_{r=1}^{k} \sum_{i=1}^{n} (y_{ri} - \bar{y}_r)^2 + \sum_{r=1}^{k} \sum_{i=1}^{n} (\bar{y}_r - \bar{Y})^2$
The cross-product term vanishes because $\sum_{i=1}^{n} (y_{ri} - \bar{y}_r) = 0$ for every r. Hence
$(nk - 1) S^2 = k(n-1) S_w^2 + n \sum_{r=1}^{k} (\bar{y}_r - \bar{Y})^2$
$(nk - 1) S^2 = k(n-1) S_w^2 + nk\, V(\bar{y}_{sy})$
With N = nk,
$V(\bar{y}_{sy}) = \frac{N-1}{N} S^2 - \frac{k(n-1)}{N} S_w^2$
Example:
The heights of the 30 trees from a certain area of a forest are given by
40, 38,36,35,32,35,32,30,31,29,37,41,30,28,24,25,26,27,29,32,34,36,21,19,17,22,35,
28,29,31
Select a systematic random sample of size 5.
Estimate the mean of the population
Estimate the variance.
We have
N=30, n=5;k=6

Random Start   Serial Number      Sampled Values    Sample Mean
1              1,7,13,19,25       40,32,30,29,17    29.6
2              2,8,14,20,26       38,30,28,32,22    30
3              3,9,15,21,27       36,31,24,34,35    32
4              4,10,16,22,28      35,29,25,36,28    30.6
5              5,11,17,23,29      32,37,26,21,29    29
6              6,12,18,24,30      35,41,27,19,31    30.6
Mean and Variance
$E(\bar{y}_{sy}) = \frac{1}{k} \sum_{r=1}^{k} \bar{y}_r = \frac{181.8}{6} = 30.3$
$V(\bar{y}_{sy}) = \frac{1}{k} \sum_{r=1}^{k} (\bar{y}_r - \bar{Y})^2 = \frac{5.34}{6} = 0.89$
Sum=181.8
Mean=30.3
Sumdev=5.34
var=0.89
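As a check on the tree example, the variance of the systematic sample mean can be computed both directly and from the expression in terms of S² and S_w². This is a minimal sketch; row r of `samples` holds the rth systematic sample.

```r
# Heights of the 30 trees (N = 30, n = 5, k = 6).
pop <- c(40, 38, 36, 35, 32, 35, 32, 30, 31, 29, 37, 41, 30, 28, 24,
         25, 26, 27, 29, 32, 34, 36, 21, 19, 17, 22, 35, 28, 29, 31)
n <- 5
N <- length(pop)
k <- N / n

samples <- matrix(pop, nrow = k)   # row r = units r, r + k, ...
ybar_r  <- rowMeans(samples)       # the k possible sample means

# Direct definition: average squared deviation of the k sample means
v_direct <- mean((ybar_r - mean(pop))^2)

# Formula: (N-1)/N * S^2 - k(n-1)/N * Sw^2
S2  <- var(pop)
Sw2 <- sum((samples - ybar_r)^2) / (k * (n - 1))  # ybar_r recycles down each column
v_formula <- (N - 1) / N * S2 - k * (n - 1) / N * Sw2

print(round(ybar_r, 2))
print(c(direct = v_direct, formula = v_formula))
```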
Q . Derive the following expression of variance of sample mean under Systematic Sampling.
$V(\bar{y}_{sy}) = \frac{(N-1) S^2}{Nn} \left[ 1 + \rho_w (n-1) \right]$
Proof:
$V(\bar{y}_{sy}) = \frac{1}{k} \sum_{r=1}^{k} (\bar{y}_r - \bar{Y})^2$
$V(\bar{y}_{sy}) = \frac{1}{k} \sum_{r=1}^{k} \left( \frac{\sum_{i=1}^{n} y_{ri}}{n} - \bar{Y} \right)^2$
$V(\bar{y}_{sy}) = \frac{1}{n^2 k} \sum_{r=1}^{k} \left[ \sum_{i=1}^{n} (y_{ri} - \bar{Y}) \right]^2$
$V(\bar{y}_{sy}) = \frac{1}{n^2 k} \left[ \sum_{r=1}^{k} \sum_{i=1}^{n} (y_{ri} - \bar{Y})^2 + \sum_{r=1}^{k} \sum_{i \ne u}^{n} (y_{ri} - \bar{Y})(y_{ru} - \bar{Y}) \right]$
$V(\bar{y}_{sy}) = \frac{1}{n^2 k} \left[ (nk - 1) S^2 + \sum_{r=1}^{k} \sum_{i \ne u}^{n} (y_{ri} - \bar{Y})(y_{ru} - \bar{Y}) \right]$
The intra class correlation between the pairs of units that are in the same systematic sample is
$\rho_w = \frac{E(y_{ri} - \bar{Y})(y_{ru} - \bar{Y})}{E(y_{ri} - \bar{Y})^2} = \frac{\frac{1}{nk(n-1)} \sum_{r=1}^{k} \sum_{i \ne u}^{n} (y_{ri} - \bar{Y})(y_{ru} - \bar{Y})}{\left( \frac{N-1}{N} \right) S^2}$
so that, with N = nk,
$\rho_w (n-1)(nk-1) S^2 = \sum_{r=1}^{k} \sum_{i \ne u}^{n} (y_{ri} - \bar{Y})(y_{ru} - \bar{Y})$
Substituting,
$V(\bar{y}_{sy}) = \frac{1}{n^2 k} \left[ (nk-1) S^2 + \rho_w (n-1)(nk-1) S^2 \right]$
$V(\bar{y}_{sy}) = \frac{(nk-1) S^2}{n^2 k} \left[ 1 + \rho_w (n-1) \right]$
OR
$V(\bar{y}_{sy}) = \frac{(nk-1) S^2}{n \cdot nk} \left[ 1 + \rho_w (n-1) \right]$
OR
$V(\bar{y}_{sy}) = \frac{(N-1) S^2}{Nn} \left[ 1 + \rho_w (n-1) \right]$
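The intraclass-correlation form can be checked numerically. The sketch below reuses the heights of the 30 trees given in this handout (n = 5, k = 6) and computes ρ_w from its definition; the cross-product sum over i ≠ u is obtained as (row sum)² minus the sum of squares.

```r
# Heights of the 30 trees (N = 30, n = 5, k = 6).
pop <- c(40, 38, 36, 35, 32, 35, 32, 30, 31, 29, 37, 41, 30, 28, 24,
         25, 26, 27, 29, 32, 34, 36, 21, 19, 17, 22, 35, 28, 29, 31)
n <- 5
N <- length(pop)
k <- N / n

samples <- matrix(pop, nrow = k)   # row r = r-th systematic sample
d <- samples - mean(pop)           # deviations from the population mean

# Sum over samples of products of distinct pairs (i != u) within a sample
cross <- sum(rowSums(d)^2 - rowSums(d^2))

S2    <- var(pop)
rho_w <- cross / ((n - 1) * (N - 1) * S2)   # intraclass correlation

v_formula <- (N - 1) * S2 / (N * n) * (1 + rho_w * (n - 1))
v_direct  <- mean((rowMeans(samples) - mean(pop))^2)

print(c(rho_w = rho_w, formula = v_formula, direct = v_direct))
```

A negative ρ_w, as obtained here, means units within the same systematic sample are dissimilar, which is the situation in which systematic sampling performs well.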
Q. Describe the comparison between SRS and Systematic Sampling on the basis of variance
of sample mean?
Ans:
$Var(\bar{y}_{sy}) = \frac{N-1}{N} S^2 - \frac{k(n-1)}{N} S_w^2$
$Var(\bar{y}_{srs}) = \frac{N-n}{Nn} S^2$
$Var(\bar{y}_{srs}) - Var(\bar{y}_{sy}) = \frac{N-n}{Nn} S^2 - \frac{N-1}{N} S^2 + \frac{k(n-1)}{N} S_w^2$
$= \left( \frac{N-n}{Nn} - \frac{N-1}{N} \right) S^2 + \frac{(n-1)}{n} S_w^2$, using N = nk
$Var(\bar{y}_{srs}) - Var(\bar{y}_{sy}) = \frac{(n-1)}{n} \left( S_w^2 - S^2 \right)$
So systematic sampling is more precise than SRS when $S_w^2 > S^2$.
In terms of the intra class correlation,
$R.E = \frac{Var(\bar{y}_{srs})}{Var(\bar{y}_{sy})}$
$R.E = \frac{\frac{N-n}{Nn} S^2}{\frac{(N-1) S^2}{Nn} \left[ 1 + \rho_w (n-1) \right]}$
$R.E = \frac{1}{\frac{N-1}{N-n} \left[ 1 + \rho_w (n-1) \right]}$
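The comparison can be illustrated with the daily production data used in this handout (N = 18, n = 3, k = 6). This is a minimal sketch; the relative efficiency it prints is specific to this ordering of the data.

```r
# Daily production figures (N = 18, n = 3, k = 6).
y <- c(125, 135, 157, 192, 151, 175, 164, 169, 147,
       150, 138, 167, 155, 159, 139, 147, 149, 158)
n <- 3
N <- length(y)
k <- N / n

samples <- matrix(y, nrow = k)            # row r = r-th systematic sample
ybar_r  <- rowMeans(samples)

v_sy  <- mean((ybar_r - mean(y))^2)       # variance under systematic sampling
S2    <- var(y)
v_srs <- (N - n) / (N * n) * S2           # variance under SRSWOR
Sw2   <- sum((samples - ybar_r)^2) / (k * (n - 1))

re <- v_srs / v_sy                        # relative efficiency of systematic sampling
print(c(v_srs = v_srs, v_sy = v_sy, RE = re))
# Difference of variances equals (n-1)/n * (Sw^2 - S^2)
print(c(diff = v_srs - v_sy, identity = (n - 1) / n * (Sw2 - S2)))
```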

Q. Derive the Variance of Stratified Sampling in Systematic setting?

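Ans:

Sample   Units in the sample                         Sample Mean
1        1, k+1, 2k+1, …, (i-1)k+1, …, (n-1)k+1      $\bar{y}_1$
2        2, k+2, 2k+2, …, (i-1)k+2, …, (n-1)k+2      $\bar{y}_2$
⋮
r        r, k+r, 2k+r, …, (i-1)k+r, …, (n-1)k+r      $\bar{y}_r$
⋮
k        k, 2k, 3k, …, ik, …, nk                     $\bar{y}_k$

Stratified Sampling in Setting of Sys Sampling
The N = nk units are treated as n strata of k consecutive units each (the columns of the table), with one unit selected from each stratum.
$\bar{y}_{str} = \frac{1}{N} \sum_{h=1}^{L} N_h \bar{y}_h$
$\bar{y}_{str} = \frac{1}{nk} \sum_{j=1}^{n} k\, \bar{y}_j$
$\bar{y}_{str} = \frac{1}{n} \sum_{j=1}^{n} \bar{y}_j$
$V(\bar{y}_{str}) = \frac{1}{n^2} \sum_{j=1}^{n} V(\bar{y}_j)$
$V(\bar{y}_{str}) = \frac{1}{n^2} \sum_{j=1}^{n} \frac{k-1}{k} S_j^2$
$V(\bar{y}_{str}) = \frac{k-1}{n^2 k} \sum_{j=1}^{n} S_j^2$
$V(\bar{y}_{str}) = \frac{k-1}{nk} S_{wst}^2$, where $S_{wst}^2 = \frac{1}{n} \sum_{j=1}^{n} S_j^2$
$V(\bar{y}_{str}) = \frac{nk - n}{n \cdot nk} S_{wst}^2$
$V(\bar{y}_{str}) = \frac{N-n}{nN} S_{wst}^2$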
Q . Describe a comparison between Stratified and Systematic Sampling on the basis of

variance of sample mean.
Ans:
$V(\bar{y}_{sy}) = \frac{1}{k} \sum_{r=1}^{k} (\bar{y}_r - \bar{Y})^2$
Let $\bar{y}_i$ denote the mean of the ith stratum, so that $\bar{Y} = \sum_{i=1}^{n} \bar{y}_i / n$. Then
$V(\bar{y}_{sy}) = \frac{1}{k} \sum_{r=1}^{k} \left( \frac{\sum_{i=1}^{n} y_{ri}}{n} - \frac{\sum_{i=1}^{n} \bar{y}_i}{n} \right)^2$
$V(\bar{y}_{sy}) = \frac{1}{n^2 k} \sum_{r=1}^{k} \left[ \sum_{i=1}^{n} (y_{ri} - \bar{y}_i) \right]^2$
$V(\bar{y}_{sy}) = \frac{1}{n^2 k} \left[ \sum_{r=1}^{k} \sum_{i=1}^{n} (y_{ri} - \bar{y}_i)^2 + \sum_{r=1}^{k} \sum_{i \ne u}^{n} (y_{ri} - \bar{y}_i)(y_{ru} - \bar{y}_u) \right]$
Alternative form of variance
Since $\sum_{r=1}^{k} \sum_{i=1}^{n} (y_{ri} - \bar{y}_i)^2 = n(k-1) S_{wst}^2 = (N-n) S_{wst}^2$,
$V(\bar{y}_{sy}) = \frac{1}{n^2 k} \left[ (N-n) S_{wst}^2 + \sum_{r=1}^{k} \sum_{i \ne u}^{n} (y_{ri} - \bar{y}_i)(y_{ru} - \bar{y}_u) \right]$
The intra class correlation between the pairs of units that are in the same systematic sample is
$\rho_{wst} = \frac{E(y_{ri} - \bar{y}_i)(y_{ru} - \bar{y}_u)}{E(y_{ri} - \bar{y}_i)^2} = \frac{\frac{1}{nk(n-1)} \sum_{r=1}^{k} \sum_{i \ne u}^{n} (y_{ri} - \bar{y}_i)(y_{ru} - \bar{y}_u)}{\left( \frac{N-n}{N} \right) S_{wst}^2}$
so that
$\rho_{wst} (n-1)(N-n) S_{wst}^2 = \sum_{r=1}^{k} \sum_{i \ne u}^{n} (y_{ri} - \bar{y}_i)(y_{ru} - \bar{y}_u)$
$V(\bar{y}_{sy}) = \frac{1}{n^2 k} \left[ (N-n) S_{wst}^2 + \rho_{wst} (n-1)(N-n) S_{wst}^2 \right]$
$V(\bar{y}_{sy}) = \frac{(N-n) S_{wst}^2}{nN} \left[ 1 + \rho_{wst} (n-1) \right]$
For stratified sampling,
$Var(\bar{y}_{str}) = \frac{N-n}{Nn} S_{wst}^2$
$Var(\bar{y}_{str}) - Var(\bar{y}_{sy}) = \frac{(N-n) S_{wst}^2}{nN} - \frac{(N-n) S_{wst}^2}{nN} \left[ 1 + \rho_{wst} (n-1) \right]$
$R.E = \frac{Var(\bar{y}_{str})}{Var(\bar{y}_{sy})}$
$R.E = \frac{1}{1 + \rho_{wst} (n-1)}$
Q . State the Stratified Sampling in terms of systematic sampling for population with linear
trend?

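Ans: The population increases according to a linear trend, i.e. $y_i = i$ for $i = 1, 2, \dots, N$. The variance of sample mean for
SRSWOR is
$Var(\bar{y}_{wor}) = \frac{(k-1)(nk+1)}{12}$
For Stratified Sampling in Terms of Systematic Sampling
$Var(\bar{y}_{str}) = \frac{N-n}{Nn} S_{wst}^2$
$S_{wst}^2 = \frac{\sum_{r=1}^{k} (y_r - \bar{Y}_k)^2}{k-1}$
where $\bar{Y}_k$ is the mean of the k units of a stratum. Now
$\sum_{r=1}^{k} (y_r - \bar{Y}_k)^2 = \sum_{r=1}^{k} \left( y_r^2 + \bar{Y}_k^2 - 2 y_r \bar{Y}_k \right)$
$= \sum_{r=1}^{k} y_r^2 + k \bar{Y}_k^2 - 2 \bar{Y}_k \sum_{r=1}^{k} y_r$
$= \sum_{r=1}^{k} y_r^2 + k \bar{Y}_k^2 - 2k \bar{Y}_k^2$
$= \sum_{r=1}^{k} y_r^2 - k \bar{Y}_k^2$
$= \sum_{r=1}^{k} y_r^2 - \frac{\left( \sum_{r=1}^{k} y_r \right)^2}{k}$
Under the linear trend,
$\sum_{i=1}^{N} y_i = \sum_{i=1}^{N} i = 1 + 2 + \dots + N$
$\sum_{r=1}^{k} y_r = \sum_{r=1}^{k} r = 1 + 2 + \dots + k$
$\sum_{r=1}^{k} r = \frac{k(k+1)}{2}$
$\sum_{r=1}^{k} r^2 = \frac{k(k+1)(2k+1)}{6}$
$\sum_{r=1}^{k} r^2 - \frac{\left( \sum_{r=1}^{k} r \right)^2}{k} = \frac{k(k+1)(2k+1)}{6} - \frac{k(k+1)^2}{4}$
$= \frac{k(k+1)}{2} \left[ \frac{2k+1}{3} - \frac{(k+1)}{2} \right]$
$= \frac{k(k+1)}{2} \left[ \frac{k-1}{6} \right]$
$\sum_{r=1}^{k} (y_r - \bar{Y}_k)^2 = \frac{k(k+1)}{2} \left[ \frac{k-1}{6} \right]$
$S_{wst}^2 = \frac{\sum_{r=1}^{k} (y_r - \bar{Y}_k)^2}{k-1} = \frac{k(k+1)}{12}$
$Var(\bar{y}_{str}) = \frac{N-n}{Nn} S_{wst}^2 = \frac{nk - n}{n^2 k} S_{wst}^2$
$Var(\bar{y}_{str}) = \frac{k^2 - 1}{12n}$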
Q . Describe the variance of systematic sampling for population with linear trend.
Ans: Variance of sample mean under Systematic Sampling
$Var(\bar{y}_{sy}) = \frac{\sum_{r=1}^{k} (\bar{y}_r - \bar{Y})^2}{k} = \frac{1}{k} \left[ \sum_{r=1}^{k} \bar{y}_r^2 - \frac{\left( \sum_{r=1}^{k} \bar{y}_r \right)^2}{k} \right]$
Under the linear trend $y_i = i$,
$\sum_{i=1}^{N} y_i = \sum_{i=1}^{N} i = 1 + 2 + \dots + N$
and the systematic sample means differ from $r = 1, 2, \dots, k$ only by a constant, so
$Var(\bar{y}_{sy}) = \frac{1}{k} \left[ \sum_{r=1}^{k} r^2 - \frac{\left( \sum_{r=1}^{k} r \right)^2}{k} \right]$
$\sum_{r=1}^{k} r = \frac{k(k+1)}{2}$
$\sum_{r=1}^{k} r^2 = \frac{k(k+1)(2k+1)}{6}$
$Var(\bar{y}_{sy}) = \frac{1}{k} \left[ \frac{k(k+1)(2k+1)}{6} - \frac{k(k+1)^2}{4} \right]$
$= \frac{k(k+1)}{2k} \left[ \frac{2k+1}{3} - \frac{(k+1)}{2} \right]$
$= \frac{k(k+1)}{2k} \left[ \frac{k-1}{6} \right]$
$= \frac{k(k+1)(k-1)}{12k}$
$Var(\bar{y}_{sy}) = \frac{k^2 - 1}{12}$
Comparison
$Var(\bar{y}_{wor}) = \frac{(k-1)(nk+1)}{12}$
$Var(\bar{y}_{str}) = \frac{k^2 - 1}{12n}$
$Var(\bar{y}_{sy}) = \frac{k^2 - 1}{12}$
When n = 1, all three expressions reduce to $\frac{k^2 - 1}{12}$.
When n > 1, $\frac{k^2 - 1}{12n} < \frac{k^2 - 1}{12} < \frac{(k-1)(nk+1)}{12}$.
$Var(\bar{y}_{str}) = Var(\bar{y}_{sy}) = Var(\bar{y}_{wor})$ for $n = 1$
$Var(\bar{y}_{str}) \le Var(\bar{y}_{sy}) \le Var(\bar{y}_{wor})$ for $n > 1$
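The three linear-trend variances can be confirmed numerically. The sketch below assumes a population y_i = i with n = 4 and k = 5 (values chosen only for illustration) and compares the computed variances with the closed forms.

```r
# Linear-trend population y_i = i with illustrative n = 4, k = 5.
n <- 4
k <- 5
N <- n * k
y <- 1:N

# SRSWOR
v_wor <- (N - n) / (N * n) * var(y)

# Systematic sampling: row r of the matrix is the r-th systematic sample
samples <- matrix(y, nrow = k)
v_sy <- mean((rowMeans(samples) - mean(y))^2)

# Stratified sampling, one unit from each stratum of k consecutive units
# (the strata are the columns of the same matrix)
S2_wst <- mean(apply(samples, 2, var))
v_str  <- (N - n) / (N * n) * S2_wst

print(c(v_str = v_str, v_sy = v_sy, v_wor = v_wor))
print(c((k^2 - 1) / (12 * n), (k^2 - 1) / 12, (k - 1) * (n * k + 1) / 12))
```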
Example: A certain company claims about their daily production as

125, 135, 157,192,151,175,164,169,147,150,138,167,155,159,139,147,149,158.
Select the systematic sample of size 3 using R language. Also find mean of sample.
Ans:
y<-c(125, 135,157,192,151,175,164,169,147,150,138,167,155,159,139,147,149,158)
n=3;N=length(y)
k=N/n
start <- sample(1:k, 1)
s <- seq(start, N, k)
sys.sample<-y[s]

mean(sys.sample)
Output
First run=154.33
2nd run=148
Example: 2
40, 38,36,35,32,35,32,30,31,29,37,41,30,28,24,25,26,27,29,32,34,36,21,19,17,22,35,28,29,31
Select a systematic random sample of size 5. Also find sample mean.
Ans:
N=30, n=5;k=6
pop<-c(40, 38,36,35,32,35,32,30,31,29,37,41,30,28,24,25,26,27,29,32,34,
36,21,19,17,22,35,28,29,31)
n=5;N=length(pop);
k=N/n
start <- sample(1:k, 1)
s <- seq(start, N, k)
sys.sample<-pop[s]
mean(sys.sample)
var(sys.sample)
sd(sys.sample)
Output (two different runs)
mean=29.6 (sd=8.26)
mean=29 (sd=6.04)
Example: A certain company claims about their daily production as
125, 135, 157,192,151,175,164,169,147,150,138,167,155,159,139,147,149,158.
We are interested to select the systematic sample of size 3.
Obtain 10000 samples using systematic sampling. Find the mean of each sample. Find the mean
of means and variance of means.
Ans:


pop<-c(125, 135,157,192,151,175,164,169,147,150,138,167,155,159,139,147,149,158)
n=3;N=length(pop);
k=N/n
m=c()
for(i in 1:10000)
{
start <- sample(1:k, 1)
s <- seq(start, N, k)
sys.sample<-pop[s]
m[i]<- mean(sys.sample)
}
mean(m)
var(m)
Output
Mean=154.29
Var=63.6
Example
40, 38,36,35,32,35,32,30,31,29,37,41,30,28,24,25,26,27,29,32,34,36,21,19,17,22,35,
28,29,31
Select a systematic random sample of size 5.
Obtain 5000 samples using systematic sampling.
Find the mean of each sample.
Find the mean of means and variance of means.
Ans
We have
N=30, n=5;k=6
pop<-c(40,38,36,35,32,35,32,30,31,29,37,41,30,28,24,25,26,27,29,32,34,36,21,
19,17,22,35,28,29,31)
n=5;N=length(pop);k=N/n
m=c()
for(i in 1:5000)
{
start <- sample(1:k, 1)
s <- seq(start, N, k)
sys.sample<-pop[s]
m[i]<- mean(sys.sample)
}
mean(m)
var(m)
Output
Mean=30.29
Var=0.89
Example:
Generate a population of size 1000 values from normal distribution with mean=2 and standard
deviation=3.
Select the 10000 samples each of size 50 using systematic sampling technique and estimate the
mean of each sample.
Find the mean and variance of 10000 means.
Ans:
N=1000; n=50;k=N/n;m=c();
pop<-rnorm(N,mean=2,sd=3)
for(i in 1:10000)
{
start <- sample(1:k, 1)
s <- seq(start, N, k)
sys.sample<-pop[s]
m[i]=mean(sys.sample)
}
mean(m);var(m)
Output
2.01667
0.1929605
Example: 2
Generate a population of size 500 values from normal distribution with mean=20 and standard
deviation=10.

Select the 5000 sample each of size 50 using the systematic sampling technique and estimate the
mean of each sample.Find the mean and variance of 5000 means.
Ans:
N=500; n=50;k=N/n;m=c();
pop<-rnorm(N,mean=20,sd=10)
for(i in 1:5000)
{
start <- sample(1:k, 1)
s <- seq(start, N, k)
sys.sample<-pop[s]
m[i]=mean(sys.sample)
}
mean(m);var(m);
Output
19.93
1.456223

Cluster Sampling

Introduction:
• Cluster Sampling
• A cluster is a sampling unit consisting of observation units.
• Any sampling method can be used for the selection of clusters.
• All the units within a selected cluster are studied.
• Nine clusters each of same size.
• Clusters Settings

Cluster   Elements 1, 2, 3, …, j, …, M      Total    Mean
1         y11  y12  y13  …  y1j  …  y1M     y1.      $\bar{y}_1 = y_{1.}/M$
2         y21  y22  y23  …  y2j  …  y2M     y2.      $\bar{y}_2 = y_{2.}/M$
⋮
i         yi1  yi2  yi3  …  yij  …  yiM     yi.      $\bar{y}_i = y_{i.}/M$
⋮
N         yN1  yN2  yN3  …  yNj  …  yNM     yN.      $\bar{y}_N = y_{N.}/M$
Total     y.1  y.2  y.3  …  y.j  …  y.M     Y

• Notations
• Suppose we have a population of N clusters each of size M.
• NM = Total elements in population.
• $y_{ij}$ = observation value of the jth element in the ith cluster.
• nM = Total elements in sample.
• $y_{i.} = y_i$ = Total of the ith cluster.
$y_{i.} = \sum_{j=1}^{M} y_{ij}$
• $\bar{y}$ = Sample mean per cluster.
$\bar{y} = \frac{\sum_{i=1}^{n} y_i}{n}$
• $\bar{\bar{y}}$ = Overall sample mean (per element).
$\bar{\bar{y}} = \frac{\sum_{i=1}^{n} \sum_{j=1}^{M} y_{ij}}{Mn}$
• $\bar{\bar{Y}}$ = Overall population mean (per element).
$\bar{\bar{Y}} = \frac{\sum_{i=1}^{N} \sum_{j=1}^{M} y_{ij}}{MN}$
• Let $y_i$ be the total of the ith cluster. Then
$\bar{\bar{y}} = \frac{\bar{y}}{M}$
• Unbiased mean estimator
With $\bar{Y}$ the population mean per cluster,
$M\, E(\bar{\bar{y}}) = E(\bar{y})$
$M\, E(\bar{\bar{y}}) = \bar{Y}$
$E(\bar{\bar{y}}) = \bar{Y} / M = \bar{\bar{Y}}$
Q. Find the variance of the sample mean in cluster sampling, considering equal cluster sizes.
Ans:
• Unbiased mean estimator
• The unbiased mean estimator is
$$\bar{y} = \frac{1}{Mn}\sum_{i=1}^{n}\sum_{j=1}^{M} y_{ij}, \qquad V(\bar{y}) = E(\bar{y}-\bar{Y})^2$$
• Variance of mean estimator
With $f = n/N$ and $\bar{y}_i = y_{i.}/M$ the mean of the ith cluster,
$$V(\bar{y}) = \left(\frac{1-f}{n}\right) S_b^2, \qquad S_b^2 = \frac{\sum_{i=1}^{N}(\bar{y}_i-\bar{Y})^2}{N-1}$$
Since $\bar{y}_i-\bar{Y} = \frac{1}{M}\sum_{j=1}^{M}(y_{ij}-\bar{Y})$,
$$\sum_{i=1}^{N}(\bar{y}_i-\bar{Y})^2 = \frac{1}{M^2}\sum_{i=1}^{N}\Big[\sum_{j=1}^{M}(y_{ij}-\bar{Y})\Big]^2$$
$$= \frac{1}{M^2}\Big[\sum_{i=1}^{N}\sum_{j=1}^{M}(y_{ij}-\bar{Y})^2 + \sum_{i=1}^{N}\sum_{j\neq k}^{M}(y_{ij}-\bar{Y})(y_{ik}-\bar{Y})\Big]$$
$$= \frac{1}{M^2}\Big[(NM-1)S^2 + \sum_{i=1}^{N}\sum_{j\neq k}^{M}(y_{ij}-\bar{Y})(y_{ik}-\bar{Y})\Big]$$
The intraclass correlation between the elements within a cluster is
$$\rho_w = \frac{E(y_{ij}-\bar{Y})(y_{ik}-\bar{Y})}{E(y_{ij}-\bar{Y})^2} = \frac{\frac{1}{NM(M-1)}\sum_{i=1}^{N}\sum_{j\neq k}^{M}(y_{ij}-\bar{Y})(y_{ik}-\bar{Y})}{\left(\frac{NM-1}{NM}\right)S^2}$$
so that
$$\sum_{i=1}^{N}\sum_{j\neq k}^{M}(y_{ij}-\bar{Y})(y_{ik}-\bar{Y}) = \rho_w (M-1)(NM-1)S^2$$
Substituting,
$$\sum_{i=1}^{N}(\bar{y}_i-\bar{Y})^2 = \frac{1}{M^2}\Big[(NM-1)S^2 + \rho_w(M-1)(NM-1)S^2\Big]$$
$$V(\bar{y}) = \left(\frac{1-f}{n}\right)\frac{\sum_{i=1}^{N}(\bar{y}_i-\bar{Y})^2}{N-1} = \frac{1}{M^2}\left(\frac{1-f}{n}\right)\frac{(NM-1)S^2}{N-1}\big[1+\rho_w(M-1)\big]$$
For large N, $NM-1 \approx NM$ and $N-1 \approx N$, so
$$V(\bar{y}) \approx \left(\frac{1-f}{nM}\right)S^2\big[1+\rho_w(M-1)\big] \approx \frac{S^2}{nM}\big[1+\rho_w(M-1)\big]$$
where the last step ignores the sampling fraction f.
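The identity behind this derivation can be checked numerically. The sketch below is only an illustration: it reuses the six equal-sized clusters from the worked example later in this handout, assumes n = 3 sampled clusters, and uses variable names chosen here. It computes $\rho_w$ from its definition and compares the two exact expressions for $V(\bar{y})$.

```r
# Six equal-sized clusters (rows), taken from the worked example in this handout
y <- rbind(c(125,115,129,134,111), c(134,125,142,141,131),
           c(144,143,122,134,126), c(114,111,134,131,146),
           c(119,126,122,129,130), c(140,125,124,124,115))
N <- nrow(y); M <- ncol(y); n <- 3          # n = 3 is an assumption for illustration
f <- n / N
Ybar <- mean(y)                              # overall population mean
S2   <- sum((y - Ybar)^2) / (N*M - 1)        # element-level variance S^2
Sb2  <- sum((rowMeans(y) - Ybar)^2) / (N - 1)

# Sum over i and j != k of the cross products: (sum_j d_ij)^2 - sum_j d_ij^2
d     <- y - Ybar
cross <- sum(rowSums(d)^2 - rowSums(d^2))
rho_w <- cross / ((M - 1) * (N*M - 1) * S2)  # intraclass correlation

V_direct <- (1 - f) / n * Sb2
V_rho    <- (1 - f) / n * (N*M - 1) / (M^2 * (N - 1)) * S2 * (1 + rho_w * (M - 1))
print(c(rho_w = rho_w, V_direct = V_direct, V_rho = V_rho))
```

Both variance values agree because no approximation has been made yet; the large-N simplification is applied only in the final step of the derivation.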
Q. Compare simple random sampling and cluster sampling in terms of the variances of the
sample means, given that
$$Var(\bar{y}_{srs}) = \frac{N-n}{N}\,\frac{S^2}{n}$$
which, for a simple random sample of nM elements from NM elements, becomes
$$V(\bar{y}_{srs}) = \left(\frac{NM-nM}{NM}\right)\frac{S^2}{nM}$$
Ans:
• Comparison
The element-level variance is
$$S^2 = \frac{1}{NM-1}\sum_{i=1}^{N}\sum_{j=1}^{M}(y_{ij}-\bar{Y})^2$$
Let $S_w^2$ be the mean sum of squares within clusters in the population and $S_b^2$ the mean sum of squares between cluster means:
$$S_w^2 = \frac{1}{N(M-1)}\sum_{i=1}^{N}\sum_{j=1}^{M}(y_{ij}-\bar{y}_i)^2, \qquad S_b^2 = \frac{1}{N-1}\sum_{i=1}^{N}(\bar{y}_i-\bar{Y})^2$$
Then
$$(NM-1)S^2 = \sum_{i=1}^{N}\sum_{j=1}^{M}(y_{ij}-\bar{Y})^2 = \sum_{i=1}^{N}\sum_{j=1}^{M}\big[(y_{ij}-\bar{y}_i)+(\bar{y}_i-\bar{Y})\big]^2$$
$$= \sum_{i=1}^{N}\sum_{j=1}^{M}(y_{ij}-\bar{y}_i)^2 + M\sum_{i=1}^{N}(\bar{y}_i-\bar{Y})^2 = N(M-1)S_w^2 + M(N-1)S_b^2$$
The relative efficiency is
$$R.E = \frac{Var(\bar{y}_{srs})}{Var(\bar{y})} = \frac{S^2}{M S_b^2} = \frac{N(M-1)S_w^2 + M(N-1)S_b^2}{M S_b^2 (NM-1)}$$
$$= \frac{1}{NM-1}\left[\frac{N(M-1)S_w^2}{M S_b^2} + \frac{M(N-1)S_b^2}{M S_b^2}\right] = \frac{1}{NM-1}\left[\frac{N(M-1)S_w^2}{M S_b^2} + (N-1)\right]$$
• This value increases when $S_w$ is large and $S_b$ is small. So cluster sampling will be
efficient if the clusters are formed so that the variation between cluster means is as small
as possible while the variation within the clusters is as large as possible.
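The sum-of-squares decomposition and the two forms of the relative efficiency can be verified on a small data set. The following is a minimal sketch using the six equal-sized clusters from the worked example in this handout; the variable names are chosen here for illustration.

```r
# Rows are clusters, columns are elements within a cluster
y <- rbind(c(125,115,129,134,111), c(134,125,142,141,131),
           c(144,143,122,134,126), c(114,111,134,131,146),
           c(119,126,122,129,130), c(140,125,124,124,115))
N <- nrow(y); M <- ncol(y)
Ybar <- mean(y)

S2  <- sum((y - Ybar)^2) / (N*M - 1)               # total variance
Sw2 <- sum((y - rowMeans(y))^2) / (N * (M - 1))    # within-cluster mean square
Sb2 <- sum((rowMeans(y) - Ybar)^2) / (N - 1)       # between-cluster-means mean square

lhs <- (N*M - 1) * S2
rhs <- N * (M - 1) * Sw2 + M * (N - 1) * Sb2       # should equal lhs

RE_1 <- S2 / (M * Sb2)                             # first form of R.E
RE_2 <- (N * (M - 1) * Sw2 / (M * Sb2) + (N - 1)) / (N*M - 1)
print(c(lhs = lhs, rhs = rhs, RE_1 = RE_1, RE_2 = RE_2))
```

A relative efficiency above 1 would indicate that cluster sampling has the smaller variance for this population; below 1, simple random sampling does.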
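Q. Compare simple random sampling and cluster sampling in terms of the intraclass
correlation.
Ans:
$$R.E = \frac{Var(\bar{y}_{srs})}{Var(\bar{y})}$$
Ignoring the finite population correction,
$$V(\bar{y}) \approx \frac{S^2}{nM}\big[1+\rho_w(M-1)\big], \qquad V(\bar{y}_{srs}) \approx \frac{S^2}{nM}$$
$$R.E = \frac{\dfrac{S^2}{nM}}{\dfrac{S^2}{nM}\big[1+\rho_w(M-1)\big]} = \frac{1}{1+\rho_w(M-1)}$$
Cluster sampling is therefore more efficient than simple random sampling only when $\rho_w$ is negative.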
Q. Explain the concept of cluster sampling for unequal cluster sizes.

Ans:
• Clusters Settings

Cluster   1     2     3     …   j     …   Mi      Total   Mean
1         y11   y12   y13   …   y1j   …   y1M1    y1.     ȳ1 = y1./M1
2         y21   y22   y23   …   y2j   …   y2M2    y2.     ȳ2 = y2./M2
⋮
i         yi1   yi2   yi3   …   yij   …   yiMi    yi.     ȳi = yi./Mi
⋮
N         yN1   yN2   yN3   …   yNj   …   yNMN    yN.     ȳN = yN./MN
Total                                             Y

• Mean Estimator
• Since the cluster sizes are unequal, the total size is
$$M = \sum_{i=1}^{N} M_i$$
• The mean of the ith cluster is
$$\bar{y}_i = \frac{\sum_{j=1}^{M_i} y_{ij}}{M_i}$$
• The overall mean is
$$\bar{Y} = \frac{\sum_{i=1}^{N}\sum_{j=1}^{M_i} y_{ij}}{\sum_{i=1}^{N} M_i} = \frac{\sum_{i=1}^{N} M_i \bar{y}_i}{M}$$
• Expected Value of Sample Mean
• The mean of the sampled cluster means is
$$\bar{y} = \frac{\sum_{i=1}^{n}\bar{y}_i}{n}$$
and its expected value is
$$E(\bar{y}) = \frac{\sum_{i=1}^{N}\bar{y}_i}{N}$$
which in general differs from $\bar{Y} = \sum_{i=1}^{N} M_i\bar{y}_i / M$.
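Q. Find the expression of the bias of the mean estimator in cluster sampling for unequal
cluster sizes.
Ans.
The bias is defined as
$$Bias(T) = E(T) - \theta$$
$$Bias(\bar{y}) = E(\bar{y}) - \bar{Y} = \frac{\sum_{i=1}^{N}\bar{y}_i}{N} - \frac{\sum_{i=1}^{N} M_i\bar{y}_i}{M}$$
$$= \frac{1}{M}\left[\frac{M\sum_{i=1}^{N}\bar{y}_i}{N} - \sum_{i=1}^{N} M_i\bar{y}_i\right] = \frac{1}{M}\left[\frac{\sum_{i=1}^{N} M_i\sum_{i=1}^{N}\bar{y}_i}{N} - \sum_{i=1}^{N} M_i\bar{y}_i\right]$$
$$= -\frac{1}{M}\left[\sum_{i=1}^{N} M_i\bar{y}_i - \frac{\sum_{i=1}^{N} M_i\sum_{i=1}^{N}\bar{y}_i}{N}\right] = -\frac{(N-1)}{M}\,Cov(m,\bar{y})$$
where $Cov(m,\bar{y}) = \frac{1}{N-1}\left[\sum_{i=1}^{N} M_i\bar{y}_i - \frac{1}{N}\sum_{i=1}^{N} M_i\sum_{i=1}^{N}\bar{y}_i\right]$ is the covariance between cluster sizes and cluster means.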
Q. Find the expression of the mean square error of the mean estimator in cluster sampling for
unequal cluster sizes.
Ans:
$$MSE(T) = Var(T) + \big[Bias(T)\big]^2$$
$$MSE(\bar{y}) = Var(\bar{y}) + \big[Bias(\bar{y})\big]^2$$
$$MSE(\bar{y}) = \left(\frac{1-f}{n}\right)S_b^2 + \left[\frac{(N-1)}{M}\,Cov(m,\bar{y})\right]^2$$
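Q. Find the expected value of the weighted mean for unequal clusters, where the weighted mean
is given by
$$\bar{y}_w = \frac{\sum_{i=1}^{n} M_i\bar{y}_i}{n\bar{M}}$$
Answer:
• Weighted Mean
• Since the cluster sizes are unequal, the mean cluster size is
$$\bar{M} = \frac{M}{N}$$
• The weighted mean based on the size of the ith cluster is
$$\bar{y}_w = \frac{\sum_{i=1}^{n} M_i\bar{y}_i}{n\bar{M}}$$
• Expectation of Weighted Mean
• Taking expectation on both sides, and using the fact that under simple random sampling of clusters each cluster is included with probability n/N,
$$E(\bar{y}_w) = E\left[\frac{\sum_{i=1}^{n} M_i\bar{y}_i}{n\bar{M}}\right] = \frac{1}{n\bar{M}}\cdot\frac{n}{N}\sum_{i=1}^{N} M_i\bar{y}_i = \frac{\sum_{i=1}^{N} M_i\bar{y}_i}{M} = \bar{Y}$$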
Q. Find the variance expression of the weighted mean for unequal clusters, where the weighted
mean is given by
$$\bar{y}_w = \frac{\sum_{i=1}^{n} M_i\bar{y}_i}{n\bar{M}}$$
Ans:
• The variance is given by
$$V(\bar{y}_w) = V\left[\frac{\sum_{i=1}^{n} M_i\bar{y}_i}{n\bar{M}}\right] = \left(\frac{1-f}{n}\right)S_{bw}^2$$
• The Estimator of Variance
The estimator of the variance is given by
$$\hat{V}(\bar{y}_w) = \left(\frac{1-f}{n}\right)\hat{S}_{bw}^2$$
Example: Find the mean and the variance of the sample mean in cluster sampling using the
following data set.
Ans:
Population of Six Clusters
            Observations            Total
Cluster-1   125 115 129 134 111     614
Cluster-2   134 125 142 141 131     673
Cluster-3   144 143 122 134 126     669
Cluster-4   114 111 134 131 146     636
Cluster-5   119 126 122 129 130     626
Cluster-6   140 125 124 124 115     628
• Population Mean
$$\bar{Y} = \frac{\sum_{i=1}^{N}\sum_{j=1}^{M} Y_{ij}}{MN} = \frac{3846}{30} = 128.2$$
• Variance of Sample Mean Using Population
$$Var(\bar{y}) = \left(\frac{1-f}{n}\right)S_b^2, \qquad S_b^2 = \frac{\sum_{i=1}^{N}(\bar{y}_i-\bar{Y})^2}{N-1}$$
$$Var(\bar{y}) = \frac{N-n}{Nn}\,\frac{1}{N-1}\sum_{i=1}^{N}(\bar{y}_i-\bar{Y})^2$$
• Variance of Sample Mean

Total yi.   Mean ȳi   (ȳi − Ȳ)²
614         122.8     29.16
673         134.6     40.96
669         133.8     31.36
636         127.2     1.00
626         125.2     9.00
628         125.6     6.76
Sum                   118.24

With $\bar{Y} = 128.2$, N = 6 and n = 3,
$$Var(\bar{y}) = \frac{6-3}{6\times 3}\cdot\frac{1}{6-1}\,(118.24) = 3.941$$
• Mean and Variance Using All the Observations
$$\bar{Y} = \frac{3846}{30} = 128.2, \qquad V(\bar{y}) = 3.941$$
Example: Find the mean and variance by taking a sample of size 3 from the previous data
set under cluster sampling.
Ans:
Sampled clusters
            Observations            Total
Cluster-3   144 143 122 134 126     669
Cluster-4   114 111 134 131 146     636
Cluster-6   140 125 124 124 115     628

$$\bar{y} = \frac{\sum_{i=1}^{n}\sum_{j=1}^{M} y_{ij}}{Mn} = \frac{\sum_{i=1}^{3}\sum_{j=1}^{5} y_{ij}}{5\times 3} = \frac{1933}{15} = 128.87$$
• Estimated Variance of Sample Mean
$$var(\bar{y}) = \frac{N-n}{Nn}\,\frac{1}{n-1}\sum_{i=1}^{n}(\bar{y}_i-\bar{y})^2$$

Total yi.   Mean ȳi   (ȳi − ȳ)²
669         133.8     24.34
636         127.2     2.78
628         125.6     10.67
Sum                   37.78

The deviations are taken from the sample mean $\bar{y} = 128.87$.
• Estimated Variance
$$var(\bar{y}) = \frac{N-n}{Nn}\,\frac{1}{n-1}\sum_{i=1}^{3}(\bar{y}_i-\bar{y})^2 = \frac{6-3}{6\times 3}\cdot\frac{1}{2}\,(37.78) = 3.15$$
• Example-2
• 420 trees are divided into 105 clusters.
• Each cluster is of size 4.
• A simple random sample of 15 clusters is selected.
• Estimate the mean yield by using cluster sampling.
• Sample of 15 Clusters
$$\bar{y} = \frac{\sum_{i=1}^{n}\sum_{j=1}^{M} y_{ij}}{Mn} = \frac{1142.44}{60} = 19.0407$$
The variance of the sample mean and its estimator are
$$V(\bar{y}) = \left(\frac{1-f}{n}\right)S_b^2 = \left(\frac{1-f}{n}\right)\frac{\sum_{i=1}^{N}(\bar{y}_i-\bar{Y})^2}{N-1}$$
$$var(\bar{y}) = \frac{N-n}{Nn}\,\frac{1}{n-1}\sum_{i=1}^{n}(\bar{y}_i-\bar{y})^2, \qquad SS = \sum_{i=1}^{n}(\bar{y}_i-\bar{y})^2 = 1495.5596$$

Cluster   Total yi.   Mean ȳi    (ȳi − ȳ)²
1         80.4600     20.1150    1.162
2         50.6900     12.6725    41.082
3         81.7500     20.4375    1.961
4         63.5200     15.8800    9.967
5         97.2000     24.3000    27.699
6         26.8000     6.7000     152.202
7         99.1300     24.7825    33.011
8         63.2500     15.8125    10.397
9         179.350     44.8375    665.666
10        133.760     33.4400    207.446
11        59.1500     14.7875    18.058
12        32.8500     8.2125     117.170
13        58.3000     14.5750    19.909
14        92.7300     23.1825    17.185
15        23.5000     5.8750     173.238

$$var(\bar{y}) = \frac{N-n}{Nn}\,\frac{1}{n-1}\sum_{i=1}^{n}(\bar{y}_i-\bar{y})^2 = \frac{90}{105\times 15}\cdot\frac{1}{14}\,(1495.5596) = 6.104$$
Q. Define the following data in R in the form of clusters and find the mean and variance.
            Observations            Total
Cluster-1   125 115 129 134 111     614
Cluster-2   134 125 142 141 131     673
Cluster-3   144 143 122 134 126     669
Cluster-4   114 111 134 131 146     636
Cluster-5   119 126 122 129 130     626
Cluster-6   140 125 124 124 115     628
Ans:
How to Do This in R?
#--Defining Clusters in R----
Clu1<-c(125,115,129,134,111)
Clu2<-c(134,125,142,141,131)
Clu3<-c(144,143,122,134,126)
Clu4<-c(114,111,134,131,146)
Clu5<-c(119,126,122,129,130)
Clu6<-c(140,125,124,124,115)
pop<-c(Clu1,Clu2,Clu3,Clu4,Clu5,Clu6)
#----Sum of clusters----
sumc1<-sum(Clu1)
sumc2<-sum(Clu2)
sumc3<-sum(Clu3)
sumc4<-sum(Clu4)
sumc5<-sum(Clu5)
sumc6<-sum(Clu6)
#----Grand total----
Yi=c(sumc1,sumc2,sumc3,sumc4,sumc5,sumc6)
t.sum=sum(Yi)
N=6;M=5;n=3;     # n = number of sampled clusters, needed for the variance below
#----Mean of each cluster----
Yibar<-Yi/M
#----Population Mean----
pop.mean=sum(Yi)/(N*M)
#----Sum of Squares----
dv.p<-(Yibar-pop.mean)^2
sdv.p<-sum(dv.p)
vr.p<-sdv.p/(N-1)
#----The Variance----
cvr.p<-((N-n)/(N*n))*vr.p
The last line evaluates
$$\frac{N-n}{Nn}\,\frac{1}{N-1}\sum_{i=1}^{N}(\bar{y}_i-\bar{Y})^2$$
Q. Perform cluster sampling using R with the following data set. Also find the mean
and variance.
Cluster-1 125 115 129 134 111
Cluster-2 134 125 142 141 131
Cluster-3 144 143 122 134 126
Cluster-4 114 111 134 131 146
Cluster-5 119 126 122 129 130
Cluster-6 140 125 124 124 115
Answer:
• Defining Clusters
Clu1<-c(125,115,129,134,111)
Clu2<-c(134,125,142,141,131)
Clu3<-c(144,143,122,134,126)
Clu4<-c(114,111,134,131,146)
Clu5<-c(119,126,122,129,130)
Clu6<-c(140,125,124,124,115)

• Sum Of Clusters
#----Population----
sumc1<-sum(Clu1)
sumc2<-sum(Clu2)
sumc3<-sum(Clu3)
sumc4<-sum(Clu4)
sumc5<-sum(Clu5)
sumc6<-sum(Clu6)
Yi=c(sumc1,sumc2,sumc3,sumc4,sumc5,sumc6)   # vector of cluster totals
• Sampling and Estimated Mean
t.sum=sum(Yi)
#------Sampling----
n=3;N=6;M=5;
yi=sample(Yi,n)          # simple random sample of n cluster totals
#----Estimated Mean----
yibar<-yi/M
Est.mean=sum(yi)/(n*M)
#----Sum of Squares----
dv<-(yibar-Est.mean)^2
sdv<-sum(dv)
vr<-sdv/(n-1)
#----The Variance----
cvr<-((N-n)/(N*n))*vr
The last line evaluates
$$\frac{N-n}{Nn}\,\frac{1}{n-1}\sum_{i=1}^{n}(\bar{y}_i-\bar{y})^2$$
Q.70 Find the mean and variance for unequal clusters in cluster sampling using the
following data.
Clus-1 125 115 129 134
Clus-2 134 125 142 141 131 151 164 139 141
Clus-3 144 143 122 134 126 157
Clus-4 114 111 134 131 146 152 131
Clus-5 119 126 122 129 130
Clus-6 140 125 124 124 115 111 148 157 143 151
Ans:
• Population Mean in Case of Unequal Cluster Sizes
$$\bar{Y} = \frac{\sum_{i=1}^{N} M_i\bar{y}_i}{M} = 133.66$$
• Estimator-I: Mean of Cluster Means
• The mean of cluster means is
$$\bar{y} = \frac{\sum_{i=1}^{n}\bar{y}_i}{n}$$
Sampled clusters:
Clus-3 144 143 122 134 126 157
Clus-4 114 111 134 131 146 152 131
Clus-5 119 126 122 129 130
• Mean Square Error
With $\bar{Y} = 133.66$ and $S_b^2 = 39.81$,
$$MSE(\bar{y}) = Var(\bar{y}) + \big[Bias(\bar{y})\big]^2 = \left(\frac{1-f}{n}\right)S_b^2 + \left[\frac{(N-1)}{M}\,Cov(m,\bar{y})\right]^2$$
Q. Find the weighted mean and variance for unequal clusters in cluster sampling using the
following data.
Clus-1 125 115 129 134
Clus-2 134 125 142 141 131 151 164 139 141
Clus-3 144 143 122 134 126 157
Clus-4 114 111 134 131 146 152 131
Clus-5 119 126 122 129 130
Clus-6 140 125 124 124 115 111 148 157 143 151
Answer:
• Population Mean in Case of Unequal Cluster Sizes
$$\bar{Y} = \frac{\sum_{i=1}^{N} M_i\bar{y}_i}{M} = 133.66$$
• Estimator II: Weighted Mean
• Since the cluster sizes are unequal, the total size is
$$M = \sum_{i=1}^{N} M_i$$
• The weighted mean based on the size of the ith cluster is
$$\bar{y}_w = \frac{\sum_{i=1}^{n} M_i\bar{y}_i}{n\bar{M}}$$
Sampled clusters:
Clus-3 144 143 122 134 126 157
Clus-4 114 111 134 131 146 152 131
Clus-5 119 126 122 129 130
• Variance of Weighted Mean
• The variance is given by
$$V(\bar{y}_w) = V\left[\frac{\sum_{i=1}^{n} M_i\bar{y}_i}{n\bar{M}}\right] = \left(\frac{1-f}{n}\right)S_{bw}^2$$

ȳi       Mi    ȳi Mi / M̄
125.75   4     73.61
140.89   9     185.56
137.67   6     120.88
131.29   7     134.49
125.20   5     91.61
133.80   10    195.80
Q.72 Perform cluster sampling using R with the following data set. Also find the
mean and variance.
Clus-1 125 115 129 134
Clus-2 134 125 142 141 131 151 164 139 141
Clus-3 144 143 122 134 126 157
Clus-4 114 111 134 131 146 152 131
Clus-5 119 126 122 129 130
Clus-6 140 125 124 124 115 111 148 157 143 151
Answer:
#----------Defining Clusters------------
Clu1<-c(125,115,129,134)
Clu2<-c(134,125,142,141,131,151,164,139,141)
Clu3<-c(144,143,122,134,126,157)
Clu4<-c(114,111,134,131,146,152,131)
Clu5<-c(119,126,122,129,130)
Clu6<-c(140,125,124,124,115,111,148,157,143,151)
pop<-c(Clu1,Clu2,Clu3,Clu4,Clu5,Clu6)      # all population values
• Population Mean and Sum Of Clusters
#----Population----
pop.mean=mean(pop)
sumc1<-sum(Clu1)
sumc2<-sum(Clu2)
sumc3<-sum(Clu3)
sumc4<-sum(Clu4)
sumc5<-sum(Clu5)
sumc6<-sum(Clu6)
Yi=c(sumc1,sumc2,sumc3,sumc4,sumc5,sumc6)  # vector of cluster totals
• Mean of Cluster Means
n=3;N=6;M=c(4,9,6,7,5,10);
t.sum=sum(Yi)
# ----- Mean of cluster means using population----
clu.mean.p=Yi/M
m.c.m.p=mean(clu.mean.p)
##----For MSE----
cov=cov(clu.mean.p,M)
dv.p<-(clu.mean.p-m.c.m.p)^2
sdv.p<-sum(dv.p)
vr.p<-sdv.p/(N-1)
term1<-((N-n)/(N*n))*vr.p            # variance term
term2=((-(N-1)/sum(M))*cov)^2        # squared bias term
mse=term1+term2
####-----From Sample----
j=sample(1:6,n)
yi=Yi[j];mi=M[j]
clu.mean=yi/mi
m.clu.mean=mean(clu.mean)
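The squared term `term2` relies on the bias identity Bias = -((N-1)/M) Cov(m, ȳ). As a hedged check, the sketch below computes the bias of the mean of cluster means in two ways for the same six unequal clusters; the list name `clusters` and the other names are chosen here for illustration.

```r
clusters <- list(c(125,115,129,134),
                 c(134,125,142,141,131,151,164,139,141),
                 c(144,143,122,134,126,157),
                 c(114,111,134,131,146,152,131),
                 c(119,126,122,129,130),
                 c(140,125,124,124,115,111,148,157,143,151))
N      <- length(clusters)
Mi     <- lengths(clusters)               # cluster sizes 4, 9, 6, 7, 5, 10
M      <- sum(Mi)                         # total number of elements
ybar_i <- sapply(clusters, mean)          # cluster means
Ybar   <- sum(Mi * ybar_i) / M            # overall population mean

bias_direct <- mean(ybar_i) - Ybar                     # E(ybar) - Ybar
bias_cov    <- -(N - 1) / M * cov(Mi, ybar_i)          # covariance form
print(c(Ybar = Ybar, bias_direct = bias_direct, bias_cov = bias_cov))
```

The two values coincide because `cov()` in R uses the divisor N - 1, which matches the covariance definition used in the bias expression.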

Q. Perform the simulation study using the following data for equal cluster sizes.
            Observations            Total
Cluster-1   125 115 129 134 111     614
Cluster-2   134 125 142 141 131     673
Cluster-3   144 143 122 134 126     669
Cluster-4   114 111 134 131 146     636
Cluster-5   119 126 122 129 130     626
Cluster-6   140 125 124 124 115     628
Answer:
• How to Do This in R?

Clu1<-c(125,115,129,134,111)
Clu2<-c(134,125,142,141,131)
Clu3<-c(144,143,122,134,126)
Clu4<-c(114,111,134,131,146)
Clu5<-c(119,126,122,129,130)
Clu6<-c(140,125,124,124,115)
• Sum Of Clusters
#----Population----
sumc1<-sum(Clu1)
sumc2<-sum(Clu2)
sumc3<-sum(Clu3)
sumc4<-sum(Clu4)
sumc5<-sum(Clu5)
sumc6<-sum(Clu6)
Yi=c(sumc1,sumc2,sumc3,sumc4,sumc5,sumc6)   # vector of cluster totals
• Sampling and Estimated Mean
t.sum=sum(Yi)
n=3;N=6;M=5;Est.mean=c();
for(i in 1:10000)
{yi=sample(Yi,n)
Est.mean[i]=sum(yi)/(n*M) }
mean(Est.mean);var(Est.mean)
Q. Perform the simulation study using the following data for unequal cluster sizes.
Answer:
• Defining Clusters
#----------Defining Clusters------------
Clu1<-c(125,115,129,134)
Clu2<-c(134,125,142,141,131,151,164,139,141)
Clu3<-c(144,143,122,134,126,157)
Clu4<-c(114,111,134,131,146,152,131)
Clu5<-c(119,126,122,129,130)
Clu6<-c(140,125,124,124,115,111,148,157,143,151)
pop<-c(Clu1,Clu2,Clu3,Clu4,Clu5,Clu6)      # all population values
#----Population----
pop.mean=mean(pop)
sumc1<-sum(Clu1)
sumc2<-sum(Clu2)
sumc3<-sum(Clu3)
sumc4<-sum(Clu4)
sumc5<-sum(Clu5)
sumc6<-sum(Clu6)
Yi=c(sumc1,sumc2,sumc3,sumc4,sumc5,sumc6)  # vector of cluster totals
• Sum of Clusters
n=3;N=6;M=c(4,9,6,7,5,10);
t.sum=sum(Yi)
• With Mean of Cluster Means
m.clu.mean=c();
for(i in 1:10000)
{j=sample(1:6,n)
yi=Yi[j];mi=M[j]
clu.mean=yi/mi
m.clu.mean[i]=mean(clu.mean) }
mean(m.clu.mean)
var(m.clu.mean)
• Weighted Mean
$$\bar{y}_w = \frac{\sum_{i=1}^{n} M_i\bar{y}_i}{n\bar{M}}$$
mc1<-mean(Clu1)
mc2<-mean(Clu2)
mc3<-mean(Clu3)
mc4<-mean(Clu4)
mc5<-mean(Clu5)
mc6<-mean(Clu6)
mYi=c(mc1,mc2,mc3,mc4,mc5,mc6)
• With Weighted Mean
w.m=c(); Mbar=mean(M);
for(i in 1:10000)
{j=sample(1:6,n)
myi=mYi[j];mi=M[j];
w.m[i]=sum(mi*myi)/(n*Mbar) }
mean(w.m)
var(w.m)
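With only six clusters the simulation can be replaced by an exact enumeration of all 20 possible samples of three clusters. The sketch below is an illustration with names chosen here; it computes the exact expected values of the mean of cluster means and of the weighted mean, so that they can be compared with the population mean.

```r
clusters <- list(c(125,115,129,134),
                 c(134,125,142,141,131,151,164,139,141),
                 c(144,143,122,134,126,157),
                 c(114,111,134,131,146,152,131),
                 c(119,126,122,129,130),
                 c(140,125,124,124,115,111,148,157,143,151))
N <- length(clusters); n <- 3
Mi     <- lengths(clusters)
ybar_i <- sapply(clusters, mean)
Ybar   <- sum(Mi * ybar_i) / sum(Mi)      # population mean
Mbar   <- mean(Mi)                        # mean cluster size

samples <- combn(N, n)                    # each column is one possible sample
mean_of_means <- apply(samples, 2, function(s) mean(ybar_i[s]))
weighted_mean <- apply(samples, 2, function(s) sum(Mi[s] * ybar_i[s]) / (n * Mbar))

print(c(Ybar = Ybar,
        E_mean_of_means = mean(mean_of_means),
        E_weighted_mean = mean(weighted_mean)))
```

Every sample is equally likely under simple random sampling of clusters, so the plain average over the columns of `samples` is the exact expectation.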

Unequal Probability Sampling

Introduction:
• Hansen and Hurwitz (1943) were perhaps the first to discuss the theory of unequal
probability sampling.
• The most commonly used scheme is probability proportional to size (PPS), for example
sampling with respect to department size or sampling with respect to number of trees.
• PPS With Replacement
• The estimator for the population total Y suggested by Hansen and Hurwitz (1943) is
$$\hat{y}_{HH} = \frac{1}{n}\sum_{i=1}^{n}\frac{y_i}{p_i}$$
• Computing HH Estimator

Department   Faculty size   Pi      Numbers
1            32             32/90   1-32
2            10             10/90   33-42
3            21             21/90   43-63
4            16             16/90   64-79
5            11             11/90   80-90
Total        90
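The "Numbers" column is the cumulative-total range assigned to each department: a random number between 1 and 90 selects the department whose range contains it, which gives selection with probability proportional to faculty size. A minimal sketch follows, taking the sizes 32, 10, 21, 16 and 11 from the numerators of Pi; the helper name `select_pps` is hypothetical.

```r
size  <- c(32, 10, 21, 16, 11)        # faculty sizes of the five departments
upper <- cumsum(size)                 # 32 42 63 79 90: upper ends of the ranges
p     <- size / sum(size)             # selection probabilities P_i

# Hypothetical helper: map a random number r in 1..90 to its department
select_pps <- function(r) findInterval(r - 1, upper) + 1

set.seed(1)                           # arbitrary seed, for reproducibility only
r    <- sample(1:sum(size), 3, replace = TRUE)   # three random numbers
dept <- select_pps(r)                 # PPS with replacement sample of departments
print(rbind(random_number = r, department = dept))
```

Once the departments are drawn, the HH estimate is the average of y/p over the draws, with repeated departments counted each time they are selected.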

Q. Prove that the Hansen-Hurwitz estimator is unbiased for the population total. Also find the
variance expression.
Answer:
The estimator for the population total Y suggested by Hansen and Hurwitz (1943) is
$$\hat{y}_{HH} = \frac{1}{n}\sum_{i=1}^{n}\frac{y_i}{p_i}$$
Taking expectation,
$$E(\hat{y}_{HH}) = \frac{1}{n}\sum_{i=1}^{n} E\left(\frac{y_i}{p_i}\right), \qquad E\left(\frac{y_i}{p_i}\right) = \sum_{i=1}^{N}\frac{Y_i}{P_i}P_i = Y$$
so that $E(\hat{y}_{HH}) = Y$.
Variance of Hansen-Hurwitz Estimator
Since the n draws are independent and
$$V\left(\frac{y_i}{p_i}\right) = \sum_{i=1}^{N} P_i\left(\frac{Y_i}{P_i}-Y\right)^2$$
we have
$$V(\hat{y}_{HH}) = \frac{1}{n}\sum_{i=1}^{N} P_i\left(\frac{Y_i}{P_i}-Y\right)^2$$
• Another Form of Variance
$$Var(\hat{y}_{HH}) = \frac{1}{n}\left[\sum_{i=1}^{N}\frac{Y_i^2}{P_i} - Y^2\right]$$
Q. Find all possible samples of size two using the following data and find the Hansen-
Hurwitz estimator. Also find the mean and variance of the Hansen-Hurwitz estimator.
Yi 0.5 1.2 2.1 3.2
Zi 1 2 3 4
Answer:
• The following is a population of four values with their respective sizes.
• We are interested in taking all possible samples of size 2.
• The HH estimator will be calculated for all the samples. Further, the mean and variance
will be obtained.

Yi    Zi   Pi = Zi/ΣZi
0.5   1    0.1
1.2   2    0.2
2.1   3    0.3
3.2   4    0.4

• All Possible Samples (table not reproduced)
• The expected value of the HH estimator is
$$E(\hat{y}_{HH}) = \sum \hat{y}_{HH}\,p_i p_j = 7 = Y$$
$$Var(\hat{y}_{PPS}) = E(\hat{y}_{PPS}^2) - Y^2 = 49.5 - 49 = 0.5$$
• Using the Formula
$$Var(\hat{y}_{PPS}) = \frac{1}{n}\left[\sum_{i=1}^{N}\frac{Y_i^2}{P_i} - Y^2\right] = \frac{1}{2}\,[50 - 49] = 0.50$$
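The table of all possible samples can be generated directly. The sketch below is an illustration with names chosen here: it lists the 16 ordered with-replacement samples of size 2, attaches the probability $p_i p_j$ to each, and recovers the mean and variance of the HH estimator.

```r
Y <- c(0.5, 1.2, 2.1, 3.2)
Z <- c(1, 2, 3, 4)
P <- Z / sum(Z)                                   # 0.1 0.2 0.3 0.4
n <- 2

idx  <- expand.grid(i = 1:4, j = 1:4)             # 16 ordered samples (i, j)
prob <- P[idx$i] * P[idx$j]                       # probability of each sample
hh   <- (Y[idx$i] / P[idx$i] + Y[idx$j] / P[idx$j]) / n   # HH estimate per sample

E_hh      <- sum(hh * prob)                       # expected value
V_hh      <- sum(hh^2 * prob) - E_hh^2            # variance by enumeration
V_formula <- (sum(Y^2 / P) - sum(Y)^2) / n        # variance by the formula
print(c(E = E_hh, V_enumeration = V_hh, V_formula = V_formula))
```

Ordered samples are used because the two draws are independent, so the pairs (i, j) and (j, i) are separate outcomes with the same estimate.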
Q. Describe Lahiri’s method of selection.
Answer:
• Lahiri’s Method of Selection
• A pair of random numbers is chosen: one from 1 to N (say i) and the other from 1 to Zmax
(say R), where Zmax is the largest size.
• If R exceeds the size of the ith unit, the unit is rejected and a new pair is drawn;
otherwise the unit is accepted.
• The probability of selecting the ith unit at the first draw is
$$P_I(i) = \frac{1}{N}\cdot\frac{Z_i}{Z_{max}}$$

Sr.No   Yi    Zi
1       0.5   15
2       1.2   20
3       2.1   7
4       3.2   13
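The accept/reject rule can be sketched in a few lines of R. This is only an illustration using the four sizes from the table; the helper name `lahiri_draw` is hypothetical. Repeating the draw many times shows that the accepted units appear with frequencies close to Zi/ΣZi.

```r
Z    <- c(15, 20, 7, 13)                 # sizes from the table
N    <- length(Z)
Zmax <- max(Z)

# Hypothetical helper: one selection by Lahiri's method
lahiri_draw <- function() {
  repeat {
    i <- sample(1:N, 1)                  # random unit number between 1 and N
    R <- sample(1:Zmax, 1)               # random number between 1 and Zmax
    if (R <= Z[i]) return(i)             # accept unit i, otherwise draw a new pair
  }
}

set.seed(1)                              # arbitrary seed, for reproducibility only
draws <- replicate(20000, lahiri_draw())
obs   <- tabulate(draws, nbins = N) / length(draws)
print(round(rbind(observed = obs, expected = Z / sum(Z)), 3))
```

The method needs only the largest size, not the cumulative totals, which is its practical advantage when N is large.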
Q. Example for Lahiri’s Method
• Selection with Lahiri’s Method (selection table not reproduced)
• The Estimated Total
• The estimated value of the population total using Lahiri’s method of selection is
$$\hat{y}_{PPS} = \frac{1}{n}\sum_{i=1}^{5}\frac{y_i}{p_i} = \frac{144179.05}{5} = 28836$$
Q. Explain the concept of unequal probability sampling without replacement.

Answer:
• Let a sample of two units be selected from a population of N units.
• Let the probability of selecting the ith unit at the first draw be $P_i = Z_i/Z$.
• Suppose the ith unit is not selected at the first draw but the jth unit is selected ($j \neq i$);
the probability of selecting the jth unit at the first draw is $P_j = Z_j/Z$.
• The conditional probability of selecting the ith unit at the second draw is then $P_i/(1-P_j)$.
• The probability that the ith unit enters the sample at the second draw is the sum, over j, of
the probability that the jth unit is selected at the first draw times the probability that the
ith unit is selected at the second draw given that the jth unit was selected first, i.e.
$$\sum_{j\neq i}^{N} P_j\,\frac{P_i}{1-P_j}$$
• The total probability $\pi_i$, the probability of inclusion of the ith population unit in the
sample, is
$$\pi_i = P_i + \sum_{j\neq i}^{N} P_j\,\frac{P_i}{1-P_j} = P_i\left[1 + \sum_{j=1}^{N}\frac{P_j}{1-P_j} - \frac{P_i}{1-P_i}\right]$$
• The probability that both the ith and jth units are in the sample is denoted by $\pi_{ij}$ and is
defined as
$$\pi_{ij} = P_i\,\frac{P_j}{1-P_i} + P_j\,\frac{P_i}{1-P_j} = P_iP_j\left[\frac{1}{1-P_i} + \frac{1}{1-P_j}\right]$$
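Q. What is the Horvitz-Thompson estimator?

Answer:
• HT Estimator
• The general theory of unequal probability sampling without replacement was first
presented by Horvitz and Thompson (1952).
• An unbiased estimator suggested by them for the population total Y is
$$\hat{y}_{HT} = \sum_{i=1}^{n}\frac{y_i}{\pi_i} = \sum_{i=1}^{N} a_i\frac{Y_i}{\pi_i}$$
where $a_i = 1$ if the ith population unit is in the sample and $a_i = 0$ otherwise, so that $E(a_i) = \pi_i$.

Q. Prove that the HT estimator is an unbiased estimator.
• The estimator of the total suggested by Horvitz and Thompson in 1952 is
$$\hat{y}_{HT} = \sum_{i=1}^{n}\frac{y_i}{\pi_i} = \sum_{i=1}^{N} a_i\frac{Y_i}{\pi_i}$$
Taking expectation and using $E(a_i) = \pi_i$,
$$E(\hat{y}_{HT}) = \sum_{i=1}^{N}\pi_i\frac{Y_i}{\pi_i} = \sum_{i=1}^{N} Y_i = Y$$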
Q.81 What are some relations of $\pi_i$?

Answer:
• Relations of $\pi_i$
• The following are some relations of $\pi_i$ and $\pi_{ij}$:
$$\text{i.}\quad \sum_{i=1}^{N}\pi_i = n$$
$$\text{ii.}\quad \sum_{j\neq i}^{N}\pi_{ij} = (n-1)\pi_i$$
$$\text{iii.}\quad \sum_{j\neq i}^{N}(\pi_{ij}-\pi_i\pi_j) = -\pi_i(1-\pi_i)$$
$$\text{iv.}\quad \sum_{i=1}^{N}\sum_{j\neq i}^{N}\pi_i\pi_j = n\sum_{i=1}^{N}\pi_i - \sum_{i=1}^{N}\pi_i^2$$
• Relation (i)
• We know that $\sum_{i=1}^{N} a_i = n$.
• Taking expectation, $\sum_{i=1}^{N}\pi_i = n$.
• Relation (ii)
$$\sum_{j\neq i}^{N}\pi_{ij} = (n-1)\pi_i$$
• $\pi_{ij}$ is the sum of the probabilities of all the samples containing the ith and jth units.
• For i = 1, $\sum_{j\neq 1}\pi_{1j}$ is the sum of the probabilities of the samples containing the first and second units, the first and third units, and so on.
• Thus every P(s) containing the first unit occurs (n-1) times in this sum, as the sample
has (n-1) other members in it and it occurs once for each of these members. Hence
$$\sum_{j\neq i}^{N}\pi_{ij} = (n-1)\pi_i$$
• Relation (iii)
Taking the L.H.S.,
$$\sum_{j\neq i}^{N}(\pi_{ij}-\pi_i\pi_j) = \sum_{j\neq i}^{N}\pi_{ij} - \pi_i\sum_{j\neq i}^{N}\pi_j = \sum_{j\neq i}^{N}\pi_{ij} - \pi_i\Big(\sum_{j=1}^{N}\pi_j - \pi_i\Big)$$
Using relations (i) and (ii),
$$= (n-1)\pi_i - \pi_i(n-\pi_i) = -\pi_i(1-\pi_i)$$
• Relation (iv)
$$\sum_{i=1}^{N}\sum_{j\neq i}^{N}\pi_i\pi_j = \sum_{i=1}^{N}\pi_i\sum_{j\neq i}^{N}\pi_j = \sum_{i=1}^{N}\pi_i(n-\pi_i) = n\sum_{i=1}^{N}\pi_i - \sum_{i=1}^{N}\pi_i^2$$
Q.82 Find the variance expression of the Horvitz-Thompson estimator.

Answer:
• HT Estimator
• The estimator of the total suggested by Horvitz and Thompson in 1952 is
$$\hat{y}_{HT} = \sum_{i=1}^{n}\frac{y_i}{\pi_i} = \sum_{i=1}^{N} a_i\frac{Y_i}{\pi_i}, \qquad E(\hat{y}_{HT}) = \sum_{i=1}^{N}\pi_i\frac{Y_i}{\pi_i} = \sum_{i=1}^{N} Y_i = Y$$
• Variance of HT Estimator
$$Var(\hat{y}_{HT}) = E(\hat{y}_{HT})^2 - \big[E(\hat{y}_{HT})\big]^2 = E\Big[\sum_{i=1}^{N} a_i\frac{Y_i}{\pi_i}\Big]^2 - \Big[\sum_{i=1}^{N} E(a_i)\frac{Y_i}{\pi_i}\Big]^2$$
Expanding both squares,
$$E\Big[\sum_{i=1}^{N} a_i\frac{Y_i}{\pi_i}\Big]^2 = \sum_{i=1}^{N} E(a_i^2)\frac{Y_i^2}{\pi_i^2} + \sum_{i=1}^{N}\sum_{j\neq i}^{N} E(a_ia_j)\frac{Y_iY_j}{\pi_i\pi_j}$$
$$\Big[\sum_{i=1}^{N} E(a_i)\frac{Y_i}{\pi_i}\Big]^2 = \sum_{i=1}^{N}\big[E(a_i)\big]^2\frac{Y_i^2}{\pi_i^2} + \sum_{i=1}^{N}\sum_{j\neq i}^{N} E(a_i)E(a_j)\frac{Y_iY_j}{\pi_i\pi_j}$$
so that
$$Var(\hat{y}_{HT}) = \sum_{i=1}^{N}\frac{Y_i^2}{\pi_i^2}\Big[E(a_i^2) - \big(E(a_i)\big)^2\Big] + \sum_{i=1}^{N}\sum_{j\neq i}^{N}\frac{Y_iY_j}{\pi_i\pi_j}\Big[E(a_ia_j) - E(a_i)E(a_j)\Big]$$
$$= \sum_{i=1}^{N}\frac{Y_i^2}{\pi_i^2}Var(a_i) + \sum_{i=1}^{N}\sum_{j\neq i}^{N}\frac{Y_iY_j}{\pi_i\pi_j}Cov(a_i,a_j)$$
$$Var(\hat{y}_{HT}) = \sum_{i=1}^{N}\frac{Y_i^2}{\pi_i^2}\,\pi_i(1-\pi_i) + \sum_{i=1}^{N}\sum_{j\neq i}^{N}\frac{Y_iY_j}{\pi_i\pi_j}(\pi_{ij}-\pi_i\pi_j)$$
Using relation (iii), $\sum_{j\neq i}(\pi_{ij}-\pi_i\pi_j) = -\pi_i(1-\pi_i)$, i.e. $\pi_i(1-\pi_i) = \sum_{j\neq i}(\pi_i\pi_j-\pi_{ij})$,
$$Var(\hat{y}_{HT}) = \sum_{i=1}^{N}\sum_{j\neq i}^{N}(\pi_i\pi_j-\pi_{ij})\frac{Y_i^2}{\pi_i^2} - \sum_{i=1}^{N}\sum_{j\neq i}^{N}(\pi_i\pi_j-\pi_{ij})\frac{Y_iY_j}{\pi_i\pi_j}$$
$$= \frac{1}{2}\sum_{i=1}^{N}\sum_{j\neq i}^{N}(\pi_i\pi_j-\pi_{ij})\left(\frac{Y_i^2}{\pi_i^2}+\frac{Y_j^2}{\pi_j^2}\right) - \sum_{i=1}^{N}\sum_{j\neq i}^{N}(\pi_i\pi_j-\pi_{ij})\frac{Y_iY_j}{\pi_i\pi_j}$$
$$= \frac{1}{2}\sum_{i=1}^{N}\sum_{j\neq i}^{N}(\pi_i\pi_j-\pi_{ij})\left(\frac{Y_i^2}{\pi_i^2}+\frac{Y_j^2}{\pi_j^2}-2\frac{Y_iY_j}{\pi_i\pi_j}\right)$$
$$Var_{SYG}(\hat{y}_{HT}) = \frac{1}{2}\sum_{i=1}^{N}\sum_{j\neq i}^{N}(\pi_i\pi_j-\pi_{ij})\left(\frac{Y_i}{\pi_i}-\frac{Y_j}{\pi_j}\right)^2$$
Q. Calculate the HT estimator by taking all possible samples of size 2 without replacement.
Further find the mean and variance.
Ans:

• The following is a population of four values with their respective sizes.
Unit No.   1     2     3     4
Yi         0.5   1.2   2.1   3.2
Zi         1     2     3     4
Pi         0.1   0.2   0.3   0.4

• The HT estimator will be calculated for all the samples. Further, the mean and variance will be
obtained.
$$\hat{y}_{HT} = \sum_{i=1}^{n}\frac{y_i}{\pi_i}, \qquad P_i = \frac{Z_i}{\sum Z_i}$$
For the sample consisting of units 1 and 2, for example,
$$\hat{y}_{HT} = \frac{y_1}{\pi_1} + \frac{y_2}{\pi_2}$$
The inclusion probabilities are
$$\pi_i = P_i\left[1 + \sum_{j=1}^{N}\frac{P_j}{1-P_j} - \frac{P_i}{1-P_i}\right], \qquad \sum_{j=1}^{N}\frac{P_j}{1-P_j} = 1.456$$
$$\pi_1 = P_1\left[1 + \sum_{j=1}^{N}\frac{P_j}{1-P_j} - \frac{P_1}{1-P_1}\right] = 0.1\,(1 + 1.456 - 0.111) = 0.2345$$

Unit No.   1        2        3        4
Yi         0.5      1.2      2.1      3.2
Zi         1        2        3        4
Pi         0.1      0.2      0.3      0.4
πi         0.2345   0.4413   0.6083   0.7159

The joint inclusion probabilities are
$$\pi_{ij} = P_iP_j\left[\frac{1}{1-P_i} + \frac{1}{1-P_j}\right]$$
$$\pi_{12} = 0.1 \times 0.2\left[\frac{1}{0.9} + \frac{1}{0.8}\right] = 0.1 \times 0.2\,(1.111 + 1.25) = 0.0472$$
Over all possible samples,
$$E(\hat{y}_{HT}) = 7.0000$$
$$Var(\hat{y}_{HT}) = 49.8229 - (7.0000)^2 = 0.8229$$
Q.85 What is the HT estimator for with-replacement sampling?

Answer:
• Sample of Four Units
Unit No.   1      2      3      4
Yi         60     60     14     1
Pi         0.05   0.05   0.02   0.01
• Yi is the count of animals from the sample of four strips.
• HH Estimator
$$\hat{y}_{HH} = \frac{1}{4}\left[\frac{60}{0.05} + \frac{60}{0.05} + \frac{14}{0.02} + \frac{1}{0.01}\right] = 800$$
• HT Estimator
With replacement, the inclusion probability of the ith unit in n draws is
$$\pi_i = 1 - (1-p_i)^n$$
and the estimator is
$$\hat{y}_{HT} = \sum_{i=1}^{n}\frac{y_i}{\pi_i}$$
Q. Perform PPS sampling with replacement using the following data in R.
Yi 0.5 1.2 2.1 3.2
Zi 1 2 3 4
Answer:
y<-c(0.5,1.2,2.1,3.2)
zi<-c(1,2,3,4)
z<-sum(zi)
pi<-zi/z
N<-4;n=2;means=c();
#----With PPS-----
for(i in 1:10)
{s <- sample(1:N,n,replace=TRUE,prob=pi)
means[i]<-mean(y[s]) }

mean(means)
Q. Select 10000 samples with PPS sampling with replacement using the following
data in R. Find the mean of each sample. Further find the mean of the means.
Yi 0.5 1.2 2.1 3.2
Zi 1 2 3 4
Answer:
y<-c(0.5,1.2,2.1,3.2)
zi<-c(1,2,3,4)
z<-sum(zi)
pi<-zi/z
N<-4;n=2;means=c();
#----With PPS-----
for(i in 1:10000)
{s <- sample(1:N,n,replace=TRUE,prob=pi)
means[i]<-mean(y[s]) }
mean(means)
Q. Select 10000 samples with PPS sampling with replacement using the following
data in R. Find the HH estimator from each sample. Further find the mean of the HH estimators.
Yi 0.5 1.2 2.1 3.2
Zi 1 2 3 4
Answer:
y<-c(0.5,1.2,2.1,3.2)
zi<-c(1,2,3,4);z<-sum(zi);pi<-zi/z
N<-4;n=2;hh=c();
for(i in 1:10000)
{s <- sample(1:N,n,replace=TRUE,prob=pi)
hh[i] <- mean(y[s]/pi[s])/N }   # HH estimate of the total, divided by N to estimate the mean
mean(hh)
Example:
Answer:
• Population from R
• Read in the trees data set from R.
• Let the variable of interest y be tree volume and let the draw-by-draw selection probability be
proportional to girth.
# -----Reading the data-----
y <- trees$Volume
zi <- trees$Girth
y
10.3 10.3 10.2 16.4 18.8 19.7 15.6 18.2 22.6 19.9 24.2 21.0 21.4 21.3 19.1 22.2 33.8 27.4
25.7 24.9 34.5 31.7 36.3 38.3 42.6 55.4 55.7 58.3 51.5 51.0 77.0
zi
8.3 8.6 8.8 10.5 10.7 10.8 11.0 11.0 11.1 11.2 11.3 11.4 11.4 11.7 12.0 12.9 12.9 13.3
13.7 13.8 14.0 14.2 14.5 16.0 16.3 17.3 17.5 17.9 18.0 18.0 20.6
• Defining Terms
# -----Defining Population----
y <- trees$Volume
zi <- trees$Girth
pi<-zi/sum(zi)
pi
0.02020940 0.02093986 0.02142683 0.02556611 0.02605308 0.02629657 0.02678354
0.02678354 0.02702703 0.02727051 0.02751400 0.02775749 0.02775749 0.02848795
0.02921841 0.03140979 0.03140979 0.03238374 0.03335768 0.03360117 0.03408814
0.03457512 0.03530558 0.03895788 0.03968834 0.04212320 0.04261018 0.04358412
0.04382761 0.04382761 0.05015827
• Sampling With PPS
y <- trees$Volume
zi <- trees$Girth
pi<-zi/sum(zi)
N<-31;n=10;
#----Sampling with PPS----
s <- sample(1:N,n,replace=TRUE,prob=pi)
y[s]
mean(y[s])
mu <- mean(y)
• With SRS
y <- trees$Volume
zi <- trees$Girth
pi<-zi/sum(zi)
N<-31;n=10;
#----Sampling with SRS----
s <- sample(1:N,n,replace=TRUE,prob=NULL)
y[s]
mean(y[s])
mu <- mean(y)
• Simulation with PPS
y <- trees$Volume
zi <- trees$Girth
pi<-zi/sum(zi)
N<-31;n=10;means=c();
#----With PPS-----
for(i in 1:10000)
{s <- sample(1:N,n,replace=TRUE,prob=pi)
means[i]<-mean(y[s]) }
mean(means)
• Simulation with SRSWR
y <- trees$Volume
zi <- trees$Girth
pi<-zi/sum(zi)
N<-31;n=10;means=c();
#----SRSWR----
for(i in 1:10000)
{s <- sample(1:N,n,replace=TRUE,prob=NULL)
means[i]<-mean(y[s]) }
mean(means)
• Observations
• Mean of population is 30.17
• Mean of means with SRS is 30.25
• Mean of means with PPS is 33.37
• The plain sample mean under PPS overestimates the population mean because trees with larger girth are selected more often.
Q. Read the data with the following commands and select 10000 samples with PPS
sampling with replacement in R. Find the HH estimator from each
sample. Further find the mean of the HH estimators.
y <- trees$Volume
zi <- trees$Girth

Answer:

y <- trees$Volume
zi <- trees$Girth
y
10.3 10.3 10.2 16.4 18.8 19.7 15.6 18.2 22.6 19.9 24.2 21.0 21.4 21.3 19.1 22.2 33.8 27.4 25.7
24.9 34.5 31.7 36.3 38.3 42.6 55.4 55.7 58.3 51.5 51.0 77.0
zi
8.3 8.6 8.8 10.5 10.7 10.8 11.0 11.0 11.1 11.2 11.3 11.4 11.4 11.7 12.0 12.9 12.9 13.3 13.7
13.8 14.0 14.2 14.5 16.0 16.3 17.3 17.5 17.9 18.0 18.0 20.6
pi<-zi/sum(zi);N<-31;n=10;hh=c();
for(i in 1:10000)
{s <- sample(1:N,n,replace=TRUE,prob=pi)
hh[i] <- mean(y[s]/pi[s])/N }   # HH estimate of the total, divided by N to estimate the mean
mean(hh)
Q. Select 10000 samples with PPS sampling with replacement using the following data in
R. Find the HT estimator from each sample. Further find the mean of the HT estimators.
Yi 0.5 1.2 2.1 3.2
Zi 1 2 3 4
Answer:
y<-c(0.5,1.2,2.1,3.2)
zi<-c(1,2,3,4)
z<-sum(zi)
pi<-zi/z;N=4;n=2
#---Calculation of Pi----
pii <- 1 - (1-pi)^n
#---Looping-----
ht=c();
for(i in 1:10000)
{s <- sample(1:N,n,replace=TRUE,prob=pi)
yu=unique(s)                    # distinct units in the sample
ht[i] <- sum(y[yu]/pii[yu])/N}
mean(ht)
Q. Select 10000 samples with PPS sampling without replacement using the following data
in R. Find the HT estimator from each sample. Further find the mean of the HT estimators.
Yi 0.5 1.2 2.1 3.2
Zi 1 2 3 4
Answer:
y<-c(0.5,1.2,2.1,3.2)
zi<-c(1,2,3,4)
z<-sum(zi)
pi<-zi/z;N=4;n=2
piinv<-1-pi
sm<-sum(pi/piinv)
pii <- pi*(1+sm-pi/(1-pi))
 ith HT Estimator
#---Looping-----
hht=c();
for(i in 1:10000)
{s <- sample(1:N,n,replace=FALSE,prob=pi)
hht[i] <- sum(y[s]/pii[s])/N}
mean(hht)
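For a population this small, the simulation has an exact counterpart: the six possible samples of size 2 can be enumerated with their probabilities $\pi_{ij}$. The following sketch is an illustration with names chosen here. It works with the estimator of the total (without the division by N used in the loop above) and also evaluates the Sen-Yates-Grundy form of the variance.

```r
Y <- c(0.5, 1.2, 2.1, 3.2)
P <- c(0.1, 0.2, 0.3, 0.4)
N <- 4

a   <- 1 / (1 - P)
pii <- P * (1 + sum(P / (1 - P)) - P / (1 - P))   # inclusion probabilities pi_i
pij <- outer(P, P) * outer(a, a, "+")             # pi_ij = P_i P_j [1/(1-P_i) + 1/(1-P_j)]
diag(pij) <- 0                                    # pi_ij is only defined for i != j

s  <- combn(N, 2)                                 # the 6 unordered samples
ps <- pij[t(s)]                                   # probability of each sample
ht <- Y[s[1, ]] / pii[s[1, ]] + Y[s[2, ]] / pii[s[2, ]]   # HT estimate of the total

E_ht <- sum(ps * ht)
V_ht <- sum(ps * ht^2) - E_ht^2                   # variance by enumeration
r    <- Y / pii
V_syg <- 0.5 * sum((outer(pii, pii) - pij) * outer(r, r, "-")^2)   # SYG form
print(c(E = E_ht, V_enumeration = V_ht, V_SYG = V_syg))
```

The enumeration also confirms the relations $\sum \pi_i = n$ and $\sum_{j \neq i} \pi_{ij} = (n-1)\pi_i$ for this selection scheme, since `sum(pii)` is 2 and `rowSums(pij)` equals `pii`.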
Q. Read the data with the following commands and select 10000 samples with PPS
sampling with replacement in R. Find the HT estimator from each
sample. Further find the mean of the HT estimators.
y <- trees$Volume
zi <- trees$Girth
Answer:

y <- trees$Volume
zi <- trees$Girth
y
10.3 10.3 10.2 16.4 18.8 19.7 15.6 18.2 22.6 19.9 24.2 21.0 21.4 21.3 19.1 22.2 33.8 27.4 25.7
24.9 34.5 31.7 36.3 38.3 42.6 55.4 55.7 58.3 51.5 51.0 77.0
zi
8.3 8.6 8.8 10.5 10.7 10.8 11.0 11.0 11.1 11.2 11.3 11.4 11.4 11.7 12.0 12.9 12.9 13.3 13.7
13.8 14.0 14.2 14.5 16.0 16.3 17.3 17.5 17.9 18.0 18.0 20.6
pi<-zi/sum(zi)
pi
0.02020940 0.02093986 0.02142683 0.02556611 0.02605308 0.02629657 0.02678354
0.02678354 0.02702703 0.02727051 0.02751400 0.02775749 0.02775749 0.02848795
0.02921841 0.03140979 0.03140979 0.03238374 0.03335768 0.03360117 0.03408814
0.03457512 0.03530558 0.03895788 0.03968834 0.04212320 0.04261018 0.04358412
0.04382761 0.04382761 0.05015827
N<-31;n=10;
pii <- 1 - (1-pi)^n
pii
• Values of pii
> pii
0.1846714 0.1907295 0.1947457 0.2281662 0.2320147 0.2339325 0.2377552
0.2377552 0.2396601 0.2415607 0.2434571 0.2453491 0.2453491 0.2509998
0.2566124 0.2732237 0.2732237 0.2804987 0.2877080 0.2895002 0.2930723
0.2966283 0.3019321 0.3279149 0.3330058 0.3497258 0.3530242 0.3595758
0.3612043 0.3612043 0.4022598
#---Looping-----
hht=c();
for(i in 1:10000)
{s <- sample(1:N,n,replace=TRUE,prob=pi)
yu=unique(s)                    # distinct units in the sample
hht[i] <- sum(y[yu]/pii[yu])/N}

mean(hht)
Q. Describe the Brewer selection procedure.

Answer:
• Brewer’s Procedure
• Brewer (1963a) suggested a selection procedure. The procedure for a sample of size 2 is
given as:
• Select the first unit with probability proportional to
$$\frac{P_i(1-P_i)}{1-2P_i}$$
• Select the second unit with probability proportional to the size of the remaining units.
Q. Describe Durbin’s procedure.
• Durbin’s Procedure
• Durbin (1967) suggested a selection procedure. The procedure for a sample of size 2 is
given as:
• Select the first unit with probability proportional to size.
• Select the second unit with probability proportional to
$$P_j\left[\frac{1}{1-2P_i} + \frac{1}{1-2P_j}\right]$$
Q. Describe Shehbaz-Hanif-Samiuddin’s procedure.
• They suggested the procedure in 2003.
• Select the first unit with probability proportional to
$$\frac{P_i(1-P_i)}{1-2P_i}$$
• Select the second unit with probability proportional to
$$P_j\left[\frac{1}{1-2P_i} + \frac{1}{1-2P_j}\right]$$

Estimation with Auxiliary Variable

Introduction:
 Auxiliary Variable
 We have discussed the estimation of parameters on the basis of single variable.
 Supporting/supplementary variable may be used at design stage or estimation stage.
 Supporting variable is used to enhance the efficiency of estimation.
 The supporting variable must be correlated with main variable.
 The supplementary information is usually referred to as a benchmark variable or auxiliary
variable.
 Graunt (1662) was the first to use auxiliary information, to estimate the population
size of England.
 After Graunt (1662), Laplace introduced the use of auxiliary information for
estimating the population of France.
 Examples of auxiliary (X) and study (Y) variables
X = Cell phones in 2016; Y = Cell phones in 2017
X = Working hours; Y = Number of items processed
X = The size of unit i; Y = The number of animals in unit i
X = Diameter of the tree; Y = Volume of the tree
 Some Types of Relationships
[Figure: some types of relationships between Y and X]

Q. Prove that ratio estimator is almost unbiased for large sample size.
Answer:
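 Ratio Estimator
 The ratio estimator of the population ratio $R = \bar{Y}/\bar{X}$ is
$$\hat{R} = \frac{\bar{y}}{\bar{x}} = \frac{y}{x}$$
 Ratio Estimator for Mean and Total
 The ratio estimator for the population mean is
$$\bar{y}_r = \frac{\bar{y}}{\bar{x}}\,\bar{X}$$
 The ratio estimator for the population total is
$$\hat{Y}_r = \frac{y}{x}\,X$$
 Expectation of Ratio Estimator
 For large sample size the expectation of the ratio estimator is approximately equal
to the population ratio, i.e.
$$E(\hat{R}) \approx R$$
Proof:
$$\hat{R} - R = \frac{\bar{y}}{\bar{x}} - R = \frac{\bar{y} - R\bar{x}}{\bar{x}}$$
When n is large, $\bar{x}$ is close to $\bar{X}$, so
$$\hat{R} - R \approx \frac{\bar{y} - R\bar{x}}{\bar{X}}$$
$$E(\hat{R} - R) \approx \frac{E(\bar{y}) - R\,E(\bar{x})}{\bar{X}} = \frac{\bar{Y} - R\bar{X}}{\bar{X}} = 0$$
$$E(\hat{R}) \approx R$$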
Q. Find the variance of Ratio Estimator.

Answer:
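 Ratio Estimator
• The ratio estimator is
$$\hat{R} = \frac{\bar{y}}{\bar{x}} = \frac{y}{x}$$
 Ratio Estimator for Mean and Total
$$\bar{y}_r = \frac{\bar{y}}{\bar{x}}\,\bar{X}, \qquad \hat{Y}_r = \frac{y}{x}\,X$$
 Variance of Ratio Estimator
For large n,
$$\hat{R} - R \approx \frac{\bar{y} - R\bar{x}}{\bar{X}}$$
$$E(\hat{R} - R)^2 \approx \frac{1}{\bar{X}^2}\,E(\bar{y} - R\bar{x})^2$$
Let $d_i = y_i - R x_i$, so that $\bar{d} = \bar{y} - R\bar{x}$ and $\bar{D} = \bar{Y} - R\bar{X} = 0$. Then
$$E(\hat{R} - R)^2 \approx \frac{1}{\bar{X}^2}\,V(\bar{d})$$
$$MSE(\hat{R}) \approx \frac{N - n}{Nn}\,\frac{1}{\bar{X}^2}\left[\frac{1}{N - 1}\sum_{i=1}^{N}(Y_i - R X_i)^2\right]$$
The sample estimate is
$$mse(\hat{R}) = \frac{N - n}{Nn}\,\frac{1}{\bar{X}^2}\left[\frac{1}{n - 1}\sum_{i=1}^{n}(y_i - r x_i)^2\right], \qquad r = \hat{R}$$
 Alternative Expression of Variance
Since $\bar{Y} = R\bar{X}$, we have $Y_i - R X_i = (Y_i - \bar{Y}) - R(X_i - \bar{X})$, so
$$MSE(\hat{R}) \approx \frac{N - n}{Nn}\,\frac{1}{\bar{X}^2}\,\frac{1}{N - 1}\sum_{i=1}^{N}\left[(Y_i - \bar{Y}) - (R X_i - R\bar{X})\right]^2$$
$$MSE(\hat{R}) \approx \frac{N - n}{Nn}\,\frac{1}{\bar{X}^2}\left[S_Y^2 - 2R\,S_{YX} + R^2 S_X^2\right]$$
 Expression of Variance for Ratio Estimator of Mean
Because $\bar{y}_r = \hat{R}\bar{X}$ and $S_{YX} = \rho S_Y S_X$,
$$MSE(\bar{y}_r) \approx \frac{N - n}{Nn}\left[S_Y^2 - 2R\rho\,S_Y S_X + R^2 S_X^2\right]$$
$$MSE(\bar{y}_r) \approx \theta\,\bar{Y}^2\left[C_Y^2 - 2\rho\,C_Y C_X + C_X^2\right], \qquad \theta = \frac{N - n}{Nn} = \frac{1 - f}{n}$$
The confidence limits are
$$C.L.(\hat{R}):\ \hat{R} \pm t\sqrt{mse(\hat{R})}$$
$$C.L.(\bar{Y}):\ \bar{y}_r \pm t\sqrt{mse(\bar{y}_r)}$$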
Q. Find the approximate Bias expression of Ratio Estimator?
Answer:
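 Ratio Estimator of Mean
$$\hat{R} = \frac{\bar{y}}{\bar{x}} = \frac{y}{x}, \qquad \bar{y}_r = \frac{\bar{y}}{\bar{x}}\,\bar{X}$$
 Notations
$$e_0 = \frac{\bar{y} - \bar{Y}}{\bar{Y}}, \qquad e_1 = \frac{\bar{x} - \bar{X}}{\bar{X}}$$
Using these notations, with $\theta = (1 - f)/n$,
$$E(e_0^2) = \theta C_y^2, \qquad E(e_1^2) = \theta C_x^2, \qquad E(e_0 e_1) = \theta C_{yx},$$
where $C_{yx} = \rho_{yx} C_y C_x$.
 Using Notations
$$\bar{y} = \bar{Y}(1 + e_0), \qquad \bar{x} = \bar{X}(1 + e_1)$$
$$\bar{y}_r = \frac{\bar{Y}(1 + e_0)}{\bar{X}(1 + e_1)}\,\bar{X} = \bar{Y}(1 + e_0)(1 + e_1)^{-1}$$
 Bias of Ratio Estimator
Expanding $(1 + e_1)^{-1}$ up to second order terms,
$$\bar{y}_r \approx \bar{Y}(1 + e_0)(1 - e_1 + e_1^2)$$
$$\bar{y}_r - \bar{Y} \approx \bar{Y}\left(e_0 - e_1 - e_0 e_1 + e_1^2\right)$$
Taking expectations, with $E(e_0) = E(e_1) = 0$,
$$Bias(\bar{y}_r) \approx \frac{1 - f}{n}\,\bar{Y}\left[C_x^2 - \rho_{yx} C_y C_x\right]$$
$$Bias(\hat{R}) \approx \frac{1 - f}{n}\,R\left[C_x^2 - \rho_{yx} C_y C_x\right]$$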
Q. Comparison of Ratio Estimator with Mean Per Unit Estimator?

Answer:
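 Ratio Estimator for Mean
 The ratio estimator for the population mean is
$$\bar{y}_r = \frac{\bar{y}}{\bar{x}}\,\bar{X}$$
with, for $\theta = (1 - f)/n$,
$$Var(\bar{y}_r) \approx \theta\,\bar{Y}^2\left[C_Y^2 - 2\rho\,C_Y C_X + C_X^2\right]$$
• The variance of the mean per unit estimator is
$$V(\bar{y}) = \theta\,\bar{Y}^2 C_Y^2$$
 Comparison
The ratio estimator is more efficient than the mean per unit estimator if
$$Var(\bar{y}) - Var(\bar{y}_r) > 0$$
$$\theta\bar{Y}^2 C_Y^2 - \theta\bar{Y}^2\left[C_Y^2 - 2\rho\,C_Y C_X + C_X^2\right] > 0$$
$$2\rho\,C_Y C_X - C_X^2 > 0$$
$$\rho > \frac{1}{2}\,\frac{C_X}{C_Y}$$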
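The condition above can be checked numerically. The sketch below evaluates it on R's built-in trees data (Volume as the study variable and Girth as the auxiliary variable, as in the exercises of this handout); the variable names threshold and ratio_preferred are illustrative only.

```r
# Check the condition rho > (1/2) * (C_X / C_Y) on the built-in trees data.
y <- trees$Volume   # study variable
x <- trees$Girth    # auxiliary variable

rho <- cor(y, x)             # correlation between Y and X
Cy  <- sd(y) / mean(y)       # coefficient of variation of Y
Cx  <- sd(x) / mean(x)       # coefficient of variation of X

threshold <- 0.5 * Cx / Cy
ratio_preferred <- rho > threshold   # TRUE means the ratio estimator is expected to gain

c(rho = rho, threshold = threshold)
ratio_preferred
```

When ratio_preferred is TRUE, the first-order theory suggests the ratio estimator has smaller variance than the sample mean; a simulation is still the more direct check for small n.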
Q. Prove that Hartley-Ross is an Unbiased Ratio Estimator
$$r_{HR} = \bar{r} + \frac{n(N - 1)}{N(n - 1)}\,\frac{\bar{y} - \bar{r}\,\bar{x}}{\bar{X}}$$
Answer:
 Hartley-Ross Estimator
 The HR estimator is
$$r_{HR} = \bar{r} + \frac{n(N - 1)}{N(n - 1)}\,\frac{\bar{y} - \bar{r}\,\bar{x}}{\bar{X}}, \qquad \text{where } \bar{r} = \frac{1}{n}\sum_{i=1}^{n}\frac{y_i}{x_i} = \frac{1}{n}\sum_{i=1}^{n} r_i$$
 The variance of the HR estimator is
$$Var(r_{HR}) \approx \frac{1}{n\bar{X}^2}\left[S_Y^2 + R^2 S_X^2 - 2R\rho\,S_Y S_X\right]$$
• We know that
$$E(\bar{r}) = E\left(\sum_{i=1}^{n}\frac{r_i}{n}\right) = E(r_i)$$
Since $E(x_i) = \bar{X}$ and $R = \bar{Y}/\bar{X}$,
$$E(\bar{r}) - R = \frac{E(r_i)E(x_i)}{\bar{X}} - \frac{\bar{Y}}{\bar{X}} = -\frac{1}{\bar{X}}\left[\bar{Y} - E(r_i)E(x_i)\right]$$
Because $\bar{Y} = E(y_i) = E\left(\frac{y_i}{x_i}\,x_i\right) = E(r_i x_i)$,
$$E(\bar{r}) - R = -\frac{1}{\bar{X}}\left[E(r_i x_i) - E(r_i)E(x_i)\right] = -\frac{1}{\bar{X}}\,cov(r_i, x_i)$$
$$E(\bar{r}) - R = -\frac{1}{\bar{X}}\,\frac{1}{N}\sum_{i=1}^{N}\left(r_i - E(r_i)\right)(x_i - \bar{X}) = -\frac{1}{\bar{X}}\,\frac{N - 1}{N}\,\frac{1}{N - 1}\sum_{i=1}^{N}\left(r_i - E(r_i)\right)(x_i - \bar{X})$$
$$E(\bar{r}) - R = -\frac{1}{\bar{X}}\,\frac{N - 1}{N}\,E(s_{rx})$$
 Hartley-Ross Estimator
The sample covariance between $r_i$ and $x_i$ is
$$s_{rx} = \frac{1}{n - 1}\left[\sum_{i=1}^{n} r_i x_i - \frac{\sum_{i=1}^{n} r_i \sum_{i=1}^{n} x_i}{n}\right] = \frac{1}{n - 1}\left[\sum_{i=1}^{n} r_i x_i - n\,\bar{r}\,\bar{x}\right]$$
$$s_{rx} = \frac{1}{n - 1}\left[\sum_{i=1}^{n} y_i - n\,\bar{x}\,\bar{r}\right] = \frac{n}{n - 1}\left(\bar{y} - \bar{r}\,\bar{x}\right)$$
Therefore
$$E(\bar{r}) - R = -\frac{1}{\bar{X}}\,\frac{N - 1}{N}\,E(s_{rx}) = -\frac{1}{\bar{X}}\,\frac{N - 1}{N}\,E\left[\frac{n}{n - 1}\left(\bar{y} - \bar{r}\,\bar{x}\right)\right]$$
 HR Unbiased Ratio Estimator
$$E\left[\bar{r} + \frac{1}{\bar{X}}\,\frac{N - 1}{N}\,\frac{n}{n - 1}\left(\bar{y} - \bar{r}\,\bar{x}\right)\right] = R$$
Since E(Estimator) = Parameter, the estimator
$$r_{HR} = \bar{r} + \frac{n(N - 1)}{N(n - 1)}\,\frac{\bar{y} - \bar{r}\,\bar{x}}{\bar{X}}$$
is unbiased for R.
Q. Prove that regression estimator is an unbiased estimator.
Ans:
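$$\bar{y}_{reg} = \bar{y} + \hat{\beta}_{yx}(\bar{X} - \bar{x})$$
Treating $\hat{\beta}_{yx}$ as a constant,
$$E(\bar{y}_{reg}) = E(\bar{y}) + \hat{\beta}_{yx}\,E(\bar{X} - \bar{x}) = \bar{Y} + \hat{\beta}_{yx}(\bar{X} - \bar{X})$$
$$E(\bar{y}_{reg}) = \bar{Y}$$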
Q. Find variance of Regression Estimator.
Ans
$$\bar{y}_{reg} = \bar{y} + \hat{\beta}_{yx}(\bar{X} - \bar{x})$$
Writing $\bar{y} = \bar{Y}(1 + e_0)$ and $\bar{x} = \bar{X}(1 + e_1)$,
$$\bar{y}_{reg} = \bar{Y}(1 + e_0) + \hat{\beta}_{yx}\left(\bar{X} - \bar{X}(1 + e_1)\right) = \bar{Y} + \bar{Y}e_0 - \hat{\beta}_{yx}\bar{X}e_1$$
$$E(\bar{y}_{reg} - \bar{Y})^2 = E\left(\bar{Y}e_0 - \hat{\beta}_{yx}\bar{X}e_1\right)^2$$
$$V(\bar{y}_{reg}) = E(\bar{Y}e_0)^2 + E(\hat{\beta}_{yx}\bar{X}e_1)^2 - 2E(\bar{Y}e_0\,\hat{\beta}_{yx}\bar{X}e_1)$$
$$V(\bar{y}_{reg}) = \theta\left[\bar{Y}^2 C_y^2 + \hat{\beta}_{yx}^2\bar{X}^2 C_x^2 - 2\bar{Y}\bar{X}\hat{\beta}_{yx}\,\rho\,C_y C_x\right]$$
Substituting $\hat{\beta}_{yx} = \rho\,S_y/S_x$,
$$V(\bar{y}_{reg}) = \theta\left[\bar{Y}^2 C_y^2 + \rho^2\frac{S_y^2}{S_x^2}\bar{X}^2 C_x^2 - 2\bar{Y}\bar{X}\,\rho\frac{S_y}{S_x}\,\rho\,C_y C_x\right] = \theta\left[S_y^2 + \rho^2 S_y^2 - 2\rho^2 S_y^2\right]$$
$$V(\bar{y}_{reg}) = \theta\,S_y^2\left(1 - \rho^2\right)$$
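As a small illustration of the regression estimator itself, the sketch below computes it for one simple random sample from R's built-in trees data. The helper name reg_estimate is hypothetical; the slope is estimated from the sample as S_yx / S_x^2.

```r
# Hypothetical helper: regression estimate of the population mean of y,
# given sample values ys, xs and the known population mean Xbar of x.
reg_estimate <- function(ys, xs, Xbar) {
  b <- cov(ys, xs) / var(xs)            # estimated slope S_yx / S_x^2
  mean(ys) + b * (Xbar - mean(xs))      # ybar + b * (Xbar - xbar)
}

y <- trees$Volume
x <- trees$Girth

set.seed(1)
s <- sample(seq_along(y), 10)           # SRSWOR of size 10
est <- reg_estimate(y[s], x[s], mean(x))

# If the whole population is "sampled", xbar equals Xbar and the
# correction term vanishes, so the estimate equals the population mean.
full <- reg_estimate(y, x, mean(x))
c(estimate = est, population_mean = mean(y))
```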
Q.Find expected value and MSE of ratio estimator by taking all possible samples of size 2
from the following population.
Unit No. 1 2 3 4
Yi 5 12 15 28
Xi 11 21 32 14
Answer:
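 The following is an artificial population with four units.
Unit No.   1    2    3    4
Yi         5   12   15   28
Xi        11   21   32   14
 All possible samples of size 2 are taken and, for each sample, the ratio estimator
$$\bar{y}_r = \frac{\bar{y}}{\bar{x}}\,\bar{X}, \qquad \bar{X} = 19.5,$$
is calculated.
Sample   ybar   xbar   ybar/xbar   yr
1,2       8.5   16.0   0.53        10.36
1,3      10.0   21.5   0.47         9.07
1,4      16.5   12.5   1.32        25.74
2,3      13.5   26.5   0.51         9.93
2,4      20.0   17.5   1.14        22.29
3,4      21.5   23.0   0.93        18.23
 Each sample has probability P(s) = 1/6 and the population mean is 15, so
$$E(\bar{y}_r) = \frac{1}{6}\sum_{i=1}^{6}\bar{y}_{ri} = 15.94$$
$$MSE(\bar{y}_r) = \sum_{i=1}^{6}\left(\bar{y}_{ri} - \bar{Y}\right)^2 P(s) = \frac{1}{6}\sum_{i=1}^{6}\left(\bar{y}_{ri} - 15\right)^2$$
$$MSE(\bar{y}_r) = 43.54$$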
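The enumeration over all possible samples can be reproduced with a few lines of R. This is a sketch using combn to list the six samples; the object names are illustrative.

```r
# Enumerate all samples of size 2 from the four-unit population and
# evaluate the ratio estimator of the mean on each of them.
Y <- c(5, 12, 15, 28)
X <- c(11, 21, 32, 14)
Xbar <- mean(X)                 # 19.5
Ybar <- mean(Y)                 # 15

samples <- combn(4, 2)          # each column is one sample
yr <- apply(samples, 2, function(s) mean(Y[s]) / mean(X[s]) * Xbar)

E_yr   <- mean(yr)              # expected value (equal probabilities 1/6)
MSE_yr <- mean((yr - Ybar)^2)   # mean square error about the population mean
round(c(E = E_yr, MSE = MSE_yr), 2)
```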
Q. Read the data from the following commands and select 10000 samples by simple random
sampling without replacement. Find the ratio estimator from each sample. Further, find the mean
of the ratio estimators and compare it with the mean per unit estimator.
y <- trees$Volume
zi <- trees$Girth
Answer:
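 Example
$$\bar{y}_r = \frac{\bar{y}}{\bar{x}}\,\bar{X}$$
 Defining Population
#----Defining Population
y <- trees$Volume
x <- trees$Girth
N <- 31; n <- 4
mux <- mean(x)                      # known population mean of the auxiliary variable
m <- c(); r <- c(); mratio <- c()   # m was not initialised in the original
 Relative Efficiency
#----repeating 10000 times
for (i in 1:10000) {
  s <- sample(1:N, n)               # SRSWOR
  m[i] <- mean(y[s])                # mean per unit estimator
  r[i] <- mean(y[s]) / mean(x[s])
  mratio[i] <- r[i] * mux           # ratio estimator of the mean
}
mean(m); mean(mratio)               # compare both with mean(y)
var(m); var(mratio)
re <- var(m) / var(mratio); re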
Q. Generate a population of size 1000 using the bivariate normal distribution for
$(Y, X)$ with mean vector $\mu = (2, 2)$ and variance-covariance matrix
$$\Sigma = \begin{pmatrix} 1 & 0.85 \\ 0.85 & 1 \end{pmatrix}$$
Consider the sample size $n = 10$. Select 10,000 random samples by SRSWOR
and calculate
$$t = \bar{y}\,\frac{\bar{X}}{\bar{x}}$$
Further, calculate the MSE of this estimator and its relative efficiency with
respect to the mean per unit estimator $\bar{y}$.
Answer:
library(mvtnorm)
N=1000;ryx=0.85; n=10;
mu=c(2,2); # vector of means
# variance covariance matrix is given below.
sig=matrix(c(1,ryx,ryx,1),ncol=2);
pop=rmvnorm(N,mu,sig);
x=pop[,2];y=pop[,1];
data=data.frame(x,y);
plot(y,x)
mux=mean(x); # population mean of x (was undefined in the original)
 Simulation Study
r<-c();mratio<-c();m<-c();
for(i in 1:10000)
{s<-sample(1:N,n)
m[i]<-mean(y[s])
r[i] <- mean(y[s])/mean(x[s])
mratio[i] <- r[i]*mux}
var(m);var(mratio)
# MSEs about the population mean of y, and relative efficiency
mse_m<-mean((m-mean(y))^2);mse_r<-mean((mratio-mean(y))^2)
re<-mse_m/mse_r;re


Q. Find Bias of Product Estimator?

Answer:
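 Product Estimator
The ratio estimator is
$$\bar{y}_r = \frac{\bar{y}}{\bar{x}}\,\bar{X}$$
and the product estimator is
$$\bar{y}_p = \frac{\bar{y}\,\bar{x}}{\bar{X}}$$
 Bias of Product Estimator
$$e_0 = \frac{\bar{y} - \bar{Y}}{\bar{Y}}, \qquad e_1 = \frac{\bar{x} - \bar{X}}{\bar{X}}$$
Using these notations, with $\theta = (1 - f)/n$,
$$E(e_0^2) = \theta C_y^2, \qquad E(e_1^2) = \theta C_x^2, \qquad E(e_0 e_1) = \theta C_{yx}$$
 Using Notations
$$\bar{y} = \bar{Y}(1 + e_0), \qquad \bar{x} = \bar{X}(1 + e_1)$$
$$\bar{y}_p = \frac{\bar{Y}(1 + e_0)\,\bar{X}(1 + e_1)}{\bar{X}} = \bar{Y}(1 + e_0)(1 + e_1)$$
 Bias of Product Estimator
$$\bar{y}_p = \bar{Y}(1 + e_0 + e_1 + e_0 e_1)$$
$$\bar{y}_p - \bar{Y} = \bar{Y}(e_0 + e_1 + e_0 e_1)$$
$$Bias(\bar{y}_p) = E(\bar{y}_p - \bar{Y}) = \bar{Y}\,E(e_0 + e_1 + e_0 e_1) = \bar{Y}\,E(e_0 e_1)$$
Since $E(e_0 e_1) = \theta C_{yx}$,
$$Bias(\bar{y}_p) = \theta\,\bar{Y}\,C_{yx}$$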
Q. Find Mean Square Error of Product Estimator?
Answer:
 MSE of Product Estimator
$$\bar{y}_p = \frac{\bar{y}\,\bar{x}}{\bar{X}}$$
$$e_0 = \frac{\bar{y} - \bar{Y}}{\bar{Y}}, \qquad e_1 = \frac{\bar{x} - \bar{X}}{\bar{X}}$$
With $\theta = (1 - f)/n$,
$$E(e_0^2) = \theta C_y^2, \qquad E(e_1^2) = \theta C_x^2, \qquad E(e_0 e_1) = \theta C_{yx}$$
 Using Notations
$$\bar{y}_p = \frac{\bar{Y}(1 + e_0)\,\bar{X}(1 + e_1)}{\bar{X}} = \bar{Y}(1 + e_0)(1 + e_1)$$
 MSE of Product Estimator
$$\bar{y}_p - \bar{Y} = \bar{Y}(e_0 + e_1 + e_0 e_1)$$
$$E(\bar{y}_p - \bar{Y})^2 = \bar{Y}^2 E(e_0 + e_1 + e_0 e_1)^2$$
Keeping terms up to the second order,
$$MSE(\bar{y}_p) \approx \bar{Y}^2 E(e_0 + e_1)^2 = \bar{Y}^2 E\left(e_0^2 + e_1^2 + 2e_0 e_1\right)$$
$$MSE(\bar{y}_p) \approx \theta\,\bar{Y}^2\left[C_y^2 + C_x^2 + 2\rho\,C_y C_x\right]$$
For comparison, the ratio estimator has
$$MSE(\bar{y}_r) \approx \theta\,\bar{Y}^2\left[C_y^2 + C_x^2 - 2\rho\,C_y C_x\right]$$


Q. Find expected value and MSE of product estimator by taking all possible samples of
size 2 from the following population.
Unit No. 1 2 3 4
Yi 5 12 15 28
Xi 32 21 14 11
Answer:
 The following is an artificial population with four units.
Unit No.   1    2    3    4
Yi         5   12   15   28
Xi        32   21   14   11
 All possible samples of size 2 are taken and, for each sample, the product estimator
$$\bar{y}_p = \frac{\bar{y}\,\bar{x}}{\bar{X}}, \qquad \bar{X} = 19.5,$$
is calculated.
 All Possible Samples
Sample   ybar   xbar   yp
1,2       8.5   26.5   11.55
1,3      10.0   23.0   11.79
1,4      16.5   21.5   18.19
2,3      13.5   17.5   12.12
2,4      20.0   16.0   16.41
3,4      21.5   12.5   13.78
 Expected Value
The population mean is 15 and
$$E(\bar{y}_p) = 13.97$$
$$MSE(\bar{y}_p) = \sum_{i=1}^{6}\left(\bar{y}_{pi} - \bar{Y}\right)^2 P(s) = \frac{1}{6}\sum_{i=1}^{6}\left(\bar{y}_{pi} - 15\right)^2$$
$$MSE(\bar{y}_p) = 7.36$$
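The table above can be reproduced by enumerating all samples in R. This is a short sketch using combn; the object names are illustrative.

```r
# Enumerate all samples of size 2 and evaluate the product estimator.
Y <- c(5, 12, 15, 28)
X <- c(32, 21, 14, 11)
Xbar <- mean(X)                 # 19.5
Ybar <- mean(Y)                 # 15

samples <- combn(4, 2)          # each column is one sample
yp <- apply(samples, 2, function(s) mean(Y[s]) * mean(X[s]) / Xbar)

E_yp   <- mean(yp)              # expected value over the six samples
MSE_yp <- mean((yp - Ybar)^2)   # mean square error about the population mean
round(c(E = E_yp, MSE = MSE_yp), 2)
```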
Q. Generate a population of size 1000 using the bivariate normal distribution for
$(Y, X)$ with mean vector $\mu = (2, 2)$ and variance-covariance matrix
$$\Sigma = \begin{pmatrix} 1 & -0.85 \\ -0.85 & 1 \end{pmatrix}$$
Consider the sample size $n = 10$. Select 10,000 random samples by SRSWOR
and calculate
$$t = \bar{y}\,\frac{\bar{x}}{\bar{X}}$$
Further, calculate the MSE of this estimator and its relative efficiency with
respect to the mean per unit estimator $\bar{y}$.
Answer:
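 Defining Population
library(mvtnorm)
N=1000;ryx=-0.85; n=10;
mu=c(2,2); # vector of means (missing in the original)
sig=matrix(c(1,ryx,ryx,1),ncol=2);
pop=rmvnorm(N,mu,sig);
x=pop[,2];y=pop[,1];
data=data.frame(x,y);
mux=mean(x); # population mean of x
 Simulation Study
p<-c();mp<-c();m<-c();
for(i in 1:10000)
{s<-sample(1:N,n)
m[i]<-mean(y[s])
p[i] <- mean(y[s])*mean(x[s])
mp[i] <- p[i]/mux}
var(m);var(mp)
# MSEs about the population mean of y, and relative efficiency
mse_m<-mean((m-mean(y))^2);mse_p<-mean((mp-mean(y))^2)
re<-mse_m/mse_p;re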

Ratio Estimator in Stratified Sampling

Q. Find the expression of Bias of Combined Type Ratio Estimator.

Answer:
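 Combined Type Ratio Estimator in Stratified Sampling
 The combined type ratio estimator is
$$\bar{y}_{rst} = \frac{\bar{y}_{st}}{\bar{x}_{st}}\,\bar{X}$$
 Notations
$$e_{0st} = \frac{\bar{y}_{st} - \bar{Y}}{\bar{Y}}, \qquad e_{1st} = \frac{\bar{x}_{st} - \bar{X}}{\bar{X}}$$
$$E(e_{0st}^2) = \sum_{h=1}^{k} W_h^2\theta_h C_{yh}^2, \qquad E(e_{1st}^2) = \sum_{h=1}^{k} W_h^2\theta_h C_{xh}^2, \qquad E(e_{0st}e_{1st}) = \sum_{h=1}^{k} W_h^2\theta_h C_{yxh},$$
where $C_{yxh} = \rho_{yxh} C_{yh} C_{xh}$.

 Bias of the Estimator
$$\bar{y}_{st} = \bar{Y}(1 + e_{0st}), \qquad \bar{x}_{st} = \bar{X}(1 + e_{1st})$$
$$\bar{y}_{rst} = \frac{\bar{Y}(1 + e_{0st})}{\bar{X}(1 + e_{1st})}\,\bar{X} = \bar{Y}(1 + e_{0st})(1 + e_{1st})^{-1}$$
$$\bar{y}_{rst} \approx \bar{Y}(1 + e_{0st})(1 - e_{1st} + e_{1st}^2)$$
$$\bar{y}_{rst} - \bar{Y} \approx \bar{Y}\left(e_{0st} - e_{1st} - e_{0st}e_{1st} + e_{1st}^2\right)$$
Taking expectations,
$$Bias(\bar{y}_{rst}) \approx \bar{Y}\sum_{h=1}^{k} W_h^2\theta_h\left[C_{xh}^2 - \rho_{yxh} C_{yh} C_{xh}\right]$$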
Q. Find the expression of MSE of Combined Type Ratio Estimator?
Answer:
 Combined Ratio Estimator in Stratified Sampling
$$\bar{y}_{rst} = \frac{\bar{y}_{st}}{\bar{x}_{st}}\,\bar{X}$$
 Notations
$$e_{0st} = \frac{\bar{y}_{st} - \bar{Y}}{\bar{Y}}, \qquad e_{1st} = \frac{\bar{x}_{st} - \bar{X}}{\bar{X}}$$
$$E(e_{0st}^2) = \sum_{h=1}^{k} W_h^2\theta_h C_{yh}^2, \qquad E(e_{1st}^2) = \sum_{h=1}^{k} W_h^2\theta_h C_{xh}^2, \qquad E(e_{0st}e_{1st}) = \sum_{h=1}^{k} W_h^2\theta_h C_{yxh},$$
where $C_{yxh} = \rho_{yxh} C_{yh} C_{xh}$.

 MSE of Combined Type Ratio Estimator
$$\bar{y}_{st} = \bar{Y}(1 + e_{0st}), \qquad \bar{x}_{st} = \bar{X}(1 + e_{1st})$$
$$\bar{y}_{rst} = \frac{\bar{Y}(1 + e_{0st})}{\bar{X}(1 + e_{1st})}\,\bar{X} = \bar{Y}(1 + e_{0st})(1 + e_{1st})^{-1}$$
To the first order of approximation,
$$\bar{y}_{rst} \approx \bar{Y}(1 + e_{0st} - e_{1st})$$
$$\bar{y}_{rst} - \bar{Y} \approx \bar{Y}(e_{0st} - e_{1st})$$
$$E(\bar{y}_{rst} - \bar{Y})^2 \approx \bar{Y}^2 E(e_{0st} - e_{1st})^2 = \bar{Y}^2\left[E(e_{0st}^2) + E(e_{1st}^2) - 2E(e_{0st}e_{1st})\right]$$
$$MSE(\bar{y}_{rst}) \approx \bar{Y}^2\sum_{h=1}^{k} W_h^2\theta_h\left[C_{yh}^2 + C_{xh}^2 - 2C_{yxh}\right]$$
 Example
 Consider a population of size 700 consisting of three strata such that N1=100, N2=250
and N3=350. The required sample size is 18.
 The population consists of two variables, Y and X.
 The population means of the variables Y and X are 15 and 62.14, respectively.
 The sample sizes from stratum-1, stratum-2 and stratum-3 are arbitrarily fixed at 4, 8
and 6, respectively.
 The sample (y,x) chosen from each stratum is
Stratum-1   Stratum-2   Stratum-3
1,22        7,39        23,92
3,29        12,55       14,65
2,25        8,42        20,84
5,32        5,30        22,89
            11,51       24,96
            10,49       17,68
            9,45
            12,54
 Considering the Variable Y
       Stratum-1   Stratum-2   Stratum-3
mean   2.75        9.25        20
Nh     100         250         350
Sh     1.708       2.493       3.847
nh     4           8           6
 Considering the Variable X
       Stratum-1   Stratum-2   Stratum-3
mean   27          45.63       82.33
Nh     100         250         350
Sh     4.397       8.450       12.91
nh     4           8           6
 Sample Means
• Sample mean of variable Y
$$\bar{y}_{st} = \sum_{h=1}^{k} W_h\bar{y}_h = \sum_{h=1}^{k} N_h\bar{y}_h / N = \frac{1}{N}\left(N_1\bar{y}_1 + N_2\bar{y}_2 + N_3\bar{y}_3\right) = 13.70$$
• Sample mean of variable X
$$\bar{x}_{st} = \sum_{h=1}^{k} W_h\bar{x}_h = \sum_{h=1}^{k} N_h\bar{x}_h / N = \frac{1}{N}\left(N_1\bar{x}_1 + N_2\bar{x}_2 + N_3\bar{x}_3\right) = 61.32$$
 Combined Ratio Estimator
$$\bar{y}_{rst} = \frac{\bar{y}_{st}}{\bar{x}_{st}}\,\bar{X} = \frac{13.70}{61.32}\times 62.14$$
$$\bar{y}_{rst} = 13.88$$
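The arithmetic of this example can be checked in R. The sketch below enters the stratum samples listed above and computes the stratified means and the combined ratio estimate; the object names are illustrative.

```r
# Stratum samples (y, x) from the example.
strata <- list(
  data.frame(y = c(1, 3, 2, 5),
             x = c(22, 29, 25, 32)),
  data.frame(y = c(7, 12, 8, 5, 11, 10, 9, 12),
             x = c(39, 55, 42, 30, 51, 49, 45, 54)),
  data.frame(y = c(23, 14, 20, 22, 24, 17),
             x = c(92, 65, 84, 89, 96, 68))
)
Nh   <- c(100, 250, 350)
W    <- Nh / sum(Nh)            # stratum weights
Xbar <- 62.14                   # known population mean of X

ybar_h <- sapply(strata, function(d) mean(d$y))
xbar_h <- sapply(strata, function(d) mean(d$x))

yst  <- sum(W * ybar_h)         # stratified mean of y
xst  <- sum(W * xbar_h)         # stratified mean of x
yrst <- yst / xst * Xbar        # combined ratio estimate
round(c(yst = yst, xst = xst, yrst = yrst), 2)
```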
Q. Find the expression of Bias and MSE of Separate Type Ratio Estimator in stratified
sampling.
Answer:
 Separate Type Ratio Estimator
 The separate type ratio estimator is
$$\bar{y}_{rsst} = \sum_{h=1}^{k} W_h\bar{y}_{rh} = \sum_{h=1}^{k} W_h\frac{\bar{y}_h}{\bar{x}_h}\,\bar{X}_h$$
Without stratification,
$$Bias(\bar{y}_r) \approx \frac{1 - f}{n}\,\bar{Y}\left[C_x^2 - \rho_{yx} C_y C_x\right]$$
• In case of Stratified Sampling
$$Bias(\bar{y}_{rh}) \approx \frac{1 - f_h}{n_h}\,\bar{Y}_h\left[C_{xh}^2 - \rho_{yxh} C_{yh} C_{xh}\right]$$
$$E(\bar{y}_{rh}) \approx \bar{Y}_h + \frac{1 - f_h}{n_h}\,\bar{Y}_h\left[C_{xh}^2 - \rho_{yxh} C_{yh} C_{xh}\right]$$
 Bias of Separate Type Ratio Estimator
$$\bar{y}_{rsst} = \sum_{h=1}^{k} W_h\bar{y}_{rh} \ \Rightarrow\ E(\bar{y}_{rsst}) = \sum_{h=1}^{k} W_h E(\bar{y}_{rh})$$
$$E(\bar{y}_{rsst}) \approx \sum_{h=1}^{k} W_h\left\{\bar{Y}_h + \frac{1 - f_h}{n_h}\,\bar{Y}_h\left[C_{xh}^2 - \rho_{yxh} C_{yh} C_{xh}\right]\right\}$$
$$E(\bar{y}_{rsst}) \approx \bar{Y} + \sum_{h=1}^{k} W_h\frac{1 - f_h}{n_h}\,\bar{Y}_h\left[C_{xh}^2 - \rho_{yxh} C_{yh} C_{xh}\right]$$
Writing $\theta_h = (1 - f_h)/n_h$,
$$Bias(\bar{y}_{rsst}) \approx \sum_{h=1}^{k} W_h\theta_h\bar{Y}_h\left[C_{xh}^2 - \rho_{yxh} C_{yh} C_{xh}\right]$$
 Expression of MSE for Ratio Estimator of Mean
$$MSE(\bar{y}_r) \approx \theta\,\bar{Y}^2\left[C_y^2 + C_x^2 - 2\rho_{yx} C_y C_x\right]$$
$$MSE(\bar{y}_{rh}) \approx \theta_h\bar{Y}_h^2\left[C_{yh}^2 + C_{xh}^2 - 2\rho_{yxh} C_{yh} C_{xh}\right]$$
 MSE of Separate Type Ratio Estimator
$$\bar{y}_{rsst} = \sum_{h=1}^{k} W_h\bar{y}_{rh} \ \Rightarrow\ MSE(\bar{y}_{rsst}) = \sum_{h=1}^{k} W_h^2\,MSE(\bar{y}_{rh})$$
$$MSE(\bar{y}_{rsst}) \approx \sum_{h=1}^{k} W_h^2\theta_h\bar{Y}_h^2\left[C_{yh}^2 + C_{xh}^2 - 2\rho_{yxh} C_{yh} C_{xh}\right]$$
$$MSE(\bar{y}_{rsst}) \approx \sum_{h=1}^{k} W_h^2\theta_h\left[\frac{\sum_{i=1}^{N_h}\left(Y_{hi} - R_h X_{hi}\right)^2}{N_h - 1}\right]$$

Example
 Consider a population of size 700 consisting of three strata such that N1=100,
N2=250 and N3=350. The required sample size is 18.
 The population means of the variables Y and X are 15 and 62.14, respectively.
 The sample sizes from stratum-1, stratum-2 and stratum-3 are arbitrarily fixed at
4, 8 and 6, respectively.
 The population means of stratum-1, stratum-2 and stratum-3 for the variable X are 25, 45 and
85, respectively.
 The sample (y,x) chosen from each stratum is
Stratum-1   Stratum-2   Stratum-3
1,22        7,39        23,92
3,29        12,55       14,65
2,25        8,42        20,84
5,32        5,30        22,89
            11,51       24,96
            10,49       17,68
            9,45
            12,54

 Sample Means
 The stratum sample means of variable Y are 2.75, 9.25 and 20, giving
$$\bar{y}_{st} = \sum_{h=1}^{k} W_h\bar{y}_h = \frac{1}{N}\left(N_1\bar{y}_1 + N_2\bar{y}_2 + N_3\bar{y}_3\right)$$
 The stratum sample means of variable X are 27, 45.63 and 82.33, giving
$$\bar{x}_{st} = \sum_{h=1}^{k} W_h\bar{x}_h = \frac{1}{N}\left(N_1\bar{x}_1 + N_2\bar{x}_2 + N_3\bar{x}_3\right)$$
 Separate Ratio Estimator
$$\bar{y}_{rsst} = \sum_{h=1}^{3} W_h\frac{\bar{y}_h}{\bar{x}_h}\,\bar{X}_h = W_1\frac{\bar{y}_1}{\bar{x}_1}\bar{X}_1 + W_2\frac{\bar{y}_2}{\bar{x}_2}\bar{X}_2 + W_3\frac{\bar{y}_3}{\bar{x}_3}\bar{X}_3 = W_1\bar{y}_{r1} + W_2\bar{y}_{r2} + W_3\bar{y}_{r3}$$
where, for example, $\bar{y}_{r1} = \frac{\bar{y}_1}{\bar{x}_1}\bar{X}_1$. The weights are
W1=0.142857
W2=0.357143
W3=0.5
and the stratum ratio estimates are 2.55, 9.12 and 20.65, so
$$\bar{y}_{rsst} = 0.143 \times 2.55 + 0.357 \times 9.12 + 0.5 \times 20.65 \approx 13.95$$
Q. Find the variance expression of combined Type Regression Estimator?

Answer:
 Regression Estimator
 The regression estimator without stratification is
$$\bar{y}_{reg} = \bar{y} + \hat{\beta}_{yx}(\bar{X} - \bar{x}), \qquad \hat{\beta}_{yx} = \frac{S_{yx}}{S_x^2}$$
 Combined Type Regression Estimator
 The regression estimator under stratified sampling is
$$\bar{y}_{regc} = \bar{y}_{st} + \hat{\beta}_{yx}(\bar{X} - \bar{x}_{st}) = \sum_{h=1}^{k} W_h\bar{y}_h + \hat{\beta}_{yx}\left(\bar{X} - \sum_{h=1}^{k} W_h\bar{x}_h\right)$$
 Expected Value of Estimator
 Taking expected values on both sides, and treating $\hat{\beta}_{yx}$ as a constant,
$$E(\bar{y}_{regc}) = \sum_{h=1}^{k} W_h E(\bar{y}_h) + \hat{\beta}_{yx}\left(\bar{X} - \sum_{h=1}^{k} W_h E(\bar{x}_h)\right)$$
$$E(\bar{y}_{regc}) = \sum_{h=1}^{k} W_h\bar{Y}_h + \hat{\beta}_{yx}\left(\bar{X} - \sum_{h=1}^{k} W_h\bar{X}_h\right)$$
 Expectation of Regression Estimator
$$E(\bar{y}_{regc}) = \bar{Y} + \hat{\beta}_{yx}(\bar{X} - \bar{X})$$
$$E(\bar{y}_{regc}) = \bar{Y}$$
 Variance of Regression Estimator
$$V(\bar{y}_{regc}) = E(\bar{y}_{regc} - \bar{Y})^2 = E\left[\bar{y}_{st} + \hat{\beta}_{yx}(\bar{X} - \bar{x}_{st}) - \bar{Y}\right]^2$$
$$V(\bar{y}_{regc}) = E\left[\sum_{h=1}^{k} W_h\bar{y}_h + \hat{\beta}_{yx}\left(\bar{X} - \sum_{h=1}^{k} W_h\bar{x}_h\right) - \bar{Y}\right]^2$$
$$V(\bar{y}_{regc}) = E\left[\sum_{h=1}^{k} W_h(\bar{y}_h - \bar{Y}_h) - \hat{\beta}_{yx}\sum_{h=1}^{k} W_h(\bar{x}_h - \bar{X}_h)\right]^2$$
 Variance of Combined Type Regression Estimator
Writing $\beta_0$ for the common slope,
$$V(\bar{y}_{regc}) = \sum_{h=1}^{k} W_h^2\theta_h\left[S_{hy}^2 + \beta_0^2 S_{hx}^2 - 2\beta_0 S_{hxy}\right]$$
Q. Find the variance of Separate Type Regression Estimator?

Answer:
 Regression Estimator
 The regression estimator without stratification is
$$\bar{y}_{reg} = \bar{y} + \hat{\beta}_{yx}(\bar{X} - \bar{x}), \qquad \hat{\beta}_{yx} = \frac{S_{yx}}{S_x^2}$$
 Separate Type Regression Estimator
 The combined type regression estimator is
$$\bar{y}_{regc} = \bar{y}_{st} + \hat{\beta}_{yx}(\bar{X} - \bar{x}_{st})$$
 The separate type regression estimator is
$$\bar{y}_{regs} = \sum_{h=1}^{k} W_h\left[\bar{y}_h + \hat{\beta}_{yxh}(\bar{X}_h - \bar{x}_h)\right]$$
 Expectation of Separate Type Regression Estimator
Treating each $\hat{\beta}_{yxh}$ as a constant,
$$E(\bar{y}_{regs}) = \sum_{h=1}^{k} W_h\left[E(\bar{y}_h) + \hat{\beta}_{yxh}\left(\bar{X}_h - E(\bar{x}_h)\right)\right] = \sum_{h=1}^{k} W_h\left[\bar{Y}_h + \hat{\beta}_{yxh}(\bar{X}_h - \bar{X}_h)\right]$$
$$E(\bar{y}_{regs}) = \sum_{h=1}^{k} W_h\bar{Y}_h = \bar{Y}$$
 Variance of Regression Estimator
$$\bar{y}_{regs} = \sum_{h=1}^{k} W_h\bar{y}_h + \sum_{h=1}^{k} W_h\hat{\beta}_{yxh}(\bar{X}_h - \bar{x}_h)$$
$$V(\bar{y}_{regs}) = E(\bar{y}_{regs} - \bar{Y})^2 = E\left[\sum_{h=1}^{k} W_h(\bar{y}_h - \bar{Y}_h) - \sum_{h=1}^{k} W_h\hat{\beta}_{yxh}(\bar{x}_h - \bar{X}_h)\right]^2$$
Since samples are drawn independently in different strata,
$$V(\bar{y}_{regs}) = \sum_{h=1}^{k} W_h^2 E(\bar{y}_h - \bar{Y}_h)^2 + \sum_{h=1}^{k} W_h^2\hat{\beta}_{yxh}^2 E(\bar{x}_h - \bar{X}_h)^2 - 2\sum_{h=1}^{k} W_h^2\hat{\beta}_{yxh} E\left[(\bar{y}_h - \bar{Y}_h)(\bar{x}_h - \bar{X}_h)\right]$$
$$V(\bar{y}_{regs}) = \sum_{h=1}^{k} W_h^2\theta_h\left[S_{hy}^2 + \beta_{yxh}^2 S_{hx}^2 - 2\beta_{yxh} S_{hxy}\right]$$
Q. Simulation Study for Combined Type Ratio Estimator.

Answer:
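 Simulation Study
library(mvtnorm)
#population and sample size.
N=300;N1=100;N2=100;N3=100;n1=15;n2=15;n3=15;n=45;
w1=N1/N;
w2=N2/N;
w3=N3/N;
 Defining Parameters
# mean vectors for stratum 1, 2 and 3.
m1=c(10,5);
m2=c(20,10);
m3=c(30,15)
# variance covariance matrices for stratum 1, 2 and 3.
sig1=matrix(c(1,0.85,0.85,1),ncol=2);
sig2=matrix(c(1,0.80,0.80,1),ncol=2);
sig3=matrix(c(1,0.75,0.75,1),ncol=2);
 Generating Populations
# generate the (y, x) population for stratum 1, 2 and 3.
r1=rmvnorm(N1,m1,sig1);
r2=rmvnorm(N2,m2,sig2);
r3=rmvnorm(N3,m3,sig3);
y1=r1[,1];x1=r1[,2]
y2=r2[,1];x2=r2[,2]
y3=r3[,1];x3=r3[,2]
x<-c(x1,x2,x3)
 Looping
yst=c();xst=c();rst=c();
for(i in 1:5000){
sa1=sample(1:N1,n1); xs1=x1[sa1];ys1=y1[sa1];
sa2=sample(1:N2,n2); xs2=x2[sa2];ys2=y2[sa2];
sa3=sample(1:N3,n3); xs3=x3[sa3];ys3=y3[sa3];
yst[i]=w1*mean(ys1)+w2*mean(ys2)+w3*mean(ys3);
xst[i]=w1*mean(xs1)+w2*mean(xs2)+w3*mean(xs3);
rst[i]=yst[i]*(mean(x)/xst[i])
}
 Means and Variances
# output from one run; values change with the random numbers generated
> mean(yst);mean(xst);mean(rst);
[1] 20.01764
[1] 10.02838
[1] 20.02706

> var(yst);var(rst);var(yst)/var(rst)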
Q. Define the process of simulation study for combined type ratio estimator also find mean
and variance of estimator using following data
N=300;N1=100;N2=100;N3=100;n1=15;n2=15;n3=15;n=45;

m1=c(10,5);
m2=c(20,10);
m3=c(30,15)
#variance covariance matrix for #stratum 1, 2 and 3 given below.
sig1=matrix(c(1,0.85,0.85,1),ncol=2)
Ans.
library(mvtnorm)
N=300;N1=100;N2=100;N3=100;n1=15;n2=15;n3=15;n=45;
w1=N1/N;
w2=N2/N;
w3=N3/N;
m1=c(10,5);
m2=c(20,10);
m3=c(30,15)
# variance covariance matrices for stratum 1, 2 and 3.
# Only sig1 is given in the question; sig2 and sig3 are taken from the previous answer.
sig1=matrix(c(1,0.85,0.85,1),ncol=2)
sig2=matrix(c(1,0.80,0.80,1),ncol=2)
sig3=matrix(c(1,0.75,0.75,1),ncol=2)
# generate the (y, x) population for stratum 1
r1=rmvnorm(N1,m1,sig1)
y1=r1[,1];x1=r1[,2]
# generate the (y, x) population for stratum 2
r2=rmvnorm(N2,m2,sig2)
y2=r2[,1];x2=r2[,2]
# generate the (y, x) population for stratum 3
r3=rmvnorm(N3,m3,sig3)
y3=r3[,1];x3=r3[,2]
x<-c(x1,x2,x3)
yst=c();xst=c();rst=c();
for(i in 1:5000){
sa1=sample(1:N1,n1)
xs1=x1[sa1];ys1=y1[sa1];
sa2=sample(1:N2,n2)
xs2=x2[sa2];ys2=y2[sa2];
sa3=sample(1:N3,n3)
xs3=x3[sa3];ys3=y3[sa3];
yst[i]=w1*mean(ys1)+w2*mean(ys2)+w3*mean(ys3);
xst[i]=w1*mean(xs1)+w2*mean(xs2)+w3*mean(xs3);
rst[i]=yst[i]*(mean(x)/(xst[i]))
}
#Means and Variances
mean(yst);mean(xst);
mean(rst);
var(yst);var(rst);
var(yst)/var(rst)
Q. Define the process of simulation study for separate type ratio estimator. Also find the mean
and variance of the estimator using the following data.
library(mvtnorm)
N=300;N1=100;N2=100;N3=100;n1=15;n2=15;n3=15;n=45;
m1=c(10,5);
m2=c(20,10);
m3=c(30,15)
sig1=matrix(c(1,0.85,0.85,1),ncol=2);
Ans:
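library(mvtnorm)
N=300;N1=100;N2=100;N3=100;n1=15;n2=15;n3=15;n=45;
w1=N1/N;
w2=N2/N;
w3=N3/N;

m1=c(10,5);
m2=c(20,10);
m3=c(30,15)

# Only sig1 is given in the question; sig2 and sig3 are taken from the combined-type answer.
sig1=matrix(c(1,0.85,0.85,1),ncol=2);
sig2=matrix(c(1,0.80,0.80,1),ncol=2);
sig3=matrix(c(1,0.75,0.75,1),ncol=2);

# generate the (y, x) population for each stratum
r1=rmvnorm(N1,m1,sig1); y1=r1[,1];x1=r1[,2]
r2=rmvnorm(N2,m2,sig2); y2=r2[,1];x2=r2[,2]
r3=rmvnorm(N3,m3,sig3); y3=r3[,1];x3=r3[,2]
x<-c(x1,x2,x3)
# rr1, rr2, rr3 hold the stratum ratio estimates (r1, r2, r3 are the population matrices)
rr1=c();rr2=c();rr3=c();rsst=c();yst=c();
for(i in 1:5000){
sa1=sample(1:N1,n1); xs1=x1[sa1];ys1=y1[sa1];
rr1[i]=mean(ys1)*(mean(x1)/mean(xs1))
sa2=sample(1:N2,n2); xs2=x2[sa2];ys2=y2[sa2];
rr2[i]=mean(ys2)*(mean(x2)/mean(xs2))
sa3=sample(1:N3,n3); xs3=x3[sa3];ys3=y3[sa3];
rr3[i]=mean(ys3)*(mean(x3)/mean(xs3))
rsst[i]=w1*rr1[i]+w2*rr2[i]+w3*rr3[i];
yst[i]=w1*mean(ys1)+w2*mean(ys2)+w3*mean(ys3);
}
#Means and Variances
mean(rsst);mean(yst);
var(yst);var(rsst);var(yst)/var(rsst)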

Double Sampling

Introduction:
Ratio and regression methods of estimation require the knowledge of population mean of auxiliary
variable in advance. An estimate of mean of auxiliary variable from a large sample may be used. This
procedure of selecting a large sample for collecting information on auxiliary variable x and then selecting
a subsample from it for collecting the information on the study variable y is called double sampling or
two-phase sampling.
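The estimate is calculated by taking the sample in two phases. The expected value of the statistic
is
$$E(t) = E_1\left[E_2(t)\right]$$
The variance of the statistic is
$$Var(t) = E\left[t - E(t)\right]^2 = E\left[\left(t - E_2(t)\right) + \left(E_2(t) - E(t)\right)\right]^2$$
The cross-product term vanishes, so
$$Var(t) = E\left[t - E_2(t)\right]^2 + E\left[E_2(t) - E(t)\right]^2$$
$$Var(t) = E_1 E_2\left[t - E_2(t)\right]^2 + E_1 E_2\left[E_2(t) - E(t)\right]^2$$
$$Var(t) = E_1\left[V_2(t)\right] + E_1\left[E_2(t) - E_1\left(E_2(t)\right)\right]^2$$
$$Var(t) = E_1\left[V_2(t)\right] + V_1\left[E_2(t)\right]$$
The Ratio Estimator for population mean
$$\bar{y}_r = \frac{\bar{y}}{\bar{x}}\,\bar{X}$$
The Ratio Estimator for Double Sampling
$$\bar{y}_{rd} = \frac{\bar{y}_2}{\bar{x}_2}\,\bar{x}_1$$
Notations
$$e_0 = \frac{\bar{y}_2 - \bar{Y}}{\bar{Y}}, \qquad e_1 = \frac{\bar{x}_1 - \bar{X}}{\bar{X}}, \qquad e_2 = \frac{\bar{x}_2 - \bar{X}}{\bar{X}}$$
Here the subscript 1 refers to the first-phase sample, the subscript 2 to the second-phase sample, and $\theta_1$, $\theta_2$ are the corresponding sampling factors.
$$E(e_0) = 0, \quad E(e_0^2) = \theta_2 C_y^2,$$
$$E(e_1) = 0, \quad E(e_1^2) = \theta_1 C_x^2,$$
$$E(e_2) = 0, \quad E(e_2^2) = \theta_2 C_x^2,$$
$$E(e_1 e_2) = \theta_1 C_x^2,$$
$$E(e_0 e_1) = \theta_1\rho_{yx} C_y C_x, \quad E(e_0 e_2) = \theta_2\rho_{yx} C_y C_x$$
Prove that $E(e_1 e_2) = \theta_1 C_x^2$
Proof:
$$e_1 e_2 = \frac{(\bar{x}_1 - \bar{X})(\bar{x}_2 - \bar{X})}{\bar{X}^2} \ \Rightarrow\ E(e_1 e_2) = \frac{E\left[(\bar{x}_1 - \bar{X})(\bar{x}_2 - \bar{X})\right]}{\bar{X}^2} = \frac{E_1 E_2\left[(\bar{x}_1 - \bar{X})(\bar{x}_2 - \bar{X})\right]}{\bar{X}^2}$$
Given the first-phase sample, $E_2(\bar{x}_2) = \bar{x}_1$, so
$$E(e_1 e_2) = \frac{E_1\left[(\bar{x}_1 - \bar{X})(\bar{x}_1 - \bar{X})\right]}{\bar{X}^2} = \frac{E_1(\bar{x}_1 - \bar{X})^2}{\bar{X}^2} = \frac{\theta_1 S_x^2}{\bar{X}^2}$$
$$E(e_1 e_2) = \theta_1 C_x^2$$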
Q. Find Bias of Ratio Estimator in Double Sampling.

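$$\bar{y}_{rd} = \frac{\bar{y}_2}{\bar{x}_2}\,\bar{x}_1$$
$$e_0 = \frac{\bar{y}_2 - \bar{Y}}{\bar{Y}}, \qquad e_1 = \frac{\bar{x}_1 - \bar{X}}{\bar{X}}, \qquad e_2 = \frac{\bar{x}_2 - \bar{X}}{\bar{X}}$$
$$\bar{y}_{rd} = \frac{\bar{Y}(1 + e_0)}{\bar{X}(1 + e_2)}\,\bar{X}(1 + e_1) = \frac{\bar{Y}(1 + e_0)}{(1 + e_2)}(1 + e_1)$$
The Bias of Ratio Estimator in Double Sampling
$$\bar{y}_{rd} = \bar{Y}(1 + e_0)(1 + e_1)(1 + e_2)^{-1}$$
$$\bar{y}_{rd} = \bar{Y}(1 + e_0)(1 + e_1)(1 - e_2 + e_2^2 - \dots)$$
$$\bar{y}_{rd} \approx \bar{Y}(1 + e_0)(1 + e_1)(1 - e_2 + e_2^2)$$
$$\bar{y}_{rd} \approx \bar{Y}(1 + e_0)(1 + e_1 - e_2 - e_1 e_2 + e_2^2 + e_1 e_2^2)$$
Dropping terms of order higher than two,
$$\bar{y}_{rd} \approx \bar{Y}(1 + e_0)(1 + e_1 - e_2 - e_1 e_2 + e_2^2)$$
$$\bar{y}_{rd} \approx \bar{Y} + \bar{Y}(e_1 - e_2 - e_1 e_2 + e_2^2 + e_0 + e_0 e_1 - e_0 e_2)$$
$$E(\bar{y}_{rd} - \bar{Y}) \approx \bar{Y}\,E(e_1 - e_2 - e_1 e_2 + e_2^2 + e_0 + e_0 e_1 - e_0 e_2)$$
$$E(\bar{y}_{rd} - \bar{Y}) \approx \bar{Y}\left[\theta_2 C_x^2 - \theta_1 C_x^2 + \theta_1\rho\,C_y C_x - \theta_2\rho\,C_y C_x\right]$$
$$E(\bar{y}_{rd} - \bar{Y}) \approx \bar{Y}\left[(\theta_2 - \theta_1) C_x^2 - (\theta_2 - \theta_1)\rho\,C_y C_x\right]$$
$$Bias(\bar{y}_{rd}) \approx \bar{Y}(\theta_2 - \theta_1)\left[C_x^2 - \rho\,C_y C_x\right]$$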
Q.Find MSE of Ratio Estimator in Double Sampling.
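$$\bar{y}_{rd} = \frac{\bar{y}_2}{\bar{x}_2}\,\bar{x}_1$$
$$e_0 = \frac{\bar{y}_2 - \bar{Y}}{\bar{Y}}, \qquad e_1 = \frac{\bar{x}_1 - \bar{X}}{\bar{X}}, \qquad e_2 = \frac{\bar{x}_2 - \bar{X}}{\bar{X}}$$
$$\bar{y}_{rd} = \frac{\bar{Y}(1 + e_0)}{\bar{X}(1 + e_2)}\,\bar{X}(1 + e_1) = \frac{\bar{Y}(1 + e_0)}{(1 + e_2)}(1 + e_1)$$
MSE of Ratio Estimator in Double Sampling
$$\bar{y}_{rd} = \bar{Y}(1 + e_0)(1 + e_1)(1 + e_2)^{-1}$$
$$\bar{y}_{rd} = \bar{Y}(1 + e_0)(1 + e_1)(1 - e_2 + e_2^2 - \dots)$$
To the first order of approximation,
$$\bar{y}_{rd} \approx \bar{Y} + \bar{Y}(e_1 - e_2 + e_0)$$
$$E(\bar{y}_{rd} - \bar{Y})^2 \approx \bar{Y}^2 E(e_1 - e_2 + e_0)^2$$
$$E(\bar{y}_{rd} - \bar{Y})^2 \approx \bar{Y}^2 E\left[e_1^2 + e_2^2 + e_0^2 + 2e_0 e_1 - 2e_0 e_2 - 2e_1 e_2\right]$$
$$MSE(\bar{y}_{rd}) \approx \bar{Y}^2\left[\theta_2 C_y^2 + (\theta_2 - \theta_1)\left(C_x^2 - 2\rho_{xy} C_x C_y\right)\right]$$
$$MSE(\bar{y}_{rd}) \approx \theta_2\bar{Y}^2\left[C_y^2 + C_x^2 - 2\rho_{xy} C_x C_y\right] - \theta_1\bar{Y}^2\left[C_x^2 - 2\rho_{xy} C_x C_y\right]$$
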

Q. Find Bias of Product Estimator in Double Sampling.

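The Product Estimator is
$$\bar{y}_p = \bar{y}\,\frac{\bar{x}}{\bar{X}}$$
The Product Estimator in Double Sampling is
$$\bar{y}_{pd} = \frac{\bar{y}_2\,\bar{x}_2}{\bar{x}_1}$$
$$e_0 = \frac{\bar{y}_2 - \bar{Y}}{\bar{Y}}, \qquad e_1 = \frac{\bar{x}_1 - \bar{X}}{\bar{X}}, \qquad e_2 = \frac{\bar{x}_2 - \bar{X}}{\bar{X}}$$
$$\bar{y}_2 = \bar{Y}(1 + e_0), \qquad \bar{x}_1 = \bar{X}(1 + e_1), \qquad \bar{x}_2 = \bar{X}(1 + e_2)$$
Notations
$$E(e_0) = 0, \quad E(e_0^2) = \theta_2 C_y^2, \quad E(e_1) = 0, \quad E(e_1^2) = \theta_1 C_x^2, \quad E(e_2) = 0, \quad E(e_2^2) = \theta_2 C_x^2$$
$$E(e_1 e_2) = \theta_1 C_x^2, \quad E(e_0 e_1) = \theta_1\rho_{yx} C_y C_x, \quad E(e_0 e_2) = \theta_2\rho_{yx} C_y C_x$$
Bias of Product Estimator in Double Sampling
$$\bar{y}_{pd} = \frac{\bar{y}_2}{\bar{x}_1}\,\bar{x}_2 = \frac{\bar{Y}(1 + e_0)}{\bar{X}(1 + e_1)}\,\bar{X}(1 + e_2) = \frac{\bar{Y}(1 + e_0)}{(1 + e_1)}(1 + e_2)$$
$$\bar{y}_{pd} = \bar{Y}(1 + e_0)(1 + e_2)(1 + e_1)^{-1}$$
$$\bar{y}_{pd} = \bar{Y}(1 + e_0)(1 + e_2)(1 - e_1 + e_1^2 - \dots)$$
$$\bar{y}_{pd} \approx \bar{Y}(1 + e_0)(1 + e_2)(1 - e_1 + e_1^2)$$
$$\bar{y}_{pd} \approx \bar{Y}(1 + e_0)(1 + e_2 - e_1 - e_1 e_2 + e_1^2 + e_2 e_1^2)$$
Dropping terms of order higher than two,
$$\bar{y}_{pd} \approx \bar{Y}(1 + e_0)(1 + e_2 - e_1 - e_1 e_2 + e_1^2)$$
$$\bar{y}_{pd} \approx \bar{Y} + \bar{Y}(e_2 - e_1 - e_1 e_2 + e_1^2 + e_0 + e_0 e_2 - e_0 e_1)$$
$$E(\bar{y}_{pd} - \bar{Y}) \approx \bar{Y}\,E(e_2 - e_1 - e_1 e_2 + e_1^2 + e_0 + e_0 e_2 - e_0 e_1)$$
$$E(\bar{y}_{pd} - \bar{Y}) \approx \bar{Y}\left[-\theta_1 C_x^2 + \theta_1 C_x^2 + \theta_2\rho\,C_y C_x - \theta_1\rho\,C_y C_x\right]$$
$$Bias(\bar{y}_{pd}) \approx \bar{Y}(\theta_2 - \theta_1)\rho\,C_y C_x$$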
Q. Find MSE of Product Estimator in Double Sampling.

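The Product Estimator for Double Sampling
$$\bar{y}_{pd} = \frac{\bar{y}_2}{\bar{x}_1}\,\bar{x}_2$$
$$e_0 = \frac{\bar{y}_2 - \bar{Y}}{\bar{Y}}, \qquad e_1 = \frac{\bar{x}_1 - \bar{X}}{\bar{X}}, \qquad e_2 = \frac{\bar{x}_2 - \bar{X}}{\bar{X}}$$
$$\bar{y}_{pd} = \frac{\bar{Y}(1 + e_0)}{\bar{X}(1 + e_1)}\,\bar{X}(1 + e_2) = \frac{\bar{Y}(1 + e_0)}{(1 + e_1)}(1 + e_2)$$
The MSE of Product Estimator in Double Sampling
$$\bar{y}_{pd} = \bar{Y}(1 + e_0)(1 + e_2)(1 + e_1)^{-1}$$
$$\bar{y}_{pd} = \bar{Y}(1 + e_0)(1 + e_2)(1 - e_1 + \dots)$$
To the first order of approximation,
$$\bar{y}_{pd} \approx \bar{Y} + \bar{Y}(e_2 - e_1 + e_0)$$
$$E(\bar{y}_{pd} - \bar{Y})^2 \approx \bar{Y}^2 E(e_2 - e_1 + e_0)^2$$
$$E(\bar{y}_{pd} - \bar{Y})^2 \approx \bar{Y}^2 E\left[e_1^2 + e_2^2 + e_0^2 + 2e_0 e_2 - 2e_0 e_1 - 2e_1 e_2\right]$$
$$MSE(\bar{y}_{pd}) \approx \bar{Y}^2\left[\theta_2 C_y^2 + (\theta_2 - \theta_1)\left(C_x^2 + 2\rho_{xy} C_x C_y\right)\right]$$
$$MSE(\bar{y}_{pd}) \approx \theta_2\bar{Y}^2\left[C_y^2 + C_x^2 + 2\rho_{xy} C_x C_y\right] - \theta_1\bar{Y}^2\left[C_x^2 + 2\rho_{xy} C_x C_y\right]$$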
Q. Generate a population of size 1000 using the bivariate normal distribution for
$(Y, X)$ with mean vector $\mu = (10, 2)$ and variance-covariance matrix
$$\Sigma = \begin{pmatrix} 1 & 0.85 \\ 0.85 & 1 \end{pmatrix}$$
Consider sample sizes n1=100;n2=20. Select 10,000 random samples by double sampling and
calculate
$$t = \bar{y}_2\,\frac{\bar{x}_1}{\bar{x}_2}$$
Further, calculate the MSE of the above estimator.
Ans:
library(mvtnorm)
N=1000;ryx=0.85; n1=100;n2=20;
mu=c(10,2); # vector of means of (Y, X), missing in the original
sig=matrix(c(1,ryx,ryx,1),ncol=2);
pop=rmvnorm(N,mu,sig);
x=pop[,2];y=pop[,1];
r<-c();mratio<-c();m<-c();
for(i in 1:10000)
{
s1<-sample(1:N,n1)   # first-phase sample
s2<-sample(s1,n2)    # second-phase subsample
m[i]<-mean(x[s1])
r[i] <- mean(y[s2])/mean(x[s2])
mratio[i] <- r[i]*m[i]
}
mean(mratio)
var(mratio)
mean((mratio-mean(y))^2) # MSE about the population mean of y


Two Stage Sampling

Q. Prove that the sample mean is an unbiased estimator of the population mean in two stage sampling
$$\bar{y} = \sum_{i=1}^{n}\sum_{j=1}^{m}\frac{y_{ij}}{nm}$$
Ans:
The estimate is calculated by taking the sample in two stages.
The expected value and variance of the statistic are
$$E(t) = E_1\left[E_2(t)\right]$$
$$Var(t) = E_1\left[V_2(t)\right] + V_1\left[E_2(t)\right]$$
Expected Value of Mean
$$E(\bar{y}) = E_1\left[E_2(\bar{y})\right] = E_1\left[E_2\left(\sum_{i=1}^{n}\sum_{j=1}^{m}\frac{y_{ij}}{nm}\right)\right]$$
$$E(\bar{y}) = E_1\left[\sum_{i=1}^{n}\frac{1}{n}\sum_{j=1}^{m}\frac{E_2(y_{ij})}{m}\right]$$
Within the i-th selected first stage unit, $E_2(y_{ij}) = \sum_{j=1}^{M} y_{ij}/M = \bar{Y}_i$, so
$$E(\bar{y}) = E_1\left[\sum_{i=1}^{n}\frac{1}{n}\,\bar{Y}_i\right] = \sum_{i=1}^{n}\frac{1}{n}\,E_1(\bar{Y}_i) = E_1(\bar{Y}_i)$$
$$E(\bar{y}) = \sum_{i=1}^{N}\frac{\bar{Y}_i}{N} \ \Rightarrow\ E(\bar{y}) = \bar{Y}$$
Q. Find the variance of the sample mean in two stage sampling.
$$\bar{y} = \sum_{i=1}^{n}\sum_{j=1}^{m}\frac{y_{ij}}{nm}$$
Ans:
$$\bar{y} = \sum_{i=1}^{n}\sum_{j=1}^{m}\frac{y_{ij}}{nm} = \frac{1}{n}\sum_{i=1}^{n}\bar{y}_i$$
$$Var(\bar{y}) = E_1\left[V_2(\bar{y})\right] + V_1\left[E_2(\bar{y})\right]$$
For the second term,
$$E_2(\bar{y}) = \sum_{i=1}^{n}\frac{1}{n}\sum_{j=1}^{m}\frac{E_2(y_{ij})}{m} = \sum_{i=1}^{n}\frac{1}{n}\sum_{j=1}^{M}\frac{y_{ij}}{M} = \frac{1}{n}\sum_{i=1}^{n}\bar{Y}_i = \bar{Y}_n$$
$\bar{Y}_n$ is the estimator based on the first stage sample of size n. Under SRS, $V(\bar{y}) = \frac{N - n}{Nn}S^2$, so
$$V_1\left[E_2(\bar{y})\right] = V_1(\bar{Y}_n) = \frac{N - n}{Nn}\,S_1^2$$
For the first term,
$$E_1\left[V_2(\bar{y})\right] = E_1\left[V_2\left(\frac{1}{n}\sum_{i=1}^{n}\bar{y}_i\right)\right] = E_1\left[\frac{1}{n^2}\sum_{i=1}^{n}V_2(\bar{y}_i) + \frac{1}{n^2}\sum_{i \neq j}Cov(\bar{y}_i, \bar{y}_j)\right]$$
The covariance terms are zero because the second stage samples are drawn independently, so
$$E_1\left[V_2(\bar{y})\right] = \frac{1}{n^2}E_1\left[\sum_{i=1}^{n}\frac{M - m}{Mm}\,S_{2i}^2\right] = \frac{1}{n}\,\frac{M - m}{Mm}\,E_1\left(S_{2i}^2\right)$$
$$E_1\left[V_2(\bar{y})\right] = \frac{1}{n}\,\frac{M - m}{Mm}\sum_{i=1}^{N}\frac{S_{2i}^2}{N} = \frac{1}{n}\,\frac{M - m}{Mm}\,S_2^2$$
Combining the two terms,
$$Var(\bar{y}) = \frac{1}{n}\,\frac{M - m}{Mm}\,S_2^2 + \frac{N - n}{Nn}\,S_1^2$$
with the sample quantities $s_2^2$ and $s_1^2$ used in place of $S_2^2$ and $S_1^2$ for estimation:
$$var(\bar{y}) = \frac{1}{n}\,\frac{M - m}{Mm}\,s_2^2 + \frac{N - n}{Nn}\,s_1^2$$
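The variance formula can be checked exactly on a small population. The sketch below uses the six clusters of five units from the R exercise later in this section, with n = 3 and m = 2, and compares the formula with the value obtained by conditioning on every possible first stage sample. The object names are illustrative.

```r
# Six clusters (first stage units) of equal size M = 5.
clusters <- list(
  c(125, 115, 129, 134, 111), c(134, 125, 142, 141, 131),
  c(144, 143, 122, 134, 126), c(114, 111, 134, 131, 146),
  c(119, 126, 122, 129, 130), c(140, 125, 124, 124, 115)
)
N <- 6; M <- 5; n <- 3; m <- 2

Ybar_i <- sapply(clusters, mean)   # cluster means
S2i_sq <- sapply(clusters, var)    # within-cluster variances S_2i^2
S1_sq  <- var(Ybar_i)              # variance between cluster means
S2_sq  <- mean(S2i_sq)             # average within-cluster variance

# Formula: Var = (1/n) (M-m)/(Mm) S2^2 + (N-n)/(Nn) S1^2
v_formula <- (1 / n) * (M - m) / (M * m) * S2_sq + (N - n) / (N * n) * S1_sq

# Direct evaluation of E1[V2] + V1[E2] over all first stage samples.
fs <- combn(N, n)
E2 <- apply(fs, 2, function(s) mean(Ybar_i[s]))
V2 <- apply(fs, 2, function(s) sum((M - m) / (M * m) * S2i_sq[s]) / n^2)
v_direct <- mean(V2) + mean((E2 - mean(E2))^2)

c(formula = v_formula, direct = v_direct)
```

The second stage is handled analytically through V2, so only the 20 possible first stage samples need to be listed.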

Q. Explain two stage sampling for unequal first stage units. Also find the expected value of
the following estimator.
$$\bar{y} = \sum_{i=1}^{n}\frac{M_i\bar{y}_i}{n\bar{M}} = \sum_{i=1}^{n}\frac{u_i\bar{y}_i}{n}, \qquad u_i = \frac{M_i}{\bar{M}}$$
Let the population have N clusters with $M_i$ as the size of the ith cluster. Let a sample of n first stage units
be selected from this population, and let $m_i$ units be selected at the second stage from the ith selected cluster.
$M_i$ = Size of ith cluster
$$M_0 = \sum_{i=1}^{N} M_i = N\bar{M}$$
Mean of ith first stage unit
$$\bar{Y}_i = \frac{\sum_{j=1}^{M_i} y_{ij}}{M_i}$$
Overall population mean
$$\bar{Y} = \frac{\sum_{i=1}^{N} M_i\bar{Y}_i}{N\bar{M}}$$
The Expected Value of Mean
$$E(\bar{y}) = E_1\left[E_2(\bar{y})\right] = E_1\left[E_2\left(\sum_{i=1}^{n}\frac{u_i\bar{y}_i}{n}\right)\right]$$
Since $E_2(\bar{y}_i) = \bar{Y}_i$,
$$E(\bar{y}) = E_1\left[\sum_{i=1}^{n}\frac{u_i\bar{Y}_i}{n}\right] = \sum_{i=1}^{n}\frac{E_1(u_i\bar{Y}_i)}{n}$$
$$E(\bar{y}) = \sum_{i=1}^{N}\frac{u_i\bar{Y}_i}{N} = \sum_{i=1}^{N}\frac{M_i\bar{Y}_i}{\bar{M}N}$$
$$E(\bar{y}) = \bar{Y}$$
Q. Find the variance expression of the following estimator in two stage sampling when the first
stage units are not equal in size.
$$\bar{y} = \sum_{i=1}^{n}\frac{M_i\bar{y}_i}{n\bar{M}} = \sum_{i=1}^{n}\frac{u_i\bar{y}_i}{n}$$
Ans
$$Var(\bar{y}) = E_1\left[V_2(\bar{y})\right] + V_1\left[E_2(\bar{y})\right]$$
For the second term,
$$V_1\left[E_2(\bar{y})\right] = V_1\left[E_2\left(\sum_{i=1}^{n}\frac{u_i\bar{y}_i}{n}\right)\right] = V_1\left[\sum_{i=1}^{n}\frac{u_i\bar{Y}_i}{n}\right]$$
$$V_1\left[E_2(\bar{y})\right] = \left(\frac{1}{n} - \frac{1}{N}\right)S_1^2$$
For the first term,
$$E_1\left[V_2(\bar{y})\right] = E_1\left[V_2\left(\sum_{i=1}^{n}\frac{u_i\bar{y}_i}{n}\right)\right] = E_1\left[\sum_{i=1}^{n}\frac{u_i^2}{n^2}V_2(\bar{y}_i)\right] = E_1\left[\sum_{i=1}^{n}\frac{u_i^2}{n^2}\,\frac{M_i - m_i}{M_i m_i}\,S_{2i}^2\right]$$
$$E_1\left[V_2(\bar{y})\right] = \sum_{i=1}^{N}\frac{u_i^2}{n}\,\frac{M_i - m_i}{M_i m_i}\,\frac{S_{2i}^2}{N}$$
Combining the two terms,
$$Var(\bar{y}) = \sum_{i=1}^{N}\frac{u_i^2}{n}\,\frac{M_i - m_i}{M_i m_i}\,\frac{S_{2i}^2}{N} + \frac{N - n}{Nn}\,S_1^2$$
Q. Write a program for two stage sampling in R for the following data, taking n1=3 and n2=2.
Calculate the sample mean.
Cluster-1 125 115 129 134 111
Cluster-2 134 125 142 141 131
Cluster-3 144 143 122 134 126
Cluster-4 114 111 134 131 146
Cluster-5 119 126 122 129 130
Cluster-6 140 125 124 124 115
Ans:
# Population: six clusters of five units each
Clu1<-c(125,115,129,134,111)
Clu2<-c(134,125,142,141,131)
Clu3<-c(144,143,122,134,126)
Clu4<-c(114,111,134,131,146)
Clu5<-c(119,126,122,129,130)
Clu6<-c(140,125,124,124,115)
Clus<-data.frame(Clu1,Clu2,Clu3,Clu4,Clu5,Clu6)
pop<-t(Clus)   # one row per cluster
# 1st stage sample: select n1 clusters
n1=3;n2=2;
w<-pop[sample(nrow(pop),n1),]
sc1<-w[1,]
sc2<-w[2,]
sc3<-w[3,]
# 2nd stage sample: select n2 units from each selected cluster
s2_sc1<-sample(sc1,n2)
s2_sc2<-sample(sc2,n2)
s2_sc3<-sample(sc3,n2)
# Sample mean of all second stage units
f_s=c(s2_sc1,s2_sc2,s2_sc3)
est=mean(f_s)
Q2: Generate three clusters such that
Cluster-1 is from a normal distribution with mean zero and variance 1.
Cluster-2 is from a normal distribution with mean 2 and variance 5.
Cluster-3 is from a normal distribution with mean 4 and variance 9.
Calculate the sample mean using two stage cluster sampling taking n1=2 and n2=15.
Ans:
# rnorm() takes the standard deviation, so variance 5 -> sqrt(5) and variance 9 -> 3
Clu1<-rnorm(100,0,1)
Clu2<-rnorm(100,2,sqrt(5))
Clu3<-rnorm(100,4,3)
Clus<-data.frame(Clu1,Clu2,Clu3)
pop<-t(Clus)   # one row per cluster
#1st Stage Sampling
n1=2;n2=15;
w<-pop[sample(nrow(pop),n1),]
sc1<-w[1,]
sc2<-w[2,]
#2nd Stage Sampling
s2_sc1<-sample(sc1,n2)
s2_sc2<-sample(sc2,n2)
f_s=c(s2_sc1,s2_sc2)
est=mean(f_s)
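Q. Find Bias of Ratio Estimator in Two Stage Sampling.
Ans:
The ratio estimator of the mean under two stage sampling, when the first stage units are equal, is
$$\bar{y}_{r2s} = \frac{\bar{y}}{\bar{x}} \bar{X}$$
The estimator of the mean in two stage sampling is
$$\bar{y} = \sum_{i=1}^{n}\sum_{j=1}^{m} \frac{y_{ij}}{nm} = \sum_{i=1}^{n} \frac{\bar{y}_i}{n}$$
$$V(\bar{y}) = \frac{1}{n}\left(\frac{M-m}{Mm}\right) S_2^2 + \left(\frac{N-n}{Nn}\right) S_1^2$$
Some important notations
$$e_0 = (\bar{y} - \bar{Y})/\bar{Y}, \qquad e_1 = (\bar{x} - \bar{X})/\bar{X}$$
$$E(e_0) = E(e_1) = 0, \quad E(e_0^2) = V_{02}, \quad E(e_1^2) = V_{20}, \quad E(e_0 e_1) = V_{11}$$
$$V_{02} = \frac{V(\bar{y})}{\bar{Y}^2} = \frac{1}{\bar{Y}^2}\left[\frac{1}{n}\left(\frac{M-m}{Mm}\right) S_{2y}^2 + \left(\frac{N-n}{Nn}\right) S_{1y}^2\right]$$
$$V_{20} = \frac{V(\bar{x})}{\bar{X}^2} = \frac{1}{\bar{X}^2}\left[\frac{1}{n}\left(\frac{M-m}{Mm}\right) S_{2x}^2 + \left(\frac{N-n}{Nn}\right) S_{1x}^2\right]$$
$$V_{11} = \frac{1}{\bar{Y}\bar{X}}\left[\frac{1}{nN}\sum_{i=1}^{N}\left(\frac{M-m}{Mm}\right) \rho_{xy2i} S_{y2i} S_{x2i} + \left(\frac{N-n}{Nn}\right) \rho_{xy1} S_{y1} S_{x1}\right]$$
$$S_{y2i}^2 = \frac{1}{M_i - 1}\sum_{j=1}^{M_i} (y_{ij} - \bar{Y}_i)^2, \qquad \rho_{xy2i} = \frac{S_{xy2i}}{S_{x2i} S_{y2i}}$$
$$\bar{y} = \bar{Y}(1 + e_0), \qquad \bar{x} = \bar{X}(1 + e_1)$$
$$\bar{y}_{r2s} = \frac{\bar{y}}{\bar{x}} \bar{X} = \frac{\bar{Y}(1 + e_0)}{\bar{X}(1 + e_1)} \bar{X}$$
Bias of Ratio Estimator Under Two Stage Sampling
$$\bar{y}_{r2s} = \bar{Y}(1 + e_0)(1 + e_1)^{-1} \approx \bar{Y}(1 + e_0)(1 - e_1 + e_1^2)$$
$$\bar{y}_{r2s} = \bar{Y}(1 - e_1 + e_1^2 + e_0 - e_0 e_1 + e_0 e_1^2)$$
$$\bar{y}_{r2s} - \bar{Y} = \bar{Y}(e_0 - e_1 + e_1^2 - e_0 e_1 + e_0 e_1^2)$$
Dropping the third order term e_0 e_1^2:
$$\bar{y}_{r2s} - \bar{Y} \approx \bar{Y}(e_0 - e_1 + e_1^2 - e_0 e_1)$$
$$E(\bar{y}_{r2s} - \bar{Y}) = \bar{Y} E(e_0 - e_1 + e_1^2 - e_0 e_1)$$
$$Bias(\bar{y}_{r2s}) = \bar{Y}(V_{20} - V_{11})$$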
Q. Find MSE of Ratio Estimator in Two Stage Sampling.
Ans:
The ratio estimator of the mean under two stage sampling, when the first stage units are equal, is
$$\bar{y}_{r2s} = \frac{\bar{y}}{\bar{x}} \bar{X}$$
Notations
$$e_0 = (\bar{y} - \bar{Y})/\bar{Y}, \qquad e_1 = (\bar{x} - \bar{X})/\bar{X}$$
$$E(e_0) = E(e_1) = 0, \quad E(e_0^2) = V_{02}, \quad E(e_1^2) = V_{20}, \quad E(e_0 e_1) = V_{11}$$
$$\bar{y} = \bar{Y}(1 + e_0), \qquad \bar{x} = \bar{X}(1 + e_1)$$
$$\bar{y}_{r2s} = \frac{\bar{Y}(1 + e_0)}{\bar{X}(1 + e_1)} \bar{X}$$
MSE of Ratio Estimator Under Two Stage Sampling
To the first order of approximation,
$$\bar{y}_{r2s} = \bar{Y}(1 + e_0)(1 + e_1)^{-1} \approx \bar{Y}(1 + e_0)(1 - e_1)$$
$$\bar{y}_{r2s} = \bar{Y}(1 + e_0 - e_1 - e_0 e_1) \;\Rightarrow\; \bar{y}_{r2s} - \bar{Y} \approx \bar{Y}(e_0 - e_1)$$
$$E(\bar{y}_{r2s} - \bar{Y})^2 = \bar{Y}^2 E(e_0 - e_1)^2 = \bar{Y}^2 E(e_0^2 + e_1^2 - 2 e_0 e_1)$$
$$MSE(\bar{y}_{r2s}) = \bar{Y}^2 (V_{02} + V_{20} - 2V_{11})$$
Q. Find Bias of Product Estimator in Two Stage Sampling.
Ans:
The product estimator of the mean under two stage sampling, when the first stage units are equal, is
$$\bar{y}_{p2s} = \frac{\bar{y}\,\bar{x}}{\bar{X}}$$
Some important notations
$$e_0 = (\bar{y} - \bar{Y})/\bar{Y}, \qquad e_1 = (\bar{x} - \bar{X})/\bar{X}$$
$$E(e_0) = E(e_1) = 0, \quad E(e_0^2) = V_{02}, \quad E(e_1^2) = V_{20}, \quad E(e_0 e_1) = V_{11}$$
$$\bar{y} = \bar{Y}(1 + e_0), \qquad \bar{x} = \bar{X}(1 + e_1)$$
$$\bar{y}_{p2s} = \frac{\bar{Y}(1 + e_0)\,\bar{X}(1 + e_1)}{\bar{X}}$$
Bias of Product Estimator Under Two Stage Sampling
$$\bar{y}_{p2s} = \bar{Y}(1 + e_0)(1 + e_1) = \bar{Y}(1 + e_0 + e_1 + e_0 e_1)$$
$$\bar{y}_{p2s} - \bar{Y} = \bar{Y}(e_0 + e_1 + e_0 e_1)$$
$$E(\bar{y}_{p2s} - \bar{Y}) = \bar{Y} E(e_0 + e_1 + e_0 e_1)$$
$$Bias(\bar{y}_{p2s}) = \bar{Y} V_{11}$$
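Q. Find MSE of Product Estimator in Two Stage Sampling.
Ans:
The product estimator of the mean under two stage sampling, when the first stage units are equal, is
$$\bar{y}_{p2s} = \frac{\bar{y}\,\bar{x}}{\bar{X}}$$
Notations
$$e_0 = (\bar{y} - \bar{Y})/\bar{Y}, \qquad e_1 = (\bar{x} - \bar{X})/\bar{X}$$
$$E(e_0) = E(e_1) = 0, \quad E(e_0^2) = V_{02}, \quad E(e_1^2) = V_{20}, \quad E(e_0 e_1) = V_{11}$$
$$\bar{y} = \bar{Y}(1 + e_0), \qquad \bar{x} = \bar{X}(1 + e_1)$$
$$\bar{y}_{p2s} = \frac{\bar{Y}(1 + e_0)\,\bar{X}(1 + e_1)}{\bar{X}}$$
MSE of Product Estimator
$$\bar{y}_{p2s} = \bar{Y}(1 + e_0)(1 + e_1) = \bar{Y}(1 + e_0 + e_1 + e_0 e_1)$$
To the first order of approximation,
$$\bar{y}_{p2s} - \bar{Y} \approx \bar{Y}(e_0 + e_1)$$
$$E(\bar{y}_{p2s} - \bar{Y})^2 = \bar{Y}^2 E(e_0 + e_1)^2 = \bar{Y}^2 E(e_0^2 + e_1^2 + 2 e_0 e_1)$$
$$MSE(\bar{y}_{p2s}) = \bar{Y}^2 (V_{02} + V_{20} + 2V_{11})$$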

Ranked set sampling

Introduction:
Ranked set sampling (RSS) is an alternative to simple random sampling that can sometimes offer
large improvements in precision. McIntyre (1952) introduced the basic concept of ranked set
sampling in order to estimate the population means of pasture and forage yields. The RSS
procedure is as follows.
First, a simple random sample of size k is drawn from the population and the k sampling units
are ranked with respect to the variable of interest, say X, by judgment without actual
measurement. The unit with rank 1 is identified and taken for the measurement of X, and the
remaining units are discarded. Next, another simple random sample of size k is drawn and the
units are ranked by judgment; the unit with rank 2 is taken for the measurement of X and the
remaining units are discarded. The process continues in the same way until, in the kth sample,
the unit with rank k is taken for measurement.
For k = 6 the ranked sets of one cycle can be laid out as follows, where xi(j)1 denotes the unit
with rank j in set i of cycle 1. The diagonal units x1(1)1, x2(2)1, ..., x6(6)1 are the ones measured.
x1(1)1 x1(2)1 x1(3)1 x1(4)1 x1(5)1 x1(6)1
x2(1)1 x2(2)1 x2(3)1 x2(4)1 x2(5)1 x2(6)1
x3(1)1 x3(2)1 x3(3)1 x3(4)1 x3(5)1 x3(6)1
x4(1)1 x4(2)1 x4(3)1 x4(4)1 x4(5)1 x4(6)1
x5(1)1 x5(2)1 x5(3)1 x5(4)1 x5(5)1 x5(6)1
x6(1)1 x6(2)1 x6(3)1 x6(4)1 x6(5)1 x6(6)1
Ranking with Auxiliary Variable

When auxiliary information (an auxiliary variable) is used to rank the units on the variable of
interest, the ranking error depends on the correlation between the two variables. The performance
of the RSS estimator depends on how accurately the sample units are ranked on the variable of
interest.
Q. Prove that the sample mean is an unbiased estimator of the mean under ranked set sampling.
Ans:
The mean estimator in RSS is
$$\bar{X}^* = \frac{1}{k}\sum_{i=1}^{k} X^*_{(i)}$$
The expected value is
$$E(\bar{X}^*) = \frac{1}{k}\sum_{i=1}^{k} E(X^*_{(i)})$$
We have assumed perfect rankings, so X^*_{(i)} is distributed like the ith order statistic from a
continuous distribution with p.d.f. f(x) and c.d.f. F(x):
$$E(X^*_{(i)}) = \int_{-\infty}^{\infty} x \, \frac{k!}{(i-1)!(k-i)!} [F(x)]^{i-1} [1 - F(x)]^{k-i} f(x)\,dx$$
Substituting,
$$E(\bar{X}^*) = \frac{1}{k}\sum_{i=1}^{k} \int_{-\infty}^{\infty} k x \, \frac{(k-1)!}{(i-1)!(k-i)!} [F(x)]^{i-1} [1 - F(x)]^{k-i} f(x)\,dx$$
$$E(\bar{X}^*) = \frac{1}{k}\sum_{i=1}^{k} \int_{-\infty}^{\infty} k x \binom{k-1}{i-1} [F(x)]^{i-1} [1 - F(x)]^{k-i} f(x)\,dx$$
$$E(\bar{X}^*) = \int_{-\infty}^{\infty} x f(x) \left[\sum_{i=1}^{k} \binom{k-1}{i-1} [F(x)]^{i-1} [1 - F(x)]^{k-i}\right] dx$$
The sum in brackets equals 1 by the binomial theorem, so
$$E(\bar{X}^*) = \int_{-\infty}^{\infty} x f(x)\,dx = \bar{X}$$
$$E(\bar{X}^*) = \bar{X}$$
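Q. Find the variance expression for the mean estimator under ranked set sampling.
Ans:
The mean estimator in RSS is
$$\bar{X}^* = \frac{1}{k}\sum_{i=1}^{k} X^*_{(i)}$$
Let \mu^*_{(i)} = E(X^*_{(i)}) denote the mean of the ith order statistic. By definition,
$$E(X^*_{(i)} - \bar{X})^2 = E(X^*_{(i)} - \mu^*_{(i)} + \mu^*_{(i)} - \bar{X})^2$$
$$E(X^*_{(i)} - \bar{X})^2 = E(X^*_{(i)} - \mu^*_{(i)})^2 + (\mu^*_{(i)} - \bar{X})^2$$
$$E(X^*_{(i)} - \mu^*_{(i)})^2 = V(X^*_{(i)}) = E(X^*_{(i)} - \bar{X})^2 - (\mu^*_{(i)} - \bar{X})^2$$
Because the k sets are drawn independently,
$$V(\bar{X}^*) = \frac{1}{k^2}\sum_{i=1}^{k} E(X^*_{(i)} - \mu^*_{(i)})^2 = \frac{1}{k^2}\left[\sum_{i=1}^{k} E(X^*_{(i)} - \bar{X})^2 - \sum_{i=1}^{k} (\mu^*_{(i)} - \bar{X})^2\right]$$
Taking the first sum,
$$\sum_{i=1}^{k} E(X^*_{(i)} - \bar{X})^2 = \sum_{i=1}^{k} \int_{-\infty}^{\infty} k (x - \bar{X})^2 \binom{k-1}{i-1} [F(x)]^{i-1} [1 - F(x)]^{k-i} f(x)\,dx$$
$$= k \int_{-\infty}^{\infty} (x - \bar{X})^2 f(x) \left[\sum_{i=1}^{k} \binom{k-1}{i-1} [F(x)]^{i-1} [1 - F(x)]^{k-i}\right] dx = k\sigma^2$$
Therefore
$$V(\bar{X}^*) = \frac{1}{k^2}\left[k\sigma^2 - \sum_{i=1}^{k} (\mu^*_{(i)} - \bar{X})^2\right]$$
$$V(\bar{X}^*) = \frac{\sigma^2}{k} - \frac{1}{k^2}\sum_{i=1}^{k} (\mu^*_{(i)} - \bar{X})^2$$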
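The variance result says the RSS mean is never less precise than the simple random sample mean of the same size, whose variance is the first term alone. A minimal simulation sketch in R can illustrate this. It assumes perfect ranking, one cycle, and an infinite N(0, 1) population (each set is a fresh rnorm(k) draw); the names rss_mean, rss and srs are illustrative and not part of the handout.

```r
# Monte Carlo comparison of the RSS mean and the SRS mean for set size k.
# Assumptions (not from the handout): perfect ranking, one cycle, and an
# infinite N(0, 1) population so that each set is a fresh rnorm(k) draw.
set.seed(1)
k <- 3
reps <- 5000

rss_mean <- function(k) {
  # the i-th set contributes its i-th smallest value
  mean(sapply(seq_len(k), function(i) sort(rnorm(k))[i]))
}

rss <- replicate(reps, rss_mean(k))     # RSS means, k measured units each
srs <- replicate(reps, mean(rnorm(k)))  # SRS means, k measured units each

print(c(mean_rss = mean(rss), var_rss = var(rss), var_srs = var(srs)))
```

Both estimators use k measured units per replicate, so the difference in variance reflects only the ranking step.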
143. Generate a population of size 1000 following a normal distribution with mean 0 and
variance 1. Then calculate the sample mean using ranked set sampling, taking the sample
size as 6.
Ans:
k=6;rssx=matrix(,k,k);
pop=rnorm(1000,0,1)
for(i in 1:k)
{
s=sample(pop,k)                  # draw the i-th set of k units
xs=sort(s,decreasing = FALSE)    # rank the units within the set
rssx[i,]=xs
}
rssx_s=diag(rssx)                # i-th ranked unit from the i-th set
x=sample(pop,k)                  # simple random sample of the same size, for comparison
est_srs=mean(x)
est_rss=mean(rssx_s)
rssx
Q. Find Bias of Ratio Estimator in Ranked Set Sampling such that
$$E(e_0) = E(e_1) = 0$$
$$E(e_1^2) = \frac{1}{rk}\frac{\sigma_x^2}{\bar{X}^2} - \frac{1}{\bar{X}^2 r k^2}\sum_{i=1}^{k} (\mu^*_{x(i)} - \bar{X})^2 = C_x^2 - D_{x[i]}^2$$
$$E(e_0^2) = \frac{1}{rk}\frac{\sigma_y^2}{\bar{Y}^2} - \frac{1}{\bar{Y}^2 r k^2}\sum_{i=1}^{k} (\mu^*_{y[i]} - \bar{Y})^2 = C_y^2 - D_{y[i]}^2$$
$$E(e_0 e_1) = \frac{1}{rk}\frac{\sigma_{yx}}{\bar{Y}\bar{X}} - \frac{1}{\bar{Y}\bar{X} r k^2}\sum_{i=1}^{k} (\mu^*_{x(i)} - \bar{X})(\mu^*_{y[i]} - \bar{Y}) = C_{xy} - D_{xy[i]}$$
Here \mu^*_{x(i)} and \mu^*_{y[i]} denote the means of the ith ranked units, and in each line the C term
and the D term stand for the first and second terms of the expression, respectively.
Ans:
The ratio estimator of the mean is
$$\bar{y}_{[rss]} = \frac{\bar{Y}^*}{\bar{X}^*} \bar{X}$$
The Variance of Sample Mean Under RSS
$$V(\bar{X}^*) = \frac{\sigma_x^2}{k} - \frac{1}{k^2}\sum_{i=1}^{k} (\mu^*_{x(i)} - \bar{X})^2$$
Notations
$$e_0 = (\bar{Y}^* - \bar{Y})/\bar{Y}, \qquad e_1 = (\bar{X}^* - \bar{X})/\bar{X}$$
$$\bar{Y}^* = \bar{Y}(1 + e_0), \qquad \bar{X}^* = \bar{X}(1 + e_1)$$
$$\bar{y}_{[rss]} = \frac{\bar{Y}^*}{\bar{X}^*} \bar{X} = \frac{\bar{Y}(1 + e_0)}{\bar{X}(1 + e_1)} \bar{X}$$
Bias of Ratio Estimator Under RS Sampling
$$\bar{y}_{[rss]} = \bar{Y}(1 + e_0)(1 + e_1)^{-1} \approx \bar{Y}(1 + e_0)(1 - e_1 + e_1^2)$$
$$\bar{y}_{[rss]} = \bar{Y}(1 - e_1 + e_1^2 + e_0 - e_0 e_1 + e_0 e_1^2)$$
$$\bar{y}_{[rss]} - \bar{Y} = \bar{Y}(e_0 - e_1 + e_1^2 - e_0 e_1 + e_0 e_1^2)$$
Dropping the third order term e_0 e_1^2:
$$\bar{y}_{[rss]} - \bar{Y} \approx \bar{Y}(e_0 - e_1 + e_1^2 - e_0 e_1)$$
$$E(\bar{y}_{[rss]} - \bar{Y}) = \bar{Y} E(e_0 - e_1 + e_1^2 - e_0 e_1)$$
$$Bias(\bar{y}_{[rss]}) = \bar{Y}\left[(C_x^2 - D_{x[i]}^2) - (C_{xy} - D_{xy[i]})\right]$$
Q. Find MSE of Ratio Estimator in Ranked Set Sampling such that
$$E(e_0) = E(e_1) = 0$$
$$E(e_1^2) = \frac{1}{rk}\frac{\sigma_x^2}{\bar{X}^2} - \frac{1}{\bar{X}^2 r k^2}\sum_{i=1}^{k} (\mu^*_{x(i)} - \bar{X})^2 = C_x^2 - D_{x[i]}^2$$
$$E(e_0^2) = \frac{1}{rk}\frac{\sigma_y^2}{\bar{Y}^2} - \frac{1}{\bar{Y}^2 r k^2}\sum_{i=1}^{k} (\mu^*_{y[i]} - \bar{Y})^2 = C_y^2 - D_{y[i]}^2$$
$$E(e_0 e_1) = \frac{1}{rk}\frac{\sigma_{yx}}{\bar{Y}\bar{X}} - \frac{1}{\bar{Y}\bar{X} r k^2}\sum_{i=1}^{k} (\mu^*_{x(i)} - \bar{X})(\mu^*_{y[i]} - \bar{Y}) = C_{xy} - D_{xy[i]}$$
Here \mu^*_{x(i)} and \mu^*_{y[i]} denote the means of the ith ranked units, and in each line the C term
and the D term stand for the first and second terms of the expression, respectively.
Ans:
The ratio estimator of the mean for ranked set sampling is
$$\bar{y}_{[rss]} = \frac{\bar{Y}^*}{\bar{X}^*} \bar{X}$$
Notations
$$e_0 = (\bar{Y}^* - \bar{Y})/\bar{Y}, \qquad e_1 = (\bar{X}^* - \bar{X})/\bar{X}$$
$$\bar{Y}^* = \bar{Y}(1 + e_0), \qquad \bar{X}^* = \bar{X}(1 + e_1)$$
$$\bar{y}_{[rss]} = \frac{\bar{Y}(1 + e_0)}{\bar{X}(1 + e_1)} \bar{X}$$
MSE of Ratio Estimator Under RS Sampling
To the first order of approximation,
$$\bar{y}_{[rss]} = \bar{Y}(1 + e_0)(1 + e_1)^{-1} \approx \bar{Y}(1 + e_0)(1 - e_1)$$
$$\bar{y}_{[rss]} \approx \bar{Y}(1 - e_1 + e_0) \;\Rightarrow\; \bar{y}_{[rss]} - \bar{Y} \approx \bar{Y}(e_0 - e_1)$$
$$E(\bar{y}_{[rss]} - \bar{Y})^2 = \bar{Y}^2 E(e_0 - e_1)^2$$
$$MSE(\bar{y}_{[rss]}) = \bar{Y}^2 E(e_0^2 + e_1^2 - 2 e_0 e_1)$$
$$MSE(\bar{y}_{[rss]}) = \bar{Y}^2\left[(C_y^2 - D_{y[i]}^2) + (C_x^2 - D_{x[i]}^2) - 2(C_{xy} - D_{xy[i]})\right]$$
$$MSE(\bar{y}_{[rss]}) = \bar{Y}^2\left[(C_y^2 + C_x^2 - 2C_{xy}) - (D_{y[i]}^2 + D_{x[i]}^2 - 2D_{xy[i]})\right]$$
Similarly, the Bias and MSE of Product Estimator Under RSS

The product estimator of the mean is
$$\bar{y}_{[pss]} = \frac{\bar{Y}^* \bar{X}^*}{\bar{X}}$$
Bias of the product estimator is
$$Bias(\bar{y}_{[pss]}) = \bar{Y}(C_{xy} - D_{xy[i]})$$
MSE of the product estimator is
$$MSE(\bar{y}_{[pss]}) = \bar{Y}^2\left[(C_y^2 + C_x^2 + 2C_{xy}) - (D_{y[i]}^2 + D_{x[i]}^2 + 2D_{xy[i]})\right]$$

Q. Write some types of sampling schemes using ranked set sampling.
In each layout below, xi(j)1 denotes the unit with rank j in set i of cycle 1.
Ranked Set Sampling
x1(1)1 x1(2)1 x1(3)1 x1(4)1 x1(5)1 x1(6)1
x2(1)1 x2(2)1 x2(3)1 x2(4)1 x2(5)1 x2(6)1
x3(1)1 x3(2)1 x3(3)1 x3(4)1 x3(5)1 x3(6)1
x4(1)1 x4(2)1 x4(3)1 x4(4)1 x4(5)1 x4(6)1
x5(1)1 x5(2)1 x5(3)1 x5(4)1 x5(5)1 x5(6)1
x6(1)1 x6(2)1 x6(3)1 x6(4)1 x6(5)1 x6(6)1
Extreme Ranked Set Sampling
Samawi et al. (1996) suggested a modified RSS design named extreme ranked set sampling
(ERSS).
x1(1)1 x1(2)1 x1(3)1 x1(4)1 x1(5)1 x1(6)1
x2(1)1 x2(2)1 x2(3)1 x2(4)1 x2(5)1 x2(6)1
x3(1)1 x3(2)1 x3(3)1 x3(4)1 x3(5)1 x3(6)1
x4(1)1 x4(2)1 x4(3)1 x4(4)1 x4(5)1 x4(6)1
x5(1)1 x5(2)1 x5(3)1 x5(4)1 x5(5)1 x5(6)1
x6(1)1 x6(2)1 x6(3)1 x6(4)1 x6(5)1 x6(6)1
Median Ranked Set Sampling
Even set size (k = 6):
x1(1)1 x1(2)1 x1(3)1 x1(4)1 x1(5)1 x1(6)1
x2(1)1 x2(2)1 x2(3)1 x2(4)1 x2(5)1 x2(6)1
x3(1)1 x3(2)1 x3(3)1 x3(4)1 x3(5)1 x3(6)1
x4(1)1 x4(2)1 x4(3)1 x4(4)1 x4(5)1 x4(6)1
x5(1)1 x5(2)1 x5(3)1 x5(4)1 x5(5)1 x5(6)1
x6(1)1 x6(2)1 x6(3)1 x6(4)1 x6(5)1 x6(6)1
Odd set size (k = 5):
x1(1)1 x1(2)1 x1(3)1 x1(4)1 x1(5)1
x2(1)1 x2(2)1 x2(3)1 x2(4)1 x2(5)1
x3(1)1 x3(2)1 x3(3)1 x3(4)1 x3(5)1
x4(1)1 x4(2)1 x4(3)1 x4(4)1 x4(5)1
x5(1)1 x5(2)1 x5(3)1 x5(4)1 x5(5)1
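Mean in Ranked Set Sampling (RSS)

The mean estimator in RSS is
$$\bar{X}^* = \frac{1}{k}\sum_{i=1}^{k} X^*_{(i)}$$
In case of r cycles,
$$\bar{X}^* = \frac{1}{rk}\sum_{c=1}^{r}\sum_{i=1}^{k} X^*_{(i)c}$$
Mean in Extreme Ranked Set Sampling (ERSS)
In the following, X^*_{i,(j),c} denotes the unit with rank j in set i of cycle c.
The mean estimator in case of even set size:
$$\bar{X}^*_{ERSS(e)} = \frac{1}{rk}\sum_{c=1}^{r}\left[\sum_{i=1}^{k/2} X^*_{i,(1),c} + \sum_{i=1}^{k/2} X^*_{\frac{k}{2}+i,(k),c}\right]$$
The mean estimator in case of odd set size:
$$\bar{X}^*_{ERSS(o)} = \frac{1}{rk}\sum_{c=1}^{r}\left[\sum_{i=1}^{(k-1)/2} X^*_{i,(1),c} + \sum_{i=1}^{(k-1)/2} X^*_{\frac{k-1}{2}+i,(k),c} + X^*_{k,\left(\frac{k+1}{2}\right),c}\right]$$
Mean in Median Ranked Set Sampling (MRSS)
The mean estimator in case of even set size:
$$\bar{X}^*_{MRSS(e)} = \frac{1}{rk}\sum_{c=1}^{r}\left[\sum_{i=1}^{k/2} X^*_{i,\left(\frac{k}{2}\right),c} + \sum_{i=1}^{k/2} X^*_{\frac{k}{2}+i,\left(\frac{k+2}{2}\right),c}\right]$$
The mean estimator in case of odd set size:
$$\bar{X}^*_{MRSS(o)} = \frac{1}{rk}\sum_{c=1}^{r}\sum_{i=1}^{k} X^*_{i,\left(\frac{k+1}{2}\right),c}$$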
Pair Ranked Set Sampling and Double Ranked Set Sampling are further developments in Ranked
set sampling.
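The RSS, ERSS and MRSS schemes differ only in which ranked unit is measured in each set. The following R sketch shows the selection step for one cycle with an even set size. It assumes that the sets are stored as the rows of a k x k matrix, each row already sorted in increasing order; the function names select_rss, select_erss_even and select_mrss_even are illustrative and not part of the handout.

```r
# Selection step for one cycle, even set size k.
# Assumption: 'sets' is a k x k matrix whose i-th row holds the i-th set,
# already ranked in increasing order (perfect ranking).

select_rss <- function(sets) {
  diag(sets)                        # i-th ranked unit from the i-th set
}

select_erss_even <- function(sets) {
  k <- nrow(sets); h <- k / 2
  c(sets[1:h, 1],                   # smallest unit from the first k/2 sets
    sets[(h + 1):k, k])             # largest unit from the last k/2 sets
}

select_mrss_even <- function(sets) {
  k <- nrow(sets); h <- k / 2
  c(sets[1:h, h],                   # (k/2)-th ranked unit from the first k/2 sets
    sets[(h + 1):k, h + 1])         # ((k+2)/2)-th ranked unit from the last k/2 sets
}

# Toy input: four identical ranked sets 1, 2, 3, 4 (made-up values)
sets <- matrix(1:4, nrow = 4, ncol = 4, byrow = TRUE)
rss_units  <- select_rss(sets)
erss_units <- select_erss_even(sets)
mrss_units <- select_mrss_even(sets)
print(c(mean(rss_units), mean(erss_units), mean(mrss_units)))
```

In practice each row would come from sort(sample(pop, k)), as in the RSS program given earlier in these notes.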

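Dealing with Non-response

Introduction:
Non-response occurs when individuals chosen for the sample are not ready to participate in the
survey. It can be unit non-response or item non-response, and it leads to a type of selection
bias. The problem of non-response can be dealt with using the following methods.
Sub-sampling of non-respondents
Randomized response technique
Hansen and Hurwitz (1946) technique
A sub-sample of the non-respondents is taken after the first mail attempt, and the units in this
sub-sample are then enumerated by personal interview.
Hansen & Hurwitz (1946) Estimator
$$\bar{y}^* = \frac{n_1}{n}\bar{y}_1 + \frac{n_2}{n}\bar{y}_2'$$
where n_1 is the number of respondents, n_2 the number of non-respondents (n = n_1 + n_2),
\bar{y}_1 the mean of the respondents, and \bar{y}_2' the mean of the sub-sample of non-respondents.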
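Q. Prove that the following Hansen and Hurwitz estimator is unbiased for the population mean.
$$\bar{y}^* = \frac{n_1}{n}\bar{y}_1 + \frac{n_2}{n}\bar{y}_2'$$
Ans:
Hansen & Hurwitz (1946) Estimator
$$\bar{y}^* = \frac{n_1}{n}\bar{y}_1 + \frac{n_2}{n}\bar{y}_2'$$
$$E(\bar{y}^*) = E\left[\frac{n_1}{n}\bar{y}_1 + \frac{n_2}{n}\bar{y}_2'\right]$$
$$E(\bar{y}^*) = E_1 E_2\left[\frac{n_1}{n}\bar{y}_1 + \frac{n_2}{n}\bar{y}_2'\right]$$
The sub-sample mean \bar{y}_2' is unbiased for the mean \bar{y}_2 of the n_2 non-respondents, so taking the
expectation over both stages gives
Hansen & Hurwitz (1946) is Unbiased
$$E(\bar{y}^*) = \frac{n_1}{n}\bar{Y} + \frac{n_2}{n}\bar{Y}$$
$$E(\bar{y}^*) = \bar{Y}\left[\frac{n_1}{n} + \frac{n_2}{n}\right]$$
$$E(\bar{y}^*) = \bar{Y}$$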
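Q. Find the variance of the following Hansen & Hurwitz (1946) estimator.
$$\bar{y}^* = \frac{n_1}{n}\bar{y}_1 + \frac{n_2}{n}\bar{y}_2'$$
Ans:
The Hansen & Hurwitz (1946) estimator is
$$\bar{y}^* = \frac{n_1}{n}\bar{y}_1 + \frac{n_2}{n}\bar{y}_2'$$
Variance
$$\bar{y}^* = \frac{1}{n}(n_1\bar{y}_1 + n_2\bar{y}_2) + \frac{n_2}{n}(\bar{y}_2' - \bar{y}_2)$$
$$V(\bar{y}^*) = V\left[\frac{1}{n}(n_1\bar{y}_1 + n_2\bar{y}_2)\right] + V\left[\frac{n_2}{n}(\bar{y}_2' - \bar{y}_2)\right]$$
The first term is the variance of the mean of the full sample of size n:
$$V\left[\frac{1}{n}(n_1\bar{y}_1 + n_2\bar{y}_2)\right] = \lambda S_y^2, \qquad \lambda = \frac{N-n}{Nn}$$
For the second term,
$$V\left[\frac{n_2}{n}(\bar{y}_2' - \bar{y}_2)\right] = \frac{n_2^2}{n^2} E(\bar{y}_2' - \bar{y}_2)^2$$
$$\frac{n_2^2}{n^2} E(\bar{y}_2' - \bar{y}_2)^2 = \frac{n_2^2}{n^2} E\left[(\bar{y}_2' - \bar{Y}_2) - (\bar{y}_2 - \bar{Y}_2)\right]^2$$
$$= \frac{n_2^2}{n^2}\left[E(\bar{y}_2' - \bar{Y}_2)^2 + E(\bar{y}_2 - \bar{Y}_2)^2 - 2E\{(\bar{y}_2' - \bar{Y}_2)(\bar{y}_2 - \bar{Y}_2)\}\right]$$
$$= \frac{n_2^2}{n^2}\left[E(\bar{y}_2' - \bar{Y}_2)^2 - E(\bar{y}_2 - \bar{Y}_2)^2\right]$$
With a sub-sample of size n_2/k,
$$= \frac{n_2^2}{n^2}\left[\frac{1}{n_2/k} - \frac{1}{n_2}\right] S_{y(2)}^2 = \frac{n_2}{n^2}(k - 1) S_{y(2)}^2 = \theta S_{y(2)}^2, \qquad \theta = \frac{n_2(k-1)}{n^2}$$
Hence
$$V(\bar{y}^*) = \lambda S_y^2 + \theta S_{y(2)}^2$$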
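The two-term variance can be evaluated directly once its coefficients are fixed. The R sketch below assumes the coefficient of S_y^2 is lambda = (N - n)/(N n) and the coefficient of S_y(2)^2 is theta = n2 (k - 1)/n^2, with sub-sample size n2/k, as read from the derivation above. The function name hh_variance and the input values are illustrative only.

```r
# Variance of the Hansen & Hurwitz estimator: lambda * Sy2 + theta * Sy2_nr
# Assumptions (read from the derivation, names are hypothetical):
#   lambda = (N - n) / (N * n)
#   theta  = n2 * (k - 1) / n^2, where the sub-sample size is r = n2 / k
hh_variance <- function(N, n, n2, k, Sy2, Sy2_nr) {
  lambda <- (N - n) / (N * n)
  theta  <- n2 * (k - 1) / n^2
  lambda * Sy2 + theta * Sy2_nr
}

# Made-up inputs matching the sizes of the R exercise in these notes:
# N = 1000, n = 100, n2 = 20 non-respondents, r = 10, so k = n2 / r = 2
v_hh <- hh_variance(N = 1000, n = 100, n2 = 20, k = 2, Sy2 = 1, Sy2_nr = 1)
print(v_hh)
```

Setting k = 1 (all non-respondents interviewed) removes the second term, leaving the usual simple random sampling variance.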
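150. Generate a population consisting of 1000 values in R and calculate the Hansen &
Hurwitz (1946) estimator such that n1=80, n2=20, r=10.
N=1000; n=100;n1=80;n2=20;r=10;
pop <- rnorm(N,0,1)
s <- sample(pop,n)        # initial sample of size n
s_r<-s[1:n1]              # first n1 sampled units treated as respondents
s_nr<-s[(n1+1):n]         # remaining n2 sampled units treated as non-respondents
s2<-sample(s_nr,r)        # sub-sample of r non-respondents
m<-(n1/n)*mean(s_r)+(n2/n)*mean(s2)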
Q. Find the expression for the bias of the ratio estimator in case of non-response on the study
variable only, where
$$E(e_0^*) = E(e_1) = 0$$
$$E(e_1^2) = \left(\frac{N-n}{nN}\right)\frac{\sigma_x^2}{\bar{X}^2} = \lambda C_x^2$$
$$E(e_0^{*2}) = \frac{1}{\bar{Y}^2}\left[\lambda S_y^2 + \theta S_{y(2)}^2\right] = C_y^{*2}$$
$$E(e_0^* e_1) = \lambda\frac{S_{yx}}{\bar{Y}\bar{X}} = \lambda C_{yx}$$
Ans:
The ratio estimator when non-response occurs in the study variable y is
$$\bar{y}_r^* = \frac{\bar{y}^*}{\bar{x}} \bar{X}$$
$$V(\bar{y}^*) = \lambda S_y^2 + \theta S_{y(2)}^2$$
Notations
$$e_0^* = (\bar{y}^* - \bar{Y})/\bar{Y}, \qquad e_1 = (\bar{x} - \bar{X})/\bar{X}$$
$$\bar{y}^* = \bar{Y}(1 + e_0^*), \qquad \bar{x} = \bar{X}(1 + e_1)$$
$$\bar{y}_r^* = \frac{\bar{y}^*}{\bar{x}} \bar{X} = \frac{\bar{Y}(1 + e_0^*)}{\bar{X}(1 + e_1)} \bar{X}$$
Bias of Ratio Estimator Under Non-response
$$\bar{y}_r^* = \bar{Y}(1 + e_0^*)(1 + e_1)^{-1} \approx \bar{Y}(1 + e_0^*)(1 - e_1 + e_1^2)$$
$$\bar{y}_r^* = \bar{Y}(1 - e_1 + e_1^2 + e_0^* - e_0^* e_1 + e_0^* e_1^2)$$
$$\bar{y}_r^* - \bar{Y} = \bar{Y}(e_0^* - e_1 + e_1^2 - e_0^* e_1 + e_0^* e_1^2)$$
Dropping the third order term:
$$\bar{y}_r^* - \bar{Y} \approx \bar{Y}(e_0^* - e_1 + e_1^2 - e_0^* e_1)$$
$$E(\bar{y}_r^* - \bar{Y}) = \bar{Y} E(e_0^* - e_1 + e_1^2 - e_0^* e_1)$$
$$Bias(\bar{y}_r^*) = \bar{Y}(\lambda C_x^2 - \lambda C_{yx})$$
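Q. Find the expression for the MSE of the ratio estimator in case of non-response on the study
variable only, where
$$E(e_0^*) = E(e_1) = 0$$
$$E(e_1^2) = \left(\frac{N-n}{nN}\right)\frac{\sigma_x^2}{\bar{X}^2} = \lambda C_x^2$$
$$E(e_0^{*2}) = \frac{1}{\bar{Y}^2}\left[\lambda S_y^2 + \theta S_{y(2)}^2\right] = C_y^{*2}$$
$$E(e_0^* e_1) = \lambda\frac{S_{yx}}{\bar{Y}\bar{X}} = \lambda C_{yx}$$
Ans:
$$\bar{y}_r^* = \frac{\bar{y}^*}{\bar{x}} \bar{X}$$
$$\bar{y}^* = \bar{Y}(1 + e_0^*), \qquad \bar{x} = \bar{X}(1 + e_1)$$
$$\bar{y}_r^* = \frac{\bar{Y}(1 + e_0^*)}{\bar{X}(1 + e_1)} \bar{X}$$
To the first order of approximation,
$$\bar{y}_r^* = \bar{Y}(1 + e_0^*)(1 + e_1)^{-1} \approx \bar{Y}(1 + e_0^*)(1 - e_1)$$
$$\bar{y}_r^* - \bar{Y} \approx \bar{Y}(e_0^* - e_1)$$
$$E(\bar{y}_r^* - \bar{Y})^2 = \bar{Y}^2 E(e_0^* - e_1)^2$$
$$MSE(\bar{y}_r^*) = \bar{Y}^2 E(e_0^{*2} + e_1^2 - 2 e_0^* e_1)$$
$$MSE(\bar{y}_r^*) = \bar{Y}^2\left[C_y^{*2} + \lambda C_x^2 - 2\lambda C_{yx}\right]$$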
Similarly, Bias and MSE of Product Estimator

The product estimator when non-response occurs in the study variable y is
$$\bar{y}_p^* = \frac{\bar{y}^* \bar{x}}{\bar{X}}$$
Bias and MSE of Product Estimator Under Non-response
$$Bias(\bar{y}_p^*) = \bar{Y}\lambda C_{yx}$$
$$MSE(\bar{y}_p^*) = \bar{Y}^2\left[C_y^{*2} + \lambda C_x^2 + 2\lambda C_{yx}\right]$$
Q. Find Bias and MSE of Ratio Estimator in Case of Non-response on Both Variables,
where
$$E(e_0^*) = E(e_1^*) = 0$$
$$E(e_0^{*2}) = \frac{1}{\bar{Y}^2}\left[\lambda S_y^2 + \theta S_{y(2)}^2\right] = C_y^{*2}$$
$$E(e_1^{*2}) = \frac{1}{\bar{X}^2}\left[\lambda S_x^2 + \theta S_{x(2)}^2\right] = C_x^{*2}$$
$$E(e_0^* e_1^*) = \frac{1}{\bar{Y}\bar{X}}\left[\lambda S_{xy} + \theta S_{xy(2)}\right] = C_{yx}^*$$
Ans:
The ratio estimator when non-response occurs in both the study variable y and the auxiliary
variable x is
$$\bar{y}_r^{**} = \frac{\bar{y}^*}{\bar{x}^*} \bar{X}$$
Variance of Hansen & Hurwitz (1946) Estimator
$$V(\bar{y}^*) = \lambda S_y^2 + \theta S_{y(2)}^2$$
$$V(\bar{x}^*) = \lambda S_x^2 + \theta S_{x(2)}^2$$
Notations
$$e_0^* = (\bar{y}^* - \bar{Y})/\bar{Y}, \qquad e_1^* = (\bar{x}^* - \bar{X})/\bar{X}$$
$$\bar{y}^* = \bar{Y}(1 + e_0^*), \qquad \bar{x}^* = \bar{X}(1 + e_1^*)$$
$$\bar{y}_r^{**} = \frac{\bar{y}^*}{\bar{x}^*} \bar{X} = \frac{\bar{Y}(1 + e_0^*)}{\bar{X}(1 + e_1^*)} \bar{X}$$
Bias of Ratio Estimator Under Non-response
$$\bar{y}_r^{**} = \bar{Y}(1 + e_0^*)(1 + e_1^*)^{-1} \approx \bar{Y}(1 + e_0^*)(1 - e_1^* + e_1^{*2})$$
$$\bar{y}_r^{**} = \bar{Y}(1 - e_1^* + e_1^{*2} + e_0^* - e_0^* e_1^* + e_0^* e_1^{*2})$$
$$\bar{y}_r^{**} - \bar{Y} = \bar{Y}(e_0^* - e_1^* + e_1^{*2} - e_0^* e_1^* + e_0^* e_1^{*2})$$
Dropping the third order term:
$$\bar{y}_r^{**} - \bar{Y} \approx \bar{Y}(e_0^* - e_1^* + e_1^{*2} - e_0^* e_1^*)$$
$$E(\bar{y}_r^{**} - \bar{Y}) = \bar{Y} E(e_0^* - e_1^* + e_1^{*2} - e_0^* e_1^*)$$
$$Bias(\bar{y}_r^{**}) = \bar{Y}(C_x^{*2} - C_{yx}^*)$$
MSE of Ratio Estimator Under Non-response
To the first order of approximation,
$$\bar{y}_r^{**} = \bar{Y}(1 + e_0^*)(1 + e_1^*)^{-1} \approx \bar{Y}(1 + e_0^*)(1 - e_1^*)$$
$$\bar{y}_r^{**} = \bar{Y}(1 - e_1^* + e_0^* - e_0^* e_1^*) \;\Rightarrow\; \bar{y}_r^{**} - \bar{Y} \approx \bar{Y}(e_0^* - e_1^*)$$
$$E(\bar{y}_r^{**} - \bar{Y})^2 = \bar{Y}^2 E(e_0^* - e_1^*)^2$$
$$MSE(\bar{y}_r^{**}) = \bar{Y}^2 E(e_0^{*2} + e_1^{*2} - 2 e_0^* e_1^*)$$
$$MSE(\bar{y}_r^{**}) = \bar{Y}^2\left[C_y^{*2} + C_x^{*2} - 2C_{yx}^*\right]$$
The product estimator when non-response occurs in both variables is
$$\bar{y}_p^{**} = \frac{\bar{y}^* \bar{x}^*}{\bar{X}}$$
Bias and MSE of Product Estimator Under Non-response
$$Bias(\bar{y}_p^{**}) = \bar{Y} C_{yx}^*$$
$$MSE(\bar{y}_p^{**}) = \bar{Y}^2\left[C_y^{*2} + C_x^{*2} + 2C_{yx}^*\right]$$

Randomized Response Technique

Introduction:
This technique is useful for estimating sensitive characteristics in a population. It was first
proposed by S. L. Warner in 1965. There are qualitative and quantitative response models.
Qualitative response models are used to estimate the proportion of some behavior or occurrence
in a population.
For example, to estimate the “proportion of people who smoke cigarettes today”.
Warner introduced the following model
$$Z = Yp + (1 - p)(1 - Y)$$
Here p is the probability of answering the sensitive statement, Y is the true proportion of those
interviewed who bear the sensitive property, and Z is the proportion of YES answers.
The Warner model can be transformed as
$$Y = \frac{Z + p - 1}{2p - 1}$$
For Example
Statement 1: "I smoke cigarettes."
Statement 2: "I never smoke cigarettes."
With Z = 3/4 and p = 1/6,
$$Y = \frac{\frac{3}{4} + \frac{1}{6} - 1}{2 \cdot \frac{1}{6} - 1} = \frac{1}{8}$$
Example 1.
Statement 1: "I have falsified my tax return."
Statement 2: "I have never falsified my tax return."
The Warner model
$$Z = Yp + (1 - p)(1 - Y)$$
$$\frac{40}{50} = Y\left(\frac{1}{6}\right) + \left(1 - \frac{1}{6}\right)(1 - Y)$$
$$Y = \frac{\frac{4}{5} + \frac{1}{6} - 1}{2 \cdot \frac{1}{6} - 1} = 0.05$$
Example 2.
Statement 1: "Have you ever used a sick day leave when you weren't really sick?"
Statement 2: "Have you never used a sick day leave when you weren't really sick?"
The Warner model
$$Z = Yp + (1 - p)(1 - Y)$$
$$\frac{350}{500} = Y\left(\frac{1}{6}\right) + \left(1 - \frac{1}{6}\right)(1 - Y)$$
$$Y = \frac{\frac{7}{10} + \frac{1}{6} - 1}{2 \cdot \frac{1}{6} - 1}$$
$$Y = 0.2$$
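The transformed Warner model is a one-line computation, so the three examples can be reproduced in R. This is a minimal sketch; the function name warner_estimate is an illustrative assumption, and the inputs are the proportions of YES answers and the value p = 1/6 used in the examples.

```r
# Warner (1965) model solved for Y:  Y = (Z + p - 1) / (2p - 1)
# warner_estimate is a hypothetical helper name used only for illustration.
warner_estimate <- function(z, p) {
  stopifnot(p != 0.5)          # the model cannot be inverted when p = 1/2
  (z + p - 1) / (2 * p - 1)
}

y_smoke <- warner_estimate(3 / 4, 1 / 6)      # smoking example
y_tax   <- warner_estimate(40 / 50, 1 / 6)    # Example 1: tax return
y_sick  <- warner_estimate(350 / 500, 1 / 6)  # Example 2: sick leave
print(c(y_smoke, y_tax, y_sick))
```

The guard on p reflects that the denominator 2p - 1 vanishes at p = 1/2, where the two statements are equally likely and Z carries no information about Y.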
Quantitative Randomized Response Technique.

Y may be the monthly income of the head of a household. Y may be the total value of purchase
orders in a year for a company. Y may be the number of times you bunk a class during a
semester.
Let S be a scrambling variable independent of Y. The respondent is asked to report a scrambled
response for Y given by
$$Z = Y + S$$
Mean of Response
The variable S is introduced such that
$$E(S) = 0$$
$$E(Z) = E(Y)$$
Variance of Response
$$V(Z) = V(Y) + V(S)$$
Q. Generate a population of size 1000 using a normal distribution with mean 10 and standard
deviation 2. Consider sample size n=100. Select 10,000 random samples using the model
Z=S+Y. The variable S is taken to be a normal variate with mean equal to zero and standard
deviation equal to 1.
(i) Calculate the following estimator considering SRSWOR
$$t_z = \bar{z}$$
(ii) Calculate the variance of the above estimator
Ans:
Let S be a scrambling variable independent of Y. The respondent is asked to report a scrambled
response for Y given by
$$Z = Y + S$$
# Estimation of Mean from a single sample
N=1000; n=100;
y <- rnorm(N, 10, 2)      # sensitive study variable
s <- rnorm(N, 0, 1)       # scrambling variable with mean zero
sa <- sample(1:N, n)      # SRSWOR of unit labels
sy<-y[sa]; ss=s[sa];
z<-sy+ss                  # scrambled responses
mz=mean(z)
# Repeated sampling: 10,000 samples
N=1000; n=100;mz=c();
y <- rnorm(N, 10, 2)
s <- rnorm(N, 0, 1)
for(i in 1:10000)
{
sa <- sample(1:N, n)
sy<-y[sa]; ss=s[sa];
mz[i]=mean(sy+ss);
}
mean(mz)                  # (i) average of the estimator over the samples
var(mz)                   # (ii) variance of the estimator over the samples
Quantitative RRT with Non-Sensitive Auxiliary Variable.
Y may be the monthly income of the head of a household and X may be her current age. Y may be
the total value of purchase orders in a year for a company and X may be the total turnover for
that company in that year.
Scrambled Response
Let Y be the study variable, a sensitive variable which cannot be observed directly due to
respondent bias. Let X be a non-sensitive auxiliary variable which has a positive correlation with
Y. Let S be a scrambling variable independent of Y and X.
The respondent is asked to report a scrambled response for Y given by
$$Z = Y + S$$
The respondent is asked to provide a true response for X.
Mean & Variance of Z
The variable S is introduced such that
$$E(S) = 0$$
$$E(Z) = E(Y)$$
$$V(Z) = V(Y) + V(S)$$
$$C_z^2 = C_y^2 + \frac{V(S)}{\bar{Y}^2}$$
Ratio Estimator of Mean for Sensitive Variable
$$\bar{y}_{rs} = \frac{\bar{z}}{\bar{x}} \bar{X}$$
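The ratio estimator for a sensitive variable can be tried on simulated data. The R sketch below is only an illustration: the population model for x and y, the seed, and the names est_ratio and est_mean are assumptions and do not come from the handout. It scrambles y with a zero-mean S, samples by SRSWOR, and compares the ratio estimate with the plain mean of the scrambled responses.

```r
# Ratio estimator of the mean of a sensitive variable from scrambled responses.
# All population values below are simulated (made-up) for illustration.
set.seed(2)
N <- 1000; n <- 100
x <- rnorm(N, 40, 8)                  # non-sensitive auxiliary variable, known for all units
y <- 5 + 0.5 * x + rnorm(N, 0, 2)     # sensitive variable, positively correlated with x
s <- rnorm(N, 0, 1)                   # scrambling variable, E(S) = 0, independent of x and y

idx <- sample(N, n)                   # SRSWOR of unit labels
z <- y[idx] + s[idx]                  # reported scrambled responses Z = Y + S

est_mean  <- mean(z)                              # simple estimator
est_ratio <- mean(z) / mean(x[idx]) * mean(x)     # ratio estimator using known population mean of x
print(c(true = mean(y), mean = est_mean, ratio = est_ratio))
```

The ratio estimator only needs the true x-values of the sampled units and the population mean of x; the sensitive y-values are never observed directly.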
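Q. Find Bias & MSE of Ratio Estimator Using RRT.
The ratio estimator is
$$\bar{y}_{rrt} = \frac{\bar{z}}{\bar{x}} \bar{X}$$
Notations
$$e_0 = (\bar{z} - \bar{Z})/\bar{Z}, \qquad e_1 = (\bar{x} - \bar{X})/\bar{X}$$
$$E(e_0^2) = \frac{1-f}{n} C_z^2, \quad E(e_1^2) = \frac{1-f}{n} C_x^2, \quad E(e_0 e_1) = \frac{1-f}{n} C_{zx}$$
where C_{zx} = \rho_{zx} C_z C_x and f = n/N.
$$\bar{z} = \bar{Z}(1 + e_0), \qquad \bar{x} = \bar{X}(1 + e_1)$$
$$\bar{y}_{rrt} = \frac{\bar{z}}{\bar{x}} \bar{X} = \frac{\bar{Z}(1 + e_0)}{\bar{X}(1 + e_1)} \bar{X}$$
Bias of Ratio Estimator Using RRT
Since E(S) = 0, \bar{Z} = \bar{Y}, so
$$\bar{y}_{rrt} = \bar{Y}(1 + e_0)(1 + e_1)^{-1} \approx \bar{Y}(1 + e_0)(1 - e_1 + e_1^2)$$
$$Bias(\bar{y}_{rrt}) = \frac{1-f}{n} \bar{Y}\left(C_x^2 - \rho_{zx} C_z C_x\right)$$
For MSE of Ratio Estimator
To the first order of approximation,
$$\bar{y}_{rrt} = \bar{Y}(1 + e_0)(1 + e_1)^{-1} \approx \bar{Y}(1 + e_0)(1 - e_1)$$
$$\bar{y}_{rrt} - \bar{Y} \approx \bar{Y}(e_0 - e_1)$$
$$E(\bar{y}_{rrt} - \bar{Y})^2 = \bar{Y}^2 E(e_0^2 + e_1^2 - 2 e_0 e_1)$$
$$MSE(\bar{y}_{rrt}) = \frac{1-f}{n} \bar{Y}^2\left(C_z^2 + C_x^2 - 2\rho_{xz} C_z C_x\right)$$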

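Q. Find MSE of Ratio Estimator with Two Auxiliary Variables
Ans:
$$\bar{y}_{rr} = \bar{y}\,\frac{\bar{X}}{\bar{x}}\,\frac{\bar{Z}}{\bar{z}}$$
Notations
$$e_0 = (\bar{y} - \bar{Y})/\bar{Y}, \qquad e_1 = (\bar{x} - \bar{X})/\bar{X}, \qquad e_2 = (\bar{z} - \bar{Z})/\bar{Z}$$
$$E(e_0) = E(e_1) = E(e_2) = 0$$
$$E(e_0^2) = C_y^2, \quad E(e_1^2) = C_x^2, \quad E(e_2^2) = C_z^2$$
$$E(e_0 e_1) = C_{yx}, \quad E(e_0 e_2) = C_{yz}, \quad E(e_1 e_2) = C_{xz}$$
where C_{yx} = \rho_{yx} C_y C_x, and similarly for C_{yz} and C_{xz}.
$$\bar{y} = \bar{Y}(1 + e_0), \qquad \bar{x} = \bar{X}(1 + e_1), \qquad \bar{z} = \bar{Z}(1 + e_2)$$
$$\bar{y}_{rr} = \frac{\bar{Y}(1 + e_0)}{\bar{X}(1 + e_1)\,\bar{Z}(1 + e_2)} \bar{X}\bar{Z}$$
MSE of the Estimator
To the first order of approximation,
$$\bar{y}_{rr} = \bar{Y}(1 + e_0)(1 + e_1)^{-1}(1 + e_2)^{-1} \approx \bar{Y}(1 + e_0)(1 - e_1)(1 - e_2)$$
$$\bar{y}_{rr} = \bar{Y}(1 + e_0)(1 - e_1 - e_2 + e_1 e_2) \approx \bar{Y}(1 + e_0)(1 - e_1 - e_2)$$
$$\bar{y}_{rr} \approx \bar{Y}(1 + e_0 - e_1 - e_2) \;\Rightarrow\; \bar{y}_{rr} - \bar{Y} \approx \bar{Y}(e_0 - e_1 - e_2)$$
$$E(\bar{y}_{rr} - \bar{Y})^2 = \bar{Y}^2 E(e_0 - e_1 - e_2)^2$$
$$MSE(\bar{y}_{rr}) = \bar{Y}^2 E(e_0^2 + e_1^2 + e_2^2 - 2 e_0 e_1 - 2 e_0 e_2 + 2 e_1 e_2)$$
$$MSE(\bar{y}_{rr}) = \bar{Y}^2\left[C_y^2 + C_x^2 + C_z^2 - 2C_{xy} - 2C_{yz} + 2C_{xz}\right]$$
$$MSE(\bar{y}_{rr}) = \bar{Y}^2\left[C_y^2 + C_x^2 + C_z^2 - 2(C_{xy} + C_{yz} - C_{xz})\right]$$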
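Q. Define Regression Estimator with Two Auxiliary Variables, and find the variance
expression of the estimator.
Ans:
Regression Estimator with Two Auxiliary Variables for the population mean
$$\bar{y}_{rreg} = \bar{y} + \hat{\beta}_{yx}(\bar{X} - \bar{x}) + \hat{\beta}_{yz}(\bar{Z} - \bar{z})$$
$$\hat{\beta}_{yx} = \frac{S_{yx}}{S_x^2}, \qquad \hat{\beta}_{yz} = \frac{S_{yz}}{S_z^2}$$
Notations
$$e_0 = (\bar{y} - \bar{Y})/\bar{Y}, \qquad e_1 = (\bar{x} - \bar{X})/\bar{X}, \qquad e_2 = (\bar{z} - \bar{Z})/\bar{Z}$$
$$E(e_0) = E(e_1) = E(e_2) = 0$$
$$E(e_0^2) = \lambda C_y^2, \quad E(e_1^2) = \lambda C_x^2, \quad E(e_2^2) = \lambda C_z^2, \quad E(e_0 e_1) = \lambda C_{yx}$$
where C_{yx} = \rho_{yx} C_y C_x (similarly for C_{yz} and C_{xz}) and \lambda = (N-n)/(Nn).
Variance of Regression Estimator
$$\bar{y}_{rreg} = \bar{Y}(1 + e_0) + \hat{\beta}_{yx}\left[\bar{X} - \bar{X}(1 + e_1)\right] + \hat{\beta}_{yz}\left[\bar{Z} - \bar{Z}(1 + e_2)\right]$$
$$\bar{y}_{rreg} = \bar{Y}(1 + e_0) - \hat{\beta}_{yx}\bar{X}e_1 - \hat{\beta}_{yz}\bar{Z}e_2$$
$$E(\bar{y}_{rreg} - \bar{Y})^2 = E\left[\bar{Y}e_0 - \hat{\beta}_{yx}\bar{X}e_1 - \hat{\beta}_{yz}\bar{Z}e_2\right]^2$$
$$V(\bar{y}_{rreg}) = E\left[\bar{Y}^2 e_0^2 + \hat{\beta}_{yx}^2\bar{X}^2 e_1^2 + \hat{\beta}_{yz}^2\bar{Z}^2 e_2^2 - 2\bar{Y}\bar{X}\hat{\beta}_{yx} e_0 e_1 - 2\bar{Y}\bar{Z}\hat{\beta}_{yz} e_0 e_2 + 2\bar{X}\bar{Z}\hat{\beta}_{yz}\hat{\beta}_{yx} e_1 e_2\right]$$
$$V(\bar{y}_{rreg}) = \lambda\left[\bar{Y}^2 C_y^2 + \hat{\beta}_{yx}^2\bar{X}^2 C_x^2 + \hat{\beta}_{yz}^2\bar{Z}^2 C_z^2 - 2\bar{Y}\bar{X}\hat{\beta}_{yx} C_{yx} - 2\bar{Y}\bar{Z}\hat{\beta}_{yz} C_{yz} + 2\bar{X}\bar{Z}\hat{\beta}_{yz}\hat{\beta}_{yx} C_{xz}\right]$$
$$V(\bar{y}_{rreg}) = \lambda S_y^2\left[1 - \rho_{yx}^2 - \rho_{yz}^2 + 2\rho_{yx}\rho_{yz}\rho_{xz}\right]$$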

Designs for Hard To Detect Populations

Q. Define Capture Recapture Sampling.

Select a sample, mark the selected units and release them back into the population.
Select a second sample independently from the population.
Estimate the proportion of marked units from the second sample.
Suppose
T = Number of animals in the population
K = Number of animals marked on the first visit
n = Number of animals captured on the second visit
k = Number of recaptured animals that were marked
The proportion of marked subjects in the recaptured sample is likely to be about the same as
the proportion of marked subjects in the whole population:
$$\frac{k}{n} = \frac{K}{T}$$
The estimated population size is
$$\hat{T} = \frac{nK}{k}$$
Example 1.
We are interested in estimating the number of homeless people in a city. An initial sample of 100
homeless people is selected, marked and released. A sample of size 200 is selected on the second
visit, and 50 of them had been selected on the first visit.
T=??, K=100, n=200, k=50
$$\hat{T} = \frac{nK}{k} = \frac{200 \times 100}{50} = 400$$
Example 2.
In a field study K =300 mice are caught in traps, tagged, and released. A few days later the
researchers return to the study area and independently capture n =200 mice, of which they find
that k =50 have tags. T=??, K=300, n=200, k=50
$$\hat{T} = \frac{nK}{k} = \frac{200 \times 300}{50} = 1200$$
Example 3.
We are interested in estimating the size of a population of turtles in a wildlife preserve. An
initial sample of 20 turtles is selected, marked and released. A sample of size 30 is selected on
the second visit, and 10 marked turtles are found among them.
T=??, K=20, n=30, k=10
$$\hat{T} = \frac{nK}{k} = \frac{30 \times 20}{10} = 60$$
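The three capture-recapture examples can be reproduced with a small R function. This is a minimal sketch; the function name cr_estimate is an illustrative assumption, and the arguments follow the notation K, n, k used above.

```r
# Capture-recapture estimate of population size: T_hat = n * K / k
#   K = number marked on the first visit
#   n = number captured on the second visit
#   k = number of marked units among the n recaptured
# cr_estimate is a hypothetical helper name used only for illustration.
cr_estimate <- function(K, n, k) {
  stopifnot(k > 0, k <= n, k <= K)   # at least one marked unit must be recaptured
  n * K / k
}

t_homeless <- cr_estimate(K = 100, n = 200, k = 50)  # Example 1
t_mice     <- cr_estimate(K = 300, n = 200, k = 50)  # Example 2
t_turtles  <- cr_estimate(K = 20,  n = 30,  k = 10)  # Example 3
print(c(t_homeless, t_mice, t_turtles))
```

The check k > 0 matters in practice: if no marked unit is recaptured, the estimator is undefined.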
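Q. Define Line and Point Transects

Line transects are mostly used for animal or plant species. Select a sample of n lines of
length L. The observer moves along a selected line and notes the location, relative to the line,
of every individual of the species detected.
Narrow-Strip Method
Only the y_0 individuals detected within a distance w_0 on either side of the line are counted.
The estimated density is
$$\hat{D} = \frac{y_0}{2 w_0 L}$$
For a study region of area A, the estimated total is
$$\text{Est. Total} = A\hat{D} = \frac{A y_0}{2 w_0 L}$$
Example
On a line transect of length L =100 meters, a total of y =18 birds were detected at the following
distances (in meters) from the transect line
0, 0, 1, 3, 7, 11, 11, 12, 15, 15, 18, 19, 21, 23, 28, 33, 34, 44.
It is desired to estimate the density of birds in the study region, with w0=20. Twelve of the
birds lie within 20 meters of the line, so y0=12 and
$$\hat{D} = \frac{y_0}{2 w_0 L} = \frac{12}{2 \times 20 \times 100} = 0.003 \text{ birds per square meter}$$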
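The narrow-strip calculation for the bird example can be written out in R. This is a minimal sketch using the distances listed in the example; the variable names d, y0 and D_hat are illustrative, and detections at a distance of at most w0 are counted as inside the strip.

```r
# Narrow-strip estimate of density from a line transect.
# Distances (in meters) of the 18 detected birds from the transect line.
d  <- c(0, 0, 1, 3, 7, 11, 11, 12, 15, 15, 18, 19, 21, 23, 28, 33, 34, 44)
L  <- 100   # transect length in meters
w0 <- 20    # strip half-width in meters

y0    <- sum(d <= w0)         # detections inside the strip
D_hat <- y0 / (2 * w0 * L)    # estimated density (birds per square meter)
print(c(y0 = y0, D_hat = D_hat))
```

Multiplying D_hat by the area A of the study region (in square meters) gives the estimated total.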
Point Transects
Sites are selected in the same way as lines are selected in line transects.
Observations are obtained at the selected sites.
Estimation is the same as in line transects.
163. Explain Adaptive Cluster Sampling
Adaptive cluster sampling is used when the selection procedure depends on the observations
made during the survey.
• Select an initial sample of size n with a suitable design.
• Observe the selected units for a specified condition.
• If any of the initially selected units satisfies the pre-defined condition, its adjacent
neighboring units are also sampled and investigated.
The sample mean is
$$\bar{w}_y = \frac{1}{n}\sum_{i=1}^{n} w_{yi}$$
where w_{yi} is the mean of the y-values in the network that contains the ith unit of the initial
sample.

