Course Handouts

Sampling Techniques
Virtual University of Pakistan
STA632-Sampling Techniques

BASIC CONCEPTS
Q. Define sampling techniques?
Ans: Data are essential for decision making, so an appropriate method of data collection is needed. Sampling techniques address two questions: how best to obtain the sample, and how best to use the sample to estimate the characteristics of the whole population. Accordingly, every sampling strategy has two parts. The first is the selection procedure. The second is the estimation procedure.

Q. Define sampling?
Ans: Sampling is the process of selecting a sample from the population. The method used to select the sample is called the sampling design.
Q. Define sample?
Ans: A perfect sample would be a version of the population, mirroring every characteristic of the population (Lohr, 2nd edition). A good sample is a small but representative part of the population.
Q. What is the difference between observation units and sampling units?
Ans: A unit that can be selected for a sample is called a sampling unit. An object on which a measurement is taken is called an observation unit.
Q. Define sampling frame?
Ans: The sampling frame is the list or map from which the potential sampling units are drawn, for example a list of all the classrooms or a map of an area containing farms.
Q. Distinguish between parameter and statistic?
Ans: A parameter is any summary number describing the whole population. A statistic is any summary number obtained from a sample.

[Figure: the population (entire class) is summarized by a parameter; the sample (a subset of the class) is summarized by a statistic.]

Q. Define sampling error?
Ans: Sampling error is the error in an estimate that arises because the sample is only a part of the population. The difference between the statistic and the parameter is considered the sampling error.
Sampling Error = Statistic - Parameter.
Q. Define non-sampling error?
Ans: These errors can occur even when the whole population is studied. Non-sampling error cannot be attributed to sampling.
Q. What is selection bias?
Ans: Selection bias occurs when a random process is not used in the selection of the sample, so that some part of the population is ignored during selection. As a result, the findings of the study may not be accurate.
Q. What is measurement error?
Ans: Measurement error occurs when the response has a tendency to differ from the true value in one direction, for example because of a problem with the measuring instrument.
Q. Define nonresponse?
Ans: Nonresponse occurs when individuals chosen for the sample are not willing to participate in the survey. This is a type of selection bias.
Q. What is a probability sample?
Ans: This is a sample obtained when each unit of the population has some chance (a nonzero probability) of being included in the sample. It is obtained by a random process.
Example: A bag contains one white ball and one black ball.

$$P(\text{White}) = \frac{\text{Number of white balls}}{\text{Number of total balls}} = \frac{1}{2}$$
Q. Define randomness?
Ans: Randomness means that every unit has some chance of being selected in the sample, so we cannot be certain in advance whether any particular unit will be selected.
Q. How many types of probability sampling are there?
Ans: The main types are simple random sampling, stratified sampling, cluster sampling, systematic sampling, and multistage/multiphase sampling.
Q. Define non-probability sampling?
Ans: No random process is followed for the selection of units. It is assumed that the population is evenly distributed with respect to the characteristics of interest.
Q. What is convenience sampling?
Ans: Respondents are selected according to the convenience of the researcher. This is also called accidental sampling, opportunity sampling or grab sampling.
Q. Define purposive sampling?
Ans: The units selected are those judged appropriate for meeting the objective of the study. The selection is based on the judgment of the researcher.
Q. What is quota sampling?
Ans: The population is divided into subgroups on the basis of similar characteristics. The subgroups are called quotas. A nonrandom selection is made within each quota.
Q. Define snowball sampling?
Ans: This is also called chain sampling because it works like a chain: the selected subjects are asked for assistance in reaching other subjects.
Q. Define simple random sampling?
Ans: Each member of the population has an equal probability of being included in the sample, and each possible sample of size n has an equal probability of being selected.
Q. What is the difference between sampling with replacement and without replacement?
Ans: Sampling with replacement: If the selected unit is returned to the population before the next unit is selected, the sampling is with replacement (WR). For a population of size N and a sample of size n,

$$\text{Total samples} = N^n, \qquad P(S) = \frac{1}{N^n}$$

Example: Consider a population of size N=3 with units A, B, C.
The possible samples of size n=2 are
S1=(A,A), S2=(A,B), S3=(A,C), S4=(B,A), S5=(B,B), S6=(B,C), S7=(C,A), S8=(C,B), S9=(C,C)

$$P(S_i) = \frac{1}{N^n} = \frac{1}{3^2} = \frac{1}{9}$$

Sampling without replacement: If the selected unit is not returned to the population before the next unit is selected, the sampling is without replacement (WOR).

$$\text{Total samples} = \binom{N}{n}, \qquad P(S) = \frac{1}{\binom{N}{n}}, \qquad \binom{N}{n} = \frac{N!}{n!\,(N-n)!}$$

Example: Consider a population of size N=3 with units A, B, C.
The possible samples of size n=2 are S1=(A,B), S2=(A,C), S3=(B,C)

$$P(S_1) = P(S_2) = P(S_3) = \frac{1}{\binom{3}{2}} = \frac{1}{3}$$
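The two counting rules above can be checked by listing the samples in R. This is a minimal sketch for the N=3, n=2 example; the object names `units`, `wr` and `wor` are chosen here for illustration only.

```r
# Population of N = 3 units from the example
units <- c("A", "B", "C")
n <- 2

# With replacement: every ordered pair is a possible sample (N^n of them)
wr <- expand.grid(first = units, second = units, stringsAsFactors = FALSE)

# Without replacement: every unordered subset of size n (choose(N, n) of them)
wor <- t(combn(units, n))

nrow(wr)       # number of WR samples, N^n = 9
nrow(wor)      # number of WOR samples, choose(3, 2) = 3
1 / nrow(wr)   # probability of each WR sample
1 / nrow(wor)  # probability of each WOR sample
```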

Q. Define stratified sampling?
Ans: Stratified sampling is used when the population is heterogeneous with respect to the characteristics of interest. The population is divided into homogeneous subgroups called strata, and a simple random sample is selected from each stratum.
Q. Define linear systematic sampling?
Ans: The first unit is selected randomly from the first k units. Every kth unit thereafter is included in the sample. The interval k is calculated by dividing the population size by the sample size.
Q. Define circular systematic sampling?
Ans: The interval k is obtained by rounding N/n down to the nearest integer. The first unit is selected at random from all N units (1 to N). Every kth unit thereafter is included in the sample, continuing cyclically through the list until n units are selected.
Q. Define cluster sampling?
Ans: A cluster is a sampling unit consisting of several observation units. Any sampling method can be used for the selection of clusters. All the units within a selected cluster are studied.


SIMPLE RANDOM SAMPLING


Q. Prove that the sample mean is an unbiased estimator of the population mean under simple random sampling with/without replacement.
Ans: Unbiased estimator: An estimator is unbiased if its expected value is equal to the population parameter,

$$E(\text{estimator}) = \text{Parameter}$$

The expectation of a random variable is defined as the sum of the products of its possible values and their probabilities:

$$E(y_i) = \sum_{j=1}^{N} p_j Y_j$$

Theorem: In simple random sampling with replacement and without replacement, the sample mean $\bar{y}$ is an unbiased estimator of the population mean $\bar{Y}$.

Notation: n = sample size, N = population size, $y_i$ = value of the variable observed on the ith sample unit.

The estimator of the population mean $\bar{Y}$ is the sample mean

$$\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i$$

An estimator of the population total is

$$y' = N\bar{y} = \frac{N}{n}\sum_{i=1}^{n} y_i$$

Under SRSWR: $\bar{y}$ is an unbiased estimate of $\bar{Y}$ when $E(\bar{y}) = \bar{Y}$. The expectation of $\bar{y}$, by definition, is

$$E(\bar{y}) = E\left(\frac{1}{n}\sum_{i=1}^{n} y_i\right) = \frac{1}{n}\sum_{i=1}^{n} E(y_i)$$

At each draw every population unit is selected with probability 1/N, so

$$E(y_i) = \sum_{j=1}^{N} p_j Y_j = \sum_{j=1}^{N} \frac{1}{N} Y_j = \frac{1}{N}\sum_{j=1}^{N} Y_j = \bar{Y}$$

and therefore

$$E(\bar{y}) = \frac{1}{n}\, n\, \bar{Y} = \bar{Y}$$

Under SRSWOR: There are $m = \binom{N}{n}$ possible distinct samples without replacement, each selected with probability $1/\binom{N}{n}$, so

$$E(\bar{y}) = \frac{1}{\binom{N}{n}} \sum_{k=1}^{m} \left(\frac{1}{n}\sum_{i=1}^{n} y_i\right)_k$$

Among the $\binom{N}{n}$ possible samples, each unit appears $\binom{N-1}{n-1}$ times, so

$$E(\bar{y}) = \frac{1}{n\binom{N}{n}} \binom{N-1}{n-1} \sum_{i=1}^{N} Y_i = \frac{1}{N}\sum_{i=1}^{N} Y_i = \bar{Y}$$

since $\binom{N-1}{n-1} / \binom{N}{n} = n/N$.
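The SRSWOR result can be checked numerically by listing every possible sample. The sketch below uses the 15-unit population that appears in the R examples of this handout and n = 5; the names `all_means` and `expected_ybar` are illustrative.

```r
# Population used in the R examples of this handout
yp <- c(111, 150, 121, 198, 112, 136, 114, 129, 117, 115, 186, 110, 121, 115, 114)
N <- length(yp)
n <- 5

# Mean of every possible SRSWOR sample of size n (choose(15, 5) = 3003 samples)
all_means <- combn(yp, n, FUN = mean)

# All samples are equally likely, so E(ybar) is the plain average of the sample means
expected_ybar <- mean(all_means)

length(all_means)   # 3003
expected_ybar       # agrees with the population mean below
mean(yp)
```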


Q. Derive the variance of the sample mean under simple random sampling (SRS) with replacement and without replacement?
Ans: Variance of sample mean under SRSWOR:

The variance of the sample mean for simple random sampling without replacement, $\bar{y}_{wor}$, is

$$Var(\bar{y}_{wor}) = \frac{N-n}{N}\,\frac{S^2}{n} = (1-f)\,\frac{S^2}{n}, \qquad \text{where } f = \frac{n}{N}$$

and $S^2 = \frac{1}{N-1}\sum_{i=1}^{N}(Y_i - \bar{Y})^2$.

Proof: We know that

$$\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i$$

The variance of this estimator is

$$Var(\bar{y}) = \frac{1}{n^2}\, Var\left(\sum_{i=1}^{n} y_i\right)$$

The variance of a sum of random variables is equal to the sum of their variances plus the sum of their covariances:

$$Var(\bar{y}) = \frac{1}{n^2}\left[\sum_{i=1}^{n} Var(y_i) + \sum_{\substack{i,j=1 \\ j \neq i}}^{n} Cov(y_i, y_j)\right]$$

For the variance term,

$$Var(y_i) = E(y_i - \bar{Y})^2 = E(y_i^2) - \bar{Y}^2 = \frac{1}{N}\sum_{i=1}^{N} Y_i^2 - \bar{Y}^2 = \frac{1}{N}\left[\sum_{i=1}^{N} Y_i^2 - N\bar{Y}^2\right] = \frac{1}{N}\sum_{i=1}^{N}(Y_i - \bar{Y})^2 = \frac{N-1}{N}S^2$$

For the covariance term ($i \neq j$),

$$Cov(y_i, y_j) = E(y_i y_j) - \bar{Y}^2 = \frac{1}{N(N-1)}\sum_{\substack{i,j=1 \\ j \neq i}}^{N} Y_i Y_j - \bar{Y}^2 = \frac{1}{N(N-1)}\left[\left(\sum_{i=1}^{N} Y_i\right)^2 - \sum_{i=1}^{N} Y_i^2\right] - \frac{1}{N^2}\left(\sum_{i=1}^{N} Y_i\right)^2$$

$$Cov(y_i, y_j) = -\frac{1}{N(N-1)}\left[\sum_{i=1}^{N} Y_i^2 - \frac{\left(\sum_{i=1}^{N} Y_i\right)^2}{N}\right] = -\frac{S^2}{N}$$

After substitution we get

$$Var(\bar{y}) = \frac{1}{n^2}\left[\sum_{i=1}^{n} \frac{N-1}{N}S^2 - \sum_{\substack{i,j=1 \\ j \neq i}}^{n} \frac{S^2}{N}\right] = \frac{1}{n^2}\left[n\,\frac{N-1}{N}S^2 - \frac{n(n-1)}{N}S^2\right] = \frac{N-n}{N}\,\frac{S^2}{n}$$
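As a numerical check of the SRSWOR formula, the exact variance of the sample mean over all possible samples can be compared with (1 - f) S^2 / n. The sketch assumes the 15-unit population used in the R examples of this handout and n = 5; the names `var_exact` and `var_formula` are illustrative.

```r
yp <- c(111, 150, 121, 198, 112, 136, 114, 129, 117, 115, 186, 110, 121, 115, 114)
N <- length(yp)
n <- 5

# Means of all choose(N, n) equally likely SRSWOR samples
all_means <- combn(yp, n, FUN = mean)

# Exact variance of ybar: average squared deviation from the population mean
var_exact <- mean((all_means - mean(yp))^2)

# Formula (1 - f) * S^2 / n; var() uses the divisor N - 1, which matches S^2
var_formula <- (1 - n / N) * var(yp) / n

var_exact     # about 100.71
var_formula   # same value
```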

Variance of sample mean for SRSWR:

The variance of the sample mean for simple random sampling with replacement, $\bar{y}_{wr}$, is

$$Var(\bar{y}_{wr}) = \frac{N-1}{N}\,\frac{S^2}{n} = \left(1 - \frac{1}{N}\right)\frac{S^2}{n}$$

Proof: We know that

$$\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i$$

The variance of this estimator is

$$Var(\bar{y}) = \frac{1}{n^2}\, Var\left(\sum_{i=1}^{n} y_i\right)$$

The variance of a sum of random variables is equal to the sum of their variances plus the sum of their covariances. Under sampling with replacement the draws are independent, so the covariances are zero and

$$Var(\bar{y}) = \frac{1}{n^2}\sum_{i=1}^{n} Var(y_i)$$

$$Var(y_i) = E(y_i - \bar{Y})^2 = E(y_i^2) - \bar{Y}^2 = \frac{1}{N}\sum_{i=1}^{N} Y_i^2 - \bar{Y}^2 = \frac{N-1}{N}S^2$$

$$Var(\bar{y}) = \frac{1}{n^2}\sum_{i=1}^{n} \frac{N-1}{N}S^2 = \frac{N-1}{Nn}S^2$$
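The SRSWR formula can be checked in the same spirit on a small case. The sketch below assumes an illustrative population made of the first four values of the handout's population and n = 2, so that all N^n = 16 ordered samples can be listed.

```r
# Small illustrative population (first four values of the handout's population)
y <- c(111, 150, 121, 198)
N <- length(y)
n <- 2

# All N^n = 16 equally likely ordered samples drawn with replacement
samples <- expand.grid(first = y, second = y)
means <- rowMeans(samples)

# Exact variance of ybar over the sampling distribution
var_exact <- mean((means - mean(y))^2)

# Formula (N - 1) / N * S^2 / n
var_formula <- (N - 1) / N * var(y) / n

var_exact
var_formula
```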
Q. Derive the unbiased estimator of variance under SRS with replacement and without replacement.
Ans: For simple random sampling without replacement, $s^2$ is an unbiased estimator of $S^2$, and for simple random sampling with replacement $s^2$ is an unbiased estimator of $S^2 (N-1)/N$.
Proof: For both simple random sampling without replacement and simple random sampling with replacement, we have

$$(n-1)s^2 = \sum_{i=1}^{n}(y_i - \bar{y})^2 = \sum_{i=1}^{n}\left[(y_i - \bar{Y}) - (\bar{y} - \bar{Y})\right]^2$$

$$= \sum_{i=1}^{n}(y_i - \bar{Y})^2 + \sum_{i=1}^{n}(\bar{y} - \bar{Y})^2 - 2\sum_{i=1}^{n}(y_i - \bar{Y})(\bar{y} - \bar{Y})$$

$$= \sum_{i=1}^{n}(y_i - \bar{Y})^2 + n(\bar{y} - \bar{Y})^2 - 2(\bar{y} - \bar{Y})\sum_{i=1}^{n}(y_i - \bar{Y})$$

$$= \sum_{i=1}^{n}(y_i - \bar{Y})^2 + n(\bar{y} - \bar{Y})^2 - 2n(\bar{y} - \bar{Y})^2$$

$$= \sum_{i=1}^{n}(y_i - \bar{Y})^2 - n(\bar{y} - \bar{Y})^2$$

Taking expectation,

$$(n-1)E(s^2) = n\,E(y_i - \bar{Y})^2 - n\,E(\bar{y} - \bar{Y})^2$$

As we know that $E(y_i - \bar{Y})^2 = \frac{N-1}{N}S^2$, therefore

$$(n-1)E(s^2) = n\,\frac{N-1}{N}S^2 - n\,E(\bar{y} - \bar{Y})^2$$

Now $E(\bar{y} - \bar{Y})^2$ is the variance of the sample mean, which is $\left(1 - \frac{n}{N}\right)\frac{S^2}{n}$ for simple random sampling without replacement and $\left(1 - \frac{1}{N}\right)\frac{S^2}{n}$ for simple random sampling with replacement.

Hence for simple random sampling without replacement

$$E(s^2) = \frac{n}{n-1}\left[\frac{N-1}{N}S^2 - \frac{N-n}{Nn}S^2\right] = \frac{n}{n-1}\cdot\frac{N(n-1)}{Nn}S^2 = S^2$$

and for simple random sampling with replacement

$$E(s^2) = \frac{n}{n-1}\left[\frac{N-1}{N}S^2 - \frac{N-1}{Nn}S^2\right] = \frac{n}{n-1}\cdot\frac{N-1}{N}\cdot\frac{n-1}{n}S^2 = \frac{N-1}{N}S^2$$
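Both expectations can be verified by averaging s^2 over every possible sample. This is a sketch under stated assumptions: the SRSWOR check uses the handout's 15-unit population with n = 5, and the SRSWR check uses an illustrative four-unit population with n = 2 so that all ordered samples can be listed.

```r
# SRSWOR: average of s^2 over all choose(15, 5) samples should equal S^2
yp <- c(111, 150, 121, 198, 112, 136, 114, 129, 117, 115, 186, 110, 121, 115, 114)
s2_wor <- combn(yp, 5, FUN = var)
mean(s2_wor)   # E(s^2) under SRSWOR
var(yp)        # S^2

# SRSWR: average of s^2 over all N^n ordered samples should equal (N - 1) / N * S^2
y4 <- c(111, 150, 121, 198)          # illustrative small population
wr <- expand.grid(first = y4, second = y4)
s2_wr <- apply(wr, 1, var)           # sample variance of each ordered pair
mean(s2_wr)                          # E(s^2) under SRSWR
(length(y4) - 1) / length(y4) * var(y4)
```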
Standard error

The standard error of the sample mean $\bar{y}$ is

$$S.E(\bar{y}) = \sqrt{Var(\bar{y})} = \begin{cases} \dfrac{S}{\sqrt{n}}\sqrt{1-f} & \text{for srswor} \\[2ex] \dfrac{S}{\sqrt{n}}\sqrt{1-\dfrac{1}{N}} & \text{for srswr} \end{cases}$$

Q. What is R?
Ans: R is a language and environment for statistical computing. It was initially written by Ross Ihaka and Robert Gentleman at the Department of Statistics of the University of Auckland. Its introductory documentation is derived from an original set of notes describing the S and S-Plus environments.
Downloading R: https://cran.r-project.org/


R Commands
y <- c(1, 2, 3, 4, 5)
# Indexing (R is case-sensitive, so the object is y, not Y)
y[2]
# Mean
mean(y)
# Standard deviation
sd(y)
# Variance
var(y)

Q. How is sampling done by using R?
Ans: Sampling with replacement: sample(y, size, replace=TRUE)
Sampling without replacement: sample(y, size, replace=FALSE)
OR
sample(y, size)
Consider a population consisting of
111, 150, 121, 198, 112, 136, 114, 129, 117, 115, 186, 110, 121, 115, 114
Enter population data in R
yp <- c(111, 150, 121, 198, 112, 136, 114, 129, 117, 115, 186, 110, 121, 115, 114)
Taking a sample by SRSWOR in R (the output changes from run to run because the selection is random)
ys <- sample(yp, 5)    Output: 114 121 111 186 150
Mean of sample: mean(ys)    Output: 136.4
Standard deviation: sd(ys)    Output: 31.74


yp <- c(11, 150, 121, 198, 12, 136, 14, 129, 17, 115, 186, 110, 121, 15, 14)
# population size
N <- length(yp)
# sample of size 5 (SRSWOR)
ys <- sample(yp, 5)
# sample size
n <- length(ys)
# mean of the sample
mys <- mean(ys)
# variance of the sample
vys <- var(ys)
# variance of ybar
vybar <- (1 - n/N) * var(yp) / n
# standard error
sdr <- sqrt(vybar)
# estimate of population total
ept <- N * mys
# variance of the estimated total
vept <- N^2 * vybar
# standard error of the estimated total
sdept <- sqrt(vept)
For SRSWR the sample is drawn with
yp <- c(11, 150, 121, 198, 12, 136, 14, 129, 17, 115, 186, 110, 121, 15, 14)
# sample of size 5 (SRSWR)
ys <- sample(yp, 5, replace=TRUE)
Q. Define the simulation study for a sampling strategy?
Ans: A simulation study is useful for evaluating a sampling strategy. We can generate populations that reflect specific situations. The steps are: generate the population; draw a sample of size n from it k times; compute the estimator from each sample; and calculate the variance of the k estimates to examine the efficiency of the strategy.
Consider a population consisting of
111, 150, 121, 198, 112, 136, 114, 129, 117, 115, 186, 110, 121, 115, 114
Enter population data in R
yp <- c(111, 150, 121, 198, 112, 136, 114, 129, 117, 115, 186, 110, 121, 115, 114)
Taking a sample by SRSWOR in R
ys <- sample(yp, 5)
mean(ys)
Now we will take k samples in R.
R program for k samples
Suppose k=10, n=5.
The for loop is used to repeat the statements.
m1 <- c()
yp <- c(111, 150, 121, 198, 112, 136, 114, 129, 117, 115, 186, 110, 121, 115, 114)
for (i in 1:10) {
  s <- sample(yp, 5)
  m1[i] <- mean(s)
}
Output
Sample means with 10 repetitions
149.8 129.2 140.8 132.4 118.2 117.6 118.4 118.0 132.6 132.6
k samples
Suppose k=10000, n=5.
The for loop is used to repeat the statements.
m1 <- c()
yp <- c(111, 150, 121, 198, 112, 136, 114, 129, 117, 115, 186, 110, 121, 115, 114)
for (i in 1:10000) {
  s <- sample(yp, 5)
  m1[i] <- mean(s)
}
var(m1)
Output
On first run, the result was 101.7741
On second run, the result was 102.1403
On third run, the result was 100.2455
Q. Obtain 10000 random samples of size 6 under SRSWOR using the following population, find the mean of each sample, and compute the variance of the sample means.
111, 150, 121, 198, 112, 136, 114, 129, 117, 115, 186, 110, 121, 115, 114
Ans:
The given population is
111, 150, 121, 198, 112, 136, 114, 129, 117, 115, 186, 110, 121, 115, 114
Enter population data in R
yp <- c(111, 150, 121, 198, 112, 136, 114, 129, 117, 115, 186, 110, 121, 115, 114)
Here, k=10000, n=6.
The for loop is used to repeat the statements.
m1 <- c()
yp <- c(111, 150, 121, 198, 112, 136, 114, 129, 117, 115, 186, 110, 121, 115, 114)
for (i in 1:10000) {
  s <- sample(yp, 6)
  m1[i] <- mean(s)
}
var(m1)

Output
On first run, the result was 75.4034
On second run, the result was 75.9143
On third run, the result was 75.6871
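The simulated variances can be compared with the theoretical SRSWOR value (1 - n/N) S^2 / n. The following is a minimal sketch; the helper names `theoretical_var` and `sim_var` and the seed are assumptions made here for illustration, not part of the handout.

```r
yp <- c(111, 150, 121, 198, 112, 136, 114, 129, 117, 115, 186, 110, 121, 115, 114)
N <- length(yp)

# Theoretical variance of the sample mean under SRSWOR
theoretical_var <- function(n) (1 - n / N) * var(yp) / n

# Simulated variance of k sample means (replicate() replaces the explicit for loop)
sim_var <- function(n, k = 10000) var(replicate(k, mean(sample(yp, n))))

set.seed(632)         # arbitrary seed, used only to make the run reproducible
theoretical_var(5)    # about 100.71
sim_var(5)            # close to the theoretical value
theoretical_var(6)    # about 75.54
sim_var(6)
```

The simulated values reported above (about 100 to 102 for n=5 and about 75 to 76 for n=6) are in line with these theoretical values.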
Q. Obtain 1000 random numbers from the normal distribution with mean 0 and variance 1 as the population. Obtain 10000 random samples of size 5 under SRSWOR from this population, find the mean of each sample, and compute the variance of the sample means.
Ans: Generate 1000 values with mean=0 and standard deviation=1.
rnorm(n, mean, sd)
yp <- rnorm(1000, 0, 1)
Suppose k=10000, n=5.
m1 <- c()
yp <- rnorm(1000, 0, 1)
for (i in 1:10000) {
  s <- sample(yp, 5)
  m1[i] <- mean(s)
}
var(m1)
Output
On first run, the result was 0.2044265
On second run, the result was 0.1914794
On third run, the result was 0.198996
Q. Explain the estimation of sample size for mean estimation by using an example?
Ans: Sample size estimation

Let d be the margin of error, i.e. the maximum allowable difference between the estimate and the true population value, and let $\alpha$ be a small probability that the error exceeds d. The requirement is

$$P\left(|\bar{y} - \bar{Y}| \ge d\right) = \alpha, \qquad \text{equivalently} \qquad P\left(|\bar{y} - \bar{Y}| < d\right) = 1 - \alpha$$

We assume that the sample mean is normally distributed, i.e.

$$t = \frac{\bar{y} - \bar{Y}}{S.E(\bar{y})} = \frac{d}{S.E(\bar{y})}, \qquad d = |\bar{y} - \bar{Y}| = t\, S.E(\bar{y})$$

where t is the normal deviate corresponding to the chosen confidence level.

Consider the case of sampling without replacement.

$$d = t\sqrt{\frac{N-n}{N}\,\frac{S^2}{n}}$$

$$d^2 = t^2\left[\frac{N-n}{N}\,\frac{S^2}{n}\right]$$

$$\frac{d^2 N n}{t^2} = NS^2 - nS^2$$

$$n = \frac{(tS/d)^2}{1 + \dfrac{(tS/d)^2}{N}}$$

When the population size is large

$$n = (tS/d)^2$$

Example: A quality manager is interested in estimating the mean diameter of bolts produced in the last week. Determine the sample size needed for a 90% confidence level if the error in the estimate is not to exceed 3 cm. A pilot sample yields a standard deviation of 30 cm.
Solution:

$$n = (tS/d)^2 = \frac{(1.645)^2 (30)^2}{(3)^2}$$

n = 270.60, so a sample of about 271 bolts is required.
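The sample size formula can be wrapped in a small R function. This is a sketch, not a library routine: the function name `sample_size_mean` and the finite population size N = 1000 in the second call are hypothetical values chosen for illustration.

```r
# Sample size for estimating a mean with margin of error d
# t: normal deviate, S: standard deviation, N: population size (Inf = large population)
sample_size_mean <- function(t, S, d, N = Inf) {
  n0 <- (t * S / d)^2                        # large-population sample size
  if (is.finite(N)) n0 / (1 + n0 / N) else n0
}

n_bolts <- sample_size_mean(t = 1.645, S = 30, d = 3)
n_bolts            # 270.6025, as in the example
ceiling(n_bolts)   # 271 units

# With a hypothetical finite population of N = 1000 the requirement drops
sample_size_mean(t = 1.645, S = 30, d = 3, N = 1000)   # about 213
```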
Sample size when cost is involved
Cost plays a major role in the conduct of surveys. The cost function is

$$C = C_0 + nC_1$$

C = Total cost
C0 = Fixed cost
C1 = Cost per unit

The loss incurred because the sample mean is not equal to the population mean is taken as $d(\bar{y} - \bar{Y})^2$, where d is a constant. The expected total loss is

$$L(n) = E(C_0 + nC_1) + E\left[d(\bar{y} - \bar{Y})^2\right] = C_0 + C_1 E(n) + d\,E(\bar{y} - \bar{Y})^2$$

$$L(n) = C_0 + C_1 n + d\,Var(\bar{y})$$

$$L(n) = C_0 + C_1 n + d\,\frac{S^2}{n}$$

Differentiating with respect to n and equating to zero gives $C_1 - dS^2/n^2 = 0$, so the optimum value is

$$n = \sqrt{dS^2 / C_1}$$
Q. Estimation of sample size for estimation of proportion?
Ans: Suppose we have N population units $Y_1, Y_2, \dots, Y_i, \dots, Y_N$, where $Y_i = 1$ if the ith unit possesses a certain attribute and 0 otherwise (and similarly $y_i$ for the sample units).

The population proportion is defined as

$$\bar{Y} = \sum_{i=1}^{N} Y_i / N = A/N = P$$

The sample proportion is

$$\bar{y} = \sum_{i=1}^{n} y_i / n = a/n = p$$

Since $Y_i$ takes only the values 1 and 0,

$$\sum_{i=1}^{N} Y_i^2 = A = NP$$

The same is the case for the sample:

$$\sum_{i=1}^{n} y_i^2 = a = np$$

Writing $Q = 1 - P$ and $q = 1 - p$,

$$\sum_{i=1}^{N}(Y_i - \bar{Y})^2 = \sum_{i=1}^{N} Y_i^2 - N\bar{Y}^2 = NP - NP^2 = NPQ$$

$$S^2 = \frac{\sum_{i=1}^{N}(Y_i - \bar{Y})^2}{N-1} = \frac{N}{N-1}P(1-P) = \frac{NPQ}{N-1}$$

Similarly $s^2 = npq/(n-1)$.

Estimated variance of the sample proportion:

$$\text{For SRSWOR:} \quad var(p_{wor}) = \frac{N-n}{N}\,\frac{pq}{n-1}$$

$$\text{For SRSWR:} \quad var(p_{wr}) = \frac{N-1}{N}\,\frac{pq}{n-1}$$

$$\text{When the fpc is ignored:} \quad var(p) = \frac{pq}{n-1}$$

Variance of the sample proportion for SRSWOR:

$$VAR(\bar{y}_{wor}) = \frac{N-n}{Nn}S^2$$

$$VAR(p_{wor}) = \frac{N-n}{Nn}\,\frac{NPQ}{N-1} = \frac{N-n}{n}\,\frac{PQ}{N-1}$$

Standard error of the proportion:

$$S.E(p_{wor}) = \sqrt{\frac{N-n}{n}\,\frac{PQ}{N-1}}$$

For a large sample size

$$var(p) = pq/n$$

The standard error is

$$S.E(p) = \sqrt{pq/n}$$
Sample size estimation

$$t = \frac{p - P}{S.E(p)} = \frac{d}{S.E(p)}, \qquad d = t \times S.E(p), \qquad d^2 = t^2\, VAR(p)$$

Substituting the variance of the proportion under SRSWOR,

$$d^2 = t^2\left[\frac{N-n}{N-1}\,\frac{PQ}{n}\right] = \frac{t^2 NPQ}{n(N-1)} - \frac{t^2 PQ}{N-1}$$

$$\frac{t^2 NPQ}{N-1} = n\left[d^2 + \frac{t^2 PQ}{N-1}\right]$$

$$n = \frac{\dfrac{t^2 NPQ}{N-1}}{d^2 + \dfrac{t^2 PQ}{N-1}} = \frac{t^2 NPQ}{(N-1)d^2 + t^2 PQ}$$

Dividing the numerator and denominator by $Nd^2$,

$$n = \frac{t^2 PQ / d^2}{1 + \dfrac{1}{N}\left(\dfrac{t^2 PQ}{d^2} - 1\right)}$$

When the population size is large,

$$n = \frac{t^2 pq}{d^2}$$
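The formula for the proportion can be coded in the same way. In this sketch the function name `sample_size_prop` and the inputs (t = 1.96, P = 0.5, d = 0.05, N = 470) are hypothetical values chosen only to illustrate the calculation.

```r
# Sample size for estimating a proportion with margin of error d
# t: normal deviate, P: anticipated proportion, N: population size (Inf = large population)
sample_size_prop <- function(t, P, d, N = Inf) {
  n0 <- t^2 * P * (1 - P) / d^2              # large-population sample size
  if (is.finite(N)) n0 / (1 + (n0 - 1) / N) else n0
}

n_large <- sample_size_prop(t = 1.96, P = 0.5, d = 0.05)
n_large                                      # 384.16

# Finite population: agrees with t^2 NPQ / ((N - 1) d^2 + t^2 PQ)
n_finite <- sample_size_prop(t = 1.96, P = 0.5, d = 0.05, N = 470)
n_finite                                     # about 211.6
1.96^2 * 470 * 0.25 / ((470 - 1) * 0.05^2 + 1.96^2 * 0.25)
```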

Q. Derive the confidence interval for mean and proportion. Also explain with the help of R.
Ans: Confidence interval for mean (known variance)
The interval estimate of the population mean is

$$\bar{y} \pm z_{1-\alpha/2}\, S.E(\bar{y})$$

Lower limit

$$\bar{y} - z_{1-\alpha/2}\, S.E(\bar{y})$$

Upper limit

$$\bar{y} + z_{1-\alpha/2}\, S.E(\bar{y})$$

Confidence interval for mean (unknown variance)

The interval estimate of the population mean when the population standard deviation is not known is

$$\bar{y} \pm t_{1-\alpha/2,\,v}\, S.E(\bar{y})$$

where $v = n - 1$ is the degrees of freedom.

Example: The XYZ Company produces diet cold drink cans. The standard deviation of the amount poured into the cans by the automatic filling machine is 1.4 ml (milliliter). A random sample of cans is taken, and the filling amounts were 281, 278, 276, 282, 280, 279, 278, 280. Suppose that the population of filling amounts follows a normal distribution. Determine the 95% confidence interval for the mean amount in all cans filled by the machine.
Solution:
Given $\sigma = 1.4$ and $n = 8$,

$$\bar{y} = \frac{\sum y}{n} = \frac{2234}{8} = 279.25$$

$$S.E(\bar{y}) = \frac{\sigma}{\sqrt{n}} = \frac{1.4}{\sqrt{8}} = 0.495$$

The 95% confidence limits will be

$$\bar{y} \pm 1.96\, S.E(\bar{y})$$

The lower limit

$$279.25 - 1.96(0.495) = 278.2799$$

The upper limit

$$279.25 + 1.96(0.495) = 280.2201$$
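The worked example can be reproduced in R. This sketch uses the data and the known standard deviation from the example; `qnorm(0.975)` gives the normal deviate that the text rounds to 1.96, and the object names are illustrative.

```r
# Filling amounts from the example and the known population standard deviation
y <- c(281, 278, 276, 282, 280, 279, 278, 280)
sigma <- 1.4
n <- length(y)

ybar <- mean(y)            # 279.25
se <- sigma / sqrt(n)      # about 0.495
z <- qnorm(0.975)          # about 1.96

# Lower and upper 95% confidence limits
ci <- ybar + c(-1, 1) * z * se
ci                         # about 278.28 and 280.22
```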

Confidence Interval of Mean Using R

yp <- c(111, 150, 121, 198, 112, 136, 114, 129, 117, 115, 186, 110, 121, 115, 114)
# population size
N <- length(yp)
# sample of size 5
ys <- sample(yp, 5)
# sample size
n <- length(ys)
# mean of the sample
mys <- mean(ys)
# variance of the sample
vys <- var(ys)
# variance of ybar (population variance treated as known, fpc ignored)
vybar <- var(yp)/n
# standard error
sdr <- sqrt(vybar)
The margin of error for a 95% interval uses the normal quantile, because the population variance is treated as known here
error <- qnorm(0.975)*sdr
The respective lower and upper limits are
left <- mys - error
right <- mys + error
Confidence Interval for Proportion
The interval estimate of the population proportion is

$$p \pm z_{1-\alpha/2}\, S.E(p)$$

Example: A simple random sample of 80 students is taken from a population of 470 students in a department. The total number of smokers in the sample was 22.
Confidence interval for the proportion of smokers
The number of smokers is 22, so

$$p = \frac{22}{80} = 0.275, \qquad q = 1 - p = 0.725$$

$$S.E(p) = \sqrt{\frac{pq}{n}} = \sqrt{\frac{(0.275)(0.725)}{80}} = 0.0499$$

The 95% confidence limits will be

$$p \pm z_{1-\alpha/2}\, S.E(p) = p \pm 1.96\, S.E(p)$$

The lower limit

$$0.275 - 1.96(0.0499) = 0.1772$$

The upper limit

$$0.275 + 1.96(0.0499) = 0.3728$$
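The same interval can be computed in R. The sketch follows the example, which ignores the finite population correction; the last lines show, as an optional variant, the SRSWOR estimate (N - n)/N * pq/(n - 1) given earlier in this handout. Object names are illustrative.

```r
n <- 80     # sample size
a <- 22     # smokers in the sample
N <- 470    # population size

p <- a / n                  # 0.275
q <- 1 - p                  # 0.725
se <- sqrt(p * q / n)       # about 0.0499

# 95% confidence limits, as in the worked example
ci <- p + c(-1, 1) * 1.96 * se
ci                          # about 0.1772 and 0.3728

# Optional variant with the finite population correction (SRSWOR)
se_fpc <- sqrt((N - n) / N * p * q / (n - 1))
p + c(-1, 1) * 1.96 * se_fpc   # a somewhat narrower interval
```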


STRATIFIED SAMPLING


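Notations
The population of N units is divided into k strata of sizes $N_1, N_2, \dots, N_k$:

$$N = N_1 + N_2 + N_3 + \dots + N_k$$

Similarly, for the sample,

$$n = n_1 + n_2 + n_3 + \dots + n_k$$

Stratum weight: $W_h = N_h / N$

Stratum mean:

$$\bar{Y}_h = \frac{\sum_{i=1}^{N_h} Y_{hi}}{N_h}$$

Sample mean of stratum:

$$\bar{y}_h = \frac{\sum_{i=1}^{n_h} y_{hi}}{n_h}$$

As in simple random sampling,

$$E(\bar{y}_h) = \bar{Y}_h, \qquad Var(\bar{y}_h) = \frac{N_h - n_h}{N_h}\,\frac{S_h^2}{n_h}$$

Sample mean (stratified):

$$\bar{y}_{st} = \sum_{h=1}^{k} N_h \bar{y}_h / N = \sum_{h=1}^{k} W_h \bar{y}_h$$

Stratum variance and its sample counterpart:

$$S_h^2 = \frac{1}{N_h - 1}\sum_{i=1}^{N_h}(Y_{hi} - \bar{Y}_h)^2, \qquad s_h^2 = \frac{1}{n_h - 1}\sum_{i=1}^{n_h}(y_{hi} - \bar{y}_h)^2$$

Estimated population total:

$$y'_{st} = N\bar{y}_{st}$$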
Q. Prove that the sample mean is an unbiased estimator of the population mean under stratified sampling.

Ans: In stratified sampling, the sample mean is an unbiased estimator of the population mean, i.e.

$$E(\bar{y}_{st}) = \bar{Y}$$

As in simple random sampling,

$$E(\bar{y}_h) = \bar{Y}_h$$

Taking the expectation of the sample mean of the stratified sample, we have

$$E(\bar{y}_{st}) = \frac{1}{N}\sum_{h=1}^{k} N_h E(\bar{y}_h) = \frac{1}{N}\sum_{h=1}^{k} N_h \bar{Y}_h = \bar{Y}$$

Similarly it can be proved that the estimated total is unbiased for the population total Y:

$$E(y'_{st}) = Y$$
Q. Under how many allocation methods can the variance of the sample mean be derived in stratified sampling?
Ans: Allocation of sample size
The variance of $\bar{y}_{st}$ can be derived under the following allocation methods:
i. Arbitrary allocation,
ii. Proportional allocation, and
iii. Optimum allocation.
Q. Derive the variance of the sample mean in stratified sampling using arbitrary allocation?
Ans: Arbitrary allocation
The total sample is allocated arbitrarily among the strata.
Theorem
The variance of the sample mean $\bar{y}_{st}$ for stratified random sampling from a finite population is

$$Var(\bar{y}_{st}) = \frac{1}{N^2}\sum_{h=1}^{k} N_h (N_h - n_h)\frac{S_h^2}{n_h} = \sum_{h=1}^{k} W_h^2 S_h^2\,\frac{N_h - n_h}{N_h n_h}$$

Proof: We know that the mean for the stratified random sampling design is

$$\bar{y}_{st} = \frac{1}{N}\sum_{h=1}^{k} N_h \bar{y}_h$$

Since the samples are drawn independently in the different strata,

$$Var(\bar{y}_{st}) = \frac{1}{N^2}\sum_{h=1}^{k} N_h^2\, Var(\bar{y}_h)$$

Substituting

$$Var(\bar{y}_h) = \frac{N_h - n_h}{N_h}\,\frac{S_h^2}{n_h}$$

gives

$$Var(\bar{y}_{st}) = \sum_{h=1}^{k} \frac{N_h^2}{N^2}\, S_h^2\,\frac{N_h - n_h}{N_h n_h} = \sum_{h=1}^{k} W_h^2 S_h^2\,\frac{N_h - n_h}{N_h n_h}$$

Expanding the bracket,

$$Var(\bar{y}_{st}) = \frac{1}{N^2}\sum_{h=1}^{k} \frac{N_h^2 S_h^2}{n_h} - \frac{1}{N^2}\sum_{h=1}^{k} N_h S_h^2 = \sum_{h=1}^{k} \frac{W_h^2 S_h^2}{n_h} - \sum_{h=1}^{k} \frac{W_h S_h^2}{N}$$

For a large value of N, the above equation reduces to

$$Var(\bar{y}_{st}) = \sum_{h=1}^{k} \frac{W_h^2 S_h^2}{n_h}$$

Variance of total

$$Var(y'_{st}) = Var(N\bar{y}_{st}) = N^2\, Var(\bar{y}_{st}) = \sum_{h=1}^{k} \frac{N_h^2 S_h^2}{n_h} - \sum_{h=1}^{k} N_h S_h^2$$

Q. Derive the variance of the sample mean in stratified sampling using proportional allocation?
Ans: Proportional allocation was originally proposed by Bowley (1926). If the sampling fraction is the same in all the strata, the allocation is termed proportional allocation. The sample size of the hth stratum in this case is given by

$$\frac{n_h}{n} = \frac{N_h}{N}, \qquad n_h = \frac{n}{N}N_h = nW_h$$

Starting from the general result

$$Var(\bar{y}_{st}) = \sum_{h=1}^{k} \frac{N_h^2}{N^2}\, S_h^2\,\frac{N_h - n_h}{N_h n_h} = \sum_{h=1}^{k} W_h^2 S_h^2\,\frac{N_h - n_h}{N_h n_h}$$

and substituting $n_h = nN_h/N$,

$$Var_{prop}(\bar{y}_{st}) = \sum_{h=1}^{k} W_h^2 S_h^2\,\frac{N_h - \dfrac{nN_h}{N}}{N_h\,\dfrac{nN_h}{N}} = \sum_{h=1}^{k} W_h^2 S_h^2\,\frac{N - n}{nN_h}$$

Since $W_h^2 / N_h = W_h / N$, this can also be written as

$$Var_{prop}(\bar{y}_{st}) = \frac{N - n}{Nn}\sum_{h=1}^{k} W_h S_h^2$$

Q. Derive the variance of the sample mean in stratified sampling using optimum allocation
Ans: The purpose of optimum allocation is to allocate the $n_h$ in such a way that the minimum variance is achieved for a given cost. The $n_h$ are chosen either to minimize $Var(\bar{y}_{st})$ for a fixed cost (or fixed sample size), or to minimize the cost for a given variance.
The two aspects of optimum allocation are
i. Sample size is proportional to stratum size and standard deviation of stratum (Neyman Allocation).
ii. Sample size is inversely proportional to the square root of the cost per unit in the stratum.
Sample size for minimum variance at fixed cost
In stratified random sampling, $Var(\bar{y}_{st})$ is minimum subject to the fixed cost when $n_h$ is proportional to $W_h S_h / \sqrt{c_h}$, i.e.

$$n_h = n\,\frac{W_h S_h / \sqrt{c_h}}{\sum_{h=1}^{k} W_h S_h / \sqrt{c_h}}$$

Proof: The variance of the sample mean $\bar{y}_{st}$ for stratified random sampling is

$$Var(\bar{y}_{st}) = \sum_{h=1}^{k} W_h^2 S_h^2 / n_h - \sum_{h=1}^{k} W_h S_h^2 / N$$

and the cost function is

$$C = C_0 + \sum_{h=1}^{k} c_h n_h$$

where C = total cost, $C_0$ = fixed cost and $c_h$ = cost per unit in stratum h.

We introduce Lagrange's multiplier $\lambda$, i.e.

$$F = Var(\bar{y}_{st}) + \lambda\left(C_0 + \sum_{h=1}^{k} n_h c_h - C\right)$$

Partially differentiating with respect to $n_h$ and equating to zero,

$$-\frac{W_h^2 S_h^2}{n_h^2} + \lambda c_h = 0$$

$$n_h \sqrt{\lambda} = \frac{W_h S_h}{\sqrt{c_h}}, \qquad n_h = \frac{W_h S_h}{\sqrt{\lambda}\sqrt{c_h}}$$

Summing over the strata,

$$\sum_{h=1}^{k} n_h = n = \frac{1}{\sqrt{\lambda}}\sum_{h=1}^{k} \frac{W_h S_h}{\sqrt{c_h}}$$

Dividing $n_h$ by n,

$$\frac{n_h}{n} = \frac{W_h S_h / \sqrt{c_h}}{\sum_{h=1}^{k} W_h S_h / \sqrt{c_h}}$$

$$n_h = n\,\frac{W_h S_h / \sqrt{c_h}}{\sum_{h=1}^{k} W_h S_h / \sqrt{c_h}}$$

Variance under Optimum Allocation

Minimum variance for fixed cost

Substituting

$$n_h = n\,\frac{W_h S_h / \sqrt{c_h}}{\sum_{h=1}^{k} W_h S_h / \sqrt{c_h}}$$

into

$$Var(\bar{y}_{st}) = \sum_{h=1}^{k} \frac{W_h^2 S_h^2}{n_h} - \sum_{h=1}^{k} \frac{W_h S_h^2}{N}$$

gives

$$Var_{min}(\bar{y}_{st}) = \frac{1}{n}\left(\sum_{h=1}^{k} W_h S_h \sqrt{c_h}\right)\left(\sum_{h=1}^{k} \frac{W_h S_h}{\sqrt{c_h}}\right) - \frac{1}{N}\sum_{h=1}^{k} W_h S_h^2$$

Neyman Allocation
If the cost per unit is the same in every stratum, $c_1 = c_2 = \dots = c_k = c$, then the cost function becomes $C = C_0 + nc$ and

$$n_h = n\,\frac{W_h S_h / \sqrt{c}}{\sum_{h=1}^{k} W_h S_h / \sqrt{c}} = n\,\frac{W_h S_h}{\sum_{h=1}^{k} W_h S_h}$$

$$Var_{min}(\bar{y}_{st}) = \frac{1}{n}\left(\sum_{h=1}^{k} W_h S_h\right)^2 - \frac{1}{N}\sum_{h=1}^{k} W_h S_h^2$$

For large N

$$Var_{min}(\bar{y}_{st}) = \frac{1}{n}\left(\sum_{h=1}^{k} W_h S_h\right)^2$$

Example:
Consider a population of size 700 consisting of three strata such that N1=100, N2=250 and N3=350. The required sample size is 18. The sample sizes from stratum-1, stratum-2 and stratum-3 are arbitrarily decided as 4, 8 and 6, respectively.
The sample from each stratum is chosen as

        Stra-1    Stra-2    Stra-3
        1         7         23
        3         12        14
        2         8         20
        5         5         22
                  11        24
                  10        17
                  9
                  12
mean    2.75      9.25      20
Nh      100       250       350
Wh      0.1429    0.3571    0.5
nh      4         8         6
Sh      1.7078    2.4928    3.8471
Sh2     2.9167    6.2143    14.800

$$\bar{y}_{st} = \sum_{h=1}^{k} W_h \bar{y}_h = \sum_{h=1}^{k} N_h \bar{y}_h / N = \frac{1}{N}\left(N_1\bar{y}_1 + N_2\bar{y}_2 + N_3\bar{y}_3\right)$$

$$\bar{y}_{st} = \frac{100(2.75) + 250(9.25) + 350(20)}{700} = 13.70$$

$$Var(\bar{y}_{st}) = \sum_{h=1}^{3} W_h^2 S_h^2\,\frac{N_h - n_h}{N_h n_h} = W_1^2 S_1^2\,\frac{N_1 - n_1}{N_1 n_1} + W_2^2 S_2^2\,\frac{N_2 - n_2}{N_2 n_2} + W_3^2 S_3^2\,\frac{N_3 - n_3}{N_3 n_3}$$

$$Var(\bar{y}_{st}) = 0.0143 + 0.0959 + 0.6061 = 0.7163, \qquad S.E(\bar{y}_{st}) = \sqrt{0.7163} = 0.8463$$

Confidence Interval for Mean

The interval estimate of the population mean is

$$\bar{y}_{st} \pm z_{1-\alpha/2}\, S.E(\bar{y}_{st}) = 13.7 \pm 1.96(0.8463)$$

Lower Limit

$$13.7 - 1.96(0.8463) = 12.04$$

Upper Limit

$$13.7 + 1.96(0.8463) = 15.36$$
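The calculations of this example can be reproduced in R. This is a minimal sketch using the sample values and stratum sizes given above; the object names (`strata`, `ybar_st`, `var_st`) are illustrative, and the sample variances are used in place of the unknown stratum variances.

```r
# Sample values by stratum (arbitrary allocation: 4, 8 and 6 units)
strata <- list(
  s1 = c(1, 3, 2, 5),
  s2 = c(7, 12, 8, 5, 11, 10, 9, 12),
  s3 = c(23, 14, 20, 22, 24, 17)
)
Nh <- c(100, 250, 350)          # stratum sizes
Wh <- Nh / sum(Nh)              # stratum weights

nh     <- sapply(strata, length)
ybar_h <- sapply(strata, mean)  # stratum sample means
s2_h   <- sapply(strata, var)   # stratum sample variances

# Stratified mean, its variance, standard error and 95% limits
ybar_st <- sum(Wh * ybar_h)
var_st  <- sum(Wh^2 * s2_h * (Nh - nh) / (Nh * nh))
se_st   <- sqrt(var_st)
ci      <- ybar_st + c(-1, 1) * 1.96 * se_st

ybar_st   # about 13.70
var_st    # about 0.7163
ci        # about 12.04 and 15.36
```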

Example (equal allocation):
Consider a population of size 700 consisting of three strata such that N1=100, N2=250 and N3=350. The required sample size is 18.
With L = 3 strata, the sample size from each of stratum-1, stratum-2 and stratum-3 is

$$n_h = \frac{n}{L} = \frac{18}{3} = 6$$

The sample from each stratum is chosen as

        Stra-1    Stra-2    Stra-3
        1         7         23
        3         12        14
        2         8         20
        5         5         22
        4         11        24
        3         10        17
mean    3         8.83      20
Nh      100       250       350
Wh      0.1429    0.3571    0.5
nh      6         6         6
Sh      1.4142    2.6394    3.8471
Sh2     2         6.967     14.800

$$\bar{y}_{st} = \sum_{h=1}^{k} W_h \bar{y}_h = \sum_{h=1}^{k} N_h \bar{y}_h / N = \frac{1}{N}\left(N_1\bar{y}_1 + N_2\bar{y}_2 + N_3\bar{y}_3\right)$$

$$\bar{y}_{st} = 13.58$$

$$Var(\bar{y}_{st}) = \sum_{h=1}^{3} W_h^2 S_h^2\,\frac{N_h - n_h}{N_h n_h} = W_1^2 S_1^2\,\frac{N_1 - n_1}{N_1 n_1} + W_2^2 S_2^2\,\frac{N_2 - n_2}{N_2 n_2} + W_3^2 S_3^2\,\frac{N_3 - n_3}{N_3 n_3}$$

$$Var(\bar{y}_{st}) = 0.0064 + 0.1445 + 0.6061 = 0.7570, \qquad S.E(\bar{y}_{st}) = \sqrt{0.7570} = 0.8701$$

Confidence Interval for Mean

The interval estimate of the population mean is

$$\bar{y}_{st} \pm z_{1-\alpha/2}\, S.E(\bar{y}_{st}) = 13.58 \pm 1.96(0.8701)$$

Lower Limit

$$13.58 - 1.96(0.8701) = 11.87$$

Upper Limit

$$13.58 + 1.96(0.8701) = 15.29$$

Example (proportional allocation):
Consider a population of size 700 consisting of three strata such that N1=100, N2=250 and N3=350. The required sample size is 18. First we allocate the sample size to each stratum according to proportional allocation.

Sample Size Allocation

N1=100, N2=250, N3=350, N=700, n=18

$$n_h = \frac{n}{N}N_h$$

$$n_1 = \frac{n}{N}N_1 = \frac{18}{700}(100) = 2.57 \approx 3, \qquad n_2 = \frac{n}{N}N_2 = \frac{18}{700}(250) = 6.43 \approx 6, \qquad n_3 = \frac{n}{N}N_3 = \frac{18}{700}(350) = 9$$

The sample from each stratum is chosen as

        Stra-1    Stra-2    Stra-3
        1         5         23
        2         11        14
        5         10        20
                  9         22
                  12        24
                  6         17
                            23
                            21
                            19
mean    2.67      8.83      20.33
Nh      100       250       350
nh      3         6         9
Sh2     4.333     7.767     10.5

$$\bar{y}_{st} = \sum_{h=1}^{k} W_h \bar{y}_h = \sum_{h=1}^{k} N_h \bar{y}_h / N = \frac{1}{N}\left(N_1\bar{y}_1 + N_2\bar{y}_2 + N_3\bar{y}_3\right)$$

$$\bar{y}_{st} = \frac{100(2.67) + 250(8.83) + 350(20.33)}{700} = 13.70$$

$$Var(\bar{y}_{st}) = \sum_{h=1}^{3} W_h^2 S_h^2\,\frac{N_h - n_h}{N_h n_h} = W_1^2 S_1^2\,\frac{N_1 - n_1}{N_1 n_1} + W_2^2 S_2^2\,\frac{N_2 - n_2}{N_2 n_2} + W_3^2 S_3^2\,\frac{N_3 - n_3}{N_3 n_3}$$

$$Var(\bar{y}_{st}) = 0.0286 + 0.1611 + 0.2842 = 0.4739, \qquad S.E(\bar{y}_{st}) = \sqrt{0.4739} = 0.6884$$

Confidence Interval for Mean

The interval estimate of the population mean is

$$\bar{y}_{st} \pm z_{1-\alpha/2}\, S.E(\bar{y}_{st}) = 13.70 \pm 1.96(0.6884)$$

Lower Limit

$$13.70 - 1.96(0.6884) = 12.35$$

Upper Limit

$$13.70 + 1.96(0.6884) = 15.05$$
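The proportional allocation step of this example can be written in a few lines of R. This is a sketch with illustrative object names; rounding to whole units is done with `round()`, which happens to preserve the total of 18 here but is not guaranteed to do so in general.

```r
Nh <- c(100, 250, 350)   # stratum sizes
n  <- 18                 # total sample size

# Proportional allocation: n_h = n * N_h / N
nh_exact <- n * Nh / sum(Nh)
nh_exact                 # about 2.57, 6.43 and 9

# Rounded to whole units
nh <- round(nh_exact)
nh                       # 3, 6 and 9
sum(nh)                  # 18
```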
Q . Give an example to explain how Sample Size is obtained in Optimum Allocation?
Ans: A manufacturing company is interested in conducting a survey about a certain product in three towns (say A, B, and C) of a city. The towns differ from each other with respect to household income. The numbers of houses in Towns A, B, and C are 170, 135, and 80, respectively.
The company finds that the cost of obtaining an observation is the same in Town A and Town B, Rs. 500 (i.e. c1 = c2 = 500). The cost per observation in Town C is Rs. 800 (i.e. c3 = 800). The stratum standard deviations are

S1 = 3, S2 = 7, S3 = 10

The overall sample size for a certain margin of error is 30. Find the sample size for each town (stratum), n1, n2, n3.

Sample Size in Optimum Allocation

                 Town-A     Town-B     Town-C
Sh               3          7          10
Nh               170        135        80
ch               500        500        800
NhSh             510        945        800
NhSh/sqrt(ch)    22.8078    42.2616    28.2843

$$n_h = n\,\frac{N_h S_h/\sqrt{c_h}}{\sum_{h=1}^{3} N_h S_h/\sqrt{c_h}}, \qquad \sum_{h=1}^{3} N_h S_h/\sqrt{c_h} = 93.35$$

$$n_1 = 30\left(\frac{22.8078}{93.35}\right) = 7.33 \approx 7$$
$$n_2 = 30\left(\frac{42.2616}{93.35}\right) = 13.58 \approx 14$$
$$n_3 = 30\left(\frac{28.2843}{93.35}\right) = 9.09 \approx 9$$

n1 = 7, n2 = 14, n3 = 9
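A minimal R sketch of the same calculation, assuming plain rounding of the fractional sizes (the function name `opt_alloc` is hypothetical):

```r
# Optimum (cost-weighted) allocation: n_h proportional to N_h * S_h / sqrt(c_h).
opt_alloc <- function(N_h, S_h, c_h, n) {
  a   <- N_h * S_h / sqrt(c_h)
  raw <- n * a / sum(a)            # fractional allocation
  list(raw = raw, n_h = round(raw))
}

opt <- opt_alloc(N_h = c(170, 135, 80), S_h = c(3, 7, 10),
                 c_h = c(500, 500, 800), n = 30)
print(round(opt$raw, 2))
print(opt$n_h)
```

Setting all costs equal in `c_h` reduces this rule to Neyman allocation.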
Q . By using an example explain the Variance for Optimum Allocation
Ans: n1 = 7, n2 = 14, n3 = 9

Stratum   Sampled values                                     Mean       Sh2       Nh    nh
Stra-1    1, 2, 5, 4, 3, 6, 10                               4.4286     8.9524    170   7
Stra-2    5, 7, 11, 6, 10, 8, 9, 12, 12, 11, 8, 9, 11, 13    9.4286     5.8022    135   14
Stra-3    23, 14, 20, 22, 24, 17, 23, 21, 19                 20.3333    10.5      80    9

$$\bar{y}_{st} = \sum_{h=1}^{k} W_h \bar{y}_h = \frac{1}{N}\left(N_1\bar{y}_1 + N_2\bar{y}_2 + N_3\bar{y}_3\right) = \frac{170(4.4286) + 135(9.4286) + 80(20.3333)}{385} = 9.49$$

$$Var(\bar{y}_{st}) = \sum_{h=1}^{3} W_h^2 S_h^2 \frac{N_h - n_h}{N_h n_h} = 0.2391 + 0.0457 + 0.0447 = 0.329$$

Confidence Interval
The interval estimate of the population mean is
$$\bar{y}_{st} \pm z_{1-\alpha/2}\, S.E.(\bar{y}_{st})$$

9.49 ± 1.96(0.57399)
Lower Limit
9.49 − 1.96(0.57399) = 8.36
Upper Limit
9.49 + 1.96(0.57399) = 10.61

Q . Give an example to explain how Sample Size is obtained in Neyman Allocation?


Ans: A manufacturing company is interested in conducting a survey about a certain product in three towns (say A, B, and C) of a city. The towns differ from each other with respect to household income. The numbers of houses in Towns A, B, and C are 170, 135, and 80, respectively.
The company finds that the cost of obtaining an observation is the same in Towns A, B and C. The stratum standard deviations are

S1 = 3, S2 = 7, S3 = 10

The overall sample size for a certain margin of error is 30. Find the sample size for each town (stratum), n1, n2, n3.

          Town-A    Town-B    Town-C
Sh        3         7         10
Nh        170       135       80
ch        500       500       500
NhSh      510       945       800

$$n_h = n\,\frac{N_h S_h}{\sum_{h=1}^{3} N_h S_h}, \qquad \sum_{h=1}^{3} N_h S_h = 2255$$

$$n_1 = 30\left(\frac{510}{2255}\right) = 6.78 \approx 7$$
$$n_2 = 30\left(\frac{945}{2255}\right) = 12.57 \approx 12$$
$$n_3 = 30\left(\frac{800}{2255}\right) = 10.64 \approx 11$$

Conventional rounding would give 7 + 13 + 11 = 31, so n2 is rounded down to 12 to keep the total at n = 30.

n1 = 7, n2 = 12, n3 = 11
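The following R sketch reproduces the fractional Neyman allocation and shows why the rounded sizes need a repair step. The largest-remainder repair used here is an assumption made for illustration, not a rule stated in the handout; for these data it happens to return the same sizes.

```r
# Neyman allocation: n_h proportional to N_h * S_h (equal cost per unit).
neyman_raw <- function(N_h, S_h, n) n * N_h * S_h / sum(N_h * S_h)

n   <- 30
raw <- neyman_raw(N_h = c(170, 135, 80), S_h = c(3, 7, 10), n = n)
print(round(raw, 2))      # fractional allocation
print(sum(round(raw)))    # plain rounding need not add up to n

# Assumed repair: take the floor, then give the remaining units to the
# strata with the largest fractional parts.
n_h  <- floor(raw)
left <- n - sum(n_h)
idx  <- order(raw - n_h, decreasing = TRUE)[seq_len(left)]
n_h[idx] <- n_h[idx] + 1
print(n_h)
```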

Q . Derive the Variance for Neyman Allocation by using an example?

Ans:
n1 = 7, n2 = 12, n3 = 11

Stratum   Sampled values                                 Mean       Sh2       Nh    nh
Stra-1    1, 2, 5, 4, 3, 6, 10                           4.4286     8.9524    170   7
Stra-2    5, 7, 11, 6, 10, 8, 9, 12, 12, 11, 8, 9        9.0000     5.2727    135   12
Stra-3    23, 14, 20, 22, 24, 17, 23, 21, 19, 20, 18     20.0909    8.8909    80    11

$$\bar{y}_{st} = \sum_{h=1}^{k} W_h \bar{y}_h = \frac{1}{N}\left(N_1\bar{y}_1 + N_2\bar{y}_2 + N_3\bar{y}_3\right) = \frac{170(4.4286) + 135(9.0000) + 80(20.0909)}{385} = 9.29$$

$$Var(\bar{y}_{st}) = \sum_{h=1}^{3} W_h^2 S_h^2 \frac{N_h - n_h}{N_h n_h} = 0.2391 + 0.0492 + 0.0301 = 0.3184$$

Confidence Interval
The interval estimate of the population mean is
$$\bar{y}_{st} \pm z_{1-\alpha/2}\, S.E.(\bar{y}_{st})$$

9.29 ± 1.96(0.5643)

Lower Limit
9.29 − 1.96 (0.5643) = 8.18
Upper Limit
9.29 + 1.96(0.5643) = 10.40

Q . Give Comparison of Allocation Methods?

Ans: Variance (Rand) & Variance (Prop)

$$Var_{ran}(\bar{y}) = \frac{N-n}{Nn} S^2, \qquad S^2 = \frac{\sum_{i=1}^{N}(Y_i - \bar{Y})^2}{N-1} = \frac{\sum_{h=1}^{k}\sum_{i=1}^{N_h}(Y_{hi} - \bar{Y})^2}{N-1}$$

Writing $Y_{hi} - \bar{Y} = (Y_{hi} - \bar{Y}_h) + (\bar{Y}_h - \bar{Y})$,

$$\sum_{h=1}^{k}\sum_{i=1}^{N_h}(Y_{hi} - \bar{Y})^2 = \sum_{h=1}^{k}\sum_{i=1}^{N_h}(Y_{hi} - \bar{Y}_h)^2 + \sum_{h=1}^{k}\sum_{i=1}^{N_h}(\bar{Y}_h - \bar{Y})^2 + 2\sum_{h=1}^{k}\sum_{i=1}^{N_h}(Y_{hi} - \bar{Y}_h)(\bar{Y}_h - \bar{Y})$$

The cross-product term vanishes, as $\sum_{i=1}^{N_h}(Y_{hi} - \bar{Y}_h) = 0$ for every h, so

$$(N-1)S^2 = \sum_{h=1}^{k}(N_h - 1)S_h^2 + \sum_{h=1}^{k} N_h(\bar{Y}_h - \bar{Y})^2$$

For large N,

$$N S^2 = \sum_{h=1}^{k} N_h S_h^2 + \sum_{h=1}^{k} N_h(\bar{Y}_h - \bar{Y})^2$$

$$S^2 = \sum_{h=1}^{k} W_h S_h^2 + \sum_{h=1}^{k} W_h(\bar{Y}_h - \bar{Y})^2$$

Multiplying both sides by (N − n)/(Nn),

$$V_{ran} = V_{prop} + \frac{N-n}{Nn}\sum_{h=1}^{k} W_h(\bar{Y}_h - \bar{Y})^2$$

$$V_{ran} \ge V_{prop}$$

Variance (Prop) & Variance (Opt)

$$Var_{prop}(\bar{y}_{st}) = \frac{N-n}{Nn}\sum_{h=1}^{k} W_h S_h^2 = \frac{1}{n}\sum_{h=1}^{k} W_h S_h^2 - \frac{1}{N}\sum_{h=1}^{k} W_h S_h^2$$

$$Var_{opt}(\bar{y}_{st}) = \frac{1}{n}\left(\sum_{h=1}^{k} W_h S_h\right)^2 - \frac{1}{N}\sum_{h=1}^{k} W_h S_h^2$$

$$Var(\bar{y}_{prop}) - Var(\bar{y}_{opt}) = \frac{1}{n}\left[\sum_{h=1}^{k} W_h S_h^2 - \left(\sum_{h=1}^{k} W_h S_h\right)^2\right]$$

$$= \frac{1}{n}\left[\sum_{h=1}^{k} W_h S_h^2 - 2\left(\sum_{h=1}^{k} W_h S_h\right)^2 + \left(\sum_{h=1}^{k} W_h S_h\right)^2\right]$$

$$= \frac{1}{n}\left[\sum_{h=1}^{k} W_h S_h^2 - 2\left(\sum_{h=1}^{k} W_h S_h\right)\left(\sum_{h=1}^{k} W_h S_h\right) + \left(\sum_{h=1}^{k} W_h\right)\left(\sum_{h=1}^{k} W_h S_h\right)^2\right], \qquad \text{since } \sum_{h=1}^{k} W_h = 1$$

$$Var(\bar{y}_{prop}) - Var(\bar{y}_{opt}) = \frac{1}{n}\sum_{h=1}^{k} W_h\left(S_h - \sum_{h=1}^{k} W_h S_h\right)^2 \ge 0$$

Comparison
$$V_{prop} \ge V_{opt}, \qquad V_{ran} \ge V_{prop}$$
$$V_{ran} \ge V_{prop} \ge V_{opt}$$
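The ordering can be checked numerically. The sketch below uses the large-N forms of the three variances with the town figures used earlier (Nh and Sh); the stratum means `Ybar_h` are hypothetical values chosen only for illustration.

```r
# Large-N comparison of the three variances for one set of strata.
N_h    <- c(170, 135, 80)
S_h    <- c(3, 7, 10)
Ybar_h <- c(4.4, 9.4, 20.3)   # assumed stratum means, illustration only
n      <- 30

N    <- sum(N_h)
W    <- N_h / N
Ybar <- sum(W * Ybar_h)
fpc  <- (N - n) / (N * n)

within  <- sum(W * S_h^2)               # sum of W_h * S_h^2
between <- sum(W * (Ybar_h - Ybar)^2)   # sum of W_h * (Ybar_h - Ybar)^2

V_prop <- fpc * within
V_ran  <- fpc * (within + between)      # uses S^2 ~ within + between
V_opt  <- sum(W * S_h)^2 / n - within / N

print(c(V_ran = V_ran, V_prop = V_prop, V_opt = V_opt))

# The gap V_prop - V_opt should equal (1/n) * sum W_h (S_h - Sbar)^2
gap <- sum(W * (S_h - sum(W * S_h))^2) / n
print(all.equal(V_prop - V_opt, gap))
```

The gain of proportional over random sampling grows with the spread of the stratum means, while the further gain of Neyman allocation grows with the spread of the stratum standard deviations.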

Example:
All the 80 farms in a population are stratified by farm size. The expenditure on insecticides during the last year by each farmer is given below.

Large farmers (N1 = 20): 75, 76, 65, 79, 86, 62, 57, 92, 45, 50, 69, 48, 48, 77, 60, 60, 55, 64, 66, 58
Medium farmers (N2 = 36): 55, 40, 51, 45, 38, 55, 35, 33, 41, 30, 43, 48, 42, 53, 54, 38, 37, 36, 40, 52, 44, 36, 39, 47, 48, 46, 39, 46, 42, 41, 28, 47, 61, 35, 31, 23
Small farmers (N3 = 24): 35, 31, 26, 28, 38, 32, 36, 42, 18, 40, 33, 16, 25, 29, 18, 25, 28, 35, 32, 26, 13, 30, 19, 37

• Select a stratified sample of 24 farmers by using equal allocation.
• Compute the variance of the sample mean under simple random sampling without replacement.
• Compute the variance of the sample mean under stratified sampling using equal allocation.
• Compare the variances.

Ans:
Population Mean
It is given that N = 80, n = 24, N1 = 20, N2 = 36, and N3 = 24, so that W1 = 0.25, W2 = 0.45, and W3 = 0.30.

$$\bar{Y} = \sum_{i=1}^{N} Y_i / N = \frac{75 + 65 + \dots + 16}{80} = 43.79$$

Variance Under SRSWOR
The overall population variance is

$$S^2 = \frac{\sum_{i=1}^{N}(Y_i - \bar{Y})^2}{N-1} = 268.68$$

$$Var(\bar{y}) = \frac{N-n}{Nn} S^2 = \left(\frac{80 - 24}{80 \times 24}\right) 268.68 = 7.84$$

Variance under Stratified Sampling

          Stra-1    Stra-2    Stra-3
Wh        0.25      0.45      0.30
Nh        20        36        24
Sh2       169.52    70.56     61.45
nh        8         8         8

$$Var(\bar{y}_{st}) = \sum_{h=1}^{3} W_h^2 S_h^2 \frac{N_h - n_h}{N_h n_h} = 0.7946 + 1.3892 + 0.4609 = 2.64$$

Comparison of Variances
Under SRSWOR
Var(ȳ) = 7.84
Under Stratified Sampling for equal allocation
Var(ȳst) = 2.64
Stratified sampling with equal allocation therefore gives a much smaller variance than SRSWOR for these data.

Q . Give an example to explain the Comparison between Proportional Allocation and Simple Random Sampling?
Ans : All the 80 farms in a population are stratified by farm size. The expenditure on insecticides during the last year by each farmer is given below (Source: Elements of Survey Sampling by Singh and Mangat).

Large farmers (N1 = 20): 75, 76, 65, 79, 86, 62, 57, 92, 45, 50, 69, 48, 48, 77, 60, 60, 55, 64, 66, 58
Medium farmers (N2 = 36): 55, 40, 51, 45, 38, 55, 35, 33, 41, 30, 43, 48, 42, 53, 54, 38, 37, 36, 40, 52, 44, 36, 39, 47, 48, 46, 39, 46, 42, 41, 28, 47, 61, 35, 31, 23
Small farmers (N3 = 24): 35, 31, 26, 28, 38, 32, 36, 42, 18, 40, 33, 16, 25, 29, 18, 25, 28, 35, 32, 26, 13, 30, 19, 37

Population Mean
It is given that N = 80, n = 24, N1 = 20, N2 = 36, and N3 = 24, so that W1 = 0.25, W2 = 0.45, and W3 = 0.30.

$$\bar{Y} = \sum_{i=1}^{N} Y_i / N = \frac{75 + 65 + \dots + 16}{80} = 43.79$$

Variance under SRSWOR
The overall population variance is

$$S^2 = \frac{\sum_{i=1}^{N}(Y_i - \bar{Y})^2}{N-1} = 268.68$$

$$Var(\bar{y}) = \frac{N-n}{Nn} S^2 = \left(\frac{80 - 24}{80 \times 24}\right) 268.68 = 7.84$$

Proportional Allocation
N1 = 20, N2 = 36, N3 = 24, N = 80, n = 24

$$n_h = \frac{n}{N} N_h$$
$$n_1 = \frac{24}{80}(20) = 6, \qquad n_2 = \frac{24}{80}(36) = 10.8 \approx 11, \qquad n_3 = \frac{24}{80}(24) = 7.2 \approx 7$$

Variance under Proportional Allocation

          Stra-1    Stra-2    Stra-3
Wh        0.25      0.45      0.30
Nh        20        36        24
Sh2       169.52    70.56     61.45
nh        6         11        7

$$Var(\bar{y}_{st}) = \sum_{h=1}^{3} W_h^2 S_h^2 \frac{N_h - n_h}{N_h n_h} = 1.2361 + 0.9021 + 0.5596 = 2.698$$

Comparison of Variances
Under SRSWOR
Var(ȳ) = 7.84
Under Stratified Sampling for proportional allocation
Var(ȳst) = 2.698

The relative efficiency of proportional allocation with respect to SRSWOR, Var(ȳ)/Var(ȳst) × 100, is
RE = 290.5%

Example:
Comparison between Neyman Allocation and Simple Random Sampling (for the farm expenditure data above).
Ans: Population Mean
It is given that N = 80, n = 24, N1 = 20, N2 = 36, and N3 = 24, so that W1 = 0.25, W2 = 0.45, and W3 = 0.30.

$$\bar{Y} = \sum_{i=1}^{N} Y_i / N = \frac{75 + 65 + \dots + 16}{80} = 43.79$$

Variance under SRSWOR
The overall population variance is

$$S^2 = \frac{\sum_{i=1}^{N}(Y_i - \bar{Y})^2}{N-1} = 268.68$$

$$Var(\bar{y}) = \frac{N-n}{Nn} S^2 = \left(\frac{80 - 24}{80 \times 24}\right) 268.68 = 7.84$$

Variance under Neyman Allocation

$$n_1 = n\,\frac{N_1 S_1}{\sum_{h=1}^{3} N_h S_h} = 8.3 \approx 8, \qquad n_2 = n\,\frac{N_2 S_2}{\sum_{h=1}^{3} N_h S_h} = 9.7 \approx 10, \qquad n_3 = n\,\frac{N_3 S_3}{\sum_{h=1}^{3} N_h S_h} = 6$$

          Stra-1    Stra-2    Stra-3
Wh        0.25      0.45      0.30
Nh        20        36        24
Sh2       169.52    70.56     61.45
nh        8         10        6

$$Var(\bar{y}_{st}) = \sum_{h=1}^{3} W_h^2 S_h^2 \frac{N_h - n_h}{N_h n_h} = 0.7946 + 1.0320 + 0.6913 = 2.5179$$

Comparison of Variances
Under SRSWOR
Var(ȳ) = 7.84
Under Stratified Sampling for Neyman allocation
Var(ȳst) = 2.5179

The relative efficiency of Neyman allocation with respect to SRSWOR, Var(ȳ)/Var(ȳst) × 100, is
RE = 311.2%
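The three allocations can be compared in one pass. The sketch below recomputes the variances and relative efficiencies from the farm data; the helper `v_strat` and the object names are hypothetical, and the allocations are the rounded sizes used in the worked examples.

```r
# Farm expenditure data, stratified by farm size (values from the example).
str1 <- c(75,76,65,79,86,62,57,92,45,50,69,48,48,77,60,60,55,64,66,58)
str2 <- c(55,40,51,45,38,55,35,33,41,30,43,48,42,53,54,38,37,36,40,52,
          44,36,39,47,48,46,39,46,42,41,28,47,61,35,31,23)
str3 <- c(35,31,26,28,38,32,36,42,18,40,33,16,25,29,18,25,28,35,32,26,
          13,30,19,37)
strata <- list(str1, str2, str3)

y   <- unlist(strata)
N_h <- sapply(strata, length)
N   <- sum(N_h)
W   <- N_h / N
S2  <- sapply(strata, var)
n   <- 24

# Variance of the stratified mean for a given allocation n_h
v_strat <- function(n_h) sum(W^2 * S2 * (N_h - n_h) / (N_h * n_h))

v_srs <- (N - n) / (N * n) * var(y)
alloc <- list(equal = c(8, 8, 8), proportional = c(6, 11, 7),
              neyman = c(8, 10, 6))
v  <- sapply(alloc, v_strat)
re <- 100 * v_srs / v          # relative efficiency in percent

print(round(v_srs, 2))
print(round(v, 3))
print(round(re, 1))
```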

Stratified Sampling with R

Q. Define the following data in R and perform stratified sampling. Calculate the parameters for each stratum. Also find the mean of population.

Large farmers (N1 = 20): 75, 76, 65, 79, 86, 62, 57, 92, 45, 50, 69, 48, 48, 77, 60, 60, 55, 64, 66, 58
Medium farmers (N2 = 36): 55, 40, 51, 45, 38, 55, 35, 33, 41, 30, 43, 48, 42, 53, 54, 38, 37, 36, 40, 52, 44, 36, 39, 47, 48, 46, 39, 46, 42, 41, 28, 47, 61, 35, 31, 23
Small farmers (N3 = 24): 35, 31, 26, 28, 38, 32, 36, 42, 18, 40, 33, 16, 25, 29, 18, 25, 28, 35, 32, 26, 13, 30, 19, 37

Ans:
Defining Strata in R

str1<-c(75,76,65,79,86,62,57,92,45,50,69,48,48,77,60,60,55,64,66,58)
str2<-c(55,40,51,45,38,55,35,33,41,30,43,48,42,53,54,38,37,36,40,52,44,36,39,47,48,46,39,46,42,41,28,47,61,35,31,23)
str3<-c(35,31,26,28,38,32,36,42,18,40,33,16,25,29,18,25,28,35,32,26,13,30,19,37)

Parameters
Mean and Standard Deviation
y<-c(str1,str2,str3)
m_p=mean(y)
sd_p=sd(y)
Output
m_p = 43.7875
sd_p = 16.39

Defining Terms
Defining Stratum Size
N1=length(str1)
N2=length(str2)
N3=length(str3)
N=N1+N2+N3

Defining Weights
W1=N1/N; W2=N2/N; W3=N3/N
n=24

Mean & Standard Deviation
Mean of Strata
m_st1=mean(str1)
m_st2=mean(str2)
m_st3=mean(str3)

Standard Deviation of Strata
sd_st1=sd(str1)
sd_st2=sd(str2)
sd_st3=sd(str3)
Output
Means: 64.6, 42.19, 28.83
Standard deviations: 13.02, 8.40, 7.84

Variance & Stratified Mean
Variances of Strata
var_st1=var(str1)
var_st2=var(str2)
var_st3=var(str3)

Mean of population using stratified population
m_yst=(1/N)*(N1*m_st1 + N2*m_st2 + N3*m_st3)

Q . Define the following data in R and perform stratified sampling using proportional allocation with sample size 24. Also find the mean and variance.

Large farmers (N1 = 20): 75, 76, 65, 79, 86, 62, 57, 92, 45, 50, 69, 48, 48, 77, 60, 60, 55, 64, 66, 58
Medium farmers (N2 = 36): 55, 40, 51, 45, 38, 55, 35, 33, 41, 30, 43, 48, 42, 53, 54, 38, 37, 36, 40, 52, 44, 36, 39, 47, 48, 46, 39, 46, 42, 41, 28, 47, 61, 35, 31, 23
Small farmers (N3 = 24): 35, 31, 26, 28, 38, 32, 36, 42, 18, 40, 33, 16, 25, 29, 18, 25, 28, 35, 32, 26, 13, 30, 19, 37

Ans: The strata, stratum sizes, weights and stratum variances are defined as in the previous answer.

Sample Size Allocation
n1=round(n*(N1/N))
n2=round(n*(N2/N))
n3=round(n*(N3/N))
Output
24=6+11+7
0.25,0.45,0.3

Variance under Proportional Allocation
Term1=(W1^2)*var_st1*(N1-n1)/(N1*n1)
Term2=(W2^2)*var_st2*(N2-n2)/(N2*n2)
Term3=(W3^2)*var_st3*(N3-n3)/(N3*n3)
vp_prop=Term1+Term2+Term3
Output
2.698

Sampling from Stratum
s1=sample(str1,n1)
s2=sample(str2,n2)
s3=sample(str3,n3)
ms_st1=mean(s1)
ms_st2=mean(s2)
ms_st3=mean(s3)
m_prop=W1*ms_st1+W2*ms_st2+W3*ms_st3
Output (for one random draw; the values change from run to run)
73.17,42.55,27.14
45.58

Estimated Variance
vars_st1=var(s1);vars_st2=var(s2);vars_st3=var(s3)
Term1=(W1^2)*vars_st1*(N1-n1)/(N1*n1)
Term2=(W2^2)*vars_st2*(N2-n2)/(N2*n2)
Term3=(W3^2)*vars_st3*(N3-n3)/(N3*n3)
vs_prop=Term1+Term2+Term3
Output
> vs_prop
[1] 2.57464
Q . Define the following data in R and perform stratified sampling using Neyman allocation with sample size 24. Also find the mean and variance.

Large farmers (N1 = 20): 75, 76, 65, 79, 86, 62, 57, 92, 45, 50, 69, 48, 48, 77, 60, 60, 55, 64, 66, 58
Medium farmers (N2 = 36): 55, 40, 51, 45, 38, 55, 35, 33, 41, 30, 43, 48, 42, 53, 54, 38, 37, 36, 40, 52, 44, 36, 39, 47, 48, 46, 39, 46, 42, 41, 28, 47, 61, 35, 31, 23
Small farmers (N3 = 24): 35, 31, 26, 28, 38, 32, 36, 42, 18, 40, 33, 16, 25, 29, 18, 25, 28, 35, 32, 26, 13, 30, 19, 37

Ans:
Defining Strata in R
str1<-c(75,76,65,79,86,62,57,92,45,50,69,48,48,77,60,60,55,64,66,58)
str2<-c(55,40,51,45,38,55,35,33,41,30,43,48,42,53,54,38,37,36,40,52,44,36,39,47,48,46,39,46,42,41,28,47,61,35,31,23)
str3<-c(35,31,26,28,38,32,36,42,18,40,33,16,25,29,18,25,28,35,32,26,13,30,19,37)

Parameters
Mean and Standard Deviation
y<-c(str1,str2,str3)
m_p=mean(y)
sd_p=sd(y)
Output
m_p = 43.7875
sd_p = 16.39

Defining Terms
Defining Stratum Size
N1=length(str1)
N2=length(str2)
N3=length(str3)
N=N1+N2+N3
Defining Weights
W1=N1/N; W2=N2/N; W3=N3/N
n=24
Output
80=20+36+24
0.25,0.45,0.3

Standard Deviation and Variance of Strata
sd_st1=sd(str1); sd_st2=sd(str2); sd_st3=sd(str3)
var_st1=var(str1); var_st2=var(str2); var_st3=var(str3)

Sample Size Allocation
sum_NS=N1*sd_st1+N2*sd_st2+N3*sd_st3
nn1=round(n*(N1*sd_st1/sum_NS))
nn2=round(n*(N2*sd_st2/sum_NS))
nn3=round(n*(N3*sd_st3/sum_NS))
Output
24=8+10+6

Variance Under Neyman Allocation
Term1=(W1^2)*var_st1*(N1-nn1)/(N1*nn1)
Term2=(W2^2)*var_st2*(N2-nn2)/(N2*nn2)
Term3=(W3^2)*var_st3*(N3-nn3)/(N3*nn3)
vp_Ney=Term1+Term2+Term3
Output
0.79,1.03,0.69
2.52

Sampling From Strata
s1=sample(str1,nn1)
s2=sample(str2,nn2)
s3=sample(str3,nn3)
ms_st1=mean(s1)
ms_st2=mean(s2)
ms_st3=mean(s3)
m_Ney=W1*ms_st1+W2*ms_st2+W3*ms_st3
Output (for one random draw; the values change from run to run)
68.13, 40.2,26.83
43.17

Estimated Variance
vars_st1=var(s1);vars_st2=var(s2);vars_st3=var(s3)
Term1=(W1^2)*vars_st1*(N1-nn1)/(N1*nn1)
Term2=(W2^2)*vars_st2*(N2-nn2)/(N2*nn2)
Term3=(W3^2)*vars_st3*(N3-nn3)/(N3*nn3)
vs_Ney=Term1+Term2+Term3
vs_Ney
Q. Define the following data in R and perform stratified sampling. Calculate the parameters for each stratum. Also find the mean of population.

Stratum   Values
1         12, 14, 19, 22
2         362, 441, 456, 482, 444, 472
3         124, 189, 142, 165, 135, 140

N=16; N1=4; N2=6; N3=6; n1=2; n2=3; n3=3; n=8

Ans:
Y<-c(12,14,19,22,362,441,456,482,444,472,124,189,142,165,135,140)
Y1<-c(12,14,19,22)
Y2<-c(362,441,456,482,444,472)
Y3<-c(124,189,142,165,135,140)
N=16;N1=4;N2=6;N3=6; n1=2;n2=3;n3=3;n=8;
w1=N1/N; w2=N2/N;w3=N3/N;
y1=c();y2=c();y3=c();yst=c();
for(i in 1:10000){
sa1=sample(Y1,n1)
sa2=sample(Y2,n2)
sa3=sample(Y3,n3)
y1[i]=mean(sa1);
y2[i]=mean(sa2);
y3[i]=mean(sa3);
yst[i]=w1*y1[i]+w2*y2[i]+w3*y3[i];
}
mean(yst); var(yst)
Output
226.138
56.25539
mean(Y)
vp=(w1^2)*var(Y1)*(N1-n1)/(N1*n1) + (w2^2)*var(Y2)*(N2-n2)/(N2*n2) +
(w3^2)*var(Y3)*(N3-n3)/(N3*n3)
Output
226.1875
vp=56.12526
The simulated mean and variance of the stratified estimator agree closely with the population mean and the theoretical variance.
Q . Generate a stratified population consisting of three strata such that stratum-1 is normally distributed with mean 10 and standard deviation 2 with 100 values, stratum-2 is normally distributed with mean 100 and standard deviation 2 with 500 values, and stratum-3 is normally distributed with mean 500 and standard deviation 2 with 1000 values. Find the mean and variance for this population using the method of stratified sampling.
Ans: Simulation Study
N1=100;N2=500;N3=1000;n=50
Y1<-rnorm(N1,mean=10,sd=2)
Y2<-rnorm(N2,mean=100,sd=2)
Y3<-rnorm(N3,mean=500,sd=2)
Y<-c(Y1,Y2,Y3)
y1=c();y2=c();y3=c();yst=c();
N=N1+N2+N3;
w1=N1/N;w2=N2/N;w3=N3/N;
n1=round(n*w1)
n2=round(n*w2)
n3=round(n*w3)
Looping
for(i in 1:10000){
sa1=sample(Y1,n1)
sa2=sample(Y2,n2)
sa3=sample(Y3,n3)
y1[i]=mean(sa1);
y2[i]=mean(sa2);
y3[i]=mean(sa3);
yst[i]=w1*y1[i]+w2*y2[i]+w3*y3[i];
}
mean(yst); var(yst)

Output
Mean(yst)=344.36
Var(yst) = 0.08
Variance of Mean Using R
mean(Y)
vp=(w1^2)*var(Y1)*(N1-n1)/(N1*n1) + (w2^2)*var(Y2)*(N2-n2)/(N2*n2) +
(w3^2)*var(Y3)*(N3-n3)/(N3*n3)
Q . Generate a population of size 1000 consisting of three strata such that
• 200 values for stratum-1 come from a normal distribution with mean=2 and standard deviation=3.
• 300 values for stratum-2 come from a normal distribution with mean=10 and standard deviation=9.
• 500 values for stratum-3 come from a normal distribution with mean=30 and standard deviation=5.
• Allocate the sample size to each stratum by Neyman Allocation where n=50.
• Select the sample from each stratum and estimate the mean of population.
Ans:
Defining Population
N1=200;N2=300;N3=500 ; n=50
Y1<-rnorm(N1,mean=2,sd=3)
Y2<-rnorm(N2,mean=10,sd=9)
Y3<-rnorm(N3,mean=30,sd=5)
Y<-c(Y1,Y2,Y3)
Sample Size Under Neyman Allocation
N=N1+N2+N3;
w1=N1/N;w2=N2/N;w3=N3/N
sumWS=w1*3+w2*9+w3*5
n1=round(n*w1*3/sumWS)
n2=round(n*w2*9/sumWS)
n3=round(n*w3*5/sumWS)
Looping
y1=c();y2=c();y3=c();yst=c();
for(i in 1:10000){
sa1=sample(Y1,n1)
sa2=sample(Y2,n2)
sa3=sample(Y3,n3)
y1[i]=mean(sa1);
y2[i]=mean(sa2);
y3[i]=mean(sa3);
yst[i]=w1*y1[i]+w2*y2[i]+w3*y3[i];
}
mean(yst); var(yst)
Output
The allocation is n1 = 5, n2 = 23, n3 = 22. Mean(yst) should be close to the design mean 0.2(2) + 0.3(10) + 0.5(30) = 18.4; the exact values of Mean(yst) and Var(yst) depend on the generated population.
Q. Find the mean and variance of Proportion in Stratified Sampling
Ans: Suppose we have N population units Y1, Y2, …, Yi, …, YN, where
Yi = 1 if the ith unit possesses a certain attribute and 0 otherwise.
The population proportion is defined as
$$\bar{Y} = \sum_{i=1}^{N} Y_i / N = A/N = P$$
The sample proportion is
$$\bar{y} = \sum_{i=1}^{n} y_i / n = a/n = p$$

Since Yi takes only the values 1 and 0,
$$\sum_{i=1}^{N} Y_i^2 = A = NP$$
The same is the case for the sample:
$$\sum_{i=1}^{n} y_i^2 = a = np$$

$$\sum_{i=1}^{N}(Y_i - \bar{Y})^2 = \sum_{i=1}^{N} Y_i^2 - N\bar{Y}^2 = NP - NP^2 = NPQ$$

$$S^2 = \frac{\sum_{i=1}^{N}(Y_i - \bar{Y})^2}{N-1} = \frac{N}{N-1}P(1-P) = \frac{NPQ}{N-1}$$

Similarly s2 = npq / (n – 1)

Unbiased Variance Estimator

For SRSWOR
$$Var(\bar{y}_{wor}) = \frac{N-n}{Nn} S^2$$
$$Var(p_{wor}) = \frac{N-n}{Nn}\cdot\frac{NPQ}{N-1} = \frac{N-n}{n}\cdot\frac{PQ}{N-1}$$

Proportion Estimation
$$p_{st} = \frac{1}{N}\sum_{h=1}^{k} N_h p_h$$
For a single stratum
$$A_h = N_h P_h, \qquad S_h^2 = \frac{N_h P_h Q_h}{N_h - 1}, \qquad Var(p_h) = \frac{N_h - n_h}{N_h - 1}\cdot\frac{P_h Q_h}{n_h}$$

$$Var(p_{st}) = \frac{1}{N^2}\sum_{h=1}^{k} N_h^2\, Var(p_h) = \frac{1}{N^2}\sum_{h=1}^{k} N_h^2\,\frac{N_h - n_h}{N_h - 1}\cdot\frac{P_h Q_h}{n_h}$$

$$Var(p_{st}) = \sum_{h=1}^{k} W_h^2\,\frac{N_h - n_h}{N_h - 1}\cdot\frac{P_h Q_h}{n_h}$$

If Nh – 1 ~ Nh and nh/Nh is ignored
$$Var(p_{st}) = \sum_{h=1}^{k} W_h^2 P_h Q_h / n_h$$

Estimator of Variance
$$var(p_{st}) = \sum_{h=1}^{k} W_h^2\,\frac{N_h - n_h}{N_h}\cdot\frac{p_h q_h}{n_h - 1}$$

Variance in Case Of Proportional Allocation
$$Var_{prop}(\bar{y}_{st}) = \frac{N-n}{Nn}\sum_{h=1}^{k} W_h S_h^2, \qquad S_h^2 = \frac{N_h P_h Q_h}{N_h - 1}$$
$$Var_{prop}(p_{st}) = \frac{N-n}{Nn}\sum_{h=1}^{k} \frac{W_h N_h P_h Q_h}{N_h - 1}$$

Variance in Case of Neyman Allocation
$$Var_{opt}(\bar{y}_{st}) = \frac{1}{n}\left(\sum_{h=1}^{k} W_h S_h\right)^2 - \frac{1}{N}\sum_{h=1}^{k} W_h S_h^2, \qquad S_h^2 = \frac{N_h P_h Q_h}{N_h - 1}$$
$$Var_{opt}(p_{st}) = \frac{1}{n}\left(\sum_{h=1}^{k} W_h\sqrt{\frac{N_h P_h Q_h}{N_h - 1}}\right)^2 - \frac{1}{N}\sum_{h=1}^{k} \frac{W_h N_h P_h Q_h}{N_h - 1}$$

For large stratum size i.e. Nh – 1 ~ Nh
$$Var_{opt}(p_{st}) = \frac{1}{n}\left(\sum_{h=1}^{k} W_h\sqrt{P_h Q_h}\right)^2 - \frac{1}{N}\sum_{h=1}^{k} W_h P_h Q_h$$
Example:
The management of a local newspaper has to decide whether it should continue with the publication of 'Children Column', which had been introduced on an experimental basis. For this purpose, it is imperative to estimate the proportion of readers who would favor its continuance. The frame consists of readers who had stayed with the paper for the last six months. Since different attitudes are expected from urban and rural readers, the population is stratified into urban readers and rural readers. In the population, there are 73000 urban readers and 30280 rural readers.

Sample Size Allocation

N1 = 73000, N2 = 30280, N = 103280, n = 1016

$$n_1 = \frac{1016}{103280}(73000) = 718, \qquad n_2 = \frac{1016}{103280}(30280) = 298$$

The investigator selected WOR simple random samples of 718 respondents from stratum I (urban readers) and 298 readers from stratum II (rural readers). The number of individuals who favor continuation of the column was 570 from stratum I and 143 from stratum II.

Sample Proportions

x1 = 570, x2 = 143, n1 = 718, n2 = 298

$$p_1 = \frac{x_1}{n_1} = 0.7939, \qquad p_2 = \frac{x_2}{n_2} = 0.4799$$

Proportion Estimation

$$p_{st} = \frac{1}{N}\sum_{h=1}^{k} N_h p_h = \frac{1}{N}\left(p_1 N_1 + p_2 N_2\right)$$

Output
p1 = 0.7939
p2 = 0.4799
pst = 0.7018
Example (continued):
For the newspaper survey above, the sample results are

x1 = 570, x2 = 143, n1 = 718, n2 = 298

$$p_1 = \frac{x_1}{n_1} = \frac{570}{718} = 0.7939, \qquad p_2 = \frac{x_2}{n_2} = \frac{143}{298} = 0.4799$$

with N1 = 73000, N2 = 30280, N = 103280, n = 1016 and stratum weights

$$W_1 = \frac{N_1}{N} = \frac{73000}{103280} = 0.7068, \qquad W_2 = \frac{N_2}{N} = \frac{30280}{103280} = 0.2932$$

Proportion Estimation
$$p_{st} = \frac{1}{N}\left(p_1 N_1 + p_2 N_2\right) = 0.7018$$

Estimated Variance

          Stra-1    Stra-2
Nh        73000     30280
nh        718       298
ph        0.7939    0.4799

$$var(p_{st}) = \sum_{h=1}^{k} W_h^2\,\frac{N_h - n_h}{N_h}\cdot\frac{p_h q_h}{n_h - 1} = W_1^2\,\frac{N_1 - n_1}{N_1}\cdot\frac{p_1 q_1}{n_1 - 1} + W_2^2\,\frac{N_2 - n_2}{N_2}\cdot\frac{p_2 q_2}{n_2 - 1}$$

Output
= .0001129 + .0000715 = .0001844

Confidence Interval for Proportion

The 95% confidence limits are
$$p_{st} \pm 1.96\, S.E.(p_{st}), \qquad S.E.(p_{st}) = \sqrt{0.0001844} = 0.01358$$

The Lower Limit
0.7018 − 1.96(0.01358) = 0.6752
The Upper Limit
0.7018 + 1.96(0.01358) = 0.7284
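The whole calculation for the newspaper example fits in a few lines of R. This is a sketch using the estimator of variance given above; the object names are illustrative.

```r
# Stratified estimate of a proportion with estimated variance and 95% CI.
N_h <- c(73000, 30280)   # urban and rural readers in the population
n_h <- c(718, 298)       # sample sizes
x_h <- c(570, 143)       # sampled readers in favour of the column

W    <- N_h / sum(N_h)   # stratum weights
p_h  <- x_h / n_h        # stratum sample proportions
p_st <- sum(W * p_h)
v_st <- sum(W^2 * (N_h - n_h) / N_h * p_h * (1 - p_h) / (n_h - 1))
se   <- sqrt(v_st)
ci   <- p_st + c(-1, 1) * 1.96 * se

print(round(p_h, 4))
print(round(c(p_st = p_st, var = v_st, se = se), 6))
print(round(ci, 4))
```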

Systematic Sampling

Introduction:
The first unit is selected at random from the first k units, and the remaining units are then selected automatically at a fixed interval. Systematic sampling has many variants; here we discuss the commonly used ones, linear and circular systematic sampling.

Linear Systematic Sampling

Group   Sample Composition
1       1, k+1, 2k+1, …, (i-1)k+1, …, (n-1)k+1
2       2, k+2, 2k+2, …, (i-1)k+2, …, (n-1)k+2
…       …
r       r, k+r, 2k+r, …, (i-1)k+r, …, (n-1)k+r
…       …
k       k, 2k, 3k, …, ik, …, nk

Every kth unit is included in the sample.
The interval k is obtained by dividing the population size by the sample size:
k=N/n
The method is operationally convenient, and because the sample is spread evenly over the list there is little risk of missing a large part of the population.

Example
A certain company reports its daily production (in units) as
125, 135, 157, 192, 151, 175, 164, 169, 147, 150, 138, 167, 155, 159, 139, 147, 149, 158.
We want to select a systematic sample of size 3.
Here N=18 and n=3, so k=6.

Random Start   Serial Numbers   Sampled Values
1              1, 7, 13         125, 164, 155
2              2, 8, 14         135, 169, 159
3              3, 9, 15         157, 147, 139
4              4, 10, 16        192, 150, 147
5              5, 11, 17        151, 138, 149
6              6, 12, 18        175, 167, 158

N=15, n=3: 125, 135, 157, 192, 151, 175, 164, 169, 147, 150, 138, 167, 155, 159, 139
Q . Prove that the sample mean is an unbiased estimator of the population mean

Group   Sample Composition                          Sample Mean
1       1, k+1, 2k+1, …, (i-1)k+1, …, (n-1)k+1      ȳ1
2       2, k+2, 2k+2, …, (i-1)k+2, …, (n-1)k+2      ȳ2
…       …                                           …
r       r, k+r, 2k+r, …, (i-1)k+r, …, (n-1)k+r      ȳr
…       …                                           …
k       k, 2k, 3k, …, ik, …, nk                     ȳk

Population Mean
The population mean is given by
$$\bar{Y} = \frac{1}{nk}\sum_{r=1}^{k}\sum_{i=1}^{n} y_{ri}$$

Expectation of Sample Mean
From the table, the mean of the rth systematic sample is
$$\bar{y}_r = \frac{1}{n}\sum_{i=1}^{n} y_{ri}$$
Each of the k samples is selected with probability 1/k, so
$$E(\bar{y}_{sy}) = \frac{1}{k}\sum_{r=1}^{k}\bar{y}_r = \frac{1}{k}\sum_{r=1}^{k}\frac{1}{n}\sum_{i=1}^{n} y_{ri} = \frac{1}{nk}\sum_{r=1}^{k}\sum_{i=1}^{n} y_{ri} = \bar{Y}$$

$$E(\bar{y}_{sy}) = \bar{Y}$$

Example
For the daily production data of the company,
125, 135, 157, 192, 151, 175, 164, 169, 147, 150, 138, 167, 155, 159, 139, 147, 149, 158,
with N=18, n=3 and k=6, the six possible systematic samples and their means are

Random Start   Serial Numbers   Sampled Values    Sample Mean
1              1, 7, 13         125, 164, 155     148
2              2, 8, 14         135, 169, 159     154.333
3              3, 9, 15         157, 147, 139     147.667
4              4, 10, 16        192, 150, 147     163
5              5, 11, 17        151, 138, 149     146
6              6, 12, 18        175, 167, 158     166.667

Mean of the sample means = 154.28, which equals the population mean.
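The unbiasedness can be checked by enumerating all k possible samples. The sketch below assumes N = nk exactly; the object names are illustrative.

```r
# Enumerate all k linear systematic samples and compare the average of
# their means with the population mean.
y <- c(125, 135, 157, 192, 151, 175, 164, 169, 147,
       150, 138, 167, 155, 159, 139, 147, 149, 158)
n <- 3
k <- length(y) / n                 # sampling interval, assumes N = n * k

# Filling a k x n matrix column by column puts units r, r+k, r+2k, ...
# in row r, so row r is the sample with random start r.
samples <- matrix(y, nrow = k)
means   <- rowMeans(samples)

print(samples)
print(round(means, 3))
print(c(mean_of_means = mean(means), population_mean = mean(y)))
```

Drawing one systematic sample then amounts to picking a single row at random, for example `samples[sample(k, 1), ]`.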
Q. Derive the Variance of Sample Mean Under Systematic Sampling?

Ans: Linear Systematic Sampling
The k possible systematic samples and their means ȳ1, …, ȳk are those listed in the table of sample compositions above.

Variance of Sample Mean

Expressions of Variance
$$V(\bar{y}_{sy}) = \frac{1}{k}\sum_{r=1}^{k}(\bar{y}_r - \bar{Y})^2$$
$$Var(\bar{y}_{sy}) = \frac{N-1}{N}S^2 - \frac{k(n-1)}{N}S_w^2, \qquad S_w^2 = \frac{1}{k(n-1)}\sum_{r=1}^{k}\sum_{i=1}^{n}(y_{ri} - \bar{y}_r)^2$$

The variance of the sample mean is given by
$$V(\bar{y}_{sy}) = \frac{1}{k}\sum_{r=1}^{k}(\bar{y}_r - \bar{Y})^2$$
and the population variance by
$$S^2 = \frac{\sum_{r=1}^{k}\sum_{i=1}^{n}(y_{ri} - \bar{Y})^2}{nk - 1}$$
so that
$$(nk-1)S^2 = \sum_{r=1}^{k}\sum_{i=1}^{n}(y_{ri} - \bar{Y})^2 = \sum_{r=1}^{k}\sum_{i=1}^{n}(y_{ri} - \bar{y}_r + \bar{y}_r - \bar{Y})^2$$
The cross-product term vanishes because $\sum_{i=1}^{n}(y_{ri} - \bar{y}_r) = 0$, hence
$$(nk-1)S^2 = \sum_{r=1}^{k}\sum_{i=1}^{n}(y_{ri} - \bar{y}_r)^2 + \sum_{r=1}^{k}\sum_{i=1}^{n}(\bar{y}_r - \bar{Y})^2$$
$$(nk-1)S^2 = k(n-1)S_w^2 + n\sum_{r=1}^{k}(\bar{y}_r - \bar{Y})^2 = k(n-1)S_w^2 + nk\,V(\bar{y}_{sy})$$
Dividing by N = nk,
$$Var(\bar{y}_{sy}) = \frac{N-1}{N}S^2 - \frac{k(n-1)}{N}S_w^2$$
Example:
The heights of 30 trees from a certain area of a forest are
40, 38, 36, 35, 32, 35, 32, 30, 31, 29, 37, 41, 30, 28, 24, 25, 26, 27, 29, 32, 34, 36, 21, 19, 17, 22, 35, 28, 29, 31
Select a systematic random sample of size 5.
Estimate the mean of the population.
Estimate the variance.
We have
N=30, n=5, k=6

Random Start   Serial Numbers         Sampled Values         Sample Mean
1              1, 7, 13, 19, 25       40, 32, 30, 29, 17     29.6
2              2, 8, 14, 20, 26       38, 30, 28, 32, 22     30
3              3, 9, 15, 21, 27       36, 31, 24, 34, 35     32
4              4, 10, 16, 22, 28      35, 29, 25, 36, 28     30.6
5              5, 11, 17, 23, 29      32, 37, 26, 21, 29     29
6              6, 12, 18, 24, 30      35, 41, 27, 19, 31     30.6

Mean and Variance

$$E(\bar{y}_{sy}) = \frac{1}{k}\sum_{r=1}^{k}\bar{y}_r, \qquad V(\bar{y}_{sy}) = \frac{1}{k}\sum_{r=1}^{k}(\bar{y}_r - \bar{Y})^2$$

Sum of sample means = 181.8
Mean = 181.8/6 = 30.3
Sum of squared deviations = 5.34
Variance = 5.34/6 = 0.89
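As a check, the variance can be computed both from its definition and from the decomposition in terms of S² and Sw². The sketch below assumes N = nk; object names are illustrative.

```r
# Heights of the 30 trees; rows of the k x n matrix are the k systematic samples.
y <- c(40, 38, 36, 35, 32, 35, 32, 30, 31, 29, 37, 41, 30, 28, 24,
       25, 26, 27, 29, 32, 34, 36, 21, 19, 17, 22, 35, 28, 29, 31)
n <- 5
N <- length(y)
k <- N / n

m     <- matrix(y, nrow = k)     # row r holds units r, r+k, ..., r+(n-1)k
means <- rowMeans(m)

# Direct definition: average squared deviation of the k sample means
v_direct <- mean((means - mean(y))^2)

# Decomposition: (N-1)/N * S^2 - k(n-1)/N * S_w^2
S2  <- var(y)
Sw2 <- sum((m - means)^2) / (k * (n - 1))   # within-sample variance
v_decomp <- (N - 1) / N * S2 - k * (n - 1) / N * Sw2

print(round(means, 1))
print(c(direct = v_direct, decomposition = v_decomp))
```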
Q . Derive the following expression for the variance of the sample mean under Systematic Sampling.

$$V(\bar{y}_{sy}) = \frac{(N-1)S^2}{Nn}\left[1 + \rho_w(n-1)\right]$$

Proof:
$$V(\bar{y}_{sy}) = \frac{1}{k}\sum_{r=1}^{k}(\bar{y}_r - \bar{Y})^2 = \frac{1}{k}\sum_{r=1}^{k}\left(\frac{\sum_{i=1}^{n} y_{ri}}{n} - \bar{Y}\right)^2 = \frac{1}{n^2 k}\sum_{r=1}^{k}\left[\sum_{i=1}^{n}(y_{ri} - \bar{Y})\right]^2$$

$$V(\bar{y}_{sy}) = \frac{1}{n^2 k}\left[\sum_{r=1}^{k}\sum_{i=1}^{n}(y_{ri} - \bar{Y})^2 + \sum_{r=1}^{k}\sum_{i \ne u}^{n}(y_{ri} - \bar{Y})(y_{ru} - \bar{Y})\right]$$

$$V(\bar{y}_{sy}) = \frac{1}{n^2 k}\left[(nk-1)S^2 + \sum_{r=1}^{k}\sum_{i \ne u}^{n}(y_{ri} - \bar{Y})(y_{ru} - \bar{Y})\right]$$

The intraclass correlation between the pairs of units that are in the same systematic sample is
$$\rho_w = \frac{E(y_{ri} - \bar{Y})(y_{ru} - \bar{Y})}{E(y_{ri} - \bar{Y})^2} = \frac{\dfrac{1}{nk(n-1)}\sum_{r=1}^{k}\sum_{i \ne u}^{n}(y_{ri} - \bar{Y})(y_{ru} - \bar{Y})}{\left(\dfrac{nk-1}{nk}\right)S^2}$$
so that
$$\sum_{r=1}^{k}\sum_{i \ne u}^{n}(y_{ri} - \bar{Y})(y_{ru} - \bar{Y}) = \rho_w(n-1)(nk-1)S^2$$

$$V(\bar{y}_{sy}) = \frac{1}{n^2 k}\left[(nk-1)S^2 + \rho_w(n-1)(nk-1)S^2\right] = \frac{(nk-1)S^2}{n^2 k}\left[1 + \rho_w(n-1)\right]$$

OR

$$V(\bar{y}_{sy}) = \frac{(nk-1)S^2}{n \cdot nk}\left[1 + \rho_w(n-1)\right]$$

OR

$$V(\bar{y}_{sy}) = \frac{(N-1)S^2}{Nn}\left[1 + \rho_w(n-1)\right]$$
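A short numerical check of this form, using the tree heights from the earlier example. The sketch assumes N = nk and takes S² with divisor N − 1; object names are illustrative.

```r
# Intraclass correlation form of the systematic-sampling variance.
y <- c(40, 38, 36, 35, 32, 35, 32, 30, 31, 29, 37, 41, 30, 28, 24,
       25, 26, 27, 29, 32, 34, 36, 21, 19, 17, 22, 35, 28, 29, 31)
n <- 5
N <- length(y)
k <- N / n

m  <- matrix(y, nrow = k)   # row r is the systematic sample with start r
d  <- m - mean(y)           # deviations from the population mean
S2 <- var(y)

# Sum over samples of cross-products for all pairs i != u within a sample:
# (sum_i d_ri)^2 - sum_i d_ri^2
cross <- sum(rowSums(d)^2 - rowSums(d^2))
rho_w <- cross / ((n - 1) * (N - 1) * S2)

v_formula <- (N - 1) * S2 / (N * n) * (1 + rho_w * (n - 1))
v_direct  <- mean((rowMeans(m) - mean(y))^2)
print(c(rho_w = rho_w, formula = v_formula, direct = v_direct))
```

A negative value of `rho_w` indicates that units within the same systematic sample are more heterogeneous than the population at large, which is the case in which systematic sampling is most precise.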

Q. Describe the comparison between SRS and Systematic Sampling on the basis of the variance of the sample mean?
Ans:
$$Var(\bar{y}_{sy}) = \frac{N-1}{N}S^2 - \frac{k(n-1)}{N}S_w^2, \qquad Var(\bar{y}_{srs}) = \frac{N-n}{Nn}S^2$$

$$Var(\bar{y}_{srs}) - Var(\bar{y}_{sy}) = \frac{N-n}{Nn}S^2 - \frac{N-1}{N}S^2 + \frac{k(n-1)}{N}S_w^2$$

Since k/N = 1/n,
$$Var(\bar{y}_{srs}) - Var(\bar{y}_{sy}) = \left(\frac{N-n}{Nn} - \frac{N-1}{N}\right)S^2 + \frac{n-1}{n}S_w^2$$

$$Var(\bar{y}_{srs}) - Var(\bar{y}_{sy}) = \frac{n-1}{n}\left(S_w^2 - S^2\right)$$

Systematic sampling is therefore more precise than SRS whenever $S_w^2 > S^2$.

$$R.E = \frac{Var(\bar{y}_{srs})}{Var(\bar{y}_{sy})} = \frac{\dfrac{N-n}{Nn}S^2}{\dfrac{(N-1)S^2}{Nn}\left[1 + \rho_w(n-1)\right]}$$

$$R.E = \frac{1}{\dfrac{N-1}{N-n}\left[1 + \rho_w(n-1)\right]}$$

Q. Derive the Variance of Stratified Sampling in Systematic setting?

Ans: Stratified Sampling in Setting of Systematic Sampling
The N = nk units are arranged as in the table of sample compositions above and divided into n strata of k consecutive units each, with one unit selected at random from each stratum, so that Nh = k and nh = 1.

$$\bar{y}_{str} = \frac{1}{N}\sum_{h=1}^{L} N_h \bar{y}_h = \frac{1}{nk}\sum_{j=1}^{n} k\,y_j = \frac{1}{n}\sum_{j=1}^{n} y_j$$

$$V(\bar{y}_{str}) = \frac{1}{n^2}\sum_{j=1}^{n} V(y_j) = \frac{1}{n^2}\sum_{j=1}^{n}\frac{k-1}{k}S_j^2 = \frac{k-1}{n^2 k}\sum_{j=1}^{n} S_j^2$$

Writing $S_{wst}^2 = \frac{1}{n}\sum_{j=1}^{n} S_j^2$,

$$V(\bar{y}_{str}) = \frac{k-1}{nk}S_{wst}^2 = \frac{nk-n}{n \cdot nk}S_{wst}^2$$

$$V(\bar{y}_{str}) = \frac{N-n}{nN}S_{wst}^2$$

Q. Describe a comparison between Stratified and Systematic Sampling on the basis of the
variance of the sample mean.
Ans:
$$V(\bar{y}_{sy}) = \frac{1}{k}\sum_{r=1}^{k}(\bar{y}_r - \bar{Y})^2$$

Writing $\bar{y}_r = \frac{1}{n}\sum_{i=1}^{n} y_{ri}$ and $\bar{Y} = \frac{1}{n}\sum_{i=1}^{n}\bar{y}_i$, where $\bar{y}_i$ is the mean of the ith stratum,

$$V(\bar{y}_{sy}) = \frac{1}{k}\sum_{r=1}^{k}\left(\frac{\sum_{i=1}^{n} y_{ri}}{n} - \frac{\sum_{i=1}^{n}\bar{y}_i}{n}\right)^2$$

$$V(\bar{y}_{sy}) = \frac{1}{n^2 k}\sum_{r=1}^{k}\left[\sum_{i=1}^{n}(y_{ri} - \bar{y}_i)\right]^2$$

$$V(\bar{y}_{sy}) = \frac{1}{n^2 k}\left[\sum_{r=1}^{k}\sum_{i=1}^{n}(y_{ri} - \bar{y}_i)^2 + \sum_{r=1}^{k}\sum_{i \neq u}^{n}(y_{ri} - \bar{y}_i)(y_{ru} - \bar{y}_u)\right]$$

Alternative form of variance: since $\sum_{r}\sum_{i}(y_{ri} - \bar{y}_i)^2 = n(k-1)S_{wst}^2 = (N-n)S_{wst}^2$,

$$V(\bar{y}_{sy}) = \frac{1}{n^2 k}\left[(N-n)S_{wst}^2 + \sum_{r=1}^{k}\sum_{i \neq u}^{n}(y_{ri} - \bar{y}_i)(y_{ru} - \bar{y}_u)\right]$$

The intraclass correlation between pairs of units that are in the same systematic sample, measured from the stratum means, is

$$\rho_{wst} = \frac{E(y_{ri} - \bar{y}_i)(y_{ru} - \bar{y}_u)}{E(y_{ri} - \bar{y}_i)^2}$$

$$\rho_{wst} = \frac{\frac{1}{nk(n-1)}\sum_{r=1}^{k}\sum_{i \neq u}^{n}(y_{ri} - \bar{y}_i)(y_{ru} - \bar{y}_u)}{\left(\frac{N-n}{nk}\right)S_{wst}^2}$$

$$\rho_{wst}(n-1)(N-n)S_{wst}^2 = \sum_{r=1}^{k}\sum_{i \neq u}^{n}(y_{ri} - \bar{y}_i)(y_{ru} - \bar{y}_u)$$

$$V(\bar{y}_{sy}) = \frac{1}{n^2 k}\left[(N-n)S_{wst}^2 + \rho_{wst}(n-1)(N-n)S_{wst}^2\right]$$

$$V(\bar{y}_{sy}) = \frac{(N-n)S_{wst}^2}{nN}\left[1 + \rho_{wst}(n-1)\right]$$

For stratified sampling with one unit per stratum,

$$Var(\bar{y}_{str}) = \frac{N-n}{Nn}S_{wst}^2$$

$$Var(\bar{y}_{sy}) - Var(\bar{y}_{str}) = \frac{(N-n)S_{wst}^2}{nN}\left[1 + \rho_{wst}(n-1)\right] - \frac{(N-n)S_{wst}^2}{nN} = \frac{(N-n)S_{wst}^2}{nN}\,\rho_{wst}(n-1)$$

$$R.E = \frac{Var(\bar{y}_{str})}{Var(\bar{y}_{sy})}$$

$$R.E = \frac{1}{1 + \rho_{wst}(n-1)}$$

Q. State the variance of Stratified Sampling, in terms of systematic sampling, for a population with
linear trend.

Ans: The population increases according to a linear trend, $y_i = i$ for $i = 1, 2, \dots, N$. The variance of the sample mean for
SRSWOR is

$$Var(\bar{y}_{wor}) = \frac{(k-1)(nk+1)}{12}$$

For Stratified Sampling in Terms of Systematic Sampling

$$Var(\bar{y}_{str}) = \frac{N-n}{Nn}S_{wst}^2, \qquad S_{wst}^2 = \frac{\sum_{r=1}^{k}(y_r - \bar{Y}_k)^2}{k-1}$$

where $\bar{Y}_k$ is the mean of the k units of a stratum. Expanding the sum of squares,

$$\sum_{r=1}^{k}(y_r - \bar{Y}_k)^2 = \sum_{r=1}^{k}\left(y_r^2 + \bar{Y}_k^2 - 2 y_r \bar{Y}_k\right)$$

$$= \sum_{r=1}^{k} y_r^2 + k\bar{Y}_k^2 - 2\bar{Y}_k\sum_{r=1}^{k} y_r$$

$$= \sum_{r=1}^{k} y_r^2 + k\bar{Y}_k^2 - 2k\bar{Y}_k^2$$

$$= \sum_{r=1}^{k} y_r^2 - k\bar{Y}_k^2 = \sum_{r=1}^{k} y_r^2 - \frac{\left(\sum_{r=1}^{k} y_r\right)^2}{k}$$

Under the linear trend,

$$\sum_{i=1}^{N} y_i = \sum_{i=1}^{N} i = 1 + 2 + \dots + N$$

and, because every stratum has the same within-stratum variance, it is enough to take the first stratum:

$$\sum_{r=1}^{k} y_r = \sum_{r=1}^{k} r = 1 + 2 + \dots + k$$

$$\sum_{r=1}^{k} r = \frac{k(k+1)}{2}, \qquad \sum_{r=1}^{k} r^2 = \frac{k(k+1)(2k+1)}{6}$$

$$\sum_{r=1}^{k} r^2 - \frac{\left(\sum_{r=1}^{k} r\right)^2}{k} = \frac{k(k+1)(2k+1)}{6} - \frac{k^2(k+1)^2}{4k}$$

$$= \frac{k(k+1)}{2}\left[\frac{2k+1}{3} - \frac{k+1}{2}\right]$$

$$= \frac{k(k+1)}{2}\left[\frac{k-1}{6}\right]$$

$$\sum_{r=1}^{k}(y_r - \bar{Y}_k)^2 = \frac{k(k+1)}{2}\left[\frac{k-1}{6}\right]$$

$$S_{wst}^2 = \frac{\sum_{r=1}^{k}(y_r - \bar{Y}_k)^2}{k-1} = \frac{k(k+1)}{12}$$

$$Var(\bar{y}_{str}) = \frac{N-n}{Nn}S_{wst}^2 = \frac{nk-n}{n^2 k}S_{wst}^2$$

$$Var(\bar{y}_{str}) = \frac{k^2-1}{12n}$$

Q. Describe the variance of systematic sampling for a population with linear trend.
Ans: Variance of sample mean under Systematic Sampling

$$Var(\bar{y}_{sy}) = \frac{\sum_{r=1}^{k}(\bar{y}_r - \bar{Y})^2}{k} = \frac{1}{k}\left[\sum_{r=1}^{k}\bar{y}_r^2 - \frac{\left(\sum_{r=1}^{k}\bar{y}_r\right)^2}{k}\right]$$

Under the linear trend,

$$\sum_{i=1}^{N} y_i = \sum_{i=1}^{N} i = 1 + 2 + \dots + N$$

The rth systematic sample consists of the units r, r+k, …, r+(n-1)k, so $\bar{y}_r = r + \frac{(n-1)k}{2}$. The constant $\frac{(n-1)k}{2}$ does not affect the variance, so $\bar{y}_r$ can be replaced by r:

$$\sum_{r=1}^{k} r = 1 + 2 + \dots + k = \frac{k(k+1)}{2}, \qquad \sum_{r=1}^{k} r^2 = \frac{k(k+1)(2k+1)}{6}$$

$$Var(\bar{y}_{sy}) = \frac{1}{k}\left[\sum_{r=1}^{k} r^2 - \frac{\left(\sum_{r=1}^{k} r\right)^2}{k}\right]$$

$$= \frac{1}{k}\left[\frac{k(k+1)(2k+1)}{6} - \frac{k^2(k+1)^2}{4k}\right]$$

$$= \frac{k(k+1)}{2k}\left[\frac{2k+1}{3} - \frac{k+1}{2}\right]$$

$$= \frac{k(k+1)}{2k}\left[\frac{k-1}{6}\right]$$

$$= \frac{k(k+1)(k-1)}{12k}$$

$$Var(\bar{y}_{sy}) = \frac{k^2-1}{12}$$

The three variances for a population with linear trend are

$$Var(\bar{y}_{wor}) = \frac{(k-1)(nk+1)}{12}, \qquad Var(\bar{y}_{str}) = \frac{k^2-1}{12n}, \qquad Var(\bar{y}_{sy}) = \frac{k^2-1}{12}$$

When n = 1 all three reduce to $\frac{k^2-1}{12}$:

$$Var(\bar{y}_{str}) = Var(\bar{y}_{sy}) = Var(\bar{y}_{wor}) \quad \text{for } n = 1$$

When n > 1:

$$Var(\bar{y}_{str}) \le Var(\bar{y}_{sy}) \le Var(\bar{y}_{wor}) \quad \text{for } n > 1$$
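The three closed-form variances for a linear trend can be confirmed by direct computation. The sketch below assumes a small hypothetical population y = 1, 2, …, N with n = 4 and k = 6; these sizes are chosen only for illustration.

```r
# Hypothetical linear-trend population: y_i = i
n <- 4; k <- 6; N <- n * k
y <- 1:N

# Row r = r-th systematic sample, column i = i-th stratum of k consecutive units
Y <- matrix(y, nrow = k, ncol = n)

# SRS without replacement
V.wor <- (N - n) / (N * n) * var(y)

# Systematic sampling: exact variance over the k possible samples
V.sy <- mean((rowMeans(Y) - mean(y))^2)

# Stratified sampling with one unit per stratum
S2.wst <- mean(apply(Y, 2, var))
V.str  <- (N - n) / (N * n) * S2.wst

# Direct computation next to the closed forms derived above
rbind(direct  = c(str = V.str, sy = V.sy, wor = V.wor),
      formula = c((k^2 - 1) / (12 * n), (k^2 - 1) / 12, (k - 1) * (n * k + 1) / 12))
```

Both rows of the printed table should coincide, with the stratified variance the smallest and the SRS variance the largest.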

Example: A certain company claims about their daily production as


125, 135, 157,192,151,175,164,169,147,150,138,167,155,159,139,147,149,158.
Select the systematic sample of size 3 using R language. Also find mean of sample.
Ans:
Here we have N=18 with n=3, so k=6

# Population: daily production
y<-c(125,135,157,192,151,175,164,169,147,
150,138,167,155,159,139,147,149,158)
n=3;N=length(y)
k=N/n                       # sampling interval
start <- sample(1:k, 1)     # random start between 1 and k
s <- seq(start, N, k)       # every kth unit from the random start
sys.sample<-y[s]
mean(sys.sample)
Output
First run=154.33
2nd run=148
Example: 2
The heights of the 30 trees from a certain area of a forest are given by
40, 38,36,35,32,35,32,30,31,29,37,41,30,28,24,25,26,27,29,32,34,36,21,19,17,22,35,28,29,31
Select a systematic random sample of size 5. Also find sample mean.
Ans:
N=30, n=5;k=6
pop<-c(40,38,36,35,32,35,32,30,31,29,37,41,30,28,24,25,26,27,29,32,34,
36,21,19,17,22,35,28,29,31)
n=5;N=length(pop);
k=N/n                       # sampling interval
start <- sample(1:k, 1)     # random start between 1 and k
s <- seq(start, N, k)
sys.sample<-pop[s]
mean(sys.sample)
var(sys.sample)
sd(sys.sample)

Output (mean and standard deviation from two runs)
29.6 (sd=8.26)
29 (sd=6.04)
Example: A certain company claims about their daily production as
125, 135, 157,192,151,175,164,169,147,150,138,167,155,159,139,147,149,158.
We are interested to select the systematic sample of size 3.
Obtain 10000 samples using systematic sampling. Find the mean of each sample. Find the mean
of means and variance of means.
Ans:

Here we have N=18 with n=3, so k=6

pop<-c(125,135,157,192,151,175,164,169,147,150,138,167,155,159,139,147,149,158)
n=3;N=length(pop);
k=N/n
m=c()                       # vector to store the sample means
for(i in 1:10000)
{
start <- sample(1:k, 1)
s <- seq(start, N, k)
sys.sample<-pop[s]
m[i]<- mean(sys.sample)
}
mean(m)
var(m)
Output
Mean=154.29
Var=63.6

Example
The heights of the 30 trees from a certain area of a forest are given by
40, 38,36,35,32,35,32,30,31,29,37,41,30,28,24,25,26,27,29,32,34,36,21,19,17,22,35,
28,29,31
Select a systematic random sample of size 5.
Obtain 5000 samples using systematic sampling.
Find the mean of each sample.
Find the mean of means and variance of means.
Ans
We have
N=30, n=5;k=6
pop<-c(40,38,36,35,32,35,32,30,31,29,37,41,30,28,24,25,26,27,29,32,34,36,21,
19,17,22,35,28,29,31)
n=5;N=length(pop);k=N/n
m=c()                       # vector to store the sample means
for(i in 1:5000)
{
start <- sample(1:k, 1)
s <- seq(start, N, k)
sys.sample<-pop[s]
m[i]<- mean(sys.sample)
}
mean(m)
var(m)
Output
Mean=30.29
Var=0.89
Example:
Generate a population of size 1000 values from normal distribution with mean=2 and standard
deviation=3.
Select the 10000 samples each of size 50 using systematic sampling technique and estimate the
mean of each sample.
Find the mean and variance of 10000 means.
Ans:
N=1000; n=50;k=N/n;m=c();
pop<-rnorm(N,mean=2,sd=3)
for(i in 1:10000)
{
start <- sample(1:k, 1)
s <- seq(start, N, k)
sys.sample<-pop[s]
m[i]=mean(sys.sample)
}
mean(m);var(m)
Output
2.01667
0.1929605
Example: 2
Generate a population of 500 values from a normal distribution with mean=20 and standard
deviation=10.
Select 5000 samples, each of size 50, using the systematic sampling technique and estimate the
mean of each sample. Find the mean and variance of the 5000 means.
Ans:
N=500; n=50;k=N/n;m=c();
pop<-rnorm(N,mean=20,sd=10)
for(i in 1:5000)
{
start <- sample(1:k, 1)
s <- seq(start, N, k)
sys.sample<-pop[s]
m[i]=mean(sys.sample)
}
mean(m);var(m);

Output
19.93
1.456223

Cluster Sampling

Introduction:

• Cluster Sampling
• A cluster is a sampling unit consisting of a group of observation units.
• Any sampling method can be used for the selection of clusters.
• All the units within a selected cluster are studied.
• Example: nine clusters, each of the same size.

• Cluster Settings

Cluster   1     2     3     …   j     …   M      Mean
1         y11   y12   y13   …   y1j   …   y1M    $\bar{y}_1 = y_{1.}/M$
2         y21   y22   y23   …   y2j   …   y2M
⋮
i         yi1   yi2   yi3   …   yij   …   yiM    $\bar{y}_i = y_{i.}/M$
⋮
N         yN1   yN2   yN3   …   yNj   …   yNM    $\bar{y}_N$
Total     y.1   y.2   y.3   …   y.j   …   y.M    Y

• Notations
• Suppose we have a population of N clusters, each of size M.
• NM = total elements in the population.
• $y_{ij}$ = observed value of the jth element in the ith cluster.
• nM = total elements in the sample.
• $y_{i.} = y_i$ = total of the ith cluster:
$$y_{i.} = \sum_{j=1}^{M} y_{ij}$$
• $\bar{y}$ = sample mean per cluster:
$$\bar{y} = \frac{\sum_{i=1}^{n} y_i}{n}$$
• $\bar{\bar{y}}$ = overall sample mean (per element):
$$\bar{\bar{y}} = \frac{\sum_{i=1}^{n}\sum_{j=1}^{M} y_{ij}}{Mn}$$
• $\bar{\bar{Y}}$ = overall population mean (per element):
$$\bar{\bar{Y}} = \frac{\sum_{i=1}^{N}\sum_{j=1}^{M} y_{ij}}{MN}$$
• Let $y_i$ be the total of the ith cluster. Then
$$\bar{\bar{y}} = \frac{\bar{y}}{M}$$
• Unbiased mean estimator: with $\bar{Y} = \frac{1}{N}\sum_{i=1}^{N} y_i$ the population mean per cluster,
$$M\,E(\bar{\bar{y}}) = E(\bar{y})$$
$$M\,E(\bar{\bar{y}}) = \bar{Y}$$
$$E(\bar{\bar{y}}) = \bar{\bar{Y}}$$

Q. Find the variance of the sample mean in cluster sampling, considering equal cluster sizes.
Ans:
• Unbiased mean estimator
• The unbiased mean estimator is
$$\bar{\bar{y}} = \frac{\sum_{i=1}^{n}\sum_{j=1}^{M} y_{ij}}{Mn}, \qquad V(\bar{\bar{y}}) = E\left(\bar{\bar{y}} - \bar{\bar{Y}}\right)^2$$

• Variance of mean estimator

$$V(\bar{\bar{y}}) = \left(\frac{1-f}{n}\right)S_b^2, \qquad S_b^2 = \frac{\sum_{i=1}^{N}(\bar{y}_i - \bar{\bar{Y}})^2}{N-1}$$

$$V(\bar{\bar{y}}) = \left(\frac{1-f}{n}\right)\frac{\sum_{i=1}^{N}(\bar{y}_i - \bar{\bar{Y}})^2}{N-1}$$

where $\bar{y}_i = y_{i.}/M$ is the mean of the ith cluster and $f = n/N$. Expanding the sum of squares,

$$\sum_{i=1}^{N}(\bar{y}_i - \bar{\bar{Y}})^2 = \sum_{i=1}^{N}\left[\frac{1}{M}\sum_{j=1}^{M}(y_{ij} - \bar{\bar{Y}})\right]^2$$

$$= \frac{1}{M^2}\sum_{i=1}^{N}\sum_{j=1}^{M}(y_{ij} - \bar{\bar{Y}})^2 + \frac{1}{M^2}\sum_{i=1}^{N}\sum_{j \neq k}^{M}(y_{ij} - \bar{\bar{Y}})(y_{ik} - \bar{\bar{Y}})$$

$$= \frac{1}{M^2}\left[(NM-1)S^2 + \sum_{i=1}^{N}\sum_{j \neq k}^{M}(y_{ij} - \bar{\bar{Y}})(y_{ik} - \bar{\bar{Y}})\right]$$

The intraclass correlation between the elements within a cluster is

$$\rho_w = \frac{E(y_{ij} - \bar{\bar{Y}})(y_{ik} - \bar{\bar{Y}})}{E(y_{ij} - \bar{\bar{Y}})^2}$$

$$\rho_w = \frac{\frac{1}{NM(M-1)}\sum_{i=1}^{N}\sum_{j \neq k}^{M}(y_{ij} - \bar{\bar{Y}})(y_{ik} - \bar{\bar{Y}})}{\left(\frac{NM-1}{NM}\right)S^2}$$

$$\rho_w (M-1)(NM-1)S^2 = \sum_{i=1}^{N}\sum_{j \neq k}^{M}(y_{ij} - \bar{\bar{Y}})(y_{ik} - \bar{\bar{Y}})$$

$$\sum_{i=1}^{N}(\bar{y}_i - \bar{\bar{Y}})^2 = \frac{1}{M^2}\left[(NM-1)S^2 + \rho_w (M-1)(NM-1)S^2\right]$$

$$V(\bar{\bar{y}}) = \frac{1}{M^2}\left(\frac{1-f}{n}\right)\frac{(NM-1)S^2 + \rho_w (M-1)(NM-1)S^2}{N-1}$$

$$V(\bar{\bar{y}}) = \frac{1}{M^2}\left(\frac{1-f}{n}\right)\frac{(NM-1)S^2}{N-1}\left[1 + \rho_w (M-1)\right]$$

$$V(\bar{\bar{y}}) \approx \frac{S^2}{nM}\left[1 + \rho_w (M-1)\right], \qquad NM-1 \approx NM,\; N-1 \approx N$$

with the sampling fraction f taken as negligible in the last step.

Q. Compare simple random sampling and cluster sampling in terms of the variances of the
sample means, where in general
$$Var(\bar{y}_{srs}) = \frac{N-n}{N}\cdot\frac{S^2}{n}$$
and, for a simple random sample of nM elements from NM,
$$V(\bar{y}_{srs}) = \left(\frac{NM - nM}{NM}\right)\frac{S^2}{nM}$$
Ans:
• Comparison

$$S^2 = \frac{1}{NM-1}\sum_{i=1}^{N}\sum_{j=1}^{M}(y_{ij} - \bar{\bar{Y}})^2$$

$$S^2 (NM-1) = \sum_{i=1}^{N}\sum_{j=1}^{M}(y_{ij} - \bar{\bar{Y}})^2$$

$$= \sum_{i=1}^{N}\sum_{j=1}^{M}\left[(y_{ij} - \bar{y}_i) + (\bar{y}_i - \bar{\bar{Y}})\right]^2$$

$$= \sum_{i=1}^{N}\sum_{j=1}^{M}(y_{ij} - \bar{y}_i)^2 + M\sum_{i=1}^{N}(\bar{y}_i - \bar{\bar{Y}})^2$$

$$= N(M-1)S_w^2 + M(N-1)S_b^2$$

where $S_w^2 = \frac{1}{N(M-1)}\sum_{i=1}^{N}\sum_{j=1}^{M}(y_{ij} - \bar{y}_i)^2$ is the mean sum of squares within clusters in the population and $S_b^2$ is the mean sum of squares between cluster means.

$$R.E = \frac{Var(\bar{y}_{srs})}{Var(\bar{\bar{y}})}$$

$$R.E = \frac{S^2}{M S_b^2}$$

$$R.E = \frac{N(M-1)S_w^2 + M(N-1)S_b^2}{M S_b^2 (NM-1)}$$

$$= \frac{1}{NM-1}\left[\frac{N(M-1)S_w^2}{M S_b^2} + \frac{M(N-1)S_b^2}{M S_b^2}\right]$$

$$= \frac{1}{NM-1}\left[\frac{N(M-1)S_w^2}{M S_b^2} + (N-1)\right]$$

• This value increases when $S_w^2$ is large and $S_b^2$ is small.
• So cluster sampling will be efficient if the clusters are formed so that the variation between
cluster means is as small as possible while the variation within the clusters is as large as possible.

Q. Compare simple random sampling and cluster sampling in terms of the intraclass
correlation.
Ans:

$$R.E = \frac{Var(\bar{y}_{srs})}{Var(\bar{\bar{y}})}$$

$$V(\bar{\bar{y}}) \approx \frac{S^2}{nM}\left[1 + \rho_w (M-1)\right]$$

$$V(\bar{y}_{srs}) \approx \frac{S^2}{nM}$$

$$R.E = \frac{\frac{S^2}{nM}}{\frac{S^2}{nM}\left[1 + \rho_w (M-1)\right]}$$

$$R.E = \frac{1}{1 + \rho_w (M-1)}$$

Q. Explain the concept of cluster sampling for unequal cluster sizes.

Ans:
• Cluster Settings

Cluster   1     2     3     …   j     …   Mi      Mean
1         y11   y12   y13   …   y1j   …   y1M1    $\bar{y}_1 = y_{1.}/M_1$
2         y21   y22   y23   …   y2j   …   y2M2
⋮
i         yi1   yi2   yi3   …   yij   …   yiMi    $\bar{y}_i = y_{i.}/M_i$
⋮
N         yN1   yN2   yN3   …   yNj   …   yNMN    $\bar{y}_N$

• Mean Estimator
• Since the cluster sizes are unequal, the total size is
$$M = \sum_{i=1}^{N} M_i$$
• The mean of the ith cluster is
$$\bar{y}_i = \frac{\sum_{j=1}^{M_i} y_{ij}}{M_i}$$
• The overall mean is
$$\bar{Y} = \frac{\sum_{i=1}^{N}\sum_{j=1}^{M_i} y_{ij}}{\sum_{i=1}^{N} M_i} = \frac{\sum_{i=1}^{N} M_i \bar{y}_i}{M}$$
• Expected Value of Sample Mean
• The mean of the sampled cluster means is
$$\bar{y} = \frac{\sum_{i=1}^{n} \bar{y}_i}{n}$$
and its expected value is
$$E(\bar{y}) = \frac{\sum_{i=1}^{N} \bar{y}_i}{N}$$
which in general differs from $\bar{Y} = \frac{\sum_{i=1}^{N} M_i \bar{y}_i}{M}$.
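Q. Find the expression for the bias of the mean estimator in cluster sampling with unequal
cluster sizes.

Ans.

The bias of an estimator T of a parameter $\theta$ is defined as

$$Bias(T) = E(T) - \theta$$

$$Bias(\bar{y}) = E(\bar{y}) - \bar{Y}$$

$$Bias(\bar{y}) = \frac{\sum_{i=1}^{N}\bar{y}_i}{N} - \frac{\sum_{i=1}^{N} M_i \bar{y}_i}{M}$$

$$= \frac{1}{M}\left[\frac{M\sum_{i=1}^{N}\bar{y}_i}{N} - \sum_{i=1}^{N} M_i \bar{y}_i\right]$$

$$= \frac{1}{M}\left[\frac{\sum_{i=1}^{N} M_i \sum_{i=1}^{N}\bar{y}_i}{N} - \sum_{i=1}^{N} M_i \bar{y}_i\right]$$

$$= -\frac{1}{M}\left[\sum_{i=1}^{N} M_i \bar{y}_i - \frac{\sum_{i=1}^{N} M_i \sum_{i=1}^{N}\bar{y}_i}{N}\right]$$

The bracket equals $(N-1)\,Cov(m, \bar{y})$, where $Cov(m, \bar{y}) = \frac{1}{N-1}\sum_{i=1}^{N}(M_i - \bar{M})(\bar{y}_i - \frac{1}{N}\sum_{i=1}^{N}\bar{y}_i)$ is the covariance between cluster sizes and cluster means, so that

$$Bias(\bar{y}) = -\frac{N-1}{M}\,Cov(m, \bar{y})$$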

Q. Find the expression for the mean square error of the mean estimator in cluster sampling with
unequal cluster sizes.

Ans:

$$MSE(T) = Var(T) + \left[Bias(T)\right]^2$$

$$MSE(\bar{y}) = Var(\bar{y}) + \left[Bias(\bar{y})\right]^2$$

$$MSE(\bar{y}) = \left(\frac{1-f}{n}\right)S_b^2 + \left[\frac{N-1}{M}\,Cov(m, \bar{y})\right]^2$$

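Q. Find the expected value of the weighted mean for unequal clusters, where the weighted mean
is given by

$$\bar{y}_w = \frac{\sum_{i=1}^{n} M_i \bar{y}_i}{n\bar{M}}$$

Answer:
• Weighted Mean
• Since the cluster sizes are unequal, the mean cluster size is
$$\bar{M} = \frac{M}{N}$$
• The weighted mean based on the size of the ith cluster is
$$\bar{y}_w = \frac{\sum_{i=1}^{n} M_i \bar{y}_i}{n\bar{M}}$$
• Expectation of Weighted Mean
• Taking expectation on both sides, with the n clusters drawn by simple random sampling,
$$E(\bar{y}_w) = E\left[\frac{\sum_{i=1}^{n} M_i \bar{y}_i}{n\bar{M}}\right] = \frac{1}{n\bar{M}}\cdot n\cdot\frac{1}{N}\sum_{i=1}^{N} M_i \bar{y}_i = \frac{\sum_{i=1}^{N} M_i \bar{y}_i}{M} = \bar{Y}$$
so the weighted mean is unbiased for the population mean.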

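Q. Find the variance expression of the weighted mean for unequal clusters, where the weighted
mean is given by

$$\bar{y}_w = \frac{\sum_{i=1}^{n} M_i \bar{y}_i}{n\bar{M}}$$

Ans:

• The variance is given by

$$V(\bar{y}_w) = V\left[\frac{\sum_{i=1}^{n} M_i \bar{y}_i}{n\bar{M}}\right]$$

$$V(\bar{y}_w) = \left(\frac{1-f}{n}\right)S_{bw}^2, \qquad S_{bw}^2 = \frac{1}{N-1}\sum_{i=1}^{N}\left(\frac{M_i \bar{y}_i}{\bar{M}} - \bar{Y}\right)^2$$

• The Estimator of Variance

The estimator of variance is given by

$$\hat{V}(\bar{y}_w) = \left(\frac{1-f}{n}\right)\hat{S}_{bw}^2, \qquad \hat{S}_{bw}^2 = \frac{1}{n-1}\sum_{i=1}^{n}\left(\frac{M_i \bar{y}_i}{\bar{M}} - \bar{y}_w\right)^2$$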

Example: Find mean and variance of sample mean in cluster sampling using the following
data set.
Ans:

Population of Six Clusters

Total

Cluster-1 125 115 129 134 111 614

Cluster-2 134 125 142 141 131 673

Cluster-3 144 143 122 134 126 669

Cluster-4 114 111 134 131 146 636

Cluster-5 119 126 122 129 130 626

Cluster-6 140 125 124 124 115 628

• Population Mean
$$\bar{\bar{Y}} = \frac{\sum_{i=1}^{N}\sum_{j=1}^{M} Y_{ij}}{MN} = \frac{3846}{30} = 128.2$$

• Variance of Sample Mean Using Population
$$Var(\bar{\bar{y}}) = \left(\frac{1-f}{n}\right)S_b^2, \qquad S_b^2 = \frac{\sum_{i=1}^{N}(\bar{y}_i - \bar{\bar{Y}})^2}{N-1}$$

$$Var(\bar{\bar{y}}) = \frac{N-n}{Nn}\cdot\frac{1}{N-1}\sum_{i=1}^{N}(\bar{y}_i - \bar{\bar{Y}})^2$$

• Variance of Sample Mean

y_i    ȳ_i     (ȳ_i − Ȳ)²
614    122.8   29.16
673    134.6   40.96
669    133.8   31.36
636    127.2   1.00
626    125.2   9.00
628    125.6   6.76
               Sum = 118.24

with $\bar{\bar{Y}} = 128.2$. For a sample of n = 3 clusters,

$$Var(\bar{\bar{y}}) = \frac{6-3}{6 \times 3}\cdot\frac{1}{6-1}(118.24) = 3.941$$

• Mean and Variance Using All the Observations

$$\bar{\bar{Y}} = \frac{3846}{30} = 128.2, \qquad V(\bar{\bar{y}}) = 3.941$$
Example: Find the mean and variance by taking a sample of 3 clusters from the previous data
set under cluster sampling.
Ans:
Sampled clusters

                                    Total
Cluster-3 144 143 122 134 126 669
Cluster-4 114 111 134 131 146 636
Cluster-6 140 125 124 124 115 628

$$\bar{\bar{y}} = \frac{\sum_{i=1}^{n}\sum_{j=1}^{M} y_{ij}}{Mn} = \frac{\sum_{i=1}^{3}\sum_{j=1}^{5} y_{ij}}{5 \times 3} = \frac{1933}{15} = 128.87$$

• Estimated Variance of Sample Mean

$$var(\bar{\bar{y}}) = \frac{N-n}{Nn}\cdot\frac{1}{n-1}\sum_{i=1}^{n}(\bar{y}_i - \bar{\bar{y}})^2$$

y_i    ȳ_i     (ȳ_i − ȳ)²
669    133.8   24.34
636    127.2   2.78
628    125.6   10.67
               Sum = 37.79

with $\bar{\bar{y}} = 128.87$.

• Estimated Variance
$$var(\bar{\bar{y}}) = \frac{6-3}{6 \times 3}\cdot\frac{1}{2}(37.79) = 3.15$$

• Example-2
• 420 trees are divided into 105 clusters.
• Each cluster is of size 4.
• A simple random sample of 15 clusters is selected.
• Estimate the mean yield by using cluster sampling.
• Sample of 15 Clusters

$$\bar{\bar{y}} = \frac{\sum_{i=1}^{n}\sum_{j=1}^{M} y_{ij}}{Mn} = \frac{1142.44}{60} = 19.0407$$

• Estimated Variance

The variance $V(\bar{\bar{y}}) = \left(\frac{1-f}{n}\right)S_b^2$, with $S_b^2 = \frac{\sum_{i=1}^{N}(\bar{y}_i - \bar{\bar{Y}})^2}{N-1}$, is estimated from the sample by

$$var(\bar{\bar{y}}) = \frac{N-n}{Nn}\cdot\frac{1}{n-1}\sum_{i=1}^{n}(\bar{y}_i - \bar{\bar{y}})^2, \qquad SS = \sum_{i=1}^{n}(\bar{y}_i - \bar{\bar{y}})^2 = 1495.5596$$

cluster   y_i        ȳ_i       (ȳ_i − ȳ)²
1         80.4600    20.1150   1.162
2         50.6900    12.6725   41.082
3         81.7500    20.4375   1.961
4         63.5200    15.8800   9.967
5         97.2000    24.3000   27.699
6         26.8000    6.7000    152.202
7         99.1300    24.7825   33.011
8         63.2500    15.8125   10.397
9         179.350    44.8375   665.666
10        133.760    33.4400   207.446
11        59.1500    14.7875   18.058
12        32.8500    8.2125    117.170
13        58.3000    14.5750   19.909
14        92.7300    23.1825   17.185
15        23.5000    5.8750    173.238

$$var(\bar{\bar{y}}) = \frac{N-n}{Nn}\cdot\frac{1}{n-1}\sum_{i=1}^{n}(\bar{y}_i - \bar{\bar{y}})^2 = \frac{90}{105 \times 15}\cdot\frac{1}{14}(1495.5596) = 6.104$$
Q. Define the following data in R language in the form of clusters and find the mean and variance.

Total

Cluster-1 125 115 129 134 111 614

Cluster-2 134 125 142 141 131 673

Cluster-3 144 143 122 134 126 669

Cluster-4 114 111 134 131 146 636

Cluster-5 119 126 122 129 130 626

Cluster-6 140 125 124 124 115 628

Ans:

How to Do This in R?

#--Defining Clusters in R----


Clu1<-c(125,115,129,134,111)
Clu2<-c(134,125,142,141,131)
Clu3<-c(144,143,122,134,126)
Clu4<-c(114,111,134,131,146)
Clu5<-c(119,126,122,129,130)
Clu6<-c(140,125,124,124,115)

pop<-c(Clu1,Clu2,Clu3,Clu4,Clu5,Clu6)
#----Sum of clusters----
sumc1<-sum(Clu1)
sumc2<-sum(Clu2)
sumc3<-sum(Clu3)
sumc4<-sum(Clu4)
sumc5<-sum(Clu5)
sumc6<-sum(Clu6)

#----Grand total----
Yi=c(sumc1,sumc2,sumc3,sumc4,sumc5,sumc6)
t.sum=sum(Yi)
n=3;N=6;M=5;      # n = number of clusters to be sampled
#----Mean of each cluster----
Yibar<-Yi/M
#----Population Mean----
pop.mean=sum(Yi)/(N*M)
#----Sum of Squares----
dv.p<-(Yibar-pop.mean)^2
sdv.p<-sum(dv.p)
vr.p<-sdv.p/(N-1)
#----The Variance----
cvr.p<-((N-n)/(N*n))*vr.p

The last line computes
$$\frac{N-n}{Nn}\cdot\frac{1}{N-1}\sum_{i=1}^{N}(\bar{y}_i - \bar{\bar{Y}})^2$$

Q. Perform the cluster sampling using R language with following data set. Also find mean
and variance.
Cluster-1 125 115 129 134 111

Cluster-2 134 125 142 141 131

Cluster-3 144 143 122 134 126

Cluster-4 114 111 134 131 146

Cluster-5 119 126 122 129 130

Cluster-6 140 125 124 124 115

Answer:
 Defining Clusters

#--Defining Clusters in R----


Clu1<-c(125,115,129,134,111)
Clu2<-c(134,125,142,141,131)
Clu3<-c(144,143,122,134,126)
Clu4<-c(114,111,134,131,146)
Clu5<-c(119,126,122,129,130)
Clu6<-c(140,125,124,124,115)

 Sum Of Clusters

#----Population----
pop<-c(Clu1,Clu2,Clu3,Clu4,Clu5,Clu6)
#----Sum of clusters----
sumc1<-sum(Clu1)
sumc2<-sum(Clu2)
sumc3<-sum(Clu3)
sumc4<-sum(Clu4)
sumc5<-sum(Clu5)
sumc6<-sum(Clu6)

 Sampling and Estimated Mean

#----Grand total----
Yi=c(sumc1,sumc2,sumc3,sumc4,sumc5,sumc6)
t.sum=sum(Yi)
#------Sampling----
n=3;N=6;M=5;
yi=sample(Yi,n)
#----Estimated Mean----
yibar<-yi/M
Est.mean=sum(yi)/(n*M)

 Estimated Variance

#----Sum of Squares----
dv<-(yibar-Est.mean)^2
sdv<-sum(dv)
vr<-sdv/(n-1)
#----The Variance----
cvr<-((N-n)/(N*n))*vr
N n 1 n
 yi  y 
2

Nn n  1 i 1

Q. Find the mean and variance for unequal clusters in cluster sampling using the
following data.

Clus-1 125 115 129 134
Clus-2 134 125 142 141 131 151 164 139 141
Clus-3 144 143 122 134 126 157
Clus-4 114 111 134 131 146 152 131
Clus-5 119 126 122 129 130
Clus-6 140 125 124 124 115 111 148 157 143 151

Ans:
• Population Mean in Case of Unequal Cluster Sizes

$$\bar{Y} = \frac{\sum_{i=1}^{N} M_i \bar{y}_i}{M} \Rightarrow \bar{Y} = 133.66$$

• Estimator-I: Mean of Cluster Means
• The mean of cluster means is
$$\bar{y} = \frac{\sum_{i=1}^{n} \bar{y}_i}{n}$$
computed from the sampled clusters:

Clus-3 144 143 122 134 126 157
Clus-4 114 111 134 131 146 152 131
Clus-5 119 126 122 129 130

• Mean Square Error

$$MSE(\bar{y}) = Var(\bar{y}) + \left[Bias(\bar{y})\right]^2$$

$$MSE(\bar{y}) = \left(\frac{1-f}{n}\right)S_b^2 + \left[\frac{N-1}{M}\,Cov(m, \bar{y})\right]^2$$

with $\bar{Y} = 133.66$ and $S_b^2 = 39.81$ for this population.

Q. Find the weighted mean and variance for unequal clusters in cluster sampling using the
following data.

Clus-1 125 115 129 134
Clus-2 134 125 142 141 131 151 164 139 141
Clus-3 144 143 122 134 126 157
Clus-4 114 111 134 131 146 152 131
Clus-5 119 126 122 129 130
Clus-6 140 125 124 124 115 111 148 157 143 151

Answer:
• Population Mean in Case of Unequal Cluster Sizes
$$\bar{Y} = \frac{\sum_{i=1}^{N} M_i \bar{y}_i}{M} \Rightarrow \bar{Y} = 133.66$$

• Estimator II: Weighted Mean
• Since the cluster sizes are unequal, the total size is
$$M = \sum_{i=1}^{N} M_i$$
• The weighted mean based on the size of the ith cluster is
$$\bar{y}_w = \frac{\sum_{i=1}^{n} M_i \bar{y}_i}{n\bar{M}}$$
computed from the sampled clusters:

Clus-3 144 143 122 134 126 157
Clus-4 114 111 134 131 146 152 131
Clus-5 119 126 122 129 130

• Variance of Weighted Mean
• The variance is given by
$$V(\bar{y}_w) = V\left[\frac{\sum_{i=1}^{n} M_i \bar{y}_i}{n\bar{M}}\right] = \left(\frac{1-f}{n}\right)S_{bw}^2$$

The values needed for $S_{bw}^2$ in the population are

ȳ_i      M_i   ȳ_i M_i / M̄
125.75   4     73.61
140.89   9     185.56
137.67   6     120.88
131.29   7     134.49
125.20   5     91.61
133.80   10    195.80

Q. Perform cluster sampling in R with the following data set of unequal clusters. Also find the
mean and variance.

Clus-1 125 115 129 134


Clus-2 134 125 142 141 131 151 164 139 141
Clus-3 144 143 122 134 126 157
Clus-4 114 111 134 131 146 152 131
Clus-5 119 126 122 129 130
Clus-6 140 125 124 124 115 111 148 157 143 151
Answer:
#----------Defining Clusters------------
Clu1<-c(125,115,129,134)
Clu2<-c(134,125,142,141,131,151,164,139,141)
Clu3<-c(144,143,122,134,126,157)
Clu4<-c(114,111,134,131,146,152,131)
Clu5<-c(119,126,122,129,130)
Clu6<-c(140,125,124,124,115,111,148,157,143,151)
#----Population mean----
pop<-c(Clu1,Clu2,Clu3,Clu4,Clu5,Clu6)
pop.mean=mean(pop)
#----Sum of clusters----
sumc1<-sum(Clu1)
sumc2<-sum(Clu2)
sumc3<-sum(Clu3)
sumc4<-sum(Clu4)
sumc5<-sum(Clu5)
sumc6<-sum(Clu6)
#----Cluster sizes, cluster totals and grand total----
n=3;N=6;M=c(4,9,6,7,5,10);
Yi=c(sumc1,sumc2,sumc3,sumc4,sumc5,sumc6)
t.sum=sum(Yi)
#----Mean of cluster means using population----
clu.mean.p=Yi/M
m.c.m.p=mean(clu.mean.p)
#----For MSE----
cov.mp=cov(clu.mean.p,M)             # covariance between cluster means and sizes
dv.p<-(clu.mean.p-m.c.m.p)^2
sdv.p<-sum(dv.p)
vr.p<-sdv.p/(N-1)
term1<-((N-n)/(N*n))*vr.p            # variance part
term2=((-(N-1)/sum(M))*cov.mp)^2     # squared bias
mse=term1+term2
#----From Sample----
j=sample(1:6,n)
yi=Yi[j];mi=M[j]
clu.mean=yi/mi
m.clu.mean=mean(clu.mean)
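The bias of the mean of cluster means can also be checked directly against its covariance form. The sketch below assumes the same six unequal clusters; Clu, ybar.i, bias.direct and bias.cov are illustrative names, not part of the original listing.

```r
# Six unequal clusters from the example above
Clu <- list(c(125, 115, 129, 134),
            c(134, 125, 142, 141, 131, 151, 164, 139, 141),
            c(144, 143, 122, 134, 126, 157),
            c(114, 111, 134, 131, 146, 152, 131),
            c(119, 126, 122, 129, 130),
            c(140, 125, 124, 124, 115, 111, 148, 157, 143, 151))
N <- length(Clu)
M <- sapply(Clu, length)        # cluster sizes M_i
ybar.i <- sapply(Clu, mean)     # cluster means

Ybar <- sum(M * ybar.i) / sum(M)            # population mean per element

# Bias as E(ybar) - Ybar, where E(ybar) is the plain mean of all N cluster means
bias.direct <- mean(ybar.i) - Ybar

# Bias through the covariance between cluster sizes and cluster means
bias.cov <- -(N - 1) / sum(M) * cov(M, ybar.i)

print(c(Ybar = Ybar, bias.direct = bias.direct, bias.cov = bias.cov))
```

Both expressions should give the same number, and its sign shows whether larger clusters tend to have larger or smaller means in this population.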

Q. Perform the Simulation Study using the following data for Equal Cluster Sizes.
                                    Total
Cluster-1 125 115 129 134 111 614
Cluster-2 134 125 142 141 131 673
Cluster-3 144 143 122 134 126 669
Cluster-4 114 111 134 131 146 636
Cluster-5 119 126 122 129 130 626
Cluster-6 140 125 124 124 115 628

Answer:
How to Do This in R?

#--Defining Clusters in R----
Clu1<-c(125,115,129,134,111)
Clu2<-c(134,125,142,141,131)
Clu3<-c(144,143,122,134,126)
Clu4<-c(114,111,134,131,146)
Clu5<-c(119,126,122,129,130)
Clu6<-c(140,125,124,124,115)
#----Population----
pop<-c(Clu1,Clu2,Clu3,Clu4,Clu5,Clu6)
#----Sum of clusters----
sumc1<-sum(Clu1)
sumc2<-sum(Clu2)
sumc3<-sum(Clu3)
sumc4<-sum(Clu4)
sumc5<-sum(Clu5)
sumc6<-sum(Clu6)
#----Grand total----
Yi=c(sumc1,sumc2,sumc3,sumc4,sumc5,sumc6)
t.sum=sum(Yi)
#----Sampling and estimated mean----
n=3;N=6;M=5;
Est.mean=c()                  # vector to store the estimated means
for(i in 1:10000)
{yi=sample(Yi,n)
Est.mean[i]=sum(yi)/(n*M) }
mean(Est.mean)
var(Est.mean)

Q. Perform the Simulation Study using the following data for Unequal Cluster Sizes.
Answer:
#----------Defining Clusters------------
Clu1<-c(125,115,129,134)
Clu2<-c(134,125,142,141,131,151,164,139,141)
Clu3<-c(144,143,122,134,126,157)
Clu4<-c(114,111,134,131,146,152,131)
Clu5<-c(119,126,122,129,130)
Clu6<-c(140,125,124,124,115,111,148,157,143,151)
#----Population mean----
pop<-c(Clu1,Clu2,Clu3,Clu4,Clu5,Clu6)
pop.mean=mean(pop)
#----Sum of clusters----
sumc1<-sum(Clu1)
sumc2<-sum(Clu2)
sumc3<-sum(Clu3)
sumc4<-sum(Clu4)
sumc5<-sum(Clu5)
sumc6<-sum(Clu6)
#----Grand total----
n=3;N=6;M=c(4,9,6,7,5,10);
Yi=c(sumc1,sumc2,sumc3,sumc4,sumc5,sumc6)
t.sum=sum(Yi)
#----With mean of cluster means----
m.clu.mean=c()                # vector to store the simulated means
for(i in 1:10000)
{j=sample(1:6,n)
yi=Yi[j];mi=M[j]
clu.mean=yi/mi
m.clu.mean[i]=mean(clu.mean) }
mean(m.clu.mean)
var(m.clu.mean)

Weighted Mean

For a sample such as

Clus-3 144 143 122 134 126 157
Clus-4 114 111 134 131 146 152 131
Clus-5 119 126 122 129 130

the weighted mean is
$$\bar{y}_w = \frac{\sum_{i=1}^{n} M_i \bar{y}_i}{n\bar{M}}$$

#----Mean of clusters----
mc1<-mean(Clu1)
mc2<-mean(Clu2)
mc3<-mean(Clu3)
mc4<-mean(Clu4)
mc5<-mean(Clu5)
mc6<-mean(Clu6)
mYi=c(mc1,mc2,mc3,mc4,mc5,mc6)
#----With weighted mean----
w.m=c(); Mbar=mean(M);
for(i in 1:10000)
{j=sample(1:6,n)
myi=mYi[j];mi=M[j];
w.m[i]=sum(mi*myi)/(n*Mbar) }
mean(w.m)
var(w.m)

Unequal Probability Sampling

Introduction:
• Hansen and Hurwitz (1943) were perhaps the first to discuss the theory of unequal
probability sampling.
• The most commonly used scheme is probability proportional to size (PPS).
• Example: sampling with respect to department size.
• Example: sampling with respect to number of trees.

• PPS With Replacement
• The estimator of the population total Y suggested by Hansen and Hurwitz (1943) is

$$\hat{y}_{HH} = \frac{1}{n}\sum_{i=1}^{n}\frac{y_i}{p_i}$$

• Computing HH Estimator

Department   Faculty size   P_i     Numbers
1            32             32/90   1-32
2            10             10/90   33-42
3            21             21/90   43-63
4            16             16/90   64-79
5            11             11/90   80-90
Total        90

Q. Prove that the Hansen-Hurwitz estimator is unbiased for the population total. Also find its
variance expression.

Answer:
The estimator of the population total Y suggested by Hansen and Hurwitz (1943) is

$$\hat{y}_{HH} = \frac{1}{n}\sum_{i=1}^{n}\frac{y_i}{p_i}$$

$$E(\hat{y}_{HH}) = \frac{1}{n}\sum_{i=1}^{n} E\left(\frac{y_i}{p_i}\right)$$

$$E\left(\frac{y_i}{p_i}\right) = \sum_{i=1}^{N}\frac{Y_i}{P_i}P_i = Y$$

so that $E(\hat{y}_{HH}) = \frac{1}{n}\cdot nY = Y$.

Variance of Hansen-Hurwitz Estimator

$$V\left(\frac{y_i}{p_i}\right) = \sum_{i=1}^{N} P_i\left(\frac{Y_i}{P_i} - Y\right)^2$$

$$V(\hat{y}_{HH}) = \frac{1}{n}\sum_{i=1}^{N} P_i\left(\frac{Y_i}{P_i} - Y\right)^2$$

• Another Form of Variance

$$Var(\hat{y}_{HH}) = \frac{1}{n}\left[\sum_{i=1}^{N}\frac{Y_i^2}{P_i} - Y^2\right]$$
Q. Find all possible samples of size two using the following data and compute the Hansen-
Hurwitz estimator for each. Also find the mean and variance of the Hansen-Hurwitz estimator.

Yi   0.5   1.2   2.1   3.2
Zi   1     2     3     4

Answer:
• The following is a population of four values with their respective sizes.

Yi    Zi   Pi = Zi / ΣZi
0.5   1    0.1
1.2   2    0.2
2.1   3    0.3
3.2   4    0.4

• We are interested in all possible samples of size 2, drawn with replacement.
• The HH estimator is calculated for every sample. Its mean and variance are then
obtained.
• All Possible Samples

[Table: all possible samples of size 2 with their HH estimates, "Small Population Example"; not recoverable from the source]

• The expected value of the HH estimator is

$$E(\hat{y}_{HH}) = \sum \hat{y}_{HH}\, p_i p_j = 7 = Y$$

$$Var(\hat{y}_{PPS}) = E(\hat{y}_{PPS}^2) - Y^2$$

$$Var(\hat{y}_{PPS}) = 49.5 - 49 = 0.5$$

• Using the Formula

$$Var(\hat{y}_{PPS}) = \frac{1}{n}\left[\sum_{i=1}^{N}\frac{Y_i^2}{P_i} - Y^2\right] = \frac{1}{2}\left[50 - 49\right] = 0.50$$
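Since the table of all possible samples did not survive in this copy, the enumeration can be reproduced in R. The sketch below assumes the four-unit population above with n = 2 draws made with replacement; idx, y.hh and prob are illustrative names.

```r
Y <- c(0.5, 1.2, 2.1, 3.2)    # study variable
Z <- c(1, 2, 3, 4)            # sizes
P <- Z / sum(Z)               # selection probabilities 0.1, 0.2, 0.3, 0.4
n <- 2

# All 16 ordered samples of size 2 drawn with replacement
idx <- expand.grid(first = 1:4, second = 1:4)

# HH estimate and probability of every ordered sample
y.hh <- (Y[idx$first] / P[idx$first] + Y[idx$second] / P[idx$second]) / n
prob <- P[idx$first] * P[idx$second]

E.hh  <- sum(y.hh * prob)              # expected value of the estimator
E2.hh <- sum(y.hh^2 * prob)            # expected value of its square
V.hh  <- E2.hh - E.hh^2                # variance by enumeration

# Variance from the closed form (1/n) [ sum(Y_i^2 / P_i) - Y^2 ]
V.formula <- (sum(Y^2 / P) - sum(Y)^2) / n

print(cbind(idx, y.hh, prob))
print(c(E.hh = E.hh, E2.hh = E2.hh, V.hh = V.hh, V.formula = V.formula))
```

The enumeration reproduces the values quoted above: an expected value of 7, a second moment of 49.5 and a variance of 0.5.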
Q. Describe Lahiri’s method of selection.
Answer:
• Lahiri’s Method of Selection
• A pair of random numbers is chosen: one (say i) from 1 to N and the other (say R) from 1 to
Zmax, the largest size in the population.
• If R exceeds the size Zi of the ith unit, the unit is rejected and a new pair is drawn; otherwise
the ith unit is accepted.
• The probability of selecting the ith unit at the first draw is

$$P(i\text{th}) = \frac{1}{N}\left(\frac{Z_i}{Z_{max}}\right)$$

Sr.No   Yi    Zi
1       0.5   15
2       1.2   20
3       2.1   7
4       3.2   13
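The acceptance and rejection rule described above can be sketched in R as follows. The function name lahiri.draw, the seed and the number of repetitions are assumptions made for illustration; the sizes Z are the four values in the table.

```r
# One draw by Lahiri's method: repeat until a unit is accepted
lahiri.draw <- function(Z) {
  N <- length(Z); Zmax <- max(Z)
  repeat {
    i <- sample(1:N, 1)       # candidate unit, chosen with equal probability
    R <- sample(1:Zmax, 1)    # random number between 1 and Zmax
    if (R <= Z[i]) return(i)  # accept unit i, otherwise draw a new pair
  }
}

Z <- c(15, 20, 7, 13)         # sizes from the table above

set.seed(1)                   # arbitrary seed, only for reproducibility
draws <- replicate(20000, lahiri.draw(Z))

# Empirical selection frequencies next to the PPS probabilities Z_i / sum(Z)
freq <- tabulate(draws, nbins = length(Z)) / length(draws)
print(round(rbind(empirical = freq, pps = Z / sum(Z)), 3))
```

With enough repetitions the empirical frequencies should settle close to Zi/ΣZi, which is why the method yields a PPS selection without needing cumulative totals.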
Q. Example for Lahiri’s Method

• Selection with Lahiri’s Method
• The Estimated Total
• The estimated value of the population total using Lahiri’s method of selection is

$$\hat{y}_{PPS} = \frac{1}{n}\sum_{i=1}^{5}\frac{y_i}{p_i} = \frac{144179.05}{5} \approx 28836$$

Q. Explain the concept of Unequal Probability Sampling Without Replacement.

Answer:
• Let a sample of two units be selected from a population of N units.
• Let the probability of selecting the ith unit at a draw be $P_i = Z_i/Z$.
• Suppose the ith unit is not selected at the first draw but the jth unit is selected (j ≠ i);
the probability of selecting the jth unit at the first draw is $P_j = Z_j/Z$.
• The conditional probability of selecting the ith unit at the second draw is then $\frac{P_i}{1-P_j}$.
• The probability that the ith unit enters the sample at the second draw is the sum, over all
j ≠ i, of the probability that the jth unit is selected at the first draw times the probability that the
ith unit is selected at the second draw given the jth unit was selected at the first, i.e.

$$\sum_{j \neq i}^{N} P_j \frac{P_i}{1-P_j}$$

• The total probability $\pi_i$, the probability of inclusion of the ith population unit in the
sample, is

$$\pi_i = P_i + \sum_{j \neq i}^{N} P_j \frac{P_i}{1-P_j}$$

$$\pi_i = P_i\left[1 + \sum_{j=1}^{N}\frac{P_j}{1-P_j} - \frac{P_i}{1-P_i}\right]$$

• The probability that both the ith and jth units are in the sample is denoted by $\pi_{ij}$ and is
defined as

$$\pi_{ij} = P_i P_{j|i} + P_j P_{i|j} = P_i\frac{P_j}{1-P_i} + P_j\frac{P_i}{1-P_j}$$

$$\pi_{ij} = P_i P_j\left[\frac{1}{1-P_i} + \frac{1}{1-P_j}\right]$$

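Q. What is the Horvitz-Thompson estimator?

Answer:
• HT Estimator

• The general theory of unequal probability sampling without replacement was first presented by Horvitz and Thompson (1952).

• The unbiased estimator they suggested for the population total Y is
\[ y'_{HT} = \sum_{i=1}^{n}\frac{y_i}{\pi_i} = \sum_{i=1}^{N} a_i\,\frac{Y_i}{\pi_i}, \]
where \(a_i = 1\) if the ith population unit is in the sample and \(a_i = 0\) otherwise, so that \(E(a_i) = \pi_i\).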

Q. Prove that the HT estimator is an unbiased estimator.

• The estimator of the population total suggested by Horvitz and Thompson (1952) is
\[ y'_{HT} = \sum_{i=1}^{n}\frac{y_i}{\pi_i} = \sum_{i=1}^{N} a_i\,\frac{Y_i}{\pi_i}, \]
where \(a_i\) indicates whether the ith population unit is in the sample. Since \(E(a_i) = \pi_i\),
\[ E(y'_{HT}) = \sum_{i=1}^{N} \pi_i\,\frac{Y_i}{\pi_i} = \sum_{i=1}^{N} Y_i = Y \]

Q.81 What are some relations of \(\pi_i\)?

Answer:
• Relations of \(\pi_i\)

• For a sample of fixed size n, the following relations hold:

i.   \( \sum_{i=1}^{N} \pi_i = n \)
ii.  \( \sum_{j \ne i}^{N} \pi_{ij} = (n-1)\,\pi_i \)
iii. \( \sum_{j \ne i}^{N} (\pi_{ij} - \pi_i\pi_j) = -\pi_i(1-\pi_i) \)
iv.  \( \sum_{i=1}^{N}\sum_{j \ne i}^{N} \pi_i\pi_j = n\sum_{i=1}^{N}\pi_i - \sum_{i=1}^{N}\pi_i^2 \)

• Relation (i)
  • We know that \( \sum_{i=1}^{N} a_i = n \), where \(a_i\) indicates whether the ith unit is in the sample.
  • Taking expectation, \( \sum_{i=1}^{N} \pi_i = n \).

• Relation (ii)
\[ \sum_{j \ne i}^{N} \pi_{ij} = (n-1)\,\pi_i \]
  • \(\pi_{ij}\) is the sum of the probabilities P(s) of all samples containing both the ith and jth units.
  • For i = 1, \(\sum_{j \ne 1}\pi_{1j}\) is the sum of the probabilities of the samples containing the first and second units, the first and third units, and so on.
  • Thus every P(s) containing the first unit occurs (n-1) times in this sum, because the sample has (n-1) other members and P(s) occurs once for each of them. Hence
\[ \sum_{j \ne i}^{N}\pi_{ij} = (n-1)\,\pi_i \]

• Relation (iii)
Taking the L.H.S.,
\[ \sum_{j \ne i}^{N}(\pi_{ij} - \pi_i\pi_j) = \sum_{j \ne i}^{N}\pi_{ij} - \pi_i\sum_{j \ne i}^{N}\pi_j = \sum_{j \ne i}^{N}\pi_{ij} - \pi_i\Big(\sum_{j=1}^{N}\pi_j - \pi_i\Big) \]
Using relations (i) and (ii),
\[ \sum_{j \ne i}^{N}(\pi_{ij} - \pi_i\pi_j) = (n-1)\pi_i - \pi_i(n - \pi_i) = -\pi_i(1-\pi_i) \]

• Relation (iv)
\[ \sum_{i=1}^{N}\sum_{j \ne i}^{N}\pi_i\pi_j = \sum_{i=1}^{N}\pi_i\sum_{j \ne i}^{N}\pi_j = \sum_{i=1}^{N}\pi_i(n - \pi_i) = n\sum_{i=1}^{N}\pi_i - \sum_{i=1}^{N}\pi_i^2 \]
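
The four relations can be verified numerically for a fixed sample size. This is a minimal sketch under stated assumptions: n = 2, draw-by-draw selection without replacement, and the illustrative probabilities P = (0.1, 0.2, 0.3, 0.4); all variable names are hypothetical.

```r
P <- c(0.1, 0.2, 0.3, 0.4); n <- 2

# First- and second-order inclusion probabilities for n = 2
incl1 <- P * (1 + sum(P / (1 - P)) - P / (1 - P))
incl2 <- outer(P, P) * outer(1 / (1 - P), 1 / (1 - P), "+")
diag(incl2) <- 0

# Products pi_i * pi_j for i != j
cross <- outer(incl1, incl1)
diag(cross) <- 0

rel1 <- sum(incl1)                                      # (i):   equals n
rel2 <- rowSums(incl2) - (n - 1) * incl1                # (ii):  zero for every i
rel3 <- rowSums(incl2 - cross) + incl1 * (1 - incl1)    # (iii): zero for every i
rel4 <- sum(cross) - (n * sum(incl1) - sum(incl1^2))    # (iv):  zero
```

Each of `rel2`, `rel3` and `rel4` is written as "left side minus right side", so values at rounding-error level confirm the relation.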

Q.82 Find the variance expression of the Horvitz-Thompson estimator.

Answer:
• HT Estimator

• The estimator of the population total suggested by Horvitz and Thompson (1952) is
\[ y'_{HT} = \sum_{i=1}^{n}\frac{y_i}{\pi_i} = \sum_{i=1}^{N} a_i\,\frac{Y_i}{\pi_i}, \qquad E(y'_{HT}) = \sum_{i=1}^{N}\pi_i\,\frac{Y_i}{\pi_i} = \sum_{i=1}^{N} Y_i = Y \]

• Variance of HT Estimator
\[ Var(y'_{HT}) = E\big(y'^{\,2}_{HT}\big) - \big[E(y'_{HT})\big]^2 = E\Big[\sum_{i=1}^{N} a_i\frac{Y_i}{\pi_i}\Big]^2 - \Big[\sum_{i=1}^{N} E(a_i)\frac{Y_i}{\pi_i}\Big]^2 \]
Expanding both squares,
\[ E\Big[\sum_{i=1}^{N} a_i\frac{Y_i}{\pi_i}\Big]^2 = \sum_{i=1}^{N} E(a_i^2)\frac{Y_i^2}{\pi_i^2} + \sum_{i=1}^{N}\sum_{j \ne i}^{N} E(a_ia_j)\frac{Y_iY_j}{\pi_i\pi_j} \]
\[ \Big[\sum_{i=1}^{N} E(a_i)\frac{Y_i}{\pi_i}\Big]^2 = \sum_{i=1}^{N} \big[E(a_i)\big]^2\frac{Y_i^2}{\pi_i^2} + \sum_{i=1}^{N}\sum_{j \ne i}^{N} E(a_i)E(a_j)\frac{Y_iY_j}{\pi_i\pi_j} \]
so that
\[ Var(y'_{HT}) = \sum_{i=1}^{N}\frac{Y_i^2}{\pi_i^2}\Big\{E(a_i^2) - \big[E(a_i)\big]^2\Big\} + \sum_{i=1}^{N}\sum_{j \ne i}^{N}\frac{Y_iY_j}{\pi_i\pi_j}\Big\{E(a_ia_j) - E(a_i)E(a_j)\Big\} \]
\[ Var(y'_{HT}) = \sum_{i=1}^{N}\frac{Y_i^2}{\pi_i^2}\,Var(a_i) + \sum_{i=1}^{N}\sum_{j \ne i}^{N}\frac{Y_iY_j}{\pi_i\pi_j}\,Cov(a_i,a_j) \]
Since \(a_i\) is an indicator with \(E(a_i) = \pi_i\) and \(E(a_ia_j) = \pi_{ij}\), we have \(Var(a_i) = \pi_i(1-\pi_i)\) and \(Cov(a_i,a_j) = \pi_{ij} - \pi_i\pi_j\). Therefore
\[ Var(y'_{HT}) = \sum_{i=1}^{N}\frac{Y_i^2}{\pi_i^2}\,\pi_i(1-\pi_i) + \sum_{i=1}^{N}\sum_{j \ne i}^{N}(\pi_{ij} - \pi_i\pi_j)\frac{Y_iY_j}{\pi_i\pi_j} \]

• Alternative (SYG) form for fixed sample size
Using \( \sum_{j \ne i}(\pi_i\pi_j - \pi_{ij}) = \pi_i(1-\pi_i) \),
\[ Var(y'_{HT}) = \sum_{i=1}^{N}\sum_{j \ne i}^{N}(\pi_i\pi_j - \pi_{ij})\frac{Y_i^2}{\pi_i^2} - \sum_{i=1}^{N}\sum_{j \ne i}^{N}(\pi_i\pi_j - \pi_{ij})\frac{Y_iY_j}{\pi_i\pi_j} \]
By symmetry in i and j,
\[ Var(y'_{HT}) = \frac{1}{2}\sum_{i=1}^{N}\sum_{j \ne i}^{N}(\pi_i\pi_j - \pi_{ij})\Big(\frac{Y_i^2}{\pi_i^2} + \frac{Y_j^2}{\pi_j^2}\Big) - \sum_{i=1}^{N}\sum_{j \ne i}^{N}(\pi_i\pi_j - \pi_{ij})\frac{Y_iY_j}{\pi_i\pi_j} \]
\[ = \frac{1}{2}\sum_{i=1}^{N}\sum_{j \ne i}^{N}(\pi_i\pi_j - \pi_{ij})\Big(\frac{Y_i^2}{\pi_i^2} + \frac{Y_j^2}{\pi_j^2} - 2\frac{Y_iY_j}{\pi_i\pi_j}\Big) \]
\[ Var_{SYG}(y'_{HT}) = \frac{1}{2}\sum_{i=1}^{N}\sum_{j \ne i}^{N}(\pi_i\pi_j - \pi_{ij})\Big(\frac{Y_i}{\pi_i} - \frac{Y_j}{\pi_j}\Big)^2 \]

Q. Calculate the HT estimator for all possible samples of size 2 drawn without replacement. Further, find its mean and variance.

Ans:

Small Population Example

• The following is a population of four units with their respective sizes.

Unit No.   1      2      3      4
Yi         0.5    1.2    2.1    3.2
Zi         1      2      3      4
Pi         0.1    0.2    0.3    0.4

• We take all possible samples of size 2.

• The HT estimator is calculated for every sample, and then its mean and variance are obtained.
\[ y'_{HT} = \sum_{i=1}^{n}\frac{y_i}{\pi_i}, \qquad P_i = \frac{Z_i}{\sum Z_i} \]
For n = 2,
\[ y'_{HT} = \frac{y_1}{\pi_1} + \frac{y_2}{\pi_2} \]

• Inclusion probabilities
\[ \pi_i = P_i\left[1 + \sum_{j=1}^{N}\frac{P_j}{1 - P_j} - \frac{P_i}{1 - P_i}\right], \qquad \sum_{j=1}^{N}\frac{P_j}{1 - P_j} = 1.456 \]
\[ \pi_1 = 0.1\,(1 + 1.456 - 0.111) = 0.2345 \]

Unit No.   1        2        3        4
Yi         0.5      1.2      2.1      3.2
Zi         1        2        3        4
Pi         0.1      0.2      0.3      0.4
πi         0.2345   0.4413   0.6083   0.7159

• Joint inclusion probabilities
\[ \pi_{ij} = P_iP_j\left[\frac{1}{1 - P_i} + \frac{1}{1 - P_j}\right] \]
\[ \pi_{12} = 0.1 \times 0.2\left[\frac{1}{0.9} + \frac{1}{0.8}\right] = 0.1 \times 0.2\,(1.111 + 1.25) = 0.0472 \]

• Mean and variance
\[ E(y'_{HT}) = 7.0000 \]
\[ Var(y'_{HT}) = 49.8229 - (7.0000)^2 = 0.8229 \]

Q.85 What is the HT estimator for with-replacement sampling?

Answer:
• Sample of Four Units

Unit No.   1      2      3      4
Yi         60     60     14     1
Pi         0.05   0.05   0.02   0.01

• Yi is the count of animals in each of the four sampled strips.

• HH Estimator
\[ y'_{HH} = \frac{1}{4}\left[\frac{60}{0.05} + \frac{60}{0.05} + \frac{14}{0.02} + \frac{1}{0.01}\right] = 800 \]

• HT Estimator
With n draws made with replacement, the inclusion probability of the ith unit is
\[ \pi_i = 1 - (1 - p_i)^n \]
and the estimator is summed over the \(\nu\) distinct units in the sample:
\[ y'_{HT} = \sum_{i=1}^{\nu}\frac{y_i}{\pi_i} \]
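
As a quick arithmetic check, the HH estimate and the with-replacement inclusion probabilities for these four draws can be computed as below. This is only a sketch with hypothetical variable names; the HT total is not evaluated because the source does not say which draws, if any, are repeats of the same unit.

```r
y <- c(60, 60, 14, 1)              # observed counts on the four draws
p <- c(0.05, 0.05, 0.02, 0.01)     # draw probabilities of the drawn units
n <- length(y)

# Hansen-Hurwitz estimate of the population total
hh <- mean(y / p)

# Inclusion probability of each drawn unit under n draws with replacement
incl <- 1 - (1 - p)^n

hh
round(incl, 4)
```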

Q. Perform PPS sampling with replacement in R using the following data.

Yi   0.5   1.2   2.1   3.2
Zi   1     2     3     4

Answer:
y <- c(0.5, 1.2, 2.1, 3.2)
zi <- c(1, 2, 3, 4)
z <- sum(zi)
pi <- zi/z                  # draw probabilities (this masks R's built-in constant pi)
N <- 4; n <- 2; means <- c()
#----With PPS-----
for (i in 1:10) {
  s <- sample(1:N, n, replace = TRUE, prob = pi)   # PPS draw with replacement
  means[i] <- mean(y[s])                           # mean of the sampled y values
}
mean(means)

Q. Select 10000 samples by PPS sampling with replacement in R using the following data. Find the mean of each sample. Further, find the mean of the means.

Yi   0.5   1.2   2.1   3.2
Zi   1     2     3     4

y <- c(0.5, 1.2, 2.1, 3.2)
zi <- c(1, 2, 3, 4)
z <- sum(zi)
pi <- zi/z
N <- 4; n <- 2
means <- c()                # must exist before it is filled inside the loop
#----With PPS-----
for (i in 1:10000) {
  s <- sample(1:N, n, replace = TRUE, prob = pi)
  means[i] <- mean(y[s])
}
mean(means)

Q. Select 10000 samples by PPS sampling with replacement in R using the following data. Find the HH estimator from each sample. Further, find the mean of the HH estimators.

Yi   0.5   1.2   2.1   3.2
Zi   1     2     3     4

Answer:
y <- c(0.5, 1.2, 2.1, 3.2)
zi <- c(1, 2, 3, 4); z <- sum(zi); pi <- zi/z
N <- 4; n <- 2
hh <- c()                   # must exist before it is filled inside the loop
for (i in 1:10000) {
  s <- sample(1:N, n, replace = TRUE, prob = pi)
  hh[i] <- mean(y[s]/pi[s])/N   # HH estimate of the total, divided by N to estimate the mean
}
mean(hh)

Example:

Answer:
• Population from R

• Read in the trees data set from R.

• Let the variable of interest y be tree volume, and let the draw-by-draw selection probability be proportional to girth.

# -----Reading the data-----


y <- trees$Volume
zi <- trees$Girth
y
10.3 10.3 10.2 16.4 18.8 19.7 15.6 18.2 22.6 19.9 24.2 21.0 21.4 21.3 19.1 22.2 33.8 27.4
25.7 24.9 34.5 31.7 36.3 38.3 42.6 55.4 55.7 58.3 51.5 51.0 77.0
zi
8.3 8.6 8.8 10.5 10.7 10.8 11.0 11.0 11.1 11.2 11.3 11.4 11.4 11.7 12.0 12.9 12.9 13.3
13.7 13.8 14.0 14.2 14.5 16.0 16.3 17.3 17.5 17.9 18.0 18.0 20.6

 Defining Terms

# -----Defining Population----
y <- trees$Volume
zi <- trees$Girth
pi<-zi/sum(zi)
pi
0.02020940 0.02093986 0.02142683 0.02556611 0.02605308 0.02629657 0.02678354
0.02678354 0.02702703 0.02727051 0.02751400 0.02775749 0.02775749 0.02848795
0.02921841 0.03140979 0.03140979 0.03238374 0.03335768 0.03360117 0.03408814
0.03457512 0.03530558 0.0389578 0.03968834 0.04212320 0.04261018 0.04358412
0.04382761 0.04382761 0.05015827

 Sampling With PPS

y <- trees$Volume
zi <- trees$Girth
pi<-zi/sum(zi)
N<-31;n=10;
#----Sampling with PPS----
s <- sample(1:N,n,replace=TRUE,prob=pi)
y[s]
mean(y[s])
mu <- mean(y)

 With SRS

y <- trees$Volume
zi <- trees$Girth
pi <- zi/sum(zi)
N <- 31; n <- 10
#----Sampling with SRS (equal probabilities)----
s <- sample(1:N, n, replace = TRUE, prob = NULL)
y[s]
mean(y[s])
mu <- mean(y)

 Simulation with PPS

y <- trees$Volume
zi <- trees$Girth
pi <- zi/sum(zi)
N <- 31; n <- 10
means <- c()                # must exist before it is filled inside the loop
#----With PPS-----
for (i in 1:10000) {
  s <- sample(1:N, n, replace = TRUE, prob = pi)
  means[i] <- mean(y[s])
}
mean(means)

 Simulation with SRSWR

y <- trees$Volume
zi <- trees$Girth
pi <- zi/sum(zi)
N <- 31; n <- 10
means <- c()                # must exist before it is filled inside the loop
#----SRSWR----
for (i in 1:10000) {
  s <- sample(1:N, n, replace = TRUE, prob = NULL)
  means[i] <- mean(y[s])
}
mean(means)

• Observations
• The mean of the population is 30.17.
• The mean of means with SRS is 30.25.
• The mean of means with PPS is 33.37.

Q. Read the data with the following commands and select 10000 samples by PPS sampling with replacement in R. Find the HH estimator from each sample. Further, find the mean of the HH estimators.
y <- trees$Volume
zi <- trees$Girth

Answer:
• Population from R

# -----Reading the data-----
y <- trees$Volume
zi <- trees$Girth
y
10.3 10.3 10.2 16.4 18.8 19.7 15.6 18.2 22.6 19.9 24.2 21.0 21.4 21.3 19.1 22.2 33.8 27.4 25.7
24.9 34.5 31.7 36.3 38.3 42.6 55.4 55.7 58.3 51.5 51.0 77.0
zi
8.3 8.6 8.8 10.5 10.7 10.8 11.0 11.0 11.1 11.2 11.3 11.4 11.4 11.7 12.0 12.9 12.9 13.3 13.7
13.8 14.0 14.2 14.5 16.0 16.3 17.3 17.5 17.9 18.0 18.0 20.6

y <- trees$Volume
zi <- trees$Girth
pi <- zi/sum(zi); N <- 31; n <- 10
hh <- c()                   # must exist before it is filled inside the loop
for (i in 1:10000) {
  s <- sample(1:N, n, replace = TRUE, prob = pi)
  hh[i] <- mean(y[s]/pi[s])/N   # HH estimate of the total, divided by N to estimate the mean
}
mean(hh)

Q. Select 10000 samples by PPS sampling with replacement in R using the following data. Find the HT estimator from each sample. Further, find the mean of the HT estimators.

Yi   0.5   1.2   2.1   3.2
Zi   1     2     3     4

Answer:
# -----Defining Population----
y <- c(0.5, 1.2, 2.1, 3.2)
zi <- c(1, 2, 3, 4)
z <- sum(zi)
pi <- zi/z; N <- 4; n <- 2
#---Calculation of inclusion probabilities----
pii <- 1 - (1 - pi)^n

#---Looping-----
ht <- c()                   # must exist before it is filled inside the loop
for (i in 1:10000) {
  s <- sample(1:N, n, replace = TRUE, prob = pi)
  yu <- unique(s)                     # distinct units in the sample
  ht[i] <- sum(y[yu]/pii[yu])/N       # HT estimate of the total, divided by N
}
mean(ht)

Q. Select 10000 samples by PPS sampling without replacement in R using the following data. Find the HT estimator from each sample. Further, find the mean of the HT estimators.

Yi   0.5   1.2   2.1   3.2
Zi   1     2     3     4

Answer:
# -----Defining Population----
y <- c(0.5, 1.2, 2.1, 3.2)
zi <- c(1, 2, 3, 4)
z <- sum(zi)
pi <- zi/z; N <- 4; n <- 2
#---Calculation of inclusion probabilities (n = 2, draw by draw)----
piinv <- 1 - pi
sm <- sum(pi/piinv)
pii <- pi*(1 + sm - pi/(1 - pi))

• HT Estimator

#---Looping-----
hht <- c()
for (i in 1:10000) {
  s <- sample(1:N, n, replace = FALSE, prob = pi)
  hht[i] <- sum(y[s]/pii[s])/N        # HT estimate of the total, divided by N
}
mean(hht)

Q. Read the data with the following commands and select 10000 samples by PPS sampling with replacement in R. Find the HT estimator from each sample. Further, find the mean of the HT estimators.
y <- trees$Volume
zi <- trees$Girth
Answer:
# -----Reading the data-----
y <- trees$Volume
zi <- trees$Girth
y
10.3 10.3 10.2 16.4 18.8 19.7 15.6 18.2 22.6 19.9 24.2 21.0 21.4 21.3 19.1 22.2 33.8 27.4 25.7
24.9 34.5 31.7 36.3 38.3 42.6 55.4 55.7 58.3 51.5 51.0 77.0
zi
8.3 8.6 8.8 10.5 10.7 10.8 11.0 11.0 11.1 11.2 11.3 11.4 11.4 11.7 12.0 12.9 12.9 13.3 13.7
13.8 14.0 14.2 14.5 16.0 16.3 17.3 17.5 17.9 18.0 18.0 20.6

# -----Defining Population----
y <- trees$Volume
zi <- trees$Girth
pi<-zi/sum(zi)
pi
0.02020940 0.02093986 0.02142683 0.02556611 0.02605308 0.02629657 0.02678354
0.02678354 0.02702703 0.02727051 0.02751400 0.02775749 0.02775749 0.02848795
0.02921841 0.03140979 0.03140979 0.03238374 0.03335768 0.03360117 0.03408814
0.03457512 0.03530558 0.0389578 0.03968834 0.04212320 0.04261018 0.04358412
0.04382761 0.04382761 0.05015827

y <- trees$Volume
zi <- trees$Girth
pi<-zi/sum(zi)
N<-31;n=10;
#---Calculation of Pi----
pii <- 1 - (1-pi)^n
pii
 Values of pii

> pii
0.1846714 0.1907295 0.1947457 0.2281662 0.2320147 0.2339325 0.2377552
0.2377552 0.2396601 0.2415607 0.2434571 0.2453491 0.2453491 0.2509998
0.2566124 0.2732237 0.2732237 0.2804987 0.2877080 0.2895002 0.2930723
0.2966283 0.3019321 0.3279149 0.3330058 0.3497258 0.3530242 0.3595758
0.3612043 0.3612043 0.4022598

#---Looping-----
hht <- c()
for (i in 1:10000) {
  s <- sample(1:N, n, replace = TRUE, prob = pi)
  yu <- unique(s)                     # distinct units in the sample
  hht[i] <- sum(y[yu]/pii[yu])/N      # HT estimate of the total, divided by N
}
mean(hht)

Q. Describe the Brewer selection procedure.

Answer:
• Brewer’s Procedure
• Brewer (1963a) suggested the following selection procedure for a sample of size 2.
• Select the first unit with probability proportional to
\[ \frac{P_i(1 - P_i)}{1 - 2P_i} \]
• Select the second unit from the remaining units with probability proportional to size.

Q. Describe Durbin’s procedure.

• Durbin’s Procedure

• Durbin (1967) suggested the following selection procedure for a sample of size 2.

• Select the first unit with probability proportional to size.

• Select the second unit with probability proportional to
\[ P_j\left[\frac{1}{1 - 2P_i} + \frac{1}{1 - 2P_j}\right] \]

Q. Describe Shehbaz-Hanif-Samiuddin’s procedure.

• They suggested the procedure in 2003.

• Select the first unit with probability proportional to
\[ \frac{P_i(1 - P_i)}{1 - 2P_i} \]
• Select the second unit with probability proportional to
\[ P_j\left[\frac{1}{1 - 2P_i} + \frac{1}{1 - 2P_j}\right] \]


Estimation with Auxiliary Variable

Introduction:
• Auxiliary Variable
• So far we have discussed the estimation of parameters on the basis of a single variable.
• A supporting (supplementary) variable may be used at the design stage or at the estimation stage.
• The supporting variable is used to enhance the efficiency of estimation.
• The supporting variable must be correlated with the main variable.
• The supplementary information is usually referred to as a benchmark variable or auxiliary variable.
• Graunt (1662) was the first to use auxiliary information, to estimate the population size of England.
• After Graunt (1662), Laplace was the first to use auxiliary information to estimate the population of France.

Examples of an auxiliary variable X and a study variable Y:
X = Cell Phones: 2016;      Y = Cell Phones: 2017
X = Working hours;          Y = Number of Items Processed
X = The size of unit i;     Y = The number of animals in unit i
X = Diameter of the tree;   Y = Volume of the tree

• Some Types of Relationships

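Q. Prove that the ratio estimator is almost unbiased for large sample size.
Answer:
• Ratio Estimator

• The ratio estimator is
\[ \hat{R} = \frac{\bar{y}}{\bar{x}} = \frac{\sum y_i}{\sum x_i} \]
• Ratio Estimator for Mean and Total
• The ratio estimator of the population mean is
\[ \bar{y}_r = \frac{\bar{y}}{\bar{x}}\,\bar{X} \]
• The ratio estimator of the population total is
\[ y'_r = \frac{\bar{y}}{\bar{x}}\,X \]
• Expectation of Ratio Estimator
• For a large sample size the expectation of the ratio estimator is approximately equal to the population ratio, i.e.
\[ E(\hat{R}) \approx R \]
Proof:
\[ \hat{R} - R = \frac{\bar{y}}{\bar{x}} - R = \frac{\bar{y} - R\bar{x}}{\bar{x}} \]
When n is large, \(\bar{x}\) in the denominator is close to \(\bar{X}\), so
\[ \hat{R} - R \approx \frac{\bar{y} - R\bar{x}}{\bar{X}} \]
\[ E(\hat{R} - R) \approx \frac{E(\bar{y}) - R\,E(\bar{x})}{\bar{X}} = \frac{\bar{Y} - R\bar{X}}{\bar{X}} = 0, \quad \text{since } R = \bar{Y}/\bar{X} \]
\[ E(\hat{R}) \approx R \]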

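Q. Find the variance of the ratio estimator.

Answer:
• Ratio Estimator

• The ratio estimator is
\[ \hat{R} = \frac{\bar{y}}{\bar{x}} = \frac{\sum y_i}{\sum x_i} \]
• Ratio Estimator for Mean and Total
• The ratio estimator of the population mean is
\[ \bar{y}_r = \frac{\bar{y}}{\bar{x}}\,\bar{X} \]
• The ratio estimator of the population total is
\[ y'_r = \frac{\bar{y}}{\bar{x}}\,X \]
• Variance of Ratio Estimator
For large n,
\[ \hat{R} - R \approx \frac{\bar{y} - R\bar{x}}{\bar{X}}, \qquad E(\hat{R} - R)^2 \approx \frac{1}{\bar{X}^2}\,E(\bar{y} - R\bar{x})^2 \]
Let \(d_i = y_i - Rx_i\), so that \(\bar{d} = \bar{y} - R\bar{x}\) with population mean \(\bar{D} = \bar{Y} - R\bar{X} = 0\). Then
\[ E(\hat{R} - R)^2 \approx \frac{1}{\bar{X}^2}\,V(\bar{d}) \]
\[ MSE(\hat{R}) = \frac{N-n}{Nn}\,\frac{1}{\bar{X}^2}\left[\frac{1}{N-1}\sum_{i=1}^{N}(Y_i - RX_i)^2\right] \]
with sample estimate
\[ mse(\hat{R}) = \frac{N-n}{Nn}\,\frac{1}{\bar{X}^2}\left[\frac{1}{n-1}\sum_{i=1}^{n}(y_i - \hat{R}x_i)^2\right] \]
• Alternative Expression of Variance
\[ MSE(\hat{R}) = \frac{N-n}{Nn}\,\frac{1}{\bar{X}^2}\,\frac{1}{N-1}\sum_{i=1}^{N}\big[(Y_i - \bar{Y}) - (RX_i - R\bar{X})\big]^2 \]
\[ MSE(\hat{R}) = \frac{N-n}{Nn}\,\frac{1}{\bar{X}^2}\big[S_Y^2 - 2RS_{YX} + R^2S_X^2\big] \]
• Expression of Variance for Ratio Estimator of Mean
\[ MSE(\bar{y}_r) = \bar{X}^2\,MSE(\hat{R}) = \frac{N-n}{Nn}\big[S_Y^2 - 2R\rho S_YS_X + R^2S_X^2\big] \]
\[ MSE(\bar{y}_r) = \theta\,\bar{Y}^2\big[C_Y^2 - 2\rho C_YC_X + C_X^2\big], \qquad \theta = \frac{N-n}{Nn} \]
• Confidence limits
\[ C.L.(R):\ \hat{R} \pm t\sqrt{mse(\hat{R})}, \qquad C.L.(\bar{Y}):\ \bar{y}_r \pm t\sqrt{mse(\bar{y}_r)} \]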
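Q. Find the approximate bias expression of the ratio estimator.
Answer:
• Ratio Estimator of Mean

• The ratio estimator is
\[ \hat{R} = \frac{\bar{y}}{\bar{x}}, \qquad \bar{y}_r = \frac{\bar{y}}{\bar{x}}\,\bar{X} \]
• Notations
\[ e_0 = \frac{\bar{y} - \bar{Y}}{\bar{Y}}, \qquad e_1 = \frac{\bar{x} - \bar{X}}{\bar{X}}, \qquad E(e_0) = E(e_1) = 0 \]
Using these notations, with \(\theta = (1-f)/n\),
\[ E(e_0^2) = \theta C_y^2, \qquad E(e_1^2) = \theta C_x^2, \qquad E(e_0e_1) = \theta C_{yx}, \]
where \(C_{yx} = \rho_{yx}C_yC_x\).
• Using Notations
\[ \bar{y} = \bar{Y}(1 + e_0), \qquad \bar{x} = \bar{X}(1 + e_1) \]
\[ \bar{y}_r = \frac{\bar{Y}(1 + e_0)}{\bar{X}(1 + e_1)}\,\bar{X} = \bar{Y}(1 + e_0)(1 + e_1)^{-1} \]
• Bias of Ratio Estimator
Expanding \((1 + e_1)^{-1}\) up to second-order terms,
\[ \bar{y}_r \approx \bar{Y}(1 + e_0)(1 - e_1 + e_1^2) \]
\[ \bar{y}_r - \bar{Y} \approx \bar{Y}\big(e_0 - e_1 - e_0e_1 + e_1^2\big) \]
Taking expectation,
\[ Bias(\bar{y}_r) = \frac{1-f}{n}\,\bar{Y}\big(C_x^2 - \rho_{yx}C_yC_x\big) \]
\[ Bias(\hat{R}) = \frac{1-f}{n}\,R\big(C_x^2 - \rho_{yx}C_yC_x\big) \]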

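Q. Compare the ratio estimator with the mean per unit estimator.

Answer:
• Ratio Estimator for Mean
• The ratio estimator of the population mean is
\[ \bar{y}_r = \frac{\bar{y}}{\bar{x}}\,\bar{X}, \qquad Var(\bar{y}_r) = \theta\,\bar{Y}^2\big[C_Y^2 - 2\rho C_YC_X + C_X^2\big] \]
• The variance of the mean per unit estimator is
\[ V(\bar{y}) = \theta\,\bar{Y}^2C_Y^2 \]
• Comparison
The ratio estimator is more precise than the mean per unit estimator if
\[ Var(\bar{y}) - Var(\bar{y}_r) > 0 \]
\[ \theta\bar{Y}^2C_Y^2 - \theta\bar{Y}^2\big[C_Y^2 - 2\rho C_YC_X + C_X^2\big] > 0 \]
\[ C_Y^2 - \big[C_Y^2 - 2\rho C_YC_X + C_X^2\big] > 0 \]
\[ 2\rho C_YC_X - C_X^2 > 0 \]
\[ \rho > \frac{1}{2}\,\frac{C_X}{C_Y} \]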
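
The efficiency condition can be checked on real data before choosing an estimator. The sketch below treats R's built-in `trees` data (Volume as y, Girth as x, as in this handout's other examples) as the population; the variable names are hypothetical.

```r
y <- trees$Volume
x <- trees$Girth

cy  <- sd(y) / mean(y)      # coefficient of variation of y
cx  <- sd(x) / mean(x)      # coefficient of variation of x
rho <- cor(y, x)            # correlation between y and x

# The ratio estimator beats the mean per unit estimator when rho > cx/(2*cy)
threshold    <- 0.5 * cx / cy
ratio_better <- rho > threshold

c(rho = rho, threshold = threshold)
ratio_better
```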
Q. Prove that the Hartley-Ross estimator is an unbiased ratio estimator:
\[ r_{HR} = \bar{r} + \frac{n(N-1)}{N(n-1)}\,\frac{(\bar{y} - \bar{r}\,\bar{x})}{\bar{X}} \]
Answer:
• Hartley-Ross Estimator

• The HR estimator is
\[ r_{HR} = \bar{r} + \frac{n(N-1)}{N(n-1)}\,\frac{(\bar{y} - \bar{r}\,\bar{x})}{\bar{X}}, \qquad \text{where } \bar{r} = \frac{1}{n}\sum_{i=1}^{n}\frac{y_i}{x_i} = \frac{1}{n}\sum_{i=1}^{n} r_i \]
• The approximate variance of the HR estimator is
\[ Var(r_{HR}) = \frac{1}{n\bar{X}^2}\big[S_Y^2 + \bar{R}^2S_X^2 - 2\bar{R}\rho S_YS_X\big] \]
• Proof of unbiasedness
Let \(R = \bar{Y}/\bar{X}\) and let \(\bar{R} = E(r_i)\) be the population mean of the \(r_i\). We know that
\[ E(\bar{r}) = E\Big(\sum_{i=1}^{n}\frac{r_i}{n}\Big) = E(r_i) = \frac{E(r_i)E(x_i)}{E(x_i)} = \frac{E(r_i)E(x_i)}{\bar{X}} \]
\[ E(\bar{r}) - R = \frac{E(r_i)E(x_i)}{\bar{X}} - \frac{\bar{Y}}{\bar{X}} = -\frac{1}{\bar{X}}\big[\bar{Y} - E(r_i)E(x_i)\big] \]
Since
\[ \bar{Y} = E(y_i) = E\Big(\frac{y_i}{x_i}\,x_i\Big) = E(r_ix_i), \]
\[ E(\bar{r}) - R = -\frac{1}{\bar{X}}\big[E(r_ix_i) - E(r_i)E(x_i)\big] = -\frac{1}{\bar{X}}\,Cov(r_i, x_i) \]
\[ E(\bar{r}) - R = -\frac{1}{\bar{X}}\,\frac{1}{N}\sum_{i=1}^{N}(r_i - \bar{R})(x_i - \bar{X}) = -\frac{1}{\bar{X}}\,\frac{N-1}{N}\left[\frac{1}{N-1}\sum_{i=1}^{N}(r_i - \bar{R})(x_i - \bar{X})\right] \]
\[ E(\bar{r}) - R = -\frac{1}{\bar{X}}\,\frac{N-1}{N}\,E(s_{rx}) \]
• Hartley-Ross Estimator
The sample covariance between \(r_i\) and \(x_i\) is
\[ s_{rx} = \frac{1}{n-1}\left[\sum_{i=1}^{n} r_ix_i - \frac{\sum_{i=1}^{n} r_i\sum_{i=1}^{n} x_i}{n}\right] = \frac{1}{n-1}\left[\sum_{i=1}^{n} r_ix_i - n\bar{r}\bar{x}\right] \]
\[ s_{rx} = \frac{1}{n-1}\left[\sum_{i=1}^{n} y_i - n\bar{r}\bar{x}\right] = \frac{n}{n-1}\,(\bar{y} - \bar{r}\bar{x}) \]
Therefore
\[ E(\bar{r}) - R = -\frac{1}{\bar{X}}\,\frac{N-1}{N}\,E\left[\frac{n}{n-1}(\bar{y} - \bar{r}\bar{x})\right] \]
\[ E\left[\bar{r} + \frac{1}{\bar{X}}\,\frac{N-1}{N}\,\frac{n}{n-1}(\bar{y} - \bar{r}\bar{x})\right] = R \]
• HR Unbiased Ratio Estimator
Since E(Estimator) = Parameter, the estimator
\[ r_{HR} = \bar{r} + \frac{n(N-1)}{N(n-1)}\,\frac{(\bar{y} - \bar{r}\,\bar{x})}{\bar{X}} \]
is unbiased for R.
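
Exact unbiasedness can be confirmed by enumerating every SRSWOR sample from a small population. This sketch uses the four-unit artificial population (Y = 5, 12, 15, 28 and X = 11, 21, 32, 14) from this handout's ratio-estimator example; the variable names are hypothetical.

```r
Y <- c(5, 12, 15, 28)
X <- c(11, 21, 32, 14)
N <- length(Y); n <- 2

R <- mean(Y) / mean(X)                 # population ratio, the target

# Hartley-Ross estimate for every possible sample of size n
r_hr <- combn(N, n, function(s) {
  rbar <- mean(Y[s] / X[s])            # mean of the individual ratios
  rbar + n * (N - 1) / (N * (n - 1)) *
    (mean(Y[s]) - rbar * mean(X[s])) / mean(X)
})

mean(r_hr)    # expectation over the equally likely samples
R             # population ratio for comparison
```

Under SRSWOR all six samples are equally likely, so the plain average of `r_hr` is the expectation of the estimator.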
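Q. Prove that the regression estimator is an unbiased estimator.
Ans:
\[ \bar{y}_{reg} = \bar{y} + \hat{\beta}_{yx}(\bar{X} - \bar{x}) \]
Treating \(\hat{\beta}_{yx}\) as a fixed constant,
\[ E(\bar{y}_{reg}) = E(\bar{y}) + \hat{\beta}_{yx}\,E(\bar{X} - \bar{x}) = \bar{Y} + \hat{\beta}_{yx}(\bar{X} - \bar{X}) \]
\[ E(\bar{y}_{reg}) = \bar{Y} \]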
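Q. Find the variance of the regression estimator.
Ans:
\[ \bar{y}_{reg} = \bar{y} + \hat{\beta}_{yx}(\bar{X} - \bar{x}) \]
Using \(\bar{y} = \bar{Y}(1 + e_0)\) and \(\bar{x} = \bar{X}(1 + e_1)\),
\[ \bar{y}_{reg} = \bar{Y}(1 + e_0) + \hat{\beta}_{yx}\big(\bar{X} - \bar{X}(1 + e_1)\big) = \bar{Y}(1 + e_0) - \hat{\beta}_{yx}\bar{X}e_1 \]
\[ \bar{y}_{reg} = \bar{Y} + \bar{Y}e_0 - \hat{\beta}_{yx}\bar{X}e_1 \]
\[ E(\bar{y}_{reg} - \bar{Y})^2 = E\big(\bar{Y}e_0 - \hat{\beta}_{yx}\bar{X}e_1\big)^2 \]
\[ V(\bar{y}_{reg}) = E(\bar{Y}e_0)^2 + E(\hat{\beta}_{yx}\bar{X}e_1)^2 - 2E\big(\bar{Y}e_0\,\hat{\beta}_{yx}\bar{X}e_1\big) \]
With \(\theta = (1-f)/n\), \(E(e_0^2) = \theta C_y^2\), \(E(e_1^2) = \theta C_x^2\) and \(E(e_0e_1) = \theta\rho C_yC_x\),
\[ V(\bar{y}_{reg}) = \theta\big[\bar{Y}^2C_y^2 + \hat{\beta}_{yx}^2\bar{X}^2C_x^2 - 2\bar{Y}\bar{X}\hat{\beta}_{yx}\,\rho C_yC_x\big] \]
Substituting \(\hat{\beta}_{yx} = \rho S_y/S_x\),
\[ V(\bar{y}_{reg}) = \theta\left[\bar{Y}^2C_y^2 + \rho^2\frac{S_y^2}{S_x^2}\bar{X}^2C_x^2 - 2\bar{Y}\bar{X}\rho\frac{S_y}{S_x}\,\rho C_yC_x\right] = \theta\big[S_y^2 + \rho^2S_y^2 - 2\rho^2S_y^2\big] \]
\[ V(\bar{y}_{reg}) = \theta S_y^2(1 - \rho^2) \]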
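
To see how the three first-order variance expressions compare, they can be evaluated side by side. This is a sketch under stated assumptions: R's built-in `trees` data is treated as the whole population (Volume as y, Girth as x), n = 10 is an arbitrary choice, and the variable names are hypothetical.

```r
y <- trees$Volume
x <- trees$Girth
N <- length(y); n <- 10

theta <- (N - n) / (N * n)            # (1 - f)/n
R     <- mean(y) / mean(x)            # population ratio
rho   <- cor(y, x)

# First-order (large-sample) variance expressions
v_mean  <- theta * var(y)                                     # mean per unit
v_ratio <- theta * (var(y) - 2 * R * cov(y, x) + R^2 * var(x)) # ratio
v_reg   <- theta * var(y) * (1 - rho^2)                       # regression

c(mean = v_mean, ratio = v_ratio, regression = v_reg)
```

To this order of approximation the regression estimator is never worse than the other two, because the difference between the ratio and regression expressions is the square \(\theta(\rho S_y - RS_x)^2\).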

Q. Find the expected value and MSE of the ratio estimator by taking all possible samples of size 2 from the following population.

Unit No.   1    2    3    4
Yi         5    12   15   28
Xi         11   21   32   14

Answer:
• Small Population Example
• The population above is an artificial population with four values.
• We take all possible samples of size 2.
• Calculate the expected value of the ratio estimator
\[ \bar{y}_r = \frac{\bar{y}}{\bar{x}}\,\bar{X}, \qquad \bar{X} = 19.5 \]

Sample   ȳ      x̄      ȳ/x̄    ȳr
1,2      8.5    16     0.53   10.36
1,3      10     21.5   0.47   9.07
1,4      16.5   12.5   1.32   25.74
2,3      13.5   26.5   0.51   9.93
2,4      20     17.5   1.14   22.29
3,4      21.5   23     0.93   18.23

\[ MSE(\bar{y}_r) = \sum_{i=1}^{6}(\bar{y}_{ri} - \bar{Y})^2P(s) = \frac{1}{6}\sum_{i=1}^{6}(\bar{y}_{ri} - 15)^2 \]
\[ MSE(\bar{y}_r) = 43.54 \]
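
The table and the MSE can be reproduced by enumerating the six samples in R. A minimal sketch with hypothetical variable names; every sample has probability 1/6 under SRSWOR.

```r
Y <- c(5, 12, 15, 28)
X <- c(11, 21, 32, 14)

# Ratio estimate of the mean for each of the choose(4, 2) samples
yr <- combn(4, 2, function(s) mean(Y[s]) / mean(X[s]) * mean(X))

e_yr   <- mean(yr)                   # expected value of the ratio estimator
mse_yr <- mean((yr - mean(Y))^2)     # MSE about the population mean of 15

round(yr, 2)
round(c(expected = e_yr, mse = mse_yr), 2)
```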

Q. Read the data with the following commands and select 10000 samples by simple random sampling without replacement. Find the ratio estimator from each sample. Further, find the mean of the ratio estimators and compare it with the mean per unit estimator.
y <- trees$Volume
zi <- trees$Girth
Answer:
• Example

• The ratio estimator is
\[ \bar{y}_r = \frac{\bar{y}}{\bar{x}}\,\bar{X} \]
• Defining Population

#----Defining Population
y <- trees$Volume
x <- trees$Girth
N <- 31; n <- 4
mux <- mean(x)                      # known population mean of x
m <- c(); r <- c(); mratio <- c()

• Relative Efficiency

#----repeating 10000 times
for (i in 1:10000) {
  s <- sample(1:N, n)               # SRSWOR
  m[i] <- mean(y[s])                # mean per unit estimator
  r[i] <- mean(y[s])/mean(x[s])     # estimated ratio
  mratio[i] <- r[i]*mux             # ratio estimator of the mean
}
mean(m); mean(mratio)
var(m); var(mratio)
re <- var(m)/var(mratio); re

Q. Generate a population of size 1000 from the bivariate normal distribution
\[ (Y, X) \sim N_2(\mu, \Sigma), \qquad \mu = (2,\ 2), \qquad \Sigma = \begin{pmatrix} 1 & 0.85 \\ 0.85 & 1 \end{pmatrix} \]
Consider the sample size n = 10. Select 10,000 random samples by SRSWOR and calculate
\[ t = \bar{y}\,\frac{\bar{X}}{\bar{x}} \]
Further, calculate the MSE of this estimator and its relative efficiency with respect to the mean per unit estimator (\(\bar{y}\)).

Answer:
library(mvtnorm)
N=1000;ryx=0.85; n=10;
m=c(2,2); # vector of mean.
# variance covariance matrix is given below.
sig=matrix(c(1,0.85,0.85,1),ncol=2);
r=rmvnorm(N,m,sig);
x=r[,2];y=r[,1];
data=data.frame(x,y);
mux <- mean(x)   # population mean of x, needed by the ratio estimator
plot(y,x)
• Simulation Study

r<-c();mratio<-c();m=c();
for(i in 1:10000)
{s<-sample(1:N,n)
m[i]<-mean(y[s])
r[i] <- mean(y[s])/mean(x[s])
mratio[i] <- r[i]*mux}
var(m);var(mratio)
re<-var(m)/var(mratio);re

Q. Find the bias of the product estimator.

Answer:
• Product Estimator
The ratio estimator is \(\bar{y}_r = (\bar{y}/\bar{x})\,\bar{X}\); the product estimator is
\[ \bar{y}_p = \frac{\bar{y}\,\bar{x}}{\bar{X}} \]
• Notations
\[ e_0 = \frac{\bar{y} - \bar{Y}}{\bar{Y}}, \qquad e_1 = \frac{\bar{x} - \bar{X}}{\bar{X}}, \qquad E(e_0) = E(e_1) = 0 \]
Using these notations, with \(\theta = (1-f)/n\),
\[ E(e_0^2) = \theta C_y^2, \qquad E(e_1^2) = \theta C_x^2, \qquad E(e_0e_1) = \theta C_{yx}, \]
where \(C_{yx} = \rho_{yx}C_yC_x\).
• Using Notations
\[ \bar{y} = \bar{Y}(1 + e_0), \qquad \bar{x} = \bar{X}(1 + e_1) \]
\[ \bar{y}_p = \frac{\bar{Y}(1 + e_0)\,\bar{X}(1 + e_1)}{\bar{X}} = \bar{Y}(1 + e_0)(1 + e_1) \]
• Bias of Product Estimator
\[ \bar{y}_p = \bar{Y}(1 + e_0 + e_1 + e_0e_1) \]
\[ \bar{y}_p - \bar{Y} = \bar{Y}(e_0 + e_1 + e_0e_1) \]
\[ Bias(\bar{y}_p) = E(\bar{y}_p - \bar{Y}) = \bar{Y}\,E(e_0 + e_1 + e_0e_1) = \bar{Y}\,E(e_0e_1) \]
\[ Bias(\bar{y}_p) = \theta\,\bar{Y}C_{yx} = \theta\,\bar{Y}\rho_{yx}C_yC_x \]
Q. Find the mean square error of the product estimator.
Answer:
• MSE of Product Estimator
\[ \bar{y}_p = \frac{\bar{y}\,\bar{x}}{\bar{X}}, \qquad e_0 = \frac{\bar{y} - \bar{Y}}{\bar{Y}}, \qquad e_1 = \frac{\bar{x} - \bar{X}}{\bar{X}} \]
Using these notations, with \(\theta = (1-f)/n\),
\[ E(e_0^2) = \theta C_y^2, \qquad E(e_1^2) = \theta C_x^2, \qquad E(e_0e_1) = \theta C_{yx}, \]
where \(C_{yx} = \rho_{yx}C_yC_x\).
• Using Notations
\[ \bar{y}_p = \frac{\bar{Y}(1 + e_0)\,\bar{X}(1 + e_1)}{\bar{X}} = \bar{Y}(1 + e_0)(1 + e_1) = \bar{Y}(1 + e_0 + e_1 + e_0e_1) \]
• MSE of Product Estimator
\[ \bar{y}_p - \bar{Y} = \bar{Y}(e_0 + e_1 + e_0e_1) \]
\[ MSE(\bar{y}_p) = E(\bar{y}_p - \bar{Y})^2 = \bar{Y}^2E(e_0 + e_1 + e_0e_1)^2 \]
Keeping terms up to the second order,
\[ MSE(\bar{y}_p) \approx \bar{Y}^2E(e_0 + e_1)^2 = \bar{Y}^2E\big(e_0^2 + e_1^2 + 2e_0e_1\big) \]
\[ MSE(\bar{y}_p) = \theta\,\bar{Y}^2\big(C_y^2 + C_x^2 + 2\rho C_yC_x\big) \]
For comparison, the ratio estimator has
\[ MSE(\bar{y}_r) = \theta\,\bar{Y}^2\big(C_y^2 + C_x^2 - 2\rho C_yC_x\big) \]

Q. Find the expected value and MSE of the product estimator by taking all possible samples of size 2 from the following population.

Unit No.   1    2    3    4
Yi         5    12   15   28
Xi         32   21   14   11

Answer:
• Small Population Example

• The population above is an artificial population with four values.

• We take all possible samples of size 2.

• Calculate the expected value of the product estimator
\[ \bar{y}_p = \frac{\bar{y}\,\bar{x}}{\bar{X}}, \qquad \bar{X} = 19.5 \]
• All Possible Samples and Expected Value

Sample   ȳ      x̄      ȳp
1,2      8.5    26.5   11.55
1,3      10     23     11.79
1,4      16.5   21.5   18.19
2,3      13.5   17.5   12.12
2,4      20     16     16.41
3,4      21.5   12.5   13.78

Population mean = 15, \( E(\bar{y}_p) = 13.97 \)

\[ MSE(\bar{y}_p) = \sum_{i=1}^{6}(\bar{y}_{pi} - \bar{Y})^2P(s) = \frac{1}{6}\sum_{i=1}^{6}(\bar{y}_{pi} - 15)^2 \]
\[ MSE(\bar{y}_p) = 7.36 \]
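
The product-estimator table can be reproduced the same way, by enumerating all six samples. A minimal sketch with hypothetical variable names; every sample has probability 1/6 under SRSWOR.

```r
Y <- c(5, 12, 15, 28)
X <- c(32, 21, 14, 11)

# Product estimate of the mean for each of the choose(4, 2) samples
yp <- combn(4, 2, function(s) mean(Y[s]) * mean(X[s]) / mean(X))

e_yp   <- mean(yp)                   # expected value of the product estimator
mse_yp <- mean((yp - mean(Y))^2)     # MSE about the population mean of 15

round(yp, 2)
round(c(expected = e_yp, mse = mse_yp), 2)
```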

Q.109 Generate a population of size 1000 from the bivariate normal distribution
\[ (Y, X) \sim N_2(\mu, \Sigma), \qquad \mu = (2,\ 2), \qquad \Sigma = \begin{pmatrix} 1 & -0.85 \\ -0.85 & 1 \end{pmatrix} \]
Consider the sample size n = 10. Select 10,000 random samples by SRSWOR and calculate
\[ t = \bar{y}\,\frac{\bar{x}}{\bar{X}} \]
Further, calculate the MSE of this estimator and its relative efficiency with respect to the mean per unit estimator (\(\bar{y}\)).

Answer:
• Defining Population

library(mvtnorm)
N=1000;ryx=-0.85; n=10;
m=c(2,2); # vector of mean.
# variance covariance matrix is given below.
sig=matrix(c(1,-0.85,-0.85,1),ncol=2);
r=rmvnorm(N,m,sig);
x=r[,2];y=r[,1];
data=data.frame(x,y);
mux <- mean(x)   # population mean of x, needed by the product estimator

p<-c();mp<-c();m=c();
for(i in 1:10000)
{s<-sample(1:N,n)
m[i]<-mean(y[s])
p[i] <- mean(y[s])*mean(x[s])
mp[i] <- p[i]/mux}
var(m);var(mp)
re<-var(m)/var(mp);re

Ratio Estimator in Stratified Sampling

Q. Find the expression for the bias of the combined type ratio estimator.

Answer:
• Combined Type Ratio Estimator in Stratified Sampling

• The combined type ratio estimator is
\[ \bar{y}_{rst} = \frac{\bar{y}_{st}}{\bar{x}_{st}}\,\bar{X} \]
• Notations
\[ e_{0st} = \frac{\bar{y}_{st} - \bar{Y}}{\bar{Y}}, \qquad e_{1st} = \frac{\bar{x}_{st} - \bar{X}}{\bar{X}} \]
Using these notations, with \(\theta_h = (1 - f_h)/n_h\),
\[ E(e_{0st}^2) = \sum_{h=1}^{k} W_h^2\theta_hC_{yh}^2, \qquad E(e_{1st}^2) = \sum_{h=1}^{k} W_h^2\theta_hC_{xh}^2, \qquad E(e_{0st}e_{1st}) = \sum_{h=1}^{k} W_h^2\theta_hC_{yxh}, \]
where \(C_{yxh} = \rho_{yxh}C_{yh}C_{xh}\).

• Bias of the Estimator
\[ \bar{y}_{st} = \bar{Y}(1 + e_{0st}), \qquad \bar{x}_{st} = \bar{X}(1 + e_{1st}) \]
\[ \bar{y}_{rst} = \frac{\bar{Y}(1 + e_{0st})}{\bar{X}(1 + e_{1st})}\,\bar{X} = \bar{Y}(1 + e_{0st})(1 + e_{1st})^{-1} \]
• Bias of Ratio Estimator
\[ \bar{y}_{rst} \approx \bar{Y}(1 + e_{0st})(1 - e_{1st} + e_{1st}^2) \]
\[ \bar{y}_{rst} - \bar{Y} \approx \bar{Y}\big(e_{0st} - e_{1st} - e_{0st}e_{1st} + e_{1st}^2\big) \]
Taking expectation, with \(E(e_{0st}) = E(e_{1st}) = 0\),
\[ Bias(\bar{y}_{rst}) = \bar{Y}\sum_{h=1}^{k} W_h^2\theta_h\big(C_{xh}^2 - \rho_{yxh}C_{yh}C_{xh}\big) \]
Q. Find the expression for the MSE of the combined type ratio estimator.
Answer:
• Combined Ratio Estimator in Stratified Sampling

• The combined type ratio estimator is
\[ \bar{y}_{rst} = \frac{\bar{y}_{st}}{\bar{x}_{st}}\,\bar{X} \]
• Notations
\[ e_{0st} = \frac{\bar{y}_{st} - \bar{Y}}{\bar{Y}}, \qquad e_{1st} = \frac{\bar{x}_{st} - \bar{X}}{\bar{X}} \]
Using these notations, with \(\theta_h = (1 - f_h)/n_h\),
\[ E(e_{0st}^2) = \sum_{h=1}^{k} W_h^2\theta_hC_{yh}^2, \qquad E(e_{1st}^2) = \sum_{h=1}^{k} W_h^2\theta_hC_{xh}^2, \qquad E(e_{0st}e_{1st}) = \sum_{h=1}^{k} W_h^2\theta_hC_{yxh}, \]
where \(C_{yxh} = \rho_{yxh}C_{yh}C_{xh}\).

• MSE of Combined Type Ratio Estimator
\[ \bar{y}_{st} = \bar{Y}(1 + e_{0st}), \qquad \bar{x}_{st} = \bar{X}(1 + e_{1st}) \]
\[ \bar{y}_{rst} = \frac{\bar{Y}(1 + e_{0st})}{\bar{X}(1 + e_{1st})}\,\bar{X} = \bar{Y}(1 + e_{0st})(1 + e_{1st})^{-1} \]
To the first order of approximation,
\[ \bar{y}_{rst} \approx \bar{Y}(1 + e_{0st} - e_{1st}), \qquad \bar{y}_{rst} - \bar{Y} \approx \bar{Y}(e_{0st} - e_{1st}) \]
\[ E(\bar{y}_{rst} - \bar{Y})^2 = \bar{Y}^2E(e_{0st} - e_{1st})^2 = \bar{Y}^2\big[E(e_{0st}^2) + E(e_{1st}^2) - 2E(e_{0st}e_{1st})\big] \]
\[ MSE(\bar{y}_{rst}) = \bar{Y}^2\sum_{h=1}^{k} W_h^2\theta_h\big(C_{yh}^2 + C_{xh}^2 - 2C_{yxh}\big) \]

• Example

• Consider a population of size 700 consisting of three strata with N1=100, N2=250 and N3=350. The required sample size is 18.

• The population consists of two variables, Y and X.

• The population means of the variables Y and X are 15 and 62.14, respectively.

• The sample sizes from stratum 1, stratum 2 and stratum 3 are arbitrarily fixed at 4, 8 and 6, respectively.

• The sample chosen from each stratum, as pairs (y,x), is

Stra-1   Stra-2   Stra-3
1,22     7,39     23,92
3,29     12,55    14,65
2,25     8,42     20,84
5,32     5,30     22,89
         11,51    24,96
         10,49    17,68
         9,45
         12,54

• Considering the Variable Y

        Stra-1   Stra-2   Stra-3
        1        7        23
        3        12       14
        2        8        20
        5        5        22
                 11       24
                 10       17
                 9
                 12
mean    2.75     9.25     20
Nh      100      250      350
Sh      1.708    2.493    3.847
nh      4        8        6

• Considering the Variable X

        Stra-1   Stra-2   Stra-3
        22       39       92
        29       55       65
        25       42       84
        32       30       89
                 51       96
                 49       68
                 45
                 54
mean    27       45.63    82.33
Nh      100      250      350
Sh      4.397    8.450    12.91
nh      4        8        6

• Sample Means

• Sample mean of variable Y
\[ \bar{y}_{st} = \sum_{h=1}^{k} W_h\bar{y}_h = \sum_{h=1}^{k} N_h\bar{y}_h/N = \frac{1}{N}\big(N_1\bar{y}_1 + N_2\bar{y}_2 + N_3\bar{y}_3\big) = 13.70 \]
• Sample mean of variable X
\[ \bar{x}_{st} = \sum_{h=1}^{k} W_h\bar{x}_h = \sum_{h=1}^{k} N_h\bar{x}_h/N = \frac{1}{N}\big(N_1\bar{x}_1 + N_2\bar{x}_2 + N_3\bar{x}_3\big) = 61.32 \]
• Combined Ratio Estimator

• The combined type ratio estimator is
\[ \bar{y}_{rst} = \frac{\bar{y}_{st}}{\bar{x}_{st}}\,\bar{X} = \frac{13.70}{61.32} \times 62.14 = 13.88 \]
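
The combined ratio estimate can be recomputed directly from the sample pairs. This is a sketch with hypothetical variable names; the data, stratum sizes and the population mean of X (62.14) are those stated in the example.

```r
# Sample (y, x) pairs by stratum
strata <- list(
  s1 = data.frame(y = c(1, 3, 2, 5), x = c(22, 29, 25, 32)),
  s2 = data.frame(y = c(7, 12, 8, 5, 11, 10, 9, 12),
                  x = c(39, 55, 42, 30, 51, 49, 45, 54)),
  s3 = data.frame(y = c(23, 14, 20, 22, 24, 17),
                  x = c(92, 65, 84, 89, 96, 68))
)
Nh   <- c(100, 250, 350)
W    <- Nh / sum(Nh)                 # stratum weights
Xbar <- 62.14                        # known population mean of X

ybar_h <- sapply(strata, function(d) mean(d$y))
xbar_h <- sapply(strata, function(d) mean(d$x))

y_st  <- sum(W * ybar_h)             # stratified mean of y
x_st  <- sum(W * xbar_h)             # stratified mean of x
y_rst <- y_st / x_st * Xbar          # combined type ratio estimator

round(c(y_st = y_st, x_st = x_st, y_rst = y_rst), 2)
```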
Q.115 Find the expression of Bias and MSE of Separate Type Ratio Estimator in stratified
sampling.
Answer:
 Separate Type Ratio Estimator

 The separate type ratio estimator is


k
yrsst  Wh yrh
h 1
k
yh
yrsst  Wh Xh
h 1 xh
 Bias of Ratio Estimator

Bias( yr ) 
1 f
Y  C x2   yxC yC x 
n
• In case of Stratified Sampling

Bias( yrh ) 
1  fh
Yh  C xh
2
  yxhC yhC xh 
nh
E ( yrh )  Yh 
1  fh
Yh  Cxh2   yxhC yhC xh 
nh
E ( yrh )  Yh 
1  fh
Yh  Cxh2   yxhC yhC xh 
nh
 Bias of Separate Type Ratio Estimator
k k
yrsst  Wh yrh  E  yrsst   Wh E  yrh 
h 1 h 1

Virtual University of Pakistan Page 158


STA632-Sampling Techniques

 1  fh 
Yh  Cxh   yxhC yhCxh  
k
E  yrsst   Wh  Yh  2

h 1  nh 
1  fh
Yh Cxh2   yxhC yhCxh 
k
E  yrsst   Y  Wh
h 1 nh
1  fh
Yh  Cxh2   yxhC yhCxh 
k
E  yrsst   Y  Wh
h 1 nh
 Bias of Separate Type Ratio Estimator

Bias  yrsst  

W  Y  C   yxhC yhCxh 
k
2
h h h xh
h 1
 Expression of MSE for Ratio Estimator of Mean

MSE ( yr ) 
Y 2  C y2  Cx2  2  yxC yCx 
MSE ( yrh ) 
hYh2  C yh2  Cxh2  2 yxhC yhCxh 
 MSE of Separate Type Ratio Estimator
\[ \bar{y}_{rsst} = \sum_{h=1}^{k} W_h \bar{y}_{rh} \;\Rightarrow\; MSE(\bar{y}_{rsst}) = \sum_{h=1}^{k} W_h^2\, MSE(\bar{y}_{rh}) \]
\[ MSE(\bar{y}_{rsst}) \approx \sum_{h=1}^{k} W_h^2 \theta_h \bar{Y}_h^2\left(C_{yh}^2 + C_{xh}^2 - 2\rho_{yxh} C_{yh} C_{xh}\right) \]
\[ MSE(\bar{y}_{rsst}) \approx \sum_{h=1}^{k} W_h^2 \theta_h \left[\frac{\sum_{i=1}^{N_h}\left(Y_{hi} - R_h X_{hi}\right)^2}{N_h - 1}\right] \]
 Bias and MSE of Separate Type Ratio Estimator

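\[ Bias(\bar{y}_{rsst}) \approx \sum_{h=1}^{k} W_h \theta_h \bar{Y}_h\left(C_{xh}^2 - \rho_{yxh} C_{yh} C_{xh}\right) \]
\[ MSE(\bar{y}_{rsst}) \approx \sum_{h=1}^{k} W_h^2 \theta_h \bar{Y}_h^2\left(C_{yh}^2 + C_{xh}^2 - 2\rho_{yxh} C_{yh} C_{xh}\right) \]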


Example

• Consider a population of size 700 consisting of three strata such that N1=100, N2=250 and N3=350. The required sample size is 18.

• The population consists of two variables, Y and X.

• The population means of the variables Y and X are 15 and 62.14, respectively.

• The sample sizes from stratum-1, stratum-2 and stratum-3 are arbitrarily decided as 4, 8 and 6, respectively.

• The population means of stratum-1, stratum-2 and stratum-3 for variable X are 25, 45 and 85, respectively.

• The sample from each stratum is chosen as (y,x)

Stra-1    Stra-2    Stra-3
1,22      7,39      23,92
3,29      12,55     14,65
2,25      8,42      20,84
5,32      5,30      22,89
          11,51     24,96
          10,49     17,68
          9,45
          12,54

 Considering the Variable Y

          Stra-1   Stra-2   Stra-3
mean      2.75     9.25     20
Nh        100      250      350
Sh        1.708    2.493    3.847
nh        4        8        6

• Considering the Variable X

          Stra-1   Stra-2   Stra-3
mean      27       45.63    82.33
Nh        100      250      350
Sh        4.397    8.450    12.91
nh        4        8        6

• Sample Means

\[ \bar{y}_{st} = \sum_{h=1}^{k} W_h \bar{y}_h = \frac{1}{N}\left(N_1\bar{y}_1 + N_2\bar{y}_2 + N_3\bar{y}_3\right) \]
\[ \bar{x}_{st} = \sum_{h=1}^{k} W_h \bar{x}_h = \frac{1}{N}\left(N_1\bar{x}_1 + N_2\bar{x}_2 + N_3\bar{x}_3\right) \]

• Separate Type Ratio Estimator

\[ \bar{y}_{rsst} = \sum_{h=1}^{3} W_h \frac{\bar{y}_h}{\bar{x}_h}\,\bar{X}_h = W_1\bar{y}_{r1} + W_2\bar{y}_{r2} + W_3\bar{y}_{r3}, \qquad \bar{y}_{rh} = \frac{\bar{y}_h}{\bar{x}_h}\,\bar{X}_h \]

The weights are W1=0.142857, W2=0.357143 and W3=0.5, and the stratum ratio estimates are

\[ \bar{y}_{r1} = \frac{2.75}{27}(25) = 2.55, \qquad \bar{y}_{r2} = \frac{9.25}{45.63}(45) = 9.12, \qquad \bar{y}_{r3} = \frac{20}{82.33}(85) = 20.65 \]
\[ \bar{y}_{rsst} = 0.143(2.55) + 0.357(9.12) + 0.5(20.65) \approx 13.95 \]

Q. Find the variance expression of combined Type Regression Estimator?


Answer:
 Regression Estimator

 The Regression Estimator without stratification


\[ \bar{y}_{reg} = \bar{y} + \hat{\beta}_{yx}\left(\bar{X} - \bar{x}\right), \qquad \hat{\beta}_{yx} = \frac{S_{yx}}{S_x^2} \]
 Combined Type Regression Estimator

 The Regression Estimator under stratified sampling

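\[ \bar{y}_{regc} = \bar{y}_{st} + \hat{\beta}_{yx}\left(\bar{X} - \bar{x}_{st}\right) = \sum_{h=1}^{k} W_h \bar{y}_h + \hat{\beta}_{yx}\left(\bar{X} - \sum_{h=1}^{k} W_h \bar{x}_h\right) \]
• Expected Value of Estimator
• Taking expected value on both sides, with the slope treated as a fixed constant,

\[ E(\bar{y}_{regc}) = \sum_{h=1}^{k} W_h E(\bar{y}_h) + \hat{\beta}_{yx}\left(\bar{X} - \sum_{h=1}^{k} W_h E(\bar{x}_h)\right) \]
\[ E(\bar{y}_{regc}) = \sum_{h=1}^{k} W_h \bar{Y}_h + \hat{\beta}_{yx}\left(\bar{X} - \sum_{h=1}^{k} W_h \bar{X}_h\right) \]
• Expectation of Regression Estimator

\[ E(\bar{y}_{regc}) = \bar{Y} + \hat{\beta}_{yx}\left(\bar{X} - \bar{X}\right) = \bar{Y} \]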
 Variance of Regression Estimator

\[ \bar{y}_{regc} = \bar{y}_{st} + \hat{\beta}_{yx}\left(\bar{X} - \bar{x}_{st}\right) \]
\[ V(\bar{y}_{regc}) = E\left[\bar{y}_{regc} - \bar{Y}\right]^2 = E\left[\bar{y}_{st} + \hat{\beta}_{yx}\left(\bar{X} - \bar{x}_{st}\right) - \bar{Y}\right]^2 \]
\[ V(\bar{y}_{regc}) = E\left[\sum_{h=1}^{k} W_h \bar{y}_h + \hat{\beta}_{yx}\left(\bar{X} - \sum_{h=1}^{k} W_h \bar{x}_h\right) - \bar{Y}\right]^2 \]
\[ V(\bar{y}_{regc}) = E\left[\sum_{h=1}^{k} W_h\left(\bar{y}_h - \bar{Y}_h\right) - \hat{\beta}_{yx}\sum_{h=1}^{k} W_h\left(\bar{x}_h - \bar{X}_h\right)\right]^2 \]
• Variance of Combined Type Regression Estimator

Writing \beta_0 for the fixed value of the slope and \theta_h = (1-f_h)/n_h,

\[ V(\bar{y}_{regc}) = \sum_{h=1}^{k} W_h^2 \theta_h\left(S_{hy}^2 + \beta_0^2 S_{hx}^2 - 2\beta_0 S_{hxy}\right) \]

Q. Find the variance of Separate Type Regression Estimator?


Answer:
 Regression Estimator

 The Regression Estimator without stratification

\[ \bar{y}_{reg} = \bar{y} + \hat{\beta}_{yx}\left(\bar{X} - \bar{x}\right), \qquad \hat{\beta}_{yx} = \frac{S_{yx}}{S_x^2} \]
 Separate Type Regression Estimator

 The Combined Type Regression Estimator is

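\[ \bar{y}_{regc} = \bar{y}_{st} + \hat{\beta}_{yx}\left(\bar{X} - \bar{x}_{st}\right) \]

• The Separate Type Regression Estimator is

\[ \bar{y}_{regs} = \sum_{h=1}^{k} W_h\left[\bar{y}_h + \hat{\beta}_{yxh}\left(\bar{X}_h - \bar{x}_h\right)\right] \]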
 Expectation of Separate Type Regression Estimator

\[ E(\bar{y}_{regs}) = \sum_{h=1}^{k} W_h\left[E(\bar{y}_h) + \hat{\beta}_{yxh}\left(\bar{X}_h - E(\bar{x}_h)\right)\right] \]
\[ E(\bar{y}_{regs}) = \sum_{h=1}^{k} W_h\left[\bar{Y}_h + \hat{\beta}_{yxh}\left(\bar{X}_h - \bar{X}_h\right)\right] \]
\[ E(\bar{y}_{regs}) = \sum_{h=1}^{k} W_h \bar{Y}_h = \bar{Y} \]
 Variance of Regression Estimator
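\[ \bar{y}_{regs} = \sum_{h=1}^{k} W_h \bar{y}_h + \sum_{h=1}^{k} W_h \hat{\beta}_{yxh}\left(\bar{X}_h - \bar{x}_h\right) \]
\[ V(\bar{y}_{regs}) = E\left[\bar{y}_{regs} - \bar{Y}\right]^2 = E\left[\sum_{h=1}^{k} W_h \bar{y}_h + \sum_{h=1}^{k} W_h \hat{\beta}_{yxh}\left(\bar{X}_h - \bar{x}_h\right) - \bar{Y}\right]^2 \]
\[ V(\bar{y}_{regs}) = E\left[\sum_{h=1}^{k} W_h\left(\bar{y}_h - \bar{Y}_h\right) - \sum_{h=1}^{k} W_h \hat{\beta}_{yxh}\left(\bar{x}_h - \bar{X}_h\right)\right]^2 \]
Since the strata are sampled independently, the cross-stratum product terms vanish and
\[ V(\bar{y}_{regs}) = \sum_{h=1}^{k} W_h^2 E\left(\bar{y}_h - \bar{Y}_h\right)^2 + \sum_{h=1}^{k} W_h^2 \hat{\beta}_{yxh}^2 E\left(\bar{x}_h - \bar{X}_h\right)^2 - 2\sum_{h=1}^{k} W_h^2 \hat{\beta}_{yxh} E\left[\left(\bar{y}_h - \bar{Y}_h\right)\left(\bar{x}_h - \bar{X}_h\right)\right] \]
\[ V(\bar{y}_{regs}) = \sum_{h=1}^{k} W_h^2 \theta_h\left(S_{hy}^2 + \beta_{yxh}^2 S_{hx}^2 - 2\beta_{yxh} S_{hxy}\right) \]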

Q. Simulation Study for Combined Type Ratio Estimator.


Answer:
 Simulation Study

library(mvtnorm)
# Population and sample sizes.
N=300;N1=100;N2=100;N3=100;n1=15;n2=15;n3=15;n=45;
# Stratum weights.
w1=N1/N;
w2=N2/N;
w3=N3/N;
• Defining Parameters

# Mean vectors (y, x) for strata 1, 2 and 3.
m1=c(10,5);
m2=c(20,10);
m3=c(30,15)
# Variance-covariance matrices for strata 1, 2 and 3.
sig1=matrix(c(1,0.85,0.85,1),ncol=2);
sig2=matrix(c(1,0.80,0.80,1),ncol=2);
sig3=matrix(c(1,0.75,0.75,1),ncol=2);
• Generating Populations

r1=rmvnorm(N1,m1,sig1);
r2=rmvnorm(N2,m2,sig2);
r3=rmvnorm(N3,m3,sig3);
# Study variable y and auxiliary variable x in stratum 1.
y1=r1[,1];x1=r1[,2]
# Study variable y and auxiliary variable x in stratum 2.
y2=r2[,1];x2=r2[,2]
# Study variable y and auxiliary variable x in stratum 3.
y3=r3[,1];x3=r3[,2]
x<-c(x1,x2,x3)
• Looping

# Storage for the estimates from each replication.
yst<-numeric(5000);xst<-numeric(5000);rst<-numeric(5000)
for(i in 1:5000){
sa1=sample(1:N1,n1)
xs1=x1[sa1];ys1=y1[sa1];
sa2=sample(1:N2,n2)
xs2=x2[sa2];ys2=y2[sa2];
sa3=sample(1:N3,n3)
xs3=x3[sa3];ys3=y3[sa3];
yst[i]=w1*mean(ys1)+w2*mean(ys2)+w3*mean(ys3);
xst[i]=w1*mean(xs1)+w2*mean(xs2)+w3*mean(xs3);
rst[i]=yst[i]*(mean(x)/(xst[i]))
}
• Means and Variances

> mean(yst);mean(xst);mean(rst);
[1] 20.01764
[1] 10.02838
[1] 20.02706
> var(yst);var(rst);var(yst)/var(rst)

Q. Describe the simulation study for the combined type ratio estimator, and find the mean
and variance of the estimator using the following data

N=300;N1=100;N2=100;N3=100;n1=15;n2=15;n3=15;n=45;

# mean vectors for stratum 1, 2 and 3.


m1=c(10,5);
m2=c(20,10);
m3=c(30,15)
#variance covariance matrix for #stratum 1, 2 and 3 given below.
sig1=matrix(c(1,0.85,0.85,1),ncol=2)
sig2=matrix(c(1,0.80,0.80,1),ncol=2); sig3=matrix(c(1,0.75,0.75,1),ncol=2);

Ans.
library(mvtnorm)
# Population and sample sizes.
N=300;N1=100;N2=100;N3=100;n1=15;n2=15;n3=15;n=45;
# Stratum weights.
w1=N1/N;
w2=N2/N;
w3=N3/N;
# Mean vectors (y, x) for strata 1, 2 and 3.
m1=c(10,5);
m2=c(20,10);
m3=c(30,15)
# Variance-covariance matrices for strata 1, 2 and 3.
sig1=matrix(c(1,0.85,0.85,1),ncol=2)
sig2=matrix(c(1,0.80,0.80,1),ncol=2)
sig3=matrix(c(1,0.75,0.75,1),ncol=2)
# Generate the three stratum populations.
r1=rmvnorm(N1,m1,sig1);
r2=rmvnorm(N2,m2,sig2);
r3=rmvnorm(N3,m3,sig3);
# Study variable y and auxiliary variable x in stratum 1.
y1=r1[,1];x1=r1[,2]
# Study variable y and auxiliary variable x in stratum 2.
y2=r2[,1];x2=r2[,2]
# Study variable y and auxiliary variable x in stratum 3.
y3=r3[,1];x3=r3[,2]
x<-c(x1,x2,x3)
# Storage for the estimates from each replication.
yst<-numeric(5000);xst<-numeric(5000);rst<-numeric(5000)
for(i in 1:5000){
# Draw a simple random sample without replacement from each stratum.
sa1=sample(1:N1,n1)
xs1=x1[sa1];ys1=y1[sa1];
sa2=sample(1:N2,n2)
xs2=x2[sa2];ys2=y2[sa2];
sa3=sample(1:N3,n3)
xs3=x3[sa3];ys3=y3[sa3];
# Stratified means of y and x, then the combined ratio estimate.
yst[i]=w1*mean(ys1)+w2*mean(ys2)+w3*mean(ys3);
xst[i]=w1*mean(xs1)+w2*mean(xs2)+w3*mean(xs3);
rst[i]=yst[i]*(mean(x)/(xst[i]))
}
# Means and variances of the estimators.
mean(yst);mean(xst);
mean(rst);
var(yst);var(rst);
# Efficiency of the combined ratio estimator relative to the stratified mean.
var(yst)/var(rst)

Q. Describe the simulation study for the separate type ratio estimator, and find the mean
and variance of the estimator using the following data
library(mvtnorm)
#population and sample size.
N=300;N1=100;N2=100;N3=100;n1=15;n2=15;n3=15;n=45;
# mean vectors for stratum 1, 2 and 3.
m1=c(10,5);
m2=c(20,10);
m3=c(30,15)
#variance covariance matrix for #stratum 1, 2 and 3 given #below.
sig1=matrix(c(1,0.85,0.85,1),ncol=2);
sig2=matrix(c(1,0.80,0.80,1),ncol=2); sig3=matrix(c(1,0.75,0.75,1),ncol=2);

Ans:
library(mvtnorm)
# Population and sample sizes.
N=300;N1=100;N2=100;N3=100;n1=15;n2=15;n3=15;n=45;
# Stratum weights.
w1=N1/N;
w2=N2/N;
w3=N3/N;
# Mean vectors (y, x) for strata 1, 2 and 3.
m1=c(10,5);
m2=c(20,10);
m3=c(30,15)
# Variance-covariance matrices for strata 1, 2 and 3.
sig1=matrix(c(1,0.85,0.85,1),ncol=2);
sig2=matrix(c(1,0.80,0.80,1),ncol=2);
sig3=matrix(c(1,0.75,0.75,1),ncol=2);
# Generate the three stratum populations.
r1=rmvnorm(N1,m1,sig1);
r2=rmvnorm(N2,m2,sig2);
r3=rmvnorm(N3,m3,sig3);
# Study variable y and auxiliary variable x in stratum 1.
y1=r1[,1];x1=r1[,2]
# Study variable y and auxiliary variable x in stratum 2.
y2=r2[,1];x2=r2[,2]
# Study variable y and auxiliary variable x in stratum 3.
y3=r3[,1];x3=r3[,2]
x<-c(x1,x2,x3)
# Storage for the estimates; rr1, rr2, rr3 hold the stratum ratio estimates
# (separate names so the population matrices r1, r2, r3 are not overwritten).
rr1<-numeric(5000);rr2<-numeric(5000);rr3<-numeric(5000)
rsst<-numeric(5000);yst<-numeric(5000)
for(i in 1:5000){
sa1=sample(1:N1,n1)
xs1=x1[sa1];ys1=y1[sa1];
rr1[i]=mean(ys1)*(mean(x1)/mean(xs1))
sa2=sample(1:N2,n2)
xs2=x2[sa2];ys2=y2[sa2];
rr2[i]=mean(ys2)*(mean(x2)/mean(xs2))
sa3=sample(1:N3,n3)
xs3=x3[sa3];ys3=y3[sa3];
rr3[i]=mean(ys3)*(mean(x3)/mean(xs3))
# Separate ratio estimate and the ordinary stratified mean.
rsst[i]=w1*rr1[i]+w2*rr2[i]+w3*rr3[i];
yst[i]=w1*mean(ys1)+w2*mean(ys2)+w3*mean(ys3);
}
# Means and variances of the estimators.
mean(rsst);mean(yst);
var(yst);var(rsst);var(yst)/var(rsst)


Double Sampling


Introduction:
Ratio and regression methods of estimation require the population mean of the auxiliary variable
to be known in advance. When it is not known, an estimate of it from a large preliminary sample may
be used instead. The procedure of selecting a large sample for collecting information on the auxiliary
variable x, and then selecting a subsample from it for collecting information on the study variable y,
is called double sampling or two-phase sampling.
The estimate is calculated by taking the sample in two phases. The expected value of the statistic is
\[ E(t) = E_1\left[E_2(t)\right] \]

The variance of the statistic is

\[ Var(t) = E\left[t - E(t)\right]^2 = E\left[\left(t - E_2(t)\right) + \left(E_2(t) - E(t)\right)\right]^2 \]
The cross-product term has zero expectation, so
\[ Var(t) = E\left[t - E_2(t)\right]^2 + E\left[E_2(t) - E(t)\right]^2 \]
\[ Var(t) = E_1 E_2\left[t - E_2(t)\right]^2 + E_1\left[E_2(t) - E_1\left(E_2(t)\right)\right]^2 \]
\[ Var(t) = E_1\left[V_2(t)\right] + V_1\left[E_2(t)\right] \]
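The decomposition can be verified exactly on a tiny population by listing every possible two-phase sample. The sketch below uses a made-up population of four values with first-phase size 3 and second-phase size 2, and takes t to be the second-phase sample mean; none of these values come from the handout.

```r
pop <- c(2, 5, 7, 10)   # hypothetical population
n1 <- 3; n2 <- 2        # first- and second-phase sample sizes

first <- combn(length(pop), n1)   # each column: indices of a first-phase sample
E2 <- V2 <- numeric(ncol(first))
all_t <- c()
for (j in seq_len(ncol(first))) {
  s1     <- pop[first[, j]]
  t_vals <- combn(s1, n2, FUN = mean)    # every second-phase sample mean
  E2[j]  <- mean(t_vals)                 # conditional expectation E2(t)
  V2[j]  <- mean((t_vals - E2[j])^2)     # conditional variance V2(t)
  all_t  <- c(all_t, t_vals)
}

total_var <- mean((all_t - mean(all_t))^2)         # Var(t) over all outcomes
decomp    <- mean(V2) + mean((E2 - mean(E2))^2)    # E1[V2(t)] + V1[E2(t)]
c(total_var = total_var, decomposition = decomp)
```

Every first-phase sample here has the same number of subsamples, so a plain average over all outcomes is the correct expectation.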

The Ratio Estimator for population mean

\[ \bar{y}_r = \frac{\bar{y}}{\bar{x}}\,\bar{X} \]

The Ratio Estimator for Double Sampling

\[ \bar{y}_{rd} = \frac{\bar{y}_2}{\bar{x}_2}\,\bar{x}_1 \]

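Notations
\[ e_0 = \frac{\bar{y}_2 - \bar{Y}}{\bar{Y}}, \qquad e_1 = \frac{\bar{x}_1 - \bar{X}}{\bar{X}}, \qquad e_2 = \frac{\bar{x}_2 - \bar{X}}{\bar{X}} \]
Using these notations, with \theta_1 and \theta_2 the sampling factors of the first-phase and second-phase samples (usually \theta_i = 1/n_i - 1/N),

\[ E(e_0) = 0, \qquad E(e_0^2) = \theta_2 C_y^2, \]
\[ E(e_1) = 0, \qquad E(e_1^2) = \theta_1 C_x^2, \]
\[ E(e_2) = 0, \qquad E(e_2^2) = \theta_2 C_x^2, \]
\[ E(e_1 e_2) = \theta_1 C_x^2, \]
\[ E(e_0 e_1) = \theta_1 \rho_{yx} C_y C_x, \qquad E(e_0 e_2) = \theta_2 \rho_{yx} C_y C_x \]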

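Prove that E(e_1 e_2) = \theta_1 C_x^2

Proof:

\[ e_1 e_2 = \frac{\left(\bar{x}_1 - \bar{X}\right)\left(\bar{x}_2 - \bar{X}\right)}{\bar{X}^2} \;\Rightarrow\; E(e_1 e_2) = \frac{E\left[\left(\bar{x}_1 - \bar{X}\right)\left(\bar{x}_2 - \bar{X}\right)\right]}{\bar{X}^2} \]
\[ E(e_1 e_2) = \frac{E_1 E_2\left[\left(\bar{x}_1 - \bar{X}\right)\left(\bar{x}_2 - \bar{X}\right)\right]}{\bar{X}^2} \]
Given the first-phase sample, E_2(\bar{x}_2) = \bar{x}_1, so
\[ E(e_1 e_2) = \frac{E_1\left[\left(\bar{x}_1 - \bar{X}\right)\left(\bar{x}_1 - \bar{X}\right)\right]}{\bar{X}^2} = \frac{E_1\left(\bar{x}_1 - \bar{X}\right)^2}{\bar{X}^2} \]
\[ E(e_1 e_2) = \frac{\theta_1 S_x^2}{\bar{X}^2} = \theta_1 C_x^2 \]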


Q. Find Bias of Ratio Estimator in Double Sampling.


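The Ratio Estimator for Double Sampling
\[ \bar{y}_{rd} = \frac{\bar{y}_2}{\bar{x}_2}\,\bar{x}_1 \]
\[ e_0 = \frac{\bar{y}_2 - \bar{Y}}{\bar{Y}}, \qquad e_1 = \frac{\bar{x}_1 - \bar{X}}{\bar{X}}, \qquad e_2 = \frac{\bar{x}_2 - \bar{X}}{\bar{X}} \]
\[ \bar{y}_{rd} = \frac{\bar{Y}(1 + e_0)}{\bar{X}(1 + e_2)}\,\bar{X}(1 + e_1) = \frac{\bar{Y}(1 + e_0)}{(1 + e_2)}\,(1 + e_1) \]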
The Bias of Ratio Estimator in Double Sampling

\[ \bar{y}_{rd} = \bar{Y}(1 + e_0)(1 + e_1)(1 + e_2)^{-1} \]
\[ \bar{y}_{rd} = \bar{Y}(1 + e_0)(1 + e_1)\left(1 - e_2 + e_2^2 - \dots\right) \]
Keeping terms up to the second order,
\[ \bar{y}_{rd} \approx \bar{Y}(1 + e_0)\left(1 + e_1 - e_2 - e_1 e_2 + e_2^2\right) \]
\[ \bar{y}_{rd} - \bar{Y} \approx \bar{Y}\left(e_1 - e_2 - e_1 e_2 + e_2^2 + e_0 + e_0 e_1 - e_0 e_2\right) \]
\[ E\left(\bar{y}_{rd} - \bar{Y}\right) \approx \bar{Y}\,E\left(e_1 - e_2 - e_1 e_2 + e_2^2 + e_0 + e_0 e_1 - e_0 e_2\right) \]
\[ E\left(\bar{y}_{rd} - \bar{Y}\right) \approx \bar{Y}\left(\theta_2 C_x^2 - \theta_1 C_x^2 + \theta_1 \rho_{yx} C_y C_x - \theta_2 \rho_{yx} C_y C_x\right) \]
\[ E\left(\bar{y}_{rd} - \bar{Y}\right) \approx \bar{Y}\left[(\theta_2 - \theta_1) C_x^2 - (\theta_2 - \theta_1)\rho_{yx} C_y C_x\right] \]
\[ Bias(\bar{y}_{rd}) \approx \bar{Y}(\theta_2 - \theta_1)\left(C_x^2 - \rho_{yx} C_y C_x\right) \]

Q.Find MSE of Ratio Estimator in Double Sampling.

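The Ratio Estimator for Double Sampling

\[ \bar{y}_{rd} = \frac{\bar{y}_2}{\bar{x}_2}\,\bar{x}_1 \]
\[ e_0 = \frac{\bar{y}_2 - \bar{Y}}{\bar{Y}}, \qquad e_1 = \frac{\bar{x}_1 - \bar{X}}{\bar{X}}, \qquad e_2 = \frac{\bar{x}_2 - \bar{X}}{\bar{X}} \]
\[ \bar{y}_{rd} = \frac{\bar{Y}(1 + e_0)}{\bar{X}(1 + e_2)}\,\bar{X}(1 + e_1) = \frac{\bar{Y}(1 + e_0)}{(1 + e_2)}\,(1 + e_1) \]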

MSE of Ratio Estimator in Double Sampling

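\[ \bar{y}_{rd} = \bar{Y}(1 + e_0)(1 + e_1)(1 + e_2)^{-1} = \bar{Y}(1 + e_0)(1 + e_1)\left(1 - e_2 + e_2^2 - \dots\right) \]
Keeping first-order terms only,
\[ \bar{y}_{rd} - \bar{Y} \approx \bar{Y}\left(e_0 + e_1 - e_2\right) \]
\[ E\left(\bar{y}_{rd} - \bar{Y}\right)^2 \approx \bar{Y}^2 E\left(e_0 + e_1 - e_2\right)^2 \]
\[ E\left(\bar{y}_{rd} - \bar{Y}\right)^2 \approx \bar{Y}^2 E\left(e_0^2 + e_1^2 + e_2^2 + 2 e_0 e_1 - 2 e_0 e_2 - 2 e_1 e_2\right) \]
\[ MSE(\bar{y}_{rd}) \approx \bar{Y}^2\left[\theta_2 C_y^2 + (\theta_2 - \theta_1)\left(C_x^2 - 2\rho_{xy} C_x C_y\right)\right] \]
\[ MSE(\bar{y}_{rd}) \approx \theta_2 \bar{Y}^2\left(C_y^2 + C_x^2 - 2\rho_{xy} C_x C_y\right) - \theta_1 \bar{Y}^2\left(C_x^2 - 2\rho_{xy} C_x C_y\right) \]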
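To see the size of these approximations, they can be evaluated for illustrative parameter values. The sketch below assumes \theta_i = 1/n_i - 1/N and uses made-up population quantities (N = 1000, n1 = 100, n2 = 20, means 10 and 2, unit variances, correlation 0.85).

```r
# Illustrative population and design quantities (assumed, not from a data set)
N <- 1000; n1 <- 100; n2 <- 20
Ybar <- 10; Xbar <- 2; Sy <- 1; Sx <- 1; rho <- 0.85

theta1 <- 1 / n1 - 1 / N   # assumed definition of theta_1
theta2 <- 1 / n2 - 1 / N   # assumed definition of theta_2
Cy <- Sy / Ybar            # coefficient of variation of y
Cx <- Sx / Xbar            # coefficient of variation of x

bias_rd <- Ybar * (theta2 - theta1) * (Cx^2 - rho * Cy * Cx)
mse_rd  <- theta2 * Ybar^2 * (Cy^2 + Cx^2 - 2 * rho * Cx * Cy) -
           theta1 * Ybar^2 * (Cx^2 - 2 * rho * Cx * Cy)
c(bias = bias_rd, mse = mse_rd)
```

For comparison, the variance of the plain second-phase mean under the same assumptions is theta2 * Sy^2 = 0.049, so with these particular values (C_x much larger than C_y) the ratio adjustment does not pay off.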


Q. Find Bias of Product Estimator in Double Sampling.


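The Product Estimator is
\[ \bar{y}_p = \frac{\bar{y}\,\bar{x}}{\bar{X}} \]
The Product Estimator in Double Sampling is
\[ \bar{y}_{pd} = \frac{\bar{y}_2\,\bar{x}_2}{\bar{x}_1} \]
\[ e_0 = \frac{\bar{y}_2 - \bar{Y}}{\bar{Y}}, \qquad e_1 = \frac{\bar{x}_1 - \bar{X}}{\bar{X}}, \qquad e_2 = \frac{\bar{x}_2 - \bar{X}}{\bar{X}} \]
\[ \bar{y}_2 = \bar{Y}(1 + e_0), \qquad \bar{x}_1 = \bar{X}(1 + e_1), \qquad \bar{x}_2 = \bar{X}(1 + e_2) \]
Notations
\[ E(e_0) = 0, \quad E(e_0^2) = \theta_2 C_y^2, \quad E(e_1) = 0, \quad E(e_1^2) = \theta_1 C_x^2, \quad E(e_2) = 0, \quad E(e_2^2) = \theta_2 C_x^2 \]
\[ E(e_1 e_2) = \theta_1 C_x^2, \qquad E(e_0 e_1) = \theta_1 \rho_{yx} C_y C_x, \qquad E(e_0 e_2) = \theta_2 \rho_{yx} C_y C_x \]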

Bias of Product Estimator in Double Sampling


\[ \bar{y}_{pd} = \frac{\bar{y}_2}{\bar{x}_1}\,\bar{x}_2 = \frac{\bar{Y}(1 + e_0)}{\bar{X}(1 + e_1)}\,\bar{X}(1 + e_2) = \frac{\bar{Y}(1 + e_0)}{(1 + e_1)}\,(1 + e_2) \]
\[ \bar{y}_{pd} = \bar{Y}(1 + e_0)(1 + e_2)(1 + e_1)^{-1} = \bar{Y}(1 + e_0)(1 + e_2)\left(1 - e_1 + e_1^2 - \dots\right) \]
Keeping terms up to the second order,
\[ \bar{y}_{pd} \approx \bar{Y}(1 + e_0)\left(1 + e_2 - e_1 - e_1 e_2 + e_1^2\right) \]
\[ \bar{y}_{pd} - \bar{Y} \approx \bar{Y}\left(e_2 - e_1 - e_1 e_2 + e_1^2 + e_0 + e_0 e_2 - e_0 e_1\right) \]
\[ E\left(\bar{y}_{pd} - \bar{Y}\right) \approx \bar{Y}\,E\left(e_2 - e_1 - e_1 e_2 + e_1^2 + e_0 + e_0 e_2 - e_0 e_1\right) \]
\[ E\left(\bar{y}_{pd} - \bar{Y}\right) \approx \bar{Y}\left(\theta_1 C_x^2 - \theta_1 C_x^2 + \theta_2 \rho_{yx} C_y C_x - \theta_1 \rho_{yx} C_y C_x\right) \]
\[ Bias(\bar{y}_{pd}) \approx \bar{Y}(\theta_2 - \theta_1)\,\rho_{yx} C_y C_x \]

Q. Find MSE of Product Estimator in Double Sampling.


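The Product Estimator for Double Sampling
\[ \bar{y}_{pd} = \frac{\bar{y}_2}{\bar{x}_1}\,\bar{x}_2 \]
\[ e_0 = \frac{\bar{y}_2 - \bar{Y}}{\bar{Y}}, \qquad e_1 = \frac{\bar{x}_1 - \bar{X}}{\bar{X}}, \qquad e_2 = \frac{\bar{x}_2 - \bar{X}}{\bar{X}} \]
\[ \bar{y}_{pd} = \frac{\bar{Y}(1 + e_0)}{\bar{X}(1 + e_1)}\,\bar{X}(1 + e_2) = \frac{\bar{Y}(1 + e_0)}{(1 + e_1)}\,(1 + e_2) \]
The MSE of Product Estimator in Double Sampling

\[ \bar{y}_{pd} = \bar{Y}(1 + e_0)(1 + e_2)(1 + e_1)^{-1} = \bar{Y}(1 + e_0)(1 + e_2)\left(1 - e_1 + \dots\right) \]
Keeping first-order terms only,
\[ \bar{y}_{pd} - \bar{Y} \approx \bar{Y}\left(e_0 + e_2 - e_1\right) \]
\[ E\left(\bar{y}_{pd} - \bar{Y}\right)^2 \approx \bar{Y}^2 E\left(e_0 + e_2 - e_1\right)^2 = \bar{Y}^2 E\left(e_0^2 + e_1^2 + e_2^2 + 2 e_0 e_2 - 2 e_0 e_1 - 2 e_1 e_2\right) \]
\[ MSE(\bar{y}_{pd}) \approx \bar{Y}^2\left[\theta_2 C_y^2 + (\theta_2 - \theta_1)\left(C_x^2 + 2\rho_{xy} C_x C_y\right)\right] \]
\[ MSE(\bar{y}_{pd}) \approx \theta_2 \bar{Y}^2\left(C_y^2 + C_x^2 + 2\rho_{xy} C_x C_y\right) - \theta_1 \bar{Y}^2\left(C_x^2 + 2\rho_{xy} C_x C_y\right) \]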
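The product-estimator results can be evaluated in the same way. The sketch below again assumes \theta_i = 1/n_i - 1/N and uses made-up values; a product estimator is normally of interest when the correlation is negative, so the correlation is set to -0.85 here purely for illustration.

```r
# Illustrative population and design quantities (assumed values)
N <- 1000; n1 <- 100; n2 <- 20
Ybar <- 10; Xbar <- 2; Sy <- 1; Sx <- 1; rho <- -0.85

theta1 <- 1 / n1 - 1 / N   # assumed definition of theta_1
theta2 <- 1 / n2 - 1 / N   # assumed definition of theta_2
Cy <- Sy / Ybar
Cx <- Sx / Xbar

bias_pd <- Ybar * (theta2 - theta1) * rho * Cy * Cx
mse_pd  <- theta2 * Ybar^2 * (Cy^2 + Cx^2 + 2 * rho * Cx * Cy) -
           theta1 * Ybar^2 * (Cx^2 + 2 * rho * Cx * Cy)
c(bias = bias_pd, mse = mse_pd)
```

Changing the sign of `rho` shows how the MSE of the product estimator grows when y and x are positively correlated.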
Q. Generate a population of size 1000 from a bivariate normal distribution such that (Y, X) has
mean vector and variance-covariance matrix
\[ \mu = (10,\; 2), \qquad \Sigma = \begin{pmatrix} 1 & 0.85 \\ 0.85 & 1 \end{pmatrix} \]
Consider sample sizes n1=100;n2=20. Select 10,000 random samples using double sampling and
calculate
\[ t = \bar{y}_2\,\frac{\bar{x}_1}{\bar{x}_2} \]
Further, calculate the MSE of the above estimator.

Ans:
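library(mvtnorm)
N=1000;ryx=0.85; n1=100;n2=20;
mu=c(10,2); # mean vector of (Y, X)
# Variance-covariance matrix.
sig=matrix(c(1,ryx,ryx,1),ncol=2);
pop=rmvnorm(N,mu,sig);
x=pop[,2];y=pop[,1];
# Storage for the first-phase mean of x, the ratio and the estimator.
xbar1<-numeric(10000);r<-numeric(10000);mratio<-numeric(10000);
for(i in 1:10000)
{
s1<-sample(1:N,n1)   # first-phase sample
s2<-sample(s1,n2)    # second-phase subsample drawn from the first-phase sample
xbar1[i]<-mean(x[s1])
r[i] <- mean(y[s2])/mean(x[s2])
mratio[i] <- r[i]*xbar1[i]
}

mean(mratio)
var(mratio)
# Empirical MSE about the population mean of y.
mean((mratio-mean(y))^2)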


Two Stage Sampling


Q. Prove that the sample mean is an unbiased estimator of the population mean in two stage sampling
\[ \bar{y} = \sum_{i=1}^{n}\sum_{j=1}^{m} \frac{y_{ij}}{nm} \]

Ans:
The estimate is calculated by taking the sample in two stages.
The expected value and variance of the statistic are
\[ E(t) = E_1\left[E_2(t)\right] \]
\[ Var(t) = E_1\left[V_2(t)\right] + V_1\left[E_2(t)\right] \]

Expected Value of Mean


\[ \bar{y} = \sum_{i=1}^{n}\sum_{j=1}^{m} \frac{y_{ij}}{nm} \]
\[ E(\bar{y}) = E_1\left[E_2(\bar{y})\right] = E_1\left[E_2\left(\sum_{i=1}^{n}\sum_{j=1}^{m} \frac{y_{ij}}{nm}\right)\right] \]
\[ E(\bar{y}) = E_1\left[\sum_{i=1}^{n}\frac{1}{n}\,E_2\left(\sum_{j=1}^{m} \frac{y_{ij}}{m}\right)\right] = E_1\left[\sum_{i=1}^{n}\frac{1}{n}\sum_{j=1}^{m} \frac{E_2(y_{ij})}{m}\right] \]
\[ E(\bar{y}) = E_1\left[\sum_{i=1}^{n}\frac{1}{n}\sum_{j=1}^{M} \frac{y_{ij}}{M}\right] = E_1\left[\sum_{i=1}^{n}\frac{1}{n}\,\bar{Y}_i\right] \]
\[ E(\bar{y}) = \sum_{i=1}^{n}\frac{1}{n}\,E_1\left(\bar{Y}_i\right) = E_1\left(\bar{Y}_i\right) \]
\[ E(\bar{y}) = \sum_{i=1}^{N}\frac{\bar{Y}_i}{N} \;\Rightarrow\; E(\bar{y}) = \bar{Y} \]
 i 1 N 
Q. Find the variance of the sample mean in two stage sampling.
\[ \bar{y} = \sum_{i=1}^{n}\sum_{j=1}^{m} \frac{y_{ij}}{nm} \]
Ans:
\[ \bar{y} = \sum_{i=1}^{n}\sum_{j=1}^{m} \frac{y_{ij}}{nm} = \sum_{i=1}^{n}\frac{1}{n}\,\bar{y}_i \]
\[ Var(t) = E_1\left[V_2(t)\right] + V_1\left[E_2(t)\right] \;\Rightarrow\; Var(\bar{y}) = E_1\left[V_2(\bar{y})\right] + V_1\left[E_2(\bar{y})\right] \]
For the second term,
\[ E_2(\bar{y}) = E_2\left(\sum_{i=1}^{n}\sum_{j=1}^{m} \frac{y_{ij}}{nm}\right) = \sum_{i=1}^{n}\frac{1}{n}\sum_{j=1}^{m} \frac{E_2(y_{ij})}{m} = \sum_{i=1}^{n}\frac{1}{n}\sum_{j=1}^{M} \frac{y_{ij}}{M} = \sum_{i=1}^{n}\frac{1}{n}\,\bar{Y}_i = \bar{Y}_n \]
\bar{Y}_n is the estimator based on the first stage sample of size n. By SRS, V(\bar{y}) = \left(\frac{N-n}{Nn}\right) S^2, so
\[ V_1\left[E_2(\bar{y})\right] = V_1\left(\bar{Y}_n\right) = \left(\frac{N-n}{Nn}\right) S_1^2 \]
For the first term, the second stage samples are drawn independently within the selected units, so the covariance terms vanish:
\[ E_1\left[V_2(\bar{y})\right] = E_1\left[V_2\left(\sum_{i=1}^{n}\frac{1}{n}\,\bar{y}_i\right)\right] = E_1\left[\frac{1}{n^2}\sum_{i=1}^{n} V_2(\bar{y}_i) + \frac{1}{n^2}\sum_{i \ne j} Cov(\bar{y}_i, \bar{y}_j)\right] \]
\[ E_1\left[V_2(\bar{y})\right] = E_1\left[\frac{1}{n^2}\sum_{i=1}^{n} V_2(\bar{y}_i)\right] = \frac{1}{n^2}\,E_1\left[\sum_{i=1}^{n}\frac{M-m}{Mm}\,S_{2i}^2\right] = \frac{1}{n}\left(\frac{M-m}{Mm}\right) E_1\left(S_{2i}^2\right) \]
\[ E_1\left[V_2(\bar{y})\right] = \frac{1}{n}\left(\frac{M-m}{Mm}\right)\sum_{i=1}^{N}\frac{S_{2i}^2}{N} = \frac{1}{n}\left(\frac{M-m}{Mm}\right) S_2^2 \]
Combining the two terms,
\[ Var(\bar{y}) = \frac{1}{n}\left(\frac{M-m}{Mm}\right) S_2^2 + \left(\frac{N-n}{Nn}\right) S_1^2 \]
and, with the sample variances s_2^2 and s_1^2 in place of S_2^2 and S_1^2,
\[ Var(\bar{y}) = \frac{1}{n}\left(\frac{M-m}{Mm}\right) s_2^2 + \left(\frac{N-n}{Nn}\right) s_1^2 \]


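Q. Explain two stage sampling for unequal first stage units, and find the expected value of the
following estimator.
\[ \bar{y} = \sum_{i=1}^{n}\frac{M_i \bar{y}_i}{n\bar{M}} = \sum_{i=1}^{n}\frac{u_i \bar{y}_i}{n}, \qquad u_i = \frac{M_i}{\bar{M}} \]
Ans:
Let the population have N clusters, with M_i the size of the ith cluster. Let a sample of n first stage
units be selected from this population, and let m_i units be selected at the second stage from the ith
selected unit.
M_i = Size of ith cluster

\[ M_0 = \sum_{i=1}^{N} M_i = N\bar{M} \]
Mean of ith first stage unit
\[ \bar{Y}_i = \frac{\sum_{j=1}^{M_i} y_{ij}}{M_i} \]
Overall population mean
\[ \bar{Y} = \frac{\sum_{i=1}^{N} M_i \bar{Y}_i}{N\bar{M}} \]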
The Expected Value of Mean
\[ \bar{y} = \sum_{i=1}^{n}\frac{M_i \bar{y}_i}{n\bar{M}} = \sum_{i=1}^{n}\frac{u_i \bar{y}_i}{n} \]
\[ E(\bar{y}) = E_1\left[E_2(\bar{y})\right] = E_1\left[E_2\left(\sum_{i=1}^{n}\frac{u_i \bar{y}_i}{n}\right)\right] \]
\[ E(\bar{y}) = E_1\left[\sum_{i=1}^{n}\frac{u_i \bar{Y}_i}{n}\right] = \sum_{i=1}^{n}\frac{E_1\left(u_i \bar{Y}_i\right)}{n} \]
\[ E(\bar{y}) = \sum_{i=1}^{N}\frac{u_i \bar{Y}_i}{N} = \sum_{i=1}^{N}\frac{M_i \bar{Y}_i}{\bar{M}N} \]
\[ E(\bar{y}) = \bar{Y} \]
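The same enumeration idea confirms unbiasedness when the clusters differ in size. The sketch below uses three made-up clusters of sizes 3, 4 and 2, with n = 2 first stage units and m_i = 2 second stage units in each; the names are illustrative.

```r
clusters <- list(c(3, 5, 8), c(2, 4, 6, 12), c(7, 9))  # hypothetical unequal clusters
N <- length(clusters); n <- 2; m <- 2

M_i  <- lengths(clusters)
Mbar <- mean(M_i)
u_i  <- M_i / Mbar
Ybar <- sum(unlist(clusters)) / sum(M_i)   # overall population mean

# E2 of each cluster's sample mean, from all subsamples of size m
E2_ybar <- sapply(clusters, function(cl) mean(combn(cl, m, FUN = mean)))

# Average the conditional expectation of the estimator over all first stage samples
pairs     <- combn(N, n)
est_means <- apply(pairs, 2, function(s) mean(u_i[s] * E2_ybar[s]))
E_ybar    <- mean(est_means)

c(E_ybar = E_ybar, Ybar = Ybar)
```

The weights u_i are what make the estimator unbiased: an unweighted mean of the cluster means would estimate the average of the cluster means rather than the population mean per element.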
Q. Find the variance expression of the following estimator in two stage sampling when the first
stage units are not equal in size.
\[ \bar{y} = \sum_{i=1}^{n}\frac{M_i \bar{y}_i}{n\bar{M}} = \sum_{i=1}^{n}\frac{u_i \bar{y}_i}{n} \]
Ans
\[ Var(t) = E_1\left[V_2(t)\right] + V_1\left[E_2(t)\right] \;\Rightarrow\; Var(\bar{y}) = E_1\left[V_2(\bar{y})\right] + V_1\left[E_2(\bar{y})\right] \]
\[ V_1\left[E_2(\bar{y})\right] = V_1\left[E_2\left(\sum_{i=1}^{n}\frac{u_i \bar{y}_i}{n}\right)\right] = V_1\left[\sum_{i=1}^{n}\frac{u_i \bar{Y}_i}{n}\right] \]
\[ V_1\left[E_2(\bar{y})\right] = \left(\frac{1}{n} - \frac{1}{N}\right) S_1^2 \]
\[ E_1\left[V_2(\bar{y})\right] = E_1\left[V_2\left(\sum_{i=1}^{n}\frac{u_i \bar{y}_i}{n}\right)\right] = E_1\left[\sum_{i=1}^{n}\frac{u_i^2}{n^2}\,V_2(\bar{y}_i)\right] = E_1\left[\sum_{i=1}^{n}\frac{u_i^2}{n^2}\,\frac{M_i - m_i}{M_i m_i}\,S_{2i}^2\right] \]
\[ E_1\left[V_2(\bar{y})\right] = \sum_{i=1}^{N}\frac{u_i^2}{n}\,\frac{M_i - m_i}{M_i m_i}\,\frac{S_{2i}^2}{N} \]
\[ Var(\bar{y}) = \sum_{i=1}^{N}\frac{u_i^2}{n}\,\frac{M_i - m_i}{M_i m_i}\,\frac{S_{2i}^2}{N} + \left(\frac{N-n}{Nn}\right) S_1^2 \]

Q. Write a program for two stage sampling in R for the following data, taking n1=3;n2=2.
Calculate the sample mean.

Cluster-1 125 115 129 134 111
Cluster-2 134 125 142 141 131
Cluster-3 144 143 122 134 126
Cluster-4 114 111 134 131 146
Cluster-5 119 126 122 129 130
Cluster-6 140 125 124 124 115

Ans:
#--Defining Clusters in R----
Clu1<-c(125,115,129,134,111)
Clu2<-c(134,125,142,141,131)
Clu3<-c(144,143,122,134,126)
Clu4<-c(114,111,134,131,146)
Clu5<-c(119,126,122,129,130)
Clu6<-c(140,125,124,124,115)
Clus<-data.frame(Clu1,Clu2,Clu3,Clu4,Clu5,Clu6)
pop<-t(Clus)   # one row per cluster
#--1st Stage Sample: select n1 clusters----
n1=3;n2=2;
w<-pop[sample(nrow(pop),n1),]
sc1<-w[1,]
sc2<-w[2,]
sc3<-w[3,]
#--2nd Stage Sample: select n2 units from each selected cluster----
s2_sc1<-sample(sc1,n2)
s2_sc2<-sample(sc2,n2)
s2_sc3<-sample(sc3,n2)
f_s=c(s2_sc1,s2_sc2,s2_sc3)
#--Sample mean----
est=mean(f_s)
Q2: Generate the three clusters such that
Cluster-1 from normal distribution with mean zero and variance 1.
Cluster-2 from normal distribution with mean 2 and variance 5.
Cluster-3 from normal distribution with mean 4 and variance 9.
Calculate the sample mean using two stage cluster sampling taking n1=2;n2=15.
Ans:
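# rnorm() takes the standard deviation, so the variances 1, 5 and 9
# are supplied as sd = 1, sqrt(5) and 3. Each cluster has 100 units.
Clu1<-rnorm(100,0,1)
Clu2<-rnorm(100,2,sqrt(5))
Clu3<-rnorm(100,4,3)
Clus<-data.frame(Clu1,Clu2,Clu3)
pop<-t(Clus)   # one row per cluster
#1st Stage Sampling
n1=2;n2=15;
w<-pop[sample(nrow(pop),n1),]
sc1<-w[1,]
sc2<-w[2,]
#2nd Stage Sampling
s2_sc1<-sample(sc1,n2)
s2_sc2<-sample(sc2,n2)
f_s=c(s2_sc1,s2_sc2)
est=mean(f_s)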
Q.Find Bias of Ratio Estimator in Two Stage Sampling.
Ans:
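The Ratio Estimator of mean under two stage sampling when the first stage units are equal is

\[ \bar{y}_{r2s} = \frac{\bar{y}}{\bar{x}}\,\bar{X} \]

The estimator of mean in two stage sampling is

\[ \bar{y} = \sum_{i=1}^{n}\sum_{j=1}^{m} \frac{y_{ij}}{nm} = \sum_{i=1}^{n}\frac{1}{n}\,\bar{y}_i, \qquad V(\bar{y}) = \frac{1}{n}\left(\frac{M-m}{Mm}\right) S_2^2 + \left(\frac{N-n}{Nn}\right) S_1^2 \]

Some important notations

\[ e_0 = \frac{\bar{y} - \bar{Y}}{\bar{Y}}, \qquad e_1 = \frac{\bar{x} - \bar{X}}{\bar{X}} \]
\[ E(e_0) = E(e_1) = 0, \qquad E(e_0^2) = V_{02}, \qquad E(e_1^2) = V_{20}, \qquad E(e_0 e_1) = V_{11} \]
\[ V_{02} = \frac{V(\bar{y})}{\bar{Y}^2} = \frac{1}{\bar{Y}^2}\left[\frac{1}{n}\left(\frac{M-m}{Mm}\right) S_{2y}^2 + \left(\frac{N-n}{Nn}\right) S_{1y}^2\right] \]
\[ V_{20} = \frac{V(\bar{x})}{\bar{X}^2} = \frac{1}{\bar{X}^2}\left[\frac{1}{n}\left(\frac{M-m}{Mm}\right) S_{2x}^2 + \left(\frac{N-n}{Nn}\right) S_{1x}^2\right] \]
\[ V_{11} = \frac{1}{\bar{Y}\bar{X}}\left[\frac{1}{nN}\sum_{i=1}^{N}\left(\frac{M-m}{Mm}\right)\rho_{xy2i} S_{y2i} S_{x2i} + \left(\frac{N-n}{Nn}\right)\rho_{xy1} S_{y1} S_{x1}\right] \]
\[ S_{y2i}^2 = \frac{1}{M_i - 1}\sum_{j=1}^{M_i}\left(y_{ij} - \bar{Y}_i\right)^2, \qquad \rho_{xy2i} = \frac{S_{xy2i}}{S_{x2i} S_{y2i}} \]
\[ \bar{y} = \bar{Y}(1 + e_0), \qquad \bar{x} = \bar{X}(1 + e_1), \qquad \bar{y}_{r2s} = \frac{\bar{y}}{\bar{x}}\,\bar{X} = \frac{\bar{Y}(1 + e_0)}{\bar{X}(1 + e_1)}\,\bar{X} \]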
Bias of Ratio Estimator Under Two Stage Sampling

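\[ \bar{y}_{r2s} = \bar{Y}(1 + e_0)(1 + e_1)^{-1} \approx \bar{Y}(1 + e_0)\left(1 - e_1 + e_1^2\right) \]
\[ \bar{y}_{r2s} \approx \bar{Y}\left(1 - e_1 + e_1^2 + e_0 - e_0 e_1 + e_0 e_1^2\right) \]
\[ \bar{y}_{r2s} - \bar{Y} \approx \bar{Y}\left(e_0 - e_1 + e_1^2 - e_0 e_1 + e_0 e_1^2\right) \]
Dropping the third-order term,
\[ \bar{y}_{r2s} - \bar{Y} \approx \bar{Y}\left(e_0 - e_1 + e_1^2 - e_0 e_1\right) \]
\[ E\left(\bar{y}_{r2s} - \bar{Y}\right) \approx \bar{Y}\,E\left(e_0 - e_1 + e_1^2 - e_0 e_1\right) \]
\[ Bias(\bar{y}_{r2s}) \approx \bar{Y}\left(V_{20} - V_{11}\right) \]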

Q.Find MSE of Ratio Estimator in Two Stage Sampling.


Ans:
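The Ratio Estimator of mean under two stage sampling when the first stage units are equal is
\[ \bar{y}_{r2s} = \frac{\bar{y}}{\bar{x}}\,\bar{X} \]
Notations

\[ E(e_0) = E(e_1) = 0, \qquad E(e_0^2) = V_{02}, \qquad E(e_1^2) = V_{20}, \qquad E(e_0 e_1) = V_{11} \]
\[ e_0 = \frac{\bar{y} - \bar{Y}}{\bar{Y}}, \qquad e_1 = \frac{\bar{x} - \bar{X}}{\bar{X}}, \qquad \bar{y} = \bar{Y}(1 + e_0), \qquad \bar{x} = \bar{X}(1 + e_1) \]
\[ \bar{y}_{r2s} = \frac{\bar{y}}{\bar{x}}\,\bar{X} = \frac{\bar{Y}(1 + e_0)}{\bar{X}(1 + e_1)}\,\bar{X} \]
MSE of Ratio Estimator Under Two Stage Sampling
\[ \bar{y}_{r2s} = \bar{Y}(1 + e_0)(1 + e_1)^{-1} \approx \bar{Y}(1 + e_0)(1 - e_1) \]
\[ \bar{y}_{r2s} \approx \bar{Y}\left(1 + e_0 - e_1 - e_0 e_1\right) \;\Rightarrow\; \bar{y}_{r2s} - \bar{Y} \approx \bar{Y}\left(e_0 - e_1\right) \]
\[ E\left(\bar{y}_{r2s} - \bar{Y}\right)^2 \approx \bar{Y}^2 E\left(e_0 - e_1\right)^2 = \bar{Y}^2 E\left(e_0^2 + e_1^2 - 2 e_0 e_1\right) \]
\[ MSE(\bar{y}_{r2s}) \approx \bar{Y}^2\left(V_{02} + V_{20} - 2V_{11}\right) \]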

Q.Find Bias of Product Estimator in Two Stage Sampling.


Ans:
The Product Estimator of mean under two stage sampling when the first stage units are equal is
\[ \bar{y}_{p2s} = \frac{\bar{y}\,\bar{x}}{\bar{X}} \]
Some important notations
\[ E(e_0) = E(e_1) = 0, \qquad E(e_0^2) = V_{02}, \qquad E(e_1^2) = V_{20}, \qquad E(e_0 e_1) = V_{11} \]
\[ e_0 = \frac{\bar{y} - \bar{Y}}{\bar{Y}}, \qquad e_1 = \frac{\bar{x} - \bar{X}}{\bar{X}}, \qquad \bar{y} = \bar{Y}(1 + e_0), \qquad \bar{x} = \bar{X}(1 + e_1) \]
\[ \bar{y}_{p2s} = \frac{\bar{y}\,\bar{x}}{\bar{X}} = \frac{\bar{Y}(1 + e_0)(1 + e_1)\bar{X}}{\bar{X}} \]
Bias of Product Estimator Under Two Stage Sampling

\[ \bar{y}_{p2s} = \bar{Y}(1 + e_0)(1 + e_1) = \bar{Y}\left(1 + e_0 + e_1 + e_0 e_1\right) \]
\[ \bar{y}_{p2s} - \bar{Y} = \bar{Y}\left(e_0 + e_1 + e_0 e_1\right) \]
\[ E\left(\bar{y}_{p2s} - \bar{Y}\right) = \bar{Y}\,E\left(e_0 + e_1 + e_0 e_1\right) \]
\[ Bias(\bar{y}_{p2s}) = \bar{Y}\,V_{11} \]
Q.Find MSE of Product Estimator in Two Stage Sampling
Ans:
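The Product Estimator of mean under two stage sampling when the first stage units are equal is
\[ \bar{y}_{p2s} = \frac{\bar{y}\,\bar{x}}{\bar{X}} \]
Notations

\[ E(e_0) = E(e_1) = 0, \qquad E(e_0^2) = V_{02}, \qquad E(e_1^2) = V_{20}, \qquad E(e_0 e_1) = V_{11} \]
\[ e_0 = \frac{\bar{y} - \bar{Y}}{\bar{Y}}, \qquad e_1 = \frac{\bar{x} - \bar{X}}{\bar{X}}, \qquad \bar{y} = \bar{Y}(1 + e_0), \qquad \bar{x} = \bar{X}(1 + e_1) \]
\[ \bar{y}_{p2s} = \frac{\bar{y}\,\bar{x}}{\bar{X}} = \frac{\bar{Y}(1 + e_0)(1 + e_1)\bar{X}}{\bar{X}} \]
MSE of Product Estimator
\[ \bar{y}_{p2s} = \bar{Y}(1 + e_0)(1 + e_1) = \bar{Y}\left(1 + e_0 + e_1 + e_0 e_1\right) \]
\[ \bar{y}_{p2s} - \bar{Y} \approx \bar{Y}\left(e_0 + e_1\right) \]
\[ E\left(\bar{y}_{p2s} - \bar{Y}\right)^2 \approx \bar{Y}^2 E\left(e_0 + e_1\right)^2 = \bar{Y}^2 E\left(e_0^2 + e_1^2 + 2 e_0 e_1\right) \]
\[ MSE(\bar{y}_{p2s}) \approx \bar{Y}^2\left(V_{02} + V_{20} + 2V_{11}\right) \]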


Ranked set sampling


Introduction:
Ranked set sampling (RSS) is an alternative to simple random sampling that can sometimes offer
large improvements in precision. McIntyre (1952) introduced the basic concept of ranked set
sampling in order to estimate the population means of pasture and forage yields. The RSS
procedure is elaborated as follows
First, a simple random sample of size k is drawn from the population and the k sampling units
are ranked with respect to the variable of interest, say X, by judgment without actual
measurement. The unit with rank 1 is identified and taken for the measurement of X, and the
remaining units are discarded. Next, another simple random sample of size k is drawn and the
units of the sample are ranked by judgment; the unit with rank 2 is taken for the measurement
of X and the remaining units are discarded. The process continues until the unit with rank k is
taken from the kth sample. For k = 6, the sets of the first cycle can be displayed as follows,
where x_{i(j)1} is the unit with rank j in the ith set and the diagonal units x_{i(i)1} are the
ones measured.

\[ \begin{matrix}
x_{1(1)1}, & x_{1(2)1}, & x_{1(3)1}, & x_{1(4)1}, & x_{1(5)1}, & x_{1(6)1} \\
x_{2(1)1}, & x_{2(2)1}, & x_{2(3)1}, & x_{2(4)1}, & x_{2(5)1}, & x_{2(6)1} \\
x_{3(1)1}, & x_{3(2)1}, & x_{3(3)1}, & x_{3(4)1}, & x_{3(5)1}, & x_{3(6)1} \\
x_{4(1)1}, & x_{4(2)1}, & x_{4(3)1}, & x_{4(4)1}, & x_{4(5)1}, & x_{4(6)1} \\
x_{5(1)1}, & x_{5(2)1}, & x_{5(3)1}, & x_{5(4)1}, & x_{5(5)1}, & x_{5(6)1} \\
x_{6(1)1}, & x_{6(2)1}, & x_{6(3)1}, & x_{6(4)1}, & x_{6(5)1}, & x_{6(6)1}
\end{matrix} \]

Ranking with Auxiliary Variable


When auxiliary information (an auxiliary variable) is used for ranking the variable of
interest, the ranking error depends on the correlation between the two variables. The
performance of the RSS estimator depends on how well the sample units are ranked on the
variable of interest.
Q. Prove that sample mean is an unbiased estimator of mean under ranked set sampling
Ans:
The mean Estimator in RSS is
\[ \bar{X}^* = \frac{\sum_{i=1}^{k} X^*_{(i)}}{k} \]
The Expected Value is
\[ E\left(\bar{X}^*\right) = \frac{\sum_{i=1}^{k} E\left(X^*_{(i)}\right)}{k} \]
We have assumed perfect rankings, so X^*_{(i)} is distributed like the ith order statistic from a
continuous distribution with p.d.f. f(x) and c.d.f. F(x):
\[ E\left(X^*_{(i)}\right) = \int_{-\infty}^{\infty} x\,\frac{k!}{(i-1)!\,(k-i)!}\,F(x)^{i-1}\left[1 - F(x)\right]^{k-i} f(x)\,dx \]
Therefore
\[ E\left(\bar{X}^*\right) = \frac{1}{k}\sum_{i=1}^{k}\int_{-\infty}^{\infty} k\,x\,\frac{(k-1)!}{(i-1)!\,(k-i)!}\,F(x)^{i-1}\left[1 - F(x)\right]^{k-i} f(x)\,dx \]
\[ E\left(\bar{X}^*\right) = \frac{1}{k}\sum_{i=1}^{k}\int_{-\infty}^{\infty} k\,x\binom{k-1}{i-1} F(x)^{i-1}\left[1 - F(x)\right]^{k-i} f(x)\,dx \]
\[ E\left(\bar{X}^*\right) = \int_{-\infty}^{\infty} x\,f(x)\left[\sum_{i=1}^{k}\binom{k-1}{i-1} F(x)^{i-1}\left[1 - F(x)\right]^{k-i}\right] dx \]
The sum in brackets equals 1 by the binomial theorem, so
\[ E\left(\bar{X}^*\right) = \int_{-\infty}^{\infty} x\,f(x)\,dx = \bar{X} \]
Q. Find variance expression for mean estimator under ranked set sampling.
Ans:

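The mean Estimator in RSS is

\[ \bar{X}^* = \frac{\sum_{i=1}^{k} X^*_{(i)}}{k} \]

By definition, writing \bar{X}^*_{(i)} = E\left(X^*_{(i)}\right),

\[ E\left(X^*_{(i)} - \bar{X}\right)^2 = E\left[\left(X^*_{(i)} - \bar{X}^*_{(i)}\right) + \left(\bar{X}^*_{(i)} - \bar{X}\right)\right]^2 \]
\[ E\left(X^*_{(i)} - \bar{X}\right)^2 = E\left(X^*_{(i)} - \bar{X}^*_{(i)}\right)^2 + \left(\bar{X}^*_{(i)} - \bar{X}\right)^2 \]
\[ E\left(X^*_{(i)} - \bar{X}^*_{(i)}\right)^2 = V\left(X^*_{(i)}\right) = E\left(X^*_{(i)} - \bar{X}\right)^2 - \left(\bar{X}^*_{(i)} - \bar{X}\right)^2 \]
Since the k measured units come from independent sets,
\[ V\left(\bar{X}^*\right) = \frac{1}{k^2}\sum_{i=1}^{k} V\left(X^*_{(i)}\right) = \frac{1}{k^2}\sum_{i=1}^{k} E\left(X^*_{(i)} - \bar{X}\right)^2 - \frac{1}{k^2}\sum_{i=1}^{k}\left(\bar{X}^*_{(i)} - \bar{X}\right)^2 \]
Taking the first sum,
\[ \sum_{i=1}^{k} E\left(X^*_{(i)} - \bar{X}\right)^2 = \sum_{i=1}^{k}\int_{-\infty}^{\infty} k\left(x - \bar{X}\right)^2\binom{k-1}{i-1} F(x)^{i-1}\left[1 - F(x)\right]^{k-i} f(x)\,dx \]
\[ = k\int_{-\infty}^{\infty}\left(x - \bar{X}\right)^2 f(x)\left[\sum_{i=1}^{k}\binom{k-1}{i-1} F(x)^{i-1}\left[1 - F(x)\right]^{k-i}\right] dx = k\,\sigma_X^2 \]
\[ V\left(\bar{X}^*\right) = \frac{1}{k^2}\left[k\,\sigma_X^2 - \sum_{i=1}^{k}\left(\bar{X}^*_{(i)} - \bar{X}\right)^2\right] \]
\[ V\left(\bar{X}^*\right) = \frac{\sigma_X^2}{k} - \frac{1}{k^2}\sum_{i=1}^{k}\left(\bar{X}^*_{(i)} - \bar{X}\right)^2 \]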

Q.143 Generate a population of size 1000 following the normal distribution with mean 0 and
variance 1. Further, calculate the sample mean using ranked set sampling, taking the sample
size as 6.
Ans:
k=6;rssx=matrix(NA,k,k);
pop=rnorm(1000,0,1)
for(i in 1:k)
{
s=sample(pop,k)                 # ith set of k units
xs=sort(s,decreasing = FALSE)   # rank the units of the set
rssx[i,]=xs
}
rssx_s=diag(rssx)    # unit with rank i from the ith set
srs=sample(pop,k)    # simple random sample of the same size, for comparison
est_srs=mean(srs)
est_rss=mean(rssx_s)
rssx
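Q. Find Bias of Ratio Estimator in Ranked Set Sampling such that
\[ E(e_0) = E(e_1) = 0 \]
\[ E(e_1^2) = \frac{1}{rk}\frac{\sigma_x^2}{\bar{X}^2} - \frac{1}{\bar{X}^2 r k^2}\sum_{i=1}^{k}\left(\bar{X}^*_{(i)} - \bar{X}\right)^2 = C_x^2 - D_{x[i]}^2 \]
\[ E(e_0^2) = \frac{1}{rk}\frac{\sigma_y^2}{\bar{Y}^2} - \frac{1}{\bar{Y}^2 r k^2}\sum_{i=1}^{k}\left(\bar{Y}^*_{[i]} - \bar{Y}\right)^2 = C_y^2 - D_{y[i]}^2 \]
\[ E(e_0 e_1) = \frac{1}{rk}\frac{\sigma_{yx}}{\bar{Y}\bar{X}} - \frac{1}{\bar{Y}\bar{X} r k^2}\sum_{i=1}^{k}\left(\bar{X}^*_{(i)} - \bar{X}\right)\left(\bar{Y}^*_{[i]} - \bar{Y}\right) = C_{xy} - D_{xy[i]} \]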
Ans:
The ratio estimator of mean is
\[ \bar{y}_{[rss]} = \frac{\bar{Y}^*}{\bar{X}^*}\,\bar{X} \]
The Variance of Sample Mean Under RSS
\[ V\left(\bar{X}^*\right) = \frac{\sigma_x^2}{k} - \frac{1}{k^2}\sum_{i=1}^{k}\left(\bar{X}^*_{(i)} - \bar{X}\right)^2 \]
Notations
\[ e_0 = \frac{\bar{Y}^* - \bar{Y}}{\bar{Y}}, \qquad e_1 = \frac{\bar{X}^* - \bar{X}}{\bar{X}}, \qquad E(e_0) = E(e_1) = 0 \]
\[ E(e_1^2) = C_x^2 - D_{x[i]}^2, \qquad E(e_0^2) = C_y^2 - D_{y[i]}^2, \qquad E(e_0 e_1) = C_{xy} - D_{xy[i]} \]
\[ \bar{Y}^* = \bar{Y}(1 + e_0), \qquad \bar{X}^* = \bar{X}(1 + e_1) \]
\[ \bar{y}_{[rss]} = \frac{\bar{Y}^*}{\bar{X}^*}\,\bar{X} = \frac{\bar{Y}(1 + e_0)}{\bar{X}(1 + e_1)}\,\bar{X} \]

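Bias of Ratio Estimator Under RS Sampling

\[ \bar{y}_{[rss]} = \bar{Y}(1 + e_0)(1 + e_1)^{-1} \approx \bar{Y}(1 + e_0)\left(1 - e_1 + e_1^2\right) \]
\[ \bar{y}_{[rss]} \approx \bar{Y}\left(1 - e_1 + e_1^2 + e_0 - e_0 e_1 + e_0 e_1^2\right) \]
\[ \bar{y}_{[rss]} - \bar{Y} \approx \bar{Y}\left(e_0 - e_1 + e_1^2 - e_0 e_1 + e_0 e_1^2\right) \]
Dropping the third-order term,
\[ \bar{y}_{[rss]} - \bar{Y} \approx \bar{Y}\left(e_0 - e_1 + e_1^2 - e_0 e_1\right) \]
\[ E\left(\bar{y}_{[rss]} - \bar{Y}\right) \approx \bar{Y}\,E\left(e_0 - e_1 + e_1^2 - e_0 e_1\right) \]
\[ Bias\left(\bar{y}_{[rss]}\right) \approx \bar{Y}\left[\left(C_x^2 - D_{x[i]}^2\right) - \left(C_{xy} - D_{xy[i]}\right)\right] \]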

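Q. Find the MSE of the Ratio Estimator in Ranked Set Sampling such that
\[ E(e_0) = E(e_1) = 0 \]
\[ E(e_1^2) = \frac{1}{rk}\frac{\sigma_x^2}{\bar{X}^2} - \frac{1}{\bar{X}^2 rk^2}\sum_{i=1}^{k}\left(\mu_{X(i)}^{*}-\mu_X\right)^2 = \frac{1}{rk}C_x^2 - D_{x[i]}^2 \]
\[ E(e_0^2) = \frac{1}{rk}\frac{\sigma_y^2}{\bar{Y}^2} - \frac{1}{\bar{Y}^2 rk^2}\sum_{i=1}^{k}\left(\mu_{Y[i]}^{*}-\mu_Y\right)^2 = \frac{1}{rk}C_y^2 - D_{y[i]}^2 \]
\[ E(e_0 e_1) = \frac{1}{rk}\frac{\sigma_{yx}}{\bar{Y}\bar{X}} - \frac{1}{\bar{Y}\bar{X} rk^2}\sum_{i=1}^{k}\left(\mu_{X(i)}^{*}-\mu_X\right)\left(\mu_{Y[i]}^{*}-\mu_Y\right) = \frac{1}{rk}C_{xy} - D_{xy[i]} \]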
Ans:

The Ratio Estimator of mean for ranked set sampling is


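\[ \bar{y}_{[rss]} = \frac{\bar{Y}^*}{\bar{X}^*}\,\bar{X} \]

Notations

\[ e_0 = \frac{\bar{Y}^* - \bar{Y}}{\bar{Y}}, \qquad e_1 = \frac{\bar{X}^* - \bar{X}}{\bar{X}} \]
\[ E(e_0) = E(e_1) = 0 \]
\[ E(e_1^2) = \frac{1}{rk}\frac{\sigma_x^2}{\bar{X}^2} - \frac{1}{\bar{X}^2 rk^2}\sum_{i=1}^{k}\left(\mu_{X(i)}^{*}-\mu_X\right)^2 = \frac{1}{rk}C_x^2 - D_{x[i]}^2 \]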


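\[ E(e_0^2) = \frac{1}{rk}\frac{\sigma_y^2}{\bar{Y}^2} - \frac{1}{\bar{Y}^2 rk^2}\sum_{i=1}^{k}\left(\mu_{Y[i]}^{*}-\mu_Y\right)^2 = \frac{1}{rk}C_y^2 - D_{y[i]}^2 \]
\[ E(e_0 e_1) = \frac{1}{rk}\frac{\sigma_{yx}}{\bar{Y}\bar{X}} - \frac{1}{\bar{Y}\bar{X} rk^2}\sum_{i=1}^{k}\left(\mu_{X(i)}^{*}-\mu_X\right)\left(\mu_{Y[i]}^{*}-\mu_Y\right) = \frac{1}{rk}C_{xy} - D_{xy[i]} \]
\[ \bar{Y}^* = \bar{Y}(1+e_0), \qquad \bar{X}^* = \bar{X}(1+e_1) \]
\[ \bar{y}_{[rss]} = \frac{\bar{Y}^*}{\bar{X}^*}\,\bar{X} = \frac{\bar{Y}(1+e_0)}{\bar{X}(1+e_1)}\,\bar{X} \]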

MSE of Ratio Estimator Under RS Sampling

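\[ \bar{y}_{[rss]} = \bar{Y}(1+e_0)(1+e_1)^{-1} \approx \bar{Y}(1+e_0)(1-e_1) \]
\[ \bar{y}_{[rss]} \approx \bar{Y}(1 - e_1 + e_0) \;\Rightarrow\; \bar{y}_{[rss]} - \bar{Y} \approx \bar{Y}(e_0 - e_1) \]
\[ E(\bar{y}_{[rss]} - \bar{Y})^2 = \bar{Y}^2 E(e_0 - e_1)^2 \]
\[ MSE(\bar{y}_{[rss]}) = \bar{Y}^2 E(e_0^2 + e_1^2 - 2e_0e_1) \]
\[ MSE(\bar{y}_{[rss]}) = \bar{Y}^2\left[\frac{1}{rk}C_y^2 - D_{y[i]}^2 + \frac{1}{rk}C_x^2 - D_{x[i]}^2 - 2\left(\frac{1}{rk}C_{xy} - D_{xy[i]}\right)\right] \]
\[ MSE(\bar{y}_{[rss]}) = \bar{Y}^2\left[\frac{1}{rk}\left(C_y^2 + C_x^2 - 2C_{xy}\right) - \left(D_{y[i]}^2 + D_{x[i]}^2 - 2D_{xy[i]}\right)\right] \]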

Similarly, the Bias and MSE of Product Estimator Under RSS


The Product Estimator of mean is
\[ \bar{y}_{[pss]} = \frac{\bar{Y}^*\,\bar{X}^*}{\bar{X}} \]
Bias of Product estimator is
\[ Bias(\bar{y}_{[pss]}) = \bar{Y}\left(\frac{1}{rk}C_{xy} - D_{xy[i]}\right) \]
MSE of product estimator


\[ MSE(\bar{y}_{[pss]}) = \bar{Y}^2\left[\frac{1}{rk}\left(C_y^2 + C_x^2 + 2C_{xy}\right) - \left(D_{y[i]}^2 + D_{x[i]}^2 + 2D_{xy[i]}\right)\right] \]


Q. Write some types of sampling schemes using ranked set sampling.
Ranked Set Sampling

x1(1)1 x1(2)1 x1(3)1 x1(4)1 x1(5)1 x1(6)1
x2(1)1 x2(2)1 x2(3)1 x2(4)1 x2(5)1 x2(6)1
x3(1)1 x3(2)1 x3(3)1 x3(4)1 x3(5)1 x3(6)1
x4(1)1 x4(2)1 x4(3)1 x4(4)1 x4(5)1 x4(6)1
x5(1)1 x5(2)1 x5(3)1 x5(4)1 x5(5)1 x5(6)1
x6(1)1 x6(2)1 x6(3)1 x6(4)1 x6(5)1 x6(6)1
From the i-th ranked set, the i-th ranked unit xi(i)1 is measured (the diagonal of the array).

Extreme Ranked Set Sampling


Samawi et al. (1996) suggested a modified RSS design named Extreme Ranked Set Sampling
(ERSS).

x1(1)1 x1(2)1 x1(3)1 x1(4)1 x1(5)1 x1(6)1
x2(1)1 x2(2)1 x2(3)1 x2(4)1 x2(5)1 x2(6)1
x3(1)1 x3(2)1 x3(3)1 x3(4)1 x3(5)1 x3(6)1
x4(1)1 x4(2)1 x4(3)1 x4(4)1 x4(5)1 x4(6)1
x5(1)1 x5(2)1 x5(3)1 x5(4)1 x5(5)1 x5(6)1
x6(1)1 x6(2)1 x6(3)1 x6(4)1 x6(5)1 x6(6)1
For an even set size (k = 6), the smallest unit xi(1)1 is measured from sets 1 to 3 and the largest unit xi(6)1 from sets 4 to 6.

Median Ranked set sampling

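Even set size (k = 6): the 3rd ranked unit is measured from sets 1 to 3 and the 4th ranked unit from sets 4 to 6.
x1(1)1 x1(2)1 x1(3)1 x1(4)1 x1(5)1 x1(6)1
x2(1)1 x2(2)1 x2(3)1 x2(4)1 x2(5)1 x2(6)1
x3(1)1 x3(2)1 x3(3)1 x3(4)1 x3(5)1 x3(6)1
x4(1)1 x4(2)1 x4(3)1 x4(4)1 x4(5)1 x4(6)1
x5(1)1 x5(2)1 x5(3)1 x5(4)1 x5(5)1 x5(6)1
x6(1)1 x6(2)1 x6(3)1 x6(4)1 x6(5)1 x6(6)1

Odd set size (k = 5): the 3rd ranked unit (the median) is measured from every set.
x1(1)1 x1(2)1 x1(3)1 x1(4)1 x1(5)1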

x2(1)1 x2(2)1 x2(3)1 x2(4)1 x2(5)1


x3(1)1 x3(2)1 x3(3)1 x3(4)1 x3(5)1

x4(1)1 x4(2)1 x4(3)1 x4(4)1 x4(5)1

x5(1)1 x5(2)1 x5(3)1 x5(4)1 x5(5)1

Mean in Ranked Set Sampling (RSS)


The mean estimator in RSS is
\[ \bar{X}^* = \frac{1}{k}\sum_{i=1}^{k} X_{(i)}^{*} \]
In case of r cycles
\[ \bar{X}^* = \frac{1}{rk}\sum_{c=1}^{r}\sum_{i=1}^{k} X_{(i)c}^{*} \]

Mean in Extreme Ranked Set Sampling (ERSS)

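For even set size:
\[ \bar{X}^*_{ERSS(e)} = \frac{1}{rk}\sum_{c=1}^{r}\left[\sum_{i=1}^{k/2} X^*_{(i,\,1,\,c)} + \sum_{i=1}^{k/2} X^*_{(k/2+i,\,k,\,c)}\right] \]
For odd set size:
\[ \bar{X}^*_{ERSS(o)} = \frac{1}{rk}\sum_{c=1}^{r}\left[\sum_{i=1}^{(k-1)/2} X^*_{(i,\,1,\,c)} + \sum_{i=1}^{(k-1)/2} X^*_{((k-1)/2+i,\,k,\,c)} + X^*_{(k,\,(k+1)/2,\,c)}\right] \]
Here X^*_{(i, j, c)} denotes the j-th ranked unit of the i-th set in cycle c.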

Mean in Median Ranked Set Sampling (MRSS)

The mean estimator in the case of even set size

\[ \bar{X}^*_{MRSS(e)} = \frac{1}{rk}\sum_{c=1}^{r}\left[\sum_{i=1}^{k/2} X^*_{(i,\,k/2,\,c)} + \sum_{i=1}^{k/2} X^*_{(k/2+i,\,(k+2)/2,\,c)}\right] \]

The mean estimator in the case of odd set size

\[ \bar{X}^*_{MRSS(o)} = \frac{1}{rk}\sum_{c=1}^{r}\sum_{i=1}^{k} X^*_{(i,\,(k+1)/2,\,c)} \]

Pair Ranked Set Sampling and Double Ranked Set Sampling are further developments in Ranked
set sampling.
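The three selection rules for an even set size can be sketched in R for a single cycle. This is only an illustration under assumed settings (seed, normal population, k = 6, and the object names are all chosen here, not taken from the handout):

```r
set.seed(632)                                   # arbitrary seed
k    <- 6                                       # even set size (assumed)
pop  <- rnorm(1000, 0, 1)
sets <- t(replicate(k, sort(sample(pop, k))))   # row i = i-th ranked set

first <- 1:(k / 2)                              # sets 1 .. k/2
last  <- (k / 2 + 1):k                          # sets k/2+1 .. k

rss_units  <- diag(sets)                                    # i-th rank from set i
erss_units <- c(sets[first, 1],     sets[last, k])          # minima, then maxima
mrss_units <- c(sets[first, k / 2], sets[last, k / 2 + 1])  # (k/2)-th, then ((k+2)/2)-th

est <- c(rss = mean(rss_units), erss = mean(erss_units), mrss = mean(mrss_units))
est
```

Each scheme measures exactly k units per cycle; only the rank taken from each set differs.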


Dealing with Non-response


Introduction:
Some of the individuals chosen for the sample are not willing to participate in the survey. This is a
type of selection bias known as unit or item nonresponse. The problem of non-response can be dealt
with using the following methods.
Sub-sampling of non-respondents
Randomized response technique
Hansen and Hurwitz (1946) technique
Take a sub-sample of non-respondents after the first mail attempt, and then enumerate the
sub-sample of non-respondents by personal interview.
Hansen & Hurwitz (1946) Estimator
\[ \bar{y}^* = \frac{n_1}{n}\bar{y}_1 + \frac{n_2}{n}\bar{y}_2' \]

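Q. Prove that the following Hansen and Hurwitz estimator is unbiased for the population mean.
\[ \bar{y}^* = \frac{n_1}{n}\bar{y}_1 + \frac{n_2}{n}\bar{y}_2' \]
Ans:
Hansen & Hurwitz (1946) Estimator
\[ \bar{y}^* = \frac{n_1}{n}\bar{y}_1 + \frac{n_2}{n}\bar{y}_2' \]
\[ E(\bar{y}^*) = E\left[\frac{n_1}{n}\bar{y}_1 + \frac{n_2}{n}\bar{y}_2'\right] \]
\[ E(\bar{y}^*) = E_1E_2\left[\frac{n_1}{n}\bar{y}_1 + \frac{n_2}{n}\bar{y}_2'\right] \]
Given the first sample, the sub-sample mean is unbiased for the mean of the n_2 non-respondents, E_2(\bar{y}_2') = \bar{y}_2, so
\[ E(\bar{y}^*) = E_1\left[\frac{n_1\bar{y}_1 + n_2\bar{y}_2}{n}\right] = E_1(\bar{y}) \]
where \bar{y} is the mean of the full sample of n units.
Hansen & Hurwitz (1946) is Unbiased
\[ E(\bar{y}^*) = \bar{Y} \]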
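Q. Find the Variance of the following Hansen & Hurwitz (1946) Estimator.
\[ \bar{y}^* = \frac{n_1}{n}\bar{y}_1 + \frac{n_2}{n}\bar{y}_2' \]
Ans:
The Hansen & Hurwitz (1946) Estimator is
\[ \bar{y}^* = \frac{n_1}{n}\bar{y}_1 + \frac{n_2}{n}\bar{y}_2' \]
Variance
\[ \bar{y}^* = \frac{1}{n}\left(n_1\bar{y}_1 + n_2\bar{y}_2\right) + \frac{n_2}{n}\left(\bar{y}_2' - \bar{y}_2\right) \]
\[ V(\bar{y}^*) = V\left[\frac{1}{n}\left(n_1\bar{y}_1 + n_2\bar{y}_2\right) + \frac{n_2}{n}\left(\bar{y}_2' - \bar{y}_2\right)\right] \]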


1 
V   n1 y1  n2 y2    S y2
n 

n22 n22
V  y2  y2   2 E  y2'  y2 
' 2
2
n n

n22 n22  '


E  y2  y2   2 E  y2  Y2 
2
 ( y2  Y2 ) 
' 2

n2 n  

n22 
2   2
E y '  Y2   E  y2  Y2   2E  y2'  Y2  ( y2  Y2 ) 
2 2

n  

n22 
 2  E  y2'  Y2   E  y2  Y2  
2 2

n  

 
n22  1 1 2
 2   S y (2)
n  n2 n2 
 
 k 

n2
2 
 k  1 S y2(2)
n

  S y2(2)

V ( y * )   S y2   S y2(2)
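150. Generate a population consisting of 1000 values in R and calculate the Hansen &
Hurwitz (1946) Estimator such that n1=80; n2=20; r=10.

N=1000; n=100;n1=80;n2=20;r=10;
pop <- rnorm(N,0,1)
s <- sample(pop,n)            # first-phase sample of n units
s_r<-s[1:n1]                  # treated as the n1 respondents
s_nr<-s[(n1+1):n]             # treated as the n2 non-respondents
s2<-sample(s_nr,r)            # sub-sample of r non-respondents

m<-(n1/n)*mean(s_r)+(n2/n)*mean(s2)   # Hansen & Hurwitz estimate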
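As an informal check of unbiasedness, the estimate can be recomputed over many repeated samples. The sketch below assumes the same artificial setup as the exercise (respondents and non-respondents are simply the first n1 and last n2 sampled units); the seed, the 5000 replications, and the function name `hh_estimate` are choices made here for illustration:

```r
set.seed(632)                          # arbitrary seed
N <- 1000; n <- 100; n1 <- 80; n2 <- 20; r <- 10
pop <- rnorm(N, 0, 1)

# Hypothetical helper: one Hansen & Hurwitz estimate from a fresh sample.
hh_estimate <- function() {
  s    <- sample(pop, n)
  s_r  <- s[1:n1]                      # respondents
  s_nr <- s[(n1 + 1):n]                # non-respondents
  s2   <- sample(s_nr, r)              # sub-sample of non-respondents
  (n1 / n) * mean(s_r) + (n2 / n) * mean(s2)
}

est <- replicate(5000, hh_estimate())
c(mean_of_estimates = mean(est), population_mean = mean(pop))
```

The average of the simulated estimates should sit close to the population mean, which is what the unbiasedness result predicts.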

Q. Find the expression of Bias of ratio estimator in case of nonresponse on study variable
only, where


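\[ E(e_0^*) = E(e_1) = 0 \]
\[ E(e_1^2) = \left(\frac{N-n}{nN}\right)\frac{S_x^2}{\bar{X}^2} = \lambda C_x^2 \]
\[ E(e_0^{*2}) = \frac{1}{\bar{Y}^2}\left(\lambda S_y^2 + \theta S_{y(2)}^2\right) = C_y^{*2} \]
\[ E(e_0^* e_1) = \lambda\frac{S_{yx}}{\bar{Y}\bar{X}} = \lambda C_{yx} \]
with \lambda = (N-n)/(nN) and \theta = n_2(k-1)/n^2.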

Ans:
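The ratio estimator when non-response occurs in the study variable y is as follows
\[ \bar{y}_r^* = \frac{\bar{y}^*}{\bar{x}}\,\bar{X} \]
\[ V(\bar{y}^*) = \lambda S_y^2 + \theta S_{y(2)}^2 \]
Notations
\[ e_0^* = \frac{\bar{y}^* - \bar{Y}}{\bar{Y}}, \qquad e_1 = \frac{\bar{x} - \bar{X}}{\bar{X}} \]
\[ E(e_0^*) = E(e_1) = 0 \]
\[ E(e_1^2) = \left(\frac{N-n}{nN}\right)\frac{S_x^2}{\bar{X}^2} = \lambda C_x^2 \]
\[ E(e_0^{*2}) = \frac{1}{\bar{Y}^2}\left(\lambda S_y^2 + \theta S_{y(2)}^2\right) = C_y^{*2} \]
\[ E(e_0^* e_1) = \lambda\frac{S_{yx}}{\bar{Y}\bar{X}} = \lambda C_{yx} \]
\[ \bar{y}^* = \bar{Y}(1+e_0^*), \qquad \bar{x} = \bar{X}(1+e_1) \]
\[ \bar{y}_r^* = \frac{\bar{y}^*}{\bar{x}}\,\bar{X} = \frac{\bar{Y}(1+e_0^*)}{\bar{X}(1+e_1)}\,\bar{X} \]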

Bias of Ratio Estimator Under Non-response


 
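\[ \bar{y}_r^* = \bar{Y}(1+e_0^*)(1+e_1)^{-1} \approx \bar{Y}(1+e_0^*)(1-e_1+e_1^2) \]
\[ \bar{y}_r^* = \bar{Y}(1 - e_1 + e_1^2 + e_0^* - e_0^*e_1 + e_0^*e_1^2) \]
\[ \bar{y}_r^* - \bar{Y} = \bar{Y}(e_0^* - e_1 + e_1^2 - e_0^*e_1 + e_0^*e_1^2) \]
\[ \bar{y}_r^* - \bar{Y} \approx \bar{Y}(e_0^* - e_1 + e_1^2 - e_0^*e_1) \]
\[ E(\bar{y}_r^* - \bar{Y}) = \bar{Y}\,E(e_0^* - e_1 + e_1^2 - e_0^*e_1) \]
\[ Bias(\bar{y}_r^*) = \lambda\bar{Y}\left(C_x^2 - C_{yx}\right) \]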
Q. Find the expression of MSE of ratio estimator in case of nonresponse on study variable
only, where
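\[ E(e_0^*) = E(e_1) = 0 \]
\[ E(e_1^2) = \left(\frac{N-n}{nN}\right)\frac{S_x^2}{\bar{X}^2} = \lambda C_x^2 \]
\[ E(e_0^{*2}) = \frac{1}{\bar{Y}^2}\left(\lambda S_y^2 + \theta S_{y(2)}^2\right) = C_y^{*2} \]
\[ E(e_0^* e_1) = \lambda\frac{S_{yx}}{\bar{Y}\bar{X}} = \lambda C_{yx} \]
with \lambda = (N-n)/(nN) and \theta = n_2(k-1)/n^2.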

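Ans.
\[ \bar{y}_r^* = \frac{\bar{y}^*}{\bar{x}}\,\bar{X} \]
\[ \bar{y}^* = \bar{Y}(1+e_0^*), \qquad \bar{x} = \bar{X}(1+e_1) \]
\[ \bar{y}_r^* = \frac{\bar{Y}(1+e_0^*)}{\bar{X}(1+e_1)}\,\bar{X} \]
\[ \bar{y}_r^* = \bar{Y}(1+e_0^*)(1+e_1)^{-1} \approx \bar{Y}(1+e_0^*)(1-e_1) \]
\[ \bar{y}_r^* - \bar{Y} \approx \bar{Y}(e_0^* - e_1) \]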


\[ E(\bar{y}_r^* - \bar{Y})^2 = \bar{Y}^2 E(e_0^* - e_1)^2 \]
\[ MSE(\bar{y}_r^*) = \bar{Y}^2 E(e_0^{*2} + e_1^2 - 2e_0^*e_1) \]
\[ MSE(\bar{y}_r^*) = \bar{Y}^2\left[C_y^{*2} + \lambda C_x^2 - 2\lambda C_{yx}\right] \]

Similarly, Bias and MSE of Product Estimator

The product estimator when non-response occurs in the study variable y is as follows
\[ \bar{y}_p^* = \frac{\bar{y}^*\,\bar{x}}{\bar{X}} \]
Bias and MSE of Product Estimator Under Non-response
\[ Bias(\bar{y}_p^*) = \lambda\bar{Y}C_{yx} \]
\[ MSE(\bar{y}_p^*) = \bar{Y}^2\left[C_y^{*2} + \lambda C_x^2 + 2\lambda C_{yx}\right] \]

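Q. Find Bias and MSE of Ratio Estimator in Case of Non-response on Both Variables,
where
\[ E(e_0^*) = E(e_1^*) = 0 \]
\[ E(e_0^{*2}) = \frac{1}{\bar{Y}^2}\left(\lambda S_y^2 + \theta S_{y(2)}^2\right) = C_y^{*2} \]
\[ E(e_1^{*2}) = \frac{1}{\bar{X}^2}\left(\lambda S_x^2 + \theta S_{x(2)}^2\right) = C_x^{*2} \]
\[ E(e_0^* e_1^*) = \frac{1}{\bar{Y}\bar{X}}\left(\lambda S_{yx} + \theta S_{yx(2)}\right) = C_{yx}^{*} \]
with \lambda = (N-n)/(nN) and \theta = n_2(k-1)/n^2.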
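Ans
The ratio estimator when non-response occurs in both variables is as follows

\[ \bar{y}_r^{**} = \frac{\bar{y}^*}{\bar{x}^*}\,\bar{X} \]

Variance of Hansen & Hurwitz (1946) Estimator

\[ V(\bar{y}^*) = \lambda S_y^2 + \theta S_{y(2)}^2 \]

\[ V(\bar{x}^*) = \lambda S_x^2 + \theta S_{x(2)}^2 \]

\[ e_0^* = \frac{\bar{y}^* - \bar{Y}}{\bar{Y}} \]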


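\[ e_1^* = \frac{\bar{x}^* - \bar{X}}{\bar{X}} \]
\[ E(e_0^*) = E(e_1^*) = 0 \]
\[ E(e_0^{*2}) = \frac{1}{\bar{Y}^2}\left(\lambda S_y^2 + \theta S_{y(2)}^2\right) = C_y^{*2} \]
\[ E(e_1^{*2}) = \frac{1}{\bar{X}^2}\left(\lambda S_x^2 + \theta S_{x(2)}^2\right) = C_x^{*2} \]
\[ E(e_0^* e_1^*) = \frac{1}{\bar{Y}\bar{X}}\left(\lambda S_{yx} + \theta S_{yx(2)}\right) = C_{yx}^{*} \]
\[ \bar{y}^* = \bar{Y}(1+e_0^*), \qquad \bar{x}^* = \bar{X}(1+e_1^*) \]
\[ \bar{y}_r^{**} = \frac{\bar{y}^*}{\bar{x}^*}\,\bar{X} = \frac{\bar{Y}(1+e_0^*)}{\bar{X}(1+e_1^*)}\,\bar{X} \]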
Bias of Ratio Estimator Under Non-response
\[ \bar{y}_r^{**} = \bar{Y}(1+e_0^*)(1+e_1^*)^{-1} \approx \bar{Y}(1+e_0^*)(1-e_1^*+e_1^{*2}) \]
\[ \bar{y}_r^{**} = \bar{Y}(1 - e_1^* + e_1^{*2} + e_0^* - e_0^*e_1^* + e_0^*e_1^{*2}) \]
\[ \bar{y}_r^{**} - \bar{Y} = \bar{Y}(e_0^* - e_1^* + e_1^{*2} - e_0^*e_1^* + e_0^*e_1^{*2}) \]
\[ \bar{y}_r^{**} - \bar{Y} \approx \bar{Y}(e_0^* - e_1^* + e_1^{*2} - e_0^*e_1^*) \]
\[ E(\bar{y}_r^{**} - \bar{Y}) = \bar{Y}\,E(e_0^* - e_1^* + e_1^{*2} - e_0^*e_1^*) \]
\[ Bias(\bar{y}_r^{**}) = \bar{Y}\left(C_x^{*2} - C_{yx}^{*}\right) \]
MSE of Ratio Estimator Under Non-response
\[ \bar{y}_r^{**} = \bar{Y}(1+e_0^*)(1+e_1^*)^{-1} \approx \bar{Y}(1+e_0^*)(1-e_1^*) \]
\[ \bar{y}_r^{**} \approx \bar{Y}(1 - e_1^* + e_0^*) \;\Rightarrow\; \bar{y}_r^{**} - \bar{Y} \approx \bar{Y}(e_0^* - e_1^*) \]


\[ E(\bar{y}_r^{**} - \bar{Y})^2 = \bar{Y}^2 E(e_0^* - e_1^*)^2 \]
\[ MSE(\bar{y}_r^{**}) = \bar{Y}^2 E(e_0^{*2} + e_1^{*2} - 2e_0^*e_1^*) \]
\[ MSE(\bar{y}_r^{**}) = \bar{Y}^2\left[C_y^{*2} + C_x^{*2} - 2C_{yx}^{*}\right] \]

The product estimator when non-response occurs in both variables is as follows

\[ \bar{y}_p^{**} = \frac{\bar{y}^*\,\bar{x}^*}{\bar{X}} \]
Bias and MSE of Product Estimator Under Non-response
\[ Bias(\bar{y}_p^{**}) = \bar{Y}C_{yx}^{*} \]
\[ MSE(\bar{y}_p^{**}) = \bar{Y}^2\left[C_y^{*2} + C_x^{*2} + 2C_{yx}^{*}\right] \]


Randomized Response Technique


Introduction:
This technique is useful for estimating sensitive characteristics in a population. It was first
proposed by S. L. Warner in 1965. There are qualitative and quantitative response models.
Qualitative response models are used to estimate the proportion of some behavior or occurrence
in a population, for example the "proportion of people who smoke cigarettes today".
Warner introduced the following model
\[ Z = Yp + (1-p)(1-Y) \]
where p is the probability of answering the sensitive question, Y is the true proportion of those
interviewed bearing the sensitive property, and Z is the proportion of YES answers.
Solving the Warner model for Y gives
\[ Y = \frac{Z + p - 1}{2p - 1} \]
For Example
Statement 1: "I smoke cigarettes."
Statement 2: "I never smoke cigarettes."
With Z = 3/4 and p = 1/6,
\[ Y = \frac{\frac{3}{4} + \frac{1}{6} - 1}{2\cdot\frac{1}{6} - 1} = \frac{1}{8} \]

Example 1.
Statement 1: "I have falsified my tax return."
Statement 2: "I have never falsified my tax return."
The Warner model
\[ Z = Yp + (1-p)(1-Y) \]
\[ \frac{40}{50} = Y\left(\frac{1}{6}\right) + \left(1 - \frac{1}{6}\right)(1-Y) \]
\[ Y = \frac{\frac{4}{5} + \frac{1}{6} - 1}{2\cdot\frac{1}{6} - 1} = 0.05 \]
Example 2.

Statement 1: "Have you ever used a sick day leave when you weren't really sick?"
Statement 2: "Have you never used a sick day leave when you weren't really sick?"


The Warner model
\[ Z = Yp + (1-p)(1-Y) \]
\[ \frac{350}{500} = Y\left(\frac{1}{6}\right) + \left(1 - \frac{1}{6}\right)(1-Y) \]
\[ Y = \frac{\frac{7}{10} + \frac{1}{6} - 1}{2\cdot\frac{1}{6} - 1} \]
\[ Y = 0.2 \]
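The transformed Warner model is a one-line calculation, so the worked examples can be checked in R. The function name `warner_estimate` is introduced here purely for illustration:

```r
# Hypothetical helper: solve Z = Y*p + (1 - p)*(1 - Y) for Y.
warner_estimate <- function(z, p) {
  stopifnot(p != 0.5)              # the model cannot be inverted when p = 1/2
  (z + p - 1) / (2 * p - 1)
}

warner_estimate(3 / 4, 1 / 6)      # smoking example
warner_estimate(40 / 50, 1 / 6)    # Example 1 (tax return)
warner_estimate(350 / 500, 1 / 6)  # Example 2 (sick leave)
```

The guard on p reflects the denominator 2p - 1: with p = 1/2 the proportion of YES answers carries no information about Y.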

Quantitative Randomized Response Technique.


Y may be the monthly income of the head of a household. Y may be the total value of purchase
orders in a year for a company. Y may be the number of times a student skips class during a
semester.
Let S be a scrambling variable independent of Y. The respondent is asked to report a scrambled
response for Y given by
\[ Z = Y + S \]
Mean of Response
The variable S is introduced such that
\[ E(S) = 0 \]
\[ E(Z) = E(Y) \]
Variance of Response
\[ V(Z) = V(Y) + V(S) \]
Q. Generate a population of size 1000 using a normal distribution with mean 10 and standard
deviation 2. Consider sample size n=100 and select 10,000 random samples using the model
Z=S+Y. The variable S is taken to be a normal variate with mean equal to zero and standard
deviation equal to 1.
(i) Calculate the following estimator under SRSWOR
t_z
(ii) Calculate the variance of the above estimator

Ans:

Let S be a scrambling variable independent of Y. The respondent is asked to report a scrambled
response for Y given by
\[ Z = Y + S \]
# Estimation of Mean (single sample)
N=1000; n=100;
y <- rnorm(N, 10, 2)      # sensitive study variable


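s <- rnorm(N, 0, 1)       # scrambling variable with mean 0
sa <- sample(1:N, n)      # SRSWOR of n unit labels
sy<-y[sa]; ss=s[sa];
z<-sy+ss                  # scrambled responses Z = Y + S
mz=mean(z)
# Repeat over 10,000 samples
N=1000; n=100;mz=c();
y <- rnorm(N, 10, 2)
s <- rnorm(N, 0, 1)
for(i in 1:10000)
{
sa <- sample(1:N, n)
sy<-y[sa]; ss=s[sa];
mz[i]=mean(sy+ss);
}
mean(mz)                  # (i) average of the estimator over the samples
var(mz)                   # (ii) variance of the estimator over the samples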
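The simulated variance of the mean can be set against the usual SRSWOR formula applied to the scrambled values, (1/n - 1/N) S_z^2. The sketch below is one way to do that; the seed and the reduced number of replications (5000) are assumptions made to keep it quick:

```r
set.seed(632)                      # arbitrary seed
N <- 1000; n <- 100
y <- rnorm(N, 10, 2)               # sensitive variable
s <- rnorm(N, 0, 1)                # scrambling variable
z <- y + s                         # scrambled population values

# Means of z over repeated SRSWOR samples of size n
mz <- replicate(5000, mean(z[sample(1:N, n)]))

theory <- (1 / n - 1 / N) * var(z) # SRSWOR variance of the sample mean of z
c(simulated = var(mz), theoretical = theory)
```

Because var(z) is close to V(Y) + V(S) = 4 + 1, both numbers should be near (1/100 - 1/1000) * 5 = 0.045.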
Quantitative RRT with Non-Sensitive Auxiliary Variable.
Y may be the monthly income of the head of a household and X may be her current age. Y may be
the total value of purchase orders in a year for a company and X may be the total turnover for
that company in that year.
Scrambled Response
Let Y be the study variable, a sensitive variable which cannot be observed directly due to
respondent bias. Let X be a non-sensitive auxiliary variable which has a positive correlation with
Y. Let S be a scrambling variable independent of Y and X.
The respondent is asked to report a scrambled response for Y given by
\[ Z = Y + S \]
The respondent is asked to provide a true response for X.
Mean & Variance of Z
The variable S is introduced such that
\[ E(S) = 0 \]
\[ E(Z) = E(Y) \]
\[ V(Z) = V(Y) + V(S) \]
\[ C_z^2 = C_y^2 + \frac{V(S)}{\bar{Y}^2} \]
Ratio Estimator of Mean for Sensitive Variable
\[ \bar{y}_{rs} = \frac{\bar{z}}{\bar{x}}\,\bar{X} \]
Q. Find Bias & MSE of Ratio Estimator Using RRT.
The Ratio Estimator is
\[ \bar{y}_{rrt} = \frac{\bar{z}}{\bar{x}}\,\bar{X} \]
Notations


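\[ e_0 = \frac{\bar{z} - \bar{Z}}{\bar{Z}}, \qquad e_1 = \frac{\bar{x} - \bar{X}}{\bar{X}}, \qquad E(e_0) = E(e_1) = 0 \]
\[ E(e_0^2) = \frac{1-f}{n}C_z^2, \qquad E(e_1^2) = \frac{1-f}{n}C_x^2, \qquad E(e_0e_1) = \frac{1-f}{n}C_{zx} \]
where C_{zx} = \rho_{zx}C_zC_x and f = n/N.
\[ \bar{z} = \bar{Z}(1+e_0), \qquad \bar{x} = \bar{X}(1+e_1) \]
\[ \bar{y}_{rrt} = \frac{\bar{z}}{\bar{x}}\,\bar{X} = \frac{\bar{Z}(1+e_0)}{\bar{X}(1+e_1)}\,\bar{X} \]
Bias of Ratio Estimator Using RRT
Since E(Z) = E(Y), \bar{Z} = \bar{Y}, so
\[ \bar{y}_{rrt} = \bar{Y}(1+e_0)(1+e_1)^{-1} \approx \bar{Y}(1+e_0)(1-e_1+e_1^2) \]
\[ Bias(\bar{y}_{rrt}) = \frac{1-f}{n}\,\bar{Y}\left[C_x^2 - \rho_{zx}C_zC_x\right] \]

For MSE of Ratio Estimator
\[ \bar{y}_{rrt} = \bar{Y}(1+e_0)(1+e_1)^{-1} \approx \bar{Y}(1+e_0)(1-e_1) \]
\[ \bar{y}_{rrt} - \bar{Y} \approx \bar{Y}(e_0 - e_1) \]
\[ E(\bar{y}_{rrt} - \bar{Y})^2 = \bar{Y}^2 E(e_0^2 + e_1^2 - 2e_0e_1) \]
\[ MSE(\bar{y}_{rrt}) = \frac{1-f}{n}\,\bar{Y}^2\left[C_z^2 + C_x^2 - 2\rho_{zx}C_zC_x\right] \]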


Q. Find MSE of Ratio Estimator with Two Auxiliary Variables
Ans:
\[ \bar{y}_{rr} = \bar{y}\,\frac{\bar{X}}{\bar{x}}\,\frac{\bar{Z}}{\bar{z}} \]
Notations


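\[ e_0 = \frac{\bar{y} - \bar{Y}}{\bar{Y}}, \qquad e_1 = \frac{\bar{x} - \bar{X}}{\bar{X}}, \qquad e_2 = \frac{\bar{z} - \bar{Z}}{\bar{Z}} \]
\[ E(e_0) = E(e_1) = E(e_2) = 0 \]
\[ E(e_0^2) = \frac{1-f}{n}C_y^2, \quad E(e_1^2) = \frac{1-f}{n}C_x^2, \quad E(e_2^2) = \frac{1-f}{n}C_z^2, \quad E(e_0e_1) = \frac{1-f}{n}C_{yx} \]
where C_{yx} = \rho_{yx}C_yC_x, with C_{yz} and C_{xz} defined in the same way, and f = n/N.
\[ \bar{y} = \bar{Y}(1+e_0), \qquad \bar{x} = \bar{X}(1+e_1), \qquad \bar{z} = \bar{Z}(1+e_2) \]
\[ \bar{y}_{rr} = \bar{y}\,\frac{\bar{X}}{\bar{x}}\,\frac{\bar{Z}}{\bar{z}} = \frac{\bar{Y}(1+e_0)}{\bar{X}(1+e_1)\,\bar{Z}(1+e_2)}\,\bar{X}\bar{Z} \]
MSE of the Estimator
\[ \bar{y}_{rr} = \bar{Y}(1+e_0)(1+e_1)^{-1}(1+e_2)^{-1} \approx \bar{Y}(1+e_0)(1-e_1)(1-e_2) \]
\[ \bar{y}_{rr} = \bar{Y}(1+e_0)(1 - e_1 - e_2 + e_1e_2) \approx \bar{Y}(1+e_0)(1 - e_1 - e_2) \]
\[ \bar{y}_{rr} \approx \bar{Y}(1 + e_0 - e_1 - e_2) \;\Rightarrow\; \bar{y}_{rr} - \bar{Y} \approx \bar{Y}(e_0 - e_1 - e_2) \]
\[ E(\bar{y}_{rr} - \bar{Y})^2 = \bar{Y}^2 E(e_0 - e_1 - e_2)^2 \]
\[ MSE(\bar{y}_{rr}) = \bar{Y}^2 E(e_0^2 + e_1^2 + e_2^2 - 2e_0e_1 - 2e_0e_2 + 2e_1e_2) \]
\[ MSE(\bar{y}_{rr}) = \frac{1-f}{n}\,\bar{Y}^2\left[C_y^2 + C_x^2 + C_z^2 - 2C_{yx} - 2C_{yz} + 2C_{xz}\right] \]
\[ MSE(\bar{y}_{rr}) = \frac{1-f}{n}\,\bar{Y}^2\left[C_y^2 + C_x^2 + C_z^2 - 2\left(C_{yx} + C_{yz} - C_{xz}\right)\right] \]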
Q. Define Regression Estimator with Two Auxiliary Variables, also find variance
expression of the estimator.
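Ans:
Regression Estimator with Two Auxiliary Variables for population mean
\[ \bar{y}_{rreg} = \bar{y} + \hat{\beta}_{yx}(\bar{X} - \bar{x}) + \hat{\beta}_{yz}(\bar{Z} - \bar{z}) \]
\[ \hat{\beta}_{yx} = \frac{S_{yx}}{S_x^2}, \qquad \hat{\beta}_{yz} = \frac{S_{yz}}{S_z^2} \]
Notations
\[ e_0 = \frac{\bar{y} - \bar{Y}}{\bar{Y}}, \qquad e_1 = \frac{\bar{x} - \bar{X}}{\bar{X}}, \qquad e_2 = \frac{\bar{z} - \bar{Z}}{\bar{Z}} \]
\[ E(e_0) = E(e_1) = E(e_2) = 0 \]
\[ E(e_0^2) = \frac{1-f}{n}C_y^2, \quad E(e_1^2) = \frac{1-f}{n}C_x^2, \quad E(e_2^2) = \frac{1-f}{n}C_z^2, \quad E(e_0e_1) = \frac{1-f}{n}C_{yx} \]
where C_{yx} = \rho_{yx}C_yC_x and f = n/N.
Variance of Regression Estimator
\[ \bar{y}_{rreg} = \bar{y} + \hat{\beta}_{yx}(\bar{X} - \bar{x}) + \hat{\beta}_{yz}(\bar{Z} - \bar{z}) \]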


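\[ \bar{y}_{rreg} = \bar{Y}(1+e_0) + \hat{\beta}_{yx}\left[\bar{X} - \bar{X}(1+e_1)\right] + \hat{\beta}_{yz}\left[\bar{Z} - \bar{Z}(1+e_2)\right] \]
\[ \bar{y}_{rreg} = \bar{Y}(1+e_0) - \hat{\beta}_{yx}\bar{X}e_1 - \hat{\beta}_{yz}\bar{Z}e_2 \]
\[ E(\bar{y}_{rreg} - \bar{Y})^2 = E\left[\bar{Y}e_0 - \hat{\beta}_{yx}\bar{X}e_1 - \hat{\beta}_{yz}\bar{Z}e_2\right]^2 \]
\[ V(\bar{y}_{rreg}) = E\left[\bar{Y}^2e_0^2 + \hat{\beta}_{yx}^2\bar{X}^2e_1^2 + \hat{\beta}_{yz}^2\bar{Z}^2e_2^2 - 2\bar{Y}\bar{X}\hat{\beta}_{yx}e_0e_1 - 2\bar{Y}\bar{Z}\hat{\beta}_{yz}e_0e_2 + 2\bar{X}\bar{Z}\hat{\beta}_{yz}\hat{\beta}_{yx}e_1e_2\right] \]
\[ V(\bar{y}_{rreg}) = \frac{1-f}{n}\left[\bar{Y}^2C_y^2 + \hat{\beta}_{yx}^2\bar{X}^2C_x^2 + \hat{\beta}_{yz}^2\bar{Z}^2C_z^2 - 2\bar{Y}\bar{X}\hat{\beta}_{yx}C_{yx} - 2\bar{Y}\bar{Z}\hat{\beta}_{yz}C_{yz} + 2\bar{X}\bar{Z}\hat{\beta}_{yz}\hat{\beta}_{yx}C_{xz}\right] \]
\[ V(\bar{y}_{rreg}) = \frac{1-f}{n}\,S_y^2\left[1 - \rho_{yx}^2 - \rho_{yz}^2 + 2\rho_{yx}\rho_{yz}\rho_{xz}\right] \]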


Designs for Hard To Detect Populations


Q. Define Capture Recapture Sampling.

Select a sample, mark the selected units, and release them back into the population.
Select a second sample independently from the population.
Estimate the proportion of marked units from the second sample.
Suppose
T = Number of animals in the population
K = Number of animals marked on the first visit
n = Number of animals captured on the second visit
k = Number of recaptured animals that were marked
The proportion of marked subjects in the recaptured sample is likely to be about the same as
the proportion of marked subjects in the whole population
\[ \frac{k}{n} \approx \frac{K}{T} \]
The estimated population size is
\[ \hat{T} = \frac{nK}{k} \]
Example 1.
Suppose we want to estimate the number of homeless people in a city. An initial sample of 100
homeless people is selected, marked and released. A sample of size 200 is selected on the second
visit, and 50 of them had been selected on the first visit.
T=??, K=100, n=200, k=50
\[ \hat{T} = \frac{nK}{k} = \frac{200 \times 100}{50} = 400 \]
Example 2.
In a field study K =300 mice are caught in traps, tagged, and released. A few days later the
researchers return to the study area and independently capture n =200 mice, of which they find
that k =50 have tags. T=??, K=300, n=200, k=50
\[ \hat{T} = \frac{nK}{k} = \frac{200 \times 300}{50} = 1200 \]
Example 3.
We are interested in estimating the size of a population of turtles in a wildlife preserve. An initial
sample of 20 turtles is selected, marked and released. A sample of size 30 is selected on the
second visit and 10 marked turtles are found among them. T=??, K=20, n=30, k=10
\[ \hat{T} = \frac{nK}{k} = \frac{30 \times 20}{10} = 60 \]
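The three examples apply the same formula, so they can be verified with a small R function. The name `cr_estimate` is a label chosen here for illustration:

```r
# Hypothetical helper: capture-recapture estimate of population size, T-hat = n*K/k.
cr_estimate <- function(K, n, k) {
  stopifnot(k > 0)                 # at least one marked unit must be recaptured
  n * K / k
}

cr_estimate(K = 100, n = 200, k = 50)   # Example 1: homeless people
cr_estimate(K = 300, n = 200, k = 50)   # Example 2: mice
cr_estimate(K = 20,  n = 30,  k = 10)   # Example 3: turtles
```

The check on k matters in practice: if no marked unit is recaptured, the ratio is undefined and the estimate cannot be formed.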

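Q. Define Line and Point Transects

Line Transects
Mostly used for animal or plant species. Select a sample of n lines of
length L. The observer moves along a selected line and notes the location,
relative to the line, of every individual of the species detected.
Narrow-Strip Method
Only the y0 individuals detected within a distance w0 of the line are counted. With A the area of the study region,
\[ \hat{D} = \frac{y_0}{2w_0L} \]
\[ \text{Est. Total} = A\hat{D} = \frac{Ay_0}{2w_0L} \]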


Example
On a line transect of length L =100 meters, a total of y =18 birds were detected at the following
distances (in meters) from the transect line
0, 0, 1, 3, 7, 11, 11, 12, 15, 15, 18, 19, 21, 23, 28, 33, 34, 44.
It is desired to estimate the density of birds in the study region, taking w0=20. Twelve of the
distances are within 20 meters, so y0 = 12 and
\[ \hat{D} = \frac{y_0}{2w_0L} = \frac{12}{2 \times 20 \times 100} = 0.003 \text{ birds per square meter} \]
Point Transects
Sites are selected in the same way as lines are selected in line transects.
Observations are obtained at the selected sites.
Estimation is the same as in line transects.
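The narrow-strip calculation for the bird example can be reproduced in a few lines of R. The variable names are chosen here for illustration, and a detection at exactly w0 is assumed to count as inside the strip:

```r
# Perpendicular distances (meters) of the 18 detected birds from the transect line
distances <- c(0, 0, 1, 3, 7, 11, 11, 12, 15, 15, 18, 19, 21, 23, 28, 33, 34, 44)
L  <- 100                          # transect length in meters
w0 <- 20                           # half-width of the narrow strip

y0    <- sum(distances <= w0)      # detections inside the strip
D_hat <- y0 / (2 * w0 * L)         # estimated density per square meter

c(y0 = y0, D_hat = D_hat)
```

Multiplying D_hat by the area A of the study region, when it is known, gives the estimated total.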
163. Explain Adaptive Cluster Sampling
In adaptive cluster sampling the selection procedure depends on the observations made during the survey.
Select an initial sample of size n with a suitable design.
Observe the selected units for a specified condition.
If any initially selected unit satisfies the pre-defined condition, its adjacent neighboring
units are also sampled and investigated.

The sample mean is
\[ \bar{w}_y = \frac{1}{n}\sum_{i=1}^{n} w_{yi} \]

