Comparison and Evaluation of Alternative System Designs

Chapter 12
Prof. Dr. Mesut Güneş
Contents
•  For two-system comparisons
•  Independent sampling
•  Correlated sampling (common random numbers)
•  For multiple system comparisons
•  Bonferroni approach: confidence-interval estimation, screening, and
selecting the best
•  Metamodels
Purpose
•  Purpose: comparison of alternative system designs.
•  Approach: discuss a few of many statistical methods that can be
used to compare two or more system designs.
•  Statistical analysis is needed to discover whether observed
differences are due to:
•  Differences in design or
•  The random fluctuation inherent in the models
Comparison of Two System Designs
•  Goal: compare two possible configurations of a system
•  Two possible ordering policies in a supply-chain system, two possible
scheduling rules in a job shop
•  Two routing protocols in a network
•  Two different congestion control algorithms on the transport layer
•  Two MAC protocols
•  Approach: the method of replications is used to analyze the output data
•  The mean performance measure for system i is denoted by θi , i = 1,2,…
•  Aim: obtain point and interval estimates for the difference in
mean performance, namely θ1 – θ2
•  Vehicle-safety inspection example:
•  The station performs 3 jobs: (1) brake check, (2) headlight check, and (3)
steering check.
•  Vehicle arrivals: Poisson process with rate 9.5 per hour.
•  Present system:
•  Three stalls in parallel (one attendant makes all 3 inspections at each stall).
•  Service times for the 3 jobs: normally distributed with means 6.5, 6.0 and 5.5
minutes, respectively.
•  Alternative system:
•  Each attendant specializes in a single task; each vehicle passes through three work
stations in series
•  The mean service time for each job decreases by 10% (to 5.85, 5.4, and 4.95 minutes).
•  Performance measure: mean response time per vehicle (total time from
vehicle arrival to its departure).
•  From replication r of system i, the analyst obtains an
estimate Yir of the mean performance measure θi
•  Assuming that the estimators Yir are (at least approximately) unbiased:
θ1 = E(Y1r ), r = 1, … , R1
θ2 = E(Y2r ), r = 1, … , R2
•  Goal:
Compute a confidence interval for θ1 – θ2 to compare the
two system designs
•  If the CI lies entirely to the left of 0, there is strong evidence for the
hypothesis that θ1–θ2 < 0 (θ1 < θ2)
•  If the CI lies entirely to the right of 0, there is strong evidence for the
hypothesis that θ1–θ2 > 0 (θ1 > θ2)
•  If the CI contains 0, there is no strong statistical evidence that one system
is better than the other
•  Note: if enough additional data were collected (i.e., Ri increased), the
CI would most likely shift, and definitely shrink in length, until a
conclusion of θ1 < θ2 or θ1 > θ2 could be drawn.
•  In this chapter:
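•  A two-sided 100(1-α)% CI for θ1 – θ2 always takes the form:
$$(\bar{Y}_{\cdot 1} - \bar{Y}_{\cdot 2}) \pm t_{\alpha/2,\nu} \cdot \mathrm{s.e.}(\bar{Y}_{\cdot 1} - \bar{Y}_{\cdot 2})$$
where $\bar{Y}_{\cdot i}$ is the sample mean for system i, ν is the degrees of freedom, and s.e.(·) is the standard error of the estimator.
•  All three techniques (independent sampling with equal variances, independent
sampling with unequal variances, and CRN) assume that the basic data Yir are
approximately normally distributed.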
•  Statistically significant versus practically significant
•  Statistical significance: is the observed difference $\bar{Y}_{\cdot 1} - \bar{Y}_{\cdot 2}$ larger than
the variability in $\bar{Y}_{\cdot 1} - \bar{Y}_{\cdot 2}$?
•  Practical significance: is the true difference θ1 – θ2 large enough to
matter for the decision we need to make?
•  Confidence intervals do not answer the question of practical
significance directly; instead, they bound the true difference within
the range:
$$(\bar{Y}_{\cdot 1} - \bar{Y}_{\cdot 2}) - t_{\alpha/2,\nu}\,\mathrm{s.e.}(\bar{Y}_{\cdot 1} - \bar{Y}_{\cdot 2}) \le \theta_1 - \theta_2 \le (\bar{Y}_{\cdot 1} - \bar{Y}_{\cdot 2}) + t_{\alpha/2,\nu}\,\mathrm{s.e.}(\bar{Y}_{\cdot 1} - \bar{Y}_{\cdot 2})$$
•  Whether a difference within these bounds is practically significant
depends on the particular problem.
Independent Sampling
Independent Sampling with Equal Variances
•  Different and independent random number streams are
used to simulate the two systems
• All observations of simulated system 1 are statistically
independent of all the observations of simulated system 2.
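•  The variance of the sample mean $\bar{Y}_{\cdot i}$ is:
$$V(\bar{Y}_{\cdot i}) = \frac{V(Y_{ri})}{R_i} = \frac{\sigma_i^2}{R_i}, \quad i = 1,2$$
•  For independent samples:
$$V(\bar{Y}_{\cdot 1} - \bar{Y}_{\cdot 2}) = V(\bar{Y}_{\cdot 1}) + V(\bar{Y}_{\cdot 2}) = \frac{\sigma_1^2}{R_1} + \frac{\sigma_2^2}{R_2}$$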
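•  If it is reasonable to assume that σ1² = σ2² (approximately), or if R1 = R2,
a two-sample-t confidence-interval approach can be used:
•  The point estimate of the mean performance difference is: $\bar{Y}_{\cdot 1} - \bar{Y}_{\cdot 2}$
•  The sample variance for system i is:
$$S_i^2 = \frac{1}{R_i - 1} \sum_{r=1}^{R_i} (Y_{ri} - \bar{Y}_{\cdot i})^2 = \frac{1}{R_i - 1} \left( \sum_{r=1}^{R_i} Y_{ri}^2 - R_i \bar{Y}_{\cdot i}^2 \right)$$
•  The pooled estimate of σ² is:
$$S_p^2 = \frac{(R_1 - 1) S_1^2 + (R_2 - 1) S_2^2}{R_1 + R_2 - 2}, \quad \text{with } \nu = R_1 + R_2 - 2 \text{ degrees of freedom}$$
•  The CI is given by:
$$(\bar{Y}_{\cdot 1} - \bar{Y}_{\cdot 2}) \pm t_{\alpha/2,\nu}\,\mathrm{s.e.}(\bar{Y}_{\cdot 1} - \bar{Y}_{\cdot 2})$$
•  Standard error:
$$\mathrm{s.e.}(\bar{Y}_{\cdot 1} - \bar{Y}_{\cdot 2}) = S_p \sqrt{\frac{1}{R_1} + \frac{1}{R_2}}$$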
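The pooled-variance interval can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides: the function name `pooled_ci` is hypothetical, and the critical value `t_crit` (the table value of t at α/2 with R1 + R2 − 2 degrees of freedom) is passed in by the caller so that only the standard library is needed.

```python
import math
from statistics import mean, variance

def pooled_ci(y1, y2, t_crit):
    """CI for theta1 - theta2 under independent sampling with equal variances.

    y1, y2 : replication averages for system 1 and system 2
    t_crit : table value t_{alpha/2, nu} with nu = R1 + R2 - 2 (supplied by caller)
    Returns (lower, upper, nu).
    """
    r1, r2 = len(y1), len(y2)
    nu = r1 + r2 - 2
    diff = mean(y1) - mean(y2)  # point estimate of theta1 - theta2
    # Pooled variance estimate S_p^2 (statistics.variance uses the R-1 divisor)
    sp2 = ((r1 - 1) * variance(y1) + (r2 - 1) * variance(y2)) / nu
    se = math.sqrt(sp2) * math.sqrt(1 / r1 + 1 / r2)
    return diff - t_crit * se, diff + t_crit * se, nu
```

For example, `pooled_ci([1, 2, 3], [2, 4, 6], 2.776)` uses the tabulated 97.5% quantile for 4 degrees of freedom and returns an interval centred on −2.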
Independent Sampling with Unequal Variances
•  If the assumption of equal variances cannot safely be
made, an approximate 100(1-α)% CI can be computed with standard error:
$$\mathrm{s.e.}(\bar{Y}_{\cdot 1} - \bar{Y}_{\cdot 2}) = \sqrt{\frac{S_1^2}{R_1} + \frac{S_2^2}{R_2}}$$
•  With degrees of freedom:
$$\nu = \frac{\left( \dfrac{S_1^2}{R_1} + \dfrac{S_2^2}{R_2} \right)^2}{\dfrac{(S_1^2/R_1)^2}{R_1 - 1} + \dfrac{(S_2^2/R_2)^2}{R_2 - 1}}, \quad \text{rounded to an integer}$$
•  In this case, a minimum number of replications
R1 > 7 and R2 > 7 is recommended.
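As a hedged sketch of the two formulas above, the following helper (the name `welch_se_and_df` is illustrative, not from the slides) returns the standard error and the unrounded degrees of freedom from the sample variances and replication counts.

```python
import math

def welch_se_and_df(s1_sq, r1, s2_sq, r2):
    """Standard error and approximate degrees of freedom for unequal variances.

    s1_sq, s2_sq : sample variances S_1^2 and S_2^2
    r1, r2       : numbers of replications R_1 and R_2
    """
    a = s1_sq / r1
    b = s2_sq / r2
    se = math.sqrt(a + b)
    # Degrees of freedom as given above, before rounding to an integer
    nu = (a + b) ** 2 / (a ** 2 / (r1 - 1) + b ** 2 / (r2 - 1))
    return se, nu

se, nu = welch_se_and_df(118.9, 10, 244.3, 10)
print(round(se, 2), round(nu, 1))  # 6.03 16.1
```

The inputs are the sample variances quoted in the vehicle-inspection example later in these slides. With the formula as written here the result is about 16.1, whereas that example quotes 17 degrees of freedom; the difference presumably comes from a slightly different approximation or rounding convention.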
Common Random Numbers (CRN)
•  For each replication, the same random numbers are used to
simulate both systems, so R1 = R2 = R.
•  For each replication r, the two estimates, Yr1 and Yr2, are correlated.
•  However, independent streams of random numbers are used on
different replications, so the pairs (Yr1 ,Ys2 ) are mutually independent
for r ≠ s.
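•  Purpose: induce positive correlation between Yr1 and Yr2 (for each r)
to reduce the variance of the point estimator $\bar{Y}_{\cdot 1} - \bar{Y}_{\cdot 2}$:
$$V(\bar{Y}_{\cdot 1} - \bar{Y}_{\cdot 2}) = V(\bar{Y}_{\cdot 1}) + V(\bar{Y}_{\cdot 2}) - 2\,\mathrm{cov}(\bar{Y}_{\cdot 1}, \bar{Y}_{\cdot 2}) = \frac{\sigma_1^2}{R} + \frac{\sigma_2^2}{R} - \frac{2 \rho_{12} \sigma_1 \sigma_2}{R}$$
where the correlation ρ12 between Yr1 and Yr2 is positive.
•  Compare variance from independent sampling with
variance from CRN:
$$V_{CRN} = V_{IND} - \frac{2 \rho_{12} \sigma_1 \sigma_2}{R}$$
•  With ρ12 > 0, the variance of $\bar{Y}_{\cdot 1} - \bar{Y}_{\cdot 2}$ arising from CRN is less than that of
independent sampling (with R1 = R2).
•  The estimator based on CRN is more precise, leading to a
shorter confidence interval for the difference.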
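•  Sample variance of the differences, with $\bar{D} = \bar{Y}_{\cdot 1} - \bar{Y}_{\cdot 2}$:
$$S_D^2 = \frac{1}{R - 1} \sum_{r=1}^{R} (D_r - \bar{D})^2 = \frac{1}{R - 1} \left( \sum_{r=1}^{R} D_r^2 - R \bar{D}^2 \right)$$
where $D_r = Y_{r1} - Y_{r2}$ and $\bar{D} = \frac{1}{R} \sum_{r=1}^{R} D_r$, with degrees of freedom ν = R − 1
•  Standard error:
$$\mathrm{s.e.}(\bar{D}) = \mathrm{s.e.}(\bar{Y}_{\cdot 1} - \bar{Y}_{\cdot 2}) = \frac{S_D}{\sqrt{R}}$$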
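The paired computation for CRN can be sketched as follows. The function name `crn_ci` is hypothetical, and the critical value for R − 1 degrees of freedom is again supplied by the caller from a t table, an assumption made to keep the example free of external libraries.

```python
import math
from statistics import mean, stdev

def crn_ci(y1, y2, t_crit):
    """CI for theta1 - theta2 from paired (CRN) replications.

    y1[r] and y2[r] must come from the same replication r, i.e. the
    same random numbers; t_crit is t_{alpha/2, R-1} from a table.
    """
    if len(y1) != len(y2):
        raise ValueError("CRN requires R1 == R2")
    d = [a - b for a, b in zip(y1, y2)]  # D_r = Y_r1 - Y_r2
    r = len(d)
    d_bar = mean(d)
    se = stdev(d) / math.sqrt(r)  # S_D / sqrt(R)
    return d_bar - t_crit * se, d_bar + t_crit * se
```

Only the differences enter the variance estimate, which is why positive correlation between the paired outputs shortens the interval.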
•  It is never enough to simply use the same seed for the random-
number generator(s):
•  The random numbers must be synchronized: each random number
used in one model for some purpose should be used for the same
purpose in the other model.
•  Example: if the i-th random number is used to generate a service
time at work station 2 for the 5-th arrival in model 1, the i-th random
number should be used for the very same purpose in model 2.
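One common way to keep random numbers synchronized is to dedicate a separate generator to each purpose. The sketch below assumes this design; the purpose names, the seeding scheme, and the helper names are illustrative choices rather than anything prescribed by the slides.

```python
import random

PURPOSES = ("arrival", "brake", "headlight", "steering")

def make_streams(replication):
    """Return one dedicated generator per purpose for a given replication.

    Seeding with the replication index and the purpose name is an
    illustrative choice, not a requirement of the method.
    """
    return {p: random.Random(f"rep{replication}-{p}") for p in PURPOSES}

def interarrival_times(streams, n, rate_per_hour=9.5):
    """Draw n exponential interarrival times (hours) from the arrival stream."""
    return [streams["arrival"].expovariate(rate_per_hour) for _ in range(n)]

# Both models get identically seeded streams for replication 1.
model_1 = make_streams(1)
model_2 = make_streams(1)

# Model 2 happens to consume extra numbers for another purpose first ...
_ = [model_2["brake"].random() for _ in range(5)]

# ... but the arrival stream is unaffected, so arrivals stay synchronized.
synchronized = interarrival_times(model_1, 3) == interarrival_times(model_2, 3)
print(synchronized)  # True
```

With a single shared generator, the five extra draws in model 2 would shift every later random number, and the two models would no longer see the same arrivals.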
Common Random Numbers (CRN): Example
•  Vehicle inspection example:
•  4 input random variables:
•  An: interarrival time between vehicle n and vehicle n+1,
•  Sn(i): inspection time for task i for vehicle n in model 1 (i=1,2,3; refers to
brake, headlight and steering task, respectively).
•  When using CRN:
•  Same values should be generated for A1, A2, A3, … in both models.
•  However, mean service time for model 2 is 10% less.
•  Two possible approaches to obtain correlated service times:
•  Let Sn(i) be the service times generated for model 1; for model 2 use:
Sn(i) - 0.1E[Sn(i)]
•  Let Zn(i) be a standard normal variate and σ = 0.5 minutes; use:
E[Sn(i)] + σ Zn(i)
with the same Zn(i) in both models and E[Sn(i)] the mean service time of the respective model.
•  For synchronized runs: the service times for a vehicle are generated at
the instant of arrival, stored as its attributes, and used as needed.
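The two approaches to correlated service times can be sketched as below. The means and σ are taken from the example; the seed, the variable names, and the reading that approach 2 uses each model's own mean with a shared normal variate are assumptions of this sketch.

```python
import random

MEANS_MODEL_1 = (6.5, 6.0, 5.5)                      # brake, headlight, steering (minutes)
MEANS_MODEL_2 = tuple(0.9 * m for m in MEANS_MODEL_1)  # 10% shorter on average
SIGMA = 0.5                                          # minutes

def service_times(z, means, sigma=SIGMA):
    """Approach 2: mean + sigma * Z for each of the three tasks."""
    return [m + sigma * zi for m, zi in zip(means, z)]

# One vehicle: draw its three standard normal variates at arrival time.
rng = random.Random(12345)  # arbitrary illustrative seed
z = [rng.gauss(0.0, 1.0) for _ in MEANS_MODEL_1]

s_model_1 = service_times(z, MEANS_MODEL_1)
s_model_2 = service_times(z, MEANS_MODEL_2)

# Approach 1: shift the model 1 times by 10% of the model 1 mean.
s_model_2_shifted = [s - 0.1 * m for s, m in zip(s_model_1, MEANS_MODEL_1)]
```

Under these assumptions the two approaches produce the same model 2 service times, because both reduce to 0.9 times the model 1 mean plus σ times the shared variate.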
•  Each replication is a run of 16 hours
[Figure panels: Model 1; Model 2 with independent random numbers; Model 2 with common random numbers without synchronisation; Model 2 with common random numbers with synchronisation]
•  Compare the two systems using independent sampling and CRN
where R = R1 = R2 = 10:
•  Independent sampling: $\bar{Y}_{\cdot 1} - \bar{Y}_{\cdot 2}$ = −5.4 minutes
with ν = 17, $t_{0.025,17}$ = 2.11, $S_1^2$ = 118.9 and $S_2^2$ = 244.3, CI: −18.1 ≤ θ1 − θ2 ≤ 7.3
•  CRN without synchronization: $\bar{Y}_{\cdot 1} - \bar{Y}_{\cdot 2}$ = −1.9 minutes
with ν = 9, $t_{0.025,9}$ = 2.26, $S_D^2$ = 208.9, CI: −12.3 ≤ θ1 − θ2 ≤ 8.5
•  CRN with synchronization: $\bar{Y}_{\cdot 1} - \bar{Y}_{\cdot 2}$ = 0.4 minutes
with ν = 9, $t_{0.025,9}$ = 2.26, $S_D^2$ = 1.7, CI: −0.50 ≤ θ1 − θ2 ≤ 1.30
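The half-widths behind these three intervals can be recomputed from the quoted summary statistics. The sketch below assumes the independent-sampling case uses the unequal-variance standard error; the helper name `half_width` and the dictionary layout are illustrative.

```python
import math

def half_width(t_crit, var_of_mean_diff):
    """Half-width t * s.e., where s.e. is the square root of the estimated
    variance of the difference of sample means."""
    return t_crit * math.sqrt(var_of_mean_diff)

R = 10
# (point estimate, t value, estimated variance of the mean difference)
cases = {
    "independent": (-5.4, 2.11, 118.9 / R + 244.3 / R),
    "CRN, not synchronized": (-1.9, 2.26, 208.9 / R),
    "CRN, synchronized": (0.4, 2.26, 1.7 / R),
}

for name, (diff, t_crit, var) in cases.items():
    h = half_width(t_crit, var)
    print(f"{name}: {diff - h:.2f} <= theta1 - theta2 <= {diff + h:.2f}")
```

The recomputed bounds agree with the intervals listed above to within the rounding of the quoted point estimates and t values, and they make the effect of synchronization visible: the half-width drops from roughly 12.7 minutes to under 1 minute.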
CRN with Specified Precision
•  Goal: the error in our estimate of θ1 – θ2 should be less than ε
•  Approach: determine the number of replications R such that the half-
width of the CI satisfies:
$$H = t_{\alpha/2,\nu}\,\mathrm{s.e.}(\bar{Y}_{\cdot 1} - \bar{Y}_{\cdot 2}) \le \varepsilon$$
•  Vehicle inspection example (cont.):
•  R0 = 10 replications with complete synchronization of random numbers
yield the 95% CI: 0.4 ± 0.9 minutes
•  Suppose ε = 0.5 minutes for practical significance; then R is the
smallest integer satisfying R ≥ R0 and:
$$R \ge \left( \frac{t_{\alpha/2,R-1}\, S_D}{\varepsilon} \right)^2$$
•  Since $t_{\alpha/2,R-1} \le t_{\alpha/2,R_0-1}$, a conservative estimate of R is:
$$R \ge \left( \frac{t_{\alpha/2,R_0-1}\, S_D}{\varepsilon} \right)^2$$
•  Hence, 35 replications are needed (25 additional).
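The conservative estimate is a one-line calculation. The sketch below plugs in the values from the example (t = 2.26 for 9 degrees of freedom, S_D² = 1.7, ε = 0.5, R0 = 10); the function name `replications_needed` is illustrative.

```python
import math

def replications_needed(t_crit, s_d_sq, eps, r0):
    """Smallest integer R >= R0 with R >= (t_crit * S_D / eps)^2.

    t_crit is the critical value for R0 - 1 degrees of freedom, which
    makes the estimate conservative.
    """
    r = math.ceil(t_crit ** 2 * s_d_sq / eps ** 2)
    return max(r, r0)

print(replications_needed(2.26, 1.7, 0.5, 10))  # 35
```

Since 2.26² × 1.7 / 0.5² ≈ 34.7, rounding up gives 35 replications, of which 10 have already been run.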
Comparison of Several System Designs
•  To compare K alternative system designs
•  Based on some specific performance measure, θi , of system i,
for i = 1, 2, …, K
•  Procedures are classified as:
•  Fixed-sample-size procedures: a predetermined sample size is used to
draw inferences via hypothesis tests or confidence intervals
•  Sequential sampling (multistage): more and more data are collected
until an estimator with a prespecified precision is achieved or until
one of several alternative hypotheses is selected
•  Some goals/approaches of system comparison:
•  Estimation of each parameter θi
•  Comparison of each performance measure θi to a control θ1
•  All pairwise comparisons θi - θj for i ≠ j
•  Selection of the best θi
Bonferroni Approach
•  To make statements about several parameters simultaneously,
where all statements are true simultaneously.
•  Bonferroni inequality:
$$P(\text{all statements } S_j \text{ are true},\ j = 1, \dots, C) \ge 1 - \sum_{j=1}^{C} \alpha_j = 1 - \alpha_E$$
where $\alpha_E = \sum_{j=1}^{C} \alpha_j$ is the overall error probability; it provides an upper
bound on the probability of a false conclusion.
•  The smaller αj is, the wider the j-th confidence interval will be.
•  Major advantage: the inequality holds whether models are run with
independent sampling or CRN
•  Major disadvantage: the width of each individual interval increases
as the number of comparisons increases.
•  Should be used only for a small number of comparisons
•  Practical upper limit: about 20 comparisons
•  There are 3 possible applications:
1.  Individual CI’s: Construct a 100(1–αj)% CI for parameter θi ,
where number of comparisons = K.
2.  Comparison to an existing system: Construct a 100(1–αj)% CI for
parameter θi –θ1 (i = 2,3, …, K), number of comparisons = K – 1.
3.  All pairwise: For any 2 different system designs, construct a
100(1–αj)% CI for parameter θi –θj.
Hence, total number of comparisons = K(K – 1)/2.
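The three applications differ only in the number of comparisons C. The sketch below counts C for each case and splits an overall error probability evenly across the intervals. The even split αj = αE / C is an assumption of this example (the inequality only requires that the αj sum to αE), and the function and keyword names are illustrative.

```python
def bonferroni_plan(k, alpha_e, kind):
    """Return (C, alpha_j) for K designs and overall error probability alpha_e.

    kind: "individual"  -> C = K
          "vs_control"  -> C = K - 1
          "pairwise"    -> C = K (K - 1) / 2
    alpha_j = alpha_e / C is an equal split, chosen here for simplicity.
    """
    if kind == "individual":
        c = k
    elif kind == "vs_control":
        c = k - 1
    elif kind == "pairwise":
        c = k * (k - 1) // 2
    else:
        raise ValueError(f"unknown kind: {kind}")
    return c, alpha_e / c

print(bonferroni_plan(4, 0.05, "pairwise")[0])  # 6
```

With K = 4 designs, all pairwise comparisons need 6 intervals, each at a confidence level of 100(1 − 0.05/6)%, which is why the intervals widen quickly as K grows.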
Bonferroni Approach to Selecting the Best
•  Among K system designs, to find the best system
•  “Best” = the maximum expected performance, where the i-th design
has expected performance θi .
•  Focus on parameters: $\theta_i - \max_{j \ne i} \{\theta_j\}$ for i = 1, 2, ..., K
•  If system design i is the best, it is the difference in performance
between the best and the second best.
•  If system design i is not the best, it is the difference between system
i and the best.
•  Goal: the probability of selecting the best system is at least
1–α, whenever $\theta_i - \max_{j \ne i} \{\theta_j\} \ge \varepsilon$
•  Hence, both the probability of correct selection 1–α, and the
practically significant difference ε, are under our control.
•  A two-stage simulation procedure
•  First stage
•  Obtain R0 replications from each system
•  Delete (screen out) the statistically inferior systems
•  If only one system survives, stop!
•  Second stage
•  More than one system survived
•  Do additional replications to select the best
Metamodeling
•  Goal: describe the relationship between variables and the
output response.
•  The simulation output response variable, Y, is related to k
independent variables x1, x2, …, xk (the design variables).
•  The true relationship between variables Y and x is represented
by a (complex) simulation model.
•  Approximate the relationship by a simpler mathematical
function called a metamodel. Some metamodel forms:
•  Linear regression.
•  Multiple linear regression.
Simple Linear Regression
•  Suppose the true relationship between Y and x is assumed
to be linear, the expected value of Y for a given x is:
E(Y | x) = β0 + β1x
where β0 is the intercept on the Y axis, and β1 is the slope.
•  Each observation of Y can be described by the model:
Y = β0 + β1x + ε
where ε is the random error with mean zero and constant variance σ2
•  Suppose there are n pairs of observations; the method of least
squares is commonly used to estimate β0 and β1.
•  The sum of squares of the deviations between the observations and
the regression line is minimized.
•  The individual observation can be written as:
Yi = β0 + β1xi + εi
where ε1, ε2, ... are assumed to be uncorrelated random variables
•  Rewrite:
$$Y_i = \beta_0' + \beta_1 (x_i - \bar{x}) + \varepsilon_i$$
where $\beta_0' = \beta_0 + \beta_1 \bar{x}$ and $\bar{x} = \sum_{i=1}^{n} x_i / n$
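•  The least-squares function (the sum of squares of the deviations):
$$L = \sum_{i=1}^{n} \varepsilon_i^2 = \sum_{i=1}^{n} (Y_i - \beta_0 - \beta_1 x_i)^2 = \sum_{i=1}^{n} \left[ Y_i - \beta_0' - \beta_1 (x_i - \bar{x}) \right]^2$$
•  To minimize L, find $\partial L / \partial \beta_0'$ and $\partial L / \partial \beta_1$, set each to zero, and solve for:
$$\hat{\beta}_0' = \bar{Y} = \sum_{i=1}^{n} \frac{Y_i}{n} \quad \text{and} \quad \hat{\beta}_1 = \frac{S_{xy}}{S_{xx}} = \frac{\sum_{i=1}^{n} Y_i (x_i - \bar{x})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}$$
where $S_{xy}$ is the corrected sum of cross products of x and Y, and $S_{xx}$ is the corrected sum of squares of x.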
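The estimators translate directly into code. The following minimal sketch (function name `fit_line` is illustrative) computes S_xx and S_xy as defined above and recovers the original intercept from β0 = β0' − β1 x̄, which follows from the definition of β0'.

```python
def fit_line(x, y):
    """Least-squares estimates (b0, b1) for Y = b0 + b1 * x + error."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n  # estimate of beta_0'
    s_xx = sum((xi - x_bar) ** 2 for xi in x)                # corrected sum of squares of x
    s_xy = sum(yi * (xi - x_bar) for xi, yi in zip(x, y))    # corrected sum of cross products
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar  # from beta_0' = beta_0 + beta_1 * x_bar
    return b0, b1

print(fit_line([1, 2, 3, 4], [3, 5, 7, 9]))  # (1.0, 2.0)
```

In a metamodeling setting, x would hold the design-variable values and y the corresponding simulation responses.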
Test for Significance of Regression
•  The adequacy of a simple linear relationship should be tested
prior to using the model.
•  Testing whether the order of the model tentatively assumed is
correct is commonly called the “lack-of-fit” test.
•  The adequacy of the assumption that the errors are normally and
independently distributed, NID(0,σ 2), can and should be checked by residual
analysis.
•  Hypothesis testing: H 0 : β1 = 0 and H1 : β1 ≠ 0
•  Failure to reject H0 indicates no linear relationship between x and Y.
•  If H0 is rejected, x helps to explain the variability in Y, but
higher-order terms may still be needed.
[Figure: two cases, one where the straight-line model is adequate and one where a higher-order term is necessary]
•  The appropriate test statistic:
$$t_0 = \frac{\hat{\beta}_1}{\sqrt{MS_E / S_{xx}}}$$
•  The mean squared error is:
$$MS_E = \sum_{i=1}^{n} \frac{e_i^2}{n - 2} = \frac{S_{yy} - \hat{\beta}_1 S_{xy}}{n - 2}$$
which is an unbiased estimator of σ 2 = V(εi)
•  t0 has the t-distribution with n-2 degrees of freedom.
•  Reject H0 if |t0| > tα/2, n-2
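A short sketch of the test statistic, using the second form of MS_E given above. The function name and the small data set are illustrative; S_yy is taken to be the corrected sum of squares of Y, which the slides use but do not define explicitly.

```python
import math

def slope_t_statistic(x, y):
    """Return (t0, degrees of freedom) for testing H0: beta_1 = 0."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum(yi * (xi - x_bar) for xi, yi in zip(x, y))
    s_yy = sum((yi - y_bar) ** 2 for yi in y)  # assumed definition of S_yy
    b1 = s_xy / s_xx
    ms_e = (s_yy - b1 * s_xy) / (n - 2)  # mean squared error
    t0 = b1 / math.sqrt(ms_e / s_xx)
    return t0, n - 2

t0, df = slope_t_statistic([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(round(t0, 3), df)  # 2.121 3
```

Here |t0| ≈ 2.12 is below the tabulated 3.182 for α = 0.05 and 3 degrees of freedom, so H0 would not be rejected for this toy data. A perfect fit makes MS_E zero, in which case the statistic is undefined and this sketch would raise a division error.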
Multiple Linear Regression
•  Suppose simulation output Y has several independent variables
(decision variables).
•  The possible relationship forms are:
Y = β0 + β1x1 + β2x2 + …+ βmxm + ε
Y = β0 + β1x1 + β2x2 + ε
Y = β0 + β1x1 + β2x2 + β3x1x2 + ε
Random-Number Assignment for Regression
•  Independent sampling:
•  Assign a different seed or stream to different design points.
•  Guarantees that the responses Y from different design points will be
statistically independent.
•  CRN:
•  Use the same random number seeds or streams for all of the design
points.
•  A fairer comparison among design points (subjected to the same
experimental conditions)
•  Typically reduces variance of estimators of slope parameters, but
increases variance of intercept parameter
Optimization via Simulation
•  Optimization usually deals with problems with certainty, but in
stochastic discrete-event simulation, the result of any simulation run is
a random variable.
•  Let x1,x2,…,xm be the m controllable design variables and Y(x1,x2,…,xm) be
the observed simulation output performance on one run.
•  To optimize Y(x1,x2,…,xm) with respect to x1,x2,…,xm is to maximize or
minimize the mathematical expectation (long-run average) of
performance
E[Y(x1,x2,…,xm)]
•  Example: select the material handling system that has the best chance
of costing less than $D to purchase and operate.
•  Objective: maximize Pr(Y(x1,x2,…,xm) ≤ D).
•  Define a new performance measure:
$$Y'(x_1, x_2, \dots, x_m) = \begin{cases} 1, & \text{if } Y(x_1, x_2, \dots, x_m) \le D \\ 0, & \text{otherwise} \end{cases}$$
•  Maximize E(Y’(x1,x2,…,xm)) instead, since E(Y’) = Pr(Y ≤ D).
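The indicator trick turns a probability into an ordinary mean, so it can be estimated by averaging over replications. In this sketch the cost values and the function name are made up for illustration.

```python
def prob_cost_at_most(costs, d):
    """Estimate Pr(Y <= D) as the average of the indicator Y' over replications."""
    indicators = [1 if y <= d else 0 for y in costs]
    return sum(indicators) / len(indicators)

# Hypothetical total costs from four replications of one design
print(prob_cost_at_most([90, 110, 95, 120], 100))  # 0.5
```

Because the estimate is a sample mean of Y’, the confidence-interval methods for comparing mean performance apply to it, subject to their approximate-normality assumption.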
Summary
•  Basic introduction to comparative evaluation of alternative
system designs:
•  Emphasized comparisons based on confidence intervals.
•  Discussed the differences and implementation of independent
sampling and common random numbers.
•  Introduced concept of metamodels.
