Risk and Reliability Analysis
Vijay P. Singh
Sharad K. Jain
Aditya Tyagi
Library of Congress Cataloging-in-Publication Data
Singh, V. P. (Vijay P.)
Risk and reliability analysis : a handbook for civil and environmental engineers / Vijay P.
Singh, Sharad K. Jain, Aditya K. Tyagi.
p. cm.
Includes bibliographical references and index.
ISBN-13: 978-0-7844-0891-9
ISBN-10: 0-7844-0891-2
1. Engineering—Management—Handbooks, manuals, etc. 2. Reliability (Engineering)—
Handbooks, manuals, etc. 3. Risk assessment—Handbooks, manuals, etc. I. Jain, S. K. (Sharad
Kumar), 1960- II. Tyagi, Aditya K. III. Title.
TA190.S594 2007
620'.00452—dc22
2006038853
Preface
statistical analysis, the central limit theorem, which is discussed next. Distribu-
tions of extremes and other distributions found useful in environmental and
water engineering, such as uniform, triangular, beta, Pareto, logistic, Pearson
type III, and log-Pearson type III distributions, are also presented. Many envi-
ronmental processes can be described by using the concepts of probability and
physical laws. Chapter 6 focuses on impulse response functions as probability
distributions. These distributions have a quasi-physical basis. The concepts dis-
cussed include impulse response of a linear reservoir, cascade of linear reser-
voirs, the Muskingum model, the diffusion model, and the linear channel
downstream model. These concepts are widely used in hydrologic systems anal-
ysis. Many real-world decisions frequently involve more than one variable and
there may not be a one-to-one relationship among them. In such a situation, an
analysis of the joint probabilistic behavior of the variables involved would be
desirable. Chapter 7 presents multivariate distributions, with particular atten-
tion to bivariate distributions. A newly emerging copula methodology is pre-
sented and several bivariate distributions are discussed using this methodology.
The concept of return period is extended to more than one variable.
In statistical analysis, a considerable effort is devoted to deriving parameters
of a distribution, which constitutes the subject matter of Chapter 8. Many tech-
niques are available for this purpose; these include the method of moments, the
method of maximum likelihood, the method of probability weighted moments,
L-moments, and the method of least squares. Discussion of these methods is fol-
lowed by a treatment of the problems of parameter estimation. Besides point
estimates, interval estimation of parameters is carried out to determine the confi-
dence that can be placed in the point estimates and this chapter also includes a
description of interval estimates. The last chapter in the second part, Chapter 9,
deals with entropy. Originating in thermodynamics, the principle of information
theoretic entropy has found applications in many branches of engineering,
including civil and environmental engineering. The fundamental concepts of the
Shannon entropy theory are discussed. The methodology to derive parameters
of normal and gamma distributions by following the Lagrange multiplier
method and the parameter-space expansion method is described. This chapter
also provides a discussion of the fields where the entropy concept has proved to
be useful.
Part 3 of the book, comprising four chapters, deals with Uncertainty Analysis.
Chapter 10 discusses the concepts of error and uncertainty analysis. The focus of
this chapter is on a treatment of the types of uncertainties and analysis of errors.
The Monte Carlo method, a powerful tool to solve a range of problems, is dis-
cussed in Chapter 11. Generation of random numbers comprises an important
part of the Monte Carlo method. Therefore, this chapter provides a discussion of
many techniques that can be used to generate random numbers that follow a
given distribution. Several examples help illustrate the application of Monte
Carlo methods. Because many environmental processes are stochastic and can be
treated as stochastic processes, Chapter 12 gives a preliminary treatment of this
Table of Contents
Preface
Acknowledgments
Part I Preliminaries
Chapter 9 Entropy Theory and Its Applications in Risk Analysis
9.1 History and Meaning of Entropy
9.2 Principle of Maximum Entropy
9.3 Derivation of Parameters of the Normal Distribution Using Entropy
9.4 Determination of Parameters of the Gamma Distribution
9.5 Application of Entropy Theory in Environmental and Water Resources
9.6 Closure
9.7 Questions
References
Index
About the Authors
Part I
Preliminaries
Chapter 1
Rational Decision Making Under Uncertainty
The process of decision making can be traced to the beginning of human civiliza-
tion. However, the nature of problems requiring decisions, the type of decisions,
and the decision making tools have undergone dramatic changes over time. Peo-
ple’s intuitive judgment and cognitive ability; the availability of data; access to
computational tools; environmental and ecological considerations; and social,
political, and economic constraints all influence the process of decision making
and the ensuing decisions. Most day-to-day decisions involve a certain amount
of risk, which is factored, either knowingly or unknowingly, into the decision-
making process.
Planning, design, operation, and management of civil and environmental
engineering systems are greatly affected by the vagaries of nature or the uncer-
tainty of natural events. Nature has immense variability, and the information
available to quantify this variability is usually limited. Nevertheless, decisions
have to be made and implemented. Decision theory attempts to provide a system-
atic approach to making rational decisions. Haimes and Stakhiv (1985) have aptly
summarized the overall philosophy of decision making as shown in Fig. 1-1. This
philosophy presents decision making through a triangle whose three vertices are
occupied by benefit–cost theory, decision theory, and sustainability theory. As
shown in the figure, risk and reliability analysis occupies a central place in the
interaction of certainty and uncertainty; efficiency and equity; and single decision
making and collective decision making. The relative importance of the vertices of
the triangle in the figure changes with social evolution and the development stage
of the society. These days, most societies attach utmost importance to sustainability
and equity, and decision making therefore is becoming participatory. In line
with the modern-day philosophy and development paradigms, principles of
sustainability, equity, and participatory decision making are placed at the apex of the
decision triangle.
Figure 1-1 (labels recovered from the figure: sustainability theory, decision theory, benefit–cost theory, participatory decisions, decision hierarchy, economic analysis, and, at the center, risk and reliability analysis)
Central to rational decision making and risk assessment is uncertainty. Klir
(1991) and Shackle (1961) have argued that the necessity of decision making
results totally from uncertainty. In other words, if the uncertainty did not exist,
there would be no need for decision making or decision making would be rela-
tively simple and straightforward. One of the main causes of uncertainty in nat-
ural systems is the unpredictability of system behavior. For example, flow in a
river varies in time and experiences highs and lows each year. If one were to con-
sider the lowest flow in each year for a number of years, a series of low flows
would be the result. Prediction of these flows cannot be made with certainty. The
same would apply to the highest yearly flows. Another example is the unpre-
dictability of rainfall or for that matter forecasting of climate. Prediction of earth-
quakes also entails a very high degree of uncertainty, as does prediction of
tornadoes.
A risky decision exposes the decision maker to the possibility of some type
of loss but there are many situations when such a decision has to be made. The
foremost cause involves the vagaries of nature. For example, a decision to build
a project may be risky, because a large flood or hurricane or earthquake might
occur and endanger the structure with resulting loss of life and property. In
addition, a decision may be risky because the natural phenomena are not clearly
understood. Sometimes a risky choice has to be made if the cost of an alternative
that can control the risk is more than the ability or willingness to pay for it. The
ultimate goal is to reduce uncertainty and thereby risk.
the system must satisfy, and (5) the system output or response. As an example,
consider a watershed with the goal of predicting runoff from the watershed as a
function of time for a given rainfall event. Thus, the watershed is the system
here. Rainfall is the input or source for the watershed. Infiltration and evapora-
tion are the sinks of the watershed. The watershed has a certain topography and
channel network, which, in turn, define the watershed geometry. Hydraulic
equations of flow over land areas and in channels are the equations governing
the flow in the watershed. Runoff is generated based on the initial state of the
watershed (antecedent moisture condition) and the upstream and downstream
boundaries. The governing equations must satisfy these conditions. Solution of
these equations yields runoff as a function of time. This is a typical prediction
problem.
Environmental and water resources systems are subject to uncertainties that
are due to natural randomness (or caused by the vagaries of nature) as well as
human-induced errors or factors. For example, consider the problem of predict-
ing runoff from a watershed for a given rainfall event. In this problem, there is
uncertainty in the areal mapping of rainfall, because rainfall varies spatially. In
practice, rainfall is measured only at a point and the point measurement is used
to represent rainfall over an area. Because of inherent randomness in the rainfall
field, there are uncertainties in rainfall measurements caused by wind, angle of
incidence, raindrop size, and so on. There may also be errors in rainfall measure-
ments resulting from instrumental defects, improper rain gauge location, etc.
Similarly, infiltration and evaporation have uncertainties. The watershed geome-
try also has less than certain elements. The governing equations, expressed as
partial differential equations (PDEs), may themselves be in error. The resistance
parameter, such as Manning’s friction factor, in the momentum equation is spa-
tially variable but only an average value is used. Thus it is also subject to uncer-
tainty. Furthermore, round-off and truncation errors may arise in computations.
These uncertainties in virtually every component of the prediction problem will
introduce uncertainty in the predicted runoff.
This discussion shows that solutions of the aforementioned types of prob-
lems are subject to uncertainty. This leads to the necessity of making decisions
under uncertainty. To make a decision, the problem is to be solved nevertheless.
In the event of uncertainty, a typical problem-solving approach entails preparing
a model of the system, as shown in Fig. 1-3. The model variables are considered
random and are described by laws of probability or probability distribution
functions. Then, parameters of these distributions need to be estimated. One can
then compute the error and thereby the associated risk and reliability of the
model output or the solution of the problem.
It may be noted that the uncertainty resulting from natural causes can be
reduced to some extent by collecting more comprehensive data and using
improved models. Even then, it is not possible to remove the uncertainty beyond
a limit, because nature has immense variability and any model used will be a
simplified depiction of reality. Thus, the objective should be to understand the
causes and sources of uncertainty, deal rationally with uncertainty, and integrate
it with decision making. Klemes (1971) appropriately noted: “Nowadays in
hydrology, and the more so in engineering, uncertainty is still regarded as a
regrettable imperfection in the body of knowledge, as it was in the 19th century
physics. As in physics, it also seems that in hydrology and engineering, progress
lies not in trying to remove the uncertainty at any cost but in learning how to
make it one of the legitimate elements of our concepts.” Of course, this was the
view about 30 years ago, but the perceptions about uncertainty have now begun
to change.
possibility for further extension; and safety. It is not easy to express these consid-
erations in economic terms but there are indirect ways to accomplish this. For
example, vulnerability can be expressed in terms of the cost of insurance that
one may purchase. It should, however, be noted that expenditures, such as the
initial cost, are not always immediate but occur at specified intervals. For exam-
ple, the house loan may be for a period of 10 years; the cost of the house will
then be paid during this period. This points to the effective value of money, say,
in terms of dollars (i.e., because of inflation the value of a dollar at a future date
is not the same as it is today). To account for the change in the value of a dollar
with time, the money markets have established interest rates that express this
change in value. Consider, for example, an interest rate of 5% for the next 10
years. If an amount of $100,000 is invested today, then 10 years from now it will
amount to $100,000 × (1 + 0.05)^10 = $162,889. In other words, $162,889 received 10
years from now has the same value as $100,000 has today. There are, however,
consequences for which there is no market value, such as interruption
in residency during major repair work, inconvenience, emotional value, etc.
It is difficult to evaluate such consequences. The question then arises as to the
worth of these nonquantifiable consequences. How much are the people willing
to pay for less interruption, reduced inconvenience, more emotional value, etc.?
It is difficult to get precise numbers for such consequences, for they are subjec-
tive. Nevertheless, one can at least specify some upper and lower limits for the
monetary value that might suffice to rank alternative designs and enable selec-
tion of the best alternative.
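The compounding arithmetic above can be checked with a short sketch. The function names `future_value` and `present_worth` are illustrative choices, and annual compounding at a constant rate is assumed, as in the text's example.

```python
def future_value(present: float, rate: float, years: int) -> float:
    """Amount that `present` grows to after `years` of annual compounding."""
    return present * (1.0 + rate) ** years


def present_worth(future: float, rate: float, years: int) -> float:
    """Amount today that is equivalent to `future` received after `years`."""
    return future / (1.0 + rate) ** years


# Values from the text: $100,000 at 5% for 10 years.
fv = future_value(100_000, 0.05, 10)
pw = present_worth(fv, 0.05, 10)

print(f"Future value after 10 years: ${fv:,.0f}")
print(f"Present worth of that sum:   ${pw:,.0f}")
```

The same `present_worth` function can be used to bring costs that occur at different times to a common date before alternative designs are compared.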
As an example, consider the case of a house design where three alternative
designs are to be evaluated. These alternatives are designated as I, II, and III.
Assume that each design is to be evaluated by considering three aspects: foun-
dation A, material B, and labor C. For design I, IA denotes the foundation for
design I, IB denotes the material needed for design I, and IC denotes the labor
for design I. Each aspect has an effective cost based on probabilistic consider-
ations. For design I, IA has effective cost ECIA, IB has effective cost ECIB, and IC
has effective cost ECIC. In a similar manner, designs II and III are represented.
These designs can be represented as a decision tree, as schematically shown in
Fig. 1-4. Associated with each branch representing a design is a set of conse-
quences, which are entered into the calculation of the relative value of the alter-
native. The relative value is then used for ranking alternative designs.
This exercise essentially comprises a rational planning process consisting of
defining an objective, identifying alternative means of achieving the objective,
and applying a ranking procedure to determine the best alternative. Although it
is conceptually quite simple, in the real world often little planning and little
rationality are employed even for important decisions. A common occurrence is
not to consider alternatives. Quite frequently, the decision is made based on pre-
cedent, tradition, lack of preparation, personal bias, prejudice, or shortsighted-
ness. Many a time, there is a deliberate effort to postpone making decisions,
however consequential they are. The result is that no time is left for anything but
Figure 1-4 Decision tree. I, II, and III are alternatives; letters A, B, and C associated with
the alternatives denote the consequences of the respective alternatives in terms of
foundation, material, and labor; and ECIs, ECIIs, and ECIIIs are costs of the consequences.
cal hierarchical approach for such problems. In this case, decision making is ana-
lytical and relatively simple.
In the problems related to decision making under uncertainty or risk, the
benefits and costs associated with each decision are commonly expressed using
probability distributions. In the absence of one definite outcome, an expected
value criterion may be adopted for comparing decisions. Based on the optimiza-
tion of the expected profit or expected loss, decisions are then evaluated. When
the number of alternatives is small, decision-tree analysis can be used to find the
best alternative or decision.
Example 1.3 Suppose a contractor in Louisiana has bid for a job of repairing a
highway in the month of June. For doing the repair work, the contractor needs a
certain number of rain-free days. To ensure completion of the work in time, there
is a clause in the contract that the contractor forfeits her payment if the work is
not completed in time. The contractor enters a bid for $5,000,000. What is the rea-
sonable course of action?
Solution There are several alternatives that the contractor initially considers.
After initial screening she eliminates what she considers inferior alternatives
and finally decides on evaluating only three alternatives for completing the
work. The first alternative is based on the calculation that she can complete the
work in 20 days at a cost of $3,000,000 with her own equipment. Analysis of
rainfall data reveals that there is a 30% chance of having fewer than 20 rain-free
days in June.
The second alternative is that the contractor can buy additional equipment
and can then finish the work in 15 days at a cost of $3,500,000. There is, however,
a 10% probability that there will be fewer than 15 rain-free days in June.
The third alternative is that the contractor can partner with another contrac-
tor and finish the work in 10 days at a cost of $4,000,000. Analysis of rainfall data
shows that there is virtually no chance of having fewer than 10 rain-free days in
June.
To make a rational decision, one can consider the decision tree with the three
alternatives and their associated consequences or outcomes. Rainfall in the
month of June is a random variable and clearly influences the consequences of
the three alternative courses of action and the resulting outcomes. The possible
outcomes, profit or loss, are associated with appropriate probability values.
These outcomes must be weighted with the corresponding probability values,
and the sum of the weighted outcomes of each decision is then computed. The
weighted sum determines the EMV of each decision. The decision with the
greatest EMV may be the preferred decision but may not necessarily be the best
decision.
Let us now compute the EMV of each alternative decision. In the first alter-
native, the profit will be $2,000,000 with a probability of 0.7 and the loss will be
$3,000,000 with a probability of 0.3. Therefore,
EMV = $2,000,000 × 0.7 – $3,000,000 × 0.3 = $500,000
For the second alternative, the profit will be $1,500,000 with a probability of
0.9 and the loss will be $3,500,000 with a probability of 0.1. Therefore,
EMV = $1,500,000 × 0.9 – $3,500,000 × 0.1 = $1,000,000
For the third alternative, the profit will be $1,000,000 with a probability of 1
and the loss will be $4,000,000 with a probability of 0. Thus,
EMV = $1,000,000 × 1 – $4,000,000 × 0 = $1,000,000
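These three courses of action are depicted in Fig. 1-5. In terms of the EMVs,
alternatives B and C (the second and third alternatives) tie at $1,000,000 and rank
ahead of alternative A (the first alternative, $500,000). Alternative C attains this
value with no chance of forfeiting the payment.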
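The EMV calculation of Example 1.3 can be reproduced with a minimal sketch. The labels A, B, and C follow Fig. 1-5; the function name `emv` and the data layout are illustrative assumptions. As in the text, the payoff is the bid minus the cost if the work is completed on time, and the full cost is lost otherwise.

```python
BID = 5_000_000  # contract amount from the example

# (label, cost of the alternative, probability of completing on time)
alternatives = [
    ("A: own equipment, 20 days", 3_000_000, 0.7),
    ("B: additional equipment, 15 days", 3_500_000, 0.9),
    ("C: partner contractor, 10 days", 4_000_000, 1.0),
]


def emv(cost: float, p_complete: float, bid: float = BID) -> float:
    """Probability-weighted sum of the two outcomes of one alternative."""
    profit = bid - cost  # payoff if the work is completed on time
    loss = cost          # payment is forfeited, so the cost is lost
    return p_complete * profit - (1.0 - p_complete) * loss


results = {label: emv(cost, p) for label, cost, p in alternatives}

# Print the alternatives from highest to lowest EMV.
for label, value in sorted(results.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{label:<36} EMV = ${value:,.0f}")
```

Because B and C have the same EMV, the criterion alone does not separate them; the spread of outcomes (B carries a 10% chance of a $3,500,000 loss) is what distinguishes the two.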
Figure 1-5 (decision tree for Example 1.3; recovered branch values: alternative A, complete 0.7 with +$2,000,000 and not complete 0.3 with –$3,000,000, EMV $500,000; alternative B, not complete 0.1 with –$3,500,000; alternative C, complete 1.0 with +$1,000,000 and not complete 0.0 with –$4,000,000, EMV $1,000,000)
real dollars can be expressed as EUV or “utility dollar” if dollars are the measure of
monetary value. This value or dollar is different from the real numerical value or
dollar but can be related to real dollars. In the case of gain, the effective value of
the dollar is usually less than the real dollar (i.e., the ratio of utility dollar to real
dollar is less than one). Thus the effective value of $100 is less than this
amount—it may be $90. However, the effective value of the dollar in the case of
loss is greater than the real dollar (i.e., the ratio of the utility dollar to real dollar
is greater than one). This nonlinear relationship between utility dollar and real
dollar is sketched in Fig. 1-6. This relationship has approximately a 45-degree
slope near the origin, indicating that for relatively small gains and losses the util-
ity dollar and the real dollar have the same value. Large gains are discounted
somewhat and large losses are given greater negative utility value because of
their crippling effect. This relationship may, however, be different for different
persons, depending on their particular limitations, as well as for the same person
at different times or under different circumstances. By using the utility dollar,
the consequences of any outcome can be evaluated by the same linear
operations of weighting and addition as before. Conceptually, Fig. 1-6 is simple,
but the determination of the nonlinear relationship is quite difficult, because it
involves value judgments that can be neither quantified nor avoided in real life and
in making actual decisions. The relationship between the real dollar and effective
monetary value as well as that between the real dollar and effective utility
value can vary in a multitude of ways, depending upon a particular situation
and a particular individual or organization. Figure 1-6 depicts some of these
cases. Conceptually, EUV looks attractive but it is difficult to quantify, and there-
fore its practical utility is limited or doubtful.
Figure 1-6 Relationships between effective monetary value and effective utility value. (Vertical axis: EMV and EUV, in dollars.)
computation of the weighted sum of flood damages or the EMV of flood dam-
ages. The same procedure can be applied when working with utility dollars.
Example 1.4 For flood protection, many measures are employed, depending
upon the situation at hand. Such measures include structural measures, non-
structural measures, and a combination thereof. Structural measures may
include construction of a dam, dykes, levees, a diversion structure, or a drainage
system. Nonstructural measures may include land use management, water har-
vesting, afforestation, and soil conservation.
There are many areas in the United States that suffer flood damage each year.
Louisiana has more than its share of such places. We all have learned of the dev-
astation in New Orleans and the Gulf Coast area brought about by Katrina and
subsequent levee breaching. In one area located near Amite River, Louisiana, a
major hypothetical flood caused damage of $100 million. The officials of this
area would like to reduce this damage value to a much smaller value. They
would like to evaluate different options for protecting this area. These options
may include (1) construction of a dam, (2) construction of levees, (3) construction
of a drainage system, and (4) development of proper land use. While selecting
the schemes, one can use two methods. The first method ignores frequency anal-
ysis and simply computes benefit–cost ratio. Of course, the EMV of each option
in this case is simply the difference between benefit and cost. Thus, one wants to
select the scheme that is most cost effective (highest benefit–cost ratio). The sec-
ond method employs a statistical method, considering discharge, stage, and
damage as random variables. It then computes the EMV values of different
options designed to protect the area. Use both methods to select the flood protec-
tion scheme and compare the EMVs.
Assume the following construction costs: $40 million for a dam, $30 million
for levees, $24 million for a drainage system, and $20 million for land use. Each
scheme will reduce the damage differently and also change the stage–discharge
curve differently. The reduced damage will be $20 million for the dam, $25 mil-
lion for the levees, $36 million for the drainage system, and $40 million for the
land use.
Table 1-4 contains data on annual peak discharge, stage, and flood damage
for the area under consideration in Louisiana. For these data, the relationship
between discharge Q (ft³/s, or cfs) and stage H (ft) may be expressed by the
following equation:
\[ H = 8.0451 \ln Q - 52.619 \quad \text{or} \quad Q = 692.63 \exp\left(\frac{H}{8.0451}\right) \]
The relationship between stage and damage D (in millions of dollars) may be
expressed by the following equation conditioning on the stage greater than or
equal to 31 ft:
One does not necessarily have to use these relationships. Other suitable rela-
tionships can be derived if so desired.
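The rating-curve relationship between discharge and stage can be sketched as a pair of inverse functions. The coefficients come from the equation above; the function names are illustrative. The inverse is computed directly from the coefficients rather than from the rounded constant 692.63, which is exp(52.619/8.0451) to five significant figures.

```python
import math

SLOPE = 8.0451       # coefficient of ln Q in the rating curve
INTERCEPT = 52.619   # constant subtracted in the rating curve


def stage_from_discharge(q_cfs: float) -> float:
    """Stage H (ft) for a discharge Q (cfs): H = 8.0451 ln Q - 52.619."""
    return SLOPE * math.log(q_cfs) - INTERCEPT


def discharge_from_stage(h_ft: float) -> float:
    """Discharge Q (cfs) for a stage H (ft), the inverse of the curve above."""
    return math.exp((h_ft + INTERCEPT) / SLOPE)


h_70k = stage_from_discharge(70_000)
print(f"Stage at 70,000 cfs: {h_70k:.1f} ft")
print(f"Discharge at a stage of 31 ft: {discharge_from_stage(31.0):,.0f} cfs")
```

The stage computed for 70,000 cfs agrees with the 37.1 ft quoted later in the example.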
Solution The first method evaluates the flood protection schemes without
considering uncertainty; that is, it determines the EMV of each scheme directly
from its benefit and cost. If the dam is constructed, the damage is reduced to
$20 million, so the benefit is $100 million – $20 million = $80 million. The net
benefit is $80 million – $40 million = $40 million. This is also the EMV of the
flood benefit of the dam scheme. The benefit–cost ratio, computed here as the net
benefit divided by the cost, is 1. For the levees, the damage is reduced to
$25 million. The benefit is $100 million – $25 million = $75 million.
The net benefit is $75 million – $30 million = $45 million. This is also the EMV of
the flood benefit of the levee scheme. This gives a benefit–cost ratio of 1.5. For
the drainage system, the damage is reduced to $36 million, so the benefit is $100
million – $36 million = $64 million. The net benefit is $64 million – $24 million =
$40 million, giving a benefit–cost ratio of 1.67. The EMV of the drainage system
scheme is also $40 million. For the land use option, the benefit is $100 million –
$40 million = $60 million. The net benefit is $60 million – $20 million = $40 million,
yielding a benefit–cost ratio of 2. The EMV of the land use scheme is $40
million. Based on the benefit–cost ratio, one may want to select the land use
option because it has the highest ratio, although the levees yield the highest
EMV ($45 million).
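The first method can be tabulated with a short sketch. The costs and reduced damages are those given in the example; the ratio is taken as net benefit divided by cost, which is an interpretation that reproduces the values 1, 1.5, 1.67, and 2 quoted in the solution. The names `schemes` and `evaluate` are illustrative.

```python
BASE_DAMAGE = 100.0  # damage without protection, $ million

# scheme -> (construction cost, damage remaining with the scheme), $ million
schemes = {
    "dam": (40.0, 20.0),
    "levees": (30.0, 25.0),
    "drainage system": (24.0, 36.0),
    "land use": (20.0, 40.0),
}


def evaluate(cost: float, reduced_damage: float) -> tuple[float, float, float]:
    """Return (benefit, net benefit, net benefit / cost) for one scheme."""
    benefit = BASE_DAMAGE - reduced_damage
    net_benefit = benefit - cost
    return benefit, net_benefit, net_benefit / cost


table = {name: evaluate(cost, dmg) for name, (cost, dmg) in schemes.items()}

for name, (benefit, net, ratio) in table.items():
    print(f"{name:<16} benefit={benefit:5.1f}  net={net:5.1f}  ratio={ratio:4.2f}")

best_by_net = max(table, key=lambda name: table[name][1])
best_by_ratio = max(table, key=lambda name: table[name][2])
print("Highest net benefit:", best_by_net)
print("Highest ratio:      ", best_by_ratio)
```

The two criteria pick different schemes here, which is why the choice of criterion should be stated before the schemes are ranked.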
Now the second option is examined. This option considers evaluating the
EMV of flood damage for each flood protection scheme under uncertainty. Note
that flood peak Q is a random variable, and so are floods on the Amite River
occurring each year. Likewise, flood damage is a random variable and its value
changes from year to year. The other variable is water level or stage, because
flood damage is related to it rather than to flow. Thus, three random variables
are to be dealt with: the flood peak discharge Q, the corresponding flood stage
H, and the corresponding flood damage D. To compute flood damage, three rela-
tionships are needed: (1) a relationship between peak flow and the correspond-
ing water level (i.e., a rating curve) at the nearby gauge, (2) a relationship
between stage and the corresponding flood damage, and (3) a relationship
between the dollar value and damage. From the data available at the nearby
gauge, the rating curve (the relationship between H and Q) is given as shown in
Fig. 1-7a. The second relationship between the flood stage H and the flood dam-
age D is obtained from the data on actual damage figures and water level obser-
vations and is also given as shown in Fig. 1-7b. Analytical expressions for these
relationships are given.
In evaluating flood protection schemes under uncertainty, the first step is to
perform a frequency analysis of historical flood peak data (Table 1-4) and construct
a cumulative distribution function (CDF) of Q, F(Q), as shown in Fig. 1-7b. Like-
wise, a CDF of flood damage D, F(D), is constructed as shown in Fig. 1-7d.
Not all flood peaks cause damage; only some do and these usually are in the
upper 20% range (i.e., flood peaks in this range contain all damaging floods).
This part of the CDF is plotted separately as shown in Fig. 1-7c. The second step
is to construct a CDF of annual flood damage as shown in Fig. 1-7d (for flood
discharge values, selected ones are in the upper 20th percentile). Each year the
flood peak remains below 70,000 cfs with a probability of F(Q) = 0.7 or 70%. This
flood discharge corresponds to a stage of 37.1 ft. Correspondingly, there is a 70%
probability that the flood damage D will remain below $4.53 million each year.
In other words, F(D) for D = $4.53 million is equal to 0.70 or 70%. In this way the
function F(D) can be constructed.
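The frequency-analysis step can be illustrated with an empirical CDF. The text does not say which plotting-position formula was used; the sketch below assumes the Weibull formula, F = m/(n + 1), where m is the rank in ascending order and n is the sample size. The peak discharges are hypothetical stand-ins for Table 1-4, which is not reproduced here.

```python
def empirical_cdf(values: list[float]) -> list[tuple[float, float]]:
    """Pair each sorted value with its Weibull plotting position m / (n + 1)."""
    ordered = sorted(values)
    n = len(ordered)
    return [(x, rank / (n + 1)) for rank, x in enumerate(ordered, start=1)]


# Hypothetical annual peak discharges (cfs), for illustration only.
peaks_cfs = [58_300, 31_000, 104_000, 45_500, 70_000,
             52_000, 91_000, 64_000, 78_500]

cdf = empirical_cdf(peaks_cfs)
for q, f in cdf:
    print(f"Q = {q:>7,} cfs   F(Q) = {f:.2f}")
```

Each discharge can then be converted to a stage with the rating curve and to a damage with the stage-damage relationship, carrying its F value along, which yields points on F(D).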
Figure 1-7 (panels recovered from the plot residue: CDF of annual peak discharge, F(Q), versus discharge Q in cfs over 20,000 to 120,000 cfs; and the upper part of the same CDF, F(Q) from 0.60 to 0.90, over 60,000 to 120,000 cfs)
To that end, the CDF of D, F(D), as shown in Fig. 1-7d, is employed for com-
puting the probabilities of flood-damage values. For calculating the EMV of the
annual flood damage, consider an interval between D’ – (½) dD and D’ + (½) dD as
shown in Fig. 1-8. The probability that D lies in this interval is equal to dF(D), the
interval on the F(D) axis that corresponds to the interval dD on the D axis. The
incremental value of EMV of the damage in this interval can be denoted by dEMV
and can be written as the magnitude of the damage times the probability of its
occurrence:
dEMV = D × dF(D)
Figure 1-8 Calculation of EMV of the annual flood damage.
The integral of Eq. 1.1 defines the area between the F(D) axis and the CDF
curve of flood damage in Fig. 1-8. The integration can be performed numerically
by taking probability intervals that are sufficiently small for determining the aver-
age value of the damage in the interval. Multiplying the average damage with the
probability interval and adding the values for all intervals gives the EMV of the
annual flood damage. Since costs of alternatives will be spread over time, the
present worth of the EMVs can then be determined by using a given interest rate.
The final value is the EMV of the flood damage. This can be done for each flood
protection scheme and then a flood protection scheme can be selected on this
basis. For the flood-damage curve, the EMV is computed as
EMV = (1.34 × 0.30) + (4.53 × 0.70) + (5.31 × 0.75)
+ (11.6 × 0.95) + (19.3 × 0.99) + (53 × 1.00)
= $90.8 million
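The numerical integration described in the text, in which the average damage in each probability interval is multiplied by the width of that interval, can be sketched as follows. The (F, D) pairs are the ones that appear in the sum above. Treating them as points on the CDF of damage and ignoring any damage below F = 0.30 are assumptions of this sketch. For comparison, the second function weights each damage by its cumulative probability, which is the form of the sum as printed; the two procedures give very different totals.

```python
# (cumulative probability F(D), damage D in $ million), from the sum above
points = [
    (0.30, 1.34),
    (0.70, 4.53),
    (0.75, 5.31),
    (0.95, 11.6),
    (0.99, 19.3),
    (1.00, 53.0),
]


def emv_by_intervals(pts: list[tuple[float, float]]) -> float:
    """Sum of (average damage in interval) x (width of probability interval)."""
    total = 0.0
    for (f0, d0), (f1, d1) in zip(pts, pts[1:]):
        total += 0.5 * (d0 + d1) * (f1 - f0)
    return total


def cumulative_weighted_sum(pts: list[tuple[float, float]]) -> float:
    """Sum of damage x cumulative probability, as in the printed expression."""
    return sum(f * d for f, d in pts)


interval_emv = emv_by_intervals(points)
printed_form = cumulative_weighted_sum(points)

print(f"Interval-weighted EMV (F from 0.30 to 1.00): ${interval_emv:.2f} million")
print(f"Cumulative-weighted sum of the same points:  ${printed_form:.2f} million")
```

An interval-weighted expected value cannot exceed the largest damage in the table ($53 million), which is a quick check on any EMV obtained from a CDF.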
A comment regarding computation of EMV for each option is in order. Each
flood protection scheme affects the EMV of flood damage in its own way. If, for
example, levees are to be used for a flood protection scheme, then levees elimi-
nate the flood damage up to the level where they control the channel flooding
and therefore would change the rating curve at the gauging site. This in turn
would change the damage curve. In a similar manner, each flood protection
scheme would lead to a modified EMV of flood damage. Without a rating curve
and a damage curve for each option, it is not possible to compute EMVs of these
options and make a statement as to which should be the preferred option.
Example 1.5 Trasimeno Lake in central Italy is used for irrigation and recreational
purposes. The water level in the lake reservoir needs to be regulated
through control works at the outlet and outlet channel. The channel allows large
discharges to pass without raising the lake levels too high, whereas the control
works, such as a gate, permit holding water back when the water level would be
low. The release of water from the lake for irrigation purposes is also controlled
by the gate. The lake water level is to be regulated to make it attractive for recre-
ation on the one hand and satisfy the irrigation water supply need on the other
hand. There may be a conflict in satisfying these two objectives. The question to
be addressed is how best to regulate the lake water level. Analyze conceptually
the approach for determining the best solution in these circumstances. The lake
level is a random variable and can be denoted as X. The annual benefit to be
derived from recreation and irrigated agriculture is also a random variable and
can be denoted as Y.
Solution There can be several alternative proposals to achieve the twin objec-
tives. One alternative may be to emphasize the recreational benefits only. The
other alternative is to emphasize irrigation benefits only. There may be several
alternatives combining the two objectives in different ways. Regardless of the
strategy to regulate the lake, one can hypothesize that the best solution is the one
for which the net benefits are maximized. The determination of the best solution
then requires, for each proposal, the calculation of the cost and the benefits. Were
there no uncertainties, one could first calculate the benefits as the EMV of the recreation
and irrigation benefits under natural conditions. Second, one could calculate
the EMV of the recreation benefits with the various proposals for lake
regulation. Then, the difference between the two EMVs could be computed for
each proposal and a decision could be made. However, the lake level is a random
variable and thus calculations of EMVs should be done under uncertainty.
To illustrate the procedure under uncertainty, we consider the average lake
level during the months of June and July as the basic random variable X. There is
a relationship between lake level and possible benefits, that is, between X and Y.
For example, if the lake level is very high, the available beach area may be
reduced and more water may have to be released than what is needed for irriga-
tion. High lake levels also cause beach erosion and may cause damage to cot-
tages and boats and other tourist facilities, especially when coupled with strong
winds. However, low lake levels expose mudflats and cause difficulties with
boating on the lake, such as shoals and docks that are too high. Desired releases
of water for irrigation may not be permissible. This means that a monetary value
must be assigned to each possible lake level. To that end, a survey can be con-
ducted and on that basis a kind of utility function of the lake levels can be pre-
pared. This function quantitatively expresses the monetary worth of the average
lake level during the months of June and July. Figure 1-9 shows this hypothetical
utility function. It is admitted that this utility function is imprecise, subjective,
and difficult to determine. It involves value judgment and attempts to express
intangibles, such as recreation benefits, in terms of monetary value. To make a
rational decision, the worth of benefits has to be expressed in quantitative terms.
It is assumed that the probability density function (PDF) of X is known or can be
obtained from data, as shown in the bottom of Fig. 1-9. Then the CDF of X is
determined from its PDF. Similarly, the PDF and CDF of Y can be derived from
the knowledge of its relationship with X or independently from data, as shown
in Fig. 1-9. It should be noted that the same value of Y can occur for two different
values of X. If Y is set at a given arbitrary value y1, then it can be stated that Y is
smaller than y1 if and only if either X is smaller than x1 or X is larger than x2.
Thus,
P(Y < y1) = P(X < x1) + P(X > x2) (1.2)
The two probabilities, P(X < x1) and P(X > x2), in Eq. 1.2 can be read from the
CDF of X, as shown by a and b in Fig. 1-9. Also,
F(y1) = P(Y < y1) = a + b (1.3)
In this manner, every point of the CDF of Y can be determined, which is the
CDF of the annual benefits. Then EMVs can be computed and in this way differ-
ent options for lake regulation can be evaluated.
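The construction of the CDF of Y from the CDF of X (Eqs. 1.2 and 1.3) can be sketched as follows. Everything numerical here is an assumption made for illustration: the parabolic shape of the utility function, its parameters, and the normal distribution taken for the lake level X are hypothetical and are not taken from Fig. 1-9.

```python
import math


def normal_cdf(x, mean, std):
    """CDF of a normal distribution, used here as an assumed model for X."""
    return 0.5 * (1.0 + math.erf((x - mean) / (std * math.sqrt(2.0))))


Y_MAX = 5.0        # utility at the most desirable lake level
X_BEST = 0.0       # hypothetical most desirable level, measured from a datum (m)
HALF_WIDTH = 1.0   # hypothetical departure (m) at which the utility drops to -5


def utility(x):
    """Hypothetical single-peaked utility function Y = psi(x)."""
    return Y_MAX - 10.0 * ((x - X_BEST) / HALF_WIDTH) ** 2


def cdf_of_y(y1, mean_x, std_x):
    """F(y1) = P(Y < y1) = P(X < x1) + P(X > x2) = a + b."""
    if y1 >= Y_MAX:
        return 1.0                                # no lake level gives Y above Y_MAX
    offset = HALF_WIDTH * math.sqrt((Y_MAX - y1) / 10.0)
    x1, x2 = X_BEST - offset, X_BEST + offset     # the two levels where psi(x) = y1
    a = normal_cdf(x1, mean_x, std_x)             # P(X < x1)
    b = 1.0 - normal_cdf(x2, mean_x, std_x)       # P(X > x2)
    return a + b


for y1 in (-5.0, 0.0, 4.0):
    print(f"F({y1:+.0f}) = {cdf_of_y(y1, mean_x=0.2, std_x=0.8):.3f}")
```

Repeating the calculation for a range of y1 values traces out the whole CDF of the annual benefit, from which an EMV can then be computed for each regulation option.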
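[Figure panels: utility function Y = ψ(x), ranging from −5 to +5, with the CDF of Y, F(y); CDF of X, F(x), with the probabilities a and b marked; PDF of X, f(x); the levels x1 and x2 are marked on the X axis.]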
Figure 1-9 A hypothetical utility function for a lake.
(2) The owner can just pay the penalty each year. The penalty is about $25/m3.
(3) The owner can pay a small amount of penalty and abate pollution to some
extent. It is assumed that the owner decides to treat about 70% of the wastewater
and pays a penalty for the remaining 30%.
From the perspective of the plant owner, the less the cost is, the better. Other
factors, such as environmental and social consequences, are not as important as
economic ones. Find the least-cost solution.
Solution The least-cost solution or option can be determined using EMV. To
that end, let us consider each option one by one.
Option A: For this option there are two consequences. First, if the pollutant
concentration is below the discharge standard, the cost is $20,000, which is
the construction cost of the abatement device. If the concentration is above
the discharge standard, the cost will be the cost of construction and treat-
ment = $20,000 + $3 × 2,000 × 0.8 = $24,800. Therefore, the cost of this conse-
quence will be $24,800.
Option B: If the concentration is below the discharge standard, the owner
has to pay nothing with a probability of 1 – 0.8 = 0.2. Thus, the cost of this
consequence is $0. If the concentration is above the discharge standard, the
cost will be the penalty, which is $25 × 2,000 = $50,000, with a probability of
0.8. Thus, the effective cost will be $50,000 × 0.8 = $40,000.
Option C: If the concentration is below the discharge standard, the cost will
be the cost of the abatement device construction, which is $20,000. Thus, the
effective cost of this consequence will be $20,000 + 0.8 × 2,000 × [($3 × 0.7) +
($25 × 0.3)] = $35,360, as the random part is only the level of pollution
exceeding the standard with a probability of 0.8 and 70% of the wastewater
to be treated (at $3/m3) and the penalty is to be paid for the rest (at $25/m3).
All the costs are listed in Table 1-6a.
Now we calculate the EMV of each option as follows:
EMVA = $20,000 + ($3 × 2,000 × 0.8) = $24,800
One can also compute the EUVs directly by converting the total value into
utility dollars. Thus EUVA < EUVC < EUVB.
From the EMVs and EUVs, it is seen that option A costs the least and option
B costs the most. Thus, it is concluded that option A is the best choice for the
plant owner.
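The expected costs of the three options can be checked with a short script. It uses only the figures quoted in the solution above. The quantity 2,000 is taken here to be the wastewater volume in m3, as the unit costs in $/m3 suggest, and the variable names are our own.

```python
VOLUME = 2_000          # wastewater volume (m3), assumed from the cost expressions
P_EXCEED = 0.8          # probability that the concentration exceeds the standard
CONSTRUCTION = 20_000   # construction cost of the abatement device ($)
TREAT_COST = 3          # treatment cost ($/m3)
PENALTY = 25            # penalty ($/m3)

emv = {
    # Option A: build the device and treat all wastewater when required.
    "A": CONSTRUCTION + TREAT_COST * VOLUME * P_EXCEED,
    # Option B: pay the penalty whenever the standard is exceeded.
    "B": PENALTY * VOLUME * P_EXCEED,
    # Option C: build the device, treat 70%, and pay the penalty on 30%.
    "C": CONSTRUCTION + P_EXCEED * VOLUME * (TREAT_COST * 0.7 + PENALTY * 0.3),
}

for option, cost in sorted(emv.items(), key=lambda item: item[1]):
    print(f"Option {option}: expected cost = ${cost:,.0f}")

best = min(emv, key=emv.get)
print(f"Least-cost option: {best}")
```

The ordering of the expected costs, A < C < B, agrees with the ordering of the EUVs stated above.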
model only those probabilistic aspects that we think we know how to analyze. It
is far better to have an approximate model of the whole problem than an exact
model of only a portion of it.”
In an ideal situation, one should derive joint probability distributions for all
the sources of uncertainty that significantly influence the behavior of the system.
This however is not possible, except in simple cases, owing to the involvement
of a large number of variables and their interactions in a nonlinear manner. The
techniques that are most commonly used for uncertainty analysis are Monte
Carlo simulation (MCS), the mean-value first-order second-moment (MFOSM)
method, and the advanced first-order second-moment (AFOSM) method. In
Monte Carlo simulation, long and multiple series of input random variables are
generated according to the distributions that they follow. Next, these are input to
the model of the system and the output is monitored. Statistical analysis of this
output yields measures of its behavior and the probability distribution.
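A minimal sketch of Monte Carlo simulation is given below. The system model (a safety margin equal to resistance minus load), the normal distributions, and their parameters are hypothetical choices made only to illustrate the generate, run, and analyze sequence described above.

```python
import random
import statistics


def monte_carlo(model, samplers, n_runs=20_000, seed=1):
    """Generate inputs from their distributions, run the model, collect the outputs."""
    rng = random.Random(seed)
    return [model(*(draw(rng) for draw in samplers)) for _ in range(n_runs)]


def margin(resistance, load):
    """Hypothetical system model: safety margin Z = R - L."""
    return resistance - load


# Assumed input distributions: R ~ N(10, 1) and L ~ N(7, 1.5), independent.
samplers = [
    lambda rng: rng.gauss(10.0, 1.0),
    lambda rng: rng.gauss(7.0, 1.5),
]

outputs = monte_carlo(margin, samplers)

# Statistical analysis of the output.
mean_z = statistics.fmean(outputs)
std_z = statistics.stdev(outputs)
p_fail = sum(z < 0.0 for z in outputs) / len(outputs)

print(f"mean = {mean_z:.3f}, std = {std_z:.3f}, P(Z < 0) = {p_fail:.4f}")
```

The estimates fluctuate from one seed to another; increasing n_runs reduces this sampling error at the cost of computing time.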
In the mean-value first-order second-moment method, the Taylor series expansion
of the system performance function is truncated after the first-order terms. The use of
the term “mean-value” signifies that the expansion is about the mean value of the
variable. Further, only the first two moments of the variable are needed. This makes
the method easy to apply and simplifies the calculations. However, if the lineariza-
tion about the central value is not an adequate representation of the true behavior
of the variable, the method will not give acceptable results. Another criticism of this
method arises from the fact that most engineering systems do not fail near the aver-
age of the performance function. Rather, failure occurs at some extreme value.
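The MFOSM calculation can be sketched as follows, assuming uncorrelated input variables and estimating the first derivatives at the mean by central differences. The performance function, the means, and the standard deviations are hypothetical.

```python
import math


def mfosm(g, means, stds, rel_step=1e-6):
    """First-order estimates of the mean and standard deviation of g(X),
    expanding g about the mean values of uncorrelated inputs."""
    mean_g = g(means)                      # first-order mean: g evaluated at the means
    var_g = 0.0
    for i, (m, s) in enumerate(zip(means, stds)):
        h = rel_step * max(abs(m), 1.0)
        up = list(means)
        down = list(means)
        up[i] = m + h
        down[i] = m - h
        dg_dx = (g(up) - g(down)) / (2.0 * h)   # derivative at the mean point
        var_g += (dg_dx * s) ** 2               # contribution of variable i
    return mean_g, math.sqrt(var_g)


def performance(x):
    """Hypothetical performance function: the product of two variables."""
    return x[0] * x[1]


mean_g, std_g = mfosm(performance, means=[2.0, 3.0], stds=[0.1, 0.2])
print(f"mean = {mean_g:.3f}, standard deviation = {std_g:.3f}")
```

Only the means and standard deviations of the inputs enter the calculation, which is what makes the method easy to apply and also what limits it when the function is strongly nonlinear.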
In the AFOSM method, the Taylor series expansion of the performance func-
tion is taken at a likely failure point. Thus, the key to a successful application of
AFOSM is the determination of the likely failure point. In AFOSM, the reliability
index is the shortest distance between the mean state of the system and the fail-
ure surface. This index can be found either by applying some nonlinear optimi-
zation procedure or by following an iterative scheme.
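One possible iterative scheme is sketched below. It is the Hasofer-Lind iteration in the Rackwitz-Fiessler form, which the text above does not name, so its use here is our assumption. The variables are assumed to be independent and normal and to have been transformed to standard normal space, where the mean state is the origin. The linear performance function is hypothetical.

```python
import math


def hasofer_lind(g, grad_g, n_vars, tol=1e-10, max_iter=100):
    """Search for the point on g(u) = 0 closest to the origin of standard
    normal space; its distance from the origin is the reliability index."""
    u = [0.0] * n_vars                      # start at the mean state
    for _ in range(max_iter):
        grad = grad_g(u)
        norm_sq = sum(c * c for c in grad)
        # Linearize g at u and move to the closest point of the linearized surface.
        scale = (sum(c * ui for c, ui in zip(grad, u)) - g(u)) / norm_sq
        u_new = [scale * c for c in grad]
        converged = max(abs(a - b) for a, b in zip(u_new, u)) < tol
        u = u_new
        if converged:
            break
    beta = math.sqrt(sum(ui * ui for ui in u))
    return beta, u


# Hypothetical margin Z = R - L with R = 10 + 1.0*u1 and L = 7 + 1.5*u2.
def g(u):
    return (10.0 + 1.0 * u[0]) - (7.0 + 1.5 * u[1])


def grad_g(u):
    return [1.0, -1.5]


beta, failure_point = hasofer_lind(g, grad_g, n_vars=2)
print(f"reliability index = {beta:.4f}, likely failure point = {failure_point}")
```

For a linear performance function the scheme converges after one step; for a nonlinear function the gradient changes with u and several iterations are usually needed.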
While comparing the reliability analysis methods with specific reference to
watershed models, Melching (1995) noted that the AFOSM method displayed
good agreement with MCS and better agreement for the tails of the probability
distributions. Expectedly, he noted that when the nonlinearities were not large,
MFOSM performed as accurately as (and sometimes better than) AFOSM. Note
that the results of MCS for a large number of runs formed the standard against
which the performance of the other methods was compared.
making under risk or uncertainty involves identifying (1) the actual decisions to
be taken in a particular situation (optimal policy) and (2) the models that are
employed to select the optimal policy. The models they discussed include expec-
tation objective, expectation-variance objective, safety-first rule, utility function,
stochastic dominance, and risk curves.
The mathematical expectation of the risk curve is the simplest approach to
making a rational decision. This is not suitable for management of irrigation reservoirs,
because it is risk neutral. The expectation-variance approach assumes either that
preferences are based on the mean and variance of the net returns or that the outcome is normally
distributed. These assumptions are quite restrictive. The safety-first rule avoids
risk and is therefore not a valid approach. The utility function measures the util-
ity level of each decision or action. It is quite difficult to quantify the utility level.
Stochastic dominance or stochastic efficiency is an efficiency measure of different
decisions. It lacks the ability to identify a strategy for comparing risk curves. It is
difficult to analytically define risk curves of different decisions. One of the main
problems of these decision models is that they are only capable of partially
accommodating the concerns of the decision maker. For rational decision making,
Bouchart and Goulter (1998) proposed a methodology using neural networks.
1.8 Questions
1.1 An urban area gets flooded and flooding needs to be mitigated. To that
end, one can consider construction of a detention pond and attenuate flood
peaks using the pond. Another option could be upgrading or enhancement
of the existing drainage system, which will suffice to carry greater runoff.
Still another option could be construction of additional drainage channels.
Proper land use management can be another option. There can be other
options also. Since flooding is a random variable, a decision has to be made
under uncertainty. Analyze the urban flooding and discuss conceptually
which way is the most rational way to mitigate flooding.
1.2 Consider the problem of water supply to a city. There can be several
ways by which water can be supplied to the city. Water can be supplied
from a nearby river. Of course, the river flow is subject to uncertainty.
Water can be supplied from groundwater, which is also subject to uncer-
tainty. Water can be supplied using a combination of surface and
groundwater sources. Still another source can be water harvesting. In
any case, a decision has to be made under uncertainty. What is the ratio-
nal decision for supplying water to the city? Discuss it conceptually.
1.3 Consider a problem of solid waste disposal. An alternative is to burn it
in the open or incinerate it mechanically. One can also landfill it. Or one
can use both options. Still another way is to haul it away to another place
or dump it in the nearby sea. There are many options, but each option
1.10 An industrial plant generates considerable waste and the owner wants to
determine the best way to dispose of it. There can be many options. The
plant is located near a river. The plant can dump the waste in the river but
the river water quality has to be maintained. This means that the entire
waste may not be dumped. The owner can construct a storage facility and
treat the waste or use a combination of both. Discuss conceptually the
rational decision that can be made for disposing of the waste.
1.11 A family is building a new home in the New Orleans area. The base cost of
the home is $100,000, but the family can choose the degree of wind resis-
tance of the structure. For the base price, the house is designed to with-
stand wind gusts of 90 miles per hour (mph). For an additional cost of
$2,000, the house will withstand winds of 100 mph, and for an additional
$4,000 the house can withstand winds of 110 mph. The expected cost of
repairing wind damage is $10,000, and the family hopes to not have to pay
for damage during the first 10 years of living in the house. From the
FEMA Multihazard Loss Estimation Methodology Hurricane Model
HAZUS-MH Technical Manual, the return periods of various wind gusts
can be estimated as shown below.
Wind gust (mph) Return period (years) Annual probability (per year)
90 15 0.067
100 23 0.043
110 35 0.029
From these data and the building costs given, one can determine the
expected net benefit for different options. What should be the rational
decision for the family for building its home?
1.12 Following catastrophic Hurricane Katrina, nearly 1.5 million people had
to be evacuated from the Louisiana–Mississippi Gulf coast. Clean up
operations will take about a year. Assume that 500,000 people will be
forced to live in temporary housing for a year. Supplying adequate
drinking water is an important aspect of maintaining safe housing
development. Relief officials have three options to choose from: (1) ship-
ping bottled water, (2) pumping well water, and (3) on-site chlorination
of surface water. Assume that the cost of bottled water is $1.89 per liter
and the demand can be met with certainty. Per capita consumption can
be assumed to be 2 liters per day. It can be assumed that pumping
groundwater will cost $0.02912 per liter but there is a 75% chance that
the groundwater supply will be exhausted before the end of the year.
The cost of chlorination is $0.035071 per liter and there is a 90% chance
that there will be sufficient surface water and chlorination to meet the
water demand. What is the rational decision that the relief officials
should make for meeting the water demand of the evacuees in tempo-
rary housing?
Elements of Probability
[Plot: flow (cfs), 0 to 4,000, against date, September 1988 to August 2000, with the threshold flow marked.]
Figure 2-1 Daily discharge of the Hillsborough River downstream of the Tampa Dam.
Sample Space
Day Observed DO concentration (mg/L) Violations Day Observed DO concentration (mg/L) Violations
1 4.04 2.74 4.09 3.45 3.62 3.39 4 16 4.03 3.85 3.02 5.50 5.96 3.40 3
2 2.33 2.05 6.64 6.80 3.07 3.58 4 17 5.00 3.87 5.39 5.50 0.70 3.73 3
3 1.89 3.52 5.00 6.00 4.30 2.98 3 18 3.04 6.19 5.37 4.07 4.20 4.21 1
4 3.51 5.88 2.43 5.25 3.31 2.99 4 19 1.68 2.95 4.31 5.22 0.70 3.79 4
5 2.68 3.49 3.86 4.68 0.36 3.38 5 20 1.95 4.09 3.50 5.27 3.81 3.75 4
6 2.31 4.48 3.30 5.68 3.35 3.88 4 21 3.76 4.02 2.10 5.55 0.75 3.04 4
7 2.46 6.06 3.19 3.22 2.70 3.74 5 22 5.32 4.28 4.55 5.84 3.94 3.27 2
8 1.74 2.75 3.82 4.09 2.00 3.39 5 23 2.61 4.57 3.45 5.64 2.70 3.23 4
9 3.48 3.72 5.71 4.48 3.38 3.29 4 24 2.97 6.21 2.94 0.79 4.99 2.37 4
10 3.06 5.43 5.48 7.40 1.36 2.88 3 25 3.80 3.02 3.32 5.67 3.07 2.87 5
11 4.20 4.01 1.63 2.82 2.27 3.80 4 26 2.95 4.31 4.64 3.31 2.27 3.09 4
12 2.68 4.74 4.20 2.80 5.62 4.10 2 27 2.31 4.38 4.46 3.77 0.99 3.39 4
13 2.08 3.63 3.56 3.27 2.91 3.07 6 28 4.45 3.69 6.93 3.70 3.67 3.24 4
14 3.65 2.28 4.03 3.72 3.76 2.50 5 29 3.47 3.50 6.25 6.28 5.99 3.92 3
15 2.62 4.00 4.94 4.31 2.72 2.81 3 30 4.41 2.60 4.54 3.80 2.75 3.40 4
[Diagram: the sample space {0, 1, 2, 3, 4, 5, 6} of the number of DO violations that might occur on a given day.]
[Bar chart: vertical axis from 0 to 0.4, plotted against the number of violations (1 to 6).]
2.1.2.1 Histogram
Histogramming is a method of discretization (encoding a data set using bins or
classes) wherein a continuous data set is converted into discrete data. A histogram
is constructed by dividing the observed data into bins (or classes) such that for the
first bin, x1 ≤ X < x2, for the second bin, x2 ≤ X < x3, etc. The DO concentration val-
ues presented in Fig. 2-2 are divided into bins and then a plot is made of the num-
ber of observations in each bin versus the value of X as shown in Fig. 2-4.
The appropriate width of a bin or class interval and the number of total bins
depend on the number of data points, the minimum and maximum values of
data, and the overall behavior of the data. Often, 5 to 15 bins are sufficient for
most practical applications. After categorizing the data into bins or classes, the
number of observations in each bin is determined. As a rule of thumb, the number
of classes is approximately equal to √n, where n is the number of observations.
The best histogram is obtained by using the Sturges rule:
N = 1 + 3.3 log10 n (2.1)
where N is the number of classes and n is the total number of observations in the
data set.
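The two rules for the number of classes can be compared with a short sketch. Rounding the result to the nearest integer is our assumption, and the function names are our own; n = 180 corresponds to the 30 days of six DO readings tabulated above.

```python
import math


def classes_sqrt_rule(n):
    """Rule of thumb: number of classes approximately equal to sqrt(n)."""
    return round(math.sqrt(n))


def classes_sturges(n):
    """Sturges rule, Eq. 2.1: N = 1 + 3.3 log10(n)."""
    return round(1 + 3.3 * math.log10(n))


n_obs = 180   # 30 days x 6 DO observations per day
print(f"square-root rule: {classes_sqrt_rule(n_obs)} classes")
print(f"Sturges rule:     {classes_sturges(n_obs)} classes")
```

The square-root rule grows faster with n than the Sturges rule, so the two can differ noticeably for large data sets.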
[Histogram: number of observations (0 to 70) against DO concentration (mg/L), with classes centered at 0.5 to 7.5 mg/L.]
[Panels a and b: relative frequency against DO concentration (mg/L), 0 to 8 mg/L.]
statistical analysis. But it must define the number of observations in each bin or
class. In a comparison of two data sets, this will only be useful if they have the
same number of observations. To compare groups of different sizes, the histo-
gram must be modified by plotting the relative frequency of each class versus
the value of X, as given in Fig. 2-5. The relative frequency of a class is deter-
mined by dividing the number of observations in that class by the total number
of observations in the data set. An alternative display is the frequency polygon,
shown for the same data in Fig. 2-5. The frequency polygon is obtained by joining
the points that pair the midpoint of each class with the class frequency.
In many applications regulatory requirements are in terms of the number of
exceedances or nonexceedances, as, for example, the number of violations of
the critical dissolved oxygen level in a given reach of stream or the number of
exceedances above a given flow or stage at a given cross section of a river. In
these cases, it is advantageous to make another transformation of class frequen-
cies to obtain a cumulative frequency plot. Cumulative relative frequency is
determined by adding the frequency of each class to the sum of the frequencies
for the lower classes by considering the relative frequency constructed by
approaches (1) and (2). Table 2-1 provides cumulative frequencies for the DO
data presented in Fig. 2-2. Figure 2-6 presents the cumulative frequency plot for
the DO data. Based on this figure, one can make statements related to the
chances of exceedance of a certain DO level, such as chances of exceedance of
3.0 mg/L = 1 − 0.27 = 0.73 (i.e., 73%).
Example 2.1 Table E2-1a gives the annual peak flow at the USGS 08075000 site
on Brays Bayou River in downtown Houston, Texas. Develop a frequency histo-
gram and a cumulative frequency plot for the peak flow.
[Panels a and b: cumulative relative frequency (0 to 1) against DO concentration (mg/L), 0 to 8 mg/L.]
Solution In the data from the table, n = 66. Using the Sturges rule, N = 1 + 3.3
log10(66) = 7.004 ≈ 7. Since the peak flows range from 786 to 29,000 cfs, the bin
width for each class is (29,000 − 786)/7 = 4,030.6 cfs. Thus seven bins
were selected and frequencies were determined for each bin. By dividing the fre-
quency of each class by n, the relative frequency was determined and then the
cumulative frequency was evaluated as given in Table E2-1b (where (1) and (2)
denote the two approaches outlined earlier). Figures 2-7 and 2-8 show the rela-
tive frequency and cumulative frequency histograms obtained by both normal-
ization approaches.
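The steps of this solution can be reproduced with the sketch below, which uses the 66 peak flows of Table E2-1a. Taking the bins to span the range from the smallest to the largest observation, and placing the largest value in the last bin, are our assumptions. Only the plain relative frequency is computed; the two normalization approaches are not distinguished.

```python
import math

# Annual peak flows (cfs) from Table E2-1a, in chronological order.
peak_flows = [
    11100, 6600, 1270, 4530, 6800, 1340, 6460, 4590, 6280, 8120, 5590,
    3880, 4360, 1440, 2340, 5340, 786, 1850, 3580, 3680, 3300, 1180,
    4660, 5100, 7760, 12600, 6320, 7720, 8300, 4060, 3160, 9400, 4730,
    12000, 9240, 11500, 15500, 11700, 24800, 8660, 18000, 29000, 8710, 6260,
    25500, 11300, 25400, 17700, 29000, 8640, 12300, 17300, 22400, 8290, 21500,
    10400, 19800, 23000, 16000, 16600, 27000, 17700, 23400, 25500, 16700, 7640,
]

n = len(peak_flows)
n_bins = round(1 + 3.3 * math.log10(n))          # Sturges rule
low, high = min(peak_flows), max(peak_flows)
width = (high - low) / n_bins                    # bin width (cfs)

# Count the observations falling in each bin.
counts = [0] * n_bins
for q in peak_flows:
    k = min(int((q - low) / width), n_bins - 1)  # largest value goes in the last bin
    counts[k] += 1

# Relative and cumulative relative frequencies.
rel_freq = [c / n for c in counts]
cum_freq = []
running = 0.0
for f in rel_freq:
    running += f
    cum_freq.append(running)

print(f"n = {n}, bins = {n_bins}, bin width = {width:.1f} cfs")
for k in range(n_bins):
    lower = low + k * width
    print(f"{lower:9.1f} to {lower + width:9.1f}: "
          f"count = {counts[k]:2d}, rel = {rel_freq[k]:.3f}, cum = {cum_freq[k]:.3f}")
```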
Table E2-1a
Year             1929  1936  1937  1938  1939  1940  1941  1942  1943  1944  1945
Peak flow (cfs)  11100 6600  1270  4530  6800  1340  6460  4590  6280  8120  5590
Year             1946  1947  1948  1949  1950  1951  1952  1953  1954  1955  1956
Peak flow (cfs)  3880  4360  1440  2340  5340  786   1850  3580  3680  3300  1180
Year             1957  1958  1959  1960  1961  1962  1963  1964  1965  1966  1967
Peak flow (cfs)  4660  5100  7760  12600 6320  7720  8300  4060  3160  9400  4730
Year             1968  1969  1970  1971  1972  1973  1974  1975  1976  1977  1978
Peak flow (cfs)  12000 9240  11500 15500 11700 24800 8660  18000 29000 8710  6260
Year             1979  1980  1981  1982  1983  1984  1985  1986  1987  1988  1989
Peak flow (cfs)  25500 11300 25400 17700 29000 8640  12300 17300 22400 8290  21500
Year             1990  1991  1992  1993  1994  1995  1996  1997  1998  1999  2000
Peak flow (cfs)  10400 19800 23000 16000 16600 27000 17700 23400 25500 16700 7640
Table E2-1b
[Panels a and b: relative frequency against peak flow (cfs), 0 to 3 × 10^4; the vertical axis runs from 0 to 0.35 in panel a and from 0 to 7 × 10^-5 in panel b.]
Figure 2-7 Frequency histogram plot for the Brays Bayou River peak flow data.
[Panels a and b: cumulative relative frequency (0 to 1) against peak flow (cfs), 0 to 3 × 10^4.]
Figure 2-8 Cumulative frequency plot for the Brays Bayou River peak flow data.
these, three sums are larger than 10. A sum of 11 can be obtained in two ways: 6
on the first die and 5 on the second or vice versa, whereas 12 is possible only
when both dice have 6. Hence, the probability of getting a sum larger than 10 is
3/36 = 1/12.) The approximation tends to get better as the number of throws
increases.
When a coin is tossed, there are two equally likely possible outcomes; that is,
there is mutual symmetry of all possible outcomes. In any toss, therefore, we
have a number (say, n) of cases that are equally likely. Consider, for example, an
event E defined as getting more than 3 heads in a single throw of 5 coins. The
cases that favor the event E or successes (see Fig. 2-10) are (H, H, H, H, H) and (H,
H, H, H, T). Note that the concern here lies only with the total number of heads
or tails in a single throw. The possible outcomes are (H, H, H, H,
H), (T, H, H, H, H), (T, T, H, H, H), (T, T, T, H, H), (T, T, T, T, H), and (T, T, T, T, T).
If the number of cases favorable to the event (or number of successes) is denoted
by s then in this example, s = 2 and n = 6. Similarly, let event E be defined as
throwing more than 10 with two dice in one single throw. The cases that favor
this event or successes are (6, 6), (6, 5), and (5, 6). In this example, s = 3 and
n = 36.
P = lim_{n→∞} (s/n) (2.2)
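The two-dice case (s = 3, n = 36) and the limiting relative frequency of Eq. 2.2 can be checked with a short sketch. The enumeration is exact; the simulated frequency depends on the random seed, which is an arbitrary choice here.

```python
import random
from fractions import Fraction
from itertools import product

# Exact count: all 36 equally likely outcomes of throwing two dice.
outcomes = list(product(range(1, 7), repeat=2))
successes = [o for o in outcomes if sum(o) > 10]
p_exact = Fraction(len(successes), len(outcomes))
print(f"s = {len(successes)}, n = {len(outcomes)}, P = {p_exact}")


def relative_frequency(n_throws, seed=0):
    """Ratio s/n observed in n_throws simulated throws of two dice."""
    rng = random.Random(seed)
    s = sum(rng.randint(1, 6) + rng.randint(1, 6) > 10 for _ in range(n_throws))
    return s / n_throws


# The ratio s/n tends toward the exact probability as n increases.
for n_throws in (100, 10_000, 200_000):
    print(f"n = {n_throws:7d}: s/n = {relative_frequency(n_throws):.4f}")
```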
[Diagram: sample space S = {(H, H, H, H, H), (H, H, H, H, T), (H, H, H, T, T), (H, H, T, T, T), (H, T, T, T, T), (T, T, T, T, T)}; event E = {(H, H, H, H, H), (H, H, H, H, T)}.]
[Plot: frequency ratio of heads (0 to 1) against number of throws (1 to 1,000, logarithmic scale).]
elementary events are called composite events. Together, all elementary and
composite events constitute the sample space. A sample event can also be an
event consisting of a single sample point, and a compound event is made up of
two or more sample points or elementary outcomes of an experiment. The com-
plement Ac of an event A consists of all sample points in the sample space of the
experiment not included in the event A. Therefore, the complement of an event
is also an event. If A is a certain event (i.e., if A is the collection of all sample
points in the sample space S), then its complement Ac will be the null event (i.e.,
it will contain no sample events).
If two events contain no sample points in common, the events are said to be
mutually exclusive or disjoint (see Fig. 2-12). If two events A and B are not
mutually exclusive, the set of points that they have in common is called their
intersection, denoted as AB. Figure 2-13 shows this concept using the Venn dia-
gram. If the intersection of two events A and B is equivalent to one of the events,
say, A, then event A is said to be contained in event B and is written as A ⊂ B. The
union of two events A and B is the event that is the collection of all sample points
that occur at least once in either A or B and is written as A ∪ B.
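These definitions map directly onto set operations. The sketch below uses a hypothetical sample space, the six faces of a die, and hypothetical events chosen only for illustration.

```python
S = frozenset(range(1, 7))     # sample space: faces of a die
A = frozenset({2, 4, 6})       # event: an even number
B = frozenset({4, 5, 6})       # event: a number greater than 3
C = frozenset({1, 3})          # event: a 1 or a 3
D = frozenset({6})             # event: a six

A_complement = S - A           # all sample points not in A
intersection = A & B           # points common to A and B
union = A | B                  # points in A or B (or both)

print("complement of A:", sorted(A_complement))
print("A intersection B:", sorted(intersection))
print("A union B:", sorted(union))
print("A and C mutually exclusive:", A.isdisjoint(C))
print("D contained in B:", D <= B, "and D intersection B equals D:", (D & B) == D)
print("complement of the certain event:", sorted(S - S))
```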
Another sample space of interest is conditional sample space. For example, a
hydrologist might be interested in floods exceeding a certain threshold event
denoted as A. The set of events exceeding event A can be considered as a new,
reduced sample space. Only the sample events associated with the sample
points in that reduced space, which is conditional on A, are possible outcomes of
the experiment. The reduced sample space is the conditional sample space and
can be represented as {events|events ≥ A}.
Figure 2-13 Venn diagram showing intersection and union of two events.
[Plot: annual flow (million cubic m, 0 to 100) against peak flow (100 m^3/s, 0 to 150), with event A marked.]
[Venn diagram: events A, B, C, and D and the complement Ac within the sample space S.]
Figure 2-16 Venn diagram.
where A is the event associated with all sample points in the sample
space.
Axiom 3: The probability of an event that is the union of two events is
P(E1 or E2 ... or En) = P(E1 ∪ E2 ∪ E3 ∪ ... ∪ En) = Σ_{i=1}^{n} P(Ei) (2.6b)
P[∩_{i=1}^{n} Ei] = P(E1) × P(E2) × ... × P(En) = Π_{i=1}^{n} P(Ei) (2.7b)
[Venn diagram: events A and B and their intersection A ∩ B.]
This leads to
P[A ∩ B] = P[A|B] × P[B] = P[B|A] × P[A] (2.10)
If the two events A and B are statistically independent, then
P[A|B] = P[A] (2.11a)
P[B|A] = P[B] (2.11b)
P[A ∩ B] = P[A] × P[B] (2.12)
which is the same as Eq. 2.7a.
Taking advantage of Eq. 2.9, one can express the joint occurrence of n depen-
dent events as
P[∩_{i=1}^{n} Ei] = P(E1) × P(E2|E1) × P(E3|E2, E1) × ... × P(En|En−1, En−2, ..., E1) (2.13)
Example 2.2 Consider a river passing through an urban area that reaches a flood
stage each summer with a relative frequency of 0.1. Power failures in an indus-
trial complex along the river occur with a probability of 0.2. Experience shows
that when there is a flood, the chances of a power failure are raised to 0.4. Deter-
mine the probability of flooding or power failure.
Solution We are given the following:
P[Flood] = P[F] = 0.1
P[Power failure] = P[PP] = 0.2
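Therefore, the probabilities of the complementary events, no flood (Fc) and no
power failure (PPc), are
P[Fc] = 1 − 0.1 = 0.9
P[PPc] = 1 − 0.2 = 0.8
If flooding and power failure were independent, the joint probabilities would be
P[F ∩ PP] = 0.1 × 0.2 = 0.02
P[F ∩ PPc] = 0.1 × 0.8 = 0.08
P[Fc ∩ PP] = 0.9 × 0.2 = 0.18
P[Fc ∩ PPc] = 0.9 × 0.8 = 0.72
and the probability of flooding or power failure would be
P[F ∪ PP] = P[F ∩ PP] + P[F ∩ PPc] + P[Fc ∩ PP]
= 0.02 + 0.08 + 0.18 = 0.28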
The events are, however, dependent. When a flood occurs with P[F] = 0.1, a
power failure occurs with probability P[PP|F] = 0.4. Therefore, the true joint
probability is
P[F ∩ PP] = P[F] × P[PP|F]= 0.1 × 0.4 = 0.04
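The calculation can be carried one step further with a short sketch. It uses the general addition rule P[A ∪ B] = P[A] + P[B] − P[A ∩ B], which is a standard result but is not stated in the passage above, so the dependent-case union and the reverse conditional probability shown here are our additions rather than part of the printed solution.

```python
p_flood = 0.1                 # P[F]
p_power = 0.2                 # P[PP]
p_power_given_flood = 0.4     # P[PP|F]

# Under the (incorrect) assumption of independence.
joint_indep = p_flood * p_power
union_indep = p_flood + p_power - joint_indep

# Accounting for the dependence between flooding and power failure.
joint_dep = p_flood * p_power_given_flood
union_dep = p_flood + p_power - joint_dep

# Eq. 2.10 rearranged: P[F|PP] = P[F and PP] / P[PP].
p_flood_given_power = joint_dep / p_power

print(f"independent: P[F and PP] = {joint_indep:.2f}, P[F or PP] = {union_indep:.2f}")
print(f"dependent:   P[F and PP] = {joint_dep:.2f}, P[F or PP] = {union_dep:.2f}")
print(f"P[F|PP] = {p_flood_given_power:.2f}")
```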
Example 2.3 Consider the design of an underground utility system for an indus-
trial park containing six similar building sites. The sites have not yet been located
and hence their nature is not yet known. If the power and water are provided in
excess of demand, there will be wastage of client’s capital. However, if the facili-
ties prove inadequate, expensive changes will be required. For simplicity of num-
bers, let us consider a particular site where the electric power required by the
occupant will be either 5 or 10 units while the water capacity demand would be
either 1 or 2 units. It is assumed that the probability of electric power demand
being 5 units and water demand being 1 unit is 0.1, the probability of electric
power demand being 5 units and water demand being 2 units is 0.2, the probabil-
ity of electric power demand being 10 units and water demand being 1 unit is 0.1,
and the probability of electric power demand being 10 units and water demand
being 2 units is 0.6. Calculate the probabilities of water or power demands.
Solution First, we define four associated events. Let us denote the following:
event W1 = the water demand is 1 unit, W2 = the water demand is 2 units, E5 =
the electricity demand is 5 units, and E10 = the electricity demand is 10 units.
Note that water and power demands occur simultaneously. In other words, they
both need to be satisfied simultaneously. For example, the water demand of 1
unit can occur either with electric power demand of 5 units or with 10 units.
Then the sample experimental space associated with a single occupant consists
of four points as shown in Table E2-3.
Table E2-3
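As a sketch of how the probabilities of the individual water and power demands follow from the four joint probabilities stated in the example, the sample points can be summed over the other variable. The dictionary layout and the helper name marginal are our own.

```python
# Joint probabilities of (electricity demand, water demand) stated in Example 2.3.
joint = {
    ("E5", "W1"): 0.1,
    ("E5", "W2"): 0.2,
    ("E10", "W1"): 0.1,
    ("E10", "W2"): 0.6,
}


def marginal(joint_probs, position, label):
    """Sum the joint probabilities of all sample points containing the given event."""
    return sum(p for point, p in joint_probs.items() if point[position] == label)


p_e5 = marginal(joint, 0, "E5")
p_e10 = marginal(joint, 0, "E10")
p_w1 = marginal(joint, 1, "W1")
p_w2 = marginal(joint, 1, "W2")

print(f"P[E5] = {p_e5:.1f}, P[E10] = {p_e10:.1f}")
print(f"P[W1] = {p_w1:.1f}, P[W2] = {p_w2:.1f}")
print(f"total probability = {sum(joint.values()):.1f}")
```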
Example 2.4 In a survey, a number of firms in similar industrial parks are sam-
pled. It is found that there is no apparent relationship between their electricity
demand and their water demand. A high electricity demand does not always
seem to be correlated with a high water demand. Based on this information,
probabilities are assigned, as listed in Table E2-4a. Find the probabilities for the
joint or simultaneous occurrence of events denoted by water demand and elec-
tricity demand.
Example 2.5 For simplicity, let us calculate the probabilities of water demand
only and investigate the design capacity of a pair of similar sites in the industrial
park. The occupancies of the two sites represent two repeated trials of the previous
experiment. Denote by W1W1 the event that the demand of each firm is one
unit and by W1W2 the event that the demand of the first firm is one unit and that
of the second firm is two units and so on. Find the probabilities for the joint
occurrence of water demands at two sites.
Solution Assuming independence of demands for water from the two sites, one
calculates the values in Table E2-5.
If the demands of all six sites are mutually independent, the probability that
all sites will demand two units of water is
Table E2-4a
Demand         Event   Estimate of probability
Electricity    E5      0.2
               E10     0.8
               Sum     1.0
Water          W1      0.3
               W2      0.7
               Sum     1.0
Table E2-4b
P[E5W1] = P[E5]P[W1] 0.2 × 0.3 0.06
P[E5W2] = P[E5]P[W2] 0.2 × 0.7 0.14
P[E10W1] = P[E10]P[W1] 0.8 × 0.3 0.24
P[E10W2] = P[E10]P[W2] 0.8 × 0.7 0.56
Total 1.00
Table E2-5
P[W1W1] = P[W1]P[W1] 0.3 × 0.3 0.09
P[W1W2] = P[W1]P[W2] 0.3 × 0.7 0.21
P[W2W1] = P[W2]P[W1] 0.7 × 0.3 0.21
P[W2W2] = P[W2]P[W2] 0.7 × 0.7 0.49
Total 1.00
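The entries of Tables E2-4b and E2-5 can be regenerated from the probabilities of Table E2-4a under the stated assumption of independence. The same assumption, applied to six sites, gives the probability that every site demands two units of water. The variable names are our own.

```python
from itertools import product

# Probabilities from Table E2-4a.
p_elec = {"E5": 0.2, "E10": 0.8}
p_water = {"W1": 0.3, "W2": 0.7}

# Table E2-4b: joint electricity and water demand at one site.
table_e2_4b = {(e, w): p_elec[e] * p_water[w] for e, w in product(p_elec, p_water)}

# Table E2-5: water demands at two sites.
table_e2_5 = {(a, b): p_water[a] * p_water[b] for a, b in product(p_water, repeat=2)}

# All six sites demand two units of water.
p_all_six_w2 = p_water["W2"] ** 6

for events, p in table_e2_4b.items():
    print(f"P[{events[0]}{events[1]}] = {p:.2f}")
for events, p in table_e2_5.items():
    print(f"P[{events[0]}{events[1]}] = {p:.2f}")
print(f"P[all six sites demand W2] = {p_all_six_w2:.4f}")
```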
Example 2.6 Consider a standard deck of playing cards from which we remove
all hearts except the ace. We now have 40 cards left. What is the probability that
a card drawn at random is both a heart and an ace?
Solution The event of drawing both a heart and an ace can be regarded as the
intersection of two events: event A, meaning the card is an ace, and event H,
meaning that the card is a heart. Since there are four aces, the probability of
drawing one out of a deck of 40 cards is 4/40 = 0.10:
P(A) = 0.10
Since there is only one heart, the probability of drawing it out of a deck of 40
is 1/40 = 0.025:
P(H) = 0.025
If events A and H were treated as independent, the multiplication rule would
lead to
P(A ∩ H) = P(A) × P(H) = 0.10 × 1/40 = 1/400
This is obviously wrong.
To do the calculation correctly, we need to use
P(A ∩ H) = P(A|H) × P(H)
P(A|H) is the conditional probability that the card is an ace if it is known to
be a heart. This event is certain; it has a probability of one. Therefore
P(A ∩ H) = 1× 1/40 = 1/40
Alternatively, we can use
P(A ∩ H) = P(H|A) × P(A)
P(H|A) is the conditional probability that the card is a heart if it is known to
be an ace. This probability is evidently 1/4. Therefore
P(A ∩ H) = (1/4) × 1/10 = 1/40
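The result of Example 2.6 can be checked by brute-force enumeration. The following is a minimal Python sketch, not part of the original text; the rank and suit labels and the helper name prob are illustrative assumptions. It builds the reduced 40-card deck and counts outcomes directly.

```python
from fractions import Fraction

# Illustrative labels for a standard 52-card deck (assumed, not from the text).
ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["hearts", "spades", "diamonds", "clubs"]

# Remove every heart except the ace, leaving 52 - 12 = 40 cards.
deck = [(r, s) for r in ranks for s in suits
        if not (s == "hearts" and r != "A")]


def prob(event):
    """Probability of an event (a predicate on a card) for a uniform random draw."""
    return Fraction(sum(1 for card in deck if event(card)), len(deck))


def is_ace(card):
    return card[0] == "A"


def is_heart(card):
    return card[1] == "hearts"


p_ace = prob(is_ace)                                  # P(A)
p_heart = prob(is_heart)                              # P(H)
p_both = prob(lambda c: is_ace(c) and is_heart(c))    # P(A ∩ H) by counting
p_ace_given_heart = p_both / p_heart                  # P(A|H)
naive_product = p_ace * p_heart                       # valid only under independence
```

Comparing p_both with naive_product shows why the multiplication rule for independent events cannot be used here, whereas P(A|H) × P(H) reproduces the counted value.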
Example 2.7 A contractor wants to construct a tunnel below the bed of a river in
connection with the development of a transportation project. Construction will
take two years. The building site must be surrounded by a cofferdam designed to
keep the water out during a 10-year flood. What is the probability that the site
will be flooded during construction? The water level in the river is considered
a random variable.
Solution The 10-year flood is the flood that has a 10% probability of being
exceeded in any given year. This means that the contractor accepts a 10%
probability that the site will be flooded in any one year of construction. It is
assumed that in each of the two years the probability of flooding remains at 10%
and that the events in the two years are independent. One might be tempted to
argue that doubling the time will double the risk, so that the probability of
flooding is 20%. This clearly is not the case. By the same reasoning one would
conclude that flooding would occur with absolute certainty if the construction
period were extended to 10 years, and if the period of construction were
extended to 12 years, this reasoning would lead to an answer of 120%, which is
patently wrong.
To analyze this problem, we need consider only the maximum water level
during the first year and the maximum water level during the second year,
because our interest is in the question of whether the level is higher or lower
than the critical level that would cause the cofferdam to be overtopped. The
water levels in successive years are assumed to be independent. The probability
of no flooding in the first year is 0.9; likewise, the probability of no flooding in
the second year is also 0.9. The probability that there is no flooding in either year
is equal to the product of the probabilities, 0.9 × 0.9 = 0.81. Similarly, probabili-
ties can be assigned to the other three possible points. In this case, the sum of all
the probabilities adds up to 1.00.
In a similar manner, no flooding in the first year and no flooding in the sec-
ond year is the intersection of two events defined as a single point that will have
a probability of 0.81. Likewise, flooding in the first year and flooding in the sec-
ond year is the intersection of two events, say, event A and event B, and results
in a single point that will have a probability of 0.1 × 0.1 = 0.01. In the probability
space we can arbitrarily give the water levels lower than the critical level the
value 0 and those higher than the critical level the value 1. This gives four
possible points in probability space, as shown in Fig. 2-19, where we now have a
discrete distribution.
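Figure 2-19 Probability space of Example 2.7. [Axes: Year 1 and Year 2, each taking the value 0 or 1. Point probabilities: 0.81 at (0, 0), 0.09 at (1, 0), 0.09 at (0, 1), and 0.01 at (1, 1). Events A and B are marked.]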
The event of interest is flooding during the construction period. This means
flooding during the first year or flooding during the second year. This event is
the union of the two events A and B:
P(A ∪ B) = P(A) + P(B) – P(A ∩ B)
P(A ∪ B) = 0.10 + 0.10 – (0.10 × 0.10) = 0.19
The probability associated with the event (A ∪ B) is 0.19. It then follows that
the probability is 19% that the site will be flooded during construction.
Alternatively, it is even simpler to observe that the event of interest, “flooding
during year 1 or year 2,” has a complementary event: “no flooding during year 1 and
no flooding during year 2.” The latter is the intersection of two independent
events A^c and B^c, each having a probability of 0.9 [i.e., P(A^c) = P(B^c) = 1 – 0.1 = 0.9].
The probability of the intersection is P(A^c ∩ B^c) = P(A^c) × P(B^c) = 0.9 × 0.9 = 0.81.
The probability of the complementary event, that the site will be flooded at least
once in two years, is 1.0 – 0.81 = 0.19.
Example 2.8 A highway culvert is designed for a 10-year flood. What is the
probability that the design flood will be exceeded in the next 20 years?
Solution The event “exceedance during the 20 years” is the union of 20 events
that have many points in common, namely, all the possible multiple occurrences
of exceedance during the 20-year period. Applying the addition rule thus
becomes quite awkward. However, the complementary event has a probability
that can be evaluated easily. The complementary event of exceedance at any time
during the period of 20 years is evidently “no exceedance at all.” That means “no
exceedance in the first year,” “no exceedance in the second year,” “no exceedance
in the third year,” and so on. This is the intersection of 20 events, each having a
probability of 0.9. Assuming statistical independence and applying the
multiplication rule gives a probability equal to (0.9)^20 = 0.12. It then follows that the
probability of exceedance during the 20-year period is 1.0 – 0.12 = 0.88.
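Examples 2.7 and 2.8 both rest on the same complement argument: the probability of at least one exceedance in n independent years is one minus the probability of no exceedance in every year. A minimal Python sketch of that calculation follows; the function name prob_exceedance and its parameter names are assumptions made for illustration.

```python
def prob_exceedance(return_period_years, n_years):
    """P(at least one exceedance in n_years), assuming independent years.

    The annual exceedance probability of a T-year flood is taken as 1/T.
    """
    p_annual = 1.0 / return_period_years
    p_no_exceedance = (1.0 - p_annual) ** n_years   # multiplication rule
    return 1.0 - p_no_exceedance                    # complementary event


p_cofferdam = prob_exceedance(10, 2)    # Example 2.7: 10-year flood, 2 years
p_culvert = prob_exceedance(10, 20)     # Example 2.8: 10-year flood, 20 years
```

The same function also shows why the "doubling the time doubles the risk" argument fails: the result approaches 1 as n_years grows but never exceeds it.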
By invoking the properties of mutually exclusive events, Eq. 2.14 can be sim-
plified to
P(A) = P(A ∩ S1) + P(A ∩ S2) + … + P(A ∩ Sn) (2.15)
Using Eq. 2.9, P(A ∩ B) = P[A|B] ×P[B], allows one to write Eq. 2.15 as
$$P[A] = P[A|S_1] \times P[S_1] + P[A|S_2] \times P[S_2] + \cdots + P[A|S_n] \times P[S_n] = \sum_{i=1}^{n} P[A|S_i] \times P[S_i]$$ (2.16)
Equation 2.16 is also known as the theorem of total probability. This gives the
probability of event A, regardless of the attributes.
Rewriting Eq. 2.10, one gets
P[A ∩ Si] = P[A|Si] × P[Si] = P[Si|A] × P[A] (2.17)
or
P[Si|A] = P[A|Si] × P[Si]/P[A]
$$P[S_i|A] = \frac{P[A|S_i] \times P[S_i]}{\sum_{i=1}^{n} P[A|S_i] \times P[S_i]}$$ (2.18)
Equation 2.18 is known as Bayes’s theorem and follows from the definition
of conditional probability; it is regarded as a fundamental theorem for revising a
probability value in the light of evidence. Bayes’s theorem involves a prior (or a priori)
distribution that contains all the relevant information about the variable before
additional data become available. Given the prior distribution, the posterior (or a
posteriori) distribution can be evaluated when a new set of data becomes available.
Table E2-9
Probability space X → Number of days of rain
Event Ai                                          A1    A2    A3    A4    A5
Number of days of rain                            0     2     3     4     >4
Chance of rain [P(X = Ai)]                        40%   30%   20%   10%   0%
Chance of completing work (given) [P(R|X = Ai)]   90%   70%   50%   10%   0%
From the total probability theorem, the probability of completing the work
on time can be expressed as
$$P(R) = \sum_{i=1}^{5} P(R \cap A_i) = \sum_{i=1}^{5} P(R|A_i)\,P(X = A_i)$$
One can now determine the probabilities associated with the intersections of
events A1, A2, A3, A4, and A5 with events R and Q using the multiplication rule:
Thus,
$$P(R) = \sum_{i=1}^{5} P(R \cap A_i) = \sum_{i=1}^{5} P(R|A_i)\,P(X = A_i)$$
$$= 0.9 \times 0.4 + 0.7 \times 0.3 + 0.5 \times 0.2 + 0.1 \times 0.1 = 0.68$$
It is seen that the probability of finishing the work on time, now called the
total probability, is equal to 36% + 21% + 10% + 1% = 68%. One can visualize that
each of the events A1, A2, A3, A4, and A5 carries part of this total probability and a
percentage of each corresponding to the conditional probability is associated
with R and the rest is associated with Q.
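The total probability calculation above can be reproduced in a few lines. The sketch below uses the values of Table E2-9; the variable names are illustrative, and Q is assumed here to denote the complement of R (work not completed on time), which this excerpt does not define explicitly.

```python
# P(X = A_i): chance of 0, 2, 3, 4, and more than 4 days of rain (Table E2-9).
p_rain = [0.40, 0.30, 0.20, 0.10, 0.00]
# P(R | A_i): chance of completing the work on time given A_i (Table E2-9).
p_finish_given_rain = [0.90, 0.70, 0.50, 0.10, 0.00]

# Multiplication rule: P(R ∩ A_i) = P(R | A_i) * P(X = A_i).
joint_finish = [pf * pr for pf, pr in zip(p_finish_given_rain, p_rain)]

# Theorem of total probability (Eq. 2.16): sum over the partition A_1..A_5.
p_finish = sum(joint_finish)

# Assumption: Q is the complement of R, so P(Q) = 1 - P(R).
p_not_finish = 1.0 - p_finish
```

Each entry of joint_finish is the part of P(R) carried by the corresponding event Ai, which is the decomposition described in the paragraph above.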
Example 2.10 Assume that the engineer who calculated the probabilities shown
in Example 2.9 was away when the work was being done. She did not know any-
thing about the weather at the site but she learned that the contractor was able to
finish in time. Did it rain? And if it did, did it rain for two, three, or four days? So
she decided to calculate the probabilities of rain during zero, two, three, and four
days, knowing that the contractor was able to finish on time. It should be noted
that the engineer did have the information on the chances of rainy days and those
of completing the work on time. This example is based on Booy (1990).
Solution The probabilities to be calculated are the probabilities of rain for a
given number of days given that the work has been completed on time. Thus
these are conditional probabilities. These probabilities are not the probabilities of
number of rainy days, which are already given. Therefore, to distinguish
between the probabilities of number of rainy days, which are given, and the
probabilities to be computed, the latter are denoted as R1, R2, R3, and R4. For
computing the probability that it did not rain given that the work was com-
pleted one can write
R1 = P(A1|R) = P(A1 ∩ R)/P(R)
It is worth noting that before the engineer knew that the contractor had fin-
ished on time she would have rated the probabilities of no rain, two rainy days,
three rainy days, and four rainy days as 0.40, 0.30, 0.20, and 0.10.
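The revision of the prior probabilities by Bayes’s theorem (Eq. 2.18) can be sketched as follows. This is an illustrative Python fragment, not the authors’ worked solution; the function name posterior is an assumption, and the inputs are the prior probabilities quoted above together with the conditional probabilities of Table E2-9.

```python
def posterior(prior, likelihood):
    """Bayes's theorem: P(A_i | R) = P(R | A_i) P(A_i) / sum_j P(R | A_j) P(A_j)."""
    joint = [p * l for p, l in zip(prior, likelihood)]
    evidence = sum(joint)               # total probability P(R)
    return [j / evidence for j in joint]


# Prior probabilities of no rain, two, three, and four rainy days.
prior = [0.40, 0.30, 0.20, 0.10]
# Probability of finishing on time given each number of rainy days.
likelihood = [0.90, 0.70, 0.50, 0.10]

# Posterior probabilities R1..R4 given that the work was finished on time.
r = posterior(prior, likelihood)
```

Because finishing on time is more likely when there is little rain, the posterior weight shifts toward the events with fewer rainy days relative to the prior.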
Figure 2-21 Diagram showing the probability of obtaining various values of a variable X.
Example 2.12 Develop a PMF for the data presented in Table 2-1.
Solution Assume that the sample data are sufficient to characterize the PMF of
the DO violation. In such a case, the relative frequency corresponding to a given
number of DO violations can be regarded as its probability. Therefore, Fig. 2-6
can be assumed to represent the PMF of DO violations.
Example 2.13 A subdivision has a provision for water supply from four water
supply utilities. Based on the past history of these individual utilities, it has been
noted that 95% of the time these utilities are able to meet the demand but,
because of maintenance, drought conditions, or other reasons, 5% of the time
they fail to meet the required demand. Develop a probability mass function to
show the distribution of probability with respect to the number of ways the sub-
division will meet its demand.
Solution Let X be the random variable representing the number of ways the
subdivision will meet its demand. The possible values that X can take on are
Table E2-11
[Figure: probability P(X = x) = f(x) plotted against X (MG).]
Example 2.14 If two dice are thrown, compute the probability of obtaining dif-
ferent values as the sum of the points on their top faces.
Solution Each die has six faces and each face has a unique number of dots,
varying from 1 to 6. Hence, if two dice are thrown, the minimum that one can get
is 1 + 1 = 2 and the maximum is 6 + 6 = 12. In all, there are 6 × 6 = 36 possible out-
comes. Some numbers can occur in more than one way. For example, one can
obtain a sum of 6 in five ways: 1 + 5, 2 + 4, 3 + 3, 4 + 2, and 5 + 1. Clearly, the
probability of obtaining a sum of 6 is 5/36 = 0.139. A complete enumeration of the
sums is shown in Table E2-14a.
The number of ways of obtaining each value of X and the corresponding
probabilities are summarized in Table E2-14b.
Now, the PMF of the random variable X, which is the total number of points
obtained when throwing two ordinary dice, is plotted in Fig. 2-24. It is seen from
the figure that number 7 has the highest probability of occurrence.

Table E2-14a
           Die #2
Die #1     1    2    3    4    5    6
1          2    3    4    5    6    7
2          3    4    5    6    7    8
3          4    5    6    7    8    9
4          5    6    7    8    9    10
5          6    7    8    9    10   11
6          7    8    9    10   11   12

Table E2-14b
X      Number of ways   Probability
2      1                0.03
3      2                0.06
4      3                0.08
5      4                0.11
6      5                0.14
7      6                0.17
8      5                0.14
9      4                0.11
10     3                0.08
11     2                0.06
12     1                0.03
Sum    36               1
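The enumeration in Example 2.14 is small enough to carry out by machine. The following Python sketch, with illustrative variable names, lists all 36 equally likely outcomes and tallies the sums to obtain the PMF.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# All 6 x 6 = 36 equally likely (die 1, die 2) outcomes, tallied by their sum.
counts = Counter(a + b for a, b in product(range(1, 7), repeat=2))

# PMF of X = sum of the two top faces, as exact fractions.
pmf = {total: Fraction(n, 36) for total, n in sorted(counts.items())}

# The most probable sum.
mode = max(pmf, key=pmf.get)
```

Rounding float(pmf[x]) to two decimals for each x gives the probability column of the table above.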
Figure 2-24 PMF for the sum of numbers obtained by throwing two dice.
probability space and this ratio is called the probability density. For a single ran-
dom variable X, the probability density is a function of the value of X, x, and this
function is called the probability density function (PDF); for two variables, X
and Y, it is a function of x and y, and so on. The PDF is denoted by the symbol
f(x). Figure 2-25 shows a graph of a typical probability density function. Since a
continuous random variable X is defined on a particular interval, it may take on
any value in that particular interval. For example, if we say that X is defined on
any arbitrary interval between points a and b, then the probability that X lies
within this interval (a, b) is equal to the area of the PDF, f(x), intercepted by X = a
and X = b:
$$P(a \le x \le b) = P(a < x < b) = \int_a^b f(x)\,dx$$ (2.20)
1. $f(x) \ge 0$ for all x
2. $\int_{-\infty}^{\infty} f(x)\,dx = 1$
Figure 2-25 Probability density function.
That means the PDF, f(x), is a non-negative function and the total area under
the PDF is equal to unity.
$$f(x) = c(x - 1), \quad \text{for } 0 \le X \le 10$$
Solution The first property of the PDF, f(x) ≥ 0 for all x, determines the range of
values that X can take on. For f(x) to be a non-negative function (with c > 0), X ≥ 1.
Therefore, the range on which X is defined is 1 ≤ X ≤ 10. Now, using the
second property of the PDF,
$$\int_{-\infty}^{\infty} f(x)\,dx = 1$$
we obtain
$$\int_1^{10} c(x - 1)\,dx = 1$$
$$c\left[\frac{x^2}{2} - x\right]_1^{10} = 1$$
$$c = 2/81$$
Example 2.16 Plot the PDF of X given in Example 2.15 and determine the fol-
lowing: (a) probability X ≤ 3, (b) probability X ≥ 9, and (c) probability 4 ≤ X ≤ 8.
Solution The function f(x) = 2(x – 1)/81 is evaluated for various values of X
ranging from 1 to 10 and is plotted in Fig. 2-26.
[Figure 2-26: f(x) plotted against x for 0 ≤ x ≤ 10.]
(a) Probability X ≤ 3 = P(X ≤ 3) = $\int_1^3 \frac{2(x - 1)}{81}\,dx$ = 0.0494
(b) Probability X ≥ 9 = 1 – P(X ≤ 9) = 1 – $\int_1^9 \frac{2(x - 1)}{81}\,dx$ = 0.21
(c) Probability 4 ≤ X ≤ 8 = P(4 ≤ X ≤ 8) = $\int_4^8 \frac{2(x - 1)}{81}\,dx$ = 0.493
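The normalizing constant of Example 2.15 and the three probabilities of Example 2.16 can be verified with exact arithmetic. The sketch below is illustrative only; it assumes the closed-form antiderivative of c(x − 1) and uses a helper named cdf that is not part of the original text.

```python
from fractions import Fraction

# Normalization: c * integral_1^10 (x - 1) dx = c * 81/2 = 1, so c = 2/81.
area_without_c = Fraction(10 - 1) ** 2 / 2
c = 1 / area_without_c


def cdf(x):
    """F(x) = c (x - 1)^2 / 2 on 1 <= x <= 10, clamped to 0 and 1 outside."""
    x = Fraction(x)
    if x <= 1:
        return Fraction(0)
    if x >= 10:
        return Fraction(1)
    return c * (x - 1) ** 2 / 2


p_a = cdf(3)              # (a) P(X <= 3)
p_b = 1 - cdf(9)          # (b) P(X >= 9)
p_c = cdf(8) - cdf(4)     # (c) P(4 <= X <= 8)
```

Converting the fractions with float() gives 0.0494, 0.21, and 0.494 to the precision quoted above.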
Figure 2-27 Joint probability distribution.
$$f(x, y) = P(X = x \text{ and } Y = y)$$ (2.22)
In real life, most engineering problems contain more than one random vari-
able to define the process of an engineering system. For example, the groundwa-
ter level in a phreatic aquifer depends on withdrawal by pumping, rainfall,
evaporation, and inflow or outflow from other surface water bodies. Another
example is one of a reservoir in which the volume of water and reservoir level
depend upon input from its contributing streams, outflow to downstream reach,
water supply withdrawal, losses from evaporation and seepage, etc. Further, if
these random variables are statistically dependent, analysis related to these
Figure 2-28 Contours of equal probability.
$$\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f(x, y)\,dx\,dy = 1$$
we write
$$\int_0^2\int_0^2 a x^2 y^2\,dy\,dx = 1$$
$$\int_0^2 \frac{8a}{3} x^2\,dx = 1$$
$$a = 9/64$$
$$P(0.5 \le X \le 1.5,\ 0 \le Y \le 1) = \frac{9}{64}\int_{0.5}^{1.5}\int_0^1 x^2 y^2\,dy\,dx = \frac{3}{64}\int_{0.5}^{1.5} x^2\,dx = 0.0508$$
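Because the joint density in these integrals factors as a x^2 y^2, both the constant a and the rectangle probability reduce to products of one-dimensional integrals of u^2. The Python sketch below checks the two results exactly; it assumes, as the integration limits above imply, that the density is defined on 0 ≤ x ≤ 2 and 0 ≤ y ≤ 2, and the helper name int_sq is illustrative.

```python
from fractions import Fraction


def int_sq(lo, hi):
    """Exact integral of u^2 from lo to hi: (hi^3 - lo^3) / 3."""
    lo, hi = Fraction(lo), Fraction(hi)
    return (hi ** 3 - lo ** 3) / 3


# Normalization over the assumed domain [0, 2] x [0, 2].
a = 1 / (int_sq(0, 2) * int_sq(0, 2))

# P(0.5 <= X <= 1.5, 0 <= Y <= 1) = a * int x^2 dx * int y^2 dy.
p_rect = a * int_sq(Fraction(1, 2), Fraction(3, 2)) * int_sq(0, 1)
```

The factorization works only because the integrand separates into a function of x times a function of y; a density such as 3(x − y)^2/8 would need a genuine double integral.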
$$f(x, y) = \begin{cases} \dfrac{3(x - y)^2}{8}, & -1 \le x \le 1;\ -1 \le y \le 1 \\ 0, & \text{otherwise} \end{cases}$$
Find the marginal probability density function of X.
Solution Applying Eq. 2.24,
$$f(x) = \int_{-\infty}^{+\infty} f(x, y)\,dy,$$
gives
$$f(x) = \frac{3}{8}\int_{-1}^{1} (x - y)^2\,dy = \frac{3}{8}\int_{-1}^{1} (x^2 - 2xy + y^2)\,dy$$
$$f(x) = \frac{3}{8}\left[x^2 y - x y^2 + \frac{y^3}{3}\right]_{-1}^{1} = \frac{1}{4}(3x^2 + 1)$$
$$f(x) = \begin{cases} \dfrac{3x^2 + 1}{4}, & -1 \le x \le 1 \\ 0, & \text{otherwise} \end{cases}$$
Figure 2-29 Joint and marginal distributions.
In the same way one can determine the marginal distribution of Y:
$$f(y) = \begin{cases} \dfrac{3y^2 + 1}{4}, & -1 \le y \le 1 \\ 0, & \text{otherwise} \end{cases}$$
$$P(A|B) = \frac{P(A, B)}{P(B)}$$
we thus have
$$f(y|x)\,dy = \frac{f(x^*, y)\,dx\,dy}{f(x^*)\,dx}$$ (2.26)
or
$$f(y|x) = \frac{f(x^*, y)}{f(x^*)}$$ (2.27)
and
$$f(x|y) = \frac{f(x, y^*)}{f(y^*)}$$ (2.28)
Note that x* is a constant in Eq. 2.27 and y* is a constant in Eq. 2.28. Since x* is a
constant, the expression f(x*, y) signifies the function of f(x, y) for a constant value
of x. On a two-dimensional probability space, this is the cross section of the plane
X = x* with the probability density surface f(x, y), which is a curve, not a surface.
The curve f(x*, y), shown in Fig. 2-30, measures the way the probability density
changes with Y for a constant X but is not a proper PDF since its area will not be
equal to 1. However, the area can be computed by integrating f(x, y)dy over the
entire range of Y for constant X. This is the same way that the marginal probability
density function of X is obtained. This yields the area under the curve f(x*, y)
as equal to f(x*), as shown in Fig. 2-30. Thus, dividing the function f(x*, y) by f(x*)
makes the area under the curve equal to 1.
Equations 2.27 and 2.28 show the relationships among conditional, joint, and
marginal distributions, which can be clarified by considering their geometric
representations. Figure 2-30 shows the joint distribution and the marginal
distribution of X.
Figure 2-30 Joint and marginal distributions.
Example 2.19 Consider the joint PDF f(x,y) given in Example 2.18. Determine
the conditional distribution of X given Y = y.
Solution Applying Eq. 2.28,
$$f(x|y) = \frac{f(x, y)}{f(y)},$$
$$f(x|y) = \frac{3(x - y)^2/8}{[3y^2 + 1]/4} = \frac{3(x - y)^2}{6y^2 + 2}$$
In the same way one can determine the conditional distribution of Y given x:
$$f(y|x) = \begin{cases} \dfrac{3(x - y)^2}{6x^2 + 2}, & -1 \le y \le 1 \\ 0, & \text{otherwise} \end{cases}$$
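A quick numerical check helps confirm that the marginal and conditional densities derived from f(x, y) = 3(x − y)^2/8 behave as they should. The sketch below is illustrative only: it uses a simple midpoint rule (the helper name integrate and the evaluation points x0 and y0 are assumptions) to compare the numerical marginal with the closed form (3x^2 + 1)/4 and to confirm that f(x|y) integrates to 1 over x.

```python
def joint(x, y):
    """Joint PDF f(x, y) = 3 (x - y)^2 / 8 on the square [-1, 1] x [-1, 1]."""
    if -1.0 <= x <= 1.0 and -1.0 <= y <= 1.0:
        return 3.0 * (x - y) ** 2 / 8.0
    return 0.0


def integrate(g, lo, hi, n=2000):
    """Midpoint-rule approximation of the integral of g over [lo, hi]."""
    h = (hi - lo) / n
    return h * sum(g(lo + (i + 0.5) * h) for i in range(n))


def marginal_x(x):
    """Marginal f(x): integrate the joint PDF over the whole range of y."""
    return integrate(lambda y: joint(x, y), -1.0, 1.0)


def conditional_x_given_y(x, y):
    """Conditional f(x|y) = f(x, y) / f(y), with f(y) = (3 y^2 + 1) / 4."""
    return joint(x, y) / ((3.0 * y ** 2 + 1.0) / 4.0)


x0, y0 = 0.5, -0.25                      # arbitrary evaluation points
marginal_numeric = marginal_x(x0)
marginal_closed = (3.0 * x0 ** 2 + 1.0) / 4.0
area_conditional = integrate(lambda x: conditional_x_given_y(x, y0), -1.0, 1.0)
```

The area under the conditional density equals 1 for any fixed y0 in [−1, 1], which is exactly the normalization that dividing by the marginal is meant to achieve.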
$$F(x) = \int_{-\infty}^{x} f(x)\,dx$$ (2.30)
Once the CDF is determined, the probability associated with any interval can
be determined as
P(a < X ≤ b) = F(b) − F(a) (2.31)
For a discrete variable, F(x) is calculated by summation of probabilities:
Figure 2-31a and Fig. 2-31b show the probability density functions and
cumulative distribution functions for continuous and discrete variables. The
CDF is a function that starts at zero somewhere on the left-hand side and
increases until it reaches one on the right-hand side. F(x) is the total probability to
the left of x and at point x itself. For a continuous distribution, the inclusion of
point x makes no difference since the probability at any single point is zero.
Example 2.21 Plot the CDF of X given in Example 2.15 and determine (a) proba-
bility X ≤ 3, (b) probability X ≥ 9, and (c) probability 4 ≤ X ≤ 8.
Solution By using the PDF and Eq. 2.30, the CDF values are calculated at sev-
eral values of X. Then these points are joined by a smooth curve. The obtained
curve is the required CDF as shown in Fig. 2-33.
Using the plot in Fig. 2-33 one can read the probability corresponding to
any interval. The answers to the posed questions are (a) probability X ≤ 3 = 0.05,
(b) probability X ≥ 9 = 1 – 0.79 = 0.21, and (c) probability 4 ≤ X ≤ 8 = 0.60 – 0.11 =
0.49.
[Figure 2-31a and Figure 2-31b: f(x) and the CDF F(x) plotted against x, with a, b, F(a), F(b), and P(a < x ≤ b) marked.]
Table E2-20
X Frequency PMF = f(x) CDF = F(x)
2 1 0.03 0.03
3 2 0.06 0.08
4 3 0.08 0.17
5 4 0.11 0.28
6 5 0.14 0.42
7 6 0.17 0.58
8 5 0.14 0.72
9 4 0.11 0.83
10 3 0.08 0.92
11 2 0.06 0.97
12 1 0.03 1.00
Sum = 36 1
Figure 2-32 CDF for the sum of numbers obtained by throwing two dice.
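For a discrete variable the CDF is a running sum of the PMF, and Eq. 2.31 then gives interval probabilities by subtraction. The following Python sketch, with illustrative names, rebuilds the CDF column of Table E2-20 from the frequencies.

```python
from fractions import Fraction
from itertools import accumulate

# Number of ways of obtaining each sum of two dice (frequency column).
ways = {2: 1, 3: 2, 4: 3, 5: 4, 6: 5, 7: 6, 8: 5, 9: 4, 10: 3, 11: 2, 12: 1}
totals = sorted(ways)

# PMF values f(x) and their running sum F(x).
pmf = [Fraction(ways[x], 36) for x in totals]
cdf = dict(zip(totals, accumulate(pmf)))


def prob_interval(a, b):
    """P(a < X <= b) = F(b) - F(a) (Eq. 2.31), for a and b among the sums 2..12."""
    return cdf[b] - cdf[a]


p_5_to_8 = prob_interval(4, 8)   # P(4 < X <= 8), i.e., a sum of 5, 6, 7, or 8
```

Note that the interval in Eq. 2.31 is open on the left, so prob_interval(4, 8) excludes a sum of 4 but includes a sum of 8.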
[Figure 2-33: F(x) = P(X < x) plotted against x for 0 ≤ x ≤ 10, with the values 0.05, 0.11, 0.60, and 0.79 marked.]
if x lies in the interval between x + (½)dx and x – (½)dx, such that y = w(x). This
means that the probability that Y lies in the interval between y + (½)dy and y – (½)dy
is equal to the probability that X lies between x + (½)dx and x – (½)dx. In other
words,
f(y)dy = f(x)dx (2.33)
The differential quotients dy/dx and dx/dy can be positive as well as negative
and can be determined by differentiation. The ratio between two positive inter-
vals corresponding to events on the probability space can be obtained as
$$f(y) = f(x)\,\frac{dx}{dy}$$ (2.34)
Example 2.22 Assume X is a continuous variable defined in the interval 0 < X < 2
and characterized by the following PDF:
$$f(x) = \frac{3}{8}x^2$$
If y = x^3, find the probability 0 < Y < 5.
Solution Applying Eq. 2.34, f(y) = f(x) dx/dy, and differentiating the relationship
y = x^3, dy = 3x^2 dx, we get dx/dy = 1/(3x^2) and so
$$f(y) = f(x)\,\frac{dx}{dy} = \frac{3}{8}x^2 \cdot \frac{1}{3x^2} = \frac{1}{8}$$
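Since 0 < X < 2 maps to 0 < Y < 8 under y = x^3, the derived density f(y) = 1/8 is uniform on (0, 8), and the requested probability follows as 5/8. The sketch below is an illustrative check, not the original solution text; it computes the probability two ways, through the CDF of X and through the derived density of Y, and the function names are assumptions.

```python
def cdf_x(x):
    """F_X(x) = x^3 / 8 on 0 < x < 2, obtained by integrating f(x) = 3 x^2 / 8."""
    x = min(max(x, 0.0), 2.0)
    return x ** 3 / 8.0


def pdf_y(y):
    """Derived density of Y = X^3 (Eq. 2.34): constant 1/8 on 0 < y < 8."""
    return 1.0 / 8.0 if 0.0 < y < 8.0 else 0.0


# Route 1: P(0 < Y < 5) = P(0 < X < 5^(1/3)), using the CDF of X.
p_via_x = cdf_x(5.0 ** (1.0 / 3.0)) - cdf_x(0.0)

# Route 2: area of the uniform density of Y over (0, 5).
p_via_y = (5.0 - 0.0) * pdf_y(2.5)
```

Agreement between the two routes is a useful sanity check on any change of variables, because the probability of an event cannot depend on which variable is used to describe it.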
$$C = C_0 \exp(-xt)$$
$$f(x) = \frac{\lambda e^{-\lambda x} x^{\alpha - 1}}{\Gamma(\alpha)}$$
$$f(c) = f(x)\,\frac{dx}{dc}$$
If x is rewritten as
$$x = -\frac{1}{t}\ln\left(\frac{C}{C_0}\right)$$
then
$$\frac{dx}{dC} = \frac{d\left[-\dfrac{1}{t}\ln\left(\dfrac{C}{C_0}\right)\right]}{dC} = \frac{1}{-tC}$$
Using
$$x = -\frac{1}{t}\ln\left(\frac{C}{C_0}\right)$$
and substituting dx/dc and simplifying the expression gives the distribution of
chlorine residual as
$$f(c) = \frac{\lambda \left(\dfrac{C}{C_0}\right)^{\lambda/t} \left[\ln\left(\dfrac{C}{C_0}\right)\right]^{\alpha - 1}}{\Gamma(\alpha)\, C\, (-t)^{\alpha}}$$
2.7 Questions
2.1 Obtain daily temperature data for the city you live in for a number of
years, say, 30 years or more. Plot the temperature data against time. Is
daily temperature random? Obtain the maximum temperature and the
minimum temperature for the month of August for each year. Plot this
temperature as a function of year. Compute the mean temperature for
August and plot it. Discuss whether the maximum temperature and
minimum temperatures are random variables.
2.2 Obtain rainfall data for a city of your choice for several years, say, 30
years or more. Compute yearly rainfall and the long-term yearly mean.
Plot yearly rainfall as well as the mean. Is yearly rainfall a random vari-
able? Now obtain the rainfall data for the month of August for each year.
Also obtain the long-term mean rainfall for the month of August. Plot
the August rainfall data as a function of year. Also plot the mean. Is rain-
fall for the month of August a random variable?
2.3 Obtain the yearly maximum wind velocity data for a city of your choice
for a number of years. Plot the wind velocity as a function of time and
discuss whether this velocity is a random variable. Also plot the wind
velocity mean.
2.4 Obtain the instantaneous maximum discharge data for the Amite River
at Darlington, Louisiana. Plot the maximum discharge as a function of
year and show if the discharge can be considered as a random variable.
Also plot the mean discharge. Also compute the time between two con-
secutive maximum discharge values, called the interarrival time. Com-
pute the average value of this time. Can the interarrival time be
considered random?
2.5 Each year air quality standards established by the EPA are violated for a
certain number of times in Baton Rouge, Louisiana. Obtain the number
of violations occurring in Baton Rouge for several years. Can the number
of violations be considered a random variable? Compute the period of
each violation as well as the time between violations. Can these be con-
sidered random? Compute their mean values.
2.6 The following experiments involve a random variable or nonrandom
variable:
(a) Roll two fair dice at a time.
(b) Measure the time between consecutive plane arrivals.
(c) Toss a coin five times.
(d) Take a penalty shot on goal.
(e) Measure the number of days an air quality standard is violated in a
year.
(f) Roll a die and determine whether it is a 6 or not.
(g) Test a randomly selected circuit to see whether it is defective.
(h) Determine the number of vehicles on a bridge at a particular period
of time.
(i) Determine whether there was flooding this year in New Orleans.
(j) Find out the flow rate in a stream.
(k) Determine when a comet appears in the sky.
(l) Request a person's highest educational level.
Indicate which of the variables is random, and if it is, then determine
which type of random variable it is (e.g., discrete, continuous, or
Bernoulli).
2.7 A concrete culvert is to be designed such that it can carry a predicted
flow. Discharge measurements are irregular, and the engineer assigns
estimates of annual maximum flow rates and their likelihoods of
occurrence (assuming that a maximum of 20 cfs is possible) as follows: event A
= [10 to 17] with P[A] = 0.6; event B = [13 to 20] with P[B] = 0.6; and event
C = [A ∪ B] with P[C] = 0.7.
(a) Construct the sample space. Indicate events A, B, C, A ∩ C, A ∩ B,
and Ac ∩ Bc on the sample space.
(b) Find P[A ∩ B], P[Ac], and P[B ∩ Ac].
(c) Find P[A|B], P[B|A], and P[B|Ac].
2.8 It is assumed that earthquakes and high wind speeds are unrelated. At a
particular location the probability of “high” wind speed occurring
throughout any single minute is 10^–6 and the probability of a “moderate”
earthquake during any single minute is 10^–9.
(a) Find the probability of the joint occurrence of the two events during
any minute.
(b) Find the probability of the occurrence of one or the other or both
during any minute.
(c) If the events in succeeding minutes are mutually independent, what
is the probability that there will be no moderate earthquakes in a
year near this location? What is the probability in 5 years?
2.9 Consider the possible failure of a water supply system to meet demand
during any given dry-season day.
(a) Use the total probability theorem (Eq. 2.16) to determine the probability that
the supply will be insufficient if the probabilities are as listed in
Table Q2-9.
Table Q2-9
Demand level (gal/day)   P[level]   P[inadequate supply | level]
D1 = 50,000              0.55       0
D2 = 100,000             0.35       0.1
D3 = 150,000             0.10       0.5
Sum                      1.00
Table Q2-12
Year 1943 1944 1945 1946 1947 1948 1949 1950 1951 1952 1953
Peak flow 12400 8850 1380 3040 1600 2480 1810 6890 9390 3170 2210
Year 1954 1955 1956 1957 1958 1959 1960 1961 1962 1963 1964
Peak flow 829 1320 10300 7950 5730 7500 2290 10300 4110 6050 10600
Year 1965 1966 1967 1968 1969 1970 1971 1972 1973 1974 1975
Peak flow 1830 1090 1040 24500 1700 5060 1310 2020 7270 8060 3920
2.13 Develop a frequency histogram and a cumulative frequency plot for the
peak flow of Question 2.12 using different numbers of bins. Evaluate the
impact of the number of bins on both the frequency histogram and
cumulative frequency plots. What frequency bin gives you the most
appropriate results?
2.14 If a highway bridge is constructed on Salt Creek near Rowell (Question
2.12) for a 15-year flood, what is the probability that the design flood will
be exceeded in the next 30 years?
2.15 Assume X is a continuous variable defined in the interval 0 < X < 1 and
characterized by the following PDF:
$$f(x) = 4x^2 + 2x + 1$$
$$f(x, y) = \begin{cases} \dfrac{3(x + y)^2}{8}, & -1 \le x \le 1;\ -1 \le y \le 1 \\ 0, & \text{otherwise} \end{cases}$$
Find the marginal probability density function of X.
2.17 If the dynamic head and discharge relationship in a given pipe system is
described by the following relationship:
$$h = a q^b$$
where h is the dynamic head and q is the flow described by the following
distribution function:
$$f(q) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left[-\frac{1}{2}\left(\frac{q - \mu}{\sigma}\right)^2\right]$$
determine the probability distribution function of the dynamic head h.
2.18 Based on soil data it was found that the concentration c (μg/L) of tetra-
chloroethylene is described by the following log-normal distribution:
$$f(c) = \frac{1}{1.2\,c\,\sqrt{2\pi}} \exp\left[-\frac{1}{2}\left(\frac{\ln c - 6.8}{1.2}\right)^2\right]$$
$$y = c \exp(-kx)$$
$$f(x, y) = k\left(x^2 y + x y^2 + xy + x + y + c\right), \quad 0 \le x \le 1,\ 0 \le y \le 1.$$
2.22 Let X and Y be two random variables described by their probability den-
sity functions f(x) and f(y) as
$$f(x) = e^{-(2x + 1)}\ (x \ge 0) \quad \text{and} \quad f(y) = e^{-(2y - 1)}\ (y \ge 0)$$
$$f(x, y) = \frac{1}{24\pi} \exp\left[-\frac{1}{2}\left(\frac{x^2}{9} + \frac{y^2}{16}\right)\right]$$
Determine the following:
(a) probability [X ≤ 2 and Y ≤ 3],
(b) probability [− 1 ≤ X ≤ 3 and 2 ≤ Y ≤ 5], and
(c) probability [X ≥ 4 and Y ≥ 6].
2.25 Consider the bivariate distribution of Question 2.24. Determine the mar-
ginal distributions of X and Y. Use marginal distributions to determine
whether X and Y are independent random variables.
2.26 A preliminary groundwater drilling was conducted with an assumed
prior probability of 0.81. The electrical resistivity method might be used
to locate the drilling locations. This method gives favorable results for
about 78% of applications where water was known to be present and
97% unfavorable results where water was not found. Determine (a) the
probability of finding water given a favorable result and (b) the
probability of finding water given an unfavorable result.
Chapter 3
Moments and Expectation
3.1 Expectation
Let X be a random variable characterized by a probability density function
(PDF) or probability mass function (PMF), f(x). Further, let g(x) be another
function of x defining a given system. The expectation of the function g(x), denoted
E[g(x)], is defined as
$$E[g(x)] = \int_{-\infty}^{\infty} g(x) f(x)\,dx$$, if X is a continuous random variable (3.1a)
$$E[X] = \int_{-\infty}^{\infty} x f(x)\,dx$$, if X is a continuous random variable (3.2a)
$$E[X] = \sum_{\text{all } x} x f(x)$$, if X is a discrete random variable (3.2b)
5. |E[X]|≤ E[|X|]
6. |E[X]|≤ c if P(|X|≤ c) = 1
The expectation of common random variables can also be calculated by
using Eqs. 3.1 and 3.2.
Example 3.1 Suppose that a discrete random variable X has the following PMF:
X 0 1 2 3 4 5 6
f(x) 0.05 0.1 0.2 0.3 0.2 0.1 0.05
Example 3.2 Calculate (i) E[X], (ii) E[2X], (iii) E[2X+8], if X is uniformly distrib-
uted over (0, 1).
Solution In this example, we first have to determine the mathematical form of the
distribution f(x). Because X is distributed uniformly, the graph of f(x) is a line
parallel to the X axis. Let f(x) = c. Moreover, f(x) is defined only over (0, 1). For
f(x) to be a distribution, its area (a rectangle with base length = 1 and height = c)
should be 1. So, 1 × c = 1, giving c = 1. So, f(x) = 1 is defined over (0, 1).
Knowing f(x), one can calculate the required expectations as given in the following.
(i) Using Eq. 3.2a, we have
$$E[X] = \int_{-\infty}^{\infty} x f(x)\,dx = \int_0^1 x f(x)\,dx = \int_0^1 x\,dx = \frac{1}{2}\left[x^2\right]_0^1 = \frac{1}{2}$$
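$$f(x) = \begin{cases} \dfrac{4}{3}\left(1 - x^3\right), & \text{for } 0 < x \le 1 \\ 0, & \text{elsewhere} \end{cases}$$
Find (a) E[X] and (b) E[3X + 2].
Solution
(a) $$E[X] = \int_{-\infty}^{\infty} x f(x)\,dx = \int_0^1 x f(x)\,dx = \frac{4}{3}\int_0^1 x\left(1 - x^3\right)dx = \frac{4}{3}\left[\frac{x^2}{2} - \frac{x^5}{5}\right]_0^1 = \frac{4}{10}$$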
(b) E[3X + 2] = 3 E[X] + E[2], using the second property of expectation
= 3 E[X] + 2, using the first property of expectation
= 3 × 4/10 + 2 (substituting the result E[X] = 4/10)
= 16/5
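The integrals in this example involve only powers of x on (0, 1), so they can be checked with exact fractions. The Python sketch below is illustrative; the helper names integral_power and raw_moment are assumptions, and the code relies on the linearity of expectation used in part (b).

```python
from fractions import Fraction


def integral_power(k):
    """Exact integral of x^k over (0, 1), equal to 1 / (k + 1)."""
    return Fraction(1, k + 1)


def raw_moment(m):
    """E[X^m] for f(x) = (4/3)(1 - x^3) on (0, 1): (4/3)(int x^m - int x^(m+3))."""
    return Fraction(4, 3) * (integral_power(m) - integral_power(m + 3))


total_probability = raw_moment(0)   # area under the PDF; should equal 1
mean = raw_moment(1)                # (a) E[X]
e_3x_plus_2 = 3 * mean + 2          # (b) E[3X + 2] = 3 E[X] + 2 by linearity
```

Checking that raw_moment(0) equals 1 confirms that the stated f(x) is a proper PDF before any expectation is computed.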
Solution By using the relationship between TP and flow, the midpoints of the
flow classes are converted into the corresponding TP concentrations. Then, the
expected value of the TP concentration can be calculated by applying Eq. 3.1. For
example, for the second class the midpoint is 3 cfs. The corresponding TP con-
centration is 2.51 mg/L. The relative frequency of TP will be equal to the corre-
sponding relative frequency of flow in the same class interval. Thus TP = 2.51
has a relative frequency of 0.04. Therefore, the TP and its relative frequency are
given as follows:
TP (mg/L) 2.07 2.51 2.79 3.06 3.34 3.62 3.90 4.17 4.45 4.73 5.00
Relative frequency 0.01 0.04 0.12 0.15 0.18 0.18 0.16 0.10 0.04 0.01 0.00
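Applying the discrete form of Eq. 3.1 to the tabulated values amounts to a weighted sum. The sketch below is illustrative and uses assumed variable names. It also reports the sum of the tabulated relative frequencies, which is 0.99 rather than 1.00 (presumably because of rounding), so a renormalized estimate is computed alongside the direct sum.

```python
# TP concentrations (mg/L) at the class midpoints and their relative frequencies.
tp = [2.07, 2.51, 2.79, 3.06, 3.34, 3.62, 3.90, 4.17, 4.45, 4.73, 5.00]
rel_freq = [0.01, 0.04, 0.12, 0.15, 0.18, 0.18, 0.16, 0.10, 0.04, 0.01, 0.00]

# Sum of the relative frequencies as tabulated.
total_freq = sum(rel_freq)

# Expected TP concentration: sum of (value x relative frequency).
expected_tp = sum(c * f for c, f in zip(tp, rel_freq))

# Same estimate after rescaling the frequencies so that they sum to 1.
expected_tp_normalized = expected_tp / total_freq
```

Whether to renormalize is a judgment call; the direct sum follows the tabulated frequencies literally, while the rescaled value treats them as rounded versions of a proper PMF.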
3.2 Moments
The moments of a distribution comprise a special class of expectations that can
be used to compare distributions and derive properties of the distributions. In
many cases, the moments of a distribution are used as a way of summarizing the
important characteristics of a distribution as single numbers, without entailing
too much detail. There are several types of moments in the statistical literature,
but we most commonly deal with two general types of moments: moments
about the origin (or regular moments) and moments about the centroid (or cen-
tral moments).
$$\mu'_k = E\left[X^k\right] = \int_{-\infty}^{\infty} x^k f(x)\,dx$$ for the continuous case (3.3a)
$$\mu'_k = E\left[X^k\right] = \sum_{\text{all } x} x^k f(x)$$ for the discrete case (3.3b)
It is clear from Eq. 3.3 that the zeroth noncentral moment is the integration of
the PDF or PMF itself, giving μ’0 = E [1] = 1. Further, for k = 1 Eq. 3.3 gives the
first noncentral moment, which is equal to the mean of the distribution f(x), that
is, μ’1 = E [ X ] = μ. In general, the first noncentral moment provides a measure of
the central location of a distribution. We will discuss measures of central location
in detail later in this chapter.
$$\mu_k = E\left[(X - \mu)^k\right] = \int_{-\infty}^{\infty} (x - \mu)^k f(x)\,dx$$ for the continuous case (3.4a)
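$$\mu_k = E\left[(X - \mu)^k\right] = E\left[\sum_{i=0}^{k} \binom{k}{i} X^i (-\mu)^{k-i}\right] = \sum_{i=0}^{k} \binom{k}{i} (-\mu)^{k-i} E\left[X^i\right] = \sum_{i=0}^{k} \binom{k}{i} (-\mu)^{k-i} \mu'_i$$ (3.5)
$$\mu_1 = \sum_{i=0}^{1} \binom{1}{i} (-\mu)^{1-i} \mu'_i = \binom{1}{0} (-\mu)^1 \mu'_0 + \binom{1}{1} (-\mu)^0 \mu'_1$$
$$\mu_1 = \binom{1}{0} (-\mu) + \binom{1}{1} \mu = \mu - \mu = 0$$ (3.5a)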
Thus, the first central moment of any random variable is always zero. Now,
for k = 2,
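$$\mu_2 = \sum_{i=0}^{2} \binom{2}{i} (-\mu)^{2-i} \mu'_i = \binom{2}{0} (-\mu)^2 \mu'_0 + \binom{2}{1} (-\mu)^1 \mu'_1 + \binom{2}{2} (-\mu)^0 \mu'_2 = \mu^2 - 2\mu^2 + \mu'_2 = \mu'_2 - \mu^2$$
Thus,
$$\mu_2 = \mu'_2 - \mu^2 \tag{3.5b}$$
Similarly, for k = 3 and k = 4,
$$\mu_3 = \sum_{i=0}^{3} \binom{3}{i} (-\mu)^{3-i} \mu'_i = \binom{3}{0} (-\mu)^3 \mu'_0 + \binom{3}{1} (-\mu)^2 \mu'_1 + \binom{3}{2} (-\mu)^1 \mu'_2 + \binom{3}{3} (-\mu)^0 \mu'_3$$
$$= -\mu^3 + 3\mu^3 - 3\mu\mu'_2 + \mu'_3$$
$$= \mu'_3 - 3\mu\mu'_2 + 2\mu^3 \tag{3.5c}$$
$$\mu_4 = \mu'_4 - 4\mu\mu'_3 + 6\mu^2\mu'_2 - 3\mu^4 \tag{3.5d}$$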
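The conversion in Eq. 3.5 is mechanical and lends itself to a short check. The following Python sketch is illustrative only: the function name `central_from_raw` and the choice of the uniform distribution on [0, 1] (whose kth moment about the origin is 1/(k + 1)) are assumptions made for the illustration, not part of the text.

```python
from fractions import Fraction
from math import comb


def central_from_raw(raw):
    """Convert moments about the origin to central moments using Eq. 3.5.

    raw[i] is the i-th moment about the origin, with raw[0] == 1.
    Returns the list [mu_0, mu_1, ..., mu_K].
    """
    mu = raw[1]  # the mean is the first moment about the origin
    return [
        sum(comb(k, i) * (-mu) ** (k - i) * raw[i] for i in range(k + 1))
        for k in range(len(raw))
    ]


# Illustrative input (assumed): uniform distribution on [0, 1].
raw = [Fraction(1, k + 1) for k in range(5)]
central = central_from_raw(raw)
print(central)
```

Exact fractions are used so that the identities in Eqs. 3.5a to 3.5d can be compared without rounding; for the assumed uniform distribution the first and third central moments vanish, as expected for a symmetric distribution.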
$$f(x) = x, \quad 0 \le x \le \sqrt{2}$$
(i) Determine the first four moments about the origin, and (ii) use the non-
central moments to determine the central moments.
Solution
(i) Using Eq. 3.3a gives the first moment of X about the origin:
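$$\mu'_1 = \int_{-\infty}^{\infty} x f(x)\,dx = \int_0^{\sqrt{2}} x^2\,dx = \frac{1}{3}\left[x^3\right]_0^{\sqrt{2}} = \frac{2\sqrt{2}}{3}$$
$$\mu'_2 = \int_{-\infty}^{\infty} x^2 f(x)\,dx = \int_0^{\sqrt{2}} x^3\,dx = \frac{1}{4}\left[x^4\right]_0^{\sqrt{2}} = 1$$
$$\mu'_3 = \int_{-\infty}^{\infty} x^3 f(x)\,dx = \int_0^{\sqrt{2}} x^4\,dx = \frac{1}{5}\left[x^5\right]_0^{\sqrt{2}} = \frac{4\sqrt{2}}{5}$$
$$\mu'_4 = \int_{-\infty}^{\infty} x^4 f(x)\,dx = \int_0^{\sqrt{2}} x^5\,dx = \frac{1}{6}\left[x^6\right]_0^{\sqrt{2}} = \frac{4}{3}$$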
(ii) Using Eq. 3.5a, one gets μ1 = 0. The second central moment can be calcu-
lated by using Eq. 3.5b:
$$\mu_2 = \mu'_2 - \mu^2 = 1 - \frac{8}{9} = \frac{1}{9}$$
Using Eq. 3.5c gives the third central moment:
$$\mu_3 = \mu'_3 - 3\mu\mu'_2 + 2\mu^3 = \frac{4\sqrt{2}}{5} - 3\left(\frac{2\sqrt{2}}{3}\right)(1) + 2\left(\frac{2\sqrt{2}}{3}\right)^3 = -\frac{2\sqrt{2}}{135}$$
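$$M_X(t) = E\left[e^{tX}\right] = \begin{cases} \sum_{\text{all } x} e^{tx} f(x) & \text{for } X \text{ discrete} \\ \int_{-\infty}^{\infty} e^{tx} f(x)\,dx & \text{for } X \text{ continuous} \end{cases} \tag{3.6}$$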
Using the mgf MX(t), we can define the kth moment about the origin as the
value of the kth derivative with respect to t, evaluated at t = 0:
$$\mu'_k = E[X^k] = \left.\frac{d^k M_X(t)}{dt^k}\right|_{t=0} \tag{3.7}$$
x       −10     −5      0       5       10      100
F(x)    1/20    1/15    1/10    1/5     1/4     1/3
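Using the mgf, determine the first noncentral moment (i.e., the mean of X).
Solution Using Eq. 3.6 with the tabulated probabilities gives the mgf as
$$M_X(t) = \frac{1}{20}e^{-10t} + \frac{1}{15}e^{-5t} + \frac{1}{10} + \frac{1}{5}e^{5t} + \frac{1}{4}e^{10t} + \frac{1}{3}e^{100t}$$
Differentiating once and setting t = 0 (Eq. 3.7) gives
$$\mu'_1 = \left.\frac{dM_X(t)}{dt}\right|_{t=0} = -\frac{10}{20} - \frac{5}{15} + 0 + \frac{5}{5} + \frac{10}{4} + \frac{100}{3} = 36$$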
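For the discrete distribution tabulated above, the mgf of Eq. 3.6 is a finite sum, and its kth derivative at t = 0 (Eq. 3.7) reduces to the sum of x^k f(x). The sketch below treats the tabulated fractions as probabilities (they sum to 1); the helper names `mgf` and `mgf_derivative_at_zero` are hypothetical and introduced only for this illustration.

```python
from fractions import Fraction
from math import exp

# Values and probabilities copied from the table above.
xs = [-10, -5, 0, 5, 10, 100]
ps = [Fraction(1, 20), Fraction(1, 15), Fraction(1, 10),
      Fraction(1, 5), Fraction(1, 4), Fraction(1, 3)]


def mgf(t):
    """M_X(t) = sum over x of exp(t*x) * f(x)  (Eq. 3.6, discrete case)."""
    return sum(exp(t * x) * float(p) for x, p in zip(xs, ps))


def mgf_derivative_at_zero(k):
    """k-th derivative of M_X at t = 0, i.e. sum of x**k * f(x)  (Eq. 3.7)."""
    return sum(Fraction(x) ** k * p for x, p in zip(xs, ps))


total_probability = mgf_derivative_at_zero(0)  # zeroth moment
mean = mgf_derivative_at_zero(1)               # first noncentral moment
print(total_probability, mean)
```

Because each term of the mgf is an exponential, differentiation can be done term by term, which is why no numerical differentiation is needed here.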
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}, \quad -\infty < x < \infty$$
Determine the mgf.
Solution Using Eq. 3.6 we have
$$M_X(t) = \int_{-\infty}^{\infty} e^{tx} f(x)\,dx = \int_{-\infty}^{\infty} e^{tx}\, \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}\,dx$$
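Substituting z = (x − μ)/σ and completing the square in the exponent gives
$$M_X(t) = e^{t\mu + \frac{1}{2}t^2\sigma^2}\, \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-\frac{(z - t\sigma)^2}{2}}\,dz = e^{t\mu + \frac{1}{2}t^2\sigma^2} \left[\frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-\frac{(z - t\sigma)^2}{2}}\,dz\right] = e^{t\mu + \frac{1}{2}t^2\sigma^2}$$
since the bracketed integral is the total area under a normal density and equals 1.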
$$\mu'_k = E[X^k] = \frac{1}{i^k} \left.\frac{d^k \varphi_X(t)}{dt^k}\right|_{t=0} \tag{3.9}$$
Example 3.8 Find the characteristic function for the random variable X defined
in Example 3.2. Using the obtained characteristic function, find the expected
value of X.
Solution The PDF of X is f(x) = 1 for 0 ≤ x ≤ 1, = 0 elsewhere
The characteristic function of X is
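$$\varphi_X(t) = E\left[e^{itX}\right] = \int_{-\infty}^{\infty} e^{itx} f(x)\,dx = \int_0^1 e^{itx}(1)\,dx = \left[\frac{e^{itx}}{it}\right]_0^1 = \frac{1}{it}\left(e^{it} - 1\right)$$
Thus, X has the characteristic function $(e^{it} - 1)/(it)$. Now the expected value of X is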
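$$\mu'_1 = E[X] = \frac{1}{i} \left.\frac{d\varphi_X(t)}{dt}\right|_{t=0} = \frac{1}{i} \left.\frac{i^2 t e^{it} - i e^{it} + i}{i^2 t^2}\right|_{t=0} = -\frac{1}{t^2} \left.\left[e^{it}(it - 1) + 1\right]\right|_{t=0}$$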
Expanding the exponential term up to second order and then taking the limit
at t = 0, one obtains
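$$E[X] = -\frac{1}{t^2} \left.\left[\left(1 + it + \frac{(it)^2}{2!}\right)(it - 1) + 1\right]\right|_{t=0} = -\frac{1}{t^2} \left.\left[-\frac{t^2}{2} - \frac{it^3}{2}\right]\right|_{t=0} = \frac{1}{2}$$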
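The limit taken above can also be checked numerically. The sketch below is an illustration rather than part of the worked solution: it evaluates the characteristic function of the uniform variable of Example 3.8 and approximates the derivative in Eq. 3.9 by a central difference; the step size `h` is an arbitrary assumption.

```python
import cmath


def phi(t):
    """Characteristic function (e^{it} - 1)/(it) of the uniform(0, 1) variable."""
    if t == 0:
        return 1 + 0j  # limiting value as t -> 0
    return (cmath.exp(1j * t) - 1) / (1j * t)


h = 1e-4                                  # assumed finite-difference step
dphi_dt = (phi(h) - phi(-h)) / (2 * h)    # central difference at t = 0
mean = (dphi_dt / 1j).real                # Eq. 3.9 with k = 1
print(mean)
```

The division by i in Eq. 3.9 is what turns the purely imaginary derivative at t = 0 into a real-valued mean.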
$$\varphi_X(t) = E\left[e^{itX}\right] = \int_{-\infty}^{\infty} e^{itx} f(x)\,dx = \int_0^{\infty} e^{(it-1)x}\,dx = \left[\frac{e^{(it-1)x}}{it - 1}\right]_0^{\infty} = \frac{1}{1 - it}$$
3.5.1 Mean
The most important parameter of a distribution is the mean of a random vari-
able. The mean is defined as the first moment of the probability distribution
about the origin of the probability space and is usually designated by the Greek
letter μ. For a continuous distribution, μ is defined as
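$$\mu = \int_{-\infty}^{+\infty} x f(x)\,dx = \int_0^1 x\,dF(x) = \int_0^{\infty} [1 - F(x)]\,dx \tag{3.10}$$
where the last form applies to a nonnegative random variable.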
$$\mu = \sum_{i=1}^{n} x_i\, p(x_i) \tag{3.11}$$
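Figure 3-2a The mean for a continuous case. [Plot not reproduced: f(x) against x, with μ and the element dx marked.]
[Companion plot for the discrete case not reproduced: p(xi) against xi, with μ marked.]
Mode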
$$\left[\frac{\partial f(x)}{\partial x}\right]_{x = x_{mo}} = 0$$
being exceeded. In other words, the median divides the distribution into two
equal halves and represents the 50th percentile (the 0.5 quantile) of X:
$$F(x_{med}) = \int_{-\infty}^{x_{med}} f(x)\,dx = 0.5$$
$$m = \frac{1}{n} \sum_{i=1}^{n} x_i \tag{3.12}$$
Similarly, one may want to determine the mean elevation of the water level
during February in a river at a specified location. For this purpose, measure-
ments of the river water level can be carried out each day and their mean can
then be computed.
But why do we need to take a sample? Taking a sample allows us to estimate
the probability associated with each event in the probability space. Consider an
example of cement strength, where one may wish to estimate the probability
that the strength of a batch of cement to be used in a structure falls below a spec-
ified strength. To compute this probability, a sample of observations of cement
strength is collected. If, in the sample, say, 10% of the observations of X were
equal to or less than x then it is expected that the probability that X will be equal
to or less than x for the batch of cement to be manufactured will also be 10%. In
other words, F(x) = 10%. Similarly, for computing the probability of discharge
exceeding a certain value we need observations of river flow; for computing the
probability of rainfall amount exceeding a certain value we need observations of
rainfall; for computing the probability of water quality violations we need obser-
vations of water quality constituents.
One can place reasonable confidence in such probability estimates if the
sample is representative of the expected hydraulic conductivity or of the con-
crete strength. Therefore, special precautions are taken to ensure that the sample
will be free of bias (i.e., one should not choose all “good” or “bad” samples). If
possible, numerous test observations are taken so that exceptionally high or low
values have less influence on the final result. Ideally, one should take samples
such that the entire range of variable X is covered, as shown in Fig. 3-3. The sam-
pling should be such that the distribution of relative frequency over the sample
space is representative of the distribution of probability over probability space.
Since the mean of X lies at the center of gravity of the probability mass, one
may expect the center of gravity of the relative frequency mass to be close to the
mean. The latter is calculated by Eq. 3.12. The sample mean is denoted by m and
is an estimator of the mean of the random variable X, which has been designated
by μ.
It is important to keep the following points in mind: (1) The sample distribu-
tion is always a discrete distribution, even though the random variable may be
continuous. (2) The sample distribution is representative of a set of observations.
(3) The sample distribution tends to vary from sample to sample. (4) The sample
mean m itself is a random variable. It varies from sample to sample and each
additional observation tends to change it. (5) The sample mean is usually
referred to as a statistic rather than a parameter. To distinguish it from the mean
of the random variable, μ, it is designated by the letter m or by x̄. (6) The larger
the sample, the better the agreement between m and μ.
Figure 3-3 Sample space with n observations. [Plot not reproduced: each of the n observations along the x axis carries a relative frequency of 1/n.]
Example 3.10 The number of passengers who arrived at a railway terminal was
counted from Monday to Friday and the following values were obtained, respec-
tively: 10,371, 8,448, 9,165, 8,974, and 10,739. Find the average number of passen-
gers arriving at the railway terminal each day.
Solution The average number of passengers is
m = (10,371 + 8,448 + 9,165 + 8,974 + 10,739)/5 = 9,539.
This gives the arithmetic mean of the variable. Besides arithmetic mean,
there are also other types of means.
$$x_g = (x_1 x_2 \cdots x_n)^{1/n} = \exp\left(\frac{1}{n} \sum_{i=1}^{n} \log x_i\right) = \prod_{i=1}^{n} x_i^{1/n} \tag{3.13}$$
Note that if any one of the observations is zero, the geometric mean will be
zero. Further, if any of the observations is less than zero, the geometric mean
cannot be computed. Logarithms are helpful in computations when more than
three observations are involved. Most often water quality, air pollution, and soil
contaminant data are handled by transforming the raw data by taking loga-
rithms (i.e., these data are log-normally distributed). Further, in real-life prob-
lems, decisions are taken using sample data as the data collection effort is
extremely costly and time consuming. For example, a primary concern in risk-
based corrective action (RBCA) is the decision criterion for evaluating attain-
ment of cleanup objectives. The statistic for comparison with a risk-based
cleanup objective should be accurate and stable. It has been observed that in
such a situation the geometric mean is a more accurate and stable estimator of
the true average concentration and its confidence intervals.
Example 3.11 Analysis of demographic data for a country showed that its
population growth rate from 1970 to 1980 was 1.25%; from 1980 to 1990, it was 1.22%;
and from 1990 to 2000, it was 1.15%. Find the average growth rate for the
period 1970 to 2000.
Solution The average growth rate can be obtained by taking the geometric
mean of given rates:
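The geometric mean of the three rates (Eq. 3.13) can be evaluated with a few lines of Python. This is a sketch that follows the text's approach of averaging the rates themselves; the variable names are illustrative assumptions.

```python
from statistics import geometric_mean

rates = [1.25, 1.22, 1.15]             # decadal growth rates (%) from Example 3.11
average_rate = geometric_mean(rates)   # (x1 * x2 * x3) ** (1/3), Eq. 3.13
print(round(average_rate, 3))
```

The result lies slightly below the arithmetic mean of the three rates, as is always the case for positive values that are not all equal.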
Example 3.12 Consider the case of river pollution. Careful observations showed
that the river’s pollutant concentration near an industrial town increased by 20%
in the year 2002. The next year the pollutant concentration increased by 60%.
Compute the average rate of increase in pollutant concentration.
Solution The average rate of increase in pollutant concentration can be deter-
mined by using the geometric mean. Thus,
$$x_g = (x_1 x_2 \cdots x_n)^{1/n} = (20 \times 60)^{1/2} = 34.64$$
$$x_h = \frac{1}{\left(\dfrac{1}{n}\right)\left[\left(\dfrac{1}{x_1}\right) + \left(\dfrac{1}{x_2}\right) + \cdots + \left(\dfrac{1}{x_n}\right)\right]} = \frac{n}{\sum_{i=1}^{n} 1/x_i} \tag{3.14}$$
Example 3.13 Consider a rainfall event with an intensity of 2.5 cm/h that pro-
duced 5 cm of rainfall. Another event had an intensity of 4 cm/h and produced
7 cm. Compute the mean rainfall intensity.
Solution The time for the first event is 5/2.5 = 2 h. The time for the second event
is 7/4 = 1.75 h. Thus, the total time is 2 + 1.75 = 3.75 h. The total rainfall amount
is simply 5 + 7 = 12 cm. Therefore the average (arithmetic) rainfall intensity is
12/3.75 = 3.2 cm/h.
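$$x_h = \frac{1}{\left(\dfrac{1}{2}\right)\left[\left(\dfrac{1}{x_1}\right) + \left(\dfrac{1}{x_2}\right)\right]} = \frac{1}{\left(\dfrac{1}{2}\right)\left[\left(\dfrac{1}{2.5}\right) + \left(\dfrac{1}{4}\right)\right]} \approx 3.08 \text{ cm/h}$$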
Example 3.14 The discharge along a reach of the Song River was 256 m3/s on
November 14. Measurements showed that the cross-sectional areas of flow at five
locations were 103, 96, 114, 107, and 91 m2. Find the mean velocity in the reach.
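Solution Velocity is the ratio of discharge to cross-sectional area, so the velocity
at location i is vi = 256/Ai. The harmonic mean of the velocities is
$$v_m = \frac{1}{\left(\dfrac{1}{5}\right)\left[\dfrac{1}{v_1} + \dfrac{1}{v_2} + \dfrac{1}{v_3} + \dfrac{1}{v_4} + \dfrac{1}{v_5}\right]} = \frac{1}{\left(\dfrac{1}{5}\right)\left[\dfrac{103}{256} + \dfrac{96}{256} + \dfrac{114}{256} + \dfrac{107}{256} + \dfrac{91}{256}\right]} = \frac{5 \times 256}{511} = 2.505 \text{ m/s}$$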
One can also compute the harmonic mean velocity by obtaining the har-
monic mean area first as
$$A_h = \frac{1}{\left(\dfrac{1}{5}\right)\left[\dfrac{1}{103} + \dfrac{1}{96} + \dfrac{1}{114} + \dfrac{1}{107} + \dfrac{1}{91}\right]} = 101.56 \text{ m}^2$$
$$x_h = \frac{1}{\left(\dfrac{1}{2}\right)\left[\left(\dfrac{1}{30}\right) + \left(\dfrac{1}{60}\right)\right]} = \frac{1}{\left(\dfrac{1}{2}\right)\left(\dfrac{3}{60}\right)} = 40 \text{ km/h}$$
Referring to Example 3.12, we see that the difference in the numbers was
purposely kept large to illustrate the relative importance of the numbers and
why the behavior given by Eq. 3.15 is noted. The second number (60) is three
times the first and its influence on the arithmetic average is also three times
larger than the former. When the geometric mean is taken, the influence is not
three times larger because the differential in logarithms is not that large. With
the harmonic mean, the relative weights are reversed—the contribution of the
smaller number to the mean is more. In summary, the size matters in the arith-
metic mean, the size of logarithms matters in the geometric mean, and the recip-
rocals determine the relative importance in the harmonic mean.
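The relative behavior of the three means can be seen directly for the two numbers of Example 3.12. The following is a minimal sketch using Python's standard `statistics` module; the variable names are assumptions made for the illustration.

```python
from statistics import mean, geometric_mean, harmonic_mean

values = [20, 60]                       # percentage increases from Example 3.12

arithmetic = mean(values)               # size of the values matters
geometric = geometric_mean(values)      # size of the logarithms matters
harmonic = harmonic_mean(values)        # reciprocals matter, so 20 dominates

print(arithmetic, round(geometric, 2), harmonic)
```

For any set of positive values the harmonic mean does not exceed the geometric mean, which in turn does not exceed the arithmetic mean; the three coincide only when all values are equal.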
The relative positions of the arithmetic mean, the median, and the mode
depend on the symmetry of the distribution. If the distribution is symmetric,
these measures of central tendency are equal. If the distribution is asymmetric,
they take different positions. If the distribution is not unimodal, the simple rela-
tionships between them may not be valid.
3.5.3 Variance
Variance measures the variability of a random variable and is the second most
important descriptor of its probability distribution. A small variance of a vari-
able indicates that its values are likely to stay near the mean value whereas a
large variance implies that the values have large dispersion around the mean. If
the stage of a river at a gauging station is measured independently several times
in quick succession during a survey, then there will likely be variability in
the stage measurements. The magnitude of the variability is a measure of the
natural variation and the measurement error.
Variance, designated by the Greek letter σ 2, measures the deviation from the
mean and is universally accepted as given by the second moment of the proba-
bility mass about the mean. Sometimes, the notation VAR(x) or var(x) is also
used. For a continuous variable X variance is expressed as
$$\sigma^2 = \int_{-\infty}^{+\infty} (x - \mu)^2 f(x)\,dx \tag{3.16}$$
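[Figure not reproduced: three probability density functions f(x1), f(x2), and f(x3) plotted against x, with standard deviations σx1, σx2, and σx3.]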
$$s^2 = \frac{1}{n} \sum_{i=1}^{n} (x_i - m)^2 \tag{3.18}$$
With the sample mean calculated first, the sample variance can be deter-
mined from Eq. 3.18. The sample variance serves as an estimator of the variance
of the random variable X. The sample variance is also referred to as a statistic,
rather than a parameter. It is itself a random variable that is likely to change
when additional observations are made. To distinguish sample variance from
the variance of the random variable X, it is denoted as s². When the sample is
small (n ≤ 30), an unbiased estimate of variance is obtained by
$$s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - m)^2 \tag{3.19}$$
$$s^2 = \frac{1}{n} \sum_{i=1}^{n} x_i^2 - m^2 \tag{3.20}$$
It is easier to use Eq. 3.20 because one need not subtract m from all values of
x before squaring. Instead, the sum of x² can be accumulated at the same time as
the sum of x is accumulated for the mean.
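The effect of the divisor n − 1 in Eq. 3.19 can be verified exactly on a small case. The sketch below assumes a hypothetical population of four equally likely values and enumerates every ordered sample of size 3 drawn with replacement; the population, the sample size, and the helper name `sum_sq_dev` are illustrative assumptions only.

```python
from fractions import Fraction
from itertools import product

population = [Fraction(v) for v in (1, 2, 3, 4)]   # hypothetical, equally likely
N = len(population)
mu = sum(population) / N
sigma2 = sum((x - mu) ** 2 for x in population) / N  # true variance

n = 3                                                # assumed sample size
samples = list(product(population, repeat=n))        # all equally likely samples


def sum_sq_dev(sample):
    """Sum of squared deviations of a sample about its own mean m."""
    m = sum(sample) / len(sample)
    return sum((x - m) ** 2 for x in sample)


# Expected value of each estimator, averaged over all possible samples.
mean_biased = sum(sum_sq_dev(s) / n for s in samples) / len(samples)          # Eq. 3.18
mean_unbiased = sum(sum_sq_dev(s) / (n - 1) for s in samples) / len(samples)  # Eq. 3.19
print(sigma2, mean_biased, mean_unbiased)
```

In this enumeration the estimator with divisor n underestimates the true variance by the factor (n − 1)/n on average, whereas the estimator with divisor n − 1 reproduces it exactly.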
When the mean of the data is zero, cv is undefined. This coefficient is useful
in comparing different populations or their distributions. For example, if two
samples of aggregates of water quality are analyzed, the one with larger cv will
have more variation.
If each value of a variable is multiplied by a constant α , the mean, variance,
and standard deviation are obtained by multiplying the original mean, variance,
and standard deviation by α , α 2, and α, respectively; the coefficient of variation
remains unchanged. If a constant α is added to each value of the variable, the
new mean is equal to the old mean + α ; the variance and the standard deviation
remain unchanged; the coefficient of variation changes because the unchanged
standard deviation is divided by the new mean.
Example 3.16 For a lake, water levels have been observed for 10 years. The max-
imum level (x) for each year is listed in Table E3-16. From these data, estimate
the mean and the standard deviation of the maximum lake levels.
Solution The computations are demonstrated in Table E3-16.
From the table, we have
m = (1/10)Σ xi = (1/10) × 2,180 = 218.0 m
s = √(13.24/9) = √1.47 = 1.21 m
Table E3-16
Year    x (m)      x − m    (x − m)²
1971    217.5      −0.5     0.25
1972    218.8       0.8     0.64
1973    216.0      −2.0     4.00
1974    217.8      −0.2     0.04
1975    220.0       2.0     4.00
1976    218.2       0.2     0.04
1977    217.2      −0.8     0.64
1978    218.5       0.5     0.25
1979    219.3       1.3     1.69
1980    216.7      −1.3     1.69
Sum     2,180.0     0.0     13.24
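The entries of Table E3-16 can be cross-checked with Python's standard `statistics` module. The sketch below is an illustration; the variable names are assumptions, and the shortcut form of Eq. 3.20 is included only to show that it agrees with Eq. 3.18 apart from floating-point rounding.

```python
from statistics import mean, pvariance, stdev, variance

# Annual maximum lake levels (m) from Table E3-16.
levels = [217.5, 218.8, 216.0, 217.8, 220.0, 218.2, 217.2, 218.5, 219.3, 216.7]

m = mean(levels)
s2_biased = pvariance(levels)     # divisor n      (Eq. 3.18)
s2_unbiased = variance(levels)    # divisor n - 1  (Eq. 3.19)
s2_shortcut = sum(x * x for x in levels) / len(levels) - m ** 2   # Eq. 3.20
s = stdev(levels)                 # square root of the n - 1 form

print(m, round(s2_unbiased, 2), round(s, 2))
```

Because Eq. 3.20 subtracts two large, nearly equal numbers, it can lose precision when the mean is large relative to the spread, as is the case for these lake levels.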
Example 3.17 The discharge and stage values of the Amite River near Darlington,
Louisiana, are given in Table E3-17a. Compute the mean, median, mode,
mean deviation, standard deviation, coefficient of variation, and ratio of
standard deviation to the mean deviation of the discharge and stage values.
Solution The sum of discharge and stage values is given in the last row of
Table E3-17a. We have 48 sets of data. Thus the mean discharge is 1,376,440/48 =
28,675.83 cfs. The statistical properties of discharge (cusec) and stage (ft) data
are computed and given in Table E3-17b.
Table E3-17a
Year Discharge (cfs) Stage (ft)
1949 20,000 17.79
1950 43,400 20.2
1951 31,600 19.04
1952 3,180 11.22
1953 18,900 17.63
1954 3,280 11.57
1955 55,700 21.17
1956 20,400 17.84
1957 20,200 17.81
1958 6,900 18.05
1959 9,800 14.83
1960 37,900 15.9
1961 15,400 19.69
1962 4,530 17.06
1963 44,500 12.92
1964 44,500 19.4
1965 20,000 19.37
1966 39,300 18.97
1967 8,000 14.13
1968 8,600 13.82
1969 36,300 9.26
1970 10,100 14.44
1971 45,500 19.43
1972 62,100 20.19
1973 22,400 17.13
1974 40,700 18.98
1975 7,660 12.35
1976 76,400 21.76
1977 30,500 18.09
1978 43,400 19.25
1979 47,500 19.59
1980 8,320 12.85
1981 18,100 16.03
Table E3-17b
Parameter                                        Discharge          Stage
Mean                                             28,675.83 cusec    16.94 ft
Standard deviation                               21,117.14 cusec    2.937 ft
Median                                           20,800 cusec       17.295 ft
Mode                                             20,000 cusec       17.13 ft
Mean deviation (mean of absolute
deviations from the mean)                        16,744.13 cusec    2.366 ft
Coefficient of variation                         0.736              0.173
Ratio of standard deviation to mean deviation    1.261              1.241
Equation 3.24 is often used to calculate the sample variance. Equations 3.23
and 3.24 can also be employed to calculate the variance of simple functions of X.
For example, the variance of the linear function Y = a + bX, where a and b are
constants, can be determined as
$$\operatorname{var}(Y) = \operatorname{var}(a + bX) = b^2 \operatorname{var}(X)$$
3.5.4 Skewness
Probability distributions are not usually symmetrical about their mean. This
property of being asymmetrical is commonly referred to as the skewness of the
distribution. The degree of skewness is measured by the third moment of the
probability mass about the mean. For symmetrical distributions, the third
moment is zero because the contributions of the probabilities on either side of the
mean have opposite signs and cancel each other in the integral. The more asym-
metrical the distribution is, the greater will be the absolute value of the third
moment. The third moment about the mean can be positive or negative, corre-
sponding to a positive or negative skew. Figure 3-5 shows the possible cases.
The third moment about the mean for a continuous distribution can be calcu-
lated as
$$\mu_3 = \int_{-\infty}^{+\infty} (x - \mu)^3 f(x)\,dx \tag{3.26}$$
Figure 3-5 Symmetrical and skewed distributions. [Plot not reproduced: density functions f(x1), f(x2), and f(x3) plotted against x.]
$$\gamma = C_s = \frac{\mu_3}{\sigma^3} \tag{3.28a}$$
The common notation for the coefficient of skewness is γ or Cs. The sign of
the skewness coefficient indicates the direction of asymmetry of the probability
distribution, and its magnitude indicates the degree of asymmetry. If γ is zero,
the distribution is symmetric about its mean. If γ is greater than zero, the
distribution is positively skewed; that is, it has a long tail to the right. If γ is
negative, the distribution is negatively skewed; that is, it has a long tail to the left.
Another measure of asymmetry is the Pearson skewness coefficient γ1
expressed as
$$\gamma_1 = \frac{\mu - x_{mod}}{\sigma} \tag{3.28b}$$
Clearly this does not involve computation of moments higher than two and
is thus less susceptible to error.
$$g = \frac{n \sum_{i=1}^{n} (x_i - \bar{x})^3}{(n-1)(n-2)\,s^3} \tag{3.29}$$
Example 3.18 Compute the coefficient of skewness of the discharge of the Amite
River, given in Example 3.17.
Solution For the data (48 values), the mean and standard deviation were computed
as 28,675.83 cusec and 21,117.14 cusec, respectively. The summation term in
Eq. 3.29 comes out to be 5.6729 × 10^14.
Hence,
g = 48 × 5.6729 × 10^14/[47 × 46 × 21,117.14^3] = 1.34
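The arithmetic of Example 3.18 can be re-evaluated from its summary numbers, using the standard deviation of 21,117.14 cusec listed in Table E3-17b. The sketch below also wraps Eq. 3.29 in a function; the name `sample_skewness` and the five-value symmetric sample are hypothetical and serve only as an illustration.

```python
def sample_skewness(xs):
    """Coefficient of skewness g from Eq. 3.29, with s computed from Eq. 3.19."""
    n = len(xs)
    m = sum(xs) / n
    s = (sum((x - m) ** 2 for x in xs) / (n - 1)) ** 0.5
    return n * sum((x - m) ** 3 for x in xs) / ((n - 1) * (n - 2) * s ** 3)


# Example 3.18 re-evaluated from the summary values quoted in the text.
n, sum_cubed_dev, s = 48, 5.6729e14, 21117.14
g_amite = n * sum_cubed_dev / ((n - 1) * (n - 2) * s ** 3)

# Hypothetical symmetric sample: the cubed deviations cancel.
g_symmetric = sample_skewness([1.0, 2.0, 3.0, 4.0, 5.0])

print(round(g_amite, 2), g_symmetric)
```

The positive value for the Amite River discharges indicates a long right tail, which is typical of flood data.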
Table E3-19
xi     Frequency    Relative frequency f(xi)    xi f(xi)    (xi – m)² f(xi)    (xi – m)³ f(xi)
0      102          0.283                       0.000       0.3963             –0.4689
1      144          0.400                       0.400       0.0134             –0.0025
2      74           0.206                       0.412       0.1374             +0.1122
3      28           0.078                       0.234       0.2574             +0.4677
4      10           0.028                       0.112       0.2221             +0.6257
5      2            0.005                       0.025       0.0728             +0.2780
Sum    360          1.000                       1.183       1.0994             +1.0122
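The column sums of Table E3-19 are the first moment about the origin and the second and third central moments of the sample distribution. The sketch below recomputes them from the rounded relative frequencies listed in the table; the variable names are assumptions, and the last line applies Eq. 3.28a to the computed moments.

```python
xs = [0, 1, 2, 3, 4, 5]
rel_freq = [0.283, 0.400, 0.206, 0.078, 0.028, 0.005]   # from Table E3-19

m = sum(x * f for x, f in zip(xs, rel_freq))                  # sum of xi f(xi)
var = sum((x - m) ** 2 * f for x, f in zip(xs, rel_freq))     # sum of (xi - m)^2 f(xi)
third = sum((x - m) ** 3 * f for x, f in zip(xs, rel_freq))   # sum of (xi - m)^3 f(xi)
skew = third / var ** 1.5                                     # Eq. 3.28a

print(round(m, 3), round(var, 4), round(third, 4), round(skew, 3))
```

Working from the rounded relative frequencies rather than from the raw counts keeps the computation comparable with the tabulated columns.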
$$K = \frac{\mu_4}{\mu_2^2} = \frac{\int_{-\infty}^{\infty} (x - \mu)^4 f(x)\,dx}{\sigma^4} \tag{3.31a}$$
with K > 0. For the normal distribution the value of K is 3. This value is used as a
reference to indicate the degree of peakedness. The coefficient of excess, Ce, is
then defined as K – 3. If the value of K is greater than 3 or Ce > 0, then the
distribution is called leptokurtic. If K is less than 3 or Ce < 0, then the distribution is
platykurtic. Stuart and Ord (1987) presented an inequality that must be satisfied
by all plausible distributions:
$$\gamma^2 + 1 \le K \tag{3.31b}$$
$$m_R = \frac{\mu_R}{(\mu_2)^{R/2}} \tag{3.32}$$
If R = 3 then one gets the coefficient of skewness and if R = 4 then one gets
the coefficient of kurtosis.
$$\beta_1 = \frac{\mu_3}{\mu_2^{3/2}} \tag{3.33a}$$
$$\beta_2 = \frac{\mu_4}{\mu_2^2} \tag{3.33b}$$
The classical form of the moment ratio diagram (MRD) is a graphical plot of
β1 and β2 for a specific distribution or a group of distributions, as shown in
Fig. 3-6, for a number of distributions. Usually, β1 is on the abscissa and β2 is on
the ordinate but with its values increasing downward. Pearson (Johnson and
Kotz 1985) has shown that for all distributions the following must be satisfied:
$$\beta_2 - \beta_1 - 1 \ge 0 \tag{3.34}$$
When these ratios are plotted, one can discern the impossible region on the
graph.
Bobee et al. (1993) have described two kinds of moment ratio diagrams and
their applications in hydrology. Ashkar et al. (1988) have plotted the MRD for a
number of distributions. These can be employed to distinguish families of dis-
tributions, such as the Pearson system of distributions, the Johnson family of
distributions, and so on. They can also be used to classify distributions. They
allow us to distinguish distributions into three categories: those represented by
a point, those represented by a curve, and those represented by a region. For
example, the normal, exponential, and uniform distributions are represented by
a point, because these distributions do not have a shape parameter but have
only a scale or location parameter. The gamma, log-normal, and Student distri-
butions are represented by a curve, because they have one shape parameter. In
contrast, the beta distribution has two shape parameters and therefore is repre-
sented by a region. In this manner, an MRD permits a comparison of distribu-
tions in terms of their flexibility, because the more flexible shape of the
distribution occupies a greater portion of the diagram. An MRD also aids in
selecting a probability distribution to represent a given sample. This is done by
computing the values of β1 and β2 from the sample and then plotting the result-
ing point on the MRD. One then selects the distribution that seems to best
reflect the position of this point on the MRD. It must however be emphasized
that the sampling variance associated with skewness and kurtosis may be large,
especially with small samples, and one may end up selecting a wrong distribu-
tion as a result.
Figure 3-6 Cs-Ck moment ratio diagram (GEV: generalized extreme value; GLOG:
generalized logistic; LOGN: three-parameter lognormal; P-III: Pearson type III).
[Plot not reproduced: Cs (skewness coefficient) on the horizontal axis, from –1 to 3, and Ck (kurtosis coefficient) on the vertical axis, from 0 to 15.]
Figure 3-7 Joint and marginal distributions of two random variables. [Plot not reproduced.]
The product of the probability and the distance to the Y axis, x, is then inte-
grated over the entire reach of X, resulting in the double integral
$$\mu_x = \int_{-\infty}^{+\infty} \int_{-\infty}^{+\infty} x\, f(x, y)\,dx\,dy \tag{3.36}$$
Similarly,
$$\mu_y = \int_{-\infty}^{+\infty} \int_{-\infty}^{+\infty} y\, f(x, y)\,dx\,dy \tag{3.37}$$
Recall that the marginal distributions are defined by the same process of
summing the probability located in small strips parallel to the coordinate axes.
Thus, one can write
$$f(x) = \int_{-\infty}^{+\infty} f(x, y)\,dy \tag{3.38}$$
Table E3-20
Rainfall duration x (h)   Rainfall amount y (mm)   Relative frequency
1 10 0.188
1 15 0.109
1 20 0.086
2 10 0.168
2 15 0.106
2 20 0.057
3 10 0.069
3 15 0.083
3 20 0.044
4 10 0.052
4 15 0.027
4 20 0.011
Sum 1.000
Solution The marginal probability for various rainfall durations (X) can be
computed by
$$f_X(x) = \sum_{y_i} f_{X,Y}(x, y_i)$$
Thus, for the various values of X, the marginal probabilities are calculated as
follows:
P(X = 1) = 0.188 + 0.109 + 0.086 = 0.383
P(X = 2) = 0.168 + 0.106 + 0.057 = 0.331
P(X = 3) = 0.069 + 0.083 + 0.044 = 0.196
P(X = 4) = 0.052 + 0.027 + 0.011 = 0.090
$$p_{\text{intensity}|\text{duration}}(15|3) = f_{Y|X}(15|3) = \frac{f_{X,Y}(3, 15)}{f_X(3)} = \frac{0.083}{0.196} = 0.423$$
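The marginal and conditional probabilities of this example follow directly from the joint relative frequencies in Table E3-20. The sketch below stores the table as a dictionary keyed by (duration, amount); the function names `marginal_x` and `conditional_y_given_x` are hypothetical.

```python
# Joint relative frequencies from Table E3-20: (duration x in h, amount y in mm).
joint = {
    (1, 10): 0.188, (1, 15): 0.109, (1, 20): 0.086,
    (2, 10): 0.168, (2, 15): 0.106, (2, 20): 0.057,
    (3, 10): 0.069, (3, 15): 0.083, (3, 20): 0.044,
    (4, 10): 0.052, (4, 15): 0.027, (4, 20): 0.011,
}


def marginal_x(x):
    """f_X(x): sum of the joint probabilities over all rainfall amounts y."""
    return sum(p for (xi, _), p in joint.items() if xi == x)


def conditional_y_given_x(y, x):
    """f_{Y|X}(y | x) = f_{X,Y}(x, y) / f_X(x)."""
    return joint[(x, y)] / marginal_x(x)


marginals = {x: marginal_x(x) for x in (1, 2, 3, 4)}
p_15_given_3 = conditional_y_given_x(15, 3)
print(marginals, round(p_15_given_3, 3))
```

A useful sanity check is that the marginal probabilities sum to 1, as the joint relative frequencies do.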
3.6.2 Covariance
The marginal distributions can be readily determined from the joint distribution
of two variables but the converse is not true. The joint distribution also depends
on the degree of dependence and the nature of dependence existing between the
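random variables. This dependence is the reason for the study of the joint
distribution. The covariance is a second moment about the centroidal axes and is
defined, for the continuous case, as follows:
$$\operatorname{cov}(X, Y) = \sigma_{xy} = E[(X - \mu_x)(Y - \mu_y)] = \int_{-\infty}^{+\infty} \int_{-\infty}^{+\infty} (x - \mu_x)(y - \mu_y)\, f(x, y)\,dx\,dy$$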
Figure 3-8 shows the elements of the integration for the continuous case.
If random variables X and Y are standardized as
$$X^* = \frac{X - \mu_x}{\sigma_x}, \quad Y^* = \frac{Y - \mu_y}{\sigma_y}$$
then the standardized variables have zero mean and unit variance. It can be
shown that the covariance of standardized X and Y is equal to the correlation
coefficient between nonstandardized X and Y. Standardization of a random
variable does not influence its skewness coefficient or kurtosis.
The centroidal axes divide the probability space into four quadrants, as
marked by the Roman numerals I to IV in Fig. 3-8. The probability masses in the
first and third quadrants make positive contributions to the value of the
covariance, since the product (x – μx)(y – μy) is positive. The probability masses in the
second and fourth quadrants make negative contributions to the value of the
covariance. Therefore, if values larger than the average value of X are associated
with larger than the average values of Y, and conversely, values smaller than the
average value of X occur simultaneously with values smaller than the average
value of Y, then a relatively large part of the probability mass is located in the
first and third quadrant and the covariance will be positive. However, if values
larger than the average value of Y occur with values smaller than the average
value of X and vice versa, then the covariance will be negative. If X and Y are
unrelated then the positive and negative contributions cancel each other and the
covariance will be zero.
[Figure 3-8 not reproduced: the x-y plane divided by the centroidal axes through (μx, μy) into quadrants I to IV, showing the element f(x, y)dxdy at distances x – μx and y – μy from the axes.]
For independent random variables X and Y,
f(x, y)dxdy = f(x)dx f(y)dy (3.41)
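Substitution of Eq. 3.41 in the definition of the covariance permits writing the
double integral as the product of two integrals. For independent variables,
$$\operatorname{cov}(x, y) = \int_{-\infty}^{+\infty} (x - \mu_x) f(x)\,dx \int_{-\infty}^{+\infty} (y - \mu_y) f(y)\,dy$$
since
$$\mu_x = \int_{-\infty}^{+\infty} x f(x)\,dx \quad \text{and} \quad \mu_y = \int_{-\infty}^{+\infty} y f(y)\,dy$$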
It follows that both integrals in the product are equal to zero. Thus, the cova-
riance of independent variables is zero.
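$$\rho = \frac{\sigma_{xy}}{\sigma_x \sigma_y} \tag{3.42a}$$
$$\rho = \frac{n \sum x_i y_i - \left(\sum x_i\right)\left(\sum y_i\right)}{\sqrt{n \sum x_i^2 - \left(\sum x_i\right)^2}\, \sqrt{n \sum y_i^2 - \left(\sum y_i\right)^2}} \tag{3.42b}$$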
The correlation coefficient lies between –1 and +1. Variables whose correlation
coefficient takes either of the two extreme values are said to be perfectly (linearly)
correlated. However, a high correlation does not mean that the variables have a
cause-and-effect relationship. A value of zero is obtained when the variables are
independent, but uncorrelated variables are not necessarily independent.
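Both statements can be illustrated with Eq. 3.42b on small, hypothetical data sets. In the sketch below the function name `correlation` and the data are assumptions; the second data set, y = x² on values of x symmetric about zero, is completely determined by x and yet has zero correlation with it.

```python
from math import sqrt


def correlation(xs, ys):
    """Correlation coefficient computed from sums, following Eq. 3.42b."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    return (n * sxy - sx * sy) / (sqrt(n * sxx - sx ** 2) * sqrt(n * syy - sy ** 2))


xs = [-2, -1, 0, 1, 2]                                   # hypothetical values
rho_linear = correlation(xs, [3 * x + 1 for x in xs])    # exact linear relation
rho_parabola = correlation(xs, [x * x for x in xs])      # dependent but uncorrelated

print(rho_linear, rho_parabola)
```

The correlation coefficient measures only the linear part of the dependence, which is why the parabola yields zero.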
$$SS = n\sigma_y^2 = \sum_{i=1}^{n} (y_i - \bar{y})^2 \tag{3.43a}$$
$$SSR = n\sigma_{y_p}^2 = \sum_{i=1}^{n} (y_{pi} - \bar{y})^2 \tag{3.43b}$$
$$SSE = n\sigma_e^2 = \sum_{i=1}^{n} (y_i - y_{pi})^2 \tag{3.43c}$$
This relationship shows that the dispersion of the observed y values about
their mean is equal to the sum of the dispersion of the predicted y values about
that mean and the dispersion of the actual y values about their corresponding
predicted y values.
The measure of goodness of regression is the standard error of regression, se,
computed as
$$s_e = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_i - y_{pi})^2} = \sqrt{\frac{SSE}{n}} = \sqrt{\frac{S_e}{n}} = \sqrt{\frac{n\sigma_e^2}{n}} \tag{3.44a}$$
where yi and ypi represent the ith observed and predicted values of y, respec-
tively, Se is the sum of squares of errors, and n is the number of observations. The
standard error of regression, se, quantifies the spread of data around the regres-
sion line of fit and can be referred to as the unexplained sum of squares. Thus it
is the standard deviation of the errors of estimation. It is worth mentioning that
if any of the regression assumptions (independence, zero mean error, or com-
mon variance) concerning the residual error (ei = yi − ypi) are incorrect, then se
may not be a useful estimate of scale or dispersion for the residual errors.
Clearly, a smaller value of se indicates that points lie closer to the regression line.
If all points lie on the regression line then se = 0. For a large sample (i.e., large n)
about two-thirds of the errors y − yp will be less than se in magnitude and about
one-third of the errors will exceed it. Thus, by drawing two lines parallel to the regression line
and at a vertical distance equal to se from it, one can draw a region within which
about two-thirds of the sample points will fall. Likewise, one can show that 95%
of the sample points will fall within the region bounded by two lines parallel to
the regression line at a vertical distance of twice se from it.
where σ0² is the variance of the observed data, which is a measure of the
variability associated with the dependent variable before regression. This coefficient
is computed as (1 – Δ), where Δ is the difference between the variance of the observed
values of the dependent variable and the variance of the values of the dependent
variable that have been computed using the regression relation, divided by the
variance of the observed values. Clearly, as Δ becomes smaller, the coefficient of
determination becomes larger or the regression “improves.” Thus, it is a useful
measure of the goodness of fit for evaluating simple regression models.
As an important note about application of Eq. 3.44b, its useful characteristics
are dependent on the partitioning of the variance of the observed data into error
and regression components by using the minimization criteria of ordinary least
squares. If these criteria are not used, then the interpretations associated
with Eq. 3.44b are no longer valid.
$$RMSE = \sqrt{\frac{\sum_{i=1}^{n} (O_i - P_i)^2}{n}} \tag{3.45}$$
where n is the number of observations, Oi is the ith observed value, and Pi is the
ith predicted value. RMSE should be as small as possible. Note that the RMSE is
affected by the units used for expressing the parameter of concern; consequently,
it cannot be used to compare a model’s efficacy across parameters having differ-
ent units of measurement. To overcome this shortcoming, normalized error sta-
tistics are used. One of the normalized error statistics is the normalized mean
square error (NMSE), defined as (Gershenfeld and Weigend 1993)
$$NMSE = \frac{\sum_{i=1}^{n} (O_i - P_i)^2}{\sum_{i=1}^{n} (O_i - \bar{O})^2} \tag{3.46}$$
where Ō is the mean of the observed data. NMSE ranges from 0 to +∞. When
NMSE is zero, the model is perfect. When NMSE is 1, the model is only as good as
using the observed mean value as the prediction. When NMSE is greater than 1, the model is poor.
The other measure of goodness of fit that has been widely used to evaluate
the performance of hydrologic/water quality models is the coefficient of effi-
ciency developed by Nash and Sutcliffe (1970). Mathematically, the coefficient of
efficiency is defined as
$$\eta_1 = 1 - \frac{\sum_{i=1}^{n} (O_i - P_i)^2}{\sum_{i=1}^{n} (O_i - \bar{O})^2} = 1 - NMSE \tag{3.47}$$
$$\eta_2 = 1 - \frac{\sum_{i=1}^{n} |O_i - P_i|}{\sum_{i=1}^{n} |O_i - \bar{O}|} \tag{3.48}$$
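The four statistics of Eqs. 3.45 to 3.48 can be computed together. The sketch below is illustrative: the function name `fit_statistics` and the observed values are hypothetical, and Eq. 3.48 is taken in its absolute-deviation form, which is an assumption about the intended definition of η2.

```python
from math import sqrt


def fit_statistics(obs, pred):
    """Return (RMSE, NMSE, eta1, eta2) for paired observed and predicted values."""
    n = len(obs)
    o_bar = sum(obs) / n
    sse = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ss_obs = sum((o - o_bar) ** 2 for o in obs)
    rmse = sqrt(sse / n)                                   # Eq. 3.45
    nmse = sse / ss_obs                                    # Eq. 3.46
    eta1 = 1 - nmse                                        # Eq. 3.47
    eta2 = 1 - (sum(abs(o - p) for o, p in zip(obs, pred))
                / sum(abs(o - o_bar) for o in obs))        # Eq. 3.48 (assumed form)
    return rmse, nmse, eta1, eta2


observed = [2.0, 4.0, 6.0, 8.0]                            # hypothetical data
perfect = fit_statistics(observed, observed)               # model reproduces the data
mean_model = fit_statistics(observed, [5.0] * 4)           # model predicts the mean
print(perfect, mean_model)
```

The two test cases bracket the interpretation given in the text: a perfect model gives NMSE = 0 and η1 = 1, whereas a model that always predicts the observed mean gives NMSE = 1 and η1 = 0.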
Table E3-21a
Date Observed (O) Predicted (P)
Table E3-21b
                                   Segment 1-4                Segment 14-15
Statistic                          Chl-a    TP       TN       Chl-a    TP       TN
RMSE                               16.11    0.03     0.35     19.26    0.05     0.26
NMSE                               1.47     1.97     0.89     2.26     1.12     1.05
Nash–Sutcliffe coefficient, η1     –0.47    –0.97    0.11     –1.25    –0.12    –0.04
Index of agreement, η2             –0.16    –0.30    0.08     –0.56    0.00     0.03
η2 indicate that the model performs poorly for predicting Chl-a and TP in both
lake segments; the corresponding observed average levels of Chl-a and TP are
better predictors than the model. For TN, the model predictions are marginally
better than the observed average in segment 1-4, whereas they are poor in
segment 14-15.
Example 3.22 The precipitation (in millimeters) and runoff (in millimeters) for a
catchment for the month of July are given in Table 3-2. Compute the coefficient
of correlation of the data.
Solution The various variables required to calculate the coefficient of correlation are
computed in Table 3-22. Here, $\bar{x}$ = 687.05/16 = 42.94 and $\bar{y}$ = 234.04/16 = 14.63.
Now
$$\sigma_{xy} = \frac{1}{n} \sum (x - \bar{x})(y - \bar{y}) = 369.423/16 = 23.09$$
$$s_x = (570.056/16)^{0.5} = 5.97$$
$$s_{x,y} = \frac{1}{n} \sum_{i=1}^{n} x_i y_i - m_x m_y \qquad (3.51)$$
Equation 3.51 can be used to derive an important theorem about the variance
of the sum of random variables:
$$\operatorname{var}(X + Y) = \sigma_x^2 + \sigma_y^2 + 2 \rho\, \sigma_x \sigma_y \qquad (3.53)$$
Equations 3.52 and 3.53 are equivalent and show that the variance of the sum
of two random variables is equal to the sum of their variances if the variables are
independent; otherwise one must add twice the covariance.
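A small numerical check may make the theorem concrete. The sketch below assumes a hypothetical joint probability mass function on four points (the values are invented for illustration) and compares the variance of X + Y computed directly with the sum of the two variances plus twice the covariance.

```python
# Hypothetical joint PMF p(x, y); the four probabilities sum to 1.
joint = {(0, 0): 0.3, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.4}

def expect(func):
    """Expectation of func(x, y) under the joint PMF."""
    return sum(func(x, y) * p for (x, y), p in joint.items())

mean_x = expect(lambda x, y: x)
mean_y = expect(lambda x, y: y)
var_x = expect(lambda x, y: (x - mean_x) ** 2)
var_y = expect(lambda x, y: (y - mean_y) ** 2)
# Covariance as "mean of the product minus product of the means",
# the population counterpart of Eq. 3.51.
cov_xy = expect(lambda x, y: x * y) - mean_x * mean_y
# Variance of the sum computed directly from its definition.
var_sum = expect(lambda x, y: (x + y - mean_x - mean_y) ** 2)

print(var_sum, var_x + var_y + 2 * cov_xy)
```

Because the two variables in this example are positively correlated, the variance of the sum exceeds the sum of the variances.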
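$$E\{\varphi(X)\} = \int_{-\infty}^{+\infty} \varphi(x) f(x)\,dx \qquad (3.55)$$
$$E\{\varphi(X)\} = \sum_{i=1}^{n} \varphi(x_i)\, p(x_i) \qquad (3.56)$$
$$E\{\varphi(X, Y)\} = \int_{-\infty}^{+\infty}\int_{-\infty}^{+\infty} \varphi(x, y) f(x, y)\,dx\,dy \qquad (3.57)$$
$$E\{\varphi(X, Y)\} = \sum_{i=1}^{n} \sum_{j=1}^{m} \varphi(x_i, y_j)\, p(x_i, y_j) \qquad (3.58)$$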
It can be seen from Eq. 3.55 to Eq. 3.58 that calculation of the expectation of
a random variable is a linear operation. This means that the expectation of the
sum of two random variables, or the expectation of the sum of two functions of
random variables, is equal to the sum of the expectations. This follows immedi-
ately from the fact that when ϕ (X) or ϕ (X, Y) can be written as a sum, the inte-
gral can be written as the sum of two separate integrals. For example,
E{ ϕ1 (X ) + ϕ2 (X )} = E{ ϕ1 (X )} + E{ ϕ2 (X )} (3.59)
For two random variables that may or may not be statistically independent,
E{ ϕ1 (X ) + ϕ2 (Y )} = E{ ϕ1 (X )} + E { ϕ2 (Y )} (3.60)
$$E(C) = \int_{-\infty}^{+\infty} C f(x)\,dx = C \int_{-\infty}^{+\infty} f(x)\,dx = C \qquad (3.61)$$
Appendix 3A
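To prove that
$$s^2 = \frac{\sum (x_i - \bar{X})^2}{n - 1} \qquad (3A.1)$$
is an unbiased estimator of
$$\sigma^2 = \frac{\sum (x_i - \mu)^2}{n} \qquad (3A.2)$$
we write
$$s^2 = \frac{1}{n-1} \sum (x_i - \mu)^2 - \frac{n}{n-1} (\bar{X} - \mu)^2$$
Taking the expectation gives
$$E(s^2) = \frac{1}{n-1} \sum E(x_i - \mu)^2 - \frac{n}{n-1} E\{(\bar{X} - \mu)^2\}$$
or
$$E(s^2) = \frac{1}{n-1} \sum \operatorname{var}(X) - \frac{n}{n-1} \operatorname{var}(\bar{X})$$
$$= \frac{1}{n-1}\, n\sigma^2 - \frac{n}{n-1}\, \frac{\sigma^2}{n}$$
$$= \frac{n}{n-1}\, \sigma^2 - \frac{1}{n-1}\, \sigma^2 = \sigma^2$$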
3.8 Questions
3.1 Compute the mean, standard deviation, coefficient of variation, coeffi-
cient of skewness, and coefficient of kurtosis for the temperature, rain-
fall, wind velocity, and discharge data that you have used in the
previous chapter [Questions 2.1 to 2.5]. Also compute shape factors and
moment ratios.
3.2 Using the data from Question 3.1, plot histograms of temperature, rain-
fall, wind velocity, and discharge.
X 0 1 2 3 4 5 6 7 8 9 10
f(x) 0.00 0.04 0.08 0.12 0.16 0.20 0.16 0.12 0.08 0.04 0.00
Calculate the following: (i) E[X], (ii) E[2X], (iii) E[2X + 2], and (iv) E[g(X)],
where g(X) = X² – 2X + 4.
3.11 Calculate (i) E[X], (ii) E[2X], (iii) E[2X + 2], and (iv) E[g(X)], where g(X) =
X² – 2X + 4, if X is described by (a) a uniform distribution in the range
(0, 10) [i.e., f(X) = 1/10], (b) a triangular distribution given in Fig. Q3-11,
and (c) an exponential distribution defined as f(X) = 0.25 exp(–0.25X).
Figure Q3-11
$$f(x) = \begin{cases} c\,(1 - x^3), & 0 < x \le 1 \\ 0, & \text{elsewhere} \end{cases}$$
(a) Find the value of c so that f(x) is a valid distribution. (b) Determine
the first four moments of X about the origin. (c) Use the noncentral
moments to determine the first four central moments.
3.13 Find the characteristic function for the random variable X defined in
Example 3.2. Using the characteristic function thus obtained, find the
expected value of X.
3.14 Find the moment-generating function of X characterized by the follow-
ing PDF:
$$f(x) = \frac{a^x \exp(-b)}{x!}, \quad \text{where } x \text{ is a positive integer}$$
Determine the mean and variance of X using this moment-generating
function.
3.15 Let X be defined by the following gamma distribution function:
$$f(x) = \frac{\beta^{\alpha}}{\Gamma(\alpha)}\, x^{\alpha - 1} e^{-\beta x}, \quad x, \alpha > 0$$
$$f(Q) = \frac{1}{9Q\sqrt{2\pi}}\, \exp\left[-\frac{(\ln Q - 4)^2}{18}\right], \quad 0 < Q < \infty$$
x
1 − 2 , –∞ < x < ∞
f ( x) = e
2
Determine its moment-generating function and the first three central
and noncentral moments. Determine its characteristic functions.
3.19 A random variable X is defined by the normal distribution with parame-
ters μ and σ with the following PDF:
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, \exp\left[-\frac{(x - \mu)^2}{2\sigma^2}\right], \quad -\infty < x < \infty$$
Determine its characteristic function.
3.20 Let R, S, T, U, and V be random variables with the first four noncentral
moments given in Table Q3-20. Using the relationship between central
and noncentral moments, determine the mean, variance, skew, and kur-
tosis of these random variables.
Table Q3-20
Moment order, k 1 2 3 4
E[Rk] 2.00E-01 4.04E-02 8.24E-03 1.70E-03
$$F(x) = \begin{cases} 0, & x < -5 \\ 0.25, & -5 \le x < 5 \\ 0.45, & 5 \le x < 10 \\ 0.65, & 10 \le x < 15 \\ 0.85, & 15 \le x < 20 \\ 1.0, & x \ge 20 \end{cases}$$
Table Q3-22
Year Stage (ft) Flow (cfs) Year Stage (ft) Flow (cfs)
Table Q3-23
Observed water quality                           Model-calculated water quality
DO^a   FCOLI^b  NO3^c  NH3^d  TSS^e  TEMP^f      DO     FCOLI   NO3    NH3    TSS     TEMP
3.0    5.       0.0    0.0    11.0   46.0        6.0    158.    0.1    0.0    6.5     41.7
6.6    50.      0.1    0.0    12.0   48.2        9.2    180.    0.1    0.0    6.7     52.4
6.8    62.      0.1    0.0    13.0   52.7        8.0    124.    0.1    0.0    44.5    50.8
6.8    92.      0.1    0.0    14.0   55.0        7.0    335.    0.1    0.0    23.2    54.0
7.5    100.     0.1    0.0    15.0   56.7        8.3    196.    0.3    0.0    47.7    55.3
7.5    108.     0.1    0.0    15.0   59.2        9.1    170.    0.3    0.1    29.8    53.1
7.7    120.     0.1    0.0    16.0   59.9        7.7    89.     0.3    0.0    32.0    56.5
7.7    122.     0.1    0.0    18.0   59.9        7.7    155.    0.3    0.0    51.1    46.9
7.7    125.     0.1    0.0    19.0   60.6        7.8    124.    0.1    0.0    5.5     55.7
7.8    148.     0.1    0.0    20.0   61.0        7.6    180.    0.1    0.0    25.1    61.4
7.8    160.     0.1    0.0    22.0   61.0        8.1    135.    0.2    0.0    74.5    56.4
7.9    165.     0.1    0.0    23.0   61.9        8.8    40.     0.1    0.0    101.0   59.3
8.0    170.     0.1    0.0    25.0   63.3        7.8    819.    0.3    0.0    114.0   60.3
Table Q3-24
Data               Root mean square   Normalized mean       Coefficient of
                   error (RMSE)       square error (MSE*)   determination (CD)
Flow assessment
Probability Distributions and Parameter Estimation
Chapter 4
Discrete and Continuous Probability Distributions
may lie in knowing the number of levee sections liable to break. In such exam-
ples, an event may be characterized as simply “failure” or “no failure,” “suc-
cess” or “no success,” “win” or “loss,” “flaw” or “no flaw,” “accident” or “no
accident,” and so on. In other words, the specified event may be described by
“occurrence” or “nonoccurrence.” It is further assumed that the probability of
“occurrence” (and of “nonoccurrence”) is the same from trial to trial, or per unit
of space, or per unit of time, as the case may be. Each time a lined canal is ana-
lyzed, the probability of leakage is assumed to be the same. Each time a water
distribution network is analyzed, the probability of failure of pipes is assumed
to be the same. The probability of encountering a flaw is assumed to be the same
for each meter of canal lining. The probability of accident per unit of time in the
case of transportation is assumed to be the same for each unit of time. Further-
more, it is assumed that the occurrence or nonoccurrence of the specified event
is independent of the previous occurrences or nonoccurrences.
The number denoting the occurrence of the event or the success is regarded
as the random variable, denoted as X, whose probability distribution is of inter-
est. The distribution of X is obviously discrete. Depending on whether the occur-
rences are considered in a fixed number of observations or in a fixed continuum
of space or time, the random variable X will follow a binomial or a Poisson dis-
tribution if three conditions are met: (i) Only two discrete states of the random
variable are possible, (ii) the probability of occurrence is constant for each trial,
and (iii) the occurrence of the specified event is independent of the previous
occurrences. The binomial and the Poisson distributions are the most commonly
used discrete probability distributions.
$$f_X(x) = \begin{cases} p, & x = 1 \\ q = 1 - p, & x = 0 \end{cases} \qquad (4.1a)$$
$$\sigma_X^2 = \sum (x - \mu)^2 f_X(x) = p^2 q + q^2 p = pq(p + q) = pq = p(1 - p) \qquad (4.3)$$
For a Bernoulli variable, the probability of occurrence of the event in each trial
is the same from trial to trial and the trials are statistically independent. We use the
notation X ~ Bernoulli (p), which reads as X is characterized by a Bernoulli distri-
bution with parameter p. The Bernoulli distribution is useful for modeling an
experiment or an engineering process that results in exactly one of two mutually
exclusive outcomes. The experiments involving repeated sampling of a Bernoulli
random variable are frequently called Bernoulli trials [e.g., tossing a coin repeat-
edly and observing the outcomes (heads or tails)].
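[Figure: Bernoulli distribution on X = 0 and X = 1, showing the probabilities p and q and the location of the mean.]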
sequences are mutually exclusive events. Similarly, for the event y = 2, the mutu-
ally exclusive occurring sequences are
f[y = 2] = 3p²(1 – p)
Likewise, for y = 3, P[y = 3] = p³, since only one sequence corresponds to y = 3.
In summary
fY(0) = (1 – p)³
fY(1) = 3p(1 – p)²
fY(2) = 3p²(1 – p)
fY(3) = p³
These can be written compactly as
$$f_Y(y) = \binom{3}{y} p^y (1 - p)^{3 - y}, \quad y = 0, 1, 2, 3$$
where $\binom{3}{y}$, the binomial coefficient, equals 3!/[y!(3 – y)!], the number of ways
that exactly y successes can be found in a sequence of three trials.
To generalize, if there are n Bernoulli trials, the probability mass function of
the total number of successes y is given as
$$f_Y(y) = \binom{n}{y} p^y (1 - p)^{n - y} = \frac{n!}{y!\,(n - y)!}\, p^y (1 - p)^{n - y} = B(n, p), \quad y = 0, 1, 2, 3, \ldots, n \qquad (4.4)$$
Here n must be an integer and 0 ≤ p ≤ 1. Equation 4.4 defines the binomial
distribution of Y for given values of p and n. The distribution is called binomial
because the coefficients are the well-known binomial coefficients that arise when
the series (a + b)^n is expanded using the binomial theorem. The binomial
distribution has two parameters: the number of trials and the probability of occurrence
of the specified event in a single trial. In an abbreviated form, this is referred to
as B(n, p). The shape of B(n, p) depends on parameters n and p. The probability of
each sequence is equal to p^y q^(n–y). With use of Eq. 4.4, the probabilities that Y will
$$E[Y] = \sum_{y=-\infty}^{\infty} y f_Y(y) = \sum_{y=0}^{n} y \binom{n}{y} p^y (1 - p)^{n - y}$$
$$= \sum_{y=1}^{n} y \binom{n}{y} p^y (1 - p)^{n - y} = np \sum_{y=1}^{n} \frac{(n - 1)!}{(y - 1)!\,(n - y)!}\, p^{y - 1} (1 - p)^{n - y} \qquad (4.5)$$
With u = y – 1,
$$E[Y] = np \sum_{u=0}^{n-1} \frac{(n - 1)!}{u!\,(n - 1 - u)!}\, p^u (1 - p)^{n - 1 - u} = np \qquad (4.6)$$
because the term after the summation will add up to unity. Similarly, the vari-
ance of Y can be obtained as
To express the variance, E[Y2] needs to be specified. This term can be derived as
$$E[Y^2] = \sum_{y=-\infty}^{\infty} y^2 f_Y(y) = \sum_{y=0}^{n} y^2 \binom{n}{y} p^y (1 - p)^{n - y} = \sum_{y=1}^{n} y^2 \binom{n}{y} p^y (1 - p)^{n - y}$$
$$= \sum_{y=1}^{n} y^2\, \frac{n!}{y!\,(n - y)!}\, p^y (1 - p)^{n - y} = np \sum_{y=1}^{n} y\, \frac{(n - 1)!}{(y - 1)!\,(n - y)!}\, p^{y - 1} (1 - p)^{n - y}$$
$$= np \left[ \sum_{y=1}^{n} \frac{(y - 1 + 1)(n - 1)!}{(y - 1)!\,(n - y)!}\, p^{y - 1} (1 - p)^{n - y} \right]$$
$$= np \left[ \sum_{y=1}^{n} \frac{(y - 1)(n - 1)!}{(y - 1)!\,(n - y)!}\, p^{y - 1} (1 - p)^{n - y} + \sum_{y=1}^{n} \frac{(n - 1)!}{(y - 1)!\,(n - y)!}\, p^{y - 1} (1 - p)^{n - y} \right]$$
$$= np \left[ \sum_{y=2}^{n} \frac{(n - 1)(n - 2)!}{(y - 2)!\,(n - y)!}\, p^{y - 1} (1 - p)^{n - y} + \sum_{y=1}^{n} \frac{(n - 1)!}{(y - 1)!\,(n - y)!}\, p^{y - 1} (1 - p)^{n - y} \right]$$
Let u = y – 2 for the first summation and v = y – 1 for the second summation
in this equation. Then one obtains
$$E[Y^2] = np \left\{ p(n - 1) \sum_{u=0}^{n-2} \frac{(n - 2)!}{u!\,(n - 2 - u)!}\, p^{u} (1 - p)^{n - 2 - u} + \sum_{v=0}^{n-1} \frac{(n - 1)!}{v!\,(n - 1 - v)!}\, p^{v} (1 - p)^{n - 1 - v} \right\}$$
Each summation is the total probability of a binomial distribution and therefore equals unity, so that E[Y²] = np[(n – 1)p + 1].
Therefore,
$$E[Y] = \sum_{i=1}^{n} E[X_i] = n\,E[X_i] = np$$
$$\operatorname{var}[Y] = \sum_{i=1}^{n} \operatorname{Var}[X_i] = n \operatorname{Var}[X_i] = np(1 - p)$$
This also shows that the sum of two binomial random variables, B(n1, p)
and B(n2, p), also has a binomial distribution, B(n1+ n2, p), as long as p remains
constant. As p tends to 0.5 and n tends to a large number, B(n, p) tends to a nor-
mal distribution (see the plots in Example 4.2), which will be discussed in the
next chapter.
Example 4.2 Consider the binomial distribution with parameters n and p. Graph
the binomial distribution for the following parameter sets: (1) n = 5, p = 0.1;
(2) n = 5, p = 0.25; (3) n = 5, p = 0.5; (4) n = 15, p = 0.25; (5) n = 15, p = 0.5; (6) n = 30,
p = 0.25; (7) n = 30, p = 0.5; and (8) n = 50, p = 0.5.
Example 4.3 Daily rainfall data for Baton Rouge, Louisiana, is available for the
years 1948 to 1990. Consider the rainfall data for the month of September.
Assuming that the occurrence of rainfall on any day is an independent event,
compute the probability of 2 rainy days, 4 rainy days, and 10 rainy days in Sep-
tember. From the data, the total number of rainy days is 380.
Solution The total number of days in the month of September from 1948 to 1990
will be 43 × 30 = 1,290. Thus, the probability of rain on any given day in Septem-
ber is 380/1,290 = 0.2978. The probability of the occurrence of a given number of
rainy days in a month follows a binomial distribution. If there are n Bernoulli tri-
als, the probability mass function of the total number of successes y is given by
Eq. 4.4. Here, the number of trials n is the number of days in September (30), and
p is 0.2978. Hence, the probability of 2 successes, y = 2, is
$$P_2 = \binom{30}{2}\, 0.2978^{2}\, (1 - 0.2978)^{30 - 2} = 0.001938$$
Similarly, for y = 4,
$$P_4 = \binom{30}{4}\, 0.2978^{4}\, (1 - 0.2978)^{30 - 4} = 0.022$$
and for y = 10,
$$P_{10} = \binom{30}{10}\, 0.2978^{10}\, (1 - 0.2978)^{30 - 10} = 0.14$$
The probability of a different number of rainy days in September is plotted
in Fig. 4-3.
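The three probabilities of Example 4.3 can be reproduced with a few lines of code. This is a minimal sketch of Eq. 4.4 that takes n = 30 and p = 0.2978 as used in the solution; the function name `binomial_pmf` is an illustrative choice.

```python
from math import comb

def binomial_pmf(y, n, p):
    """Probability of exactly y successes in n Bernoulli trials (Eq. 4.4)."""
    return comb(n, y) * p**y * (1 - p) ** (n - y)

n_days, p_rain = 30, 0.2978   # values used in the solution of Example 4.3
probs = {y: binomial_pmf(y, n_days, p_rain) for y in (2, 4, 10)}
# The PMF over all possible counts must sum to one.
total = sum(binomial_pmf(y, n_days, p_rain) for y in range(n_days + 1))

for y, prob in probs.items():
    print(f"P(Y = {y:2d}) = {prob:.4g}")
print(f"sum over y = 0..{n_days}: {total:.6f}")
```

Evaluating the same function for every y from 0 to 30 gives the ordinates of the distribution of rainy days.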
Example 4.4 Using the data of Example 4.3, find the probability of 2, 3, and 5
consecutive rainy days in the month of September.
Solution Since the probability of rain falling on a day is independent of the rain
on the previous day, the probability of 2 consecutive rainy days in the month of
September is
$$P_2 = (p)^2 = (0.2978)^2 = 0.0887$$
Similarly, for 3 and 5 consecutive rainy days,
$$P_3 = (p)^3 = (0.2978)^3 = 0.0264$$
$$P_5 = (p)^5 = (0.2978)^5 = 0.0023$$
[Figure: binomial probability mass functions, p(x) versus x, for the parameter sets of Example 4.2.]
Figure 4-3 Probability of given number of rainy days in September for Baton Rouge.
Example 4.5 Consider the annual peak discharge series for the Amite River near
Darlington, Louisiana. Assume that the annual peak values are independent. If
the probability of a given T-year flood is constant from year to year, then the suc-
cessive years represent independent Bernoulli trials. What is the probability that
a 50-year flood will occur at least once during 20 years? Compute the probability
of a 100-year flood occurring at least once in 20 years.
Solution When a return period (T) for an event is given, the probability p is
defined as (the definition of return period is provided later) p = the probability
of occurrence of a T-year flood in any given year = 1/T = 1/50 = 0.02. Let X = the
number of occurrences of the 50-year flood in 20 years, so that X ~ B(n = 20, p = 0.02).
The probability that the 50-year flood will occur at least once in 20 years will be
$$P[X \ge 1] = 1 - f_X(0) = 1 - \binom{20}{0} (0.02)^0 (1 - 0.02)^{20} = 0.332$$
The probability that the 100-year flood (p = 0.01) will occur at least once in 20 years will be
$$P[X \ge 1] = 1 - f_X(0) = 1 - \binom{20}{0} (0.01)^0 (1 - 0.01)^{20} = 0.182$$
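Because only the term for zero occurrences is needed, the calculation reduces to 1 – (1 – 1/T)^n. A minimal sketch follows; the function name `prob_at_least_one` is an illustrative assumption.

```python
def prob_at_least_one(return_period, years):
    """P[X >= 1] for X ~ B(years, 1/return_period): one minus P[X = 0]."""
    p = 1.0 / return_period
    return 1.0 - (1.0 - p) ** years

risk_50 = prob_at_least_one(50, 20)    # 50-year flood in a 20-year period
risk_100 = prob_at_least_one(100, 20)  # 100-year flood in a 20-year period
print(f"{risk_50:.3f} {risk_100:.3f}")
```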
Example 4.6 A factory produces plastic pipes and an inspection showed that
10% of the pipes produced are defective. Prepare the PMF of the number of
defective pipes encountered in a sample of 10. Assume that the number of defec-
tive pipes follows a binomial distribution.
Solution Let X be the number of defective items. Here p = 0.1 and n = 10. A sam-
ple calculation for x = 2 is
$$P(x = 2) = \binom{10}{2} (0.1)^2 (0.9)^8 = 45 \times 0.0043047 = 0.1937$$
Table E4-6
X                      0       1       2       3       4       5       6       7      8    9    10
Binomial coefficient   1       10      45      120     210     252     210     120    45   10   1
pX(x)                  0.3487  0.3874  0.1937  0.0574  0.0112  0.0015  0.0001  0.000  0.0  0.0  0.0
Example 4.7 Using the data of Example 4.6, compute the probability that in one
particular sample of 10 plastic pipes, one would find three or more defective
pipes.
Solution This probability can be calculated in two ways. First, one can add up
the probabilities of x = 3, x = 4, x = 5, etc. up to x = 10 defective pipes. Second, a
more efficient way is to consider the complementary event: fewer than three defective
pipes. This means adding up
P(x < 3) = P(x = 0) + P(x = 1) + P(x = 2) = 0.3487 + 0.3874 + 0.1937 = 0.9298
The probability of finding three or more defective pipes is, therefore,
1.0 – 0.9298 = 0.0702.
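Both routes to the answer can be checked numerically. The sketch below builds the full PMF for n = 10 and p = 0.1 (the values of Example 4.6) and compares the complement calculation with the direct sum; the variable names are illustrative.

```python
from math import comb

n, p = 10, 0.1
# PMF of the number of defective pipes in a sample of 10 (Eq. 4.4).
pmf = [comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(n + 1)]

p_fewer_than_3 = sum(pmf[:3])             # P(x = 0) + P(x = 1) + P(x = 2)
p_three_or_more = 1.0 - p_fewer_than_3    # complement rule
direct = sum(pmf[3:])                     # adding x = 3, 4, ..., 10 directly

print(f"{p_fewer_than_3:.4f} {p_three_or_more:.4f} {direct:.4f}")
```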
$$F_N(n) = \sum_{j=1}^{n} P_N(j) = \sum_{j=1}^{n} (1 - p)^{j - 1} p = 1 - (1 - p)^n \qquad (4.10)$$
$$E[N] = \sum_{n=1}^{\infty} n p (1 - p)^{n - 1} = \frac{1}{p} \qquad (4.11)$$
$$\operatorname{Var}[N] = \frac{1 - p}{p^2} \qquad (4.12)$$
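Equations 4.10 to 4.12 can be checked by summing the geometric probabilities directly. The sketch below assumes the PMF (1 – p)^(n–1) p that appears as the summand in Eq. 4.10, uses p = 0.02 purely as an illustrative value, and truncates the infinite series at a point where the remaining terms are negligible.

```python
def geometric_pmf(n, p):
    """Probability that the first success occurs on trial n (n = 1, 2, ...)."""
    return (1 - p) ** (n - 1) * p

def geometric_cdf(n, p):
    """Closed form of Eq. 4.10."""
    return 1 - (1 - p) ** n

p = 0.02
cdf_by_sum = sum(geometric_pmf(j, p) for j in range(1, 21))

# Truncated series for the mean and variance; terms beyond n = 5000 are negligible here.
trials = range(1, 5001)
mean = sum(n * geometric_pmf(n, p) for n in trials)
var = sum(n * n * geometric_pmf(n, p) for n in trials) - mean**2

print(cdf_by_sum, geometric_cdf(20, p))
print(mean, 1 / p)
print(var, (1 - p) / p**2)
```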
Example 4.8 Plot the geometric distribution for various values of p and n, taking
n from 0 to 20, and p from 0 to 1.
Solution The distribution has been plotted in Fig. 4-4.
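$$T = E[N] = \sum_{n=1}^{\infty} n\, p_N(n) = \sum_{n=1}^{\infty} n p q^{n - 1} = p\,(1 + 2q + 3q^2 + \ldots)$$
Since 1 + 2q + 3q² + … = 1/(1 – q)² = 1/p²,
$$T = \frac{p}{p^2} = \frac{1}{p}$$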
[Figure 4-4: geometric distribution, probability versus n for n from 0 to 20; panel titles include p = 0.1, 0.2, 0.7, and 0.8.]
Example 4.9 Using the data in Example 4.3, compute the probability of the first
rainy day in September using the geometric distribution. Also, compute the
probability of the first two consecutive rainy days.
Solution The probability of the first occurrence of rain in September (first suc-
cess) can be obtained using Eq. 4.9:
Then, the probability of occurrence of the first two consecutive rainy days in
September can be obtained again by substituting this value in Eq. 4.9:
$$P = (1 - 0.089)^{1 - 1} (0.089) = 0.089$$
Example 4.10 Using the data of Example 4.5, compute the probability that (a) a
50-year flood will occur at least once during 20 years, (b) a 100-year flood will
occur at least once during 20 years, (c) the number of years for the first occur-
rence of the 50-year flood is greater than 10 years, (d) the number of years for the
first occurrence of the 50-year flood is greater than 30 years, (e) the number of
years for the first occurrence of the 100-year flood is greater than 10 years, (f) the
number of years for the first occurrence of the 100-year flood is greater than 20
years, (g) there will be no floods greater than the 50-year flood in 50 years, and
(h) there will be no floods greater than the 100-year flood in 100 years.
Solution
(a) For a 50-year flood, the probability in any year is p = 1/50 = 0.02. The
probability that a 50-year flood will occur at least once during 20 years is
one minus the probability that it will not occur in 20 years:
$$P[x \ge 1] = 1 - P(0) = 1 - \binom{20}{0} (0.02)^0 (1 - 0.02)^{20} = 0.332$$
(b) The probability of a 100-year flood in any year is p = 1/100 = 0.01. The
probability that a 100-year flood will occur at least once during 20 years is
$$P[x \ge 1] = 1 - P(0) = 1 - \binom{20}{0} (0.01)^0 (1 - 0.01)^{20} = 0.182$$
(c) The probability that the number of years for the first occurrence of the
50-year flood is greater than n years is
(g) The probability that there will be no floods greater than the 50-year flood
in 50 years is
$$P[x = 0] = f_x(0) = \binom{50}{0} (0.02)^0 (1 - 0.02)^{50} = 0.364$$
(h) The probability that there will be no floods greater than the 100-year
flood in 100 years is
$$P[x = 0] = f_x(0) = \binom{100}{0} (0.01)^0 (1 - 0.01)^{100} = 0.366$$
$$P_{W_k}(w) = \binom{w - 1}{k - 1} (1 - p)^{w - k} p^k, \quad w = k, k + 1, \ldots \qquad (4.14)$$
where Wk is the trial number at which the kth success occurs. Equation 4.14 implies
that k – 1 successes in the preceding w – 1 trials have already occurred. The proba-
bility of k – 1 successes in w – 1 trials is obtained from the binomial distribution.
This is the negative binomial distribution, also called the Pascal distribution,
with parameters k and p, and is denoted as NB(k,p). Note that PWk(w) = 0 for
w < k. Interestingly, the sum of two negative binomial random variables is also a
negative binomial random variable; that is, NB(k1, p) + NB(k2, p) is also
NB(k1 + k2, p). The mean and variance of the negative binomial distribution are given as
$$E(w) = \frac{k}{p}$$
$$\operatorname{Var}(w) = \frac{k(1 - p)}{p^2}$$
Example 4.12 Using the data of Example 4.3, compute the probability that the
second rainy day will occur on the 10th day of September. Also compute the
probability that the third rainy day will occur on the 15th day of the month using
the negative binomial distribution.
Solution The probability of the kth success occurring at the wth trial can be cal-
culated by the negative binomial distribution. Here, k = 2, w = 10. Hence,
$$P_{W_k}(10) = \binom{10 - 1}{2 - 1} (1 - 0.2978)^{10 - 2} (0.2978)^2 = 0.047$$
Similarly, for k = 3 and w = 15,
$$P_{W_k}(15) = \binom{15 - 1}{3 - 1} (1 - 0.2978)^{15 - 3} (0.2978)^3 = 0.035$$
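The two probabilities follow directly from Eq. 4.14. The sketch below evaluates that equation for the two cases of Example 4.12 and also checks the mean k/p by summing over a truncated range of trial numbers; the function name and the truncation point are illustrative assumptions.

```python
from math import comb

def neg_binomial_pmf(w, k, p):
    """Probability that the k-th success occurs on trial w (Eq. 4.14)."""
    if w < k:
        return 0.0
    return comb(w - 1, k - 1) * (1 - p) ** (w - k) * p**k

p_rain = 0.2978                       # daily rain probability used in Example 4.3
second_on_10th = neg_binomial_pmf(10, 2, p_rain)
third_on_15th = neg_binomial_pmf(15, 3, p_rain)

# Mean trial number of the second success; terms beyond w = 500 are negligible here.
mean_w = sum(w * neg_binomial_pmf(w, 2, p_rain) for w in range(2, 500))

print(f"{second_on_10th:.4f} {third_on_15th:.4f}")
print(mean_w, 2 / p_rain)
```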
[Figure: negative binomial distribution, probability mass versus w for w from 0 to 150; panels for p = 0.25 and p = 0.5.]
$$P[X \ge 1] = 1 - f_X(0) = 1 - \binom{30}{0} (0.02)^0 (0.98)^{30} = 1 - (0.98)^{30} = 0.45$$
One can also use the binomial theorem and get the same answer:
$$1 - (1 - 0.02)^{30} = 1 - \left[ 1 - 30(0.02) + \frac{(30)(29)}{2} (0.02)^2 - \frac{(30)(29)(28)}{(2)(3)} (0.02)^3 + \ldots \right]$$
$$= 0.6 - 0.174 + 0.0325 - \ldots \approx 0.45$$
If this risk is too large, the design capacity is increased such that the magni-
tude of the critical flood would be exceeded with an acceptable probability of,
say, 0.01 in any one year. Then, X is B(30, 0.01), and P(X ≥ 1) = 1 – fX(0) = 0.26. The
risk is lowered, but one must weigh the initial cost of the system versus the
decreased risk of incurring the damage associated with the failure of the system
to contain a large flood. The number of years N to the first occurrence of the
critical flood is a random variable with a geometric distribution, G(0.01), in the
latter case. The probability that it is greater than 10 years is
$$P(N > 10) = 1 - F_N(10) = (1 - 0.01)^{10} = 0.904$$
Suppose now the probability that N > 30 is to be computed. This is the prob-
ability that there are no floods in 30 years; that is, X = 0, where X is B(30, 0.01).
Thus,
P(N > 30) = P[X = 0] = 0.74
In a similar manner, one can compute the average return period or the
expected value of N, which simply is
$$m_N = \frac{1}{p} = \frac{1}{0.01} = 100 \text{ years}$$
This is the average number of trials (years) to the first flood of magnitude
greater than the critical flood after its last occurrence. This is referred to as the
return period or recurrence interval in hydrology. Since X is B(m, 1/m), the prob-
ability that there will be no floods greater than the m-year flood in m years is
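$$P(X = 0) = \left( 1 - \frac{1}{m} \right)^m = \sum_{i=0}^{m} \binom{m}{i} \left( -\frac{1}{m} \right)^i$$
$$= 1 - m \left( \frac{1}{m} \right) + \frac{m(m - 1)}{2} \left( \frac{1}{m} \right)^2 - \ldots$$
$$\approx 1 - \frac{u}{1!} + \frac{u^2}{2!} - \ldots = e^{-u}, \quad u = m(1/m) = 1 \qquad (4.15)$$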
For large m,
P[X = 0] ≈ e −1 ≈ 0.368
This states that the probability that one or more m-year events will occur in
m years is approximately 1 – e–1 = 0.632. Thus, a system designed for the “m-year
flood” will be inadequate, because the m-year flood will occur with a probability
of about 2/3 at least once during the period of m years.
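The convergence of (1 – 1/m)^m to e^(–1) can be seen numerically. The following sketch simply evaluates the exact binomial expression for several return periods m; the function name is an illustrative assumption.

```python
import math

def prob_no_exceedance(m):
    """P[X = 0] for X ~ B(m, 1/m): no m-year event in m years."""
    return (1 - 1 / m) ** m

for m in (10, 50, 100, 1000):
    p0 = prob_no_exceedance(m)
    print(f"m = {m:5d}: P[X = 0] = {p0:.4f}, P[X >= 1] = {1 - p0:.4f}")

limit = math.exp(-1)   # limiting value for large m
print(f"limit: {limit:.4f}")
```

Even for m = 50 the exact value differs from the limit only in the third decimal place, so the approximation is adequate for return periods of practical interest.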
On a given day, it is easy to count the number of times thunder and lightning
occurred, but it makes little sense to state the number of times they did not
occur. Similarly, the number of flaws in 1,000 m of water supply pipe can be
counted, but the number of nonflaws cannot be stated. Thus, instead of defining
the probability of “occurrence” for the specified event in a single trial, as for the
binomial distribution, what is defined here is the probability of occurrence per
unit of time or of space. For example, the probability that lightning in New
Orleans in the month of May will occur may be 0.025 per day. The probability
that a flaw occurs in a water supply pipe may be 0.000045 per meter of pipe
length or the probability of flooding in an urban area may be 0.01 per year. It is
assumed that these probabilities are the same for every day, every meter, or
every year. It is further assumed that the occurrences and the nonoccurrences are
independent along the continuum. The difference between these binomial and
Poisson distributions can be summarized by noting that both the occurrences
and nonoccurrences can be specified for the binomial distribution, whereas they
cannot be for the Poisson distribution.
The binomial and Poisson distributions share some similarities. The proba-
bility distribution of the number of occurrences X in a given continuum of time
or space can be treated as a special case of the binomial distribution under two
conditions: (1) The number of trials becomes infinitely large, and (2) the average
number of occurrences defined by np remains constant. By dividing the contin-
uum into small intervals, the problem can be reduced to one of “occurrence” and
“nonoccurrence” of the specified event in any of these intervals, provided these
intervals are made so small that the probability of getting two or more “occur-
rences” in any interval is negligible. To that end, consider a fixed interval of
time, say, t. Assume that the probability of an event occurring at any instant is p
(and it is assumed here that the probability of two or more events occurring at
any one instant is negligible). Then, the total number of events X in the n = t
(assumed) independent trials is binomial, B(n, p):
$$B(n, p) = f_X(x) = \binom{n}{x} p^x (1 - p)^{n - x}, \quad x = 0, 1, 2, \ldots, n \qquad (4.16)$$
$$n \to \infty, \quad p \to 0, \quad np = \nu$$
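$$f_X(x) = \frac{n!}{x!\,(n - x)!} \left( \frac{\nu}{n} \right)^x \left( 1 - \frac{\nu}{n} \right)^{n - x}$$
$$= \frac{\nu^x}{x!} \left( 1 - \frac{\nu}{n} \right)^n \frac{n!}{(n - x)!}\, \frac{1}{n^x \left( 1 - \frac{\nu}{n} \right)^x}$$
$$= \frac{\nu^x}{x!} \left( 1 - \frac{\nu}{n} \right)^n \left\{ \frac{n(n - 1)(n - 2) \cdots (n - x + 1)}{\left[ n \left( 1 - \frac{\nu}{n} \right) \right]^x} \right\} \qquad (4.17)$$
As n → ∞,
$$\left\{ \frac{n(n - 1)(n - 2) \cdots (n - x + 1)}{\left[ n \left( 1 - \frac{\nu}{n} \right) \right]^x} \right\} \to 1; \quad \left( 1 - \frac{\nu}{n} \right)^n \to e^{-\nu} \qquad (4.18)$$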
Thus, from Eq. 4.17,
$$f_X(x) = \frac{\nu^x e^{-\nu}}{x!}, \quad x = 0, 1, 2, \ldots \qquad (4.19)$$
This is the Poisson distribution. This distribution has one parameter and is
entirely specified by the average number of occurrences of the specified event
over the interval of time or space in question. It is denoted by X ~ P(ν).
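The limiting argument can be illustrated numerically by holding ν = np fixed and letting n grow. The sketch below uses ν = 2 as an arbitrary illustrative value and reports the largest difference between the binomial probabilities of Eq. 4.16 and the Poisson probabilities of Eq. 4.19 over x = 0 to 10.

```python
from math import comb, exp, factorial

def binomial_pmf(x, n, p):
    """Eq. 4.16."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

def poisson_pmf(x, nu):
    """Eq. 4.19."""
    return nu**x * exp(-nu) / factorial(x)

nu = 2.0   # illustrative average number of occurrences

def max_abs_diff(n):
    """Largest |B(n, nu/n) - P(nu)| over x = 0, ..., 10."""
    return max(abs(binomial_pmf(x, n, nu / n) - poisson_pmf(x, nu))
               for x in range(11))

diffs = {n: max_abs_diff(n) for n in (10, 100, 1000, 10000)}
for n, d in diffs.items():
    print(f"n = {n:6d}: max difference = {d:.2e}")
```

The difference shrinks roughly in proportion to 1/n, which is why the Poisson distribution is a convenient approximation to the binomial distribution when n is large and p is small.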
The average of X can be expressed as
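$$E[X] = \sum_{x=0}^{\infty} x\, \frac{\nu^x e^{-\nu}}{x!} = \nu \sum_{x=1}^{\infty} \frac{\nu^{x - 1} e^{-\nu}}{(x - 1)!} = \nu e^{-\nu} \sum_{x=1}^{\infty} \frac{\nu^{x - 1}}{(x - 1)!} = \nu e^{-\nu} e^{\nu} = \nu \qquad (4.20)$$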
failures (two events) in a fixed number of trials, whereas in the Poisson process
only one event (rather than two) is of concern. The Poisson distribution is most
commonly used in waiting time evaluations and reliability analysis (e.g., the
number of arrivals of vehicles at a highway toll booth in a given hour, the num-
ber of times a water or air or noise quality standard is violated at a given site
during a given monitoring period, the number of windy days in a given period,
or the number of snowfalls in a month).
Example 4.14 Consider the Poisson distribution with parameter ν. Graph the
Poisson distribution for ν = 0.5, 1.0, 2.0, 5.0, and 10.0.
Solution The shape of the distribution for a given value of parameter ν is given
in Fig. 4-6.
[Panels: probability mass versus x for x from 0 to 30, with v = 0.5, 1, 2, 5, and 10.]
Figure 4-6 The Poisson distribution for different values of parameter v.
Table E4-16
X =        0       1       2       3       4       5       6
P(x)       0.0498  0.1494  0.2240  0.2240  0.1680  0.1008  0.0504
P(X ≤ x)   0.0498  0.1991  0.4232  0.6472  0.8153  0.9161  0.9665
P(X > x)   0.9502  0.8009  0.5768  0.3528  0.1847  0.0839  0.0335

X =        7       8       9       10      11      12      13
P(x)       0.0216  0.0081  0.0027  0.0008  0.0002  0.0001  0.0000
P(X ≤ x)   0.9881  0.9962  0.9989  0.9997  0.9999  1.0000  1.0000
P(X > x)   0.0119  0.0038  0.0011  0.0003  0.0001  0.0000  0.0000
Example 4.17 Irrigation canals were lined using two types of lining: concrete and
brick. It was observed after a number of years that 5 out of 70 brick-lined canal
reaches leaked and only 1 out of 50 concrete-lined canal reaches showed cracks
and leakage. Assuming that there should be no difference in the durability of the
linings, what is the probability that a difference equal to or greater than the
observed difference in the number of leaks (= 4) would occur?
Solution On the assumption of no difference in durability, there are 120 lined
canal reaches, including leaking lined canals. The probability of a leak per single
canal can be estimated at 6/120 = 5%. There are then two samples, one of 70 and
one of 50 canal reaches, and the distribution of the number of defects in each can
be determined. Let these numbers be designated as X70 and X50. If they follow a
binomial distribution, then
X70 ~ B(70, 0.05) and X50 ~ B(50, 0.05)
The variances of the two variables are 70(0.05)(0.95) = 3.33 and 50(0.05)(0.95)
= 2.38, respectively.
The Poisson distribution can be expressed with ν = 70 × 0.05 = 3.5 and 50 × 0.05
= 2.5 as X70 ~ P(3.5) and X50 ~ P(2.5).
To determine the probability that D = X70 − X50 is equal to or larger than 4,
the distribution of the difference between two Poisson or binomial variables
is required, which is not known. A simple way out of this difficulty is
as follows. Event D equal to or larger than 4 can result from a rather limited
number of combinations of X70 and X50, namely,
X50 = 0 and X70 equal to or larger than 4
X50 = 1 and X70 equal to or larger than 5
etc. These probabilities can be computed and the total probability can be
determined for the Poisson-distributed variables as listed in Table E4-17a.
Similarly, the probabilities and the total probability for the binomial-distributed
variables are listed in Table E4-17b. The calculated values
given in the tables indicate that the product terms decrease toward zero
quickly, so it is safe to truncate the sum at X50 = 6, which results in
P(D ≥ 4) = 0.1492 if it is assumed that the variables are Poisson distributed and
0.1438 if it is assumed that the variables are binomial distributed. This means
that the evidence of greater durability is by no means conclusive.
Table E4-17a
X50    P(X50)    P(X70 ≥ X50 + 4)    P(X50) × P(X70 ≥ X50 + 4)
0      0.0821    0.4634              0.0380
1      0.2052    0.2746              0.0563
2      0.2565    0.1424              0.0365
3      0.2138    0.0653              0.0140
4      0.1336    0.0267              0.0036
5      0.0668    0.0099              0.0007
6      0.0278    0.0033              0.0001
7      0.0099    0.001               1.01 × 10^–5
8      0.0031    0.0003              8.98 × 10^–7
9      0.0009    0.00008             6.55 × 10^–8
10     0.0002    0.00002             4 × 10^–9
Sum                                  14.92%
Table E4-17b
X50    P(X50)    P(X70 ≥ X50 + 4)    P(X50) × P(X70 ≥ X50 + 4)
0      0.077     0.47                0.036
1      0.202     0.27                0.055
2      0.261     0.14                0.036
3      0.219     0.06                0.013
4      0.136     0.023               0.0032
5      0.066     0.008               0.00053
6      0.026     0.0025              6.42 × 10^–5
7      0.0086    0.00068             5.88 × 10^–6
8      0.0024    0.00017             4.18 × 10^–7
9      0.00059   3.94 × 10^–5        2.35 × 10^–8
10     0.00013   8.27 × 10^–6        1.07 × 10^–9
Sum                                  14.38%
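The column sums in Tables E4-17a and E4-17b can be reproduced by carrying out the same summation in code. The sketch below treats X70 and X50 as independent, as the tabulated products imply, and sums P(X50 = j) × P(X70 ≥ j + 4) over j; the helper name `prob_difference_at_least` and the upper summation limit are illustrative assumptions.

```python
from math import comb, exp, factorial

def poisson_pmf(x, nu):
    return nu**x * exp(-nu) / factorial(x)

def binomial_pmf(x, n, p):
    return comb(n, x) * p**x * (1 - p) ** (n - x)

def prob_difference_at_least(d, pmf_large, pmf_small, max_count):
    """P(X_large - X_small >= d) for independent counts, by direct summation."""
    total = 0.0
    for j in range(max_count + 1):
        # P(X_large >= j + d) as one minus the cumulative probability below j + d.
        tail = 1.0 - sum(pmf_large(i) for i in range(j + d))
        total += pmf_small(j) * tail
    return total

p_poisson = prob_difference_at_least(
    4, lambda x: poisson_pmf(x, 3.5), lambda x: poisson_pmf(x, 2.5), 50)
p_binomial = prob_difference_at_least(
    4, lambda x: binomial_pmf(x, 70, 0.05), lambda x: binomial_pmf(x, 50, 0.05), 50)

print(f"Poisson:  {p_poisson:.4f}")
print(f"Binomial: {p_binomial:.4f}")
```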
$$p[x] = \frac{\lambda^x e^{-\lambda}}{x!}$$
Hence,
$$p[1] = \frac{1.79^1 e^{-1.79}}{1!} = 0.299$$
$$p[2] = \frac{1.79^2 e^{-1.79}}{2!} = 0.267$$
In a similar manner, we find p[3] = 0.160, p[4] = 0.071, p[5] = 0.026, and p[6] =
0.0076. These probabilities are plotted in Fig. 4-7.
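$$1 - F_T(t) = \frac{(\lambda t)^0 e^{-\lambda t}}{0!} = e^{-\lambda t}, \quad t \ge 0 \qquad (4.21)$$
$$F_T(t) = 1 - e^{-\lambda t}, \quad t \ge 0$$
$$f_T(t) = \frac{dF_T(t)}{dt} = \lambda e^{-\lambda t}, \quad t \ge 0 \qquad (4.22)$$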
[Figure 4-7: probability versus number of droughts.]
$$E[T] = \int_0^{\infty} t\, \lambda e^{-\lambda t}\,dt = \frac{1}{\lambda} \qquad (4.23)$$
where 1/λ denotes the average time between arrivals, and λ is the average num-
ber of events per unit time.
The variance of T is given as
$$\sigma_T^2 = \frac{1}{\lambda^2} \qquad (4.24)$$
The coefficient of variation of T is
$$c_v = \frac{\sigma_T}{E[T]} = \frac{1/\lambda}{1/\lambda} = 1 \qquad (4.25)$$
Example 4.20 In Example 4.18, the mean and standard deviation of the drought
duration (beyond a threshold value of 15 days) were 8.06 and 8.575 days, respec-
tively. The drought duration was found to follow an exponential distribution.
The maximum observed value of this duration (counted beyond the threshold
value) during the course of 39 years was 47 days. Compute the probability of
drought duration to be less than 5, 10, 15, 20, 25, 30, 35, 40, and 45 days.
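Solution The mean of the exponential distribution is E[T] = 1/λ = 8.06 days; thus
λ = 0.124 per day. Therefore, the probability that the drought duration is less
than t days is FT(t) = 1 – exp(–0.124t), evaluated at t = 5, 10, …, 45 days.
$$ f_{X_k}(x) = \frac{\lambda (\lambda x)^{k-1} e^{-\lambda x}}{(k-1)!}, \quad x \ge 0,\ \lambda > 0,\ k = 1, 2, \ldots \tag{4.26} $$
$$ f_X(x) = \frac{\lambda (\lambda x)^{k-1} e^{-\lambda x}}{\Gamma(k)}, \quad x \ge 0,\ \lambda > 0,\ k > 0 \tag{4.27} $$
$$ \Gamma(k) = \int_{0}^{\infty} e^{-u} u^{k-1}\, du, \quad k > 0 \tag{4.28} $$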
[Figure: exponential PDF f(t) versus t for Lambda = 1, 2, and 0.5.]
[Plot: probability versus drought duration in days, 5 to 45.]
Figure 4-9 Probabilities of drought durations.
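The probabilities requested in Example 4.20 follow from the exponential CDF of Eq. 4.22. The sketch below assumes only the mean duration of 8.06 days stated in the example; the helper name `exponential_cdf` is illustrative.

```python
import math

mean_duration = 8.06        # days, from Example 4.20
lam = 1.0 / mean_duration   # rate parameter, about 0.124 per day

def exponential_cdf(t: float, lam: float) -> float:
    """F_T(t) = 1 - exp(-lam * t) for t >= 0 (Eq. 4.22)."""
    return 1.0 - math.exp(-lam * t)

durations = [5, 10, 15, 20, 25, 30, 35, 40, 45]
cdf = {t: exponential_cdf(t, lam) for t in durations}

for t in durations:
    print(f"P(duration < {t:2d} days) = {cdf[t]:.4f}")
```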
$$ F_X(x) = \int_{0}^{x} \frac{\lambda^{k} x^{k-1} e^{-\lambda x}}{\Gamma(k)}\, dx \tag{4.29} $$
This function can be evaluated by using the table of incomplete gamma functions.
If k is an integer, the cumulative distribution function can be computed as
$$ F_X(x) = 1 - e^{-\lambda x} \sum_{j=0}^{k-1} \frac{(\lambda x)^{j}}{j!} \tag{4.30} $$
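For integer k the finite sum in Eq. 4.30 is straightforward to code. The following is a minimal sketch; the function name `erlang_cdf` and the test values are illustrative assumptions, not taken from the text.

```python
import math

def erlang_cdf(x: float, k: int, lam: float) -> float:
    """Gamma CDF for integer shape k using the finite sum of Eq. 4.30."""
    if x <= 0.0:
        return 0.0
    partial_sum = sum((lam * x) ** j / math.factorial(j) for j in range(k))
    return 1.0 - math.exp(-lam * x) * partial_sum

# With k = 1 the sum has a single term and the exponential CDF is recovered.
check_exponential = erlang_cdf(3.0, 1, 0.5)
check_k2 = erlang_cdf(2.69, 2, 1.0)
print(check_exponential, check_k2)
```

With k = 1 the result equals 1 – exp(–λx), which is a quick consistency check against Eq. 4.22.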
Also, Γ(k+1) = kΓ(k), k > 0; Γ(k) = Γ(k+1)/k, k < 1; and Γ(2) = Γ(1) = 1,
Γ(1/2) = √π. The incomplete gamma function is defined as
$$ \Gamma(k, x) = \int_{0}^{x} e^{-u} u^{k-1}\, du \tag{4.31} $$
Abramowitz and Stegun (1965) have given the following numerical approxi-
mation to evaluate the gamma function:
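$$ E[X] = \frac{k}{\lambda} \tag{4.33} $$
$$ \sigma_X^{2} = \frac{k}{\lambda^{2}} \tag{4.34} $$
The coefficient of skewness is defined as
$$ \gamma = \frac{E\left[(x - m_x)^{3}\right]}{\sigma_x^{3}} = \frac{2}{\sqrt{k}} \tag{4.35} $$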
Example 4.21 Graph the gamma distribution with parameters k and λ for
λ = 2.2, k = 1, 2, 3, 4, 5, and 10. Take the X axis as λ x.
Solution The shape of the gamma distribution for different values of parameter
k is given in Fig. 4-10.
Example 4.22 Markovic (1965) used the gamma distribution to model the maxi-
mum annual river flows in the Weldon River at Mill Grove, Missouri, based on
the data for 1930 to 1960. He found k = 1.727 and λ = 0.00672 (cfs)–1. Determine
the probability that the maximum flow is less than 400 cfs in any year.
Solution The mean is
$$ m_X = \frac{k}{\lambda} = \frac{1.727}{0.00672} = 257.0 \text{ cfs} $$
[Plot: probability density versus X for k = 1, 2, 3, 5, and 10.]
Figure 4-10 Shape of the gamma distribution for various values of parameter k.
$$ F_X(400) = \frac{\int_{0}^{2.69} y^{(1.727-1)} e^{-y}\, dy}{\Gamma(1.727)} = \Gamma(2.69, 1.727) = 0.809 $$
$$ \Gamma(x, a) = \frac{1}{\Gamma(a)} \int_{0}^{x} t^{a-1} e^{-t}\, dt $$
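The normalized incomplete gamma integral above can be evaluated without tables. The sketch below uses a standard power-series expansion of the integral, which is a choice made here for illustration and not the approximation cited in the text; the function name is hypothetical.

```python
import math

def regularized_lower_gamma(a: float, x: float, tol: float = 1e-12) -> float:
    """(1/Gamma(a)) * integral_0^x t^(a-1) e^(-t) dt, via its power series."""
    if x <= 0.0:
        return 0.0
    term = 1.0 / a      # series term for n = 0
    total = term
    n = 0
    while abs(term) > tol * abs(total):
        n += 1
        term *= x / (a + n)   # next term: x^n / (a (a+1) ... (a+n))
        total += term
    # Prefactor x^a e^(-x) / Gamma(a), computed in logs for stability.
    log_prefactor = a * math.log(x) - x - math.lgamma(a)
    return total * math.exp(log_prefactor)

k, lam = 1.727, 0.00672                      # parameters from Example 4.22
p_below_400 = regularized_lower_gamma(k, lam * 400.0)
print(f"P(flow < 400 cfs) = {p_below_400:.3f}")
```

The series converges quickly for the moderate arguments met in this example; for very large λx a continued-fraction form would usually be preferred.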
Example 4.23 The mean and the standard deviation of the annual peak flow
data for the Amite River near Darlington, Louisiana, are 28,675.833 cfs and
21,117.138 cfs. Assuming that the peak flow data follow a two-parameter gamma
distribution, determine the parameters of the distribution. What is the probabil-
ity that the maximum flow is less than 100,000 cfs, less than 80,000 cfs, and less
than 50,000 cfs in any year? What is the return period of each of these flows?
What is the probability that peak flow will occur in any year between one stan-
dard deviation on either side of the mean, and between two standard deviations
on either side of the mean?
$$ \lambda \approx 6.43 \times 10^{-5} $$
The probability that the maximum flow is less than 100,000 cfs in any year is
(with λ × 100,000 = 6.43)
$$ m_N = \frac{1}{1 - F_X(x)} $$
For a maximum flow of 100,000 cfs, the return period is 1/(1 – 0.9907) = 107.78
years, for a maximum flow of 80,000 cfs, the return period is 1/(1 – 0.971) = 34.48
years, and for a maximum flow of 50,000 cfs, the return period is 1/(1 – 0.863) =
7.30 years.
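The calculations of Example 4.23 can be sketched as follows. The code assumes the parameters are obtained by matching the sample mean and variance to Eqs. 4.33 and 4.34 (method of moments), and it evaluates the gamma CDF with a power series chosen here for illustration; all function names are hypothetical.

```python
import math

def gamma_cdf(x: float, k: float, lam: float, tol: float = 1e-12) -> float:
    """Two-parameter gamma CDF evaluated by a power series in lam * x."""
    z = lam * x
    if z <= 0.0:
        return 0.0
    term = 1.0 / k
    total = term
    n = 0
    while abs(term) > tol * abs(total):
        n += 1
        term *= z / (k + n)
        total += term
    return total * math.exp(k * math.log(z) - z - math.lgamma(k))

mean, sd = 28675.833, 21117.138   # cfs, from Example 4.23

# Method of moments: mean = k / lam and variance = k / lam^2.
lam = mean / sd ** 2
k = mean * lam

results = {}
for flow in (100_000, 80_000, 50_000):
    prob = gamma_cdf(flow, k, lam)
    results[flow] = (prob, 1.0 / (1.0 - prob))   # (probability, return period)
    print(f"{flow:>7d} cfs: F = {prob:.4f}, return period = {results[flow][1]:.1f} years")
```

Return periods computed this way are sensitive to rounding in the probability, so small differences from hand calculations in the last digits are to be expected.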
The probability that peak flow will occur within one standard deviation of
the mean is
Fx(mx – σ < x < mx + σ ) = Fx(mx + σ ) – Fx(mx – σ )
The probability that peak flow will occur within two standard deviations of
the mean is
Fx(mx – 2σ < x < mx + 2σ ) = Fx(mx + 2σ ) – Fx(mx – 2σ )
4.4 Questions
4.1 Using daily rainfall data for the month of January in Houston, Texas,
compute the probability of 2, 5, and 10 rainy days in January in Houston.
Use the binomial distribution. What will be the probability of having
two consecutive rainy days, three consecutive rainy days, four consecu-
tive rainy days, and five consecutive rainy days? What will be the proba-
bility that a 50-year rainfall (in terms of amount) will occur at least once
in 20 years? A 100-year rainfall in 20 years?
4.2 Compute the probability of the first occurrence of rain in January in
Houston. Use the geometric distribution.
4.3 Compute the probability that the third rainy day will occur on the 10th
day or on the 15th day in August in Houston. You can use the negative
binomial distribution.
Table 4-1 Summary of probability distributions.
(Entries for each distribution: mathematical form; range of random variable; parameters; applications.)

Bernoulli distribution
  Mathematical form: $f_X(x) = p^{x} (1-p)^{1-x}$
  Range of random variable: x = 0, 1; 0 ≤ p ≤ 1
  Parameters: E(X) = p; $\sigma_X^{2} = pq$
  Applications: To model the behavior of a random variable that can take on any one of two values: success or failure, rain or dry, etc.

Binomial distribution, B(n, p)
  Mathematical form: $f_Y(y) = \frac{n!}{y!\,(n-y)!}\, p^{y} (1-p)^{n-y}$
  Range of random variable: y = 0, 1, 2, 3, …, n; 0 ≤ p ≤ 1
  Parameters: E(Y) = np; var[Y] = np(1 – p)
  Applications: To model events whose outcomes are mutually independent and for which the probability of success or failure is fixed.

Geometric distribution, G(p)
  Mathematical form: $P[N = n] = (1-p)^{n-1} p$
  Range of random variable: n = 1, 2, 3, …
  Parameters: E[N] = 1/p; $\mathrm{var}[N] = \frac{1-p}{p^{2}}$
  Applications: Used when the question is to find the number of trials before the first success occurs.

Negative binomial distribution, NB(k, p)
  Mathematical form: $f_{W_k}(w) = \binom{w-1}{k-1} (1-p)^{w-k} p^{k}$
  Range of random variable: w = k, k+1, …; k = 1, 2, 3, …; 0 ≤ p ≤ 1
  Parameters: $E(w) = \frac{k}{p}$; $\mathrm{var}(w) = \frac{k(1-p)}{p^{2}}$
  Applications: Used to model the number of trials, w, to obtain k successes or k successes in w trials.

Gamma distribution, G(k, λ)
  Mathematical form: $f_{X_k}(x) = \frac{\lambda (\lambda x)^{k-1} e^{-\lambda x}}{(k-1)!}$
  Range of random variable: x ≥ 0; λ > 0; k > 0
  Parameters: $E[X] = \frac{k}{\lambda}$; $\sigma_X^{2} = \frac{k}{\lambda^{2}}$
  Applications: Used to model the time to the kth event wherein times between arrivals of events are independent and exponentially distributed. It is also used to model the instantaneous unit hydrograph. Many distributions from the gamma family are employed in flood frequency analysis.
4.4 Count the number of wet periods in Houston for the data you have.
Define a wet period by the sum of consecutive months each having rain-
fall equal to or greater than 5 inches. The number of wet periods can be
described by the Poisson distribution. Compute the probability that
Houston will have 2, 4, 5, 6, and 10 wet periods.
4.5 Count the number of dry periods in Houston each year for the data you
have. Define a dry period as the sum of months each having rainfall less
than or equal to 2 inches. The number of dry periods can be described by
the Poisson distribution. Compute the probability that Houston will
have 2, 4, 5, 6, and 10 dry periods in a year.
4.6 Compute the time interval between wet periods as defined in Question
4.4. This should follow an exponential distribution. Compute the probabil-
ity that the time interval would be less than 2, 3, 4, and 5 months.
4.7 Compute the time interval between dry periods as defined in Question
4.5. This should follow an exponential distribution. Compute the probabil-
ity that the time interval would be less than 2, 3, 4, and 5 months.
4.8 Compute the maximum monthly rainfall for each year. Determine the
probability that the maximum monthly rainfall is less than 10 inches.
You can use the gamma distribution here.
4.9 The daily concentration of a pollutant in a stream follows an exponential
distribution and is independent from day to day.
(a) If the mean daily concentration of the pollutant is 2 mg/L, estimate
the parameter λ of the exponential distribution given by f ( x) = λe − λx .
(b) Pollution is a problem if the concentration exceeds 6 mg/L. What is
the probability of a pollution problem on any particular day?
(c) What is the return period in days of the pollution problem?
(d) What is the probability of a pollution problem in at most 1 day in any
3 consecutive days?
(e) If instead of being exponentially distributed the pollution level is
described by a gamma distribution with the same mean and vari-
ance as the exponential distribution, then what is the probability of a
pollution problem on any particular day?
4.10 The time between rainstorms is thought to be exponentially distributed
with a mean of 5 days. What would you expect the distribution of the
time for the occurrence of 10 rainstorms to be? What would you expect
for the values of the parameters of this distribution?
(a) What is the probability that city Z will have exactly one main
break in a given year?
(b) What is the probability that city Z will have at least one main
break in a given year?
(c) What is the probability that city Z will have no breaks in 10 years?
(d) What is the probability that city Z will have between 3 and 7 main
breaks in 10 years?
4.20 A drinking water company produces 10,000 bottles per day. Each bottle
has a 0.001 chance of being affected by some contaminant. Assume that
the chance of a bottle being affected by contamination is independent of
the daily supply orders.
(a) What is the most appropriate distribution for the number of bottles
affected by contamination?
(b) If your answer is the Poisson distribution, provide your justification
for selecting the Poisson distribution as an acceptable approximation.
(c) What is the probability that 21 bottles turn out to be affected by
contamination?
4.21 What design return period should be used to ensure a 90% chance that
the design will not be exceeded in a 50-year period? What design return
period should be used to ensure an 85% chance of no more than 1
exceedance in 25 years?
4.22 Two widely separated watersheds are selected for a study on peak dis-
charges. If the occurrence of flood flows on the two basins can be consid-
ered as independent events, what is the probability of experiencing a
total of ten 25-year events on the two watersheds in a 50-year period?
4.23 A scientist has predicted that during a certain 5-year period a severe
drought will occur in the high plateau of Mexico. She made this predic-
tion based on her observation of sunspot activity. If the probability of a
drought is 0.18 in any year, what is the probability that the scientist’s
prediction will come true if the occurrence of a drought was a strictly
random phenomenon unrelated to sunspot activity?
4.24 On average how many times will a 5-year flood occur in a 50-year
period? What is the probability that exactly this number of 5-year floods
will occur in a 50-year period?
4.25 What is the probability that exactly 4 years will elapse between occur-
rences of a 5-year event?
4.26 A binomial random variable has a mean of 20 and a variance of 16. Find
the values of n and p that characterize the distribution of this random
variable.
$$ f_X(x) = \frac{1}{\sigma_X \sqrt{2\pi}} \exp\left[-\frac{1}{2}\left(\frac{x - \mu_X}{\sigma_X}\right)^{2}\right], \quad -\infty \le x \le \infty \tag{5.1} $$
where fX(x) is the PDF of X, μX is the mean value of X, and σX is the standard
deviation of X. The normal distribution is denoted as N(μ, σ²), which means
that X is normally distributed with a mean of μ and a variance of σ²; μ and σ
are also known as the location and scale parameters of this two-parameter
continuous distribution. It can be shown that f(x) is symmetrical about the mean and that it
decreases on either side of the mean without ever reaching zero. The distribution
has a characteristic bell shape. The range of a normally distributed variable is
from − ∞ to + ∞ but most engineering variables vary from 0 to some high pos-
itive value. Strictly speaking, such variables cannot be said to follow a normal
distribution. However, if μX of a random variable is more than 3 times σ X, the
probability of the variable acquiring negative values is very small and hence the
normal distribution can be applied without incurring unacceptable error.
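The size of the error incurred by ignoring negative values can be checked numerically. The sketch below evaluates P(X < 0) for several ratios of mean to standard deviation, using the error function from the Python standard library; the function names are illustrative.

```python
import math

def std_normal_cdf(u: float) -> float:
    """CDF of the standard normal variate, written with the error function."""
    return 0.5 * (1.0 + math.erf(u / math.sqrt(2.0)))

def prob_negative(mu: float, sigma: float) -> float:
    """P(X < 0) when X is normal with mean mu and standard deviation sigma."""
    return std_normal_cdf((0.0 - mu) / sigma)

# Ratio of mean to standard deviation; sigma is fixed at 1 for convenience.
p_neg = {ratio: prob_negative(float(ratio), 1.0) for ratio in (1, 2, 3, 4)}

for ratio, p in p_neg.items():
    print(f"mean = {ratio} sigma: P(X < 0) = {p:.5f}")
```

When the mean equals three standard deviations the probability of a negative value is about 0.00135, which supports the rule of thumb stated above.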
The effect of different values of the parameters on the shape of the distribu-
tion is shown in Fig. 5-1. If only μX of a random variable changes but σ X remains
the same, the distribution just gets shifted, but if μ X remains the same and σ X
changes, the spread of the distribution changes.
Example 5.1 Consider the normal distribution with parameters μ and σ . Graph
the normal distribution for the following sets of parameters: (1) μ = 0, σ =1;
(2) μ = 1, σ = 1; (3) μ = 1, σ = 0.5; (4) μ = 1, σ = 1.5; and (5) μ = 1 and σ = 2.5.
Solution The shape of the distribution for the desired combinations of parame-
ters can be seen in Fig. 5-2.
Limit and Other Distributions 189
[Figure: four panels (a) to (d) of normal PDFs fX(x) plotted for x from -2 to 2; legible panel labels include "Normal, μ = 1, σ = 1" and "Normal, μ = 1, σ = 4/3".]
[Plot: probability density versus x (-10 to 10) for μ = 0, σ = 1; μ = 1, σ = 1; μ = 1, σ = 0.5; and μ = 1, σ = 1.5.]
Figure 5-2 Shape of normal distribution for various values of parameters.
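$$ F_X(x) = \int_{-\infty}^{x} \frac{1}{\sigma_X \sqrt{2\pi}} \exp\left[-\frac{1}{2}\left(\frac{x - \mu_X}{\sigma_X}\right)^{2}\right] dx, \quad -\infty \le x \le \infty \tag{5.2} $$
$$ f_U(u) = \frac{1}{\sqrt{2\pi}}\, e^{-u^{2}/2}, \quad -\infty \le u \le \infty \tag{5.3a} $$
$$ f_X(x) = \frac{1}{\sigma_X}\, f_U\left(\frac{x - \mu_X}{\sigma_X}\right), \quad -\infty \le x \le \infty \tag{5.3b} $$
$$ E[U] = \frac{1}{\sigma_X}\left[E(X) - \mu_X\right] = \frac{1}{\sigma_X}\left[\mu_X - \mu_X\right] = 0 \tag{5.4} $$
and
$$ \mathrm{var}[U] = \frac{1}{\sigma_X^{2}}\left[\mathrm{var}(X)\right] = \frac{1}{\sigma_X^{2}}\, \sigma_X^{2} = 1 \tag{5.5} $$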
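Thus, the parameters of U are now fixed and the PDF and CDF are functions
of U only. For x = 1, μX = 1, and σX = 4/3,
$$ f_X(1) = \frac{1}{\sigma_X}\, f_U\left(\frac{1 - 1}{4/3}\right) = \frac{3}{4}\, f_U(0) = \frac{3}{4}\left(\frac{1}{\sqrt{2 \times 3.1415}}\right) = 0.2992 $$
$$ F_X(x) = P[X \le x] = P\left[U \le \frac{x - \mu_X}{\sigma_X}\right] = F_U\left(\frac{x - \mu_X}{\sigma_X}\right) = F_U(u) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{u} \exp\left[-\frac{1}{2}u^{2}\right] du, \quad -\infty \le u \le \infty \tag{5.6} $$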
Example 5.2 Determine the value of F(u) for u = 2.5 from Eq. 5.6 and Eq. 5.8 and
compare the results with tabulated values.
Solution Using Eq. 5.6, we get F(2.5) = 0.994029 whereas Eq. 5.8 gives F(2.5) =
0.99379. The value obtained from standard tables is also 0.99379. While writing
programs to compute F(u) using either Eq. 5.6 or Eq. 5.8, it is advisable to use dou-
ble precision.
The range of a normally distributed random variable is (–∞ to +∞ ) and
hence many hydrologic variables, such as rainfall, discharge, or storage in a res-
ervoir, cannot be strictly normal. But for the random variable whose mean is
quite high, the probability of acquiring a negative value is negligible and the
normal distribution can still be applied to such variables.
Example 5.3 A random variable X has a mean of 3,000 and a standard deviation
of 400. Compute the probability that this variable will have a value less than
4,000.
Solution We evaluate
$$ F_X(4000) = \frac{1}{400\sqrt{2\pi}} \int_{-\infty}^{4000} \exp\left[-\frac{1}{2}\left(\frac{x - 3000}{400}\right)^{2}\right] dx = F_U\left(\frac{4000 - 3000}{400}\right) = F_U(2.5) $$
$$ = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{2.5} \exp\left[-\frac{1}{2}u^{2}\right] du = 0.99379 $$
Example 5.4 Compute the probability that the random variable X in the previ-
ous example will be less than 3,400.
Solution The probability is given by
$$ P[X \le 3400] = F_X(3400) = F_U\left(\frac{3400 - 3000}{400}\right) = F_U(1) = 0.8413 $$
$$ P[\mu - r\sigma \le X \le \mu + r\sigma] = P\left[-r \le \frac{X - \mu}{\sigma} \le r\right] = F_U(r) - F_U(-r) = 1 - F_U(-r) - F_U(-r) = 1 - 2F_U(-r) $$
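The standardization used in Examples 5.3 and 5.4, together with the probability of falling within r standard deviations of the mean, can be sketched in Python. The error-function form of the standard normal CDF is used here in place of tables; the function names are illustrative.

```python
import math

def std_normal_cdf(u: float) -> float:
    """F_U(u) for the standard normal variate."""
    return 0.5 * (1.0 + math.erf(u / math.sqrt(2.0)))

def normal_cdf(x: float, mu: float, sigma: float) -> float:
    """F_X(x) obtained by standardizing: u = (x - mu) / sigma."""
    return std_normal_cdf((x - mu) / sigma)

def prob_within(r: float) -> float:
    """P[mu - r*sigma <= X <= mu + r*sigma] = 1 - 2 F_U(-r)."""
    return 1.0 - 2.0 * std_normal_cdf(-r)

p_4000 = normal_cdf(4000.0, 3000.0, 400.0)   # Example 5.3
p_3400 = normal_cdf(3400.0, 3000.0, 400.0)   # Example 5.4
print(f"{p_4000:.5f} {p_3400:.4f} {prob_within(1):.4f} {prob_within(2):.4f}")
```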
One can easily compute higher moments of N(μ, σ). Since N(μ, σ) is symmetric,
this implies that all odd-ordered central moments (and the skewness coefficient)
are zero. The even-ordered moments are functions of the mean and
standard deviations. Thus, one obtains
$$ m_n = E\left[(X - \mu_X)^{n}\right] = \frac{n!\, \sigma^{n}}{2^{n/2} \left(\frac{n}{2}\right)!}, \quad n = 2, 4, \ldots \tag{5.9a} $$
Note that
$$ m_4 = \frac{4!\, \sigma^{4}}{2^{2}\, 2!} = 3\sigma^{4} \tag{5.9b} $$
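The general moment formula can be checked for a few orders with a short function. This is a minimal sketch; the name `normal_central_moment` is illustrative.

```python
import math

def normal_central_moment(n: int, sigma: float) -> float:
    """n-th central moment of a normal variable with standard deviation sigma."""
    if n % 2 == 1:
        return 0.0            # odd central moments vanish by symmetry
    half = n // 2
    return math.factorial(n) * sigma ** n / (2 ** half * math.factorial(half))

m2 = normal_central_moment(2, 1.0)
m4 = normal_central_moment(4, 1.0)
m6 = normal_central_moment(6, 1.0)
kurtosis = m4 / m2 ** 2
print(m2, m4, m6, kurtosis)
```

For σ = 1 the second, fourth, and sixth moments come out as 1, 3, and 15, and the ratio of the fourth moment to the squared variance is 3.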
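$$ \gamma = \frac{m_4}{(m_2)^{2}} = \frac{3\sigma^{4}}{(\sigma^{2})^{2}} = 3 \tag{5.10} $$
$$ Z = X + Y, \quad X \sim N(\mu_X, \sigma_X^{2}), \quad Y \sim N(\mu_Y, \sigma_Y^{2}) \tag{5.11} $$
Then,
$$ \mu_Z = \mu_X + \mu_Y \tag{5.12a} $$
$$ \sigma_Z = \sqrt{\sigma_X^{2} + \sigma_Y^{2}} \tag{5.12b} $$
Thus,
$$ Z \sim N(\mu_X + \mu_Y,\ \sigma_X^{2} + \sigma_Y^{2}) \tag{5.13} $$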
Example 5.5 Assuming that the data of Example 4.18 follow a normal distribu-
tion, compute the probability that the peak flow will be less than 100,000 cfs, less
than 80,000 cfs, and less than 50,000 cfs in any year. What is the return period of
each of these flows? Compare these probabilities and return periods with those
computed in Example 4.18. Which probabilities and return periods are more
realistic? What is the probability that the peak flow will occur in any year
between one standard deviation on either side of the mean and between two
standard deviations on either side of the mean?
Solution If the sample data follow a normal distribution with mean mX and
standard deviation sX, then, with u = (x – mX)/sX,
$$ F_X(x) = P[X \le x] = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{u} \exp\left[-\frac{1}{2}u^{2}\right] du $$
$$ R_N = \frac{1}{1 - F_X(x)} $$
The results are given in Table E5-5.
The probability that the peak flow will be within ±1 standard deviation of the
mean is
P(μ – σ < X < μ + σ) = P(–1 < U < 1) = 2 × 0.3413 = 0.6826
and the probability that it will be within ±2 standard deviations of the mean is
P(μ – 2σ < X < μ + 2σ) = P(–2 < U < 2) = 2 × 0.4772 = 0.9544
The CDF of the actual data and the normal distribution are plotted in Fig. 5-3.
To compare the distributions, we can define an index C by taking the sum of
squares of the differences between calculated and true values of P and then
dividing by (n – 1) as
$$ C = \frac{\sum_{i=1}^{n} \left(P_{\text{calculated}} - P_{\text{true}}\right)^{2}}{n - 1} \tag{5.14} $$
The values for gamma and normal distributions are Cgamma = 0.001369 and
Cnormal = 0.005775. Hence, the estimates of the gamma distribution appear to be
more realistic.
Table E5-5
Peak flow (cfs) u Probability Return period (years)
100,000 3.377 0.9996 2500
80,000 2.43 0.9920 125
50,000 1.009 0.8621 7.25
or
X/mX = 1 + Cv K (5.18)
Chow (1951) proposed Eq. 5.18 as the general equation for frequency analy-
sis and coined the term frequency factor for K. In addition to the statistical charac-
teristics, the frequency factor also depends upon the recurrence interval. For a
particular distribution, the relationship between K and recurrence interval (T)
can be presented through tables or graphs.
For a normal distribution
$$ K = (X - m_X)/s \tag{5.19} $$
$$ P(X \ge x) = 1.0 - \frac{1}{\sigma_X \sqrt{2\pi}} \int_{-\infty}^{Ks + m_X} \exp(-K^{2}/2)\, dK \tag{5.20} $$
[Plot: probability mass versus x (0 to 8) for X ~ B(10, 0.1).]
Figure 5-4 Binomial distribution.
$$ P(5 < X \le 7) = \sum_{x=6}^{7} \binom{20}{x} p^{x} q^{n-x} = \binom{20}{6} 0.4^{6} \times 0.6^{20-6} + \binom{20}{7} 0.4^{7} \times 0.6^{20-7} = 0.124 + 0.166 = 0.29 $$
As n becomes large, the standard normal variate is computed as
$$ Z = \frac{X - \mu}{\sigma} = \frac{X - np}{\sqrt{np(1 - p)}} $$
When n = 30,
$$ P(5 < X \le 7) = \binom{30}{6} 0.4^{6} \times 0.6^{30-6} + \binom{30}{7} 0.4^{7} \times 0.6^{30-7} = 0.0115 + 0.0263 = 0.0378 $$
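The exact binomial sums for n = 20 and n = 30 with p = 0.4 can be compared with the normal approximation in a few lines. The sketch below applies a half-unit continuity correction to the approximation, which is an assumption made here; the function names are illustrative.

```python
import math

def binom_pmf(x: int, n: int, p: float) -> float:
    """Binomial probability of exactly x successes in n trials."""
    return math.comb(n, x) * p ** x * (1.0 - p) ** (n - x)

def std_normal_cdf(u: float) -> float:
    return 0.5 * (1.0 + math.erf(u / math.sqrt(2.0)))

def exact_prob(n: int, p: float, lo: int, hi: int) -> float:
    """Exact P(lo < X <= hi) for X ~ B(n, p)."""
    return sum(binom_pmf(x, n, p) for x in range(lo + 1, hi + 1))

def normal_approx(n: int, p: float, lo: int, hi: int) -> float:
    """Normal approximation of P(lo < X <= hi) with continuity correction."""
    mu = n * p
    sigma = math.sqrt(n * p * (1.0 - p))
    upper = std_normal_cdf((hi + 0.5 - mu) / sigma)
    lower = std_normal_cdf((lo + 0.5 - mu) / sigma)
    return upper - lower

for n in (20, 30):
    print(n, round(exact_prob(n, 0.4, 5, 7), 4), round(normal_approx(n, 0.4, 5, 7), 4))
```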
Approximating with the normal distribution, one gets
provided that X1 and X2 are independent. For the binomial distribution one can say
if X1 ~ B(n1, p) and X2 ~ B(n2, p) then (X1+X2) ~ B(n1+ n2, p)
provided that X1 and X2 are independent and that p is the same throughout. If
both variables are distributed as binomial or Poisson, then the difference
between them does not necessarily follow the same distribution. However, in
case of a difference between random variables that follow either a binomial or a
Poisson distribution, a normal approximation is often acceptable even if the
variables themselves deviate quite a bit from normality.
Example 5.7 Referring to Example 4.18, one sees that the difference D may be
approximately normally distributed even if X70 and X50 are not. Thus, the
probability of D ≥ 4 can be recomputed using the normal approximation.
Solution The means and the variances of X70 and X50 have already been
calculated:
E(X70) = 3.5, E(X50) = 2.5
therefore,
E(D) = 3.5 – 2.5 = 1.00
var(X70) = 3.33, var(X50) = 2.38
so
var(D) = 3.33 + 2.38 = 5.71
and the standard deviation of D is √5.71 = 2.39.
It is now assumed that D is normally distributed with a mean of 1.0 and a
standard deviation of 2.39. To compute the probability of D ≥ 4, the correction
for continuity is applied, which makes D ≥ 4 equivalent to D > 3.5. The
latter value corresponds to U > 1.05. This probability is 14.69%, quite close to the
accurate value obtained in Example 4.18.
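The normal approximation in Example 5.7 can be verified numerically. The sketch below uses the means and variances quoted in the example and the same continuity correction; variable names are illustrative.

```python
import math

def std_normal_cdf(u: float) -> float:
    return 0.5 * (1.0 + math.erf(u / math.sqrt(2.0)))

mean_d = 3.5 - 2.5             # E(X70) - E(X50)
var_d = 3.33 + 2.38            # variances add for independent variables
sd_d = math.sqrt(var_d)

# Continuity correction: D >= 4 for an integer-valued D is treated as D > 3.5.
u = (3.5 - mean_d) / sd_d
p_d_ge_4 = 1.0 - std_normal_cdf(u)
print(f"sd = {sd_d:.2f}, u = {u:.3f}, P(D >= 4) = {p_d_ge_4:.4f}")
```

Carrying u without rounding gives a probability slightly above the value read from tables at u = 1.05; the difference is in the third decimal place.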
$$ \lim_{n \to \infty} P[S_n \le y] = \lim_{n \to \infty} P\left[\frac{S_n - n\mu}{\sigma\sqrt{n}} \le \frac{y - n\mu}{\sigma\sqrt{n}}\right] = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{z} e^{-z^{2}/2}\, dz \tag{5.22a} $$
$$ z = \frac{y - n\mu}{\sigma\sqrt{n}} $$
Further, if there exists a constant A, such that |Xn| ≤ A for all n, then for a < b,
$$ \lim_{n \to \infty} P\left[a \le \frac{S_n - n\mu}{\sigma\sqrt{n}} \le b\right] = \frac{1}{\sqrt{2\pi}} \int_{a}^{b} e^{-z^{2}/2}\, dz \tag{5.22b} $$
Similarly, one can obtain that the average $\bar{X} = (X_1 + X_2 + \cdots + X_n)/n$ tends
toward a normal distribution function with mean μ and standard deviation
$\sigma/\sqrt{n}$.
The central limit theorem helps approximate the sampling distribution of the
sum by an appropriate normal curve regardless of the form of the parent PDF
from which individual observations were derived. To explain it further, let us
consider measurements of discharge of a river being made at a gauging station.
The technician takes note of the computed discharge after rounding it off to the
nearest integer. These data are subsequently used in hydrologic analysis and
design. For example, monthly flow at the station is obtained by adding daily
values. What is the rounding error in the sum of n measurements? The rounding
error in a single measurement is a random variable, called X. It is
assumed that this variable is uniformly distributed. Suppose now that n
rounding errors are summed. The sum is a random variable, denoted as Sn. Quite
likely, the values of Sn will be clustered around zero since positive and negative
errors in the individual measurements tend to cancel each other out to some
extent. Values near the extremes will have a very low probability density
because outcomes near the extremes would occur only if all or most individual
measurements have large rounding errors of the same sign. To demonstrate this trend,
PDFs of S1, S2, S3, and S4 are drawn as shown in Fig. 5-5. It is evident that the
PDF of Sn rapidly approaches a characteristic bell shape.
[Figure: curves labeled f(x1), f(x2), f(x3), and f(x4); vertical axis f(X).]
It can be shown that when the number n becomes large, the PDF of Sn will
approach
$$ f_{S_n}(s) = k\, e^{-c s^{2}} \tag{5.23} $$
where k and c are constants and must be determined for each value of n such
that the area under the curve is equal to one. (The correct mean of zero value is
already ensured by the symmetry of the function about the f(x) axis.)
The observation that the sum of a number of uniformly distributed random
variables approaches the normal distribution is a special case of a more general
law. The sum of triangularly distributed variables would also approach a nor-
mally distributed variable. The general law is called the central limit theorem.
Accordingly, under very general conditions the distribution of the sum of n ran-
dom variables approaches a normal distribution when n is large, regardless of
the shape of the distribution of the contributing variables. The theorem applies
even if the number of variables is only moderately large, as long as they are not
highly dependent and as long as each contribution to the sum is relatively small;
that is, there must not be one or two dominating contributing variables. The
approximation improves with an increasing number of variables and is better
near the center than near the tails of the distribution. If contributing variables
were symmetrical then one needs fewer variables to obtain a good approxima-
tion than if they were asymmetrical. If the random variables already have close
to normal distributions, then the approximation will be very rapid. In a similar
vein, the sum of two or more normally distributed variables is also normally
distributed.
In natural phenomena, near-normal distributions are frequently observed.
This may be partly because variations in observed data can
often be regarded as the sum of variations in additive contributing factors. For
example, the total annual flows of a river result from the runoff caused by many
rainstorms over its drainage basin. One would expect that the annual flow may
be approximately normally distributed. This is indeed the case. Even when there
is little reason to regard the random variables as the sum of many contributing
random variables, the distribution may still be approximately normal. However,
the agreement between the normal distribution and the probabilities encoun-
tered in empirical observations cannot be expected to be perfect but the discrep-
ancies in probability are small.
The central limit theorem holds for most physically meaningful random vari-
ables: (1) independent and identically distributed variables, (2) independent but
not identically distributed variables, and (3) not independent but weakly depen-
dent variables. It is applicable without the knowledge of (1) the marginal distri-
butions of the contributing random variables, (2) their number, or (3) their joint
distribution. Examples 5.6 and 5.7 explain how the central limit theorem works.
Example 5.8 Choose n random numbers from a uniform PDF in the interval [0, 1]
and obtain the distribution of the sum Sn. Plot and compare the resulting PDFs.
Solution The PDF of the uniform distribution is
$$ f_X(x) = 1 \quad \text{for } 0 \le x \le 1 $$
with
$$ \mu = E[X] = \int_{0}^{1} x f_X(x)\, dx = 1/2 $$
$$ \sigma^{2} = \mathrm{var}[X] = \int_{0}^{1} (x - \mu)^{2} f_X(x)\, dx = 1/12 $$
Based on the central limit theorem, the sampling distribution of the sum
Sn = X1 + X2 + … + Xn tends to a normal distribution with mean nμ = n/2 and
standard deviation $\sigma\sqrt{n} = \sqrt{n/12}$. Table E5-8 compares the statistics (mean and
standard deviation) of the actual Sn obtained by adding the n uniformly distributed
numbers with the statistics obtained by applying the central limit theorem.
It can be noticed that the statistics of Sn match very closely.
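The experiment in Example 5.8 can be repeated with a short simulation. The sketch below is one possible setup: the number of trials (20,000) and the random seed are arbitrary assumptions, so the sample statistics will differ slightly from those tabulated in the text.

```python
import math
import random
import statistics

def simulate_sums(n: int, trials: int, rng: random.Random) -> list:
    """Draw `trials` realizations of S_n, the sum of n uniform(0, 1) numbers."""
    return [sum(rng.random() for _ in range(n)) for _ in range(trials)]

rng = random.Random(12345)   # fixed seed so the run is repeatable
results = {}
for n in (2, 4, 8, 16, 32):
    sums = simulate_sums(n, 20000, rng)
    results[n] = {
        "sample_mean": statistics.mean(sums),
        "sample_sd": statistics.stdev(sums),
        "clt_mean": n / 2.0,               # n * mu
        "clt_sd": math.sqrt(n / 12.0),     # sigma * sqrt(n)
    }

for n, r in results.items():
    print(n, round(r["sample_mean"], 2), round(r["sample_sd"], 2),
          r["clt_mean"], round(r["clt_sd"], 2))
```

A histogram of each list of sums, or of the standardized sums, would show the bell shape emerging as n increases.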
It is noted from Fig. 5-6a that the probability density function of Sn tends to
have a bell shape but is centered at n/2. To compare the shapes of these probability
density functions for various values of n, we standardize Sn. The standardized
parameter is defined as $S_n^{*} = (S_n - n\mu)/(\sigma\sqrt{n})$. Figure 5-6b depicts the
distribution of Sn* for various values of n.
Example 5.9 Choose n random numbers from an exponential PDF in the inter-
val [0, + ∞ ] with the parameter λ = 1 and obtain the distribution of the sum Sn.
Plot and compare the resulting PDFs.
Table E5-8 (Top) Statistics of Sn based on the chosen numbers from the uniform
distribution. (Bottom) Statistics of Sn based on the central limit theorem.
n                      2      4      8      16     32
Mean                   1.0    2.0    4.0    8.0    16.0
St. dev.               0.41   0.58   0.81   1.16   1.63
CV                     0.41   0.29   0.20   0.14   0.10
Min                    0.00   0.26   1.14   2.94   10.02
Max                    1.98   3.85   6.92   12.66  21.92

Mean = n/2             1      2      4      8      16
St. dev. = √(n/12)     0.41   0.58   0.82   1.15   1.63
[Figure: histograms of Sn; the two legible horizontal axes run from 2.94 to 12.66 and from 10.02 to 21.92.]
$$ f_X(x) = \lambda e^{-\lambda x}, \quad \lambda > 0 $$
Figure 5-6b Comparison of probability density functions of Sn* for various values of n.
Table E5-8
Statistics of Sn based on the chosen numbers from the exponential distribution
n           2      4      8      16     32
Mean        2.02   4.01   8.01   16.02  32.04
St. dev.    1.41   1.99   2.79   4.01   5.65
CV (%)      70.0   49.7   34.9   25.0   17.6
Min         0.0    0.3    1.4    5.5    14.4
Max         10.5   14.1   22.1   33.3   55.4
[Figure: probability density of Sn (0 to 60) for n = 2, 4, 8, 16, and 32, followed by the density of the standardized sum S*n (-4 to 4), with a legible label n = 32.]
Hazen also found that the frequency curve could be straightened out in most
cases if a logarithmic scale, instead of a linear scale, was used along the horizon-
tal axis. Using a logarithmic scale means, of course, plotting the logarithms of
the data instead of the data themselves. If this results in a straight line on a prob-
ability graph then the logarithms of X are normally distributed. A wide range of
variables, such as daily stream flows, annual flood peaks, earthquake magni-
tudes, particle size in a soil sample, hydraulic conductivity of geologic forma-
tions, and strength of some materials, follow the log-normal distribution.
Another reason for the popularity of this distribution is that it avoids negative
values of variables.
Now consider the distribution of a phenomenon that occurs as a result of a
multiplicative action (mechanism) of a number of factors. An example is the
evaporation of water into the atmosphere. Evaporation depends on temperature,
radiation, relative humidity, wind velocity, sunshine hours, etc. For evaporation,
a product type relationship holds. In such cases, the variable of interest can be
expressed as a product of a large number of variables, each of which, in itself, is
difficult to study and describe. By taking the natural logarithm of the product,
we obtain the sum of logarithms. By the central limit theorem, the sum will be
normally distributed. Another example is the sediment particle size that results
from a number of collisions of particles of many sizes traveling at different
velocities. Each collision reduces the particle size by a random proportion of its
size at the time. Thus, the size of a randomly chosen particle after n collisions,
Xn, is the product of its size prior to the collision, Xn–1, and the random reduction
factor, wi. One can then write
$$ X_n = X_{n-1} w_n = X_{n-2} w_{n-1} w_n = \cdots = X_0 \prod_{i=1}^{n} w_i $$
$$ Q_k = Q_0 \prod_{i=1}^{k} K_i $$
where Ki is the recession factor on the ith day and Q0 is the peak discharge or the
discharge at the beginning of recession. In a similar manner autoregressive
processes used in hydrology can be expressed in product form. A random vari-
able X is said to be log-normally distributed if its logarithm, Y = ln(X), can be
characterized by a normal distribution with parameters μY and σ Y. Thus, by
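using Eq. 5.1 the distribution of Y can be written as
$$ f_Y(y) = \frac{1}{\sigma_Y \sqrt{2\pi}} \exp\left[-\frac{1}{2}\left(\frac{y - \mu_Y}{\sigma_Y}\right)^{2}\right], \quad -\infty \le y \le \infty \tag{5.24} $$
where the parameters are estimated from the sample values yi = ln(xi) as
$$ \mu_Y \approx \bar{y} = \frac{\sum y_i}{n} $$
and
$$ \sigma_Y \approx s_Y = \sqrt{\frac{\sum (y_i - \bar{y})^{2}}{n - 1}} $$
One can determine the distribution of X by the technique of variable
transformation explained in Chapter 2: One takes
$$ f_X(x) = f_Y(y) \left|\frac{dy}{dx}\right| $$
Differentiating Y = ln(X) with respect to X, one gets
$$ \frac{dY}{dX} = \frac{1}{X} $$
Substituting this relationship and Eq. 5.24 into the preceding relation gives
the PDF of X as
$$ f_X(x) = \frac{1}{x \sigma_Y \sqrt{2\pi}} \exp\left[-\frac{1}{2}\left(\frac{\ln x - \mu_Y}{\sigma_Y}\right)^{2}\right], \quad x \ge 0 \tag{5.25} $$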
Equation 5.25 represents the log-normal distribution. Note that the range of
Y is –∞ to +∞ whereas that of X is 0 to ∞ .
It is worth mentioning that the mean of X, μX, should not be interpreted as a
50% probable value. Instead the median value is the 50% probable value of a log-
normally distributed variable X on either side of which half of the distribution
lies. Let MX be the median value and the geometric mean of X. Thus, we can
write P(X ≤ MX) = 0.5. Further, based on the definition of the normal
distribution, this relationship can be rewritten as
$$ F_U\left(\frac{\ln M_X - \mu_Y}{\sigma_Y}\right) = 0.5 $$
Thus,
$$ \frac{\ln M_X - \mu_Y}{\sigma_Y} = F_U^{-1}(0.5) = 0 $$
$$ \mu_Y = \ln(M_X) \tag{5.26} $$
Substituting Eq. 5.26 into Eq. 5.25, the PDF of X becomes
$$ f_X(x) = \frac{1}{x \sigma_Y \sqrt{2\pi}} \exp\left[-\frac{1}{2}\left\{\frac{1}{\sigma_Y} \ln\left(\frac{x}{M_X}\right)\right\}^{2}\right], \quad x \ge 0 \tag{5.27} $$
denoted as LN(MX, σlnX), with MX and σlnX as parameters.
One can use normal tables as follows. If U is a standardized N(0,1) variable,
then
$$ f_X(x) = \frac{1}{x\, \sigma_{\ln X}}\, f_U\left(\frac{\ln(x / M_X)}{\sigma_{\ln X}}\right) $$
In other words,
$$ f_X(x) = \frac{1}{x\, \sigma_{\ln X}}\, f_U(u) \tag{5.28a} $$
where
$$ u = \frac{1}{\sigma_{\ln X}} \ln \frac{x}{M_X} \tag{5.28b} $$
The CDF of X is easily calculated from the tables of the normal distribution:
$$ F_X(x) = P[X \le x] = P[\ln X \le \ln x] = P[Y \le \ln x] = F_Y[\ln x] $$
$$ F_X(x) = F_U\left(\frac{\ln x - \ln M_X}{\sigma_{\ln X}}\right) = F_U\left(\frac{\ln(x / M_X)}{\sigma_{\ln X}}\right) = F_U(u) \tag{5.29} $$
Example 5.10 Consider the log-normal distribution with parameters mean and
standard deviation of Y. Graph the log-normal distribution for different values
of the parameters. Take the values of standard deviation as 0.1, 0.25, 0.5, 0.75, 1.0,
1.5, 2.5, and 5.0.
Solution The log-normal distribution for two cases is plotted in Fig. 5-9. The
mean and variance of Y can be estimated without transforming the data using
the following relations:
$$ \mu_Y = \frac{1}{2} \ln\left(\frac{\mu_X^{2}}{1 + CV_X^{2}}\right) \tag{5.30} $$
$$ \sigma_Y^{2} = \ln\left(1 + CV_X^{2}\right) \tag{5.31} $$
where CVX is the coefficient of variation of X. The mean, variance, and the
coefficient of variation of the log-normal distribution are
$$ \mu_X = \exp\left(\mu_Y + \frac{\sigma_Y^{2}}{2}\right) \tag{5.32} $$
$$ \sigma_X^{2} = \mu_X^{2} \left[\exp(\sigma_Y^{2}) - 1\right] \tag{5.33} $$
$$ CV_X = \left[\exp(\sigma_Y^{2}) - 1\right]^{1/2} \tag{5.34} $$
[Figure: two panels of log-normal density versus x (0 to 15); legible legend entries are σ = 2.5 and σ = 5.]
Example 5.11 Assume that the peak flow data of Example 4.18 follow a two-
parameter log-normal distribution. Compute the parameters of the log-normal
distribution. Compute the probability that the peak flow will be less than
100,000 cfs, less than 80,000 cfs, and less than 50,000 cfs in any year. Compare
these probabilities and return periods with those computed in Example 4.18.
Which probabilities and return periods are more realistic? What is the probabil-
ity that peak flow will occur in any year between one standard deviation on
either side of the mean and between two standard deviations on either side of
the mean?
Solution The mean and standard deviations of the data are 28,675.833 cfs and
21,117.138 cfs, respectively. Hence,
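$$CV_x = s_x/m_x = 21{,}117.138/28{,}675.833 = 0.736$$
$$s_{\ln x} = \sqrt{\ln\left(1 + CV_x^2\right)} = \sqrt{\ln\left(1 + 0.736^2\right)} = 0.658$$
and
$$m_{\ln x} = \ln m_x - \frac{s_{\ln x}^2}{2} = \ln 28{,}675.833 - \frac{0.658^2}{2} = 10.264 - 0.216 = 10.048$$
Now,
$$P(x < 100{,}000) = P\left(u < \frac{\ln(100{,}000) - 10.048}{0.658}\right) = P(u < 2.226) = 0.9868$$
$$P(x < 80{,}000) = P\left(u < \frac{\ln(80{,}000) - 10.048}{0.658}\right) = P(u < 1.887) = 0.9706$$
$$P(x < 50{,}000) = P\left(u < \frac{\ln(50{,}000) - 10.048}{0.658}\right) = P(u < 1.173) = 0.879$$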
The return periods are
R(x < 100,000) = 1/(1 – 0.9868) = 75.75 years
R(x < 80,000) = 1/(1 – 0.9706) = 34.01 years
R(x < 50,000) = 1/(1 – 0.879) = 8.26 years
The log-normal distribution is plotted in Fig. 5-10.
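The steps of Example 5.11 can be retraced numerically. The following is a minimal Python sketch, not part of the original solution; the function names are hypothetical, and the standard normal CDF is evaluated with the error function instead of normal tables, so the last digit may differ slightly from the tabulated values.

```python
import math

def lognormal_params_from_moments(mean_x, std_x):
    """Return (mu_Y, sigma_Y) of Y = ln X from the mean and standard deviation of X."""
    cv2 = (std_x / mean_x) ** 2
    sigma_y = math.sqrt(math.log(1.0 + cv2))      # sigma_Y^2 = ln(1 + CV_X^2)
    mu_y = math.log(mean_x) - 0.5 * sigma_y ** 2  # mu_Y = ln(mean) - sigma_Y^2 / 2
    return mu_y, sigma_y

def std_normal_cdf(u):
    """Standard normal CDF F_U(u), written with the error function."""
    return 0.5 * (1.0 + math.erf(u / math.sqrt(2.0)))

def lognormal_cdf(x, mu_y, sigma_y):
    """F_X(x) = F_U((ln x - mu_Y) / sigma_Y)."""
    return std_normal_cdf((math.log(x) - mu_y) / sigma_y)

# Sample statistics quoted in Example 5.11 (cfs).
mu_y, sigma_y = lognormal_params_from_moments(28675.833, 21117.138)
for q in (100000.0, 80000.0, 50000.0):
    p = lognormal_cdf(q, mu_y, sigma_y)
    print(f"{q:>9.0f} cfs  P = {p:.4f}  T = {1.0 / (1.0 - p):.1f} years")
```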
The performance index C in Eq. 5.14 was computed for gamma and log-
normal distributions and the values were 0.001369 and 0.003222, respectively.
Hence, the gamma distribution appears to better represent the data.
Figure 5-10 Theoretical log-normal distribution and the curve of example data.
$$K = \frac{\exp\left(s_x K_x - s_x^2/2\right) - 1}{\left[\exp(s_x^2) - 1\right]^{1/2}} \tag{5.36}$$
narrowest section, etc. If Y is the maximum of n random variables X1, X2, …, Xn,
then the probability
$$F_Y(y) = P[Y \le y] = P[\text{all } n \text{ random variables } X_i \le y]$$
$$F_Y(y) = P[X_1 \le y]\,P[X_2 \le y] \cdots P[X_n \le y] = F_{X_1}(y)\,F_{X_2}(y) \cdots F_{X_n}(y) \tag{5.39}$$
$$F_Y(y) = \{F_X(y)\}^n \tag{5.40}$$
Assuming Xi to be continuous random variables with PDF fX(x), one obtains
$$f_Y(y) = \frac{d}{dy}\left[F_X(y)\right]^n = n\left[F_X(y)\right]^{n-1} f_X(y) \tag{5.41}$$
Equations 5.40 and 5.41 can be used to determine the distribution of Y if the
Xi are mutually independent and identically distributed. Three limiting forms
of fY(y) for large values of n are found depending on (1) the interest in the largest
or smallest value and (2) the behavior of the appropriate tail of Xi.
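Equations 5.40 and 5.41 are easy to check by simulation. The sketch below is an illustration only: it assumes an exponential parent distribution with an arbitrarily chosen rate (lam = 0.5) and sample size (n = 10), and the function names are hypothetical.

```python
import math
import random

def cdf_max(F, n, y):
    """Eq. 5.40: CDF of the largest of n i.i.d. variables with parent CDF F."""
    return F(y) ** n

def pdf_max(F, f, n, y):
    """Eq. 5.41: PDF of the largest of n i.i.d. variables."""
    return n * F(y) ** (n - 1) * f(y)

lam = 0.5  # assumed rate of the exponential parent, for illustration only

def F(x):
    return 1.0 - math.exp(-lam * x) if x > 0 else 0.0

def f(x):
    return lam * math.exp(-lam * x) if x > 0 else 0.0

n, y = 10, 6.0
random.seed(1)
trials = 20000
# Fraction of simulated maxima that do not exceed y.
hits = sum(
    max(random.expovariate(lam) for _ in range(n)) <= y
    for _ in range(trials)
)
print("theory   :", cdf_max(F, n, y))
print("simulated:", hits / trials)
```

The simulated fraction should approach the theoretical value as the number of trials grows; agreement to about two decimal places is typical for 20,000 trials.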
$$F_X(x) = 1 - e^{-\lambda x} \tag{5.43}$$
and
$$f_Y(y) = \alpha \exp\left[-\alpha(y - u) - e^{-\alpha(y - u)}\right] \tag{5.45}$$
where α and u are parameters. This is the extreme value type I (EVI) distribution,
also known as the Gumbel distribution. It is represented as EVI,L(u,α ). Parameter
u is the mode of the distribution and parameter α is a measure of dispersion.
The moments of the distribution are
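$$m_Y = u + \frac{\gamma}{\alpha} \cong u + \frac{0.5772}{\alpha}, \quad \gamma = \text{Euler's constant} = 0.5772 \tag{5.46}$$
$$\sigma_Y^2 = \frac{\pi^2}{6\alpha^2} \cong \frac{1.645}{\alpha^2} \tag{5.47}$$
$$\sigma_Y = \frac{\pi}{\sqrt{6}\,\alpha} = \frac{1.282}{\alpha} \tag{5.48}$$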
$$K = -\frac{\sqrt{6}}{\pi}\left[0.5772 + \ln\left\{\ln\left(\frac{T}{T - 1}\right)\right\}\right] \tag{5.54a}$$
Referring to Eq. 5.18, when x = mX (the average of x), one sees that K = 0. This
condition from Eq. 5.55 gives T = 2.33 years. This value (2.33 years) is considered
to be the recurrence interval of the mean annual flood.
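The 2.33-year figure can be verified directly. The sketch below assumes the usual form of the EVI frequency factor for largest values, K = -(√6/π){0.5772 + ln[ln(T/(T − 1))]}, which is the form that reproduces T = 2.33 years at K = 0; the function names are hypothetical.

```python
import math

EULER_GAMMA = 0.5772  # rounded value used in the text

def gumbel_frequency_factor(T):
    """Frequency factor K of the EVI (largest value) distribution for return period T."""
    if T <= 1.0:
        raise ValueError("return period must exceed 1")
    return -(math.sqrt(6.0) / math.pi) * (
        EULER_GAMMA + math.log(math.log(T / (T - 1.0)))
    )

def return_period_of_mean():
    """Solve K = 0, i.e. ln(T / (T - 1)) = exp(-gamma), for T."""
    ratio = math.exp(math.exp(-EULER_GAMMA))  # ratio = T / (T - 1)
    return ratio / (ratio - 1.0)

T_mean = return_period_of_mean()
print(f"T at K = 0: {T_mean:.2f} years")
print(f"K for T = 100 years: {gumbel_frequency_factor(100.0):.3f}")
```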
Gumbel (1958) showed that asymptotically for large x the following holds:
$$T(x) \propto -\frac{f'(x)}{[f(x)]^2} \tag{5.54b}$$
This yields the tail behavior of EVI.
Example 5.12 Plot the extreme value type I distribution, with parameters u and
α, for both the largest and the smallest values.
Solution The EVI distribution for the largest values:
Example 5.13 Assume that the peak flow data of Example 4.18 follows a two-
parameter extreme value type I distribution. Compute the parameters of the dis-
tribution and the probability that the peak flow in any year will be less than
100,000 cfs, less than 80,000 cfs, and less than 50,000 cfs. Compare these probabil-
ities and return periods with those computed in Example 4.18. Which probabili-
ties and return periods are more realistic? What is the probability that peak flow
will occur in any year between one standard deviation on either side of the mean
and between two standard deviations on either side of the mean?
Solution For the EVI distribution,
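$$\alpha = \frac{1.282}{\sigma_y} = \frac{1.282}{21{,}117.138} = 6.07 \times 10^{-5}$$
and
$$u = 28{,}675.833 - \frac{0.5772}{\alpha} = 19{,}171.474$$
Hence, the nonexceedance probabilities are those listed in Table E5-13, and
$$P_x(\mu - 2\sigma < x < \mu + 2\sigma) = P_x(x < 70{,}910.11) - P_x(x < -13{,}558.4) = 0.958 - 0.041 = 0.917$$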
Figure 5-11 Extreme Value Type I distribution (largest value).
Figure 5-12 Extreme Value Type I distribution (smallest value).
Table E5-13
Peak flow (cfs) Probability Return period (years)
100,000 0.993 142.8
80,000 0.975 40
50,000 0.857 7
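The entries of Table E5-13 can be reproduced with a few lines of code. This is a minimal sketch with hypothetical function names; it uses π/√6 instead of the rounded 1.282, so u differs from the text in the last digits.

```python
import math

def gumbel_params_from_moments(mean, std):
    """Method-of-moments parameters (alpha, u) of the EVI (largest value) distribution."""
    alpha = math.pi / (math.sqrt(6.0) * std)  # from sigma_Y = pi / (sqrt(6) alpha)
    u = mean - 0.5772 / alpha                 # from m_Y = u + 0.5772 / alpha
    return alpha, u

def gumbel_cdf(y, alpha, u):
    """F_Y(y) = exp(-exp(-alpha (y - u)))."""
    return math.exp(-math.exp(-alpha * (y - u)))

# Sample mean and standard deviation quoted in Example 5.13 (cfs).
alpha, u = gumbel_params_from_moments(28675.833, 21117.138)
for q in (100000.0, 80000.0, 50000.0):
    p = gumbel_cdf(q, alpha, u)
    print(f"{q:>9.0f} cfs  P = {p:.3f}  T = {1.0 / (1.0 - p):.1f} years")
```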
Figure 5-13 shows the distribution. The performance index for EVI turns out
to be 0.0022. Here too, the gamma distribution (Cgamma = 0.001369) appears to
better represent the data.
Example 5.14 For the peak annual flow in a small stream it is found that
mY = 200 m³/s and σY = 100 m³/s. Compute the PDF of the extreme value type I
distribution for the given data.
Solution We first determine parameters α and u:
$$\alpha = \frac{1.282}{\sigma_Y} = \frac{1.282}{100} = 0.01282$$
$$u = m_Y - \frac{0.5772}{\alpha} = 200 - \frac{0.5772}{0.01282} = 154.98 \ \text{m}^3/\text{s}$$
Thus,
$$F_Y(y) = \exp\left[-e^{-0.01282(y - 154.98)}\right]$$
One can compute the probability that the peak flow in a particular year will
exceed a given value, say, 400 m³/s, as
$$P[Y \ge 400] = 1 - F_Y(400) = 1 - \exp\left[-e^{-0.01282(400 - 154.98)}\right] = 0.0423$$
Tables of FR(r) and fR(r) and R are available. Then, the values of Y and fY(y) are
easily computed. Note that
$$R = \alpha(Y - u)$$
Hence,
$$Y = u + \frac{R}{\alpha}$$
[Figure 5-13: probability versus discharge (cfs).]
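$$F_Z(z) = 1 - \exp\left(-e^{\alpha(z - u)}\right), \quad -\infty \le z \le \infty \tag{5.55}$$
$$f_Z(z) = \alpha \exp\left[\alpha(z - u) - e^{\alpha(z - u)}\right] \tag{5.56}$$
where
$$m_Z = u - \frac{\gamma}{\alpha} = u - \frac{0.5772}{\alpha} \tag{5.57}$$
$$\sigma_Z = \frac{\pi}{\sqrt{6}\,\alpha} = \frac{1.282}{\alpha} \tag{5.58}$$
$$\gamma_1 = -1.1396 \tag{5.59}$$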
In view of the symmetry between the distributions for the largest and the
smallest values, the tables of the EVI distribution for the largest values can
also be used for the smallest values. In terms of the reduced variate R,
Example 5.15 Consider the same mean and variance values as used in Example
5.14. For minimum annual flow in a large stream, EVI might be an appropriate
model.
Solution The mean and the standard deviation are
mz = 200 m³/s
σz = 100 m³/s
$$u = m_Z + \frac{0.5772}{\alpha} = 200 + \frac{0.5772}{0.01282} = 245.02 \ \text{m}^3/\text{s}$$
Y       5      10     15     20     25     30     35     40     45     50
Fy(y)   0.411  0.657  0.820  0.911  0.957  0.979  0.990  0.995  0.998  0.999
[Figure: probability versus days.]
and
$$u = m_y - \frac{0.577}{\alpha} = 8.06 - \frac{0.577}{0.150} = 4.213$$
We first compute
$$F_y[y] = \exp\left[-e^{-\alpha(y - u)}\right]$$
The average return period is computed by
$$m_N = \frac{1}{1 - F_y(y)}$$
The results are listed in Table E5-16. The return periods are plotted in Fig. 5-15.
Table E5-16
y Fy(y) Return period (days)
10 0.657198 2.92
20 0.910589 11.18
30 0.979318 48.35
40 0.995348 214.94
50 0.99896 961.57
60 0.999768 4,307.73
70 0.999948 19,304.16
80 0.999988 86,513.50
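Table E5-16 follows directly from the two formulas above. The sketch below is an illustration using the parameter values α = 0.150 and u = 4.213 quoted in the solution; the function names are hypothetical, and `expm1` is used only to keep 1 − F accurate when F is very close to 1.

```python
import math

ALPHA = 0.150  # dispersion parameter quoted in the solution
U = 4.213      # mode quoted in the solution

def gumbel_cdf(y, alpha=ALPHA, u=U):
    """F_y(y) = exp(-exp(-alpha (y - u)))."""
    return math.exp(-math.exp(-alpha * (y - u)))

def return_period(y, alpha=ALPHA, u=U):
    """Average return period m_N = 1 / (1 - F_y(y))."""
    exceedance = -math.expm1(-math.exp(-alpha * (y - u)))  # 1 - F_y(y)
    return 1.0 / exceedance

for y in range(10, 90, 10):
    print(f"{y:>3d}  F = {gumbel_cdf(y):.6f}  return period = {return_period(y):,.2f}")
```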
Figure 5-15 Return periods of droughts of various durations.
$$F_Y(y) = \exp\left[-(u/y)^k\right], \quad y \ge 0 \tag{5.63}$$
$$f_Y(y) = \frac{k}{u}\left(\frac{u}{y}\right)^{k+1} \exp\left[-(u/y)^k\right], \quad y \ge 0 \tag{5.64}$$
$$E\left[Y^j\right] = u^j\,\Gamma\left(1 - \frac{j}{k}\right) \tag{5.65}$$
$$m_Y = u\,\Gamma\left(1 - \frac{1}{k}\right), \quad k > 1 \tag{5.66}$$
$$\sigma_Y^2 = u^2\left[\Gamma\left(1 - \frac{2}{k}\right) - \Gamma^2\left(1 - \frac{1}{k}\right)\right], \quad k > 2 \tag{5.67}$$
If the coefficient of variation is represented by VY, then
$$V_Y^2 = \sigma_Y^2 / m_Y^2$$
or
$$V_Y = \sqrt{\frac{\Gamma(1 - 2/k)}{\Gamma^2(1 - 1/k)} - 1}, \quad k > 2 \tag{5.68}$$
The relation between this distribution and type I is the same as that between
log-normal and normal distributions. If Y has a type II distribution with parame-
ters u and k, then Z = ln Y has the type I distribution with parameters u0 = ln u
and α = k. It follows that
$$F_Y(y) = F_Z(\ln y) \tag{5.69}$$
$$f_Y(y) = \frac{1}{y}\, f_Z(\ln y) \tag{5.70}$$
$$F_Y(y) = F_W\left[(\ln y - \ln u)\,k\right] \tag{5.71}$$
$$f_Y(y) = \frac{k}{y}\, f_W\left[(\ln y - \ln u)\,k\right] \tag{5.72}$$
The EVII distribution has also been used to model the annual maximum
wind velocity.
Example 5.17 For the extreme value type II distribution, plot the coefficient of
variation versus k. Graph the distribution for various values of the parameters.
Solution The graph of CV versus k for the extreme value type II distribution,
based on Eq. 5.68, is shown in Fig. 5-16. The shape of the distribution for various
combinations of parameters is given in Fig. 5-17.
Example 5.18 From the measured wind data at an airport location, the mean
and standard deviation of the maximum annual wind velocity were
mY = 60 km/hour and σY = 12.6 km/hour, respectively. Find the wind velocity
that will be exceeded with a probability of 0.05 in any year.
Solution To determine parameters u and k, first calculate
CV = σY / mY = 12.6/60 = 0.21
Figure 5-17 Extreme value type II distribution for various values of parameters.
$$u = \frac{60}{\Gamma\left(1 - \dfrac{1}{6.4}\right)} = \frac{60}{\Gamma(0.844)} = \frac{60}{1.118} = 53.67 \ \text{km/hour}$$
$$F_Y(y) = \exp\left[-\left(\frac{53.67}{y}\right)^{k}\right]$$
The velocity y, which will be exceeded with probability 0.05 = 1/20 in any
year, is found as
$$P[Y \ge y] = 1 - P[Y \le y] = 1 - F_Y(y)$$
$$\exp\left[-\left(\frac{53.67}{y}\right)^{6.4}\right] = 0.95$$
Solving for y, one gets y = 84.76 km/hour.
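The procedure of Example 5.18 (find k from the coefficient of variation, then u from the mean, then invert the CDF) can be sketched in code. This is an illustration with hypothetical function names, not the authors' procedure: k is found here by bisection on Eq. 5.68, and a numerically solved k need not coincide with a value read from a graph, so the printed numbers may differ somewhat from those of the worked solution.

```python
import math

def evii_cv(k):
    """Coefficient of variation of the EVII distribution (Eq. 5.68), valid for k > 2."""
    g1 = math.gamma(1.0 - 1.0 / k)
    g2 = math.gamma(1.0 - 2.0 / k)
    return math.sqrt(g2 / g1 ** 2 - 1.0)

def solve_k_from_cv(cv, lo=2.05, hi=200.0):
    """Bisection for k; the coefficient of variation decreases as k increases."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if evii_cv(mid) > cv:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def evii_cdf(y, u, k):
    """F_Y(y) = exp(-(u / y)^k) (Eq. 5.63)."""
    return math.exp(-((u / y) ** k))

def evii_quantile(p, u, k):
    """Invert Eq. 5.63: the value y with nonexceedance probability p."""
    return u * (-math.log(p)) ** (-1.0 / k)

mean, std = 60.0, 12.6                   # km/hour, from Example 5.18
k = solve_k_from_cv(std / mean)
u = mean / math.gamma(1.0 - 1.0 / k)     # Eq. 5.66 solved for u
y05 = evii_quantile(0.95, u, k)          # exceeded with probability 0.05
print(f"k = {k:.2f}, u = {u:.2f} km/hour, y = {y05:.2f} km/hour")
```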
Example 5.19 The data for the maximum annual wind velocity in the Baton
Rouge area are given in Table E5-19. Compute the mean and standard deviation
of the maximum annual wind velocity. Assume that the peak wind velocity fol-
lows an extreme value type II distribution. Compute the 20-, 30-, 50-, 80-, and
100-year wind velocity.
Solution Table E5-19 gives the maximum annual wind velocity and direction
from 1980 to 2000. The coefficient of variation is
$$CV = \frac{8.83}{47.38} = 0.186$$
From the CV function, k = 8.5. Then,
$$u = \frac{47.38}{\Gamma\left(1 - \dfrac{1}{8.5}\right)} = 43.74$$
Thus
$$F_Y(y) = \exp\left[-\left(\frac{43.74}{y}\right)^{8.5}\right]$$
Table E5-19
Year Month Day Speed (mph) Wind direction (degrees)
1980 7 18 41 30
1981 3 22 29 210
1982 5 22 35 110
1983 1 31 35 270
1984 2 12 44 270
1985 4 5 51 330
1986 8 10 41 180
1987 12 14 48 270
1988 7 4 54 180
1989 5 16 51 220
1990 5 31 48 360
1991 4 25 46 270
1992 8 26 70 130
1993 7 3 60 110
1994 3 27 43 180
1995 12 18 52 300
1996 3 18 49 290
1997 7 10 47 280
1998 2 10 51 170
1999 5 10 52 250
2000 8 30 48 130
Mean 47.38 216.19
Standard deviation 8.83 83.99
$$F_Y(y) = \exp\left[-\left(\frac{w - y}{w - u}\right)^k\right], \quad y \le w \tag{5.74}$$
$$f_Y(y) = \frac{k}{w - u}\left(\frac{w - y}{w - u}\right)^{k-1} \exp\left[-\left(\frac{w - y}{w - u}\right)^k\right], \quad y \le w \tag{5.75}$$
where u and k are parameters of the distribution, k is the scale parameter and w
is the location parameter, and u is the lower limit of x. When k = 2, this results in
a triangular distribution. Most useful applications of this distribution deal with
the smallest values. Hence, the left-hand tail of the PDF of Xi satisfies X ≥ ε such
that near X = ε, the CDF has the form
$$F_X(x) = c\,(x - \varepsilon)^k, \quad x \ge \varepsilon$$
where ε is the lower limit of x. For ε = 0, the gamma distribution acquires this
form. For independent and identically distributed Xi, the distribution of Z is
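$$F_Z(z) = 1 - \exp\left[-\left(\frac{z - \varepsilon}{u - \varepsilon}\right)^k\right], \quad z \ge \varepsilon \tag{5.76}$$
$$f_Z(z) = \frac{k}{u - \varepsilon}\left(\frac{z - \varepsilon}{u - \varepsilon}\right)^{k-1} \exp\left[-\left(\frac{z - \varepsilon}{u - \varepsilon}\right)^k\right], \quad z \ge \varepsilon \tag{5.77}$$
$$m_Z = \varepsilon + (u - \varepsilon)\,\Gamma\left(1 + \frac{1}{k}\right) \tag{5.78}$$
$$\sigma_Z^2 = (u - \varepsilon)^2\left[\Gamma\left(1 + \frac{2}{k}\right) - \Gamma^2\left(1 + \frac{1}{k}\right)\right] \tag{5.79}$$
$$E\left[(Z - \varepsilon)^j\right] = (u - \varepsilon)^j\,\Gamma\left(1 + \frac{j}{k}\right) \tag{5.80}$$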
This distribution gives rise to the Weibull distribution, commonly employed in
studies on reliability of the lifetimes of components, rainfall analysis, and so on.
If ε = 0,
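$$F_Z(z) = 1 - \exp\left[-(z/u)^k\right], \quad z \ge 0 \tag{5.81}$$
with
$$m_Z = u\,\Gamma\left(1 + \frac{1}{k}\right) \tag{5.82}$$
$$\sigma_Z^2 = u^2\left\{\Gamma\left[1 + \frac{2}{k}\right] - \Gamma^2\left[1 + \frac{1}{k}\right]\right\} \tag{5.83}$$
$$CV_Z^2 = \sigma_Z^2 / m_Z^2$$
$$CV_Z^2 = \frac{\Gamma\left(1 + \dfrac{2}{k}\right)}{\Gamma^2\left(1 + \dfrac{1}{k}\right)} - 1 \tag{5.84}$$
$$f_Z(z) = \frac{1}{z - \varepsilon}\, f_X\left[\ln(z - \varepsilon)\right] = \frac{k}{z - \varepsilon}\, f_W\left(-k \ln\frac{z - \varepsilon}{u - \varepsilon}\right), \quad z > \varepsilon \tag{5.86}$$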
When ε = 0, the extreme value type III distribution is the gamma distribu-
tion. The probability density function of the gamma distribution is
$$f(x; \alpha, \beta) = \frac{1}{\beta^{\alpha}\,\Gamma(\alpha)}\, x^{\alpha - 1} e^{-x/\beta}$$
$$f(x; \alpha) = \frac{x^{\alpha - 1} e^{-x}}{\Gamma(\alpha)}$$
[Figure: EV III distribution, f(x) versus x; one panel for alpha = 1, 2, 3 and one panel for alpha = 1, 2, 3 with beta = 2.]
Example 5.20 Assume that the minimum annual low flows follow an extreme
value type III distribution. For the Amite River at Darlington, Louisiana, com-
pute the mean and standard deviation of the low flow data. Also, compute the
distribution parameters.
Solution From the data of annual minimum low flow one obtains
mean = 106.3235 cfs
standard deviation = 28.03755 cfs
$$CV = \frac{28.03755}{106.3235} = 0.264$$
From the CV function or Eq. 5.84, k = 4.278. Now, Eq. 5.82 gives
$$u = \frac{106.3235}{\Gamma\left(1 + \dfrac{1}{4.278}\right)} = 116.852$$
The distribution of this annual low flow data is plotted in Fig. 5-19.
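The value of k in Example 5.20 can also be obtained by solving Eq. 5.84 numerically instead of reading it from a CV curve. The following is a minimal sketch with hypothetical function names; it uses plain bisection, and its result agrees with the quoted k = 4.278 only to about two decimal places because the inputs in the text are rounded.

```python
import math

def weibull_cv(k):
    """Coefficient of variation from Eq. 5.84 (case epsilon = 0)."""
    g1 = math.gamma(1.0 + 1.0 / k)
    g2 = math.gamma(1.0 + 2.0 / k)
    return math.sqrt(g2 / g1 ** 2 - 1.0)

def solve_k_from_cv(cv, lo=0.2, hi=50.0):
    """Bisection for k; the coefficient of variation decreases as k increases."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if weibull_cv(mid) > cv:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

mean, std = 106.3235, 28.03755           # cfs, from Example 5.20
k = solve_k_from_cv(std / mean)
u = mean / math.gamma(1.0 + 1.0 / k)     # Eq. 5.82 solved for u
print(f"k = {k:.3f}, u = {u:.3f}")
```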
Figure 5-19 Frequency distribution of annual low flow data of Amite River, Darlington.
where a > 0 and c are, respectively, the scale and location parameters, and b is a
shape parameter. The range of X depends on the value of b; it is bounded by
c + (a/b) from above for b > 0 [i.e., –∞ < x < c + (a/b)] and is bounded from below
for b < 0 [i.e., c+(a/b) < x < ∞]. Depending on the value of b, different extreme
value distributions are represented by Eq. 5.87. For example, the GEV distribu-
tion corresponds to the Gumbel distribution (or extreme value type I) for b = 0,
the extreme value type II distribution for b < 0, and the extreme value type III
distribution for b > 0. Equation 5.87 gives rise to a reverse Rayleigh distribution
for b = 2 and to a reverse exponential distribution for b = 1. It can also be shown
that the Weibull distribution is a reverse GEV distribution.
The CDF of the GEV distribution can be expressed as
$$F_X(x) = \exp\left[-\left(1 - \frac{b}{a}(x - c)\right)^{1/b}\right] \tag{5.88}$$
$$f_X(x) = \frac{1}{a(1 - z)} \exp\left[-y - \exp(-y)\right] \tag{5.89}$$
where
$$y = -\frac{1}{b}\ln(1 - z) \tag{5.90}$$
$$z = \frac{b}{a}(x - c) \tag{5.91}$$
The three parameters, a, b, and c, of the GEV distribution can be estimated by
using the method of moments. For b < 0 (extreme value type II distribution) the
first three moments, by using the transformation
$$y = \left[1 - \frac{b}{a}(x - c)\right]^{1/b} \tag{5.92}$$
are found to be
$$M_1^0 = c + \frac{a}{b}\left[1 - \Gamma(1 + b)\right] \tag{5.93}$$
$$M_2 = \frac{a^2}{b^2}\left[\Gamma(1 + 2b) - \Gamma^2(1 + b)\right] \tag{5.94}$$
$$M_3 = \frac{a^3}{b^3}\left[-\Gamma(1 + 3b) + 3\,\Gamma(1 + b)\,\Gamma(1 + 2b) - 2\,\Gamma^3(1 + b)\right] \tag{5.95}$$
where $M_1^0$, $M_2$, and $M_3$ are, respectively, the first moment about the origin and
the second and third moments about the centroid. The value of b is computed
numerically from the relationship to the coefficient of skewness Cs, defined as
$$C_s = \frac{M_3}{M_2^{3/2}} \tag{5.96}$$
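Because Cs in Eq. 5.96 depends on b alone (the scale a cancels), b can be found with a one-dimensional root search. The sketch below is an illustration under stated assumptions: it covers only the branch b < 0 (extreme value type II, for which Cs exceeds the Gumbel value of about 1.14), it requires b > −1/3 so that the third moment exists, and the function names and the bracketing interval are hypothetical choices.

```python
import math

def gev_skewness(b):
    """Coefficient of skewness from Eqs. 5.94-5.96 with a = 1 (a cancels in C_s)."""
    g1 = math.gamma(1.0 + b)
    g2 = math.gamma(1.0 + 2.0 * b)
    g3 = math.gamma(1.0 + 3.0 * b)
    m2 = (g2 - g1 ** 2) / b ** 2
    m3 = (-g3 + 3.0 * g1 * g2 - 2.0 * g1 ** 3) / b ** 3
    return m3 / m2 ** 1.5

def solve_b_from_skewness(cs, lo=-0.30, hi=-1e-3):
    """Bisection on -1/3 < b < 0, where the skewness falls as b rises toward zero."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if gev_skewness(mid) > cs:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

b = solve_b_from_skewness(1.37)  # sample skewness quoted in Example 5.21
print(f"b = {b:.4f}, check Cs = {gev_skewness(b):.4f}")
```

With b known, a follows from Eq. 5.94 and the sample variance, and c from Eq. 5.93 and the sample mean.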
Example 5.21 Assume that annual peak flows follow the GEV distribution for
the Amite River at Darlington, Louisiana. Compute the mean and standard devi-
ation of the peak flow data. Also, compute the distribution parameters.
Solution For the peak flow data at Darlington, Louisiana, one has mean =
20,371 cfs, standard deviation = 20,643 cfs, Cv = 0.7542, Cs = 1.37.
Thus parameter b can be obtained by solving the following equation (Rao
and Hamed 2000):
Then from Eq. 5.94 and Eq. 5.95 we have a = 16,462, c = 20,370.
The distribution of this annual peak flow data is plotted in Fig. 5-20.
Equation 5.98 defines the uniform distribution, which is also known as the
rectangular distribution. Its cumulative distribution function is triangular in
shape:
$$F_X(x) = \begin{cases} 0, & x < 0 \\ x, & 0 \le x \le 1 \\ 1, & x > 1 \end{cases} \tag{5.98}$$
The mean and variance of the rectangular distribution are
$$\mu_X = \frac{1}{2} \tag{5.99}$$
$$\sigma_X^2 = \frac{1}{12} \tag{5.100}$$
This rectangular distribution can be generalized to any arbitrary range a to b.
Then, the PDF becomes
$$f_X(x) = \begin{cases} \dfrac{1}{b - a}, & a \le x \le b \\ 0, & \text{elsewhere} \end{cases} \tag{5.101}$$
Figure 5-20 The GEV distribution of annual peak flow.
$$\mu_X = \frac{a + b}{2} \tag{5.102}$$
$$\sigma_X^2 = \frac{(b - a)^2}{12} \tag{5.103}$$
The uniform distribution is shown in Fig. 5-21.
$$f_X(x) = \frac{1}{B}\, x^{r-1}(1 - x)^{t-r-1}, \quad r > 0, \ t - r > 0 \tag{5.104}$$
$$B = \frac{(r - 1)!\,(t - r - 1)!}{(t - 1)!} \tag{5.105}$$
Figure 5-21 Generalized uniform or rectangular distribution.
if r and t – r can take on noninteger values. The mean and variance of the beta
distribution are
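$$m_X = \frac{r}{t} \tag{5.107}$$
$$\sigma_X^2 = \frac{m_X(1 - m_X)}{t + 1} = \frac{r(t - r)}{t^2(t + 1)} \tag{5.108}$$
$$\gamma = \frac{1}{\sigma_X^3}\,\frac{r}{t}\left[\frac{(r + 2)(r + 1)}{(t + 2)(t + 1)} - \frac{3r(r + 1)}{t(t + 1)} + \frac{2r^2}{t^2}\right] \tag{5.109}$$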
The beta distribution can assume a wide variety of shapes for different val-
ues of its parameters r and t. It reduces to the uniform distribution for r = 1, t = 2
and to the triangular distribution for t = 3, r = 1 or 2. It is symmetrical about
x = 0.5 if r = t/2. It is skewed to the right if r < t/2 and to the left if r > t/2. It is U–
shaped if r < 1 and t ≤ 2r. It is J-shaped if r < 1 and t > r + 1 or if r > 1 and t < r + 1.
It is unimodal and bell shaped (generally skewed) if r > 1 and t > r + 1 with the
mode at x = (r – 1)/(t – 2). The distribution is plotted in Fig. 5-22 for several com-
binations of parameters.
For integer values of t and r, the tables of the binomial distribution can be
used to evaluate fX(x) and FX(x). The binomial distribution for Y as a function of
n and p can be written as
$$P_Y(y) = \frac{n!}{y!\,(n - y)!}\, p^y (1 - p)^{n - y} \tag{5.110}$$
$$f_X(x) = \frac{(t - 1)!}{(r - 1)!\,(t - r - 1)!}\, x^{r-1}(1 - x)^{t-r-1} \tag{5.111}$$
$$f_X(x) = (t - 1)\,P_Y(y) \tag{5.112}$$
if PY(y) is evaluated at n = t – 2 and y = r – 1 for various values of p = x.
For a ≤ x ≤ b, the beta distribution can be generalized to
$$f_Y(y) = \frac{1}{B\,(b - a)^{t-1}}\,(y - a)^{r-1}(b - y)^{t-r-1}, \quad a \le y \le b \tag{5.113}$$
when
$$m_Y = a + \frac{r}{t}(b - a) \tag{5.114}$$
$$\sigma_Y^2 = (b - a)^2\,\frac{r(t - r)}{t^2(t + 1)} \tag{5.115}$$
There is a simple relation among Y, BT(a, b, r, t), and X, which is a BT(0, 1, r, t)
variable. In particular,
Y = a + (b – a)X (5.116)
Thus,
$$F_Y(y) = F_X\left(\frac{y - a}{b - a}\right) \tag{5.117}$$
$$f_Y(y) = \frac{1}{b - a}\, f_X\left(\frac{y - a}{b - a}\right) \tag{5.118}$$
$$a = 0, \quad b = 360$$
$$360\,\frac{r}{t} = m_Y = 200$$
$$(360)^2\,\frac{r(t - r)}{t^2(t + 1)} = \sigma_y^2 = 10{,}000$$
$$f_Y(y) = \frac{1}{360}\, f_X\left(\frac{y}{360}\right)$$
where X is a beta-distributed variable, BT(1.22, 2.2)
Example 5.23 The data on wind direction at the Baton Rouge airport measured
in degrees from north are given in Example 5.19. Compute the mean and stan-
dard deviation of the wind direction in degrees. Assume the beta distribution is
appropriate for modeling the wind direction and compute the distribution
parameters. Compute the probability of the wind direction (measured from
north) exceeding 45°, 60°, 90°, 120°, 145°, 180°, 200°, 245°, 290°, 320°, and 345°.
Compute the recurrence intervals of these wind directions.
Solution From the data, my = 216.19 and σ y = 83.99 . The direction is limited
between 0° and 360°. Thus, for a beta distribution,
$$a = 0, \quad b = 360°$$
$$\frac{r}{t} \times 360 = m_y = 216.19$$
$$(360)^2 \times \frac{r(t - r)}{t^2(t + 1)} = \sigma_y^2 = 83.99^2$$
$$f_Y(y) = \frac{1}{0.3 \times (360 - 0)^{3.41 - 1}}\,(y - 0)^{2.05 - 1}(360 - y)^{3.41 - 2.05 - 1} = 2.302 \times 10^{-6}\, y^{1.05}(360 - y)^{0.36}$$
$$F_Y(y) = \int_0^y f_Y(y)\,dy$$
$$F_Y'(y) = 1 - F_Y(y) = \int_y^{360} f_Y(y)\,dy$$
Table E5-23
45 0.984 1.02
60 0.967 1.03
90 0.918 1.09
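The two moment equations for the generalized beta distribution (Eqs. 5.114 and 5.115) can be solved in closed form for r and t. The sketch below is an illustration with a hypothetical function name; applied to the wind-direction statistics of Example 5.23 it gives values close to the exponents r = 2.05 and t = 3.41 that appear in the density above.

```python
def beta_params_from_moments(mean, std, a=0.0, b=1.0):
    """Solve Eqs. 5.114 and 5.115 for (r, t) given the mean and standard deviation on [a, b]."""
    m = (mean - a) / (b - a)       # mean rescaled to [0, 1], equals r / t
    v = (std / (b - a)) ** 2       # variance rescaled to [0, 1]
    t = m * (1.0 - m) / v - 1.0    # from v = m (1 - m) / (t + 1)
    r = m * t
    return r, t

# Wind direction statistics of Example 5.23 (degrees).
r, t = beta_params_from_moments(216.19, 83.99, a=0.0, b=360.0)
print(f"r = {r:.2f}, t = {t:.2f}")
```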
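$$f_X(x) = \frac{1}{\alpha\,\Gamma(\beta)}\left(\frac{x - \gamma}{\alpha}\right)^{\beta - 1} \exp\left(-\frac{x - \gamma}{\alpha}\right), \quad \gamma < x < \infty \tag{5.119}$$
$$M_r^c = \int_c^{\infty} \frac{(x - \gamma)^r}{\alpha\,\Gamma(\beta)}\left(\frac{x - \gamma}{\alpha}\right)^{\beta - 1} \exp\left(-\frac{x - \gamma}{\alpha}\right) dx \tag{5.120}$$
$$\hat{\alpha} = \sqrt{m_2 / \hat{\beta}} \tag{5.122}$$
$$\hat{\gamma} = m_1' - \sqrt{m_2\,\hat{\beta}} \tag{5.123}$$
$$f(x) = \frac{1}{\alpha\,x\,\Gamma(\beta)}\left(\frac{\log(x) - \gamma}{\alpha}\right)^{\beta - 1} \exp\left(-\frac{\log(x) - \gamma}{\alpha}\right), \quad \gamma < x < \infty \tag{5.124}$$
$$F_Y(y) = \frac{1}{\Gamma(b)} \int_0^y y^{b - 1} e^{-y}\,dy \tag{5.126}$$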
Example 5.24 The data of annual maximum stage and discharge for the Amite
River at Darlington have been given in Example 3.17. For the discharge data, n = 48,
mean mX = 28,676 cusecs, coefficient of variation cv = 0.736, and coefficient of skew-
ness cs = 1.34. For these data, compute the parameters of the gamma distribution.
Solution We first compute β using Eq. 5.121:
$$\beta = (2/1.34)^2 = 2.228$$
Now, from Eq. 5.122, we get
where F = F(x) = P(X ≤ x). For this distribution, explicit expressions of the proba-
bility density function or cumulative distribution function are not available. This
distribution is not commonly used because its application requires estimation of
five parameters.
The Pareto distribution is a special case of the Wakeby distribution. If in
Eq. 5.130, γ = 0, we get
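$$F(x) = \begin{cases} 1 - \left(1 - a\,\dfrac{x - c}{b}\right)^{1/a}, & a \ne 0 \\[2ex] 1 - \exp\left(-\dfrac{x - c}{b}\right), & a = 0 \end{cases} \tag{5.131}$$
$$f(x) = \begin{cases} \dfrac{1}{b}\left(1 - a\,\dfrac{x - c}{b}\right)^{1/a - 1}, & a \ne 0 \\[2ex] \dfrac{1}{b}\exp\left(-\dfrac{x - c}{b}\right), & a = 0 \end{cases} \tag{5.132}$$
$$\bar{x} = c + \frac{b}{1 + a} \tag{5.133}$$
$$\mu_2 = \frac{b^2}{(1 + a)^2(1 + 2a)} \tag{5.134}$$
$$C_s = \frac{2(1 - a)\sqrt{1 + 2a}}{1 + 3a} \tag{5.135}$$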
Example 5.25 For the Amite River data used in Example 5.24, important param-
eters are n = 48, mean mx = 28,676 cusecs, coefficient of variation cv = 0.736, and
coefficient of skewness cs = 1.34. For these data, find the parameters of the Pareto
distribution.
$$1.34 = \frac{2(1 - a)\sqrt{1 + 2a}}{1 + 3a}$$
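The skewness equation of Example 5.25 has no closed-form solution for a, but it is easily solved numerically. The sketch below is an illustration with hypothetical function names; it assumes the skewness relation carries a square root on (1 + 2a), as in the standard result for this distribution, and then uses the mean and variance relations (Eqs. 5.133 and 5.134) to obtain b and c.

```python
import math

def pareto_skewness(a):
    """C_s = 2 (1 - a) sqrt(1 + 2a) / (1 + 3a), assumed form of Eq. 5.135."""
    return 2.0 * (1.0 - a) * math.sqrt(1.0 + 2.0 * a) / (1.0 + 3.0 * a)

def solve_a_from_skewness(cs, lo=0.0, hi=1.0):
    """Bisection on 0 <= a <= 1, where C_s falls from 2 to 0 as a increases."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if pareto_skewness(mid) > cs:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def pareto_scale_location(mean, std, a):
    """b from the variance relation (Eq. 5.134) and c from the mean relation (Eq. 5.133)."""
    b = std * (1.0 + a) * math.sqrt(1.0 + 2.0 * a)
    c = mean - b / (1.0 + a)
    return b, c

mean, cv, cs = 28676.0, 0.736, 1.34      # values quoted in Example 5.25
a = solve_a_from_skewness(cs)
b, c = pareto_scale_location(mean, cv * mean, a)
print(f"a = {a:.4f}, b = {b:.0f}, c = {c:.0f}")
```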
$$\alpha = \frac{\sqrt{3}}{\pi}\,\sigma \tag{5.138}$$
and
$$m = \bar{x} \tag{5.139}$$
$$K_T = \frac{\sqrt{3}}{\pi}\ln(T - 1)$$
and the T-year return period flood is
$$x_T = m + \alpha \ln(T - 1)$$
Example 5.26 For the Amite River data given in Example 5.24, find the parame-
ters of the logistic distribution and a 100-year return period quantile.
Solution The parameters of the distribution are estimated as
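$$\alpha = \frac{\sqrt{3}}{\pi}(0.736 \times 28{,}676) = 11{,}636$$
$$m = 28{,}676$$
The 100-year flood, with the natural logarithm in the quantile relation, will be
$$X_{100} = 28{,}676 + 11{,}636 \ln(100 - 1) = 82{,}145 \ \text{cusec}$$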
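The logistic quantile can be checked against the CDF. The sketch below is an illustration under a stated assumption: the logistic CDF is taken as F(x) = 1/{1 + exp[−(x − m)/α]}, for which the scale α = √3 σ/π matches the variance and the logarithm in the quantile relation is the natural logarithm. The function names are hypothetical.

```python
import math

def logistic_params_from_moments(mean, std):
    """Method-of-moments estimates: m equals the mean, alpha = sqrt(3) * std / pi."""
    return mean, math.sqrt(3.0) * std / math.pi

def logistic_cdf(x, m, alpha):
    """Assumed CDF form F(x) = 1 / (1 + exp(-(x - m) / alpha))."""
    return 1.0 / (1.0 + math.exp(-(x - m) / alpha))

def logistic_quantile(T, m, alpha):
    """T-year quantile x_T = m + alpha ln(T - 1), i.e. F(x_T) = 1 - 1/T."""
    return m + alpha * math.log(T - 1.0)

m, alpha = logistic_params_from_moments(28676.0, 0.736 * 28676.0)
x100 = logistic_quantile(100.0, m, alpha)
print(f"alpha = {alpha:.0f}, x100 = {x100:.0f} cusec, F(x100) = {logistic_cdf(x100, m, alpha):.4f}")
```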
$$f_X(x) = \begin{cases} \dfrac{2(x - a)}{(b - a)(c - a)}, & \text{for } a \le x \le c \\[2ex] \dfrac{2(b - x)}{(b - a)(b - c)}, & \text{for } c \le x \le b \end{cases} \tag{5.140}$$
where a, b, and c are the minimum, maximum, and mode values of X as defined
in Fig. 5-24.
Figure 5-24 Probability density function for a triangular distribution.
The cumulative distribution function FX(x) for this triangular distribution is given by
$$F_X(x) = \begin{cases} \dfrac{(x - a)^2}{(b - a)(c - a)}, & \text{for } a \le x \le c \\[2ex] 1 - \dfrac{(b - x)^2}{(b - a)(b - c)}, & \text{for } c \le x \le b \end{cases} \tag{5.141}$$
Equations 5.140 and 5.141 are convenient to use when parameters a, b, and c
for the population of X are known. Population data for a random variable are
rarely available, however. Most often we have only sample data and their
characteristics, such as the mean (μx), coefficient of variation (CVx), and
skew (γx). Thus, it is necessary to determine parameters a, b, and c, given the
values of μx, CVx, and γx. The sample characteristics μx, CVx, and γx can be
represented in terms of a, b, and c as
$$\mu_X = \frac{1}{3}(a + b + c) \tag{5.142}$$
$$\mathrm{var}(X) = \sigma^2 = \frac{1}{18}\left[a^2 + b^2 + c^2 - (ab + bc + ca)\right] \tag{5.143}$$
$$CV_X = \frac{1}{\sqrt{2}}\,\frac{\sqrt{a^2 + b^2 + c^2 - (ab + bc + ca)}}{a + b + c} \tag{5.144}$$
$$\gamma_X = \frac{\sqrt{2}\left\{2\left(a^3 + b^3 + c^3\right) - 3\left[ab(a + b) + bc(b + c) + ca(c + a)\right] + 12abc\right\}}{5\left[a^2 + b^2 + c^2 - (ab + bc + ca)\right]^{3/2}} \tag{5.145}$$
If the values of μx, CVx, and γx are known then a unique triangle can be delin-
eated by determining its parameters a, b, and c. These parameters can be
obtained by using the following equation (Tyagi, 2000):
$$a = \mu_X\left\{1 + 2\sqrt{2}\,CV_X \cos\left[\frac{2\pi n}{3} + \frac{1}{3}\cos^{-1}\left(\frac{5}{2\sqrt{2}}\,\gamma_X\right)\right]\right\} \tag{5.146}$$
$$\mu_r' = E\left[X^r\right] = \frac{2\left[(b - c)\,a^{r+2} + (c - a)\,b^{r+2} + (a - b)\,c^{r+2}\right]}{(r + 1)(r + 2)(b - c)(c - a)(b - a)} \tag{5.147}$$
For a symmetrical triangle, γX = 0 and the parameters a, b, and c can be
obtained corresponding to n = 1, 0, and 2. The obtained c is the μX and the
parameters a and b are the same as those obtained using the method of
moments. The estimates of a and b are given as
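$$\hat{a} = \mu_X\left(1 - \sqrt{6}\,CV_X\right) \tag{5.148}$$
$$\hat{b} = \mu_X\left(1 + \sqrt{6}\,CV_X\right) \tag{5.149}$$
Using Eqs. 5.147, 5.148, and 5.149 one can rewrite the expression for E[X^r] as
$$E\left[X^r\right] = \frac{\mu_X^r}{6(r + 1)(r + 2)\,CV_X^2}\left[\left(1 + CV_X\sqrt{6}\right)^{r+2} + \left(1 - CV_X\sqrt{6}\right)^{r+2} - 2\right] \tag{5.150}$$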
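Equation 5.146 can be tested with a round trip: compute μX, CVX, and γX from a known triangle with Eqs. 5.142 to 5.145, then recover the triangle from those three numbers. The sketch below is an illustration with hypothetical function names. It assumes that evaluating Eq. 5.146 for n = 0, 1, 2 yields the three parameters in some order, so it sorts the results and takes the smallest as a, the largest as b, and the middle one as the mode c.

```python
import math

def triangular_moments(a, b, c):
    """Mean, CV, and skewness of a triangular distribution (Eqs. 5.142-5.145)."""
    s = a * a + b * b + c * c - (a * b + b * c + c * a)
    mean = (a + b + c) / 3.0
    cv = math.sqrt(s / 18.0) / mean
    num = (2.0 * (a ** 3 + b ** 3 + c ** 3)
           - 3.0 * (a * b * (a + b) + b * c * (b + c) + c * a * (c + a))
           + 12.0 * a * b * c)
    skew = math.sqrt(2.0) * num / (5.0 * s ** 1.5)
    return mean, cv, skew

def triangular_params(mean, cv, skew):
    """Evaluate Eq. 5.146 for n = 0, 1, 2 and return (a, b, c) = (min, max, mode)."""
    arg = 5.0 * skew / (2.0 * math.sqrt(2.0))
    arg = max(-1.0, min(1.0, arg))            # guard against rounding outside [-1, 1]
    phi = math.acos(arg) / 3.0
    roots = sorted(
        mean * (1.0 + 2.0 * math.sqrt(2.0) * cv * math.cos(2.0 * math.pi * n / 3.0 + phi))
        for n in range(3)
    )
    a, c, b = roots
    return a, b, c

# Round trip on an arbitrary test triangle: minimum 1, maximum 4, mode 2.
mean, cv, skew = triangular_moments(1.0, 4.0, 2.0)
print(triangular_params(mean, cv, skew))
```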
Example 5.28 For the previous example determine the various noncentral and
central moments.
Solution Substituting values of a, b, and c in Eq. 5.147 gives the first four
moments about the origin. Then, the relationship between the central and
noncentral moments is used to determine the moments about the mean. The
obtained results are as follows:
where m > 0 is a scale parameter, α > 0 and υ are shape parameters, and Kυ is the
modified Bessel function of the second kind of order υ. Equation 5.151 contains
three parameters and is a reparameterized form of the generalized inverse Gaus-
sian distribution (Good 1953). Taking υ = 0 specializes Eq. 5.151 into the har-
monic distribution.
The PDF of the Halphen type B distribution can be expressed as
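$$f_B(x) = \frac{2}{m^{2\upsilon}\,ef_{\upsilon}(\alpha)}\, x^{2\upsilon - 1} \exp\left[-\left(\frac{x}{m}\right)^2 + \alpha\left(\frac{x}{m}\right)\right] \tag{5.152}$$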
where m > 0 is a scale parameter, α and υ > 0 are shape parameters, and efυ(α ) is
the exponential factorial function (a normalizing function). The Halphen type B
distribution is used for modeling smaller values in data sets.
The PDF of the Halphen inverse B distribution can be written as
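$$f_{IB}(x) = \frac{2}{m^{-2\upsilon}\,ef_{\upsilon}(\alpha)}\, x^{-2\upsilon - 1} \exp\left[-\left(\frac{m}{x}\right)^2 + \alpha\left(\frac{m}{x}\right)\right] \tag{5.153}$$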
where parameters have the same connotation as in the case of the type B distri-
bution. The relation between the type B distribution and type IB distribution is
seen by noting that if X follows a type B distribution then 1/X follows a type IB
distribution.
For the tail behavior of these distributions on the relationship between
return period and quantile, Morlat (1956) and Ouarda et al. (1994) reported the
following:
1. For the Halphen type A, the gamma, and the Gumbel distributions,
x ∝ ln T .
5.6 Questions
5.1 Obtain one-day maximum rainfall for each year for Houston, Texas, for a
period of 30 to 50 years. Compute the mean, standard deviation, and
coefficient of variation of the daily maximum values. Fit a suitable distri-
bution. Give the parameter values. Do the same for two-day and three-
day maximum rainfall.
5.2 Use the maximum yearly discharge values for the Amite River at Dar-
lington for as long a period as you can get. Compute the mean, standard
deviation, and coefficient of variation of the instantaneous maximum
discharge values. Fit a suitable distribution.
5.3 A random variable X has a mean of 5,000 and a standard deviation of
1,000. Compute the probability that this variable will have a value less
than 7,000. Compute the probability that the random variable X will be
less than 3,000.
5.4 Assuming that the data of Example 4.18 follow a normal distribution,
compute the probability that the peak flow will be less than 50,000 cfs,
less than 30,000 cfs, and less than 20,000 cfs in any year. What is the
return period of each of these flows? What is the probability that the
peak flow will occur in any year between one standard deviation,
between two standard deviations, and between three standard devia-
tions on either side of the mean?
5.5 Consider a binomial random variable X with n = 30 and p = 0.3. Evaluate
the probability 3 < X ≤ 8 using the binomial as well as the normal approxi-
mation of the binomial distribution. What will be the answer when n = 50?
5.6 Assume that the peak flow data of Example 4.18 follow a two-parameter
log-normal distribution. Compute the parameters of the log-normal dis-
tribution. Compute the probability that the peak flow will be less than
150,000 cfs, less than 100,000 cfs, and less than 70,000 cfs in any year.
What is the probability that peak flow will occur in any year between
one standard deviation, between two standard deviations, and between
three standard deviations on either side of the mean?
5.7 Assume that the peak flow data of Example 4.18 follow a two-parameter
extreme value type I distribution. Compute the parameters of the distri-
bution and the probability that the peak flow in any year will be less
than 150,000 cfs, less than 100,000 cfs, and less than 70,000 cfs. What is
the probability that peak flow will occur in any year between one stan-
dard deviation, between two standard deviations, and between three
standard deviations on either side of the mean?
5.8 For the peak annual flow in a small stream it is found that mY = 500 m³/s
and σY = 200 m³/s. Compute the PDF of the extreme value type I distri-
bution for peak annual flow.
5.9 From the measured data of wind at an airport location, the mean and
standard deviation of the maximum annual wind velocity were mY =
50 km/hour and σY = 15 km/hour, respectively. Find the wind veloc-
ity that will be exceeded with a probability of 0.01 in any year.
5.10 For the data of annual maximum discharge for a river, n = 50, mean mX =
30,000 cusecs, coefficient of variation cv = 0.50, and coefficient of skew-
ness cs = 2.00. For these data, compute the parameters of the gamma
distribution.
5.11 Consider the normal distribution with parameters μ and CV. Graph the
normal distribution for μ = 1, CV = 0.01. Keep the value of μ constant but
increase CV by 25% in each step. Perform this calculation for about 20
steps and plot both the PDF and cumulative PDF. For each step also cal-
culate the P (x < 0) and plot it with respect to CV. What conclusion can
you draw from this plot about the nature of the probability distributions
of hydrologic and water quality variables?
5.12 A random variable X has a mean of 300 and a standard deviation of 100.
Compute the following probabilities:
(a) X will have a value less than 50.
(b) X will have a value more than 550.
(c) X will be between 50 and 550.
5.13 Consider a binomial random variable X with n = 22 and p = 0.7. Evaluate
the probability 4 < X ≤ 8 using the binomial distribution, the Poisson dis-
tribution, and the normal approximation of the binomial distribution.
Repeat the same calculation for n = 30, n = 50, and n = 100. Comment on
your results about the approximation of a binomial distribution by the
Poisson and normal distributions.
5.14 Choose n random numbers u1, u2,…, un from a uniform PDF in the interval
[0, 10] and obtain the distribution of the mean value μn = (u1 + u2 + … + un)/n.
Plot and compare the resulting PDFs of μn.
5.15 Choose n random numbers from an exponential PDF in the interval [0,
+∞ ] with the parameter λ = 10 and obtain the distribution of the mean μn.
Plot and compare the resulting PDFs.
5.16 Assume that the peak flow data of Example 3.17 follow a two-parameter
log-normal distribution. Compute the parameters of the log-normal dis-
tribution. Compute the probability that the peak flow will be less than
10,000 cfs, less than 8,000 cfs, and less than 15,000 cfs in any year. What is
the probability that peak flow will occur in any year between one
standard deviation on either side of the mean and between two standard
deviations on either side of the mean?
5.17 Assume that the peak flow data of Example 3.17 follow a two-parameter
extreme value type I distribution. Compute the parameters of the distri-
bution and the probability that the peak flow in any year will be less
than 10,000 cfs, less than 8,000 cfs, and less than 15,000 cfs. Compare
these probabilities and return periods with those computed in Question
5.16. Which distribution is a better choice to characterize the peak flow?
What is the probability that peak flow will occur in any year between
one standard deviation on either side of the mean and between two stan-
dard deviations on either side of the mean?
5.18 Assume that the peak flow data of Example 3.17 follow a two-parameter
gamma distribution. Compute the parameters of the distribution and the
probability that the peak flow in any year will be less than 10,000 cfs, less
than 8,000 cfs, and less than 15,000 cfs. Compare these probabilities and
return periods with those computed in Question 5.17. Do you think a
gamma distribution is a better choice to characterize the peak flow?
What is the probability that peak flow will occur in any year between
one standard deviation on either side of the mean and between two stan-
dard deviations on either side of the mean?
5.19 Repeat Question 5.18 using the log Pearson type III distribution.
5.20 For the peak annual flow in a stream, the mean and standard deviation of
the peak flow are 2,000 and 1,200 m3/s, respectively. Compute the PDF of
the peak flow distribution under the following assumptions:
(a) Peak flow is described by an extreme value type I distribution.
(b) Peak flow is described by a log-normal distribution.
(c) Peak flow is described by a gamma distribution.
(d) Peak flow is described by log Pearson type III distribution.
5.21 For National Pollutant Discharge Elimination System permits it is com-
mon practice to evaluate the water quality status during low flow condi-
tions. For the Chattahoochee River downstream of the Buford Dam at
Lake Lanier, the 7-day minimum flow is given in Table Q5-21. Using
these data, select the most appropriate distribution from various candi-
date distributions, such as the extreme value type I, log-normal, gamma,
log Pearson type III, etc. Explain, using the fitting characteristics, why
you feel that your selected distribution is the most appropriate choice.
Further, compute the probability of the low flows for the return periods
of 10, 20, 30, 40, 50, 60, 70, and 90 years.
5.22 From the measured data of air temperature at a given location, the mean
and standard deviations of the maximum temperature were 50°F and
15.5°F, respectively. Find the temperature that will be exceeded with prob-
ability of (a) 0.05°F in any year, (b) 5°F in any year, and (c) 0°F in any year.
Table Q5-21
Year  7-day min flow (cfs)  Year  7-day min flow (cfs)  Year  7-day min flow (cfs)  Year  7-day min flow (cfs)
1958 12.43 1970 15.29 1982 17.57 1994 31.86
1959 9.43 1971 26.14 1983 14.00 1995 14.00
1960 13.00 1972 14.29 1984 23.29 1996 24.29
1961 16.14 1973 28.00 1985 25.00 1997 12.43
1962 13.86 1974 19.86 1986 12.29 1998 13.86
1963 18.14 1975 41.43 1987 11.00 1999 10.87
1964 25.00 1976 18.71 1988 10.01 2000 7.36
1965 16.71 1977 17.71 1989 21.86 2001 12.57
1966 28.29 1978 9.21 1990 18.86 2002 5.47
1967 26.43 1979 15.86 1991 25.29 2003 24.71
1968 25.43 1980 22.43 1992 20.86 2004 41.14
1969 24.43 1981 6.20 1993 14.29
5.23 Assume that annual peak flow of Salt Creek at the USGS gauge near
Rowell, Illinois, is described by the GEV distribution as given in Question
3.22. Compute the mean and standard deviation of the peak flow data.
Also, compute the parameters of the distribution.
5.24 The data of annual maximum stage for Salt Creek near Rowell, Illinois,
has been given in Question 3.22. For the stage data, n = 34, mX = 20.34 ft,
CV = 0.16, and cs = 0.53. For these data, compute the parameters of the
gamma distribution.
5.25 Repeat Question 5.24 using the Pareto distribution.
5.26 Determine the parameters of a random variable X if it is defined by a
triangular distribution with mean, CV, and skew of 100, 0.33, and 0.1,
respectively. Sketch both the frequency and cumulative distribution
functions. Further, sketch both the frequency and cumulative distribu-
tion functions by increasing the CV by 10% and varying the skew
between − 0.55 and 0.55. For each case determine the probability of X <
0. What is the maximum skew that a triangular distribution can
describe?
5.27 Assume that peak discharge Q is exponentially distributed with mean
$\mu_Q$ and variance $\sigma_Q^2$. What is the probability distribution of stage S?
Suppose stage and discharge are related by $Q = aS^b$.
5.28 A set of data having a mean of 6.5 and a standard deviation of 2.5 is
thought to follow the extreme value type I distribution for minima. What
Chapter 6
Impulse Response Functions as Probability Distributions
6.1 Hypothesis
It is hypothesized that if an environmental variable is described by a linear or lin-
earized governing equation, then the solution of this equation for a unit impulse
(or Dirac delta) function (UIF) can be interpreted as a PDF for describing the
probabilistic properties of the random variable, say, X. The solution for the UIF
can be characterized as the UIR or h(t). If the UIR is a function of time t, then the
PDF is a mapping from the (h, t) plane to the (f, x) plane, where x is the value (or
quantile) of the random variable X for which h(x, t) is desired, and f is the PDF.
There are many environmental variables that can be reasonably well
described linearly. If some of the variables cannot be described linearly in the real
domain, then they can be described linearly in the logarithmic domain or in an
appropriate transformed domain. Examples of linear approximation are surface
runoff from rainfall excess, river flow, monthly sediment discharge, and solute
concentration in a tube or soil. Thus, their UIRs can be considered as their PDFs.
It is not surprising that several probability distributions have found their niche in
linear environmental analyses. This hypothesis will be explored in what follows.
$$I(t) - Q(t) = \frac{dS(t)}{dt} = k\,\frac{dQ}{dt} \tag{6.3}$$
Solution of Eq. 6.3 gives
Qt = I[1 – exp(–t/k)] t≤ D (6.4)
and
Qt = QP exp[–(t–D)/k] t≥D (6.5)
where D is the inflow duration, and QP is the peak of outflow hydrograph given by
QP = I[1 – exp(–D/k)] (6.6)
Figure 6-1 Depiction of linear reservoir concept: (a) lag time, (b) IUH, and
(c) hydrograph due to a pulse of D hour duration.
The linear reservoir has been used for rainfall–runoff modeling either by
itself or as an element of a network model. For instantaneous inflow that fills the
reservoir in time t = 0,
$$Q = \frac{S}{k}\exp(-t/k) \tag{6.7}$$
If I(t) is denoted by a unit delta function δ(t), then the UIR of the linear reservoir,
h(t), is
$$h(t) = \frac{\exp(-t/k)}{k} \tag{6.8}$$
In hydrology, h(t) is known as the instantaneous unit hydrograph (IUH). The
determination of h(t), the impulse response function (or the kernel function or
Green’s function) of a system from input and output data, is known as system
identification. Convolution of the impulse response function with the system
inputs gives the system output. Then, the PDF of a variable described by a linear
reservoir becomes
$$f(x) = \frac{\exp(-x/k)}{k} \tag{6.9}$$
where x is the quantile of the variable X described by the linear reservoir and k is
a parameter. Thus, it is seen that h(t) is mapped onto f(x). Equation 6.9 is an
exponential density function and is widely used in environmental and water
resources. For example, if an environmental process is described by the Poisson
process then the interarrival times follow an exponential distribution. Interar-
rival times of floods can be modeled by using Eq. 6.9. Rainfall depth, intensity,
and duration have been modeled with Eq. 6.9. It should be noted that k in f(x)
represents the average of X and hence its interpretation from Eq. 6.8 remains
unchanged under mapping of h(t) onto f(x).
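As a quick check of this mapping, the worked integrals below (standard results for the exponential density, written in the notation of Eq. 6.9) confirm that f(x) has unit area for any k > 0 and that its mean equals k:

```latex
\int_0^{\infty} \frac{1}{k}\,e^{-x/k}\,dx = \Bigl[-e^{-x/k}\Bigr]_0^{\infty} = 1,
\qquad
E[X] = \int_0^{\infty} \frac{x}{k}\,e^{-x/k}\,dx = k
```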
Another modification of the linear reservoir involves restating the unit delta
function δ(t) as δ(t – t0), where t0 is the time at which the function occurs. In that
case, h(t) of Eq. 6.8 becomes
$$h(t) = \frac{\exp[-(t - t_0)/k]}{k} \tag{6.10}$$
Equation 6.10 is the UIR of a lag and route linear reservoir system in which t0
is the amount of lag time before water is released from the reservoir. This is
equivalent to a linear reservoir and a linear channel, connected in series. By
mapping Eq. 6.10 onto the probability plane, the PDF becomes
$$f(x) = \frac{\exp[-(x - x_0)/k]}{k} \tag{6.11}$$
where x0 is the threshold of X, x ≥ x0. The threshold is the minimum value of X.
This is useful in frequency analysis of environmental data.
Example 6.1 Consider a linear reservoir with a lag time of 10 hours. This reser-
voir receives a pulse of 10 m3/s for a duration of 5 hours. Determine the peak
outflow and graph the outflow hydrograph. Also graph the outflow hydrograph
when lag time is 5 hours and compare the two hydrographs.
Solution The peak of the hydrograph can be computed from Eq. 6.6:
QP = I[1 – exp(–D/k)] = 10[1 – exp(–5/10)] = 3.93 cumec
The hydrograph for both cases is plotted in Fig. 6-2. Notice that the peak is
higher when the lag time is smaller and the recession is slower and lasts longer
when k is larger. Clearly, a larger catchment will have a longer lag time. This
hydrograph can also be considered as a probability density function of peak dis-
charge exceeding a given threshold.
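The calculation in Example 6.1 can be sketched in a few lines of Python. This is a minimal illustration of Eqs. 6.4 to 6.6 under the example's stated values (inflow 10 m3/s, duration 5 hours); the function names are hypothetical and not from the source.

```python
import math

def peak_outflow(inflow, duration, k):
    """Peak of the outflow hydrograph (Eq. 6.6)."""
    return inflow * (1.0 - math.exp(-duration / k))

def linear_reservoir_outflow(t, inflow, duration, k):
    """Outflow at time t for a constant inflow pulse (Eqs. 6.4 and 6.5)."""
    if t <= duration:
        # Rising limb while the pulse is still entering the reservoir
        return inflow * (1.0 - math.exp(-t / k))
    # Recession limb after the pulse has ended
    return peak_outflow(inflow, duration, k) * math.exp(-(t - duration) / k)

# Values from Example 6.1: pulse of 10 m3/s lasting 5 hours, k = 10 and k = 5 hours
for k in (10.0, 5.0):
    hydrograph = [linear_reservoir_outflow(t, 10.0, 5.0, k) for t in range(0, 31, 5)]
    print(k, round(peak_outflow(10.0, 5.0, k), 2), [round(q, 2) for q in hydrograph])
```

Evaluating the function on a finer time grid gives the ordinates needed to graph both hydrographs.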
$$h(t) = -\frac{\alpha}{1-\alpha}\,\delta(t) + \frac{1}{K(1-\alpha)^2}\exp\left[-\frac{t}{K(1-\alpha)}\right] \tag{6.13}$$
It has been shown that modeling flood routing along a short reach of a lowland
river may result in a negative value of α and
$$0 \le -\frac{\alpha}{1-\alpha} \le 1 \tag{6.14}$$
Figure 6-2 Outflow hydrograph from the linear reservoir for two values of lag time.
[Figure: f(t) plotted against time (days)]
Denoting $1/(1-\alpha) = \beta$ and $\beta/K = \gamma$ and renaming t as x, one gets a two-parameter
probability distribution function:
$$f(x) = (1-\beta)\,\delta(x) + \beta\gamma\exp(-\gamma x) \tag{6.15}$$
The PDF given by Eq. 6.15 is a weighted sum of two functions: a delta func-
tion and an exponential function. It is interesting to note that in this function
parameter β is a weighting factor and parameter K = β/γ becomes the average of
X. Thus, the original expressions of the weighting factor and the average travel
time are modified under mapping, but the conceptual meaning of the modified
expressions remains more or less intact. Equation 6.15 is useful for frequency
analysis of floods with zero values as well as flood damage.
Example 6.2 Let the average travel time of a Muskingum lowland reach, K, be 2
days, and the weighting coefficient parameter be –0.1. Determine the impulse
response function of reach outflow.
Solution According to Eq. 6.15,
$$\beta = \frac{1}{1-\alpha} = \frac{1}{1+0.1} = 0.91, \qquad \gamma = \beta/K = 0.91/2 = 0.455$$
Then
$$f(t) = 0.09\,\delta(t) + 0.414\exp(-0.455t)$$
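The parameter mapping used in Example 6.2 can be sketched as follows. The function name is hypothetical. Because the code does not round β before computing γ, the weight βγ comes out slightly below the 0.414 obtained above from rounded intermediate values.

```python
def muskingum_pdf_parameters(K, alpha):
    """Map Muskingum parameters (K, alpha) to (beta, gamma) of Eq. 6.15."""
    beta = 1.0 / (1.0 - alpha)   # weighting factor of the exponential part
    gamma = beta / K             # rate parameter of the exponential part
    return beta, gamma

# Values from Example 6.2: K = 2 days, alpha = -0.1
beta, gamma = muskingum_pdf_parameters(K=2.0, alpha=-0.1)
print("weight on delta(t):      ", round(1.0 - beta, 3))
print("weight on exp(-gamma t): ", round(beta * gamma, 3))
print("gamma:                   ", round(gamma, 3))
print("mean beta/gamma (= K):   ", round(beta / gamma, 3))
```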
$$h(t) = \frac{1}{k\,\Gamma(n)}\left(\frac{t}{k}\right)^{n-1}\exp(-t/k) \tag{6.16}$$
where k is the storage parameter of each reservoir and Γ(n) is the gamma function of n.
Since there are n reservoirs, nk represents the total lag time (or the average resi-
dence time) of the system. By mapping onto the probability plane, the PDF
becomes
$$f(x) = \frac{1}{k\,\Gamma(n)}\left(\frac{x}{k}\right)^{n-1}\exp(-x/k) \tag{6.17}$$
where k and n are parameters. Equation 6.17 is the gamma probability density
function. The gamma distribution results from the sum of exponentials, where n
is the number of exponentials. In deterministic parlance, each exponential repre-
sents a linear reservoir. Thus, the deterministic interpretation of parameters is
carried over through mapping. The gamma distribution is one of the most com-
monly used probability distributions for environmental frequency analysis.
If an environmental system satisfies the requirement that h(t) > 0 if t ≥ t0 then
the UIR becomes
$$h(t) = \frac{1}{k\,\Gamma(n)}\left(\frac{t-t_0}{k}\right)^{n-1}\exp[-(t-t_0)/k] \tag{6.18}$$
and, mapped onto the probability plane, the PDF becomes
$$f(x) = \frac{1}{k\,\Gamma(n)}\left(\frac{x-x_0}{k}\right)^{n-1}\exp[-(x-x_0)/k] \tag{6.19}$$
which represents the three-parameter Pearson type III probability density func-
tion. This is equivalent to a cascade of linear reservoirs and channels connected
in series. This is one of the most widely used frequency distributions in hydrol-
ogy and environmental sciences. Note that Eq. 6.17 is a special case of Eq. 6.19.
Here parameter x0 is the lowest value or threshold of the variable X. Although
these parameters, k, n, and x0, can be interpreted by using the deterministic anal-
ogy, their optimal values are better found by curve fitting. This means that under
mapping onto the probability plane, the interpretation of the parameters may be
somewhat transformed.
Example 6.3 Take n as 3 and k as 6 hours. Compute the probability density func-
tion of peak discharge. Assume the lowest value or threshold of discharge as 100
cumecs.
Solution Substituting n = 3, k = 6, and x0 = 100 into Eq. 6.19, we have the proba-
bility density function of discharge as
$$f(x) = \frac{1}{6\,\Gamma(3)}\left(\frac{x-100}{6}\right)^{2}\exp[-(x-100)/6]$$
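The density of Example 6.3 can be evaluated numerically with a short sketch such as the one below. The function names and the trapezoidal integration helper are hypothetical additions for illustration. The code assumes n ≥ 1 and simply checks that Eq. 6.19 integrates to approximately one.

```python
import math

def pearson3_pdf(x, n, k, x0):
    """Pearson type III density of Eq. 6.19 (assumes n >= 1 and k > 0)."""
    if x < x0:
        return 0.0
    z = (x - x0) / k
    return z ** (n - 1) * math.exp(-z) / (k * math.gamma(n))

def trapezoid(f, a, b, steps):
    """Composite trapezoidal rule on [a, b]."""
    h = (b - a) / steps
    total = 0.5 * (f(a) + f(b))
    for i in range(1, steps):
        total += f(a + i * h)
    return total * h

# Values from Example 6.3: n = 3, k = 6, x0 = 100
n, k, x0 = 3, 6.0, 100.0
area = trapezoid(lambda x: pearson3_pdf(x, n, k, x0), x0, x0 + 60.0 * k, 36000)
print("area under the PDF:", round(area, 6))
print("density at x = 112:", pearson3_pdf(112.0, n, k, x0))
```

The upper integration limit of x0 + 60k is an arbitrary cutoff chosen so that the neglected tail is negligible.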
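$$a\,\frac{\partial^2 Q}{\partial y^2} + b\,\frac{\partial^2 Q}{\partial y\,\partial t} + c\,\frac{\partial^2 Q}{\partial t^2} = d\,\frac{\partial Q}{\partial y} + e\,\frac{\partial Q}{\partial t} \tag{6.20}$$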
Figure 6-4 Probability density function of cascade of reservoirs in series.
the original Eq. 6.20, which is hyperbolic. Its solution for a semi-infinite channel,
known under the name of the linear diffusion analogy (LDA) or the convective–
diffusion solution, has the form
$$h(y,t) = \frac{y}{\sqrt{4\pi D t^3}}\exp\left[-\frac{(y-ut)^2}{4Dt}\right] \tag{6.21}$$
where y is the length of the channel reach, t is the time, u is the convective veloc-
ity, and D is the hydraulic diffusivity. Both u and D are functions of channel and
flow characteristics at the reference steady-state condition. Besides flood rout-
ing, LDA has been applied by Moore and Clarke (1983) and Moore (1984) as a
transfer function of a sediment routing model.
The function given by Eq. 6.21 is rarely quoted in statistical literature. It was
derived by Cox and Miller (1965, p. 221) as the probability density function of
the first passage time T for a Wiener process starting at 0 to reach an absorbing
barrier at a point x, where u is the positive drift and D is the variance of the
Wiener process. Tweedie (1957) termed the density function of Eq. 6.21 an
inverse Gaussian PDF, Johnson and Kotz (1970) summarized its properties, and
Folks and Chhikara (1978) provided a review of its development. The function in
Eq. 6.21 has been applied by Strupczewski et al. (2001) as a flood frequency
model expressed as
$$f(x) = \frac{\alpha}{\sqrt{\pi x^3}}\exp\left[-\frac{(\alpha-\lambda x)^2}{x}\right] \tag{6.22}$$
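One way to see how Eq. 6.22 relates to Eq. 6.21 is through a change of parameters. The substitution below is an assumption made here for illustration (the text does not state it explicitly). It treats the reach length y as fixed, with t to be renamed x afterward:

```latex
\alpha = \frac{y}{2\sqrt{D}}, \qquad \lambda = \frac{u}{2\sqrt{D}}
\quad\Longrightarrow\quad
\frac{(y-ut)^2}{4Dt} = \frac{(\alpha-\lambda t)^2}{t}, \qquad
\frac{y}{\sqrt{4\pi D t^3}} = \frac{\alpha}{\sqrt{\pi t^3}}
```

Under this reading, the ratio α/λ equals y/u, the travel time of the reach at the convective velocity.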
$$f(x) = \frac{\alpha}{\sqrt{\pi x^3}}\exp\left[-\frac{(\alpha-\lambda[x-\varepsilon])^2}{[x-\varepsilon]}\right] \tag{6.23}$$
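$$h(x,t) = P_0(\lambda)\,\delta(t-\Delta) + \sum_{i=1}^{\infty} P_i(\lambda)\cdot h_i\!\left(\frac{t-\Delta}{\alpha}\right)\cdot 1(t-\Delta) \tag{6.24}$$
where
$$P_i(\lambda) = \frac{\lambda^i}{i!}\exp(-\lambda) \tag{6.24a}$$
is the Poisson distribution,
$$h_i\!\left(\frac{t}{\alpha}\right) = \frac{1}{\alpha\,(i-1)!}\left(\frac{t}{\alpha}\right)^{i-1}\exp\left(-\frac{t}{\alpha}\right) \tag{6.24b}$$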
is the gamma distribution, and 1(t) is the unit step function. Parameters α , λ,
and Δ are functions of both channel geometry and flow conditions, which are
different for the two models. Furthermore, there is no time lag (Δ) in the impulse
response function of the KD model.
Both models can be considered as hydrodynamic and conceptual. Note that
the solution of both models can be represented in terms of basic conceptual ele-
ments used in hydrology, namely, a cascade of linear reservoirs and a linear
channel in case of the RF model. The upstream boundary condition is delayed
by a linear channel with time lag Δ, divided according to the Poisson distribution
with mean λ, and then transformed by parallel cascades of equal linear reservoirs
(with time constant α) of varying lengths. Note that λ is the average number of
reservoirs in a cascade. Strupczewski et al. (1989) and Strupczewski and Napi-
orkowski (1989) have derived the distributed Muskingum model from the multi-
ple Muskingum model and have shown its identity to the KD model. Similarly,
the RF model happens to be identical to the distributed delayed Muskingum
model (Strupczewski and Napiorkowski 1990c).
Einstein (1942) introduced the function given by Eq. 6.21 to hydrology as the
mixed deterministic–stochastic model for the transportation of bed load. It has
also been used as the PDF of the total rainfall depth derived from the assump-
tion of a Poisson process for storm arrivals and an exponential distribution for
storm depths (Eagleson 1978). The function in Eq. 6.24 is considered to be a flood
frequency model. An example of such a model is shown in Fig. 6-5.
Figure 6-5 Empirical and two theoretical KD cumulative distribution functions for the
Big Lost River, Arco, Idaho. MOM and MLM estimated parameters are shown. Solid
line: MOM estimated CDF, dotted line: MLM estimated CDF.
The RF model can be employed for modeling samples censored by the value.
If the delay is equated to zero and t is renamed as x then Eq. 6.24 yields a two-
parameter probability distribution of the form
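$$f(x) = P(z=0)\,\delta(x) + \sum_{i=1}^{\infty} P(z=i)\cdot h_i\!\left(\frac{x}{\alpha}\right)\cdot I(x) \tag{6.25}$$
where
$$P(z=i) = P_i(\lambda) = \frac{\lambda^i}{i!}\exp(-\lambda) \tag{6.25a}$$
and
$$h_i\!\left(\frac{x}{\alpha}\right) = \frac{1}{\alpha\,(i-1)!}\left(\frac{x}{\alpha}\right)^{i-1}\exp\left(-\frac{x}{\alpha}\right) \tag{6.25b}$$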
I(x) is the unit step function. Equation 6.25 differs from Eq. 6.24 since its second
term cannot be expressed as the product of the probability of nonzero value (i.e.,
(1 − P0 ( λ)) ) and the conditional PDF (i.e., f1(x, g) with β not included in g, where
β = exp(− λ) and g =[ α, β] in the KD distribution function). The second term of
the PDF,
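$$f_c = \sum_{i=1}^{\infty} P_i(\lambda)\,h_i\!\left(\frac{x}{\alpha}\right) \tag{6.26}$$
can be expressed by the first-order modified Bessel function of the first kind,
$$I_1(z) = \sum_{i=1}^{\infty} \frac{1}{(i-1)!\,i!}\left(\frac{z}{2}\right)^{2(i-1)+1} \tag{6.27}$$
as
$$f_c(x) = \exp\left(-\lambda-\frac{x}{\alpha}\right)\sqrt{\frac{\lambda}{\alpha x}}\;I_1\!\left(2\sqrt{\frac{\lambda x}{\alpha}}\right) I(x) \tag{6.28}$$
Thus, Eq. 6.25 can be written as
$$f(x) = P_0(\lambda)\,\delta(x) + \exp\left(-\lambda-\frac{x}{\alpha}\right)\sqrt{\frac{\lambda}{\alpha x}}\;I_1\!\left(2\sqrt{\frac{\lambda x}{\alpha}}\right) I(x) \tag{6.29}$$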
which is the KD–PDF.
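The equivalence between the Poisson-weighted gamma sum and the Bessel-function form can be checked numerically. The sketch below is an illustration only: the function names, the truncation at 60 terms, and the sample values of x, α, and λ are assumptions chosen here, and the Bessel function is evaluated from its own power series rather than from a library routine.

```python
import math

def bessel_i1(z, terms=60):
    """First-order modified Bessel function of the first kind, summed from its power series."""
    return sum(
        (z / 2.0) ** (2 * i - 1) / (math.factorial(i - 1) * math.factorial(i))
        for i in range(1, terms + 1)
    )

def kd_continuous_series(x, alpha, lam, terms=60):
    """Continuous part of the KD PDF as a Poisson-weighted sum of gamma densities."""
    total = 0.0
    for i in range(1, terms + 1):
        p_i = math.exp(-lam) * lam ** i / math.factorial(i)          # Poisson weight
        h_i = ((x / alpha) ** (i - 1) * math.exp(-x / alpha)
               / (alpha * math.factorial(i - 1)))                    # gamma density
        total += p_i * h_i
    return total

def kd_continuous_closed(x, alpha, lam):
    """Continuous part of the KD PDF in closed form (valid for x > 0)."""
    return (math.exp(-lam - x / alpha)
            * math.sqrt(lam / (alpha * x))
            * bessel_i1(2.0 * math.sqrt(lam * x / alpha)))

# Hypothetical parameter values, used only to compare the two forms
x, alpha, lam = 3.0, 2.0, 1.5
print(kd_continuous_series(x, alpha, lam), kd_continuous_closed(x, alpha, lam))
```

The two printed values should agree to within floating-point rounding, since both are truncations of the same series.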
$$\frac{\partial C}{\partial t} = D\,\frac{\partial^2 C}{\partial y^2} + \left(\frac{M}{A}\right)\delta(y-y_0)\,\delta(t) \tag{6.30}$$
where D is the diffusion coefficient, δ(y – y0) is a Dirac delta function of (y – y0), δ(t)
is a Dirac delta function of t, and y0 is the location where the mass is inserted at
time t = 0. A Dirac delta function has the property that it is equal to zero if the
argument is nonzero; when the argument is zero, the Dirac delta function becomes
infinite. The definition of the Dirac delta requires that the product δ(x) dx is
dimensionless. Thus, the units of the Dirac delta are the inverse of those of the
argument x. That is, δ(x) has units meters^–1, and δ(t) has units sec^–1 (Scott 1955).
The first boundary condition states that there is no diffusion of dye through
the closed left end of the tube at y = 0:
$$\frac{\partial C}{\partial y} = 0, \quad \text{at } y = 0 \tag{6.31}$$
C(y, 0) = 0 (6.32)
The second boundary condition states that the concentration and the concen-
tration flux are zero at infinity. (More generally, all of the terms in the Taylor
series expansion of the concentration are zero at infinity.) The second boundary
condition is stated for the terms of the Taylor series of concentration as
$$C = 0,\ \frac{\partial C}{\partial y} = 0,\ \ldots,\ \frac{\partial^n C}{\partial y^n} = 0,\ \ldots, \quad \text{as } y \to \infty \tag{6.33}$$
The initial condition states that there is no dye in the tube at time zero.
Using the integral transform method gives the solution of Eq. 6.30 subject to
Eq. 6.31 to Eq. 6.33 (Özisik 1968; Cleary and Adrian 1973):
$$C(x,t) = \frac{M}{A}\,\frac{1}{\sqrt{4\pi D t}}\left(\exp\left[-\frac{(x-x_0)^2}{4Dt}\right] + \exp\left[-\frac{(x+x_0)^2}{4Dt}\right]\right) \tag{6.34}$$
We now reduce the number of terms in Eq. 6.34, normalize the equation so
that it represents a unit mass injected over a unit area, and map onto the proba-
bility plane by introducing a frequency term instead of concentration. These
changes make Eq. 6.34 resemble a probability distribution. The term “4Dt”
appears together in Eq. 6.34. We define a new term
σ2 = 2 D t (6.35)
In addition, the mass and cross-sectional area are combined with concentra-
tion, σ is held constant so it is treated as a parameter, and a new term f(x; x0, σ) is
introduced, so that f(x; x0, σ) = AC(x, t)/M, which has units length–1. The result is
the equation
$$f(x;\sigma,x_0) = \frac{1}{\sqrt{2\pi}\,\sigma}\left\{\exp\left[-\left(\frac{x-x_0}{\sqrt{2}\,\sigma}\right)^2\right] + \exp\left[-\left(\frac{x+x_0}{\sqrt{2}\,\sigma}\right)^2\right]\right\} \tag{6.36}$$
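A short numerical check can confirm that Eq. 6.36 behaves as a probability density on x ≥ 0. The sketch below assumes the reading of Eq. 6.36 in which both exponents are squared ratios with √2·σ in the denominator, which follows from substituting σ² = 2Dt into Eq. 6.34. The function names and the values x0 = 1 and σ = 0.5 are hypothetical.

```python
import math

def reflected_normal_pdf(x, x0, sigma):
    """Sum of two Gaussian terms centered at +x0 and -x0, defined for x >= 0."""
    if x < 0.0:
        return 0.0
    scale = math.sqrt(2.0) * sigma
    coeff = 1.0 / (math.sqrt(2.0 * math.pi) * sigma)
    return coeff * (math.exp(-((x - x0) / scale) ** 2)
                    + math.exp(-((x + x0) / scale) ** 2))

def trapezoid(f, a, b, steps):
    """Composite trapezoidal rule on [a, b]."""
    h = (b - a) / steps
    total = 0.5 * (f(a) + f(b))
    for i in range(1, steps):
        total += f(a + i * h)
    return total * h

# Hypothetical parameters; the upper limit lies 10 sigma beyond x0
x0, sigma = 1.0, 0.5
area = trapezoid(lambda x: reflected_normal_pdf(x, x0, sigma), 0.0, x0 + 10.0 * sigma, 6000)
print("area on [0, inf):", round(area, 6))
```

The mass that a single Gaussian centered at x0 would place on x < 0 is returned to x ≥ 0 by the mirrored term, which is why the total is one.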
6.3 Application
In the frequency analyses of environmental (say, hydrological) data in arid and
semiarid regions, one often encounters data series that contain several zero values
with zero being the lower limit of the variability range. From the viewpoint of
probability theory, the occurrence of zero events can be expressed by placing a
nonzero probability mass on a zero value (i.e., P(X = 0) ≠ 0, where X is the random
variable and P is the probability mass). Therefore, the distribution functions from
which such hydrological series were drawn would be discontinuous with discon-
tinuity at the zero value having a form
f ( x ) = (1 − β ) δ ( x ) + fc ( x ; h) ⋅ 1( x ) (6.37)
where (1 – β) denotes the probability of the zero event, that is, $1-\beta = P(X=0)$,
$f_c(x;h)$ is the continuous function such that $\int_0^{\infty} f_c(x;h)\,dx = \beta$, h is the vector
of parameters, δ(x) is the Dirac delta function, and 1(x) is a unit step function.
The estimation procedures for hydrologic samples with zero events have
been a subject of several publications. The theorem of total probability has been
employed (Jennings and Benson 1969, Woo and Wu 1989, Wang and Singh 1995)
to model such series. Then, Eq. 6.37 takes the form
$$f(x) = (1-\beta)\,\delta(x) + \beta\,f_1(x;g)\cdot 1(x) \tag{6.38}$$
where f1(x; g) is the conditional probability density function (CPDF), that is,
$f_1(x;g) \equiv f_1(x;g \mid X>0)$, which is continuous in the range (0, +∞) with a lower
bound of zero value. Wang and Singh (1995) estimated β and the parameters of
the CPDF separately by considering the positive values as a full sample. Having
estimated g and β allows one to transform the conditional distribution to the
marginal distribution [i.e., to f(x)] by Eq. 6.38. Among several PDFs with zero lower
bound recognized in flood frequency analysis (FFA) (e.g., Rao and Hamed 2000),
the gamma distribution given by Eq. 6.17 was chosen by Wang and Singh (1995)
as an example of a CPDF and four estimation methods were applied: the maxi-
mum likelihood method (MLM), the method of moments (MOM), probability
weighted moments (PWM), and the ordinary least-squares method. By using
monthly precipitation and annual low-flow data from China, and annual maxi-
mum peak discharge data from the United States, the suitability of the distribu-
tion and the estimation methods was assessed. The histogram and the estimated
PDF of all three series indicated a reverse J-shape without mode, whereas the
value of the coefficient of variation of f1(x; g) was close to one, pointing out a
good fit of data to the Muskingum-originated PDF given by Eq. 6.15.
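The two-step estimation idea (estimate β from the fraction of nonzero values, then fit the conditional PDF to the positive values alone) can be sketched as below. This is only an illustration using the method of moments for a gamma CPDF; it is not the procedure or code of Wang and Singh (1995), the function name is hypothetical, and the sample values are made up.

```python
import statistics

def fit_zero_inflated_gamma_mom(sample):
    """Estimate beta = P(X > 0) and gamma parameters (n, k) of the positive part by moments."""
    positives = [v for v in sample if v > 0]
    beta = len(positives) / len(sample)      # probability of a nonzero event
    mean = statistics.mean(positives)
    var = statistics.variance(positives)     # sample variance (divisor m - 1)
    scale_k = var / mean                     # gamma mean = n k, variance = n k^2
    shape_n = mean * mean / var
    return beta, shape_n, scale_k

# Made-up sample with two zero events, for illustration only
sample = [0, 0, 1, 2, 3, 4]
beta, shape_n, scale_k = fit_zero_inflated_gamma_mom(sample)
print("beta:", round(beta, 3), "n:", round(shape_n, 3), "k:", round(scale_k, 3))
```

The marginal density then follows by weighting the fitted conditional gamma density by β and placing mass 1 − β at zero.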
Among positively skewed distributions, the log-normal (LN) distribution,
together with the gamma, is the most frequently used in environmental
frequency analysis. The LN distribution has been found to describe hydraulic
conductivity in a porous medium (Freeze 1975), annual peak flows, raindrop
sizes in a storm, and other hydrologic variables. Chow (1954) reasoned that this
distribution is applicable to hydrologic variables formed as the product of other
variables since if X = X1 · X2 · … Xn then Y = log X tends to the normal distribu-
tion for large n provided that the Xi are independent and identically distributed.
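Chow's argument can be written out in one line. The sketch below additionally assumes that ln X_i has a finite mean m and a finite variance s² (symbols introduced here for illustration), which is the usual condition for the central limit theorem:

```latex
\ln X = \sum_{i=1}^{n} \ln X_i
\quad\Longrightarrow\quad
\frac{\ln X - n\,m}{s\sqrt{n}} \;\xrightarrow{\;d\;}\; N(0,1) \quad \text{as } n \to \infty
```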
Kuczera (1982d) considered six alternative PDFs and found the two-
parameter LN to be most resistant to an incorrect distributional assumption in
at-site analysis and also while combining site and regional flood information.
Strupczewski et al. (2001) fitted seven two-parameter distribution functions—
namely, normal, log-normal, gamma, Gumbel (extreme value type I), Weibull, log-Gumbel,
and log-logistic—to thirty-nine 70-year long annual peak flow series of Polish
rivers. The criterion of the maximum log-likelihood value was used for the best
model choice. From these competing models, the log-normal was selected in 32
cases out of 39, the gamma in 6 cases, the Gumbel in one case, and the remaining
four were not identified as the best model even in a single case.
Figure 6-6 Empirical and two theoretical LD cumulative distribution functions for the
Warta River, Skwierzyna cross-section data. MOM and MLM estimated parameters are
shown. Solid line: MOM estimated CDF, dotted line: MLM estimated CDF.
Application of the function given by Eq. 6.24 to flood data reveals that for
λ < 2, which corresponds to the probability of the zero event being equal to
P0(λ = 2) = exp(–2) = 0.135, the PDF has a reverse J-shape without mode. By
modeling longer time series, it may be reasonable to introduce a third parame-
ter to the model of Eq. 6.22, making the shape of the continuous part of the dis-
tribution independent of the probability of zero event:
$$f(x) = \beta\cdot\delta(x) + \frac{1-\beta}{1-e^{-\lambda}}\sum_{i=1}^{\infty} P_i(\lambda)\cdot h_i\!\left(\frac{x}{\alpha}\right)\cdot 1(x) \tag{6.39}$$
where $P_i(\lambda)$ and $h_i(x/\alpha)$ are defined by Eq. 6.25a and Eq. 6.25b.
6.4 Summary
To summarize, one can conclude the following:
1. The unit impulse response functions of linear or linearized physically
based models form suitable models for environmental frequency analy-
sis.
2. Many of the unit response functions are found to be the same as those
that have been used in statistics for a long time.
3. The use of the UIR functions can provide a physical basis to many of the
statistical distributions.
4. The UIR approach provides a hope for linking deterministic and stochas-
tic frequency models.
6.5 Questions
6.1 Consider a linear reservoir whose impulse response function can be
defined as
$$h(t) = \frac{1}{k}\exp(-t/k)$$
where h(t) is the impulse response at time t and k is a parameter, called
the lag time. Integrate h(t) over time and show that it can be considered
as a probability density function regardless of the value of k. Plot h(t)
(1/hour). In the probability domain, h(t) will assume the role of a proba-
bility density function and t will be the random variable and its value
will be a quantile.
6.2 Integration of the impulse response function in Question 6.1, U(t), can be
expressed as
U (t) = 1 − exp(−t / k )
Here U(t) takes on the role of the cumulative distribution function in the
probability domain. Plot U(t) for k =1, 2, 5, 10, and 15 hours and interpret
these plots physically.
6.3 The impulse response function for an aquifer interacting with a stream
can be expressed as
$$h(t) = \frac{1}{S_c}\exp[-a(t-t_0)/S_c]$$
where Sc is the storage coefficient and is equal to specific yield for uncon-
fined aquifers, a (1/time) is the subsurface flow constant, and t0 is the
initial time. In the probability domain this can also be considered as a
probability density function with two parameters and a
threshold value of t0. Plot the UIR for different values of Sc: 0.01, 0.05, 0.1,
0.2, 0.3, 0.4, and 0.5. Take the value of a as 1/day, 0.5/day, and 0.2/day.
6.4 The impulse response of a time-variant linear reservoir can be expressed as
$$h(t) = \frac{1}{k(t)}\exp\left[-\int_{\tau}^{t}\frac{dw}{k(w)}\right]$$
where k(t) is the time-varying lag time and τ is the time at which input to
the reservoir is applied. Plot h(t, τ) assuming k(t) =10 + t. Take τ as 0, 1, 2,
3, 4, and 5 hours. Interpret these graphs physically and discuss the kind
of probability density function these graphs look like.
6.5 Plot the impulse response function using Eq. 6.17 for different values of
n as 1, 2, 3, 4, and 5 and k as 2, 4, 6, 8, and 10 hours. Interpret these plots
physically. Now use the impulse response function as a probability den-
sity function of peak discharge and then compute the probability of dis-
charge equal to or exceeding 1,000 cumecs.
6.6 Consider the impulse response function of Eq. 6.18. Then compute the
probability of discharge equal to or exceeding 1,000 cumecs if the lowest
value or threshold of discharge is 100 cumecs.
6.7 Consider a linear reservoir with a lag time of 15 hours. This reservoir
receives a pulse of 600 m3/s for a duration of 6 hours. Determine the
peak outflow and graph the outflow hydrograph. Also graph the out-
flow hydrograph for lag times as 10 and 20 hours and compare all three
hydrographs and comment on your results.
6.8 A reservoir releases a peak flow of 531 m3/s. It was determined that the
lag time for this reservoir was 12 hours. It receives an inflow pulse for a
duration of 6 hours. Determine the magnitude of the inflow pulse.
6.9 Determine the reservoir storage coefficient K and inflow pulse duration
D using the following data:
Inflow (m3/s): 27 82 109 136 190 218 245 272 299 326 354 381 408 435 463 490 517 544
Peak outflow (m3/s): 14 43 57 71 99 113 128 142 156 170 184 198 213 227 241 255 269 283
6.10 Let the average travel time of a Muskingum lowland reach K be 5 days,
and let the weighting coefficient parameter be –0.15. Determine the
impulse response function of reach outflow.
6.11 Take n as 5 and k as 9 hours. Compute the probability density function of
peak discharge. Assume the lowest value or threshold of discharge as
300 m3/s.
6.12 Select a watershed in your area that has two USGS gauging stations on a
stream. Use the discharge data at the two stations to determine the
Muskingum method routing coefficients.
6.13 Rework Question 6.12 using a linear reservoir and determine the lag K
and pulse duration D parameters.
6.14 What is a response function? Discuss its applications and advantages.
Chapter 7
Multivariate Probability
Distributions
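$$f_{X_1,X_2}(x_1,x_2) = \frac{1}{2\pi\sigma_1\sigma_2\sqrt{1-\rho^2}}\exp\left[-\frac{z}{2(1-\rho^2)}\right] \tag{7.1a}$$
where
$$z \equiv \frac{(x_1-\mu_1)^2}{\sigma_1^2} - \frac{2\rho(x_1-\mu_1)(x_2-\mu_2)}{\sigma_1\sigma_2} + \frac{(x_2-\mu_2)^2}{\sigma_2^2} \tag{7.1b}$$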
where μ1, μ2, σ 1, and σ 2 are the means and standard deviations of variables X1
and X2, respectively, and ρ is the coefficient of correlation between variables X1
and X2. The notation $X_1 \sim N(\mu_1, \sigma_1^2)$ means that X1 is normally distributed with mean
μ1 and variance $\sigma_1^2$; the same applies to the other variables.
Example 7.1 Let the bivariate variables listed in Table E7-1 be normally distrib-
uted after the Box–Cox transformation. What is the joint probability density
function of these two variables?
Solution The values of the random variable X1 representing peak discharge
(in cfs) and X2 representing the corresponding flow volume (in cfs·day), after
the Box–Cox transformation, are given as follows:
1. To calculate the first two moments of variables X1 and X2, one evaluates
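$$m_1 = \bar{X}_1 = \frac{\sum_{i=1}^{n} x_1(i)}{n} = \frac{2477.3}{62} = 39.96; \qquad \sigma_1 = S_1 = \sqrt{\frac{\sum_{i=1}^{n}\left(x_1(i)-\bar{X}_1\right)^2}{n-1}} = \sqrt{\frac{2887.9}{61}} = 6.88$$
$$m_2 = \bar{X}_2 = \frac{\sum_{i=1}^{n} x_2(i)}{n} = \frac{98034}{62} = 1581.2; \qquad \sigma_2 = S_2 = \sqrt{\frac{\sum_{i=1}^{n}\left(x_2(i)-\bar{X}_2\right)^2}{n-1}} = \sqrt{\frac{20314000}{61}} = 577.08$$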
Table E7-1
No. X1 X2 No. X1 X2 No. X1 X2 No. X1 X2
1 31.583 733.08 17 33.17 1465.8 33 54.272 2513.8 49 44.654 1698
2 33.882 1698.3 18 35.414 1631.1 34 40.018 1442.3 50 46.771 1858.4
3 36.174 1902.4 19 35.275 1095.5 35 48.448 2304.2 51 44.917 2192.9
4 31.649 863.53 20 44.676 1568.3 36 47.678 2773.2 52 42.209 1514.2
5 42.55 2098.4 21 44.808 1548.4 37 31.042 674.27 53 28.181 316.14
6 30.831 1506.1 22 25.374 533.26 38 37.442 1267.6 54 47.874 1860
7 31.248 1111.2 23 42.627 1648.5 39 54.506 2848.8 55 33.935 1096.8
8 33.614 1156.9 24 44.852 1822.4 40 37.325 998 56 31.714 764.94
9 38.867 1828.8 25 42.42 2345.7 41 38.374 1427.4 57 48.086 1915.1
10 43.761 2142.9 26 44.385 2013.3 42 43.474 1855.2 58 39.57 1361.1
11 39.207 2164.7 27 26.725 582.37 43 40.018 1830.3 59 37.246 1452.1
12 42.704 1634.3 28 37.088 1215.3 44 42.831 2064.8 60 45.782 1969.6
13 41.668 1581.5 29 36.556 840.95 45 39.406 1429 61 36.388 1193
14 28.637 596.3 30 31.908 902.43 46 52.622 2565.5 62 50.563 2106.4
15 48.173 2267.2 31 45.261 1984.6 47 41.887 1806.2
16 33.449 843.3 32 46.942 1971.8 48 44.565 1635.7
$$f_{X_1,X_2}(x_1,x_2) = \frac{1}{12300}\exp\left[-\frac{z}{0.4862}\right]$$
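$$f(\mathbf{X}) = \frac{\exp\left[-\frac{1}{2}(\mathbf{X}-\boldsymbol{\mu})'\,\Sigma^{-1}(\mathbf{X}-\boldsymbol{\mu})\right]}{(2\pi)^{N/2}\,|\Sigma|^{1/2}} \tag{7.2}$$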
Figure 7-1 Probability density functions of transformed random variables X1 and X2.
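where |Σ| denotes the determinant of the covariance matrix Σ of the random
variable vector X, with Σ positive definite, and μ denotes the mean vector of
X. Consider the case of N = 2. Then the covariance matrix Σ is expressed as
$$\Sigma = \begin{bmatrix} \sigma_1^2 & \rho\sigma_1\sigma_2 \\ \rho\sigma_1\sigma_2 & \sigma_2^2 \end{bmatrix}$$
For N = 3, the covariance matrix is
$$\Sigma = \begin{bmatrix} \sigma_1^2 & \rho_{12}\sigma_1\sigma_2 & \rho_{13}\sigma_1\sigma_3 \\ \rho_{12}\sigma_1\sigma_2 & \sigma_2^2 & \rho_{23}\sigma_2\sigma_3 \\ \rho_{13}\sigma_1\sigma_3 & \rho_{23}\sigma_2\sigma_3 & \sigma_3^2 \end{bmatrix}$$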
Example 7.2 Similar to Example 7.1, let the trivariate variables—peak discharge
(in cfs), volume (in cfs·day), and duration (in days)—be normally distributed
after the Box–Cox transformation (see Table E7-2). What is the joint probability
density function?
Table E7-2
Solution The first two moments of variables X1 and X2 are calculated from
Example 7.1 as
m1 = 39.36; σ 1 = 6.88 ; m2 = 1581.2; σ 2 = 577.08
The first two moments of X3 are calculated as
$$m_3 = \bar{X}_3 = \frac{\sum_{i=1}^{n} x_3(i)}{n} = \frac{327.77}{62} = 5.29; \quad \sigma_3 = S_3 = \sqrt{\frac{\sum_{i=1}^{n}\left[x_3(i)-\bar{X}_3\right]^2}{n-1}} = \sqrt{\frac{65.77}{61}} = 1.04$$
f X1 ( x1 ) = α exp ( − αx1 )
f X2 ( x2 ) = β exp ( − βx2 )
Figure 7-3 Probability density functions of transformed random variables X1, X2, and X3.
and δ represents the correlation between the two variables, which is in the
range of [0, 1]. This parameter δ is defined through the correlation ρ of two
variables as
$$\rho = -1 + \int_0^{\infty} \frac{1}{1+\delta x}\exp(-x)\,dx \tag{7.3a}$$
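Equation 7.3a has no elementary closed form, so in practice ρ is evaluated numerically for a trial δ. The following is a minimal sketch only: the function name `rho_from_delta`, the truncation point of the integral, and the number of Simpson panels are illustrative assumptions, not part of the original text.

```python
import math

def rho_from_delta(delta, upper=50.0, steps=5000):
    """Approximate Eq. 7.3a: rho = -1 + integral_0^inf exp(-x)/(1 + delta*x) dx.

    Assumption: the infinite range is truncated at `upper` (exp(-50) is
    negligible) and the integral uses the composite Simpson rule, so
    `steps` must be even.
    """
    h = upper / steps

    def g(x):
        return math.exp(-x) / (1.0 + delta * x)

    total = g(0.0) + g(upper)
    for i in range(1, steps):
        # Simpson weights: 4 on odd nodes, 2 on even interior nodes
        total += (4 if i % 2 else 2) * g(i * h)
    return -1.0 + total * h / 3.0

for d in (0.0, 0.25, 0.5, 1.0):
    print(d, round(rho_from_delta(d), 4))
```

Because ρ decreases monotonically as δ grows in this sketch, a simple bisection on `rho_from_delta` could be used to recover δ from a sample estimate of ρ.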
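$$f(x_1,x_2) = \frac{\alpha\beta}{1-\rho}\exp\left(-\frac{\alpha x_1}{1-\rho} - \frac{\beta x_2}{1-\rho}\right) I_0\left(\frac{2\sqrt{\rho\,\alpha\beta\,x_1x_2}}{1-\rho}\right) \tag{7.4}$$
$$I_0(z) = \sum_{k=0}^{\infty}\frac{(z^2/4)^k}{(k!)^2} \tag{7.4a}$$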
Table E7-3
No. X1 X2 No. X1 X2 No. X1 X2 No. X1 X2
1 50 2011 6 39 5166 11 65 2360 16 50 1866
2 40 5199 7 39 1421 12 53 797 17 52 5280
3 49 4930 8 60 1303 13 29 4813 18 40 2455
4 38 2547 9 64 1266 14 36 469 19 51 3041
5 47 1526 10 50 1839 15 48 2312 20 45 698
Figure 7-5 Exponential probability density functions of random variables X1 and X2.
which yields δ = 0.508. Then the joint probability density function, plot-
ted in Fig. 7-6(a), is
• Type 2 model: For parameters α,β,ρ determined earlier, the bivariate dis-
tribution by the NK model, plotted in Fig. 7-6(b), is expressed as
Notice that, for the modified Bessel function, the variable can be complex.
$$F_{X_1,X_2}(x_1,x_2) = F_{X_1}(x_1)F_{X_2}(x_2)\exp\left\{-\theta\left[\frac{1}{\ln F_{X_1}(x_1)} + \frac{1}{\ln F_{X_2}(x_2)}\right]^{-1}\right\}, \quad 0 \le \theta \le 1 \tag{7.5}$$
Figure 7-6 Joint exponential probability density function for random variables X1 and
X2. (a: Type I model; b: Type II model).
where Fx1 ( x1 ) and Fx2 ( x2 ) are the Gumbel marginal distributions of random
variables X1 and X2, FX1 , X2 ( x1 , x2 ) denotes the joint distribution of two random
variables X1 and X2, and θ denotes the correlation between the two random vari-
ables, which can be estimated as
$$\theta = 2\left[1 - \cos\left(\pi\sqrt{\rho/6}\right)\right], \quad \text{for } 0 \le \rho \le 2/3 \tag{7.5a}$$
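The mapping from the correlation coefficient ρ to the association parameter θ in Eq. 7.5a can be sketched in a few lines. This is an illustrative helper only; the function name `gumbel_mixed_theta` is an assumption, and the range check simply enforces the stated validity interval 0 ≤ ρ ≤ 2/3.

```python
import math

def gumbel_mixed_theta(rho):
    """Association parameter of the Gumbel mixed model from Eq. 7.5a."""
    if not 0.0 <= rho <= 2.0 / 3.0:
        # Outside this range the Gumbel mixed model is not applicable.
        raise ValueError("Eq. 7.5a requires 0 <= rho <= 2/3")
    return 2.0 * (1.0 - math.cos(math.pi * math.sqrt(rho / 6.0)))

# The end points of the admissible range map onto theta = 0 and theta = 1.
print(gumbel_mixed_theta(0.0), gumbel_mixed_theta(2.0 / 3.0))
```

The two end points confirm that the admissible range of ρ covers the full parameter range 0 ≤ θ ≤ 1 of Eq. 7.5.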
Example 7.4 Suppose two low-flow random variables X1 (discharge in cfs) and
X2 (volume in cfs·day) follow the Gumbel distribution, with the data given in
Table E7-4. What is the joint cumulative probability distribution?
Table E7-4
No. X1 X2 No. X1 X2 No. X1 X2 No. X1 X2
1 610 35600 6 1100 37213 11 1360 48790 16 1470 38634
2 934 39744 7 1130 49226 12 1370 38682 17 1490 57769
3 949 33010 8 1170 42497 13 1380 45263 18 1490 55766
4 968 58538 9 1210 74840 14 1420 60824 19 1500 41943
5 993 36882 10 1330 47627 15 1460 50895 20 1530 60767
Solution First one determines the parameters of the Gumbel marginal distribu-
tions. The probability distribution function of the Gumbel variable is
$$F(x) = \exp\left[-\exp\left(-\frac{x-\beta}{\alpha}\right)\right] \tag{7.5b}$$
where M and S are the sample mean and sample standard deviation, respec-
tively, given by
Figure 7-7 Gumbel probability density functions of random variables X1 and X2.
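The marginal parameters quoted in this example can be reproduced from the sample mean M and sample standard deviation S. The sketch below assumes the standard method-of-moments relations for the Gumbel distribution, α = √6·S/π and β = M − 0.5772α; these relations and the function name are assumptions added for illustration, and the data are the X1 column of Table E7-4.

```python
import math
import statistics

EULER_GAMMA = 0.5772156649  # Euler-Mascheroni constant

def gumbel_moments_fit(sample):
    """Method-of-moments estimates (alpha, beta) for the Gumbel CDF of Eq. 7.5b."""
    m = statistics.mean(sample)
    s = statistics.stdev(sample)           # sample standard deviation (n - 1)
    alpha = math.sqrt(6.0) * s / math.pi   # scale parameter
    beta = m - EULER_GAMMA * alpha         # location parameter
    return alpha, beta

# Discharge X1 (cfs) from Table E7-4
x1 = [610, 934, 949, 968, 993, 1100, 1130, 1170, 1210, 1330,
      1360, 1370, 1380, 1420, 1460, 1470, 1490, 1490, 1500, 1530]

alpha1, beta1 = gumbel_moments_fit(x1)
print(round(alpha1, 2), round(beta1, 1))
```

Under these assumptions the estimates agree with the values α = 197.32 and β = 1129.3 used for X1 in the fitted marginal distribution.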
Now one must determine the parameter θ. The value of ρ = 0.407 is in the
range [0, 2/3]. Thus the Gumbel mixed distribution is valid and we have
Then the bivariate Gumbel mixed distribution for the two variables, plotted
in Fig. 7-8, is
$$F_{X_1,X_2}(x_1,x_2) = F_{X_1}(x_1)F_{X_2}(x_2)\exp\left\{-0.68\left[\frac{1}{\ln F_{X_1}(x_1)} + \frac{1}{\ln F_{X_2}(x_2)}\right]^{-1}\right\}$$
with
$$F_{X_1}(x_1) = \exp\left[-\exp\left(-\frac{x_1-1129.3}{197.32}\right)\right]$$
and
$$F_{X_2}(x_2) = \exp\left[-\exp\left(-\frac{x_2-42820}{8502.4}\right)\right]$$
Figure 7-8 Bivariate Gumbel mixed joint probability distribution function of random
variables X1 and X2.
where m is the parameter and Eq. 7.5b gives the marginal distribution represen-
tation of variables X1 and X2. It will be seen later that Eq. 7.6 has the same form
as the Gumbel–Hougaard copula formula.
Example 7.5 Using data in Example 7.4, if the Gumbel logistic distribution is
applied for the bivariate frequency analysis of two variables X1 and X2, deter-
mine its joint distribution.
Solution The Gumbel marginal distributions of two variables were already
obtained through Example 7.4. Now one must determine parameter m in the
bivariate Gumbel logistic model. This parameter can be estimated through the
correlation coefficient of two variables as
$$m = \frac{1}{\sqrt{1-\rho}} \tag{7.6a}$$
with the marginal distribution of FX1 ( x1 ) and FX2 ( x2 ) given in Example 7.4.
Figure 7-9 Bivariate Gumbel logistic distribution of random variables X1 and X2.
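$$f_{X_1,X_2}(x_1,x_2) = \frac{(x_1x_2)^{(n-1)/2}\,x_1^{m}\exp\left(-\dfrac{\alpha_{X_1}x_1+\alpha_{X_2}x_2}{1-\eta}\right)}{\Gamma(n)\Gamma(m)\,(\alpha_{X_1}\alpha_{X_2})^{-(n+1)/2}\,\alpha_{X_1}^{-m}\,(1-\eta)\,\eta^{(n-1)/2}} \times \int_0^1 (1-t)^{(n-1)/2}\,t^{m-1}\exp\left[\frac{\alpha_{X_1}\eta x_1 t}{1-\eta}\right] I_{n-1}\left(\frac{2\sqrt{\alpha_{X_1}\alpha_{X_2}\eta x_1x_2(1-t)}}{1-\eta}\right)dt \tag{7.7}$$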
η = ρ β X1 β X2 (0 ≤ ρ ≤ 1, 0 ≤ η ≤ 1) (7.7a)
n = β X 2 , m = β X1 − β X 2 = β X 1 − n , n ≥ 0 , m ≥ 0 (7.7b)
where In–1(.) is the modified Bessel function of the first kind, η is the association
parameter between random variables X1 and X2, and ρ is Pearson’s correlation
coefficient.
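$$f_{X_1,X_2}(x_1,x_2) = \begin{cases} K_1K_2\displaystyle\sum_{j=0}^{\infty}\sum_{k=0}^{\infty} c_{jk}\,(\alpha_{X_1}x_1)^{j}\,(\eta\,\alpha_{X_2}x_2)^{j+k}, & 0<\eta<1 \\[2mm] f(x_1,\alpha_{X_1},\beta_{X_1})\,f(x_2,\alpha_{X_2},\beta_{X_2}) \end{cases} \tag{7.8}$$
where $\beta_{X_2} \ge \beta_{X_1}$,
$$K_1 = (\alpha_{X_1}x_1)^{\beta_{X_1}-1}(\alpha_{X_2}x_2)^{\beta_{X_2}-1}\exp\left(-\frac{\alpha_{X_1}x_1+\alpha_{X_2}x_2}{1-\eta}\right) \tag{7.8a}$$
$$c_{jk} = \frac{\eta^{j+k}\,\Gamma(\lambda_{X_2}-\lambda_{X_1}+k)}{(1-\eta)^{2j+k}\,\Gamma(\lambda_{X_2}+j+k)\,j!\,k!} \tag{7.8c}$$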
and
η = ρ β X1 β X 2 (7.8d)
Example 7.6 Suppose the correlated rainfall variables intensity and depth have
the gamma marginals with the data given in Table E7-6. Determine the joint
probability distribution of rainfall intensity and depth.
Table E7-6
No. X1 X2 No. X1 X2 No. X1 X2 No. X1 X2 No. X1 X2 No. X1 X2 No. X1 X2
1 2.65 7.95 9 1.09 2.17 17 2.47 7.42 25 3.2 6.5 33 1.2 6.2 41 1.3 6.5 49 3.04 6.08
2 1.8 3.59 10 1.12 4.49 18 2.25 4.5 26 3.5 6.9 34 1.9 3.8 42 4.5 4.5 50 1.09 5.43
3 3.03 9.1 11 3.03 3.03 19 2.78 5.55 27 4.2 8.4 35 1.6 8.2 43 2.1 4.2 51 2.65 10.6
4 1.26 2.51 12 3.25 3.25 20 2.24 2.24 28 5.3 5.3 36 3.1 12 44 1.1 5.5 52 1.89 3.77
5 2.91 2.91 13 1.6 3.19 21 0.72 2.87 29 1.3 4 37 1 7.2 45 3.2 9.7 53 1.21 4.84
6 2.65 5.3 14 2.54 5.08 22 3.26 6.52 30 2.7 11 38 1.3 6.4 46 3.2 9.5 54 2.2 15.4
7 4.41 4.41 15 8.05 8.05 23 2.43 4.85 31 1.7 5.2 39 1 3.8 47 1.7 5.2 55 2.74 5.48
8 3.69 7.38 16 2.24 2.24 24 1.11 4.43 32 3.2 13 40 1.3 6.5 48 7.3 7.3
Solution
1. First one needs to determine the marginal distributions of rainfall
intensity X1 and depth X2. The gamma probability density function is
expressed as
$$f(x,\alpha,\beta) = \frac{1}{\Gamma(\beta)}\,\alpha^{\beta}x^{\beta-1}\exp(-\alpha x)$$
Figure 7-10 Gamma probability density functions of random variables X1 and X2.
Figure 7-11 Bivariate gamma distribution (Izawa model) of random variables X1 and X2.
Figure 7-12 Bivariate gamma distribution (SAT model) of random variables X1 and X2.
$$f_X(x) = \frac{1}{x\sqrt{2\pi}\,\sigma_Y}\exp\left[-\frac{1}{2}\left(\frac{\ln x-\mu_Y}{\sigma_Y}\right)^2\right] \tag{7.9}$$
where Y = ln X and μY and σY are the mean and the standard deviation,
respectively, of Y.
The bivariate log-normal density function with log-normal marginals is
expressed as
$$f_{X_1,X_2}(x_1,x_2) = \frac{1}{2\pi\sigma_{Y_1}\sigma_{Y_2}\sqrt{1-\rho^2}}\exp\left(-\frac{z}{2}\right)$$
$$z = \frac{1}{1-\rho^2}\left[\left(\frac{\ln x_1-\mu_{Y_1}}{\sigma_{Y_1}}\right)^2 - 2\rho\left(\frac{\ln x_1-\mu_{Y_1}}{\sigma_{Y_1}}\right)\left(\frac{\ln x_2-\mu_{Y_2}}{\sigma_{Y_2}}\right) + \left(\frac{\ln x_2-\mu_{Y_2}}{\sigma_{Y_2}}\right)^2\right] \tag{7.10}$$
$$\rho = \frac{E\left[(Y_1-\mu_{Y_1})(Y_2-\mu_{Y_2})\right]}{\sigma_{Y_1}\sigma_{Y_2}} \tag{7.10a}$$
Example 7.7 Suppose the correlated rainfall variables X1 ~ rainfall intensity (in
inches/day) and X2 ~ depth (in inches) follow the log-normal distribution, with
the data given in Table E7-7. Determine the joint distribution of rainfall intensity
and depth.
Table E7-7
No. X1 X2 No. X1 X2 No. X1 X2
1 4.3 4.30 11 1.5 7.72 21 2.03 6.10
2 4.95 4.95 12 2.2 11.17 22 7.15 7.15
3 1.24 3.73 13 1.3 5.21 23 2.24 6.72
4 2.5 7.51 14 3.1 3.13 24 2.25 4.50
5 1.7 5.11 15 0.9 5.31 25 1.5 6.00
6 3.04 9.11 16 2.2 8.68 26 2.3 6.91
7 6.11 6.11 17 2.2 4.42 27 1.08 4.31
8 0.74 2.96 18 0.9 5.37 28 2.06 14.40
9 1.67 8.36 19 1.9 5.54 29 1.92 7.67
10 2.95 11.79 20 2.8 8.52
Solution
1. To determine the parameters of the log-normal (LN2) marginal distribu-
tion of rainfall intensity X1 and depth X2, one takes the logarithm of vari-
ables X1 and X2. For rainfall intensity, one gets
$$\mu_{Y_1} = \frac{\sum_{i=1}^{29}\ln\left(x_1(i)\right)}{29} = \frac{21.56}{29} = 0.74; \quad \sigma_{Y_1} = \left(\frac{\sum_{i=1}^{29}\left[\ln\left(x_1(i)\right)-0.74\right]^2}{28}\right)^{0.5} = 0.54$$
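The log-space moments above can be checked directly from the rainfall intensity column of Table E7-7. The snippet below is a small sketch; the variable names are illustrative, and because the tabulated values are rounded, the sum of logarithms differs slightly from the 21.56 quoted in the text.

```python
import math
import statistics

# Rainfall intensity X1 (inches/day), Nos. 1-29 of Table E7-7
x1 = [4.3, 4.95, 1.24, 2.5, 1.7, 3.04, 6.11, 0.74, 1.67, 2.95,
      1.5, 2.2, 1.3, 3.1, 0.9, 2.2, 2.2, 0.9, 1.9, 2.8,
      2.03, 7.15, 2.24, 2.25, 1.5, 2.3, 1.08, 2.06, 1.92]

y1 = [math.log(v) for v in x1]      # Y1 = ln(X1)
mu_y1 = statistics.mean(y1)         # mean of the logarithms
sigma_y1 = statistics.stdev(y1)     # standard deviation with n - 1 = 28

print(round(mu_y1, 2), round(sigma_y1, 2))
```

The same two lines applied to the depth column would give the parameters of the second marginal.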
Figure 7-13 Log-normal probability density functions of random variables X1 and X2.
$$f_{X_1,X_2}(x_1,x_2) = \frac{1}{1.25}\exp\left(-\frac{z}{2}\right)$$
$$z = \frac{1}{0.94}\left[\left(\frac{\ln x_1-0.74}{0.54}\right)^2 - 0.5\left(\frac{\ln x_1-0.74}{0.54}\right)\left(\frac{\ln x_2-1.82}{0.38}\right) + \left(\frac{\ln x_2-1.82}{0.38}\right)^2\right]$$
$$Y = \begin{cases} \dfrac{X^{\lambda}-1}{\lambda}, & \lambda \ne 0 \\[2mm] \ln(X), & \lambda = 0 \end{cases} \tag{7.11}$$
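The two branches of Eq. 7.11 translate directly into code. This is a minimal sketch for a single positive value; the function name `box_cox` is an illustrative choice, and estimating λ itself (for example, by maximum likelihood) is not shown.

```python
import math

def box_cox(x, lam):
    """Box-Cox transformation of Eq. 7.11 for a positive value x."""
    if x <= 0:
        raise ValueError("the Box-Cox transformation requires x > 0")
    if lam == 0:
        return math.log(x)           # limiting case lambda = 0
    return (x ** lam - 1.0) / lam    # general case lambda != 0

# The lambda != 0 branch approaches ln(x) as lambda tends to zero.
print(box_cox(2.0, 0.0), box_cox(2.0, 1e-8))
```

The printed pair illustrates why the logarithm is the natural definition at λ = 0: it is the limit of the power branch.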
Example 7.8 Consider the rainfall intensity variable in Example 7.7. Determine
parameter λ for the Box–Cox transformation.
Solution The rainfall intensity variable is expressed as random variable X1.
Parameter λ for the Box–Cox transformation is determined through the
maximum likelihood method as λ = –0.07. The first four moments of the
transformed variable are
m = 0.76; S = 0.495; skewness = 0.005; kurtosis = 2.56
Thus, it is safe to say that the variable transformed by the Box–Cox
transformation follows the normal distribution ~ N(0.76, 0.495).
Figure 7-14 Bivariate log-normal density function of random variables X1 and X2.
Let f X1 , X2 ( x1 , x2 ) be the joint PDF of random variables X1 and X2, and let
f X 1 ( x1 ) and f X 2 ( x 2 ) be the marginal PDFs of X1 and X2, respectively.
$$f_{X_1|X_2=x_2} = \frac{f_{X_1,X_2}(x_1,x_2)}{f_{X_2}(X_2=x_2)} \tag{7.12}$$
and the conditional cumulative distribution function (CCDF) of X1 given
X2 = x2 can be expressed as
$$F_{X_1|X_2=x_2}(X_1\le x_1\,|\,X_2=x_2) = \int_{-\infty}^{x_1} f_{X_1|X_2=x_2}(u\,|\,x_2)\,du = \frac{\int_{-\infty}^{x_1} f_{X_1,X_2}(u,x_2)\,du}{f_{X_2}(x_2)} \tag{7.12a}$$
$$f_{X_1,X_2|X_3=x_3} = \frac{f_{X_1,X_2,X_3}(x_1,x_2,x_3)}{f_{X_3}(X_3=x_3)} \tag{7.12d}$$
$$f_{X_1|X_2=x_2,X_3=x_3} = \frac{f_{X_1,X_2,X_3}(x_1,x_2,x_3)}{f_{X_2,X_3}(X_2=x_2,X_3=x_3)} \tag{7.12e}$$
$$F_{X_1,X_2|X_3\le x_3} = \frac{F_{X_1,X_2,X_3}(x_1,x_2,x_3)}{F_{X_3}(X_3\le x_3)} = \frac{F_{X_1,X_2,X_3}(x_1,x_2,x_3)}{F_{X_3}(x_3)} \tag{7.12f}$$
$$F'_{X_1|X_2\le x_2,X_3\le x_3} = \frac{F_{X_1,X_2,X_3}(x_1,x_2,x_3)}{F_{X_2,X_3}(X_2\le x_2,X_3\le x_3)} = \frac{F_{X_1,X_2,X_3}(x_1,x_2,x_3)}{F_{X_2,X_3}(x_2,x_3)} \tag{7.12g}$$
5. Special case: Let X1, X2, and X3 be independent variables. Then the condi-
tional distributions of X1 and X2, given X3 = x3, and of X1 and X2, given
X3 ≤ x3, are the same and are expressed as
Example 7.9 Consider Example 7.1. What are the conditional probability density
functions f_{X1|X2}(x1 | X2 = 2000) and f_{X1|X2}(x1 | X2 ≤ 2000)?
Solution The density f_{X1|X2}(x1 | X2 = 2000) can be obtained by using Eq. 7.12. The joint
distribution and each marginal have already been calculated in Example 7.1. Then
$$f_{X_1|X_2}(x_1\,|\,X_2=2000) = \frac{f_{X_1,X_2}(x_1,x_2=2000)}{f_{X_2}(x_2=2000)}$$
and
$$f_{X_1|X_2}(x_1\,|\,X_2\le 2000) = \frac{\partial}{\partial x_1}\left(\frac{F_{X_1,X_2}(x_1,x_2\le 2000)}{F_{X_2}(x_2\le 2000)}\right)$$
Figure 7-15 Conditional probability density functions.
$$T_{X_1,X_2}(x_1,x_2) = \frac{1}{1-F_{X_1,X_2}(x_1,x_2)} \tag{7.13}$$
$$T_{X_1|X_2=x_2}(x_1\,|\,X_2=x_2) = \frac{1}{1-F_{X_1|X_2=x_2}(x_1\,|\,X_2=x_2)} \tag{7.13a}$$
$$T_{X_1|X_2\le x_2}(x_1\,|\,X_2\le x_2) = \frac{1}{1-F_{X_1|X_2\le x_2}(x_1\,|\,X_2\le x_2)} \tag{7.13b}$$
$$T_{X_1,X_2,X_3}(x_1,x_2,x_3) = \frac{1}{1-F_{X_1,X_2,X_3}(x_1,x_2,x_3)} \tag{7.13c}$$
$$T_{X_1,X_2|X_3=x_3} = \frac{1}{1-F_{X_1,X_2|X_3=x_3}(x_1,x_2\,|\,X_3=x_3)} \tag{7.13d}$$
$$T_{X_1,X_2|X_3\le x_3} = \frac{1}{1-F_{X_1,X_2|X_3\le x_3}(x_1,x_2\,|\,X_3\le x_3)} \tag{7.13e}$$
$$T_{X_1|X_2=x_2,X_3=x_3} = \frac{1}{1-F_{X_1|X_2=x_2,X_3=x_3}(x_1\,|\,X_2=x_2,X_3=x_3)} \tag{7.13f}$$
$$T_{X_1|X_2\le x_2,X_3\le x_3} = \frac{1}{1-F_{X_1|X_2\le x_2,X_3\le x_3}(x_1\,|\,X_2\le x_2,X_3\le x_3)} \tag{7.13g}$$
In Example 7.10, only bivariate conditional return periods are considered. The
trivariate conditional return period can be obtained in a similar manner.
Example 7.10 Consider Example 7.4. Determine the conditional return period
given X2 ≤ 50,000, 60,000, and 90,000 cfs·day.
Solution In Example 7.4, marginal distributions and joint distribution were cal-
culated. Thus the conditional return period can be estimated by using Eq. 7.12 as
$$F_{X_1|X_2}(x_1\,|\,X_2\le x_2) = \frac{F_{X_1,X_2}(x_1,x_2)}{F_{X_2}(x_2)}$$
Figure 7-16 Conditional probability distribution of discharge (a) and conditional return
period (b) given different flood volumes (cfs·day).
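The calculation in this example can be sketched numerically by combining the fitted Gumbel mixed model of Example 7.4 (θ = 0.68 and the two Gumbel marginals) with the conditional return period 1/[1 − F(x1 | X2 ≤ x2)]. The function names, the chosen discharge of 1,500 cfs, and the guards for marginal probabilities of exactly 0 or 1 are illustrative assumptions.

```python
import math

THETA = 0.68                  # association parameter from Example 7.4
A1, B1 = 197.32, 1129.3       # Gumbel alpha, beta for discharge X1
A2, B2 = 8502.4, 42820.0      # Gumbel alpha, beta for volume X2

def gumbel_cdf(x, alpha, beta):
    return math.exp(-math.exp(-(x - beta) / alpha))

def joint_cdf(x1, x2):
    """Bivariate Gumbel mixed CDF (Eq. 7.5) with the fitted parameters."""
    f1 = gumbel_cdf(x1, A1, B1)
    f2 = gumbel_cdf(x2, A2, B2)
    if f1 <= 0.0 or f2 <= 0.0:
        return 0.0
    if f1 >= 1.0 or f2 >= 1.0:
        return f1 * f2        # the dependence factor tends to 1 in this limit
    inv = 1.0 / (1.0 / math.log(f1) + 1.0 / math.log(f2))
    return f1 * f2 * math.exp(-THETA * inv)

def conditional_return_period(x1, x2):
    """Return period of X1 > x1 given X2 <= x2."""
    cond = joint_cdf(x1, x2) / gumbel_cdf(x2, A2, B2)
    return 1.0 / (1.0 - cond)

for volume in (50000.0, 60000.0, 90000.0):
    print(volume, round(conditional_return_period(1500.0, volume), 1))
```

Repeating the loop over a range of discharges would trace out curves of the kind summarized in Fig. 7-16(b).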
where the copula C is a mapping uniquely determined on the unit square whenever
FXi ( xi ) , i = 1, 2, …, N, are continuous. A copula captures the essential features of the
dependence between the random variables and is essentially a function that connects
the multivariate probability distribution to its marginals. Thus the problem of
determining H reduces to determining C. There are a variety of copulas that can
be employed for deriving multivariate distributions. De Michele and Salvadori
(2003) applied the copula method for rainfall analysis. Favre et al. (2004)
applied elliptical and Archimedean copulas and copulas with quadratic section
for frequency analysis of flood peaks as well as that of peak flows and volumes.
Salvadori and De Michele (2004) employed copulas for bivariate frequency
analysis of hydrological events. De Michele et al. (2005) used a two-dimensional
copula to derive a bivariate probability distribution for evaluating the ade-
quacy of dam spillways. Zhang and Singh (2006) used Archimedean copulas for
flood frequency analysis. The Archimedean copula family is perhaps the most
popular and commonly used family in hydrology and environmental
engineering.
where ϕ(•) is the generating function of the Archimedean copula, θ is the cop-
ula parameter, which is hidden in the generating function, and Cθ denotes the
representation of the copula. Thus, the Archimedean copula is determined from
Eq. 7.15. Furthermore, the N-dimensional Archimedean copulas can be defined
as described in the following section.
Following Nelsen (1999), the functions CN in Eq. 7.17 are the
serial iterates of the two-dimensional Archimedean copulas generated by ϕ.
Then for N ≥ 3,
$$F(x_1,x_2) = C_\theta(u_1,u_2) = \exp\left(-\left[\left\{-\ln\left[1-\exp(-\lambda x_1)\right]\right\}^{\theta} + \left\{-\ln\Phi\left(\frac{x_2-\mu}{\sigma}\right)\right\}^{\theta}\right]^{1/\theta}\right)$$
$$\phi(t) = \ln\frac{1-\theta(1-t)}{t}, \quad \tau = \frac{3\theta-2}{3\theta} - \frac{2}{3}\left(1-\frac{1}{\theta}\right)^2\ln(1-\theta) \tag{7.20a}$$
$$u_1 = F_{X_1}(x_1) = 1-\exp(-\lambda x_1); \quad u_2 = F_{X_2}(x_2) = \exp\left[-\exp\left(-\frac{x_2-\beta}{\alpha}\right)\right]$$
Substituting u1 and u2 into Eq. 7.20, one can obtain the joint distribution
through the Ali–Mikhail–Haq copula as
$$F_{X_1,X_2}(x_1,x_2) = C_\theta(u_1,u_2) = \frac{\left[1-\exp(-\lambda x_1)\right]\exp\left[-\exp\left(-\dfrac{x_2-\beta}{\alpha}\right)\right]}{1-\theta\exp(-\lambda x_1)\left\{1-\exp\left[-\exp\left(-\dfrac{x_2-\beta}{\alpha}\right)\right]\right\}}$$
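$$C_\theta(u_1,u_2) = C_\theta\left[F_{X_1}(x_1),F_{X_2}(x_2)\right] = F_{X_1,X_2}(x_1,x_2) = \frac{1}{\theta}\ln\left\{1+\frac{\left[\exp(\theta u_1)-1\right]\left[\exp(\theta u_2)-1\right]}{\exp(\theta)-1}\right\}, \quad \theta \ne 0 \tag{7.21}$$
with
$$\phi(t) = \ln\left[\frac{\exp(\theta t)-1}{\exp(\theta)-1}\right], \quad \tau = 1-\frac{4}{\theta}\left[D_1(-\theta)-1\right] \tag{7.21a}$$
$$D_k(\theta) = \frac{k}{\theta^k}\int_0^{\theta}\frac{t^k}{\exp(t)-1}\,dt, \quad \theta>0 \tag{7.21b}$$
$$D_k(-\theta) = D_k(\theta)+\frac{k\theta}{k+1} \tag{7.21c}$$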
$$u_1 = F_{X_1}(x_1) = 1-\exp(-\lambda x_1); \quad u_2 = F_{X_2}(x_2) = \exp\left[-\exp\left(-\frac{x_2-\beta}{\alpha}\right)\right]$$
Substituting u1 and u2 into Eq. 7.21, one can obtain the joint distribution
through the Frank copula as
$$F_{X_1,X_2}(x_1,x_2) = C_\theta(u_1,u_2) = \frac{1}{\theta}\ln\left(1+\frac{\left(\exp\left\{\theta\left[1-\exp(-\lambda x_1)\right]\right\}-1\right)\left(\exp\left\{\theta\exp\left[-\exp\left(-\dfrac{x_2-\beta}{\alpha}\right)\right]\right\}-1\right)}{\exp(\theta)-1}\right)$$
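$$C_\theta(u_1,u_2) = C_\theta\left[F_{X_1}(x_1),F_{X_2}(x_2)\right] = F_{X_1,X_2}(x_1,x_2) = \left(u_1^{-\theta}+u_2^{-\theta}-1\right)^{-1/\theta}, \quad \theta \ge 0 \tag{7.22}$$
with
$$\phi(t) = t^{-\theta}-1, \quad \tau = \frac{\theta}{\theta+2} \tag{7.22a}$$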
Eq. 7.22, one can obtain the joint distribution through the Cook–Johnson (Clay-
ton) copula as
$$F_{X_1,X_2}(x_1,x_2) = C_\theta(u_1,u_2) = \left\{\left[1-\exp(-\lambda x_1)\right]^{-\theta} + \exp\left[\theta\exp\left(-\frac{x_2-\beta}{\alpha}\right)\right] - 1\right\}^{-1/\theta}$$
with
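$$\phi(t) = -\ln\left[1-(1-t)^{\theta}\right], \quad \theta \ge 1 \tag{7.23a}$$
$$F_{X_1,X_2}(x_1,x_2) = C_\theta(u_1,u_2) = 1-\left[\exp(-\lambda x_1)^{\theta} + \left\{1-\exp\left[-\exp\left(-\frac{x_2-\beta}{\alpha}\right)\right]\right\}^{\theta} - \exp(-\lambda x_1)^{\theta}\left\{1-\exp\left[-\exp\left(-\frac{x_2-\beta}{\alpha}\right)\right]\right\}^{\theta}\right]^{1/\theta}$$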
$$C_\theta(u_1,u_2) = C_\theta\left[F_{X_1}(x_1),F_{X_2}(x_2)\right] = F_{X_1,X_2}(x_1,x_2) = u_1u_2\exp(-\theta\ln u_1\ln u_2) \tag{7.24}$$
with
$$\phi(t) = \ln(1-\theta\ln t) \tag{7.24a}$$
There are still other Archimedean copulas (e.g., the copula proposed by Gen-
est and Ghoudi (1994)) in the Archimedean copula family. By using the same
procedure as for the generation of two-dimensional Archimedean copulas, N-
dimensional Archimedean copulas can be generated and expressed as
where superscript N denotes the dimension of the copula and u denotes the
variable vector.
Following Nelsen (1999), we obtain the copula function CθN, the serial
iterate of the two-dimensional Archimedean copula generated by ϕ,
which can be expressed as
but this procedure may not always succeed. Thus only the Gumbel–Hougaard,
Frank, Cook–Johnson (Clayton), and Ali–Mikhail–Haq multivariate copulas are
considered here. These N-dimensional Archimedean copulas can be represented
as follows:
• Gumbel–Hougaard multivariate Archimedean copula:
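$$C_\theta^N(\mathbf{u}) = C_\theta^N\left[F_{X_1}(x_1),F_{X_2}(x_2),\ldots,F_{X_N}(x_N)\right] = F_{X_1,X_2,\ldots,X_N}(x_1,x_2,\ldots,x_N)$$
• Frank multivariate Archimedean copula:
$$C_\theta^N(\mathbf{u}) = C_\theta^N\left[F_{X_1}(x_1),F_{X_2}(x_2),\ldots,F_{X_N}(x_N)\right] = F_{X_1,X_2,\ldots,X_N}(x_1,x_2,\ldots,x_N) = -\frac{1}{\theta}\ln\left[1+\frac{(e^{-\theta u_1}-1)(e^{-\theta u_2}-1)\cdots(e^{-\theta u_N}-1)}{(e^{-\theta}-1)^{N-1}}\right] \tag{7.26}$$
with $\phi(t) = -\ln\left[(e^{-\theta t}-1)/(e^{-\theta}-1)\right]$.
• Cook–Johnson (Clayton) multivariate Archimedean copula:
$$C_\theta^N(\mathbf{u}) = C_\theta^N\left[F_{X_1}(x_1),F_{X_2}(x_2),\ldots,F_{X_N}(x_N)\right] = F_{X_1,X_2,\ldots,X_N}(x_1,x_2,\ldots,x_N) = \left(u_1^{-\theta}+u_2^{-\theta}+\cdots+u_N^{-\theta}-N+1\right)^{-1/\theta} \tag{7.27}$$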
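The N-dimensional Frank and Cook–Johnson (Clayton) copulas of Eqs. 7.26 and 7.27 are simple enough to evaluate directly. The sketch below assumes θ > 0 and takes the Clayton exponent as −1/θ, consistent with the bivariate form in Eq. 7.22; the function names are illustrative.

```python
import math

def clayton_copula(u, theta):
    """N-dimensional Cook-Johnson (Clayton) copula, Eq. 7.27, theta > 0."""
    n = len(u)
    return (sum(ui ** (-theta) for ui in u) - n + 1.0) ** (-1.0 / theta)

def frank_copula(u, theta):
    """N-dimensional Frank copula, Eq. 7.26, theta > 0."""
    numerator = 1.0
    for ui in u:
        numerator *= math.exp(-theta * ui) - 1.0
    denominator = (math.exp(-theta) - 1.0) ** (len(u) - 1)
    return -math.log(1.0 + numerator / denominator) / theta

# Setting all but one argument to 1 must return the remaining marginal.
print(clayton_copula([0.4, 1.0, 1.0], 2.0), frank_copula([0.3, 1.0, 1.0], 2.0))
```

The boundary check in the last line is a quick way to catch sign or exponent errors when coding any copula formula.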
Semiparametric Method
To estimate the copula parameter θ, two conditions may be considered. First, if
appropriate marginals are already available, then one simply expresses the like-
lihood function for the copula. The resulting estimate of θ would then be mar-
ginal dependent; the same maximum likelihood methodology, which is usually
applied for estimation of parameters of univariate probability distributions, is
indirectly effected for the copula method. This is a semiparametric method. Sec-
ond, if nonparametric estimates are contemplated for the marginals, the estima-
tion of the copula parameter θ will be marginal free.
The semiparametric estimation can be expressed step by step as follows:
$$L(\theta) = \sum_{k=1}^{n}\log\left\{c_\theta^N\left[F_{1n}(X_{1k}),\ldots,F_{Nn}(X_{Nk})\right]\right\} \tag{7.30}$$
where Fin denotes n/(n + 1) times the marginal empirical distribution func-
tion of the ith variable (Genest and Rivest 1993). This rescaling avoids the
difficulty of the potential unboundedness of log [cθ (u1 ,..., uN )] , since
some of the ui s tend to 1. The term cθ denotes the probability density
function of the copula, which has the same meaning as the probability
density function of a univariate random variable.
3. According to the property of the semiparametric estimator, θ is consis-
tent and asymptotically normal under the same conditions as the maxi-
mum likelihood estimation (Genest, et al. 1995), which is an asymptotic
property. To maximize the preceding log-likelihood function the follow-
ing step is needed:
$$\frac{1}{n}\frac{\partial}{\partial\theta}L(\theta) = \frac{1}{n}\sum_{k=1}^{n} l_\theta\left[\theta,F_{1n}(X_{1k}),F_{2n}(X_{2k}),\ldots,F_{Nn}(X_{Nk})\right] = 0 \tag{7.31}$$
Nonparametric Method
Genest and Rivest (1993) described a procedure to identify a copula func-
tion based on a nonparametric estimation for bivariate Archimedean copu-
las. It is assumed that a random sample of bivariate observations
( x11 , x21 ), ( x12 , x22 ),..., ( x1n , x2 n ) is available and that its underlying distribu-
tion function FX1 , X2 ( x1 , x2 ) has an associated Archimedean copula Cθ , which
also can be regarded as an alternative expression of F. Then the following
steps are followed to identify the appropriate copula:
• Determine Kendall’s τ (the dependence structure of the bivariate random
variables) from observations as
$$\tau_n = \binom{n}{2}^{-1}\sum_{i<j}\operatorname{sign}\left[(x_{1i}-x_{1j})(x_{2i}-x_{2j})\right] \tag{7.32}$$
where n is the number of observations; sign = 1 if the product
(x1i − x1j)(x2i − x2j) is positive and sign = –1 if it is negative;
i, j = 1, 2, …, n; and τn is the estimate of τ from the observations.
• Determine the copula parameter θ from this value of τ according to
the relationship between Kendall’s τ and copula parameter θ (i.e., for the
Gumbel–Hougaard copula, the relationship between Kendall’s τ and the
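The sample version of Kendall's τ in Eq. 7.32 can be computed with a double loop over all pairs of observations. The sketch below also converts τ to the Gumbel–Hougaard parameter using the standard relation τ = 1 − 1/θ, which is stated here as an assumption because the corresponding equation does not survive in this text; the function names and the small data set in the usage line are hypothetical.

```python
def kendall_tau(x1, x2):
    """Sample Kendall's tau: (concordant - discordant) pairs / C(n, 2)."""
    n = len(x1)
    score = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            product = (x1[i] - x1[j]) * (x2[i] - x2[j])
            if product > 0:
                score += 1      # concordant pair
            elif product < 0:
                score -= 1      # discordant pair (ties contribute 0)
    return 2.0 * score / (n * (n - 1))

def gumbel_hougaard_theta(tau):
    """Assumed standard relation tau = 1 - 1/theta for Gumbel-Hougaard."""
    return 1.0 / (1.0 - tau)

tau_n = kendall_tau([1, 2, 3, 4], [1, 3, 2, 4])   # hypothetical data
print(tau_n, gumbel_hougaard_theta(tau_n))
```

The quadratic loop is adequate for the sample sizes typical of annual hydrologic series.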
Example 7.17 Consider the bivariate flow variables X1 (volume in cfs·day) and
X2 (duration in days) with the sample data for each random variable given in
Table E7-17. Suppose that the Gumbel–Hougaard copula family can be applied
to this bivariate data set. Determine the parameter by the semiparametric
method for this bivariate case and the parameter by the nonparametric method
for correlated variables X1 and X2.
Table E7-17
No. X1 X2 No. X1 X2 No. X1 X2 No. X1 X2
Solution
1. Estimation by the semiparametric method for the bivariate variables:
(a) Determine the marginal probabilities: For simplicity, empirical
probabilities are obtained from the Gringorten plotting-position formula
as Pi = (i – 0.44)/(n + 0.12), in which n is the sample size and i is the
rank. To avoid the possibility of some of the ui tending to 1, we
need to take n/(n + 1) times the marginal probability, denoted as Pic
[i.e., Pic = nPi/(n + 1)].
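Step (a) can be sketched as follows: rank the sample in ascending order, apply the Gringorten formula, and rescale by n/(n + 1). The function name is illustrative, ties are not treated specially, and the three-value sample is hypothetical.

```python
def gringorten_rescaled(sample):
    """Rescaled Gringorten probabilities Pic = n * Pi / (n + 1).

    Pi = (i - 0.44) / (n + 0.12), with i the ascending rank of each value.
    The result is returned in the original order of the sample.
    """
    n = len(sample)
    order = sorted(range(n), key=lambda k: sample[k])
    probs = [0.0] * n
    for rank, k in enumerate(order, start=1):
        p = (rank - 0.44) / (n + 0.12)
        probs[k] = n * p / (n + 1)
    return probs

print(gringorten_rescaled([30.0, 10.0, 20.0]))
```

Applying the function to each variable separately gives the pairs (u1, u2) that enter the copula likelihood.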
$$c(u_1,u_2) = \frac{\partial^2 C(u_1,u_2)}{\partial u_1\,\partial u_2} \tag{7.33}$$
$$K_C(t) = t-\frac{\phi(t)}{\phi'(t)} \tag{7.34}$$
line indicates that the quantiles are equal. Otherwise, the copula function
needs to be reidentified.
Example 7.18 Consider the same data as in Example 7.17. Obtain the Q–Q plot
for the identification of the Gumbel–Hougaard copula.
Solution
(a) Construct an intermediate variable Z by Eq. 6.35.
(b) Obtain Kn(Z) as K n ( z) = the proportion of zi s ≤ z. These two steps sim-
ply sort the data.
(c) Obtain the parametric estimation of K(z). For the Gumbel–Hougaard
copula, the generating function is
$$\phi(t) = (-\ln t)^{\theta}$$
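Substituting this generating function into Eq. 7.34 gives ϕ′(t) = −θ(−ln t)^(θ−1)/t, so that K(t) = t − t ln(t)/θ. The sketch below evaluates this parametric K next to the empirical proportion used in step (b); the function names and the sample values of z are illustrative assumptions.

```python
import math

def k_gumbel_hougaard(t, theta):
    """Parametric K(t) = t - phi(t)/phi'(t) for phi(t) = (-ln t)**theta."""
    return t - t * math.log(t) / theta

def k_empirical(z_values, z):
    """Empirical K_n(z): proportion of the z_i that are <= z."""
    return sum(1 for zi in z_values if zi <= z) / len(z_values)

z_sample = [0.1, 0.4, 0.7, 0.9]      # hypothetical intermediate values
for z in z_sample:
    print(z, k_empirical(z_sample, z), round(k_gumbel_hougaard(z, 2.0), 3))
```

Plotting the two columns against each other gives the Q–Q comparison described in this example.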
Likewise one can derive the conditional distributions. The conditional distri-
bution function of X, given Y = y for y ≥ 0, has two parts:
$$\Phi_{X|Y}(x\,|\,Y=y,\,Y>0) = \frac{p_{01}h_Y(y)+p_{11}\,g(y)\,H_{X|Y}(x\,|\,y)}{p_{01}h_Y(y)+p_{11}\,g(y)} \tag{7.40b}$$
$$H_{X|Y}(x\,|\,y) = H_{X|Y}(x\,|\,Y=y,\,X>0,\,Y>0) = \frac{1}{g(y)}\int_0^{x}h(t,y)\,dt \tag{7.41c}$$
$$H_{Y|X}(y\,|\,x) = H_{Y|X}(y\,|\,X=x,\,X>0,\,Y>0) = \frac{1}{f(x)}\int_0^{y}h(x,t)\,dt \tag{7.41d}$$
$$\psi_X(x\,|\,X>0) = \frac{p_{10}H_X(x)+p_{11}F(x)}{p_{10}+p_{11}} \tag{7.42a}$$
$$\psi_Y(y\,|\,Y>0) = \frac{p_{01}H_Y(y)+p_{11}G(y)}{p_{01}+p_{11}} \tag{7.42b}$$
$$Z = Q^{-1}[F(X)] \tag{7.45a}$$
$$W = Q^{-1}[G(Y)] \tag{7.45b}$$
where Q–1 is the inverse of Q and Z and W are standard normal. The bivariate
conditional distribution of (X, Y | X > 0, Y > 0) is meta-Gaussian (Kelly and
Krzysztofowicz 1997) if the distribution function is expressed as
$$H(x,y) = B\left\{Q^{-1}[F(x)],\,Q^{-1}[G(y)];\,\gamma\right\} \tag{7.46a}$$
$$h(x,y) = \frac{f(x)\,g(y)}{\sqrt{1-\gamma^2}\;q\left\{Q^{-1}[G(y)]\right\}}\;q\left\{\frac{Q^{-1}[G(y)]-\gamma Q^{-1}[F(x)]}{\sqrt{1-\gamma^2}}\right\} \tag{7.46b}$$
$$B(z,w;\gamma) = P(Z\le z,\,W\le w) \tag{7.46c}$$
$$\rho = \frac{6}{\pi}\arcsin\left(\frac{\gamma}{2}\right) \tag{7.47}$$
Now the conditional distribution function H_{Y|X}, defined by Eq. 7.41d, can be
expressed as
$$H_{Y|X}(y\,|\,x) = Q\left\{\frac{Q^{-1}[G(y)]-\gamma Q^{-1}[F(x)]}{\sqrt{1-\gamma^2}}\right\} \tag{7.48}$$
The corresponding density function can be obtained by using Eq. 7.41b and
$$h_{Y|X}(y\,|\,x) = \frac{h(x,y)}{f(x)} \tag{7.49}$$
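Equations 7.47 and 7.48 can be evaluated with nothing more than the standard normal distribution function and its inverse. The sketch below assumes that Q denotes the standard normal distribution function (as implied by Z and W being standard normal) and takes the marginal probabilities F(x) and G(y) as inputs; the function names are illustrative.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()   # assumed to play the role of Q

def spearman_rho(gamma):
    """Eq. 7.47: Spearman's rho implied by the parameter gamma."""
    return 6.0 / math.pi * math.asin(gamma / 2.0)

def conditional_cdf(g_y, f_x, gamma):
    """Eq. 7.48: H(y | x) from the marginal probabilities G(y) and F(x)."""
    w = STD_NORMAL.inv_cdf(g_y)
    z = STD_NORMAL.inv_cdf(f_x)
    return STD_NORMAL.cdf((w - gamma * z) / math.sqrt(1.0 - gamma ** 2))

# With gamma = 0 the conditional distribution reduces to the marginal G(y).
print(conditional_cdf(0.3, 0.8, 0.0), spearman_rho(0.6))
```

Working with probabilities rather than raw values keeps the sketch independent of the particular marginal distributions F and G.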
7.4 Questions
7.1 Get instantaneous yearly peak discharge and associated volume data for
a period of at least 30 years for a gauging station near your town. Check
whether the frequency distributions of peak discharge and volume are
normal. If not, then use the Box–Cox transformation to transform them
to normal. Then sketch the joint probability density function of these two
variables.
7.2 Obtain the duration data for the flood events in Question 7.1. If the flood
duration is not normally distributed, then use the Box–Cox transforma-
tion to transform it to normal. Then sketch the trivariate probability den-
sity function.
7.3 Obtain low-flow duration and discharge data for a gauging
station near your town. Assume that they are exponentially distributed.
Then sketch their joint distribution.
7.4 Suppose the low-flow variables in Question 7.3 follow the Gumbel dis-
tribution. Sketch the joint Gumbel distribution.
7.5 Suppose the low-flow variables in Question 7.3 follow the Gumbel logis-
tic distribution. Then sketch the bivariate distribution.
7.6 Obtain data on rainfall intensity and depth for a rain gauge station
nearby. Assume that intensity and depth have gamma distributions.
Sketch the joint probability distribution of rainfall intensity and depth.
7.7 Suppose rainfall intensity and depth follow a log-normal distribution.
Sketch the joint distribution of rainfall intensity and depth.
Parameter Estimation
$$\mu_r^a(f) = \frac{\int_{-\infty}^{\infty}(x-a)^r f(x)\,dx}{\int_{-\infty}^{\infty} f(x)\,dx} \tag{8.2}$$
As the denominator in Eq. 8.2 defines the area under the curve, which is usu-
ally unity or made to unity by normalization, the two definitions are numeri-
cally the same. In this text the definition of Eq. 8.1 is used with f(x) normalized
beforehand. It is assumed here that the integral in Eq. 8.1 converges. There are
some functions that possess moments of lower order; some do not possess any
moment except of zero order. However, if a moment of higher order exists, then
moments of all lower orders must exist. Figure 8-1 shows the concept of moment
of a function about an arbitrary point located at a distance a from the origin.
Moments are statistical descriptors of a distribution and reflect on its quanti-
tative properties. For example, if r = 0 then Eq. 8.1 yields
$$\mu_0^a(f) = \int_{-\infty}^{\infty}(x-a)^0 f(x)\,dx = \int_{-\infty}^{\infty} f(x)\,dx = 1 \tag{8.3}$$
Thus, the zero-order moment is the area under the curve defined by f(x) sub-
ject to −∞ < x < ∞ .
If r = 1, then Eq. 8.1 yields
$$\mu_1^a(f) = \int_{-\infty}^{\infty}(x-a)f(x)\,dx = \mu-a \tag{8.4}$$
where μ is the centroid of the area or mean. Thus, the first moment is the
weighted mean about the point a. If a = 0, it is called the first moment about the
origin. It gives the mean and is represented by
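$$\mu'_1 = \int_{-\infty}^{\infty} x f(x)\,dx \tag{8.5a}$$
With a = μ, the rth moment about the mean is
$$\mu_r = \int_{-\infty}^{\infty}(x-\mu)^r f(x)\,dx = \int_{-\infty}^{\infty}(x-a)^r f(x)\,dx \tag{8.5b}$$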
The second moment about the mean is known as variance and is a measure
of how the data are scattered about the mean value. The third moment is a mea-
sure of symmetry; it indicates whether the data are evenly distributed about the
mean or the mode is to the left or right of the mean. If the mode is to the left of
the mean, the data are said to have positive skew and if it is to the right, the data
are said to have negative skew. The coefficient of skewness is a quantitative mea-
sure of the skewness in the data. The fourth moment is a measure of peakedness,
which is explained through kurtosis. Kurtosis is the peakedness or flatness of
data with respect to the normal distribution.
when fx is normalized:
$$\sum_{x=-\infty}^{\infty} f_x = 1 \tag{8.7}$$
Otherwise,
$$M_r = \sum_{x=-\infty}^{\infty} x^r f_x \Big/ \sum_{x=-\infty}^{\infty} f_x \tag{8.8}$$
It is thus seen that Eq. 8.6 and Eq. 8.8 are analogous to Eq. 8.1 and Eq. 8.2.
Figure 8-2 explains the concept of moment of a discrete function.
Sample moments are often biased owing to the small size of the sample.
Commonly, the first four moments are used in parameter estimation. Selected
central moments or moments about sample mean (x) are
$$M_2^{\mu} = \frac{1}{n}\sum_{i=1}^{n}(x_i-\bar{x})^2 \tag{8.9}$$
$$M_3^{\mu} = \frac{n}{(n-1)(n-2)}\sum_{i=1}^{n}(x_i-\bar{x})^3 \tag{8.10}$$
$$M_4^{\mu} = \frac{n^2}{(n-1)(n-2)(n-3)}\sum_{i=1}^{n}(x_i-\bar{x})^4 \tag{8.11}$$
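The three sample central moments of Eqs. 8.9 to 8.11 can be computed in one pass over the deviations from the sample mean. The function name and the five-value sample are illustrative assumptions; at least four observations are needed for the fourth moment.

```python
def central_sample_moments(x):
    """Sample central moments M2, M3, M4 as defined in Eqs. 8.9-8.11."""
    n = len(x)
    if n < 4:
        raise ValueError("at least four observations are required")
    mean = sum(x) / n
    d = [xi - mean for xi in x]
    m2 = sum(di ** 2 for di in d) / n
    m3 = n * sum(di ** 3 for di in d) / ((n - 1) * (n - 2))
    m4 = n ** 2 * sum(di ** 4 for di in d) / ((n - 1) * (n - 2) * (n - 3))
    return m2, m3, m4

print(central_sample_moments([1.0, 2.0, 3.0, 4.0, 5.0]))
```

For the symmetric sample in the usage line the third moment vanishes, as expected for data with no skew.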
Example 8.1 The histogram of annual flows of the Sabarmati River in India is
given in Table 8-1. Find the mean and variance of the sample data and suggest
the candidate distribution(s) using the method of moments.
Solution Summing the frequencies gives
6 + 9 + 11 + 9 + … + 2 + 0 + 1 = 98
The first moment of the data is
[(150 × 6) + (250 × 9) + (350 × 11) + … + (1,750 × 1)]/98 = 664.29 cumecs
This is the mean of the data.
The second moment about the mean gives the variance:
second moment = [(150 – 664)² × 6 + (250 – 664)² × 9 + … + (1,750 – 664)²
× 1]/98 = 120,000 cumecs²
Table 8-1 Data description of annual flows of the Sabarmati River in India
Discharge range (m³/s)    Average (m³/s)    Frequency
100–200                   150               6
200–300                   250               9
300–400                   350               11
400–500                   450               9
500–600                   550               9
600–700                   650               9
700–800                   750               19
800–900                   850               6
900–1,000                 950               6
1,000–1,100               1,050             1
1,100–1,200               1,150             5
1,200–1,300               1,250             2
1,300–1,400               1,350             3
1,400–1,500               1,450             0
1,500–1,600               1,550             2
1,600–1,700               1,650             0
1,700–1,800               1,750             1
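The mean and variance quoted in Example 8.1 follow from the class midpoints and frequencies of Table 8-1. The snippet below is a short check; the variable names are illustrative, and the variance is divided by the total frequency (98), as in the worked solution.

```python
# (class midpoint in m^3/s, frequency) pairs from Table 8-1
histogram = [(150, 6), (250, 9), (350, 11), (450, 9), (550, 9), (650, 9),
             (750, 19), (850, 6), (950, 6), (1050, 1), (1150, 5), (1250, 2),
             (1350, 3), (1450, 0), (1550, 2), (1650, 0), (1750, 1)]

total = sum(f for _, f in histogram)                        # number of years
mean = sum(m * f for m, f in histogram) / total             # first moment
variance = sum(f * (m - mean) ** 2 for m, f in histogram) / total
std_dev = variance ** 0.5

print(total, round(mean, 2), round(variance), round(std_dev, 2))
```

The resulting standard deviation of about 346.41 cumecs is the value carried into the distribution fits that follow.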
$$f_X(x) = \frac{1}{\sigma_X\sqrt{2\pi}}\exp\left[-\frac{1}{2}\left(\frac{x-\mu_X}{\sigma_X}\right)^2\right], \quad -\infty \le x \le \infty$$
It has two parameters, μX and σX. By definition these are the first and second
moments of the normal distribution, that is, E[X] = μX and E[(X − μX)²] = σX².
Thus, by using the method of moments the parameter estimates for μX and
σ X are given as
$$f_X(x) = \frac{1}{x\sigma_Y\sqrt{2\pi}}\exp\left[-\frac{1}{2}\left(\frac{\ln x-\mu_Y}{\sigma_Y}\right)^2\right], \quad x \ge 0$$
The log-normal distribution can be treated in exactly the same way as the
normal distribution by transforming the random variable X into another random
variable Y using the transformation Y = ln(X). As described in Chapter 5, the
mean and variance of Y can be estimated using the following relations:
$$\hat{\mu}_Y = \frac{1}{2}\ln\left(\frac{\hat{\mu}_X^2}{1+\widehat{CV}_X^2}\right) = 6.5$$
$$\hat{\sigma}_Y^2 = \ln\left(1+\widehat{CV}_X^2\right) = 0.49^2$$
$$\hat{\gamma}_X = 3\widehat{CV}_X + \widehat{CV}_X^3 = 1.71$$
The PDF of the gamma distribution with k and λ as parameters has been
described in Chapter 4 as
$$f_X(x) = \frac{\lambda^k x^{k-1}e^{-\lambda x}}{\Gamma(k)}, \quad x \ge 0,\ \lambda>0,\ k>0$$
The algebraic expressions for the mean and variance of X are expressed as
$$E[X] = \frac{k}{\lambda}$$
and
$$\sigma_X^2 = \frac{k}{\lambda^2}$$
Equating the distribution moments of X with its sample moments, one gets
$$\frac{\hat{k}}{\hat{\lambda}} = 664.29$$
and
$$\frac{\hat{k}}{\hat{\lambda}^2} = 346.41^2$$
On solving these two equations, one gets kˆ = 3.677 and ˆλ = 0.006 . The fitted
gamma distribution has a coefficient of skewness of 1.04.
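Solving the two moment equations gives λ = mean/variance and k = mean²/variance, and the skewness coefficient of a gamma distribution is 2/√k. The short sketch below reproduces the numbers quoted above; the function name is an illustrative choice.

```python
import math

def gamma_moments_fit(mean, variance):
    """Method-of-moments estimates for the gamma distribution.

    Uses E[X] = k / lambda and Var[X] = k / lambda**2.
    """
    lam = mean / variance
    k = mean * lam
    skewness = 2.0 / math.sqrt(k)   # skewness coefficient of a gamma variable
    return k, lam, skewness

k_hat, lam_hat, skew = gamma_moments_fit(664.29, 346.41 ** 2)
print(round(k_hat, 3), round(lam_hat, 4), round(skew, 2))
```

The unrounded rate is about 0.0055 per cumec, which the text reports to one significant figure as 0.006.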
The plot in Fig. 8-3 compares the PDFs of the fitted distributions with the relative frequency distribution of the observed data. Of the three candidate distributions, the gamma distribution appears to be the best choice.
Example 8.2 If one wants to fit an exponential distribution to the data given in Example 8.1, what will be its parameter based on the method of moments?
Solution As given in Chapter 4, the PDF of an exponential distribution is
$$f_X(x) = \lambda e^{-\lambda x}, \quad x \ge 0$$
and its mean is
$$E[X] = \int_0^\infty x \lambda e^{-\lambda x}\, dx = \frac{1}{\lambda}$$
[Figure 8-3: Relative frequency versus discharge (m3/s) for the observed data, with the fitted lognormal, gamma, and normal PDFs.]
$$f(x_1, x_2, \ldots, x_n; a_1, a_2, \ldots, a_m) = \prod_{i=1}^{n} f(x_i; a_1, a_2, \ldots, a_m) \tag{8.12}$$
$$\ln L = L^* = \ln \prod_{i=1}^{n} f(x_i; a_1, a_2, \ldots, a_m) = \sum_{i=1}^{n} \ln f(x_i; a_1, a_2, \ldots, a_m) \tag{8.14}$$
Example 8.3 Using the maximum likelihood estimation procedure, find the
parameter α of the exponential distribution for the data of the Sabarmati River in
India given in Example 8.1.
Solution The probability density function of the one-parameter exponential distribution is given by
$$f_X(x) = \alpha \exp(-\alpha x) \tag{8.16}$$
The likelihood function and its logarithm are
$$L(\alpha) = \prod_{i=1}^{n} \alpha \exp(-\alpha x_i) = \alpha^n \exp\left(-\alpha \sum_{i=1}^{n} x_i\right) \tag{8.17}$$
$$\ln L(\alpha) = n \ln(\alpha) - \alpha \left(\sum_{i=1}^{n} x_i\right) \tag{8.18}$$
where n is the sample size. Differentiating Eq. 8.18 with respect to α and setting the derivative to zero gives
$$\frac{d \ln L(\alpha)}{d\alpha} = \frac{n}{\alpha} - \sum_{i=1}^{n} x_i = 0$$
This yields
$$\hat{\alpha} = n \Big/ \left(\sum_{i=1}^{n} x_i\right) = \frac{1}{\bar{x}} \tag{8.19}$$
In Example 8.1, the mean of the data was found to be 664.29 cumecs. This gives the estimate of α as
$$\hat{\alpha} = 1/664.29 = 1.51 \times 10^{-3}\ \text{cumec}^{-1}$$
In Fig. 8-4, the likelihood function for a typical case is plotted. Maximum likelihood estimation finds the parameter value that maximizes the likelihood function (or its logarithm). Thus, in the present case, a value of α equal to the reciprocal of the sample mean ($1/\bar{x}$, or $1/m_x$) is most likely to be the true value of the parameter.
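The closed-form result of Eq. 8.19 can be checked numerically against the log-likelihood of Eq. 8.18. The sketch below is illustrative only: the flow values are hypothetical (they are not the Sabarmati record), and the function names are made up for this example.

```python
import math


def exp_log_likelihood(alpha, data):
    """ln L(alpha) = n ln(alpha) - alpha * sum(x_i), as in Eq. 8.18."""
    return len(data) * math.log(alpha) - alpha * sum(data)


def exp_mle(data):
    """Closed-form maximizer of Eq. 8.18: alpha = n / sum(x_i) (Eq. 8.19)."""
    return len(data) / sum(data)


# Hypothetical flows (m3/s), for illustration only.
flows = [210.0, 480.0, 655.0, 720.0, 905.0, 1340.0]
alpha_hat = exp_mle(flows)

# A coarse grid search around alpha_hat should not beat the closed form.
grid = [alpha_hat * (0.5 + 0.01 * i) for i in range(101)]
best_on_grid = max(grid, key=lambda a: exp_log_likelihood(a, flows))
print(alpha_hat, best_on_grid)
```

Because the log-likelihood is concave in α, the grid maximum coincides with the closed-form estimate; for distributions without a closed form, the same log-likelihood function could be passed to a numerical optimizer instead.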
Estimate the parameters of this distribution using the data of the Amite
River at Darlington, Louisiana (Example 3.6), using the method of maximum
likelihood estimation.
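Solution The likelihood function is formed as
$$L = \frac{1}{\alpha^n} \exp\left[-\frac{1}{\alpha} \sum_{i=1}^{n} (x_i - \varepsilon)\right] \tag{8.21}$$
The log-likelihood function becomes
$$\log L = -n \log \alpha - \frac{1}{\alpha} \sum_{i=1}^{n} (x_i - \varepsilon) \tag{8.22}$$
Setting the derivative of Eq. 8.22 with respect to α to zero gives
$$\alpha = \frac{1}{n} \sum_{i=1}^{n} (x_i - \varepsilon) = m_x - \varepsilon$$
The parameter estimators are
$$\hat{\alpha} = \frac{n(m_x - x_1)}{n - 1} \tag{8.23}$$
$$\hat{\varepsilon} = \frac{n x_1 - m_x}{n - 1} \tag{8.24}$$
where $x_1$ is the minimum value in the sample.
For the Amite River data given in Example 3.6, n = 48, mean = 28,676, and $x_1$ = 3,180. Therefore,
α = 48(28,676 – 3,180)/(48 – 1) = 26,038.5
ε = [(48 × 3,180) – 28,676]/(48 – 1) = 2,637.5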
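Equations 8.23 and 8.24 are simple enough to wrap in a small helper. This is a sketch with an illustrative function name, evaluated at the summary statistics quoted in the text for the Amite River data.

```python
def two_param_exp_estimates(n, mean, x_min):
    """Eqs. 8.23 and 8.24: estimates of alpha and epsilon."""
    alpha = n * (mean - x_min) / (n - 1)
    eps = (n * x_min - mean) / (n - 1)
    return alpha, eps


# Summary statistics quoted in the text for the Amite River data.
alpha_hat, eps_hat = two_param_exp_estimates(n=48, mean=28676.0, x_min=3180.0)
print(round(alpha_hat, 1), round(eps_hat, 1))
```

A useful consistency check is that the two estimates add up to the sample mean, since the mean of this distribution is ε + α.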
[Figure 8-4: A typical likelihood function, plotted against α, with its maximum at α = 1/m_x.]
$$M_{i,j,k} = E\left[x^i F^j (1 - F)^k\right] = \int_0^1 [x(F)]^i F^j (1 - F)^k \, dF \tag{8.25}$$
where Mi,j,k is the probability-weighted moment of order (i, j, k), E is the expecta-
tion operator, and i, j, and k are real numbers. If j = k = 0 and i is a non-negative
integer then Mi,0,0 represents the conventional moment of order i about the
origin. If Mi,0,0 exists and X is a continuous function of F, then Mi,j,k exists for all
non-negative real numbers j and k.
$$M_{i,0,k} = \sum_{j=0}^{k} \binom{k}{j} (-1)^j M_{i,j,0} \tag{8.26}$$
$$M_{i,j,0} = \sum_{k=0}^{j} \binom{j}{k} (-1)^k M_{i,0,k} \tag{8.27}$$
If Mi,0,k exists and X is a continuous function of F then Mi,j,0 exists. When the
inverse X = X(F) of the distribution F = F(X) cannot be analytically defined, it
may, in general, be difficult to derive Mi,j,k analytically. We normally work with
the moments Mi,j,k into which x enters linearly. In particular, if we consider an
ordered sample in which x1 ≤ x2 ≤ … ≤ xn, the PWM for hydrologic applications
are defined as
$$M_{1,0,s} = a_s = \frac{1}{n} \sum_{i=1}^{n} x_i \binom{n-i}{s} \Big/ \binom{n-1}{s} \tag{8.28}$$
$$M_{1,r,0} = b_r = \frac{1}{n} \sum_{i=1}^{n} x_i \binom{i-1}{r} \Big/ \binom{n-1}{r} \tag{8.29}$$
$$a_s = \sum_{k=0}^{s} (-1)^k \binom{s}{k} b_k \tag{8.30}$$
$$b_r = \sum_{k=0}^{r} (-1)^k \binom{r}{k} a_k \tag{8.31}$$
Therefore,
$$a_0 = b_0, \quad b_0 = a_0; \qquad a_1 = b_0 - b_1, \quad b_1 = a_0 - a_1 \tag{8.32}$$
variability: PWM are more robust than conventional moments to outliers in the
data, enable more secure inferences to be made from small samples about an
underlying probability distribution, and they frequently yield more efficient
parameter estimates than the conventional moment estimates.
Example 8.5 Using the data of Example 8.4, find the parameters of a two-
parameter exponential distribution using the PWM method.
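Solution For the given data, the first two PWM are determined as
a0 = 28,675.8
a1 = 8,724.1
For the two-parameter exponential distribution, the corresponding distribution PWM are
$$a_0 = \int_0^1 x \, dF = \int_0^1 [\varepsilon - \alpha \ln(1 - F)] \, dF$$
$$a_1 = \int_0^1 x (1 - F) \, dF = \int_0^1 [\varepsilon - \alpha \ln(1 - F)] (1 - F) \, dF$$
Evaluating the integrals and equating them to the sample values gives
$$a_0 = \hat{\varepsilon} + \hat{\alpha}, \qquad a_1 = \hat{\varepsilon}/2 + \hat{\alpha}/4$$
$$\hat{\varepsilon} + \hat{\alpha} = 28{,}675.8, \qquad 2\hat{\varepsilon} + \hat{\alpha} = 34{,}896.4$$
which give $\hat{\varepsilon} = 6{,}220.6$ and $\hat{\alpha} = 22{,}455.2$.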
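The PWM calculation of this example can be sketched in code. The helper names below are illustrative; `sample_pwm_a` follows Eq. 8.28 for an arbitrary sample, and `exp2_from_pwm` solves the two linear equations for the two-parameter exponential distribution. Only the quoted values of a0 and a1 are used, because the Amite River record itself is not reproduced here.

```python
from math import comb


def sample_pwm_a(data, s):
    """Sample PWM a_s = M_{1,0,s} of Eq. 8.28 (data are sorted ascending here)."""
    x = sorted(data)
    n = len(x)
    total = sum(xi * comb(n - i, s) for i, xi in enumerate(x, start=1))
    return total / (n * comb(n - 1, s))


def exp2_from_pwm(a0, a1):
    """Solve a0 = eps + alpha and a1 = eps/2 + alpha/4 for (eps, alpha)."""
    eps = 4.0 * a1 - a0
    alpha = a0 - eps
    return eps, alpha


# PWM values quoted in Example 8.5.
eps_hat, alpha_hat = exp2_from_pwm(28675.8, 8724.1)
print(round(eps_hat, 1), round(alpha_hat, 1))
```

For a raw sample, one would call `sample_pwm_a(sample, 0)` and `sample_pwm_a(sample, 1)` first and pass the results to `exp2_from_pwm`.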
moments, and L-moments are related to each other. L-moments are known to
have several important advantages over ordinary moments. L-moments have less
bias than ordinary moments because they are linear combinations of ranked
observations. In contrast, the variance (second moment) and skewness (third moment) involve squaring and cubing of observations, respectively, which gives greater weight to the observations far from the mean. As a result, estimates of ordinary moments can have substantial bias and variance.
If X is a real-valued random variable with cumulative distribution F(x) and quantile function x(F), and $x_{1:n} \le x_{2:n} \le \ldots \le x_{n:n}$ are the order statistics of a sample of size n, then the rth L-moment of X (Hosking 1990) can be defined as a linear function of the expected order statistics as
$$L_r = \frac{1}{r} \sum_{k=0}^{r-1} (-1)^k \binom{r-1}{k} E\{X_{r-k:r}\}, \quad r = 1, 2, \ldots \tag{8.33}$$
where the expected value of an order statistic is
$$E\{X_{j:r}\} = \frac{r!}{(j-1)!\,(r-j)!} \int x \{F(x)\}^{j-1} \{1 - F(x)\}^{r-j} \, dF(x) \tag{8.34}$$
Equivalently,
$$L_r = E\left[x P^*_{r-1}\{F(x)\}\right] = \int_0^1 x(F) P^*_{r-1}(F) \, dF, \quad r = 1, 2, \ldots \tag{8.35}$$
where $P^*_r(F)$ is the rth shifted Legendre polynomial,
$$P^*_r(F) = \sum_{k=0}^{r} (-1)^{r-k} \binom{r}{k} \binom{r+k}{k} F^k \tag{8.36}$$
which can be written as
$$P^*_r(F) = \sum_{k=0}^{r} P_{r,k} F^k \tag{8.37}$$
and
$$P_{r,k} = (-1)^{r-k} \binom{r}{k} \binom{r+k}{k} \tag{8.38}$$
The shifted Legendre polynomials are related to the ordinary Legendre poly-
nomials Pr(u) as P*r(u) = Pr(2u – 1) and are orthogonal on the interval (0,1) with a
constant weight function.
$$L_1 = E(x) = \int_0^1 x \, dF \tag{8.39}$$
$$L_2 = \frac{1}{2} E(x_{2:2} - x_{1:2}) = \int_0^1 x (2F - 1) \, dF \tag{8.40}$$
$$L_3 = \frac{1}{3} E(x_{3:3} - 2x_{2:3} + x_{1:3}) = \int_0^1 x \left(6F^2 - 6F + 1\right) dF \tag{8.41}$$
$$L_4 = \frac{1}{4} E(x_{4:4} - 3x_{3:4} + 3x_{2:4} - x_{1:4}) = \int_0^1 x \left(20F^3 - 30F^2 + 12F - 1\right) dF \tag{8.42}$$
$$L_{P,r+1} = (-1)^r \sum_{k=0}^{r} P_{r,k} \alpha_k = \sum_{k=0}^{r} P_{r,k} \beta_k \tag{8.43}$$
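The relation between PWM and L-moments in Eq. 8.43 lends itself to a compact implementation. The sketch below assumes that $\beta_k$ in Eq. 8.43 is the sample PWM $b_k$ of Eq. 8.29; the function names and the three-point sample are illustrative only.

```python
from math import comb


def sample_pwm_b(data, r):
    """Sample PWM b_r = M_{1,r,0} of Eq. 8.29."""
    x = sorted(data)
    n = len(x)
    total = sum(xi * comb(i - 1, r) for i, xi in enumerate(x, start=1))
    return total / (n * comb(n - 1, r))


def legendre_coeff(r, k):
    """Coefficient P_{r,k} of the shifted Legendre polynomial (Eq. 8.38)."""
    return (-1) ** (r - k) * comb(r, k) * comb(r + k, k)


def sample_l_moment(data, order):
    """Sample L-moment of the given order (1, 2, ...) via Eq. 8.43 with b_k."""
    r = order - 1
    return sum(legendre_coeff(r, k) * sample_pwm_b(data, k) for k in range(r + 1))


sample = [1.0, 2.0, 3.0]          # tiny symmetric sample, illustration only
l1, l2, l3 = (sample_l_moment(sample, j) for j in (1, 2, 3))
print(l1, l2, l3)
```

The coefficients returned by `legendre_coeff(3, k)` reproduce the polynomial in Eq. 8.42, which is a quick way to confirm that the sign convention is right. The sample must contain at least as many points as the order requested.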
Example 8.6 Using the data of Example 8.4, find the parameters of exponential
distribution using the L-moments.
Solution The parameter estimation using L-moments is very similar to the
method of moments. As described earlier, in the method of moments we equate
the first k conventional moments of the distribution to the first k conventional
sample moments, whereas in the L-moment method, the first k L-moments of the
distribution are equated to the first k L-moments of the sample data.
We determined the first two PWM in Example 8.5 as a0 = 28,675.8 and a1 = 8,724.1. Using Eq. 8.4 we obtain the first two L-moments based on the sample data:
sample LP,1 = a0 = 28,675.8
sample LP,2 = a0 – 2a1 = 28,675.8 – 2 × 8,724.1 = 11,227.6
Now we will determine the algebraic expression for the first two L-moments
of the exponential distribution for which the parameters are to be determined.
The inverse form of exponential distribution given by Eq. 8.20 is written as
x = ε – α ln(1 – F) (8.47)
Using Eq. 8.25, we can write expressions for PWM as
$$a_0 = \int_0^1 x \, dF = \int_0^1 [\varepsilon - \alpha \ln(1 - F)] \, dF = \varepsilon + \alpha$$
$$a_1 = \int_0^1 x (1 - F) \, dF = \int_0^1 [\varepsilon - \alpha \ln(1 - F)] (1 - F) \, dF = \frac{\varepsilon}{2} + \frac{\alpha}{4}$$
Now equating the distribution moments with the sample moments, one can find the estimators of ε and α. Thus,
distribution LP,1 = sample LP,1: ε + α = 28,675.8
distribution LP,2 = sample LP,2: a0 – 2a1 = α/2 = 11,227.6
yielding
α = 2 × 11,227.6 = 22,455.2
ε = 28,675.8 – 22,455.2 = 6,220.6
models in hydrology. Williams and Yeh (1983) described MOLS and its variants
for use in rainfall–runoff models. Jones (1971) linearized weight factors for least
squares (LS) fitting. Shrader et al. (1981) developed a mixed-mode version of
MOLS and applied it to estimate parameters of the log-normal distribution. Sny-
der (1972) reported on fitting of distribution functions by nonlinear least
squares. Stedinger and Tasker (1985) performed regional hydrologic analysis
using ordinary, weighted, and generalized least squares.
MOLS is quite good for mathematical functions that can be linearized. Most
of the distributions used in engineering analysis can be linearized rather easily.
For these distributions, the calculations are relatively easy and straightforward.
Further, this technique provides a good measure of the goodness of fit of the
chosen distribution in the form of R-square value (coefficient of determination).
MOLS is generally best used with complete data sets containing no censored or
interval data.
Let Y = f(X; a1, a2,…, am) be a linearized form of a distribution function, where $a_j$, j = 1, 2,…, m, are parameters to be estimated. The method of least squares involves estimating parameters by minimizing the sum of squares of all deviations between observed and computed values of Y. Mathematically, this sum S can be expressed as
$$S = \sum_{i=1}^{n} d_i^2 = \sum_{i=1}^{n} [y_0(i) - y_c(i)]^2 = \sum_{i=1}^{n} \left[y_0(i) - f(x_i; a_1, a_2, \ldots, a_m)\right]^2 \tag{8.50}$$
where $y_0(i)$ is the ith observed value of Y, $y_c(i)$ is the ith computed value of Y, and n > m is the number of observations. The minimum of S in Eq. 8.50 can be obtained by differentiating S partially with respect to each parameter and equating each derivative to zero:
$$\frac{\partial}{\partial a_j} \sum_{i=1}^{n} \left[y_0(i) - f(x_i; a_1, a_2, \ldots, a_m)\right]^2 = 0, \quad j = 1, 2, \ldots, m \tag{8.51}$$
This leads to m equations, usually called the normal equations, which are
then solved to estimate the m parameters. This method is used to estimate
parameters of a linear regression model. For instance, suppose a linear equation
of the type
yi = a + bxi (8.52)
The residual error at this point is $e_i = y_i - \hat{y}_i$, which measures how well the least-squares line conforms to the raw data. If the line passes through each sample point, every error $e_i$ is zero. The sum of the squares of the errors is
$$S_{se} = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 \tag{8.54}$$
Example 8.7 The precipitation and runoff for a catchment for the month of July
are given in Table E8-7. The relationship between rainfall and runoff follows a
linear relation of the form y = a + bx, where y represents runoff and x precipita-
tion. Estimate parameters a and b using MOLS.
Solution Parameters a and b are computed using Eq. 8.55. To that end, $\bar{x}$ = 687.05/16 = 42.94 and $\bar{y}$ = 234.04/16 = 14.63. The other quantities required to calculate a and b, such as $S_{xy}$ and $S_{xx}$, are computed in Table E8-7. Thus,
b = Sxy/Sxx = 369.423/570.0559 = 0.648
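The least-squares computation can be sketched as follows. Table E8-7 is not reproduced in this part of the text, so the precipitation and runoff values below are hypothetical; the intercept formula a = ȳ − b·x̄ is the standard least-squares result and is assumed here to be what Eq. 8.55 provides.

```python
def fit_line(x, y):
    """Least-squares estimates (a, b) for y = a + b*x using b = Sxy/Sxx."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b = s_xy / s_xx
    a = y_bar - b * x_bar
    return a, b


# Hypothetical precipitation/runoff pairs, for illustration only.
precip = [20.0, 35.0, 42.0, 50.0, 61.0]
runoff = [2.1, 9.8, 14.9, 19.6, 27.2]
a_hat, b_hat = fit_line(precip, runoff)
residuals = [yi - (a_hat + b_hat * xi) for xi, yi in zip(precip, runoff)]
print(round(a_hat, 3), round(b_hat, 3))
```

With an intercept in the model, the residuals of the fitted line sum to zero, which gives a simple sanity check on any implementation.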
Example 8.8 Solve Example 8.4 using the method of ordinary least squares.
Solution The probability density function of a two-parameter exponential distribution is given by
$$f_X(x) = \frac{1}{\alpha} \exp\left(-\frac{x - \varepsilon}{\alpha}\right), \quad \varepsilon < x < \infty$$
The cumulative distribution function for this two-parameter exponential distribution is given by
$$F_X(x) = \int_{\varepsilon}^{x} f_X(u) \, du = \int_{\varepsilon}^{x} \frac{1}{\alpha} \exp\left(-\frac{u - \varepsilon}{\alpha}\right) du = 1 - \exp\left(-\frac{x - \varepsilon}{\alpha}\right)$$
so that
$$1 - F_X(x) = \exp\left(-\frac{x - \varepsilon}{\alpha}\right)$$
Taking the logarithm of both sides gives
$$\ln[1 - F_X(x)] = \frac{\varepsilon}{\alpha} - \frac{1}{\alpha} x$$
This equation is the linearized form of the two-parameter exponential distribution, as it matches Eq. 8.52 with y = ln[1 – FX(x)], a = ε/α, and b = –1/α. Now, we can use MOLS as described earlier. It is easier to perform the calculations of the required quantities in a tabular form as given in Table E8-8.
Now using Eq. 8.55, one has
$$\hat{b} = \frac{-870741.27}{2.10 \times 10^{10}} = -0.00004$$
and
Table E8-8 Calculations of the required quantities for Example 8.8 (Continued)
Columns: Rank i | X | Plotting position FX(x) = i/(N+1) | yi = ln[1–FX(x)] | (xi–Mx)2 | (xi–Mx)(yi–My)
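It is known that a = ε/α and b = –1/α. The estimates of α and ε are given as
$$\hat{\alpha} = -\frac{1}{\hat{b}} = 24{,}070.15$$
(computed with the unrounded value of $\hat{b}$; the rounded value –0.00004 would give 25,000), and
$$\hat{\varepsilon} = \hat{a} \hat{\alpha} \approx 0.23 \times 24{,}070.15 = 5{,}541.68$$
where the product is likewise evaluated with the unrounded value of $\hat{a}$.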
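The steps of Example 8.8 (rank the data, assign plotting positions i/(N + 1), regress ln[1 − F] on x, and back-transform) can be sketched in a few lines. The function name is illustrative, and the check uses synthetic flows generated from made-up parameter values rather than the Amite River record, which is not reproduced here.

```python
import math


def fit_exp2_by_least_squares(flows):
    """Fit the two-parameter exponential via the linearized form
    ln[1 - F] = eps/alpha - x/alpha, with plotting position F_i = i/(N + 1)."""
    x = sorted(flows)
    n = len(x)
    y = [math.log(1.0 - i / (n + 1)) for i in range(1, n + 1)]
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b = s_xy / s_xx               # slope, equals -1/alpha
    a = y_bar - b * x_bar         # intercept, equals eps/alpha
    alpha = -1.0 / b
    eps = a * alpha
    return alpha, eps


# Synthetic flows placed exactly on an exponential quantile curve
# (alpha = 24000 and eps = 5500 are made-up values for the check).
n_obs = 20
synthetic = [5500.0 - 24000.0 * math.log(1.0 - i / (n_obs + 1))
             for i in range(1, n_obs + 1)]
alpha_hat, eps_hat = fit_exp2_by_least_squares(synthetic)
print(round(alpha_hat, 2), round(eps_hat, 2))
```

Carrying the slope and intercept at full precision, as this sketch does, avoids the sensitivity to rounding that arises when α is obtained as the reciprocal of a very small slope.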
[Figure: y = ln[1-PX(x)] plotted against flow, x.]
8.2.1 Bias
Bias measures how close an estimator is, on average, to the true parameter value. Let the parameter be a and its estimate â. The estimate â is called an unbiased estimate of a if the expected value of the estimate equals the true value of the parameter (i.e., E[â] = a). Otherwise, the estimate is said to be biased (i.e., E[â] ≠ a). When the true parameter values are known, the bias of their estimators can be calculated by
$$\text{Bias}(\hat{a} \mid a) = \frac{1}{m} \sum_{j=1}^{m} (\hat{a}_j - a) = E[\hat{a}] - a \tag{8.57}$$
where j = 1,..., m indexes the samples, â∈{â1,â2,â3,…}, and a∈{a1, a2, a3,…}. An unbiased estimate has a probability distribution whose mean equals the actual value of the parameter. For the sake of convenience, we will write bias(â|a) as bias(â). Obviously, bias(â) = 0 for an unbiased estimate. It should, however, be noted that an individual â may not be equal to or even close to a even if bias(â) = 0; zero bias simply implies that the average of many independent estimates of parameter a will be equal to its true value. The bias(â) is usually considered to be additive, so that bias(â) = E[â] − a. When we have a biased estimate, the bias usually depends on the number of observations, n. An estimate is said to
where SD(â) is the standard deviation of â. The following are the properties of
unbiased estimators:
1. They are not unique. For example, let x1, x2,... , xn constitute a random sample from a uniform distribution with the range defined by parameters a1 and a2, and take a1 = 0. Then, [(n+1)/n]yn is an unbiased estimator of a2, where yn = max(x1, x2,... , xn) is the largest sample value. Further, $2\bar{x}$, twice the sample mean, is also an unbiased estimator of a2.
2. If ã is an unbiased estimator of a, it does not necessarily follow that f(ã) is an unbiased estimator of f(a), where f(.) is any mathematical function operating on parameter a. For example, the square root of the sample variance is not an unbiased estimator of the standard deviation.
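Both properties can be illustrated with a small Monte Carlo experiment. The sketch below assumes a uniform distribution on (0, a2) with a2 = 10, a sample size of 10, and a fixed random seed; all of these settings are illustrative choices, not values from the text.

```python
import random

random.seed(12345)        # fixed seed so the run is repeatable
upper = 10.0              # assumed true a2; the lower limit a1 is taken as 0
n, trials = 10, 20000

sum_max = sum_mean = sum_sd = 0.0
for _ in range(trials):
    xs = [random.uniform(0.0, upper) for _ in range(n)]
    m = sum(xs) / n
    sum_max += (n + 1) / n * max(xs)          # [(n+1)/n] * y_n
    sum_mean += 2.0 * m                        # twice the sample mean
    sum_sd += (sum((x - m) ** 2 for x in xs) / (n - 1)) ** 0.5

true_sd = upper / 12 ** 0.5                    # sd of a uniform(0, upper) variable
bias_max = sum_max / trials - upper
bias_mean = sum_mean / trials - upper
bias_sd = sum_sd / trials - true_sd
print(bias_max, bias_mean, bias_sd)
```

The first two average errors should be close to zero, while the average of the sample standard deviation should fall slightly below the true value, even though the sample variance it is computed from (with divisor n − 1) is unbiased.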
Should the lack of bias be considered a desirable property? If many unbiased
estimates are computed from statistically independent sets of observations hav-
ing the same parameter value, the average of these estimates will be close to the
true parameter value. This property does not mean that the estimate has less
error than a biased one; there exist biased estimates whose mean-squared errors
are smaller than unbiased ones. In such cases, the biased estimate is usually
asymptotically unbiased. Lack of bias is good, but that is just one aspect of how
we evaluate estimators.
8.2.2 Efficiency
Efficiency refers to the variance of an estimator. An efficient estimate â of a has to satisfy two conditions: (1) It must be unbiased, and (2) its variance must be at least as small as that of any other unbiased estimate of a. If there are two estimates of a, say, â1 and â2, then the relative efficiency of â2 with respect to â1 is defined as the ratio of their variances (i.e., var(â1)/var(â2)). Mathematically, it is given as
$$e = \frac{E\left[(\hat{a}_1 - a)^2\right]}{E\left[(\hat{a}_2 - a)^2\right]} \tag{8.59}$$
If e < 1, then â1 is more efficient than â2. Only an efficient estimate has e = 1. If an efficient estimate exists, it may be approximately obtained by the use of MLE or the entropy method.
An efficient estimate has a mean-squared error that equals a particular lower bound known as the Cramer–Rao bound. If an efficient estimate exists (the Cramer–Rao bound being the greatest lower bound), it is optimum in the mean-squared sense, meaning that no other estimate has a smaller mean-squared error. If an unbiased estimator ã of parameter a exists, then under some very general conditions var(ã) satisfies the Cramer–Rao inequality
$$\operatorname{var}(\tilde{a}) \ge \frac{1}{n E\left[\left(\dfrac{\partial \ln f(X)}{\partial a}\right)^2\right]} \tag{8.60}$$
$$e = \lim_{n \to \infty} \frac{4n}{\pi (2n + 1)} = \lim_{n \to \infty} \frac{4}{\pi (2 + 1/n)} = \frac{2}{\pi} \approx 0.64$$
Thus, the mean is more efficient than the median for all sample sizes for a normal population. For large samples, the mean requires only about 64% as many observations as the median to estimate μ with the same reliability. Using the Cramer–Rao inequality given by Eq. 8.60, one can confirm that $\bar{X}$ is an MVUE (minimum variance unbiased estimator) of the mean μ of a normal distribution.
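The approach of the efficiency ratio to its limit 2/π can be tabulated directly from the expression above. This is a small sketch; the function name is illustrative, and n is used exactly as it appears in the expression.

```python
import math


def median_efficiency(n):
    """Efficiency of the median relative to the mean: 4n / (pi * (2n + 1))."""
    return 4.0 * n / (math.pi * (2.0 * n + 1.0))


for n in (1, 5, 10, 50, 100):
    print(n, round(median_efficiency(n), 3))

limit = 2.0 / math.pi          # limiting value as n grows without bound
```

The ratio rises steadily with n but stays below 2/π, which is why the mean is the more efficient estimator at every sample size.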
[Figure: Efficiency of the median with respect to the mean, plotted against sample size (n).]
$$\text{MSE}(\hat{a}) = E\left[(\hat{a} - a)^2\right] \tag{8.61}$$
$$\text{MSE}(\hat{a}) = E\left[(\hat{a} - E[\hat{a}] + E[\hat{a}] - a)^2\right] = E\left[\left\{(\hat{a} - E[\hat{a}]) + (E[\hat{a}] - a)\right\}^2\right]$$
$$\text{MSE}(\hat{a}) = E\left[(\hat{a} - E[\hat{a}])^2\right] + E\left[(E[\hat{a}] - a)^2\right] + 2E\left[(\hat{a} - E[\hat{a}])(E[\hat{a}] - a)\right]$$
On simplifying, the third term vanishes because $E[\hat{a} - E[\hat{a}]] = 0$, and the remaining expression is
$$\text{MSE}(\hat{a}) = \operatorname{var}(\hat{a}) + [\text{bias}(\hat{a})]^2 \tag{8.62}$$
Equation 8.62 shows that the MSE of an estimator â, which is the expected squared deviation of the estimator from the true value, can be computed as the bias squared plus the variance of the estimator. The MSE combines both bias and variance in a logical way and is therefore a convenient measure of how closely â approximates a.
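The decomposition of the MSE into variance plus squared bias can be verified on simulated estimates. The sketch below uses an illustrative setting that is not from the text: the divide-by-n variance estimator applied to samples of size 8 from a normal distribution with variance 4.

```python
import random
import statistics

random.seed(2024)               # fixed seed so the run is repeatable
sigma2_true = 4.0               # assumed true variance of N(0, 2^2)
n, trials = 8, 5000

estimates = []
for _ in range(trials):
    xs = [random.gauss(0.0, 2.0) for _ in range(n)]
    m = sum(xs) / n
    estimates.append(sum((x - m) ** 2 for x in xs) / n)   # divide-by-n variance

mse = statistics.fmean((e - sigma2_true) ** 2 for e in estimates)
bias = statistics.fmean(estimates) - sigma2_true
variance = statistics.pvariance(estimates)
print(mse, variance + bias ** 2)
```

The two printed numbers agree to rounding error because the identity holds exactly for the empirical distribution of the estimates, not only in expectation.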
Because all estimators are unbiased, bias(a1) = bias(a2) = bias[0.5 × (a1 + a2)] = 0.
Therefore, MSE will be governed by the variance only and will be minimum for
the variable having the smallest variance. Using the properties of the exponential
distribution, one can write
So,
Example 8.11 Consider the previous example and determine what value of con-
stant b minimizes the MSE of b × (a1 + a2).
Solution Let us represent the estimator b × (a1 + a2) by â, so that
Now differentiating MSE with respect to b and equating it to zero, one deter-
mines the minimum of the MSE function:
$$\frac{d\left[\text{MSE}(\hat{a})\right]}{db} = 12b - 4 = 0$$
b = 1/3
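The derivative quoted above can be traced with a short worked derivation. This is a sketch under an assumption that is not stated in the surviving text of the example: a1 and a2 are taken to be independent, unbiased estimators of a, each with variance a² (as for a single observation from an exponential distribution with mean a).

```latex
\begin{aligned}
\hat{a} &= b\,(a_1 + a_2), \\
\operatorname{bias}(\hat{a}) &= E[\hat{a}] - a = (2b - 1)\,a, \\
\operatorname{var}(\hat{a}) &= b^2\,[\operatorname{var}(a_1) + \operatorname{var}(a_2)] = 2b^2 a^2, \\
\operatorname{MSE}(\hat{a}) &= 2b^2 a^2 + (2b - 1)^2 a^2 = (6b^2 - 4b + 1)\,a^2, \\
\frac{d\,\operatorname{MSE}(\hat{a})}{db} &= (12b - 4)\,a^2 = 0 \;\Rightarrow\; b = \tfrac{1}{3}.
\end{aligned}
```

The factor a² is positive, so it does not affect the minimizer, and the second derivative 12a² > 0 confirms a minimum. Under this assumption the choice b = 1/3 gives a biased estimator (bias of −a/3) whose MSE, a²/3, is nevertheless smaller than the MSE of a²/2 obtained with the unbiased choice b = 1/2.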
8.2.4 Consistency
As already explained, the bias refers to the mean value of the estimator and the efficiency refers to the variance of the estimator. Consistency is a property that refers to both the bias and the variance of the estimator. As shown in Eq. 8.62, MSE is a combination of the bias and variance of an estimator. We term an estimate consistent if the MSE tends to zero as the number of observations becomes large (i.e., $\lim_{n \to \infty} \text{MSE} = 0$). Thus, a consistent estimate must be at least asymptotically unbiased. In other words, the error in the estimator decreases toward zero as the sample size increases.
Unbiased estimates whose errors never diminish as more data are collected do exist. Their variances remain nonzero no matter how much data are available. Inconsistent estimates may provide reasonable estimates when the amount of data is limited, but they have the counterintuitive property that the quality of the estimate does not improve as the number of observations increases. Although an inconsistent estimate that has a smaller MSE than a consistent estimate over a pertinent range of values of n may be appropriate in certain circumstances, consistent estimates are usually favored in practice.
8.2.5 Sufficiency
An estimate of a parameter a is termed sufficient if it uses all of the information that
is contained in the sample and pertinent to the parameter estimation. More pre-
cisely, let a1 and a2 be two independent estimates of a. Estimate a1 is considered a
sufficient estimate if the joint probability distribution of a1 and a2 has the property
f(a1,a2) = f (a1)f(a2|a1) = f(a1)K(x1, x2,… , xn) (8.63)
$$\text{NSE} = \sigma(\hat{a}) / a \tag{8.64}$$
where σ(.) denotes the standard deviation of â and is computed as
$$\sigma(\hat{a}) = \left[\frac{1}{n - 1} \sum_{i=1}^{n} \left\{\hat{a}_i - E(\hat{a}_i)\right\}^2\right]^{1/2} \tag{8.65}$$
where the summation is over n estimates â of a. This measure is similar to the coefficient of variation.
$$\text{RME} = \frac{1}{n} \left(\sum_{i=1}^{n} \left[\frac{Q_0 - Q_c}{Q_0}\right]^2\right)^{0.5} \tag{8.66}$$
$$\text{RAE} = \frac{1}{n} \sum_{i=1}^{n} \frac{Q_0 - Q_c}{Q_c} \tag{8.67}$$
$$\text{NRMSE} = \frac{\text{RMSE}}{\sigma(a)} \tag{8.69b}$$
8.2.9 Robustness
Kuczera (1982a,b,c) defined a robust estimator as one that is resistant and efficient over a wide range of population fluctuations. Two criteria for resistant estimators are mini-max and minimum average RMSE. According to the mini-max criterion, the maximum RMSE over all population cases should be a minimum. Thus, for a resistant estimator the average RMSE as well as the maximum RMSE should be a minimum.
$$m_x = \frac{1}{n} X_1 + \frac{1}{n} X_2 + \frac{1}{n} X_3 + \ldots + \frac{1}{n} X_n \tag{8.70}$$
The sample mean mx is considered a random variable, for X1 , X2 ,… , Xn are
random variables. This can be seen by observing that any repetition of the n
observations will result in different values for X1, X2, X3,… , Xn. Therefore, X1,
X2, X3,… , Xn are regarded as n random variables. For each sample these vari-
ables will take on values in accord with their probability distributions. This
means that mx is the sum of n random variables Xi, each to be divided by n.
From the central limit theorem, one can first conclude that mx is approxi-
mately normally distributed. As described in Chapter 5, the approximation is
better when n is large. If X itself is already normally distributed, then the sample
mean is normally distributed even if n is only 2. Second, taking the expectation
of mx in Eq. 8.70 one obtains
$$E(m_x) = \frac{1}{n} E(X_1) + \frac{1}{n} E(X_2) + \frac{1}{n} E(X_3) + \ldots + \frac{1}{n} E(X_n) \tag{8.71}$$
The mean value of all the variables Xi is evidently μx. It follows that
E(mx) = μx (8.72)
One can determine the variance of mx. The terms in Eq. 8.70 are independent
and for independent variables it is known that the variance of a sum is equal to
the sum of the variances. Therefore,
$$\operatorname{var}(m_x) = \frac{1}{n^2} \operatorname{var}(X_1) + \frac{1}{n^2} \operatorname{var}(X_2) + \frac{1}{n^2} \operatorname{var}(X_3) + \ldots + \frac{1}{n^2} \operatorname{var}(X_n) = \frac{\sigma^2}{n} \tag{8.73}$$
In summary, $m_x \sim N(\mu, \sigma/\sqrt{n})$; that is, $m_x$ is normally distributed with a mean equal to the mean of the variable and a standard deviation equal to the standard deviation of the variable divided by the square root of n. Equation 8.73 shows that the mean of a set of observations becomes less variable as the number of observations increases. In the limit the variance approaches zero and the sample mean approaches the mean of the distribution, μ. If X is normally distributed, then $m_x$ is normally distributed regardless of the sample size; otherwise, $m_x$ can be expected to follow a normal distribution approximately.
Example 8.12 At a building site, 45 samples of soil were taken and their analysis
showed that the mean compressive strength was 35,000 kPa with a standard
deviation of 600 kPa. Find the standard deviation of the mean. How many sam-
ples will be required to reduce this standard deviation by half?
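Solution The standard deviation of the mean is
$$\frac{\sigma}{\sqrt{n}} = \frac{600}{\sqrt{45}} = 89.44\ \text{kPa}$$
Because this quantity varies as $1/\sqrt{n}$, halving it requires four times as many samples, that is, 4 × 45 = 180 samples.
For a probability of 0.95, one can write
$$P\left[-1.96 \le \frac{\mu - m}{\sigma / \sqrt{n}} \le 1.96\right] = 0.95$$
or
$$P\left[m - 1.96 \frac{\sigma}{\sqrt{n}} \le \mu \le m + 1.96 \frac{\sigma}{\sqrt{n}}\right] = 0.95$$
or
P[34,825 ≤ μ ≤ 35,175] = 0.95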
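The numbers in this example follow from a few lines of arithmetic. The sketch below uses only the Python standard library; the variable names are illustrative, and the normal quantile is computed rather than read from a table.

```python
import math
from statistics import NormalDist

n, mean, sd = 45, 35000.0, 600.0      # values from Example 8.12

se = sd / math.sqrt(n)                # standard deviation of the mean
n_half = 4 * n                        # se varies as 1/sqrt(n), so 4x the samples halves it
z = NormalDist().inv_cdf(0.975)       # about 1.96 for a two-sided 95% interval
lower, upper = mean - z * se, mean + z * se
print(round(se, 2), n_half, round(lower), round(upper))
```

Changing the 0.975 argument to 0.995 or 0.95 gives the limits for the 0.99 and 0.90 probability levels, which shows directly how the interval widens or narrows with the chosen probability.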
It is acknowledged here that the true mean is not known, but the probability that the true mean lies in the range 34,825 to 35,175 is known to be 0.95. This leads to the interpretation of the mean as a random variable instead of a fixed value. In other words, here we go from a point estimate of a parameter to an interval estimate. The width of the interval depends on three factors.
1. The standard deviation of the data. The width of the interval increases with increasing standard deviation and vice versa, if all other things remain the same.
2. The probability, which was 0.95 in this example. As this probability increases, the interval becomes wider, and vice versa. Commonly used values of this probability are 0.99, 0.95, and 0.90.
3. The number of samples. As seen in this example, reducing the width of the interval requires a larger number of samples.
$$s^2 = \frac{1}{n} \sum_{i=1}^{n} (X_i - \mu)^2 \tag{8.74}$$
If μ is not known then one can calculate the sample variance from the deviations from the sample mean $m_x$. Dividing each term inside the summation sign in Eq. 8.74 by σ², and bringing this factor outside the summation, one gets
$$s^2 = \frac{\sigma^2}{n} \sum_{i=1}^{n} \left[\frac{X_i - \mu}{\sigma}\right]^2 \tag{8.75}$$
Each term in brackets within the summation sign in Eq. 8.75, $Z_i$, is a normally distributed random variable with a mean of zero and a standard deviation of one. The sum of squares of n such variables follows a distribution known as the chi-square distribution with parameter n, denoted as χ²(n):
$$\chi^2(n) = \sum_{i=1}^{n} Z_i^2, \quad \text{where } Z \sim N(0, 1),\ Z = \frac{x - \mu}{\sigma} \tag{8.76}$$
$$s^2 = \frac{\sigma^2}{n} \chi^2(n) \tag{8.77}$$
It can be shown that the mean of the chi-square distribution is equal to the
degrees of freedom:
var (Z) = E(Z2) – [E(Z)]2
1 = E(Z2) – 0
E(Z2) = 1
Since mx is, in general, not the same as μ, division by σ 2 does not result in a
standardized variable. One can write the departures as
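$$\begin{aligned}
\sum_{i=1}^{n} [X_i - \mu]^2 &= \sum_{i=1}^{n} [(X_i - m) + (m - \mu)]^2 \\
&= \sum_{i=1}^{n} [X_i - m]^2 + \sum_{i=1}^{n} [m - \mu]^2 + \sum_{i=1}^{n} 2 (X_i - m)(m - \mu) \\
&= \sum_{i=1}^{n} (X_i - m)^2 + n (m - \mu)^2 + 2 (m - \mu) \sum_{i=1}^{n} (X_i - m)
\end{aligned}$$
Because for each sample the sum of the departures from the sample mean $m_x$ is equal to zero, one obtains
$$\sum_{i=1}^{n} (X_i - \mu)^2 = \sum_{i=1}^{n} (X_i - m)^2 + n (m - \mu)^2 \tag{8.80}$$
Dividing by σ² gives
$$\sum_{i=1}^{n} \left[\frac{X_i - \mu}{\sigma}\right]^2 = \sum_{i=1}^{n} \left[\frac{X_i - m}{\sigma}\right]^2 + \left[\frac{m - \mu}{\sigma / \sqrt{n}}\right]^2 \tag{8.81}$$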
The term on the left side in Eq. 8.81 is the chi-square variable with n degrees
of freedom. The second term on the right side is the chi-square variable with one
degree of freedom, since the sample mean mx is normally distributed with a
mean of μ and a standard deviation of σ / n . Noting that the chi-square distri-
bution is additive, one can conclude that the first term on the right side is the
chi-square variable with n – 1 degrees of freedom.
Returning to Eq. 8.79 for calculating the sample variance and dividing each term in the summation sign by σ², one obtains
$$s^2 = \frac{\sigma^2}{n} \sum_{i=1}^{n} \left(\frac{X_i - m}{\sigma}\right)^2$$
$$s^2 = \frac{\sigma^2}{n} \chi^2(n - 1) \tag{8.82}$$
When s2 is calculated from the sample mean, the effect is one of reducing the
number of degrees of freedom of the chi-square variable by one. Figure 8-7 plots
the chi-square distribution for selected degrees of freedom.
[Figure 8-7: Chi-square probability density plotted against x for n = 2, 5, 10, and 24 degrees of freedom.]
$$E(s^2) = \frac{\sigma^2}{n} E\left[\chi^2(n - 1)\right] \tag{8.83}$$
It is known that the mean of the chi-square variable is equal to the number of degrees of freedom. Therefore,
$$E\left(s^2\right) = \frac{n - 1}{n} \sigma^2 \tag{8.84}$$
Equation 8.84 shows that on average the variance of a sample from a distribution with a variance of σ² tends to be smaller than σ². This is commonly expressed by saying that Eq. 8.79 produces a biased estimate of the variance σ². One can remove the bias in Eq. 8.79 by dividing by (n – 1) instead of n. Thus, an unbiased estimator of σ² is obtained as
$$s^2 = \frac{1}{n - 1} \sum_{i=1}^{n} (X_i - m)^2 \tag{8.85}$$
for which
$$s^2 = \frac{\sigma^2}{n - 1} \chi^2(n - 1) \tag{8.86}$$
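The factor (n − 1)/n in Eq. 8.84 can be confirmed exactly, without simulation, by enumerating every possible sample from a small discrete population. The population values and sample size below are hypothetical, and the check relies only on the observations being independent and identically distributed, so it does not need the normality assumption used in the chi-square argument.

```python
from fractions import Fraction
from itertools import product

population = [0, 1, 2, 3]      # hypothetical equally likely values
n = 3                          # sample size

mu = Fraction(sum(population), len(population))
sigma2 = sum((Fraction(v) - mu) ** 2 for v in population) / len(population)

samples = list(product(population, repeat=n))   # all equally likely samples


def sum_of_squares(sample):
    """Sum of squared departures from the sample mean."""
    m = Fraction(sum(sample), n)
    return sum((Fraction(x) - m) ** 2 for x in sample)


def expectation(stat):
    """Exact expected value of a statistic over all samples."""
    return sum(stat(s) for s in samples) / len(samples)


e_biased = expectation(lambda s: sum_of_squares(s) / n)          # divisor n
e_unbiased = expectation(lambda s: sum_of_squares(s) / (n - 1))  # divisor n - 1
print(e_biased, e_unbiased, sigma2)
```

Exact rational arithmetic is used so that the equality with (n − 1)σ²/n can be tested without any floating-point tolerance.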
The estimates of the 100(1 – α)% confidence interval for the variance can be constructed by
$$\frac{(n - 1) s^2}{\chi^2_{\alpha/2}} \le \sigma^2 \le \frac{(n - 1) s^2}{\chi^2_{(1 - \alpha/2)}}$$
where $\chi^2_{\alpha/2}$ and $\chi^2_{(1 - \alpha/2)}$ are the critical values of the chi-square distribution with (n – 1) degrees of freedom.
tables, one can read that the 95% and 5% confidence limits for χ²(24) are 36.415 and 13.848, respectively. Then
$$\frac{(n - 1) s^2}{\chi^2_{\alpha/2}} \le \sigma^2 \le \frac{(n - 1) s^2}{\chi^2_{(1 - \alpha/2)}}$$
$$\frac{(25 - 1)\, 3600^2}{36.415} \le \sigma^2 \le \frac{(25 - 1)\, 3600^2}{13.848}$$
$$2922.59 \le \sigma \le 4739.30$$
Hence, the limits of the standard deviation are 2,922.59 kPa and 4,739.30 kPa.
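The interval for the variance can be computed with a small helper once the chi-square critical values have been read from a table. In this sketch the function name is illustrative, and the critical values are passed in as plain numbers (the ones quoted in the text) rather than computed, so no statistical library is required.

```python
import math


def variance_interval(n, s, chi2_upper, chi2_lower):
    """Confidence limits for sigma^2 from (n - 1) s^2 divided by chi-square values."""
    ss = (n - 1) * s ** 2
    return ss / chi2_upper, ss / chi2_lower


# n, s, and the tabulated chi-square values (24 degrees of freedom) from the text.
var_lo, var_hi = variance_interval(25, 3600.0, chi2_upper=36.415, chi2_lower=13.848)
sd_lo, sd_hi = math.sqrt(var_lo), math.sqrt(var_hi)
print(round(sd_lo, 2), round(sd_hi, 2))
```

Note that the larger chi-square value gives the lower limit, so the interval is not symmetric about s.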
$$m_x \sim N\left(\mu, \frac{\sigma}{\sqrt{n}}\right) \tag{8.87}$$
Equation 8.87 can be used to obtain confidence limits for μ provided that σ is known. One can write
$$\mu = m_x + Z \frac{\sigma}{\sqrt{n}}, \quad \text{where } Z \sim N(0, 1) \tag{8.88}$$
Equation 8.88 shows that if the confidence limits of Z are known then those of μ can be calculated. The confidence limits for Z can be obtained from the table of the normal distribution. Substituting these in Eq. 8.88, one obtains the confidence limits for μ. In practice, however, σ is usually not known and is replaced by the sample standard deviation s. Therefore, one defines a new variable T such that
$$\mu = m_x + T \frac{s}{\sqrt{n}} \tag{8.89}$$
which is analogous to Eq. 8.88, with the relationship between T and Z as
$$T = \frac{\sigma}{s} Z \tag{8.90}$$
It is known that
$$\frac{\sigma^2}{s^2} = \frac{n - 1}{\chi^2(n - 1)}$$
Therefore,
$$T = \frac{Z \sqrt{n - 1}}{\sqrt{\chi^2(n - 1)}} \tag{8.91}$$
Example 8.15 Calculate the 90% confidence limits for the mean μ in the preceding example of the 25 test cylinders, for which $m_x$ = 34,000 kPa and s = 3,600 kPa.
[Figure 8-8: The T distribution for selected values of degrees of freedom (n = 2, 10, and 50); probability density plotted against x.]
$$\mu = m_x + t_{n-1} \frac{s}{\sqrt{n}}$$
= 34,000 + (3,600/5)t24
= 34,000 + 720t24
The 90% confidence limits for the t distribution with 24 degrees of freedom are ±1.711. Substitution in the preceding equation gives confidence limits as 32,768 kPa and 35,232 kPa.
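The same calculation can be wrapped in a helper that accepts the tabulated t value as an input. The function name is illustrative, and the critical value 1.711 is the one quoted in the example rather than computed.

```python
import math


def mean_interval(mean, s, n, t_crit):
    """Limits mean -/+ t_crit * s / sqrt(n) for the population mean."""
    half_width = t_crit * s / math.sqrt(n)
    return mean - half_width, mean + half_width


# Values from Example 8.15; 1.711 is the tabulated t value for 24 degrees of freedom.
lo, hi = mean_interval(34000.0, 3600.0, 25, t_crit=1.711)
print(round(lo), round(hi))
```

Passing the critical value explicitly keeps the helper independent of any particular table or library and makes the effect of the chosen confidence level easy to explore.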
Example 8.16 For the Sabarmati River data (Example 8.1), the mean and standard deviation were estimated as 664.29 m3/s and 346.41 m3/s using 98 samples. Determine 95% confidence limits for the mean.
Solution As before, we write
$$\mu = m_x + t_{n-1} \frac{s}{\sqrt{n}}$$
or
$$\mu = 664.29 + \frac{346.41}{(98)^{0.5}} \times t_{97}$$
For 97 degrees of freedom, the 95% confidence limits for the t distribution are approximately ±1.96, close to those of the normal distribution. Hence, μ = 664.29 ± 346.41/(98)^0.5 × 1.96, which gives limits of 732.88 and 595.70 m3/s.
8.4 Questions
8.1 Peak annual flow data of Buckhorn Creek observed at USGS station #02102192 near Corinth, North Carolina, are listed in Table Q8-1. Find the annual peak flow characteristics of the sample data and determine the candidate distribution(s) using the method of moments.
Table Q8-1
Year        1972  1973  1974  1975  1976  1977  1978  1979  1980  1981  1982  1983
Flow (cfs)  1530  3130   890  2150   891  1680  2820  1740   951    58   129   470

Year        1984  1985  1986  1987  1988  1989  1990  1991  1992  1993  1994
Flow (cfs)   781   319   766   889   114   562   328   216   390   770   453

Year        1995  1996  1997  1998  1999  2000  2001  2002  2003  2004  2005  2006
Flow (cfs)   401  1940   480  1190   913   314   828   347   982   200   284   750
$$f(x) = \frac{1}{\alpha} \exp\left(-\frac{x - \varepsilon}{\alpha}\right), \quad \varepsilon < x < \infty$$
Estimate the parameters of this distribution using the data of Question 8.1
and the method of maximum likelihood estimation.
8.5 Using the data of Question 8.1, find the parameters of a two-parameter
exponential distribution using the PWM method.
8.6 Using the data of Question 8.1, find the parameters of the exponential
distribution using the L-moment method.
8.7 Consider the data of Table Q8-7, in which x is the observed temperature
value and y is the model simulated temperature value for a stream. The
relationship between observed and model simulated temperatures can
Table Q8-7
x y x y x y x y
46.00 41.7 61 56.4 73.4 69.6 78.3 79.2
48.20 52.4 61.9 59.3 74.5 71.1 78.6 78.1
52.70 50.8 63.3 60.3 75 81.2 79 82.4
55.00 54 66.2 60.4 75.7 74.4 79.2 80.5
56.70 55.3 67.6 63.6 76.1 77.8 79.5 82.4
59.20 53.1 69.3 69.5 77 78.7 80.4 82
59.90 56.5 69.6 69.1 77 75.2 80.4 76
59.90 46.9 70.5 60.7 77 75.6 81.9 78.5
60.6 55.7 72.1 69.8 77.5 81.4 82 80.9
61 61.4 72.9 71.4 77.9 77.4 85.3 83.3
Table Q8-13
Day  Observed DO (mg/L)   Day  Observed DO (mg/L)   Day  Observed DO (mg/L)   Day  Observed DO (mg/L)
1 3.3 9 7.7 17 8.4 25 9.4
2 6.6 10 7.8 18 8.5 26 9.6
3 6.8 11 7.8 19 8.5 27 9.6
4 6.8 12 7.9 20 8.9 28 10.2
5 7.5 13 8.1 21 9.1 29 10.2
6 7.5 14 8.2 22 9.2 30 10.4
7 7.7 15 8.1 23 9.2 31 10.5
8 7.7 16 8.2 24 9.3 32 8.2
8.15 For the daily DO concentration data of Question 8.13 determine the 95%
confidence interval for the variance of the daily DO concentration.
8.16 For the peak annual flow of Buckhorn Creek given in Question 8.1 esti-
mate parameters of the two-parameter gamma distribution function
using the method of moments and maximum likelihood method and
compare the two sets of parameter estimates.
Chapter 9
Environmental and water resource systems are inherently spatial and complex,
and our understanding of these systems is less than complete. Many of the sys-
tems are either fully stochastic or part stochastic and part deterministic. Their
stochastic nature can be attributed to randomness in one or more of the follow-
ing components that constitute them: (1) system structure (geometry), (2) system
dynamics, (3) forcing functions (sources and sinks), and (4) initial and boundary
conditions. As a result, a stochastic description of these systems is needed, and
entropy theory enables development of such a description.
Engineering decisions concerning environmental and water resource sys-
tems are frequently made with less than adequate information. Such decisions
may often be based on experience, professional judgment, rules of thumb, crude
analyses, safety factors, or probabilistic methods. Usually, decision making
under uncertainty tends to be relatively conservative. Quite often, sufficient data
are not available to describe the random behavior of such systems. Although
probabilistic methods allow for a more explicit and quantitative accounting of
uncertainty, their major difficulty occurs because of the lack of sufficient or com-
plete data. Small sample sizes and limited information render estimation of
probability distributions of system variables with conventional methods diffi-
cult. This problem can be alleviated by use of entropy theory, which enables
determination of the least-biased probability distributions with limited
knowledge and data. Where data shortages are widespread, as is normally
the case in developing countries, entropy theory is particularly appealing.
Since the development of entropy theory by Shannon in the late 1940s and of
the principle of maximum entropy by Jaynes in the late 1950s, there has been a
proliferation in application of entropy. The real impetus to entropy-based mod-
eling in environmental and water resources was however provided in the early
1970s, and a great variety of entropy-based applications have since been
reported and new applications continue to unfold. The objective of this chapter
is to briefly discuss entropy theory and demonstrate its usefulness for modeling
and risk analysis in water resources and environmental systems.
$$H = -kN \sum_i \frac{N_i}{N} \ln \frac{N_i}{N} = -k^* \sum_i p_i \ln p_i \tag{9.1a}$$
where H is the entropy of the system, pi is the fraction of particles in energy state
i, N is the total number of particles in the system, Ni is the number of particles in
energy state i, and k* is Boltzmann’s constant.
If there were no energy loss, a hydraulic system would be orderly and orga-
nized. It is the energy loss and its causes that make the system disorderly and
chaotic. Thus, entropy can be interpreted as a measure of the amount of chaos
within a system. Algebraically, it is proportional to the logarithm of the probabil-
ity of the state the system is in. The constant of proportionality is the Boltzmann
constant and this defines the Boltzmann entropy or statistical entropy.
Shannon (1948) developed entropy theory for expression of information or
uncertainty in communication. He expressed the average information conveyed
$$H = -kN \sum_j p_j \ln p_j = -k^* \sum_j p_j \ln p_j \tag{9.1b}$$
$$P(X) = (p_1, p_2, \ldots, p_n); \quad \sum_{i=1}^{n} p_i = 1; \quad p_i \ge 0, \; i = 1, 2, \ldots, n \tag{9.1c}$$
If this experiment is repeated, the same outcome is not likely, implying that
there is uncertainty as to the outcome of the experiment. Based on one’s knowl-
edge about the outcomes, the uncertainty can be more or less. For example, the
total number of outcomes is a piece of information and the number of those out-
comes with nonzero probability is another piece of information. The probability
distribution of the outcomes, if known, provides a certain amount of informa-
tion. Shannon (1948) defined a quantitative measure of uncertainty associated
with a probability distribution or the information content of the distribution in
terms of entropy, H(P) or H(X), called Shannon entropy or informational entropy
(with k* taken as unity) as
$$H(X) = H(P) = -\sum_{i=1}^{n} p_i \ln p_i = E[-\ln p] \tag{9.2}$$
$$H(X) = -\int_0^{\infty} f(x) \ln[f(x)]\,dx = -\int \ln[f(x)]\,dF(x) = E[-\ln f(x)] \tag{9.3}$$
where f(x) is the probability density function of X, F(x) is the cumulative proba-
bility distribution function of X, and E [.] is the expectation of [.].
Thus, entropy is a measure of the amount of uncertainty represented by the
probability distribution and is a measure of the amount of chaos or of the lack of
information about a system. If complete information is available, entropy = 0.
Otherwise, it is greater than zero. The uncertainty can be quantified using
entropy by taking into account all different kinds of available information. The
Shannon entropy is the weighted Boltzmann entropy.
Scheme 1 Scheme 2
R0 R1 R0 R1
Probability 0.5 0.5 0.9 0.1
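As a brief numerical illustration of Eq. 9.2, the following sketch evaluates the Shannon entropy (in nats, with k* taken as unity) for the two probability assignments tabulated above. The function name `shannon_entropy` is introduced here for illustration only and is not from the text; the convention 0 ln 0 = 0 is assumed.

```python
import math

def shannon_entropy(probs):
    """Shannon entropy H = -sum(p_i ln p_i) in nats (Eq. 9.2); 0 ln 0 is taken as 0."""
    if abs(sum(probs) - 1.0) > 1e-9 or any(p < 0 for p in probs):
        raise ValueError("probabilities must be nonnegative and sum to 1")
    return -sum(p * math.log(p) for p in probs if p > 0)

scheme_1 = [0.5, 0.5]  # outcomes R0 and R1 equally likely
scheme_2 = [0.9, 0.1]  # outcome R0 much more likely than R1

h1 = shannon_entropy(scheme_1)
h2 = shannon_entropy(scheme_2)
print(f"Scheme 1: H = {h1:.4f} nats")
print(f"Scheme 2: H = {h2:.4f} nats")
```

Scheme 1, in which both outcomes are equally likely, carries the larger entropy (ln 2), which is consistent with the interpretation of entropy as a measure of uncertainty.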
The maximum entropy in the presence of some information will be less than
the maximum entropy in the absence of that information. The difference
between these two maximum entropies may be regarded as a measure of the
bias resulting from the given information. Maximizing entropy amounts to min-
imizing this bias. For this reason, POME is said to give the minimally biased
assignment of probabilities and POME may be called the principle of minimum
bias or minimum prejudice.
If no information is available on the random variable, then all possible out-
comes are equally likely, that is, pi = 1/n, i = 1, 2, 3,… , n. It can be shown that the
Shannon entropy is maximum in this case and may indeed serve as an upper
bound of entropy for all cases involving some information. In a more general
case, let the information available about P or X be
$$p_i \ge 0, \quad \sum_{i=1}^{n} p_i = 1 \tag{9.4}$$
and
$$\sum_{i=1}^{n} g_r(x_i)\, p_i = a_r, \quad r = 1, 2, \ldots, m \tag{9.5}$$
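To make Eqs. 9.4 and 9.5 concrete, the sketch below solves the simplest constrained case: a single constraint g_1(x) = x, that is, a specified mean. It assumes the exponential form p_i proportional to exp(−λ x_i) that POME yields for such a constraint, and finds λ by bisection. The function name, the bracketing interval for λ, and the die example (faces 1 to 6 with a specified mean of 4.5) are illustrative assumptions, not part of the text.

```python
import math

def pome_with_mean(values, target_mean, tol=1e-12):
    """Maximum-entropy probabilities p_i ~ exp(-lam * x_i) with sum(p_i * x_i) = target_mean."""
    if not min(values) < target_mean < max(values):
        raise ValueError("target mean must lie strictly between min and max of values")

    def probs(lam):
        exps = [-lam * x for x in values]
        top = max(exps)                      # shift for numerical stability
        weights = [math.exp(e - top) for e in exps]
        z = sum(weights)                     # partition function (up to the shift)
        return [w / z for w in weights]

    def mean(lam):
        return sum(p * x for p, x in zip(probs(lam), values))

    lo, hi = -50.0, 50.0                     # assumed bracket; the mean decreases as lam increases
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if mean(mid) > target_mean:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    lam = 0.5 * (lo + hi)
    return lam, probs(lam)

faces = [1, 2, 3, 4, 5, 6]
lam, p = pome_with_mean(faces, 4.5)
print("lambda_1 =", round(lam, 5))
print("p =", [round(pi, 4) for pi in p])
```

When the specified mean equals the unconstrained mean (3.5 for the die), the multiplier is zero and the uniform distribution is recovered, which agrees with the statement that equal probabilities maximize the entropy when no further information is available.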
mean annual flood). As usual, f(x) is a possible function for every x in some
interval (a, b) and is normalized to unity such that
$$\int_a^b f(x)\,dx = 1 \tag{9.7}$$
$$I[f] = I[x] = -k \int_a^b f(x) \ln[f(x)/m(x)]\,dx \tag{9.8}$$
$$I[f] = I[x] = -\int_a^b f(x) \ln[f(x)]\,dx; \quad \int_a^b f(x)\,dx = 1 \tag{9.9}$$
We may think of I[f] as the mean value of –ln[f(x)]. Actually, –I measures the
strength and +I measures the weakness. SEF allows us to choose the f(x) that
minimizes the uncertainty. Note that f(x) is conditioned on the constraints used
for its derivation. Singh (1988, 1998a, 1998b) has described the theory of entropy
and has given expressions of SEF for a number of probability distributions.
Shannon (1948) showed that I is a unique function and the only one that sat-
isfies the following properties:
1. It is a function of the probabilities f1, f2,… , fn, where n is the number of
data points.
2. It follows an additive law, that is, I[xy] = I[x] + I[y].
3. It monotonically increases with the number of outcomes when fi are all
equal.
4. It is consistent and continuous.
According to POME, “the minimally prejudiced assignment of probabilities is
that which maximizes entropy subject to the given information.” Mathematically,
it can be stated as follows: Given m linearly independent constraints C in the form
$$C_i = \int_a^b y_i(x) f(x)\,dx, \quad i = 1, 2, \ldots, m \tag{9.10}$$
where yi(x) are some functions whose averages over f(x) are specified, the maxi-
mum of I, subject to the conditions in Eq. 9.10, is given by the distribution
$$f(x) = \exp\left[-\lambda_0 - \sum_{i=1}^{m} \lambda_i y_i(x)\right] \tag{9.11}$$
where λi, i = 0, 1,…, m, are Lagrange multipliers and can be determined from
Eq. 9.10 and Eq. 9.11 along with the normalization condition in Eq. 9.9. An
increase in the number of constraints leads to less uncertainty about the infor-
mation concerning the system.
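$$I[f] = \lambda_0 \int_a^b f(x)\,dx + \sum_{i=1}^{m} \lambda_i \int_a^b y_i(x) f(x)\,dx$$
or
$$I[f] = \lambda_0 + \sum_{i=1}^{m} \lambda_i C_i \tag{9.12}$$
$$\int_a^b \exp\left[-\lambda_0 - \sum_{i=1}^{m} \lambda_i y_i\right] dx = 1 \tag{9.13}$$
resulting in
$$\lambda_0 = \ln \int_a^b \exp\left[-\sum_{i=1}^{m} \lambda_i y_i\right] dx \tag{9.14}$$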
The Lagrange multipliers are related to the given information (or con-
straints) by
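$$-\frac{\partial \lambda_0}{\partial \lambda_i} = C_i \tag{9.15}$$
$$\frac{\partial^2 \lambda_0}{\partial \lambda_i^2} = \operatorname{var}[y_i(x)]; \quad \frac{\partial^2 \lambda_0}{\partial \lambda_i \,\partial \lambda_j} = \operatorname{cov}[y_i(x), y_j(x)], \; i \ne j \tag{9.16}$$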
With the Lagrange multipliers estimated from Eq. 9.15 and Eq. 9.16, the fre-
quency distribution given by Eq. 9.11 is uniquely defined. It is implied that the
distribution parameters are uniquely related to the Lagrange multipliers.
Clearly, this procedure states that a frequency distribution is uniquely defined
by specification of constraints and application of POME.
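As a short worked illustration of Eqs. 9.10 through 9.16 (a sketch under stated assumptions, not an example from the text), suppose the only constraint is y_1(x) = x on the interval (0, ∞), with specified mean x̄. The steps then reduce to the exponential distribution:

```latex
\begin{aligned}
f(x) &= \exp(-\lambda_0 - \lambda_1 x) && \text{(Eq. 9.11 with } m = 1\text{)} \\
\lambda_0 &= \ln \int_0^{\infty} \exp(-\lambda_1 x)\,dx = -\ln \lambda_1 && \text{(Eq. 9.14)} \\
-\frac{\partial \lambda_0}{\partial \lambda_1} &= \frac{1}{\lambda_1} = \bar{x}
  \;\Rightarrow\; \lambda_1 = \frac{1}{\bar{x}} && \text{(Eq. 9.15)} \\
\frac{\partial^2 \lambda_0}{\partial \lambda_1^2} &= \frac{1}{\lambda_1^2} = \bar{x}^2 = \operatorname{var}[x] && \text{(Eq. 9.16)} \\
f(x) &= \frac{1}{\bar{x}} \exp\!\left(-\frac{x}{\bar{x}}\right)
\end{aligned}
```

The single constraint therefore fixes both Lagrange multipliers, and the resulting variance x̄² is the familiar variance of the exponential distribution.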
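$$I(f) = -\int_{-\infty}^{\infty} f(x; \theta) \ln f(x; \theta)\,dx \tag{9.17}$$
$$\frac{\partial I[f]}{\partial \lambda_i} = 0, \quad i = 1, 2, \ldots, n - 1 \tag{9.18}$$
and
$$\frac{\partial I[f]}{\partial \theta_i} = 0, \quad i = 1, 2, \ldots, n \tag{9.19}$$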
Solution of Eq. 9.18 and Eq. 9.19 yields estimates of parameters of the
distribution.
$$D(P, Q) = \sum_{i=1}^{n} p_i \ln \frac{p_i}{q_i} \tag{9.20}$$
$$D(P, U) = \sum_{i=1}^{n} p_i \ln \left[\frac{p_i}{1/n}\right] = \ln n - \left(-\sum_{i=1}^{n} p_i \ln p_i\right) \tag{9.21}$$
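The relation in Eq. 9.21 can be checked numerically. The sketch below evaluates D(P, Q) of Eq. 9.20 against a uniform distribution U and compares it with ln n minus the Shannon entropy of P. The function name `cross_entropy` and the sample probabilities are assumptions chosen for illustration.

```python
import math

def cross_entropy(p, q):
    """D(P, Q) = sum(p_i ln(p_i / q_i)) of Eq. 9.20; terms with p_i = 0 are skipped."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

p = [0.9, 0.1]                      # illustrative distribution P
n = len(p)
uniform = [1.0 / n] * n             # uniform distribution U

d_pu = cross_entropy(p, uniform)
h_p = -sum(pi * math.log(pi) for pi in p if pi > 0)   # Shannon entropy of P
print(d_pu, math.log(n) - h_p)      # the two values agree (Eq. 9.21)
```

Because ln n is the entropy of the uniform distribution, D(P, U) is zero only when P itself is uniform.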
If the two random variables are dependent then the Shannon entropy of the
joint distribution is the sum of the marginal entropy of one variable and the con-
ditional entropy of the other variable conditioned on the realization of the first.
Expressed algebraically, this is
$$H(X, Y) = H(X) + H(Y \mid X) \tag{9.23}$$
$$H(X \mid Y) = -\sum_{i=1}^{n} \sum_{j=1}^{m} p(x_i, y_j) \ln\left[p(x_i \mid y_j)\right] \tag{9.24}$$
It is seen that if X and Y are independent then Eq. 9.23 reduces to Eq. 9.22.
Furthermore, the joint entropy of dependent X and Y will be less than or equal to
the joint entropy of independent X and Y, that is, H(X, Y) ≤ H(X) + H(Y). The dif-
ference between these two entropies defines transinformation T(X, Y) or T(P, Q)
expressed as
$$T(X, Y) = H(X) + H(Y) - H(X, Y) \tag{9.25}$$
$$T(X, Y) = H(Y) - H(Y \mid X) \tag{9.26}$$
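The quantities in Eqs. 9.23, 9.25, and 9.26 can be computed directly from a joint probability table. The following sketch does so for two small hypothetical tables, one with dependent variables and one with independent variables; the function names and the table values are illustrative assumptions.

```python
import math

def entropy(probs):
    """Shannon entropy in nats; zero-probability cells contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def transinformation(joint):
    """Return H(X), H(Y), H(X,Y), H(Y|X), T(X,Y) for joint[i][j] = p(x_i, y_j)."""
    px = [sum(row) for row in joint]            # marginal distribution of X
    py = [sum(col) for col in zip(*joint)]      # marginal distribution of Y
    hx, hy = entropy(px), entropy(py)
    hxy = entropy([p for row in joint for p in row])
    hy_given_x = hxy - hx                       # Eq. 9.23 rearranged
    t = hx + hy - hxy                           # Eq. 9.25
    return hx, hy, hxy, hy_given_x, t

dependent = [[0.4, 0.1],
             [0.1, 0.4]]
independent = [[0.25, 0.25],
               [0.25, 0.25]]

hx, hy, hxy, hy_given_x, t_dep = transinformation(dependent)
t_ind = transinformation(independent)[4]
print("dependent:   T =", round(t_dep, 4))
print("independent: T =", round(t_ind, 4))
```

For the independent table the transinformation vanishes, and for the dependent table it is positive, in line with H(X, Y) ≤ H(X) + H(Y).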
$$\exp(\lambda_0) = Z = \sum_{i=1}^{n} \exp\left[-\sum_{j=1}^{m} \lambda_j g_j(x_i)\right] \tag{9.27}$$
where Z is called the partition function and λ0 is the zeroth Lagrange multiplier.
The Lagrange parameters are obtained by differentiating Eq. 9.27 with respect to
Lagrange multipliers:
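$$\begin{aligned}
\frac{\partial \lambda_0}{\partial \lambda_j} &= -a_j = -E[g_j], \quad j = 1, 2, 3, \ldots, m \\
\frac{\partial^2 \lambda_0}{\partial \lambda_j^2} &= \operatorname{var}[g_j] \\
\frac{\partial^2 \lambda_0}{\partial \lambda_j \,\partial \lambda_k} &= \operatorname{cov}[g_j, g_k] \\
\frac{\partial^3 \lambda_0}{\partial \lambda_j^3} &= -\mu_3[g_j]
\end{aligned} \tag{9.28}$$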
where E[.] is the expectation, var[.] is the variance, cov[.] is the covariance, and
μ3 is the third moment about the centroid, all for gj.
When there are no constraints, then POME yields a uniform distribution. As
more constraints are introduced, the distribution becomes more peaked and pos-
sibly skewed. In this way, the entropy reduces from a maximum for the uniform
distribution to zero when the system is fully deterministic.
The derivation of the normal distribution by the entropy method is
described in the following.
$$f(x) = \frac{1}{b\sqrt{2\pi}} \exp\left[-\frac{(x - a)^2}{2b^2}\right] \tag{9.29}$$
Taking the logarithm to the base e, one gets
$$\begin{aligned}
\ln f(x) &= -\ln\sqrt{2\pi} - \ln b - \frac{(x - a)^2}{2b^2} \\
&= -\ln\sqrt{2\pi} - \ln b - \frac{x^2}{2b^2} - \frac{a^2}{2b^2} + \frac{2ax}{2b^2}
\end{aligned} \tag{9.30}$$
Multiplying Eq. 9.30 by [− f(x)] and integrating between − ∞ to ∞ , one gets
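$$\begin{aligned}
I(x) = -\int_{-\infty}^{\infty} f(x) \ln f(x)\,dx &= \left[\ln\sqrt{2\pi} + \ln b + \frac{a^2}{2b^2}\right] \int_{-\infty}^{\infty} f(x)\,dx \\
&\quad + \frac{1}{2b^2} \int_{-\infty}^{\infty} x^2 f(x)\,dx - \frac{a}{b^2} \int_{-\infty}^{\infty} x f(x)\,dx
\end{aligned} \tag{9.31}$$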
From Eq. 9.31, the constraints appropriate for Eq. 9.29 can be written as
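$$\begin{aligned}
\int_{-\infty}^{\infty} f(x)\,dx &= 1 \\
\int_{-\infty}^{\infty} x f(x)\,dx &= \bar{x} \\
\int_{-\infty}^{\infty} x^2 f(x)\,dx &= E[x^2] = s_x^2 + \bar{x}^2
\end{aligned} \tag{9.32}$$
where λ0, λ1, and λ2 are Lagrange multipliers. Substitution of Eq. 9.33 in the
normalization condition in the first of Eq. 9.32 gives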
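$$\exp(\lambda_0) = \int_{-\infty}^{\infty} \exp(-\lambda_1 x - \lambda_2 x^2)\,dx \tag{9.35}$$
Equation 9.35 defines the partition function. Completing the square in the
argument of the exponential in Eq. 9.35, one gets
$$\begin{aligned}
\exp(\lambda_0) &= \int_{-\infty}^{\infty} \exp\left(-\lambda_1 x - \lambda_2 x^2 + \frac{\lambda_1^2}{4\lambda_2} - \frac{\lambda_1^2}{4\lambda_2}\right) dx \\
&= \exp\left(\frac{\lambda_1^2}{4\lambda_2}\right) \int_{-\infty}^{\infty} \exp\left[-\left(x\sqrt{\lambda_2} + \frac{\lambda_1}{2\sqrt{\lambda_2}}\right)^2\right] dx
\end{aligned} \tag{9.36}$$
Now let
$$t = x\sqrt{\lambda_2} + \frac{\lambda_1}{2\sqrt{\lambda_2}} \tag{9.37}$$
Then
$$\frac{dt}{dx} = \sqrt{\lambda_2} \tag{9.38}$$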
Making use of Eqs. 9.37 and 9.38 in Eq. 9.36, we get
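$$\exp(\lambda_0) = \frac{\exp\left(\frac{\lambda_1^2}{4\lambda_2}\right)}{\sqrt{\lambda_2}} \int_{-\infty}^{\infty} \exp(-t^2)\,dt = \frac{2\exp\left(\frac{\lambda_1^2}{4\lambda_2}\right)}{\sqrt{\lambda_2}} \int_0^{\infty} \exp(-t^2)\,dt \tag{9.39}$$
$$\int_0^{\infty} \exp(-t^2)\,dt$$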
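Let k = t^2. Then dk/dt = 2t and t = k^0.5. Hence, this expression can be
simplified by making substitution for t to yield
$$\int_0^{\infty} \exp(-t^2)\,dt = \int_0^{\infty} \exp(-k) \frac{dk}{2k^{0.5}} = \frac{1}{2} \int_0^{\infty} k^{-0.5} \exp(-k)\,dk = \frac{1}{2} \int_0^{\infty} k^{[0.5 - 1]} \exp(-k)\,dk = \frac{\sqrt{\pi}}{2} \tag{9.40}$$
$$\exp(\lambda_0) = \frac{2\exp\left(\frac{\lambda_1^2}{4\lambda_2}\right)}{\sqrt{\lambda_2}} \frac{\sqrt{\pi}}{2} = \exp\left(\frac{\lambda_1^2}{4\lambda_2}\right) \sqrt{\frac{\pi}{\lambda_2}} \tag{9.41}$$
$$\lambda_0 = \frac{1}{2} \ln \pi - \frac{1}{2} \ln \lambda_2 + \frac{\lambda_1^2}{4\lambda_2} \tag{9.42}$$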
One also obtains the zeroth Lagrange multiplier from Eq. 9.35 as
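$$\lambda_0 = \ln \int_{-\infty}^{\infty} \exp(-\lambda_1 x - \lambda_2 x^2)\,dx \tag{9.43}$$
$$\begin{aligned}
\frac{\partial \lambda_0}{\partial \lambda_1} &= -\frac{\int_{-\infty}^{\infty} x \exp(-\lambda_1 x - \lambda_2 x^2)\,dx}{\int_{-\infty}^{\infty} \exp(-\lambda_1 x - \lambda_2 x^2)\,dx} = -\int_{-\infty}^{\infty} x \exp(-\lambda_0 - \lambda_1 x - \lambda_2 x^2)\,dx \\
&= -\int_{-\infty}^{\infty} x f(x)\,dx = -\bar{x}
\end{aligned} \tag{9.44}$$
$$\begin{aligned}
\frac{\partial \lambda_0}{\partial \lambda_2} &= -\frac{\int_{-\infty}^{\infty} x^2 \exp(-\lambda_1 x - \lambda_2 x^2)\,dx}{\int_{-\infty}^{\infty} \exp(-\lambda_1 x - \lambda_2 x^2)\,dx} = -\int_{-\infty}^{\infty} x^2 \exp(-\lambda_0 - \lambda_1 x - \lambda_2 x^2)\,dx \\
&= -\int_{-\infty}^{\infty} x^2 f(x)\,dx = -(s_x^2 + \bar{x}^2)
\end{aligned} \tag{9.45}$$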
Differentiating Eq. 9.42 with respect to λ1 and λ2, respectively, one obtains
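$$\frac{\partial \lambda_0}{\partial \lambda_1} = \frac{2\lambda_1}{4\lambda_2} = \frac{\lambda_1}{2\lambda_2} \tag{9.46}$$
$$\frac{\partial \lambda_0}{\partial \lambda_2} = -\frac{1}{2\lambda_2} - \frac{\lambda_1^2}{4\lambda_2^2} \tag{9.47}$$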
Equating Eq. 9.44 to Eq. 9.46 and Eq. 9.45 to Eq. 9.47, one gets
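$$\frac{\lambda_1}{2\lambda_2} = -\bar{x} \tag{9.48}$$
$$\frac{1}{2\lambda_2} + \frac{1}{4}\left(\frac{\lambda_1}{\lambda_2}\right)^2 = s_x^2 + \bar{x}^2 \tag{9.49}$$
$$\lambda_1 = -2\lambda_2 \bar{x} \tag{9.50}$$
$$\frac{1}{2\lambda_2} + \frac{4\lambda_2^2 \bar{x}^2}{4\lambda_2^2} = s_x^2 + \bar{x}^2 \;\Rightarrow\; \frac{1}{2\lambda_2} = s_x^2 \;\Rightarrow\; \lambda_2 = \frac{1}{2s_x^2} \tag{9.51}$$
$$\lambda_1 = -2 \frac{1}{2s_x^2} \bar{x} = -\frac{\bar{x}}{s_x^2} \tag{9.52}$$
$$\begin{aligned}
f(x) &= \exp\left[-\frac{1}{2} \ln \pi + \frac{1}{2} \ln \lambda_2 - \frac{\lambda_1^2}{4\lambda_2} - \lambda_1 x - \lambda_2 x^2\right] \\
&= \exp\left[\ln(\pi)^{-0.5} + \ln(\lambda_2)^{0.5} - \frac{\lambda_1^2}{4\lambda_2} - \lambda_1 x - \lambda_2 x^2\right] \\
&= (\pi)^{-0.5} (\lambda_2)^{0.5} \exp\left[-\frac{\lambda_1^2}{4\lambda_2} - \lambda_1 x - \lambda_2 x^2\right]
\end{aligned} \tag{9.53}$$
$$\lambda_1 = -a/b^2 \tag{9.54}$$
$$\lambda_2 = 1/(2b^2) \tag{9.55}$$
$$a = \bar{x} \tag{9.56}$$
$$b = s_x \tag{9.57}$$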
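$$\begin{aligned}
I(x) &= \left(\ln\sqrt{2\pi} + \ln s_x + \frac{\bar{x}^2}{2s_x^2}\right) \int_{-\infty}^{\infty} f(x)\,dx + \frac{1}{2s_x^2} \int_{-\infty}^{\infty} x^2 f(x)\,dx - \frac{\bar{x}}{s_x^2} \int_{-\infty}^{\infty} x f(x)\,dx \\
&= \left(\ln\sqrt{2\pi} + \ln s_x + \frac{\bar{x}^2}{2s_x^2}\right) + \frac{1}{2s_x^2}(\bar{x}^2 + s_x^2) - \frac{\bar{x}^2}{s_x^2} \\
&= \ln[s_x (2\pi e)^{0.5}]
\end{aligned} \tag{9.58}$$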
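The results of this derivation can be verified numerically. The sketch below computes the Lagrange multipliers from Eqs. 9.42, 9.51, and 9.52 for a hypothetical sample mean and standard deviation, then checks by trapezoidal integration that the resulting density integrates to unity and that its entropy matches Eq. 9.58. The sample statistics, function names, and integration limits (ten standard deviations on either side of the mean) are assumptions made for illustration.

```python
import math

def normal_pome(mean, std):
    """Lagrange multipliers from Eqs. 9.42, 9.51, and 9.52."""
    lam2 = 1.0 / (2.0 * std ** 2)
    lam1 = -mean / std ** 2
    lam0 = 0.5 * math.log(math.pi) - 0.5 * math.log(lam2) + lam1 ** 2 / (4.0 * lam2)
    return lam0, lam1, lam2

def pdf(x, lams):
    """Entropy-based density f(x) = exp(-lam0 - lam1 x - lam2 x^2)."""
    lam0, lam1, lam2 = lams
    return math.exp(-lam0 - lam1 * x - lam2 * x * x)

def integrate(func, a, b, n=20000):
    """Composite trapezoidal rule on [a, b] with n intervals."""
    h = (b - a) / n
    total = 0.5 * (func(a) + func(b))
    for i in range(1, n):
        total += func(a + i * h)
    return total * h

mean, std = 60.0, 8.0                      # hypothetical sample statistics
lams = normal_pome(mean, std)
lo, hi = mean - 10 * std, mean + 10 * std  # tails beyond this are negligible

area = integrate(lambda x: pdf(x, lams), lo, hi)
entropy = integrate(lambda x: -pdf(x, lams) * math.log(pdf(x, lams)), lo, hi)
closed_form = math.log(std * math.sqrt(2 * math.pi * math.e))   # Eq. 9.58
print(area, entropy, closed_form)
```

Note that the entropy in Eq. 9.58 depends only on the standard deviation and not on the mean, so shifting the distribution leaves its entropy unchanged.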
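$$\int_{-\infty}^{\infty} \left(\frac{xa}{b^2}\right) f(x)\,dx = E\left(\frac{xa}{b^2}\right) = \frac{a}{b^2} E(x) = \frac{a\bar{x}}{b^2} \tag{9.59}$$
$$\int_{-\infty}^{\infty} \left(\frac{x^2}{2b^2}\right) f(x)\,dx = E\left(\frac{x^2}{2b^2}\right) = \frac{s_x^2 + \bar{x}^2}{2b^2} \tag{9.60}$$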
The PDF corresponding to POME and consistent with Eqs. 9.32, 9.59, and
9.60 takes the form
$$f(x) = \exp\left[-\lambda_0 - \lambda_1 \frac{xa}{b^2} - \lambda_2 \frac{x^2}{2b^2}\right] \tag{9.61}$$
where λ0, λ1, and λ2 are Lagrange multipliers. Insertion of Eq. 9.61 into Eq. 9.32
yields
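$$\exp(\lambda_0) = \int_{-\infty}^{\infty} \exp\left(-\lambda_1 \frac{xa}{b^2} - \lambda_2 \frac{x^2}{2b^2}\right) dx = \frac{b\sqrt{2\pi}}{\sqrt{\lambda_2}} \exp\left(\frac{a^2 \lambda_1^2}{2\lambda_2 b^2}\right) \tag{9.62}$$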
Equation 9.62 is the partition function. Taking the logarithm of Eq. 9.62 leads
to the zeroth Lagrange multiplier, which can be expressed as
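$$\lambda_0 = \ln b + 0.5 \ln(2\pi) - 0.5 \ln \lambda_2 + \frac{a^2 \lambda_1^2}{2\lambda_2 b^2} \tag{9.63}$$
$$\lambda_0 = \ln \int_{-\infty}^{\infty} \exp\left[-\lambda_1 \frac{xa}{b^2} - \lambda_2 \left(\frac{x^2}{2b^2}\right)\right] dx \tag{9.64}$$
$$f(x) = \frac{\sqrt{\lambda_2}}{b\sqrt{2\pi}} \exp\left[-\left(\frac{a^2 \lambda_1^2}{2\lambda_2 b^2} + \frac{\lambda_1 xa}{b^2} + \frac{\lambda_2 x^2}{2b^2}\right)\right] \tag{9.65}$$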
A comparison of Eq. 9.65 with Eq. 9.29 shows that λ2 = 1 and λ1 = –1.
Taking the logarithm of Eq. 9.65 and multiplying by [− 1], one gets
$$-\ln f(x) = -\frac{1}{2} \ln \lambda_2 + \ln b + \frac{1}{2} \ln(2\pi) + \frac{a^2 \lambda_1^2}{2\lambda_2 b^2} + \frac{\lambda_1 xa}{b^2} + \frac{\lambda_2 x^2}{2b^2} \tag{9.66}$$
Multiplying Eq. 9.66 by f(x) and integrating from minus infinity to positive
infinity, we get the entropy function of the form
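$$I(f) = -\frac{1}{2} \ln \lambda_2 + \ln b + \frac{1}{2} \ln(2\pi) + \frac{a^2 \lambda_1^2}{2\lambda_2 b^2} + \frac{\lambda_1 a}{b^2} E[x] + \frac{\lambda_2}{2b^2} E[x^2] \tag{9.67}$$
$$\frac{\partial I}{\partial \lambda_1} = 0 = \frac{2a^2 \lambda_1}{2\lambda_2 b^2} + \frac{a}{b^2} E[x] \tag{9.68}$$
$$\frac{\partial I}{\partial \lambda_2} = 0 = -\frac{1}{2\lambda_2} - \frac{a^2 \lambda_1^2}{2\lambda_2^2 b^2} + \frac{1}{2b^2} E[x^2] \tag{9.69}$$
$$\frac{\partial I}{\partial a} = 0 = \frac{2a \lambda_1^2}{2b^2 \lambda_2} + \frac{\lambda_1}{b^2} E[x] \tag{9.70}$$
$$\frac{\partial I}{\partial b} = 0 = \frac{1}{b} - \frac{2a^2 \lambda_1^2}{2\lambda_2 b^3} - \frac{2a \lambda_1}{b^3} E[x] - \frac{2\lambda_2}{2b^3} E[x^2] \tag{9.71}$$
$$E[x] = a \tag{9.72}$$
$$E[x^2] = a^2 + b^2 \tag{9.73}$$
$$E[x] = a \tag{9.74}$$
$$E[x^2] = b^2 + a^2 \tag{9.75}$$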
Equations 9.72 and 9.74 are the same, and so are Eqs. 9.73 and 9.75. Thus
the parameter estimation equations are Eqs. 9.72 and 9.73.
$$f(x) = \frac{1}{a\Gamma(b)} \left(\frac{x}{a}\right)^{b-1} e^{-x/a} \tag{9.76}$$
where a > 0 and b > 0 are parameters. The gamma distribution is a two-
parameter distribution. Its CDF can be expressed as
$$F(x) = \int_0^{x} \frac{1}{a\Gamma(b)} \left(\frac{t}{a}\right)^{b-1} e^{-t/a}\,dt \tag{9.77}$$
$$F(y) = F(\chi^2 \mid \nu) \tag{9.79}$$
$$u = \left[\left(\frac{\chi^2}{\nu}\right)^{1/3} + \frac{2}{9\nu} - 1\right] \left(\frac{9\nu}{2}\right)^{1/2} \tag{9.80}$$
This helps us to compute F(x) for a given x by first computing y = x/a and
χ 2 = 2y and then inserting these values into Eq. 9.80 to obtain u. Given a value
of u, F(x) can be obtained from the normal distribution tables.
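The procedure just described can be sketched in a few lines. The code below follows the stated steps (y = x/a, χ² = 2y, then Eq. 9.80) and replaces the normal distribution tables with the error function. The degrees of freedom are taken as ν = 2b, which is an assumption here because the defining relation for ν is not shown above; the function name is likewise illustrative.

```python
import math

def gamma_cdf_wilson_hilferty(x, a, b):
    """Approximate F(x) for the gamma distribution of Eq. 9.76 by way of Eq. 9.80."""
    y = x / a
    chi2 = 2.0 * y
    nu = 2.0 * b   # assumption: degrees of freedom nu = 2b
    u = ((chi2 / nu) ** (1.0 / 3.0) + 2.0 / (9.0 * nu) - 1.0) * math.sqrt(9.0 * nu / 2.0)
    # Standard normal CDF evaluated at u (stands in for the normal tables)
    return 0.5 * (1.0 + math.erf(u / math.sqrt(2.0)))

# Compare with closed forms available for integer b
approx_b1 = gamma_cdf_wilson_hilferty(1.0, 1.0, 1.0)
exact_b1 = 1.0 - math.exp(-1.0)                  # exponential case, b = 1
approx_b2 = gamma_cdf_wilson_hilferty(4.0, 2.0, 2.0)
exact_b2 = 1.0 - math.exp(-2.0) * (1.0 + 2.0)    # b = 2, y = x/a = 2
print(approx_b1, exact_b1)
print(approx_b2, exact_b2)
```

For these two cases the approximation agrees with the closed-form values to within about 0.002, which is usually adequate for table-based work.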
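$$\ln f(x) = -\ln a\Gamma(b) + (b - 1) \ln x - (b - 1) \ln a - \frac{x}{a}$$
$$\begin{aligned}
I(f) = -\int_0^{\infty} f(x) \ln f(x)\,dx &= [\ln a\Gamma(b) + (b - 1) \ln a] \int_0^{\infty} f(x)\,dx \\
&\quad - (b - 1) \int_0^{\infty} [\ln x] f(x)\,dx + \frac{1}{a} \int_0^{\infty} x f(x)\,dx
\end{aligned} \tag{9.82}$$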
From Eq. 9.82 the constraints appropriate for Eq. 9.76 can be written (Singh
et al. 1985, 1986) as
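$$\int_0^{\infty} f(x)\,dx = 1 \tag{9.83}$$
$$\int_0^{\infty} x f(x)\,dx = \bar{x} \tag{9.84}$$
$$\int_0^{\infty} [\ln x] f(x)\,dx = E[\ln x] \tag{9.85}$$
$$f(x) = \exp[-\lambda_0 - \lambda_1 x - \lambda_2 \ln x] \tag{9.86}$$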
where λ0, λ1, and λ2 are Lagrange multipliers. Substitution of Eq. 9.86 in Eq. 9.83
yields
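$$\begin{aligned}
\exp(\lambda_0) &= \int_0^{\infty} \exp[-\lambda_1 x - \lambda_2 \ln x]\,dx = \int_0^{\infty} \exp[-\lambda_1 x] \exp[-\lambda_2 \ln x]\,dx \\
&= \int_0^{\infty} \exp[-\lambda_1 x] \exp[\ln x^{-\lambda_2}]\,dx
\end{aligned} \tag{9.88}$$
$$\exp(\lambda_0) = \int_0^{\infty} \left(\frac{y}{\lambda_1}\right)^{-\lambda_2} \exp(-y) \frac{dy}{\lambda_1} = \frac{1}{\lambda_1^{1-\lambda_2}} \int_0^{\infty} y^{-\lambda_2} e^{-y}\,dy = \frac{1}{\lambda_1^{1-\lambda_2}} \Gamma(1 - \lambda_2) \tag{9.89}$$
$$\lambda_0 = \ln \int_0^{\infty} \exp[-\lambda_1 x - \lambda_2 \ln x]\,dx \tag{9.91}$$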
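$$\begin{aligned}
\frac{\partial \lambda_0}{\partial \lambda_1} &= -\frac{\int_0^{\infty} x \exp[-\lambda_1 x - \lambda_2 \ln x]\,dx}{\int_0^{\infty} \exp[-\lambda_1 x - \lambda_2 \ln x]\,dx} = -\int_0^{\infty} x \exp[-\lambda_0 - \lambda_1 x - \lambda_2 \ln x]\,dx \\
&= -\int_0^{\infty} x f(x)\,dx = -\bar{x}
\end{aligned} \tag{9.92}$$
$$\begin{aligned}
\frac{\partial \lambda_0}{\partial \lambda_2} &= -\frac{\int_0^{\infty} \ln x \exp[-\lambda_1 x - \lambda_2 \ln x]\,dx}{\int_0^{\infty} \exp[-\lambda_1 x - \lambda_2 \ln x]\,dx} \\
&= -\int_0^{\infty} \ln x \exp[-\lambda_0 - \lambda_1 x - \lambda_2 \ln x]\,dx = -\int_0^{\infty} \ln x\, f(x)\,dx = -E[\ln x]
\end{aligned} \tag{9.93}$$
$$\frac{\partial \lambda_0}{\partial \lambda_1} = \frac{\lambda_2 - 1}{\lambda_1} \tag{9.94}$$
$$\frac{\partial \lambda_0}{\partial \lambda_2} = \ln \lambda_1 + \frac{\partial}{\partial \lambda_2} \ln \Gamma(1 - \lambda_2) \tag{9.95}$$
Let 1 − λ2 = k. Then
$$\frac{\partial k}{\partial \lambda_2} = -1 \tag{9.96}$$
$$\frac{\partial \lambda_0}{\partial \lambda_2} = \ln \lambda_1 + \frac{\partial}{\partial k} \ln \Gamma(k) \frac{\partial k}{\partial \lambda_2} = \ln \lambda_1 - \psi(k) \tag{9.97}$$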
From Eq. 9.92 and Eq. 9.94 as well as Eqs. 9.93 and 9.96 and 9.97, one gets
$$\frac{\lambda_2 - 1}{\lambda_1} = -\bar{x}; \quad \bar{x} = \frac{k}{\lambda_1} \tag{9.98}$$
We can find the value of k (= 1 − λ2) from Eq. 9.100 and substitute it in Eq. 9.98
to get λ1.
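A minimal sketch of this two-step solution follows. It combines Eqs. 9.93 and 9.97, which give E[ln x] = ψ(k) − ln λ1, with Eq. 9.98 to obtain the single equation ψ(k) − ln k = E[ln x] − ln x̄ for k, solves it by bisection, and then computes λ1 = k/x̄. The digamma routine (recurrence plus asymptotic series), the bisection bounds, and the function names are assumptions introduced here for illustration.

```python
import math

def digamma(x):
    """Digamma function psi(x) for x > 0, by recurrence and an asymptotic series."""
    result = 0.0
    while x < 6.0:                 # shift the argument upward: psi(x) = psi(x + 1) - 1/x
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    result += math.log(x) - 0.5 / x - inv2 * (1.0 / 12.0 - inv2 * (1.0 / 120.0 - inv2 / 252.0))
    return result

def gamma_pome(mean_x, mean_ln_x):
    """Solve psi(k) - ln k = E[ln x] - ln(mean x) for k; then lambda_1 = k / mean x."""
    target = mean_ln_x - math.log(mean_x)
    if target >= 0.0:
        raise ValueError("E[ln x] must be smaller than ln of the mean")
    lo, hi = 1e-6, 1e6             # assumed search bounds for k
    for _ in range(200):
        mid = math.sqrt(lo * hi)   # bisection in log space
        if digamma(mid) - math.log(mid) < target:
            lo = mid               # left side increases with k, so move right
        else:
            hi = mid
    k = math.sqrt(lo * hi)
    return k, k / mean_x

# Consistency check with known parameters b = k = 2.5 and a = 1/lambda_1 = 3.0
k_true, a_true = 2.5, 3.0
mean_x = k_true * a_true
mean_ln_x = digamma(k_true) + math.log(a_true)
k_hat, lam1_hat = gamma_pome(mean_x, mean_ln_x)
print(k_hat, 1.0 / lam1_hat)
```

In practice, x̄ and E[ln x] would be replaced by the sample mean of the data and the sample mean of their logarithms.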
If λ2 = 1 − k then
$$f(x) = \frac{\lambda_1^k}{\Gamma(k)} \exp[-\lambda_1 x]\, x^{k-1} \tag{9.102}$$
and
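$$\lambda_2 = 1 - b \tag{9.104}$$
$$ba = \bar{x} \tag{9.105}$$
$$\begin{aligned}
I(f) &= [\ln a\Gamma(b) + \ln a^{b-1}] - (b - 1) E[\ln x] + \frac{\bar{x}}{a} \\
&= \ln\left[a\Gamma(b) a^{b-1}\right] + \frac{\bar{x}}{a} - (b - 1) E[\ln x] \\
&= \ln\left[\Gamma(b) a^b\right] + \frac{\bar{x}}{a} - (b - 1) E[\ln x]
\end{aligned} \tag{9.107}$$
$$\int_0^{\infty} \ln\left(\frac{x}{a}\right)^{b-1} f(x)\,dx = E\left[\ln\left(\frac{x}{a}\right)^{b-1}\right] \tag{9.109}$$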
Also, from Eq. 9.112 one gets the zeroth Lagrange multiplier
$$\lambda_0 = \ln \int_0^{\infty} \exp\left[-\lambda_1 \left(\frac{x}{a}\right) - \lambda_2 \ln\left(\frac{x}{a}\right)^{b-1}\right] dx \tag{9.113}$$
Introduction of Eq. 9.112 in Eq. 9.110 produces
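$$f(x) = \frac{1}{a} (\lambda_1)^{1 - \lambda_2 (b-1)} \frac{1}{\Gamma[1 - \lambda_2 (b - 1)]} \exp\left[-\lambda_1 \frac{x}{a} - \lambda_2 \ln\left(\frac{x}{a}\right)^{b-1}\right] \tag{9.114}$$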
Comparison of Eq. 9.114 with Eq. 9.76 shows that λ1 = 1 and λ2 = − 1. Taking
the logarithm of Eq. 9.114 yields
$$\ln f(x) = -\ln a + [1 - \lambda_2 (b - 1)] \ln \lambda_1 - \ln \Gamma[1 - \lambda_2 (b - 1)] - \lambda_1 \frac{x}{a} - \lambda_2 \ln\left(\frac{x}{a}\right)^{b-1} \tag{9.115}$$
Multiplying Eq. 9.115 by [− f(x)] and integrating from 0 to ∞ yields the
entropy function of the gamma distribution. This can be written as
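$$I(f) = \ln a - [1 - \lambda_2 (b - 1)] \ln \lambda_1 + \ln \Gamma[1 - \lambda_2 (b - 1)] + \lambda_1 E\left[\frac{x}{a}\right] + \lambda_2 E\left[\ln\left(\frac{x}{a}\right)^{b-1}\right] \tag{9.116}$$
$$\frac{\partial I}{\partial \lambda_1} = 0 = -[1 - \lambda_2 (b - 1)] \frac{1}{\lambda_1} + E\left(\frac{x}{a}\right) \tag{9.117}$$
$$\frac{\partial I}{\partial \lambda_2} = 0 = +(b - 1) \ln \lambda_1 - (b - 1) \Psi(K) + E\left[\ln\left(\frac{x}{a}\right)^{b-1}\right], \quad K = 1 - \lambda_2 (b - 1) \tag{9.118}$$
$$\frac{\partial I}{\partial a} = 0 = +\frac{1}{a} - \frac{\lambda_1}{a} E\left[\frac{x}{a}\right] + \frac{(1 - b)}{a} \lambda_2 \tag{9.119}$$
$$\frac{\partial I}{\partial b} = 0 = +\lambda_2 \ln \lambda_1 + E\left[\ln\left(\frac{x}{a}\right)^{b-1}\right] - \lambda_2 \Psi(K) \tag{9.120}$$
$$E\left(\frac{x}{a}\right) = b \tag{9.121}$$
$$E\left[\ln\left(\frac{x}{a}\right)\right] = \Psi(k) \tag{9.122}$$
$$E\left(\frac{x}{a}\right) = b \tag{9.123}$$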
Equation 9.121 is the same as Eq. 9.123. Therefore, Eq. 9.121 and Eq. 9.122 are
the parameter estimation equations.
function is defined for comparing two hypotheses. The evidence in favor of one
hypothesis over its competitor is the difference between the respective entropies
of the competition and the hypothesis under test. Defining surprisal as the nega-
tive of the logarithm of the probability, one can express the mean surprisal for a
set of observations. Therefore, the evidence function for two hypotheses is
obtained as the difference between the two values of the mean surprisal multi-
plied by the number of observations.
probability density function of failure at a value of the failure indicator then the
cumulative probability of the failure indicator defines fragility.
In PRA, one considers the probability of failure from loads exceeding design
loads. Through introduction of a hazard function, the probability density func-
tion of external loads is specified. Consider a hydraulic system with a set of ran-
dom parameters. The system can fail in many ways and every failure mode can
be described with a corresponding failure indicator involving a number of dif-
ferent failure modes, which is a function of the hazard parameters and the ran-
dom parameters. For example, in case of an earth dam, depending on the failure
mode, a failure indicator could be erosion at the bottom, reservoir water level,
water leakage, displacement, etc. Hazard parameters could be extreme rainfall,
reservoir level, peak discharge, depth of water at the dam top, etc. Structural
random parameters could be strengths of materials, degree of riprap, degree of
packing, internal friction, etc.
$$H(f) = \frac{1}{2} \ln(2w) + \frac{1}{4w} \int_{-w}^{+w} \ln[W(f)]\,df \tag{9.126}$$
where w is the frequency band. Equation 9.126 is maximized subject to the
constraint equations given as autocorrelations up to lag m:
$$\rho(n) = \int_{-w}^{+w} W(f) \exp(i 2\pi f n \Delta t)\,df, \quad -m \le n \le +m \tag{9.127}$$
where Δt is the sampling time interval and i = (−1)^(1/2). Maximization of
Eq. 9.126 is equivalent to maximizing
$$H(f) = \int_{-w}^{+w} \ln[W(f)]\,df \tag{9.128}$$
which is known as the Burg entropy. The spectrum W(f) can be expressed in
terms of the Fourier series as
$$W(f) = \frac{1}{2w} \sum_{n=-\infty}^{\infty} \rho(n) \exp[i 2\pi n f \Delta t] \tag{9.129}$$
MESA and found MESA to be superior. Eilbert and Christensen (1983) analyzed
annual hydrological forecasts for central California and found that dry years
might be more predictable than wet years. Dalezios and Tyraskis (1989)
employed MESA to analyze multiple precipitation time series.
the networks are gathering the needed information optimally. Entropy theory is a
natural tool to make that determination. Krstanovic and Singh (1992a,b)
employed the theory for space and time evaluation of rainfall networks in Louisi-
ana. The decision whether to keep or to eliminate a rain gauge was based entirely
on reduction or gain of information at that gauge. Yang and Burn (1994)
employed a measure of information flow, called directional information transfer
index (DIT), between gauging stations in the network. The value of DIT varies
from zero, where no information is transmitted and the stations are independent,
to one, where no information is lost and the stations are fully dependent. Between
two stations of one pair, the station with higher DIT value should be retained
because of its greater capability of inferring information at the other side.
9.5.14 Hydraulics
Yang (1994) showed that the fundamental theories in hydrodynamics and
hydraulics can be derived from variational approaches based on maximization
of entropy, minimization of energy, or minimization of energy dissipation rate.
Barbe et al. (1991) and Chiu and Murray (1992) applied POME to determine the
probability distribution of velocity in nonuniform open-channel flow. The
entropy-based velocity distribution fits experimental data very well and is of
great practical value in hydraulic modeling.
Harmancioglu and Singh (1998) reviewed the advantages as well as the limita-
tions of the entropy method as applied to the design of water quality monitoring
networks. Given an observed change in water quality levels at a downstream
location, the entropy-based formulation predicts the probabilities of each possi-
ble water quality level at each of the upstream stations.
9.6 Closure
Entropy theory permits determination of the least-biased probability distribu-
tion of a random variable, subject to the available information. It suggests
whether the available information is adequate and, if not, then additional infor-
mation should be sought. In this way it brings the model, the modeler, and the
decision maker closer together. As an objective measure of information or uncer-
tainty, entropy theory allows us to communicate with nature, as illustrated by its
application to the design of data acquisition systems, the design of environmen-
tal and hydrologic networks, and the assessment of reliability of these systems
or networks. In a similar vein, it helps us to better understand physics or science
of natural systems, such as landscape evolution, geomorphology, and hydrody-
namics. A wide variety of seemingly disparate or dissimilar problems can be
meaningfully solved with the use of entropy.
9.7 Questions
9.1 Take sample data of annual peak discharge from a gauging station on a
river near your town. Fit the gamma distribution to the discharge data.
Then using this distribution, determine the effect of sample size on the
value of the Shannon entropy.
9.2 Use the same gamma distribution as in Question 9.1. Changing the
parameter values determine the Shannon entropy and discuss the effect
of parameter variation.
9.3 Determine the effect of discretization on the Shannon entropy.
9.4 Determine the constraints for the gamma distribution needed for estima-
tion of its parameters using entropy. Then determine its parameters in
terms of the constraints.
9.5 Determine the constraints for the Pearson type III distribution needed
for estimation of its parameters using entropy. Then determine its
parameters in terms of the constraints.
9.6 Determine the constraints for the Weibull distribution needed for esti-
mation of its parameters using entropy. Then determine its parameters
in terms of the constraints.
9.7 Determine the constraints for the Gumbel distribution needed for esti-
mation of its parameters using entropy. Then determine its parameters
in terms of the constraints.
9.8 Determine the constraints for the three-parameter log-normal distribution
needed for estimation of its parameters using entropy. Then determine its
parameters in terms of the constraints.
9.9 Determine the constraints for the logistic distribution needed for estima-
tion of its parameters using entropy. Then determine its parameters in
terms of the constraints.
9.10 Determine the constraints for the Pareto distribution needed for estima-
tion of its parameters using entropy. Then determine its parameters in
terms of the constraints.
9.11 Determine the constraints for the log-Pearson type III distribution
needed for estimation of its parameters using entropy. Then determine
its parameters in terms of the constraints.
9.12 Take monthly discharge data for several gauging stations along a river.
Compute the marginal entropy of monthly discharge at each gauging
station and plot it as a function of distance between gauging stations.
What do you conclude from this plot? Discuss it.
Uncertainty Analysis
Chapter 10
Error and Uncertainty Analysis
Uncertainty and risk are pervasive features and go hand in hand in many
engineering systems. How to handle the risks often associated with uncertainty
comprises one of the most difficult aspects of analysis, planning, and manage-
ment of many civil engineering systems. Uncertainty is central to decision mak-
ing and risk assessment problems. Questions of safety or reliability in
environmental and water resources engineering arise principally because of the
presence of uncertainty. However much one may like, uncertainties cannot be
completely eliminated. At best, one can reduce them by better equipment, stan-
dard data-collection procedures, dense monitoring networks, and better models
and maintenance. Uncertainty analysis is performed to determine the statistical
properties of output as a function of statistical input parameters. This helps
determine the contribution of each input parameter to the overall uncertainty of
the model output and can be used to reduce the output uncertainty.
In most civil engineering–related projects, uncertainty analysis is the study of
model output uncertainty as a function of a careful inventory of the different sources
of uncertainty present in the model input parameters. Generally, the most frequent
questions addressed by uncertainty analysis are the following: What is the predic-
tion uncertainty resulting from all of the uncertainties in model inputs? How do
uncertain inputs contribute to model prediction uncertainty? What input parame-
ters need more data-collection effort? The objective of this chapter is to describe
these issues, detailing different types, sources, and measures of uncertainty.
$$Q = \frac{1}{n} A R^{2/3} S^{1/2}$$
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left[-\frac{1}{2}\left(\frac{x - \mu}{\sigma}\right)^2\right], \quad -\infty < x < \infty$$
Model parameter uncertainty occurs because an inadequate parame-
ter estimation technique is used, inaccurate data are used for parameter
estimation, or both.
4. Data uncertainties arise from (i) measurement inaccuracy and errors,
(ii) inadequacy of the data gauging network, and (iii) data handling
and transcription errors.
5. Computational uncertainties arise from truncation and rounding off
errors in doing calculations.
6. Operational uncertainties are associated with construction, manufactur-
ing, deterioration, maintenance, and other human factors that are not
accounted for in the modeling or design procedure.
Uncertainty may also be classified into two categories:
1. Inherent or intrinsic—caused by randomness in nature.
2. Epistemic—caused by the lack of knowledge of the system or paucity of
data.
river at a given station, the time series of annual 24-hour maximum rainfall at a
gauging station, annual sediment yield of a watershed at its outlet, the time
series of the annual 7-day minimum ozone level in a given city, the time series of
annual 7-day low streamflow, and so on are examples of the inherent
uncertainty in time.
Parameter Uncertainty
Parameter uncertainty is caused by a lack of data, poor-quality data, or an inade-
quate method of parameter estimation. This type of uncertainty is widely preva-
lent in environmental and water resources analysis. Usually, the variance of
parameter estimation is proportional to 1/N, where N is the data length and the
precision is proportional to 1/N^0.5 (Burges and Lettenmaier 1982). This means
that, to improve the precision of parameters by a factor of 2, the required data
length will have to be four times as long. But the data themselves may have
associated uncertainties; these could arise from measurement errors, inconsis-
tency, errors during data recording, and inadequate representation of the
variable owing to limited samples in spatial and temporal domains.
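The square-root dependence on data length can be illustrated with the standard error of a sample mean, used here as a simple stand-in for a generic parameter estimate. The sample standard deviation and the record lengths below are hypothetical values chosen only to show the scaling.

```python
import math

def standard_error(sample_std, n):
    """Standard error of a sample mean, which is proportional to 1/sqrt(N)."""
    return sample_std / math.sqrt(n)

s = 12.0   # hypothetical sample standard deviation
for n in (25, 100, 400):
    print(f"N = {n:4d}  standard error = {standard_error(s, n):.2f}")
```

Each fourfold increase in record length halves the standard error, in agreement with the factor-of-2 statement above.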
Distribution Uncertainty
It is not always clear which type of a probability distribution a particular environ-
mental random variable follows. For example, the annual maximum instanta-
neous discharge of a river can be described by the log-Pearson type 3
distribution, the three-parameter log-normal distribution, the Pearson type 3 dis-
tribution, or the generalized extreme value distribution. In many cases it is diffi-
cult to discern the exact type of the distribution the annual instantaneous
maximum discharge follows. Similar cases abound with a number of other
environmental variables.
[Figure graphics omitted. Recoverable labels: axes "Relative Frequency" and "True Value"; panels A, B, C, and D arranged along "Precision" and "Accuracy" axes.]
Figure 10-2 Measurement of rainfall by four raingages. Gage A is precise, inaccurate;
gage B is precise, accurate; gage C is imprecise, inaccurate; gage D is imprecise, accurate.
The innermost circle indicates the true value.
probability p of the peak flow in the Narmada River exceeding 3,000 m³/s in any
given year is 2.5%. But there is an element of uncertainty in estimation of p,
which depends on the length and quality of data. This raises the following ques-
tion: What is the probability that p lies within a given range p ± Δp? In each case,
events can be analyzed such that reasonable people, using reasonable proce-
dures, come up with reasonably close probability assessments. Here the
objective is to rationally deal with uncertainty, not completely eliminate it.
To deal with uncertain events in the decision-making process, events that are
certain to occur, or conclusions that are certainly true, must be fully taken into
account. These can be given a weight of 1 (certain events). Impossible events, in
contrast, are disregarded in decisions and these are given the weight of 0. Any
in-between event is given a weight equal to the probability of its occurrence.
Thus, the more likely an event is, the more weight it gets and the greater is its
relative effect on the outcome or the decision. It is in this manner that the effec-
tive monetary value of the consequences of a decision was evaluated in
Chapter 1.
Evaluation of safety and reliability requires information on uncertainty,
which may be determined by the standard deviation or coefficient of variation.
Questions of safety or reliability arise principally from the presence of uncer-
tainty. Thus, an evaluation of the uncertainty is an essential part of the evalua-
tion of engineering reliability. The uncertainty resulting from random variability
in physical phenomena is described by a probability distribution function. For
practical purposes, its description may be limited to (a) a central tendency and
(b) its dispersion (e.g., standard deviation) or coefficient of variation.
To deal with uncertainty arising from prediction error (estimation error or
statistical sampling error and imperfection of the prediction model), one nor-
mally employs the coefficient of variation or the standard deviation, which rep-
resents a measure of the random error. In effect, the random error is involved
whenever there is a range of possible error. One source of random error is
sampling error, which is a function of the sample size. The random sampling
error can be expressed in terms of the coefficient of variation (CV) as
$\Delta_1 = CV/\sqrt{N}$, where N = sample size.
Consider, for example, the mean annual rainfall for Baton Rouge to be 60.00
inches. Conceivably, this estimate of the true mean value would contain error. If
the rainfall measurement experiment is repeated and other sets of data are
obtained, the sample mean estimated from the other sets of data would most
likely be different. The collection of all the sample means will also have a mean
value, which may well be different from the individual sample mean values, and
a corresponding standard deviation. Conceptually, the mean value of the collec-
tion of sample means may be assumed to be close to the true mean value (assum-
ing that the estimator is unbiased). Then, the difference (or ratio) of the estimated
sample mean (i.e., mean value of 60 inches) to the true mean is the systematic
error, whereas the coefficient of variation or standard deviation of the collection
of sample means represents a measure of the random error. In effect, random
error is involved whenever there is a range of possible error. One source of ran-
dom error is the error from sampling, which is a function of the sample size.
The systematic error is a bias in the prediction or estimation and can be cor-
rected through a constant bias factor. The random error, called the standard
error, requires statistical treatment. It represents the degree of dispersiveness of
the range of possible errors. It may be represented by the standard deviation or
coefficient of variation of the estimated mean value. An objective determination
of the bias as well as the random error will require repeated data on the sample
mean (or medians), which are hard to come by.
For a random phenomenon, prediction or estimation is usually confined to
the determination of a central value (e.g., mean or median) and its associated
standard deviation or coefficient of variation. The uncertainty associated with
the error in the estimation of the degree of dispersion is of secondary impor-
tance, whereas the uncertainty resulting from error in the prediction of the cen-
tral value is of first-order importance. To summarize, through methods of
prediction one obtains
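$\bar{x}$ = estimate of the mean value
$\sigma_x$ = estimate of the standard deviation
$\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$ (10.1)
$\sigma_{\bar{x}} = \sigma_x / \sqrt{n}$ (10.4)
The uncertainty associated with random sampling error is
$\Delta_{\bar{x}} = \sigma_{\bar{x}} / \bar{x}$ (10.5)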
This random error in x is limited to the sampling error only. There may, how-
ever, be other biases and random errors in x, such as the effects of factors not
included in the observational program.
Often, the information is expressed in terms of the lower and upper limits of
a variable. Given the range of possible values of a random variable, the mean
value of the variable and the underlying uncertainty may be evaluated by pre-
scribing a suitable distribution within the range. For example, if a random vari-
able X is assumed to be characterized by a uniform distribution with the lower
and upper limits of xl and xu, respectively, then using Eq. 5.93 and Eq. 5.94 gives
the mean, standard deviation, and CV of X as
$\bar{x} = \frac{1}{2}(x_l + x_u), \quad \sigma_x = \frac{1}{2\sqrt{3}}(x_u - x_l)$ (10.6)
$CV = \frac{1}{\sqrt{3}} \left( \frac{x_u - x_l}{x_u + x_l} \right)$ (10.7)
where the PDF of the variable is uniform between xl and xu, as shown in Fig. 10-3.
Alternatively, let the PDF be given by a symmetric triangular distribution
with limits xl and xu, as shown in Fig. 10-4. By substituting a = xl, b = xu, and
c = (xl + xu)/2 in Eq. 5.133 and Eq. 5.134, the corresponding CV would be
$CV = \frac{1}{\sqrt{6}} \left( \frac{x_u - x_l}{x_u + x_l} \right)$ (10.8)
With either the uniform or the symmetric triangular distribution, it is implic-
itly assumed that there is no bias within the prescribed range of values for X.
However, if there is bias, a skewed distribution may be more appropriate. If the
bias is judged to be toward the higher values within the specified range, then the
upper triangular distribution as shown in Fig. 10-5 would be appropriate. In
such a case, by substituting a = xl and b = c = xu in Eq. 5.133 and Eq. 5.134, the
mean and CV of X can be determined as
$\bar{x} = \frac{1}{3}(x_l + 2x_u)$ (10.9)
$CV = \frac{1}{\sqrt{2}} \left( \frac{x_u - x_l}{2x_u + x_l} \right)$ (10.10)
Figure 10-4 Symmetric triangular probability density function between xl and xu.
Figure 10-5 Upper triangular probability density function between xl and xu.
Conversely, if the bias is toward the lower range of values, the appropriate
distribution may be a lower triangular distribution as shown in Fig. 10-6. Substi-
tuting a = c = xl and b = xu in Eq. 5.133 and Eq. 5.134 gives the corresponding
mean and CV as
$\bar{x} = \frac{1}{3}(2x_l + x_u)$ (10.11)
$CV = \frac{1}{\sqrt{2}} \left( \frac{x_u - x_l}{x_u + 2x_l} \right)$ (10.12)
Figure 10-6 Lower triangular probability density function between xl and xu.
Another distribution may be a normal distribution, as shown in Fig. 10-7,
where the given limits may be assumed to cover ± 2σ from the mean value. In
such cases, the mean value is
$\bar{x} = \frac{1}{2}(x_u + x_l)$ (10.13)
and the CV is
$CV = \frac{1}{2} \left( \frac{x_u - x_l}{x_u + x_l} \right)$ (10.14)
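The range-based formulas of Eqs. 10.6 through 10.14 can be collected in one small routine. The following is a minimal Python sketch, not part of the original text; the function name `mean_and_cv` and the shape labels are hypothetical, chosen only for illustration.

```python
import math

def mean_and_cv(x_l, x_u, shape="uniform"):
    """Mean and CV implied by limits x_l < x_u for an assumed PDF shape."""
    spread = x_u - x_l
    if shape == "uniform":                     # Eqs. 10.6 and 10.7
        mean = (x_l + x_u) / 2
        sd = spread / (2 * math.sqrt(3))
    elif shape == "symmetric_triangular":      # Eq. 10.8
        mean = (x_l + x_u) / 2
        sd = spread / (2 * math.sqrt(6))
    elif shape == "upper_triangular":          # Eqs. 10.9 and 10.10
        mean = (x_l + 2 * x_u) / 3
        sd = spread / (3 * math.sqrt(2))
    elif shape == "lower_triangular":          # Eqs. 10.11 and 10.12
        mean = (2 * x_l + x_u) / 3
        sd = spread / (3 * math.sqrt(2))
    elif shape == "normal_2sigma":             # Eqs. 10.13 and 10.14
        mean = (x_l + x_u) / 2
        sd = spread / 4                        # limits span +/- 2 sigma
    else:
        raise ValueError(f"unknown shape: {shape}")
    return mean, sd / mean

# Bias-factor range of 0.90 to 0.95 with a uniform PDF
mean, cv = mean_and_cv(0.90, 0.95, "uniform")
print(round(mean, 3), round(cv, 4))
```

For the same limits, the assumed shape changes the CV noticeably, which is why the choice of distribution within the range should reflect any judged bias.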
The seemingly different types and sources of uncertainty can also be analyzed
as follows. Let the true value of the variable be x and its prediction be given
as $\hat{x}$. Let there be a correction factor λ to account for error in $\hat{x}$.
Therefore, the true value x may be expressed as
$x = \lambda \hat{x}$ (10.15)
For a random variable X, the model $\hat{X}$ should also be a random variable.
The estimated mean value $\bar{x}$ and variance $\sigma_x^2$ (e.g., from a set of
observations) are those of $\hat{X}$. Then, $CV = \sigma_x / \bar{x}$ represents the
inherent variability. The necessary correction λ may also be considered a random
variable, whose mean value e represents the mean correction for systematic error
in the predicted mean value $\bar{x}$, whereas the CV of λ, Δ, represents the random
error in the predicted mean value $\bar{x}$. If one assumes λ and $\hat{X}$ to be
statistically independent, the mean value of X is
$\mu_x = e \bar{x}$ (10.16)
Figure 10-7 Normal probability density function between xl and xu.
$Y = g(x_1, x_2, \ldots, x_n)$ (10.18)
the mean value and associated uncertainty of Y are of concern. A model (or
function) $\hat{g}$ and a correction $\lambda_g$ may be used, so
$Y = \lambda_g \hat{g}(x_1, x_2, \ldots, x_n)$ (10.19)
Thus, $\lambda_g$ has a mean value of $e_g$ and a CV of $\Delta_g$. Using the
first-order approximation gives the mean value of Y:
$\mu_y \cong e_g \hat{g}(\mu_{x_1}, \mu_{x_2}, \ldots, \mu_{x_n})$ (10.20)
where $e_g$ is the bias in $\hat{g}(\mu_{x_1}, \mu_{x_2}, \ldots, \mu_{x_n})$ and
$\mu_{x_i} = e_i \bar{x}_i$. Also, the total CV of Y is
$\Omega_y^2 = \Delta_g^2 + \frac{1}{\mu_g^2} \sum_i \sum_j \rho_{ij} c_i c_j \sigma_{x_i} \sigma_{x_j}$ (10.21)
in which $c_i = \partial g / \partial x_i$ evaluated at $(\mu_{x_1}, \mu_{x_2}, \ldots, \mu_{x_n})$
and $\rho_{ij}$ = correlation coefficient between $x_i$ and $x_j$.
Example 10.2 Consider the mean annual rainfall for Baton Rouge, which is
given as 60 inches based on a sample of data. The mean rainfall estimated by the
arithmetic mean method is about 5% to 10% higher than the true mean. Taking
the sample standard deviation of 15 inches and the number of observations in
the sample as 25, compute the total random error in the estimated mean value.
Solution The corresponding CV is 15/60 = 0.25. The random sampling
error (expressed in terms of CV) would be
$\Delta_1 = 0.25/(25)^{0.5} = 0.05$
The systematic error or bias may arise from factors not accounted for in the
prediction model that tends to consistently bias the estimate in one direction or
the other. For example, the mean rainfall estimated by the arithmetic mean
method may be about, say, 5% to 10% higher than the true mean, say, yielded by
the isohyetal method. With this information, a realistic prediction of the mean
rainfall requires a correction from 90% to 95% of the corresponding mean (arith-
metic) rainfall. If a uniform PDF between this range of correction factors is
assumed, then the systematic error in the estimated arithmetic mean rainfall of
60 inches will need to be corrected by a mean bias factor of
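$e = \frac{1}{2}(0.90 + 0.95) = 0.925$
whereas the corresponding random error in the estimated mean value,
expressed in terms of CV (see Eq. 10.7), is
$\Delta_2 = \frac{1}{\sqrt{3}} \left( \frac{0.95 - 0.90}{0.95 + 0.90} \right) \approx 0.016$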
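The arithmetic of Example 10.2 can be checked with a few lines of Python. This is a sketch only: combining the two random errors as a root sum of squares is an assumption made here (it presumes the sampling error and the bias-factor error are independent), and all variable names are illustrative.

```python
import math

sample_mean = 60.0   # inches, arithmetic-mean estimate
sample_sd = 15.0     # inches
n_obs = 25

# Random sampling error, Delta_1 = CV / sqrt(N)
cv = sample_sd / sample_mean
delta_1 = cv / math.sqrt(n_obs)

# Bias factor assumed uniform between 0.90 and 0.95
lo, hi = 0.90, 0.95
bias_factor = 0.5 * (lo + hi)                      # mean correction e
delta_2 = (hi - lo) / (math.sqrt(3) * (hi + lo))   # Eq. 10.7

# Assumption: independent error sources combine as a root sum of squares
delta_total = math.sqrt(delta_1**2 + delta_2**2)
corrected_mean = bias_factor * sample_mean

print(f"Delta_1 = {delta_1:.3f}, Delta_2 = {delta_2:.4f}")
print(f"total random error = {delta_total:.4f}")
print(f"bias-corrected mean = {corrected_mean:.1f} inches")
```

Under these assumptions the sampling error dominates, so collecting more observations would reduce the total random error more than narrowing the bias range would.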
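$\sigma_S^2 = \left( \left. \frac{\partial f}{\partial K} \right|_p \right)^2 \sigma_K^2 + \left( \left. \frac{\partial f}{\partial Q} \right|_p \right)^2 \sigma_Q^2$ (10.22)
where
$\left. \frac{\partial f}{\partial K} \right|_p = \bar{Q}, \quad \left. \frac{\partial f}{\partial Q} \right|_p = \bar{K}$
$\sigma_S^2 = \bar{Q}^2 \sigma_K^2 + \bar{K}^2 \sigma_Q^2$
Dividing both sides by $(\bar{S})^2 = (\bar{K} \cdot \bar{Q})^2$ one gets
$\frac{\sigma_S^2}{\bar{S}^2} = \frac{\sigma_K^2}{\bar{K}^2} + \frac{\sigma_Q^2}{\bar{Q}^2}$
Thus,
$CV_S = \frac{\sigma_S}{\bar{S}} = \sqrt{\frac{\sigma_K^2}{\bar{K}^2} + \frac{\sigma_Q^2}{\bar{Q}^2}} = \sqrt{CV_K^2 + CV_Q^2}$ (10.23)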
Example 10.4 The storage in a reservoir at the end of month t+1, St+1, can be
computed using the continuity equation
St+1 = St + It – Rt – Et (10.24)
$E(S_{t+1}) = S_t + E(I_t) - R_t - E(E_t) = 28.0 + 21.3 - 16.7 - 2.4 = 30.2$ million m³ (10.25)
Hence,
$SD(S_{t+1}) = (20.61)^{0.5} = 4.54$ million m³
Example 10.5 In the previous example, both reservoir inflow and evaporation
depend upon the climate and hence may be correlated. From the analysis of
historical data, this was found to be indeed the case and the correlation was –0.25.
Determine the standard deviation of the expected storage at the end of the
month.
Solution The variance of St+1 can now be calculated as
var(St+1) = var(It) + var(Et) – 2 × ρ × SD(It) × SD(Et) (10.27)
Example 10.6 Consider the rational method for computing peak discharge,
Q = CIA, where Q = peak discharge in m³/s, C = rational runoff coefficient
(dimensionless), I = rainfall intensity in mm/hour, and A = drainage area in km².
Assume that the variables C, I, and A are independent and that the errors in
them are independent and uncorrelated. Express the relative error (standard
deviation divided by mean) in the peak discharge as a function of errors in C, I,
and A.
Solution Since C, I, and A are independent and their errors are independent and
uncorrelated
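$\sigma_Q^2 = \left( \left. \frac{\partial f}{\partial C} \right|_p \right)^2 \sigma_C^2 + \left( \left. \frac{\partial f}{\partial I} \right|_p \right)^2 \sigma_I^2 + \left( \left. \frac{\partial f}{\partial A} \right|_p \right)^2 \sigma_A^2$ (10.28)
where
$\left. \frac{\partial f}{\partial C} \right|_p = \bar{I} \bar{A}, \quad \left. \frac{\partial f}{\partial I} \right|_p = \bar{C} \bar{A}$
and
$\left. \frac{\partial f}{\partial A} \right|_p = \bar{C} \bar{I}$
$\sigma_Q^2 = \bar{I}^2 \bar{A}^2 \sigma_C^2 + \bar{C}^2 \bar{A}^2 \sigma_I^2 + \bar{C}^2 \bar{I}^2 \sigma_A^2$
But we have
$\bar{Q} = \bar{C} \bar{I} \bar{A}$
Thus,
$\frac{\sigma_Q^2}{\bar{Q}^2} = \frac{\sigma_C^2}{\bar{C}^2} + \frac{\sigma_I^2}{\bar{I}^2} + \frac{\sigma_A^2}{\bar{A}^2}$
$E_Q = \sqrt{\frac{\sigma_Q^2}{\bar{Q}^2}} = \sqrt{\frac{\sigma_C^2}{\bar{C}^2} + \frac{\sigma_I^2}{\bar{I}^2} + \frac{\sigma_A^2}{\bar{A}^2}} = \sqrt{CV_Q^2} = \sqrt{CV_C^2 + CV_I^2 + CV_A^2}$ (10.29)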
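For a pure product of independent variables, the first-order result is simply the root sum of squares of the input CVs. A minimal Python sketch follows; the CV values for C, I, and A are hypothetical and are not taken from the text.

```python
import math

def cv_of_product(*cvs):
    """First-order CV of a product of independent variables."""
    return math.sqrt(sum(cv**2 for cv in cvs))

# Hypothetical relative errors for the rational method Q = CIA
cv_c, cv_i, cv_a = 0.20, 0.15, 0.05
cv_q = cv_of_product(cv_c, cv_i, cv_a)
print(f"CV of peak discharge = {cv_q:.3f}")
```

Because the CVs add in quadrature, the largest input error (here the runoff coefficient) contributes most of the error in Q.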
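where
$\left. \frac{\partial f}{\partial Q_s} \right|_p = \frac{1}{\bar{Q}}$ and $\left. \frac{\partial f}{\partial Q} \right|_p = -\frac{\bar{Q}_s}{\bar{Q}^2}$
$\sigma_C^2 = \frac{\sigma_{Q_s}^2}{\bar{Q}^2} + \frac{\bar{Q}_s^2 \sigma_Q^2}{\bar{Q}^4}$ (10.31)
But
$\bar{C} = \frac{\bar{Q}_s}{\bar{Q}}$
Thus,
$\frac{\sigma_C^2}{\bar{C}^2} = \frac{\sigma_{Q_s}^2}{\bar{Q}_s^2} + \frac{\sigma_Q^2}{\bar{Q}^2}$ and $E_C = \sqrt{\frac{\sigma_C^2}{\bar{C}^2}} = \sqrt{\frac{\sigma_{Q_s}^2}{\bar{Q}_s^2} + \frac{\sigma_Q^2}{\bar{Q}^2}}$ (10.32)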
and triangular portions. Let the total area (A) be expressed as A = x + y, where x
is the cross-sectional area of the rectangular part and y is the cross-sectional area
of the two triangular parts. Express the error in A as a function of errors in x and
y. Then, consider the following data: x (m²) = 100 ± 1.5 and y (m²) = 75 ± 0.5.
Compute the error in the area.
Solution Since x and y are independent and their errors are independent and
uncorrelated
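$\sigma_A^2 = \left( \left. \frac{\partial f}{\partial x} \right|_p \right)^2 \sigma_x^2 + \left( \left. \frac{\partial f}{\partial y} \right|_p \right)^2 \sigma_y^2$ (10.33)
where
$\left. \frac{\partial f}{\partial x} \right|_p = 1, \quad \left. \frac{\partial f}{\partial y} \right|_p = 1$
$\sigma_A^2 = \sigma_x^2 + \sigma_y^2$
But
$\bar{A} = \bar{x} + \bar{y}$
Thus,
$\frac{\sigma_A^2}{\bar{A}^2} = \frac{\sigma_x^2}{(\bar{x} + \bar{y})^2} + \frac{\sigma_y^2}{(\bar{x} + \bar{y})^2}$
and
$E_A = \sqrt{\frac{\sigma_A^2}{\bar{A}^2}} = \sqrt{\frac{\sigma_x^2 + \sigma_y^2}{(\bar{x} + \bar{y})^2}}$
$E_A = \sqrt{\frac{\sigma_A^2}{\bar{A}^2}} = \sqrt{\frac{\sigma_x^2 + \sigma_y^2}{(\bar{x} + \bar{y})^2}} = \sqrt{\frac{1.5^2 + 0.5^2}{(100 + 75)^2}} = 9.035 \times 10^{-3}$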
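The numerical result for the area can be reproduced directly. The short Python sketch below treats the stated ± values as standard deviations, which is how the example uses them; variable names are illustrative.

```python
import math

x_mean, sigma_x = 100.0, 1.5   # rectangular part, m^2
y_mean, sigma_y = 75.0, 0.5    # triangular parts, m^2

area = x_mean + y_mean
sigma_area = math.sqrt(sigma_x**2 + sigma_y**2)   # errors add in quadrature
relative_error = sigma_area / area

print(f"A = {area:.0f} m^2, sigma_A = {sigma_area:.3f} m^2")
print(f"E_A = {relative_error:.3e}")
```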
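$\sigma_Z^2 = \left( \left. \frac{\partial f}{\partial x} \right|_p \right)^2 \sigma_x^2 + \left( \left. \frac{\partial f}{\partial y} \right|_p \right)^2 \sigma_y^2$ (10.34)
where
$\left. \frac{\partial f}{\partial x} \right|_p = 1$ and $\left. \frac{\partial f}{\partial y} \right|_p = -1$
$\sigma_Z^2 = \sigma_x^2 + \sigma_y^2$
Thus,
$\frac{\sigma_Z^2}{\bar{z}^2} = \frac{\sigma_x^2}{(\bar{x} - \bar{y})^2} + \frac{\sigma_y^2}{(\bar{x} - \bar{y})^2}$
and
$E_Z = \sqrt{\frac{\sigma_Z^2}{\bar{z}^2}} = \sqrt{\frac{\sigma_x^2 + \sigma_y^2}{(\bar{x} - \bar{y})^2}}$
Since
$E_x = \frac{\sigma_x}{\bar{x}} = E_y = \frac{\sigma_y}{\bar{y}} = 1.0\%$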
z = 2 ± (2 × 7.06) = 2 ± 14.12
Obviously, Ez is much larger than Ex or Ey.
as a function of the errors in I and Q. Then express the storage rate in terms of
the derived expression for a.
Solution Starting with the Muskingum equation S* = aI + (1– a)Q, one can write
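$\sigma_S^2 = \left( \left. \frac{\partial f}{\partial I} \right|_p \right)^2 \sigma_I^2 + \left( \left. \frac{\partial f}{\partial Q} \right|_p \right)^2 \sigma_Q^2$ (10.35)
where
$\left. \frac{\partial f}{\partial I} \right|_p = a$ and $\left. \frac{\partial f}{\partial Q} \right|_p = 1 - a$
$\sigma_S^2 = a^2 \sigma_I^2 + (1 - a)^2 \sigma_Q^2$
Minimizing this variance with respect to a,
$\frac{d\sigma_S^2}{da} = 2a\sigma_I^2 - 2(1 - a)\sigma_Q^2 = 0$
then gives
$a = \frac{\sigma_Q^2}{\sigma_I^2 + \sigma_Q^2}$
so
$\sigma_S^2 = \left( \frac{\sigma_Q^2}{\sigma_I^2 + \sigma_Q^2} \right)^2 \sigma_I^2 + \left( \frac{\sigma_I^2}{\sigma_I^2 + \sigma_Q^2} \right)^2 \sigma_Q^2 = \frac{1}{\frac{1}{\sigma_I^2} + \frac{1}{\sigma_Q^2}}$
and
$S = \left( \frac{\sigma_Q^2}{\sigma_I^2 + \sigma_Q^2} \right) I + \left( \frac{\sigma_I^2}{\sigma_I^2 + \sigma_Q^2} \right) Q$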
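The minimum-variance weight derived above is easy to verify numerically. The following Python sketch uses hypothetical standard deviations for I and Q; the function name `optimal_weight` is illustrative only.

```python
def optimal_weight(sigma_i, sigma_q):
    """Weight a that minimizes var(S) for S = a*I + (1 - a)*Q, I and Q independent."""
    var_i, var_q = sigma_i**2, sigma_q**2
    a = var_q / (var_i + var_q)
    var_s = a**2 * var_i + (1 - a)**2 * var_q
    return a, var_s

# Hypothetical errors in inflow and outflow
a, var_s = optimal_weight(sigma_i=3.0, sigma_q=4.0)
print(f"a = {a:.2f}, var(S) = {var_s:.2f}")

# The closed form 1 / (1/var_I + 1/var_Q) should give the same variance
closed_form = 1.0 / (1.0 / 3.0**2 + 1.0 / 4.0**2)
print(f"closed form = {closed_form:.2f}")
```

The variable with the smaller error receives the larger weight, and the combined variance is smaller than either input variance.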
Further, the system output or the overall result of an experiment is also uncer-
tain and hence can be represented by a random variable as well. When input
random variables of a given system, such as a model or an experiment associ-
ated with errors, are used in the calculation of overall system response, these
errors propagate from input to output. Quite often we are interested in knowing
how errors in mathematical models or instruments propagate through a
given system so that we can estimate the magnitude of the error associated with
the overall system response. Error propagation gives a valid estimate of the error
involved in the mathematical result.
Let us consider a univariate relationship Y = f(X) between a dependent variable
Y and an input variable X. Further, let us consider that σx is the uncertainty (error)
in x. How will the uncertainty associated with x be reflected in the uncertainty of y,
denoted as σy? Figure 10-8 explains the various terms and their physical meanings
in a graphical format. The functional relationship f(x) can be written as
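$y = f(x) = f[\bar{x} + (x - \bar{x})]$ (10.36)
$f(x) = f(\bar{x}) + \left. \frac{dy}{dx} \right|_{x=\bar{x}} (x - \bar{x}) + \frac{1}{2!} \left. \frac{d^2y}{dx^2} \right|_{x=\bar{x}} (x - \bar{x})^2 + \frac{1}{3!} \left. \frac{d^3y}{dx^3} \right|_{x=\bar{x}} (x - \bar{x})^3 + \ldots$ (10.37)
[Figure 10-8: output y = f(x) plotted against input x, showing the slope df/dx at x = $\bar{x}$ (point P) and the uncertainties σx and σy.]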
Now, truncating the series at the linear terms and solving for σy, we obtain
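$y = f(x) \approx f(\bar{x}) + \left. \frac{dy}{dx} \right|_{x=\bar{x}} (x - \bar{x})$ (10.38)
Using the formula $var[y] = E[y^2] - \bar{y}^2$ and Eq. 10.38, one can write
$\sigma_y^2 = \left( \left. \frac{dy}{dx} \right|_{x=\bar{x}} \right)^2 \sigma_x^2$ or $\sigma_y = \left| \left. \frac{dy}{dx} \right|_{x=\bar{x}} \right| \sigma_x$ (10.39)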
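Equation 10.39 can be applied even when the derivative is not available in closed form. The Python sketch below estimates the slope with a central difference, which is a choice made here for illustration; the test function and input values are hypothetical.

```python
def propagate_1d(f, x_mean, sigma_x, rel_step=1e-6):
    """First-order error propagation: sigma_y = |dy/dx| * sigma_x at the mean."""
    h = rel_step * max(abs(x_mean), 1.0)
    # Central-difference estimate of the slope at x_mean
    slope = (f(x_mean + h) - f(x_mean - h)) / (2 * h)
    return abs(slope) * sigma_x

# Hypothetical relationship y = x^3 with x = 2.0 +/- 0.1
sigma_y = propagate_1d(lambda x: x**3, x_mean=2.0, sigma_x=0.1)
print(f"sigma_y = {sigma_y:.3f}")
```

Here the exact slope is 3x² = 12, so the first-order estimate is 12 × 0.1 = 1.2. The approximation degrades when σx is large relative to the curvature of f.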
If the dependent variable Y depends upon several input variables X1, X2, …,
Xn, we can write the truncated form of the Taylor series expansion up to the lin-
ear terms as
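$y = f(\bar{x}_1, \bar{x}_2, \ldots, \bar{x}_n) + \sum_{i=1}^{n} (x_i - \bar{x}_i) \left( \frac{\partial y}{\partial x_i} \right)_P$ (10.40)
$\sigma_y^2 = \left\{ \left( \left. \frac{\partial f}{\partial x_1} \right|_P \right)^2 E[(x_1 - \bar{x}_1)^2] + \left( \left. \frac{\partial f}{\partial x_2} \right|_P \right)^2 E[(x_2 - \bar{x}_2)^2] + \ldots \right\}$
$\quad + \left\{ 2 \left( \left. \frac{\partial f}{\partial x_1} \right|_P \right) \left( \left. \frac{\partial f}{\partial x_2} \right|_P \right) E[(x_1 - \bar{x}_1)(x_2 - \bar{x}_2)] + \ldots \right\}$ (10.41a)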
Alternatively, Eq. 10.41a can be written as
$\sigma_Y^2 = \sum_{i=1}^{n} \sum_{j=1}^{n} \left( \frac{\partial y}{\partial x_i} \right)_P \left( \frac{\partial y}{\partial x_j} \right)_P cov[x_i, x_j]$ (10.41b)
Case 1
If the basic variables are statistically independent, the expression for var(Y)
becomes
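$\sigma_y^2 = \sum_{i=1}^{n} \left[ \left. \frac{\partial y}{\partial x_i} \right|_P \sigma_{x_i} \right]^2$ (10.42)
$\frac{\sigma_y}{\bar{y}} = \left\{ \sum_{i=1}^{n} \left[ \frac{1}{\bar{y}} \left. \frac{\partial y}{\partial x_i} \right|_P \sigma_{x_i} \right]^2 \right\}^{1/2}$ (10.43)
$e_i = \frac{1}{\bar{y}} \left. \frac{\partial y}{\partial x_i} \right|_P \sigma_{x_i}$ (10.44)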
Thus when input variables are uncorrelated, the total relative error is the
square root of the sum of squares of all individual relative errors.
Case 2
When input variables are correlated, one has to use Eq. 10.41. For the sake of
simplicity, let us consider only two input variables. Rewriting Eq. 10.41 for the
relationship y = f(x1, x2) in which x1 and x2 are correlated gives
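$\sigma_y^2 = \left( \left. \frac{\partial f}{\partial x_1} \right|_P \right)^2 \sigma_{x_1}^2 + \left( \left. \frac{\partial f}{\partial x_2} \right|_P \right)^2 \sigma_{x_2}^2 + 2 \left( \left. \frac{\partial f}{\partial x_1} \right|_P \right) \left( \left. \frac{\partial f}{\partial x_2} \right|_P \right) cov(x_1, x_2)$ (10.46)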
Now, using Eq. 10.46 and Eq. 10.47 one can conclude that when cov(x1, x2) is
negative, var(y) is smaller than in the uncorrelated case, whereas, when cov(x1, x2)
is positive, var(y) is larger than in the uncorrelated case.
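Both cases follow from the double sum of Eq. 10.41b, which can be coded once for any number of inputs. The Python sketch below is an illustration under stated assumptions: gradients are estimated by central differences, and the model y = x1 + x2 with standard deviations of 2 and 1 is hypothetical.

```python
import math

def gradient(f, means, rel_step=1e-6):
    """Central-difference partial derivatives of f at the mean point P."""
    grads = []
    for i, m in enumerate(means):
        h = rel_step * max(abs(m), 1.0)
        up, dn = list(means), list(means)
        up[i] += h
        dn[i] -= h
        grads.append((f(*up) - f(*dn)) / (2 * h))
    return grads

def propagate(f, means, cov):
    """sigma_y from the double sum over gradient products and covariances."""
    g = gradient(f, means)
    n = len(means)
    var_y = sum(g[i] * g[j] * cov[i][j] for i in range(n) for j in range(n))
    return math.sqrt(var_y)

def sd_of_sum(rho, s1=2.0, s2=1.0):
    """Hypothetical model y = x1 + x2 with correlation rho between the inputs."""
    cov = [[s1**2, rho * s1 * s2], [rho * s1 * s2, s2**2]]
    return propagate(lambda x1, x2: x1 + x2, [10.0, 5.0], cov)

for rho in (-0.5, 0.0, 0.5):
    print(f"rho = {rho:+.1f}: sigma_y = {sd_of_sum(rho):.3f}")
```

In this sketch both partial derivatives are positive, so a negative covariance lowers var(y) and a positive covariance raises it. If the two partial derivatives had opposite signs, the covariance term would act in the opposite direction.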
Example 10.11 Consider the case of two independent variables x and y, with
z = f(x, y). Assume that the errors are independent.
Solution Using Eq. 10.42 gives the standard deviation of z as
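$\sigma_z = \sqrt{\left( \frac{\partial f}{\partial x} \right)^2 \sigma_x^2 + \left( \frac{\partial f}{\partial y} \right)^2 \sigma_y^2}$ (10.48)
If z = xy, then
$\sigma_z = \sqrt{(\bar{y})^2 \sigma_x^2 + (\bar{x})^2 \sigma_y^2}$ (10.49)
$\sigma_z = \sqrt{\sigma_x^2 + \sigma_y^2}$ (10.51)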
Equation 10.51 shows why the methods of computation that are based only
on the water balance equation are not preferred in environmental analysis. An
example is the significant error obtained when evaporation from a lake or a
watershed is computed based on water balance alone. The errors in the estima-
tion of individual components will be accumulated in the estimate of
evaporation, making it highly unreliable.
Another case is $z = x^m y^r$. In this case,
Equation 10.52 shows that the error multiplies in a nonlinear case. If $z = x^m$,
then $CV_z = m\,CV_x$. If z = cx, where c is constant, then $CV_z = CV_x$.
It is now possible to determine the best value of a quantity x from two or
more independent measurements whose errors may be different. Intuitively, the
measurement with less error should carry more weight. However, how exactly
the weighting should be done is not quite clear. To that end, the principle of
least-square error may be invoked. Consider two independent measurements of
X as x1 and x2, with their respective (plus or minus) errors as σ 1 and σ 2. It may
be reasonable to assume an estimate of X as
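$x_{12} = \frac{\frac{x_1}{\sigma_1^2} + \frac{x_2}{\sigma_2^2}}{\frac{1}{\sigma_1^2} + \frac{1}{\sigma_2^2}}$ (10.54)
$\sigma_{12}^2 = \left[ \frac{1}{\sigma_1^2} + \frac{1}{\sigma_2^2} \right]^{-1}$ (10.55)
$\bar{x} = \frac{\sum_{i=1}^{n} \left( \frac{x_i}{\sigma_i^2} \right)}{\sum_{i=1}^{n} \left( \frac{1}{\sigma_i^2} \right)}$ (10.56)
and
$\sigma_z = \left[ \sum_{i=1}^{n} \frac{1}{\sigma_i^2} \right]^{-1/2}$ (10.57)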
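The inverse-variance weighting of Eqs. 10.54 through 10.57 can be sketched in Python as follows. The two measurements and their errors are hypothetical, and the function name `weighted_estimate` is illustrative.

```python
def weighted_estimate(values, sigmas):
    """Best estimate and its standard error from independent measurements."""
    weights = [1.0 / s**2 for s in sigmas]        # weight = 1 / sigma_i^2
    total = sum(weights)
    best = sum(w * x for w, x in zip(weights, values)) / total
    sigma_best = total ** -0.5
    return best, sigma_best

# Two hypothetical measurements: 10.0 +/- 1.0 and 12.0 +/- 2.0
best, sigma_best = weighted_estimate([10.0, 12.0], [1.0, 2.0])
print(f"best value = {best:.2f} +/- {sigma_best:.3f}")
```

The combined estimate lies closer to the more precise measurement, and its standard error is smaller than that of either measurement alone.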
Now consider the case when errors are correlated. Let E = A/(A + B), where A
and B are independent measurements, with their means and variances,
respectively, denoted as $\bar{A}$, $\sigma_A^2$ and $\bar{B}$, $\sigma_B^2$. It can be shown that
$\sigma_E^2 = \frac{\bar{B}^2}{(\bar{A} + \bar{B})^4} \sigma_A^2 + \frac{\bar{A}^2}{(\bar{A} + \bar{B})^4} \sigma_B^2$ (10.58)
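$\sigma_z = \sqrt{\sigma_x^2 + \sigma_y^2 \pm 2\sigma_{xy}}$ (10.59)
$CV_z = \sqrt{CV_x^2 + CV_y^2 + \frac{2\sigma_{xy}}{\bar{x}\,\bar{y}}}$ (10.60)
If z = x/y, then
$CV_z = \sqrt{CV_x^2 + CV_y^2 - \frac{2\sigma_{xy}}{\bar{x}\,\bar{y}}}$ (10.61)
Equations 10.60 and 10.61 are similar, except for the sign of the covariance
term.
If $z = x^m y^r$, then it can be shown that
$CV_z = \sqrt{m^2 CV_x^2 + r^2 CV_y^2 + \frac{2mr\,\sigma_{xy}}{\bar{x}\,\bar{y}}}$ (10.62)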
Example 10.12 Consider the case of independent and uncorrelated errors. The
hydraulic radius R for a rectangular channel can be expressed as R = bh/[b + 2h],
where b = width and h = flow depth. Variables b and h can be considered inde-
pendent. Derive the error in R as a function of errors in b and h. Assume that the
means and standard deviations of b and h are known.
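Solution Given that R = bh/[b + 2h] and b and h are independent, we have
$\frac{\partial R}{\partial b} = \frac{h(b + 2h) - bh}{(b + 2h)^2} = \frac{2h^2}{(b + 2h)^2}$
Similarly,
$\frac{\partial R}{\partial h} = \frac{b(b + 2h) - 2bh}{(b + 2h)^2} = \frac{b^2}{(b + 2h)^2}$
Hence, evaluating at the mean values,
$\left. \frac{\partial R}{\partial b} \right|_p = \frac{2\bar{h}^2}{(\bar{b} + 2\bar{h})^2}, \quad \left. \frac{\partial R}{\partial h} \right|_p = \frac{\bar{b}^2}{(\bar{b} + 2\bar{h})^2}$
Thus,
$\sigma_R^2 = \left( \left. \frac{\partial R}{\partial b} \right|_p \right)^2 \sigma_b^2 + \left( \left. \frac{\partial R}{\partial h} \right|_p \right)^2 \sigma_h^2$
or
$\sigma_R = \sqrt{\left( \frac{2\bar{h}^2}{(\bar{b} + 2\bar{h})^2} \right)^2 \sigma_b^2 + \left( \frac{\bar{b}^2}{(\bar{b} + 2\bar{h})^2} \right)^2 \sigma_h^2}$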
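The expression for σR can be evaluated for a given channel as in the following Python sketch. The channel dimensions and their errors are hypothetical, and the function name is illustrative.

```python
import math

def hydraulic_radius_error(b, h, sigma_b, sigma_h):
    """R = b*h/(b + 2h) and its first-order standard deviation (b, h independent)."""
    denom = (b + 2 * h) ** 2
    dR_db = 2 * h**2 / denom      # partial derivative with respect to width
    dR_dh = b**2 / denom          # partial derivative with respect to depth
    radius = b * h / (b + 2 * h)
    sigma_r = math.sqrt((dR_db * sigma_b) ** 2 + (dR_dh * sigma_h) ** 2)
    return radius, sigma_r

# Hypothetical channel: b = 10 +/- 0.5 m, h = 2 +/- 0.1 m
radius, sigma_r = hydraulic_radius_error(10.0, 2.0, 0.5, 0.1)
print(f"R = {radius:.3f} m, sigma_R = {sigma_r:.4f} m")
```

For a wide, shallow channel such as this one, the depth error dominates because ∂R/∂h is much larger than ∂R/∂b.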
Example 10.13 Consider the case described in Example 10.12. Let R = A/B,
where A = bh and B = b + 2h. Clearly, A and B are no longer independent. Derive
the error in R as a function of errors in A and B. Compare this error in R with that
derived in the previous example.
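Solution Given that R = A/B = bh/(b + 2h) and A and B are dependent, we have
$\partial R / \partial A = 1/B$, $(\partial R / \partial A)|_p = 1/\bar{B} = 1/(\bar{b} + 2\bar{h})$
and
$\sigma_B^2 = \sigma_b^2 + 4\sigma_h^2$
Thus,
$\sigma_{AB} = \overline{AB} - \bar{A}\bar{B}$
$= \overline{bh(b + 2h)} - \bar{b}\bar{h}(\bar{b} + 2\bar{h})$
$= \overline{b^2 h} + 2\overline{bh^2} - \bar{b}^2\bar{h} - 2\bar{b}\bar{h}^2$
$= (\overline{b^2 h} - \bar{b}^2\bar{h}) + 2(\overline{bh^2} - \bar{b}\bar{h}^2)$
$= \bar{h}\sigma_b^2 + 2\bar{b}\sigma_h^2$
Hence,
$\sigma_R^2 = \left( \left. \frac{\partial R}{\partial A} \right|_p \right)^2 \sigma_A^2 + \left( \left. \frac{\partial R}{\partial B} \right|_p \right)^2 \sigma_B^2 + 2 \left( \left. \frac{\partial R}{\partial A} \right|_p \right) \left( \left. \frac{\partial R}{\partial B} \right|_p \right) \sigma_{AB}$
$= \frac{\bar{b}^2\sigma_h^2 + \bar{h}^2\sigma_b^2}{(\bar{b} + 2\bar{h})^2} + \frac{\bar{b}^2\bar{h}^2}{(\bar{b} + 2\bar{h})^4}(\sigma_b^2 + 4\sigma_h^2) - \frac{2\bar{b}\bar{h}(\bar{h}\sigma_b^2 + 2\bar{b}\sigma_h^2)}{(\bar{b} + 2\bar{h})^3}$
$= \frac{4\bar{h}^4\sigma_b^2 + \bar{b}^4\sigma_h^2}{(\bar{b} + 2\bar{h})^4}$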
Note that the expression for σ R is the same for both cases. One reason is that,
although A and B are considered dependent on each other, linear operations are
involved in the computation of σ R.
$V = \frac{1}{n} R^{2/3} S^{1/2}$
where n = Manning’s roughness factor, R = hydraulic radius, and S = slope.
Express the error in V as a function of error in R, n, and S.
Solution Given Manning’s equation, we have
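$\frac{\partial V}{\partial n} = -\frac{1}{n^2} R^{2/3} S^{1/2}$
and
$\left. \frac{\partial V}{\partial n} \right|_p = -\frac{1}{\bar{n}^2} \bar{R}^{2/3} \bar{S}^{1/2}$
$\frac{\partial V}{\partial R} = \frac{2}{3n} R^{-1/3} S^{1/2}$
$\left. \frac{\partial V}{\partial R} \right|_p = \frac{2}{3\bar{n}} \bar{R}^{-1/3} \bar{S}^{1/2}$
and
$\frac{\partial V}{\partial S} = \frac{1}{2n} R^{2/3} S^{-1/2}, \quad \left. \frac{\partial V}{\partial S} \right|_p = \frac{1}{2\bar{n}} \bar{R}^{2/3} \bar{S}^{-1/2}$
and so
$\sigma_V^2 = \left( \left. \frac{\partial V}{\partial n} \right|_p \right)^2 \sigma_n^2 + \left( \left. \frac{\partial V}{\partial R} \right|_p \right)^2 \sigma_R^2 + \left( \left. \frac{\partial V}{\partial S} \right|_p \right)^2 \sigma_S^2$
$= \left( -\frac{1}{\bar{n}^2} \bar{R}^{2/3} \bar{S}^{1/2} \right)^2 \sigma_n^2 + \left( \frac{2}{3\bar{n}} \bar{R}^{-1/3} \bar{S}^{1/2} \right)^2 \sigma_R^2 + \left( \frac{1}{2\bar{n}} \bar{R}^{2/3} \bar{S}^{-1/2} \right)^2 \sigma_S^2$
$= \left( \frac{1}{\bar{n}} \bar{R}^{2/3} \bar{S}^{1/2} \right)^2 \left( \frac{\sigma_n^2}{\bar{n}^2} + \frac{4\sigma_R^2}{9\bar{R}^2} + \frac{\sigma_S^2}{4\bar{S}^2} \right)$
$\sigma_V = \left( \frac{1}{\bar{n}} \bar{R}^{2/3} \bar{S}^{1/2} \right) \sqrt{\frac{\sigma_n^2}{\bar{n}^2} + \frac{4\sigma_R^2}{9\bar{R}^2} + \frac{\sigma_S^2}{4\bar{S}^2}}$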
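Dividing the last expression by the mean velocity gives the CV of V in terms of the input CVs, with each squared CV weighted by the square of its exponent in Manning's equation. A minimal Python sketch, using hypothetical CV values:

```python
import math

def manning_velocity_cv(cv_n, cv_r, cv_s):
    """First-order CV of V = (1/n) R^(2/3) S^(1/2) for independent n, R, S."""
    # Exponents are -1, 2/3, and 1/2, so the squared weights are 1, 4/9, and 1/4
    return math.sqrt(cv_n**2 + (4.0 / 9.0) * cv_r**2 + 0.25 * cv_s**2)

# Hypothetical relative errors in roughness, hydraulic radius, and slope
cv_v = manning_velocity_cv(cv_n=0.10, cv_r=0.05, cv_s=0.20)
print(f"CV of velocity = {cv_v:.4f}")
```

Because roughness enters with an exponent of magnitude one, its relative error passes through undiminished, whereas the slope error is halved.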
Example 10.15 Suppose the bottom width (x) and sides (y) of a rectangular
channel have been measured independently. The measurements may be in error
by ± 0.5 m. Thus, the bottom measurement is x = (10 ± 0.5) m and the side mea-
surement is y = (6 ± 0.5) m. Find the best values of the area A and wetted perim-
eter P of the rectangular section and their standard deviations. Also, find the
covariance if their correlation is 0.3.
Solution For a rectangular channel, we have
A = xy
P = x +2y
Hence, the best values of these are
A = 10 × 6 = 60 m2
P = 10 + (2 × 6) = 22 m
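Now
$\frac{\partial A}{\partial x} = y, \quad \left. \frac{\partial A}{\partial x} \right|_{\bar{x}, \bar{y}} = 6$
$\frac{\partial A}{\partial y} = x, \quad \left. \frac{\partial A}{\partial y} \right|_{\bar{x}, \bar{y}} = 10$
and
$\sigma_A^2 = \left( \left. \frac{\partial A}{\partial x} \right|_{\bar{x}, \bar{y}} \right)^2 \sigma_x^2 + \left( \left. \frac{\partial A}{\partial y} \right|_{\bar{x}, \bar{y}} \right)^2 \sigma_y^2$
$= 6^2 \times 0.5^2 + 10^2 \times 0.5^2 = 34$
Hence, σA = 5.83 m².
Similarly,
$\frac{\partial P}{\partial x} = 1, \quad \left. \frac{\partial P}{\partial x} \right|_{\bar{x}, \bar{y}} = 1$
$\frac{\partial P}{\partial y} = 2, \quad \left. \frac{\partial P}{\partial y} \right|_{\bar{x}, \bar{y}} = 2$
$\sigma_P^2 = \left( \left. \frac{\partial P}{\partial x} \right|_{\bar{x}, \bar{y}} \right)^2 \sigma_x^2 + \left( \left. \frac{\partial P}{\partial y} \right|_{\bar{x}, \bar{y}} \right)^2 \sigma_y^2$
$= 1^2 \times 0.5^2 + 2^2 \times 0.5^2 = 1.25$
Hence, σP = 1.118 m and the covariance is
$cov(A, P) = \rho\,\sigma_A \sigma_P = 0.3 \times 5.83 \times 1.118 \approx 1.96$ m³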
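The numbers in Example 10.15 can be reproduced with the short Python sketch below. It assumes, as the problem statement suggests, that the correlation of 0.3 is between A and P, so that the covariance is ρσAσP; variable names are illustrative.

```python
import math

x, y = 10.0, 6.0            # bottom width and side, m
sigma_x = sigma_y = 0.5     # measurement errors, m

area = x * y
perimeter = x + 2 * y

# First-order propagation with independent x and y
sigma_area = math.sqrt((y * sigma_x) ** 2 + (x * sigma_y) ** 2)
sigma_perimeter = math.sqrt((1 * sigma_x) ** 2 + (2 * sigma_y) ** 2)

# Assumption: the stated correlation of 0.3 applies to A and P
rho = 0.3
cov_ap = rho * sigma_area * sigma_perimeter

print(f"A = {area:.0f} m^2 +/- {sigma_area:.2f}")
print(f"P = {perimeter:.0f} m +/- {sigma_perimeter:.3f}")
print(f"cov(A, P) = {cov_ap:.3f}")
```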
Example 10.16 The Universal Soil Loss Equation (USLE) is used to predict soil
erosion. The USLE is given as A = RKLSCP, where A is the soil loss and R, K, L, S,
C, and P are input parameters. Assuming all the input variables are independent,
estimate the uncertainty associated with the prediction of soil loss by erosion.
Table E10-16a gives the mean and standard deviation of various parameters of
the USLE.
Table E10-16a Mean and standard deviation of various parameters of the USLE.
$Q = CA \frac{K F^x}{(T_c + b)^n}$
in which K = 1.74, x = 0.20, b = 0.20, n = 0.77, and F = 5 are the fixed parameters,
whereas the other parameters are uncertain with the characteristics given in
Table E10-17.
Parameter Mean CV
C 0.45 0.33
Tc 0.37 0.62
A 12.00 0.10
Solution By substituting all the parameters that are constant, the peak runoff is
given as
$Q = CA \frac{K F^x}{(T_c + b)^n} = 2.41\,CA\,(T_c + 0.2)^{-0.77}$
$Z = (T_c + 0.2)^{-0.77} = \beta^{-0.77}$
Further, from Eq. 10.52 we know that if $z = \beta^m$, then $CV_z = m\,CV_\beta$. Thus,
$CV_z = (-0.77) \times 0.41 = -0.31$
Q = 2.41CAZ
This indicates that the peak runoff contains a significant amount of uncertainty.
Substituting mean values of all parameters gives a mean peak runoff of 22.20 cfs.
10.4 Questions
10.1 Based on a sample of 20 years of data, the mean and standard deviation
of the annual rainfall for Saint Tammany Parish, Louisiana, is 62.5 inches
and 8.14 inches, respectively. The mean rainfall estimated by the arith-
metic mean method is about 6.5% to 12.5% higher than the true mean.
Estimate the overall random error in the estimated mean value.
10.2 Consider a linear reservoir expressed as S = KQ, where S = storage (in
m³), K = reservoir constant (in hours), and Q = discharge (in cfs). Further,
the mean of K and Q are 20 hours and 50 cfs, respectively. Assume the
coefficients of variation of K and Q to be 0.30 and 0.52, respectively.
(a) Determine the mean storage and the uncertainty in the estimation of
S assuming K and Q to be independent.
(b) Determine the mean storage and the uncertainty in the estimation of S
assuming K and Q to be correlated with a correlation coefficient of 0.52.
(c) What is the magnitude of error involved if an engineer made an
analysis by assuming K and Q as independent whereas the data
show that both of these parameters were dependent?
10.3 The storage St+1 in a reservoir at the end of month t + 1 is given as
St+1 = St + It +Rt – Ot – Et – Lt
$t = 1.05 A^{0.6}$
$t = 1.17 A^{0.59}$
$t = 0.537 A^{0.70}$
$t = 6.64 L_{ca}^{1.09} S^{-0.32}$
10.9 For a given river reach, the Muskingum method for flow routing is rep-
resented as S = K[0.25I + 0.75O], where S = storage rate, I = inflow rate,
O = outflow rate, and K = storage constant. Assume the variables I and
O are independent. If mean values of K, I, and O are 22 hours, 80 cfs,
and 65 cfs, respectively, with CV values of 0.30, 0.45, and 0.38, respec-
tively, determine the CV and standard deviation of S.
10.10 Consider the case of independent and uncorrelated errors. The hydraulic
radius R for a trapezoidal channel section is expressed as
$R = \frac{y(b + my)}{b + 2y\sqrt{1 + m^2}}$
Mean 50 10 2.5
Determine the error in R if (a) all parameters are independent, (b) b and y
are positively correlated with a correlation coefficient of 0.60, and (c) b
and y are negatively correlated with a correlation coefficient –0.60.
10.12 Consider Manning’s equation,
$Q = \frac{1}{n} \frac{A^{5/3}}{P^{2/3}} S^{1/2}$
where n = Manning’s roughness factor, A = cross-sectional area,
P = perimeter, and S = slope. Express the error in Q as a function of error
in A, P, n, and S.
10.13 Based on the Soil Conservation Service method, the basin lag time is
given as
$t_L = \frac{L^{0.8} [(1000/CN) - 9]^{0.7}}{1900 S^{0.5}}$
where L = length along stream to basin divide in miles, CN = curve num-
ber, and S = % watershed slope. Assume for a given watershed that
L = 25 miles, CN = 75, and S = 0.2%. Determine the CV of watershed lag
time.
10.14 Based on the Modified Rational Method (ISWM Design Manual for
Development/Redevelopment 2004), the critical duration of the design
storm is given as
$T_d = \sqrt{\frac{2CAab}{Q_0}} - b$
on the location and return period. The required storage volume for a
detention basin is given as
$V = 60 \left[ CAa - (2CabAQ_0)^{1/2} + \frac{Q_0 (b - T_c)}{2} \right] \frac{P_{180}}{P_{T_d}}$
where P180 = 3-hour (180-minute) storm depth, Tc = time of
concentration, and PTd = storm depth for the critical period.
Assuming that all parameters are independent, derive expressions for
the coefficient of variation for Td and V. Further, consider urban develop-
ment near you and assume appropriate values for a small watershed and
determine the mean and standard deviation of the required storage
volume for the detention pond.
10.15 Based on Darcy’s law, the flow rate through an aquifer is given as
Q = KIA, where K = hydraulic conductivity (m/day), I = hydraulic gradient,
and A = cross-sectional area (m²). Assume mean values of the various
parameters as I = 0.004, K = 50 m/day, and A = 1.0 m². Further,
assume CV values for I, K, and A of 0.11, 0.33, and 0.01, respectively.
Determine the CV of Q.
10.16 Assuming complete and instantaneous mixing, the BOD L0 of the mix-
ture of streamwater and wastewater at the point of discharge is given as
$L_0 = \frac{Q_w L_w + Q_r L_r}{Q_w + Q_r}$
Determine the mean and standard deviation of the flow and BOD at the
point of discharge.
10.17 The BOD remaining at a point downstream of the point of discharge is
given as L(x) = L0 exp(–kx/v), where k = deoxygenation rate and
v = stream velocity. Consider the data of Question 10.16 along with the
mean values of k and v as 0.2 per day and 0.30 m/s, respectively, and
their respective CV values of 0.15 and 0.30. Determine the mean and CV
at a point 30,000 m downstream from the discharge.
$R_n + E + G + H = 0$
$ET = cT^a$
where T is the mean monthly temperature, c is a coefficient, and a is an
exponent. Derive the error in ET as function of the error in T.
10.26 Consider the case of a rectangular channel whose wetted perimeter can
be expressed as WP = b + 2h, where b = width and h = flow depth. Variables b and h
can be considered independent. Derive the error in WP as a function of
the errors in b and h. Assume that the means and standard deviations of
b and h are known.
10.27 Consider Chezy’s equation, $V = C R^{1/2} S^{1/2}$, where C = Chezy’s roughness
factor, R = hydraulic radius, and S = slope. Express the error in V as
a function of the errors in R, C, and S.
10.28 The rating curve for a river is usually defined as $Q = aA^b$, where Q is discharge
(m³/s), A is cross-sectional area (m²), and a and b are constants.
Derive the error in Q as a function of the error in A.
10.29 The base flow from a basin at any time t can be expressed as
$Q_t = Q_0 K^t$
where Qt is the base flow at time t, Q0 is the initial flow, and K is the
recession constant (between 0 and 1, usually 0.85 to 0.99). Derive the
error in Qt as a function of the errors in Q0 and K.
10.30 The time of concentration for a small watershed can be expressed as
$T_c = C L_p^a S_p^b$
10.31 For a convex method of flow routing in a river, the outflow at the end of
time interval t + 1, Qt+1, can be computed using the continuity equation
Qt+1 = aQt + bIt
In Monte Carlo simulation, the inputs to the system are transformed into
outputs by means of a mathematical model of the system. This model is devel-
oped such that the important features of the system are represented in sufficient
detail. The main steps in Monte Carlo simulation are assembling inputs, prepar-
ing a model of the system, conducting experiments using the inputs and the
model, and analyzing the output. Sometimes, a parameter of the system is sys-
tematically changed and the output is monitored in the changed circumstances
to determine how sensitive it is to the changes in the properties of the system.
The main advantages of Monte Carlo simulation are that it permits detailed
description of the system, its inputs, outputs, and parameters. All the critical
parameters of the system can be included in its description. The other advan-
tages include savings in time and expenses. It is important to remember that the
synthetically generated data are no substitute for the observed data but this is a
useful pragmatic tool that allows the analyst to extract detailed information
from the available data. However, when using Monte Carlo simulation in practical
risk and reliability analyses, a large amount of computation may be needed
for generating random variables, and these variables may be correlated. Because
computing power is now rarely a constraint, this is not
a serious limitation.
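The four steps named above (assembling inputs, preparing a model, conducting experiments, and analyzing the output) can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration: the model is a simple product of three inputs, the inputs are taken as independent normal variables with hypothetical means and standard deviations, and the seed is arbitrary.

```python
import math
import random
import statistics

def model(c, i, a):
    """Step 2: a simple system model (product of three inputs)."""
    return c * i * a

def run_experiments(n_trials=20000, seed=1):
    """Steps 1 and 3: draw random inputs and pass them through the model."""
    rng = random.Random(seed)
    outputs = []
    for _ in range(n_trials):
        c = rng.gauss(0.5, 0.05)    # hypothetical input, CV = 0.10
        i = rng.gauss(50.0, 5.0)    # hypothetical input, CV = 0.10
        a = rng.gauss(2.0, 0.1)     # hypothetical input, CV = 0.05
        outputs.append(model(c, i, a))
    return outputs

# Step 4: analyze the output and compare with the first-order estimate
outputs = run_experiments()
cv_simulated = statistics.stdev(outputs) / statistics.mean(outputs)
cv_first_order = math.sqrt(0.10**2 + 0.10**2 + 0.05**2)
print(f"simulated CV   = {cv_simulated:.4f}")
print(f"first-order CV = {cv_first_order:.4f}")
```

For small input CVs such as these, the simulated and first-order values should agree closely; larger input variability or a strongly nonlinear model would widen the gap.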
The generation of random numbers forms an important part of Monte Carlo
simulation. In the early days, roulette wheels similar to those in use at Monte
Carlo were used to generate random numbers, giving rise to the name of the
technique. During the initial days of mathematical simulation, mechanical
means were employed to generate random numbers. The techniques that were
used to generate random numbers were drawing cards from a pack, drawing
numbered balls from a vessel, reading numbers from a telephone directory, etc.
Printed tables of random numbers were also in use for quite some time. The cur-
rent approach is to use a computer-based routine to generate random numbers.
This approach is discussed next.
where Ri are integer variables and a, b, and d are positive integer constants that
depend upon the properties of the computer. The word “modulo” denotes that
the variable to the left of this word is divided by the variable to the right (in this
case d) and the remainder is assigned the value Ri. The desired uniformly dis-
tributed random number is obtained as Ri/d. The initial value of the variable (R0)
in Eq. 11.1 is called the seed. The properties of the generated numbers depend on
the values of constants a, b, and d, their relationships, and the computer used.
The value of constant a needs to be sufficiently high; low values may not yield
good results. Constants b and d should not have any common factors. The posi-
tive integers R0, a, b, and d are chosen such that d > 0, a < d, and b < d.
In computer generation, the sequence of random numbers is repeated after a
certain lag and it is desired that the length of this cycle should be as long as pos-
sible. This lag increases as d increases and therefore a large value of d should be
chosen. Normally, d is set equal to the word length (the number of bits retained
as a unit) of the computer; a typical value is 2^31 − 1. It is important to ensure that
the length of the cycle is more than the numbers that are needed for the study.
Example 11.1 Generate 10 uniformly distributed random numbers using Eq. 11.1
with a = 5, b = 3, and d = 7. The seed R0 can be assigned a value of 2.
Solution Equation 11.1 is rewritten as
Ri = (5 × Ri-1 + 3) (modulo 7)
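The recursion of this example can be sketched in a few lines of Python. This is a minimal illustration rather than a production generator; the function name `lcg` is an assumption introduced here, and the small modulus d = 7 is useful only for checking the arithmetic by hand.

```python
def lcg(seed, a, b, d, count):
    """Return `count` numbers in [0, 1) from R_i = (a*R_{i-1} + b) mod d."""
    numbers = []
    r = seed
    for _ in range(count):
        r = (a * r + b) % d      # remainder after division by d
        numbers.append(r / d)    # scale the integer R_i to the unit interval
    return numbers

# Constants of Example 11.1 with seed R0 = 2.
sample = lcg(seed=2, a=5, b=3, d=7, count=10)
residues = [round(u * 7) for u in sample]   # recover the integers R_i
print(residues)
print(sample)
```

With these constants the integers R_i repeat after six values, which shows why a large modulus d is needed in practice.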
known as prime modulus multiplicative LCGs, are widely used these days to
generate random numbers.
Since these algorithms of random number generation are deterministic, the
generated numbers can be reproduced exactly. Therefore, these numbers are not
random in a strict sense and are called pseudorandom numbers. A good random
number generator should produce uniformly distributed numbers that are not
correlated with each other, should be fast, should not require a large
memory, and should be able to reproduce a given stream of random numbers
exactly. The algorithm should also be capable of generating several
separate streams of random numbers.
After generation, the random numbers should be tested to ensure that they
possess the desired statistical properties (i.e., the numbers are not serially corre-
lated). The chi-square test is one such test that can be used to confirm that the
numbers are uniformly distributed. Law and Kelton (1991) have discussed
random number generation and tests in greater detail.
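The two checks mentioned above can be sketched as follows. The function names, the choice of 10 equal-width classes, and the use of Python's built-in generator are assumptions made for illustration; 16.919 is the tabulated 5% critical value of chi-square with 9 degrees of freedom.

```python
import random

def chi_square_uniform(numbers, bins=10):
    """Chi-square statistic for the hypothesis that numbers are uniform on [0, 1)."""
    counts = [0] * bins
    for u in numbers:
        counts[min(int(u * bins), bins - 1)] += 1
    expected = len(numbers) / bins
    return sum((c - expected) ** 2 / expected for c in counts)

def lag1_correlation(numbers):
    """Lag-1 serial correlation coefficient of the sequence."""
    n = len(numbers)
    mean = sum(numbers) / n
    num = sum((numbers[i] - mean) * (numbers[i + 1] - mean) for i in range(n - 1))
    den = sum((u - mean) ** 2 for u in numbers)
    return num / den

CRITICAL_5PCT_9DOF = 16.919          # chi-square table value, 9 degrees of freedom

rng = random.Random(12345)           # seeded so that the stream can be reproduced
sample = [rng.random() for _ in range(10000)]
chi2 = chi_square_uniform(sample)
print("chi-square statistic:", chi2, "critical value:", CRITICAL_5PCT_9DOF)
print("lag-1 correlation:", lag1_correlation(sample))
```

Uniformity is not rejected at the 5% level when the statistic is below the critical value, and a lag-1 correlation close to zero is consistent with serial independence.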
where $F_Q^{-1}$ is the inverse of the cumulative distribution function of random
variable Q. This method is known as the inverse transformation or inverse CDF
method. The method is graphically illustrated in Fig. 11-1. It is useful when the
inverse of the CDF of the random variable can be expressed analytically. This is a
simple and computationally efficient method. However, it can be used for only
those distributions whose inverse form can be easily expressed.
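As a minimal sketch of the inverse CDF method, consider the exponential distribution, whose CDF F(x) = 1 − exp(−λx) inverts analytically to x = −ln(1 − u)/λ. The function name and the value λ = 5 are assumptions chosen only for illustration.

```python
import math
import random

def exponential_variate(u, lam):
    """Invert F(x) = 1 - exp(-lam*x) at u, with 0 <= u < 1."""
    return -math.log(1.0 - u) / lam

rng = random.Random(1)               # seeded for reproducibility
variates = [exponential_variate(rng.random(), lam=5.0) for _ in range(5)]
print(variates)
```

Any distribution with a closed-form inverse CDF can be handled in the same way by swapping the body of the function.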
Figure 11-1 Determination of random number x with desired distribution from
uniformly distributed random number r. [Graph: CDF FR(r) of the uniform number
and CDF FX(x) of the desired distribution, both on a vertical scale from 0 to 1.0.]
Monte Carlo Simulation 443
\[ F_X(x) = \sum_{i=1}^{m} w_i F_{X_i}(x) \tag{11.4} \]
where wi are the weights and FXi (x) are the cumulative distribution functions.
The weights should sum to unity. The requisite random number is generated in
two stages. First, two uniformly distributed random numbers (u1 and u2) in the
range [0, 1] are generated. The first number u1 is used to select the appropriate
CDF Fxi (x) for generation of the random number. The second number u2 is used
to determine the random variate according to the selected distribution.
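The two-stage procedure can be sketched as below. The component distributions (F1(x) = x and F2(x) = x^4 on [0, 1]) and the weights 0.6 and 0.4 are assumptions picked for illustration, as are all function and variable names.

```python
import random

def sample_mixture(weights, inverse_cdfs, rng):
    """Composition method: u1 selects the component CDF, u2 is inverted through it."""
    u1 = rng.random()
    u2 = rng.random()
    cumulative = 0.0
    for w, inverse_cdf in zip(weights, inverse_cdfs):
        cumulative += w
        if u1 <= cumulative:
            return inverse_cdf(u2)
    return inverse_cdfs[-1](u2)      # guard against round-off in the weights

# Illustrative components (assumed): F1(x) = x and F2(x) = x**4 on [0, 1].
WEIGHTS = [0.6, 0.4]
INVERSE_CDFS = [lambda u: u, lambda u: u ** 0.25]

rng = random.Random(3)
draws = [sample_mixture(WEIGHTS, INVERSE_CDFS, rng) for _ in range(5)]
print(draws)
```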
Figure 11-2 Shape of the probability distribution function and density of generated variates.
Example 11.3 Generate random variates that follow the following probability
density function:
\[ f_X(x) = 3/5 + x^3, \quad 0 \le x \le 1 \tag{11.5} \]
where
f1(x) = 1 and F1(x) = x, 0 ≤ x ≤ 1
In the second attempt, let the uniformly distributed numbers be 0.722 and
0.361. Since u1 is greater than 3/5, u2 is used to generate the variate by following
F2(x). Hence,
\[ g = -\frac{1}{\lambda}\sum_{i=1}^{4} \ln u_i = 0.777 + 2.134 + 0.599 + 2.532 = 6.042 \]
(11.8)
\[ x_1 = m_x + \sigma_x \sqrt{-2 \ln u_1}\,\cos(2\pi u_2) \tag{11.9} \]
\[ x_2 = m_x + \sigma_x \sqrt{-2 \ln u_1}\,\sin(2\pi u_2) \tag{11.10} \]
This method was developed by Box and Muller (1958).
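A minimal sketch of Eqs. 11.9 and 11.10 follows. The function name and default arguments are assumptions; u1 must lie in (0, 1] so that the logarithm is defined.

```python
import math

def box_muller(u1, u2, mean=0.0, std=1.0):
    """Return two independent normal variates from two uniform numbers."""
    radius = math.sqrt(-2.0 * math.log(u1))   # requires 0 < u1 <= 1
    angle = 2.0 * math.pi * u2
    x1 = mean + std * radius * math.cos(angle)   # Eq. 11.9
    x2 = mean + std * radius * math.sin(angle)   # Eq. 11.10
    return x1, x2

print(box_muller(0.3465, 0.8552))
```

Each pair of uniform numbers yields a pair of normal variates, so no generated number is wasted.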
Let two uniformly distributed random numbers be 0.3465 and 0.8552. Hence,
\[ x_2 = e^{2.057} = 7.822 \]
The mean of these log-normally distributed variates will be
\[ m_{x(\ln)} = \exp\left(m_x + \sigma_x^2/2\right) \]
and the variance will be
\[ \sigma_{x(\ln)}^2 = \exp\left(2m_x + \sigma_x^2\right)\left[\exp(\sigma_x^2) - 1\right]. \]
If the aim is to generate log-normally distributed random variates with mean
$m_{x(\ln)}$ and variance $\sigma_{x(\ln)}^2$, then these can be obtained as
follows. First, generate normally distributed random variates with mean and
variance as
\[ x(u) = t + \frac{p_0 + p_1 t + p_2 t^2 + p_3 t^3 + p_4 t^4}{q_0 + q_1 t + q_2 t^2 + q_3 t^3 + q_4 t^4}, \quad 0.5 \le u < 1 \tag{11.14} \]
\[ x(u) = -x(1 - u), \quad 0 < u < 0.5 \]
\[ F_X(j) = \sum_{i=0}^{j} \binom{5}{i} (0.4)^i (0.6)^{5-i}, \quad j = 0, 1, \ldots, 5 \tag{11.16} \]
i 0 1 2 3 4 5
FX(i) 0.0778 0.337 0.6826 0.913 0.9898 1.000
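The tabulated CDF can be reproduced, and variates drawn from it, with the short sketch below. The function names are assumptions; the rule used is to return the smallest j for which u does not exceed FX(j).

```python
import math

def binomial_cdf(n, p):
    """Cumulative probabilities F(0), ..., F(n) of a binomial(n, p) variable."""
    cdf, total = [], 0.0
    for i in range(n + 1):
        total += math.comb(n, i) * p ** i * (1.0 - p) ** (n - i)
        cdf.append(total)
    return cdf

def discrete_variate(u, cdf):
    """Return the smallest index j with u <= F(j)."""
    for j, f in enumerate(cdf):
        if u <= f:
            return j
    return len(cdf) - 1      # guards against round-off when u is close to 1

cdf = binomial_cdf(5, 0.4)
print([round(f, 4) for f in cdf])
print(discrete_variate(0.5, cdf))
```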
Example 11.7 Generate random variates that follow a Poisson distribution with
parameter λ = 4.
i 0 1 2 3 4 5
FX(i) 0.018 0.091 0.237 0.432 0.627 0.783
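For a distribution with unbounded support such as the Poisson, the CDF need not be tabulated in advance. The sketch below accumulates probabilities with the recurrence P(i) = P(i − 1) λ/i until the uniform number is covered; the function name is an assumption.

```python
import math

def poisson_variate(u, lam):
    """Smallest i whose cumulative Poisson probability reaches u."""
    i = 0
    prob = math.exp(-lam)            # P(X = 0)
    cdf = prob
    while u > cdf and prob > 0.0:    # second condition stops at numerical underflow
        i += 1
        prob *= lam / i              # P(X = i) from P(X = i - 1)
        cdf += prob
    return i

print([poisson_variate(u, 4.0) for u in (0.01, 0.30, 0.50, 0.90)])
```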
where $f_{X_i}(x_i \mid x_1, \ldots, x_{i-1})$ is the conditional PDF of Xi, given X1 = x1, X2 = x2, …,
Xi–1 = xi–1. The joint cumulative distribution is
\[ F_{X_1, X_2, \ldots, X_n}(x_1, \ldots, x_n) = F_{X_1}(x_1)\,F_{X_2}(x_2 \mid x_1) \cdots F_{X_n}(x_n \mid x_1, \ldots, x_{n-1}) \tag{11.20} \]
W = −2 × ln Q (11.26)
Example 11.9 The purpose of this example is to illustrate the strength of Monte
Carlo simulation for sizing of a storage reservoir. Assume that, at the site of
interest, the statistical characteristics of annual streamflows, which are
log-normally distributed, are known. The inflows have mean = 660 × 10^6 m^3,
standard deviation = 175 × 10^6 m^3, lag-1 autocorrelation coefficient ρ = 0.2, and
skewness γ = 0.99. The generated inflows, quite naturally, should preserve these
statistical properties. Here, we adopt a three-parameter log-normal distribution
to model the flows x.
Let a be the lower bound of the flows. Accordingly, (x – a) will be
log-normally distributed, or y = ln (x – a) will be normally distributed. For the
random variate x, the mean μx, standard deviation σx, lag-1 correlation coefficient
ρx, and skewness γx are related to statistical properties of y (Matalas 1967) by
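\[ \mu_x = a + \exp\left(\sigma_y^2/2 + \mu_y\right) \tag{11.28} \]
\[ \gamma_x = \frac{\exp(3\sigma_y^2) - 3\exp(\sigma_y^2) + 2}{\{\exp(\sigma_y^2) - 1\}^{1.5}} \tag{11.30} \]
\[ \rho_x = \frac{\exp(\sigma_y^2 \rho_y) - 1}{\exp(\sigma_y^2) - 1} \tag{11.31} \]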
Knowing the values of μx, σx, ρx, and γx for the historic data, we can find the
values of a, μy, σy, and ρy for y. First, using Eq. 11.30 we have
\[ \frac{\exp(3\sigma_y^2) - 3\exp(\sigma_y^2) + 2}{\{\exp(\sigma_y^2) - 1\}^{1.5}} = 0.99 \]
The solution of this equation gives $\sigma_y^2 = 0.096659$; with this value, the
relations for the mean and standard deviation of x yield μy = 6.26. Next, we
have from Eq. 11.31
\[ \frac{\exp(0.096659\,\rho_y) - 1}{\exp(0.096659) - 1} = 0.2 \]
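The parameter determination for this example can be sketched numerically as follows. Bisection is an assumed choice of root finder, the function names are illustrative, and the variance relation σx² = exp(2μy + σy²)[exp(σy²) − 1] is the standard log-normal result used here as an assumption because the corresponding equation is not reproduced above. Results should land close to the quoted values; small differences can arise from rounding of the input statistics.

```python
import math

def skewness_x(sigma2_y):
    """Right-hand side of Eq. 11.30 as a function of sigma_y^2."""
    w = math.exp(sigma2_y)
    return (math.exp(3.0 * sigma2_y) - 3.0 * w + 2.0) / (w - 1.0) ** 1.5

def solve_sigma2_y(target_skew, lo=1e-4, hi=2.0, tol=1e-12):
    """Bisection; the skewness grows monotonically with sigma_y^2."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if skewness_x(mid) < target_skew:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Statistics of the annual flows (volumes in 10^6 m^3).
mean_x, std_x, rho_x, skew_x = 660.0, 175.0, 0.2, 0.99

sigma2_y = solve_sigma2_y(skew_x)
w = math.exp(sigma2_y)
# Assumed variance relation: sigma_x^2 = exp(2*mu_y + sigma_y^2) * (w - 1)
mu_y = 0.5 * (math.log(std_x ** 2 / (w - 1.0)) - sigma2_y)
a = mean_x - math.exp(mu_y + 0.5 * sigma2_y)              # rearranged Eq. 11.28
rho_y = math.log(1.0 + rho_x * (w - 1.0)) / sigma2_y      # rearranged Eq. 11.31
print(sigma2_y, mu_y, a, rho_y)
```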
where ti are normally distributed random variates with mean zero and standard
deviation of unity. The synthetic flows in real domain are obtained by the
transformation
xi = exp(yi) + a (11.33)
This procedure was used to generate 500 traces of inflows, each 100 years
long, which is the useful life of a typical storage reservoir.
A simple procedure to determine the required storage capacity is the sequent
peak algorithm (see Jain and Singh 2003). In the present case, the storage
capacity was obtained for a draft equal to 0.7 times the average inflows. These
estimates are shown in Fig. 11-3. The mean storage was 125 × 10^6 m^3, and the
standard deviation of storage values was 46 × 10^6 m^3. It is interesting to note that
the maximum and minimum values of storage were 363 × 10^6 and 13 × 10^6 m^3.
Clearly, deciding on the size of the reservoir after such an analysis will be a
much better decision than just using the available (and usually small sample)
data.
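The sequent peak algorithm named above is not spelled out in this section. The sketch below uses its common textbook form, K_t = max(0, K_{t−1} + draft − inflow_t), with the required capacity taken as the largest K_t; the function name and the short inflow series are assumptions for illustration and do not reproduce the computations of the example.

```python
def sequent_peak_capacity(inflows, draft):
    """Required storage for a constant draft by the sequent peak algorithm.

    The record is traversed twice so that a critical period wrapping around
    the end of the record is also captured.
    """
    deficit = 0.0
    capacity = 0.0
    for inflow in list(inflows) * 2:
        deficit = max(0.0, deficit + draft - inflow)   # cumulative shortfall
        capacity = max(capacity, deficit)              # largest shortfall so far
    return capacity

inflows = [10.0, 2.0, 2.0, 10.0]        # hypothetical annual inflows
print(sequent_peak_capacity(inflows, draft=5.0))
```

Applying the function to each synthetic trace, with the draft set to 0.7 times the mean inflow of the trace, gives one storage estimate per simulation run.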
The reservoir will be empty in the first case and full in the third.
Figure 11-3 Variation of storage values for the various simulation runs.
[Plot: storage size (million m^3), 0 to 400, against run number, 0 to 500.]
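Figure 11-4 Operation policy for the reservoir in Example 11.10.
[Diagram: release plotted against storage + inflow, with 45° segments, target
release T, breakpoints at T and Smax + T, and regions labeled "Empty reservoir"
and "Full reservoir".]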
\[ x_{j+1} = \bar{x} + r(x_j - \bar{x}) + s\sqrt{1 - r^2}\,N_j \tag{11.35} \]
where xj is the flow for year j, $\bar{x}$ is the mean of the inflows, r is the correlation
coefficient, and s is the standard deviation. Further, the Nj are standard normal
deviates that can be generated by using techniques explained previously.
Equation 11.35 can be employed to generate an annual streamflow sequence.
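A minimal sketch of Eq. 11.35 is given below, assuming the square-root form of the noise term used in the standard lag-1 autoregressive model. The function name is an assumption, and the mean, standard deviation, and correlation are borrowed from Example 11.9 purely for illustration.

```python
import math
import random

def generate_flows(mean, std, r, deviates, x0=None):
    """Annual flows from x_{j+1} = mean + r*(x_j - mean) + std*sqrt(1 - r^2)*N_j."""
    x = mean if x0 is None else x0       # start the recursion at the mean by default
    flows = []
    for n_j in deviates:
        x = mean + r * (x - mean) + std * math.sqrt(1.0 - r * r) * n_j
        flows.append(x)
    return flows

rng = random.Random(2024)
deviates = [rng.gauss(0.0, 1.0) for _ in range(100)]     # standard normal N_j
flows = generate_flows(mean=660.0, std=175.0, r=0.2, deviates=deviates)
print(flows[:5])
```

Passing the deviates in explicitly keeps the recursion separate from the random number generator, so the same stream can be reused for different parameter sets.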
One drawback of the model for inflow generation used here is that it ignores
skewness of inflows. To preserve the skewness of the historic data, the numbers
Nj in Eq. 11.35 should be transformed as follows (ASCE 1996):
\[ W = \frac{2}{g}\left[1 + \frac{g N_j}{6} - \frac{g^2}{36}\right]^3 - \frac{2}{g} \tag{11.36} \]
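Equation 11.36 translates directly into code. In the sketch below the function name is an assumption, and the branch for g close to zero is an added safeguard, since the transformation reduces to W = Nj when there is no skewness.

```python
def skewed_deviate(n_j, g):
    """Eq. 11.36: map a standard normal deviate N_j to a deviate W with skewness g."""
    if abs(g) < 1e-8:
        return n_j            # limiting case: no skewness, no transformation
    return (2.0 / g) * (1.0 + g * n_j / 6.0 - g * g / 36.0) ** 3 - 2.0 / g

print([skewed_deviate(n, 0.99) for n in (-2.0, -1.0, 0.0, 1.0, 2.0)])
```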
Release range (10^6 m^3)   <400   400–600   600–800   800–1,000   >1,000
Let the annual benefits from the release in different ranges (bi) be as given in
the second row of this table. Hence, the expected annual benefit (B) from the
reservoir can be computed as
B = ∑ bi Ri
Release range (10^6 m^3)              <400   400–600   600–800   800–1,000   >1,000   Expected benefits
Benefits                               2.0     6.0      10.0       7.0         5.0
Base run                               0      0.040     0.899      0.046       0.015       9.627
Inflow correlation = 0.5               0      0.020     0.949      0.028       0.003       9.821
Inflow correlation = 0.1               0      0.054     0.866      0.058       0.022       9.5
Skewness considered using Eq. 11.30    0      0.015     0.841      0.091       0.053       9.402
Target demand = 900                    0      0.040     0.118      0.827       0.015       7.284
Reservoir capacity = 600               0      0.040     0.808      0.111       0.041       9.302
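The last column of the table can be checked with a few lines of code. The sketch interprets Ri as the fraction of simulated years in which the release falls in range i, an interpretation suggested by the rows summing to one; the variable and function names are assumptions.

```python
BENEFITS = [2.0, 6.0, 10.0, 7.0, 5.0]     # b_i for the five release ranges

def expected_benefit(fractions, benefits=BENEFITS):
    """B = sum of b_i * R_i over the release ranges."""
    return sum(b * r for b, r in zip(benefits, fractions))

base_run = [0.0, 0.040, 0.899, 0.046, 0.015]
target_900 = [0.0, 0.040, 0.118, 0.827, 0.015]
print(round(expected_benefit(base_run), 3))
print(round(expected_benefit(target_900), 3))
```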
Performing a sensitivity analysis is one way to use the power of Monte Carlo
simulation. After the model has been developed, it is easy to change the key
parameters and study the impact of the change. However, to make full use of
this power, it is essential that the runs be carefully planned and that physical
realism not be lost.
\[ F(x) = \begin{cases} 1 - \left(1 - a\,\dfrac{x - c}{b}\right)^{1/a}, & a \ne 0 \\[2ex] 1 - \exp\left(-\dfrac{x - c}{b}\right), & a = 0 \end{cases} \tag{11.37} \]
\[ \bar{x} = c + \frac{b}{1 + a} \tag{11.39} \]
\[ s^2 = \frac{b^2}{(1 + a)^2 (1 + 2a)} \tag{11.40} \]
\[ 1 - a\,\frac{x_1 - c}{b} = \left(\frac{n}{n + 1}\right)^{a} \tag{11.42} \]
Eliminating b and c by substitution of Eq. 11.40 and Eq. 11.41 into Eq. 11.42,
one gets an expression for the estimation of a:
\[ \bar{x} - x_1 = \left[\frac{1}{1 + a} - \frac{1}{a} + \frac{1}{a}\left(\frac{n}{n + 1}\right)^{a}\right] S\,(1 + a)(1 + 2a)^{1/2}, \quad a \ne 0 \tag{11.43} \]
With the value of a obtained from the solution of Eq. 11.43, estimates of b and
c are obtained from the solution of Eq. 11.40 and Eq. 11.41.
\[ a = \frac{W_0 - 8W_1 + 9W_2}{-W_0 + 4W_1 - 3W_2} \tag{11.44} \]
\[ W_r = \frac{1}{r + 1}\left(c + \frac{b}{a}\right) - \frac{b}{a}\,\frac{1}{a + r + 1}, \quad r = 0, 1, 2, \ldots \tag{11.47} \]
Thus, Eq. 11.44 to Eq. 11.46 yield parameter estimates in terms of PWM. For a
finite sample size, consistent moment estimates ($\hat{W}_r$) can be computed as
\[ \hat{W}_r = \frac{1}{n}\sum_{i=1}^{n} (1 - F_{i:n})^r\, x_{i:n} \tag{11.48} \]
where x1:n ≤ x2:n ≤ ... ≤ xn:n is the ordered sample and Fi:n = (i – 0.35)/n (Landwehr
et al. 1979b).
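A sketch of the PWM calculation follows. The sample moments use Eq. 11.48 and the shape parameter uses Eq. 11.44. The expressions for b and c in the code are obtained here by solving Eq. 11.47 for r = 0 and r = 1; they are a derivation made for this sketch and are not necessarily written in the same form as Eqs. 11.45 and 11.46. All function names are assumptions.

```python
def sample_pwm(data, r):
    """Eq. 11.48 with plotting position F_i = (i - 0.35)/n on the ordered sample."""
    x = sorted(data)
    n = len(x)
    return sum((1.0 - (i - 0.35) / n) ** r * xi for i, xi in enumerate(x, start=1)) / n

def gpd_from_pwm(w0, w1, w2):
    """GPD parameters (a, b, c) from the first three PWMs; requires a != 0."""
    a = (w0 - 8.0 * w1 + 9.0 * w2) / (-w0 + 4.0 * w1 - 3.0 * w2)    # Eq. 11.44
    # b and c follow from Eq. 11.47 written for r = 0 and r = 1 (derived here).
    b = (w0 - 2.0 * w1) * (1.0 + a) * (2.0 + a)
    c = w0 - b / (1.0 + a)
    return a, b, c

def population_pwm(a, b, c, r):
    """Eq. 11.47 for a != 0."""
    return (c + b / a) / (r + 1.0) - (b / a) / (a + r + 1.0)

# Round trip: exact PWMs of a known population return its parameters.
w = [population_pwm(0.1, 0.5, 1.0, r) for r in range(3)]
print(gpd_from_pwm(*w))
```

In an application, `sample_pwm(data, r)` for r = 0, 1, 2 would replace the population values in the call to `gpd_from_pwm`.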
\[ \sum_{i=1}^{n} \ln\left[1 - a(x_i - c)/b\right] = -na \tag{11.50} \]
Example 11.11 Compare the various parameter estimation methods for the
three-parameter generalized Pareto distribution.
\[ x(F) = c + \frac{b}{a}\left[1 - (1 - F)^a\right], \quad a \ne 0 \tag{11.51} \]
\[ x = c - b \ln(1 - F), \quad a = 0 \tag{11.52} \]
where x(F) denotes the quantile corresponding to cumulative (nonexceedance)
probability F, that is, to exceedance probability 1 − F. To assess the performance of
the parameter estimation methods outlined here, Monte Carlo sampling experiments
were conducted using Eq. 11.51 and Eq. 11.52.
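Sampling from the three-parameter GPD with Eqs. 11.51 and 11.52 amounts to the inverse CDF method. The sketch below shows one way of doing it; the function names and the parameter values a = 0.1, b = 0.5, c = 1.0 are illustrative assumptions.

```python
import math
import random

def gpd_quantile(f, a, b, c):
    """Quantile of the three-parameter GPD for nonexceedance probability f."""
    if a == 0.0:
        return c - b * math.log(1.0 - f)             # Eq. 11.52
    return c + (b / a) * (1.0 - (1.0 - f) ** a)      # Eq. 11.51

def gpd_sample(n, a, b, c, rng):
    """Draw n variates by passing uniform numbers through the quantile function."""
    return [gpd_quantile(rng.random(), a, b, c) for _ in range(n)]

rng = random.Random(7)
sample = gpd_sample(20, a=0.1, b=0.5, c=1.0, rng=rng)
print(sample)
```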
In engineering practice, observed samples may be frequently available, for
which the first three moments (mean, variance, and skewness) are computed.
Thus, parameter estimates and quantiles for the commonly encountered peak
characteristics data are frequently characterized by using the coefficients of vari-
ation and skewness. Because this information is readily derivable from a given
data set, a potential candidate for estimating the parameters and quantiles of
GPD3 will be the one that performs the best in the expected observed ranges of
the coefficient of variation and skewness. Thus, the selection of the population
parameter ranges is very important for any simulation study. To that end, the
following considerations were made:
1. For a given set of data, the distribution parameters are not known in
advance; usually known quantities are the sample coefficients of varia-
tion and skewness, if they exist. Therefore, if one were somehow able to
classify the best estimators with reference to these readily knowable data
characteristics, that would be preferable from a practical standpoint as
compared to classifying the best estimators in terms of an a priori
unknown parameter.
2. Not all the estimators perform well in all population ranges, so a range
where all the estimators can be applied has to be selected. Fortunately, in
real life most commonly encountered data lie within the range
considered in this study.
3. Because the sample skewness in GPD may correspond to more than one
variance, parameter estimation based on skewness alone is misleading. To
avoid this folly, evaluation of the estimators is based on the parameters.
Keeping the above considerations in mind, we investigated parameters using
a factorial experiment within a space spanned by {ai, bj, ck}, where {ai = –0.1, –0.05,
0.0, 0.05, 0.1}, {bj = 0.25, 0.50}, and {ck = 0.5, 1.0}. Twenty GPD3 population cases,
listed in Table E11-11, were considered. The ranges of data characteristics were
also computed so that the results of this study could be related to the commonly
used data statistics. For each population case, 20,000 samples of size 10, 20, 50, 100,
200, and 500 were generated, and then parameters and quantiles were estimated.
\[ SE = S(\hat{\theta}) \tag{11.54} \]
\[ RMSE = E\left[(\hat{\theta} - \theta)^2\right]^{0.5} \tag{11.55} \]
\[ E[\hat{\theta}] = \frac{1}{n}\sum_{i=1}^{n} \hat{\theta}_i \tag{11.56} \]
\[ S[\hat{\theta}] = \left\{\frac{1}{n - 1}\sum_{i=1}^{n}\left[\hat{\theta}_i - E(\hat{\theta})\right]^2\right\}^{0.5} \tag{11.57} \]
The root mean square error can also be expressed as
\[ RMSE = \left[\frac{n - 1}{n}\,SE^2 + BIAS^2\right]^{0.5} \tag{11.58} \]
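The indices can be computed from a set of simulated estimates as sketched below. The bias is taken as E[θ̂] − θ, which is an assumption because its defining equation is not reproduced above, and the four estimates are hypothetical numbers used only to check the identity between RMSE, SE, and BIAS.

```python
import math

def performance_indices(estimates, true_value):
    """Return (BIAS, SE, RMSE) of a list of simulated estimates."""
    n = len(estimates)
    mean = sum(estimates) / n                                              # Eq. 11.56
    se = math.sqrt(sum((e - mean) ** 2 for e in estimates) / (n - 1))      # Eq. 11.57
    bias = mean - true_value                                               # assumed definition
    rmse = math.sqrt(sum((e - true_value) ** 2 for e in estimates) / n)    # Eq. 11.55
    return bias, se, rmse

estimates = [1.1, 0.9, 1.3, 1.0]          # hypothetical estimates of theta = 1.0
bias, se, rmse = performance_indices(estimates, true_value=1.0)
n = len(estimates)
print(bias, se, rmse)
print(math.sqrt((n - 1) / n * se ** 2 + bias ** 2))    # same value as rmse
```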
These indices were used to measure the variability of parameter and quantile
estimates for each simulation. Although they were used to determine the
overall “best” parameter estimation method, our interest lies in the bias and
variability of estimates of quantiles in the extreme tails of the distribution
(nonexceedance probability P = 0.9, 0.99, 0.999) when the estimates are based on
small samples (n ≤ 50). Owing to the limited number of random samples
(20,000 here) used, the results are not expected to reproduce the true
values of BIAS, SE, RMSE, and $E[\hat{\theta}]$ exactly. Nevertheless, they provide a means to
compare the performance of the estimation methods used. The computed values of
BIAS and RMSE in quantiles are tabulated as ratios $\hat{X}(F)/X(F)$ rather than for
the estimator $\hat{X}(F)$ itself.
11.6.3.5 Robustness
Kuczera (1982a,b) defined a robust estimator as the one that is resistant and
efficient over a wide range of population fluctuations. If an estimator performs
steadily without undue deterioration in RMSE and bias, then it can be
expected to perform better than other competing estimators under population
conditions different from those on which conclusions were based. Two criteria
to identify a resistant estimator are mini-max and minimum average RMSE
(Kuczera 1982b). Based on this mini-max criterion, the preferred estimator is
the one whose maximum RMSE for all population cases is minimum. The min-
imum average criterion is to select the estimator whose RMSE average over
the test cases is minimum.
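The two selection criteria can be expressed compactly. In the sketch below the RMSE values are hypothetical numbers invented for illustration; they are not results from this study, and the function name is an assumption.

```python
def select_robust(rmse_table):
    """Return the estimators preferred by the mini-max and minimum average criteria."""
    mini_max = min(rmse_table, key=lambda m: max(rmse_table[m]))
    min_average = min(rmse_table, key=lambda m: sum(rmse_table[m]) / len(rmse_table[m]))
    return mini_max, min_average

# Hypothetical RMSE of three estimators over three population cases.
rmse_table = {
    "MM1": [0.20, 0.35, 0.60],
    "PWM": [0.25, 0.30, 0.40],
    "MLE": [0.10, 0.25, 0.55],
}
print(select_robust(rmse_table))
```

The two criteria need not agree: here one estimator has the smallest worst-case RMSE and another the smallest average.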
when a > 0 and sample size n ≤ 20. As sample size increased (n ≥ 50), MM1 per-
formed comparatively better than did RME in all ranges. RME and MM1 tended
to further underestimate the quantiles with increasing population c for a given
value of b for P ≤ 0.995. However, for P = 0.999 the trend was reversed. On the
whole, with increasing b the absolute bias of these methods increased for sample
sizes n ≥ 10. PWM and MLE outperformed the other methods in terms of the
absolute value of the bias for all sample sizes and quantile ranges. PWM and
MLE responded positively in terms of bias to both b and c when one was varied
keeping the other constant.
11.6.3.12 Summary
An evaluation of the relative performance of four methods for estimating
parameters and quantiles of the three-parameter GPD was performed by using
Monte Carlo simulation. The generation of a large number of sample data and
their analysis enabled a comparison of the various methods of parameter esti-
mation. No single method was found to be preferable to another for all popula-
tion cases considered. On the whole, the PWM method was found to perform in
a consistent fashion. When a clear choice of a particular method is in doubt,
PWM can be the most reliable and should be the preferred method.
Example 11.12 Annual maximum flow data for a river in South America for 10
years are arranged in descending order in Table E11-12a. The results of a
detailed study showed that the GEV-PWM distribution fits the data for the
region. The following regional formula was developed for the region:
\[ Y_T = \left\{1 - \left[-\ln\left(1 - 1/T\right)\right]^{K}\right\}/K \tag{11.60} \]
The regional parameters of the GEV distribution were computed using the
PWM estimation method. These are K = – 0.247, u = 0.448, α = 0.493, a = 20.91,
and b = 0.46. Estimate the bias in the regional formulas using Monte Carlo
simulations.
Solution The CDF of the GEV distribution is
\[ F(z) = \exp\left\{-\left[1 - K\left(\frac{z - u}{\alpha}\right)\right]^{1/K}\right\} \]
and the PWM estimators are related to data statistics by the following relations:
\[ K = 7.859\,C + 2.9554\,C^2 \]
\[ C = \frac{2}{T_3 + 3} - \frac{\ln 2}{\ln 3} \]
\[ \alpha = \frac{L_2 K}{\Gamma(1 + K)\,(1 - 2^{-K})} \]
\[ u = L_1 + \frac{\alpha}{K}\left[\Gamma(1 + K) - 1\right] \]
\[ T_3 = L_3/L_2 \]
where L1, L2, and L3 are the L-moments.
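These relations can be coded as shown below. The sketch takes L1, L2, and the ratio T3 as inputs and applies the formulas literally, so the sign of K follows from the formula for C; the function name and the input values are assumptions for illustration, and the case K = 0 is not handled.

```python
import math

def gev_from_lmoments(l1, l2, t3):
    """GEV parameters (K, alpha, u) from L1, L2 and the ratio T3; requires K != 0."""
    c = 2.0 / (t3 + 3.0) - math.log(2.0) / math.log(3.0)
    k = 7.859 * c + 2.9554 * c * c
    gamma_term = math.gamma(1.0 + k)
    alpha = l2 * k / (gamma_term * (1.0 - 2.0 ** (-k)))
    u = l1 + (alpha / k) * (gamma_term - 1.0)
    return k, alpha, u

# Illustrative inputs: L-moments scaled by the mean, so that L1 = 1.
k, alpha, u = gev_from_lmoments(l1=1.0, l2=0.15956, t3=0.28999)
print(k, alpha, u)
```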
From the data, the mean of annual floods is
xm = 3,297 cumec
\[ x_T / x_m = u + \alpha\left\{1 - \left[-\ln(F)\right]^{K}\right\}/K \]
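One replication of the sampling step can be sketched as follows, using the regional parameters and mean flood quoted in this example. The function names are assumptions; each uniform number F is converted to a growth factor xT/xm and then to a flood by multiplying by the mean.

```python
import math
import random

K, U, ALPHA = -0.247, 0.448, 0.493     # regional GEV parameters of Example 11.12
MEAN_FLOOD = 3297.0                    # mean annual flood, cumec

def gev_growth_factor(f, k=K, u=U, alpha=ALPHA):
    """x_T/x_m = u + alpha*(1 - (-ln F)^k)/k for nonexceedance probability F."""
    return u + alpha * (1.0 - (-math.log(f)) ** k) / k

def synthetic_floods(n, rng):
    """One replication: n floods, returned in descending order."""
    floods = []
    for _ in range(n):
        f = rng.random()
        while f == 0.0:                # avoid log(0)
            f = rng.random()
        floods.append(MEAN_FLOOD * gev_growth_factor(f))
    return sorted(floods, reverse=True)

print(gev_growth_factor(0.440575))                 # first row of Table E11-12b
print(synthetic_floods(10, random.Random(5)))
```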
Table E11-12b
          F          xT/xm      xT          xT in descending order   B1        B2        B3
1         0.440575   0.54848    1808.338    8437.501                 843.75    843.75    843.75
2         0.101219   0.078529   258.9088    4413.256                 392.29    343.25    294.22
3         0.28496    0.338954   1117.531    3671.264                 285.54    214.16    152.97
4         0.798873   1.338567   4413.256    2515.626                 167.71    104.82    59.90
5         0.947568   2.559145   8437.501    1808.338                 100.46    50.23     21.53
6         0.280841   0.333588   1099.839    1404.637                 62.43     23.41     6.69
7         0.351404   0.426035   1404.637    1388.346                 46.28     11.57     1.65
8         0.347675   0.421094   1388.346    1117.531                 24.83     3.10      0.00
9         0.73204    1.113516   3671.264    1099.839                 12.22     0.00      0.00
10        0.575503   0.763005   2515.626    258.9088                 0.00      0.00      0.00
Average                         2611.525                             1935.51   1594.29   1380.70
The values of xT/xm are stored in the third column and the corresponding val-
ues xT are stored in the fourth column with xm = 3,297 cumec. In the fifth column,
the values of the fourth columns are arranged in descending order. The values of
B1, B2, and B3 are listed in the next three columns.
\[ \alpha = \frac{\lambda_2 k}{\Gamma(1 + k)\,(1 - 2^{-k})} = 0.411066 \]
\[ u = 1 + \frac{\alpha}{k}\left[\Gamma(1 + k) - 1\right] = 0.504225 \]
Using these computed parameters, we compute floods for various return
periods as shown in Table E11-12c.
In a similar manner, two more replications of the procedure were made. The
results for the second replication are shown in Table E11-12d.
Now, the L-moments are
λ1 = 2,816.911, λ2 = 1,178.135, λ3 = 180.8436, λ4 = 198.4997
and the L-moment ratios are
L-CV = 0.41824
L-SK = 0.15350
L-KR = 1.09763
The GEV parameters using these L-moments can be computed as follows:
c = 0.003286, k = – 0.0258, α = 0.588463, u = 0.635062
The floods for various return periods for this replication are shown in
Table E11-12e.
For the third replication the data are listed in Table E11-12f.
Table E11-12c
T F (lnF)k u + α /k[1 – (lnF)k] X(T)
Table E11-12d
          F          xT/xm      xT          xT in descending order   B1        B2        B3
1         0.637087   0.882038   2908.079    6870.646                 687.06    687.06    687.06
2         0.680785   0.979472   3229.32     4507.997                 400.71    350.62    300.53
3         0.199327   0.225736   744.2525    4267.184                 331.89    248.92    177.80
4         0.805957   1.367303   4507.997    3229.32                  215.29    134.56    76.89
5         0.512151   0.656233   2163.601    2908.079                 161.56    80.78     34.62
6         0.047668   0.03175    104.681     2163.601                 96.16     36.06     10.30
7         0.356743   0.43313    1428.029    1945.318                 64.84     16.21     2.32
8         0.787364   1.294263   4267.184    1428.029                 31.73     3.97      0.00
9         0.915208   2.083908   6870.646    744.2525                 8.27      0.00      0.00
10        0.469038   0.590027   1945.318    104.681                  0.00      0.00      0.00
Sum                             2816.911                             1997.52   1558.18   1289.52
Table E11-12e
T F (lnF)k u + α /k[1 – (lnF)k] X(T)
Table E11-12f
          F          xT/xm      xT          xT in descending order   B1        B2        B3
1         0.551562   0.721099   2377.464    5496.240                 549.62    549.62    549.62
2         0.096165   0.069667   229.6935    2981.103                 264.99    231.86    198.74
3         0.647532   0.904186   2981.103    2956.833                 229.98    172.48    123.20
4         0.620144   0.847434   2793.988    2793.988                 186.27    116.42    66.52
5         0.644095   0.896825   2956.833    2377.464                 132.08    66.04     28.30
6         0.864891   1.667043   5496.24     1890.559                 84.02     31.51     9.00
7         0.30977    0.371299   1224.171    1249.217                 41.64     10.41     1.49
8         0.457785   0.573418   1890.559    1224.171                 27.20     3.40      0.00
9         0.117616   0.105994   349.4614    349.4614                 3.88      0.00      0.00
10        0.315583   0.378895   1249.217    229.6935                 0.00      0.00      0.00
Sum                             2154.873                             1519.69   1181.75   976.88
with L-moments
λ1 = 3,297.49, λ2 = 526.1456, λ3 = 152.575, λ4 = 57.02214
and L-moment ratios
L-CV = 0.15956
L-SK = 0.28999
L-KR = 0.37373
Table E11-12g
T F (lnF)k u + α /k[1 – (lnF)k] X(T)
Table E11-12h
SN     X          B1        B2        B3
1      5111.1     511.11    511.11    511.11
2      4352       386.84    338.49    290.13
3      4089       318.03    238.53    170.38
4      3228.3     215.22    134.51    76.86
5      3014       167.44    83.72     35.88
6      2999.6     133.32    49.99     14.28
7      2927.8     97.59     24.40     3.49
8      2489.4     55.32     6.92      0.00
9      2424.3     26.94     0.00      0.00
10     2339.4     0.00      0.00      0.00
Sum    32974.9    1911.82   1387.67   1102.13
Here, results from only a few replications have been shown. When 10,000
replications were made, the bias for T = 1,000 years was nearly 11%. So the GEV
distribution can be considered to be a robust model for the region.
\[ F = \int_a^b f(x)\,dx \tag{11.61} \]
Table E11-12i
T F (lnF)k u + α /k[1 – (lnF)k] X(T)
Table E11-12j
          X(T) from replication number                     X(T) from
T         1           2           3           Mean         obs. data    Bias (%)
50        11221.24    8571.342    6590.504    8794.362     6313.123     39.30288
100       15203.18    9861.475    7609.221    10891.29     7242.84      50.37322
200       20417.61    11170.22    8653.642    13413.83     8292.239     61.76362
500       29879.12    12933.2     10077.93    17630.08     9892.567     78.21543
1000      39664.74    14293.59    11190.5     21716.28     11289.03     92.36621
10000     99960.81    18988.97    15119.8     44689.86     17390.88     156.973
where the PDF, hX(x) ≥ 0, is defined over the interval a ≤ x ≤ b. The transformed
integral given by Eq. 11.62 is equivalent to the computation of expectation of the
term inside the square brackets; that is,
\[ F = E_X\left[\frac{f(X)}{h_X(X)}\right] \tag{11.63} \]
where X is a random variable whose PDF hX(x) is defined over a ≤ x ≤ b. Now F
can be computed by the Monte Carlo method as
\[ \hat{F} = \frac{1}{N}\sum_{i=1}^{N}\frac{f(x_i)}{h_X(x_i)} \tag{11.64} \]
Here, xi is the ith random variate generated according to hX(x) and N is the
number of random variates generated. Computations using the sample-mean
Monte Carlo integration method are carried out in the following steps:
1. Select hX(x) defined over the region of the integral from which N random
variables are generated.
2. Compute f(xi)/hX(xi) for i = 1, 2, …, N.
3. Calculate the sample average based on Eq. 11.64 as the estimate of F.
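The three steps above can be sketched for the simplest choice of hX(x), a uniform density on [a, b]. The function name, the integrand x², and the sample size are assumptions chosen so that the estimate can be compared with the exact value 1/3.

```python
import random

def mc_integrate(f, a, b, n, rng):
    """Sample-mean estimate of the integral of f over [a, b] with a uniform h_X."""
    h = 1.0 / (b - a)                       # uniform density on [a, b]
    total = 0.0
    for _ in range(n):
        x = a + (b - a) * rng.random()      # step 1: draw x_i from h_X
        total += f(x) / h                   # step 2: accumulate f(x_i)/h_X(x_i)
    return total / n                        # step 3: sample average, Eq. 11.64

estimate = mc_integrate(lambda x: x * x, 0.0, 1.0, 100_000, random.Random(42))
print(estimate)
```

A density hX(x) that resembles the shape of f(x) would reduce the scatter of the ratios and hence the variance of the estimate.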
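\[ \alpha_{avg} = \frac{1}{2}(\alpha_1 + \alpha_2) \tag{11.65} \]
The new estimator is also unbiased and its variance is
\[ \mathrm{var}(\alpha_{avg}) = \frac{1}{4}\left[\mathrm{var}(\alpha_1) + \mathrm{var}(\alpha_2) + 2\,\mathrm{cov}(\alpha_1, \alpha_2)\right] \tag{11.66} \]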
In Monte Carlo simulation, estimators α 1 and α 2 depend on random vari-
ates that are generated. Of course, these variates are related to the standard uni-
form random variates used to generate random variates. Thus, α 1 and α 2 are
functions of the two standard uniform random variables U1 and U2. It is clear
from Eq. 11.66 that the variance of α avg can be reduced if one can generate ran-
dom variates that yield strongly negative correlations between α 1 and α 2.
A negative value of cov[α 1 (U1), α 2(U2)] can be obtained by generating U1
and U2, which are negatively correlated. A simple approach that produces nega-
tively correlated uniform random variates and demands minimal computation
is to set U1 = 1 – U2.
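The effect of setting U1 = 1 − U2 can be seen in a small experiment. The sketch below estimates the mean of exp(U) for a standard uniform U, whose exact value is e − 1, once with antithetic pairs and once with independent pairs; the test function and all names are assumptions made for illustration.

```python
import math
import random

def pair_means(f, n_pairs, rng, antithetic):
    """Averages of f over pairs (U1, U2); U2 = 1 - U1 when antithetic is True."""
    means = []
    for _ in range(n_pairs):
        u1 = rng.random()
        u2 = 1.0 - u1 if antithetic else rng.random()
        means.append(0.5 * (f(u1) + f(u2)))
    return means

def variance(values):
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / (len(values) - 1)

anti = pair_means(math.exp, 2000, random.Random(1), antithetic=True)
indep = pair_means(math.exp, 2000, random.Random(1), antithetic=False)
print("estimate:", sum(anti) / len(anti), "exact:", math.e - 1.0)
print("variance of pair means:", variance(anti), "versus", variance(indep))
```

The reduction is large here because exp(U) is monotonic in U, which makes the two members of each pair strongly negatively correlated.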
where A = (a1, a2, …, am) is a set of parameters for design A. Similarly, the perfor-
mance of another design B will be
ZB = g (B, X) (11.68)
where B = (b1, b2, …, bm) is a set of parameters for design B. Now, the difference
in performances for the two designs is
Z = ZA – ZB (11.69)
Since both ZA and ZB involve the same random numbers, these may be
highly correlated. Hence, the method of correlated sampling may be used
effectively to estimate statistical properties of Z. Let the mean value of Z be
$\bar{Z}$. The variance of Z is given by
\[ z_{Aj} = g\left[A, F_{X_1}^{-1}(u_1), F_{X_2}^{-1}(u_2), \ldots, F_{X_n}^{-1}(u_n)\right] \tag{11.72} \]
\[ z_{Bj} = g\left[B, F_{X_1}^{-1}(u_1), F_{X_2}^{-1}(u_2), \ldots, F_{X_n}^{-1}(u_n)\right] \tag{11.73} \]
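The benefit of feeding the same uniform numbers to both designs can be sketched as below. The performance function g, the design parameters 1.0 and 0.9, and the single exponential input are hypothetical and serve only to show how the variance of the difference changes.

```python
import math
import random

def performance(design, x):
    """Hypothetical performance function g(design, X); purely illustrative."""
    return design * math.sqrt(x) + x

def difference_samples(design_a, design_b, n, rng, common):
    """Samples of Z = Z_A - Z_B using common or independent uniform numbers."""
    diffs = []
    for _ in range(n):
        u_a = rng.random()
        u_b = u_a if common else rng.random()
        x_a = -math.log(1.0 - u_a)        # X = F^{-1}(u) for a unit exponential
        x_b = -math.log(1.0 - u_b)
        diffs.append(performance(design_a, x_a) - performance(design_b, x_b))
    return diffs

def variance(values):
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / (len(values) - 1)

common = difference_samples(1.0, 0.9, 2000, random.Random(3), common=True)
separate = difference_samples(1.0, 0.9, 2000, random.Random(3), common=False)
print(variance(common), variance(separate))
```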
variables, the conditional sampling can be extended, or the joint distribution can
be sampled randomly in a stratified manner. The N-tuples remain together in the
random assignment to model runs.
Several computer packages containing routines for Monte Carlo and LHS
methods are reported in the literature. The U.S. Environmental Protection
Agency has approved some computer packages for LHS and these are available
through EPA's Exposure Models Library. Monte Carlo and LHS have been
applied to model groundwater contamination and reliability assessment of civil
engineering structures.
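For independent inputs, the core of Latin hypercube sampling can be sketched in a few lines: each dimension is split into as many equal-probability strata as there are samples, one point is drawn in every stratum, and the strata are paired at random across dimensions. The function name is an assumption, and this sketch does not represent any of the packages mentioned above.

```python
import random

def latin_hypercube(n_samples, n_dims, rng):
    """Points in [0, 1)^n_dims with exactly one point per stratum in each dimension."""
    columns = []
    for _ in range(n_dims):
        strata = list(range(n_samples))
        rng.shuffle(strata)                       # random pairing across dimensions
        columns.append([(s + rng.random()) / n_samples for s in strata])
    return [tuple(col[i] for col in columns) for i in range(n_samples)]

points = latin_hypercube(10, 3, random.Random(0))
print(points[:3])
```

Each coordinate is a uniform number that can be passed through the inverse CDF of the corresponding model input to obtain the values used in a model run.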
Figure 11-5 Stratified sampling: population space of N samples divided into 7 strata
(N1, N2, …, N7).
The theory of stratified sampling examines the issues of how to divide the
population into strata, how many strata there should be, and how the domains
of the strata should be determined. Cochran (1972) has described the technique
in greater detail.
Y = Z − α(X − μX ) (11.74)
In that case, the indirect estimator Y is more accurate than the direct estima-
tor Z. The value of α can be selected to obtain the maximum reduction in
variance.
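A sketch of Eq. 11.74 follows. The coefficient is set to cov(Z, X)/var(X), which is the standard variance-minimizing choice and is estimated here from the sample itself; the test case, Z = exp(U) with control X = U and known mean 0.5, is an assumption for illustration.

```python
import math
import random

def control_variate(z, x, mu_x):
    """Return the control-variate estimate, the adjusted samples Y, and alpha."""
    n = len(z)
    z_bar, x_bar = sum(z) / n, sum(x) / n
    cov_zx = sum((zi - z_bar) * (xi - x_bar) for zi, xi in zip(z, x)) / (n - 1)
    var_x = sum((xi - x_bar) ** 2 for xi in x) / (n - 1)
    alpha = cov_zx / var_x                                   # variance-minimizing coefficient
    y = [zi - alpha * (xi - mu_x) for zi, xi in zip(z, x)]   # Eq. 11.74
    return sum(y) / n, y, alpha

def variance(values):
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / (len(values) - 1)

rng = random.Random(8)
x = [rng.random() for _ in range(5000)]      # control variable with known mean 0.5
z = [math.exp(xi) for xi in x]               # quantity whose mean is sought
estimate, y, alpha = control_variate(z, x, mu_x=0.5)
print(estimate, math.e - 1.0, variance(y), variance(z))
```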
\[ \alpha_J = \sqrt{\frac{N - 1}{N}\sum_{i=1}^{N}\left(\alpha_{is} - \bar{\alpha}_s\right)^2} \tag{11.78} \]
Example 11.13 Twenty students were asked to measure the water level of Nar-
mada River at a gauging site. The measured values (in meters) are 290.940,
290.870, 291.010, 290.950, 291.070, 291.110, 291.090, 290.640, 290.680, 290.750,
290.890, 290.580, 290.750, 291.120, 290.630, 290.890, 290.610, 290.870, 291.060, and
291.070. Determine the accuracy of the mean of these measurements using the
jackknife method.
Solution There are 20 different values of the water stage of Narmada River. The
mean μ of these 20 measurements of river stage is 290.879 m. Following step 2 of
the jackknife method, 20 values of mean river stage were computed by ignoring
one observation at a time. These mean values μi (in meters) are 290.928, 290.985,
291.030, 291.086, 291.132, 291.183, 291.236, 291.313, 291.363, 291.412, 291.457,
291.526, 291.570, 291.603, 291.682, 291.721, 291.788, 291.827, 291.869, and 291.922.
\[ \mu_J = \sqrt{\frac{20 - 1}{20} \times 7.97} = 2.7516 \]
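The jackknife steps can be sketched generically as below, taking the square root of (N − 1)/N times the sum of squared deviations of the leave-one-out values, which is the form consistent with the arithmetic of this example. The function names are assumptions. For the sample mean, the jackknife estimate reduces algebraically to s/√N, which offers a useful check on any hand calculation.

```python
import math

def jackknife_std_error(data, statistic):
    """Jackknife standard error of a statistic computed from the data."""
    n = len(data)
    leave_one_out = [statistic(data[:i] + data[i + 1:]) for i in range(n)]
    centre = sum(leave_one_out) / n
    return math.sqrt((n - 1) / n * sum((v - centre) ** 2 for v in leave_one_out))

def mean(values):
    return sum(values) / len(values)

stages = [290.940, 290.870, 291.010, 290.950, 291.070, 291.110, 291.090,
          290.640, 290.680, 290.750, 290.890, 290.580, 290.750, 291.120,
          290.630, 290.890, 290.610, 290.870, 291.060, 291.070]

print(mean(stages), jackknife_std_error(stages, mean))
```

The same function accepts any other statistic, for example a standard deviation or a skewness coefficient, in place of `mean`.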
\[ \mathrm{var}(\alpha_s^*) = \frac{1}{M}\sum_{m=1}^{M}\left(\alpha_s^{*(m)} - \bar{\alpha}_s^*\right)^2 \tag{11.79} \]
where $\bar{\alpha}_s^*$ is the mean of $\alpha_s^*$.
The jackknife and bootstrap methods can be used to compute the variance of any
statistic (e.g., mean, standard deviation, and skewness) that is determined from
the sample data.
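A minimal bootstrap sketch based on Eq. 11.79 is given below. The number of resamples M = 2000, the seed, and the function names are assumptions; the data are the river stages of Example 11.13.

```python
import math
import random

def bootstrap_variance(data, statistic, n_resamples, rng):
    """Variance of a statistic over resamples drawn with replacement (Eq. 11.79)."""
    n = len(data)
    replicates = []
    for _ in range(n_resamples):
        resample = [data[rng.randrange(n)] for _ in range(n)]   # draw with replacement
        replicates.append(statistic(resample))
    centre = sum(replicates) / n_resamples
    return sum((v - centre) ** 2 for v in replicates) / n_resamples

def mean(values):
    return sum(values) / len(values)

stages = [290.940, 290.870, 291.010, 290.950, 291.070, 291.110, 291.090,
          290.640, 290.680, 290.750, 290.890, 290.580, 290.750, 291.120,
          290.630, 290.890, 290.610, 290.870, 291.060, 291.070]

var_mean = bootstrap_variance(stages, mean, 2000, random.Random(11))
print(math.sqrt(var_mean))
```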
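\[ s_j = \left(\frac{dY}{dx_j}\right)_{x_0} = \frac{\Delta Y}{\Delta x_j} = \frac{Y(x_{j0} + \Delta x_j) - Y(x_{j0})}{\Delta x_j} \tag{11.80} \]
\[ s_j(\%) = \left(\frac{dY / Y(x_{j0})}{dx_j / x_{j0}}\right)_{x_0} = \left(\frac{\Delta Y}{\Delta x_j}\right)\frac{x_{j0}}{Y(x_{j0})} = \left(\frac{Y(x_{j0} + \Delta x_j) - Y(x_{j0})}{\Delta x_j}\right)\frac{x_{j0}}{Y(x_{j0})} \tag{11.81} \]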
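Both coefficients can be approximated by a forward finite difference, as sketched below. The model Y = x0² · x1, the base point, and the 1% perturbation are hypothetical choices, and the relative coefficient is returned as a fraction rather than a percentage.

```python
def local_sensitivity(model, x0, j, rel_step=0.01):
    """Absolute (Eq. 11.80) and relative (Eq. 11.81) sensitivity to parameter j."""
    base = model(x0)
    dx = rel_step * x0[j]                 # requires x0[j] != 0
    perturbed = list(x0)
    perturbed[j] += dx
    s_abs = (model(perturbed) - base) / dx
    s_rel = s_abs * x0[j] / base          # requires base != 0
    return s_abs, s_rel

def model(x):
    """Hypothetical model output Y = x0^2 * x1."""
    return x[0] ** 2 * x[1]

print(local_sensitivity(model, [2.0, 3.0], j=0))
print(local_sensitivity(model, [2.0, 3.0], j=1))
```

For this model the relative sensitivity is close to 2 for the first parameter and close to 1 for the second, reflecting the exponents in the assumed relation.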
where xj,i is the jth parameter generated in the ith parameter set, and yi is the
corresponding ith model output. Now one can use regression and correlation
analysis for I outputs and I sets of J parameters to determine the relative impor-
tance of each of the J parameters and then define sensitivity and uncertainty
indicators.
The accuracy of MCS-based GSA depends on the sample size I for the num-
ber of parameters J. Using the LHS scheme, McKay (1988) suggested I to be
greater than or equal to twice J, whereas Iman and Helton (1985) as well as
Manache (2001) found that I = 4J/3 would be adequate. The importance of each
model parameter can be determined by using the coefficient of correlation (or
the Pearson product moment correlation coefficient) indicating the strength of
the linear relationship between model output and parameters. It is possible that
there is a monotonic nonlinear relationship between model output and parameters.
Then Spearman’s rank correlation coefficient can be computed. To that
end, the data generated for each parameter and model output are ranked in
either ascending order or descending order and then the rank correlation
coefficient is computed as
\[ r_{R(y),R(x_j)} = \frac{\sum_{i=1}^{I}\left[R(x_{j,i}) - \frac{I+1}{2}\right]\left[R(y_i) - \frac{I+1}{2}\right]}{I(I^2 - 1)/12} = 1 - \frac{6\sum_{i=1}^{I}\left[R(x_{j,i}) - R(y_i)\right]^2}{I(I^2 - 1)} \tag{11.82} \]
where j = 1, 2, …, J, r is the rank correlation coefficient, R(yi) is the rank of the
ith generated model output value of y, and R(xj,i) is the rank of the ith generated
value of parameter xj. If the rank correlation coefficient is higher than the Pearson
product moment correlation coefficient, then a nonlinear relationship exists between
model output and parameter values. In this case the sensitivity coefficient changes
with parameter values.
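The comparison of the two correlation coefficients can be sketched as follows. The cubic relationship and the six parameter values are hypothetical, the ranking routine assumes that there are no ties, and all function names are assumptions.

```python
import math

def ranks(values):
    """Ranks 1..n in ascending order; assumes no tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        result[i] = rank
    return result

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def spearman(x, y):
    """Second form of Eq. 11.82, based on squared rank differences."""
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    return 1.0 - 6.0 * sum((a - b) ** 2 for a, b in zip(rx, ry)) / (n * (n * n - 1))

x = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0]        # hypothetical parameter values
y = [v ** 3 for v in x]                   # monotonic but nonlinear model output
print(pearson(x, y), spearman(x, y))
```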
In practice a second-order regression equation suffices to relate the model
output to model parameters:
\[ y = a_0 + \sum_{j=1}^{J} b_j x_j + \sum_{j=1}^{J} c_j x_j^2 + \sum_{j=1}^{J-1}\sum_{m=j+1}^{J} d_{jm} x_j x_m + \varepsilon \tag{11.83} \]
where a, b, c, and d are regression coefficients; ε is the error term denoting the
deviation between model response and regression relation-produced output.
For purposes of discussion of GSA, only the linear term of Eq. 11.83 is retained:
\[ y = a_0 + \sum_{j=1}^{J} b_j x_j + \varepsilon \tag{11.84} \]
Equation 11.84 shows that the global sensitivity coefficient associated with
model parameter xj is the regression coefficient, $dy/dx_j = b_j$,
reflecting the average sensitivity of the model response to a unit change in the
model parameter.
For comparing model sensitivity to parameters having different units, it is
better to standardize model parameters and model output as
\[ x^* = \frac{x - \bar{x}}{s_x}, \qquad y^* = \frac{y - \bar{y}}{s_y} \]
\[ y^* = \sum_{j=1}^{J} b_j^* x_j^* \tag{11.85} \]
Now the contribution of each model parameter to the total model output
variability can be expressed as
11.11 Questions
11.1 Generate 20 uniformly distributed random numbers using Eq. 11.1 with
a = 4, b = 2, and d = 5. Take an appropriate value of seed R0.
11.2 Generate values of a random variable that follows an exponential distri-
bution with parameter λ = 5.0.
11.3 Generate values of a random variable that follows a symmetric triangu-
lar distribution.
11.4 Generate values of a random variable that follows an asymmetric (to the
left) triangular distribution.
11.5 Generate values of a random variable that follows an asymmetric (to the
right) triangular distribution.
11.6 Generate a random variable that follows the probability density function
11.8 Generate normally distributed random numbers with parameters (5, 2.0)
using the function-based method.
11.9 Generate values of a random variate that follows a binomial distribution
with parameters (10.0, 0.5).
11.10 Generate values of a random variate that follows a Poisson distribution
with parameter λ = 8.
11.11 Generate numbers that follow a bivariate normal distribution with
parameters (4, 1.5) and (6.0, 2.5). The coefficient of correlation (ρ)
between the numbers is 0.5.
11.12 Twenty values of the water level of Narmada River were measured at a
gauging site. The measured values (in meters) are 291.20, 291.50, 291.50,
290.05, 292.01, 291.20, 291.10, 290.58, 290.78, 290.85, 290.90, 290.35,
290.65, 291.08, 290.43, 290.78, 290.82, 290.95, 291.35, and 291.29. Deter-
mine the accuracy of the mean of these measurements using the jack-
knife method.
Chapter 12
Stochastic Processes
[Figure: discharge (cfs, × 10^4) plotted against time (year), 1940 to 2000.]
Since X(t) is a random variable at each time, X(t) will have a probability den-
sity function (PDF), denoted as fX(x, t), and a cumulative distribution function
(CDF) defined as FX ( x , t) = P(X(t) ≤ x ) . Note that both PDF and CDF are func-
tions of time and related as usual:
\[ f_X(x, t) = \frac{\partial F_X(x, t)}{\partial x} \]
The function fX(x, t) is also called the first-order density of X(t).
For two assigned times t1, t2, X(t1) and X(t2) are random variables. Their joint
distribution depends on the values of t1 and t2 and can be written as
Here FX(x1, x2; t1, t2) is also called the second-order distribution of X(t), and
fX(x1, x2; t1, t2) is called the second-order probability density function of X(t).
where μX (t) is the time-dependent mean of the process X(t). The variance of the
process is expressed as
\[ \sigma_X^2(t) = E\left\{[X(t) - \mu_X(t)]^2\right\} = \int_{-\infty}^{\infty} [x - \mu_X(t)]^2 f_X(x, t)\,dx \tag{12.4} \]
If t1 = t2, Eq. 12.5 yields the variance as a function of time. cov [X (t1), X (t2)] is the
autocovariance of the random process X(t) at times t1 and t2. Equation 12.5 can be
expressed as
where RX(t1 , t2) is the autocorrelation of X(t) at t1 and t2 and is indeed the joint
moment of random variables X(t1) and X(t2):
\[ R_X(t_1, t_2) = E[X(t_1)X(t_2)] = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} x_1 x_2\, f_X(x_1, x_2; t_1, t_2)\,dx_1\,dx_2 \tag{12.7} \]
\[ \sigma_X^2(t) = E\left\{[X(t) - \mu_X(t)]^2\right\} = \int_{-\infty}^{\infty} [x - \mu_X(t)]^2 f_X(x, t)\,dx \]
where μa is the mean of a. Applying Eq. 12.7 for the autocorrelation function
gives
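\[ R(t_1, t_2) = E[(a t_1)(a t_2)] = E[a^2 t_1 t_2] = t_1 t_2 E[a^2] \]
\[ \rho_X(t_1, t_2) = \frac{\operatorname{cov}(t_1, t_2)}{\sigma_X(t_1)\,\sigma_X(t_2)} = \frac{t_1 t_2 \sigma_a^2}{\sqrt{t_1^2 \sigma_a^2}\,\sqrt{t_2^2 \sigma_a^2}} = \frac{t_1 t_2 \sigma_a^2}{t_1 t_2 \sigma_a^2} = 1 \]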
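\[ E(e^{-kt}) = \int_{-\infty}^{\infty} e^{-kt} f_K(k)\,dk = \frac{1}{\sqrt{2\pi}\,\sigma_K}\int_{-\infty}^{\infty} e^{-kt}\exp\left[-\frac{(k - \mu_K)^2}{2\sigma_K^2}\right]dk = \exp\left(\frac{\sigma_K^2 t^2}{2} - \mu_K t\right) \]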
where f K (k ) is the PDF of K. The term within the integral sign can be evaluated
as follows: First, consider
\[ \begin{aligned}
\exp\left[-\frac{k^2 + \mu_K^2 - 2k\mu_K + 2\sigma_K^2 k t}{2\sigma_K^2}\right]
&= \exp\left\{-\frac{[k - (\mu_K - \sigma_K^2 t)]^2 + 2\mu_K\sigma_K^2 t - \sigma_K^4 t^2}{2\sigma_K^2}\right\} \\
&= \exp\left\{-\frac{[k - (\mu_K - \sigma_K^2 t)]^2}{2\sigma_K^2}\right\}\exp\left(\frac{\sigma_K^2 t^2}{2} - \mu_K t\right)
\end{aligned} \]
Therefore, the integral term becomes
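\[ \begin{aligned}
&\int_{-\infty}^{\infty} \exp\left\{-\frac{[k - (\mu_K - \sigma_K^2 t)]^2}{2\sigma_K^2}\right\}\exp\left(\frac{\sigma_K^2 t^2}{2} - \mu_K t\right)dk \\
&\qquad = \exp\left(\frac{\sigma_K^2 t^2}{2} - \mu_K t\right)\int_{-\infty}^{\infty} \exp\left\{-\frac{[k - (\mu_K - \sigma_K^2 t)]^2}{2\sigma_K^2}\right\}dk
\end{aligned} \]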
Hence,
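\[ E(e^{-kt}) = \exp\left(\frac{\sigma_K^2 t^2}{2} - \mu_K t\right)\frac{1}{\sqrt{2\pi}\,\sigma_K}\int_{-\infty}^{\infty} \exp\left\{-\frac{[k - (\mu_K - \sigma_K^2 t)]^2}{2\sigma_K^2}\right\}dk \]
The remaining integrand, together with the factor in front of the integral, is a normal density with mean $\mu_K - \sigma_K^2 t$ and standard deviation $\sigma_K$, so it integrates to one.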
\[ E[Q(t)] = E(Q_0)\exp\left(\frac{\sigma_K^2 t^2}{2} - \mu_K t\right) \quad\text{or}\quad \mu_Q(t) = \mu_{Q_0}\exp\left(\frac{\sigma_K^2 t^2}{2} - \mu_K t\right) \]
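The closed form for E(e^{-kt}) with a normally distributed K can be checked numerically. The sketch below compares a trapezoidal integration with the closed-form expression; the function names and the parameter values used in the check are illustrative assumptions, not values from the text.

```python
import math

def expected_decay_numeric(mu_k, sigma_k, t, n_steps=20000, width=10.0):
    """Trapezoidal estimate of E[exp(-K t)] for K ~ N(mu_k, sigma_k^2)."""
    # The integrand peaks near mu_k - sigma_k^2 * t, so centre the grid there
    centre = mu_k - sigma_k ** 2 * t
    lo = centre - width * sigma_k
    hi = centre + width * sigma_k
    h = (hi - lo) / n_steps
    norm = 1.0 / (math.sqrt(2.0 * math.pi) * sigma_k)
    total = 0.0
    for i in range(n_steps + 1):
        k = lo + i * h
        # exp(-k t) times the normal density, with the exponents combined
        value = norm * math.exp(-k * t - (k - mu_k) ** 2 / (2.0 * sigma_k ** 2))
        total += value * (0.5 if i in (0, n_steps) else 1.0)
    return total * h

def expected_decay_closed(mu_k, sigma_k, t):
    """Closed form exp(sigma_k^2 t^2 / 2 - mu_k t) derived in the text."""
    return math.exp(sigma_k ** 2 * t ** 2 / 2.0 - mu_k * t)

# Illustrative (assumed) parameter values
numeric = expected_decay_numeric(0.5, 0.1, 3.0)
closed = expected_decay_closed(0.5, 0.1, 3.0)
print(numeric, closed)
```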
\[ \begin{aligned}
R(t_1, t_2) &= E\left[(Q_0 e^{-kt_1})(Q_0 e^{-kt_2})\right] = E(Q_0^2)\,E(e^{-kt_1 - kt_2}) \\
&= E(Q_0^2)\exp\left[\frac{\sigma_K^2 (t_1^2 + t_2^2)}{2} - \mu_K (t_1 + t_2)\right]
\end{aligned} \]
Equation 12.6 gives the autocovariance function:
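\[ \begin{aligned}
\operatorname{cov}(t_1, t_2) &= E[Q_0^2]\exp\left[\frac{\sigma_K^2 (t_1^2 + t_2^2)}{2} - \mu_K (t_1 + t_2)\right] - [E(Q_0)]^2\exp\left[\frac{\sigma_K^2 (t_1^2 + t_2^2)}{2} - \mu_K (t_1 + t_2)\right] \\
&= \left\{E(Q_0^2) - [E(Q_0)]^2\right\}\exp\left[\frac{\sigma_K^2 (t_1^2 + t_2^2)}{2} - \mu_K (t_1 + t_2)\right]
\end{aligned} \]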
Referring to Eq. 12.9 for the variance (t1 = t2 = t), one obtains
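\[ \begin{aligned}
\sigma_Q^2 &= E[Q_0^2]\exp\left[\sigma_K^2 t^2 - 2\mu_K t\right] - [E(Q_0)]^2\exp\left[\sigma_K^2 t^2 - 2\mu_K t\right] \\
&= \exp\left[\sigma_K^2 t^2 - 2\mu_K t\right]\left\{E(Q_0^2) - [E(Q_0)]^2\right\} \\
&= \sigma_{Q_0}^2\exp\left[\sigma_K^2 t^2 - 2\mu_K t\right] = \sigma_{Q_0}^2\left[\frac{E(Q)}{E(Q_0)}\right]^2
\end{aligned} \]
where $\sigma_{Q_0}^2$ is the variance of $Q_0$.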
Application of Eq. 12.10 yields the correlation coefficient:
σ 2K (t12 + t22 ) σ 2 (t 2 + t 2 )
E[Q02 ]exp[ − μK (t1 + t2 )] − [E(Q0 )]2 exp[ K 1 2 − μK (t1 + t2 )]
ρQ (t1 , t2 ) = 2 2
σ Q0 exp ⎣ 2t1 σ K − 2 μK t1 ⎦ exp ⎣ 2t2 σ K − 2 μK t2 ⎤⎦
2 ⎡ 2 2 ⎤ ⎡ 2 2
where μH0 and μB are the mean of H0 and the mean of B, respectively.
Using Eq. 12.7 for the autocorrelation function yields
\[ \begin{aligned}
R(x_1, x_2) &= E[(H_0 + Bx_1)(H_0 + Bx_2)] \\
&= E(H_0^2) + x_1 E(H_0)E(B) + x_2 E(H_0)E(B) + x_1 x_2 E(B^2) \\
&= E(H_0^2) + (x_1 + x_2)E(H_0)E(B) + x_1 x_2 E(B^2)
\end{aligned} \]
\[ \begin{aligned}
\operatorname{cov}(x_1, x_2) &= R(x_1, x_2) - \mu_H(x_1)\mu_H(x_2) \\
&= E(H_0^2) + (x_1 + x_2)E(H_0)E(B) + x_1 x_2 E(B^2) - (\mu_{H_0} + x_1\mu_B)(\mu_{H_0} + x_2\mu_B) \\
&= E(H_0^2) + (x_1 + x_2)\mu_{H_0}\mu_B + x_1 x_2 E(B^2) - \mu_{H_0}^2 - x_2\mu_{H_0}\mu_B - x_1\mu_{H_0}\mu_B - x_1 x_2\mu_B^2 \\
&= \sigma_{H_0}^2 + x_1 x_2 E(B^2) - x_1 x_2\mu_B^2
\end{aligned} \]
\[ \sigma_H^2(x) = R(x, x) - \mu_H(x)\mu_H(x) = \sigma_{H_0}^2 + x^2 E(B^2) - x^2\mu_B^2 = \sigma_{H_0}^2 + x^2\sigma_B^2 \]
\[ \rho_H(x_1, x_2) = \frac{\sigma_{H_0}^2 + x_1 x_2 E(B^2) - x_1 x_2\mu_B^2}{\sqrt{\sigma_{H_0}^2 + x_1^2\sigma_B^2}\,\sqrt{\sigma_{H_0}^2 + x_2^2\sigma_B^2}} \]
Example 12.4 Let the potential rate of infiltration be described by the Horton
model: f (t) = f0 + ( f0 − fc )exp(− Kt) , where f0 is the initial infiltration rate, fc is
the final (or steady) infiltration rate, and K is a parameter. The Horton model can
be simply written as f (t) = f0 + B exp(− Kt) , where B = ( f0 − fc ) . Let f (t) be a
stochastic process with f0 , B, and K as independent random variables. K is
normally distributed. Determine the mean, variance, autocorrelation function,
covariance function, and autocorrelation coefficient.
\[ \begin{aligned}
R(t_1, t_2) &= E(f_0^2) + E(f_0)E(B)E[\exp(-Kt_2)] + E(f_0)E(B)E[\exp(-Kt_1)] + E(B^2)E[\exp(-Kt_1)]E[\exp(-Kt_2)] \\
&= E(f_0^2) + \mu_{f_0}\mu_B\exp\left(\frac{\sigma_K^2 t_2^2}{2} - \mu_K t_2\right) + \mu_{f_0}\mu_B\exp\left(\frac{\sigma_K^2 t_1^2}{2} - \mu_K t_1\right) \\
&\quad + E(B^2)\exp\left[\frac{\sigma_K^2}{2}(t_1^2 + t_2^2) - \mu_K(t_1 + t_2)\right]
\end{aligned} \]
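\[ \begin{aligned}
\operatorname{cov}(t_1, t_2) &= R(t_1, t_2) - \left[\mu_{f_0} + \mu_B\exp\left(\frac{\sigma_K^2 t_1^2}{2} - \mu_K t_1\right)\right]\left[\mu_{f_0} + \mu_B\exp\left(\frac{\sigma_K^2 t_2^2}{2} - \mu_K t_2\right)\right] \\
&= R(t_1, t_2) - (\mu_{f_0})^2 - \mu_{f_0}\mu_B\exp\left(\frac{\sigma_K^2 t_2^2}{2} - \mu_K t_2\right) - \mu_{f_0}\mu_B\exp\left(\frac{\sigma_K^2 t_1^2}{2} - \mu_K t_1\right) \\
&\quad - \mu_B^2\exp\left[\frac{\sigma_K^2}{2}(t_1^2 + t_2^2) - \mu_K(t_1 + t_2)\right]
\end{aligned} \]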
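\[ \begin{aligned}
\operatorname{cov}(t_1, t_2) &= E(f_0^2) + \mu_{f_0}\mu_B\exp\left(\frac{\sigma_K^2 t_2^2}{2} - \mu_K t_2\right) + \mu_{f_0}\mu_B\exp\left(\frac{\sigma_K^2 t_1^2}{2} - \mu_K t_1\right) \\
&\quad + E(B^2)\exp\left[\frac{\sigma_K^2}{2}(t_1^2 + t_2^2) - \mu_K(t_1 + t_2)\right] - (\mu_{f_0})^2 - \mu_{f_0}\mu_B\exp\left(\frac{\sigma_K^2 t_2^2}{2} - \mu_K t_2\right) \\
&\quad - \mu_{f_0}\mu_B\exp\left(\frac{\sigma_K^2 t_1^2}{2} - \mu_K t_1\right) - (\mu_B)^2\exp\left[\frac{\sigma_K^2}{2}(t_1^2 + t_2^2) - \mu_K(t_1 + t_2)\right] \\
&= \sigma_{f_0}^2 + \sigma_B^2\exp\left[\frac{\sigma_K^2}{2}(t_1^2 + t_2^2) - \mu_K(t_1 + t_2)\right]
\end{aligned} \]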
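\[ \rho_f(t_1, t_2) = \frac{\sigma_{f_0}^2 + \sigma_B^2\exp\left[\dfrac{\sigma_K^2}{2}(t_1^2 + t_2^2) - \mu_K(t_1 + t_2)\right]}{\sqrt{\sigma_{f_0}^2 + \sigma_B^2\exp\left(\sigma_K^2 t_1^2 - 2\mu_K t_1\right)}\,\sqrt{\sigma_{f_0}^2 + \sigma_B^2\exp\left(\sigma_K^2 t_2^2 - 2\mu_K t_2\right)}} \]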
12.3 Stationarity
A stochastic process is considered stationary if its probabilistic descriptions (e.g.,
statistics) are independent of a shift in time. This means that joint distributions
would be invariant with a shift of the time origin. Two processes X(t) and X(t + τ)
have the same statistics for any τ, that is, fX(x,t) = fX(x) is independent of t;
μX(t) = μX is constant; fX(x1, x2; t1, t2) = fX(x1, x2; τ), τ = t1 – t2, depends on the
time difference τ; RX(t1, t2) = R(τ) depends on the time difference;
cov[x(t1), x(t2)] = cov(x; τ) depends on the time difference; and ρX(t1, t2) = ρX(τ)
also depends on the time difference. If t1= t2, τ = 0, then R(t, t) gives the variance
of the process.
The stationarity property may also be extended to n-dimensional vectors.
For example, when the joint distribution of n-dimensional random vectors
{X1 (t), X2 (t),..., Xn (t)} and {X1 (t + τ), X2 (t + τ),..., Xn (t + τ)} have the same statis-
tical characteristics (e.g., mean, variance, etc.) for all τ, the stochastic process X(t)
is stationary. If a stochastic process does not satisfy this condition, the process is
called an evolutionary stochastic or nonstationary process.
The concept of stationarity alludes to a similar structure of variability at dif-
ferent times (i.e., some kind of repetition is implied in the process). This is an
12.4 Correlogram
The autocorrelation coefficient (Eq. 12.10) can be expressed in terms of τ, the lag
or separation time, as
\[ \rho_X(t, t + \tau) = \frac{\operatorname{cov}(t, t + \tau)}{\sigma_X^2(t)} \tag{12.13} \]
[Figures: two example correlograms, ρ(τ) versus lag τ.]
\[ \rho_X(t, t + k\Delta t) = r_X(k) = \frac{\operatorname{cov}(k)}{S_X^2}, \qquad k = 0, 1, 2, \ldots \tag{12.14} \]
where rX (k ) is the lag k autocorrelation coefficient, SX2 is the sample variance of
X, and cov(k) is the lag k autocovariance expressed as
\[ \operatorname{cov}(k) = \frac{1}{N - k}\sum_{i=1}^{N-k}(x_i - \bar{x})(x_{i+k} - \bar{x}) \tag{12.15} \]
where N is the number of observations in the series. For k = 0, ρX(0) = 1. For non-
zero k, there are N – k pairs of data in the summation term in Eq. 12.15. For exam-
ple, for k = 1, there are N – 1 pairs of data points, each separated by k = 1 unit of
time. For k = 2, there are N – 2 pairs of data, each separated by k = 2 units of time.
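The sample correlogram of Eqs. 12.14 and 12.15 can be sketched in a few lines. This is an illustrative implementation, not the authors' code; it assumes that the sample variance in Eq. 12.14 is taken as cov(0), so that the lag-zero coefficient equals one as stated in the text.

```python
def autocovariance(x, k):
    """Lag-k autocovariance (Eq. 12.15): average over the N - k available pairs."""
    n = len(x)
    mean = sum(x) / n
    pairs = sum((x[i] - mean) * (x[i + k] - mean) for i in range(n - k))
    return pairs / (n - k)

def autocorrelation(x, k):
    """Lag-k autocorrelation coefficient r_X(k) (Eq. 12.14), normalized by cov(0)."""
    return autocovariance(x, k) / autocovariance(x, 0)

# A strictly alternating series is perfectly anticorrelated at lag 1
series = [1.0, -1.0] * 10
correlogram = [autocorrelation(series, k) for k in range(5)]
print(correlogram)
```

In practice only the first lags are reliable, because the number of pairs N - k shrinks as k grows.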
Equations 12.14 and 12.15 indicate that the second-order statistics of a pro-
cess may be obtained from a sample curve (i.e., time-averaged statistics), in place
of an ensemble of many samples (i.e., ensemble statistics). If the time-average
statistics of a process are the same as the ensemble statistics, the process is called
ergodic. In real life, only a single realization of the stochastic process is available.
Therefore, ergodicity or stationarity is assumed after simplification. Under this
assumption, time-average mean, time-average variance, time-average correla-
tion coefficient, etc. are obtained from a single realization of the stochastic pro-
cess. Although these statistics differ from their ensemble values, this assumption
is necessary for practical expediency. Statistical tests, such as the χ2 test, are used
to check stationarity and ergodicity of a series.
The first part in Eq. 12.17 yields that X will have zero mean, and the second
part shows that increments of Z at two different frequencies are uncorrelated
(i.e., the Z process will have orthogonal increments). If ω 1 = ω 2 = ω , then
where (ω ) is the integrated spectrum, and S(ω ) is the spectral density function
or simply the spectrum.
If Eq. 12.16 is viewed as a Fourier transform then dZ can be considered as
the random amplitude. Equation 12.18 shows that the spectrum is proportional
to the square of the random amplitude per frequency increment. This means that
the spectrum must be non-negative at all frequencies.
For a stationary stochastic process, the covariance function can be expressed
as the inverse of the Fourier transform of the spectrum:
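\[ \operatorname{cov}(\tau) \ \text{or}\ R(\tau) = \int_{-\infty}^{\infty} e^{i\omega\tau} S(\omega)\,d\omega \tag{12.19a} \]
or
\[ R(\tau) = \int_{-\infty}^{\infty} S(\omega)\cos(\omega\tau)\,d\omega \tag{12.19b} \]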
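\[ S(\omega) = \frac{1}{2\pi}\int_{-\infty}^{\infty} e^{-i\omega\tau} R(\tau)\,d\tau \tag{12.20a} \]
or
\[ S(\omega) = \frac{1}{2\pi}\int_{-\infty}^{\infty} R(\tau)\cos(\omega\tau)\,d\tau \tag{12.20b} \]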
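\[ \operatorname{cov}(\tau) = \int_{-\infty}^{\infty} S(\omega)e^{i\omega\tau}\,d\omega = \int_{-\infty}^{\infty} S(\omega)\cos(\omega\tau)\,d\omega \tag{12.21} \]
and
\[ S(\omega) = \frac{1}{2\pi}\int_{-\infty}^{\infty}\operatorname{cov}(\tau)e^{-i\omega\tau}\,d\tau = \frac{1}{2\pi}\int_{-\infty}^{\infty}\operatorname{cov}(\tau)\cos(\omega\tau)\,d\tau \tag{12.22} \]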
If the process is covariance stationary, one may normalize the spectral density
function by dividing it by the variance $\sigma_X^2$. This means that the covariance
function can be replaced in Eq. 12.22 by the autocorrelation coefficient:
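\[ S^*(\omega) = \frac{1}{2\pi}\int_{-\infty}^{\infty}\rho_X(\tau)e^{-i\omega\tau}\,d\tau = \frac{1}{2\pi}\int_{-\infty}^{\infty}\rho_X(\tau)\cos(\omega\tau)\,d\tau \tag{12.23} \]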
where S∗ ( ω) is the normalized spectral density function. Likewise, Eq. 12.21 can
be recast as
\[ \rho_X(\tau) = \int_{-\infty}^{\infty} S^*(\omega)e^{i\omega\tau}\,d\omega = \int_{-\infty}^{\infty} S^*(\omega)\cos(\omega\tau)\,d\omega \tag{12.24} \]
Equation 12.24 indicates that area under the normalized spectral density
function is unity since ρ(0) = cos(0) = 1.
If τ = 0, cov(0) = σ 2 = variance, then Eq. 12.19 reduces to
\[ R(0) = \sigma^2 = \int_{-\infty}^{\infty} S(\omega)\,d\omega \tag{12.25} \]
\[ \omega = \frac{2\pi}{T} = 2\pi f \tag{12.26} \]
where T is period in units of time and f is frequency in cycles per unit of time.
Therefore,
S( ω) = 2 π S( f ) (12.27)
Example 12.5 Consider the exponential covariance function given by Eq. 12.11.
Express the relation between the covariance and spectrum.
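\[ \begin{aligned}
S(\omega) &= \frac{\sigma^2}{2\pi}\int_{-\infty}^{\infty}\exp[-i\omega\tau]\exp[-|\tau|/\lambda]\,d\tau \\
&= \frac{\sigma^2}{2\pi}\left[\int_0^{\infty} e^{-i\omega\tau - \tau/\lambda}\,d\tau + \int_{-\infty}^{0} e^{-i\omega\tau + \tau/\lambda}\,d\tau\right] = \frac{\sigma^2\lambda}{\pi(1 + \lambda^2\omega^2)}
\end{aligned} \tag{12.28} \]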
\[ f_{\max} = \frac{1}{2\Delta t} \tag{12.29} \]
where fmax is the maximum frequency, also called the Nyquist frequency.
For a sample of observations the spectral density function can be computed
from the sample autocorrelation coefficient and integration of Eq. 12.20:
\[ S(f) = \left[\rho_X(0) + 2\sum_{k=1}^{M-1}\rho_X(k)\cos(2\pi k f\Delta t) + \rho_X(M)\cos(2\pi M f\Delta t)\right]\Delta t \tag{12.30} \]
[Figure: spectral density S(ω) versus ω for σ = 3 and λ = 1.]
where M is the maximum lag, which should be a small portion of N data points,
for example, M ≤ 0.25N. Equation 12.30 should be used for frequencies:
\[ f = \frac{k f_{\max}}{M} \tag{12.31} \]
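Equations 12.30 and 12.31 translate directly into a short routine. The sketch below is illustrative: the function names are hypothetical, the autocorrelation coefficients are supplied as a list `rho[0..M]`, and the Nyquist frequency is assumed to be 1/(2Δt).

```python
import math

def spectral_density_estimate(rho, f, dt):
    """Evaluate Eq. 12.30 at frequency f from autocorrelations rho[0..M]."""
    m = len(rho) - 1
    total = rho[0]
    # Interior lags k = 1, ..., M - 1 enter with weight 2
    total += 2.0 * sum(rho[k] * math.cos(2.0 * math.pi * k * f * dt)
                       for k in range(1, m))
    # The maximum lag M enters with weight 1
    total += rho[m] * math.cos(2.0 * math.pi * m * f * dt)
    return total * dt

def estimation_frequencies(m, dt):
    """Frequencies f = k * f_max / M, k = 0..M (Eq. 12.31), with f_max = 1/(2 dt)."""
    f_max = 1.0 / (2.0 * dt)
    return [k * f_max / m for k in range(m + 1)]

# Uncorrelated series: rho(0) = 1 and zero elsewhere gives a flat spectrum
rho_white = [1.0, 0.0, 0.0, 0.0]
spectrum = [spectral_density_estimate(rho_white, f, dt=1.0)
            for f in estimation_frequencies(3, dt=1.0)]
print(spectrum)
```

A flat spectrum for an uncorrelated series is a useful sanity check before applying the routine to observed data.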
Now consider two zero-mean random processes X(t) and Y(t). The covari-
ance function of X and Y can be expressed as
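\[ R_{XY}(\tau) = \int_{-\infty}^{\infty}\exp(i\omega\tau)\,S_{XY}(\omega)\,d\omega \tag{12.33} \]
or
\[ S_{XY}(\omega) = \frac{1}{2\pi}\int_{-\infty}^{\infty} e^{-i\omega\tau} R_{XY}(\tau)\,d\tau \tag{12.34} \]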
[Figure: counting process N(t) versus time, with event times t1, t2, t3, t4.]
Example 12.6 Consider Examples 4.18 and 4.19. Suppose that the occurrences of
drought events may be considered as a Poisson process with rate υ = 1.79, and
the interarrival time has the exponential distribution with parameter λ = 0.124.
Then, find P(N(s) = 5). (Hint: N(s) = 5 means by time s, a total of five droughts
have occurred but the sixth has not, which can be seen from Fig. 12-7.)
Solution To solve this problem, we need to know that N(s) = 5 if T4 ≤ s ≤ T5.
Then, let T4 = t and T5 > s. We have the time interval t6 = T5 – T4 = T5 – t4 > s – t,
which is independent of T4 according to the properties of the Poisson process.
With this information in hand, we can solve this problem as
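\[ P(N(s) = 5) = \int_0^s P(T_4 = t)\,P(T_5 > s \mid T_4 = t)\,dt = \int_0^s P(T_4 = t)\,P(t_5 > s - t)\,dt \tag{12.40} \]
\[ \begin{aligned}
P(N(s) = 5) &= \int_0^s \lambda e^{-\lambda t}\frac{(\lambda t)^{5-1}}{(5-1)!}\,e^{-\lambda(s-t)}\,dt = \frac{\lambda^5}{(5-1)!}\,e^{-\lambda s}\int_0^s t^{5-1}\,dt \\
&= e^{-\lambda s}\frac{(\lambda s)^5}{5!} = e^{-0.124 s}\frac{(0.124 s)^5}{5!}
\end{aligned} \]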
Example 12.7 Consider rainy (1) or dry (0) weather conditions in any given day
in summer. Suppose the probability of a rainy day is 0.3. Determine the probability
of 4 rainy days occurring in 20 days.
Solution In this problem, the weather condition denoted as X has only two pos-
sible outcomes: rain (1) or no rain (0). Also suppose that in a given day whether
it rains or not does not depend on any previous weather condition. Thus the
weather condition X can be considered as a Bernoulli process. Then each xi is
Bernoulli distributed with parameter P = 0.3.
The probability of 4 rainy days occurring in 20 days can be computed as fol-
lows: This can be expressed as the probability of 4 successes (rainy days) in a
total of 20 trials (days). If we treat weather conditions (X) as independent identi-
cally distributed Bernoulli trials, then the probability of 4 rainy days in a total of
20 days is binomial distributed as B(20, 0.3), so
\[ P(\text{4 rainy days in 20 days}) = \binom{20}{4}\, 0.3^4\,(1 - 0.3)^{16} = 0.13 \]
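The binomial probability can be reproduced with the standard library. This is a small check, using p = 0.3 as in the solution; the function name is hypothetical.

```python
import math

def binomial_probability(successes, trials, p):
    """Probability of exactly `successes` successes in `trials` Bernoulli trials."""
    return math.comb(trials, successes) * p ** successes * (1.0 - p) ** (trials - successes)

p_four_rainy = binomial_probability(4, 20, 0.3)
print(f"P(4 rainy days in 20 days) = {p_four_rainy:.2f}")
```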
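\[ f_X(\mathbf{x}) = \frac{1}{(2\pi)^{N/2}\,|\Sigma_X|^{1/2}}\exp\left[-\frac{1}{2}(\mathbf{x} - \boldsymbol{\mu}_X)^T\,\Sigma_X^{-1}\,(\mathbf{x} - \boldsymbol{\mu}_X)\right], \qquad \mathbf{x} \in R^N \tag{12.41} \]
where
\[ \boldsymbol{\mu}_X = \begin{pmatrix} \mu_X(t_1) \\ \mu_X(t_2) \\ \vdots \\ \mu_X(t_N) \end{pmatrix}, \qquad
\Sigma_X = \begin{pmatrix} C_X(t_1, t_1) & C_X(t_1, t_2) & \cdots & C_X(t_1, t_N) \\ \vdots & \vdots & & \vdots \\ C_X(t_N, t_1) & C_X(t_N, t_2) & \cdots & C_X(t_N, t_N) \end{pmatrix} \]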
Example 12.8 Suppose the sequence X = {x1, x2, …, xn} with each xi ~ N ( μxi , σ 2xi ) ,
and let x1, …, xn be independent. Prove that Z = X1 + X2 +…+ Xn is also a Gaussian
process.
Solution The moment-generating function of the normal distribution is
expressed as
\[ \mathrm{mgf}_X(t) = \exp\left(\mu t + \frac{\sigma^2 t^2}{2}\right) \]
Then
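\[ \begin{aligned}
\mathrm{mgf}_Z(t) &= E(e^{tZ}) = E[e^{t(X_1 + \cdots + X_n)}] = E[e^{tX_1}]\,E[e^{tX_2}]\cdots E[e^{tX_n}] \\
&= \exp\left(\mu_{x_1}t + \frac{\sigma_{x_1}^2 t^2}{2}\right)\exp\left(\mu_{x_2}t + \frac{\sigma_{x_2}^2 t^2}{2}\right)\cdots\exp\left(\mu_{x_n}t + \frac{\sigma_{x_n}^2 t^2}{2}\right) \\
&= \exp\left[t\left(\mu_{x_1} + \mu_{x_2} + \cdots + \mu_{x_n}\right) + \frac{\left(\sigma_{x_1}^2 + \sigma_{x_2}^2 + \cdots + \sigma_{x_n}^2\right)t^2}{2}\right]
\end{aligned} \]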
\[ P\left\{x_{t_n}(w) \le \lambda \mid x_{t_1}, \ldots, x_{t_{n-1}}\right\} = P\left\{x_{t_n}(w) \le \lambda \mid x_{t_{n-1}}\right\} \tag{12.42} \]
Example 12.9 Consider the weather type in Baton Rouge in summer. Let the
probability(rain) = 0.2 and probability(no rain) = 0.8. Then if we know today is a
rainy day, is tomorrow a dry day, given only today’s weather type? Is this process
a Markov process? If it is not, can it be made a Markov process?
It is clear that it is not guaranteed that tomorrow’s weather type only
depends on today’s weather type; the Markov property does not hold in this case,
so it is not a Markov process. But one can find a way to make this non-Markov
process a Markov process. If we have the weather type information of yesterday,
then the combination of weather type of yesterday and today can be considered
as a state. The probability of tomorrow being a rain day is given in Table E12-9.
Solution
Thus the state can be defined as
1: (R, R), 2: (D, R), 3: (R, D), 4: (D, D)
Now it is a Markov process with the transition matrix T given as
\[ T = \begin{pmatrix} 0.6 & 0 & 0.4 & 0 \\ 0.5 & 0 & 0.5 & 0 \\ 0 & 0.3 & 0 & 0.7 \\ 0 & 0.2 & 0 & 0.8 \end{pmatrix} \]
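The transition matrix can be checked and used for multi-step predictions with a few lines of code. The sketch below assumes that each state is read as (yesterday, today), which is how the nonzero entries of T line up with the state list; the helper names are hypothetical.

```python
# Transition matrix of Example 12.9; state order: (R,R), (D,R), (R,D), (D,D)
T = [
    [0.6, 0.0, 0.4, 0.0],
    [0.5, 0.0, 0.5, 0.0],
    [0.0, 0.3, 0.0, 0.7],
    [0.0, 0.2, 0.0, 0.8],
]

def mat_mul(a, b):
    """Product of two square matrices stored as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def is_stochastic(m, tol=1e-12):
    """True if every row is a probability distribution."""
    return all(min(row) >= 0.0 and abs(sum(row) - 1.0) < tol for row in m)

# Two-step transition probabilities
T2 = mat_mul(T, T)

# Starting from (R,R): probability that the day after tomorrow is rainy,
# i.e., that the chain is then in a state whose second entry is R (states 1 and 2)
p_rain_in_two_days = T2[0][0] + T2[0][1]
print(is_stochastic(T), p_rain_in_two_days)
```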
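\[ X_t \mid X_s \sim N\left[x, \sigma^2(t - s)\right] \]
\[ \begin{aligned}
F_{X_t \mid X_s = x}(y) &= P[X_t \le y \mid X_s = x] \\
&= P[X_t - X_s \le y - x \mid X_s = x] \\
&= P[X_t - X_s \le y - x] \quad \text{(the increment is independent of the former state } X_s\text{)} \\
&= \Phi\left(\frac{y - x}{\sigma\sqrt{t - s}}\right) \quad \text{(property 3)}
\end{aligned} \]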
Thus
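\[ \begin{aligned}
f_{(X_s, X_t)}(x, y) &= f_{X_s}(x)\,f_{X_t \mid X_s}(y \mid x) \\
&= \frac{1}{2\pi\sigma^2\sqrt{s(t - s)}}\exp\left[-\frac{x^2}{2\sigma^2 s} - \frac{(y - x)^2}{2\sigma^2(t - s)}\right]
\end{aligned} \]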
12.7.1 Trend
12.7.1.1 Concept of Time Trend
When a time series is plotted, the series values may, on average, increase or
decrease. This increasing or decreasing tendency defines a trend. The trend may
be the result of low-frequency oscillation, depending on the time scale of obser-
vation. If the time scale of observation is large, trends may be identified as
seasonal or periodic components.
Consider a stochastic process
X(t) = Y(t) + Z(t) (12.44)
where Y(t) is the deterministic trend and Z(t) is the random component. The
deterministic trend can be observed from the solution of the governing deter-
ministic differential equation of the system. The deterministic trend normally
accounts for the largest portion of the total magnitude of a stochastic process.
This is a result of inherent determinism in the process.
The component Z(t) accounts for the randomness inherent in the process,
errors in the model hypothesis and its parameter estimation, and errors in the
data. Thus, Z(t) is modeled statistically. The magnitude of its variance $\sigma_Z^2(t)$ is a
measure of its relative importance to X(t). The larger the value of $\sigma_Z^2(t)$, the
more important the uncertainty in X(t). If Y(t) is determined and represented by
a function, say a polynomial, then the fitted values are subtracted from X(t), and
then the random component Z(t) is analyzed by using the serial correlogram for
identification and removal of periodic components, if any.
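\[ Y(j) = \sum_{i=0}^{K} a_i (j\Delta t)^i, \qquad j = 1, 2, 3, \ldots, n \tag{12.45} \]
\[ A = \sum_{i=1}^{n}[x(i) - y(i)]^2 = \sum_{i=1}^{n}\left[x(i) - \sum_{j=0}^{K} a_j (i\Delta t)^j\right]^2 \tag{12.46} \]
\[ \sum_{j=0}^{K} a_j\sum_{i=1}^{n}(i\Delta t)^{j+m} = \sum_{i=1}^{n} x(i)(i\Delta t)^m, \qquad m = 0, 1, 2, \ldots, K \tag{12.47} \]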
Example 12.11 Consider the monthly discharge in July from the Colorado River
near the Grand Canyon, Arizona. Determine the deterministic component (time
trend) Y(t) and random component Z(t) of the data.
Solution The monthly discharge data at the Colorado River near the Grand
Canyon, Arizona, are given in Table E12-11.
Let K = 1. Then, for this problem, Eq. 12.45 to Eq. 12.47 can be written as
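\[ Q = \sum_{i=1}^{N}\left[x(i) - (a_0 + a_1 t(i))\right]^2 \tag{12.49} \]
\[ \begin{cases}
\dfrac{\partial Q}{\partial a_0} = a_0 N + a_1\displaystyle\sum_{i=1}^{N} t(i) - \sum_{i=1}^{N} x(i) = 0 \\[2ex]
\dfrac{\partial Q}{\partial a_1} = a_0\displaystyle\sum_{i=1}^{N} t(i) + a_1\sum_{i=1}^{N}[t(i)]^2 - \sum_{i=1}^{N}[x(i)t(i)] = 0
\end{cases} \tag{12.50} \]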
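The two normal equations in Eq. 12.50 can be solved in closed form for the intercept and slope of the linear trend. The sketch below is illustrative and not the authors' code: the function name is hypothetical, and only the first six years of Table E12-11 are used in the demonstration, so the printed coefficients are not those of the full record.

```python
def fit_linear_trend(t, x):
    """Solve the normal equations (Eq. 12.50) for Y(t) = a0 + a1 * t."""
    n = len(t)
    sum_t = sum(t)
    sum_x = sum(x)
    sum_tt = sum(ti * ti for ti in t)
    sum_tx = sum(ti * xi for ti, xi in zip(t, x))
    denom = n * sum_tt - sum_t ** 2
    a1 = (n * sum_tx - sum_x * sum_t) / denom   # slope
    a0 = (sum_x - a1 * sum_t) / n               # intercept, from the first equation
    return a0, a1

# First six rows of Table E12-11 (July discharge, cfs), for illustration only
years = [1923, 1924, 1925, 1926, 1927, 1928]
discharge = [37840, 17060, 24190, 23230, 41100, 25260]

a0, a1 = fit_linear_trend(years, discharge)
trend = [a0 + a1 * yr for yr in years]                   # deterministic component Y(t)
residual = [q - y for q, y in zip(discharge, trend)]     # random component Z(t)
print(a0, a1)
```

With calendar years as the time variable the sums are large, so for long records it can help numerically to subtract the first year from every time value before fitting.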
Table E12-11 Monthly discharge data in July at the Colorado River near the Grand
Canyon.
Year Discharge Year Discharge Year Discharge Year Discharge
(cfs) (cfs) (cfs) (cfs)
1923 37840 1945 28160 1967 11270 1989 13580
1924 17060 1946 12760 1968 14060 1990 12970
1925 24190 1947 31750 1969 16160 1991 15150
1926 23230 1948 16410 1970 13250 1992 14290
1927 41100 1949 34600 1971 15170 1993 14520
1928 25260 1950 22790 1972 14170 1994 13880
1929 34410 1951 22720 1973 10910 1995 18310
1930 18790 1952 25860 1974 20080 1996 16480
1931 8195 1953 15939 1975 20260 1997 22020
1932 33610 1954 10860 1976 13120 1998 20640
1933 19200 1955 10050 1977 14440 1999 18660
1934 2380 1956 9722 1978 11340 2000 8703
1935 24620 1957 65590 1979 13950 2001 13460
1936 17000 1958 11110 1980 25400 2002 15079
1937 22230 1959 12939 1981 13700 2003 15240
1938 28520 1960 11030 1982 13430 2004 15409
1939 7611 1961 6780 1983 55550
1940 7040 1962 29620 1984 35400
1941 28510 1963 1755 1985 28290
1942 21870 1964 1368 1986 21470
1943 23730 1965 11780 1987 18380
1944 30150 1966 11350 1988 11890
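\[ a_0 = \frac{N\displaystyle\sum_{i=1}^{N} x(i)\sum_{i=1}^{N}[t(i)]^2 - \sum_{i=1}^{N} x(i)\left[\sum_{i=1}^{N} t(i)\right]^2 - \sum_{i=1}^{N} t(i)\left\{N\sum_{i=1}^{N}[x(i)t(i)] - \sum_{i=1}^{N} x(i)\sum_{i=1}^{N} t(i)\right\}}{N^2\displaystyle\sum_{i=1}^{N}[t(i)]^2 - N\left[\sum_{i=1}^{N} t(i)\right]^2} = 2.165\times 10^5 \]
\[ a_1 = \frac{N\displaystyle\sum_{i=1}^{N}[t(i)x(i)] - \sum_{i=1}^{N} x(i)\sum_{i=1}^{N} t(i)}{N\displaystyle\sum_{i=1}^{N}[t(i)]^2 - \left[\sum_{i=1}^{N} t(i)\right]^2} = -100.6 \]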
From this analysis, we see that there is a decreasing trend existing in the dis-
charge time series considered. Figure 12-8a shows the discharge time series and
the corresponding time trend Y(t). Figure 12-8b shows the discharge time series
after the time trend removal.
12.7.2 Periodicity
Cyclic, periodic, or seasonal fluctuations in time series are other deterministic
components. These components are detected by periodic oscillations in the cor-
relogram or by frequencies of oscillations in the spectral density function. A time
series containing a trend and a periodic component can be represented as
X(t) = Y(t) + W(t) + Z(t) (12.51)
where W(t) is the periodic component. Periodicity is usually modeled using har-
monic functions. The fitted periodic values are then subtracted from X(t) and the
random component Z(t) is analyzed further using correlogram and statistical
techniques.
Note that in Eqs. 12.44 and 12.51 the deterministic components Y(t) and W(t) are
added to Z(t); equivalently, X(t) is partitioned into Y(t), W(t), and Z(t). This linear
addition or subtraction will not be valid if the differential equation for the periodic
component or the trend is nonlinear or if their coefficients are random.
Figure 12-8a Discharge time series and the corresponding time trend. [Plot of discharge (cfs, ×10^4) versus year, 1920 to 2010, with the trend line Y(t).]
Figure 12-8b Discharge time series after time trend removal. [Plot of discharge (cfs, ×10^4) versus year, 1920 to 2010.]
decaying curve or the spectral density shows a smooth combination of several fre-
quencies, then a colored-noise process may be employed to represent Z(t). If the
autocorrelation function shows serial dependence up to a certain lag, then an
autoregressive process may be a suitable representation of Z(t).
12.8 Questions
12.1 The potential rate of infiltration, f(t) at time t, in a soil is described by the
Kostiakov equation as
\[ f(t) = a t^{-0.5} \]
Q(t) = Q0 K t
h(t) = qt
Q(t) = qx
\[ f(t) = f_c + s t^{-0.5} \]
12.6 For ice and snow melt conditions the base-flow recession at time t can be
adequately represented as
\[ Q(t) = a t^{-n} + b \]
where a and b are constant parameters, I(t) is inflow to the reach at time t,
and Q is outflow from the reach at time t. Assume that I and Q are nor-
mally distributed. Show whether S is normally distributed or not.
12.9 Assume that in summer in New Orleans the probability(rain) on any day
is 0.15 and probability(no rain) = 0.85. If it rains today, will it rain or be
dry tomorrow, given only today’s weather type? What type of a stochas-
tic process is this?
12.10 Consider monthly discharge for January from the Amite River near Den-
ham Springs, Louisiana. Determine the deterministic component (time
trend) Y(t) and random component Z(t) of the discharge data.
Chapter 13
Stochastic Differential
Equations
\[ x\frac{\partial z}{\partial x} + y\frac{\partial z}{\partial y} = z \tag{13.1} \]
is a PDE. The order of a PDE is the order of the highest derivative. Thus Eq. 13.1
involves partial differentials of first order and hence it will be classified as a PDE
of order one. Similarly, the equation
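\[ \frac{\partial^2 z}{\partial x^2} + \frac{\partial^2 z}{\partial x\,\partial y} + \frac{\partial^2 z}{\partial y^2} = 0 \tag{13.2} \]
\[ \frac{\partial z}{\partial x} + y\frac{\partial z}{\partial y} = z \tag{13.3} \]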
is a linear equation, since the coefficients of the derivatives do not depend on the
dependent variable. If the coefficients associated with the derivatives of the
dependent variable are functions of the dependent variable, it is a nonlinear
PDE. For example,
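\[ z\frac{\partial^2 z}{\partial x^2} + \frac{\partial^2 z}{\partial y^2} = 1 \tag{13.4} \]
and
\[ \frac{\partial z}{\partial x} + \ln z\,\frac{\partial z}{\partial y} = 0 \tag{13.5} \]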
\[ \frac{dy}{dx} + yP = Q \tag{13.6} \]
is a linear equation of first order, where y is a stochastic dependent variable, Q is
a forcing function, P is a sink function, and x is an independent variable. Func-
tions P and Q may be stochastic. Another example of a linear equation is
\[ \frac{dy}{dx} + xy = 5x \tag{13.7} \]
In contrast, the equation
\[ \frac{dy}{dx} + xy^2 = 5x \tag{13.8} \]
is not a linear equation.
A given system may receive input at time t = 0 that is not deterministic. In
ordinary differential equations, the initial condition(s) may be random variables.
In a partial differential equation, it may be specified as a random process.
Depending upon the properties of the system, the uncertain input may be fur-
ther propagated or it may be dissipated. Given sufficient data, the problem is to
find the probability distribution of the system output. However, at times,
enough information may not be available to determine the complete distribution
and one may have to be content with only the first few moments of it. In all of
these situations SDEs arise. Depending on the way randomness is considered,
stochastic differential equations can also be classified as (i) differential equations
with random initial conditions, (ii) differential equations with random forcing
functions, (iii) differential equations with random boundary conditions, (iv) dif-
ferential equations with random coefficients, (v) differential equations with ran-
dom geometrical domains, and (vi) differential equations that combine two or
more of these conditions. A solution of an SDE is a stochastic process that satis-
fies it. Because the dependent variable is stochastic, the concepts of mean square
continuity, stochastic differentiation, and stochastic integration are invoked.
These concepts define the continuity, differentiation, and integration of a
stochastic process.
exists. Similar to stochastic differentiation, X(t) is integrable over the interval (a, b)
if the double integral of the autocorrelation function is bounded:
\[ \int_a^b\int_a^b |R_X(t_1, t_2)|\,dt_1\,dt_2 < \infty \tag{13.14} \]
The condition for the existence of a mean-square derivative can also be con-
sidered from the spectral representation, which is an integral in the mean square
sense. The derivative of X(t) can be expressed as
\[ W(t) = \frac{dX}{dt} = \int_{-\infty}^{\infty} e^{i\omega t}\,i\omega\,dZ_X(\omega) = \int_{-\infty}^{\infty} e^{i\omega t}\,dZ_W(\omega) \tag{13.15} \]
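\begin{align}
R_{WW}(\tau) &= \int_{-\infty}^{\infty} e^{i\omega\tau} S_{WW}(\omega)\,d\omega = \int_{-\infty}^{\infty} e^{i\omega\tau}\omega^2 S_{XX}(\omega)\,d\omega \tag{13.18a} \\
&= -\frac{d^2 R_{XX}}{d\tau^2} \tag{13.18b}
\end{align}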
For τ = 0 , the variance of the derivative follows:
\[ \sigma_W^2 = \int_{-\infty}^{\infty}\omega^2 S_{XX}(\omega)\,d\omega = -\left.\frac{d^2 R_{XX}}{d\tau^2}\right|_{\tau = 0} < \infty \tag{13.19} \]
Since
\[ W(t) = \frac{dX(t)}{dt} \]
the covariance functions of X and W are related as
\[ \frac{d^2 R_{XX}(\tau)}{d\tau^2} = -R_{WW}(\tau) \tag{13.20} \]
by Eq. 13.18. However, Eq. 13.17 shows a simple algebraic relation between their
spectra:
\[ S_{XX}(\omega) = \frac{S_{WW}(\omega)}{\omega^2} \tag{13.21} \]
\[ \sigma_X^2 = \int_{-\infty}^{\infty}\frac{S_{WW}(\omega)}{\omega^2}\,d\omega < \infty \tag{13.22} \]
\[ E\left[\left|\int_a^b X(t)\,dt\right|^2\right] = \int_a^b\int_a^b R_X(t_1, t_2)\,dt_1\,dt_2 \tag{13.23} \]
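\[ \begin{aligned}
E[Y(t_1)Y(t_2)] = R_Y(t_1, t_2) &= E\left[\left(\int_0^{t_1} X(t)\,dt\right)\left(\int_0^{t_2} X(s)\,ds\right)\right] \\
&= E\left[\int_0^{t_1}\int_0^{t_2} X(t)X(s)\,dt\,ds\right] \\
&= \int_0^{t_1}\int_0^{t_2} E[X(t)X(s)]\,dt\,ds = \int_0^{t_1}\int_0^{t_2} R_X(t, s)\,dt\,ds
\end{aligned} \tag{13.26} \]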
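\[ \rho_Y(t_1, t_2) = \frac{\displaystyle\int_0^{t_1}\int_0^{t_2} R_X(t, s)\,dt\,ds - \int_0^{t_1}\int_0^{t_2}\mu_X(t)\mu_X(s)\,dt\,ds}{\left[\displaystyle\int_0^{t_1}\int_0^{t_1} R_X(z, s)\,dz\,ds - \int_0^{t_1}\int_0^{t_1}\mu_X(z)\mu_X(s)\,dz\,ds\right]^{1/2}\left[\displaystyle\int_0^{t_2}\int_0^{t_2} R_X(z, s)\,dz\,ds - \int_0^{t_2}\int_0^{t_2}\mu_X(z)\mu_X(s)\,dz\,ds\right]^{1/2}} \tag{13.29} \]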
Now, let us look at some examples. The first example treats outflow as a
random function.
Example 13.1 The water level in a lake in India during the summer months of
no rainfall is governed by the following differential equation:
\[ K\frac{\partial H}{\partial t} = -Q \]
where Q is discharge from the lake through an outlet and K is a parameter. The
water level in the lake is expressed as
\[ Q(t) = -K\left\{\frac{\partial}{\partial t}[H_0 + h(t)]\right\} \]
Taking the expectation gives
\[ E[Q(t)] = -E\left\{K\frac{\partial}{\partial t}[H_0 + h(t)]\right\} = -K\frac{\partial H_0}{\partial t} - K E\left[\frac{\partial}{\partial t}h(t)\right] \]
Assuming mean-square continuity of the stochastic process, we can extend
Eq. 13.11 and Eq. 13.12 to the derivative of the stochastic process as
\[ E[X'(t)] = \frac{dE[X(t)]}{dt} \]
We then have
\[ K E\left[\frac{\partial}{\partial t}h(t)\right] = K\frac{dE[h(t)]}{dt} = 0 \]
Thus, we get
\[ E[Q(t)] = -E\left\{K\frac{\partial}{\partial t}[H_0 + h(t)]\right\} = -K\frac{\partial H_0}{\partial t} \]
The autocorrelation function of discharge can be expressed as
2 2
⎡ ∂H (t) ⎤ ⎛ ∂H 0 ⎞
σ Q 2 (t) = RQ (t , t) − μQ 2 (t) = K 2 ⎢ 0 ⎥ + K 2 a2CV 2 − K 2 ⎜ 2 2
⎟⎠ = K a CV
2
⎣ ∂t ⎦ ⎝ ∂t
\[ \frac{dS}{dt} = -Q, \qquad S = KQ \]
is frequently used for stream base-flow recession. Here S is the storage in a water-
shed at time t, Q is discharge, and K is the residence time. The initial condition is
the following: At t = 0, Q(0) = Q0. It is assumed that Q0 is a random variable with
mean $\mu_{Q_0}$ and variance $\sigma_{Q_0}^2$. Determine the solution of the differential equation
and the mean, variance, covariance, and autocorrelation function of discharge Q.
\[ \frac{dQ}{dt} + \frac{1}{K}Q = 0 \]
Its solution is
Q(t) = Q0 exp(−t / K )
\[ \rho_Q(t_1, t_2) = \frac{\sigma_{Q_0}^2\exp[-(t_1 + t_2)/K]}{\sigma_{Q_0}^2\sqrt{\exp(-2t_1/K)}\,\sqrt{\exp(-2t_2/K)}} = 1 \]
Figure 13-1 The expectation, variance, covariance and autocorrelation functions of Q(t). [Four panels: μQ(t) and σ²Q(t) versus time t; Cov(t1, t2) and the autocorrelation versus the time difference τ between t1 and t2.]
\[ \begin{aligned}
R_Q(t_1, t_2) &= E[Q(t_1)Q(t_2)] = E\left[A^4 t_1 t_2\exp(-At_1 - At_2)\right] \\
&= \int_{A_L}^{A_u}\frac{1}{A_u - A_L}\,A^4 t_1 t_2\exp(-At_1 - At_2)\,dA \\
&= \frac{-B_1\exp[-A_u(t_1 + t_2)] + B_2\exp[-A_L(t_1 + t_2)]}{t_1^4 t_2^4 (A_u - A_L)}
\end{aligned} \]
\[ \begin{aligned}
\sigma_Q^2 &= R_Q(t, t) - \mu_Q^2(t) \\
&= \frac{1}{A_u - A_L}\left[\frac{B_3\exp(-2A_L t) - B_4\exp(-2A_u t)}{4t^3}\right] - \frac{\left[B_5\exp(-A_L t) - B_6\exp(-A_u t)\right]^2}{(A_u - A_L)^2\,t^4}
\end{aligned} \]
The correlation of Q is
cov ⎡⎣t1 , t2 ⎤⎦
ρQ (t1 , t1 ) = 2 2
=1
σQ (t1 )σ Q (t2 )
Taking the lower and upper limits of A as 0.10 and 0.5 hour^-1, respectively,
we can plot the mean, variance, covariance function, and autocorrelation
function of Q(t). The results are given in Fig. 13-2.
The next two examples deal with the cases where a parameter in the IUH is a
random variable.
\[ h_n(t) = \frac{k}{(n - 1)!}(kt)^{n-1}e^{-kt} \]
Figure 13-2 The expectation, variance, covariance and autocorrelation functions of Q(t). [Four panels versus time (h): μ[Q(t)], σ²[Q(t)], Cov[Q(t)], and R[Q(t)]; the last two are drawn for several fixed values of t2.]
Here
\[ f_k(k) = \frac{1}{\sqrt{2\pi}\,\sigma_k}\exp\left[-\frac{(k - \mu_k)^2}{2\sigma_k^2}\right] \]
Then
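\[ E[h_n^m(t)] = E\left\{\left[\frac{k}{(n - 1)!}(tk)^{n-1}e^{-kt}\right]^m\right\} = \int_{-\infty}^{\infty}\left[\frac{k}{(n - 1)!}(kt)^{n-1}e^{-kt}\right]^m\frac{1}{\sigma_k\sqrt{2\pi}}\exp\left[-\frac{(k - \mu_k)^2}{2\sigma_k^2}\right]dk \]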
For m = 1,
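\[ \begin{aligned}
M_1 &= \mu_k - \sigma_k^2 t, & n &= 1 \\
M_2 &= \sigma_k^2 + (\mu_k - \sigma_k^2 t)^2, & n &= 2 \\
M_3 &= 3(\mu_k - \sigma_k^2 t)\sigma_k^2 + (\mu_k - \sigma_k^2 t)^3, & n &= 3 \\
M_2^* &= \sigma_k^2 + (\mu_k - 2\sigma_k^2 t)^2, & n &= 1
\end{aligned} \]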
The computed mean and standard deviation of function hn(t) are given in
Fig. 13-3.
[Figure: mean μ(hn(t)) and standard deviation σ(hn(t)) of the IUH versus time (h).]
\[ f_k(k) = \frac{\lambda}{\Gamma(\alpha)}(\lambda k)^{\alpha - 1}\exp(-\lambda k) \]
Determine the mean and variance of the IUH in Example 13.4. Plot the func-
tion f k (k ) and the mean and standard deviations of function hn(t).
Solution Following the procedure used in Example 13.4, we find the mean of
hn (t) from
\[ E[h_n(t)] = \frac{t^{n-1}}{(n - 1)!}\left(\frac{\lambda}{\lambda + t}\right)^{\alpha} M_n \]
\[ M_n = (\lambda + t)^{-n}\prod_{i=1}^{n}(\alpha + i - 1) \]
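\[ \operatorname{var}[h_n(t)] = \left[\frac{t^{n-1}}{(n - 1)!}\right]^2\left[\left(\frac{\lambda}{\lambda + 2t}\right)^{\alpha} M_{2n}^* - \left(\frac{\lambda}{\lambda + t}\right)^{2\alpha} M_n^2\right] \]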
Taking α = 0.25, λ = 2.1, and n = 3, we can plot the mean and standard devia-
tion of the IUH (see Fig. 13-4). It is worth noting that not all gamma-distributed
random coefficients can be studied by this method, because the variance calcu-
lated in this manner might be smaller than zero; thus only a narrow range of
parameters α and λ may be applicable.
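The closed-form mean of the IUH for a gamma-distributed k can be compared with a direct numerical integration. The sketch below is illustrative only: it assumes the standard gamma density with exponent α − 1, the function names are hypothetical, and the time value t = 2 used in the check is an arbitrary choice.

```python
import math

def iuh_mean_closed(t, n, alpha, lam):
    """E[h_n(t)] = t^(n-1)/(n-1)! * (lam/(lam+t))^alpha * M_n."""
    m_n = (lam + t) ** (-n) * math.prod(alpha + i - 1 for i in range(1, n + 1))
    return t ** (n - 1) / math.factorial(n - 1) * (lam / (lam + t)) ** alpha * m_n

def iuh_mean_numeric(t, n, alpha, lam, upper=40.0, steps=200000):
    """Midpoint-rule integral of h_n(t; k) * f_k(k) over 0 < k < upper."""
    h = upper / steps
    total = 0.0
    for j in range(steps):
        k = (j + 0.5) * h
        iuh = k * (k * t) ** (n - 1) * math.exp(-k * t) / math.factorial(n - 1)
        pdf = lam * (lam * k) ** (alpha - 1) * math.exp(-lam * k) / math.gamma(alpha)
        total += iuh * pdf
    return total * h

# Parameter values quoted in the text; t = 2 h is an assumed evaluation time
closed_mean = iuh_mean_closed(2.0, 3, 0.25, 2.1)
numeric_mean = iuh_mean_numeric(2.0, 3, 0.25, 2.1)
print(closed_mean, numeric_mean)
```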
The next two examples deal with reservoirs. Example 13.6 focuses on a cas-
cade of linear reservoirs with a random parameter. Example 13.7 relates to the
topic of a lumped linear reservoir.
Example 13.6 In Example 13.4, the Nash cascade of equal linear reservoirs, each
with parameter k, can be recast as
\[ k\frac{dQ_i}{dt} + Q_i = Q_{i-1}, \qquad i = 1, 2, \ldots, n \]
where n is the number of reservoirs, $Q_i(0) = 0$, i = 1, 2,..., n, $Q_i$ is the outflow
from the ith reservoir, and t is time. If the cascade is subject to a unit impulse of
input upstream (i.e., at the first reservoir), then the cascade response from this
input will be the IUH, denoted as $h_n(t)$. Assuming n as 2 and k to be represented
as $k = \bar{k} + k^*$, where $\bar{k}$ = mean value of k and $k^*$ = zero-mean Gaussian
random variable, determine the mean and variance of the IUH and plot these
functions. This situation was addressed by Sarino and Serrano (1990). Take $\bar{k}$ = 10
hours and $\sigma_{k^*}$ = 5 hours. Plot the functions of the IUH.
[Figure 13-4: two panels showing μ(hn(t)) and σ(hn(t)) versus time (h).]
Solution Consider the IUH (or outflow) of the first reservoir expressed as
\[ k\frac{dh_1}{dt} + h_1 = \delta(t) \]
Replacing k by $\bar{k} + k^*$, one obtains
\[ (\bar{k} + k^*)\frac{dh_1}{dt} + h_1 = \delta(t) \]
or
\[ \bar{k}\frac{dh_1}{dt} + h_1 = \delta(t) - k^*\frac{dh_1}{dt} \]
or
\[ \frac{dh_1}{dt} + \frac{h_1}{\bar{k}} = \frac{\delta(t)}{\bar{k}} - \frac{k^*}{\bar{k}}\frac{dh_1}{dt}, \quad h_1(0) = 0 \]
To determine the IUH, consider the right side as $I_1(t)$. Therefore,
\[ \frac{dh_1}{dt} + \frac{h_1}{\bar{k}} = I_1(t) \]
The unit impulse response of this equation is $e^{-t/\bar{k}}$, if $I_1(t)$ is represented by a unit impulse or delta function. Therefore, the IUH of the first reservoir is obtained by convoluting this impulse response with $I_1(t)$ as
\[ h_1(t) = e^{-t/\bar{k}}h_1(0) + \frac{1}{\bar{k}}\int_0^t e^{-(t-s)/\bar{k}}\delta(s)\,ds - \frac{k^*}{\bar{k}}\int_0^t e^{-(t-s)/\bar{k}}\frac{dh_1(s)}{ds}\,ds \]
Note that $h_1$, whose solution is sought, appears on the right side as well. Sarino and Serrano (1990) approximated $h_1$ on the right side as a series:
\[ h_1 = \sum_{i=1}^{4} h_i^* \]
Its solution is
\[ h_1(t) = \frac{e^{-t/\bar{k}}}{\bar{k}} - \frac{k^*}{\bar{k}}e^{-t/\bar{k}}\int_0^t e^{s/\bar{k}}\frac{d}{ds}\sum_{i=1}^{4}h_i^*(s)\,ds = \frac{e^{-t/\bar{k}}}{\bar{k}} - \frac{k^*}{\bar{k}}e^{-t/\bar{k}}\sum_{i=1}^{4}\int_0^t e^{s/\bar{k}}\frac{d}{ds}h_i^*(s)\,ds \]
Now we solve the second term on the right side term by term. The first term
h∗1 is taken as the previous term, which then is obtained from h1(t):
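\[ h_1^*(s) = \frac{e^{-s/\bar{k}}}{\bar{k}} \]
The second term now becomes
\[ h_2^*(s) = -\frac{k^*}{\bar{k}}e^{-s/\bar{k}}\int_0^s e^{\xi/\bar{k}}\frac{d}{d\xi}\left(\frac{e^{-\xi/\bar{k}}}{\bar{k}}\right)d\xi = \frac{k^*}{\bar{k}^3}\,s\,e^{-s/\bar{k}} \]
\[ h_3^*(s) = -\frac{k^*}{\bar{k}}e^{-s/\bar{k}}\int_0^s e^{\xi/\bar{k}}\frac{d}{d\xi}\left(\frac{k^*}{\bar{k}^3}\,\xi\,e^{-\xi/\bar{k}}\right)d\xi = -\frac{k^{*2}}{\bar{k}^4}\,s\,e^{-s/\bar{k}} + \frac{k^{*2}}{2\bar{k}^5}\,s^2e^{-s/\bar{k}} \]
\[ h_4^*(s) = -\frac{k^*}{\bar{k}}e^{-s/\bar{k}}\int_0^s e^{\xi/\bar{k}}\frac{d}{d\xi}\left(h_3^*(\xi)\right)d\xi = e^{-s/\bar{k}}\left(\frac{k^{*3}}{\bar{k}^5}\,s - \frac{k^{*3}}{\bar{k}^6}\,s^2 + \frac{k^{*3}}{6\bar{k}^7}\,s^3\right) \]
Summing $h_1^*$, $h_2^*$, $h_3^*$, and $h_4^*$, we obtain the IUH of the first reservoir:
\[ h_1(t) = e^{-t/\bar{k}}\left(\frac{1}{\bar{k}} + \frac{k^*t}{\bar{k}^3} - \frac{k^{*2}t}{\bar{k}^4} + \frac{k^{*2}t^2}{2\bar{k}^5} + \frac{k^{*3}t}{\bar{k}^5} - \frac{k^{*3}t^2}{\bar{k}^6} + \frac{k^{*3}t^3}{6\bar{k}^7}\right) \]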
Now we consider the outflow from the second reservoir, which will be the
IUH of the two reservoirs in series. The input to the second reservoir is the out-
flow from the first reservoir, which is given by h1(t) as already calculated. Fol-
lowing the previous procedure and avoiding algebraic details, we find h2(t) to be
Terms within integrals are moments of first, second, and third orders of
k∗ about the origin. The first-order moment is zero. Neglecting the third-order
moments, one obtains
\[ E[h_2(t)] = \sigma_{k^*}^2 e^{-t/\bar{k}}\left[\frac{t}{\sigma_{k^*}^2\bar{k}^2} + \frac{t}{\bar{k}^4} - \frac{2t^2}{\bar{k}^5} + \frac{t^3}{2\bar{k}^6}\right] \]
\[ \mathrm{var}[h_2(t)] = \sigma_{k^*}^2 e^{-2t/\bar{k}}\left(\frac{t^2}{\bar{k}^6} - \frac{2t^3}{3\bar{k}^7} + \frac{t^4}{3\bar{k}^8}\right) \]
The mean and variance functions are plotted in Fig. 13-5.
[Figure 13-5: two panels showing E[h2(t)] and Var[h2(t)] (×10⁻⁴) versus time (h).]
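[Figure: schematic of a stream-aquifer system showing recharge R(t), the aquifer (S, a), the water table h(t), and the stream level H(t).]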
where q is the outflow per unit area, s is the average storage coefficient or specific
yield, R is the recharge rate, h is the average thickness of the saturated zone, H is
the elevation of water level in an adjacent water body, a is an outflow constant,
and t is time. Treating s and a as constant, and H, h, and R as stationary zero-mean
random processes, determine the spectral solution of the equation. This example
is discussed by Gelhar (1974). Take a = 0.25 and s = 0.2, and plot the ratios
Shh ( ω)/ SRR ( ω) .
Solution Using the stochastic Fourier–Stieltjes integral, we can express the random functions as
\[ h(t) = \int_{-\infty}^{\infty} e^{i\omega t}\,dZ_h(\omega) \]
\[ H(t) = \int_{-\infty}^{\infty} e^{i\omega t}\,dZ_H(\omega) \]
\[ R(t) = \int_{-\infty}^{\infty} e^{i\omega t}\,dZ_R(\omega) \]
where $dZ_h$, $dZ_H$, and $dZ_R$ are, respectively, the Fourier amplitudes of h, H, and R; and ω is the angular frequency. The Fourier amplitudes satisfy the usual properties as discussed in the preceding chapter. Substituting these into the differential equation yields
\[ dZ_h(\omega) = \frac{a\,dZ_H(\omega) + dZ_R(\omega)}{a + is\omega} \]
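Recall that, for a random process with independent increments, one gets
\[ E[dZ_X(\omega_1)\,dZ_X^*(\omega_2)] = 0, \quad \omega_1 \neq \omega_2 \]
\[ E[dZ_X(\omega)\,dZ_X^*(\omega)] = S_{XX}(\omega)\,d\omega \]
\[ S_{XX}(\omega) = \frac{1}{2\pi}\int_{-\infty}^{\infty} e^{-i\omega\tau}R_{XX}(\tau)\,d\tau \]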
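\[ E[dZ_h(\omega)\,dZ_h^*(\omega)] = E\left[\left(\frac{a}{a+i\omega s}dZ_H + \frac{1}{a+i\omega s}dZ_R\right)\left(\frac{a}{a-i\omega s}dZ_H^* + \frac{1}{a-i\omega s}dZ_R^*\right)\right] \]
\[ S_{hh}(\omega)\,d\omega = E\left[\frac{a^2}{a^2+\omega^2s^2}dZ_H\,dZ_H^*\right] + E\left[\frac{a}{a^2+\omega^2s^2}dZ_H\,dZ_R^*\right] + E\left[\frac{a}{a^2+\omega^2s^2}dZ_R\,dZ_H^*\right] + E\left[\frac{1}{a^2+\omega^2s^2}dZ_R\,dZ_R^*\right] \]
\[ = \frac{a^2}{a^2+\omega^2s^2}S_{HH}(\omega)\,d\omega + \frac{a}{a^2+\omega^2s^2}S_{HR}\,d\omega + \frac{a}{a^2+\omega^2s^2}S_{RH}\,d\omega + \frac{1}{a^2+\omega^2s^2}S_{RR}\,d\omega \]
\[ S_{hh} = \frac{1}{a^2+\omega^2s^2}\left[a^2S_{HH} + aS_{HR} + aS_{RH} + S_{RR}\right] \]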
where SHR is the cross-spectrum, which is related to the cross-correlation
function as
\[ S_{HR}(\omega) = \frac{1}{2\pi}\int_{-\infty}^{\infty} e^{-i\omega\tau}R_{HR}(\tau)\,d\tau = C_{HR}(\omega) - iQ_{HR}(\omega) \]
The real part CHR ( ω) is the cospectrum and the imaginary part QHR ( ω) is
the quadspectrum. The cross-correlation function RHR is expressed as
\[ R_{HR}(\tau) = \frac{1}{2\pi}\int_{-\infty}^{\infty} e^{i\omega\tau}S_{HR}(\omega)\,d\omega = E[H(t+\tau)R^*(t)] \]
\[ S_{hh} = \frac{1}{a^2+\omega^2s^2}\left[a^2S_{HH} + 2aS_{HR} + S_{RR}\right] \]
This shows how the spectral density functions of recharge R and input H,
along with their cospectrum, determine the spectral density function of h.
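When τ = 0, the mean square fluctuation $E[h^2]$ is
\[ E[h^2] = \int_{-\infty}^{\infty} S_{hh}(\omega)\,d\omega \]
\[ E[dZ_R(\omega_1)\,dZ_h^*(\omega_2)] = E\left[dZ_R(\omega_1)\left(\frac{a}{a-i\omega s}dZ_H^* + \frac{1}{a-i\omega s}dZ_R^*\right)\right] \]
\[ E[dZ_H(\omega_1)\,dZ_h^*(\omega_2)] = E\left[dZ_H(\omega_1)\left(\frac{a}{a-i\omega s}dZ_H^* + \frac{1}{a-i\omega s}dZ_R^*\right)\right] \]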
One can draw inferences on the response of the linear reservoir using these spectra. For example, if H = 0, then $S_{HH} = S_{HR} = 0$ and
\[ S_{RR} = (a^2 + \omega^2s^2)\,S_{hh} \]
The ratio Shh(ω)/SRR(ω) is plotted in Fig. 13-7.
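The spectral ratio for the case H = 0 follows directly from the expression for Shh with SHH and SHR set to zero. The short sketch below evaluates it for the values a = 0.25 and s = 0.2 given in the example; the function name is hypothetical.

```python
def head_recharge_spectral_ratio(omega, a=0.25, s=0.2):
    """S_hh / S_RR = 1 / (a^2 + omega^2 s^2) for the linear reservoir with H = 0."""
    return 1.0 / (a ** 2 + (omega * s) ** 2)

if __name__ == "__main__":
    for omega in (0.0, 1.0, 5.0, 20.0, 100.0):
        print(f"omega = {omega:6.1f}  ratio = {head_recharge_spectral_ratio(omega):.5f}")
```

The ratio is largest at ω = 0, where it equals 1/a², and decays as ω increases, so the aquifer acts as a low-pass filter on the recharge.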
The next example considers the case where the input is random.
Figure 13-7 Plot of spectral ratio of Shh(ω)/SRR(ω).
Example 13.8 Consider a drainage system as shown in Fig. 13-8, where the drain
spacing is 2L and the maximum water table height above the drain is M(t). One-
dimensional flow to parallel drains can be described by the Dupuit
approximation as
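\[ s\frac{\partial h}{\partial t} = T\frac{\partial^2 h}{\partial x^2} + R \]
where the variables are defined in the preceding example. This drainage problem has been discussed by Duffy et al. (1984). The Darcy equation can be written as
\[ q = \frac{T}{L}\left.\frac{\partial h}{\partial x}\right|_{x=0} \]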
where q is the aquifer outflow per unit area. Determine the spectral solution of
h(t). It is assumed that h(t) and R(t) are stochastic processes.
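Solution To reduce these processes to zero-mean processes, let $h = \bar{h} + y$ and $R = \bar{R} + P$, so that
\[ s\frac{\partial}{\partial t}(y + \bar{h}) = T\frac{\partial^2}{\partial x^2}(y + \bar{h}) + P + \bar{R} \]
where the bar denotes the mean value. This yields
\[ s\frac{\partial y}{\partial t} = T\frac{\partial^2 y}{\partial x^2} + P + \left[T\frac{\partial^2\bar{h}}{\partial x^2} + \bar{R}\right] \]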
[Figure 13-8: drainage system schematic showing recharge R(t), water table h(x,t), m(t), and h0.]
This equation has two parts: a mean part and a fluctuating part. The mean
part is
\[ T\frac{\partial^2\bar{h}}{\partial x^2} = -\bar{R}; \quad \bar{h} = h_0 \text{ at } x = 0; \quad \bar{h} = h_0 \text{ at } x = L \]
which has the solution
The fluctuating part is
\[ s\frac{\partial y}{\partial t} = T\frac{\partial^2 y}{\partial x^2} + P; \quad y = 0 \text{ at } x = 0; \quad \frac{\partial y}{\partial x} = 0 \text{ at } x = L \]
In terms of the Fourier amplitudes, this equation reduces to
\[ i\omega s\,dZ_y(\omega,x) = T\frac{d^2}{dx^2}\left[dZ_y(\omega,x)\right] + dZ_P(\omega) \]
\[ x = 0,\ dZ_y = 0; \quad x = L,\ \frac{d}{dx}(dZ_y) = 0 \]
Using the properties of the Fourier amplitudes gives the spectral solution:
\[ S_{RR}(\omega) = \frac{s^2\omega^2}{F(\omega,x)}\,S_{yy}(\omega) \]
where
\[ F(\omega,x) = g\,g^* - (g + g^*) + 1 \]
\[ g = \cosh[b(x-L)]/\cosh(bL), \quad b = (i\omega s/T)^{1/2} \]
$S_{RR}$ = recharge spectrum, and $S_{yy}$ = water table spectrum. The asterisk denotes the complex conjugate.
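The factor F(ω, x) can be evaluated with complex arithmetic. The sketch below assumes b = (iωs/T)^(1/2), which follows from substituting the Fourier amplitudes into the fluctuation equation, and uses illustrative parameter values (s, T, L) that are not taken from the example.

```python
import cmath

def drain_response_factor(omega, x, L, s, T):
    """F(omega, x) = g g* - (g + g*) + 1 with g = cosh[b(x - L)] / cosh(bL)."""
    b = cmath.sqrt(1j * omega * s / T)      # assumed definition of b
    g = cmath.cosh(b * (x - L)) / cmath.cosh(b * L)
    f = g * g.conjugate() - (g + g.conjugate()) + 1.0
    return f.real                           # the imaginary part cancels exactly

if __name__ == "__main__":
    s, T, L = 0.2, 10.0, 50.0               # hypothetical aquifer parameters
    for frac in (1.0, 0.5, 0.25, 0.125):
        print(f"x/L = {frac:5.3f}  F = {drain_response_factor(0.5, frac * L, L, s, T):.5f}")
```

Because F equals |1 − g|², it is never negative and vanishes at the drain (x = 0), where g = 1.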
Perturbation can also be applied to the Darcy equation for drain-flow spec-
tral response. Applying the Stieltjes integral to Darcy’s equation, one has
\[ dZ_Q(\omega,x)\big|_{x=0} = \frac{T}{L}\frac{d}{dx}\left[dZ_y(\omega,x)\right]\Big|_{x=0} = \frac{T}{L}\frac{dZ_y(\omega,x)}{(f-1)}\frac{df}{dx}\Big|_{x=0} \]
\[ \alpha = \frac{R}{P} \]
where α is the rainfall recharge coefficient, R is the amount of groundwater
recharge, and P is the amount of rainfall. The governing equation in radial coordi-
nates for groundwater flow is obtained from the Dupuit–Forchheimer theory as
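\[ C\frac{\partial H_1}{\partial t} = \frac{1}{r}\frac{\partial}{\partial r}\left(Tr\frac{\partial H_1}{\partial r}\right) + R, \quad 0 \le r \le a \]
\[ C\frac{\partial H_2}{\partial t} = \frac{1}{r}\frac{\partial}{\partial r}\left(Tr\frac{\partial H_2}{\partial r}\right), \quad a \le r \le \infty \]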
where C is the storage coefficient, T is the transmissivity, H1 and H2 are the hydraulic
heads at locations 1 and 2, R is the groundwater recharge, r is the radial distance from
the center of recharge, and t is time. The initial and boundary conditions are
Figure 13-9 Plot of spectral ratio s²ω²Shh(ω)/SRR(ω) for x/L = 1, 1/2, 1/4, and 1/8.
[Figure: schematic of an unconfined aquifer on an impervious bed showing recharge R, the water table, and heads H and H0.]
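\[ H_1(r,0) = H_0 \]
\[ \frac{\partial H_1}{\partial r}\Big|_{r=0} = 0, \quad H_1(a,t) = H_2(a,t) \]
\[ \frac{\partial H_1}{\partial r}\Big|_{r=a} = \frac{\partial H_2}{\partial r}\Big|_{r=a}, \quad H_2(\infty,t) = H_0 = \text{constant head} \]
\[ H_2(r,0) = H_0 \]
\[ H = \bar{H} + h, \quad R = \bar{R} + q \]
\[ C\frac{\partial}{\partial t}(\bar{H} + h_1) = \frac{1}{r}\frac{\partial}{\partial r}\left[rT\frac{\partial}{\partial r}(\bar{H} + h_1)\right] + \bar{R} + q \]
\[ C\frac{\partial}{\partial t}(\bar{H} + h_2) = \frac{1}{r}\frac{\partial}{\partial r}\left[rT\frac{\partial}{\partial r}(\bar{H} + h_2)\right] \]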
Removing the average terms and neglecting the products of perturbation
quantities, one gets
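\[ C\frac{\partial h_1}{\partial t} = \frac{1}{r}\frac{\partial}{\partial r}\left(rT\frac{\partial h_1}{\partial r}\right) + q, \quad 0 \le r \le a \]
\[ C\frac{\partial h_2}{\partial t} = \frac{1}{r}\frac{\partial}{\partial r}\left(rT\frac{\partial h_2}{\partial r}\right), \quad a \le r \le \infty \]
In a similar manner, the initial and boundary conditions reduce to
\[ h_1(r,0) = 0, \quad h_2(r,0) = 0, \quad h_2(\infty,t) = 0 \]
\[ h_1(a,t) = h_2(a,t) \]
\[ \frac{\partial h_1}{\partial r}\Big|_{r=0} = 0, \quad \frac{\partial h_1}{\partial r}\Big|_{r=a} = \frac{\partial h_2}{\partial r}\Big|_{r=a} \]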
For spectral representation of h and q, one can write the Fourier–Stieltjes
integrals as
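\[ h(r,t) = \int_{-\infty}^{\infty} e^{i\omega t}\,dZ_h(r,\omega) \]
\[ q(t) = \int_{-\infty}^{\infty} e^{i\omega t}\,dZ_q(\omega) \]
where ω represents the angular frequency, $dZ_h$ is the Fourier amplitude corresponding to h, and $dZ_q$ is the Fourier amplitude corresponding to q. Both satisfy the following properties:
\[ E[dZ_h(\omega_1)\,dZ_h^*(\omega_2)] = 0, \quad E[dZ_q(\omega_1)\,dZ_q^*(\omega_2)] = 0, \quad \omega_1 \neq \omega_2 \]
\[ E[dZ_h(\omega)\,dZ_h^*(\omega)] = S_h(\omega)\,d\omega, \quad E[dZ_q(\omega)\,dZ_q^*(\omega)] = S_q(\omega)\,d\omega \]
where $dZ_h^*$ is the complex conjugate of $dZ_h$, $dZ_q^*$ is the complex conjugate of $dZ_q$, $S_h(\omega)$ is the spectral density function of h, and $S_q(\omega)$ is the spectral density function of q.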
Introducing the Fourier–Stieltjes integrals into the governing equations and
boundary conditions gives
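\[ C\frac{\partial}{\partial t}\left[\int_{-\infty}^{\infty} e^{i\omega t}\,dZ_{h_1}(r,\omega)\right] = \frac{1}{r}\frac{\partial}{\partial r}\left(rT\frac{\partial}{\partial r}\left[\int_{-\infty}^{\infty} e^{i\omega t}\,dZ_{h_1}(r,\omega)\right]\right) + \int_{-\infty}^{\infty} e^{i\omega t}\,dZ_q(\omega), \quad 0 \le r \le a \]
which becomes
\[ \frac{\partial^2}{\partial r^2}dZ_{h_1} + \frac{1}{r}\frac{\partial}{\partial r}dZ_{h_1} - iC\frac{\omega}{T}\,dZ_{h_1} = -\frac{dZ_q}{T} \]
Similarly, the second equation is simplified as
\[ C\frac{\partial}{\partial t}\left[\int_{-\infty}^{\infty} e^{i\omega t}\,dZ_{h_2}(r,\omega)\right] = \frac{1}{r}\frac{\partial}{\partial r}\left(rT\frac{\partial}{\partial r}\left[\int_{-\infty}^{\infty} e^{i\omega t}\,dZ_{h_2}(r,\omega)\right]\right), \quad a \le r \le \infty \]
which reduces to
\[ \frac{\partial^2}{\partial r^2}dZ_{h_2} + \frac{1}{r}\frac{\partial}{\partial r}dZ_{h_2} - iC\frac{\omega}{T}\,dZ_{h_2} = 0, \quad a \le r \le \infty \]
Likewise the boundary conditions become
\[ \frac{\partial}{\partial r}dZ_{h_1}\Big|_{r=0} = 0, \quad dZ_{h_1}\big|_{r=a} = dZ_{h_2}\big|_{r=a} \]
\[ \frac{\partial}{\partial r}dZ_{h_1}\Big|_{r=a} = \frac{\partial}{\partial r}dZ_{h_2}\Big|_{r=a}, \quad dZ_{h_2}\big|_{r\to\infty} = 0 \]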
These two equations can now be solved. Let
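\[ dZ_{h_1} = M(r) - i\frac{dZ_q}{C\omega} \]
Inserting this in the first equation, one obtains
\[ \frac{\partial^2 M}{\partial r^2} + \frac{1}{r}\frac{\partial M}{\partial r} = iC\frac{\omega}{T}M \]
The solution of this equation (Carslaw and Jaeger 1959) is
\[ M = P\,I_0\left[\left(\frac{iC\omega}{T}\right)^{1/2}r\right] + Q\,K_0\left[\left(\frac{iC\omega}{T}\right)^{1/2}r\right] \]
where $I_0$ is the first-kind modified Bessel function of zeroth order and $K_0$ is the second-kind modified Bessel function of zeroth order. Then,
\[ dZ_{h_1} = f_{h_1}\frac{dZ_q}{C\omega} \]
Likewise, the solution of the second equation is
\[ dZ_{h_2} = f_{h_2}\frac{dZ_q}{C\omega} \]
where
\[ f_{h_1}(r,\omega) = \frac{K_1\left(\sqrt{\frac{C\omega}{T}}\,e^{i\pi/4}a\right)I_0\left(\sqrt{\frac{C\omega}{T}}\,e^{i\pi/4}r\right)}{K_0\left(\sqrt{\frac{C\omega}{T}}\,e^{i\pi/4}a\right)I_1\left(\sqrt{\frac{C\omega}{T}}\,e^{i\pi/4}a\right) + K_1\left(\sqrt{\frac{C\omega}{T}}\,e^{i\pi/4}a\right)I_0\left(\sqrt{\frac{C\omega}{T}}\,e^{i\pi/4}a\right)} - 1 \]
\[ f_{h_2}(r,\omega) = \frac{K_0\left(\sqrt{\frac{C\omega}{T}}\,e^{i\pi/4}r\right)I_1\left(\sqrt{\frac{C\omega}{T}}\,e^{i\pi/4}a\right)}{K_0\left(\sqrt{\frac{C\omega}{T}}\,e^{i\pi/4}a\right)I_1\left(\sqrt{\frac{C\omega}{T}}\,e^{i\pi/4}a\right) + K_1\left(\sqrt{\frac{C\omega}{T}}\,e^{i\pi/4}a\right)I_0\left(\sqrt{\frac{C\omega}{T}}\,e^{i\pi/4}a\right)} - 1 \]
where $I_0$ and $I_1$ are the first-kind modified Bessel functions of zeroth and first order, respectively, and $K_0$ and $K_1$ are the second-kind modified Bessel functions of zeroth and first order, respectively.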
\[ S_{h_1} = \frac{S_q}{C^2\omega^2}\,f_{h_1}(r,\omega)\,f_{h_1}^*(r,\omega), \quad 0 \le r \le a \]
\[ S_{h_2} = \frac{S_q}{C^2\omega^2}\,f_{h_2}(r,\omega)\,f_{h_2}^*(r,\omega), \quad a \le r \le \infty \]
where $S_{h_1}$ is the spectral density function of $h_1$, $S_{h_2}$ is the spectral density function of $h_2$, and $S_q$ is the spectral density function of q.
The spectral relation between rainfall and recharge is
\[ S_q = \alpha^2 S_P \]
Using this relation, one can derive the spectral relation between rainfall and hydraulic head:
\[ \frac{S_P}{S_{h_1}}\,f_{h_1}(r,\omega)\,f_{h_1}^*(r,\omega) = \frac{C^2}{\alpha^2}\,\omega^2, \quad 0 \le r \le a \]
\[ \frac{S_P}{S_{h_2}}\,f_{h_2}(r,\omega)\,f_{h_2}^*(r,\omega) = \frac{C^2}{\alpha^2}\,\omega^2, \quad a \le r \le \infty \]
\[ \frac{\partial h}{\partial t} = K\frac{\partial^2 h}{\partial x^2}, \quad h(x,0) = 0, \quad h(0,t) = g(t), \quad h(x,t) = 0 \text{ as } x \to \infty \]
where h is the hydraulic head (or piezometric height), t is time, x is the space coordinate, and
\[ K = \frac{kH}{m} \]
in which k is the hydraulic conductivity, m is porosity, and H is the aquifer thick-
ness. The IUH or impulse response function (IRF) for g(t) = δ(t) for the partial
differential equation is given as (Venetis 1970; Singh 1989):
4Kt x2 x2
u( x , t) = ( )−3 / 2 exp(− )
x2 4Kt 4K
[Figure: spectral ratios (a) SP/Sh1 for r = 1, 20, and 50 and (b) SP/Sh2 for r = 50, 100, and 150, plotted against ω on logarithmic axes.]
where u(x, t) is the IUH and h(0, t) is the input at x = 0. The IUH is due to
h(0 , t) = δ(0, t) .
The autocovariance function of h(x, t), $C_h(x,\tau)$, is expressed as
\[ C_h(x,\tau) = E[h(x,t)\,h(x,t+\tau)] = \int_0^{\infty}\int_0^{\infty} u(x,z)\,u(x,y)\,E[h(0,t-y)\,h(0,t+\tau-z)]\,dy\,dz = \int_0^{\infty} u(x,y)\,u(x,y+\tau)\,dy \]
Integration of ρh over τ > 0 yields the time scale T, which measures the persis-
tence or memory of the water table or hydraulic head. Substituting the IUH into
the expression for Ch ( x , τ) , one gets
∞
2 x 2 k 2 4Kt 4Kt 8Kt + x 2 τ x2
Ch ( x , τ ) = ( ) ∫ [ 2 ( 2 + τ)]−3 / 2 exp[− 2
] dt
K 0
x x 4 Kt( 4 kt + x τ ) 4 Kt
For τ = 0, Ch ( x , 0) is
2x 2 k 2 ∞ 4Kt −3 x2 2x 2 k 2 ∞ 1 x2
C h ( x , 0) = ( ) ∫ [ 2 ] exp[− ]dt = ( ) ∫ 3 exp[− ]dt
K 0 x
2Kt K 0 t
2Kt
2x 4 k 2
≈
K2
The correlation function can be described as
∞
4Kt 4Kt 8kt + x 2 τ 4K
ρh ( x , τ ) = 4 ∫ [ 2
( 2
+ τ )]−3 / 2
exp[− 2
] 2 dt
0 x x 4Kt( 4Kt + x τ) x
Its integration yields the correlation time:
\[ T_h = \frac{\pi x^2}{2K} \]
Take x = 1,000 m, m = 0.35, H = 50 m, and k = 5 × 10-4 m/s. The correlation
function of h is plotted in Fig. 13-12.
[Figure 13-12: correlation function ρh(x, τ) versus time lag (days).]
\[ \frac{\partial^2 q}{\partial x^2} = \frac{3Q_0^2}{A_0^2C^2H_0^2}\frac{\partial q}{\partial x} + \frac{2Q_0}{A_0C^2H_0^2}\frac{\partial q}{\partial t} \]
where Q0 is the mean river flow, q is the fluctuation of the flow, H0 is the mean
flow depth, A0 is the mean cross-sectional area, C is Chezy’s roughness coeffi-
cient, x is the distance along the channel, and t is time. The IUH of this equation
is (Dooge 1973; Singh 1989):
\[ u(x,t) = \frac{x}{\sqrt{4\pi\dfrac{A_0C^2H_0^2}{2Q_0}}}\,\frac{1}{t^{3/2}}\exp\left[-\frac{\left(x - \dfrac{3Q_0}{2A_0}t\right)^2}{4\dfrac{A_0C^2H_0^2}{2Q_0}\,t}\right] \]
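For many river reaches the IUH can be expressed as a gamma function:
\[ u(n,t) = \frac{1}{k\,\Gamma(n)}\left(\frac{t}{k}\right)^{n-1}e^{-t/k} \]
\[ C_q(n,\tau) = \int_0^{\infty}\left(\frac{1}{k\,\Gamma(n)}\right)^2\left(\frac{s}{k}\right)^{n-1}e^{-s/k}\left(\frac{s+\tau}{k}\right)^{n-1}e^{-(s+\tau)/k}\,ds \]
To evaluate this integral, we expand the term $((s+\tau)/k)^{n-1}$ using the binomial series. This yields
\[ C_q(n,\tau) = \frac{1}{\Gamma(n)}\,e^{-\tau/k}\sum_{r=0}^{n-1}\left(\frac{\tau}{k}\right)^r\frac{\Gamma(2n-r-1)}{\Gamma(n-r)\,\Gamma(r+1)}\left(\frac{1}{2}\right)^{2n-r-1} \]
For τ = 0,
\[ C_q(n,0) = 2^{-2n+1}\frac{\Gamma(2n-1)}{\Gamma^2(n)} \]
\[ \rho_q(n,\tau) = \frac{C_q(n,\tau)}{C_q(n,0)} = e^{-\tau/k}\sum_{r=0}^{n-1}\binom{n-1}{r}\left(\frac{\tau}{k}\right)^r 2^r\,\frac{\Gamma(2n-r-1)}{\Gamma(2n-1)} \]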
Integration over τ > 0 leads to the time scale of the flow process:
\[ T_q = \int_0^{\infty}\rho_q(n,\tau)\,d\tau = k\sum_{r=0}^{n-1}\binom{n-1}{r}2^r\,\Gamma(r+1)\,\frac{\Gamma(2n-r-1)}{\Gamma(2n-1)} \]
The instantaneous hydrographs are plotted in Fig. 13-13 for a river reach of
length = 20 km; Q = 500 m3/s; n = 2, 3, 4, and 5; and k = 0.2, 0.3, and 0.5 for time
in days.
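The correlation function and time scale of the gamma-type IUH can be evaluated term by term from the binomial sums. The sketch below assumes an integer number of reservoirs n; the function names are hypothetical.

```python
import math

def flow_correlation(tau, n, k):
    """rho_q(n, tau) for the gamma IUH with integer n and storage constant k."""
    total = 0.0
    for r in range(n):
        total += (math.comb(n - 1, r) * (tau / k) ** r * 2 ** r
                  * math.gamma(2 * n - r - 1) / math.gamma(2 * n - 1))
    return math.exp(-tau / k) * total

def flow_time_scale(n, k):
    """T_q, the integral of rho_q over tau > 0, summed term by term."""
    total = 0.0
    for r in range(n):
        total += (math.comb(n - 1, r) * 2 ** r * math.gamma(r + 1)
                  * math.gamma(2 * n - r - 1) / math.gamma(2 * n - 1))
    return k * total

if __name__ == "__main__":
    for n in (2, 3, 4, 5):
        for k in (0.2, 0.3, 0.5):
            print(f"n = {n}, k = {k}: T_q = {flow_time_scale(n, k):.4f} days")
```

For a single reservoir (n = 1) the correlation reduces to exp(−τ/k) and the time scale to k, as expected for a linear reservoir.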
Example 13.12 Consider flood routing in a channel reach using the Muskingum
method, expressed as
\[ \frac{dV}{dt} = I - Q \]
\[ V = K[xI + (1-x)Q] \]
where V is the storage in the channel at time t, I is rate of flow to the channel, Q
is the outflow from the channel, x is a weighting factor, and K is the average
travel time of the reach or average residence time.
Combining the two equations, one gets
\[ K(1-x)\frac{dQ}{dt} + Q = I - Kx\frac{dI}{dt} \]
Assuming I and Q as zero-mean random processes, and treating x and K as
constants, determine the spectral solution of Q as well as V.
[Figure 13-13: three panels of instantaneous unit hydrographs for n = 2, 3, 4, and 5.]
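\[ E[dZ_Q\,dZ_Q^*] = S_{QQ}(\omega) = E\left[\frac{(1 - xKi\omega)\,dZ_I}{1 + K(1-x)i\omega}\cdot\frac{(1 + xKi\omega)\,dZ_I^*}{1 - K(1-x)i\omega}\right] \]
This yields
\[ S_{QQ} = \frac{1 + x^2K^2\omega^2}{1 + \omega^2K^2(1-x)^2}\,S_{II} \]
or
\[ \frac{S_{QQ}}{S_{II}} = \frac{1 + x^2K^2\omega^2}{1 + \omega^2K^2(1-x)^2} \]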
This shows the spectral density function of outflow from the channel reach
as a function of the spectral density function of inflow to the reach. Usually
x < 0.5; therefore, high-frequency variations in the inflow hydrograph will be
attenuated in the outflow.
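The attenuation can be checked numerically from the ratio SQQ/SII. The sketch below uses K = 1 as an illustrative value (an assumption) and the weighting factors x = 0.1, 0.25, and 0.4; the function name is hypothetical.

```python
def muskingum_spectral_ratio(omega, K, x):
    """S_QQ / S_II = (1 + x^2 K^2 w^2) / (1 + w^2 K^2 (1 - x)^2)."""
    return (1.0 + (x * K * omega) ** 2) / (1.0 + (omega * K * (1.0 - x)) ** 2)

if __name__ == "__main__":
    K = 1.0                                  # hypothetical travel time
    for x in (0.1, 0.25, 0.4):
        row = [muskingum_spectral_ratio(w, K, x) for w in (0.0, 0.5, 1.0, 2.0)]
        print(f"x = {x:4.2f}: " + "  ".join(f"{r:.3f}" for r in row))
```

For x < 0.5 the numerator grows more slowly with ω than the denominator, so the ratio falls below 1 at all nonzero frequencies; at x = 0.5 the ratio is exactly 1 and no attenuation occurs.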
For the Muskingum storage–discharge relation, one obtains
Figure 13-14 Spectral response of SQQ/SII for x = 0.1, 0.25, and 0.4.
where q is the discharge leaving the reservoir, V is the volume of the reservoir, C0
is the initial concentration, and t is time. Assuming that q is a stochastic process,
determine the expected value of C and the residence time. Compare the resi-
dence time derived with the residence time assuming q to be a constant or a
mean value. For simplicity, assume C to be the excess concentration above the
background level. This problem is discussed by Maran (2002).
Solution When q is constant, the solution of the differential equation is
\[ C(t) = C_0\exp\left(-\frac{q}{V}t\right) = C_0\exp(-kt) \]
where k = q/V is the rate constant for advection. The residence time $T_r$ is then defined as
\[ T_r = \frac{1}{k} = \frac{V}{q} \]
When q is considered as a stochastic process, the differential equation becomes
\[ \frac{dC(t)}{dt} = -k(t)\,C(t) \]
where k(t) is a random function. Assume k(t) to be stationary. That is, its expected value and all successive moments are independent of time:
\[ \mu_k = E[k(t)] = \frac{E[q(t)]}{V} = \frac{\mu_q}{V} \]
\[ \sigma_k^2 = E[k(t) - \mu_k]^2 = \frac{\sigma_q^2}{V^2} \]
\[ r_k(\tau) = \frac{1}{\sigma_k^2}E[(k(t) - \mu_k)(k(t-\tau) - \mu_k)] = r_q(\tau) \]
\[ p_k(k) = V\,p_q(kV) \]
\[ k(t) = \mu_k + \sigma_k k_1(t) \]
where $k_1(t)$ is a dimensionless random function having zero mean and unit variance and autocorrelation equal to $r_k$ or $r_q$. Here $\sigma_k$ signifies the magnitude of the fluctuations. The differential equation can now be written as
\[ \frac{dC(t)}{dt} = -(\mu_k + \sigma_k k_1)\,C \]
To solve this equation, we consider a change of variable, $C(t) = y(t)\exp(-\mu_k t)$, which gives
\[ \frac{dy(t)}{dt} = -\sigma_k k_1\,y, \quad y(0) = y_0 \]
Its solution is
\[ y(t) = y_0\exp\left[-\sigma_k\int_0^t k_1(s)\,ds\right] \]
The expected value of y(t) can be determined by using the cumulant expan-
sion presented by Kubo (1962). For any random variable X, cumulants are
defined as
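\[ E[\exp(aX)] = \exp\left[\sum_{n=1}^{\infty}\frac{a^n}{n!}\,C_{X^n}\right] \]
where $C_{X^n} = \langle\langle X^n\rangle\rangle$ is the nth term in the cumulant expansion of X. The double angle $\langle\langle\cdot\rangle\rangle$ notation implies cumulant. Cumulants and moments are related (Singh 1988). For the first four cumulants,
\[ C_X = \langle\langle X\rangle\rangle = E[X] = \mu_X \]
\[ C_{X^2} = \langle\langle X^2\rangle\rangle = E(X^2) - [E(X)]^2 = \sigma_X^2 \]
\[ C_{X^3} = \langle\langle X^3\rangle\rangle = E(X^3) - 3E(X^2)E(X) + 2[E(X)]^3 \]
\[ C_{X^4} = \langle\langle X^4\rangle\rangle = E(X^4) - 4E(X^3)E(X) - 3[E(X^2)]^2 + 12E(X^2)[E(X)]^2 - 6[E(X)]^4 \]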
Note that, for a random variable with zero mean, the first three cumulants
are equal to the first three moments. Furthermore CX1 , X2 ,..., Xn = 0 if one of the
random variables, X1 , X 2 ,..., X n , is independent of the others. For example,
k1 (t )k1 (t −τ ) vanishes if τ > τ ac .
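The moment-to-cumulant relations can be checked against distributions whose cumulants are known. The sketch below uses the standard relations between raw moments and the first four cumulants; the function name and the two test distributions are choices made here for illustration.

```python
def cumulants_from_raw_moments(m1, m2, m3, m4):
    """First four cumulants from the raw moments E[X], E[X^2], E[X^3], E[X^4]."""
    c1 = m1
    c2 = m2 - m1 ** 2
    c3 = m3 - 3.0 * m2 * m1 + 2.0 * m1 ** 3
    c4 = m4 - 4.0 * m3 * m1 - 3.0 * m2 ** 2 + 12.0 * m2 * m1 ** 2 - 6.0 * m1 ** 4
    return c1, c2, c3, c4

if __name__ == "__main__":
    # Unit exponential: raw moments are n!, cumulants are (n - 1)!
    print(cumulants_from_raw_moments(1.0, 2.0, 6.0, 24.0))
    # Standard normal: all cumulants beyond the second vanish
    print(cumulants_from_raw_moments(0.0, 1.0, 0.0, 3.0))
```

The vanishing of the higher cumulants for a Gaussian variable is what makes truncating the cumulant expansion after the second term a natural approximation.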
Making use of the cumulant relations, one can express the expected value of
y(t) as
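\[ \frac{E[y(t)]}{y_0} = E\left\{\exp\left[-\sigma_k\int_0^t k_1(s)\,ds\right]\right\} \]
\[ = \exp\left[\sum_{n=2}^{\infty}\frac{(-\sigma_k)^n}{n!}\int_0^t dt_1\ldots\int_0^t dt_n\,C_{k_1(t_1)k_1(t_2)\ldots k_1(t_n)}\right] \]
\[ = \exp\left[\sum_{n=2}^{\infty}(-\sigma_k)^n\int_0^t dt_1\int_0^{t_1}dt_2\ldots\int_0^{t_{n-1}}dt_n\,\langle\langle k_1(t_1)k_1(t_2)\ldots k_1(t_n)\rangle\rangle\right] \]
Retaining only the second-order term (n = 2), one obtains
\[ \frac{E[y(t)]}{y_0} = \exp\left[\sigma_k^2\int_0^t dt_1\int_0^{t_1}dt_2\,C_{k_1(t_1)k_1(t_2)}\right] = \exp\left[\sigma_k^2\int_0^t dt_1\int_0^{t_1}d\tau\,r_k(\tau)\right] = \exp\left[\sigma_k^2\int_0^t\Gamma(\tau)\,d\tau\right] \]
where $\Gamma(t) = \int_0^t r_k(\tau)\,d\tau$. This is the solution of the following differential equation:
\[ \frac{d[E(y)]}{dt} = \sigma_k^2\,\Gamma(t)\,E(y) \]
or
\[ \frac{dE(C)}{dt} = -[\mu_k - \sigma_k^2\,\Gamma(t)]\,E[C] \]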
This solution presents the variability of the average concentration, provided
σ k τ ac << 1 . If t >> τ ac , Γ(t) ≈ τ ac and the decay rate of the average concentration
is decreased by a quantity proportional to the variance of fluctuations and to
their autocorrelation time. This differential equation is valid when μk domi-
nates or when σ k / μk << 1 even if σ k τ ac is not smaller than one.
Neglecting terms of higher order than σ k τ ac , we can write
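\[ \frac{dE(C)}{dt} = -[\mu_k - \sigma_k^2\tau_{ac}]\,E[C] \]
Its solution is
\[ E(C) = C_0\exp[-(\mu_k - \sigma_k^2\tau_{ac})\,t] \]
The corresponding residence time is
\[ T_r^{(1)} = \frac{1}{\mu_k - \sigma_k^2\tau_{ac}} = \frac{1}{\mu_k}\,\frac{1}{\left[1 - \dfrac{\sigma_k^2\tau_{ac}}{\mu_k}\right]} \cong T_r^{(0)}\left(1 + \frac{\sigma_k^2\tau_{ac}}{\mu_k}\right) \]
where $T_r^{(0)}$ is the residence time obtained from the mean discharge (zero-order approximation, equal to $1/\mu_k$), and terms of second or higher order in $\sigma_k^2\tau_{ac}$ have been neglected. One can write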
\[ \frac{T_r^{(1)} - T_r^{(0)}}{T_r^{(0)}} = \frac{\sigma_q^2\,\tau_{ac}}{E[q]\,V} \]
This specifies the increase in the residence time owing to discharge fluctuations.
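The size of this correction can be illustrated numerically. In the sketch below the mean discharge, its standard deviation, the reservoir volume, and the autocorrelation time are hypothetical values chosen only so that σk τac is much smaller than one.

```python
def residence_times(mu_q, sigma_q, V, tau_ac):
    """Zero- and first-order residence times and the relative increase."""
    mu_k = mu_q / V
    sigma_k = sigma_q / V
    t_zero = 1.0 / mu_k                               # T_r^(0), from mean discharge
    t_first = 1.0 / (mu_k - sigma_k ** 2 * tau_ac)    # T_r^(1), with fluctuations
    relative_increase = sigma_q ** 2 * tau_ac / (mu_q * V)
    return t_zero, t_first, relative_increase

if __name__ == "__main__":
    # Hypothetical values: q in m3/h, V in m3, tau_ac in h
    t0, t1, rel = residence_times(mu_q=10.0, sigma_q=2.0, V=500.0, tau_ac=1.0)
    print(f"T0 = {t0:.3f} h, T1 = {t1:.3f} h, relative increase = {rel:.5f}")
```

The first-order relative increase agrees with the exact ratio (T1 − T0)/T0 up to terms of second order in σk²τac/μk.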
Now the case of large autocorrelation time is considered. In this case k is a
random variable, not a random function. For initial concentration C0 and rate
constant k, let the solution of the concentration equation be denoted as
$f(t,k)$. Then the PDF of the concentration at time t, $p_C$, can be computed from the PDF of k, $p_k$, as
\[ p_C(C,t) = \int_0^{\infty}\delta[C - f(t,k)]\,p_k(k)\,dk \]
where δ is the Dirac delta function. The probability that the concentration at time t is between C and C + dC is equal to $p_C(C,t)\,dC$. Using the solution of C, one obtains
\[ p_C(C,t) = V\int_0^{\infty}p_q(kV)\,\delta(C - C_0e^{-kt})\,dk \]
\[ p_C(C,t) = \frac{V}{t}\int_0^{\infty}p_q\left(\frac{V}{t}\ln\frac{C_0}{s}\right)\delta(C - s)\,\frac{ds}{s} = \frac{V}{Ct}\,p_q\left(\frac{V}{t}\ln\frac{C_0}{C}\right) \]
Now the expected value of concentration is
\[ \mu_C(t) = E[C(t)] = \int_0^{\infty}C\,p_C(C,t)\,dC \]
The asymptotic behavior of the mean concentration, that is, as t → 0 (t small compared with $T_r^{(0)}$), can be obtained by expanding the exponential to the first order:
\[ \frac{\mu_C(t)}{C_0} = \int_0^{\infty}\left(1 - \frac{qt}{V}\right)p_q(q)\,dq = 1 - \frac{t}{T_r^{(0)}} \]
which is the same as the asymptotic behavior of the deterministic equation.
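For large t, expanding $p_q$ about q = 0 gives
\[ \frac{\mu_C}{C_0} = \int_0^{\infty}e^{-qt/V}\left[p_q(0) + p_q'(0)\,q + \frac{1}{2}p_q''(0)\,q^2 + \ldots\right]dq = \frac{V}{t}\left[p_q(0) + p_q'(0)\frac{V}{t} + p_q''(0)\left(\frac{V}{t}\right)^2 + \ldots\right] \]
This shows that the expected value of concentration decreases with a power of 1/t.
The residence time in this case can be computed as
\[ T_r = \frac{1}{C_0}\int_0^{\infty}\mu_C(t)\,dt = V\int_0^{\infty}\frac{p_q(q)}{q}\,dq = V\,E\left[\frac{1}{q}\right] \]
\[ \sigma_T^2 = V^2\sigma_{q^*}^2, \quad q^* = 1/q \]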
Example 13.14 Consider a lake with a residence time based on the mean dis-
charge as 50 hours. Assume the lake outflow is characterized by a log-normal
probability distribution with a coefficient of variation of 0.5 and a long autocorre-
lation time. The initial concentration is C0 = 1.5 mg/L. Compute the PDFs of the
residence time and concentration and the expected value of the residence time.
Solution The PDF of k ( = q/V) is
\[ p_k(k) = \frac{1}{s_k\,k\sqrt{2\pi}}\exp\left[-\frac{1}{2}\left(\frac{\ln k - m_k}{s_k}\right)^2\right] \]
Parameters $m_k$ and $s_k$ are related to CV and $T_r^{(0)}$ by
\[ CV^2 = \exp(s_k^2) - 1, \quad m_k = -\ln T_r^{(0)} - \frac{1}{2}s_k^2 \]
where exp(mk ) is the geometric mean of k, which is the geometric mean dis-
charge divided by volume. The plot of pk is shown in Fig. 13-15.
The PDF of the residence time is computed by using the relation between $p_T$ and $p_q$. This is also a log-normal PDF with parameters $s_T = s_k$ and $m_T = -m_k$. The coefficient of variation CV of T equals the CV of q, and the geometric mean of T equals the ratio of the reservoir volume and the geometric mean of discharge. The PDF of T is shown in Fig. 13-16.
The expected value of residence time is computed as
\[ E[T] = \frac{V}{E[q]}(1 + CV^2) = T_r^{(0)}(1 + CV^2) = 50\times(1 + 0.5^2) = 62.5 \text{ hours} \]
The geometric mean of T is
\[ T_g = \frac{E[T]}{\sqrt{1 + CV_q^2}} = T_r^{(0)}\sqrt{1 + CV_q^2} = 50\times\sqrt{1 + 0.5^2} = 55.9 \text{ hours} \]
The PDF of the concentration is
\[ p_C(C,t) = \frac{1}{C\ln\left(\dfrac{C_0}{C}\right)S_q\sqrt{2\pi}}\exp\left[-\frac{1}{2}\left(\frac{\ln\ln(C_0/C) - m(t)}{S_q}\right)^2\right] \]
where $m(t) = \ln(t/T_g)$.
Figure 13-17 plots the PDFs of C at various times. The expected value of C is
plotted in Fig. 13-18.
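The log-normal parameters and residence-time statistics of Example 13.14 can be reproduced with a few lines of arithmetic. The sketch below uses only the relations stated in the solution; the variable names are chosen here for illustration.

```python
import math

T_R0 = 50.0      # residence time based on mean discharge (h)
CV = 0.5         # coefficient of variation of the outflow

s_k = math.sqrt(math.log(1.0 + CV ** 2))     # from CV^2 = exp(s_k^2) - 1
m_k = -math.log(T_R0) - 0.5 * s_k ** 2       # log-mean of k
m_T = -m_k                                   # log-mean of the residence time
expected_T = T_R0 * (1.0 + CV ** 2)          # E[T]
geometric_T = expected_T / math.sqrt(1.0 + CV ** 2)   # T_g

if __name__ == "__main__":
    print(f"s_k = {s_k:.3f}, m_k = {m_k:.3f}, m_T = {m_T:.3f}")
    print(f"E[T] = {expected_T:.1f} h, T_g = {geometric_T:.1f} h")
```

The values of m_T and s_T obtained this way can be compared with the parameters shown on the residence-time PDF in Fig. 13-16.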
[Figure 13-15: probability density function of k; horizontal axis k (1/h).]
Figure 13-16 Probability density function of residence time (T), with mT = 4.02 and σT = 0.47.
[Figure 13-17: PDFs of c/c0 at t = 25, 50, 75, 100, and 150 h.]
[Figure 13-18: expected value of c/c0 versus time (h).]
13.5 Questions
13.1 A linear differential equation
\[ \frac{dS}{dt} = -Q, \quad S = KQ \]
is frequently used for stream base-flow recession. Here S is the storage in a watershed at time t, Q is discharge, and K is the residence time. The initial condition is the following: At t = 0, $Q(0) = Q_0$. It is assumed that K is a random variable with mean $\mu_K$ and variance $\sigma_K^2$. Determine the solution of the differential equation and the mean, variance, covariance, and autocorrelation function of discharge Q.
13.2 Consider the differential equation in Question 13.1. Assume that both K and $Q_0$ are normally distributed random variables. Determine the solution of the differential equation and the mean, variance, covariance, and autocorrelation function of discharge Q.
13.3 The surface runoff hydrograph from an area represented by a linear reservoir
can be described mathematically as
\[ K\frac{dQ(t)}{dt} + Q(t) = P, \quad Q(0) = 0 \]
\[ h_n = \frac{k}{(n-1)!}(kt)^{n-1}e^{-kt} \]
\[ K(1-x)\frac{dQ}{dt} + Q = I - xK\frac{dI}{dt} \]
where I is inflow to the reach, Q is the outflow from the reach, x is a
weighting factor, and K is the storage delay time. Considering x and K as
constant and I as a stationary zero-mean random process, determine the
spectral solution of the equation.
13.7 The rate of infiltration, f(t), at time t in a soil column can be described by (1) the continuity equation
\[ \frac{dS(t)}{dt} = f_s - f(t) \]
where S(t) is potential water storage space available in the soil column at time t and $f_s$ is the seepage rate, assumed to be constant, and (2) a relation between S(t) and f(t). One such simple relation is
\[ S_0 - S(t) = \frac{a}{f(t)} \]
13.8 For the situation in Question 13.7, make the additional assumption that fs
is also a random variable. Derive the spectral solution of the infiltration
equation.
13.9 Solve Question 13.7 if a is uniformly distributed.
Part IV
Chapter 14
[Figure: probability versus safety factor, with reliability unknown.]
Example 14.1 For a given watershed the point-source load WLA is 65 lb/day
and the non-point-source load is 1,258 lb/day. The TMDL capacity at the outlet
of this watershed is determined to be 1,678 lb/day. Determine the margin of
safety and factor of safety.
Solution By using Eqs. 14.2a and 14.2b the margin of safety is calculated as
MOS = capacity – load =1,678 – (65 +1,258) = 355 lb/day
The factor of safety is given as
FOS = capacity/total load = 1,678/(65 +1,258) = 1.27
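The same arithmetic can be written as a short sketch; the function name is hypothetical and the inputs are the loads and capacity of Example 14.1.

```python
def margin_and_factor_of_safety(capacity, loads):
    """Return (MOS, FOS) for a capacity and a list of component loads."""
    total_load = sum(loads)
    mos = capacity - total_load      # margin of safety, same units as the load
    fos = capacity / total_load      # factor of safety, dimensionless
    return mos, fos

if __name__ == "__main__":
    mos, fos = margin_and_factor_of_safety(1678.0, [65.0, 1258.0])
    print(f"MOS = {mos:.0f} lb/day, FOS = {fos:.2f}")
```

Both measures are deterministic: they say nothing about how likely it is that the load exceeds the capacity.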
As an alternative to stacking up worst-case margins and uncertainties, the engineer could combine these factors statistically to yield information about the degree of confidence ("reliability") in a particular point design. In other words, the engineer could generate not just a single performance prediction but also a distribution of performance predictions with associated probabilities of occurrence, as discussed in the following section.
Reliability Analysis and Estimation 567
[Figure: probability distribution of performance, with reliability known.]
Pf = P ( L > C ) (14.3)
Pf = P ( Z < 0 ) (14.4)
Z=C−L (14.5)
Z = C/L − 1 (14.6)
Z = ln(C/L) (14.7)
ℜ = P ( Z > 0 ) = 1 − Pf (14.8)
\[ P_f = \int_a^b\int_d^l f_{C,L}(c,l)\,dc\,dl \qquad (14.9) \]
where fC,L (c,l) is the joint probability density function of C and L; d and l are the
lower and upper bounds of C; and a and b are the lower and upper bounds of L,
respectively. The capacity C and load L are random variables that, in general, are
the resultant of many uncertain variables of the system under consideration,
such as weather parameters; location of the water table; temperature; flow quan-
tities, such as runoff, peak discharge, and volume; contaminant concentration in
soil, water, and air; minimum dissolved oxygen in a stream; material characteris-
tics; and process-specific variables of an engineering system under consider-
ation, to name only a few. Therefore, a generic performance function Z can be
written as
Z = g ( X1 , X 2 , X 3 ,..., X n ) (14.10)
\[ P_f = P(Z < 0) = \int_{-\infty}^{0}f_Z(z)\,dz = \int_{-\infty}^{0}\frac{1}{\sigma(Z)\sqrt{2\pi}}\exp\left[-\frac{1}{2}\left(\frac{z - \mu(Z)}{\sigma(Z)}\right)^2\right]dz \qquad (14.12) \]
As shown in Fig. 14-3, the probability of failure is the area of the PDF of Z below 0. Thus, by evaluating the standardized variate at Z = 0 in Eq. 14.12 and writing μ(Z)/σ(Z) = β, Eq. 14.12 becomes
\[ P_f = \Phi\left(\frac{0 - \mu(Z)}{\sigma(Z)}\right) = \Phi(-\beta) = 1 - \Phi(\beta) \qquad (14.13) \]
\[ \beta = -\Phi^{-1}(P_f) \qquad (14.14) \]
\[ U_L = \frac{L - \mu(L)}{\sigma(L)} \quad \text{and} \quad U_C = \frac{C - \mu(C)}{\sigma(C)} \qquad (14.15) \]
and substituting values of L and C in Eq. 14.5 gives the performance function in
the limit state as
Z = C – L = 0, i.e., σ (C ) UC − σ ( L) U L + μ (C ) − μ ( L) = 0 (14.16)
Now, plotting the performance function (Eq. 14.16) in the reduced coordinate system in Fig. 14-5, we see that the straight line generated by this expression is displayed at a distance equal to the reliability index β from the origin. The shortest distance of this line from the origin is equal to the length of the perpendicular drawn from the origin, which is given as
\[ \beta = \frac{\mu(C) - \mu(L)}{\sqrt{\sigma^2(C) + \sigma^2(L)}} \qquad (14.17) \]
Table 14-1 Relationship between reliability index (β) and probability of failure (Pf).
Reliability index β                   1      1.2    1.4     1.6
Probability of failure Pf = Φ(–β)     0.159  0.115  0.0808  0.0548
[Figure: regions C < L and C > L separated by the line C = L or Z = 0.]
[Figure 14-5: limit state Z = 0 and unsafe region in the reduced coordinates UL and UC; the distance from the origin to the limit state is β.]
2 2
C ) + σ ln( L ) (14.20)
X
μln( X ) = ln
1 + CV 2 ( X )
574 Risk and Reliability Analysis
and
⎛μ 1 + CVL2 ⎞
ln ⎜ C ⎟
μln(C ) − μln( L) ⎜⎝ μL 1 + CVC2 ⎟⎠ (14.21)
β= =
2 2
σ ln( C ) + σ ln( L ) ln[(1 + CVC2 )(1 + CVL2 )]
Example 14.2 It is well known that the TMDL process is inherently uncertain
and a deterministic approach to determining the factor of safety and margin of
safety may not be justified. Assume that the magnitudes of uncertainty (repre-
sented by the coefficient of variation, CV) associated with TMDL, WLA, and LA
are 0.26, 0.18, and 0.33, respectively. Using the values of Example 14.1 as the
mean values for point and non-point-source loads and the TMDL capacity,
determine the reliability index and the corresponding probability of failure.
Assume TMDL, WLA, and LA are independent and normally distributed.
Solution Equation 14.2a and Eq. 14.2b define the performance function as the
conventional MOS:
Z = TMDL – (WLA + LA)
The mean of Z is calculated as
μZ = μTMDL – (μWLA + μLA) = 1,678 – (65 +1,258) = 355 lb/day
\[ \sigma_Z^2 = \sigma_{TMDL}^2 + (\sigma_{WLA}^2 + \sigma_{LA}^2) \]
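The remaining steps of this calculation can be sketched as follows, assuming independent normal variables as stated in the example. Each standard deviation is taken as the coefficient of variation times the mean, and the reliability index is μZ/σZ; the variable names are chosen here for illustration.

```python
import math

means = {"TMDL": 1678.0, "WLA": 65.0, "LA": 1258.0}   # lb/day, from Example 14.1
cvs = {"TMDL": 0.26, "WLA": 0.18, "LA": 0.33}         # coefficients of variation

sigmas = {name: cvs[name] * means[name] for name in means}
mu_z = means["TMDL"] - (means["WLA"] + means["LA"])
sigma_z = math.sqrt(sum(s ** 2 for s in sigmas.values()))
beta = mu_z / sigma_z
p_fail = 0.5 * math.erfc(beta / math.sqrt(2.0))       # Phi(-beta)

if __name__ == "__main__":
    print(f"mu_Z = {mu_z:.1f} lb/day, sigma_Z = {sigma_z:.1f} lb/day")
    print(f"beta = {beta:.3f}, Pf = {p_fail:.3f}")
```

The variances add because the performance function is a linear combination of independent variables, regardless of the signs of the individual terms.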
\[ P_f = 1 - \Phi\left(\frac{\mu_C - \mu_L}{\sqrt{\sigma_C^2 + \sigma_L^2}}\right) = 1 - \Phi\left(\frac{10 - 7.5}{\sqrt{1.5^2 + 2.4^2}}\right) = 1 - \Phi(0.8834) = 1 - 0.81 = 0.19 \]
Thus, there is a 19% risk that the available storage will be inadequate to con-
tain the incoming flood.
Example 14.4 Using the data of Example 14.3, compute the risk of not being able
to contain the flood if the load and resistance follow the log-normal distribution.
Solution We first calculate the coefficient of variation for load and resistance:
CVL = 2.4/7.5 = 0.32
\[ P_f = 1 - \Phi\left[\frac{\ln\left(\dfrac{\mu_C}{\mu_L}\sqrt{\dfrac{1 + CV_L^2}{1 + CV_C^2}}\right)}{\sqrt{\ln[(1 + CV_C^2)(1 + CV_L^2)]}}\right] = 1 - \Phi\left[\frac{\ln\left(\dfrac{10}{7.5}\sqrt{\dfrac{1 + 0.32^2}{1 + 0.15^2}}\right)}{\sqrt{\ln[(1 + 0.15^2)(1 + 0.32^2)]}}\right] \]
It may be noted from the results that the difference in the failure probability
is negligible when the variables are assumed to follow the log-normal distribu-
tion rather than the normal distribution.
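The two cases can be compared side by side with a short sketch that applies the normal and log-normal expressions for the reliability index to the data of Example 14.3. The function names are hypothetical.

```python
import math

def phi(z):
    """Standard normal CDF via the complementary error function."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

def beta_normal(mu_c, sigma_c, mu_l, sigma_l):
    """Reliability index for independent, normally distributed C and L."""
    return (mu_c - mu_l) / math.sqrt(sigma_c ** 2 + sigma_l ** 2)

def beta_lognormal(mu_c, sigma_c, mu_l, sigma_l):
    """Reliability index for independent, log-normally distributed C and L."""
    cv_c2 = (sigma_c / mu_c) ** 2
    cv_l2 = (sigma_l / mu_l) ** 2
    numerator = math.log((mu_c / mu_l) * math.sqrt((1.0 + cv_l2) / (1.0 + cv_c2)))
    return numerator / math.sqrt(math.log((1.0 + cv_c2) * (1.0 + cv_l2)))

if __name__ == "__main__":
    data = (10.0, 1.5, 7.5, 2.4)     # mu_C, sigma_C, mu_L, sigma_L
    for label, fn in (("normal", beta_normal), ("log-normal", beta_lognormal)):
        b = fn(*data)
        print(f"{label:10s}: beta = {b:.4f}, Pf = {1.0 - phi(b):.4f}")
```

Running both functions on the same inputs makes it straightforward to see how sensitive the computed risk is to the distributional assumption.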
Example 14.5 A company has been granted a license to discharge waste into a
stream. Under the licensing arrangement, the company must comply with cer-
tain conditions, one of which concerns the concentration of pollutant in the
stream at a monitoring point 100 meters downstream from the outfall. Specifi-
cally, there should be a chance of less than 1% of the pollutant concentration
exceeding 10 mg/L during any one month. The stream has been monitored
daily, since the company began operations, and the data suggest that the
monthly maximum concentration is approximately normally distributed, with a
mean of 6.3 mg/L and a standard deviation of 2.1 mg/L. Does it appear that the
company is complying with the condition of its license?
Solution Let C represent the maximum monthly concentration of the concerned
pollutant. We are given μ C = 6.3 mg/L and σ C = 2.1 mg/L. Then use the follow-
ing steps:
Step 1: Define the performance function as Z = 10 – C.
Step 2: Determine the mean and standard deviation of Z.
The mean value of Z can be determined by taking expectation of Z,
E[Z] = μZ = 10 – E[C] = 10 – 6.3 = 3.7, and the variance of Z will remain the
same as that of C. Thus, σ Z = 2.1 mg/L.
Step 3: Now, the reliability index is
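\[ \beta = \frac{\mu(Z)}{\sigma(Z)} = \frac{3.7}{2.1} = 1.762 \]
The corresponding probability of exceeding 10 mg/L is 1 – Φ(1.762) ≈ 0.039, or about 3.9%, which is larger than the permitted 1%.
Thus, it appears that the company is not complying with its license conditions.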
Example 14.6 The drinking water demand of a city follows a normal distribu-
tion with a mean of 700 m3 per day and a standard deviation of 125 m3. (a) If the
water distribution network can provide a constant rate of supply of 900 m3 per
day, determine the risk of failure on a typical day. (b) What is the risk if the stan-
dard error of the estimated water distribution network capacity is 100 m3?
Solution
(a) Here demand follows a normal distribution. The risk is the probability of
load exceeding 900. This case is depicted graphically in Fig. 14-6. Mathe-
matically, it can be written as
\[ R = \int_{900}^{\infty} \frac{1}{125\sqrt{2\pi}} \exp\!\left(-\frac{(D - 700)^2}{2 \times 125^2}\right) dD \]
Using the standard normal variate for evaluating the integral gives
\[ R = \int_{1.6}^{\infty} \frac{1}{\sqrt{2\pi}} \exp\!\left(-\frac{u^2}{2}\right) du = 1 - \int_{-\infty}^{1.6} \frac{1}{\sqrt{2\pi}} \exp\!\left(-\frac{u^2}{2}\right) du = 1 - 0.9453 = 0.0547 \]
Thus, there is a 5.47% chance that the demand on a day will exceed
the capacity of the system.
[Figure: probability distribution functions of demand and capacity. Horizontal axis: demand, capacity; vertical axis: probability distribution function.]
(b) At this stage, the capacity is also a normally distributed variable with a
mean of 900 and standard deviation of 100:
\[ f(C) = \frac{1}{100\sqrt{2\pi}} \exp\!\left(-\frac{(C - 900)^2}{2 \times 100^2}\right) \]
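\[ R = \int_{-\infty}^{\infty} \left[\int_{C}^{\infty} \frac{1}{125\sqrt{2\pi}} \exp\!\left(-\frac{(D - 700)^2}{2 \times 125^2}\right) dD\right] \frac{1}{100\sqrt{2\pi}} \exp\!\left(-\frac{(C - 900)^2}{2 \times 100^2}\right) dC = 0.1001 \]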
Hence, the risk of failure is about 10% when the capacity is assumed
to follow a normal distribution. This is about twice the value obtained
when the capacity was assumed to be constant.
The integral here can also be numerically integrated to determine
risk. To that end, it is convenient to write it in the following form:
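\[ R = \int_{-\infty}^{\infty} \left[1 - \int_{-\infty}^{C} \frac{1}{125\sqrt{2\pi}} \exp\!\left(-\frac{(D - 700)^2}{2 \times 125^2}\right) dD\right] \times \frac{1}{100\sqrt{2\pi}} \exp\!\left(-\frac{(C - 900)^2}{2 \times 100^2}\right) dC \]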
A small discretization of the dummy variable is necessary to obtain
correct results. The limits of integration are decided based on experience;
a value of (mean ± 6 times the standard deviation) would be adequate in
most cases.
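The numerical integration described above can be sketched as follows. This is a minimal midpoint-rule illustration, not the authors' procedure: the function names and the step of 1.0 are assumptions, and the limits are the mean ± 6 standard deviations of the capacity. For independent normal variables a closed form is also available, so the sketch evaluates both as a cross-check; both should come out near the roughly 10% quoted above.

```python
from math import sqrt
from statistics import NormalDist


def risk_random_capacity(mu_d, sd_d, mu_c, sd_c, step=1.0):
    """Integrate [1 - F_D(C)] f_C(C) dC by the midpoint rule."""
    demand, capacity = NormalDist(mu_d, sd_d), NormalDist(mu_c, sd_c)
    lower, upper = mu_c - 6 * sd_c, mu_c + 6 * sd_c
    n_slices = int(round((upper - lower) / step))
    total = 0.0
    for i in range(n_slices):
        c = lower + (i + 0.5) * step  # midpoint of the i-th slice
        total += (1.0 - demand.cdf(c)) * capacity.pdf(c) * step
    return total


def risk_closed_form(mu_d, sd_d, mu_c, sd_c):
    """P(D > C) for independent normal D and C, via the reliability index."""
    beta = (mu_c - mu_d) / sqrt(sd_c**2 + sd_d**2)
    return 1.0 - NormalDist().cdf(beta)


print(risk_random_capacity(700, 125, 900, 100))
print(risk_closed_form(700, 125, 900, 100))
```

A smaller step changes the numerical result very little here, which is one practical way to confirm that the discretization is fine enough.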
If D and C are correlated, their joint distribution needs to be found to
evaluate the integral. If both these variables follow a normal distribution,
the joint probability distribution is given by
[Figure: probability distribution functions of demand and capacity. Horizontal axis: demand, capacity; vertical axis: probability distribution function.]
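\[ f(c, d) = \frac{1}{2\pi\sigma_c\sigma_d\sqrt{1 - \rho^2}} \exp\!\left(-\frac{1}{2(1 - \rho^2)}\left[\frac{(c - \mu_c)^2}{\sigma_c^2} + \frac{(d - \mu_d)^2}{\sigma_d^2} - 2\rho\frac{(c - \mu_c)(d - \mu_d)}{\sigma_c\sigma_d}\right]\right) \]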
Example 14.7 Let the statistical parameters of water demand and capacity of the
supply system be the same as in Example 14.6 and assume these are correlated
with ρ = 0.2. Assuming their joint distribution to be normal, determine the prob-
ability P(C ≤ 900, D ≥ 900). Also determine this probability if ρ = 0.5.
Solution To obtain correct results by numerical integration, it is necessary to use
a small increment for the variables. In the present case, a value of 1.0 may be
adopted for both the variables. The results yield
P(C ≤ 900, D ≥ 900, ρ = 0.2) = 0.0186
P(C ≤ 900, D ≥ 900, ρ = 0.5) = 0.0069
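One way to reproduce these probabilities without a two-dimensional grid is sketched below. It is an assumption of this sketch, not the method of the example, that the double integral is reduced to a single integral over D by using the conditional normal distribution of C given D; the function name and the upper limit of eight standard deviations are likewise illustrative.

```python
from math import sqrt
from statistics import NormalDist


def joint_prob(rho, c0=900.0, d0=900.0, mu_c=900.0, sd_c=100.0,
               mu_d=700.0, sd_d=125.0, step=1.0):
    """P(C <= c0, D >= d0) for bivariate normal (C, D) with correlation rho."""
    demand = NormalDist(mu_d, sd_d)
    sd_cond = sd_c * sqrt(1.0 - rho**2)  # std. dev. of C given D = d
    upper = mu_d + 8.0 * sd_d            # practical upper limit for D
    n_slices = int(round((upper - d0) / step))
    total = 0.0
    for i in range(n_slices):
        d = d0 + (i + 0.5) * step
        mu_cond = mu_c + rho * sd_c * (d - mu_d) / sd_d  # mean of C given D = d
        total += demand.pdf(d) * NormalDist(mu_cond, sd_cond).cdf(c0) * step
    return total


print(round(joint_prob(0.2), 4))
print(round(joint_prob(0.5), 4))
```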
where X = (X1, X 2, ..., Xn), a vector containing n random variables. In FOA, a Tay-
lor series expansion of the model output is truncated after the first-order term:
\[ Y = g(X_e) + \sum_{i=1}^{n} (X_i - X_{ie}) \left(\frac{\partial g}{\partial X_i}\right)_{X_e} \tag{14.23} \]
where X e = (X1e, X2e, ..., Xne), a vector representing the expansion points. In FOA
applications to water resources and environmental engineering, the expansion
point is commonly the mean value of the basic variables. Thus, the expected
value and variance of Y are
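\[ E[Y] \approx g(\bar{X}) \tag{14.24} \]
\[ \mathrm{var}(Y) = \sigma_Y^2 \approx \sum_{i=1}^{n}\sum_{j=1}^{n} \left(\frac{\partial g}{\partial X_i}\right)_{\bar{X}_i} \left(\frac{\partial g}{\partial X_j}\right)_{\bar{X}_j} E\left[(X_i - \bar{X}_i)(X_j - \bar{X}_j)\right] \tag{14.25} \]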
where μXi = mean of Xi. By using Eq. 14.26, the variance of the multiplicative form, σ̂Y², can be approximated as
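\[ \hat{\sigma}_Y^2 = C_0^2 \prod_{i=1}^{n} \mu_{X_i}^{2r_i} \sum_{i=1}^{n} r_i^2 CV_{X_i}^2 \tag{14.29} \]
\[ \widehat{CV}_Y = \left(\frac{C_0^2 \prod_{i=1}^{n} \mu_{X_i}^{2r_i} \sum_{i=1}^{n} r_i^2 CV_{X_i}^2}{C_0^2 \prod_{i=1}^{n} \mu_{X_i}^{2r_i}}\right)^{0.5} = \left(\sum_{i=1}^{n} r_i^2 CV_{X_i}^2\right)^{0.5} \tag{14.30} \]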
The additive form is obtained when two or more power functions are added.
This form is often encountered in reliability analysis of engineering systems. The
general additive form is written as
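\[ Y = C_1 X_1^{r_1} + C_2 X_2^{r_2} + \cdots + C_n X_n^{r_n} = \sum_{i=1}^{n} C_i X_i^{r_i} \tag{14.31} \]
\[ \hat{\mu}_Y = \sum_{i=1}^{n} C_i \mu_{X_i}^{r_i} \tag{14.32} \]
\[ \hat{\sigma}_Y^2 = \sum_{i=1}^{n} C_i^2 r_i^2 \mu_{X_i}^{2r_i} CV_{X_i}^2 \tag{14.33} \]
\[ \widehat{CV}_Y = \frac{\sqrt{\sum_{i=1}^{n} C_i^2 r_i^2 \mu_{X_i}^{2r_i} CV_{X_i}^2}}{\sum_{i=1}^{n} C_i \mu_{X_i}^{r_i}} \tag{14.34} \]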
\[ Y = C_0 X_1^{r_1} X_2^{r_2} \cdots X_m^{r_m} \left(C_1 X_{m+1}^{r_{m+1}} + C_2 X_{m+2}^{r_{m+2}} + \cdots + C_n X_{m+n}^{r_{m+n}}\right) = C_0 \prod_{i=1}^{m} X_i^{r_i} \sum_{j=1}^{n} C_j X_{j+m}^{r_{j+m}} \tag{14.35} \]
For evaluating the mean and variance of combined forms, such as Eq. 14.35,
the mean and variance of the additive part must be determined first by using
Eq. 14.32 and Eq. 14.33. Next, Eqs. 14.28, 14.29, and 14.30 are used to determine
the mean, variance, and CV of Y by treating the combined form as a multiplica-
tive form and assuming the additive part as a multiplicative component with
known mean and variance.
Then the mean and variance of the performance function Z are determined. To
estimate the reliability of the system, ℜ, typically one assumes that Z is normally
distributed. Taking PZ(z) to be a normal distribution with its parameters E[Z] and
σZ determined by FOA, one can determine the risk and reliability of a given sys-
tem using the concept of reliability index β as discussed in the preceding section.
The great advantage of FOA is its simplicity: It requires knowledge of only the
first two statistical moments of the basic variables and simple sensitivity calcula-
tions about selected central values. FOA is an approximate method that may suf-
fice for many applications, but the method does have several theoretical and
conceptual shortcomings. The main weakness of the FOA method is that it is
assumed that a single linearization of the system performance function at the cen-
tral values of the basic variables is representative of the statistical properties of sys-
tem performance over the complete range of basic input variables. The accuracy of
the estimates is influenced in part by the degree of nonlinearity in the functional
relationship and the importance of higher-order terms, which are truncated in the
Taylor series expansion. In applying FOA in reliability analyses, it is generally
assumed that the performance function is normally distributed, which is seldom
true. Therefore, any attempt to characterize the tails of the actual distribution
based on an assumption of normality is likely to result in an inexact answer.
Example 14.8 Solve Example 14.2 using FOA by defining the performance func-
tion in two ways: (1) as the conventional MOS and (2) as the FOS.
Solution
Case 1: The solution for the conventional MOS method is given in Example 14.2.
Case 2: The conventional factor of safety is defined as FOS = capacity/load;
thus the performance function is
\[ Z = \frac{TMDL}{WLA + LA} - 1 \]
\[ \mu(Z) = \frac{\mu(TMDL)}{\mu(WLA) + \mu(LA)} - 1 = \frac{1678}{65 + 1258} - 1 = 0.27 \]
Using Eq. 14.26 gives the variance of the performance function:
\[ \sigma_Z^2 = \left[\frac{1}{\mu(WLA) + \mu(LA)}\right]^2 \left\{\sigma_{TMDL}^2 + \left[\frac{\mu(TMDL)}{\mu(WLA) + \mu(LA)}\right]^2 \left(\sigma_{WLA}^2 + \sigma_{LA}^2\right)\right\} \]
\[ = \left[\frac{1}{65 + 1258}\right]^2 \left\{436.28^2 + \left[\frac{1678}{65 + 1258}\right]^2 \left(11.7^2 + 415.14^2\right)\right\} = 0.267 \]
So,
σ Z = 0.517
The reliability index is
β = 0.27/0.517 = 0.52
and the probability of failure is
Pf = 1 – Φ(0.52)= 0.30
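The FOA steps of Case 2 can be checked with a short sketch. The function name is illustrative; the derivatives of Z = TMDL/(WLA + LA) – 1 are evaluated analytically at the mean values, and Z is treated as normally distributed, as in the text.

```python
from math import sqrt
from statistics import NormalDist


def foa_factor_of_safety(mu_tmdl, sd_tmdl, mu_wla, sd_wla, mu_la, sd_la):
    """First-order mean, variance, reliability index, and Pf of Z = TMDL/(WLA+LA) - 1."""
    total_load = mu_wla + mu_la
    mean_z = mu_tmdl / total_load - 1.0
    dz_dtmdl = 1.0 / total_load              # dZ/dTMDL at the means
    dz_dload = -mu_tmdl / total_load**2      # dZ/dWLA = dZ/dLA at the means
    var_z = dz_dtmdl**2 * sd_tmdl**2 + dz_dload**2 * (sd_wla**2 + sd_la**2)
    beta = mean_z / sqrt(var_z)
    return mean_z, var_z, beta, 1.0 - NormalDist().cdf(beta)


mean_z, var_z, beta, pf = foa_factor_of_safety(1678, 436.28, 65, 11.7, 1258, 415.14)
print(round(mean_z, 2), round(var_z, 3), round(beta, 2), round(pf, 2))
```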
\[ Z = C_{max} - aQ^b \]
First, let us calculate the mean and standard deviation of C using FOA.
Using Eq. 14.24 gives the mean of C:
Now,
E[Z]= E[Cmax] – E[C] = 10 – 10.75= –0.75
\[ C = C_0 \exp(-kt) \]
where C0 is the initial chlorine concentration (mg/L) and k is the overall decay constant (1/hour). Assuming the mean and coefficient of variation of k to be 0.14 per hour and 0.33, respectively, determine the reliability that a location having a travel time of 20 hours will have at least 0.2 mg/L of residual chlorine. Assume C0 = 4 mg/L.
Solution The water is safe when the concentration of free residual chlorine is
greater than or equal to 0.2 mg/L. The performance function Z can be defined as
\[ Z = C - C_{min} = C_0 e^{-kt} - C_{min} \]
with
σ k = 0.14 × 0.33 = 0.046
and
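\[ \mathrm{var}[Z] = \left(\frac{dC}{dk}\right)^2_{\mu_k} \sigma_k^2 = \left(-C_0 t e^{-kt}\right)^2_{\mu_k} \sigma_k^2 = \left(t^2 C_0^2 e^{-2kt}\right)_{\mu_k} \sigma_k^2 \]
\[ = \left(20 \times 4 \times \exp(-0.14 \times 20) \times 0.046\right)^2 = 0.051 \]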
Thus,
σ Z = 0.225
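A minimal sketch of this FOA calculation is given below. The function name and default arguments are illustrative; only k is treated as random, the derivative dC/dk is evaluated at the mean of k, and the final step assumes Z is normally distributed so that the reliability is Φ(β).

```python
from math import exp
from statistics import NormalDist


def foa_chlorine(c0=4.0, mu_k=0.14, cv_k=0.33, t=20.0, c_min=0.2):
    """FOA mean and standard deviation of Z = C0*exp(-k*t) - Cmin, with k random."""
    sd_k = mu_k * cv_k
    mean_z = c0 * exp(-mu_k * t) - c_min
    dz_dk = -c0 * t * exp(-mu_k * t)   # sensitivity of Z to k at the mean of k
    sd_z = abs(dz_dk) * sd_k
    beta = mean_z / sd_z
    reliability = NormalDist().cdf(beta)  # assumes Z is normally distributed
    return mean_z, sd_z, beta, reliability


mean_z, sd_z, beta, reliability = foa_chlorine()
print(round(mean_z, 3), round(sd_z, 3), round(beta, 2), round(reliability, 2))
```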
Assume that the foundation depth df = 12 m. The other parameters are given
in Table E14-11. The situation is depicted in Fig. 14-8.
Solution The objective function is defined as
Z = d f − ds
where df is foundation depth and ds is the scouring depth. The dam is safe when
the depth of scour is less than the foundation depth. By using Eq. 14.28, the mean
of ds is determined to be 3.46. Thus, μZ = 12 – 3.46 = 8.54. Now, using Eq. 14.30, we
can determine the coefficient of variation of ds as
\[ \widehat{CV}_{d_s} = \left(\sum_{i=1}^{n} r_i^2 CV_{X_i}^2\right)^{0.5} \]
\[ = \left[(0.862 \times 0.01)^2 + (0.891 \times 0.2)^2 + (1.128 \times 0.05)^2 + (-0.431 \times 0.2)^2 + (-2.01 \times 0.2)^2 + (1 \times 0.19)^2\right]^{0.5} = 0.49 \]
Thus,
σ ds = 3.46 × 0.49 = 1.7
Now
var(Z) = var(df) + var(ds)
and thus
σ Z = 1.7
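The coefficient of variation from Eq. 14.30 is easy to verify numerically. In the sketch below the exponents and CV values are copied from the expression above, the mean scour depth of 3.46 is taken from the text, and the function name is illustrative.

```python
from math import sqrt


def cv_multiplicative(exponents, cvs):
    """Eq. 14.30: FOA coefficient of variation of a product of power functions."""
    return sqrt(sum((r * cv) ** 2 for r, cv in zip(exponents, cvs)))


exponents = [0.862, 0.891, 1.128, -0.431, -2.01, 1.0]
cvs = [0.01, 0.2, 0.05, 0.2, 0.2, 0.19]

cv_ds = cv_multiplicative(exponents, cvs)
sd_ds = 3.46 * cv_ds  # standard deviation = mean x coefficient of variation
print(round(cv_ds, 2), round(sd_ds, 1))
```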
where E[ ] is an expectation operator and μYi is the mean of the ith power function,
\[ \mu_{Y_i} = E\left[X_i^{r_i}\right] \tag{14.37} \]
The coefficient of variation of Y, CVY , can be written as
\[ CV_Y = \left[\prod_{i=1}^{n} \left(CV_{Y_i}^2 + 1\right) - 1\right]^{0.5} \tag{14.38} \]
The variance of Y, σ Y2 , can be written as
\[ \sigma_Y^2 = (\mu_Y CV_Y)^2 \tag{14.39} \]
\[ Y = f(X) = cX^r \tag{14.42} \]
where c and r are constants. The FOA estimate for the mean μ̂Y is
\[ \hat{\mu}_Y = c\mu_X^r \tag{14.43} \]
\[ \hat{\sigma}_Y^2 = c^2 r^2 \mu_X^{2(r-1)} \sigma_X^2 = c^2 r^2 \mu_X^{2r} CV_X^2 \tag{14.44} \]
These estimates for μY and σ Y contain errors. The exact value of any
moment can be computed as
\[ \text{Exact value} = \frac{\text{FOA estimate}}{1 - Ε(\cdot)} \tag{14.45} \]
where Ε(.) is the relative error in a moment estimated using FOA. Analytical
relationships for Ε(.) in FOA estimates for means and variances of component
functions were developed (Tyagi 2000) for generic power and exponential func-
tions for five common distributions. These analytical expressions can be used as a
guide for judging the suitability of FOA by determining the relative errors in the
most sensitive parameters. Further, when the relative error is more than the
acceptable error, these analytical relationships enable one to correct the FOA esti-
mates for means and variances of model components to their true values. Using
these corrected values of means and variances of model components, one can
determine the exact values of mean and variance of an overall model output.
Table 14-2 and Table 14-3 present the developed expressions for Ε(μ̂Y) and Ε(σ̂Y²) for a generic power function Y = cX^r. The correction factors for the normal distribution have been presented graphically in Fig. 14-9 and Fig. 14-10. In Table 14-2 and Table 14-3, to avoid the singularity at r = –1, r should be taken as –0.9999, and to avoid the singularity at r = –2, r should be taken as –1.9999. Similarly, Table 14-4 and Table 14-5 present the developed expressions for Ε(μ̂Y) and Ε(σ̂Y²) for a generic exponential function Y = be^{cX}.
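The correction of Eq. 14.45 can be illustrated for the mean of a power function. The sketch below uses the gamma and exponential expressions for Ε(μ̂Y) as they can be read from Table 14-2; the function names are illustrative, and the gamma expression is evaluated with log-gamma functions, which assumes CV^–2 + r > 0.

```python
from math import exp, gamma, lgamma, log


def rel_err_mean_exponential(r):
    """Relative error in the FOA mean of X**r when X is exponentially distributed."""
    return 1.0 - 1.0 / gamma(r + 1.0)


def rel_err_mean_gamma(r, cv):
    """Relative error in the FOA mean of X**r when X is gamma distributed."""
    k = cv ** -2  # shape parameter of the gamma distribution
    # CV^(-2r) * Gamma(k) / Gamma(k + r), evaluated in logarithms for stability
    return 1.0 - exp(-2.0 * r * log(cv) + lgamma(k) - lgamma(k + r))


def corrected(foa_estimate, rel_err):
    """Eq. 14.45: exact value = FOA estimate / (1 - relative error)."""
    return foa_estimate / (1.0 - rel_err)


# X gamma distributed with mean 10 and CV 0.5, Y = X**2: FOA mean is 10**2
error = rel_err_mean_gamma(2, 0.5)
print(error, corrected(10.0**2, error))
```

For r = 2 the result can be checked by hand, because E[X²] = μ²(1 + CV²) for any distribution; with CV = 0.5 the relative error is 1 – 1/1.25 = 0.2.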
Example 14.12 Solve Example 14.8 using the corrected FOA method.
Solution
Case 1: As the performance function Z = TMDL – (WLA + LA) is linear, there
is no error involved in the FOA application. So, both the reliability index β and
probability of failure will remain the same as calculated in Example 14.8.
Case 2: In this case the performance function is nonlinear:
\[ Z = \frac{TMDL}{WLA + LA} - 1 = \frac{TMDL}{TL} - 1 = TMDL \times TL^{-1} - 1 \]
where TL = WLA + LA. First, the mean and variance of TL are determined:
E[TL] = E[WLA] + E[LA] = 65 + 1258 = 1323
Table 14-2 Generalized relative error in the FOA predicted mean of a power function
Distribution: relative error in FOA predicted mean, Ε(μ̂Y)
Uniform: \( 1 - \dfrac{2\sqrt{3}\,(r+1)\,CV_X}{\left(1 + \sqrt{3}\,CV_X\right)^{r+1} - \left(1 - \sqrt{3}\,CV_X\right)^{r+1}} \)
Symmetrical triangular: \( 1 - \dfrac{6\,(r+1)(r+2)\,CV_X^2}{\left(1 + \sqrt{6}\,CV_X\right)^{r+2} + \left(1 - \sqrt{6}\,CV_X\right)^{r+2} - 2} \)
Normal: \( 1 - \left(1 + CV_X^2\right)^{r(1-r)/2} \)
Gamma: \( 1 - \dfrac{CV_X^{-2r}\,\Gamma\!\left(CV_X^{-2}\right)}{\Gamma\!\left[CV_X^{-2}\left(1 + r\,CV_X^2\right)\right]} \)
Exponential: \( 1 - \dfrac{1}{\Gamma(r+1)} \)
Figure 14-9 Relative error in FOA predicted mean for CVx ranging from 0.01 to 0.33, where X is normally distributed. [Plot of relative error against exponent, for exponents from –10 to 10, with one curve per CVx value.]
Table 14-3 Generalized relative error in the FOA predicted variance of a power function.
Distribution: relative error in FOA predicted variance, Ε(σ̂Y²)
Uniform: \( 1 - \dfrac{12\,(2r+1)\,r^2 (r+1)^2\,CV_X^4}{2\sqrt{3}\,CV_X (r+1)^2 \left[\left(1 + \sqrt{3}\,CV_X\right)^{2r+1} - \left(1 - \sqrt{3}\,CV_X\right)^{2r+1}\right] - (2r+1)\left[\left(1 + \sqrt{3}\,CV_X\right)^{r+1} - \left(1 - \sqrt{3}\,CV_X\right)^{r+1}\right]^2} \)
Symmetrical triangular: \( 1 - \dfrac{36\,(2r+1)\,r^2 (r+1)^2 (r+2)^2\,CV_X^6}{3\,(r+1)(r+2)^2\,CV_X^2 \left[\left(1 + \sqrt{6}\,CV_X\right)^{2r+2} + \left(1 - \sqrt{6}\,CV_X\right)^{2r+2} - 2\right] - (2r+1)\left[\left(1 + \sqrt{6}\,CV_X\right)^{r+2} + \left(1 - \sqrt{6}\,CV_X\right)^{r+2} - 2\right]^2} \)
Normal: \( 1 - \dfrac{r^2\,CV_X^2 \left(CV_X^2 + 1\right)^{r}}{\left(CV_X^2 + 1\right)^{r^2}\left[\left(CV_X^2 + 1\right)^{r^2} - 1\right]} \)
Gamma: \( 1 - \dfrac{r^2\,CV_X^{2(1-2r)} \left[\Gamma\!\left(CV_X^{-2}\right)\right]^2}{\Gamma\!\left[CV_X^{-2}\left(1 + 2r\,CV_X^2\right)\right]\Gamma\!\left(CV_X^{-2}\right) - \left\{\Gamma\!\left[CV_X^{-2}\left(1 + r\,CV_X^2\right)\right]\right\}^2} \)
Exponential: \( 1 - \dfrac{r^2}{\Gamma(2r+1) - \Gamma^2(r+1)} \)
Figure 14-10 Relative error in FOA predicted variance for CVx ranging from 0.01 to 0.33, where X is normally distributed. [Plot of relative error against exponent, for exponents from –10 to 10, with one curve per CVx value.]
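Table 14-4 Generalized relative error in FOA predicted mean of an exponential function.
Distribution: relative error in FOA predicted mean, Ε(μ̂Y)
Uniform: \( 1 - \dfrac{2\sqrt{3}\,c\mu_x CV_x\, e^{\sqrt{3}\,c\mu_x CV_x}}{e^{2\sqrt{3}\,c\mu_x CV_x} - 1} \)
Symmetrical triangular: \( 1 - \dfrac{6\,c^2\mu_x^2 CV_x^2\, e^{\sqrt{6}\,c\mu_x CV_x}}{\left(e^{\sqrt{6}\,c\mu_x CV_x} - 1\right)^2} \)
Normal: \( 1 - \exp\!\left[-\tfrac{1}{2}\,c^2\mu_x^2 CV_x^2\right] \)
Gamma: \( 1 - \left(1 - c\mu_x CV_x^2\right)^{1/CV_x^2} \exp(c\mu_x) \)
Exponential: \( 1 - (1 - c\mu_x)\exp(c\mu_x) \)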
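Table 14-5 Generalized relative error in FOA predicted variance of an exponential function.
Distribution: relative error in FOA predicted variance, Ε(σ̂Y²)
Uniform: \( 1 - \dfrac{12\,c^4\mu_x^4 CV_x^4\, e^{2\sqrt{3}\,c\mu_x CV_x}}{\left(e^{2\sqrt{3}\,c\mu_x CV_x} - 1\right)\left[\left(\sqrt{3}\,c\mu_x CV_x - 1\right)e^{2\sqrt{3}\,c\mu_x CV_x} + \sqrt{3}\,c\mu_x CV_x + 1\right]} \)
Symmetrical triangular: \( 1 - \dfrac{72\,c^6\mu_x^6 CV_x^6\, e^{2\sqrt{6}\,c\mu_x CV_x}}{\left(e^{\sqrt{6}\,c\mu_x CV_x} - 1\right)^2\left[\left(3c^2\mu_x^2 CV_x^2 - 2\right)\left(e^{2\sqrt{6}\,c\mu_x CV_x} + 1\right) + 2e^{\sqrt{6}\,c\mu_x CV_x}\left(3c^2\mu_x^2 CV_x^2 + 2\right)\right]} \)
Normal: \( 1 - \dfrac{c^2\sigma_x^2}{\exp\!\left(c^2\sigma_x^2\right)\left[\exp\!\left(c^2\sigma_x^2\right) - 1\right]} \)
Gamma: \( 1 - \dfrac{c^2\mu_x^2 CV_x^2 \exp(2c\mu_x)}{\left(1 - 2c\mu_x CV_x^2\right)^{-1/CV_x^2} - \left(1 - c\mu_x CV_x^2\right)^{-2/CV_x^2}} \)
Exponential: \( 1 - \dfrac{c^2\mu_x^2 \exp(2c\mu_x)}{(1 - 2c\mu_x)^{-1} - (1 - c\mu_x)^{-2}} \)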
Pf estimate may contain error; the correct Pf should have been 28%. The cor-
rected FOA method is particularly useful when only mean and variance of a
model output is needed (e.g., in uncertainty analysis). It is advised that one
should determine higher moments and find a suitable distribution for Z.
Using this distribution along with its parameters one can calculate the cor-
rect value of Pf. This will be further discussed in another approach, called the
generic expectation function approach.
Example 14.13 Solve Example 14.9 using the corrected FOA method assuming
Q is characterized by (1) a normal distribution, (2) a log-normal distribution, and
(3) a gamma distribution.
Solution Substituting values of a, b, and Cmax one can write the performance
function as
\[ Z = C_{max} - C = 8 - (2.10 \times 10^{-8})\,Q^3 \]
First, let us calculate the mean and standard deviation of C using the corrected FOA method for the simple power function Q³. The calculation is presented in the tabular form in Table E14-13. Since the table only considers the power function Q³, the mean of the power function C = b × Q³ is determined as
The CV will remain unaffected by the constant b so var(C) will be the same as
calculated in column 11. This information can be used to calculate the mean and
standard deviation of the performance function for three cases.
1. For the normal distribution,
E[Z]= E[Cmax] – E[C]
Example 14.14 Solve Example 14.10 using the corrected FOA method assuming k is characterized by (1) normal and (2) gamma distributions.
Solution The water is safe when the concentration of free residual chlorine is greater than or equal to 0.2 mg/L. The performance function Z can be defined as
\[ Z = C - C_{min} = C_0 e^{-kt} - C_{min} \]
Now knowing the mean and variance of C, one can determine the mean and variance of the performance function Z = C – Cmin using the theory of statistical expectation. These calculations are shown in Table E14-14b.
Example 14.15 Solve Example 14.11 using the corrected FOA method.
\[ \mu_{d_s} = E[d_s] = C_0 \prod_{i=1}^{n} E\left[X_i^{r_i}\right] = C_0 \prod_{i=1}^{n} \mu_{Y_i} \]
\[ = 110 \times 0.35 \times 5.65 \times 0.0025 \times 0.56 \times 12.87 \times 1 = 4.01 \]
FOA mean   Rel. error   Corr. mean   FOA var.     Rel. error    Corr. var.    CV²
0.35       0.00         0.35         0.0051       7.50 × 10⁻⁵   0.0051        7.43 × 10⁻⁵
5.66       0.00         5.65         1.02         0.00          1.02          0.03
0.0025     0.0002       0.0025       2.05 × 10⁻⁸  –0.0002       2.05 × 10⁻⁸   0.0032
0.55       0.01         0.56         0.00         0.14          2.62 × 10⁻³   0.01
11.246     0.13         12.874       20.44        0.44          36.57         0.22
\[ CV_{d_s} = \left[\prod_{i=1}^{n} \left(CV_{Y_i}^2 + 1\right) - 1\right]^{0.5} \]
\[ = \left[\left(1 + 7.43 \times 10^{-5}\right)(1 + 0.03)(1 + 0.0032)(1 + 0.01)(1 + 0.22)(1 + 0.04) - 1\right]^{0.5} = 0.57 \]
Thus the standard deviation of ds is σds = μds × CVds = 2.27, and the mean of Z is
μZ = 12 – 4.01 = 7.99
Because var(Z) = var(ds),
σZ = σds = 2.27
population mean. The error in the estimation of the population mean is inversely
proportional to the square root of the number of trials. To improve the estimate
by a factor of 2, the sample size must increase by a factor of 4. If the sample size
is n, the standard deviation of the mean is 1/√n times the standard deviation of
the population. This indicates that the sample size must be large. As the sample
size increases, the precision of the empirical percentile estimates of a model out-
put improves. However, the rate of convergence to the true distribution
decreases as the size of the sample increases.
The requirement of generating very large samples poses a serious problem.
The MCS method often entails sample sizes that are in the range of 5,000 to
20,000 members. Generally, the number of required samples increases with the
variances and the coefficient of skewness of the input distributions.
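A minimal Monte Carlo sketch is shown below. It is an illustration rather than the procedure of any particular example: the TMDL data of Example 14.8 are reused, failure is taken as TMDL < WLA + LA, the three variables are sampled as independent normals, and the function names and seed are arbitrary. The standard error of the estimated probability shrinks as 1/√n, which is the behavior described above.

```python
import random
from math import sqrt


def mcs_pf(n, seed=1):
    """Monte Carlo estimate of P(TMDL < WLA + LA) with independent normal inputs."""
    rng = random.Random(seed)
    failures = 0
    for _ in range(n):
        tmdl = rng.gauss(1678.0, 436.28)
        wla = rng.gauss(65.0, 11.7)
        la = rng.gauss(1258.0, 415.14)
        if tmdl < wla + la:
            failures += 1
    return failures / n


def standard_error(p, n):
    """Standard error of an estimated probability from n independent trials."""
    return sqrt(p * (1.0 - p) / n)


for n in (1_000, 4_000, 16_000):
    p = mcs_pf(n)
    print(n, round(p, 4), round(standard_error(p, n), 4))
```

Each fourfold increase in n roughly halves the standard error.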
Another simulation technique similar to MCS is Latin hypercube sampling
(LHS) in which a stratified sampling approach is used. In LHS the probability
distribution of each basic variable is subdivided into nonoverlapping intervals
(say, m) each with equal probability (1/m). Random values of the basic variables
are simulated such that each range is sampled only once. The order of the selec-
tion of the ranges is randomized and the model is executed m times with the ran-
dom combination of basic variables from each range for each basic variable. The
output statistics and distributions may then be approximated from the sample of
m output values. It has been shown that the stratified sampling procedure of
LHS converges more quickly than an equidistribution sampling employed in
MCS. Except for reducing the computation effort to some extent, LHS has the
same problems that are associated with MCS.
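The stratification step of LHS can be sketched as follows. This is one common way to implement it, assumed here for illustration: each of the m equal-probability intervals is sampled once through the inverse CDF, and the order is then shuffled so that the pairing between variables is random. The TMDL data and the failure condition TMDL < WLA + LA are reused from Example 14.8, and the function names are hypothetical.

```python
import random
from statistics import NormalDist


def lhs_normal(mu, sd, m, rng):
    """Return m values of a normal variable, one from each equal-probability stratum."""
    dist = NormalDist(mu, sd)
    values = []
    for k in range(m):
        u = (k + rng.random()) / m          # random point inside stratum k
        values.append(dist.inv_cdf(max(u, 1e-12)))
    rng.shuffle(values)                     # randomize the order of the strata
    return values


def lhs_pf(m, seed=1):
    """LHS estimate of P(TMDL < WLA + LA) from m model runs."""
    rng = random.Random(seed)
    tmdl = lhs_normal(1678.0, 436.28, m, rng)
    wla = lhs_normal(65.0, 11.7, m, rng)
    la = lhs_normal(1258.0, 415.14, m, rng)
    failures = sum(t < w + l for t, w, l in zip(tmdl, wla, la))
    return failures / m


print(lhs_pf(2000))
```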
\[ Y = g(X_e) + \sum_{i=1}^{n} (X_i - X_{ie}) \left(\frac{\partial g}{\partial X_i}\right)_{X_e} + \frac{1}{2}\sum_{i=1}^{n} (X_i - X_{ie})^2 \left(\frac{\partial^2 g}{\partial X_i^2}\right)_{X_e} \tag{14.47} \]
In SOA, the expansion point is commonly the mean value of the basic vari-
ables. By considering all input variables to be statistically independent and tak-
ing the expectation of Eq. 14.47, the expected value Y is given as
\[ E[Y] \approx g(\bar{X}) + \frac{1}{2}\sum_{i=1}^{n} \left(\frac{\partial^2 g}{\partial X_i^2}\right) \mathrm{var}(X_i) \tag{14.48} \]
2 2 2 2
n ⎛ ∂g ⎞ 1 n ⎛ ∂2 g ⎞ n ⎛
∂g ⎞ ⎛ ∂ 2 g ⎞
(
⎟ E ⎡⎢ Xi − Xi )
3⎤
var ( Y ) = ≈ ∑⎜
σ Y2 ⎟ var ( Xi ) − ∑ ⎜⎜ 2 ⎟⎟ var ( Xi ) + ∑ ⎜
2
⎟ ⎜
i =1 ⎝ ∂X i ⎠ X i 4 i =1 ⎝ ∂Xi ⎠ i =1 ⎝ ∂Xi ⎠X i ⎜⎝ ∂Xi2 ⎟⎠ ⎣ ⎥⎦
Xi Xi
2
n ⎛ 2 ⎞
∂ g n −1 n ⎛∂ g⎞ ∂ g
2 ⎛ 2 ⎞
1
( )
+ ∑ ⎜ 2 ⎟ E ⎡ Xi − Xi ⎤ + ∑ ∑ ⎜ 2 ⎟ ⎜ 2 ⎟ var ( Xi ) var X j ( )
4
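\[ E[Z] \approx \sum_{i=1}^{n} \left(\frac{\partial Z}{\partial X_i}\right)_{X_i^*} \left(\bar{X}_i - X_i^*\right) \tag{14.50} \]
\[ \mathrm{var}(Z) = \sigma_Z^2 \approx \sum_{i=1}^{n}\sum_{j=1}^{n} \left(\frac{\partial Z}{\partial X_i}\right)_{X_i^*} \left(\frac{\partial Z}{\partial X_j}\right)_{X_j^*} E\left[\left(X_i - X_i^*\right)\left(X_j - X_j^*\right)\right] \tag{14.51} \]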
\[ \Phi\!\left(\frac{x_i^* - \mu_{X_i}^N}{\sigma_{X_i}^N}\right) = P_{X_i}\!\left(x_i^*\right) \tag{14.53} \]
where μ^N_Xi and σ^N_Xi are the mean value and standard deviation of the equivalent normal distribution for Xi; P_Xi(xi*) is the original CDF of Xi; and Φ(.) is the CDF of the standard normal distribution. Using Eq. 14.53, one can write the mean of the equivalent normal distribution as
\[ \mu_{X_i}^N = x_i^* - \sigma_{X_i}^N\, \Phi^{-1}\!\left[P_{X_i}\!\left(x_i^*\right)\right] \tag{14.54} \]
\[ \frac{1}{\sigma_{X_i}^N}\, \phi\!\left(\frac{x_i^* - \mu_{X_i}^N}{\sigma_{X_i}^N}\right) = p_{X_i}\!\left(x_i^*\right) \tag{14.55} \]
where φ(.) is the PDF of the standard normal distribution. Combining Eq. 14.54
and Eq. 14.55, one obtains the standard deviation of the equivalent normal dis-
tribution as
\[ \sigma_{X_i}^N = \frac{\phi\!\left\{\Phi^{-1}\!\left[P_{X_i}\!\left(x_i^*\right)\right]\right\}}{p_{X_i}\!\left(x_i^*\right)} \tag{14.56} \]
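Equations 14.54 and 14.56 translate directly into a few lines of code. The sketch below applies them to a log-normal variable as an illustration; the helper names, the choice of a log-normal distribution, and the numerical values (mean 0.45, CV 0.33, point 0.52) are assumptions made for the example. For a log-normal variable the result can be checked analytically, since the equivalent standard deviation reduces to x*ζ, where ζ is the standard deviation of ln X.

```python
from math import log, sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()


def equivalent_normal(x_star, cdf, pdf):
    """Mean and standard deviation of the equivalent normal at x_star (Eqs. 14.54, 14.56)."""
    z = STD_NORMAL.inv_cdf(cdf(x_star))
    sd_n = STD_NORMAL.pdf(z) / pdf(x_star)   # Eq. 14.56
    mu_n = x_star - sd_n * z                 # Eq. 14.54
    return mu_n, sd_n


def lognormal_functions(mean, cv):
    """CDF and PDF of a log-normal variable with the given mean and CV."""
    zeta = sqrt(log(1.0 + cv**2))            # std. dev. of ln X
    lam = log(mean) - 0.5 * zeta**2          # mean of ln X

    def cdf(x):
        return STD_NORMAL.cdf((log(x) - lam) / zeta)

    def pdf(x):
        return STD_NORMAL.pdf((log(x) - lam) / zeta) / (x * zeta)

    return cdf, pdf, lam, zeta


cdf, pdf, lam, zeta = lognormal_functions(0.45, 0.33)
mu_n, sd_n = equivalent_normal(0.52, cdf, pdf)
print(mu_n, sd_n)
```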
The key to FORM is the determination of the failure point for the Taylor
series expansion. Shinozuka (1983) has shown that for FORM the reliability
index β is the shortest distance in the standardized space between the system
mean state and the failure surface. Thus, if the failure point is determined cor-
rectly, it represents the most likely combination of input variable values that pro-
duce the critical target level. Ang and Tang (1984) and Haldar and Mahadevan
(2000) present detailed mathematical treatment and interpretation of FORM. The
Hasofer and Lind approach can be summarized as follows:
1. Formulate the performance function or the limit state in terms of the
original design space, that is, X = {X1, X2, X3, …,Xn} [the X space as
shown in Fig. 14-11(a)].
2. Define the independent and standardized normal vector U = {U1, U2, U3,
…,Un} by transforming the input variables into an equivalent normal
distribution.
3. Transform the performance function into the standard normal space [the
U space as shown in Fig. 14-11(b)].
4. Search for the minimum β .
5. Determine the probability of failure or reliability of the system corre-
sponding to the obtained reliability index β.
Figure 14-11 illustrates the concept of reliability index and MPP search for a
two-variable case in the standard normal space. After completing steps 1 and 2, one
focuses on the transformed performance function curve [i.e., G(U1, U2) = 0]. Next,
among the various possible β values, the minimum β is sought. The corresponding
point is called the MPP. The process of determining the minimum β value can be
mathematically expressed as follows:
\[ \text{minimize } \beta = \sqrt{U_1^2 + U_2^2} \]
Figure 14-11 Transformation of input variables, nonlinear limit state function, MPP, and reliability index in FORM. [Panels: (a) X space with the limit state g(X1, X2, …, Xn) = 0; (b) U space with the limit state G(U1, U2, …, Un) = 0, the MPP, the tangent at the MPP, and the distance β.]
For a more general system consisting of n input variables, Eq. 14.57 can be
written as
\[ \text{minimize } \beta = \sqrt{\sum_{i=1}^{n} U_i^2} = \sqrt{U^T U} \]
Therefore, the difficulty then lies in determining the minimum distance for a
general nonlinear function. This is essentially a nonlinear, constrained optimiza-
tion problem. Thus, determination of β requires application of a constrained
nonlinear optimization, such as the generalized reduced-gradient algorithm
used by Cheng (1982), the Lagrange multiplier approach used by Shinozuka
(1983), or the iterative optimization method suggested by Rackwitz (1976).
\[ \alpha_i = \frac{\left(\partial g / \partial U_i\right)_{U^*}}{\sqrt{\sum_{i=1}^{n} \left(\partial g / \partial U_i\right)^2_{U^*}}} \tag{14.59} \]
The function L is called the Lagrangean function and the parameter λ the
Lagrange multiplier. Taking the partial derivatives of L with respect to Xi and λ
and equating to zero one gets the following equations:
\[ \frac{\partial L}{\partial X_i} = 0 \tag{14.61} \]
\[ \frac{\partial L}{\partial \lambda} = 0 \tag{14.62} \]
Solving Eq. 14.61 and Eq. 14.62, one gets the coordinates of the MPP and can
evaluate the reliability index as
\[ \beta = \sqrt{\sum_{i=1}^{n} \left(\frac{X_i - \mu_i}{\sigma_i}\right)^2} \]
Example 14.16 Solve case 2 of Example 14.8 using the Rackwitz and Lagrange
multiplier methods.
Solution The performance function is given as
\[ Z = \frac{TMDL}{WLA + LA} - 1 \]
Rackwitz’s Method
Step 1: In this example all the variables are assumed to be independent and
normally distributed. The first step is to transform the random variables into
the reduced space as
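\[ U_1 = \frac{TMDL - \mu_{TMDL}}{\sigma_{TMDL}}, \qquad U_2 = \frac{WLA - \mu_{WLA}}{\sigma_{WLA}}, \qquad U_3 = \frac{LA - \mu_{LA}}{\sigma_{LA}} \]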
where the mean and standard deviations are given in Table E14-16a.
The first iteration proceeds as follows:
Step 2: Assume initial values for each input random variable,
U *1 = U *2 = U *3 = 0 ; that is, TMDL* = 1678 , WLA* = 65, and LA* = 1258.
Step 3: Evaluate the partial derivatives:
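\[ \frac{\partial g}{\partial U_1} = \sigma_{TMDL}\,\frac{1}{WLA + LA} \]
\[ \frac{\partial g}{\partial U_2} = \sigma_{WLA}\,\frac{-TMDL}{(WLA + LA)^2} \]
\[ \frac{\partial g}{\partial U_3} = \sigma_{LA}\,\frac{-TMDL}{(WLA + LA)^2} \]
For the assumed MPP, the values of the partial derivatives are
\[ \left.\frac{\partial g}{\partial U_1}\right|_{U^*} = 436 \times \frac{1}{65 + 1258} = 0.33 \]
\[ \left.\frac{\partial g}{\partial U_2}\right|_{U^*} = 12 \times \frac{-1678}{(65 + 1258)^2} = -0.01 \]
\[ \left.\frac{\partial g}{\partial U_3}\right|_{U^*} = 415 \times \frac{-1678}{(65 + 1258)^2} = -0.40 \]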
Table E14-16a
Variable Standard deviation Mean
TMDL 436 1678
WLA 12 65
LA 415 1258
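\[ \alpha_1 = \frac{\left(\partial g / \partial U_1\right)_{U^*}}{\sqrt{\sum_{i=1}^{3} \left(\partial g / \partial U_i\right)^2_{U^*}}} = \frac{0.33}{\sqrt{(0.33)^2 + (-0.01)^2 + (-0.40)^2}} = 0.64 \]
\[ \alpha_2 = \frac{\left(\partial g / \partial U_2\right)_{U^*}}{\sqrt{\sum_{i=1}^{3} \left(\partial g / \partial U_i\right)^2_{U^*}}} = \frac{-0.01}{\sqrt{(0.33)^2 + (-0.01)^2 + (-0.40)^2}} = -0.02 \]
\[ \alpha_3 = \frac{\left(\partial g / \partial U_3\right)_{U^*}}{\sqrt{\sum_{i=1}^{3} \left(\partial g / \partial U_i\right)^2_{U^*}}} = \frac{-0.40}{\sqrt{(0.33)^2 + (-0.01)^2 + (-0.40)^2}} = -0.77 \]
and so
\[ \beta = \frac{1678 - 1323}{319.79 + 278.84} = 0.593 \]
Thus the revised failure point is
TMDL = 1678 − 278.29 × 0.593 = 1513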
Step 7: Using the result from step 6 as a new starting point, steps 2 to 6 are repeated until the starting point in step 2 and the resulting solution in step 6 converge to the same values. The new iterations are presented in Table E14-16b. Note that the solution converges in the fourth iteration. Another point to be noted in this example is that the reliability index β remains the same in all iterations, which need not be the case in general.
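The iteration can be sketched in code. The version below uses the standard Hasofer-Lind/Rackwitz-Fiessler recursion in the reduced space, which may differ in detail from the step-by-step procedure of this example, so it should be read as an illustration under that assumption. The means and standard deviations are those of Table E14-16a; the function names are hypothetical.

```python
from math import sqrt

MEANS = (1678.0, 65.0, 1258.0)   # TMDL, WLA, LA
SDS = (436.0, 12.0, 415.0)


def to_x(u):
    """Map reduced variates U back to the original variables."""
    return tuple(m + s * ui for m, s, ui in zip(MEANS, SDS, u))


def g(u):
    """Performance function Z = TMDL/(WLA + LA) - 1 in the reduced space."""
    tmdl, wla, la = to_x(u)
    return tmdl / (wla + la) - 1.0


def grad_g(u):
    """Partial derivatives of g with respect to U1, U2, U3."""
    tmdl, wla, la = to_x(u)
    total = wla + la
    return (SDS[0] / total, -SDS[1] * tmdl / total**2, -SDS[2] * tmdl / total**2)


def form_beta(tol=1e-10, max_iter=100):
    """Iterate to the most probable failure point; return beta and the point in X space."""
    u = (0.0, 0.0, 0.0)              # start at the mean values
    for _ in range(max_iter):
        grad = grad_g(u)
        norm2 = sum(d * d for d in grad)
        scale = (sum(d * ui for d, ui in zip(grad, u)) - g(u)) / norm2
        u_new = tuple(scale * d for d in grad)
        converged = max(abs(a - b) for a, b in zip(u_new, u)) < tol
        u = u_new
        if converged:
            break
    beta = sqrt(sum(ui * ui for ui in u))
    return beta, to_x(u)


beta, x_star = form_beta()
print(round(beta, 3), [round(x, 2) for x in x_star])
```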
\[ L = \sum_{i=1}^{n} \left(\frac{X_i - \mu_i}{\sigma_i}\right)^2 + \lambda\left(\frac{X_1}{X_2 + X_3} - 1\right) \tag{14.63} \]
in which X1, X2, and X3 are TMDL, WLA, and LA, and μi and σi are the mean and standard deviation of the respective Xi. Taking the partial derivatives of L and setting them equal to zero, one obtains the following four nonlinear equations:
\[ 2\,(X_1 - \mu_1) + \sigma_1^2\,\frac{\lambda}{X_2 + X_3} = 0 \tag{14.64} \]
\[ 2\,(X_2 - \mu_2) - \sigma_2^2\,\frac{\lambda X_1}{(X_2 + X_3)^2} = 0 \tag{14.65} \]
Table E14-16b
Iteration   Variable, Xi   Assumed failure point Xi   Standard deviation   Mean Xi   ∂g/∂Ui   αi      New Xi
2nd         TMDL           1513.98                    436                  1678      0.289    0.724   1491.91
2nd         WLA            65.15                      12                   65        –0.008   0.019   65.13
2nd         LA             1446.35                    415                  1258      –0.275   0.69    1426.77
2nd         β = 0.589
3rd         TMDL           1491.91                    436                  1678      0.292    0.724   1491.76
3rd         WLA            65.13                      12                   65        –0.008   0.019   65.13
3rd         LA             1426.77                    415                  1258      –0.278   0.689   1426.63
3rd         β = 0.589
4th         TMDL           1491.76                    436                  1678      0.292    0.724   1491.76
4th         WLA            65.13                      12                   65        –0.008   0.019   65.13
4th         LA             1426.63                    415                  1258      –0.278   0.689   1426.63
4th         β = 0.589, Pf = 1 − Φ(0.59) = 0.28
\[ 2\,(X_3 - \mu_3) - \sigma_3^2\,\frac{\lambda X_1}{(X_2 + X_3)^2} = 0 \tag{14.66} \]
\[ \frac{X_1}{X_2 + X_3} - 1 = 0 \tag{14.67} \]
Table E14-16c
Item                       TMDL      WLA      LA        λ
Mean                       1678      65       1258
CV                         0.26      0.18     0.33
Standard deviation         436       12       415
Assumed solution, Xi       1491.76   65.13    1426.63   2.92
Ui² = [(Xi − μi)/σi]²      0.1822    0.0001   0.1650
β = √(Σ [(Xi − μi)/σi]²) = 0.589
Z = 0.000
Pf = 0.278
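The tabulated solution can be checked by substituting it into Eqs. 14.64 to 14.67. The sketch below does this with a closed-form point rather than an iterative solver; that shortcut is an assumption of the sketch and works only because the constraint X1/(X2 + X3) = 1 is equivalent to the plane X1 = X2 + X3. The function names are illustrative.

```python
from math import sqrt

MEANS = (1678.0, 65.0, 1258.0)   # TMDL, WLA, LA
SDS = (436.0, 12.0, 415.0)


def lagrange_residuals(x, lam):
    """Left-hand sides of Eqs. 14.64 to 14.67; all should vanish at the solution."""
    x1, x2, x3 = x
    total = x2 + x3
    return (
        2.0 * (x1 - MEANS[0]) + SDS[0] ** 2 * lam / total,
        2.0 * (x2 - MEANS[1]) - SDS[1] ** 2 * lam * x1 / total**2,
        2.0 * (x3 - MEANS[2]) - SDS[2] ** 2 * lam * x1 / total**2,
        x1 / total - 1.0,
    )


def closed_form_mpp():
    """Nearest point (in standardized distance) on the plane X1 - X2 - X3 = 0."""
    coeffs = (1.0, -1.0, -1.0)
    margin = sum(a * m for a, m in zip(coeffs, MEANS))
    s2 = sum((a * s) ** 2 for a, s in zip(coeffs, SDS))
    x = tuple(m - a * s**2 * margin / s2 for a, m, s in zip(coeffs, MEANS, SDS))
    lam = -2.0 * (x[0] - MEANS[0]) * (x[1] + x[2]) / SDS[0] ** 2  # from Eq. 14.64
    return x, lam, margin / sqrt(s2)


x_star, lam, beta = closed_form_mpp()
print([round(v, 2) for v in x_star], round(lam, 2), round(beta, 3))
print(lagrange_residuals(x_star, lam))
```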
\[ \beta = \min_{X \in F} \sqrt{(X - \mu)^T C^{-1} (X - \mu)} \tag{14.68} \]
where X represents the vector of the random variables, μ are the corresponding mean values, C is the covariance matrix, F is the failure region, and T indicates the transpose.
\[ (X - \mu)^T C^{-1} (X - \mu) = 1 \tag{14.69} \]
\[ Z = 25.50 - 1.41X_1 + 0.039X_1^2 - X_2 \]
containing only two random variables X1 and X2. For this case Eq. 14.69 reduces to
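\[ \begin{bmatrix} (X_1 - \mu_1) & (X_2 - \mu_2) \end{bmatrix} \begin{bmatrix} \sigma_1^2 & \rho\sigma_1\sigma_2 \\ \rho\sigma_1\sigma_2 & \sigma_2^2 \end{bmatrix}^{-1} \begin{bmatrix} (X_1 - \mu_1) \\ (X_2 - \mu_2) \end{bmatrix} = 1 \tag{14.70} \]
\[ \frac{(X_1 - \mu_1)^2}{(1 - \rho^2)\,\sigma_1^2} + \frac{(X_2 - \mu_2)^2}{(1 - \rho^2)\,\sigma_2^2} - \frac{2\rho\,(X_1 - \mu_1)(X_2 - \mu_2)}{(1 - \rho^2)\,\sigma_1\sigma_2} = 1 \tag{14.71} \]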
Equation 14.71 is the 1σ dispersion ellipse plotted for various values of ρ assuming μ1 = μ2 = 9, σ1 = 3, and σ2 = 2. Figure 14-12 also shows the critical ellipse for ρ = 0.7, which just touches the limit state line 25.50 − 1.41X1 + 0.039X1² − X2 = 0. The critical ellipse is the 1σ dispersion ellipse corresponding to ρ = 0.7 expanded by β times so that it becomes tangent to the limit state line.
Low (1996) has explained the meaning of the reliability index as follows.
Suppose the 1σ scatter ellipse gradually expands without changing its original
aspect ratio. The equation of the ellipse at any time is then obtained by substitut-
ing kσ 1 for σ 1 and kσ 2 for σ 2 in Eq. 14.71. The resulting equation is
\[ \frac{(X_1 - \mu_1)^2}{(1 - \rho^2)\,\sigma_1^2} + \frac{(X_2 - \mu_2)^2}{(1 - \rho^2)\,\sigma_2^2} - \frac{2\rho\,(X_1 - \mu_1)(X_2 - \mu_2)}{(1 - \rho^2)\,\sigma_1\sigma_2} = k^2 \tag{14.72} \]
When the expanding ellipse just touches the failure surface (or limit state
surface), the value of k is the reliability index β, as shown in Fig. 14-12. The corre-
sponding point of contact on the failure surface is the MPP having the coordi-
nate vector X * . Therefore, the reliability index
\[ \beta = \sqrt{\frac{(X_1^* - \mu_1)^2}{(1 - \rho^2)\,\sigma_1^2} + \frac{(X_2^* - \mu_2)^2}{(1 - \rho^2)\,\sigma_2^2} - \frac{2\rho\,(X_1^* - \mu_1)(X_2^* - \mu_2)}{(1 - \rho^2)\,\sigma_1\sigma_2}} \tag{14.73} \]
Figure 14-12 1σ dispersion ellipses for various values of ρ and the critical ellipse for ρ = 0.7.
The search for the most critical point (X1*, X2*) on the failure surface and
subsequent evaluation of β index using Eq. 14.73 can be formulated into an opti-
mization problem as follows (Low 1996):
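\[ \text{minimize } \beta = \sqrt{\frac{(X_1 - \mu_1)^2}{(1 - \rho^2)\,\sigma_1^2} + \frac{(X_2 - \mu_2)^2}{(1 - \rho^2)\,\sigma_2^2} - \frac{2\rho\,(X_1 - \mu_1)(X_2 - \mu_2)}{(1 - \rho^2)\,\sigma_1\sigma_2}} \tag{14.74a} \]
\[ \text{subject to } Z(X_1, X_2) = 0 \tag{14.74b} \]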
where Z(X1, X2) = 0 is the equation describing the limit state function. A similar
method can be used for systems involving more than two random variables, in
which case the ellipses will become ellipsoids or hyper-ellipsoids.
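For two variables the optimization of Eq. 14.74 can be sketched without a general solver. The example below assumes the limit state 25.50 − 1.41X1 + 0.039X1² − X2 = 0 with μ1 = μ2 = 9, σ1 = 3, σ2 = 2, and ρ = 0.7 (values taken from the discussion of the dispersion ellipses). Because X2 can be written in terms of X1 on the limit state, the constrained problem reduces to a one-dimensional search, done here with a repeatedly refined grid; the search interval and grid sizes are arbitrary choices.

```python
from math import sqrt

MU1, MU2 = 9.0, 9.0
SD1, SD2 = 3.0, 2.0
RHO = 0.7


def x2_on_limit_state(x1):
    """Value of X2 that satisfies Z(X1, X2) = 0."""
    return 25.50 - 1.41 * x1 + 0.039 * x1**2


def ellipse_radius(x1, x2):
    """Expansion factor k of the dispersion ellipse through (x1, x2), Eq. 14.72."""
    a, b = (x1 - MU1) / SD1, (x2 - MU2) / SD2
    return sqrt((a * a + b * b - 2.0 * RHO * a * b) / (1.0 - RHO**2))


def reliability_index(lower=-10.0, upper=40.0, n=2000, rounds=4):
    """Minimize the ellipse radius along the limit state by a refined grid search."""
    best_x1 = lower
    for _ in range(rounds):
        step = (upper - lower) / n
        best_x1 = min(
            (lower + i * step for i in range(n + 1)),
            key=lambda x1: ellipse_radius(x1, x2_on_limit_state(x1)),
        )
        lower, upper = best_x1 - step, best_x1 + step  # zoom in around the best point
    best_x2 = x2_on_limit_state(best_x1)
    return ellipse_radius(best_x1, best_x2), best_x1, best_x2


beta, x1_star, x2_star = reliability_index()
print(beta, x1_star, x2_star)
```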
\[ \text{minimize } \beta = \sqrt{\frac{(X_1 - \mu_1)^2}{\sigma_1^2} + \frac{(X_2 - \mu_2)^2}{\sigma_2^2} + \frac{(X_3 - \mu_3)^2}{\sigma_3^2}} \quad \text{subject to } Z(X_1, X_2, X_3) = 0 \]
The input variables to the problem are shown in Table E14-17. Invoking the
optimization module SOLVER in Excel by setting β = minimum, changing
assumed Xi values, and subjecting the problem to the constraint Z = 0, one can
perform the calculations listed in Table E14-17.
Example 14.18 A culvert has been designed for a carrying capacity of 30 cfs.
Based on the rational formula, the 5-year flow at the culvert is given as Q = 2.41 × C × A × (Tc + 0.2)^–0.77. The data are shown in Table E14-18a. Determine the
probability of failure of the culvert using FORM assuming all the variables are
independent and normally distributed.
Solution The objective function becomes
Because the variables are uncorrelated, by using the ellipsoid method the
problem for determining the reliability index, Eq. 14.74, can be written as
\[ \text{minimize } \beta = \sqrt{\frac{(X_1 - \mu_1)^2}{\sigma_1^2} + \frac{(X_2 - \mu_2)^2}{\sigma_2^2} + \frac{(X_3 - \mu_3)^2}{\sigma_3^2}} \]
Table E14-17
Item                       TMDL      WLA      LA
Mean                       1678      65       1258
CV                         0.26      0.18     0.33
Standard deviation         436.28    11.7     415.14
Assumed solution, Xi       1491.76   65.13    1426.63
Ui² = [(Xi − μi)/σi]²      0.18      0.001    0.16
Nonlinear optimization: min β = √(Σ [(Xi − μi)/σi]²) = 0.59, after minimization, subjected to Z = 0
Pf = 1 – normal(0.59) = 0.28
Table E14-18a
Function Mean SD
A 12.00 1.20
C 0.45 0.15
Tc 0.37 0.23
Table E14-18b
Item                              C       A        Tc
Mean                              0.45    12.00    0.37
CV                                0.33    0.10     0.62
Standard deviation                0.15    1.20     0.23
Assumed solution, Xi              0.52    12.19    0.21
U_i^2 = [(X_i − μ_i)/σ_i]^2       0.21    0.02     0.46
Nonlinear optimization:
\[
\min \beta = \sqrt{\sum_{i=1}^{3}\left(\frac{X_i-\mu_i}{\sigma_i}\right)^2} = 0.83 \text{ after minimization, subject to } Z = 0
\]
Pf = 1 − Φ(0.83) = 0.20
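The constrained minimization in Example 14.18 can also be checked outside a spreadsheet. The sketch below is not the SOLVER procedure used in the text; it swaps in the Hasofer-Lind/Rackwitz-Fiessler fixed-point iteration in standardized variables, with a finite-difference gradient. The function names (`performance`, `form_hlrf`, `failure_probability`) are illustrative, and the performance function Z = 30 − Q is assumed from the stated capacity and flow formula.

```python
import math

# Means and standard deviations of C, A, Tc (Table E14-18a)
MEANS = [0.45, 12.00, 0.37]
SDS = [0.15, 1.20, 0.23]
CAPACITY = 30.0  # culvert carrying capacity, cfs


def performance(x):
    """Z = capacity - Q, with Q = 2.41*C*A*(Tc + 0.2)**(-0.77)."""
    c, a, tc = x
    return CAPACITY - 2.41 * c * a * (tc + 0.2) ** (-0.77)


def g_standard(u):
    """Performance function expressed in standardized variables u."""
    x = [m + s * ui for m, s, ui in zip(MEANS, SDS, u)]
    return performance(x)


def gradient(u, h=1e-6):
    """Central-difference gradient of g_standard at u."""
    grad = []
    for i in range(len(u)):
        up = list(u)
        dn = list(u)
        up[i] += h
        dn[i] -= h
        grad.append((g_standard(up) - g_standard(dn)) / (2.0 * h))
    return grad


def form_hlrf(max_iter=100, tol=1e-8):
    """Iterate u <- [(grad.u - g) / |grad|^2] * grad, starting at the means."""
    u = [0.0, 0.0, 0.0]
    for _ in range(max_iter):
        g = g_standard(u)
        grad = gradient(u)
        norm2 = sum(d * d for d in grad)
        scale = (sum(d * ui for d, ui in zip(grad, u)) - g) / norm2
        u_new = [scale * d for d in grad]
        converged = max(abs(a - b) for a, b in zip(u_new, u)) < tol
        u = u_new
        if converged:
            break
    beta = math.sqrt(sum(ui * ui for ui in u))
    return beta, u


def failure_probability(beta):
    """Pf = 1 - Phi(beta), written with the complementary error function."""
    return 0.5 * math.erfc(beta / math.sqrt(2.0))


beta, u_star = form_hlrf()
print(round(beta, 2), round(failure_probability(beta), 2))
```

Under these assumptions the iteration settles close to the tabulated design point, with a reliability index near 0.83 and a failure probability near 0.20.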
Example 14.19 Solve Example 14.18 using FORM if C and Tc are correlated with
ρ(C, Tc) = 0.75.
Solution This problem can be solved by formulating the problem in the matrix
notation:
\[
\beta = \min_{X \in F} \sqrt{(X-\mu)^T C^{-1} (X-\mu)}
\]
subject to Z = 0
To explain how this calculation is performed, let us assume a starting point
A = 10, C = 1, and Tc = 0.5. The corresponding (X − μ) matrix can be written as
\[
(X-\mu) = \begin{pmatrix} 10-12 \\ 1-0.45 \\ 0.5-0.37 \end{pmatrix} = \begin{pmatrix} -2 \\ 0.55 \\ 0.13 \end{pmatrix}
\]
Thus the (X − μ)^T matrix becomes the row vector (−2, 0.55, 0.13). The covariance matrix C and its inverse C^{-1} are
\[
C = \begin{pmatrix} 1.44 & 0 & 0 \\ 0 & 0.02 & 0.03 \\ 0 & 0.03 & 0.05 \end{pmatrix}, \qquad
C^{-1} = \begin{pmatrix} 0.69 & 0 & 0 \\ 0 & 103.65 & -50.79 \\ 0 & -50.79 & 44.24 \end{pmatrix}
\]
Now, C^{-1}(X − μ) is determined as
\[
C^{-1}(X-\mu) = \begin{pmatrix} -1.39 \\ 50.40 \\ -22.18 \end{pmatrix}
\]
and the product (X − μ)^T C^{-1} (X − μ) turns out to be 27.62. At this point the performance function Z = –1.72. So this point is not
located on the limit state surface. Now this process of matrix manipulation is
performed with the constraint Z = 0 using SOLVER. The mean, standard devia-
tion (SD), and covariance matrix are entered as given in Table E14-19. Then some
X values are assumed as the starting point. The matrices (X – μ), (X – μ)T, and Co
are determined using Excel spreadsheet formulas in terms of cells. Then formu-
las for matrix multiplication Co(X – μ) and (X – μ)T Co(X – μ) are fed into the pro-
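The matrix products described for the starting point A = 10, C = 1, Tc = 0.5 can be reproduced in a few lines. This is a minimal sketch that takes the inverse covariance matrix exactly as printed in the example (rounded values, with 1/1.44 for the first entry) rather than recomputing it; the variable names are illustrative, and Z = 30 − Q is assumed as the performance function.

```python
# Trial point and means, ordered as A, C, Tc
x = [10.0, 1.0, 0.5]
mu = [12.00, 0.45, 0.37]

# Inverse covariance matrix as printed in the example (rounded values)
c_inv = [
    [1.0 / 1.44, 0.0, 0.0],
    [0.0, 103.65, -50.79],
    [0.0, -50.79, 44.24],
]

# Deviation vector (X - mu)
d = [xi - mi for xi, mi in zip(x, mu)]

# Matrix-vector product C^-1 (X - mu)
c_inv_d = [sum(row[j] * d[j] for j in range(3)) for row in c_inv]

# Quadratic form (X - mu)^T C^-1 (X - mu)
quad = sum(di * ci for di, ci in zip(d, c_inv_d))

# Performance function at the trial point: Z = 30 - 2.41*C*A*(Tc + 0.2)^-0.77
z = 30.0 - 2.41 * x[1] * x[0] * (x[2] + 0.2) ** (-0.77)

print([round(v, 2) for v in c_inv_d], round(quad, 2), round(z, 2))
```

Because Z is not zero at this trial point, the quadratic form is not yet a reliability index; the constrained search has to move the point onto Z = 0 first.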
Table E14-19
Variable Xi   Xi value   Mean, Xi(μ)   SD, Xi(σ)   Covariance matrix, C
A             12.46      12.00         1.20        1.44   0      0
C             0.42       0.45          0.15        0      0.02   0.03
Tc            0.12       0.37          0.23        0      0.03   0.05
Example 14.20 Solve Example 14.19 using FORM if C, Tc, and A are characterized
by log-normal, gamma, and triangular distributions, respectively. Assume
ρ = 0.75 between C and Tc.
Solution This problem is the same as given in Example 14.19, except for the type
of distributions used to characterize C, Tc, and A. The FORM approach for
determining the reliability index given by Hasofer and Lind assumes that all the
random variables are normally distributed. Therefore, all the non-normal variables
need to be transformed into their equivalent normal variables. This can be
accomplished by using Eq. 14.54 and Eq. 14.56. As evident from these equations, the
parameters of the equivalent normal distribution, $\mu_{X_i}^N$ and $\sigma_{X_i}^N$, are functions of
the assumed failure point $x_i^*$; these are used in calculations to get a better
estimate of the failure point; the transformation process is performed in each
iteration step of optimization. Once the parameters of equivalent normal distributions
for all the variables in each iteration are determined, the problem becomes the
same as given in Example 14.19. A stepwise solution is given as follows:
Step 1: Assume a starting value for xi* ; generally mean values are taken as
the starting point (see Table E14-20a).
Table E14-20a
A 11.43
C 0.45
Tc 0.37
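\[
\hat{a} = \mu_X\left(1 - \sqrt{6}\,CV_X\right) \quad \text{(Par. 1)}
\]
\[
\hat{b} = \mu_X\left(1 + \sqrt{6}\,CV_X\right) \quad \text{(Par. 2)}
\]
\[
\mu_Y = \frac{1}{2}\ln\left(\frac{\mu_X^2}{1 + CV_X^2}\right) \quad \text{(Par. 1)}
\]
\[
\sigma_Y = \sqrt{\ln\left(1 + CV_X^2\right)} \quad \text{(Par. 2)}
\]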
point $x_i^*$, determine $\Phi^{-1}[P_{X_i}(x_i^*)]$, in other words, the standard normal
variate (Z value) corresponding to the cumulative probability $P_{X_i}(x_i^*)$
(column 9). Now using this Z value, one can easily determine the density of
the standard normal distribution $\phi\{\Phi^{-1}[P_{X_i}(x_i^*)]\}$ (column 10).
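Step 5: Determine the standard deviation of the equivalent normal distribution
using Eq. 14.53 as
\[
\sigma_{X_i}^N = \frac{\phi\{\Phi^{-1}[P_{X_i}(x_i^*)]\}}{p_{X_i}(x_i^*)} = \frac{\text{Column 10}}{\text{Column 8}}
\]
The corresponding mean of the equivalent normal distribution is
\[
\mu_{X_i}^N = x_i^* - \sigma_{X_i}^N\,\Phi^{-1}[P_{X_i}(x_i^*)]
\]
(column 11 in Table E14-20b).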
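The equivalent normal transformation in these steps can be sketched for a single variable. The example below assumes a log-normal variable with the mean and CV of C (0.45 and 0.33) evaluated at the mean as the trial point; the helper names are hypothetical, and `statistics.NormalDist` from the Python standard library supplies Φ, Φ⁻¹, and φ.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, standard deviation 1


def lognormal_params(mean, cv):
    """Parameters of ln X from the mean and CV of X."""
    sigma_y = math.sqrt(math.log(1.0 + cv ** 2))
    mu_y = math.log(mean) - 0.5 * sigma_y ** 2
    return mu_y, sigma_y


def equivalent_normal(x_star, cdf, pdf):
    """Return (mu_N, sigma_N) of the equivalent normal at the point x_star."""
    z = STD_NORMAL.inv_cdf(cdf(x_star))          # Phi^-1[P(x*)]
    sigma_n = STD_NORMAL.pdf(z) / pdf(x_star)    # phi(z) / p(x*)
    mu_n = x_star - sigma_n * z                  # x* - sigma_N * z
    return mu_n, sigma_n


mu_y, sigma_y = lognormal_params(0.45, 0.33)


def ln_cdf(x):
    """Log-normal cumulative distribution function."""
    return STD_NORMAL.cdf((math.log(x) - mu_y) / sigma_y)


def ln_pdf(x):
    """Log-normal probability density function."""
    return STD_NORMAL.pdf((math.log(x) - mu_y) / sigma_y) / (x * sigma_y)


mu_n, sigma_n = equivalent_normal(0.45, ln_cdf, ln_pdf)
print(round(mu_n, 4), round(sigma_n, 4))
```

For a log-normal variable the result can be checked against the closed form σN = x*·σY, which is why that distribution was chosen for the sketch; other distributions only need their own `cdf` and `pdf` callables.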
Table E14-20b
Columns: (1) Variable; (2) Distribution; (3) Mean; (4) CV; (5) Par. 1; (6) Par. 2;
(7) $P_{X_i}(x_i^*)$; (8) $p_{X_i}(x_i^*)$; (9) $\Phi^{-1}[P_{X_i}(x_i^*)]$; (10) $\phi\{\Phi^{-1}[P_{X_i}(x_i^*)]\}$;
(11) $\mu_{X_i}^N$; (12) $\sigma_{X_i}^N$
Table E14-20c
Variable Xi   Xi value (a)   Mean (b), $\mu_{X_i}^N$   SD (c), $\sigma_{X_i}^N$   Covariance matrix, C
0.00
(X – μ)TC–1 (X – μ) b Pf Performance
2.95
function, Z
–0.53 0.13 0.36 0.36 10.89
a. As assumed in step 1.
b. From column 11.
c. From column 12.
$\mu_{X_i}^N$ and $\sigma_{X_i}^N$ are automatically updated and used in the optimization. The
optimization table is the same as used in Example 14.19.
FORM is quite accurate because it is able to overcome model nonlinear-
ity problems, and no additional assumption about the distribution type of
the performance function is required. It is still an approximation method
because the performance function is approximated by a linear function at the
design point, and accuracy problems may arise when the performance func-
tion is strongly nonlinear (Cawlfield and Wu 1993, Zhao and Ono 1999).
Another disadvantage of FORM is that determination of the linearization
point is generally not easy, depending upon the nature and complexity of the
system for which the reliability, risk, or uncertainty analysis is being studied
(Melching and Anmangandla 1992). Further, the magnitude of acceptable
convergence may affect the accuracy of the reliability estimates. In some
cases, the magnitude of the convergence error may not be reduced after a
certain level.
those distribution forms that require higher order moments. Let us consider an
exponential function
\[
Y = f(x) = b\,e^{cx} \qquad (14.75)
\]
where b and c are constants. The generic expectation function is defined as the
rth moment of Y about the origin ($\mu_r'$). Mathematically, it is defined as
\[
E[Y^r] = \mu_r' = \int_{-\infty}^{\infty} [f(x)]^r\,p_X(x)\,dx = b^r \int_{-\infty}^{\infty} e^{rcx}\,p_X(x)\,dx \qquad (14.76)
\]
where μY is the mean of Y, which can be evaluated from Eq. 14.76 by substitut-
ing r = 1 as
\[
\mu_Y = E[Y] \qquad (14.78)
\]
where μ2 is the second moment of Y about the mean. The coefficient of skewness
of Y, γY, is defined as
\[
\gamma_Y = \frac{\mu_3}{\mu_2^{3/2}} \qquad (14.80)
\]
where μ3 is the third moment of Y about the mean, which can be obtained by
substituting r = 3 in Eq. 14.77 as
where μ4 is the fourth moment of Y about the mean, which can be obtained by
substituting r = 4 in Eq. 14.77 as
Example 14.21 Solve Example 14.10 for the time of travel t = 1 to 24 hours using
the generic function method. Also calculate the mean and variance using the
FOA method and their errors using the corrected FOA method. Assume k is
characterized by a gamma distribution with the mean and coefficient of varia-
tion of 0.14 L/hour and 0.33, respectively.
Solution First, statistical characteristics of C at times t ranging from 1 to 24
hours are computed using the FOA and the generic expectation methods. Errors
in the FOA estimates have also been calculated using the corrected FOA method.
One can calculate any order of higher moment using the generic expectation
method. In this example however only four moments are calculated. The higher
order moments can help in identification of the distribution either by matching
the moments or by a more formal treatment such as using the principle of maxi-
mum entropy as discussed in Chapter 9.
It can be noticed from tabulated calculation that there is a substantial
amount of error in the FOA estimates of means and variances of residual chlo-
rine concentration in the water distribution system. The FOA underestimates the
mean throughout the 24 hours, whereas it overestimates the variance during the
first 10 hours and underestimates afterward. By using the generic expectation
function method, first-, second-, third-, and fourth-order moments of C about
the origin are estimated as listed in columns 7, 8, 9, and 10 of Table 14-8 . Using
these moments, central moments and statistical distributional characteristics of
C such as mean, variance, coefficient of variation, skewness, and kurtosis are cal-
culated as listed in columns 11, 12, 13, 14, and 15, respectively.
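The comparison between FOA and the generic expectation function can be illustrated for the gamma-distributed decay coefficient of this example. The sketch below assumes first-order decay with a normalized initial concentration, Y = C/C0 = exp(−kt), so that b = 1 and c = −t in Eq. 14.75; the gamma entry of the generic expectation table for Y = b·exp(cX) is then used for the moments. The function names are illustrative and the numbers are not taken from Table 14-8.

```python
import math

K_MEAN = 0.14  # mean of decay coefficient k
K_CV = 0.33    # coefficient of variation of k


def gamma_exp_moment(r, b, c, mean, cv):
    """E[Y^r] for Y = b*exp(c*X) with X gamma distributed (requires 1 - c*r*mean*cv^2 > 0)."""
    return b ** r * (1.0 - c * r * mean * cv ** 2) ** (-1.0 / cv ** 2)


def generic_mean_var(t):
    """Mean and variance of exp(-k t) from the generic expectation function."""
    m1 = gamma_exp_moment(1, 1.0, -t, K_MEAN, K_CV)
    m2 = gamma_exp_moment(2, 1.0, -t, K_MEAN, K_CV)
    return m1, m2 - m1 ** 2


def foa_mean_var(t):
    """First-order approximation about the mean of k."""
    mean = math.exp(-K_MEAN * t)
    slope = -t * mean                      # dY/dk evaluated at the mean of k
    var = (slope * K_MEAN * K_CV) ** 2     # slope^2 * var(k)
    return mean, var


for t in (1, 10, 24):
    print(t, generic_mean_var(t), foa_mean_var(t))
```

Under these assumptions the FOA mean lies below the generic-expectation mean at every travel time, which is the behavior expected for a convex function of k.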
Table 14-6 Generic expectation functions for some commonly used probability density
functions.
Uniform:
\[
E[X^r] = \int_{-\infty}^{\infty} \frac{1}{b-a}\,X^r\,dX = \frac{\mu_X^r}{2\sqrt{3}\,(r+1)\,CV_X}\left[\left(1+\sqrt{3}\,CV_X\right)^{r+1} - \left(1-\sqrt{3}\,CV_X\right)^{r+1}\right]
\]
Symmetrical triangular:
\[
E[X^r] = \frac{\mu_X^r}{6\,(r+1)(r+2)\,CV_X^2}\left[\left(1+\sqrt{6}\,CV_X\right)^{r+2} + \left(1-\sqrt{6}\,CV_X\right)^{r+2} - 2\right]
\]
Unsymmetrical triangular:
\[
E[X^r] = \frac{2\left[(b-c)\,a^{r+2} + (c-a)\,b^{r+2} + (a-b)\,c^{r+2}\right]}{(r+1)(r+2)(b-c)(c-a)(b-a)}
\]
Log-normal:
\[
E[X^r] = \mu_X^r\left(1 + CV_X^2\right)^{r(r-1)/2}
\]
Gamma:
\[
E[X^r] = \mu_X^r\,CV_X^{2r}\exp\left\{\ln\left[\Gamma\left(CV_X^{-2} + r\right)\right] - \ln\left[\Gamma\left(CV_X^{-2}\right)\right]\right\}
\]
Exponential:
\[
E[X^r] = \mu_X^r\,\Gamma(r+1)
\]
Normal:
\[
E[X^r] = \mu_X^r\left[1 + \frac{r(r-1)}{2}\,CV_X^2 + \dots + \frac{r(r-1)(r-2)\cdots(r-n+1)}{2^{n/2}\,(n/2)!}\,CV_X^n + \dots\right]
\]
\[
E[X^r] = \mu_X^r \sum_{n=0}^{r/2} \binom{r}{2n} CV_X^{2n}\,E[z^{2n}], \text{ when } r \text{ is even}
\]
\[
E[X^r] = \mu_X^r \sum_{n=0}^{(r-1)/2} \binom{r}{2n} CV_X^{2n}\,E[z^{2n}], \text{ when } r \text{ is odd}
\]
Table 14-7 Generic expectation functions for some commonly used probability density
functions.
Uniform:
\[
\frac{b^r}{2\sqrt{3}\,rc\,\mu_x CV_x}\left[e^{rc\mu_x\left(1+\sqrt{3}\,CV_x\right)} - e^{rc\mu_x\left(1-\sqrt{3}\,CV_x\right)}\right]
\]
Symmetrical triangular:
\[
\frac{b^r}{6\,r^2c^2\mu_x^2 CV_x^2}\left[e^{\frac{1}{2}rc\mu_x\left(1+\sqrt{6}\,CV_x\right)} - e^{\frac{1}{2}rc\mu_x\left(1-\sqrt{6}\,CV_x\right)}\right]^2
\]
Unsymmetrical triangular:
\[
\frac{2b^r\left[(\alpha-\beta)\exp(rc\omega) + (\beta-\omega)\exp(rc\alpha) + (\omega-\alpha)\exp(rc\beta)\right]}{r^2c^2\,(\beta-\alpha)(\omega-\alpha)(\beta-\omega)}
\]
where α, β, and ω are the minimum, maximum, and mode of the random variable X.
Normal:
\[
b^r\exp\left(rc\mu_x + \frac{1}{2}\,r^2c^2\mu_X^2 CV_X^2\right)
\]
Gamma:
\[
b^r\left(1 - cr\mu_x CV_x^2\right)^{-1/CV_x^2}
\]
Exponential:
\[
\frac{b^r}{1 - cr\mu_x}
\]
Point estimate methods are procedures where probability distributions for con-
tinuous random variables are modeled by discrete “equivalent” distributions
having two or more values. The elements of these discrete distributions (or point
estimates) have specific values with defined probabilities such that the first three
moments of the discrete distribution match those of the continuous random vari-
able. With only a few values over which to integrate, the moments of the perfor-
mance function are easily obtained. First we summarize the PE method
developed by Rosenblueth (1981), which is applicable to both symmetric and
nonsymmetric and to correlated and uncorrelated random input variables.
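[Figure: probability density pXi(xi) of random variable Xi, with the point estimates Xi- and Xi+ and their probability concentrations Pi- and Pi+]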
by two point estimates, Xi- and Xi+, with probability concentrations Pi- and Pi+,
respectively. Because the two point estimates and their probability concentrations
form an equivalent probability distribution for the random variable, the
two P values must sum to unity. The two point estimates and probability
concentrations are chosen to match three moments of the random variable. The
two probability masses Pi+ and Pi- are located x'i+ standard deviations above
and x'i- standard deviations below the mean, respectively:
\[
X_{i+} = \mu(X_i) + x'_{i+}\,\sigma(X_i) \qquad (14.85)
\]
\[
X_{i-} = \mu(X_i) - x'_{i-}\,\sigma(X_i) \qquad (14.86)
\]
\[
x'_{i+} = \frac{\gamma(X_i)}{2} + \sqrt{1 + \left(\frac{\gamma(X_i)}{2}\right)^2} \qquad (14.87)
\]
\[
x'_{i-} = x'_{i+} - \gamma(X_i) \qquad (14.88)
\]
\[
P_{i+} = \frac{x'_{i-}}{x'_{i+} + x'_{i-}} \qquad (14.89)
\]
\[
P_{i-} = 1 - P_{i+} \qquad (14.90)
\]
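Equations 14.85 to 14.90 translate directly into a small helper. The following sketch (the function name `two_point_estimate` is illustrative) returns the two point locations and their probability concentrations, and then checks numerically that the discrete distribution reproduces the mean, variance, and skewness it was built from.

```python
import math


def two_point_estimate(mean, sd, skew):
    """Rosenblueth two-point estimate: returns (x_plus, x_minus, p_plus, p_minus)."""
    xp = skew / 2.0 + math.sqrt(1.0 + (skew / 2.0) ** 2)  # Eq. 14.87
    xm = xp - skew                                        # Eq. 14.88
    p_plus = xm / (xp + xm)                               # Eq. 14.89
    p_minus = 1.0 - p_plus                                # Eq. 14.90
    return mean + xp * sd, mean - xm * sd, p_plus, p_minus


def discrete_moments(x_plus, x_minus, p_plus, p_minus):
    """Mean, variance, and skewness coefficient of the two-point distribution."""
    m = p_plus * x_plus + p_minus * x_minus
    var = p_plus * (x_plus - m) ** 2 + p_minus * (x_minus - m) ** 2
    third = p_plus * (x_plus - m) ** 3 + p_minus * (x_minus - m) ** 3
    return m, var, third / var ** 1.5


# Symmetric case: mean 9, standard deviation 3, zero skewness
print(two_point_estimate(9.0, 3.0, 0.0))

# Skewed case: the three moments are recovered from the two points
print(discrete_moments(*two_point_estimate(800.0, 100.0, 0.66)))
```

The standard deviation of 100 in the skewed case is an arbitrary value chosen for illustration only.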
Now, once the two probability masses and their locations for each random
variable are determined, there will be 2^N points on the performance function at
which the values of the performance function need to be determined. For
example, for a bivariate performance function Y(X1, X2), four point evaluations need
to be done as follows:
\[
p_{(\delta_1, \delta_2, \dots, \delta_N)} = \prod_{i=1}^{N} P_{i,\delta_i} + \sum_{i=1}^{N-1}\left(\sum_{j=i+1}^{N} \delta_i\,\delta_j\,a_{ij}\right) \qquad (14.92)
\]
After evaluating all the point locations of the performance function and their
corresponding probability masses, one can approximate the rth moment of the
performance function about the origin as
\[
E[Y^r] = \sum p_{(\delta_1, \delta_2, \dots, \delta_n)}\,Y^r_{(\delta_1, \delta_2, \dots, \delta_n)} \qquad (14.94)
\]
From this information, the mean and the variance of the performance func-
tion are determined as
μY = E [Y ] (14.95)
\[
\mathrm{var}(Y) = E[Y^2] - (\mu_Y)^2 \qquad (14.96)
\]
Now, one can determine the reliability index and the corresponding reliabil-
ity and probability of failure.
As is seen, this method requires 2^N model evaluations to estimate a single
statistical moment of the model output. For a complex model with a large number of
parameters, Rosenblueth’s PE method is computationally intensive and may
sometimes be impractical. Further, a reliability analysis requires knowledge of
higher-order moments to approximate the distribution of the output random variable.
This makes the method even more computationally intensive. Thus, although
Rosenblueth’s method is quite efficient for problems with a small number of
uncertain basic variables, its computational requirements are similar to those of MCS for
a model having a large number of parameters. For example, a model having
between 10 and 15 parameters will require 1,024 to 32,768 model evaluations.
Harr (1989) modified Rosenblueth’s method to reduce its computational
requirements from 2^N to 2N for an N input parameter model by using the first
two moments of the random variables. This method does not provide the
flexibility to incorporate known higher-order moments of input random variables.
Chang et al. (1995) showed that the estimated uncertainty feature of model
output could be inaccurate if the skewness of a random variable is not accounted for.
\[
Y = X_2 - \left(0.04X_1^2 - 1.41X_1 + 25.5\right)
\]
Assume that the means of X1 and X2 are 9 and 20 and that their standard
deviations are 3 and 2, respectively. Determine the reliability index and failure
probability using the PE method for the following cases:
(a) Both X1 and X2 are independent and normally distributed.
(b) Both X1 and X2 are dependent and normally distributed with covariance
of 4.2.
(c) X1 is characterized by a gamma distribution and X2 is log-normally dis-
tributed with covariance of 4.2.
Solution
(a) See Table E14-23a.
(b) See Table E14-23b.
(c) See Table E14-23c.
Table E14-22
Item and formula                  Case 1        Case 2       Case 3
                                  X = normal    X = gamma    X = log-normal
Known information
μ(Q)                              800           800          800
γ(Q)                              0             0.66         1.03
x'− = x'+ − γ(Q)                  1             0.8          0.7

E[Y^2] = P++ Y++^2 + P−− Y−−^2 + P+− Y+−^2 + P−+ Y−+^2     21.17
var[Y] = E[Y^2] − {E[Y]}^2                                  8.28
σ(Y)                                                        2.87
β = μ(Y)/σ(Y)                                               1.24
Pf = 1 − Φ(β)                                               0.11
Table E14-23b
Item                                      X1 = normal   X2 = normal
μ(Xi)                                     9.00          20.00
CV(Xi)                                    0.33          0.10
x'i+ = γ(Xi)/2 + sqrt[1 + (γ(Xi)/2)^2]    1.00          1.00
x'i− = x'i+ − γ(Xi)                       1.00          1.00
Xi+ = μ(Xi) + x'i+ σ(Xi)                  12.00         22.00
Xi− = μ(Xi) − x'i− σ(Xi)                  6.00          18.00
Pi+ = x'i− / (x'i+ + x'i−)                0.50          0.50

Item                                                        Value
cov(X1, X2)                                                 4.2
ρ(X1, X2)                                                   0.70
P−− = P1− P2− + a12                                         0.43
P+− = P1+ P2− − a12                                         0.08
P−+ = P1− P2+ − a12                                         0.08
Y++ = Y[μ(X1) + σ(X1), μ(X2) + σ(X2)]                       7.66
Y−− = Y[μ(X1) − σ(X1), μ(X2) − σ(X2)]                       –0.48
E[Y^2] = P++ Y++^2 + P−− Y−−^2 + P+− Y+−^2 + P−+ Y−+^2     26.97
var[Y] = E[Y^2] − {E[Y]}^2                                  14.08
σ(Y)                                                        3.75
β = μ(Y)/σ(Y)                                               0.96
Pf = 1 − Φ(β)                                               0.17
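The two normal cases of this example can be reproduced with a short script. The sketch below handles only symmetric (zero-skew) variables, for which each marginal concentration is 0.5; the correlation adjustment a12 = ρ/4 is an assumption inferred from the tabulated probabilities (0.43 and 0.08 for ρ = 0.70), and the function names are illustrative.

```python
import math
from statistics import NormalDist


def performance(x1, x2):
    """Performance function of the example: Y = X2 - (0.04 X1^2 - 1.41 X1 + 25.5)."""
    return x2 - (0.04 * x1 ** 2 - 1.41 * x1 + 25.5)


def pe_two_normal(mu1, sd1, mu2, sd2, rho):
    """Rosenblueth point estimates for two symmetric, possibly correlated variables."""
    a12 = rho / 4.0  # assumed correlation adjustment for two zero-skew variables
    weights = {
        (+1, +1): 0.25 + a12,
        (-1, -1): 0.25 + a12,
        (+1, -1): 0.25 - a12,
        (-1, +1): 0.25 - a12,
    }
    m1 = 0.0
    m2 = 0.0
    for (d1, d2), p in weights.items():
        y = performance(mu1 + d1 * sd1, mu2 + d2 * sd2)
        m1 += p * y        # contribution to E[Y]
        m2 += p * y * y    # contribution to E[Y^2]
    var = m2 - m1 ** 2
    beta = m1 / math.sqrt(var)
    pf = 1.0 - NormalDist().cdf(beta)
    return {"EY": m1, "EY2": m2, "var": var, "beta": beta, "pf": pf}


case_a = pe_two_normal(9.0, 3.0, 20.0, 2.0, rho=0.0)
case_b = pe_two_normal(9.0, 3.0, 20.0, 2.0, rho=4.2 / (3.0 * 2.0))
print(case_a)
print(case_b)
```

Case (c) is not covered by this sketch, because skewed variables need the unequal point locations and concentrations of Eqs. 14.85 to 14.90.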
Table E14-23c
Item                                      X1 = gamma   X2 = log-normal
μ(Xi)                                     9.00         20.00
x'i+ = γ(Xi)/2 + sqrt[1 + (γ(Xi)/2)^2]    1.39         1.16

Item                                                        Value
cov(X1, X2)                                                 4.2
P−− = P1− P2− + a12                                         0.55
Y−+ = Y[μ(X1) − σ(X1), μ(X2) + σ(X2)]                       4.6
E[Y^2] = P++ Y++^2 + P−− Y−−^2 + P+− Y+−^2 + P−+ Y−+^2     25.69
var[Y] = E[Y^2] − {E[Y]}^2                                  12.81
σ(Y)                                                        3.57
β = μ(Y)/σ(Y)                                               1.00
Pf = 1 − Φ(β)                                               0.15
Table E14-24a
Variable Mean value Coefficient of variation Skew coefficient
X1 40 0.125 0.20
X2 50 0.05 0.70
X3 1000 0.20 –0.66
Table E14-24b
Item X1 X2 X3
μ (Xi) 40 50 1000
CV(Xi) 0.125 0.05 0.20
σ (Xi) 5 2.5 200
γ (Xi) 0.2 0.7 –0.66
Table E14-24c
Item Y = f(x1, x2, x3)
Y+++ 1236.66
Y++– 1636.66
Y+–+ 995.50
Y+–– 1395.50
Y–++ 698.76
Y–+– 1098.76
Y – –+ 510.83
Y ––– 910.83
Table E14-24d
Performance function       Probability
Point notation   Value     Notation   Value     Yijk Pijk   Y^2ijk Pijk
Y+++ 1236.66 P+++ 0.199 246.5 304848.1
Y++– 1636.66 P++– 0.041 66.5 108773.4
Table E14-24e
Item                                                                      Value
E[Y] = P+++ Y+++ + P++– Y++– + P+–+ Y+–+ + P+–– Y+––
       + P–++ Y–++ + P–+– Y–+– + P––+ Y––+ + P––– Y–––                   942.07
E[Y^2] = P+++ Y+++^2 + P++– Y++–^2 + P+–+ Y+–+^2 + P+–– Y+––^2
       + P–++ Y–++^2 + P–+– Y–+–^2 + P––+ Y––+^2 + P––– Y–––^2           971341.97
var[Y] = E[Y^2] − {E[Y]}^2                                                83841.09
σ(Y)                                                                      289.55
β = μ(Y)/σ(Y)                                                             3.25
Pf = 1 − Φ(β)                                                             5.70 × 10^–4
where $\bar{y}_i$ is the mean of $y_{i+}$ and $y_{i-}$, that is, $\bar{y}_i = (y_{i+} + y_{i-})/2 = [f(x_{i+}) + f(x_{i-})]/2$; and $\lambda_i$ are
the eigenvalues obtained when the correlation matrix ρ of the variables is decomposed,
using the orthogonal transformation method, into an eigenvector matrix
W = (w1, w2, w3, …, wn), its transpose $W^T$, and a diagonal matrix Λ containing the
eigenvalues $\lambda_1, \lambda_2, \dots, \lambda_n$:
\[
\rho = W \Lambda W^T \qquad (14.98)
\]
where superscript T denotes the transpose of the matrix. The uncorrelated
standardized coordinates of the vectors of the n random variables x+ and x– are
generated as
\[
x_{i-} = \mu - \sqrt{n}\,D\,w_i \quad \text{and} \quad x_{i+} = \mu + \sqrt{n}\,D\,w_i \qquad (14.99)
\]
\[
E[Y^m] = \frac{\sum_{i=1}^{n} \bar{y}_i^{\,m}}{n} \qquad (14.100)
\]
in which $\bar{y}_i$ is calculated as before.
The weighting factor for each independent variable xi is considered for the
modified uncorrelated standardized coordinates in the eigenspace as
3n η n n n n
E[Y m ] = (1 − + + ∑ pi 0 ) + ∑ [( pi + − ηi + 1)yim+ + pi − yim−1 )] + ∑ ∑ yijm ηij (14.102)
2 2 i =1 i =1 j i< j
where η is the sum of all the η i , η i is the sum of all η ij with respect to i, and
ηij = ρij /( xi’ + x’j + ) . Note that ηii =1 . The points xi- , xi+, and μ are computed as
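\[
x_{i+} = \mu_i + x'_{i+}\,\sigma, \qquad x_{i-} = \mu_i + x'_{i-}\,\sigma, \qquad x_{i0} = \mu
\]
\[
x'_{i+} = \frac{\gamma_i + \sqrt{4k_i - 3\gamma_i^2}}{2}, \qquad x'_{i-} = \frac{\gamma_i - \sqrt{4k_i - 3\gamma_i^2}}{2}
\]
\[
p_{i+} = \frac{1}{x'_{i+}\left(x'_{i+} - x'_{i-}\right)}, \qquad p_{i-} = \frac{1}{x'_{i-}\left(x'_{i-} - x'_{i+}\right)}, \qquad p_{i0} = 1 - p_{i+} - p_{i-} \qquad (14.103)
\]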
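The three-point locations and weights of Eq. 14.103 can be sketched for a single variable as follows. The function name is illustrative, and k is taken to be the kurtosis coefficient (3 for a normal variable), which is an assumption about the symbol. The check at the end confirms that the three points reproduce the first four standardized moments.

```python
import math


def three_point_estimate(mean, sd, skew, kurt):
    """Point locations and weights for one variable, following Eq. 14.103."""
    root = math.sqrt(4.0 * kurt - 3.0 * skew ** 2)
    xp = (skew + root) / 2.0            # standardized upper location (positive)
    xm = (skew - root) / 2.0            # standardized lower location (negative)
    p_plus = 1.0 / (xp * (xp - xm))
    p_minus = 1.0 / (xm * (xm - xp))
    p_zero = 1.0 - p_plus - p_minus
    points = (mean + xp * sd, mean + xm * sd, mean)
    weights = (p_plus, p_minus, p_zero)
    return points, weights


def standardized_moment(points, weights, mean, sd, order):
    """Moment of the given order of the discrete distribution, in standardized units."""
    return sum(w * ((x - mean) / sd) ** order for x, w in zip(points, weights))


# Normal variable: zero skewness and kurtosis of 3
pts, wts = three_point_estimate(0.0, 1.0, 0.0, 3.0)
print(pts, wts)

# Skewed variable: moments of order 1 to 4 are recovered
pts_s, wts_s = three_point_estimate(50.0, 2.5, 0.5, 3.5)
print([round(standardized_moment(pts_s, wts_s, 50.0, 2.5, n), 6) for n in (1, 2, 3, 4)])
```

For the normal case the points fall at the mean and at plus or minus the square root of 3 standard deviations, with weights 1/6, 1/6, and 2/3.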
This method is efficient and accurate. Thus the reliability of the system is
R = 1 − Pf = 0.999 (i.e., 99.9%).
N −1 N
η−N m N
E[Y m ] = [(1 − n) + ] y + ∑ [( pi + − ηi + 1)yin+ ] + ∑ ∑ yijm ηij (14.104)
2 i =1 i =1 j = i + 1
N −1 N
η − 3N m N
y + ∑ [( pi + − ηi + 1)yim+ + pi − yim− ] + ∑ ∑ yijm ηij (14.105)
m
E(Y m ) − Y =
2 i =1 i =1 j = i + 1
Table 14-9
Characteristics                                Rosenblueth’s   Harr’s    Modified Harr’s   Li’s             Modified Rosenblueth’s
                                               method          method    method            method           method
Moments needed                                 3               2         2                 4                3
Intensity of computation                       2^N             2N        2N                (N^2+3N+2)/2     (N^2+3N+2)/2
Capability to consider correlated variables    yes             yes       yes               yes              yes
Capability to consider asymmetric variables    yes             yes       yes               yes              yes
The expected (or mean) time to failure (MTTF) of a system, or its expected
life, is the expected value of time during which the system will be reliable (or
operate successfully), that is,
\[
\mathrm{MTTF} = E(t) = \mu_t = \int_0^{\infty} t\,f(t)\,dt \qquad (14.107)
\]
\[
\frac{dR(t)}{dt} = R'(t) = -f(t) \qquad (14.109)
\]
Substituting Eq. 14.109 into Eq. 14.107, we have
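\[
E(t) = -\int_0^{\infty} t\,R'(t)\,dt \qquad (14.110)
\]
Integrating by parts gives $E(t) = -\left[t\,R(t)\right]_0^{\infty} + \int_0^{\infty} R(t)\,dt$. Since all
systems must fail eventually, R(t) approaches zero faster than t
approaches infinity, so the boundary term vanishes. Hence,
\[
E(t) = \int_0^{\infty} R(t)\,dt
\]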
\[
\sigma_t^2 = \mathrm{var}[t] = \int_0^{\infty} (t - \mu_t)^2 f(t)\,dt \qquad (14.112)
\]
\[
\sigma_t^2 = \mathrm{var}(t) = \int_0^{\infty} 2t\,R(t)\,dt - \left[\int_0^{\infty} R(t)\,dt\right]^2 \qquad (14.113)
\]
This can be shown as follows. Substituting Eq. 14.109 into Eq. 14.112, we
have
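\[
\sigma_t^2 = -\int_0^{\infty} (t - \mu_t)^2 R'(t)\,dt \qquad (14.114)
\]
\[
= -\int_0^{\infty} \left(t^2 - 2\mu_t t + \mu_t^2\right) R'(t)\,dt \qquad (14.115)
\]
\[
\sigma_t^2 = -\int_0^{\infty} t^2 R'(t)\,dt + 2\mu_t \int_0^{\infty} t\,R'(t)\,dt - \mu_t^2 \int_0^{\infty} R'(t)\,dt
\]
\[
= -\int_0^{\infty} t^2\,dR(t) + 2\mu_t \int_0^{\infty} -t\,f(t)\,dt + \mu_t^2 = -t^2 R(t)\Big|_0^{\infty} + \int_0^{\infty} 2t\,R(t)\,dt - 2\mu_t^2 + \mu_t^2
\]
\[
= \int_0^{\infty} 2t\,R(t)\,dt - \mu_t^2 = \int_0^{\infty} 2t\,R(t)\,dt - \left[\int_0^{\infty} R(t)\,dt\right]^2
\]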
Also,
\[
\mathrm{MTTF} = \int_0^{\infty} t\,(0.0055)\exp(-0.0055t)\,dt = 1/0.0055 = 182 \text{ days}
\]
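The identities MTTF = ∫R(t)dt and Eq. 14.113 can be checked numerically for the exponential case with λ = 0.0055 per day. The sketch below uses a plain trapezoidal rule truncated at a large upper limit; the helper name `integrate`, the limit of 5,000 days, and the step count are illustrative choices, not values from the text.

```python
import math

LAMBDA = 0.0055  # failure rate, per day


def reliability(t):
    """R(t) = exp(-lambda t) for a constant hazard rate."""
    return math.exp(-LAMBDA * t)


def integrate(func, t_max, steps=100000):
    """Trapezoidal rule on [0, t_max]; t_max stands in for infinity."""
    h = t_max / steps
    total = 0.5 * (func(0.0) + func(t_max))
    for i in range(1, steps):
        total += func(i * h)
    return total * h


T_MAX = 5000.0  # days; R(T_MAX) is negligibly small

mttf = integrate(reliability, T_MAX)                                       # integral of R(t)
variance = integrate(lambda t: 2.0 * t * reliability(t), T_MAX) - mttf ** 2  # Eq. 14.113

print(round(mttf), round(math.sqrt(variance)))
```

For the exponential distribution both the mean and the standard deviation of the time to failure equal 1/λ, so the two printed values should agree.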
\[
P\left[t_1 < t \le t_1 + \Delta t \mid t > t_1\right] = \frac{P\left[t_1 \le t \le t_1 + \Delta t\right]}{P\left[t > t_1\right]} \qquad (14.117)
\]
Dividing both sides by Δt and taking the limiting case as Δt → 0, we obtain
\[
f(t) = h(t)\exp\left[-\int_0^t h(t)\,dt\right] \qquad (14.120)
\]
To obtain Eq. 14.119, Eq. 14.120 is written as
\[
h(t) = \frac{dF/dt}{1 - F(t)} \qquad (14.121)
\]
Therefore,
\[
\frac{dF}{1 - F(t)} = h(t)\,dt
\]
On integration we get
\[
\ln\left[1 - F(t)\right] = -\int_0^t h(t)\,dt
\]
so that, since R(t) = 1 − F(t),
\[
R(t) = \exp\left[-\int_0^t h(t)\,dt\right]
\]
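\[
f(t) = \lambda e^{-\lambda t} \qquad (14.123)
\]
which is the exponential probability distribution. The reliability function R(t) is
\[
R(t) = \int_t^{\infty} \lambda e^{-\lambda t}\,dt = e^{-\lambda t} \qquad (14.124)
\]
and the mean time to failure is
\[
\mathrm{MTTF} = E[t] = \int_0^{\infty} e^{-\lambda t}\,dt = \frac{1}{\lambda} \qquad (14.125)
\]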
which is the expected value of the exponential distribution and the reciprocal of
the hazard rate. The implication is that the greater the hazard or failure rate, the
shorter will be the expected time to failure.
average, 10 years after the sections were constructed and accepted. What is the
reliability of the 20 sections, as a whole and 10 years and 20 years after accep-
tance?
Solution If expensive rehabilitation denotes failure, MTTF = 10 years, and the
failure rate λ is 1/10 = 0.10 per year. Therefore,
\[
R = e^{-\lambda t_R}
\]
\[
\ln R = -\lambda\,t_R \quad \text{or} \quad t_R = -\frac{1}{\lambda}\ln R \qquad (14.126)
\]
For a constant failure rate of 0.10 per year (λ = 0.10), the reliable life for a reli-
ability of 10% is
tR=10% = –10 × ln(0.10) = 23 years
To determine the time until only one of the 20 pavements in the previous
example is reliable, R = 1/20 = 0.05, we calculate
tR = 5% = –10 × ln(0.05) = 30 years
The expected time for half of the pavements to show massive distress in this
example, R = 0.5, would be
tR = 50% = –10 × ln(0.50) = 7 years
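The three reliable-life values above follow from Eq. 14.126 and can be reproduced with a one-line helper. The function name `reliable_life` is illustrative.

```python
import math


def reliable_life(reliability, failure_rate):
    """t_R = -(1/lambda) * ln(R) for a constant failure rate (Eq. 14.126)."""
    return -math.log(reliability) / failure_rate


FAILURE_RATE = 0.10  # per year

for target in (0.10, 0.05, 0.50):
    print(target, round(reliable_life(target, FAILURE_RATE), 1))
```

The unrounded values are about 23.0, 30.0, and 6.9 years, which round to the figures quoted in the text.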
\[
f(x) = \frac{(\lambda t)^x e^{-\lambda t}}{x!} \qquad (14.127)
\]
where λ = mean occurrence rate, x = random variable, and t = time interval. The
Poisson distribution models the probability of occurrence of events during a
time interval t, where the mean occurrence rate is λ. Then μ = λt.
Suppose that the probability of occurrence of a storm with a rainfall amount
that is capable of causing severe damage to a system follows a Poisson distribu-
tion. Here λ is the rate at which the killer storm is expected to occur. Within the
context of reliability theory, the reliability is measured relative to the time for the
first occurrence of the catastrophic storm; that is, the reliability is the probability
of no failures, and hence no occurrence of storms, until t:
R(t) = P(no devastating storms occur until time t)
For x = 0, Eq. 14.127 yields
Example 14.27 Suppose that records over a 100-year period show that there
were six major storms that caused severe damage to a structure. (a) What is the
estimated reliability of such a structure at the same location 10 years after it was
rebuilt if the construction was completed 3 years after the devastating storm?
(b) How many years after construction does the probability of failure reach 0.90?
Solution
(a) The mean occurrence rate is λ = 6/100 = 0.06 per year. The reliability of the system at a time t = 10 + 3 = 13
years after the storm is to be determined. Hence,
\[
t_R = -\frac{1}{\lambda}\ln R = -\left(\frac{100}{6}\right)\ln(0.10) = 38 \text{ years}
\]
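Both parts of Example 14.27 can be checked with the Poisson reliability R(t) = exp(−λt), which is Eq. 14.127 evaluated at x = 0. In the sketch below the rate of six storms per 100 years is taken from the problem statement; the function names are illustrative, and the printed value for part (a) is computed here rather than quoted from the text.

```python
import math

STORM_RATE = 6.0 / 100.0  # mean occurrence rate of damaging storms, per year


def poisson_reliability(t, rate=STORM_RATE):
    """Probability of no damaging storm in a period of length t."""
    return math.exp(-rate * t)


def time_to_failure_probability(pf, rate=STORM_RATE):
    """Time at which the failure probability 1 - R(t) reaches pf."""
    return -math.log(1.0 - pf) / rate


# (a) reliability 13 years after the last storm
print(round(poisson_reliability(13.0), 2))

# (b) time at which the probability of failure reaches 0.90
print(round(time_to_failure_probability(0.90)))
```

Part (b) reproduces the 38 years obtained above, since a failure probability of 0.90 corresponds to R = 0.10.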
both time and money. Consequently, the time interval for this phase is relatively
large as compared with that for items produced in other fields.
Consider a linearly increasing hazard function h(t) = kt, where time starts
after the constant hazard phase. From Eq. 14.120, for h(t) = kt, we have
\[
f(t) = kt\exp\left[-\int_0^t kt\,dt\right] = kt\exp\left[-\frac{kt^2}{2}\right] \qquad (14.129)
\]
\[
R(t) = \exp\left[-\int_0^t kt\,dt\right] = e^{-kt^2/2} \qquad (14.130)
\]
\[
\mathrm{MTTF} = \int_0^{\infty} kt^2 e^{-kt^2/2}\,dt \qquad (14.131)
\]
\[
\mathrm{MTTF} = \sqrt{\frac{\pi}{2k}} \qquad (14.132)
\]
The MTTF for the linearly increasing hazard function is thus inversely proportional
to the square root of the hazard slope k. The hazard function is shown in Fig. 14-15.
[Figure: hazard function versus time t, showing breaking-in failures, chance failures, and wearing-out failures, the last rising with slope k from time tcf]
Figure 14-15 The hazard function.
Example 14.28 Suppose that a system is only 75% reliable (or adequate) one
year after it has begun to wear out. Assuming the failure rate to increase linearly
with time, estimate the reliability of the system for the next 2 years.
Solution Equation 14.130 can be graphed as shown in Fig. 14-16. For t = 1 and
R = 75%, from Fig. 14-16, k = 0.58.
The times required to reach other levels of reliability are then easily
scaled. In a sense the remaining reliability can be thought of as the percent worth
of the system. For example, from Fig. 14-16, after 3 years of deterioration with-
out maintenance, one can estimate that the system would be worth approxi-
mately 8% of its value after the initiation of the wearing-out mode of distress.
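The graphical solution of Example 14.28 can be cross-checked analytically from Eq. 14.130: solving R(t) = exp(−kt²/2) for k with R(1) = 0.75 gives the hazard slope, after which the reliability at later times follows directly. The function names in this sketch are illustrative.

```python
import math


def hazard_slope(reliability, t):
    """Solve R = exp(-k t^2 / 2) for the slope k of a linear hazard h(t) = k t."""
    return -2.0 * math.log(reliability) / t ** 2


def linear_hazard_reliability(t, k):
    """R(t) for a linearly increasing hazard (Eq. 14.130)."""
    return math.exp(-k * t ** 2 / 2.0)


def linear_hazard_mttf(k):
    """Mean time to failure for a linearly increasing hazard (Eq. 14.132)."""
    return math.sqrt(math.pi / (2.0 * k))


k = hazard_slope(0.75, 1.0)
print(round(k, 2))                                   # hazard slope
print(round(linear_hazard_reliability(2.0, k), 2))   # reliability after 2 years
print(round(linear_hazard_reliability(3.0, k), 2))   # reliability after 3 years
print(round(linear_hazard_mttf(k), 2))               # mean time to failure, years
```

The computed slope of about 0.58 and the reliability of about 8% after 3 years agree with the values read from the figure.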
[Figure: reliability (%) versus time t (years), and slope of the hazard function k versus wearing-out time for R = 5%, 10%, 25%, 50%, 75%, and 90%, with R = 75% and k = 0.58 marked]
Figure 14-16 The reliability function.
After the event tree is prepared, follow-up steps are necessary to attend to
the branch that has high associated risk. The risk can be minimized by replacing
the component that is perceived to initiate, contribute, or quicken failure.
A fault tree gives a reverse representation of the process, working back
from a particular event (known as the top event) through all the chains of
events that are precursors of the top event. There can be more than one top
event. But for a top event to take place, some other events, known as lower-
level events, must take place. At the bottom are the events known as basic
events; these cannot be decomposed further and the failure probabilities for
these events need to be known. The key components of a fault tree are thus
event specifications and logic gates (with jargon heavily borrowed from the
electronics and communication field). A fault tree for failure of an earthen dam
is shown in Fig. 14-18. At the first level, the major causes of failure of such a
dam are identified and the next level lists the various causes that may result in
overtopping of the dam.
The main outcome of fault-tree analysis is the probability of occurrence of
the top event. This probability is stated in terms of OR (union) or AND (intersec-
tion) of the basic events. Knowing the probability of the basic events, one can
compute the probability of the top event. This works well for small fault trees,
but for large problems the computations become complex. An efficient method,
known as the cut set approach, is employed to increase efficiency. A cut set is a
set of basic events whose joint occurrence causes the top event to take place. A
minimal cut set of the system comprises the set of components that, when they
fail, cause failure of the system. If any component of this system works, the
remaining components in this set will collectively no longer cause the failure of
the system. Thus, in a cut set, the nonoccurrence of any basic event will lead to
the nonoccurrence of the top event. A complex system, such as a large water dis-
tribution network, can be subdivided into a number of cut sets working in paral-
lel and any one of these can result in the occurrence of the top event. Thus, the
system is a union (joined by an OR switch) of the cut sets.
Clearly, the application of these techniques requires an extensive database on
the occurrence of events and failures. Such databases obviate the necessity of
assuming some value based on subjective judgment. Although many databases
exist, one has to be cautious in pooling data from different sources since the pur-
pose, categorization, etc. are not likely to be the same across the databases.
Example 14.29 A water distribution system is shown in Fig. 14-19. The failure
probabilities of various pipes are as follows: Pipe 1 (P1) = 0.004, Pipe 2 (P2) =
0.003, Pipe 3 (P3) = 0.002, and Pipe 4 (P4) = 0.005. Draw the fault-tree diagram for
the system and determine the failure probability of no supply from source to
outlet.
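[Figure: fault tree with the top event "Earthen dam failure" joined by an OR gate to piping failure, dam overtopping, failure due to excessive seepage, earthquake, and others; dam overtopping is expanded through a further OR gate]
Figure 14-18 Fault-tree for failure of earthen dam (adapted from Yen et al. [1986]).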
[Figure: water distribution system connecting the source to the outlet through Pipes 1, 2, 3, and 4]
Solution The fault-tree diagram of the system is drawn in Fig. 14-20 using AND
and OR gates. The probability of no supply from source to outlet is
Pno supply = P[ failure of Pipe 1 AND failure of Pipe 2]
≈ 22 × 10–6
Evidently, this is a very low probability, partly because of redundancy in the
network.
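The gate arithmetic behind this result can be sketched with two small helpers. The topology used here is an assumption inferred from the stated answer of about 22 × 10^–6: supply is lost if Pipes 1 and 2 both fail, or if Pipes 3 and 4 both fail. Pipe failures are also assumed to be statistically independent, and the helper names are illustrative.

```python
def and_gate(*probs):
    """Probability that all independent input events occur."""
    result = 1.0
    for p in probs:
        result *= p
    return result


def or_gate(*probs):
    """Probability that at least one of the independent input events occurs."""
    none_occur = 1.0
    for p in probs:
        none_occur *= (1.0 - p)
    return 1.0 - none_occur


# Failure probabilities of the four pipes (Example 14.29)
P1, P2, P3, P4 = 0.004, 0.003, 0.002, 0.005

# Assumed cut sets: {Pipe 1, Pipe 2} and {Pipe 3, Pipe 4}
p_no_supply = or_gate(and_gate(P1, P2), and_gate(P3, P4))
print(p_no_supply)
```

Each AND gate corresponds to one minimal cut set, and the OR gate forms the union of the cut sets, which mirrors the cut set approach described earlier.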
[Figure: fault tree with the top event "Failure of pipe system" joined by an OR gate to Branch No. 1 and Branch No. 2, each of which is formed by an AND gate]
observed, this will be termed as a nonzero-order rule. If the spillway gate open-
ing is given by g = f(x) and f() is determined a priori, this will be termed as a zero-
order rule.
Thus, the main difference between the two types of rules is that, according to
the nonzero-order decision rule, the exact value of a decision variable with
respect to stage t can only be computed after the outcomes of all random vari-
ables concerning the preceding t – 1 stages have been observed whereas, accord-
ing to the zero-order rule, the value of the decision variable is exactly known at
the beginning of stage t.
With respect to the random nature of the variables, when the unknown val-
ues of the decision variables are assumed to be deterministic, the decision rule is
called a nonrandomized decision rule. Since the random variations in the
parameters of a problem induce random variations in the optimal values of deci-
sion variables Xj, we can have a chance mechanism to determine the optimal val-
ues of Xj. The rules governing such a mechanism are called randomized decision
rules. In these rules, Xj are treated as random variables and consequently we
have to find their probability distributions. In the example just cited, the reser-
voir inflows are treated as random variables whose probability distribution
must be determined to develop and implement the decision rule.
Reliability constraints are frequently imposed on the system under consider-
ation so as to ensure a certain level of reliability regarding its performance.
x ≥ x(1− α) (14.136)
[Figure: probability density function fX(x) and exceedance probability curve of X, marking α, 1 − α, and the values x(α) and x(1−α)]
Assume that, over many years, the initial storage volumes St, in each within-
year period t, were to be within certain lower ŝt and upper st limits at least
some fraction αt of the time:
Similarly, let the reservoir releases Rt be within the range r̂t to rt at least
some fraction βt of the time:
Because the probability distributions of all St and Rt are unknown, they must
be replaced by a function of random variables whose distributions are known
before deterministic equivalents can be defined.
Substituting Eqs. 14.138 into 14.139 and 14.140 permits the definition of deter-
ministic equivalents, since the distributions of the random inflow variables It are
known. Similarly, deterministic equivalents of Eq. 14.141 and Eq. 14.142 can also
be defined by using the continuity equation:
Rt = St + It – St+1 , t = 1, 2, ..., T; T + 1 =1 (14.143)
and the linear decision rule given by Eq. 14.137 or Eq. 14.138. It is then possible
to determine the optimal values of the decision variables.
In a reservoir design problem, the objective function is to minimize the
capacity of the reservoir (C). It can be expressed mathematically as
minimize C (14.144)
The objective function is subject to a number of constraints as discussed next.
Freeboard Constraint
To provide flood control, the storage St+1 at the end of period t should be such
that the freeboard volume C – St+1 is at least vt with reliability (say) 90%. Here, vt
is the flood storage capacity required at the end of the tth month of the year.
Mathematically, the constraint can be written in deterministic form as
C – St+1 ≥ vt (14.145)
where It0.90 is the flow for month t that is available 90% of the time (see x(α) in
Fig. 14-21). Putting Eq. 14.138 in Eq. 14.146, we have
C – bt – It0.90 ≥ vt (14.148)
or
C – bt ≥ vt + It0.90, t = 1, 2, 3, …, n (14.149)
Since It0.90 is exceeded, on average, 10% of the time, this constraint should
hold 90% of the time. If monthly data are being used, the value of n will be 12.
or
Putting Eq. 14.155 in Eq. 14.153, we have the explicit statement of chance
constraints:
This completes the formulation. In the problem, all variables are measured in
volume units.
C – bt ≥ vt + It0.90
Similarly,
C – b2 ≥ 1.49, C – b3 ≥ 1.37, C – b4 ≥ 1.58, C – b5 ≥ 1.80
Table E14-30 Inflows to the Gohira reservoir at different availabilities and the
minimum required release in million cubic meters (MCM).
Month Flow available 90% of the Flow available 10% of the Minimum required
time in period t (It0.90) time in period t (It0.10) release
$$ a_m C - b_t \le I_t^{0.10} $$
Similarly,
$$ 0.3C - b_2 \le 0.01, \quad 0.3C - b_3 \le 0.007, \quad 0.3C - b_4 \le 0.005 $$
Minimum water supply requirement constraint: The constraint from Eq. 14.146 is
$$ b_{t-1} - b_t \ge q_t - I_{t-1}^{0.10} $$
$$ b_{t-1} - b_t \le f_t - I_{t-1}^{0.90} $$
Similarly,
$$ -b_3 + b_2 \le 59.51, \quad -b_4 + b_3 \le 59.63, \quad -b_5 + b_4 \le 59.42 $$
load and fr is the factor of safety for resistance. For example, if fl is 1.4 and fr is
1.3, then the combined safety factor becomes 1.82.
Most of the structures designed by such a strategy work well and are not
likely to fail. But it is well known that no structure or design is 100% safe and
there is always some probability, however small, of failure. The conventional
design does not provide any assessment of failure probability and may, at times,
result in an oversafe design. Although the fundamental justification for the use
of safety factors is to take care of the uncertainties and unknowns in load estima-
tion and design, these factors are usually chosen based on experience or some
standard guidelines rather than a detailed statistical analysis of measured data.
It is not uncommon for the same value to be used throughout a country.
A reliability-based design is a procedure in which the design of a structure is
carried out with explicit control of the probability of performance and structural
failure. A key feature of this approach is the recognition that no design can be
absolutely safe. Moreover, the concepts of reliability-based design can also be
used to assess failure probabilities from causes such as aging, changed system
environment, changes in input properties (e.g., rainfall or inflows), and changes
in demands (e.g., new cropping pattern or expansion of a city). The main advan-
tages of reliability-based design are that it (a) recognizes that natural variables
(e.g., rainfall, floods, and winds) are essentially stochastic, (b) uses information
on the probabilistic properties of these inputs in design, (c) applies the statistical
estimates of distributional properties of loads and capacity rather than the arbi-
trarily chosen factors of safety, and (d) provides an assessment of risk and reli-
ability of the design.
Besides design, the reliability-based approaches have also found wide appli-
cations in planning. Based on the data used in design, the three levels of design
can be classified as follows:
• Level III
(a) Uses joint probability distributions of load and resistance.
(b) Explicitly computes failure probability.
• Level II
(a) Based on failure probability.
(b) Treats load and resistance as random variables.
(c) Uses mean and variance of loads and resistance.
• Level I
(a) Conventional approach.
(b) Based on deterministic concepts and single values of load and resis-
tance (capacity).
(c) Assumes failure occurs if load exceeds resistance.
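The contrast between Level I and Level II can be sketched numerically. The snippet below is a minimal illustration, assuming independent, normally distributed load and resistance; the function names and all numerical values are hypothetical and are not taken from the text.

```python
import math


def std_normal_cdf(z: float) -> float:
    """Standard normal CDF, written with math.erf to avoid extra dependencies."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def level1_safety_factor(resistance: float, load: float) -> float:
    """Level I: single deterministic values, summarized by a factor of safety."""
    return resistance / load


def level2_reliability(mu_r: float, sd_r: float, mu_l: float, sd_l: float):
    """Level II: uses only means and variances of load and resistance.

    Assumes independent normal variables, so the safety margin R - L is normal.
    """
    beta = (mu_r - mu_l) / math.sqrt(sd_r ** 2 + sd_l ** 2)
    p_fail = std_normal_cdf(-beta)
    return beta, p_fail


# Hypothetical values (same units for load and resistance)
fs = level1_safety_factor(160.0, 100.0)
beta, p_fail = level2_reliability(mu_r=160.0, sd_r=25.0, mu_l=100.0, sd_l=20.0)
print(f"factor of safety = {fs:.2f}, beta = {beta:.3f}, Pf = {p_fail:.4f}")
```

The same factor of safety can correspond to very different failure probabilities once the variances change, which is the point of moving from Level I to Level II.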
The failure probability changes with time as the result of aging and changes
in system load as well as inputs.
Despite extensive literature on risk and reliability analysis, there appears to
be a considerable gap between theory and practice of design in civil and envi-
ronmental engineering. The main reasons for this gap are as follows:
• Standard and widely accepted criteria to quantify risk and reliability are
not available.
• Robust methodologies that can be readily used by practicing engineers
to include reliability measures in design and management of prototype
systems are not commonly available.
• Adequate data about the variables involved in analysis are available only
in limited cases.
• Budgetary support may not be available to carry out such an analysis.
• A general apathy toward reliability-based concepts hinders their
application.
In a majority of instances, the severity of consequences determines the selec-
tion of design events. Logic, however, would require that the values of design
variables be decided based on the expected value of damages for the events with
given probability of occurrence, the sensitivity of damages to the magnitude and
occurrence probability, and incremental cost of the engineering device or struc-
ture with respect to the magnitude of design variables. In general, the cost of the
structure increases as higher reliability is sought and the relationship between
cost and reliability is usually highly nonlinear.
The losses from the failure of a system, either structural or in terms of perfor-
mance, are expressed in monetary terms for use in planning and management
decisions. But this requires that the losses are first assessed in physical terms and
for that a list of harmful consequences of failure must be prepared. This list will
include the losses that arise because the structure is no longer able to serve its
intended function and the unwanted consequences just because the structure is
no longer present. For example, the failure of a flood embankment may lead to
inundation of downstream areas; damage to property, crops, and industries; and
loss of life. Failure of a water distribution system means that the population and
industries in the area being served will have to search for an alternative source
of water until the supply is restored. If a hydropower dam fails when the reser-
voir is full, not only is a source of electricity lost, but all the consequences of an
embankment failure are felt, possibly at a more severe level. In an industrial
area, damage is assessed based on the cost of repairs, damage to the raw mate-
rial and final product, and the loss of production.
However, it is difficult to assign a monetary value to damages. In agriculture
areas, the monetary value of crop damage depends upon the area of inundation,
type of crops grown and their growth stage, and the duration and depth of inun-
dation. Assigning a value to human life is quite difficult and is even controversial,
and many times loss of human life is reported separately. The damage in an indus-
trial area is assessed based on the cost of repairs and loss in production.
Example 14.31 Design a levee for a stream. The annual peak flow QL for this
stream is characterized by the Gumbel distribution with mean and coefficient of
variation of 500 cfs and 0.393, respectively. The cross section of the stream can be
assumed as given in Fig. 14-22. The stream capacity QC can be calculated using
Manning’s equation as
$$ Q_C = \frac{1.49}{n} A R^{2/3} S^{1/2} $$
The geometrical and hydraulic characteristics are given in Table E14-31a.
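Manning's equation in the form given above can be evaluated directly. The sketch below assumes U.S. customary units (area in ft^2, hydraulic radius in ft, discharge in cfs); the function name and the area and hydraulic radius used in the call are hypothetical, while n and S are the main-channel mean values from Table E14-31a.

```python
def manning_capacity(n: float, area_ft2: float, hyd_radius_ft: float, slope: float) -> float:
    """Discharge capacity Q_C = (1.49 / n) * A * R^(2/3) * S^(1/2)."""
    return (1.49 / n) * area_ft2 * hyd_radius_ft ** (2.0 / 3.0) * slope ** 0.5


# A = 400 ft^2 and R = 4 ft are hypothetical; n = 0.08 and S = 9.873e-5 are table means
q_capacity = manning_capacity(n=0.08, area_ft2=400.0, hyd_radius_ft=4.0, slope=9.873e-5)
print(f"Q_C = {q_capacity:.1f} cfs")
```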
[Figure 14-22: stream cross section. Recoverable labels: side slopes 1:12 and 1:3; width 100 ft; nb = 0.11 (both overbanks); nc = 0.08 (main channel).]
Table E14-31a
Parameter      C        n (main channel)   n (left over bank)   n (right over bank)   Slope
Mean           1        0.08               0.11                 0.08                  9.873 × 10^-5
CV             0.15     0.15               0.15                 0.15                  0.25
Distribution   Normal   Normal             Normal               Normal                Normal
Solution The limit state function (or the performance function) can be defined as
$$ Z = C_1 \left( Y_c n_c^{-1} + 2 Y_b n_b^{-1} \right) S^{0.5} - Q_L $$
where Y = AR2/3 is called the section factor. Yc and Yb represent section factors
for the main channel and over bank sections, respectively. The values of Yc and
Yb are considered constant for a given trial flood stage. Assuming the variables
are uncorrelated, we can write the problem of determining the reliability in
terms of the ellipsoid method as
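minimize
$$ \beta = \sqrt{ \frac{(C - \mu_C)^2}{\sigma_C^2} + \frac{(n_c - \mu_{n_c})^2}{\sigma_{n_c}^2} + \frac{(n_b - \mu_{n_b})^2}{\sigma_{n_b}^2} + \frac{(S - \mu_S)^2}{\sigma_S^2} + \frac{(Q_L - \mu_L)^2}{\sigma_L^2} } $$
subject to
$$ C_1 \left( Y_c n_c^{-1} + 2 Y_b n_b^{-1} \right) S^{0.5} - Q_L = 0 $$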
We see that most parameters are normally distributed, except QL, which is
defined by the Gumbel distribution. Therefore, it is necessary to determine the
characteristics of the equivalent normal distribution at various points on the
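limit state function. Based on the Gumbel distribution the probability of x being
exceeded is expressed as
$$ P(X \ge x) = 1 - \exp\{-\exp[-\alpha (x - u)]\} $$
where
$$ \alpha = \frac{\pi}{\sigma_X \sqrt{6}} = \frac{\pi}{(500 \times 0.393)\sqrt{6}} = 0.007 $$
and
$$ u = \bar{x} - \frac{0.57722}{\alpha} = 500 - \frac{0.57722}{0.007} = 411.56 $$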
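The Gumbel parameters can be checked with a few lines of code. This is a sketch under the assumption that the exceedance probability takes the usual Gumbel form 1 - exp{-exp[-alpha (x - u)]}; the function names are illustrative. Note that u = 411.56 is reproduced only when the unrounded alpha (about 0.00653) is carried through, not the rounded 0.007.

```python
import math

EULER_GAMMA = 0.57722  # value used in the example


def gumbel_parameters(mean: float, cv: float):
    """Method-of-moments Gumbel parameters from the mean and coefficient of variation."""
    sd = mean * cv
    alpha = math.pi / (sd * math.sqrt(6.0))
    u = mean - EULER_GAMMA / alpha
    return alpha, u


def gumbel_exceedance(x: float, alpha: float, u: float) -> float:
    """P(X >= x) for a Gumbel (extreme value type I) variable."""
    return 1.0 - math.exp(-math.exp(-alpha * (x - u)))


alpha, u = gumbel_parameters(mean=500.0, cv=0.393)
p_exc = gumbel_exceedance(599.27, alpha, u)  # x* = 599.27 cfs from the example
print(f"alpha = {alpha:.6f}, u = {u:.2f}, P(QL >= 599.27) = {p_exc:.3f}")
```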
Using these parameters of the Gumbel distribution, we can calculate the
parameters of the equivalent normal distribution at a given point x* (say
x* = 599.27 cfs) as seen in Table E14-31b.
Once these parameters are calculated, the problem of determining the reli-
ability index can be formulated as shown in Table E14-31b.
In this formulation, a dynamic link is made to determine the parameters of the
equivalent normal distribution for all the trials in the nonlinear optimization pro-
cess. Now invoking SOLVER in Excel by setting β = minimum while using
assumed Xi values and subjecting them to the constraint Z = 0, one can easily per-
form the calculation for the reliability index. The calculations presented in
Table E14-31b correspond to a stage of 7.94 ft. Similar calculations can be performed
for other assumed flood stages. Table E14-31c presents the flood stage and proba-
bility of levee failure and levee reliability.
Now, a designer can choose an appropriate levee height corresponding to a
desired reliability. As the levee height is increased the cost of the project will
increase, so in many situations a tradeoff in cost and reliability must be
considered.
Table E14-31b
Item                               C      n_c     n_b      S                Q_L
Mean                               1      0.08    0.11     9.873 × 10^-5    1033.437
CV                                 0.15   0.15    0.15     0.25
SD                                 0.15   0.012   0.0165   2.469 × 10^-5    657.468
Assumed solution, X_i              0.98   0.08    0.11     9.58 × 10^-5     599.27
U_i^2 = [(X_i − μ_i)/σ_i]^2        0.02   0.02    0.00     0.01             0.44

Reliability index:
$$ \beta = \sqrt{ \frac{(C - \mu_C)^2}{\sigma_C^2} + \frac{(n_c - \mu_{n_c})^2}{\sigma_{n_c}^2} + \frac{(n_b - \mu_{n_b})^2}{\sigma_{n_b}^2} + \frac{(S - \mu_S)^2}{\sigma_S^2} + \frac{(Q_L - \mu_L)^2}{\sigma_L^2} } = 0.704 $$
Failure probability:
$$ P_f = 1 - \Phi(\beta) = 0.24059 $$
Performance function:
$$ Z = C_1 \left( Y_c n_c^{-1} + 2 Y_b n_b^{-1} \right) S^{0.5} - Q_L = 0.000 $$
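The last rows of Table E14-31b can be reproduced approximately from the tabulated U_i^2 values. The sketch below is illustrative only: it takes the rounded U_i^2 entries as given, so it returns beta = 0.700 rather than the tabulated 0.704, which was presumably computed from unrounded values.

```python
import math


def std_normal_cdf(z: float) -> float:
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# Rounded U_i^2 entries copied from Table E14-31b
u_squared = {"C": 0.02, "n_c": 0.02, "n_b": 0.00, "S": 0.01, "Q_L": 0.44}

beta = math.sqrt(sum(u_squared.values()))  # distance from the mean point in standardized space
p_fail = 1.0 - std_normal_cdf(beta)
reliability = 1.0 - p_fail
print(f"beta = {beta:.3f}, Pf = {p_fail:.3f}, R = {reliability:.3f}")
```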
Table E14-31c
Flood stage Failure probability, Pf Reliability, R
7.94 0.241 0.759
8.72 0.105 0.895
9.62 0.022 0.978
10.22 0.008 0.992
10.74 0.003 0.997
11.23 0.002 0.998
12 0.001 0.999
15 0.000 1.000
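A designer can read an approximate flood stage for a target reliability off Table E14-31c. The helper below is a sketch that assumes linear interpolation between tabulated rows is acceptable, which the text does not state; the function name is illustrative.

```python
def interpolate_stage(target_reliability, stages, reliabilities):
    """Linearly interpolate the flood stage for a target reliability.

    Assumes reliabilities increase strictly with stage, as in Table E14-31c.
    """
    pairs = list(zip(stages, reliabilities))
    for (s0, r0), (s1, r1) in zip(pairs, pairs[1:]):
        if r0 <= target_reliability <= r1:
            frac = (target_reliability - r0) / (r1 - r0)
            return s0 + frac * (s1 - s0)
    raise ValueError("target reliability is outside the tabulated range")


# Values copied from Table E14-31c
stages = [7.94, 8.72, 9.62, 10.22, 10.74, 11.23, 12.0, 15.0]
reliabilities = [0.759, 0.895, 0.978, 0.992, 0.997, 0.998, 0.999, 1.000]

stage_95 = interpolate_stage(0.95, stages, reliabilities)
print(f"stage for 95% reliability (linear interpolation) = {stage_95:.2f}")
```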
Example 14.32 Size a storm sewer with a 90% reliability using the following
information: The storm sewer peak runoff QL is given as QL = λLCIA . The
capacity QC of the sewer is given as
$$ Q_C = \lambda_m \frac{0.463}{n} d^{8/3} S_0^{1/2} $$
The definition and statistical characteristics of uncertain variables are given
in Table E14-32a.
Table E14-32a
Parameter Distribution Mean Coefficient of
variation
λm Triangular 1.100 0.089
D (ft) Triangular 7.000 0.041
S0 (ft/ft) Triangular 0.005 0.250
λL Triangular 1.000 0.123
C Triangular 0.825 0.200
I (inches/hour) Triangular 4.000 0.300
A (acres) Triangular 25.000 0.041
n Gamma 0.015 0.300
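The two expressions given for the peak runoff and the sewer capacity can be evaluated at the mean values in Table E14-32a. The snippet is a sketch: the function names are illustrative, and the trial diameter of 3.0 ft is the one used in the iteration tables of this example rather than the mean listed for D.

```python
def peak_runoff(lambda_l: float, c: float, i_in_per_hr: float, area_acres: float) -> float:
    """Storm sewer peak runoff Q_L = lambda_L * C * I * A."""
    return lambda_l * c * i_in_per_hr * area_acres


def sewer_capacity(lambda_m: float, n: float, d_ft: float, s0: float) -> float:
    """Sewer capacity Q_C = lambda_m * (0.463 / n) * d^(8/3) * S0^(1/2)."""
    return lambda_m * (0.463 / n) * d_ft ** (8.0 / 3.0) * s0 ** 0.5


q_l = peak_runoff(lambda_l=1.000, c=0.825, i_in_per_hr=4.000, area_acres=25.000)
q_c = sewer_capacity(lambda_m=1.100, n=0.015, d_ft=3.0, s0=0.005)  # trial d = 3.0 ft
print(f"Q_L = {q_l:.1f}, Q_C(d = 3 ft) = {q_c:.1f}")
```

At the mean values a 3-ft sewer carries less than the mean peak runoff, which is consistent with the high failure probability reported for that trial size.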
Solution Combining the storm sewer peak runoff and the sewer capacity
expressions, we can define the performance function Z as
With the above process, the probability of failure is determined for different
assumed sewer diameters. Table E14-32e presents the probability of failure Pf and
reliability R = 1– Pf corresponding to various sewer sizes. One can plot these
numbers, which can be used to interpolate the Pf and R values corresponding to
any other desired size. Figure 14-23 presents a plot of the probability of failure and
reliability versus sewer diameter. It is clear from Fig. 14-23 and Table E14-32e
that the sewer diameter should be 6 ft for 90% reliability.
[Figure 14-23: Probability of failure and reliability (vertical axis: probability, 0.0 to 1.0) versus sewer diameter (horizontal axis, 2 to 8 ft).]
14.7 Questions
14.1 For a given watershed the point-source load, WLA, is 50 lb/day and the
non-point-source load is 1,000 lb/day. The TMDL [= WLA + LA + MOS]
capacity at the outlet of this watershed is determined to be 1,500 lb/day.
Determine the margin of safety and factor of safety.
14.2 Assume that the magnitude of uncertainty (represented by the coeffi-
cient of variation, CV) associated with TMDL, WLA, and LA are 0.25,
0.20, and 0.35, respectively. Using the values for point- and non-point-
source loads and the TMDL capacity in Question 14.1 as mean values,
determine the reliability index and the corresponding probability of
failure. Assume that TMDL, WLA, and LA are independent and nor-
mally distributed.
Table E14-32b Equivalent normal distributions for non-normal distributions.
Column groups: (2)-(3) given information; (4)-(5) parameters of the distribution; (6)-(7) cumulative probability and pdf values at the assumed point x*; (8)-(9) values from Eqs. 14.50 to 14.53; (10)-(11) parameters of the equivalent normal distribution.

Triangular distributions
System      Mean X   CV X    a        b        P(x*)   p(x*)     Φ^-1[P(x*)]   φ{Φ^-1[P(x*)]}   SD      Mean
parameter
(1)         (2)      (3)     (4)      (5)      (6)     (7)       (8)           (9)              (10)    (11)
λ_M         1.100    0.089   0.860    1.340    0.500   4.166     0.000         0.399            0.096   1.100
D           3.000    0.041   2.699    3.301    0.500   3.319     0.000         0.399            0.120   3.000
S_0         0.005    0.250   0.002    0.008    0.500   326.599   0.000         0.399            0.001   0.005
λ_L         1.000    0.123   0.699    1.301    0.500   3.319     0.000         0.399            0.120   1.000
C           0.825    0.200   0.421    1.229    0.500   2.474     0.000         0.399            0.161   0.825
I           4.000    0.300   1.061    6.939    0.500   0.340     0.000         0.399            1.173   4.000
A           25.000   0.041   22.502   27.498   0.500   0.400     0.000         0.399            0.997   25.000

Gamma distribution
System      Mean x   CV x    a        b        P(x*)   p(x*)     Φ^-1[P(x*)]   φ{Φ^-1[P(x*)]}   SD      Mean
parameter
n           0.015    0.300   11.111   0.001    0.540   87.992    1.00E-01      0.397            0.005   0.015
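The triangular rows of Table E14-32b can be reproduced with a short script. This is a sketch under two assumptions inferred from the tabulated numbers rather than stated in the text: the triangular distributions are symmetric about the mean (so the half-width is sqrt(6) times the standard deviation), and the equivalent normal parameters follow the usual matching of cdf and pdf values at x*. All function names are illustrative.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, mean 0 and standard deviation 1


def symmetric_triangular_bounds(mean: float, cv: float):
    """Lower and upper limits (a, b) of a symmetric triangular distribution."""
    half_width = math.sqrt(6.0) * mean * cv
    return mean - half_width, mean + half_width


def triangular_cdf_pdf(x: float, a: float, b: float):
    """cdf and pdf of a symmetric triangular distribution on [a, b]."""
    if not a <= x <= b:
        raise ValueError("x must lie within [a, b]")
    c = 0.5 * (a + b)  # mode of the symmetric triangle
    if x <= c:
        cdf = (x - a) ** 2 / ((b - a) * (c - a))
        pdf = 2.0 * (x - a) / ((b - a) * (c - a))
    else:
        cdf = 1.0 - (b - x) ** 2 / ((b - a) * (b - c))
        pdf = 2.0 * (b - x) / ((b - a) * (b - c))
    return cdf, pdf


def equivalent_normal(x_star: float, cdf: float, pdf: float):
    """Mean and SD of the normal distribution matching cdf and pdf at x_star."""
    z = STD_NORMAL.inv_cdf(cdf)
    sd_n = STD_NORMAL.pdf(z) / pdf
    mean_n = x_star - z * sd_n
    return mean_n, sd_n


# Row for lambda_M: mean 1.100, CV 0.089, assumed point x* equal to the mean
a, b = symmetric_triangular_bounds(1.100, 0.089)
cdf, pdf = triangular_cdf_pdf(1.100, a, b)
mean_n, sd_n = equivalent_normal(1.100, cdf, pdf)
print(f"a = {a:.3f}, b = {b:.3f}, p(x*) = {pdf:.3f}, SD = {sd_n:.3f}, mean = {mean_n:.3f}")
```

Because x* sits at the mean of a symmetric distribution, the cdf is 0.5, the standard normal variate is zero, and the equivalent normal mean equals x*, as in columns (8) and (11).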
Table E14-32c Solution using Rackwitz’s iterative method (by setting up the algorithm
at old X_i = Mean X and d = 3.0 ft).
Parameter   Old X_i   Mean from col. (11)   SD from col. (10)   dg/dx      α_i (Eq. 14.56)   New X_i
λ_M         1.100     1.100                 0.096               3.913      0.113             1.111
D           3.000     3.000                 0.120               4.802      0.138             3.017
S_0         0.005     0.005                 0.001               5.490      0.158             0.005
n           0.015     0.015                 0.005               –13.517    –0.389            0.013
λ_L         1.000     1.000                 0.120               –9.916     –0.286            0.965
Performance function, Z = –1.995
Reliability index, β = 0
Table E14-32d Solution using Rackwitz’s iterative method (obtained for d = 3.0 ft).
Variable   Old X_i   Mean of X_i   SD of X_i   ∂g/∂x      α_i       New X_i
λ_M        1.114     1.100         0.101       5.218      0.145     1.114
d          3.023     2.999         0.128       6.506      0.180     3.023
S_0        0.005     0.005         0.001       7.167      0.199     0.005
n          0.012     0.014         0.004       –18.476    –0.512    0.012
λ_L        0.973     1.001         0.129       –7.641     –0.212    0.973
C          0.760     0.828         0.179       –13.575    –0.376    0.760
I          3.129     4.030         1.321       –24.350    –0.674    3.129
A          24.933    25.001        1.022       –2.365     –0.065    24.933
Performance function, Z = 5.09 × 10^-5
Reliability index, β = 1.012
Probability of failure, P_f = 0.844
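The α_i column of Table E14-32d can be checked against the ∂g/∂x column. The sketch below assumes, based on the tabulated numbers, that α_i is the sensitivity entry divided by the Euclidean norm of the whole column; variable names are illustrative.

```python
import math

# Sensitivity column copied from Table E14-32d
sensitivities = {
    "lambda_M": 5.218, "d": 6.506, "S0": 7.167, "n": -18.476,
    "lambda_L": -7.641, "C": -13.575, "I": -24.350, "A": -2.365,
}

norm = math.sqrt(sum(v * v for v in sensitivities.values()))
alphas = {name: v / norm for name, v in sensitivities.items()}  # direction cosines

for name, alpha_i in alphas.items():
    print(f"{name:9s} alpha = {alpha_i:+.3f}")
```

The largest magnitude belongs to the rainfall intensity I, which suggests that I contributes most to the uncertainty in the performance function at this trial diameter.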
14.3 The average volume of space for flood control in a multipurpose storage
reservoir is about 15 million m3 with a standard deviation of 2.0 million m3.
The mean volume of the largest flood in a given year is 9 million m3 with a
standard deviation of 3.0 million m3. Determine the probability that the res-
ervoir would not be able to contain the largest flood.
14.4 Considering the data of Question 14.3, determine the risk that the reser-
voir would not be able to contain the flood if the flood and the capacity
followed the gamma distribution.
14.5 The monthly maximum concentration in a stream is observed to be
approximately normally distributed, with a mean of 10.00 mg/L and a
standard deviation of 3.0 mg/L. For waste discharge, there is a require-
ment that the pollutant concentration, 200 m downstream from the out-
fall, exceeding 10 mg/L during any one month should have less than 1%
chance. Is the requirement being satisfied?
14.6 The water supply to a town is 1,000 m3 per day. The demand for water in
the town follows a normal distribution with a mean of 900 m3 per day
and a standard deviation of 250 m3. Determine the risk that the water
demand would not be met on a typical day. Determine the risk if the
water supply had a standard deviation of 200 m3.
14.7 Assume that the water supply and demand in Question 14.6 are corre-
lated with ρ = 0.2 and their joint probability distribution is normal.
Determine the probability P(C ≤ 800, D ≥ 1000). Also determine this
probability if ρ = 0.5.
14.8 Solve Question 14.2 using FOA by defining the performance function as
in the conventional MOS and FOS.
14.9 Let the concentration C of a pollutant in a stream be given as C = aQb,
where Q is the streamflow and a and b are some constants. Consider
the performance function as Z = Cmax – aQb, with Cmax (the maximum
allowable pollutant concentration) = 15 mg/L, a = 2.10 × 10–8, and b = 3.
The flow discharge has a mean value of 1,000 cfs and a coefficient of
variation of 0.35. Determine the probability that the allowable stream
standard would be violated.
14.10 Let the chlorine concentration C (mg/L) at time t in a drinking water
distribution system be given as C = C0 exp(−kt), where C0 = the initial
chlorine concentration = 4 mg/L and k = the overall decay constant
(1/hour) with a mean value of 0.20/hour and coefficient of variation
of 0.30. Determine the reliability that a location having a travel time of 20
hours would have at least 0.50 mg/L of residual chlorine.
whose parameters are given in Table 14-11. Determine the reliability of the
outlet structure against the scouring induced by vertical jets downstream
of the outlet facility, if the foundation depth is df = 8, 10, 12, and 15 m.
14.12 Solve Question 14.8 using the corrected FOA method.
14.13 Solve Question 14.9 using the corrected FOA method assuming Q is
characterized by (1) a normal distribution, (2) a log-normal distribution,
and (3) a gamma distribution.
14.14 Solve Question 14.10 using the corrected FOA method assuming k is
characterized by (1) normal and (2) gamma distributions.
14.15 Solve Question 14.11 using the corrected FOA method.
14.16 Solve Question 14.11 using the ellipsoid approach.
14.17 A culvert has been designed for a carrying capacity of 50 cfs. Based on
the rational formula the 5-year flow at that point is given as Q = 2.41 × C
× A × (Tc + 0.2)–0.77. Assuming all variables are independent and nor-
mally distributed, determine the probability of failure of the culvert
using FORM and the data from Table Q14-17.
Table Q14-17
Function Mean SD
A 15.00 1.25
C 0.50 0.20
Tc 0.40 0.25
14.18 Solve Question 14.17 using FORM if C and Tc are correlated with
r(C, Tc) = 0.50.
14.19 Solve Question 14.18 using FORM if C and Tc and A are characterized by
log-normal, gamma, and triangular distributions, respectively. Assume
ρ = 0.50 between C and Tc.
14.20 Solve Question 14.10 for the time of travel t = 1 to 24 hours using the
generic function method. Also calculate the mean and variance using the
FOA method and their errors using the corrected FOA method. Assume
k is characterized by a gamma distribution with a mean of 0.20/hour
and a coefficient of variation of 0.33.
14.21 Solve Question 14.17 using the PE method.
Chapter 15
Risk is an inherent part of life and engineering decisions. The word “risk” seems
to have been derived from Spanish or Portuguese. Originally, it referred to sail-
ing into uncharted waters and it had therefore an orientation in space. Another
illustration of spatial orientation would be individuals or governments or banks
lending money for projects. With progression of time, its connotation assumed a
time dimension. For example, in water resources and environmental engineer-
ing risk may entail calculation of the probable adverse consequences of environ-
mental and water resources projects to be built for people in the project area.
Normally, one would want to minimize the risk of undesirable consequences or
outcomes of a decision. In most cases it is not possible to completely eliminate
risk; however, one can mitigate it. Before initiating a discussion on risk, the fre-
quently used terms are defined first.
purpose—it also changes with discipline. In general terms, risk can be defined as
the potential loss resulting from the convolution of hazard and vulnerability.
Mathematically, risk may be expressed as the probability of surpassing a deter-
mined level of economic, social, or environmental consequence at a certain site
and during a certain period of time. Convolution is a mathematical term that
refers to concomitance and mutual conditioning of hazard and vulnerability. In
other words, a system experiences risk only if it is both exposed to a hazard
and vulnerable to it. Hazard and vulnerability are mutually conditioning
situations and neither can exist on its own. Altering either or both components of
risk alters the overall system risk as well. However, in many cases it is not possi-
ble to modify hazard to reduce risk; one has to reduce the vulnerability of the
system as a measure of prevention or mitigation, a process also known as risk
reduction.
Most commonly, in civil engineering risk has been defined as the probability
of a system failure, the reciprocal of the expected length of time before a system
failure takes place, or some measure of the cost of failure. The Royal Society
(1983) defined risk as the probability that a particular adverse event (an event
whose occurrence produces harm, such as a 100-year flood, a category-5 hurri-
cane, a 50-ft storm surge, or a 300-mile/hour tornado) occurs during a stated
period of time. Thus, the concept of risk combines a probabilistic measure of the
occurrence of the adverse event with a measure of the consequences of the
occurrence of that event. The occurrence may include the amount or intensity,
starting time, or duration.
An important point to note is that risk is not viewed in a positive sense. Con-
sider, for example, that a person receives information that he is likely to receive
an award of either 1 million dollars or 10 million dollars. Although the person is
not certain as to the amount he will receive, it is safe to say that he will not be
under risk.
Risk involves uncertainty as well as loss or damage. For example, the risk of
flooding involves the probability of occurrence of the flood as well as the dam-
age that might result from the flood event. Therefore,
Risk = Hazard uncertainty + Consequence owing to the system’s
vulnerability to hazard (damage or loss) (15.1)
Sometimes risk is defined as the probability times consequences. A draw-
back of this definition is that it equates risk of a high-probability, low-damage
event with that of a low-probability, high-damage event. Clearly, in real life these
two events may not amount to the same risk.
The converse of loss is benefit, which is defined in terms of gain or improve-
ment for a human being, a society, a nation, the human population, or the planet.
Expected benefit includes an estimate of the probability of achieving the gain.
Gain and loss are often measured in economic terms but in real life they can also
be in nontangible terms.
Risk Analysis and Management 671
Example 15.1 Most people are always interested in weather. What is the risk
from the coldest weather on record next winter in Baton Rouge, Louisiana?
Solution Weather plays an important role in our daily lives. We want to know
whether the coming winter in Baton Rouge will be the coldest on record and its
consequences if it is. Thus, risk (assuming cold weather is harmful) in this case is
the probability of occurrence of the coldest winter next year in Baton Rouge and
the ensuing consequences in terms of crop damage, bursting of pipes, traffic
accidents, higher heating bills, and so on.
15.1.3 Hazard
Hazard is a situation or occurrence of an event that could, in particular circum-
stances, lead to harm. Hazard can be considered as a latent danger or an external
risk factor of an exposed system. This can be mathematically expressed as the
probability of occurrence of an event of certain intensity at a specific site and
during a determined period of exposure. Thus, hazard is a source of risk and risk
includes the chances of conversion of that source into actual loss. For example, it
is not advisable to drive when road conditions are icy. It is not advisable to go
to a beach when there is a tsunami warning. It is hazardous to swim across a
river in spate, because the chances of drowning are significant even for an
expert swimmer. But if the same person crosses the river in a motorboat
equipped with a powerful engine, a rugged body, and life jackets, the risk of
drowning is considerably smaller. Thus safeguards help reduce risk.
Mathematically, one can write (Kaplan and Garrick 1981)
Risk = Hazard/Safeguards (15.2)
Example 15.2 Consider a detention pond for local flood control in an urban
area. What could the risk be from this detention structure?
Solution The detention dam may be overtopped and breached. As a result, the
dam breach may cause harm to people in the urban area. The risk would be the
probability of specified damage or harm in a given period.
For water control structures, hazard from failure depends on the size of the
structure. Therefore, decisions about the recommended design load (or design
flood) are based upon the size of the structure and its hazard potential. The rec-
ommendations of the U.S. Army Corps of Engineers regarding selection of spill-
way design flood for a dam are given in Table 15-1. A typical classification of
reservoirs according to size and the hydraulic head is given in Table 15-2. The
hazard potential classification of reservoirs is given in Table 15-3.
15.1.4 Disaster
The term disaster has the connotation of an event capable of inflicting damage or
causing danger to human and/or animal life and/or property. A disaster can be
anthropogenic or natural. Hurricanes, typhoons, cyclones, earthquakes, tsuna-
mis, lightning, and land subsidence are examples of natural disasters. Anthropo-
genic disasters include dam breaching, levee failure, chemical spills, nuclear
explosions, bomb blasts, bridge collapses, train accidents, and so on. Disasters
have occurred from time immemorial and will continue to occur. We cannot
eliminate disasters but we can mitigate their impact.
If this table contains all the possible scenarios, it is the estimation of risk. Of
course, the list of scenarios in a real-life case can be quite lengthy. Kaplan and
Garrick (1981) suggest that a category “others,” encompassing all the scenarios
that have not been thought of, may be added to the list for the sake of comple-
tion. Of course, the problem of assigning a probability to this category remains
to be tackled. Logically, the probability for the event in this category will be very
small because the event has not happened; otherwise it would have been
included in the list.
The triplet-based definition of risk and previous discussion suggest that haz-
ard can be defined as a subset of risk—a set of doublets:
H = [si, xi], i = 1, 2, …, N (15.4)
then the second column of Table 15-2 can be accumulated and a smooth curve
between x and p can be plotted, as shown in Fig. 15-1. Such curves are termed
risk curves.
Figure 15-1 Smooth risk curves. [Curves shown: risk curve A and risk curve B; axes: probability and risk.]
and manufactured risk. External risk is the risk that stems from outside, from the
fixities of tradition or nature. Manufactured risk is the risk that is created by our
actions and can occur in a situation that we have very little experience of con-
fronting. For example, environmental risks of global warming or climate change
are manufactured risks, influenced by intensifying globalization. This is the risk
created by the very impact of our developing knowledge upon the world. Flood-
ing risk from land use change, such as urbanization, can be categorized as a man-
ufactured risk.
From the earliest days of human civilization up to the threshold of modern
times, risks were primarily due to external sources (natural): floods, famines,
plagues, earthquakes, tsunamis, etc. Recently, the focus has shifted from what
nature does to us to what we have done to nature. This marks the transition from
the predominance of external risk to that of manufactured risk. Much of what
used to be natural is not completely natural any more. Therefore, natural phe-
nomena, such as floods, droughts, diseases, extreme weather, and land subsid-
ence, are not entirely natural; rather, they are being influenced by human
activities, as suggested by their unusual features.
As a manufactured risk expands, there is a new riskiness to risk. The very
idea of risk is tied to the possibility of calculation. However, in many cases we
simply do not know what the level of risk is and we could not know for sure
until it is too late. In these circumstances, there are two extremes to characteriz-
ing risk. On the one hand, if the risk is real, there must be an explicit statement to
that effect and the risk must be emphasized even at the cost of scaremongering.
On the other hand, if the risk turns out to be minimal, there will indeed be accu-
sations of scaremongering. Furthermore, if the risk is not emphasized and it
turns out to be significant, there will then be accusations of cover-up. This is
illustrated in Fig. 15-2.
[Figure 15-2: surviving label "Real risk".]
$$ P_{Acc} = \frac{\beta_i \times 10^{-4}/\text{year}}{\nu_{ij}} \tag{15.5} $$
where βi can range from 0.01 for a high risk for an action that gives no benefit to
the person to 10 for a risky activity that brings high satisfaction to the person;
and νij is the vulnerability of an individual to an event xi. For a nation, the
acceptable probability, according to Vrijling et al. (1995), becomes
$$ P_{Acc} = \frac{10^{-3}/\text{year}}{n^2} \quad \text{for } n \ge 10 \text{ casualties} \tag{15.6} $$
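The two acceptable-probability expressions (Eqs. 15.5 and 15.6) are simple enough to evaluate directly. The sketch below uses illustrative function names and hypothetical inputs; only the constants 10^-4 and 10^-3 per year and the limit n >= 10 come from the text.

```python
import math


def acceptable_individual_probability(beta_i: float, vulnerability: float) -> float:
    """Eq. 15.5: acceptable annual probability for an individual."""
    return beta_i * 1e-4 / vulnerability


def acceptable_national_probability(n_casualties: int) -> float:
    """Eq. 15.6: acceptable annual probability of an event with n casualties."""
    if n_casualties < 10:
        raise ValueError("Eq. 15.6 is stated for n >= 10 casualties")
    return 1e-3 / n_casualties ** 2


# Hypothetical inputs: involuntary activity (beta_i = 0.01), full vulnerability (nu_ij = 1)
p_individual = acceptable_individual_probability(beta_i=0.01, vulnerability=1.0)
p_national = acceptable_national_probability(100)
print(f"individual: {p_individual:.1e} per year, national (n = 100): {p_national:.1e} per year")
```

The inverse-square dependence on n means that a tenfold increase in casualties lowers the acceptable probability a hundredfold.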
A problem with the idea of acceptable risk is that acceptability or otherwise of
a risk can only be expressed along with the associated costs and benefits. Given
an option, one may not be willing to accept any risk at all. At a given time, a risk
may be taken only if some benefits are associated with it and these cannot be
obtained in another way unless a higher risk is taken. Logically, a decision maker
will choose the optimum mixture of risk, cost, and benefit and might be willing to
take a higher risk only if it is associated with either less cost or more benefit.
The second point to highlight here is the implicit assumption that risks are
linearly comparable, but this is not true. For instance, one cannot say in Fig. 15-1
whether risk is higher in case of the curve A or curve B. A way out is to some-
how reduce the risk curves to single numbers by defining a utility function,
which depends on x, and then integrate to determine the expected utility.
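Reducing a risk curve to a single number through a utility function can be illustrated with a discrete analogue of the integration described above. Everything in this sketch is hypothetical: the two scenario sets, the damage units, and the risk-averse utility u(x) = -x^1.5 are chosen only to show that two curves with the same expected damage can rank differently.

```python
def expected_value(scenarios, func):
    """Sum of probability times func(consequence) over (probability, consequence) pairs."""
    return sum(p * func(x) for p, x in scenarios)


def risk_averse_utility(damage: float) -> float:
    """Hypothetical utility that penalizes large damages more than proportionally."""
    return -(damage ** 1.5)


# Hypothetical risk curves as (annual probability, damage) pairs
curve_a = [(0.10, 1.0), (0.01, 10.0)]    # frequent, small losses
curve_b = [(0.02, 5.0), (0.001, 100.0)]  # rare, large losses

damage_a = expected_value(curve_a, lambda x: x)
damage_b = expected_value(curve_b, lambda x: x)
utility_a = expected_value(curve_a, risk_averse_utility)
utility_b = expected_value(curve_b, risk_averse_utility)
print(f"expected damage: A = {damage_a:.3f}, B = {damage_b:.3f}")
print(f"expected utility: A = {utility_a:.3f}, B = {utility_b:.3f}")
```

With this utility, curve B is ranked worse even though the expected damages match, which also reflects the drawback of defining risk purely as probability times consequence.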
The steps of risk management are depicted in Fig. 15-3. These days, most
large organizations integrate risk management in overall decision making. There
is considerable diversity of opinion as to the identification, measurement, and
regulation of risk. Risk management may entail
1. The degree of anticipation to be adopted
2. The extent of blame (orientation) in management systems
3. The contribution of quantitative assessment techniques
4. The feasibility of institutional design
5. The cost of risk reduction
6. The desirable level of participation
7. The regulatory budget
Figure 15-4 FEMA Flood Insurance Rate Map (FIRM) for St. Clair County, Illinois.
quality of the environment in the short or longer term. Risks affecting the envi-
ronment should not be confused with risks caused by environmental effects,
either natural or anthropogenic.
$$ CDI = \frac{C \times CR \times EF \times ED}{BW \times AT} \tag{15.7} $$
where CDI is the chronic daily intake (mg/kg/day), C is the chemical
concentration contacted over the exposure period (mg/L), CR is the contact rate,
that is, the amount of contaminated medium contacted per unit time (L/day), EF is
the exposure frequency (days/year), ED is the exposure duration (years), BW is
the body weight (kg), and AT is the averaging time (days).
The risk from a carcinogenic chemical is calculated as
Risk = CDI × SF (15.8)
where CDI is the chronic daily intake and SF is the carcinogen slope factor.
The risk from a noncarcinogenic chemical is calculated as
HI = CDI/RfD (15.9)
where HI is the hazard index, CDI is the chronic daily intake, and RfD is the ref-
erence dose.
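Equations 15.7 to 15.9 chain together naturally in code. The sketch below uses illustrative function names, and every input value (concentration, contact rate, slope factor, reference dose, and so on) is hypothetical; the averaging time is assumed to be expressed in days.

```python
def chronic_daily_intake(c, cr, ef, ed, bw, at):
    """Eq. 15.7: CDI (mg/kg/day) = C * CR * EF * ED / (BW * AT)."""
    return (c * cr * ef * ed) / (bw * at)


def carcinogenic_risk(cdi, slope_factor):
    """Eq. 15.8: Risk = CDI * SF."""
    return cdi * slope_factor


def hazard_index(cdi, rfd):
    """Eq. 15.9: HI = CDI / RfD."""
    return cdi / rfd


# Hypothetical drinking-water exposure
cdi = chronic_daily_intake(
    c=0.005,         # mg/L
    cr=2.0,          # L/day
    ef=350.0,        # days/year
    ed=30.0,         # years
    bw=70.0,         # kg
    at=70.0 * 365.0  # days (70-year averaging period)
)
risk = carcinogenic_risk(cdi, slope_factor=1.5)  # slope factor in (kg-day)/mg
hi = hazard_index(cdi, rfd=3.0e-4)               # reference dose in mg/kg/day
print(f"CDI = {cdi:.3e} mg/kg/day, risk = {risk:.2e}, HI = {hi:.3f}")
```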
$$ R_c = SF \times \frac{CS \times IR \times CF \times FI \times EF \times ED}{BW \times AT} \tag{15.10} $$
where CS is the chemical concentration in the soil (mg/kg), CF is a conversion
factor (10^-6 kg/mg), IR is the ingestion rate (mg soil/day), FI is the fraction
ingested from contaminated sources (nondimensional), EF is the exposure fre-
quency (days/year), ED is the exposure duration (years), BW is the body weight
(kg), AT is the averaging time (period over which exposure is averaged in days),
and SF is the slope factor or cancer potency factor ((kg-day)/mg).
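The intake and risk formulas of Eqs. 15.7 to 15.10 can be sketched in a few lines of Python. This is only an illustrative sketch: the function names and all numerical inputs are hypothetical and are not taken from the text, and the averaging time AT is assumed to be expressed in days so that CDI comes out in mg/kg/day.

```python
def chronic_daily_intake(c, cr, ef, ed, bw, at):
    """Eq. 15.7: CDI (mg/kg/day); at is the averaging time, assumed in days."""
    return (c * cr * ef * ed) / (bw * at)

def carcinogenic_risk(cdi, sf):
    """Eq. 15.8: risk = CDI x slope factor."""
    return cdi * sf

def hazard_index(cdi, rfd):
    """Eq. 15.9: HI = CDI / reference dose."""
    return cdi / rfd

def soil_ingestion_risk(cs, ir, fi, ef, ed, bw, at, sf, cf=1e-6):
    """Eq. 15.10: excess lifetime cancer risk from ingestion of soil."""
    return sf * (cs * ir * cf * fi * ef * ed) / (bw * at)

# Hypothetical inputs, for illustration only.
cdi = chronic_daily_intake(c=0.005, cr=2.0, ef=350, ed=30, bw=70, at=70 * 365)
risk = carcinogenic_risk(cdi, sf=1.5)
hi = hazard_index(cdi, rfd=0.001)
rc = soil_ingestion_risk(cs=100, ir=100, fi=1.0, ef=350, ed=30,
                         bw=70, at=70 * 365, sf=1.5)
print(cdi, risk, hi, rc)
```

Keeping each equation as its own function makes it easy to swap in site-specific parameter values or to wrap the calculation in an uncertainty analysis.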
Solution Using the FOA method described in Chapter 14, we find that the mean
of a function y = x1 x2 ... xn is
$$\bar{y} = \bar{x}_1 \bar{x}_2 \cdots \bar{x}_n \tag{15.11}$$
Applying Eq. 15.10 in the context of Eq. 15.7 allows us to determine the
expected value of the excess lifetime cancer Rc as
Using Eq. 15.12 and the CV values of the various parameters given in Table E15-3
gives the coefficient of variation of the excess lifetime cancer risk Rc from the
ingestion of contaminated soil as 2.57. Because CVy >> 1, there is considerable
uncertainty in the Rc value, and its use as a representative risk estimate is not
justified.
Example 15.4 The town of Risky is located on the banks of Floody River. The
river flow follows an extreme value type I distribution with a mean of 256.8 m³/s
and a standard deviation of 78.2 m³/s. The rating curve of Floody River at a
gauging site near Risky is given by
$$Q = 97.03 \times (h - 214.5)^{0.64} \tag{15.13}$$
The Noah Shipping Company wants to construct an office on a plot of land
near the river at an elevation of 223.0 m. Find the risk of flooding at this plot every
year. What will be the risk if the ground elevation is raised 0.75 m by filling?
Solution We first need to calculate the parameters of the EVI distribution as
described in Chapter 5. Here mQ = 256.8 m³/s and σQ = 78.2 m³/s. Therefore,
$$\alpha = \frac{1.282}{78.2} = 0.0164$$
$$u = 256.8 - \frac{0.5772}{0.0164} = 221.605$$
From the rating curve expressed by Eq. 15.13, the flow at a stage of 223 m will be
381.7 m³/s. The probability that the flow in a given year will exceed a value q is
$$P[Q \ge q] = 1 - F_Q(q) = 1 - \exp\{-\exp[-0.0164(q - 221.605)]\}$$
For q = 381.7 m³/s, this gives
$$P[Q \ge 381.7] = 1 - \exp\{-\exp[-0.0164(381.7 - 221.605)]\} = 0.0695$$
Thus there is about a 7% risk that the Noah Shipping Company office will be
under water in a given year. If this risk is perceived to be high and yet the
company wants to build its office at this very site, it can consider raising the
plinth level by soil filling. If soil is filled so that the ground elevation is
raised by 0.75 m, the new elevation will be 223.75 m. The corresponding discharge
(calculated by assuming that the same rating curve remains valid) is 403 m³/s,
and for this discharge the risk of flooding is about 5% each year. Clearly, the
annual risk of flooding has been reduced by about 2 percentage points (from about
7% to about 5%) by raising the plinth level by 0.75 m.
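The calculation of Example 15.4 can be reproduced with a short script. This is a minimal sketch: the function names are illustrative, and α is carried at full precision rather than rounded to 0.0164, so the printed risks may differ from the hand calculation in the third decimal place.

```python
import math

def evi_parameters(mean, std):
    """EVI (Gumbel) parameters from the mean and standard deviation."""
    alpha = 1.282 / std
    u = mean - 0.5772 / alpha
    return alpha, u

def rating_curve_flow(stage):
    """Eq. 15.13: discharge (m3/s) at a given stage (m)."""
    return 97.03 * (stage - 214.5) ** 0.64

def annual_flood_risk(stage, mean=256.8, std=78.2):
    """Probability that the annual flow exceeds the flow at the given stage."""
    alpha, u = evi_parameters(mean, std)
    q = rating_curve_flow(stage)
    return 1.0 - math.exp(-math.exp(-alpha * (q - u)))

risk_now = annual_flood_risk(223.0)      # existing ground elevation
risk_filled = annual_flood_risk(223.75)  # after 0.75 m of fill
print(f"{risk_now:.3f} {risk_filled:.3f}")
```

Evaluating `annual_flood_risk` over a range of elevations gives the kind of elevation-versus-risk relationship that underlies floodplain zoning.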
The methodology illustrated in this example is employed to construct flood-
plain zoning maps. If a hazard-producing event leads to failure of a structure or
its component, risk assessment includes the consequences of this failure also.
A criticism of risk assessment is that any numerical estimation may be
highly uncertain because of its dependence on future human actions and devel-
opments in the area. For example, consider a flood control dam that is con-
structed in an area subject to frequent flooding. As a result of protection
provided by the dam, the area may witness rapid growth in industrial and con-
struction activities. If the dam fails 20 years after construction, the damage might
be much more than what would have occurred if the dam had not been con-
structed in the first place.
$$T(x) = \frac{1}{1 - \mathrm{Prob}(X \le x)} = \frac{1}{\mathrm{Prob}(X \ge x)} \tag{15.14}$$
where T(x) represents the average time between hazard events having intensi-
ties equal to or exceeding x. It does not mean that the event will certainly take
place. The events occurring in the time interval n – 1 < t ≤ n are plotted at t = n,
not at their actual time of occurrence in the interval. T(x) will be slightly greater
than the real average time between the events with intensities x. The magnitude
of the difference will depend on the time scale used. For example, the return
period for the time measured in years will be different from the return period
measured in decades.
Consider a random variable W(x) that denotes the time interval between two
successive exceedances of x. Then, W(x) can be referred to as the waiting time. If
an exceedance of x has just occurred, one can compute the probability that the
next exceedance will occur n time units away. Because of the assumption of
statistical independence, what occurs during one time unit has no effect on the
probabilities of future occurrences. This means that the same result would be
obtained if any integer on the time axis was selected as the initial point, irrespec-
tive of whether an exceedance has just previously occurred. If W = n, this means
that there must have been n – 1 hazard events without an exceedance of x (prob-
ability of each = q) followed by an exceedance (probability = p). Here p = 1– F(x)
and q = F(x) =1 – p. Thus,
$$= 1 + q + q^2 + q^3 + \cdots = \frac{1}{1-q} = \frac{1}{1-F(x)} \tag{15.18}$$
This equation can also be derived by noting that
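$$\begin{aligned}
T &= p\,(1 + 2q + 3q^2 + 4q^3 + \cdots) \\
  &= p\,\frac{d}{dq}\,(1 + q + q^2 + q^3 + \cdots) \\
  &= p\,\frac{d}{dq}\left(\frac{1}{1-q}\right) = \frac{p}{(1-q)^2} = \frac{p}{p^2} = \frac{1}{p} = T = \frac{1}{1-F(x)}
\end{aligned} \tag{15.19}$$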
The quantity T, as used in the hydrologic literature, is the average return
period. Thus, we state that, on average, a flood above a level x will occur once
every T years. It should be noted that the distribution function of N is a geomet-
ric progression:
$$P(N \le n) = p\,(1 + q + q^2 + q^3 + q^4 + \cdots + q^{n-1}) = p\,\frac{1-q^n}{1-q} = 1 - q^n \tag{15.20}$$
Since q = F(x) = 1 − 1/T, Eq. 15.20 can be expressed in terms of the return period as
$$P(N \le n) = 1 - \left(1 - \frac{1}{T}\right)^n \tag{15.21}$$
$$\begin{aligned}
E[N^2] &= \sum_{n=1}^{\infty} n^2 P(N = n) = \sum_{n=1}^{\infty} n^2 q^{n-1} - \sum_{n=1}^{\infty} n^2 q^n \\
       &= 1 + 3q + 5q^2 + 7q^3 + 9q^4 + 11q^5 + \cdots = \frac{1+q}{(1-q)^2}
\end{aligned}$$
Hence,
$$\mathrm{var}[N] = \frac{1+q}{(1-q)^2} - \frac{1}{(1-q)^2} = \frac{q}{(1-q)^2} = \frac{q}{p^2} \tag{15.23}$$
Example 15.5 Given that q = 0.9 and 0.99, find the variance of the return period.
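Solution When q = 0.9, p = 1 – 0.9 = 0.1. Hence, T = 10 and
$$\mathrm{var}[T] = \frac{q}{p^2} = \frac{0.9}{0.01} = \frac{9}{0.1} = 90$$
When q = 0.99, p = 1 – 0.99 = 0.01. Hence, T = 100 and
$$\mathrm{var}[T] = \frac{0.99}{0.0001} = \frac{99}{0.01} = 9900$$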
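The mean and variance of the waiting time follow directly from Eq. 15.23 and can be tabulated for any nonexceedance probability q. The sketch below uses an illustrative function name; it also reports the standard deviation, which is of the same order as the mean and makes the spread easy to see.

```python
import math

def waiting_time_stats(q):
    """Mean, variance, and standard deviation of the waiting time N
    for a nonexceedance probability q (p = 1 - q)."""
    p = 1.0 - q
    mean = 1.0 / p          # average return period T
    var = q / p ** 2        # Eq. 15.23
    return mean, var, math.sqrt(var)

for q in (0.9, 0.99):
    mean, var, std = waiting_time_stats(q)
    print(f"q={q}: T={mean:.0f}, var={var:.0f}, std={std:.1f}")
```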
These examples show that E[N] is not a good measure. Therefore, it is better
to calculate the probability of no exceedance within a given period, say, n; that is,
we want
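$$\begin{aligned}
P(N > n) = \alpha &= \sum_{k=n+1}^{\infty} P(N = k) = \sum_{k=n+1}^{\infty} q^{k-1}(1-q) \\
&= \sum_{k=n+1}^{\infty} q^{k-1} - \sum_{k=n+1}^{\infty} q^{k} \\
&= (q^n + q^{n+1} + \cdots) - (q^{n+1} + q^{n+2} + \cdots)
\end{aligned} \tag{15.24}$$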
This yields
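$$T = \frac{1}{1-q} \quad \text{or} \quad q = 1 - \frac{1}{T}$$
$$\alpha = \left(1 - \frac{1}{T}\right)^n = \left(1 - \frac{1}{T}\right)^T \quad \text{for } n = T \tag{15.26}$$
$$\lim_{T \to \infty}\left(1 - \frac{1}{T}\right)^T = e^{-1} \cong 0.368 \tag{15.27}$$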
or P (at least one exceedance within a large return period) = 1– 0.368 = 0.632.
Example 15.6 Let the return period of a flood be T = 10 years. Find the probabil-
ity of at least one exceedance within the return period.
Solution Given T = 10, then α = (1 – 0.1)^10 = 0.9^10 = 0.349. Therefore, for T = 10,
P(A) = 1 – 0.349 = 0.651.
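The probability of at least one exceedance within n time units, 1 − (1 − 1/T)^n, is easy to evaluate directly, and doing so for increasing T shows the approach to the limit 1 − e^(-1). The function name below is illustrative only.

```python
import math

def prob_at_least_one_exceedance(T, n=None):
    """P(at least one exceedance in n time units); n defaults to T."""
    if n is None:
        n = T
    return 1.0 - (1.0 - 1.0 / T) ** n

limit = 1.0 - math.exp(-1.0)
for T in (10, 25, 100, 1000):
    print(T, round(prob_at_least_one_exceedance(T), 4), round(limit, 4))
```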
$$q = 1 - p = 1 - \frac{1}{T}$$
The probability that x will not be equaled or exceeded in any of n successive
years is given by
$$\left(1 - \frac{1}{T}\right)^n$$
The probability that x will occur for the first time in the nth year is
$$q^{n-1} p = \left(1 - \frac{1}{T}\right)^{n-1} \frac{1}{T}$$
The probability that x will occur at least once in the next n years is the sum of
the probabilities of its occurrence in the first, second, … to nth years and is
therefore
$$p + pq + pq^2 + \cdots + pq^{n-1}$$
Thus, the probability that the event will occur at least once in n years is
$$R = 1 - q^n = 1 - \left(1 - \frac{1}{T}\right)^n$$
The probability R is called risk. This can also be obtained directly from the
probability of nonexceedance in n years as
$$R = 1 - \left(1 - \frac{1}{T}\right)^n \tag{15.28}$$
This equation can be used to calculate the probability that x will occur within
its return period:
$$P_T = 1 - \left(1 - \frac{1}{T}\right)^T \tag{15.29}$$
For large T, as already shown,
$$P_T = 1 - e^{-1} = 0.63$$
This indicates that the probability that x will occur within its return period is
about 63% when T is large. For example, a dam designed to withstand a flood with
a 25-year return period has about a 64% chance [1 − (1 − 1/25)^25 = 0.64] that
this design flood will be exceeded before the end of the first 25-year period.
For design purposes it might be desirable to specify some probability that
the undesirable event would occur within the design period and calculate the
required return period. If R is the risk that the event will occur within the design
period then
$$R = 1 - q^n = 1 - \left(1 - \frac{1}{T}\right)^n$$
Thus, one can compute the values of the design return period T correspond-
ing to a number of values of the risk R and the design period n.
The probability R is also called the encounter probability. Suppose a dam is
built for a postulated life of L time units (say, years). The probability that an
event with intensity x will occur during the life of the dam is the encounter
probability, E(x), and is a measure of risk. The probability of no exceedance
during L time units is [F(x)]^L. Hence, the probability of one or more exceedances is
$$E = 1 - [F(x)]^L \tag{15.30}$$
The relationship between E and T is given by
$$E = 1 - \left[1 - \frac{1}{T}\right]^L \tag{15.31}$$
Equations 15.30 and 15.31 have the same appearance because there are one
or more exceedances of x in time L if N ≤ L. Hence, E = P(N ≤ L). Table 15-7
shows values of the encounter probability for various values of the estimated life
L and return periods T. Table 15-6 shows the return periods for various values of
the encounter probability and estimated life. A comparison of the waiting time
and the encounter probability brings out several interesting properties. For
example, a dam with a 50-year life has a better than even chance of encountering
50-year floods during its life. Indeed the probability is 0.636. Thus, depending on
the amount of risk one is willing to take, a much higher return period flood will
have to be used for a 50-year dam. As an example, for a 10% risk, a 475-year
flood will have to be used.
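Equation 15.31 and its inverse can be evaluated directly, which is a convenient way to check individual entries of the tables. The sketch below uses illustrative function names; the inverse is obtained by solving E = 1 − (1 − 1/T)^L for T.

```python
def encounter_probability(T, L):
    """Eq. 15.31: probability of one or more exceedances in a life of L."""
    return 1.0 - (1.0 - 1.0 / T) ** L

def design_return_period(E, L):
    """Return period T that gives encounter probability E over a life L."""
    return 1.0 / (1.0 - (1.0 - E) ** (1.0 / L))

print(round(encounter_probability(50, 50), 3))   # 50-year flood, 50-year life
print(round(design_return_period(0.10, 50)))     # 10% risk over 50 years
```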
Table 15-6 Return periods T for estimated life L and encounter probability
E [E = 1 – (1 – 1/T)^L].
                        Encounter probability E
   L    0.02   0.05   0.10   0.15   0.20   0.30   0.40   0.50   0.70
   1      50     20     10      7      5      3      3      2      1
   2      99     39     19     13      9      6      4      3      2
   3     149     59     29     19     14      9      6      5      3
   4     198     78     38     25     18     12      8      6      4
   5     248     98     48     31     23     15     10      8      5
   6     297    117     57     37     27     17     12      9      6
   7     347    137     67     44     32     20     14     11      6
   8     396    156     76     50     36     23     16     12      7
   9     446    176     86     56     41     26     18     13      8
  10     495    195     95     62     45     29     20     15      9
  12     594    234    114     74     54     34     24     18     10
  14     693    273    133     87     63     40     28     21     12
  16     792    312    152     99     72     45     32     24     14
  18     892    351    171    111     81     51     36     26     15
  20     990    390    190    124     90     57     40     29     17
  25    1238    488    238    154    113     71     49     37     21
  30    1485    585    285    185    135     85     59     44     25
  35    1733    683    333    216    157     99     69     51     30
  40    1981    780    380    247    180    113     79     58     34
  45    2228    878    428    277    202    127     89     65     38
  50    2475    975    475    308    225    141     98     73     42
Example 15.7 Suppose a dam is designed with a projected life of 25 years. The
designer wants to take only a 10% chance that the dam will be overtopped
within this period. What return period flood should be used?
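Solution Given n = 25 and R = 10%, solving R = 1 – (1 – 1/T)^n for T gives
T = 1/[1 – (1 – R)^(1/n)] = 238 years (see also Table 15-6). This is the return
period of the flood one should use in design. A useful approximation for the
previous expression for R is
$$T = n\left(\frac{1}{R} - \frac{1}{2}\right)$$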
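How close the approximation T ≈ n(1/R − 1/2) is to the exact inversion of R = 1 − (1 − 1/T)^n can be checked numerically. The sketch below is illustrative; the function names and the set of risk levels chosen for comparison are assumptions.

```python
def exact_return_period(R, n):
    """Exact T from R = 1 - (1 - 1/T)**n."""
    return 1.0 / (1.0 - (1.0 - R) ** (1.0 / n))

def approx_return_period(R, n):
    """Approximation T = n (1/R - 1/2)."""
    return n * (1.0 / R - 0.5)

n = 25
for R in (0.05, 0.10, 0.30, 0.50):
    t_exact = exact_return_period(R, n)
    t_approx = approx_return_period(R, n)
    print(f"R={R:.2f}: exact={t_exact:.1f}, approx={t_approx:.1f}")
```

For small R the two agree closely; the gap widens as R grows, so the exact expression is the safer choice for high accepted risks.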
error. Its probability is usually designated by symbol α. A type II error occurs when
the null hypothesis is wrongly accepted. Its probability is usually designated by the
symbol β. In the decision tree of Fig. 15-5, q/I = α and q/II = β. We now illustrate
these decision problems with a number of examples.
[Figure: density f(x) showing the probability of a type I error α; an acceptance region (probability 1 – α) lies between two critical regions of α/2 each, on an x axis from 225 to 325 centered at 275.]
[Figure: β as a function of μ; acceptance region with 5% tails, x axis marked at 209.2, 242.1, 275.0, 307.9, and 340.8 (spacing 32.9).]
Figure 15-8a Operating characteristics curve.
[Plot: beta values = Prob(not rejecting H0) on the vertical axis (0.00 to 1.00) against a horizontal axis running from 0 to 100.]
[Figure: density f(x) with the acceptance region between μ1 and μ2; limits at 1.645σ with 5% tails.]
[Plot: beta values (0 to 0.9) against a horizontal axis running from 0 to 100, with curves for N = 1 and N = 25.]
Solution The firm argues that there is bound to be a difference in the proportion
of water quality violations even if the two methods are equivalent. This leads us
to adopt as null hypothesis the assumption that the two methods do not differ in
their probabilities of producing violations of waste norms. This hypothesis
should not be rejected unless the probability of a type I error is less than, say, 5%.
First, the random variable for which the acceptance region and the critical
regions are to be defined is to be identified. One can begin by defining as X228 the
number of defectives in a sample of 228, and as X337 the number of defectives in
a sample of 337. One is interested, however, in the difference between X337 and
X228, which is denoted by Y. Can we determine the probability distribution of Y,
assuming the null hypothesis to be true?
If the null hypothesis were true, then one can pool the two samples and state
that one had 65 days of exceeding waste limits out of 565 days. The probability
of violation per single sample is then p = 65/565 = 0.115. Now, X337 and X228 fol-
low binomial distributions as
X337 ~ B(337, 0.115)
X228 ~ B(228, 0.115)
For X337, the variance is
npq = 337 × (0.115) × (0.885) = 34.30
For X228, the variance is
npq = 228 × (0.115) × (0.885) = 23.20
In both cases, the value of npq is well over 9; hence the normal approximation
to the binomial distribution is justified. One can write
Example 15.10 Experience shows that, under the prevalent conditions of con-
trol, a production process will show 5% defective items. An inspector is told that
he must take action if the probability of a breakdown in control exceeds 95%.
Periodically, he takes a sample of 50 items. What minimum number of defectives
in this sample should he regard as reason for action?
[Figure: density f(y') with the observed Y marked; y' axis from –10 to 30, showing the critical region and the acceptance region.]
Solution The random variable for which the acceptance region must be deter-
mined is evidently the number of defective items in a sample of 50. Calling this
variable X, one can write
X~ B(50, 0.05)
if the null hypothesis “no breakdown in control” holds. The variance is now npq =
50(0.05)(0.95) = 2.375, which is substantially smaller than 9. The normal distribu-
tion is, therefore, not an acceptable approximation. In such cases, the Poisson dis-
tribution often makes a good approximation. A rule in this regard is that the
Poisson distribution is acceptable if n ≥ 20 and p ≤ 0.05. This is the case here. The
Poisson distribution has one parameter, the average number of defectives in a
sample of 50, which is 2.5:
X ~ P(2.5)
The value of X that has a 5% probability of being exceeded can be found in
standard tables. The critical number is 6, as shown in Fig. 15-12.
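Instead of reading the critical number from tables, it can be computed from the Poisson probabilities. The sketch below (function names are illustrative) finds the smallest count c for which P(X ≥ c) does not exceed the chosen significance level.

```python
import math

def poisson_pmf(k, lam):
    """P(X = k) for a Poisson variable with mean lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

def critical_count(lam, alpha=0.05):
    """Smallest c such that P(X >= c) <= alpha."""
    cdf = 0.0   # holds P(X <= k - 1) at the top of each pass
    k = 0
    while 1.0 - cdf > alpha:
        cdf += poisson_pmf(k, lam)
        k += 1
    return k

c = critical_count(2.5)   # mean number of defectives in a sample of 50
print(c)
```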
Example 15.11 An experiment is carried out to determine the effect of the load-
ing rate on the measured compressive strength of concrete test cylinders. To this
end, 54 test cylinders were loaded to failure at a slow speed and 36 were tested
at a fast speed. The results of both tests were found to be normally distributed.
The first run gave a mean strength of 28 MPa and the second gave a mean
strength of 30.5 MPa. The standard deviation was not significantly different for
the two runs and was taken to be 4.0 MPa. Is the difference significant at the 95%
confidence level?
Solution For the null hypothesis, one can assume that there is no difference in
the strength measurement. The random variable for which the acceptance region
must be determined is the difference in the average strength D for samples of
size 54 (m54) and 36 (m36):
D = m36 – m54
[Plot: Poisson probabilities (0 to 30%) and cumulative percentages (85% to 100%) against X = 0 to 9, with the acceptable region and the critical region marked.]
Figure 15-12 Poisson distribution function and the critical region.
Since m54 and m36 are both normally distributed, the difference is also nor-
mally distributed. On the basis of the null hypothesis, there is no difference in
the means of m54 and m36, and their variances are, respectively, 16/54 and 16/36.
One can therefore write
$$D \sim N\left[0, \left(\frac{16}{54} + \frac{16}{36}\right)^{0.5}\right]$$
or
$$D \sim N(0, 0.86)$$
To determine the acceptance region, consider that there is little reason to
expect the faster test to result in lower strength values. The experiment therefore
is regarded as a one-sided test. The acceptance region is then determined by
Z < 1.645 or D < 1.41. It has been observed that D = 2.5; this point lies in the
critical region and the null hypothesis must therefore be rejected. The difference
in measured strength is significant.
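The test of Example 15.11 reduces to a few arithmetic steps, sketched below. The function name is illustrative, and the one-sided 5% critical value z = 1.645 is taken from the text rather than computed.

```python
import math

def difference_of_means_test(mean_slow, n_slow, mean_fast, n_fast,
                             sigma, z_crit=1.645):
    """One-sided test of 'no difference' with a known common sigma."""
    d = mean_fast - mean_slow
    se = sigma * math.sqrt(1.0 / n_slow + 1.0 / n_fast)  # std. dev. of D
    limit = z_crit * se          # upper bound of the acceptance region
    return d, se, limit, d > limit

d, se, limit, reject = difference_of_means_test(28.0, 54, 30.5, 36, 4.0)
print(f"D={d:.2f}, se={se:.3f}, limit={limit:.2f}, reject={reject}")
```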
Notice that in this example the two samples were assumed to be indepen-
dent; otherwise one could not add the variances to obtain the variance of the dif-
ference. This condition is not always met in practice.
Suppose one would not wish to set up a special testing program for the
determination of the effect of the loading rate. Instead, one could, whenever a
cylinder had to be tested, produce two cylinders from the same batch. One
would be tested at the standard slow rate, but the other at the fast rate. The sam-
ples would then be paired. One would expect a correlation between sample
items since the effect of different aggregate, different water/cement ratio, differ-
ent cement content, etc. would be reflected in each of the pairs in the same way.
The variance of the difference is then reduced by twice the covariance:
var(X – Y) = var(X) + var(Y) – 2 cov(X,Y) (15.32)
The correct procedure in this case is to determine the differences between the
paired items first. Let this difference be denoted by
Y = X1 – X2
One then has n values of Y and can determine the mean my, which can be
assumed to be normally distributed. Assume that n is large so that the variances
of Y can be determined reliably from the sample. On the basis of the null hypoth-
esis, there is no difference in the real mean strength, so μy is zero. The standard
deviation of my will be sy/n^0.5. One can then determine the acceptance region of
my and see if the observed value of my falls within it.
The problem is, in principle, quite similar to the one discussed earlier. One
must judge whether the observed difference in sample means ma and mb is sig-
nificant. The first difference, however, is that in the previous problem the stan-
dard deviation was given. Here, one has to determine the standard deviation
from the samples, and the two observed standard deviations are disconcertingly
different. The first step, therefore, should be to determine whether it is reason-
able to assume that the two cements do lead to the same standard deviation and
that the observed difference is due only to chance. One might anticipate the
results and assume that both sa and sb come from the same distribution and their
difference is due to chance.
This does not solve the problem of what to take for the standard deviation. The
proper procedure is to calculate the pooled variance of the two samples by adding
the sums of the squares of the deviations from the sample means and dividing that
sum by the total number of degrees of freedom, (N1 – 1) + (N2 – 1). Since the
sample variances sa² and sb² have already been calculated, one can simply multiply
each by its N – 1, add the results, and divide by (N1 – 1) + (N2 – 1). In this
particular case, where the sample sizes are equal, the procedure is the same as
taking the average of the variances. The result is that the pooled variance is
30,313 and the pooled standard deviation is 174 MPa.
The second difference from the previous problem is that the sample sizes are
small and the standard deviation is determined from the samples. The variance of
the mean in a sample of 6 is calculated from sm² = 30,313/6 = 5,052, and the
variance of the difference between the two means of samples of 6 is equal to twice
this number, or 10,104. This makes the standard deviation of the difference equal
to 100.52 MPa. But this is a sample standard deviation, not a true standard
deviation. Under these circumstances, one must use the t distribution instead of
the normal distribution. In this case the t distribution has 10 degrees of freedom
(N1 + N2 – 2):
D = 0 + 100.52 T10
At the 95% confidence level, the acceptance region for the t distribution with
10 degrees of freedom is ±2.228. This gives an acceptance region for D between
–224 and +224 MPa (2.228 × 100.52 = 224). The observed difference was
4,673 – 4,370 = 303 MPa. This is well outside the acceptance region, so the null
hypothesis of “no difference” must be rejected.
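The pooled-variance t test described above can be sketched as follows. The function names are illustrative, and the critical value 2.228 (t distribution, 10 degrees of freedom, two-sided 95%) is taken from the text rather than computed.

```python
import math

def pooled_variance(var_a, n_a, var_b, n_b):
    """Pooled variance of two samples, weighted by degrees of freedom."""
    return ((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2)

def t_acceptance_half_width(pooled_var, n_a, n_b, t_crit):
    """Standard deviation of the difference of means and t_crit times it."""
    se = math.sqrt(pooled_var * (1.0 / n_a + 1.0 / n_b))
    return se, t_crit * se

se, half_width = t_acceptance_half_width(30313.0, 6, 6, t_crit=2.228)
observed = 4673 - 4370
print(f"se={se:.2f}, half-width={half_width:.0f}, "
      f"reject={abs(observed) > half_width}")
```

With equal sample sizes, `pooled_variance` reduces to the simple average of the two sample variances, as noted in the text.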
One can now consider here only the relatively simple problem of judging the
null hypothesis that the variances from two samples are chance observations of
the same random variable and that the difference is therefore not significant. It
has been seen that the sample variance is a random variable, the distribution of
which is related to the χ² distribution:
$$s^2 = \frac{\sigma^2}{N-1}\,\chi^2(N-1) \tag{15.33}$$
Suppose now that one has two samples for which the sample variances have
been calculated. One wants to test the null hypothesis that the sample variances
are estimates of the same parameter σ 2. One then defines a variable F, which is
the ratio of the two sample variances:
$$F = \frac{s_1^2}{s_2^2} \tag{15.34}$$
Substitution of Eq. 15.33 into this equation results in
$$F = \frac{\chi^2(N-1)/(N-1)}{\chi^2(m-1)/(m-1)} \tag{15.35}$$
where N and m are sample sizes. The terms (N – 1) and (m – 1) are the degrees of
freedom in Eq. 15.35.
The F distribution is a well-known distribution for which tabulated values
are readily available. Tables giving values of the cumulative distribution for a
range of degrees of freedom and several levels of significance are widely
available. It should be noted that to calculate F, one must put the larger variance
in the numerator and the smaller in the denominator. This is because the F test has
been set up as a one-sided test in which the alternative to the null hypothesis is
the alternative hypothesis that s1 is larger than s2.
One can demonstrate the use of the F test by examining sa2 and sb2 in the pre-
vious example where these variances had been obtained from two samples of 6
items. There
It can be seen that the null hypothesis cannot be ruled out even at the 90%
confidence level. Note that the result obtained here is largely negative. The small
samples make it impossible to rule out the null hypothesis that the variances are
obtained from the same random variable. That does not mean, however, that the
results should inspire confidence in the correctness of the null hypothesis.
15.7 Questions
15.1 There is always a great deal of interest in weather. What is the risk from
the coldest weather on record occurring next winter in Houston, Texas?
15.2 Consider a small dam pond for local flood control in an urban area.
What could the risk be from the failure of this detention structure?
15.3 What is the health risk from air pollution?
15.4 Consider instantaneous peak discharge data for a number of years for a
river near the town in which you live. Assume that the peak discharge
data follow a two-parameter log-normal distribution. The rating curve at
the nearest gauging site is also known. A private company wants to con-
struct a chemical plant near the river. Compute the risk that the plant
will not be flooded by a 100-year flood. Also, what will be the reduction
in flooding risk if the plant were elevated one meter above the ground?
15.5 Find the variance of the return period for different values of nonexceed-
ance probabilities.
15.6 What is the probability that a 500-year flood will occur at least once in
100 years?
15.7 Suppose a dam is designed with a projected life of 100 years. The
designer wants to take only a 5% chance that the dam will be overtopped
within this period. What return period flood should be used?
15.8 A chemical plant has installed a pollution abatement device, referred to as
Method A. This method has resulted in 40 days out of 300 days when the
poor air quality exceeded the prescribed limits. For fear of penalties,
the plant operator considers an alternative method, Method B, which
has produced 30 days of exceeding waste limits out of 300 days when
used at another place. Method B looks superior, but before considering a
changeover, the plant operator wants to be certain that the difference is not
due to chance. How should the plant operator assess the situation?
15.9 At a cement manufacturing plant, the quality of cement is to be inspected.
It is found that 5% of the cement bags do not meet the prescribed quality
standard. The production process must be amended if the probability of
defective cement bags exceeds 5%. For cement testing, a sample of 30
bags is used. What minimum number of bags in this sample should be
regarded as reason for action? For simplicity, the normal approximation
can be employed here.
Chapter 16
Reliability Analysis of Water Distribution Networks
$$z_1 + \frac{p_1}{\gamma} + \frac{v_1^2}{2g} = z_2 + \frac{p_2}{\gamma} + \frac{v_2^2}{2g} + h_{L1-2} \tag{16.2}$$
where z1 is the elevation head, p1/γ is the pressure head (p1 being the pressure
and γ the unit weight of water), and v1 is the velocity of flow, so that v1²/2g is
the velocity head; subscripts 1 and 2 denote that the variables refer to cross
sections 1 and 2; and hL1–2 denotes the head loss between cross sections 1 and 2.
The terms in Eq. 16.2 are explained in Fig. 16-1. The hydraulic grade line (HGL)
is a line that is p/γ above the center line of the pipe. If a piezometer is
attached to the pipe, water will rise up to the HGL. The energy grade line (EGL)
is v²/2g above the HGL.
The loss of energy in a pipe network takes place because of the roughness of
pipes, turbulence, and viscous stress. Some energy is also lost in contractions,
expansions, bends, joints, and valves. This loss is termed minor loss. Generally,
the minor head loss is proportional to v²/2g.
$$h_L = \frac{p_1}{\gamma} - \frac{p_2}{\gamma} = \frac{(15 - 11)\ \text{kPa}}{9.89\ \text{kN/m}^3} = 0.404\ \text{m}$$
Example 16.2 For the pipe of Example 16.1, let the elevation of the upstream end
be 2 m higher than that of the downstream end. Further, at the downstream end,
the pipe diameter is 50 mm (whereas it is 60 mm at the upstream end). Find the
head loss between the two sections.
Solution The velocity of flow at section 1 will be
At section 2,
$$2 + \frac{15}{9.89} + \frac{1.77^2}{2 \times 9.81} = 0 + \frac{11}{9.89} + \frac{2.55^2}{2 \times 9.81} + h_{L1-2}$$
so
hL1-2 = 2.232 m
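The energy balance of Eq. 16.2 can be wrapped in a small helper that returns the head loss between two sections. This is a sketch with an illustrative function name; pressures are in kPa, the unit weight 9.89 kN/m³ is the value used in the examples, and the velocities 1.77 and 2.55 m/s are those appearing in the worked equation.

```python
G = 9.81  # gravitational acceleration, m/s^2

def head_loss(z1, p1, v1, z2, p2, v2, gamma=9.89):
    """Head loss (m) between sections 1 and 2 from Eq. 16.2.
    z in m, p in kPa, v in m/s, gamma in kN/m^3."""
    total_head_1 = z1 + p1 / gamma + v1 ** 2 / (2 * G)
    total_head_2 = z2 + p2 / gamma + v2 ** 2 / (2 * G)
    return total_head_1 - total_head_2

# Horizontal pipe of constant diameter: only the pressure heads differ.
print(round(head_loss(0.0, 15.0, 1.77, 0.0, 11.0, 1.77), 3))
# Example 16.2: 2 m elevation drop and a contraction from 60 mm to 50 mm.
print(round(head_loss(2.0, 15.0, 1.77, 0.0, 11.0, 2.55), 3))
```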
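[Diagram: sections 1 and 2 of a pipe above a datum, showing the EGL, the HGL, the velocity head v²/2g, the pressure head p/γ, and the head loss hL1-2.]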
Figure 16-1 Energy terms for pipe flow.
where v is the flow velocity (in meters per second), R is the hydraulic radius (in
meters), Sf is the friction slope (in meters per meter), and C is the Hazen–
Williams roughness coefficient, which depends upon the pipe properties. The
hydraulic radius is the ratio of cross-sectional area and wetted perimeter. For a
circular pipe, R = A/P = πr2/2πr = r/2, where r is the radius of the pipe.
For a smooth plastic pipe, a typical value of the coefficient C may be about
150 (metric units). The head loss from friction (in meters) per 1,000 m pipe length
can be computed by
$$h_L = \left(\frac{151\,Q}{C\,D^{2.63}}\right)^{1.85} = K Q^{1.85} \tag{16.4}$$
where D is the diameter of the pipe (in meters) and K is the pipe coefficient.
Another commonly used equation for head loss in pipes is the Darcy–
Weisbach equation
$$h_L = f\,\frac{L}{D}\,\frac{V^2}{2g} = K Q^2 \tag{16.5}$$
where L is pipe length (in meters) and f is the Darcy–Weisbach friction factor,
which depends, among other things, on the relative roughness of the pipe and
the Reynolds number (Re). The Reynolds number is the ratio of inertial forces to
viscous forces (vD/ν, where ν is the kinematic viscosity). For laminar flow (Re <
2,100),
f = 64/Re (16.6)
For turbulent flow (Re > 2000), the friction factor is obtained from the Karman
and Prandtl equations:
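$$\text{smooth pipe:} \quad \frac{1}{\sqrt{f}} = 2\log_{10}\left(\mathrm{Re}\sqrt{f}\right) - 0.8 \quad \text{for } \mathrm{Re} > 3000 \tag{16.7}$$
$$\text{rough pipe:} \quad \frac{1}{\sqrt{f}} = 1.14 - 2\log_{10}\left(\frac{k_s}{D}\right) \tag{16.8}$$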
Knowing the relative roughness and the Reynolds number, one can read the
friction factor for a pipe from the Moody diagram. The relative roughness equals
ks/D. Here, ks is the equivalent sand roughness, which is the resistance character-
istics produced by a pipe of the same diameter, internally coated with sand par-
ticles having diameter ks. From a computational viewpoint, it is convenient to
use the equation proposed by Swamee and Jain (1976) to compute f:
$$f = \frac{0.25}{\left[\log_{10}\left(\dfrac{k_s}{3.7D} + \dfrac{5.74}{\mathrm{Re}^{0.9}}\right)\right]^2} \tag{16.9}$$
For flow in open channels, the velocity can be computed using Manning’s
equation
$$v = \frac{1}{n_m}\,R^{2/3}\,S_0^{1/2} \tag{16.10}$$
where S0 is the slope of the channel bed and nm is a coefficient, known as Man-
ning’s roughness coefficient, whose values depend upon the properties of the
channel cross section; higher values represent a “rough” cross section. Barnes
(1962) has tabulated values of n for natural channels. For a concrete channel, a
typical value of n may be 0.013, and n for a straight earthen section may be
0.02. For a steel pipe, flowing partially full, n is about 0.012; it is 0.014 for a
cast-iron pipe.
Another popular equation to compute the velocity in a channel is Chezy’s
equation
$$v = C\,(R S_0)^{0.5} \tag{16.11}$$
$$f = \frac{1}{[1.14 - 2\log_{10}(0.001)]^2} = 0.0196$$
$$f = \frac{0.25}{\left[\log_{10}\left(\dfrac{0.0005}{3.7 \times 0.5} + \dfrac{5.74}{(2 \times 10^6)^{0.9}}\right)\right]^2} = 0.0198$$
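The friction-factor formulas of Eqs. 16.6, 16.8, and 16.9 are straightforward to code. The sketch below uses illustrative function names and assumes a relative roughness ks/D = 0.001 (taken here as ks = 0.0005 m in a 0.5-m pipe) with Re = 2 × 10^6 for the comparison.

```python
import math

def f_laminar(re):
    """Eq. 16.6: laminar flow."""
    return 64.0 / re

def f_rough_pipe(relative_roughness):
    """Eq. 16.8: fully rough turbulent flow."""
    return 1.0 / (1.14 - 2.0 * math.log10(relative_roughness)) ** 2

def f_swamee_jain(ks, d, re):
    """Eq. 16.9: explicit Swamee-Jain approximation."""
    return 0.25 / math.log10(ks / (3.7 * d) + 5.74 / re ** 0.9) ** 2

ks, d, re = 0.0005, 0.5, 2.0e6   # assumed values giving ks/D = 0.001
print(round(f_rough_pipe(ks / d), 4), round(f_swamee_jain(ks, d, re), 4))
```

Because Eq. 16.9 is explicit in f, it avoids both the Moody diagram and the iteration that the implicit smooth-pipe relation (Eq. 16.7) would require.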
$$h_{Le} = \sum_{i=1}^{n} h_{Li} \tag{16.13}$$
and
$$K_e = \sum_{i=1}^{n} K_i \tag{16.14}$$
When pipes are connected in parallel as shown in Fig. 16-3, the head loss in
each pipe between the junctions will be the same. Thus
hL1 = hL2 = hL3 = …
Q = Q1 + Q2 (16.15)
[Diagram: three pipes in series with head losses hL1, hL2, and hL3, carrying discharge Q.]
Figure 16-2 A network of pipes in series.
[Diagram: two pipes in parallel with hL1 = hL2, carrying Q1 and Q2, with Q = Q1 + Q2.]
Figure 16-3 A network of pipes in parallel.
Substituting the value of Q from Eq. 16.4 or Eq. 16.5, one obtains
$$\left(\frac{h_L}{K_e}\right)^{1/n} = \left(\frac{h_L}{K_1}\right)^{1/n} + \left(\frac{h_L}{K_2}\right)^{1/n} \tag{16.16}$$
or
$$\left(\frac{1}{K_e}\right)^{1/n} = \left(\frac{1}{K_1}\right)^{1/n} + \left(\frac{1}{K_2}\right)^{1/n} \tag{16.17}$$
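The equivalent pipe coefficients for series (Eq. 16.14) and parallel (Eq. 16.17) arrangements can be computed as below. The function names are illustrative; n is the flow exponent in hL = KQ^n (2 for Darcy–Weisbach, 1.85 for Hazen–Williams), and the parallel formula is written for any number of pipes on the assumption that Eq. 16.17 extends term by term.

```python
def k_series(k_values):
    """Eq. 16.14: equivalent coefficient of pipes in series."""
    return sum(k_values)

def k_parallel(k_values, n=2.0):
    """Eq. 16.17: equivalent coefficient of pipes in parallel."""
    total = sum((1.0 / k) ** (1.0 / n) for k in k_values)
    return total ** (-n)

# Two identical pipes in parallel under Darcy-Weisbach (n = 2):
# each carries half the flow, so the equivalent K is one quarter.
print(k_series([4.0, 4.0]), k_parallel([4.0, 4.0], n=2.0))
```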
$$\frac{v_1}{v_2} = \left(\frac{f_2 L_2 D_1}{f_1 L_1 D_2}\right)^{1/2} \tag{16.18}$$
Example 16.4 Two tanks are connected through two pipes as shown in Fig. 16-4.
The flow of water from the upper tank to the lower one is at 0.04 m3/s and the
Darcy–Weisbach friction factor is 0.03. Find the elevation of water in the lower
tank if the elevation of water in the upper tank is 100 m.
Solution The water velocities in the pipes are
$$z_2 = z_1 - h_{L1} - h_{L2} = 100 - 0.03\,\frac{400}{0.20}\,\frac{1.274^2}{2 \times 9.81} - 0.03\,\frac{600}{0.30}\,\frac{0.566^2}{2 \times 9.81} = 94.06\ \text{m}$$
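Example 16.4 can be checked with a short script that computes the velocities from continuity and the friction losses from the Darcy–Weisbach equation. The function names are illustrative, and minor losses are neglected, as in the hand calculation.

```python
import math

G = 9.81  # m/s^2

def velocity(q, d):
    """Mean velocity (m/s) for discharge q (m3/s) in a pipe of diameter d (m)."""
    return q / (math.pi * d ** 2 / 4.0)

def darcy_head_loss(f, length, d, v):
    """Eq. 16.5: friction head loss (m)."""
    return f * (length / d) * v ** 2 / (2.0 * G)

q, f, z1 = 0.04, 0.03, 100.0
pipes = [(400.0, 0.20), (600.0, 0.30)]   # (length, diameter) in series
losses = [darcy_head_loss(f, L, d, velocity(q, d)) for L, d in pipes]
z2 = z1 - sum(losses)
print(round(z2, 2))
```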
Example 16.5 Two tanks are connected through pipes as shown in Fig. 16-5. The
length of each pipe is 100 m. The diameter of pipes 1 and 2 is 40 cm and that of
pipe 3 is 30 cm. The elevation of water in the first tank is 100 m and it is 90 m in
the second tank. If the Darcy–Weisbach friction factor is 0.02 for all the pipes,
find the velocity of water in the pipes and the discharge through pipe 1.
Solution Since pipe 2 and pipe 3 are in parallel connection, the head loss in
them will be equal. Therefore, from the Darcy–Weisbach equation, one has
$$f\,\frac{L_3}{D_3}\,\frac{v_3^2}{2g} = f\,\frac{L_2}{D_2}\,\frac{v_2^2}{2g}$$
or
$$f\,\frac{100}{0.3}\,\frac{v_3^2}{2 \times 9.81} = f\,\frac{100}{0.4}\,\frac{v_2^2}{2 \times 9.81}$$
which upon simplifying gives
v2 = 1.155v3
The discharge through pipe 1 will be the same as the sum of discharges
through pipes 2 and 3. Hence,
A1v1 = A2v2 +A3v3
so
3.14 × (40²/4)v1 = 3.14 × (40²/4)v2 + 3.14 × (30²/4)v3
which gives
v1 = 1.7175v3
[Diagram: Tank 1 (z1 = 100 m) connected to Tank 2 (z2 = ?) through two pipes in series, one with L = 400 m, D = 20 cm and one with L = 600 m, D = 30 cm.]
[Diagram: Tank 1 (z1 = 100 m) and Tank 2 (z2 = 90 m) connected by pipe X (L = 100 m, D = 40 cm), pipe Y (L = 100 m, D = 40 cm), and pipe Z (L = 100 m, D = 30 cm).]
Figure 16-5 Two tanks connected by pipes in parallel.
The total head loss through the system is 10 m. This permits us to write
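One way to finish the calculation numerically is sketched below. It assumes that pipe 1 is in series with the parallel pair formed by pipes 2 and 3, so that the 10 m of head is lost as hL1 + hL3 (with hL3 = hL2), and that minor losses are neglected; the variable names are illustrative.

```python
import math

G = 9.81
f, length = 0.02, 100.0
d1, d2, d3 = 0.40, 0.40, 0.30
total_head = 100.0 - 90.0

# Equal head loss in the parallel pipes: v2 / v3 = sqrt(D2 / D3).
r2 = math.sqrt(d2 / d3)
# Continuity, A1 v1 = A2 v2 + A3 v3, expressed as v1 / v3.
r1 = (d2 ** 2 * r2 + d3 ** 2) / d1 ** 2

# total_head = hL1 + hL3 with v1 = r1 * v3 (assumed series arrangement).
coef = f * length / (2.0 * G) * (r1 ** 2 / d1 + 1.0 / d3)
v3 = math.sqrt(total_head / coef)
v2, v1 = r2 * v3, r1 * v3
q1 = math.pi * d1 ** 2 / 4.0 * v1

print(f"v1={v1:.2f} m/s, v2={v2:.2f} m/s, v3={v3:.2f} m/s, Q1={q1:.3f} m3/s")
```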
where Qe,j is the external inflow and demand at node j. The energy conservation
equation for each primary loop can be written as
ΣhL = ΣEp (16.21)
where hL is the energy loss in each pipe and Ep is the energy imparted to the flow
by the pumps. If no energy is imparted, the right-hand side will be zero.
$$h = kQ^n \qquad (16.22)$$
where k is a constant and n is an exponent. If the Hazen–Williams equation is
used, n = 1.85.
In the absence of the knowledge of discharge Q flowing through a pipe, let
the assumed discharge be Q1. We can write
$$Q = Q_1 + \Delta \qquad (16.23)$$
where Δ is the error in the assumed discharge. Substituting Q in Eq. 16.22 yields
$$h = k(Q_1 + \Delta)^n = k\left[Q_1^n + nQ_1^{n-1}\Delta + \cdots\right] \qquad (16.24)$$
If Δ is small compared to Q, the higher order terms can be neglected and this
yields
$$h \approx k\left[Q_1^n + nQ_1^{n-1}\Delta\right]$$
For a pipe loop, the sum of head losses for all pipes must be zero. This yields
$$\sum kQ^n = 0$$
or
$$\sum k\left[Q_1^n + nQ_1^{n-1}\Delta\right] = 0$$
and so
$$\Delta = -\frac{\sum kQ_1^n}{n\sum kQ_1^{n-1}} = -\frac{\sum h}{n\sum (h/Q_1)} \qquad (16.25)$$
Note that the term in the numerator has a sign whereas, in the denominator,
the absolute value of the terms is taken. Eq. 16.25 is used in the Hardy–Cross
method to get the value of correction that is applied to the assumed flow
through a pipe to obtain a better value. The steps of the Hardy–Cross method
are as follows:
1. Assume a distribution of flow in the network that should satisfy the con-
tinuity equation. Ensure that, at a junction, the sum of flows entering
must be equal to the sum of flows leaving.
2. Determine the head loss in each pipe. By convention, clockwise flows are
given a positive sign and counterclockwise flows are given a negative sign.
3. Compute head loss for each loop.
4. Determine correction term using Eq. 16.25. If the largest of the correc-
tions is smaller than a predetermined limit, stop. Otherwise, compute
corrected flows and go to step 2.
The computations of this method can be easily programmed on a computer.
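As an illustration of the steps above, the following minimal sketch balances a single loop. The resistance coefficients and starting flows are hypothetical, the function name is illustrative, and a real network needs the same correction applied loop by loop, with pipes shared between loops receiving the corrections of both.

```python
def hardy_cross_loop(k, q, n=1.85, tol=1e-10, max_iter=100):
    """Balance one loop.

    k: resistance coefficient of each pipe (h = k Q^n)
    q: assumed signed flows, clockwise positive
    Returns the corrected signed flows.
    """
    q = list(q)
    for _ in range(max_iter):
        # Signed head loss: same sign as the flow (step 2)
        h = [ki * qi * abs(qi) ** (n - 1) for ki, qi in zip(k, q)]
        # |h/Q| written as k|Q|^(n-1) so that a zero flow causes no division by zero
        denom = n * sum(ki * abs(qi) ** (n - 1) for ki, qi in zip(k, q))
        delta = -sum(h) / denom       # Eq. 16.25
        q = [qi + delta for qi in q]  # same correction for every pipe in the loop
        if abs(delta) < tol:          # step 4 stopping rule
            break
    return q

# Hypothetical loop: two pipes in parallel sharing a total flow of 1.0
k = [1.0, 4.0]
q0 = [0.5, -0.5]  # pipe 1 clockwise, pipe 2 counterclockwise
q = hardy_cross_loop(k, q0)
loop_loss = sum(ki * qi * abs(qi) ** 0.85 for ki, qi in zip(k, q))
print([round(qi, 4) for qi in q], round(loop_loss, 8))
```

Adding the same correction to every pipe in the loop leaves continuity at the junctions intact, which is why the initial flow distribution must already satisfy it.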
Example 16.6 A simple WDN is shown in Fig. 16-6. The network consists of two
loops. The diameters and lengths of the various pipes are shown in the diagram.
Water enters the network at node A and the direction of flows through the vari-
ous pipes is shown with the help of arrows. The demands at various nodes are
shown in the diagram. Assume the roughness coefficient is C = 100 for all pipes.
Compute the flow in each pipe by the Hardy–Cross method.
Solution To begin calculations, assume a flow in each pipe such that the
demands are met and mass balance is maintained. Now compute the head loss
for each pipe. The computations are carried out iteratively. As an example, the
head loss hL for pipe AB is computed by the use of Eq. 16.4 with the Hazen–
Williams roughness coefficient C =100:
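$$h_{L,AB} = \frac{1250}{1000}\left(\frac{151 \times 0.9}{100 \times 0.55^{2.63}}\right)^{1.85} = 40.4233\ \text{m}$$
where the factor 1250/1000 scales the loss per 1,000 m of pipe to the 1,250-m length of pipe AB.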
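The value quoted for pipe AB can be reproduced numerically. The sketch below assumes that the Hazen–Williams expression gives the head loss per 1,000 m of pipe and that pipe AB is 1,250 m long with a diameter of 0.55 m and an assumed flow of 0.9 m³/s; the function name is illustrative.

```python
def hazen_williams_loss(q, c, d, length):
    """Head loss (m) for flow q (m^3/s), coefficient c, diameter d (m), length (m).

    Assumes the form that yields the loss per 1,000 m of pipe.
    """
    per_1000_m = (151.0 * q / (c * d ** 2.63)) ** 1.85
    return per_1000_m * length / 1000.0

h_ab = hazen_williams_loss(q=0.9, c=100.0, d=0.55, length=1250.0)
print(round(h_ab, 2))
```

Without the length scaling the bracketed term alone evaluates to about 32.3 m, so the quoted 40.4233 m is consistent with a 1,250-m pipe.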
where hi and hj are the heads at nodes i and j, respectively, and Kij is the pipe
coefficient for the connecting pipe. The main advantage of the node formulation
is that it has fewer equations, but the equations are nonlinear, which means they
are difficult to solve by hand. Figure 16-7 shows a node that receives flow from
node i and two pipes carrying water to nodes i + 1 and i + 2. Also, the discharge
Qout leaves the network at this node.
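[Figure 16-6 pipe labels (diameter, length): 0.4 m, 1000 m; 0.55 m, 1250 m; 0.4 m, 1200 m; 0.4 m, 1200 m.]
[Figure 16-7 labels: node j, neighboring nodes i + 1 and i + 2, and outflow Qout.]
Figure 16-7 A WDN node with three pipe connections and an outflow.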
The mass balance equation for node j can be written (with flow toward a
node as positive) as
$$Q_{i,j} - Q_{j,i+1} - Q_{j,i+2} - Q_{out} = 0 \qquad (16.27)$$
or
$$\left(\frac{h_i - h_j}{K_{i,j}}\right)^{0.54} - \left(\frac{h_j - h_{i+1}}{K_{j,i+1}}\right)^{0.54} - \left(\frac{h_j - h_{i+2}}{K_{j,i+2}}\right)^{0.54} - Q_{out} = 0 \qquad (16.28)$$
Equation 16.28 can be written for each node. Thus, there will be a system of
nonlinear equations with the same number of equations as the number of
unknowns. These are then solved to obtain the unknown heads and thereby the
flows in the pipes.
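For a single node with known neighboring heads, Eq. 16.28 reduces to one nonlinear equation in the unknown head. The sketch below solves it by bisection; the heads, pipe coefficients, and outflow are hypothetical, and a full network would require solving all nodal equations simultaneously.

```python
def node_residual(h_j, h_up, h_dn1, h_dn2, k_in, k1, k2, q_out, exp=0.54):
    """Left-hand side of Eq. 16.28 for a trial head h_j at node j."""
    return (((h_up - h_j) / k_in) ** exp
            - ((h_j - h_dn1) / k1) ** exp
            - ((h_j - h_dn2) / k2) ** exp
            - q_out)

def solve_node_head(h_up, h_dn1, h_dn2, k_in, k1, k2, q_out, tol=1e-10):
    """Bisection on h_j between the higher downstream head and the upstream head."""
    args = (h_up, h_dn1, h_dn2, k_in, k1, k2, q_out)
    low, high = max(h_dn1, h_dn2), h_up
    if node_residual(low, *args) < 0 or node_residual(high, *args) > 0:
        raise ValueError("no root between the neighboring heads")
    while high - low > tol:
        mid = 0.5 * (low + high)
        # The residual decreases as h_j rises, so a positive value means h_j is too low
        if node_residual(mid, *args) > 0:
            low = mid
        else:
            high = mid
    return 0.5 * (low + high)

# Hypothetical data: heads in m, pipe coefficients and outflow in consistent units
data = dict(h_up=100.0, h_dn1=90.0, h_dn2=85.0, k_in=50.0, k1=100.0, k2=100.0, q_out=0.1)
h_j = solve_node_head(**data)
print(round(h_j, 3))
```

Bisection is used here only because it needs no derivatives; the search interval keeps every head difference nonnegative, so the fractional powers stay real.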
$$h_b - h_a = KQ^{1.85} \qquad (16.30)$$
and hb > ha. Equation 16.30 can be linearized using the Taylor series expansion as
where Qio is the estimated discharge in pipe i and q is the correction term. We write
$$Q_i = Q_{io} + q \qquad (16.32)$$
There will be n such equations for n nodes. Note that the equations are linear
because the unknown head appears with a power of unity. This set of equations
can be solved to get the unknowns in an iterative manner. The Hardy–Cross
node method begins with a set of estimated heads. These form inputs for com-
puting flows at the nodes and the residual discharge rates are determined by
using the continuity equation. The heads are iteratively adjusted to get the
solution. Many software packages are available to analyze a WDN. WADISO
(Water Distribution Simulation and Optimization) by Walski et al. (1990) is one
such package. The software KYPIPE was developed by Wood (1980), and Rossman
(2000) has described the software EPANET. The software FlowMaster
(Meadows and Walski 1998) can be used for hydraulic analysis and design of
pipes, ditches, and open channels.
Example 16.7 Consider the WDN shown in Fig. 16-8. The network has 30 nodes
and 42 pipe links. The network receives water supply at node 1 whereas the other
nodes are either demand nodes or connecting nodes. The demand nodes are
shown as solid black dots and the connecting nodes are shown by double circles.
Table 16-1 shows the demand (m3) at various nodes along with node elevation.
Pipe details, such as diameter (mm) and length (m), are given in Table 16-2. The
Hazen–Williams coefficient (HWC) is 100 (metric units) for each pipe. Find the
flow and head loss in each pipe. This example is excerpted from Kumar (1999).
Solution The WDN is hydraulically analyzed by assuming that all nodes are
perfectly functional. The pressure at various nodes (in meters and tons per
square meter, TPSM), and the energy grade line (in meters) are shown in
Table E16-7a. The discharge in cubic meters per day (cmd), velocity, and head
loss in different pipes are given in Table E16-7b.
Figure 16-8 WDN network of Example 16.7.
capacity of components is declining and many of them are failing. Since water is
a basic necessity, WDNs are required to have high reliability. Among other
things, this requires that a certain amount of redundancy should be introduced
into the network. A redundant network has an adequate residual capacity and
alternate flow paths to provide uninterrupted water supply.
In the study of the reliability of a WDN, two categories are identified:
mechanical reliability and hydraulic reliability. Mechanical reliability is con-
cerned with the failure of the network components, such as pumps and pipes. It
mainly depends on the design and manufacture of the components, age, and
environment. Hydraulic reliability of a WDN refers to its ability to provide suffi-
cient water to meet the demand at a required pressure. The hydraulic failure of a
network could arise either from mechanical failure or when the demand out-
strips its ability to supply the requisite quantity of water.
Table E16-7a Pipe network analysis (node data and analysis) for Example 16.7.
Node  Elevation (m)  Discharge (cmd)  Energy grade line (m)  Pressure head (m)  Pressure (TPSM)
1     65.0           –19700.0         100.0                  35.0               35.0 (supply)
2     60.0           700.0            95.3                   35.3               35.3
3     60.0           700.0            87.4                   27.4               27.4
4     60.0           700.0            99.2                   39.2               39.2
5     60.0           1000.0           95.8                   35.8               35.8
6     60.0           1000.0           94.9                   34.9               34.9
7     60.0           -                95.5                   35.5               35.5
8     60.0           900.0            98.2                   38.2               38.2
9     60.0           500.0            94.4                   34.4               34.4
10    60.0           1000.0           83.7                   23.7               23.7
11    60.0           1200.0           83.0                   23.0               23.0
12    60.0           1000.0           86.0                   26.0               26.0
13    60.0           -                92.9                   32.9               32.9
14    60.0           -                76.3                   16.3               16.3
15    60.0           1200.0           88.1                   28.1               28.1
16    60.0           600.0            90.1                   30.1               30.1
17    60.0           800.0            75.3                   15.3               15.3
18    58.0           1000.0           72.7                   14.7               14.7
19    56.0           500.0            72.0                   16.0               16.0
20    57.0           700.0            73.9                   16.9               16.9
21    57.0           500.0            70.4                   13.4               13.4
22    55.0           -                70.8                   15.8               15.8
23    56.0           1200.0           73.1                   17.1               17.1
24    54.0           1000.0           70.1                   16.1               16.1
25    53.0           1000.0           65.3                   12.3               12.3
26    54.0           800.0            74.9                   20.9               20.9
27    53.0           500.0            75.5                   22.5               22.5
28    54.0           600.0            79.2                   25.2               25.2
29    57.0           -                86.1                   29.1               29.1
30    57.0           600.0            82.8                   25.8               25.8
Table E16-7b Pipe network analysis (pipe data and analysis) for Example 16.7.
Pipe  From node  To node  Diameter (mm)  Length (m)  Flow (cmd)  Velocity (m/s)  Head loss (m)
1 1 2 300.0 600.0 7018.7 1.15 4.72
2 1 6 200.0 500.0 2791.3 1.03 5.14
3 1 7 150.0 900.0 887.6 0.58 4.50
4 1 4 300.0 400.0 3390.4 0.56 0.82
5 1 5 300.0 800.0 5611.9 0.92 4.16
6 2 3 250.0 500.0 6318.7 1.49 7.86
7 6 3 150.0 800.0 1242.2 0.81 7.45
8 8 7 200.0 600.0 1790.4 0.66 2.71
9 4 8 250.0 300.0 2690.4 0.63 0.97
10 7 13 250.0 800.0 2678.0 0.63 2.57
11 5 9 300.0 400.0 4611.9 0.76 1.45
12 9 13 300.0 500.0 4111.9 0.67 1.46
13 3 10 250.0 600.0 3836.2 0.90 3.75
14 3 11 200.0 500.0 2573.9 0.95 4.42
15 3 12 150.0 1000.0 450.9 0.30 1.43
16 6 12 100.0 600.0 549.1 0.81 8.87
17 13 15 150.0 900.0 926.1 0.61 4.86
18 13 16 200.0 600.0 1831.4 0.67 2.82
19 13 29 250.0 1000.0 4032.5 0.95 6.85
20 10 14 200.0 700.0 2836.2 1.04 7.40
21 11 14 150.0 600.0 1373.9 0.90 6.73
22 16 15 100.0 500.0 273.9 0.40 2.04
23 16 29 150.0 700.0 957.5 0.63 4.02
24 14 17 300.0 700.0 2678.9 0.44 0.92
25 14 20 200.0 700.0 1531.1 0.56 2.36
26 17 18 200.0 800.0 1519.4 0.56 2.66
27 17 19 100.0 500.0 359.5 0.53 3.37
28 18 19 100.0 600.0 140.5 0.21 0.71
29 20 22 150.0 700.0 831.1 0.54 3.10
30 18 21 100.0 300.0 378.9 0.56 2.23
31 22 21 100.0 400.0 121.1 0.18 0.36
32 23 22 100.0 800.0 224.0 0.33 2.25
33 22 25 100.0 600.0 425.6 0.63 5.54
34 22 24 150.0 400.0 508.4 0.33 0.71
35 27 23 150.0 1000.0 598.5 0.39 2.41
36 28 23 150.0 1400.0 825.5 0.54 6.11
37 26 24 100.0 400.0 491.6 0.72 4.82
38 26 25 100.0 600.0 574.4 0.85 9.64
39 30 26 150.0 400.0 1866.0 1.22 7.91
40 29 30 200.0 400.0 2466.0 0.91 3.27
41 28 27 150.0 500.0 1098.5 0.72 3.71
42 29 28 150.0 200.0 2524.0 1.65 6.92
$$R_s = \Pr[(S - D) > 0] = \int_0^{\infty} f_S(S)\left[\int_0^{S} f_D(D)\,dD\right]dS \qquad (16.34)$$
where Pr() stands for probability, and fS(S) and fD(D) are the probability density
functions of supply and demand, respectively. Equation 16.34 is difficult to solve
since the joint probability density function of supply and demand is difficult to
derive. If supply and demand follow independent normal distributions, the
reliability (Rs) can be computed as
$$R_S = \phi\left(\frac{\mu_S - \mu_D}{\sqrt{\sigma_S^2 + \sigma_D^2}}\right) \qquad (16.35)$$
where D0 is the demand truncation level or design capacity of the system, Rh1 is
the network hydraulic reliability when D < D0 (which equals one as the network
is designed for D0), Rha is the network hydraulic reliability when D > D0, and all
the pipes in the network are operational.
Since the demand is less than the capacity of the system and all the pipelines
are working, computations for the first component of Eq. 16.36 can be carried
out analytically as discharge and pressure criteria will be satisfied at any node of
the system. However, when the demand is greater than the network capacity,
some of the nodes may not receive water with sufficient pressure head. To com-
pute the second component of Eq. 16.36, demand above D0 is divided into m dis-
crete demand intervals and we get
$$\Pr\{(S - D) > 0,\ D > D_0\} \times R_{ha} = \sum_{i=1}^{m} \Pr\{(S - D) > 0,\ D_{i-1} < D < D_i\} \times (R_{ha})_{i-1,i} \qquad (16.37)$$
where the number of discrete intervals (m) depends upon the accuracy of the
results desired. For simplicity, each demand interval is assumed to be repre-
sented by the average demand of that interval. For example, demand Dk, which
is the average of Di-1 and Di, represents the kth demand interval as far as the
hydraulic reliability is concerned. A hydraulic simulation of WDN having n
demand nodes is carried out for this Dk and the corresponding pressure heads at
all the demand nodes are computed. The water supplied to the jth node will
depend on the head attainable at that node. For each node, two head limits must
be given: a minimum head, Hmin, and a service head, Hs.
For system performance to be termed satisfactory, all the imposed demands
for each node should be met with heads above the service limit (Hs). If the avail-
able head (H) at a node is below Hs but above Hmin, the system cannot supply the
full demand. It can meet the reduced supply at that node. However, no supply is
possible if the pressure head at the node is below Hmin. Many relationships are
available to estimate this reduced supply.
The reduced supply can be computed using the following equation:
$$R_j = \sum_{i=1}^{m} \Pr\{(S - D) > 0,\ D_{i-1} < D < D_i\} \times \frac{Q_{j,\text{supplied}}}{Q_{j,\text{required}}} \qquad (16.39)$$
The network hydraulic reliability for the ith demand interval can be com-
puted by taking the arithmetic mean or the weighted average of the nodal
hydraulic reliability. If the arithmetic mean is taken, one gets
$$(R_{ha})_{i-1,i} = \frac{1}{n}\sum_{j=1}^{n} R_j \qquad (16.40)$$
Example 16.8 A WDN is shown in Fig. 16-9. The hydraulic equivalence of this
WDN is shown in Fig. 16-10. The water demand for this network is 13,500 cubic
meters per day (cmd). The HWC for all the pipes is 100. The pressure head at the
supply node is 35 m. The service head (Hs) required to satisfy the full demand at a
demand node is 16 m. No flow is possible if the residual pressure at a given
demand node is less than 12 m, which is the minimum required head (Hmin). If
the pressure at a demand node is between 12 and 16 m, the demand will be satis-
fied only partially. The capacity of the network is 12,000 cmd. Therefore, all the
demand nodes will receive water at a pressure of more than 16 m when the net-
work demand is less than or equal to 12,000 cmd. Based upon the past data and
the local climatic conditions, the following statistical data are available:
mean of supply series (μs) = 17,000 cmd, standard deviation (σ s) = 2,000 cmd
mean of demand series (μD) = 15,000 cmd, standard deviation (σ D) = 3,000 cmd
[Figure 16-9: WDN layout with a legend distinguishing demand nodes from the supply node.]
$$R_S = \phi\left(\frac{17000 - 15000}{\sqrt{2000^2 + 3000^2}}\right) = \phi(0.5547)$$
From the tables of the normal distribution, Rs = 0.7105. This will be the sys-
tem reliability if there are no capacity constraints. The WDN is presented in
Fig. 16-10 in terms of node and links.
Considering the demand data to follow a normal distribution, one can find
the probabilities of various demand levels using the normal distribution tables.
For example, in the present illustrative example, the probabilities of demands
less than 12,000 cmd, between 12,000 and 15,000, between 15,000 and 18,000, and
between 18,000 and 21,000 cmd are computed as follows:
Pr{D < 12,000} = 0.1587
[Figure 16-10 legend: each junction is labeled with (a) junction number, (b) junction demand, and (c) ground elevation (m); each pipeline is labeled with its number, its diameter in mm in parentheses, and its length in m.]
Figure 16-10 Line diagram of the water distribution network of Fig. 16-9.
Further, the probability of (S – D) > 0 is 0.7105. If supply and demand are inde-
pendent and normally distributed, then the joint probabilities can be found as
Pr{(S – D) > 0; D < 12,000} = 0.1587 × 0.7105 = 0.1128
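The probabilities used in this example can be reproduced without normal tables. The sketch below evaluates the standard normal distribution function with `math.erf` and, following the example's own product rule, multiplies each demand-interval probability by Pr{(S – D) > 0}. The interval edges are those named in the text; variable names are illustrative.

```python
import math

def phi(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

mu_s, sd_s = 17000.0, 2000.0  # supply statistics (cmd)
mu_d, sd_d = 15000.0, 3000.0  # demand statistics (cmd)

# Eq. 16.35: probability that supply exceeds demand
rs = phi((mu_s - mu_d) / math.hypot(sd_s, sd_d))

edges = [12000.0, 15000.0, 18000.0, 21000.0]
cdf = [phi((d - mu_d) / sd_d) for d in edges]
# Pr{D < 12,000} followed by the three intervals above it
p_demand = [cdf[0]] + [hi - lo for lo, hi in zip(cdf, cdf[1:])]
joint = [rs * p for p in p_demand]

print(round(rs, 4), [round(p, 4) for p in p_demand], [round(j, 4) for j in joint])
```

The joint values agree to rounding with the 0.1128, 0.2425, 0.2425, and 0.0966 used later in the system reliability sum.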
The next step is the hydraulic analysis of the network to estimate pressures
and flows in various pipelines for various demand intervals. The program WADISO
developed by Walski et al. (1990) was used for the purpose. It was assumed
that a particular demand interval can be represented by its average demand.
Thus, the demand intervals of 12,000–15,000, 15,000–18,000, and 18,000–21,000 are
represented by the demands of 13,500, 16,500 and 19,500, respectively.
The results of hydraulic analysis for demand levels of 12,000, 13,500, 16,500,
and 19,500 cmd are given in Table E16-8. As can be seen from Table E16-8, when
the demand is 12,000 cmd, all the demand nodes will receive water at more than
the service head (16 m). However, when the demand is more than 12,000 cmd,
some of the nodes have pressure heads below the service head. As a result, some
of the demand nodes may receive water in full, some may receive water at a
reduced rate, and some may not receive any water. For example, when the demand
of the network is 16,500 cmd, demand node 5 will receive water at 13.2 m. This
node will receive water in partial-flow mode and will get (from Eq. 16.38)
669 cmd against the requirement of 1,222 cmd. Similarly, node 6 will receive
water at a pressure head of 7.8 m. Since this is below the minimum required
head, no water will be withdrawn from this node.
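The partial flows quoted in the examples are consistent with a square-root relationship between the available head and the outflow. The sketch below assumes that relationship, which is one of several in use; it reproduces the flows of 669, 748, 193, and 666 cmd quoted for nodes 5, 9, 13, and 16 when the heads and demands of Table E16-8 are used. The function name is illustrative.

```python
import math

H_MIN, H_SERVICE = 12.0, 16.0  # minimum and service heads (m) from Example 16.8

def supplied_flow(required, head, h_min=H_MIN, h_s=H_SERVICE):
    """Outflow (cmd) at a node under an assumed square-root head-outflow rule."""
    if head >= h_s:
        return float(required)  # full demand met
    if head <= h_min:
        return 0.0              # no flow possible
    return required * math.sqrt((head - h_min) / (h_s - h_min))

# (required demand in cmd, pressure head in m) at a network demand of 16,500 cmd
partial_nodes = {5: (1222, 13.2), 9: (1222, 13.5), 13: (1222, 12.1), 16: (733, 15.3)}
flows = {node: round(supplied_flow(q, h)) for node, (q, h) in partial_nodes.items()}
print(flows)
```

The ratio of supplied to required flow from this function is the quantity that enters the nodal reliability of Eq. 16.39.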
The residual pressure heads at various nodes for different sets of demands can
be utilized to compute the network hydraulic reliability using Eq. 16.40. After
computing the Rha values for all the demand intervals, the system reliability can be
computed by using Eq. 16.36. Note that the probability of (S – D) > 0 and D >
21,000 cmd is very small (0.0161) and the corresponding hydraulic reliability will
also be very small. Therefore, the hydraulic reliability computations for D > 21,000
cmd have been neglected. This gives the system hydraulic reliability
Rs = (0.1128 × 1.0) + (0.2425 × 0.9946) + (0.2425 × 0.6604) + (0.0966 × 0.4235) = 0.5550
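The weighted sum above is simple to script, which helps when the Rha values change (for example, for a different HWC). A minimal sketch using the figures quoted in the text, with an illustrative function name:

```python
def system_reliability(joint_probs, hydraulic_reliabilities):
    """Sum of joint demand-interval probabilities weighted by Rha (Eq. 16.36 as applied here)."""
    return sum(p * r for p, r in zip(joint_probs, hydraulic_reliabilities))

joint_probs = [0.1128, 0.2425, 0.2425, 0.0966]  # D < 12,000 and the three intervals above it
r_ha = [1.0, 0.9946, 0.6604, 0.4235]            # network hydraulic reliabilities

rs = system_reliability(joint_probs, r_ha)
print(round(rs, 4))
```

The interval D > 21,000 cmd is omitted, as in the text, because its contribution is negligible.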
If the value of the HWC changes (owing to aging of pipeline, etc.), the values
of the residual pressures at various demand nodes will also change. The reliabil-
ity can be computed by following the procedure just given. For the case when
the HWC is 90 (metric units) for all the pipes, higher head loss at various
demand nodes will reduce the system hydraulic reliability to 0.4716.
Example 16.9 Compute the hydraulic reliability of the WDN of Fig. 16-10 when
the network demand is 16,500 cmd. This example is excerpted from Kansal (1996).
Solution In Table E16-8, some of the nodes are in partial-flow mode and some
are in no-flow mode. For example, nodes 5, 9, 13, and 16 are in partial-flow mode
and nodes 6, 10, 14, and 17 are in no-flow mode. Using Eq. 16.38, one can
compute the flows at nodes 5, 9, 13, and 16 as 669, 748, 193, and 666 cmd,
respectively. The flows at nodes 6, 10, 14, and 17 will be zero if the computed
heads are stationary. In the next iteration, these new computed flows are
considered; this will change the heads at these nodes, causing the possibility of
increased flow at the pressure-deficient nodes. The procedure is repeated until
the computed values of flow at all the demand nodes are the same in two
consecutive iterations. The results have been tabulated in Table E16-9.
Table E16-8 Pressure and flow data at various nodes for the WDN shown in Fig. 16-10 by node head analysis.
Network demand (cmd):  12,000            13,500            16,500            19,500
Node  Elevation  Output  Pressure  Output  Pressure  Output  Pressure  Output  Pressure  Rj
      (m)        (cmd)   head (m)  (cmd)   head (m)  (cmd)   head (m)  (cmd)   head (m)  (Eq. 16.39)
1     180.0      –12000  35.0      –13500  35.0      –16500  35.0      –19500  35.0      -
2     178.0      533     31.6      600     30.3      733     27.3      867     23.7      1.0
3     179.0      889     28.4      1000    26.5      1222    22.3      1444    17.3      1.0
4     180.0      800     25.6      900     23.4      1100    18.1      1300    12.0b     0.8414
5     181.0      889     22.5      1000    19.6      1222    13.2a     1444    5.6a      0.6870
6     183.0      800     18.6      900     15.3a     1100    7.8b      1300    –1.0a     0.4688
7     182.0      711     26.4      800     24.9      978     21.2      1156    16.9      1.0
8     181.0      711     25.5      800     23.5      978     18.7      1156    13.2a     0.9158
9     180.0      889     23.1      1000    20.2      1222    13.5a     1444    5.7b      0.7091
10    182.0      889     19.4      1000    16.0      1222    8.4b      1444    –0.5b     0.5000
11    181.0      533     29.7      600     28.6      733     26.2      867     23.3      1.0
12    181.0      711     25.8      800     23.7      978     19.1      1156    13.7a     0.9300
13    183.0      889     20.9      1000    18.3      1222    12.1a     1444    4.8b      0.5540
14    181.0      711     20.0      800     16.6      978     8.7b      1156    –0.4b     0.5000
15    179.0      711     28.4      800     26.6      978     22.3      1156    17.4      1.0
16    180.0      533     24.1      600     21.4      733     15.3a     867     8.1b      0.8100
17    181.0      800     20.6      900     17.4      1100    9.9b      1300    1.2b      0.5000
Network hydraulic reliability (Rha):  1.0 (12,000 cmd); 0.9946 (13,500 cmd); 0.6604 (16,500 cmd); 0.4235 (19,500 cmd)
a. Only partial flow is possible; figures in italics are the actual flows possible.
From Table E16-9, observe that of the network demand of 16,500 cmd, only
15,200 cmd can be met. Also, the demand nodes 6, 10, 14, and 17, which were in
no-flow condition, will actually have flows of 658, 850, 765, and 930 cmd, respec-
tively. The nodes 5, 9, 13, and 16, which were in partial-flow mode with flows of
669, 748, 193, and 666 cmd, will actually have flows of 1,222, 1,222, 1,120, and 930
cmd. Thus, nodes 5 and 9 will have full flow, whereas nodes 13 and 16 will still
be under partial flow. Similarly, for a network demand of 13,500 and 19,500 cmd,
the possible flows are 13,450 and 16,383 cmd, respectively. The actual nodal
flows are shown in Table E16-9. Note that the nodal demands shown in
Table E16-9 satisfy Eq. 16.38 at all the demand nodes.
Comparison of the last column of Table E16-9 with that of Table E16-8 shows
that the nodal reliability has changed considerably.
Similarly, the network hydraulic reliability (Rha) computed from Eq. 16.40 for
most of the demand patterns has also gone up. Using the Rha values with the
modified outflows at various demand nodes (see Table E16-9), we can compute
the system reliability from Eq. 16.36 as
Rs = (0.1128 × 1.0) + (0.2425 × 0.9965) + (0.2425 × 0.9274) + (0.0966 × 0.8542) = 0.6619
Thus, the hydraulic reliability after accounting for the actual flows is 0.6619
as compared to 0.555 from NHA. Of course, this value is less than the static
hydraulic reliability of the system (0.7105), which was computed without the
residual head criterion.
The nodal reliabilities can be plotted to get a reliability surface. Graphical
representation of the results helps to visualize them and to identify the areas of
low reliability. In turn, this helps in the operation and maintenance of a WDN.
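$$R_S = \phi\left(\frac{\mu_S - \mu_D}{\sqrt{\sigma_S^2 + \sigma_D^2 - 2\rho\sigma_S\sigma_D}}\right) \qquad (16.41)$$
$$\rho_1 = \frac{\rho\sigma_S - \sigma_D}{\sqrt{\sigma_S^2 + \sigma_D^2 - 2\rho\sigma_S\sigma_D}} \qquad (16.42)$$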
If we denote the surplus series by X and the demand series by Y, the bivari-
ate probability estimation based on the normal distribution can be computed as
$$\Pr(X \le h, Y \le k, \rho_1) = \frac{1}{2\pi\sqrt{1 - \rho_1^2}} \int_{-\infty}^{h}\int_{-\infty}^{k} \exp\left[-\frac{x^2 - 2\rho_1 xy + y^2}{2(1 - \rho_1^2)}\right] dy\,dx \qquad (16.43)$$
The value of Pr(X≤ h, Y≤ k, ρ1) can be expressed using the T function (Kumar
1980) as
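$$\Pr(X \le h, Y \le k, \rho_1) = \frac{\Pr(h)}{2} + \frac{\Pr(k)}{2} - T(h, a_h) - T(k, a_k) - \begin{cases} 0 & \text{if } hk > 0, \text{ or if } hk = 0 \text{ and } h + k \ge 0 \\ 1/2 & \text{otherwise} \end{cases} \qquad (16.44)$$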
where
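$$a_h = \frac{k}{h\sqrt{1 - \rho_1^2}} - \frac{\rho_1}{\sqrt{1 - \rho_1^2}} \qquad (16.45)$$
$$a_k = \frac{h}{k\sqrt{1 - \rho_1^2}} - \frac{\rho_1}{\sqrt{1 - \rho_1^2}} \qquad (16.46)$$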
$$T(h, a) = \frac{1}{2\pi}\int_0^a \frac{\exp[-h^2(1 + x^2)/2]}{1 + x^2}\,dx \qquad (16.47)$$
The values of the T function were tabulated by Owen (1962) for 0 ≤ a ≤ 1 and
∞. To obtain the values for 1 < a < ∞, the following formula may be used:
T(h, a) = 0.5P(h) + 0.5P(ah) – P(h)P(ah) – T(ah, 1/a) (16.48)
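Where tables are not at hand, the T function can be evaluated by direct numerical integration. The sketch below integrates Eq. 16.47 with the composite Simpson's rule and uses Eq. 16.48 for a > 1, taking P(·) to be the standard normal distribution function and h ≥ 0 (both are assumptions made for this illustration). Function names are illustrative.

```python
import math

def phi(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def owen_t(h, a, n=2000):
    """Eq. 16.47 by composite Simpson's rule on [0, a]; n must be even."""
    def integrand(x):
        return math.exp(-0.5 * h * h * (1.0 + x * x)) / (1.0 + x * x)
    step = a / n
    total = integrand(0.0) + integrand(a)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * integrand(i * step)
    return total * step / 3.0 / (2.0 * math.pi)

def owen_t_large_a(h, a):
    """Eq. 16.48: T(h, a) for a > 1 from a value with second argument 1/a < 1."""
    p_h, p_ah = phi(h), phi(a * h)
    return 0.5 * p_h + 0.5 * p_ah - p_h * p_ah - owen_t(a * h, 1.0 / a)

print(round(owen_t(0.5, 2.0), 6), round(owen_t_large_a(0.5, 2.0), 6))
```

The two printed values should coincide, since the first integrates Eq. 16.47 directly and the second goes through Eq. 16.48.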
Figure 16-11 Graphical representation of Eq. 16.50.
Thus,
$$\Pr[(S - D) > 0;\ D_0 < D \le D_1;\ \rho_1] = \Pr(D \le D_1) - \Pr(D \le D_0) - \Pr[(S - D) \le 0;\ D \le D_1;\ \rho_1] + \Pr[(S - D) \le 0;\ D \le D_0;\ \rho_1] \qquad (16.50)$$
$$\Pr(X \le h, Y \le k, \rho_1) = \sum_{i=1}^{m} \Pr\{(S - D) > 0,\ D_{i-1} < D < D_i,\ \rho_1\} \qquad (16.51)$$
Example 16.10 To illustrate the methodology, consider again the WDN of
Example 16.8, shown in Fig. 16-9. The hydraulic equivalent of this WDN was
shown in Fig. 16-10. The network is then analyzed for pressures and flows in the
various pipelines for given demand intervals. This example has been excerpted
from Kumar (1999).
Solution In the WDN of Example 16.8 with 17 nodes connected by 21 pipes, the
head available in the reservoir is 30 m. The service head for all the demand
nodes is 16 m and the minimum head for all the demand nodes is 12 m. The
node will receive reduced supply if the pressure of water lies between 12 and
16 m and there will be no supply if the pressure goes below 12 m. In the present
illustrative example, the following data have been used:
supply series: mean (μS) = 17,000 cmd
The coefficient of correlation between the supply and demand series (ρ) = –0.4.
The coefficient of correlation ρ1 between the (S – D) series and the D series can
be computed by using Eq. 16.42. For the given data, it turns out to be –0.9.
For computation of the system reliability as shown in Eq. 16.36, the joint
probabilities for the various demand intervals, subject to the condition that
supply exceeds demand, are computed first. These values have been
computed using the T functions and are reported as follows:
Pr{(S – D) > 0; D < 12,000} = 0.158
The computations for the second demand interval (12,000 < D < 15,000) are
as follows: Using Eq. 16.50 one can express the probability Pr{(S – D) > 0; 12,000
< D < 15,000} as
Pr{(S – D) > 0; 12,000 < D < 15,000} =
If one does not consider the capacity constraint, the value of the system reli-
ability (RS), which is represented by Pr{(S – D) > 0}, would have been 0.683.
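The two figures quoted in this example, ρ1 = –0.9 and RS = 0.683, can be checked as follows. The sketch assumes that the demand mean and both standard deviations carry over from Example 16.8 (they are not restated here), and it takes ρ1 as the covariance of (S – D) and D divided by the product of their standard deviations.

```python
import math

def phi(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

mu_s, mu_d = 17000.0, 15000.0  # means (cmd); mu_d assumed from Example 16.8
sd_s, sd_d = 2000.0, 3000.0    # standard deviations (cmd), assumed from Example 16.8
rho = -0.4                     # correlation between supply and demand

# Standard deviation of the surplus series (S - D)
sd_surplus = math.sqrt(sd_s ** 2 + sd_d ** 2 - 2.0 * rho * sd_s * sd_d)

rs = phi((mu_s - mu_d) / sd_surplus)     # Eq. 16.41
rho1 = (rho * sd_s - sd_d) / sd_surplus  # Eq. 16.42

print(round(rho1, 3), round(rs, 3))
```

A negative ρ widens the spread of (S – D), which is why RS here is lower than the 0.7105 obtained for independent supply and demand.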
strict capacity measure excludes those paths that do not meet the desired demand
fully. A more realistic measure also takes the partial satisfaction of demand
into account because two partially satisfying paths may
combine to satisfy the required demand at a particular node. Another important
dimension of the problem is that the intermediate nodes in any particular path
may draw water and hence restrict the capacity of the path for the desired demand
node.
$$X_{ij} = \frac{q_{ij}}{Q_j}, \quad \text{and} \quad Q_j = \sum_{i=1}^{n(j)} q_{ij}$$
$$Q_0 = \sum_{j=1}^{N} Q_j \qquad (16.52)$$
Note that Q0, which is the sum of flows in all links, is greater than the total
demand. The redundancy for node j can be written in the same form as that for
Shannon’s entropy (see Chapter 9):
$$S_j = -\sum_{i=1}^{n(j)} \frac{q_{ij}}{Q_j}\ln\frac{q_{ij}}{Q_j} \qquad (16.53)$$
where qij is the flow in the link from node i to node j and Qj is the total flow
into node j. In this equation, redundancy is measured by the extent to which the
node receives water when a link that is incident to it fails. The redundancy Sj of
a node will be a maximum when all qij/Qj terms are equal (i.e., all links incident
on the node carry the same flow). Awumah et al. (1991) argued that the relative
importance of a link to the total flow is important in assessing the overall
network performance and hence qij/Qj in Eq. 16.53 should be replaced by qij/Q0.
With this replacement, Eq. 16.53 can be written as
$$S'_j = -\sum_{i=1}^{n(j)} \frac{q_{ij}}{Q_0}\ln\frac{q_{ij}}{Q_0} \qquad (16.54)$$
The sum of redundancies at all nodes will give the entropic measure of the
network redundancy:
$$S_N = \sum_{j=1}^{N} S'_j \qquad (16.55)$$
Substituting in Eq. 16.55 from Eq. 16.54 gives the entropic measure of net-
work redundancy as
$$S_N = \sum_{j=1}^{N}\left[\frac{Q_j}{Q_0}S_j\right] - \sum_{j=1}^{N}\left[\frac{Q_j}{Q_0}\ln\frac{Q_j}{Q_0}\right] \qquad (16.56)$$
The first term on the right-hand side of this equation is the weighted sum of
entropy at different nodes. The second term reflects the redundancy among
nodes where uniformity among the nodes in terms of flow distribution (unifor-
mity of Qj/Q0 value) indicates that the network has a better capacity to success-
fully overcome failure of any single link. Awumah et al. (1990) showed that
maximization of the function given by Eq. 16.56 is equivalent to maximizing the
ability of the network to supply water to each node. This is the same as maximiz-
ing the network redundancy.
A measure of network performance that reflects how well the network is
able to supply flow under a range of failure conditions is the percentage of the
total demand supplied at adequate pressure (PSPF). This parameter shows the
performance of the network as a whole and therefore reflects the network-wide
redundancy (Awumah et al. 1991).
The entropy approach can also be used to minimize the cost of the network
in a formulation that includes, in addition to the hydraulic constraints, a set of
constraints to ensure the minimum level of redundancy in the network. Con-
straining entropy of the network at each node individually has been suggested.
This approach ensures that the network does not have a few unreliable nodes
while maintaining good overall redundancy. The decision variables of this prob-
lem are the flows. The resulting problem is a nonlinear optimization problem
since the entropy function is nonlinear.
Example 16.11 Figure 16-12 shows a WDN. The node numbers are shown in
bold and the flows in the links (m³/hour) are also shown. Compute the entropy
of the WDN.
Solution For the entire network, flow Q0 obtained by summing the individual
flows is 8,704 units. The entropy has been computed as shown in Table E16-11.
For example, at node 1, the total flow is 840 + 760 = 1,600 units. The last two col-
umns give the first and the second terms of Eq. 16.56. Hence, SN = 2.9807.
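The two terms of Eq. 16.56 can be computed with a few lines of code. The sketch below uses hypothetical link flows, not the data of Fig. 16-12 (only the pair 840 and 760 is taken from the text), and checks the result against the direct sum of –(qij/Q0) ln(qij/Q0) over all links. Function names are illustrative.

```python
import math

def node_entropy(inflows):
    """Eq. 16.53 for one node, from the flows in the links incident on it."""
    q_j = sum(inflows)
    return -sum(q / q_j * math.log(q / q_j) for q in inflows if q > 0)

def network_entropy(inflows_by_node):
    """Eq. 16.56: weighted nodal entropies plus the among-node term."""
    totals = {j: sum(qs) for j, qs in inflows_by_node.items()}
    q0 = sum(totals.values())
    within = sum(totals[j] / q0 * node_entropy(qs) for j, qs in inflows_by_node.items())
    among = -sum(t / q0 * math.log(t / q0) for t in totals.values() if t > 0)
    return within + among

# Hypothetical link inflows (m^3/hour), keyed by receiving node
example = {1: [840.0, 760.0], 2: [500.0], 3: [300.0, 200.0, 100.0]}
s_n = network_entropy(example)

# Direct evaluation over all links, for comparison
q0 = sum(sum(qs) for qs in example.values())
direct = -sum(q / q0 * math.log(q / q0) for qs in example.values() for q in qs)
print(round(s_n, 4), round(direct, 4))
```

A node fed by a single link contributes nothing to the first term, which reflects its lack of an alternate supply path.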
Figure 16-12 WDN of Example 16.11.
16.5 Questions
16.1 The discharge passing through a horizontal pipe of diameter 80 mm is
0.01 m³/s. The pressures at the upstream and downstream sections are
20 and 15 kPa, respectively. What is the head loss in the pipe?
16.2 For the pipe of Question 16.1, let the elevation of the upstream end be
1.5 m higher than that of the downstream end. Further, at the down-
stream end, the pipe diameter is 70 mm (whereas it is 80 mm at the
upstream end). Find the head loss between the two sections.
16.3 A 1000-m-long pipe with a diameter of 0.5 m carries water at a velocity
of 3 m/s. Determine the head loss in the pipe if the roughness height is
ks = 0.0001 m. Assume the kinematic viscosity ν = 1.0 × 10⁻⁶ m²/s.
16.4 Tanks 1 and 2 are 1,000 meters apart and are connected through two
pipes. The first pipe, connected to tank 1, is 600 m long and has a diameter
of 300 mm. It is connected to a second pipe, 400 m long and 200 mm in
diameter, which in turn is connected to tank 2. The flow of
water from the upper tank to the lower one is 0.05 m³/s and the
Darcy–Weisbach friction factor is 0.025. Find the elevation of water in the
lower tank if the elevation of water in the upper tank is 80 m.
16.5 A pipe of 0.15 m diameter is connected to a pipe of 0.20 m diameter. If the
average velocity in the first pipe is 10 m/s, what is the average velocity in
the second pipe? Also determine the discharge through the second pipe.
16.6 Four pipes of different diameters join at a junction as shown in
Fig. 16-13. The diameters of the pipes and the discharge passing through
them are also shown in the figure. Find the value of Q4 and the average
velocity of flow in each pipe.
[Figure 16-13 labels: Q1 = 0.022 m³/s, Q2 = 0.035 m³/s, Q3 = 0.025 m³/s, Q4 unknown; pipe diameters 20 cm, 12.5 cm, 15 cm, and 22.5 cm.]
Figure 16-13
Vijay P. Singh, Ph.D., D.Sc., P.E., P.H., D.WRE, holds the Caroline and William
N. Lehrer Distinguished Chair in Water Engineering and is a professor of biolog-
ical and agricultural engineering and a professor of civil and environmental
engineering at Texas A&M University. He earned B.S., M.S., Ph.D., and D.Sc.
degrees in engineering and has widely published in the areas of hydrology,
hydraulics, irrigation engineering, environmental engineering, and water
resources. He currently serves as editor-in-chief of ASCE’s Journal of Hydrologic
Engineering, editor-in-chief of the Water Science and Technology book series for
Springer, and associate editor or member of 16 editorial boards. He has won
more than 42 national and international awards for his contributions and profes-
sional service. Dr. Singh has been president and senior vice president of the
American Institute of Hydrology and a member of numerous committees of
ASCE, the Hydrology Section of the American Geophysical Union, and the
American Water Resources Association.
Sharad K. Jain, Ph.D., is a senior scientist and head of the Water Resources Sys-
tems Division of the National Institute of Hydrology, Roorkee, India. He holds
bachelor’s, master’s, and doctorate degrees in civil engineering. He has written
textbooks on water resources systems planning and management and on hydrology
and water resources of India; in addition, he has published numerous journal
articles and conference papers. Dr. Jain is a former editor of the Journal of Indian
Water Resources Society. He has been involved in many research and consultancy
projects that address real-life problems of water resources development and
management.
Aditya Tyagi, Ph.D., P.E., is a senior water resources engineer and technologist
in the water business group of CH2M HILL. Previously, he worked as a scientist
in the National Institute of Hydrology in India. He obtained his B.S. in civil
engineering and M.S. in environmental engineering from the Indian Institute of
Technology and his Ph.D. in biosystems engineering from Oklahoma State
University. His research interests encompass the application of various
analytical, numerical, statistical, stochastic, and optimization techniques to
solve hydrologic, hydraulic, and environmental engineering problems.
References
Barbe, D. E., Cruise, J. F., and Singh, V. P. (1994). “Derivation of a distribution for
the piezometric head in groundwater flow using stochastic and statistical
methods.” Hydrology and environmental engineering, 2, K. W. Hipel, ed.,
Kluwer Academic Publishers, Dordrecht, 151–160.
Barnes, H. H. (1962). “Roughness characteristics of natural channels.” Water
supply papers 1849, U.S. Geological Survey, Washington, D.C.
Bendat, J. S., and Piersol, A. G. (1980). Engineering applications of correlation
and spectral analysis, Wiley-Interscience, New York.
Benjamin, J. R., and Cornell, C. A. (1970). Probability, statistics, and decision for
civil engineers, McGraw-Hill Inc., New York.
Bobee, B., Perreault, L., and Ashkar, F. (1993). “Two kinds of moment ratio
diagrams and their applications in hydrology.” Stochastic Hydrology and
Hydraulics, 7, 41–65.
Booy, C. (1990). Rational engineering decisions under conditions of uncertainty.
Lecture notes for course 25-221, Department of Civil Engineering,
University of Manitoba, Winnipeg, Canada.
Borgman, L. E. (1963). “Risk criteria.” Journal of the Waterways and Harbor
Division, Proc., ASCE, 89 (WW3), 1–35.
Bos, R. (1990). Aquifer parameter identification by the maximum entropy and
Bayes’ theorem. Unpublished report, Technion-Israel Institute of
Technology, Haifa, Israel.
Bouchart, F. J.-C., and Goulter, I. C. (1998). “Is rational decision making
appropriate for management of irrigation reservoirs?” Journal of Water
Resources Planning and Management, 124(6), 301–309.
Boudon, R. (2003). “Beyond rational choice theory.” Annual Review of Sociology,
29, 1–29.
Box, G. E. P., and Muller, M. E. (1958). “A note on the generation of random
normal deviates.” Annals of Mathematical Statistics, 29, 610–611.
Burg, J. P. (1975). Maximum entropy spectral analysis. Ph.D. thesis, Stanford
University, Palo Alto, Calif.
Burges, S. J., and Lettenmaier, D. P. (1982). “Reliability measures for water
supply reservoirs and the significance of long-term persistence.” Decision
making for hydrosystems: Forecasting and operation, T. E. Unny and E. A.
McBean, eds., Water Resources Publications, Littleton, Colo.
Carslaw, H. S., and Jaeger, J. C. (1959). Conduction of heat in solids, Clarendon
Press, Oxford, England.
Cawlfield, J. D., and Wu, M. C. (1993). “Probabilistic sensitivity analysis for one
dimensional reactive transport in porous media.” Water Resources
Research, 29(3), 661–672.
Chang, C. H., Tung, Y. K., and Yang, J. C. (1995). “Evaluation of probabilistic
point estimate methods.” Applied Mathematical Modelling, 19(2), 95–105.
HEC. (1990). “Explaining flood risk.” Training document no. 33, Hydrologic
Engineering Center, U.S. Army Corps of Engineers, Davis, Calif.
Herr, H. D., and Krzysztofowicz, R. (2005). “Generic probability distribution of
rainfall in space: The bivariate model.” Journal of Hydrology, 300, 234–263.
Hosking, J. R. M. (1986). “The theory of probability-weighted moments.”
Technical report RC 12210, Mathematics, IBM Thomas J. Watson Research
Center, Yorktown Heights, N.Y.
Hosking, J. R. M. (1990). “L-moments: Analysis and estimation of distributions
using linear combinations of order statistics.” Journal of the Royal
Statistical Society, Series B, 52(1), 105–124.
Hosking, J. R. M., and Wallis, J. R. (1987). “Parameter and quantile estimation for
the generalized Pareto distribution.” Technometrics, 29(3), 339–349.
Houghton, J. C. (1978). “Birth of a parent: The Wakeby distribution for modeling
flood flows.” Water Resources Research, 14(6), 1105–1110.
Iman, R. L., and Helton, J. C. (1985). “An investigation of uncertainty and
sensitivity analysis techniques for computer models.” Risk Analysis, 8(1),
71–90.
ISWM Design Manual for Development and Redevelopment (2004). Prepared by
Freese and Nichols, Inc., AMEC Earth and Environmental, Alan Plummer
Associates, Inc., and Caffey Engineering, Inc., for the North Central Texas
Council of Governments.
Jain, S. K., and Singh, V. P. (2003). Water resources systems planning and
management, Elsevier, Amsterdam.
Jaynes, E. T. (1957). “Information theory and statistical mechanics, I.” Physical
Review, 106, 620–630.
Jaynes, E. T. (1982). “On the rationale of maximum entropy methods.” Proc.,
IEEE, 70, 939–952.
Jenkinson, A. F. (1955). “The frequency distribution of the annual maximum (or
minimum) values of meteorological elements.” Quarterly Journal of the
Royal Meteorological Society, 81, 251–261.
Jenkinson, A. F. (1969). “Estimation of maximum floods.” World Meteorological
Organization Technical Note, Geneva, Switzerland, 98(5), 183–227.
Jennings, M. E., and Benson, M. A. (1969). “Frequency curve for annual flood
series with some zero events or incomplete data.” Water Resources
Research, 5(1), 276–280.
Jeppson, R. W. (1977). Analysis of flow in pipe networks, Ann Arbor Science,
Ann Arbor, Mich.
Joe, H. (1993). “Parametric families of multivariate distributions with given
margins.” Journal of Multivariate Analysis, 46(2), 262–282.
Johnson, N. L., and Kotz, S. (1985). “Moment ratios.” Encyclopedia of Statistical
Sciences, 5, 603–604.
Ochoa, I. D., Bryson, M. C., and Shen, H. W. (1980). “On the occurrence and
importance of Paretian-tailed distribution in hydrology.” Journal of
Hydrology, 48, 53–62.
Ouarda, T. B. M. J., Ashkar, F., Ben Said, E. M., and Hournani, L. (1994).
“Distribution statistiques utilisees en hydrologie: Transformation et
proprietes asymptotiques.” Rapport de recherche STAT-13, University of
Moncton, New Brunswick, Canada.
Özisik, M. N. (1968). Boundary value problems of heat conduction, International
Textbook Co., Scranton, Pa., 48–79.
Owen, D. B. (1962). Handbook of statistical tables, Addison-Wesley Publishing
Company, Reading, Mass.
Ozkul, S., Harmancioglu, N. B., and Singh, V. P. (2000). “Entropy-based
assessment of water quality monitoring networks.” Journal of Hydrologic
Engineering, 5(1), 90–100.
Padmanabhan, G., and Rao, A. R. (1986). “Maximum entropy spectra of some
rainfall and river flow time series from southern and central India.”
Theoretical and Applied Climatology, 37, 63–73.
Padmanabhan, G., and Rao, A. R. (1988). “Maximum entropy spectral analysis
of hydrologic data.” Water Resources Research, 24(9), 1519–1533.
Pate-Cornell, M. E. (1996). “Uncertainties in risk analysis: Six levels of
treatment.” Reliability Engineering and System Safety, 54(2–3), 95–111.
Pebesma, E. J., and Heuvelink, G. B. M. (1999). “Latin hypercube sampling of
Gaussian random fields.” Technometrics, 41(4), 303–312.
Perreault, L., Bobee, B., and Rasmussen, P. F. (1999a). “Halphen distribution
system. I: Mathematical and statistical properties.” Journal of Hydrologic
Engineering, 4(3), 189–199.
Perreault, L., Bobee, B., and Rasmussen, P. F. (1999b). “Halphen distribution
system. II: Parameter and quantile estimation.” Journal of Hydrologic
Engineering, 4(3), 200–208.
Phien, H. N., and Jivajirajah, T. (1984). “Fitting the Sb curve by the method of
maximum likelihood.” Journal of Hydrology, 67, 67–75.
Plate, E. J. (2002). “Risk management for hydraulic systems under hydrological
loads.” Risk, reliability, uncertainty, and robustness of water resources
systems, J. J. Bogardi and Z. W. Kundzewicz, eds., Cambridge University
Press, Cambridge, U.K.
Quandt, R. E. (1966). “Old and new methods of estimation of the Pareto
distribution.” Metrika, 10, 55–82.
Quimpo, R. G. (1993). “Reliability analysis of water distribution systems.” Proc.,
Sixth Conference Sponsored by the Engineering Foundation on Risk-
Based Decision Making in Water Resources, ASCE, 45–55.
Rackwitz, R. (1976). “Practical probabilistic approach to design.” Comité
Européen du Béton, Paris, Bulletin No. 112.
USEPA (1989). Risk assessment guidance for superfund, Vol. 1: Human health
evaluation manual (part A), Interim final, Rep. no. EPA/540/1-89/002,
U.S. Office of Emergency and Remedial Response, Washington, D.C.
USEPA (1990). Guidance on remedial actions for superfund sites with PCB
contamination, Rep. no. EPA/540/G-90/007, U.S. Office of Emergency
and Remedial Response, Washington, D.C.
Venetis, C. (1970). “Finite aquifers: Characteristic response and applications.”
Journal of Hydrology, 12, 53–62.
Veneziano, D. (1974). Contributions to second moment reliability, Res. Rep. No.
R74-33, Department of Civil Engineering, Massachusetts Institute of
Technology, Cambridge, Mass.
Vrijling, J. K., and van Gelder, P. H. A. J. M. (2000). “Policy implications of
uncertainty integration in design.” Stochastic hydraulics 2000, Wang and
Hu, eds., Balkema, Rotterdam, 633–646.
Vrijling, J. K., Van Hengel, W., and Houben, R. J. (1995). “A framework for risk
evaluation.” Journal of Hazardous Materials, 43, 245–261.
Wagner, J. M., Shamir, U., and Marks, D. H. (1988). “Water distribution reliability:
Analytical methods.” Journal of Water Resources Planning and
Management, ASCE, 114(3), 253–275.
Walski, T. M., Gessler, J., and Sjostrom, J. W. (1990). Water distribution systems:
Simulation and sizing, Lewis Publishers Inc., Chelsea, Mich.
Wang, Q. J. (1997). “LH moments for statistical analysis of extreme events.”
Water Resources Research, 33(12), 2841–2848.
Wang, S. X., and Adams, B. J. (1984). Parameter estimation in flood frequency
analysis, Publication 84-02, Department of Civil Engineering, University
of Toronto, Toronto, Canada.
Wang, S. X., and Singh, V. P. (1995). “Frequency estimation for hydrological
samples with zero value.” Journal of Water Resources Planning and
Management, 121(1), 98–108.
Williams, B. J., and Yeh, W. W.-G. (1983). “Parameter estimation in rainfall-runoff
models.” Journal of Hydrology, 63, 373–393.
Willmott, C. J., Ackleson, S. G., Davis, R. E., Feddema, J. J., Klink, K. M., Legates,
D. R., O'Donnell, J., and Rowe, C. M. (1985). “Statistics for the evaluation
and comparison of models.” Journal of Geophysical Research, 90, 8995–9005.
Woo, M. K., and Wu, K. (1989). “Fitting annual floods with zero flows.”
Canadian Water Resources Journal, 14(2), 10–16.
Wood, D. J. (1980). Computer analysis of flow in pipe networks including
extended period simulations (KYPIPE)—Users manual, Department of
Civil Engineering, University of Kentucky, Lexington, Ky.
Woodbury, A. D., and Ulrych, T. J. (1993). “Minimum relative entropy: Forward
probabilistic modeling.” Water Resources Research, 29(8), 2847–2860.
A
Abramowitz 177 191 375
Abramowitz and Stegun 450
Acceptable risk See Risk
Adams See Wang and Adams
Advanced first-order second-moment
(AFOSM) 31
See Also Uncertainty analysis
Affinity 382
Akaike information criterion (AIC) 382
Alabama 679
Ali 298
Alias method 448
Alternative hypothesis 699
Amite River, Louisiana 121 178 227 229
237 238 240
Ang 601
See Also First-order
reliability method
Archimedean copula 297
n-dimensional 297
two-dimensional 297
Archimedean copula families 297
Ali-Mikhail-Haq 298
Cook-Johnson (Clayton) 300
Frank 299
B
Bacchi 277
Bangladesh 671
Barbe 387
Barnes 716
See Also Pipe flow equations
Based risk model 691
Baton Rouge, Louisiana 234 405 411 506
Bayes's theorem 66
Bayesian entropy 365
Benefit-cost ratio 12
Benjamin 579
See Also First-order approximation
method
Bernoulli distribution 150
Bernoulli random variable 151
Bernoulli trials 151
Bessel function 279
See Also Bivariate exponential
distribution
Beta distribution 231
Bias See Estimators 337
Big Lost River, Idaho 263
Binomial distribution 152 195
Bivariate exponential distribution 277
Bessel function 279 280
Nagao-Kadoya (NK) model 279
Bivariate gamma distribution 285
Bessel function 285
bivariate hydrologic frequency
analysis 285
Izawa model 285
Pearson's correlation coefficient 285
Smith–Adelfang–Tubbs (SAT) model 285
Bivariate Gumbel logistic distribution 283
Bivariate Gumbel mixed distribution 280
cumulative density function 280
Gumbel marginal probability
distributions 280
method of moments 282
Bivariate log-normal distribution 288
Bivariate normal distribution 273
C
California 386
Cascade of linear reservoirs 259
average residence time 259
lag time 259
Pearson type 3 probability density
function 259
Central limit theorem 187 198
Centroid 125
Chance constrained programming
method 647
chance constraints 648
decision rule 647
nonrandomized decision rule 648
D
Dalezios and Tyraskis 386
Damage curve 26
Darcy's equation 434
Darlington, Louisiana 158 178 227 229
237
De Michele See Salvadori and De Michele
De Morgan's rule 55
Distributions (Cont.)
conditional probability 81
conditional trivariate 292
continuous 149
discrete 149
exponential 173
extreme value type I (EVI) 212 267
extreme value type II (EVII) 220
extreme value type III 224
gamma 175 236 267
Gaussian 187
generalized extreme value (GEV) 228
generalized Pareto (GP) 456
generic bivariate probability 308
geometric 159
Gumbel 213 267
Gumbel bivariate exponential 301
Gumbel marginal probability 280
Halphen 243
joint probability 76
lambda 446
limiting 212
log-Gumbel 267
logistic 239
log-logistic 267
log-normal 204 267
log-Pearson type 3 (LP3) 237
marginal 79
multivariate normal 273
negative binomial 163
negative exponential 212
normal 187
Pareto 237
Pascal 163
Poisson 150 166 197
Rayleigh 228
rectangular 230
regenerative 168
sample 108
symmetric 114
three-parameter generalized Pareto 456
three-parameter Pearson 236
triangular 240
two-parameter gamma 374
uniform 230
Wakeby 237
Weibull 225 267
Distributive rule 55
Diversity index (DI) 388
Dooge 260 374
See Also Linear downstream channel
routing model
Duffy 537
Dwight 642
Dynamic equilibrium 358
E
Eclipsing error 183
Ecological risk 688
Economic risk 685
Edson 374
Effective monetary value (EMV) 699
Effective utility value 699
Efficiency 338
Cramer-Rao bound 339
Cramer-Rao inequality 339
minimum variance unbiased
estimator (MVUE) 339
See Also Estimators
Efficient 339
Efron 481
Eilbert and Christensen 386
Einstein 263
See Also Linear downstream channel
routing model
Elementary events 51
Entropy 357
Bayesian 365
Boltzmann 358
Burg 385
conditional 366
informational 359
information-theoretical 357
joint 366
Shannon 359 362
statistical 357 358
thermodynamical 357
zero 357
Entropy method 367
Entropy theory 356
Entropy-based parameter estimation 363
Entropy-based redundancy measure 387
Environmental and water resources 5
problems in 5
system variables 5
Estimators (Cont.)
modified moment 457
probability-weighted moment (PWM) 457
regular maximum likelihood (RMLE) 458
relative mean error 343
root mean square error 343
variance 340
Evaluation of alternatives 10
Event mean concentration 187
Events 644
accident sequence 644
basic events 645
cut set approach 645
event specifications 645
event tree 644
fault tree 645
intersection 645
logic gates 645
minimal cut set 645
top event 645
union 645
Exceedance probability 195
Expectation 96
of a function 96
properties 96
Expected life 384
Expected monetary value (EMV) 10
Expected utility value (EUV) 10
Expected value of a decision 19
Experiment 37
Exponential distribution 173
F
F Test 708
Failure 384
initial failure state 641
instantaneous failure rate 384
Failure of a system 563
performance 563
structural 563
Failure rate 384
Favre 296
Federal Emergency Management Agency
(FEMA) 683
Fiorentino See Singh and Fiorentino 386
First-order approximation method 579
advanced first-order approximation
method 599
Darcy-Weisbach equation 580
Hazen-Williams equation 580
Manning's equation 580
multiplicative-type model 580
scatter 579
First-order reliability method 599
ellipsoid approach 607
Excel 607
G
Gamma distribution 175 236 267
See Also Discrete distributions
Garrick See Kaplan and Garrick
Gauss, Karl 188
See Also Normal distribution
Gaussian distribution See Normal
distribution
Gelhar 534
Generalized extreme value (GEV)
distribution 228
Generalized Pareto (GP) distribution 456
Generation of inflow data 454
Generic bivariate probability distribution 308
Generic expectation function method 616
Genest and Ghoudi 302
See Also Archimedean copula families
Geometric distribution 159
Geometric random variable 159
Ghoudi See Genest and Ghoudi
Goulter See Bouchart and Goulter
H
Haktanir 325
Haldar 601 616
See Also First-order reliability method
Halphen distributions 243
type A 243
type B 243
type IB 243
Hamed See Rao and Hamed
Hardy Cross method 721
Harley 260
See Also Linear downstream channel
routing model
Harm 672
I
Iman 482
Independence 62
India 522
Individual risk 685
Informational entropy 359
Information-theoretical entropy 357
Ingram 277
Inherent uncertainty 397
Initial failure state 641
Initial value problems 516
Instantaneous failure rate 384
Instantaneous unit hydrograph 374
Integer overflow 440
Interval estimation of parameters 344
Intrinsic uncertainty 397
Inverse transformation method 442
Intensity-Duration-Frequency (IDF) curve 428
Izawa 285
J
Jackknife method 480
Jain See Swamee and Jain
K
Kansas City, Missouri 428
Kelly and Krzysztofowicz 310
Kelton 441
See Also Random numbers
Kendall 375
Kendall's coefficient 298
Kernel function See Unit impulse response
(UIR) function
Kinematic wave theory 513
Klemes 7
Klir 4
See Also Uncertainty
Kostiakov equation 513
Kotz 261
See Also Johnson and Kotz
Krasovskaia 386
Kroll and Stedinger 314
See Also Method of moments
L
Lag time See Cascade of linear reservoirs
Lagrange multipliers 363
Lagrange parameters 367
Lambda distribution 446
Laplace's principle of insufficient reason 365
Latin hypercube sampling 598
Law 441
See Also Random numbers
Law of errors See Normal distribution
Legendre polynomial 328
Levine and Tribus 360
Lienhard 374
Limit state 570
Limit state surface 383
Limiting distribution 212
Lin and Wang 525 527
Low 607
See Also First-order reliability method
M
Mahadevan 601 616
See Also First-order reliability method
Manache 482
Manning's equation 425
Manning's friction factor 7
Manning's roughness factor 425
Maran 550
Marginal distributions 79
Marginal probability density function 79
Markovic 177
Marshall 277
Masse 260
See Also Linear downstream channel
routing model
Maximum entropy principle 357 360
Jaynes 357
minimum bias principle 361
minimum prejudice 361
Maximum entropy spectral analysis 385
AR models 385
Horton-Strahler ordering scheme 386
Maximum external loads 383
Mays 713
McKay 482
Mean 106
arithmetic 110
geometric 110
harmonic 111
Mean-value 31
Mean-value first-order second-moment
(MFOSM) 31
See Also Uncertainty analysis
Measures of central location 100
Median 107
Melching 31 482
See Also Sensitivity analysis
Mellin transform 634
Meta-Gaussian model 310
Method of L-moments 327
Method of maximum likelihood 322
likelihood function 323
maximum likelihood (ML) parameter
estimates 322
Method of moments 282 314
higher moments 123
kurtosis 123
moment ratios 123
peakedness 123
See Also Bivariate Gumbel mixed
distribution
Method of ordinary least squares 330
Mill Grove, Missouri 177
Miller 261
See Also Cox and Miller
Minimum cross entropy principle 365
Minimum relative entropy principle 387
Minimum water supply requirement
constraint 654
Mississippi Gulf Coast 672
Mitchell 431
Mode 107
Modified rational method 432
Moment-generating function (MGF) 103
Moments 95 100
central 100
conventional, theory of 326
even-ordered 192
method of probability-weighted 325
odd-ordered central 192
probability-weighted 267 327
regular 100
theorem of 314
Moments of distribution 100
Monte Carlo integration 474
Monte Carlo simulation 31 437 597
antithetic variates technique 475 476
control variates 475 479
correlated sampling technique 475 476
importance sampling technique 475 478
Latin hypercube simulation 475 477 479
stratified sampling technique 475 478
variance reduction techniques 475
See Also Uncertainty analysis
Moore 261
Moore and Clarke 261
See Also Linear downstream channel
routing model
Moramarco and Singh 388
Morlat 244
See Also Halphen distributions
Muller See Box and Muller
Multiplication rule 63
N
Napiorkowski 263
See Also Linear downstream channel
routing model
Narmada River 405
Nash 122 314 374
See Also Method of moments
Nash cascade 527
Natale and Todini 330
National Pollutant Discharge Elimination
System 249
Natural Environment Research Council 228
Negative binomial distribution 163
negative binomial random variable 163
Negative exponential distribution 212
Nelsen 297 298
New Orleans 167 671 679
Node equations 720
Node head analysis (NHA) 735
Nonexceedance probability 214
Nonlinear node formation 723
Nonoccurrence 150
Nonquantifiable consequences 11
Nonstructural measures 20
Normal distribution 187
coefficient of kurtosis 193
even-ordered moments 192
law of errors 188
odd-ordered central moments 192
Normalization 343
Normalized mean square error (NMSE) 132
Normalized standard error (NSE) See
Estimators 342
Null hypothesis 699
O
Occurrence 150
Operating characteristic curve 701
Ordinary entropy method 375
Ordinary moments 327
Ouarda 244
See Also Halphen distribution
Owen 739
P
Padmanabhan and Rao 385
Parameter estimation 313 456
maximum likelihood (ML) equations 458
maximum likelihood (ML) estimator 458
modified moment estimators 457
probability-weighted moment
estimators (PWM) 457
regular maximum likelihood
estimators (RMLE) 458
Parameter-space expansion method 364 372 379
Q
Q-Q plot 306
Quandt 457
See Also Parameter estimation
Quimpo 742
R
Rackwitz 599
See Also First-order reliability method
Risk (Cont.)
probabilistic approach 676
Royal Society 670
subnational 690
supranational 690
vulnerability 670
Risk management 681
acceptability 681
analysis 681
control procedures 681
criteria 681
hazard identification 681
Robustness 343 461
See Estimators 343
mini-max 343
minimum average RMSE 343
Root mean square error (RMSE) 132
Rossman 726
Rubinstein 478
S
Safeguards 671
Safety 383 691
Safety factors See Reliability evaluation
Safety margins See Reliability evaluation
Saint-Venant equation 547
Salvadori and De Michele 296
Sample distribution 108
Sample space 52
Sample-mean method 475
Sarino and Serrano 530 531
Scale parameter 176
Scandinavia 386
Second law of thermodynamics 358
Second quartile 113
Second-order approximation method 598
Second-order reliability method 616
Sensitivity analysis 481
global sensitivity analysis (GSA) 482
local sensitivity analysis (LSA) 481
sensitivity coefficient 481
Spearman's rank correlation
coefficient 482
Serrano See Sarino and Serrano
Shackle 4
See Also Uncertainty
Shannon 357 359
Shannon entropy 362
Shape factors 122
Shape parameter 176
Shinozuka 601
See Also First-order reliability method
Shore 385
Shrader 331
Similarity function 382
Simple event 57
Simulation 437
Singh 260 296 325 366
381 385 388 456
See Also Harmancioglu and Singh
See Also Krstanovic and Singh
See Also Linear downstream channel
routing model
See Also Moramarco and Singh
See Also Zhang and Singh
Singh and Fiorentino 366 381
Singh and Rajagopal 364 379
Singh and Singh 277
Skewness 120
coefficient of 121
sample 121
Skwierzyna 268
See Also Flood frequency analysis
(FFA)
Small-sample correction 349
Smith 285
See Also Bivariate gamma distribution
Snyder 331
Societal risk 685
Soil conservation service method 432
Sombor 171
South America 468
Spectral analysis 254
Spectral density 498
Fourier analysis 498
Fourier-Stieltjes integral 499
harmonic analysis 498
Nyquist frequency 501
Statistical procedures 187
Stedinger and Tasker 331
Stedinger See Kroll and Stedinger
Stegun 177 191 375 450
See Also Abramowitz and Stegun
Stem-and-leaf plot 47
extent of spread 47
extent of symmetry 47
leaves 48
presence of gaps 47
representative value 47
stem 48
typical value 47
Stochastic analysis 253
Stochastic differential equations 515
Darcy equation 537 539
differentiation 518 519
Dirac delta function 545
Dupuit approximation 537
Dupuit-Forchheimer theory 539
impulse response function (IRF) 544
integration 518 519
mean square continuity 518
Stochastic processes 486
autocorrelation coefficient 489
Bernoulli process 504
Brownian motion process 507
counting process 503
covariance and correlation 488
ensemble 486
evolutionary stochastic process 494
Gaussian process 505
integer-valued continuous-time 503
Markov process 506
mean and variance 487
nonstationary process 494
Poisson process 504
T
T Test 708
Taha 12
Tail behavior 213
Tang 601 607
See Also First-order reliability method
Tasker See Stedinger and Tasker
Taylor series expansion 31
Taylor's theorem 418
Texas 672
Theorem of total probability 66 67 267
Thermodynamical entropy 357
Three-parameter generalized Pareto
distribution 456
Three-parameter Pearson distribution 236
Time series analysis 253 508
Time to failure 634
expected life 635
mean time to failure 635
reliability 634
Todini See Natale and Todini
Total maximum daily load (TMDL) 133 430
Total quality management 677
Transinformation 366
Trends 508
deterministic component 509
seasonal or periodic components 508
time trend 508
trend component 509
Triangular distribution 240
Tribus See Levine and Tribus
Triplet definition 674
True value 337
Tsai 633
Tsunami See Bangladesh
Tung 634
Tweedie 261
See Also Linear downstream channel
routing model
Two-parameter gamma distribution 374
Tyraskis See Dalezios and Tyraskis
U
U.S. Army Corps of Engineers 672
U.S. Environmental Protection Agency
(EPA) 478
exposure models library 478
U.S. National Weather Service 67
U.S. Water Resources Council 237
Ulrych 387
Ulrych and Clayton 385
Uncertainty 4 383 395
accuracy 401
analysis 31
computational 397
epistemic 397
in data 397
inherent 397
intrinsic 397
model parameter 396
model structure 396
natural 396
operational 397
precision 401
sources of 30
Unexplained sum of squares 131
Uniform distribution 230
Unit hydrograph method 254
Unit impulse (UIF) function 252 254
Unit impulse response (UIR) function 252 254
Green's function 256
instantaneous unit hydrograph (IUH) 256
kernel function 256
Poisson process 256
V
Variance 114 317
coefficient of variation 116
expectation 117
properties 114
sample variance 115
standard deviation 114
standard error 116
Vrijling 679
See Also Risk
W
Wagner 742
Waiting time 691
Wakeby distribution 237
Wallis See Hosking and Wallis
Walski 726
Wang 314 325 525
See Also Lin and Wang
See Also Method of moments
Wang and Adams 314
Warta River 268
Y
Yang 387
Yang and Burn 387
Yeh See Williams and Yeh
Yue 285 288
See Also Bivariate gamma distribution
Z
Zero entropy 357
Zeroth Lagrange multiplier 367
Zhang and Singh 296