
Eyvazian M Mehrpour MR Principles of Probability Theory and


The science of probability first attracted attention in the middle of the seventeenth century through the study of games of chance. As time went by, the applications of this science expanded to the point where, today, probability theory plays a crucial role in recognizing, modeling, and improving many uncertain real-world phenomena.

The book in front of you is the result of several years of studying, using, and teaching the concepts of probability theory and its applications. Despite the many complexities of this science, the main goal of this book is to present its contents simply and fluently and to help students build a deep understanding of its subjects. To achieve this goal, each chapter of the book is divided into two main parts. In the first part, the material of the chapter is explained in a simple manner, accompanied by examples that reinforce learning and understanding of the content. Since the authors believe that solving varied problems is the best way to master probability theory and understand it fully, the second part of each chapter is devoted to a large number of classified problems. The order of the problems is designed so that readers feel themselves progressing along the learning path step by step.

In the first chapter of this book, the main ideas of combinatorial analysis, which is the primary tool for computing probabilities, are explained.

In Chapter 2, the definition of probability, the principles of probability theory, and the main methods of
probability calculation are stated, and in Chapter 3, conditional probability and its applications are presented.

In Chapter 4, random variables are addressed, and in Chapter 5, the expected value of random variables
and some of their properties are mentioned. Also, in Chapters 6 and 7, some widely used discrete and continuous
random variables are introduced.

In Chapter 8, joint random variables and conditional distributions are discussed, and in Chapter 9, the distribution of a function of several random variables is addressed.

In Chapter 10, the expected value of multiple random variables, covariance, and correlation are explained, and finally, in Chapter 11, determining the probability function and expected value by conditioning is presented.

For those who encounter difficulties in solving the problems in this book, another book titled "Solution Manual for Principles of Probability and its Applications" has been prepared by the authors, in which explanatory answers to all of the problems are given and, in many cases, more than one solution is proposed.

The contents of this book are not without flaws, and we encourage our dear readers to contact us via mrz.mehrpour94@gmail.com if they find any weak points or faults in the book or have any comments or suggestions.

Ultimately, the authors hope that this collection proves helpful in becoming acquainted with and learning the science of probability, and that it serves as a step, however small, in the development of this science.
CHAPTER 1: COMBINATORIAL ANALYSIS ............................................................................................................................................... 1
1.1. Introduction ....................................................................................................................................................................................................... 1
1.2. The Basic Principle of Counting ................................................................................................................................................................... 1
1.3. Permutations .................................................................................................................................................................................................... 5
1.3.1 Permutation of “n” Distinct elements .......................................................................................................................................... 5
1.3.2 Permutation of “r” Distinct elements from “n” Distinct elements ........................................................................................ 7
1.3.3 Permutation of “n” Elements, some of which have the Same value...................................................................................... 8
1.3.4 Permutation of “n” Distinct elements at a Round table......................................................................................................... 10
1.4. Combinations ................................................................................................................................................................................................. 12
1.5. Significant Identities of the Combinatorial Topic ................................................................................................................................ 16
1.6. The Ball and Urn (cell) Model ..................................................................................................................................................................... 23
1.7. Chapter Problems ..........................................................................................................................................................................................48

CHAPTER 2: AXIOMS OF PROBABILITY ................................................................................................................................................ 58


2.1. Introduction .................................................................................................................................................................................................... 58
2.2. Random Trial, Sample Space, and Event ................................................................................................................................................ 58
2.3. An Introduction to the Algebra of Sets ..................................................................................................................................................... 59
2.4. Definition of Probability .............................................................................................................................................................................. 64
2.5. Some Probabilistic Propositions Resulting from Principles of the Probability Theory ............................................................... 79
2.6. Chapter Problems ..........................................................................................................................................................................................96

CHAPTER 3: CONDITIONAL PROBABILITY AND INDEPENDENCE ..................................................................................................... 106


3.1. Introduction .................................................................................................................................................................................................. 106
3.2. Conditional Probability Concept ............................................................................................................................................................. 106
3.3. The Law of Multiplication in Probability ............................................................................................................................................... 113
3.4. Independence of Events .............................................................................................................................................................................. 115
3.5. The Law of Total Probability .................................................................................................................................................................... 125
3.6. Bayes' Law ..................................................................................................................................................................................................... 133
3.7. The Law of Total Probability in Reduced Space .................................................................................................................................. 138
3.8. Chapter Problems ........................................................................................................................................................................................ 143

CHAPTER 4: RANDOM VARIABLES ...................................................................................................................................................... 162


4.1. Introduction .................................................................................................................................................................................................. 162
4.2. Types of Random Variables....................................................................................................................................................................... 166
4.3. Discrete Random Variables ....................................................................................................................................................................... 167
4.4. Continuous Random Variables ................................................................................................................................................................ 169
4.5. Mixed Random Variables ...........................................................................................................................................................................174
4.6. Cumulative Distribution Function ......................................................................................................................................................... 176
4.7. Some Important Values of Random Variables ...................................................................................................................................... 181
4.8. The Distribution of a Function of a Random Variable ....................................................................................................................... 184
4.9. Conditioning on Continuous Space ........................................................................................................................................................ 188
4.10. Chapter Problems ......................................................................................................................................................................................... 191

CHAPTER 5: EXPECTED VALUE .......................................................................................................................................................... 206


5.1. Introduction ................................................................................................................................................................................................. 206
5.2. The Expected Value of Discrete, Continuous, and Mixed Random Variables ............................................................................. 209

5.3. Some notes about the Expected Value of a Random Variable .......................................................................................................... 214
5.4. Expected Value of a Function of a Random Variable .........................................................................................................................223
5.5. Central Tendency Measures of a Random Variable ........................................................................................................................... 227
5.6. Dispersion Measures of a Random Variable........................................................................................................................................ 228
5.6.1. Variance ......................................................................................................................................................................................... 228
5.6.2. Standard Deviation ..................................................................................................................................................................... 230
5.6.3. Expected Distance from the Mean ............................................................................................................................................ 231
5.7. Other Measures of a Random Variable .................................................................................................................................................. 231
5.8. Approximate Expected Value of The Function of a Random Variable ...........................................................................................232
5.9. Moment Generating Function ................................................................................................................................................................ 233
5.10. Factorial Moment Generating Function .............................................................................................................................................. 238
5.11. Distribution and Expected Value Of 𝑿|𝒂 ≤ 𝑿 ≤ 𝒃 ..............................................................................................................................242
5.12. Markov's and Chebyshev's Inequalities..................................................................................................................................................244
5.12.1. Markov's Inequality ......................................................................................................................................................................244
5.12.2. Chebyshev's Inequality ................................................................................................................................................................ 246
5.12.3. One-Sided Chebyshev's Inequality ........................................................................................................................................... 248
5.13. Chapter Problems ........................................................................................................................................................................................ 251

CHAPTER 6: SPECIAL DISCRETE RANDOM VARIABLES .................................................................................................................... 264


6.1. Introduction ................................................................................................................................................................................................. 264
6.2. The Bernoulli Random Variable.............................................................................................................................................................. 264
6.3. The Binomial Random Variable .............................................................................................................................................................. 269
6.3.1. Properties of The Binomial Random Variable........................................................................................................................ 272
6.4. The Geometric Random Variable ............................................................................................................................................................ 275
6.4.1. The Mean And Variance of The Geometric Distribution .................................................................................................... 277
6.4.2. The Memoryless Property of The Geometric Distribution .................................................................................................. 279
6.4.3. The Geometric Random Variable of Failure Type ................................................................................................................ 282
6.5. The Negative Binomial Random Variable (Pascal) ............................................................................................................................ 283
6.5.1. The Properties of Negative Binomial Random Variable ..................................................................................................... 286
6.5.2. The Negative Binomial Random Variable of Failure Type................................................................................................. 289
6.5.3. The Problem of Points ................................................................................................................................................................. 290
6.6. The Poisson Random Variable.................................................................................................................................................................. 291
6.6.1. The Properties of The Poisson Random Variable ................................................................................................................. 295
6.6.2. The Poisson Process ...................................................................................................................................................................... 297
6.6.3. Approximating The Probability Function of The Number Of Successes in n Dependent Trials .............................. 299
6.7. The Hypergeometric Random Variable................................................................................................................................................. 304
6.7.1. The Properties of Hypergeometric Random Variable ......................................................................................................... 305
6.8. The Discrete Uniform Random Variable .............................................................................................................................................. 309
6.9. Chapter Problems ......................................................................................................................................................................................... 311

CHAPTER 7: SPECIAL CONTINUOUS RANDOM VARIABLES .............................................................................................................. 333


7.1. Introduction ................................................................................................................................................................................................. 333
7.2. Continuous Uniform Random variable ................................................................................................................................................ 333
7.2.1. Some Properties of a Continuous Uniform Random Variable .......................................................................................... 338
7.3. Normal Random Variable......................................................................................................................................................................... 340
7.3.1. Some Properties of a Normal Random Variable .................................................................................................................... 341
7.3.2. The Normal Distribution Approximation to the Binomial Distribution .........................................................................352
7.4. The Exponential Random Variable .........................................................................................................................................................355
7.4.1. Some Properties of the Exponential Distribution ................................................................................................................ 359
7.4.2. The Two-Parameter Exponential Distribution ..................................................................................................................... 361
7.5. The Gamma Random Variable ................................................................................................................................................................ 362
7.5.1. The Three-Parameter Gamma Random Variable ............................................................................................................................... 367
7.6. Other Continuous Distributions............................................................................................................................................................. 368
7.6.1. The Beta Random Variable ........................................................................................................................................................ 368
7.6.2. The Weibull Random Variable ...................................................................................................................................................370

7.6.3. The Cauchy Distribution ..............................................................................................................................................................371
7.6.4. The Pareto Random Variable ......................................................................................................................................................371
7.7. The Failure Rate Function......................................................................................................................................................................... 372
7.8. Chapter Problems ........................................................................................................................................................................................ 376
CHAPTER 8: JOINT RANDOM VARIABLES AND CONDITIONAL DISTRIBUTION ............................................................................... 394
8.1. Introduction ................................................................................................................................................................................................. 394
8.2. Joint Random Variables ............................................................................................................................................................................ 394
8.2.1. Jointly Discrete Random Variables .......................................................................................................................................... 394
8.2.2. Jointly Continuous Random Variables .................................................................................................................................... 399
8.3. Some well-known Joint Distributions ................................................................................................................................................... 405
8.3.1. The Multinomial Distribution ................................................................................................................................................... 405
8.3.2. The Multivariate Hypergeometric Distribution ................................................................................................................... 406
8.3.3. The Bivariate Uniform Random Variable ...............................................................................................................................407
8.3.4. The Bivariate Normal Distribution .......................................................................................................................................... 410
8.4. The Independence of Random Variables................................................................................................................................................ 410
8.5. Conditional Distributions ......................................................................................................................................................................... 418
8.5.1. Discrete Case .................................................................................................................................................................................. 418
8.5.2. Continuous Case............................................................................................................................................................................ 421
8.6. Chapter Problems ........................................................................................................................................................................................ 427
CHAPTER 9: DISTRIBUTION OF A FUNCTION OF MULTIPLE RANDOM VARIABLES ........................................................................... 442
9.1. Introduction ..................................................................................................................................................................................................442
9.2. Distribution of a Function of Multiple Random Variables................................................................................................................442
9.2.1. Discrete Case ..................................................................................................................................................................................442
9.2.2. Continuous Case............................................................................................................................................................................444
9.3. The Sum of Independent Random Variables ....................................................................................................................................... 448
9.4. The Central Limit Theorem ...................................................................................................................................................................... 457
9.5. Order Statistics ............................................................................................................................................................................................ 461
9.6. Chapter Problems .........................................................................................................................................................................................471
CHAPTER 10: EXPECTED VALUE OF MULTIPLE RANDOM VARIABLES, COVARIANCE, AND CORRELATION ................................. 482
10.1. Introduction ................................................................................................................................................................................................. 482
10.2. The Expected Value of a Function of Multiple Random Variables ................................................................................................. 482
10.3. The Expected Value of The Sum of Random Variables ...................................................................................................................... 487
10.4. Covariance Between Two Random Variables ..................................................................................................................................... 493
10.5. Correlation Coefficient Between Random Variables ......................................................................................................................... 506
10.6. Chapter Problems ........................................................................................................................................................................................ 510

CHAPTER 11: DETERMINING THE PROBABILITY FUNCTION AND EXPECTED VALUE BY CONDITIONING ..................................... 520
11.1. Introduction ................................................................................................................................................................................................. 520
11.2. Determining The Probability Function by Conditioning .................................................................................................................. 521
11.3. Determining The Expected Value by Conditioning ............................................................................................................................523
11.4. The Expected Value and Variance of The Sum of A Random Number of Random Variables ................................................. 529
11.5. Chapter Problems ....................................................................................................................................................................................... 533
REFERENCES ................................................................................................................................................................................................................. 541

In this chapter, we introduce some methods for counting the number of elements of a discrete and finite set. It will be observed in subsequent chapters that enumeration theory, or combinatorial analysis, is one of the primary and fundamental methods of probability computation. However, the reader should note that there are no fixed, specific methods for determining the number of states of a set. The purpose of this chapter is to teach the primary and fundamental principles of enumeration and to develop the ability to generalize them to problems not seen before.

All methods of counting rely on the Basic Principle of Counting, or the Principle of Multiplication, which is expressed as follows:

Suppose that two trials are to be performed. If the first trial can result in any one of 𝒏 possible outcomes and, for each of those outcomes, the second trial can result in any one of 𝒎 possible outcomes, then altogether there are 𝒏 × 𝒎 possible outcomes for performing the two trials.
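As a quick illustration, the principle can be checked by direct enumeration. The following is a minimal Python sketch (the shirt and trouser names are purely illustrative, not from the text):

```python
from itertools import product

# Illustrative first trial (3 outcomes) and second trial (2 outcomes).
shirts = ["S1", "S2", "S3"]
trousers = ["T1", "T2"]

# Every outcome of trial 1 pairs with every outcome of trial 2.
outfits = list(product(shirts, trousers))
print(len(outfits))  # 6, i.e., 3 x 2
assert len(outfits) == len(shirts) * len(trousers)
```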

A noteworthy point in applying the multiplication principle is to pay attention to the phrase "each of those outcomes". Even though it seems obvious, many mistakes in the use of the multiplication principle result from disregarding this very point. Note the following examples:

Example 2.1

There are 12 coaches, each of whom has 4 athletes participating in a ceremony. If one coach and one of his athletes are to be chosen as the coach and athlete of the year, respectively, how many different choices are possible?
Solution. We define the first and second trials to be choosing the coach and the athlete of the year, respectively. The first trial can be done in 12 states, and given the selection of each coach in the first trial, choosing one of his athletes can be done in 4 states. Hence, the trials can be performed in 12 × 4 = 48 states.

Example 2.2

Suppose that five coaches have two athletes each and another seven coaches have three athletes each. Now, if we want to choose one coach and one of his athletes as the coach and athlete of the year, how many different choices are possible?
Solution. Given some of the results of the first trial (choosing the coach), there are two results for the second trial (choosing the athlete), while given the other results of the first trial, there are three possible results for the second trial. Hence, we cannot directly use the principle of counting. In such situations, we should divide the problem into two different parts, count the number of states of each part using the principle of multiplication, and then add up the numbers of states of the parts using the Principle of Addition. Consequently, the answer to this example equals 5 × 2 + 7 × 3 = 31.
Generally, if the first trial's results split into 𝑛₁ results that are each followed by 𝑛₂ possible results of the second trial, and 𝑚₁ results that are each followed by 𝑚₂ possible results of the second trial, then the two trials can be done in 𝑛₁𝑛₂ + 𝑚₁𝑚₂ states altogether.
If more than two trials are to be performed, the principle of multiplication can
be generalized as follows:

Suppose that 𝒓 trials are to be performed such that the first trial has 𝒏₁ possible results; for each of those results, there are 𝒏₂ possible outcomes for the second trial; for each of the results of the first and second trials, there are 𝒏₃ results for the third trial; …; and for each of the results of the first (𝒓 − 𝟏) trials, there are 𝒏ᵣ possible outcomes for the 𝒓th trial. Then, there are 𝒏₁𝒏₂𝒏₃⋯𝒏ᵣ states of performing the 𝒓 trials altogether.

Example 2.3

How many four-digit numbers are there such that no two of their digits are the same?
Solution. We define the trials to be the determination of the digits of the four-digit number from left to right, respectively. The first digit cannot be zero, which leaves nine states for the first trial. For each result of the first trial, there are nine outcomes for the second trial (any digit except the first), and for each of the (9 × 9) outcomes of the first and second trials, there are eight outcomes for the third trial (any digit except the first and second), and finally, for each of the (9 × 9 × 8) outcomes of the first, second, and third trials, there are seven outcomes for the fourth trial (any digit except the first, second, and third). Therefore, the answer is 9 × 9 × 8 × 7 = 4536.
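For readers who wish to verify such counts, here is a small brute-force check of Example 2.3 in Python (an independent verification, not the book's method):

```python
# Count four-digit numbers whose digits are all distinct.
count = sum(1 for n in range(1000, 10000) if len(set(str(n))) == 4)
print(count)  # 4536 = 9 * 9 * 8 * 7
```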

Example 2.4

How many four-digit numbers are there such that no two consecutive digits are the same?
Solution. We define the trials to be the determination of the digits of the four-digit number from left to right, respectively. The first digit cannot be zero, which leaves nine outcomes. For each result of the first trial, there are nine outcomes for the second trial (any digit except the first), and for each of the (9 × 9) outcomes of the first and second trials, there are nine outcomes for the third trial (any digit except the second), and finally, for each of the (9 × 9 × 9) outcomes of the first, second, and third trials, there are nine outcomes for the fourth trial (any digit except the third). Therefore, the answer is:
9 × 9 × 9 × 9 = 9⁴ = 6561

Example 2.5

How many three-digit even numbers can be made using 0, 1, 2, 3, 4, and 5 without repetition?
Solution. For the first trial (choosing the units digit), there are three possible outcomes, namely 0, 2, and 4. However, the reader should note that, regarding the number of states of the second trial (choosing the hundreds digit), if the result of the first trial is 2 or 4, there are four outcomes for the second trial, whereas if the result of the first trial is zero, there are five possible outcomes for the second trial. Hence, the problem should be divided into two different parts, and the answer is equal to:

4 × 4 × 2 (units digit 2 or 4) + 5 × 4 × 1 (units digit 0) = 52

There is another approach to solving this example. If we define the first and second trials to be the determination of the hundreds and units digits, respectively, then given three of the possible results of the first trial (1, 3, and 5), there are three outcomes for the second trial (0, 2, and 4). Furthermore, given the other two results of the first trial (2 and 4), there are two outcomes for the second trial (zero and the even digit not chosen in the first trial). Therefore, the total number of states is calculated as follows:

3 × 4 × 3 + 2 × 4 × 2 = 52
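Both decompositions can be confirmed by exhaustive enumeration; the sketch below brute-forces Example 2.5:

```python
# Three-digit even numbers using only the digits 0-5, no digit repeated.
allowed = set("012345")
count = sum(
    1
    for n in range(100, 1000)
    if n % 2 == 0
    and set(str(n)) <= allowed   # every digit is one of 0..5
    and len(set(str(n))) == 3    # all three digits distinct
)
print(count)  # 52
```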

A common type of problem in combinatorial analysis concerns the number of states of arranging 𝑛 elements in different places, which is addressed in this section.

According to the principle of multiplication, the number of states of putting 𝑛 distinct elements into 𝑛 different places is equal to:

$$n \times (n-1) \times \cdots \times 3 \times 2 \times 1 = n!$$

To determine the element of the first place, there are 𝑛 possible states, and for each state of selecting the element of the first place, there are (𝑛 − 1) possible choices for the element of the next place. In the same manner, there is one state left for the element of the 𝑛th place. Therefore, the number of states in which 𝑛 distinct elements can be arranged in 𝑛 different places is equal to 𝑛!.

Example 3.1

How many ways can five girls and five boys be seated in a row if
a) There is no restriction for sitting?
b) The girls sit together, and so do the boys?
c) The girls sit together?
d) The girls and boys sit alternately?
Solution.
a) Assuming that all the ten places are distinguishable from left to right,
we should arrange the ten people (ten distinct elements) in ten different
places. Therefore, the answer is 10!.
b) The boys can sit together in 5! states, and so can the girls. Moreover, these two groups of boys and girls can be interchanged in 2! states. Consequently, according to the principle of multiplication, the number of states is equal to 5! × 5! × 2!.
c) The girls can sit together in 5! states. Furthermore, the group of girls along with the remaining five boys makes up six different units to be arranged in 6! states. Consequently, according to the principle of multiplication, the number of states is equal to 5! × 6!.
d) There are ten states for the first place, but since the person sitting in the second place should not have the same gender as the first, there are five states for the second place. The person sitting in the third place should have the same gender as the person in the first place, so there are four states for the third place. Likewise, each subsequent person should have the same gender as the person sitting two places before. Hence, the total number of states is equal to: 10 × 5 × 4 × 4 × 3 × 3 × 2 × 2 × 1 × 1 = 2 × 5! × 5!

The number of ways that we can select 𝑟 distinct elements from 𝑛 distinct ones and arrange them is equal to:

$$P_r^n = n(n-1)(n-2)\cdots(n-r+1) = \frac{n!}{(n-r)!}$$

There are 𝑛 states for determining the element of the first place. Then, for each of the states belonging to the element of the first place, there are (𝑛 − 1) states for the element of the next place. Likewise, there are (𝑛 − 𝑟 + 1) states for determining the element of the 𝑟th place. Hence, the number of states in which we can select 𝑟 distinct elements from 𝑛 distinct ones and arrange them equals $P_r^n = n(n-1)(n-2)\cdots(n-r+1)$. This can be rewritten as follows:

$$P_r^n = n(n-1)(n-2)\cdots(n-r+1) \times \frac{(n-r)!}{(n-r)!} = \frac{n!}{(n-r)!}$$
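In Python, this quantity is available directly as math.perm; a minimal check that it agrees with the factorial formula:

```python
import math

n, r = 10, 3
print(math.perm(n, r))                             # 720 ordered selections
print(math.factorial(n) // math.factorial(n - r))  # n!/(n-r)! = 720 as well
```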

Example 3.2

How many ways can we select a director, an administrative assistant, and a financial deputy from a ten-member board?
Solution. For the selection of the director, there are ten states. Then, for each of those ten states, there are nine states for determining the administrative assistant. Given each state of the two preceding trials, there are eight states for determining the financial deputy. Therefore, the number of states equals:

$$P_3^{10} = 10 \times 9 \times 8 = \frac{10!}{7!}$$

In some problems, due to the sameness of some objects, displacements among them do not create a new state. For example, in arrangements of the digits 22111, displacing the 1's among themselves does not result in a new state. In this section, we address the number of states of such problems.

Example 3.3

How many different arrangements can be made using the letters 𝑎, 𝑎, 𝑐 (arrangements in which the letter "𝑎" appears twice and the letter "𝑐" appears once)?
Solution. In many cases, we confront a new problem that is reasonably similar to a previously solved one whose answer we already know. One method of solving such problems is to find the ratio between the number of states of the new problem and that of the old one:

(The number of states of the new problem) = (The number of states of the old problem) × 𝑘

where the number of states of the new problem is obtained by finding 𝑘.

In this example, the two identical letters "𝑎" have the same value, and displacing them does not create a different state. To solve the example, we refer to an old problem: determining the number of arrangements of the letters 𝑎₁, 𝑎₂, and 𝑐. As mentioned in Section 1.3.1, the number of states of this problem equals 3!. As seen below, every two states of the old problem are equivalent to one state of the new problem. Therefore, the number of states of the new problem is half the number of states of the old problem.

$$\left.\begin{matrix} a_1 a_2 c \\ a_2 a_1 c \end{matrix}\right\} \Rightarrow aac \qquad \left.\begin{matrix} a_1 c\, a_2 \\ a_2 c\, a_1 \end{matrix}\right\} \Rightarrow aca \qquad \left.\begin{matrix} c\, a_1 a_2 \\ c\, a_2 a_1 \end{matrix}\right\} \Rightarrow caa$$
Hence, the number of possible states for arranging the letters a, a, and c can be obtained as follows:

$$\text{(states of the new problem } a,a,c\text{)} = \text{(states of the old problem } a_1 a_2 c\text{)} \times \frac{1}{2} = 3! \times \frac{1}{2} = 3$$

Example 3.4

How many different arrangements can be made using the letters 𝑎, 𝑎, 𝑏, 𝑏?
Solution. The number of states of sorting 𝑎₁𝑎₂𝑏₁𝑏₂ (an old problem) is equal to 4!. Furthermore, as seen below, every 2! × 2! states of the old problem are equivalent to one state of this example.

$$\left.\begin{matrix} a_1 a_2 b_1 b_2 \\ a_1 a_2 b_2 b_1 \\ a_2 a_1 b_1 b_2 \\ a_2 a_1 b_2 b_1 \end{matrix}\right\} \Rightarrow aabb$$

Consequently, the number of states of the problem is equal to:

$$\frac{4!}{2!\,2!}$$

Example 3.5

How many different arrangements can be made using the letters 𝑎𝑎𝑏𝑏𝑐𝑐𝑐?
Solution. The number of states of sorting the letters 𝑎₁𝑎₂𝑏₁𝑏₂𝑐₁𝑐₂𝑐₃ is equal to 7!. Now, by considering Examples 3.3 and 3.4, it can be concluded that the number of possible states of this example equals:

$$\frac{7!}{2!\,2!\,3!}$$

In general, it can be shown that the number of ways of sorting 𝑛 elements, 𝑛₁ of which are the same, 𝑛₂ of which are the same, …, and 𝑛ₖ of which are the same, is equal to:

$$\frac{n!}{n_1!\,n_2!\cdots n_k!}$$
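This formula is easy to test against brute-force enumeration for a small multiset; a sketch using the letters of Example 3.5:

```python
from itertools import permutations
from math import factorial

word = "aabbccc"  # 7 letters: two a's, two b's, three c's

# n! / (n1! n2! ... nk!) for n = 7, n1 = 2, n2 = 2, n3 = 3.
formula = factorial(7) // (factorial(2) * factorial(2) * factorial(3))

# Distinct orderings of the multiset, counted directly.
brute = len(set(permutations(word)))

print(formula, brute)  # 210 210
assert formula == brute
```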

The number of states in which 𝑛 distinct elements can be arranged at a round table is equal to (𝑛 − 1)!. To prove this, note that the only difference between arranging people at a round table and in a row is that, in the former case, the absolute location of people does not matter; the only important point is their arrangement relative to one another. We now establish a relationship between the number of states of this problem and the number of states of seating people in a row, and show that every 𝑛 states of seating people in a row are equivalent to one state of seating people at a round table.
As mentioned previously, in the problem of arranging people at a round table, the only important issue is the order of sitting. Hence, the states shown below are considered indistinguishable:

[Figure: four rotations of the same circular arrangement of A, B, C, and D around a round table, all representing a single state]

Therefore, there is a relationship between the states of seating people in a row and at a round table: every 𝑛 states of seating in a row correspond to one state of seating at a round table.

Hence, the number of states of seating people at a round table can be written as follows:

$$\text{(states of seating } n \text{ people at a round table)} = \text{(states of seating } n \text{ people in a row)} \times \frac{1}{n} = n! \times \frac{1}{n} = (n-1)!$$
There is also another way to justify the formula for arranging people around a round table. Since the different possible places at the round table do not create a new state for the first person, there is only one state for him. However, after he sits, since what matters for the others is how they sit relative to the first person, the places become distinct, and the number of states of seating them relative to the first person equals:
1 × (𝑛 − 1) × (𝑛 − 2) × (𝑛 − 3) × … × 1 = (𝑛 − 1)!
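The (𝑛 − 1)! formula can likewise be verified by enumerating row arrangements and identifying rotations; a small Python sketch:

```python
from itertools import permutations
from math import factorial

def circular_arrangements(n):
    """Count seatings of n people at a round table, where
    rotations of the same ordering count as one state."""
    seen = set()
    for p in permutations(range(n)):
        i = p.index(0)
        seen.add(p[i:] + p[:i])  # rotate so person 0 comes first
    return len(seen)

for n in range(2, 7):
    assert circular_arrangements(n) == factorial(n - 1)
print("(n - 1)! confirmed for n = 2..6")
```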

Example 3.6

How many ways can “𝑛” people be seated at a round table such that person A
sits between person B and person C?
Solution 1. There is one state for person A. Then, there are two states for person B to sit on the left or right side of person A. Given that, there is one state for person C. Finally, the other (𝑛 − 3) people can sit in the remaining places in (𝑛 − 3)! states. Therefore, the number of states equals:
1 × 2 × 1 × (𝑛 − 3)! = 2 × (𝑛 − 3)!

Solution 2. People A, B, and C form a three-member group, and the remaining (𝑛 − 3)
people form (𝑛 − 3) one-member groups. Therefore, we have (𝑛 − 2) groups
altogether to be arranged at a round table in (𝑛 − 3)! states. Meanwhile, for the group
including members A, B, and C, there are 2! states for the displacement of people B
and C on both sides of A. That is, there are two states (BAC) and (CAB). Therefore, the
answer is equal to:
(𝑛 − 3)! × 2!

Suppose that we have 𝑛 distinct objects. The number of states of choosing 𝑟 distinct objects from these 𝑛 distinct objects (without considering the order of the choices) is equal to:

$$C_r^n = \binom{n}{r} = \frac{n!}{(n-r)!\,r!}$$

To prove the above equation, it suffices to refer to a similar previous problem with a straightforward answer. The number of states of selecting 𝑟 distinct objects from 𝑛 distinct objects with consideration of their permutations (orders) is $P_r^n = \frac{n!}{(n-r)!}$, every 𝑟! results of which are equivalent to one state of the new problem (selecting objects without consideration of the permutations). For instance, in choosing a three-member group from seven people, every 3! states of the problem with consideration of the order of choices are equivalent to one state of the problem without consideration of the order of choices.

$$\left.\begin{matrix} ABC \\ ACB \\ BAC \\ BCA \\ CAB \\ CBA \end{matrix}\right\} \xrightarrow{\text{are equivalent to}} \{ABC\}$$

(states of the old problem on the left, the single state of the new problem on the right)

Therefore, the number of states in which we can choose three of the seven distinct elements equals:

$$C_3^7 = \binom{7}{3} = \frac{P_3^7}{3!} = \frac{7!}{3!\,4!}$$

Likewise, in general, it can be shown that the number of states of choosing 𝑟 elements from 𝑛 distinct elements is equal to:

$$C_r^n = \frac{P_r^n}{r!} = \frac{n!}{(n-r)!\,r!} = \frac{n(n-1)\cdots(n-(r-1))}{r!} = \binom{n}{r}$$
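In Python, both quantities are built in; a minimal check of the relation C = P / r!:

```python
import math

n, r = 7, 3
print(math.comb(n, r))                       # 35
print(math.perm(n, r) // math.factorial(r))  # 35: C(n, r) = P(n, r) / r!
```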

Example 4.1

Suppose that a class consists of five boys and four girls.
a) How many ways can a group of size 3 be chosen from them?
b) How many ways can a group of size 3 consisting of one girl and two boys be chosen?
c) How many ways can a group of size 3 consisting of at most one boy be chosen?

Solution.
a) The number of states of choosing three of the nine people equals:

$$\binom{9}{3} = \frac{9!}{3!\,6!} = \frac{9 \times 8 \times 7}{3!} = 84$$
b) We define the first trial to be choosing one girl and the second trial to be choosing two boys. There are $\binom{4}{1}$ states for the first trial, and for each of these four states, there are $\binom{5}{2}$ states for the second trial. Therefore, according to the principle of multiplication, the total number of states equals:

$$\binom{4}{1}\binom{5}{2} = 40$$

c) Considering the "at most one boy" restriction, we can choose either no boy and three girls or one boy and two girls. Therefore, according to the principles of multiplication and addition, the total number of states equals:

$$\binom{5}{0}\binom{4}{3} + \binom{5}{1}\binom{4}{2} = 1 \times 4 + 5 \times 6 = 34$$

Example 4.2

How many seven-letter arrangements can be formed using the letters aaabbcc such that no two of the a's are next to each other?
Solution. First, we arrange bbcc, which can be done in $\frac{4!}{2!\,2!}$ states. Then, we select three of the five spaces created around these letters for the a's. Finally, the a's can be placed in the selected spaces in one state (note that displacing the a's among the selected spaces does not create a new state).

_ b _ b _ c _ c _

(each underscore indicates a space in which at most one "a" can be seated)

Hence, the number of states equals:

$$\frac{4!}{2!\,2!} \times \binom{5}{3} \times 1 = 60$$
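A brute-force confirmation of this count, enumerating all distinct arrangements and discarding those with adjacent a's:

```python
from itertools import permutations

# Distinct arrangements of "aaabbcc" with no two a's adjacent.
count = sum(
    1
    for w in set(permutations("aaabbcc"))
    if "aa" not in "".join(w)
)
print(count)  # 60 = 4!/(2!2!) * C(5,3)
```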

Example 4.3

How many seven-letter arrangements can be formed using the letters abcddee such that no two of the letters a, b, and c are next to each other?
Solution. First, we arrange ddee, which can be done in $\frac{4!}{2!\,2!}$ states. Then, we select three of the five spaces created around these letters for a, b, and c and arrange them there in 3! states.

_ d _ d _ e _ e _

Hence, the number of states equals:

$$\frac{4!}{2!\,2!} \binom{5}{3}\, 3! = 360$$

Example 4.4

Consider a set of 𝑛 people. How many possible selections are there to form a committee of size 𝑟 in which one member is designated as the chairperson?
Solution. First, we choose a committee of size 𝑟 and then choose one of its members as the chairperson. The number of states of the first trial equals $\binom{n}{r}$, and for each of those states, we have $\binom{r}{1}$ states for the second trial. Therefore, according to the principle of multiplication, the total number of states equals:

$$\binom{n}{r}\binom{r}{1}$$

Another approach to solving this problem is to first choose the chairperson and then the other (𝑟 − 1) committee members. This provides $\binom{n}{1}$ states for the first trial, and for each of these states, there are $\binom{n-1}{r-1}$ states for the second trial. Therefore, according to the principle of multiplication, the total number of states equals:

$$\binom{n}{1}\binom{n-1}{r-1}$$
As observed in the above example, solving a combinatorial analysis problem in different ways can lead to some interesting combinatorial relations. Some of them are addressed in the next section.

In this section, we introduce some of the combinatorial identities widely used in probability theory and prove them analytically. The first identity is as follows:

$$\binom{n}{r} = \binom{n}{n-r}; \qquad 0 \le r \le n \tag{5.1}$$

To prove it analytically, suppose we have an 𝑛-member set from which we want to select 𝑟 members (left side of the identity). Such a selection can equivalently be made by first choosing the (𝑛 − 𝑟) members of the set to be set aside (right side of the identity) and then regarding the remaining 𝑟 members as the selected ones.
The second combinatorial identity, known as Pascal's identity, is expressed as follows:

$$\binom{n}{r} = \binom{n-1}{r-1} + \binom{n-1}{r}; \qquad 1 \le r \le n \tag{5.2}$$

Consider an 𝑛-member set and suppose that we want to select 𝑟 members from it (left side of the identity). To do so, fix a specific element, say "A", and divide all the possible states into two groups. The first group consists of the states in which member "A" is among the 𝑟 members selected, and the second group consists of the states in which member "A" is not among them (right side of the identity). The number of possible states in which member "A" is selected equals $\binom{1}{1}\binom{n-1}{r-1}$, and the number of possible states in which member "A" is not selected equals $\binom{1}{0}\binom{n-1}{r}$. Hence, the total number of states is equal to:

$$\binom{1}{0}\binom{n-1}{r} + \binom{1}{1}\binom{n-1}{r-1} = \binom{n-1}{r} + \binom{n-1}{r-1}$$
The next identity is a useful relation, which will be used in subsequent chapters, and is expressed as follows:

$$\sum_{r=0}^{k}\binom{m}{r}\binom{n}{k-r} = \binom{m+n}{k} \tag{5.3}$$
To prove the above identity, suppose that we have a set of (𝑚 + 𝑛) elements (for instance, 𝑛 women and 𝑚 men) from which we want to select 𝑘 members (right side of the identity). To do so, first divide the set into an 𝑛-member group and an 𝑚-member group. The selected 𝑘-member group can then consist of 0 members from the 𝑚-member set and 𝑘 members from the 𝑛-member set, or one member from the 𝑚-member set and (𝑘 − 1) members from the 𝑛-member set, and so on. This statement represents the left side of identity (5.3).
Another identity utilized in combinatorial analysis is identity (5.4), expressed as follows:

$$\sum_{i=0}^{n}\binom{n}{i}\binom{m}{i} = \sum_{i=0}^{m}\binom{n}{i}\binom{m}{i} = \binom{m+n}{n} = \binom{m+n}{m} \tag{5.4}$$

To prove it, using identities (5.1) and (5.3), we have:

$$\sum_{i=0}^{n}\binom{n}{i}\binom{m}{i} = \sum_{i=0}^{n}\binom{n}{n-i}\binom{m}{i} = \binom{m+n}{n}$$

To understand the above identity better, suppose that we want to form a subset of a committee consisting of four women and five men such that the number of women equals the number of men (this subset can contain zero members). Under these conditions, the number of possible states equals:

$$\binom{4}{0}\binom{5}{0} + \binom{4}{1}\binom{5}{1} + \cdots + \binom{4}{4}\binom{5}{4} = \sum_{i=0}^{4}\binom{4}{i}\binom{5}{i} = \sum_{i=0}^{4}\binom{4}{4-i}\binom{5}{i} = \binom{4+5}{4} = \binom{9}{4}$$

Note that if the upper bound 4 of the summation above is replaced by 5, the answer is not affected, because the term $\binom{4}{5}\binom{5}{5}$ is equal to zero.
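The worked example above can be checked in one line of Python:

```python
import math

# Committees from 4 women and 5 men with equally many of each.
total = sum(math.comb(4, i) * math.comb(5, i) for i in range(5))
print(total, math.comb(9, 4))  # 126 126
```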
In identity (5.4), if we set 𝑚 equal to 𝑛, the following identity is obtained:

$$\sum_{i=0}^{n}\binom{n}{i}^2 = \binom{2n}{n} \tag{5.5}$$

Another important identity that is commonly used in combinatorial analysis and probability theory is the Binomial Expansion, expressed as follows:

$$(x_1 + x_2)^n = \sum_{i=0}^{n}\binom{n}{i}\, x_1^i x_2^{n-i} = \sum_{i=0}^{n}\frac{n!}{i!\,(n-i)!}\, x_1^i x_2^{n-i}$$

If we write $(x_1+x_2)^n$ as $(x_1+x_2)(x_1+x_2)\cdots(x_1+x_2)$ and expand it, the resulting expansion contains $2^n$ terms. For instance, suppose that we write $(x_1+x_2)^3$ as $(x_1+x_2)(x_1+x_2)(x_1+x_2)$. Then the expansion is:

$$(x_1+x_2)^3 = x_1x_1x_1 + x_1x_1x_2 + x_1x_2x_1 + x_1x_2x_2 + x_2x_1x_1 + x_2x_1x_2 + x_2x_2x_1 + x_2x_2x_2$$

It is seen that the expansion of $(x_1+x_2)^3$ contains $2^3$ terms, some of which are equal to each other. Hence, the expression above can be written as:

$$x_1^3x_2^0 + 3x_1^2x_2^1 + 3x_1^1x_2^2 + x_1^0x_2^3$$
As seen, the coefficient of each term such as $x_1^1x_2^2$ equals the number of times that this term is repeated among the $2^3$ possible terms. For example, the coefficient of the term $x_1^1x_2^2$ equals 3 because, after expanding $(x_1+x_2)^3$, the term $x_1^1x_2^2$ appears three times, as $x_1x_2x_2$, $x_2x_1x_2$, and $x_2x_2x_1$. Since each of these three terms is one arrangement of the letters $x_1$, $x_2$, and $x_2$, their number of states, based on the matters expressed in Section 1.3.3, is equal to:

$$\frac{3!}{1!\,2!} = \binom{3}{1}$$

Likewise, it can be shown that if we expand $(x_1+x_2)^n$, some of the terms will be equal to each other. As a result, the coefficient of each term such as $x_1^i x_2^{n-i}$ is its number of repetitions, that is:

$$\frac{n!}{i!\,(n-i)!} = \binom{n}{i}$$

Example 5.1

What is the coefficient of $x_1^7 x_2^3$ in the expansion of $(x_1+x_2)^{10}$?
Solution. The complete expansion of $(x_1+x_2)^{10}$ is as follows:

$$(x_1+x_2)^{10} = \sum_{i=0}^{10}\binom{10}{i}\, x_1^i x_2^{10-i}$$

Hence, the coefficient of $x_1^7 x_2^3$ in this expansion equals $\binom{10}{7}$.

Example 5.2

What is the coefficient of $x_1^7 x_2^3$ in the expansion of $(2x_1+3x_2)^{10}$?
Solution. The complete expansion of $(2x_1+3x_2)^{10}$ is as follows:

$$(2x_1+3x_2)^{10} = \sum_{i=0}^{10}\binom{10}{i}(2x_1)^i(3x_2)^{10-i} = \sum_{i=0}^{10}\binom{10}{i}\, 2^i\, 3^{10-i}\, x_1^i x_2^{10-i}$$

Consequently, the coefficient of $x_1^7 x_2^3$ in this expansion equals $\binom{10}{7}\, 2^7\, 3^3$.

Example 5.3

What is the coefficient of $x_1^7$ in the expansion of $(2x_1+3)^{10}$?
Solution. The complete expansion of $(2x_1+3)^{10}$ is as follows:

$$(2x_1+3)^{10} = \sum_{i=0}^{10}\binom{10}{i}(2x_1)^i\, 3^{10-i} = \sum_{i=0}^{10}\binom{10}{i}\, 2^i\, 3^{10-i}\, x_1^i$$

Therefore, the coefficient of $x_1^7$ in this expansion equals $\binom{10}{7}\, 2^7\, 3^3$.
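Such coefficients can also be recovered mechanically by expanding the polynomial; the sketch below multiplies (2x + 3) by itself ten times and compares the x^7 coefficient with the closed form:

```python
import math

poly = [1]  # coefficients of the expansion, lowest degree first
for _ in range(10):
    nxt = [0] * (len(poly) + 1)
    for i, c in enumerate(poly):
        nxt[i] += 3 * c      # contribution of the constant term 3
        nxt[i + 1] += 2 * c  # contribution of the term 2x
    poly = nxt

print(poly[7])                         # coefficient of x^7: 414720
print(math.comb(10, 7) * 2**7 * 3**3)  # 414720 as well
```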
In the binomial expansion identity, if we substitute the value 1 for both $x_1$ and $x_2$, the following result is obtained:

$$\sum_{i=0}^{n}\binom{n}{i} = 2^n \tag{5.6}$$

To prove the above identity analytically, suppose that we want to determine the number of subsets of an 𝑛-member set. On the left side of the identity, the numbers of subsets containing zero members, one member, two members, …, and 𝑛 members are calculated separately and then added together. For the right side of the identity, note that to form a subset of an 𝑛-member set, each member of the main set can be either present or absent in the subset (two states per member). Therefore, the number of subsets that can be formed from an 𝑛-member set equals $2 \times 2 \times \cdots \times 2 = 2^n$.
Furthermore, in the binomial expansion identity, if we substitute −1 for $x_1$ and 1 for $x_2$, the following result is obtained:

$$(-1+1)^n = \sum_{i=0}^{n}\binom{n}{i}(-1)^i\, 1^{n-i} = \sum_{i=0}^{n}\binom{n}{i}(-1)^i = 0 \;\Rightarrow\; \binom{n}{0} - \binom{n}{1} + \binom{n}{2} - \binom{n}{3} + \cdots = 0$$

$$\Rightarrow \sum_{\text{odd } i}\binom{n}{i} = \sum_{\text{even } i}\binom{n}{i}$$

In addition, we know that:

$$\sum_{\text{odd } i}\binom{n}{i} + \sum_{\text{even } i}\binom{n}{i} = \sum_{i=0}^{n}\binom{n}{i} = 2^n$$

Therefore, we have:

$$\sum_{\text{odd } i}\binom{n}{i} = \sum_{\text{even } i}\binom{n}{i} = 2^{n-1} \tag{5.7}$$

The generalized form of the binomial expansion is called the Multinomial Expansion, which is presented as follows:

$$(x_1 + x_2 + \cdots + x_r)^n = \sum_{n_1+\cdots+n_r=n}\binom{n}{n_1,\ldots,n_r}\, x_1^{n_1} x_2^{n_2}\cdots x_r^{n_r} = \sum_{n_1+\cdots+n_r=n}\frac{n!}{n_1!\cdots n_r!}\, x_1^{n_1} x_2^{n_2}\cdots x_r^{n_r} \tag{5.8}$$

To prove the above identity, it suffices to expand it in the same manner as the binomial expansion. For instance, consider the trinomial expansion $(x_1+x_2+x_3)^7$:

$$(x_1+x_2+x_3)^7 = (x_1+x_2+x_3)(x_1+x_2+x_3)\cdots(x_1+x_2+x_3) = x_1x_1x_1x_1x_1x_1x_1 + x_1x_1x_1x_1x_1x_1x_2 + x_1x_1x_1x_1x_1x_2x_1 + \cdots + x_3x_3x_3x_3x_3x_3x_3$$

As mentioned for the binomial expansion, some of the terms are equal to each other after expanding $(x_1+x_2+x_3)^7$. Furthermore, the coefficient of each term, such as $x_1^3x_2^2x_3^2$, is equal to the number of times that the term is repeated. For instance, the coefficient of the term $x_1^3x_2^2x_3^2$ is equal to $\frac{7!}{3!\,2!\,2!} = \binom{7}{3,2,2}$ because, after expanding $(x_1+x_2+x_3)^7$, the term $x_1^3x_2^2x_3^2$ is repeated $\frac{7!}{3!\,2!\,2!} = \binom{7}{3,2,2}$ times; some of these repetitions are $x_1x_1x_1x_2x_2x_3x_3$, $x_1x_1x_2x_1x_2x_3x_3$, and $x_1x_1x_2x_2x_1x_3x_3$. Each of these terms is equivalent to one arrangement of the letters $x_1x_1x_1x_2x_2x_3x_3$, which can be arranged in $\frac{7!}{3!\,2!\,2!} = \binom{7}{3,2,2}$ states.

Likewise, it can be shown that if we expand the term $(x_1 + x_2 + \cdots + x_r)^n$, some
of the terms are equal to each other. As a result, the coefficient of each term, such as
$x_1^{n_1} x_2^{n_2} \cdots x_r^{n_r}$, is equal to its number of repetitions, which is
$\frac{n!}{n_1! \cdots n_r!} = \binom{n}{n_1, n_2, \ldots, n_r}$.

In the multinomial expansion, if the coefficient of each $x_i$ equals 1, we call it the
simple multinomial expansion. To obtain the sum of the coefficients in any
multinomial expansion, we should substitute the value 1 for the $x_i$'s. Therefore, the
sum of the coefficients in the simple multinomial expansion equals $r^n$:

$$\sum_{n_1 + n_2 + \cdots + n_r = n} \binom{n}{n_1, \ldots, n_r} = \sum_{n_1 + n_2 + \cdots + n_r = n} \frac{n!}{n_1! \cdots n_r!} = r^n$$

Example 5.4

Obtain the coefficient of the term $x_1^4 x_2^6$ in the expansion of $(3x_1^2 + 2x_2^3 + 2)^{10}$.
Solution. The complete expansion of the term $(3x_1^2 + 2x_2^3 + 2)^{10}$ is as follows:

$$(3x_1^2 + 2x_2^3 + 2)^{10} = \sum_{n_1 + n_2 + n_3 = 10} \frac{10!}{n_1!\,n_2!\,n_3!} (3x_1^2)^{n_1} (2x_2^3)^{n_2} (2)^{n_3} = \sum_{n_1 + n_2 + n_3 = 10} \frac{10!}{n_1!\,n_2!\,n_3!} 3^{n_1} 2^{n_2} 2^{n_3} x_1^{2n_1} x_2^{3n_2}$$

The term $x_1^4 x_2^6$ arises only when $2n_1 = 4$ and $3n_2 = 6$, that is, when $n_1 = 2$, $n_2 = 2$, and $n_3 = 6$. Therefore, the coefficient of the term $x_1^4 x_2^6$ in this expansion equals:

$$\frac{10!}{2!\,2!\,6!}\, 3^2\, 2^2\, 2^6$$
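Example 5.4 can also be cross-checked symbolically. The sketch below (our illustration) assumes the third-party sympy package is available; the names by_formula and by_expansion are hypothetical:

```python
from math import factorial
import sympy as sp

x1, x2 = sp.symbols('x1 x2')
expansion = sp.expand((3*x1**2 + 2*x2**3 + 2)**10)

# x1^4 forces n1 = 2, x2^6 forces n2 = 2, and hence n3 = 6.
by_formula = (factorial(10) // (factorial(2) * factorial(2) * factorial(6))
              * 3**2 * 2**2 * 2**6)
by_expansion = expansion.coeff(x1, 4).coeff(x2, 6)

assert by_expansion == by_formula
print(by_formula)  # 2903040
```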

In some combinatorial analysis problems, the goal is to count the number of states
of putting some objects (balls) into some containers (cells or urns). In this type of
problem, the term “indistinguishable objects” means that only the number of each
container's objects matters. Moreover, swapping an object in one container with an
object in another container does not create a new state (the values of the objects
are the same). On the contrary, the term “distinguishable objects” means that both
the number of objects and their values in each container are important. In other
words, if the values of the objects are assumed to be different, swapping an object
in one container with an object in another container creates a new state.

[Figure: five indistinguishable balls (x x x x x) versus five distinguishable balls (x1 x2 x3 x4 x5) in urns. If the objects are assumed to be the same, or indistinguishable, the displacement of an object inside one urn with an object inside another urn does not create a new state. If the objects are assumed to be different, or distinguishable, such a displacement creates a new state.]

Furthermore, the term “indistinguishable containers” means that only the groupmates
of each object matter. Nonetheless, the term “distinguishable containers” means that,
in addition to the groupmates, the specific urn of each object is important as well.

[Figure: 4 people in 2 two-member rooms; different rooms versus the same rooms.]

In this book, when distributing the people into physical places (such as a class,
a room, and an avenue, to name but a few), we consider individuals to be different
unless otherwise stated in the problem (For instance, it is explicitly stated in the
problem that the value of people is the same or only the number of individuals lying
in each place is important.). Furthermore, we consider physical places to be distinct
urns unless otherwise stated in the problem (For instance, it is explicitly stated in the
problem that only the number of each person's groupmates matters, not their place.).
If we want to group the people, we consider the groups to be identical urns, unless
otherwise stated in the problem.
Now, we investigate some well-known cases in the ball and urn model. Note
that the classification scheme of this book is merely a suggestion for classifying the
ball and urn model's problems, which is not necessarily adopted in all reference
books.

Type 1: Putting distinguishable objects into distinguishable urns such that the
number of objects in each urn is determinate.

Suppose that we want to distribute “𝑛” distinguishable objects into “𝑟”


distinguishable urns in a way that 𝑛1 objects are put into the first urn, 𝑛2 objects
into the second urn, ... , and 𝑛𝑟 objects into the 𝑟 𝑡ℎ urn such that ∑𝑟𝑖= 1 𝑛𝑖 = 𝑛. There
𝑛
are (𝑛 ) states for choosing the objects of the first urn, and given each state of
1
𝑛 − 𝑛1
choosing the objects of the first urn, there are ( 𝑛 ) states for choosing the
2
objects of the second urn, and so on. Therefore, the total number of states is equal
to:
𝑛 𝑛 − 𝑛1 𝑛 − 𝑛1 − 𝑛2 − ⋯ − 𝑛𝑟−1
(𝑛 ) ( 𝑛 ) … ( 𝑛𝑟 )
1 2
𝑛! (𝑛 − 𝑛1 )! (𝑛 − 𝑛1 − ⋯ − 𝑛𝑟−1 )!
= × × ⋯×
𝑛1 ! × (𝑛 − 𝑛1 )! 𝑛2 ! × (𝑛 − 𝑛1 − 𝑛2 )! 𝑛𝑟 ! × 0!
𝑛! 𝑛
= = (𝑛 , 𝑛 , … , 𝑛 )
𝑛1 ! × 𝑛2 ! × ⋯ × 𝑛𝑟 ! 1 2 𝑟

Hence, the number of possible states to distribute “𝑛” distinguishable objects


into “𝑟” distinguishable urns such that 𝑛1 objects are put into the first urn, 𝑛2 objects
into the second urn, ... , and 𝑛𝑟 objects into the 𝑟 𝑡ℎ urn is equal to:

24 | P a g e
𝑛! 𝑛
= (𝑛 , 𝑛 , … , 𝑛 )
𝑛1 ! × 𝑛2 ! × ⋯ × 𝑛𝑟 ! 1 2 𝑟

This book entitles this type of problem as the "Type 1" of ball and urn problems.
In such problems, it is essential to note the properties of the given problem.
Consequently, note that the formula of type 1 belongs to the problems possessing
properties as follows:
a. Objects are distinguishable (“𝑛” distinguishable objects).
b. Urns are distinguishable (“𝑟” distinguishable urns).
c. It is known precisely which urn receives $n_1$ objects, which urn
receives $n_2$ objects, …, and which urn receives $n_r$ objects.

Example 6.1

How many ways can seven people be distributed into three rooms numbered
from 1 to 3 such that three people are in room 1, two people in room 2, and two people
in room 3?
Solution. If we define trials 1, 2, and 3 to be the selection of the people who are in
rooms 1, 2, and 3 respectively, there are $\binom{7}{3}$ states for the first trial. For each of
those states, there are $\binom{4}{2}$ states for the second trial, and for each state of the
first and second trials, there is $\binom{2}{2}$ state for the third trial. Hence, the number of
possible states is obtained as:

$$\binom{7}{3} \binom{4}{2} \binom{2}{2} = \frac{7!}{3!\,2!\,2!} = 210$$

Meanwhile, the reader should note that the order of trials can also occur in
other ways. For instance, we can first determine the people of room 2, then room 1,
and finally room 3. Therefore, the number of states for this case is equal to:

$$\binom{7}{2} \binom{5}{3} \binom{2}{2} = 210$$
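Since the numbers here are small, the type 1 count of Example 6.1 can also be confirmed by brute force. A minimal Python sketch (ours):

```python
from itertools import product

# Brute force for Example 6.1: assign each of 7 people to one of the
# rooms 1-3 and keep assignments with occupancies (3, 2, 2).
count = 0
for rooms in product((1, 2, 3), repeat=7):
    if rooms.count(1) == 3 and rooms.count(2) == 2 and rooms.count(3) == 2:
        count += 1
print(count)  # 210, matching 7!/(3! 2! 2!)
```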
Type 2: Putting distinguishable objects into distinguishable urns such that one
urn receives 𝑛1 objects, one urn receives 𝑛2 objects, and so on (it is not known
which urn receives 𝑛1 objects, which urn receives 𝑛2 objects, and so on.).

Suppose that we want to distribute a set consisting of “$n$” distinguishable
objects into “$r$” distinguishable urns in a way that $n_1$ objects are put into one urn,
$n_2$ objects into another urn, ..., and $n_r$ objects into yet another urn such that
$\sum_{i=1}^{r} n_i = n$. Moreover, it is not known exactly which urn has $n_1$ objects, which
urn has $n_2$ objects, …, and which urn has $n_r$ objects.
This book entitles this type of problem as the "Type 2" of the ball and urn
problems. To solve it, we first determine which urn gets 𝑛1 objects, which urn gets
𝑛2 objects, …, and which urn gets 𝑛𝑟 objects. Then, we confront a type 1 problem
because the number of objects in each urn is then known. In other words, performing a few
more trials converts the type 2 problem into a type 1 problem. Properties of the type 2
problem are as follows:
a. Objects are distinguishable.
b. Urns are distinguishable.
c. One urn receives $n_1$ objects, one urn receives $n_2$ objects, …, and one urn
receives $n_r$ objects, but it is not known precisely which urn gets $n_1$
objects, which urn gets $n_2$ objects, …, and which urn gets $n_r$ objects.

Example 6.2

How many ways can seven people be distributed into 3 rooms such that 2 or 3
people are in each room?
Solution. The number of states for this problem can be written as the sum of the
following three states:

The first state: 3, 2, and 2 people are in rooms 1, 2, and 3 respectively. The
number of states is equal to $\binom{7}{3}\binom{4}{2}\binom{2}{2} = \frac{7!}{3!\,2!\,2!} = 210$.
The second state: 2, 3, and 2 people are in rooms 1, 2, and 3 respectively. The
number of states is equal to $\binom{7}{2}\binom{5}{3}\binom{2}{2} = \frac{7!}{2!\,3!\,2!} = 210$.
The third state: 2, 2, and 3 people are in rooms 1, 2, and 3 respectively. The
number of states is equal to $\binom{7}{2}\binom{5}{2}\binom{3}{3} = \frac{7!}{2!\,2!\,3!} = 210$.
Therefore, the number of states for this problem equals the sum of the above
three states. That is, $3 \times \frac{7!}{3!\,2!\,2!} = 630$.
In fact, since the problem does not specify the two- and three-member rooms,
we should first clarify this issue. For this purpose, it suffices to choose the three-
member room, which is possible in $\binom{3}{1}$ states. After that, similar to the type 1
problem, the number of people in each room is specified, and there are
$\binom{7}{3}\binom{4}{2}\binom{2}{2} = \frac{7!}{3!\,2!\,2!}$ ways to determine the people of the rooms.
Consequently, the number of states for this problem is equal to $\binom{3}{1}\frac{7!}{3!\,2!\,2!} = 630$.

Example 6.3

How many ways can seven people be distributed into 4 two-member rooms?
Solution. One of the rooms is occupied by one person, and each of the other three
rooms is occupied by two people. However, the one-member and two-member
rooms are not specified. Therefore, first, we should determine which room is one-
member. Then, the number of people in each room is specified. Furthermore, we
should select the individuals in the same way as the type 1 problem. Hence, the
number of states is equal to:

$$\binom{4}{1}\binom{7}{2}\binom{5}{2}\binom{3}{2}\binom{1}{1} = \binom{4}{1}\frac{7!}{1!\,2!\,2!\,2!}$$

Note that when we determine the one-member room, the number of two-
member rooms is obtained automatically. To specify the number of individuals in the
rooms, instead of determining the one-member room, the two-member rooms can
be selected, which has $\binom{4}{3}$ states. Therefore, the number of states of this problem
can be written as follows:

$$\binom{4}{3}\frac{7!}{2!\,2!\,2!\,1!}$$
In general, the number of possible states for distributing “$n$” distinguishable
objects into “$r$” distinguishable urns in a way that one urn gets $n_1$ objects, …, and one
urn gets $n_r$ objects, such that it is not known which urn gets $n_1$ objects, which urn
gets $n_2$ objects, and so on, is equal to:

$$(\text{the number of states to specify how many objects are in each urn}) \times \frac{n!}{n_1!\,n_2!\,\ldots\,n_r!}$$

where $n_1 + n_2 + n_3 + \cdots + n_r = n$.

Example 6.4

How many ways can seven people be distributed into 4 two-member rooms
such that two particular people are supposed to be in a room?
Solution. First, we select the room of the two particular people, which is possible in
$\binom{4}{1}$ states. Then, one of the remaining three rooms is occupied by one person,
and each of the other two rooms is occupied by two people. Therefore, we confront
the type 2 problem. In such situations, we select one of the remaining three rooms
as the one-member room, and the other two rooms will automatically be two-member
rooms. Now, we choose the people of each room. Therefore, the number of states
equals:

$$\binom{4}{1}\binom{3}{1}\frac{5!}{2!\,2!\,1!}$$

Example 6.5

How many ways can ten prizes be distributed among three people such that
one person receives five prizes, another person receives three prizes, and the other
person receives two prizes?
Solution. First, we determine the person who receives five prizes (the first trial),
the one who receives three prizes (the second trial), and the individual who receives
two prizes (the third trial). Then, we distribute the prizes (objects) among the
individuals. Therefore, the number of states is equal to:

$$\binom{3}{1}\binom{2}{1}\binom{1}{1}\frac{10!}{5!\,3!\,2!}$$

Example 6.6

How many ways can seven people be distributed among 3 three-member


rooms?
Solution. There are two general states for this problem:
The first state: one room is occupied by one person, and each of the other two
rooms is occupied by three individuals, which is the type 2 problem. Hence, the
number of states in this case is equal to:
$$\binom{3}{1}\frac{7!}{1!\,3!\,3!} = 420$$

The second state: one room is occupied by three individuals, and each of the
other two rooms is occupied by two individuals, which is the type 2 problem. Hence,
the number of states in this case is:

$$\binom{3}{1}\frac{7!}{3!\,2!\,2!} = 630$$

Therefore, the number of states of this problem is equal to:

$$\binom{3}{1}\frac{7!}{1!\,3!\,3!} + \binom{3}{1}\frac{7!}{3!\,2!\,2!} = 420 + 630 = 1050$$
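Likewise, the total of Example 6.6 can be confirmed by enumerating all $3^7$ room assignments. A minimal Python sketch (ours):

```python
from itertools import product

# Brute force for Example 6.6: 7 people into 3 rooms, each of capacity 3.
count = sum(
    1
    for rooms in product((0, 1, 2), repeat=7)
    if all(rooms.count(r) <= 3 for r in (0, 1, 2))
)
print(count)  # 1050
```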

Type 3: Putting distinguishable objects into indistinguishable (identical) urns


such that the number of objects in each urn is a specific value.

Suppose that we want to distribute “$n$” distinguishable objects into “$r$”
indistinguishable urns in a way that one urn gets $n_1$ objects, one urn gets $n_2$
objects, ..., and one urn gets $n_r$ objects such that $\sum_{i=1}^{r} n_i = n$. This book entitles
this type of problem as the "Type 3" of the ball and urn problems, the properties of
which are as follows:
a. Objects are distinguishable.
b. Urns are indistinguishable.
c. One urn receives 𝑛1 objects, one urn receives 𝑛2 objects, …, and one urn
receives 𝑛𝑟 objects.

Example 6.7

Suppose that we want to position four people in two indistinguishable two-


member rooms. How many ways are there to do so?
Solution. The difference between this example and the type 1 and 2 problems
discussed in this book is that, herein, the rooms are indistinguishable. In other words,
only each person's groupmates matter, not his or her room. If it is supposed to
position four people in two distinguishable two-member rooms numbered 101 and
102, we are confronted with an old problem belonging to type 1, with
$\binom{4}{2}\binom{2}{2} = \frac{4!}{2!\,2!}$ states. Now, if we regard Example 6.7 as a new problem, it can be shown as
follows that every 2! states of the old problem are equivalent to one state of the new
problem. This is related to the creation of a new state as a result of the displacement

of groups in distinct urns. However, this is not true for the case of indistinguishable
urns.
[Figure: 4 people in 2 two-member rooms, distinguishable (Room 101, Room 102) versus indistinguishable. Every 2! states of the left problem are equivalent to one state of the right problem.]

Therefore, the number of states in the old problem is 2! times the number of states
in the new problem. To show the number of states in the new problem, we have:

$$\text{the number of states in the new problem (indistinguishable urns)} = \text{the number of states in the old problem (distinguishable urns)} \times \frac{1}{2!} = \frac{4!}{2!\,2!} \times \frac{1}{2!}$$

Example 6.8

How many ways can nine people be distributed into 3 three-member rooms
such that:
a. Rooms are different and numbered from 101 to 103?
b. Rooms are identical?

Solution.
a. If it is supposed to position nine people in three different three-member
rooms, we confront the type 1 problem, and the number of states equals:

$$\binom{9}{3}\binom{6}{3}\binom{3}{3} = \frac{9!}{3!\,3!\,3!}$$

b. In this problem, rooms are indistinguishable, and only each person's
groupmates matter. If we regard section (b) of this example as a new
problem and its section (a) as an old problem, it can be shown as follows
that every 3! states of the old problem are equivalent to one state of the
new problem:

[Figure: 9 people in 3 three-member rooms, distinguishable rooms (Room 101, Room 102, Room 103) versus indistinguishable rooms. Every 3! states of the left problem equal one state of the right problem.]

Note that the above figure shows only 6 states out of the total number of states
in section (a) and one state out of the total number of states in section (b); it is
presented to better illustrate the relationship between sections (a) and (b).
Hence, the number of states in the old problem is 3! times the number of states in
the new problem. To show the number of states in the new problem, we have:

$$\text{the number of states in the new problem (identical urns)} = \text{the number of states in the old problem (distinguishable urns)} \times \frac{1}{3!} = \frac{9!}{3!\,3!\,3!} \times \frac{1}{3!}$$

Example 6.9

Suppose that we want to group seven people. Obtain the number of possible
states in the following conditions:
a. If we have three different rooms numbered 1 through 3 such that
room 1 is occupied by three people, room 2 is occupied by two
people, and room 3 is occupied by two people.
b. If we have three different rooms numbered 1 through 3 such that one
room is occupied by three people, and each of the other two rooms is
occupied by two people.
c. If we have three identical rooms such that one room is occupied by
three people, and each of the other two rooms is occupied by two
people.

Solution.
a. In Section (a) of this example, since the number of people in each room
is specified, we face the type 1 problem. The number of desired states
is:

$$\binom{7}{3}\binom{4}{2}\binom{2}{2} = \frac{7!}{3!\,2!\,2!} = 210$$

b. In Section (b) of this example, since the rooms are distinguishable and it
is not specified which room is the three-member one, we should first
select it, which indicates the type 2 problem. Therefore, its number of
states is equal to:

$$\binom{3}{1}\binom{7}{3}\binom{4}{2}\binom{2}{2} = 3 \times \frac{7!}{3!\,2!\,2!} = 630$$

c. In this problem, the rooms are indistinguishable, and only the number
of each person's groupmates matters. If we regard Section (c) of this
example as a new problem, Section (a) or (b) can be used to solve it. It
can be shown as follows that if we regard the Section (a) as an old

problem, every 2! states of the old problem are equivalent to one state
of the new problem. However, if we regard Section (b) as an old
problem, every 3! states of the old problem are equivalent to one state
of the new problem.

[Figure: selected states from section (a) (Type 1), section (b) (Type 2), and section (c) (Type 3), with rooms 1 through 3.]

The above figure shows only some states of sections (a), (b), and (c); it is
presented to better illustrate the relationship between this example's sections. Also,
in the figure above, with regard to the relationship between sections (a) and (c), the
reader should note that the displacement of the two two-member groups between
rooms 2 and 3 creates a new state in Section (a), but it does not lead to a new state
in Section (c), where the rooms are assumed to be indistinguishable. Furthermore,
comparing sections (b) and (c), the displacement of all groups among different
rooms creates a new state in Section (b), but it does not result in a new state in
Section (c), where the rooms are assumed to be indistinguishable.
Therefore, to solve Section (c), if we regard Section (a) as an old problem, the
number of states in the old problem is 2! times that of the states in the new problem.
To show the number of states in the new problem, we have:
$$\text{the number of states in section (c)} = \text{the number of states in section (a)} \times \frac{1}{2!} = \frac{7!}{3!\,2!\,2!} \times \frac{1}{2!} = 105$$

Besides, to solve Section (c), if we regard Section (b) as an old problem, the
number of states in the old problem is 3! times the number of states in the new
problem. To show the number of states in the new problem, we have:

$$\text{the number of states in section (c)} = \text{the number of states in section (b)} \times \frac{1}{3!} = \binom{3}{1}\frac{7!}{3!\,2!\,2!} \times \frac{1}{3!} = 105$$

Consequently, to solve Section (c), where the urns are indistinguishable, two
methods are presented. We usually use the first method in this book, and its general
structure is as follows:
The number of states to put “𝑛” distinguishable objects into “𝑟”
indistinguishable urns such that one urn receives 𝑛1 objects, one urn receives 𝑛2
objects, …, and one urn receives 𝑛𝑟 objects is equal to:
$$\frac{n!}{n_1!\,n_2!\,\ldots\,n_r!} \times \frac{1}{k_1!\,k_2!\,\cdots}$$

where each $k_i$ is the number of urns containing the same number of objects. For
example, consider two urns, each with three objects, and another two urns, each
with two objects. Then, we have $k_1 = 2$ and $k_2 = 2$, meaning that the number of
possible states to distribute ten people among four groups, where two of them are
two-member and the other two groups are three-member, is equal to
$\frac{10!}{2!\,2!\,3!\,3!} \times \frac{1}{2!\,2!}$.
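As a numerical illustration of this formula, the following minimal Python sketch (ours; the names are hypothetical) evaluates the ten-person example above and cross-checks the unlabeled-urn correction on a small case by enumerating set partitions directly:

```python
from itertools import combinations
from math import factorial

# The example from the text: 10 people into four unlabeled groups of
# sizes 2, 2, 3, 3, computed by the type 3 formula.
formula = (factorial(10) // (factorial(2)**2 * factorial(3)**2)
           // (factorial(2) * factorial(2)))
print(formula)  # 6300

# Sanity check on a tiny case: 4 people into two unlabeled pairs.
# Representing a partition as a set of frozensets means that swapping
# the two groups does not create a new state.
people = (0, 1, 2, 3)
partitions = {
    frozenset((frozenset(pair), frozenset(set(people) - set(pair))))
    for pair in combinations(people, 2)
}
print(len(partitions))  # 3 = 4!/(2! 2!) * 1/2!
```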

Type 4: Putting distinguishable objects into distinguishable urns such that


there is no restriction on the number of objects in the urns.

Suppose that we want to determine the number of states for distributing “𝑛”
distinguishable objects into “𝑟” distinguishable urns without any restriction on the
number of objects in each urn. The first object has “$r$” states to be put into the urns.
For each of the results of the first trial, there are “$r$” states for the second object, …,
and for each result of the previous trials, there are “$r$” states for the $n^{th}$ object. Hence,
according to the principle of multiplication, it can be done in $r^n$ states. In this book,
we entitle this problem as the "Type 4" with the following properties:
a. Objects are distinguishable.
b. Urns are distinguishable.
c. Without any restriction on the number of objects in each urn.

Note that the type 4 problem can be written as a sum of type 1 states. For
example, suppose that we want to distribute $n = 3$ distinguishable objects into $r = 2$
distinguishable urns. The number of states for this problem is $r^n = 2^3$. Furthermore,
we can write the number of states for this problem as the sum of some type 1
problems as follows:

Urn 1    Urn 2    The number of states
  3        0      3!/(3! 0!) = 1
  2        1      3!/(2! 1!) = 3
  1        2      3!/(1! 2!) = 3
  0        3      3!/(0! 3!) = 1

The total number of states $= r^n = 2^3 = 8$
Therefore, in general, the number of states to put “$n$” distinguishable objects
into “$r$” distinguishable urns without any restriction is equal to:

$$r^n = \sum_{n_1 + n_2 + \cdots + n_r = n} \binom{n}{n_1, \ldots, n_r} = \sum_{n_1 + n_2 + \cdots + n_r = n} \frac{n!}{n_1! \cdots n_r!}$$

A close look reveals that the number of states of the type 4 problem is equal to
the sum of the coefficients of the simple multinomial expansion. To show this
relationship, it suffices to let the $x_i$'s equal 1:

$$(x_1 + x_2 + \cdots + x_r)^n = \sum_{n_1 + \cdots + n_r = n} \binom{n}{n_1, \ldots, n_r} x_1^{n_1} x_2^{n_2} \cdots x_r^{n_r} = \sum_{n_1 + \cdots + n_r = n} \frac{n!}{n_1! \cdots n_r!} x_1^{n_1} x_2^{n_2} \cdots x_r^{n_r}$$

Example 6.10

How many ways can ten officers be placed in three important avenues?
Solution. As mentioned before, since there are three possible states for each officer,
the total number of states equals $r^n = 3^{10}$, which can be written as the sum of the
type 1 states.
Therefore, the answer to this problem is:

$$\sum_{n_1 + n_2 + n_3 = 10} \frac{10!}{n_1!\,n_2!\,n_3!} = 3^{10}$$

Example 6.11

How many ways can ten officers be placed in three important avenues such
that the first avenue contains exactly three officers?
Solution. We first select three officers and place them in the first avenue. Then, we
send the remaining seven officers to the other two avenues without any restriction.
Hence, the answer equals:
$$\binom{10}{3} \times 2^7 = 120 \times 2^7$$
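The count in Example 6.11 can be confirmed by brute force as well. A minimal Python sketch (ours):

```python
from itertools import product
from math import comb

# Brute force for Example 6.11: 10 officers, 3 avenues, exactly 3
# officers in the first avenue (labeled 0 here).
count = sum(
    1
    for avenues in product((0, 1, 2), repeat=10)
    if avenues.count(0) == 3
)
assert count == comb(10, 3) * 2**7
print(count)  # 15360
```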

Type 5: Putting distinguishable objects into distinguishable urns such that


each urn receives at least one object.

Suppose that we want to distribute “𝑛” distinguishable objects into “𝑟”


distinguishable urns in a way that each urn receives at least one object. In this book,
we entitle this type of problem as the "Type 5" with the following properties:
a. Objects are distinguishable.
b. Urns are distinguishable.
c. There should be at least one object in each urn.

Example 6.12

How many ways can ten officers be placed in three avenues such that each
avenue has at least one officer?
One possible solution is to initially choose 3 out of the 10 officers and position
one of them in each of the three avenues. This is possible in $\binom{10}{3}\,3!$ states. Then,
according to the type 4 problem, the rest of the officers can be positioned in the
avenues in $3^7$ states. Therefore, the total number of states of this example becomes
$\binom{10}{3}\,3! \times 3^7$ states.
But the above solution is wrong!
The reason the solution is wrong is that some states are counted
multiple times instead of once. That is, if we first choose the three officers numbered 1
through 3 and position them in avenues 1 through 3 respectively, and then
position the others in avenue 1, we have counted this state once.

Avenue (1)                 Avenue (2)    Avenue (3)
1, 4, 5, 6, 7, 8, 9, 10    2             3

However, if we first choose three officers numbered 4, 2, and 3 and position


them in avenues 1, 2, and 3 respectively, and then send the remaining
seven officers to avenue 1, we have counted this state in our calculation as well.
Hence, this result is the same as the previous one, and we have counted it twice
without any justification. Also, note that there are more repetitive states in this
solution that are not listed here. To be more specific, the number of states produced
by this specious solution exceeds the total number of states. For example, the
number of states of this example resulting from the false solution is equal to
$\binom{10}{3}\,3! \times 3^7$, which is more than the total number of $3^{10}$ possible states.
One method to solve the problem is to write it as a sum of cases of type (1)
in such a way that none of the urns is empty, leading to the following number of
states:

$$\sum_{\substack{n_1 + n_2 + \cdots + n_r = n \\ n_i \ge 1}} \binom{n}{n_1, \ldots, n_r} = \sum_{\substack{n_1 + n_2 + \cdots + n_r = n \\ n_i \ge 1}} \frac{n!}{n_1! \cdots n_r!}$$

However, if the difference between the number of objects ($n$) and the number
of urns ($r$) is large, this method usually takes a lot of time. For example, if we want
to solve Example 6.12 by this method, the number of states is as follows:

$$\sum_{\substack{n_1 + n_2 + n_3 = 10 \\ n_i \ge 1}} \frac{10!}{n_1!\,n_2!\,n_3!} = \frac{10!}{8!\,1!\,1!} + \frac{10!}{1!\,8!\,1!} + \frac{10!}{1!\,1!\,8!} + \frac{10!}{1!\,5!\,4!} + \cdots$$

As seen, calculating the answer by this method is time-consuming. However,
if the difference between the number of objects ($n$) and the number of urns ($r$) is
small, the solution is usually elegant. For instance, if we had just four officers in
Example 6.12, the difference between the number of objects and the number of urns
would equal one, and the number of required states using this method would be:

$$\sum_{\substack{n_1 + n_2 + n_3 = 4 \\ n_i \ge 1}} \frac{4!}{n_1!\,n_2!\,n_3!} = \frac{4!}{2!\,1!\,1!} + \frac{4!}{1!\,2!\,1!} + \frac{4!}{1!\,1!\,2!} = \binom{3}{1}\frac{4!}{2!\,1!\,1!}$$
Another method to solve the type (5) problem is to apply one of the formulas
of the algebra of sets. Since the topic will be explained in the next chapter, only one
of the obtained results is used in this section. Suppose that $A_i$ indicates that avenue
$i$ is vacant. In the next chapter, we will show that Example 6.12 can be solved by
using the formulas of the algebra of sets as follows:

$n(\text{all of the avenues have officers}) = n(\text{total states}) - n(\text{at least one avenue is vacant})$
$= n(\text{total states}) - n(A_1 \cup A_2 \cup A_3)$

$n(\text{total states}) = 3^{10}$

$n(A_1 \cup A_2 \cup A_3) = n(A_1) + n(A_2) + n(A_3) - n(A_1 \cap A_2) - n(A_1 \cap A_3) - n(A_2 \cap A_3) + n(A_1 \cap A_2 \cap A_3)$

$n(A_1) = n(\text{the first avenue is vacant}) = 2^{10} \Leftrightarrow$ all of the officers are in avenues 2 and 3

$n(A_1 \cap A_2) = n(\text{the first and second avenues are vacant}) = 1^{10} \Leftrightarrow$ all of the officers are in avenue 3

$n(A_1 \cap A_2 \cap A_3) = n(\text{the first, second, and third avenues are vacant}) = 0$

Therefore, the number of states in this example is equal to:

$$n(\text{total states}) - n(A_1 \cup A_2 \cup A_3) = 3^{10} - 2^{10} - 2^{10} - 2^{10} + 1^{10} + 1^{10} + 1^{10} - 0 = 3^{10} - \binom{3}{1}2^{10} + \binom{3}{2}1^{10} - 0$$
In general, using the methods of the algebra of sets, the number of states of
putting “$n$” distinguishable objects into “$r$” distinguishable urns such that each urn
receives at least one object can be written as follows:

$$r^n - \binom{r}{1}(r-1)^n + \binom{r}{2}(r-2)^n - \cdots + (-1)^r \binom{r}{r}(r-r)^n = \sum_{i=0}^{r} (-1)^i \binom{r}{i}(r-i)^n$$
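The inclusion-exclusion formula above translates directly into code. The following minimal Python sketch (ours; the helper name onto_count is hypothetical) computes the answer to Example 6.12 and verifies it by brute force:

```python
from itertools import product
from math import comb

def onto_count(n, r):
    """Ways to put n distinguishable objects into r distinguishable urns
    with no urn left empty, by inclusion-exclusion."""
    return sum((-1)**i * comb(r, i) * (r - i)**n for i in range(r + 1))

print(onto_count(10, 3))  # 55980 for Example 6.12

# Cross-check by brute force on the same instance.
brute = sum(
    1
    for a in product(range(3), repeat=10)
    if set(a) == {0, 1, 2}
)
assert brute == onto_count(10, 3)
```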

Type 6: Putting indistinguishable objects into distinguishable urns such that


there is no restriction on the number of objects in the urns.
Suppose that we want to determine the number of states for distributing “𝑛”
indistinguishable objects into “𝑟” distinguishable urns without any restriction on the
number of objects in each urn. In this book, we entitle this type of problem as "Type
6" with the following properties:
a. Objects are indistinguishable.
b. Urns are distinguishable.
c. Without any restriction on the number of objects in the urns.

As mentioned in type 4 of the ball and urn problems, the number of possible
results for putting “$n$” distinguishable objects into “$r$” distinguishable urns is equal to
$r^n$. For example, suppose that we have five objects numbered 1 through 5 and three
different urns. The number of states for putting the objects into the urns equals
$3^5 = 243$. Out of the 243 possible results, there are $\frac{5!}{2!\,2!\,1!} = 30$ states where the
number of objects in the first, second, and third urns is equal to 2, 2, and 1,
respectively. Now, if the objects are assumed to be indistinguishable and their
numbering does not matter, or in other words, only the number of objects put into
each urn matters, then all of

these 30 states turn out to be identical. Two states of them are shown below as
examples:

[Figure: two of the 30 states, (Urn 1: 1,2 | Urn 2: 3,4 | Urn 3: 5) and (Urn 1: 1,3 | Urn 2: 4,5 | Urn 3: 2), both of which reduce to the single state "two balls, two balls, one ball" when the objects are indistinguishable.]

Hence, the new question is: if the values of the objects are the same, and only
the number of objects put into each urn matters, how many states are there to put
the balls into the urns?
We continue with the same example of five objects and three urns. A possible
method is to find a ratio between the numbers of answers of the problems with
distinguishable and indistinguishable objects. However, this approach is not suitable
because there is no fixed ratio between the numbers of states of these two
problems. For example, if we consider that there is one object in each of the first
and second urns and three objects in the third urn, then every $\frac{5!}{1!\,1!\,3!} = 20$ states
of the problem with distinguishable objects are equivalent to one state of the
problem with indistinguishable objects. However, as shown above, when two objects
are in the first urn, two objects in the second urn, and one object in the third urn,
every $\frac{5!}{2!\,2!\,1!} = 30$ states of the problem with distinguishable objects are
equivalent to one state of the problem with indistinguishable objects.
Hence, we use another method to solve this problem. For this purpose,
consider the following problem:
How many different arrangements are possible using five circles and two
lines?
As mentioned in Section 1.3.3, the number of states of this problem is:
$$\frac{7!}{2!\,5!} = \binom{7}{2} = \binom{7}{5}$$

Now, it can be understood that each state of this problem is equivalent to
one state of putting five indistinguishable objects into three distinguishable urns.
Some of those states are presented as follows:

[Figure: ○ | ○ | ○○○ → one ball in urn 1, one ball in urn 2, and three balls in urn 3; | ○○○○ | ○ → zero balls in urn 1, four balls in urn 2, and one ball in urn 3.]

Note that for equating the number of states of these two problems, it suffices
to regard, in the left figure, the number of circles before the first line as the number
of objects in the first urn, the number of circles between the first and second lines
as the number of objects in the second urn, and the number of circles after the
second line as the number of objects in the third urn. Hence, the number of states to
distribute five indistinguishable objects into three distinguishable urns is equal to the
number of ways to arrange five ○'s (circles) and two |'s (lines) in a row.
Therefore, in general, it can be shown that the number of states to distribute
“$n$” indistinguishable objects into “$r$” distinguishable urns is equivalent to the number
of arrangements of $(r-1)$ lines and $n$ circles, which is:

$$\frac{(n + r - 1)!}{(r - 1)!\,n!} = \binom{n + r - 1}{r - 1} = \binom{n + r - 1}{n}$$
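The stars-and-bars formula can be checked by direct enumeration for small cases. A minimal Python sketch (ours) for the five-ball, three-urn example:

```python
from itertools import product
from math import comb

# Stars and bars: n identical balls into r distinct urns.
n, r = 5, 3
formula = comb(n + r - 1, r - 1)

# Cross-check by enumerating all nonnegative integer solutions of
# n1 + n2 + n3 = 5 directly.
brute = sum(
    1
    for counts in product(range(n + 1), repeat=r)
    if sum(counts) == n
)
assert brute == formula
print(formula)  # 21
```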

Example 6.13

How many ways are possible to distribute eleven intact oranges among four
boys such that only the number of received oranges matters for each boy?

Solution. Since only the number of oranges matters for each boy, the objects are
considered identical in this problem, and the number of states is:
$$\binom{11 + 4 - 1}{4 - 1} = \binom{14}{3}$$

Example 6.14

How many ways are possible to distribute seven intact oranges and five intact
apples among four boys if the value of apples and that of oranges are considered to
be the same by each boy?
Solution. The oranges can be distributed among the boys in $\binom{7+4-1}{4-1}$ states.
For each of those results, there are $\binom{5+4-1}{4-1}$ states to distribute the apples
among the boys. Therefore, the total number of states equals:

$$\binom{7 + 4 - 1}{4 - 1} \binom{5 + 4 - 1}{4 - 1} = \binom{10}{3}\binom{8}{3}$$
Note that it can be shown that the number of solutions of the equation
$n_1 + n_2 + \cdots + n_r = n$ such that the $n_i$'s are nonnegative integers is equal to the
number of states to distribute “$n$” indistinguishable objects into “$r$” distinguishable
urns. To demonstrate the point, it suffices to show that each solution
belonging to the equation 𝑛1 + 𝑛2 + ⋯ + 𝑛𝑟 = 𝑛 is equivalent to one state of putting
“𝑛” indistinguishable objects into “𝑟” distinguishable urns. For instance, consider the
following example:

Example 6.15

How many nonnegative integer solutions does the following equation have?
𝑛1 + 𝑛2 + 𝑛3 = 7

Solution. This problem is the same as the problem of distributing seven
indistinguishable objects into three distinguishable urns. For example, $n_1 = 2$,
$n_2 = 2$, and $n_3 = 3$ is equivalent to the state where the first urn receives two
objects, the second urn receives two objects, and the third urn receives three
objects. Therefore, the number of solutions of the equation is:

$$\binom{7 + 3 - 1}{3 - 1} = \binom{9}{2}$$
In addition, it can be shown that the number of terms in the expansion of
$(x_1 + x_2 + \cdots + x_r)^n$ is equal to the number of states of putting “$n$” indistinguishable
objects into “$r$” distinguishable urns. For instance, consider the following example:

Example 6.16

How many terms are there in the expansion of $(x_1 + x_2 + x_3)^7$?

Solution. The expansion of this problem is as follows:

$$(x_1 + x_2 + x_3)^7 = \sum_{n_1 + n_2 + n_3 = 7} \frac{7!}{n_1!\,n_2!\,n_3!} x_1^{n_1} x_2^{n_2} x_3^{n_3}$$

Each state of determining the $n_i$'s such that their sum equals 7 produces
one of the terms belonging to this expansion. Hence, the number of terms in this
expansion is equal to the number of nonnegative integer solutions of the equation
$n_1 + n_2 + n_3 = 7$, which is:

$$\binom{7 + 3 - 1}{3 - 1} = \binom{9}{2}$$
Therefore, in general, the number of states of putting “$n$” indistinguishable
objects into “$r$” distinguishable urns, the number of nonnegative integer solutions
of the equation $n_1 + n_2 + \cdots + n_r = n$, and the number of terms in the expansion of
$(x_1 + x_2 + \cdots + x_r)^n$ are all equal to each other.

Furthermore, with regard to the expansion of $(x_1 + x_2 + \cdots + x_r)^n$, the sum
of the coefficients is equal to $r^n$, and its number of terms is
$\binom{n + r - 1}{r - 1} = \binom{n + r - 1}{n}$.
Type 7: Putting indistinguishable objects in distinguishable urns in a way that
each urn receives at least one object.
Suppose that we want to distribute “𝑛” indistinguishable objects into “𝑟”
distinguishable urns such that each urn receives at least one object. In this book,
we entitle this type of problem as "Type 7" with the following properties:
a. Objects are indistinguishable.
b. Urns are distinguishable.
c. Each urn should receive at least one object.

To solve this problem, we first put one object into each urn, which can be
done in one state since objects are identical. Then, we put the remaining (𝑛 − 𝑟)
objects, like the type 6 problem, into the urns without restriction. Therefore, the
number of states is:
$$\binom{(n - r) + r - 1}{r - 1} = \binom{n - 1}{r - 1} = \binom{n - 1}{n - r}$$

Example 6.17

How many ways are possible to distribute eleven intact and identical
oranges among four boys such that each boy receives at least one orange?
Solution. We first give each boy one orange, which can be done in one state.
Then, we distribute the remaining seven oranges among boys. Therefore, the
number of states is:
$$\binom{7 + 4 - 1}{4 - 1} = \binom{10}{3}$$
Note that the type 7 problem can be simply generalized to the states in
which the minimum number of objects specified for each urn is more than one. For
example, in Example 6.17, if it is supposed that each boy receives at least two
oranges, we first give each boy two oranges, which can be done in one state.
Then, we distribute the remaining three oranges among the boys. Therefore, the
number of states is:

$$\binom{3 + 4 - 1}{4 - 1} = \binom{6}{3}$$
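This "hand out the minimums first" reduction is easy to implement. The following minimal Python sketch (ours; the helper name at_least is hypothetical) computes both counts for Example 6.17 and its generalization, and verifies the second by brute force:

```python
from itertools import product
from math import comb

def at_least(n, mins):
    """Ways to put n identical balls into len(mins) distinct urns with
    urn i holding at least mins[i] balls: hand out the minimums first,
    then distribute the remainder freely (type 6)."""
    rest = n - sum(mins)
    r = len(mins)
    return comb(rest + r - 1, r - 1) if rest >= 0 else 0

print(at_least(11, [1, 1, 1, 1]))  # C(10, 3) = 120 (Example 6.17)
print(at_least(11, [2, 2, 2, 2]))  # C(6, 3)  = 20  (the generalization)

# Brute-force check of the second case.
brute = sum(
    1
    for c in product(range(12), repeat=4)
    if sum(c) == 11 and all(x >= 2 for x in c)
)
assert brute == at_least(11, [2, 2, 2, 2])
```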
Finally, the reader should note that other problems can also be defined for
putting objects into the urns, which are not addressed in this book.

Furthermore, to classify the problems related to distributing objects into


urns more appropriately, we recap some examples solved in Section 1.6.

1) How many ways can ten officers be positioned in three avenues named A, B, and
C if it is supposed to place four officers in avenue A, three officers in avenue
B, and three officers in avenue C?
Solution. According to the type (1) problem, the answer is equal to $\frac{10!}{4!\,3!\,3!}$.
2) How many ways can ten officers be positioned in three avenues named A, B, and
C such that there are four officers in one avenue and three officers in each of the
other two avenues?
Solution. According to the type (2) problem (it is not specified which avenue is
four-member), the answer is equal to $\binom{3}{1}\frac{10!}{4!\,3!\,3!} = \binom{3}{2}\frac{10!}{3!\,3!\,4!}$.
3) How many ways can ten officers be positioned in 2 three-member groups and 1
four-member group if only each officer's groupmates matter? (i.e., the location of
the groups does not matter; they are considered the same.)
Solution. According to the type (3) problem, the answer is equal to $\frac{10!}{4!\,3!\,3!} \times \frac{1}{2!}$.
4) How many ways can ten officers be positioned in three avenues named A, B, and
C if there is no restriction on the number of officers in each avenue?
Solution. According to the type (4) problem, the answer is equal to $3^{10}$.

5) How many ways can ten officers be positioned in three avenues named A, B, and
C in a way that each avenue is occupied by at least one officer?
Solution. According to the type (5) problem, the answer is:
$$3^{10} - \binom{3}{1}2^{10} + \binom{3}{2}1^{10} - 0$$
6) How many ways can ten officers be positioned in three avenues named A, B, and
C if only the number of officers in each avenue matters?
Solution. According to the type (6) problem (since only the number of officers in
each avenue matters, the objects are considered indistinguishable), the answer is
equal to:

$$\binom{10 + 3 - 1}{3 - 1} = \binom{12}{2}$$
7) How many ways can ten officers be positioned in three avenues named A, B, and
C if only the number of officers in each avenue matters, and each avenue is
occupied by at least one officer?
Solution. According to the type (7) problem, the answer is equal to:
$$\binom{10 - 1}{3 - 1} = \binom{9}{2}$$

1) How many four-digit even numbers can be formed without repetition using
the integer digits from 0 to 4?
2) How many five-digit numbers are there such that:
a. They contain at least one digit 5?
b. They contain exactly one digit 5?
3) How many four-digit numbers are there such that at least one of their digits
appears more than once?
4) If each key can take on the value of zero or one, then how many keys do we
need to send 1000 different messages in an electronic system?
5) Of numbers 1 through 10000, how many numbers are there such that at least
two consecutive digits are identical (For example, 1003 and 992)?
6) Suppose that we want to randomly arrange three boys and three girls in a row.
How many possible states are there to do so:
a. If there is no restriction on sitting for people?
b. If no two of the same sex are allowed to sit together?
c. If girls are required to sit together, and so do the boys?
d. If girls are to sit together, but there is no restriction on the sitting of
boys.
e. If these six people include three sister-brother couples, and each girl
wants to sit next to her brother?
f. If one specific couple out of the 3 couples is required to sit together?
7) Solve the preceding problem again, if we randomly arrange the people at a
round table.
8) How many possible states are there if five people are to sit in a row, and
people A and B do not sit next to each other (there is a gap between them)?

9) Solve the preceding problem again, if we randomly arrange the people at a
round table.
10) In the expansion of $(x_1 + x_2 + \cdots + x_r)^n$,
a. How many terms are there?
b. What is the sum of the coefficients of terms?
c. How many terms are there such that all the powers of 𝑥𝑖 's are greater
than or equal to 1?
11) In the expansion of $(x_1 + 2x_2 + 3x_3 + 4x_4)^4$,
a. How many terms are there?
b. What is the coefficient of the term $x_2 x_3^2 x_4$?
c. What is the sum of the coefficients of the terms?
12) What is the coefficient of the term $x_1 x_2^2$ in the expansion of $(x_1 + 4x_2 - x_3 + \frac{1}{2})^5$?
13) What is the coefficient of the term $x_1^4 x_2^{15}$ in the expansion of $(4x_1^2 + \frac{1}{2}x_2^3 + x_3)^7$?
14) In the expansion of $(x_1 + x_2 + x_3)^4$,
a. how many terms are there such that each term includes all the
variables?
b. how many terms are there such that each term includes just 2 out of the
3 variables?
15) In the expansion of $(x_1 + x_2 + x_3 + x_4)^5$,
a. how many terms are there such that each term includes at least two
variables?
b. how many terms are there such that the power of variable 𝑥1 is greater
than or equal to 2?
16) What is the number of solutions of the equation $x_1 + x_2 + \cdots + x_r = n$ such
that $k$ of the $x_i$'s equal zero?

17) We want to distribute 25 indistinguishable marbles into three boxes A, B, and
C such that there is at least one marble in box A, at least two marbles in box B,
and at least three marbles in box C. How many possible states are there to do
so?
18) How many ways can “𝑛” indistinguishable balls be distributed into “𝑟”
distinguishable urns such that the $i^{th}$ urn contains at least $m_i$ balls?
$\left(n \ge \sum_{i=1}^{r} m_i = A\right)$

19) How many ways can 262 indistinguishable balls be distributed into eight
distinguishable urns such that the $i^{th}$ urn contains at least $\binom{8}{i}$ balls?
20) How many ways can eight televisions be purchased from a company producing
just three kinds of television?
21) How many nonnegative integer solutions does the equation
$x_1 + x_2 + x_3 + x_4 + x_5 = 30$ have such that all of the $x_i$'s are multiples of 3?
22) Consider five mathematics, five physics, and five chemistry students.
a. How many ways are there to position these people in three classes such
that there is at least one person from each discipline? (Consider the
people of each discipline to be identical. In other words, only the
number of people of each discipline matters.)
b. If the people of each discipline are considered distinguishable and there
are three classes, how many states are there such that class number 1
contains exactly one person from each discipline?
23) We want to position seven people numbered 1 through 7 in 4 two-member
rooms.
a. How many states are there such that people 1 and 2 are roommates?
b. How many states are there such that people 1 and 2 are roommates, and
so are people 3 and 4?
24) Consider a function of 𝑘 variables like 𝑓(𝑥1 , … , 𝑥𝑘 ) which is an everywhere
differentiable function.

a. How many partial derivatives of the order “𝑛” are possible for this
function?
b. How many partial derivatives of the order “𝑛” are possible for this
function such that the function has been differentiated “𝑘” times with
respect to one specific variable?
c. How many partial derivatives of the order “𝑛” are possible for this
function such that the function has been differentiated at least once with
respect to each variable?
25) Suppose that 12 points lie on a plane.
a. If no three points lie on a straight line, how many straight lines do we
need to connect these points two by two?
b. If four points lie on a straight line, and except these four points no three
points lie on a straight line, then how many straight lines do we need to
connect these points in pairs?
26) How many triangles can be made using nine points such that:
a. No three points lie on the same line?
b. Points are located as follows:

27) In the following figure, nine points have been specified on a circle.

a. How many quadrilaterals can be made using these points?


b. How many quadrilaterals can be made such that AC is the diameter?

28) How many diagonals are there in an 𝑛-sided regular polygon?
29) In each of the following figures, how many rectangles can be found?
a.

b.

30) We roll six dice. How many possible states are there to see three pairs?
31) There are “𝑛” distinguishable urns and “𝑛” distinguishable balls. How many
states are there to select “𝑘” out of the “𝑛” balls and distribute them into “𝑘”
urns such that each urn gets one ball?
32) How many ways can five identical iron and ten identical golden keys be
arranged next to each other on a panel such that the fourth key is iron?
33) How many ten-letter words can be made using the letters AAAABBBCCC in a
way that the third and seventh letters are A and B respectively?
34) There are three cars from country A, four cars from country B, and five cars
from country C in a car race. How many ways can two cars from country A
achieve the first and last (the twelfth) ranks (suppose that the results are
registered based on the country name and the cars of each country are
considered identical)?
35) Twelve runners have participated in a competition. Of the runners, five are
from China, three are from the United States of America, two are from Kenya,
and two are from Germany. The result is registered based on the runner's
country and not the runner's name. How many possible ranking results are
there from 1 to 12 if China has two runners among the top three ranks and
one runner among the bottom three ranks?

36) Consider ten people. How many ways are there to give two $2000 prizes, one
$500 prize, and one $200 prize to four of them such that each of the four
receives one prize?
37) How many ways can four people be positioned in three distinguishable two-
member rooms?
38) How many ways can people A, B, C, and D be positioned in three rooms with
respective capacities of one, two, and three people?
39) In a box, there are 𝑁 marbles numbered 1 through 𝑁.
a. How many ways can “𝑛” marbles be selected from the box such that the
smallest number chosen among the marbles is equal to the number “𝑎”?
(1 ≤ 𝑎 ≤ 𝑁 − 𝑛 + 1)
b. If we select “𝑛” marbles from the box, how many states are there such
that the smallest number left in the box is equal to the number “𝑏”?
40) What is the number of eight-member subsets selected from the set {1,2, … ,12}
in a way that at least four members of each of these eight-member subsets
belong to the set {1,2, … ,6}?
41) We want to select a three-member committee consisting of one
administrator, one executive, and one secretary from ten representatives.
How many possible states are there such that:
a. Person A is a member of the committee only when he is selected as the
administrator?
b. Person A is a member of the committee only when he is selected as the
administrator, and people B and C are either selected together or none
of them is selected?
42) Consider a group consisting of seven men and eight women. We want to form
a six-member jury such that the jury contains at least three women and at
least two men. How many ways are there to do so?
43) How many ways can a council be selected from four women and five men such
that at least four members of the council are men?

44) If we want to select a five-member group from four freshmen, four
sophomores, and five juniors, how many states are there to do so if at least
one person from each category is to be in the group?
45) Out of the 30! permutations of the numbers 1, 2, …, 30, how many permutations
are there such that no two multiples of 3 are next to each other (meaning that
no two of the numbers 3, 6, …, 30 are adjacent)?
46) If the $x_i$'s are prime numbers, then how many divisors does $x_1^{a_1} x_2^{a_2} \cdots x_n^{a_n}$ have?
47) Consider the number $4^{10} \times 6^8 \times 10^{12}$.
a. How many divisors does this number have?
b. How many of the divisors of the preceding section are divisible by 3000?
48) How many possible states are there to position four tourists in three buses
such that there is at least one tourist in each of the buses (the tourists are
distinguishable)?
49) How many ways can five balls be distributed into four urns such that exactly
two of the urns are empty?
50) Consider 12 people consisting of six sister-brother couples.
a. How many ways can these people be divided into six two-member
groups?
b. How many ways can these people be divided into two six-member
groups?
c. How many ways can these people be divided into six two-member
groups such that there are one boy and one girl in each group?
d. How many ways can these people be divided into two six-member
groups such that the number of girls in each group is equal to that of
the boys?
51) Ten wrestlers have participated in a competition. In the first round of the
competition, the wrestlers are divided into five couples to wrestle together. If
it is supposed to record the couples and their winners, how many different
outcomes are there to do so?

52) How many ways can three freshmen, three sophomores, and three juniors be
positioned in three three-member groups such that there is one person from
each category in each group?
53) How many ways can a three-member group be selected from five freshmen,
four sophomores, and seven juniors such that there is one person from each
category in the group?
54) Suppose that “𝑛” people are aligned in a row from left to right. It is desired to
calculate the number of states that:
a. “𝑎” is before “𝑏”.
b. “𝑎” is before “𝑏” and “𝑐”.
c. “𝑎” is before “𝑏”, and “𝑏” is before “𝑐”.
d. “𝑎” is before “𝑏”, and “𝑏” is before “𝑐” and “𝑑”.
e. “𝑎” is before “𝑏”, and “𝑐” is before “𝑑”.
55) How many six-letter words can be made using the letters aabbcd such that:
a. “𝑑” is before 𝑎's?
b. 𝑎's are before 𝑏's?
56) How many ways can eight people be placed in five rooms such that
a. None of the rooms is empty?
b. There are at most two people in each room?
c. There are at most two people in each room, and people A and B want to
be together in one room?
57) How many five-digit numbers can be made using the digits 1 to 9 such that
two of the digits appear exactly twice (for example, 22717 and 81481)?
58) We have “𝑘” boxes, each of which contains “𝑛” marbles numbered 1 through 𝑛.
We draw one marble from each box. It is desired to find the number of states
in which the number 3 is the smallest number among the “𝑘” drawn marbles.
59) There are “𝑛” marbles, “𝑎” of which are white, numbered 1 through 𝑎, and
(𝑛 − 𝑎) of which are black, numbered 1 through (𝑛 − 𝑎). We draw marbles
successively and randomly from the box. It is desired to find the number of
states in which we draw the last white marble in the 𝑏𝑡ℎ selection.
60) Consider the set {1, 2, 3, … , 10}. How many subsets are there in this set such
that:
a. The numbers 1 and 2 are present, but number 3 is absent?
b. The minimum number is 3, and the maximum number is 9?
c. The difference between the maximum and minimum numbers is 5?
61) Propose one combinatorial argument for each of the following identities.
a. $\binom{n}{k} = \sum_{i=k}^{n} \binom{i-1}{k-1}; \quad n \ge k$
b. $\binom{n}{i}\binom{i}{1} = \binom{n}{1}\binom{n-1}{i-1}$
c. $\sum_{i=0}^{n} i\binom{n}{i} = \binom{n}{1}\,2^{n-1}$
d. $\sum_{i=0}^{n} i^2 \binom{n}{i} = n(n+1)\,2^{n-2}$
e. $\sum_{j=0}^{n} \binom{n}{j}\binom{j}{i} = \binom{n}{i}\,2^{n-i}; \quad i \le n$
f. $\sum_{i=0}^{n} \binom{i+r-1}{i} = \binom{n+r}{r}$
g. $\sum_{i=0}^{n} \binom{n}{i}\,r^{n-i} = (r+1)^n$
62) A shelf is divided into four sections. How many ways can 20 distinguishable
books be arranged in a row on the shelf such that the first, second, third, and
fourth sections contain at least two, three, five, and three books,
respectively?
63) We want to go from the origin (point O) to point A with coordinates (100, 120).
In each movement, we are allowed to go either 20 units horizontally (to the
right) or 6𝑘 units vertically upward (in which 𝑘 is a natural number, 𝑘 = 1, 2, …).
In such a situation, how many ways are there to get to point A if we start from
point O?

64) Out of one thousand codes (000), (001), (002), … , (999), how many three-digit
codes are there such that the sum of its leftmost two digits equals its rightmost
digit?
65) Suppose that we have 𝑛 letters of 𝑎 and 𝑚 letters of 𝑏.
a. How many states are there such that no two 𝑎's are next to each other?
b. How many states are there such that at least two 𝑏's lie between every
two consecutive 𝑎's?
c. Solve the two preceding sections with the assumption that all the
objects (the 𝑎's and 𝑏's) are distinguishable.
66) Eight teachers and 25 students sit at a round table. How many states are there
such that at least two students sit between every two adjacent teachers?
67) Consider the equation $(x_1 + x_2 + x_3)(x_4 + x_5 + x_6 + x_7) = 91$. Obtain the
number of solutions of the equation such that the $x_i$'s are natural numbers.

In this chapter, we introduce the probability concept and address different ways of
calculating probability. Furthermore, we examine the procedure and conditions of
using combinatorial analysis for probability calculation. In the beginning, prior to
defining probability, it is necessary to introduce the concepts of random trial,
sample space, and event, as well as to explain the algebra of sets.

Consider a trial whose result is not known in advance, yet whose possible results
are known. Such a trial is called a random trial, and the set of its possible results is
called the sample space, usually denoted by the letter $S$. For more clarification,
consider the following examples:
➢ In the trial of tossing one coin, the sample space is as follows:

$$S = \{H, T\}$$
➢ In the trial of tossing two coins, the sample space is defined as:
𝑆 = { (𝐻, 𝐻), (𝑇, 𝐻), (𝐻, 𝑇), (𝑇, 𝑇) }

➢ In the trial of tossing two dice, the sample space consists of 36 states and is
defined as:
𝑆 = { (𝑖, 𝑗) ∶ 𝑖, 𝑗 = 1,2,3,4,5,6 }
➢ In the trial of measuring the lifetime of a particular light bulb (in hours), the
sample space is defined as:
𝑆 = {𝑥: 𝑥 ≥ 0}
Each subset of a trial's sample space, containing some of the trial's possible
outcomes, is called an event of the sample space.
For instance, consider the trial of tossing two coins. If the event 𝐸 denotes at
least one heads appears, the event is expressed as follows:
𝐸 = {(𝐻, 𝑇), (𝑇, 𝐻), (𝐻, 𝐻)}
Alternatively, consider the trial of tossing two dice. If the event 𝐸 denotes the
sum of the results of two dice is equal to 4, the event is expressed as:
𝐸 = {(1,3), (2,2), (3,1)}
Also, in the trial of measuring the lifetime of a particular light bulb, suppose the
event E denotes that the lifetime of the light bulb is at most 10 hours. This event is
represented as follows:
𝐸 = {𝑥: 0 ≤ 𝑥 ≤ 10}
Note that we say the event 𝐸 has occurred when one of its results has
occurred. Namely, in the trial of tossing two dice, assume that the event 𝐸 denotes
the sum of the results of two dice is equal to 4. Then, if one of the results (1,3), (2,2),
or (3,1) occurs, we say that the event 𝐸 has occurred.
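To make these notions concrete, a sample space and an event can be represented as plain sets. A minimal Python sketch (our illustration) for the two-dice trial and the event above:

```python
from itertools import product

# The sample space of tossing two dice, and the event "the sum of the
# two results equals 4".
S = set(product(range(1, 7), repeat=2))
E = {outcome for outcome in S if sum(outcome) == 4}
print(len(S))      # 36
print(sorted(E))   # [(1, 3), (2, 2), (3, 1)]
```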

In probability theory, the algebra of sets and the relationships between different
events of a trial are of great importance; they are addressed in this section.
Throughout, we assume that all the studied events belong to one sample space such
as S.
One illustrative method to indicate the logical relationships of events is the
use of the Venn Diagram. In this diagram, the sample space of the trial is represented
by a rectangle containing all the points, and the various events such as E and F are
usually shown as circles inside the rectangle. Thus, the desired events can be shown
by hatching the related area of the figure.
If E and F are two arbitrary events of the sample space, then we say that 𝐸 ∩ 𝐹,
or 𝐸𝐹, is the intersection of the two events E and F. That is, it contains all possible
results of the trial that are in both of the events E and F.

Figure 2-1 Intersection of the two events E and F shown as 𝐸 ∩ 𝐹

In fact, 𝐸 ∩ 𝐹 occurs whenever both of the events E and F occur. For this
purpose, a result of the sample space should occur that is in common for both of the
events.
Moreover, we say that 𝐸 ∪ 𝐹 is the union of the two events E and F whenever it
contains all results either in E or F (or both), as shown in Figure 2-2.

Figure 2-2 Union of the two events E and F shown as 𝐸 ∪ 𝐹

In other words, 𝐸 ∪ 𝐹 occurs whenever at least one of the events E and F


occurs. To this end, a result of the sample space should occur that is either in E or F
(or both), shown as 𝐸 ∪ 𝐹.
Namely, in the trial of tossing a die, suppose that the events E and F are defined
as 𝐸 = {1 ,2 ,3} and 𝐹 = {3 ,4}, respectively. Then, the events 𝐸 ∩ 𝐹 and 𝐸 ∪ 𝐹 will lead
to the respective values {3} and {1,2,3,4}.

The two events E and F are mutually disjoint or exclusive whenever their
intersection is empty; the empty event is denoted by ∅. In other words, we say that
two events E and F are mutually disjoint whenever there is no possibility of their
simultaneous occurrence.

Figure 2-3 Two mutually disjoint events

For example, if the events E and F are respectively defined as 𝐸 = {1,2,3} and
𝐹 = {4,5} in the trial of tossing a die, then the events E and F are disjoint, and there is
no possibility of their simultaneous occurrence.
The event 𝐸ᶜ is defined as the complement of the event E whenever 𝐸ᶜ contains all the results of the sample space that are not present in E. In other words, we say that the event 𝐸ᶜ is the complement of the event E if the following conditions are satisfied:
𝐸 ∩ 𝐸ᶜ = ∅ , 𝐸 ∪ 𝐸ᶜ = 𝑆

Figure 2-4 Two complement events

In fact, 𝐸ᶜ occurs whenever the event E does not occur. For example, if the events E and F are respectively defined as 𝐸 = {1,2} and 𝐹 = {3,4,5,6} in the trial of tossing a die, then the events E and F are complementary, and the event F occurs if and only if the event E does not occur.
For the two events E and F, if all the members of E are also present in F, then the event E is called a subset of F, designated as 𝐸 ⊂ 𝐹 and depicted in Figure 2-5. Therefore, if the event E occurs, the event F necessarily occurs as well.

Figure 2-5 The event E is the subset of F shown as 𝐸 ⊂ 𝐹

If 𝐸 ⊂ 𝐹 and 𝐹 ⊂ 𝐸, then E and F are the same. In other words, we have:
(𝐸 ⊂ 𝐹 , 𝐹 ⊂ 𝐸) ⇔ 𝐸 = 𝐹
If E and F are two events of the sample space, the event 𝐸 − 𝐹 contains all the results of the sample space that are present in E but absent in F. This is expressed as 𝐸 ∩ 𝐹ᶜ and depicted in Figure 2-6.

Figure 2-6 𝐸 − 𝐹 or 𝐸 ∩ 𝐹ᶜ

If E and F are two events of the sample space, the event 𝐸𝛥𝐹 (the symmetric difference) contains all the results of the sample space that are present in exactly one of the events E and F. In other words:
(𝐸𝛥𝐹) = (𝐸 − 𝐹) ∪ (𝐹 − 𝐸) = (𝐸 ∩ 𝐹ᶜ) ∪ (𝐹 ∩ 𝐸ᶜ) = (𝐸 ∪ 𝐹) − (𝐸 ∩ 𝐹)

Figure 2-7 The event 𝐸𝛥𝐹

In fact, (𝐸𝛥𝐹) occurs if at least one of the events E and F occurs, but they do
not occur simultaneously.
For example, consider the trial of tossing a die, and suppose that the events E and F are defined as 𝐸 = {1,2,3} and 𝐹 = {3,4}, respectively. Then, the events 𝐸 − 𝐹, 𝐹 − 𝐸, and 𝐸𝛥𝐹 are equal to {1,2}, {4}, and {1,2,4}, respectively.
Intersection, union, and complement operations of events follow specific laws
in the algebra of sets. Some of these laws are expressed as follows:

Commutative law: 𝐸 ∪ 𝐹 = 𝐹 ∪ 𝐸 ; 𝐸 ∩ 𝐹 = 𝐹 ∩ 𝐸
Associative law: (𝐸 ∪ 𝐹) ∪ 𝐺 = 𝐸 ∪ (𝐹 ∪ 𝐺) ; (𝐸 ∩ 𝐹) ∩ 𝐺 = 𝐸 ∩ (𝐹 ∩ 𝐺)
Distributive law: (𝐸 ∪ 𝐹) ∩ 𝐺 = (𝐸 ∩ 𝐺) ∪ (𝐹 ∩ 𝐺) ; (𝐸 ∩ 𝐹) ∪ 𝐺 = (𝐸 ∪ 𝐺) ∩ (𝐹 ∪ 𝐺)
De Morgan's law: (𝐸 ∩ 𝐹)ᶜ = 𝐸ᶜ ∪ 𝐹ᶜ ; (𝐸 ∪ 𝐹)ᶜ = 𝐸ᶜ ∩ 𝐹ᶜ

To demonstrate the validity of the above laws, it suffices to show that each result of the left-hand event is equivalent to a result of the right-hand event, and vice versa. Another way to show that these relationships are valid is to use the Venn diagram. However, the diagram is not necessarily a useful tool for verifying all relationships of the algebra of sets, especially when the number of events under study is large.
Note that, in De Morgan's law, the equation (𝐸 ∩ 𝐹)ᶜ = 𝐸ᶜ ∪ 𝐹ᶜ simply indicates that the complement of the event requiring the occurrence of both events E and F is the event that at least one of them does not occur.

Figure 2-8 The event 𝐸ᶜ ∪ 𝐹ᶜ or the complement of the event 𝐸 ∩ 𝐹

Likewise, the equation (𝐸 ∪ 𝐹)ᶜ = 𝐸ᶜ ∩ 𝐹ᶜ simply indicates that the complement of the event requiring the occurrence of at least one of the events E and F is the event that none of them occurs.

Figure 2-9 The event 𝐸ᶜ ∩ 𝐹ᶜ or the complement of the event 𝐸 ∪ 𝐹

The generalization of De Morgan's law is as follows:
(𝐸₁ ∩ 𝐸₂ ∩ … ∩ 𝐸ₙ)ᶜ = 𝐸₁ᶜ ∪ 𝐸₂ᶜ ∪ … ∪ 𝐸ₙᶜ (3-1)
(𝐸₁ ∪ 𝐸₂ ∪ … ∪ 𝐸ₙ)ᶜ = 𝐸₁ᶜ ∩ 𝐸₂ᶜ ∩ … ∩ 𝐸ₙᶜ (3-2)

The first equation indicates that the complement of the event requiring the occurrence of all of the events is the event that at least one of them does not occur. Likewise, the second equation denotes that the complement of the event requiring the occurrence of at least one of the events is the event that none of them occurs.

As a straightforward definition, probability expresses the likelihood of an event's occurrence by a number resulting from a proportion. In mathematics, however, probability is a real-valued function defined on the events of the sample space that assigns to each event a real value in the interval [0, 1]. Now, a question that may arise is: how does this function assign numbers in the interval [0, 1] to the events of the sample space?
One possible way to obtain the probability of an event is to use its relative frequency, which is defined as follows:
The probability of the occurrence of an event such as 𝐸 equals the proportion of the number of times that the event 𝐸 occurs, denoted by 𝑛(𝐸), to the total number of times the trial is performed, 𝑛. Note that this definition presumes identical conditions in all performances of the trial, as well as a number of performances approaching infinity. We designate this proportion by 𝑃(𝐸), defined by the following formula:
𝑃(𝐸) = lim_{𝑛→∞} 𝑛(𝐸)/𝑛

One of the drawbacks of this definition is that the conditions of each performance of the trial must be immutable and the trial must be repeated infinitely many times. Since this is practically impossible, the value obtained from the definition is only an approximation of the real quantity 𝑃(𝐸). Furthermore, the definition's assumption that the proportion necessarily converges to a constant value is complicated and not straightforward to accept. Even granting that the above formula converges to a constant value when the trials are repeated, it is still not easy to accept that it converges to the same number obtained from the first series of performances. Despite all these issues, this definition is one of the most basic definitions in probability theory and has many applications in the theory.
In the modern approach to probability theory, a set of simple principles associated with probability is first presented. Then, these principles and the propositions resulting from them are utilized to calculate the probabilities of different events. Moreover, the concept of probability as a relative frequency can also be understood by using these principles. The axioms of probability theory are defined as follows:
Consider a trial with sample space 𝑆. For any event 𝐸 of the sample space 𝑆, we assume that 𝑃(𝐸) satisfies the following three axioms:
Axiom 1 -
0 ≤ 𝑃(𝐸) ≤ 1
Axiom 2 -
𝑃(𝑆) = 1
Axiom 3 -
For any sequence of mutually exclusive events 𝐸₁, 𝐸₂, … (𝐸ᵢ ∩ 𝐸ⱼ = ∅ for 𝑖 ≠ 𝑗), we have:
𝑃(⋃_{𝑖=1}^{∞} 𝐸ᵢ) = 𝑃(𝐸₁ ∪ 𝐸₂ ∪ …) = ∑_{𝑖=1}^{∞} 𝑃(𝐸ᵢ)

Axiom 1 states that the probability of any event like 𝐸 is in the interval [0,1].
The second axiom says that, in performing a random trial, the probability of
occurrence of one of the sample space members is equal to one. Furthermore, the
third axiom indicates that if one event can be expressed as the union of mutually
exclusive events, the probability of that event equals the sum of the probabilities of
each of those disjoint events.
One of the most important results of the axioms of probability theory applies when the sample space is finite and all of its possible results can be assumed to be equally likely. In many random trials, it is natural to assume that the different members of the sample space are equally likely; in other words, that the probability of the occurrence of any sample space member is the same. In such situations, to calculate the probability of the occurrence of any event of the sample space, it suffices to divide the number of that event's members by the total number of the sample space members. For instance, consider a trial with a finite sample space of size 𝑁, 𝑆 = {𝑎₁, 𝑎₂, …, 𝑎_𝑁}. Now, if it can be assumed that the probabilities of all these 𝑁 members are the same, we have:
𝑃({𝑎₁}) = 𝑃({𝑎₂}) = ⋯ = 𝑃({𝑎_𝑁})
Then, considering the second and third axioms of probability theory, we have:
𝑃({𝑎₁} ∪ {𝑎₂} ∪ … ∪ {𝑎_𝑁}) = 𝑃(𝑆) = 1 ⇒ 𝑃({𝑎₁}) + 𝑃({𝑎₂}) + ⋯ + 𝑃({𝑎_𝑁}) = 1
⇒ 𝑁𝑃({𝑎ᵢ}) = 1 ⇒ 𝑃({𝑎ᵢ}) = 1/𝑁 ; 𝑖 = 1, …, 𝑁

Consequently, according to the third axiom of probability theory, the probability of an event 𝐸 whose number of states equals 𝑛(𝐸) = 𝑘 is equal to the sum of the probabilities of the members of 𝐸. Since the probability of each member is equal to 1/𝑁, the probability of the event 𝐸 is equal to:
𝑃(𝐸) = 𝑛(𝐸) × 1/𝑁 = 𝑛(𝐸)/𝑁 = 𝑛(𝐸)/𝑛(𝑆)

As an example, suppose that each of the six possible results of tossing a die can be assumed to be equally likely, and the event 𝐸 is defined as the result of one toss being an even number. Since the sample space possesses six members and the event 𝐸 = {2,4,6} contains three members, its probability can be calculated as follows:
𝑃(𝐸) = 𝑛(𝐸)/𝑛(𝑆) = 3/6
However, as mentioned before, one way to obtain the probability is to use the relative frequency formula. That is, a die is tossed many times, and the proportion of the number of times that an even number appears is calculated. Of course, when the sample space members can be assumed to be equally likely and using the formula 𝑃(𝐸) = 𝑛(𝐸)/𝑛(𝑆) is possible, applying the relative frequency method does not seem reasonable.
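To build intuition for the two approaches, the following short Python sketch (our illustration, not part of the book's text) estimates the probability of an even face by relative frequency; as the number of repetitions grows, the estimate approaches 𝑛(𝐸)/𝑛(𝑆) = 3/6.

# A minimal sketch: estimating P(even face) of a fair die by relative
# frequency and comparing it with n(E)/n(S) = 3/6 = 0.5.
import random

def relative_frequency(trials):
    hits = sum(1 for _ in range(trials) if random.randint(1, 6) % 2 == 0)
    return hits / trials

for n in (100, 10_000, 1_000_000):
    print(n, relative_frequency(n))   # approaches 0.5 as n grows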

Example 4.1

There are four restaurants in a small city. If three people go to the city in an
arbitrary day, and each of them randomly choose one of the four restaurants, what
is the probability that they select different restaurants?
Solution. We assume that all 4³ states of the sample space are equally likely. If the event 𝐸 is defined as selecting different restaurants, the number of possible states of occurrence of the event 𝐸 is equal to 4 × 3 × 2, or C(4,3) × 3!. Therefore, the required probability is:
𝑃(𝐸) = 𝑛(𝐸)/𝑛(𝑆) = (4 × 3 × 2)/(4 × 4 × 4) = C(4,3) × 3!/4³ = 3/8
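A quick Monte Carlo check of this answer (our sketch, not part of the book's text):

# Three people each pick one of four restaurants uniformly at random;
# estimate P(all different), whose exact value is 3/8 = 0.375.
import random

trials, hits = 200_000, 0
for _ in range(trials):
    choices = [random.randrange(4) for _ in range(3)]
    hits += len(set(choices)) == 3   # all three restaurants distinct
print(hits / trials)                 # about 0.375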

Example 4.2

If ten men and seven women randomly sit at a round table, find the probability
of the event that all the women sit next to each other.

Solution. Since individuals sit randomly, we assume that all of the possible 16! states are equally likely. To calculate the number of states of the required event, note that the group of women together with the ten men makes eleven groups, which can be arranged at a round table in 10! states. Meanwhile, arranging the women within their group has 7! states. Consequently, the desired probability is equal to (10! × 7!)/16!.
Another approach is to first arrange the men in 9! states. Then, we select one place from the 10 spaces available among the men and arrange all the women there, which can be done in 7! states. Consequently, the desired probability is equal to:
(7! × C(10,1) × 9!)/16! = (7! × 10!)/16!
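The two equivalent counting arguments can be checked numerically (a sketch of ours, not part of the book's text):

# Exact evaluation of the two equivalent answers in Example 4.2.
from math import factorial, comb

p1 = factorial(10) * factorial(7) / factorial(16)
p2 = factorial(7) * comb(10, 1) * factorial(9) / factorial(16)
print(p1, p2)   # both about 0.00087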

Example 4.3

If ten men and seven women randomly sit at a round table, find the probability
of the event that no two women sit next to each other.

Solution. To calculate the number of states of the event, we first arrange the men in 9! states. Now, we select seven of the ten places among the men for the women and then arrange them, which can be done in C(10,7) × 7! states. Consequently, the desired probability is equal to:
(9! × C(10,7) × 7!)/16!

Example 4.4

We want to choose a two-member group from six men and five women. If we select people at random, what is the probability that the selected group consists of one man and one woman?
Solution. If the order of choosing people does not matter, and assuming that all C(11,2) states of the sample space are equally likely, then the number of states of the desired event is equal to C(6,1) × C(5,1). Consequently, the desired probability is equal to:
C(6,1) × C(5,1)/C(11,2) = 30/55
However, if the order of choosing people matters, the sample space contains 11 × 10 = 110 states. Among them, there are 6 × 5 states in which a man is selected first, followed by a woman. Likewise, there are 5 × 6 states in which a woman is selected first, followed by a man. Therefore, the probability that the group consists of one man and one woman is equal to:
(6 × 5 + 5 × 6)/(11 × 10) = 60/110 = 30/55
The reader can observe that, in the version of the problem where the order of choosing people matters, the numbers of states in the numerator and denominator are both 2! times as many as in the case where the order does not matter; hence, the desired probability does not change. Generally, if a trial consists of selecting a sample of size 𝑟 without replacement, and there is no emphasis on the order of choices in the problem's event, then taking the order of choices into account multiplies the numbers of states in the numerator and denominator by 𝑟!; the proportion 𝑛(𝐸)/𝑛(𝑆) therefore does not change.

Note that if the sampling is done with replacement in this example, the total number of states is equal to 11 × 11 = 121, but the number of states of the event does not change; that is, there are 6 × 5 states in which a man is selected first, followed by a woman, and 5 × 6 states in which a woman is selected first, followed by a man. Therefore, if the sampling is done with replacement, the required probability is equal to (6 × 5 + 5 × 6)/(11 × 11) = 60/121, which is not equal to that of the case in which the selection is without replacement.
In general, in problems of selecting 𝑟 objects from 𝑛 objects, if the selection is done without replacement and the order of choices is not specified in the event of the problem, then the probability of the event is the same with or without consideration of the order. The reader should note that if we consider the order of choices in the numerator of the fraction 𝑃(𝐸) = 𝑛(𝐸)/𝑛(𝑆), the same should be done for the denominator, and vice versa. In problems of selecting with replacement, the order should be considered, and it should be noted that the answer to the problem is not necessarily equal to that of the problem of selecting without replacement.
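These rules can be seen numerically for Example 4.4 (our sketch, not part of the book's text):

# Ordered vs. unordered counting for Example 4.4.
from math import comb

unordered = comb(6, 1) * comb(5, 1) / comb(11, 2)   # 30/55
ordered   = (6 * 5 + 5 * 6) / (11 * 10)             # 60/110, same value
with_repl = (6 * 5 + 5 * 6) / (11 * 11)             # 60/121, differs
print(unordered, ordered, with_repl)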

Example 4.5

Suppose that we select four marbles, at random and with replacement, from a
box containing ten marbles numbered 1 through 10. What is the probability that the
smallest number selected is equal to 3?

Solution. Since the smallest number selected should be equal to 3, this number should be chosen at least once, and the other numbers selected should be greater than 3. Thus, to calculate the number of states of the event, we add up the states in which number 3 is selected once, twice, three times, and four times. Consequently, the required probability is equal to:
[C(4,1) × 1 × 7³ + C(4,2) × 1² × 7² + C(4,3) × 1³ × 7¹ + C(4,4) × 1⁴ × 7⁰]/10⁴ = [∑_{𝑖=1}^{4} C(4,𝑖) × 1ⁱ × 7⁴⁻ⁱ]/10⁴
= [∑_{𝑖=0}^{4} C(4,𝑖) × 1ⁱ × 7⁴⁻ⁱ − C(4,0) × 1⁰ × 7⁴]/10⁴ = [(7 + 1)⁴ − 7⁴]/10⁴ = (8⁴ − 7⁴)/10⁴
There is also another method to solve the problem. Suppose that the events E, F, and G are defined as follows:
E: the smallest number selected is equal to 3.
F: the smallest number selected is greater than 3.
G: the smallest number selected is greater than or equal to 3.
Therefore, with respect to the third axiom of probability theory, we have:
𝑃(𝐸) + 𝑃(𝐹) = 𝑃(𝐺) ⇒ 𝑃(𝐸) = 𝑃(𝐺) − 𝑃(𝐹) = 8⁴/10⁴ − 7⁴/10⁴
Note that saying the minimum of some numbers is greater than or equal to 3 means that all of the numbers are greater than or equal to 3. In addition, saying the minimum of some numbers is greater than 3 means that all of the numbers are greater than 3.
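Both routes to the answer can be verified by a short computation and simulation (our sketch, not part of the book's text):

# Example 4.5: smallest of four draws (with replacement, marbles 1..10)
# equals 3; exact answer (8**4 - 7**4) / 10**4 = 0.1695.
import random

exact = (8**4 - 7**4) / 10**4
trials, hits = 200_000, 0
for _ in range(trials):
    draws = [random.randint(1, 10) for _ in range(4)]
    hits += min(draws) == 3
print(exact, hits / trials)   # both about 0.1695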

Example 4.6

Solve the previous example under the condition of sampling without replacement as well.
Solution. Since the smallest number selected should be equal to 3, this number should be selected, and the other numbers selected should be greater than 3. Note that, without replacement, number 3 cannot be chosen more than once. Consequently, the required probability is equal to:
C(1,1) × C(7,3)/C(10,4)
Moreover, similar to the preceding example, by defining the events E, F, and G, we can solve this example by the following method:
𝑃(𝐸) + 𝑃(𝐹) = 𝑃(𝐺) ⇒ 𝑃(𝐸) = 𝑃(𝐺) − 𝑃(𝐹) = C(8,4)/C(10,4) − C(7,4)/C(10,4)

Furthermore, if the order of choices matters, the final answer does not change, as mentioned before. Owing to the consideration of the order of choices, however, the reader should be careful in counting the number of states of the event. That is, number 3 can occur in C(4,1) positions among the four draws. Once the draw yielding number 3 is specified, it has one state, and the other draws, which should be greater than 3, have 7, 6, and 5 states, respectively. Consequently, the required probability is equal to:
C(4,1) × 1 × 7 × 6 × 5/(10 × 9 × 8 × 7)

Moreover, we can solve this example by the following method:
𝑃(𝐸) + 𝑃(𝐹) = 𝑃(𝐺) ⇒ 𝑃(𝐸) = 𝑃(𝐺) − 𝑃(𝐹) = (8 × 7 × 6 × 5)/(10 × 9 × 8 × 7) − (7 × 6 × 5 × 4)/(10 × 9 × 8 × 7)

Example 4.7

Suppose that we select four marbles, at random and with replacement, from a box containing ten marbles numbered 1 through 10. What is the probability that the greatest number selected is equal to 6?
Solution. Since the greatest number selected should be equal to 6, this number should be chosen at least once, and the other numbers selected should be less than 6. Thus, to calculate the number of states of the event, we add up the states in which number 6 is selected once, twice, three times, and four times. Consequently, the required probability is equal to:
[C(4,1) × 1 × 5³ + C(4,2) × 1² × 5² + C(4,3) × 1³ × 5¹ + C(4,4) × 1⁴ × 5⁰]/10⁴ = [∑_{𝑖=1}^{4} C(4,𝑖) × 1ⁱ × 5⁴⁻ⁱ]/10⁴
= [∑_{𝑖=0}^{4} C(4,𝑖) × 1ⁱ × 5⁴⁻ⁱ − C(4,0) × 1⁰ × 5⁴]/10⁴ = [(5 + 1)⁴ − 5⁴]/10⁴ = (6⁴ − 5⁴)/10⁴
There is also another method to solve the problem. Suppose that events E, F,
and G are defined as follows:
E: the greatest number selected is equal to 6.
F: the greatest number selected is less than 6.
G: the greatest number selected is less than or equal to 6.
Therefore, with respect to the third axiom of probability theory, we have:
𝑃(𝐸) + 𝑃(𝐹) = 𝑃(𝐺) ⇒ 𝑃(𝐸) = 𝑃(𝐺) − 𝑃(𝐹) = 6⁴/10⁴ − 5⁴/10⁴
Note that saying the maximum of some numbers is less than or equal to 6 means that all of them are less than or equal to 6. In addition, saying the maximum of some numbers is less than 6 means that all of them are less than 6.

Example 4.8

Solve the previous example under the condition of sampling without replacement.
Solution. Since the greatest number selected should be equal to 6, this number should be selected, and the other numbers selected should be less than 6. Consequently, the required probability is equal to:
C(1,1) × C(5,3)/C(10,4)
Moreover, like the preceding example, by defining the events E, F, and G, this example can be solved by the following method:
𝑃(𝐸) + 𝑃(𝐹) = 𝑃(𝐺) ⇒ 𝑃(𝐸) = 𝑃(𝐺) − 𝑃(𝐹) = C(6,4)/C(10,4) − C(5,4)/C(10,4)
Alternatively, if we consider that the order of choices matters, then the required probability is equal to:
C(4,1) × 1 × 5 × 4 × 3/(10 × 9 × 8 × 7)

Moreover, we can solve this example by the following method:
𝑃(𝐸) + 𝑃(𝐹) = 𝑃(𝐺) ⇒ 𝑃(𝐸) = 𝑃(𝐺) − 𝑃(𝐹) = (6 × 5 × 4 × 3)/(10 × 9 × 8 × 7) − (5 × 4 × 3 × 2)/(10 × 9 × 8 × 7)

Example 4.9

Consider a group consisting of five freshmen and five sophomores. If we randomly divide these students into five two-member rooms, obtain the probability that each room contains one freshman and one sophomore.
Solution. If the rooms are considered to be different, then to calculate the number of desired states, we first put one freshman into each room, which can be done in 5! states. Then, we put one sophomore into each room, which can also be done in 5! states. Therefore, as the total number of ways of dividing 10 people into 5 distinct two-member rooms equals 10!/(2!)⁵, the required probability is equal to:
(5! × 5!)/(10!/(2!)⁵)
If the rooms are considered to be identical, then to calculate the number of desired states, we first put one freshman into each room, which can be done in 1 state because it does not matter which room is occupied. Then, we put one sophomore into each room, which can be done in 5! states because, after the freshmen are put into the rooms, the roommate matters to the sophomores. Therefore, as the total number of ways of dividing 10 people into 5 indistinguishable two-member rooms equals 10!/((2!)⁵ × 5!), the required probability is equal to:
(1 × 5!)/(10!/((2!)⁵ × 5!))
If the rooms are considered to be different, both the numerator and denominator of the fraction 𝑛(𝐸)/𝑛(𝑆) are 5! times as large as in the case where the rooms are considered to be identical. Consequently, the required probability is the same in these two cases. This problem can also be construed such that, when the rooms are different, it is essential to consider the order of arranging the groups; however, this point does not affect the value of the probability.

Example 4.10

Suppose that we want to distribute two identical white balls into two urns at
random. What is the probability that each urn gets one ball?
Solution. First, suppose that we want to examine this problem by using the relative frequency method. Assuming that the balls are identical, to compute the required probability, we need to repeat the trial of randomly distributing two identical balls into two urns many times and compute the proportion of times that the required event (each urn gets one ball) occurs. It is evident that numbering the balls does not affect the physical conditions of the trial; hence, whether the balls are numbered or not, the relative frequency of the required event, and therefore the desired probability, does not change.
However, if we want to use the formula 𝑛(𝐸)/𝑛(𝑆), an issue should be considered. Following the preceding chapter, if the balls are considered identical when counting the states of this trial, the total number of states is computed by the relationship C(𝑛 + 𝑟 − 1, 𝑟 − 1) = C(2 + 2 − 1, 2 − 1) = 3. However, the issue concerning such a representation of the sample space is that the probabilities of the created states are not the same. Therefore, in such spaces, we cannot use the formula 𝑛(𝐸)/𝑛(𝑆), since when the balls are randomly distributed into the urns, the actual states are as follows:
The first ball is put into the first urn, and the second ball is put into the first urn as well; the probability of this state is equal to 1/4.
The first ball is put into the second urn, and the second ball is put into the second urn as well; the probability of this state is equal to 1/4.
The first ball is put into the first urn, and the second ball is put into the second urn; the probability of this state is equal to 1/4.
The first ball is put into the second urn, and the second ball is put into the first urn; the probability of this state is equal to 1/4.

Nevertheless, the noteworthy point is that when the balls are considered to be identical, the third and fourth states above are indistinguishable to us, and the sample space of the problem turns into three states that do not possess equal probabilities:
Both balls are distributed into the first urn; the probability of this state is equal to 1/4.
Both balls are distributed into the second urn; the probability of this state is equal to 1/4.
Each urn receives one ball; the probability of this state is equal to 2/4.
Therefore, to obviate this obstacle and use the formula 𝑛(𝐸)/𝑛(𝑆) more conveniently, it is assumed that the balls (objects) are different. In fact, when we suppose that the objects are identical or indistinguishable, several states of the space often collapse into one, even though each of them has a distinct chance of occurring. This issue may result in the creation of a new space whose members are not equally likely. However, assuming that the objects are different means that we have assigned a separate chance to each physical state.
Apart from the problem above, there are other examples where the sample space can be written either in a way that its members are equally likely or in a way that they are not. In order to employ the formula 𝑛(𝐸)/𝑛(𝑆), it is necessary to write the space in a way that its members are equally likely.
For example, suppose that each child of a family is equally likely to be either a girl or a boy. For families with two children, the probability that both the first and second children are boys is equal to 1/4, and the probability that both the first and second children are girls is equal to 1/4. The probability that the first child is a boy and the second child is a girl equals 1/4, and the probability that the first child is a girl and the second child is a boy is equal to 1/4. Consequently, the probability that the family has one boy and one girl is equal to 2/4.
Therefore, the sample space can be represented as the four states {(𝑔,𝑏), (𝑏,𝑔), (𝑔,𝑔), (𝑏,𝑏)}, each of which occurs with the equal probability of 1/4. It can also be shown as the three states {{𝑏,𝑔}, {𝑔,𝑔}, {𝑏,𝑏}}, which occur with the probabilities 2/4, 1/4, and 1/4, respectively. We cannot use the formula 𝑛(𝐸)/𝑛(𝑆) to compute an event's probability in such a space; for instance, it would incorrectly say that the probability that both children are boys is equal to 1/3.

Likewise, the results of rolling two fair dice should be written as the 36 ordered pairs {(𝑖,𝑗) : 𝑖,𝑗 = 1,2,3,4,5,6} so that the members of the sample space can be assumed to be equally likely. This means, for example, that we have assigned separate probabilities of size 1/36 to (5,6), (6,5), and (6,6). Indeed, if we do not consider the order of the dice results, the probability that the two dice show a five and a six is twice as large as the probability that both results are six; consequently, such an unordered space is not equally likely.

Example 4.11

In rolling a pair of fair dice, obtain the probability that the sum of the upturned
faces equals 𝑖 given 𝑖 = 2, 3, … ,12.
Solution. As mentioned, the sample space belonging to the trial of rolling two fair dice can be written as 36 ordered pairs {(𝑖,𝑗) : 𝑖,𝑗 = 1,2,3,4,5,6}. Hence, if the event 𝐴ᵢ denotes that the sum of the upturned faces is equal to 𝑖, we have:
𝑃(𝐴₂) = 𝑃{(1,1)} = 1/36
𝑃(𝐴₃) = 𝑃{(1,2), (2,1)} = 2/36
𝑃(𝐴₄) = 𝑃{(1,3), (2,2), (3,1)} = 3/36
𝑃(𝐴₅) = 𝑃{(1,4), (2,3), (3,2), (4,1)} = 4/36
𝑃(𝐴₆) = 𝑃{(1,5), (2,4), (3,3), (4,2), (5,1)} = 5/36
𝑃(𝐴₇) = 𝑃{(1,6), (2,5), (3,4), (4,3), (5,2), (6,1)} = 6/36
𝑃(𝐴₈) = 𝑃{(2,6), (3,5), (4,4), (5,3), (6,2)} = 5/36
𝑃(𝐴₉) = 𝑃{(3,6), (4,5), (5,4), (6,3)} = 4/36
𝑃(𝐴₁₀) = 𝑃{(4,6), (5,5), (6,4)} = 3/36
𝑃(𝐴₁₁) = 𝑃{(5,6), (6,5)} = 2/36
𝑃(𝐴₁₂) = 𝑃{(6,6)} = 1/36
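The table of sums can be generated mechanically over the 36 equally likely pairs (our sketch, not part of the book's text):

# Distribution of the sum of two fair dice over the 36 ordered pairs.
from collections import Counter

counts = Counter(i + j for i in range(1, 7) for j in range(1, 7))
for s in range(2, 13):
    print(s, f"{counts[s]}/36")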
At the end of this section, it is emphasized again that the only allowable condition for using the formula 𝑃(𝐸) = 𝑛(𝐸)/𝑛(𝑆) to compute 𝑃(𝐸) is when all the states of the sample space are equally likely. For instance, in problems which examine distributing objects into urns, if we suppose that the objects are identical, the space is often not equally likely, and it is not possible to use the formula 𝑃(𝐸) = 𝑛(𝐸)/𝑛(𝑆).

Using the formula 𝑃(𝐸) = 𝑛(𝐸)/𝑛(𝑆) in an equally likely sample space is perhaps the most important result arising from the principles of probability theory. However, other important propositions with many applications in probability theory can also be proven from these principles. Some of these propositions are presented in this chapter.

Proposition 5-1
𝑃(𝐸ᶜ) = 1 − 𝑃(𝐸)
Proof.
𝑃(𝐸 ∪ 𝐸ᶜ) = 𝑃(𝑆) = 1 and 𝑃(𝐸 ∪ 𝐸ᶜ) = 𝑃(𝐸) + 𝑃(𝐸ᶜ) ⇒ 𝑃(𝐸ᶜ) = 1 − 𝑃(𝐸)
Sometimes the 𝐸ᵢ's are events whose probability of not occurring is more straightforward to calculate than their probability of occurring. In such conditions, using De Morgan's law, we have:
(𝐸₁ ∪ 𝐸₂)ᶜ = 𝐸₁ᶜ ∩ 𝐸₂ᶜ ⇒ 𝑃(𝐸₁ ∪ 𝐸₂) = 1 − 𝑃((𝐸₁ ∪ 𝐸₂)ᶜ) = 1 − 𝑃(𝐸₁ᶜ ∩ 𝐸₂ᶜ)
(𝐸₁ ∩ 𝐸₂)ᶜ = 𝐸₁ᶜ ∪ 𝐸₂ᶜ ⇒ 𝑃(𝐸₁ ∩ 𝐸₂) = 1 − 𝑃((𝐸₁ ∩ 𝐸₂)ᶜ) = 1 − 𝑃(𝐸₁ᶜ ∪ 𝐸₂ᶜ)
or, more generally:
(𝐸₁ ∪ 𝐸₂ ∪ ⋯ ∪ 𝐸ₙ)ᶜ = 𝐸₁ᶜ ∩ 𝐸₂ᶜ ∩ ⋯ ∩ 𝐸ₙᶜ ⇒ 𝑃(𝐸₁ ∪ 𝐸₂ ∪ ⋯ ∪ 𝐸ₙ) = 1 − 𝑃(𝐸₁ᶜ ∩ 𝐸₂ᶜ ∩ ⋯ ∩ 𝐸ₙᶜ)
(𝐸₁ ∩ 𝐸₂ ∩ ⋯ ∩ 𝐸ₙ)ᶜ = 𝐸₁ᶜ ∪ 𝐸₂ᶜ ∪ ⋯ ∪ 𝐸ₙᶜ ⇒ 𝑃(𝐸₁ ∩ 𝐸₂ ∩ ⋯ ∩ 𝐸ₙ) = 1 − 𝑃(𝐸₁ᶜ ∪ 𝐸₂ᶜ ∪ ⋯ ∪ 𝐸ₙᶜ)
In fact, if the 𝐸ᵢ's are events whose probability of not occurring is more straightforward to calculate than their probability of occurring, the probability that at least one of the 𝐸ᵢ's occurs can be written as one minus the probability that none of the 𝐸ᵢ's occurs. Likewise, the probability that all of the 𝐸ᵢ's occur can be written as one minus the probability that at least one of the 𝐸ᵢ's does not occur.
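A small sanity check of De Morgan's law with Python sets (our sketch, not part of the book's text):

# Verify De Morgan's law on the sample space of one die toss.
S = set(range(1, 7))
E, F = {1, 2, 3}, {3, 4}

print(S - (E & F) == (S - E) | (S - F))   # (E ∩ F)^c = E^c ∪ F^c -> True
print(S - (E | F) == (S - E) & (S - F))   # (E ∪ F)^c = E^c ∩ F^c -> True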
Proposition 5-2
𝐸 ⊂ 𝐹 ⇒ 𝑃(𝐸) ≤ 𝑃(𝐹)
Proof. We write the event F as the union of the two disjoint events E and 𝐸 𝑐 ∩ 𝐹.
Therefore, according to the third principle of the probability theory, we have:
𝐹 = 𝐸 ∪ (𝐸 𝑐 ∩ 𝐹) ⇒ 𝑃(𝐹) = 𝑃(𝐸) + 𝑃(𝐸 𝑐 ∩ 𝐹) ⇒ 𝑃(𝐸) ≤ 𝑃(𝐹)

For example, if E and F are two arbitrary events, then using Proposition 5-2, we have:
(𝐸 ∩ 𝐹) ⊂ 𝐹 ⊂ (𝐸 ∪ 𝐹) ⇒ 𝑃(𝐸 ∩ 𝐹) ≤ 𝑃(𝐹) ≤ 𝑃(𝐸 ∪ 𝐹)

(𝐸 − 𝐹) ⊂ (𝐸𝛥𝐹) ⊂ (𝐸 ∪ 𝐹) ⇒ 𝑃(𝐸 − 𝐹) ≤ 𝑃(𝐸𝛥𝐹) ≤ 𝑃(𝐸 ∪ 𝐹)

Likewise, using Proposition 5-2, we have:

𝑃(𝐸 ∩ 𝐹) ≤ 𝑚𝑖𝑛{ 𝑃(𝐸), 𝑃(𝐹)} (5-1)

𝑃(𝐸 ∪ 𝐹) ≥ 𝑚𝑎𝑥{ 𝑃(𝐸), 𝑃(𝐹)} (5-2)

To prove relationships (5-1) and (5-2), we have:
𝑃(𝐸 ∩ 𝐹) ≤ 𝑃(𝐸) and 𝑃(𝐸 ∩ 𝐹) ≤ 𝑃(𝐹) ⇒ 𝑃(𝐸 ∩ 𝐹) ≤ 𝑚𝑖𝑛{𝑃(𝐸), 𝑃(𝐹)}
𝑃(𝐸 ∪ 𝐹) ≥ 𝑃(𝐸) and 𝑃(𝐸 ∪ 𝐹) ≥ 𝑃(𝐹) ⇒ 𝑃(𝐸 ∪ 𝐹) ≥ 𝑚𝑎𝑥{𝑃(𝐸), 𝑃(𝐹)}

Proposition 5-3
𝑃(𝐸 ∪ 𝐹) = 𝑃(𝐸) + 𝑃(𝐹) − 𝑃(𝐸 ∩ 𝐹)

Proof. Based on Figure 2-10, if we write the event 𝐸 ∪ 𝐹 as the union of the disjoint events I, II, and III, then based on the third axiom of probability theory, 𝑃(𝐸 ∪ 𝐹) is equal to the sum of these events' probabilities. Hence, to calculate 𝑃(𝐸 ∪ 𝐹), we can add up 𝑃(𝐸) and 𝑃(𝐹). Nonetheless, in doing so, we count the region II twice; therefore, it should be removed from the calculation once.

Figure 2-10 The event 𝐸 ∪ 𝐹

𝑃(𝐸) = 𝑃(I) + 𝑃(II)
𝑃(𝐹) = 𝑃(III) + 𝑃(II)
𝑃(𝐸 ∪ 𝐹) = 𝑃(I) + 𝑃(II) + 𝑃(III)
⇒ 𝑃(𝐸) + 𝑃(𝐹) − 𝑃(𝐸 ∩ 𝐹) = [𝑃(I) + 𝑃(II)] + [𝑃(III) + 𝑃(II)] − 𝑃(II) = 𝑃(I) + 𝑃(II) + 𝑃(III) = 𝑃(𝐸 ∪ 𝐹)
Therefore, the value of 𝑃(𝐸 ∪ 𝐹), the probability that at least one of the two events E or F occurs, can be written in the two following ways:
𝑃(𝐸 ∪ 𝐹) = 𝑃(𝐸) + 𝑃(𝐹) − 𝑃(𝐸 ∩ 𝐹) = 1 − 𝑃(𝐸ᶜ ∩ 𝐹ᶜ)
If we have three events, then the above relationship, together with the distributive law, leads to:
𝑃(𝐸1 ∪ 𝐸2 ∪ 𝐸3 ) = 𝑃(𝐸1 ) + 𝑃(𝐸2 ) + 𝑃(𝐸3 ) − 𝑃(𝐸1 ∩ 𝐸2 ) − 𝑃(𝐸1 ∩ 𝐸3 ) − 𝑃(𝐸2 ∩ 𝐸3 ) + 𝑃(𝐸1 ∩ 𝐸2 ∩ 𝐸3 )

Proof.
𝑃(𝐸1 ∪ 𝐸2 ∪ 𝐸3 )
= 𝑃((𝐸1 ∪ 𝐸2 ) ∪ 𝐸3 ) = [𝑃(𝐸1 ) + 𝑃(𝐸2 ) − 𝑃(𝐸1 ∩ 𝐸2 )] + [𝑃(𝐸3 )] − [𝑃((𝐸1 ∪ 𝐸2 ) ∩ 𝐸3 )]
= 𝑃(𝐸1 ) + 𝑃(𝐸2 ) + 𝑃(𝐸3 ) − 𝑃(𝐸1 ∩ 𝐸2 ) − 𝑃((𝐸1 ∩ 𝐸3 ) ∪ (𝐸2 ∩ 𝐸3 ))
= 𝑃(𝐸1 ) + 𝑃(𝐸2 ) + 𝑃(𝐸3 ) − 𝑃(𝐸1 ∩ 𝐸2 ) − [𝑃(𝐸1 ∩ 𝐸3 ) + 𝑃(𝐸2 ∩ 𝐸3 ) − 𝑃(𝐸1 ∩ 𝐸2 ∩ 𝐸3 )]
= 𝑃(𝐸1 ) + 𝑃(𝐸2 ) + 𝑃(𝐸3 ) − 𝑃(𝐸1 ∩ 𝐸2 ) − 𝑃(𝐸1 ∩ 𝐸3 ) − 𝑃(𝐸2 ∩ 𝐸3 ) + 𝑃(𝐸1 ∩ 𝐸2 ∩ 𝐸3 )
Likewise, using induction, it can be shown that the generalization of the
preceding relationship is as follows:
𝑃(𝐸₁ ∪ 𝐸₂ ∪ … ∪ 𝐸ₙ) = 𝑃(𝐸₁) + 𝑃(𝐸₂) + ⋯ + 𝑃(𝐸ₙ)
− 𝑃(𝐸₁ ∩ 𝐸₂) − 𝑃(𝐸₁ ∩ 𝐸₃) − ⋯ − 𝑃(𝐸ₙ₋₁ ∩ 𝐸ₙ)
+ 𝑃(𝐸₁ ∩ 𝐸₂ ∩ 𝐸₃) + 𝑃(𝐸₁ ∩ 𝐸₃ ∩ 𝐸₄) + ⋯ + 𝑃(𝐸ₙ₋₂ ∩ 𝐸ₙ₋₁ ∩ 𝐸ₙ)
− ⋯ + (−1)ⁿ⁺¹𝑃(𝐸₁ ∩ 𝐸₂ ∩ … ∩ 𝐸ₙ)
Note that if the 𝐸ᵢ's are such that:
𝑃(𝐸₁) = 𝑃(𝐸₂) = ⋯ = 𝑃(𝐸ₙ)
𝑃(𝐸₁ ∩ 𝐸₂) = 𝑃(𝐸₁ ∩ 𝐸₃) = ⋯ = 𝑃(𝐸ₙ₋₁ ∩ 𝐸ₙ)
⋮
then the probability that at least one of the events 𝐸₁, 𝐸₂, …, 𝐸ₙ occurs is equal to:
𝑃(𝐸₁ ∪ 𝐸₂ ∪ … ∪ 𝐸ₙ) = C(𝑛,1)𝑃(𝐸₁) − C(𝑛,2)𝑃(𝐸₁ ∩ 𝐸₂) + ⋯ + (−1)ⁿ⁺¹C(𝑛,𝑛)𝑃(𝐸₁ ∩ 𝐸₂ ∩ … ∩ 𝐸ₙ)
In such conditions, the probability that at least one of the 𝐸ᵢ's occurs can therefore be calculated in the two following ways:
𝑃(𝐸₁ ∪ 𝐸₂ ∪ … ∪ 𝐸ₙ) = C(𝑛,1)𝑃(𝐸₁) − C(𝑛,2)𝑃(𝐸₁ ∩ 𝐸₂) + ⋯ + (−1)ⁿ⁺¹C(𝑛,𝑛)𝑃(𝐸₁ ∩ 𝐸₂ ∩ … ∩ 𝐸ₙ)
= 1 − 𝑃(𝐸₁ᶜ ∩ 𝐸₂ᶜ ∩ … ∩ 𝐸ₙᶜ)
Proposition 5-4
𝑃(𝐸 ∩ 𝐹ᶜ) = 𝑃(𝐸 − 𝐹) = 𝑃(𝐸) − 𝑃(𝐸 ∩ 𝐹)
Proof. If we write the event E as the union of the disjoint events 𝐸 ∩ 𝐹ᶜ and 𝐸 ∩ 𝐹, then according to the third axiom of probability theory, we have:
𝐸 = (𝐸 ∩ 𝐹ᶜ) ∪ (𝐸 ∩ 𝐹) ⇒ 𝑃(𝐸) = 𝑃(𝐸 ∩ 𝐹ᶜ) + 𝑃(𝐸 ∩ 𝐹) ⇒ 𝑃(𝐸 ∩ 𝐹ᶜ) = 𝑃(𝐸 − 𝐹) = 𝑃(𝐸) − 𝑃(𝐸 ∩ 𝐹)

Proposition 5-5
𝑃(𝐸𝛥𝐹) = 𝑃(𝐸) + 𝑃(𝐹) − 2𝑃(𝐸 ∩ 𝐹)

Proof. If we write the event 𝐸 ∪ 𝐹 as the union of the disjoint events 𝐸𝛥𝐹 and 𝐸 ∩ 𝐹, then according to the third axiom of probability theory, we have:
(𝐸 ∪ 𝐹) = (𝐸𝛥𝐹) ∪ (𝐸 ∩ 𝐹) ⇒ 𝑃(𝐸 ∪ 𝐹) = 𝑃(𝐸𝛥𝐹) + 𝑃(𝐸 ∩ 𝐹)
⇒ 𝑃(𝐸𝛥𝐹) = 𝑃(𝐸 ∪ 𝐹) − 𝑃(𝐸 ∩ 𝐹) = 𝑃(𝐸) + 𝑃(𝐹) − 2𝑃(𝐸 ∩ 𝐹)

Proposition 5-6
For two arbitrary events E and F belonging to a sample space, we have:
𝑃(𝐸) + 𝑃(𝐹) − 1 ≤ 𝑃(𝐸 ∩ 𝐹)
Proof. Based on the first principle of the probability theory, since the probability of
one event cannot be greater than 1, we have:
𝑃(𝐸 ∪ 𝐹) ≤ 1 ⇒ 𝑃(𝐸) + 𝑃(𝐹) − 𝑃(𝐸 ∩ 𝐹) ≤ 1 ⇒ 𝑃(𝐸) + 𝑃(𝐹) − 1 ≤ 𝑃(𝐸 ∩ 𝐹)
The inequality above is called Bonferroni's inequality.
Hence, considering relationship (5-1) and Proposition (5-6) simultaneously, we
have:
𝑃(𝐸) + 𝑃(𝐹) − 1 ≤ 𝑃(𝐸 ∩ 𝐹) ≤ 𝑀𝑖𝑛{𝑃(𝐸), 𝑃(𝐹)}
Generalization of Bonferroni's inequality for the case of having three events
is as follows:
𝑃(𝐸1 ∩ 𝐸2 ∩ 𝐸3 ) ≥ 𝑃(𝐸1 ∩ 𝐸2 ) + 𝑃(𝐸3 ) − 1 ≥ (𝑃(𝐸1 ) + 𝑃(𝐸2 ) − 1) + 𝑃(𝐸3 ) − 1
= 𝑃(𝐸1 ) + 𝑃(𝐸2 ) + 𝑃(𝐸3 ) − 2
Likewise, using induction, it can be shown that the generalization of
Bonferroni's inequality when we have n events is equal to:
𝑃(𝐸1 ) + ⋯ + 𝑃(𝐸𝑛 ) − (𝑛 − 1) ≤ 𝑃(𝐸1 ∩ 𝐸2 ∩ … ∩ 𝐸𝑛 )
Therefore, in general, a lower and upper bound of the value 𝑃(𝐸1 ∩ 𝐸2 ∩ … ∩
𝐸𝑛 ) can be written as:
𝑃(𝐸1 ) + ⋯ + 𝑃(𝐸𝑛 ) − (𝑛 − 1) ≤ 𝑃(𝐸1 ∩ 𝐸2 ∩ … ∩ 𝐸𝑛 ) ≤ 𝑀𝑖𝑛{𝑃(𝐸𝑖 )}

For example, if 𝑃(𝐸) = 0.9 and 𝑃(𝐹) = 0.8, the minimum and maximum values
of 𝑃(𝐸 ∩ 𝐹) are calculated by using the preceding relationship as follows:
𝑃(𝐸) + 𝑃(𝐹) − 1 ≤ 𝑃(𝐸 ∩ 𝐹) ≤ 𝑀𝑖𝑛{𝑃(𝐸), 𝑃(𝐹)} ⇒ 0.7 ≤ 𝑃(𝐸 ∩ 𝐹) ≤ 0.8
Alternatively, if 𝑃(𝐸) = 0.2 and 𝑃(𝐹) = 0.3, the minimum and maximum values
of 𝑃(𝐸 ∩ 𝐹) are calculated by using the relationship as:
𝑃(𝐸) + 𝑃(𝐹) − 1 ≤ 𝑃(𝐸 ∩ 𝐹) ≤ 𝑀𝑖𝑛{𝑃(𝐸), 𝑃(𝐹)} ⇒ 0 ≤ 𝑃(𝐸 ∩ 𝐹) ≤ 0.2
In addition, if 𝑃(𝐸1 ) = 𝑃(𝐸2 ) = 𝑃(𝐸3 ) = 0.9, the minimum and maximum
values of 𝑃(𝐸1 ∩ 𝐸2 ∩ 𝐸3 ) are calculated by using the same relationship as:
𝑃(𝐸1 ) + 𝑃(𝐸2 ) + 𝑃(𝐸3 ) − 2 ≤ 𝑃(𝐸1 ∩ 𝐸2 ∩ 𝐸3 ) ≤ 𝑀𝑖𝑛{𝑃(𝐸𝑖 )} ⇒ 0.7 ≤ 𝑃(𝐸1 ∩ 𝐸2 ∩ 𝐸3 ) ≤ 0.9
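These bounds are simple to compute for any list of event probabilities (our sketch, not part of the book's text):

# Lower and upper bounds on P(E1 ∩ ... ∩ En): Bonferroni's inequality
# below, Proposition 5-2 above.
def intersection_bounds(probs):
    lower = max(0.0, sum(probs) - (len(probs) - 1))
    upper = min(probs)
    return lower, upper

print(intersection_bounds([0.9, 0.8]))        # (0.7, 0.8)
print(intersection_bounds([0.2, 0.3]))        # (0.0, 0.2)
print(intersection_bounds([0.9, 0.9, 0.9]))   # (0.7, 0.9)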

Proposition 5-7
For the two arbitrary events E and F belonging to a common sample space, we have:
𝑃(𝐸 ∪ 𝐹) ≤ 𝑃(𝐸) + 𝑃(𝐹)

Proof.
𝑃(𝐸 ∪ 𝐹) = 𝑃(𝐸) + 𝑃(𝐹) − 𝑃(𝐸 ∩ 𝐹) ⇒ 𝑃(𝐸 ∪ 𝐹) ≤ 𝑃(𝐸) + 𝑃(𝐹)
Consequently, considering relationship (5-2) and Proposition (5-7), we have:
𝑀𝑎𝑥{ 𝑃(𝐸), 𝑃(𝐹)} ≤ 𝑃(𝐸 ∪ 𝐹) ≤ 𝑃(𝐸) + 𝑃(𝐹)
Likewise, using induction, it can be shown that:
𝑀𝑎𝑥{𝑃(𝐸𝑖 )} ≤ 𝑃(𝐸1 ∪ 𝐸2 ∪ … ∪ 𝐸𝑛 ) ≤ 𝑃(𝐸1 ) + ⋯ + 𝑃(𝐸𝑛 )
For example, if 𝑃(𝐸) = 0.2 and 𝑃(𝐹) = 0.3, the minimum and maximum values
of 𝑃(𝐸 ∪ 𝐹) are calculated by using the preceding relationship as follows:
𝑀𝑎𝑥{𝑃(𝐸), 𝑃(𝐹)} ≤ 𝑃(𝐸 ∪ 𝐹) ≤ 𝑃(𝐸) + 𝑃(𝐹) ⇒ 0.3 ≤ 𝑃(𝐸 ∪ 𝐹) ≤ 0.5
Alternatively, if 𝑃(𝐸) = 0.9 and 𝑃(𝐹) = 0.8, the minimum and maximum values
of 𝑃(𝐸 ∪ 𝐹) are calculated by using the same relationship as follows:
𝑀𝑎𝑥{𝑃(𝐸), 𝑃(𝐹)} ≤ 𝑃(𝐸 ∪ 𝐹) ≤ 𝑃(𝐸) + 𝑃(𝐹) ⇒ 0.9 ≤ 𝑃(𝐸 ∪ 𝐹) ≤ 1

In the following pages, in order to provide a better perception of the
relationships and propositions resulting from the probability theory principles, some
relevant examples are addressed.

Example 5.1

If we randomly distribute ten distinguishable marbles into four urns, it is desired to calculate the probability that:
a. The urn 1 is not empty.
b. At least one of the urns 1 and 2 is not empty.
c. At least one of the urns 1 and 2 is empty.
d. The urns 1 and 2 are not empty.
e. None of the urns is empty.
f. The urn 1 is empty and the urn 2 is not empty.
g. Exactly one of the urns 1 and 2 is empty.

Solution.
a. Direct calculation of the probability that an urn is not empty is challenging and time-consuming, since a non-empty urn can receive one marble or more. Nevertheless, calculating the complementary probability is straightforward. Suppose that 𝐸ᵢ is the event that urn 𝑖 is not empty, meaning that it receives at least one marble. Then we have:
𝑃(𝐸₁) = 1 − 𝑃(𝐸₁ᶜ) = 1 − 3¹⁰/4¹⁰
Note that urn 1 being empty means all the marbles are put into urn 2, 3, or 4. According to type 4 of the ball-and-urn problems in the preceding chapter, there are 3¹⁰ states to do so.
b.
𝑃(𝐸₁ ∪ 𝐸₂) = 1 − 𝑃(𝐸₁ᶜ ∩ 𝐸₂ᶜ) = 1 − 2¹⁰/4¹⁰
Note that urns 1 and 2 both being empty means all the marbles are put into urn 3 or 4, and there are 2¹⁰ states to do so.
c.
𝑃(𝐸₁ᶜ ∪ 𝐸₂ᶜ) = 𝑃(𝐸₁ᶜ) + 𝑃(𝐸₂ᶜ) − 𝑃(𝐸₁ᶜ ∩ 𝐸₂ᶜ) = 3¹⁰/4¹⁰ + 3¹⁰/4¹⁰ − 2¹⁰/4¹⁰
d.
𝑃(𝐸₁ ∩ 𝐸₂) = 1 − 𝑃(𝐸₁ᶜ ∪ 𝐸₂ᶜ) = 1 − [𝑃(𝐸₁ᶜ) + 𝑃(𝐸₂ᶜ) − 𝑃(𝐸₁ᶜ ∩ 𝐸₂ᶜ)] = 1 − [3¹⁰/4¹⁰ + 3¹⁰/4¹⁰ − 2¹⁰/4¹⁰]
e.
𝑃(𝐸₁ ∩ 𝐸₂ ∩ 𝐸₃ ∩ 𝐸₄) = 1 − 𝑃(𝐸₁ᶜ ∪ 𝐸₂ᶜ ∪ 𝐸₃ᶜ ∪ 𝐸₄ᶜ)
= 1 − 𝑃(𝐸₁ᶜ) − 𝑃(𝐸₂ᶜ) − 𝑃(𝐸₃ᶜ) − 𝑃(𝐸₄ᶜ) + 𝑃(𝐸₁ᶜ ∩ 𝐸₂ᶜ) + 𝑃(𝐸₁ᶜ ∩ 𝐸₃ᶜ) + 𝑃(𝐸₁ᶜ ∩ 𝐸₄ᶜ) + 𝑃(𝐸₂ᶜ ∩ 𝐸₃ᶜ) + 𝑃(𝐸₂ᶜ ∩ 𝐸₄ᶜ) + 𝑃(𝐸₃ᶜ ∩ 𝐸₄ᶜ) − 𝑃(𝐸₁ᶜ ∩ 𝐸₂ᶜ ∩ 𝐸₃ᶜ) − 𝑃(𝐸₁ᶜ ∩ 𝐸₂ᶜ ∩ 𝐸₄ᶜ) − 𝑃(𝐸₁ᶜ ∩ 𝐸₃ᶜ ∩ 𝐸₄ᶜ) − 𝑃(𝐸₂ᶜ ∩ 𝐸₃ᶜ ∩ 𝐸₄ᶜ) + 𝑃(𝐸₁ᶜ ∩ 𝐸₂ᶜ ∩ 𝐸₃ᶜ ∩ 𝐸₄ᶜ)
Since, in this problem, the equations 𝑃(𝐸₁ᶜ) = ⋯ = 𝑃(𝐸₄ᶜ), 𝑃(𝐸₁ᶜ ∩ 𝐸₂ᶜ) = ⋯ = 𝑃(𝐸₃ᶜ ∩ 𝐸₄ᶜ), and 𝑃(𝐸₁ᶜ ∩ 𝐸₂ᶜ ∩ 𝐸₃ᶜ) = ⋯ = 𝑃(𝐸₂ᶜ ∩ 𝐸₃ᶜ ∩ 𝐸₄ᶜ) are valid, the desired probability is equal to:
= 1 − C(4,1)𝑃(𝐸₁ᶜ) + C(4,2)𝑃(𝐸₁ᶜ ∩ 𝐸₂ᶜ) − C(4,3)𝑃(𝐸₁ᶜ ∩ 𝐸₂ᶜ ∩ 𝐸₃ᶜ) + C(4,4)𝑃(𝐸₁ᶜ ∩ 𝐸₂ᶜ ∩ 𝐸₃ᶜ ∩ 𝐸₄ᶜ)
= 1 − C(4,1) × 3¹⁰/4¹⁰ + C(4,2) × 2¹⁰/4¹⁰ − C(4,3) × 1¹⁰/4¹⁰ + C(4,4) × 0¹⁰/4¹⁰

Note that this problem is similar to type 5 of the ball-and-urn problems, except that here the probability of the event is required, while in the preceding chapter its number of states was required. Therefore, with respect to the formula 𝑃(𝐸) = 𝑛(𝐸)/𝑛(𝑆), if the number of states of problem type 5 in the preceding chapter is divided by the total number of states, 4¹⁰, then the probability of the event in this part is obtained.

f.
𝑃(𝐸₁ᶜ ∩ 𝐸₂) = 𝑃(𝐸₁ᶜ) − 𝑃(𝐸₁ᶜ ∩ 𝐸₂ᶜ) = 3¹⁰/4¹⁰ − 2¹⁰/4¹⁰
g.
𝑃(𝐸₁ᶜ𝛥𝐸₂ᶜ) = 𝑃(𝐸₁ᶜ) + 𝑃(𝐸₂ᶜ) − 2𝑃(𝐸₁ᶜ ∩ 𝐸₂ᶜ) = 3¹⁰/4¹⁰ + 3¹⁰/4¹⁰ − 2 × 2¹⁰/4¹⁰ = 2 × 3¹⁰/4¹⁰ − 2 × 2¹⁰/4¹⁰
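Part (e) can be verified both by the inclusion-exclusion formula and by brute-force enumeration (our sketch, not part of the book's text):

# P(no urn empty): 10 distinguishable marbles into 4 urns.
from math import comb
from itertools import product

n, r = 10, 4
incl_excl = sum((-1)**k * comb(r, k) * (r - k)**n for k in range(r + 1)) / r**n
brute = sum(len(set(w)) == r for w in product(range(r), repeat=n)) / r**n
print(incl_excl, brute)   # both about 0.7806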

Example 5.2

An urn contains 11 marbles numbered 1 through 11. If we select five marbles at random and with replacement from the urn, it is desired to calculate the probability that:
a. The marble 1 is selected.
b. At least one of the marbles 1 and 2 is selected.
c. The marbles 1 and 2 are selected.
d. The marble 1 is selected, but the marble 2 is not selected.

Solution.
a. Since selections are with replacement, direct calculation of the probability
that the marbles are selected is time-consuming because one marble can be
selected once or more than once. However, calculating its complementary
probability is straightforward. For instance, in this example, the number of
states of selecting five marbles with replacement such that the marble 𝑖 is not
selected equals 105 . Hence, if 𝐸𝑖 is the event that the marble 𝑖 is selected, then
we have:
105
𝑃(𝐸1 ) = 1 − 𝑃(𝐸1 𝑐 ) = 1 −
115
b.
𝑃(𝐸₁ ∪ 𝐸₂) = 1 − 𝑃(𝐸₁ᶜ ∩ 𝐸₂ᶜ) = 1 − 9⁵/11⁵
c.
𝑃(𝐸₁ ∩ 𝐸₂) = 1 − 𝑃(𝐸₁ᶜ ∪ 𝐸₂ᶜ) = 1 − [𝑃(𝐸₁ᶜ) + 𝑃(𝐸₂ᶜ) − 𝑃(𝐸₁ᶜ ∩ 𝐸₂ᶜ)] = 1 − [10⁵/11⁵ + 10⁵/11⁵ − 9⁵/11⁵]
d.
𝑃(𝐸₁ ∩ 𝐸₂ᶜ) = 𝑃(𝐸₁) − 𝑃(𝐸₁ ∩ 𝐸₂) = (1 − 10⁵/11⁵) − [1 − (10⁵/11⁵ + 10⁵/11⁵ − 9⁵/11⁵)] = 10⁵/11⁵ − 9⁵/11⁵
Alternatively, since calculating the probability of 𝐸ᵢ is more complicated than that of its complement, we have:
𝑃(𝐸₁ ∩ 𝐸₂ᶜ) = 𝑃(𝐸₂ᶜ ∩ 𝐸₁) = 𝑃(𝐸₂ᶜ) − 𝑃(𝐸₂ᶜ ∩ 𝐸₁ᶜ) = 10⁵/11⁵ − 9⁵/11⁵

Example 5.3

Suppose that each of the people in a five-member group is given 12 cards numbered 1 through 12. If each of them randomly selects one of his or her cards ranging from 1 to 12, what is the probability that at least one of them chooses card number 1?
Solution. Suppose that 𝐸ᵢ denotes the event that person 𝑖 chooses card number 1. Hence, the probability that at least one of the people chooses it is equal to:
𝑃(𝐸₁ ∪ 𝐸₂ ∪ … ∪ 𝐸₅) = 1 − 𝑃(𝐸₁ᶜ ∩ 𝐸₂ᶜ ∩ … ∩ 𝐸₅ᶜ) = 1 − 11⁵/12⁵

Another method to solve this problem is as follows:
𝑃(𝐸₁ ∪ 𝐸₂ ∪ 𝐸₃ ∪ 𝐸₄ ∪ 𝐸₅) = 𝑃(𝐸₁) + ⋯ + 𝑃(𝐸₅) − 𝑃(𝐸₁ ∩ 𝐸₂) − ⋯ − 𝑃(𝐸₄ ∩ 𝐸₅) + ⋯ + 𝑃(𝐸₁ ∩ 𝐸₂ ∩ 𝐸₃ ∩ 𝐸₄ ∩ 𝐸₅)
= C(5,1)𝑃(𝐸₁) − C(5,2)𝑃(𝐸₁ ∩ 𝐸₂) + C(5,3)𝑃(𝐸₁ ∩ 𝐸₂ ∩ 𝐸₃) − C(5,4)𝑃(𝐸₁ ∩ 𝐸₂ ∩ 𝐸₃ ∩ 𝐸₄) + C(5,5)𝑃(𝐸₁ ∩ 𝐸₂ ∩ 𝐸₃ ∩ 𝐸₄ ∩ 𝐸₅)
= C(5,1)(1/12) − C(5,2)(1/12)² + C(5,3)(1/12)³ − C(5,4)(1/12)⁴ + C(5,5)(1/12)⁵

A wrong way to solve this problem is to merely satisfy the restriction that at least one of the 𝐸ᵢ's occurs. That is, we first select one of the people to choose card number 1, which is possible in C(5,1) states. Then, the remaining people can choose their desired cards without restriction, which is possible in 12⁴ states. According to this method, the required probability of the example would be:
C(5,1) × 12⁴/12⁵ = 5/12
which is wrong: in this method, many states are counted multiple times, and, contrary to the previous method, these multiply counted states are not subtracted from the above answer. In fact, the result of this approach is equal to the first term of the above solution, which is C(5,1)𝑃(𝐸₁) = C(5,1)(1/12).
1 1

Example 5.4

If we randomly arrange four sister-brother couples in a row, what is the probability that the members of no couple are next to each other?
Solution. Solving this problem directly is tough, because the number of states in which no couple's members are next to each other cannot be enumerated directly by the principle of multiplication: if we place people one by one from left to right, the number of admissible states for each next person depends on whether his or her sibling has already been seated in one of the neighboring places. Another method to solve this example is to use the complementary method and De Morgan's law. Suppose that 𝐸ᵢ denotes the event that the individuals of couple 𝑖 are next to each other. Therefore, we have:
𝑃(𝐸₁ᶜ ∩ 𝐸₂ᶜ ∩ 𝐸₃ᶜ ∩ 𝐸₄ᶜ) = 1 − 𝑃(𝐸₁ ∪ 𝐸₂ ∪ 𝐸₃ ∪ 𝐸₄)
= 1 − C(4,1)𝑃(𝐸₁) + C(4,2)𝑃(𝐸₁ ∩ 𝐸₂) − C(4,3)𝑃(𝐸₁ ∩ 𝐸₂ ∩ 𝐸₃) + C(4,4)𝑃(𝐸₁ ∩ 𝐸₂ ∩ 𝐸₃ ∩ 𝐸₄)
= 1 − C(4,1)(7! × 2!)/8! + C(4,2)(6! × 2! × 2!)/8! − C(4,3)(5! × 2! × 2! × 2!)/8! + C(4,4)(4! × 2! × 2! × 2! × 2!)/8!
For instance, 𝑃(𝐸₁ ∩ 𝐸₂) means that the individuals of couple 1 are next to each other, and so are those of couple 2. The total number of states is equal to 8!. For such an event to occur, we regard the individuals of couple 1 as one group, those of couple 2 as one group, and each of the remaining four people as one group. Therefore, we have six groups altogether, which can be arranged in a row in 6! states; and since each of couples 1 and 2 can be arranged within its group in 2! states, 𝑃(𝐸₁ ∩ 𝐸₂) is equal to:
(6! × 2! × 2!)/8!
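The final answer can be checked against a simulation (our sketch, not part of the book's text):

# P(no sibling couple adjacent) for 4 couples (8 people) in a row.
import random
from math import comb, factorial

exact = 1 - sum((-1)**(k + 1) * comb(4, k) * factorial(8 - k) * 2**k
                / factorial(8) for k in range(1, 5))
people = [0, 0, 1, 1, 2, 2, 3, 3]   # couple labels
trials, hits = 100_000, 0
for _ in range(trials):
    random.shuffle(people)
    hits += all(people[i] != people[i + 1] for i in range(7))
print(exact, hits / trials)          # both about 0.3429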

Example 5.5

Suppose that each of the 𝑛 men participating at a party throws his hat into the
center of the room. Then, the hats are mixed, and each of the 𝑛 men randomly
chooses a hat. What is the probability that
a. No man selects his hat.
b. Exactly 𝑘 people select their hats.

Solution.
a. The total number of states of distributing 𝑛 hats among 𝑛 people (in a way that each person gets one hat) is equal to 𝑛!. Nevertheless, the number of states in which no one selects his own hat cannot be counted directly by the principle of multiplication: the first person has 𝑛 − 1 admissible states; then, if the first person has selected the second person's hat, there are 𝑛 − 1 states for the second person, while if he has selected the hat of anyone other than himself and the second person, there are 𝑛 − 2 states for the second person. Hence, the number of states for the second person depends on the result of the first person's choice, and the principle of multiplication cannot be used to compute the number of states of the required event.
Now, if 𝐴ᵢ denotes the event that person 𝑖 selects his own hat, then to calculate the probability that none of the 𝑛 people selects his own hat, we have:
𝑃(𝐴₁ᶜ ∩ … ∩ 𝐴ₙᶜ) = 1 − 𝑃(𝐴₁ ∪ 𝐴₂ ∪ … ∪ 𝐴ₙ)
= 1 − [C(𝑛,1)𝑃(𝐴₁) − C(𝑛,2)𝑃(𝐴₁ ∩ 𝐴₂) + ⋯ + (−1)ⁿ⁺¹C(𝑛,𝑛)𝑃(𝐴₁ ∩ 𝐴₂ ∩ … ∩ 𝐴ₙ)]
= 1 − [C(𝑛,1) × (1 × (𝑛−1)!)/𝑛! − C(𝑛,2) × (1 × 1 × (𝑛−2)!)/𝑛! + C(𝑛,3) × (1 × 1 × 1 × (𝑛−3)!)/𝑛! − ⋯ + (−1)ⁿ⁺¹C(𝑛,𝑛) × 1/𝑛!]
= 1 − 1/1! + 1/2! − 1/3! + 1/4! − ⋯ + (−1)ⁿ/𝑛! = 1/0! − 1/1! + 1/2! − 1/3! + 1/4! − ⋯ + (−1)ⁿ/𝑛!
= ∑_{𝑖=0}^{𝑛} (−1)ⁱ/𝑖!

Note that, based on the Maclaurin expansion, the value of the term ∑_{𝑖=0}^{∞} 𝑎ⁱ/𝑖! is equal to 𝑒^𝑎, which in the particular case of 𝑎 = −1 becomes ∑_{𝑖=0}^{∞} (−1)ⁱ/𝑖! = 𝑒⁻¹. Therefore, if the value of 𝑛 is large in the problem above, the term ∑_{𝑖=0}^{𝑛} (−1)ⁱ/𝑖!, the probability that no one selects his own hat, is approximately equal to ∑_{𝑖=0}^{∞} (−1)ⁱ/𝑖!, which equals 𝑒⁻¹.
𝑖!
In addition, if 𝐸 denotes the event that no one selects his own hat, then by using the relationship 𝑃(𝐸) = 𝑛(𝐸)/𝑛(𝑆), we have:
𝑃(𝐸) = 𝑛(𝐸)/𝑛(𝑆) ⇒ ∑_{𝑖=0}^{𝑛} (−1)ⁱ/𝑖! = 𝑛(𝐸)/𝑛! ⇒ 𝑛(𝐸) = 𝑛! × ∑_{𝑖=0}^{𝑛} (−1)ⁱ/𝑖!

It means that, in this example, the number of possible states of distributing the hats among the people such that no one selects his own hat is equal to:
𝑛! × ∑_{𝑖=0}^{𝑛} (−1)ⁱ/𝑖!
For example, if 𝑛 = 5, the number of possible states of distributing the hats among the people such that no one selects his own hat is equal to:
5! × [1/0! − 1/1! + 1/2! − 1/3! + 1/4! − 1/5!] = 5!/0! − 5!/1! + 5!/2! − 5!/3! + 5!/4! − 5!/5! = 120 − 120 + 60 − 20 + 5 − 1 = 44
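This count of 44 derangements of five objects can be confirmed by enumeration (our sketch, not part of the book's text):

# Derangements of n = 5 objects: closed form vs. brute force.
from math import factorial
from itertools import permutations

n = 5
closed = round(factorial(n) * sum((-1)**i / factorial(i) for i in range(n + 1)))
brute = sum(all(p[i] != i for i in range(n)) for p in permutations(range(n)))
print(closed, brute)   # 44 44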
b. To count the number of states in which exactly 𝑘 people select their own hats, we first choose the 𝑘 people who select their own hats, which can be done in C(𝑛,𝑘) states. Then, we count the number of states in which the remaining (𝑛 − 𝑘) people do not select their own hats. According to the explanations of part (a), this is possible in (𝑛 − 𝑘)! × [1/0! − 1/1! + 1/2! − 1/3! + ⋯ + (−1)ⁿ⁻ᵏ/(𝑛 − 𝑘)!] states.
Therefore, the probability that exactly 𝑘 people select their own hats is equal to:
C(𝑛,𝑘) × (𝑛 − 𝑘)! × [1/0! − 1/1! + 1/2! − 1/3! + ⋯ + (−1)ⁿ⁻ᵏ/(𝑛 − 𝑘)!]/𝑛! = [1/0! − 1/1! + 1/2! − 1/3! + ⋯ + (−1)ⁿ⁻ᵏ/(𝑛 − 𝑘)!]/𝑘!
where, for large values of 𝑛 and small values of 𝑘, the above expression converges to 𝑒⁻¹/𝑘!.
This problem, known as the matching problem [1], can be expressed as follows: suppose that we have 𝑛 distinguishable elements such that there is a one-to-one correspondence between them and the elements of another set. Now, if we randomly assign one member of the opposite set to each of the 𝑛 elements, what is the probability that there are exactly 𝑘 matches?
Some examples of the matching problem are as follows:
• Suppose that there are 𝑛 couples of brothers and sisters, and you randomly guess which sister belongs to each brother. What is the probability that exactly 𝑘 of your guesses are right?
• Suppose that a secretary randomly puts 𝑛 letters into 𝑛 envelopes bearing the individuals' names. What is the probability that he or she puts exactly 𝑘 letters into the envelopes belonging to their owners?
• Suppose that you write down the names of 𝑛 people on separate pages and randomly give one page to each of these 𝑛 people. What is the probability that exactly 𝑘 people receive the page bearing their own name?
In each of the examples above, there is a one-to-one correspondence between the members of the two sets.
In the preceding example, if 𝑛 = 5 and 𝐹ₖ denotes the event that exactly 𝑘 matches occur, then for the values 𝑘 = 0, 1, 2, …, 5, we have:
𝑃(𝐹₀) = [1/0! − 1/1! + 1/2! − 1/3! + 1/4! − 1/5!]/0!
𝑃(𝐹₁) = [1/0! − 1/1! + 1/2! − 1/3! + 1/4!]/1!
𝑃(𝐹₂) = [1/0! − 1/1! + 1/2! − 1/3!]/2!
𝑃(𝐹₃) = [1/0! − 1/1! + 1/2!]/3!
𝑃(𝐹₄) = [1/0! − 1/1!]/4! = 0
𝑃(𝐹₅) = [1/0!]/5! = 1/5!
Note that the probability of exactly 𝑛 − 1 matches occurring is equal to zero in the matching problem. This is because, if 𝑛 − 1 matches occur, then the 𝑛th match definitely occurs as well. Moreover, the probability that 𝑛 matches occur is equal to 1/𝑛!, because the number of states of performing the trial is equal to 𝑛!, only one of which makes all the random matches correct.
Note that if we have 𝑛 couples, or in other words 𝑛 distinct elements such that there is a one-to-one correspondence between them and the members of another set, it is possible to define different random trials, not all of which are necessarily the trial of the matching problem. The random trial defined for the 𝑛 couples is called the matching problem only when we establish a one-to-one correspondence between the members of the two groups in a random way, which is possible in 𝑛! states. For example, if we have four couples consisting of brothers and sisters, many random trials can be defined, such as:
• Randomly arranging them in a row, whose number of states is equal to 8!.
• Randomly arranging them at a round table, whose number of states is equal to 7!.
• Randomly distributing them into four two-member groups, whose number of states is equal to 8!/((2!)⁴ × 4!).
• Randomly pairing each girl with one boy, whose number of states is equal to 4!.
Among all the random trials above, only the last one relates to the matching problem, stating that each member of the group of girls has been randomly matched with a member of the group of boys.

1) An index increases on Monday, Tuesday, and Wednesday with respective probabilities 0.8, 0.7, and 0.6. Find the minimum and maximum possible values of the probability that the index increases on all of Monday, Tuesday, and Wednesday.
2) Suppose that 𝐴₁, 𝐴₂, and 𝐴₃ are three events such that 𝑃(𝐴ᵢ) = 1/(2 + 𝑖). Now, find an upper bound for 𝑃(𝐴₁ ∪ (𝐴₂ ∩ 𝐴₃)).

3) Prove that the term √(𝑃(𝐴)𝑃(𝐵)) is an upper bound for 𝑃(𝐴 ∩ 𝐵) and a lower bound for 𝑃(𝐴 ∪ 𝐵).
4) If 𝐴 and 𝐵 are two mutually exclusive events from one sample space,
investigate whether each of the following propositions is correct or incorrect.
a. 𝑃(𝐵 − 𝐴𝑐 ) = 0
b. 𝑃(𝐴 − 𝐵) = 𝑃(𝐵)
c. 𝑃(𝐵 − 𝐴𝑐 ) = 𝑃(𝐵)
d. 𝑃(𝐴 − 𝐵 𝑐 ) = 𝑃(𝐴 ∩ 𝐵 𝑐 )
e. 𝑃[(𝐴𝑐 ∩ 𝐵) ∪ (𝐴 ∩ 𝐵 𝑐 )] = 𝑃(𝐴) + 𝑃(𝐵)
f. 𝑃(𝐴𝑐 ∩ 𝐵 𝑐 ) = 1 − 𝑃(𝐴) − 𝑃(𝐵) + 𝑃(𝐴). 𝑃(𝐵)
5) If events 𝐴, 𝐵, and 𝐶 are from one sample space, investigate whether each of
the following propositions is correct or incorrect.
a. If 𝑃(𝐴 ∩ 𝐵) = 0, then 𝐴 and 𝐵 are disjoint events.
b. If 𝑃(𝐴 ∩ 𝐵 ∩ 𝐶) = 0, the events 𝐴, 𝐵, and 𝐶 are mutually exclusive.
c. If 𝐴 and 𝐵 are disjoint events, then 𝑃(𝐴 ∩ 𝐵 𝑐 ) = 𝑃(𝐴).
d. If 𝐴 and 𝐵 are disjoint events, then 𝑃(𝐵 − 𝐴𝑐 ) = 0.
6) In a community, sixty percent of families own a car, thirty percent own a home, and twenty percent own both a car and a home. If one family is randomly selected from the community, what is the probability that this family owns either a car or a home, but not both?

7) In a sixty-member class, there are 35 men. Twenty of the men are under the age of 20, and 25 of the class members are at least 20 years old. How many women in this class are 20 years old or older?
8) A company has 100 staff, 48 of whom have over five years of experience, and 53 of whom are technicians. Moreover, 10 people are technicians with over five years of experience. If someone is selected at random, what is the probability that the person is not a technician or has over five years of experience?
9) If 𝑃(𝐴) = 0.4, 𝑃(𝐵) = 0.5, 𝑃(𝐶) = 0.7, 𝑃(𝐴 ∩ 𝐶) = 𝑃(𝐴 ∩ 𝐵) = 0.2, 𝑃(𝐵 ∩ 𝐶) = 0.4,
and 𝑃(𝐴 ∩ 𝐵 ∩ 𝐶) = 0.1, then it is desired to calculate
a. 𝑃(𝐴 ∪ 𝐵 ∪ 𝐶ᶜ)
b. 𝑃(𝐴 ∩ 𝐵 ∩ 𝐶ᶜ)
10) Suppose that one essay is to be judged by three jurors. Suppose that when
there is one scientific mistake, jurors A, B, and C will detect it with probabilities
0.6, 0.3, and 0.4, respectively. The probability of detecting the scientific
mistake by jurors A and B is 0.2, A and C is 0.2, and B and C is 0.1. Moreover,
the probability of detecting the scientific mistake by jurors A, B, and C is 0.1. It
is desired to calculate the probability that
a. At least one juror detects the scientific mistake.
b. Jurors A and B detect the scientific mistake, but juror C does not detect
it.
c. Exactly one of the jurors detects the scientific mistake.
d. Exactly two jurors detect the scientific mistake.
11) We randomly arrange four couples of sisters and brothers numbered 1 through
4 in a row. It is desired to calculate the probability of each of the following
events:
a. The individuals of each couple are next to each other.
b. The girls are next to each other.
c. The girls are next to each other, and so do the boys.
d. The girls and boys alternate.

e. The individuals of at least one of the couples are next to each other.
f. None of the individuals of the couples are next to each other.
g. The individuals of couple 1 are next to each other, but those of couple 2
are not.
h. The individuals of couple 1 are separated by exactly one person.
i. The individuals of couple 1 are separated by at least one person.
12) Reconsider the preceding problem. Solve it if we randomly arrange the people
at a round table.
13) If ten men and seven women are randomly arranged in a row, find the
probability that
a. No two women are next to each other.
b. All women are next to each other.
14) If we want to randomly arrange 𝑛 (𝑛 > 5) people at a round table, it is desired
to calculate the probability that
a. Person 𝐴 is next to person 𝐵, on 𝐵's right side.
b. Person 𝐴 is next to people 𝐵 and 𝐶.
c. Person 𝐴 is next to person 𝐵, person 𝐵 is next to person 𝐶, and person
𝐶 is next to person 𝐷.
d. None of the people 𝐵, 𝐶, and 𝐷 is next to person 𝐴.
15) Suppose that we want to randomly select one letter from the word
PROBABILITY and one letter from the word PROPER. What is the probability
that the letters selected from two words are alike?
16) A box contains 50 bolts, 10 of which are defective. If one labor randomly selects
five bolts simultaneously, what is the probability that more than two of them
are defective?
17) A box contains 20 marbles numbered 1 through 20. We randomly select four
marbles without replacement from the box.
a. The probability that the minimum number selected is equal to 10.

b. The probability that the maximum number selected is equal to 10.
c. Solve Sections “a” and “b” with respect to sampling with replacement.
18) There are 𝑁 distinguishable marbles numbered 1 through 𝑁 in a bag. We select
𝑛 marbles at random and without replacement from the bag. It is desired to
find the probability that
a. The maximum number of the remaining marbles in the bag is equal to
𝑚 (𝑚 ≤ 𝑁).
b. The minimum number of the remaining marbles in the bag is equal to 𝑚
(𝑚 ≤ 𝑁).
19) A fair die is rolled six times. It is desired to calculate the probability that:
a. Three pairs appear.
b. One number appears once, another number appears twice, and a third number appears three times.
c. Number 2 appears exactly once, and number 1 appears exactly twice.
20) A fair die is rolled three times. It is desired to calculate the probability that:
a. The numbers appearing are alike.
b. Only the first and second flips are alike.
c. The numbers appearing are not alike.
d. The outcome of each flip is more than that of the preceding flip.
e. The outcome of the third flip is more than the outcomes of two
preceding flips.
21) How many times should a fair die be flipped until the probability of the
following events' occurrence is greater than or equal to 0.95?
a. A one and a six turn up at least once.
b. A one, a two, and a three turn up at least once.
c. All faces turn up at least once.
22) A fair die is rolled ten times. It is desired to calculate the probability that the
outcomes of throws are non-decreasing.

Hint: to solve it, rewrite a non-decreasing outcome such as 𝑣₁ = (1,2,2,2,3,4,5,5,5,6) as the vector 𝑣₂ = (1,3,1,1,3,1), in which the 𝑖th element of 𝑣₂ denotes the number of repetitions of the number 𝑖 in 𝑣₁.
23) We randomly divide six boys and six girls into six two-member groups. It is
desired to calculate the probability that:
a. Boys are grouped together, and so are girls.
b. Each group consists of one boy and one girl.
c. Two groups consist of only boys, two groups consist of only girls, and the other two groups each consist of one boy and one girl.
24) We randomly divide six boys and six girls into two six-member groups. It is
desired to calculate the probability that:
a. Both groups consist of an equal number of boys.
b. Boys are grouped together, and so are girls.
25) Twelve students are divided into three four-member groups at random. If
three intelligent students are among them, what is the probability that each
group consists of one intelligent student?
26) If we want to randomly make a five-digit number using the natural numbers
from 1 to 6, it is desired to calculate the probability that:
a. The created number contains the digit 5 or 6.
b. The created number contains the digits 5 and 6.
27) If we want to randomly make a five-digit number using the integer numbers
from 0 to 9, it is desired to calculate the probability that:
a. The created number contains the digit 5 or 6.
b. The created number contains the digits 5 and 6.
28) 𝑁 numbers are labelled from 1 to 𝑁. A sample of size 𝑛 is selected at random.
It is desired to calculate the probability that the selected sample contains
number 1, if:
a. Sampling is with replacement.

b. Sampling is without replacement.
29) 𝑁 numbers are labeled from 1 to 𝑁. A sample of size 2 is selected at random. It
is desired to calculate the probability that the selected sample contains
number 1, but it does not contain number 2, if:
a. Sampling is with replacement.
b. Sampling is without replacement.
30) Twenty people ride a bus in a city. The bus starts moving from the terminal and has four stations on its way. Assuming that these people are equally likely to get off at each of the four stations, and that the bus stops only at those stations where at least one person gets off, it is desired to calculate the probability that:
a. No one gets off at station 1.
b. All the people get off at station 1.
c. At least one person gets off at each station.
d. Five people get off at each station.
e. Exactly five people get off at station 1.
f. At most one person gets off at station 1.
g. At least two people get off at station 1.
h. The bus stops at exactly two stations.
i. The bus stops at stations 1 and 2.
j. The bus stops at station 1, but it does not stop at station 2.
31) If we randomly distribute 𝑛 balls into 𝑟 urns, what is the probability that none
of the urns gets occupied by more than one ball (𝑛 ≤ 𝑟)?
32) We randomly distribute nine balls numbered from 1 through 9 into three
boxes.
a. What is the probability that each of the three boxes gets occupied by
the same number of balls?

b. What is the probability that the balls of each box contain consecutive
numbers?
33) If six mothers, each along with one of her children, are invited to a party, and
one asks you to randomly guess each mother's child, what is the probability that 3
of your guesses are correct?
34) A secretary has typed five letters, each of which belongs to a particular person,
and five envelopes with their names have been provided for sending. If the
secretary randomly puts the letters into the envelopes, what is the probability
that exactly three letters are put into the wrong envelopes?
35) There are five cards numbered 1 through 5. These cards are put on a table
such that their numbers are hidden. If one asks you to randomly guess
the number of each of these cards, it is desired to find the probability that:
a. Only one of your guesses is true.
b. Only cards with numbers 1 and 2 are correctly guessed.
c. At least one of the guesses is true.
d. 4 of your guesses are true.
36) If we define the finite sets A, B, C, and D as follows:
𝐴 = {2,3,4} 𝐵 = {1,3,4} 𝐶 = {1,2,4} 𝐷 = {1,2,3}
how many states are there for (𝑎, 𝑏, 𝑐, 𝑑) such that 𝑎 ∈ 𝐴, 𝑏 ∈ 𝐵, 𝑐 ∈ 𝐶, and
𝑑 ∈ 𝐷, provided that repetition is not admissible?
37) We have six blue cards numbered 1 through 6, and four red cards numbered 1
through 4. We want to randomly draw 3 blue and 3 red cards and pair them
together. It is desired to calculate the probability that:
a. The blue card number 1 and red card number 1 are paired together.
b. The blue card number 1 and red card number 1 are selected, but they
are not paired together.
38) A box contains five white and 𝑛 black marbles. Two marbles are randomly
withdrawn from the box. What is the value of 𝑛 such that the probability of:
a. choosing the first marble as white and the second one as black becomes 5/18?
b. choosing two marbles with different colors becomes 10/21?
39) A box contains ten marbles numbered 1 through 10. If we randomly select three
marbles from the box, it is desired to calculate the probability that:
a. The numbers of the three marbles selected are consecutive.
b. The product of the numbers of these three marbles is an even number.
c. There is at least one odd number and one even number among these
three marbles.
d. The sum of the numbers of these three marbles is an odd number.
40) A box contains ten marbles numbered 1 through 10. Likewise, we have ten urns
numbered 1 through 10. We draw three marbles at random and without
replacement from the box and randomly distribute them into the urns. It is
desired to find the probability that marble 5 is put into urn 5 if:
a. Each urn contains at most one marble.
b. Each urn can contain more than one marble.
41) A faculty consisting of four distinguished professors, ten associate professors,
and four assistant professors is supposed to constitute a four-member
committee. If the committee members are randomly selected, what is the
probability that the committee consists of at least one person from each level?
42) In a weightlifting competition, five people from Russia and five people from
other countries have participated. At the end of the competition, these
people are ranked based on their achieved points. Suppose that no two scores
are the same and all 10! different possible states related to the ranking are
equally likely. If so, it is desired to calculate the probability that:
a. The first rank belongs to a Russian participant.
b. The best rank attained by the Russian participants is 3.
c. The best and worst ranks attained by the Russian participants are equal
to 2 and 9, respectively.

43) There are 36 balls in a box, in four different colors and numbered 1
through 9 within each color. We select three balls at random and without
replacement from the box. It is desired to calculate the probability that:
a. The balls selected have consecutive numbers.
b. The balls selected have the same color and consecutive numbers.
c. The balls selected have different colors and consecutive numbers.
d. The balls selected have consecutive numbers, and the color of all of
them is not the same.
44) We randomly arrange natural numbers from 1 to 𝑛. What is the probability that:
a. The number 2 immediately appears after the number 1?
b. The number 2 immediately appears after the number 1, and the number
3 immediately appears after the number 2?
45) A box contains 𝑚 marbles numbered 1 through 𝑚. If we each time draw one
marble at random and with replacement and repeat it 𝑛 times, what is the
probability that:
a. No marble is selected twice?
b. The first repeated number appears on the 𝑛𝑡ℎ choice?
c. Two numbers appear twice, and each of the other numbers appears
once?
46) A box contains 15 marbles numbered from 1 to 15. Five marbles are randomly
selected from the box. In each of the following states, it is desired to calculate
the probability that the minimum and maximum numbers of the selected
marbles are 1 and 15, respectively:
a. Sampling is with replacement.
b. Sampling is without replacement.

47) A box contains 15 marbles numbered 1 through 15. Five marbles are randomly
selected from the box. In each of the following states, it is desired to calculate

the probability that the minimum and maximum numbers of the marbles
selected are 8 and 12, respectively:
a. Sampling is with replacement.
b. Sampling is without replacement.
48) In a random game, there are four different choices. Five players are asked to
participate in this game and randomly select one of the choices available. What
is the probability that the number of distinct choices selected by these five
players is equal to 2?
49) In the preceding problem, if one choice is selected more often than the others, it
is designated as the dominant choice, and a $3000 prize is equally divided
among the players who have selected this choice. What is the probability that
this prize is divided into three parts?

Conditional probability is one of the most important concepts in probability
theory, with many applications in problems and probabilistic analyses of the
real world. It provides the possibility of calculating the probability of an event when
some part of the sample space information is accessible, or when our information
increases as time passes. In fact, in a random trial, knowledge of an
event's occurrence may sometimes change another event's probability in the same
space. In such conditions, to obtain the probability of the mentioned event given the
knowledge of the primary event's occurrence, conditional probability concepts are
used. Moreover, using conditional probability can facilitate the solution of many
complex problems.

Suppose that one fair die is rolled. If it lands on 1, 2, or 3, person E will be given a
prize. If it lands on 3, 4, or 5, person F will be given a prize (if it lands on 3, both
of them will be given prizes), and if it lands on 6, nobody receives a prize. The
Venn diagram of this trial is as follows:
[Venn diagram: events E = {1, 2, 3} and F = {3, 4, 5}, overlapping in {3}]
In such situations, it is evident that person E assigns probability 3/6 to his win.
Now, if he is informed that the die is rolled, and person F wins, then person E knows
the winning event of person F means that one of the faces 3, 4, or 5 has appeared. It
is evident that these three outcomes are equally likely. Hence, the event that E wins
occurs only if the die lands on 3, and both E and F win. Therefore, given the new
information, person E assigns probability 1/3 to his win. In other words, before knowing
the occurrence of event F, the probability that event E occurs was equal to 3/6.
However, after knowing the occurrence of event F, the probability that event E
occurs is equal to 1/3, which is an example of the conditional probability concept.

If event F occurs, the sample space of the problem is simply reduced to the set
F. It is evident that, in such a situation, event E can occur only if one of the
members of the event 𝐸 ∩ 𝐹 occurs (the part of E that is present in F). Hence, if we
know that event F has occurred, the probability that E occurs is equal to the
probability that the event 𝐸 ∩ 𝐹 occurs divided by the probability that event F occurs,
leading to the following definition:
Definition: If F is not empty, that is, 𝑃(𝐹) > 0, then:

𝑃(𝐸|𝐹) = 𝑃(𝐸 ∩ 𝐹)/𝑃(𝐹)    (2-1)

[Venn diagram: events E and F with their overlap 𝐸 ∩ 𝐹]

Note that the occurrence of event F can decrease, increase, or leave unchanged
the probability of occurrence of event E. To clarify the point, consider the
preceding example, in which the occurrence of event F reduced the probability of
occurrence of event E from 3/6 in the main space to 1/3 in the reduced space. However,
if, for example, event F were defined as {3,4}, then based on the formula of the
conditional probability, the probability of occurrence of event E in the reduced space
would be equal to 1/2. This is equal to the probability of occurrence of event E in the
main space. Furthermore, if event F were defined as {3}, then based on the formula of
the conditional probability, the probability of occurrence of event E in the reduced
space would be equal to 1, which is greater than that of occurrence of event E in the
main space.
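Because conditional probability is ordinary probability on the reduced sample space, it can also be estimated by simulation: generate many repetitions of the trial, keep only those in which F occurs, and measure the fraction of the kept repetitions in which E also occurs. The following Python sketch (an illustration added here, not part of the original text; the trial count is an arbitrary choice) estimates 𝑃(𝐸|𝐹) for the die-and-prize example above:

```python
import random

n_trials = 100_000
count_f = 0   # repetitions in which F = {3, 4, 5} occurred
count_ef = 0  # repetitions in which E = {1, 2, 3} also occurred

for _ in range(n_trials):
    roll = random.randint(1, 6)   # one roll of a fair die
    if roll in (3, 4, 5):         # keep only trials where F occurs
        count_f += 1
        if roll in (1, 2, 3):     # E occurs as well (only roll == 3 here)
            count_ef += 1

# Relative frequency of E inside the reduced space F; should be near 1/3
print(count_ef / count_f)
```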

Example 2.1

Consider a box in which there are 4 white and 5 black marbles. We select a
sample of size 3 marbles randomly and without replacement from the box. If the
chosen sample contains two white marbles, what is the probability that the first
selected marble is white?
Solution. Suppose that F denotes the event that the selected sample contains 2 white
marbles, and E denotes the event that the first selected marble is white. For the
occurrence of event F, either the first and second choices, the first and third choices,
or the second and third choices should be white. Furthermore, for the occurrence of
𝐸 ∩ 𝐹, the first choice should be white, and one of the second and third choices
should also be white. That is, either the first and second choices or the first and third
choices should be white. Therefore, the probability that event E occurs when we
know that event F has occurred is equal to:
𝑃(𝐸|𝐹) = 𝑃(𝐸 ∩ 𝐹)/𝑃(𝐹) = [(4×3×5)/(9×8×7) + (4×5×3)/(9×8×7)] / [(4×3×5)/(9×8×7) + (4×5×3)/(9×8×7) + (5×4×3)/(9×8×7)] = 2/3

Example 2.2

There are 4 red and 6 blue marbles in a box. If we select 3 marbles at random
and without replacement from the box and know that at least one of them is red,
what is the probability that all of the marbles selected are red?
Solution. Suppose that F denotes the event that at least one of the choices is red,
and E denotes the event that all of them are red. In such a situation, the probability
that event E occurs when we know that event F has occurred is equal to:
𝑃(𝐸|𝐹) = 𝑃(𝐸 ∩ 𝐹)/𝑃(𝐹) = [C(4,3)C(6,0)/C(10,3)] / [(C(4,1)C(6,2) + C(4,2)C(6,1) + C(4,3)C(6,0))/C(10,3)] = 4/(60 + 36 + 4) = 4/100
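Since the sample space here is small, the same conditional probability can also be verified by exhaustive enumeration rather than a formula. The sketch below (an added illustration; the marble encoding is our own choice) lists every equally likely 3-subset of the 10 marbles and computes 𝑃(𝐸|𝐹) directly from the definition:

```python
from itertools import combinations

marbles = ['R'] * 4 + ['B'] * 6            # 4 red and 6 blue marbles

f_count = 0   # samples containing at least one red marble (event F)
ef_count = 0  # samples in which all three marbles are red (event E)
for sample in combinations(range(10), 3):  # all C(10,3) = 120 samples
    reds = sum(1 for i in sample if marbles[i] == 'R')
    if reds >= 1:
        f_count += 1
        if reds == 3:
            ef_count += 1

print(ef_count, '/', f_count)   # 4 / 100, i.e. P(E|F) = 0.04
```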
Note that if event E is a subset of event F, then to calculate 𝑃(𝐸|𝐹), we have:
𝑃(𝐸|𝐹) = 𝑃(𝐸 ∩ 𝐹)/𝑃(𝐹) = 𝑃(𝐸)/𝑃(𝐹)

[Venn diagram: event E as a subset of event F]

Likewise, if event E is a subset of event F, then it is evident that the value of
𝑃(𝐹|𝐸) is equal to 1, because:
𝑃(𝐹|𝐸) = 𝑃(𝐸 ∩ 𝐹)/𝑃(𝐸) = 𝑃(𝐸)/𝑃(𝐸) = 1
Indeed, if event E is a subset of event F, and we know that event E has
occurred, it means that one of its members has occurred. If so, one of the members
of F has certainly occurred, and the probability that F occurs is equal to 1.
Moreover, if E and F are two disjoint events, then we have:

𝑃(𝐸|𝐹) = 𝑃(𝐸 ∩ 𝐹)/𝑃(𝐹) = 0

[Venn diagram: disjoint events E and F]

It means that if E and F are two disjoint events, given that event F occurs, the
probability that event E occurs is equal to zero. In other words, given that event F
occurs, event E does not occur.

Example 2.3

In a game based on chance, a person is given a pair of dice to throw. If the sum
of the upturned faces in his throws equals 3, 5, or 7, the person is given a prize
commensurate with the sum of the upturned faces. If we know that the person has
already received a prize when playing this game, what is the probability that the sum
of the upturned faces is equal to 7?
Solution. Suppose that F denotes the event that the sum of the upturned faces of
the two dice equals 3, 5, or 7, and E denotes the event that the sum equals 7. In such a
situation, according to Example 4.11 of Chapter 2, the probabilities that the sum of two
dice equals 3, 5, or 7 are equal to 2/36, 4/36, and 6/36, respectively. Therefore, the
probability that event F occurs is equal to:
𝑃(𝐹) = 2/36 + 4/36 + 6/36 = 12/36

It is evident that event E is a subset of event F. As a result, the event 𝐸 ∩ 𝐹 is
equivalent to event E. Hence, the probability that event E occurs when we know that
event F has occurred is equal to:
𝑃(𝐸|𝐹) = 𝑃(𝐸 ∩ 𝐹)/𝑃(𝐹) = 𝑃(𝐸)/𝑃(𝐹) = (6/36)/(12/36) = 1/2

Example 2.4

Consider a trial of rolling a fair die. If the events E and F are defined as 𝐸 =
{1,2,3} and 𝐹 = {3,4,5}, then answer each of the following questions.
a. If we know at least one of the events E and F has occurred, what is the
probability that both of them occur?
b. If we know that at least one of the events E and F has occurred, what is the
probability that only one of them occurs?
c. If we know that only one of the events E and F has occurred, what is the
probability that event E occurs?

Solution. The Venn diagram of this problem is as follows:
[Venn diagram: E = {1,2,3} and F = {3,4,5}, overlapping in {3}]
a. Considering the given explanations in the section of algebra of sets in Chapter 2, we have:
𝑃(𝐸 ∩ 𝐹|𝐸 ∪ 𝐹) = 𝑃((𝐸 ∩ 𝐹) ∩ (𝐸 ∪ 𝐹))/𝑃(𝐸 ∪ 𝐹) = 𝑃(𝐸 ∩ 𝐹)/𝑃(𝐸 ∪ 𝐹) = (1/6)/(5/6) = 1/5

b.
𝑃(𝐸𝛥𝐹|𝐸 ∪ 𝐹) = 𝑃((𝐸𝛥𝐹) ∩ (𝐸 ∪ 𝐹))/𝑃(𝐸 ∪ 𝐹) = 𝑃(𝐸𝛥𝐹)/𝑃(𝐸 ∪ 𝐹) = (4/6)/(5/6) = 4/5
c.
𝑃(𝐸|𝐸𝛥𝐹) = 𝑃(𝐸 ∩ (𝐸𝛥𝐹))/𝑃(𝐸𝛥𝐹) = 𝑃(𝐸 − 𝐹)/𝑃(𝐸𝛥𝐹) = (2/6)/(4/6) = 2/4
It should be noted that conditional probability is simply the same probability
with a changed sample space of the problem. Therefore, all the propositions resulting
from the probability theory principles, expressed in Section 2.5 of Chapter 2, are valid
for the conditional probability. For example, it was proven in Proposition 5-1 of
Chapter 2 that the expression 𝑃(𝐸) + 𝑃(𝐸 𝑐 ) is equal to 1. This proposition is also valid
in the conditional probability as follows:
𝑃(𝐸|𝐹) + 𝑃(𝐸 𝑐 |𝐹) = 1
Proof.
𝑃(𝐸|𝐹) + 𝑃(𝐸ᶜ|𝐹) = 𝑃(𝐸 ∩ 𝐹)/𝑃(𝐹) + 𝑃(𝐸ᶜ ∩ 𝐹)/𝑃(𝐹) = 𝑃(𝐹)/𝑃(𝐹) = 1
Likewise, it can be shown that all the propositions proven by the principles of
the probability theory in the preceding chapter are valid in the conditional
probability as well. For instance, consider the following equations:
𝑃(𝐸1 ∪ 𝐸2 |𝐹) = 𝑃(𝐸1 |𝐹) + 𝑃(𝐸2 |𝐹) − 𝑃(𝐸1 ∩ 𝐸2 |𝐹)
or
𝑃(𝐸1 ∪ 𝐸2 |𝐹) = 1 − 𝑃(𝐸1𝑐 ∩ 𝐸2𝑐 |𝐹)
or
𝑃(𝐸1 − 𝐸2 |𝐹) = 𝑃(𝐸1 ∩ 𝐸2𝑐 |𝐹) = 𝑃(𝐸1 |𝐹) − 𝑃(𝐸1 ∩ 𝐸2 |𝐹)
or
𝑃(𝐸1 𝛥𝐸2 |𝐹) = 𝑃(𝐸1 |𝐹) + 𝑃(𝐸2 |𝐹) − 2𝑃(𝐸1 ∩ 𝐸2 |𝐹)
or
𝑃(𝐸1 ∩ 𝐸2 |𝐹) ≥ 𝑃(𝐸1 |𝐹) + 𝑃(𝐸2 |𝐹) − 1

If we multiply both sides of Equation (2-1) by 𝑃(𝐹), the following equation is
obtained:

𝑃(𝐸 ∩ 𝐹) = 𝑃(𝐹)𝑃(𝐸|𝐹) (3-1)

Likewise, it can be shown that:
𝑃(𝐸 ∩ 𝐹) = 𝑃(𝐹 ∩ 𝐸) = 𝑃(𝐸)𝑃(𝐹|𝐸)
This law states that the probability of simultaneous occurrence of two events
𝐸 and 𝐹 is equal to the probability of occurrence of event 𝐸 multiplied by the
probability of occurrence of event 𝐹 given that event 𝐸 has occurred.

Example 3.1

Suppose that a box contains 10 defective and 15 non-defective items. If we


withdraw two items randomly and without replacement from the box, what is the
probability that the first item is defective and the second one is non-defective?
Solution. Suppose that 𝐸1 denotes the event that the first item is defective, and 𝐹2
denotes the event that the second item is non-defective. Then, we have:
𝑃(𝐸1 ∩ 𝐹2) = 𝑃(𝐸1) 𝑃(𝐹2|𝐸1) = 10/25 × 15/24
It is evident that the probability that the first item is defective is equal to 10/25.
Moreover, if the first item is defective, the total number of remaining items is 24, of
which 15 are non-defective. As a result, 𝑃(𝐹2|𝐸1) = 15/24.

Equation (3-1) can be extended to the simultaneous occurrence of n events
as follows:
𝑃(𝐸1 ∩ 𝐸2 ∩ ... ∩ 𝐸𝑛) = 𝑃(𝐸1) 𝑃(𝐸2|𝐸1) 𝑃(𝐸3|𝐸1 ∩ 𝐸2) ... 𝑃(𝐸𝑛|𝐸1 ∩ 𝐸2 ∩ ... ∩ 𝐸𝑛−1)    (3-2)

We call the above equation the law of multiplication in probability, with
the following proof:
𝑃(𝐸1) × 𝑃(𝐸2|𝐸1) × 𝑃(𝐸3|𝐸1 ∩ 𝐸2) × ⋯ × 𝑃(𝐸𝑛|𝐸1 ∩ 𝐸2 ∩ … ∩ 𝐸𝑛−1)
= 𝑃(𝐸1) × 𝑃(𝐸1 ∩ 𝐸2)/𝑃(𝐸1) × 𝑃(𝐸1 ∩ 𝐸2 ∩ 𝐸3)/𝑃(𝐸1 ∩ 𝐸2) × ⋯ × 𝑃(𝐸1 ∩ 𝐸2 ∩ … ∩ 𝐸𝑛)/𝑃(𝐸1 ∩ 𝐸2 ∩ … ∩ 𝐸𝑛−1)
= 𝑃(𝐸1 ∩ 𝐸2 ∩ … ∩ 𝐸𝑛)

Example 3.2

In Example 3.1, what is the probability that the first and second items selected
are defective, and the third item selected is non-defective?
Solution. Suppose that events 𝐸1 and 𝐸2 denote the defectiveness of the first and
second selected items respectively, and event 𝐹3 denotes that the third selected item
is non-defective. In such a situation, the simultaneous occurrence of these three
events is obtained by using Equation (3-2) as follows:
𝑃(𝐸1 ∩ 𝐸2 ∩ 𝐹3) = 𝑃(𝐸1) 𝑃(𝐸2|𝐸1) 𝑃(𝐹3|𝐸2 ∩ 𝐸1) = 10/25 × 9/24 × 15/23
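The multiplication law lends itself naturally to sequential simulation: draw the items one at a time without replacement and check the desired pattern. The following sketch (illustrative only; the trial count and the 'D'/'N' labels are our own choices) compares the exact product above with a Monte Carlo estimate:

```python
import random

# Exact value from the law of multiplication: P(E1) P(E2|E1) P(F3|E1 ∩ E2)
exact = (10 / 25) * (9 / 24) * (15 / 23)

box = ['D'] * 10 + ['N'] * 15    # 10 defective, 15 non-defective items
n_trials = 200_000
hits = 0
for _ in range(n_trials):
    draw = random.sample(box, 3)   # 3 items, without replacement, in order
    if draw == ['D', 'D', 'N']:    # defective, defective, non-defective
        hits += 1

print(exact, hits / n_trials)    # the two values should be close
```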

Example 3.3

Consider a group consisting of 6 girls and 4 boys with 4 sister-brother couples.


If we want to randomly guess the boys’ sisters, what is the probability that all the
guesses are true?
Solution. Suppose that 𝐸𝑖 denotes the event that we correctly guess the 𝑖 𝑡ℎ boy's
sister. Then, based on Equation (3-2), the probability that all the guesses are true is
equal to:
𝑃(𝐸1 ∩ 𝐸2 ∩ 𝐸3 ∩ 𝐸4) = 𝑃(𝐸1)𝑃(𝐸2|𝐸1)𝑃(𝐸3|𝐸1 ∩ 𝐸2)𝑃(𝐸4|𝐸1 ∩ 𝐸2 ∩ 𝐸3) = 1/6 × 1/5 × 1/4 × 1/3 = 1/360

Note that the probability of correctly guessing the first boy's sister is equal
to 1/6. Moreover, if we correctly guess the first boy's sister, the probability that we
correctly guess the second boy's sister is equal to 1/5, and so on.
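A short simulation makes this chain of conditional probabilities concrete: a random guess of the boys' sisters is just a random ordered selection of 4 of the 6 girls, and exactly one of the 6 × 5 × 4 × 3 = 360 such selections is fully correct. The sketch below (an added illustration with our own labeling of the girls) estimates this probability:

```python
import random

# Label the girls 0..5 and let girls 0..3 be the sisters of boys 1..4, in order.
n_trials = 1_000_000
hits = 0
for _ in range(n_trials):
    guess = random.sample(range(6), 4)   # one distinct girl guessed per boy
    if guess == [0, 1, 2, 3]:            # every guess is correct
        hits += 1

print(hits / n_trials, 1 / 360)          # both should be about 0.00278
```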

We say that the two events E and F are independent whenever the occurrence
of one event does not affect the probability of occurrence of the other one. In
other words, we say that the two events E and F are independent whenever:

𝑃(𝐸|𝐹) = 𝑃(𝐸) (4-1)

Likewise, if the two events are independent, in addition to the equation above,
the following equations are valid as well:

𝑃(𝐸|𝐹) = 𝑃(𝐸) ⇔ 𝑃(𝐸 ∩ 𝐹)/𝑃(𝐹) = 𝑃(𝐸) ⇔ 𝑃(𝐸 ∩ 𝐹) = 𝑃(𝐸)𝑃(𝐹)    (4-2)

⇔ 𝑃(𝐸 ∩ 𝐹)/𝑃(𝐸) = 𝑃(𝐹) ⇔ 𝑃(𝐹|𝐸) = 𝑃(𝐹)    (4-3)
In fact, the above equations state that if the occurrence of event F does not
affect the probability of occurrence of event E, then the occurrence of event E does
not affect the probability of occurrence of event F either.
To investigate the independence of two events, each of Equations (4-1), (4-2),
or (4-3) can be examined. If the two events are not independent, we say that they are
dependent.

Example 4.1

Suppose that a fair die is rolled once. If it lands on 1, 2, or 3, person E will be


given a prize. If it lands on 3 or 4, person F will be given the prize (if it lands on 3,
both of them are given the prize), and if it lands on 6, nobody receives the prize.
Investigate the independence of events E and F.
Solution. As mentioned, to investigate the independence of these two events, each
of Equations (4-1), (4-2), or (4-3) can be examined. For example, to examine Equation
(4-1), we have:
𝑃(𝐸) = 3/6, 𝑃(𝐸|𝐹) = 𝑃(𝐸 ∩ 𝐹)/𝑃(𝐹) = (1/6)/(2/6) = 1/2 ⇒ 𝑃(𝐸|𝐹) = 𝑃(𝐸)
As seen, the probability of occurrence of event E in the main space equals 3/6.
Furthermore, under the condition of knowledge about the occurrence of event F, the
probability of occurrence of event E equals 1/2. Therefore, even though the occurrence
of event F reduced the main sample space, it did not change the probability of
occurrence of event E. In such a situation, the two events are independent of each other.
To investigate the independence of two events, each of the other two
equations can also be examined. To examine Equation (4-2), we have:
𝑃(𝐸) = 3/6, 𝑃(𝐹) = 2/6, 𝑃(𝐸 ∩ 𝐹) = 1/6 ⇒ 𝑃(𝐸 ∩ 𝐹) = 𝑃(𝐸)𝑃(𝐹)
or to examine Equation (4-3), we have:
𝑃(𝐹) = 2/6, 𝑃(𝐹|𝐸) = 𝑃(𝐸 ∩ 𝐹)/𝑃(𝐸) = (1/6)/(3/6) = 1/3 ⇒ 𝑃(𝐹|𝐸) = 𝑃(𝐹)
As seen, the probability of occurrence of event F in the main space equals 2/6.
Furthermore, under the condition of having knowledge about the occurrence of
event E, the probability of occurrence of event F equals 1/3. Hence, even though the
occurrence of event E reduced the main sample space, it did not change the
probability of occurrence of event F.
Note that the independence of two events does not mean that they are not
related to each other. For instance, in the above example, the member {3} was
present in both of the events E and F. However, the occurrence of one of these two
events did not affect the probability of occurrence of the other event.
Note that Equation (4-2) or 𝑃(𝐸 ∩ 𝐹) = 𝑃(𝐸)𝑃(𝐹) is simply Equation (3-1) or
𝑃(𝐸 ∩ 𝐹) = 𝑃(𝐸)𝑃(𝐹|𝐸). In fact, they are equivalent to each other because if the two
events are independent, 𝑃(𝐹|𝐸) is to be equal to 𝑃(𝐹).
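For finite equally likely sample spaces, independence can be checked mechanically from Equation (4-2) by counting. The sketch below (an added illustration; exact arithmetic via the `fractions` module avoids rounding issues) verifies the independence found in Example 4.1:

```python
from fractions import Fraction

# The sample space of one fair die roll, and the events of Example 4.1
E = {1, 2, 3}
F = {3, 4}

def prob(event):
    return Fraction(len(event), 6)   # equally likely outcomes 1..6

# Independence holds iff P(E ∩ F) = P(E) P(F); here 1/6 = (3/6)(2/6)
print(prob(E & F) == prob(E) * prob(F))   # True
```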

Example 4.2

Suppose that we have 5 red, 5 blue, 5 black, and 5 white cards that are
numbered from 1 to 5, and we randomly select one of them. If E denotes the event
that the card number 1 is selected, and W denotes the event that a white card is
selected, investigate the independence of these two events.
Solution. As mentioned previously, to investigate the independence of these two
events, each of Equations (4-1), (4-2) or (4-3) can be examined. For instance, to
examine Equation (4-2), we have:
𝑃(𝐸) = 4/20, 𝑃(𝑊) = 5/20, 𝑃(𝐸 ∩ 𝑊) = 1/20
⇒ 𝑃(𝐸) 𝑃(𝑊) = 4/20 × 5/20 = 1/20 = 𝑃(𝐸 ∩ 𝑊)
Therefore, these two events are independent, or to examine Equation (4-1), we
have:
𝑃(𝐸) = 4/20, 𝑃(𝐸|𝑊) = 1/5
⇒ 𝑃(𝐸|𝑊) = 𝑃(𝐸)

As a result, these two events are independent.

Example 4.3

Suppose that E and F are two nonempty and disjoint events from one sample
space. Investigate the independence of these two events.
Solution. To investigate the independence of these two events, it suffices to note
that since these two events are nonempty and disjoint, the value of 𝑃(𝐸 ∩ 𝐹) is equal
to zero and is therefore not equal to the value of 𝑃(𝐸)𝑃(𝐹). Hence, these two
events are not independent. Moreover, it can simply be shown that the value of
𝑃(𝐸|𝐹) is equal to zero and thus not equal to 𝑃(𝐸). That is, the occurrence of F affects
the probability of occurrence of E, meaning that these two events are not
independent.

Example 4.4

Suppose that E and F are two nonempty events from one sample space such
that E is a subset of F, and none of them is equal to S. Investigate the independence
of these two events.
Solution. To investigate the independence of these two events, it suffices to note
that since E is a subset of F, and none of them is equal to S, the value of 𝑃(𝐸 ∩ 𝐹) is
equal to that of 𝑃(𝐸) and not equal to that of 𝑃(𝐸)𝑃(𝐹). Hence, these two events are
not independent. Moreover, the value of 𝑃(𝐸|𝐹) is equal to 𝑃(𝐸 ∩ 𝐹)/𝑃(𝐹), which leads to 𝑃(𝐸)/𝑃(𝐹),
meaning that it is not equal to 𝑃(𝐸). In other words, the occurrence of F affects the
probability of occurrence of E.

Proposition 4-1
If E and F are two independent events, then
1. E and 𝐹 𝑐 are independent.
2. 𝐸 𝑐 and F are independent.
3. 𝐸 𝑐 and 𝐹 𝑐 are independent.

Proof. To prove the first equation, we have:
𝑃(𝐸) = 𝑃(𝐸 ∩ 𝐹) + 𝑃(𝐸 ∩ 𝐹 𝑐 ) = 𝑃(𝐸)𝑃(𝐹) + 𝑃(𝐸 ∩ 𝐹 𝑐 )
⇒ 𝑃(𝐸) − 𝑃(𝐸)𝑃(𝐹) = 𝑃(𝐸 ∩ 𝐹 𝑐 ) ⇒ 𝑃(𝐸)(1 − 𝑃(𝐹)) = 𝑃(𝐸 ∩ 𝐹 𝑐 ) ⇒ 𝑃(𝐸)𝑃(𝐹 𝑐 ) = 𝑃(𝐸 ∩ 𝐹 𝑐 )

Therefore, 𝐸 and 𝐹ᶜ are independent as well. Likewise, it can be shown that
events 𝐸ᶜ and 𝐹 are independent, and events 𝐸ᶜ and 𝐹ᶜ are also independent. Indeed,
this proposition states that if two events are independent, the occurrence or non-
occurrence of one does not affect the probability of occurrence or non-occurrence
of the other one.

Example 4.5

In a class, there are 100 people including 40 men and 60 women. Consider that
16 men are smokers and 24 men are non-smokers. Now, we want to randomly choose
one of these 100 people. What should the numbers of female smokers and non-smokers
so that the events of being a smoker or non-smoker and the gender (being a man or
woman) are independent?
Solution. If 𝐶 and 𝐶 𝑐 denote the events of being a smoker and non-smoker,
respectively, and 𝑀 and 𝐹 denote the events of being male and female, respectively,
then events 𝐶 and 𝐶 𝑐 should be independent of events 𝑀 and 𝐹. Now, if, for example,
we suppose that the number of female smokers is equal to n and the number of
female non-smokers is equal to (60 − 𝑛), it leads to:

𝑃(𝐶 ∩ 𝑀) = 𝑃(𝐶)𝑃(𝑀) ⇒ 16/100 = (16 + 𝑛)/100 × 40/100 ⇒ 40(16 + 𝑛) = 1600 ⇒ 𝑛 = 24
Therefore, for the independence of C and M, there should be 24 female
smokers and 36 female non-smokers. However, to solve this problem, the other three
equations should also be examined; due to the validity of the above equation, they
hold automatically. This relates to Proposition 4-1, which says that if events 𝐶
and 𝑀 are independent, then events 𝐶ᶜ and 𝑀 are independent as well. Likewise,
events 𝐶 and 𝐹 and events 𝐶ᶜ and 𝐹 are also independent.
Definition: Three events E, F, and G are said to be simultaneously independent if the
occurrence of one or two of them does not affect the probability of the others. In
other words, three events E, F, and G are said to be simultaneously independent
whenever:
𝑃(𝐸|𝐹) = 𝑃(𝐸)   𝑃(𝐺|𝐸) = 𝑃(𝐺)   𝑃(𝐹|𝐸) = 𝑃(𝐹)
𝑃(𝐸|𝐺) = 𝑃(𝐸)   𝑃(𝐺|𝐹) = 𝑃(𝐺)   𝑃(𝐹|𝐺) = 𝑃(𝐹)
𝑃(𝐸|𝐹 ∩ 𝐺) = 𝑃(𝐸)   𝑃(𝐺|𝐸 ∩ 𝐹) = 𝑃(𝐺)   𝑃(𝐹|𝐸 ∩ 𝐺) = 𝑃(𝐹)
It can be shown that if three events are simultaneously independent, the
following equations are valid, and vice versa. In other words, for investigating the
independence of three events, the following equations can be examined:
1. 𝑃(𝐸 ∩ 𝐹) = 𝑃(𝐸) 𝑃(𝐹)
2. 𝑃(𝐸 ∩ 𝐺) = 𝑃(𝐸) 𝑃(𝐺)
3. 𝑃(𝐹 ∩ 𝐺) = 𝑃(𝐹) 𝑃(𝐺)
4. 𝑃(𝐸 ∩ 𝐹 ∩ 𝐺) = 𝑃(𝐸) 𝑃(𝐹) 𝑃(𝐺)
If Equations 1, 2, and 3 are valid, then the three events are said to be pairwise
independent. In these conditions, Equation 4 can be valid or not. If, in addition to
Equations 1, 2, and 3, Equation 4 is also valid, these three events are said to be
simultaneously independent.

Example 4.6

Suppose that we throw two fair dice. If E, F, and G denote the events “the sum
of two dice is 7”, “the first die lands on 2”, and “the second die lands on 5”,
respectively, investigate the simultaneous independence of these three events.
Solution. Considering Example 4.11 in Chapter 2, we have:
𝑃(𝐸) = 6/36, 𝑃(𝐹) = 6/36, 𝑃(𝐺) = 6/36
𝑃(𝐸 ∩ 𝐹) = 𝑃{(2,5)} = 1/36 = 𝑃(𝐸)𝑃(𝐹)
𝑃(𝐸 ∩ 𝐺) = 𝑃{(2,5)} = 1/36 = 𝑃(𝐸)𝑃(𝐺)
𝑃(𝐹 ∩ 𝐺) = 𝑃{(2,5)} = 1/36 = 𝑃(𝐹)𝑃(𝐺)
𝑃(𝐸 ∩ 𝐹 ∩ 𝐺) = 𝑃{(2,5)} = 1/36 ≠ 𝑃(𝐸)𝑃(𝐹)𝑃(𝐺)
In this example, it is seen that even though the first three equations are valid,
Equation 4 is not valid. Therefore, these three events are pairwise independent, but
they are not simultaneously independent. Indeed, it can be shown that the
occurrence of one of the three events E, G, and F does not affect the probability of
occurrence of the other one. Nonetheless, the occurrence of two of them affects the
probability of occurrence of the other one. For example, consider the following
values:
𝑃(𝐸) = 6/36, 𝑃(𝐸|𝐹) = 1/6, 𝑃(𝐸|𝐺) = 1/6, 𝑃(𝐸|𝐹 ∩ 𝐺) = 1
⇒ 𝑃(𝐸) = 𝑃(𝐸|𝐹) = 𝑃(𝐸|𝐺) ≠ 𝑃(𝐸|𝐹 ∩ 𝐺)

Note that, in this example, if the first die lands on 2 and the second die lands
on 5, then the probability that the sum of two dice is 7 is equal to 1.
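The distinction between pairwise and simultaneous independence can be confirmed by enumerating all 36 equally likely outcomes of the two dice. The following sketch (an added check, using exact fractions) reproduces the four comparisons of Example 4.6:

```python
from fractions import Fraction
from itertools import product

outcomes = list(product(range(1, 7), repeat=2))   # all 36 outcomes

def prob(pred):
    return Fraction(sum(1 for o in outcomes if pred(o)), 36)

E = lambda o: o[0] + o[1] == 7    # the sum of the two dice is 7
F = lambda o: o[0] == 2           # the first die lands on 2
G = lambda o: o[1] == 5           # the second die lands on 5

pE, pF, pG = prob(E), prob(F), prob(G)
print(prob(lambda o: E(o) and F(o)) == pE * pF)                # True
print(prob(lambda o: E(o) and G(o)) == pE * pG)                # True
print(prob(lambda o: F(o) and G(o)) == pF * pG)                # True
print(prob(lambda o: E(o) and F(o) and G(o)) == pE * pF * pG)  # False
```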

Note that if E, F, and G are independent, then E is independent of each event
formed from F and G. For instance, event E is independent of event 𝐹 ∪ 𝐺 because:
𝑃(𝐸 ∩ (𝐹 ∪ 𝐺)) = 𝑃((𝐸 ∩ 𝐹) ∪ (𝐸 ∩ 𝐺))
= 𝑃(𝐸 ∩ 𝐹) + 𝑃(𝐸 ∩ 𝐺) − 𝑃(𝐸 ∩ 𝐹 ∩ 𝐺)
= 𝑃(𝐸)𝑃(𝐹) + 𝑃(𝐸)𝑃(𝐺) − 𝑃(𝐸)𝑃(𝐹 ∩ 𝐺)
= 𝑃(𝐸)[𝑃(𝐹) + 𝑃(𝐺) − 𝑃(𝐹 ∩ 𝐺)]
= 𝑃(𝐸)𝑃(𝐹 ∪ 𝐺)

Example 4.7

Suppose that 𝐴1 , 𝐴2 , and 𝐴3 are three simultaneously independent events with


respective probabilities 0.2, 0.3, and 0.4. Obtain the value of 𝑃((𝐴1 ∪ 𝐴2 ) ∩ 𝐴3 ).
Solution. As mentioned, if three events are simultaneously independent, then one of
them is independent of each event consisting of the other two. Therefore, we have:

𝑃((𝐴1 ∪ 𝐴2 ) ∩ 𝐴3 ) = 𝑃(𝐴1 ∪ 𝐴2 )𝑃(𝐴3 ) = [𝑃(𝐴1 ) + 𝑃(𝐴2 ) − 𝑃(𝐴1 ∩ 𝐴2 )]𝑃(𝐴3 )


= (0.2 + 0.3 − (0.2 × 0.3)) × 0.4 = 0.176

As mentioned, to investigate the independence of three events, C(3,2) equations
should be examined for the twofold relationships and C(3,3) equations for the
threefold relationship. Likewise, it can be shown that the number of equations
required to investigate the simultaneous independence of N events is equal to:
C(N,2) + C(N,3) + ⋯ + C(N,N) = 2ᴺ − C(N,0) − C(N,1) = 2ᴺ − 1 − N

Sometimes one trial can be divided into a series of smaller subtrials. In
these conditions, we say that the subtrials are independent whenever each event
of one subtrial is independent of each event of the other subtrials. For example,
suppose that we flip a fair coin 10 times. This trial can be regarded as a series of 10
subtrials, each of which consists of flipping the coin once. In such a situation, it can
be assumed that these subtrials are independent of each other since the result of
one of these subtrials does not affect the probability of occurrence of different
events related to the other subtrials.

Example 4.8

Suppose that teams A and B play soccer together. In each competition,
independently of the preceding competitions, team A wins with probability 1/6, team B
wins with probability 2/6, and the competition results in a draw with probability 3/6.
The competition is independently repeated until one of the teams wins. What is the
probability that team A wins before team B?
Solution. To win sooner, team A should win the first competition, or no team wins
the first competition (the competition is a draw) and team A wins the second
competition, or no team wins the first and second competitions and team A wins the
third competition, and so on. Therefore, if 𝑊𝐴 denotes the event that team A wins
sooner, and 𝑊𝐵 denotes the event that team B wins sooner, assuming that the results
of successive competitions are independent, we have:
𝑃(𝑊𝐴) = 1/6 + (3/6) × 1/6 + (3/6)² × 1/6 + ⋯ = ∑_{𝑛=1}^{∞} (3/6)^{𝑛−1} × 1/6 = (1/6)/(1 − 3/6) = 1/3 ⇒ 𝑃(𝑊𝐵) = 1 − 𝑃(𝑊𝐴) = 2/3
or
𝑃(𝑊𝐵) = 2/6 + (3/6) × 2/6 + (3/6)² × 2/6 + ⋯ = ∑_{𝑛=1}^{∞} (3/6)^{𝑛−1} × 2/6 = (2/6)/(1 − 3/6) = 2/3
Likewise, it can be shown that if a trial is successively and independently
repeated such that in each trial, event A occurs with probability 𝑎, event B occurs
with probability 𝑏 (there is no possibility of their simultaneous occurrence in each
trial), and none of them occurs with probability (1 − 𝑎 − 𝑏), then the probability that
event A occurs before event B is equal to:
𝑃(𝑊𝐴) = 𝑎/(𝑎 + 𝑏)
and the probability that event B occurs before event A is equal to:
𝑃(𝑊𝐵) = 𝑏/(𝑎 + 𝑏)
For example, if the answer to Example 4.8 is calculated with this method, the
required probability is equal to:
𝑃(𝑊𝐴) = (1/6)/(1/6 + 2/6) = 1/3
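The closed form 𝑎/(𝑎 + 𝑏) can also be checked by simulating the repeated competition directly. In the sketch below (illustrative; the helper name `first_to_occur` and the trial count are our own choices), each round yields A with probability a, B with probability b, or neither:

```python
import random

def first_to_occur(a, b, n_trials=100_000):
    """Estimate P(A occurs before B) when each independent round yields
    A with probability a, B with probability b, and neither otherwise."""
    wins_a = 0
    for _ in range(n_trials):
        while True:
            u = random.random()
            if u < a:           # A occurs first in this repetition
                wins_a += 1
                break
            if u < a + b:       # B occurs first in this repetition
                break
            # otherwise a draw: play another round

    return wins_a / n_trials

# Example 4.8: a = 1/6, b = 2/6; the estimate should be near 1/3
print(first_to_occur(1/6, 2/6))
```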

Example 4.9

Suppose that people A and B alternately and successively fire a shot at a target.
First, A fires a shot. If the shot hits the target, he wins. Otherwise, B fires a shot. If
person B's shot hits the target, he wins. Otherwise, person A fires one shot at the
target again, and these two people keep firing at the target until one of them wins.
If person A's shot hits the target with probability 1/6 in each shot and person B's shot
hits the target with probability 2/6, then calculate the probability that A wins sooner.

Solution. Note that, in this example, people A and B alternately fire a shot at a
target. While in the preceding example both teams A and B had a chance of winning
in each competition, here person A has a chance of winning only on odd-numbered
shots and person B has a chance of winning only on even-numbered shots. Now, if
person A is to win sooner, then A's shot should hit the target on the first shot; or
A's shot should miss on the first shot, person B's shot should miss on the second
shot, and person A's shot should hit on the third shot; and so on. Therefore, if 𝑊𝐴
denotes the event that person A wins sooner and 𝑊𝐵 denotes the event that person
B wins sooner, assuming that the successive shots are independent, we have:
𝑃(𝑊𝐴) = 1/6 + (5/6 × 4/6) × 1/6 + (5/6 × 4/6)² × 1/6 + ⋯ = (1/6)/(1 − 5/6 × 4/6) = 6/16 = 3/8 ⇒ 𝑃(𝑊𝐵) = 1 − 𝑃(𝑊𝐴) = 5/8
or
𝑃(𝑊𝐵) = 5/6 × 2/6 + (5/6 × 4/6) × 5/6 × 2/6 + (5/6 × 4/6)² × 5/6 × 2/6 + ⋯ = (5/6 × 2/6)/(1 − 5/6 × 4/6) = 5/8

Likewise, it can be shown that if people A and B perform independent trials
successively and alternately, person A wins with probability 𝑎 in his/her
corresponding trials, person B wins with probability 𝑏 in his/her corresponding
trials, and A starts playing this competition, then we have:
𝑃(𝑊𝐴) = 𝑎/(1 − (1 − 𝑎)(1 − 𝑏)) = 𝑎/(𝑎 + 𝑏 − 𝑎𝑏)
𝑃(𝑊𝐵) = (1 − 𝑎)𝑏/(1 − (1 − 𝑎)(1 − 𝑏)) = (𝑏 − 𝑎𝑏)/(𝑎 + 𝑏 − 𝑎𝑏)
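These alternating-turn formulas are easy to validate numerically. The sketch below (our own illustration; `alternating_game` is a hypothetical helper name) simulates the game of Example 4.9 and compares the result with 𝑎/(𝑎 + 𝑏 − 𝑎𝑏):

```python
import random

def alternating_game(a, b, n_trials=100_000):
    """Estimate P(A wins) when A and B shoot alternately, A first,
    hitting with per-shot probabilities a and b, respectively."""
    wins_a = 0
    for _ in range(n_trials):
        while True:
            if random.random() < a:   # A's shot hits: A wins
                wins_a += 1
                break
            if random.random() < b:   # B's shot hits: B wins
                break
    return wins_a / n_trials

a, b = 1/6, 2/6
print(alternating_game(a, b))   # simulation estimate
print(a / (a + b - a * b))      # closed form = 3/8
```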

Suppose that the 𝐹𝑖's are a sequence of mutually exclusive events whose union
equals the sample space. In such a situation, we say that the 𝐹𝑖's are partitioned
events. In other words, the 𝐹𝑖's are partitioned events whenever:

⋃_{𝑖=1}^{𝑛} 𝐹𝑖 = 𝑆 ,  ∀𝑖 ≠ 𝑗 ⇒ 𝐹𝑖 ∩ 𝐹𝑗 = ∅

Using the third principle of the probability theory, we can show that ∑_{𝑖=1}^{𝑛} 𝑃(𝐹𝑖) = 1.
Now, suppose that E is an event whose probability of occurrence is different
for each state of 𝐹𝑖 's. In these conditions, to calculate the probability that 𝐸 occurs,
the following equation can be used:

𝐸 = (𝐸 ∩ 𝐹1) ∪ (𝐸 ∩ 𝐹2) ∪ ... ∪ (𝐸 ∩ 𝐹𝑛)
⇒ 𝑃(𝐸) = 𝑃(𝐸 ∩ 𝐹1) + 𝑃(𝐸 ∩ 𝐹2) + ⋯ + 𝑃(𝐸 ∩ 𝐹𝑛)
= 𝑃(𝐸|𝐹1) 𝑃(𝐹1) + 𝑃(𝐸|𝐹2) 𝑃(𝐹2) + ⋯ + 𝑃(𝐸|𝐹𝑛) 𝑃(𝐹𝑛)
⇒ 𝑃(𝐸) = ∑_{𝑖=1}^{𝑛} 𝑃(𝐸|𝐹𝑖) 𝑃(𝐹𝑖)

This theorem is applicable when the probability of occurrence of event 𝐸 is
different for each of the partitioned states 𝐹𝑖. In other words, the probability of
occurrence of 𝐸 depends on which of the 𝐹𝑖's occurs. In fact, since ∑_{𝑖=1}^{𝑛} 𝑃(𝐹𝑖) is equal
to 1, the law of total probability states that if the probability of occurrence of the
event E in different parts of the partitioned space of 𝐹𝑖's is different, then to calculate
the probability of 𝐸, we should take a weighted average of the 𝑃(𝐸|𝐹𝑖)'s.

Example 5.1

Consider two boxes. The first one contains 3 red and 2 white marbles, and the
second one contains 2 red and 3 white marbles. We randomly select one box and
withdraw one marble from it. What is the probability that the withdrawn marble is
white?
Solution. If 𝐹𝑖 denotes the event that box i is selected, and W denotes the event that
the withdrawn marble is white, to calculate the probability of W, we condition on 𝐹1
and 𝐹2 as follows:
𝑃(𝑊) = 𝑃(𝑊|𝐹1)𝑃(𝐹1) + 𝑃(𝑊|𝐹2)𝑃(𝐹2) = 2/5 × 1/2 + 3/5 × 1/2 = 5/10

Example 5.2

In the preceding example, suppose that we randomly withdraw one marble


from the first box and put it into the second box. Then, we randomly withdraw one
marble from the second box. What is the probability that the marble withdrawn from
the second box is white?

Solution. If 𝑊1 denotes the event that the marble withdrawn from the first box is
white, and 𝑊2 denotes the event that the marble selected from the second box is
white, to calculate the probability of the required event, we should condition on 𝑊1
and 𝑊1𝑐 as follows:
𝑃(𝑊2) = 𝑃(𝑊2|𝑊1)𝑃(𝑊1) + 𝑃(𝑊2|𝑊1ᶜ)𝑃(𝑊1ᶜ) = 4/6 × 2/5 + 3/6 × 3/5 = 17/30

Example 5.3

A company sells its products in lots of size 10. A buyer randomly takes a sample
of size 3 from the products in the lots and only accepts the lots that do not contain
a defective product in the inspected sample. If 25 percent of the lots belonging to the
company do not consist of a defective product, 50 percent consist of one defective
product, and 25 percent of them consist of two defective products, what is the
probability that a lot randomly selected by the buyer is accepted?
Solution. Provided that a lot contains 𝑖 defective products, the probability of its
acceptance is equal to C(𝑖,0)C(10−𝑖,3)/C(10,3). Therefore, if A denotes the event that
the lot is accepted and 𝐵𝑖 denotes the event that there are 𝑖 defective products in the
lot, to calculate the acceptance probability of the lot, we should condition on the 𝐵𝑖's
as follows:
𝑃(𝐴) = 𝑃(𝐴|𝐵0) 𝑃(𝐵0) + 𝑃(𝐴|𝐵1) 𝑃(𝐵1) + 𝑃(𝐴|𝐵2) 𝑃(𝐵2)
= [C(10,3)/C(10,3)] × 1/4 + [C(9,3)/C(10,3)] × 2/4 + [C(8,3)/C(10,3)] × 1/4 = 1 × 1/4 + 7/10 × 2/4 + 7/15 × 1/4 = (15 + 21 + 7)/60 = 43/60

Example 5.4

An urn contains b black and r red balls. We randomly select one ball from the
urn and then return it to the urn along with c additional balls of the same color. Now,
suppose that we randomly select one ball again. What is the probability that the
second ball selected is red?
Solution. If events 𝑅1 and 𝐵1 denote that the first selected ball is red and black,
respectively, and 𝑅2 denotes that the second selected ball is red, to calculate the
probability of 𝑅2 , we should condition on 𝑅1 and 𝐵1 as follows:
𝑃(𝑅2) = 𝑃(𝑅2|𝐵1)𝑃(𝐵1) + 𝑃(𝑅2|𝑅1)𝑃(𝑅1) = (𝑟/(𝑟 + 𝑏 + 𝑐) × 𝑏/(𝑟 + 𝑏)) + ((𝑟 + 𝑐)/(𝑟 + 𝑏 + 𝑐) × 𝑟/(𝑟 + 𝑏)) = 𝑟/(𝑟 + 𝑏)
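The surprising answer r/(r + b), which does not depend on c, can be checked by simulating the reinforcement scheme. The sketch below (an added illustration; the values of b, r, and c are arbitrary test choices) simply counts how often the second ball is red:

```python
import random

def second_ball_red(b, r, c, n_trials=200_000):
    """Estimate P(second ball is red) when each drawn ball is returned
    together with c additional balls of the same color."""
    reds = 0
    for _ in range(n_trials):
        black, red = b, r
        if random.random() < red / (red + black):
            red += c               # first ball red: add c red balls
        else:
            black += c             # first ball black: add c black balls
        if random.random() < red / (red + black):
            reds += 1              # second ball is red
    return reds / n_trials

b, r, c = 4, 6, 3
print(second_ball_red(b, r, c), r / (r + b))   # both should be near 0.6
```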

Example 5.5

Suppose that a child has been lost in the city center. The child's father says
that his child has been lost in the east of the central area with probability 0.75 and
in the west with probability 0.25. The police station of the area sends three officers
to the east and two officers to the west to look for the child. Suppose that if an
officer is dispatched to the area where the child is lost (west or east), then,
independently of the other officers, he finds the child with probability 0.4. What is
the probability of finding the child?

Solution. Suppose that F denotes the event of finding the child, E denotes the event
that the child has been lost in the east, and W denotes the event that the child has
been lost in the west. In each of the two directions, the probability that at least one
of the officers finds the child is equal to one minus the probability that none of them
finds him. Therefore, to calculate the probability of F, we should condition on events
E and W as follows:
𝑃 (𝐹) = 𝑃(𝐹|𝐸) 𝑃(𝐸) + 𝑃(𝐹|𝑊) 𝑃 (𝑊) = [1 − (0.6)3 ] × 0.75 + [1 − (0.6)2 ] × 0.25 = 0.748

Example 5.6

Solve Example 4.8 by using the law of total probability and conditioning on the
results of the first trial.
Solution. Suppose that event A denotes that person A wins the game sooner than B,
and event B denotes that person B wins the game sooner than A. Furthermore, 𝐴1
and 𝐵1 denote the events that people A and B win the first trial, respectively, and 𝐶1
denotes the event that nobody wins the first trial. Therefore, conditioning on the
result of the first trial leads to:
𝑃(𝐴) = 𝑃(𝐴|𝐴1 ) 𝑃(𝐴1 ) + 𝑃(𝐴|𝐵1 ) 𝑃(𝐵1 ) + 𝑃(𝐴|𝐶1 ) 𝑃(𝐶1 )
𝑃(𝐴) = 1 × 1/6 + 0 × 2/6 + 𝑃(𝐴) × 3/6 ⇒ 𝑃(𝐴) = 1/3
Note that if nobody wins the first trial, then the conditions for calculating the
probability of winning for person A are the same as at the start of the game. Therefore, in
the solution above, the value of 𝑃(𝐴|𝐶1) is assumed to be equal to the value of 𝑃(𝐴).
Moreover, to calculate the probability that person B wins the game sooner than A,
we have:
𝑃(𝐵) = 𝑃(𝐵|𝐴1 ) 𝑃(𝐴1 ) + 𝑃(𝐵|𝐵1 ) 𝑃(𝐵1 ) + 𝑃(𝐵|𝐶1 ) 𝑃(𝐶1 )
𝑃(𝐵) = 0 × 1/6 + 1 × 2/6 + 𝑃(𝐵) × 3/6 ⇒ 𝑃(𝐵) = 2/3

Example 5.7

Solve Example 4.9 by using the law of total probability and conditioning on the
results of the first and second trials.
Solution. Suppose that event A denotes person A wins the game sooner than B, and
event B denotes person B wins the game sooner than A. Furthermore 𝐴𝑖 and 𝐵𝑖 denote
the events that people A and B win the 𝑖 𝑡ℎ trial. Therefore, conditioning on the result
of the first and second trials leads to:
𝑃(𝐴) = 𝑃(𝐴|𝐴1 ) 𝑃(𝐴1 ) + 𝑃(𝐴|𝐴1𝑐 𝐵2 ) 𝑃(𝐴1𝑐 𝐵2 ) + 𝑃(𝐴|𝐴1𝑐 𝐵2𝑐 ) 𝑃(𝐴1𝑐 𝐵2𝑐 )
𝑃(𝐴) = 1 × 1/6 + 0 × (5/6 × 2/6) + 𝑃(𝐴) × 5/6 × 4/6 ⇒ 𝑃(𝐴) = 3/8
Note that if nobody wins on the first and second trials, then the conditions for
calculating the probability that person A wins are the same as at the start of the game.
Therefore, in the solution above, the value of 𝑃(𝐴|𝐴1ᶜ𝐵2ᶜ) is assumed to be equal to the
value of 𝑃(𝐴). Moreover, to calculate the probability that person B wins the game
sooner than A, we have:
𝑃(𝐵) = 𝑃(𝐵|𝐴1 ) 𝑃(𝐴1 ) + 𝑃(𝐵|𝐴1𝑐 𝐵2 ) 𝑃(𝐴1𝑐 𝐵2 ) + 𝑃(𝐵|𝐴1𝑐 𝐵2𝑐 ) 𝑃(𝐴1𝑐 𝐵2𝑐 )
𝑃(𝐵) = 0 × 1/6 + 1 × 5/6 × 2/6 + 𝑃(𝐵) × 5/6 × 4/6 ⇒ 𝑃(𝐵) = 5/8

Example 5.8

Suppose that a bag contains 5 gold and 15 iron coins, and you are one of 20
people who successively choose one coin at random.
a. If you are the first person choosing a coin, what is the probability of picking
the gold coin?

b. If you are the second person choosing a coin, what is the probability of picking
the gold coin?

Solution.
a. The probability of picking a gold coin for the first person equals 5/20.
b. To calculate the probability that the second person picks the gold coin, we can
condition on the event whether the first person picks the gold coin or not.
Therefore, if 𝐺1 and 𝐺2 denote the events of picking the gold coin by the first
and second individuals, respectively, we have:
𝑃(𝐺2) = 𝑃(𝐺2|𝐺1) 𝑃(𝐺1) + 𝑃(𝐺2|𝐺1ᶜ) 𝑃(𝐺1ᶜ) = 4/19 × 5/20 + 5/19 × 15/20 = 5/20
Likewise, it can be shown that the probability of choosing a gold coin for
each of the choices, from the first to the 20th, is equal to 5/20.
The reader should note that, in this example, even though it seems necessary
to condition on the preceding choices to calculate the probability of choosing
a gold coin for each choice, it is possible to solve this problem without
conditioning on the preceding events as well. Indeed, each of the choices, such
as the 20th choice, can be one out of the 20 possible coins, 5 of which are gold.
Therefore, in each choice, even the 20th one, the probability of choosing a
gold coin is equal to 5/20.
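The symmetry argument, that every draw position is equally likely to receive a gold coin, can be visualized by shuffling the bag once per repetition and tallying gold coins by position. The following sketch (an added illustration) should print twenty values all close to 5/20 = 0.25:

```python
import random

coins = ['G'] * 5 + ['I'] * 15       # 5 gold and 15 iron coins
n_trials = 200_000
hits = [0] * 20                      # gold-coin counts per draw position

for _ in range(n_trials):
    random.shuffle(coins)            # a uniformly random draw order
    for pos, coin in enumerate(coins):
        if coin == 'G':
            hits[pos] += 1

print([round(h / n_trials, 3) for h in hits])   # each entry ≈ 0.25
```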

Example 5.9

In the preceding example, if you randomly select 2 coins as the first person,
what is the probability that both of them are gold? If, first, one coin is randomly
withdrawn and then you select 2 coins at random, what is the probability that both
of them are gold?
Solution. If you select 2 coins as the first person, the probability that both coins are
gold equals:
C(5,2)/C(20,2) = 1/19
Or, if the order of choices matters, by using another method, the answer is equal to:
(5 × 4)/(20 × 19) = 1/19
However, if one coin is randomly selected first and then you select two coins
at random, to calculate the probability that both of them are gold, we can condition
on the event whether the initial choice has been gold or not. Hence, if 𝐺1 denotes the
event of choosing the gold coin by the first person and A denotes the event that both
coins corresponding to your choices are gold, we have:
𝑃(𝐴) = 𝑃(𝐴|𝐺1) 𝑃(𝐺1) + 𝑃(𝐴|𝐺1ᶜ) 𝑃(𝐺1ᶜ) = (4 × 3)/(19 × 18) × 5/20 + (5 × 4)/(19 × 18) × 15/20
= (4 × 5 × (3 + 15))/(18 × 19 × 20) = 1/19
Likewise, it can be shown that in whatever way you select two coins at random
from the bag, the probability that both of the coins are gold is equal to 1/19. In fact,
even though it seems necessary to condition on the preceding choices to calculate
the probability that both of your coins are gold, it is also possible to solve this
problem without conditioning on the preceding events. Indeed, whether the first and
second random choices or the 19th and the 20th random choices belong to you,
there are C(20,2) states, C(5,2) of which indicate that your coins are gold. Hence, the
probability that both of the coins are gold is equal to:
C(5,2)/C(20,2) = 1/19
Likewise, it can be shown that if we randomly select n marbles from a bag
containing N gold and M non-gold marbles, the probability of any event (such as all
the marbles being gold) is the same as in the situation where k marbles are first
randomly removed and then we select n marbles at random from the bag.

As seen in the preceding examples, 𝑃(𝐺1), 𝑃(𝐺2), and 𝑃(𝐺2|𝐺1) are equal to 5/20,
5/20, and 4/19, respectively. Nevertheless, if the choices are with replacement, these
probabilities are equal to 5/20, 5/20, and 5/20, respectively. In other words, choices
with replacement are independent trials and those without replacement are
dependent trials.

Sometimes, having knowledge about the occurrence of an event like E, we want to
determine the probability of occurrence of one of the partitioned events like 𝐹𝑖. In
fact, the probability of one of the partitioned events is now regarded in the new main
space E; it can be represented as 𝑃(𝐹𝑖|𝐸) and calculated as follows:

𝑃(𝐹𝑖|𝐸) = 𝑃(𝐸 ∩ 𝐹𝑖)/𝑃(𝐸) = 𝑃(𝐸|𝐹𝑖) 𝑃(𝐹𝑖) / ∑_{𝑗=1}^{𝑛} 𝑃(𝐸|𝐹𝑗) 𝑃(𝐹𝑗)

This equation is known as Bayes' law.

Example 6.1

A person can go to his work office either by bus or by taxi. The probability that
he takes a taxi equals 0.3. If he goes to his office by taxi, he arrives late 20 percent
of the time. However, he arrives late 30 percent of the time when he goes to his
office by bus. Given that he arrives at work late on a given day, what is the probability
that he came by bus?
Solution. Suppose that D denotes the event of coming late, T denotes the event of
taking a taxi, and B denotes the event of taking a bus. In these conditions, to calculate
𝑃(𝐵|𝐷), we use Bayes' law as follows:
𝑃(𝐵|𝐷) = 𝑃(𝐷|𝐵) 𝑃(𝐵) / [𝑃(𝐷|𝐵) 𝑃(𝐵) + 𝑃(𝐷|𝑇) 𝑃(𝑇)] = (0.3 × 0.7)/((0.3 × 0.7) + (0.2 × 0.3)) = 21/27 = 0.778

From the result of the problem, it can be concluded that, in the total sample
space, the probability of taking a bus and a taxi is equal to 0.7 and 0.3, respectively.
However, for the days of late arrival to the work office, these proportions are equal
to 0.778 and 0.222, respectively.

Example 6.2

Suppose that the probability of having a type of disease for smokers equals 0.1
and for non-smokers equals 0.03. If, in a community, 10 percent of the people are
smokers and 90 percent of them are non-smokers, what percentage of the people
having the disease are smokers?

Solution. If A denotes the event of being a smoker and B denotes the event of having
the disease, to calculate 𝑃(𝐴|𝐵), we use Bayes' law as follows:

𝑃(𝐴|𝐵) = 𝑃(𝐴 ∩ 𝐵)/𝑃(𝐵) = 𝑃(𝐵|𝐴)𝑃(𝐴) / [𝑃(𝐵|𝐴)𝑃(𝐴) + 𝑃(𝐵|𝐴ᶜ)𝑃(𝐴ᶜ)] = (0.10 × 0.10)/(0.10 × 0.10 + 0.03 × 0.90) = 0.01/0.037 ≈ 0.27

This result may seem a little strange, because only 27 percent of the people having
the disease are smokers and thus 73 percent of them are non-smokers. The reason
behind such a result is that the frequency of non-smokers is high in the main
community. In fact, the 3 percent of non-smokers having the disease constitute a
large group and possess a high portion of the diseased people. For example, suppose
that the community consists of 1000 people, 10 percent of whom, meaning 100 people,
are smokers, and 90 percent of whom, meaning 900 people, are non-smokers. 10
percent, or 10 people, of the smokers and 3 percent, or 27 people, of the non-smokers
have the disease. It means that even though 10 percent of the smokers and 3 percent
of the non-smokers have the disease, among all the 37 people having the disease,
only 10/37 of them, meaning 27 percent, are smokers. As mentioned before, the reason
behind such a result is that in the main community, the frequency of smokers is low
and that of the non-smokers is high.
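The counting argument above can be written out as a few lines of arithmetic, which makes clear that Bayes' law here is nothing more than a ratio of counts. The sketch below (an added illustration using the same hypothetical community of 1000 people) reproduces the 10/37 figure:

```python
# A community of 1000 people, split as in the text
smokers, nonsmokers = 100, 900            # 10% and 90% of the community
sick_smokers = 0.10 * smokers             # 10 people
sick_nonsmokers = 0.03 * nonsmokers       # 27 people

# Fraction of the 37 diseased people who smoke: Bayes' law as a count ratio
print(sick_smokers / (sick_smokers + sick_nonsmokers))   # 10/37 ≈ 0.27
```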

Example 6.3

In a test, the proportion of women and men participants are 25 and 75 percent,
respectively. If 98 percent of women and 96 percent of men are accepted in the test,
what percentage of the people accepted are female?
Solution. If F and M denote the events of being a woman and man, respectively, and
B denotes the event of being accepted, to calculate 𝑃(𝐹|𝐵), we use Bayes' law as
follows:

𝑃(𝐹|𝐵) = 𝑃(𝐹 ∩ 𝐵)/𝑃(𝐵) = 𝑃(𝐵|𝐹)𝑃(𝐹) / [𝑃(𝐵|𝐹)𝑃(𝐹) + 𝑃(𝐵|𝑀)𝑃(𝑀)] = (0.98 × 0.25)/(0.98 × 0.25 + 0.96 × 0.75) = 0.245/0.965 ≈ 0.254

As seen, the proportion of women and men in the main space of the test
participants is 0.25 and 0.75, respectively. However, this proportion in the reduced
space related to the people accepted equals 0.254 and 0.746. Note that, in this
problem, the reduced space constitutes 0.965 of the main space and it is not that
different from the main space. Hence, the proportion of women and men in the main
space and in the reduced space are not remarkably different.

Example 6.4

There are two coins in a box. One of them is two-headed and the other is fair.
We select one coin at random from the box and flip it once. If it lands on heads, what
is the probability that the two-headed coin is selected?
Solution. Suppose that H denotes the event of landing on heads, F denotes the event
of selecting the two-headed coin, and 𝐹 𝑐 denotes the event of selecting the fair coin.
In these conditions, to calculate 𝑃(𝐹 |𝐻), we use the Bayes' law as follows:
𝑃(𝐹|𝐻) = 𝑃(𝐻|𝐹)𝑃(𝐹) / [𝑃(𝐻|𝐹)𝑃(𝐹) + 𝑃(𝐻|𝐹ᶜ)𝑃(𝐹ᶜ)] = (1 × 1/2)/(1 × 1/2 + 1/2 × 1/2) = 2/3

Example 6.5

A company sells its products in lots of size 10. A buyer takes a sample of size 3
from the products in the lots and only accepts lots that do not contain a defective
product in the inspected sample. Suppose that 25 percent of the lots belonging to
the company do not have a defective product, 50 percent of them consist of one
defective product, and 25 percent of them consist of two defective products. If a lot

is randomly selected and accepted by the buyer, what is the probability that there
are two defective products in the lot?
Solution. If A denotes the event of accepting the lot and 𝐵𝑖 denotes the event that
the lot contains 𝑖 defective products, to calculate 𝑃(𝐵2 |𝐴), we use the Bayes' law as
follows:
𝑃(𝐵2|𝐴) = 𝑃(𝐴|𝐵2) 𝑃(𝐵2) / [𝑃(𝐴|𝐵0) 𝑃(𝐵0) + 𝑃(𝐴|𝐵1) 𝑃(𝐵1) + 𝑃(𝐴|𝐵2) 𝑃(𝐵2)]
= ([C(8,3)/C(10,3)] × 1/4) / ([C(10,3)/C(10,3)] × 1/4 + [C(9,3)/C(10,3)] × 2/4 + [C(8,3)/C(10,3)] × 1/4)
= (7/60)/(43/60) = 7/(15 + 21 + 7) = 7/43
Note that, as mentioned in Example 5.3, if the lot consists of 𝑖 defective
products, its acceptance probability is equal to C(𝑖,0)C(10−𝑖,3)/C(10,3).

Example 6.6

Suppose that the probability of having a type of disease for smokers is twice
that of the non-smokers. If 1/5 of the people in a community are smokers, what
percentage of the people having the disease are smokers?
Solution. If A denotes the event that people are smokers and B denotes the event of
having the disease, according to the information of the problem, the value of the
𝑃(𝐵|𝐴) is twice that of 𝑃(𝐵|𝐴𝑐 ). Therefore, to calculate 𝑃(𝐴|𝐵), we use the Bayes' law
as follows:
𝑃(𝐴|𝐵) = 𝑃(𝐴 ∩ 𝐵)/𝑃(𝐵) = 𝑃(𝐵|𝐴)𝑃(𝐴) / [𝑃(𝐵|𝐴)𝑃(𝐴) + 𝑃(𝐵|𝐴ᶜ)𝑃(𝐴ᶜ)] = (2𝑎 × 1/5)/(2𝑎 × 1/5 + 𝑎 × 4/5) = 1/3

Example 6.7

There are 3 urns. Urn A contains 2 white and 4 red balls, urn B contains 8 white
and 4 red balls, and urn C contains one white and 3 red balls. If we select one ball at
random from each urn, what is the probability that the ball selected from the urn A
is white given that altogether 2 white balls are selected?
Solution. Suppose that A denotes the event that the ball of urn A is white and F
denotes the event that altogether 2 white balls are selected. Therefore, to calculate
𝑃(𝐴|𝐹), we use the Bayes' law as follows:
𝑃(𝐴|𝐹) = 𝑃(𝐹|𝐴) 𝑃(𝐴) / [𝑃(𝐹|𝐴) 𝑃(𝐴) + 𝑃(𝐹|𝐴ᶜ) 𝑃(𝐴ᶜ)]
= ([8/12 × 3/4 + 4/12 × 1/4] × 2/6) / ([8/12 × 3/4 + 4/12 × 1/4] × 2/6 + [8/12 × 1/4] × 4/6) = 7/11

As mentioned in Section 3.2, the conditional probability has all the features of a
probability in the main space. It can also be shown that 𝑃(𝐸|𝐹) satisfies all the
three axioms of the probability theory. Furthermore, all the propositions belonging
to the sample space S can be used in the sample space F as well. For instance, we
showed that equation 𝑃(𝐸|𝐹) + 𝑃(𝐸 𝑐 |𝐹) = 1 is true. Likewise, it can be shown that
the law of total probability, expressed in Section 3.5, is true in the reduced space
as it is true in the main space. In other words, in the main space, if the probability
of occurrence of an event such as G depends on the occurrence or non-
occurrence of an event such as E, based on the law of total probability, we have:
𝑃(𝐺) = 𝑃(𝐺|𝐸)𝑃(𝐸) + 𝑃(𝐺|𝐸 𝑐 )𝑃(𝐸 𝑐 )
Now, in the reduced space F, if the probability of occurrence of an event
such as G depends on the occurrence or non-occurrence of an event such as E,
we have:
𝑃(𝐺|𝐹) = 𝑃(𝐺|𝐸 ∩ 𝐹)𝑃(𝐸|𝐹) + 𝑃(𝐺|𝐸 𝑐 ∩ 𝐹)𝑃(𝐸 𝑐 |𝐹)

It means that, in the reduced space F, to calculate the probability of
occurrence of event G, we should condition on the occurrence or nonoccurrence
of event E.

Example 7.1

Suppose a test is conducted in a factory to determine whether parts are
defective or non-defective. The probability that the test result is positive for a
defective part is 0.95. Moreover, the probability that the test result is positive for a
non-defective part is 0.1. If 5 percent of the parts in this factory are defective, what
percentage of the parts whose first defectiveness-diagnosis test result is positive
have a positive result in the second test as well?
Solution. Suppose that F denotes the event that the parts are defective. 𝐸1 and 𝐸2
denote the events that the results of the first and second tests are positive,
respectively. Therefore, 𝑃(𝐸2 |𝐸1 ) is required for the problem. In the space of the parts
with a positive first test, the proportion of defective parts and nondefective parts are
calculated as follows:
𝑃(𝐹|𝐸1) = 𝑃(𝐸1|𝐹)𝑃(𝐹) / [𝑃(𝐸1|𝐹)𝑃(𝐹) + 𝑃(𝐸1|𝐹ᶜ)𝑃(𝐹ᶜ)] = (0.95 × 0.05)/(0.95 × 0.05 + 0.1 × 0.95) = 1/3
𝑃(𝐹ᶜ|𝐸1) = 𝑃(𝐸1|𝐹ᶜ)𝑃(𝐹ᶜ) / [𝑃(𝐸1|𝐹)𝑃(𝐹) + 𝑃(𝐸1|𝐹ᶜ)𝑃(𝐹ᶜ)] = (0.1 × 0.95)/(0.95 × 0.05 + 0.1 × 0.95) = 2/3
As a result, among the parts with a positive first test, the proportions of
defective and non-defective parts are not 0.05 and 0.95, but rather 1/3 and 2/3.
Therefore, in this space, the probability that the result of the next test is also
positive equals:
𝑃(𝐸2|𝐸1) = 𝑃(𝐸2|𝐸1 ∩ 𝐹)𝑃(𝐹|𝐸1) + 𝑃(𝐸2|𝐸1 ∩ 𝐹ᶜ)𝑃(𝐹ᶜ|𝐸1) = 0.95 × 1/3 + 0.1 × 2/3 ≈ 0.383
However, this problem can also be solved by the following method:

𝑃(𝐸2|𝐸1) = 𝑃(𝐸2 ∩ 𝐸1)/𝑃(𝐸1) = [𝑃(𝐸2 ∩ 𝐸1|𝐹)𝑃(𝐹) + 𝑃(𝐸2 ∩ 𝐸1|𝐹ᶜ)𝑃(𝐹ᶜ)] / [𝑃(𝐸1|𝐹)𝑃(𝐹) + 𝑃(𝐸1|𝐹ᶜ)𝑃(𝐹ᶜ)]
= (0.95 × 0.95 × 0.05 + 0.1 × 0.1 × 0.95)/(0.95 × 0.05 + 0.1 × 0.95) ≈ 0.383

Note that, in this problem, when we know that event F has occurred, events
𝐸1 and 𝐸2 can be assumed to be independent. It means that if one part is defective, it
is possible to assume that each of the two trials belonging to that part is
independently positive with probability 0.95. In these conditions, we say that events
𝐸1 and 𝐸2 are independent given event F. However, in case of not having knowledge
about the occurrence or nonoccurrence of event F, these two events are not
independent of each other. That is, 𝑃(𝐸2 |𝐸1 ) is not equal to 𝑃(𝐸2 ).
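Conditional independence is easy to demonstrate by simulation: fix the part's state first, then generate the two test results independently given that state. In the sketch below (an added illustration; the trial count is arbitrary), the conditional frequency of a second positive result given a first one is near 0.383, while the unconditional frequency of a positive result is only about 0.1425:

```python
import random

n_trials = 500_000
pos1 = pos2 = both = 0
for _ in range(n_trials):
    defective = random.random() < 0.05
    p = 0.95 if defective else 0.10     # P(positive | state), per test
    t1 = random.random() < p            # first test result
    t2 = random.random() < p            # second test, independent given state
    pos1 += t1
    pos2 += t2
    both += t1 and t2

print(both / pos1)        # P(E2 | E1) ≈ 0.383
print(pos2 / n_trials)    # P(E2) ≈ 0.1425, so E1 and E2 are dependent
```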

Example 7.2

Suppose that the parts of a factory are produced by two devices. For the first
and second devices, the probability that a part is defective is 0.05 and 0.15,
respectively. The parts of each device are packed in separate stocks. We have
selected one part at random from a stock whose device of origin we do not know,
and the part turned out to be defective. Considering the assumption that half of the
factory's stocks are produced by the first device and the other half by the second
device, what is the probability that the next part selected from this stock is also
defective?
Solution. Suppose that 𝐹𝑖 denotes the event that the stock is selected from device i.
Also, 𝐸1 and 𝐸2 denote the events that choices 1 and 2 are defective, respectively.
Hence, 𝑃(𝐸2 |𝐸1 ) is required for the problem. If we know that the first choice has been
defective, the probability of selecting the stock from the first and second devices is
equal to:
𝑃(𝐹1|𝐸1) = 𝑃(𝐸1|𝐹1)𝑃(𝐹1) / [𝑃(𝐸1|𝐹1)𝑃(𝐹1) + 𝑃(𝐸1|𝐹2)𝑃(𝐹2)] = (0.05 × 0.5)/(0.05 × 0.5 + 0.15 × 0.5) = 1/4
𝑃(𝐹2|𝐸1) = 𝑃(𝐸1|𝐹2)𝑃(𝐹2) / [𝑃(𝐸1|𝐹1)𝑃(𝐹1) + 𝑃(𝐸1|𝐹2)𝑃(𝐹2)] = (0.15 × 0.5)/(0.05 × 0.5 + 0.15 × 0.5) = 3/4
Therefore, if the first part is defective, the probability of the stock coming
from the first and second devices is not equal to 1/2 and 1/2, but rather 1/4 and 3/4,
respectively. Hence, if the first part is defective, the probability that the second part
is also defective is equal to:
𝑃(𝐸2|𝐸1) = 𝑃(𝐸2|𝐸1 ∩ 𝐹1)𝑃(𝐹1|𝐸1) + 𝑃(𝐸2|𝐸1 ∩ 𝐹2)𝑃(𝐹2|𝐸1) = 0.05 × 1/4 + 0.15 × 3/4 = 0.125
However, this problem can also be solved by the following method:
𝑃(𝐸2|𝐸1) = 𝑃(𝐸2 ∩ 𝐸1)/𝑃(𝐸1) = [𝑃(𝐸2 ∩ 𝐸1|𝐹1)𝑃(𝐹1) + 𝑃(𝐸2 ∩ 𝐸1|𝐹2)𝑃(𝐹2)] / [𝑃(𝐸1|𝐹1)𝑃(𝐹1) + 𝑃(𝐸1|𝐹2)𝑃(𝐹2)]
= (0.05 × 0.05 × 0.5 + 0.15 × 0.15 × 0.5)/(0.05 × 0.5 + 0.15 × 0.5) = 0.125

At the end of this chapter, to recap some of the presented concepts and
examples, we investigate the following example:
Consider two urns: the first contains 2 black and 3 white marbles, and the second contains 3 black and 2 white marbles. We select one urn at random and then select two marbles at random from it. It is desired to calculate the probability that:
a. Both of the marbles selected are white.
b. The second marble selected is white.
c. The first urn is selected given that both of the marbles selected are white.
d. The second marble selected is white, given that the first marble selected is
white.
Solution. Suppose that 𝐹𝑖 denotes the event of selecting urn 𝑖. Also, 𝐸1 and 𝐸2 denote
the events that the first and second choices are white.
a. According to the explanations of Section 3.5 or the law of total probability,
since the probability of the desired event depends on the selected urn, we
condition on the selected urn as follows:
Writing 𝐴 = 𝐸1 ∩ 𝐸2, we have:
𝑃(𝐸2 ∩ 𝐸1) = 𝑃(𝐴) = 𝑃(𝐴|𝐹1)𝑃(𝐹1) + 𝑃(𝐴|𝐹2)𝑃(𝐹2) = (3/5 × 2/4) × 1/2 + (2/5 × 1/4) × 1/2 = 1/5
b. According to the explanations of Example 5.8, whatever urn we select, the
probability that the second selected marble is white equals the probability that
the first selected marble is white. Therefore, we have:

𝑃(𝐸2) = 𝑃(𝐸2|𝐹1)𝑃(𝐹1) + 𝑃(𝐸2|𝐹2)𝑃(𝐹2) = 𝑃(𝐸1|𝐹1)𝑃(𝐹1) + 𝑃(𝐸1|𝐹2)𝑃(𝐹2) = (3/5) × 1/2 + (2/5) × 1/2 = 1/2

c. According to the explanations of Section 3.6 or the Bayes' law, we have:

𝑃(𝐹1|𝐸2 ∩ 𝐸1) = 𝑃(𝐹1|𝐴) = 𝑃(𝐹1 ∩ 𝐴)/𝑃(𝐴) = 𝑃(𝐴|𝐹1)𝑃(𝐹1) / [𝑃(𝐴|𝐹1)𝑃(𝐹1) + 𝑃(𝐴|𝐹2)𝑃(𝐹2)]
= [(3/5 × 2/4) × 1/2] / [(3/5 × 2/4) × 1/2 + (2/5 × 1/4) × 1/2] = 3/4

d. According to the explanations of Section 3.7, we have:
𝑃(𝐸2|𝐸1) = [𝑃(𝐸2 ∩ 𝐸1|𝐹1)𝑃(𝐹1) + 𝑃(𝐸2 ∩ 𝐸1|𝐹2)𝑃(𝐹2)] / [𝑃(𝐸1|𝐹1)𝑃(𝐹1) + 𝑃(𝐸1|𝐹2)𝑃(𝐹2)]
= [(3/5 × 2/4) × 1/2 + (2/5 × 1/4) × 1/2] / [(3/5) × 1/2 + (2/5) × 1/2] = (1/5)/(1/2) = 2/5
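All four answers can be checked by simulation. The following minimal Python sketch (not part of the original text; the seed and trial count are arbitrary) repeatedly selects an urn at random and draws two marbles without replacement:

    import random

    random.seed(7)
    urns = {1: ["B"]*2 + ["W"]*3, 2: ["B"]*3 + ["W"]*2}
    n = 1_000_000
    both = first = second = urn1_and_both = 0

    for _ in range(n):
        i = random.choice((1, 2))            # select an urn at random
        m1, m2 = random.sample(urns[i], 2)   # two marbles without replacement
        if m1 == "W":
            first += 1
        if m2 == "W":
            second += 1
        if m1 == m2 == "W":
            both += 1
            if i == 1:
                urn1_and_both += 1

    print(both / n)              # a: close to 1/5
    print(second / n)            # b: close to 1/2
    print(urn1_and_both / both)  # c: close to 3/4
    print(both / first)          # d: close to 2/5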
1) Consider a sample space such as 𝑆. Assuming that the defined events in the
following statements are nonempty, investigate the validity of the following
statements.
a. If 𝐴 and 𝐵 are disjoint events, then they are dependent.
b. If A is a subset of 𝐵, then 𝐴 and 𝐵 are independent.
c. 𝑃(𝐵|𝐴) + 𝑃(𝐵 𝐶 |𝐴) = 1.
d. If 𝐴 and 𝐵 are independent, then 𝑃(𝐴𝐶 |𝐵) = 𝑃(𝐴𝐶 ).
e. If 𝐴 and 𝐵 are independent, then 𝐴 and (𝐴 ∪ 𝐵) are independent as well.
f. If 𝐴 and 𝐵 are independent, then (𝐴𝐶 ∪ 𝐵 𝐶 ) is not equal to the sample
space (𝑆).
g. If 𝑃(𝐴|𝐵) = 𝑃(𝐴|𝐵 𝐶 ), then 𝐴 and 𝐵 are independent.
h. If 𝐴 and 𝐵 are independent and 𝑁 ⊂ 𝐵, 𝑀 ⊂ 𝐴, then 𝑀 and 𝑁 are
independent.
i. If 𝐴 is independent of 𝐵 and also 𝐶, then 𝐴 is independent of 𝐵 ∪ 𝐶.
j. For three arbitrary events 𝐴, 𝐵, and 𝐶, if 𝐴 and 𝐵 are independent, and
so are 𝐴 and 𝐶, then 𝐵 and 𝐶 are independent as well.
k. For arbitrary events 𝐴 and 𝐵, we have 𝑃(𝐴 ∩ 𝐵|𝐴) ≥ 𝑃(𝐴 ∩ 𝐵|𝐴 ∪ 𝐵).
l. For arbitrary events 𝐴 and 𝐵, we have 𝑃(𝐴|𝐵) ≥ [𝑃(𝐴) + 𝑃(𝐵) − 1]/𝑃(𝐵).

m. If 𝑃(𝐴|𝐵) = 1, then 𝑃(𝐵 𝐶 |𝐴𝐶 ) = 1.


n. If 𝑃(𝐴) ≤ 𝑃(𝐴|𝐵), then 𝑃(𝐵) ≤ 𝑃(𝐵|𝐴).
o. If 𝑃(𝐴) < 𝑃(𝐴|𝐵), then 𝑃(𝐴𝑐 ) > 𝑃(𝐴𝑐 |𝐵).
p. If 𝑃(𝐴) + 𝑃(𝐵) ≥ 1, then 𝑃(𝐵|𝐴) ≥ 1 − 𝑃(𝐵^𝐶)/𝑃(𝐴).

q. If (𝐴 ∩ 𝐵) ⊂ 𝐶, then 𝑃(𝐶 𝐶 ) ≤ 𝑃(𝐴𝐶 ) + 𝑃(𝐵 𝐶 ).
2) If 𝐴𝑖 's are mutually independent events belonging to one sample space,
then investigate the validity of the following statements.
a. 𝑃(𝐴1 𝛥(𝐴2 ∩ 𝐴3 )) = 𝑃(𝐴1 ) + 𝑃(𝐴2 )𝑃(𝐴3 ) − 2𝑃(𝐴1 )𝑃(𝐴2 )𝑃(𝐴3 )
b. 𝑃(𝐴3 ∩ (𝐴1 ∪ 𝐴2 )) = 𝑃(𝐴1 )𝑃(𝐴3 ) + 𝑃(𝐴2 )𝑃(𝐴3 ) − 𝑃(𝐴1 )𝑃(𝐴2 )𝑃(𝐴3 )
c. 𝑃(𝐴1 ∩ … ∩ 𝐴𝑛 ) = 𝑃(𝐴1 ) … 𝑃(𝐴𝑛 ) = ∏𝑛𝑖=1 𝑃(𝐴𝑖 )
d. 𝑃(𝐴1 ∪ … ∪ 𝐴𝑛 ) = 1 − ∏𝑛𝑖=1 𝑃(𝐴𝑖 )
3) In the trial of flipping two fair coins, consider the following events:
A: The event that the first coin lands on tails.
B: The event that the second coin lands on heads.
C: The event that one head and one tail appear.
Investigate the mutual independence of these three events.
4) Consider n individuals who have gathered together at random. Suppose that
𝐸𝑖,𝑗 denotes the event that people (𝑖, 𝑗) have the same birthday. Also, suppose
that the individuals were born with the same probability in a 365-day year. In
these conditions, it is desired to calculate:
a. 𝑃(𝐸3,4 |𝐸1,2 )

b. 𝑃(𝐸1,3 |𝐸1,2 )

c. 𝑃(𝐸2,3 |𝐸1,2 ∩ 𝐸1,3 )


5) If 𝐸 and 𝐹 are two independent events with respective probabilities 𝑃(𝐸) = 1/3 and 𝑃(𝐹) = 1/2, then obtain the following values:

a. 𝑃(𝐸 𝐶 − 𝐹 𝐶 )
b. 𝑃(𝐸 𝐶 𝛥𝐹 𝐶 )
c. 𝑃(𝐸 𝐶 |𝐹 𝐶 )
6) If 𝑃(𝐴) = 0.4 and 𝑃(𝐵) = 0.8, then obtain the lower and upper bounds for the
value of 𝑃(𝐴|𝐵).
7) If we have two events 𝐴 and 𝐵 such that:
𝑃(𝐴|𝐵) + 𝑃(𝐵|𝐴) = 3/4, 𝑃(𝐵) = 2/3, 𝑃(𝐴) = 1/3
it is desired to calculate:
a. 𝑃(𝐴 ∩ 𝐵)
b. 𝑃(𝐴 ∪ 𝐵)
8) If we have two events 𝐴 and 𝐵 such that:
𝑃(𝐴) = 1/2, 𝑃(𝐴|𝐵) = 1/3, 𝑃(𝐵|𝐴^𝐶) = 1/4
it is desired to calculate:
a. 𝑃(𝐵)
b. 𝑃(𝐴 ∩ 𝐵)
c. 𝑃(𝐴 ∪ 𝐵)
9) If 𝑃(𝐴𝐶 ∩ 𝐵 𝐶 ) = 0.6 and 𝑃(𝐴 ∩ 𝐵) = 0.1, then it is desired to obtain 𝑃(𝐴𝛥𝐵|𝐴 ∪
𝐵).
10) A device transmits the digits 0 and 1, each of which is erroneously transmitted
with probability 1/3. If we are told that a four-digit code is erroneously
transmitted, what is the probability that its first digit was erroneously
transmitted?
11) Experience has shown that 80 percent of poacher 𝐴's shots and 40 percent
of poacher 𝐵's shots hit the target. During their poaching, they saw a rabbit
abruptly and attempted to hit it simultaneously and independently. The rabbit
died, but there was one hole on its body. What is the probability that poacher
𝐴 hits the rabbit?
12) A device is defective in the first and second days of the week with respective
probabilities 0.2 and 0.22. If the probability of being defective for the device
on the second day when we know it has been defective in the first day is equal
to 0.7, what is the probability of not being defective on the second day given
that it has not been defective on the first day?
13) Suppose that 𝐴 and 𝐵 are two independent events such that 𝑃(𝐴 ∩ 𝐵) = 1/6 and 𝑃(𝐴^𝐶 ∩ 𝐵^𝐶) = 1/3. In these conditions, obtain 𝑃(𝐴) and 𝑃(𝐵).

14) The probability that a person survives after kidney transplantation equals 0.8.
If the patient survives after the surgery, the probability that his body is not
compatible with the transplantation and dies is equal to 0.1. What is the
probability of survival for a kidney transplant patient after these two stages?
15) A fair die is rolled three times. If we know that the sum of the upturned faces
is at least four, it is desired to calculate the probability that the sum of the
upturned faces is exactly equal to four.
16) The probability that a person works less than five years in a company equals
0.2. If the person does not leave the company until the fifth year, he will not
leave the company until the seventh year with probability of 0.3. What is the
probability that this person leaves the company between the fifth and seventh
years?
17) A system consists of 3 devices 𝐴, 𝐵, and 𝐶 that function independently and properly in each work shift with probability 𝑝. The system functions as long as at least one of the devices functions. If the system functions in one work shift, it is desired to calculate the probability that:
a. Device 𝐴 functions.
b. Device 𝐵 does not function.
c. Both devices 𝐴 and 𝐵 function.
d. All three devices 𝐴, 𝐵, and 𝐶 function.
e. Only one of the devices functions.
18) Each of the three people 𝐴, 𝐵, and 𝐶 hits a target with respective probabilities 0.2, 0.3, and 0.4. It is desired to calculate the probability that 𝐴 hits the target given that we know:
a. At least two shots hit the target.
b. Exactly one person hits the target.
19) Every day, a retired person randomly chooses one of the 6 parks, located in
the vicinity of a certain city, to go jogging. If we know that he has gone to park
𝑋 at least once during the last 10 days, it is desired to calculate the probability
that:
a. He went to park 𝑋 at least twice or more during this period.
b. He went to park 𝑋 all ten days.
20) In each step, a moving object moves one unit to the right with probability 𝑝 and one unit to the left with probability (1 − 𝑝). Movements are independent of each other. If we know that after 3 movements the object has moved one unit to the right altogether, it is desired to calculate the probability that its first movement was to the right.
21) A newly graduated student plans to participate in three examinations. He takes
the first exam in month 𝐴 and if he passes it, he will take the second exam in
month 𝐵. Then, if he passes the second exam, he will take the third exam in
month 𝐶. The probability of passing the first, second, and third examinations
are 0.9, 0.8, and 0.7, respectively. If the person has taken examinations
independently and we know that he did not accomplish these three stages,
then what is the probability that he has failed in the second stage?
22) If we know that each child is equally likely to be a boy or girl independently,
for a couple having two children, it is desired to calculate the probability that:
a. Both children are girls.
b. Both children are girls given that the oldest child is a girl.
c. Both children are girls if we know that at least one of them is a girl.
23) If we know that each child from a couple is equally likely to be a boy or girl
independently, for a couple having 4 children, it is desired to calculate the
probability that the family has three boys provided that we know there is at
least one boy in this family.
24) Consider an urn containing 10 balls, 8 of which are white. We select a sample
of size 4 randomly and with replacement from this urn. If the selected sample
contains 3 white balls, it is desired to calculate the probability that:
a. The first selected ball is white.
b. The first and third selected balls are white.
25) Solve the preceding problem when the choices are without replacement.
26) The king of a country comes from a two-child family. It is desired to calculate
the probability that:
a. The other child is a girl.
b. The king is the oldest child of the family.
c. The youngest child is a girl, when we know that the king is the oldest
child of the family.
27) In a group consisting of “𝑚” men and “𝑤” women, 𝑥 men and 𝑦 women are
smokers (𝑥 ≤ 𝑚, 𝑦 ≤ 𝑤). We select a person at random from the group and use
𝐴 and 𝐵 to denote respectively that the person is a man and a smoker. Then,
show that equation 𝑥𝑤 = 𝑦𝑚 is a necessary and sufficient condition for the
independence of these two events.
28) We throw a pair of fair dice until the sum of the upturned faces is equal to 5
or 4. What is the probability that a sum of 5 appears before a sum of 4?
29) 𝐴 and 𝐵 each alternately and independently fires one shot at a target. If each
of them hits the target with probability 2/3 and 𝐴 is the first person starting the
game, what is the probability that 𝐵 hits the target sooner?
30) Player 𝐴 successively plays with two players 𝐵 and 𝐶 for an extended period.
In each play, independently, the winning probability of 𝐴 versus 𝐵 equals 5/7, and that of 𝐴 versus 𝐶 equals 7/10. Player 𝐴 first plays with player 𝐵. What is the probability that player 𝐶 manages to beat player 𝐴 before player 𝐵 does?
31) Four players 𝐴, 𝐵, 𝐶, and 𝐷 alternately and independently flip a fair coin,
respectively. The game terminates whenever one of them gets a heads for the
first time leading to his winning. Obtain the probability of success for each
player.
32) People 𝐴 and 𝐵 do a game alternately and independently. Person 𝐴 starts the
game and if his two successive shots hit the target, then he wins the game.
However, if one of his shots does not hit the target, he gives the shot to 𝐵.
Then, 𝐵 starts the game and if his two successive shots hit the target, he wins
the game. Nevertheless, if one of his shots does not hit the target, he gives the
shot to 𝐴 again. They keep hitting the target until one of them wins. If 𝑃1 and
𝑃2 denote the respective probabilities of hitting the target by people 𝐴 and 𝐵's
shots, obtain the probability that 𝐴 wins the game.
33) A fair die is successively tossed. If the upturned face's outcome is divisible by
2, then 𝐴 gets one point. If it is divisible by 3, then 𝐵 gets one point. What is
the probability that 𝐴 gets the point sooner than 𝐵?
34) Suppose that people 𝐴 and 𝐵 are playing a game. In each stage of the game,
the winning probability for the winner and the loser of the preceding stage is equal to 3/4 and 1/4, respectively. If 𝐴 wins the preceding stage, it is desired to calculate
the probability that 𝐴 wins at least two out of the next three stages.
35) An urn contains 2 red and 6 black marbles. 𝐴 and 𝐵 select a marble randomly
and without replacement from the urn, respectively, until one red marble is
obtained. If 𝐴 starts the game, obtain the probability that 𝐵 gets the red marble
first.
36) Urn 1 contains 2 white and 2 black marbles, and urn 2 contains 3 white and 4
black marbles. We draw two marbles at random from urn 1 and put them into
urn 2. Then, we draw one marble at random from urn 2. What is the probability
that this marble is black?
37) There are 30 and 20 light bulbs in boxes 𝐴 and 𝐵, respectively. There are 5
defective light bulbs in box 𝐴 and 3 ones in box 𝐵. We select 10 and 8 light bulbs
randomly and without replacement from boxes 𝐴 and 𝐵, respectively, and then
put them into a new box. Now, we draw one light bulb at random from the new
box. Obtain the probability that this light bulb is defective.
38) A student independently answers each question incorrectly with probability
0.3 in an exam. In this exam, he is given 3 questions with probability 3/4 and 4 questions with probability 1/4. What is the probability that he answers at least
one question incorrectly?
39) In a multiple-choice quiz containing 4 answer choices for each question,
each student either knows the answer or randomly marks it. Suppose the
probability that the student knows the answer to a question is equal to 2/3. In
these conditions, what is the probability that the student did not randomly
select the answer of the question that he correctly marked?
40) We flip a fair coin. If it lands on heads, we throw a fair die and depending on
the upturned face's outcome, we receive a prize. If the coin lands on tails,
we throw two dice and receive a prize depending on the sum of their upturned faces. Obtain the probability that the prize we receive is at most 5.
41) If each born child is equally likely to be a boy or a girl, then what percentage of the boys in three-child families are the oldest child?
42) A mother has three children. She has gone shopping along with one of the
children who is a boy. If the mother along with her first, second, and third
children goes shopping equally likely, it is desired to calculate the
probability that the boy has a brother older than himself.
43) We select two numbers randomly and with replacement from the set
{1,2,3, … ,100}. What is the probability that the second selected number is
greater than the first one?
44) In a registration system, suppose that the proportion of first-year, second-
year, third-year, and fourth-year students are 𝑃1 , 𝑃2 , 𝑃3 , and 𝑃4 , respectively,
and the number of students is large. Given that a student logs in to the
system at a certain time and sees that the other 10 people are registering, it
is desired to calculate the probability that:
a. At least one first-year or second-year student is registering.
b. None of his classmates is registering.
45) A box contains 18 tennis balls, 8 of which are intact. Suppose that we draw
3 balls at random from the box and then return them to the box after
playing. If we randomly draw another 3 balls for the second time, then what
is the probability that all of these balls are intact?
46) From an urn containing 5 balls numbered 1 through 5, we randomly draw 3
balls. If the first or second selected ball is 1, it is not returned to the urn.
Otherwise, it is returned to the urn. It is desired to calculate the probability
that the third selected ball is 2.
47) From 20 mother-child couples, we intend to select a set of people at random; the set can have from 0 to 40 members. If the set's members are randomly selected, it is desired to calculate the probability that:
a. Each selected child sees his mother in the set.
b. None of the selected children sees his mother in the set.
c. Only children whose mothers have been chosen will be selected.
48) A person can go to work through route 𝐴 or 𝐵. He chooses route 𝐴 to go
to work with probability 0.3. If he chooses route 𝐴, he is late in 20 percent
of times; On the other hand, in case of choosing route 𝐵, he is late in 30
percent of times. Given that he is late on a working day, what is the
probability that he has chosen route 𝐴?
49) Suppose that a person asks his neighbor to water a flower vase while he is
on vacation. If it does not receive water, it wilts with probability 0.8, and if
it is watered, it wilts with probability 0.15. Meanwhile, we know that the
neighbor does not forget to water the flower vase with probability 0.9. After
the vacation, if the flower wilts, what is the probability that the neighbor
has forgotten to water the flower vase?
50) Box Ι contains 3 white and 2 black marbles, and box ΙΙ contains 2 white and
3 black marbles. We draw 2 marbles at random from box Ι and transfer
them to box ΙΙ. This is followed by the random selection of 2 marbles from
box ΙΙ. It is desired to calculate the probability that:
a. The 2 selected marbles from box ΙΙ are black.
b. Both of the selected marbles from box Ι have the same color, given
that the 2 selected marbles from box ΙΙ are black.
51) Suppose a trial takes on the values 0, 1, 2, and 3 with respective probabilities
0.4, 0.3, 0.2, and 0.1. Then, if the trial's outcome is equal to 𝑖, we flip a fair
coin 𝑖 times (if the trial's outcome equals zero, no flip is performed). It is
desired to calculate the probability that:
a. The number of obtained heads equals 1.
b. One flip is performed, given that the number of obtained heads was
equal to 1.
52) Consider 3 boxes. In the first box, there are 3 white and 5 black marbles. In
the second box, there are 6 white and 4 black marbles. Also, in the third box,
there are 2 white and 4 black marbles. If one marble is randomly selected
from each box, it is desired to calculate the probability that:
a. Exactly two black marbles are selected.
b. The marble selected from the first box is black, given that exactly two
black marbles are selected.
53) In an urn, there are 2 fair coins, 5 coins having probability 0.3 of landing on
heads, and 3 two-headed coins. We randomly select a coin from the urn and
flip it 3 times. It is desired to calculate the probability that:
a. At least one head appears in 3 flips.
b. A fair coin is selected, if we know that at least one head appears in 3
flips.
54) An urn contains 10 marbles, 𝑋 of which are white and the rest black. We know that 𝑋 takes on the values 2, 4, and 6 with respective probabilities 2/10, 3/10, and 5/10. We select two balls randomly and without replacement from
the urn. If it is seen that both marbles are white, then obtain the probability
that there are 6 white marbles in this urn.
55) We have 20 parts; the number of defective parts among them is 𝑖 (𝑖 = 0, 1, 2), each with probability 1/3.
If we randomly take a sample of size 5 and see that none of them is defective,
what is the probability that there is no defective part in a total of 20 parts
as well?
56) In an inspection stage of a factory, parts identified as nondefective ones
pass the inspection stage. 10 percent of the factory productions are
defective. Furthermore, the inspection division identifies 10 percent of the
defective productions as nondefective ones and 20 percent of nondefective
parts as defective ones. What proportion of productions passing the
inspection stage is defective?
57) A sample of size 3 is obtained such that we first begin with an urn containing
5 white and 7 red balls. Whenever a ball is randomly withdrawn from this urn,
this ball along with another same-color ball is put into the urn. It is desired to
calculate the probability that:
a. There is exactly one white ball in a sample of size 3.
b. The second selected ball is white, given that there is exactly one white
ball in a sample of size 3.
58) An urn contains “𝑏” black and “𝑟” red balls. We randomly select one of these
balls, but we also put another “𝑐” balls of the same color into the urn when
returning it to the urn. Now, suppose that we randomly select another ball. It
is desired to calculate the probability that
a. The second ball is red.
b. The first selected ball was black, if we know that the second ball is red.
59) In a particular area, the probability of car crash for men and women is 0.55
and 0.45 per year, respectively. If the number of men drivers is 3 times as many
as women drivers, what percentage of the people who crash in a year will also
crash in the next year?
60) Factories 𝐴 and 𝐵 are producers of a device in a specific industry. Each device
produced by these factories is defective with respective probabilities 0.05 and
0.01. Suppose that we are to inspect two devices having been purchased from
one factory. Moreover, we know that these two devices are purchased from
factories 𝐴 and 𝐵 with respective probabilities 0.2 and 0.8. Given that the first
device is defective, what is the probability that the second device is also
defective?
61) Suppose that urn 𝐴 contains 2 white and 2 black marbles, and urn 𝐵 contains
1 white and 3 black marbles. If we randomly select an urn and then randomly
draw 2 marbles from it and see that the first marble is black, what is the
probability that the second marble is not black?
62) There are 50 parts in a box that five of them are defective. If the parts are
selected randomly, successively, and without replacement from the box, it is
desired to calculate the probability that:
a. The third part is defective.
b. The first, third, and fifth parts are defective.
63) There are 4 parts of type 𝐴, 3 parts of type 𝐵, and 5 parts of type 𝐶 in a box. If
the people who are numbered from 1 to 12 draw the parts randomly,
alternately, and without replacement (each person draws one part at random),
then it is desired to calculate the probability that:
a. The first and third people draw the parts of type 𝐴.
b. The first and seventh people draw the parts of type 𝐴 and the third
person draws the part of type 𝐵.
c. The third person draws the part of type 𝐵, given that the second person
draws the part of type 𝐴.
64) We have 10 keys and two of them open the lock of a door. Two keys have been
lost, and we do not know which ones they are.
a. What is the probability that we can still open the door?
b. If we randomly select one of the remaining keys, what is the probability
that it can open the door?
c. If the keys are tested alternately, randomly, and without replacement,
what is the probability that we can open the door until at most the
second try?
d. Solve the preceding part under the assumption that 3 keys are lost.
65) Box number Ι contains 6 white and 4 black marbles. We randomly select 5
marbles from this box and put them into box number ΙΙ. Now, we randomly
select one marble from box number ΙΙ. It is desired to calculate the probability
that
a. The color of the marble selected from box number ΙΙ is white.
b. There are 4 other white marbles in box number ΙΙ, given that the color of the marble selected from box number ΙΙ is white.
66) Consider two urns. There are three red and one white marbles in the first urn,
and two red and one white marbles in the second urn. We first randomly select
two marbles from the first urn and put them aside without seeing their colors.
Then, we draw one marble at random from the first urn and put it into the
second urn. Finally, we draw one marble at random from the second urn. What
is the probability that the marble selected from the second urn is white?
67) There are 7 red and 13 blue balls in a box. Two balls are drawn randomly
and without replacement from the box and discarded without seeing their
colors. Then, another ball is selected at random. It is desired to calculate
the probability that
a. The third ball is red.
b. Both discarded balls are blue, if we know that the third ball is red.
68) We have two urns. There are “𝑎” red and “𝑏” blue marbles in the first urn,
and “𝑐” red and “𝑑” blue marbles in the second urn such that 𝑎/𝑏 = 𝑐/𝑑. We
draw one marble at random from the first urn and put it into the second
urn and then draw one marble at random from the second urn and return
it to the first urn. Finally, we select one marble at random from the first
urn. What is the probability that this marble is red?
69) A box contains “𝑛” parts that “𝑟” of them are defective and the rest are non-
defective. We randomly inspect the parts. What is the probability that the
𝑘 𝑡ℎ inspected part (𝑟 ≤ 𝑘) is the last defective part?
70) Consider 5 girls and 4 boys such that there are 4 sister-brother couples. If
we randomly create 4 couples consisting of girls and boys, it is desired to
calculate the probability that:
a. We choose 4 sister-brother couples.
b. We choose 3 sister-brother couples.
c. We choose 2 sister-brother couples.
71) We randomly put 𝑛 balls numbered from 1 to 𝑛 into 𝑛 boxes numbered from
1 to 𝑛 as well. It is desired to obtain the probability that:
a. None of the boxes is empty, given that the 𝑖 𝑡ℎ ball can be put into
one of the boxes 1 to 𝑖 equally likely.
b. Box 𝑖 is empty, given that 𝑖 𝑡ℎ ball can be put into one of the boxes 1
to 𝑖 equally likely.
c. None of the boxes is empty, given that the 𝑖 𝑡ℎ ball can be put into
one of the boxes 1 to 𝑛 equally likely.
d. Box 𝑖 is empty, given that the 𝑖 𝑡ℎ ball can be put into one of the boxes
1 to 𝑛 equally likely.
72) Box 𝐴 contains 9 balls numbered 1 through 9, and box 𝐵 contains 5 balls
numbered 1 through 5. One box is randomly selected and one ball is
randomly removed from it. If the selected ball is an even number, then the
second ball is randomly selected from the same box. Also, if the ball selected
is an odd number, then the second ball is randomly selected from the other
box. In these conditions, what is the probability that:
a. Both balls are odd?
b. Both balls are even?
c. Both balls are even, given that both balls are selected from box 𝐴?
73) First, number 𝑥1 is randomly selected from the set {1,2,3,4}, and then 𝑥2 is
randomly selected from the set {1,2, … , 𝑥1 }. In these conditions, it is desired
to calculate:
a. The probability that 𝑥2 = 1, given that 𝑥1 equals 𝑘, where 𝑘 ∈ {1, 2, 3, 4}.
b. The probability that 𝑥2 = 1.
c. The probability that 𝑥1 = 1, if we know that 𝑥2 was equal to 1.
74) An oil company is going to drill 3 different areas and now it is drilling one
oil well in each area. Previous experiences have shown that the chance of
reaching oil in the first, second, and third areas are 0.8, 0.9, and 0.7. If the
company can drill oil from each of these wells, it makes 100 million dollars.
It is desired to calculate the probability that:
a. This company makes 200 million dollars at the end of the drilling
period.
b. The company reaches the oil well by drilling the first well, given that
it makes 200 million dollars at the end of drilling period.
75) A device transmits digits 1 and 0. Each of these digits should pass three
stages. In each of these stages, the probability that the digit transmitted
does not change is equal to 1/3. It is desired to obtain the probability that

a. The transmitted digit in the third stage is seen as zero.
b. The transmitted digit in the third stage is seen as zero, given that it is
actually entered as zero.
76) One urn contains 14 iron coins and one gold coin, and another urn contains
15 iron coins. 5 coins are randomly selected from the first urn and put into
the second urn. Then, 5 coins are randomly selected from the second urn and
returned to the first urn. It is desired to calculate the probability that:
a. The gold coin remains in the first urn after transferring these coins.
b. The gold coin is not transferred to the second urn, if we know that the
gold coin has remained in the first urn.
77) Performance of a blood test is such that it shows positive result for 90
percent of people who have a particular disease. However, the test
erroneously shows positive result for the healthy people in 20 percent of
times. In an area, 30 percent of people are suffering from this disease.
Doctors prescribe a special medicine for people whose results of the test are
positive. This medicine has side effects, with respective probabilities of 0.2 and 0.1 for the diseased and healthy people. In these
conditions, it is desired to obtain the probability that:
a. A person gets side effects of the medicine.
b. A person really has the disease, given that he gets side effects of the medicine.
78) Suppose that there are 10 urns numbered 1 through 10, each of which
contains 10 marbles, and there are 𝑖 white marbles out of 10 marbles in urn
number i. If one urn is randomly selected, and 2 marbles are withdrawn
randomly and without replacement from it, it is desired to calculate the
probability that
a. The first selected marble is white.
b. The first and second selected marbles are white.
c. The second selected marble is white, given that the first selected
marble is white.
d. Urn number 10 is selected, given that the first and second selected
marbles are white.
e. Urn number 10 is selected, given that the second selected marble is
white.
79) We have 3 urns numbered 1 through 3 such that urn number 1 contains 3 red
and 2 blue marbles, urn number 2 contains 2 red and 1 blue marbles, and urn
number 3 contains 1 red and 3 blue marbles. If one marble is randomly selected
from the first urn and put into the second urn, then one marble is randomly
transferred from the second urn to the third urn, and finally one marble is
randomly selected from the third urn, it is desired to calculate the probability
that:
a. The selected marble from the third urn is blue.
b. The marble transferred from the first urn to the second urn is blue,
given that the marble selected from the third urn was blue.
c. The marble transferred from the second urn to the third urn is red,
given that the marble selected from the third urn was blue.
80) A mother carries the gene for a particular disease with probability 0.5. If she is
a carrier of the gene, then each of her children has the disease independently
with probability 1/2. Suppose that the mother has two children. It is desired to
calculate the probability that
a. Both children are healthy.
b. The mother carries the gene of disease, if we know that her both
children are healthy.
c. The third child of the mother is healthy, if we know that her both first
and second children are healthy.
81) Suppose that you have been invited to a game show, where you have the
chance to choose one of the three boxes available. In one of the boxes, there
is a gold coin, and the other two are empty. Meanwhile, the game show host
knows where the gold coin is.
a. If you choose a box such as box number 1 and the host declares that one
of the other two boxes is empty, such as, the box number 3, assuming
that he does not lie to you, do you want to stay with box number 1 or
pick box number 2?
b. If, before your first choice, the host removes one of the empty boxes,
and then you choose one of the remaining boxes, what is the probability
that you win the game?
82) In each of the following circuits, if each relay functions independently with
probability 𝑝 and fails with probability 𝑞 (𝑝 + 𝑞 = 1), calculate the probability
that the circuit functions.
a. – f. (The six circuit diagrams are not reproduced here.)
83) In the following circuit, if each relay functions with probability 1/2, calculate the probability that the circuit functions. (The circuit diagram is not reproduced here.)
84) (Laplace's rule of succession) There are (𝑘 + 1) coins such that the 𝑖th coin turns up heads with probability 𝑖/𝑘 (𝑖 = 0, 1, 2, 3, … , 𝑘). One of these coins is
randomly selected and successively flipped.
a. Obtain the probability that the first 𝑛 flips are all heads.
b. What is the probability that the 𝑖 𝑡ℎ coin is selected, if we know that
the first 𝑛 flips are all heads?
c. Using the solution of the preceding part, show that if the first 𝑛 flips
are all heads, the probability that the next flip is also heads is equal
to:
∑_{𝑖=0}^{𝑘} (𝑖/𝑘)^{𝑛+1} / ∑_{𝑖=0}^{𝑘} (𝑖/𝑘)^{𝑛}
d. Using the following approximation, show that the solution of the preceding part for large values of 𝑘 equals (𝑛 + 1)/(𝑛 + 2).
Hint: (1/𝑘) ∑_{𝑖=0}^{𝑘} (𝑖/𝑘)^{𝑛+1} ≅ ∫_{0}^{1} 𝑥^{𝑛+1} 𝑑𝑥 = 1/(𝑛 + 2)

85) (The gambler's ruin problem) Two gamblers A and B bet on the result of
flipping a coin. On each flip, if the coin turns up heads, A collects $1 from B,
and if not, B collects $1 from A. Flips are independent and each flip comes
up heads with probability 𝑝 and comes up tails with probability 𝑞 (𝑝 + 𝑞 = 1).
If A and B have $(𝑖) and $(𝑁 − 𝑖), respectively, and play with each other
(flipping coin) until one of them ends up with all the money, and 𝑃𝑖 denotes
the event that A ends up with all the money when he starts with $(𝑖), then:
a. Show that given 𝑖 = 1, 2, 3, … , 𝑁 − 1, the equation 𝑃_𝑖 = 𝑝𝑃_{𝑖+1} + 𝑞𝑃_{𝑖−1} is true.
b. Show that given 𝑖 = 1, 2, 3, … , 𝑁 − 1, the equation 𝑃_{𝑖+1} − 𝑃_𝑖 = (𝑞/𝑝)[𝑃_𝑖 − 𝑃_{𝑖−1}] is true.
c. Using the preceding equation, show that:
𝑃_{𝑖+1} − 𝑃_𝑖 = (𝑞/𝑝)^𝑖 𝑃_1 ; 𝑖 = 1, 2, … , 𝑁 − 1
d. Show that the following equation is true.
𝑃_𝑖 − 𝑃_1 = 𝑃_1 [(𝑞/𝑝) + (𝑞/𝑝)² + ⋯ + (𝑞/𝑝)^{𝑖−1}]
Then, using the fact that 𝑃0 = 0 and 𝑃𝑁 = 1, show that the following
equation is true as well:
𝑃_𝑖 = [1 − (𝑞/𝑝)^𝑖] / [1 − (𝑞/𝑝)^𝑁] ; if 𝑝 ≠ 1/2
𝑃_𝑖 = 𝑖/𝑁 ; if 𝑝 = 1/2
86) An investor has a stock in the market whose present value is 320 units. He
decides to sell his stock, provided that its value reaches either 300 or 350
units. If any change of the price is either increased 1 unit with probability
0.6 or decreased 1 unit with probability 0.4 and the successive changes are
assumed to be independent, what is the probability that the investor retires
as a winner?
87) A and B play a series of games. In each game, A independently wins with
probability 𝑝 and B wins with probability 1 − 𝑝. They stop playing the game
whenever the total number of wins for one of the players is two times
greater than that of the other player. The winner of the game is the one
whose number of wins is greater than that of the other. It is desired to
calculate the probability that:
a. The game ends up with a total of 4 games.
b. A wins the game.
CHAPTER 4: RANDOM VARIABLES

Consider a trial in which we flip a fair coin 5 times. The sample space of this trial contains 32 members shown as follows:
𝑆 = {(𝑇, 𝑇, 𝑇, 𝑇, 𝑇), (𝐻, 𝑇, 𝑇, 𝑇, 𝑇), (𝑇, 𝐻, 𝑇, 𝑇, 𝑇), . . . , (𝐻, 𝐻, 𝐻, 𝐻, 𝐻)}
Now, suppose that we want to study and analyze the number of heads turning
up in this trial. In these conditions, we define a function on the trial's sample space
to count the number of obtained heads in flipping of a coin 5 times. This function
assigns an integer from 0 to 5 to each of the sample space members. For instance, in
the above sample space, it assigns number 0 to the first member, number 1 to the
second member, number 1 to the third member, …, and number 5 to the last member.
If 𝑋 represents this function, its possible values can be shown as follows:
𝑋 = {0,1,2,3,4,5}
We can say that the above set also represents the sample space; its advantage over the main sample space is that, considering the aim of the problem (analyzing the number of heads turning up), it represents the sample space more compactly. Moreover, if we represent the sample space in this way, all of its results appear in numerical form, and it is evident that applying mathematical analyses to a numerically represented sample space is much more straightforward. For example, in these conditions, the mean number of heads turning up in 5 flips of a coin can be calculated (we will address such analyses both in this chapter and in the next chapters).
Most of the time, when an experiment like the aforementioned example is performed, it is more appropriate to study a function of the trial's results rather than the results themselves. If the outputs of these functions appear in numerical form, they are called random variables. Therefore, we define random variables as follows:

Definition: A random variable is a real-valued function defined on the sample space; it assigns a number to each member of the sample space.

Now, we should determine the probability of each possible value of the random variable.
For instance, in the aforementioned example, each of the possible 32 members of
the sample space is equally likely such that in one of these 32 members no heads
comes up (𝑋 = 0). Therefore, the probability that random variable 𝑋 (the number
of appeared heads) takes on the value 0 equals 1/32. Moreover, there exist 5 members of the sample space in which exactly one heads turns up (𝑋 = 1). As a result, the probability that the random variable 𝑋 adopts the value 1 equals 5/32. Likewise, we have:

𝑃(𝑋 = 0) = 1/32, 𝑃(𝑋 = 1) = 5/32, 𝑃(𝑋 = 2) = 10/32, 𝑃(𝑋 = 3) = 10/32, 𝑃(𝑋 = 4) = 5/32, 𝑃(𝑋 = 5) = 1/32
Alternatively, we have:
𝑃(𝑋 = 𝑥) = C(5, 𝑥)/32 ; 𝑥 = 0, 1, 2, . . . , 5
where C(𝑛, 𝑘) denotes the binomial coefficient (the number of ways to choose 𝑘 items from 𝑛).
A function giving the probability of each possible value of a discrete random variable is called the probability function or the probability mass function.
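As a quick check, the probability mass function above can be tabulated in a few lines of Python; this is a minimal sketch (math.comb is the standard-library binomial coefficient):

    from math import comb

    # pmf of X = number of heads in 5 flips of a fair coin
    pmf = {x: comb(5, x) / 2**5 for x in range(6)}
    print(pmf)                 # {0: 0.03125, 1: 0.15625, 2: 0.3125, 3: 0.3125, ...}
    print(sum(pmf.values()))   # 1.0 -- the probabilities sum to one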
In fact, a random variable is a function defined on the sample space: the sample space members form its domain, and the values of the random variable form its range. Moreover, the probability function is a function defined on the set of values of the random variable: these values form its domain, and the corresponding probabilities form its range. For instance, in the aforementioned example, the domain of the probability function is {0, 1, . . . , 5} and its range is {1/32, 5/32, 10/32}.
Example 1.1

An urn contains 5 white and 6 black balls. Suppose that you randomly and
without replacement select 3 balls from the urn. If random variable 𝑋 denotes the
number of white balls in a sample of size 3, determine its probability function.

Solution. The main sample space of this problem consists of C(11, 3) equally likely
members. Considering the definition of the random variable 𝑋, it assigns one of the
numbers 0, 1, 2, and 3 to each of the sample space members, and the probability
function of the random variable 𝑋 is as follows:
𝑃(𝑋 = 0) = C(5, 0)C(6, 3)/C(11, 3)    𝑃(𝑋 = 1) = C(5, 1)C(6, 2)/C(11, 3)
𝑃(𝑋 = 2) = C(5, 2)C(6, 1)/C(11, 3)    𝑃(𝑋 = 3) = C(5, 3)C(6, 0)/C(11, 3)
Alternatively, we have:
𝑃(𝑋 = 𝑖) = C(5, 𝑖)C(6, 3 − 𝑖)/C(11, 3) ; 𝑖 = 0, 1, 2, 3
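A short Python sketch computing this (hypergeometric-type) probability function and verifying that it sums to 1:

    from math import comb

    # pmf of X = number of white balls among 3 drawn from 5 white and 6 black
    pmf = {i: comb(5, i) * comb(6, 3 - i) / comb(11, 3) for i in range(4)}
    print(pmf)
    print(sum(pmf.values()))   # 1.0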
Example 1.2

Suppose that we successively flip a fair coin until either a heads appears or a
total of 4 flips is obtained. If the random variables 𝑋, 𝑌, and 𝑍 are defined as follows,
determine their probability function.
a. 𝑋: the number of times that the coin is flipped.
b. 𝑌: the number of appeared tails.
c. 𝑍: the number of appeared heads.
Solution. The sample space of the trial is as follows:
𝑆 = {𝐻, 𝑇𝐻, 𝑇𝑇𝐻, 𝑇𝑇𝑇𝐻, 𝑇𝑇𝑇𝑇}
a. The random variable 𝑋 denotes the number of times that the coin is flipped.
As a result, considering the above sample space, it can adopt one of the
integers of 1 to 4 and its probability function is as follows:
𝑃(𝑋 = 1) = 𝑃({𝐻}) = 1/2
𝑃(𝑋 = 2) = 𝑃({𝑇𝐻}) = 1/2 × 1/2 = 1/4
𝑃(𝑋 = 3) = 𝑃({𝑇𝑇𝐻}) = (1/2)²(1/2) = 1/8
𝑃(𝑋 = 4) = 𝑃({𝑇𝑇𝑇𝐻}) + 𝑃({𝑇𝑇𝑇𝑇}) = (1/2)³(1/2) + (1/2)⁴ = 1/8

b. The random variable 𝑌 denotes the number of appeared tails. Therefore,


considering the sample space of the problem, this random variable can adopt
one of the integers of 0 to 4 and its probability function is as follows:
𝑃(𝑌 = 0) = 𝑃({𝐻}) = 1/2
𝑃(𝑌 = 1) = 𝑃({𝑇𝐻}) = 1/2 × 1/2 = 1/4
𝑃(𝑌 = 2) = 𝑃({𝑇𝑇𝐻}) = (1/2)²(1/2) = 1/8
𝑃(𝑌 = 3) = 𝑃({𝑇𝑇𝑇𝐻}) = (1/2)³(1/2) = 1/16
𝑃(𝑌 = 4) = 𝑃({𝑇𝑇𝑇𝑇}) = (1/2)⁴ = 1/16

c. The random variable 𝑍 denotes the number of upturned heads. Therefore,


considering the sample space of the problem, it can adopt either the value of
0 or 1 and its probability function is as follows:
𝑃(𝑍 = 0) = 𝑃({𝑇𝑇𝑇𝑇}) = (1/2)⁴ = 1/16
𝑃(𝑍 = 1) = 𝑃({𝐻}) + 𝑃({𝑇𝐻}) + 𝑃({𝑇𝑇𝐻}) + 𝑃({𝑇𝑇𝑇𝐻}) = 1/2 + 1/4 + 1/8 + 1/16 = 15/16
However, since 𝑍 adopts only the value 0 or 1, 𝑃(𝑍 = 1) can also be calculated as follows:
𝑃(𝑍 = 1) = 1 − 𝑃(𝑍 = 0) = 1 − 1/16 = 15/16
Note that, in each of the above examples, the sum of the probabilities of values
associated with each of the aforementioned variables should be equal to 1. This
point can be used to control and investigate the correctness of the
calculations.
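Because the sample space here has only five members, all three probability functions can be generated mechanically. The sketch below (our illustration, using exact arithmetic via fractions.Fraction) builds each pmf from the outcome probabilities listed in the solution:

    from fractions import Fraction

    # Outcomes of flipping until heads appears or 4 flips are made
    outcomes = {"H": Fraction(1, 2), "TH": Fraction(1, 4),
                "TTH": Fraction(1, 8), "TTTH": Fraction(1, 16),
                "TTTT": Fraction(1, 16)}

    def pmf(var):
        """Collect P(var(outcome) = value) over the sample space."""
        d = {}
        for s, p in outcomes.items():
            v = var(s)
            d[v] = d.get(v, Fraction(0)) + p
        return d

    print(pmf(len))                      # X: number of flips
    print(pmf(lambda s: s.count("T")))   # Y: number of tails
    print(pmf(lambda s: s.count("H")))   # Z: number of heads

Each printed dictionary sums to 1, matching the check suggested above.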

Depending on the range of values of a random variable, it is categorized as a discrete, continuous, or mixed variable. Discrete random variables are those that can adopt a countable number of real values; in other words, their range of possible values consists of separate, discrete points. Continuous random variables, in contrast, are those whose range of possible values is uncountable. Finally, there is another type of random variable created by a combination of discrete and continuous variables. These are called mixed random variables; part of their possible values is discrete, and the other part is continuous.
As mentioned, discrete random variables are those whose range of possible values is discrete, or made up of separate points. For this type of random variable, a function entitled the probability mass function, often abridged as the probability function, gives the probability of each of the possible values. Indeed, in this chapter, all of the previous examples relate to discrete random variables.
Other examples of this type of random variable are as follows:
➢ The number of defective parts in a sample of size n from a production
line
➢ The number of earthquakes occurring yearly in city A
➢ The number of flips of a coin until a heads comes up
Many random variables are applied to count the number of times that an event
occurs. In such cases, it is evident that this type of random variables does not have a
unit. For instance, in Example 1.2, none of random variables 𝑋, 𝑌, and 𝑍 has a unit.
However, if we receive $1 for each heads and pay $1 for each tails in this example, the
obtained profit has dollar unit.

Example 3.1

Consider an urn containing 10 marbles numbered 1 through 10. We randomly and with replacement select 4 marbles. If we define random variable 𝑋 to denote the smallest number selected, obtain its probability function.
Solution. As mentioned in Example 4.5 of Chapter 2, to calculate 𝑃(𝑋 = 𝑖) in this
problem, it is better to use the following equation:
𝑃(𝑋 = 𝑖) = 𝑃(𝑋 ≥ 𝑖) − 𝑃(𝑋 > 𝑖)
Event 𝑋 ≥ 𝑖 means that all the choices are greater than or equal to number 𝑖
and the number of states that the choices are greater than or equal to i equals 10 −
(𝑖 − 1) = 11 − 𝑖. In addition, event 𝑋 > 𝑖 means that all the choices are greater than 𝑖
and the number of states that the choices are greater than i equals 10 − 𝑖. Therefore,
the required probability equals:
𝑃(𝑋 = 𝑖) = 𝑃(𝑋 ≥ 𝑖) − 𝑃(𝑋 > 𝑖) = (11 − 𝑖)⁴/10⁴ − (10 − 𝑖)⁴/10⁴ ; 𝑖 = 1, 2, . . . , 10
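A minimal Python check of this probability function (note that the sum telescopes to 1):

    # pmf of the minimum of 4 draws with replacement from {1, ..., 10}
    pmf = {i: ((11 - i)**4 - (10 - i)**4) / 10**4 for i in range(1, 11)}
    print(pmf[1], pmf[10])     # P(X = 1) and P(X = 10)
    print(sum(pmf.values()))   # 1.0, since the sum telescopes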

Example 3.2

In the preceding example, if we define random variable 𝑌 to denote the greatest number selected, obtain its probability function.
Solution. As mentioned in Example 4.7 of Chapter 2, to calculate 𝑃(𝑌 = 𝑖) in this
problem, it is better to use the following equation:
𝑃(𝑌 = 𝑖) = 𝑃(𝑌 ≤ 𝑖) − 𝑃(𝑌 < 𝑖)
Event 𝑌 ≤ 𝑖 means that all the choices are less than or equal to number 𝑖 and
the number of states that the choices are less than or equal to 𝑖 equals 𝑖. In addition,
event 𝑌 < 𝑖 means that all the choices are less than 𝑖 and the number of states that
the choices are less than 𝑖 equals 𝑖 − 1. Therefore, the required probability equals:
𝑃(𝑌 = 𝑖) = 𝑃(𝑌 ≤ 𝑖) − 𝑃(𝑌 < 𝑖) = 𝑖⁴/10⁴ − (𝑖 − 1)⁴/10⁴ ; 𝑖 = 1, 2, . . . , 10

Example 3.3

Solve again Examples 3.1 and 3.2 under the condition that the choices are
without replacement.
Solution. In Example 3.1, if the sampling is without replacement, considering
Example 4.6 of Chapter 2, since the smallest number selected should be equal to
number 𝑖, number 𝑖 should be selected and the other choices should be greater than
number 𝑖. Moreover, it should be noted that if the sampling is without replacement,
the smallest number selected cannot be greater than 7. To clarify the point, consider
an example that the smallest obtained number is to be equal to 8. If so, 3 numbers
greater than that should be selected which is an event with the probability of zero.
Therefore, the probability function of this random variable is as follows:
𝑃(𝑋 = 𝑖) = C(1, 1)C(10 − 𝑖, 3)/C(10, 4) ; 𝑖 = 1, 2, . . . , 7
In Example 3.2, if the sampling is without replacement, considering Example
4.8 of Chapter 2, since the greatest obtained number should be equal to number 𝑖,
number 𝑖 should be selected and the other choices should be less than number 𝑖.
Moreover, in sampling without replacement, the greatest selected number cannot be
less than 4. To clarify the point, assume that the greatest obtained number is to be
equal to 2. If so, 3 numbers less than that should be selected which is an event with
the probability of zero. Therefore, the probability function of this random variable is
as follows:
𝑃(𝑌 = 𝑖) = C(1, 1)C(𝑖 − 1, 3)/C(10, 4) ; 𝑖 = 4, 5, . . . , 10
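Both without-replacement probability functions can be verified the same way; a brief sketch:

    from math import comb

    # Minimum and maximum of 4 draws without replacement from {1, ..., 10}
    pmf_min = {i: comb(10 - i, 3) / comb(10, 4) for i in range(1, 8)}
    pmf_max = {i: comb(i - 1, 3) / comb(10, 4) for i in range(4, 11)}
    print(sum(pmf_min.values()), sum(pmf_max.values()))   # both equal 1.0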

As mentioned, in addition to the discrete random variables investigated in Section 4.3, there is another type of random variable with uncountable, continuous possible values. These are called continuous random variables. Some examples of this type are as follows:
➢ The lifetime of light bulbs produced in a factory
➢ The weight of each cement bag produced in a related factory
➢ The jump magnitude of an athlete in a triple jump competition
➢ The daily amount of rain in a certain district
One of the properties of this type of random variables is that they usually
possess a unit of measurement. In other words, unlike some discrete random
variables that have no units of measurement (like the random variables of Example
1.2), continuous random variables usually have a unit of measurement such as the
minute, gram, meter, square meter, and to name but a few.
Discrete random variables are defined by the probability function, 𝑃(𝑋 = 𝑥). On the contrary, continuous random variables are not defined by 𝑃(𝑋 = 𝑥). This is because the probability that such a random variable equals any single point converges to zero. For instance, it can be shown that the probability that the lifetime of a light bulb takes on a value in the range 10000 ± 1 (measured in seconds) is negligible. Now, since this interval contains an uncountable number of points, it can be concluded that the probability that the lifetime of the light bulb is exactly equal to 10000 converges to zero. Therefore, instead of the probability function, for continuous random variables we define the probability density function, or briefly the density function. For this type of random variable, the probability density function is defined as follows:

𝑓_𝑋(𝑥) = lim_{𝑑𝑥→0} 𝑃(𝑥 − 𝑑𝑥/2 ≤ 𝑋 ≤ 𝑥 + 𝑑𝑥/2) / 𝑑𝑥
Note that the concept of the above function is the same as that of the linear
density in physics. However, in physics, to define the linear density in one point, we
divide the mass of a very small region around that point by the length of the interval.
On the other hand, in the probability, to define the density in one point, we divide
the probability of very small region around that point by the length of the interval.
The reader should note that the density function 𝑓_𝑋(𝑥) is not itself a probability; its unit is the inverse of the unit of the random variable. In fact, in continuous space, it is 𝑓_𝑋(𝑥)𝑑𝑥 that behaves as a probability and has no unit:
lim_{𝑑𝑥→0} 𝑃(𝑥 − 𝑑𝑥/2 ≤ 𝑋 ≤ 𝑥 + 𝑑𝑥/2) = lim_{𝑑𝑥→0} 𝑓_𝑋(𝑥)𝑑𝑥

Using the above-mentioned definitions, to calculate the probability of an interval such as [𝑎, 𝑏] for the continuous random variable 𝑋, we divide the interval into small parts of length 𝑑𝑥 and add the probabilities of those small parts as follows:
𝑃(𝑎 ≤ 𝑋 ≤ 𝑏) = ∫_{𝑎}^{𝑏} 𝑓_𝑋(𝑥) 𝑑𝑥

Thus, since the integral under the curve of the density function in interval
[𝑎, 𝑏] is equal to the area under the curve of the density function in that interval, it

can be concluded that the probability that continuous random variable 𝑋 takes on a
value in interval [𝑎, 𝑏] is equal to its area under the curve of density function in
interval [𝑎, 𝑏].

Figure 4-1 The area under the curve of the density function in interval [𝑎, 𝑏]

Since the sum of probabilities of all possible values of one random variable is
equal to 1, for continuous random variables, the total area under the curve of the
density function is equal to 1. In other words, the following feature is always true for
the density function:

𝑃(𝑋 ∈ (−∞, +∞)) = ∫𝑓𝑥 (𝑥)𝑑𝑥 = 1


𝑥

Furthermore, since the probability of any single point converges to zero for continuous random variables, for the continuous random variable 𝑋 we have:
𝑃(𝑎 ≤ 𝑋 ≤ 𝑏) = 𝑃(𝑎 < 𝑋 < 𝑏)

Some features of the density function for the continuous random variable 𝑋
can be shown as follows:

𝐼) 𝑓_𝑋(𝑥) ≥ 0
𝐼𝐼) ∫_{−∞}^{+∞} 𝑓_𝑋(𝑥) 𝑑𝑥 = 1
𝐼𝐼𝐼) lim_{𝑥→+∞} 𝑓(𝑥) = lim_{𝑥→−∞} 𝑓(𝑥) = 0

Property I is true because, based on the definition, the density function is a limit of nonnegative ratios and cannot adopt a negative value; besides, if the density function could adopt a negative value, the probability of some regions would become negative, which is in contrast with the principles of probability theory. As mentioned, property II is also true because the sum of the probabilities of the values of each random variable should be equal to 1. Property III is true because if the value of the density function did not converge to zero at infinity (positive or negative), property II would not hold.

Example 4.1

Suppose that, in a chemical experiment, the measurement error of the temperature follows a continuous random variable with the following density function:
𝑓_𝑋(𝑥) = 𝑥²/3 ; −1 < 𝑥 < 2
Investigate whether 𝑓𝑋 (𝑥) is a probability density function or not, then obtain
𝑃(0 ≤ 𝑋 ≤ 1).
Solution. To be a density function, 𝑓_𝑋(𝑥) should be nonnegative; this condition is met since 𝑥² ≥ 0. Moreover, the integral over all the possible values should be equal to 1, which is also true:
∫_{−1}^{2} (𝑥²/3) 𝑑𝑥 = (𝑥³/9) |_{−1}^{2} = 8/9 + 1/9 = 1
Now, to calculate 𝑃(0 ≤ 𝑋 ≤ 1), we should take integral over the values in
interval [0,1] as follows:
𝑃(0 ≤ 𝑋 ≤ 1) = ∫_{0}^{1} (𝑥²/3) 𝑑𝑥 = (𝑥³/9) |_{0}^{1} = 1/9
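Numerically, both integrals can be approximated by a Riemann sum; a minimal Python sketch (the step size 10⁻⁵ is an arbitrary choice):

    # Riemann-sum check for f(x) = x^2 / 3 on (-1, 2)
    f = lambda x: x**2 / 3
    dx = 1e-5
    total = sum(f(-1 + k * dx) * dx for k in range(300_000))   # integral over (-1, 2)
    prob = sum(f(k * dx) * dx for k in range(100_000))         # integral over (0, 1)
    print(total)   # approximately 1.0, so f is a valid density
    print(prob)    # approximately 1/9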
Example 4.2

Experience has shown that the arrival time between two consecutive
customers in a store follows a random variable with the following density function:
𝑓(𝑥) = 𝑐𝑒^{−2𝑥} ; 𝑥 ≥ 0
𝑓(𝑥) = 0 ; 𝑥 < 0
a. Find the value of c.
b. Calculate 𝑃(1 < 𝑋 < 2).
c. Calculate 𝑃(𝑋 ≤ 𝑎).

Solution.
a.
∫_{−∞}^{+∞} 𝑓_𝑋(𝑥) 𝑑𝑥 = ∫_{0}^{+∞} 𝑐𝑒^{−2𝑥} 𝑑𝑥 = (−(𝑐/2)𝑒^{−2𝑥}) |_{0}^{+∞} = −(𝑐/2)[0 − 1] = 1 ⇒ 𝑐 = 2
b.
𝑃(1 < 𝑋 < 2) = ∫_{1}^{2} 2𝑒^{−2𝑥} 𝑑𝑥 = (−𝑒^{−2𝑥}) |_{1}^{2} = 𝑒^{−2} − 𝑒^{−4}
c.
𝑃(𝑋 ≤ 𝑎) = ∫_{−∞}^{𝑎} 𝑓_𝑋(𝑥) 𝑑𝑥 = ∫_{0}^{𝑎} 2𝑒^{−2𝑥} 𝑑𝑥 = (−𝑒^{−2𝑥}) |_{0}^{𝑎} = 1 − 𝑒^{−2𝑎}
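Part (c) gives the closed form 𝐹(𝑎) = 1 − 𝑒^{−2𝑎} for 𝑎 ≥ 0, which makes a numerical check of part (b) immediate; a minimal sketch:

    import math

    F = lambda a: 1 - math.exp(-2 * a)    # P(X <= a) for a >= 0, with c = 2
    print(F(2) - F(1))                    # P(1 < X < 2)
    print(math.exp(-2) - math.exp(-4))    # same value, e^{-2} - e^{-4}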

Example 4.3

In a store, the daily demand magnitude for a kind of chemical material (measured in kilograms) follows a continuous distribution with the following density function:
𝑓_𝑋(𝑥) = 1.25 × 10⁻⁵ 𝑥 ; 0 ≤ 𝑥 ≤ 400
Consider that the storage capacity of the store is 360 kilograms, which is filled
at the beginning of each day. What is the probability that the store is confronted with
the deficiency of this chemical material in one day?
Solution. The store is confronted with the deficiency when the demand in one day is
greater than 360 kilograms. Therefore, the deficiency probability is equal to:
𝑃(𝑋 > 360) = ∫_{360}^{400} 1.25 × 10⁻⁵ 𝑥 𝑑𝑥 = 1.25 × 10⁻⁵ (𝑥²/2) |_{360}^{400} = 0.19

As a result, on each specific day, this store faces a deficiency with probability
19 percent.

As mentioned before, except for the discrete and continuous random variables,
there is another type of random variables consisting of a combination of discrete
and continuous random variables called mixed random variables. In this type, some
parts of the range are discrete and some others are continuous. For the part with
discrete range, the probability mass function is defined, and for the part with
continuous range, the probability density function is defined. For instance, note the
following examples:

Example 5.1

Suppose that the waiting time (measured in seconds) of cars at a stoplight follows a mixed random variable. Cars meet a green light with probability 1/2, and their waiting time at the stoplight then equals zero. Otherwise, their waiting time follows the density function 𝑓(𝑥) = 1/60 ; 0 < 𝑥 < 30. In such a case, what percentage of cars wait less than 15 seconds at the stoplight?
Solution. In this mixed random variable, the probability function for discrete values
and the density function for continuous values are as follows:
𝑃(𝑋 = 0) = 1/2
𝑓(𝑥) = 1/60 ; 0 < 𝑥 < 30
Therefore, to calculate the probability that one car waits less than 15 seconds,
we have:
𝑃(𝑋 < 15) = 𝑃(𝑋 = 0) + 𝑃(0 < 𝑋 < 15) = 1/2 + ∫_{0}^{15} (1/60) 𝑑𝑥 = 1/2 + 1/4 = 0.75
Hence, 75 percent of cars wait less than 15 seconds at the stoplight.
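Since the continuous part has constant density 1/60 on (0, 30) and total mass 1/2, the waiting time given a red light is uniform on (0, 30). A minimal simulation sketch under that reading (seed and trial count are arbitrary):

    import random

    random.seed(3)
    n = 1_000_000
    count = 0
    for _ in range(n):
        if random.random() < 0.5:
            wait = 0.0                        # green light: no waiting
        else:
            wait = random.uniform(0.0, 30.0)  # red light: uniform on (0, 30)
        if wait < 15:
            count += 1
    print(count / n)                          # close to 0.75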

Example 5.2

A factory produces light bulbs. 10 percent of them do not work at all, and for the other light bulbs, the density function of the working time (measured in years) is 𝑓(𝑥) = (9/10)𝑒^{−𝑥} ; 𝑥 > 0. If we randomly select a light bulb from the factory, what is the probability that the light bulb lasts less than one year?
Solution. We let X denote the working time of the light bulbs. This random variable
is a mixed random variable with the following probability function for discrete values
and density function for continuous values:
𝑃(𝑋 = 0) = 1/10
𝑓(𝑥) = (9/10)𝑒^{−𝑥} ; 𝑥 > 0
Therefore, to calculate the probability that a light bulb lasts less than one year,
we have:
𝑃(𝑋 < 1) = 𝑃(𝑋 = 0) + 𝑃(0 < 𝑋 < 1) = 1/10 + ∫_{0}^{1} (9/10)𝑒^{−𝑥} 𝑑𝑥
= 1/10 + (9/10)(−𝑒^{−𝑥}) |_{0}^{1} = 1/10 + (9/10)(1 − 𝑒^{−1}) = 1 − (9/10)𝑒^{−1}

However, there is another solution for this problem as follows:
𝑃(𝑋 < 1) = 1 − 𝑃(𝑋 ≥ 1) = 1 − ∫_{1}^{+∞} (9/10)𝑒^{−𝑥} 𝑑𝑥 = 1 − (9/10)(−𝑒^{−𝑥}) |_{1}^{+∞} = 1 − (9/10)𝑒^{−1}

As mentioned before, the probability function is defined for discrete random variables and the probability density function for continuous random variables. Moreover, for mixed random variables, the probability function is defined for the discrete part and the density function for the continuous part.
However, there is a function known as the cumulative distribution function
commonly defined for all three types of random variables, which is defined as follows:
𝐹𝑋 (𝑎) = 𝑃(𝑋 ≤ 𝑎)
That is, the cumulative distribution function of the random variable 𝑋 at point
“𝑎” is equal to the probability that the random variable 𝑋 adopts a value less than or
equal to “𝑎”.
For example, suppose that the random variable 𝑋 adopts the values 1, 2, and 3 with respective probabilities 1/4, 1/2, and 1/4. In such a case, the probability function of this random variable is as follows:
random variable is as follows:
𝑃_𝑋(𝑥) = 1/4 ; 𝑥 = 1
𝑃_𝑋(𝑥) = 1/2 ; 𝑥 = 2
𝑃_𝑋(𝑥) = 1/4 ; 𝑥 = 3
To obtain the cumulative distribution function of this variable, according to its definition, we should obtain, for each point 𝑥, the probability that 𝑋 takes on values less than or equal to it. For instance, the function for 𝑥 = 2 is equal to 𝐹_𝑋(2) = 𝑃(𝑋 ≤ 2) = 3/4, for 𝑥 = 1.5 it equals 𝐹_𝑋(1.5) = 𝑃(𝑋 ≤ 1.5) = 1/4, and for 𝑥 = 0 it equals 𝐹_𝑋(0) = 𝑃(𝑋 ≤ 0) = 0. Likewise, the cumulative distribution function value can be obtained for all points, and the function is as follows:

𝐹_𝑋(𝑥) = 0 ; 𝑥 < 1
𝐹_𝑋(𝑥) = 1/4 ; 1 ≤ 𝑥 < 2
𝐹_𝑋(𝑥) = 3/4 ; 2 ≤ 𝑥 < 3
𝐹_𝑋(𝑥) = 1 ; 3 ≤ 𝑥
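This step-function behavior is easy to reproduce in code; a minimal Python sketch of the cumulative distribution function of this variable:

    def F(x):
        """CDF of the discrete variable with P(1) = 1/4, P(2) = 1/2, P(3) = 1/4."""
        pmf = {1: 0.25, 2: 0.5, 3: 0.25}
        return sum(p for value, p in pmf.items() if value <= x)

    print(F(0), F(1.5), F(2), F(3))   # 0.0 0.25 0.75 1.0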

The cumulative distribution function of the continuous random variable 𝑋 is also obtained as follows:
𝐹_𝑋(𝑎) = 𝑃(𝑋 ≤ 𝑎) = 𝑃(𝑋 < 𝑎) = ∫_{−∞}^{𝑎} 𝑓_𝑋(𝑥) 𝑑𝑥

The above equation is simply the area under the density function curve in an
interval less than or equal to 𝑎.

Figure 4-2 The cumulative distribution function of the continuous random variable 𝑋 at point 𝑎, or the area under the density function curve in interval (−∞, 𝑎)

The reader should note that the cumulative distribution function possesses
important properties, some of which are addressed below:
1. The cumulative distribution function is nondecreasing. This is due to the fact that if 𝑎 < 𝑏, then event 𝑋 ≤ 𝑎 is a subset of event 𝑋 ≤ 𝑏. Therefore, according to Proposition 5.2 in Chapter 2, 𝑃(𝑋 ≤ 𝑎) = 𝐹_𝑋(𝑎) is less than or equal to 𝑃(𝑋 ≤ 𝑏) = 𝐹_𝑋(𝑏).

2. The cumulative distribution function is right-continuous. This means that for each “𝑎” and each decreasing sequence 𝑎_𝑛 (𝑛 ≥ 1) converging to “𝑎”, we have:
lim_{𝑛→∞} 𝐹_𝑋(𝑎_𝑛) = 𝐹_𝑋(𝑎)
3. lim_{𝑥→−∞} 𝐹(𝑥) = 0
4. lim_{𝑥→+∞} 𝐹(𝑥) = 1

The proofs of Properties 2, 3, and 4 follow from Theorem 1.7 of reference [2].
Since the cumulative distribution function is itself a probability, we can calculate the probability of any event concerning a random variable with this function. For instance, we have:
1. To calculate 𝑃(𝑋 > 𝑎), considering the equation 𝑃(𝑋 > 𝑎) = 1 − 𝑃(𝑋 ≤ 𝑎), we
have:
𝑃(𝑋 > 𝑎) = 1 − 𝐹𝑋 (𝑎)
2. To calculate P(X < a), considering Theorem 1.7 of reference [2], we have:
P(X < a) = lim_(n→∞) P(X ≤ a − 1/n) = lim_(n→∞) F_X(a − 1/n) = F_X(a⁻)
3. To calculate P(X ≥ a), considering the equation P(X ≥ a) = 1 − P(X < a), we have:
P(X ≥ a) = 1 − P(X < a) = 1 − lim_(n→∞) F_X(a − 1/n) = 1 − F_X(a⁻)
4. To calculate P(X = a), considering the equation P(X ≤ a) = P(X < a) + P(X = a), we have:
P(X = a) = P(X ≤ a) − P(X < a) = F_X(a) − lim_(n→∞) F_X(a − 1/n) = F_X(a) − F_X(a⁻)
In fact, the equation above states that the jump magnitude of the cumulative distribution function at each point is equal to the probability of that point.
5. Considering Equations 1 to 4, we use the following equations to calculate the probability of different intervals of the random variable X for (a, b) such that a < b:
➢ P(a ≤ X ≤ b) = P(X ≤ b) − P(X < a) = F_X(b) − lim_(n→∞) F_X(a − 1/n) = F_X(b) − F_X(a⁻)
➢ P(a < X ≤ b) = P(X ≤ b) − P(X ≤ a) = F_X(b) − F_X(a)
➢ P(a < X < b) = P(X < b) − P(X ≤ a) = lim_(n→∞) F_X(b − 1/n) − F_X(a) = F_X(b⁻) − F_X(a)
➢ P(a ≤ X < b) = P(X < b) − P(X < a) = lim_(n→∞) F_X(b − 1/n) − lim_(n→∞) F_X(a − 1/n) = F_X(b⁻) − F_X(a⁻)
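These interval identities are mechanical enough to check numerically. The following sketch is an illustrative Python fragment added here (not part of the original text); it uses the piecewise cumulative distribution function of Example 6.1 below and evaluates F_X and its left limit to recover the interval probabilities.

    # Illustrative sketch: interval probabilities from a CDF using F(b),
    # F(a), and the left limit F(a-), per the identities above.
    def F(x):
        # Piecewise CDF of the mixed random variable in Example 6.1 below
        if x < 0:
            return 0.0
        if x < 1:
            return x / 10
        if x < 2:
            return x ** 2 / 5
        if x < 3:
            return 9 / 10
        return 1.0

    def F_left(x, eps=1e-9):
        # Left limit F(x-), approximated by evaluating just below x
        return F(x - eps)

    print(1 - F(1))          # P(X > 1)      -> 0.8
    print(F_left(2))         # P(X < 2)      -> ~0.8
    print(F(1) - F_left(1))  # P(X = 1)      -> ~0.1 (the jump size)
    print(F(2) - F(1))       # P(1 < X <= 2) -> 0.7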

Example 6.1

The cumulative distribution function of the random variable X is given by:
F_X(x) = { 0 ; x < 0
           x/10 ; 0 ≤ x < 1
           x²/5 ; 1 ≤ x < 2
           9/10 ; 2 ≤ x < 3
           1 ; 3 ≤ x }
It is desired to calculate:
a. 𝑃(𝑋 > 1)
b. 𝑃(𝑋 < 2)
c. 𝑃(𝑋 = 1)
d. 𝑃(1 < 𝑋 ≤ 2)
e. 𝑃(2 ≤ 𝑋 ≤ 4)

Solution.
a. P(X > 1) = 1 − F_X(1) = 1 − 1/5 = 4/5
b. P(X < 2) = lim_(n→∞) F_X(2 − 1/n) = lim_(n→∞) (2 − 1/n)²/5 = 4/5
c. P(X = 1) = F_X(1) − lim_(n→∞) F_X(1 − 1/n) = 1/5 − lim_(n→∞) (1 − 1/n)/10 = 1/5 − 1/10 = 1/10
d. P(1 < X ≤ 2) = F_X(2) − F_X(1) = 9/10 − 1/5 = 7/10
e. P(2 ≤ X ≤ 4) = F_X(4) − lim_(n→∞) F_X(2 − 1/n) = 1 − 4/5 = 1/5

The cumulative distribution function curve of X is as follows:

It is seen that the cumulative distribution function of this random variable
within the intervals 0 ≤ 𝑥 < 1 and 1 ≤ 𝑥 < 2 continuously increases and has a jump at
points 1, 2, and 3. In fact, this random variable is a mixed random variable taking on
continuous values within the intervals 0 ≤ 𝑥 < 1 and 1 ≤ 𝑥 < 2 and discrete values at
points 1, 2, and 3.
As previously mentioned, the cumulative distribution function of the
continuous random variable 𝑋 is defined as follows:
F_X(a) = P(X ≤ a) = ∫_(−∞)^a f(x) dx

Now, differentiating both sides of the preceding equation yields the following equation:
dF_X(a)/da = f_X(a)
Intuitively, for the derivative of the cumulative distribution function at a point x with an increment of length dx, we have:
lim_(dx→0) [F_X(x + dx) − F_X(x)]/dx = f_X(x) ⇒ dF_X(x)/dx = f_X(x)
where the term F_X(x + dx) − F_X(x) in the figure below equals the area of the shaded region; dividing it by dx gives the height of the shaded rectangle, i.e., f_X(x). Therefore, the derivative, or slope, of the cumulative distribution function at each point equals the density at that point.

Figure 4-3 Variations of the cumulative distribution function of the continuous
random variable X at a point like x with an increment of length dx

Example 6.2

Suppose that the cumulative distribution function of a random variable is as follows:
F_X(x) = { 1 − e^(−2x) ; x ≥ 0
           0 ; x < 0 }
Obtain its density function.

Solution. As mentioned, to obtain the density function of a continuous random variable, we can differentiate its cumulative distribution function. Therefore, we have:
f_X(x) = dF_X(x)/dx = { 2e^(−2x) ; x ≥ 0
                        0 ; x < 0 }
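As a quick check, the differentiation can be delegated to a computer algebra system. The sketch below is an illustrative fragment (not part of the original text) that assumes the sympy library is available; it recovers the density above and confirms that it integrates to 1.

    import sympy as sp

    x = sp.symbols('x', positive=True)
    F = 1 - sp.exp(-2 * x)                  # the CDF for x >= 0
    f = sp.diff(F, x)                       # differentiate to get the density
    print(f)                                # -> 2*exp(-2*x)
    print(sp.integrate(f, (x, 0, sp.oo)))   # total probability -> 1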

There are some values of random variables that are important in practice, such as the median, mode, and mean. However, it should be noted that these values do not necessarily exist for every random variable. We introduce the median and mode in this chapter, followed by the mean of a random variable in the next chapter.
Definition: A median of a random variable is a value m whose less-than-or-equal-to probability and greater-than-or-equal-to probability are both at least 0.5:
P(X ≥ m) ≥ 0.5, P(X ≤ m) ≥ 0.5
Nevertheless, it can simply be shown that, for continuous random variables, the median is a value whose less-than-or-equal-to probability equals 0.5 and whose greater-than-or-equal-to probability equals 0.5.

Definition: The mode of a random variable is a value with the highest probability. For
continuous random variables, the mode is a value with the highest probability
density.

Example 7.1

In each of the following discrete random variables, determine the median and
mode.
P_X(x) = { 1/4 ; x = 1
           1/2 ; x = 2
           1/4 ; x = 3 }

P_Y(y) = { 0.2 ; y = 1
           0.3 ; y = 2
           0.3 ; y = 3
           0.2 ; y = 4 }

Solution. It is evident that the mode of the random variable 𝑋 is equal to 2, but
random variable 𝑌 has two modes with the values of 2 and 3.
Considering the definition of the median for the random variable X, the only value whose less-than-or-equal-to and greater-than-or-equal-to probabilities are both at least 0.5 is the value 2; the other values do not have this property.
P(X ≥ 2) = 3/4, P(X ≤ 2) = 3/4
However, for the random variable Y, not just one value has the mentioned property; rather, all values within the interval [2,3] possess it. For instance, for the values 2, 2.5, and 3, we have:
P(Y ≥ 2) = 0.8    P(Y ≤ 2) = 0.5
P(Y ≥ 2.5) = 0.5    P(Y ≤ 2.5) = 0.5
P(Y ≥ 3) = 0.5    P(Y ≤ 3) = 0.8

Example 7.2

Suppose that the random variable 𝑋 has density function 𝑓𝑋 (𝑥) = 2𝑒 −2𝑥 ; 𝑥 ≥ 0.
Determine the median and mode of this random variable.
Solution. The derivative of this density function is negative. Therefore, it is a decreasing function and its mode is zero. To obtain its median, we use the following equation:

F_X(m) = P(X ≤ m) = 1/2 ⇒ ∫_0^m 2e^(−2x) dx = −e^(−2x)]_0^m = 1 − e^(−2m) = 1/2
⇒ e^(−2m) = 1/2 ⇒ −2m = ln(1/2) = −ln 2 ⇒ m = (ln 2)/2
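When the equation F_X(m) = 1/2 has no closed-form solution, the median of a continuous random variable can be found numerically. The sketch below (illustrative Python, not part of the original text; it assumes SciPy is available) applies root finding to the CDF of this example and recovers m = (ln 2)/2 ≈ 0.3466.

    import math
    from scipy.optimize import brentq

    def F(m):
        # CDF of the density f(x) = 2e^(-2x), x >= 0
        return 1 - math.exp(-2 * m)

    # Solve F(m) = 1/2 on a bracket that certainly contains the median.
    median = brentq(lambda m: F(m) - 0.5, 0.0, 10.0)
    print(median, math.log(2) / 2)   # both ~0.34657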

Sometimes we know the distribution of a random variable, yet we would like to
determine the distribution of a function of a random variable. If 𝑋 is a random
variable, then, in order to determine the distribution of 𝑌 = 𝑔(𝑋), we first should
recognize whether 𝑌 is discrete or continuous by determining the possible values of
random variable 𝑌. If 𝑌 = 𝑔(𝑋) is a discrete random variable, then, in order to
determine its distribution, we should obtain its probability function. On the other
hand, if 𝑌 = 𝑔(𝑋) is a continuous random variable, then, in order to obtain its
distribution, we should obtain its density function. To do so, firstly, we obtain the
cumulative distribution function and then differentiate it. This results in the density
function. For better comprehension of this issue, consider the following examples.

Example 8.1

If X is a continuous random variable with the density function f_X(x) = 1/10 ; 0 < x < 10, then determine the probability function of the random variable Y = [X] (the sign [ ] denotes the integer part).
Solution. The random variable Y = [X] can take on the values 0 to 9. Thus, Y is a discrete random variable, and in order to determine its distribution, we should find its probability function. Hence, we have:

P(Y = 0) = P(0 ≤ X < 1) = ∫_0^1 (1/10) dx = 1/10
P(Y = 1) = P(1 ≤ X < 2) = ∫_1^2 (1/10) dx = 1/10
⋮
P(Y = 9) = P(9 ≤ X < 10) = ∫_9^10 (1/10) dx = 1/10
Therefore, as seen, the random variable Y adopts the discrete values 0 to 9, each with probability 1/10.

Example 8.2

If the continuous random variable X has the density function f_X(x) = λe^(−λx) ; x ≥ 0, then determine the probability density function of the random variable Y = X².
Solution. The random variable 𝑋 adopts continuous nonnegative values. Therefore,
the random variable 𝑌 = 𝑋 2 also adopts continuous nonnegative values. Hence, to
obtain the distribution of 𝑌 = 𝑋 2 , firstly, 𝐹𝑌 (𝑦) should be obtained and then by
differentiating it with respect to 𝑦, the density function of 𝑌 or 𝑓𝑌 (𝑦) is resulted.
F_Y(y) = P(Y ≤ y) = P(X² ≤ y) = P(X ≤ √y) = ∫_0^√y λe^(−λx) dx = −e^(−λx)]_0^√y = 1 − e^(−λ√y) ; y ≥ 0
⇒ f_Y(y) = dF_Y(y)/dy = (λ/(2√y)) e^(−λ√y) ; y ≥ 0
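A simulation makes this change of variable tangible. The sketch below (an illustrative Python fragment, not part of the original text; λ = 1 is our arbitrary choice) samples X by inversion, squares it, and compares the empirical CDF of Y = X² with the derived F_Y(y) = 1 − e^(−λ√y).

    import math
    import random

    random.seed(1)
    lam = 1.0
    n = 100_000
    # Sample X with density lam*e^(-lam*x) by inversion, then set Y = X^2.
    ys = [(-math.log(random.random()) / lam) ** 2 for _ in range(n)]

    for y in (0.25, 1.0, 4.0):
        empirical = sum(v <= y for v in ys) / n
        theoretical = 1 - math.exp(-lam * math.sqrt(y))
        print(y, round(empirical, 4), round(theoretical, 4))  # close agreement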

Example 8.3

If the continuous random variable X has the density function f_X(x) = 1 ; 0 ≤ x ≤ 1, then determine the probability density function of the random variable Y = X^n.

Solution. The random variable 𝑋 takes on continuous values within interval (0,1).
Therefore, random variable 𝑌 = 𝑋 𝑛 also takes on continuous values within interval
(0,1). Consequently, to obtain the distribution of 𝑌 = 𝑋 𝑛 , firstly, 𝐹𝑌 (𝑦) should be
obtained and then by differentiating it with respect to 𝑦, the density function of 𝑌 or
𝑓𝑌 (𝑦) is resulted.
F_Y(y) = P(Y ≤ y) = P(X^n ≤ y) = P(X ≤ y^(1/n)) = ∫_0^(y^(1/n)) f(x) dx = ∫_0^(y^(1/n)) dx = y^(1/n) ; 0 ≤ y ≤ 1
⇒ f_Y(y) = dF_Y(y)/dy = (1/n) y^((1/n)−1) ; 0 ≤ y ≤ 1

Example 8.4

If the continuous random variable X has the density function f_X(x) = 1/2 ; −1 ≤ x ≤ 1, then determine the probability density function of the random variable Y = X².
Solution. The random variable 𝑋 takes on continuous values within the interval
[−1,1]. Therefore, random variable 𝑌 = 𝑋 2 takes on continuous values within the
interval [0,1]. Hence, for values 0 ≤ 𝑦 ≤ 1, we have:

F_Y(y) = P(Y ≤ y) = P(X² ≤ y) = P(−√y ≤ X ≤ √y) = ∫_(−√y)^(√y) f(x) dx = 2√y/2 = √y ; 0 ≤ y ≤ 1
⇒ f_Y(y) = dF_Y(y)/dy = 1/(2√y) ; 0 ≤ y ≤ 1

Theorem 8.1: If continuous random variable 𝑋 has the probability density function
of 𝑓𝑋 (𝑥), and 𝑌 = 𝑔(𝑋) is an invertible and differentiable function for all the values
of 𝑋, then the density function of random variable 𝑌 is as follows:
f_Y(y) = { f_X(x) |dx/dy| ; if y = g(x) for some x
           0 ; if y ≠ g(x) for every x }
where, instead of x, we should substitute its corresponding value in terms of y, i.e., g^(−1)(y).

Proof. If g(x) is an increasing function, then we have:
F_Y(y) = P(Y ≤ y) = P(g(X) ≤ y) = P(X ≤ g^(−1)(y)) = F_X(g^(−1)(y))
Now, by differentiating both sides of the above equation, we have:
f_Y(y) = f_X(g^(−1)(y)) · d g^(−1)(y)/dy
Furthermore, for each y such that y ≠ g(x) for every x, the value of F_Y(y) is either 0 or 1, and thus f_Y(y) is equal to zero.
If g(x) is a decreasing function, then we have:
F_Y(y) = P(Y ≤ y) = P(g(X) ≤ y) = P(X ≥ g^(−1)(y)) = 1 − F_X(g^(−1)(y))
Now, by differentiating both sides of the above equation, we have:
f_Y(y) = −f_X(g^(−1)(y)) · d g^(−1)(y)/dy
Note that the term d g^(−1)(y)/dy is negative in this case; therefore, the value of f_Y(y) is positive. Hence, in general (for increasing or decreasing functions g(x)), we have:
f_Y(y) = { f_X(x) |dx/dy| ; if y = g(x) for some x
           0 ; if y ≠ g(x) for every x }

Now, we solve Example 8.2 by using this theorem. It should be noted that even though the function X² is not invertible on the whole real line, it is strictly monotonic on the interval of possible values of X (x ≥ 0). Therefore, we have:
f_X(x) = λe^(−λx) ; x ≥ 0
y = g(x) = x² ⇒ x = g^(−1)(y) = √y and dx/dy = d g^(−1)(y)/dy = 1/(2√y)
⇒ f_Y(y) = f_X(√y) |dx/dy| = (λ/(2√y)) e^(−λ√y) ; y ≥ 0
Moreover, Example 8.3 can also be solved by direct use of this theorem. It is evident that the function X^n is monotonic on the interval [0,1]. Therefore, we have:
f_X(x) = 1 ; 0 < x < 1
y = x^n ⇒ x = g^(−1)(y) = y^(1/n) and dx/dy = d g^(−1)(y)/dy = (1/n) y^((1/n)−1)
⇒ f_Y(y) = f_X(x) |dx/dy| = (1/n) y^((1/n)−1) ; 0 ≤ y ≤ 1
Note that Example 8.4 cannot be solved by directly using this theorem because the function X² is not monotonic on the interval [−1,1].
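For a monotone transformation, Theorem 8.1 is mechanical enough to automate. The sketch below is an illustrative fragment (not from the original text): the helper name density_of_g is ours, and sympy is assumed. It implements f_Y(y) = f_X(g^(−1)(y)) |d g^(−1)(y)/dy| and reproduces the result of Example 8.2.

    import sympy as sp

    x, y, lam = sp.symbols('x y lam', positive=True)

    def density_of_g(f_x, ginv):
        # Theorem 8.1 for monotone g: substitute x = g^(-1)(y) and
        # multiply by |d g^(-1)(y)/dy|.
        return sp.simplify(f_x.subs(x, ginv) * sp.Abs(sp.diff(ginv, y)))

    f_x = lam * sp.exp(-lam * x)            # f_X(x), x >= 0
    f_y = density_of_g(f_x, sp.sqrt(y))     # g(x) = x^2, g^(-1)(y) = sqrt(y)
    print(f_y)                              # -> lam*exp(-lam*sqrt(y))/(2*sqrt(y))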

In Chapter 3, we showed that, according to the law of total probability, if the probability of occurrence of an event differs across distinct parts of a space, we should condition on the possible parts of the space to calculate the probability of that event. Now, if the probability of occurrence of an event depends on the result of a random variable, we should condition on its possible values. Consider the following example:

Example 9.1

The probability that a shooter's shot hits the target is 1/X², where X is the distance of the target from the shooter; this distance is determined randomly and can take on the values 1, 2, 3, 4, and 5, each with probability 1/5. What is the probability that the shooter's shot hits the target?
Solution. Since the probability that the shot hits the target depends on the distance
from the target, we should condition on the distance from the target. Therefore, if
event E denotes that the shot hits the target, we have:
P(E) = Σ_(x=1)^5 P(E|X = x) P(X = x) = 1 × 1/5 + 1/4 × 1/5 + 1/9 × 1/5 + 1/16 × 1/5 + 1/25 × 1/5 = 5269/18000

Now, suppose instead that X follows a continuous distribution taking on values in the interval (1,5) with the constant density f_X(x) = 1/(5 − 1) = 1/4 ; 1 < x < 5. The probability that the shot hits the target again depends on X, and we should condition on its possible values. However, herein, X is a continuous random variable. In such cases, the general approach for conditioning is the same as before. We divide the possible values of the random variable X into very small parts. Then, in contrast with the discrete case, where we denote the probability of each point by P(X = x), in the continuous case we denote the probability of each small interval of length dx about a point x by f_X(x)dx. Finally, we can calculate the probability of occurrence of an event like E for each small interval of the random variable X and add the contributions up as follows:
P(E) = ∫_1^5 P(E|X = x) f_X(x) dx = ∫_1^5 (1/x²) × (1/(5 − 1)) dx = (−1/(4x))|_1^5 = 0.2

Therefore, if the probability of occurrence of the event E is different for distinct values of the random variable X, we have:
P(E) = { Σ_x P(E|X = x) P(X = x) ; X is discrete
         ∫_x P(E|X = x) f_X(x) dx ; X is continuous }
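The continuous branch of this formula can be checked by simulation. In the sketch below (illustrative Python, not part of the original text), the distance X is drawn uniformly on (1,5) and a hit then occurs with probability 1/x², reproducing P(E) = 0.2.

    import random

    random.seed(0)
    n = 200_000
    hits = 0
    for _ in range(n):
        x = random.uniform(1, 5)          # distance X, density 1/4 on (1, 5)
        if random.random() < 1 / x ** 2:  # hit with conditional prob. 1/x^2
            hits += 1
    print(hits / n)                       # ~0.2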

Example 9.2

Suppose that each day you perform a trial twice independently with the
success probability of “𝑝” for each trial. However, considering your psychological
condition, “𝑝” or success probability is different for each day. If 𝑝, on different days,
follows a continuous random variable with the density function of 𝑓𝑃 (𝑥) = 2𝑥; 0 < 𝑥 < 1,
in what proportion of the days, both of your trials result in success?
Solution. The probability that both trials result in success depends on the value of 𝑝.
Hence, if 𝐸 denotes the event that both of your trials result in success, we should
condition on possible values of 𝑝 to calculate the probability of the aforementioned
event. Hence, we have:
P(E) = ∫_0^1 P(E|P = x) f_P(x) dx = ∫_0^1 x² × 2x dx = (2x⁴/4)|_0^1 = 1/2
Therefore, in the long run, the proportion of days on which both of your trials result in success is equal to 0.5.

1) Determine the type (discrete, continuous, or mixed) of the following random
variables.
a. The number of earthquakes happening in a particular area yearly.
b. The arrival time of a woman (man) to a certain store after the door is
opened.
c. The elapsed time between each two successive earthquakes in a
particular area.
d. The number of phone conversations in a center on one day.
e. The number of misprints on a page of a book.
f. The weight of batches of yoghurt produced in a factory.
g. The waiting time of people at a stoplight.
h. The shortage magnitude of a raw material (measured in kilograms) in a
specific time interval at a factory storeroom.

2) Consider the successive flips of a fair coin. In each of the following parts, write
the sample space and possible values of the defined random variable.
a. The number of heads in three flips.
b. The number of flips until one heads comes up.
c. The number of flips until two heads come up.
d. The number of flips until one head and one tail come up.

3) In the preceding problem, obtain the probability of possible values of the


defined random variables.

4) Considering the below probability function, what are the possible values of 𝑎?

X = x      2         4         6         8
P_X(x)    1/4     a + 1/4     1/4     1/4 − a

5) Which of the following functions is a probability function?
a. P_X(x) = (1 − x)/4 ; x = −2, 0, 2
b. P_X(x) = C(2, x) (1/2)² ; x = 0, 1, 2
c. P_X(x) = (1 − x)/4 ; x = −2, −1, 0
d. P_X(x) = (1/4)^x (1/2)^(1−|x|) ; x = −1, 0, 1

6) Two balls are randomly and without replacement selected from an urn
containing 8 white, 4 black, and 2 red balls. If two points are given for each
black ball selected from the urn, one point is reduced for each white ball
selected from the urn, and random variable 𝑋 is defined as the sum of our
points obtained from each choice of size 2, then obtain the possible values of
𝑋 and their respective probabilities.
7) Suppose a trial consists of rolling a fair die twice. For this trial, obtain the
possible values of each random variable defined as follows as well as their
respective probabilities.
a. 𝑋: The greatest obtained result in two throws.
b. 𝑌: The least obtained result in two throws.
c. 𝑍: The sum of the obtained results in two throws.
d. 𝑊: The first obtained result minus the second obtained result.
8) In an urn, there are 𝑛 balls numbered 1 through 𝑛. We randomly select one ball
and after noting its number, it is returned to the urn again. This is done until
one ball is selected for the second time. If random variable 𝑋 denotes the
number of times that the ball is selected from the urn, then obtain the
probability mass function of 𝑋.

9) Consider an urn containing 10 balls that only one of them is green. Balls are
selected successively and randomly until one green ball is picked. If random
variable 𝑋 denotes the number of times that the balls are picked from the urn,
then obtain the probability mass function of 𝑋 and 𝑃(𝑋 ≤ 3) under two
following conditions:
a. Choices are without replacement.
b. Choices are with replacement.
10) Random variable 𝑋 is defined as the number of tails minus the number of heads
appeared in 3 flips of a fair coin. It is desired to calculate:
a. The possible values of 𝑋.
b. 𝑃(𝑋 = 𝑥).
c. 𝐹𝑋 (𝑥).
d. 𝑃(−1 ≤ 𝑋 < 3).
11) In a small bakery, 30 loaves of French bread are daily baked. Daily demand of
this bakery possesses the following probability function. What is the
probability that at the end of the randomly chosen day, the number of baked
and unsold bread is greater than or equal to 10?

𝐷 0 10 20 30 40
𝑃𝐷 (𝑑) 0.25 0.3 0.3 0.1 0.05
12) If the probability function of the discrete random variable X taking on the values x = 1, 2, …, n is given by P_X(x) = (n + 1)/(n x(x + 1)), then obtain the value of P(2 < X < n).

13) Suppose that 5 people along with you and your friend are aligned in a row. If
𝑋 represents the number of people between you and your friend, then obtain
the probability function of 𝑋 for values of 𝑥 = 0,1,2, … ,5.
14) Suppose that the random variable X has the probability function P_X(x) = k/2^x ; x = 0, 1, 2, … . It is desired to calculate:


a. The value of 𝑘.
b. The probability that 𝑋 adopts an even value.

15) Suppose that the random variable X has the probability function P_X(x) = k(x + 1)(2/3)^x ; x = 1, 2, … .

It is desired to obtain:
a. The value of 𝑘.
b. The probability that 𝑋 adopts an even value.
Hint: Σ_(x=1)^∞ x d^x = d/(1 − d)²

16) A box contains 𝑁 parts that 𝑘 of them are defective. We select a part randomly
from the box until all the 𝑘 defective parts are selected. If random variable 𝑋
denotes the number of parts selected from the box, then it is desired to obtain
the probability function of 𝑋 in two cases of with replacement and without
replacement.
17) If P(X = x + 1) = (1/(x + 1)) P(X = x) ; x = 0, 1, 2, …, then obtain P(X = 0).
Hint: Σ_(x=0)^∞ d^x/x! = e^d
18) Suppose that the density function of the random variable X for values −1 ≤ x ≤ 1 is given by f(x) = ax + 1/2, and it is equal to zero otherwise. Obtain the possible values of a.
19) Suppose that the density function of the nonnegative random variable X is as follows:
f_X(x) = { (3/4)x² ; 0 ≤ x ≤ 1
           a e^(−x) ; x > 1 }
Then, obtain the value of a and P(X < 2).
20) Show that the following functions cannot be density functions.
a. f_X(x) = { c(2x − x³) ; 0 < x < 5/2
              0 ; otherwise }
b. f_Y(y) = { c(2y − y²) ; 0 < y < 5/2
              0 ; otherwise }

21) Consider the density function f_X(x) = c/(1 + x²) ; −∞ < x < ∞. It is desired to calculate:
a. The value of c.
b. P(X < 0).
c. P(|X| ≤ 1).

22) For each of the following probability density functions, obtain the values of P(|X| ≤ 1) and P(X² < 9).
f_X(x) = { x²/3 ; −1 < x < 2
           0 ; otherwise }
f_X(x) = { (x + 2)/18 ; −2 < x < 4
           0 ; otherwise }
23) The density function of the random variable X is given by f(x) = b e^(−a|x|) ; −∞ < x < ∞. If P(|X| ≤ 1) = 1 − e^(−2), then obtain the values of a and b.

24) If the density function of the random variable X is given by
f_X(x) = { a Min{3x, 2 − x} ; 0 < x < 2
           0 ; otherwise }
then obtain the value of a.
25) The density function of random variable 𝑋 is given by the following figure.
If 𝑎 > 0 and 𝑏 > 0 , obtain the largest value for 𝑎.

26) If the density function of the continuous random variable X is given by
f_X(x) = { x ; 0 ≤ x < 1
           k − x ; 1 ≤ x < 2
           0 ; otherwise }
it is desired to obtain the value of:
a. k
b. P(X < 1/6)
c. P(1/6 < X < 7/4)

27) Suppose that the density function of the random variable X is given by:
f_X(x) = { 1/2 ; 0 ≤ x ≤ 2
           0 ; otherwise }
If the random variable Y is defined by Y = { X ; X ≤ 1
                                             X² ; X > 1 },
obtain the value of P(1/4 < Y < 9/4).

28) An oil tank located at an oil station of a certain area is filled weekly. If the weekly sales volume of oil (measured in thousands of liters) at the station follows the random variable X with the following density function:
f_X(x) = { 5(1 − x)⁴ ; 0 < x < 1
           0 ; otherwise }
what should be the volume of the oil tank so that the probability of its
depletion equals 0.01 during one week?
29) In a factory, the density function of electronic device lifetime (measured in
hours) is given by:
f_X(x) = { 10/x² ; x > 10
           0 ; x ≤ 10 }
If we randomly select a sample of size 3 and the probability that each device
lasts less than 15 hours is independent of the other devices, what is the
probability that all of them last less than 15 hours?

30) The daily sales value of bread in a bakery (measured in 100 kilograms) is a
random variable with the following density function:
𝑐𝑥; 0≤𝑥<5
𝑓𝑋 (𝑥) = {𝑐(10 − 𝑥); 5 ≤ 𝑥 < 10
0; elsewhere
If event 𝐴 denotes that the daily sales is more than 500 kilograms and 𝐵
denotes that the daily sales is between 250 and 750 kilograms,
a. Obtain 𝑐.
b. Obtain 𝑃(𝐴|𝐵).
c. Are 𝐴 and 𝐵 two independent events?

31) In a certain area, a water tank with a capacity of 7 million liters is filled at the beginning of each week. The density function of the weekly water demand in the area (measured in millions of liters, i.e., units) is given by:
𝑒 −𝑥 ; 𝑥≥0
𝑓𝑋 (𝑥) = {
0; 𝑜𝑡ℎ𝑒𝑟𝑤𝑖𝑠𝑒
a. Obtain the distribution of the random variable water consumption
magnitude.
b. Obtain the probability that the water consumption magnitude is more
than 1 unit.
c. Obtain the probability that the water consumption magnitude is
between 1 and 3 units.
d. Obtain the distribution of random variable water shortage magnitude.
e. Obtain the probability that the water shortage magnitude is less than 1
unit.
f. Obtain the probability that the water shortage magnitude is between 1
and 3 units.

32) Determine which of the following functions is a cumulative distribution function.
a. F_X(x) = 1 − e^(−x) ; −∞ < x < ∞
b. F_X(x) = { 1 − (1/π)e^(−x) ; x ≥ 0
              0 ; x < 0 }
c. F_X(x) = { 1 − 1/(1 + x) ; x ≥ 0
              0 ; x < 0 }
d. F_X(x) = { (1/2)e^x ; x < 0
              1 − (3/4)e^(−x) ; x ≥ 0 }

33) For each of the following density functions, obtain the cumulative distribution
function.
a. f_X(x) = 3(1 − x)² ; 0 < x < 1
b. f_X(x) = x^(−2) ; 1 < x < ∞
c. f_X(x) = 1/3 ; x ∈ {(0,1) ∪ (2,4)}
d. f_X(x) = (1/2)e^(−|x|) ; −∞ < x < +∞

34) Suppose that the cumulative distribution function of the random variable 𝑋 is
given by:
F_X(x) = { 0 ; x < 0
           x²/4 ; 0 ≤ x < 1
           1/2 ; 1 ≤ x < 2
           x/3 ; 2 ≤ x < 3
           1 ; x ≥ 3 }
It is desired to calculate the following values:
a. P(X > 5/2)
b. P(X < 1)
c. P(X = 2)
d. P(1 ≤ X ≤ 3)
e. P(1 ≤ X < 2)
f. P(X < 2 | X ≥ 1)
g. P(|X − 2/3| > 1/3)

35) If the cumulative distribution function of the random variable X is given by:
F_X(x) = { 0 ; x < 0
           (1/4)x ; 0 ≤ x < 1
           a + (1/4)(x − 1) ; 1 ≤ x < 2
           b ; 2 ≤ x < 3
           1 ; x ≥ 3 }
and we know that P(X = 1) = 1/4 and P(X = 2) = 1/6, then obtain the values of a and b.

36) If the cumulative distribution function of random variable 𝑋 is given by:


F_X(x) = { 1 − (1/2)e^(−x/2) − (1/2)e^(−[x]/3) ; x ≥ 0
           0 ; otherwise }
then it is desired to calculate the following values:
a. 𝑃(𝑋 < 2)
b. 𝑃(𝑋 = 3)
c. 𝑃(4 ≤ 𝑋 ≤ 6)
d. 𝑃(𝑋 > 6|𝑋 > 2)

37) If the cumulative distribution function of random variable 𝑋 is given by:


F_X(x) = a + b tan^(−1)(x/2) ; −∞ < x < ∞
then obtain the constants 𝑎 and 𝑏.

38) In each of the following functions, determine the possible values of constant
“𝑎” such that each of these functions becomes cumulative distribution
function.
a. F_X(x) = { 0 ; x < −2
              (x + a)/8 ; −2 ≤ x < 2
              1 ; x ≥ 2 }
b. F_X(x) = { 0 ; x < 0
              1 − e^(−x²−a) ; x ≥ 0 }

39) If the cumulative distribution function of the random variable X is given by:
F_X(x) = { 0 ; x < 0
           a(x² + 2x) ; 0 ≤ x < 2
           1 ; x ≥ 2 }
a. Obtain the possible values of the constant a.
b. Determine the value of a such that X becomes a continuous random variable.
c. If X is a continuous random variable, obtain the value of P(1/2 < X < 3/2).

40) If the cumulative distribution function of random variable 𝑋 is given by:


0; 𝑥<0
𝐹𝑋 (𝑥) = {𝑎𝑥 3 ; 0≤𝑥<2
1; 𝑥≥2
then what values of 𝑎, make the random variable 𝑋 continuous, discrete, or
mixed?
41) Suppose that continuous random variable 𝑋 has probability density function
𝑓𝑋 (𝑥) and cumulative distribution function 𝐹𝑋 (𝑥). In addition, suppose that 𝑎 ∈
ℝ is an arbitrary point of this variable. Determine whether the following
function can be a density function.
h_X(x) = { 0 ; x < a
           f_X(x)/(1 − F_X(a)) ; x ≥ a }

42) Obtain the mode of each of the following distributions:
a. 𝑓𝑋 (𝑥) = 12𝑥 2 (1 − 𝑥); 0 ≤ 𝑥 ≤ 1
b. f_X(x) = (1/2) x² e^(−x) ; x ≥ 0

c. 𝑓𝑋 (𝑥) = 3𝑒 −3𝑥 ; 𝑥≥0

43) Obtain the median of each of the following distributions:


a. f_X(x) = { 3x² ; 0 < x < 1
              0 ; otherwise }
b. f_X(x) = 1/(π(1 + x²)) ; −∞ < x < ∞
c. F_X(x) = 1/(1 + e^(−2x)) ; −∞ < x < ∞
d. F_X(x) = { 0 ; x < 0
              1 − e^(−2x) ; x ≥ 0 }
44) Suppose that random variable 𝑋 has the following density function:
1; 0≤𝑥≤1
𝑓𝑋 (𝑥) = {
0; 𝑜𝑡ℎ𝑒𝑟𝑤𝑖𝑠𝑒
If random variable 𝑌 is defined by 𝑌 = √𝑋, obtain the median of the random
variable 𝑌.
45) Suppose that the density function of 𝑋 is positive within interval 0 ≤ 𝑥 ≤ 1 and
is given by 𝑓(𝑥) = 2𝑥. If we denote the mode and median of 𝑋 by 𝑥̃ and m,
respectively, then obtain mode, median, and 𝑃(𝑚 < 𝑋 < 𝑥̃).

46) If the density function of the random variable X is given by:
f_X(x) = { Min{2x, (2/3)(2 − x)} ; 0 ≤ x ≤ 2
           0 ; otherwise }
then obtain the mode and median of the random variable.
47) Suppose that the cumulative distribution function of X is given by:
F_X(x) = { 0 ; x < 0
           x²/5 ; 0 ≤ x < 1
           (4/9)x ; 1 ≤ x < 2
           1 ; x ≥ 2 }
Obtain the distribution of 𝑋.
48) If 𝑋 follows a continuous random variable with the following density function:
f_X(x) = { 1/5 ; 0 ≤ x ≤ 5
           0 ; otherwise }
then obtain the probability function of random variable 𝑌 = [𝑋]. ([𝑥] is the
largest integer less than or equal to x)
49) If 𝑋 follows a continuous random variable with the following density function:
f_X(x) = { 1/x² ; x > 1
           0 ; otherwise }
then obtain the probability function of random variable 𝑌 = [𝑋]. ([𝑥] is the
largest integer less than or equal to x)
50) Suppose that the density function of continuous random variable 𝑋 is given by:
f_X(x) = { e^(−(x+1)) ; x ≥ −1
           0 ; otherwise }
If the random variable Y is defined by Y = { 0 ; −1 ≤ X < 0
                                             1 ; X ≥ 0 },
then obtain the probability function of this discrete random variable.
51) If 𝑋 is a continuous random variable with the following density function:
f_X(x) = { 1/2 ; −1 < x < 1
           0 ; otherwise }
then obtain each of density functions 𝑌 = |𝑋| and 𝑊 = 𝑋 2 .
52) If 𝑋 is a continuous random variable with the following density function:

f_X(x) = { (x + 1)/2 ; −1 < x < 1
           0 ; otherwise }
then obtain the density function of random variable 𝑌 = 𝑋 2 .
53) If cumulative distribution function 𝑋 is given by
F_X(x) = { 1 − e^(−x) ; x ≥ 0
           0 ; otherwise }
then obtain the cumulative distribution function of the following random
variables.
a. 𝑌 = 2𝑋
b. 𝑊 = 𝑋2
54) If the cumulative distribution function of nonnegative continuous random
variable 𝑋 is denoted by 𝐹𝑋 (𝑥), then, in each of the following states, obtain the
cumulative distribution function of random variable 𝑍 in terms of cumulative
distribution function of 𝑋.
a. 𝑍 = 𝑎𝑋 + 𝑏
b. Z = 1/X

c. 𝑍 = 𝑙𝑛 𝑋
d. 𝑍 = 𝑒𝑋
55) If nonnegative continuous random variable 𝑋 possesses density function 𝑓𝑋 (𝑥),
obtain the density function of random variable 𝑍 in each of the following
states.
a. 𝑍 = 𝑎 + 𝑏𝑋
b. Z = 1/X

c. 𝑍 = 𝑙𝑛 𝑋
d. 𝑍 = 𝑒𝑋
56) If the density function of X is given by f_X(x) = e^(−x)/(1 + e^(−x))² ; −∞ < x < +∞, obtain the density function of the random variable Y = 1/(1 + e^(−X)).

57) If the density function of random variable X is given by:
f_X(x) = { a x^(a−1) ; 0 ≤ x ≤ 1
           0 ; otherwise }
then obtain the density function of the random variable Y = −ln X. (a is a positive
constant.)
58) If the density function of random variable 𝑋 is given by:
4𝑥 3 ; 0≤𝑥≤1
𝑓𝑋 (𝑥) = {
0; 𝑜𝑡ℎ𝑒𝑟𝑤𝑖𝑠𝑒
then obtain density function of random variable 𝑌 = −2 ln(𝑋 4 ).
59) Suppose that the random variable X has the density function f_X(x) = 1/(π(1 + x²)) ; −∞ < x < ∞.

If so, obtain the density function of each of the following random variables.
a. Y = 1/X

b. 𝑌 = |𝑋|
c. 𝑌 = 𝑋2
60) Suppose that random variable 𝑋 has the following density function:
𝑒 −𝑥 ; 𝑥≥0
𝑓𝑋 (𝑥) = {
0; 𝑜𝑡ℎ𝑒𝑟𝑤𝑖𝑠𝑒
If the random variable Y is defined by Y = { X ; X ≤ 1
                                             1/X ; X > 1 },
obtain its density function.

61) Suppose that random variable 𝑋 has the following density function:
f_X(x) = { 1/4 ; −2 ≤ x ≤ 2
           0 ; otherwise }
If random variable 𝑌 is defined by 𝑌 = 𝑋 2 , obtain its density function.

62) Suppose that random variable 𝑋 has the following density function:

f_X(x) = { 1/5 ; −2 ≤ x ≤ 3
           0 ; otherwise }
If random variable 𝑌 is defined by 𝑌 = 𝑋 2 , obtain its density function.

One of the important concepts of probability theory is the expected value of a random variable, which has numerous applications in probability theory and statistics. For the discrete random variable X, the expected value is defined as follows:
E(X) = Σ_x x P(X = x)

In fact, the expected value is a weighted average of the possible values of X, where the weight of each value is equal to the probability that the random variable X takes on that value.
For example, consider a game that a person can win the amounts of $1, $2,
and $3 with respective probabilities 0.2, 0.3, and 0.5. If his gain value in this game
is denoted by 𝑋, the expected value of his gain equals:

Σ_x x P(X = x) = 1 × 0.2 + 2 × 0.3 + 3 × 0.5 = 2.3

The value of 2.3 or the expected value of this random variable is simply a
weighted average of the person's gain value in each trial. It can also be expressed
that the person, on average, wins $2.3. However, it should be noted that the
expected value of a random variable does not necessarily take on one of the
random variable values. For instance, in the aforementioned random variable, the

value of 2.3 does not belong to one of the random variable values and it is merely
an index to express the weighted average of the random variable values
considering the probability of each value.

Example 1.1

Obtain the expected value of the outcome of rolling a fair die.


Solution. Suppose that the result of rolling a fair die is denoted by 𝑋. Then, its
expected value is calculated as follows:
E(X) = Σ_(i=1)^6 i P(X = i) = 1(1/6) + 2(1/6) + 3(1/6) + 4(1/6) + 5(1/6) + 6(1/6) = 21/6 = 7/2

One intuitive interpretation of the expected value is that it is the average of the results of infinitely many repetitions of the random experiment, which we will address via the law of large numbers in Chapter 10. For instance, in Example 1.1, one intuitive interpretation of the obtained expected value 7/2 is that if we roll a fair die infinitely many times, the average of the upturned faces converges to 7/2 in the long run. Furthermore, consider the first example, where the person wins the amounts of $1, $2, and $3 with respective probabilities 0.2, 0.3, and 0.5, and the expected value of the person's gain was equal to 2.3. An intuitive interpretation of the number 2.3 is that if this person plays the game repeatedly, his average gain per game converges to 2.3 in the long run.
To provide a better understanding of the intuitive concept of the expected value, suppose that the random variable X takes on the values x₁, x₂, …, x_m with respective probabilities p₁, p₂, …, p_m. If we repeat the trial related to this random variable n times and record the results, assuming that each xᵢ is repeated aᵢ times, then the sum of the results is a₁x₁ + a₂x₂ + ⋯ + a_m x_m and the average of the results is (a₁x₁ + a₂x₂ + ⋯ + a_m x_m)/n. Now, letting n, the number of times the trial is performed, go to infinity, and regarding the empirical concept of probability as a relative frequency, the proportion aᵢ/n of times that the result xᵢ occurs can be regarded as the probability of occurrence of xᵢ, which is denoted by pᵢ. Under these conditions, the average of the numbers obtained over infinitely many repetitions can be written as follows:
lim_(n→∞) (a₁x₁ + a₂x₂ + ⋯ + a_m x_m)/n = p₁x₁ + p₂x₂ + ⋯ + p_m x_m = Σᵢ xᵢ pᵢ
which is the definition of the expected value of the random variable X.
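This long-run-average reading is easy to see empirically. The sketch below (illustrative Python, not part of the original text) averages a growing number of simulated die rolls and watches the running average approach 7/2.

    import random

    random.seed(42)
    rolls = [random.randint(1, 6) for _ in range(1_000_000)]
    for n in (10, 100, 10_000, 1_000_000):
        print(n, sum(rolls[:n]) / n)   # the average tends toward 3.5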

Moreover, another interpretation of the expected value is as the center of gravity of a mass distribution in which the mass of each point is equal to the probability of that point. For example, suppose that we have a rod of negligible mass, and spheres with respective masses 1/4, 1/4, and 1/2 (mass units) are located at distances 1, 2, and 4 from its beginning point. In such a situation, the expected value, or the probabilistic center of gravity, can be represented as follows:
E(X) = (Σ_(i=1)^3 mᵢxᵢ)/(Σ_(i=1)^3 mᵢ) = (Σ_(i=1)^3 pᵢxᵢ)/(Σ_(i=1)^3 pᵢ) = 1 × 1/4 + 2 × 1/4 + 4 × 1/2 = 11/4
In other words, 11/4 can be construed as the probabilistic center of gravity of the random variable X.

As mentioned previously, the expected value of a discrete random variable is calculated as follows:
E(X) = Σ_x x P(X = x)

Example 2.1

If we roll two fair dice, obtain the expected value of the sum of the upturned faces.
Solution. Suppose that the random variable X denotes the sum of the two dice. Then, its expected value is calculated as follows:
E(X) = Σ_(x=2)^12 x P(X = x) = 2 × 1/36 + 3 × 2/36 + 4 × 3/36 + ⋯ + 12 × 1/36 = 7

The probability function of this random variable is shown in the figure below. Also, as mentioned above, the expected value of a random variable can be interpreted as its probabilistic center of gravity. Since this variable takes on finitely many values and the figure below is symmetric about the number 7, its expected value, or probabilistic center of gravity, equals 7.

Example 2.2

Suppose that a fair coin is flipped until one head comes up, but if the number
of flips is equal to 4 while no heads turns up, we stop flipping. Obtain the expected
value of the number of flips.
Solution. Suppose that 𝑋 denotes the number of flips. Considering the sample space
expressed in Example 1.2 of Chapter 4, we have:
P(X = 1) = P({H}) = 1/2
P(X = 2) = P({TH}) = (1/2) × (1/2)
P(X = 3) = P({TTH}) = (1/2)²(1/2)
P(X = 4) = P({TTTH}) + P({TTTT}) = (1/2)³(1/2) + (1/2)⁴ = (1/2)³
E(X) = Σ_(x=1)^4 x P(X = x) = 1 × 1/2 + 2 × 1/4 + 3 × 1/8 + 4 × 1/8 = 15/8

Example 2.3

In the preceding example, obtain the expected value of the number of tails that appear.
Solution. Suppose that Y denotes the number of upturned tails. Considering the sample space expressed in Example 1.2 of Chapter 4, we have:

P(Y = 0) = P({H}) = 1/2
P(Y = 1) = P({TH}) = (1/2) × (1/2)
P(Y = 2) = P({TTH}) = (1/2)²(1/2)
P(Y = 3) = P({TTTH}) = (1/2)³(1/2) = 1/16
P(Y = 4) = P({TTTT}) = (1/2)⁴ = 1/16
E(Y) = Σ_(y=0)^4 y P(Y = y) = 0 × 1/2 + 1 × 1/4 + 2 × 1/8 + 3 × 1/16 + 4 × 1/16 = 15/16
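Both expectations can be verified by enumerating the sample space. The sketch below (illustrative Python, not part of the original text) lists the outcomes H, TH, TTH, TTTH, TTTT with their probabilities and accumulates E(X) and E(Y).

    # (outcome, probability, number of flips X, number of tails Y)
    outcomes = [
        ("H",    1 / 2,  1, 0),
        ("TH",   1 / 4,  2, 1),
        ("TTH",  1 / 8,  3, 2),
        ("TTTH", 1 / 16, 4, 3),
        ("TTTT", 1 / 16, 4, 4),
    ]
    EX = sum(p * x for _, p, x, _ in outcomes)
    EY = sum(p * y for _, p, _, y in outcomes)
    print(EX, EY)   # 1.875 = 15/8 and 0.9375 = 15/16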

For the expected value of continuous random variables, a concept similar to that for discrete random variables can be stated. If X is a continuous random variable with density function f_X(x), then, according to the explanations of Section 4.4 in Chapter 4, for small values of dx we have:
P(x − dx/2 ≤ X ≤ x + dx/2) ≅ f(x)dx
Accordingly, using the same concept defined in the preceding section for discrete random variables, the expected value of a continuous random variable X can be defined as follows:
E(X) = ∫_(−∞)^(+∞) x f(x) dx

Example 2.4

In a group of men, the difference between the uric acid and its standard value
which is equal to 6 is a random variable with the following density function:

f(x) = { −(1/2)(2x − 3x²) ; −1 < x < 1
         0 ; otherwise }
Obtain the expected value of the difference between the uric acid and its
standard value for the group of men.
Solution.
E(X) = ∫_(−∞)^(+∞) x f(x) dx = ∫_(−1)^1 −(1/2) x(2x − 3x²) dx = −(1/2)(2x³/3 − 3x⁴/4)|_(−1)^1 = −2/3
Note that the above calculation means that the average value of the uric acid for the group of men is 2/3 less than its standard value (6).

Example 2.5

Experience has shown that the time interval between two consecutive customers entering a store follows a random variable with the density function
f(x) = { 2e^(−2x) ; x ≥ 0
         0 ; otherwise }
Obtain the expected value of this random variable.
Solution.
E(X) = ∫_(−∞)^(+∞) x f(x) dx = ∫_0^(+∞) x · 2e^(−2x) dx
Now, using integration by parts, we have:
E(X) = −x e^(−2x)|_0^∞ + ∫_0^∞ e^(−2x) dx = 0 + 1/2 = 1/2
(Note that using L'Hôpital's rule leads to lim_(x→∞) x e^(−2x) = lim_(x→∞) x/e^(2x) = lim_(x→∞) 1/(2e^(2x)) = 0.)

Example 2.6

If the density function of the random variable 𝑋 is as follows:


f(x) = { cx² ; −1 < x < 2
         0 ; otherwise }
obtain its expected value.
Solution.
∫_(−1)^2 f(x) dx = 1 ⇒ ∫_(−1)^2 cx² dx = 1 ⇒ c(x³/3)|_(−1)^2 = 1 ⇒ c(8 − (−1))/3 = 1 ⇒ c = 1/3
E(X) = ∫_(−1)^2 x f(x) dx = ∫_(−1)^2 (1/3)x³ dx = (1/3)(x⁴/4)|_(−1)^2 = (1/3)(16 − 1)/4 = 5/4
To calculate the expected value of a mixed random variable, we use the formula Σ_x x P(X = x) for the discrete values and ∫_x x f(x) dx for the continuous values, and we finally add up the results.

Example 2.7

Suppose that the waiting time of cars behind a stoplight (measured in seconds) is a mixed random variable. The cars encounter a green light with probability 1/2, in which case their waiting time behind the stoplight equals zero; otherwise, their waiting time follows the density function f(x) = 1/60 ; 0 < x < 30. Calculate the expected value of the waiting time of cars behind the stoplight.

Solution. If the random variable associated with the waiting time of cars behind the
stoplight is denoted by 𝑋, we have:
P(x) = 1/2 ; x = 0
f(x) = 1/60 ; 0 < x < 30

Now, we use the formula Σ_x x P(X = x) for the discrete values and ∫_x x f(x) dx for the continuous values, and we finally add up the results. Therefore, we have:
E(X) = Σ_(discrete x) x P(X = x) + ∫_(continuous x) x f(x) dx = 0 × 1/2 + ∫_0^30 x(1/60) dx = x²/(2 × 60)|_0^30 = 7.5
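The same two-part recipe translates directly into code. The sketch below (illustrative Python, not part of the original text; SciPy's quad is assumed for the integral) adds the discrete term to a numerical integral of the continuous part.

    from scipy.integrate import quad

    # Discrete part: P(X = 0) = 1/2 contributes 0 * 1/2 = 0.
    discrete_part = 0 * 0.5
    # Continuous part: integral of x * f(x) with f(x) = 1/60 on (0, 30).
    continuous_part, _ = quad(lambda x: x / 60, 0, 30)
    print(discrete_part + continuous_part)   # 7.5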

As mentioned in Section 5.1 of this chapter, the expected value is simply a weighted average of the random variable's possible values, in which the weight of each value equals its probability. Therefore, the expected value possesses all the properties of a weighted average. For instance:
➢ If the random variable X takes on different real values x₁, x₂, …, x_m with respective nonzero probabilities p₁, p₂, …, p_m, its expected value lies between Min{x₁, x₂, …, x_m} and Max{x₁, x₂, …, x_m}.
➢ If the probability function of the random variable X is symmetric about a point a, the expected value of X, if it exists, is equal to a.
For example, as seen in Example 2.1 of this chapter, the probability function associated with the sum of the upturned faces of two rolled dice is symmetric about the number 7. Hence, its expected value, or probabilistic center of gravity, equals 7. However, it should be noted that it is possible that the expected value of a random variable does not exist; note the following example.

Example 3.1

If the density function of the random variable X is f_X(x) = 1/(1 + x)² ; x > 0, obtain its expected value.
Solution.
E(X) = ∫_0^(+∞) x f(x) dx = ∫_0^(+∞) x/(1 + x)² dx = ∫_0^(+∞) (1 + x − 1)/(1 + x)² dx
= ∫_0^(+∞) 1/(1 + x) dx − ∫_0^(+∞) 1/(1 + x)² dx = ln(1 + x)|_0^∞ + 1/(1 + x)|_0^∞
The term 1/(1 + x)|_0^∞ is equal to −1, but the term ln(1 + x)|_0^∞ diverges. In such cases, it is said that the expected value of the random variable X does not exist.

Example 3.2

Obtain the expected value of the random variable X with the probability function P(X = x) = 1/x − 1/(x + 1) ; x = 1, 2, 3, … .

Solution.
P(X = x) = 1/x − 1/(x + 1) = 1/(x(x + 1)) ; x = 1, 2, 3, …
E(X) = Σ_(x=1)^∞ x P(X = x) = Σ_(x=1)^∞ x · 1/(x(x + 1)) = Σ_(x=1)^∞ 1/(x + 1)
where the term Σ_(x=1)^∞ 1/(x + 1) diverges because:
Σ_(x=1)^∞ 1/(x + 1) = (1/2 + 1/3 + ⋯ + 1/10) + (1/11 + 1/12 + ⋯ + 1/100) + (1/101 + 1/102 + ⋯ + 1/1000) + ⋯
In the above expression, all the terms in the first group are greater than or equal to 1/10; therefore, the value of this group exceeds 9 × 1/10 = 0.9. Moreover, in the second group, all terms are greater than or equal to 1/100; hence, the value of this group exceeds 90 × 1/100 = 0.9. Likewise, it can be shown that the value of each group exceeds 0.9. Therefore, the above series diverges and the expected value of this random variable does not exist.

Example 3.3

If the density function of the random variable X is given by f(x) = (1/π) × 1/(1 + x²) ; −∞ < x < +∞, then obtain its expected value.

Solution. Even though the density function of this random variable is symmetric about zero, it can be shown that its expected value is undefined.
E(X) = ∫_(−∞)^∞ x f(x) dx = ∫_(−∞)^∞ x (1/π) 1/(1 + x²) dx = (1/π) ∫_(−∞)^∞ x/(1 + x²) dx = (1/π)(1/2) ln(1 + x²)|_(−∞)^∞
It should be noted that the above expression diverges both over the interval (−∞, 0] and over the interval (0, ∞). In these conditions, we say that the expected value of this random variable is undefined. In general, the expected value of a random variable exists if the expectation of the absolute value of X, i.e., E(|X|), exists (does not diverge). For instance, the expectation of the absolute value of X in this example is as follows:
E(|X|) = ∫_x |x| f(x) dx = 2 ∫_0^∞ x f(x) dx = 2 ∫_0^∞ x (1/π) 1/(1 + x²) dx = (2/π) ∫_0^∞ x/(1 + x²) dx = (1/π) ln(1 + x²)|_0^∞
which diverges. As a result, the expected value of the random variable X does not exist.
Therefore, when the density function or probability function of a random variable is symmetric about a point a, if the expected value of this random variable exists, it is equal to a.
Proposition 3-1
For a random variable X taking on nonnegative integer values, it can be shown that:
E(X) = Σ_(i=1)^∞ P(X ≥ i) = Σ_(i=0)^∞ P(X > i) = Σ_(i=0)^∞ (1 − F_X(i))
Proof. The term Σ_(i=1)^∞ P(X ≥ i) is expanded as follows:
Σ_(i=1)^∞ P(X ≥ i) = P(X ≥ 1) + P(X ≥ 2) + P(X ≥ 3) + ⋯
= P(X = 1) + P(X = 2) + P(X = 3) + ⋯
            + P(X = 2) + P(X = 3) + ⋯
                        + P(X = 3) + ⋯
                                   + ⋯
= 1 × P(X = 1) + 2 × P(X = 2) + 3 × P(X = 3) + ⋯ = Σ_(i=1)^∞ i × P(X = i)
The above expression is the expected value of a discrete random variable taking on nonnegative integer values. In addition, Σ_(i=1)^∞ P(X ≥ i) can be written as follows:
Σ_(i=1)^∞ P(X ≥ i) = P(X ≥ 1) + P(X ≥ 2) + P(X ≥ 3) + ⋯ = P(X > 0) + P(X > 1) + P(X > 2) + ⋯ = Σ_(i=0)^∞ P(X > i) = Σ_(i=0)^∞ (1 − F_X(i))

Example 3.4

Suppose that there are 10 marbles numbered 1 through 10 in an urn. We select two marbles randomly and with replacement from the urn. If we define the random variable X to be the smaller of the two selected numbers, obtain its expected value.
Solution. Suppose that the random variable X denotes the smaller selected number. Under these conditions, its expected value can be calculated by the method of Proposition 3-1 as follows:
E(X) = Σ_(i=1)^∞ P(X ≥ i) = P(X ≥ 1) + P(X ≥ 2) + P(X ≥ 3) + ⋯ + P(X ≥ 10) + P(X ≥ 11) + ⋯
= (10/10)² + (9/10)² + (8/10)² + ⋯ + (1/10)² + 0 = (10 × 11 × 21)/(6 × 10²) = 77/20
Namely, P(X ≥ 3) means that both of the two choices are greater than or equal to 3, which has probability (8/10)².
Note that the following equation is used to solve this example:
Σ_(i=1)^n i² = n(n + 1)(2n + 1)/6
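Proposition 3-1 can also be verified directly for this example. The sketch below (illustrative Python, not part of the original text) computes E(X) both from the tail probabilities P(X ≥ i) = ((11 − i)/10)² and from the probability function recovered as their successive differences.

    def tail(i):
        # P(X >= i): both draws are at least i, i.e., ((11 - i)/10)^2
        return ((11 - i) / 10) ** 2 if 1 <= i <= 10 else 0.0

    E_tail = sum(tail(i) for i in range(1, 11))
    E_pmf = sum(i * (tail(i) - tail(i + 1)) for i in range(1, 11))
    print(E_tail, E_pmf, 77 / 20)   # all equal 3.85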

Proposition 3-2
For a random variable X taking on integer values, it can be shown that:
E(X) = Σ_(i=1)^∞ P(X ≥ i) − Σ_(i=−∞)^(−1) P(X ≤ i) = Σ_(i=0)^∞ P(X > i) − Σ_(i=−∞)^(−1) P(X ≤ i)
= Σ_(i=0)^∞ (1 − F_X(i)) − Σ_(i=−∞)^(−1) F_X(i)
Proof. Similar to the proof of Proposition 3-1, for the nonnegative values it can be shown that:
Σ_(i=1)^∞ P(X ≥ i) = P(X ≥ 1) + P(X ≥ 2) + P(X ≥ 3) + ⋯
= 1 × P(X = 1) + 2 × P(X = 2) + 3 × P(X = 3) + ⋯ = Σ_(i=1)^∞ i × P(X = i)
Likewise, for the negative values, we have:
−Σ_(i=−∞)^(−1) P(X ≤ i) = −[P(X = −1) + P(X = −2) + P(X = −3) + ⋯] − [P(X = −2) + P(X = −3) + ⋯] − [P(X = −3) + ⋯] − ⋯
= (−1) × P(X = −1) + (−2) × P(X = −2) + (−3) × P(X = −3) + ⋯ = Σ_(i=−∞)^(−1) i × P(X = i)
The sum of the above two expressions is equal to the expected value of the random variable X.

Proposition 3-3: For a nonnegative random variable X, we have:
E(X) = ∫_0^∞ P(X > x) dx = ∫_0^∞ (1 − F_X(x)) dx
Proof. For a continuous random variable X, the proof is as follows:
∫_0^∞ P(X > x) dx = ∫_0^∞ ∫_x^∞ f_X(y) dy dx
Now, changing the order of integration results in:
∫_0^∞ P(X > x) dx = ∫_0^∞ (∫_0^y dx) f_X(y) dy = ∫_0^∞ y f_X(y) dy = E(X)
Note that since the integral of the difference of two functions over a given region denotes the area between those two functions, it can be shown that the expected value of a nonnegative random variable equals the area between g₁(x) = 1 and g₂(x) = F_X(x). Namely, if the cumulative distribution function of a variable is as follows, then the expected value of the variable is equal to the area of region A.

Figure 5-1 The expected value of a nonnegative variable as the area between g₁(x) = 1 and g₂(x) = F_X(x)

Example 3.5

Obtain the expected value of Example 2.5 in this chapter using the above-
mentioned note.
Solution.
P(X > x) = ∫_x^∞ 2e^(−2t) dt = −e^(−2t)|_x^∞ = 0 − (−e^(−2x)) = e^(−2x)
⇒ E(X) = ∫_0^∞ P(X > x) dx = ∫_0^∞ e^(−2x) dx = −(1/2)e^(−2x)|_0^∞ = (1/2)(0 − (−1)) = 1/2
In addition, for all kinds of random variables, we have:
E(X) = ∫_0^∞ P(X > x) dx − ∫_(−∞)^0 P(X ≤ x) dx = ∫_0^∞ (1 − F_X(x)) dx − ∫_(−∞)^0 F_X(x) dx

For instance, if the cumulative distribution function of a variable is as follows, the expected value of this variable is equal to the area of region A minus the area of region B.


Figure 5-2 The expected value of a random variable as the area of region A minus the
area of region B

It should be noted that Proposition 3-3 and its generalization are valid not only for continuous random variables; it can be shown that they also hold for discrete and mixed random variables.

Example 3.6

If the cumulative distribution function of the random variable X is given by:
F_X(x) = { 0 ; x < 0
           x²/32 ; 0 ≤ x < 4
           1 ; x ≥ 4 }
obtain its expected value.
Solution. Using Proposition 3-3, the expected value of this nonnegative random variable can be calculated as follows:
E(X) = ∫_0^∞ P(X > x) dx = ∫_0^∞ (1 − F_X(x)) dx = ∫_0^4 (1 − x²/32) dx + ∫_4^∞ (1 − 1) dx = (x − x³/96)|_0^4 = 4 − 64/96 = 10/3
However, it should be noted that this function is the cumulative distribution function of a mixed random variable that takes on the values within the interval 0 ≤ x < 4 continuously but has a jump at the point 4. Therefore, we have:
P(X = 4) = P(X ≤ 4) − P(X < 4) = 1 − 16/32 = 1/2
f_X(x) = x/16 ; 0 ≤ x < 4
Now, we use the expression Σ_x x P(X = x) for the discrete values and ∫_x x f(x) dx for the continuous values, and we finally add up the results. Hence, the expected value of this variable can be obtained as follows:
E(X) = (1/2) × 4 + ∫_0^4 x (x/16) dx = 2 + x³/48|_0^4 = 2 + 4/3 = 10/3
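Both routes can be reproduced numerically. The sketch below (illustrative Python, not part of the original text; SciPy's quad is assumed) integrates the tail 1 − F_X(x) and, separately, adds the jump contribution at x = 4 to the continuous part.

    from scipy.integrate import quad

    F = lambda x: 0.0 if x < 0 else (x ** 2 / 32 if x < 4 else 1.0)

    # Route 1: E(X) = integral of (1 - F(x)); the tail vanishes beyond 4.
    tail_integral, _ = quad(lambda x: 1 - F(x), 0, 4)

    # Route 2: jump of size 1/2 at x = 4 plus the density x/16 on [0, 4).
    jump_part = 4 * 0.5
    cont_part, _ = quad(lambda x: x * (x / 16), 0, 4)

    print(tail_integral, jump_part + cont_part, 10 / 3)   # all ~3.3333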

Suppose that the random variable X denotes the daily production amount of a factory. In addition, we know that the daily expenses of the factory are a function of the daily production amount. In these conditions, we are usually interested in analyzing the daily expenses. For example, we might be interested in computing the expected value of the daily expenses. In this case, we would like to compute the expected value of a function of the random variable X. Consider the following example:

Example 4.1

If the random variable X takes on each of the values 1, 0, and −1 with probability 1/3 and the function Y = X² is defined, obtain the expected value of the random variable Y.

Solution. One possible way to calculate the expected value of Y is to obtain its probability function. In fact, since X is a random variable and Y is a function of X, we can conclude that Y is a random variable as well. Furthermore, we can obtain its expected value by calculating its probability function. The probability function of the random variable Y is calculated as follows:
P(Y = 0) = P(X = 0) = 1/3
P(Y = 1) = P(X = 1) + P(X = −1) = 2/3
Then, for the expected value of the random variable Y, we have:
E(Y) = 1 × P(Y = 1) + 0 × P(Y = 0) = 1 × 2/3 + 0 × 1/3 = 2/3
Even though this method can always be used to compute the expected value
of any function of 𝑋, a more elegant solution is suggested in some cases to calculate
the expected value of 𝑔(𝑋). To apply this method, the following proposition is used:

Proposition 4-1
If the discrete random variable X takes on the values xᵢ with probabilities P_X(xᵢ), then for any real-valued function g we have:
E(g(X)) = Σᵢ g(xᵢ) P_X(xᵢ)
Namely, to solve the preceding example, we have:
E(Y) = E(g(X)) = E(X²) = Σ_x x² P_X(x) = (−1)² × 1/3 + 0² × 1/3 + 1² × 1/3 = 0 × 1/3 + 1 × 2/3 = 2/3
Proof. To prove the proposition, note that in the expression Σᵢ g(xᵢ) P_X(xᵢ), as in the preceding example, several values xᵢ may share the same value g(xᵢ) = yⱼ. Then, by factoring out each yⱼ and adding the corresponding probabilities P_X(xᵢ), we have:
Σᵢ g(xᵢ) P_X(xᵢ) = Σⱼ Σ_(i: g(xᵢ)=yⱼ) g(xᵢ) P_X(xᵢ) = Σⱼ yⱼ Σ_(i: g(xᵢ)=yⱼ) P_X(xᵢ) = Σⱼ yⱼ P(g(X) = yⱼ) = E(Y)

Example 4.2

If 𝑋 denotes the outcome of rolling a fair die, obtain the expected value of 𝑋 2 .
Solution.
E(X²) = Σ_(x=1)^6 x² P(X = x) = 1² × 1/6 + 2² × 1/6 + 3² × 1/6 + 4² × 1/6 + 5² × 1/6 + 6² × 1/6 = 91/6
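Proposition 4-1 says the distribution of Y = g(X) itself is never needed. The sketch below (illustrative Python, not part of the original text) computes E(X²) for a fair die directly from the probability function of X and contrasts it with [E(X)]².

    pmf = {x: 1 / 6 for x in range(1, 7)}    # fair die
    E_X2 = sum(x ** 2 * p for x, p in pmf.items())
    E_X = sum(x * p for x, p in pmf.items())
    print(E_X2, 91 / 6)      # 15.1666... both ways
    print(E_X ** 2, 49 / 4)  # 12.25, strictly smaller than E(X^2)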

Note that if X denotes the outcome of rolling a fair die, E(X²) = 91/6 and [E(X)]² = (7/2)² = 49/4 are not equal. We will show in this chapter that for any random variable X, the inequality E(X²) ≥ [E(X)]² is valid.

Example 4.3

If the probability function of the random variable X is given by:
P(X = x) = { e^(−1)/x! ; x = 0, 1, 2, …
             0 ; otherwise }
then obtain E[1/(1 + X)].
Solution.
E(1/(1 + X)) = Σ_(x=0)^∞ (1/(1 + x)) P_X(x) = Σ_(x=0)^∞ (1/(1 + x)) e^(−1)/x! = Σ_(x=0)^∞ e^(−1)/(x + 1)! = e^(−1) Σ_(j=1)^∞ 1/j! = e^(−1)(e − 1) = 1 − e^(−1)
To solve this example, the following equation is used:
Σ_(i=0)^∞ aⁱ/i! = e^a
where substituting a = 1 leads to:
Σ_(i=0)^∞ 1/i! = e

If 𝑋 is a continuous random variable with density function 𝑓(𝑥), then for any
real-valued function like 𝑔, we have:
+∞
𝐸(𝑔(𝑋)) = ∫ 𝑔(𝑥)𝑓(𝑥)𝑑𝑥
−∞

Example 4.4

If X is a continuous random variable with density function f(x) = 1 ; 0 < x < 1, then obtain E(X^n).
Solution.
E(X^n) = ∫_0^1 x^n f_X(x) dx = ∫_0^1 x^n × 1/(1 − 0) dx = x^(n+1)/(n + 1)]_0^1 = 1/(n + 1)

Proposition 4-2
If 𝑎 and 𝑏 are constants and 𝑋 is a random variable with the expected value of 𝜇, then
we have:
𝐸 (𝑎𝑋 + 𝑏) = 𝑎𝐸(𝑋) + 𝑏 = 𝑎𝜇 + 𝑏

Proof. For the discrete case, we have:

E(aX + b) = Σ_x (ax + b) P(x) = a Σ_x x P(x) + b Σ_x P(x) = aE(X) + b = aμ + b

The proof of the continuous case is similar to the discrete case.

Proposition 4-3
If 𝑔(𝑋) and ℎ(𝑋) are the real-valued functions defined on random variable X, then we
have:
𝐸 (𝑔(𝑋) + ℎ (𝑋)) = 𝐸 (𝑔(𝑋)) + 𝐸 (ℎ(𝑋))

Proof. The proof for discrete random variables is as follows:
E(g(X) + h(X)) = Σ_x (g(x) + h(x)) P(x) = Σ_x g(x) P(x) + Σ_x h(x) P(x) = E(g(X)) + E(h(X))

Definition: The n-th moment of the random variable X about the point a is defined as follows:
E[(X − a)^n]
which is equal to Σ_x (x − a)^n P(X = x) for discrete random variables and ∫_x (x − a)^n f(x) dx for continuous random variables.

In this chapter, the expected value and some of its properties have been addressed so far. The expected value is actually the probabilistic center of gravity of a random variable, which serves as a criterion to evaluate and analyze where the random variable is concentrated. It should be noted that there are other measures for this purpose, such as the median and the mode of the random variable, stated in the preceding chapter. However, the measures of central tendency of a random variable usually do not describe all of its properties. For example, consider the following random variables. These two variables have the same mean, median, and mode, but not all of their properties are the same.
P_X(x) = { 1/4 ; x = −1
           1/2 ; x = 0
           1/4 ; x = 1 }

P_Y(y) = { 1/4 ; y = −100
           1/2 ; y = 0
           1/4 ; y = 100 }
It is obvious that for these two random variables the mean, median, and mode are all equal to zero. However, their variations are not the same. To measure the magnitude of the variations of random variables, other measures are used; some of them are addressed in the following section.

The variance of a random variable X is equal to the expectation of the squared distance of X from its mean, and it is defined as follows:
Var(X) = E[(X − E(X))²]
where, if X is a discrete random variable with mean μ, we have:
Var(X) = E[(X − μ)²] = Σ_(xᵢ) (xᵢ − μ)² P_X(xᵢ)

Example 6.1

If 𝑋 denotes the outcome of rolling a fair die, obtain the variance of 𝑋.


Solution.
Var(X) = Σ_(i=1)^6 (i − μ)² × 1/6 = Σ_(i=1)^6 (i − 7/2)² × 1/6 = [(1 − 7/2)² + (2 − 7/2)² + ⋯ + (6 − 7/2)²] × 1/6 = 35/12

Moreover, for continuous random variables, the variance is calculated as follows:
Var(X) = ∫_x (x − μ)² f(x) dx

Proposition 6-1
The variance of the random variable X can be obtained as follows:
Var(X) = E(X²) − [E(X)]²
Proof.
Var(X) = E[(X − μ)²] = E(X² − 2μX + μ²) = E(X²) − E(2μX) + E(μ²) = E(X²) − 2μE(X) + μ²
= E(X²) − 2μ² + μ² = E(X²) − μ² = E(X²) − [E(X)]²
For discrete random variables, this can be written as:
Var(X) = E(X²) − [E(X)]² = Σ_x x² P_X(x) − [Σ_x x P_X(x)]²
Moreover, for continuous random variables, it can be calculated as follows:
Var(X) = E(X²) − [E(X)]² = ∫_x x² f(x) dx − [∫_x x f(x) dx]²

Example 6.2

Obtain the variance of the outcome of rolling a fair die by using Proposition 6-1.
Solution.
Var(X) = E(X²) − [E(X)]² = Σ_x x² P_X(x) − [Σ_x x P_X(x)]²
= (1² × 1/6 + 2² × 1/6 + ⋯ + 6² × 1/6) − (1 × 1/6 + 2 × 1/6 + ⋯ + 6 × 1/6)² = 91/6 − (7/2)² = 35/12

Example 6.3

The daily demand of a product at a store follows the random variable X with density function
f(x) = { 2(x − 1) ; 1 < x < 2
         0 ; otherwise }
Obtain the variance of X.
Solution.
E(X) = ∫_1^2 2x(x − 1) dx = 2(x³/3 − x²/2)|_1^2 = 5/3
E(X²) = ∫_1^2 2x²(x − 1) dx = 2(x⁴/4 − x³/3)|_1^2 = 17/6
Var(X) = E(X²) − [E(X)]² = 17/6 − (5/3)² = 1/18
It is worth mentioning that, since the variance of a random variable cannot be negative, we have:
Var(X) ≥ 0 ⇒ E(X²) − [E(X)]² ≥ 0 ⇒ E(X²) ≥ [E(X)]²
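The same computation can be done numerically. The sketch below (illustrative Python, not part of the original text; SciPy's quad is assumed) evaluates E(X) and E(X²) for this density by numerical integration and confirms Var(X) = 1/18.

    from scipy.integrate import quad

    f = lambda x: 2 * (x - 1)                 # the density on (1, 2)
    EX, _ = quad(lambda x: x * f(x), 1, 2)
    EX2, _ = quad(lambda x: x ** 2 * f(x), 1, 2)
    print(EX, EX2, EX2 - EX ** 2, 1 / 18)     # 5/3, 17/6, ~0.0556, ~0.0556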

Proposition 6-2
If a and b are constants and X is a random variable with expected value μ and variance σ², then we have:
Var(aX + b) = a² Var(X)
Proof. Let Y = aX + b. Then:
Var(Y) = E[(Y − E(Y))²] = E[(aX + b − (aμ + b))²] = E[(aX − aμ)²] = E[a²(X − μ)²] = a² Var(X) = a²σ²

To evaluate the dispersion, there is another criterion known as the standard deviation (SD) of the random variable X, which is simply the square root of the variance and is expressed as follows:
SD(X) = √Var(X)
A closer look shows that the unit of the standard deviation equals the unit of X, which is the main advantage of the standard deviation over the variance. For instance, if the unit of X is dollars, then the unit of the variance of X is dollars squared; however, the unit of the standard deviation of X is dollars.
Proposition 6-3 If 𝑋 is 𝑎 random variable, then we have:
𝑆𝐷(𝑎𝑋 + 𝑏) = |𝑎|. 𝑆𝐷(𝑋)

Proof.

𝑆𝐷(𝑎𝑋 + 𝑏) = √𝑉𝑎𝑟(𝑎𝑋 + 𝑏) = √𝑎2 𝑉𝑎𝑟(𝑋) = |𝑎|√𝑉𝑎𝑟(𝑋) = |𝑎|. 𝑆𝐷 (𝑋)

A nother criterion which can be used to calculate the dispersion of a random


variable is the expected distance of 𝑋 from the mean, defined as follows:

𝐸( |𝑋 − 𝜇| ) = ∑ |𝑥 − 𝜇 | 𝑃(𝑥)
𝑥

But this criterion is not that appropriate, and it is rarely used in the
probability theory and statistics.

A t the end of the topic of analyzing the features of random variables, it should
be noted that there are other measures to analyze the shape of a distribution
in addition to the central and dispersion measures, such as
skewness and kurtosis of the distribution.
𝐸 [(𝑋 − 𝜇)3 ] 𝑋−𝜇 3
𝑏1 = = 𝐸 [( ) ]
𝜎3 𝜎
The skewness is an appropriate measure to investigate the asymmetry of the
distribution. When the distribution is asymmetric, its skewness magnitude is equal
to zero (𝑏1 = 0). If the skewness magnitude for a distribution becomes positive, it
means that the distribution has been skewed to the right side(𝑏1 > 0). On the other
hand, if it becomes negative, it means that the distribution has been skewed to the
left side (𝑏1 < 0).

231 | P a g e
Figure 5-3 The effect of skewness on the shape of a random variable

Besides, the measure of kurtosis represents how acute the peak of the
distribution is, which is defined as follows:
𝐸[(𝑋 − 𝜇)4 ] 𝑋−𝜇 4
𝐾𝑢𝑟𝑡𝑜𝑠𝑖𝑠 = 4
= 𝐸[( ) ]
𝜎 𝜎

more acute
distribution

Figure 5-4 The effect of kurtosis on the shape of a random variable

In a distribution, as the peak becomes more acute and the tail becomes flatter,
its acuteness increases.

I f 𝑋 is a random variable with finite mean 𝜇 and finite variance 𝜎 2 , and it is feasible
to differentiate a function like 𝑔 twice, then:
𝑔″ (𝜇) 2
𝐸[𝑔(𝑋)] ≈ 𝑔(𝜇) + 𝜎
2
Proof. According to the Taylor series of function 𝑔(𝑋) about 𝜇, we have:


(𝑋 − 𝜇)2

𝑔(𝑋) ≈ 𝑔(𝜇) + 𝑔 (𝜇)(𝑋 − 𝜇) + 𝑔 (𝜇) +⋯
2
(𝑋 − 𝜇)2
≈ 𝑔(𝜇) + 𝑔′ (𝜇)(𝑋 − 𝜇) + 𝑔″ (𝜇)
2

232 | P a g e
Now, taking expectations of the expressions above, we have:
(𝑥 − 𝜇)2
𝐸(𝑔(𝑋)) ≈ 𝐸(𝑔(𝜇) + 𝑔′ (𝜇)(𝑋 − 𝜇) + 𝑔″ (𝜇) )
2
(𝑋 − 𝜇)2 𝜎2
= 𝑔(𝜇) + 𝑔′ (𝜇)𝐸(𝑋 − 𝜇) + 𝑔″ (𝜇)𝐸( ) = 𝑔(𝜇) + 0 + 𝑔″ (𝜇)
2 2

𝜎2
= 𝑔(𝜇) + 𝑔 (𝜇)
2

Example 8.1

Suppose that the mean and variance of the random variable 𝑋 are equal to 𝜇
and 𝜎 2 , respectively, where 𝜇 and 𝜎 2 are not equal to zero. If so, obtain the
1
approximate expected value of 𝑌 = 𝑋.

Solution.
1
𝑔(𝑋) == 𝑋 −1 ⇒ 𝑔′ (𝑋) = −𝑋 −2 ⇒ 𝑔″ (𝑋) = 2𝑋 −3
𝑋
𝑔″ (𝜇) 2 1 1 2 2 1 𝜎2
𝐸[𝑔(𝑋)] ≈ 𝑔(𝜇) + 𝜎 ⇒ 𝐸( ) ≈ + 3𝜎 = + 3
2 𝑋 𝜇 2𝜇 𝜇 𝜇

M oment generating function of the random variable 𝑋 is denoted by 𝑀𝑋 (𝑡).


Provided that it exists for real values of 𝑡, it is defined as follows:
𝑀𝑋 (𝑡) = 𝐸(𝑒 𝑡𝑋 )
It is defined for discrete random variables as follows:

∑ 𝑒 𝑡𝑥 𝑃𝑥 (𝑥)
𝑥

Also, it is defined for continuous random variables as follows:

∫𝑒 𝑡𝑥 𝑓𝑥 (𝑥)𝑑𝑥
𝑥

233 | P a g e
Note that the moment generating function of a random variable exists
whenever the value of 𝐸(𝑒 𝑡𝑋 ) is finite for some real values of 𝑡 and its value is
convergent in the neighborhood of 𝑡 = 0.
Moreover, there are two important points in the above definitions. Firstly, the
moment generating function of a random variable cannot take on negative value.
Secondly, the value of moment generating function in the neighborhood of 𝑡 = 0 is
equal to 1. That is, we have:

𝑀𝑥 (𝑡 = 0) = 𝐸(𝑒 𝑡𝑥 ) | =1
𝑡=0
The Taylor series of function 𝑒 𝑡𝑋 is equal to:
(𝑡𝑋)2 (𝑡𝑋)3 (𝑡𝑋)4
𝑒 𝑡𝑋 = 1 + 𝑡𝑋 + + + +⋯
2! 3! 4!
Therefore, the moment generating function of the random variable 𝑋 is equal
to:
(𝑡𝑋)2 (𝑡𝑋)3 𝑡2 𝑡3
𝐸(𝑒 𝑡𝑋 ) = 𝐸[1 + 𝑡𝑋 + + +. . . ] = 1 + 𝑡𝐸(𝑋) + 𝐸(𝑋 2 ) + 𝐸(𝑋 3 ) + ⋯
2! 3! 2! 3!
Hence, it can simply be shown that if we differentiate the moment generating
function n times with respect to 𝑡 and then let 𝑡 = 0, then the value of 𝐸(𝑋 𝑛 ) or the
𝑛𝑡ℎ moment of random variable 𝑋 about zero is obtained.
In other words, if we differentiate the moment generating function of the discrete
random variables which is equal to ∑𝑥 𝑒 𝑡𝑥 𝑃𝑥 (𝑥), or the continuous random
+∞
variables, which is equal to ∫−∞ 𝑒 𝑡𝑥 𝑓𝑥 (𝑥)𝑑𝑥, once with respect to 𝑡, the respective
terms ∑𝑥 𝑥𝑒 𝑡𝑥 𝑃𝑥 (𝑥) and ∫𝑥 𝑥𝑒 𝑡𝑥 𝑓𝑥 (𝑥)𝑑𝑥 are obtained. That is;

𝑑𝑀𝑥 (𝑡) 𝑑𝐸(𝑒 𝑡𝑋 )


= = 𝐸(𝑋𝑒 𝑡𝑋 )
𝑑𝑡 𝑑𝑡
Likewise, if we differentiate the moment generating function of the random
variable 𝑋 𝑘 times with respect to 𝑡, the following term is obtained:
𝑑 𝑘 𝑀𝑥 (𝑡) 𝑑 𝑘 𝐸(𝑒 𝑡𝑋 )
= = 𝐸(𝑋 𝑘 𝑒 𝑡𝑋 )
𝑑𝑡𝑘 𝑑𝑡𝑘
Now, if we let 𝑡 = 0 in the above term, term 𝐸(𝑋 𝑘 ) or the 𝑘 𝑡ℎ moment of random
variable 𝑋 about zero is obtained. The reason for calling the function as the moment

234 | P a g e
generating function is that by successively differentiating moment of this function
with respect to 𝑡 and then letting 𝑡 = 0, the 𝑘 𝑡ℎ moment of this function about the
origin is obtained.

Example 9.1

Suppose that the discrete random variable 𝑋 has the following probability
function. Obtain the moment generating function and its first and second moments.
𝑒 −1
𝑃(𝑥) = ; 𝑥 = 0,1,2, …
𝑥!
Solution. According to the definition of the moment generating function, we have:
∞ ∞
𝑡𝑋 𝑡𝑥
𝑒 −1 (𝑒 𝑡 )𝑥 𝑡 𝑡
𝑀𝑥 (𝑡) = 𝐸(𝑒 ) = ∑ 𝑒 = 𝑒 −1 ∑ = 𝑒 −1 × 𝑒 𝑒 = 𝑒 (𝑒 −1)
𝑥! 𝑥!
𝑥=0 𝑥=0

Now, if we differentiate the moment generating function of this random


variable 𝑘 times with respect to 𝑡 and then let 𝑡 = 0, term 𝐸(𝑒 𝑡𝑋 ) is obtained.
Therefore, we have:
𝑡 −1)
𝑀𝑥′ (𝑡) = 𝑒 𝑡 𝑒 (𝑒 ⇒ 𝐸(𝑋) = 𝑀𝑥′ (𝑡) | =1
𝑡=0
𝑡 −1) 𝑡 −1)
𝑀𝑥″ (𝑡) = 𝑒 𝑡 𝑒 (𝑒 + (𝑒 𝑡 )2 𝑒 (𝑒 ⇒ 𝐸(𝑋 2 ) = 𝑀𝑥″ (𝑡) | =1+1
𝑡=0

Example 9.2

Suppose that the discrete random variable 𝑋 has the following probability
function. Obtain the moment generating function and its first and second moments.

235 | P a g e
𝑛
( )
𝑃(𝑥) = 𝑥𝑛 ; 𝑥 = 0,1,2, … , 𝑛
2
Solution. According to the definition of the moment generating function, we have:
𝑛𝑛 𝑛 𝑛
( ) 1 1 1
𝑡𝑋 𝑡𝑥 𝑥 𝑛 𝑡𝑥 𝑛
𝑀𝑥 (𝑡) = 𝐸(𝑒 ) = ∑ 𝑒 𝑛
= 𝑛 ∑ ( ) 𝑒 = 𝑛 ∑ ( ) 𝑒 𝑡𝑥 1𝑛−𝑥 = 𝑛 (𝑒 𝑡 + 1)𝑛
2 2 𝑥 2 𝑥 2
𝑥=0 𝑥=0 𝑥=0

where the last equality is obtained by using the binomial expansion or


∑𝑛𝑥=0 (𝑛) 𝑎 𝑥 𝑏 𝑛−𝑥 = (𝑎 + 𝑏)𝑛 .
𝑥
Now, if we differentiate the moment generating function of this random
variable 𝑘 times with respect to 𝑡 and then let 𝑡 = 0, term 𝐸(𝑒 𝑡𝑋 ) is obtained.
Therefore, we have:
1 𝑛
𝑀𝑥′ (𝑡) = 𝑛
𝑛𝑒 𝑡 (𝑒 𝑡 + 1)𝑛−1 ⇒ 𝐸(𝑋) = 𝑀𝑥′ (𝑡) | =
2 𝑡=0 2
1 1 𝑛 𝑛(𝑛 − 1)
𝑀𝑥″ (𝑡) = 𝑛𝑒 𝑡 𝑡
(𝑒 + 1)𝑛−1
+ 𝑛𝑒 𝑡
(𝑛 − 1)𝑒 𝑡 𝑡
(𝑒 + 1)𝑛−2
⇒ 𝐸(𝑋 2
) = 𝑀𝑥
″ (𝑡)
| = +
2𝑛 2𝑛 𝑡=0 2 4

Example 9.3

Suppose that continuous random variable 𝑋 has the following density function.
Obtain the moment generating function and its 𝑘 𝑡ℎ moment.
𝑓(𝑥) = 2𝑒 −2𝑥 ; 𝑥 > 0
Solution. According to the definition of the moment generating function, we have:

∞ ∞
2
𝑀𝑥 (𝑡) = 𝐸(𝑒 𝑡𝑋 ) = ∫ 𝑒 𝑡𝑥 2𝑒 −2𝑥 𝑑𝑥 = 2 ∫ 𝑒 −(2−𝑡)𝑥 𝑑𝑥 = ;𝑡 < 2
0 0 2−𝑡
2 2 1
𝑀𝑥′ (𝑡) = ⇒ 𝐸(𝑋) = 𝑀𝑥′ (𝑡) | = =
(2 − 𝑡)2 𝑡 = 0 22 2

236 | P a g e
2×2 2×2 2
𝑀𝑥″ (𝑡) = 3
⇒ 𝐸(𝑋 2 ) = 𝑀𝑥″ (𝑡) | = 3 = 2
(2 − 𝑡) 𝑡=0 2 2
3×2×2 3! × 2 3!
𝑀𝑥‴ (𝑡) = ⇒ 𝐸(𝑋 3 ) = 𝑀𝑥‴ (𝑡) | = = 3
(2 − 𝑡)4 𝑡=0 24 2

Likewise, in this example, it can be shown that:


𝑑 𝑘 𝑀𝑥 (𝑡) 𝑘! × 2 𝑘!
𝐸(𝑋 𝑘 ) = 𝑘
| = 𝑘+1
| = 𝑘
𝑑𝑡 𝑡 = 0 (2 − 𝑡) 𝑡=0 2

Proposition 9-1
If we call 𝑀𝑋 (𝑡) the moment generating function of the random variable 𝑋, then the
given constants 𝑎 and 𝑏, the moment generating function of the random variable 𝑌 =
𝑎𝑋 + 𝑏 is equal to:

𝑀𝑌 (𝑡) = 𝑀𝑎𝑥+𝑏 (𝑡) = 𝑀𝑥 (𝑎𝑡) 𝑒 𝑡𝑏

Proof.
𝑀𝑎𝑥+𝑏 (𝑡) = 𝐸(𝑒 𝑡(𝑎𝑋+𝑏) ) = 𝐸(𝑒 𝑡𝑏 𝑒 (𝑎𝑡)𝑋 ) = 𝑒 𝑡𝑏 𝑀𝑥 (𝑎𝑡)
For example, if the moment generating function of the random variable 𝑋 is
2
𝑀𝑋 (𝑡) = 2−𝑡 ; 𝑡 < 2, and the random variable 𝑌 is defined as 𝑌 = 𝑋 − 1, then for the
moment generating function of 𝑌, we have:
𝑒𝑡 1
𝑌 = 𝑋 − 1 ⇒ 𝑀𝑌 (𝑡) = 𝑀𝑥 (𝑡)𝑒 −𝑡 = 𝑡
𝑒 −𝑡 =
2−𝑒 2 − 𝑒𝑡
It can be shown that the moment generating function possesses a one-to-
one correspondence with the respective random variable distribution. In other
words, if the moment generating function of any random variable exists, it is
unique. Namely, in Example 9.3, the moment generating function of random
2
variable 𝑋 with density function 𝑓(𝑥) = 2𝑒 −2𝑥 ; 𝑥 > 0 is equal to 𝑀𝑋 (𝑡) = 2−𝑡 ; 𝑡 < 2
. In such cases, if the moment generating function of a random variable is equal to
that of this problem's random variable, its density function is certainly equal to the
density function of the random variable in this problem.

237 | P a g e
F actorial moment generating function of random variable 𝑋 in case of existence as
follows for real values of 𝑡:
𝜋𝑥 (𝑡) = 𝐸(𝑡 𝑋 )

Which is defined for discrete random variables as follows:

∑ 𝑡 𝑥 𝑃𝑥 (𝑥)
𝑥

And it is defined for continuous random variables as follows:


+∞
∫ 𝑡 𝑥 𝑓𝑥 (𝑥)𝑑𝑥
−∞

If we differentiate the factorial moment generating function 𝑘 times with


respect to 𝑡 and then let 𝑡 = 1, the 𝑘 𝑡ℎ factorial moment generating function is
obtained as follows:
𝑑𝑘 𝐸(𝑡 𝑥 )
| = 𝐸[𝑋(𝑋 − 1)(𝑋 − 2). . . (𝑋 − 𝑘 + 1)𝑡 (𝑥−𝑘) ] |
𝑑𝑡𝑘 𝑡=1 𝑡=1
𝑋!
= 𝐸[𝑋(𝑋 − 1)(𝑋 − 2). . . (𝑋 − 𝑘 + 1)] = 𝐸 [ ]
(𝑋 − 𝑘)!
𝑋!
Note that expression 𝐸[𝑋(𝑋 − 1)(𝑋 − 2). . . (𝑋 − 𝑘 + 1)] = 𝐸 [(𝑋−𝑘)!] is known as
the 𝑘 𝑡ℎ factorial moment generating function.
For example, for the first, second, and third factorial moment generating
functions, we have:
𝑑𝐸(𝑡 𝑥 )
| = 𝐸[𝑋]
𝑑𝑡 𝑡=1
𝑑 2 𝐸(𝑡 𝑥 )
| = 𝐸[𝑋(𝑋 − 1)]
𝑑𝑡 2 𝑡 = 1
𝑑 3 𝐸(𝑡 𝑥 )
| = 𝐸[𝑋(𝑋 − 1)(𝑋 − 2)]
𝑑𝑡 3 𝑡 = 1

238 | P a g e
Example 10.1

Suppose that the discrete random variable 𝑋 has the following probability
function:
𝑛
( )
𝑃(𝑋 = 𝑥) = 𝑥𝑛 ; 𝑥 = 0,1,2, . . . , 𝑛
2
Obtain its factorial moment generating function and the first and second
factorial moment generating functions.
Solution.
𝑛 𝑛 𝑛 𝑛
( ) 1 1 1
𝑋 𝑋 𝑥 𝑛 𝑋 𝑛
𝜋𝑥 (𝑡) = 𝐸(𝑡 ) = ∑ 𝑡 𝑛
= 𝑛 ∑ ( ) 𝑡 = 𝑛 ∑ ( ) 𝑡 𝑋 1𝑛−𝑥 = 𝑛 (𝑡 + 1)𝑛
2 2 𝑥 2 𝑥 2
𝑥=0 𝑥=0 𝑥=0

where the last equality is obtained by using the binomial series or


𝑛
∑𝑛𝑥=0 ( ) 𝑎 𝑥 𝑏 𝑛−𝑥 = (𝑎 + 𝑏)𝑛 .
𝑥

𝑑𝐸(𝑡 𝑥 ) 1 𝑛
𝐸(𝑋) = | = ( )𝑛 × 𝑛(𝑡 + 1)𝑛−1 | =
𝑑𝑡 𝑡=1 2 𝑡=1 2
𝑋! 𝑑 2 𝐸(𝑡 𝑥 ) 1 𝑛(𝑛 − 1)
𝐸(𝑋(𝑋 − 1)) = 𝐸( )= 2
| = ( )𝑛 𝑛(𝑛 − 1)(𝑡 + 1)𝑛−2 | =
(𝑋 − 2)! 𝑑𝑡 𝑡=1 2 𝑡=1 4

Note that since the moment generating function equals 𝐸(𝑒 𝑡𝑋 ) and the
factorial moment generating function equals 𝐸(𝑡 𝑋 ), in order to obtain the factorial
moment generating function of a distribution, it suffices that we let 𝑒 𝑡 = 𝑡 in the
moment generating function.
Namely, if the moment generating function of Example 9.2 is obtained as
1
𝐸(𝑒 𝑡𝑋 ) = 2𝑛 (𝑒 𝑡 + 1)𝑛 , its factorial moment generating function is equal to 𝐸(𝑡 𝑋 ) =
1
(𝑡 + 1)𝑛 .
2𝑛

239 | P a g e
Proposition 10-1
If 𝑋 is a random variable taking on integer and nonnegative values, then we have:
𝑃(𝑋 𝑒𝑣𝑒𝑛) − 𝑃(𝑋 𝑜𝑑𝑑) = 𝐸(𝑡 𝑋 )|𝑡=−1

Proof.

𝐸(𝑡 𝑥 ) | = 𝐸((−1)𝑥 ) = ∑(−1)𝑥 𝑃𝑥 (𝑥) = (−1)0 𝑃(0) + (−1)1 𝑃(1) + (−1)2 𝑃(2) + ⋯
𝑡 = −1
𝑥

= 𝑃(0) − 𝑃(1) + 𝑃(2) + ⋯ = 𝑃(𝑋 𝑒𝑣𝑒𝑛) − 𝑃(𝑋 𝑜𝑑𝑑)

Example 10.2

Suppose that the discrete random variable 𝑋 has the following probability
function:
𝑥
10 2
𝑃(𝑥) = ( ) 10 ; 𝑥 = 0,1,2, . . . ,10
𝑥 3
Obtain the probability that the random variable 𝑋 takes on an even value and
the probability that it adopts an odd value.
Solution.
10 10 10
𝑋 10
2𝑥 1 𝑛 1 𝑛 1
𝐸(𝑡 ) = ∑ 𝑡 ( ) 10 = 10 ∑ ( ) (2𝑡) 𝑋 = 10 ∑ ( ) (2𝑡) 𝑋 110−𝑥 = 10 (2𝑡 + 1)10
𝑋
𝑥 3 3 𝑥 3 𝑥 3
𝑥=0 𝑥=0 𝑥=0

𝑃(𝑋 𝑒𝑣𝑒𝑛) + 𝑃(𝑋 𝑜𝑑𝑑) = 1


{ 1
𝑃(𝑋 𝑒𝑣𝑒𝑛) − 𝑃(𝑋 𝑜𝑑𝑑) = 𝐸[(−1) 𝑋 ] = 𝐸(𝑡 𝑋 )|𝑡=−1 =
310
1 1
1+ 1−
⇒ 𝑃(𝑋 𝑒𝑣𝑒𝑛) = 310 , 𝑃(𝑋 𝑜𝑑𝑑) = 310
2 2
Note that if the discrete random variable 𝑋 takes on nonnegative integers, we
call the factorial moment generating function as the probability generating function
because in such a case we have:
240 | P a g e
𝜋𝑋 (𝑡) = ∑ 𝑡 𝑥 𝑃(𝑋 = 𝑥) = 𝑃𝑋 (0) + 𝑡𝑃𝑋 (1) + 𝑡 2 𝑃𝑋 (2) + 𝑡 3 𝑃𝑋 (3) + ⋯ ⇒ 𝜋𝑋 (𝑡 = 0) = 𝑃𝑋 (0)
𝑥

Then, successive differentiating of 𝜋𝑋 (𝑡) with respect to 𝑡 results in:


𝑑𝜋𝑋 (𝑡)
| = [𝑃𝑋 (1) + 2𝑡𝑃𝑋 (2) + 3𝑡 2 𝑃𝑋 (3) + 4𝑡 3 𝑃𝑋 (4) + ⋯ ]|𝑡=0 = 𝑃𝑋 (1)
𝑑𝑡 𝑡=0
𝑑2 𝜋𝑋 (𝑡)
| = [2 × 1 × 𝑃𝑋 (2) + 3 × 2 × 𝑡𝑃𝑋 (3) + 4 × 3 × 𝑡 2 𝑃𝑋 (4) + ⋯ ]|𝑡=0 = 2! 𝑃𝑋 (2)
𝑑𝑡 2 𝑡=0
𝑑3 𝜋𝑋 (𝑡)
| = [3 × 2 × 1 × 𝑃𝑋 (3) + 4 × 3 × 2 × 𝑡𝑃𝑋 (4) + ⋯ ]|𝑡=0 = 3! 𝑃𝑋 (3)
𝑑𝑡 3 𝑡=0


Likewise, it can be shown that:
𝑑 𝑛 𝜋𝑋 (𝑡) 1 𝑑 𝑛 𝜋𝑋 (𝑡)
| = 𝑛! 𝑃𝑋 (𝑛) ⇒ 𝑃𝑋 (𝑛) = × |
𝑑𝑡 𝑛 𝑡=0 𝑛! 𝑑𝑡 𝑛 𝑡=0

In other words, if in the factorial moment generating function of the random


variable 𝑋 taking on nonnegative integers, we let 𝑡 = 0, it yields 𝑃(𝑋 = 0). Moreover,
if we differentiate 𝑛 times with respect to 𝑡 and divide the resulted value by 𝑛!, it
yields 𝑃(𝑋 = 𝑛) and that is why this function is called the probability generating
function as well.

Example 10.3

Suppose that the factorial moment generating function (the probability


generating function) of the random variable 𝑋 is 𝜋𝑋 (𝑡) = 𝑒 𝑡−1 . Obtain the probability
function of this random variable.
Solution. Based on the previous explanations of the current section, we have:
𝑃(𝑋 = 0) = 𝜋𝑋 (𝑡 = 0) = 𝑒 −1
1 𝑑 𝑛 𝜋𝑋 (𝑡) 1 𝑑 𝑛 (𝑒 𝑡−1 ) 1 𝑡−1 |
𝑒 −1
𝑃(𝑋 = 𝑛) = × | = × | = × 𝑒 𝑡=0 = ; 𝑛 = 1,2, . ..
𝑛! 𝑑𝑡 𝑛 𝑡=0 𝑛! 𝑑𝑡 𝑛 𝑡=0
𝑛! 𝑛!

Therefore, given that 0! is equal to 1, we have:


241 | P a g e
𝑃(𝑋 = 0) = 𝑒 −1
𝑒 −1
{ 𝑒 −1 ⇒ 𝑃(𝑋 = 𝑛) = ; 𝑛 = 0,1,2, …
𝑃(𝑋 = 𝑛) = ; 𝑛 = 1,2, … 𝑛!
𝑛!
Other advantages of the factorial moment generating function is that if the
random variable 𝑋 takes on nonnegative integers, for values |𝑡| ≤ 1, term 𝐸(𝑡 𝑋 ) surely
converges. Therefore, for this kind of random variables, the factorial moment
generating function always exists.
As mentioned in Section 5.9, sometimes it is possible that the moment
generating function of a random variable does not exist. As a result, it cannot be used
to determine the features of the random variable like obtaining the expected value.
In such cases, if the random variable 𝑋 takes on nonnegative integers, the factorial
moment generating function certainly exists and it can be used to investigate the
random variable features.

𝑿|𝒂 ≤ 𝑿 ≤ 𝒃

S ometimes the obtained information about a random variable changes the values
of that random variable. For instance, when we know that the outcome of a die is
less than 5, or the lifetime of lightbulb is greater than 10 months, possible values of
the random variable and their probabilities change.
If we know that the random variable 𝑋 is within the interval from 𝑎 to 𝑏, then,
for the discrete case, its probability function is determined as follows:

𝑃(𝑋 = 𝑥)
𝑃(𝑋 = 𝑥|𝑎 ≤ 𝑋 ≤ 𝑏) = ; 𝑥 = 𝑎, … , 𝑏
𝑃(𝑎 ≤ 𝑋 ≤ 𝑏)

Or, for the continuous case, the density function of the random variable 𝑋 is
determined as follows in the new state:
𝑓(𝑥)
𝑓(𝑥|𝑎 ≤ 𝑋 ≤ 𝑏) = ; 𝑎≤𝑥≤𝑏
𝑃(𝑎 ≤ 𝑋 ≤ 𝑏)

242 | P a g e
Proof.
For the discrete case, we have:
𝑃((𝑋 = 𝑥) ∩ (𝑎 ≤ 𝑋 ≤ 𝑏)) 𝑃(𝑋 = 𝑥)
𝑃(𝑋 = 𝑥|𝑎 ≤ 𝑋 ≤ 𝑏) = = ; 𝑥 = 𝑎, … , 𝑏
𝑃(𝑎 ≤ 𝑋 ≤ 𝑏) 𝑃(𝑎 ≤ 𝑋 ≤ 𝑏)
In addition, for the continuous case, we have:
𝑑𝑥 𝑑𝑥
𝑓(𝑥|𝑎 ≤ 𝑋 ≤ 𝑏)𝑑𝑥 = 𝑙𝑖𝑚 𝑃(𝑥 − <𝑋<𝑥+ |𝑎 ≤ 𝑋 ≤ 𝑏)
𝑑𝑥→0 2 2
𝑑𝑥 𝑑𝑥 𝑑𝑥 𝑑𝑥
𝑙𝑖𝑚 𝑃((𝑥 − < 𝑋 < 𝑥 + ) ∩ (𝑎 ≤ 𝑋 ≤ 𝑏)) 𝑙𝑖𝑚 𝑃(𝑥 − <𝑋<𝑥+ ) 𝑓(𝑥)𝑑𝑥
𝑑𝑥→0 2 2 𝑑𝑥→0 2 2
= = = ; 𝑎≤𝑥≤𝑏
𝑃(𝑎 ≤ 𝑋 ≤ 𝑏) 𝑃(𝑎 ≤ 𝑋 ≤ 𝑏) 𝑃(𝑎 ≤ 𝑋 ≤ 𝑏)
𝑓(𝑥)
⇒ 𝑓(𝑥|𝑎 ≤ 𝑋 ≤ 𝑏) =
𝑃(𝑎 ≤ 𝑋 ≤ 𝑏)

Also, for the expected value of the random variable X, in the case that the
values are limited, we have:
𝐸(𝑋|𝑎 ≤ 𝑋 ≤ 𝑏)
𝑏 𝑏
𝑃(𝑋 = 𝑥) ∑𝑏𝑎 𝑥 𝑃(𝑋 = 𝑥)
∑ 𝑥 𝑃(𝑥|𝑎 ≤ 𝑋 ≤ 𝑏) = ∑ 𝑥 = 𝑏 ; if 𝑋 is a discrete random variable
𝑃(𝑎 ≤ 𝑋 ≤ 𝑏) ∑𝑎 𝑃(𝑋 = 𝑥)
𝑎 𝑎
= 𝑏
𝑏 𝑏 ∫ 𝑥 𝑓𝑋 (𝑥)𝑑𝑥
𝑓𝑋 (𝑥)
∫ 𝑥 𝑓𝑋 (𝑥|𝑎 ≤ 𝑋 ≤ 𝑏) 𝑑𝑥 = ∫ 𝑥 𝑑𝑥 = 𝑎 𝑏 ; if 𝑋 is a continuous random variable
{ 𝑎 𝑎 𝑃(𝑎 ≤ 𝑋 ≤ 𝑏) ∫𝑎 𝑓𝑋 (𝑥)𝑑𝑥

Example 11.1

Suppose that the discrete random variable 𝑋 has the following probability
function:
1
; 𝑥 = −1
4
1
𝑃𝑋 (𝑥) = ; 𝑥=0
2
1
{4 ; 𝑥=1

Obtain the expected value of this random variable given that we know it does
not take on negative values.

243 | P a g e
Solution.
1 1 1
𝑃(𝑋 = 𝑥) ∑1𝑥=0 𝑥𝑃(𝑋 = 𝑥) 0 × 2 + 1 × 4 2 1 1
𝐸(𝑋|𝑋 ≥ 0) = ∑ 𝑥 = = =0× +1× =
𝑃(𝑋 ≥ 0) 𝑃(𝑋 ≥ 0) 1 1 3 3 3
𝑥=0
2+4

Example 11.2

1
Suppose that the density function of random variable 𝑋 is 𝑓(𝑋) = 2 ; 0 < 𝑥 < 2 .
1
Obtain the value of 𝐸(𝑋|𝑋 < 2) .

Solution.
1 1
𝑥 1
1 ∫02 𝑥 𝑓𝑋 (𝑥)𝑑𝑥 ∫02 2 𝑑𝑥 = 16 = 1
𝐸(𝑋|𝑋 < ) = = 1
2 1 1 1 4
𝑃(𝑋 < 2) ∫02 𝑑𝑥
2 4

I n numerous applications of statistics, the exact distribution of a random variable


is not specific, but the mean and variance of its distribution can be estimated. In
these conditions, it is possible to obtain some bounds for some probabilities by using
Markov's and Chebyshev's inequalities. These two are explained in this section.

I f 𝑋 is a nonnegative random variable, then for any 𝑎 > 0, we have:

𝑃(𝑋 ≥ 𝑎) ≤
𝐸(𝑋)
𝑎
Proof. We suppose that 𝑋 is a continuous random variable.

244 | P a g e
∞ 𝑎 ∞ ∞ ∞
𝐸(𝑋) = ∫ 𝑥 𝑓𝑥 (𝑥)𝑑𝑥 = ∫ 𝑥 𝑓𝑥 (𝑥)𝑑𝑥 + ∫ 𝑥 𝑓𝑥 (𝑥)𝑑𝑥 ≥ ∫ 𝑥 𝑓(𝑥)𝑑𝑥 ≥ ∫ 𝑎 𝑓𝑥 (𝑥)𝑑𝑥
0 0 𝑎 𝑎 𝑎

𝐸(𝑥)
⇒ 𝐸(𝑋) ≥ 𝑎 ∫ 𝑓𝑥 (𝑥)𝑑𝑥 ⇒ 𝑃(𝑋 ≥ 𝑎) ≤
𝑎 𝑎

In addition, the Markov's inequality can be written as follows:

𝐸(𝑋) 𝐸(𝑋) 𝐸(𝑋) 𝐸(𝑋)


𝑃(𝑋 ≥ 𝑎) ≤ ⇒ −𝑃(𝑋 ≥ 𝑎) ≥ − ⇒ 1 − 𝑃(𝑋 ≥ 𝑎) ≥ 1 − ⇒ 𝑃(𝑋 < 𝑎) ≥ 1 −
𝑎 𝑎 𝑎 𝑎

Example 12.1

A shopping center has, on average, 1000 customers per day.


a. What can be said about the probability that the shopping center that will
have at least 1500 customers tomorrow?
b. What can be said about the probability that the shopping center will have
less than 1500 customers tomorrow?
Solution.
a. Suppose that 𝑋 denotes the number of customers per day. In this state, using
Markov's inequality results in:
𝐸(𝑋) 1000 2
𝑃(𝑋 ≥ 1500) ≤ = =
1500 1500 3
The probability that the shopping center will have at least 1500 customers
2
tomorrow is at most 3.
b. Using the second figure of Markov's inequality, we have:
𝐸(𝑋) 2 1
𝑃(𝑋 < 1500) = 1 − 𝑃(𝑋 ≥ 1500) ≥ 1 − =1− =
𝑎 3 3

245 | P a g e
The probability that the shopping center will have less than 1500 customers
1
tomorrow is at least 3.
Markov's inequality can be extended as follows:
If 𝑔(𝑋) is a nonnegative random variable, then for any 𝑎 > 𝑜, we have:
𝐸(𝑔(𝑋))
𝑃(𝑔(𝑋) ≥ 𝑎) ≤
𝑎
𝐸(𝑔(𝑋))
𝑃(𝑔(𝑋) < 𝑎) ≥ 1 −
𝑎
Namely, if 𝑋 is a random variable, then for any 𝑎 > 0, we have:

2
𝐸(𝑋 2 )
𝑃(𝑋 ≥ 𝑎) ≤
𝑎

I
f 𝑋 is a random variable with respective finite mean and variance 𝜇 and 𝜎 2 , then
for any 𝐶 > 0, we have:
𝜎2
𝑃{|𝑋 − 𝜇| ≥ 𝐶 } ≤
𝐶2
Proof.

2 2
𝐸[(𝑋 − 𝜇)2 ] 𝜎 2
𝑃{|𝑋 − 𝜇| ≥ 𝐶 } = 𝑃{ (𝑋 − 𝜇) ≥ 𝐶 } ≤ = 2
𝐶2 𝐶
To realize the above proof better, let [(𝑋 − 𝜇)2 ] be equal to 𝑔(𝑋) and then use
the extension of Markov's inequality.
Chebyshev's inequality states that the probability that the distance of a
random variable from its mean becomes greater than or equal to the positive number
𝜎2
𝐶 is at most 𝐶 2 .

Moreover, Chebyshev's inequality can be written as follows:


𝜎2 𝜎2 𝜎2
𝑃{|𝑋 − 𝜇| ≥ 𝐶 } ≤ 2 ⇒ −𝑃{|𝑋 − 𝜇| ≥ 𝐶 } ≥ − 2 ⇒ 1 − 𝑃{|𝑋 − 𝜇| ≥ 𝐶 } ≥ 1 − 2
𝐶 𝐶 𝐶
𝜎2
⇒ 𝑃{|𝑋 − 𝜇| < 𝐶 } ≥ 1 −
𝐶2

246 | P a g e
Therefore, in general, Chebyshev's inequality can be represented as
following manifestations:
𝜎2 𝜎2
𝑃{|𝑋 − 𝜇| < 𝐶 } ≥ 1 − 𝐶 2 𝑃{|𝑋 − 𝜇| ≥ 𝐶 } ≤ 𝐶 2

Now, if in the Chebyshev's inequality, we let 𝐶 = 𝑘𝜎, other manifestations of


this inequality are obtained:
𝜎2 1 𝜎2 1
𝑃{|𝑋 − 𝜇| ≥ 𝑘𝜎 } ≤ (𝑘𝜎)2 = 𝑘 2 𝑃{|𝑋 − 𝜇| < 𝑘𝜎 } ≥ 1 − (𝑘𝜎)2 = 1 − 𝑘 2

Example 12.2

Suppose a shopping center has, on average, 1,000 customers per day with a
standard deviation of 20 customers. What can be said about the possibility of having
between 800 and 1,200 customers tomorrow?
Solution. Suppose that 𝑋 denotes the number of customers per day. If so, using
Chebyshev's inequality leads to
𝜎2
𝑃{800 < 𝑋 < 1200 } = 𝑃{800 − 1000 < 𝑋 − 1000 < 1200 − 1000 } = 𝑃{|𝑋 − 1000| < 200 } ≥ 1 −
𝐶2
(20)2
=1− = 0.99
(200)2

Therefore, the probability that the shopping center will have between 800 and 1200
customers tomorrow is at least 0.99.

247 | P a g e
Example 12.3

The external diameter of a piece is suitable if its size is between 19.5 to 20.5
mm. If the average external diameter of this type of piece is equal to 20 mm and its
standard deviation is equal to 0.25 mm, what can be said about the proportion of
suitable pieces?
Solution. Suppose that 𝑋 denotes the external diameter magnitude of the piece.
(0.25)2 3
𝑃(20.5 < 𝑋 < 19.5) = 𝑃{|𝑋 − 20| < 0.5} ≥ 1 − = = 0.75
(0.5)2 4
3
This means that 4 of the pieces are suitable.

I
f 𝑋 is a random variable with respective mean and finite variance 0 and 𝜎 2 , then
for any 𝑎 > 0, we have:
𝜎2
𝑃{𝑋 ≥ 𝑎 } ≤
𝑎2 + 𝜎 2

Proof. for any 𝑏 > 0, we have:


𝑃{𝑋 ≥ 𝑎 } = 𝑃{𝑋 + 𝑏 ≥ 𝑎 + 𝑏 }

Moreover, we know that:


𝑃{(𝑋 + 𝑏)2 ≥ (𝑎 + 𝑏)2 } = 𝑃{𝑋 + 𝑏 ≥ (𝑎 + 𝑏)} + 𝑃{𝑋 + 𝑏 ≤ −(𝑎 + 𝑏) }

Hence, we have:
𝑃{𝑋 ≥ 𝑎 } = 𝑃{𝑋 + 𝑏 ≥ 𝑎 + 𝑏 } ≤ 𝑃{(𝑋 + 𝑏)2 ≥ (𝑎 + 𝑏)2 }

248 | P a g e
As a result, using the extension of Markov's inequality leads to:
𝐸[(𝑋 + 𝑏)2 ] 𝑉𝑎𝑟[(𝑋 + 𝑏)] + [𝐸(𝑋 + 𝑏)]2 𝜎 2 + 𝑏 2
𝑃{𝑋 ≥ 𝑎 } ≤ 𝑃{(𝑋 + 𝑏)2 ≥ (𝑎 + 𝑏)2 } ≤ = =
(𝑎 + 𝑏)2 (𝑎 + 𝑏)2 (𝑎 + 𝑏)2
𝜎2
Therefore, the replacement of 𝑏 = leads to the following inequality:
𝑎

𝜎2
𝑃{𝑋 ≥ 𝑎 } ≤
𝑎2 + 𝜎 2

Example 12.4

A factory produces, on average, 1000 pieces with standard deviation 100.


Obtain an upper bound for the probability that it produces at least 1200 pieces.
Solution. Suppose that 𝑋 denotes the daily production amount.
1002 1
𝑃{𝑋 ≥ 1200 } = 𝑃{𝑋 − 1000 ≥ 1200 − 1000} ≤ 2 2
=
200 + 100 5

5
If we obtain the above upper bound using the Markov's inequality, value is
6
1
obtained, which is more tenuous than . This is because the variance of the
5
distribution is not used in the Markov's inequality.
Using one-sided Chebyshev's inequality, it can be shown that if X is a random
variable having finite mean 𝜇 and finite variance 𝜎 2 , then for any 𝑎 > 0, we have:
𝜎2
𝑃{𝑋 ≥ 𝑎 + 𝜇 } ≤
𝑎2 + 𝜎 2
𝜎2
𝑃{𝑋 ≤ 𝜇 − 𝑎 } ≤ 2
𝑎 + 𝜎2
To prove this note, it suffices to know that 𝑋 − 𝜇 and 𝜇 − 𝑋 are random variable
with zero mean and variance 𝜎 2 . Therefore, we have:
𝜎2 𝜎2
𝑃{𝑋 − 𝜇 ≥ 𝑎 } ≤ 2 ⇒ 𝑃{𝑋 ≥ 𝑎 + 𝜇 } ≤ 2
𝑎 + 𝜎2 𝑎 + 𝜎2
249 | P a g e
250 | P a g e
1) If 𝑋 has the following probability function
𝑐
𝑃(𝑋 = 𝑥) = ; 𝑥 = 0,1,2, …
𝑥!
obtain its expected value.
2) If 𝑋 has the following probability function
𝑞𝑥
𝑃(𝑋 = 𝑥) = − ; 𝑥 = 1,2, … ; 𝑞 = 1 − 𝑝; 𝑝 ∈ (0,1)
𝑥 𝑙𝑛 𝑝
obtain its expected value.
3) Suppose that the PH value of the potable water in a region follows a random
variable with the following density function:
𝑥
; 1<𝑥<𝑎
𝑓𝑋 (𝑥) = {4
0; 𝑜𝑡ℎ𝑒𝑟𝑤𝑖𝑠𝑒
Obtain the average PH value of the potable water.
4) Suppose that the random variable 𝑋 has the following density function
𝑎 + 𝑏𝑥 2 ; 0<𝑥<1
𝑓𝑋 (𝑥) = {
0; 𝑜𝑡ℎ𝑒𝑟𝑤𝑖𝑠𝑒
3
Obtain the values of a and b such that 𝐸(𝑋) = 5.

5) If 𝑋 is a random variable with the following density function

obtain 𝐸(𝑋).

251 | P a g e
6) Suppose that you want to get your car insured against accidents as many as
$40,000. Based on the type of accidents, the insurance company should pay
the full insurance amount with probability 0.002, a half of the full insurance
amount with probability 0.01, and a quarter of the full insurance amount with
probability 0.01. Also, the other compensations are negligible. How much
money does the insurance company should receive as a premium in order to,
on average, gain $200?
7) At a store, a product is sold by either the guaranty or auction. Prices of the
guaranty-oriented and auction-oriented ways are $𝑎 and $𝑏 (𝑏 < 𝑎),
respectively. We know that the auction products are nondefective with
probability 𝑝 and defective with probability 1 − 𝑝. Under what circumstances
is it worth buying at the auction?
8) Suppose that two teams A and B are to play a game and stop it whenever one
of them wins twice. If in each game, team A wins with the probability 𝑝, and
team B wins with the probability 1 − 𝑝,
a. Obtain the average number of games.
b. What value of 𝑝 maximizes the number of games?
9) A person’s previous experience has shown that 7% of his mailed parcels will
not reach their destination. He has bought two books, each of which costs $20,
and wants to mail them to his brother. If he mails them in one parcel, its
postage costs $5.25, but if he mails them in separate parcels, their postage
costs $3.30 each. Obtain the expected value of expenses (being lost + postage)
under the following circumstances.
a. Mailing books as a single parcel.
b. Mailing books as two separate parcels.
10) Obtain the expected value of the random variable 𝑋 with the following
probability function:
1
𝑃(𝑋 = 𝑥) = ( )𝑥+1 ; 𝑥 = 0,1,2, …
2
𝑑
Hint: ∑∞ 𝑖
𝑖=1 𝑖𝑑 = (1−𝑑)2 ; |𝑑| < 1

252 | P a g e
11) Obtain the expected value of the random variable 𝑋 with the following
probability function:
1
( )𝑥+1 ; 𝑥 = 0,1,2, …
𝑃(𝑋 = 𝑥) = { 3
1
( )|𝑥|+1 ; 𝑥 = −1, −2, −3, …
2
100
12) An urn contains 2100 marbles, ( ) of which have number 𝑥(𝑥 = 0,1,2, … ,100).
𝑥
One marble is randomly withdrawn from the urn. If its number is denoted by
the random variable 𝑋, it is desired to obtain
a. The probability function of 𝑋.
b. The expected value of 𝑋.
13) There are 10 balls numbered 1 through 10 in an urn. Suppose that 2 of them are
randomly and with replacement withdrawn from the urn. It is desired to obtain
the probability function and the expected value of 𝑋 under the following
circumstances
a. 𝑋: the least selected number in the sample.
b. 𝑋: the greatest selected number in the sample.
14) Solve the preceding problem when the choices are without replacement.
15) The military tanks of a particular country are numbered 1 through 𝑁. The
country randomly losses “𝑛” tanks captured by the enemy in a war. The enemy
identifies the number of the labeled captured tanks. If 𝑀 denotes the greatest
number among the captured tanks by the enemy,
a. What is the value of 𝐸(𝑀)?
b. How can the enemy apply 𝐸(𝑀) to estimate the total number of opposite
tanks, which is 𝑁?
16) Suppose that the random variable 𝑋 has the following probability function:
𝑃(𝑋 = 𝑥) = 𝑐𝑥 ; 𝑥 = 1,2,3,4,5
a. Obtain the expected value of 𝑋.
b. Obtain the variance of 𝑋.

253 | P a g e
17) Suppose that the random variable 𝑋 has the following density function:
𝑐𝑥 𝑛 ; 0<𝑥<1
𝑓𝑋 (𝑥) = {
0 ; 𝑜𝑡ℎ𝑒𝑟𝑤𝑖𝑠𝑒
a. Obtain the expected value of 𝑋.
b. Obtain the variance of 𝑋.
18) Suppose that the random variable 𝑋 has the following density function:
|1 − 𝑥| ; 0<𝑥<2
𝑓𝑋 (𝑥) = {
0 ; 𝑜𝑡ℎ𝑒𝑟𝑤𝑖𝑠𝑒
c. Obtain the expected value of 𝑋.
d. Obtain the variance of 𝑋.
19) Suppose that the random variable 𝑋 has the following probability function:
3! 1 3
𝑃(𝑋 = 𝑥) = ( ) ; 𝑥 = 0,1,2,3
𝑥! (3 − 𝑥)! 2
a. Obtain the expected value of 𝑋.
b. Obtain the variance of 𝑋.
20) Suppose that the random variable 𝑋 has the following density function:
2𝑥 −3 ; 𝑥>1
𝑓𝑋 (𝑥) = {
0 ; 𝑜𝑡ℎ𝑒𝑟𝑤𝑖𝑠𝑒
a. Obtain the expected value of 𝑋.
b. Obtain the variance of 𝑋.
21) Suppose that the random variable 𝑋 has the following density function:
3𝑥 −4 ; 𝑥>1
𝑓𝑋 (𝑥) = {
0 ; 𝑜𝑡ℎ𝑒𝑟𝑤𝑖𝑠𝑒
a. Obtain the expected value of 𝑋.
b. Obtain the variance of 𝑋.

254 | P a g e
22) Suppose that the random variable 𝑋 has the following density function:
1 − |𝑥|; −1 < 𝑥 < 1
𝑓𝑋 (𝑥) = {
0 ; 𝑜𝑡ℎ𝑒𝑟𝑤𝑖𝑠𝑒
a. Obtain the median of 𝑋.
b. Obtain the mode of 𝑋.
c. Obtain the expected value of 𝑋.
d. Obtain the variance of 𝑋.
23) Suppose that the random variable 𝑋 has the following density function:
2(1 − 𝑥); 0≤𝑥≤1
𝑓𝑋 (𝑥) = {
0; 𝑜𝑡ℎ𝑒𝑟𝑤𝑖𝑠𝑒
a. Obtain the median of 𝑋.
b. Obtain the mode of 𝑋.
c. Obtain the expected value of 𝑋.
d. Obtain the variance of 𝑋.
1 ; 𝑥>0
24) If the random variable 𝑌 is defined by 𝑌 = { 0 ; 𝑥 = 0 and random variable
−1 ; 𝑥 < 0
𝑋 has the density function 𝑓𝑋 (𝑥) and distribution function 𝐹𝑋 (𝑥), then obtain
the expected value and variance of the random variable 𝑌 in terms of 𝐹𝑋 (0).

25) Suppose that the random variable 𝑋 has the following probability function:
𝑝; 𝑥=0
𝑃(𝑋 = 𝑥) = {1 − 2𝑝; 𝑥=1
𝑝; 𝑥=2
a. Obtain the expected value of 𝑋.
b. Obtain the variance of 𝑋.
c. What value of 𝑝 maximizes the variance of 𝑋?

255 | P a g e
26) Suppose that the random variable 𝑋 has the following cumulative distribution
function:
0; 𝑥<0
3
; 0≤𝑥<1
8
1
𝐹𝑋 (𝑥) = ; 1≤𝑥<3
2
3
; 3≤𝑥<4
4
{1; 𝑥≥4
a. Obtain the expected value of 𝑋.
b. Obtain the variance of 𝑋.

27) Suppose that the random variable 𝑋 has the following cumulative distribution
function:
0; 𝑥<0
𝑥+1
𝐹𝑋 (𝑥) = { ; 0≤𝑥<1
4
1; 𝑥≥1
a. Obtain the expected value of 𝑋.
b. Obtain the variance of 𝑋.

28) Suppose that 𝑋 is an arbitrary random variable with finite mean 𝜇 and finite
variance 𝜎 2 . If the random variable 𝑌 is defined as 𝑌 = (𝑋 − 𝜇)/𝜎, then obtain
the expected value and variance of this random variable.

29) Show that, for the arbitrary random variable 𝑋, the equality 𝐸[𝑋(𝑋 − 𝜇𝑋 )] =
𝑉𝑎𝑟(𝑋) is valid.

30) If the expected value of the random variable 𝑋 equals 𝜇 and 𝑐 is a constant,
a. Show that the equality 𝐸[(𝑋 − 𝑐)2 ] = 𝑉𝑎𝑟(𝑋) + (𝜇 − 𝑐)2 is valid.
b. What value of 𝑐 minimizes the expression 𝐸[(𝑋 − 𝑐)2 ]?

256 | P a g e
31) The number of days needed to complete a project follows the random
variable 𝑋 with the following probability function:
0.2; 𝑥 = 10
0.3; 𝑥 = 11
𝑃(𝑋 = 𝑥) = 0.3; 𝑥 = 12
0.1; 𝑥 = 13
{0.1; 𝑥 = 14
If the revenue from this project (measured in thousand dollars) is a function
of the number of days as 𝑌 = 1000(12 − 𝑋), obtain the expected value and
variance of the random variable 𝑌.

32) Suppose that the random variable 𝑋 has the following cumulative
distribution function:
0; 𝑥 < −3
3
; −3 ≤ 𝑥 < 0
8
1
𝐹𝑋 (𝑥) = ; 0≤𝑥<3
2
3
; 3≤𝑥<4
4
{1; 𝑥≥4
Obtain each of the following values.
a. 𝐸(𝑋 2 − |𝑋|)
b. 𝐸(𝑋|𝑋|)
33) Consider a rectangle with the random dimensions of 𝑋 and 1 − 𝑋 where 𝑋 is
a random variable with the following density function:
3𝑥 2 ; 0<𝑥<1
𝑓𝑋 (𝑥) = {
0; 𝑜𝑡ℎ𝑒𝑟𝑤𝑖𝑠𝑒
Obtain the expected value of the rectangle area.

34) The random variable 𝑋 has mean 𝜇 and variance 𝜎 2 . After observing a
random value of 𝑋, a rectangular whose dimensions are 3|𝑋| by |𝑋| are made.
Calculate the expected value of the rectangular area.
257 | P a g e
35) Suppose that the random variable 𝑋 has the following density function:
1; 0<𝑥<1
𝑓𝑋 (𝑥) = {
0; 𝑜𝑡ℎ𝑒𝑟𝑤𝑖𝑠𝑒
Obtain each of the following values.
1
a. 𝐸 [𝑀𝑖𝑛(𝑋, 3)]
1
b. 𝐸 [𝑀𝑎𝑥(𝑋, 3)]
1
c. 𝐸(|𝑋 − 3 |)

36) The random variable 𝑋 has the the following probability function:
1
𝑃𝑋 (𝑥) = ; 𝑥 = 51,52, … ,100
50
𝑏+1 𝑑𝑥 1 𝑏 𝑑𝑥
Using the inequality ∫𝑎 < ∑𝑏𝑥=𝑎 𝑥 < ∫𝑎−1 , obtain the approximate value
𝑥 𝑥
1
of 𝐸 (𝑋).

37) For the arbitrary random variable 𝑋, show that the following equality is
valid: 𝐸(𝑋) = 𝐸[𝑚𝑎𝑥( 𝑋, 0)] − 𝐸[𝑚𝑎𝑥( − 𝑋, 0)]
38) Suppose that the expected value and variance of the random variable 𝑋 are
4 and 16, respectively. Obtain the approximate value of 𝐸(√𝑋 2 + 1).
39) Using Taylor series of function 𝑔(𝑋) = (𝑋 − 𝜇)2 , show that the relationship
Var[𝑔(𝑋)] ≈ [𝑔′ (𝜇)]2 𝜎 2 is valid.
40) Suppose that 𝑋 is a random variable whose mean and variance are 𝜇 and 𝜎 2 ,
respectively. Obtain the approximate expected value and variance of the
1
random variable 𝑌 = 𝑋.

41) For nonnegative random variable 𝑋 taking on integers, prove the following
identities:
a. ∑∞
i=0 𝑃(X>i) = 𝐸(𝑋)

b. ∑∞
𝑖=0 𝑃(𝑋 ≥ 𝑖) = 𝐸(𝑋) + 1
1
c. ∑∞ 2
𝑖=0 𝑖𝑃(𝑋 > 𝑖) = 2 [𝐸(𝑋 ) − 𝐸(𝑋)]

258 | P a g e
1
d. ∑∞ 2
𝑖=0 𝑖𝑃(𝑋 ≥ 𝑖) = 2 [𝐸(𝑋 ) + 𝐸(𝑋)]

42) Suppose that the yearly compensation value imposed to an insured car follows
a random variable with the following probability function:
0.9; 𝑥=0
0.06; 𝑥 = 500
0.03; 𝑥 = 1000
𝑃(𝑋 = 𝑥) =
0.008; 𝑥 = 10000
0.001; 𝑥 = 50000
{0.001; 𝑥 = 100000
What is the expected value of the compensation if we know that the
compensation value of the car is more than zero?
43) Time duration (measured in minutes) that a person speaks on the phone is a
𝑥
; 0<𝑥≤2
random variable with the density function given 𝑓𝑋 (𝑥) = {44 . What is
3
; 𝑥 > 2
𝑥
the expected value of time duration for communications lasing at least one
minute?
1
44) Suppose that 𝑋 is a nonnegative random variable and 𝑃(𝑋 ≥ 15) = 5. Obtain a
lower bound for the expected value of this random variable.
45) A chief financial officer of a company, based on historical data, knows that the
expected value and standard deviation of the daily production cost per unit is
equal to 24 and 9 (measured in thousand dollars), respectively. Obtain a lower
bound for the proportion of days that a product's daily production cost is
between $6,000 and $42,000?
46) Suppose that 𝑋 is a random variable such that 𝐸(𝑋) = 3 and 𝐸(𝑋 2 ) = 13. Obtain
the least value of 𝑃(−2 < 𝑋 < 8).
47) A rental car agency announces in the advertisement that the expected waiting
time to take a rental car for its customers is 5 minutes. A customer requests
taking a rental car from the agency to get to the airport. If he waits more than
60 minutes to take the car, he will miss the flight. What is the maximum
probability that he misses the flight?

259 | P a g e
48) If 𝑋 is a continuous random variable with positive values, 𝜇 is the expected
value, and m is the median, show that the relationship 𝑚 ≤ 2𝜇 is valid based on
Markov's inequality.
49) If 𝑋 is a random variable with positive values and 𝐸(𝑋 2 ) exists, obtain an upper
bound for 𝑃(𝑋 > 2√𝐸(𝑋 2 )).
50) If the moment generating function of the random variable 𝑋 is given by:
𝑡 −1)
𝑀𝑋 (𝑡) = 𝑒 2(𝑒 , it is desired to obtain:
a. 𝐸(𝑋)
b. 𝐸(𝑋 2 )
51) If 𝑋 is a nonnegative random variable with the density function:
𝑓𝑋 (𝑥) = 2𝑒 −2𝑥 ; 𝑥 > 0, then it is desired to calculate:
a. 𝑀𝑋 (𝑡)
b. 𝐸(𝑋)
c. 𝐸(𝑋 2 )
52) If 𝑋 is a nonnegative random variable with the probability function of
1
𝑃(𝑋 = 𝑥) = 𝑘 ; 𝑥 = 1,2, … , 𝑘, then it is desired to calculate:

a. 𝑀𝑋 (𝑡)
b. 𝐸(𝑋)
c. 𝐸(𝑋 2 )
1
53) If 𝑋 is a random variable and we know that 𝑀𝑥 (𝑡) = (4)10 (1 + 𝑒 𝑡 )20 , it is desired
to calculate:
a. 𝐸(𝑋)
b. 𝐸(𝑋 2 )
c. 𝐸(𝑋(𝑋 − 1))
54) If the factorial moment generating function of the random variable 𝑋 is equal
(3+𝑡)10
to 𝜋𝑋 (𝑡) = , it is desired to calculate:
4 10

260 | P a g e
a. 𝐸(𝑋)
b. 𝐸(𝑋(𝑋 − 1))
c. 𝐸(𝑋 2 )
55) The probability function of the random variable 𝑋 is equal to:
𝑒 −1
𝑃(𝑋 = 𝑥) = ; 𝑥 = 0,1,2, … . It is desired to calculate:
𝑥!

a. 𝜋𝑋 (𝑡) = 𝐸(𝑡 𝑋 )
b. the probability that the random variable 𝑋 takes on an even value.
c. the probability that the random variable 𝑋 takes on an odd value.
d. 𝐸(𝑋)
e. 𝐸(𝑋(𝑋 − 1))

f. 𝐸(𝑋(𝑋 − 1) … (𝑋 − (𝑘 − 1)))
56) The probability function of the random variable 𝑋 is equal to:
1 𝑥
𝑃(𝑋 = 𝑥) = (2) ; 𝑥 = 1,2, … . It is desired to calculate:

a. 𝜋𝑋 (𝑡) = 𝐸(𝑡 𝑋 )
b. The probability that the random variable 𝑋 takes on an even value.
c. The probability that the random variable 𝑋 takes on an odd value.
57) Suppose that the moment generating function of random variable 𝑋 with
respective mean and variance 𝜇 and 𝜎 2 is denoted by 𝑀𝑋 (𝑡). Moreover, the
function 𝜙𝑋 (𝑡) is defined as 𝜙𝑋 (𝑡) = 𝑙𝑛[𝑀𝑋 (𝑡)]. Show that the following
equalities are valid.
𝑑𝜙𝑋 (𝑡)
a. | =𝜇
𝑑𝑡 𝑡=0

𝑑2 𝜙𝑋 (𝑡)
b. | = 𝜎2
𝑑𝑡 2 𝑡=0

58) Suppose that the random variable 𝑋 takes on nonnegative integers. In this
case, if the factorial moment generating function of this random variable is
1
given by 𝜋𝑋 (𝑡) = 3 𝑡 2 (1 + 2𝑡 3 ), obtain the probability function of random
variable 𝑋.
261 | P a g e
59) Suppose that random variable 𝑋 takes on nonnegative integers, and its
factorial moment generating function is given by
2 1 4 2 3 3
𝜋𝑋 (𝑡) = 𝐸(𝑡 𝑋 ) = + 𝑡+ 𝑡 + 𝑡
10 10 10 10
Now, obtain the value of 𝑃(𝑋 ≤ 1).
60) Suppose that the random variable 𝑋 has a moment generating function
given by
1 1 1
𝑀𝑋 (𝑡) = 𝑒 𝑡 + 𝑒 2𝑡 + 𝑒 3𝑡
6 3 10
Obtain the probability function of this random variable.
61) Suppose that random variable 𝑋 has the moment generating function given
by
𝑡
4 2𝑒 𝑡 𝑒 −𝑡 3𝑒 2
𝑀𝑋 (𝑡) = + + +
10 10 10 10
Obtain the value of 𝑃(𝑋 ≤ 0).
62) Suppose that 𝑋 is a random variable taking on values of 0, 1, and 2. If 𝑘 ≥ 1
1
and 𝐸(𝑋 𝑘 ) = 4 + 2𝑘−1, then obtain the value of 𝑃(𝑋 ≥ 1).

63) Suppose that for any 𝑛 ≥ 1, the 𝑛𝑡ℎ moment of random variable 𝑋 exists and
is as follows:
𝐸(𝑋 𝑛 ) = 𝑛! 2𝑛
Obtain the moment generating function of this random variable.
64) (Newspaper Seller Problem) A product is sold seasonally. When the season
terminates, it leads to a net profit of $𝑏 for each sold product and a net loss
of $ℓ for each unsold product. The demand value of this product in each
season follows a discrete random variable, whose probability function is as
𝑃𝑋 (𝑋 = 𝑖) for nonnegative values of 𝑖. If the store has to stock this product
in advance,
a. Show that the profit amount denoted by 𝐵(𝑠) is equal to:
𝑏𝑥 − (𝑠 − 𝑥)𝑙 ; 𝑋≤𝑠
𝐵(𝑠) = {
𝑠𝑏 ; 𝑋>𝑠
262 | P a g e
Where “𝑠” is the number of products that the seller orders.
b. Show that the expected value of profit is 𝑠𝑏 + (𝑏 + ℓ) ∑𝑠𝑖=0(𝑖 − 𝑠)𝑃(𝑖).
c. Show that equality 𝐸[𝐵(𝑠 + 1)] − 𝐸[𝐵(𝑠)] = 𝑏 − (𝑏 + ℓ) ∑𝑠𝑖=0 𝑃(𝑖) is
valid.
d. Show that the number of products that the seller should stock to
maximize its expected profit is the least value of 𝑠 satisfying the
following inequality:
𝑏
𝐹𝑋 (𝑠) ≥
𝑏+ℓ
65) Reconsider the preceding problem. Suppose that if each unit of demand is
not met by the seller, then he has to pay a cost of $𝑐. Show that, in this state,
the value of 𝑠 maximizing the expected value of profit is equal to the least
value of 𝑠 satisfying the following inequality:
𝑏+𝑐
𝐹𝑋 (𝑠) ≥
𝑏+ℓ+𝑐
66) In Problem 64, show that if the demand value or 𝑋 follows a continuous
random variable with density function 𝑓 and cumulative distribution
function 𝐹, then the value that the seller should stock to maximize his
average profit is 𝑠 satisfying the following inequality:
𝑏
𝐹𝑋 (𝑠) =
𝑏+ℓ
67) In Problem 66, show that if the demand value or 𝑋 follows a continuous
random variable with density function 𝑓 and cumulative distribution
function 𝐹, then the value that the seller should stock to maximize his
average profit is 𝑠 satisfying the following inequality:
𝑏+𝑐
𝐹𝑋 (𝑠) =
𝑏+ℓ+𝑐

263 | P a g e
I n Chapter 4, we presented the definition of random variable and introduced
different types of random variables. Further, some properties of random variables
were expressed in Chapters 4 and 5. There are some discrete random variables
having many applications in the real-world modeling of random phenomena. Due to
its high importance, it is distinctly addressed in this chapter.

T he Bernoulli random variable is one of the most important and widely used
random variables in modeling random phenomena in the real world. Consider a
trial that its outcome can take on one of the two states of success or failure. For
example, if a heads come up in flipping a coin, we record it as a success. On the other
hand, if a tails comes up, we record it as a failure. As another example, if a 6 comes
up in the trial of rolling a die, we say that a success has occurred, and if any face
other than 6 turns up, we say that a failure has occurred. We call this type of trials
Bernoulli trial.
If we assign value 1 to the success and value zero to the failure in this type of
trials, we call the resulted random variable Bernoulli random variable.
Other examples of this random variable are as follows:
➢ In the trial of investigating the lifetime of a light bulb

264 | P a g e
1 The lifetime of the light bulb is less than or equal to 100 hours
𝑌= {
0 The lifetime of the light bulb is more than 100 hours

➢ In the trial of inspecting a piece

1 The piece is nondefective


𝑍= {
0 The piece is defective
The probability function of this random variable is as follows:
𝑝 ; 𝑥=1
𝑃(𝑋 = 𝑥) = {
1−𝑝 ; 𝑥=0
Where the value of 0 ≤ 𝑝 ≤ 1 is the parameter of the Bernoulli distribution
and probability of success. Furthermore, 1 − 𝑝 is the probability of failure, which
is sometimes denoted by 𝑞. If 𝑋 possesses a Bernoulli distribution with the
probability of success 𝑝, we briefly denote it by 𝑋 ∼ 𝐵𝑒𝑟(𝑝).
Moreover, the probability function of this random variable can be shown as
follows:
𝑃(𝑋 = 𝑥) = 𝑝 𝑥 (1 − 𝑝)1−𝑥 = 𝑝 𝑥 𝑞1−𝑥 ; 𝑥 = 0,1
If 𝑋 possesses the Bernoulli random variable, for its mean and variance, we
have:

𝐸(𝑋) = ∑ 𝑥 𝑃𝑥 (𝑥) = 0 × (1 − 𝑝) + 1 × 𝑝 = 𝑝
𝑥

𝐸(𝑋 𝑛 ) = ∑ 𝑥 𝑛 𝑃𝑥 (𝑥) = 0𝑛 × (1 − 𝑝) + 1𝑛 × 𝑝 = 𝑝
𝑥

𝑉𝑎𝑟(𝑋) = 𝐸(𝑋 2 ) − [𝐸(𝑋)] 2 = 𝑝 − 𝑝2 = 𝑝 (1 − 𝑝) = 𝑝𝑞


The moment generating function and factorial moment generating function
of this random variable are obtained as follows:
1

𝑀𝑥 (𝑡) = 𝐸(𝑒 ) = ∑ 𝑒 𝑡𝑥 𝑃(𝑋 = 𝑥) = 𝑞𝑒 0 + 𝑝𝑒 𝑡 = 𝑝𝑒 𝑡 + 𝑞


𝑡𝑋

𝑥=0
1

𝜋𝑥 (𝑡) = 𝐸(𝑡 ) = ∑ 𝑡 𝑥 𝑃(𝑋 = 𝑥) = 𝑞𝑡 0 + 𝑝𝑡1 = 𝑝𝑡 + 𝑞


𝑋

𝑥=0

265 | P a g e
Example 2.1

If 𝑋 has a Bernoulli distribution with parameter 𝑝, obtain 𝐸(𝑋 + 𝑋 2 + ⋯ + 𝑋 𝑛 ).


Solution. As mentioned in this section, 𝐸(𝑋 𝑛 ) or the 𝑛𝑡ℎ moment of the Bernoulli
random variable is equal to 𝑝. Therefore, we have:
𝐸(𝑋 + 𝑋 2 + ⋯ + 𝑋 𝑛 ) = 𝐸(𝑋) + 𝐸(𝑋 2 ) + ⋯ + 𝐸(𝑋 𝑛 ) = 𝑝 + 𝑝 + ⋯ + 𝑝 = 𝑛𝑝

Example 2.2

If 𝑋 has a Bernoulli distribution with parameter 𝑝, then show that each of the
random variables 𝑊1 = 𝑋 𝑛 , 𝑊2 = 1 − 𝑋, 𝑊3 = 1 − 𝑋 𝑛 , and 𝑊4 = (1 − 𝑋)𝑛 has a Bernoulli
distribution.
Solution. Since 𝑋 takes on the value of 0 or 1, by substituting these two values for
each of the random functions 𝑊1 , 𝑊2 , 𝑊3 , and 𝑊4 , we can see that these variables also
take on the value of 0 or 1; thereby, all of them have the Bernoulli distribution.

𝑃(𝑊1 = 0) = 𝑃(𝑋 𝑛 = 0) = 𝑃(𝑋 = 0) = 1 − 𝑝


} ⇒ 𝑊1 ∼ 𝐵𝑒𝑟(𝑝)
𝑃(𝑊1 = 1) = 𝑃(𝑋 𝑛 = 1) = 𝑃(𝑋 = 1) = 𝑝
Moreover, since 𝑋 takes on the value of 0 or 1, it is evident that 𝑋 𝑛 is equal to
𝑋, and it has the Bernoulli distribution with parameter 𝑝.
𝑃(𝑊2 = 0) = 𝑃(1 − 𝑋 = 0) = 𝑃(𝑋 = 1) = 𝑝
} ⇒ 𝑊2 ∼ 𝐵𝑒𝑟(1 − 𝑝)
𝑃(𝑊2 = 1) = 𝑃(1 − 𝑋 = 1) = 𝑃(𝑋 = 0) = 1 − 𝑝

Note that 𝑊2 takes on the value 1 with probability 1 − 𝑝. Therefore, its


distribution is Bernoulli with parameter 1 − 𝑝.

266 | P a g e
𝑃(𝑊3 = 0) = 𝑃(1 − 𝑋 𝑛 = 0) = 𝑃(𝑋 𝑛 = 1) = 𝑃(𝑋 = 1) = 𝑝
} ⇒ 𝑊3 ∼ 𝐵𝑒𝑟(1 − 𝑝)
𝑃(𝑊3 = 1) = 𝑃(1 − 𝑋 𝑛 = 1) = 𝑃(𝑋 𝑛 = 0) = 𝑃(𝑋 = 0) = 1 − 𝑝
𝑃(𝑊4 = 0) = 𝑃((1 − 𝑋)𝑛 = 0) = 𝑃(1 − 𝑋 = 0) = 𝑃(𝑋 = 1) = 𝑝
} ⇒ 𝑊4 ∼ 𝐵𝑒𝑟(1 − 𝑝)
𝑃(𝑊4 = 1) = 𝑃((1 − 𝑋)𝑛 = 1) = 𝑃(1 − 𝑋 = 1) = 𝑃(𝑋 = 0) = 1 − 𝑝

Example 2.3

If 𝑋1 , 𝑋2 , … , 𝑋𝑛 are Bernoulli random variables resulting from independent


trials, each with parameter 𝑝,
a. Obtain the distribution of random variable 𝑌1 = 𝑀𝑖𝑛{𝑋1 , 𝑋2 , … , 𝑋𝑛 }.
b. Obtain the distribution of random variable 𝑌2 = 𝑀𝑎𝑥{𝑋1 , 𝑋2 , … , 𝑋𝑛 }.
c. Obtain the distribution of random variable 𝑌3 = 𝑋1 𝑋2 … 𝑋𝑛 .
d. Determine 𝑃(𝑌1 = 0 , 𝑌2 = 0), 𝑃(𝑌1 = 1 , 𝑌2 = 1), and 𝑃(𝑌1 = 0 , 𝑌2 = 1).
Solution. Different possible outcomes of these trials can be shown as follows:
𝑋1 𝑋2 𝑋3 ⋯ 𝑋𝑛
0 0 0 ⋯ 0
1 0 0 ⋯ 0
0 1 0 ⋯ 0
⋮ ⋮ ⋮ ⋱ ⋮
1 1 1 ⋯ 1
a. Variable 𝑌1 = 𝑀𝑖𝑛{𝑋1 , 𝑋2 , … , 𝑋𝑛 } can take on the value of 0 or 1. This
variable takes on the value of 1 when all of the trials result in successes
or 1's (the last row of the above table). Therefore, the probability that
this variable takes on the value of 1 equals 𝑝𝑛 . Hence, this variable is a
Bernoulli random variable with parameter 𝑝𝑛 .
b. Variable 𝑌2 = 𝑀𝑎𝑥{𝑋1 , 𝑋2 , … , 𝑋𝑛 } can also take on the value of 0 or 1. This
variable takes on the value of 0 when all of the trials result in failures or
0's (the first row of the above table). Therefore, the probability that this
variable takes on the value of 0 equals 𝑞 𝑛 , and the probability that this
variable takes on the value of 1 equals 1 − 𝑞 𝑛 . Therefore, this variable is
a Bernoulli random variable with parameter 1 − 𝑞 𝑛 .

267 | P a g e
c. Variable 𝑌3 = 𝑋1 𝑋2 … 𝑋𝑛 can also take on the value of 0 or 1. This variable
like variable 𝑌1 takes on the value of 1 when all of the trials result in
successes or 1's (the last row of the above table). Therefore, the
probability that this variable takes on the value of 1 equals 𝑝𝑛 .
Accordingly, this variable is a Bernoulli random variable with parameter
𝑝𝑛 .
d. Two variables 𝑌1 and 𝑌2 are equal to zero whenever all of the trials result
in failures or 0's (the first row of the above table). Therefore, we have:
𝑃(𝑌1 = 0 , 𝑌2 = 0) = 𝑃(𝑋1 = 0, 𝑋2 = 0, … , 𝑋𝑛 = 0) = 𝑞 𝑛
Note that the above expression cannot be written as 𝑃(𝑌1 = 0)𝑃(𝑌2 = 0)
since events 𝑌1 = 0 and 𝑌2 = 0 are not independent. For example, when
we know that event 𝑌2 = 0 has occurred, it means that all of the 𝑋𝑖 's are
zeros; thereby, event 𝑌1 = 0 certainly occurs. Hence, these two events
are not independent.
Moreover, two variables 𝑌1 and 𝑌2 are equal to one whenever all of the
trials result in successes or 1's (the last row of the above table).
Therefore, we have:
𝑃(𝑌1 = 1 , 𝑌2 = 1) = 𝑃(𝑋1 = 1, 𝑋2 = 1, … , 𝑋𝑛 = 1) = 𝑝𝑛
And in other cases, 𝑌1 = 0and 𝑌2 = 1. Therefore, we have:
𝑃(𝑌1 = 0 , 𝑌2 = 1) = 1 − 𝑞 𝑛 − 𝑝𝑛
Note that (𝑌1 = 0 , 𝑌2 = 1) occurs whenever at least one of 𝑋𝑖 's is equal
to zero and at least one of 𝑋𝑖 's is equal to one. This event always occurs
except for the state that the 𝑋𝑖 's are equal to 1 or all of the 𝑋𝑖 's are equal
to zero. In fact, except for the first and the last rows, this event always
occurs.
(Explain why the probability of occurrence of the event (𝑌1 = 1 , 𝑌2 = 0)
is equal to zero.)
The Bernoulli random variable, which is named after the Swiss mathematician
James Bernoulli, may be the most straightforward type of random variables.
Nonetheless, such a simple random variable can result in creating many special
discrete random variables, some of which are to be introduced in the sections ahead.

268 | P a g e
S uppose that 𝑛 trials with probability of success 𝑝 and failure 1 − 𝑝 are performed
independently. If the random variable 𝑋 denotes the number of successes in these
𝑛 trials, it is called a binomial random variable with parameters (𝑛, 𝑝) whose
probability function is as follows:
𝑛
𝑃(𝑋 = 𝑖) = ( ) 𝑝𝑖 (1 − 𝑝)𝑛−𝑖 ; 𝑖 = 0,1,2, … , 𝑛
𝑖

To better understand the binomial random variable probability function,


suppose that 5 independent Bernoulli trials are performed. States of these 5 trials
5
with three number of successes is equal to ( ) outcomes shown as follows:
3

sssff, ssfsf, sfssf, fsssf, ssffs, sfsfs, fssfs, sffss, fsfss, ffsss

Since the probability of each of the above outcomes is equal to 𝑝3 (1 − 𝑝)2 , the
probability that the number of successes in five trials is equal to 3 equals:
5
( ) 𝑝3 (1 − 𝑝)2
3
Therefore, if 𝑋 denotes the number of successes in these five trials, its
probability function is as follows:
5
𝑃(𝑋 = 𝑖) = ( ) 𝑝𝑖 (1 − 𝑝)5−𝑖 ; 𝑖 = 0,1,2,3,4,5
𝑖

If 𝑋 has a binomial distribution with parameters 𝑛 and 𝑝, we briefly denote it


by 𝑋 ∼ 𝐵(𝑛, 𝑝).

269 | P a g e
Example 3.1

The probability of rehabilitation for each person having a type of cancer is 0.4.
If we know that 15 people have this type of cancer, obtain the probability of following
events.
a. Exactly 5 out of these 15 people rehabilitate.
b. At least 10 out of these 15 people rehabilitate.

Solution.
a. If 𝑋 denotes the number of rehabilitated people, then 𝑋 has a binomial
distribution with parameters 𝑝 = 0.4 and 𝑛 = 15. Therefore, we have:
15
𝑃(𝑋 = 5) = ( ) (0.4)5 (0.6)10 = 0.1859
5
15 (0.4)𝑖 (0.6)15−𝑖
b. 𝑃(𝑋 ≥ 10) = ∑15 15
𝑖=10 𝑃(𝑋 = 𝑖) = ∑𝑖=10 ( ) = 0.0338
𝑖
c.

Example 3.2

Suppose that the pieces produced by a factory are defective independently


with probability 0.05 each. These pieces are sold in packages of size 10 and if one
package contains more than one defective piece, then it is returned to the factory.
What proportion of the packages produced by this factory are returned?
Solution. Suppose that 𝑋 denotes the number of defective pieces in a package. If so,
we have:
10 10
𝑃(𝑋 > 1) = 1 − 𝑃(𝑋 = 0) − 𝑃(𝑋 = 1) = 1 − ( ) (0.05)0 (0.95)10 − ( ) (0.05)1 (0.95)9 = 0.086
0 1

270 | P a g e
Example 3.3

In Example 3.2, if a person purchases 5 packages from this factory, it is desired


to calculate the probability that one of them is returned to the factory.
Solution. If 𝑌 denotes the number of packages to be returned by the person, then 𝑌
has the binomial distribution with parameters 𝑛 = 5 and 𝑝 = 0.086.
5
𝑃(𝑌 = 1) = ( ) (0.086)1 × (0.914)4 ≈ 0.3
1

Example 3.4

In “𝑛” independent Bernoulli trials where each has the probability of success
𝑝, we get 𝑘 successes. What is the probability that the outcome of the first trial is a
success?
Solution.
A: The outcome of the first trial is a success
B: A total of 𝑘 successes is obtained
𝑛 − 1 𝑘−1 (𝑛−1)−(𝑘−1) 𝑛−1
𝑃(𝐴 ∩ 𝐵) 𝑃(𝐵|𝐴) 𝑃(𝐴) (𝑘 − 1) 𝑝 (1 − 𝑝) ×𝑝 ( )
𝑘−1 =𝑘
𝑃(𝐴|𝐵) = = = 𝑛 = 𝑛
𝑃(𝐵) 𝑃(𝐵) ( ) 𝑝𝑘 (1 − 𝑝)𝑛−𝑘 ( ) 𝑛
𝑘 𝑘

In fact, the above calculations show that when, in “𝑛” independent Bernoulli
trials, each of which results in a success with probability of success 𝑝, we know that
we have obtained 𝑘 successes, the probability of success for each of the trials like the
𝑘
first one is equal to 𝑛.

271 | P a g e
T he mean and variance of the binomial random variable is as follows:

𝐸(𝑋) = 𝑛𝑝
𝑉𝑎𝑟(𝑋) = 𝑛𝑝(1 − 𝑝)
To prove the above equalities, we first prove that if 𝑋 has a binomial
distribution with parameters 𝑛 and 𝑝 and 𝑌 has the binomial distribution with
parameters 𝑛 − 1 and 𝑝, then we have:
𝐸(𝑋 𝑘 ) = 𝑛𝑝𝐸((𝑌 + 1)𝑘−1 )
The proof of the above equality is as follows:
𝑛 𝑛 𝑛
𝑛 𝑛
𝐸(𝑋 𝑘 ) = ∑ 𝑖 𝑘 𝑃(𝑋 = 𝑖) = ∑ 𝑖 𝑘 ( ) 𝑝𝑖 (1 − 𝑝)𝑛−𝑖 = ∑ 𝑖 𝑘−1 𝑖 ( ) 𝑝𝑖 (1 − 𝑝)𝑛−𝑖
𝑖 𝑖
𝑖=0 𝑖=1 𝑖=1
𝑛 𝑛−1
Using the identity 𝑖 ( ) = 𝑛 ( ), we have:
𝑖 𝑖−1
𝑛 𝑛
𝑛 − 1 𝑖−1 (1 𝑛−1 𝑗
⇒ 𝐸(𝑋 𝑘 ) = 𝑛𝑝 ∑ 𝑖 𝑘−1 ( )𝑝 − 𝑝)𝑛−𝑖 = 𝑛𝑝 ∑(𝑗 + 1)𝑘−1 ( ) 𝑝 (1 − 𝑝)𝑛−𝑖
𝑖−1 𝑗
𝑖=1 𝑖=1
= 𝑛𝑝𝐸((𝑌 + 1)𝑘−1 )

Letting 𝑘 = 1, we get:
𝐸(𝑋) = 𝑛𝑝
Moreover, letting 𝑘 = 2 results in:
𝐸(𝑋 2 ) = 𝑛𝑝𝐸((𝑌 + 1)1 ) = 𝑛𝑝((𝑛 − 1)𝑝 + 1)
𝑉𝑎𝑟(𝑋) = 𝑛𝑝((𝑛 − 1)𝑝 + 1) − (𝑛𝑝)2 = 𝑛𝑝(1 − 𝑝)
The moment generating function of the binomial random variable is
determined as follows:
∞ ∞
𝑡𝑋 𝑡𝑥 𝑛 𝑛
𝑀𝑥 (𝑡) = 𝐸(𝑒 ) = ∑ 𝑒 ( ) 𝑝 𝑥 (1 − 𝑝)𝑛−𝑥 = ∑ ( ) (𝑝𝑒 𝑡 )𝑥 (1 − 𝑝)𝑛−𝑥 = (𝑝𝑒 𝑡 + 1 − 𝑝)𝑛
𝑥 𝑥
𝑥=0 𝑥=0

272 | P a g e
Where the last equality is obtained by using the binomial expansion or the
𝑛
identity ∑𝑛𝑖=0 ( ) 𝑥 𝑖 𝑦 𝑛−𝑖 = (𝑥 + 𝑦)𝑛 .
𝑖
In addition, its factorial moment generating function is as follows:
∞ ∞
𝑛 𝑛
𝜋𝑥 (𝑡) = 𝐸(𝑡 ) = ∑ 𝑡 ( ) 𝑝 𝑥 (1 − 𝑝)𝑛−𝑥 = ∑ ( ) (𝑝𝑡)𝑥 (1 − 𝑝)𝑛−𝑥 = (𝑝𝑡 + 1 − 𝑝)𝑛
𝑋 𝑥
𝑥 𝑥
𝑥=0 𝑥=0

However, as mentioned in the preceding chapter, it can be shown that if the


moment generating function of a variable is as 𝑀𝑥 (𝑡) = 𝐸(𝑒 𝑡𝑋 ) = (𝑝𝑒 𝑡 + 1 − 𝑝)𝑛 , its
factorial moment generating function is as 𝜋𝑥 (𝑡) = 𝐸(𝑡 𝑋 ) = (𝑝𝑡 + 1 − 𝑝)𝑛 .
To calculate the mean and variance of the binomial random variable, its
moment generating function can also be used as follows:

𝑑𝑀𝑥 (𝑡)
𝐸(𝑋) = | = 𝑛𝑝𝑒 𝑡 (𝑝𝑒 𝑡 + 1 − 𝑝)𝑛−1 |𝑡=0 = 𝑛𝑝
𝑑𝑡 𝑡=0
𝑑2 𝑀𝑥 (𝑡)
𝐸(𝑋 2 ) = |
𝑑𝑡 2 𝑡=0
= (𝑛(𝑛 − 1)(𝑝𝑒 𝑡 )2 (𝑝𝑒 𝑡 + 1 − 𝑝)𝑛−2 + 𝑛𝑝𝑒 𝑡 (𝑝𝑒 𝑡 + 1 − 𝑝)𝑛−1 ) |
𝑡=0
= 𝑛(𝑛 − 1)𝑝2 + 𝑛𝑝
⇒ 𝑉𝑎𝑟(𝑋) = 𝐸(𝑋 2 ) − [𝐸(𝑋)]2 = 𝑛𝑝(1 − 𝑝)

Proposition 3-1
If 𝑋 is a binomial random variable with parameters 𝑛 and 𝑝, then 𝑌 = 𝑛 − 𝑋 has the
binomial distribution with parameters 𝑛 and 1 − 𝑝.

Proof.
𝑛 𝑛
𝑃(𝑌 = 𝑖) = 𝑃(𝑛 − 𝑋 = 𝑖) = 𝑃(𝑋 = 𝑛 − 𝑖) = ( ) 𝑝𝑛−𝑖 𝑞 𝑖 = ( ) 𝑞 𝑖 𝑝𝑛−𝑖 ; 𝑖 = 0,1,2, … , 𝑛
𝑛−𝑖 𝑖
⇒ 𝑌 ∼ 𝐵(𝑛, 𝑞) ; 𝑞 = 1−𝑝
In other words, if the number of successes in “𝑛” independent Bernoulli trials has the
binomial distribution with parameters 𝑛 and 𝑝, then the number of failures has the
binomial distribution with parameters 𝑛 and 𝑞 = 1 − 𝑝.

273 | P a g e
Proposition 3-2
Suppose that 𝑋 is a binomial random variable with parameters 𝑛 and 𝑝. If (𝑛 + 1)𝑝 is
an integer, then the mode (the most likely value of the random variable) includes two
points (𝑛 + 1)𝑝 and (𝑛 + 1)𝑝 − 1. However, if (𝑛 + 1)𝑝 is not an integer, then the mode
is equal to the greatest integer less than or equal to (𝑛 + 1)𝑝.
Proof. Considering the probability function of the binomial distribution (Figure 6-1),
the least value of 𝑘 for which 𝑃(𝑋 = 𝑘 + 1) is less than or equal to 𝑃(𝑋 = 𝑘) leads to
the greatest probability. Therefore, the least value of 𝑘 with the following property is
the mode:
𝑛
𝑃(𝑘 + 1) ( ) 𝑝𝑘+1 (1 − 𝑝)𝑛−𝑘−1 (𝑛 − 𝑘)𝑝
≤1⇒ 𝑘 + 1 ≤1⇒ ≤1
𝑃(𝑘) 𝑛 (𝑘 + 1)(1 − 𝑝)
( ) 𝑝𝑘 (1 − 𝑝)𝑛−𝑘
𝑘
⇒ 𝑛𝑝 − 𝑘𝑝 ≤ 𝑘 − 𝑘𝑝 + 1 − 𝑝 ⇒ 𝑛𝑝 ≤ 𝑘 + 1 − 𝑝 ⇒ (𝑛 + 1)𝑝 ≤ 𝑘 + 1
Therefore, the first value of 𝑘 satisfying the above last inequality has the
following conditions:
If (𝑛 + 1) 𝑝 is an integer ⇒ mode = (𝑛 + 1) 𝑝 , (𝑛 + 1) 𝑝 − 1
If (𝑛 + 1) 𝑝 is not an integer ⇒ mode = [(𝑛 + 1) 𝑝]

Moreover, it can simply be shown that if, in the binomial distribution, two
consecutive points 𝑘 and 𝑘 + 1 have the same probability, then 𝑘 + 1 = (𝑛 + 1)𝑝, and
therefore, 𝑘 and 𝑘 + 1 are certainly the mode.

In the figure below, examples of the binomial distribution probability function


with different values of 𝑛 and 𝑝 are shown:

274 | P a g e
Figure 6-1 Examples of the binomial distribution probability function with
different values of 𝑛 and 𝑝

As seen in the above figures, in the binomial distribution, if 𝑝 < 0.5, the
distribution is skewed to the right, if 𝑝 > 0.5, the distribution is skewed to the left,
and if 𝑝 = 0.5, the distribution is symmetric; however, if 𝑝 ≠ 0.5, we show in Chapter
7 that as “𝑛” increases, the skewness of the distribution decreases.

C onsider a sequence of independent Bernoulli trials that each has the probability
of success 𝑝. If the random variable 𝑋 denotes the number of trials required to
get the first success, we call it the geometric random variable, whose probability
function is as follows:
𝑃(𝑋 = 𝑖) = (1 − 𝑝)𝑖−1 𝑝 ; 𝑖 = 1,2,3, …
Understanding the equality associated with the probability function of this
random variable is straightforward. This is because when the first success occurs at

275 | P a g e
the 𝑖 𝑡ℎ trial (𝑋 equals 𝑖), all of the first (𝑖 − 1) trials should result in failures and the 𝑖 𝑡ℎ
trial should be a success.
The probability function of the geometric distribution is depicted in Figure 6-
2:

Figure 6-2 The probability function of the geometric distribution

If 𝑋 has a geometric distribution with the probability of success 𝑝, we briefly


denote it by 𝑋 ∼ 𝐺(𝑝). Moreover, if 𝑋 has a geometric distribution, for the positive
integer 𝑎, we have:
∞ ∞
1 1
𝑃(𝑋 > 𝑎) = ∑ 𝑃(𝑋 = 𝑖) = ∑ 𝑞 𝑖−1 𝑝 = 𝑞 𝑎 𝑝 = 𝑞𝑎 𝑝 = 𝑞𝑎
1−𝑞 𝑝
𝑖=𝑎+1 𝑖=𝑎+1

However, the above probability can also be obtained by another way since
event 𝑋 > 𝑎 means that the first success does not occur until the 𝑎𝑡ℎ trial, which has
the probability of 𝑞 𝑎 = (1 − 𝑝)𝑎 .
Likewise, it can be shown that the value of the cumulative distribution function
of the geometric distribution for positive integer 𝑎 is equal to:
𝐹𝑋 (𝑎) = 𝑃(𝑋 ≤ 𝑎) = 1 − 𝑃(𝑋 > 𝑎) = 1 − 𝑞 𝑎

276 | P a g e
Example 4.1

A person rolls a fair die until he gets the first 6. If 𝑋 denotes the number of
trials needed to get the first 6. It is desired to calculate:
a. 𝑃(𝑋 = 4)
b. 𝑃(𝑋 < 5)
c. 𝑃(𝑋 ≥ 6)
Solution.
a.
5 1
𝑃(𝑋 = 4) = ( )3 ( )
6 6
b.
5
𝑃(𝑋 < 5) = 𝑃(𝑋 ≤ 4) = 1 − 𝑞 4 = 1 − ( )4
6
c.
5
𝑃(𝑋 ≥ 6) = 𝑃(𝑋 > 5) = 𝑞 5 = ( )5
6

T he mean and variance of the geometric distribution is equal to:

∞ ∞ ∞
𝑖−1
𝑝 𝑝 1−𝑝 1
𝐸(𝑋) = ∑ 𝑖𝑃(𝑋 = 𝑖) = ∑ 𝑖(1 − 𝑝) 𝑝= ∑ 𝑖(1 − 𝑝)𝑖 = 2
=
1−𝑝 1 − 𝑝 (1 − (1 − 𝑝)) 𝑝
𝑖=1 𝑖=1 𝑖=1
∞ ∞ ∞
𝑝 𝑝 (1 − 𝑝)(2 − 𝑝) 2 − 𝑝
𝐸(𝑋 2 ) = ∑ 𝑖 2 𝑃(𝑋 = 𝑖) = ∑ 𝑖 2 (1 − 𝑝)𝑖−1 𝑝 = ∑ 𝑖 2 (1 − 𝑝)𝑖 = = 2
1−𝑝 1−𝑝 𝑝3 𝑝
𝑖=1 𝑖=1 𝑖=1
2−𝑝 1 1−𝑝
⇒ 𝑉𝑎𝑟(𝑋) = 𝐸(𝑋 2 ) − [𝐸(𝑋)]2 = 2
− ( )2 =
𝑝 𝑝 𝑝2
Where the following equalities are used:

277 | P a g e
∞ ∞ ∞ ∞
𝑖 𝑖−1
𝑑𝑞 𝑖 𝑑 𝑑 𝑞 𝑞
∑ 𝑖𝑞 = 𝑞 ∑ 𝑖𝑞 = 𝑞∑ =𝑞 ∑ 𝑞𝑖 = 𝑞 = ; 0<𝑞<1
𝑑𝑞 𝑑𝑞 𝑑𝑞 1 − 𝑞 (1 − 𝑞)2
𝑖=1 𝑖=1 𝑖=1 𝑖=1
∞ ∞ ∞ ∞
𝑑 𝑑 𝑑 𝑞 𝑞(1 + 𝑞)
∑ 𝑖 2 𝑞 𝑖 = 𝑞 ∑ 𝑖 2 𝑞𝑖−1 = 𝑞 ∑ 𝑖𝑞 𝑖 = 𝑞 ∑ 𝑖𝑞 𝑖 = 𝑞 = ; 0<𝑞<1
𝑑𝑞 𝑑𝑞 𝑑𝑞 (1 − 𝑞)2 (1 − 𝑞)3
𝑖=1 𝑖=1 𝑖=1 𝑖=1

However, the expected value of the geometric distribution can be obtained by


the following formula as well.
∞ ∞ ∞
1 1
𝐸(𝑋) = ∑ 𝑃(𝑋 ≥ 𝑖) = ∑ 𝑃(𝑋 > 𝑖) = ∑ 𝑞 𝑖 = =
1−𝑞 𝑝
𝑖=1 𝑖=0 𝑖=0

The moment generating function of this random variable is determined as


follows:
∞ ∞
𝑡𝑋 𝑡𝑥 𝑥−1
𝑝 𝑝 𝑞𝑒 𝑡 𝑝𝑒 𝑡
𝑀𝑥 (𝑡) = 𝐸(𝑒 ) = ∑ 𝑒 𝑞 𝑝 = ∑(𝑞𝑒 𝑡 )𝑥 = = ; 𝑞𝑒 𝑡 < 1
𝑞 𝑞 1 − 𝑞𝑒 𝑡 1 − 𝑞𝑒 𝑡
𝑥=1 𝑥=1

Likewise, it can be shown that its factorial moment generating function is as


follows:
𝑝𝑡
𝜋𝑥 (𝑡) = 𝐸(𝑡 𝑋 ) =
1 − 𝑞𝑡

And to calculate 𝐸(𝑋) and 𝐸(𝑋 2 ) of the random variable, we can differentiate
the moment generating function once and twice with respect to 𝑡, respectively, and
then let 𝑡 = 0.
𝑑𝑀𝑥 (𝑡) 𝑝𝑒 𝑡 𝑝 𝑝 1
𝐸(𝑋) = | = | = = =
𝑑𝑡 𝑡 = 0 (1 − 𝑞𝑒 𝑡 )2 𝑡 = 0 (1 − 𝑞)2 𝑝2 𝑝
𝑑2 𝑀𝑥 (𝑡) 𝑝𝑒 𝑡 − 𝑝𝑒 3𝑡 + 2𝑝2 𝑒 3𝑡 − 𝑝3 𝑒 3𝑡 2−𝑝
𝐸(𝑋 2 ) = 2
| = 𝑡 4
| =
𝑑𝑡 𝑡=0 (1 − 𝑞𝑒 ) 𝑡=0 𝑝2

278 | P a g e
Example 4.2

On average, how many times should a pair of fair dice be rolled until two 6's
occurs?
Solution. The number of rolls needed to get the two 6's has a geometric distribution
1
with the parameter or the probability of success . Therefore, its expected value is
36
equal to:
1 1
= = 36
𝑝 1
36

Example 4.3

There are “𝑎” white and “𝑏” black marbles in a bag. If we keep withdrawing
marbles with replacement from the bag until one white marble is obtained, on
average, how many marbles should be withdrawn?
Solution. The number of marbles required to get the first white marble is simply the
number of independent Bernoulli trials required to get the first success. Therefore,
its distribution is geometric and since the probability of success for each trial is 𝑝 =
𝑎 1 𝑎+𝑏
, its mean is equal to 𝑝 = .
𝑎+𝑏 𝑎

A Geometric distribution possesses a property called memoryless. This property


of the geometric distribution means that if we do not get the first success until
the first “𝑏” trials, the probability that we do not get the success until the trial 𝑎 + 𝑏
is equal to the probability of the event in which we do not get the success until the

279 | P a g e
first “𝑎” trials. That is, knowing that we do not get the success in the first “𝑏” trials
does not affect the probability that we do not get the success in the next “𝑎” trials.
This property, for real numbers “𝑎” and “𝑏”, can be shown as follows:

𝑃(𝑋 > 𝑎 + 𝑏 ∩ 𝑋 > 𝑏) 𝑃(𝑋 > 𝑎 + 𝑏) (1 − 𝑝)𝑏+𝑎


𝑃(𝑋 > 𝑎 + 𝑏|𝑋 > 𝑏) = = = = (1 − 𝑝)𝑎
𝑃(𝑋 > 𝑏) 𝑃(𝑋 > 𝑏) (1 − 𝑝)𝑏

Example 4.4

Suppose that there are 11 coins in an urn, one of which is gold. If we select one
coin randomly and successively from this urn, then it is desired to obtain the
probability function of the number of choices until getting the gold in the following
conditions:
a. Choices are with replacement.
b. Choices are without replacement.
Solution.
a. For choices with replacement, since trials are independent, it is evident that the number of trials until getting the success has a geometric distribution. Besides, since the probability of success in each trial is independently equal to 1/11, its probability function is as follows:
$$P(X=i)=\left(\frac{10}{11}\right)^{i-1}\times\frac{1}{11};\quad i=1,2,3,\dots$$
b. For choices without replacement, as mentioned in Chapter 3, trials are
not independent; thereby, the number of trials until getting the success
does not follow the geometric distribution. The number of choices
without replacement until getting the gold takes on integers from 1 to
11, and its probability function is as follows:

$$P(X=1)=\frac{1}{11}$$
$$P(X=2)=\frac{10}{11}\times\frac{1}{10}=\frac{1}{11}$$
$$P(X=3)=\frac{10}{11}\times\frac{9}{10}\times\frac{1}{9}=\frac{1}{11}$$
$$\vdots$$
$$P(X=11)=\frac{10}{11}\times\frac{9}{10}\times\frac{8}{9}\times\dots\times\frac{1}{1}=\frac{1}{11}$$
$$\Rightarrow P(X=i)=\frac{1}{11};\quad i=1,2,\dots,11$$
For instance, P(X = 3) means that the first and second trials are failures and the third trial is a success, which has the probability of (10/11) × (9/10) × (1/9) = 1/11.

Example 4.5

In the preceding problem, obtain the values of 𝑃(𝑋 > 2) and 𝑃(𝑋 > 2 + 3|𝑋 >
3) under two conditions of with replacement and without replacement.
Solution. As mentioned, for the case with replacement, since the trials are independent, the number of trials until getting the success in this problem follows a geometric distribution with parameter p = 1/11. Hence, we have:

$$P(X>2)=(1-p)^{2}=\left(\frac{10}{11}\right)^{2}$$
$$P(X>2+3\mid X>3)=\frac{P(X>5,X>3)}{P(X>3)}=\frac{P(X>5)}{P(X>3)}=\frac{(10/11)^{5}}{(10/11)^{3}}=\left(\frac{10}{11}\right)^{2}=P(X>2)$$
As seen, in this case, the value of 𝑃(𝑋 > 2 + 3|𝑋 > 3) is equal to that of 𝑃(𝑋 >
2) resulting from the memoryless property of geometric distribution.
For the case in which the choices are without replacement, (𝑋 > 2) means that
the number of trials until getting the first success is more than two. That is, the first
and second trials are failures. Therefore, we have:

$$P(X>2)=\frac{10}{11}\times\frac{9}{10}=\frac{9}{11}$$
Moreover, to calculate P(X > 2+3 | X > 3), we have:
$$P(X>2+3\mid X>3)=\frac{P(X>5,X>3)}{P(X>3)}=\frac{P(X>5)}{P(X>3)}=\frac{6/11}{8/11}=\frac{6}{8}\ne P(X>2)$$
When we know that the event X > 3 has occurred, it means that the first three trials are failures. In such a case, eight coins remain in the urn and one of them is gold. Accordingly, the probability that the next two choices are not gold can also be written as (7/8) × (6/7) = 6/8.

As seen, in this case, the value of 𝑃(𝑋 > 2 + 3|𝑋 > 3) is not equal to that of
𝑃(𝑋 > 2). The reason is that the trials are dependent in this case and the number
of trials until getting the first success does not follow the geometric distribution.

Sometimes analyzing the number of failures before getting the first success matters rather than the number of trials until getting the first success. In independent Bernoulli trials with identical parameter p, the number of failures before getting the first success is called the geometric distribution of failure type. If this random variable is denoted by Y, then we have:
$$P(Y=j)=(1-p)^{j}p;\quad j=0,1,2,3,\dots$$
Since the number of failures before getting the first success is j (Y equals j), the first j trials are all failures and the (j+1)-th should result in a success, which has the probability of (1−p)^j p.

It is evident that if, in independent Bernoulli trials with identical parameter p, variable X denotes the number of trials until getting the first success and variable Y denotes the number of failures before getting the first success, the equality Y = X − 1 is valid between them. Hence, for the properties of the geometric random variable of failure type, we have:
$$P(Y>a)=P(X-1>a)=P(X>a+1)=q^{a+1}$$
$$E(Y)=E(X-1)=\frac{1}{p}-1=\frac{1-p}{p}$$
$$Var(Y)=Var(X-1)=Var(X)=\frac{1-p}{p^{2}}$$
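For readers who use software, note that libraries often parameterize the geometric distribution in exactly this failure-type form. The sketch below (an added illustration, assuming scipy is available) checks numerically that scipy's trial-type geom and the failure-type pmf, obtained as a negative binomial with r = 1, agree after the shift Y = X − 1:

```python
from scipy.stats import geom, nbinom

p = 1/6
for j in range(5):
    trial_type   = geom.pmf(j + 1, p)    # P(X = j+1): trials until the first success
    failure_type = nbinom.pmf(j, 1, p)   # P(Y = j): failures before the first success
    print(j, trial_type, failure_type)   # the two columns agree: (1-p)^j * p
```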

Consider a sequence of independent Bernoulli trials, each of which has the probability of success p. Suppose that the random variable X denotes the number of trials until getting the r-th success, which is called the negative binomial random variable with parameters r and p. In this case, the probability function of this random variable is as follows:
$$P(X=i)=\binom{i-1}{r-1}p^{r}(1-p)^{i-r};\quad i=r,r+1,r+2,\dots$$

To explain the procedure to obtain the probability function of the negative binomial distribution, we know that since the r-th success occurs at the i-th trial, there should be (r − 1) successes among the first (i − 1) trials, and the i-th trial itself should certainly result in a success. Therefore, the probability that the number of trials until getting the r-th success equals i is:
$$P(X=i)=\binom{i-1}{r-1}p^{r-1}(1-p)^{(i-1)-(r-1)}\times p=\binom{i-1}{r-1}p^{r}(1-p)^{i-r}$$

If X has a negative binomial distribution with parameters r and p, we briefly denote it by X ~ NB(r, p).
Note that the geometric random variable is a special case of the negative binomial random variable in which the value of r is equal to 1.
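As a numerical illustration (added here; the original text does not use software), the pmf above can be evaluated directly and cross-checked against scipy, whose nbinom counts failures before the r-th success rather than trials, hence the shift by r:

```python
from math import comb
from scipy.stats import nbinom

def nb_trials_pmf(i, r, p):
    """P(X = i) where X is the number of trials until the r-th success."""
    return comb(i - 1, r - 1) * p**r * (1 - p)**(i - r)

r, p = 2, 1/6
for i in range(r, r + 5):
    print(i, nb_trials_pmf(i, r, p), nbinom.pmf(i - r, r, p))  # columns agree
```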

Example 5.1

If we successively roll a fair die, what is the probability that the second six
appears at the fifth roll?
Solution. Suppose that 𝑋 denotes the number of trials required to get the second six.
Therefore, we have:

$$P(X=5)=\binom{4}{1}\left(\frac{1}{6}\right)^{2}\left(\frac{5}{6}\right)^{3}$$

Example 5.2

In the preceding example, what is the probability that the second six does not
appear until the tenth roll?
Solution.
$$P(X>10)=1-P(X\le 10)=1-\sum_{i=2}^{10}\binom{i-1}{2-1}\left(\frac{1}{6}\right)^{2}\left(\frac{5}{6}\right)^{i-2}$$

However, there is another solution to this problem. Saying that the second six does not appear by the tenth roll means that the second six does not occur in the first 10 rolls. That is, the face 6 comes up at most once in the first 10 rolls, the probability of which is equal to:

$$\binom{10}{0}\left(\frac{1}{6}\right)^{0}\left(\frac{5}{6}\right)^{10}+\binom{10}{1}\left(\frac{1}{6}\right)^{1}\left(\frac{5}{6}\right)^{9}$$
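The two solutions can be confirmed to agree numerically; a small sketch (added for illustration) computes both:

```python
from math import comb

p, q = 1/6, 5/6

# First solution: complement of the negative binomial (r = 2) pmf summed over 2..10
first = 1 - sum(comb(i - 1, 1) * p**2 * q**(i - 2) for i in range(2, 11))

# Second solution: at most one six in the first 10 rolls
second = comb(10, 0) * q**10 + comb(10, 1) * p * q**9

print(first, second)   # both ≈ 0.4845
```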

Example 5.3

In independent Bernoulli trials with identical parameter p, if we know that the third success occurs on the 10th trial, then it is desired to obtain:
a. The probability that the first success occurs on the fourth trial and the
second success occurs on the seventh trial.
b. The probability that the first success occurs on the fourth trial.
c. The probability that the fourth trial results in a success.

Solution. Suppose that events A, E, F, and G are defined as follows:
A: The third success occurs on the 10th trial.
E: The first success occurs on the fourth trial and the second success occurs on the seventh trial.
F: The first success occurs on the fourth trial.
G: The fourth trial results in a success.
a.
$$P(E\mid A)=\frac{P(E\cap A)}{P(A)}=\frac{(1-p)^{3}p(1-p)^{2}p(1-p)^{2}p}{\binom{10-1}{3-1}p^{3}(1-p)^{10-3}}=\frac{1}{\binom{9}{2}}=\frac{1}{36}$$
Note that (𝐸 ∩ 𝐴) means that the first success occurs on the fourth trial,
the second success occurs on the seventh trial, and the third success
occurs on the 10th trial.
b.
$$P(F\mid A)=\frac{P(F\cap A)}{P(A)}=\frac{(1-p)^{3}p\binom{5}{1}p(1-p)^{4}p}{\binom{10-1}{3-1}p^{3}(1-p)^{10-3}}=\frac{\binom{5}{1}}{\binom{9}{2}}=\frac{5}{36}$$
(𝐹 ∩ 𝐴) means that the first success occurs on the fourth trial and the
third success occurs on the 10th trial. In this case, the second success
can occur on each of the fifth to ninth trials.

c.
$$P(G\mid A)=\frac{P(G\cap A)}{P(A)}=\frac{p\binom{8}{1}p^{1}(1-p)^{7}p}{\binom{10-1}{3-1}p^{3}(1-p)^{10-3}}=\frac{\binom{8}{1}}{\binom{9}{2}}=\frac{8}{36}=\frac{2}{9}$$
(𝐺 ∩ 𝐴) means that the fourth trial results in a success and the third
success occurs on the tenth trial, and among the other eight trials we
need only one success.

The mean and variance of the negative binomial random variable are as follows:
$$E(X)=\frac{r}{p}$$
$$Var(X)=\frac{r(1-p)}{p^{2}}$$
To prove the above equalities, we first prove that if X has a negative binomial distribution with parameters r and p and Y has a negative binomial distribution with parameters r + 1 and p, then we have:
$$E(X^{k})=\frac{r}{p}E\big((Y-1)^{k-1}\big)$$
The proof of the above equation is as follows:
$$E(X^{k})=\sum_{i=r}^{\infty}i^{k}P(X=i)=\sum_{i=r}^{\infty}i^{k}\binom{i-1}{r-1}p^{r}(1-p)^{i-r}$$
$$=\frac{r}{p}\sum_{i=r}^{\infty}i^{k-1}\binom{i}{r}p^{r+1}(1-p)^{i-r}\qquad\left[\text{using } i\binom{i-1}{r-1}=r\binom{i}{r}\right]$$
$$=\frac{r}{p}\sum_{j=r+1}^{\infty}(j-1)^{k-1}\binom{j-1}{r}p^{r+1}(1-p)^{j-(r+1)}=\frac{r}{p}E\big((Y-1)^{k-1}\big)$$

Letting k = 1, we have:
$$E(X)=\frac{r}{p}$$
Moreover, letting k = 2, we get:
$$E(X^{2})=\frac{r}{p}E(Y-1)=\frac{r}{p}\left(\frac{r+1}{p}-1\right)$$
$$Var(X)=\frac{r}{p}\left(\frac{r+1}{p}-1\right)-\left(\frac{r}{p}\right)^{2}=\frac{r(1-p)}{p^{2}}$$
In Chapter 9, we will prove that the moment generating function and factorial
moment generating function of the negative binomial random variable are as follows:

$$M_{X}(t)=E(e^{tX})=\left(\frac{pe^{t}}{1-qe^{t}}\right)^{r}$$
$$\pi_{X}(t)=E(t^{X})=\left(\frac{pt}{1-qt}\right)^{r}$$
And to compute 𝐸(𝑋) and 𝐸(𝑋 2 ) of the random variable, we can differentiate
the moment generating function once and twice with respect to 𝑡, respectively, and
then let 𝑡 = 0.

$$E(X)=\frac{dM_{X}(t)}{dt}\bigg|_{t=0}=r\left(\frac{pe^{t}}{1-qe^{t}}\right)^{r-1}\frac{pe^{t}}{(1-qe^{t})^{2}}\bigg|_{t=0}=r\frac{p}{(1-q)^{2}}=r\frac{p}{p^{2}}=\frac{r}{p}$$
$$E(X^{2})=\frac{d^{2}M_{X}(t)}{dt^{2}}\bigg|_{t=0}=\left[r(r-1)\left(\frac{pe^{t}}{1-qe^{t}}\right)^{r-2}\left(\frac{pe^{t}}{(1-qe^{t})^{2}}\right)^{2}+r\left(\frac{pe^{t}}{1-qe^{t}}\right)^{r-1}\frac{pe^{t}-pe^{3t}+2p^{2}e^{3t}-p^{3}e^{3t}}{(1-qe^{t})^{4}}\right]\Bigg|_{t=0}$$
$$=r(r-1)\frac{1}{p^{2}}+r\frac{2-p}{p^{2}}=\frac{r}{p}\left(\frac{r+1}{p}-1\right)$$
$$\Rightarrow Var(X)=E(X^{2})-[E(X)]^{2}=\frac{r}{p}\left(\frac{r+1}{p}-1\right)-\left(\frac{r}{p}\right)^{2}=\frac{r(1-p)}{p^{2}}$$
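These two formulas can also be checked by simulation (an added illustration, using numpy's failure-count parameterization plus the shift by r):

```python
import numpy as np

rng = np.random.default_rng(1)
r, p = 3, 0.3
# trials until the r-th success = r + (failures before the r-th success)
x = r + rng.negative_binomial(r, p, size=1_000_000)

print(x.mean(), r / p)               # both ≈ 10
print(x.var(), r * (1 - p) / p**2)   # both ≈ 23.33
```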

Example 5.4

In the trial of rolling a fair die, obtain the mean and variance of the number of
rolls required to get a 6 for the 6𝑡ℎ time.
Solution. Suppose that X is a random variable denoting the number of rolls until getting a 6 for the 6th time. In this case, X has a negative binomial distribution with parameters p = 1/6 and r = 6. Hence, its mean and variance are equal to:
$$E(X)=\frac{r}{p}=36$$
$$Var(X)=\frac{r(1-p)}{p^{2}}=180$$

Example 5.5

We successively repeat independent Bernoulli trials with the identical probability of success p = 0.3 until getting the third success. Each success results in a 10-dollar prize and each failure results in a 5-dollar penalty. What are the mean and variance of the profit value obtained by playing such a game?
Solution. If the profit value resulting from this game is denoted by 𝑌, and the number
of trials until getting the third success is denoted by 𝑋, it is evident that 𝑌 is a function
of 𝑋 as follows:
𝑌 = (3 × 10) − 5(𝑋 − 3) = 45 − 5𝑋
Hence, since 𝑋 is a negative binomial random variable with parameters
(𝑟 = 3, 𝑝 = 0.3), the mean and variance of 𝑌 are equal to:
$$E(Y)=45-5E(X)=45-5\times\frac{r}{p}=45-5\times\frac{3}{0.3}=-5$$

$$Var(Y)=25Var(X)=25\times\frac{r(1-p)}{p^{2}}=25\times\frac{3\times 0.7}{(0.3)^{2}}=\frac{1750}{3}$$
Considering the above values, are you willing to participate in the above game?
Note that the negative binomial distribution does not have the memoryless property. That is, for the negative binomial distribution it can be shown that:
$$P(X=a+n\mid X>n)\ne P(X=a)$$
In other words, in independent Bernoulli trials with the identical probability of success p, knowing that we have not obtained the r-th success in the first n trials, the probability that after those "n" trials it takes us another "a" trials to get the r-th success depends on the number of successes obtained in the first n trials. Hence, the expression P(X = a+n | X > n) is not equal to P(X = a).
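A quick numerical counterexample (added for illustration) makes the lack of memorylessness concrete:

```python
from math import comb

def nb_pmf(i, r, p):
    """P(X = i): trials until the r-th success."""
    return comb(i - 1, r - 1) * p**r * (1 - p)**(i - r) if i >= r else 0.0

r, p, a, n = 2, 0.5, 3, 4
tail = sum(nb_pmf(i, r, p) for i in range(n + 1, 500))   # P(X > n), truncated sum
print(nb_pmf(a + n, r, p) / tail)   # P(X = a+n | X > n) = 0.15
print(nb_pmf(a, r, p))              # P(X = a) = 0.25, clearly different
```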

In independent Bernoulli trials with the identical probability of success p, the number of failures before getting the r-th success is called the negative binomial random variable of failure type (the negative binomial random variable of the second type). If this random variable is denoted by Y, then its probability function is as follows:
$$P(Y=j)=\binom{j+r-1}{r-1}p^{r}(1-p)^{j};\quad j=0,1,2,\dots$$
Note that since the number of failures before getting the r-th success should be equal to j (Y = j), there should be (r − 1) successes and j failures among the first (r + j − 1) trials, and the (r + j)-th trial should certainly result in a success.
It is evident that if, in independent Bernoulli trials with the identical
probability of success 𝑝, the random variable 𝑋 denotes the number of trials until
getting the 𝑟 𝑡ℎ success and the random variable 𝑌 denotes the number of failures
before getting the 𝑟 𝑡ℎ success, then the relationship 𝑌 = 𝑋 − 𝑟 is valid between them.
Therefore, we have:
$$E(Y)=E(X)-r=\frac{r}{p}-r=\frac{r(1-p)}{p}$$
$$Var(Y)=Var(X)=\frac{r(1-p)}{p^{2}}$$

In performing independent Bernoulli trials, each having the probability of success p, to calculate the probability that the n-th success occurs before the m-th failure, two different approaches can be adopted. In the first approach, we use the negative binomial distribution. In this approach, the event that the n-th success occurs before the m-th failure is equivalent to the event that the n-th success occurs before the (n+m)-th trial. That is, the n-th success occurs up to the (n+m−1)-th trial. This expression means that the negative binomial distribution with parameters (r = n, p) takes on one of the integers from n to (n+m−1), which has the probability of:

$$\sum_{k=n}^{m+n-1}\binom{k-1}{n-1}p^{n}(1-p)^{k-n}$$

Moreover, the probability that we get the n-th success by the (n+m−1)-th trial means that we get at least n successes among the (n+m−1) trials. Hence, the other approach to solve this problem is to use the binomial distribution such that the probability of the desired event is equivalent to that of the event in which a binomial distribution with parameters (n+m−1, p) takes on one of the integers from n to (n+m−1). As a result, the probability of this event can be written as follows:

$$\sum_{k=n}^{m+n-1}\binom{m+n-1}{k}p^{k}(1-p)^{m+n-1-k}$$

This problem is called the problem of points.

Example 5.6

Suppose that people A and B play together. In each play, independent of the preceding times, A and B win with respective probabilities 1/3 and 2/3. What is the probability that the fifth win of person A occurs before the second win of person B?
Solution. The fifth win of person A occurring before the second win of person B means that the fifth success of person A occurs before his second failure. That is, the fifth success of person A should occur before the seventh trial. In other words, the fifth success should be obtained up to the sixth trial. Such an event means that a negative binomial distribution with parameters (r = 5, p = 1/3) should adopt one of the integers 5 or 6, which has the following probability:

$$\binom{5-1}{5-1}\left(\frac{1}{3}\right)^{5}\left(\frac{2}{3}\right)^{0}+\binom{6-1}{5-1}\left(\frac{1}{3}\right)^{5}\left(\frac{2}{3}\right)^{1}=\frac{13}{3^{6}}$$
Moreover, the probability that we get the fifth success by the sixth trial is equivalent to getting at least 5 successes among the first six trials. The probability of the desired event is equal to that of the event in which a binomial distribution with parameters (n = 6, p = 1/3) adopts one of the integers 5 or 6. Hence, the probability of this event is equal to:
$$\binom{6}{5}\left(\frac{1}{3}\right)^{5}\left(\frac{2}{3}\right)^{1}+\binom{6}{6}\left(\frac{1}{3}\right)^{6}\left(\frac{2}{3}\right)^{0}=\frac{13}{3^{6}}$$
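Both formulas of the problem of points are easy to evaluate programmatically; the sketch below (added for illustration) computes them for arbitrary n, m, p and reproduces 13/3⁶ ≈ 0.01783 for this example:

```python
from math import comb

def nth_success_before_mth_failure(n, m, p):
    q = 1 - p
    # negative binomial approach: n-th success occurs by trial n+m-1
    nb = sum(comb(k - 1, n - 1) * p**n * q**(k - n) for k in range(n, n + m))
    # binomial approach: at least n successes in n+m-1 trials
    bi = sum(comb(n + m - 1, k) * p**k * q**(n + m - 1 - k) for k in range(n, n + m))
    return nb, bi

print(nth_success_before_mth_failure(5, 2, 1/3))   # (0.01783..., 0.01783...)
```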

The Poisson distribution was introduced by the well-known mathematician Siméon Denis Poisson in 1837. He introduced a method to approximate the probability of the binomial distribution, which comes in handy when the number of trials is large (n → ∞), the probability of success is small (p → 0), and the average number of successes remains a fixed quantity. The importance of the Poisson distribution remained unknown until 1889, when the German-Russian mathematician Bortkiewicz showed that this formula has significant applications and can be used to model many real phenomena in nature. Hence, the Poisson distribution is known as one of the most widely used probabilistic distributions in the 20th century.

Definition
Poisson random variable: The random variable X taking on one of the integers 0, 1, 2, … is called a Poisson random variable with parameter λ, λ > 0, whenever:
$$P(X=i)=\frac{e^{-\lambda}\lambda^{i}}{i!};\quad i=0,1,2,\dots$$
If X is a Poisson random variable with parameter λ, we briefly denote it by X ~ P(λ).
As mentioned, one of the applications of this distribution is to approximate the
binomial distribution under the condition that the number of trials is large (𝑛 → ∞),
the probability of success is small (𝑝 → 0), and the average number of successes
remains constant.
Suppose that 𝑋 is a binomial random variable with parameters 𝑛 and 𝑝 such
that 𝑛𝑝 is not a small quantity. If 𝜆 = 𝑛𝑝 is defined, then we have:

$$P(X=i)=\binom{n}{i}p^{i}(1-p)^{n-i}=\frac{n!}{i!(n-i)!}\left(\frac{\lambda}{n}\right)^{i}\left(1-\frac{\lambda}{n}\right)^{n-i}=\frac{n(n-1)\cdots(n-i+1)}{n^{i}}\cdot\frac{\lambda^{i}}{i!}\cdot\frac{\left(1-\frac{\lambda}{n}\right)^{n}}{\left(1-\frac{\lambda}{n}\right)^{i}}$$

Now, for large values of n such that λ is comparatively smaller than n, we have:
$$\left(1-\frac{\lambda}{n}\right)^{n}\approx e^{-\lambda},\qquad\frac{n(n-1)\cdots(n-i+1)}{n^{i}}\approx 1,\qquad\left(1-\frac{\lambda}{n}\right)^{i}\approx 1$$
Therefore, we have:
$$P(X=i)\approx\frac{e^{-\lambda}\lambda^{i}}{i!}$$

In a simpler case, if the conditions of 𝑝 ≤ 0.1 and 𝑛 ≥ 10 are valid in the
binomial distribution, then the desired probabilities for the binomial distribution can
be approximated by the above probability function such that 𝜆 = 𝑛𝑝.

Example 6.1

Suppose that a factory sells its pieces in lots of size 20 and each piece is
defective with probability 0.05. It is desired to calculate the probability that there are
less than three defective pieces in a lot produced by the factory.
Solution. Suppose that 𝑋 denotes the number of defective pieces in lots of size 20.
Therefore, we have:

$$P(X<3)=P(X=0)+P(X=1)+P(X=2)$$
$$=\binom{20}{0}(0.05)^{0}(0.95)^{20}+\binom{20}{1}(0.05)^{1}(0.95)^{19}+\binom{20}{2}(0.05)^{2}(0.95)^{18}$$
$$=0.358+0.377+0.189=0.924$$

In this example, if we use the Poisson approximation, the value of λ is equal to 20 × 0.05 = 1. Hence, the desired probability can be approximated as follows:
$$P(X=0)+P(X=1)+P(X=2)\approx\frac{e^{-1}\times 1^{0}}{0!}+\frac{e^{-1}\times 1^{1}}{1!}+\frac{e^{-1}\times 1^{2}}{2!}=0.368+0.368+0.184=0.920$$

As seen, the original values and the ones approximated by the Poisson
distribution are close to each other.
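The comparison extends to the whole support; a short sketch (added for illustration, using only the standard library) tabulates both pmfs side by side:

```python
from math import comb, exp, factorial

n, p = 20, 0.05
lam = n * p

for i in range(4):
    binom_pmf   = comb(n, i) * p**i * (1 - p)**(n - i)
    poisson_pmf = exp(-lam) * lam**i / factorial(i)
    print(i, round(binom_pmf, 3), round(poisson_pmf, 3))   # very close columns
```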

Example 6.2

If the probability of having side effects from a special serum is 0.005 for each person, what is the probability that at most 2 out of 1000 people bear such a side effect?
Solution. If the random variable 𝑋 denotes the number of people bearing this side
effect, then this random variable has a binomial distribution with parameters 𝑝 =
0.005 and 𝑛 = 1000. Since the value of 𝑛 is large and the value of 𝑝 is small, this
distribution can be approximated by a Poisson distribution with parameter 𝜆 = 𝑛𝑝 =
5. Accordingly, we have:
$$P(X\le 2)\approx\frac{e^{-5}5^{0}}{0!}+\frac{e^{-5}5^{1}}{1!}+\frac{e^{-5}5^{2}}{2!}=\frac{37}{2}e^{-5}$$
As mentioned before, the Poisson random variable was first presented to approximate the binomial distribution. However, as time went by, it was used for other applications as well. Suppose that we want to model the number of daily customers entering a store by using a random variable. At first glance, this random variable is not similar to the binomial random variable, but if we divide the
time interval (namely, the day) into disjoint and very small subintervals such that
there is zero or one customer in each of them, then the number of customers per
day is equal to the number of subintervals in which we have one occurrence (the
customer entrance). In such a case, if the entrance of customers in the subintervals
is assumed to be independent, and the probability of customer entrance given each
subinterval is assumed to be the same and equal to 𝑝, then the number of customers
per day follows a binomial distribution. It is reasonable that as we increase the
number of subintervals per day, 𝑝 or the probability of the customer entrance in each
subinterval decreases. Furthermore, if 𝜆 = 𝑛𝑝 can be assumed to be a fixed quantity
for them, this binomial random variable can be modeled by the Poisson random
variable.
Likewise, it can be shown that there are many random variables that the
number of their events in a time unit or place unit can be modeled by the Poisson
random variable. Some examples of such a random variable are as follows:

1. The number of customers entering a bank per day.
2. The number of wrong numbers received by a person in a month.
3. The number of earthquakes occurring during a given year in a particular area.
4. The number of daily accidents occurring at an intersection.
5. The number of patients visiting an emergency center per day.
6. The number of scratches on one square meter of a cloth produced by a factory.

To compute the mean and variance of the Poisson random variable, we can say that this variable is approximately a binomial distribution with parameters n and p in which n is large, p is small, and λ equals np. Hence, it seems that its mean and variance are np = λ and np(1 − p) = λ(1 − p) ≈ λ, respectively. Now, to show these
results, suppose that 𝑋 has a Poisson distribution with parameter 𝜆. Therefore, we
have:

$$E(X^{k})=\sum_{i=0}^{\infty}i^{k}P(X=i)=\sum_{i=0}^{\infty}i^{k}\frac{e^{-\lambda}\lambda^{i}}{i!}=\sum_{i=1}^{\infty}i^{k}\frac{e^{-\lambda}\lambda^{i}}{i!}=\lambda\sum_{i=1}^{\infty}i^{k-1}\frac{e^{-\lambda}\lambda^{i-1}}{(i-1)!}=\lambda\sum_{j=0}^{\infty}(j+1)^{k-1}\frac{e^{-\lambda}\lambda^{j}}{j!}=\lambda E\big((X+1)^{k-1}\big)$$
Accordingly, the mean and variance of the Poisson distribution can be calculated as follows:
$$E(X)=\lambda$$
$$E(X^{2})=\lambda E\big((X+1)^{2-1}\big)=\lambda(\lambda+1)$$
$$\Rightarrow Var(X)=E(X^{2})-[E(X)]^{2}=(\lambda+\lambda^{2})-\lambda^{2}=\lambda$$
As seen, the mean of the Poisson distribution is equal to λ, which is usually called the rate or the expected number of events per unit of time or space. Moreover, in practice it is often estimated empirically.

Example 6.3

Suppose that the number of cars per day ignoring the no entry sign on a street follows a Poisson distribution with a daily mean of 4 events. It is desired to calculate the probability that at most two cars ignore the no entry sign on a given day.
Solution. Suppose that 𝑋 denotes the number of cars per day, ignoring the no entry
sign. Therefore, we have:
$$P(X\le 2)=P(X=0)+P(X=1)+P(X=2)=\frac{e^{-4}4^{0}}{0!}+\frac{e^{-4}4^{1}}{1!}+\frac{e^{-4}4^{2}}{2!}=13e^{-4}=0.238$$
The moment generating function of this random variable is determined as
follows:
$$M_{X}(t)=E(e^{tX})=\sum_{x=0}^{\infty}e^{tx}\frac{e^{-\lambda}\lambda^{x}}{x!}=e^{-\lambda}\sum_{x=0}^{\infty}\frac{(\lambda e^{t})^{x}}{x!}=e^{-\lambda}e^{\lambda e^{t}}=e^{\lambda(e^{t}-1)}$$

Likewise, it can be shown that its factorial moment generating function is as follows:
$$\pi_{X}(t)=E(t^{X})=e^{\lambda(t-1)}$$
Furthermore, to calculate 𝐸(𝑋) and 𝐸(𝑋 2 ) of this random variable, we can
differentiate the moment generating function once and twice with respect to 𝑡,
respectively, and then let 𝑡 = 0.

$$E(X)=\frac{dM_{X}(t)}{dt}\bigg|_{t=0}=\lambda e^{t}e^{\lambda(e^{t}-1)}\Big|_{t=0}=\lambda$$
$$E(X^{2})=\frac{d^{2}M_{X}(t)}{dt^{2}}\bigg|_{t=0}=\left(\lambda e^{t}e^{\lambda(e^{t}-1)}+\lambda^{2}e^{2t}e^{\lambda(e^{t}-1)}\right)\Big|_{t=0}=\lambda+\lambda^{2}$$

As mentioned above, one of the applications of the Poisson random variable is to
model the occurrence of events in time or place intervals. For example, the
number of earthquakes, the number of customers of a bank, and the number of
accidents at an intersection in specified time intervals can be among the
applications of this random variable. It can be shown that if 𝜆 is the rate or the
expected number of events in a time or place unit and the following conditions are
valid, then the number of events occurring in a time interval of length “𝑡” follows a
Poisson random variable with mean 𝜆𝑡. The conditions are as follows:
1. The probability that precisely one event occurs in a time interval of length Δt is λΔt + Ο(Δt), where Ο(Δt) denotes a function having the property lim_{Δt→0} Ο(Δt)/Δt = 0. As an example, if Ο(Δt) is defined as (Δt)², it has this property. However, if it is defined as Ο(Δt) = Δt, the property is not valid.
2. The probability that more than one event occurs in a time interval of length
𝛥𝑡 is 𝑂(𝛥𝑡).
3. For any n disjoint subintervals and integers k₁, k₂, …, kₙ, if Eᵢ denotes the event that exactly kᵢ events occur in the i-th subinterval, then the events E₁, E₂, …, Eₙ are independent.

Properties 1 and 2 state that the probability that exactly one event occurs in a very small interval is approximately λΔt, and the probability that two or more events occur is very small in comparison with Δt. The third property states that the numbers of events belonging to disjoint time intervals are independent of each other. Now, by using these assumptions, it can be shown that the number of events occurring in a time interval of length "t" follows a Poisson random variable with mean λt. Suppose that N(t) is the number of events in an interval of length t. To calculate P(N(t) = k), we define P_k(t) = P(N(t) = k).
Now, consider the two cases of k = 0 and k > 0. P₀(t + Δt) is the probability that no event occurs until moment t + Δt. This means that no event occurs until moment "t", which has the probability of P₀(t), and no event occurs in the interval from t to t + Δt, which has the probability of 1 − λΔt − Ο(Δt). Hence, using the third assumption, we have:
$$P_{0}(t+\Delta t)=P_{0}(t)\big(1-\lambda\Delta t-O(\Delta t)\big)$$

Considering that Ο(Δt)/Δt tends to zero as Δt → 0, we have:
$$\lim_{\Delta t\to 0}\frac{P_{0}(t+\Delta t)-P_{0}(t)}{\Delta t}=\lim_{\Delta t\to 0}\frac{P_{0}(t)\big(-\lambda\Delta t-O(\Delta t)\big)}{\Delta t}=-\lambda P_{0}(t)\;\Rightarrow\;\frac{dP_{0}(t)}{dt}=-\lambda P_{0}(t)$$
Now, for k > 0, since the probability of having more than one event in a time interval of length Δt is approximately equal to zero, the number of events in an interval of length Δt is assumed to be at most 1. In this case, P_k(t + Δt) is equal to:
$$P_{k}(t+\Delta t)=P_{k}(t)P_{0}(\Delta t)+P_{k-1}(t)P_{1}(\Delta t)=\big(1-\lambda\Delta t-O(\Delta t)\big)P_{k}(t)+\lambda\Delta t\,P_{k-1}(t)$$
$$\lim_{\Delta t\to 0}\frac{P_{k}(t+\Delta t)-P_{k}(t)}{\Delta t}=-\lambda P_{k}(t)+\lambda P_{k-1}(t)\;\Rightarrow\;\frac{dP_{k}(t)}{dt}=-\lambda P_{k}(t)+\lambda P_{k-1}(t)$$
Therefore, in general, we have:
$$\begin{cases}\dfrac{dP_{0}(t)}{dt}=-\lambda P_{0}(t)\\[2mm]\dfrac{dP_{k}(t)}{dt}=-\lambda P_{k}(t)+\lambda P_{k-1}(t);\quad k=1,2,3,\dots\end{cases}$$
Solving the above system of equations yields the following equality:
$$P_{k}(t)=\frac{e^{-\lambda t}(\lambda t)^{k}}{k!};\quad k=0,1,2,3,\dots$$
This probability function belongs to a Poisson random variable with parameter
𝜆𝑡.
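The construction by tiny subintervals can also be checked by simulation: a binomial distribution with parameters (n, λt/n) should look like a Poisson(λt) for large n. Below is a minimal sketch (added for illustration, with λ = 4 as in Example 6.3):

```python
import numpy as np

rng = np.random.default_rng(2)
lam, t = 4.0, 1.0          # rate of 4 events per day, a window of one day
n = 10_000                 # number of tiny subintervals
p = lam * t / n            # at most one event per subinterval, with prob ≈ λΔt

counts = rng.binomial(n, p, size=1_000_000)    # approximating binomial draws

print(counts.mean(), counts.var())             # both ≈ λt = 4
print((counts <= 2).mean(), 13 * np.exp(-4))   # ≈ 0.238, matching Example 6.3
```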

Example 6.4

In Example 6.3, what is the probability that three cars on one day and two cars
on the next day ignore the no entry sign?
Solution. With respect to the third property mentioned in the Poisson distribution,
the number of events on the first day is independent of that of the events on the
second day. Hence, the desired probability is equal to:
$$\frac{e^{-4}4^{3}}{3!}\times\frac{e^{-4}4^{2}}{2!}$$
Example 6.5

In Example 6.3, what is the probability that a total of five cars on two days
ignore the no entry sign?
Solution. According to the Poisson process, the number of cars ignoring the street's no entry sign over two days follows a Poisson random variable with parameter λt = 4 × 2 = 8. Hence, the probability that a total of five cars over two days ignore the no entry sign is equal to:
$$\frac{e^{-8}8^{5}}{5!}$$

As mentioned before, whenever the number of trials in a binomial distribution is large (n → ∞) and the probability of success is small (p → 0), the binomial distribution probabilities can be approximated by the Poisson distribution. This approximation is valid even for the distribution of the number of successes in n dependent Bernoulli trials with weak dependence. For example, recall the matching problem in Example 5.5 of Chapter 2. When people randomly select their hats, these random choices can be considered n simple Bernoulli trials in which the result of the i-th trial (i = 1, 2, …, n) is a success whenever person i picks his own hat. If Eᵢ denotes the event that person i selects his own hat, then we have:
$$P(E_{i})=\frac{1}{n}$$
$$P(E_{i}\mid E_{j})=\frac{1}{n-1};\quad i\ne j$$
Accordingly, it is seen that the Eᵢ's are not independent; however, if n is large, the two quantities above are approximately equal and the dependence between trials decreases. In these conditions, it is expected that the number of successes approximately follows a Poisson distribution with parameter λ = np = n × (1/n) = 1.

Example 6.6

If 20 couples randomly sit at a round table, what is an approximation to the probability that the number of couples sitting next to each other is zero?
Solution. We define one trial for each couple and regard the result of the i-th trial as a success whenever the members of couple i sit next to each other. Therefore, if couple i's success is denoted by Eᵢ, the probability of couple i's success is calculated as follows:
$$P(E_{i})=\frac{2\times 38!}{39!}=\frac{2}{39}$$
In the above calculations, to obtain the number of states of the desired event, we consider couple i as a two-person group and the remaining 38 people as 38 single-person groups; these 39 groups can be arranged in (39 − 1)! ways around the table, and the two people of couple i can be arranged in 2! ways within their group.
Moreover, if couple j's success occurs, the probability of couple i's success is equal to:
$$P(E_{i}\mid E_{j})=\frac{P(E_{i}\cap E_{j})}{P(E_{j})}=\frac{\dfrac{2\times 2\times 37!}{39!}}{\dfrac{2\times 38!}{39!}}=\frac{2}{38}$$

Therefore, it is seen that the dependency of trials is small. Besides, since the number of couples is large and their probability of success is small, the quantity λ = np is equal to 20 × (2/39) = 40/39. Using the Poisson approximation, the probability that none of the couples gets a success, or in other words, that no couple sits next to each other, is equal to:
$$\frac{e^{-\lambda}\lambda^{0}}{0!}=e^{-\lambda}=e^{-\frac{40}{39}}$$

Proposition 6-1
If the number of times that an event occurs in a time unit follows a Poisson
distribution with parameter 𝜆, and each event of this distribution, independently,
belongs to a special type with probability 𝑝, then the number of times that the special
event occurs in a time unit has a Poisson distribution with parameter 𝜆𝑝.

Proof. Suppose that the number of events in a time unit follows a Poisson random variable with parameter λ in which each event, independently, belongs to a special type with probability p. If the total number of events in a time unit is denoted by X and the number of special events in a time unit is denoted by Y, then Y depends on the total number of events. Hence, to calculate the probability that the number of special events is k, we should condition on the total number of events as follows:

$$P(Y=k)=\sum_{x=0}^{\infty}P(Y=k\mid X=x)P(X=x)$$
$$=\sum_{x=0}^{k-1}0\times P(X=x)+\sum_{x=k}^{\infty}\binom{x}{k}p^{k}(1-p)^{x-k}\frac{e^{-\lambda}\lambda^{x}}{x!}$$
$$=\frac{e^{-\lambda p}(\lambda p)^{k}}{k!}\sum_{x=k}^{\infty}\frac{e^{-\lambda(1-p)}[\lambda(1-p)]^{x-k}}{(x-k)!}$$
$$=\frac{e^{-\lambda p}(\lambda p)^{k}}{k!}\sum_{j=0}^{\infty}\frac{e^{-\lambda(1-p)}[\lambda(1-p)]^{j}}{j!}=\frac{e^{-\lambda p}(\lambda p)^{k}}{k!}$$

As seen, the above probability function relates to a Poisson distribution with parameter λp. In the above proof, note that if the total number of events is less than k, there is no possibility that the number of special events is equal to k.
For example, suppose that the number of daily customers of a store follows a Poisson random variable with parameter λ = 10 in which each customer is a woman with probability 0.4 and a man with probability 0.6. Using Proposition 6-1, we can conclude that the number of female customers per day in the store follows a Poisson distribution with parameter 10 × 0.4 = 4, and the number of male customers per day follows a Poisson distribution with parameter 10 × 0.6 = 6.
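Proposition 6-1 is straightforward to verify by simulation; the sketch below (added for illustration, assuming numpy) thins a Poisson(10) count with p = 0.4 and checks that the result behaves like a Poisson(4):

```python
import numpy as np

rng = np.random.default_rng(3)
lam, p, days = 10.0, 0.4, 1_000_000

total = rng.poisson(lam, size=days)   # total customers on each day
women = rng.binomial(total, p)        # each customer is a woman with prob 0.4

print(women.mean(), women.var())      # both ≈ λp = 4, as the proposition states
```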
Example 6.7

Suppose that the number of customers calling a restaurant to order food from 12:00 to 13:00 p.m. follows a Poisson distribution with parameter 20. If 30 percent of the customers call to order pizza and 70 percent of them call to order sandwich, then it is desired to calculate:
a. The probability that from 12:00 to 13:00 p.m. 5 customers call to
order pizza.
b. The probability that in time interval 12:15 to 12:30 p.m. 4 customers
call to order pizza.
Solution.
a. According to Proposition 6-1, the number of calls to order pizza from 12:00 to 13:00 p.m. follows a Poisson distribution with parameter λp₁ = 20 × 0.3 = 6. Therefore, the probability of having 5 pizza orders in one hour is equal to:
$$\frac{e^{-6}6^{5}}{5!}=0.16$$

b. The number of calls to order the sandwich from 12:15 to 12:30 p.m. follows a Poisson distribution with parameter λp₂t = 20 × 0.7 × 0.25 = 3.5. Therefore, the probability of having 4 sandwich orders in this time interval (12:15 to 12:30 p.m.) is equal to:
$$\frac{e^{-3.5}(3.5)^{4}}{4!}=0.188$$
Proposition 6-2
Suppose that X is a Poisson random variable with parameter 𝜆. If 𝜆 is an integer, the
value of 𝑃(X=i) is maximum when i takes on 𝜆 or (𝜆 − 1). Nevertheless, if 𝜆 is not an
integer, 𝑃(X=i) takes on its maximum value when i is equal to the greatest integer
less than 𝜆.

Proof. Similar to the proof of Proposition 3-2, the least value of k having the following property is the mode:
$$\frac{P(X=k+1)}{P(X=k)}=\frac{\dfrac{e^{-\lambda}\lambda^{k+1}}{(k+1)!}}{\dfrac{e^{-\lambda}\lambda^{k}}{k!}}=\frac{\lambda}{k+1}\le 1\;\Rightarrow\;\lambda\le k+1$$
The first value of k satisfying the above relation gives the following condition:
If λ is an integer ⇒ mode = λ, λ − 1
If λ is not an integer ⇒ mode = ⌊λ⌋

Moreover, it can be shown that if two consecutive points k and k + 1 from a Poisson distribution have the same probability, then k + 1 = λ; thereby, k and k + 1 certainly are the mode.
Examples of the Poisson probability function are shown in the figure below:
Figure 6-3: Examples of the Poisson probability function

As seen in Figure 6.3, the Poisson distribution is generally right skewed, but
as 𝜆 increases, its skewness magnitude decreases.

Suppose that we select n parts randomly and without replacement from a box containing m defective and N − m nondefective parts. If X denotes the number of defective parts withdrawn in the sample of size n, then we have:
$$P(X=i)=\frac{\binom{m}{i}\binom{N-m}{n-i}}{\binom{N}{n}};\quad Max(0,n+m-N)\le i\le Min(m,n)$$
We call this random variable hypergeometric with parameters 𝑛, 𝑚, and 𝑁
briefly denoted by 𝑋 ∼ 𝐻𝐺(𝑁, 𝑚, 𝑛). The values of this random variable are often
written as 0,1, … , 𝑛, but the value of 𝑃(𝑋 = 𝑖) is equal to zero for some values of
0,1, … , 𝑛. In fact, actual possible values for this distribution are in interval
𝑚𝑎𝑥(0, 𝑛 + 𝑚 − 𝑁) ≤ 𝑖 ≤ 𝑚𝑖𝑛(𝑚, 𝑛) since 𝑖 cannot be more than 𝑚 and 𝑛, and also
cannot be less than 𝑚𝑎𝑥(0, 𝑛 + 𝑚 − 𝑁). Namely, if 𝑛 = 4, 𝑚 = 3, 𝑁 = 10, it is evident
that 𝑃(𝑋 = 4) is equal to zero. Moreover, if (𝑛 = 5, 𝑚 = 7, 𝑁 = 10), we can easily
realize that 𝑖 cannot be less than 2.

Example 7.1

A factory sells its products in lots of size 10. If there are 3 defective products
in a lot produced by this factory, it is desired to calculate the probability that two
defective products are observed in a sample of size 3 (without replacement).
Solution. Suppose that the random variable 𝑋 denotes the number of defective
products in a sample of size 3.
$$P(X=2)=\frac{\binom{3}{2}\binom{7}{1}}{\binom{10}{3}}=\frac{7}{40}$$
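For reference, scipy ships this distribution; a short check (added for illustration; in scipy's convention hypergeom.pmf(k, N, m, n) takes the population size, the number of marked items, and the sample size, in that order):

```python
from scipy.stats import hypergeom

N, m, n = 10, 3, 3
print(hypergeom.pmf(2, N, m, n))   # 0.175 = 7/40, matching Example 7.1

p = m / N   # also checks the mean/variance formulas derived later in this section
print(hypergeom.mean(N, m, n), n * p)                            # 0.9
print(hypergeom.var(N, m, n), n * p * (1 - p) * (N - n) / (N - 1))  # 0.49
```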

Example 7.2

In Example 7.1, suppose that 30 percent of lots are nondefective, 40 percent have one defective product, and 30 percent of them have two defective products. The factory offers a money-back guarantee to purchasers provided that there is a defective product in a sample of size 3. What percentage of the lots are returned to the factory?
Solution. Suppose that 𝐴𝑖 denotes the event that there are 𝑖 defective products in
lots of size 10 and E denotes the event that the lot is returned. Therefore, we have:

$$P(E)=P(E\mid A_{0})P(A_{0})+P(E\mid A_{1})P(A_{1})+P(E\mid A_{2})P(A_{2})$$
$$=0\times 0.3+\frac{\binom{1}{1}\binom{9}{2}}{\binom{10}{3}}\times 0.4+\frac{\binom{2}{1}\binom{8}{2}+\binom{2}{2}\binom{8}{1}}{\binom{10}{3}}\times 0.3=0.28$$

The mean and variance of the hypergeometric random variable are as follows:
$$E(X)=np$$
$$Var(X)=np(1-p)\frac{N-n}{N-1}$$
where p = m/N is the proportion of defective parts in the lot. To prove the above equalities, we first prove the following identity:
$$E(X^{k})=\sum_{i=0}^{n}i^{k}P(X=i)=\sum_{i=1}^{n}i^{k}\frac{\binom{m}{i}\binom{N-m}{n-i}}{\binom{N}{n}}$$
Using the identities $i\binom{m}{i}=m\binom{m-1}{i-1}$ and $n\binom{N}{n}=N\binom{N-1}{n-1}$, we have:
$$E(X^{k})=\sum_{i=1}^{n}i^{k}\frac{\binom{m}{i}\binom{N-m}{n-i}}{\binom{N}{n}}=\sum_{i=1}^{n}i^{k-1}\frac{i\binom{m}{i}\binom{N-m}{n-i}}{\frac{N}{n}\binom{N-1}{n-1}}$$
$$=\frac{nm}{N}\sum_{i=1}^{n}i^{k-1}\frac{\binom{m-1}{i-1}\binom{N-m}{n-i}}{\binom{N-1}{n-1}}=\frac{nm}{N}\sum_{j=0}^{n-1}(j+1)^{k-1}\frac{\binom{m-1}{j}\binom{N-m}{n-1-j}}{\binom{N-1}{n-1}}=\frac{nm}{N}E\big((Y+1)^{k-1}\big)$$
where Y is a hypergeometric random variable with parameters n − 1, N − 1, and m − 1. Letting k = 1, we have:
$$E(X)=\frac{nm}{N}=np$$
In addition, letting k = 2, we get:
$$E(X^{2})=\frac{nm}{N}E(Y+1)=\frac{nm}{N}\left(\frac{(n-1)(m-1)}{N-1}+1\right)$$
$$\Rightarrow Var(X)=E(X^{2})-[E(X)]^{2}=\frac{nm}{N}\left(\frac{(n-1)(m-1)}{N-1}+1\right)-\left(\frac{nm}{N}\right)^{2}=n\frac{m}{N}\left(1-\frac{m}{N}\right)\frac{N-n}{N-1}=np(1-p)\frac{N-n}{N-1}$$

It should be noted that sampling from the hypergeometric distribution is without replacement. If sampling is with replacement, the trials become independent and the number of defective parts in a sample of size n follows a binomial distribution with parameters n and p = m/N, whose mean and variance are np and np(1 − p), respectively. Accordingly, the mean of the number of defective items in a sample is the same in the two cases of with and without replacement, but the variances differ. Furthermore, if the value of N is comparatively larger than n such that (N − n)/(N − 1) is approximately equal to 1, then the variances in the two cases will be approximately equal as well. In fact, in the hypergeometric distribution, if the value of N is comparatively larger than n, the probability function of this random variable converges to that of a binomial random variable with parameters n and p = m/N. The related proof can be shown as follows:

$$P(X=i)=\frac{\binom{m}{i}\binom{N-m}{n-i}}{\binom{N}{n}}=\frac{\dfrac{m!}{i!(m-i)!}\cdot\dfrac{(N-m)!}{(n-i)!(N-m-n+i)!}}{\dfrac{N!}{n!(N-n)!}}=\frac{n!}{i!(n-i)!}\cdot\frac{\dfrac{m!}{(m-i)!}\cdot\dfrac{(N-m)!}{(N-m-n+i)!}}{\dfrac{N!}{(N-n)!}}$$
$$=\binom{n}{i}\left(\frac{m}{N}\cdot\frac{m-1}{N-1}\cdots\frac{m-i+1}{N-i+1}\right)\left(\frac{N-m}{N-i}\cdot\frac{N-m-1}{N-i-1}\cdots\frac{N-m-(n-i-1)}{N-n+1}\right)$$

Now, if p = m/N and the values of N and m are comparatively larger than n and i, the above expression is approximately equal to:
$$\binom{n}{i}p^{i}(1-p)^{n-i}$$
The intuition behind this result is straightforward: if N and m are large, then the defectiveness or nondefectiveness of the preceding choices has little effect on the probability that the next choices are defective or nondefective. Therefore, the trials can approximately be assumed to be independent.

Example 7.3

A lot of 5000 industrial pieces, 50 of which are defective, is sent by the factory to one of its representatives. If a customer randomly buys 5 parts from this lot, what is an approximation to the probability that he receives no defective part?
Solution. The number of defective parts in the sample follows a hypergeometric distribution with parameters (N = 5000, m = 50, n = 5). It can be approximated by a binomial distribution with parameters (n = 5, p = 0.01) since the proportion n/N is small. Therefore, the desired probability is approximately equal to:

$$P(X=0)\approx\binom{5}{0}(0.01)^{0}(0.99)^{5}\approx(0.99)^{5}\approx 0.951$$

Proposition 7-1
In a hypergeometric distribution, if the values of parameters m and n are interchanged, the probability function will not change. In other words, if X ∼ HG(N, m = a, n = b) and Y ∼ HG(N, m = b, n = a), then P(X = i) is equal to P(Y = i).

Proof. Suppose that we have N white marbles in an urn. Person A starts the game by selecting a of the marbles randomly and without replacement, assigning the label A to them, and finally returning them to the urn. Then, person B selects b of the marbles randomly and without replacement, assigns the label B to them, and finally returns them to the urn. Suppose that X denotes the number of marbles labeled with A that are selected by person B. Indeed, X denotes the number of marbles with the two labels A and B, and this does not depend on which person (A or B) plays first. Hence, whether person A or person B starts to play, the distribution of the random variable X does not change. If person A begins, the probability function of X is as follows:
$$P(X=i)=\frac{\binom{a}{i}\binom{N-a}{b-i}}{\binom{N}{b}};\quad i=0,1,\dots,Min(a,b)$$
On the other hand, if person B begins, the probability function of X is as follows:
$$P(X=i)=\frac{\binom{b}{i}\binom{N-b}{a-i}}{\binom{N}{a}};\quad i=0,1,\dots,Min(a,b)$$
These two expressions are equal to each other for the aforementioned reasons.

If the random variable X takes on the values x₁, x₂, …, xₙ, each with the same probability 1/n, it is called a discrete uniform random variable:
$$P(X=x_{i})=\frac{1}{n};\quad i=1,2,3,\dots,n$$
We denote the discrete uniform random variable on the integers 1 to n as X ∼ U{1, 2, …, n}.
The mean and variance of this distribution are calculated as follows:
$$E(X)=\sum_{i=1}^{n}x_{i}P(X=x_{i})=\sum_{i=1}^{n}x_{i}\times\frac{1}{n}=\frac{\sum_{i=1}^{n}x_{i}}{n}$$
$$Var(X)=E(X^{2})-[E(X)]^{2}=\sum_{i=1}^{n}x_{i}^{2}\times\frac{1}{n}-\left[\frac{\sum_{i=1}^{n}x_{i}}{n}\right]^{2}$$

In the special case where the xᵢ's are the integers from 1 to n, we have:
$$P(X=i)=\frac{1}{n};\quad i=1,2,3,\dots,n$$
In this regard, the mean and variance of this distribution are equal to:
$$E(X)=\sum_{i=1}^{n}i\,P(X=i)=\sum_{i=1}^{n}i\times\frac{1}{n}=\frac{n(n+1)}{2n}=\frac{n+1}{2}$$
$$Var(X)=E(X^{2})-[E(X)]^{2}=\sum_{i=1}^{n}i^{2}\times\frac{1}{n}-\left[\frac{n+1}{2}\right]^{2}=\frac{(n+1)(2n+1)}{6}-\frac{(n+1)^{2}}{4}=\frac{n^{2}-1}{12}$$
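These closed forms are simple enough to verify by brute force (an added illustration):

```python
n = 5
vals = range(1, n + 1)
mean = sum(vals) / n
var  = sum(v * v for v in vals) / n - mean**2

print(mean, (n + 1) / 2)        # 3.0 in both cases
print(var, (n * n - 1) / 12)    # 2.0 in both cases
```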

Example 8.1

Obtain the mean and variance of each of the following random variables.
a. 𝑋 ∼ 𝑈{1,2,3,4,5}
b. 𝑌 ∼ 𝑈{0,1,2,3,4}
c. 𝑍 ∼ 𝑈{1,3,5,7,9}

Solution.
a. E(X) = (5+1)/2 = 3, Var(X) = (5² − 1)/12 = 2

b. It is evident that there is the relationship Y = X − 1 between the variables X and Y. Therefore, we have:
E(Y) = E(X) − 1 = 3 − 1 = 2, Var(Y) = Var(X) = 2

c. There is the relationship Z = 2X − 1 between the variables X and Z. Therefore, we have:
E(Z) = 2E(X) − 1 = 6 − 1 = 5, Var(Z) = 2²Var(X) = 8

However, note that the main formulas can also be utilized to solve the second and third parts, but it seems that using their relationship with the random variable of part "a" is a more elegant solution.

1) If 𝑋 follows a Bernoulli distribution with parameter 𝑝, then it is desired to
calculate:
a. $E\left(\sum_{i=1}^{n}X^{i}\right)$
b. $E\left[(X-2)(X^{2}+4+2X)\right]$
c. $E\left(\sum_{i=1}^{n}\frac{X^{i}}{1+X}\right)$

2) If X follows a Bernoulli distribution with parameter p and Y = F_X(X), where F_X is its cumulative distribution function, then obtain the probability function of the random variable Y.
3) If 𝑋1 , … , 𝑋𝑛 are random variables resulting from independent Bernoulli trials,
each with parameter 𝑝, obtain the probability function, expected value,
variance, and moment generating function of the following random variables:
a. 𝑉 = (𝑋1 )𝑛
b. 𝑈 = ∏𝑛𝑖=1 𝑋𝑖
c. 𝑍 = 𝑛𝑋1
d. 𝑌 = ∑𝑛𝑖=1 𝑋𝑖
Hint: To obtain the probability function of the random variable 𝑌, note that
this random variable takes on the value of 𝑘 when 𝑘 of the 𝑋𝑖 's are equal to 1.
4) Suppose that 𝑋1 , … , 𝑋𝑛 are random variables resulting from independent
Bernoulli trials with identical parameter 𝑝. If 𝑌1 and 𝑌2 denote 𝑀𝑖𝑛{𝑋1 , … , 𝑋𝑛 }
and 𝑀𝑎𝑥{𝑋1 , … , 𝑋𝑛 } respectively, then it is desired to obtain:
a. The probability function of the random variables 𝑌1 = 𝑀𝑖𝑛{𝑋1 , … , 𝑋𝑛 } and
𝑌2 = 𝑀𝑎𝑥{𝑋1 , … , 𝑋𝑛 }.
b. 𝐸(𝑌1 𝑌2 )
c. Var(𝑌1 𝑌2 )
d. 𝑃(𝑌2 > 𝑌1 )

e. 𝑃(𝑌1 + 𝑌2 > 1)
f. 𝑃(𝑌2 − 𝑌1 = 0)
5) Suppose that 𝑋1 , … , 𝑋𝑛 are the random variables resulting from the
independent Bernoulli trials with identical parameter 𝑝. Obtain the
probability function and the expected value of the following random
variables:
a. $Z=\prod_{i=1}^{n}(1-X_{i})^{2}$
b. $W=\prod_{i=1}^{n}\big(X_{i}(X_{i}+1)\big)$
c. $V=\left(1-\prod_{i=1}^{n}X_{i}\right)^{2}$
d. $Y=\dfrac{\prod_{i=1}^{n}X_{i}}{1+\sum_{i=1}^{n}X_{i}}$
e. $U=\prod_{i=1}^{n}X_{i}\prod_{i=1}^{m}X_{i}$, for m < n

6) Suppose that X₁, …, Xₙ are the random variables resulting from independent Bernoulli trials with identical parameter p such that each of them has moment generating function E(e^{tX}) = (6 + 4e^t)/10. It is desired to calculate:
a. 𝐸(∑𝑛𝑖=1 𝑋𝑖𝑖 )
b. 𝑃(∏𝑛𝑖=1 𝑋𝑖 = 0)
7) Suppose that we perform n independent trials and the random variable Xᵢ is defined for the i-th trial such that it adopts the values of 1, 0, and −1 with identical probability 1/3. Obtain the probability function of the random variable $Y=\left(\prod_{i=1}^{n}X_{i}\right)^{2}$ and then its expected value and variance.
8) In a binomial distribution with mean 3 and variance 2,
a. Obtain the value of 𝑃(𝑋 = 0).
b. Obtain the mode or the most probable value for this random variable.
9) We roll a fair die six times. If the random variable 𝑋 denotes the number of
times that the side 2 appears,
a. Obtain the probability function of the random variable 𝑋.
b. What is the probability that the side 2 appears at least twice?
c. Obtain the expected value and variance of this random variable.
d. Obtain the mode of this random variable.
10) In a binomial distribution with parameter 𝑛 = 10, the probability of success
in each trial is three times as many as that of its failure. In these conditions,
a. Obtain the mode of this random variable.
b. Obtain the expected value and variance of this random variable.
11) If the random variable X follows a binomial distribution with parameters 𝑛 =
9 and p in which 𝑃(𝑋 = 0) = 𝑃(𝑋 = 1), then obtain:
a. The value of 𝑃(𝑋 = 2).
b. The mode of this random variable.
c. The expected value and variance of this random variable.
12) Company A receives its required pieces from 10 providers. If company A orders 5 pieces and each provider, without any restriction on the number of pieces, can meet the needs of company A equally likely, then:
a. Obtain the average number of pieces that company A receives from
a particular provider.
b. What is the probability that a particular provider can meet precisely
3 pieces?
13) In n independent Bernoulli trials with the identical probability of success 1/2, determine n such that:
a. The probability of getting at least one success is greater than 0.95.
b. The probability of getting at least one failure is greater than 0.95.
c. The probability of getting at least one success and at least one failure
is greater than 0.95.
14) In a game of chance, the probability of success for a person is independently
0.02. How many games should the person play such that the probability of
getting at least one success among his trials is greater than 50%?

15) Suppose that you have participated in a party. Except for you, how many people should participate in the party such that the probability that at least one of them has the same birthday as yours is greater than 0.5? (Suppose that none of the people participating in the party was born in a leap year.)
16) In a game of chance, you are supposed to flip a fair coin 4 times. If the random variable X denotes the number of heads in 4 flips of this fair coin, and the random variable Y denotes your prize value defined by Y = 1/(X+1) (measured in dollars), then obtain the probability function of the random variable Y for the values 1, 1/2, 1/3, 1/4, and 1/5.

17) In performing 10 independent Bernoulli trials, each of which results in a success with probability p, we have obtained 4 successes. In these conditions,
a. What is the probability that the result of the last trial is a success?
b. What is the probability that the results of the first three trials are failure,
failure, and success, respectively?
c. What is the probability that we obtain exactly one success in the first
three trials?
18) There are 4 parts in a system, and each part fails independently with probability 0.2 in each operation. The system functions as long as at least two parts function. If the system successfully terminates an operation, what is the probability that part number 2 does not function during the operation?
19) A device sends the digits zero and one. The device erroneously sends each of the digits independently with probability 1/3. If we know that a four-digit password is sent erroneously, what is the probability that its first digit is sent erroneously?
20) There are $\binom{100}{0}$ marbles with number zero, $\binom{100}{1}$ marbles with number one, …, and $\binom{100}{100}$ marbles with number 100 in a bag. If we select one marble at random from the bag and the random variable X denotes the number of the selected marble, then obtain the probability function of this random variable.

21) Consider a person located at the zero point of the X-axis. If this person moves each time independently one unit to the right with probability 1/2 and one unit to the left with probability 1/2, what is the probability that the person is displaced 2 units to the right after a total of 8 movements?
22) In the preceding problem, if the person moves each time and independently
to the right one unit with probability 𝑝 and one unit to the left with probability
𝑞 = 1 − 𝑝, what is the probability that the person is 2𝑘 units away from the
origin after a total of 2𝑛 movements?
23) The random variable 𝑋 takes on the values of 0, 1, 2, and 3 with respective
probabilities 0.4, 0.3, 0.2, and 0.1. The random variable 𝑌 denotes the number
of heads obtained in 𝑋 flips of a fair coin (if 𝑋 = 0, no flip is performed and 𝑌 is
equal to zero). If the trial of flipping the coin is performed and we know that
one heads appears, what is the probability that one flip is performed?
24) Obtain the expected value of the following probability functions:
a. $P(X=x)=\binom{10}{x}\left(\frac{2}{3}\right)^{x}\left(\frac{1}{3}\right)^{10-x};\quad x=0,1,2,\dots,10$
b. $P(X=x)=\binom{10}{x}\frac{3^{x}}{4^{10}};\quad x=0,1,2,\dots,10$
c. $P(X=x)=\frac{\binom{10}{x}}{2^{10}};\quad x=0,1,2,\dots,10$

25) Determine the constant c in each of the following probability functions:
a. $P(X=x)=c\binom{10}{x};\quad x=0,1,2,\dots,10$
b. $P(X=x)=c\binom{10}{x};\quad x=1,2,\dots,10$
26) If the moment generating function of the random variable X is given by $E(e^{tX})=\left(\frac{1}{2}\right)^{20}(e^{t}+3)^{10}$,
a. Obtain the value of E(X²).
b. Obtain the value of P(X ≥ 1).
c. Obtain the value of P(X = 1 | X > 0).
d. What is the probability that this random variable adopts an even value?

e. What is the probability that this random variable adopts an odd value?
27) Suppose that the random variable X follows a binomial distribution with parameters n = 25 and p = 1/2. If so, then it is desired to calculate:
a. $E[e^{2X}]$
b. $E[(-1)^{X}]$
c. $E[(1+2^{X})^{2}]$
d. $E[Xe^{2X}]$
Hint: $\frac{\partial E(e^{tX})}{\partial t}=E(Xe^{tX})$
e. $E[X(-1)^{X-1}]$
Hint: $\frac{\partial E(t^{X})}{\partial t}=E(Xt^{X-1})$
f. $E\left[\frac{X!}{(X-3)!}\right]$

28) We arrange N devices in a line. If each device is defective independently with probability p, what is the probability that no two adjacent devices are defective?
Hint: To solve this problem, condition on the number of defective devices.

29) Suppose that the density function of the lifetime of an electronic piece (measured in hours) is given by:
$$f(x)=\begin{cases}\dfrac{10}{x^{2}} & ;\ x>10\\[1mm]0 & ;\ x\le 10\end{cases}$$
If we select 30 pieces at random and assume that the trials of investigating the
lifetime of pieces are independent,
a. How many pieces, on average, last longer than 15 hours?
b. What is the probability that 20 pieces last longer than 15 hours?

30) If the random variable X follows a binomial distribution with parameters n and p, then obtain the value of $E\left[\frac{1}{1+X}\right]$.

31) The random variable 𝑋 follows a binomial distribution with parameters 𝑛 and
𝑝. It is desired to calculate:
a. The expected value of X if we know that at least one trial results in a success (calculating E(X|X ≥ 1)).
b. The expected value of X if we know that not all of the trials resulted in success (calculating E(X|X < n)).
c. The expected value of 𝑋 if we know the first trial results in a success.
d. The expected value of 𝑋 if we know the first trial results in a failure.
e. 𝐸(𝑋|𝑋 ≤ 1)
f. 𝐸(𝑋 2 |𝑋 ≥ 1)
32) Suppose that you produce 10 pieces per day such that the cost of production
for each piece is $30. Each piece is defective independently with probability 𝑝.
If you can sell each nondefective piece for $40 and each defective one for $10,
then:
a. What is your expected value of daily profit?
b. Find the value of 𝑝 such that your average daily profit becomes positive.
33) Suppose that parts of a production line are nondefective with identical
probability 0.9. We divide the parts into lots of size 10 and select two parts at
random from each lot. If both parts are nondefective, we accept the lot;
otherwise, the whole lot will be inspected. Obtain the average number of the
inspected parts of each lot.
34) Suppose that the daily demand of a product follows a binomial distribution with parameters n = 5 and p = 1/2. The product price is $32, half of which is the production cost, and the other half is profit. Four products are produced per day, and each demand that we cannot provide for the customer imposes a penalty equivalent to the product price.
a. What is the probability that we encounter a shortage on a day?

b. What is the average amount of daily shortage penalty (measured in
dollars)?
c. What is the average number of the daily sold products?
d. What is the average amount of daily revenue (measured in dollars)
gained from the daily sold products?
e. What is the average amount of daily profit (measured in dollars) of the
product?
f. Using the result obtained in Problem 65 of Chapter 5, determine the
number of daily products such that the average daily profit is
maximized.
35) There are 𝑎 white and 𝑏 black marbles in a bag. If we keep selecting marbles
with replacement until one white marble is obtained, then calculate the
expected value and variance of the number of marbles required to get the first
white marble.
36) If the random variable 𝑋 follows a binomial distribution with parameters 𝑛 and
𝑝 and the random variable 𝑌 follows a geometric distribution with parameter
𝑝, then show that relationship 𝑃(𝑋 = 0) = 𝑃(𝑌 > 𝑛) is valid.
37) Consider a sequence of independent Bernoulli trials such that the probability of success for the i-th trial is equal to (1 − 1/i). If the random variable N denotes the number of trials until getting the first success, obtain the probability function and expected value of N.
38) Consider a trial in which the probability of success is p = 1/2. We want to repeat this trial independently until getting one success. Performing the first trial costs 50 units (measured in thousand dollars), and given any failure, the total cost of the trials amounts to 5/4 times as much. What is the average total cost (measured in thousand dollars) to get the first success?
39) The distance traveled by a type of tire, measured in 10 thousand kilometers, is given by:
$$f_{X}(x)=\begin{cases}\dfrac{2}{x^{2}} & ;\ 1<x<2\\[1mm]0 & ;\ otherwise\end{cases}$$
If we inspect these tires separately and independently until getting the first
tire traveling more than 18000 kilometers, it is desired to calculate:
a. The expected value and variance of the number of inspected tires.
b. The probability that the number of inspected tires is equal to 2.
40) A service system can serve up to 10 individuals per day. If the daily service
demand of the system follows a geometric distribution with parameter 𝑝 = 0.1,
what proportion of the customers leave the system without receiving a
service?
41) In a research study, we perform a sequence of independent trials with identical probability of success p = 1/3 until getting a success. The cost of each trial is 5 units (measured in thousand dollars), but if the trial results in a failure, then it will cost 2 additional units.
a. Obtain the expected value and standard deviation of the total cost measured in thousand dollars.
b. If we have $20000 altogether, what is the probability that we cannot get the first success with the given budget?
42) If the probability function of the random variable X is given by $P(X=x)=c\left(\frac{2}{3}\right)^{x};\ x=1,2,\dots$, then it is desired to:
a. Obtain the value of P(|X − 2| ≤ 1).
b. Obtain the value of P(X ≤ 7 | X ≥ 3).
43) If the moment generating function of the random variable X is $M_{X}(t)=\frac{e^{t}}{3-2e^{t}}$, then it is desired to calculate:
a. The expected value and variance of this random variable.
b. 𝑃(𝑋 > 3)
c. 𝑃(𝑋 = 5|𝑋 > 3)
d. The probability that this random variable adopts an even value.
e. The probability that this random variable adopts an odd value.

44) A company intends to develop its marketing activities by face-to-face advertising. To do so, the costs of cataloging and shipping for each potential buyer are estimated to be $500. If the probability of reaching an agreement with each buyer is 0.5 and a budget of $1500 is considered for advertising, what is the probability that the company can conclude a contract with this budget?
45) Suppose 𝑋 is a random variable taking on positive integers such that
4𝑃(𝑋 = 𝑥 + 1) = 3𝑃(𝑋 = 𝑥); 𝑥 = 1,2,3, …
a. Obtain the value of 𝑃(𝑋 = 1).
b. Obtain the probability function of this random variable.
c. Obtain the expected value and variance of this random variable.
46) If X denotes the number of failures before getting the first success in performing independent Bernoulli trials with identical parameter p = 1/3, then obtain the following values:
a. The expected value and variance of the random variable 𝑋.
b. 𝑃(𝑋 > 4)
c. 𝐹𝑋 (3)
d. 𝑃(2 ≤ 𝑋 ≤ 5)
e. 𝑃(𝑋 ≤ 5|𝑋 ≥ 2)
f. 𝐸[(𝑋 − 3)2 ]
47) In each of the following probability functions, obtain the expected value of the random variables.
a. $P(X=x)=\left(\frac{2}{3}\right)^{x-1}\frac{1}{3};\quad x=1,2,3,\dots$
b. $P(X=x)=\left(\frac{2}{3}\right)^{x}\frac{1}{3};\quad x=0,1,2,3,\dots$
c. $P(X=x)=\left(\frac{2}{3}\right)^{x}\frac{1}{2};\quad x=1,2,3,\dots$
d. $P(X=x)=\left(\frac{2}{3}\right)^{x+1}\frac{1}{2};\quad x=0,1,2,3,\dots$
e. $P(X=x)=c\left(\frac{3}{4}\right)^{x};\quad x=1,2,3,\dots$
f. $P(X=x)=c\left(\frac{3}{4}\right)^{x};\quad x=0,1,2,3,\dots$

48) A radar system in an area warns with probability 0.9 if there is an aircraft. Even
if there is no aircraft in the area, it may give an erroneous warning with
probability 0.05. Based on the available information, we know that there is an
aircraft with probability 0.2 in this area. In these conditions,
a. If the radar system warns in this area, what is that probability that this
warning is true?
b. In this area, on average, how many warnings should occur to reach a
valid warning?
c. In this area, on average, how many false alarms occur before a correct
warning?
49) Suppose that the random variable X follows a negative binomial distribution with parameters r = 2 and p = 2/3. If we know that the value of X is equal to 10, it is desired to calculate:
a. The probability that the first three trials result in failures.
b. The probability that the first success occurs in the third trial.
50) Suppose that the random variable X follows a negative binomial distribution with parameters r = 3 and p = 2/3. If we know that the value of X is equal to 10, it is desired to calculate:
a. The probability that the first and second successes occur in the third
and fifth trials, respectively.
b. The probability that the first success occurs in the third trial.
51) In successive flips of an unfair coin whose probability of coming heads is 𝑝, we
know that the sixth time a heads comes up belongs to the tenth flip. What is
the probability that the third heads occurs in the fifth flip and the fifth heads
occurs in the eighth flip?
52) We flip a coin whose probability of coming heads is twice that of coming tails
until 12 heads appear. What are the expected value and standard deviation of
the number of trials?

53) We perform a trial whose probability of success is p until r successes are
obtained. If the expected value and variance of the number of trials are 10 and
10, respectively,
a. What is the probability that we need exactly 5 trials?
b. What is the probability that we need more than 5 trials?
54) Suppose that we have two coins, one of which is fair and the other is unfair
such that its probability of coming heads is 1/4. We select a coin at random and
flip it successively.
a. What is the probability that the second heads appears in the fourth flip?
b. Given that the second heads appears in the fourth flip, what is the
probability that the fair coin is selected?
55) We flip a coin until two heads appear. Suppose that the random variables 𝑋
and 𝑌 denote the number of flips required to obtain the first and second heads,
respectively. If the probability of coming heads is 𝑝 in each trial, then obtain
the values of 𝑃(𝑋 = 𝑛|𝑌 = 𝑚) and 𝑃(𝑌 = 𝑚|𝑋 = 𝑛). (𝑚 > 𝑛)
56) A box contains 10 black balls. One black ball is randomly withdrawn from the
box in each step, and then one new ball that is either black with probability 1/3
or white with probability 2/3 is put in its place. What is the average number of
steps required to withdraw all of the black balls inside the box?
57) In independent Bernoulli trials with identical parameter 𝑝, it is desired to
calculate:
a. The expected value of the number of trials until getting 𝑟 successes.
b. The expected value of the number of trials until getting 𝑟 failures.
c. The expected value of the number of failures before getting the r-th
success.
d. The expected value of the number of trials before getting the r-th
success.
e. The expected value of the number of trials between the third and fifth
successes.

f. The expected value of the number of failures between the third and fifth
successes.
g. The expected value of the number of trials until getting the third
success if we know that the first six trials were failures.
h. The expected value of the number of trials until getting the third
success if we know that exactly one success occurs in the first six trials.
58) A random digit generator produces one of the digits from 0 to 9 with equal
probability in each trial independently. It is desired to calculate:
a. The expected value of the number of digits produced until the tenth
time that digit 3 is seen.
b. The expected value of the number of digits produced before the tenth
time that digit 3 is seen.
c. The expected value of the number of digits produced between the
second and seventh times that digit 3 is seen.
59) Consider a trial in which we flip three fair coins, and we keep performing this
trial successively. It is desired to calculate:
a. The probability that the second time in which the results of the three
flips are the same (three heads or three tails) occurs in the fourth trial.
b. The probability that the second time in which at least two tails appear
occurs in the third trial.
60) A box contains three blue marbles, two green marbles, and two red marbles.
Each time, we select two marbles at random and without replacement and
then return them to the box after seeing their color. In each trial, if the two
marbles are of the same color, then we call it a success; otherwise, we call it a
failure.
a. What is the average number of trials until getting the second success?
b. What is the average number of selected marbles until getting the
second success?
c. If we have gotten a success, what is the probability that both marbles
are red?

61) In a research, we perform a sequence of independent trials with identical
probability of success p = 1/2 until getting the second success. The cost of each
trial is 5 units (measured in thousand dollars). If the trial results in a failure, it
will cost 2 additional units.
a. Obtain the expected value and standard deviation of the total cost
measured in thousand dollars.
b. If we have $20000 altogether, what is the probability that we cannot get
the second success with the given budget?
62) A box contains n pieces, r of which are defective and the rest nondefective.
We inspect the pieces at random. Obtain the probability that the k-th inspected
piece (k ≥ r) is the r-th defective piece in each of the following conditions:
a. Sampling is without replacement.
b. Sampling is with replacement.
63) If X follows a binomial distribution with parameters n and p, and Y follows a
negative binomial distribution with parameters r and p, show that
P(X < r) = P(Y > n).
64) If the moment generating function of the random variable X is
M_X(t) = (e^t/(4 − 3e^t))³, it is desired to calculate:
a. The expected value and variance of this random variable.
b. 𝑃(𝑋 > 3)
c. 𝑃(𝑋 = 5|𝑋 > 3)
d. The probability that this random variable takes on an even value.
65) Consider successive flips of an unfair coin whose probability of coming heads
is 𝑝. In these conditions, obtain the probability function for the following
random variables:
a. The number of trials until getting a trial in which both faces are seen at
least once.

b. The number of trials until getting a trial in which both faces are seen
at least twice.
66) Suppose that five percent of the pieces produced by a factory are defective.
Inspection of every nondefective piece takes 2 minutes, and inspection
together with repairing for every defective piece takes 3 minutes. Due to the
restrictions of the available spare pieces, after inspecting and repairing the
third defective piece, we stop inspecting. Obtain the expected value and
standard deviation of the time (in minutes) spent on inspecting and repairing
the pieces.
67) Obtain the expected value of the random variables for the following
probability functions:
a. P(X = x) = C(x−1, 2) (2/3)³ (1/3)^(x−3) ; x = 3, 4, 5, …
b. P(X = x) = C(x+2, 2) (2/3)³ (1/3)^x ; x = 0, 1, 2, …
c. P(X = x) = C(x−1, 1) (2/3)² (1/3)^(x−2) ; x = 2, 3, 4, …
d. P(X = x) = (x + 1) (2/3)² (1/3)^x ; x = 0, 1, 2, …
e. P(X = x) = (x − 1)/2^x ; x = 2, 3, 4, …
f. P(X = x) = (x + 1)/2^(x+2) ; x = 0, 1, 2, …
(Here C(n, k) denotes the binomial coefficient.)

68) In a competition, people A and B keep playing until one of them achieves five
wins. In each game, independently, people A and B win with respective
probabilities 1/3 and 2/3.
a. What is the probability that person A wins the competition?
b. What is the probability that the competition is terminated in seven
games?
c. If the competition terminates in a total of seven games, what is the
probability that person A wins the competition?
69) (Banach's match problem) A smoker buys two matchboxes and puts one of
them in his left pocket and the other in his right pocket. Each time he wants

to smoke, he chooses one of his pockets randomly and picks a matchstick to
light a cigarette. Suppose that each matchbox contains 𝑁 matchsticks. When
the person gets informed that one of the matchboxes is empty, what is the
probability that the other matchbox contains exactly 𝑚 matchsticks (𝑚 =
0,1,2, … , 𝑁)?
70) The number of customers in a store follows a Poisson distribution with mean
20 people per hour.
a. What is the probability that it takes more than 5 minutes for the first
customer to visit the store after opening of the entryway?
b. What is the probability that at least one person visits the store in the
first 10 minutes?
c. If each customer is a man with probability 1/4, what is the probability
that at least one man visits the store in the first hour?
d. With respect to the preceding section, what is the probability that no
woman visits the store in the first half of an hour after opening of the
entryway?
e. With respect to the materials expressed in the third section of this
problem, if we know that four customers have visited the store, what
is the probability that one of them is a man?
f. If we know that one customer has visited the store in the first half an
hour after opening of the entryway, what is the probability that two
customers visit the store in the next half an hour?
g. If we know that a male customer has visited the store in the first half
an hour after opening of the entryway, what is the probability that
three female customers visit the store in the next half an hour?
h. If we know that one customer visited the store in the first hour after
opening of the entryway, what is the probability that the customer
has visited the store in the first 15 minutes?
i. If we know that three customers visited the store in the first hour
after opening of the entryway, what is the probability that two of
them have visited the store in the first 15 minutes?

71) The number of customers in a store follows a Poisson distribution with
parameter 𝜆 = 300 people per day, and each customer buys a particular
product with probability 0.01 from the store. What is the probability that
three products are sold on a certain day?
72) Suppose that in a telecommunications office, the number of calls made to
the center in one hour follows a Poisson distribution with a mean of 10. The
telecommunication operator spends 15 minutes per hour on his personal
work and does not respond to the phone. If the random variable 𝑋 denotes
the number of calls that he responds to in 8 hours, obtain the probability
function of X.
73) The number of events per year in an area of S square kilometers follows a
Poisson distribution with parameter S/2. What is the probability that no
accident occurs within a two-kilometer radius of a certain point in 2 years?
74) If the moment generating function of the random variable X is M_X(t) = e^(e^t − 1),
then it is desired to calculate:
a. The expected value and variance of this random variable.
b. 𝑃( 𝑋 > 2)
c. 𝑃(𝑋 = 1|𝑋 < 3)
d. The probability that this random variable takes on an even value.
75) Obtain the expected value of the random variable 𝑋 for the following
probability functions:
a. P(X = x) = c · 2^(2x)/x! ; x = 0, 1, 2, …
b. P(X = x) = c/x! ; x = 0, 1, 2, …
c. P(X = x) = c/x! ; x = 1, 2, 3, …

76) A random number generator produces one of the numbers from 0 to 99
independently with equal probability in each trial. It is desired to calculate the
approximate probability that at least one of them is zero.

77) In a pastry shop, when raisin bread is prepared, n raisins are randomly spread
on the dough, and then k raisin breads are baked from this dough. If n and k
are large, what is the approximate probability that a given raisin bread
contains at least one raisin?
78) If 30 people throw their hats in the middle of a room and each of them chooses
a hat at random, then it is desired to calculate the probability that:
a. Exactly two people pick their own hats.
b. At least one person picks up his own hat.
79) If 20 couples are arranged randomly in a row, what is the approximate
probability that none of the couples is next to each other?
80) If 100 people are in a party, what is the probability that:
a. No two people were born in one day?
b. No three people were born in one day?
81) The management of a movie theater knows from experience that every
spectator who buys a ticket attends the screening with probability 98%.
Suppose that the hall's capacity is 298 people, and 300 tickets have been sold
for a certain screening. What is the approximate probability that the hall faces
a shortage of seats at this screening?
82) The number of severe accidents in a city follows a Poisson distribution with a
mean of 5 accidents per day. Assuming that the number of accidents in
different days is independent, what is the probability that a day with 6
accidents occurs before a day with 3 accidents?
Hint: use the result of Example 4.8 in Chapter 3.
83) The number of each person's accidents in an accident-prone area follows a
Poisson random variable with parameter 𝜆 = 5 accidents per year. Suppose
that a new law is enacted that reduces the Poisson parameter for the people
enforcing the law to 𝜆 = 3. If only 75 percent of the people in this area enforce
the law and we know that a particular person has not had an accident in one
year, it is desired to calculate:
a. The probability that the person enforces the law.

b. The probability that the person will not have an accident in the next
year.
84) If the random variable X denotes the number of road accidents per day,
following a Poisson distribution with parameter λ = 1, it is desired to calculate:
a. The average number of accidents on days in which at most one accident
occurs (calculating 𝐸(𝑋|𝑋 ≤ 1)).
b. 𝐸(𝑋 2 |𝑋 ≤ 1)
c. 𝑉𝑎𝑟(𝑋|𝑋 ≤ 1)
d. The average number of accidents on days in which the number of
accidents is not equal to zero (calculating 𝐸(𝑋|𝑋 > 0)).
e. 𝐸(𝑋 2 |𝑋 > 0)
f. 𝑉𝑎𝑟(𝑋|𝑋 > 0)
85) If the random variable 𝑋 follows a Poisson distribution with parameter 𝜆, using
Problem 57 of Chapter 5, obtain the expected value and variance of this
distribution.
86) Suppose that the random variable X follows a Poisson distribution with
probability function P(X = x) = e^(−λ) λ^x / x! ; x = 0, 1, 2, … . In these conditions,
obtain the following values:
a. 𝐸(𝑋!)
b. E(1/(1 + X))
87) Suppose that the random variable 𝑋 follows a Poisson distribution with
parameter 𝜆 = 2. In these conditions, it is desired to calculate:
a. E[e^(2X)]
b. E[(−1)^X]
c. E[(1 + 2X)²]
d. E[X e^(2X)]
Hint: ∂E(e^(tX))/∂t = E(X e^(tX))

e. E[X(−1)^(X−1)]
Hint: ∂E(t^X)/∂t = E(X t^(X−1))
f. E[X!/(X − 3)!]
88) Cars pass along a street, whose width equals that of a car, according to a
Poisson process. A person wants to cross this street; his crossing time is
constant and equal to T. Suppose that if a car passes while the person is
crossing, it will definitely hit him.
a. In each trial, what is the probability that the person has the opportunity
to cross the street?
b. Before the person succeeds in crossing the street, on average, how
many trials should he do?
89) There are 6 white and 3 black marbles in a box. If we select 4 marbles at
random from the box, then obtain the expected value and variance of the
number of selected white marbles in two cases of with replacement and
without replacement.
90) Suppose that we have a lot of size 10, five of which are defective. We select
a sample of size 4 at random and without replacement from the lot. If the
random variable 𝑋 denotes the number of defective items in a sample of size
4, then it is desired to calculate:
a. 𝐸(𝑋)
b. 𝑉𝑎𝑟(𝑋)
c. 𝐸[(𝑋 − 1)2 ]
d. 𝑃(𝑋 = 0|𝑋 ≤ 1)
e. 𝐸(𝑋|𝑋 ≥ 1)
91) A company sells its products in lots of size 10. A buyer first takes a sample of
size 3 at random from the lot and accepts the lot only if the inspected sample
contains no defective products. Suppose that 25 percent of lots
contain no defective products, 50 percent contain one defective product, and
25 percent contain two defective products. In these conditions,

a. What is the probability that a lot randomly selected by the buyer is
accepted?
b. If a lot is accepted, what is the probability that the lot contains no
defective products?
92) In a lot produced by a company, there are 1000 products, 10 of which are
defective. If we select a sample of size 100 at random, what is the approximate
probability that the sample contains exactly one defective product?
93) Suppose we have two lots of size 1000 and one of them contains 10 defective
pieces and the other lot contains 20 defective pieces. We select one of the lots
at random and withdraw a sample of size 100 at random from it. If the sample
contains no defective pieces, then we accept the lot. In these conditions,
a. What is the approximate probability of accepting the lot?
b. If a lot is accepted, what is the approximate probability that the lot
contains 10 defective pieces?
94) Suppose that there are three white and seven black balls in a box. Two random
samples of size 2 and 2 are selected successively and without replacement
from the box. If 𝑋 and 𝑌 denote the number of white balls in the first and
second samples, respectively, then obtain the expected value of random
variables 𝑋 and 𝑌.
95) Consider a lot consisting of 10 pieces, four of which are defective. We select
a sample of size 3 at random and without replacement from the lot. If the
inspection time of each nondefective piece is 2 minutes and the inspection as
well as repair time of each defective piece is 3 minutes, then obtain the
expected value and standard deviation of the inspection as well as repair time
(measured in minutes) required for the pieces of the sample.
96) Obtain the expected value and variance of the following random variables:
a. P(X = i) = 1/k ; i = 1, 2, 3, …, k
b. P(Y = i) = 1/(k + 1) ; i = 0, 1, 2, 3, …, k
c. P(Z = i) = 1/k ; i = 1, 3, 5, …, 2k − 1

97) A person has 10 keys, one of which opens the door of his home. If he selects
the keys at random until getting the key that opens the door, obtain the
expected value and variance of the number of choices required to open the
door in the following conditions:
a. Choices are without replacement.
b. Choices are with replacement.
98) In the preceding problem, suppose that it takes 7 seconds to try any key that
does not open the door and 4 seconds to try any key that opens the door.
Obtain the expected value and standard deviation of the time (measured in
seconds) required to open the door in the following conditions:
a. Choices are without replacement.
b. Choices are with replacement.

In Chapter 4, we presented the definition of a random variable and then introduced
different types of random variables in Chapter 5. In Chapter 6, some special
discrete random variables were introduced, and their properties were addressed. In
this chapter, we examine some continuous random variables that have many
applications.

The first and simplest continuous random variable is the uniform random variable,
in which the density at the different points of an interval in continuous space is
assumed to be the same. If X denotes the coordinates of points in the interval (a, b),
the density of all points in the space is the same and assumed to be equal to a number
such as k. In other words, the density function of this random variable in the interval
(a, b) does not depend on the value of x. It is defined as follows:

f(x) = { k ; a ≤ x ≤ b
       { 0 ; otherwise

If the random variable X follows a uniform distribution on an interval of real
numbers from a to b, we briefly denote it by X ∼ U(a, b).

Since this function should be a density function, its integral over the interval
(a, b) must equal 1. Therefore, we have:

∫_{−∞}^{+∞} f(x) dx = 1 ⇒ ∫_a^b k dx = 1 ⇒ kx |_a^b = 1 ⇒ k(b − a) = 1 ⇒ k = 1/(b − a)

Therefore, the density function of the uniform random variable in the interval
(𝑎, 𝑏) is defined as follows:
f(x) = { 1/(b − a) ; a ≤ x ≤ b
       { 0 ; otherwise

Also, Figure 7.1 shows its density function curve.

Figure 7.1: Density function curve of the uniform random variable

If the random variable X is a uniform random variable on the interval [a, b], then
the probability of each interval of length l inside [a, b] is equal to l/(b − a).
Suppose that (u, v) is an interval of length l such that a < u < v < b. Then we have:

P(u < X < v) = ∫_u^v 1/(b − a) dx = x/(b − a) |_u^v = (v − u)/(b − a) = l/(b − a)

Figure 7.2 shows the probability of an interval of length l from a uniform
random variable.

Figure 7.2: The probability of a region from the uniform random variable

Example 2.1

A person's arrival time at his work office follows a uniform distribution on the
interval [8:00, 9:00] (measured in hours). What is the probability that the person
arrives at his office between 8:20 and 8:40?

Solution. If X denotes the time in minutes, taking 8 a.m. as the origin of time, then
we have:

P(20 ≤ X ≤ 40) = ∫_{20}^{40} f(x) dx = ∫_{20}^{40} 1/(60 − 0) dx = x/60 |_{20}^{40} = (40 − 20)/60 = 1/3

Example 2.2

From 7:15 a.m., trains bound for destination A arrive at a station every 15
minutes. If a passenger arrives at the station at a time uniformly distributed between
7 and 8 a.m. and boards the first train to arrive, what is the probability that he waits
at the station for less than 5 minutes?
Solution. To wait less than 5 minutes, the passenger should arrive in one of the
intervals 7:10 to 7:15, 7:25 to 7:30, 7:40 to 7:45, or 7:55 to 8:00. Therefore, if X denotes
the time in minutes, taking 7 a.m. as the origin of time, the required probability is
equal to:

P(10 ≤ X ≤ 15) + P(25 ≤ X ≤ 30) + P(40 ≤ X ≤ 45) + P(55 ≤ X ≤ 60)
= (15 − 10)/60 + (30 − 25)/60 + (45 − 40)/60 + (60 − 55)/60 = 20/60 = 1/3

Example 2.3

Three people are supposed to be present at a place, and each person's arrival
time, independently of the others, follows a uniform distribution on the interval
[8:00, 9:00] (measured in hours). It is desired to calculate the probability that exactly
two of them arrive at the place before 8:20.
Solution. Using the preceding examples, each person, independently of the others,
arrives at the place before 8:20 with probability 1/3. Hence, the probability that exactly
two of them arrive at the place before 8:20 is equal to:

C(3, 2) (1/3)² (2/3)¹ = 2/9

Example 2.4

The number of defective pieces produced by a work station follows a Poisson
distribution with parameter λ = 6 per hour. The inspector of this station observes no
defective pieces in his first inspection at 8 a.m.; however, in his second inspection at
9 a.m., he observes one defective piece. What is the probability that this piece was
produced before 8:20?
Solution. Suppose that events A and B are as follows:
A: one event occurred in the interval [8:00 − 8:20].
B: one event occurred in the interval [8:00 − 9:00].
Therefore, the required probability is equal to:

P(A|B) = P(A ∩ B)/P(B) = [e^(−6×(1/3)) (6 × 1/3)¹/1! × e^(−6×(2/3)) (6 × 2/3)⁰/0!] / [e^(−6×1) (6 × 1)¹/1!] = 1/3

Note that A ∩ B means that one event occurs in the first 20 minutes and no
events occur in the remaining 40 minutes.
As seen, if we know that an event has occurred in a one-hour interval of a
Poisson process, the probability that this event occurred in the first 20 minutes is
equal to 1/3. Likewise, it can be shown that if we know an event has occurred in a time
interval of length T from a Poisson process, then the probability that this event
occurred in an interval of length L inside the same interval (L < T) is equal to L/T. Also,
its occurrence time follows the uniform distribution on that interval.

Example 2.5

The number of customers entering a store per hour follows a Poisson
distribution with parameter λ = 2. If we know that two customers have entered the
store in the interval [8:00 − 9:00], what is the probability that both of them entered
in the first half hour?

Solution. Suppose that events A and B are as follows:
A: Two events occurred in the interval [8:00 − 8:30].
B: Two events occurred in the interval [8:00 − 9:00].
Therefore, the required probability is equal to:

P(A|B) = P(A ∩ B)/P(B) = [e^(−2×(1/2)) (2 × 1/2)²/2! × e^(−2×(1/2)) (2 × 1/2)⁰/0!] / [e^(−2×1) (2 × 1)²/2!] = 1/4

Note that A ∩ B means that the two customers enter in the first half hour and no
customer enters in the second half hour.
In fact, each customer, independently of the others, entered at a uniform time
within the first hour. Hence, the probability that both customers entered in the first
half hour is equal to:

(1/2) × (1/2) = 1/4

Likewise, it can be shown that if we know some events have occurred in a time
interval of length T from a Poisson process, the probability that each of these events
occurred, independently, in an interval of length L inside the same interval (L < T)
is equal to L/T. Also, the occurrence time of each of them follows the uniform
distribution on that interval.
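
This property can be illustrated with a minimal simulation sketch (assuming NumPy; the rate and the number of runs below are arbitrary choices of ours). It generates a Poisson process on one hour through exponential inter-arrival times, conditions on exactly two arrivals, and checks how often both fall in the first half hour:

    import numpy as np

    rng = np.random.default_rng(1)
    lam = 2.0  # rate per hour
    hits = trials = 0
    for _ in range(200_000):
        # simulate arrivals in [0, 1] via exponential inter-arrival times
        t, times = 0.0, []
        while True:
            t += rng.exponential(1 / lam)
            if t > 1.0:
                break
            times.append(t)
        if len(times) == 2:
            trials += 1
            if max(times) < 0.5:
                hits += 1
    print(hits / trials)  # close to (1/2) * (1/2) = 1/4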

The cumulative distribution function of the continuous uniform random variable is
calculated as follows:

F(x) = { 0 ; x ≤ a
       { ∫_a^x 1/(b − a) dy = (x − a)/(b − a) ; a < x < b
       { 1 ; x ≥ b

Figure 7.3 shows the cumulative distribution function curve of the uniform
random variable.

Figure 7.3: The cumulative distribution function curve of the uniform random variable

The expected value and variance of the continuous uniform random variable
on the interval [a, b] are calculated as follows:

E(X) = ∫_{−∞}^{+∞} x f(x) dx = ∫_a^b x/(b − a) dx = (b² − a²)/(2(b − a)) = (a + b)/2
E(X²) = ∫_{−∞}^{+∞} x² f(x) dx = ∫_a^b x²/(b − a) dx = (b³ − a³)/(3(b − a)) = (b² + ab + a²)/3
Var(X) = E(X²) − [E(X)]² = (b² + ab + a²)/3 − (a + b)²/4 = (b − a)²/12

The moment generating function of the continuous uniform random variable
on the interval [a, b] is determined as follows:

M_X(t) = E(e^(tX)) = ∫_{−∞}^{+∞} e^(tx) f(x) dx = ∫_a^b e^(tx)/(b − a) dx = e^(tx)/(t(b − a)) |_a^b = (e^(tb) − e^(ta))/(t(b − a))

Moreover, to calculate E(X) and E(X²), we can differentiate the moment
generating function once and twice with respect to t, respectively, and then let t go to
zero. Note that because of the symmetry of the uniform distribution, its median and
mean are the same, both equal to (a + b)/2.

Proposition 2-1
If the random variable X follows a uniform distribution in the interval [𝑎, 𝑏], then 𝑌 =
𝑐𝑋 + 𝑑 for 𝑐 > 0 follows a uniform distribution in the interval [𝑐𝑎 + 𝑑, 𝑐𝑏 + 𝑑], and for
𝑐 < 0 follows a uniform distribution in the interval [𝑐𝑏 + 𝑑, 𝑐𝑎 + 𝑑].

Proof. According to Theorem 8.1 in Chapter 4, since the function Y = cX + d is
invertible, the density function is determined as follows:

y = g(x) = cx + d ⇒ x = g⁻¹(y) = (y − d)/c ⇒ dx/dy = dg⁻¹(y)/dy = 1/c
⇒ f_Y(y) = f_X((y − d)/c) |1/c| = (1/(b − a)) |1/c|

where, for positive values of c, the above density function is a uniform density on
the interval [ca + d, cb + d], and for negative values of c, it is a uniform density on
the interval [cb + d, ca + d].
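
A small simulation sketch (assuming NumPy; the constants are illustrative) makes the proposition concrete for a negative value of c:

    import numpy as np

    rng = np.random.default_rng(2)
    a, b, c, d = 0.0, 1.0, -3.0, 5.0
    y = c * rng.uniform(a, b, size=1_000_000) + d
    # for c < 0, Y should be uniform on [c*b + d, c*a + d] = [2, 5]
    print(y.min().round(3), y.max().round(3))  # ~2.0 and ~5.0
    print(np.mean((y >= 2) & (y <= 3)))        # ~1/3, as uniformity predicts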

The density function of a normal random variable with parameters μ and σ²,
indicating the mean and variance of this random variable respectively, is as follows:

f(x) = (1/(σ√(2π))) e^(−(x−μ)²/(2σ²)) ; −∞ < x < +∞

If the random variable X follows a normal distribution with parameters μ and σ²,
we briefly denote it by X ∼ N(μ, σ²).
The density function curve of the normal distribution is bell-shaped and
symmetric about μ.

Figure 7.4: The density function curve of the normal random variable with parameters 𝜇 and 𝜎 2

Some examples of the normal distribution are the height of men in a college,
measurement error of length of a piece, the diameter of a machined shaft, and annual
rainfall in an area.

The normal distribution is the most applicable distribution for modeling random
experiences in the real world. This distribution was initially introduced by Abraham
DeMoivre, a French mathematician, in 1733 to approximate a binomial distribution
with parameters n and p = 1/2. However, it remained relatively unknown for
approximately 100 years until Gauss presented this distribution in 1809. This is why
it is sometimes called a Gaussian distribution. Of course, Laplace also used the normal
distribution to approximate a binomial distribution with parameters n and p. This
distribution is called normal because it is compatible with many real variables
available in nature. To show this fact, many efforts were made in the 18th and 19th
centuries.
To prove that f(x) is a density function, we should show that its integral over the
interval (−∞, +∞) is equal to 1. To do so, using the variable substitution y = (x − μ)/σ, we have:

(1/(σ√(2π))) ∫_{−∞}^{+∞} e^(−(x−μ)²/(2σ²)) dx = (1/√(2π)) ∫_{−∞}^{+∞} e^(−y²/2) dy

Also, we define:

I = ∫_{−∞}^{+∞} e^(−y²/2) dy

Therefore, we have:

I² = ∫_{−∞}^{+∞} e^(−y²/2) dy ∫_{−∞}^{+∞} e^(−x²/2) dx = ∫_{−∞}^{+∞} ∫_{−∞}^{+∞} e^(−(x²+y²)/2) dy dx

Now, defining x = r cos θ and y = r sin θ leads to:

dx dy = r dθ dr ⇒ I² = ∫_0^∞ ∫_0^{2π} e^(−r²/2) r dθ dr = ∫_0^∞ r e^(−r²/2) (∫_0^{2π} dθ) dr = 2π ∫_0^∞ r e^(−r²/2) dr
= −2π e^(−r²/2) |_0^∞ = 2π ⇒ I = √(2π) ⇒ (1/√(2π)) ∫_{−∞}^{+∞} e^(−y²/2) dy = √(2π)/√(2π) = 1

To determine the mean of a normal distribution, we have:

E(X) = (1/(σ√(2π))) ∫_{−∞}^{+∞} x e^(−(x−μ)²/(2σ²)) dx

where, letting x = (x − μ) + μ, we have:

E(X) = (1/(σ√(2π))) ∫_{−∞}^{+∞} (x − μ) e^(−(x−μ)²/(2σ²)) dx + μ (1/(σ√(2π))) ∫_{−∞}^{+∞} e^(−(x−μ)²/(2σ²)) dx

where, letting y = (x − μ) in the first integral and knowing that f(x) is a normal
density function, we have:

E(X) = (1/(σ√(2π))) ∫_{−∞}^{+∞} y e^(−y²/(2σ²)) dy + μ ∫_{−∞}^{+∞} f(x) dx = 0 + μ

To determine the variance of the normal distribution, we have:

Var(X) = E[(X − μ)²] = (1/(σ√(2π))) ∫_{−∞}^{+∞} (x − μ)² e^(−(x−μ)²/(2σ²)) dx

where, letting y = (x − μ)/σ, we have:

Var(X) = (σ²/√(2π)) ∫_{−∞}^{+∞} y² e^(−y²/2) dy = (σ²/√(2π)) [−y e^(−y²/2) |_{−∞}^{+∞} + ∫_{−∞}^{+∞} e^(−y²/2) dy]
= (σ²/√(2π)) ∫_{−∞}^{+∞} e^(−y²/2) dy = σ²

Proposition 3-1
If the random variable X follows a normal distribution with parameters μ and σ², then
Y = aX + b also follows a normal distribution with parameters aμ + b and a²σ².

Proof. According to Theorem 8.1 in Chapter 4, since the function Y = aX + b is invertible,
its density function is determined as follows:

y = g(x) = ax + b ⇒ x = g⁻¹(y) = (y − b)/a ⇒ dx/dy = dg⁻¹(y)/dy = 1/a
⇒ f_Y(y) = f_X((y − b)/a) |1/a| = (1/(|a|σ√(2π))) e^(−((y−b)/a − μ)²/(2σ²)) = (1/(|a|σ√(2π))) e^(−(y−(aμ+b))²/(2a²σ²)) ; −∞ < y < +∞

where the above density function is a normal density function with mean aμ + b
and variance a²σ².

The most important result of Proposition 3-1 is that if X follows a normal
distribution with parameters μ and σ², then the random variable Z = (X − μ)/σ also
follows a normal distribution, with mean zero and variance one. In such conditions,
the random variable Z follows a standard normal distribution with the following
density function:

f(z) = (1/√(2π)) e^(−z²/2) ; −∞ < z < +∞

The cumulative distribution function is usually used to calculate the
probabilities of the normal distribution. In other words, the variable substitution
Z = (X − μ)/σ is used to calculate the probability of the normal random variable X.
Hence, for the cumulative distribution function of X, we have:

F_X(a) = P(X ≤ a) = P((X − μ)/σ ≤ (a − μ)/σ) = F_Z((a − μ)/σ) = φ((a − μ)/σ)

(Usually, the standard normal cumulative distribution function is denoted by
φ.)

Figure 7.5: Turning the normal distribution of 𝑋 into the standard normal distribution
Therefore, by using Table 7.1, designed for the standard normal distribution,
we can obtain the probability of any event for the random variable X with
parameters μ and σ². It should be noted that the cumulative distribution function
values are given only for nonnegative values of z in this table.

Table 7.1 𝜑(𝑧) is the area of under the standard normal curve in the left side of z
Z 0 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09
0 0.5 0.50398 0.50797 0.51196 0.51595 0.51993 0.52392 0.5279 0.53188 0.53585
0.1 0.53982 0.54379 0.54775 0.55171 0.55567 0.55961 0.56355 0.56749 0.57142 0.57534
0.2 0.57925 0.58316 0.58706 0.59095 0.59483 0.5987 0.60256 0.60641 0.61026 0.61409
0.3 0.61791 0.62171 0.62551 0.6293 0.63307 0.63683 0.64057 0.6443 0.64802 0.65173
0.4 0.65542 0.65909 0.66275 0.6664 0.67003 0.67364 0.67724 0.68082 0.68438 0.68793
0.5 0.69146 0.69497 0.69846 0.70194 0.7054 0.70884 0.71226 0.71566 0.71904 0.7224
0.6 0.72574 0.72906 0.73237 0.73565 0.73891 0.74215 0.74537 0.74857 0.75174 0.7549
0.7 0.75803 0.76114 0.76423 0.7673 0.77035 0.77337 0.77637 0.77935 0.7823 0.78523
0.8 0.78814 0.79102 0.79389 0.79673 0.79954 0.80233 0.8051 0.80784 0.81057 0.81326
0.9 0.81593 0.81858 0.82121 0.82381 0.82639 0.82894 0.83147 0.83397 0.83645 0.83891
1 0.84134 0.84375 0.84613 0.84849 0.85083 0.85314 0.85542 0.85769 0.85992 0.86214
1.1 0.86433 0.8665 0.86864 0.87076 0.87285 0.87492 0.87697 0.87899 0.88099 0.88297
1.2 0.88493 0.88686 0.88876 0.89065 0.89251 0.89435 0.89616 0.89795 0.89972 0.90147
1.3 0.90319 0.9049 0.90658 0.90824 0.90987 0.91149 0.91308 0.91465 0.9162 0.91773
1.4 0.91924 0.92073 0.92219 0.92364 0.92506 0.92647 0.92785 0.92921 0.93056 0.93188
1.5 0.93319 0.93447 0.93574 0.93699 0.93821 0.93942 0.94062 0.94179 0.94294 0.94408
1.6 0.9452 0.9463 0.94738 0.94844 0.94949 0.95052 0.95154 0.95254 0.95352 0.95448
1.7 0.95543 0.95636 0.95728 0.95818 0.95907 0.95994 0.96079 0.96163 0.96246 0.96327
1.8 0.96406 0.96485 0.96562 0.96637 0.96711 0.96784 0.96855 0.96925 0.96994 0.97062
1.9 0.97128 0.97193 0.97257 0.97319 0.97381 0.97441 0.975 0.97558 0.97614 0.9767
2 0.97724 0.97778 0.9783 0.97882 0.97932 0.97981 0.9803 0.98077 0.98123 0.98169
2.1 0.98213 0.98257 0.98299 0.98341 0.98382 0.98422 0.98461 0.98499 0.98537 0.98573
2.2 0.98609 0.98644 0.98679 0.98712 0.98745 0.98777 0.98808 0.98839 0.98869 0.98898
2.3 0.98927 0.98955 0.98982 0.99009 0.99035 0.99061 0.99086 0.9911 0.99134 0.99157
2.4 0.9918 0.99202 0.99223 0.99245 0.99265 0.99285 0.99305 0.99324 0.99343 0.99361
2.5 0.99379 0.99396 0.99413 0.99429 0.99445 0.99461 0.99476 0.99491 0.99505 0.9952
2.6 0.99533 0.99547 0.9956 0.99573 0.99585 0.99597 0.99609 0.9962 0.99631 0.99642
2.7 0.99653 0.99663 0.99673 0.99683 0.99692 0.99702 0.9971 0.99719 0.99728 0.99736
2.8 0.99744 0.99752 0.99759 0.99767 0.99774 0.99781 0.99788 0.99794 0.99801 0.99807
2.9 0.99813 0.99819 0.99824 0.9983 0.99835 0.99841 0.99846 0.99851 0.99855 0.9986
3 0.99865 0.99869 0.99873 0.99877 0.99881 0.99885 0.99889 0.99892 0.99896 0.99899
3.1 0.99903 0.99906 0.99909 0.99912 0.99915 0.99918 0.99921 0.99923 0.99926 0.99928
3.2 0.99931 0.99933 0.99935 0.99938 0.9994 0.99942 0.99944 0.99946 0.99948 0.99949
3.3 0.99951 0.99953 0.99954 0.99956 0.99958 0.99959 0.99961 0.99962 0.99963 0.99965
3.4 0.99966 0.99967 0.99968 0.99969 0.9997 0.99971 0.99972 0.99973 0.99974 0.99975
3.5 0.99976 0.99977 0.99978 0.99979 0.99979 0.9998 0.99981 0.99982 0.99982 0.99983
3.6 0.99984 0.99984 0.99985 0.99985 0.99986 0.99986 0.99987 0.99987 0.99988 0.99988
3.7 0.99989 0.99989 0.9999 0.9999 0.9999 0.99991 0.99991 0.99991 0.99992 0.99992
3.8 0.99992 0.99993 0.99993 0.99993 0.99993 0.99994 0.99994 0.99994 0.99994 0.99994
3.9 0.99995 0.99995 0.99995 0.99995 0.99995 0.99996 0.99996 0.99996 0.99996 0.99996

Note that the numbers in the left-hand side column and the top row of the
table are related to the standard normal random variable values. Moreover, the
numbers inside the table indicate the values of the cumulative distribution function
for this random variable. For instance, the intersection of number 1.6 from the left-
hand side column and number 0.05 from the top row of the table is 0.9505, meaning

that the standard normal random variable takes on values less than or equal to 1.65
with probability 0.9505. In other words, we have:
𝜑(1.65) = 0.9505
In this table, the cumulative distribution function values for negative arguments
are not shown because, using the symmetry property of the standard normal
distribution, we can obtain them from the following relationship:

P(Z ≤ −x) = P(Z ≥ x) = 1 − P(Z < x) ⇒ φ(−x) = 1 − φ(x) ; x > 0
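
Readers who prefer software to tables can reproduce both the symmetry relation and the entries of Table 7.1 with a short check (a sketch assuming SciPy):

    from scipy.stats import norm

    print(norm.cdf(1.65))                       # ~0.9505, matching Table 7.1
    print(norm.cdf(-1.65), 1 - norm.cdf(1.65))  # equal, by symmetry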

To use this table in a simpler way, consider the following examples:

P(Z ≤ 0) = φ(0) = 0.5
P(Z > 1.35) = 1 − φ(1.35)
P(1.5 ≤ Z ≤ 2.5) = φ(2.5) − φ(1.5)
P(Z ≥ −1.5) = P(Z ≤ 1.5) = φ(1.5)
P(Z ≤ −1.65) = P(Z ≥ 1.65) = 1 − φ(1.65)
P(−2.5 ≤ Z ≤ −1.5) = P(1.5 ≤ Z ≤ 2.5) = φ(2.5) − φ(1.5)
P(−1.5 ≤ Z ≤ 1.5) = φ(1.5) − φ(−1.5) = φ(1.5) − (1 − φ(1.5)) = 2φ(1.5) − 1
P(−1.5 ≤ Z ≤ 2.5) = φ(2.5) − φ(−1.5) = φ(2.5) − (1 − φ(1.5)) = φ(2.5) + φ(1.5) − 1

In general, for positive values of a and b, it can be shown that:

P(−a ≤ Z ≤ a) = φ(a) − φ(−a) = φ(a) − (1 − φ(a)) = 2φ(a) − 1
P(−a ≤ Z ≤ b) = φ(b) − φ(−a) = φ(b) − (1 − φ(a)) = φ(a) + φ(b) − 1

Example 3.1

Suppose that thirty-year-old men's height in a city follows a normal
random variable with parameters μ = 176 (in centimeters) and σ = 5 (in centimeters).
a. What proportion of the thirty-year-old men in the city are shorter
than 182 centimeters?
b. What proportion of the thirty-year-old men in the city are taller
than 178 centimeters?
c. What proportion of the thirty-year-old men in the city have heights
between 178 and 182 centimeters?
d. What proportion of the thirty-year-old men in the city have heights
between 170 and 186 centimeters?
Solution. Suppose that the random variable X denotes the height of a thirty-year-old
man, measured in centimeters.
a.
P(X < 182) = P((X − μ)/σ < (182 − 176)/5) = P(Z < 1.2) = φ(1.2) = 0.8849

b.
P(X > 178) = P((X − μ)/σ > (178 − 176)/5) = P(Z > 0.4) = 1 − φ(0.4) = 1 − 0.6554 = 0.3446

c.
P(178 < X < 182) = P((178 − 176)/5 < Z < (182 − 176)/5) = P(0.4 < Z < 1.2)
= φ(1.2) − φ(0.4) = 0.8849 − 0.6554 = 0.2295

d.
P(170 < X < 186) = P((170 − 176)/5 < Z < (186 − 176)/5) = P(−1.2 < Z < 2)
= φ(2) − φ(−1.2) = φ(2) − (1 − φ(1.2)) = φ(2) + φ(1.2) − 1
= 0.9772 + 0.8849 − 1 = 0.8621

Example 3.2

In a discus throw competition, the participants' throw distance follows a
normal distribution with mean 50 meters and standard deviation 3 meters.
a. What proportion of people throw the discus more than 55 meters?
b. If a person wants to be among the top 3%, what is the minimum
required throw distance?
Solution. Suppose that the random variable X denotes the throw distance of a
participant.
a.
P(X > 55) = P((X − μ)/σ > (55 − 50)/3) = P(Z > 1.66) = 1 − φ(1.66) = 1 − 0.952 = 0.048

b. To be among the top 3%, the person's throw distance should be a
number that is exceeded with probability at most 0.03. Therefore,
we have:
P(X > a) = 0.03 ⇒ P((X − μ)/σ > (a − 50)/3) = 0.03 ⇒ P(Z > (a − 50)/3) = 0.03
⇒ P(Z ≤ (a − 50)/3) = 0.97 ⇒ (a − 50)/3 = 1.88
⇒ a = 50 + 1.88 × 3 = 55.64
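
The inverse lookup in part (b), reading a z-value back from a probability, is exactly what the percent-point function (inverse CDF) computes; a minimal sketch assuming SciPy:

    from scipy.stats import norm

    mu, sigma = 50.0, 3.0
    # smallest throw distance that puts a person in the top 3%
    print(norm.ppf(0.97, loc=mu, scale=sigma))  # ~55.64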

Example 3.3

In an endurance running race, the time taken by participants follows a normal
distribution with mean 90 minutes and standard deviation 5 minutes.
a. What proportion of the participants cross the finish line before 80
minutes?
b. If a person wants to be among the top 5%, what is the maximum
allowable time for crossing the finish line?
Solution. Suppose that the random variable X denotes the time taken by a
participant.
a.
P(X < 80) = P((X − μ)/σ < (80 − 90)/5) = P(Z < −2) = φ(−2) = 1 − φ(2) = 1 − 0.977 = 0.023

b. To be among the top 5%, the person's time should be a number that
is undershot with probability at most 0.05. Therefore, we have:
P(X < a) = 0.05 ⇒ P((X − μ)/σ < (a − 90)/5) = 0.05 ⇒ P(Z < (a − 90)/5) = 0.05
⇒ (a − 90)/5 = −1.65 ⇒ a = 90 − 1.65 × 5 = 81.75

Example 3.4

The droplet adhesion force of a plastic adhesive is normally distributed with a
mean of 50 kg and a standard deviation of 4 kg. We glue a piece with this adhesive
and, for testing, try to detach it with a 46-kg force. What is the probability that the
bond fails?
Solution. Suppose that the random variable X denotes the adhesion force of the
adhesive. Then the probability that the adhesion force is less than 46 kg is equal to:

P(X < 46) = P((X − 50)/4 < (46 − 50)/4) = P(Z < −1) = 0.1587

Example 3.5

The external diameter of a cylindrical piece (measured in millimeters)
follows a normal distribution with μ = 12 and σ = 0.1. The specified allowable limit
for the external diameter of the piece is 12 ± 0.2, and pieces whose external
diameter is not within the allowable limits are considered defective.
a. What proportion of the pieces are defective?
b. What is the allowable value of σ such that no more than 1% of the
pieces are defective?
Solution.
a. Suppose that the random variable X denotes the external diameter of
a piece (in millimeters). Then the probability that a piece is
nondefective is:
P(11.8 < X < 12.2) = P((11.8 − 12)/0.1 < Z < (12.2 − 12)/0.1) = P(−2 < Z < 2)
= 2φ(2) − 1 = 0.954

Hence, the probability that a piece is defective is:

1 − P(11.8 < X < 12.2) = 1 − 0.954 = 0.046

b. Since the defective fraction should not exceed 0.01, 99% of the pieces
should be nondefective. Therefore, we have:

P((11.8 − 12)/σ < Z < (12.2 − 12)/σ) = P(−0.2/σ < Z < 0.2/σ) = 0.99
⇒ 2P(Z < 0.2/σ) − 1 = 0.99 ⇒ P(Z < 0.2/σ) = 0.995 ⇒ 0.2/σ = 2.575
⇒ σ ≈ 0.0777
The moment generating function of the standard normal random variable is
determined as follows:

M_Z(t) = E(e^(tZ)) = ∫_{−∞}^{+∞} e^(tz) f(z) dz = (1/√(2π)) ∫_{−∞}^{+∞} e^(tz) e^(−z²/2) dz = (1/√(2π)) ∫_{−∞}^{+∞} e^(−(z²−2tz)/2) dz
= (1/√(2π)) ∫_{−∞}^{+∞} e^(−(z−t)²/2 + t²/2) dz = e^(t²/2) ∫_{−∞}^{+∞} (1/√(2π)) e^(−(z−t)²/2) dz = e^(t²/2)

Moreover, using Proposition 9.1 in Chapter 5, for a normal random variable with
mean μ and variance σ², we have:

(X − μ)/σ = Z ⇒ X = σZ + μ ⇒ M_X(t) = M_{σZ+μ}(t) = e^(tμ) M_Z(σt) = e^(tμ) e^(σ²t²/2) = e^(tμ + σ²t²/2)

The moment generating function of the standard normal distribution is equal
to e^(t²/2). If we differentiate it r times with respect to t and then let t be zero, we have:

E(Z^r) = { r!/(2^(r/2) (r/2)!) = 1 × 3 × ⋯ × (r − 1) ; even r
         { 0 ; odd r

For example, E(Z⁵) = 0 and E(Z⁶) = 1 × 3 × 5 = 15.

Likewise, the value of E((X − μ)^r) for a normal distribution with mean μ and
variance σ² is determined as follows:

E((X − μ)^r) = σ^r E(((X − μ)/σ)^r) = σ^r E(Z^r)
= { σ^r r!/(2^(r/2) (r/2)!) = σ^r × 1 × 3 × ⋯ × (r − 1) ; even r
  { 0 ; odd r

Moreover, using the above relationships, it can be shown that the normal
distribution's skewness and kurtosis are equal to 0 and 3, respectively:

Skewness = E[(X − μ)³]/σ³ = E[((X − μ)/σ)³] = E[Z³] = 0
Kurtosis = E[(X − μ)⁴]/σ⁴ = E[((X − μ)/σ)⁴] = E[Z⁴] = 4!/(2² × 2!) = 3

There are also other properties of the normal distribution such as:
➢ If 𝑋 follows a normal distribution with mean 𝜇, then because of the
normal distribution symmetry, we have 𝑓𝑋 (𝜇 − 𝑎) = 𝑓𝑋 (𝜇 + 𝑎).
➢ The median, mode, and mean of the normal distribution are the same
and equal to 𝜇.

As mentioned before, the normal distribution as an approximation to the binomial
distribution was first proposed by DeMoivre for the special case p = 1/2 and was
then generalized by Laplace for the general case of p. Laplace showed that if X is a
binomial random variable with parameters n and p, and n is large, then X can be
approximated by a normal distribution with parameters μ = np and σ² = np(1 − p).

Theorem 3.1 The DeMoivre-Laplace limit theorem

If X is a binomial random variable with parameters n and p, then for any two
numbers a and b (b > a), as n approaches infinity, we have:

P(a ≤ (X − np)/√(np(1 − p)) ≤ b) → φ(b) − φ(a)

Figure 7.6 shows the binomial distribution probability function and its
similarity to the normal distribution for different cases of 𝑛 and 𝑝.

Figure 7.6 The binomial distribution probability function for different cases of 𝑛 and 𝑝

In general, if n is large enough that np(1 − p) ≥ 10 holds, the normal
distribution is an appropriate approximation to the binomial distribution.
Since the binomial distribution is discrete, the probability of some
integers in this distribution is not zero, while the probability of any single integer
under the normal distribution is zero. Therefore, before applying the normal
approximation, P(X = i) is written as P(i − 1/2 < X < i + 1/2) (this operation is called the
continuity correction). Hence, P(X = i) in the binomial distribution is approximated
as follows:

P(X = i) = P(i − 1/2 < X < i + 1/2) ≈ P((i − 1/2 − np)/√(np(1 − p)) < Z < (i + 1/2 − np)/√(np(1 − p)))

And P(a ≤ X ≤ b) in the binomial distribution is approximated as follows:

P(a ≤ X ≤ b) = P(a − 1/2 < X < b + 1/2) ≈ P((a − 1/2 − np)/√(np(1 − p)) < Z < (b + 1/2 − np)/√(np(1 − p)))

Example 3.6

The pieces produced by a factory are, independently of one another, of
acceptable quality with probability 0.8.
a. Obtain the approximate probability that at most 15 of 100 produced pieces are
not acceptable.
b. Obtain the approximate probability that more than 26 of 100 produced pieces
are not acceptable.
c. Obtain the approximate probability that at least 12 and at most 28 of 100
produced pieces are not acceptable.

Solution. Suppose that X denotes the number of unacceptable pieces; each piece is
defective with probability 0.2. Since npq = 100 × 0.2 × 0.8 = 16 ≥ 10, the normal
approximation applies, and we have:

a.
P(X ≤ 15) = P((X − np)/√(np(1 − p)) ≤ (15 + 1/2 − 100 × 0.2)/√(100 × 0.2 × 0.8))
= P(Z ≤ −4.5/4) = P(Z ≤ −1.125) ≈ 0.13

b.
P(X > 26) = P(X ≥ 27) = P((X − np)/√(np(1 − p)) ≥ (27 − 1/2 − 100 × 0.2)/√(100 × 0.2 × 0.8))
= P(Z ≥ 6.5/4) = P(Z ≥ 1.625) ≈ 0.052
c.
P(12 ≤ X ≤ 28) = P((12 − 1/2 − 100 × 0.2)/√(100 × 0.2 × 0.8) ≤ Z ≤ (28 + 1/2 − 100 × 0.2)/√(100 × 0.2 × 0.8))
= P(−8.5/4 ≤ Z ≤ 8.5/4) = P(−2.125 ≤ Z ≤ 2.125) = 2φ(2.125) − 1
= 2 × 0.983 − 1 = 0.966
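
Since the binomial probabilities in this example can also be computed exactly, a short check (a sketch assuming SciPy) shows how close the continuity-corrected normal approximation comes:

    from scipy.stats import binom, norm

    n, p = 100, 0.2
    mu, sigma = n * p, (n * p * (1 - p)) ** 0.5
    # part a: exact binomial tail vs. normal approximation
    print(binom.cdf(15, n, p), norm.cdf((15.5 - mu) / sigma))  # ~0.129 vs ~0.130
    # part b: P(X > 26) exactly vs. approximately
    print(binom.sf(26, n, p), norm.sf((26.5 - mu) / sigma))    # ~0.056 vs ~0.052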

The random variable X follows an exponential distribution with parameter λ > 0
if its density function is as follows:

f(x) = { λe^(−λx) ; x ≥ 0
       { 0 ; x < 0

If the random variable X follows an exponential distribution with parameter
λ > 0, we briefly denote it by X ∼ Exp(λ). Figure 7.7 shows the density function
curve of this random variable.
Figure 7.7: The exponential density function

The cumulative distribution function of the exponential random variable is
equal to:

F(a) = P(X ≤ a) = ∫_0^a λe^(−λx) dx = −e^(−λx) |_0^a = 1 − e^(−λa)

In the exponential distribution, for positive values of a and b such that b > a,
we have:

P(a < X < b) = P(X < b) − P(X ≤ a) = (1 − e^(−λb)) − (1 − e^(−λa)) = e^(−λa) − e^(−λb)

Moreover, the expected value and variance of the exponential random variable
are:

E(X) = ∫_0^∞ x λe^(−λx) dx = −x e^(−λx) |_0^∞ + ∫_0^∞ e^(−λx) dx = 0 + 1/λ = 1/λ
E(X²) = ∫_0^∞ x² λe^(−λx) dx = −x² e^(−λx) |_0^∞ + 2 ∫_0^∞ x e^(−λx) dx = 0 + 2/λ²
Var(X) = E(X²) − [E(X)]² = 2/λ² − (1/λ)² = 1/λ²

Example 4.1

Suppose a person's call duration on a mobile phone (in minutes) is an
exponential random variable with parameter λ = 1/2. What percentage of that person's
calls last more than 4 minutes?

Solution. Suppose that the random variable X denotes the person's call duration.

P(X > 4) = ∫_4^∞ (1/2) e^(−x/2) dx = −e^(−x/2) |_4^∞ = e^(−2) = 0.1353
It can be shown that if Y denotes the number of times an event occurs
per unit of time and follows a Poisson distribution with parameter λ, then the
duration between any two consecutive events of this Poisson process
follows an exponential distribution with density function f_X(x) = λe^(−λx) ; x ≥ 0.
Moreover, the duration from the zero moment to the first event also follows the
exponential distribution. To prove this property of the exponential distribution,
suppose that Y_{t=a} denotes the number of events occurring in “a” units
of time, which follows a Poisson distribution with parameter λa. If the duration from
the occurrence of one event to the next is called X, then for positive
values of a, we have:

F_X(a) = P(X ≤ a) = 1 − P(X > a)

where (X > a) means that the time from the occurrence of one event to the
next lasts more than “a” units of time. In other words, in the next “a” units of
time of the Poisson process, zero events occur. Therefore, we have:

F_X(a) = P(X ≤ a) = 1 − P(X > a) = 1 − P(Y_{t=a} = 0) = 1 − e^(−λa)(λa)⁰/0! = 1 − e^(−λa)
⇒ f_X(a) = dF_X(a)/da = λe^(−λa) ; a ≥ 0

Therefore, as seen, the random variable X follows an exponential
distribution with a rate of occurrence λ per unit of time.

Example 4.2

A store starts working from 8 a.m. If the number of customers entering the store
follows a Poisson distribution with parameter λ = 3 (people per hour), then:
a. What is the probability that we wait at least 40 minutes between the
arrivals of the second and third customers?
b. What is the probability that the first customer enters the store after 8:30
a.m.?

Solution.
a. If the random variable X denotes the duration between the second and
third customers, then X follows an exponential distribution with
parameter λ = 3. Therefore, we have:

P(X > 40/60) = ∫_{2/3}^∞ 3e^(−3x) dx = e^(−3×(2/3)) = e^(−2)

b. The time from the zero moment to the first customer's arrival also follows
an exponential distribution with parameter λ = 3. Therefore, we have:

P(X > 30/60) = ∫_{1/2}^∞ 3e^(−3x) dx = e^(−3×(1/2)) = e^(−3/2)

Second solution: part (b) asks for the probability that no customer
enters the store in the interval [8:00 − 8:30]. Therefore, the probability
of such an event is equal to:

e^(−3×(1/2)) (3 × 1/2)⁰ / 0! = e^(−3/2)

Example 4.3

If the number of earthquakes in an area follows a Poisson distribution with
mean 3 earthquakes per year, what is the expected value of the duration between two
consecutive earthquakes?

Solution. If Y denotes the number of earthquakes per year and X denotes the
duration between two consecutive earthquakes, then Y follows a Poisson distribution
with mean λ = 3 earthquakes per year, and X follows an exponential distribution with
mean 1/λ = 1/3 of a year. If, on average, 3 earthquakes occur per year, we expect the
duration between two consecutive earthquakes to be 1/3 of a year.

The moment generating function of the exponential distribution is determined as
follows:

M_X(t) = E(e^(tX)) = ∫_0^∞ e^(tx) λe^(−λx) dx = λ ∫_0^∞ e^(−(λ−t)x) dx = λ/(λ − t) ; t < λ

Therefore, for the moments of the random variable about the origin, we have:

E(X) = dM_X(t)/dt |_{t=0} = λ/(λ − t)² |_{t=0} = λ/λ² = 1/λ
E(X²) = d²M_X(t)/dt² |_{t=0} = 2λ/(λ − t)³ |_{t=0} = 2λ/λ³ = 2/λ²
E(X³) = d³M_X(t)/dt³ |_{t=0} = 3 × 2λ/(λ − t)⁴ |_{t=0} = 3!λ/λ⁴ = 3!/λ³
⋮
E(X^r) = d^r M_X(t)/dt^r |_{t=0} = r!λ/(λ − t)^(r+1) |_{t=0} = r!/λ^r

As seen, the r-th moment of the exponential distribution about the origin is
equal to r!/λ^r.

Proposition 4-1
If the random variable X follows an exponential distribution with parameter λ, then,
for a > 0, the distribution of Y = aX is exponential with parameter λ/a.
Proof.
y = g(x) = ax ⇒ x = g⁻¹(y) = y/a ⇒ dx/dy = dg⁻¹(y)/dy = 1/a
⇒ f_Y(y) = f_X(y/a) |1/a| = (λ/a) e^(−(λ/a)y) ; y ≥ 0

Proposition 4-2. The median of the exponential distribution is at the point ln 2 / λ.
Proof.
P(X ≤ m) = 1/2 ⇒ 1 − e^(−λm) = 1/2 ⇒ e^(−λm) = 1/2 ⇒ −λm = ln(1/2) = −ln 2 ⇒ m = ln 2 / λ

Proposition 4-3
The exponential distribution is memoryless. In other words, in the exponential
distribution, for positive values of s and t, we have:
P(X > s + t | X > s) = P(X > t)

Proof.
P(X > s + t | X > s) = P(X > s + t, X > s)/P(X > s) = e^(−λ(s+t))/e^(−λs) = e^(−λt) = P(X > t)

The above relation means that if no event occurs in “s” units of time of an
exponential distribution, the probability that no event occurs in the next “t” units
of time is equal to the probability that no event occurs from the zero moment to
moment “t” (the absence of events during “s” units of time does not affect future
occurrence times).
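
Memorylessness is also easy to see in simulation; the following sketch (assuming NumPy; s and t are arbitrary) conditions on X > s and compares the conditional tail with a fresh exponential tail:

    import numpy as np

    rng = np.random.default_rng(3)
    lam, s, t = 2.0, 1.0, 0.5
    x = rng.exponential(1 / lam, size=2_000_000)
    survivors = x[x > s]
    print(np.mean(survivors > s + t))  # estimates P(X > s+t | X > s)
    print(np.exp(-lam * t))            # P(X > t) = e**(-lam*t); ~same value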

Example 4.4

If the working duration of a device until failure follows an exponential
distribution with parameter λ = 2 (per hour), and we know that the device has not
failed in the first hour, what is the probability that it does not fail until the third hour?
Solution. Since the exponential distribution is memoryless, the required probability
equals the probability that the device does not fail for two hours starting from the
zero moment:

P(X > 3 | X > 1) = P(X > 2) = ∫_2^∞ 2e^(−2x) dx = e^(−4) = 0.0183

If, instead, the distribution of X (the working duration until failure) were
uniform on the interval [0, 5], then the required probability would be:

P(X > 3 | X > 1) = P(X > 3 ∩ X > 1)/P(X > 1) = (2/5)/(4/5) = 1/2 ≠ P(X > 2) = 3/5

Hence, the uniform distribution is not memoryless.

In addition, if X approximately follows a normal distribution with parameters
μ = 2 and σ = 1/2, the required probability is equal to:

P(X > 3 | X > 1) = P(X > 3 ∩ X > 1)/P(X > 1) = P(Z > 2)/P(Z > −2) = 0.0228/0.9772 = 0.0233
≠ P(X > 2) = φ(0) = 1/2

As a result, the normal distribution is not memoryless either.

Example 4.5

The lifetime of a radio, measured in years, follows an exponential distribution
with mean 8 years. If a person buys a second-hand device that has worked for 2 years
so far, what is the probability that this device functions for another 8 years?
Solution. The mean of the exponential distribution is equal to 1/λ; hence, λ = 1/8.
Since the exponential random variable is memoryless, we have:

P(X > 8 + 2 | X > 2) = P(X > 8) = ∫_8^∞ (1/8) e^(−x/8) dx = e^(−1)

The random variable X follows a two-parameter exponential distribution with
parameters λ and “a” if its density function is as follows:

f(x) = λe^(−λ(x−a)) ; x ≥ a

If the random variable X follows a two-parameter exponential distribution
with parameters λ and a, we briefly denote it by X ∼ Exp(λ, a).
For example, suppose that a device has not failed by moment “a”, and we know
that its remaining lifetime follows the exponential distribution. In such a case, its
density function is as above.
The two-parameter exponential distribution is actually the same exponential
distribution shifted by “a” units. Therefore, its mean is “a” units more than the mean
of the exponential distribution, while its variance is equal to the variance of the
exponential distribution. To understand the difference between these two
distributions, note the following:
For X ∼ Exp(λ): f(x) = λe^(−λx), x ≥ 0, with E(X) = 1/λ and Var(X) = 1/λ².
For X ∼ Exp(λ, a): f(x) = λe^(−λ(x−a)), x ≥ a, with E(X) = a + 1/λ and Var(X) = 1/λ².
Figure 7.8: The difference between the exponential and two-parameter exponential distributions

The cumulative distribution function of the two-parameter exponential
random variable is as follows (for b ≥ a):

F_X(b) = P(X ≤ b) = ∫_a^b λe^(−λ(x−a)) dx = −e^(−λ(x−a)) |_a^b = 1 − e^(−λ(b−a))

The random variable X follows a gamma distribution with parameters λ > 0 and
α > 0 if its density function is as follows:

f(x) = λe^(−λx)(λx)^(α−1)/Γ(α) = (λ^α/Γ(α)) x^(α−1) e^(−λx) ; x ≥ 0

If the random variable X follows a gamma distribution with parameters λ > 0
and α > 0, we briefly denote it by X ∼ Γ(α, λ).
In the gamma density function, Γ(α) denotes the gamma function, defined as
follows:

Γ(α) = ∫_0^∞ e^(−y) y^(α−1) dy
Using integration by parts on the gamma function, we have:

Γ(α) = −e^(−y) y^(α−1) |_0^∞ + ∫_0^∞ e^(−y) (α − 1) y^(α−2) dy = (α − 1) ∫_0^∞ e^(−y) y^(α−2) dy = (α − 1)Γ(α − 1)

Hence, for integer values of α, we have:

Γ(α) = (α − 1)Γ(α − 1) = (α − 1)(α − 2)Γ(α − 2) = ⋯

Therefore, considering Γ(1) = ∫_0^∞ e^(−y) dy = 1, for integer values of α, the gamma
function value is equal to:

Γ(α) = (α − 1)!
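
For integer α this recursion is exactly the factorial, which can be confirmed with Python's standard library (a trivial check included only as an aside):

    import math

    for alpha in range(1, 7):
        # gamma(alpha) should equal (alpha - 1)!
        print(alpha, math.gamma(alpha), math.factorial(alpha - 1))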

If α is an integer, the gamma distribution is also called the Erlang distribution.
Indeed, Γ(α) is simply the normalizing constant that makes the area under
the gamma density function equal to 1. The reader should be careful not to confuse it
with the gamma random variable symbol, written X ∼ Γ(α, λ). Figure 7.9 shows
the gamma density function for different values of the parameter α.

Figure 7.9: The gamma distribution density function for different values of the parameter α (α = 1, 2, 5, 10) and λ = 1/4

Because the gamma distribution takes on different shapes as its parameters
change, it is used to model many random phenomena.
The expected value and variance of the gamma distribution are equal to:

E(X) = (1/Γ(α)) ∫_0^∞ λx e^(−λx) (λx)^(α−1) dx = (1/(λΓ(α))) ∫_0^∞ λe^(−λx) (λx)^α dx = Γ(α + 1)/(λΓ(α)) = α/λ
E(X²) = (1/Γ(α)) ∫_0^∞ λx² e^(−λx) (λx)^(α−1) dx = (1/(λ²Γ(α))) ∫_0^∞ λe^(−λx) (λx)^(α+1) dx = Γ(α + 2)/(λ²Γ(α)) = α(α + 1)/λ²
⇒ Var(X) = E(X²) − [E(X)]² = α(α + 1)/λ² − α²/λ² = α/λ²

It is seen that the expected value and variance of the gamma distribution are
α times those of the exponential distribution.

In Chapter 9, we will show that the moment generating function of the gamma
random variable is as follows:

M_X(t) = E(e^(tX)) = (λ/(λ − t))^α ; t < λ

Moreover, the r-th moment of this distribution about the origin is equal to:

E(X^r) = d^r M_X(t)/dt^r |_{t=0} = Γ(r + α)/(λ^r Γ(α))

The r-th moment can also be obtained directly:

E(X^r) = ∫_0^∞ x^r (λ^α/Γ(α)) x^(α−1) e^(−λx) dx = (1/(λ^r Γ(α))) ∫_0^∞ λe^(−λx) (λx)^(r+α−1) dx = Γ(r + α)/(λ^r Γ(α))

It can be shown that if the number of events per time unit follows a Poisson distribution with parameter λ, then the duration, measured from a specified moment, until α events of this process occur follows a gamma distribution with parameters λ and α. To see this, suppose that X denotes the time until α events of this process occur. To obtain its density function, we have:

𝐹𝑋 (𝑎) = 𝑃(𝑋 ≤ 𝑎) = 1 − 𝑃(𝑋 > 𝑎)

where (X > a) means that, starting from the specified moment, it takes more than “a” units of time for α events of this Poisson process to occur. In other words, at most (α − 1) events occur in the next “a” units of time of the Poisson process. Therefore, we have:

F_X(a) = P(X ≤ a) = 1 − P(X > a) = 1 − Σ_{i=0}^{α−1} e^{−λa}(λa)^i / i!
= 1 − [e^{−λa}(λa)^0/0! + e^{−λa}(λa)^1/1! + ⋯ + e^{−λa}(λa)^{α−1}/(α − 1)!]

Now, differentiating the above term with respect to “a”, we can show that:

f_X(a) = dF_X(a)/da = λe^{−λa}(λa)^{α−1}/(α − 1)! = (λ^α/(α − 1)!) a^{α−1} e^{−λa} ; a ≥ 0

As seen, the above function is the density function of a gamma random variable with parameters λ and α.
A closer look reveals that if α = 1 in the gamma distribution, the resulting distribution is the exponential distribution.

Example 5.1

If the number of customers entering a bank per hour follows a Poisson


distribution with parameter 𝜆 = 3 (people per hour), it is desired to calculate:
a. The probability that it takes at least half an hour from the moment of
opening the door to the arrival of the second customer.
b. The probability that the arrival duration between the first and fourth
customers lasts less than 20 minutes.
Solution.
a. Suppose the random variable 𝑋 denotes the time it takes until the
second customer enters, which follows an Erlang distribution with
parameters λ = 3 and α = 2. One way to solve this problem is to integrate the Erlang density over the interval from 1/2 to infinity. Nonetheless, since this integral requires integration by parts, its calculation is often time-consuming. Hence, we use
another method associated with the Erlang distribution concept in
the Poisson process. It should be noted that this event, i.e., that the time until the arrival of the second customer takes more than half an hour, means that the second customer does not enter in the first half hour of the Poisson process. In other words, either no customers or one customer enters in the first half hour. Therefore, we have:

P(X > 1/2) = e^{−3×(1/2)}(3 × 1/2)^0/0! + e^{−3×(1/2)}(3 × 1/2)^1/1! = (5/2)e^{−3/2}

Moreover, note that the occurrence rate in one hour is equal to 3, and in half an hour it is equal to 3/2.
b. If the random variable 𝑌 denotes the arrival duration between the first
and fourth customers, then this random variable follows a gamma
distribution with parameters 𝜆 = 3 and 𝛼 = 3. Hence, we have:
P(Y < 1/3) = 1 − P(Y > 1/3)
where (Y > 1/3) denotes the event that, after the arrival of the first customer, the time until the arrival of another three customers takes more than a third of an hour. In other words, zero, one, or two customers enter in a third of an hour. Hence, we have:

P(Y < 1/3) = 1 − P(Y > 1/3) = 1 − [e^{−3×(1/3)}(3 × 1/3)^0/0! + e^{−3×(1/3)}(3 × 1/3)^1/1! + e^{−3×(1/3)}(3 × 1/3)^2/2!] = 1 − (5/2)e^{−1}
Note that the occurrence rate in one hour is equal to 3 and in a third of an hour is equal to 1.
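The Poisson-sum shortcut used in both parts can be wrapped in a small function. The sketch below (ours) computes P(X > t) for an Erlang variable as the probability of at most α − 1 Poisson(λt) events, and reproduces the two answers above:

import math

def erlang_tail(alpha, lam, t):
    # P(X > t) for X ~ Erlang(alpha, lam): at most alpha - 1 events
    # of a Poisson process with rate lam occur in (0, t).
    mu = lam * t
    return sum(math.exp(-mu) * mu**i / math.factorial(i) for i in range(alpha))

# Part a: P(X > 1/2) with lam = 3, alpha = 2  ->  (5/2) e^{-3/2}
print(erlang_tail(2, 3, 0.5), 2.5 * math.exp(-1.5))
# Part b: P(Y < 1/3) with lam = 3, alpha = 3  ->  1 - (5/2) e^{-1}
print(1 - erlang_tail(3, 3, 1 / 3), 1 - 2.5 * math.exp(-1))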

Proposition 4-1
If the random variable X follows an Erlang (gamma) distribution with parameters α and λ, then, for a > 0, the distribution of Y = aX is Erlang (gamma) with parameters α and λ/a.
Proof.
y = g(x) = ax ⇒ x = g^{−1}(y) = y/a ⇒ dx/dy = dg^{−1}(y)/dy = 1/a

⇒ f_Y(y) = f_X(y/a)|1/a| = (1/a)(λ^α/Γ(α))(y/a)^{α−1} e^{−λy/a} = ((λ/a)^α/Γ(α)) y^{α−1} e^{−(λ/a)y} ; y ≥ 0

The random variable X follows a three-parameter gamma distribution if its density function is as follows:

f(x) = λe^{−λ(x−a)}(λ(x − a))^{α−1}/Γ(α) = (λ^α/Γ(α))(x − a)^{α−1} e^{−λ(x−a)} ; x ≥ a

If the random variable X follows the density function above, we briefly denote it by X ∼ Γ(α, λ, a).
For example, suppose that a device has not failed by moment “a”, and we know that its remaining lifetime follows the gamma distribution. In such a case, its density function is shown as above.
Note that the three-parameter gamma distribution is the gamma
distribution shifted by “𝑎” units. Therefore, it is evident that its expected value is
“𝑎” units more than that of the gamma distribution. However, its variance is equal
to the variance of the gamma distribution. Hence, for this distribution, we have:

E(X) = a + α/λ
Var(X) = α/λ²

The random variable X follows a beta distribution with positive parameters a and b if its density function is as follows:

f(x) = { Γ(a + b)/(Γ(a) × Γ(b)) x^{a−1}(1 − x)^{b−1}  ; 0 < x < 1
         0                                             ; o.w.

If the random variable X follows a beta distribution with parameters a and b, then we briefly denote it by X ∼ β(a, b).
Similar to the gamma distribution, the beta distribution takes on different shapes, except that its values fall into a limited interval. Therefore, this distribution can also be used to model many random phenomena. Figure 7.10 shows the beta distribution density for different values of its parameters.

[Figure 7.10: Different shapes of the beta distribution density function for (a, b) = (1, 1), (0.5, 0.5), (20, 20), and (2, 10).]

Considering the following integral, it can be shown that the integral of the beta
distribution density function in interval (0,1) is equal to 1.

∫_0^1 x^{a−1}(1 − x)^{b−1} dx = Γ(a)Γ(b)/Γ(a + b)

Moreover, using the same integral, the expected value and variance of this distribution are obtained as follows:

E(X) = ∫_0^1 x f(x) dx = (Γ(a + b)/(Γ(a)Γ(b))) ∫_0^1 x^a (1 − x)^{b−1} dx = (Γ(a + b)/(Γ(a)Γ(b))) · (Γ(a + 1)Γ(b)/Γ(a + 1 + b)) = a/(a + b)

E(X²) = ∫_0^1 x² (Γ(a + b)/(Γ(a)Γ(b))) x^{a−1}(1 − x)^{b−1} dx = (Γ(a + b)/(Γ(a)Γ(b))) ∫_0^1 x^{a+1}(1 − x)^{b−1} dx
= (Γ(a + b)/(Γ(a)Γ(b))) · (Γ(a + 2)Γ(b)/Γ(a + 2 + b)) = a(a + 1)/((a + b)(a + b + 1))

⇒ Var(X) = E(X²) − (E(X))² = ab/((a + b)²(a + b + 1))

In the beta random variable, if a = b, then the beta density is symmetric about point 1/2. If a = b = 1, then the beta random variable is a uniform random variable in interval [0,1]. If a < b, then the beta density function is skewed to the right (the distribution peak is shifted to the left, and the smaller values have a higher density), and vice versa.
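As a quick numerical check of these formulas, the sketch below (ours; a = 2 and b = 10 are illustrative and match one of the panels of Figure 7.10) compares sample moments of a beta variable with a/(a + b) and ab/((a + b)²(a + b + 1)):

import numpy as np

a, b = 2.0, 10.0
rng = np.random.default_rng(1)
x = rng.beta(a, b, size=500_000)

print("mean:", x.mean(), " vs ", a / (a + b))
print("var :", x.var(), " vs ", a * b / ((a + b) ** 2 * (a + b + 1)))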

Example 6.1

If X follows a beta distribution with parameters (a, b), then obtain the distribution of Y = cX for positive values of c.

Solution.
y = g(x) = cx ⇒ x = g^{−1}(y) = y/c ⇒ dx/dy = dg^{−1}(y)/dy = 1/c ⇒ f_Y(y) = f_X(y/c)|1/c|
= (Γ(a + b)/(Γ(a)Γ(b))) (y/c)^{a−1}(1 − y/c)^{b−1} (1/c) = (Γ(a + b)/(Γ(a)Γ(b) c^{a+b−1})) y^{a−1}(c − y)^{b−1}
= k y^{a−1}(c − y)^{b−1} ; 0 < y < c

It is seen that the above density function is very similar to the beta density function, except that, unlike the beta distribution whose interval is from 0 to 1, its interval is from 0 to c. This makes this random variable useful in simulating random phenomena.

The random variable X follows a Weibull distribution with parameters α and β if its density function is as follows:

f(x) = { (β/α)(x/α)^{β−1} exp{−(x/α)^β}  ; x > 0
         0                                ; x ≤ 0

If the random variable 𝑋 follows a Weibull distribution with the above density
function, it is usually denoted by 𝑋 ∼ 𝑊(𝛼, 𝛽).
This distribution was initially introduced by Waloddi Weibull, a Swedish
mathematician, to model the fatigue of metals and shear strength of materials.
However, it is used in many engineering fields, including the reliability of complex
systems consisting of multiple pieces.
The cumulative distribution function of the Weibull is as follows:
F_X(x) = 1 − exp{−(x/α)^β} ; x > 0
It can be shown that the expected value and variance of the Weibull distribution are equal to:

E(X) = αΓ(1 + 1/β)
Var(X) = α²[Γ(1 + 2/β) − [Γ(1 + 1/β)]²]
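A short simulation sketch (ours; α = 2 and β = 1.5 are illustrative) can confirm the mean formula. NumPy's rng.weibull(β) draws from the standard Weibull with scale 1, so multiplying by α gives W(α, β):

import math
import numpy as np

alpha, beta = 2.0, 1.5
rng = np.random.default_rng(2)

# Standard Weibull samples scaled by alpha give W(alpha, beta).
x = alpha * rng.weibull(beta, size=500_000)
print("sample mean:", x.mean(), " vs ", alpha * math.gamma(1 + 1 / beta))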

The random variable X follows a Cauchy distribution with parameter θ if its density function is as follows:

f(x) = (1/π) · 1/(1 + (x − θ)²) ; −∞ < x < +∞

If the random variable X follows a Cauchy distribution with parameter θ, we denote it by X ∼ C(θ). Meanwhile, parameter θ is usually equal to zero in statistics applications.
Paying attention to the definition of the expected value of a continuous random variable, calculated by the term ∫_{−∞}^{+∞} x f(x) dx, we can write the expected value of the Cauchy random variable as ∫_{−∞}^{θ} x f(x) dx + ∫_{θ}^{∞} x f(x) dx. Now, it can be shown that the terms ∫_{−∞}^{θ} x f(x) dx and ∫_{θ}^{∞} x f(x) dx do not exist (both integrals diverge) for the Cauchy distribution. Therefore, the expected value of the Cauchy is undefined.

Example 6.2

If X follows a Cauchy distribution with parameter θ = 0, obtain the value of P(X ≤ 1).

Solution.
P(X ≤ 1) = ∫_{−∞}^{1} (1/π) · 1/(1 + x²) dx = (1/π) tan^{−1}(x) |_{−∞}^{1} = (π/4 − (−π/2))/π = (3π/4)/π = 3/4

The random variable X follows a Pareto distribution with parameters α and σ if its density function is as follows:

f(x) = ασ^α / x^{α+1} ; x ≥ σ, α > 0

where α and σ are positive real numbers.

If the random variable X follows a Pareto distribution with the above density function, it is usually denoted by X ∼ Pa(α, σ).
The cumulative distribution function of the Pareto random variable is equal to:

F_X(x) = P(X ≤ x) = ∫_σ^x (ασ^α / t^{α+1}) dt = 1 − (σ/x)^α ; x ≥ σ

This random variable is usually used to model some economic indicators, such as people's income.

Suppose that the continuous and positive random variable X, denoting the lifetime of a piece, has the cumulative distribution function F and density function f. Its failure rate or hazard rate function is defined as follows:

λ(t) = f_X(t)/P(X ≥ t) = f_X(t)/(1 − F_X(t))

In other words, the failure rate at time 𝑡 is equal to the density function of
the device failure at time 𝑡 if we know that the device has not failed until time 𝑡.
Note that this function is only defined for the continuous random variables.
Moreover, integrating both sides of the above relationship, we can prove that:

F_X(x) = 1 − e^{−∫_0^x λ(t) dt}
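This relationship also gives a practical recipe: given any failure rate function, F(x) can be recovered by numerically integrating λ(t). A minimal sketch (ours), using the trapezoidal rule and the hazard λ(t) = t² of Example 7.3 below, where the exact answer is F(x) = 1 − e^{−x³/3}:

import math

def cdf_from_hazard(hazard, x, steps=10_000):
    # Approximate F(x) = 1 - exp(-integral_0^x hazard(t) dt)
    # with the trapezoidal rule.
    h = x / steps
    integral = 0.5 * h * (hazard(0.0) + hazard(x))
    integral += h * sum(hazard(i * h) for i in range(1, steps))
    return 1 - math.exp(-integral)

print(cdf_from_hazard(lambda t: t**2, 1.0))   # approx 1 - e^{-1/3}
print(1 - math.exp(-1 / 3))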

Example 7.1

If 𝑋 denotes the lifetime of a device and follows distribution 𝑈[0, 𝑏], obtain its
failure rate function.

Solution.
λ(t) = f_X(t)/P(X ≥ t) = (1/b)/((b − t)/b) = 1/(b − t) ; 0 < t < b
It is evident that the above function is increasing in t on the interval 0 < t < b; as time passes, the failure rate of the device increases.

Example 7.2

If 𝑋 denotes the lifetime of a device and follows distribution 𝐸𝑥𝑝(𝜆), obtain its
failure rate function.

Solution.
λ(t) = f_X(t)/P(X ≥ t) = λe^{−λt}/e^{−λt} = λ ; t > 0
The function mentioned above is constant in t; as time passes, the failure rate of the device does not change, owing to the memoryless property of the exponential distribution.

For the failure rate function, we have:

P(X > a) = 1 − F_X(a) = e^{−∫_0^a λ(t) dt}

P(X > b | X > a) = P(X > b)/P(X > a) = e^{−∫_0^b λ(t) dt} / e^{−∫_0^a λ(t) dt} = e^{−∫_a^b λ(t) dt} ; b > a

Example 7.3

If the failure rate function of the random variable 𝑋 is given by 𝜆(𝑡) = 𝑡 2 , obtain
𝑓𝑋 (𝑥).
Solution.
F_X(x) = 1 − e^{−∫_0^x t² dt} = 1 − e^{−x³/3} ; x > 0 ⇒ f_X(x) = x² e^{−x³/3} ; x > 0

Example 7.4

The lifetime distribution of a piece in years has failure rate function 𝜆(𝑡) = 𝑡 3 .
It is desired to calculate the probability that:
a. The lifetime of the piece is between 1 to 2 years.
b. The piece functions for 2 years if the piece has already survived 1 year.
Solution.
a.
P(1 ≤ X ≤ 2) = P(X ≥ 1) − P(X > 2) = e^{−t⁴/4 |_0^1} − e^{−t⁴/4 |_0^2} = e^{−1/4} − e^{−4}

b.
P(T > 2 | T > 1) = e^{−∫_1^2 t³ dt} = e^{−t⁴/4 |_1^2} = e^{−15/4}

Example 7.5

The failure rate function of the random variable 𝑋 is 𝜆𝑋 (𝑡). If 𝑌 = 𝑎𝑋 is a random


variable in which a is a positive constant, then obtain the failure rate function of the
random variable 𝑌.
Solution.
F_Y(y) = P(Y ≤ y) = P(aX ≤ y) = P(X ≤ y/a) = F_X(y/a) ; a > 0 ⇒ f_Y(y) = (1/a) f_X(y/a)

⇒ λ_Y(t) = f_Y(t)/(1 − F_Y(t)) = (1/a) f_X(t/a)/(1 − F_X(t/a)) = (1/a) λ_X(t/a)

1) If X is a uniform random variable in interval [0,2], it is desired to calculate:
a. P(|X − 1/4| < 1)
b. P(X² < 2)
c. P(X > 1/4 + 3/4 | X > 1/4)
d. P(X < 1 | X > 1/4)

2) If X is a uniform random variable in interval [−1,2], it is desired to calculate:
a. P(X² < 3)
b. P(|X − 1/5| < 1)
c. P(X < 1 | X > −1/4)

3) Suppose that X is a uniform random variable in interval [1,2] and its expected value is μ_X. Now:
a. Determine the value of “a” such that P(X < a) is equal to 0.8.
b. Determine the value of “b” such that P(X > b + μ_X) is equal to 1/4.

4) Suppose that 𝑋 is a continuous uniform random variable in interval (𝑎, 𝑏) with


mean 5 and 𝑃(𝑋 > 4) = 2𝑃(𝑋 < 4). Obtain the following values:
a. 𝑃(𝑋 > 6)
b. 𝑉𝑎𝑟(𝑋)
c. The median of X.
d. 𝐸(𝑋|𝑋 > 6)
e. 𝑉𝑎𝑟(𝑋|𝑋 > 6)
5) If the moment generating function of the random variable X is (1/t)(e^t − 1), then
obtain the following values:
a. 𝑃(𝑋 > 0)
b. 𝐸(𝑋)
c. 𝑉𝑎𝑟(𝑋)
d. The median of 𝑋.
e. E(X | X < 1/4)

6) If 𝑌 is a uniform random variable in interval (0,5), obtain the probability that


the two solutions of equation 4𝑥 2 + 4𝑥𝑌 + 𝑌 + 2 = 0 are real.

7) Suppose that X is a uniform random variable in interval [0,2] and the random variable Y is defined as Y = X for X ≤ 1 and Y = X² for X > 1. It is desired to calculate:
a. P(1/4 < Y < 9/4)
b. 𝐸(𝑌)
8) A boy goes to a place where he gets on his school bus every morning at 7:00
a.m. Suppose the bus arrives uniformly every day between 7:00 and 7:15 a.m. It
is desired to calculate:
a. The probability that, on a day, the boy waits more than 10 minutes until
the bus arrives at the place.
b. The average waiting time (in minutes) for the boy until the bus arrives
at the place.
c. The probability that, on a day, the boy waits more than 10 minutes until
the bus arrives at the place, if the bus has already not arrived by 7:05.
d. The average waiting time (in minutes) for the boy until the bus arrives
at the place, if we know that the bus has not arrived by 7:05.
9) Trains arrive at a station from 6:00 a.m. in half an hour intervals. If a passenger
arrives at the station in interval 𝑎 to 𝑏 hours, uniformly, and gets on the first
train arriving at the station, what is the probability that he waits more than 10
minutes to get the train in each of the following conditions?
a. The passenger arrives at the station uniformly in interval 6:00 to 6:30
a.m.
b. The passenger arrives at the station uniformly in interval 6:00 to 7:00
a.m.
c. The passenger arrives at the station uniformly in interval 6:00 to 6:40
a.m.
10) In a station, the moving trains to destination A arrive at the station at half an
hour intervals starting at 6:00 a.m., while the moving trains to destination B

arrive at the station at half an hour intervals starting at 6:10 a.m. If a passenger
uniformly arrives at the station from “𝑎” to “𝑏” hours and gets on the first train
arriving at the station, what is the probability that he gets on the train bound
for destination A in each of the following conditions?
a. The passenger arrives at the station uniformly in interval 6:00 to 7:00
a.m.
b. The passenger arrives at the station uniformly in interval 6:10 to 7:10
a.m.
c. The passenger arrives at the station uniformly in interval 6:05 to 7:05
a.m.
d. The passenger arrives at the station uniformly in interval 6:00 to 7:10
a.m.
11) If X is a uniform random variable in interval [0,1], then it is desired to calculate:
a. E[Min(X, 1/3)]
b. E[Max(X, 1/3)]
c. E[|X − 1/3|]

12) Consider a road of length L. There is a garage located at a distance of L/3 from the beginning of the road. If a car fails on the road, we consider its failure location to be uniformly distributed along the length of the road. In such a case, if a car fails on the road, it is desired to calculate:
a. The probability that the distance from the garage is less than L/6.
b. The expected value of the distance from the garage.
13) Consider a road of length 𝐿. There are two garages located at the end and
beginning of the road. If a car fails on the road, we consider its failure location
to be uniformly distributed along the length of the road. In such a situation,
if a car fails on the road, it is desired to calculate
a. The probability that the distance from the garage located at the
beginning of the road is twice the distance from the garage located at
the end of the road.
b. The probability that the distance from the farther garage is four times
the distance from the closer garage.

c. The expected value of the distance from the nearest garage.
14) The traffic light of a crossroad remains green for 40 seconds and red for 20
seconds. A car arrives at the intersection totally at random and uniformly
during these 60 seconds.
a. Obtain the distribution of waiting time against the red light for the car.
b. Obtain the average waiting time against the red light for the mentioned
car.
c. Obtain the probability that the car waits less than 10 seconds against
the red light.
15) The number of customers entering a store per hour follows a Poisson
distribution with parameter 𝜆 = 3. In such conditions, after opening the store,
a. If we know that one customer has entered in the first hour, obtain the
probability that the customer entered in the first twenty minutes.
b. If we know that one customer has entered in the first hour, obtain the
probability that the customer entered either in the first five minutes or
the last ten minutes.
c. If we know that three customers have entered in the first hour, obtain
the probability that all three customers entered in the first 40 minutes.
d. If we know that three customers have entered in the first hour, obtain
the probability that two customers entered in the first 40 minutes and
one customer entered in the remaining 20 minutes.
16) If 𝑋 is a uniform random variable in interval [0,1], then:
a. Obtain the density function of the random variable 𝑌 = −2 𝑙𝑛( 𝑋).
Hint: use Theorem 8.1 in Chapter 4.
b. Obtain the density function of random variable 𝑊 = 10𝑋.
c. Obtain the probability function of random variable 𝑉 = [10𝑋]. ([𝑥] is the
largest integer less than or equal to x)
17) If the continuous random variable 𝑋 has density function 𝑓 and cumulative
distribution function 𝐹, and we define the random variable 𝑌 as 𝑌 = 𝐹(𝑋).
a. Using Theorem 8.1 in Chapter 4, show that the random variable 𝑌 is
uniform in interval [0,1].
b. Obtain P(Y < 1/4).

c. Obtain P(Y − E(Y) < 1/4).
d. Obtain 𝐸[𝑌 4 (1 − 𝑌)].
18) Annual rainfall magnitude (in centimeters) in a certain area follows a normal
distribution with mean 30 and standard deviation 3. It is desired to calculate
a. The probability that the rainfall magnitude is between 27 and 33 units
of rainfall.
b. The probability that the rainfall magnitude exceeds 24 units.
c. The probability that the rainfall magnitude exceeds 33 units, if we know
that the rainfall magnitude exceeds 30 units in that year.
d. The probability that we experience four consecutive years with the
rainfall magnitude of more than 24 units (assuming the independence
of rainfall in different years).
e. The probability that there are at least 3 out of 4 years in the future in
which the rainfall magnitude exceeds 30 units.
f. The average number of upcoming years before getting the first year in
which the rainfall magnitude exceeds 35 units.
19) Suppose that the scores assigned to students by a professor follow a normal
distribution with mean 83 and standard deviation 4.
a. If a student wants to be among the top 2%, what is the minimum score he should get?
b. If 15 students get scores over 88 in the class, what should be the
approximate number of students?
20) The lifetime of light bulbs produced by a factory follows a normal distribution
with mean 24 and standard deviation 3 measured in months.
a. If the factory guarantees 20 months for its light bulbs, what proportion
of light bulbs last less than the guaranteed number?
b. If the factory guarantees 20 months for its light bulbs, what is the
probability that, in a sample of size 5 from light bulbs, one of them lasts
less than 20 months (consider lifetime-related trials of different light
bulbs to be independent.)?

c. If the factory guarantees its light bulbs such that only one percent of
them are rejected, how many months should the factory consider to
guarantee the lifetime of the light bulbs?
21) The moment generating function of the random variable X is M_X(t) = e^{t²/2}. In
such a case,
a. What value of “𝑎” satisfies the relationship 𝑃(𝑋 > 𝑎) = 0.5?
b. Obtain the expected value and variance of the random variable.
c. Obtain 𝐸(𝑒 2𝑋 ).
d. Obtain 𝑃(𝑋 > 𝑎|𝑋 2 > 𝑎2 ) for positive values of “𝑎”.
22) If the random variable 𝑋 follows a normal distribution with parameters 𝜇 and
𝜎 2 , then it is desired to calculate:
a. 𝑃(𝑋 > 𝜇 + 𝜎|𝑋 > 𝜇)
b. 𝑃(𝑋 > 𝜇 + 2𝜎|𝑋 > 𝜇 + 𝜎)
c. 𝑃(𝑋 > 𝜇 + 𝜎|𝑋 > 𝜇 − 𝜎)
23) If the random variable 𝑋 follows a normal distribution with parameters 𝜇 = 1
and 𝜎 2 = 4, then it is desired to determine:
a. 𝑃(|𝑋| < 3)
b. 𝑃(|𝑋 − 1| < 3)
c. 𝑃(1 < |𝑋| < 3)
d. A value of the random variable 𝑋 whose probability to the left is equal
to 0.95.
e. A value of the random variable 𝑋 whose probability to the left is equal
to 0.05.
24) In a normal distribution with 𝜎 2 = 4, a point whose probability to the left is 0.1
is equal to 50. In such a case, it is desired to determine:
a. A point whose probability to the left is 0.5.
b. A point whose probability to the left is 0.1.
c. A point whose probability to the left is 0.05.
25) The useful lifetime of a car battery follows a normal distribution with mean 24
months and standard deviation 2 months. If the factory producing the battery
wants to guarantee its product for 20 months, then:

a. What proportion of the batteries should be replaced by the factory (the
factory should pay the replacement cost)?
b. If the replacement cost of each battery is $20 for the factory, on
average, how much replacement cost should be paid by the factory for
a lot consisting of 8000 batteries?
26) To control bullets, they are passed through two holes. If it does not pass
through the hole number 1 but the hole number 2, then it is acceptable. The
diameter of the first hole is 3.2 and the second one is 3.3 millimeters. If the
diameter of bullets follows a normal distribution with mean 3.26 millimeters
and standard deviation 0.02 millimeters, on average, how many of 1000 bullets
do you expect to be acceptable?
27) Suppose that the lifetime of a motor produced by a factory (in years) follows a
normal distribution. If 10 percent of the motors last less than 6 years, and 5
percent of them last more than 8.92, it is desired to calculate:
a. The expected value and standard deviation of the lifetime of motors
produced by the factory.
b. The probability that a motor of the factory lasts more than 9.5 years, if
we know that it lasts more than 8.5.
28) The moment generating function of the random variable X is M_X(t) = e^{50(t+t²)}.
a. Obtain 𝑃(𝑋 < 40).
b. If the distance of points “𝑎” and “𝑏” from the mean of this random
variable is the same and we know that 𝑃(𝑎 < 𝑋 < 𝑏) = 0.8, obtain the
values of “𝑎” and “𝑏”.
c. If there is a value from this random variable such as 𝑐 satisfying
relationship 𝑃(𝑋 ≥ 𝑐) = 3𝑃(𝑋 < 𝑐), then obtain the value of 𝑐.
29) In each of the following conditions, obtain 𝐸(𝑋 4 + 𝑋 3 + 𝑋 2 + 𝑋) for the random
variable 𝑋.
a. If 𝑋 is a standard normal variable.
b. If 𝑋 is a normal variable with mean zero and standard deviation 2.
30) Suppose that 𝑋 is a normal variable with mean 2 and standard deviation 2. In
such a case,

a. If 𝑎, 𝑏, and 𝑐 denote the eighth, ninth, and tenth moments about the
mean, show that the relationship 𝑏 < 𝑎 < 𝑐 is valid.
b. Obtain the variance of 𝑋 2 − 4𝑋.
31) The random variable X follows a normal distribution with mean μ and standard deviation σ. After observing a value of X at random, a rectangle is made with dimensions |X| and 3|X|. Obtain the expected value of the area of this rectangle.
32) Suppose that the random variable 𝑋 follows the standard normal
distribution. In such a case, if the random variable 𝑋 is defined as 𝑌 = |𝑋|, it
is desired to calculate:
a. The expected value of the random variable 𝑌.
b. The variance of the random variable 𝑌.
c. The density function of the random variable 𝑌.
Hint: to obtain the density function of the random variable 𝑌, use the
explanations of Section 4.8 in Chapter 4.
33) Suppose that the random variable 𝑋 follows the standard normal
distribution. In such a case, if the random variable 𝑊 is defined as 𝑊 =
(𝑋|𝑋 > 0), it is desired to calculate:
a. The expected value of the random variable 𝑊.
b. The variance of the random variable 𝑊.
c. The density function of the random variable 𝑊.
d. Hint: use the explanations of Section 5.11 in Chapter 5.
34) Suppose that the random variable 𝑋 follows the standard normal
distribution. If so, it is desired to calculate:
a. 𝐸(𝑀𝑎𝑥(𝑋, 0))
b. 𝐸(𝑀𝑖𝑛(𝑋, 0))
c. 𝐸(𝑋|𝑋 > −1)
35) Suppose that the random variable 𝑋 follows a normal distribution with mean
𝜇 and variance 𝜎 2 .
a. Obtain the value of 𝐸(|𝑋 − 𝜇|).
b. Obtain the value of 𝑉𝑎𝑟(|𝑋 − 𝜇|).
36) If 𝑋 follows a normal distribution with mean 1 and variance 2, it is desired to
calculate:
a. 𝐸(𝑒 2𝑋 )
b. 𝑉𝑎𝑟(𝑒 2𝑋 )
37) Suppose that the random variable 𝑋 follows a normal distribution with mean
𝜇 and variance 𝜎 2 . If the random variable 𝑌 is defined as 𝑌 = 𝑒 𝑋 (we call the
distribution of the random variable 𝑌 Lognormal), it is desired to calculate:
a. The expected value of the random variable 𝑌.
b. The variance of the random variable 𝑌.
c. The density function of 𝑌.
d. Hint: to obtain the density function of random variable 𝑌, use the
explanations of Section 4.8 in Chapter 4.
38) A random number generator device produces numbers independently and
with uniform density in interval 0 to 1. If this device produces 100 random
numbers, then obtain the probability that at least 50 of them are greater than
0.5.
39) Experience has shown that, in a store, 60 percent of customers are women. If
500 customers enter the store independently, it is desired to calculate:
a. The probability that at least 275 of them are women.
b. The probability that the number of women on the referred day is at least
100 individuals more than that of the men.
c. The probability that the number of women on the referred day is at least
2 times that of the men.
d. The expected value and variance of the store's sales amount on the referred day, if each woman buys $100 and each man buys $200.
40) In each of the following density functions, obtain constant 𝑐.
a. f(x) = c e^{−x²} ; x ∈ R
b. f(x) = c e^{−4x²+8x} ; x ∈ R
c. f(x) = c e^{−x(x+1)} ; x ∈ R
41) Suppose that the random variable 𝑋 follows a normal distribution with mean
𝜇 = 0 and variance 𝜎 2 . If so, for what value of 𝜎, the value of 𝑃(𝑎 < 𝑋 < 𝑏) is
maximized? (𝑎 and 𝑏 are positive constants such that 𝑎 < 𝑏.)

42) Suppose that the lifetime of an electric device follows an exponential
distribution with a failure rate of 𝜆 = 2 per thousand hours.
a. What is the probability that this device lasts at least 1000 hours?
b. On average, how long does this device last?
c. If a second-hand device of this type has lasted 2000 hours so far, what
is the probability that it lasts another 1000 hours?
d. If the device has lasted 2000 hours so far, on average, how long does it
last?
43) Suppose that the number of accidents on a road per day follows a Poisson
distribution with an occurrence rate of 𝜆 = 2. It is desired to calculate:
a. The probability that the time from today to the occurrence of the first
accident is at most 3 days.
b. The probability that the time from today to the occurrence of the first
accident is at least 3 days if we know that no events have occurred in
the two previous days.
c. The probability that the time between the first and second accidents is
more than 2 days.
44) If the distribution of the number of cars crossing a particular point in a street
follows a Poisson random variable with rate 𝜆 = 30 (per hour), and the time for
a person to cross the street is 1 minute, find the probability that no cars hit the
person while he is crossing the street. (Suppose if a car crosses the street while
the person is crossing the street, then the car definitely crashes into the
person.)
45) Consider light bulbs whose lifetime is exponential with mean 1/λ (in thousand hours) such that each of them, independently, lasts more than T units of time with probability e^{−λT}. Obtain the probability function, expected value, and
variance of the following random variables:
a. The number of light bulbs out of 10 ones lasting more than 𝑇 units
of time.
b. The number of light bulbs required to get the first light bulb lasting
more than 𝑇 units of time.
c. The number of light bulbs required to get the 𝑟 𝑡ℎ light bulb lasting more
than 𝑇 units of time.

46) Suppose that the lifetime of an electronic piece in years follows an exponential
distribution with mean 0.2. Assuming that the trials of investigating the
lifetime are independent, it is desired to calculate:
a. The probability that, of a random sample of size five, four pieces last less
than 1 year.
b. The probability that the first piece lasting more than 1 year is the fifth
investigated piece.
c. The average number of pieces required to get the first piece lasting
more than 1 year.
47) Suppose that 𝑋 follows a Poisson distribution with parameter 𝜆𝑡, and 𝑌 follows
an exponential distribution with parameter 𝜆. If so, show that the relationship
𝑃(𝑋 = 0) = 𝑃(𝑌 > 𝑡) is valid.

48) Suppose that 𝑋 follows an exponential distribution such that 𝑃(𝑋 < 4) =
3𝑃(𝑋 > 4).
a. If we denote the standard deviation of this random variable by 𝜎, obtain
its value.
b. If we denote the median of this random variable by m, obtain the value of P(X > m/2).
c. If we denote the mean of this random variable by 𝜇, obtain the value of
𝑃(𝑋 > 𝜇).
49) Suppose that 𝑋 follows an exponential distribution with density function
𝑓𝑋 (𝑥) = 𝜆𝑒 −𝜆𝑥 ; 𝑥 > 0. If so, it is desired to calculate:
a. 𝑃(|𝑋 − 𝜇| ≤ 𝜎)
b. 𝑃(|𝑋 − 𝜇| ≤ 3𝜎)
c. 𝑃(𝑋 > 𝜇 + 2𝜎|𝑋 > 𝜇 + 𝜎)

50) If 𝑋 is an exponential random variable with an occurrence rate of 𝜆 = 4,


a. Determine the value of “𝑎” such that 𝑃(𝑋 < 𝑎) = 0.90 (in the continuous
random variables, this point is called the 90th percentile of the random
variable).

b. Determine the value of “𝑏” such that 𝑃(𝑋 < 𝑏) = 0.1 (in the continuous
random variables, this point is called the first decile of the random
variable).
51) If 𝑋 is an exponential random variable with mean 𝜇, then:
a. Determine the value of 𝜇 such that 𝑃(𝑋 < 4) = 0.90.
b. Determine the value of 𝑏 such that 𝑃(𝑋 < 𝑏) = 2𝑃(𝑋 > 𝑏).
52) Suppose you want to go to the stadium to watch the match of your favorite
soccer team. Your travel time from home to the stadium is an exponential
random variable with mean 20 minutes. If you want to have no delay with
probability 95 percent and the match begins at 8:00 p.m., it is desired to
calculate the latest time you should depart from the home.
53) Suppose that random variable X follows an exponential distribution with mean 1/λ. If so, then:

a. Show that relationship 𝐸(𝑋|𝑋 > 1) = 𝐸(𝑋 + 1) is valid.


b. Show that relationship 𝐸(𝑋 𝑘 |𝑋 > 1) = 𝐸[(𝑋 + 1)𝑘 ] is valid for positive 𝑘.
54) Suppose you want to travel to point A and there are two main routes to get
there. If you choose the route number 1, then your travel time is an exponential
random variable with mean 1. On the other hand, if you choose the route
number 2, it is an exponential random variable with mean 2. Moreover, if you choose routes 1 and 2 with the same probability of 1/2, and the random variable X denotes the travel time, then:
a. Obtain the value of 𝑃(𝑋 > 1).
b. Obtain F_X(x) = P(X ≤ x).
c. Obtain the value of 𝑃(𝑋 > 2|𝑋 > 1).
55) If 𝑋 is an exponential random variable with density function 𝑓(𝑥) = 3𝑒 −3𝑥 ; 𝑥 > 0,
it is desired to calculate:
a. E(e^X)
b. 𝐸(𝑋10 )
c. 𝐸[(𝑋 − 1)2 ]
d. 𝐸[(𝑋 − 1)3 ]
e. 𝐸([𝑋]) ([𝑥] is the largest integer less than or equal to x)

f. 𝐸(𝑒 [𝑋] ) ([𝑥] is the largest integer less than or equal to 𝑥)
g. 𝐸(𝑋|𝑋 > 𝑎)
h. 𝐸(𝑋|𝑋 < 𝑎)
i. 𝐸(𝑀𝑎𝑥(𝑋, 𝑎))
j. 𝐸(𝑀𝑖𝑛(𝑋, 𝑎))
k. E[g(X)] if g(x) is defined as g(x) = 1 for x ≤ a and g(x) = 2 for x > a.
56) The lifetime of an electronic device follows an exponential distribution with
failure rate 𝜆. If the device fails or its lifetime reaches “𝑎” units of time, it is
replaced.
a. What is the average time that it takes to replace this device?
b. Obtain the distribution of the time that it takes to replace the device.
57) A gasoline station has a tank with a capacity of 70 (in thousand liters) that is
filled at the beginning of each period. In this gas station, the amount of
gasoline demand in each period has an exponential distribution with mean 10
(in thousand liters).
a. Obtain the distribution of the amount of gasoline sold in one period of
the station.
b. Obtain the distribution of random variable denoting the gasoline
shortage amount in one period of this station.
c. Obtain the expected value of the gasoline shortage amount in one
period of this station.
d. Determine the volume of the tank such that the probability of facing a
shortage in each period is equal to 0.01.
58) The light bulbs produced by a factory are defective with probability 0.1 and do
not work. If one light bulb is not defective, its lifetime is exponential with mean
1 year.
a. Obtain the distribution of the lifetime of the light bulbs produced by the
factory.
b. Obtain the average lifetime of the light bulbs produced by the factory.
59) If the random variable 𝑋 follows an exponential distribution with density
function 𝑓(𝑥) = 𝜆𝑒 −𝜆𝑥 ; 𝑥 > 0, obtain the distribution of the following random
variables:

a. 𝑌 = 𝑎𝑋, where “𝑎” is a positive constant.
b. 𝑊 = 𝑎𝑋 + 𝑏, where “𝑎” and “𝑏” are positive constants.
c. 𝑍 = 𝑋 𝑘 , where “𝑘” is a positive constant.
d. 𝑇 = I(𝑋>1)
Note that I(𝐴) or the indicator random variable of event A is a Bernoulli random
variable taking on value 1 when event A occurs.
e. 𝑉 = [𝑋] ([𝑥] is the largest integer less than or equal to x.)
60) Consider a road whose length is infinite. We want to establish an emergency
center at a point of the road. In each of the following conditions, how far
should the emergency center be from the beginning of the road to minimize
the average distance of the accident location from the emergency center?
a. Suppose that if the accident occurs along the length of the road, its
distance from the beginning of the road follows a uniform distribution
in interval (0, 𝐿).(𝐿 < ∞)
b. Suppose that if the accident occurs along the length of the road, its
distance from the beginning of the road follows an exponential
distribution with rate 𝜆.
61) The demand magnitude of a raw material in thousand kilograms on behalf of a
producer during a certain interval follows an exponential distribution with
mean 2 thousand kilograms. If the cost of production for each thousand
kilograms of this product is equal to 40 thousand dollars, and we know that
the unsold products at the end of the period are not usable, how many
products should be produced to maximize the average profit resulting from
the sales?
62) Suppose that the number of hourly customers in a store follows a Poisson
distribution with parameter 𝜆 = 3.
a. What is the probability that the time until the arrival of the first
customer is more than 2 hours?
b. What is the probability that the time between the arrival of the first and
second customers is more than 2 hours?
c. What is the probability that the time until the arrival of the third
customer is more than 2 hours?

d. What is the probability that the time between the arrival of the second
and fourth customers is more than 2 hours?
63) If the time between the occurrence of fires in a forest area follows an
exponential distribution with a rate of 0.025 fires per day,
a. What is the probability that the time from the first day of the year to the
first fire of the year lasts at most 100 days?
b. What is the probability that the time from the first day of year to the
second fire of the year lasts at most 200 days?
64) Suppose that the number of defective pieces per hour manufactured in a
production line follows a Poisson distribution with parameter 𝜆 = 2.
a. After 8:00 a.m., what is the probability that the time until the production
of the first defective piece exceeds 4 hours?
b. After 8:00 a.m., what is the probability that the time until the production
of the third defective piece exceeds 4 hours?
c. If we know that 3 defective pieces are manufactured from 8 to 10 a.m.,
what is the probability that all of them are produced in the first half an
hour of the interval?
d. If we know that 3 defective pieces are manufactured from 8 to 10 am,
what is the probability that two of them are produced in the first half of
an hour of the interval?
e. If we know that 3 defective pieces are manufactured from 8 to 11 a.m.,
what is the probability that one defective piece is produced per hour?
65) Suppose that the number of customers entering a store follows a Poisson
distribution with parameter 𝜆 = 3 per hour. Consider 12 a.m. to be the time
origin. It is desired to calculate:
a. The probability that the second customer arrives at the store after 2:00
p.m.
b. The probability that the second customer arrives at the store between
2:00 and 3:00 p.m.
c. The probability that the second customer arrives at the store after
3:00 p.m. If we know that the person has not arrived by 2:00 p.m.

66) In the preceding problem, if each customer is a man with probability 1/3 and a woman with probability 2/3, then it is desired to calculate:

a. The probability that the second female customer arrives after 2:00
p.m.
b. The probability that the second female customer arrives between
2:00 and 3:00 p.m.
67) Suppose that 𝑋 follows a Poisson distribution with parameter 𝜆𝑡, and 𝑌
follows an Erlang distribution with parameters α and λ. If so, show that the
relationship 𝑃(𝑋 < 𝛼) = 𝑃(𝑌 > 𝑡) is valid.
68) A system consists of two parts which are used alternately. If the lifetime of the system in years has the following density function:
f(x) = { c x e^{−2x}  ; x > 0
         0            ; x ≤ 0
then it is desired to calculate:
a. Constant 𝑐.
b. The expected value and variance of the random variable X.
c. The probability that the system functions at least 2 years.
69) If the density function of random variable X is f(x) = c x^d e^{−x/3} ; x > 0, and we know that E(X) = 9, it is desired to calculate:
a. Constants 𝑐 and 𝑑.
b. 𝑉𝑎𝑟(𝑋)
c. 𝐸(𝑋 4 )
70) If the random variable X follows a gamma distribution with density function f(x) = (λ^α/Γ(α)) x^{α−1} e^{−λx} ; x > 0, obtain the distribution of the following random variables:
a. 𝑌 = 𝑎𝑋, where “𝑎” is a positive constant.
b. 𝑊 = 𝑎𝑋 + 𝑏, where “𝑎” and “𝑏” are positive constants.
c. 𝑍 = 𝑋 𝑘 , where “𝑘” is a positive constant.

71) The idle time (in hours) of a machine is a random variable X that follows a gamma distribution with density function f(x) = (λ³/Γ(3)) x² e^{−x/2} ; x > 0. The loss caused by being idle is specified by L = 30X + 2X². What is the average loss?
72) If the random variable Z follows the standard normal distribution, show that the random variable Y = Z² follows a gamma distribution with parameters α = 1/2 and λ = 1/2.
73) If the random variable 𝑋 follows a beta distribution with parameters 𝑎 = 1 and
𝑏 = 1, it is desired to calculate:
a. P(|X − 1/2| < 1/3)
b. E[X^{−1/2}(1 − X)^{10}]
74) If the random variable 𝑋 follows a beta distribution with parameters 𝑎 = 2 and
𝑏 = 1, it is desired to calculate:
a. P(X > 1/3 | X < 2/3)
b. E[(1 − X)^{−1/2}]
c. The distribution of the random variable 𝑌 = − 𝑙𝑛( 𝑋 2 ).
75) Suppose that the random variable 𝑋 follows the following density function:
𝑓(𝑥) = 𝑐𝑥 𝑑 (1 − 𝑥)3 ; 0 < 𝑥 < 1
If E(X) = 1/3, then obtain the values of c and d.

76) The useful lifetime (in years) of a battery follows a Weibull distribution with
the following density function:
f(x) = { 2x e^{−x²}  ; x > 0
         0           ; x ≤ 0
It is desired to calculate:
a. The probability that this battery lasts more than 2 years.
b. The expected value of functioning time of the battery.
77) If the failure rate function of the lifetime of a device is 𝜆(𝑡) = 2𝑡, then obtain
𝑃(1 < 𝑋 < 2).
78) A type of battery is produced in factories A and B. The failure rate of the
battery in factory A is three times that of the battery in factory B. If a battery

produced by factory A lasts more than three years with probability 0.027,
obtain the probability that a battery produced by factory B lasts more than
three years.
79) If the failure rate function of the lifetime of a device is given by
λ(t) = { 0.1               ; 0 < t < 3
         0.1 + 0.2(t − 3)  ; 3 ≤ t < 7
         1.5               ; 7 ≤ t
What is the probability that a ten-year-old device still functions?

In Chapter 4, the random variable was defined, and some of its properties were addressed in Chapter 5. Further, some special and commonly used random variables were addressed in Chapters 6 and 7. However, the probability distributions we have learned so far were all associated with one random variable. Sometimes, though, we are inclined to define more than one random variable on the sample space of a trial and investigate them simultaneously. For instance, suppose that a system consists of two parts and we intend to analyze the lifetime of the system. In such a case, the lifetime of both parts must be analyzed simultaneously. In this chapter, we learn how to approach these types of problems, in which several random variables are present.

If the two discrete random variables X and Y are defined on the sample space of a trial, then the probability of their corresponding elements' simultaneous occurrence is known as the joint probability function of X and Y, shown by function P(x, y):

𝑃(𝑥, 𝑦) = 𝑃(𝑋 = 𝑥, 𝑌 = 𝑦)

It should be noted that 𝑃(𝑥, 𝑦) is always less than or equal to 1. Moreover, if
possible values of 𝑋 and possible values of 𝑌 are denoted by 𝐴 and 𝐵, respectively,
then we have:
1. P(x, y) = 0 for x ∉ A or y ∉ B
2. P(X ∈ A, Y ∈ B) = 1; i.e., Σ_{x∈A} Σ_{y∈B} P(x, y) = 1

Example 2.1

Suppose we select three balls at random from an urn containing 3 white, 4


black, and 3 red balls. If 𝑋 and 𝑌 denote the number of selected white and black
balls, respectively, then obtain the joint probability function of 𝑋 and 𝑌.

Solution. If (𝑋 = 𝑥, 𝑌 = 𝑦) means that 𝑥 white and 𝑦 black balls are selected, the set
of possible values for 𝑥 and 𝑦 consists of integers satisfying the inequality 𝑥 + 𝑦 ≤ 3.
Therefore, we have:

P(x, y) = C(3, x) C(4, y) C(3, 3 − x − y) / C(10, 3) ; 0 ≤ x + y ≤ 3

where C(n, k) denotes the binomial coefficient (n choose k).

Namely, for P(X = 1, Y = 2), we have:

P(1, 2) = C(3, 1) C(4, 2) C(3, 0) / C(10, 3) = 18/120
Considering the resulting values, P(X = 1) is equal to:

P(X = 1) = P(X = 1, Y = 0) + P(X = 1, Y = 1) + P(X = 1, Y = 2) = 9/120 + 36/120 + 18/120 = 63/120

In fact, if X and Y have joint probability function P(x, y), then we have:

P_X(x) = P(X = x) = Σ_y P(X = x, Y = y)
P_Y(y) = P(Y = y) = Σ_x P(X = x, Y = y)      (2-1)

In such cases, 𝑃𝑋 (𝑥) and 𝑃𝑌 (𝑦) are called the marginal probability function of 𝑋
and the marginal probability function of 𝑌, respectively. In this example, the marginal
probability function of 𝑋 is calculated as follows:
P_X(x) = P(X = x) = Σ_y P(X = x, Y = y) = Σ_y C(3, x) C(4, y) C(3, 3 − x − y) / C(10, 3)
= (C(3, x)/C(10, 3)) Σ_y C(4, y) C(3, 3 − x − y) = C(3, x) C(7, 3 − x) / C(10, 3) ; x = 0, 1, 2, 3

In other words, 𝑋 = 𝑥 means that 𝑥 out of the white balls and 3 − 𝑥 out of the
other balls should be selected. As seen, the marginal probability function of 𝑋 is
indeed a hypergeometric probability function with parameters (𝑁 = 10, 𝑚 = 3, 𝑛 = 3).
Moreover, same as before, E(X) is obtained from Σ_x x P_X(x), which is as follows for the hypergeometric distribution:

Σ_x x P_X(x) = nm/N = 9/10

Likewise, it can be shown that the marginal probability function of Y is as follows:

P_Y(y) = Σ_x P(X = x, Y = y) = C(4, y) C(6, 3 − y) / C(10, 3) ; y = 0, 1, 2, 3

Sometimes, values of the joint probability of 𝑋 and 𝑌 cannot be simply


formulated with a single function. In such cases, the joint probability of 𝑋 and 𝑌 is

shown by a table. For instance, if we want to show the probability function of
Example 2.1 as a table, the obtained result is as follows:

           Y = 0     Y = 1     Y = 2     Y = 3     P(X = x)
X = 0      1/120    12/120    18/120     4/120     35/120
X = 1      9/120    36/120    18/120       0       63/120
X = 2      9/120    12/120       0         0       21/120
X = 3      1/120       0         0         0        1/120
P(Y = y)  20/120    60/120    36/120     4/120
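The table above can be rebuilt mechanically. A small sketch (ours) computes the joint probability function of Example 2.1 with math.comb and checks that the entries sum to 1 and that P(X = 1) = 63/120:

from math import comb

def p(x, y):
    # P(X = x, Y = y): x white, y black, and 3 - x - y red balls drawn.
    if x < 0 or y < 0 or x + y > 3:
        return 0
    return comb(3, x) * comb(4, y) * comb(3, 3 - x - y) / comb(10, 3)

table = {(x, y): p(x, y) for x in range(4) for y in range(4)}
print(sum(table.values()))                 # 1.0
print(sum(p(1, y) for y in range(4)))      # P(X = 1) = 63/120 = 0.525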

Example 2.2

An urn contains 4 marbles numbered 1 through 4. If two marbles are selected


at random and with replacement from the urn, and 𝑋 and 𝑌 denote the smaller and
larger numbers, respectively, obtain the joint probability function of 𝑋 and 𝑌.

Solution. If (X = x, Y = y) means that the minimum obtained number is x and the maximum obtained number is y, then the set of possible values for x and y consists of integers less than or equal to 4 satisfying the inequality x ≤ y. In cases where x and y are equal, P(X = x, Y = y) is 1/16. Namely, P(X = 1, Y = 1) means the probability that the minimum and maximum numbers are both equal to 1, i.e., the event that both choices result in 1's, which has the probability of 1/16. In cases where x and y are not equal, P(X = x, Y = y) is 2/16. Namely, P(X = 1, Y = 2) means the probability that the minimum obtained number is equal to 1 and the maximum obtained number is equal to 2. This means that the first choice is 1 and the second choice is 2, or the first choice is 2 and the second choice is 1, which has the probability of 2/16. Hence, the joint probability function and marginal probability functions of the random variables X and Y can be shown as follows:

           Y = 1    Y = 2    Y = 3    Y = 4    P(X = x)
X = 1      1/16     2/16     2/16     2/16      7/16
X = 2       0       1/16     2/16     2/16      5/16
X = 3       0        0       1/16     2/16      3/16
X = 4       0        0        0       1/16      1/16
P(Y = y)   1/16     3/16     5/16     7/16
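Because the sample space here has only 16 equally likely ordered pairs, the whole table can also be derived by brute-force enumeration. A sketch (ours):

from collections import Counter
from fractions import Fraction

joint = Counter()
for first in range(1, 5):
    for second in range(1, 5):
        x, y = min(first, second), max(first, second)
        joint[(x, y)] += Fraction(1, 16)

print(joint[(1, 1)])                              # 1/16
print(joint[(1, 2)])                              # 2/16 = 1/8
print(sum(joint[(1, y)] for y in range(1, 5)))    # P(X = 1) = 7/16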

In the bivariate space, the cumulative distribution function of X and Y is defined as follows:

F_{X,Y}(x, y) = P(X ≤ x, Y ≤ y)      (2-2)

For instance, in Example 2.2, we have:

F_{X,Y}(2, 3) = P(X ≤ 2, Y ≤ 3) = P(1,1) + P(1,2) + P(1,3) + P(2,2) + P(2,3) = 1/16 + 2/16 + 2/16 + 1/16 + 2/16 = 8/16
Furthermore, in the bivariate space, the marginal cumulative distribution
function of the random variable 𝑋 is defined as follows:

𝐹𝑋 (𝑥) = 𝑃(𝑋 ≤ 𝑥) = 𝑃(𝑋 ≤ 𝑥, 𝑌 < +∞) = 𝐹𝑋,𝑌 (𝑥, +∞) (2-3)

Similarly, the marginal cumulative distribution function of the random variable


𝑌 is defined as follows:

𝐹𝑌 (𝑦) = 𝑃(𝑌 ≤ 𝑦) = 𝑃(𝑋 < +∞, 𝑌 ≤ 𝑦) = 𝐹𝑋,𝑌 (+∞, 𝑦) (2-4)

For instance, in Example 2.2, we have:

F_X(2) = P(X ≤ 2) = P(X ≤ 2, Y < +∞) = 12/16

As mentioned in Chapter 4, in the continuous univariate space, the density function of random variable X at point x is defined as follows:

f_X(x) = lim_{dx→0} P(x − dx/2 < X < x + dx/2) / dx

If the two continuous random variables X and Y are defined on the sample space of a trial, then their joint probability density function at point (x, y) is defined as follows:

f_{X,Y}(x, y) = lim_{dx→0, dy→0} P(x − dx/2 ≤ X ≤ x + dx/2, y − dy/2 ≤ Y ≤ y + dy/2) / (dx dy)

Therefore, for region (x ± dx/2, y ± dy/2), we have:

lim_{dx→0, dy→0} P(x − dx/2 ≤ X ≤ x + dx/2, y − dy/2 ≤ Y ≤ y + dy/2) = lim_{dx→0, dy→0} f_{X,Y}(x, y) dx dy

Hence, to calculate the probability of an event like 𝐸 resulting from the
continuous random variables 𝑋 and 𝑌, we first divide the region into small parts with
equal area of 𝑑𝑥𝑑𝑦 and then sum up the probabilities of those small parts:

P((X, Y) ∈ E) = ∬_{(x,y)∈E} f(x, y) dx dy      (2-5)

If so, for the two sets A = (−∞, +∞) and B = (−∞, +∞), we have:

P(X ∈ A, Y ∈ B) = ∫_{−∞}^{+∞} ∫_{−∞}^{+∞} f(x, y) dx dy = 1

In the continuous univariate space, the probability of a region is equal to the


area under the density function in the region. In the continuous bivariate space, the
probability of a region is equal to the volume under the density function in the region.
Furthermore, the total area under the density function of the continuous univariate
space and the total volume under the density function of the continuous bivariate
space are equal to 1.

Example 2.3

Suppose that 𝑋 and 𝑌 denote the lifetime of two pieces of a device and their
joint density function is given by

𝑓 (𝑥, 𝑦) = 𝑒 −(𝑥+𝑦) ; 𝑥 ≥ 0 , 𝑦 ≥ 0

a. Obtain 𝑃(𝑋 < 𝑌).


b. Obtain 𝑃(𝑋 < 𝑌, 𝑋 + 𝑌 < 1).
c. Obtain 𝐹𝑋 (𝑎) = 𝑃(𝑋 ≤ 𝑎).

Solution.
a.
P(X < Y) = ∫_0^∞ ∫_0^y e^{−(x+y)} dx dy = ∫_0^∞ e^{−y} (∫_0^y e^{−x} dx) dy = ∫_0^∞ e^{−y}(1 − e^{−y}) dy
= ∫_0^∞ e^{−y} dy − ∫_0^∞ e^{−2y} dy = 1 − 1/2 = 1/2

b.
P(X < Y, X + Y < 1) = ∫_0^{1/2} ∫_x^{1−x} e^{−(x+y)} dy dx = ∫_0^{1/2} e^{−x} (∫_x^{1−x} e^{−y} dy) dx = ∫_0^{1/2} e^{−x}(e^{−x} − e^{−(1−x)}) dx
= ∫_0^{1/2} e^{−2x} dx − e^{−1} ∫_0^{1/2} dx = (1/2)(1 − e^{−2×(1/2)}) − (1/2)e^{−1} = 1/2 − (1/2)e^{−1} − (1/2)e^{−1} = 1/2 − e^{−1}

Note that the region of this example is not regular with respect to the x-axis. In evaluating a double integral, we write “dx” before “dy” only if the region is regular with respect to the x-axis, meaning that any line drawn parallel to the x-axis intersects at most 2 fixed boundary lines of the region. Otherwise, the region should be divided into several parts.

c.

F_X(a) = P(X ≤ a) = ∫_0^∞ ∫_0^a e^{−(x+y)} dx dy = ∫_0^∞ (1 − e^{−a}) e^{−y} dy = 1 − e^{−a}

If we differentiate the above function with respect to “a”, we call the resulting function the marginal density function of the random variable X:

f_X(a) = dF_X(a)/da = e^{−a} ; a > 0
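The double integrals of this example are easy to cross-check by simulation, since the joint density e^{−(x+y)} factors into two independent Exp(1) densities. A Monte Carlo sketch (ours) for part b:

import math
import random

n, hits = 500_000, 0
for _ in range(n):
    x, y = random.expovariate(1.0), random.expovariate(1.0)
    if x < y and x + y < 1:
        hits += 1

print(hits / n, " vs ", 0.5 - math.exp(-1))   # both approx 0.1321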

Example 2.4

The joint density function of 𝑋 and 𝑌 is given by


f(x, y) = (6/7)(x² + xy/2) ; 0 < x < 1, 0 < y < 2

a. Obtain P(X > Y).
b. Obtain P(Y > 1/2 | X < 1/2).

Solution.
a.
P(X > Y) = ∫_0^1 ∫_0^x (6/7)(x² + xy/2) dy dx = (6/7) ∫_0^1 (x³ + x³/4) dx = (6/7) × (5/16) = 15/56
b.
P(Y > 1/2 | X < 1/2) = P(A ∩ B)/P(B) = [∫_{1/2}^2 ∫_0^{1/2} (6/7)(x² + xy/2) dx dy] / [∫_0^{1/2} ∫_0^2 (6/7)(x² + xy/2) dy dx] = (23/128)/(5/24) = 0.8625

If we want to obtain P(X ∈ A) by using the joint density function of X and Y, we have:

P(X ∈ A) = P(X ∈ A, Y ∈ (−∞, +∞)) = ∫_A ∫_{−∞}^{+∞} f(x, y) dy dx = ∫_A f_X(x) dx

where we have:

f_X(x) = ∫_{−∞}^{+∞} f(x, y) dy      (2-6)

f_X(x) is called the marginal density function of X. Therefore, to obtain the marginal density function of X in the bivariate space, in addition to differentiating the marginal cumulative distribution function, as done in part “c” of Example 2.3, we can integrate the joint density function over y. Namely, the marginal density function in Example 2.3 can be obtained as follows:
f_X(x) = ∫_{−∞}^{+∞} f(x, y) dy = ∫_0^∞ e^{−(x+y)} dy = e^{−x} ∫_0^∞ e^{−y} dy = e^{−x} ; x ≥ 0

Similarly, the marginal density function of 𝑌 is equal to:

f_Y(y) = ∫_{−∞}^{+∞} f(x, y) dx      (2-7)

Namely, the marginal density function of 𝑌 in Example 2.3 can be obtained as


follows:
f_Y(y) = ∫_{−∞}^{+∞} f(x, y) dx = ∫_0^∞ e^{−(x+y)} dx = e^{−y} ∫_0^∞ e^{−x} dx = e^{−y} ; y ≥ 0

Same as before, E(X) and E(Y) are obtained by using ∫_x x f_X(x) dx and ∫_y y f_Y(y) dy.

Example 2.5

Obtain the marginal density function of 𝑋 in Example 2.4.

Solution.
f_X(x) = ∫_{−∞}^{+∞} f(x, y) dy = ∫_0^2 (6/7)(x² + xy/2) dy = (6/7)(x²y + xy²/4) |_0^2 = (6/7)(2x² + x) ; 0 < x < 1

Just as the following relations are valid in the univariate space between the cumulative distribution function and density function of a random variable,

F_X(a) = ∫_{−∞}^a f(x) dx , dF_X(a)/da = f_X(a)

if X and Y are continuous joint random variables, then the following relationships are also valid between their joint density function and cumulative distribution function:

f_{X,Y}(a, b) = d²F_{X,Y}(a, b)/(da db) , F_{X,Y}(a, b) = ∫_{−∞}^a ∫_{−∞}^b f(x, y) dy dx

For instance, the joint distribution function of X and Y in Example 2.3 is obtained as follows:

f_{X,Y}(x, y) = e^{−(x+y)} ; x ≥ 0, y ≥ 0

⇒ F_{X,Y}(x, y) = ∫_{−∞}^y ∫_{−∞}^x f(u, v) du dv = ∫_0^y ∫_0^x e^{−(u+v)} du dv = ∫_0^y (1 − e^{−x}) e^{−v} dv = (1 − e^{−x})(1 − e^{−y}) ; x ≥ 0, y ≥ 0

Furthermore, if we differentiate the joint cumulative distribution function
once with respect to 𝑥 and do so with respect to 𝑦, the joint density function of 𝑋
and 𝑌 is obtained.

Example 2.6

If the joint distribution function of X and Y is given by

F(x, y) = (1 − e^{−x²})(1 − e^{−y²}) ; x > 0, y > 0,

obtain the joint density function of X and Y.
Solution.
f_{X,Y}(x, y) = d²F_{X,Y}(x, y)/(dx dy) = 2x e^{−x²} · 2y e^{−y²}

One of the most important joint distributions is the multinomial distribution, a
special case of which is the binomial distribution. The difference between the
multinomial and binomial distributions is that while each trial in the binomial
distribution can result in either a failure or a success, each trial in the multinomial
distribution can result in one of 𝑟 possible outcomes with probabilities 𝑝1 , . . . , 𝑝𝑟 . If so,
the multinomial distribution can be defined as follows:
Suppose that we perform a series of 𝑛 identical and independent trials such
that each trial results in one of 𝑟 possible outcomes with respective probabilities
𝑝1 , . . . , 𝑝𝑟 (∑𝑟𝑖=1 𝑝𝑖 = 1). If 𝑋𝑖 denotes the number of times that the 𝑖 𝑡ℎ outcome occurs
in 𝑛 trials, then we have:

P(X₁ = n₁, X₂ = n₂, ..., X_r = n_r) = (n!/(n₁! n₂! ... n_r!)) p₁^{n₁} p₂^{n₂} ... p_r^{n_r} , Σ_{i=1}^r n_i = n      (3-1)

Note that if r = 2, then p₂ = 1 − p₁, and the resulting distribution is the binomial distribution.

Example 3.1

Suppose that we roll a fair die 10 times. What is the probability that each of the
numbers 3, 5, and 6 appear once, 1 and 4 appear twice, and 2 appears three times?
Solution. Suppose 𝑋𝑖 denotes the number of times that result 𝑖 is obtained.

P(X₁ = 2, X₂ = 3, X₃ = 1, X₄ = 2, X₅ = 1, X₆ = 1) = (10!/(2! 3! 1! 2! 1! 1!)) (1/6)²(1/6)³(1/6)¹(1/6)²(1/6)¹(1/6)¹ = (10!/(2! 3! 1! 2! 1! 1!)) (1/6)^{10}

The marginal probability function of X_j is obtained by summing the values of the joint probability function over all the possible outcomes of the X_i's, except for X_j, as follows:

P(X_j = n_j) = Σ_{n₁} ⋯ Σ_{n_{j−1}} Σ_{n_{j+1}} ⋯ Σ_{n_r} (n!/(n₁! ⋯ n_{j−1}! n_j! n_{j+1}! ⋯ n_r!)) p₁^{n₁} ⋯ p_{j−1}^{n_{j−1}} p_j^{n_j} p_{j+1}^{n_{j+1}} ⋯ p_r^{n_r}
= C(n, n_j) p_j^{n_j}(1 − p_j)^{n−n_j}

As seen, the marginal distribution of X_j is a binomial distribution with parameters n and p_j. Indeed, (X_j = n_j) means that the number of times that the j-th outcome occurs in n trials is equal to n_j, which has the probability of C(n, n_j) p_j^{n_j}(1 − p_j)^{n−n_j}. Likewise, the expected value of the marginal distribution of X_j equals E(X_j) = Σ_{x_j} x_j P(X_j = x_j) = n p_j.
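The joint probability (3-1) and the binomial marginal are both one-liners in code. A sketch (ours), applied to the die of Example 3.1:

from math import comb, factorial, prod

def multinomial_pmf(counts, probs):
    # P(X1 = n1, ..., Xr = nr) for n = sum(counts) independent trials.
    coef = factorial(sum(counts)) // prod(factorial(c) for c in counts)
    return coef * prod(p**c for c, p in zip(counts, probs))

print(multinomial_pmf([2, 3, 1, 2, 1, 1], [1 / 6] * 6))

# The marginal of X_j is binomial(n, p_j): e.g. P(X_1 = 2) in 10 rolls.
n, pj, nj = 10, 1 / 6, 2
print(comb(n, nj) * pj**nj * (1 - pj) ** (n - nj))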

This distribution and the hypergeometric distribution are similar; however, there is a difference. In fact, there are two types of choices in the hypergeometric distribution, but there are more than two types of choices in the multivariate hypergeometric distribution. The definition of this distribution is as follows:

Suppose that, in an urn, there are 𝑚1 balls of type 1, 𝑚2 balls of type 2, …, and
𝑚𝑟 balls of type 𝑟 such that ∑𝑟𝑖=1 𝑚𝑖 = 𝑀. If a sample of size 𝑛 is taken without
replacement, and 𝑋𝑖 denotes the number of selected balls of type 𝑖 in the sample,
then the joint probability function of 𝑋1 , 𝑋2 , . . . , 𝑋𝑟 is equal to:

P(X₁ = x₁, X₂ = x₂, ..., X_r = x_r) = [C(m₁, x₁) C(m₂, x₂) ... C(m_r, x_r)] / C(M, n) , Σ_{i=1}^r x_i = n      (3-2)

The marginal probability function of X_j is obtained by summing the values of the joint probability function over all the outcomes of the X_i's, except for X_j, as follows:

P(X_j = x_j) = C(m_j, x_j) C(M − m_j, n − x_j) / C(M, n)

In fact, (X_j = x_j) means that the number of selected balls of type j in the sample is equal to x_j, which has the probability of C(m_j, x_j) C(M − m_j, n − x_j) / C(M, n). That is, the marginal distribution of X_j is a hypergeometric distribution with parameters M, m_j, and n. Likewise, the expected value of the marginal distribution of X_j is equal to E(X_j) = Σ_{x_j} x_j P(X_j = x_j) = n m_j / M.
Of course, the reader should note that if, in the above distribution, the sampling is performed with replacement, the trials become independent and the resulting joint distribution turns into a multinomial distribution with probabilities p₁ = m₁/M, ..., p_r = m_r/M.

In the univariate space, if the density function of a random variable is constant over its interval of possible values, we call it a uniform random variable. In the bivariate space, if the joint density function of the random variables X and Y is the same for all possible points of these two variables, we call them bivariate uniform joint random variables. In other words, if the joint density function of the random variables X and Y is as follows, we call them bivariate uniform random variables:

𝑓(𝑥, 𝑦) = 𝑘 ; ( 𝑥 , 𝑦) ∈ 𝐴

If so, since the sum of the probabilities of the space should be equal to 1, we
have:
1
𝑘=
the area of region 𝐴

If X and Y are bivariate uniform joint random variables, it can be shown that the probability of a region, such as B, is obtained by dividing its area by the total area of the bivariate space:

P((X, Y) ∈ B) = the area of region B / the area of region A

For example, if the joint density function of random variables 𝑋 and 𝑌 is as


follows:

𝑓(𝑥, 𝑦) = 𝑘 ; 𝑥 > 0 , 𝑦 > 0 , 𝑥 + 𝑦 < 2


We have:

[Figure: the joint density f(x, y) = k, a flat surface of height k over the triangular region x > 0, y > 0, x + y < 2.]

k = 1 / (the total area) = 1/2
Moreover, in this problem, the probability of a region, namely 𝑋 > 1, is equal
to the volume over the shaded region in the figure below:

[Figure: the same triangle with the sub-region x > 1 shaded; the probability is the volume of height k over the shaded part.]

P(X > 1) = (the area of the region X > 1) / (the total area) = (1/2) / 2 = 1/4
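The area-ratio rule lends itself to a quick Monte Carlo check; the sketch below (assuming numpy is available) rejects points outside the triangle and estimates P(X > 1):

import numpy as np

rng = np.random.default_rng(2)
N = 1_000_000
x = rng.uniform(0, 2, N)
y = rng.uniform(0, 2, N)
inside = x + y < 2                 # rejection step: keep only the triangle
print(np.mean(x[inside] > 1))      # close to 0.25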

Example 3.2

If the density function of X and Y follows a uniform distribution in the region x² + y² ≤ 1, obtain P(|X| + |Y| ≥ 1).

Solution. Since the joint density function is uniform, we use the proportion of areas to calculate the probability of the required region. The total area of the circle is π × 1², and the area of the referred square is (√2)². Therefore, we have:

f(x, y) = 1/π ; x² + y² ≤ 1

P(|X| + |Y| ≥ 1) = 1 − P(|X| + |Y| < 1) = 1 − (√2)²/π = (π − 2)/π

[Figure: the unit circle with the inscribed square |x| + |y| < 1.]

Suppose that X1 and X2 are independent normal random variables and the random
variables 𝑌1 and 𝑌2 are defined as follows:

𝑌1 = 𝑎1 𝑋1 + 𝑎2 𝑋2
𝑌2 = 𝑏1 𝑋1 + 𝑏2 𝑋2
𝑌1 and 𝑌2 both depend on the random variables 𝑋1 and 𝑋2. Since 𝑌1 and 𝑌2 both
result from a linear combination of the normal variables, they become normal
random variables. The joint distribution of the two random variables 𝑌1 and 𝑌2 is
called the bivariate normal distribution, which has the following joint density
function:

f(y1, y2) = 1/(2π σ1 σ2 √(1 − ρ²)) · exp{ −1/(2(1 − ρ²)) [ ((y1 − μ1)/σ1)² − 2ρ((y1 − μ1)/σ1)((y2 − μ2)/σ2) + ((y2 − μ2)/σ2)² ] } ; −∞ < y1 < ∞, −∞ < y2 < ∞

In this density function, the respective parameters 𝜇1 and 𝜎1 denote the mean
and standard deviation of 𝑌1 , 𝜇2 and 𝜎2 denote the mean and standard deviation of 𝑌2 ,
and 𝜌 denotes the correlation coefficient between them taking on values in interval
[−1,1]. This parameter will be addressed in Chapter 10.
The bivariate normal random variable has many applications in statistics, including multivariate analysis and regression problems.
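As a rough illustration (the coefficients a_i and b_i below are arbitrary choices, not from the text), one can build such a correlated pair from independent standard normals and compare the sample correlation with the theoretical ρ:

import numpy as np

rng = np.random.default_rng(3)
x1, x2 = rng.standard_normal((2, 500_000))
a1, a2, b1, b2 = 1.0, 2.0, 1.0, -1.0
y1 = a1 * x1 + a2 * x2
y2 = b1 * x1 + b2 * x2

# theoretical correlation: Cov(Y1, Y2) / (sigma1 * sigma2)
rho = (a1 * b1 + a2 * b2) / np.sqrt((a1**2 + a2**2) * (b1**2 + b2**2))
print(np.corrcoef(y1, y2)[0, 1], rho)     # both near -0.316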

In Chapter 3, it was shown that two events 𝐸 and 𝐹 are independent if we have:

𝑃(𝐸 ∩ 𝐹) = 𝑃(𝐸)𝑃(𝐹) or 𝑃(𝐸|𝐹) = 𝑃(𝐸)

In this section, we want to discuss the independence of two random variables.
The two random variables 𝑋 and 𝑌 are independent if, for any two real sets 𝐴 and 𝐵,
we have:

𝑃(𝑋 ∈ 𝐴, 𝑌 ∈ 𝐵) = 𝑃(𝑋 ∈ 𝐴) × 𝑃(𝑌 ∈ 𝐵) (4-1)

In other words, the two random variables 𝑋 and 𝑌 are independent if any real
event of 𝑋 is independent of that of 𝑌.

Example 4.1

Consider the trial of rolling two fair dice. If 𝑋, 𝑌, and 𝑍 denote the outcome
of the first roll, the outcome of the second roll, and the sum of two rolls,
respectively, then:
a. Investigate the independence of 𝑋 and 𝑌.
b. Investigate the independence of 𝑋 and 𝑍.

Solution.
a. It is evident that any event of 𝑋 is independent of that of 𝑌. For instance,
P(X = 1, Y = 2) = 1/36 = P(X = 1) P(Y = 2) = (1/6) × (1/6) = 1/36

P(X ∈ {1,2,3}, Y ∈ {2,4,6}) = 9/36 = P(X ∈ {1,2,3}) × P(Y ∈ {2,4,6}) = (3/6) × (3/6)
Therefore, the random variables 𝑋 and 𝑌 are independent.
b. In this case, some events of 𝑋 are independent of those of 𝑍. For
example,
P(X = 3, Z = 7) = 1/36 = P(X = 3) P(Z = 7) = (1/6) × (6/36)
However, some events of 𝑋 are not independent of those of 𝑍:
P(X = 3, Z = 5) = 1/36 ≠ P(X = 3) P(Z = 5) = (1/6) × (4/36)
Hence, 𝑋 and 𝑍 are not independent.

When X and Y are two discrete random variables, it can be shown that the condition of independence (4-1) is equivalent to requiring, for all values of x and y:

𝑃(𝑥, 𝑦) = 𝑃𝑋 (𝑥)𝑃𝑌 (𝑦) (4-2)

This is because if the relationship (4-1) is satisfied, then the relationship (4-2)
becomes valid by letting 𝐴 = {𝑥} 𝑎𝑛𝑑 𝐵 = {𝑦}. Furthermore, if the relationship (4-2) is
satisfied, for any two real sets 𝐴 and 𝐵, we have:
P(X ∈ A, Y ∈ B) = Σ_{y∈B} Σ_{x∈A} P(x, y) = Σ_{y∈B} Σ_{x∈A} P(X = x) P(Y = y)
= Σ_{y∈B} P(Y = y) Σ_{x∈A} P(X = x) = P(Y ∈ B) P(X ∈ A)

Therefore, the relationship (4-1) becomes valid.


Furthermore, it can be shown that the continuous joint random variables 𝑋
and 𝑌 are independent if, for all values of 𝑥 and 𝑦, we have:

𝑓(𝑥, 𝑦) = 𝑓𝑋 (𝑥)𝑓𝑌 (𝑦) (4-3)

To put it simply, the relationship (4-2) indicates that the discrete random
variables 𝑋 and 𝑌 are independent if and only if their joint probability function is
equal to the product of their marginal probability functions. Moreover, the
relationship (4-3) indicates that the continuous random variables 𝑋 and 𝑌 are
independent if and only if their joint density function is equal to the product of their
marginal density functions.

Example 4.2

A mechanical manufacturer produces parts, each of which, independently, is defective with probability 0.1. If this manufacturer produces 10 parts on one day and 8 parts on the other day, what is the probability that the manufacturer produces one defective part on the first day and no defective parts on the second day?

Solution. Suppose that 𝑋 denotes the number of defective parts on the first day, and
𝑌 denotes the number of defective parts on the second day. Since each part,
independently, is defective with probability 0.1, 𝑋 and 𝑌 are independent. Hence, we
have:

P(X = 1, Y = 0) = P(X = 1) P(Y = 0) = C(10, 1)(0.1)^1 (0.9)^9 × C(8, 0)(0.1)^0 (0.9)^8 ≈ 0.1667
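The arithmetic can be checked directly with exact binomial probabilities, for instance:

from math import comb

p_x1 = comb(10, 1) * 0.1**1 * 0.9**9   # one defective among 10
p_y0 = comb(8, 0) * 0.1**0 * 0.9**8    # no defectives among 8
print(p_x1 * p_y0)                      # about 0.1667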

Example 4.3

Investigate the independence of random variables in the following joint


distributions.
𝑥𝑒 −(𝑥+𝑦) ; 𝑥 > 0, 𝑦 > 0
a. 𝑓 (𝑥, 𝑦) = {
0; 𝑜𝑡ℎ𝑒𝑟𝑤𝑖𝑠𝑒

2; 0 < 𝑥 < 𝑦 ,0 < 𝑦 < 1


b. 𝑓 (𝑥, 𝑦) = {
0; 𝑜𝑡ℎ𝑒𝑟𝑤𝑖𝑠𝑒
Solution.
a.

f_X(x) = ∫_0^∞ x e^{−xy} e^{−x} dy = e^{−x} (−e^{−xy})|_0^∞ = e^{−x} ; x > 0

f_Y(y) = ∫_0^∞ x e^{−(x+y)} dx = e^{−y} (−x e^{−x}|_0^∞ + ∫_0^∞ e^{−x} dx) = e^{−y} ; y > 0

⇒ f_X(x) · f_Y(y) = (x e^{−x})(e^{−y}) = x e^{−(x+y)} = f(x, y)

Therefore, 𝑋 and 𝑌 are independent.


b.
f_X(x) = ∫_x^1 2 dy = 2 − 2x ; 0 < x < 1

f_Y(y) = ∫_0^y 2 dx = 2y ; 0 < y < 1

⇒ f_X(x) · f_Y(y) = 2y(2 − 2x) ≠ 2 = f(x, y)
Therefore, 𝑋 and 𝑌 are not independent.

In general, the random variables X1, ..., Xn are independent if, for any sets of real numbers A1, ..., An, we have:

P(X1 ∈ A1, ..., Xn ∈ An) = P(X1 ∈ A1) ⋯ P(Xn ∈ An)

Or, equivalently, for all values of x1, ..., xn, we have:

P(x1, x2, ..., xn) = P_{X1}(x1) ⋯ P_{Xn}(xn)        (discrete case)
f(x1, x2, ..., xn) = f_{X1}(x1) ⋯ f_{Xn}(xn)        (continuous case)

Proposition 4-1
If the joint density function of the continuous random variables 𝑋 and 𝑌 (the joint
probability function of discrete random variables) can be written as the product of a
nonnegative function of 𝑋 and a nonnegative function of 𝑌, and regions of 𝑋 and 𝑌
are not associated with each other, then the random variables 𝑋 and 𝑌 are
independent and there is no need to determine the marginal density functions (the
marginal probability functions) to investigate the independence of these two
variables.

Proof. To prove this proposition, suppose that 𝑓𝑋,𝑌 (𝑥, 𝑦) is the density of joint
continuous random variables 𝑋 and 𝑌 which can be written as the product of two
functions ℎ(𝑥) and 𝑔(𝑦). The first one is in terms of 𝑥 and the second one is in terms
of 𝑦, and the domains of the random variables 𝑋 and 𝑌 are not dependent. Therefore,
we have:

∫_y ∫_x f_{X,Y}(x, y) dx dy = 1 ⇒ ∫_y ∫_x h(x) g(y) dx dy = 1 ⇒ ∫_y g(y) dy ∫_x h(x) dx = 1

Now, if we define ∫_x h(x) dx = k1 and ∫_y g(y) dy = k2, it yields k1 k2 = 1. Moreover, writing the marginal density functions of the random variables X and Y as follows:

f_X(x) = ∫_y f_{X,Y}(x, y) dy = ∫_y h(x) g(y) dy = h(x) ∫_y g(y) dy = h(x) k2

f_Y(y) = ∫_x f_{X,Y}(x, y) dx = ∫_x h(x) g(y) dx = g(y) ∫_x h(x) dx = g(y) k1

We have:
𝑓𝑋 (𝑥)𝑓𝑌 (𝑦) = ℎ(𝑥)𝑘2 𝑔(𝑦)𝑘1 = ℎ(𝑥)𝑔(𝑦) = 𝑓𝑋,𝑌 (𝑥, 𝑦)

Therefore, since 𝑓𝑋 (𝑥)𝑓𝑌 (𝑦) and 𝑓𝑋,𝑌 (𝑥, 𝑦) are equal, the random variables 𝑋 and
𝑌 are independent.

Example 4.4

In each of the following joint distributions, investigate the independence of


random variables. (In the following density functions, except for the specified
values, the density function equals zero for other values.)
a. 𝑓(𝑥, 𝑦) = 𝑥 + 𝑦 ; 0≤𝑥 ≤1, 0≤𝑦 ≤1
b. 𝑓(𝑥, 𝑦) = 12𝑥𝑦(1 − 𝑥) ; 0≤𝑥 ≤1, 0≤𝑦 ≤1
c. f(x, y) = (1/8)(y² − x²) e^{−y} ; −y ≤ x ≤ y , 0 ≤ y < ∞
d. 𝑓(𝑥, 𝑦) = 𝑥𝑒 −(𝑥+𝑦) ; 0 < 𝑥, 0<𝑦
e. 𝑓(𝑥, 𝑦) = 2; 0≤𝑥 ≤𝑦, 0≤𝑦≤1
f. f(x, y, z) = 8xyz ; 0 ≤ x ≤ 1 , 0 ≤ y ≤ 1 , 0 ≤ z ≤ 1
Solution.
a. They are not independent since 𝑥 + 𝑦 cannot be written as the product
of a function in terms of 𝑥 and a function in terms of 𝑦.
b. They are independent since regions 𝑋 and 𝑌 are not related to each
other (the space is rectangular), and 12𝑥𝑦(1 − 𝑥) can be written as the
product of two functions 𝑔(𝑥) = 12𝑥(1 − 𝑥) and ℎ(𝑦) = 𝑦. It should be
noted that if f(x, y) can be written as the product of two functions g(x) and h(y), it is not necessary that g(x) be the marginal density function of X, and h(y) be the marginal density function of Y. However, we can say that X and Y are independent. For example, in this part, the marginal density of X is equal to 6x(1 − x) and the density function of Y is equal to 2y.
c. They are not independent since regions 𝑋 and 𝑌 are associated with
each other. Furthermore, the joint density function cannot be written
as the product of two distinguished functions in terms of 𝑋 and 𝑌.

d. They are independent. Why?
e. They are not independent since regions 𝑋 and 𝑌 are associated with
each other.
f. They are independent since regions of 𝑋, 𝑌, and 𝑍 are not related to
each other and 8𝑥𝑦𝑧 can be written as the product of functions 𝑥, 𝑦, and
𝑧.
Note that the random variable of section “f” is a triple random variable.
In cases with more than two random variables, the condition of their
independence is the same as the bivariate case.

Example 4.5

Two people are supposed to meet at a place. The arrival time of the first person
at the place is uniformly distributed from 1:00 to 2:00 a.m. The arrival time of the
second person, independently, is uniformly distributed from 1:30 to 2:00 a.m. What
is the probability that the second person arrives at the place sooner than the first
person?

Solution. Suppose that the random variable X denotes the arrival time of the first person, and the random variable Y denotes the arrival time of the second person (both X and Y measured in minutes). For simplicity, we take 1:00 as the origin of time. Since X and Y are independent, we have:

f(x, y) = f_X(x) f_Y(y) = (1/60) × (1/30) = 1/1800 ; 0 ≤ x ≤ 60 , 30 ≤ y ≤ 60
P(X > Y) = ∫_{30}^{60} ∫_y^{60} (1/1800) dx dy = ∫_{30}^{60} (60 − y)/1800 dy = [60y/1800 − y²/3600]_{30}^{60} = 1/4

[Figure: the rectangle 0 ≤ x ≤ 60, 30 ≤ y ≤ 60 with the region x > y shaded.]

However, since the joint density function of X and Y is constant (1/1800), the joint distribution of X and Y is uniform and the required probability is the ratio of the desired area to the total area:

P(X > Y) = (30 × 30 / 2) / (30 × 60) = 1/4
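A quick Monte Carlo sketch of the same setting (assuming numpy) confirms the 1/4:

import numpy as np

rng = np.random.default_rng(4)
N = 1_000_000
x = rng.uniform(0, 60, N)     # first person, minutes past 1:00
y = rng.uniform(30, 60, N)    # second person
print(np.mean(x > y))          # close to 0.25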

Example 4.6

If 𝑋1 and 𝑋2 are independent exponential random variables with respective


parameters 𝜆1 and 𝜆2 , obtain 𝑃(𝑋1 < 𝑋2 ).

Solution. Since 𝑋1 and 𝑋2 are independent, their joint density function is obtained by
multiplying their density functions.

f(x1, x2) = f_{X1}(x1) f_{X2}(x2) = λ1 e^{−λ1 x1} λ2 e^{−λ2 x2} ; x1 ≥ 0, x2 ≥ 0

⇒ P(X1 < X2) = ∫_0^∞ ∫_0^{x2} λ1 e^{−λ1 x1} λ2 e^{−λ2 x2} dx1 dx2 = ∫_0^∞ (1 − e^{−λ1 x2}) λ2 e^{−λ2 x2} dx2

= ∫_0^∞ λ2 e^{−λ2 x2} dx2 − ∫_0^∞ λ2 e^{−(λ1+λ2) x2} dx2 = 1 − λ2/(λ1 + λ2) = λ1/(λ1 + λ2)

[Figure: the first quadrant of the (x1, x2) plane with the region above the line x1 = x2 shaded.]

In general, if X1 and X2 are independent exponential random variables with respective parameters λ1 and λ2, then:

P(X1 < X2) = λ1/(λ1 + λ2)

P(a1 X1 < a2 X2) = (λ1/a1) / (λ1/a1 + λ2/a2) ; a1 > 0 , a2 > 0

Note that, in Chapter 5, it was shown that a1 X1 is an exponential random variable with parameter λ1/a1, and a2 X2 is an exponential random variable with parameter λ2/a2.
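The race probability λ1/(λ1 + λ2) is easy to check by simulation; the rates below are arbitrary illustrative values (note that numpy parameterizes the exponential by the scale 1/λ):

import numpy as np

lam1, lam2 = 2.0, 3.0
rng = np.random.default_rng(5)
N = 1_000_000
x1 = rng.exponential(1 / lam1, N)   # Exp(lam1)
x2 = rng.exponential(1 / lam2, N)   # Exp(lam2)
print(np.mean(x1 < x2), lam1 / (lam1 + lam2))   # both near 0.4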

Sometimes, in a multivariate space, the value of one variable is specified and we should obtain the distribution of the remaining variable. For example, suppose that a system consists of two pieces used simultaneously whose lifetimes are dependent. In such situations, if the lifetimes of the pieces numbered 1 and 2 are denoted by X and Y, respectively, and piece number 2 fails after y units of time, then the distribution of the lifetime of piece number 1 under the new condition is called the conditional distribution of the random variable X given that Y = y. We call this concept the conditional distribution.

As mentioned in Chapter 3, the probability that event E occurs given the occurrence of the nonempty event F is defined as follows:

P(E|F) = P(E ∩ F) / P(F)
In a bivariate discrete space of 𝑋 and 𝑌, if the value of random variable 𝑌 is
equal to 𝑦, to obtain the distribution of the random variable 𝑋 given 𝑌 = 𝑦, we should
obtain the probability function of 𝑋 under the new condition (𝑌 = 𝑦). The probability
function of 𝑋 under the new condition 𝑌 = 𝑦 is shown by 𝑃(𝑋 = 𝑥|𝑌 = 𝑦) that is
obtained under the condition 𝑃𝑌 (𝑦) > 0 for different values of variable 𝑋 as follows:
P(X = x|Y = y) = P(X = x, Y = y) / P(Y = y) = P(x, y) / P_Y(y)
The above function is also shown by 𝑃𝑋|𝑌 (𝑥|𝑦), called the conditional
probability function of 𝑋 given that 𝑌 = 𝑦.

Moreover, for this conditional probability function, we have:

Σ_x P_{X|Y}(x|y) = 1

Furthermore, like all random variables, for the expected value of distribution
𝑋|𝑌 = 𝑦 and a function of that, as well as its cumulative distribution function, we
have:

E(X|Y = y) = Σ_x x P_{X|Y}(x|y)

E(g(X)|Y = y) = Σ_x g(x) P(X = x|Y = y)

F_{X|Y}(a|Y = y) = Σ_{x≤a} P_{X|Y}(x|y) = P(X ≤ a|Y = y)

Example 5.1

We roll two fair dice. If the random variables 𝑋 and 𝑌 denote the result of the
first die and the sum of two dice, respectively, then obtain the probability function
of 𝑋 given 𝑌 = 5 and the expected value of 𝑋 given 𝑌 = 5.
Solution.
P_{X|Y}(1|5) = P(X = 1|Y = 5) = P(X = 1, Y = 5) / P(Y = 5) = (1/36) / (4/36) = 1/4

Similarly, for the other values of x, we have:

P_{X|Y}(2|5) = P_{X|Y}(3|5) = P_{X|Y}(4|5) = 1/4 ;  P_{X|Y}(5|5) = P_{X|Y}(6|5) = 0

⇒ P_{X|Y}(i|5) = 1/4 ; i = 1, 2, 3, 4

E(X|Y = 5) = Σ_x x P(x|y) = Σ_{i=1}^{4} i × (1/4) = (1 + 2 + 3 + 4)/4 = 5/2
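Since the sample space here has only 36 equally likely outcomes, the conditional probability function can also be verified by brute-force enumeration, for instance:

from fractions import Fraction

outcomes = [(i, j) for i in range(1, 7) for j in range(1, 7)]
given = [(i, j) for (i, j) in outcomes if i + j == 5]   # event Y = 5
p_y5 = Fraction(len(given), 36)

for x in range(1, 7):
    p_joint = Fraction(sum(1 for (i, j) in given if i == x), 36)
    print(x, p_joint / p_y5)            # 1/4 for x = 1..4, else 0

mean = sum(i * Fraction(1, 4) for i in range(1, 5))
print(mean)                             # 5/2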

Example 5.2

In the preceding example, if 𝑋 and 𝑌 denote the smaller number and larger
number in rolling two dice, respectively, obtain the probability function of 𝑋 given
𝑌 = 4. Then, obtain its cumulative distribution function and expected value.
Solution. If 𝑃(𝑥, 𝑦) denotes that the random variables 𝑋 and 𝑌 take on values 𝑥 and 𝑦,
respectively, we have:
P(Y = 4) = P(1,4) + P(2,4) + P(3,4) + P(4,4) + P(4,3) + P(4,2) + P(4,1) = 7/36

Namely, for P_{X|Y}(1|Y = 4), we have:

P_{X|Y}(1|Y = 4) = (2/36) / (7/36) = 2/7

As a result, the probability function and cumulative distribution function of X|Y = 4 are equal to:

P_{X|Y}(x|Y = 4) = { 2/7 , x = 1, 2, 3 ; 1/7 , x = 4 }

F_{X|Y}(x|Y = 4) = { 0 , x < 1 ; 2/7 , 1 ≤ x < 2 ; 4/7 , 2 ≤ x < 3 ; 6/7 , 3 ≤ x < 4 ; 1 , 4 ≤ x }

and the expected value of X|Y = 4 is:

E(X|Y = 4) = Σ_x x P(x|Y = 4) = 1 × (2/7) + 2 × (2/7) + 3 × (2/7) + 4 × (1/7) = 16/7

As mentioned in Chapter 4, in the univariate continuous space, the density function of the random variable X at point x is defined as follows:

f_X(x) = lim_{dx→0} P(x − dx/2 < X < x + dx/2) / dx

In the continuous bivariate space of X and Y, if the random variable Y takes on a value in the interval y ± dy/2 (since dy converges to zero, the event y − dy/2 ≤ Y ≤ y + dy/2 is also denoted by Y = y), then the density function of the random variable X at point x under the new condition is defined as follows:

f_{X|Y}(x|y) = lim_{dx→0} P(x − dx/2 < X < x + dx/2 | y − dy/2 < Y < y + dy/2) / dx

where we have:

P(x − dx/2 < X < x + dx/2 | y − dy/2 < Y < y + dy/2)
= P(x − dx/2 < X < x + dx/2 , y − dy/2 < Y < y + dy/2) / P(y − dy/2 < Y < y + dy/2)
= f(x, y) dx dy / (f_Y(y) dy) = f(x, y) dx / f_Y(y)
Hence, in the continuous bivariate space of 𝑋 and 𝑌, if the random variable
𝑌 is specified as “𝑦”, the density of each possible value for the random variable 𝑋
is obtained as follows:

f_{X|Y}(x|y) = lim_{dx→0} P(x − dx/2 < X < x + dx/2 | y − dy/2 < Y < y + dy/2) / dx = (f(x, y) dx / f_Y(y)) / dx = f(x, y) / f_Y(y)

In other words, if, in the continuous bivariate space of 𝑋 and 𝑌, the value of
random variable 𝑌 is specified as “𝑦”, to calculate the probability of an event like 𝐴
for the random variable 𝑋, we have:

P(X ∈ A|Y = y) = ∫_A f_{X|Y}(x|y) dx

in which f_{X|Y}(x|y) is equal to f(x, y)/f_Y(y).

Furthermore, to calculate the expected value of distribution 𝑋|𝑌 = 𝑦 and a


function of that, as well as its cumulative distribution function, we have:
E(X|Y = y) = ∫_{−∞}^{+∞} x f_{X|Y}(x|y) dx

E(g(X)|Y = y) = ∫_{−∞}^{+∞} g(x) f_{X|Y}(x|y) dx

F_{X|Y}(a|y) = ∫_{−∞}^{a} f_{X|Y}(x|y) dx

Example 5.3

The joint density function of 𝑋 and 𝑌 is given by

𝑓 (𝑥, 𝑦) = 𝑥𝑒 −𝑥(𝑦+1) ; 𝑥>0 , 𝑦>0

a. Obtain the conditional density function of 𝑋 given that 𝑌 = 𝑦 as well as


𝑌 given that 𝑋 = 𝑥.
b. Obtain the expected value and cumulative distribution function of 𝑌|𝑋 =
𝑥.
c. Obtain 𝑃(2 < 𝑌 < 3|𝑋 = 2).

Solution.
a.

f_X(x) = ∫_0^∞ x e^{−xy} e^{−x} dy = e^{−x} (−e^{−xy})|_0^∞ = e^{−x} ; x > 0

f_Y(y) = ∫_0^∞ x e^{−x(y+1)} dx = [−x e^{−x(y+1)}/(y + 1)]_0^∞ + ∫_0^∞ e^{−x(y+1)}/(y + 1) dx = 0 + 1/(y + 1)² = 1/(y + 1)² ; y > 0

⇒ f_{X|Y}(x|y) = f(x, y)/f_Y(y) = x e^{−x(y+1)} / (y + 1)^{−2} = (y + 1)² x e^{−x(y+1)} ; x > 0

Note that the value of the random variable Y in the distribution X|Y = y is constant and equal to y. In this regard, in the above density function, y is constant and X is the only random variable. Therefore, a closer look at the above density function helps us realize that X|Y = y is a Gamma distribution with parameters λ = y + 1 and α = 2. Moreover, in this problem, we have:

f_{Y|X}(y|x) = f(x, y)/f_X(x) = x e^{−x(y+1)} / e^{−x} = x e^{−xy} ; y > 0

And, in the distribution Y|X = x, the value of the random variable X is constant and equal to x, and Y is the only random variable. Therefore, we can realize that Y|X = x is an exponential distribution with parameter λ = x.
b.

E(Y|X = x) = ∫_0^∞ y x e^{−xy} dy = −y e^{−xy}|_{y=0}^{y=∞} + ∫_0^∞ e^{−xy} dy = 0 + 1/x = 1/x

F_{Y|X}(a|x) = ∫_{−∞}^{a} f_{Y|X}(y|x) dy = ∫_0^a x e^{−xy} dy = 1 − e^{−ax} ; a > 0

However, since Y|X = x follows an exponential distribution with parameter λ = x, it is evident that its expected value is 1/λ = 1/x, and its cumulative distribution function is F_{Y|X}(a|x) = 1 − e^{−λa} = 1 − e^{−ax} ; a > 0.
c.
3 3
𝑃(2 < 𝑌 < 3|𝑋 = 2) = ∫ 𝑓𝑌|𝑋 (𝑦|𝑥 = 2)𝑑𝑦 = ∫ 2𝑒 −2𝑦 𝑑𝑦 = 𝑒 −4 − 𝑒 −6
2 2

When calculating f_{X|Y}(x|y), we should note that the value of Y is specified as "y", and X is the only random variable. Hence, all the expressions in terms of y are treated as constants (including f_Y(y)). Namely, for part "a" of Example 5.3, we have:

f_{X|Y}(x|y) = f(x, y)/f_Y(y) = x e^{−x(y+1)} / f_Y(y) = c x e^{−x(y+1)} ; x > 0

in which a closer look reveals that X|Y = y is a Gamma distribution with parameters λ = y + 1 and α = 2.

Moreover, when calculating f_{Y|X}(y|x), we should note that the value of X is specified as "x", and Y is the only random variable. Hence, for part "a" of Example 5.3, we have:

f_{Y|X}(y|x) = f(x, y)/f_X(x) = c x e^{−x(y+1)} = c′ e^{−xy} ; y > 0

where Y|X = x is an exponential distribution with parameter λ = x.
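Part (c) of Example 5.3 can be sanity-checked by simulating from the conditional distribution Y|X = 2, which is exponential with rate 2:

import numpy as np

rng = np.random.default_rng(6)
y = rng.exponential(1 / 2, 1_000_000)      # Exp(lambda = 2); numpy takes the scale
print(np.mean((y > 2) & (y < 3)))          # simulated P(2 < Y < 3 | X = 2)
print(np.exp(-4) - np.exp(-6))             # exact value, about 0.01584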

If X and Y are independent random variables, then the distribution of X|Y = y and the distribution of X are the same. The proof of this note, for the continuous case, is as follows:

f_{X|Y}(x|y) = f(x, y)/f_Y(y) = f_X(x) f_Y(y)/f_Y(y) = f_X(x)

Example 5.4

If the joint density function of the continuous random variables X, Y, and Z is given by:

f(x, y, z) = 8xyz ; 0 ≤ x ≤ 1, 0 ≤ y ≤ 1, 0 ≤ z ≤ 1

obtain P(X ≤ 1/2 | Y = 1/2, Z = 1/2).

Solution.
f(x, y, z) = 8xyz ; 0 ≤ x, y, z ≤ 1

f_{Y,Z}(y, z) = ∫_0^1 f(x, y, z) dx = 4yz ; 0 ≤ y ≤ 1 , 0 ≤ z ≤ 1

f_{X|Y,Z}(x|y, z) = f(x, y, z)/f_{Y,Z}(y, z) = 8xyz/(4yz) = 2x ; 0 ≤ x ≤ 1

P(X ≤ 1/2 | Y = 1/2, Z = 1/2) = ∫_0^{1/2} f_{X|Y,Z}(x | y = 1/2, z = 1/2) dx = ∫_0^{1/2} 2x dx = 1/4

However, it can be understood that X, Y, and Z are independent random variables considering their joint density function. Therefore, any event of X is independent of those of Y and Z. Hence, we have:

f_X(x) = ∫_0^1 ∫_0^1 f(x, y, z) dy dz = ∫_0^1 ∫_0^1 8xyz dy dz = 2x

P(X ≤ 1/2 | Y = 1/2, Z = 1/2) = P(X ≤ 1/2) = ∫_0^{1/2} f_X(x) dx = 1/4

Example 5.5

We generate a random number in interval (0,1) and call it 𝑋. Then, we


perform 𝑛 independent trials with the probability of success 𝑋, and we call the
number of successes in 𝑛 trials 𝑌. If we know that the number of successes in 𝑛
trials is equal to 𝑖, obtain the conditional distribution (𝑋|𝑌 = 𝑖).

Solution. Similar to the proof presented in Section 8.5.2 for the continuous case, it can be shown that if X is a continuous variable and Y is a discrete variable, we have:

f_{X|Y}(x|y) = P(Y = y|X = x) f_X(x) / P(Y = y)

Therefore, in this problem, considering that X follows a uniform distribution in the interval (0,1), and (Y|X = x) is a binomial distribution with parameters n and x, we have:

f_{X|Y}(x|y) = P(Y = y|X = x) f_X(x) / P(Y = y) = C(n, y) x^y (1 − x)^{n−y} × 1 / P(Y = y) = c x^y (1 − x)^{n−y} ; 0 < x < 1

Hence, (X|Y = y) follows a beta distribution with parameters (y + 1, n − y + 1).
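A simulation sketch of this result (assuming numpy; the values of n and i below are arbitrary illustrations) keeps only the runs with Y = i and compares the retained X values with Beta(i + 1, n − i + 1) moments:

import numpy as np

n, i = 10, 3
rng = np.random.default_rng(7)
x = rng.uniform(0, 1, 2_000_000)     # the random success probability
y = rng.binomial(n, x)               # n trials with probability x each
kept = x[y == i]                     # condition on Y = i

a, b = i + 1, n - i + 1              # Beta(a, b) parameters
print(kept.mean(), a / (a + b))                        # both near 1/3
print(kept.var(), a * b / ((a + b)**2 * (a + b + 1)))  # both near 0.017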

1) If the joint probability distribution of the random variables 𝑋1 and 𝑋2 is given
by
X2 \ X1 |  1     2     3
   1    | 1/6    0    1/12
   2    | 1/5    0    1/9
   3    | 2/15  1/4   1/18
It is desired to calculate:
a. The marginal functions of 𝑋1 and 𝑋2.
b. 𝑃(𝑋2 = 3|𝑋1 = 2)
2) A supermarket has two express lanes. Suppose that 𝑋1 and 𝑋2 denote the
number of customers in the first and second lanes at any moment,
respectively. In regular working hours, the joint probability function is as
follows:
X2 \ X1 |  0    1    2
   0    |  k   4k   9k
   1    | 2k   6k  12k
   2    | 3k   8k   3k
   3    | 4k   2k   6k

It is desired to calculate:
a. The value of 𝑘.
b. The marginal distribution functions.
c. 𝑃(𝑋1 ≤ 1, 𝑋2 ≤ 1)
d. 𝑃(𝑋1 + 𝑋2 > 2)

e. 𝑃(𝑋1 + 𝑋2 ≤ 1)
f. 𝑃(𝑋1 ≥ 𝑋2 |𝑋2 > 1)
g. 𝑃(𝑋1 > 1)
h. 𝑃(|𝑋1 − 𝑋2 | = 1)
3) If 𝑋1 and 𝑋2 are independent binomial random variables with respective
parameters (𝑛, 𝑝) and (𝑛, 𝑞) such that (𝑝 + 𝑞 = 1), then obtain 𝑃(𝑋1 = 𝑋2 ).
4) If person A flips a fair coin 𝑛 times, and person B flips a fair coin 𝑛 + 1 times,
obtain the probability that the number of heads for both people is the same.
5) Two unfair coins with probabilities of landing heads 1/3 and 2/3 are given to people
A and B, respectively. The first coin is flipped 𝑘 times by person A and the
second coin is flipped 𝑛 − 𝑘 times by person B. Obtain the probability of
observing the same number of heads for both people.
6) If 𝑋1 and 𝑋2 are independent geometric distributions with parameters 𝑝, obtain
𝑃(𝑋1 = 𝑋2 ).
7) If 𝑋1 and 𝑋2 are independent geometric distributions with parameters 𝑝, obtain
𝑃(𝑋1 > 𝑋2 ).
8) Suppose that X1 and X2 are independent geometric distributions with parameter p = 1/2. If P(X1 ≥ kX2) = 2/15, obtain k.
9) Consider a trial taking on values 1,2,3, . . . , 𝑛 with respective probabilities
𝑃1 , 𝑃2 , 𝑃3 , . . . , 𝑃𝑛 . If we independently perform this trial twice, what is the
probability that:
a. The result of the first trial differs from that of the second trial?
b. The result of the first trial exceeds that of the second trial?
10) Each of the two construction contracts are allocated to one of the three
companies A, B, and C independently and equally likely. If the random variable
𝑋1 denotes the number of contracts for company A and the random variable
𝑋2 denotes the number of contracts for company B, then obtain the joint
probability function of 𝑋1 and 𝑋2.
11) Of 5 transistors in a box, two are defective. The transistors are withdrawn at
random and without replacement one at a time to be tested. This continues
until defective transistors are found. If the random variable 𝑋1 denotes the
number of tests required to find the first defective transistor and the random

variable 𝑋2 denotes the number of additional tests after 𝑋1 , required to obtain
the second defective transistor, then:
a. Obtain the joint probability function of 𝑋1 and 𝑋2.
b. Obtain the marginal distributions 𝑋1 and 𝑋2.
c. Obtain 𝑃(𝑋2 = 2|𝑋1 = 2).
d. Are 𝑋1 and 𝑋2 independent?
12) There are 𝑀 defective transistors in a lot consisting of 𝑁 transistors (𝑀 ≤ 𝑁).
Two transistors are withdrawn at random from the lot. The random variables
𝑋1 and 𝑋2 are defined for 𝑖 = 1,2.
Xi = 1: if the i-th transistor is nondefective.
Xi = 0: if the i-th transistor is defective.
a. Obtain the joint probability distribution of 𝑋1 and 𝑋2.
b. Obtain the marginal distributions of 𝑋1 and 𝑋2.
13) An urn contains 4 white, 6 black, and 5 red balls. Two samples are taken from
the urn at random, without replacement, and successively such that the first
sample consists of 3 balls, and the second sample consists of 5 balls. Suppose
that the random variables 𝑋1 and 𝑋2 denote the number of white balls in the
first and second samples, respectively.
a. Obtain the joint probability function of 𝑋1 and 𝑋2.
b. Obtain 𝐸(𝑋2 |𝑋1 = 𝑖); 𝑖 = 1,2,3,4.
c. Obtain 𝐸(𝑋1 |𝑋2 = 𝑗); 𝑗 = 1,2,3,4.
14) The random variables 𝑋1 and 𝑋2 have the following joint probability function:
(𝑖, 𝑗) (0,0) (0,1) (1,0) (1,1) (2,0) (2,1)
1 3 4 3 6 1
𝑃(𝑖, 𝑗)
18 18 18 18 18 18
a. Obtain the marginal distribution functions of 𝑋1 and 𝑋2.
b. Are 𝑋1 and 𝑋2 independent?
15) In three flips of a fair coin, if 𝑋1 denotes the number of heads in the first flip
and 𝑋2 denotes the number of heads in the three flips,
a. Obtain the joint probability distribution of 𝑋1 and 𝑋2.
b. Obtain the marginal distribution functions of 𝑋1 and 𝑋2.

16) Suppose we roll a fair die 7 times. If 𝑋1 denotes the number of 3's and 𝑋2
denotes the number of 2's in the 7 rolls,
a. Obtain the joint probability distribution of 𝑋1 and 𝑋2.
b. Obtain the marginal distribution functions of 𝑋1 and 𝑋2.
17) A washing machine seller figures that 45 percent of the customers entering his
store purchase machine type A, 15 percent purchase machine type B, and 40
percent do not purchase. If 5 customers enter his store on a certain day, what
is the probability that he sells two washing machines of type A and one
washing machine of type B?

18) We roll three dice independently.


a. Obtain the probability that the results of three dice are the same.
b. Obtain the probability that the results of two dice are the same.
c. If we repeat rolling these three dice 10 times, 𝑋1 denotes the number of
times that all the three dice land on the same number, and 𝑋2 denotes
the number of times that two dice land on the same number, then obtain
the joint probability distribution of the random variables 𝑋1 and 𝑋2 .
19) Suppose that the number of earthquakes in a certain area follows a Poisson
process with rate 1 earthquake per year. If we know that three earthquakes
have occurred during three years in this area, what is the probability that
exactly one earthquake occurred in each of the three years?

20) Suppose that 𝑋1, 𝑋2, and 𝑋3 are independent uniform random variables
selected from the interval (0,3), independently. What is the probability that
one of them is in the interval (0,1), another in the interval (1,2), and the other
in the interval (2,3)?

21) Suppose that X1, X2, and X3 are independent random variables with the density function

f_X(x) = { x , 0 < x ≤ 1 ; 2 − x , 1 ≤ x < 2 ; 0 , otherwise }

It is desired to calculate the probability that:
a. One of them is in the interval (0, 1/2), another one in the interval (1/2, 3/2), and the other one in the interval (3/2, 2).
b. At least one of these variables is less than 1/2, and at least one of these variables is greater than 3/2.

22) Suppose that 𝑋1 , … , 𝑋𝑛 are independent Bernoulli random variables with


parameter 𝑝. Regard the random variable 𝑋 as the number of observations
of the 𝑛 random variables equal to 𝑋1. If so, obtain the value of 𝑃(𝑋 = 𝑖) for
values 𝑖 = 1,2, … , 𝑛.
23) In each of the following joint probability density functions, obtain the values
c, 𝑓𝑋1 (𝑥1 ), and 𝑓𝑋2 (𝑥2 ).
a. 𝑓𝑋1,𝑋2 (𝑥1 , 𝑥2 ) = 𝑐(𝑥1 + 2𝑥2 ); 0 ≤ 𝑥1 < 1, 0 ≤ 𝑥2 < 1
b. 𝑓𝑋1,𝑋2 (𝑥1 , 𝑥2 ) = 𝑐𝑥1 𝑥2 ; 0 < 𝑥1 < 1, 0 < 𝑥2 < 1
c. 𝑓𝑋1,𝑋2 (𝑥1 , 𝑥2 ) = 𝑐𝑥1 𝑥2 ; 0 < 𝑥1 < 1, 0 < 𝑥2 < 1, 𝑥1 + 𝑥2 ≤ 1
d. 𝑓𝑋1,𝑋2 (𝑥1 , 𝑥2 ) = 𝑐; 0 < 𝑥1 < 𝑥2 , 0 < 𝑥2 < 1
24) Suppose that the random variables 𝑋1 and 𝑋2 have the following probability
density function:
1; 0 < 𝑥1 < 1, 𝑥1 < 𝑥2 < 𝑥1 + 1
𝑓𝑋1,𝑋2 (𝑥1 , 𝑥2 ) = {
0; 𝑜𝑡ℎ𝑒𝑟𝑤𝑖𝑠𝑒
Obtain the marginal density functions of 𝑋1 and 𝑋2.
25) If the random variables 𝑋1 and 𝑋2 have a uniform joint probability density
function in the region {(𝑥1 , 𝑥2 ): |𝑥1 | + |𝑥2 | < 1}, obtain the marginal density
functions of 𝑋1 and 𝑋2.
26) The joint probability density function of the random variables 𝑋1 and 𝑋2 is
given by
f(x1, x2) = (1/16) x1 x2 ; 0 < x1 < 2, 0 < x2 < 4
a. Obtain the joint cumulative distribution function of 𝑋1 and 𝑋2.
b. Obtain the marginal density functions of 𝑋1 and 𝑋2.
c. Obtain the conditional distribution of 𝑋2 given that 𝑋1 = 𝑥1 .
d. Are 𝑋1 and 𝑋2 independent?

27) The joint probability density function of the random variables 𝑋1 and 𝑋2 is
given by
𝑓(𝑥1 , 𝑥2 ) = 3𝑥1 ; 0 < 𝑥2 < 𝑥1 < 1
a. Obtain the joint cumulative distribution function of 𝑋1 and 𝑋2.
b. Obtain the marginal density functions of 𝑋1 and 𝑋2.
c. Obtain the conditional distribution of 𝑋2 given that 𝑋1 = 𝑥1 .
d. Are 𝑋1 and 𝑋2 independent?
28) The joint probability density function of the random variables 𝑋1 and 𝑋2 is
given by
f(x1, x2) = c x1 ; 0 < x2 < x1 < 1
a. Obtain the value of 𝑐.
b. Obtain the joint cumulative distribution function of 𝑋1 and 𝑋2.
c. Obtain the marginal density functions of 𝑋1 and 𝑋2.
d. Obtain the conditional distribution of 𝑋2 given that 𝑋1 = 𝑥1 .
e. Are 𝑋1 and 𝑋2 independent?
29) If 𝑓(𝑥1 , 𝑥2 , 𝑥3 ) = 𝑘 such that 0 < 𝑥1 < 𝑥2 < 𝑥3 < 1, then:
a. Obtain the marginal density functions of 𝑋1, 𝑋2 , and 𝑋3.
b. Obtain 𝑓(𝑥3 |𝑥2 ), 𝑓(𝑥3 |𝑥1 ), and 𝑓(𝑥2 , 𝑥3 |𝑥1 ).
30) The joint probability density function of the random variables 𝑋1, 𝑋2, and 𝑋3 is
given by
𝑓(𝑥1 , 𝑥2 , 𝑥3 ) = 8𝑥1 𝑥2 𝑥3 ; 0 < 𝑥1 < 1, 0 < 𝑥2 < 1, 0 < 𝑥3 < 1
a. Obtain 𝑃(𝑋1 < 0.5|𝑋2 = 0.5).
b. Obtain 𝑃(𝑋1 < 0.5, 𝑋2 < 0.5|𝑋3 = 0.8).
31) Suppose that the probability density functions of the independent random
variables 𝑋1 and 𝑋2 are given by
𝑓𝑋1 (𝑥1 ) = 𝑒 −𝑥1 ; 𝑥1 > 0, 𝑓𝑋2 (𝑥2 ) = 𝑒 −𝑥2 ; 𝑥2 > 0
If the random variable 𝑍 and the set 𝐴 are defined as:
𝑍 = 𝑋2 − 𝑋1 and 𝐴 = {(𝑥1 , 𝑥2 ); |𝑥1 − 𝑥2 | ≤ 1}, respectively, then it is desired to
calculate:
a. 𝑃(𝐴|𝑋1 = 1)
b. 𝐹𝑍|𝑋1 (0|𝑋1 = 1)
c. 𝑓𝑍|𝑋1 (0|𝑋1 = 1)
d. 𝑃(𝑍 ≤ 0|𝐴)
32) If the probability density function of 𝑋1 given that 𝑋2 = 𝑥2 is as follows:
𝑓(𝑥1 |𝑥2 ) = 𝑐1 𝑥1 𝑥2−2 ; 0 < 𝑥1 < 𝑥2 , 0 < 𝑥2 < 1
and the marginal density function of 𝑋2 is 𝑓(𝑥2 ) = 𝑐2 𝑥24 ; 0 < 𝑥2 < 1, then:

a. Obtain the values of 𝑐1 and 𝑐2 .


b. Obtain the joint density function of 𝑋1 and 𝑋2.
c. Obtain 𝑓𝑋2 |𝑋1 (𝑥2 |𝑥1 ).
33) The moment generating functions of the random variables 𝑋1 and 𝑋2 are given
by M_{X1}(t) = e^{2(e^t − 1)} and M_{X2}(t) = ((1/4)e^t + (3/4))^{10}, respectively. If X1 and X2 are
independent, then it is desired to calculate:
a. 𝑃(𝑋1 + 𝑋2 = 2)
b. 𝑃(𝑋1 𝑋2 = 0)
34) The joint density function of the random variables 𝑋1 and 𝑋2 is given by
𝑓 (𝑥1 , 𝑥2 ) = 2𝑥2 ; 0 ≤ 𝑥1 ≤ 1 , 0 ≤ 𝑥2 ≤ 1, and zero elsewhere. It is desired to
obtain 𝑃(𝑋12 < 𝑋2 < 𝑋1).

35) The joint density function of the random variables X1 and X2 is given by
f(x1, x2) = (6/7)(x1² + x1 x2 / 2) ; 0 < x1 < 1, 0 < x2 < 2, and zero elsewhere.
a. Obtain P(X1 > X2).
b. Obtain P(X1 < 1/2 | X1 > X2).

36) The joint probability density function of the random variables 𝑋1 and 𝑋2 is
given by 𝑓𝑥1,𝑥2 (𝑥1 , 𝑥2 ) = 6𝑥1 ; 0 < 𝑥1 < 𝑥2 < 1, and zero elsewhere. Obtain the
value of 𝑃(𝑋1 + 𝑋2 < 1).
37) The joint density function of the random variables 𝑋1 and 𝑋2 is given by
𝑓(𝑥1 , 𝑥2 ) = 𝑐(𝑥1 + 2𝑥2 ); 0 < 𝑥1 < 2, 0 < 𝑥2 < 1, and zero elsewhere. Obtain the
value of 𝑃(𝑋1 < 𝑋2 ).
38) If 𝑋1 and 𝑋2 are the independent and identically continuous random variables
with the probability density function 𝑓(𝑥) = 2𝑥; 0 < 𝑥 < 1, and zero elsewhere,
then obtain the value of 𝑃(𝑋1 < 𝑋2 |𝑋1 < 2𝑋2 ).

39) Suppose that the lifetimes of the two light bulbs numbered 1 and 2 follow a
uniform distribution in the interval [0,100] and an exponential distribution
with mean 50, respectively. What is the probability that the second light bulb
lasts longer than the first one?
40) If the random variable 𝑋1 is uniformly distributed over (0,1) and the random
variable 𝑋2, independently, follows an exponential distribution with parameter
𝜆 = 1, then:
a. Obtain the value of P(X1 ≤ X2/2).
b. Obtain the values of P(X1 + X2 ≤ 1/2) and P(X1 + X2 ≤ 2).

41) Suppose that 𝑋1 and 𝑋2 have the joint probability density function
𝑓𝑋1,𝑋2 (𝑥1 , 𝑥2 ) = 𝑒 −(𝑥1+𝑥2) ; 𝑥1 > 0, 𝑥2 > 0, and zero elsewhere. It is desired to
calculate:
a. 𝑃(𝑋1 ≥ 1 , 𝑋2 < 1)
b. 𝑃(𝑀𝑎𝑥(𝑋1 , 𝑋2 ) ≥ 1)
c. 𝑃(𝑀𝑖𝑛(𝑋1 , 𝑋2 ) ≥ 1)
42) Suppose that 𝑋1 and 𝑋2 have the joint probability density function
𝑓𝑋1,𝑋2 (𝑥1 , 𝑥2 ) = 𝑥1 𝑥2 𝑒 −(𝑥1+𝑥2) ; 𝑥1 > 0, 𝑥2 > 0, and zero elsewhere. It is desired to
calculate:
a. 𝑃(𝑋1 ≥ 1 , 𝑋2 < 1)
b. 𝑃(𝑀𝑎𝑥(𝑋1 , 𝑋2 ) ≥ 1)
c. 𝑃(𝑀𝑖𝑛(𝑋1 , 𝑋2 ) ≥ 1)
43) Suppose that 𝑋1 and 𝑋2 have the joint probability density function
𝑓𝑋1,𝑋2 (𝑥1 , 𝑥2 ) = 𝑥2 𝑒 −𝑥2(1+𝑥1) ; 𝑥1 > 0, 𝑥2 > 0, and zero elsewhere. It is desired to
calculate:
a. 𝑃(𝑋1 ≥ 1 , 𝑋2 < 1)
b. 𝑃(𝑀𝑎𝑥(𝑋1 , 𝑋2 ) ≥ 1)
c. 𝑃(𝑀𝑖𝑛(𝑋1 , 𝑋2 ) ≥ 1)
44) Suppose that 𝑋1 and 𝑋2 are two independent and identically distributed
random variables with the following probability density function:
𝑓(𝑥) = 2𝑥; 0 ≤ 𝑥 ≤ 1, and zero elsewhere.

Obtain the value of P(X1/X2 ≤ 0.5).

45) Suppose that the distribution of the number of cars crossing a street is Poisson
with 𝜆 = 30 cars per hour. If the time that it takes a person to cross the street
follows an exponential distribution with mean 1 minute, then:
a. Obtain the probability that no cars collide with the person (suppose that
if a car passes during the passage of a person, it will definitely collide
with the person.)
b. Obtain the probability that the passing time of the person is less than
two times of the passing time interval between two consecutive cars.
46) Suppose that 𝑋1 and 𝑋2 have the following joint density function:
𝑒 −𝑥1−𝑥2 ; 𝑥1 ≥ 0, 𝑥2 ≥ 0
𝑓(𝑥1 , 𝑥2 ) = {
0; 𝑜𝑡ℎ𝑒𝑟𝑤𝑖𝑠𝑒
If so, obtain the value of 𝑃(𝑋1 < 𝑋2 |𝑋1 < 2𝑋2 ).
47) If 𝑋1 and 𝑋2 are independent exponential random variables with respective
parameters 𝜆1 and 𝜆2 , obtain the following values:
a. P(X1/X2 < 1/2)
b. P(X1/(X1 + X2) < 1/2)
c. P((X1 − X2)/(X1 + X2) < 1/2)

48) Consider a circle of radius 1 centered at the origin, and let X1 and X2 denote the Cartesian coordinates of a point.
a. If a point is randomly selected from the inside of the circle, then obtain the value of P(X1² + X2² ≤ 1/4).
b. If a point is randomly selected from the inside of the circle, then obtain the value of P(|X1| + |X2| ≥ 1/2).
c. If 10 points are randomly selected from the inside of the circle, then obtain the probability that the distance of the nearest point from the center is at least equal to 1/2.

49) Two people A and B are supposed to meet in a particular location at 12:30 p.m.
If A arrives at a time uniformly distributed between 12:15 and 12:45 p.m. and B

arrives independently at a time uniformly distributed between 12:00 and 13:00
p.m., then:
a. Obtain the probability that the first person arriving at the location waits
no longer than 5 minutes.
b. Obtain the probability that person A arrives first.
50) Two people A and B agree to meet at a certain location. Each of them
independently arrives at a time uniformly distributed between 12 and 13 p.m.
a. If any person arriving first waits up to 10 minutes, what is the probability
that these two do not meet each other?
b. If person A waits 10 minutes at most and person B waits up to 20
minutes, what is the probability that they do not meet each other?
51) A square with the sides of length 2 centimeters is given. We select a point at
random from the square. What is the probability that the distance of the point
from any vertex of the square is greater than 1 centimeter (suppose that the
density of different points is the same)?
52) Two numbers X1 and X2 are uniformly distributed between zero and one. Suppose that the events A and B are defined as A = (X1 < 1/2) and B = (X1 > X2), respectively. Obtain the value of P(A|B).
53) Suppose that 𝑋1 and 𝑋2 have the following joint probability density function:
𝑓(𝑥1 , 𝑥2 ) = 1 ; 0 < 𝑥1 < 1 , 0 < 𝑥2 < 1, and zero elsewhere.
It is desired to calculate:
a. 𝑃(𝑋1 + 𝑋2 < 1.5)
b. 𝑃(𝑋1 < 2𝑋2 )
c. 𝑃(𝑋2 < 𝑋1 , 𝑋12 + 𝑋22 > 1)
d. P(X1 X2 < 1/4)
e. P(X1 + X2 < 1, X1 X2 < 2/9)

54) X1 is a normal random variable with mean zero and variance 25. X2, independently, follows a Bernoulli distribution with parameter p = 1/2. Obtain the value of P(X1 + X2 > 1).
55) 𝑛 batteries are being used for the parallel connections of an electronic
product, and the lifetime of each of them follows an exponential distribution

with mean 100 days. Suppose that the lifetimes of the batteries are
independent, and the product functions when at least a battery works. If so,
then obtain the minimum value of 𝑛 for the product such that the probability
of lasting at least 600 days becomes more than 0.9.
56) Suppose that 𝑋1 , … , 𝑋5 are independent and identically distributed exponential
random variables with rate 𝜆. Obtain the value of 𝑃(𝑋1 > 𝑋2 + ⋯ + 𝑋5 ).
57) Two groups of customers enter a store. Their arrival time follows a Poisson
process with rates 5 and 10 people per hour, respectively. What is the
probability that the fifth customer of group 1 arrives before the second
customer of group 2?
58) If 𝑋1 , 𝑋2, and 𝑋3 are independent and identically distributed exponential
random variables with parameter 𝜆, it is desired to calculate:
a. P((X1 + X2)/X3 < 1/2)
b. P(X1/(X2 + X3) < 1/2)

59) If we have 𝑓(𝑥1 , 𝑥2 ) = 𝑒 −𝑥1 ; 0 < 𝑥2 < 𝑥1 < ∞, and zero elsewhere, then it is
desired to obtain the distribution and expected value of the random variable:
a. 𝑍 = (𝑋2 |𝑋1 = 𝑥1 ).
b. 𝑊 = (𝑋1|𝑋2 = 𝑥2 ).
60) Suppose that the joint density function of 𝑋1 and 𝑋2 is given by
f_{X1,X2}(x1, x2) = (e^{−x1/x2} e^{−x2}) / x2 ; 0 < x1 < ∞, 0 < x2 < ∞, and zero elsewhere.
It is desired to calculate:
a. 𝐸(𝑋1 |𝑋2 = 𝑥2 )
b. 𝑉𝑎𝑟(𝑋1 |𝑋2 = 𝑥2 )
c. 𝑃(1 ≤ 𝑋1 ≤ 2|𝑋2 = 𝑥2 )
61) Suppose that 𝑋1 and 𝑋2 have the following joint density function:
𝑓𝑋1,𝑋2 (𝑥1 , 𝑥2 ) = 3; 0 ≤ 𝑥2 ≤ 𝑥12 ≤ 1, and zero elsewhere.
It is desired to calculate:
a. 𝐸(𝑋2 |𝑋1 = 𝑥1 )
b. 𝑉𝑎𝑟(𝑋2 |𝑋1 = 𝑥1 )
c. P(1/3 ≤ X2 ≤ 1/2 | X1 = x1) ; 1/2 < x1 < 1

62) If the joint density function of 𝑋1 and 𝑋2 is defined in the region 𝑥1 , 𝑥2 > 0 as
𝑓(𝑥1 , 𝑥2 ) = 8𝑒 −2(𝑥1+2𝑥2) , obtain the value of 𝐸(𝑋1 |𝑋2 = 𝑥2 ).
63) If 𝑓(𝑥1 , 𝑥2 ) = 𝑒 −(𝑥1+𝑥2) ; 𝑥1 > 0, 𝑥2 > 0, and zero elsewhere, and the set of 𝐴 is
defined as 𝐴 = {(𝑥1 , 𝑥2 ): |𝑥1 − 𝑥2 | ≤ 2}, then obtain the value of 𝑃(𝐴|𝑋1 = 1).
64) Consider two machines. Machine 1 is working and machine 2 will be set up to
work at a time 𝑡. If the lifetime of machine 𝑖 is exponential with parameter 𝜆𝑖 ,
what is the probability that machine 1 is the first machine that fails?
65) The joint density function of 𝑋1 and 𝑋2 is given by

𝑓 (𝑥1 , 𝑥2 ) = 𝑥2 𝑒 −𝑥2(𝑥1+1) ; 𝑥1 > 0, 𝑥2 > 0, and zero elsewhere.


It is desired to calculate:
a. 𝐸(𝑋1 |𝑋2 = 𝑥2 )
b. 𝑉𝑎𝑟(𝑋1 |𝑋2 = 𝑥2 )
c. 𝑃(1 ≤ 𝑋1 ≤ 2|𝑋2 = 𝑥2 )
66) The joint density function of 𝑋1 and 𝑋2 is given by
𝑒 −𝑥2
𝑓 (𝑥1 , 𝑥2 ) = ; 0 < 𝑥1 < 𝑥2 , 0 < 𝑥2 < ∞, and zero elsewhere.
𝑥2
Obtain the value of 𝐸(𝑋14 |𝑋2 = 𝑥2 ).
67) Suppose that 𝑋1 |𝑋2 ~𝑃(𝑋2 ) and 𝑋2 ~𝐸𝑥𝑝(1). It is desired to calculate:
a. The distribution of the random variable (𝑋2 |𝑋1 = 𝑥1 ).
b. 𝐸(𝑋2 |𝑋1 = 𝑥1 )
c. 𝐸(𝑒 −𝑋2 |𝑋1 = 1)
68) Urn A contains 3 marbles numbered 1 through 3. We select two marbles at
random and one at a time from the urn. Suppose that 𝑋1 denotes number of
the first marble selected from the urn and 𝑋2 denotes the maximum number
among marbles selected from the urn. If so, in each of the following conditions,
obtain the value of 𝑃(𝑋1 ≤ 2|𝑋2 = 3):
a. Sampling is done without replacement.
b. Sampling is done with replacement.
69) Suppose that the random variables 𝑋1 and 𝑋2 are uniformly distributed over
the following region:

[Figure: the quarter-disk region in the first quadrant bounded by x1² + x2² = 9, meeting each axis at 3.]

It is desired to calculate:
a. 𝐸(𝑋2 |𝑋1 = √5)
b. 𝑉𝑎𝑟(𝑋2 |𝑋1 = √5)
70) A point is randomly selected from the inside a semicircle of radius 1. The
semicircle lies in the first and second quadrants. If (𝑋1 , 𝑋2 ) denotes the
coordinates of the selected point, it is desired to calculate:
a. 𝐸(𝑋1 |𝑋2 = 𝑥2 )
b. 𝑉𝑎𝑟(𝑋1 |𝑋2 = 𝑥2 )
71) A point is randomly selected from the inside of a triangle with vertices (0,0),
(0,1), and (1,0). If 𝑋1 denotes the length of the point and 𝑋2 denotes the width
of the point, obtain the value of 𝐸(𝑋2 |𝑋1 = 𝑥1 ).
72) The joint probability function of the random variables 𝑋1 and 𝑋2 is given by
(𝑛 ∈ 𝑁)
2
; 𝑥1 = 1, . . . , 𝑛, 𝑥2 = 1, . . . , 𝑥1
𝑃𝑋1,𝑋2 (𝑥1 , 𝑥2 ) = {𝑛(𝑛 + 1)
0; 𝑜𝑡ℎ𝑒𝑟𝑤𝑖𝑠𝑒
It is desired to calculate:
a. 𝐸(𝑋2 |𝑋1 = 𝑥1 )
b. 𝐸(𝑋1 |𝑋2 = 𝑥2 )
73) Suppose that 𝑋1 and 𝑋2 have the following joint distribution:
X2 \ X1 |  0    1
   0    | 1/8  3/8
   1    | 2/6  1/6

Obtain the value of 𝐸(𝑋1 |𝑋2 = 0).
74) The joint probability function of the two random variables X1 and X2 is given by
P(x1, x2) = C(x1, x2) (1/2)^{x1} (x1/15) ; x1 = 0,1,2,3,4,5 , x2 = 0,1,...,x1
Obtain the value of 𝐸(𝑋2 |𝑋1 = 𝑥1 ).
75) Suppose that 𝑋1 and 𝑋2 are random variables having the following joint
probability distribution:
(𝑥1 + 1)(𝑥2 + 2)
𝑃𝑋1,𝑋2 (𝑥1 , 𝑥2 ) = ; 𝑥1 = 0,1,2 , 𝑥2 = 0,1,2
54
Obtain the value of 𝐸(𝑋2 |𝑋1 = 𝑥1 ).
76) Suppose that the joint probability density function of the random variables 𝑋1
and 𝑋2 is given by
𝑓(𝑥1 , 𝑥2 ) = 6𝑥1 𝑥2 (2 − 𝑥1 − 𝑥2 ); 0 < 𝑥1 < 1, 0 < 𝑥2 < 1, and zero elsewhere.
Obtain the value of 𝐸(𝑋1 |𝑋2 = 0.5).
77) Suppose that the joint probability density function of the random variables 𝑋1
and 𝑋2 is given by
𝑓𝑋1,𝑋2 (𝑥1 , 𝑥2 ) = 4𝑥2 (𝑥1 − 𝑥2 )𝑒 −(𝑥1+𝑥2) ; 0 < 𝑥2 < 𝑥1 , and zero elsewhere.
Obtain the value of 𝑃(𝑋1 < 3|𝑋2 = 1).
78) If 𝑋2 is a random point in the interval (0,1) and 𝑋1 is a random point in the
interval (0, 𝑋2 ), then it is desired to calculate:
a. 𝑓𝑋1,𝑋2 (𝑥1 , 𝑥2 ) Or the joint probability density function of 𝑋1 and 𝑋2.
b. The value of 𝑃(𝑋1 + 𝑋2 < 1).
79) Suppose that 𝑋1 ~𝑈[0,1] and (𝑋2 |𝑋1 = 𝑥1 )~𝑈[𝑥1 , 𝑥1 + 1]. It is desired to calculate:
a. 𝑓𝑋1,𝑋2 (𝑥1 , 𝑥2 ) Or the joint probability density function of 𝑋1 and 𝑋2
b. The value of 𝑃(𝑋1 + 𝑋2 < 1).
80) 𝑋1 is randomly selected from the set of {1,2,…, 𝑛} followed by randomly
selecting of 𝑋2 from the set of {1,…, 𝑋1 }. Obtain the conditional probability
function of (𝑋1 |𝑋2 = 𝑘).
81) Suppose that 𝑋2 follows a Poisson distribution with parameter 𝜆 and
(𝑋1|𝑋2 = 𝑥2 ) is a binomial distribution with parameters (𝑛 = 𝑥2 , 𝑝).
a. Obtain the distribution of ((x2 − X1)|X2 = x2).
b. Obtain the distribution of X1.
82) Suppose that 𝑋2 follows gamma distribution with parameters (𝛼, 𝜆) and
𝑋1 |𝑋2 = 𝑥2 is a Poisson distribution with parameter 𝑥2 . Obtain the distribution
and expected value of 𝑋2 |𝑋1 = 𝑖.

We discussed the distribution of a function of a random variable at the end of Chapter 4. One of the important topics widely used in probability theory is the distribution of a function of multiple random variables. As it plays a significant role in the applications of probability theory in statistics, this subject is of great importance. It is addressed in this chapter, and some of its applications in probability theory and statistics will be explained.

Suppose that Y = g(X1, X2, ..., Xn) is a function of the random variables X1, X2, ..., Xn. If the range of the random variable Y is discrete, then its distribution is described by its probability function, and if its range is continuous, then the distribution of Y is described by its density function.

In Section 4.8 of Chapter 4, we showed that if Y is a function of a random variable X as Y = g(X), and the range of this function contains discrete values, we should obtain the probability function of Y to determine its distribution. Generally, suppose that Y = g(X1, X2, ..., Xn) is a function of multiple random variables X1, X2, ..., Xn, and the range of the random variable Y is discrete. To determine the distribution of Y, we should obtain the probability function of Y.

Example 2.1

Suppose that 𝑋1 denotes the number of female customers of a store per day
following a Poisson process with rate 4, and 𝑋2 denotes the number of male
customers of a store per day following a Poisson process with rate 6. Assuming the
independence of random variables 𝑋1 and 𝑋2, obtain the distribution of 𝑌 = 𝑋1 + 𝑋2.
Solution.

P(Y = k) = P(X1 + X2 = k) = Σ_{i=0}^{k} P(X1 = i, X2 = k − i) = Σ_{i=0}^{k} P(X1 = i) P(X2 = k − i)

= Σ_{i=0}^{k} (e^{−4} 4^i / i!) × (e^{−6} 6^{k−i} / (k − i)!) = e^{−10} Σ_{i=0}^{k} 4^i 6^{k−i} / (i! (k − i)!)

= (e^{−10} / k!) Σ_{i=0}^{k} C(k, i) 4^i 6^{k−i} = (e^{−10} / k!) (4 + 6)^k = e^{−10} 10^k / k! ; k = 0, 1, 2, ...
As a result, 𝑌 = 𝑋1 + 𝑋2 follows a Poisson distribution with parameter 10.
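The convolution identity derived above is easy to confirm numerically:

from math import exp, factorial

def pois(lam, k):
    # Poisson pmf: e^{-lam} lam^k / k!
    return exp(-lam) * lam**k / factorial(k)

for k in range(6):
    conv = sum(pois(4, i) * pois(6, k - i) for i in range(k + 1))
    print(k, round(conv, 6), round(pois(10, k), 6))   # the columns match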

Example 2.2

Suppose that we select 3 balls at random from an urn containing 3 red, 4 white,
and 5 blue balls. If 𝑋1 and 𝑋2 denote the number of red and white balls selected from
the urn, then obtain the distribution of 𝑌 = 𝑋1 + 𝑋2.
Solution. Y = X1 + X2 can take on the values 0, 1, 2, and 3. Hence, we have:

P(Y = a) = P(X1 + X2 = a) = Σ_{x1+x2=a} C(3, x1) C(4, x2) C(5, 3 − x1 − x2) / C(12, 3)

= Σ_{x1=0}^{a} C(3, x1) C(4, a − x1) C(5, 3 − a) / C(12, 3) = (C(5, 3 − a) / C(12, 3)) Σ_{x1=0}^{a} C(3, x1) C(4, a − x1) = C(7, a) C(5, 3 − a) / C(12, 3)

As a result, the distribution of Y = X1 + X2 is hypergeometric with parameters (N = 12, m = 7, n = 3).

Suppose that Y = g(X1, X2, ..., Xn) is a function of multiple random variables X1, X2, ..., Xn and the range of the random variable Y is continuous. To determine the distribution of Y, we should obtain the cumulative distribution function of Y and then differentiate it.

Example 2.3

The joint density function of X1 and X2 is given by:

f_{X1,X2}(x1, x2) = { e^{−(x1+x2)} , x1 > 0 , x2 > 0 ; 0 , otherwise }

Obtain the density function of the following random variables:

a. Y = X1/X2
b. Z = X1/(X1 + X2)

Solution.
a. Since X1 and X2 take on nonnegative values, the random variable Y = X1/X2 takes on nonnegative values as well.

F_Y(a) = P(Y ≤ a) = P(X1/X2 ≤ a) = P(X1 ≤ aX2)
= ∫_0^∞ ∫_0^{a x2} e^{−(x1+x2)} dx1 dx2 = ∫_0^∞ (1 − e^{−a x2}) e^{−x2} dx2
= ∫_0^∞ e^{−x2} dx2 − ∫_0^∞ e^{−(a+1) x2} dx2 = 1 − 1/(1 + a)

⇒ f_Y(a) = dF_Y(a)/da = 1/(a + 1)² ; a > 0

[Figure: the region x1 ≤ a x2, i.e., the part of the first quadrant above the line x2 = x1/a.]

However, a closer look at the joint density function of X1 and X2 helps us realize that X1 and X2 are independent exponential random variables with identical parameter 1. Therefore, using the result of Example 4.6 of Chapter 8, this problem can be solved by the following method:

F_Y(a) = P(Y ≤ a) = P(X1 ≤ aX2) = λ1 / (λ1 + λ2/a) = 1 / (1 + 1/a) = a/(1 + a) = 1 − 1/(1 + a)

⇒ f_Y(a) = dF_Y(a)/da = 1/(a + 1)² ; a > 0

b. Since X1 and X2 take on nonnegative values, the random variable Z = X1/(X1 + X2) takes on values in the interval (0,1).

F_Z(a) = P(X1/(X1 + X2) < a) = P(X1 < a(X1 + X2)) = P((1 − a)X1 < aX2)
= ∫_0^∞ ∫_0^{a x2/(1−a)} e^{−(x1+x2)} dx1 dx2 = ∫_0^∞ (1 − e^{−(a/(1−a)) x2}) e^{−x2} dx2
= 1 − ∫_0^∞ e^{−x2/(1−a)} dx2 = 1 − (1 − a) = a

⇒ f_Z(a) = dF_Z(a)/da = 1 ; 0 < a < 1

which is the density function of the uniform random variable in the interval (0,1).

Considering that X1 and X2 are independent exponential random variables with identical parameter 1, we can also solve the problem using the following method:

F_Z(a) = P(X1 < a(X1 + X2)) = P((1 − a)X1 < aX2) = (1/(1 − a)) / (1/(1 − a) + 1/a) = a ⇒ f_Z(a) = 1 ; 0 < a < 1

There are other methods to obtain the distribution of a function of multiple


random variables for the continuous case. For instance, note the following theorem:

Theorem 2.1
Consider the continuous random variables 𝑋1 and 𝑋2 with the joint density function
𝑓𝑋1,𝑋2 (𝑥1 , 𝑥2 ). Suppose that 𝑌1 = 𝑔1 (𝑋1 , 𝑋2 ) and 𝑌2 = 𝑔2 (𝑋1 , 𝑋2 ) are functions in terms of
𝑋1 and 𝑋2 and satisfy the following conditions:
1. The equations of 𝑌1 = 𝑔1 (𝑋1 , 𝑋2 ) and 𝑌2 = 𝑔2 (𝑋1 , 𝑋2 ) can be solved
exclusively for 𝑋1 and 𝑋2 in terms of 𝑌1 and 𝑌2 with solutions 𝑋1 =
ℎ1 (𝑌1 , 𝑌2 ) and 𝑋2 = ℎ2 (𝑌1 , 𝑌2 ).
2. The functions 𝑔1 and 𝑔2 have continuous partial derivatives in all the
points of (𝑥1 , 𝑥2 ), and the following determinant is nonzero for the
points of (𝑥1 , 𝑥2 ).

J(x1, x2) = | ∂g1/∂x1  ∂g1/∂x2 |
            | ∂g2/∂x1  ∂g2/∂x2 |
In such conditions, the joint density function of the random variables 𝑌1 and 𝑌2 is
determined as follows:

f_{Y1,Y2}(y1, y2) = f_{X1,X2}(x1, x2) |J(x1, x2)|^{−1}        (2-1)

where x1 and x2 are substituted by their values in terms of y1 and y2.

To prove this theorem, first, the joint cumulative distribution function of the
random variables 𝑌1 and 𝑌2 is written as 𝐹𝑌1 ,𝑌2 (𝑦1 , 𝑦2 ) = 𝑃(𝑌1 ≤ 𝑦1 , 𝑌2 ≤ 𝑦2 ). Then,
differentiating the cumulative distribution function with respect to 𝑦1 and 𝑦2 yields
the joint density function of random variables 𝑌1 and 𝑌2 . We do not address this part
further since it requires advanced mathematics.
Example 2.4

Suppose that 𝑋1 and 𝑋2 are independent exponential random variables with


respective parameters 𝜆1 and 𝜆2 . If 𝑌1 = 𝑋1 + 𝑋2 and 𝑌2 = 𝑋1 − 𝑋2, then obtain the joint
density function of 𝑌1 and 𝑌2 .
Solution.
Y1 = X1 + X2 , Y2 = X1 − X2

J = | 1   1 |
    | 1  −1 | = −2 ;  X1 = (Y1 + Y2)/2 , X2 = (Y1 − Y2)/2

⇒ f_{Y1,Y2}(y1, y2) = λ1 e^{−λ1 x1} λ2 e^{−λ2 x2} |J|^{−1} ; x1 ≥ 0, x2 ≥ 0

where x1 and x2 are substituted by their values in terms of y1 and y2:

⇒ f_{Y1,Y2}(y1, y2) = (1/2) λ1 λ2 e^{−λ1 (y1+y2)/2} e^{−λ2 (y1−y2)/2} ; (y1 + y2)/2 > 0 , (y1 − y2)/2 > 0

Example 2.5

If 𝑋 and 𝑌 have the following joint density function, obtain the density function
of 𝑈 = 𝑋𝑌 for values 𝑢 > 1.
f(x, y) = 1/(x² y²) ; x ≥ 1 , y ≥ 1
Solution. The first solution is that we obtain the cumulative distribution function of
𝑈 = 𝑋𝑌 and then differentiate it. The second solution is to use the preceding
theorem. For this purpose, we define the random variable 𝑉 = 𝑌, and obtain the joint
density function of 𝑈 = 𝑋𝑌 and 𝑉 = 𝑌. Finally, we determine the marginal density
function of 𝑈 in the bivariate distribution as follows:

U = XY , V = Y

J(x, y) = | y  x |
          | 0  1 | = y ⇒ |J|^{−1} = 1/y = 1/v

f(x, y) = 1/(x² y²) = 1/u² (since x² y² = (xy)² = u²) ⇒ f(u, v) = (1/u²) × (1/v) = 1/(v u²) ; v > 1 , u/v > 1

v > 1 , u/v > 1 ⇒ 1 < v < u

f_U(u) = ∫_1^u 1/(v u²) dv = ln(u)/u² ; u > 1
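As a hedged numerical check of this result: since the marginal CDF of X (and of Y) is F(x) = 1 − 1/x on [1, ∞), samples can be generated by inversion, and P(XY ≤ u0) can be compared with the integral of ln(u)/u², which works out to 1 − (1 + ln u0)/u0:

import numpy as np

rng = np.random.default_rng(8)
N = 2_000_000
x = 1 / (1 - rng.uniform(0, 1, N))   # inverse-CDF sampling of f(x) = 1/x^2
y = 1 / (1 - rng.uniform(0, 1, N))
u0 = 3.0
print(np.mean(x * y <= u0))              # Monte Carlo estimate
print(1 - (1 + np.log(u0)) / u0)         # exact, about 0.3005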

As mentioned in the preceding section, we are inclined to obtain the distribution of a function of multiple random variables in many cases. One of the most important functions defined on multiple random variables is the sum of independent random variables. This function has many applications in statistics and probability theory. Owing to its great importance, we address it exclusively in this section. As in the preceding section, if Y = X1 + X2 is a discrete random variable, then the distribution of Y is described by its probability function, and if Y = X1 + X2 is a continuous random variable, then the distribution of Y is described by its density function.

Example 3.1

If 𝑋1 and 𝑋2 are independent binomial random variables with respective


parameters (𝑛1 , 𝑝) and (𝑛2 , 𝑝), obtain the distribution of 𝑌 = 𝑋1 + 𝑋2.
Solution.
P(Y = k) = P(X1 + X2 = k) = Σ_{i=0}^{k} P(X1 = i, X2 = k − i) = Σ_{i=0}^{k} P(X1 = i) P(X2 = k − i)

= Σ_{i=0}^{k} C(n1, i) p^i q^{n1−i} C(n2, k − i) p^{k−i} q^{n2−(k−i)}

⇒ P(Y = k) = p^k q^{(n1+n2)−k} Σ_{i=0}^{k} C(n1, i) C(n2, k − i) = C(n1 + n2, k) p^k q^{(n1+n2)−k} ; k = 0, 1, 2, ..., n1 + n2

Therefore, if 𝑋1 and 𝑋2 are independent binomial random variables with


respective parameters (𝑛1 , 𝑝) and (𝑛2 , 𝑝), the distribution of 𝑌 = 𝑋1 + 𝑋2 is binomial
with parameters (𝑛1 + 𝑛2 , 𝑝).
If 𝑋1 , 𝑋2, and 𝑋3 are independent binomial random variables with respective
parameters (𝑛1 , 𝑝), (𝑛2 , 𝑝), and (𝑛3 , 𝑝), the distribution of 𝑌 = 𝑋1 + 𝑋2 + 𝑋3 can be
written as the sum of the random variables 𝑋1 + 𝑋2 and 𝑋3. The first one is binomial
with parameters (𝑛1 + 𝑛2 , 𝑝), the second one is binomial with parameters (𝑛3 , 𝑝), and
the sum of these two random variables follows a binomial distribution with
parameters (𝑛1 + 𝑛2 + 𝑛3 , 𝑝). In general, if 𝑋1 , 𝑋2 , . . . , 𝑋𝑟 are independent binomial
random variables with respective parameters (𝑛1 , 𝑝), (𝑛2 , 𝑝), … , (𝑛𝑟 , 𝑝), then the
random variable 𝑌 = ∑𝑟𝑖=1 𝑋𝑖 follows a binomial distribution with parameters
(∑𝑟𝑖=1 𝑛𝑖 , 𝑝).

Example 3.2

If 𝑋1 and 𝑋2 are independent Poisson random variables with respective


parameters 𝜆1 and 𝜆2 , obtain the distribution of 𝑌 = 𝑋1 + 𝑋2.
Solution.
P(Y = k) = Σ_{i=0}^{k} P(X1 = i, X2 = k − i) = Σ_{i=0}^{k} P(X1 = i) P(X2 = k − i)

= Σ_{i=0}^{k} (e^{−λ1} λ1^i / i!) × (e^{−λ2} λ2^{k−i} / (k − i)!) = e^{−(λ1+λ2)} Σ_{i=0}^{k} λ1^i λ2^{k−i} / (i! (k − i)!)

= (e^{−(λ1+λ2)} / k!) Σ_{i=0}^{k} C(k, i) λ1^i λ2^{k−i} = e^{−(λ1+λ2)} (λ1 + λ2)^k / k! ; k = 0, 1, 2, ...

As a result, 𝑌 = 𝑋1 + 𝑋2 follows a Poisson distribution with parameter (𝜆1 + 𝜆2 ).
In the same manner as the preceding example, it can be shown that if
𝑋1 , 𝑋2 , . . . , 𝑋𝑛 are independent Poisson random variables with parameters 𝜆𝑖 , then the
random variable 𝑌 = ∑𝑛𝑖=1 𝑋𝑖 follows a Poisson distribution with parameter ∑𝑛𝑖=1 𝜆𝑖 .
If 𝑌 = 𝑋1 + 𝑋2 is continuous, to obtain its density function, we can obtain its
cumulative distribution function and then differentiate it.

Example 3.3

If 𝑋1 and 𝑋2 are independent exponential random variables each with


parameter 𝜆, then obtain the distribution of 𝑌 = 𝑋1 + 𝑋2.

Solution.
f_{X1,X2}(x1, x2) = f_{X1}(x1) f_{X2}(x2) = λe^{−λx1} λe^{−λx2}

F_Y(a) = P(Y ≤ a) = P(X1 + X2 ≤ a) = ∫_0^a ∫_0^{a−x1} λe^{−λx1} λe^{−λx2} dx2 dx1
= ∫_0^a λe^{−λx1} (1 − e^{−λ(a−x1)}) dx1 = ∫_0^a λe^{−λx1} dx1 − ∫_0^a λe^{−λa} dx1 = 1 − e^{−λa} − λa e^{−λa}

⇒ f_Y(a) = dF_Y(a)/da = λe^{−λa} − [λe^{−λa} − λ²a e^{−λa}] = λ²a e^{−λa} = λe^{−λa} (λa)^1 / 1! ; a > 0

[Figure: the triangular region below the line x1 + x2 = a in the first quadrant.]

Therefore, Y = X1 + X2 follows an Erlang (Gamma) distribution with parameters λ and α = 2.
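The derived cumulative distribution function can be checked by simulation; λ and a below are arbitrary illustrative values:

import numpy as np

lam, a = 1.5, 2.0
rng = np.random.default_rng(9)
y = rng.exponential(1 / lam, (1_000_000, 2)).sum(axis=1)   # X1 + X2
print(np.mean(y <= a))                                      # empirical CDF at a
print(1 - np.exp(-lam * a) - lam * a * np.exp(-lam * a))    # derived F_Y(a)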

Example 3.4

If 𝑋1 and 𝑋2 are independent and identically distributed uniform random


variables in the interval (0,1), then obtain the distribution of 𝑌 = 𝑋1 + 𝑋2.
Solution.
There is a significant difference between this example and Example 3.3. In Example 3.3, the main space of the problem was x1 > 0, x2 > 0. However, in this example, the main space is 0 < x1 < 1, 0 < x2 < 1. Such a difference makes the region below the line x1 + x2 = a triangular for 0 ≤ a ≤ 1 and non-triangular for 1 < a < 2 ("a" can take on values between 0 and 2).

f_{X1,X2}(x1, x2) = f_{X1}(x1) f_{X2}(x2) = 1 × 1 = 1

F_Y(a) = P(Y ≤ a) = P(X1 + X2 ≤ a) ⇒ F_Y(a) = { ∫_0^a ∫_0^{a−x2} 1 dx1 dx2 , 0 ≤ a ≤ 1 ; 1 − ∫_{a−1}^{1} ∫_{a−x2}^{1} 1 dx1 dx2 , 1 ≤ a ≤ 2 }

⇒ F_Y(a) = { a²/2 , 0 ≤ a ≤ 1 ; 2a − 1 − a²/2 , 1 ≤ a ≤ 2 }

⇒ f_Y(a) = dF_Y(a)/da = { a , 0 ≤ a ≤ 1 ; 2 − a , 1 ≤ a ≤ 2 ; 0 , otherwise }
However, since the joint density function is constant and equal to 1, to calculate the probability of a given region like P(X1 + X2 ≤ a), we can use the ratio of areas. Hence, this problem can be solved without taking integrals as follows:

F_Y(a) = P(Y ≤ a) = P(X1 + X2 ≤ a) ⇒ F_Y(a) = { a²/2 , 0 ≤ a ≤ 1 ; 1 − (2 − a)²/2 , 1 ≤ a ≤ 2 }

⇒ f_Y(a) = dF_Y(a)/da = { a , 0 ≤ a ≤ 1 ; 2 − a , 1 ≤ a ≤ 2 ; 0 , otherwise }

[Figure: the triangular density f_Y(a), rising from 0 at a = 0 to 1 at a = 1 and falling back to 0 at a = 2.]

It is seen that although the density functions of X1 and X2 are uniform and flat, the density function of Y = X1 + X2 is triangular. We will see later that, as the number of independent and identically distributed random variables being summed increases, the density function of their sum resembles the density function of the normal distribution more and more.
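This remark can be illustrated numerically (a rough sketch, assuming numpy): the sum of two uniforms has the triangular shape found above, while the sum of, say, twelve uniforms already has mean n/2, variance n/12, and nearly zero skewness, as a normal density would:

import numpy as np

rng = np.random.default_rng(10)
for n in (2, 12):
    s = rng.uniform(0, 1, (500_000, n)).sum(axis=1)
    skew = ((s - s.mean())**3).mean() / s.std()**3
    print(n, s.mean(), s.var(), skew)   # mean ~ n/2, variance ~ n/12, skew ~ 0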

After solving the above examples, it should be noted that, in many cases, the distribution of the sum of multiple random variables is well-known. Some of these results were mentioned and proved before, and the others are expressed in the following notes:

Note 3.1
If X1, X2, ..., Xn are independent Bernoulli random variables with identical parameter p, then the random variable Y = Σ_{i=1}^{n} Xi follows a binomial distribution with parameters (n, p).

Note 3.2
If X1, X2, ..., Xr are independent binomial random variables with respective parameters (n1, p), (n2, p), ..., (nr, p), then the random variable Y = Σ_{i=1}^{r} Xi follows a binomial distribution with parameters (Σ_{i=1}^{r} ni, p).

Note 3.3
If X1, X2, ..., Xr are independent geometric random variables with identical parameter p, then the random variable Y = Σ_{i=1}^{r} Xi follows a negative binomial distribution with parameters (r, p).

Note 3.4
If X1, X2, ..., Xk are independent negative binomial random variables with respective parameters (r1, p), (r2, p), ..., (rk, p), then the random variable Y = Σ_{i=1}^{k} Xi follows a negative binomial distribution with parameters (Σ_{i=1}^{k} ri, p).

Note that the parameter 𝑝 in all of the 𝑋𝑖 's is the same for all the four notes
mentioned above.

Note 3.5
If X1, X2, ..., Xn are independent Poisson random variables with respective parameters λ1, λ2, ..., λn, then the random variable Y = Σ_{i=1}^{n} Xi follows a Poisson distribution with parameter Σ_{i=1}^{n} λi.

Note 3.6
If X1, X2, ..., Xn are independent exponential random variables with identical parameter λ, then the random variable Y = Σ_{i=1}^{n} Xi follows a Gamma (Erlang) distribution with parameters (α = n, λ).

Note 3.7
If 𝑋1 , 𝑋2 , . . . , 𝑋𝑛 are independent gamma random variables with respective
parameters (𝛼1 , 𝜆), . . . , (𝛼𝑛 , 𝜆), then the random variable 𝑌 = ∑𝑛𝑖=1 𝑋𝑖 follows a Gamma
(Erlang) distribution with parameters (∑𝑛𝑖=1 𝛼𝑖 , 𝜆).

Note 3.8
If 𝑋1 , 𝑋2 , . . . , 𝑋𝑛 are independent normal random variables with respective parameters (𝜇1 , 𝜎1²), . . . , (𝜇𝑛 , 𝜎𝑛²), then the random variable 𝑌 = 𝑋1 + ⋯ + 𝑋𝑛 follows a normal distribution with parameters (𝜇1 + ⋯ + 𝜇𝑛 , 𝜎1² + ⋯ + 𝜎𝑛²). More generally, the random variable 𝑍 = 𝑎1𝑋1 + ⋯ + 𝑎𝑛𝑋𝑛 follows a normal distribution with parameters $\left(\sum_{i=1}^{n} a_i\mu_i,\; \sum_{i=1}^{n} a_i^2\sigma_i^2\right)$.
For example, if 𝑋1 and 𝑋2 are independent normal random variables with
respective parameters (𝜇1 , 𝜎12 ) and (𝜇2 , 𝜎22 ), then the random variable 𝑌 = 𝑋1 − 𝑋2
follows a normal distribution with parameters (𝜇1 − 𝜇2 , 𝜎12 + 𝜎22 ). To prove this, it
suffices to write 𝑌 as 𝑌 = 𝑋1 + (−𝑋2 ).
The reader should note that Note 3.1 is a special case of Note 3.2 since the
Bernoulli random variable is a special case of the binomial random variable. Note 3.3
is also a special case of Note 3.4 because the geometric random variable is a special
case of the negative binomial distribution. Finally, Note 3.6 is a special case of 3.7
since the exponential random variable is a special case of the gamma random
variable.
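As an illustration of these closure properties, the following hedged sketch (assuming NumPy; the three rates and the seed are arbitrary choices of ours) checks Note 3.5 numerically by comparing a simulated 𝑃(𝑌 = 3) with the Poisson probability mass function:

```python
import numpy as np
from math import exp, factorial

rng = np.random.default_rng(2)
lams = [1.0, 2.5, 0.5]                     # arbitrary rates chosen for the check
y = sum(rng.poisson(lam, size=1_000_000) for lam in lams)

# By Note 3.5, Y should be Poisson with rate sum(lams); compare P(Y = 3).
tot = sum(lams)
print((y == 3).mean(), exp(-tot) * tot**3 / factorial(3))
```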
Example 3.5
A manufacturer produces mechanical parts. Each part is defective independently with probability 0.1. If the manufacturer produces 10 parts on one day and 8 parts on another day, obtain the probability that there are 3 defective parts among all the parts produced.
Solution. The number of defective parts in the first day follows a binomial
distribution with parameters (𝑛1 = 10, 𝑝 = 0.1), and the number of defective parts in
the second day follows a binomial distribution with parameters (𝑛2 = 8, 𝑝 = 0.1).
Therefore, the total number of defective parts over the two days follows a binomial distribution with parameters (𝑛1 + 𝑛2 = 18, 𝑝 = 0.1), and the probability of exactly 3 defective parts is equal to:
$$\binom{18}{3}(0.1)^3 (0.9)^{15} = 0.168$$
Example 3.6

If the number of male and female customers entering a store per hour follow
Poisson distributions with respective parameters 3 and 5, obtain the probability that
5 individuals enter the store in one hour.

Solution. The distribution of the number of people entering the store per hour (the
total number of male and female customers) is Poisson with parameter (5 + 3). As a
result, the probability that 5 individuals enter the store in one hour is equal to:

$$\frac{e^{-8}\,8^5}{5!}$$

Example 3.7
The lifetimes of two batteries produced by factories 𝐴 and 𝐵 independently follow the distributions 𝑁(10, 9) and 𝑁(12, 7), respectively. If one battery is randomly selected from each producer, obtain the probability that the battery from factory 𝐴 lasts at least 2 time units longer than that from factory 𝐵.
Solution. Suppose that 𝑋𝐴 denotes the lifetime of the battery produced by factory 𝐴, and 𝑋𝐵 denotes the lifetime of the battery produced by factory 𝐵. Therefore, we have:

$$X_A - X_B \sim N(\mu_A - \mu_B = 10 - 12,\; \sigma_A^2 + \sigma_B^2 = 9 + 7)$$

$$\Rightarrow P(X_A > X_B + 2) = P(X_A - X_B > 2) = P\!\left(Z > \frac{2 - (-2)}{4}\right) = P(Z > 1) = 0.1587$$
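A minimal Monte Carlo sketch of this example (assuming NumPy; the sample size and seed are our choices) reproduces the probability 0.1587:

```python
import numpy as np

rng = np.random.default_rng(3)
xa = rng.normal(10, 3, size=1_000_000)            # N(10, 9): standard deviation 3
xb = rng.normal(12, np.sqrt(7), size=1_000_000)   # N(12, 7)
print((xa > xb + 2).mean())                       # ≈ P(Z > 1) ≈ 0.1587
```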
Theorem 3.1
If 𝑋1 , 𝑋2 , . . . , 𝑋𝑛 are independent random variables and the random variable 𝑌 is
defined as 𝑌 = ∑𝑛𝑖=1 𝑋𝑖 , then the moment generating function of 𝑌 is equal to the
product of moment generating functions of the random variables 𝑋1 , 𝑋2 , . . . , 𝑋𝑛 . That
is, we have:
$$M_Y(t) = \prod_{i=1}^{n} M_{X_i}(t)$$
Proof. 𝑀𝑌 (𝑡) = 𝐸(𝑒 𝑡𝑌 ) = 𝐸(𝑒 𝑡(𝑋1 +⋯+𝑋𝑛) ) = 𝐸(𝑒 𝑡𝑋1 𝑒 𝑡𝑋2 … 𝑒 𝑡𝑋𝑛 )

Due to the independence of the 𝑋𝑖 's, we have:

𝐸(𝑒 𝑡𝑋1 𝑒 𝑡𝑋2 … 𝑒 𝑡𝑋𝑛 ) = 𝐸(𝑒 𝑡𝑋1 ) 𝐸(𝑒 𝑡𝑋2 ) … 𝐸(𝑒 𝑡𝑋𝑛 ) = 𝑀𝑋1 (𝑡) 𝑀𝑋2 (𝑡) … 𝑀𝑋𝑛 (𝑡)
Thus, the theorem is proven.

Using Theorem 3.1, we can easily obtain the moment generating functions of the binomial, negative binomial, and Erlang random variables, because each of these variables can be written as a sum of independent random variables. For instance, in addition to the method mentioned in Section 6.3.1 of Chapter 6, the moment generating function of the binomial random variable, which is the sum of 𝑛 independent Bernoulli random variables, can be obtained as follows:

$$Y \sim B(n, p) \Rightarrow Y = \sum_{i=1}^{n} X_i$$

in which 𝑋1 , 𝑋2 , . . . , 𝑋𝑛 are independent Bernoulli random variables with identical parameter 𝑝, each having moment generating function (𝑝𝑒^𝑡 + 1 − 𝑝). Hence, we have:

$$M_Y(t) = \prod_{i=1}^{n} M_{X_i}(t) = \prod_{i=1}^{n}\left(p e^t + 1 - p\right) = \left(p e^t + 1 - p\right)^n$$
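The product rule can also be verified symbolically. The following sketch (assuming SymPy is available; the fixed value 𝑛 = 5 is our choice) compares the product of 𝑛 Bernoulli moment generating functions with the moment generating function computed directly from the binomial probability function:

```python
import sympy as sp

t, p = sp.symbols('t p')
n = 5                                          # a small fixed n keeps the sum explicit

m_bernoulli = p * sp.exp(t) + 1 - p            # MGF of one Bernoulli(p) trial
m_product = m_bernoulli**n                     # Theorem 3.1: product of the n MGFs
m_direct = sum(sp.binomial(n, k) * p**k * (1 - p)**(n - k) * sp.exp(t * k)
               for k in range(n + 1))          # E[e^{tY}] computed from the B(n,p) pmf
print(sp.expand(m_product - m_direct))         # prints 0: the two expressions agree
```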
Likewise, the moment generating function of the negative binomial random variable, which is the sum of independent geometric random variables, can be obtained. Similarly, the moment generating function of the Erlang distribution, which is the sum of independent exponential random variables, can be obtained.
Moreover, as mentioned in the previous chapters, the moment generating function of any random variable is unique, and the type of a random variable can be determined from its moment generating function. Considering this fact and using Theorem 3.1, the proof of Notes 3.1 through 3.8 is straightforward. For example, to prove Note 3.8, we have:

$$X_i \sim N(\mu_i, \sigma_i^2) \Rightarrow M_{X_i}(t) = e^{\mu_i t + \frac{\sigma_i^2 t^2}{2}}$$

$$Y = \sum_{i=1}^{n} X_i \Rightarrow M_Y(t) = \prod_{i=1}^{n} M_{X_i}(t) = \prod_{i=1}^{n} e^{\mu_i t + \frac{\sigma_i^2 t^2}{2}} = e^{\,t\sum_{i=1}^{n}\mu_i + \frac{t^2\sum_{i=1}^{n}\sigma_i^2}{2}}$$

Hence, since the moment generating function of any random variable, such
as the normal one, is unique and the moment generating function of 𝑌 = ∑𝑛𝑖=1 𝑋𝑖 is
the same as that of a normal distribution with parameters (∑𝑛𝑖=1 𝜇𝑖 , ∑𝑛𝑖=1 𝜎𝑖2 ), we can
prove that the random variable 𝑌 follows the normal distribution as well.

The central limit theorem is one of the most significant and applicable results of probability theory. It concerns the sum of a large number of independent and identically distributed random variables: it states that such a sum approximately follows a normal distribution.

Theorem 4.1
If 𝑋1 , 𝑋2 , … is a sequence of independent and identically distributed random variables with mean 𝜇 and variance 𝜎², then the distribution of

$$\frac{X_1 + \cdots + X_n - n\mu}{\sigma\sqrt{n}}$$

tends to the standard normal distribution as 𝑛 → ∞.

Proof. We will show in the next chapter that if 𝑋1 , 𝑋2 , … is a sequence of independent and identically distributed random variables with mean 𝜇 and variance 𝜎², then the random variable 𝑋1 + ⋯ + 𝑋𝑛 has mean 𝑛𝜇 and variance 𝑛𝜎², and hence standard deviation 𝜎√𝑛. The central limit theorem intuitively states that as 𝑛 increases, the distribution of 𝑋1 + ⋯ + 𝑋𝑛 tends to the normal distribution.
To prove the central limit theorem, first suppose that the 𝑋𝑖 's are independent random variables with 𝜇 = 0 and 𝜎² = 1. According to the explanations of Proposition 9.7 of Chapter 5, the moment generating function of the random variable $X_i/\sqrt{n}$ is equal to $M_{X_i}(t/\sqrt{n})$. Therefore, based on Theorem 3.1 of this chapter, the moment generating function of $\sum_{i=1}^{n} X_i/\sqrt{n}$ is equal to $\left(M_{X_i}(t/\sqrt{n})\right)^n$.

Considering Problem 57 of Chapter 5, if we define $G_{X_i}(t) = \ln\left(M_{X_i}(t)\right)$, we have:

$$G_{X_i}(0) = 0, \qquad G'_{X_i}(0) = \mu = 0, \qquad G''_{X_i}(0) = \sigma^2 = 1$$
Now, if we prove that $\lim_{n\to\infty}\left(M_{X_i}(t/\sqrt{n})\right)^n = e^{t^2/2}$, or equivalently, taking natural logarithms of both sides, that

$$\lim_{n\to\infty} \ln\left(M_{X_i}(t/\sqrt{n})\right)^n = \lim_{n\to\infty} n\,G\!\left(\frac{t}{\sqrt{n}}\right) = \frac{t^2}{2}$$

then we will have shown that the moment generating function of the random variable $\sum_{i=1}^{n} X_i/\sqrt{n}$ is equal to that of the standard normal random variable. To show this, using L'Hôpital's rule, we have:

$$\lim_{n\to\infty} n\,G\!\left(\frac{t}{\sqrt{n}}\right) = \lim_{n\to\infty}\frac{G(t/\sqrt{n})}{n^{-1}} = \lim_{n\to\infty}\frac{-G'(t/\sqrt{n})\,n^{-3/2}\,t/2}{-n^{-2}} = \lim_{n\to\infty}\frac{G'(t/\sqrt{n})\,t}{2n^{-1/2}}$$

Again by using L'Hôpital's rule, it can be shown that the above expression is equal to:

$$\lim_{n\to\infty}\frac{-G''(t/\sqrt{n})\,n^{-3/2}\,t^2/2}{-n^{-3/2}} = \lim_{n\to\infty} G''\!\left(\frac{t}{\sqrt{n}}\right)\frac{t^2}{2} = \frac{t^2}{2}$$

Therefore, the central limit theorem is proven for the case in which the 𝑋𝑖 's are independent random variables with 𝜇 = 0 and 𝜎² = 1. In general, if the 𝑋𝑖 's are independent random variables with mean 𝜇 and variance 𝜎², it can similarly be proven that the central limit theorem is valid for them, since the random variables $Y_i = (X_i - \mu)/\sigma$ have mean zero and variance 1.
Practically speaking, the central limit theorem states that if the value of 𝑛 is large enough, the distribution of the random variable 𝑌 = 𝑋1 + ⋯ + 𝑋𝑛 becomes approximately normal. Note that the value of 𝑛 needed to make the distribution of 𝑌 approximately normal depends on the distribution of the 𝑋𝑖 's: the more symmetric and normal-like the distribution of the 𝑋𝑖 's is, the smaller the value of 𝑛 required.
Example 4.1
If the purchase amount of each person entering a store follows a uniform distribution in the interval (0, 2), what is the probability that the store's total daily sales exceed 55 on a day on which 50 individuals have entered the store?

Solution. Suppose that 𝑋 denotes the total daily sales and the 𝑋𝑖 's denote the purchase amounts of the individual customers (𝑖 = 1, 2, 3, . . . , 50).

$$E(X_i) = 1, \qquad Var(X_i) = \frac{(2-0)^2}{12} = \frac{1}{3}$$

$$\frac{X_1 + \cdots + X_{50} - 50 \times 1}{\sqrt{1/3}\,\sqrt{50}} \approx Z \;\Rightarrow\; P(X_1 + \cdots + X_{50} > 55) \simeq P\!\left(Z > \frac{55 - 50}{\sqrt{50/3}}\right) = P(Z > 1.22) = 0.111$$

Therefore, the desired probability is approximately equal to 0.111.

Using the central limit theorem, we can show that distributions expressible as the sum of 𝑘 independent and identically distributed random variables approximately follow a normal distribution as the value of 𝑘 increases. For example, the binomial, negative binomial, and Erlang distributions can be written as the sum of 𝑛 Bernoulli random variables, 𝑟 geometric random variables, and 𝛼 exponential random variables, respectively. Hence, these distributions are approximately normal when the values of the parameters 𝑛, 𝑟, and 𝛼 are large.

However, it should be noted that if the 𝑋𝑖 's are independent normal random variables, then 𝑛 need not be large for the distribution of 𝑌 = 𝑋1 + ⋯ + 𝑋𝑛 to be normal. In fact, if the distribution of 𝑋1 , 𝑋2 , . . . , 𝑋𝑛 is normal, then regardless of the value of 𝑛, the distribution of the random variable 𝑌 = 𝑋1 + ⋯ + 𝑋𝑛 is exactly normal.

Furthermore, if, in the fraction $\frac{X_1 + \cdots + X_n - n\mu}{\sigma\sqrt{n}}$, we divide the numerator and denominator by 𝑛, the theorem can be presented as follows:

$$\lim_{n\to\infty} \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim Z$$

where $\bar{X} = \frac{X_1 + X_2 + \cdots + X_n}{n}$ is called the sample mean of the random variables 𝑋1 , 𝑋2 , . . . , 𝑋𝑛 .

Example 4.2
Suppose that 𝑋1 , 𝑋2 , . . . , 𝑋100 denote the lifetimes of 100 light bulbs, each following an exponential distribution with mean 0.5 months. What is the approximate probability that the sum of their lifetimes is more than 57 months?

Solution.

$$\sum_{i=1}^{100} X_i \sim \Gamma(\alpha = 100, \lambda = 2) \approx N\!\left(\mu = \frac{\alpha}{\lambda} = 50,\; \sigma^2 = \frac{\alpha}{\lambda^2} = 25\right)$$

$$P\!\left(\sum_{i=1}^{100} X_i > 57\right) = P\!\left(Z > \frac{57 - 50}{\sqrt{25}}\right) = P(Z > 1.4) \approx 0.08$$

Example 4.3
If the lifetime of a light bulb follows an exponential distribution with mean 500 units of time, what is the probability that the sample mean of the lifetimes of 100 light bulbs is between 475 and 555 units of time?

Solution. Suppose that 𝑋1 , 𝑋2 , . . . , 𝑋100 denote the lifetimes of the 100 light bulbs. Hence, we have:

$$X_i \sim EXP\!\left(\lambda = \frac{1}{500}\right) \Rightarrow \mu = \frac{1}{\lambda} = 500, \qquad \sigma = \frac{1}{\lambda} = 500$$

$$P(475 < \bar{X} < 555) = P\!\left(\frac{475 - 500}{500/\sqrt{100}} < \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} < \frac{555 - 500}{500/\sqrt{100}}\right) \approx P(-0.5 < Z < 1.1) \approx 0.55$$
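A quick Monte Carlo check of this answer (assuming NumPy; the sample sizes and seed are our choices) estimates the same probability directly from simulated sample means:

```python
import numpy as np

rng = np.random.default_rng(5)
# 100,000 replications of the sample mean of 100 Exp(1/500) lifetimes
xbar = rng.exponential(scale=500, size=(100_000, 100)).mean(axis=1)
print(((xbar > 475) & (xbar < 555)).mean())   # ≈ 0.55
```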
Suppose that 𝑋1 , 𝑋2 , . . . , 𝑋𝑛 are independent and identically distributed continuous random variables having a common density function 𝑓 and cumulative distribution function 𝐹. In statistics, any function of independent and identically distributed random variables 𝑋1 , 𝑋2 , . . . , 𝑋𝑛 is called a statistic. Among the most important and widely used statistics are the order statistics. Suppose that the functions 𝑋(1) , 𝑋(2) , . . . , 𝑋(𝑛) are defined on the random variables 𝑋1 , 𝑋2 , . . . , 𝑋𝑛 as follows:
𝑋(1) = the smallest of 𝑋1 , 𝑋2 , . . . , 𝑋𝑛 = 𝑀𝑖𝑛{𝑋1 , . . . , 𝑋𝑛 }
𝑋(2) = the second smallest of 𝑋1 , 𝑋2 , . . . , 𝑋𝑛
⋮
𝑋(𝑗) = the 𝑗-th smallest of 𝑋1 , 𝑋2 , . . . , 𝑋𝑛
⋮
𝑋(𝑛) = the 𝑛-th smallest of 𝑋1 , 𝑋2 , . . . , 𝑋𝑛 = 𝑀𝑎𝑥{𝑋1 , . . . , 𝑋𝑛 }

If so, the ordered values 𝑋(1) ≤ 𝑋(2) ≤. . . ≤ 𝑋(𝑛) are called the order statistics
of 𝑋1 , 𝑋2 , . . . , 𝑋𝑛 .
It should be noted that since each order statistic is a function of the random variables, each of them is a random variable as well. Moreover, the reader should note that the order statistics are dependent on one another because their ranges are interrelated.
To understand the order statistics better, suppose that people A, B, and C independently arrive at their work office at a time uniformly distributed between 08:00 and 09:00 a.m. Also, 𝑋1 , 𝑋2 , and 𝑋3 denote the arrival times of people A, B, and C, respectively. For simplicity, we consider 08:00 a.m. to be the origin of time and measure time in minutes. If we observe the values 𝑋1 = 41, 𝑋2 = 23, and 𝑋3 = 34 on a given day, then we have:

𝑋(1) = 23 𝑋(2) = 34 𝑋(3) = 41

Now, if we get the values 𝑋1 = 29, 𝑋2 = 53, and 𝑋3 = 14 in another day, then
the order statistics of 𝑋1 , 𝑋2, and 𝑋3 are equal to:

𝑋(1) = 14 𝑋(2) = 29 𝑋(3) = 53

Note that the distributions of 𝑋1 , 𝑋2 , and 𝑋3 are the same, with the common density function 𝑓 = 1/60 on the interval (0, 60). However, as we will show in the next example, the distributions of 𝑋(1) , 𝑋(2) , and 𝑋(3) are different.
Example 5.1
Suppose that people A, B, and C independently arrive at their work office at a time uniformly distributed between 08:00 and 09:00 a.m. If 𝑋1 , 𝑋2 , and 𝑋3 denote the arrival times of people A, B, and C, respectively, and 𝑋(1) , 𝑋(2) , and 𝑋(3) denote the respective arrival times of the first, second, and third individuals to arrive at the office, obtain the density functions of 𝑋(1) , 𝑋(2) , and 𝑋(3) .

Solution. We consider 08:00 a.m. to be the origin of time and define it in terms of
minutes.
The event (𝑋(1) ≤ 𝑎) means that the first individual should arrive before the
moment “𝑎”.
To realize this event, at least one person should arrive before the moment “𝑎”.
Therefore, we have:
$$F_{X_{(1)}}(a) = P(X_{(1)} \le a) = \sum_{i=1}^{3}\binom{3}{i}\left(\frac{a}{60}\right)^i\left(1 - \frac{a}{60}\right)^{3-i} = 1 - \binom{3}{0}\left(\frac{a}{60}\right)^0\left(1 - \frac{a}{60}\right)^3 = 1 - \left(1 - \frac{a}{60}\right)^3$$

$$\Rightarrow f_{X_{(1)}}(a) = \frac{dF_{X_{(1)}}(a)}{da} = \frac{3}{60}\left(1 - \frac{a}{60}\right)^2\,; \quad 0 < a < 60$$

As seen, although the arrival time of each individual independently follows a uniform distribution, the density function of the first person's arrival time is not uniform. Rather, the density of the first arrival time is greater near the beginning of the interval from 0 to 60 (minutes).
The event (𝑋(2) ≤ 𝑎) means that the second individual should arrive before the
moment “𝑎”.
To realize this event, at least two people should arrive before the moment “𝑎”.
Hence, we have:
$$F_{X_{(2)}}(a) = P(X_{(2)} \le a) = \sum_{i=2}^{3}\binom{3}{i}\left(\frac{a}{60}\right)^i\left(1 - \frac{a}{60}\right)^{3-i} = \binom{3}{2}\left(\frac{a}{60}\right)^2\left(1 - \frac{a}{60}\right) + \binom{3}{3}\left(\frac{a}{60}\right)^3\left(1 - \frac{a}{60}\right)^0$$

$$\Rightarrow f_{X_{(2)}}(a) = \frac{dF_{X_{(2)}}(a)}{da} = 3 \times 2 \times \left(\frac{1}{60}\right)\left(\frac{a}{60}\right)\left(1 - \frac{a}{60}\right)\,; \quad 0 < a < 60$$

As seen, the density function of the second person's arrival time is greatest near the middle of the interval from 0 to 60 (minutes).
The event (𝑋(3) ≤ 𝑎) means that the third individual should arrive before
moment “𝑎”.
To realize this event, at least three people should arrive before moment “𝑎”.
As a result, we have:

$$F_{X_{(3)}}(a) = P(X_{(3)} \le a) = \sum_{i=3}^{3}\binom{3}{i}\left(\frac{a}{60}\right)^i\left(1 - \frac{a}{60}\right)^{3-i} = \binom{3}{3}\left(\frac{a}{60}\right)^3\left(1 - \frac{a}{60}\right)^0 = \left(\frac{a}{60}\right)^3$$

$$\Rightarrow f_{X_{(3)}}(a) = \frac{dF_{X_{(3)}}(a)}{da} = \frac{3}{60}\left(\frac{a}{60}\right)^2\,; \quad 0 < a < 60$$

As seen, the density function of the third person's arrival time is greatest near the end of the interval from 0 to 60 (minutes).

In general, suppose that 𝑋1 , 𝑋2 , . . . , 𝑋𝑛 are independent and identically


distributed continuous random variables having a common density function 𝑓 and
cumulative distribution function 𝐹. To obtain the cumulative distribution function
of the 𝑗 𝑡ℎ order statistic or 𝐹𝑋(𝑗) (𝑎), we should know that the event (𝑋(𝑗) ≤ 𝑎) occurs
when at least j out of the 𝑋𝑖 's are less than “𝑎”. Hence, we have:
$$F_{X_{(j)}}(a) = P(X_{(j)} \le a) = \sum_{k=j}^{n}\binom{n}{k}[F(a)]^k[1 - F(a)]^{n-k}$$

Moreover, the density function of 𝑋(𝑗) is obtained by differentiating the above


expression with respect to 𝑎 as follows:

$$f_{X_{(j)}}(a) = \frac{dF_{X_{(j)}}(a)}{da} = \frac{n!}{(j-1)!\,1!\,(n-j)!}\,[F(a)]^{j-1}\,f(a)\,[1 - F(a)]^{n-j}$$

There is another method to obtain the density function of the 𝑗 𝑡ℎ order


statistic. As mentioned in Chapter 4, 𝑓𝑋 (𝑎)𝑑𝑎 is the probability that the random
variable 𝑋 lies within an interval of length 𝑑𝑎 about the point “𝑎”. Therefore, 𝑓𝑋(𝑗) (𝑎)𝑑𝑎
is the probability that the random variable 𝑋(𝑗) (the 𝑗 𝑡ℎ order statistic) lies within an
interval of length 𝑑𝑎 about the point “𝑎”. To realize this event, (𝑗 − 1) of the 𝑋𝑖 's should
be before “𝑎”, one of the 𝑋𝑖 's should be in an interval of length da about the point “𝑎”,
and the remaining (𝑛 − 𝑗) of the 𝑋𝑖 's should be after “𝑎”. As a result, we have:

$$f_{X_{(j)}}(a)\,da = \binom{n}{j-1}[F(a)]^{j-1} \times \binom{n-(j-1)}{1} f(a)\,da \times \binom{n-j}{n-j}[1 - F(a)]^{n-j}$$

$$\Rightarrow f_{X_{(j)}}(a) = \frac{n!}{(j-1)!\,1!\,(n-j)!}[F(a)]^{j-1} \times f(a) \times [1 - F(a)]^{n-j}$$

If we intend to obtain the density functions of 𝑋(1) , 𝑋(2) , and 𝑋(3) in Example 5.1 by employing the above formula, we have:

$$f_{X_{(1)}}(a) = \binom{3}{0,1,2}\left(\frac{a}{60}\right)^0 \times \left(\frac{1}{60}\right) \times \left(1 - \frac{a}{60}\right)^2 = \frac{3}{60}\left(1 - \frac{a}{60}\right)^2\,; \quad 0 < a < 60$$

$$f_{X_{(2)}}(a) = \binom{3}{1,1,1}\left(\frac{a}{60}\right)^1 \times \left(\frac{1}{60}\right) \times \left(1 - \frac{a}{60}\right)^1 = \frac{6a}{60^2} - \frac{6a^2}{60^3}\,; \quad 0 < a < 60$$

$$f_{X_{(3)}}(a) = \binom{3}{2,1,0}\left(\frac{a}{60}\right)^2 \times \left(\frac{1}{60}\right) \times \left(1 - \frac{a}{60}\right)^0 = \frac{3}{60}\left(\frac{a}{60}\right)^2\,; \quad 0 < a < 60$$
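These distribution functions are easy to confirm by simulation. The sketch below (assuming NumPy; the evaluation point 𝑎 = 20, the seed, and the sample size are our choices) compares empirical probabilities for the three order statistics with the formulas of Example 5.1:

```python
import numpy as np

rng = np.random.default_rng(6)
# Each row: one day's arrival times, sorted so columns are X(1), X(2), X(3)
arrivals = np.sort(rng.uniform(0, 60, size=(500_000, 3)), axis=1)

a = 20.0
print((arrivals[:, 0] <= a).mean(), 1 - (1 - a/60)**3)                  # F of X(1)
print((arrivals[:, 1] <= a).mean(), 3*(a/60)**2*(1 - a/60) + (a/60)**3) # F of X(2)
print((arrivals[:, 2] <= a).mean(), (a/60)**3)                          # F of X(3)
```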
Example 5.2
An electronic system is composed of 4 parts. The lifetime of each part follows an exponential distribution with parameter 𝜆 = 1. The 4 parts begin functioning simultaneously, and the system functions as long as at least two parts function. Obtain the cumulative distribution and density functions of the lifetime of the system.

Solution. The system functions as long as at least two parts function. Therefore, the lifetime of the system is equal to the time at which the third part fails. Hence, we have:

$$F_{X_{(3)}}(a) = \sum_{i=3}^{4}\binom{4}{i}[F(a)]^i[1 - F(a)]^{4-i}\,; \quad a > 0$$

where 𝐹(𝑎) is equal to 1 − 𝑒^{−𝑎}.

$$\Rightarrow F_{X_{(3)}}(a) = \binom{4}{3}(1 - e^{-a})^3(e^{-a})^1 + \binom{4}{4}(1 - e^{-a})^4(e^{-a})^0$$

Furthermore, the density function of the system's lifetime is:

$$f_{X_{(3)}}(a) = \frac{4!}{2!\,1!\,1!}[F(a)]^2 \times f(a) \times [1 - F(a)]^1 = \frac{4!}{2!\,1!\,1!}(1 - e^{-a})^2 \times e^{-a} \times (e^{-a})^1\,; \quad a > 0$$

Example 5.3
If 𝑋1 , 𝑋2 , … , 𝑋𝑛 are independent exponential random variables with identical parameter 𝜆, show that the distribution of 𝑌 = 𝑀𝑖𝑛(𝑋1 , 𝑋2 , … , 𝑋𝑛 ) is exponential with parameter 𝜆′ = 𝑛𝜆.
Solution.

$$f_{X_{(1)}}(a) = \frac{n!}{0!\,1!\,(n-1)!}[F(a)]^0 f(a)[1 - F(a)]^{n-1} = n[1 - e^{-\lambda a}]^0\,\lambda e^{-\lambda a}\,[e^{-\lambda a}]^{n-1} = n\lambda e^{-n\lambda a}\,; \quad a > 0$$

As seen, the density function above is an exponential density function with rate 𝑛𝜆.

Example 5.4
Consider three light bulbs, each independently having an exponential lifetime with parameter 𝜆. We turn on the three light bulbs simultaneously. It is desired to calculate:
a. The expected value of the time until the first light bulb fails.
b. The expected value of the time until the second light bulb fails.
c. The expected value of the time until the third light bulb fails.
Solution.
a. The time that it takes for the first light bulb to fail is simply the minimum lifetime of the three light bulbs, which, according to the preceding example, is exponential with parameter 3𝜆 and mean 1/(3𝜆).
b.

$$E(X_{(2)}) = \int_0^\infty a\,f_{X_{(2)}}(a)\,da = \int_0^\infty a\,\frac{3!}{1!\,1!\,1!}[F(a)]^1 f(a)[1 - F(a)]^1\,da$$

$$= \int_0^\infty 6a(1 - e^{-\lambda a})\,\lambda e^{-\lambda a}\,e^{-\lambda a}\,da = \int_0^\infty 6a\,\lambda e^{-2\lambda a}\,da - \int_0^\infty 6a\,\lambda e^{-3\lambda a}\,da = \frac{3}{2\lambda} - \frac{2}{3\lambda} = \frac{5}{6\lambda}$$

The second solution: The time that it takes for the first light bulb to fail is the minimum lifetime of the three light bulbs, which is exponential with parameter 3𝜆 and mean 1/(3𝜆). After the first light bulb fails, since the exponential distribution is memoryless, the additional time until the failure of the next light bulb (the time until one of the two remaining light bulbs fails) is exponential with parameter 2𝜆 and mean 1/(2𝜆). Hence, the mean time until the failure of the second light bulb is equal to 1/(3𝜆) + 1/(2𝜆) = 5/(6𝜆).
c. Similar to the preceding part, it can be shown that the mean time until the failure of the third light bulb is 1/(3𝜆) + 1/(2𝜆) + 1/𝜆.
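A short simulation sketch (assuming NumPy; 𝜆 = 2, the seed, and the sample size are arbitrary choices of ours) confirms the three expected failure times 1/(3𝜆), 5/(6𝜆), and 11/(6𝜆):

```python
import numpy as np

rng = np.random.default_rng(7)
lam = 2.0
# Each row: the three lifetimes sorted, so columns are X(1), X(2), X(3)
fails = np.sort(rng.exponential(scale=1/lam, size=(500_000, 3)), axis=1)
print(fails.mean(axis=0))                          # empirical means of X(1), X(2), X(3)
print(1/(3*lam), 5/(6*lam), 11/(6*lam))            # the theoretical values
```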
Example 5.5
If 𝑋1 , 𝑋2 , . . . , 𝑋𝑛 are independent random variables uniformly distributed over the interval (0, 1), show that the distribution of 𝑋(𝑖) , the 𝑖-th order statistic of the 𝑋𝑖 's, is a beta distribution with parameters (𝑎 = 𝑖, 𝑏 = 𝑛 − 𝑖 + 1) and mean 𝑖/(𝑛 + 1).
Solution. 𝑋1 , 𝑋2 , . . . , 𝑋𝑛 are independent and identically distributed continuous random variables with common density function $f(x) = \frac{1}{1-0} = 1;\ 0 < x < 1$ and cumulative distribution function $F(x) = \frac{x-0}{1-0} = x;\ 0 < x < 1$. Hence, we have:

$$f_{X_{(i)}}(x) = \frac{n!}{(i-1)!\,1!\,(n-i)!}[F(x)]^{i-1} f(x)[1 - F(x)]^{n-i} = \frac{n!}{(i-1)!\,1!\,(n-i)!}\,x^{i-1}(1-x)^{n-i}$$

$$\Rightarrow f_{X_{(i)}}(x) = c\,x^{i-1}(1-x)^{n-i}\,;\; 0 < x < 1 \;\Rightarrow\; X_{(i)} \sim \beta(i, n-i+1) \;\Rightarrow\; E(X_{(i)}) = \frac{i}{i+n-i+1} = \frac{i}{n+1}$$

Proposition 5.1
If 𝑋1 , 𝑋2 , . . . , 𝑋𝑛 are independent and identically distributed continuous random variables, then $P(X_1 < X_2 < \cdots < X_n) = \frac{1}{n!}$.

Proof. To comprehend the above proposition, it suffices to know that since 𝑋1 , 𝑋2 , . . . , 𝑋𝑛 are independent and identically distributed continuous random variables, all 𝑛! possible orderings of them have the same probability. As a result, the probability of each ordering is equal to 1/𝑛!. For example, we have:

$$P(X_1 < X_2) = P(X_2 < X_1) = \frac{1}{2!}$$

$$P(X_1 < X_2 < X_3) = P(X_1 < X_3 < X_2) = \cdots = P(X_3 < X_2 < X_1) = \frac{1}{3!}$$
Proposition 5.2
If 𝑋1 , 𝑋2 , . . . , 𝑋𝑛 are independent and identically distributed continuous random variables, then the probability that 𝑋1 is the smallest of 𝑋1 , 𝑋2 , . . . , 𝑋𝑛 equals:

$$P(X_1 = Min\{X_1, X_2, . . . , X_n\}) = P(X_1 < X_2, X_1 < X_3, . . . , X_1 < X_n) = \frac{1}{n}$$

Proof. To comprehend the above proposition, it suffices to know that since 𝑋1 , 𝑋2 , . . . , 𝑋𝑛 are independent and identically distributed continuous random variables, each of the 𝑛 random variables is the smallest with the same probability 1/𝑛. Hence, the probability that 𝑋1 is less than all the others is equal to 1/𝑛.
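A minimal simulation sketch of Proposition 5.2 (assuming NumPy; the exponential distribution and 𝑛 = 4 are arbitrary choices of ours, since any common continuous distribution works):

```python
import numpy as np

rng = np.random.default_rng(8)
x = rng.exponential(size=(1_000_000, 4))   # n = 4 i.i.d. continuous variables per row
print((x.argmin(axis=1) == 0).mean())      # ≈ 1/4: the fraction of rows where X1 is smallest
```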
Proposition 5.3
If the joint density function of 𝑋(1) , . . . , 𝑋(𝑛) is denoted by $f_{X_{(1)},...,X_{(n)}}(a_1, . . . , a_n)$, then it can be proven that:

$$f_{X_{(1)},...,X_{(n)}}(a_1, . . . , a_n) = n!\,f_{X_1,...,X_n}(a_1, . . . , a_n) = n!\,f(a_1)\,f(a_2)\cdots f(a_n)$$

where 𝑓 is the density function of the 𝑋𝑖 's.

Proof. To realize 𝑋(1) = 𝑎1 , 𝑋(2) = 𝑎2 , … , 𝑋(𝑛) = 𝑎𝑛 , there are 𝑛! configurations of 𝑋1 , 𝑋2 , . . . , 𝑋𝑛 , one of which is 𝑋1 = 𝑎1 , 𝑋2 = 𝑎2 , … , 𝑋𝑛 = 𝑎𝑛 . Therefore, we have:

$$P\!\left(a_1 - \tfrac{\varepsilon}{2} \le X_{(1)} \le a_1 + \tfrac{\varepsilon}{2},\,\ldots,\, a_n - \tfrac{\varepsilon}{2} \le X_{(n)} \le a_n + \tfrac{\varepsilon}{2}\right) = n!\;P\!\left(a_1 - \tfrac{\varepsilon}{2} \le X_1 \le a_1 + \tfrac{\varepsilon}{2},\,\ldots,\, a_n - \tfrac{\varepsilon}{2} \le X_n \le a_n + \tfrac{\varepsilon}{2}\right)$$

Moreover, according to the explanations of Section 8.2.2 in the preceding chapter, we have:

$$P\!\left(a_1 - \tfrac{\varepsilon}{2} \le X_1 \le a_1 + \tfrac{\varepsilon}{2},\,\ldots,\, a_n - \tfrac{\varepsilon}{2} \le X_n \le a_n + \tfrac{\varepsilon}{2}\right) \simeq \varepsilon^n \times f_{X_1,...,X_n}(a_1, . . . , a_n)$$

$$P\!\left(a_1 - \tfrac{\varepsilon}{2} \le X_{(1)} \le a_1 + \tfrac{\varepsilon}{2},\,\ldots,\, a_n - \tfrac{\varepsilon}{2} \le X_{(n)} \le a_n + \tfrac{\varepsilon}{2}\right) \simeq \varepsilon^n \times f_{X_{(1)},...,X_{(n)}}(a_1, . . . , a_n)$$

As a result, considering the above two equations, we can conclude that:

$$f_{X_{(1)},...,X_{(n)}}(a_1, . . . , a_n) = n!\,f_{X_1,...,X_n}(a_1, . . . , a_n)$$

To comprehend the above equation better, note that since 𝑋1 , 𝑋2 , . . . , 𝑋𝑛 are independent and identically distributed random variables, and the ordering 𝑋1 ≤ 𝑋2 ≤ . . . ≤ 𝑋𝑛 is one of the equally likely 𝑛! possible permutations of the 𝑋𝑖 's, there are 𝑛! configurations under which 𝑋(1) = 𝑎1 , 𝑋(2) = 𝑎2 , …, 𝑋(𝑛) = 𝑎𝑛 occurs.
1) If 𝑋1 and 𝑋2 are two independent Poisson random variables with respective
parameters 1 and 2, then obtain the probability that the sum of these two
variables is more than 1.
2) If 𝑋1 , 𝑋2 , and 𝑋3 are three independent Poisson random variables with identical parameter 𝜆 = 2, obtain the value of $P(\bar{X} < \frac{1}{3})$, where $\bar{X} = \frac{X_1 + X_2 + X_3}{3}$.
3) Suppose that, on average, 13 trucks, 20 automobiles, and 15 motorcycles per hour enter a filling station. If the station opens in the morning, what is the probability that the operator of the station is idle for at least 10 minutes? (Consider the interarrival times of trucks, automobiles, and motorcycles to be independent exponential random variables with means 1/13, 1/20, and 1/15 hours, respectively.)
4) Suppose that 𝑋1 , 𝑋2 , and 𝑋3 are three independent Poisson random variables with identical parameter 𝜆. If $P(X_1 + X_2 + X_3 = 3) = \frac{9}{2}e^{-3}$, then obtain the value of 𝑃(𝑋1 + 𝑋2 = 2, 𝑋3 = 1).
5) If $X_1 \sim B(10, \frac{1}{4})$, $X_2 \sim B(20, \frac{1}{4})$, and 𝑌 = 𝑋1 + 𝑋2 , then obtain the value of 𝑃(𝑌 = 10).
6) If $X_1 \sim B(10, \frac{1}{4})$ and $X_2 \sim B(20, \frac{3}{4})$, then it is desired to calculate:
a. The distribution of 𝑌 = 10 − 𝑋1 + 𝑋2 .
b. The probability that 𝑌 takes on an odd value.
7) Suppose that 𝑋1 and 𝑋2 are two independent and identically distributed
random variables, each having the following probability function:
$$P(T = t) = \frac{1}{3}\left(\frac{2}{3}\right)^{t-1}\,; \quad t = 1, 2, 3, …$$
Obtain the value of 𝑃(𝑋1 + 𝑋2 = 10).
8) A box contains 3 marbles numbered from 1 to 3. We select two marbles at random, one at a time and without replacement, from the box. If 𝑋1 and 𝑋2 denote the numbers of the first and second selected marbles, respectively, then obtain the distribution of 𝑋1 + 𝑋2 .
9) Suppose that 𝑋1 , 𝑋2 , and 𝑋3 are three independent random variables, each taking on the value 1 or 2 with equal probability. If the random variable 𝐺 is defined as 𝐺 = 𝑋1 𝑋2 𝑋3 and its values are denoted by 𝑔, then obtain 𝑃(𝐺 = 𝑔) for the values 𝑔 = 1, 2, 4, and 8.
10) Suppose that the length of each chain is a normal random variable with mean 100 and standard deviation 2 centimeters. Obtain the probability that the total length of 100 chains is at least 10020 centimeters.
11) If 𝑋1 , 𝑋2, and 𝑋3 are three independent random variables with respective
distributions 𝑁(𝜇 = 2, 𝜎 2 = 9), 𝑁(𝜇 = 2, 𝜎 2 = 3), and 𝑁(𝜇 = 3, 𝜎 2 = 4), then it is
desired to calculate:
a. 𝑃(𝑋1 + 𝑋2 + 𝑋3 > 10)
b. 𝑃(𝑋1 + 3𝑋2 > 10 − 4𝑋3 )
12) If the daily sales amount of a store follows a normal distribution with mean 10 and variance 2, what are the variances of the following random variables?
𝑋: the total sales of today and tomorrow
𝑇: twice the sales of a single day
13) Test scores of a three-credit course follow a normal distribution with mean 12 and standard deviation 2. Also, test scores of a two-credit course are normal with mean 17 and standard deviation 4. What is the probability that the average of a student's test scores in the two courses is greater than 16?
14) A producer has two customers for his products. His monthly production amount is a normal random variable with mean 18 and variance 6, and the monthly demand of each customer independently follows a normal distribution with mean 7 and variance 5. What is the probability that the producer does not encounter a shortage?
15) Based on scientific and empirical information, a construction engineer knows that the weight (in thousands of kilograms) endured by a bridge follows a normal random variable with mean 400 and standard deviation 40. If the weight of an automobile is a normal random variable with mean 3 and standard deviation 0.3 (in thousands of kilograms), and 116 automobiles are on the bridge, obtain the probability that the bridge fails.
16) If 𝑋1 and 𝑋2 are two independent and identically distributed normal random variables with mean 7 and variance 8, then calculate the probability that the absolute difference between them exceeds 2, i.e., 𝑃(|𝑋1 − 𝑋2 | > 2).
17) Suppose that 𝑋1 , 𝑋2 , . . . , 𝑋10 are independent and identically distributed normal random variables with mean zero and variance 10. If we define the random variable 𝑌 as $Y = \sum_{i=1}^{10}(-1)^i X_i$, then calculate the value of 𝑃(𝑌 > 10).
18) Suppose that 𝑋1 and 𝑋2 are two independent exponential random variables, each having the following density function:
𝑓(𝑥) = 3𝑒^{−3𝑥} ; 𝑥 > 0, and zero elsewhere.
If we define 𝑌 = 𝑋1 + 𝑋2 and $W = \frac{X_1 + X_2}{2}$, then it is desired to calculate:
a. The distributions of 𝑌 and 𝑊.
b. 𝑃(𝑊 > 2), 𝑃(𝑌 > 2)
c. $E\!\left(\frac{1}{W}\right)$, $E\!\left(\frac{1}{Y}\right)$
19) If 𝑋1 and 𝑋2 are two independent random variables, each having the following
density function:
𝑓𝑋 (𝑥) = 𝑒 −(𝑥−1) ; 𝑥 > 1, and zero elsewhere.
Obtain the value of 𝑃(𝑋1 + 𝑋2 > 4).
20) If 𝑋1 , 𝑋2 , and 𝑋3 are three independent exponential random variables with identical parameter 𝜆, then obtain the density functions of the following random variables:
a. $Y_1 = \frac{X_1}{X_2}$
b. $Y_2 = \frac{X_1}{X_1 + X_2}$
c. $Y_3 = \frac{X_1 - X_2}{X_1 + X_2}$
d. $Y_4 = \frac{X_1 + X_2}{X_3}$
e. $Y_5 = \frac{X_1}{X_2 + X_3}$

21) If 𝑋1 and 𝑋2 are two independent exponential random variables with the
identical parameters 𝜆, then obtain:
a. The moment generating function of the random variable 𝑍 = 𝑋1 − 𝑋2.
b. The density function of 𝑍 for positive and negative values of 𝑧.
22) The distance of the hit location of a rocket from the target is equal to $D = \sqrt{X_1^2 + X_2^2}$, where 𝑋1 and 𝑋2 are standard normal random variables. What percentage of rockets land within a distance of less than 1 unit from the target?
Hint: According to Problem 72 of Chapter 7, the distributions of the random variables $X_1^2$ and $X_2^2$ are gamma with identical parameters $(\alpha = \frac{1}{2}, \lambda = \frac{1}{2})$.

23) The joint probability density function of the continuous random variables 𝑋1
and 𝑋2 is given by 𝑓(𝑥1 , 𝑥2 ) = 4𝑥1 𝑥2 ; 0 < 𝑥1 < 1, 0 < 𝑥2 < 1, and zero elsewhere.
Obtain the joint density function of the random variables 𝑌1 = 𝑋12 and 𝑌2 = 𝑋1 𝑋2
using Theorem 2.1 of this chapter.
24) The joint probability density function of the continuous random variables 𝑋1 and 𝑋2 is given by $f(x_1, x_2) = \frac{1}{x_1^2 x_2^2}$ ; 𝑥1 ≥ 1, 𝑥2 ≥ 1, and zero elsewhere.
a. Obtain the joint density function of the random variables 𝑌1 = 𝑋1 𝑋2 and $Y_2 = \frac{X_1}{X_2}$.
b. Obtain the marginal density functions of 𝑌1 and 𝑌2 .
25) If 𝑋1 and 𝑋2 are two independent and identically distributed continuous uniform (0,1) random variables, obtain the joint density function of the random variables 𝑌1 = 𝑋1 + 𝑋2 and $Y_2 = \frac{X_1}{X_2}$.
2
26) Suppose that 𝑋1 and 𝑋2 denote the number of female and male customers
entering a store per hour, respectively. If 𝑋1 and 𝑋2 follow Poisson
distributions with respective parameters 𝜆1 = 5 and 𝜆2 = 10, then it is desired
to calculate:
a. 𝑃(𝑋1 = 3|𝑋1 + 𝑋2 = 5)
b. The probability function of (𝑋1 |𝑋1 + 𝑋2 = 𝑦).
c. 𝐸(𝑋1 |𝑋1 + 𝑋2 = 𝑦)
27) A worker produces 20 parts in one day and 10 parts in another day. The
probability of being defective for each part is 0.1 independently. If 𝑋1 and 𝑋2
denote the number of defective parts in the first and second days,
respectively, then it is desired to calculate:
a. 𝑃(𝑋1 = 3|𝑋1 + 𝑋2 = 5)
b. The probability function of (𝑋1 |𝑋1 + 𝑋2 = 𝑦).
c. 𝐸(𝑋1 |𝑋1 + 𝑋2 = 𝑦)
28) People A and B separately perform independent trials, each having probability
of success 𝑝 until getting a success. If the random variables 𝑋1 and 𝑋2 denote
the number of trials performed by A and B until getting a success, then it is
desired to calculate:
a. 𝑃(𝑋1 = 3|𝑋1 + 𝑋2 = 5)
b. The probability function of (𝑋1 |𝑋1 + 𝑋2 = 𝑦).
c. 𝐸(𝑋1 |𝑋1 + 𝑋2 = 𝑦)
29) We have two light bulbs, each independently having an exponential lifetime with rate 𝜆. If the random variables 𝑋1 and 𝑋2 denote the lifetimes of the two light bulbs, then it is desired to calculate:
a. 𝑃(𝑋1 < 3|𝑋1 + 𝑋2 = 5)
b. The conditional density function of (𝑋1 |𝑋1 + 𝑋2 = 𝑦).
c. 𝐸(𝑋1 |𝑋1 + 𝑋2 = 𝑦)
30) If the 𝑋𝑖 's denote the purchase amounts of different people at a store, following independent uniform distributions in the interval (2, 4), then obtain the approximate value of $P(\sum_{i=1}^{64} X_i > 192)$.
31) Suppose that the 𝑋𝑖 's denote the numbers of customers of a store on different days, following independent Poisson distributions, each with mean 4. Obtain the approximate value of $P(\sum_{i=1}^{64} X_i \le 256)$.
32) A person stands in line with 36 people in front of him. The time required to serve each person follows the density function $f(x) = c\,x^2 e^{-\frac{1}{2}x}$ ; 𝑥 > 0, and zero elsewhere. What is the approximate probability that the person waits more than 220 units of time to be served?
33) The number of customers of a store per hour follows a Poisson distribution
with mean 20 customers per hour. What is the approximate probability that
the time until the arrival of the 25th customer exceeds 80 minutes?
34) A fair die is rolled until the sum of all results exceeds 300. Obtain the probability that at least 80 rolls are required to achieve this target.
Hint: This event means that the sum of the results of the first 79 rolls must be less than or equal to 300.
35) Ten percent of the parts produced by a machine are defective. A random
sample of size 200 from the parts produced by the machine is selected and the
number of defective parts in the sample (𝑋) is calculated. Using the normal
approximation, obtain the following probabilities:
a. 𝑃(15 ≤ 𝑋 ≤ 25)
b. 𝑃(𝑋 = 15)
c. 𝑃(𝑋 ≤ 20)
36) There is a four-choice test consisting of 100 questions, with exactly one correct choice per question. If a student has no knowledge about the material of a question, he answers it at random. If the minimum requirement for passing the test is answering 50 questions correctly,
a. What is the probability that the student passes the test?
b. Is it reasonable for the student to rely merely on chance to pass the test?
37) The useful lifetime of an electronic device follows a normal distribution with
mean 1.4 years and standard deviation 0.3 years. We select a lot of size 100 at
random from the devices. Obtain the approximate probability that at least 20
devices of the selected lot last less than 1.8 years.
38) A fair coin is flipped 10000 times.
a. What is the probability that the number of upturned heads is between
4800 to 5200?
b. Considering the preceding section, if a coin is fair, is it possible that
5400 heads appear in 10000 flips?
c. If 5400 heads appear in 10000 flips, is the coin fair or not?
39) We roll a pair of fair dice until a pair of six appears 30 times. What is the
probability that at least 1080 rolls are required?
40) Two people labelled A and B independently plan to meet each other at a time uniformly distributed between 12:00 and 13:00. If 𝑋1 and 𝑋2 denote the arrival times of persons A and B, respectively, then it is required to calculate:
a. The distribution of 𝑌 = |𝑋1 − 𝑋2 | (the difference between their arrival times).
b. The expected value of 𝑌.
41) Three people labelled A, B, and C plan to arrive at a meeting place at a time uniformly distributed between 8:00 and 9:00 a.m. Suppose that 8:00 a.m. is the time origin. If 𝑋1 , 𝑋2 , and 𝑋3 denote the arrival times of people A, B, and C, respectively, and 𝑋(1) , 𝑋(2) , and 𝑋(3) denote the arrival times of the first, second, and third people to arrive, respectively, then it is desired to calculate:
a. The mean arrival time of the first, second, or third person; or
𝐸(𝑋(1) ), 𝐸(𝑋(2) ), 𝐸(𝑋(3) )
b. The mean difference arrival time of the first and second people; or
𝐸(𝑋(2) − 𝑋(1) )
Hint: In the next chapter, we will show that for the arbitrary random
variables 𝑋 and 𝑌, we have:
𝐸(𝑋 − 𝑌) = 𝐸(𝑋) − 𝐸(𝑌)
c. The mean difference arrival time of the third and second people; or
𝐸(𝑋(3) − 𝑋(2) )
d. The probability that the first person arrives before 08:20; or 𝑃(𝑋(1) < 20).
e. The probability that the second person arrives before 08:20; or
𝑃(𝑋(2) < 20).
f. The probability that the third person arrives before 08:20; or
𝑃(𝑋(3) < 20).
g. The probability that the first person arrives between 08:20 and 08:40;
or 𝑃(20 < 𝑋(1) < 40).
h. The probability that the first person arrives before 08:20 and the second
person arrives after 8:20; or 𝑃(𝑋(1) < 20 < 𝑋(2) ).
i. The probability that the first person arrives before 08:20 and the third
person arrives after 8:20; or 𝑃(𝑋(1) < 20 < 𝑋(3) ).
j. The probability that A is the first person to arrive; or 𝑃(𝑋(1) = 𝑋1 ).
k. The probability that A arrives before people B and C; or 𝑃(𝑋1 < 𝑋2 , 𝑋1 < 𝑋3 ).
l. The probability that people A, B, and C arrive at the place as the first,
second, and third individuals, respectively; or
𝑃(𝑋(1) = 𝑋1 , 𝑋(2) = 𝑋2 , 𝑋(3) = 𝑋3 ).
m. The probability that person A arrives before B and B arrives before C;
or 𝑃(𝑋1 < 𝑋2 < 𝑋3 ).
42) Consider three light bulbs, each having exponential lifetime of rate 𝜆 (in years).
Suppose that 𝑋1, 𝑋2, and 𝑋3 denote the lifetime of the light bulbs, respectively.
If 𝑋(1) , 𝑋(2) , and 𝑋(3) denote the time until the failure of the first, second, and
third light bulb, respectively, then it is desired to calculate:
a. The mean time until the failure of the first, second, and third light bulb;
or 𝐸(𝑋(1) ), 𝐸(𝑋(2) ), 𝐸(𝑋(3) ).
b. The mean difference time between the failure of the first and second
light bulbs; or 𝐸(𝑋(2) − 𝑋(1) ).
c. The mean difference time between the failure of the first and third light
bulbs; or 𝐸(𝑋(3) − 𝑋(1) ).
d. The probability that the first light bulb fails before one year; or
𝑃(𝑋(1) < 1).
e. The probability that the second light bulb fails before one year; or
𝑃(𝑋(2) < 1).
f. The probability that the third light bulb fails before one year; or
𝑃(𝑋(3) < 1).
g. The probability that the first light bulb lasts between 1 to 2 years; or
𝑃(1 < 𝑋(1) < 2).

43) Three people are being served by three different bank tellers numbered 1
through 3. The serving time of each teller is exponential with parameter 𝜆1 .
Moreover, when the service time of any person terminates, he will go home.
The duration until arriving at home for each person is exponential with
parameter 𝜆2 .
a. What is the probability that the service time of the person in teller 1
terminates before the others?
b. What is the probability that a person arrives at home while the other
two individuals are still being served?
c. What is the probability that a person arrives at home while at least one
of the other two individuals is still being served?
44) A bank has 3 tellers numbered 1 through 3. The serving time of each teller follows an exponential distribution with parameter 𝜆. Three people are being served, five more stand in line, and the door of the bank is closed. It is desired to calculate:
a. The mean time required until the first person exits the bank.
b. The mean time required until nobody stands in line.
c. The mean time required until a teller becomes idle.
d. The mean time required until the last person in line exits the bank.
e. The mean time required until all the individuals exit the bank.
f. The probability that the person being served by teller 1 exits the bank
before the person being served by teller 2.
g. The probability that the person being served by teller 1 exits the bank
before the other two individuals.
h. The probability that the fifth person standing in line exits the bank
before the second person standing in line.
45) A flashlight requires two batteries to function. The lifetime of each battery
independently follows an exponential distribution with parameter 𝜆. If a
person has a flashlight and 𝑛 batteries,
a. Obtain the mean time until the failure of the first battery.
b. Obtain the mean time until all the spare batteries fail.
c. Obtain the mean time that the flashlight can function.
46) If 𝑋1 , 𝑋2, and 𝑋3 are three independent and identically distributed normal
random variables with mean 6 and variance 4, then it is desired to calculate
the probability that:
a. Their maximum exceeds 8.
b. Their median exceeds 6.
Hint: if 𝑋(1) , 𝑋(2) , and 𝑋(3) denote the first, second, and third order statistics,
respectively, from a sample of size 3, then the median is equal to 𝑋(2) .

47) Suppose that we have 5 light bulbs. The lifetime of each light bulb independently follows an exponential distribution with mean 1/3 hours. If we turn on the light bulbs successively, one at a time (after the failure of one light bulb, we turn on the next), then it is desired to calculate:
a. The probability that the total lifetime of the five light bulbs is less than 2 hours.
b. The mean time until all the light bulbs fail.
c. The mean time until the third light bulb fails.
48) In the preceding problem, if we turn on the light bulbs simultaneously, then it is desired to calculate:
a. The probability that we have less than 2 hours of lighting.
b. The mean time until all the light bulbs fail.
c. The mean time until the third light bulb fails.
49) Suppose that you and your neighbor have each installed a light bulb at your front doors, and both light bulbs are currently on. If the lifetimes of the light bulbs used in your home and the neighbor's follow exponential distributions with respective parameters 𝜆 = 2 and 𝜆 = 1, then what is the probability that your light bulb lasts longer than the neighbor's?
51) In Problem 49, if you and the neighbor each have two light bulbs and turn them on one by one, then what is the probability that the total lifetime of your light bulbs is less than that of the neighbor's light bulbs?
52) A system comprises 𝑛 light bulbs, each having an exponential lifetime with
failure rate 𝜆 in unit of time. Obtain the distribution of the lifetime of the
system in each of the following cases:
a. The 𝑛 light bulbs are connected in series.
Hint: In series systems, the parts are used simultaneously and the
system stops working only when the first part fails.
b. The 𝑛 light bulbs are connected in parallel.
Hint: In parallel systems, the parts are used simultaneously and the
system stops working only when the last part fails.
c. The 𝑛 light bulbs are connected in standby.
Hint: In standby systems, the parts are used one by one, meaning that
each failed part will be replaced by the next one.
53) Consider two light bulbs, each having an exponential lifetime with failure rate
𝜆 (in years) are used simultaneously. Suppose that 𝑋1 and 𝑋2 denote the lifetime
of the two light bulbs, respectively. Moreover, 𝑋(1) and 𝑋(2) denote the time
until the failure of the first and second light bulbs, respectively. If so, then it is
desired to calculate:
a. The mean time until the failure of the first and second light bulbs; or
𝐸(𝑋(1) ), 𝐸(𝑋(2) ).
b. $E(X_{(1)}^2)$
Hint: $X_1^2 + X_2^2 = X_{(1)}^2 + X_{(2)}^2$
c. $Var(X_{(1)})$, $Var(X_{(2)})$

54) Suppose that we randomly select three numbers one at a time and without
replacement from the set of numbers {1,2,3, … , 𝑛}.
a. If the first selected number is less than the second selected number,
what is the probability that the first number is less than the third
selected number?
b. If the first selected number is smaller than the second selected number,
what is the probability that the third selected number is greater than
the first and second selected numbers?
In the preceding chapter, we discussed the distribution of a function of multiple random variables and investigated some of its special cases, such as the sum of independent random variables and the distribution of order statistics. In this chapter, we introduce some methods to calculate the expected value of a function of multiple random variables, followed by some examples, such as calculating the mean of a sum of random variables. Moreover, some of its applications, such as the calculation of covariance and correlation, are addressed.

Suppose that 𝑌 = 𝑔(𝑋1 , 𝑋2 ) is a function of the random variables 𝑋1 and 𝑋2 . To calculate the mean of 𝑌 = 𝑔(𝑋1 , 𝑋2 ), we can first obtain the distribution of 𝑌 in the same manner as in the preceding chapter and then use the common methods explained in Chapter 5. In other words, if 𝑌 is a discrete random variable, its expected value can be obtained as follows:

$$E(Y) = \sum_{y} y\,P_Y(y)$$
And if 𝑌 is a continuous random variable, its expected value can be obtained as follows:

$$E(Y) = \int_{y} y\,f_Y(y)\,dy$$

However, the expected value of such random variables can also be calculated by another method. If 𝑋1 and 𝑋2 are discrete random variables with joint probability function $P_{X_1,X_2}(x_1, x_2)$, then the expected value of the function 𝑌 = 𝑔(𝑋1 , 𝑋2 ) is calculated as follows:

$$E(Y) = E\left(g(X_1, X_2)\right) = \sum_{x_2}\sum_{x_1} g(x_1, x_2)\,P_{X_1,X_2}(x_1, x_2) \tag{2-1}$$

And if 𝑋1 and 𝑋2 are continuous random variables with joint density function $f_{X_1,X_2}(x_1, x_2)$, then the expected value of the function 𝑔(𝑋1 , 𝑋2 ) is equal to:

$$E(Y) = E\left(g(X_1, X_2)\right) = \int_{x_2}\int_{x_1} g(x_1, x_2)\,f_{X_1,X_2}(x_1, x_2)\,dx_1\,dx_2 \tag{2-2}$$

The proof of these propositions is similar to that of Proposition 4-1 of Chapter 5, associated with the expected value of a function of a single random variable, so we omit it here.

Example 2.1
Suppose that people A and B plan to meet each other. The arrival time of each person independently follows a uniform distribution on the interval from 12:00 to 13:00. If 𝑋1 denotes the arrival time of person A and 𝑋2 denotes the arrival time of person B, then it is desired to obtain:
a. The expected value of the difference between their arrival times, i.e., 𝐸(|𝑋1 − 𝑋2 |).
b. The expected value of the first person's arrival time, i.e., 𝐸(𝑀𝑖𝑛{𝑋1 , 𝑋2 }).
Solution.
a. We consider 12:00 to be the origin of time. Therefore, 𝑋1 and 𝑋2 are independent uniform random variables on the interval (0, 1).

$$f_{X_1,X_2}(x_1, x_2) = f_{X_1}(x_1) f_{X_2}(x_2) = \frac{1}{1-0} \times \frac{1}{1-0} = 1$$

$$E(|X_1 - X_2|) = \int_0^1\!\int_0^1 |x_1 - x_2|\,f_{X_1,X_2}(x_1, x_2)\,dx_1\,dx_2 = \int_0^1\!\int_0^{x_2}(x_2 - x_1)\,dx_1\,dx_2 + \int_0^1\!\int_{x_2}^1(x_1 - x_2)\,dx_1\,dx_2$$

$$= \int_0^1\left(x_2 x_1 - \frac{x_1^2}{2}\right)\Bigg|_0^{x_2}\,dx_2 + \int_0^1\left(\frac{x_1^2}{2} - x_1 x_2\right)\Bigg|_{x_2}^1\,dx_2 = \int_0^1 \frac{x_2^2}{2}\,dx_2 + \int_0^1\left(\frac{1}{2} - x_2 + \frac{x_2^2}{2}\right)dx_2$$

$$= \frac{x_2^3}{6}\Bigg|_0^1 + \left(\frac{x_2}{2} - \frac{x_2^2}{2} + \frac{x_2^3}{6}\right)\Bigg|_0^1 = \frac{1}{6} + \frac{1}{2} - \frac{1}{2} + \frac{1}{6} = \frac{1}{3}$$
b.

$$E(Min\{X_1, X_2\}) = \int_0^1\!\int_0^1 Min(x_1, x_2)\,f_{X_1,X_2}(x_1, x_2)\,dx_1\,dx_2 = \int_0^1\!\int_0^{x_2} x_1\,dx_1\,dx_2 + \int_0^1\!\int_{x_2}^1 x_2\,dx_1\,dx_2$$

$$= \int_0^1 \frac{x_2^2}{2}\,dx_2 + \int_0^1 x_2(1 - x_2)\,dx_2 = \frac{x_2^3}{6}\Bigg|_0^1 + \left(\frac{x_2^2}{2} - \frac{x_2^3}{3}\right)\Bigg|_0^1 = \frac{1}{6} + \frac{1}{2} - \frac{1}{3} = \frac{1}{3}$$
However, to solve part (b) using order statistics, we can first obtain the density function of 𝑋(1) = 𝑀𝑖𝑛{𝑋1 , 𝑋2 } and then calculate its expected value. Based on the results of Example 5.5 of Chapter 9, in this example 𝑋(1) = 𝑀𝑖𝑛{𝑋1 , 𝑋2 } follows a beta distribution with parameters (1, 2) and expected value $\frac{1}{1+2} = \frac{1}{3}$.
Moreover, to solve part (a), we have:

$$E(|X_1 - X_2|) = E(Max\{X_1, X_2\} - Min\{X_1, X_2\}) = E(X_{(2)} - X_{(1)}) = \frac{2}{2+1} - \frac{1}{1+2} = \frac{1}{3}$$

(Note that, according to the explanations of the preceding chapter, in this example 𝑀𝑎𝑥(𝑋1 , 𝑋2 ) follows a beta distribution with parameters (2, 1) and mean $\frac{2}{1+2} = \frac{2}{3}$.)
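Both answers are easy to confirm by simulation, as in the following sketch (assuming NumPy; the sample size and seed are our choices):

```python
import numpy as np

rng = np.random.default_rng(9)
x1, x2 = rng.random(1_000_000), rng.random(1_000_000)   # two independent U(0,1) samples
print(np.abs(x1 - x2).mean())                           # ≈ 1/3, matching part (a)
print(np.minimum(x1, x2).mean())                        # ≈ 1/3, matching part (b)
```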
Example 2.2
An urn contains two marbles labeled with the number 2 and two marbles labeled with the number 1. We select two marbles at random and without replacement from the urn. Then, we define 𝑋1 and 𝑋2 to denote the smaller and greater numbers among the selected marbles, respectively. It is desired to calculate:
a. 𝐸(𝑋1 𝑋2 )
b. 𝐸(𝑋1 )
c. 𝐸(𝑋2 )
Solution.
a.

$$g_1(X_1, X_2) = X_1 X_2 \Rightarrow E(X_1 X_2) = \sum_{x_2}\sum_{x_1} x_1 x_2\,P(x_1, x_2) = 1 \times 1 \times P(1,1) + 1 \times 2 \times P(1,2) + 2 \times 2 \times P(2,2)$$

$$= 1 \times \frac{\binom{2}{2}\binom{2}{0}}{\binom{4}{2}} + 2 \times \frac{\binom{2}{1}\binom{2}{1}}{\binom{4}{2}} + 4 \times \frac{\binom{2}{0}\binom{2}{2}}{\binom{4}{2}} = \frac{13}{6}$$
Note that (𝑥1 , 𝑥2 ) cannot adopt the values (2,1) since the value of 𝑥1
cannot be greater than that of 𝑥2 .
b. $g_2(X_1, X_2) = X_1 \Rightarrow E(g_2(X_1, X_2)) = E(X_1) = \sum_{x_2}\sum_{x_1} x_1 P(x_1, x_2) = 1 \times P(1,1) + 1 \times P(1,2) + 2 \times P(2,2) = \frac{7}{6}$
c. $g_3(X_1, X_2) = X_2 \Rightarrow E(g_3(X_1, X_2)) = E(X_2) = \sum_{x_2}\sum_{x_1} x_2 P(x_1, x_2) = 1 \times P(1,1) + 2 \times P(1,2) + 2 \times P(2,2) = \frac{11}{6}$

Proposition 2.1
If 𝑋1 and 𝑋2 are independent random variables, then for all the functions of 𝑔(𝑋1 )
and ℎ(𝑋2 ) we have:
𝐸[(𝑔(𝑋1 )ℎ(𝑋2 )] = 𝐸[𝑔(𝑋1 )]𝐸[ℎ(𝑋2 )]

Proof.
For the continuous case, we have:

$$E[g(X_1)h(X_2)] = \int_{x_2}\int_{x_1} g(x_1)h(x_2) f(x_1, x_2)\,dx_1\,dx_2 = \int_{x_2}\int_{x_1} g(x_1)h(x_2)\,f_{X_1}(x_1) f_{X_2}(x_2)\,dx_1\,dx_2$$

$$= \int_{x_1} g(x_1)\,f_{X_1}(x_1)\,dx_1 \int_{x_2} h(x_2)\,f_{X_2}(x_2)\,dx_2 = E[g(X_1)]\,E[h(X_2)]$$

Hence, if the value of 𝐸(𝑋1 𝑋2 ) or 𝐸(𝑋1²𝑋2²) is required in Example 2.1, we have:

$$E(X_1 X_2) = E(X_1)E(X_2) = \frac{1}{2} \times \frac{1}{2} = \frac{1}{4}$$

$$E(X_1^2 X_2^2) = E(X_1^2)E(X_2^2) = \int_0^1 x_1^2 f(x_1)\,dx_1 \int_0^1 x_2^2 f(x_2)\,dx_2 = \int_0^1 x_1^2\,dx_1 \int_0^1 x_2^2\,dx_2 = \frac{1}{3} \times \frac{1}{3} = \frac{1}{9}$$
However, to solve part (a) of Example 2.2, we cannot use Proposition 2.1, because the possible values of 𝑋1 and 𝑋2 are dependent there, and thus these two random variables are not independent.
The sum of random variables has many applications in probability theory and statistics. Some related results were discussed in the preceding chapter for the case in which the random variables are independent and identically distributed. In this section, we address some notes about the expected value of the sum of arbitrary random variables.
If 𝑋1 , . . . , 𝑋𝑛 are arbitrary random variables with finite expected values, then it can be shown that:

$$E(X_1 + X_2 + \cdots + X_n) = E(X_1) + E(X_2) + \cdots + E(X_n) \tag{3-1}$$

Proof.
For the continuous case, if 𝑋1 and 𝑋2 are continuous random variables with joint density function 𝑓(𝑥1 , 𝑥2 ), then we have:

$$E(X_1 + X_2) = \int_{-\infty}^{+\infty}\!\int_{-\infty}^{+\infty}(x_1 + x_2) f(x_1, x_2)\,dx_1\,dx_2 = \int_{-\infty}^{+\infty}\!\int_{-\infty}^{+\infty} x_1 f(x_1, x_2)\,dx_1\,dx_2 + \int_{-\infty}^{+\infty}\!\int_{-\infty}^{+\infty} x_2 f(x_1, x_2)\,dx_1\,dx_2 = E(X_1) + E(X_2)$$

The same method can be used for the discrete case, and the proposition then follows for the general case by induction. Note that independence and identical distribution of the random variables 𝑋1 , . . . , 𝑋𝑛 are not necessary: even if the 𝑋𝑖 's are not identically distributed or are dependent, this relationship is still valid.

Example 3.1
As shown in Chapter 9, if 𝑋1 , . . . , 𝑋𝑛 are independent Bernoulli trials with parameter 𝑝, then the random variable $X = \sum_{i=1}^{n} X_i$ follows a binomial distribution with parameters (𝑛, 𝑝). Considering the aforementioned note and using Relationship 3-1, obtain the expected value of the binomial distribution.
Solution.

$$X_i = \begin{cases} 1\,; & \text{if the } i\text{-th trial results in a success} \\ 0\,; & \text{otherwise} \end{cases}$$

$$X = \sum_{i=1}^{n} X_i \Rightarrow E(X) = E\!\left(\sum_{i=1}^{n} X_i\right) = \sum_{i=1}^{n} E(X_i)$$

$$E(X_i) = 1 \times p + 0 \times (1 - p) = p \;\Rightarrow\; E(X) = \sum_{i=1}^{n} E(X_i) = \sum_{i=1}^{n} p = np$$

Example 3.2
Obtain the expected value of the Erlang distribution with parameters (𝛼, 𝜆) using Relationship 3-1 and the expected value of the exponential distribution.
Solution. If 𝑋 follows an Erlang distribution with parameters (𝛼, 𝜆), then it can be written as $X = \sum_{i=1}^{\alpha} X_i$, where the 𝑋𝑖 's have exponential distributions with identical parameter 𝜆. Therefore, we have:

$$X = \sum_{i=1}^{\alpha} X_i \Rightarrow E(X) = E\!\left(\sum_{i=1}^{\alpha} X_i\right) = \sum_{i=1}^{\alpha} E(X_i) = \sum_{i=1}^{\alpha}\frac{1}{\lambda} = \frac{\alpha}{\lambda}$$

Likewise, using Relationship 3-1 and the fact that the sum of 𝑟 independent geometric random variables with identical parameter 𝑝 follows a negative binomial distribution with parameters 𝑟 and 𝑝, it can be shown that the expected value of the negative binomial distribution is equal to $\frac{r}{p}$.
In many cases, the goal is to calculate the number of successes in a series of
trials. If the trials are independent and each has the same probability of success, then
the number of their successes follows the binomial distribution. However, if the trials
are not independent or each has a different probability of success, then the number
of successes in these trials does not follow the binomial distribution. It can be shown that, regardless of the independence or dependence of the trials and of whether their probabilities of success are equal, the expected value of the number of successes in a series of trials is equal to the sum of the probabilities of success of the individual trials.

If 𝑋1 , . . . , 𝑋𝑛 are Bernoulli trials with respective parameters 𝑝1 , . . . , 𝑝𝑛 , then the random variable $X = \sum_{i=1}^{n} X_i$ denotes the number of successes in these trials.

To demonstrate this note, suppose that we have 𝑛 Bernoulli trials with respective probabilities of success 𝑝1 , 𝑝2 , . . . , 𝑝𝑛 . We define Bernoulli random variables 𝑋1 , 𝑋2 , . . . , 𝑋𝑛 for these trials with respective probabilities of success 𝑝1 , 𝑝2 , . . . , 𝑝𝑛 . Each of the random variables 𝑋1 , 𝑋2 , . . . , 𝑋𝑛 adopts the value 1 if its corresponding trial results in a success; otherwise, it takes on the value zero. In this case, the random variable $X = \sum_{i=1}^{n} X_i$ counts the trials adopting the value 1; that is, it equals the number of successes in the 𝑛 trials. Now, using Relationship 3-1, the expected value of 𝑋 can be obtained as follows:

$$X = \sum_{i=1}^{n} X_i \Rightarrow E(X) = E\!\left(\sum_{i=1}^{n} X_i\right) = E(X_1) + E(X_2) + \cdots + E(X_n) = p_1 + p_2 + \cdots + p_n$$

Therefore, the expected value of the number of successes in a collection of Bernoulli trials is equal to the sum of their success probabilities.

Example 3.3
In the matching problem, a group of 𝑛 people throw their hats into the middle of a room, and then each person picks one hat at random. What is the expected value of the number of individuals who pick their own hats?
Solution. If 𝑋 denotes the number of people who pick their own hats, then 𝑋 can be written as the sum of 𝑛 Bernoulli random variables:

$$X_i = \begin{cases} 1\,; & \text{if the } i\text{-th person picks his own hat} \\ 0\,; & \text{otherwise} \end{cases}$$

$$X = \sum_{i=1}^{n} X_i \Rightarrow E(X) = \sum_{i=1}^{n} E(X_i)$$

$$E(X_i) = P(X_i = 1) = \frac{1}{n} \Rightarrow E(X) = \sum_{i=1}^{n} E(X_i) = \sum_{i=1}^{n}\frac{1}{n} = 1$$

We should note two points when solving this problem. Firstly, according to the explanations of Section 6.6.3 in Chapter 6, the random variables 𝑋1 , 𝑋2 , . . . , 𝑋𝑛 are not independent in this problem. Secondly, for a Bernoulli random variable we always have 𝐸(𝑋𝑖 ) = 𝑃(𝑋𝑖 = 1). Therefore, the expected value of the number of successes in these trials is equal to the sum of the success probabilities of the trials, or $\sum_{i=1}^{n}\frac{1}{n} = 1$.

However, Example 3.3 can also be solved using the primary definition of the expected value. Using the information provided in Chapter 2, we know that the probability function of the random variable 𝑋 is equal to:

$$P(X = i) = \frac{\frac{1}{2!} - \frac{1}{3!} + \cdots + (-1)^{n-i}\frac{1}{(n-i)!}}{i!}$$

And thus the expected value of 𝑋 is equal to:

$$E(X) = \sum_{i=0}^{n} i \times P(X = i) = \sum_{i=0}^{n} i \times \frac{\frac{1}{0!} - \frac{1}{1!} + \frac{1}{2!} - \frac{1}{3!} + \cdots + (-1)^{n-i}\frac{1}{(n-i)!}}{i!}$$
It is evident that the first solution is computationally far less expensive; this is the main advantage of using Relationship 3-1. Furthermore, when the trials are dependent, as in this example, obtaining the probability function of the number of successes is not always straightforward. This makes Relationship 3-1 easier to apply than the conventional method of Chapter 5, namely 𝐸(𝑋) = ∑𝑥 𝑥 𝑃(𝑋 = 𝑥), for calculating the expected value of the number of successes in 𝑛 trials.
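The following simulation sketch of the matching problem (assuming NumPy; 𝑛 = 10, the seed, and the number of trials are our choices) illustrates that the mean number of matches is 1 regardless of 𝑛:

```python
import numpy as np

rng = np.random.default_rng(10)
n, trials = 10, 200_000
# Each row of perms is a uniformly random permutation of 0..n-1 (a hat assignment)
perms = np.argsort(rng.random((trials, n)), axis=1)
matches = (perms == np.arange(n)).sum(axis=1)   # people who picked their own hat
print(matches.mean())                           # ≈ 1, for any value of n
```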
Example 3.4
Suppose that we have 𝑛 marbles and want to distribute them among 𝑟 urns at random. What is the expected number of empty urns?
Solution. If 𝑋 denotes the number of empty urns, then 𝑋 can be written as the sum of 𝑟 Bernoulli random variables (such Bernoulli random variables are called indicator variables):

$$X_i = \begin{cases} 1\,; & \text{if the } i\text{-th urn is empty} \\ 0\,; & \text{otherwise} \end{cases}$$

$$X = \sum_{i=1}^{r} X_i$$

$$E(X_i) = P(X_i = 1) = \frac{(r-1)^n}{r^n} = \left(\frac{r-1}{r}\right)^n \Rightarrow E(X) = \sum_{i=1}^{r} E(X_i) = \sum_{i=1}^{r}\left(\frac{r-1}{r}\right)^n = r\left(\frac{r-1}{r}\right)^n$$

Once again, in this problem, the expected value of the number of empty urns is equal to the sum of the success probabilities of the 𝑟 urns, or $\sum_{i=1}^{r}\left(\frac{r-1}{r}\right)^n = r\left(\frac{r-1}{r}\right)^n$.

Example 3.5
If 10 couples sit at a round table, it is desired to calculate the mean number of men who sit next to their spouses.
Solution. We define 𝑋 to denote the number of couples who sit next to each other:

$$X_i = \begin{cases} 1\,; & \text{if the } i\text{-th couple sit next to each other} \\ 0\,; & \text{otherwise} \end{cases}$$

$$X = \sum_{i=1}^{10} X_i\,, \qquad E(X_i) = P(X_i = 1) = \frac{2 \times 18!}{19!} = \frac{2}{19}$$

$$\Rightarrow E(X) = \sum_{i=1}^{10} E(X_i) = 10 \times \frac{2}{19} = \frac{20}{19}$$

In this problem, the expected value of the number of successes among the 10 couples is equal to the sum of each couple's success probability, or $\sum_{i=1}^{10}\frac{2}{19} = \frac{20}{19}$.

Example 3.6
Using Relationship 3-1 and the definition of appropriate indicator variables, obtain the expected value of the hypergeometric random variable.
Solution. We select a sample of size 𝑛 from an urn with 𝑁 parts, 𝑚 of which are defective. Suppose that 𝑋 denotes the number of defective parts in the sample. Two kinds of indicator variables can be defined to solve this problem:

$$X_i = \begin{cases} 1\,; & \text{if the } i\text{-th selected part is defective} \\ 0\,; & \text{otherwise} \end{cases}$$

In Chapter 3, we showed that the probability that the 𝑖-th selected part is defective is equal to $\frac{m}{N}$. Hence:

$$E(X_i) = P(X_i = 1) = \frac{m}{N} \Rightarrow E(X) = \sum_{i=1}^{n} E(X_i) = n\,\frac{m}{N}$$

The second solution: The defective parts are numbered from 1 to 𝑚.

𝑌𝑗 = { 1 ; if the 𝑗th defective part is selected
       0 ; otherwise

𝑋 = ∑_{𝑗=1}^{𝑚} 𝑌𝑗

𝐸(𝑌𝑗) = 𝑃(𝑌𝑗 = 1) = (1 choose 1)(𝑁 − 1 choose 𝑛 − 1)/(𝑁 choose 𝑛) = 𝑛/𝑁 ⇒ 𝐸(𝑋) = ∑_{𝑗=1}^{𝑚} 𝐸(𝑌𝑗) = ∑_{𝑗=1}^{𝑚} 𝑛/𝑁 = 𝑚𝑛/𝑁
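Both solutions give 𝐸(𝑋) = 𝑛𝑚/𝑁, which a short simulation can confirm. In the Python sketch below, the values 𝑁 = 20, 𝑚 = 5, 𝑛 = 3 are arbitrary illustrative choices:

import random

def mean_defectives(N, m, n, trials=100_000):
    # 1 marks a defective part; sample n parts without replacement
    # and count the defectives in the sample.
    parts = [1] * m + [0] * (N - m)
    total = 0
    for _ in range(trials):
        total += sum(random.sample(parts, n))
    return total / trials

N, m, n = 20, 5, 3
print("simulated:", mean_defectives(N, m, n))
print("formula:  ", n * m / N)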

We discussed the expected value of a function of multiple random variables and solved several related examples. If 𝑋 and 𝑌 are two random variables, there is a special case of the expected value of a function of them, called the covariance between the two random variables 𝑋 and 𝑌. It has many applications in probability theory, and we will address some of them.
Definition
The covariance between two random variables 𝑋 and 𝑌 denoted by 𝐶𝑜𝑣(𝑋, 𝑌) is as
follows:
𝐶𝑜𝑣(𝑋, 𝑌) = 𝐸[(𝑋 − 𝜇𝑋 )(𝑌 − 𝜇𝑌 )]
where 𝜇𝑥 and 𝜇𝑌 are the expected values of 𝑋 and 𝑌.
Hence, for discrete random variables 𝑋 and 𝑌 with joint probability function
𝑃(𝑥, 𝑦) we have:

𝐶𝑜𝑣(𝑋, 𝑌) = ∑_𝑦 ∑_𝑥 (𝑥 − 𝜇𝑋)(𝑦 − 𝜇𝑌)𝑃(𝑥, 𝑦)

And for continuous random variables 𝑋 and 𝑌 with joint density function 𝑓(𝑥, 𝑦) we have:

𝐶𝑜𝑣(𝑋, 𝑌) = ∫_𝑦 ∫_𝑥 (𝑥 − 𝜇𝑋)(𝑦 − 𝜇𝑌)𝑓(𝑥, 𝑦) 𝑑𝑥 𝑑𝑦

Just as the expected value and variance provide information about the values of a single random variable, the covariance provides information about the relationship and joint variation of two random variables.
Expanding the covariance relationship between the two random variables, we
have:

𝐶𝑜𝑣(𝑋, 𝑌) = 𝐸(𝑋𝑌 − 𝑋𝜇𝑌 − 𝑌𝜇𝑋 + 𝜇𝑌𝜇𝑋) = 𝐸(𝑋𝑌) − 𝜇𝑌𝐸(𝑋) − 𝜇𝑋𝐸(𝑌) + 𝜇𝑋𝜇𝑌
= 𝐸(𝑋𝑌) − 𝜇𝑌𝜇𝑋 − 𝜇𝑋𝜇𝑌 + 𝜇𝑋𝜇𝑌 = 𝐸(𝑋𝑌) − 𝜇𝑋𝜇𝑌 = 𝐸(𝑋𝑌) − 𝐸(𝑋)𝐸(𝑌)
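The equality of the defining form 𝐸[(𝑋 − 𝜇𝑋)(𝑌 − 𝜇𝑌)] and the expanded form 𝐸(𝑋𝑌) − 𝐸(𝑋)𝐸(𝑌) can be verified numerically on any small joint probability function; the joint table in the Python sketch below is a made-up example, not one taken from the text:

# A hypothetical joint probability function P(x, y).
pmf = {(0, 0): 0.2, (0, 1): 0.3, (1, 0): 0.1, (1, 1): 0.4}

EX = sum(x * p for (x, y), p in pmf.items())
EY = sum(y * p for (x, y), p in pmf.items())
EXY = sum(x * y * p for (x, y), p in pmf.items())

# Defining form: sum of (x - mu_X)(y - mu_Y) P(x, y) over all (x, y).
cov_def = sum((x - EX) * (y - EY) * p for (x, y), p in pmf.items())
# Expanded form: E(XY) - E(X)E(Y).
cov_exp = EXY - EX * EY

print(cov_def, cov_exp)  # the two values coincide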

Proposition 4.1
If 𝑋 and 𝑌 are independent random variables, then 𝐶𝑜𝑣(𝑋, 𝑌) = 0.

Proof. Considering Proposition 2.1, we have:

𝐸(𝑋𝑌) = 𝐸(𝑋)𝐸(𝑌) ⇒ 𝐶𝑜𝑣(𝑋, 𝑌) = 𝐸(𝑋𝑌) − 𝐸(𝑋)𝐸(𝑌) = 𝐸(𝑋)𝐸(𝑌) − 𝐸(𝑋)𝐸(𝑌) = 0

The above proposition states that if the covariance between two random variables is not equal to zero, then these two random variables cannot be independent (why?).

The sign of the covariance of two variables conveys a kind of relationship between them. If large values of 𝑋 tend to occur with large values of 𝑌, and small values of 𝑋 with small values of 𝑌, then we expect the positive values of 𝑋 − 𝜇𝑋 to often accompany positive values of 𝑌 − 𝜇𝑌, and the negative values of 𝑋 − 𝜇𝑋 to accompany negative values of 𝑌 − 𝜇𝑌. If so, we expect the values of (𝑋 − 𝜇𝑋)(𝑌 − 𝜇𝑌) to often be positive. On the other hand, if large values of 𝑋 tend to occur with small values of 𝑌, and small values of 𝑋 with large values of 𝑌, then we expect the values of (𝑋 − 𝜇𝑋)(𝑌 − 𝜇𝑌) to often be negative. Hence, if the values of 𝑌, on average, increase as the value of 𝑋 increases, then the covariance between 𝑋 and 𝑌 is positive. In contrast, if the values of 𝑌, on average, decrease as the value of 𝑋 increases, then the covariance of 𝑋 and 𝑌 is negative.

Example 4.1

The random variables 𝑋 and 𝑌 have the following joint density function:

𝑓(𝑥, 𝑦) = { (2/𝑥)𝑒^{−2𝑥} ; 0 < 𝑥 < ∞, 0 < 𝑦 ≤ 𝑥
            0 ; otherwise
Obtain 𝐶𝑜𝑣(𝑋, 𝑌).

Solution.

𝐶𝑜𝑣(𝑋, 𝑌) = 𝐸(𝑋𝑌) − 𝐸(𝑋)𝐸(𝑌)

𝐸(𝑋) = ∫_0^∞ ∫_0^𝑥 𝑥 (2/𝑥)𝑒^{−2𝑥} 𝑑𝑦 𝑑𝑥 = ∫_0^∞ 2𝑥𝑒^{−2𝑥} 𝑑𝑥 = −𝑥𝑒^{−2𝑥} |_0^∞ + ∫_0^∞ 𝑒^{−2𝑥} 𝑑𝑥 = 0 + 1/2 = 1/2

𝐸(𝑌) = ∫_0^∞ ∫_0^𝑥 𝑦 (2/𝑥)𝑒^{−2𝑥} 𝑑𝑦 𝑑𝑥 = ∫_0^∞ 𝑥𝑒^{−2𝑥} 𝑑𝑥 = 1/4

𝐸(𝑋𝑌) = ∫_0^∞ ∫_0^𝑥 2𝑦𝑒^{−2𝑥} 𝑑𝑦 𝑑𝑥 = ∫_0^∞ 𝑥²𝑒^{−2𝑥} 𝑑𝑥 = −(𝑥²/2)𝑒^{−2𝑥} |_0^∞ + ∫_0^∞ 𝑥𝑒^{−2𝑥} 𝑑𝑥 = 0 + 1/4 = 1/4

⇒ 𝐶𝑜𝑣(𝑋, 𝑌) = 1/4 − (1/2) × (1/4) = 1/8

For simplicity, in calculating the above integrals, one can use the 𝑟th-moment relationship of the exponential distribution:

𝑋 ∼ 𝐸𝑥𝑝(𝜆) ⇒ 𝐸(𝑋^𝑟) = 𝑟!/𝜆^𝑟 ⇒ ∫_0^∞ 𝑥^𝑟 2𝑒^{−2𝑥} 𝑑𝑥 = 𝑟!/2^𝑟

Moreover, to calculate the integrals, we can apply the explanations of the gamma distribution density function mentioned in Chapter 7.

As seen, the covariance between 𝑋 and 𝑌 is positive in this example. In
addition, as the value of 𝑌 increases, the value of 𝑋, on average, increases as well.
However, prior to calculation, we could guess that the covariance of 𝑋 and 𝑌 is
positive. This is because the joint density function of these two variables adopts a
value greater than zero for the values of 𝑦 ≤ 𝑥. Besides, since the relationship 𝑦 ≤ 𝑥
is valid, it is expected that as 𝑌 increases, 𝑋, on average, increases as well.

Example 4.2

If the joint density function of 𝑋 and 𝑌 is defined as follows:

𝑓(𝑥, 𝑦) = { 1 ; −𝑦 < 𝑥 < 𝑦, 0 ≤ 𝑦 ≤ 1
            0 ; otherwise
then what is the covariance between 𝑋 and 𝑌?

Solution. It is evident that 𝑋 and 𝑌 are not independent. This is because if 𝑌 adopts
any value like 𝑦, then 𝑋 should be within interval (−𝑦, 𝑦). Nonetheless, it can be shown
that the covariance between 𝑋 and 𝑌 is zero.

𝐸(𝑋) = ∫_0^1 ∫_{−𝑦}^{𝑦} 𝑥 𝑑𝑥 𝑑𝑦 = ∫_0^1 (𝑥²/2) |_{−𝑦}^{𝑦} 𝑑𝑦 = 0

𝐸(𝑌) = ∫_0^1 ∫_{−𝑦}^{𝑦} 𝑦 𝑑𝑥 𝑑𝑦 = ∫_0^1 (𝑥𝑦) |_{𝑥=−𝑦}^{𝑥=𝑦} 𝑑𝑦 = ∫_0^1 2𝑦² 𝑑𝑦 = 2/3

𝐸(𝑋𝑌) = ∫_0^1 ∫_{−𝑦}^{𝑦} 𝑥𝑦 𝑑𝑥 𝑑𝑦 = ∫_0^1 ((𝑥²/2)𝑦) |_{𝑥=−𝑦}^{𝑥=𝑦} 𝑑𝑦 = 0

⇒ 𝐶𝑜𝑣(𝑋, 𝑌) = 𝐸(𝑋𝑌) − 𝐸(𝑋)𝐸(𝑌) = 0 − 0 × (2/3) = 0
3

As seen in the above example, the covariance between these two variables is equal to zero. However, since their limits depend on each other, these variables are not independent. Two variables with zero covariance are called uncorrelated. It should be noted that if 𝑋 and 𝑌 are independent random variables, then the covariance between 𝑋 and 𝑌 is always zero. Nonetheless, the converse is not necessarily valid: if the covariance between two random variables is zero, they can be either independent or dependent.
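A simulation makes this distinction concrete. The Python sketch below samples from the joint density of Example 4.2 (the marginal of 𝑌 is 𝑓𝑌(𝑦) = 2𝑦, and given 𝑌 = 𝑦, 𝑋 is uniform on (−𝑦, 𝑦)); the estimated 𝐶𝑜𝑣(𝑋, 𝑌) is near zero, while a quantity such as 𝐶𝑜𝑣(𝑋², 𝑌) is clearly nonzero, exposing the dependence:

import random

def sample():
    # Marginal of Y has density 2y on (0,1), sampled as sqrt(U);
    # given Y = y, X is uniform on (-y, y).
    y = random.random() ** 0.5
    x = random.uniform(-y, y)
    return x, y

n = 200_000
pts = [sample() for _ in range(n)]
mx = sum(x for x, y in pts) / n
my = sum(y for x, y in pts) / n
cov_xy = sum((x - mx) * (y - my) for x, y in pts) / n
mx2 = sum(x * x for x, y in pts) / n
cov_x2y = sum((x * x - mx2) * (y - my) for x, y in pts) / n

print("Cov(X, Y)  ≈", cov_xy)    # close to 0: uncorrelated
print("Cov(X², Y) ≈", cov_x2y)   # clearly nonzero: dependent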

In what follows, we mention some important relationships about covariance:

𝐶𝑜𝑣(𝑋, 𝑎) = 0 (4-1)

Proof.
𝐶𝑜𝑣(𝑋, 𝑎) = 𝐸(𝑎𝑋) − 𝐸(𝑎)𝐸(𝑋) = 𝑎𝐸(𝑋) − 𝑎𝐸(𝑋) = 0

𝐶𝑜𝑣(𝑋, 𝑌) = 𝐶𝑜𝑣(𝑌, 𝑋) (4-2)

𝐶𝑜𝑣(𝑋, 𝑋) = 𝑉𝑎𝑟(𝑋) (4-3)

Proof.
𝐶𝑜𝑣(𝑋, 𝑋) = 𝐸[(𝑋 − 𝜇𝑥 )(𝑋 − 𝜇𝑥 )] = 𝐸[(𝑋 − 𝜇𝑥 )2 ] = 𝑉𝑎𝑟 (𝑋)

𝐶𝑜𝑣(𝑎𝑋, 𝑌) = 𝑎𝐶𝑜𝑣(𝑋, 𝑌) (4-4)

Proof.
𝐶𝑜𝑣(𝑎𝑋, 𝑌) = 𝐸[(𝑎𝑋 − 𝑎𝜇𝑥 )(𝑌 − 𝜇𝑌 )] = 𝑎𝐸[(𝑋 − 𝜇𝑥 )(𝑌 − 𝜇𝑌 )] = 𝑎𝐶𝑜𝑣(𝑋, 𝑌)

𝐶𝑜𝑣(𝑎𝑋, 𝑏𝑌) = 𝑎𝑏𝐶𝑜𝑣(𝑋, 𝑌) (4-5)

𝐶𝑜𝑣(𝑋, 𝑌1 + 𝑌2 ) = 𝐶𝑜𝑣(𝑋, 𝑌1 ) + 𝐶𝑜𝑣(𝑋, 𝑌2 ) (4-6)


Proof.
𝐶𝑜𝑣(𝑋, 𝑌1 + 𝑌2 ) = 𝐸[(𝑋 − 𝜇𝑥 )[𝑌1 + 𝑌2 − (𝜇𝑌1 + 𝜇𝑌2 )]]
= 𝐸[(𝑋 − 𝜇𝑥 )[(𝑌1 − 𝜇𝑌1 ) + (𝑌2 − 𝜇𝑌2 )]] = 𝐸[(𝑋 − 𝜇𝑥 )(𝑌1 − 𝜇𝑌1 )] + 𝐸[(𝑋 − 𝜇𝑥 )(𝑌2 − 𝜇𝑌2 )]

= 𝐶𝑜𝑣(𝑋, 𝑌1 ) + 𝐶𝑜𝑣(𝑋, 𝑌2 )

𝐶𝑜𝑣(𝑋1 + 𝑋2 , 𝑌1 + 𝑌2 ) = 𝐶𝑜𝑣(𝑋1 , 𝑌1 ) + 𝐶𝑜𝑣(𝑋1 , 𝑌2 ) + 𝐶𝑜𝑣(𝑋2 , 𝑌1 ) + 𝐶𝑜𝑣(𝑋2 , 𝑌2 ) (4-7)

In the same manner as the proof of the relationship (4.6), expand the left-side
expression of the relationship to get the right-side expression.

𝐶𝑜𝑣(∑_{𝑖=1}^{𝑛} 𝑋𝑖 , ∑_{𝑗=1}^{𝑚} 𝑌𝑗) = ∑_{𝑖=1}^{𝑛} ∑_{𝑗=1}^{𝑚} 𝐶𝑜𝑣(𝑋𝑖 , 𝑌𝑗)   (4-8)

This relationship is the generalization of the relationship (4.7) and its proof is
similar to the relationship (4.7).
𝐶𝑜𝑣(∑_{𝑖=1}^{𝑛} 𝑎𝑖𝑋𝑖 , ∑_{𝑗=1}^{𝑚} 𝑏𝑗𝑌𝑗) = ∑_{𝑖=1}^{𝑛} ∑_{𝑗=1}^{𝑚} 𝑎𝑖𝑏𝑗 𝐶𝑜𝑣(𝑋𝑖 , 𝑌𝑗)   (4-9)

To prove this relationship, apply the relationships (4.5) and (4.8).

𝑉𝑎𝑟(∑_{𝑖=1}^{𝑛} 𝑋𝑖) = ∑_{𝑖=1}^{𝑛} 𝑉𝑎𝑟(𝑋𝑖) + 2 ∑∑_{𝑖<𝑗} 𝐶𝑜𝑣(𝑋𝑖 , 𝑋𝑗)   (4-10)

Proof.
𝑉𝑎𝑟(∑_{𝑖=1}^{𝑛} 𝑋𝑖) = 𝐶𝑜𝑣(𝑋1 + 𝑋2 + ⋯ + 𝑋𝑛 , 𝑋1 + 𝑋2 + ⋯ + 𝑋𝑛)
= 𝐶𝑜𝑣(𝑋1 , 𝑋1 ) + 𝐶𝑜𝑣(𝑋1 , 𝑋2 ) + ⋯ + 𝐶𝑜𝑣(𝑋1 , 𝑋𝑛 )
+𝐶𝑜𝑣(𝑋2 , 𝑋1 ) + 𝐶𝑜𝑣(𝑋2 , 𝑋2 ) + ⋯ + 𝐶𝑜𝑣(𝑋2 , 𝑋𝑛 )
⋮ ⋮
+𝐶𝑜𝑣(𝑋𝑛 , 𝑋1 ) + 𝐶𝑜𝑣(𝑋𝑛 , 𝑋2 ) + ⋯ + 𝐶𝑜𝑣(𝑋𝑛 , 𝑋𝑛 )

Considering 𝐶𝑜𝑣(𝑋1 , 𝑋1 ) = 𝑉𝑎𝑟(𝑋1 ) and 𝐶𝑜𝑣(𝑋1 , 𝑋2 ) = 𝐶𝑜𝑣(𝑋2, 𝑋1 ), we have:

𝑉𝑎𝑟(∑_{𝑖=1}^{𝑛} 𝑋𝑖) = ∑_{𝑖=1}^{𝑛} 𝑉𝑎𝑟(𝑋𝑖) + ∑∑_{𝑖≠𝑗} 𝐶𝑜𝑣(𝑋𝑖 , 𝑋𝑗) = ∑_{𝑖=1}^{𝑛} 𝑉𝑎𝑟(𝑋𝑖) + 2 ∑∑_{𝑖<𝑗} 𝐶𝑜𝑣(𝑋𝑖 , 𝑋𝑗)

Note that the number of terms of ∑∑_{𝑖≠𝑗} 𝐶𝑜𝑣(𝑋𝑖 , 𝑋𝑗), or equivalently of 2 ∑∑_{𝑖<𝑗} 𝐶𝑜𝑣(𝑋𝑖 , 𝑋𝑗), is equal to 𝑛² − 𝑛, or 2(𝑛 choose 2). (Why?)

𝑉𝑎𝑟(∑_{𝑖=1}^{𝑛} 𝑎𝑖𝑋𝑖) = ∑_{𝑖=1}^{𝑛} 𝑎𝑖² 𝑉𝑎𝑟(𝑋𝑖) + ∑∑_{𝑖≠𝑗} 𝑎𝑖𝑎𝑗 𝐶𝑜𝑣(𝑋𝑖 , 𝑋𝑗)   (4-11)
To prove this relationship, apply the relationships (4.5) and (4.10).

𝐶𝑜𝑣(𝑎𝑋 + 𝑏, 𝑐𝑌 + 𝑑) = 𝑎𝑐 𝐶𝑜𝑣(𝑋, 𝑌) (4-12)

To prove this relationship, use the relationships (4.1), (4.5), and (4.7).

Example 4.3

Suppose that the random variables 𝑋, 𝑌, and 𝑍 have the following variances
and covariances:
𝜎𝑋² = 8,  𝜎𝑌² = 12,  𝜎𝑍² = 18
𝐶𝑜𝑣(𝑋, 𝑌) = 1,  𝐶𝑜𝑣(𝑋, 𝑍) = −3,  𝐶𝑜𝑣(𝑌, 𝑍) = 2

Then it is desired to calculate the covariance between the two variables 𝑈 = 𝑋 + 4𝑌 + 2𝑍 and 𝑉 = 3𝑋 − 𝑌 − 𝑍.
Solution.
𝐶𝑜𝑣(𝑈, 𝑉) = 𝐶𝑜𝑣(𝑋 + 4𝑌 + 2𝑍, 3𝑋 − 𝑌 − 𝑍)
= 3𝑉𝑎𝑟(𝑋) − 4𝑉𝑎𝑟(𝑌) − 2𝑉𝑎𝑟(𝑍) + 11𝐶𝑜𝑣(𝑋, 𝑌) + 5𝐶𝑜𝑣(𝑋, 𝑍) − 6𝐶𝑜𝑣(𝑌, 𝑍)
= 24 − 48 − 36 + 11 − 15 − 12 = −76
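Example 4.3 fixes only the variances and covariances, so any joint distribution matching them can serve for a numerical check. The sketch below uses NumPy and a multivariate normal with zero means, both our own arbitrary choices:

import numpy as np

# Covariance matrix of (X, Y, Z) with the values given in Example 4.3.
cov = np.array([[ 8.0,  1.0, -3.0],
                [ 1.0, 12.0,  2.0],
                [-3.0,  2.0, 18.0]])

rng = np.random.default_rng(0)
x, y, z = rng.multivariate_normal(np.zeros(3), cov, size=500_000).T

u = x + 4 * y + 2 * z
v = 3 * x - y - z
print("simulated Cov(U, V):", np.cov(u, v)[0, 1])  # close to -76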

Example 4.4

Obtain the covariance between the number of successes and the number of failures in 𝑛 independent Bernoulli trials with identical parameter 𝑝.
Solution. If we define 𝑋 to denote the number of successes in 𝑛 independent
Bernoulli trials with identical parameter 𝑝 (following the binomial distribution), then
the number of failures in these 𝑛 trials can be denoted by 𝑛 − 𝑋. Therefore, we have:

𝐶𝑜𝑣(𝑋, 𝑛 − 𝑋) = 𝐶𝑜𝑣(𝑋, 𝑛) − 𝐶𝑜𝑣(𝑋, 𝑋) = 0 − 𝑉𝑎𝑟(𝑋) = −𝑛𝑝(1 − 𝑝)


It is evident that as the number of successes increases in 𝑛 trials, the number
of failures in the trials decreases. Hence, as seen in the above equation, the
covariance between the two variables is negative.

Example 4.5

Obtain the variance of the binomial random variable 𝑋 by writing it as the sum of indicator variables.
Solution.
𝑋𝑖 = { 1 ; if the 𝑖th trial results in a success
       0 ; otherwise

𝑋 ∼ 𝐵(𝑛, 𝑝),  𝑋 = ∑_{𝑖=1}^{𝑛} 𝑋𝑖

Since each of the 𝑋𝑖 's independently follows a Bernoulli distribution with parameter 𝑝, we have:

𝑉𝑎𝑟(𝑋𝑖) = 𝑃(𝑋𝑖 = 1)[1 − 𝑃(𝑋𝑖 = 1)] = 𝑝(1 − 𝑝)

⇒ 𝑉𝑎𝑟(𝑋) = ∑_{𝑖=1}^{𝑛} 𝑉𝑎𝑟(𝑋𝑖) + ∑∑_{𝑖≠𝑗} 𝐶𝑜𝑣(𝑋𝑖 , 𝑋𝑗) = 𝑛𝑉𝑎𝑟(𝑋𝑖) + 0 = 𝑛𝑝(1 − 𝑝)

Similar to the preceding example, using Relationship (4.10) and the fact that the sum of 𝑟 independent geometric random variables with identical parameter 𝑝 follows a negative binomial distribution with parameters (𝑟, 𝑝), we can show that the variance of the negative binomial distribution is equal to 𝑟(1 − 𝑝)/𝑝². Moreover, given that the sum of 𝛼 independent exponential random variables with identical parameter 𝜆 follows an Erlang distribution with parameters (𝛼, 𝜆), we can show that the variance of the Erlang distribution is equal to 𝛼/𝜆².
In some cases, 𝑋𝑖 's follow the same distribution. If so, the variance of all the
𝑋𝑖 's are equal. Moreover, the covariance between 𝑋𝑖 and 𝑋𝑗 will be the same for all the
values of 𝑖 ≠ 𝑗.
In these conditions, we have:
𝑉𝑎𝑟(∑_{𝑖=1}^{𝑛} 𝑋𝑖) = 𝑛𝑉𝑎𝑟(𝑋𝑖) + 2(𝑛 choose 2) 𝐶𝑜𝑣(𝑋𝑖 , 𝑋𝑗)

Example 4.6

In the matching problem, obtain the variance of the number of people who
pick their own hats.

Solution. We define the variables 𝑋 and 𝑋1, 𝑋2, . . . , 𝑋𝑛 in the same manner as Example 3.3 of this chapter.

𝑋 = ∑_{𝑖=1}^{𝑛} 𝑋𝑖 ⇒ 𝑉𝑎𝑟(𝑋) = ∑_{𝑖=1}^{𝑛} 𝑉𝑎𝑟(𝑋𝑖) + 2 ∑∑_{𝑖<𝑗} 𝐶𝑜𝑣(𝑋𝑖 , 𝑋𝑗)

where each of the 𝑋𝑖 's follows a Bernoulli distribution. Therefore, we have:

𝑉𝑎𝑟(𝑋𝑖) = 𝑃(𝑋𝑖 = 1)[1 − 𝑃(𝑋𝑖 = 1)] = (1/𝑛)(1 − 1/𝑛)
𝐶𝑜𝑣(𝑋𝑖 , 𝑋𝑗) = 𝐸(𝑋𝑖𝑋𝑗) − 𝐸(𝑋𝑖)𝐸(𝑋𝑗)

Note that 𝑋𝑖𝑋𝑗 also follows a Bernoulli distribution and adopts the value of 1 only when both 𝑋𝑖 and 𝑋𝑗 are equal to 1.

𝑋𝑖𝑋𝑗 = { 1 ; if the 𝑖th and 𝑗th people pick their own hats
         0 ; otherwise

𝐸(𝑋𝑖𝑋𝑗) = 1 × 𝑃(𝑋𝑖𝑋𝑗 = 1) + 0 × 𝑃(𝑋𝑖𝑋𝑗 = 0) = 𝑃(𝑋𝑖 = 1, 𝑋𝑗 = 1) = 𝑃(𝑋𝑖 = 1)𝑃(𝑋𝑗 = 1 | 𝑋𝑖 = 1) = (1/𝑛) × 1/(𝑛 − 1)

⇒ 𝐶𝑜𝑣(𝑋𝑖 , 𝑋𝑗) = 𝐸(𝑋𝑖𝑋𝑗) − 𝐸(𝑋𝑖)𝐸(𝑋𝑗) = 1/(𝑛(𝑛 − 1)) − 1/𝑛²

⇒ 𝑉𝑎𝑟(𝑋) = ∑_{𝑖=1}^{𝑛} 𝑉𝑎𝑟(𝑋𝑖) + 2 ∑∑_{𝑖<𝑗} 𝐶𝑜𝑣(𝑋𝑖 , 𝑋𝑗) = 𝑛(1/𝑛)(1 − 1/𝑛) + 2(𝑛 choose 2)[1/(𝑛(𝑛 − 1)) − 1/𝑛²] = 1
Therefore, in addition to the expected value of the number of matches, its variance is also equal to 1. This result is in line with our expectation, since we showed in Chapter 6 that for large values of 𝑛 the number of matches approximately follows a Poisson distribution with parameter 𝜆 = 1; hence, the mean and variance of this approximate distribution are both equal to 1.
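A simulation of the matching problem confirms both moments at once; in the Python sketch below, 𝑛 = 10 people is an arbitrary choice:

import random

def simulate_matches(n, trials=100_000):
    # Shuffle the hats; person i gets his own hat when hats[i] == i.
    counts = []
    hats = list(range(n))
    for _ in range(trials):
        random.shuffle(hats)
        counts.append(sum(1 for i, h in enumerate(hats) if i == h))
    mean = sum(counts) / trials
    var = sum((c - mean) ** 2 for c in counts) / trials
    return mean, var

mean, var = simulate_matches(10)
print("mean ≈", mean, " variance ≈", var)  # both close to 1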

Example 4.7

In Example 3.6, obtain the variance of the hypergeometric distribution by writing it as the sum of indicator random variables.

Solution. We define the variables 𝑋 and 𝑋1, 𝑋2, . . . , 𝑋𝑛 in the same manner as Example 3.6 of this chapter.

𝑋 = ∑_{𝑖=1}^{𝑛} 𝑋𝑖 ⇒ 𝑉𝑎𝑟(𝑋) = ∑_{𝑖=1}^{𝑛} 𝑉𝑎𝑟(𝑋𝑖) + 2 ∑∑_{𝑖<𝑗} 𝐶𝑜𝑣(𝑋𝑖 , 𝑋𝑗)

where each of the 𝑋𝑖 's follows a Bernoulli distribution. Therefore, we have:

𝑉𝑎𝑟(𝑋𝑖) = 𝑃(𝑋𝑖 = 1)[1 − 𝑃(𝑋𝑖 = 1)] = (𝑚/𝑁)(1 − 𝑚/𝑁)
𝐶𝑜𝑣(𝑋𝑖 , 𝑋𝑗) = 𝐸(𝑋𝑖𝑋𝑗) − 𝐸(𝑋𝑖)𝐸(𝑋𝑗)

Note that 𝑋𝑖𝑋𝑗 also follows a Bernoulli distribution and adopts the value of 1 only when both 𝑋𝑖 and 𝑋𝑗 are equal to 1.

𝑋𝑖𝑋𝑗 = { 1 ; if the 𝑖th and 𝑗th choices result in successes
         0 ; otherwise

𝐸(𝑋𝑖𝑋𝑗) = 1 × 𝑃(𝑋𝑖𝑋𝑗 = 1) + 0 × 𝑃(𝑋𝑖𝑋𝑗 = 0) = 𝑃(𝑋𝑖 = 1, 𝑋𝑗 = 1) = 𝑃(𝑋𝑖 = 1)𝑃(𝑋𝑗 = 1 | 𝑋𝑖 = 1) = (𝑚/𝑁) × (𝑚 − 1)/(𝑁 − 1)

⇒ 𝐶𝑜𝑣(𝑋𝑖 , 𝑋𝑗) = 𝐸(𝑋𝑖𝑋𝑗) − 𝐸(𝑋𝑖)𝐸(𝑋𝑗) = 𝑚(𝑚 − 1)/(𝑁(𝑁 − 1)) − 𝑚²/𝑁²

⇒ 𝑉𝑎𝑟(𝑋) = ∑_{𝑖=1}^{𝑛} 𝑉𝑎𝑟(𝑋𝑖) + 2 ∑∑_{𝑖<𝑗} 𝐶𝑜𝑣(𝑋𝑖 , 𝑋𝑗) = 𝑛(𝑚/𝑁)(1 − 𝑚/𝑁) + 2(𝑛 choose 2)[𝑚(𝑚 − 1)/(𝑁(𝑁 − 1)) − 𝑚²/𝑁²] = 𝑛(𝑚/𝑁)(1 − 𝑚/𝑁)(𝑁 − 𝑛)/(𝑁 − 1)

Example 4.8

In Example 3.4 of this chapter, obtain the variance of the number of empty
urns.

Solution.
𝑉𝑎𝑟(𝑋) = 𝑉𝑎𝑟(∑_{𝑖=1}^{𝑟} 𝑋𝑖) = ∑_{𝑖=1}^{𝑟} 𝑉𝑎𝑟(𝑋𝑖) + 2 ∑∑_{𝑖<𝑗} 𝐶𝑜𝑣(𝑋𝑖 , 𝑋𝑗)

𝑉𝑎𝑟(𝑋𝑖) = 𝑃(𝑋𝑖 = 1)[1 − 𝑃(𝑋𝑖 = 1)] = ((𝑟 − 1)/𝑟)^𝑛 (1 − ((𝑟 − 1)/𝑟)^𝑛) = (1 − 1/𝑟)^𝑛 − (1 − 1/𝑟)^{2𝑛}

𝐶𝑜𝑣(𝑋𝑖 , 𝑋𝑗) = 𝐸(𝑋𝑖𝑋𝑗) − 𝐸(𝑋𝑖)𝐸(𝑋𝑗)

𝑋𝑖𝑋𝑗 adopts only the values of 0 and 1; hence, it follows a Bernoulli distribution. Therefore, we have:

𝐸(𝑋𝑖𝑋𝑗) = 𝑃(𝑋𝑖𝑋𝑗 = 1) = 𝑃(𝑋𝑖 = 1, 𝑋𝑗 = 1) = (𝑟 − 2)^𝑛/𝑟^𝑛 = (1 − 2/𝑟)^𝑛

⇒ 𝐶𝑜𝑣(𝑋𝑖 , 𝑋𝑗) = (1 − 2/𝑟)^𝑛 − [(1 − 1/𝑟)^𝑛]²

⇒ 𝑉𝑎𝑟(𝑋) = 𝑟[(1 − 1/𝑟)^𝑛 − (1 − 1/𝑟)^{2𝑛}] + 2(𝑟 choose 2)[(1 − 2/𝑟)^𝑛 − (1 − 1/𝑟)^{2𝑛}]

Example 4.9

In Example 3.5, obtain the variance of the number of men who sit next to their spouses.
Solution.
𝑋 = ∑_{𝑖=1}^{10} 𝑋𝑖 ⇒ 𝑉𝑎𝑟(𝑋) = ∑_{𝑖=1}^{10} 𝑉𝑎𝑟(𝑋𝑖) + 2 ∑∑_{𝑖<𝑗} 𝐶𝑜𝑣(𝑋𝑖 , 𝑋𝑗)

𝑉𝑎𝑟(𝑋𝑖) = 𝑃(𝑋𝑖 = 1)[1 − 𝑃(𝑋𝑖 = 1)] = 2/19 − (2/19)² = 34/19²

And for 𝑖 ≠ 𝑗, we have:

𝐶𝑜𝑣(𝑋𝑖 , 𝑋𝑗) = 𝐸(𝑋𝑖𝑋𝑗) − 𝐸(𝑋𝑖)𝐸(𝑋𝑗) = (2 × 2 × 17!)/19! − (2/19)²

⇒ 𝑉𝑎𝑟(𝑋) = 10 × 34/19² + 2(10 choose 2)[(2 × 2)/(18 × 19) − (2/19)²] = 360/361

Note that since 𝑋𝑖 's are Bernoulli distributions, 𝐸(𝑋𝑖 𝑋𝑗 ) = 𝑃(𝑋𝑖 = 1 , 𝑋𝑗 = 1).
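A simulation of the round table confirms both 𝐸(𝑋) = 20/19 from Example 3.5 and 𝑉𝑎𝑟(𝑋) = 360/361. Labeling the people 0, . . . , 19 so that persons 2𝑘 and 2𝑘 + 1 form the 𝑘th couple is merely our own bookkeeping convention:

import random

def simulate_couples(trials=100_000):
    counts = []
    people = list(range(20))
    for _ in range(trials):
        random.shuffle(people)              # a random circular seating
        c = 0
        for i in range(20):
            a, b = people[i], people[(i + 1) % 20]
            if a // 2 == b // 2:            # neighbours from the same couple
                c += 1
        counts.append(c)
    mean = sum(counts) / trials
    var = sum((x - mean) ** 2 for x in counts) / trials
    return mean, var

mean, var = simulate_couples()
print("mean ≈", mean, " (20/19 ≈ 1.0526)")
print("variance ≈", var, " (360/361 ≈ 0.9972)")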
Example 4.10

If 𝑋1, 𝑋2, . . . , 𝑋𝑛 are independent random variables with identical distributions, mean 𝜇, and variance 𝜎², then 𝑋̄, the sample mean of the 𝑋𝑖 's, is defined as 𝑋̄ = (∑_{𝑖=1}^{𝑛} 𝑋𝑖)/𝑛. Obtain the expected value and variance of 𝑋̄.
Solution.
𝐸(𝑋̄) = (1/𝑛)𝐸(∑_{𝑖=1}^{𝑛} 𝑋𝑖) = (1/𝑛) ∑_{𝑖=1}^{𝑛} 𝐸(𝑋𝑖) = (1/𝑛)(𝑛𝜇) = 𝜇

𝑉𝑎𝑟(𝑋̄) = (1/𝑛²)𝑉𝑎𝑟(∑_{𝑖=1}^{𝑛} 𝑋𝑖) = (1/𝑛²)[∑_{𝑖=1}^{𝑛} 𝑉𝑎𝑟(𝑋𝑖) + ∑∑_{𝑖≠𝑗} 𝐶𝑜𝑣(𝑋𝑖 , 𝑋𝑗)] = (1/𝑛²)[𝑛𝜎² + 0] = 𝜎²/𝑛

Note that variables 𝑋𝑖 and 𝑋𝑗 are independent for 𝑖 ≠ 𝑗. Hence, 𝐶𝑜𝑣(𝑋𝑖 , 𝑋𝑗 ) = 0.


The results of Example 4.10 are frequently applied in probability theory and statistics. In fact, Example 4.10 states that the expected value of 𝑋̄, the sample mean, is equal to the expected value of every single 𝑋𝑖. However, the variance of the sample mean is not equal to the variance of the 𝑋𝑖 's, but to 1/𝑛 multiplied by the variance of each 𝑋𝑖.
Therefore, if 𝑋1 , 𝑋2 , . . . , 𝑋𝑛 are independent random variables with identical
distributions, mean 𝜇, and variance 𝜎 2 , then the mean and variance of the random
variable ∑𝑛𝑖=1 𝑋𝑖 are equal to 𝑛𝜇 and 𝑛𝜎 2 , respectively. Moreover, the mean and
variance of the random variable 𝑋̄ are equal to 𝜇 and 𝜎²/𝑛, respectively.
Consider a process of producing light bulbs whose lifetimes independently
follow exponential distribution with mean 𝜇 and finite variance 𝜎 2 . If we take a sample
of size 𝑛 at random from the light bulbs, then the random variable 𝑋̄ has mean 𝜇 and
variance 𝜎²/𝑛. It is evident that as the value of 𝑛 increases, the variance of 𝑋̄ decreases.
In fact, as the sample size increases, the variable 𝑋̄ adopts the values with less
dispersion about 𝜇. This note was proven by the Chebyshev theorem in Chapter 5.
According to the Chebyshev theorem, if 𝑋 is a random variable with mean 𝜇 and finite
variance 𝜎 2 , then for 𝑘 > 0 we have:

𝑃{|𝑋 − 𝜇| ≥ 𝑘} ≤ 𝑉𝑎𝑟(𝑋)/𝑘²
Now, if 𝑋1, 𝑋2, . . . , 𝑋𝑛 are independent random variables with identical distributions, mean 𝜇, and variance 𝜎², then for 𝑘 > 0 we have:

𝑃{|𝑋̄ − 𝜇| ≥ 𝑘} ≤ 𝑉𝑎𝑟(𝑋̄)/𝑘² = (𝜎²/𝑛)/𝑘² = 𝜎²/(𝑛𝑘²)
Now, as 𝑛 increases, it is evident that the obtained upper bound of 𝑃{|𝑋̄ − 𝜇| ≥ 𝑘} converges to zero for any value of 𝑘 > 0. Therefore, as 𝑛 increases, the probability that the event (|𝑋̄ − 𝜇| ≥ 𝑘) occurs converges to zero. This law is called the Weak Law of Large Numbers.
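The shrinking bound is easy to observe numerically. The Python sketch below takes 𝐸𝑥𝑝(1) lifetimes (so 𝜇 = 𝜎² = 1) and 𝑘 = 0.5, both arbitrary illustrative choices, and compares the estimated tail probability with the Chebyshev bound 𝜎²/(𝑛𝑘²):

import random

def tail_prob(n, k=0.5, trials=20_000):
    # Estimate P(|Xbar - mu| >= k) for the mean of n Exp(1) variables.
    hits = 0
    for _ in range(trials):
        xbar = sum(random.expovariate(1.0) for _ in range(n)) / n
        if abs(xbar - 1.0) >= k:
            hits += 1
    return hits / trials

for n in (1, 10, 100):
    print(n, "tail ≈", tail_prob(n), " Chebyshev bound:", 1 / (n * 0.25))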

Unfortunately, we often cannot use covariance as an entirely reliable criterion to measure the relationship between two variables. This is mainly because the numerical value of the covariance is affected not only by the degree of relationship between the two variables but also by their dispersion and scale of measurement. For example, if 𝑋 and 𝑌 are measured in meters and the covariance between the two variables is equal to a specified value, then when we change the scale of 𝑋 and 𝑌 from meters to centimeters, the covariance between the two variables becomes 10,000 times as large. In other words, if the covariance of 𝑋 and 𝑌 is greater than that of 𝑈 and 𝑉, we cannot reasonably conclude that the relationship of 𝑋 and 𝑌 is stronger than that of 𝑈 and 𝑉, because the higher covariance between 𝑋 and 𝑌 can result from their wider range of values or from a different scale of measurement.
To obviate this problem or standardize the covariance, we need to make a
change that eliminates the effect of both the variables' range and the measurement
unit on covariance. For this purpose, we define the linear correlation coefficient
between 𝑋 and 𝑌, denoted by 𝜌(𝑋, 𝑌).
Definition
The linear correlation coefficient between the two random variables 𝑋 and 𝑌 with
respective variances 𝜎𝑥2 and 𝜎𝑌2 is defined as follows:
𝜌(𝑋, 𝑌) = 𝐶𝑜𝑣(𝑋, 𝑌)/√(𝑉𝑎𝑟(𝑋)𝑉𝑎𝑟(𝑌)) = 𝐶𝑜𝑣(𝑋, 𝑌)/(𝜎𝑋𝜎𝑌)

It can be shown that 𝜌(𝑋, 𝑌) has no measurement unit and its value is always between −1 and +1. Hence, the correlation coefficient can serve as a criterion to measure the degree of linear relationship between two random variables. To prove that 𝜌(𝑋, 𝑌) always adopts values between −1 and +1, suppose that we have the random variables 𝑋 and 𝑌 with respective variances 𝜎𝑋² and 𝜎𝑌². If so, we have:

𝑉𝑎𝑟(𝑋/𝜎𝑋 + 𝑌/𝜎𝑌) ≥ 0 ⇒ 𝑉𝑎𝑟(𝑋)/𝜎𝑋² + 𝑉𝑎𝑟(𝑌)/𝜎𝑌² + 2𝐶𝑜𝑣(𝑋, 𝑌)/(𝜎𝑋𝜎𝑌) ≥ 0
⇒ 2[1 + 𝜌(𝑋, 𝑌)] ≥ 0 ⇒ −1 ≤ 𝜌(𝑋, 𝑌)

𝑉𝑎𝑟(𝑋/𝜎𝑋 − 𝑌/𝜎𝑌) ≥ 0 ⇒ 𝑉𝑎𝑟(𝑋)/𝜎𝑋² + 𝑉𝑎𝑟(𝑌)/𝜎𝑌² − 2𝐶𝑜𝑣(𝑋, 𝑌)/(𝜎𝑋𝜎𝑌) ≥ 0
⇒ 2[1 − 𝜌(𝑋, 𝑌)] ≥ 0 ⇒ 𝜌(𝑋, 𝑌) ≤ +1

It should be noted that the sign of the correlation coefficient depends only on the sign of the covariance. In other words, if the value of 𝑌, on average, increases as that of 𝑋 increases, then the correlation coefficient between 𝑋 and 𝑌 is positive. Furthermore, if the value of 𝑌, on average, decreases as that of 𝑋 increases, then the correlation coefficient between 𝑋 and 𝑌 is negative. Moreover, the values of −1 and +1 indicate a perfect linear relationship between the two variables: the closer the relationship between the two variables is to a linear one, the closer the absolute value of their correlation coefficient is to 1.

Example 5.1

If 𝑋1, 𝑋2, 𝑋3, and 𝑋4 are mutually uncorrelated random variables, each having mean 0 and variance 1, then it is desired to calculate the correlation coefficient between:
a. (𝑋1 + 𝑋2 ) and (𝑋2 + 𝑋3 )
b. (𝑋1 + 𝑋2 ) and (𝑋3 + 𝑋4 )

Solution.
a. Considering 𝑋𝑖 's are mutually uncorrelated, we have:
𝑉𝑎𝑟(𝑋1 + 𝑋2 ) = 𝑉𝑎𝑟(𝑋2 + 𝑋3 ) = 1 + 1 = 2
𝐶𝑜𝑣(𝑋1 + 𝑋2 , 𝑋2 + 𝑋3) = 𝑉𝑎𝑟(𝑋2) = 1 ⇒ 𝜌 = 1/√(2 × 2) = 1/2

b. 𝐶𝑜𝑣 (𝑋1 + 𝑋2 , 𝑋3 + 𝑋4 ) = 0 ⇒ 𝜌 = 0
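A quick numerical check of Example 5.1 is possible as well; independent standard normal variables are one convenient choice (our own) of mutually uncorrelated variables with mean 0 and variance 1:

import numpy as np

rng = np.random.default_rng(1)
x1, x2, x3, x4 = rng.standard_normal((4, 500_000))

def corr(a, b):
    return np.corrcoef(a, b)[0, 1]

print("rho(X1+X2, X2+X3) ≈", corr(x1 + x2, x2 + x3))  # close to 1/2
print("rho(X1+X2, X3+X4) ≈", corr(x1 + x2, x3 + x4))  # close to 0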

If the relationship between two variables is linear, then their correlation coefficient is always either 1 or −1. To prove this, suppose that 𝑋 is an arbitrary random variable; then, for constants 𝑎, 𝑏, 𝑐, and 𝑑 (with 𝑎, 𝑐 ≠ 0), we have:

𝜌(𝑎𝑋 + 𝑏, 𝑐𝑋 + 𝑑) = 𝐶𝑜𝑣(𝑎𝑋 + 𝑏, 𝑐𝑋 + 𝑑)/√(𝑉𝑎𝑟(𝑎𝑋 + 𝑏)𝑉𝑎𝑟(𝑐𝑋 + 𝑑)) = 𝑎𝑐𝑉𝑎𝑟(𝑋)/(√(𝑎²𝑉𝑎𝑟(𝑋))√(𝑐²𝑉𝑎𝑟(𝑋))) = 𝑎𝑐/|𝑎𝑐|

Since 𝑎𝑐/|𝑎𝑐| adopts only the value of 1 or −1, the correlation coefficient between the variables 𝑎𝑋 + 𝑏 and 𝑐𝑋 + 𝑑, which have a linear relationship, is always 1 or −1.

Example 5.2

Obtain the correlation coefficient between the number of successes and the number of failures in 𝑛 independent Bernoulli trials.

Solution. If we define 𝑋 to denote the number of successes in 𝑛 independent Bernoulli trials with identical parameter 𝑝, then the number of failures in the 𝑛 trials is equal to 𝑛 − 𝑋. Hence, we have:

𝜌(𝑋, 𝑛 − 𝑋) = −1

Finally, regarding the calculation of the correlation coefficient between two variables, we should state that if 𝑋 and 𝑌 are two random variables with correlation coefficient 𝜌, then the absolute value of the correlation coefficient between the linear functions 𝑎𝑋 + 𝑏 and 𝑐𝑌 + 𝑑 of 𝑋 and 𝑌 does not change. Accordingly:

𝜌(𝑎𝑋 + 𝑏, 𝑐𝑌 + 𝑑) = (𝑎𝑐/|𝑎𝑐|) 𝜌(𝑋, 𝑌)
The proof of the above equation is as follows:

𝜌(𝑎𝑋 + 𝑏, 𝑐𝑌 + 𝑑) = 𝐶𝑜𝑣(𝑎𝑋 + 𝑏, 𝑐𝑌 + 𝑑)/√(𝑉𝑎𝑟(𝑎𝑋 + 𝑏)𝑉𝑎𝑟(𝑐𝑌 + 𝑑)) = 𝑎𝑐𝐶𝑜𝑣(𝑋, 𝑌)/(√(𝑎²𝑉𝑎𝑟(𝑋))√(𝑐²𝑉𝑎𝑟(𝑌))) = (𝑎𝑐/|𝑎𝑐|) 𝜌(𝑋, 𝑌)

As a result, since 𝑎𝑐/|𝑎𝑐| adopts only the value of 1 or −1, 𝜌(𝑎𝑋 + 𝑏, 𝑐𝑌 + 𝑑) is equal to 𝜌(𝑋, 𝑌) or −𝜌(𝑋, 𝑌).
For example, suppose that 𝑋 and 𝑌 are two random variables with respective units of measurement kilograms and meters. If we change their units to grams and centimeters, respectively, then we have simply multiplied the possible values of 𝑋 and 𝑌 by 1000 and 100, respectively. However, the value of the correlation coefficient between the two variables does not change.

Finally, we should remember that a high correlation between two random variables does not necessarily imply a causal relationship between them. For instance, suppose that as the number of daily car accidents increases in a certain area, the number of umbrellas carried in cars increases as well. Under such conditions, despite the high correlation between the two variables, we cannot conclude that higher usage of umbrellas causes an increase in car accidents, or vice versa. In fact, a more appropriate interpretation is that when it is rainy, both the car accident rate and umbrella usage increase; rainy weather has probably resulted in the increasing number of accidents and the increasing usage of umbrellas. As a result, a high correlation between two variables does not necessarily signify the existence of a causal relationship between them.

1) Suppose that 𝑋~𝑁(1,4) and 𝑌~𝑁(4,9) are two independent random variables.
It is desired to calculate:
a. 𝐸(𝑋𝑌)
b. 𝐸(𝑋 2 − 𝑋𝑌 2 )
c. 𝑉𝑎𝑟(𝑋𝑌)
2) If 𝑋 and 𝑌 are two independent random variables uniformly distributed over
the interval (0,1), it is desired to calculate:
a. 𝐸[|𝑋 − 𝑌|]
b. 𝐸[|𝑋 − 𝑌|𝑎 ] for 𝑎 > 0
3) The joint probability function of the random variables 𝑋, 𝑌, and 𝑍 is given by
1
𝑃(1,2,3) = 𝑃(2,1,1 ) = 𝑃(2,2,1) = 𝑃(2,3,2) =
4
It is desired to calculate the expected value of:
a. 𝑈 = 𝑋𝑌 + 𝑋𝑍 + 𝑌𝑍
b. 𝑉 = 𝑋𝑌𝑍
4) Suppose that 𝑋1 , 𝑋2 , 𝑋3 are independent Bernoulli random variables with identical
parameter 𝑝 (𝑞 = 1 − 𝑝). If so, it is desired to calculate:
a. The expected value of random variable 𝑌 = 𝑋1 𝑋2 + 𝑋1 𝑋3 + 𝑋2 𝑋3
b. The variance of random variable 𝑌 = 𝑋1 𝑋2 + 𝑋1 𝑋3 + 𝑋2 𝑋3
5) If 𝑋1, 𝑋2, . . . , 𝑋10 are independent Bernoulli random variables with identical mean 1/3, then it is desired to calculate:
a. The expected value of the random variable 𝑌1 = ∑_{𝑖=1}^{5} ∑_{𝑗=6}^{10} 𝑋𝑖𝑋𝑗
b. The expected value of the random variable 𝑌2 = ∑_{𝑖=1}^{10} ∑_{𝑗=1}^{10} 𝑋𝑖𝑋𝑗

6) Suppose that A and B are two independent events with respective probabilities
0.6 and 0.7. If 𝐼𝐴 and 𝐼𝐵 denote the indicator functions of events A and B,
respectively, then it is desired to calculate:
a. 𝐸(𝐼𝐴 𝐼𝐵 )
b. 𝐸(𝐼𝐴² + 𝐼𝐵²)
c. 𝐸[(𝐼𝐴 + 𝐼𝐵)³]
Hint: 𝐼𝐴, the indicator function of event A, is a Bernoulli random variable that is equal to 1 if event A occurs; otherwise, it is zero.
7) In flipping a matchbox, suppose that we are fined $1 if a larger side lands, receive $2 if a medium side lands, and receive $5 if a smaller side lands. In each flip, the probabilities of the larger, medium, and smaller faces landing are 6/10, 3/10, and 1/10, respectively. If we flip the matchbox five times, what is the expected value of the gain?
8) A city hospital is located in the center of a square of side 3 miles. If an accident occurs inside the square and the hospital dispatches an ambulance, the route traveled by the ambulance along the avenues of the city has a rectangular shape. This means that if the coordinates of the hospital are denoted by (0,0) and the coordinates of the accident location by (𝑥, 𝑦), then the distance traveled by the ambulance to the accident location is equal to |𝑥| + |𝑦|. Now, if an accident occurs at a point uniformly distributed over the square, what is the mean distance traveled by the ambulance?
9) If 𝑋 and 𝑌 are two random variables with finite means, prove the following
relationships:
a. 𝐸[Max(𝑋, 𝑌)] ≥ 𝑀𝑎𝑥[𝐸(𝑋), 𝐸(𝑌)]
b. 𝐸[Min(𝑋, 𝑌)] ≤ 𝑀𝑖𝑛[𝐸(𝑋), 𝐸(𝑌)]
10) If 𝑋1 , 𝑋2, and 𝑋3 are three independent exponential random variables with
identical mean 1, then it is desired to calculate:
a. 𝐸(𝑋1/(𝑋1 + 𝑋2 + 𝑋3))
b. 𝐸((𝑋1 + 𝑋2)/(𝑋1 + 𝑋2 + 𝑋3))
c. 𝐸(𝑋1 |𝑋1 + 𝑋2 + 𝑋3 = 10)
d. 𝐸(𝑋1 + 𝑋2 |𝑋1 + 𝑋2 + 𝑋3 = 10)
11) How many times do you expect to roll a fair die until all the six sides appear at
least once?

12) A bus contains 𝑘 passengers and stops at 𝑛 stations. The probability of getting
off for each passenger, independently, is the same in all the stations. The bus
stops whenever there is at least one passenger to get off. What is the mean
number of stops?

13) If we put 6 different marbles into 5 distinct bags such that each marble has
equal chance to be placed in each bag, then obtain the mean number of bags
having a marble.

14) Suppose that we have 10 objects numbered from 1 to 10. If people A and B
respectively select 2 and 10 objects at random and independently, then it is
desired to calculate:
a. The mean number of objects selected by both A and B.
b. The mean number of objects selected by only one of A and B.
c. The mean number of objects selected by at least one of A and B.
15) Suppose that we have 5 objects numbered from 1 to 5. If people A and B
individually select 3 objects at random and independently, then it is desired to
calculate:
a. The expected value of the minimum number selected by A given that
the sampling is without replacement.
b. The expected value of the minimum number selected by both
individuals given that the sampling is without replacement.
c. The expected value of the minimum number selected by both
individuals given that the sampling is with replacement.
d. The expected value of the distinct appeared numbers given that the
sampling is with replacement.
e. The expectation of the sum of upturned numbers obtained by the two
people given that the sampling is with replacement.
16) For a group of 100 people, it is desired to calculate:
a. The mean number of days in a year that are the birthday of exactly three of the people.
b. The mean number of days on which at least one person was born.
c. The mean number of people who share their birthday with no other person in the group.
d. The mean number of two-person couples having the same birthday.
17) We independently perform a trial 10 times where the probability of success
for each trial is 𝑝. We say that a change has happened whenever the result
of the current trial is different from the preceding one. For example, if the
result of the trial is ssfsfssffs, then we have had a total of 6 changes. Obtain
the mean number of changes.
18) 100 people have been separately invited to a party. Upon arriving, each person checks whether he has a friend among the attendees. If so, he sits at a table where a friend of his is already seated; otherwise, he chooses a vacant table. Suppose that each of the (100 choose 2) pairs of people are mutual friends independently with probability 0.1. Obtain the mean number of occupied tables.
19) Three poachers are waiting for the flight of a bunch of ducks. When a bunch
of ducks fly over their heads, the poachers hit them simultaneously.
However, each poacher, independently, selects his target. If each poacher,
independently, hits his target with probability 0.2, then calculate the mean
number of the escaped ducks from a bunch of size 10.
20) A group of 20 men and 10 women attends a party. Obtain the mean number of men who have at least one woman sitting next to them for the following conditions:
a. The people are randomly seated at a round table.
b. The people are randomly seated in a row.
21) Consider an urn containing 5 identical balls that three of them have number
1 and the others have number 4. If two balls are selected at random and
without replacement from the urn and the sum of the selected balls'
numbers is considered as a score, then what is the mean of obtained scores?
22) A box consists of 2¹⁰ balls such that there are (10 choose 𝑘) balls labeled by number 𝑘 (𝑘 = 0, 1, 2, . . . , 10).
a. If we select a ball at random from the box, obtain the expected value of the result.
b. If we select three balls at random and without replacement from the box, obtain the mean of the sum of the results.

23) If the joint density function of 𝑋 and 𝑌 is as follows:
𝑓(𝑥, 𝑦) = (1/𝑦)𝑒^{−(𝑦 + 𝑥/𝑦)} ; 𝑥 > 0, 𝑦 > 0, and zero elsewhere,
then calculate 𝐶𝑜𝑣(𝑋, 𝑌).
24) If 𝑓𝑋 (𝑥) = 2𝑒 −2𝑥 , 𝑥 > 0 and 𝑌 = 𝑒 𝑋 , then calculate 𝐶𝑜𝑣(𝑋, 𝑌).
25) If the random variable 𝑋 follows an exponential distribution with density
function 𝑓𝑥 (𝑥) = 3𝑒 −3𝑥 , calculate the covariance and correlation coefficient
of (𝑋, 𝑋 2 ).
26) If 𝑍 is the standard normal random variable, then it is desired to calculate:
a. The covariance and correlation coefficient between (𝑍, 𝑍 2 ).
b. The covariance and correlation coefficient between (𝑍, 𝑍 3 ).
27) If 𝑋1 and 𝑋2 are independent random variables with identical standard
normal distribution, then it is desired to calculate:
a. The covariance and correlation coefficient between:
(𝑋1 + 𝑋2 , 𝑋12 + 𝑋22 )
b. The covariance and correlation coefficient between:
(𝑋1 + 𝑋2 , 𝑋13 + 𝑋23 )
28) Suppose that 𝑋 and 𝑌 are random variables.
a. Obtain 𝐶𝑜𝑣(𝑋 + 𝑌, 𝑋 − 𝑌).
b. If 𝑋 and 𝑌 have the same variance, obtain 𝐶𝑜𝑣(𝑋 + 𝑌, 𝑋 − 𝑌).
29) We roll a fair die twice. If 𝑋 denotes the sum of the upturned results and 𝑌
indicates subtracting the first roll's result from the second roll's result, then
calculate 𝐶𝑜𝑣(𝑋, 𝑌).
30) Consider the random variables 𝑋 and 𝑌 with joint density function
𝑓(𝑥, 𝑦) = 6𝑥 2 in region |𝑥| < 𝑦, 0 < 𝑦 < 1, and zero elsewhere.
a. Obtain the covariance and correlation coefficient between 𝑋 and 𝑌.
b. Are these two variables independent?
31) We select the random number (𝑥, 𝑦) uniformly distributed over a square
with vertices (1, −1), (−1,1), (1,1), and (−1, −1). Obtain Var(𝑋 + 𝑌).

32) Suppose that 𝑋 and 𝑌 are two dependent random variables, each having a variance of 4. If the correlation coefficient of 𝑋 and 𝑌 is equal to 1/2, then it is desired to calculate:
a. 𝑉𝑎𝑟(𝑋 + 𝑌)
b. 𝑉𝑎𝑟(3𝑋 + 2𝑌)
33) If 𝑋 and 𝑌 are random variables with correlation coefficient 0.3 and the same variance of 4, obtain the covariance and correlation coefficient between the variables 𝑍1 = 2𝑋 − 3 and 𝑍2 = (1/7)(𝑌 + 5).
34) Suppose that 𝑋1 and 𝑋2 are independent random variables with identical
variance 𝜎 2 . If 𝑌 = 𝑋1 + 2𝑋2 and 𝑍 = 𝑋1 + 𝑏𝑋2, then determine the value of 𝑏
such that the variables 𝑌 and 𝑍 become uncorrelated.
35) If 𝑋1, 𝑋2, and 𝑋3 are identically distributed uniform random variables over the interval (0,1) and 𝐶𝑜𝑣(𝑋𝑖 , 𝑋𝑗) = 1/24 for 𝑖 ≠ 𝑗, then obtain:
a. 𝐶𝑜𝑣(3𝑋1 − 2𝑋2 + 𝑋3 , 2𝑋1 + 𝑋2 − 2𝑋3)
b. The mean and variance of the random variable 𝑍 = 𝑋1 + 2𝑋2 − 𝑋3.
36) Suppose that 𝑋1 , 𝑋2 , . . . , 𝑋𝑛 are independent random variables with identical
variance 𝜎 2 . If so, then obtain:
a. The covariance and correlation coefficient between 𝑋1 + 𝑋2 + ⋯ + 𝑋𝑛−1
and 𝑋2 + 𝑋3 + ⋯ + 𝑋𝑛 .
b. The covariance and correlation coefficient between 2𝑋1 + 𝑋2 + ⋯ + 𝑋𝑛
and 𝑋1 + 𝑋2 + ⋯ + 2𝑋𝑛 .
c. The covariance and correlation coefficient between 2𝑋1 + 𝑋2 + ⋯ + 𝑋𝑛−1
and 𝑋2 + 𝑋3 + ⋯ + 2𝑋𝑛 .
d. The covariance and correlation coefficient between 𝑋𝑖 and 𝑋̄ = ∑_{𝑖=1}^{𝑛} 𝑋𝑖/𝑛.
e. The covariance and correlation coefficient between 𝑋̄ and 𝑋𝑖 − 𝑋̄.
f. The covariance and correlation coefficient between 𝑋𝑖 and 𝑋𝑖 − 𝑋̄.
37) If the 𝑋𝑖 's are mutually correlated with identical variance 𝜎², then obtain the covariance and correlation coefficient between (∑_{𝑖=1}^{2𝑛} 𝑋𝑖 , ∑_{𝑖=1}^{2𝑛} (−1)^𝑖 𝑋𝑖).
38) Suppose that 𝑋1 , 𝑋2 , . . . , 𝑋𝑛 follow 𝑁(0,1).
a. If we have 𝑆𝑟 = ∑𝑟𝑖=1 𝑋𝑖 ; 𝑟 ≤ 𝑛, obtain the covariance and correlation
coefficient between 𝑆𝑟 and 𝑆𝑛 .

b. If we have 𝑆𝑟 = ∑𝑟𝑖=1 𝑋𝑖2 ; 𝑟 ≤ 𝑛, obtain the covariance and correlation
coefficient between 𝑆𝑟 and 𝑆𝑛 .
39) If 𝑋1 ∼ 𝑁(𝜇, 𝜎 2 ) and 𝑋𝑖 = 𝛽𝑋𝑖−1 (𝑖 = 2, . . . , 𝑛), obtain the covariance and
correlation coefficient between (𝑋1 , ∑𝑛𝑖=1 𝑋𝑖 ).
40) We flip a fair coin until getting a heads for the first time. Suppose that 𝑋
denotes the number of the upturned tails before appearing the first heads
and 𝑌 denotes the number of flips required to obtain the first heads. If so,
obtain the covariance and correlation coefficient between 𝑋 and 𝑌.
41) If the discrete random variable 𝑋 follows a binomial distribution with parameters 𝑝 = 1/3 and 𝑛 = 10, then obtain the covariance and correlation coefficient between 𝑋/10 and (10 − 𝑋)/10.
42) We roll a fair die once. If 𝑋 and 𝑌 are defined as follows:
1 ; 𝑡ℎ𝑒 𝑑𝑖𝑒 𝑙𝑎𝑛𝑑𝑠 𝑜𝑛 5 𝑜𝑟 6 1 ; 𝑡ℎ𝑒 𝑑𝑖𝑒 𝑙𝑎𝑛𝑑𝑠 𝑜𝑛 1 𝑜𝑟 2
𝑋={ 𝑌={
0 ; 𝑜𝑡ℎ𝑒𝑟𝑤𝑖𝑠𝑒 0 ; 𝑜𝑡ℎ𝑒𝑟𝑤𝑖𝑠𝑒
then obtain 𝐶𝑜𝑣( 𝑋, 𝑌).
43) There are 15 intact and 5 defective parts in a lot. Suppose that we have
selected a sample of size 3 at random and without replacement from the lot.
The inspection time of each intact part takes 10 minutes. However, the
inspection and repair time of each defective part takes 20 minutes. If so,
then it is desired to calculate:
a. The mean and variance of the total time for the inspection and repair
of the parts in the sample.
b. The covariance and correlation coefficient between the number of
defective and intact parts in the sample.
44) The random variables 𝑋 and 𝑌 with identical mean 𝜇 and respective
variances 𝜎𝑋2 and 𝜎𝑌2 are given. The weighted mean of 𝑋 and 𝑌 is defined as
𝑍 = 𝜆𝑋 + (1 − 𝜆)𝑌.
a. Obtain the expected value of 𝑍.
b. If 𝑋 and 𝑌 are independent, determine the value of 𝜆 such that the
variance of 𝑍 is minimized.
c. If 𝑋 and 𝑌 are dependent in a way that 𝐶𝑜𝑣(𝑋, 𝑌) = 1 and 𝜎𝑋2 = 1, 𝜎𝑌2 = 2,
then determine the value of 𝜆 such that the variance of 𝑍 is
minimized.
45) In a production line, 20% of the parts are damaged and the rest are intact. We
choose 10 parts at random. If 𝑋1 denotes the number of damaged parts and 𝑋2
denotes the number of intact parts, then obtain the variance of 𝑋1 + 2𝑋2.

46) Suppose that 𝑋 follows a geometric distribution with mean 1/𝑝. Obtain the covariance and correlation coefficient between (𝑋, 𝐼{1}(𝑋)).
𝐼{1}(𝑋) is an indicator function that adopts the value of 1 when 𝑋 is equal to 1.
47) Suppose that 𝑋 and 𝑌 denote the number of 1's and 2's in n rolls of a fair die,
respectively. Obtain the covariance and correlation coefficient between 𝑋 and
𝑌.
Hint: To calculate 𝐶𝑜𝑣(𝑋, 𝑌), apply the relationship 𝑉𝑎𝑟(𝑋 + 𝑌) = 𝑉𝑎𝑟(𝑋) +
𝑉𝑎𝑟(𝑌) + 2𝐶𝑜𝑣(𝑋, 𝑌).
48) There are 𝑚 black and 𝑛 white marbles in an urn (𝑁 = 𝑛 + 𝑚). We select two
marbles at random and without replacement from the urn. In the ith choice, if
a black marble is selected, we say that the random variable 𝑋𝑖 adopts the value
of 1; otherwise, it is zero.
a. Obtain 𝐶𝑜𝑣(𝑋1 , 𝑋2 ).
b. Obtain 𝜌(𝑋1 , 𝑋2 ).
49) In Problem 12, obtain the variance of the number of stops.
50) Suppose that random variables 𝑋 and 𝑌 denote the cost and revenue of a household in a city, respectively, such that they follow a bivariate normal distribution with parameters 𝜎𝑌² = 16, 𝜎𝑋² = 4, 𝜇𝑌 = 35, 𝜇𝑋 = 25 and correlation coefficient 𝜌(𝑋, 𝑌) = 1/4. If the saving of the household is defined as 𝑆 = 𝑌 − 𝑋, calculate 𝑃(2 < 𝑆 < 16).
51) Suppose that 𝑋, 𝑌, and 𝑍 are normal random variables with mean 0 and variance 1. Moreover, the correlation coefficient between 𝑋 and 𝑌 is 1/2, between 𝑌 and 𝑍 is zero, and 𝑋 and 𝑍 are independent. If so, it is desired to obtain the value of 𝑃(𝑋 + 𝑌 + 𝑍 < 8).
52) Suppose that we have a sample of size 𝑛 from (𝑋𝑖 , 𝑌𝑖 ) following a bivariate
distribution with 𝐸(𝑋𝑖 ) = 𝜇1 , 𝐸(𝑌𝑖 ) = 𝜇2 , 𝑉𝑎𝑟(𝑋𝑖 ) = 𝑉𝑎𝑟(𝑌𝑖 ) = 𝜎 2 , and
𝐶𝑜𝑣(𝑋𝑖 , 𝑌𝑖 ) = 𝜌𝜎 2 . If 𝑍1 , 𝑍2 , … , 𝑍𝑛 and 𝑈1 , 𝑈2 , … , 𝑈𝑛 are independent random
variables following the respective distributions of 𝑁(𝜇1, 𝜎²) and 𝑁(𝜇2, 𝜎²), then obtain the value of 𝑉𝑎𝑟(𝑋̄ − 𝑌̄)/𝑉𝑎𝑟(𝑍̄ − 𝑈̄).

53) If 𝑋 and 𝑌 have the following joint probability function, obtain the covariance
and correlation coefficient between them.

        𝑌      1      2
   𝑋
   2          1/8    1/4
   3          1/8    1/2
54) If 𝑋 and 𝑌 have the following joint probability function, obtain the covariance
and correlation coefficient between them.

        𝑋     −1      0      1
   𝑌
   0          1/3     0      0
   1           0     1/3    1/3
55) We have the following information for the variables 𝑋 and 𝑌:

   𝑥       0      1            𝑦       1      2
   𝑃(𝑥)   2/3    1/3           𝑃(𝑦)   1/4    3/4

Moreover, we know that 𝑃(𝑋 = 1, 𝑌 = 1) = 1/4. It is desired to obtain the variance of 𝑍 = 3𝑋 + 𝑌 − 12.
56) Suppose that 𝐴 and 𝐵 are two events of a probability space. If:
𝑃(𝐵|𝐴) = 1/2,  𝑃(𝐴|𝐵) = 1/4,  𝑃(𝐴) = 1/4,  𝑋 = 𝐼𝐴,  𝑌 = 𝐼𝐵
a. Obtain the covariance between 𝑋 and 𝑌.
b. Are these two variables independent?
57) The joint density function of the random variables 𝑋 and 𝑌 is given by:
𝑓𝑋,𝑌 (𝑥, 𝑦) = 2; 0 < 𝑥 < 𝑦 < 1, and zero elsewhere.
If we define 𝐵 = {(𝑥, 𝑦); 0 < 𝑥 + 𝑦 < 1}, then it is desired to calculate:
a. 𝐸(𝑌|𝐵)
b. 𝐸(𝑋|𝐵)

58) If 𝑋 and 𝑌 are independent and identically distributed uniform random
variables in interval (0,1), it is desired to calculate:
a. 𝐸(𝑌|𝑋 < 𝑌)
b. 𝐸(𝑋|𝑋 < 𝑌)
c. 𝐸(𝑋 − 𝑌|𝑋 < 𝑌)
d. 𝐸(𝑋|𝑋 + 𝑌 < 1)
e. 𝐸(𝑌|𝑋 + 𝑌 < 1)
59) If 𝑋 and 𝑌 are independent exponential random variables with respective
parameters 𝜆 and 𝜇, it is desired to calculate:
a. 𝐸(𝑋|𝑋 < 𝑌)
b. 𝐸(𝑌|𝑋 < 𝑌)
c. 𝐸(𝑌 − 𝑋|𝑋 < 𝑌)

Sometimes, the expected value or the probability of an event of the random
variable 𝑋 is different given the possible values of another random variable like
𝑌. In other words, for random variable 𝑋|𝑌 = 𝑦, the expected value or the
probability of an event depends on the value of 𝑦. In such conditions, to determine
𝐸(𝑋) or 𝑃(𝑋 = 𝑥), we condition on the possible values of 𝑌. For example, suppose
that a person rolls a fair die. If the result of rolling is 𝑦, then he flips a fair coin 𝑦
times and 𝑋 denotes the number of upturned heads in the experiment. It is evident
that, in this problem, the random variable 𝑋|𝑌 = 𝑦 has different expected value and
probability function given different values of 𝑦. For instance, consider the state
that the result of rolling is 4 and we flip the coin 4 times. In this case, the number
of upturned heads or equivalently the distribution of 𝑋|𝑌 = 4 follows a binomial
distribution with parameters (𝑛 = 4, 𝑝 = 0.5). Moreover, consider the state that
the result of rolling is 6 and subsequently we flip the coin 6 times. In this case, the
number of obtained heads or the distribution of 𝑋|𝑌 = 6 is binomial with
parameters (𝑛 = 6, 𝑝 = 0.5). In such examples, we show that to determine 𝐸(𝑋) and
𝑃(𝑋 = 𝑥), we should condition on the possible values of 𝑌.

As mentioned in Section 4.9 of Chapter 4, it is likely that the probability of
occurrence of an event depends on the result of a continuous or discrete
variable. If so, to calculate the probability of the referred event, we should
condition on the result of the random variable. Reconsider the example mentioned
in Section 11.1. If the result of rolling is 𝑦, then the person flips a fair coin 𝑦 times
and 𝑋 denotes the number of obtained heads in the experiment. Obviously, in this
problem, the random variable 𝑋|𝑌 = 𝑦 has a binomial distribution with parameters
(𝑛 = 𝑦, 𝑝 = 0.5). In such a case, to determine 𝑃(𝑋 = 𝑥), we should condition on the
possible values of 𝑌 as follows:

𝑃(𝑋 = 𝑥) = ∑_𝑦 𝑃(𝑋 = 𝑥|𝑌 = 𝑦)𝑃(𝑌 = 𝑦)   (2-1)

For the above-mentioned example, the relation is as follows:

𝑃(𝑋 = 𝑥) = ∑_𝑦 𝑃(𝑋 = 𝑥|𝑌 = 𝑦)𝑃(𝑌 = 𝑦) = ∑_{𝑦=𝑥}^{6} (𝑦 choose 𝑥)(1/2)^𝑥(1/2)^{𝑦−𝑥} × (1/6) = (1/6) ∑_{𝑦=𝑥}^{6} (𝑦 choose 𝑥)(1/2)^𝑦

Namely, for 𝑃(𝑋 = 4), we have:

𝑃(𝑋 = 4) = ∑_𝑦 𝑃(𝑋 = 4|𝑌 = 𝑦)𝑃(𝑌 = 𝑦) = ∑_{𝑦=4}^{6} (𝑦 choose 4)(1/2)⁴(1/2)^{𝑦−4} × (1/6) = (1/6) ∑_{𝑦=4}^{6} (𝑦 choose 4)(1/2)^𝑦
= (1/6)[(4 choose 4)(1/2)⁴ + (5 choose 4)(1/2)⁵ + (6 choose 4)(1/2)⁶] = (1/6)(1/16 + 5/32 + 15/64) = 29/384

Note that since the number of obtained heads, 𝑋, is equal to 4, the result of the die, or equivalently the number of coin flips, should be greater than or equal to 4.
Sometimes the probability function of the random variable 𝑋|𝑌 = 𝑦 depends
on the value of 𝑦 while the possible values of 𝑦 are continuous. In such cases,
similar to the explanations of Section 4.9 in Chapter 4, to determine 𝑃(𝑋 = 𝑥), we
should condition on the continuous values of the random variable 𝑌 as follows:

𝑃(𝑋 = 𝑥) = ∫_𝑦 𝑃(𝑋 = 𝑥|𝑌 = 𝑦)𝑓𝑌(𝑦)𝑑𝑦   (2-2)

For example, suppose that we first generate a random number uniformly over
the interval zero to one. If the generated number is 𝑦, then we perform 𝑛 trials
independently with identical probability of success 𝑦 and denote the number of
successes in the 𝑛 trials by 𝑋. In this problem, the random variable 𝑋|𝑌 = 𝑦 has a
binomial distribution with parameters (𝑛, 𝑝 = 𝑦). Now, to determine 𝑃(𝑋 = 𝑥) given
the integer values of 0,1,2, . . . , 𝑛 for 𝑥, we should condition on the possible values of
the random variable 𝑌 as follows:

𝑃(𝑋 = 𝑥) = ∫_𝑦 𝑃(𝑋 = 𝑥|𝑌 = 𝑦)𝑓𝑌(𝑦)𝑑𝑦 = ∫_0^1 𝑃(𝑋 = 𝑥|𝑌 = 𝑦) × (1/(1 − 0)) 𝑑𝑦
= ∫_0^1 (𝑛 choose 𝑥) 𝑦^𝑥(1 − 𝑦)^{𝑛−𝑥} 𝑑𝑦 = (𝑛 choose 𝑥) ∫_0^1 𝑦^𝑥(1 − 𝑦)^{𝑛−𝑥} 𝑑𝑦 = (𝑛 choose 𝑥) 𝛤(𝑥 + 1)𝛤(𝑛 − 𝑥 + 1)/𝛤((𝑥 + 1) + (𝑛 − 𝑥 + 1))
= (𝑛 choose 𝑥) 𝑥!(𝑛 − 𝑥)!/(𝑛 + 1)! = 1/(𝑛 + 1)

In other words, in this problem, 𝑃(𝑋 = 𝑥) is equal to 1/(𝑛 + 1) for each of the integer values 𝑥 = 0, 1, 2, . . . , 𝑛.
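This perhaps surprising uniform answer can be checked by simulation; in the Python sketch below, 𝑛 = 5 is an arbitrary choice:

import random
from collections import Counter

n, trials = 5, 200_000
counts = Counter()
for _ in range(trials):
    y = random.random()                               # Y ~ U(0, 1)
    x = sum(random.random() < y for _ in range(n))    # X | Y = y ~ B(n, y)
    counts[x] += 1

for x in range(n + 1):
    print(x, counts[x] / trials)   # each frequency is close to 1/(n+1) = 1/6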

Example 2.1

Suppose that the number of accidents for the drivers of a city follows a Poisson
distribution with rate 𝑌 such that 𝑌 is equal to 1 for half of the people and 2 for the
other half. Obtain the proportion of the people in the city who have experienced no
accident in a year.
Solution. If 𝑋 denotes the number of people's accidents in a year, then 𝑋|𝑌 has a
Poisson distribution with parameter 𝑌 in a way that 𝑌 is 1 for half of the people in city

and 2 for the other half. This means that 𝑌 follows a discrete uniform distribution
with values of {1,2}. Hence, we have:
𝑃(𝑋 = 0) = ∑_{𝑦=1}^{2} 𝑃(𝑋 = 0|𝑌 = 𝑦)𝑃(𝑌 = 𝑦) = ∑_{𝑦=1}^{2} (𝑒^{−𝑦}𝑦⁰/0!) × (1/2) = (𝑒^{−1}1⁰/0!) × (1/2) + (𝑒^{−2}2⁰/0!) × (1/2) = (1/2)(𝑒^{−1} + 𝑒^{−2}) ≈ 0.252

Example 2.2

Suppose that the number of accidents for the drivers of a city follows a Poisson
distribution with rate 𝑌 such that 𝑌 is uniformly distributed over the continuous
interval (1,2) for the people of the city. Obtain the proportion of the people in the
city who have experienced no accident in a year.

Solution. If 𝑋 denotes the number of people's accidents in a year, then 𝑋|𝑌 has a
Poisson distribution with parameter 𝑌 in a way that 𝑌 follows a uniform distribution
over the interval (1,2). Therefore, we have:

𝑃(𝑋 = 0) = ∫_1^2 𝑃(𝑋 = 0|𝑌 = 𝑦)𝑓𝑌(𝑦)𝑑𝑦 = ∫_1^2 (𝑒^{−𝑦}𝑦⁰/0!) × (1/(2 − 1)) 𝑑𝑦 = ∫_1^2 𝑒^{−𝑦} 𝑑𝑦 = 𝑒^{−1} − 𝑒^{−2} ≈ 0.233
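Both mixture probabilities are easy to verify numerically; the sketch below uses NumPy's Poisson sampler with a random accident rate per driver:

import numpy as np

rng = np.random.default_rng(2)
trials = 500_000

# Example 2.1: the rate Y is 1 for half the drivers and 2 for the other half.
y1 = rng.choice([1.0, 2.0], size=trials)
x1 = rng.poisson(y1)
print("P(X = 0) ≈", (x1 == 0).mean(), " exact:", 0.5 * (np.exp(-1) + np.exp(-2)))

# Example 2.2: the rate Y is uniform on the interval (1, 2).
y2 = rng.uniform(1.0, 2.0, size=trials)
x2 = rng.poisson(y2)
print("P(X = 0) ≈", (x2 == 0).mean(), " exact:", np.exp(-1) - np.exp(-2))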

As mentioned in the preceding section, if the probability of an event of a random variable is different given the possible states of another variable like 𝑌, then to calculate the probability function, we should condition on the different values of 𝑌.
calculate the probability function, we should condition on the different values of 𝑌.
This property is also valid for the expected value of a random variable like 𝑋. This
means that if the expected value of the random variable 𝑋 varies given different

values of 𝑌, then we should condition on the values of 𝑌 to obtain the expected value
of 𝑋.
If 𝑌 is a discrete random variable and values of 𝐸(𝑋|𝑌 = 𝑦) change given
different values of 𝑌 = 𝑦, then we have:

𝐸(𝑋) = ∑_𝑦 𝐸(𝑋|𝑌 = 𝑦)𝑃(𝑌 = 𝑦)   (3-1)

On the other hand, if 𝑌 is a continuous random variable, then we have:

𝐸(𝑋) = ∫_𝑦 𝐸(𝑋|𝑌 = 𝑦)𝑓𝑌(𝑦)𝑑𝑦   (3-2)

Proof (for the continuous case).

∫_𝑦 𝐸(𝑋|𝑌 = 𝑦)𝑓𝑌(𝑦)𝑑𝑦 = ∫_𝑦 [∫_𝑥 𝑥𝑓_{𝑋|𝑌}(𝑥|𝑦)𝑑𝑥] 𝑓𝑌(𝑦)𝑑𝑦 = ∫_𝑦 ∫_𝑥 𝑥 (𝑓_{𝑋,𝑌}(𝑥, 𝑦)/𝑓𝑌(𝑦)) 𝑑𝑥 𝑓𝑌(𝑦)𝑑𝑦 = ∫_𝑦 ∫_𝑥 𝑥𝑓_{𝑋,𝑌}(𝑥, 𝑦) 𝑑𝑥 𝑑𝑦 = 𝐸(𝑋)

Now, again suppose that a person rolls a fair die. If the result of the die is 𝑦, then he flips a fair coin 𝑦 times. As mentioned, if 𝑋 denotes the number of obtained heads, the random variable 𝑋|𝑌 = 𝑦 has a binomial distribution with parameters (𝑛 = 𝑦, 𝑝 = 0.5). In this problem, the value of 𝐸(𝑋|𝑌 = 𝑦) changes given different values of 𝑦 and equals 𝑦/2. Therefore, to calculate 𝐸(𝑋), we have:

𝐸(𝑋) = ∑_{𝑦=1}^{6} 𝐸(𝑋|𝑌 = 𝑦)𝑃(𝑌 = 𝑦) = (1 × 1/2) × (1/6) + (2 × 1/2) × (1/6) + ⋯ + (6 × 1/2) × (1/6) = 7/4

However, such problems can be investigated by another approach. As mentioned, to calculate 𝐸(𝑋), we condition on the possible values of 𝑌 whenever the expected value of 𝑋 depends on the value of 𝑌. In other words, 𝐸(𝑋|𝑌 = 𝑦) is a function of 𝑦, say 𝑔(𝑦). Accordingly, in such cases, the expression 𝐸(𝑋) = ∑_𝑦 𝐸(𝑋|𝑌 = 𝑦)𝑃(𝑌 = 𝑦) takes the form ∑_𝑦 𝑔(𝑦)𝑃(𝑌 = 𝑦), denoted by 𝐸(𝑔(𝑌)) in Chapter 5. As a result, under these conditions, 𝐸(𝑋) becomes 𝐸(𝑔(𝑌)) such that 𝑔(𝑌) is 𝐸(𝑋|𝑌), or equivalently the expected value of 𝑋 given 𝑌. As an example, in the previous problem, 𝐸(𝑋|𝑌), the expected value of 𝑋 given 𝑌, is equal to 𝑌 × 1/2. Therefore, we have:

𝐸(𝑋) = ∑_𝑦 𝐸(𝑋|𝑌 = 𝑦)𝑃(𝑌 = 𝑦) = ∑_𝑦 (𝑦/2)𝑃(𝑌 = 𝑦) = 𝐸(𝑌/2) = (1/2)𝐸(𝑌) = (1/2) × (7/2) = 7/4

In other words, in this problem, 𝐸(𝑋) is simply the expected value of a function of 𝑌, namely 𝑔(𝑌) = 𝐸(𝑋|𝑌), which can be written as follows:

𝐸(𝑋) = 𝐸(𝑔(𝑌)) = 𝐸(𝐸(𝑋|𝑌))


Proposition 3-1
If the expected value of 𝑋 depends on the values of 𝑌 and 𝐸(𝑋|𝑌) is a function of
the random variable 𝑌, then we have:
𝐸(𝑋) = 𝐸[𝐸(𝑋|𝑌)]

Hence, for the preceding example, we have:

𝐸(𝑋) = 𝐸[𝐸(𝑋|𝑌)] = 𝐸(𝑌/2) = (1/2)𝐸(𝑌) = (1/2) × (7/2) = 7/4
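A short simulation of the die-and-coin experiment confirms this value:

import random

trials = 200_000
total = 0
for _ in range(trials):
    y = random.randint(1, 6)                                # roll a fair die
    total += sum(random.random() < 0.5 for _ in range(y))   # flip y fair coins
print("E(X) ≈", total / trials, " exact: 7/4 =", 7 / 4)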

Example 3.1

Suppose that each customer entering a store buys a specific product with probability 𝑝. If the number of customers entering the store in a day follows a Poisson distribution with parameter 𝜆, then obtain the mean number of customers who buy the product in a day.
Solution. If the number of customers that buy the product in a day is denoted by 𝑋
and the number of customers that enter the store in a day is denoted by 𝑌, then 𝑋|𝑌
follows a binomial distribution with parameter (𝑌, 𝑝) and 𝑌 itself follows a Poisson
distribution with parameter 𝜆. Hence, we have:

𝐸(𝑋) = ∑_𝑦 𝐸(𝑋|𝑌 = 𝑦)𝑃(𝑌 = 𝑦) = ∑_𝑦 (𝑦𝑝)𝑃(𝑌 = 𝑦) = 𝐸(𝑌𝑝) = 𝑝𝐸(𝑌) = 𝑝𝜆

Furthermore, using Proposition 3.1, we have:

𝐸(𝑋) = 𝐸[𝐸(𝑋|𝑌)] = 𝐸(𝑝𝑌) = 𝑝𝜆

Example 3.2

The random variable 𝑋|𝑌 has a Poisson distribution with parameter 𝑌, and 𝑌 itself is an exponential random variable with density function 𝑓𝑌(𝑦) = 𝑒^{−𝑦} for values of 𝑦 > 0. Obtain the expected value of the random variable 𝑋.
Solution.
𝐸(𝑋) = ∫_0^∞ 𝐸(𝑋|𝑌 = 𝑦)𝑓𝑌(𝑦)𝑑𝑦 = ∫_0^∞ 𝑦𝑒^{−𝑦} 𝑑𝑦 = 1

Moreover, since 𝑌 is an exponential random variable with parameter 1, using Proposition 3.1, we have:
𝐸(𝑋) = 𝐸[𝐸(𝑋|𝑌)] = 𝐸(𝑌) = 1

As mentioned in previous chapters, if one of the parameters of the random variable 𝑋 is itself a random variable, then to calculate any probability of 𝑋, we should condition on the possible values of that parameter. Similarly, to calculate the expected value of 𝑋 in such cases, we should, in the same manner as the preceding example, condition on the possible values of that parameter.

Proposition 3-2
If the distribution of 𝑋 depends on that of 𝑌, then we have:
𝐸(𝑋𝑌) = 𝐸[𝐸(𝑋𝑌|𝑌)] = 𝐸[𝑌𝐸(𝑋|𝑌)]

Proof. (for the discrete case).

𝐸(𝑋𝑌) = ∑_𝑦 𝐸(𝑋𝑌|𝑌 = 𝑦)𝑃(𝑌 = 𝑦) = ∑_𝑦 𝑦𝐸(𝑋|𝑌 = 𝑦)𝑃(𝑌 = 𝑦)

Hence, letting 𝑦𝐸(𝑋|𝑌 = 𝑦) = 𝑔(𝑦), we get:

𝐸(𝑋𝑌) = ∑_𝑦 𝑦𝐸(𝑋|𝑌 = 𝑦)𝑃(𝑌 = 𝑦) = ∑_𝑦 𝑔(𝑦)𝑃(𝑌 = 𝑦) = 𝐸(𝑔(𝑌)) = 𝐸(𝑌𝐸(𝑋|𝑌))

Example 3.3

In Example 3.2, obtain the value of 𝐸(𝑋𝑌).

Solution.
(𝑋|𝑌 = 𝑦) ∼ 𝑃(𝑦) ⇒ 𝐸(𝑋𝑌) = 𝐸[𝑌𝐸(𝑋|𝑌)] = 𝐸(𝑌 ⋅ 𝑌) = 𝐸(𝑌²) = 2!/1² = 2

Proposition 3-3
For the two random variables 𝑋 and 𝑌, we have:
𝐶𝑜𝑣(𝑌, 𝑋) = 𝐶𝑜𝑣(𝑌, 𝐸(𝑋|𝑌))

Proof.
𝐶𝑜𝑣(𝑌, 𝐸(𝑋|𝑌)) = 𝐸(𝑌 × 𝐸(𝑋|𝑌)) − 𝐸(𝑌)𝐸(𝐸(𝑋|𝑌)) = 𝐸(𝑋𝑌) − 𝐸(𝑌)𝐸(𝑋) = 𝐶𝑜𝑣(𝑌, 𝑋)

Example 3.4

In Example 3.2, obtain the value of 𝐶𝑜𝑣(𝑋, 𝑌).


Solution. 𝐶𝑜𝑣(𝑌, 𝑋) = 𝐶𝑜𝑣(𝑌, 𝐸(𝑋|𝑌)) = 𝐶𝑜𝑣(𝑌, 𝑌) = 𝑉𝑎𝑟(𝑌) = 1

Proposition 3-4
For the two random variables 𝑋 and 𝑌, we have:

𝑉𝑎𝑟(𝑋) = 𝐸[𝑉𝑎𝑟(𝑋|𝑌)] + 𝑉𝑎𝑟[𝐸(𝑋|𝑌)]

Proof. 𝑉𝑎𝑟(𝑋|𝑌) = 𝐸(𝑋 2 |𝑌) − (𝐸(𝑋|𝑌))2

Therefore, we have:

𝐸(𝑉𝑎𝑟(𝑋|𝑌)) = 𝐸(𝐸(𝑋 2 |𝑌)) − 𝐸([𝐸(𝑋|𝑌)]2 ) = 𝐸(𝑋 2 ) − 𝐸([𝐸(𝑋|𝑌)]2 ) (3-3)

Moreover, we have:

𝑉𝑎𝑟(𝐸(𝑋|𝑌)) = 𝐸([𝐸(𝑋|𝑌)]2 ) − (𝐸[𝐸(𝑋|𝑌)])2 = 𝐸([𝐸(𝑋|𝑌)]2 ) − (𝐸(𝑋))2 (3-4)

Hence, adding up Relationships of (3.3) and (3.4) yields:


𝐸[𝑉𝑎𝑟(𝑋|𝑌)] + 𝑉𝑎𝑟[𝐸(𝑋|𝑌)] = 𝐸(𝑋 2 ) − (𝐸(𝑋))2 = 𝑉𝑎𝑟(𝑋)

Example 3.5

In Example 3.2, obtain the value of 𝑉𝑎𝑟(𝑋).

Solution.
𝑉𝑎𝑟(𝑋) = 𝐸[𝑉𝑎𝑟(𝑋|𝑌)] + 𝑉𝑎𝑟[𝐸(𝑋|𝑌)]
𝑉𝑎𝑟(𝑋) = 𝐸[𝑌] + 𝑉𝑎𝑟[𝑌] = 1 + 1 = 2
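Examples 3.2 through 3.5 all concern the same Poisson–exponential mixture, so one simulation can check all four results at once (𝐸(𝑋) = 1, 𝐸(𝑋𝑌) = 2, 𝐶𝑜𝑣(𝑋, 𝑌) = 1, and 𝑉𝑎𝑟(𝑋) = 2). A sketch using NumPy:

import numpy as np

rng = np.random.default_rng(3)
y = rng.exponential(1.0, size=500_000)   # Y ~ Exp(1)
x = rng.poisson(y)                       # X | Y ~ Poisson(Y)

print("E(X)      ≈", x.mean())            # exact value: 1
print("Var(X)    ≈", x.var())             # exact value: 2
print("E(XY)     ≈", (x * y).mean())      # exact value: 2
print("Cov(X, Y) ≈", np.cov(x, y)[0, 1])  # exact value: 1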

As mentioned in Chapter 10, if the 𝑋𝑖 's are independent and identically distributed random variables with mean 𝜇 and variance 𝜎², then we have:

𝐸(∑_{𝑖=1}^{𝑛} 𝑋𝑖) = 𝐸(𝑋1) + 𝐸(𝑋2) + ⋯ + 𝐸(𝑋𝑛) = 𝑛𝐸(𝑋𝑖)

And if the 𝑋𝑖 's are uncorrelated, we have:

𝑉𝑎𝑟(∑_{𝑖=1}^{𝑛} 𝑋𝑖) = 𝑉𝑎𝑟(𝑋1) + 𝑉𝑎𝑟(𝑋2) + ⋯ + 𝑉𝑎𝑟(𝑋𝑛) = 𝑛𝑉𝑎𝑟(𝑋𝑖)

Likewise, it can be shown that if 𝑋1, 𝑋2, … is a series of independent and identically distributed random variables and 𝑁 is a nonnegative integer random variable independent of the 𝑋𝑖 's, we have:

𝐸(∑_{𝑖=1}^{𝑁} 𝑋𝑖) = 𝐸(𝑁)𝐸(𝑋𝑖)   (4-1)

𝑉𝑎𝑟(∑_{𝑖=1}^{𝑁} 𝑋𝑖) = 𝐸(𝑁)𝑉𝑎𝑟(𝑋𝑖) + 𝑉𝑎𝑟(𝑁)[𝐸(𝑋𝑖)]²   (4-2)

Proof. Letting 𝑌 = ∑_{𝑖=1}^{𝑁} 𝑋𝑖, we have:

𝐸(∑_{𝑖=1}^{𝑁} 𝑋𝑖) = 𝐸(𝑌) = 𝐸(𝐸(𝑌|𝑁)) = 𝐸(𝑁 𝐸(𝑋𝑖)) = 𝐸(𝑁)𝐸(𝑋𝑖)

𝑉𝑎𝑟(∑_{𝑖=1}^{𝑁} 𝑋𝑖) = 𝑉𝑎𝑟(𝑌) = 𝐸[𝑉𝑎𝑟(𝑌|𝑁)] + 𝑉𝑎𝑟[𝐸(𝑌|𝑁)] = 𝐸[𝑁 𝑉𝑎𝑟(𝑋𝑖)] + 𝑉𝑎𝑟[𝑁 𝐸(𝑋𝑖)] = 𝐸(𝑁)𝑉𝑎𝑟(𝑋𝑖) + 𝑉𝑎𝑟(𝑁)[𝐸(𝑋𝑖)]²

Example 4.1

The number of busses entering a terminal in a day has a Poisson distribution with rate 30, and the number of people getting on each bus follows a discrete uniform distribution with values of {1, 2, . . . , 19}. What are the mean and variance of the number of people who get on the busses in a day?
Solution. If 𝑋𝑖 denotes the number of people in the 𝑖th bus and 𝑁 denotes the number of busses, then the number of people in the terminal is equal to ∑_{𝑖=1}^{𝑁} 𝑋𝑖. Hence, we have:

𝐸(𝑋) = 𝐸(𝑁)𝐸(𝑋𝑖 ) = 30 × 10 = 300

Accordingly, multiplying the mean number of people in each bus by the mean
number of busses, we can obtain the mean number of people in the terminal.

𝑉𝑎𝑟(∑_{𝑖=1}^{𝑁} 𝑋𝑖) = 𝐸(𝑁)𝑉𝑎𝑟(𝑋𝑖) + 𝑉𝑎𝑟(𝑁)[𝐸(𝑋𝑖)]² = 30 × (19² − 1)/12 + 30 × 10² = 3900
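A simulation of the bus example confirms both moments; a sketch using NumPy:

import numpy as np

rng = np.random.default_rng(4)
trials = 200_000
totals = np.empty(trials)
for t in range(trials):
    n_buses = rng.poisson(30)                            # N ~ Poisson(30)
    totals[t] = rng.integers(1, 20, size=n_buses).sum()  # each bus: U{1,...,19}

print("mean ≈", totals.mean(), " (exact 300)")
print("variance ≈", totals.var(), " (exact 3900)")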

Example 4.2

If the number of people who enter a store in a day has a Poisson distribution with mean 20 people and each person's purchase amount follows an exponential distribution with parameter 𝜆 = 1/10, then what are the mean and variance of the daily sales amount of the store?

Solution. If 𝑋𝑖 denotes the purchase amount of the 𝑖th person and 𝑁 denotes the number of people who enter the store in a day, then the sales amount of the store is equal to ∑_{𝑖=1}^{𝑁} 𝑋𝑖. Hence, we have:

𝑋𝑖 ∼ 𝐸𝑥𝑝(𝜆 = 1/10),  𝑁 ∼ 𝑃(20)
𝐸(𝑋) = 𝐸(𝑁)𝐸(𝑋𝑖) = 20 × 10 = 200

As a result, multiplying the mean number of people entering the store in a day by the mean of each person's purchase amount, we can obtain the mean daily sales amount of the store.

Moreover, for the variance of the daily sales amount of the store, we have:

𝑉𝑎𝑟(∑_{𝑖=1}^{𝑁} 𝑋𝑖) = 𝐸(𝑁)𝑉𝑎𝑟(𝑋𝑖) + 𝑉𝑎𝑟(𝑁)[𝐸(𝑋𝑖)]² = 20 × 10² + 20 × 10² = 4000

Note that the number of variables counted in the term ∑_{𝑖=0}^{𝑁} 𝑋𝑖 is not 𝑁, but 𝑁 + 1. Hence, if the 𝑋𝑖 's are independent and identically distributed random variables with mean 𝜇 and variance 𝜎², and 𝑁 is a discrete random variable with nonnegative integer values, then we have:

𝐸(∑_{𝑖=0}^{𝑁} 𝑋𝑖) = 𝐸(𝑁 + 1)𝐸(𝑋𝑖) = [𝐸(𝑁) + 1]𝐸(𝑋𝑖)

𝑉𝑎𝑟(∑_{𝑖=0}^{𝑁} 𝑋𝑖) = 𝐸(𝑁 + 1)𝑉𝑎𝑟(𝑋𝑖) + 𝑉𝑎𝑟(𝑁 + 1)[𝐸(𝑋𝑖)]² = [𝐸(𝑁) + 1]𝑉𝑎𝑟(𝑋𝑖) + 𝑉𝑎𝑟(𝑁)[𝐸(𝑋𝑖)]²

1) We denote the number of customers entering a store in a day by Y which
follows a Poisson distribution with parameter 𝜆. Each customer buys a specific
product with probability 𝑝. If 𝑋 denotes the number of customers who buy the
product in a day, then it is desired to calculate:
a. 𝑃(𝑋 = 𝑘) for 𝑘 = 0,1,2,3, . ..
b. 𝐸(𝑋)
c. Var(𝑋)
d. 𝐸(XY)
e. 𝐶𝑜𝑣(𝑋, 𝑌)
f. Var(X+Y)
Hint: (𝑋|𝑌) ∼ 𝐵(Y,p) and 𝑌 ∼ 𝑃(𝜆)
2) We denote the number of accidents for people of a city by 𝑋 which has a
Poisson distribution with rate 𝑌. 𝑌 or accident rate of the people of the city
itself follows an exponential distribution with rate 𝜆 = 1. It is desired to
calculate:
a. 𝑃(𝑋 = 𝑘) for 𝑘 = 0,1,2,3, . ..
b. 𝐸(𝑋)
c. Var(𝑋)
d. 𝐸(XY)
e. 𝐶𝑜𝑣(𝑋, 𝑌)
f. Var(X+Y)
Hint: (𝑋|𝑌) ∼ 𝑃(𝑌) and 𝑌 ∼ 𝐸𝑥𝑝(𝜆 = 1)
3) Solve the preceding problem under the condition that 𝑌 has a gamma
distribution with parameters (𝛼 = 3, 𝜆 = 1).
4) We generate a random number uniformly distributed over the interval (0,1)
and denote it by 𝑌. Then, we perform 𝑛 independent trials with identical
probability of success 𝑌 and denote their number of successes by 𝑋. If so, then
it is desired to calculate:
a. 𝑃(𝑋 = 𝑘) for 𝑘 = 0,1,2,3, …

b. 𝐸(𝑋)
c. 𝑉𝑎𝑟(𝑋)
d. 𝐸(𝑋𝑌)
e. 𝐶𝑜𝑣(𝑋, 𝑌)
Hint: (𝑋|𝑌) ∼ 𝐵(𝑛, 𝑌) and 𝑌 ∼ 𝑈(0,1)
5) We first generate a random number uniformly distributed over interval 0 to 1
and denote it by 𝑌. Then, we perform a trial with probability of success 𝑌
independently until getting the first success and denote the number of
performed trials by 𝑋. If so, then:
a. Obtain 𝑃(𝑋 = 𝑘) for 𝑘 = 1,2,3, . ...
b. Show that 𝐸(𝑋) and 𝑉𝑎𝑟(𝑋) do not exist.
c. Obtain 𝐸(𝑋𝑌).
d. Show that 𝐶𝑜𝑣(𝑋, 𝑌) does not exist.
Hint: (𝑋|𝑌) ∼ 𝐺(𝑌) and 𝑌 ∼ 𝑈(0,1)
6) We denote the lifetime of a light bulb by the random variable 𝑋 which follows
an exponential distribution with failure rate 𝑌. Also, 𝑌 itself has an exponential
distribution with mean 1.
a. Obtain 𝑃(𝑋 < 𝑘) for 𝑘 > 0.
b. Show that 𝐸(𝑋) and 𝑉𝑎𝑟(𝑋) do not exist.
c. Obtain 𝐸(𝑋𝑌).
d. Show that 𝐶𝑜𝑣(𝑋, 𝑌) does not exist.
Hint: (𝑋|𝑌) ∼ 𝐸𝑥𝑝(𝑌) and 𝑌 ∼ 𝐸𝑥𝑝(𝜆 = 1)
7) We first roll a fair die and denote the corresponding result by 𝑋. Then, we flip
a fair coin 𝑋 times and denote the number of upturned heads by 𝑌. It is desired
to calculate:
a. 𝐸(𝑌)
b. Var(𝑌)
c. 𝐸(XY)
Hint: (𝑌|𝑋) ∼ 𝐵(𝑋, 𝑝 = 1/2) and 𝑋 ∼ 𝑈{1, 2, . . . , 6}

8) Show the validity of the following relationship:
𝑉𝑎𝑟(𝑋𝑌) = 𝐸[𝑌²𝑉𝑎𝑟(𝑋|𝑌)] + 𝑉𝑎𝑟[𝑌𝐸(𝑋|𝑌)]

9) If 𝑋|𝑦~𝑃(𝜆 = 𝑦) and 𝑌~𝐸𝑥𝑝(1), then obtain 𝑉𝑎𝑟(𝑋𝑌).
10) A fair die is independently rolled 20 times. If 𝑋 and 𝑌 denote the number of
times that sides 1 and 2 appear, respectively, then:
a. Obtain the distribution of 𝑋 given 𝑌.
b. Obtain the mean of 𝑋 given 𝑌.
11) We select number 𝑋 at random from the set {1,2,3,4,5}. Then, we choose a
number at random from the subset {1,2, . . . , 𝑋}. If 𝑌 denotes the second
selected number, then obtain 𝐸(𝑌).
12) Suppose that the 𝑋𝑖 's are independent and identically distributed random
variables with mean 𝜇 and variance 𝜎 2 . If 𝑛 is a nonnegative integer and 𝑁 is
a discrete random variable with nonnegative integers, then it is desired to
calculate:
a. 𝐸(∑_{𝑖=1}^{𝑛} 𝑋𝑖) and 𝐸(∑_{𝑖=1}^{𝑁} 𝑋𝑖)
b. 𝑉𝑎𝑟(∑_{𝑖=1}^{𝑛} 𝑋𝑖) and 𝑉𝑎𝑟(∑_{𝑖=1}^{𝑁} 𝑋𝑖)

13) The number of busses entering a terminal in a day follows a Poisson distribution with rate 30, and the number of people who get on any bus has a discrete uniform distribution with values of {1, 2, . . . , 19}. Obtain the mean and variance of the number of people who enter the terminal to get on the bus in a day.
14) Suppose that the time between arrivals of cars at a gas station follows an exponential distribution with mean 1. Moreover, we know that the refueling time of each car follows a uniform distribution on the interval 0 to 2 minutes. Considering the independence of these two variables, if 𝑋 denotes the time (in hours) that the station is operating, obtain its mean and variance.
15) Suppose that 𝑋0, 𝑋1, . . . , 𝑋𝑁 are (𝑁 + 1) independent and identically distributed random variables with mean 𝜇 and variance 𝜎² such that 𝑁 itself is a Poisson random variable with parameter 1. If so, then obtain the values of 𝐸(∑_{𝑖=0}^{𝑁} 𝑋𝑖) and 𝑉𝑎𝑟(∑_{𝑖=0}^{𝑁} 𝑋𝑖).
16) Devices with installed main and spare systems are brought to the repair shop.
The repairing time of each system (not each device) is exponential with mean
1 hour. The probability that only one system is defective is equal to 𝑝, and the
probability of being defective for the two systems is equal to 𝑞 (𝑝 + 𝑞 = 1).
Obtain the mean and variance of the repairing time of each device.

17) We roll a fair die successively until the side 6 comes up. Observing the side
𝑘 (𝑘 = 1,2, . . . ,5), we wait for 𝑘 minutes and then roll the die again. If 𝑇
denotes the time before observing the side 6, obtain the expected value of
𝑇.
18) A random number generator produces each of the numbers 0 to 9 with the
same probability of 0.1. Obtain the average number of zeros produced
between two consecutive 9's.
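The following sketch simulates a long stream of digits and averages the number
of zeros strictly between consecutive 9's (the stream length and seed are
arbitrary choices):

    import numpy as np

    rng = np.random.default_rng(seed=3)
    digits = rng.integers(0, 10, size=10_000_000)   # i.i.d. digits 0-9

    nines = np.flatnonzero(digits == 9)             # positions of the 9's
    zeros_cum = np.cumsum(digits == 0)              # running count of zeros
    # zeros strictly between each pair of consecutive 9's:
    between = zeros_cum[nines[1:]] - zeros_cum[nines[:-1]]
    print(between.mean())             # compare with the analytical answer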
19) Suppose that the 𝑋𝑖 's are independent and identically distributed random
variables with mean 𝜇 and variance 𝜎². If 𝑛 is a nonnegative integer and 𝑁 is
a discrete random variable taking nonnegative integer values (independent of
the 𝑋𝑖 's), then it is desired to calculate:
a. 𝐸(∏_{𝑖=1}^{𝑛} 𝑋𝑖) and 𝐸(∏_{𝑖=1}^{𝑁} 𝑋𝑖)
b. 𝑉𝑎𝑟(∏_{𝑖=1}^{𝑛} 𝑋𝑖) and 𝑉𝑎𝑟(∏_{𝑖=1}^{𝑁} 𝑋𝑖)
Hint: to calculate 𝐸(𝑎^𝑁), it suffices to replace 𝑡 with the value of 𝑎 in the
factorial moment generating function of the random variable 𝑁.
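The hint can be verified numerically. The sketch below uses the hypothetical
choices 𝑁 ∼ Poisson(2) and 𝑋𝑖 ∼ Bernoulli(0.7); for Bernoulli factors, the
product over 𝑖 = 1, . . . , 𝑁 equals 1 exactly when all 𝑁 factors equal 1,
which happens with conditional probability 𝑝^𝑁:

    import numpy as np

    rng = np.random.default_rng(seed=4)
    reps, lam, p = 1_000_000, 2.0, 0.7

    N = rng.poisson(lam=lam, size=reps)
    # given N, the product of N i.i.d. Bernoulli(p) factors is Bernoulli(p**N);
    # an empty product (N = 0) is taken to be 1
    prod = (rng.random(reps) < p**N).astype(float)

    print(prod.mean())              # Monte Carlo estimate of E(prod X_i)
    print(np.exp(lam * (p - 1.0)))  # FMGF of Poisson(lam) evaluated at t = p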
20) Suppose that 𝑆(𝑡) denotes the price of a product at time 𝑡 (𝑡 ≥ 0). When
economic shocks happen, the price of this product varies. If 𝑁(𝑡) denotes the
number of shocks until time 𝑡 and 𝑋𝑖 denotes the effect of the 𝑖-th shock, we
have 𝑆(𝑡) = 𝑆(0) ∏_{𝑖=1}^{𝑁(𝑡)} 𝑋𝑖, where we take ∏_{𝑖=1}^{𝑁(𝑡)} 𝑋𝑖 = 1 for
𝑁(𝑡) = 0. Now, if the 𝑋𝑖 's are independent and identically distributed
exponential random variables with rate 𝜇, {𝑁(𝑡), 𝑡 > 0} follows a Poisson
process with rate 𝜆 (independent of the 𝑋𝑖 's), and 𝑆(0) = 𝑠, then obtain
𝐸(𝑆(𝑡)).
21) Suppose that 𝑋1 , 𝑋2 , … is a sequence of independent and identically
distributed Bernoulli random variables with parameter 𝑝. If 𝑁 is a random
variable independent of the 𝑋𝑖 's with probability function
𝑃(𝑁 = 𝑛) = 𝑝𝑞^{𝑛−1}; 𝑛 = 1, 2, … (𝑞 = 1 − 𝑝),
then it is desired to calculate:
a. 𝐶𝑜𝑣(𝑁, ∑_{𝑖=1}^{𝑁} 𝑋𝑖)
b. 𝐶𝑜𝑣(∑_{𝑖=1}^{𝑁} 𝑋𝑖 , 𝑁 − ∑_{𝑖=1}^{𝑁} 𝑋𝑖)

22) Suppose that 𝑋 ∼ 𝑁(0,1) and 𝑌 ∼ 𝐵(1, 𝑝) are two independent random
variables. If 𝑉 = 𝑋 when 𝑌 = 1 and 𝑉 = −𝑋 when 𝑌 = 0, obtain 𝐶𝑜𝑣(𝑋, 𝑉).
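The construction of 𝑉 in Problem 22 is easy to simulate; in the sketch below,
𝑝 = 0.7 is an arbitrary choice, and since 𝐸(𝑋) = 𝐸(𝑉) = 0, the sample mean of
𝑋𝑉 estimates 𝐶𝑜𝑣(𝑋, 𝑉):

    import numpy as np

    rng = np.random.default_rng(seed=5)
    n, p = 1_000_000, 0.7

    x = rng.standard_normal(n)     # X ~ N(0,1)
    y = rng.random(n) < p          # Y ~ Bernoulli(p)
    v = np.where(y, x, -x)         # V = X if Y = 1, else -X

    print(np.mean(x * v))          # estimates Cov(X, V); compare with the
                                   # closed form obtained analytically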
23) In successive flips of a fair coin, if 𝑋 denotes the number of flips until
getting the first heads and 𝑌 denotes the number of flips until getting the
first tails, then obtain the covariance between 𝑋 and 𝑌.
24) An employee is required to provide a customer service whose duration
follows an exponential distribution with a mean of 6 hours. Also, the working
time for employees is 8 hours per day. If the employee completes his work
before the end of the office hours, he must stay in the office until the end of
the working hours. On the other hand, if his work lasts longer than the office
hours, then he must remain in the office until the work is finished. Obtain the
mean time that the employee spends in the office.
25) The lifetime of an electronic device follows an exponential distribution
with mean 1/𝜆. If the device fails or its lifetime reaches 𝑇, the device will
be replaced. What is the mean time until this device is replaced?
26) Suppose that 𝑋 is a geometric random variable with probability function
𝑃(𝑋 = 𝑥) = 𝑞^{𝑥−1}𝑝; 𝑥 = 1, 2, …. If 𝑀 is a positive integer and
𝑌 = 𝑀𝑎𝑥(𝑋, 𝑀), that is, 𝑌 = 𝑋 when 𝑀 ≤ 𝑋 and 𝑌 = 𝑀 when 𝑋 < 𝑀, then obtain
the value of 𝐸(𝑌).
27) A piece of wood with length 1 is available. We randomly select a point on the
wood and then divide it into two parts.
a. Obtain the mean length of the smaller piece.
b. Obtain the mean length of the larger piece.
c. If a specific point is marked on the wood at a distance of 0.4 from its
beginning, then obtain the mean length of the piece that contains this
point.
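All three parts of Problem 27 can be checked with a single simulation of the
break point 𝑈 ∼ 𝑈(0, 1):

    import numpy as np

    rng = np.random.default_rng(seed=6)
    u = rng.random(1_000_000)                 # break point U ~ U(0,1)

    smaller = np.minimum(u, 1 - u)            # part (a)
    larger = np.maximum(u, 1 - u)             # part (b)
    contains = np.where(u < 0.4, 1 - u, u)    # part (c): piece containing 0.4

    print(smaller.mean(), larger.mean(), contains.mean())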
28) Suppose that 𝑋1 ∼ 𝑈(0, 𝜃) and 𝑋2 ∼ 𝑈(0, 2𝜃) are independent random
variables. If 𝑀 = 𝜃 when 𝑋1 > 𝑋2 and 𝑀 = 2𝜃 when 𝑋1 ≤ 𝑋2 , then obtain the
value of 𝐸(𝑀).
29) A physician has given examination appointments to patients A and B at 5:00
and 5:10 p.m., respectively. If the examination duration of each patient has an
exponential distribution with an average of 15 minutes, then how long, on
average, does the second patient wait beyond 5:10 p.m. before being examined
(assuming that he is present at the clinic at 5:10)?
30) In a game, a fair die and a fair coin are tossed simultaneously. If the
coin lands on heads, the player wins twice the number appearing on the die;
otherwise, he wins half of that number. Obtain the expected value of the
winnings.
31) We roll a fair die. If it lands on 6, we flip a fair coin 10 times and
receive a score equal to twice the number of heads. If the die lands on a side
other than 6, we flip the fair coin 10 times and receive +2 points for each
heads and −1 point for each tails. Calculate the expected value of the obtained
score.
32) A fair die is successively rolled. If 𝑋 and 𝑌 denote the number of required rolls
until getting a 6 and a 5, respectively, then it is desired to calculate:
a. 𝐸(𝑋|𝑌 = 1)
b. 𝐸(𝑋|𝑌 = 2)
Hint: condition on the result of the first trial.
33) A miner is trapped in a mine that has three paths. The first path leads to
a tunnel that will save the miner after 2 hours. However, the second and third
paths lead to tunnels returning him to the mine after 3 and 5 hours,
respectively. If we assume that, each time, the miner chooses one of the paths
independently and with the same probability, and 𝑋 denotes the miner's rescue
time, then obtain the expected value and moment generating function of 𝑋.
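A direct simulation of the miner's journey gives a quick numerical estimate of
𝐸(𝑋) (the number of repetitions is arbitrary):

    import numpy as np

    rng = np.random.default_rng(seed=7)

    def rescue_time():
        # follow randomly chosen paths until path 1 (the rescue) is taken,
        # accumulating the elapsed hours along the way
        t = 0.0
        while True:
            path = rng.integers(1, 4)      # path 1, 2 or 3, equally likely
            if path == 1:
                return t + 2.0
            t += 3.0 if path == 2 else 5.0

    print(np.mean([rescue_time() for _ in range(200_000)]))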
34) Consider a city with a population of 𝑁 such that one percent of them have
contracted a specific illness. To diagnose the illness, blood samples are taken
from the people in groups of size 𝑛 (𝑁 = 𝐾𝑛). The blood samples of each group
are then pooled and investigated at a laboratory. If the result of the test is
negative, then all the people of the group are healthy, and only one test is
required; otherwise, all 𝑛 blood samples of the group must be investigated
separately, which requires 𝑛 + 1 tests in total. On average, how many tests are
needed to diagnose the illness status of these people?
35) We flip a coin whose probability of landing on heads is 0.1 in each flip.
On average, how many times should we flip it until:
a. Two heads appear.
b. Two consecutive heads appear.
c. A heads and a tails appear.
d. A heads and two tails appear.
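Part (b) can be checked by simulation; the sketch below counts flips until the
first run of two consecutive heads (the repetition count is arbitrary):

    import numpy as np

    rng = np.random.default_rng(seed=8)

    def flips_until_two_heads(p=0.1):
        # count flips until two heads appear in a row
        streak, count = 0, 0
        while streak < 2:
            count += 1
            streak = streak + 1 if rng.random() < p else 0
        return count

    print(np.mean([flips_until_two_heads() for _ in range(50_000)]))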
36) There is a road with a length of 30 miles between cities A and B. If an
accident happens along the road, then its location is assumed to follow a
uniform distribution along the road. An ambulance is stationed 10 miles
from city A. What is the mean distance of the accident location from the
ambulance location?
37) In the preceding problem, if another ambulance is stationed 10 miles from
city B (20 miles from city A), then what is the mean distance of the accident
location from the nearest ambulance?
38) The number of customers of a car rental institution follows a Poisson
distribution with a mean of 3 on sunny days and a Poisson distribution with a
mean of 2 on non-sunny days. If it is sunny tomorrow with probability 0.4 and
cloudy with probability 0.6, then obtain the mean and variance of the number
of customers of the institution for tomorrow.
39) Suppose that box 1 contains 5 white and 6 black balls and box 2 contains 8
white and 10 black balls. Two balls are randomly selected from box 1 and
then are put into box 2. Then, 3 balls are randomly chosen from box 2. What
is the mean number of white balls in the selected sample of size 3?
40) You and another person are invited to participate in an auction. If your
bid wins, you plan to resell the item for 9 thousand dollars. If you know that
the other person's bid follows a uniform distribution on the interval 5 to 10
thousand dollars, then what should your bid be in order to maximize your
expected profit?
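Under the natural reading that your bid 𝑏 wins whenever it exceeds the other
person's bid, so that you win with probability (𝑏 − 5)/5 for 5 ≤ 𝑏 ≤ 10 and
your profit is then 9 − 𝑏 (in thousands of dollars), the expected profit can be
scanned numerically over candidate bids:

    import numpy as np

    b = np.linspace(5, 9, 401)               # candidate bids (in thousands)
    expected_profit = (9 - b) * (b - 5) / 5  # P(win) * profit if you win

    print(b[np.argmax(expected_profit)])     # bid maximizing expected profit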
41) In the preceding problem, suppose that two other people along with you
have participated in the auction. If you know that their bids independently
follow a uniform distribution on the interval 5 to 10 thousand dollars, what
should your bid be in order to maximize your expected profit?
42) Suppose that a box contains 30 balls such that 7 of them are red and 8 of
them are blue. 12 balls are randomly selected from the box. If 𝑋 denotes the
number of red balls and 𝑌 denotes the number of blue balls, then it is desired
to calculate:
a. The expected value of 𝑋.
b. The expected value of 𝑋 given 𝑌 = 𝑦.
c. 𝐶𝑜𝑣(𝑋, 𝑌)
43) Suppose that we roll a fair die and its result is denoted by 𝑌. Then, people A
and B independently flip a fair coin 𝑌 times. If we denote the number of
heads resulting from their flips by 𝑋1 and 𝑋2 , respectively, then obtain the
covariance between 𝑋1 and 𝑋2.
Hint: In this problem, the variables (𝑋1 |𝑌 = 𝑦) and (𝑋2 |𝑌 = 𝑦) are
independent, so the conditional covariance between them is zero. This means
that when the result of the die is specified, the numbers of heads obtained by
people A and B are independent. However, the covariance between 𝑋1 and 𝑋2 is
not zero. We call such variables conditionally independent.
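The distinction drawn in the hint can be seen numerically: simulating the
experiment yields a clearly nonzero sample covariance between 𝑋1 and 𝑋2, even
though the conditional covariance given 𝑌 is zero:

    import numpy as np

    rng = np.random.default_rng(seed=9)
    n = 1_000_000

    y = rng.integers(1, 7, size=n)   # die result Y
    x1 = rng.binomial(y, 0.5)        # heads for person A in Y flips
    x2 = rng.binomial(y, 0.5)        # heads for person B in Y flips

    print(np.mean(x1 * x2) - x1.mean() * x2.mean())   # sample Cov(X1, X2)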

