

Probability and Statistics

MA-202
Handout 1

January 08, 2024

Random Experiment: A random (or statistical) experiment is an experiment in which:

• All possible outcomes of the experiment are known in advance.

• Any performance of the experiment yields an outcome that is unpredictable beforehand.

• The experiment can be repeated under identical conditions.

Our goal is to study this uncertainty of a random experiment. For this purpose, we associate with
each such experiment a set Ω, the set of all possible outcomes of the experiment. This collection
Ω must be

• mutually exclusive: that is, if a head comes up in a coin-tossing experiment then a tail does not, and vice versa (each performance yields a unique outcome).

• collectively exhaustive: no matter what happens in the experiment, we always obtain an outcome that has been included in Ω.

Recap
Example 1. Consider tossing a coin two times. Let A be the subset of outcomes containing at least one head in the two throws, and let B be the subset of outcomes in which both tosses result in tails. Find the probability of A and B.
Solution The set of possible outcomes, Ω, is given by

Ω = {HH, HT, T H, T T }

where H denotes a head and T denotes a tail. Clearly, we have

A = {HH, HT, T H}, B = {T T }.

The required probabilities are given by

P(A) = 3/4, P(B) = 1/4.
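The classical probabilities above can also be checked by simulation; a minimal Python sketch (function and variable names are ours, not part of the handout):

```python
import random

def simulate(trials=100_000, seed=0):
    """Estimate P(A) and P(B) for two fair coin tosses by simulation.

    A = at least one head in two tosses, B = both tosses are tails.
    """
    rng = random.Random(seed)
    count_a = count_b = 0
    for _ in range(trials):
        tosses = [rng.choice("HT") for _ in range(2)]
        if "H" in tosses:
            count_a += 1
        if tosses == ["T", "T"]:
            count_b += 1
    return count_a / trials, count_b / trials

p_a, p_b = simulate()
print(p_a, p_b)  # close to 0.75 and 0.25
```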

Example 2. A die is rolled twice. Let A be the collection of outcomes in which the first throw shows a number ≤ 2, and B be the collection of outcomes in which the second throw shows at least 4. Find the probability of A and B.

Solution The set of possible outcomes, Ω, is given by

Ω = {(i, j)|1 ≤ i, j ≤ 6}.

We have

A = {(i, j)|i = 1, 2, j = 1, 2, 3, 4, 5, 6}
B = {(i, j)|i = 1, 2, 3, 4, 5, 6, j = 4, 5, 6}

The required probabilities are given by

P(A) = 12/36 = 1/3, P(B) = 18/36 = 1/2.
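For finite, equally likely models such as this one, the probabilities can be computed by direct enumeration; a small sketch:

```python
from fractions import Fraction

# Sample space of two die rolls: ordered pairs (i, j)
omega = [(i, j) for i in range(1, 7) for j in range(1, 7)]
A = [(i, j) for (i, j) in omega if i <= 2]   # first throw shows <= 2
B = [(i, j) for (i, j) in omega if j >= 4]   # second throw shows >= 4

p_a = Fraction(len(A), len(omega))
p_b = Fraction(len(B), len(omega))
print(p_a, p_b)  # 1/3 1/2
```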

Remark 1. The random experiments in the above examples have a finite number of possible outcomes, and it is assumed that all the outcomes are equally likely. Thus, the probability of any subset is defined as the ratio of the number of favorable outcomes to the total number of all possible outcomes, i.e.,
P(A) = n(A)/n(Ω),

where n(A) denotes the cardinality of the set A. This is the classical definition of probability given by Laplace in his monumental work, Théorie analytique des probabilités (1812).

Remark 2. Next, suppose Ω is countably infinite, say, Ω = {ω1, ω2, . . .}. For example, we select a number at random from the set of natural numbers, N. The classical definition is not applicable in such cases. Further, we cannot have a uniform probability for each outcome.

• If P(ωi) = p > 0 ∀ i, we can find a sufficiently large N such that N p > 1, hence a contradiction.

Extension of the classical definition


An extension of Laplace’s classical definition was used to evaluate probabilities of sets with infinite
outcomes. According to this extension, if Ω is some region with a well-defined measure (length,
area, volume, etc.), the probability that a point chosen at random lies in a sub-region A of Ω is
the ratio
P (A) = measure(A)/measure(Ω).

Example 3. Consider throwing a dart at a square target and viewing the point of impact as the outcome. The outcome of a single throw is going to be a point in the region Ω given by

Ω = {(x, y) | 0 ≤ x, y ≤ 1}.

Since there are infinitely many points, the sample space is infinite and uncountable. Find the probability that the dart will hit the region {(x, y) | x + y ≤ 0.5}.

Solution Let A denote the target region, i.e.,

A = {(x, y)|x + y ≤ 0.5}.

We can find the probability of A as

P(A) = area of A / area of Ω = ((1/2) × (1/2) × (1/2)) / (1 × 1) = 1/8.
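The geometric answer can be corroborated by a Monte Carlo estimate (a sketch; the sample size and seed are arbitrary):

```python
import random

rng = random.Random(42)
trials = 200_000
# Draw (x, y) uniformly on the unit square; count hits in {x + y <= 0.5}.
hits = sum(rng.random() + rng.random() <= 0.5 for _ in range(trials))
estimate = hits / trials
print(estimate)  # close to 1/8 = 0.125
```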

This extension proved instrumental in resolving numerous geometric probability problems. However, the challenge lies in the fact that one can arbitrarily define "at random" in various ways, and distinct definitions can yield different solutions.

Example 4. (Bertrand’s Paradox) A chord is drawn at random in the unit circle. What is
the probability that the chord is longer than the side of the equilateral triangle inscribed in the
circle?

Solution 1 The length of a chord is uniquely characterized by the midpoint’s distance from the
circle’s center. Because the circle is symmetrical, we assume the midpoint M is on a fixed line
from the center O (see Figure 1). It is evident that the chord’s length exceeds the side of the
inscribed equilateral triangle when the length of OM is less than half of the radius. Consequently,
the desired probability is 0.5.

Figure 1: Solution 1

Solution 2 Let us pick a point, say V, on the circle. Now, draw a tangent to the circle at V. Draw a line at a random angle Φ ∈ [0, π] passing through V which intersects the circle at some point, thus forming a chord. Clearly, the length of the chord is greater than the side of the equilateral triangle if Φ ∈ [π/3, 2π/3]. Therefore, the required probability is (2π/3 − π/3)/(π − 0) = 1/3.

Figure 2: Solution 2
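The paradox can also be seen numerically: simulating the two chord-generating mechanisms yields different probabilities (a sketch under the parametrizations of Solutions 1 and 2; variable names are ours):

```python
import math
import random

rng = random.Random(1)
trials = 100_000
side = math.sqrt(3)  # side of the equilateral triangle inscribed in the unit circle

# Mechanism 1: the midpoint's distance d from the center is uniform on [0, 1];
# the chord is longer than the triangle's side iff d < 1/2.
long1 = sum(rng.random() < 0.5 for _ in range(trials))

# Mechanism 2: fix a point V on the circle and draw the chord at a uniform
# angle Phi in [0, pi] from the tangent at V; the chord length is 2*sin(Phi).
long2 = sum(2 * math.sin(rng.uniform(0.0, math.pi)) > side for _ in range(trials))

print(long1 / trials, long2 / trials)  # close to 0.5 and 1/3
```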

Axiomatic definition of probability


A. N. Kolmogorov, 1933.

For the reasons above, it is necessary to develop a consistent probability theory, and the idea is to define directly the probability of subsets of Ω. In other words, we restrict our attention to "good subsets" of the sample space on which the probability is well-defined. The collection of such "good subsets" is known as a σ-field, which is defined below.

Definition 1. (σ-Field) Let F denote a collection of subsets of Ω which satisfies the following properties:

a) ∅ ∈ F.

b) If A ∈ F, then its complement Ac ∈ F.

c) If Ai ∈ F, i = 1, 2, . . ., then ∪i Ai ∈ F.

Then, F is called a σ-field on Ω. The pair (Ω, F) is called the sample space. The default F is the collection of all possible subsets of Ω including the empty set, which is also called the power set of Ω.
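For a finite Ω, one can verify directly that the power set satisfies properties a)–c); a small sketch (on a finite Ω, closure under pairwise unions already implies closure under countable unions):

```python
from itertools import chain, combinations

omega = frozenset({1, 2, 3})

def power_set(s):
    """All subsets of s, each represented as a frozenset."""
    return {frozenset(c)
            for c in chain.from_iterable(combinations(s, r) for r in range(len(s) + 1))}

F = power_set(omega)

assert frozenset() in F                        # a) empty set belongs to F
assert all(omega - A in F for A in F)          # b) closed under complements
assert all(A | B in F for A in F for B in F)   # c) closed under unions
print(len(F))  # 2**3 = 8
```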

• The elements of Ω are called sample points.

• Any set A ∈ F is known as an event.

• We say that an event A occurs if the outcome of the experiment corresponds to a point in
A.

• If the set Ω contains only a finite number of points, we say that (Ω, F) is a finite sample
space.

• If Ω contains at most a countable number of points, we call (Ω, F) a discrete sample space.

• If Ω contains uncountably many points, we say that (Ω, F) is an uncountable sample space.
For instance, if Ω = Rp or some rectangle in Rp , we call it a continuous sample space.

Definition 2. (Probability) Define a real-valued function P on F satisfying the following conditions:

a) P(A) ≥ 0 ∀ A ∈ F.

b) P(Ω) = 1.
c) Let A1, A2, . . . be mutually exclusive events in F, i.e., Ai ∩ Aj = ∅ ∀ i ̸= j; then

P(A1 ∪ A2 ∪ · · ·) = P(A1) + P(A2) + · · · .

The function P is called a probability function over sample space (Ω, F), and the triplet (Ω, F, P )
is called a probability space/model.

Remark 3. Note that different choices of F and P give different probability models for a given Ω.

Discussion
(1) Ω is Finite

If Ω contains n points {ω1 , . . . , ωn }, with n < ∞, it is sufficient to define P for ωi ’s, say,

P (ωi ) = pi .

One can then consider F as the class of all subsets of Ω, and obtain probability of any event
(subset of Ω) as
P(A) = Σ_{j : ωj ∈ A} P(ωj) = Σ_{j : ωj ∈ A} pj.

From point (a) in the definition of probability, we need pi ≥ 0, and from point (b), we need Σ_{j=1}^n pj = 1.

Example 5. One such function is the equally likely assignment (uniform probabilities). According to this assignment, P({ωj}) = 1/n, j = 1, 2, . . . , n. Thus P(A) = m/n if A contains m elementary events, 1 ≤ m ≤ n. Note that this is the classical definition of probability.

(2) Ω is countably infinite

As noted earlier, if Ω is discrete and contains a countable number of points, one cannot assign uniform probabilities. However, due to the countable additivity of the probability function, it suffices to make the assignment for each elementary event. Similar to the finite case, if Ω contains countably infinitely many points {ω1, ω2, . . .}, it is sufficient to define P for the ωi's, say,

P(ωi) = pi ∀ i,

because using the countable additivity of probability, one can find probability of any event
(subset of Ω) as
P(A) = Σ_{j : ωj ∈ A} P(ωj) = Σ_{j : ωj ∈ A} pj.

• From point (a) in the definition of probability, we need pi ≥ 0, and from point (b), we need Σ_j pj = 1.

• In this case, F can be considered as the power set of Ω.

Example 6. Let N = {1, 2, 3, . . .} be the set of positive integers, and define P as follows:

P({n}) = 1/2^n, n = 1, 2, . . .

Since 1/2^n ≥ 0 and Σ_{i∈N} P({i}) = 1, P defines a probability.

• P(number is even) = P({2, 4, 6, . . .}) = P({2}) + P({4}) + . . . = 1/2^2 + 1/2^4 + . . . = 1/3.

• P(number is a multiple of 3) = P({3, 6, 9, . . .}) = P({3}) + P({6}) + . . . = 1/2^3 + 1/2^6 + . . . = 1/7.
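The geometric series in this example can be checked numerically; a quick sketch (59 terms are far more than needed for double precision):

```python
# Partial sums of P({n}) = 1/2**n over various index sets.
total = sum(2.0 ** -n for n in range(1, 60))      # all n          -> 1
even = sum(2.0 ** -n for n in range(2, 60, 2))    # even n         -> 1/3
mult3 = sum(2.0 ** -n for n in range(3, 60, 3))   # multiples of 3 -> 1/7
print(total, even, mult3)
```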

Example 7. Suppose Ω = {0, 1, 2, . . .} and P({x}) = (1 − θ)θ^x, x = 0, 1, 2, . . ., 0 < θ < 1. One can verify that all the axioms are satisfied.

Remark 4. We observe from (1) and (2) above that if Ω contains a countable number of points, it is sufficient to define the probabilities of the sample points ωi, and the probability of any subset of Ω can then be obtained using countable (or finite) additivity. Thus, we need not bother about what F is (which, of course, is the power set of Ω).

(3) Ω is uncountable

Probabilistic models with a continuous sample space differ from their discrete counterparts in that the probabilities of single-element events may not be sufficient to characterize the probability.

• Clearly, one cannot make an equally likely assignment of probabilities. (See the case
of countable Ω)

• Indeed, one cannot assign positive probability to each elementary event without vio-
lating the axiom P (Ω) = 1.

Thus, in this case, one assigns probabilities directly to good subsets. In this course, we will
mostly be working with intervals in R (or their higher dimensional counterparts).

Example 8. Let Ω = [0, ∞). Define P as follows: for each interval I ⊂ Ω,

P(I) = ∫_I e^{−x} dx.

Clearly, P(I) ≥ 0, P(Ω) = 1, and P is countably additive by properties of integrals. Thus, P is a probability function on Ω.

Using the above P, we can find probabilities of different events. For example,

P((0, ∞)) = ∫_0^∞ e^{−x} dx = 1,
P((1, ∞)) = ∫_1^∞ e^{−x} dx = e^{−1},
P([1, ∞)) = ∫_1^∞ e^{−x} dx = e^{−1},
P((0, 1)) = ∫_0^1 e^{−x} dx = 1 − e^{−1}.
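Since the integral of e^{−x} from a to b is e^{−a} − e^{−b}, these probabilities can be evaluated in closed form; a minimal sketch (the helper name P is ours):

```python
import math

def P(a, b):
    """P of the interval (a, b) under the density e^{-x} on [0, inf):
    the integral of e^{-x} from a to b equals e^{-a} - e^{-b}."""
    return math.exp(-a) - math.exp(-b)

print(P(0, math.inf))  # 1.0
print(P(1, math.inf))  # e^{-1} ~ 0.3679
print(P(0, 1))         # 1 - e^{-1} ~ 0.6321
```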

Remark 5. The probability of elementary events, i.e., of singletons, is zero for a continuous sample space.

Remark 6. Note that if P (A) = 0 for some event A, we call A an event with zero probability
or a null event. However, it does not follow that A = ∅. For example, P ({1}) = 0 in the
above example. Similarly, if P (B) = 1 for some event B, we call B a certain event, but it
does not follow that B = Ω. For example, P ((0, ∞)) = 1 in the above example.

Example 9. Let Ω = {ω : 0 ≤ ω ≤ a}. Define

P([0, x]) = x/a, 0 < x ≤ a.

Check whether all the probability axioms are satisfied. Find P([a/4, a/2]).
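Since singletons carry zero probability here, P([a/4, a/2]) = P([0, a/2]) − P([0, a/4]) = 1/2 − 1/4 = 1/4 for any a > 0; a tiny sketch confirming this with exact arithmetic (the helper name is ours):

```python
from fractions import Fraction

def P_interval(x, y, a):
    """P([x, y]) = y/a - x/a under the assignment P([0, t]) = t/a on [0, a]."""
    return Fraction(y, a) - Fraction(x, a)

a = 8  # any positive value gives the same answer for [a/4, a/2]
print(P_interval(a // 4, a // 2, a))  # 1/4
```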

Other properties
Theorem 1. Given the events A and B such that A ⊂ B, show that

(i) P (A) ≤ P (B) (P is monotone)

(ii) P (B − A) = P (B) − P (A) (P is subtractive)

Remark 7. It follows from Theorem 1 that P (A) ≤ 1 for all events.

Theorem 2. (Addition Rule) For the events A, B, we have

P (A ∪ B) = P (A) + P (B) − P (A ∩ B)

and hence P (Ac ) = 1 − P (A).
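For a finite equally likely model, the addition rule and the complement formula can be checked on randomly generated events; a quick sketch:

```python
import random

rng = random.Random(7)
omega = set(range(20))

def P(S):
    """Uniform probability on a 20-point sample space."""
    return len(S) / len(omega)

ok = True
for _ in range(100):
    A = {x for x in omega if rng.random() < 0.5}
    B = {x for x in omega if rng.random() < 0.5}
    ok &= abs(P(A | B) - (P(A) + P(B) - P(A & B))) < 1e-12  # addition rule
    ok &= abs(P(omega - A) - (1 - P(A))) < 1e-12            # complement rule
print(ok)  # True
```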
