Lecture 1
Probability is a universally accepted tool for expressing degrees of confidence or doubt about some
proposition in the presence of incomplete information or uncertainty. By convention, probabilities are
calibrated on a scale of 0 to 1; assigning something a zero probability amounts to expressing the belief
that we consider it impossible, whereas assigning a probability of one amounts to considering it a
certainty. Most propositions fall somewhere in between. Probability statements that we make can be
based on our past experience, or on our personal judgments. Whether our probability statements are
based on past experience or on subjective personal judgments, they obey a common set of rules. These rules let us treat probabilities within a mathematical framework and use them for making decisions and predictions, for understanding complex systems, or simply as intellectual exercises and entertainment. Probability theory is one of the most widely applicable branches of mathematics.
Treatment of probability theory starts with the consideration of a sample space. The sample space is the
set of all possible outcomes in some physical experiment. For example, if a coin is tossed twice and after
each toss the face that shows is recorded, then the possible outcomes of this coin-tossing experiment are HH, HT, TH, and TT, with H denoting the occurrence of heads and T denoting the occurrence of tails.
We call Ω = {HH, HT, TH, TT} the sample space of the experiment. In general, a sample space is a general set Ω, finite or infinite. An easy example where the sample space is infinite is to toss a coin until the first time that heads shows up and record the number of the trial at which the first head appeared. In this case, the sample space is the countably infinite set Ω = {1, 2, 3, …}.
Sample spaces can also be uncountably infinite; for example, consider the experiment of choosing a number at random from the interval [0, 1]. The sample space of this experiment is Ω = [0, 1]. In this case, Ω is an uncountably infinite set. In all cases, individual elements of a sample space are denoted as ω. The first task is to define events and to explain the meaning of the probability of an event.
Definition: Let Ω be the sample space of an experiment. Then any subset A of Ω, including the empty set ∅ and the entire sample space Ω, is called an event. Events may contain even one single sample point ω, in which case the event is a singleton set {ω}. We want to assign probabilities to events, but in a way that is logically consistent. Here is a definition of what counts as a legitimate probability on events.
Definition: A probability P assigns to each event A a number P(A) such that:
(a) 0 ≤ P(A) ≤ 1 for every event A;
(b) P(Ω) = 1;
(c) given disjoint subsets A1, A2, A3, … of Ω, P(A1 ∪ A2 ∪ A3 ∪ ⋯) = P(A1) + P(A2) + P(A3) + ⋯
Property (c) is known as countable additivity. Note that it is not something that can be proved; it is taken as an axiom.
Definition: Let Ω be a finite sample space consisting of N sample points. We say that the sample points are equally likely if P({ω}) = 1/N for each sample point ω. An immediate consequence, due to the additivity axiom, is the following useful formula.
Proposition: Let Ω be a finite sample space consisting of N equally likely sample points. Let A be any event and suppose A contains n distinct sample points. Then P(A) = n/N.
Example: If two dice are rolled, what is the probability that they sum to seven?
Example: If four coins are tossed, what is the probability of getting at least two heads?
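Both counting examples above can be checked by brute-force enumeration of the equally likely outcomes, a direct use of the P(A) = n/N formula. A sketch in Python (variable names are ours, purely illustrative):

```python
from itertools import product
from fractions import Fraction

# Two dice: 36 equally likely ordered pairs; count those summing to 7.
dice = list(product(range(1, 7), repeat=2))
p_seven = Fraction(sum(1 for a, b in dice if a + b == 7), len(dice))
print(p_seven)  # 1/6

# Four coins: 16 equally likely sequences; count those with at least two heads.
coins = list(product("HT", repeat=4))
p_two_heads = Fraction(sum(1 for seq in coins if seq.count("H") >= 2), len(coins))
print(p_two_heads)  # 11/16
```

The six favorable dice outcomes are (1,6), (2,5), …, (6,1); for the coins, C(4,2) + C(4,3) + C(4,4) = 6 + 4 + 1 = 11 of the 16 sequences qualify.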
Example: Suppose there are five pairs of shoes in a closet and four shoes are taken out at random. What
is the probability that among the four that are taken out, there is at least one complete pair?
Example: How many random people do you need in a room before the probability that two of them
share a birthday exceeds 0.5?
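The birthday question can be answered numerically: assuming 365 equally likely birthdays, the probability that n people all have distinct birthdays is (365/365)(364/365)⋯((365 − n + 1)/365), and we look for the smallest n at which the complementary probability of a shared birthday exceeds 0.5. A short Python sketch:

```python
# Probability that n people all have distinct birthdays (365 equally likely days).
def p_no_match(n: int) -> float:
    p = 1.0
    for i in range(n):
        p *= (365 - i) / 365
    return p

# Smallest n with P(some shared birthday) = 1 - p_no_match(n) > 0.5.
n = 1
while 1 - p_no_match(n) <= 0.5:
    n += 1
print(n)  # 23
```

With 23 people the probability of a shared birthday is about 0.507, while with 22 it is about 0.476.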
Conditional Probability and Independence
Both conditional probability and independence are fundamental concepts in probability and statistics
alike. Conditional probabilities correspond to updating one’s beliefs when new information becomes
available. Independence corresponds to irrelevance of a piece of new information, even when it is made
available. In addition, the assumption of independence can and does significantly simplify development,
mathematical analysis, and justification of tools and procedures.
Definition: The complement of an event A, denoted Aᶜ, is the event that A does not occur. If P(A) is the probability of event A occurring, then P(Aᶜ) is the probability of event A not occurring, and it is obtained by subtracting the probability of the event from 1: P(Aᶜ) = 1 − P(A).
Definition: Let A and B be general events with respect to some sample space Ω, and suppose P(A) > 0. The conditional probability of B given A is defined as
P(B | A) = P(A ∩ B) / P(A)
Some immediate consequences of the definition of a conditional probability are the following.
Theorem: (a) (Multiplicative Formula) For any two events A, B such that P(A) > 0, one has
P(A ∩ B) = P(A) P(B | A)
(b) For any two events A, B such that 0 < P(A) < 1, one has P(B) = P(B | A) P(A) + P(B | Aᶜ) P(Aᶜ)
(c) (Total Probability Formula) If A1, A2, …, Ak form a partition of the sample space Ω (i.e., Ai ∩ Aj = ∅ for all i ≠ j, and A1 ∪ A2 ∪ ⋯ ∪ Ak = Ω), and if 0 < P(Ai) < 1 for all i, then
P(B) = P(B | A1) P(A1) + P(B | A2) P(A2) + ⋯ + P(B | Ak) P(Ak)
Example: If a card is drawn from a deck, what is the probability that it is an ace given that we know that
it is red?
Example: If two dice are rolled, what is the probability that they sum to 8 if we know that the sum is
even?
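Both conditional-probability examples can be verified by enumeration: with equally likely outcomes, P(B | A) = P(A ∩ B)/P(A) reduces to a ratio of counts. A Python sketch (the helper `cond_prob` is ours, not a library function):

```python
from itertools import product
from fractions import Fraction

def cond_prob(outcomes, b, a):
    """P(B | A) = |A and B| / |A| when all outcomes are equally likely."""
    in_a = [o for o in outcomes if a(o)]
    return Fraction(sum(1 for o in in_a if b(o)), len(in_a))

# Ace given red: a standard deck has 26 red cards, 2 of which are aces.
deck = [(rank, suit) for rank in range(1, 14)
        for suit in ("hearts", "diamonds", "clubs", "spades")]
red = lambda card: card[1] in ("hearts", "diamonds")
ace = lambda card: card[0] == 1
print(cond_prob(deck, ace, red))  # 1/13

# Sum equals 8 given that the sum is even, for two dice.
dice = list(product(range(1, 7), repeat=2))
print(cond_prob(dice, lambda d: sum(d) == 8, lambda d: sum(d) % 2 == 0))  # 5/18
```

For the dice, 18 of the 36 outcomes have an even sum, and 5 of those (2+6, 3+5, 4+4, 5+3, 6+2) sum to 8.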
Example: One of two urns has 99 red balls and 1 black ball, and the other has 1 red and 1 black ball. One
ball is chosen at random from each urn, and then one of these two balls is chosen at random. What is
the probability that this ball is red?
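The urn example is a direct use of part (b) of the theorem: condition on which of the two drawn balls is selected at the final step, each with probability 1/2. A sketch in Python (names are illustrative):

```python
from fractions import Fraction

# P(red from urn 1) and P(red from urn 2), given the stated compositions.
p_red_urn1 = Fraction(99, 100)  # 99 red, 1 black
p_red_urn2 = Fraction(1, 2)     # 1 red, 1 black

# Total probability: each urn's ball is the finally chosen one with prob 1/2.
p_red = Fraction(1, 2) * p_red_urn1 + Fraction(1, 2) * p_red_urn2
print(p_red)  # 149/200
```

So the answer is 149/200 = 0.745.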
In order to calculate a conditional probability P(A | B) when we know the other conditional probability
P(B | A), a simple formula known as Bayes’ theorem is useful. Here is a statement of a general version of
Bayes’ theorem.
Theorem: Let {A1, A2, …, Am} be a partition of a sample space Ω. Let B be some fixed event with P(B) > 0. Then
P(Aj | B) = P(B | Aj) P(Aj) / [ P(B | A1) P(A1) + P(B | A2) P(A2) + ⋯ + P(B | Am) P(Am) ]
Example: Dangerous fires in America are rare (1% of evenings), but smoke is fairly common (10% of
evenings) due to barbecues, and 90% of dangerous fires make smoke. Calculate the probability of there
being a dangerous fire when there is smoke in the evening.
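Since P(smoke) = 0.10 is given directly, Bayes' theorem here is simply P(fire | smoke) = P(smoke | fire) P(fire) / P(smoke). A one-line check in Python (variable names are ours):

```python
p_fire = 0.01             # P(dangerous fire on an evening)
p_smoke = 0.10            # P(smoke on an evening)
p_smoke_given_fire = 0.90 # P(smoke | dangerous fire)

# Bayes' theorem: P(fire | smoke) = P(smoke | fire) P(fire) / P(smoke).
p_fire_given_smoke = p_smoke_given_fire * p_fire / p_smoke
print(round(p_fire_given_smoke, 4))  # 0.09
```

So even when smoke is seen, the probability of a dangerous fire is only 9%.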
Example: Suppose that the questions in a multiple-choice exam have five alternatives each, of which a
student has to pick one as the correct alternative. A student either knows the truly correct alternative
with probability 0.7, or she randomly picks one of the five alternatives as her choice. Suppose a
particular problem was answered correctly. We want to know what the probability is that the student
really knew the correct answer.
Example: A random student wants to know if they have Covid. There is a test for Covid, but the test is not always right. For people who really do have Covid, the test says "Yes" 80% of the time. For people who do not have Covid, the test says "Yes" 10% of the time. If 1% of the population has Covid, and the student's test says "Yes", what are the chances that the student really has Covid?
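Here P("Yes") is not given directly, so the denominator comes from the total probability formula over the partition {Covid, no Covid}. A Python sketch (names are ours):

```python
p_covid = 0.01           # prior: 1% of the population has Covid
p_yes_given_covid = 0.80 # test sensitivity
p_yes_given_no = 0.10    # false-positive rate

# Total probability: P(Yes) = P(Yes | Covid) P(Covid) + P(Yes | no Covid) P(no Covid).
p_yes = p_yes_given_covid * p_covid + p_yes_given_no * (1 - p_covid)

# Bayes' theorem: P(Covid | Yes).
p_covid_given_yes = p_yes_given_covid * p_covid / p_yes
print(round(p_covid_given_yes, 4))  # about 0.0748
```

Despite the positive test, the probability of actually having Covid is only about 7.5%, because the disease is rare and false positives are common.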
Example: In a jury trial, let’s assume that the probability the defendant is convicted, given they are guilty,
is 82%, and that the probability that the defendant is acquitted, given innocence, is 80%. Suppose that
85% of all defendants are indeed guilty. Now, suppose a particular defendant is convicted of a crime.
Find the probability they are innocent.
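The jury example is another Bayes computation over the partition {guilty, innocent}; note that P(convicted | innocent) = 1 − 0.80 = 0.20. A sketch in Python (variable names are ours):

```python
p_guilty = 0.85
p_conv_given_guilty = 0.82
p_conv_given_innocent = 1 - 0.80  # complement of acquittal given innocence

# Total probability: P(convicted) over the partition {guilty, innocent}.
p_conv = (p_conv_given_guilty * p_guilty
          + p_conv_given_innocent * (1 - p_guilty))

# Bayes' theorem: P(innocent | convicted).
p_innocent_given_conv = p_conv_given_innocent * (1 - p_guilty) / p_conv
print(round(p_innocent_given_conv, 4))  # about 0.0413
```

So roughly 4.1% of convicted defendants are innocent under these assumptions.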