CSE3013 Module 5

"Artificial Intelligence"

Uncertainty and Knowledge Reasoning

Dr. Rabindra Kumar Singh


Associate Professor (Sr.)
School of Computer Science and Engineering
VIT - Chennai



Introduction

What is Uncertainty?
It is a state of
doubt about the future, or
about what is the right thing to do.

Examples:
A doctor does not know exactly what is going on inside a patient.
A teacher does not know exactly what a student understands.
A robot does not know what is in a room it left a few minutes ago.
The term of office of a politician.
The way of life in a coastal region.



Uncertainty Knowledge Representation

We used FOL and propositional logic to represent knowledge with
confidence (certainty); thus we were assured of the predicates.

Example: Given A → B, which means that if A is true, then B must be true.

But consider a situation where we are not sure whether A is true or not;
then we cannot express this statement. This situation is called uncertainty.

So to represent uncertain knowledge, where we are not sure about the
predicates, we need uncertain reasoning or probabilistic reasoning.

Causes of uncertainty:
Information obtained from unreliable sources.
Experimental errors or equipment faults.
Temperature variation or climate change.



"Artificial Intelligence"
Probabilistic reasoning

Dr. Rabindra Kumar Singh


Associate Professor (Sr.)
School of Computer Science and Engineering
VIT - Chennai



Probabilistic Reasoning I

It is a way of representing knowledge in which we apply the concept of
probability to indicate the uncertainty in knowledge.
Here, we combine probability theory with logic to handle uncertainty.

In the real world, there are many scenarios where the certainty of something
is not confirmed, such as:
"It will rain today."
"The behavior of someone in some situation."
"A match between two teams or two players."
These are probable sentences: we can assume they will happen, but we are
not sure about them, so here we use probabilistic reasoning.

Need for probabilistic reasoning in AI:
When there are unpredictable outcomes.
When the specifications or possibilities of predicates become too large to
handle.
When an unknown error occurs during an experiment.



Probabilistic Reasoning II

In probabilistic reasoning, there are two ways to solve problems with
uncertain knowledge:
Bayes' rule
Bayesian statistics

Probability?
It is the chance that an uncertain event will occur.
It is a numerical measure, between 0 and 1, of the likelihood that an event
will occur.

1. 0 ≤ P(A) ≤ 1, where P(A) is the probability of an event A.
2. P(A) = 0 indicates that event A is impossible (i.e., P(¬A) = 1).
3. P(A) = 1 indicates total certainty in an event A.
From 2 and 3: P(A) + P(¬A) = 1.

We can find the probability of an uncertain event by using the formula below.

    P(Occurrence) = Number of desired outcomes / Total number of outcomes
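
To make the formula concrete, here is a minimal Python sketch (the die
example and the helper name probability are illustrative, not from the
slides):

# Sketch: estimating the probability of an event by counting outcomes.
from fractions import Fraction

def probability(desired_outcomes: int, total_outcomes: int) -> Fraction:
    """P(occurrence) = desired outcomes / total outcomes."""
    return Fraction(desired_outcomes, total_outcomes)

# Example: probability of rolling an even number with a fair six-sided die.
p_even = probability(3, 6)   # {2, 4, 6} out of {1, ..., 6}
p_not_even = 1 - p_even      # P(A) + P(not A) = 1
print(p_even, p_not_even)    # 1/2 1/2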



Common Terms

Event: Each possible outcome of a variable is called an event.

Sample space: The collection of all possible events is called the sample space.
Random variables: Random variables are used to represent the events and
objects in the real world.
Prior probability: The prior probability of an event is the probability
computed before observing new information.
Posterior probability: The probability that is calculated after all evidence
or information has been taken into account. It is a combination of the prior
probability and the new information.



Conditional probability I

It is the probability of an event occurring given that another event has
already happened.

Suppose we want to calculate the probability of event A when event B has
already occurred, "the probability of A under the condition B". It can be
written as:

    P(A|B) = P(A ∧ B) / P(B)

where P(A ∧ B) = joint probability of A and B, and
P(B) = marginal probability of B.

If the probability of A is given and we need to find the probability of B,
then it is given as:

    P(B|A) = P(A ∧ B) / P(A)



Conditional probability II

This can be explained using a Venn diagram: once event B has occurred, the
sample space is reduced to set B, and we can calculate event A given that B
has occurred by dividing P(A ∧ B) by P(B).

Example: In a class, 70% of the students like English and 40% of the
students like both English and mathematics. What percentage of students who
like English also like mathematics?

Solution: Let A be the event that a student likes mathematics and B be the
event that a student likes English.

    P(A|B) = P(A ∧ B) / P(B) = 0.4 / 0.7 ≈ 57%

Hence, 57% of the students who like English also like mathematics.
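
The slide's computation can be checked with a few lines of Python (a sketch;
the helper name conditional_probability is my own):

# Sketch: conditional probability P(A|B) = P(A and B) / P(B).
def conditional_probability(p_a_and_b: float, p_b: float) -> float:
    if p_b == 0:
        raise ValueError("P(B) must be positive")
    return p_a_and_b / p_b

# Example from the slide: P(likes Maths | likes English).
print(round(conditional_probability(0.4, 0.7), 2))  # 0.57, i.e. ~57%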
Bayes’ theorem

It is also known as Bayes' rule, Bayes' law, or Bayesian reasoning, and it
determines the probability of an event with uncertain knowledge.
It relates the conditional and marginal probabilities of two random events.
It was named after the British mathematician Thomas Bayes. Bayesian
inference is an application of Bayes' theorem, which is fundamental to
Bayesian statistics.
It is a way to calculate the value of P(B|A) with the knowledge of P(A|B).
It allows updating the probability prediction of an event by observing new
information from the real world.



Bayes’ theorem I

Example: If cancer corresponds to one's age, then by using Bayes' theorem we
can determine the probability of cancer more accurately with the help of age.

Bayes' theorem can be derived using the product rule and the conditional
probability of event A with known event B:

From the product rule we can write:
    P(A ∧ B) = P(A|B) P(B)                                   (1)
Similarly, for the probability of event B with known event A:
    P(A ∧ B) = P(B|A) P(A)                                   (2)
Equating the right-hand sides of both equations, we get:
    P(A|B) = P(B|A) P(A) / P(B)                              (3)

Equation (3) is called Bayes' rule or Bayes' theorem. This equation is the
basis of most modern AI systems for probabilistic inference.



Bayes’ theorem II

It shows the simple relationship between joint and conditional probabilities.

Here,
P(A|B) is known as the posterior, which we need to calculate; it is read as
the probability of hypothesis A given that we have observed evidence B.
P(B|A) is called the likelihood: assuming the hypothesis is true, we
calculate the probability of the evidence.
P(A) is called the prior probability: the probability of the hypothesis
before considering the evidence.
P(B) is called the marginal probability: the pure probability of the evidence.
In general, P(B) = Σi P(Ai) P(B|Ai), hence Bayes' rule can be written as:

    P(Ai|B) = P(Ai) P(B|Ai) / Σ(k=1..n) P(Ak) P(B|Ak)        (4)

where A1, A2, A3, ..., An is a set of mutually exclusive and exhaustive events.
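
A minimal Python sketch of equation (4); the two-hypothesis numbers below
are illustrative assumptions, not from the slides:

# Sketch: Bayes' rule over mutually exclusive and exhaustive hypotheses A1..An.
def bayes(priors: list[float], likelihoods: list[float], i: int) -> float:
    """P(Ai | B) = P(Ai) P(B|Ai) / sum_k P(Ak) P(B|Ak)."""
    evidence = sum(p * l for p, l in zip(priors, likelihoods))
    return priors[i] * likelihoods[i] / evidence

# Illustrative example: P(A1)=0.4, P(A2)=0.6, P(B|A1)=0.9, P(B|A2)=0.2.
print(bayes([0.4, 0.6], [0.9, 0.2], 0))  # posterior of A1 given B = 0.75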



Applying Bayes’ Rule

Bayes' rule allows us to compute the single term P(B|A) in terms of P(A|B),
P(B), and P(A).
This is very useful in cases where we have good estimates of these three
terms and want to determine the fourth one.
Suppose we want to perceive the effect of some unknown cause and want to
compute that cause; then Bayes' rule becomes:

    P(cause|effect) = P(effect|cause) P(cause) / P(effect)



Example-1

What is the probability that a patient has the disease meningitis given a
stiff neck?

Given data:

A doctor is aware that the disease meningitis causes a patient to have a
stiff neck 80% of the time. He is also aware of some more facts, which are
given as follows:
The known probability that a patient has meningitis is 1/30,000.
The known probability that a patient has a stiff neck is 2%.

Let a be the proposition that the patient has a stiff neck and b the
proposition that the patient has meningitis. Then we can calculate the
following:

P(a|b) = 0.8
P(b) = 1/30000
P(a) = 0.02

    P(b|a) = P(a|b) P(b) / P(a) = 0.8 × (1/30000) / 0.02 ≈ 0.00133

Hence, we can assume that about 1 patient out of 750 patients with a stiff
neck has meningitis.
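
The same computation in Python (a sketch; variable names are my own):

# Sketch: Bayes' rule for the meningitis example.
p_stiff_given_men = 0.8    # P(a|b)
p_men = 1 / 30000          # P(b)
p_stiff = 0.02             # P(a)

p_men_given_stiff = p_stiff_given_men * p_men / p_stiff
print(p_men_given_stiff)    # ~0.001333, i.e. about 1 in 750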
Example-2

From a standard deck of playing cards, a single card is drawn. The
probability that the card is a king is 4/52. Calculate the posterior
probability P(King|Face), i.e., the probability that a drawn face card is a
king.

Solution:
    P(king|face) = P(face|king) × P(king) / P(face)          (1)

P(king): probability that the card is a king = 4/52 = 1/13
P(face): probability that the card is a face card = 12/52 = 3/13
P(face|king): probability that the card is a face card given it is a king = 1

    P(king|face) = (1 × 1/13) / (3/13) = 1/3                 (2)

So 1/3 is the probability that the drawn face card is a king.



Example-3

Consider two events A and B in a sample space S of eight trials:

A : T F T T F T T F  ⇒  P(A) = 5/8
B : F T T F T F T F  ⇒  P(B) = 4/8 = 1/2

Case 1: Suppose P(B|A) = 2/5. Then the probability of event A occurring
given that event B occurs is:

    P(A|B) = P(B|A) P(A) / P(B) = (2/5 × 5/8) / (4/8) = (2/8) / (4/8) = 1/2

Case 2: Suppose P(A|B) = 1/2. Then the probability of event B occurring
given that event A occurs is:

    P(B|A) = P(A|B) P(B) / P(A) = (1/2 × 4/8) / (5/8) = (2/8) / (5/8) = 2/5
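
Both cases can be verified exactly with Python's fractions module (a sketch):

# Sketch: verifying both cases of Example-3 with exact fractions.
from fractions import Fraction as F

p_a, p_b = F(5, 8), F(4, 8)

# Case 1: given P(B|A) = 2/5, compute P(A|B).
p_b_given_a = F(2, 5)
print(p_b_given_a * p_a / p_b)   # 1/2

# Case 2: given P(A|B) = 1/2, compute P(B|A).
p_a_given_b = F(1, 2)
print(p_a_given_b * p_b / p_a)   # 2/5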



Example-4

Problem:

Consider a boy who has a volleyball tournament the next day, but today he
feels sick. It is unusual: there is only a 40% chance he would fall sick,
since he is a healthy boy. The boy is very interested in volleyball, so
there is a 90% probability that he would participate in the tournament, and
a 20% probability that he falls sick given that he participates. Find the
probability that the boy participates in the tournament given that he is
sick.

Solution: Let
A be the event that the boy participates in the tournament: P(A) = 90%,
B be the event that the boy is sick: P(B) = 40%, and
P(B|A) = P(boy is sick | boy participates in the tournament) = 20%.
Then P(boy participates in the tournament | boy is sick) is:

    P(A|B) = P(B|A) P(A) / P(B) = (0.2 × 0.9) / 0.4 = 0.45



Example-5 I

Consider the table below:

Sl.No  CGPA  Interactiveness  Practical Knowledge  Communication Skills  Job Offer
1      ≥ 9   Yes              Very Good            Good                  Yes
2      ≥ 8   No               Good                 Moderate              Yes
3      ≥ 9   No               Average              Poor                  No
4      < 8   No               Average              Good                  No
5      ≥ 8   Yes              Good                 Moderate              Yes
6      ≥ 9   Yes              Good                 Moderate              Yes
7      < 8   Yes              Good                 Poor                  No
8      ≥ 9   No               Very Good            Good                  Yes
9      ≥ 8   Yes              Good                 Good                  Yes
10     ≥ 8   Yes              Average              Good                  Yes

The given data set consists of 10 data instances with attributes 'CGPA',
'Interactiveness', 'Communication Skills', and 'Practical Knowledge', and
the target variable 'Job Offer', which is classified as Yes or No for a
student.



Example-5 II

Step-1: Compute the prior probability for the target feature 'Job Offer'.
It has 2 classes, Yes/No; it is a binary classification problem.

Given a student instance, we need to classify whether 'Job Offer' = Yes or No.

From the given data set, the number of instances with Job Offer = 'Yes' is 7
and with Job Offer = 'No' is 3.

∴ P(J='Yes') = 7/10 and P(J='No') = 3/10.

Step-2: Compute the likelihood probability for each feature:

1 CGPA:

Table: Frequency Matrix of CGPA

Sl.No  CGPA  Job Offer = 'Yes'  Job Offer = 'No'
1      ≥ 9   3                  1
2      ≥ 8   4                  0
3      < 8   0                  2
Total        7                  3



Example-5 III

Table: Likelihood Probability of CGPA (C)

Sl.No  CGPA  P(Job Offer = 'Yes')  P(Job Offer = 'No')
1      ≥ 9   P(C|J) = 3/7          P(C|J) = 1/3
2      ≥ 8   P(C|J) = 4/7          P(C|J) = 0/3
3      < 8   P(C|J) = 0/7          P(C|J) = 2/3

2 Interactiveness:

Table: Frequency Matrix of Interactiveness (I)

Sl.No  Interactiveness  Job Offer = 'Yes'  Job Offer = 'No'
1      Yes              5                  1
2      No               2                  2
Total                   7                  3

Table: Likelihood Probability of Interactiveness (I)

Sl.No  Interactiveness  P(Job Offer = 'Yes')  P(Job Offer = 'No')
1      Yes              P(I|J) = 5/7          P(I|J) = 1/3
2      No               P(I|J) = 2/7          P(I|J) = 2/3



Example-5 IV

3 Practical Knowledge (Pk):

Table: Frequency Matrix of Practical Knowledge (Pk)

Sl.No  Practical Knowledge  Job Offer = 'Yes'  Job Offer = 'No'
1      Very Good            2                  0
2      Average              1                  2
3      Good                 4                  1
Total                       7                  3

Table: Likelihood Probability of Practical Knowledge (Pk)

Sl.No  Practical Knowledge  P(Job Offer = 'Yes')  P(Job Offer = 'No')
1      Very Good            P(Pk|J) = 2/7         P(Pk|J) = 0/3
2      Average              P(Pk|J) = 1/7         P(Pk|J) = 2/3
3      Good                 P(Pk|J) = 4/7         P(Pk|J) = 1/3



Example-5 V

4 Communication Skills (Cs):

Table: Frequency Matrix of Communication Skills (Cs)

Sl.No  Communication Skills  Job Offer = 'Yes'  Job Offer = 'No'
1      Good                  4                  1
2      Moderate              3                  0
3      Poor                  0                  2
Total                        7                  3

Table: Likelihood Probability of Communication Skills (Cs)

Sl.No  Communication Skills  P(Job Offer = 'Yes')  P(Job Offer = 'No')
1      Good                  P(Cs|J) = 4/7         P(Cs|J) = 1/3
2      Moderate              P(Cs|J) = 3/7         P(Cs|J) = 0/3
3      Poor                  P(Cs|J) = 0/7         P(Cs|J) = 2/3



Example-5 VI

Step-3: Test data - P(CGPA ≥ 9, I = 'Yes', Pk = 'Average', Cs = 'Good') = ?
(i.e., does the student get a job offer or not?)

Applying Bayes' theorem:

Case-1: P(J='Yes' | Test Data) = P(Test Data | J='Yes') × P(J='Yes') / P(Test Data)
Here, we can ignore P(Test Data) in the denominator since it is common to
all cases considered.
= P(C≥9|J='Y') × P(I='Y'|J='Y') × P(Pk='Avg'|J='Y') × P(Cs='Gd'|J='Y') × P(J='Y')
= 3/7 × 5/7 × 1/7 × 4/7 × 7/10 = 0.0175

Case-2: P(J='No' | Test Data) = P(Test Data | J='No') × P(J='No') / P(Test Data)
Again, we ignore the common denominator P(Test Data).
= P(C≥9|J='N') × P(I='Y'|J='N') × P(Pk='Avg'|J='N') × P(Cs='Gd'|J='N') × P(J='N')
= 1/3 × 1/3 × 2/3 × 1/3 × 3/10 ≈ 0.0074

Since 0.0175 > 0.0074, the classifier predicts Job Offer = 'Yes' for this
student.
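
The whole Step-3 computation can be reproduced with a short Python sketch;
the counts come from the frequency matrices above, while the variable names
are my own:

# Sketch of the Naive Bayes computation in Step-3.
from fractions import Fraction as F

# Likelihoods per class, in order: CGPA>=9, I=Yes, Pk=Average, Cs=Good.
likelihoods_yes = [F(3, 7), F(5, 7), F(1, 7), F(4, 7)]
likelihoods_no  = [F(1, 3), F(1, 3), F(2, 3), F(1, 3)]
prior_yes, prior_no = F(7, 10), F(3, 10)

def score(likelihoods, prior):
    """Unnormalized posterior: product of the likelihoods times the prior."""
    result = prior
    for l in likelihoods:
        result *= l
    return result

s_yes = score(likelihoods_yes, prior_yes)   # ~0.0175
s_no = score(likelihoods_no, prior_no)      # ~0.0074
print(float(s_yes), float(s_no))
print("Job Offer =", "Yes" if s_yes > s_no else "No")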



Example-6

Take a real-time example of predicting a student's result using the Naive
Bayes algorithm. The training data set T consists of 8 data instances with
attributes 'Assessment', 'Assignment', 'Project', and 'Seminar', as shown in
the table below. The target variable is Result, which is classified as Pass
or Fail for a candidate student.

Given the test data (Assessment = Average, Assignment = Yes, Project = No,
Seminar = Good), predict the result of the student.

S.No.  Assessment  Assignment  Project  Seminar  Result
1.     Good        Yes         Yes      Good     Pass
2.     Average     Yes         No       Poor     Fail
3.     Good        No          Yes      Good     Pass
4.     Average     No          No       Poor     Fail
5.     Average     No          Yes      Good     Pass
6.     Good        No          No       Poor     Pass
7.     Average     Yes         Yes      Good     Fail
8.     Good        Yes         Yes      Poor     Pass



Example-7

Consider an example of predicting a student's result using Gaussian Naive
Bayes with continuous attributes. The training data set T consists of 10
data instances with attributes 'Assessment Marks', 'Assignment Marks', and
'Seminar Done', as shown in the table below. The target variable is Result,
which is classified as Pass or Fail for a candidate student.

Given the test data (Assessment Marks = 75, Assignment Marks = 6, Seminar
Done = Good), predict the result of the student.

S.No.  Assessment Marks  Assignment Marks  Seminar Done  Result
1.     95                8                 Good          Pass
2.     71                5                 Poor          Fail
3.     93                9                 Good          Pass
4.     62                4                 Poor          Fail
5.     81                9                 Good          Pass
6.     93                8.5               Poor          Pass
7.     65                9                 Good          Pass
8.     45                3                 Poor          Fail
9.     78                8.5               Good          Pass
10.    56                4                 Poor          Fail
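
As a hedged illustration of how Gaussian Naive Bayes would handle this test
instance, here is a Python sketch: the continuous attributes are modelled
with per-class Gaussians estimated from the table, 'Seminar Done' is
treated as categorical, and the function names are my own (one reasonable
treatment, not necessarily the textbook's exact procedure):

# Sketch: Gaussian Naive Bayes for Example-7.
import math

def gaussian(x, mean, var):
    """Gaussian density N(x; mean, var)."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def mean_var(xs):
    m = sum(xs) / len(xs)
    v = sum((x - m) ** 2 for x in xs) / (len(xs) - 1)  # sample variance
    return m, v

# Per-class training values, read off the table above.
pass_assess, pass_assign = [95, 93, 81, 93, 65, 78], [8, 9, 9, 8.5, 9, 8.5]
fail_assess, fail_assign = [71, 62, 45, 56], [5, 4, 3, 4]

def class_score(assess, assign, seminar, marks_a, marks_b, seminar_counts, prior):
    ma, va = mean_var(marks_a)
    mb, vb = mean_var(marks_b)
    p_seminar = seminar_counts[seminar] / sum(seminar_counts.values())
    return gaussian(assess, ma, va) * gaussian(assign, mb, vb) * p_seminar * prior

# Seminar Done counts per class: Pass -> {Good: 5, Poor: 1}, Fail -> {Good: 0, Poor: 4}.
s_pass = class_score(75, 6, "Good", pass_assess, pass_assign, {"Good": 5, "Poor": 1}, 6/10)
s_fail = class_score(75, 6, "Good", fail_assess, fail_assign, {"Good": 0, "Poor": 4}, 4/10)
print("Pass" if s_pass > s_fail else "Fail")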



Applications

It is used to calculate the next step of a robot when the previously
executed step is given.
Bayes' theorem is helpful in weather forecasting.
It can solve the Monty Hall problem.



Bayesian Belief Network

It is a key technology for dealing with probabilistic events and for solving
problems that involve uncertainty.
"It is a probabilistic graphical model which represents a set of variables
and their conditional dependencies using a directed acyclic graph."
It is also called a Bayes network, belief network, decision network, or
Bayesian model.
Bayesian networks are probabilistic because they are built from a
probability distribution and also use probability theory for prediction and
anomaly detection.
They can be used for building models from data and expert opinions, and they
consist of two parts:
a directed acyclic graph, and
a table of conditional probabilities.
The generalized form of a Bayesian network that represents and solves
decision problems under uncertain knowledge is known as an influence
diagram.



Bayesian Network Graph I

It is a directed acyclic graph (DAG) made up of nodes and arcs, where:

Each node corresponds to a random variable, and a variable can be
continuous or discrete.
Arcs or directed arrows represent the causal relationships or conditional
probabilities between random variables. These directed links connect pairs
of nodes in the graph. A link means that one node directly influences the
other node; if there is no directed link, the nodes are independent of each
other.
For example, in a graph with random variables A, B, C, and D represented by
nodes:
If node B is connected to node A by a directed arrow from A to B, then node
A is called the parent of node B.
Node C is independent of node A.



Bayesian Network Graph II

The Bayesian network has mainly two components:

a causal component, and
the actual numbers.
Each node in the Bayesian network has a conditional probability distribution
P(Xi | Parent(Xi)), which determines the effect of the parent on that node.
It is based on the joint probability distribution and conditional probability.

Joint probability distribution: The probabilities of the different
combinations of the variables x1, x2, x3, ..., xn are known as the joint
probability distribution.

P[x1, x2, x3, ..., xn] can be written, using the chain rule, as

    = P[x1 | x2, x3, ..., xn] P[x2, x3, ..., xn]
    = P[x1 | x2, x3, ..., xn] P[x2 | x3, ..., xn] ... P[xn-1 | xn] P[xn]

In general, for every variable Xi we can write the equation as:

    P(Xi | Xi-1, ..., X1) = P(Xi | Parents(Xi))


Explanation of Bayesian network I

Let's understand the Bayesian network through an example by creating a
directed acyclic graph:

Example
Harry installed a new burglar alarm at his home to detect burglary. The
alarm responds reliably to a burglary but also responds to minor
earthquakes. Harry has two neighbors, David and Sophia, who have taken
responsibility for informing Harry at work when they hear the alarm. David
always calls Harry when he hears the alarm, but sometimes he confuses the
phone ringing with the alarm and calls then too. On the other hand, Sophia
likes to listen to loud music, so sometimes she fails to hear the alarm.
Here we would like to compute the probability of the burglar alarm.

Problem: Calculate the probability that the alarm has sounded but neither a
burglary nor an earthquake has occurred, and both David and Sophia called
Harry.

Solution:
The network structure shows that Burglary and Earthquake are the parent
nodes of Alarm and directly affect the probability of the alarm going off,
while David's and Sophia's calls depend on the alarm probability.



Explanation of Bayesian network II

The network represents that our assumptions do not directly perceive the
burglary, do not notice the minor earthquake, and that the neighbors do not
confer before calling.
The conditional distribution for each node is given as a conditional
probability table, or CPT.
Each row in a CPT must sum to 1, because the entries in the row represent
an exhaustive set of cases for the variable.
In a CPT, a boolean variable with k boolean parents contains 2^k
probabilities. Hence, if there are two parents, the CPT will contain 4
probability values.

List of events:

Burglary (B)
Earthquake (E)
Alarm (A)
David calls (D)
Sophia calls (S)



Explanation of Bayesian network III

Writing the events in terms of probability, P[D, S, A, B, E], we can rewrite
the probability statement using the joint probability distribution:

P[D, S, A, B, E] = P[D | S, A, B, E] P[S, A, B, E]
                 = P[D | S, A, B, E] P[S | A, B, E] P[A, B, E]
                 = P[D | A] P[S | A, B, E] P[A, B, E]
                 = P[D | A] P[S | A] P[A | B, E] P[B, E]
                 = P[D | A] P[S | A] P[A | B, E] P[B | E] P[E]



Explanation of Bayesian network IV

Let's take the observed probabilities for the Burglary and Earthquake
components:

P(B=True) = 0.002, the probability of a burglary.
P(B=False) = 0.998, the probability of no burglary.
P(E=True) = 0.001, the probability of a minor earthquake.
P(E=False) = 0.999, the probability that no earthquake occurred.

Table: CPT for Alarm (A), which depends on B and E

B      E      P(A=True)  P(A=False)
True   True   0.94       0.06
True   False  0.95       0.05
False  True   0.31       0.69
False  False  0.001      0.999

Table: CPT for David's call, which depends on A

A      P(D=True)  P(D=False)
True   0.91       0.09
False  0.05       0.95

Table: CPT for Sophia's call, which depends on A

A      P(S=True)  P(S=False)
True   0.75       0.25
False  0.02       0.98



Explanation of Bayesian network V

From the formula for the joint distribution, we can write the problem
statement in the form of a probability distribution:

P(S, D, A, ¬B, ¬E) = P(S|A) × P(D|A) × P(A|¬B ∧ ¬E) × P(¬B) × P(¬E)
                   = 0.75 × 0.91 × 0.001 × 0.998 × 0.999
                   = 0.00068045

Hence, a Bayesian network can answer any query about the domain by using
the joint distribution.
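
As a check, the CPTs above can be encoded as Python dictionaries and this
joint probability evaluated directly (a sketch; the representation is my
own choice):

# Sketch: evaluating P(S, D, A, ~B, ~E) from the CPTs of the burglary network.
p_b = {True: 0.002, False: 0.998}                    # P(Burglary)
p_e = {True: 0.001, False: 0.999}                    # P(Earthquake)
p_a = {(True, True): 0.94, (True, False): 0.95,      # P(Alarm=True | B, E)
       (False, True): 0.31, (False, False): 0.001}
p_d = {True: 0.91, False: 0.05}                      # P(David=True | A)
p_s = {True: 0.75, False: 0.02}                      # P(Sophia=True | A)

def joint(d, s, a, b, e):
    """P(D=d, S=s, A=a, B=b, E=e) via the chain-rule factorization."""
    pa = p_a[(b, e)] if a else 1 - p_a[(b, e)]
    pd = p_d[a] if d else 1 - p_d[a]
    ps = p_s[a] if s else 1 - p_s[a]
    return pd * ps * pa * p_b[b] * p_e[e]

print(joint(d=True, s=True, a=True, b=False, e=False))  # ~0.00068045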

The semantics of a Bayesian network can be understood in two ways:

As a representation of the joint probability distribution. (This is
helpful in understanding how to construct the network.)
As an encoding of a collection of conditional independence statements.
(This is helpful in designing inference procedures.)



Problems to Solve...! I

Problem-1
Three people A, B, and C have applied for a job at a private company. The
likelihood of their selection is in the ratio 1:2:4. The probabilities that
A, B, and C can implement changes to increase the company's profitability
are, respectively, 0.8, 0.5, and 0.3. If the change does not occur,
determine the probability that the appointment of C is responsible.

Problem-2
Four balls are in a bag. Without replacement, two balls are picked at
random, and it is found that they are blue. What is the probability that
every ball in the bag is blue?

Problem-3
In one community, 90% of the children were sick with the flu and 10% with
measles, and no other illnesses were present. The likelihood of observing
rashes is 0.95 for measles and 0.08 for the flu. Determine the likelihood
that a child who develops rashes has the flu.



Problems to Solve...! II

Problem-4
There are three similar cards, except that the first card is red on both
sides, the second is blue on both sides, and the third has one red side and
one blue side. One of these three cards is picked at random and placed on
the table with its red side visible. What is the probability that the other
side is blue?

Problem-5
There are three jars containing white and black balls: the first contains
three white and two black balls, the second has two white and three black
balls, and the third has four white and one black ball. One jar is selected
at random, without any bias, and one white ball is drawn from it. What is
the probability that it came from the third jar?



"Artificial Intelligence"
Decision Networks

Dr. Rabindra Kumar Singh


Associate Professor (Sr.)
School of Computer Science and Engineering
VIT - Chennai



Introduction to Decision Network

A decision network (also called an influence diagram) is a graphical
representation of a finite sequential decision problem.

It extends belief networks to include decision variables and utility.

It extends the single-stage decision network to allow for sequential
decisions, and allows both chance nodes and decision nodes to be parents of
decision nodes.

It is a directed acyclic graph (DAG) with chance nodes (drawn as ovals),
decision nodes (drawn as rectangles), and a utility node (drawn as a
diamond). The meaning of the arcs is:
Arcs coming into decision nodes represent the information that will be
available when the decision is made.
Arcs coming into chance nodes represent probabilistic dependence.
Arcs coming into the utility node represent what the utility depends on.



Example - Decision Network I

Example-1: The figure below shows a simple decision network for the decision
of whether the agent should take an umbrella when it goes out.

The agent's utility depends on the weather and on whether it takes an
umbrella.

The agent does not get to observe the weather; it only observes the
forecast. The forecast probabilistically depends on the weather.

Figure: Decision network for the decision of whether to take an umbrella



Example - Decision Network II

As part of this network, the designer must specify the domain for each
random variable and the domain for each decision variable.
1 The random variables are:
    Weather {norain, rain}
    Forecast {sunny, cloudy, rainy}
2 The decision variable is:
    Umbrella {take_it, leave_it}
There is no domain associated with the utility node.
The designer also must specify the probability of the random variables given
their parents.

Suppose P(Weather) is defined by P(Weather = rain) = 0.3, and
P(Forecast | Weather) is given by a conditional probability table.



Example - Decision Network III

Suppose also that the utility function u(Weather, Umbrella) is specified as
a table.

There is no table specified for the Umbrella decision variable. It is the
task of the planner to determine which value of Umbrella to select, as a
function of the forecast, as in the sketch below.
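
To make the planner's task concrete, here is a minimal Python sketch that
picks the utility-maximizing decision for each forecast. Only
P(Weather = rain) = 0.3 is given above; the forecast CPT and utility values
below are assumed for illustration:

# Sketch: optimal umbrella policy by expected utility, one decision per forecast.
# All numbers except P(rain) = 0.3 are illustrative assumptions.
p_weather = {"rain": 0.3, "norain": 0.7}

# Assumed P(Forecast | Weather).
p_forecast = {
    "norain": {"sunny": 0.7, "cloudy": 0.2, "rainy": 0.1},
    "rain":   {"sunny": 0.15, "cloudy": 0.25, "rainy": 0.6},
}

# Assumed utility u(Weather, Umbrella).
utility = {
    ("norain", "take_it"): 20, ("norain", "leave_it"): 100,
    ("rain", "take_it"): 70,   ("rain", "leave_it"): 0,
}

for forecast in ["sunny", "cloudy", "rainy"]:
    # P(Weather | Forecast) via Bayes' rule.
    joint = {w: p_weather[w] * p_forecast[w][forecast] for w in p_weather}
    total = sum(joint.values())
    posterior = {w: joint[w] / total for w in joint}
    # Expected utility of each decision given the forecast.
    eu = {d: sum(posterior[w] * utility[(w, d)] for w in posterior)
          for d in ["take_it", "leave_it"]}
    best = max(eu, key=eu.get)
    print(forecast, best, round(eu[best], 1))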



Example - Decision Network IV

Example-2: Consider a simple case of diagnosis where a doctor first gets to
choose some tests and then gets to treat the patient, taking into account
the outcome of the tests. The reason the doctor may decide to do a test is
that some information (the test results) will be available at the next
stage, when treatment may be performed. The test results are information
that is available when the treatment is decided, but not when the test is
decided. It is often a good idea to test, even if testing itself can harm
the patient.

Figure: Decision network for diagnosis



Example - Decision Network V

Example-3: Decision network for an alarm. The agent can receive a report of
people leaving a building and has to decide whether or not to call the fire
department. Before calling, the agent can check for smoke, but this has some
cost associated with it. The utility depends on whether it calls, whether
there is a fire, and the cost associated with checking for smoke.

Figure: Decision network for the alarm problem

