AI Digital Notes
ON
ARTIFICIAL INTELLIGENCE
B.TECH II YR / II SEM
(2021-22)
DEPARTMENT OF AI & ML
MALLA REDDY UNIVERSITY
( UGC, Govt. of India)
Recognized under 2(f) and 12 (B) of UGC ACT 1956
Maisammaguda, Dhulapally (Post Via. Hakimpet), Secunderabad – 500100, Telangana State, India
Artificial Intelligence
Course Objectives
UNIT- I
UNIT-II
Beyond Classical Search: Hill-climbing search, simulated annealing search, Local
Search in Continuous Spaces, Searching with Non-Deterministic Actions,
Searching with Partial Observations, Online Search Agents and Unknown
Environment.
UNIT-IV
First-Order Logic: Representation, Syntax and Semantics of First-Order Logic, Uses of First-
Order Logic, Knowledge Engineering in First-Order Logic.
Inference in First-Order Logic: Propositional vs. First-Order Inference, Unification and Lifting,
Forward Chaining, Backward Chaining, Resolution.
UNIT-V
Knowledge Representation: Ontological Engineering, Categories and Objects, Events. Mental
Events and Mental Objects, Reasoning Systems for Categories, Reasoning with Default
Information.
Quantifying Uncertainty: Acting under Uncertainty, Basic Probability Notation, Inference Using
Full Joint Distributions, Independence, Bayes’ Rule and Its Use.
COURSE OUTCOMES
1. Understand the informed and uninformed problem types and apply search strategies to solve
them.
2. Formulate difficult real-life problems in a state-space representation so as to solve them using AI
techniques like searching and game playing.
3. Design and evaluate intelligent expert models for perception and prediction from intelligent
environment.
4. Formulate valid solutions for problems involving uncertain inputs or outcomes by using
decision making techniques.
5. Demonstrate and enrich knowledge to select and apply AI tools to synthesize information and
develop models within constraints of application area.
6. Examine the issues involved in knowledge bases and reasoning systems.
Text Books:
Introduction:
Artificial Intelligence is concerned with the design of intelligence in an artificial device. The term
was coined by John McCarthy in 1956.
Intelligence is the ability to acquire, understand and apply the knowledge to achieve goals in the
world.
AI is the study of the mental faculties through the use of computational models
AI is the study of intellectual/mental processes as computational processes.
AI program will demonstrate a high level of intelligence to a degree that equals or exceeds the
intelligence required of a human in performing some task.
AI is unique, sharing borders with Mathematics, Computer Science, Philosophy,
Psychology, Biology, Cognitive Science and many others.
Although there is no clear definition of AI or even Intelligence, it can be described as an attempt
to build machines that like humans can think and act, able to learn and use knowledge to solve
problems on their own.
History of AI:
Important research that laid the groundwork for AI:
In 1957, the General Problem Solver (GPS) was demonstrated by Newell, Shaw & Simon.
In 1958, John McCarthy (MIT) invented the Lisp language.
In 1959, Arthur Samuel (IBM) wrote the first game-playing program, for checkers, to achieve
sufficient skill to challenge a world champion.
In 1963, Ivan Sutherland's MIT dissertation on Sketchpad introduced the idea of interactive
graphics into computing.
In 1966, Ross Quillian (PhD dissertation, Carnegie Inst. of Technology; now CMU) demonstrated
semantic nets
In 1967, the Dendral program (Edward Feigenbaum, Joshua Lederberg, Bruce Buchanan, Georgia
Sutherland at Stanford) was demonstrated interpreting mass spectra of organic chemical compounds.
It was the first successful knowledge-based program for scientific reasoning.
In 1967, Doug Engelbart invented the mouse at SRI
In 1969, Marvin Minsky & Seymour Papert published Perceptrons, demonstrating the limits of simple
neural nets.
In 1972, Prolog developed by Alain Colmerauer.
In the mid-1980s, neural networks became widely used with the backpropagation algorithm (first
described by Werbos in 1974).
In the 1990s, major advances in all areas of AI, with significant demonstrations in machine learning,
intelligent tutoring, case-based reasoning, multi-agent planning, scheduling, uncertain reasoning,
data mining, natural language understanding and translation, vision, virtual reality, games, and
other topics.
In 1997, Deep Blue beats the World Chess Champion Kasparov
In 2002, iRobot, founded by researchers at the MIT Artificial Intelligence Lab, introduced Roomba,
a vacuum cleaning robot. By 2006, two million had been sold.
1) Game Playing
The Deep Blue chess program beat world champion Garry Kasparov.
2) Speech Recognition
PEGASUS, a spoken language interface to American Airlines' EAASY SABRE reservation system, allows
users to obtain flight information and make reservations over the telephone. The 1990s saw
significant advances in speech recognition, so that limited systems are now successful.
3) Computer Vision
Face recognition programs in use by banks, government, etc. The ALVINN system from CMU
autonomously drove a van from Washington, D.C. to San Diego (all but 52 of 2,849 miles), averaging
63 mph day and night, and in all weather conditions. Handwriting recognition, electronics and
manufacturing inspection, photo interpretation, baggage inspection, reverse engineering to
automatically construct a 3D geometric model.
4) Expert Systems
Application-specific systems that rely on obtaining the knowledge of human experts in an area and
programming that knowledge into a system.
a. Diagnostic Systems: the MYCIN system for diagnosing bacterial infections of the blood and
suggesting treatments; the Intellipath pathology diagnosis system (AMA approved); the Pathfinder
medical diagnosis system, which suggests tests and makes diagnoses; the Whirlpool customer
assistance center.
Application of AI:
AI algorithms have attracted close attention from researchers and have been applied
successfully to solve problems in engineering. Nevertheless, for large and complex problems, AI
algorithms can consume considerable computation time due to the stochastic nature of their search
approaches.
Building AI Systems:
1) Perception
Intelligent biological systems are physically embodied in the world and experience the world
through their sensors (senses). For an autonomous vehicle, input might be images from a
camera and range information from a rangefinder. For a medical diagnosis system, perception is
the set of symptoms and test results that have been obtained and input to the system manually.
2) Reasoning
Inference, decision-making, classification from what is sensed and what the internal "model" is
of the world. Might be a neural network, logical deduction system, Hidden Markov Model
induction, heuristic searching a problem space, Bayes Network inference, genetic algorithms,
etc.
Includes areas of knowledge representation, problem solving, decision theory, planning, game
theory, machine learning, uncertainty reasoning, etc.
3) Action
Biological systems interact within their environment by actuation, speech, etc. All behavior is
centered around actions in the world. Examples include controlling the steering of a Mars rover
or autonomous vehicle, or suggesting tests and making diagnoses for a medical diagnosis
system. Includes areas of robot actuation, natural language generation, and speech synthesis.
"The automation of] activities that we "The study of the computations that
associate with human thinking, activities such make it possible to perceive, reason, and
as decision-making, problem solving, act" (Winston, 1992)
learning..."(Bellman, 1978)
c) "The art of creating machines that perform d) "A field of study that seeks to explain and
functions that require intelligence when emulate intelligent behavior in terms of
performed by people" (Kurzweil, 1990) computational processes" (Schalkoff, 1
990)
"The study of how to make computers do
things at which, at the moment, people "The branch of computer science that is
are better" (Rich and Knight, 1 99 1 ) concerned with the automation of
intelligent behavior" (Luger and
Stubblefield, 1993)
The definitions on the top, (a) and (b), are concerned with reasoning, whereas those on the bottom,
(c) and (d), address behavior. The definitions on the left, (a) and (c), measure success in terms of
human performance, and those on the right, (b) and (d), measure success against an ideal concept of
intelligence called rationality.
Intelligent Systems:
In order to design intelligent systems, it is important to categorize them into four categories
(Luger and Stubblefield, 1993; Russell and Norvig, 2003):
1. Systems that think like humans
2. Systems that think rationally
3. Systems that behave like humans
4. Systems that behave rationally
Systems that think like humans:
a. Requires a model for human cognition. Precise enough models allow simulation by
computers.
b. Focus is not just on behavior and I/O, but looks like reasoning process.
c. Goal is not just to produce human-like behavior but to produce a sequence of steps of the
reasoning process, similar to the steps followed by a human in solving the same task.
Systems that think rationally:
a. The study of mental faculties through the use of computational models; that is, the study of
computations that make it possible to perceive, reason and act.
b. Focus is on inference mechanisms that are provably correct and guarantee an optimal solution.
c. Goal is to formalize the reasoning process as a system of logical rules and procedures of
inference.
Systems that behave like humans:
a. The art of creating machines that perform functions requiring intelligence when performed by
people; that is, the study of how to make computers do things which, at the moment, people
do better.
b. Focus is on action, not on the intelligent behavior centered around the representation of the world.
c. The classic evaluation here is the Turing Test, in which an interrogator converses with a human
and a machine:
o The interrogator can communicate with the other two by teletype (to prevent the
machine from having to imitate the appearance or voice of a person)
o The machine tries to fool the interrogator to believe that it is the human, and
the person also tries to convince the interrogator that it is the human.
o If the machine succeeds in fooling the interrogator, then conclude that the
machine is intelligent.
Systems that behave rationally:
a. Tries to explain and emulate intelligent behavior in terms of computational processes; that is,
it is concerned with the automation of intelligence.
Strong AI makes the bold claim that computers can be made to think on a level (at least) equal to
humans.
Weak AI simply states that some "thinking-like" features can be added to computers to make them
more useful tools... and this has already started to happen (witness expert systems, drive-by-wire cars
and speech recognition software).
AI Problems:
Common-Place Tasks:
1. Recognizing people, objects.
2. Communicating (through natural language).
3. Navigating around obstacles on the streets.
1. Intelligent Agents:
2.1 Agents and environments:
2.1.1 Agent:
An Agent is anything that can be viewed as perceiving its environment through sensors and acting
upon that environment through actuators.
A human agent has eyes, ears, and other organs for sensors, and hands, legs, mouth, and
other body parts for actuators.
A robotic agent might have cameras and infrared range finders for sensors and various
motors for actuators.
A software agent receives keystrokes, file contents, and network packets as sensory inputs and
acts on the environment by displaying on the screen, writing files, and sending network packets.
2.1.2 Percept:
We use the term percept to refer to the agent's perceptual inputs at any given instant.
2.1.3 Percept Sequence:
An agent's percept sequence is the complete history of everything the agent has ever perceived.
2.1.5 Agent program
Internally, the agent function for an artificial agent will be implemented by an agent program. It is
important to keep these two ideas distinct. The agent function is an abstract mathematical
description; the agent program is a concrete implementation, running on the agent architecture.
To illustrate these ideas, we will use a very simple example-the vacuum-cleaner world shown in Fig
2.1.5. This particular world has just two locations: squares A and B. The vacuum agent perceives
which square it is in and whether there is dirt in the square. It can choose to move left, move right,
suck up the dirt, or do nothing. One very simple agent function is the following: if the current
square is dirty, then suck, otherwise move to the other square. A partial tabulation of this agent
function is shown in Fig 2.1.6.
Fig 2.1.6: Partial tabulation of a simple agent function for the example: vacuum-cleaner
world shown in the Fig 2.1.5
Fig 2.1.6(i): The REFLEX-VACUUM-AGENT program is invoked for each new percept (location, status) and
returns an action each time
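The agent program in Fig 2.1.6(i) can be written out directly. Below is a minimal Python sketch of it; the (location, status) percept format follows the figure, while the action names "Suck", "Right" and "Left" are illustrative assumptions.

# A minimal sketch of the REFLEX-VACUUM-AGENT program of Fig 2.1.6(i).
def reflex_vacuum_agent(percept):
    location, status = percept
    if status == "Dirty":   # if the current square is dirty, then suck
        return "Suck"
    if location == "A":     # otherwise move to the other square
        return "Right"
    return "Left"

# The program is invoked for each new percept and returns an action:
print(reflex_vacuum_agent(("A", "Dirty")))  # Suck
print(reflex_vacuum_agent(("B", "Clean")))  # Left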
Strategies of Solving Tic-Tac-Toe Game Playing
Tic-Tac-Toe is a simple and yet an interesting board game. Researchers have used various approaches to
study the Tic-Tac-Toe game. For example, Fok and Ong and Grim et al. have used artificial neural
network based strategies to play it. Citrenbaum and Yakowitz discuss games like Go-Moku,
Hex and Bridg-It which share some similarities with Tic-Tac-Toe.
Fig 1.
The board used to play the Tic-Tac-Toe game consists of 9 cells laid out in the form of a 3x3 matrix (Fig.
1). The game is played by 2 players and either of them can start. Each of the two players is assigned a
unique symbol (generally 0 and X). Each player alternately gets a turn to make a move. Making a move is
compulsory and cannot be deferred. In each move a player places the symbol assigned to him/her in a
hitherto blank cell.
Let a track be defined as any row, column or diagonal on the board. Since the board is a square
matrix with 9 cells, all rows, columns and diagonals have exactly 3 cells. It can be easily observed that
there are 3 rows, 3 columns and 2 diagonals, and hence a total of 8 tracks on the board (Fig. 1). The goal
of the game is to fill all the three cells of any track on the board with the symbol assigned to one before
the opponent does the same with the symbol assigned to him/her. At any point of the game, if
there exists a track all three of whose cells have been marked with the same symbol, then the player
to whom that symbol has been assigned wins and the game terminates. If no track has all its cells
marked with the same symbol when there are no more blank cells on the board, then the game
is drawn.
Let the priority of a cell be defined as the number of tracks passing through it. The priorities of the
nine cells on the board according to this definition are tabulated in Table 1. Alternatively, let the
priority of a track be defined as the sum of the priorities of its three cells. The priorities of the eight
tracks on the board according to this definition are tabulated in Table 2. The prioritization of the cells
and the tracks lays the foundation of the heuristics to be used in this study. These heuristics are
somewhat similar to those proposed by Rich and Knight.
Strategy 1:
Algorithm:
1. View the board vector as a ternary (base-3) number and convert it to its decimal equivalent.
2. Use the computed number as an index into Move-Table and access the vector stored there.
3. Set the board equal to that vector.
Procedure:
1) Elements of vector:
0: Empty
1: X
2: O
a) Index = the board vector interpreted as a ternary number.
b) Element = a vector which describes the most suitable move from the current board configuration.
Comments:
1. Takes a lot of space to store the Move-Table.
2. A lot of work is needed to specify all the entries in the Move-Table, and doing so is error-prone.
3. Difficult to extend.
Strategy 2:
Data Structure:
1) Use a vector, called board, as in Solution 1
2) However, elements of the vector:
2: Empty
3: X
5: O
3) Turn of move: indexed by integer
1,2,3, etc
Function Library:
1. Make2:
a) Returns a location on the game board:
IF (board[5] = 2)
RETURN 5; // the center cell
ELSE
RETURN any cell that is not at the board’s corner; // (cells 2, 4, 6, 8)
b) Let P represent X or O.
c) can_win(P):
P has already filled at least two cells on a straight line (horizontal, vertical, or
diagonal).
d) cannot_win(P) = NOT(can_win(P))
2. Posswin(P): returns 0 if player P cannot win on the next move; otherwise, returns the number of
the cell that constitutes a winning move for P.
Algorithm:
1. Turn = 1: (X moves)
Go(1) //make a move at the left-top cell
2. Turn = 2: (O moves)
IF board[5] is empty THEN
Go(5)
ELSE
Go(1)
3. Turn = 3: (X moves)
IF board[9] is empty THEN
Go(9)
ELSE
Go(3).
4. Turn = 4: (O moves)
IF Posswin(X) <> 0 THEN
Go(Posswin(X))
// block the cell where the opponent could win
ELSE Go(Make2)
5. Turn = 5: (X moves)
IF Posswin(X) <> 0 THEN
Go(Posswin(X)) // win if possible
ELSE IF Posswin(O) <> 0 THEN
Go(Posswin(O)) // otherwise block the opponent
ELSE Go(Make2)
Comments:
1. Not efficient in time, as it has to check several conditions before making each
move.
2. Easier to understand the program’s strategy.
3. Hard to generalize.
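The 2/3/5 encoding above is what makes these checks cheap: with 2 = empty, 3 = X and 5 = O, the product of a track's three cells is 18 (3 x 3 x 2) exactly when X can win on that track and 50 (5 x 5 x 2) exactly when O can. A Python sketch of the two helper functions, assuming this classic product test:

# Tracks (rows, columns, diagonals) over cells numbered 1..9.
LINES = [(1, 2, 3), (4, 5, 6), (7, 8, 9),
         (1, 4, 7), (2, 5, 8), (3, 6, 9),
         (1, 5, 9), (3, 5, 7)]

def posswin(board, p):
    """Return the cell where p (3 for X, 5 for O) wins next move, else 0."""
    target = p * p * 2
    for line in LINES:
        a, b, c = (board[i] for i in line)
        if a * b * c == target:
            for i in line:
                if board[i] == 2:       # the one empty cell on the track
                    return i
    return 0

def make2(board):
    """Return the centre if empty, otherwise a non-corner cell (2, 4, 6, 8)."""
    if board[5] == 2:
        return 5
    for i in (2, 4, 6, 8):
        if board[i] == 2:
            return i

board = [0] + [2] * 9     # index 0 is unused padding; cells are 1..9
board[1] = board[5] = 3   # X occupies cells 1 and 5
print(posswin(board, 3))  # 9: X can complete the 1-5-9 diagonal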
Searching Solutions:
To build a system to solve a problem:
1. Define the problem precisely
2. Analyze the problem
3. Isolate and represent the task knowledge that is necessary to solve the problem
4. Choose the best problem-solving techniques and apply it to the particular problem.
Which search algorithm one should use will generally depend on the problem domain.
There are four important factors to consider:
1. Completeness – Is the algorithm guaranteed to find a solution if one exists?
2. Optimality – Is the solution found guaranteed to be the best (or lowest cost) solution if there exists
more than one solution?
3. Time Complexity – The upper bound on the time required to find a solution, as a function of the
complexity of the problem.
4. Space Complexity – The upper bound on the storage space (memory) required at any point during the
search, as a function of the complexity of the problem.
The General Problem Solver (GPS) was the first useful AI program, written by Simon, Shaw, and Newell
in 1959. As the name implies, it was intended to solve nearly any problem.
Newell and Simon defined each problem as a space. At one end of the space is the starting point; on the
other side is the goal. The problem-solving procedure itself is conceived as a set of operations to cross
that space, to get from the starting point to the goal state, one step at a time.
In the General Problem Solver, the program tests various actions (which Newell and Simon called
operators) to see which will take it closer to the goal state. An operator is any activity that changes the
state.
A Water Jug Problem: You are given two jugs, a 4-gallon one and a 3-gallon one, a
pump which has unlimited water which you can use to fill the jug, and the ground on which
water may be poured. Neither jug has any measuring markings on it. How can you get
exactly 2 gallons of water in the 4-gallon jug?
Operators: we must define a set of operators that will take us from one state to another.
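The operator table itself is not reproduced here, so the sketch below assumes the six standard operators (fill either jug from the pump, empty either jug onto the ground, pour one jug into the other) and finds a solution by breadth-first search over (x, y) states, where x is the content of the 4-gallon jug and y that of the 3-gallon jug:

from collections import deque

def successors(x, y):
    pour43 = min(x, 3 - y)   # amount poured from the 4-gal into the 3-gal
    pour34 = min(y, 4 - x)   # amount poured from the 3-gal into the 4-gal
    return {(4, y), (x, 3),                  # fill a jug from the pump
            (0, y), (x, 0),                  # empty a jug onto the ground
            (x - pour43, y + pour43),        # pour 4-gal jug into 3-gal jug
            (x + pour34, y - pour34)}        # pour 3-gal jug into 4-gal jug

def solve(start=(0, 0), goal_x=2):
    parents = {start: None}
    frontier = deque([start])
    while frontier:
        state = frontier.popleft()
        if state[0] == goal_x:               # 2 gallons in the 4-gallon jug
            path = []
            while state is not None:
                path.append(state)
                state = parents[state]
            return path[::-1]
        for nxt in successors(*state):
            if nxt not in parents:
                parents[nxt] = state
                frontier.append(nxt)

print(solve())  # e.g. [(0, 0), (0, 3), (3, 0), (3, 3), (4, 2), (0, 2), (2, 0)]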
Second Solution:
Control strategies
Control Strategies means how to decide which rule to apply next during the process of searching for a
solution to a problem.
Requirement for a good Control Strategy
Let us discuss these strategies using water jug problem. These may be applied to any search problem.
Generate all the offspring of the root by applying each of the applicable rules to the initial state.
Now for each leaf node, generate all its successors by applying all the rules that are appropriate.
8 Puzzle Problem.
The 8 puzzle consists of eight numbered, movable tiles set in a 3x3 frame. One cell of the frame is always
empty thus making it possible to move an adjacent numbered tile into the empty cell. Such a puzzle is
illustrated in following diagram.
Solution:
To solve a problem using a production system, we must specify the global database, the rules, and the
control strategy. For the 8-puzzle problem, these three components correspond to the problem states,
the moves, and the goal. In this problem each tile configuration is a state. The set of all
configurations forms the space of problem states, or the problem space; there are only 362,880
different configurations of the 8 tiles and the blank space. Once the problem states have been
conceptually identified, we must construct a computer representation, or description, of them. This
description is then used as the database of a production system. For the 8-puzzle, a straightforward
description is a 3×3 matrix of numbers. The initial global database is this description of the initial
problem state. Virtually any kind of data structure can be used to describe states.
A move transforms one problem state into another state. The 8-puzzle is conveniently interpreted as
having the following four moves: move the empty space (blank) left, move the blank up, move the blank
right, and move the blank down. These moves are modeled by production rules that operate on the state
descriptions in the appropriate manner.
The rules each have preconditions that must be satisfied by a state description in order for them to be
applicable to that state description. Thus the precondition for the rule associated with “move blank up”
is derived from the requirement that the blank space must not already be in the top row.
The problem goal condition forms the basis for the termination condition of the production system. The
control strategy repeatedly applies rules to state descriptions until a description of a goal state is
produced. It also keeps track of rules that have been applied so that it can compose them into sequence
representing the problem solution. A solution to the 8-puzzle problem is given in the following figure.
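A small Python sketch of this production-system view of the 8-puzzle: a state is a 3x3 grid, and each rule moves the blank (0) in one direction only if its precondition holds, e.g. "move blank up" requires that the blank is not already in the top row. The tuple-of-tuples state layout is an illustrative choice:

MOVES = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}

def find_blank(state):
    for r, row in enumerate(state):
        for c, v in enumerate(row):
            if v == 0:
                return r, c

def apply_move(state, move):
    """Return the successor state, or None if the precondition fails."""
    r, c = find_blank(state)
    dr, dc = MOVES[move]
    nr, nc = r + dr, c + dc
    if not (0 <= nr < 3 and 0 <= nc < 3):    # precondition of the rule
        return None
    grid = [list(row) for row in state]
    grid[r][c], grid[nr][nc] = grid[nr][nc], grid[r][c]
    return tuple(tuple(row) for row in grid)

start = ((1, 2, 3), (4, 0, 5), (6, 7, 8))
print(apply_move(start, "up"))                               # blank swaps with the 2 above it
print(apply_move(((0, 1, 2), (3, 4, 5), (6, 7, 8)), "up"))   # None: blank already in top row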
Example:- Depth – First – Search traversal and Breadth - First - Search traversal
Search is the systematic examination of states to find path from the start/root state to the goal state.
Many traditional search algorithms are used in AI applications. For complex problems, the traditional
algorithms are unable to find a solution within some practical time and space limits. Consequently,
many special techniques have been developed that use heuristic functions. The algorithms that use
heuristic functions are called heuristic algorithms. Heuristic algorithms are more efficient because
they take advantage of feedback from the data to direct the search path.
Uninformed search
Also called blind, exhaustive or brute-force search, uses no information about the problem to guide the
search and therefore may not be very efficient.
Informed Search:
Also called heuristic or intelligent search; uses information about the problem to guide the search,
usually by estimating the distance to a goal state, and is therefore more efficient, but such a search
may not always be possible.
Breadth-First Search (BFS):
• Algorithm:
1. Create a variable called NODE-LIST and set it to the initial state
2. Until a goal state is found or NODE-LIST is empty do
a. Remove the first element from NODE-LIST and call it E. If NODE-LIST was empty,
quit
b. For each way that each rule can match the state described in E do:
i. Apply the rule to generate a new state
ii. If the new state is a goal state, quit and return this state
iii. Otherwise, add the new state to the end of NODE-LIST
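A direct Python transcription of this NODE-LIST procedure; expand(state), which applies every matching rule and returns the resulting new states, is an assumed placeholder to be supplied per problem:

from collections import deque

def breadth_first_search(initial_state, is_goal, expand):
    node_list = deque([initial_state])       # step 1
    while node_list:                         # step 2
        e = node_list.popleft()              # 2a: remove the first element
        for new_state in expand(e):          # 2b: apply each matching rule
            if is_goal(new_state):           # 2b-ii: goal found
                return new_state
            node_list.append(new_state)      # 2b-iii: add to the END
    return None                              # NODE-LIST empty: failure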
BFS illustrated:
Step 1: Initially fringe contains only one node corresponding to the source state A.
Step 2: A is removed from fringe. The node is expanded, and its children B and C are generated.
They are placed at the back of fringe.
Figure 2
FRINGE: B C
Step 3: Node B is removed from fringe and is expanded. Its children D, E are generated and put
at the back of fringe.
Figure 3
FRINGE: C D E
Step 4: Node C is removed from fringe and is expanded. Its children D and G are added to the
back of fringe.
Step 5: Node D is removed from fringe. Its children C and F are generated and added to the back
of fringe.
Figure 5
FRINGE: E D G C F
Figure 6
FRINGE: D G C F
Figure 7
FRINGE: G C F B F
Depth-First Search (DFS):
• Algorithm:
1. Create a variable called NODE-LIST and set it to the initial state
2. Until a goal state is found or NODE-LIST is empty do
a. Remove the first element from NODE-LIST and call it E. If NODE-LIST was empty, quit
b. For each way that each rule can match the state described in E do:
i. Apply the rule to generate a new state
ii. If the new state is a goal state, quit and return this state
iii. Otherwise, add the new state to the FRONT of NODE-LIST
DFS illustrated:
Step 1: Initially fringe contains only one node corresponding to the source state A.
Figure 1
FRINGE: A
Step 2: A is removed from fringe. A is expanded and its children B and C are put in front of
fringe.
Figure 2
FRINGE: B C
Figure 3
FRINGE: D E C
Step 4: Node D is removed from fringe. C and F are pushed in front of fringe.
Figure 4
FRINGE: C F E C
Step 5: Node C is removed from fringe. Its child G is pushed in front of fringe.
Figure 5
FRINGE: G F E C
Step 6: Node G is expanded and found to be a goal node.
Note that the time taken by the algorithm is related to the maximum depth of the search tree. If the
search tree has infinite depth, the algorithm may not terminate. This can happen if the search space is
infinite. It can also happen if the search space contains cycles. The latter case can be handled by
checking for cycles in the algorithm. Thus Depth First Search is not complete.
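The same NODE-LIST skeleton yields depth-first search when new states are added to the front instead of the end. The sketch below also includes the cycle check just mentioned, since without it DFS may never terminate on a search space with cycles:

def depth_first_search(initial_state, is_goal, expand):
    node_list = [initial_state]
    visited = {initial_state}                # cycle check (assumes hashable states)
    while node_list:
        e = node_list.pop(0)                 # remove the first element
        for new_state in expand(e):
            if is_goal(new_state):
                return new_state
            if new_state not in visited:
                visited.add(new_state)
                node_list.insert(0, new_state)   # add to the FRONT
    return None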
Iterative Deepening Search (IDS):
Description:
It is a search strategy resulting when you combine BFS and DFS, thus combining the advantages
of each strategy, taking the completeness and optimality of BFS and the modest memory
requirements of DFS.
IDS works by looking for the best search depth d: it starts with depth limit 0 and performs a
depth-limited search (DLS); if the search fails, it increases the depth limit by 1 and tries again
with depth 1, and so on – first d = 0, then 1, then 2 – until a depth d is reached where a goal is found.
procedure IDDFS(root)
for depth from 0 to ∞
found ← DLS(root, depth)
if found ≠ null
return found
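A runnable Python version of this pseudocode. DLS is an ordinary depth-limited depth-first search; children(node) is an assumed successor function, and the max_depth cap stands in for the unbounded loop:

def dls(node, depth, is_goal, children):
    if depth == 0:
        return node if is_goal(node) else None
    for child in children(node):
        found = dls(child, depth - 1, is_goal, children)
        if found is not None:
            return found
    return None

def iddfs(root, is_goal, children, max_depth=50):
    for depth in range(max_depth + 1):       # depth = 0, then 1, then 2, ...
        found = dls(root, depth, is_goal, children)
        if found is not None:
            return found
    return None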
Performance Measure:
o Completeness: IDS is like BFS, is complete when the branching factor b is finite.
o Optimality: IDS is also like BFS optimal when the steps are of the same cost.
Time Complexity:
o One may find that it is wasteful to generate nodes multiple times, but actually it is not
that costly compared to BFS, because most of the generated nodes are always in the deepest
level reached. Consider that we are searching a binary tree and our depth limit reached 4:
the nodes generated in the last level = 2^4 = 16, while the nodes generated in all the levels
before the last = 2^0 + 2^1 + 2^2 + 2^3 = 15
o Imagine this scenario, we are performing IDS and the depth limit reached depth d, now
if you remember the way IDS expands nodes, you can see that nodes at depth d are
generated once, nodes at depth d-1 are generated 2 times, nodes at depth d-2 are
generated 3 times and so on, until you reach depth 1 which is generated d times,
we can view the total number of generated nodes in the worst case as:
N(IDS) = (d)b + (d − 1)b^2 + (d − 2)b^3 + … + (2)b^(d−1) + (1)b^d = O(b^d)
Weblinks:
i. https://www.youtube.com/watch?v=7QcoJjSVT38
ii. https://mhesham.wordpress.com/tag/iterative-deepening-depth-first-search
Conclusion:
We can conclude that IDS is a hybrid search strategy between BFS and DFS inheriting their
advantages.
It is said that “IDS is the preferred uninformed search method when there is a large search space
and the depth of the solution is not known”.
Heuristic Searches:
A Heuristic technique helps in solving problems, even though there is no guarantee that it will never
lead in the wrong direction. There are heuristics of every general applicability as well as domain specific.
The strategies are general purpose heuristics. In order to use them in a specific domain they are coupler
with some domain specific heuristics. There are two major ways in which domain - specific, heuristic
information can be incorporated into rule-based search procedure.
A heuristic function is a function that maps from problem state description to measures desirability,
usually represented as number weights. The value of a heuristic function at a given node in the search
process gives a good estimate of that node being on the desired path to solution.
Greedy best-first search tries to expand the node that is closest to the goal, on the: grounds that this is
likely to lead to a solution quickly. Thus, it evaluates nodes by using just the heuristic function:
f (n) = h (n).
Taking the example of Route-finding problems in Romania, the goal is to reach Bucharest starting from
the city Arad. We need to know the straight-line distances to Bucharest from various cities as shown in
Figure 8.1. For example, the initial state is In(Arad), and the straight-line distance heuristic hSLD(In
(Arad)) is found to be 366. Using the straight-line distance heuristic hSLD, the goal state can be reached
faster.
Figure 8.2: Stages in a greedy best-first search for Bucharest using the straight-line distance heuristic
hSLD. Nodes are labeled with their h-values.
Figure 8.2 shows the progress of greedy best-first search using hSLD to find a path from Arad to
Bucharest. The first node to be expanded from Arad will be Sibiu, because it is closer to
Bucharest than either Zerind or Timisoara. The next node to be expanded will be Fagaras,
because it is closest.
Fagaras in turn generates Bucharest, which is the goal.
Complete: No – it can get stuck in loops; it becomes complete in a finite space with
repeated-state checking.
Time Complexity: O(b^m) – but a good heuristic can give dramatic improvement.
Space Complexity: O(b^m) – keeps all nodes in memory.
Greedy best-first search is not optimal, and it is incomplete. The worst-case time and space
complexity is O(b^m), where m is the maximum depth of the search space.
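A compact Python sketch of greedy best-first search, ordering the frontier by h(n) alone. It is exercised on a fragment of the Romania map; only the value 366 for Arad is quoted above, and the other straight-line distances are the usual textbook figures:

import heapq

def greedy_best_first(start, goal, neighbors, h):
    frontier = [(h(start), start, [start])]  # ordered by f(n) = h(n)
    visited = set()
    while frontier:
        _, node, path = heapq.heappop(frontier)
        if node == goal:
            return path
        if node in visited:
            continue
        visited.add(node)
        for nxt in neighbors(node):
            if nxt not in visited:
                heapq.heappush(frontier, (h(nxt), nxt, path + [nxt]))
    return None

graph = {"Arad": ["Sibiu", "Timisoara", "Zerind"],
         "Sibiu": ["Arad", "Fagaras", "Oradea", "Rimnicu Vilcea"],
         "Fagaras": ["Sibiu", "Bucharest"]}
h_sld = {"Arad": 366, "Sibiu": 253, "Timisoara": 329, "Zerind": 374,
         "Fagaras": 176, "Oradea": 380, "Rimnicu Vilcea": 193, "Bucharest": 0}
print(greedy_best_first("Arad", "Bucharest",
                        lambda n: graph.get(n, []), h_sld.get))
# ['Arad', 'Sibiu', 'Fagaras', 'Bucharest']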
We will assume we are trying to maximize a function. That is, we are trying to find a point in the search
space that is better than all the others. And by "better" we mean that the evaluation is higher. We might
also say that the solution is of better quality than all the others.
Also, if two neighbors have the same evaluation and they are both the best quality, then the algorithm
will choose between them at random.
The main problem with hill climbing (which is also sometimes called gradient descent) is that we are not
guaranteed to find the best solution. In fact, we are not offered any guarantees about the solution. It
could be abysmally bad.
You can see that we will eventually reach a state that has no better neighbours, but there are better
solutions elsewhere in the search space. The problem we have just described is called a local maximum.
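A minimal hill-climbing sketch of what was just described: move to the best neighbour, breaking ties at random, and stop when no neighbour is better, which may be only a local maximum:

import random

def hill_climb(state, neighbors, evaluate):
    while True:
        candidates = neighbors(state)
        if not candidates:
            return state
        best_value = max(evaluate(n) for n in candidates)
        if best_value <= evaluate(state):
            return state                       # no better neighbour: stop
        best = [n for n in candidates if evaluate(n) == best_value]
        state = random.choice(best)            # random tie-breaking

# Example: maximise f(x) = -(x - 3)^2 over the integers, stepping by 1.
f = lambda x: -(x - 3) ** 2
print(hill_climb(0, lambda x: [x - 1, x + 1], f))   # 3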
We have considered algorithms that work only in discrete environments, but real-world
environment are continuous.
Local search amounts to maximizing a continuous objective function in a multi-dimensional
vector space.
This is hard to do in general.
• Can immediately retreat
• Discretize the space near each state
• Apply a discrete local search strategy (e.g., stochastic hill climbing, simulated annealing)
Best-First Search:
OPEN is a priority queue of nodes that have been evaluated by the heuristic function but which have not
yet been expanded into successors. The most promising nodes are at the front.
CLOSED contains nodes that have already been generated; these nodes must be stored because a graph is
being used in preference to a tree.
Algorithm:
Example:
1. It is not optimal.
2. It is incomplete because it can start down an infinite path and never return to try other
possibilities.
The A* search algorithm (pronounced "Ay-star") is a tree search algorithm that finds a path from a given
initial node to a given goal node (or one passing a given goal test). It employs a "heuristic estimate"
which ranks each node by an estimate of the best route that goes through that node. It visits the nodes
in order of this heuristic estimate.
Similar to greedy best-first search but is more accurate because A* takes into account the nodes that
have already been traversed.
g is a measure of the distance/cost to go from the initial node to the current node, and h is an
estimate of the distance/cost from the current node to the solution; f = g + h.
Thus f is an estimate of how long it takes to go from the initial node to the solution.
Algorithm:
1. Initialization: for the start node s, set g(s) = 0 and f(s) = h(s), and place s in OPEN.
2. Remove from OPEN the node n with the lowest f and save n in CLOSED.
3. For each successor m of n:
a) If m ∉ [OPEN ∪ CLOSED], compute its cost and insert m in OPEN.
b) If m ∈ [OPEN ∪ CLOSED] and the newly found path to m is cheaper, update its cost and, if m
was in CLOSED, move m to OPEN.
Description:
A* begins at a selected node. Applied to this node is the "cost" of entering this node (usually
zero for the initial node). A* then estimates the distance to the goal node from the current
node. This estimate and the cost added together are the heuristic which is assigned to the path
leading to this node. The node is then added to a priority queue, often called "open".
The algorithm then removes the next node from the priority queue (because of the way a
priority queue works, the node removed will have the lowest heuristic). If the queue is empty,
there is no path from the initial node to the goal node and the algorithm stops. If the node is the
goal node, A* constructs and outputs the successful path and stops.
If the node is not the goal node, new nodes are created for all admissible adjoining nodes; the
exact way of doing this depends on the problem at hand. For each successive node, A*
calculates the "cost" of entering the node and saves it with the node. This cost is calculated from
the cumulative sum of costs stored with its ancestors, plus the cost of the operation which
reached this new node.
The algorithm also maintains a 'closed' list of nodes whose adjoining nodes have been checked.
If a newly generated node is already in this list with an equal or lower cost, no further
processing is done on that node or with the path associated with it. If a node in the closed list
matches the new one, but has been stored with a higher cost, it is removed from the closed list,
and processing continues on the new node.
The algorithm A* is admissible. This means that provided a solution exists, the first solution
found by A* is an optimal solution. A* is admissible under the following conditions:
A* is also complete.
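A compact Python sketch of A* along the lines of the description above: nodes are expanded in order of f(n) = g(n) + h(n), and a node is re-opened only when a strictly cheaper path to it is found:

import heapq

def a_star(start, goal, neighbors, cost, h):
    open_list = [(h(start), 0, start, [start])]   # entries: (f, g, node, path)
    best_g = {start: 0}
    while open_list:
        f, g, node, path = heapq.heappop(open_list)
        if node == goal:
            return path, g
        for nxt in neighbors(node):
            g2 = g + cost(node, nxt)
            if g2 < best_g.get(nxt, float("inf")):   # cheaper path found
                best_g[nxt] = g2
                heapq.heappush(open_list,
                               (g2 + h(nxt), g2, nxt, path + [nxt]))
    return None                                   # OPEN empty: no path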
Iterative-Deepening A* (IDA*):
Algorithm:
Each iteration of the algorithm is a depth-first search that keeps track of the cost, f(n) = g(n) +
h(n), of each node generated.
As soon as a node is generated whose cost exceeds a threshold for that iteration, its path is cut
off, and the search backtracks before continuing.
The cost threshold is initialized to the heuristic estimate of the initial state, and in each
successive iteration is increased to the total cost of the lowest-cost node that was pruned during
the previous iteration.
The algorithm terminates when a goal state is reached whose total cost does not exceed the
current threshold.
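A Python sketch of this scheme: each iteration is a depth-first search cut off where f(n) = g(n) + h(n) exceeds the threshold, and the threshold then grows to the smallest f-value that was pruned:

def ida_star(start, goal, neighbors, cost, h):
    def dfs(node, g, threshold, path):
        f = g + h(node)
        if f > threshold:
            return f                       # pruned: report the cutoff value
        if node == goal:
            return path
        minimum = float("inf")
        for nxt in neighbors(node):
            if nxt not in path:            # avoid cycles along the path
                result = dfs(nxt, g + cost(node, nxt), threshold, path + [nxt])
                if isinstance(result, list):
                    return result
                minimum = min(minimum, result)
        return minimum

    threshold = h(start)                   # initial cost threshold
    while True:
        result = dfs(start, 0, threshold, [start])
        if isinstance(result, list):
            return result
        if result == float("inf"):
            return None                    # no solution exists
        threshold = result                 # lowest cost pruned last iteration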
UNIT-III
PROPOSITIONAL LOGIC
The Wumpus World: A variety of "worlds" are being used as examples for
Knowledge Representation, Reasoning, and Planning. Among them the Vacuum
World, the Block World, and the Wumpus World. The Wumpus World was
introduced by Genesereth, and is discussed in Russell-Norvig. The Wumpus
World is a simple world (as is the Block World) for which to represent
knowledge and to reason. It is a cave with a number of rooms, represented as a
4x4 square.
• Environment: A 4×4 grid of rooms, with walls surrounding the grid. The agent
always starts in the square labeled [1,1], facing to the east. The locations of the
gold and the wumpus are chosen randomly, with a uniform distribution, from
the squares other than the start square. In addition, each square other than the
start can be a pit, with probability 0.2.
A cautious agent will move only into a square that it knows to be OK. Let us
suppose the agent decides to move forward to [2,1]. The agent perceives a
breeze (denoted by ―B‖) in [2,1], so there must be a pit in a neighboring square.
The pit cannot be in [1,1], by the rules of the game, so there must be a pit in
[2,2] or [3,1] or both. The notation ―P?‖ in Figure 7.3(b) indicates a possible pit
in those squares. At this point, there is only one known square that is OK and
that has not yet been visited. So the prudent agent will turn around, go back to
[1,1], and then proceed to [1,2]. The agent perceives a stench in [1,2], resulting
in the state of knowledge shown in Figure 7.4(a). The stench in [1,2] means that
there must be a wumpus nearby. But the wumpus cannot be in [1,1], by the rules
of the game, and it cannot be in [2,2] (or the agent would have detected a stench
when it was in [2,1]). Therefore, the agent can infer that the wumpus is in [1,3].
The notation W! indicates this inference. Moreover, the lack of a breeze in [1,2]
implies that there is no pit in [2,2]. Yet the agent has already inferred that there
must be a pit in either [2,2] or [3,1], so this means it must be in [3,1]. This is a
fairly difficult inference, because it combines knowledge gained at different
times in different places and relies on the lack of a percept to make one crucial
step. The agent has now proved to itself that there is neither a pit nor a wumpus
in [2,2], so it is OK to move there. We do not show the agent’s state of
knowledge at [2,2]; we just assume that the agent turns and moves to [2,3],
giving us Figure 7.4(b). In [2,3], the agent detects a glitter, so it should grab the
gold and then return home. Note that in each case for which the agent draws a
conclusion from the available information, that conclusion is guaranteed to be
correct if the available information is correct. This is a fundamental property of
logical reasoning. In the rest of this chapter, we describe how to build logical
agents that can represent information and draw conclusions .
Logic:
A logic must also define the semantics, or meaning, of sentences. The semantics
defines the truth of each sentence with respect to each possible world. For
example, the semantics for arithmetic specifies that the sentence "x + y = 4" is
true in a world where x is 2 and y is 2, but false in a world where x is 1 and
y is 1. In standard logics, every sentence must be either true or false in each
possible world – there is no "in between."
When we need to be precise, we use the term model in place of "possible world."
Whereas possible worlds might be thought of as (potentially) real environments
that the agent might or might not be in, models are mathematical abstractions,
each of which has a fixed truth value (true or false) for every relevant
sentence. Informally, we may think of a possible world as, for example, having
x men and y women sitting at a table playing bridge, and the sentence x + y = 4
is true when there are four people in total. Formally, the possible models are
just all possible assignments of nonnegative integers to the variables x and y.
Each such assignment determines the truth of any sentence of arithmetic whose
variables are x and y. If a sentence α is true in model m, we say that m
satisfies α, or sometimes that m is a model of α. We use the notation M(α) to
mean the set of all models of α.
Now that we have a notion of truth, we are ready to talk about logical
reasoning. This involves the relation of logical entailment between sentences –
the idea that a sentence follows logically from another sentence. In
mathematical notation, we write α |= β to mean that the sentence α entails the
sentence β. The formal definition of entailment is this: α |= β if and only if,
in every model in which α is true, β is also true. Using the notation just
introduced, we can write: α |= β if and only if M(α) ⊆ M(β).
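When the vocabulary is finite, the definition α |= β iff M(α) ⊆ M(β) can be checked directly by enumerating all models. A brute-force Python sketch, representing a sentence as a function from a model (a dict of truth values) to True or False:

from itertools import product

def entails(alpha, beta, symbols):
    """alpha |= beta iff beta is true in every model where alpha is true."""
    for values in product([True, False], repeat=len(symbols)):
        model = dict(zip(symbols, values))
        if alpha(model) and not beta(model):
            return False       # a model of alpha that falsifies beta
    return True

# (P and Q) entails P, but P does not entail (P and Q):
print(entails(lambda m: m["P"] and m["Q"], lambda m: m["P"], ["P", "Q"]))  # True
print(entails(lambda m: m["P"], lambda m: m["P"] and m["Q"], ["P", "Q"]))  # False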
Syntax :
The syntax of propositional logic defines the allowable sentences. The atomic
sentences Atomic sentences consist of a single proposition symbol. Each such
symbol stands for a proposition that can Proposition symbol be true or false. We
use symbols that start with an uppercase letter and may contain other letters or
subscripts, for example: P, Q, R, W1,3 and Facing East. The names are arbitrary
but are often chosen to have some mnemonic value—we use W1,3 to stand for
the proposition that the wumpus is in [1,3]. (Remember that symbols such as
W1,3 are atomic, i.e., W, 1, and 3 are not meaningful parts of the symbol.)
There are two proposition symbols with fixed meanings: True is the always-true
proposition and False is the always-false proposition. Complex sentences are
constructed from simpler sentences, using parentheses and operators Complex
sentences called logical connectives. There are five connectives in common use:
Logical connectives ¬ (not). A sentence such as ¬W1,3 is called the negation of
W1,3. A literal is either an Negation atomic sentence (a positive literal) or a
negated atomic sentence (a negative literal). Literal ∧ (and). A sentence whose
main connective is ∧, such as W1,3 ∧P3,1, is called a conjunction; its parts are
the conjuncts. (The ∧ looks like an ―A‖ for ―And.‖) Conjunction ∨ (or). A
sentence whose main connective is ∨, such as (W1,3 ∧P3,1)∨W2,2, is a
disjunction; its parts are disjuncts—in this example, (W1,3 ∧P3,1) and W2,2.
Disjunction ⇒ (implies). A sentence such as (W1,3 ∧P3,1) ⇒ ¬W2,2 is called
an implication (or con- Implication ditional). Its premise or antecedent is (W1,3
∧P3,1), and its conclusion or consequent Premise is ¬W2,2. Implications are
also known as rules or if–then statements. The implication Conclusion symbol
is sometimes written in other books as ⊃ or →. Rules ⇔ (if and only if). The
sentence W1,3 ⇔ ¬W2,2 is a biconditional.
Semantics :
The semantics defines the rules for determining the truth of a sentence with
respect to a particular Truth value model. In propositional logic, a model simply
sets the truth value—true or false—for every proposition symbol. For example,
if the sentences in the knowledge base make use of the proposition symbols
P1,2, P2,2, and P3,1, then one possible model is m1 = {P1,2 =false, P2,2 =false,
P3,1 =true}. With three proposition symbols, there are 23 =8 possible models.
Propositional Theorem Proving
In this section, we show how entailment Theorem proving can be done by
theorem proving—applying rules of inference directly to the sentences in our
knowledge base to construct a proof of the desired sentence without consulting
models. If the number of models is large but the length of the proof is short,
then theorem proving can be more efficient than model checking. Before we
plunge into the details of theorem-proving algorithms, we will need some
Logical equivalence additional concepts related to entailment. The first concept
is logical equivalence: two sentences α and β are logically equivalent if they are
true in the same set of models. We write this as α ≡ β. (Note that ≡ is used to
make claims about sentences, while ⇔ is used as part of a sentence.) For
example, we can easily show (using truth tables) that P∧Q and Q∧P are
logically equivalent; other equivalences are shown in Figure 7.11. These
equivalences play much the same role in logic as arithmetic identities do in
ordinary mathematics. An alternative definition of equivalence is as follows:
any two sentences α and β are equivalent if and only if each of them entails the
other: α ≡ β if and only if α |= β and β |= α. Validity The second concept we will
need is validity. A sentence is valid if it is true in all models. For Tautology
example, the sentence P∨ ¬P is valid. Valid sentences are also known as
tautologies—they are necessarily true. Because the sentence True is true in all
models, every valid sentence is logically equivalent to True. What good are
valid sentences? From our definition of entail Deduction theorem, we can derive
the deduction theorem, which was known to the ancient Greeks: I For any
sentences α and β, α |= β if and only if the sentence (α ⇒ β) is valid. (Exercise
7.DEDU asks for a proof.) Hence, we can decide if α |= β by checking that (α ⇒
β) is true in every model—which is essentially what the inference algorithm in
Figure 7.10 does—or by proving that (α ⇒ β) is equivalent to True. Conversely,
the deduction theorem states that every valid implication sentence describes a
legitimate inference. Satisfiability The final concept we will need is
satisfiability. A sentence is satisfiable if it is true in, or satisfied by, some
model. For example, the knowledge base given earlier, (R1 ∧R2 ∧ R3 ∧ R4 ∧
R5), is satisfiable because there are three models in which it is true, as shown in
Figure 7.9. Satisfiability can be checked by enumerating the possible models
until one is found that satisfies the sentence. The problem of determining the
satisfiability of sentences,
in propositional logic – the SAT problem – was the first problem proved to be
NP-complete. Many problems in computer science are really satisfiability
problems. For example, all the constraint satisfaction problems in Chapter 5
ask whether the constraints are satisfiable by some assignment. Validity and
satisfiability are of course connected: α is valid iff ¬α is unsatisfiable;
contrapositively, α is satisfiable iff ¬α is not valid. We also have the
following useful result: α |= β if and only if the sentence (α ∧ ¬β) is
unsatisfiable. Proving β from α by checking the unsatisfiability of (α ∧ ¬β)
corresponds exactly to the standard mathematical proof technique of reductio
ad absurdum (literally, "reduction to an absurd thing"). It is also called
proof by refutation or proof by contradiction. One assumes a sentence β to be
false and shows that this leads to a contradiction with known axioms α. This
contradiction is exactly what is meant by saying that the sentence (α ∧ ¬β) is
unsatisfiable.
The notation means that, whenever any sentences of the form α ⇒ β and α are
given, then the sentence β can be inferred. For example, if (WumpusAhead ∧
WumpusAlive) ⇒ Shoot and (WumpusAhead ∧ WumpusAlive) are given, then
Shoot can be inferred. Another useful inference rule is And-Elimination, which
says that, from a conjunction, any of the conjuncts can be
inferred:
By considering the possible truth values of α and β, one can easily show once
and for all that Modus Ponens and And-Elimination are sound. These rules can
then be used in any particular instances where they apply, generating sound
inferences without the need for enumerating models. All of the logical
equivalences in Figure 7.11 can be used as inference rules. For example, the
equivalence for biconditional elimination yields the two inference rules
Not all inference rules work in both directions like this. For example, we cannot
run Modus Ponens in the opposite direction to obtain α ⇒ β and α from β. Let
us see how these inference rules and equivalences can be used in the wumpus
world. We start with the knowledge base containing R1 through R5 and show
how to prove ¬P1,2, that is, there is no pit in [1,2]:
4. Apply Modus Ponens with R8 and the percept R4 (i.e., ¬B1,1), to obtain R9: ¬(P1,2 ∨ P2,1).
5. Apply De Morgan’s rule, giving the conclusion R10: ¬P1,2 ∧ ¬P2,1.
That is, neither [1,2] nor [2,1] contains a pit. Any of the search algorithms
in Chapter 3 can be used to find a sequence of steps that constitutes a proof
like this. We just need to define a proof problem as follows:
• ACTIONS: the set of actions consists of all the inference rules applied to all
the sentences that match the top half of the inference rule.
• RESULT: the result of an action is to add the sentence in the bottom half
of the inference rule.
• GOAL: the goal is a state that contains the sentence we are trying to
prove Thus, searching for proofs is an alternative to enumerating models.
In many practical cases I finding a proof can be more efficient because
the proof can ignore irrelevant propositions, no matter how many of them
there are.
Proof by resolution
We have argued that the inference rules covered so far are sound, but we have not
discussed the question of completeness for the inference algorithms that use them.
For example, if we removed the biconditional elimination rule, the proof in the preceding
section would not go through. The current section introduces a single inference rule,
resolution, that yields a complete inference algorithm when coupled with any complete
search algorithm.
We begin by using a simple version of the resolution rule in the wumpus world. Let us
consider the steps leading up to Figure 7.4(a): the agent returns from [2,1] to [1,1] and then
goes to [1,2], where it perceives a stench, but no breeze. We add the following facts to the
knowledge base:
Horn clauses and definite clause:
The first step is to enable the agent to deduce, to the extent possible, the
state of the world given its percept history. This requires writing down a
complete logical model of the effects of actions. We then show how
logical inference can be used by an agent in the wumpus world. We also
show how the agent can keep track of the world efficiently without going
back into the percept history for each inference. Finally, we show how the
agent can use logical inference to construct plans that are guaranteed to
achieve its goals, provided its knowledge base is true in the actual world.
A hybrid agent:
The agent program works quite well, but it has one major weakness: as
time goes by, the computational expense involved in the calls to ASK
goes up and up. This happens mainly because the required inferences
have to go back further and further in time and involve more and more
proposition symbols. Obviously, this is unsustainable—we cannot have
an agent whose time to process each percept grows in proportion to the
length of its life! What we really need is a constant update time—that is,
independent of t. The obvious answer is to save, or cache, the results of
inference, so that the inference process at the next time step can
build on the results of earlier steps instead of having to start again from
scratch.
Pros and Cons of Propositional Logic
Propositional logic is declarative
Propositional logic allows partial/disjunctive/negated
information
■ unlike most data structures and databases, which are domain-
specific
Our Approach
■ Adopt the foundation of propositional logic – a
declarative, compositional semantics that is
context-independent and unambiguous – but with
more expressive power, borrowing ideas from
natural language while avoiding its drawbacks
■ Two binary
relations
■ Three unary
relations
■ One unary
function
Syntax of FOL: Basic Elements
■ Basic symbols: objects (constant symbols), relations
(predicate symbols), and functions (functional symbols).
■ Constants King John, 2, Wumpus...
■ Predicates Brother, >,...
■ Functions Sqrt, LeftLegOf,...
■ Variables x, y, a, b,...
■ Connectives: ¬, ∧, ∨, ⇒, ⇔
■ Equality: =
■ Quantifiers: ∀, ∃
■ A legitimate expression of predicate calculus is called a well-formed
formula (wff), or simply, a sentence
Relations (Predicates)
■ A relation is the set of tuples of objects
■ Brotherhood relationship {<Richard, John>, <John, Richard>}
■ Unary relation, binary relation, …
Functions
■ In English, we use “King John’s left leg” rather than giving a name to his
leg; this is what a “function symbol” does
■ hat(c) = b
■ hat(b) = a
■ hat(d) is not defined
Terms and Atomic Sentences
Atomic sentence = predicate (term1,...,termn)
or term1 = term2
Term = function (term1,...,termn)
or constant or variable
■ Atomic sentence states facts
■ For example:
■ Brother(KingJohn,RichardTheLionheart)
■ Length(LeftLegOf(Richard))
■ Married(Father(Richard), Mother(John))
Composite Sentences
■ Complex sentences are made from atomic
sentences using connectives
■ ¬S, S1 ∧ S2, S1 ∨ S2, S1 ⇒ S2, S1 ⇔ S2
For example:
TELL(KB, King(John))
TELL(KB, ∀x King(x) ⇒ Person(x))
ASK(KB, Person(John))
ASK(KB, ∃x Person(x)) – answer: {x/John}
Example: The Kinship Domain
■ An example KB includes things like:
■ Fact:
■ “Elizabeth is the mother of Charles”
■ “Charles is the father of William”
■ Rules:
■ One’s grandmother is the mother of one’s parent”
■ Object: people
■ Unary predicate: Male, Female
■ Binary predicate: Son, Spouse, Wife, Husband,
Grandparent, Grandchild, Cousin, Aunt, and Uncle
■ Function: Mother, Father
Example: The Kinship Domain
One’s mom is one’s female parent:
∀m, c Mother(c) = m ⇔ Female(m) ∧ Parent(m, c)
One’s husband is one’s male spouse:
∀w, h Husband(h, w) ⇔ Male(h) ∧ Spouse(h, w)
Parent and child are inverse relations:
∀p, c Parent(p, c) ⇔ Child(c, p)
A grandparent is a parent of one’s parent:
∀g, c Grandparent(g, c) ⇔ ∃p Parent(g, p) ∧ Parent(p, c)
Practice:
Male and female are disjoint categories
A sibling is another child of one’s parents
Answer
■ Male and female are disjoint categories:
■ ∀x Male(x) ⇔ ¬Female(x)
■ Reflex behavior:
■ ∀t Glitter(t) ⇒ BestAction(Grab, t)
The Wumpus World
■ The Environment:
■ Objects: squares, pits and the wumpus
■ ∀x, y, a, b Adjacent([x, y], [a, b]) ⇔
[a, b] ∈ {[x+1, y], [x−1, y], [x, y+1], [x, y−1]}
■ A unary predicate Pit(x)
■ From percepts that tell which squares are breezy or smelly, the agent can
deduce where the pits are and where the wumpus is
Diagnostic Rules
■ Diagnostic rule: lead from observed effects
to hidden causes
■ If the square is breezy then adjacent square(s)
must contain a pit
∀s Breezy(s) ⇒ ∃r Adjacent(r, s) ∧ Pit(r)
■ If the square is not breezy, no adjacent square contains a pit:
∀s ¬Breezy(s) ⇒ ¬(∃r Adjacent(r, s) ∧ Pit(r))
■ Combining both:
∀s Breezy(s) ⇔ ∃r Adjacent(r, s) ∧ Pit(r)
Causal Rules
■ Causal rule: some hidden property of the
world causes certain percept to be
generated
■ A pit causes all adjacent squares to be breezy
∀r Pit(r) ⇒ [∀s Adjacent(r, s) ⇒ Breezy(s)]
■ If all squares adjacent to a given square are pitless, the square will not
be breezy:
∀s [∀r Adjacent(r, s) ⇒ ¬Pit(r)] ⇒ ¬Breezy(s)
Summary
■ First-order logic:
■ objects and relations are semantic primitives
■ syntax: constants, functions, predicates,
equality, quantifiers
■ AsHighAs(Everest, Everest)
■ AsHighAs(Kilimanjaro, Everest)
■ AsHighAs(Kilimanjaro, Everest) and AsHighAs(BenNevis, Everest)
Reduction to Propositional Inference
Suppose the KB contains just the following:
∀x King(x) ∧ Greedy(x) ⇒ Evil(x)
King(John)
Greedy(John)
Brother(Richard,John)
■ E.g., from:
∀x King(x) ∧ Greedy(x) ⇒ Evil(x)
King(John)
∀y Greedy(y)
Brother(Richard,John)
■ Theorem:
■ If a sentence is entailed by the original, first-order KB, then
there is a proof involving just a finite subset of the
propositionalized KB
■ First, instantiate with constant symbols
■ Then with terms of depth 1 (e.g., Father(John))
■ Then with terms of depth 2, …
■ … until the sentence is entailed
Unification and Lifting
Propositionalization approach is
rather inefficient
A First-Order Inference Rule
■ If there is some substitution θ that makes the premise of the
implication identical to sentences already in the KB, then we
assert the conclusion of the implication, after applying θ
Unification
■ The unify algorithm takes two sentences and
returns a unifier for them if one exists:
■ UNIFY(p, q) = θ where Subst(θ, p) = Subst(θ, q)
p q θ
Knows(John,x) Knows(John,Jane) {x/Jane}
Knows(John,x) Knows(y,OJ) {x/OJ,y/John}
Knows(John,x) Knows(y,Mother(y)) {y/John,x/Mother(John)}
Knows(John,x) Knows(x,OJ) {fail}
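A minimal unification sketch reproducing the table above. Variables are lowercase strings, constants are capitalised, and a compound term such as Knows(John, x) is a tuple; the occurs check and the application of θ to its own bindings are omitted for brevity:

def is_var(t):
    return isinstance(t, str) and t[0].islower()

def unify(p, q, theta=None):
    """Return a substitution making p and q identical, or None on failure."""
    theta = {} if theta is None else theta
    if p == q:
        return theta
    if is_var(p):
        return unify_var(p, q, theta)
    if is_var(q):
        return unify_var(q, p, theta)
    if isinstance(p, tuple) and isinstance(q, tuple) and len(p) == len(q):
        for a, b in zip(p, q):
            theta = unify(a, b, theta)
            if theta is None:
                return None
        return theta
    return None

def unify_var(var, x, theta):
    if var in theta:
        return unify(theta[var], x, theta)
    return {**theta, var: x}

print(unify(("Knows", "John", "x"), ("Knows", "John", "Jane")))  # {'x': 'Jane'}
print(unify(("Knows", "John", "x"), ("Knows", "y", "OJ")))       # {'y': 'John', 'x': 'OJ'}
print(unify(("Knows", "John", "x"), ("Knows", "x", "OJ")))       # None: fail, as in the table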
■ Criminal(x): x is a criminal
■ Missile(x): x is a missile
■ Consider a rule:
■ Owns(Nono, x) ∧ Missile(x) ⇒ Sells(West, x, Nono)
■ We find all the objects owned by Nono in
constant time per object;
■ Then, for each object, we check whether it’s a
missile.
■ If there are many objects and very few missiles, this ordering is
inefficient
■ Conjunct ordering problem: NP-hard
■ Heuristics?
Pattern Matching and CSP
■ The most constrained variable heuristic used for CSPs would suggest
ordering the conjuncts to look for missiles first if there are fewer
missiles than objects owned by Nono
■ We can actually express every finite-domain CSP as a single definite
clause together with some associated ground facts
Diff(R, B) Diff(R, G)
Diff(G, R) Diff(G, B)
Diff(B, R) Diff(B, G)
Incremental Forward Chaining
■ Observation:
■ Every new fact inferred on iteration t must be derived from
at least one new fact inferred on iteration t-1
■ Modification:
■ At iteration t, we check a rule only if its premise includes a
conjunct pi that unifies with a fact pi’ newly inferred at
iteration t-1
■ Many real systems operate in an “update” mode wherein
forward chaining occurs in response to each new fact that is
TOLD to the system
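A small sketch of this incremental scheme over ground definite clauses (plain string atoms, no unification, for brevity): at each iteration a rule is re-checked only if one of its premises matches a fact newly inferred at the previous iteration:

def forward_chain(facts, rules):
    """rules: list of (premises, conclusion) pairs over ground atoms."""
    known = set(facts)
    new = set(facts)
    while new:
        inferred = set()
        for premises, conclusion in rules:
            if not (set(premises) & new):        # untouched by last iteration
                continue
            if set(premises) <= known and conclusion not in known:
                inferred.add(conclusion)
        known |= inferred
        new = inferred
    return known

rules = [(["Missile(M1)"], "Weapon(M1)"),
         (["Owns(Nono,M1)", "Missile(M1)"], "Sells(West,M1,Nono)")]
print(forward_chain(["Missile(M1)", "Owns(Nono,M1)"], rules))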
Irrelevant Facts
■ FC makes all allowable inferences based on
known facts, even if they are irrelevant to
the goal at hand
■ Similar to FC in propositional context
■ Solution?
Backward Chaining
■ p1 ∧ p2 ∧ … ∧ pn ⇒ q
■ When a query q is examined:
■ if a matching fact q' is known, return the unifier
■ for each rule whose consequent q' matches q
■ attempt to prove each premise of the rule by
backward chaining
■ http://www.cs.toronto.edu/~sheila/384/w11/simple
-prolog-examples.html
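A tiny ground backward chainer in the same spirit (the linked Prolog examples do this with full unification; here atoms are plain strings and loop checking is omitted): to prove q, either find it among the known facts or find a rule concluding q and prove each of its premises recursively.

def backward_chain(query, facts, rules):
    if query in facts:
        return True
    for premises, conclusion in rules:
        if conclusion == query and all(
                backward_chain(p, facts, rules) for p in premises):
            return True
    return False

facts = {"Missile(M1)", "Owns(Nono,M1)"}
rules = [(["Missile(M1)"], "Weapon(M1)"),
         (["Owns(Nono,M1)", "Missile(M1)"], "Sells(West,M1,Nono)")]
print(backward_chain("Sells(West,M1,Nono)", facts, rules))   # True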
Exercise
■ Write down logical representations for the following
sentences, suitable for use with Generalized Modus
Ponens
■ Horses, cows, and pigs are mammals.
■ An offspring of a horse is a horse.
■ Bluebeard is a horse.
■ Bluebeard is Charlie’s parent.
■ Offspring and parent are inverse relations.
KNOWLEDGE REPRESENTATION
– Defining which attributes and properties classes can have and constraints on
their values
General-purpose ontology
1. Predicates: apple(x)
Category Organization
Categories serve to organize and simplify the knowledge base through inheritance.
• Relation = inheritance:
– All instances of Food are edible; Fruit is a subclass of Food and Apples is a subclass of Fruit,
so an apple is edible.
• Defines a taxonomy
– Subclass relations
organize categories
First-order logic makes it easy to state facts about categories, either by relating objects to categories
or by quantifying over their members.
Example:
– MemberOf(BB12,Basketballs)
– SubsetOf(Basketballs, Balls)
– MemberOf(Dogs, DomesticatedSpecies)
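As an illustrative sketch (our own toy encoding, restricted to single-parent chains for brevity), MemberOf plus transitive SubsetOf links support simple upward inference about an object's categories:

# Hypothetical mini-taxonomy: membership plus transitive subclass links.
subset_of = {"Basketballs": "Balls", "Balls": "SportsEquipment"}
member_of = {"BB12": "Basketballs"}

def categories_of(obj):
    # All categories an object belongs to, following SubsetOf upward.
    cats = []
    cat = member_of.get(obj)
    while cat is not None:
        cats.append(cat)
        cat = subset_of.get(cat)
    return cats

print(categories_of("BB12"))  # ['Basketballs', 'Balls', 'SportsEquipment']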
Relations between Categories
– Disjoint({Animals, Vegetables})
• A decomposition of a class into categories is called exhaustive if each object of the class must belong to at least one category
Natural kinds
• Tomatoes: sometimes green, orange, red, or yellow; mostly spherical, but varying in size -- natural kinds have no strict logical definition
• One solution: Separate what is true of all instances of a category from what is true only of typical
instances.
– Typical(c) ⊆ c
• This helps us to write down useful facts about categories without providing exact definitions.
Physical composition
– PartOf(Bucharest,Romania)
– PartOf(Romania,EasternEurope)
– PartOf(EasternEurope,Europe)
• The PartOf predicate is transitive and reflexive, so we can infer that PartOf(Bucharest, Europe)
• More generally:
– ∀x PartOf(x, x) (reflexivity)
– ∀x, y, z PartOf(x, y) ∧ PartOf(y, z) ⇒ PartOf(x, z) (transitivity)
Measurements
– ∀i Centimeters(2.54 × i) = Inches(i).
– The most important aspect of measures is not their numerical values but the fact that measures can be ordered.
Stuff: substances that defy any obvious individuation -- division into distinct objects (described by mass nouns such as "butter" or "water")
Events
The event calculus models how the truth values of relations change because of events occurring at
certain times.
• In the logical approach, a fluent is a predicate or function that varies from one situation to the next.
• When using reified fluents, a separate predicate is necessary to tell when a fluent is actually true
or not.
• For example, HoldsAt(on(box,table), t) means that the box is actually on the table at time t, where
the predicate HoldsAt is the one that tells when fluents are true.
• Clipped(t1, f, t2) -- fluent f ceases to be true at some point during the time interval between t1 and t2
• Restored(t1, f, t2) -- fluent f becomes true at some point during the time interval between t1 and t2
• The Clipped predicate, stating that a fluent has been made false during an interval, can be
axiomatized as follows:
Clipped(t1, f, t2) ⇔ ∃e, t Happens(e, t) ∧ t1 ≤ t < t2 ∧ Terminates(e, f, t)
• The Restored predicate, stating that a fluent has been made true during an interval, can be
axiomatized as follows:
Restored(t1, f, t2) ⇔ ∃e, t Happens(e, t) ∧ t1 ≤ t < t2 ∧ Initiates(e, f, t)
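A toy sketch of these ideas in Python (the timeline and fluent names are invented for illustration): a fluent holds at time t if some event initiated it at or before t and no later event has clipped it in between:

events = [  # (time, kind, fluent) -- an invented timeline
    (1, "initiates", "On(Box, Table)"),
    (5, "terminates", "On(Box, Table)"),
]

def holds_at(fluent, t):
    # The fluent holds at t if the latest relevant event at or before t
    # was an initiation, i.e., the fluent has not been clipped since.
    state = False
    for time, kind, f in sorted(events):
        if f == fluent and time <= t:
            state = (kind == "initiates")
    return state

print(holds_at("On(Box, Table)", 3))  # True
print(holds_at("On(Box, Table)", 6))  # False: clipped at time 5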
Processes
Categories of events with the property that any subinterval of a member is also a member of the
same category are called process categories or liquid event categories.
• Any process e that happens over an interval also happens over any subinterval.
Time Interval
1. moments (points with zero duration)
2. extended intervals
Time scale
• Interval(i) ⇒ Duration(i) = Time(End(i)) − Time(Begin(i)).
• Time(Begin(AD2001)) = Seconds(3187324800)
• Time(End(AD2001))=Seconds(3218860800)
• Duration(AD2001)= Seconds(31536000)
• To say that George Washington was president throughout 1790, we can write:
T(Equals(President(USA), GeorgeWashington), AD1790)
Mental Events and Mental Objects
For single-agent domains, knowledge about one's own knowledge and reasoning processes is
useful for controlling inference.
• In multiagent domains, it becomes important for an agent to reason about the mental states of
the other agents.
• This requires a model of the mental objects in someone's head (a knowledge base) and of the
processes that manipulate those mental objects.
Relationships between agents and mental objects: believes, knows, wants, intends ... are called
propositional attitudes
• Suppose Lois knows that Superman can fly: Knows(Lois, CanFly(Superman)). If Superman is Clark
(Superman = Clark), then ordinary substitution of equals forces us to conclude that Lois knows that Clark can fly:
Knows(Lois, CanFly(Clark))
which is wrong if Lois does not know that Clark is Superman.
For propositional attitudes like believes and knows, we would like to have referential opacity: the terms used inside them matter, so substituting equals for equals need not preserve truth.
• Modal logic includes special modal operators that take sentences (rather than terms) as
arguments.
• The notation KAP means that agent A knows proposition P.
In modal logic, consider both the possibility that Superman’s secret identity is Clark and that it
isn’t.
• Build a model, that consists of a collection of possible worlds rather than just one true world and
the worlds are connected in a graph by accessibility relations
∃x KBondSpy(x)
which in modal logic means that there is an x that, in all accessible worlds, Bond knows to be a
spy.
KBond∃x Spy(x)
The modal logic interpretation is that in each accessible world there is an x that is a spy, but it
need not be the same x in each world.
• After extensive study, it is commonly said that knowledge is justified true belief
• Example:
• Lois knows whether Clark can fly if she either knows that Clark can fly or knows that he cannot
Belief is also something that can change over time. For example:
T(Believes(Lois, flies(Superman)), Today)
• The same assertion will not be true a hundred years from now, when Superman has died of old age.
Categories are the primary building blocks of large-scale knowledge representation schemes.
• This topic describes systems specially designed for organizing and reasoning with categories.
1. Semantic networks
2. Description logics
Reasoning systems:
Semantic networks
– provide graphical aids for visualizing a knowledge base, and efficient algorithms for inferring properties of an object on the basis of its category membership
Description logics
– provide a formal language for constructing and combining category definitions, and efficient algorithms for deciding subsumption and classification
Semantic Networks
In 1909, Charles S. Peirce proposed a graphical notation of nodes and edges called existential
graphs
• A typical graphical notation displays object or category names in ovals or boxes, and connects
them with labeled arcs/links
• For Example:
– MemberOf link between Mary and FemalePersons, corresponding to the logical assertion
Mary ∈ FemalePersons
Semantic Networks
• The simplicity and efficiency of this inference mechanism, compared with logical theorem proving,
has been one of the main attractions of semantic networks.
• Multiple inheritance becomes complicated when two or more inheritance paths supply conflicting
values for answering a query.
• For this reason, multiple inheritance is banned in some object-oriented programming (OOP)
languages, such as Java.
• Drawback of semantic network is that the links between bubbles represent only binary relations.
• For example, the sentence Fly(Shankar, NewYork, NewDelhi, Yesterday) cannot be asserted
directly in a semantic network.
• But one can obtain the effect of n-ary assertions by reifying the proposition itself as an event object.
• Semantic networks also make it easy to attach default values that hold for a category but can be
overridden at an instance. Example:
– John has 1 leg, despite the fact that all persons (by default) have 2 legs
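The inheritance-with-exceptions behaviour can be sketched in a few lines of Python (a toy encoding, not a real semantic-network engine); the value stored closest to the node wins:

# Attribute values attached to nodes; John's value overrides the default.
legs = {"Persons": 2, "John": 1}
links = {"John": "Persons", "Persons": "Animals"}   # MemberOf/SubsetOf arcs

def inherited_value(node, attr_table, links):
    # Follow MemberOf/SubsetOf links upward until a value is found.
    while node is not None:
        if node in attr_table:
            return attr_table[node]
        node = links.get(node)
    return None

print(inherited_value("John", legs, links))             # 1 (the exception)
print(inherited_value("Mary", legs, {"Mary": "Persons"}))  # 2 (the default)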
Description logics
They are logical notations designed to describe definitions of, and properties of, categories.
– Subsumption: checking whether one category is a subset of another by comparing their definitions
• For example, to say that bachelors are unmarried adult males we would write:
• Bachelor(x) ⇔ Unmarried(x) ∧ Adult(x) ∧ Male(x)
Default reasoning
This is a very common form of non-monotonic reasoning: we want to draw conclusions based on
what is most likely to be true.
We have already seen examples of this and possible ways to represent this knowledge:
Non-Monotonic logic.
Default logic.
Do not confuse the labels Non-Monotonic and Default as applied to reasoning in general with the
same labels applied to a particular logic: non-monotonic reasoning is a generic description of a class of
reasoning, while Non-Monotonic Logic is one specific theory. The same goes for default reasoning
and Default Logic.
Non-Monotonic Logic
This is basically an extension of first-order predicate logic to include a modal operator, M.
The purpose of M is to express consistency: M P is read as "P is consistent with everything else that is known."
For example, the classic rule
∀x: plays_instrument(x) ∧ M improvise(x) ⇒ jazz_musician(x)
states that, for all x, if x plays an instrument and the fact that x can improvise is consistent
with all other knowledge, then we can conclude that x is a jazz musician.
One way to define M P: attempt to prove ¬P; if we fail, we may say that P is consistent
(since ¬P cannot be shown).
Now consider the pair of rules
∀x: Quaker(x) ∧ M pacifist(x) ⇒ pacifist(x)
∀x: Republican(x) ∧ M ¬pacifist(x) ⇒ ¬pacifist(x)
which state that Quakers tend to be pacifists and Republicans tend not to be. Given the facts
Quaker(Nixon)
Republican(Nixon)
both pacifist(Nixon) and ¬pacifist(Nixon) can be derived, so the two defaults conflict.
Default Logic
Now this is similar to Non-Monotonic Logic, but there are some distinctions:
New inference rules are used for computing the set of plausible extensions. So in the
Nixon example above, Default Logic can support both assertions, since it does not say
anything about how to choose between them -- the choice will depend on the inference being
made.
In Default Logic, non-monotonic expressions are rules of inference rather than
expressions in the language: a default rule has the form α : β / γ, meaning that if α is
provable and β is consistent with current knowledge, then γ may be concluded.
Circumscription
Circumscription is a rule of conjecture that allows you to jump to the conclusion that the
objects you can show possess a certain property, p, are in fact all the objects that possess
that property.
Suppose we want to say that birds typically fly, given the knowledge:
∀x: bird(x) ∧ ¬abnormal(x) ⇒ flies(x)
∀x: penguin(x) ⇒ bird(x)
∀x: penguin(x) ⇒ ¬flies(x)
From bird(tweety) alone we cannot conclude
flies(tweety)
because we cannot prove
¬abnormal(tweety).
This is where circumscription comes in:
we will assume that those things that can be shown to be abnormal are the only things to be
abnormal -- that is, we minimize the extension of
abnormal(x).
If we now add the fact penguin(tweety), we can prove abnormal(tweety), and circumscribing
abnormal yields
∀x: abnormal(x) ⇔ penguin(x).
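A closed-world sketch of circumscribing abnormal in Python (illustrative only): an object is treated as abnormal only if the knowledge base can show it, so flies follows for every bird not provably abnormal:

birds = {"tweety", "polly"}
penguins = {"tweety"}

def abnormal(x):
    # Circumscription: only provably abnormal things count as abnormal.
    return x in penguins

def flies(x):
    # bird(x) and not abnormal(x) => flies(x)
    return x in birds and not abnormal(x)

print(flies("polly"))   # True: a bird not shown to be abnormal
print(flies("tweety"))  # False: a penguin, hence abnormal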
Truth Maintenance Systems
Due to lecture time limitations this topic is not dealt with in any great depth; please refer to
the further reading section.
Basically, Truth Maintenance Systems (TMSs) record justifications for beliefs so that beliefs
can be retracted or revised consistently when the knowledge base changes.
A justification-based TMS (JTMS) is a simple TMS in that it does not know anything about the structure of the
assertions themselves.
Each supported belief (assertion) has a justification.
Each justification has two parts:
o An IN-List -- which supports beliefs held.
o An OUT-List -- which supports beliefs not held.
Nodes (assertions) assume no relationships among them except ones explicitly stated
in justifications.
A JTMS can represent P and ¬P simultaneously; an LTMS (logic-based TMS) would signal a contradiction
here.
If this happens, the network has to be reconstructed.
JTMSs and LTMSs pursue a single line of reasoning at a time and backtrack
(dependency-directed) when needed -- depth-first search.
An ATMS (assumption-based TMS) maintains alternative paths in parallel -- breadth-first search.
Backtracking is avoided at the expense of maintaining multiple contexts.
However, as reasoning proceeds, contradictions arise and the ATMS can be pruned:
o Simply find assertions with no valid justification.
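A minimal JTMS-flavoured sketch (our own simplification; a real TMS also handles retraction and relabelling): a belief is IN if some justification has all of its IN-list beliefs IN and all of its OUT-list beliefs OUT:

justifications = {
    # belief: list of (in_list, out_list)
    "Q": [(["P"], ["R"])],   # Q is supported if P is IN and R is OUT
}
premises = {"P"}             # beliefs given outright

def is_in(belief, seen=frozenset()):
    if belief in premises:
        return True
    if belief in seen:       # crude cycle guard
        return False
    for in_list, out_list in justifications.get(belief, []):
        if all(is_in(b, seen | {belief}) for b in in_list) and \
           not any(is_in(b, seen | {belief}) for b in out_list):
            return True
    return False

print(is_in("Q"))  # True: P is IN and R is OUT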
Quantifying Uncertainty
Summarizing Uncertainty
Probability allows us to summarize the uncertainty that stems from:
laziness: failure to enumerate exceptions, qualifications, etc.
ignorance: lack of relevant facts, initial conditions, etc.
Probability can be derived from:
statistical data (e.g., 80% of toothache patients so far had cavities),
general knowledge (e.g., 80% of toothache patients have cavities),
or a combination thereof.
Probability statements are made with respect to a state of knowledge (aka evidence), not
with respect to the real world
e.g., “The probability that the patient has a cavity, given that she has a toothache, is 0.8”:
P(HasCavity(patient) | hasToothAche(patient)) = 0.8
Probabilities of propositions change with new evidence:
“The probability that the patient has a cavity, given that she has a toothache and a history of
gum disease, is 0.4”:
P(HasCavity(patient) | hasToothAche(patient) ∧ HistoryOfGum(patient)) = 0.4
Probabilities Basics:
Random Variables
Probability Distributions
Probability for Continuous Variables
Conditional Probabilities
Probabilistic Inference via Enumeration
Marginalization
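The toothache numbers above can be checked by inference via enumeration on a small full joint distribution; the joint values below are illustrative, chosen so that P(cavity | toothache) = 0.8:

# Full joint distribution over (Cavity, Toothache) -- illustrative numbers.
joint = {
    (True, True): 0.16,  (True, False): 0.04,
    (False, True): 0.04, (False, False): 0.76,
}

def p(cavity=None, toothache=None):
    # Marginalize the joint over any unspecified variable.
    return sum(pr for (c, t), pr in joint.items()
               if (cavity is None or c == cavity)
               and (toothache is None or t == toothache))

# Conditional probability by enumeration: P(cavity | toothache)
print(p(cavity=True, toothache=True) / p(toothache=True))  # 0.8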
Artificial Intelligence (BEIT703T)
VII Semester IT
Unit-III
Topic:- Knowledge Representation with Logic and Inferences
Prof. S. C. Sakure
(1) What are the steps associated with the knowledge Engineering process?
Discuss them by applying the steps to any real world application of your choice.
Knowledge Engineering
The general process of constructing a knowledge base is called knowledge engineering. A
knowledge engineer is someone who investigates a particular domain, learns what concepts are
important in that domain, and creates a formal representation of the objects and relations in the
domain. We will illustrate the knowledge engineering process in an electronic circuit domain that
should already be fairly familiar.
The steps associated with the knowledge engineering process are :
1. Identify the task.
The task will determine what knowledge must be represented in order to connect problem instances to
answers. This step is analogous to the PEAS process for designing agents.
2. Assemble the relevant knowledge. The knowledge engineer might already be an expert in the domain,
or might need to work with real experts to extract what they know -- a process called knowledge
acquisition.
3. Decide on a vocabulary of predicates, functions, and constants. That is, translate the important
domain-level concepts into logic-level names. Once the choices have been made, the result is a
vocabulary that is known as the ontology of the domain. The word ontology means a particular theory of
the nature of being or existence.
4. Encode general knowledge about the domain. The knowledge engineer writes down the axioms for
all the vocabulary terms. This pins down (to the extent possible) the meaning of the terms, enabling the
expert to check the content. Often, this step reveals misconceptions or gaps in the vocabulary that must be
fixed by returning to step 3 and iterating through the process.
5. Encode a description of the specific problem instance
For a logical agent, problem instances are supplied by the sensors, whereas a "disembodied" knowledge
base is supplied with additional sentences in the same way that traditional programs are supplied with
input data.
6. Pose queries to the inference procedure and get answers. This is where the reward is: we can let the
inference procedure operate on the axioms and problem-specific facts to derive the facts we are interested
in knowing.
7. Debug the knowledge base.
For example, the axiom ∀x NumOfLegs(x, 4) ⇒ Mammal(x) is false for reptiles and amphibians.
1. If two terminals are connected, then they have the same signal:
∀t1, t2 Connected(t1, t2) ⇒ Signal(t1) = Signal(t2)
2. The signal at every terminal is either 1 or 0 (but not both):
∀t Signal(t) = 1 ∨ Signal(t) = 0
1 ≠ 0
The answers are substitutions for the variables i1, i2, and i3 such that the resulting sentence is
entailed by the knowledge base; there are three such substitutions.
The knowledge base is checked against different constraints. For example, if the assertion 1 ≠ 0 is
not included in the knowledge base, then it becomes impossible to prove any output for the circuit,
except for the input cases 000 and 110.
This means no outputs are known at X1 for the input cases 10 and 01, so the axiom for the XOR
gate is examined. If the input signals are known to be 1 and 0, that sentence reduces to
Signal(Out(1, X1)) = 1 ⇔ 1 ≠ 0.
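A tiny sketch of the gate axiom in Python (illustrative only; the logical knowledge base expresses this declaratively rather than procedurally):

def xor_gate(in1, in2):
    # The signal at every terminal is either 1 or 0 (but not both).
    assert in1 in (0, 1) and in2 in (0, 1)
    # XOR axiom: the output is 1 iff the inputs differ (relies on 1 != 0).
    return 1 if in1 != in2 else 0

# Querying "which inputs give output 1" mirrors asking for the substitutions
# entailed by the knowledge base:
print([(a, b) for a in (0, 1) for b in (0, 1) if xor_gate(a, b) == 1])
# [(0, 1), (1, 0)]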
Unification
Generalized Modus Ponens
• For atomic sentences pi, pi′, q, where there is a substitution θ such that
SUBST(θ, pi′) = SUBST(θ, pi) for all i, Generalized Modus Ponens concludes SUBST(θ, q)
from the premises p1′, p2′, …, pn′ and the rule (p1 ∧ p2 ∧ … ∧ pn ⇒ q).
Unification
It is the process of finding substitutions that make two atomic sentences identical.
Lifted inference rules require finding substitutions that make different logical expressions
look identical; this process is called unification, and it is a key component of all
first-order inference algorithms.
UNIFY(p, q) = θ, where SUBST(θ, p) = SUBST(θ, q)
θ is called the unifier of the two sentences.
Example: p = S1(x, x) and q = S1(y, z). UNIFY(p, q) = {x/y, z/y}, under which both
sentences become S1(y, y).
The unification algorithm returns failure in two cases:
i) The two sentences have different predicate names.
Ex: hate(m, c) and try(m, c)
ii) The two sentences have different numbers of arguments.
Ex: hate(m, c, x) and hate(m, c)
Query: Knows(John, x) -- whom does John know?
The knowledge base contains the following facts:
Knows(John, Jane)
Knows(y, Bill)
Knows(y, Mother(y))
Knows(x, Elizabeth)
• Unifying Knows(John, x) with Knows(x, Elizabeth) fails, because x cannot take on two values
(John and Elizabeth) at once.
• But Knows(x, Elizabeth) means "everyone knows Elizabeth," so the query should not fail.
• We must standardize apart one of the two sentences to eliminate the reuse of the variable x.
Standardizing apart eliminates overlap of variables: renaming Knows(x, Elizabeth) to
Knows(z17, Elizabeth) avoids the name clash, and the unification then succeeds with
{x/Elizabeth, z17/John}.
Most General Unifier
Multiple unifiers are possible. For example, UNIFY(Knows(John, x), Knows(y, z)) could return
{y/John, x/z} or {y/John, x/John, z/John}.
The first unifier is more general than the second, since it places fewer restrictions on the
values of the variables. For every unifiable pair of expressions there is a single most
general unifier (MGU); here it is {y/John, x/z}.
Storage and retrieval
Once the data types for sentences and terms are defined, we need to maintain a set of
sentences in a KB:
i) STORE(s) -- stores a sentence s into the KB.
ii) FETCH(q) -- returns all unifiers such that the query q unifies with some sentence in the KB.
Asking Knows(John, x) is an instance of fetching.
The simplest way to implement STORE and FETCH is to keep all the facts in the
knowledge base in one long list; FETCH then calls UNIFY(q, s) for every sentence s in the
list, requiring O(n) time on an n-element KB.
Such a process is inefficient because it attempts to unify the query with each and every
statement of the knowledge base.
• Example: there is no need to attempt to unify Knows(John, x) with Brother(Richard, John).
We can avoid such pointless unifications by indexing the facts in the knowledge base.
Predicate indexing puts all the Knows facts in one bucket and all the Brother facts in
another.
The buckets are stored in a hash table for efficient access.
• Predicate indexing might not be a win if there are many clauses for a particular predicate
symbol (consider how many people Know one another); the resulting large-bucket problem can be
handled with multiple indexing, e.g., keying on both the predicate and the first argument.
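A sketch of predicate indexing in Python (our own toy representation, with facts as tuples whose first element is the predicate symbol):

from collections import defaultdict

# Predicate indexing: facts are bucketed by predicate symbol, so fetching
# candidates for Knows(John, x) never touches the Brother bucket.
index = defaultdict(list)

def store(fact):                 # fact = (predicate, arg1, arg2, ...)
    index[fact[0]].append(fact)

def fetch(query):
    # Only same-predicate facts are candidates; unification would then
    # run on this (much smaller) bucket.
    return index[query[0]]

store(("Knows", "John", "Jane"))
store(("Brother", "Richard", "John"))
print(fetch(("Knows", "John", "x")))  # [('Knows', 'John', 'Jane')]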
2. Explain Forward and backward chaining (NOV 2013)(MAY 2013)(MAY 2012)(NOV 2011).
A forward-chaining algorithm for propositional definite clauses was already given. The
idea is simple: start with the atomic sentences in the knowledge base and apply
Modus Ponens in the forward direction, adding new atomic sentences, until no
further inferences can be made.
First-order definite clauses
First-order definite clauses closely resemble propositional definite clauses: they are
disjunctions of literals of which exactly one is positive. A definite clause either is atomic
or is an implication whose antecedent is a conjunction of positive literals and
whose consequent is a single positive literal.
This knowledge base contains no function symbols and is therefore an instance of the
class of Datalog knowledge bases -- that is, sets of first-order definite clauses with
no function symbols.
• All of its missiles were sold to it by Colonel West:
Missile(x) ∧ Owns(Nono, x) ⇒ Sells(West, x, Nono)
• West, who is American:
American(West)
• Nono is a nation:
Nation(Nono)
• The country Nono, an enemy of America:
Enemy(Nono, America)
• America is a nation:
Nation(America)
Analyse the algorithm
Sound
• Does it only derive sentences that are entailed?
• Yes, because only Modus Ponens is used and it is sound
Complete
• Does it answer every query whose answers are entailed by the KB?
• Yes if the clauses are definite clauses
Assume KB only has sentences with no function symbols
• What is the maximum number of iterations through the algorithm?
• It depends on the number of facts that can be added:
Let k be the maximum arity (number of arguments) of any predicate,
Let p be the number of predicates
Let n be the number of constant symbols
• At most p·n^k distinct ground facts
• Fixed point is reached after this many iterations
• A proof by contradiction shows that the final KB is complete
Three sources of complexity
• inner loop requires finding all unifiers such that premise of rule unifies with facts
of database
– this “pattern matching” is expensive
• must check every rule on every iteration to check if its premises are satisfied
• many facts are generated that are irrelevant to goal
Efficiency of forward chaining
Incremental forward chaining: no need to match a rule on iteration k if a premise wasn't
added on iteration k-1
match each rule whose premise contains a newly added positive literal
Matching itself can be expensive:
The algorithm uses composition of substitutions: COMPOSE(θ1, θ2) is the substitution whose
effect is identical to the effect of applying each substitution in turn; that is,
SUBST(COMPOSE(θ1, θ2), p) = SUBST(θ2, SUBST(θ1, p))
In the algorithm, the current variable bindings, which are stored in θ, are composed with the
bindings resulting from unifying the goal with the clause head, giving a new set of current
bindings for the recursive call.
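A small sketch of COMPOSE for flat substitutions in Python (our own simplification; full terms would need recursive substitution as in the unifier sketch earlier):

def subst(theta, term):
    # Apply a substitution to a single variable or constant (flat terms only).
    return theta.get(term, term)

def compose(theta1, theta2):
    # SUBST(COMPOSE(t1, t2), p) == SUBST(t2, SUBST(t1, p))
    composed = {v: subst(theta2, t) for v, t in theta1.items()}
    for v, t in theta2.items():
        composed.setdefault(v, t)
    return composed

t1, t2 = {"x": "y"}, {"y": "John"}
print(compose(t1, t2))            # {'x': 'John', 'y': 'John'}
print(subst(t2, subst(t1, "x")))  # 'John' -- the same effect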
Two marks
1. What is Ontological Engineering?
Ontology refers to organizing everything in the world into a hierarchy of categories.
Representing abstract concepts such as Actions, Time, Physical Objects, and Beliefs is
called ontological engineering.
2. Define diagnostic rules with an example.
Diagnostic rules are used in first-order logic for inferencing. Diagnostic rules
generate hidden causes from observed effects; they help to deduce hidden facts about the world. For
example, consider the wumpus world.
The diagnostic rule for finding a pit is,
"If a square is breezy, some adjacent square must contain a pit," which is written as:
∀s Breezy(s) ⇒ ∃r Adjacent(r, s) ∧ Pit(r)
3. Differentiate propositional logic and predicate logic.
S. No | Propositional logic                              | Predicate logic
1     | It has less expressive power.                    | It has more expressive power.
2     | It cannot represent relationships among objects. | It can represent relationships among objects.
3     | It does not use quantifiers.                     | It uses quantifiers.
4     | It does not consider generalization of objects.  | It considers generalization of objects.