
CSC 208: INTRODUCTION TO ARTIFICIAL INTELLIGENCE

INTRODUCTION TO ARTIFICIAL INTELLIGENCE
Artificial Intelligence is concerned with the design of intelligence in an artificial device. The term
was coined by John McCarthy in 1956.

There are probably as many definitions of intelligence as there are experts who study it. Simply put, however, intelligence is the ability to learn about, learn from, understand, and interact with one’s environment. This general ability consists of a number of specific abilities, including:

 Adaptability to a new environment or to changes in the current environment
 Capacity for knowledge and the ability to acquire it
 Capacity for reason and abstract thought
 Ability to comprehend relationships
 Ability to evaluate and judge
 Capacity for original and productive thought
Intelligence is the ability to acquire, understand and apply knowledge to achieve goals in the world.
Several Forms of Intelligence:
• Capability to learn
• Gathering of information
• Recognizing Patterns
• Capability to classify
• Making decisions
• Reasoning capability
• Predicting/Forecasting
• Capability to survive

ARTIFICIAL INTELLIGENCE
AI is a broad field, and means different things to different people. It is concerned with getting computers to do tasks that require human intelligence, that is, tasks that call for complex and sophisticated reasoning processes and knowledge.

AI is the study of the mental faculties through the use of computational models; that is, the study of intellectual/mental processes as computational processes. An AI program will demonstrate a high level of intelligence, to a degree that equals or exceeds the intelligence required of a human in performing some task. AI is unique, sharing borders with Mathematics, Computer Science, Philosophy, Psychology, Biology, Cognitive Science and many others. Although there is no clear definition of AI, or even of intelligence, it can be described as an attempt to build machines that, like humans, can think and act, and that are able to learn and use knowledge to solve problems on their own.

Definitions of AI vary along two dimensions: some are concerned with reasoning, whereas others address behavior; and some measure success in terms of human performance, whereas others measure it against the ideal concept of intelligence called rationality. These four combinations correspond to the four categories of intelligent systems discussed later in this note.

HISTORY OF ARTIFICIAL INTELLIGENCE


Important research that laid the groundwork for AI:

 In 1931, Gödel laid the foundation of Theoretical Computer Science:

o He published the first universal formal language and showed that mathematics itself is either flawed or allows for unprovable but true statements.
 In 1936, Turing reformulated Gödel’s result and Church’s extension thereof.
 In 1956, John McCarthy coined the term "Artificial Intelligence" as the topic of the
Dartmouth Conference, the first conference devoted to the subject.

 In 1957, the General Problem Solver (GPS) was demonstrated by Newell, Shaw & Simon.
 In 1958, John McCarthy (MIT) invented the Lisp language.
 In 1959, Arthur Samuel (IBM) wrote the first game-playing program, for checkers, to
achieve sufficient skill to challenge a world champion.
 In 1963, Ivan Sutherland's MIT dissertation on Sketchpad introduced the idea of
interactive graphics into computing.
 In 1966, Ross Quillian (PhD dissertation, Carnegie Inst. of Technology; now CMU)
demonstrated semantic nets
 In 1967, the Dendral program (Edward Feigenbaum, Joshua Lederberg, Bruce Buchanan and Georgia Sutherland at Stanford) was demonstrated; it interpreted mass spectra of organic chemical compounds and was the first successful knowledge-based program for scientific reasoning.
 In 1967, Doug Engelbart invented the mouse at SRI
 In 1968, Marvin Minsky & Seymour Papert published Perceptrons, demonstrating the limits of simple neural nets.
 In 1972, Prolog developed by Alain Colmerauer.
 In the mid-1980s, neural networks became widely used with the backpropagation algorithm (first described by Werbos in 1974).
 In the 1990s, major advances occurred in all areas of AI, with significant demonstrations in machine learning, intelligent tutoring, case-based reasoning, multi-agent planning, scheduling, uncertain reasoning, data mining, natural language understanding and translation, vision, virtual reality, games, and other topics.
 In 1997, Deep Blue beat the world chess champion, Garry Kasparov.
 In 2002, iRobot, founded by researchers at the MIT Artificial Intelligence Lab, introduced Roomba, a vacuum cleaning robot. By 2006, two million had been sold.

Foundations of Artificial Intelligence (Perspectives)


 Philosophy: This perspective of AI investigates foundational issues of knowledge and belief, and mutual knowledge (e.g., can a machine think?).
 Psychology and Cognitive Science e.g., problem solving skills
 Neuro-Science e.g., brain architecture
 Computer Science and Engineering e.g., complexity theory, algorithms, logic and
inference, programming languages, and system building.
 Mathematics and Physics e.g. statistical modeling, continuous mathematics etc.

SUB AREAS OF ARTIFICIAL INTELLIGENCE


1) Game Playing:

The Deep Blue chess program beat world champion Garry Kasparov.
2) Speech Recognition:

PEGASUS spoken language interface to American Airlines' EAASY SABRE reservation system,
which allows users to obtain flight information and make reservations over the telephone. The
1990s saw significant advances in speech recognition so that many systems are now successful.
3) Computer Vision:

Face recognition programs in use by banks, government, etc. The ALVINN system from CMU
autonomously drove a van from Washington, D.C. to San Diego (all but 52 of 2,849 miles),
averaging 63 mph day and night, and in all weather conditions. Handwriting recognition,
electronics and manufacturing inspection, photo interpretation, baggage inspection, reverse
engineering to automatically construct a 3D geometric model.
4) Expert Systems:

Application-specific systems that rely on obtaining the knowledge of human experts in an area and
programming that knowledge into a system. Examples are:

a. Diagnostic Systems: MYCIN system for diagnosing bacterial infections of the blood and
suggesting treatments. Intellipath pathology diagnosis system (AMA approved). Pathfinder
medical diagnosis system, which suggests tests and makes diagnoses. Whirlpool customer
assistance center.

b. System Configuration: DEC's XCON system for custom hardware configuration. Radiotherapy treatment planning.

c. Financial Decision Making: Credit card companies, mortgage companies, banks, and the
U.S. government employ AI systems to detect fraud and expedite financial transactions.
For example, AMEX credit check.

d. Classification Systems: Put information into one of a fixed set of categories using several
sources of information. e.g., financial decision making systems. NASA developed a system
for classifying very faint areas in astronomical images into either stars or galaxies with
very high accuracy by learning from human experts' classifications.
5) Mathematical Theorem Proving:
Use inference methods to prove new theorems.
6) Natural Language Understanding:
AltaVista's translation of web pages. Translation of Caterpillar Truck manuals into 20 languages.
7) Scheduling and Planning:

Automatic scheduling for manufacturing. DARPA's DART system used in Desert Storm and
Desert Shield operations to plan logistics of people and supplies. American Airlines rerouting
contingency planner. European Space Agency planning and scheduling of spacecraft assembly,
integration and verification.
8) Artificial Neural Networks
9) Machine Learning etc.

INTELLIGENT SYSTEMS
In order to design intelligent systems, it is important to categorize them into four categories (Luger and Stubblefield, 1993; Russell and Norvig, 2003):

1. Systems that think like humans.
2. Systems that think rationally.
3. Systems that behave like humans.
4. Systems that behave rationally.

Scientific Goal: To determine which ideas about knowledge representation, learning, rule systems, search, and so on, explain various sorts of real intelligence.

Engineering Goal: To solve real world problems using AI techniques such as Knowledge
representation, learning, rule systems, search, and so on.

Traditionally, computer scientists and engineers have been more interested in the engineering
goal, while psychologists, philosophers and cognitive scientists have been more interested in the
scientific goal.

1. Cognitive Science: Think Human-Like


a. Requires a model for human cognition. Precise enough models allow simulation by computers.
b. Focus is not just on behavior and I/O, but on the reasoning process itself.
c. Goal is not just to produce human-like behavior but to produce a sequence of steps of the reasoning process, similar to the steps followed by a human in solving the same task.

2. Laws of thought: Think Rationally


a. The study of mental faculties through the use of computational models; that is, the study of the computations that make it possible to perceive, reason, and act.
b. Focus is on inference mechanisms that are provably correct and guarantee an optimal solution.

c. Goal is to formalize the reasoning process as a system of logical rules and procedures of
inference.
d. Develop systems of representation that allow inferences such as
“Socrates is a man. All men are mortal. Therefore Socrates is mortal”

3. Turing Test: Act Human-Like


a. The art of creating machines that perform functions requiring intelligence when performed by people; that is, the study of how to make computers do things which, at the moment, people do better.
b. Focus is on action, not on intelligent behavior centered around the representation of the world.

Knowledge of the Turing Test


One of the earliest and most significant papers on machine intelligence, ‘Computing machinery
and intelligence’, was written by the British mathematician Alan Turing over fifty years ago
(Turing, 1950). However, it has stood up well to the test of time, and Turing’s approach remains
universal. Turing’s theoretical concept of the universal computer and his practical experience in
building code-breaking systems equipped him to approach the key fundamental question of
artificial intelligence. He asked: Is there thought without experience? Is there mind without
communication? Is there language without living? Is there intelligence without life? All these
questions are just variations on the fundamental question of artificial intelligence: Can machines think?
Turing did not provide definitions of machines and thinking; he simply avoided semantic arguments by inventing a game, the Turing imitation game. Instead of asking, ‘Can machines think?’, Turing said we should ask, ‘Can machines pass a behavior test for intelligence?’ He predicted that by the year 2000, a computer could be programmed to have a five-minute conversation with a human interrogator and would have a 30 per cent chance of deceiving the interrogator into believing it was a human.
Turing defined the intelligent behavior of a computer as the ability to achieve human-level
performance in cognitive tasks. In other words, a computer passes the test if interrogators cannot
distinguish the machine from a human on the basis of the answers to their questions.

Figure 1: Turing imitation game: phase 1

The imitation game proposed by Turing originally included two phases. In the first phase, shown
in Figure 1 above, the interrogator, a man and a woman are each placed in separate rooms and
can communicate only via a neutral medium such as a remote terminal. The interrogator’s
objective is to work out who is the man and who is the woman by questioning them. The rules of
the game are that the man should attempt to deceive the interrogator that he is the woman, while
the woman has to convince the interrogator that she is the woman.

Figure 2: Turing imitation game: phase 2

In the second phase of the game, shown in Figure 2, the man is replaced by a computer
programmed to deceive the interrogator as the man did. It would even be programmed to make
mistakes and provide fuzzy answers in the way a human would. If the computer can fool the
interrogator as often as the man did, we may say this computer has passed the intelligent behavior
test.

Physical simulation of a human is not important for intelligence. Hence, in the Turing test the
interrogator does not see, touch or hear the computer and is therefore not influenced by its
appearance or voice. However, the interrogator is allowed to ask any questions, even provocative ones, in order to identify the machine. The interrogator may, for example, ask both the human and
the machine to perform complex mathematical calculations, expecting that the computer will
provide a correct solution and will do it faster than the human. Thus, the computer will need to
know when to make a mistake and when to delay its answer. The interrogator also may attempt to
discover the emotional nature of the human, and thus, he might ask both subjects to examine a
short novel or poem or even painting. Obviously, the computer will be required here to simulate a
human’s emotional understanding of the work. The Turing test has two remarkable qualities that make it really universal:

 By maintaining communication between the human and the machine via terminals, the test gives us an objective standard view of intelligence. It avoids debates over the human nature of intelligence and eliminates any bias in favour of humans.
 The test itself is quite independent from the details of the experiment. It can be conducted
either as a two-phase game as just described, or even as a single-phase game in which the
interrogator needs to choose between the human and the machine from the beginning of
the test. The interrogator is also free to ask any question in any field and can concentrate
solely on the content of the answers provided.

4. Rational agent: Act Rationally


a. Tries to explain and emulate intelligent behavior in terms of computational processes; that is, it is concerned with the automation of intelligence.
b. Focus is on systems that act sufficiently well, if not optimally, in all situations.
c. Goal is to develop systems that are rational and sufficient.

The difference between strong AI and weak AI:


Strong AI makes the bold claim that computers can be made to think on a level (at least) equal to
humans. Weak AI simply states that some "thinking-like" features can be added to computers to
make them more useful tools... and this has already started to happen (witness expert systems,
drive-by-wire cars and speech recognition software).

AI Problems:
AI problems (speech recognition, NLP, vision, automatic programming, knowledge representation, etc.) can be paired with techniques (neural networks, search, Bayesian nets, production systems, etc.). AI problems can be classified into two types:
1. Common-place tasks (Mundane Tasks)
2. Expert tasks

Common-Place Tasks:
1. Recognizing people, objects.
2. Communicating (through natural language).
3. Navigating around obstacles on the streets.
These tasks are done matter-of-factly and routinely by people and some other animals.

Expert tasks:
1. Medical diagnosis.
2. Mathematical problem solving
3. Playing games like chess
These tasks cannot be done by all people, and can only be performed by skilled specialists. Clearly
tasks of the first type are easy for humans to perform, and almost all are able to master them. The
second range of tasks requires skill development and/or intelligence and only some specialists can
perform them well. However, when we look at what computer systems have been able to achieve
to date, we see that their achievements include performing sophisticated tasks like medical
diagnosis, performing symbolic integration, proving theorems and playing chess.

INTELLIGENT AGENTS
Agents and Environments

Fig 3: Agents and Environments.


Agent:

An Agent is anything that can be viewed as perceiving its environment through sensors and acting
upon that environment through actuators. A human agent has eyes, ears, and other organs for
sensors and hands, legs, mouth, and other body parts for actuators. A robotic agent might have
cameras and infrared range finders for sensors and various motors for actuators. A software agent
receives keystrokes, file contents, and network packets as sensory inputs and acts on the
environment by displaying on the screen, writing files, and sending network packets.
The agent function maps from percept histories to actions:

f : P* → A
• The agent program runs on the physical architecture to produce f
• Agent = architecture + program

Percept:
We use the term percept to refer to the agent's perceptual inputs at any given instant.

Percept Sequence:
An agent's percept sequence is the complete history of everything the agent has ever perceived.
Agent function:
Mathematically speaking, we say that an agent's behavior is described by the agent function that
maps any given percept sequence to an action.

Agent program:
Internally, the agent function for an artificial agent will be implemented by an agent program. It is
important to keep these two ideas distinct. The agent function is an abstract mathematical
description while the agent program is a concrete implementation, running on the agent
architecture.
To illustrate these ideas, we will use a very simple example: the vacuum-cleaner world shown in Figure 4 below.

Figure 4: A vacuum-cleaner world with just two locations.

This particular world has just two locations: squares A and B. The vacuum agent perceives which
square it is in and whether there is dirt in the square. It can choose to move left, move right, suck
up the dirt, or do nothing. One very simple agent function is the following: if the current square is
dirty, then suck, otherwise move to the other square. A partial tabulation of this agent function is
shown in Figure 5.

Agent function

Figure 5: Partial tabulation of a simple agent function for the vacuum-cleaner world shown in Figure 4

 Percepts: location and contents, e.g., [A, Dirty]


 Actions: Left, Right, Suck, NoOp
 Agent’s function table

Figure 6: The REFLEX-VACUUM-AGENT program is invoked for each new percept (location,
status) and returns an action each time.
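Below is a minimal Python sketch of such a reflex agent (an illustration, not the textbook's exact pseudocode); the percept format (location, status) and the action names follow the tabulation above.

def reflex_vacuum_agent(percept):
    # Return an action for the two-square vacuum world.
    # percept is a (location, status) pair, e.g. ('A', 'Dirty').
    location, status = percept
    if status == 'Dirty':
        return 'Suck'
    elif location == 'A':
        return 'Right'
    else:                      # location == 'B'
        return 'Left'

# Reproduce two rows of the tabulated agent function:
print(reflex_vacuum_agent(('A', 'Dirty')))   # -> Suck
print(reflex_vacuum_agent(('A', 'Clean')))   # -> Right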

AGENT ENVIRONMENT IN AI
An environment is everything in the world which surrounds the agent, but it is not a part of an
agent itself. An environment can be described as a situation in which an agent is present.

The environment is where the agent lives and operates, and it provides the agent with something to sense and act upon. An environment is mostly said to be non-deterministic.

Some programs operate in an entirely artificial environment confined to keyboard input, a database, computer file systems and character output on a screen.

In contrast, some software agents (software robots or softbots) exist in rich, unlimited softbots
domains. The simulator has a very detailed, complex environment. The software agent needs to
choose from a long array of actions in real time. For example, a softbot designed to scan the online
preferences of the customer and show interesting items to the customer works in the real as well
as an artificial environment.

The most famous artificial environment is the Turing Test environment, in which one real and one artificial agent are tested on equal ground. This is a very challenging environment, as it is highly difficult for a software agent to perform as well as a human.

Features of Environment
An environment can have various features from the point of view of an agent (Russell & Norvig, 2020):
Fully Observable vs Partially Observable
Static vs Dynamic
Discrete vs Continuous
Deterministic vs Stochastic
Single-agent vs Multi-agent
Episodic vs Sequential
Known vs Unknown
Accessible vs Inaccessible

1. Fully Observable vs Partially Observable:


If an agent sensor can sense or access the complete state of an environment at each point in time
from the percepts, then it is a fully observable environment, else it is partially observable. (In other words, the environment is fully observable if everything an agent requires to choose its actions is
available to it via its sensors. The environment is partially observable if parts of the environment
are inaccessible, which implies that an agent must make informed guesses about the world.)

A fully observable environment is easy to deal with, as there is no need to maintain an internal state to keep track of the history of the world. If an agent has no sensors at all, the environment is said to be unobservable. For example, a crossword puzzle is fully observable while poker is partially observable.

2. Deterministic vs Stochastic:
If an agent's current state and selected action can completely determine the next state of the environment, then such an environment is called a deterministic environment.

A stochastic environment is random in nature and cannot be determined completely by an agent.


In a deterministic, fully observable environment, the agent does not need to worry about uncertainty. For example, a crossword puzzle is deterministic while poker is stochastic.

3. Episodic vs Sequential:
In an episodic environment, there is a series of one-shot actions, and only the current percept is
required for the action. However, in a sequential environment, an agent requires memory of past actions to determine the next best action, i.e. the choice of the current action depends on previous actions. Episodic environments are much simpler because the agent does not need to think ahead. For example, crossword puzzles and poker are sequential while image analysis is episodic.

4. Single-agent vs Multi-agent
If only one agent is involved in an environment and is operating by itself, then such an environment is called a single-agent environment.

However, if multiple agents are operating in an environment, then such an environment is called a
multi-agent environment. The agent design problems in a multi-agent environment are different from those in a single-agent environment.

5. Static vs Dynamic:
If the environment can change while an agent is deliberating, then such an environment is called a dynamic environment; otherwise, it is called a static environment.
Static environments are easy to deal with because an agent does not need to keep looking at the world while deciding on an action. However, in a dynamic environment, agents need to keep looking at the world before each action. Alternatively, agents need to anticipate the change during deliberation or make decisions very fast.

Taxi driving is an example of a dynamic environment whereas Crossword puzzles are an example
of a static environment.

6. Discrete vs Continuous:
If in an environment there are a finite number of percepts and actions that can be performed within
it, then such an environment is called a discrete environment; otherwise, it is called a continuous environment.

A chess game comes under discrete environment as there is a finite number of moves that can be
performed. A self-driving car is an example of a continuous environment.

7. Known vs Unknown:
Known and unknown are not actually features of an environment; rather, they describe an agent's state of knowledge about the results of its actions.

In a known environment, the results of all actions are known to the agent, while in an unknown environment, the agent needs to learn how the environment works in order to perform an action.

It is quite possible for a known environment to be partially observable and for an unknown environment to be fully observable.

8. Accessible vs Inaccessible:
If an agent can obtain complete and accurate information about the environment's state, then such an environment is called an accessible environment; otherwise, it is called inaccessible.

An empty room whose state can be defined by its temperature is an example of an accessible
environment. Information about an event on earth is an example of inaccessible environment.

Rationality
Rationality is the state of being reasonable, sensible, and having good sense of judgment.
Rationality is concerned with expected actions and results depending upon what the agent has
perceived. Performing actions with the aim of obtaining useful information is an important part of
rationality.

Ideal Rational Agent


An ideal rational agent is the one, which is capable of doing expected actions to maximize its
performance measure, on the basis of:
1. Its percept sequence
2. Its built-in knowledge base
Rationality of an agent depends on the following:

1. The performance measures, which determine the degree of success.


2. Agent’s Percept Sequence.
3. The agent’s prior knowledge about the environment.
4. The actions that the agent can carry out.

A rational agent always performs the right action, where the right action is the one that causes the agent to be most successful given the percept sequence. The problem the agent solves is characterized by Performance Measure, Environment, Actuators, and Sensors (PEAS).

The Structure of Intelligent Agents


Agent’s structure can be viewed as:

 Agent = Architecture + Agent Program


 Architecture = the machinery that an agent executes on.
 Agent Program = an implementation of an agent function.

Types of Agent
1. Simple Reflex Agents

They choose actions only based on the current percept.


They are rational only if a correct decision can be made on the basis of the current percept alone.
Their environment is completely observable.
Condition-Action Rule − It is a rule that maps a state (condition) to an action.

Figure: Simple Reflex Agents

2. Model Based Reflex Agents

They use a model of the world to choose their actions. They maintain an internal state.
Model − knowledge about “how things happen in the world”.
Internal State − It is a representation of unobserved aspects of current state depending on percept
history.

Updating the state requires the information about:

 How the world evolves.


 How the agent’s actions affect the world.

Figure: Model Based Reflex Agents

3. Goal Based Agents

They choose their actions in order to achieve goals. The goal-based approach is more flexible than the reflex agent since the knowledge supporting a decision is explicitly modeled, thereby allowing for modifications.
Goal − It is the description of desirable situations.

Figure: Goal Based Agents

4. Utility Based Agents


They choose actions based on a preference (utility) for each state.
Goals are inadequate when −

There are conflicting goals, of which only a few can be achieved. Goals have some uncertainty of being achieved, and you need to weigh the likelihood of success against the importance of a goal.

Figure: Utility Based Agents

PEAS: Performance measure, Environment, Actuator, Sensor
PEAS System is used to categorize similar agents together. The PEAS system delivers the
performance measure with respect to the environment, actuators and sensors of the respective
agent. Most of the highest performing agents are Rational Agents.

Rational Agent: The rational agent considers all possibilities and chooses the most efficient action. For example, it chooses the shortest path with low cost for high efficiency.
PEAS stands for Performance measure, Environment, Actuator, Sensor.

Performance Measure: Performance measure is the unit to define the success of an agent.
Performance varies with agents based on their different percepts.
Environment: Environment is the surrounding of an agent at every instant. It keeps changing
with time if the agent is set in motion. There are 5 major types of environments:
Fully Observable & Partially Observable
Episodic & Sequential
Static & Dynamic
Discrete & Continuous
Deterministic & Stochastic
Actuator: Actuator is a part of the agent that delivers the output of an action to the environment.
Sensor: Sensors are the receptive parts of an agent which takes in the input for the agent.

Table: Examples of PEAS

Agent: Hospital Management System
  Performance Measure: Patient's health, admission process, payment
  Environment: Hospital, doctors, patients
  Actuators: Prescription, diagnosis, scan report
  Sensors: Symptoms, patient's response

Agent: Automated Car Drive
  Performance Measure: Comfortable trip, safety, maximum distance
  Environment: Roads, traffic, vehicles
  Actuators: Steering wheel, accelerator, brake, mirror
  Sensors: Camera, GPS, odometer, speedometer, engine sensors

Agent: Subject Tutoring
  Performance Measure: Maximize scores, improvement in students
  Environment: Classroom, desk, chair, board, staff, students
  Actuators: Smart displays, corrections, exercises
  Sensors: Eyes, ears, notebooks

Agent: Part-picking Robot
  Performance Measure: Percentage of parts in correct bins
  Environment: Conveyor belt with parts, bins
  Actuators: Jointed arm and hand
  Sensors: Camera, joint angle sensors

SEARCH AND CONSTRAINT SATISFACTION


Search is the systematic examination of states to find a path from the start/root state to the goal state.
Many traditional search algorithms are used in AI applications. For complex problems, the
traditional algorithms are unable to find the solution within some practical time and space limits.
Consequently, many special techniques are developed. Search techniques in AI are universal
problem-solving methods. Rational agents or Problem-solving agents in AI mostly use these
search strategies or algorithms to solve a specific problem and provide the best result. Problem-solving agents are the goal-based agents and use atomic representation.

Search Algorithm Terminologies:


o Search: Searching is a step-by-step procedure to solve a search problem in a given search space. A search problem can have three main factors:

a. Search Space: Search space represents a set of possible solutions, which a system may
have.
b. Start State: It is a state from where agent begins the search.
c. Goal test: It is a function which observes the current state and returns whether the goal state has been achieved or not.
o Search tree: A tree representation of search problem is called Search tree. The root of
the search tree is the root node which is corresponding to the initial state.
o Actions: It gives the description of all the available actions to the agent.
o Transition model: A description of what each action does; this can be represented as a transition model.
o Path Cost: It is a function which assigns a numeric cost to each path.
o Solution: It is an action sequence which leads from the start node to the goal node.
o Optimal Solution: A solution that has the lowest cost among all solutions.
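To make these terms concrete, here is a small, hypothetical Python representation of a search problem; the graph, step costs and state names are illustrative only, not taken from the text.

# Transition model with step costs: state -> [(successor, cost), ...]
graph = {
    'S': [('A', 1), ('B', 4)],
    'A': [('B', 2), ('G', 6)],
    'B': [('G', 3)],
    'G': [],
}

start_state = 'S'

def goal_test(state):
    # Returns whether the goal state has been achieved.
    return state == 'G'

def path_cost(path):
    # Sums the step costs along a path such as ['S', 'A', 'B', 'G'].
    return sum(dict(graph[s])[t] for s, t in zip(path, path[1:]))

print(path_cost(['S', 'A', 'B', 'G']))  # 1 + 2 + 3 = 6 (an optimal solution here)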

Properties of Search Algorithms:

Following are the four essential properties of search algorithms to compare the efficiency of
these algorithms:

Completeness: A search algorithm is said to be complete if it is guaranteed to return a solution whenever at least one solution exists for any input.

Optimality: If the solution found by an algorithm is guaranteed to be the best solution (lowest path cost) among all solutions, then it is said to be an optimal solution.
Time Complexity: Time complexity is a measure of time for an algorithm to complete its task.

Space Complexity: It is the maximum storage space required at any point during the search, expressed in terms of the complexity of the problem.

Types of Search Algorithms


Based on the search problem, we can classify search algorithms into uninformed search (blind search) and informed search (heuristic search) algorithms.

Figure: Types of Search Algorithms

Brute-Force Search/ Uninformed/Blind Search Methods


Uninformed search is a class of general-purpose search algorithms which operate in a brute-force way. Uninformed search algorithms do not have additional information about states or the search space other than how to traverse the tree.

The uninformed search does not contain any domain knowledge such as closeness or the location of the goal. It only includes information about how to traverse the tree and how to identify leaf and goal nodes. Uninformed search searches the tree without any information about the search space beyond the initial state, the operators and the goal test, so it is also called blind search. It examines each node of the tree until it achieves the goal node.

It can be divided into five main types:


o Breadth-first search
o Uniform cost search
o Depth-first search
o Iterative deepening depth-first search
o Bidirectional Search

1. Breadth-first Search (BFS):


Consider the state space of a problem that takes the form of a tree. Now, if we search the
goal along each breadth of the tree, starting from the root and continuing up to the largest
depth, we call it breadth first search.

o Breadth-first search is the most common search strategy for traversing a tree or graph.
This algorithm searches breadthwise in a tree or graph, so it is called breadth-first search.

o The BFS algorithm starts searching from the root node of the tree and expands all successor nodes at the current level before moving to nodes of the next level.
o The breadth-first search algorithm is an example of a general-graph search algorithm.
o Breadth-first search is implemented using FIFO queue data structure.

Advantages:
o BFS will provide a solution if any solution exists.

o If there is more than one solution for a given problem, then BFS will provide the minimal solution, i.e. the one requiring the least number of steps.

Disadvantages:
o It requires lots of memory since each level of the tree must be saved into memory to
expand the next level.
o BFS needs lots of time if the solution is far away from the root node.

BFS illustrated:
Step 1: Initially fringe contains only one node corresponding to the source state A.

Figure: Fringe containing the source state A.

Step 2: A is removed from fringe. The node is expanded, and its children B and C are generated.
They are placed at the back of fringe.

Figure: Fringe BC

Step 3: Node B is removed from fringe and is expanded. Its children D, E are generated and put
at the back of fringe.

Figure: Fringe CDE

Step 4: Node C is removed from fringe and is expanded. Its children D and G are added to the
back of fringe.

Figure: Fringe DEDG

Step 5: Node D is removed from fringe. Its children C and F are generated and added to the back
of fringe.

Figure: Fringe EDGCF


Step 6: Node E is removed from fringe. It has no children.

Figure: Fringe DGCF


Step 7: D is expanded; B and F are put in OPEN.

Figure: Fringe GCFBF

Step 8: G is selected for expansion. It is found to be a goal node. So the algorithm returns the
path A C G by following the parent pointers of the node corresponding to G. The algorithm
terminates.

The above example shows the traversal of the tree using the BFS algorithm from the root node A to the goal node G. The BFS algorithm traverses the tree in layers.

Time Complexity: The time complexity of the BFS algorithm can be obtained from the number of nodes traversed in BFS until the shallowest goal node, where d = the depth of the shallowest solution and b = the branching factor (the maximum number of successors of any node):

T(b) = 1 + b + b^2 + b^3 + ... + b^d = O(b^d)


Space Complexity: The space complexity of the BFS algorithm is given by the memory size of the frontier, which is O(b^d).

Completeness: BFS is complete, which means if the shallowest goal node is at some finite depth,
then BFS will find a solution.
Optimality: BFS is optimal if path cost is a non-decreasing function of the depth of the node.
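A compact Python sketch of BFS over an explicit graph follows; the adjacency list is hypothetical, chosen to mirror the fringe example above (goal G reached via A C G).

from collections import deque

def breadth_first_search(graph, start, goal):
    # FIFO queue of paths: the defining property of BFS.
    fringe = deque([[start]])
    visited = {start}
    while fringe:
        path = fringe.popleft()
        node = path[-1]
        if node == goal:
            return path
        for child in graph.get(node, []):
            if child not in visited:
                visited.add(child)
                fringe.append(path + [child])
    return None

graph = {'A': ['B', 'C'], 'B': ['D', 'E'], 'C': ['D', 'G'], 'D': ['C', 'F']}
print(breadth_first_search(graph, 'A', 'G'))  # ['A', 'C', 'G']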

2. Depth-First Search (DFS):


The goal is sometimes searched for along the largest depth of the tree; the search moves up only when further traversal along the depth is not possible. An attempt is then made to find alternative offspring of the parent of the node (state) last visited. If the nodes of a tree are visited using these principles to search for the goal, the traversal made is called depth-first traversal and, consequently, the search strategy is called depth-first search.
o Depth-first search is a recursive algorithm for traversing a tree or graph data structure.

o It is called the depth-first search because it starts from the root node and follows each path
to its greatest depth node before moving to the next path.

o DFS uses a stack data structure for its implementation.
o The process of the DFS algorithm is similar to the BFS algorithm.

Advantage:
o DFS requires very little memory, as it only needs to store a stack of the nodes on the path from the root node to the current node.

o It takes less time to reach the goal node than the BFS algorithm (if it traverses the right path).

Disadvantage:
o There is the possibility that many states keep re-occurring, and there is no guarantee of finding the solution.
o The DFS algorithm searches deep down a path, and it may sometimes descend into an infinite loop.

DFS illustrated:
Step 1: Initially fringe contains only the node for A.

Figure: Fringe containing the source state A.


Step 2: A is removed from fringe. A is expanded and its children B and C are put in front of
fringe.

Figure: Fringe BC
Step 3: Node B is removed from fringe, and its children D and E are pushed in front of fringe.

Figure: Fringe DEC
Step 4: Node D is removed from fringe. C and F are pushed in front of fringe.

Figure: Fringe CFEC


Step 5: Node C is removed from fringe. Its child G is pushed in front of fringe.

Figure: Fringe GFEC


Step 6: Node G is expanded and found to be a goal node.

Figure: Fringe GFEC
The solution path A-B-D-C-G is returned and the algorithm terminates.

Time Complexity: The time complexity of DFS is equivalent to the number of nodes traversed by the algorithm. It is given by:

T(n) = 1 + n + n^2 + n^3 + ... + n^m = O(n^m)

where m = the maximum depth of any node, which can be much larger than d (the depth of the shallowest solution).

Space Complexity: The DFS algorithm needs to store only a single path from the root node, hence the space complexity of DFS is equivalent to the size of the fringe set, which is O(bm).

Optimal: The DFS algorithm is non-optimal, as it may generate a large number of steps or a high cost to reach the goal node.
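A recursive Python sketch of DFS, in the same hypothetical adjacency-list style used for BFS above:

def depth_first_search(graph, start, goal, path=None, visited=None):
    # Follows each path to its greatest depth before backtracking.
    if path is None:
        path, visited = [start], {start}
    node = path[-1]
    if node == goal:
        return path
    for child in graph.get(node, []):
        if child not in visited:
            visited.add(child)
            result = depth_first_search(graph, start, goal, path + [child], visited)
            if result is not None:
                return result
    return None

graph = {'A': ['B', 'C'], 'B': ['D', 'E'], 'C': ['G'], 'D': ['F']}
print(depth_first_search(graph, 'A', 'G'))  # ['A', 'C', 'G'] after backtracking from B's subtree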

3. Iterative Deepening Depth-First Search (IDDFS):


Iterative Deepening Search (IDS)
The iterative deepening algorithm is a combination of DFS and BFS algorithms. In other words,
IDS is an iterative graph searching strategy that takes advantage of the completeness of the
Breadth-First Search (BFS) strategy but uses much less memory in each iteration (similar to
Depth-First Search). It searches each branch of a node from left to right until it reaches the required depth. Once it has, IDS goes back to the root node and explores a different branch, in a manner similar to DFS.

This search algorithm finds out the best depth limit and does it by gradually increasing the limit
until a goal is found. This algorithm performs depth-first search up to a certain "depth limit", and
it keeps increasing the depth limit after each iteration until the goal node is found.

Advantages:
o It combines the benefits of BFS and DFS search algorithm in terms of fast search and
memory efficiency.

Disadvantages:
o The main drawback of IDDFS is that it repeats all the work of the previous phase.
o The time taken to reach the goal node is exponential.

IDDFS illustrated:

As a general rule of thumb, we use iterative deepening when we do not know the depth of our
solution and have to search a very large state space.

Completeness:
This algorithm is complete if the branching factor is finite.

Time Complexity:
Suppose b is the branching factor and d is the depth of the shallowest goal; then the worst-case time complexity is O(b^d).
Space Complexity:
The space complexity of IDDFS is O(bd).
Optimal:
The IDDFS algorithm is optimal if path cost is a non-decreasing function of the depth of the node.
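A Python sketch of iterative deepening, assuming the same hypothetical graph style; depth_limited_search is an ordinary DFS cut off at a depth limit.

def depth_limited_search(graph, node, goal, limit, path):
    # DFS that never goes deeper than `limit` edges.
    if node == goal:
        return path
    if limit == 0:
        return None
    for child in graph.get(node, []):
        if child not in path:  # avoid cycles along the current path
            result = depth_limited_search(graph, child, goal, limit - 1, path + [child])
            if result is not None:
                return result
    return None

def iterative_deepening_search(graph, start, goal, max_depth=20):
    # Gradually increase the depth limit until the goal is found.
    for limit in range(max_depth + 1):
        result = depth_limited_search(graph, start, goal, limit, [start])
        if result is not None:
            return result
    return None

graph = {'A': ['B', 'C'], 'B': ['D', 'E'], 'C': ['G']}
print(iterative_deepening_search(graph, 'A', 'G'))  # ['A', 'C', 'G'] found at limit 2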

Informed Search Methods (Best First, A*)


Many traditional search algorithms are used in AI applications. For complex problems, the
traditional algorithms are unable to find the solution within some practical time and space limits.
Consequently, many special techniques are developed using heuristic functions. The algorithms
that use heuristic functions are called heuristic algorithms. Heuristic algorithms are not really
intelligent; they appear to be intelligent because they achieve better performance.

Heuristic algorithms are more efficient because they take advantage of feedback from the data to
direct the search path.

Heuristics and Admissibility


Heuristic function: A heuristic is a function that is used in informed search to find the most promising path. It takes the current state of the agent as its input and produces an estimate of how close the agent is to the goal. The heuristic method might not always give the best solution, but it is guaranteed to find a good solution in reasonable time. A heuristic function estimates how close a state is to the goal. It is represented by h(n), and it estimates the cost of an optimal path from state n to the goal. The value of the heuristic function is always non-negative.
Admissibility of the heuristic function is given as:
h(n) <= h*(n)

Here, h(n) is the heuristic estimate and h*(n) is the true cost of an optimal path from n to the goal. Hence, to be admissible, the heuristic must never overestimate the true cost.
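As a quick illustrative check (toy numbers, not from the text), admissibility can be verified by comparing h(n) against the true optimal cost h*(n) at every node:

# Hypothetical values: h = heuristic estimate, h_star = true optimal cost to the goal.
h      = {'S': 5, 'A': 2, 'B': 3, 'G': 0}
h_star = {'S': 6, 'A': 2, 'B': 3, 'G': 0}

admissible = all(h[n] <= h_star[n] for n in h)
print(admissible)  # True: the heuristic never overestimates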

Pure Heuristic Search:


Pure heuristic search is the simplest form of heuristic search algorithms. It expands nodes based
on their heuristic value h(n). It maintains two lists, OPEN and CLOSED. On the CLOSED list it places those nodes which have already been expanded, and on the OPEN list it places nodes which have not yet been expanded.

On each iteration, the node n with the lowest heuristic value is expanded, all its successors are generated, and n is placed on the CLOSED list. The algorithm continues until a goal state is found.

Two main algorithms of the informed search are given below:

1. Best First Search Algorithm(Greedy search)


Greedy best-first search algorithm always selects the path which appears best at that moment. It is
the combination of depth-first search and breadth-first search algorithms. It uses the heuristic
function and search. Best-first search allows us to take advantage of both algorithms.

The idea of Best First Search is to use an evaluation function to decide which adjacent node is
most promising and then explore. In other words, with the help of best-first search, at each step,
the most promising node is chosen. In the best first search algorithm, the node which is closest to
the goal node is expanded, and closeness is estimated by the heuristic function, i.e.
f(n) = h(n), where h(n) = estimated cost from node n to the goal.

The greedy best-first algorithm is implemented using a priority queue that orders nodes by their heuristic costs.
Best first search algorithm:
o Step 1: Place the starting node into the OPEN list.
o Step 2: If the OPEN list is empty, Stop and return failure.

o Step 3: Remove the node n with the lowest value of h(n) from the OPEN list, and place it in the CLOSED list.
o Step 4: Expand the node n, and generate the successors of node n.

o Step 5: Check each successor of node n, and find whether any node is a goal node or not.
If any successor node is a goal node, then return success and terminate the search, else
proceed to Step 6.

o Step 6: For each successor node, the algorithm computes the evaluation function f(n) and then checks whether the node is in either the OPEN or the CLOSED list. If the node is in neither list, add it to the OPEN list.
o Step 7: Return to Step 2.

Advantages:
o Best-first search can switch between BFS and DFS, thereby gaining the advantages of both algorithms.
o This algorithm is more efficient than BFS and DFS algorithms.

Disadvantages:
o This algorithm is not optimal.

o It is incomplete because it can start down an infinite path and never return to try other
possibilities.
o The worst-case time complexity for greedy search is O(b^m), where m is the maximum depth of the search space.
o Because greedy search retains all nodes in memory, its space complexity is the same as its time complexity.

Example:
Consider the search problem below, which we will traverse using best-first search. At each iteration, a node is expanded using the evaluation function f(n) = h(n).

Start from source "S" and search for goal "I" using given costs and Best First search.

The Priority_Queue initially contains S. Remove S and add the unvisited neighbors of S to the Priority_Queue.
The Priority_Queue now contains {A, C, B} (C is put before B because C has a lower cost).
Remove A from the Priority_Queue and add the unvisited neighbors of A.
The Priority_Queue now contains {C, B, E, D}.
Remove C from the Priority_Queue and add the unvisited neighbors of C.
The Priority_Queue now contains {B, H, E, D}.
Remove B from the Priority_Queue and add the unvisited neighbors of B.
The Priority_Queue now contains {H, E, D, F, G}.
Remove H from the Priority_Queue. Since our goal "I" is a neighbor of H, we return.

Time Complexity: The worst-case time complexity of best-first search is O(b^m).

Space Complexity: The worst-case space complexity of best-first search is O(b^m), where m is the maximum depth of the search space.
Complete: Best-first search is also incomplete, even if the given state space is finite.
Optimal: Best first search algorithm is not optimal.
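A Python sketch of greedy best-first search using a priority queue ordered by h(n); the graph and heuristic values below are hypothetical.

import heapq

def greedy_best_first_search(graph, h, start, goal):
    # OPEN is a priority queue keyed by the heuristic h(n) alone.
    open_list = [(h[start], [start])]
    closed = set()
    while open_list:
        _, path = heapq.heappop(open_list)
        node = path[-1]
        if node == goal:
            return path
        if node in closed:
            continue
        closed.add(node)
        for child in graph.get(node, []):
            if child not in closed:
                heapq.heappush(open_list, (h[child], path + [child]))
    return None

graph = {'S': ['A', 'B'], 'A': ['G'], 'B': ['G']}
h = {'S': 5, 'A': 2, 'B': 3, 'G': 0}
print(greedy_best_first_search(graph, h, 'S', 'G'))  # ['S', 'A', 'G']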

2. A* Search Algorithm
The Best First algorithm is a simplified form of the A* algorithm.

The A* search algorithm is a tree search algorithm that finds a path from a given initial node to a
given goal node (or one passing a given goal test). It employs a "heuristic estimate" which ranks
each node by an estimate of the best route that goes through that node. It visits the nodes in order
of this heuristic estimate. In other words, the A* algorithm searches for the shortest path between the initial and the final state.
In maps, the A* algorithm is used to calculate the shortest distance between the source (initial state) and the destination (final state).

The A* search algorithm is one of the best and most popular techniques used in path-finding and graph traversal.

A* is similar to greedy best-first search, but it is more accurate because A* takes into account the cost of the nodes that have already been traversed.

A* search properties:
o The algorithm A* is admissible. This means that provided a solution exists, the first
solution found by A* is an optimal solution. A* is admissible under the following
conditions:
o Heuristic function: for every node n, h(n) ≤ h*(n).
o A* is also complete.
o A* is optimally efficient for a given heuristic.
o A* is much more efficient than uninformed search.

A* algorithm has 3 parameters:

g: the cost of moving from the initial cell to the current cell; that is, the sum of the step costs incurred since leaving the first cell. In other words, it is the movement cost to move from the starting point to a given square on the grid, following the path generated to get there.

h: also called the heuristic value, it is the estimated cost of moving from the current cell to the final cell. The actual cost cannot be calculated until the final cell is reached; hence, h is an estimated cost. We must make sure that the cost is never overestimated.
f: it is the sum of g and h. So, f = g + h.

Advantages:
o The A* search algorithm is better than other search algorithms.
o A* search algorithm is optimal and complete.
o This algorithm can solve very complex problems.

Disadvantages:
o It does not always produce the shortest path, as it is mostly based on heuristics and approximation.
o A* search algorithm has some complexity issues.
o The main drawback of A* is its memory requirement: it keeps all generated nodes in memory, so it is not practical for many large-scale problems.

A* Search Algorithm illustrated:

The way that the algorithm makes its decisions is by taking the f-value into account. The algorithm
selects the smallest f-valued cell and moves to that cell. This process continues until the algorithm
reaches its goal cell. Notice how node B is never visited.
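A Python sketch of A* over a weighted graph follows; the graph, costs and heuristic values are hypothetical, and the heuristic shown is admissible for these costs.

import heapq

def a_star_search(graph, h, start, goal):
    # OPEN is a priority queue keyed by f(n) = g(n) + h(n).
    open_list = [(h[start], 0, [start])]          # (f, g, path)
    best_g = {start: 0}
    while open_list:
        f, g, path = heapq.heappop(open_list)
        node = path[-1]
        if node == goal:
            return path, g
        for child, cost in graph.get(node, []):
            new_g = g + cost
            if new_g < best_g.get(child, float('inf')):
                best_g[child] = new_g
                heapq.heappush(open_list, (new_g + h[child], new_g, path + [child]))
    return None

graph = {'S': [('A', 1), ('B', 4)], 'A': [('B', 2), ('G', 12)], 'B': [('G', 3)]}
h = {'S': 5, 'A': 4, 'B': 2, 'G': 0}
print(a_star_search(graph, h, 'S', 'G'))  # (['S', 'A', 'B', 'G'], 6)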

Constraint Satisfaction Problems

Sometimes a problem is not embedded in a long set of action sequences but requires picking the
best option from available choices. A good general-purpose problem solving technique is to list
the constraints of a situation (either negative constraints, like limitations, or positive elements that
can be found in the final solution). Then pick the choice that satisfies most of the constraints.

Formulation and Solution of Constraint Satisfaction Problems


Formally speaking, a constraint satisfaction problem (or CSP) is defined by a set of variables, X1, X2, ..., Xn, and a set of constraints, C1, C2, ..., Cm. Each variable Xi has a non-empty domain Di of possible values. Each constraint Ci involves some subset of the variables and specifies the allowable combinations of values for that subset. A state of the problem is defined by an assignment of values to some or all of the variables, {Xi = vi, Xj = vj, ...}. An assignment that does not violate any constraints is called a consistent or legal assignment. A complete assignment is one in which every variable is mentioned, and a solution to a CSP is a complete assignment that satisfies all the constraints. Some CSPs also require a solution that maximizes an objective function.
CSP can be given an incremental formulation as a standard search problem as follows:
1. Initial state: the empty assignment {}, in which all variables are unassigned.
2. Successor function: a value can be assigned to any unassigned variable, provided that it does
not conflict with previously assigned variables.
3. Goal test: the current assignment is complete.
4. Path cost: a constant cost for every step.

Example:
The map coloring problem: we are given the task of coloring each region red, green, or blue in such a way that no neighboring regions have the same color.
To formulate this as a CSP, we define the variables to be the regions: WA, NT, Q, NSW, V, SA, and T. The domain of each variable is the set {red, green, blue}.

The constraints require neighboring regions to have distinct colors: for example, the allowable
combinations for WA and NT are the pairs
{(red,green),(red,blue),(green,red),(green,blue),(blue,red),(blue,green)}.
(The constraint can also be represented as the inequality WA ≠ NT).

There are many possible solutions, such as {WA = red, NT = green, Q = red, NSW = green, V =
red, SA = blue, T = red}.
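A minimal Python sketch of backtracking search for this map-coloring CSP (a plain depth-first assignment of values consistent with the not-equal constraints; no heuristics or constraint propagation):

def backtracking_search(variables, domains, neighbors):
    def consistent(var, value, assignment):
        # A value is allowed if no assigned neighbor already has it.
        return all(assignment.get(n) != value for n in neighbors[var])

    def backtrack(assignment):
        if len(assignment) == len(variables):
            return assignment                     # complete, consistent assignment
        var = next(v for v in variables if v not in assignment)
        for value in domains[var]:
            if consistent(var, value, assignment):
                assignment[var] = value
                result = backtrack(assignment)
                if result is not None:
                    return result
                del assignment[var]               # undo and try the next value
        return None

    return backtrack({})

variables = ['WA', 'NT', 'SA', 'Q', 'NSW', 'V', 'T']
domains = {v: ['red', 'green', 'blue'] for v in variables}
neighbors = {'WA': ['NT', 'SA'], 'NT': ['WA', 'SA', 'Q'],
             'SA': ['WA', 'NT', 'Q', 'NSW', 'V'], 'Q': ['NT', 'SA', 'NSW'],
             'NSW': ['Q', 'SA', 'V'], 'V': ['SA', 'NSW'], 'T': []}
print(backtracking_search(variables, domains, neighbors))
# {'WA': 'red', 'NT': 'green', 'SA': 'blue', 'Q': 'red', 'NSW': 'green', 'V': 'red', 'T': 'red'}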

Figure: Map of Australia showing each of its states and territories

Constraint Graph: A CSP is usually represented as an undirected graph, called a constraint graph, where the nodes are the variables and the edges are the binary constraints.

Figure: Constraint graph: nodes are variables, arcs show constraints.


The map-coloring problem represented as a constraint graph.
CSP can be viewed as a standard search problem as follows:

 Initial state: the empty assignment {}, in which all variables are unassigned.
 Successor function: a value can be assigned to any unassigned variable, provided that it
does not conflict with previously assigned variables.

 Goal test: the current assignment is complete.
 Path cost: a constant cost (e.g., 1) for every step.

Adversarial Search
Adversarial search, or game-tree search, is a technique for analyzing an adversarial game in order
to try to determine who can win the game and what moves the players should make in order to
win. Adversarial search is one of the oldest topics in Artificial Intelligence. The original ideas for
adversarial search were developed by Shannon in 1950 and independently by Turing in 1951, in
the context of the game of chess—and their ideas still form the basis for the techniques used today.

Adversarial search is a search in which we examine the problem that arises when we try to plan ahead in a world where other agents are planning against us.

In previous topics, we have studied search strategies that are associated with only a single agent aiming to find a solution, often expressed in the form of a sequence of actions. But there might be situations where more than one agent is searching for a solution in the same search space; this situation usually occurs in game playing.
An environment with more than one agent is termed a multi-agent environment, in which each agent is an opponent of the others and plays against them. Each agent needs to consider the actions of the other agents and the effect of those actions on its own performance.

So, searches in which two or more players with conflicting goals are trying to explore the same search space for the solution are called adversarial searches, often known as games. Games are modelled as a search problem with a heuristic evaluation function; these are the two main factors that help to model and solve games in AI.

Types of Games in AI:

                        Deterministic                   Chance Moves
Perfect information     Chess, Checkers, Go, Othello    Backgammon, Monopoly
Imperfect information   Battleship, blind tic-tac-toe   Bridge, poker, Scrabble, nuclear war

Perfect information: A game with perfect information is one in which agents can see the complete board. Agents have all the information about the game, and they can see each other's moves as well. Examples are Chess, Checkers, Go, etc.

Imperfect information: If agents do not have all the information about the game and are not aware of what's going on, such games are called games with imperfect information. Examples are blind tic-tac-toe, Battleship, Bridge, etc.

Deterministic games: Deterministic games are those which follow a strict pattern and set of rules, with no randomness associated with them. Examples are Chess, Checkers, Go, tic-tac-toe, etc.

Non-deterministic games: Non-deterministic games are those which have various unpredictable events and a factor of chance or luck, introduced by either dice or cards. These are random, and each action's outcome is not fixed. Such games are also called stochastic games. Examples: Backgammon, Monopoly, Poker, etc.

Zero-Sum Game
Zero-sum games are adversarial searches which involve pure competition. In a zero-sum game, each agent's gain or loss of utility is exactly balanced by the losses or gains of utility of another agent. One player tries to maximize a single value, while the other player tries to minimize it. Each move by one player in the game is called a ply. Chess and tic-tac-toe are examples of zero-sum games.

Zero-sum game: Embedded thinking


Zero-sum games involve embedded thinking, in which one agent or player is trying to figure out:

 What to do.
 How to decide the move
 Needs to think about his opponent as well
 The opponent also thinks what to do

Each of the players is trying to find out the response of his opponent to their actions. This requires
embedded thinking or backward reasoning to solve the game problems in AI.

Minimax (2-Person Games)

o The Mini-max algorithm is a recursive or backtracking algorithm which is used in decision-making and game theory. It provides an optimal move for the player, assuming that the opponent is also playing optimally.
o Mini-Max algorithm uses recursion to search through the game-tree.

o The minimax algorithm is mostly used for game playing in AI, such as Chess, Checkers, tic-tac-toe, Go, and various other two-player games. This algorithm computes the minimax decision for the current state.
o In this algorithm two players play the game, one is called MAX and other is called MIN.
o Each player plays so that the opponent gets the minimum benefit while they themselves get the maximum benefit.
o Both players of the game are opponents of each other, where MAX will select the maximized value and MIN will select the minimized value.
o The minimax algorithm performs a depth-first search algorithm for the exploration of the
complete game tree.
o The minimax algorithm proceeds all the way down to the terminal nodes of the tree, then backtracks up the tree as the recursion unwinds.

MiniMax Terminologies:

 Players are called: Max and Min.


 Initial State: Includes board position and whose turn it is.
 Operators: These correspond to legal moves.
 Terminal Test: A test applied to a board position which determines whether the game is
over. In chess, for example, this would be a checkmate or stalemate situation.
 Utility Function: Is a measurement of how valuable the final result in that tree is.
Alternatively, it is known as a function which assigns a numeric value to a terminal state.
For example, in chess the outcome is win (+1), lose (-1) or draw (0). Note that by
convention, we always measure utility relative to Max.

MiniMax Algorithm:
1. Generate the whole game tree.
2. Apply the utility function to leaf nodes to get their values.
3. Use the utility of nodes at level n to derive the utility of nodes at level n-1.
4. Continue backing up values towards the root (one layer at a time).
5. Eventually the backed up values reach the top of the tree, at which point Max chooses the move
that yields the highest value. This is called the minimax decision because it maximises the utility
for Max on the assumption that Min will play perfectly to minimise it.
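The minimax decision procedure above can be sketched compactly in code. Below is a minimal illustrative sketch in Python (not the only way to implement it), assuming the game tree is represented as nested lists in which an internal node is a list of children and a leaf is its utility value; the toy tree at the end is an assumption made for the demo.

def minimax(node, maximizing):
    # Terminal test: a leaf holds its utility value.
    if isinstance(node, int):
        return node
    # Recursively evaluate children, alternating MAX and MIN levels.
    values = [minimax(child, not maximizing) for child in node]
    return max(values) if maximizing else min(values)

tree = [[3, 12], [8, 2]]        # MAX to move at the root, MIN replies below
print(minimax(tree, True))      # 3: MAX picks the left branch (min(3,12)=3 beats min(8,2)=2)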

Working of Min-Max Algorithm:
o The working of the minimax algorithm can be easily described using an example. Below is an example of a game-tree which represents a two-player game.
o In this example, there are two players: one is called Maximizer and the other is called Minimizer.
o Maximizer will try to get the maximum possible score, and Minimizer will try to get the minimum possible score. In short, it means: maximize your own win while your opponent tries to minimize it.
o This algorithm applies DFS, so in this game-tree we have to go all the way down to the leaves to reach the terminal nodes.
o At the terminal nodes the terminal values are given, so we will compare those values and backtrack up the tree until the initial state is reached. The following are the main steps involved in solving the two-player game tree:

Step 1: In the first step, the algorithm generates the entire game-tree and applies the utility function to get the utility values for the terminal states. In the tree diagram below, let's take A as the initial state of the tree. Suppose the maximizer takes the first turn, which has a worst-case initial value of -infinity, and the minimizer takes the next turn, which has a worst-case initial value of +infinity.

Step 2: Now, first we find the utility values for the Maximizer. Its initial value is -∞, so we compare each terminal value with the Maximizer's initial value and determine the higher node values. It will find the maximum among them all.
o For node D: max(-1, -∞) = -1, then max(-1, 4) = 4
o For node E: max(2, -∞) = 2, then max(2, 6) = 6
o For node F: max(-3, -∞) = -3, then max(-3, -5) = -3
o For node G: max(0, -∞) = 0, then max(0, 7) = 7

Step 3: In the next step, it's the minimizer's turn, so it will compare all node values with +∞ and determine the third-layer node values.
o For node B= min(4,6) = 4
o For node C= min(-3, 7) = -3

Step 4: Now it's the Maximizer's turn, and it will again choose the maximum of all node values and find the maximum value for the root node. In this game-tree there are only 4 layers, so we reach the root node quickly, but in real games there will most likely be many more layers.
For node A: max(4, -3) = 4
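The backed-up values from these steps can be checked with plain max/min calls in Python; the variable names below mirror the nodes in the example tree.

D = max(-1, 4)    # 4   (maximizer's layer)
E = max(2, 6)     # 6
F = max(-3, -5)   # -3
G = max(0, 7)     # 7
B = min(D, E)     # 4   (minimizer's layer)
C = min(F, G)     # -3
A = max(B, C)     # 4   (root: the maximizer chooses the move towards B)
print(A)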

Properties of MiniMax algorithm:
Complete- The MiniMax algorithm is complete. It will definitely find a solution (if one exists) in a finite search tree.
Optimal- The Min-Max algorithm is optimal if both opponents play optimally.

Time complexity- As it performs DFS over the game-tree, the time complexity of the Min-Max algorithm is O(b^m), where b is the branching factor of the game-tree and m is the maximum depth of the tree.

Space Complexity- The space complexity of the Mini-max algorithm is similar to that of DFS, which is O(bm).

Limitation of the minimax Algorithm:


The main drawback of the minimax algorithm is that it gets really slow for complex games such as Chess, Go, etc. These types of games have a huge branching factor, and the player has lots of choices to decide between; hence it is not always feasible to traverse the entire tree.

Alpha-Beta Pruning
Pruning: eliminating a branch of the search tree from consideration without exhaustive
examination of each node.

• α-β Pruning: the basic idea is to prune portions of the search tree that cannot improve the utility
value of the max or min node, by just considering the values of nodes seen so far. Alpha-beta
pruning is used on top of minimax search to detect paths that do not need to be explored. In other
words, alpha-beta pruning is a modified version of the minimax algorithm. It is an optimization
technique for the minimax algorithm.
Terminologies:

• The MAX player is always trying to maximize the score. Call this α.
• The MIN player is always trying to minimize the score. Call this β.

• Beta cut-off: Given a Max node n, cut off the search below n (i.e., don't generate or examine any more of n's children) if alpha(n) >= beta(n).

• Alpha cut-off: Given a Min node n, cut off the search below n (i.e., don't generate or examine any more of n's children) if beta(n) <= alpha(n).
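As a rough sketch of how these cut-offs work in code, here is a minimal Python version layered on the earlier minimax sketch; it assumes the same nested-list tree representation, and the toy tree at the end is an assumption made for the demo.

import math

def alphabeta(node, alpha, beta, maximizing):
    if isinstance(node, int):              # terminal test: leaf utility
        return node
    if maximizing:
        value = -math.inf
        for child in node:
            value = max(value, alphabeta(child, alpha, beta, False))
            alpha = max(alpha, value)
            if alpha >= beta:              # cut-off: MIN above will never allow this
                break                      # prune the remaining children
        return value
    else:
        value = math.inf
        for child in node:
            value = min(value, alphabeta(child, alpha, beta, True))
            beta = min(beta, value)
            if beta <= alpha:              # cut-off: MAX above will never allow this
                break
        return value

tree = [[3, 5], [2, [9, 7]]]
print(alphabeta(tree, -math.inf, math.inf, True))   # 3; the subtree [9, 7] is pruned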

Example 1:
1. Setup phase:

Assign to each left-most (or right-most) internal node of the tree, variables: alpha = -infinity (-∞),
beta = +infinity (+∞).

2. Look at the first computed final configuration value. It's a 3. The parent is a min node, so set the beta (min) value to 3.

3. Look at the next value, 5. Since the parent is a min node, we want the minimum of 3 and 5, which is 3. The parent min node is done – fill in the alpha (max) value of its parent max node. Always set alpha for max nodes and beta for min nodes. Copy the state of the max parent node into the second unevaluated min child.

4. Look at the next value, 2. Since the parent node is a min node with β = +∞ and 2 is smaller, change beta to 2.

5. Now the min parent node has a max value of 3 and a min value of 2. The value of the 2nd child does not matter: if it is >2, 2 will be selected for the min node; if it is <2, it will be selected for the min node, but since it is <3 it will not get selected by the parent max node. Thus, we prune the right subtree of the min node. Propagate the max value up the tree.

6. The max node is now done, and we can set the beta value of its parent and propagate the node state to the sibling subtree's left-most path.

7. The next node is 10. Since 10 is not smaller than 3, the state of the parent does not change. We still have to look at the 2nd child since alpha is still -∞.

8. The next node is 4. The smallest value goes to the parent min node. The min subtree is done, so the parent max node gets the alpha (max) value from the child. Note that if the max node had a 2nd subtree, we could prune it since α > β.

9. Continue propagating the values up the tree, modifying the corresponding alpha/beta values. Also propagate the state of the root node down the left-most path of the right subtree.

10. Next value is a 2. We set the beta (min) value of the min parent to 2. Since no other children
exist, we propagate the value up the tree.

11. We have a value for the 3rd-level max node, so now we can modify the beta (min) value of the min parent to 2. Now we have a situation where α > β, and thus the value of the rightmost subtree of the min node does not matter, so we prune the whole subtree.

12. Finally, no more nodes remain; we propagate the values up the tree. The root has a value of 3, which comes from the left-most child. Thus, the player should choose the left-most child's move in order to maximize his/her winnings. As you can see, the result is the same as with the minimax example, but we did not visit all nodes of the tree.

ADVANCED SEARCH
Genetic Algorithm
A genetic algorithm is a search heuristic that is inspired by Charles Darwin’s theory of natural
evolution. This algorithm reflects the process of natural selection where the fittest individuals are
selected for reproduction in order to produce offspring of the next generation.

Genetic algorithms are a type of optimization algorithm, meaning they are used to find the optimal
solution(s) to a given computational problem that maximizes or minimizes a particular function.
Genetic algorithms represent one branch of the field of study called evolutionary computation, in
that they imitate the biological processes of reproduction and natural selection to solve for the
‘fittest’ solutions. As in evolution, many of a genetic algorithm’s processes are random; however, this optimization technique allows one to set the level of randomization and the level of control. These algorithms can be far more powerful and efficient than random search and exhaustive search algorithms, yet they require no extra information (such as gradients) about the given problem.

A genetic algorithm (or GA) is a search technique used in computing to find true or approximate
solutions to optimization and search problems. GAs are categorized as global search heuristics.

GAs are a particular class of evolutionary algorithms that use techniques inspired by evolutionary
biology such as inheritance, mutation, selection, and crossover (also called recombination).

Natural Selection
The process of natural selection starts with the selection of fittest individuals from a population.
They produce offspring which inherit the characteristics of the parents and will be added to the
next generation. If parents have better fitness, their offspring will be better than parents and have
a better chance at surviving. This process keeps on iterating and at the end, a generation with the
fittest individuals will be found.

This notion can be applied to a search problem. We consider a set of solutions for a problem and select the best ones out of them. Five phases are considered in a genetic algorithm:
Initial population
Fitness function
Selection
Crossover
Mutation

1. Initial Population
The process begins with a set of individuals which is called a Population. Each individual is a
solution to the problem you want to solve.
An individual is characterized by a set of parameters (variables) known as Genes. Genes are
joined into a string to form a Chromosome (solution).
In a genetic algorithm, the set of genes of an individual is represented using a string, in terms of
an alphabet. Usually, binary values are used (string of 1s and 0s). We say that we encode the
genes in a chromosome.

2. Fitness Function
The fitness function determines how fit an individual is (the ability of an individual to compete
with other individuals). It gives a fitness score to each individual. The probability that an individual
will be selected for reproduction is based on its fitness score.

3. Selection
The idea of selection phase is to select the fittest individuals and let them pass their genes to the
next generation. Two pairs of individuals (parents) are selected based on their fitness scores.
Individuals with high fitness have more chance to be selected for reproduction.

4. Crossover
Crossover is the most significant phase in a genetic algorithm. For each pair of parents to be
mated, a crossover point is chosen at random from within the genes.
For example, consider the crossover point to be 3 as shown below.

Figure: The crossover point, and exchanging genes among parents.

Offspring are created by exchanging the genes of parents among themselves until the crossover
point is reached. The new offspring are added to the population.

Figure: The new offspring.

5. Mutation
In certain new offspring formed, some of their genes can be subjected to a mutation with a low
random probability. This implies that some of the bits in the bit string can be flipped. Mutation
occurs to maintain diversity within the population and prevent premature convergence.

Termination:
The algorithm terminates if the population has converged (does not produce offspring which are
significantly different from the previous generation). Then it is said that the genetic algorithm
has provided a set of solutions to our problem.

Note:
The population has a fixed size. As new generations are formed, individuals with least fitness
die, providing space for new offspring.
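To make the five phases concrete, here is a minimal illustrative sketch of a genetic algorithm in Python for the OneMax problem (maximize the number of 1s in a bit string); the chromosome length, population size, mutation rate, and fitness function are all assumptions chosen only for this example.

import random

GENES = 20            # chromosome length
POP_SIZE = 30
MUTATION_RATE = 0.01
GENERATIONS = 50

def fitness(chrom):
    return sum(chrom)                         # fitness = number of 1s

def select(pop):
    # Fitness-proportionate (roulette-wheel) selection of one parent;
    # the +1 avoids a population of all-zero weights.
    return random.choices(pop, weights=[fitness(c) + 1 for c in pop], k=1)[0]

def crossover(p1, p2):
    point = random.randint(1, GENES - 1)      # random crossover point
    return p1[:point] + p2[point:]            # exchange genes at the point

def mutate(chrom):
    # Flip each bit with a small probability to maintain diversity.
    return [1 - g if random.random() < MUTATION_RATE else g for g in chrom]

pop = [[random.randint(0, 1) for _ in range(GENES)] for _ in range(POP_SIZE)]
for _ in range(GENERATIONS):
    pop = [mutate(crossover(select(pop), select(pop))) for _ in range(POP_SIZE)]
best = max(pop, key=fitness)
print(fitness(best), best)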

Applications of Genetic Algorithm

Optimization − Genetic algorithms are most commonly used in optimization problems, wherein we have to maximize or minimize a given objective function value under a given set of constraints. The approach to solving optimization problems is highlighted throughout this section.

Economics − GAs are also used to characterize various economic models like the cobweb model,
game theory equilibrium resolution, asset pricing, etc.

Neural Networks − GAs are also used to train neural networks, particularly recurrent neural
networks.
Parallelization − GAs also have very good parallel capabilities, and prove to be very effective
means in solving certain problems, and also provide a good area for research.

Image Processing − GAs are used for various digital image processing (DIP) tasks as well like
dense pixel matching.

Scheduling applications − GAs are used to solve various scheduling problems as well, particularly the timetabling problem.

Robot Trajectory Generation − GAs have been used to plan the path which a robot arm takes in moving from one point to another.
Parametric Design of Aircraft − GAs have been used to design aircraft by varying the parameters and evolving better solutions.

DNA Analysis − GAs have been used to determine the structure of DNA using spectrometric
data about the sample.

Multimodal Optimization − GAs are very good approaches for multimodal optimization, in which we have to find multiple optimum solutions.

Example:
The Traveling Salesman Problem:
The aim is to find a tour of a given set of cities so that each city is visited exactly once and the total distance traveled is minimized.
 Representation: is an ordered list of city numbers known as an order-based GA.
1) London 3) Dunedin 5) Beijing 7) Tokyo
2) Venice 4) Singapore 6) Phoenix 8) Victoria
CityList1 (3 5 7 2 1 6 4 8)
CityList2 (2 5 7 6 8 1 3 4)
 Crossover: combines inversion and recombination:
Parent1 (3 5 7 2 1 6 4 8)
Parent2 (2 5 7 6 8 1 3 4)
______________________________
Child (2 5 7 6 8 1 3 4)
This operator is called the Order1 crossover.
 Mutation: involves reordering of the list:

Before: (5 8 7 2 1 6 3 4)
After: (5 8 6 2 1 7 3 4)

Simulated Annealing
The Simulated Annealing algorithm is based upon physical annealing in real life. Physical annealing is the process of heating up a material until it reaches an annealing temperature; it is then cooled down slowly in order to change the material to a desired structure.

Annealing is a thermal process for obtaining low-energy states of a solid in a heat bath. The process contains two steps:

1. Increase the temperature of the heat bath to a maximum value at which the solid melts.
2. Decrease carefully the temperature of the heat bath until the particles arrange themselves in
the ground state of the solid. Ground state is a minimum energy state of the solid.

The ground state of the solid is obtained only if the maximum temperature is high enough and the
cooling is done slowly.

Simulated Annealing (SA) is used for optimizing parameters in a model. This process is very useful in situations with many local minima, where an algorithm like Gradient Descent would get stuck.

In such problems, if Gradient Descent starts at a poor starting point, it gets stuck in a local minimum and is not able to reach the global minimum.
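A minimal sketch of simulated annealing in Python, minimizing an assumed bumpy one-dimensional function; the objective, the neighbour move, and the cooling schedule are illustrative choices, not a prescribed recipe.

import math
import random

def objective(x):
    # A curve with several local minima (an assumption for the demo).
    return x * x + 10 * math.sin(3 * x)

x = random.uniform(-5, 5)                 # random starting point
temp = 10.0
while temp > 1e-3:
    candidate = x + random.uniform(-0.5, 0.5)      # small random neighbour move
    delta = objective(candidate) - objective(x)
    # Always accept improvements; accept worse moves with probability
    # exp(-delta / temp), which shrinks as the temperature cools.
    if delta < 0 or random.random() < math.exp(-delta / temp):
        x = candidate
    temp *= 0.99                          # slow geometric cooling
print(x, objective(x))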

Knowledge Representation
Humans are best at understanding, reasoning, and interpreting knowledge. A human knows things, which is knowledge, and according to this knowledge performs various actions in the real world. How machines do all these things comes under knowledge representation and reasoning. Hence we can describe knowledge representation as follows:

Knowledge representation and reasoning (KR, KRR) is the part of Artificial Intelligence which is concerned with how AI agents think and how thinking contributes to the intelligent behavior of agents.

It is responsible for representing information about the real world so that a computer can understand it and can utilize this knowledge to solve complex real-world problems, such as diagnosing a medical condition or communicating with humans in natural language.

It is also a way which describes how we can represent knowledge in artificial intelligence.
Knowledge representation is not just storing data into some database, but it also enables an
intelligent machine to learn from that knowledge and experiences so that it can behave intelligently
like a human.

Types of Knowledge
Knowledge is awareness or familiarity gained by experiences of facts, data, and situations.
Following are the types of knowledge in artificial intelligence:

1. Declarative Knowledge:
Declarative knowledge is to know about something.
It includes concepts, facts, and objects.
It is also called descriptive knowledge and expressed in declarative sentences.
It is simpler than procedural knowledge.

2. Procedural Knowledge:
It is also known as imperative knowledge.
Procedural knowledge is a type of knowledge which is responsible for knowing how to do
something.
It can be directly applied to any task.
It includes rules, strategies, procedures, agendas, etc.

3. Meta-knowledge:
Knowledge about the other types of knowledge is called Meta-knowledge.

4. Heuristic knowledge:
Heuristic knowledge represents the knowledge of some experts in a field or subject.
Heuristic knowledge consists of rules of thumb based on previous experiences and awareness of approaches, which tend to work well but are not guaranteed.

5. Structural knowledge:
Structural knowledge is basic knowledge used in problem-solving.
It describes relationships between various concepts such as kind of, part of, and grouping of
something.
It describes the relationship that exists between concepts or objects.

The Relation between Knowledge and Intelligence


Knowledge of the real world plays a vital role in intelligence, and the same holds for creating artificial intelligence. Knowledge plays an important role in demonstrating intelligent behavior in AI agents. An agent is only able to act accurately on some input when it has some knowledge or experience about that input.

Techniques of Knowledge Representation


There are mainly four ways of knowledge representation which are given as follows:

Logical Representation
Semantic Network Representation
Frame Representation
Production Rules

1. Logical Representation
Logical representation is a language with some concrete rules which deals with propositions and
has no ambiguity in representation. Logical representation means drawing a conclusion based on
various conditions. This representation lays down some important communication rules. It consists
of precisely defined syntax and semantics which support sound inference. Each sentence can be translated into logic using its syntax and semantics.

Syntax:
o Syntax consists of the rules which decide how we can construct legal sentences in the logic.
o It determines which symbols we can use in knowledge representation and how to write those symbols.
Semantics:
o Semantics are the rules by which we can interpret the sentence in the logic.
o Semantic also involves assigning a meaning to each sentence.

Logical representation can be categorized into mainly two logics:


1. Propositional Logics
2. Predicate logics

Advantages of logical representation:
1. Logical representation enables us to do logical reasoning.
2. Logical representation is the basis for the programming languages.
Disadvantages of logical Representation:
1. Logical representations have some restrictions and are challenging to work with.
2. Logical representation technique may not be very natural, and inference may not be so
efficient.

2. Semantic Network Representation


Semantic networks are an alternative to predicate logic for knowledge representation. In semantic
networks, we can represent our knowledge in the form of graphical networks. This network
consists of nodes representing objects and arcs which describe the relationship between those
objects. Semantic networks can categorize the object in different forms and can also link those
objects. Semantic networks are easy to understand and can be easily extended.
This representation consists of mainly two types of relations:
o IS-A relation (Inheritance)
o Kind-of-relation
Example:
Following are some statements which we need to represent in the form of nodes and arcs.

Statements:
a. Jerry is a cat.
b. Jerry is a mammal
c. Jerry is owned by Priya.
d. Jerry is white colored.
e. All Mammals are animal.

When these statements are drawn as a diagram, the different types of knowledge are represented in the form of nodes and arcs, and each object is connected to another object by some relation.
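As a rough sketch, the Jerry network can be stored as a list of (node, relation, node) triples and queried by following IS-A arcs; the relation names below are assumptions made for illustration.

network = [
    ("Jerry", "is-a", "Cat"),
    ("Jerry", "is-a", "Mammal"),
    ("Jerry", "owned-by", "Priya"),
    ("Jerry", "has-color", "White"),
    ("Mammal", "is-a", "Animal"),
]

def isa_closure(node):
    # Collect everything reachable from node via IS-A arcs (inheritance).
    found, frontier = set(), [node]
    while frontier:
        n = frontier.pop()
        for subj, rel, obj in network:
            if subj == n and rel == "is-a" and obj not in found:
                found.add(obj)
                frontier.append(obj)
    return found

print(isa_closure("Jerry"))   # {'Cat', 'Mammal', 'Animal'}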

Advantages of Semantic network:


1. Semantic networks are a natural representation of knowledge.
2. Semantic networks convey meaning in a transparent manner.
3. These networks are simple and easily understandable.

Disadvantages of Semantic representation:

1. Semantic networks take more computational time at runtime, as we may need to traverse the complete network tree to answer a question. In the worst case, we may traverse the entire tree only to find that the solution does not exist in the network.
2. Semantic networks try to model human-like memory (which has about 10^15 neurons and links) to store the information, but in practice it is not possible to build such a vast semantic network.
3. These types of representations are inadequate as they do not have any equivalent quantifier,
e.g., for all, for some, none, etc.
4. Semantic networks do not have any standard definition for the link names.
5. These networks are not intelligent and depend on the creator of the system.

3. Frame Representation
A frame is a record-like structure which consists of a collection of attributes and their values to describe an entity in the world. Frames are an AI data structure which divides knowledge into substructures by representing stereotyped situations. A frame consists of a collection of slots and slot values. These slots may be of any type and size. Slots have names and values, which are called facets.

Facets: The various aspects of a slot are known as facets. Facets are features of frames which enable us to put constraints on the frames. Example: IF-NEEDED facets are called when the data of a particular slot is needed. A frame may consist of any number of slots, a slot may include any number of facets, and facets may have any number of values. A frame is also known as slot-filler knowledge representation in artificial intelligence.

Frames are derived from semantic networks and later evolved into our modern-day classes and objects. A single frame is not very useful on its own; a frame system consists of a collection of connected frames. In a frame, knowledge about an object or event can be stored together in the knowledge base. The frame is a technology which is widely used in various applications, including natural language processing and machine vision.

Example:
Let's take an example of a frame for a book

Slots        Fillers

Title        Artificial Intelligence
Genre        Computer Science
Author       Peter Norvig
Edition      Third Edition
Year         1996
Page         1152
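A frame like this maps naturally onto a dictionary of slots and fillers; the snippet below is only an illustrative sketch, not a full frame system with facets and defaults.

book_frame = {
    "Title": "Artificial Intelligence",
    "Genre": "Computer Science",
    "Author": "Peter Norvig",
    "Edition": "Third Edition",
    "Year": 1996,
    "Page": 1152,
}
print(book_frame["Author"])   # look up the filler of the Author slot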

Advantages of frame representation:


1. The frame knowledge representation makes the programming easier by grouping the
related data.
2. The frame representation is comparably flexible and used by many applications in AI.
3. It is very easy to add slots for new attribute and relations.
4. It is easy to include default data and to search for missing values.
5. Frame representation is easy to understand and visualize.

Disadvantages of frame representation:


1. In a frame system, the inference mechanism is not easily processed.
2. The inference mechanism cannot proceed smoothly in frame representation.
3. Frame representation takes a very generalized approach.

4. Production Rules
A production rule system consists of (condition, action) pairs which mean: "If condition then action". It has mainly three parts:

1. The set of production rules
2. Working memory
3. The recognize-act cycle

In production rules, agent checks for the condition and if the condition exists then production rule
fires and corresponding action is carried out. The condition part of the rule determines which rule
may be applied to a problem. And the action part carries out the associated problem-solving steps.
This complete process is called a recognize-act cycle.

The working memory contains the description of the current state of problem-solving, and rules can write knowledge to the working memory. This knowledge may then match and fire other rules.
If a new situation (state) is generated, multiple production rules may be triggered together; the set of such rules is called the conflict set. In this situation, the agent needs to select a rule from the set, and this selection is called conflict resolution.

Example:
o IF (at bus stop AND bus arrives) THEN action (get into the bus)

o IF (on the bus AND paid AND empty seat) THEN action (sit down).
o IF (on bus AND unpaid) THEN action (pay charges).
o IF (bus arrives at destination) THEN action (get down from the bus).
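A minimal sketch of the recognize-act cycle for this bus example, assuming facts are plain strings in a working-memory set and each rule is a (condition, action) pair; conflict resolution here simply picks the first matching rule.

working_memory = {"at bus stop", "bus arrives"}

rules = [
    ({"at bus stop", "bus arrives"}, "get into the bus"),
    ({"on the bus", "paid", "empty seat"}, "sit down"),
    ({"on the bus", "unpaid"}, "pay charges"),
]

# Recognize: rules whose conditions are all present form the conflict set.
conflict_set = [action for condition, action in rules
                if condition <= working_memory]
# Resolve and act: pick one rule from the conflict set and fire it.
if conflict_set:
    print("Firing rule:", conflict_set[0])   # -> get into the bus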

Advantages of Production rule:


1. The production rules are expressed in natural language.
2. The production rules are highly modular, so we can easily remove, add or modify an
individual rule.
Disadvantages of Production rule:
1. The production rule system does not exhibit any learning capabilities, as it does not store the results of solved problems for future use.
2. During the execution of the program, many rules may be active; hence rule-based production systems can be inefficient.

Propositional Logic
Propositional logic is the branch of logic that studies ways of joining and/or modifying entire
propositions, statements or sentences to form more complicated propositions, statements or
sentences, as well as the logical relationships and properties that are derived from these methods
of combining or altering statements. A proposition is a declarative statement which is either true
or false. It is a technique of knowledge representation in logical and mathematical form.

Example:
1. It is Sunday.
2. The Sun rises from West (False proposition)
3. 3+3= 7(False proposition)
4. 5 is a prime number.
Basic facts about propositional logic:
o Propositional logic is also called Boolean logic as it works on 0 and 1.
o In propositional logic, we use symbolic variables to represent the logic, and we can use any symbol to represent a proposition, such as A, B, C, P, Q, R, etc.
o Propositions can be either true or false, but it cannot be both.
o Propositional logic consists of an object, relations or function, and logical connectives.
o These connectives are also called logical operators.
o The propositions and connectives are the basic elements of the propositional logic.
o Connectives can be said as a logical operator which connects two sentences.
o A proposition formula which is always false is called Contradiction.
o A proposition formula which has both true and false values is called a contingency.

o Statements which are questions, commands, or opinions are not propositions, e.g.
"Where is Ruth?", "How are you?", "What is your name?”.
Syntax of propositional logic:
The syntax of propositional logic defines the allowable sentences for the knowledge
representation. There are two types of Propositions:

A. Atomic Proposition: Atomic propositions are simple propositions. An atomic proposition consists of a single proposition symbol. These are the sentences which must be either true or false.
Example:
a) 2+2 is 4, it is an atomic proposition as it is a true fact.
b) "The Sun is cold" is also a proposition as it is a false fact.

B. Compound proposition: Compound propositions are constructed by combining simpler or atomic propositions, using parentheses and logical connectives.
Example:
a) "It is raining today, and the street is wet."
b) "Ade is a doctor, and his clinic is in Lagos."

Logical Connectives
Logical connectives are used to connect two simpler propositions or representing a sentence
logically. We can create compound propositions with the help of logical connectives. There are
mainly five connectives, which are given as follows:
1. Negation: A sentence such as ¬ P is called negation of P. A literal can be either Positive
literal or negative literal.
2. Conjunction: A sentence which has ∧ connective such as, P ∧ Q is called a conjunction.
Example: “Kemi is intelligent and hardworking”. It can be written as,
P= Kemi is intelligent,
Q= Kemi is hardworking. → P∧ Q.
3. Disjunction: A sentence which has ∨ connective, such as P ∨ Q. is called disjunction,
where P and Q are the propositions.
Example: "Ahmed is a doctor or Engineer",
Here P= Ahmed is Doctor. Q= Ahmed is Doctor, so we can write it as P ∨ Q.
4. Implication: A sentence such as P → Q, is called an implication. Implications are also
known as if-then rules. It can be represented as
If it is raining, then the street is wet.
Let P= It is raining, and Q= Street is wet, so it is represented as P → Q
5. Biconditional: A sentence such as P ⇔ Q is a biconditional sentence, for example: "I am breathing if and only if I am alive."
P= I am breathing, Q= I am alive; it can be represented as P ⇔ Q.
Following is the summarized table for the propositional logic connectives:

Connective symbol   Technical term    Example
∧                   Conjunction       A ∧ B
∨                   Disjunction       A ∨ B
→                   Implication       A → B
⇔                   Biconditional     A ⇔ B
¬                   Negation          ¬A or ¬B

Truth Table
In propositional logic, we need to know the truth values of propositions in all possible scenarios. We can combine all the possible combinations with logical connectives, and the representation of these combinations in a tabular format is called a truth table. Truth tables can be built for all the logical connectives.

Truth table with three propositions

We can build a proposition composed of three propositions P, Q, and R. Its truth table is made up of 8 rows (2^3 combinations), as we have taken three proposition symbols.
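For reference, a short Python sketch that prints the truth table for the main connectives over two propositions (implication P→Q is computed as ¬P ∨ Q, and the biconditional as equality of truth values):

from itertools import product

print("P      Q      P∧Q    P∨Q    P→Q    P⇔Q")
for p, q in product([True, False], repeat=2):
    print(p, q, p and q, p or q, (not p) or q, p == q)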

Precedence of connectives
Just like arithmetic operators, there is a precedence order for propositional connectives or logical operators. This order should be followed while evaluating a propositional problem. The precedence order, from highest to lowest, is: parenthesis, negation (¬), conjunction (∧), disjunction (∨), implication (→), biconditional (⇔).

Logical equivalence
Logical equivalence is one of the features of propositional logic. Two propositions are said to be logically equivalent if and only if their columns in the truth table are identical to each other.
As an example, take the two formulas ¬A∨B and A→B. In the truth table below, the columns for ¬A∨B and A→B are identical; hence the two formulas are equivalent, i.e. (¬A∨B) ⇔ (A→B).

A       B       ¬A      ¬A∨B    A→B
True    True    False   True    True
True    False   False   False   False
False   True    True    True    True
False   False   True    True    True

Limitations of Propositional logic
1. We cannot represent relations like ALL, some, or none with propositional logic. Example:
All the girls are intelligent.
Some apples are sweet.
2. Propositional logic has limited expressive power. In propositional logic, we cannot describe
statements in terms of their properties or logical relationships.

First-Order Logic
In propositional logic, we can only represent the facts, which are either true or false. PL is not
sufficient to represent the complex sentences or natural language statements. The propositional
logic has very limited expressive power.
First-order logic is another way of knowledge representation in artificial intelligence. It is an extension of propositional logic. FOL is sufficiently expressive to represent natural language statements in a concise way.
First-order logic is also known as Predicate logic or First-order predicate logic. First-order logic is a powerful language that represents information about objects in a natural way and can also express the relationships between those objects.
Like natural language, first-order logic has two main parts:
a. Syntax
b. Semantics
Syntax of First-Order logic:
The syntax of FOL determines which collections of symbols form logical expressions in first-order logic. The basic syntactic elements of first-order logic are symbols. We write statements in shorthand notation in FOL.
Basic Elements of First-order logic syntax:

Constant:     1, 2, A, John, Lokoja, cat, ...
Variables:    x, y, z, a, b, ...
Predicates:   Brother, Father, >, ...
Function:     sqrt, LeftLegOf, ...
Connectives:  ∧, ∨, ¬, ⇒, ⇔
Equality:     ==
Quantifier:   ∀, ∃

Atomic sentences
Atomic sentences are the most basic sentences of first-order logic. These sentences are formed
from a predicate symbol followed by a parenthesis with a sequence of terms. Atomic sentences
can be represented as: Predicate (term1, term2, ......, term n).
Example: Yemi and Ajayi are brothers: => Brothers(Yemi, Ajayi).
Tom is a cat: => cat (Tom).

Complex Sentences
Complex sentences are made by combining atomic sentences using connectives. First-order logic
statements can be divided into two parts:
Subject: Subject is the main part of the statement.
Predicate: A predicate can be defined as a relation, which binds two atoms together in a statement.

Consider the statement: "x is an integer." It consists of two parts: the first part, x, is the subject of the statement, and the second part, "is an integer," is known as the predicate.

Quantifiers in First-order logic
A quantifier is a language element which generates quantification, and quantification specifies
the quantity of specimen in the universe of discourse.
These are the symbols that permit to determine or identify the range and scope of the variable in
the logical expression. There are two types of quantifier:
a. Universal Quantifier, (for all, everyone, everything)
b. Existential quantifier, (for some, at least one).
Universal Quantifier
Universal quantifier is a symbol of logical representation, which specifies that the statement
within its range is true for everything or every instance of a particular thing. The Universal
quantifier is represented by a symbol ∀, which resembles an inverted A.
If x is a variable, then ∀x is read as:
 For all x
 For each x
 For every x.
Example:
All men drink coffee. In shorthand notation, it is written as:
∀x man(x) → drink(x, coffee).
It will be read as: For all x, if x is a man, then x drinks coffee.
The main connective used with the universal quantifier ∀ is implication →.

Existential Quantifier
Existential quantifiers are the type of quantifiers which express that the statement within their scope is true for at least one instance of something. The existential quantifier is denoted by the logical operator ∃, which resembles a reversed E. When it is used with a predicate variable, it is called an existential quantifier.
If x is a variable, then existential quantifier will be ∃x or ∃(x). And it will be read as:
 There exists a 'x.'
 For some 'x.'

 For at least one 'x.'
Example:
Some boys are intelligent.
∃x: boys(x) ∧ intelligent(x)
It will be read as: There is some x where x is a boy and x is intelligent.
The main connective used with the existential quantifier ∃ is AND (∧).
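Over a finite domain, the two quantifiers have direct analogues in Python's all() and any(); the domain of boys below is an assumption made for illustration.

intelligent = {"Ade": True, "Bola": False, "Chidi": True}   # boy -> intelligent?
print(all(intelligent.values()))   # ∀x intelligent(x) -> False
print(any(intelligent.values()))   # ∃x intelligent(x) -> True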

Examples of FOL using quantifier:


1. All birds fly.
In this statement, the predicate is "fly(bird)."
And since there are all birds who fly so it will be represented as follows.
∀x bird(x) →fly(x).
2. Every man respects his parent.
In this statement, the predicate is "respect(x, y)," where x=man, and y= parent.
Since there is every man so will use ∀, and it will be represented as follows:
∀x man(x) → respects (x, parent).
3. Some boys play football.
In this statement, the predicate is "play(x, y)," where x = boys and y = football. Since there are some boys, we will use ∃, and it will be represented as:
∃x boys(x) ∧ play(x, football).
4. Not all students like both Mathematics and Science.
In this statement, the predicate is "like(x, y)," where x = student and y = subject.
Since not all students do, we will use ∀ with negation, so the representation is:
¬∀ (x) [ student(x) → like(x, Mathematics) ∧ like(x, Science)].

Free and Bound Variables


The quantifiers interact with variables which appear in a suitable way. There are two types of
variables in First-order logic which are given below:
Free Variable: A variable is said to be a free variable in a formula if it occurs outside the scope
of the quantifier.
Example: ∀x ∃(y)[P (x, y, z)], where z is a free variable.
Bound Variable: A variable is said to be a bound variable in a formula if it occurs within the
scope of the quantifier.
Example: ∀x ∀y [A(x) ∧ B(y)], here x and y are the bound variables.

Inference in First-Order Logic (FOL)
Inference in First-Order Logic is used to deduce new facts or sentences from existing sentences.
The following are some basic terminologies used in FOL.

Substitution:
Substitution is a fundamental operation performed on terms and formulas. It occurs in all inference systems in first-order logic. Substitution becomes more complex in the presence of quantifiers in FOL. Given F[a/x], it means we substitute the constant "a" in place of the variable "x" in F.

Equality:
First-order logic does not only use predicates and terms for making atomic sentences; it also provides another construct, equality. Equality symbols specify that two terms refer to the same object.
Example: Brother (John) = Samuel
As in the above example, the object referred to by Brother(John) is the same as the object referred to by Samuel. The equality symbol can also be used with negation to represent that two terms are not the same object.
Example: ¬(x=y) which is equivalent to x ≠y.

First Order Logic inference rules for quantifier:


The following are some basic inference rules in FOL:
o Universal Generalization
o Universal Instantiation
o Existential Instantiation
o Existential introduction

1. Universal Generalization:
o Universal generalization is a valid inference rule which states that if premise P(c) is true
for any arbitrary element c in the universe of discourse, then we can have a conclusion as
∀ x P(x).

o It can be represented as: from P(c), infer ∀ x P(x).

o This rule can be used if we want to show that every element has a similar property.
o In this rule, x must not appear as a free variable.

Example: Let P(c) be "A byte contains 8 bits". Then ∀ x P(x), "All bytes contain 8 bits", will also be true.

2. Universal Instantiation (UI):


o Universal instantiation, also called universal elimination or UI, is a valid inference rule. It can be applied multiple times to add new sentences.
o The new KB is logically equivalent to the previous KB.
o As per UI, we can infer any sentence obtained by substituting a ground term for the
variable.
o The UI rule state that we can infer any sentence P(c) by substituting a ground term c (a
constant within domain x) from ∀ x P(x) for any object in the universe of discourse.

o It can be represented as: from ∀ x P(x), infer P(c).

Example 1:
If "Every person likes ice-cream" => ∀x P(x), then we can infer that
"John likes ice-cream" => P(John).
Example 2:
"All kings who are greedy are evil." So let our knowledge base contain this detail in the form of FOL:
∀x king(x) ∧ greedy(x) → Evil(x),
So from this information, we can infer any of the following statements using Universal
Instantiation:
King(John) ∧ Greedy (John) → Evil (John),
King(Richard) ∧ Greedy (Richard) → Evil (Richard),
King(Father(John)) ∧ Greedy (Father(John)) → Evil (Father(John)),

3. Existential Instantiation:
o Existential instantiation is also called as Existential Elimination, which is a valid
inference rule in first-order logic.
o It can be applied only once to replace the existential sentence.
o The new KB is not logically equivalent to old KB, but it will be satisfiable if old KB was
satisfiable.
o This rule states that one can infer P(c) from the formula given in the form of ∃x P(x) for a
new constant symbol c.

o The restriction with this rule is that the c used in the rule must be a new constant term for which P(c) is true.

o It can be represented as: from ∃x P(x), infer P(c), for a new constant symbol c.

Example:
From the given sentence: ∃x Crown(x) ∧ OnHead(x, John),
So we can infer: Crown(K) ∧ OnHead( K, John), as long as K does not appear in the
knowledge base.
o The above used K is a constant symbol, which is called Skolem constant.
o The Existential instantiation is a special case of Skolemization process.
4. Existential Introduction
o An existential introduction is also known as an existential generalization, which is a valid
inference rule in first-order logic.
o This rule states that if there is some element c in the universe of discourse which has a
property P, then we can infer that there exists something in the universe which has the
property P.

o It can be represented as: from P(c), infer ∃ x P(x).

o Example:
"Patrick got good marks in English."
"Therefore, someone got good marks in English."

Forward and Backward Chaining


Forward chaining
Forward chaining is a method of reasoning in artificial intelligence in which inference rules are
applied to existing data to extract additional data until an endpoint (goal) is achieved.
In this type of chaining, the inference engine starts by evaluating existing facts, derivations, and
conditions before deducing new information. An endpoint (goal) is achieved through the
manipulation of knowledge that exists in the knowledge base. Forward chaining can be used in
planning, monitoring, controlling, and interpreting applications.

Properties of forward chaining
 The process uses a bottom-up approach (from bottom to top).
 It starts from an initial state and uses facts to reach a conclusion.
 This approach is data-driven.
 It's employed in expert systems and production rule systems.

Examples:
A simple example of forward chaining can be explained in the following sequence.
A
A->B
B
A is the starting point. A->B represents a rule. This rule is used to reach the decision B.

A practical example will go as follows;


Tom is running (A)
If a person is running, he will sweat (A->B)
Therefore, Tom is sweating. (B)
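A minimal sketch of this data-driven process in Python, assuming facts are strings and each rule maps a set of premises to a conclusion (the extra "thirsty" rule is an assumption added so the loop runs for more than one step):

rules = [({"running"}, "sweating"),
         ({"sweating"}, "thirsty")]

facts = {"running"}                        # Tom is running (A)

changed = True
while changed:                             # keep firing rules until no new facts
    changed = False
    for premises, conclusion in rules:
        if premises <= facts and conclusion not in facts:
            facts.add(conclusion)          # A and A->B give B (modus ponens)
            changed = True
print(facts)                               # {'running', 'sweating', 'thirsty'}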
Advantages
 It can be used to draw multiple conclusions.
 It provides a good basis for arriving at conclusions.
 It’s more flexible than backward chaining because it does not have a limitation on the
data derived from it.
Disadvantages
 The process of forward chaining may be time-consuming. It may take a lot of time to eliminate and synchronize the available data.
 Unlike backward chaining, the explanation of facts or observations for this type of chaining is not very clear. Backward chaining, by contrast, uses a goal-driven method that arrives at conclusions efficiently.

Backward chaining
Backward chaining involves backtracking from the endpoint or goal to steps that led to the
endpoint. This type of chaining starts from the goal and moves backward to comprehend the
steps that were taken to attain this goal.
The backtracking process can also enable a person to establish the logical steps that can be used to find other important solutions. Backward chaining can be used in debugging, diagnostics, and prescription applications.

Properties of backward chaining
 The process uses a top-down approach (from top to bottom).
 It’s a goal-driven method of reasoning.
 The endpoint (goal) is subdivided into sub-goals to prove the truth of facts.
 A backward chaining algorithm is employed in inference engines, game theories, and
complex database systems.
 The modus ponens inference rule is used as the basis for the backward chaining process. This rule states that if both the conditional statement (p->q) and the antecedent (p) are true, then we can infer the consequent (q).

Examples:
B
A->B
A
B is the goal or endpoint that is used as the starting point for backward chaining. A is the initial state. A->B is a rule that must be asserted to arrive at the endpoint B.
A practical example of backward chaining will go as follows:
Tom is sweating (B).
If a person is running, he will sweat (A->B).
Tom is running (A).
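A matching goal-driven sketch in Python: to prove the goal, we look for a rule that concludes it and recursively try to prove its premises against the known facts; the fact and rule names are the same illustrative assumptions as before.

rules = [({"running"}, "sweating")]
facts = {"running"}

def prove(goal):
    if goal in facts:                      # the goal is already a known fact
        return True
    for premises, conclusion in rules:
        if conclusion == goal:             # find a rule that concludes the goal
            if all(prove(p) for p in premises):
                return True                # all sub-goals proved
    return False

print(prove("sweating"))                   # True: Tom is sweating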

Advantages
 The result is already known, which makes it easy to deduce inferences.
 It’s a quicker method of reasoning than forward chaining because the endpoint is
available.
 In this type of chaining, correct solutions can be derived effectively if pre-determined
rules are met by the inference engine.

Disadvantages
 The process of reasoning can only start if the endpoint is known.
 It doesn’t deduce multiple solutions or answers.
 It only derives data that is needed, which makes it less flexible than forward chaining.

PROBABILISTIC REASONING
Probabilistic reasoning is a way of knowledge representation where we apply the concept of
probability to indicate the uncertainty in knowledge. In probabilistic reasoning, we combine
probability theory with logic to handle the uncertainty.
Probability is used in probabilistic reasoning because it provides a way to handle the uncertainty that results from someone's laziness or ignorance. In the real world, there are lots of scenarios where the certainty of something is not confirmed, such as "It will rain today," "the behavior of someone in some situation," or "a match between two teams or two players." These are probable sentences for which we can assume that they will happen but cannot be sure, so probabilistic reasoning is used.

Probability
Probability can be defined as the chance that an uncertain event will occur. It is the numerical measure of the likelihood that an event will occur. The value of a probability always remains between 0 and 1, which represent the degrees of uncertainty and certainty:
0 ≤ P(A) ≤ 1, where P(A) is the probability of an event A.
P(A) = 0 indicates total uncertainty in an event A.
P(A) = 1 indicates total certainty in an event A.
We can find the probability of an uncertain event by using the formula:
P(A) = (number of desired outcomes) / (total number of possible outcomes)
P(¬A) = probability of event A not happening, and
P(¬A) + P(A) = 1.

Event: Each possible outcome of a variable is called an event.


Sample space: The collection of all possible events is called sample space.
Random variables: Random variables are used to represent the events and objects in the real
world.
Prior probability: The prior probability of an event is probability computed before observing new
information.
Posterior Probability: The probability that is calculated after all evidence or information has been taken into account. It is a combination of the prior probability and the new information.
Conditional probability: Conditional probability is a probability of an event occurring when
another event has already happened.

Let's suppose we want to calculate the probability of event A when event B has already occurred, "the probability of A under the condition B". It can be written as:
P(A|B) = P(A⋀B) / P(B), where
P(A⋀B) = joint probability of A and B, and
P(B) = marginal probability of B.

BAYES' THEOREM
Bayes' theorem is also known as Bayes' rule, Bayes' law or Bayesian reasoning, which determines
the probability of an event with uncertain knowledge.
In probability theory, it relates the conditional probability and marginal probabilities of two
random events.
Bayes' theorem was named after the British mathematician Thomas Bayes. Bayesian inference is an application of Bayes' theorem, which is fundamental to Bayesian statistics. It is a way to calculate the value of P(B|A) with the knowledge of P(A|B).
Bayes' theorem allows updating the probability prediction of an event by observing new information about the real world.

Example
If cancer corresponds to one's age then by using Bayes' theorem, we can determine the probability
of cancer more accurately with the help of age.
Bayes' theorem can be derived using the product rule and the conditional probability of event A with known event B.
From the product rule we can write:
P(A ⋀ B) = P(A|B) P(B)
Similarly, the probability of event B with known event A:
P(A ⋀ B) = P(B|A) P(A)
Equating the right-hand sides of both equations, we get:
P(A|B) = P(B|A) P(A) / P(B)

The above equation is called Bayes' rule or Bayes' theorem. This equation is the basis of most modern AI systems for probabilistic inference.
It shows the simple relationship between joint and conditional probabilities. Here,
P(A|B) is known as posterior, which we need to calculate, and it will be read as Probability of
hypothesis A when an evidence B has occurred.
P(B|A) is called the likelihood, in which we consider that hypothesis is true, then we calculate
the probability of evidence.
P(A) is called the prior probability, probability of hypothesis before considering the evidence
P(B) is called marginal probability, pure probability of an evidence.

Applying Bayes' rule
Bayes' rule allows us to compute the single term P(B|A) in terms of P(A|B), P(B), and P(A). This is very useful in cases where we have good estimates of these three terms and want to determine the fourth one. Suppose we want to perceive the effect of some unknown cause and want to compute that cause; then Bayes' rule becomes:

P(cause|effect) = P(effect|cause) P(cause) / P(effect)

Example
Question: What is the probability that a patient has the disease meningitis, given a stiff neck?
Given Data:
A doctor is aware that the disease meningitis causes a patient to have a stiff neck 80% of the time. He is also aware of some more facts, which are given as follows:
o The known probability that a patient has meningitis is 1/30,000.
o The known probability that a patient has a stiff neck is 2%.
Let a be the proposition that the patient has a stiff neck and b be the proposition that the patient has meningitis, so we can calculate the following:

P(b|a) = P(a|b) P(b) / P(a) = (0.8 × 1/30000) / 0.02 ≈ 0.00133

Hence, we can assume that 1 patient out of 750 patients has meningitis, given a stiff neck.
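The same computation can be checked in a few lines of Python:

p_stiff_given_men = 0.8        # P(a|b): stiff neck given meningitis
p_men = 1 / 30000              # P(b): prior probability of meningitis
p_stiff = 0.02                 # P(a): probability of a stiff neck

p_men_given_stiff = p_stiff_given_men * p_men / p_stiff
print(p_men_given_stiff)       # ~0.00133, i.e. about 1 in 750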

PLANNING
When the world state is accessible, an agent can use the percepts provided by the environment to
build a complete and correct model of the current world state. Then, given a goal, it can call a
suitable planning algorithm to generate a plan of action. The agent can then execute the steps of
the plan, one action at a time.
Consider the case of a problem-solving exercise: an agent is given the problem "Get a quart of milk and a bunch of bananas and a variable-speed cordless drill." There is a need to specify the following:
 the initial state: the agent is at home but without any of the desired objects
 the operator set: all the things that the agent can do.

The problem-solving agent considers sequences of actions starting from the initial state. It forces
the agent to decide first what to do in the initial state, where the relevant choices are essentially to
go to any of a number of other places. Until the agent has figured out how to obtain the various
items (by buying, borrowing, leasing, growing, manufacturing, stealing, etc.), it cannot really
decide where to go. The agent therefore needs a flexible way of structuring its deliberations, so
that it can work on whichever part of the problem is most likely to be solvable given the current
information.

Figure: Solving a shopping problem with forward search through the space of situations in the world.

Key Ideas behind Planning


Open up the representation of states, goals, and actions. Planning algorithms use descriptions in
some formal language, usually first-order logic or a subset thereof. States and goals are represented
by sets of sentences, and actions are represented by logical descriptions of preconditions and
effects. This enables the planner to make direct connections between states and actions. For
example, if the agent knows that the goal is a conjunction that includes the conjunct Have(Milk),
and that Buy(x) achieves Have(x), then the agent knows that it is worthwhile to consider a plan
that includes Buy(Milk). It need not consider irrelevant actions such as Buy(WhippingCream) or
GoToSleep.
The planner is free to add actions to the plan wherever they are needed, rather than in an
incremental sequence starting at the initial state. For example, the agent may decide that it is going
to have to Buy(Milk), even before it has decided where to buy it, how to get there, or what to do
afterwards. There is no necessary connection between the order of planning and the order of
execution. By making "obvious" or "important" decisions first, the planner can reduce the
branching factor for future choices and reduce the need to backtrack over arbitrary decisions.
Note: The representation of states as sets of logical sentences plays a crucial role in making this
freedom possible. For example, when adding the action Buy(Milk) to the plan, the agent can
represent the state in which the action is executed as, say, At(Supermarket). This actually
represents an entire class of states—states with and without bananas, with and without a drill, and
so on. Search algorithms that require complete state descriptions do not have this option.

Most parts of the world are independent of most other parts. This makes it feasible to take a
conjunctive goal like "get a quart of milk and a bunch of bananas and a variable-speed cordless
drill" and solve it with a divide-and-conquer strategy. A subplan involving going to the
supermarket can be used to achieve the first two conjuncts, and another subplan (e.g., either going
to the hardware store or borrowing from a neighbor) can be used to achieve the third.

