Unit 3

AL3391 ARTIFICIAL INTELLIGENCE
UNIT III GAME PLAYING AND CSP
1. Game Theory:
Game theory is basically a branch of mathematics that is used to typical strategic interaction
between different players (agents), all of which are equally rational, in a context with
predefined rules (of playing or maneuvering) and outcomes. Every player or agent is a rational
entity who is selfish and tries to maximize the reward to be obtained using a particular
strategy. All the players abide by certain rules in order to receive a predefined playoff- a
reward after a certain outcome. Hence, a GAME can be defined as a set of players, actions,
strategies, and a final playoff for which all the players are competing.
Game Theory has now become a describing factor for both Machine Learning algorithms and
many daily life situations.
Consider the SVM (Support Vector Machine) for instance. According to Game Theory, the
SVM is a game between 2 players where one player challenges the other to find the best hyper-
plane after providing the most difficult points for classification. The final playoff of this game
is a solution that will be a trade-off between the strategic abilities of both players competing.
Nash equilibrium:
Nash equilibrium can be considered the essence of Game Theory. It is basically a state, a point
of equilibrium of collaboration of multiple players in a game. Nash Equilibrium guarantees
maximum profit to each player.
Let us try to understand this with the help of Generative Adversarial Networks (GANs).
What is GAN?
It is a combination of two neural networks: the Discriminator and the Generator. The
Generator Neural Network is fed input images which it analyzes and then produces new
sample images, which are made to represent the actual input images as close as possible. Once
the images have been produced, they are sent to the Discriminator Neural Network. This neural
network judges the images sent to it and classifies them as generated images and actual input
images. If the image is classified as the original image, the DNN changes its parameters of
judging. If the image is classified as a generated image, the image is rejected and returned to
the GNN. The GNN then alters its parameters in order to improve the quality of the image
produced.
This is a competitive process which goes on until both neural networks do not require to make
any changes in their parameters and there can be no further improvement in both neural
networks. This state of no further improvement is known as NASH EQUILIBRIUM. In other
words, GAN is a 2-player competitive game where both players are continuously optimizing
themselves to find a Nash Equilibrium.
But how do we know if the game has reached Nash Equilibrium?
In any game, one of the agents is required to disclose their strategy in front of the other agents.
After the revelation, if none of the players changes their strategies, it is understood that the
game has reached Nash Equilibrium.
Now that we are aware of the basics of Game Theory, let us try to understand how Nash
Equilibrium is attained in a simultaneous game. There are many examples but the most famous
is the Prisoner’s Dilemma. There are some more examples such as the Closed-bag exchange
Game, the Friend or For Game, and the iterated Snowdrift Game.
In all these games, two players are involved and the final playoff is a result of a decision that
has to be made by both players. Both players have to make a choice between defection and co-
operation. If both players cooperate, the final playoff will turn out to be positive for both.
However, if both defect, the final playoff will be negative for both players. If there is a
combination of one player defecting and the other co-operating, the final playoff will be
positive for one and negative for another.
Here, Nash Equilibrium plays an important role. Only if both players jot out a strategy that
benefits each other and provide both with a positive playoff, the solution to this problem will
be optimal.
There are many more real examples and a number of pieces of code that try to solve this
dilemma. The basic essence, however, is the attainment of the Nash Equilibrium in an
uncomfortable situation.
Where is GAME THEORY now?
Game Theory is increasingly becoming a part of the real-world in its various applications in
areas like public health services, public safety, and wildlife. Currently, game theory is being
used in adversary training in GANs, multi-agent systems, and imitation and reinforcement
learning. In the case of perfect information and symmetric games, many Machine Learning and
Deep Learning techniques are applicable. The real challenge lies in the development of
techniques to handle incomplete information games, such as Poker. The complexity of the
game lies in the fact that there are too many combinations of cards and the uncertainty of the
cards being held by the various players.
Types of Games:
Currently, there are about 5 types of classification of games. They are as follows:
1. Zero-Sum and Non-Zero Sum Games: In non-zero-sum games, there are multiple players
and all of them have the option to gain a benefit due to any move by another player. In zero-
sum games, however, if one player earns something, the other players are bound to lose a
key playoff.
2. Simultaneous and Sequential Games: Sequential games are the more popular games where
every player is aware of the movement of another player. Simultaneous games are more
difficult as in them, the players are involved in a concurrent game. BOARD GAMES are the
perfect example of sequential games and are also referred to as turn-based or extensive-form
games.
Perfect Information game.

4. Asymmetric and Symmetric Games: Asymmetric games are those win in which each player
has a different and usually conflicting final goal. Symmetric games are those in which all
players have the same ultimate goal but the strategy being used by each is completely
different.
5. Co-operative and Non-Co-operative Games: In non-co-operative games, every player plays
for himself while in co-operative games, players form alliances in order to achieve the final
goal.
2. Optimal Decisions in Games:
Humans’ intellectual capacities have been engaged by games for as long as civilization has
existed, sometimes to an alarming degree. Games are an intriguing subject for AI researchers
because of their abstract character. A game’s state is simple to depict, and actors are usually
limited to a small number of actions with predetermined results. Physical games, such as
croquet and ice hockey, contain significantly more intricate descriptions, a much wider variety
of possible actions, and rather ambiguous regulations defining the legality of activities. With
the exception of robot soccer, these physical games have not piqued the AI community’s
interest.
Games are usually intriguing because they are difficult to solve. Chess, for example, has an
average branching factor of around 35, and games frequently stretch to 50 moves per player,
therefore the search tree has roughly 35100 or 10154 nodes (despite the search graph having
“only” about 1040 unique nodes). As a result, games, like the real world, necessitate the ability
to make some sort of decision even when calculating the best option is impossible.
Inefficiency is also heavily punished in games. Whereas a half-efficient implementation of A
search will merely take twice as long to complete, a chess software that is half as efficient in
utilizing its available time will almost certainly be beaten to death, all other factors being
equal. As a result of this research, a number of intriguing suggestions for making the most use
of time have emerged.
Optimal Decision Making in Games
Let us start with games with two players, whom we’ll refer to as MAX and MIN for obvious
reasons. MAX is the first to move, and then they take turns until the game is finished. At the
conclusion of the game, the victorious player receives points, while the loser receives
penalties. A game can be formalized as a type of search problem that has the following
elements:
 S0: The initial state of the game, which describes how it is set up at the start.
 Player (s): Defines which player in a state has the move.
 Actions (s): Returns a state’s set of legal moves.
 Result (s, a): A transition model that defines a move’s outcome.
 Terminal-Test (s): A terminal test that returns true if the game is over but false otherwise.
Terminal states are those in which the game has come to a conclusion.
 Utility (s, p): A utility function (also known as a payout function or objective function )
determines the final numeric value for a game that concludes in the terminal state s for
player p. The result in chess is a win, a loss, or a draw, with values of +1, 0, or 1/2.
Backgammon’s payoffs range from 0 to +192, but certain games have a greater range of
possible outcomes. A zero-sum game is defined (confusingly) as one in which the total
reward to all players is the same for each game instance. Chess is a zero-sum game because
each game has a payoff of 0 + 1, 1 + 0, or 1/2 + 1/2. “Constant-sum” would have been a
preferable name, 22 but zero-sum is the usual term and makes sense if each participant is
charged 1.
The game tree for the game is defined by the beginning state, ACTIONS function, and
RESULT function—a tree in which the nodes are game states and the edges represent
movements. The figure below depicts a portion of the tic-tac-toe game tree (noughts and
crosses). MAX may make nine different maneuvers from his starting position. The game
alternates between MAXs setting an X and MINs placing an O until we reach leaf nodes
corresponding to terminal states, such as one player having three in a row or all of the
squares being filled. The utility value of the terminal state from the perspective of MAX is
shown by the number on each leaf node; high values are thought to be beneficial for MAX
and bad for MIN
The game tree for tic-tac-toe is relatively short, with just 9! = 362,880 terminal nodes.
However, because there are over 1040 nodes in chess, the game tree is better viewed as a
theoretical construct that cannot be realized in the actual world. But, no matter how big the
game tree is, MAX’s goal is to find a solid move. A tree that is superimposed on the whole
game tree and examines enough nodes to allow a player to identify what move to make is
referred to as a search tree.
A sequence of actions leading to a goal state—a terminal state that is a win—would be the best
solution in a typical search problem. MIN has something to say about it in an adversarial
search. MAX must therefore devise a contingent strategy that specifies M A X’s initial state
move, then MAX’s movements in the states resulting from every conceivable MIN response,
then MAX’s moves in the states resulting from every possible MIN reaction to those moves,
and so on. This is quite similar to the AND-OR search method, with MAX acting as OR and
MIN acting as AND. When playing an infallible opponent, an optimal strategy produces results
that are as least as excellent as any other plan. We’ll start by demonstrating how to find the
best plan.
We’ll move to the trivial game in the figure below since even a simple game like tic-tac-toe is
too complex for us to draw the full game tree on one page. MAX’s root node moves are
designated by the letters a1, a2, and a3. MIN’s probable answers to a1 are b1, b2, b3, and so
on. This game is over after MAX and MIN each make one move. (In game terms, this tree
consists of two half-moves and is one move deep, each of which is referred to as a ply.) The
terminal states in this game have utility values ranging from 2 to 14.
Game’s Utility Function
The optimal strategy can be found from the minimax value of each node, which we express as
MINIMAX, given a game tree (n). Assuming that both players play optimally from there
through the finish of the game, the utility (for MAX) of being in the corresponding state is the
node’s minimax value. The usefulness of a terminal state is obviously its minimax value.
Furthermore, if given the option, MAX prefers to shift to a maximum value state, whereas
MIN wants to move to a minimum value state. So here’s what we’ve got:
Let’s use these definitions to analyze the game tree shown in the figure above. The game’s
UTILITY function provides utility values to the terminal nodes on the bottom level. Because
the first MIN node, B, has three successor states with values of 3, 12, and 8, its minimax value
is 3. Minimax value 2 is also used by the other two MIN nodes. The root node is a MAX node,
with minimax values of 3, 2, and 2, resulting in a minimax value of 3. We can also find the
root of the minimax decision: action a1 is the best option for MAX since it leads to the highest
minimax value.
This concept of optimal MAX play requires that MIN plays optimally as well—it maximizes
MAX’s worst-case outcome. What happens if MIN isn’t performing at its best? Then it’s a
simple matter of demonstrating that MAX can perform even better. Other strategies may
outperform the minimax method against suboptimal opponents, but they will always
outperform optimal opponents.
3. Alpha-Beta Pruning Search:
o Alpha-beta pruning is a modified version of the minimax algorithm. It is an

optimization technique for the minimax algorithm.
o As we have seen in the minimax search algorithm that the number of game states it has
to examine are exponential in depth of the tree. Since we cannot eliminate the exponent,
but we can cut it to half. Hence there is a technique by which without checking each
node of the game tree we can compute the correct minimax decision, and this technique
is called pruning. This involves two threshold parameter Alpha and beta for future
expansion, so it is called alpha-beta pruning. It is also called as Alpha-Beta Algorithm.
o Alpha-beta pruning can be applied at any depth of a tree, and sometimes it not only
prune the tree leaves but also entire sub-tree.
o The two-parameter can be defined as:
a. Alpha: The best (highest-value) choice we have found so far at any point along
the path of Maximizer. The initial value of alpha is -∞.
b. Beta: The best (lowest-value) choice we have found so far at any point along the
path of Minimizer. The initial value of beta is +∞.
The Alpha-beta pruning to a standard minimax algorithm returns the same move as the
standard algorithm does, but it removes all the nodes which are not really affecting the final
decision but making algorithm slow. Hence by pruning these nodes, it makes the algorithm
fast.
Condition for Alpha-beta pruning:
The main condition which required for alpha-beta pruning
α>=β
Key points about alpha-beta pruning:

o The Max player will only update the value of alpha.
o The Min player will only update the value of beta.
o While backtracking the tree, the node values will be passed to upper nodes instead of
values of alpha and beta.
o We will only pass the alpha, beta values to the child nodes.
Working of Alpha-Beta Pruning:
Let's take an example of two-player search tree to understand the working of Alpha-beta
pruning
Step 1: At the first step the, Max player will start first move from node A where α= -∞ and
β= +∞, these value of alpha and beta passed down to node B where again α= -∞ and β= +∞,
and Node B passes the same value to its child D.
Step 2: At Node D, the value of α will be calculated as its turn for Max. The value of α is
compared with firstly 2 and then 3, and the max (2, 3) = 3 will be the value of α at node D
and node value will also 3.
Step 3: Now algorithm backtrack to node B, where the value of β will change as this is a
turn of Min, Now β= +∞, will compare with the available subsequent nodes value, i.e. min
(∞, 3) = 3, hence at node B now α= -∞, and β= 3.
In the next step, algorithm traverse the next successor of Node B which is node E, and the
values of α= -∞, and β= 3 will also be passed.
Step 4: At node E, Max will take its turn, and the value of alpha will change. The current
value of alpha will be compared with 5, so max (-∞, 5) = 5, hence at node E α= 5 and β= 3,
where α>=β, so the right successor of E will be pruned, and algorithm will not traverse it,
and the value at node E will be 5.
Step 5: At next step, algorithm again backtrack the tree, from node B to node A. At node
A, the value of alpha will be changed the maximum available value is 3 as max (-∞, 3)= 3,
and β= +∞, these two values now passes to right successor of A which is Node C.
At node C, α=3 and β= +∞, and the same values will be passed on to node F.
Step 6: At node F, again the value of α will be compared with left child which is 0, and
max(3,0)= 3, and then compared with right child which is 1, and max(3,1)= 3 still α
remains 3, but the node value of F will become 1.
Step 7: Node F returns the node value 1 to node C, at C α= 3 and β= +∞, here the value of
beta will be changed, it will compare with 1 so min (∞, 1) = 1. Now at C, α=3 and β= 1,
and again it satisfies the condition α>=β, so the next child of C which is G will be pruned,
and the algorithm will not compute the entire sub-tree G.
Step 8: C now returns the value of 1 to A here the best value for A is max (3, 1) = 3.
Following is the final game tree which is the showing the nodes which are computed and
nodes which has never computed. Hence the optimal value for the maximizer is 3 for this
example.
Move Ordering in Alpha-Beta pruning:
The effectiveness of alpha-beta pruning is highly dependent on the order in which each
node is examined. Move order is an important aspect of alpha-beta pruning.
It can be of two types:
Worst ordering: In some cases, alpha-beta pruning algorithm does not prune any of the
leaves of the tree, and works exactly as minimax algorithm. In this case, it also consumes
more time because of alpha-beta factors, such a move of pruning is called worst ordering.
In this case, the best move occurs on the right side of the tree. The time complexity for
such an order is O(bm).
Ideal ordering: The ideal ordering for alpha-beta pruning occurs when lots of pruning
happens in the tree, and best moves occur at the left side of the tree. We apply DFS hence it
first search left of the tree and go deep twice as minimax algorithm in the same amount of
time. Complexity in ideal ordering is O(bm/2).
Rules to find good ordering:
Following are some rules to find good ordering in alpha-beta pruning:
o Occur the best move from the shallowest node.

o Order the nodes in the tree such that the best nodes are checked first.
o Use domain knowledge while finding the best move. Ex: for Chess, try order: captures
first, then threats, then forward moves, backward moves.
o We can bookkeep the states, as there is a possibility that states may repeat.
4. Monte Carlo Tree Search (MCTS):
Monte Carlo Tree Search (MCTS) is a search technique in the field of Artificial Intelligence
(AI). It is a probabilistic and heuristic driven search algorithm that combines the classic tree
search implementations alongside machine learning principles of reinforcement learning.
In tree search, there’s always the possibility that the current best action is actually not the most
optimal action. In such cases, MCTS algorithm becomes useful as it continues to evaluate other
alternatives periodically during the learning phase by executing them, instead of the current
perceived optimal strategy. This is known as the ” exploration-exploitation trade-off “. It
exploits the actions and strategies that is found to be the best till now but also must continue to
explore the local space of alternative decisions and find out if they could replace the current
best.
Exploration helps in exploring and discovering the unexplored parts of the tree, which could
result in finding a more optimal path. In other words, we can say that exploration expands the
tree’s breadth more than its depth. Exploration can be useful to ensure that MCTS is not
overlooking any potentially better paths. But it quickly becomes inefficient in situations with
large number of steps or repetitions. In order to avoid that, it is balanced out by exploitation.
Exploitation sticks to a single path that has the greatest estimated value. This is a greedy
approach and this will extend the tree’s depth more than its breadth. In simple words, UCB
formula applied to trees helps to balance the exploration-exploitation trade-off by periodically
exploring relatively unexplored nodes of the tree and discovering potentially more optimal
paths than the one it is currently exploiting.
For this characteristic, MCTS becomes particularly useful in making optimal decisions in
Artificial Intelligence (AI) problems.
Monte Carlo Tree Search (MCTS) algorithm:

In MCTS, nodes are the building blocks of the search tree. These nodes are formed based on
the outcome of a number of simulations. The process of Monte Carlo Tree Search can be
broken down into four distinct steps, viz., selection, expansion, simulation and
backpropagation. Each of these steps is explained in details below:
 Selection: In this process, the MCTS algorithm traverses the current tree from the root node
using a specific strategy. The strategy uses an evaluation function to optimally select nodes
with the highest estimated value. MCTS uses the Upper Confidence Bound (UCB) formula
applied to trees as the strategy in the selection process to traverse the tree. It balances the
exploration-exploitation trade-off. During tree traversal, a node is selected based on some
parameters that return the maximum value. The parameters are characterized by the formula
that is typically used for this purpose is given below.
 where;
Si = value of a node i
xi = empirical mean of a node i
C = a constant
t = total number of simulations
When traversing a tree during the selection process, the child node that returns the greatest
value from the above equation will be one that will get selected. During traversal, once a
child node is found which is also a leaf node, the MCTS jumps into the expansion step.
 Expansion: In this process, a new child node is added to the tree to that node which was
optimally reached during the selection process.
 Simulation: In this process, a simulation is performed by choosing moves or strategies until
a result or predefined state is achieved.
 Backpropagation: After determining the value of the newly added node, the remaining tree
must be updated. So, the backpropagation process is performed, where it backpropagates
from the new node to the root node. During the process, the number of simulation stored in
each node is incremented. Also, if the new node’s simulation results in a win, then the
number of wins is also incremented.
The above steps can be visually understood by the diagram given below:
These types of algorithms are particularly useful in turn based games where there is no element
of chance in the game mechanics, such as Tic Tac Toe, Connect 4, Checkers, Chess, Go, etc.
This has recently been used by Artificial Intelligence Programs like AlphaGo, to play against
the world’s top Go players. But, its application is not limited to games only. It can be used in
any situation which is described by state-action pairs and simulations used to forecast
outcomes.
As we can see, the MCTS algorithm reduces to a very few set of functions which we can use
any choice of games or in any optimizing strategy.
Advantages of Monte Carlo Tree Search:
1. MCTS is a simple algorithm to implement.

2. Monte Carlo Tree Search is a heuristic algorithm. MCTS can operate effectively without
any knowledge in the particular domain, apart from the rules and end conditions, and can
find its own moves and learn from them by playing random playouts.
3. The MCTS can be saved in any intermediate state and that state can be used in future use
cases whenever required.
4. MCTS supports asymmetric expansion of the search tree based on the circumstances in
which it is operating.
Disadvantages of Monte Carlo Tree Search:
1. As the tree growth becomes rapid after a few iterations, it requires a huge amount of
memory.
2. There is a bit of a reliability issue with Monte Carlo Tree Search. In certain scenarios, there
might be a single branch or path, that might lead to loss against the opposition when
implemented for those turn-based games. This is mainly due to the vast amount of
combinations and each of the nodes might not be visited enough number of times to
understand its result or outcome in the long run.
3. MCTS algorithm needs a huge number of iterations to be able to effectively decide the most
efficient path. So, there is a bit of a speed issue there.
5. Stochastic games:
Many unforeseeable external occurrences can place us in unforeseen circumstances in real life.
Many games, such as dice tossing, have a random element to reflect this unpredictability.
These are known as stochastic games. Backgammon is a classic game that mixes skill and luck.
The legal moves are determined by rolling dice at the start of each player’s turn white, for
example, has rolled a 6–5 and has four alternative moves in the backgammon scenario shown
in the figure below.
This is a standard backgammon position. The object of the game is to get all of one’s pieces off
the board as quickly as possible. White moves in a clockwise direction toward 25, while Black
moves in a counterclockwise direction toward 0. Unless there are many opponent pieces, a
piece can advance to any position; if there is only one opponent, it is caught and must start
over. White has rolled a 6–5 and must pick between four valid moves: (5–10,5–11), (5–11,19–
24), (5–10,10–16), and (5–11,11–16), where the notation (5–11,11–16) denotes moving one
piece from position 5 to 11 and then another from 11 to 16.
Stochastic game tree for a backgammon position
White knows his or her own legal moves, but he or she has no idea how Black will roll, and
thus has no idea what Black’s legal moves will be. That means White won’t be able to build a
normal game tree-like in chess or tic-tac-toe. In backgammon, in addition to M A X and M I N
nodes, a game tree must include chance nodes. The figure below depicts chance nodes as
circles. The possible dice rolls are indicated by the branches leading from each chance node;
each branch is labelled with the roll and its probability. There are 36 different ways to roll two
dice, each equally likely, yet there are only 21 distinct rolls because a 6–5 is the same as a 5–6.
P (1–1) = 1/36 because each of the six doubles (1–1 through 6–6) has a probability of 1/36.
Each of the other 15 rolls has a 1/18 chance of happening.
The following phase is to learn how to make good decisions. Obviously, we want to choose the
move that will put us in the best position. Positions, on the other hand, do not have specific
minimum and maximum values. Instead, we can only compute a position’s anticipated value,
which is the average of all potential outcomes of the chance nodes.
As a result, we can generalize the deterministic minimax value to an expected-minimax value
for games with chance nodes. Terminal nodes, MAX and MIN nodes (for which the dice roll is
known), and MAX and MIN nodes (for which the dice roll is unknown) all function as before.
We compute the expected value for chance nodes, which is the sum of all outcomes, weighted
by the probability of each chance action.
6. Partially observable games:

A partially observable system is one in which the entire state of the system is not fully visible
to an external sensor. In a partially observable system the observer may utilise a memory
system in order to add information to the observer's understanding of the system.
An example of a partially observable system would be a card game in which some of the cards
are discarded into a pile face down. In this case the observer is only able to view their own
cards and potentially those of the dealer. They are not able to view the face-down (used) cards,
nor the cards that will be dealt at some stage in the future. A memory system can be used to
remember the previously dealt cards that are now on the used pile. This adds to the total sum of
knowledge that the observer can use to make decisions.
In contrast, a fully observable system would be that of chess. In chess (apart from the 'who is
moving next' state, and minor subtleties such as whether a side has castled, which may not be
clear) the full state of the system is observable at any point in time.
Partially observable is a term used in a variety of mathematical settings, including that of
artificial intelligence and partially observable Markov decision processes.
7. Constraint satisfaction problems:

We have seen so many techniques like Local search, Adversarial search to solve different
problems. The objective of every problem-solving technique is one, i.e., to find a solution to
reach the goal. Although, in adversarial search and local search, there were no constraints on
the agents while solving the problems and reaching to its solutions.
Constraint satisfaction technique. By the name, it is understood that constraint satisfaction

means solving a problem under certain constraints or rules.
Constraint satisfaction is a technique where a problem is solved when its values satisfy certain
constraints or rules of the problem. Such type of technique leads to a deeper understanding of
the problem structure as well as its complexity.
Constraint satisfaction depends on three components, namely:
X: It is a set of variables.
D: It is a set of domains where the variables reside. There is a specific domain for each
variable.
C: It is a set of constraints which are followed by the set of variables.
In constraint satisfaction, domains are the spaces where the variables reside, following the
problem specific constraints. These are the three main elements of a constraint satisfaction
technique. The constraint value consists of a pair of {scope, rel}. The scope is a tuple of
variables which participate in the constraint and rel is a relation which includes a list of values
which the variables can take to satisfy the constraints of the problem.
Solving Constraint Satisfaction Problems
The requirements to solve a constraint satisfaction problem (CSP) is:
 A state-space
 The notion of the solution.
A state in state-space is defined by assigning values to some or all variables such as

{X1=v1, X2=v2, and so on…}.
An assignment of values to a variable can be done in three ways:
 Consistent or Legal Assignment: An assignment which does not violate any constraint
or rule is called Consistent or legal assignment.
 Complete Assignment: An assignment where every variable is assigned with a value,
and the solution to the CSP remains consistent. Such assignment is known as Complete
assignment.
 Partial Assignment: An assignment which assigns values to some of the variables only.
Such type of assignments are called Partial assignments.
Types of Domains in CSP

There are following two types of domains which are used by the variables :
 Discrete Domain: It is an infinite domain which can have one state for multiple
 variables. For example, a start state can be allocated infinite times for each variable.
 Finite Domain: It is a finite domain which can have continuous states describing one
domain for one specific variable. It is also called a continuous domain.
Constraint Types in CSP

With respect to the variables, basically there are following types of constraints:
 Unary Constraints: It is the simplest type of constraints that restricts the value of a
single variable.
 Binary Constraints: It is the constraint type which relates two variables. A value x2 will
contain a value which lies between x1 and x3.
 Global Constraints: It is the constraint type which involves an arbitrary number of
variables.
 Some special types of solution algorithms are used to solve the following types of
constraints:
 Linear Constraints: These type of constraints are commonly used in linear
programming where each variable containing an integer value exists in linear form
only.
 Non-linear Constraints: These type of constraints are used in non-linear programming
where each variable (an integer value) exists in a non-linear form.
Note: A special constraint which works in real-world is known as Preference constraint.
Constraint Propagation
In local state-spaces, the choice is only one, i.e., to search for a solution. But in CSP, we have
two choices either:
 We can search for a solution or

 We can perform a special type of inference called constraint propagation.
Constraint propagation is a special type of inference which helps in reducing the legal number
of values for the variables. The idea behind constraint propagation is local consistency.
In local consistency, variables are treated as nodes, and each binary constraint is treated as
an arc in the given problem. There are following local consistencies which are discussed
below:
 Node Consistency: A single variable is said to be node consistent if all the values in the
variable’s domain satisfy the unary constraints on the variables.
 Arc Consistency: A variable is arc consistent if every value in its domain satisfies the
binary constraints of the variables.
 Path Consistency: When the evaluation of a set of two variable with respect to a third
variable can be extended over another variable, satisfying all the binary constraints. It is
similar to arc consistency.
 k-consistency: This type of consistency is used to define the notion of stronger forms of
propagation. Here, we examine the k-consistency of the variables.
CSP Problems
Constraint satisfaction includes those problems which contains some constraints while solving
the problem. CSP includes the following problems:
 Graph Coloring: The problem where the constraint is that no adjacent sides can have
the same color.
 Sudoku Playing: The gameplay where the constraint is that no number from 0-9 can be
repeated in the same row or column.
 n-queen problem: In n-queen problem, the constraint is that no queen should be placed
either diagonally, in the same row or column.
Note: The n-queen problem is already discussed in Problem-solving in AI section.
 Crossword: In crossword problem, the constraint is that there should be the correct
formation of the words, and it should be meaningful.
 Latin square Problem: In this game, the task is to search the pattern which is occurring
several times in the game. They may be shuffled but will contain the same digits.
 Cryptarithmetic Problem: This problem has one most important constraint that is, we
cannot assign a different digit to the same character. All digits should contain a unique
alphabet.
Cryptarithmetic Problem
Cryptarithmetic Problem is a type of constraint satisfaction problem where the game is about
digits and its unique replacement either with alphabets or other symbols. In cryptarithmetic
problem, the digits (0-9) get substituted by some possible alphabets or symbols. The task in
cryptarithmetic problem is to substitute each digit with an alphabet to get the result
arithmetically correct.
We can perform all the arithmetic operations on a given cryptarithmetic problem.

The rules or constraints on a cryptarithmetic problem are as follows:
 There should be a unique digit to be replaced with a unique alphabet.

 The result should satisfy the predefined arithmetic rules, i.e., 2+2 =4, nothing else.
 Digits should be from 0-9 only.
 There should be only one carry forward, while performing the addition operation on a
problem.
 The problem can be solved from both sides, i.e., lefthand side (L.H.S), or righthand
side (R.H.S)
Let’s understand the cryptarithmetic problem as well its constraints better with the help
of an example:
 Given a cryptarithmetic problem, i.e., S E N D + M O R E = M O N E Y
 In this example, add both terms S E N D and M O R E to bring M O N E Y as a result.
Follow the below steps to understand the given problem by breaking it into its subparts:
 Starting from the left hand side (L.H.S) , the terms are S and M. Assign a digit which
could give a satisfactory result. Let’s assign S->9 and M->1.
Hence, we get a satisfactory result by adding up the terms and got an assignment for O as O-
>0 as well.
 Now, move ahead to the next terms E and O to get N as its output.
Adding E and O, which means 5+0=0, which is not possible because according to
cryptarithmetic constraints, we cannot assign the same digit to two letters. So, we need to think
more and assign some other value.
Note: When we will solve further, we will get one carry, so after applying it, the answer will be
satisfied.
 Further, adding the next two terms N and R we get,

But, we have already assigned E->5. Thus, the above result does not satisfy the values
because we are getting a different value for E. So, we need to think more.
Again, after solving the whole problem, we will get a carryover on this term, so our answer
will be satisfied.
where 1 will be carry forward to the above term

Let’s move ahead.
 Again, on adding the last two terms, i.e., the rightmost terms D and E, we get Y as its
result.
where 1 will be carry forward to the above term

 Keeping all the constraints in mind, the final resultant is as follows:
 Below is the representation of the assignment of the digits to the alphabets.

More examples of cryptarithmatic problems can be:
8. Backtracking search for CSP:
Backtracking search, a form of depth-first search, is commonly used for solving CSPs. Inference
can be interwoven with search.
Commutativity: CSPs are all commutative. A problem is commutative if the order of
application of any given set of actions has no effect on the outcome.
Backtracking search: A depth-first search that chooses values for one variable at a time and
backtracks when a variable has no legal values left to assign.
Backtracking algorithm repeatedly chooses an unassigned variable, and then tries all values in
the domain of that variable in turn, trying to find a solution. If an inconsistency is detected, then
BACKTRACK returns failure, causing the previous call to try another value.
There is no need to supply BACKTRACKING-SEARCH with a domain-specific initial state,
action function, transition model, or goal test.
BACKTRACKING-SARCH keeps only a single representation of a state and alters that
representation rather than creating a new ones.
To solve CSPs efficiently without domain-specific knowledge, address following questions:

1) function SELECT-UNASSIGNED-VARIABLE: which variable should be assigned next?
function ORDER-DOMAIN-VALUES: in what order should its values be tried?
2) function INFERENCE: what inferences should be performed at each step in the search?
3) When the search arrives at an assignment that violates a constraint, can the search avoid
repeating this failure?
1. Variable and value ordering
SELECT-UNASSIGNED-VARIABLE
Variable selection—fail-first
Minimum-remaining-values (MRV) heuristic: The idea of choosing the variable with the
fewest “legal” value. A.k.a. “most constrained variable” or “fail-first” heuristic, it picks a
variable that is most likely to cause a failure soon thereby pruning the search tree. If some
variable X has no legal values left, the MRV heuristic will select X and failure will be detected
immediately—avoiding pointless searches through other variables.
E.g. After the assignment for WA=red and NT=green, there is only one possible value for SA, so
it makes sense to assign SA=blue next rather than assigning Q.
[Powerful guide]
Degree heuristic: The degree heuristic attempts to reduce the branching factor on future choices
by selecting the variable that is involved in the largest number of constraints on other unassigned
variables. [useful tie-breaker]
e.g. SA is the variable with highest degree 5; the other variables have degree 2 or 3; T has degree
0.
ORDER-DOMAIN-VALUES
Value selection—fail-last
If we are trying to find all the solution to a problem (not just the first one), then the ordering does
not matter.
Least-constraining-value heuristic: prefers the value that rules out the fewest choice for the
neighboring variables in the constraint graph. (Try to leave the maximum flexibility for
subsequent variable assignments.)
e.g. We have generated the partial assignment with WA=red and NT=green and that our next
choice is for Q. Blue would be a bad choice because it eliminates the last legal value left for Q’s
neighbor, SA, therefore prefers red to blue.
2. Interleaving search and inference
INFERENCE
forward checking: [One of the simplest forms of inference.] Whenever a variable X is assigned,
the forward-checking process establishes arc consistency for it: for each unassigned variable Y
that is connected to X by a constraint, delete from Y’s domain any value that is inconsistent with
the value chosen for X.
There is no reason to do forward checking if we have already done arc consistency as a
preprocessing step.
Advantage: For many problems the search will be more effective if we combine the MRV
heuristic with forward checking.
Disadvantage: Forward checking only makes the current variable arc-consistent, but doesn’t look
ahead and make all the other variables arc-consistent.
MAC (Maintaining Arc Consistency) algorithm: [More powerful than forward checking,
detect this inconsistency.] After a variable Xi is assigned a value, the INFERENCE procedure
calls AC-3, but instead of a queue of all arcs in the CSP, we start with only the arcs(Xj, Xi) for
all Xj that are unassigned variables that are neighbors of Xi. From there, AC-3 does constraint
propagation in the usual way, and if any variable has its domain reduced to the empty set, the call
to AC-3 fails and we know to backtrack immediately.
Intelligent backtracking
chronological backtracking: The BACKGRACKING-SEARCH in Fig 6.5. When a branch of the
search fails, back up to the preceding variable and try a different value for it. (The most recent
decision point is revisited.)
e.g.
Suppose we have generated the partial assignment {Q=red, NSW=green, V=blue, T=red}.
When we try the next variable SA, we see every value violates a constraint.
We back up to T and try a new color, it cannot resolve the problem.
Intelligent backtracking: Backtrack to a variable that was responsible for making one of the
possible values of the next variable (e.g. SA) impossible.
Conflict set for a variable: A set of assignments that are in conflict with some value for that
variable.
(e.g. The set {Q=red, NSW=green, V=blue} is the conflict set for SA.)
backjumping method: Backtracks to the most recent assignment in the conflict set.
(e.g. backjumping would jump over T and try a new value for V.)
Forward checking can supply the conflict set with no extra work.
Whenever forward checking based on an assignment X=x deletes a value from Y’s domain, add
X=x to Y’s conflict set;
If the last value is deleted from Y’s domain, the assignment in the conflict set of Y are added to
the conflict set of X.
In fact,every branch pruned by backjumping is also pruned by forward checking. Hence simple
backjumping is redundant in a forward-checking search or in a search that uses stronger
consistency checking (such as MAC).
Conflict-directed backjumping:
e.g.
consider the partial assignment which is proved to be inconsistent: {WA=red, NSW=red}.
We try T=red next and then assign NT, Q, V, SA, no assignment can work for these last 4
variables.
Eventually we run out of value to try at NT, but simple backjumping cannot work because NT
doesn’t have a complete conflict set of preceding variables that caused to fail.
The set {WA, NSW} is a deeper notion of the conflict set for NT, caused NT together with any
subsequent variables to have no consistent solution. So the algorithm should backtrack to NSW
and skip over T.
A backjumping algorithm that uses conflict sets defined in this way is called conflict-direct
backjumping.
How to Compute:
When a variable’s domain becomes empty, the “terminal” failure occurs, that variable has a
standard conflict set.
Let Xj be the current variable, let conf(Xj) be its conflict set. If every possible value for Xj fails,
backjump to the most recent variable Xi in conf(Xj), and set
conf(Xi) ← conf(Xi)∪ conf(Xj) – {Xi}.
The conflict set for an variable means, there is no solution from that variable onward, given the
preceding assignment to the conflict set.
e.g.
assign WA, NSW, T, NT, Q, V, SA.
SA fails, and its conflict set is {WA, NT, Q}. (standard conflict set)
Backjump to Q, its conflict set is {NT, NSW}∪{WA,NT,Q}-{Q} = {WA, NT, NSW}.
Backtrack to NT, its conflict set is {WA}∪{WA,NT,NSW}-{NT} = {WA, NSW}.
Hence the algorithm backjump to NSW. (over T)
After backjumping from a contradiction, how to avoid running into the same problem again:
Constraint learning: The idea of finding a minimum set of variables from the conflict set that
causes the problem. This set of variables, along with their corresponding values, is called a no-
good. We then record the no-good, either by adding a new constraint to the CSP or by keeping a
separate cache of no-goods.
Backtracking occurs when no legal assignment can be found for a variable. Conflict-directed
backjumping backtracks directly to the source of the problem.
9. Local search for CSP:
Local search algorithms for CSPs use a complete-state formulation: the initial state assigns a
value to every variable, and the search change the value of one variable at a time.
The min-conflicts heuristic: In choosing a new value for a variable, select the value that results
in the minimum number of conflicts with other variables.
Local search techniques in Section 4.1 can be used in local search for CSPs.
The landscape of a CSP under the mini-conflicts heuristic usually has a series of plateau.
Simulated annealing and Plateau search (i.e. allowing sideways moves to another state with the
same score) can help local search find its way off the plateau. This wandering on the plateau can
be directed with tabu search: keeping a small list of recently visited states and forbidding the
algorithm to return to those tates.
Constraint weighting: a technique that can help concentrate the search on the important
constraints.
Each constraint is given a numeric weight Wi, initially all 1.
At each step, the algorithm chooses a variable/value pair to change that will result in the lowest
total weight of all violated constraints.
The weights are then adjusted by incrementing the weight of each constraint that is violated by
the current assignment.
Local search can be used in an online setting when the problem changes, this is particularly
important in scheduling problems.
The structure of problem

1. The structure of constraint graph
The structure of the problem as represented by the constraint graph can be used to find solution
quickly.
e.g. The problem can be decomposed into 2 independent subproblems: Coloring T and coloring
the mainland.
Tree: A constraint graph is a tree when any two varyiable are connected by only one path.
Directed arc consistency (DAC): A CSP is defined to be directed arc-consistent under an
ordering of variables X1, X2, … , Xn if and only if every Xi is arc-consistent with each Xj for j>i.
By using DAC, any tree-structured CSP can be solved in time linear in the number of variables.
How to solve a tree-structure CSP:
Pick any variable to be the root of the tree;
Choose an ordering of the variable such that each variable appears after its parent in the tree.
(topological sort)
Any tree with n nodes has n-1 arcs, so we can make this graph directed arc-consistent in O(n)
steps, each of which must compare up to d possible domain values for 2 variables, for a total
time of O(nd2).
Once we have a directed arc-consistent graph, we can just march down the list of variables and
choose any remaining value.
Since each link from a parent to its child is arc consistent, we won’t have to backtrack, and can
move linearly through the variables.
There are 2 primary ways to reduce more general constraint graphs to trees:
1. Based on removing nodes;
e.g. We can delete SA from the graph by fixing a value for SA and deleting from the domains of
other variables any values that are inconsistent with the value chosen for SA.
The general algorithm:
Choose a subset S of the CSP’s variables such that the constraint graph becomes a tree after
removal of S. S is called a cycle cutset.
For each possible assignment to the variables in S that satisfies all constraints on S,
(a) remove from the domain of the remaining variables any values that are inconsistent with the
assignment for S, and
(b) If the remaining CSP has a solution, return it together with the assignment for S.
Time complexity: O(dc·(n-c)d2), c is the size of the cycle cut set.
Cutset conditioning: The overall algorithmic approach of efficient approximation algorithms to
find the smallest cycle cutset.
2. Based on collapsing nodes together
Tree decomposition: construct a tree decomposition of the constraint graph into a set of
connected subproblems, each subproblem is solved independently, and the resulting solutions are
then combined.
A tree decomposition must satisfy 3 requirements:

·Every variable in the original problem appears in at least one of the subproblems.
·If 2 variables are connected by a constraint in the original problem, they must appear together
(along with the constraint) in at least one of the subproblems.
·If a variable appears in 2 subproblems in the tree, it must appear in every subproblem along the
path connecting those those subproblems.
We solve each subproblem independently.

If any one has no solution, the entire problem has no solution.
If we can solve all the subproblems, then construct a global solution as follows:
First, view each subproblem as a “mega-variable” whose domain is the set of all solutions for the
subproblem.
Then, solve the constraints connecting the subproblems using the efficient algorithm for trees.
A given constraint graph admits many tree decomposition;
In choosing a decomposition, the aim is to make the subproblems as small as possible.
Tree width:
The tree width of a tree decomposition of a graph is one less than the size of the largest
subproblems.
The tree width of the graph itself is the minimum tree width among all its tree decompositions.
Time complexity: O(ndw+1), w is the tree width of the graph.
2. The structure in the values of variables

By introducing a symmetry-breaking constraint, we can break the value symmetry and reduce
the search space by a factor of n!.
e.g.
Consider the map-coloring problems with n colors, for every consistent solution, there is actually
a set of n! solutions formed by permuting the color names.(value symmetry)
On the Australia map, WA, NT and SA must all have different colors, so there are 3!=6 ways to
assign.
We can impose an arbitrary ordering constraint NT<SA<WA that requires the 3 values to be in
alphabetical order. This constraint ensures that only one of the n! solution is possible: {NT=blue,
SA=green, WA=red}. (symmetry-breaking constraint)

Unit 3

Uploaded by

Copyright:

Available Formats

Unit 3

Uploaded by

Document Information

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Unit 3

Uploaded by

Copyright:

Available Formats

AL3391 ARTIFICIAL INTELLIGENCE

UNIT III GAME PLAYING AND CSP

Perfect Information game.

2. Optimal Decisions in Games:

Optimal Decision Making in Games

Game’s Utility Function

3. Alpha-Beta Pruning Search:

o Alpha-beta pruning is a modified version of the minimax algorithm. It is an

Condition for Alpha-beta pruning:

The main condition which required for alpha-beta pruning

Key points about alpha-beta pruning:

Working of Alpha-Beta Pruning:

Move Ordering in Alpha-Beta pruning:

It can be of two types:

Rules to find good ordering:

Following are some rules to find good ordering in alpha-beta pruning:

o Occur the best move from the shallowest node.

4. Monte Carlo Tree Search (MCTS):

Monte Carlo Tree Search (MCTS) algorithm:

Advantages of Monte Carlo Tree Search:

1. MCTS is a simple algorithm to implement.

6. Partially observable games:

7. Constraint satisfaction problems:

Constraint satisfaction technique. By the name, it is understood that constraint satisfaction

A state in state-space is defined by assigning values to some or all variables such as

Types of Domains in CSP

Constraint Types in CSP

Note: A special constraint which works in real-world is known as Preference constraint.

 We can search for a solution or

variable’s domain satisfy the unary constraints on the variables.

Note: The n-queen problem is already discussed in Problem-solving in AI section.

We can perform all the arithmetic operations on a given cryptarithmetic problem.

 There should be a unique digit to be replaced with a unique alphabet.

 Given a cryptarithmetic problem, i.e., S E N D + M O R E = M O N E Y

 In this example, add both terms S E N D and M O R E to bring M O N E Y as a result.

 Further, adding the next two terms N and R we get,

where 1 will be carry forward to the above term

where 1 will be carry forward to the above term

 Below is the representation of the assignment of the digits to the alphabets.

To solve CSPs efficiently without domain-specific knowledge, address following questions:

1. Variable and value ordering

2. Interleaving search and inference

9. Local search for CSP:

The structure of problem

A tree decomposition must satisfy 3 requirements:

We solve each subproblem independently.

2. The structure in the values of variables

You might also like