Unit 3
Unit 3
Unit 3
1. Game Theory:
Game theory is basically a branch of mathematics that is used to typical strategic interaction
between different players (agents), all of which are equally rational, in a context with
predefined rules (of playing or maneuvering) and outcomes. Every player or agent is a rational
entity who is selfish and tries to maximize the reward to be obtained using a particular
strategy. All the players abide by certain rules in order to receive a predefined playoff- a
reward after a certain outcome. Hence, a GAME can be defined as a set of players, actions,
strategies, and a final playoff for which all the players are competing.
Game Theory has now become a describing factor for both Machine Learning algorithms and
many daily life situations.
Consider the SVM (Support Vector Machine) for instance. According to Game Theory, the
SVM is a game between 2 players where one player challenges the other to find the best hyper-
plane after providing the most difficult points for classification. The final playoff of this game
is a solution that will be a trade-off between the strategic abilities of both players competing.
Nash equilibrium:
Nash equilibrium can be considered the essence of Game Theory. It is basically a state, a point
of equilibrium of collaboration of multiple players in a game. Nash Equilibrium guarantees
maximum profit to each player.
Let us try to understand this with the help of Generative Adversarial Networks (GANs).
What is GAN?
It is a combination of two neural networks: the Discriminator and the Generator. The
Generator Neural Network is fed input images which it analyzes and then produces new
sample images, which are made to represent the actual input images as close as possible. Once
the images have been produced, they are sent to the Discriminator Neural Network. This neural
network judges the images sent to it and classifies them as generated images and actual input
images. If the image is classified as the original image, the DNN changes its parameters of
judging. If the image is classified as a generated image, the image is rejected and returned to
the GNN. The GNN then alters its parameters in order to improve the quality of the image
produced.
This is a competitive process which goes on until both neural networks do not require to make
any changes in their parameters and there can be no further improvement in both neural
networks. This state of no further improvement is known as NASH EQUILIBRIUM. In other
words, GAN is a 2-player competitive game where both players are continuously optimizing
themselves to find a Nash Equilibrium.
But how do we know if the game has reached Nash Equilibrium?
In any game, one of the agents is required to disclose their strategy in front of the other agents.
After the revelation, if none of the players changes their strategies, it is understood that the
game has reached Nash Equilibrium.
Now that we are aware of the basics of Game Theory, let us try to understand how Nash
Equilibrium is attained in a simultaneous game. There are many examples but the most famous
is the Prisoner’s Dilemma. There are some more examples such as the Closed-bag exchange
Game, the Friend or For Game, and the iterated Snowdrift Game.
In all these games, two players are involved and the final playoff is a result of a decision that
has to be made by both players. Both players have to make a choice between defection and co-
operation. If both players cooperate, the final playoff will turn out to be positive for both.
However, if both defect, the final playoff will be negative for both players. If there is a
combination of one player defecting and the other co-operating, the final playoff will be
positive for one and negative for another.
Here, Nash Equilibrium plays an important role. Only if both players jot out a strategy that
benefits each other and provide both with a positive playoff, the solution to this problem will
be optimal.
There are many more real examples and a number of pieces of code that try to solve this
dilemma. The basic essence, however, is the attainment of the Nash Equilibrium in an
uncomfortable situation.
Where is GAME THEORY now?
Game Theory is increasingly becoming a part of the real-world in its various applications in
areas like public health services, public safety, and wildlife. Currently, game theory is being
used in adversary training in GANs, multi-agent systems, and imitation and reinforcement
learning. In the case of perfect information and symmetric games, many Machine Learning and
Deep Learning techniques are applicable. The real challenge lies in the development of
techniques to handle incomplete information games, such as Poker. The complexity of the
game lies in the fact that there are too many combinations of cards and the uncertainty of the
cards being held by the various players.
Types of Games:
Currently, there are about 5 types of classification of games. They are as follows:
1. Zero-Sum and Non-Zero Sum Games: In non-zero-sum games, there are multiple players
and all of them have the option to gain a benefit due to any move by another player. In zero-
sum games, however, if one player earns something, the other players are bound to lose a
key playoff.
2. Simultaneous and Sequential Games: Sequential games are the more popular games where
every player is aware of the movement of another player. Simultaneous games are more
difficult as in them, the players are involved in a concurrent game. BOARD GAMES are the
perfect example of sequential games and are also referred to as turn-based or extensive-form
games.
Humans’ intellectual capacities have been engaged by games for as long as civilization has
existed, sometimes to an alarming degree. Games are an intriguing subject for AI researchers
because of their abstract character. A game’s state is simple to depict, and actors are usually
limited to a small number of actions with predetermined results. Physical games, such as
croquet and ice hockey, contain significantly more intricate descriptions, a much wider variety
of possible actions, and rather ambiguous regulations defining the legality of activities. With
the exception of robot soccer, these physical games have not piqued the AI community’s
interest.
Games are usually intriguing because they are difficult to solve. Chess, for example, has an
average branching factor of around 35, and games frequently stretch to 50 moves per player,
therefore the search tree has roughly 35100 or 10154 nodes (despite the search graph having
“only” about 1040 unique nodes). As a result, games, like the real world, necessitate the ability
to make some sort of decision even when calculating the best option is impossible.
Inefficiency is also heavily punished in games. Whereas a half-efficient implementation of A
search will merely take twice as long to complete, a chess software that is half as efficient in
utilizing its available time will almost certainly be beaten to death, all other factors being
equal. As a result of this research, a number of intriguing suggestions for making the most use
of time have emerged.
Let us start with games with two players, whom we’ll refer to as MAX and MIN for obvious
reasons. MAX is the first to move, and then they take turns until the game is finished. At the
conclusion of the game, the victorious player receives points, while the loser receives
penalties. A game can be formalized as a type of search problem that has the following
elements:
S0: The initial state of the game, which describes how it is set up at the start.
Player (s): Defines which player in a state has the move.
Actions (s): Returns a state’s set of legal moves.
Result (s, a): A transition model that defines a move’s outcome.
Terminal-Test (s): A terminal test that returns true if the game is over but false otherwise.
Terminal states are those in which the game has come to a conclusion.
Utility (s, p): A utility function (also known as a payout function or objective function )
determines the final numeric value for a game that concludes in the terminal state s for
player p. The result in chess is a win, a loss, or a draw, with values of +1, 0, or 1/2.
Backgammon’s payoffs range from 0 to +192, but certain games have a greater range of
possible outcomes. A zero-sum game is defined (confusingly) as one in which the total
reward to all players is the same for each game instance. Chess is a zero-sum game because
each game has a payoff of 0 + 1, 1 + 0, or 1/2 + 1/2. “Constant-sum” would have been a
preferable name, 22 but zero-sum is the usual term and makes sense if each participant is
charged 1.
The game tree for the game is defined by the beginning state, ACTIONS function, and
RESULT function—a tree in which the nodes are game states and the edges represent
movements. The figure below depicts a portion of the tic-tac-toe game tree (noughts and
crosses). MAX may make nine different maneuvers from his starting position. The game
alternates between MAXs setting an X and MINs placing an O until we reach leaf nodes
corresponding to terminal states, such as one player having three in a row or all of the
squares being filled. The utility value of the terminal state from the perspective of MAX is
shown by the number on each leaf node; high values are thought to be beneficial for MAX
and bad for MIN
The game tree for tic-tac-toe is relatively short, with just 9! = 362,880 terminal nodes.
However, because there are over 1040 nodes in chess, the game tree is better viewed as a
theoretical construct that cannot be realized in the actual world. But, no matter how big the
game tree is, MAX’s goal is to find a solid move. A tree that is superimposed on the whole
game tree and examines enough nodes to allow a player to identify what move to make is
referred to as a search tree.
A sequence of actions leading to a goal state—a terminal state that is a win—would be the best
solution in a typical search problem. MIN has something to say about it in an adversarial
search. MAX must therefore devise a contingent strategy that specifies M A X’s initial state
move, then MAX’s movements in the states resulting from every conceivable MIN response,
then MAX’s moves in the states resulting from every possible MIN reaction to those moves,
and so on. This is quite similar to the AND-OR search method, with MAX acting as OR and
MIN acting as AND. When playing an infallible opponent, an optimal strategy produces results
that are as least as excellent as any other plan. We’ll start by demonstrating how to find the
best plan.
We’ll move to the trivial game in the figure below since even a simple game like tic-tac-toe is
too complex for us to draw the full game tree on one page. MAX’s root node moves are
designated by the letters a1, a2, and a3. MIN’s probable answers to a1 are b1, b2, b3, and so
on. This game is over after MAX and MIN each make one move. (In game terms, this tree
consists of two half-moves and is one move deep, each of which is referred to as a ply.) The
terminal states in this game have utility values ranging from 2 to 14.
The optimal strategy can be found from the minimax value of each node, which we express as
MINIMAX, given a game tree (n). Assuming that both players play optimally from there
through the finish of the game, the utility (for MAX) of being in the corresponding state is the
node’s minimax value. The usefulness of a terminal state is obviously its minimax value.
Furthermore, if given the option, MAX prefers to shift to a maximum value state, whereas
MIN wants to move to a minimum value state. So here’s what we’ve got:
Let’s use these definitions to analyze the game tree shown in the figure above. The game’s
UTILITY function provides utility values to the terminal nodes on the bottom level. Because
the first MIN node, B, has three successor states with values of 3, 12, and 8, its minimax value
is 3. Minimax value 2 is also used by the other two MIN nodes. The root node is a MAX node,
with minimax values of 3, 2, and 2, resulting in a minimax value of 3. We can also find the
root of the minimax decision: action a1 is the best option for MAX since it leads to the highest
minimax value.
This concept of optimal MAX play requires that MIN plays optimally as well—it maximizes
MAX’s worst-case outcome. What happens if MIN isn’t performing at its best? Then it’s a
simple matter of demonstrating that MAX can perform even better. Other strategies may
outperform the minimax method against suboptimal opponents, but they will always
outperform optimal opponents.
α>=β
Let's take an example of two-player search tree to understand the working of Alpha-beta
pruning
Step 1: At the first step the, Max player will start first move from node A where α= -∞ and
β= +∞, these value of alpha and beta passed down to node B where again α= -∞ and β= +∞,
and Node B passes the same value to its child D.
Step 2: At Node D, the value of α will be calculated as its turn for Max. The value of α is
compared with firstly 2 and then 3, and the max (2, 3) = 3 will be the value of α at node D
and node value will also 3.
Step 3: Now algorithm backtrack to node B, where the value of β will change as this is a
turn of Min, Now β= +∞, will compare with the available subsequent nodes value, i.e. min
(∞, 3) = 3, hence at node B now α= -∞, and β= 3.
In the next step, algorithm traverse the next successor of Node B which is node E, and the
values of α= -∞, and β= 3 will also be passed.
Step 4: At node E, Max will take its turn, and the value of alpha will change. The current
value of alpha will be compared with 5, so max (-∞, 5) = 5, hence at node E α= 5 and β= 3,
where α>=β, so the right successor of E will be pruned, and algorithm will not traverse it,
and the value at node E will be 5.
Step 5: At next step, algorithm again backtrack the tree, from node B to node A. At node
A, the value of alpha will be changed the maximum available value is 3 as max (-∞, 3)= 3,
and β= +∞, these two values now passes to right successor of A which is Node C.
At node C, α=3 and β= +∞, and the same values will be passed on to node F.
Step 6: At node F, again the value of α will be compared with left child which is 0, and
max(3,0)= 3, and then compared with right child which is 1, and max(3,1)= 3 still α
remains 3, but the node value of F will become 1.
Step 7: Node F returns the node value 1 to node C, at C α= 3 and β= +∞, here the value of
beta will be changed, it will compare with 1 so min (∞, 1) = 1. Now at C, α=3 and β= 1,
and again it satisfies the condition α>=β, so the next child of C which is G will be pruned,
and the algorithm will not compute the entire sub-tree G.
Step 8: C now returns the value of 1 to A here the best value for A is max (3, 1) = 3.
Following is the final game tree which is the showing the nodes which are computed and
nodes which has never computed. Hence the optimal value for the maximizer is 3 for this
example.
The effectiveness of alpha-beta pruning is highly dependent on the order in which each
node is examined. Move order is an important aspect of alpha-beta pruning.
Worst ordering: In some cases, alpha-beta pruning algorithm does not prune any of the
leaves of the tree, and works exactly as minimax algorithm. In this case, it also consumes
more time because of alpha-beta factors, such a move of pruning is called worst ordering.
In this case, the best move occurs on the right side of the tree. The time complexity for
such an order is O(bm).
Ideal ordering: The ideal ordering for alpha-beta pruning occurs when lots of pruning
happens in the tree, and best moves occur at the left side of the tree. We apply DFS hence it
first search left of the tree and go deep twice as minimax algorithm in the same amount of
time. Complexity in ideal ordering is O(bm/2).
Monte Carlo Tree Search (MCTS) is a search technique in the field of Artificial Intelligence
(AI). It is a probabilistic and heuristic driven search algorithm that combines the classic tree
search implementations alongside machine learning principles of reinforcement learning.
In tree search, there’s always the possibility that the current best action is actually not the most
optimal action. In such cases, MCTS algorithm becomes useful as it continues to evaluate other
alternatives periodically during the learning phase by executing them, instead of the current
perceived optimal strategy. This is known as the ” exploration-exploitation trade-off “. It
exploits the actions and strategies that is found to be the best till now but also must continue to
explore the local space of alternative decisions and find out if they could replace the current
best.
Exploration helps in exploring and discovering the unexplored parts of the tree, which could
result in finding a more optimal path. In other words, we can say that exploration expands the
tree’s breadth more than its depth. Exploration can be useful to ensure that MCTS is not
overlooking any potentially better paths. But it quickly becomes inefficient in situations with
large number of steps or repetitions. In order to avoid that, it is balanced out by exploitation.
Exploitation sticks to a single path that has the greatest estimated value. This is a greedy
approach and this will extend the tree’s depth more than its breadth. In simple words, UCB
formula applied to trees helps to balance the exploration-exploitation trade-off by periodically
exploring relatively unexplored nodes of the tree and discovering potentially more optimal
paths than the one it is currently exploiting.
For this characteristic, MCTS becomes particularly useful in making optimal decisions in
Artificial Intelligence (AI) problems.
Selection: In this process, the MCTS algorithm traverses the current tree from the root node
using a specific strategy. The strategy uses an evaluation function to optimally select nodes
with the highest estimated value. MCTS uses the Upper Confidence Bound (UCB) formula
applied to trees as the strategy in the selection process to traverse the tree. It balances the
exploration-exploitation trade-off. During tree traversal, a node is selected based on some
parameters that return the maximum value. The parameters are characterized by the formula
that is typically used for this purpose is given below.
where;
Si = value of a node i
xi = empirical mean of a node i
C = a constant
t = total number of simulations
When traversing a tree during the selection process, the child node that returns the greatest
value from the above equation will be one that will get selected. During traversal, once a
child node is found which is also a leaf node, the MCTS jumps into the expansion step.
Expansion: In this process, a new child node is added to the tree to that node which was
optimally reached during the selection process.
Simulation: In this process, a simulation is performed by choosing moves or strategies until
a result or predefined state is achieved.
Backpropagation: After determining the value of the newly added node, the remaining tree
must be updated. So, the backpropagation process is performed, where it backpropagates
from the new node to the root node. During the process, the number of simulation stored in
each node is incremented. Also, if the new node’s simulation results in a win, then the
number of wins is also incremented.
The above steps can be visually understood by the diagram given below:
These types of algorithms are particularly useful in turn based games where there is no element
of chance in the game mechanics, such as Tic Tac Toe, Connect 4, Checkers, Chess, Go, etc.
This has recently been used by Artificial Intelligence Programs like AlphaGo, to play against
the world’s top Go players. But, its application is not limited to games only. It can be used in
any situation which is described by state-action pairs and simulations used to forecast
outcomes.
As we can see, the MCTS algorithm reduces to a very few set of functions which we can use
any choice of games or in any optimizing strategy.
1. As the tree growth becomes rapid after a few iterations, it requires a huge amount of
memory.
2. There is a bit of a reliability issue with Monte Carlo Tree Search. In certain scenarios, there
might be a single branch or path, that might lead to loss against the opposition when
implemented for those turn-based games. This is mainly due to the vast amount of
combinations and each of the nodes might not be visited enough number of times to
understand its result or outcome in the long run.
3. MCTS algorithm needs a huge number of iterations to be able to effectively decide the most
efficient path. So, there is a bit of a speed issue there.
5. Stochastic games:
Many unforeseeable external occurrences can place us in unforeseen circumstances in real life.
Many games, such as dice tossing, have a random element to reflect this unpredictability.
These are known as stochastic games. Backgammon is a classic game that mixes skill and luck.
The legal moves are determined by rolling dice at the start of each player’s turn white, for
example, has rolled a 6–5 and has four alternative moves in the backgammon scenario shown
in the figure below.
This is a standard backgammon position. The object of the game is to get all of one’s pieces off
the board as quickly as possible. White moves in a clockwise direction toward 25, while Black
moves in a counterclockwise direction toward 0. Unless there are many opponent pieces, a
piece can advance to any position; if there is only one opponent, it is caught and must start
over. White has rolled a 6–5 and must pick between four valid moves: (5–10,5–11), (5–11,19–
24), (5–10,10–16), and (5–11,11–16), where the notation (5–11,11–16) denotes moving one
piece from position 5 to 11 and then another from 11 to 16.
Stochastic game tree for a backgammon position
White knows his or her own legal moves, but he or she has no idea how Black will roll, and
thus has no idea what Black’s legal moves will be. That means White won’t be able to build a
normal game tree-like in chess or tic-tac-toe. In backgammon, in addition to M A X and M I N
nodes, a game tree must include chance nodes. The figure below depicts chance nodes as
circles. The possible dice rolls are indicated by the branches leading from each chance node;
each branch is labelled with the roll and its probability. There are 36 different ways to roll two
dice, each equally likely, yet there are only 21 distinct rolls because a 6–5 is the same as a 5–6.
P (1–1) = 1/36 because each of the six doubles (1–1 through 6–6) has a probability of 1/36.
Each of the other 15 rolls has a 1/18 chance of happening.
The following phase is to learn how to make good decisions. Obviously, we want to choose the
move that will put us in the best position. Positions, on the other hand, do not have specific
minimum and maximum values. Instead, we can only compute a position’s anticipated value,
which is the average of all potential outcomes of the chance nodes.
As a result, we can generalize the deterministic minimax value to an expected-minimax value
for games with chance nodes. Terminal nodes, MAX and MIN nodes (for which the dice roll is
known), and MAX and MIN nodes (for which the dice roll is unknown) all function as before.
We compute the expected value for chance nodes, which is the sum of all outcomes, weighted
by the probability of each chance action.
A state-space
The notion of the solution.
Discrete Domain: It is an infinite domain which can have one state for multiple
variables. For example, a start state can be allocated infinite times for each variable.
Finite Domain: It is a finite domain which can have continuous states describing one
domain for one specific variable. It is also called a continuous domain.
Unary Constraints: It is the simplest type of constraints that restricts the value of a
single variable.
Binary Constraints: It is the constraint type which relates two variables. A value x2 will
contain a value which lies between x1 and x3.
Global Constraints: It is the constraint type which involves an arbitrary number of
variables.
Some special types of solution algorithms are used to solve the following types of
constraints:
Linear Constraints: These type of constraints are commonly used in linear
programming where each variable containing an integer value exists in linear form
only.
Non-linear Constraints: These type of constraints are used in non-linear programming
where each variable (an integer value) exists in a non-linear form.
Constraint Propagation
In local state-spaces, the choice is only one, i.e., to search for a solution. But in CSP, we have
two choices either:
Node Consistency: A single variable is said to be node consistent if all the values in the
Arc Consistency: A variable is arc consistent if every value in its domain satisfies the
binary constraints of the variables.
Path Consistency: When the evaluation of a set of two variable with respect to a third
variable can be extended over another variable, satisfying all the binary constraints. It is
similar to arc consistency.
k-consistency: This type of consistency is used to define the notion of stronger forms of
propagation. Here, we examine the k-consistency of the variables.
CSP Problems
Constraint satisfaction includes those problems which contains some constraints while solving
the problem. CSP includes the following problems:
Graph Coloring: The problem where the constraint is that no adjacent sides can have
the same color.
Sudoku Playing: The gameplay where the constraint is that no number from 0-9 can be
repeated in the same row or column.
n-queen problem: In n-queen problem, the constraint is that no queen should be placed
either diagonally, in the same row or column.
Crossword: In crossword problem, the constraint is that there should be the correct
formation of the words, and it should be meaningful.
Latin square Problem: In this game, the task is to search the pattern which is occurring
several times in the game. They may be shuffled but will contain the same digits.
Cryptarithmetic Problem: This problem has one most important constraint that is, we
cannot assign a different digit to the same character. All digits should contain a unique
alphabet.
Cryptarithmetic Problem
Cryptarithmetic Problem is a type of constraint satisfaction problem where the game is about
digits and its unique replacement either with alphabets or other symbols. In cryptarithmetic
problem, the digits (0-9) get substituted by some possible alphabets or symbols. The task in
cryptarithmetic problem is to substitute each digit with an alphabet to get the result
arithmetically correct.
Let’s understand the cryptarithmetic problem as well its constraints better with the help
of an example:
Follow the below steps to understand the given problem by breaking it into its subparts:
Starting from the left hand side (L.H.S) , the terms are S and M. Assign a digit which
could give a satisfactory result. Let’s assign S->9 and M->1.
Hence, we get a satisfactory result by adding up the terms and got an assignment for O as O-
>0 as well.
Now, move ahead to the next terms E and O to get N as its output.
Adding E and O, which means 5+0=0, which is not possible because according to
cryptarithmetic constraints, we cannot assign the same digit to two letters. So, we need to think
more and assign some other value.
Note: When we will solve further, we will get one carry, so after applying it, the answer will be
satisfied.
Again, on adding the last two terms, i.e., the rightmost terms D and E, we get Y as its
result.
Backtracking search, a form of depth-first search, is commonly used for solving CSPs. Inference
can be interwoven with search.
Commutativity: CSPs are all commutative. A problem is commutative if the order of
application of any given set of actions has no effect on the outcome.
Backtracking search: A depth-first search that chooses values for one variable at a time and
backtracks when a variable has no legal values left to assign.
Backtracking algorithm repeatedly chooses an unassigned variable, and then tries all values in
the domain of that variable in turn, trying to find a solution. If an inconsistency is detected, then
BACKTRACK returns failure, causing the previous call to try another value.
There is no need to supply BACKTRACKING-SEARCH with a domain-specific initial state,
action function, transition model, or goal test.
BACKTRACKING-SARCH keeps only a single representation of a state and alters that
representation rather than creating a new ones.
SELECT-UNASSIGNED-VARIABLE
Variable selection—fail-first
Minimum-remaining-values (MRV) heuristic: The idea of choosing the variable with the
fewest “legal” value. A.k.a. “most constrained variable” or “fail-first” heuristic, it picks a
variable that is most likely to cause a failure soon thereby pruning the search tree. If some
variable X has no legal values left, the MRV heuristic will select X and failure will be detected
immediately—avoiding pointless searches through other variables.
E.g. After the assignment for WA=red and NT=green, there is only one possible value for SA, so
it makes sense to assign SA=blue next rather than assigning Q.
[Powerful guide]
Degree heuristic: The degree heuristic attempts to reduce the branching factor on future choices
by selecting the variable that is involved in the largest number of constraints on other unassigned
variables. [useful tie-breaker]
e.g. SA is the variable with highest degree 5; the other variables have degree 2 or 3; T has degree
0.
ORDER-DOMAIN-VALUES
Value selection—fail-last
If we are trying to find all the solution to a problem (not just the first one), then the ordering does
not matter.
Least-constraining-value heuristic: prefers the value that rules out the fewest choice for the
neighboring variables in the constraint graph. (Try to leave the maximum flexibility for
subsequent variable assignments.)
e.g. We have generated the partial assignment with WA=red and NT=green and that our next
choice is for Q. Blue would be a bad choice because it eliminates the last legal value left for Q’s
neighbor, SA, therefore prefers red to blue.
INFERENCE
forward checking: [One of the simplest forms of inference.] Whenever a variable X is assigned,
the forward-checking process establishes arc consistency for it: for each unassigned variable Y
that is connected to X by a constraint, delete from Y’s domain any value that is inconsistent with
the value chosen for X.
There is no reason to do forward checking if we have already done arc consistency as a
preprocessing step.
Advantage: For many problems the search will be more effective if we combine the MRV
heuristic with forward checking.
Disadvantage: Forward checking only makes the current variable arc-consistent, but doesn’t look
ahead and make all the other variables arc-consistent.
MAC (Maintaining Arc Consistency) algorithm: [More powerful than forward checking,
detect this inconsistency.] After a variable Xi is assigned a value, the INFERENCE procedure
calls AC-3, but instead of a queue of all arcs in the CSP, we start with only the arcs(Xj, Xi) for
all Xj that are unassigned variables that are neighbors of Xi. From there, AC-3 does constraint
propagation in the usual way, and if any variable has its domain reduced to the empty set, the call
to AC-3 fails and we know to backtrack immediately.
Intelligent backtracking
chronological backtracking: The BACKGRACKING-SEARCH in Fig 6.5. When a branch of the
search fails, back up to the preceding variable and try a different value for it. (The most recent
decision point is revisited.)
e.g.
Suppose we have generated the partial assignment {Q=red, NSW=green, V=blue, T=red}.
When we try the next variable SA, we see every value violates a constraint.
We back up to T and try a new color, it cannot resolve the problem.
Intelligent backtracking: Backtrack to a variable that was responsible for making one of the
possible values of the next variable (e.g. SA) impossible.
Conflict set for a variable: A set of assignments that are in conflict with some value for that
variable.
(e.g. The set {Q=red, NSW=green, V=blue} is the conflict set for SA.)
backjumping method: Backtracks to the most recent assignment in the conflict set.
(e.g. backjumping would jump over T and try a new value for V.)
Forward checking can supply the conflict set with no extra work.
Whenever forward checking based on an assignment X=x deletes a value from Y’s domain, add
X=x to Y’s conflict set;
If the last value is deleted from Y’s domain, the assignment in the conflict set of Y are added to
the conflict set of X.
In fact,every branch pruned by backjumping is also pruned by forward checking. Hence simple
backjumping is redundant in a forward-checking search or in a search that uses stronger
consistency checking (such as MAC).
Conflict-directed backjumping:
e.g.
consider the partial assignment which is proved to be inconsistent: {WA=red, NSW=red}.
We try T=red next and then assign NT, Q, V, SA, no assignment can work for these last 4
variables.
Eventually we run out of value to try at NT, but simple backjumping cannot work because NT
doesn’t have a complete conflict set of preceding variables that caused to fail.
The set {WA, NSW} is a deeper notion of the conflict set for NT, caused NT together with any
subsequent variables to have no consistent solution. So the algorithm should backtrack to NSW
and skip over T.
A backjumping algorithm that uses conflict sets defined in this way is called conflict-direct
backjumping.
How to Compute:
When a variable’s domain becomes empty, the “terminal” failure occurs, that variable has a
standard conflict set.
Let Xj be the current variable, let conf(Xj) be its conflict set. If every possible value for Xj fails,
backjump to the most recent variable Xi in conf(Xj), and set
conf(Xi) ← conf(Xi)∪ conf(Xj) – {Xi}.
The conflict set for an variable means, there is no solution from that variable onward, given the
preceding assignment to the conflict set.
e.g.
assign WA, NSW, T, NT, Q, V, SA.
SA fails, and its conflict set is {WA, NT, Q}. (standard conflict set)
Backjump to Q, its conflict set is {NT, NSW}∪{WA,NT,Q}-{Q} = {WA, NT, NSW}.
Backtrack to NT, its conflict set is {WA}∪{WA,NT,NSW}-{NT} = {WA, NSW}.
Hence the algorithm backjump to NSW. (over T)
After backjumping from a contradiction, how to avoid running into the same problem again:
Constraint learning: The idea of finding a minimum set of variables from the conflict set that
causes the problem. This set of variables, along with their corresponding values, is called a no-
good. We then record the no-good, either by adding a new constraint to the CSP or by keeping a
separate cache of no-goods.
Backtracking occurs when no legal assignment can be found for a variable. Conflict-directed
backjumping backtracks directly to the source of the problem.
Local search algorithms for CSPs use a complete-state formulation: the initial state assigns a
value to every variable, and the search change the value of one variable at a time.
The min-conflicts heuristic: In choosing a new value for a variable, select the value that results
in the minimum number of conflicts with other variables.
Local search techniques in Section 4.1 can be used in local search for CSPs.
The landscape of a CSP under the mini-conflicts heuristic usually has a series of plateau.
Simulated annealing and Plateau search (i.e. allowing sideways moves to another state with the
same score) can help local search find its way off the plateau. This wandering on the plateau can
be directed with tabu search: keeping a small list of recently visited states and forbidding the
algorithm to return to those tates.
Constraint weighting: a technique that can help concentrate the search on the important
constraints.
Each constraint is given a numeric weight Wi, initially all 1.
At each step, the algorithm chooses a variable/value pair to change that will result in the lowest
total weight of all violated constraints.
The weights are then adjusted by incrementing the weight of each constraint that is violated by
the current assignment.
Local search can be used in an online setting when the problem changes, this is particularly
important in scheduling problems.
Tree: A constraint graph is a tree when any two varyiable are connected by only one path.
Directed arc consistency (DAC): A CSP is defined to be directed arc-consistent under an
ordering of variables X1, X2, … , Xn if and only if every Xi is arc-consistent with each Xj for j>i.
By using DAC, any tree-structured CSP can be solved in time linear in the number of variables.
How to solve a tree-structure CSP:
Pick any variable to be the root of the tree;
Choose an ordering of the variable such that each variable appears after its parent in the tree.
(topological sort)
Any tree with n nodes has n-1 arcs, so we can make this graph directed arc-consistent in O(n)
steps, each of which must compare up to d possible domain values for 2 variables, for a total
time of O(nd2).
Once we have a directed arc-consistent graph, we can just march down the list of variables and
choose any remaining value.
Since each link from a parent to its child is arc consistent, we won’t have to backtrack, and can
move linearly through the variables.
There are 2 primary ways to reduce more general constraint graphs to trees:
1. Based on removing nodes;
e.g. We can delete SA from the graph by fixing a value for SA and deleting from the domains of
other variables any values that are inconsistent with the value chosen for SA.
The general algorithm:
Choose a subset S of the CSP’s variables such that the constraint graph becomes a tree after
removal of S. S is called a cycle cutset.
For each possible assignment to the variables in S that satisfies all constraints on S,
(a) remove from the domain of the remaining variables any values that are inconsistent with the
assignment for S, and
(b) If the remaining CSP has a solution, return it together with the assignment for S.
Time complexity: O(dc·(n-c)d2), c is the size of the cycle cut set.
Cutset conditioning: The overall algorithmic approach of efficient approximation algorithms to
find the smallest cycle cutset.
2. Based on collapsing nodes together
Tree decomposition: construct a tree decomposition of the constraint graph into a set of
connected subproblems, each subproblem is solved independently, and the resulting solutions are
then combined.