The Search for GTO: Determining Optimal Poker Strategy Using Linear Programming
Open Works
Senior Independent Study Theses
2017
Recommended Citation
Young, Stuart, "The Search for GTO: Determining Optimal Poker Strategy Using Linear Programming" (2017). Senior Independent
Study Theses. Paper 7807.
https://openworks.wooster.edu/independentstudy/7807
by
Stuart Young
The College of Wooster
2017
Advised by:
Dr. Matthew Moynihan
Abstract
This project applies techniques from game theory and linear programming
to find the optimal strategies of two variants of poker. A set of optimal poker
strategies describe a Nash equilibrium, where no player can improve their
outcome by changing their own strategy, given the strategies of their
opponent(s). We first consider Kuhn Poker as a simple application of our
methodology. We then turn our attention to 2-7 Draw Poker, a modern variant
on which little previous research has focused. However, the techniques that
we use are incapable of solving large, full-scale variants of poker such as 2-7
Draw. Therefore, we utilize several abstraction techniques to render a
computationally feasible LP that retains the underlying spirit of the game. We
use the Gambit software package to build and solve LPs whose solutions are
the optimal strategies for each game.
Acknowledgements
To my parents, Lee Stivers and Peter Young, thank you. Beyond just
providing me with this education, you taught me why it's so important. Mom,
you have inspired my love of scientific inquiry and desire for new knowledge.
Most importantly, you are supportive, reassuring, and always there for me.
This project, and my time at the College of Wooster, would not have been
possible without each of you.
Contents

Abstract iii
Acknowledgements v
1 Introduction 1
2 Linear Programming 3
2.1.1 Constraints 4
2.1.2 Solutions 7
2.2 Duality 9
2.2.1 Motivation 9
2.2.2 Theory 12
3 Game Theory 17
3.1 Background 17
3.2.1 Equilibrium 23
4 Kuhn Poker 55
4.1 Rules 55
4.3.1 Strategies 58
4.3.2 Payoffs 61
4.4.2 Payoffs 74
5.1 Rules 89
5.2 Abstractions 92
6 Conclusion 111
A Gambit 113

List of Figures
A.1 Part of the Extensive Form of Kuhn Poker in the Gambit GUI 114

List of Tables
Chapter 1
Introduction
Chapter 2
Linear Programming
f(x) = c_1 x_1 + c_2 x_2 + \cdots + c_n x_n.

This f(x) is called the objective function of the LP. Each c_j \in \mathbb{R} is a coefficient
for a particular decision variable x_j. These coefficients can have different
interpretations, depending on context; for example, if f(x) is a revenue
function, each c_j would represent the unit price of good x_j. A particular LP can
seek to either maximize or minimize this function; a business could want to
minimize cost or maximize profit. Fortunately, we can easily convert between
a minimization LP and maximization LP by changing the sign; minimizing
f (x) is equivalent to maximizing − f (x).
2.1.1 Constraints
Each constraint of an LP takes one of the forms

a_1 x_1 + a_2 x_2 + \cdots + a_n x_n \;\{\le, =, \ge\}\; b.
An inequality constraint

a_1 x_1 + a_2 x_2 + \cdots + a_n x_n \le b

can be converted to an equality by adding a nonnegative slack variable w:

a_1 x_1 + a_2 x_2 + \cdots + a_n x_n + w = b.

Similarly, a constraint of the form

a_1 x_1 + a_2 x_2 + \cdots + a_n x_n \ge b

becomes an equality after subtracting a nonnegative surplus variable v:

a_1 x_1 + a_2 x_2 + \cdots + a_n x_n - v = b.

Conversely, an equality constraint

a_1 x_1 + a_2 x_2 + \cdots + a_n x_n = b

can be expressed as the pair of inequalities

a_1 x_1 + \cdots + a_n x_n \ge b, \qquad a_1 x_1 + \cdots + a_n x_n \le b.
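As a quick numerical check of the slack-variable conversion (the coefficients below are made up for illustration):

```python
# Numerical check of the slack-variable conversion described above.
# The constraint 2*x1 + 3*x2 <= 12 becomes an equality after adding the
# nonnegative slack w = 12 - (2*x1 + 3*x2).  (Made-up coefficients.)

def slack_for(x1, x2, a=(2, 3), b=12):
    """Return the slack w that turns a1*x1 + a2*x2 <= b into an equality."""
    return b - (a[0] * x1 + a[1] * x2)

w = slack_for(x1=1, x2=2)          # 12 - (2 + 6) = 4
assert w >= 0                      # the point is feasible: slack is nonnegative
assert 2 * 1 + 3 * 2 + w == 12     # equality holds once the slack is added
```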
A standard form maximization LP is then

Maximize: f(x) = c_1 x_1 + c_2 x_2 + \cdots + c_n x_n
subject to: a_{i1} x_1 + a_{i2} x_2 + \cdots + a_{in} x_n \le b_i, \quad 1 \le i \le m
x_1, x_2, \ldots, x_n \ge 0.

In matrix notation,

Maximize: c^T x
subject to: Ax \le b
x \ge 0.
2.1.2 Solutions
We now consider the solution methods and solution types of LPs. Most
LPs are much too complex to solve by hand or by graphical methods. Instead,
we typically use the Simplex algorithm. Broadly, the Simplex algorithm
searches the vertices of the feasible region to find the optimal solution. Its
mechanics are not a focus of this project and are well documented elsewhere,
including in [12].
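Although the Simplex mechanics are left to [12], a small standard-form LP can be solved with an off-the-shelf solver. The sketch below uses SciPy's `linprog` routine (an assumed tool for this illustration, not one used in the thesis); since `linprog` minimizes, we maximize f(x) by minimizing −f(x), as noted earlier.

```python
from scipy.optimize import linprog

# A small made-up standard-form LP:
#   maximize f(x) = 3*x1 + 2*x2
#   subject to x1 + x2 <= 4,  x1 + 3*x2 <= 6,  x1, x2 >= 0.
# linprog minimizes, so we pass -c and negate the optimal value.
res = linprog(c=[-3, -2],
              A_ub=[[1, 1], [1, 3]], b_ub=[4, 6],
              bounds=[(0, None), (0, None)], method="highs")

assert res.status == 0                 # an optimal solution was found
assert abs(-res.fun - 12.0) < 1e-9     # optimum f = 12 at the vertex (4, 0)
```

The optimum lands on a vertex of the feasible region, as the Simplex intuition above suggests.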
Using the Simplex method, we can find several different types of solutions.
An n-tuple (x1 , x2 , · · · , xn ) specifying a value for each decision variable is a
solution; it is a feasible solution if it satisfies all of the constraints of the LP
and an optimal solution if it corresponds to the actual maximum or minimum
of the LP. The optimal solution is also a feasible solution, by definition. Any
solution that fails to satisfy one or more constraints is an infeasible solution.
The following definitions further detail types of solutions.
Linear programs that have no optimal solution are split into two
categories. Some LPs have no feasible solutions; that is, the space defined by
the constraints is empty. For example, consider the LP
subject to: x_1 + x_2 + x_3 \le 5
-x_1 - x_2 - x_3 \le -8 \quad (2.3)
2x_1 + x_2 \le 10
x_1, x_2, x_3 \ge 0.

The first two constraints require x_1 + x_2 + x_3 \le 5 and x_1 + x_2 + x_3 \ge 8
simultaneously, so no feasible solution exists.
x_1, x_2 \ge 0.

Let x_1 \ge 2 and x_2 = 0; any solution of this form satisfies both constraints and is
feasible. As x_1 increases, so does f(x) = x_1 - 6x_2. Thus, f(x) takes on arbitrarily
large values and the LP is unbounded. We can now categorize LPs by their
solution types.
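Solvers report these categories directly. The sketch below (again assuming SciPy's `linprog`) flags the infeasible LP (2.3) and a deliberately unbounded LP; the unbounded example uses a made-up constraint, since the original example's constraints are not reproduced above.

```python
from scipy.optimize import linprog

# The LP (2.3) above is infeasible: x1 + x2 + x3 <= 5 and
# x1 + x2 + x3 >= 8 cannot hold at once.  An arbitrary objective is
# supplied, since the objective does not affect feasibility.
res_inf = linprog(c=[1, 1, 1],
                  A_ub=[[1, 1, 1], [-1, -1, -1], [2, 1, 0]],
                  b_ub=[5, -8, 10],
                  bounds=[(0, None)] * 3, method="highs")
assert res_inf.status == 2    # status 2: the problem is infeasible

# An unbounded LP: maximize x1 - 6*x2 subject only to x2 <= 1 (a made-up
# constraint) and x1, x2 >= 0; x1 can grow without limit.
res_unb = linprog(c=[-1, 6],  # minimize -(x1 - 6*x2)
                  A_ub=[[0, 1]], b_ub=[1],
                  bounds=[(0, None)] * 2, method="highs")
assert res_unb.status == 3    # status 3: the objective is unbounded
```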
2.2 Duality
For many applications of linear programming, it is useful to consider the
dual LP associated with a particular primal LP. These structures come in
pairs; the following section details their relationship. In short, every feasible
solution for one of the programs characterizes a bound on the optimal
objective function value of the other. In fact, both LPs will have the same
optimal objective function value.
2.2.1 Motivation

Consider a primal LP (2.5) in the decision variables x_1, x_2, x_3 with the constraints

3x_2 + x_3 \le 5
x_1 + x_2 + x_3 \le 10
3x_1 + 2x_2 \le 8.
We can also place upper bounds on f(x^*), as follows. Consider the sum of
the three constraints:

3x_2 + x_3 \le 5
x_1 + x_2 + x_3 \le 10
3x_1 + 2x_2 \le 8,

which gives 4x_1 + 6x_2 + 2x_3 \le 23.
Since each variable is nonnegative, this sum forms an upper bound on the
optimal objective function value; that is,
More generally, we can multiply each constraint by a nonnegative multiplier
y_1, y_2, y_3 before summing:

y_1(3x_2 + x_3) \le 5y_1
y_2(x_1 + x_2 + x_3) \le 10y_2
y_3(3x_1 + 2x_2) \le 8y_3.

Summing and collecting terms gives

(y_2 + 3y_3)x_1 + (3y_1 + y_2 + 2y_3)x_2 + (y_1 + y_2)x_3 \le 5y_1 + 10y_2 + 8y_3.
Assume that each of the x_i coefficients in this sum is at least as large as the
corresponding coefficient in the original objective function. That is,

y_2 + 3y_3 \ge 2
y_1 + y_2 \ge 3.

Thus,

f(x) \le (y_2 + 3y_3)x_1 + (3y_1 + y_2 + 2y_3)x_2 + (y_1 + y_2)x_3 \le 5y_1 + 10y_2 + 8y_3.
Since f(x) is bounded above by 5y_1 + 10y_2 + 8y_3, we can minimize this sum in
order to find the tightest upper bound for the original LP. This new LP is

Minimize: 5y_1 + 10y_2 + 8y_3
subject to: y_2 + 3y_3 \ge 2
y_1 + y_2 \ge 3
y_1, y_2, y_3 \ge 0.

This is the dual LP associated with the given primal LP (2.5). Intuitively, the
dual of a maximization LP seeks to minimize the upper bound on the primal
objective function, and the dual of a minimization LP seeks to maximize the
lower bound.
2.2.2 Theory
Consider a primal LP in standard form:

Maximize: \sum_{j=1}^{n} c_j x_j
subject to: \sum_{j=1}^{n} a_{ij} x_j \le b_i, \quad 1 \le i \le m \quad (2.8)
x_j \ge 0, \quad 1 \le j \le n.

Its associated dual LP is

Minimize: \sum_{i=1}^{m} b_i y_i
subject to: \sum_{i=1}^{m} y_i a_{ij} \ge c_j, \quad 1 \le j \le n \quad (2.9)
y_i \ge 0, \quad 1 \le i \le m.
Proof. To verify this relationship, we must first write the dual problem in
standard form. Recall that a standard form LP maximizes its objective
function and has ≤ constraints. The sign restrictions remain the same. Since
minimizing a function is equivalent to maximizing its negative, we have
\min \sum_{i=1}^{m} b_i y_i = -\max \sum_{i=1}^{m} (-b_i) y_i.

The dual in standard form is therefore

Maximize: \sum_{i=1}^{m} (-b_i) y_i
subject to: \sum_{i=1}^{m} (-a_{ij}) y_i \le -c_j, \quad 1 \le j \le n
y_i \ge 0, \quad 1 \le i \le m.
Taking the dual of this standard-form LP yields

Minimize: \sum_{j=1}^{n} (-c_j) x_j
subject to: \sum_{j=1}^{n} (-a_{ij}) x_j \ge -b_i, \quad 1 \le i \le m
x_j \ge 0, \quad 1 \le j \le n,

which is equivalent to the original primal

Maximize: \sum_{j=1}^{n} c_j x_j
subject to: \sum_{j=1}^{n} a_{ij} x_j \le b_i, \quad 1 \le i \le m
x_j \ge 0, \quad 1 \le j \le n.

Thus, the dual of the dual is the primal.
The Weak Duality Theorem states that if (x_1, \ldots, x_n) is feasible for the primal
LP (2.8) and (y_1, \ldots, y_m) is feasible for the dual LP (2.9), then
\sum_j c_j x_j \le \sum_i b_i y_i.

Proof. We have

\sum_j c_j x_j \le \sum_j \left( \sum_i y_i a_{ij} \right) x_j = \sum_{i,j} y_i a_{ij} x_j = \sum_i \left( \sum_j a_{ij} x_j \right) y_i \le \sum_i b_i y_i. \quad (2.10)

For the first inequality, we know that each x_j \ge 0 by (2.8) and that
c_j \le \sum_i y_i a_{ij} for each j by (2.9). The second inequality is similar; y_i \ge 0 by (2.9)
and \sum_j a_{ij} x_j \le b_i by (2.8) for each i.
From the Weak Duality Theorem, we know that every primal objective
function value (for a feasible solution) is at most every dual objective
function value. For an arbitrary primal feasible solution x, f(x) \le g(y) for all
dual feasible solutions y, where f and g are the primal and dual objective
functions, respectively. This provides an upper bound for every primal
objective function value. Similarly, the objective function value of any primal
feasible solution provides a lower bound for every dual objective function
value. Therefore, we can characterize optimal solutions x and y as those
for which f(x) = g(y).
For more detail, see [12]. Intuitively, the Strong Duality Theorem states that the
objective functions for a primal-dual pair of LPs are equal precisely at the
optimal solutions x^* and y^* of each:

\sum_j c_j x_j^* = \sum_i b_i y_i^*.
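A numerical sketch of this primal-dual relationship, again assuming SciPy's `linprog` and using made-up data (not the thesis's LP (2.5)):

```python
from scipy.optimize import linprog

# A made-up standard-form primal:
#   maximize 2*x1 + 3*x2  s.t.  x1 + 2*x2 <= 8,  3*x1 + x2 <= 9,  x >= 0,
# and its dual:  minimize 8*y1 + 9*y2  s.t.  A^T y >= c,  y >= 0.
A = [[1, 2], [3, 1]]
b = [8, 9]
c = [2, 3]

primal = linprog([-v for v in c], A_ub=A, b_ub=b,
                 bounds=[(0, None)] * 2, method="highs")

At = [[row[j] for row in A] for j in range(2)]          # transpose of A
dual = linprog(b, A_ub=[[-v for v in row] for row in At],
               b_ub=[-v for v in c],
               bounds=[(0, None)] * 2, method="highs")

assert primal.status == 0 and dual.status == 0
# Strong duality: the optimal objective values coincide (both equal 13).
assert abs(-primal.fun - dual.fun) < 1e-9
```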
Chapter 3
Game Theory
This chapter presents several concepts in game theory that are central to
this project. We focus on the various representations and types of games, as
well as the associated Nash equilibria. These concepts will later be used in
conjunction with the linear programming theory of the previous chapter to
develop a method of representing and solving two variants of poker.
3.1 Background
The purpose of game theory is to develop mathematical models that
represent decision-making interactions among individuals. We assume that
these individuals are rational in their own decisions and take into
account expectations of how the other decision-makers will act [7]. Game
theory can be used to model a wide range of situations, including business
negotiations, political competition, and economic models of oligopolistic
competition, as well as traditional 'games' such as chess, checkers, and poker.
Game theoretic models are highly abstracted versions of the actual situation.
Unlike many other fields of mathematics, the study of game theory attempts
both to describe positively how individuals do make decisions and to prescribe
normatively how they should under the assumption of perfect rationality.
The modern field of game theory is relatively new, and is generally
considered to have begun in the 19th century with Cournot, Bertrand, and
Edgeworth's respective work on models of oligopoly pricing [4]. These ideas
developed into a more general theory of games in the early 20th century with
John von Neumann's proof of the existence of mixed-strategy equilibria in
two-person zero-sum games. He later co-authored the seminal Theory of Games
and Economic Behavior with Oskar Morgenstern [4]. This text considered
theoretical results about cooperative games with several players, and offered an
axiomatic conception of utility theory that allowed mathematicians to
formalize decision-making under uncertainty. Additionally, it introduced the
concepts of the normal (or strategic) and extensive forms, as well as a proof of
the existence of a maximin equilibrium for all two-player zero-sum games [4].
With zero-sum games fairly well understood, attention turned to the more
general case of non zero-sum games, which are perhaps more common. In
1950, John Forbes Nash Jr. proved the existence of an equilibrium in every
finite non-cooperative game [4]. The Nash equilibrium is a generalization of
von Neumann's maximin theorem to non zero-sum games, and requires that
each player's optimal strategy be a payoff-maximizing response to his correct
expectation of his opponents' strategies. This guarantee was a significant
advance in the field, and allowed the study of a broader class of games [4].
Any such game with a finite set of actions Ai for each player i is considered finite.
Notice that the preferences of each player i are defined over A, instead of
A_i; that is, their preferences are defined over every combination of the other
players' actions, as well as their own. This distinguishes a strategic game
from a more general decision problem; players are concerned with their
opponents' actions and payoffs, as well as their own. The normal form model
of a game is intentionally abstract, allowing it to be applied to a wide variety
of situations. The set of players N may consist of individuals, animals,
governments, businesses, etc. Likewise, the set of actions Ai may contain only
a few simple actions or complicated sets of contingencies over many potential
outcomes. The primary restriction on the elements of N and each A_i is that
each player has a well-defined preference relation over all outcomes of action
profile combinations. For example, the preference relation for a business
might be a function that maps particular actions to profit levels.
This definition considers a generalized preference relation %_i; typically, and
for our purposes, we can use a more intuitive payoff function u_i. In this case,
we denote a particular game as \langle N, (A_i), (u_i) \rangle. This payoff function u_i maps a
particular combination of action profiles to a numerical payoff value. These
inputs can be represented either as a singular action profile a ∈ A or as a tuple
to highlight the actions of individual players. For example, a particular u1 (a)
represents the payoff to player 1 for a particular combination of each player’s
actions a ∈ A. This action profile a defines a particular action a j for each player
j from their action set A j . Alternatively, we may wish to highlight particular
action profiles from distinct players as separate inputs to u_i. Players typically
wish to choose a set of actions that maximize this value, which could represent
profit, utility, etc. The utility function describes how a particular player feels
about a particular combination of actions of all of the players in the game.
In a game, players make choices about how they will play. Each of these
choices is a particular action taken by a player at some point in the game. In a
game with more than one such set of action choices, players employ a
strategy, which is an algorithmic prescription of every action that a player will
choose. A particular strategy could be a simple choice between two actions, or
a complicated set of prescriptions to choose actions based on various
contingencies of how the other players act. These choices may or may not be
identical across iterations of the game, motivating the following distinction.
                 p2
            s3          s4
p1   s1   (w1, w2)   (x1, x2)
     s2   (y1, y2)   (z1, z2)
However, each player need not always play the same strategy in this
game. It may be necessary to vary their play in order to optimize their
expected payoff. This is the motivation behind the concept of a mixed strategy.
Recall that a mixed strategy σi is a probability distribution over all of player i’s
pure strategies. The probabilities assigned to each pure strategy determine the
expected payoff of the mixed strategy. For example, consider a particular
mixed strategy σ1 in which player 1 chooses s1 with probability 0.7 and s2 with
probability 0.3. Player 2 uses mixed strategy σ2 , where he chooses s3 and s4
with equal probability. Then the expected payoff to player 1 is

u_1(\sigma_1, \sigma_2) = (0.7)(0.5)w_1 + (0.7)(0.5)x_1 + (0.3)(0.5)y_1 + (0.3)(0.5)z_1. \quad (3.1)
More generally, the payoff for a mixed strategy profile is a linear function
of the probabilities assigned to each pure strategy [4]. In (3.1), the coefficient
associated with a particular payoff value is the product of the probabilities
assigned to the particular pure strategy of each player in their mixed strategy.
For example, suppose player 1 uses pure strategy s_1 (probability 0.7 in his mixed
strategy \sigma_1) and player 2 uses pure strategy s_3 (probability 0.5 in his mixed
strategy \sigma_2). In this case, player 1's payoff is w_1, and this combination occurs
with probability 0.7 \times 0.5 = 0.35.
3.2.1 Equilibrium
In any game, each player tries to maximize their own payoff. Payoffs are
dependent on the combination of strategies that each employs. Thus, the
decision of which strategy to employ is critical. This section begins by
considering the Nash equilibrium of any normal form game, and then
considers the more specific case of Maximin equilibrium in strictly competitive
games. Both refer to a steady state wherein each player correctly assumes how
the other(s) will play and chooses his own strategy rationally in response.
Generally, a game is said to be in equilibrium if no player can improve their
payoff by unilaterally changing strategy.
Nash Equilibria
The concept of the Nash equilibrium is well illustrated through the classic
example of the Prisoner’s Dilemma, as shown through a normal form
representation in Figure 3.2. This is a standard example in game theory
literature, and illustrates several important concepts. The rules of the game are
as follows: Two individuals are arrested by the police, and are suspected of
committing a crime together. The police lack sufficient evidence to convict
either on the most serious charge, but enough to convict either on a less
serious charge. Each individual is offered a choice: betray the other and testify
to the police that the other committed the most serious crime, or stay silent
and accept the conviction of the lesser charge. If both betray each other, the
penalty is reduced by one year for each. The individuals cannot communicate
with each other as they decide whether or not to betray their co-conspirator.
Unsurprisingly, they each want to minimize the number of years that they
spend in prison. The potential payoffs, in years in prison respectively, are
shown in Figure 3.2.
                    p2
               Betray     Silent
p1   Betray    (3, 3)     (0, 4)
     Silent    (4, 0)     (1, 1)
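The equilibrium of this game can be verified mechanically with a best-response check; the sketch below brute-forces it over the payoffs of Figure 3.2 (players minimize years in prison):

```python
from itertools import product

# Best-response check of the Prisoner's Dilemma in Figure 3.2.
# Entries are years in prison (row player, column player); each player
# wants to MINIMIZE their own entry.
years = {("Betray", "Betray"): (3, 3), ("Betray", "Silent"): (0, 4),
         ("Silent", "Betray"): (4, 0), ("Silent", "Silent"): (1, 1)}
actions = ["Betray", "Silent"]

def is_nash(a1, a2):
    """No unilateral deviation lowers the deviator's prison term."""
    u1, u2 = years[(a1, a2)]
    ok1 = all(years[(d, a2)][0] >= u1 for d in actions)
    ok2 = all(years[(a1, d)][1] >= u2 for d in actions)
    return ok1 and ok2

equilibria = [(a1, a2) for a1, a2 in product(actions, actions)
              if is_nash(a1, a2)]
assert equilibria == [("Betray", "Betray")]   # mutual betrayal is the unique NE
```

Note the dilemma: (Silent, Silent) gives both players a better outcome, yet it is not an equilibrium, since each player can profitably deviate.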
With this intuitive understanding, we can now formalize this logic and
develop the theory more generally. This section focuses on the Nash
equilibrium of the normal form representation; the concept extends naturally
to other game forms.
In this definition, we denote an optimal action profile for player i as a∗i , and
a non-optimal action profile ai . All players other than i are denoted by −i. For
a profile a∗ to be a Nash equilibrium, it must be the case that no player i has an
action with an outcome preferable to the corresponding outcome in the
optimal a∗i , given that every other player chooses their optimal a∗j . Here, we
consider a utility function with multiple inputs of action profiles (each
representing one or more players), whose union is a member of A. Consider
the second case, with specified utility functions. Player i uses his optimal
action profile a∗i , as do all other players −i. The payoff to player i, defined by ui ,
is greater than or equal to his payoff when all other players continue to use
their optimal a∗−i and he uses a non-optimal ai . No player has reason to deviate
from their strategy, given the strategy of the other players. A similar
conception of the Nash equilibrium applies to other game forms.
While this extends to n-player games, we are most concerned with the
two-player case. For a two-player game, an action profile (a_1^*, a_2^*) is a Nash
equilibrium if

u_1(a_1^*, a_2^*) \ge u_1(a_1, a_2^*) \quad \forall a_1 \in A_1

and

u_2(a_1^*, a_2^*) \ge u_2(a_1^*, a_2) \quad \forall a_2 \in A_2.

The payoff to each player is at least as great playing their optimal action
profile a_1^* or a_2^* as playing any non-optimal a_1 or a_2. This combination of
action profiles a_1^* and a_2^* is a Nash equilibrium.
Maximin Equilibrium
We now turn our attention to the specific case of Nash equilibria for a
two-person, zero-sum game. This section is heavily derived from [7],
beginning with the following definition.
Definition 9. A strategic game \langle \{1, 2\}, (A_i), (%_i) \rangle is strictly competitive or zero
sum if for any a, b \in A we have a %_1 b if and only if b %_2 a. For utility functions u_1
and u_2, this condition is u_1(a) \ge u_1(b) if and only if u_2(b) \ge u_2(a).
Recall that the inputs to each u_i are either elements of A, which are tuples of
action profiles for multiple players, or elements of A_i, which are individual
players' action profiles. If player 1's preference relation %_1 is given by the
payoff function u1 , then player 2’s payoff function u2 is such that
u1 (s1 , s2 ) + u2 (s1 , s2 ) = 0 for any strategies s1 ∈ A1 , s2 ∈ A2 , and (s1 , s2 ) ∈ A. That
is, a payoff gain for either player corresponds to an equivalent payoff loss for
the other. This structure influences each player’s strategy, as they have no
incentive to cooperate.
Definition 10. Let \langle \{1, 2\}, (A_i), (%_i) \rangle be a strictly competitive game. The action
x^* \in A_1 is a maxminimizer for player 1 if

\min_{y \in A_2} u_1(x^*, y) \ge \min_{y \in A_2} u_1(x, y) \quad \forall x \in A_1.

Similarly, the action y^* \in A_2 is a maxminimizer for player 2 if

\min_{x \in A_1} u_2(x, y^*) \ge \min_{x \in A_1} u_2(x, y) \quad \forall y \in A_2.
Theorem 4. For any strictly competitive game with Nash equilibria, a pair of action
profiles is a Nash equilibrium if and only if the action profile of each player is a
maxminimizer.

Proof. Since G is a strictly competitive game, we have that u_1(x, y) = -u_2(x, y) for
any pair of strategies x and y. For an arbitrary function f,
\min_z(-f(z)) = -\max_z f(z) and \arg\min_z(-f(z)) = \arg\max_z f(z); these arg
values are the z for which \min_z(-f(z)) and \max_z f(z) attain their equivalent
minimum and maximum values. Let x \in A_1 and y \in A_2 be (not necessarily
optimal) strategies for players 1 and 2, respectively. For every y \in A_2, we have

\min_{x \in A_1} u_2(x, y) = -\max_{x \in A_1} u_1(x, y).

Thus,

\max_{y \in A_2} \min_{x \in A_1} u_2(x, y) = -\min_{y \in A_2} \max_{x \in A_1} u_1(x, y).
Proposition 2. Let G = \langle \{1, 2\}, (A_i), (%_i) \rangle be a strictly competitive game.

(c) If x^* and y^* are maximin strategies for players 1 and 2 respectively, then (x^*, y^*)
is a Nash equilibrium of G.
Proof. We first prove (a) and (b) in conjunction. Let (x∗ , y∗ ) be a Nash
equilibrium of G. Then u2 (x∗ , y∗ ) ≥ u2 (x∗ , y) for all y ∈ A2 . Likewise, since
u2 = −u1 , u1 (x∗ , y∗ ) ≤ u1 (x∗ , y) for all y ∈ A2 . Therefore,
u1 (x∗ , y∗ ) = min y u1 (x∗ , y) ≤ maxx min y u1 (x, y). Similarly, u1 (x∗ , y∗ ) ≥ u1 (x, y∗ ) for
all x ∈ A1 . Therefore, u1 (x∗ , y∗ ) ≥ min y u1 (x, y) for all x ∈ A1 , so that
u1 (x∗ , y∗ ) ≥ maxx min y u1 (x, y). Therefore, u1 (x∗ , y∗ ) = maxx min y u1 (x, y) and x∗
is a maximin strategy for player 1. An analogous argument shows that y^* is a
maximin strategy for player 2 and that u_2(x^*, y^*) = \max_y \min_x u_2(x, y), so that
u_1(x^*, y^*) = \min_y \max_x u_1(x, y).
For part (c), let v^* = \max_x \min_y u_1(x, y) = \min_y \max_x u_1(x, y). From Lemma
1, we know that \max_y \min_x u_2(x, y) = -v^*. Since x^* is a maximin strategy for
player 1, we have u_1(x^*, y) \ge v^* for all y \in A_2. Likewise, since y^* is a maximin
strategy for player 2, we have u_2(x, y^*) \ge -v^* for all x \in A_1; equivalently,
u_1(x, y^*) \le v^* for all x \in A_1. Setting y = y^* and x = x^* gives u_1(x^*, y^*) = v^*,
so neither player can improve his payoff by deviating unilaterally, and (x^*, y^*)
is a Nash equilibrium of G.
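The equivalence of maximin strategies and equilibria in strictly competitive games can be checked by brute force on a small payoff matrix; the matrix below is made up and chosen to have a pure-strategy saddle point.

```python
# Brute-force maximin for a small zero-sum game in pure strategies.
# Rows are player 1's actions, columns are player 2's; entries are u1
# (so u2 = -u1).  The matrix is made up for illustration.
U1 = [[3, 1, 4],
      [2, 2, 2],
      [0, 1, 5]]

# Player 1's maximin value: max over rows of the row minimum.
v1 = max(min(row) for row in U1)
# Player 2 minimizing u1: min over columns of the column maximum.
v2 = min(max(U1[i][j] for i in range(3)) for j in range(3))

assert v1 == 2 and v2 == 2          # saddle point: maximin = minimax = v*
row = max(range(3), key=lambda i: min(U1[i]))
col = min(range(3), key=lambda j: max(U1[i][j] for i in range(3)))
assert (row, col) == (1, 1)          # the maximin pair is a Nash equilibrium
```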
– If (ak )k=1,...,K ∈ H (where K may be infinite) and L < K then (ak )k=1,··· ,L ∈ H.
– If an infinite sequence (ak )k=1,··· satisfies (ak )k=1,··· ,L ∈ H for every positive
integer L then (ak )k=1,··· ∈ H.
Let h be a history of length k; the ordered pair (h, a) is the history of length
k + 1 consisting of h followed by a. For any non-terminal history h, the player
function P(h) specifies the player whose turn it is, and that player chooses an
action a from the set

A(h) = \{a : (h, a) \in H\}.

At the beginning of any extensive game, this history is empty; no actions have
previously occurred. For the empty sequence \emptyset \in H, the player P(\emptyset) chooses a
member of A(\emptyset).
• N = {1, 2};
In Figure 3.3, the node at the top of the graph represents the history ∅. Here,
player 1 chooses between actions A and B, which are the members of A(∅).
These actions are shown as line segments originating from the initial node.
Once the first player has chosen either A or B, it is player 2’s turn,
P(A) = P(B) = 2. If player 1 chose action B, player 2 chooses between the
members of A(B), which are the actions G and H. Then, player 1 chooses
between actions C and D. The preference relations %1 and %2 define the
ordering of each player’s preference of outcomes; they can also be utility
functions u1 and u2 with the same ordering.
[Figure 3.3: an extensive form game tree. Player 1 chooses A or B at the root.
After A, player 2 chooses E or F, with payoffs (1, 4) and (3, 2). After B, player 2
chooses G or H; player 1 then chooses C or D, with payoffs (3, 2), (0, 5), (4, 1),
and (5, 0), respectively.]
Player 1's pure strategies are

S_1 = \{(A), (B, (G, C), (H, C)), (B, (G, C), (H, D)), (B, (G, D), (H, C)), (B, (G, D), (H, D))\}. \quad (3.2)

After choosing B, player 1's second action is contingent on player 2's actions.
For example, the strategy (B, (G, C), (H, D)) represents the following
contingency plan: after first choosing action B, if player 2 chooses G, take
action C. Otherwise, if player 2 takes action H, player 1 chooses D. Player 2's
strategies are simpler, and can be described as
S_2 = \{((A, E), (B, G)), ((A, E), (B, H)), ((A, F), (B, G)), ((A, F), (B, H))\}. \quad (3.3)
The strategy ((A, E), (B, G)) is interpreted as follows: if player 1 chooses A,
choose E. Otherwise, if player 1 chooses B, choose G.
It is important to note that a particular strategy defines the action to be
chosen for each history, even if that history is never reached when the strategy
is employed. For example, consider the strategy (A, (G, C), (H, C)) for player 1.
If he chooses action A, the later choice between C and D cannot happen.
Despite this, we still specify what player 1 would do in that scenario. In this
way, a formal strategy in an extensive game differs from an intuitive plan of
action.
Definition 14. A pure strategy s_i of player i \in N in an extensive game \langle N, H, P, (%_i) \rangle
is a deterministic prescription of how to play a game. A mixed strategy \sigma_i is a
probability distribution over all pure strategies.
These concepts of pure and mixed strategies are similar to those in the normal
form representation. For the game in Figure 3.3, (3.2) and (3.3) are the pure
strategies. We use the following notation to distinguish between the pure and
mixed strategies of the various players in any game. We often consider the
strategy of a particular player i, while holding the strategies of his opponents
fixed. Let s_{-i} \in S_{-i} denote the strategy selection for all players other than i, and
write (s_i', s_{-i}) for the profile

(s_1, \ldots, s_{i-1}, s_i', s_{i+1}, \ldots, s_n).

Similarly, for mixed strategies,

(\sigma_i', \sigma_{-i}) = (\sigma_1, \ldots, \sigma_{i-1}, \sigma_i', \sigma_{i+1}, \ldots, \sigma_n).
Definition 15. A pure strategy s_i is strictly dominated for player i if there exists a
mixed strategy \sigma_i' such that

u_i(\sigma_i', s_{-i}) > u_i(s_i, s_{-i}) \quad \forall s_{-i} \in S_{-i}.

The strategy s_i is weakly dominated if there exists a \sigma_i' such that this holds with
weak inequality, and the inequality is strict for at least one s_{-i}.
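Definition 15 allows the dominating strategy to be mixed; the sketch below checks the simpler special case of one pure strategy dominating another, using the Prisoner's Dilemma payoffs of Figure 3.2.

```python
# Strict-dominance check (pure strategies only, a special case of
# Definition 15) on the Prisoner's Dilemma of Figure 3.2.  Entries are
# years in prison, which each player wants to minimize, so a strategy s
# strictly dominates s' when it yields strictly FEWER years against
# every opponent action.
years_p1 = {("Betray", "Betray"): 3, ("Betray", "Silent"): 0,
            ("Silent", "Betray"): 4, ("Silent", "Silent"): 1}
opponent_actions = ["Betray", "Silent"]

def strictly_dominates(s, s_prime):
    return all(years_p1[(s, t)] < years_p1[(s_prime, t)]
               for t in opponent_actions)

assert strictly_dominates("Betray", "Silent")     # Silent is strictly dominated
assert not strictly_dominates("Silent", "Betray")
```

Eliminating the dominated strategy Silent for both players leaves (Betray, Betray), the game's unique Nash equilibrium.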
Definition 17. An extensive form game has perfect recall if for each player i we have
X_i(h) = X_i(h') whenever the histories h and h' are in the same information set of
player i.
In a game with perfect recall, no player forgets information that they knew
at a previous state. This information includes previous actions and any hidden
information known only to that player, as well as previous information sets.
Later information sets must refine previous information sets from earlier
histories. If this is not the case, then the game has imperfect recall.
[Figure 3.4: Extensive Form Game with Imperfect Information and Perfect Recall.
Player 1 chooses M or N; player 2, without observing this choice, chooses O or P;
player 1 then chooses l or r (if he chose M) or L or R (if he chose N). The payoffs
following M are (-1, 1), (1, -1), (3, -3), and (-2, 2); the payoffs following N are
(0, 0), (1, -1), (1, -1), and (3, -3).]
Figure 3.4 shows an extensive form game with imperfect information and
perfect recall. In this example, neither player knows which action the other
takes. For example, when player 2 is deciding between actions O and P,
he does not know if player 1 has previously chosen M or N. Player 2 has a
single information set, comprising both of his decision nodes and all four
associated actions. This information set is represented by the first dashed line;
at these nodes, he does not know if player 1 just took action M or N. Player 1
has two information sets, each with two nodes and four associated actions.
His first information set represents the two possible moves by player 2 (O and
P), after he chose action M at the beginning of the game. His second
information set contains the two game states possible after he chose action N
originally. Notice that player 1's information sets refine previous
distinctions; there is one set where he previously took action M, and another
where he took action N. Further, notice that player 1's possible actions are
identical across all nodes in an information set, but are also unique to that
information set. He faces the choice between l and r only when he previously
chose M, and between L and R only when he previously chose N.
We now formalize when two strategies are equivalent.
Definition 20. Let σ = (σi )i∈N be a profile of mixed or behavioral strategies for an
extensive game with imperfect information. The outcome O(σ) of σ is the probability
distribution over the terminal histories that results when each player i follows the
strategy specification of σi .
Definition 21. Two strategies, either mixed or behavioral, are outcome equivalent
if for every collection of pure strategies of the other players the two strategies induce
the same outcome.
Proposition 3. For any mixed strategy of a player in a finite extensive game with
perfect recall, there is an outcome-equivalent behavioral strategy.
We now present a brief theory of the Nash equilibrium for extensive form
games with imperfect information.
This version of the Nash equilibrium is very similar to that of the normal form
and the extensive form perfect information case.
Lemma 2. For an extensive game with perfect recall, the Nash equilibria in mixed
and behavioral strategies are equivalent.
Many of the structures of the sequence form are similar to those in the
extensive form. Recall that an extensive form game has a chance player and N
personal players. The chance player governs the chance node(s) by one or
more probability distributions, and the personal players govern the decision
nodes by their prescribed strategy. The sequence form defines a strategy
differently; instead of defining a probability distribution over all pure
strategies or over each information set, the player considers each payoff node
of the game tree and the sequence of actions necessary to get there. These
sequences are the basic unit of analysis in this form.
Definition 24. Let h(a) be a function that maps each terminal node a to a payoff
vector in R^N. The payoff function g : S_0 \times S_1 \times \cdots \times S_N \to R^N is defined by
g(s) = h(a) if s is the (N + 1)-tuple (s_0, s_1, \ldots, s_N) of sequences defined by a leaf a of
the game tree, and by g(s) = (0, \ldots, 0) \in R^N otherwise. Note that the chance player
sequence s_\emptyset occurs with probability 1 and will always be the first input in g.
The tuple (s0 , s1 , . . . , sN ) is unique for a particular node a in the game tree;
therefore, the payoff function g is well defined. The number of sequences is at
most the number of nodes, which grows linearly with the size of the game
tree. In contrast, the number of pure strategies in the normal form grows
exponentially, since each pure strategy must consider each combination of
actions at all possible information sets. For a two-player game with a known
chance player, the matrix P is the payoff matrix with player 1's and player 2's
sequences as the rows and columns, respectively. A particular entry P_{ij} is the
payoff when sequences s_i and s_j are played by the two respective personal
players; that is, P_{ij} = g(s_\emptyset, s_i, s_j). Many combinations of sequences between the
two players are impossible; they define combinations of actions that cannot
occur in conjunction with each other. Thus, the number of non-zero entries in
P is at most the number of leaves in the game tree. All other entries are 0, since
g maps any tuple of sequences not defined by a leaf to (0, \ldots, 0).
44 CHAPTER 3. GAME THEORY
ri(si) = ∏_{c∈si} βi(c),

[Figure: a small two-player game tree with player 1's moves M, N and l, r (L, R), player 2's moves o, p (O, P), and payoffs to each player at the leaves.]
Each ri (si ) is a real value that represents the product of the individual
decision probabilities of the actions leading to si . We now explore the
particular properties of this ri , based on [13]. In a game with perfect recall,
every member of an information set u ∈ Ii of player i has an identical history.
Let Cu be the set of actions possible at u. Consider a particular sequence su ,
which we call the sequence leading to u. An action c chosen at this information
set extends u. This extended sequence is given by
su c = su ∪ {c} ∀c ∈ Cu .
Since the choices made by the player up until that information set are the
same, any nonempty sequence is uniquely specified by its last action c. We can
now formally define the set of sequences Si, which has 1 + Σ_{u∈Ii} |Cu| elements;
one for each action in each information set, and one for the empty sequence s∅.
Each element in Si is a particular sequence s. Notice that the number of
sequences in a game is at most the number of terminal (payoff) nodes; an LP
that considers individual sequences will typically have fewer variables than
an LP that considers all possible pure strategies.
ri(s∅) = 1 (3.4)
The realization probability of a particular sequence must equal the sum of the realization probabilities for that sequence
extended over all of its possible extending actions c. We can think of the
probability of reaching a particular sequence as being “distributed” over all of
these extending actions. Formally,
−ri(su) + Σ_{c∈Cu} ri(su c) = 0   ∀u ∈ Ii. (3.5)
βi(c) = ri(su c) / ri(su)   ∀c ∈ Cu
Σ_{c∈Cu} βi(c) = 1
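Conditions (3.4) and (3.5) are mechanical to check. A minimal Python sketch (the function name and data layout are our own, purely illustrative):

```python
def is_realization_plan(r, infosets, tol=1e-9):
    """Check (3.4): r equals 1 at the empty sequence, and (3.5): at each
    information set, r at the leading sequence is distributed exactly
    over the sequences that extend it."""
    if abs(r["empty"] - 1) > tol:
        return False
    return all(abs(r[lead] - sum(r[s] for s in exts)) <= tol
               for lead, exts in infosets)

# A player whose six information sets all directly follow the empty
# sequence, each offering two actions (player 2 in Kuhn Poker has this shape).
r = {"empty": 1, "m": 1, "n": 0, "o": 2/3, "p": 1/3, "q": 0, "r": 1,
     "s": 0, "t": 1, "u": 2/3, "v": 1/3, "w": 1, "x": 0}
infosets = [("empty", pair) for pair in ("mn", "op", "qr", "st", "uv", "wx")]
print(is_realization_plan(r, infosets))   # True
r["o"] = 0.5                              # breaks (3.5) at the set {o, p}
print(is_realization_plan(r, infosets))   # False
```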
Definition 27. Any information set u for which ri (su ) = 0 in behavior strategy βi is
called irrelevant.
Mixed strategies can also be represented by a realization plan. Recall that a
mixed strategy is a combination of pure strategies with probabilities assigned
to each. This mixed strategy has a corresponding set of realization plans that
follow Definition 25. However, information is lost in converting a mixed
strategy to a realization plan, since the later does not need to cover every
possible contingency of the game. It only must cover the sequences and
information sets that the player will actually need to choose from with their
behavior strategy. Fortunately, the realization plan does retain the strategically
relevant aspects.
Definition 28. A set of mixed strategies are realization equivalent if for any fixed
strategies (pure or mixed) of the other players, all of the mixed strategies define the
same probabilities for reaching the nodes of the game tree.
Proposition 5. Mixed strategies are realization equivalent if and only if they have the
same realization plan.
Corollary 1. For a player with perfect recall, any mixed strategy is realization
equivalent to a behavior strategy.
This section constructs the sequence form LPs for a two-player zero-sum game, based on [13]. The game is first converted to sequence form using the theory
of the previous section. The sequence form structures are used to build a
primal-dual pair of LPs. We want to find a pair of realization plans, which can
be converted into behavior strategies. The components of an optimal
realization plan are probability distributions over each information set. The
sequence form translates to a primal-dual pair of LPs, whose solutions are the
optimal realization plans for each player.
This section uses the following notation. The realization plans r1 and r2 are
denoted as column vectors x and y which have |S1 | and |S2 | entries,
respectively. The constraint matrices E and F show that x and y are valid
realization plans in accordance with Definition (25), with Ex = e and Fy = f .
Here, E and F have |S1 | and |S2 | columns, respectively. They have 1 + |I1 | and
1 + |I2 | rows, respectively. Both e and f are unit vectors of the appropriate
dimension. The payoff matrix P has |S1 | rows and |S2 | columns, and individual
payoff entries are calculated according to Definition (24). The payoffs are with
respect to player 1; in this zero-sum game, player 2’s expected payoffs are
given by −P. The expected payoffs to realization plans x and y are xT Py and
−xT Py for the respective players.
Maximize: f(y) = −(xT P)y
subject to: Fy = f (3.7)
y ≥ 0.
In this LP, player 2 wants to utilize a realization plan y that maximizes his
payoff given by −(xT P)y. This realization plan must correspond to a legal set
of moves, giving the constraint Fy = f. Finally, since each element of y is a
probability, we have y ≥ 0. We also have the dual of this LP, given by
Minimize: f (q) = qT f
(3.8)
subject to: qT F ≥ −xT P.
The dual variables are given by the vector q, which has 1 + |I2 | elements.
We have an analogous pair of LPs for the best response x of player 1 given
realization plan y for player 2. This finds a realization plan x that maximizes
player 1’s payoff xT (Py). This x must satisfy the constraints xT ET = eT and x ≥ 0
that define a valid realization plan. The primal is
Maximize: f(x) = xT (Py)
subject to: xT ET = eT (3.9)
x ≥ 0.
The dual problem uses the unconstrained vector p, which has 1 + |I1 |
elements. It is given by
Minimize: f(p) = eT p
subject to: ET p ≥ Py. (3.10)
These LPs describe algorithms for finding the optimal realization plans of
one player, given a particular realization plan of the other. We require an LP
that generates the Nash equilibrium strategies, in which each player actively
considers the other’s strategy. Consider that (3.9) and (3.10) have the same
optimal objective function value; that is, xT (Py) = eT p. This is the payoff to
player 1, given that player 2 uses realization plan y. If y is not fixed and can be
varied by player 2, he will try to minimize this payoff. Since this is a zero-sum
game, player 2 wants to minimize any payoff he gives to player 1, as this will
maximize his own payoff. If we consider that y can vary, it must be subject to
the same constraints as (3.9). Thus, player 1’s optimal strategy is the solution
to the following LP:
Minimize: f (y, p) = eT p
subject to: − Py + ET p ≥ 0
(3.11)
Fy = f
y ≥ 0.
Player 2’s optimal strategy is the solution to the dual LP, given by
Maximize: f(x, q) = −qT f
subject to: −xT P − qT F ≤ 0
xT ET = eT (3.12)
x ≥ 0.
Theorem 5. The equilibria of a zero sum game with perfect recall are the optimal
primal and dual solutions of this sequence form linear program whose size, in sparse
representation, is linear in the size of the game tree.
Proof. Consider the primal LP in (3.11) with optimal solution y, p and the dual
LP in (3.12) with optimal solution x, q. Based on previous discussion, the
number of nonzero entries of the payoff matrix P and of the constraint
matrices E and F grows linearly with the game tree. By definition, y and p are
feasible solutions to (3.7) and (3.8), respectively. Likewise, x and q are feasible
solutions to (3.9) and (3.10). Multiplying the constraint Fy = f in (3.7) by qT
and the constraint qT F ≥ −xT P in (3.8) by y gives
−xT Py ≤ qT f. (3.13)
Analogously, multiplying the constraint xT ET = eT in (3.9) by p and the
constraint ET p ≥ Py in (3.10) by xT gives
xT Py ≤ eT p. (3.14)
Combining these,
eT p ≥ xT Py ≥ −qT f. (3.15)
The inequality in (3.13) is the Weak Duality Theorem (2): the value of the
objective function −xT Py from (3.7) is bounded from above by its dual objective
function value qT f from (3.8). Analogously, (3.14) shows this bound for the
primal-dual pair (3.9) and (3.10), and (3.15) for (3.11) and (3.12). Recall the
Strong Duality Theorem (3), which states that a pair of primal/dual solutions
are optimal only if the associated objective functions are equal. Applying this
to (3.11) and (3.12), we have that eT p = −qT f . This implies that equality holds
in (3.13), (3.14), and (3.15). Thus, x is an optimal solution to (3.9) and a best
response to y; by the analogous argument, y is a best response to x, so the pair
of realization plans is an equilibrium.
Kuhn Poker
4.1 Rules
In Kuhn Poker, there are two players and a deck containing the cards Jack,
Queen, and King. No card is repeated. First, each player antes one unit. Each
player is then dealt one card, and the player with the highest card wins the pot
at showdown, provided neither player has folded. After the random deal, the
subsequent betting structure of Kuhn poker is as follows. Player 1 can choose
to either bet one unit or check. If player 1 bets, player 2 has the option of
calling the bet or folding. If player 1 checks, player 2 can either bet one unit or
check. If player 2 bets, player 1 can either call or fold. At this point, if neither
has folded, the hand goes to showdown and the player with the highest card
wins the total pot [5].
[Figure 4.1: The game tree of Kuhn Poker, showing the betting combinations for each player and particular dealing of the three cards. The six subtrees correspond to the deals JQ, JK, QJ, QK, KJ, and KQ; player 1's action sequences are labeled a through l and player 2's m through x, with payoffs to player 1 at the leaves.]
4.3.1 Strategies
Each entry in Figure 4.2 represents a complete pure strategy that player 1
could choose, covering all deals and both betting rounds. The prescribed pure
strategy for each held card (J, Q, K) is read from the left, and the comma
delineates player 1’s first round betting strategy from his second round betting
strategy. A P represents a check or fold, B represents a bet or call, and O
represents a non-existent move option in player 1’s second betting round.
Player 1 will use the pass P to fold if player 2 has just bet, and to check if
player 2 has not just bet. For example, PPB, PPO represents player 1 betting
only while holding a King in the first round, and folding all hands if player 2
bets. If player 2 chooses not to bet after we check, the hand goes to showdown;
in this case, the second half of the strategy above is irrelevant. Note that there
is no prescribed strategy (O) for having a King in the second round after
player 2 has bet, since this sequence of actions is impossible if player 1 always
bets while holding a King in the first round. Player 2 can only call or fold;
player 1 would never face a bet from player 2.
Player 1 has 27 pure strategies, and player 2 has 64 pure strategies. Each B
and P entry has the same meaning, and the comma separates the actions that
player 2 will take after player 1 checks from when player 1 bets. The letters to
the left of the comma prescribe strategy for when player 1 has just checked,
and to the right for when he has just bet. Note that since none of player 2’s
betting actions are dependent on his own previous actions, all combinations of
bets are defined for each round. For example, PPB, PPB represents player 2
always calling or betting while holding a King, and checking or folding
otherwise.
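These counts can be checked by brute enumeration. A sketch (the three-options-per-card encoding for player 1 — bet now, check then fold, check then call — is our reading of the strategy notation above):

```python
from itertools import product

# Player 1: per held card, either bet in round one (B), or check and
# later fold (PP), or check and later call (PB).
player1 = list(product(["B", "PP", "PB"], repeat=3))

# Player 2: per held card, one action after a check and one action
# facing a bet, each P or B, giving four combinations.
player2 = list(product(["PP", "PB", "BP", "BB"], repeat=3))

print(len(player1), len(player2))   # 27 64
```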
Each player in Kuhn Poker has many weakly dominated strategies, which
typically stem from a particular action a that is never advantageous to do.
Since the purpose of our analysis is to determine the optimal strategy for each
player, we can ignore these weakly dominated strategies. These can be
identified intuitively. For example, we can eliminate all of player 1’s strategies
involving folding a King after player 2 has bet. Player 1 will always win at
showdown with that hand, and gives up their ante by folding. Thus, those
strategies are weakly dominated by all others. We also eliminate any player 1
strategy that calls a bet with a Jack or bets in the first round with a Queen, as
well as any player 2 strategy that prescribes a bet with a Queen after player 1
has passed. In these last two cases, the player has nothing to gain by betting
with a Queen; the opposing player will always pass with a worse hand (a Jack)
and bet with a better hand (a King). In each case, the payoff expectation is at
least as great if the player passes while holding a Queen. The bolded strategies
in Figures 4.2 and 4.3 represent the non-weakly dominated strategies.
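Identifying weakly dominated pure strategies can also be automated. A generic sketch on a toy payoff matrix (not the Kuhn payoff table itself):

```python
import numpy as np

def weakly_dominated_rows(M):
    """Rows i for which some other row k is never worse (M[k] >= M[i]
    everywhere) and strictly better somewhere, for a row maximizer."""
    out = []
    for i in range(M.shape[0]):
        for k in range(M.shape[0]):
            if k != i and np.all(M[k] >= M[i]) and np.any(M[k] > M[i]):
                out.append(i)
                break
    return out

M = np.array([[1, 3],
              [0, 4],
              [1, 2]])
print(weakly_dominated_rows(M))   # [2]: row 2 is dominated by row 0
```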
4.3.2 Payoffs
In order to find the Nash equilibria of Kuhn poker, we must first calculate
the payoffs for the entire strategy set. This corresponds to the normal form
representation, shown in (4.1). This matrix represents the payoffs to player 1
for each pair of pure strategies employed, with player 1 as the row player and
player 2 as the column player. The value in each matrix entry is the overall
payoff of the game to player 1; since this is a zero-sum game, the payoffs to
player 2 are simply the corresponding negatives. Each matrix entry is the sum
of all payoff values over the 6 potential deals, {(JQ), (JK), (QJ), (QK), (KJ), (KQ)}.
where each entry represents a strategy for the respective players, as shown in
payoff matrix P. Let σ1∗ and σ2∗ be Nash-optimal strategies for players 1 and 2,
respectively. Note that these could be either pure or mixed strategies. Then,
σ1∗(k) is the probability that player 1 will play a particular pure strategy k. Note
that all of the pure strategy probabilities must sum to 1;
Σk σ1∗(k) = 1 ,   Σk σ2∗(k) = 1. (4.2)
σ1 = (0, 1, 0, 0, 0, 0, 0, 0) ,
σ2 = (1/2, 0, 1/2, 0) .
Since σ1∗ and σ2∗ are the optimal strategies for each player and M is the
payoff matrix for each combination of those strategies, we have
σ1∗ M σT2∗ = z. (4.4)
That is, the payoff z is the matrix product of each player’s mixed strategy σ
with the payoff matrix M. Player 1’s optimal strategy σ1∗ will maximize the
minimum of z against any of player 2’s pure strategies. Since player 1 does not
know in advance the counter-strategy that player 2 will use, he wants to
employ a strategy that gives the best worst-case scenario payoff. A strategy σ1
of player 1 guarantees an expected payoff of at least z only if it earns at least z
against every strategy σ2;
σ1 M σT2 ≥ z. (4.5)
σ2 = (1, 0, 0, 0) ,
(0, 1, 0, 0) ,
(4.6)
(0, 0, 1, 0) ,
(0, 0, 0, 1) ,
respectively, for each of the strategies denoted by the columns in (4.1) and
(4.3). We then find σ1 M σT2 for each σ2, and combine these four products with the
lower bound z from (4.5). This gives the constraints
4a + 3b + 4c + 5d + 5e + 4 f + 4g + 3h ≥ z
3a + 5b + 6c + 4d + 3e + 5 f + 2g + 4h ≥ z
(4.7)
4a + 3b + c + 2d + 3e + 2 f + 5g + 4h ≥ z
3a + 5b + 3c + d + e + 3 f + 3g + 5h ≥ z.
Dividing by z, we have
4a/z + 3b/z + 4c/z + 5d/z + 5e/z + 4f/z + 4g/z + 3h/z ≥ 1
3a/z + 5b/z + 6c/z + 4d/z + 3e/z + 5f/z + 2g/z + 4h/z ≥ 1 (4.8)
4a/z + 3b/z + c/z + 2d/z + 3e/z + 2f/z + 5g/z + 4h/z ≥ 1
3a/z + 5b/z + 3c/z + d/z + e/z + 3f/z + 3g/z + 5h/z ≥ 1.
Then, define i, j, . . . , p as
i = a/z , j = b/z , . . . , p = h/z. (4.9)
Since each a, b, ..., h are probabilities, they are all non-negative. Since we
modified the payoff matrix P to have strictly positive entries in M, we know
that z > 0. This gives the sign restrictions i ≥ 0, j ≥ 0, . . . , p ≥ 0. Combining
(4.8) and (4.9), we have
4i + 3j + 4k + 5l + 5m + 4n + 4o + 3p ≥ 1
3i + 5j + 6k + 4l + 3m + 5n + 2o + 4p ≥ 1
(4.10)
4i + 3j + k + 2l + 3m + 2n + 5o + 4p ≥ 1
3i + 5j + 3k + l + m + 3n + 3o + 5p ≥ 1.
Since the probabilities a, b, . . . , h sum to 1, dividing by z gives
a/z + b/z + . . . + h/z = 1/z =⇒ i + j + . . . + p = 1/z. (4.11)
Since maximizing z is equivalent to minimizing 1/z, player 1’s optimal
strategy is the solution to the LP
Minimize: 1/z = i + j + k + l + m + n + o + p
subject to: 4i + 3j + 4k + 5l + 5m + 4n + 4o + 3p ≥ 1
3i + 5j + 6k + 4l + 3m + 5n + 2o + 4p ≥ 1
(4.12)
4i + 3j + k + 2l + 3m + 2n + 5o + 4p ≥ 1
3i + 5j + 3k + l + m + 3n + 3o + 5p ≥ 1
i, j, k, l, m, n, o, p ≥ 0.
Writing
σ1∗ = (a, b, c, d, e, f, g, h) = (iz, jz, kz, lz, mz, nz, oz, pz) , (4.13)
we can solve these equations to find an optimal strategy for player 1, in terms
of his potential pure strategies. This mixed strategy is b = 1/3, f = 2/9, g = 4/9, and
is given by σ1∗ = (0, 1/3, 0, 0, 0, 2/9, 4/9, 0). Note that the sum of the entries is 1.
Player 1 should play the pure strategy corresponding to entry b with
probability 1/3, BPB, OBO with probability 2/9, and PPB, PPO with probability 4/9.
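LP (4.12) and its optimal value can be reproduced numerically. A sketch using scipy's linprog (our tool choice — the thesis works in Mathematica and Gambit); the constraints bound each of player 2's pure-strategy responses below by 1 after the division by z:

```python
import numpy as np
from scipy.optimize import linprog

# Modified payoff matrix M = P + 4: rows are player 1's eight
# non-dominated pure strategies (a..h), columns player 2's four.
M = np.array([[4, 3, 4, 3],
              [3, 5, 3, 5],
              [4, 6, 1, 3],
              [5, 4, 2, 1],
              [5, 3, 3, 1],
              [4, 5, 2, 3],
              [4, 2, 5, 3],
              [3, 4, 4, 5]], dtype=float)

# Minimize i + ... + p subject to M^T v >= 1, v >= 0 (linprog uses <=,
# so both sides are negated).
res = linprog(c=np.ones(8), A_ub=-M.T, b_ub=-np.ones(4), method="highs")

z = 1 / res.fun            # value of the modified game: 11/3
sigma1 = res.x * z         # one optimal mixed strategy (sums to 1)
game_value = (z - 4) / 6   # value of Kuhn Poker to player 1: -1/18
print(round(z, 6), round(game_value, 6))
```

Because player 1 has infinitely many optimal strategies, the solver may return any one of them; only the objective value is unique.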
This corresponds to the following mixed strategy. In the first betting round,
player 1 will bet with probability 2/9 with a Jack, always pass with a Queen, and
bet with probability 2/3 with a King. On the second betting round for player 1,
he should always fold a Jack facing a bet, call with a Queen with probability 5/9,
and always call with a King. The other optimal solutions prescribe a similar
strategy; player 1 should vary bluffing with a Jack and slowplaying with a
King. In poker, slowplaying involves checking with a good hand to induce the
other player to bluff with a poor hand.
We can also compute the Nash equilibria strategy for player 2, using a
similar Minimax LP method. Working from (4.4), we determine the optimal
strategy σ2∗ given any σ1∗. The probabilities of the pure strategies for player 2,
given by the columns in matrix P, are abbreviated as q, r, s, t respectively. Player 2’s
optimal strategy minimizes the maximum of z against any of player 1’s pure
strategies. This LP is the dual of (4.12),
Maximize: 1/z = u + v + w + x
subject to: 4u + 3v + 4w + 3x ≤ 1
3u + 5v + 3w + 5x ≤ 1
4u + 6v + w + 3x ≤ 1
5u + 4v + 2w + x ≤ 1
(4.14)
5u + 3v + 3w + x ≤ 1
4u + 5v + 2w + 3x ≤ 1
4u + 2v + 5w + 3x ≤ 1
3u + 4v + 4w + 5x ≤ 1
u, v, w, x ≥ 0
where u = q/z, v = r/z, w = s/z, and x = t/z. This LP has a unique optimal solution,
u = 2/11 , v = w = 0 , x = 1/11 , with 1/z = 3/11. This optimal solution corresponds to
σ2∗ = (2/3, 0, 0, 1/3) . (4.15)
These values prescribe the probability with which player 2 should utilize each
of his pure strategies. He should use PPB, PPB with probability 2/3 and
BPB, PBB with probability 1/3. This optimal σ2∗ corresponds to a strategy as
follows. When holding a Jack, player 2 should always fold when facing a bet
and bet with probability 1/3 if player 1 checks. With a Queen, they should call a
bet with probability 1/3 and always check when facing a check from player 1.
When holding a King, they should always call when facing a bet or bet when
player 1 checks.
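Player 2's LP (4.14) can be checked the same way; since his optimum is unique, the recovered strategy should match (4.15). Again a scipy sketch (our tooling assumption):

```python
import numpy as np
from scipy.optimize import linprog

M = np.array([[4, 3, 4, 3],
              [3, 5, 3, 5],
              [4, 6, 1, 3],
              [5, 4, 2, 1],
              [5, 3, 3, 1],
              [4, 5, 2, 3],
              [4, 2, 5, 3],
              [3, 4, 4, 5]], dtype=float)

# Maximize u + v + w + x subject to M (u,v,w,x)^T <= 1; negate the
# objective since linprog minimizes.
res = linprog(c=-np.ones(4), A_ub=M, b_ub=np.ones(8), method="highs")

z = -1 / res.fun       # 11/3
sigma2 = res.x * z     # player 2's unique optimum: (2/3, 0, 0, 1/3)
```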
Now that we have the optimal strategies σ1∗ and σ2∗, we can find the
expected value of the game for both the original payoff matrix P and the
modified matrix M. This z∗ is the expected value of the game to player 1, if
each player utilizes their optimal strategy. For the original case, we have
6z = σ1∗ P σT2∗
   = (0, 1/3, 0, 0, 0, 2/9, 4/9, 0) ·
     [  0  −1   0  −1
       −1   1  −1   1
        0   2  −3  −1
        1   0  −2  −3
        1  −1  −1  −3
        0   1  −2  −1
        0  −2   1  −1
       −1   0   0   1 ] · (2/3, 0, 0, 1/3)T (4.16)
   = −1/3.
6z + 4 = σ1∗ M σT2∗
   = (0, 1/3, 0, 0, 0, 2/9, 4/9, 0) ·
     [ 4  3  4  3
       3  5  3  5
       4  6  1  3
       5  4  2  1
       5  3  3  1
       4  5  2  3
       4  2  5  3
       3  4  4  5 ] · (2/3, 0, 0, 1/3)T (4.17)
   = 11/3.
[Figure: the subtrees of the Kuhn Poker game tree for the deals JQ and JK, showing player 1's sequences a, b, g, h and player 2's sequences m through t, with payoffs at the leaves.]
Note that each player in Kuhn Poker has a unique set of sequences, given by
S0 = {s∅} ,
S1 = {s∅} ∪ {sa, sb, . . . , sl} ,
S2 = {s∅} ∪ {sm, sn, . . . , sx} .
The S0 set represents the chance player, who initializes the game at the root by
generating a random deal of the cards for each player.
4.4.2 Payoffs
We now turn our attention to defining the payoff matrix P in the sequence
form representation of Kuhn Poker. Since the payoffs at each leaf are unique to
their preceding sequences, the payoff function g is well defined. There are at
most as many sequences as nodes for a particular player, so the number of
sequences is linear in the size of the game tree. Since the number of nonzero
entries of this matrix is at most the number of sequences of player i, this matrix
is sparse. Most entries will be 0, as they represent combinations of sequences
that are not possible and have no associated payoff [13]. Consider
g(s∅ , sa , so ) = 0; this defines the payoff to player 1 checking while holding a Jack
and player 2 betting while holding a Queen after player 1 has bet. Since this
combination is not possible, the payoff is 0. Examining Figure 4.1, we can see
that most combinations of sequences s are impossible and are assigned a
payoff of 0 by default.
[Figure: the subtree of the Kuhn Poker game tree for the deal JQ, showing player 1's sequences a, b, g, h, player 2's sequences m, n, o, p, and the payoffs at the leaves.]
We now define the payoff matrix P, which represents the payoff from
player 1’s perspective for each combination of sequences played. Since this is
a zero-sum game, the payoffs from player 2’s perspective for each combination
of sequences is simply −P. For the realization plans x and y, the payoffs to
each player are xT Py and −xT Py, where P is a |S1 | × |S2 | matrix. Each entry in P
is given by
P(s1, s2) = Σ_{s0∈S0} g(s0, s1, s2) r0(s0).
      θ     m     n     o     p     q     r     s     t     u     v     w     x
 θ    0     0     0     0     0     0     0     0     0     0     0     0     0
 a    0   −1/6    0     0     0   −1/6    0     0     0     0     0     0     0
 b    0     0     0    1/6  −1/3    0     0    1/6  −1/3    0     0     0     0
 c    0     0     0     0     0   −1/6    0     0     0    1/6    0     0     0
 d    0     0     0     0     0     0     0    1/6  −1/3    0     0    1/6   1/3
 e    0    1/6    0     0     0     0     0     0     0    1/6    0     0     0    = P (4.18)
 f    0     0     0    1/6   1/3    0     0     0     0     0     0    1/6   1/3
 g    0     0   −1/6    0     0     0   −1/6    0     0     0     0     0     0
 h    0     0   −1/3    0     0     0   −1/3    0     0     0     0     0     0
 i    0     0     0     0     0     0   −1/6    0     0     0   −1/6    0     0
 j    0     0     0     0     0     0   −1/3    0     0     0    1/3    0     0
 k    0     0   −1/6    0     0     0     0     0     0     0   −1/6    0     0
 l    0     0    1/3    0     0     0     0     0     0     0    1/3    0     0
The final structures necessary to construct the sequence form LPs for Kuhn
Poker specify the legal ways to play the game. We define the vectors x and y to
be realization plans with |S1 | = |S2 | = 13 entries each, one for each sequence of the
respective players. Each entry in x and y corresponds to a particular sequence
s of the respective players, which are the row and column headings in
(4.18). The two strategy constraint matrices E and F define x and y as
realization plans. The vectors p and q represent the dual variables in
each LP; they have 1 + |I1 | and 1 + |I2 | entries respectively. Finally, we have
vectors e and f , which are unit vectors of an appropriate dimension that satisfy
Ex = e and Fy = f . These elements are based on the properties of realization
plans in (3.4), (3.5), and (3.6).
The strategy constraint matrices E and F define the legal combinations of
sequences that each player can choose. The dimensions of E and F are
(1 + |I1 |) × |S1 | and (1 + |I2 |) × |S2 |. Since each player has 6 information sets
and 13 sequences, both matrices are 7 × 13. Each row in these matrices
represent a particular information set, and each column represents a particular
sequence. The fact that both matrices are the same size is a coincidence based
on the structure of Kuhn Poker, as both player 1 and player 2 happen to have 6
information sets and 13 sequences. The first row in E and F represents the 0th
information set corresponding to the random deal, and the first column
represents the sequence s∅ of the random deal.
These matrices can be constructed as follows. We place a 1 in this first
entry to initialize the game. For each subsequent information set row, we place
a −1 in the sequence that is required to happen before the player is in the
information set. Then, we place a 1 in each sequence in the information set.
For example, sequence sa must occur before player 1 can be in information set
4 where he can choose to play sequence g or h. Since the previous “move” for
player 2 is the random deal θ, each information set row in F initializes with a
−1 at θ. Using this process, we have E and F as shown in (4.19) and (4.20).
      θ  a  b  c  d  e  f  g  h  i  j  k  l
 0 [  1  0  0  0  0  0  0  0  0  0  0  0  0
 1   −1  1  1  0  0  0  0  0  0  0  0  0  0
 2   −1  0  0  1  1  0  0  0  0  0  0  0  0
 3   −1  0  0  0  0  1  1  0  0  0  0  0  0   = E (4.19)
 4    0 −1  0  0  0  0  0  1  1  0  0  0  0
 5    0  0  0 −1  0  0  0  0  0  1  1  0  0
 6    0  0  0  0  0 −1  0  0  0  0  0  1  1 ]
      θ  m  n  o  p  q  r  s  t  u  v  w  x
 0 [  1  0  0  0  0  0  0  0  0  0  0  0  0
 1   −1  1  1  0  0  0  0  0  0  0  0  0  0
 2   −1  0  0  1  1  0  0  0  0  0  0  0  0
 3   −1  0  0  0  0  1  1  0  0  0  0  0  0   = F (4.20)
 4   −1  0  0  0  0  0  0  1  1  0  0  0  0
 5   −1  0  0  0  0  0  0  0  0  1  1  0  0
 6   −1  0  0  0  0  0  0  0  0  0  0  1  1 ]
We now have all of the necessary theory and structures to write the
sequence form representation of Kuhn Poker. Recall from Section 3.4.1 that
player 2’s optimal realization plan x is the solution to the LP
Minimize: f(x, q) = eT q
subject to: −Px + ET q ≥ 0
Fx = f (4.21)
x ≥ 0.
Written out in full, this LP is
Minimize: q0
subject to: q0 − q1 − q2 − q3 ≥ 0
(1/6)xm + (1/6)xq + q1 − q4 ≥ 0
−(1/6)xo + (1/3)xp − (1/6)xs + (1/3)xt + q1 ≥ 0
(1/6)xq − (1/6)xu + q2 − q5 ≥ 0
−(1/6)xs + (1/3)xt − (1/6)xw − (1/3)xx + q2 ≥ 0
−(1/6)xm − (1/6)xu + q3 − q6 ≥ 0
−(1/6)xo − (1/3)xp − (1/6)xw − (1/3)xx + q3 ≥ 0 (4.22)
(1/6)xn + (1/6)xr + q4 ≥ 0
(1/3)xn + (1/3)xr + q4 ≥ 0
(1/6)xr + (1/6)xv + q5 ≥ 0
(1/3)xr − (1/3)xv + q5 ≥ 0
(1/6)xn + (1/6)xv + q6 ≥ 0
−(1/3)xn − (1/3)xv + q6 ≥ 0
x∅ = 1
−x∅ + xm + xn = 0
−x∅ + xo + xp = 0
−x∅ + xq + xr = 0
−x∅ + xs + xt = 0
−x∅ + xu + xv = 0
−x∅ + xw + xx = 0.
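The sequence form LP can be assembled and solved directly from P, E, and F. A sketch with scipy (our tool substitution for Gambit/Mathematica); the dictionary below lists the nonzero entries of the payoff matrix (4.18):

```python
import numpy as np
from scipy.optimize import linprog

S1 = list("0abcdefghijkl")    # player 1's sequences ("0" = empty sequence)
S2 = list("0mnopqrstuvwx")    # player 2's sequences

pay = {("a","m"): -1/6, ("a","q"): -1/6, ("b","o"):  1/6, ("b","p"): -1/3,
       ("b","s"):  1/6, ("b","t"): -1/3, ("c","q"): -1/6, ("c","u"):  1/6,
       ("d","s"):  1/6, ("d","t"): -1/3, ("d","w"):  1/6, ("d","x"):  1/3,
       ("e","m"):  1/6, ("e","u"):  1/6, ("f","o"):  1/6, ("f","p"):  1/3,
       ("f","w"):  1/6, ("f","x"):  1/3, ("g","n"): -1/6, ("g","r"): -1/6,
       ("h","n"): -1/3, ("h","r"): -1/3, ("i","r"): -1/6, ("i","v"): -1/6,
       ("j","r"): -1/3, ("j","v"):  1/3, ("k","n"): -1/6, ("k","v"): -1/6,
       ("l","n"):  1/3, ("l","v"):  1/3}
P = np.zeros((13, 13))
for (s1, s2), v in pay.items():
    P[S1.index(s1), S2.index(s2)] = v

def constraint_matrix(seqs, infosets):
    # Row 0 initializes the empty sequence; each later row places -1 at
    # the sequence leading to the information set and +1 at each extension.
    C = np.zeros((1 + len(infosets), 13))
    C[0, 0] = 1
    for row, (lead, exts) in enumerate(infosets, start=1):
        C[row, seqs.index(lead)] = -1
        for s in exts:
            C[row, seqs.index(s)] = 1
    return C

E = constraint_matrix(S1, [("0", "ab"), ("0", "cd"), ("0", "ef"),
                           ("a", "gh"), ("c", "ij"), ("e", "kl")])
F = constraint_matrix(S2, [("0", p) for p in ("mn", "op", "qr", "st", "uv", "wx")])

# Variables: player 2's plan x (13 entries, >= 0) and duals q (7, free).
# Minimize q0 subject to P x - E^T q <= 0 and F x = f.
c = np.zeros(20); c[13] = 1
res = linprog(c,
              A_ub=np.hstack([P, -E.T]), b_ub=np.zeros(13),
              A_eq=np.hstack([F, np.zeros((7, 7))]), b_eq=np.eye(7)[0],
              bounds=[(0, None)] * 13 + [(None, None)] * 7, method="highs")

x = res.x[:13]
print(round(res.fun, 6))   # game value to player 1: -1/18
```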
x∅ = 1   xm = 1   xn = 0   xo = 2/3   xp = 1/3
xq = 0   xr = 1   xs = 0   xt = 1   xu = 2/3 (4.23)
xv = 1/3   xw = 1   xx = 0   q0 = −1/18   q1 = −1/3
q2 = −1/9   q3 = 7/18   q4 = −1/6   q5 = −2/9   q6 = 1/9.
This optimal realization plan directly corresponds to an optimal behavioral
strategy. Each optimal x value corresponds to the frequency with which player
2 should utilize the corresponding sequence s. For each information set, the x
values represent a probability distribution. For example, consider the optimal
values xm = 1 and xn = 0. These values correspond to the sequences sm and sn,
which represent player 2’s choice of checking or betting when holding a
Queen, after player 1 has checked on the first betting round. In this optimal
plan, player 2 always checks and never bets in that situation.
To find the optimal strategy for player 1, we solve the dual of player 2’s LP.
This LP is given by
Minimize: f(y, p) = eT p
subject to: PT y + FT p ≥ 0
(4.24)
Ey = e
y ≥ 0.
Written out in full, this LP is
Maximize: −p0
subject to: p0 − p1 − p2 − p3 − p4 − p5 − p6 ≥ 0
−(1/6)ya + (1/6)ye + p1 ≥ 0
−(1/6)yg − (1/3)yh − (1/6)yk + (1/3)yl + p1 ≥ 0
(1/6)yb + (1/6)yf + p2 ≥ 0
−(1/3)yb + (1/3)yf + p2 ≥ 0
−(1/6)ya − (1/6)yc + p3 ≥ 0 (4.25)
−(1/6)yg − (1/3)yh − (1/6)yi − (1/3)yj + p3 ≥ 0
(1/6)yb + (1/6)yd + p4 ≥ 0
−(1/3)yb − (1/3)yd + p4 ≥ 0
(1/6)yc + (1/6)ye + p5 ≥ 0
−(1/6)yi + (1/3)yj − (1/6)yk + (1/3)yl + p5 ≥ 0
(1/6)yd + (1/6)yf + p6 ≥ 0
(1/3)yd + (1/3)yf + p6 ≥ 0
y∅ = 1
−y∅ + ya + yb = 0
−y∅ + yc + yd = 0
−y∅ + ye + yf = 0
−ya + yg + yh = 0
−yc + yi + yj = 0
−ye + yk + yl = 0.
While player 2 has a single optimal strategy, player 1 has infinitely many
optimal strategies. Mathematica’s Linear Programming function has three
optional methods to compute numerical solutions, Simplex, Revised Simplex,
and Interior Point. The first two methods compute the endpoint solutions,
while Interior Point finds the midpoint solution. With these methods, we can
characterize all possible optimal strategies. These are reported as behavior
strategies with a probability distribution defined on each information set. The
value of each y is the probability that player 1 should utilize the corresponding
sequence s when in the relevant information set. The particular p values are
unimportant in this analysis. These results are shown in Table 4.1.
As these results demonstrate, player 1 has infinitely many optimal
Table 4.1: Player 1's optimal LP solutions, computed by the Simplex, Interior Point, and Revised Simplex methods.

         Simplex   Interior Point   Revised Simplex
 y∅        1            1                1
 ya        1           5/6              2/3
 yb        0           1/6              1/3
 yc        1            1                1
 yd        0            0                0
 ye        1           1/2               0
 yf        0           1/2               1
 yg        1           5/6              2/3
 yh        0            0                0
 yi       2/3          1/2              1/3
 yj       1/3          1/2              2/3
 yk        0            0                0
 yl        1           1/2               0
 p0      −1/18        −1/18            −1/18
 p1        0          −1/18            −1/9
 p2        0           1/9              2/9
 p3      −7/18        −7/18            −7/18
 p4        0          −1/18            −1/9
 p5       1/3          1/4              1/6
 p6        0         421/5000           1/6
strategies. Since the Simplex and Revised Simplex solutions represent the
endpoint solutions, all intermediate solutions are also optimal. We can
characterize player 1’s optimal strategies as dependent on a parameter
α ∈ [0, 1/3], which is the probability that they will bet on their first turn while
holding a Jack. When holding a King, player 1 should bet with probability 3α
on their first turn. When holding a Queen, player 1 should always check on
their first betting round. If player 2 bets after this check, player 1 should call
the bet with probability α + 1/3. Finally, player 1 should always fold a Jack and
call with a King when facing a bet from player 2.
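That this whole family is optimal can be spot-checked: any plan y(α) played against player 2's optimal plan from (4.23) yields exactly the game value −1/18. A numpy sketch (the sparse entries reproduce matrix (4.18)):

```python
import numpy as np

S1 = list("0abcdefghijkl")
S2 = list("0mnopqrstuvwx")
pay = {("a","m"): -1/6, ("a","q"): -1/6, ("b","o"):  1/6, ("b","p"): -1/3,
       ("b","s"):  1/6, ("b","t"): -1/3, ("c","q"): -1/6, ("c","u"):  1/6,
       ("d","s"):  1/6, ("d","t"): -1/3, ("d","w"):  1/6, ("d","x"):  1/3,
       ("e","m"):  1/6, ("e","u"):  1/6, ("f","o"):  1/6, ("f","p"):  1/3,
       ("f","w"):  1/6, ("f","x"):  1/3, ("g","n"): -1/6, ("g","r"): -1/6,
       ("h","n"): -1/3, ("h","r"): -1/3, ("i","r"): -1/6, ("i","v"): -1/6,
       ("j","r"): -1/3, ("j","v"):  1/3, ("k","n"): -1/6, ("k","v"): -1/6,
       ("l","n"):  1/3, ("l","v"):  1/3}
P = np.zeros((13, 13))
for (s1, s2), v in pay.items():
    P[S1.index(s1), S2.index(s2)] = v

# Player 2's optimal realization plan, in the order of S2, from (4.23).
x = np.array([1, 1, 0, 2/3, 1/3, 0, 1, 0, 1, 2/3, 1/3, 1, 0])

def y_plan(a):
    # Player 1's optimal family: bet a Jack with probability a, a King
    # with probability 3a, and call a bet with a Queen with a + 1/3.
    return np.array([1, 1 - a, a, 1, 0, 1 - 3*a, 3*a,
                     1 - a, 0, 2/3 - a, a + 1/3, 0, 1 - 3*a])

for alpha in (0, 1/6, 1/3):
    print(round(float(y_plan(alpha) @ P @ x), 6))   # -0.055556 each time
```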
Unlike player 2’s LP solution (4.23), player 1’s LP solution does not
correspond neatly to a behavioral strategy. Recall that the optimal values
calculated by the sequence form LP describe a realization plan, which are
relative to the realization probability of an information set. Consider that
player 1 will not always have a second choice to bet; this only occurs when he
checks and player 2 bets. There are many other paths down the game tree that
do not involve these moves, such as both players checking or player 2 folding
to player 1’s bet. Since player 2 always faces a decision to check, bet, or fold
regardless of player 1’s action, the sum of each pair of sequences s for each
information set is 1. The same is not true for player 1, who will not always get
to play sequences sg through sl. Consider the optimal ya = 5/6 in the Interior
Point solution (α = 1/6). Since player 1’s potential later choice between sg and sh
is dependent on him also playing sa, yg + yh = 5/6 in the solution to the LP. We use
a similar process to convert each of the values in the realization plan to a full
behavioral strategy. In a sense, the probability of the base sequence (ya = 5/6)
becomes the new “1” for its subsequent sequences further down the game tree.
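The division by the base sequence's probability is all the conversion requires. A sketch using the Interior Point plan from Table 4.1 (the dictionary layout is our own):

```python
# Interior Point realization plan for player 1 (alpha = 1/6).
plan = {"empty": 1, "a": 5/6, "b": 1/6, "c": 1, "d": 0, "e": 1/2, "f": 1/2,
        "g": 5/6, "h": 0, "i": 1/2, "j": 1/2, "k": 0, "l": 1/2}

# The sequence leading to each sequence's information set.
lead = {**{s: "empty" for s in "abcdef"},
        "g": "a", "h": "a", "i": "c", "j": "c", "k": "e", "l": "e"}

def behavior(seq):
    """Probability of the action completing seq, given that play reaches
    its information set; None if the set is never reached."""
    base = plan[lead[seq]]
    return plan[seq] / base if base > 0 else None

print(behavior("g"), behavior("i"), behavior("l"))   # 1.0 0.5 1.0
```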
In player 1’s optimal realization plan, some sequences will never be used.
Consider the sequences sk and sl , both with realization probability 0 in the
Revised Simplex solution. Both are members of the same information set, in
which player 1 was dealt a King, checked initially, and now faces a bet from
player 2. However, he will never reach this information set if he plays
according to other elements of his optimal realization plan. In the Revised
Simplex solution, ye = 0; that is, he never checks initially while holding a King.
Since se is necessary for both sk and sl , he will never face the choice. Therefore,
an optimal realization plan does not have to specify frequencies for these
actions. Table 4.2 shows the optimal realization plan converted to an optimal
behavior strategy as above, with *** denoting sequences that are never reached.
The Simplex solution represents the case α = 0. In this case, player 1
always checks in the first round; that is, ya = yc = ye = 1 and yb = yd = y f = 0.
Since he chooses to never bluff (by betting when holding a Jack), he must
balance his strategy by also never betting a King for value. He must call a bet
from player 2 while holding a Queen with probability 1/3. The Revised Simplex
method gives the other endpoint solution, with α = 1/3. In this strategy, player 1
bluffs with a Jack with probability 1/3 (yb = 1/3), and always value bets with a
King (yf = 1). He always checks with a Queen in the first round (yc = 1), and
calls a bet from player 2 with probability 2/3 (yj = 2/3). In all optimal strategies,
player 1 folds with a Jack and calls with a King when facing a bet. He cannot
win a showdown with a Jack, and cannot lose a showdown with a King; thus,
any strategy that did not prescribe these sequences is weakly dominated by
those that do. The Simplex solution (α = 0) represents player 1’s most
conservative strategy, as he never bluffs with a Jack and never value bets with
a King.
Table 4.2: Player 1's optimal behavior strategies, converted from the realization plans in Table 4.1; *** denotes sequences that are never reached.

         Simplex   Interior Point   Revised Simplex
 s∅        1            1                1
 sa        1           5/6              2/3
 sb        0           1/6              1/3
 sc        1            1                1
 sd        0            0                0
 se        1           1/2               0
 sf        0           1/2               1
 sg        1            1                1
 sh        0            0                0
 si       2/3          1/2              1/3
 sj       1/3          1/2              2/3
 sk        0            0               ***
 sl        1            1               ***
2-7 Draw Poker
5.1 Rules
While poker variants such as Texas Hold’em and Omaha have become
very popular in recent years, variants such as 2-7 Draw still have a loyal
following. Its structure and game play are very different, and little previous
work has been published on game-theoretic optimal play. As such, it is a
fruitful area of research. This section details the rules of full-scale, two player
2-7 Draw based on [8]. 2-7 Draw is a lowball variant, meaning that traditional
hand rankings are reversed; the worst traditional hand wins the pot, if the
hand goes to showdown. Additionally, as the name suggests, it is a draw
variant. Each player is dealt a private hand, and has the opportunity to
discard and redraw any number of their own cards after each betting round.
These features contrast with games such as Texas Hold’em and Omaha, where
players share a common set of community cards and the highest hand wins.
At the start of each hand, each player is dealt 5 cards. There is a round of
betting, where each player has the option to check, bet, raise, or fold,
depending on the actions of previous players. Betting in 2-7 Draw games can
have a limit structure, with fixed bet sizes, or a no-limit betting structure,
where players can bet any amount. Then, each player discards any or all of
the cards in their hand, and replaces them with new cards from the deck. 2-7 Draw is most
commonly played with one or three discard rounds; our version uses one
discard round. If more than one player has not previously
folded, the hand goes to showdown, and the player with the worst hand wins
the pot. Thus, a hand of 2-7 (Single) Draw has the following structure:
1. Each player is dealt 5 cards
2. Betting round #1
3. Discard and redraw
4. Betting round #2
5. Showdown (if at least two players remain)
Hands are ranked in the following categories, listed from best (1) to worst (9) under lowball rules:
1. No Pair
2. One Pair
3. Two Pair
4. Three of a Kind
5. Straight
6. Flush
7. Full House
8. Four of a Kind
9. Straight Flush
Within each of these categories, there is a further ordering. The best possible
No Pair hand is 7, 5, 4, 3, 2, not of the same suit. The best possible One Pair
hand is 2, 2, 5, 4, 3, which is worse than all No Pair hands. Ordering in each
category is determined first by the highest card (or pair), and subsequently by
the next highest card. For example, 4, 4, 4, 2, 2 beats 4, 4, 4, 3, 3. Each category
admits a similar ordering structure.
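These ranking and tiebreak rules can be captured in a short Python comparator (Python is the implementation language used throughout this project). This is only a sketch under our own encoding, not thesis code: hands are (ranks, suits) pairs with numeric ranks, the category numbers follow the list above, and ace-low subtleties are ignored.

```python
from collections import Counter

def category(ranks, suits):
    """Lowball category of a 5 card hand: 1 (No Pair, best) to 9 (Straight Flush, worst)."""
    counts = sorted(Counter(ranks).values(), reverse=True)
    flush = len(set(suits)) == 1
    ordered = sorted(ranks)
    straight = ordered == list(range(ordered[0], ordered[0] + 5))
    if straight and flush:
        return 9
    if counts[0] == 4:
        return 8
    if counts == [3, 2]:
        return 7
    if flush:
        return 6
    if straight:
        return 5
    if counts[0] == 3:
        return 4
    if counts == [2, 2, 1]:
        return 3
    if counts[0] == 2:
        return 2
    return 1

def lowball_key(ranks):
    """Tiebreak within a category: paired ranks first, then high cards; lower key wins."""
    counts = Counter(ranks)
    return sorted(ranks, key=lambda r: (-counts[r], -r))

def beats(hand1, hand2):
    """True if hand1 = (ranks, suits) wins a lowball showdown against hand2."""
    k1 = (category(*hand1), lowball_key(hand1[0]))
    k2 = (category(*hand2), lowball_key(hand2[0]))
    return k1 < k2
```

With this comparator, the text's example holds: `beats(([4,4,4,2,2], "cshcs"), ([4,4,4,3,3], "cshcs"))` is true, and any No Pair hand beats any One Pair hand.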
5.2 Abstractions
One abstraction technique to reduce the size of the game is to reduce the
number of cards in the deck. This simplifies the game by drastically reducing
the number of potential hands and, by extension, the size of the game tree. For
example, a 52 card deck has $\binom{52}{5} = 2,598,960$ unique 5 card hands.
Our simplified version instead uses a 21 card deck with
three suits: clubs (♣), spades (♠), and hearts (♥). This deck has
$\binom{21}{3} = 1,330$
unique 3 card hands. The hand rankings for this abstraction and their relative
frequencies are shown in Table 5.1. While these hand rankings do not
correspond precisely to the original game, they retain enough similarity to
make valid strategic comparisons between the two games.
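The deck-reduction arithmetic above can be checked directly with the standard library (this snippet is ours, not thesis code):

```python
import math

# 5 card hands from the full 52 card deck
assert math.comb(52, 5) == 2_598_960

# 3 card hands from the reduced 21 card deck (7 values in each of 3 suits)
assert math.comb(21, 3) == 1_330

# two-player deals in the reduced game: player 2 draws from the 18 remaining cards
assert math.comb(21, 3) * math.comb(18, 3) == 1_085_280
```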
Table 5.1: Hand Rankings with Frequencies in Simplified 2-7 Draw Poker
We can also simplify the game by changing the number and structure of
betting rounds. In a typical game of 2-7 (Triple) Draw, there are up to three
opportunities (provided at least two players remain in the hand) for players to discard
and obtain new cards. Players can discard and redraw up to 5 cards each time.
There is a round of betting before the first discard and after each one, in
which players either bet a fixed amount or any amount; 2-7 Draw can be
played as a limit or no-limit game, respectively.
5.2.3 Bucketing
Bucketing groups strategically similar hands into a small number of equivalence
classes, or buckets, further shrinking the game tree. The sets
of buckets are particular to the game state; in our case, there is one set of
buckets for hands determined by the initial deal, and another for hands after
the discard round.
The most intuitive method of bucketing is by raw hand strength; that is, by
the absolute hand rank relative to all possible hands. Using this method,
bucket composition would be determined by simply partitioning the ordered
set of all possible hands into some number of equivalence classes. This is an
appropriate method for the second set of buckets, when no new cards will be
drawn. However, this method loses any conception of hand potential, which
is critical in early stages of the hand. A player may be initially dealt a hand
with poor absolute hand strength, but with a high probability of improving
significantly during the discard round. This hand is said to have high roll-out
hand strength. Conversely, a hand with poor roll-out hand strength has little
chance of improving to an absolutely strong hand after the discard and redraw.
Thus, we categorize the first set of buckets according to roll-out hand strength,
and the second according to absolute hand strength. The process of
determining bucket composition is detailed further in Section 5.3.
We label the initial buckets in the 2 bucket case A
and B, those in the 3 bucket case C, D, and E, and the set of final buckets 1, 2, 3. The best
initial buckets are A and C, and the best final bucket is 3.
Finally, we need sets of transition probabilities that model the move into
the initial buckets from the original deal and from the initial to the final
buckets. In this abstract game, there are no individual cards, only buckets. The
effect of the original deal node is to determine the probability of initialization
to each pair of initial buckets. Likewise, the effect of the discard round is to
determine the probability of moving from one pair of initial buckets to one
pair of final buckets. Conceptualizing the game in this way drastically reduces
the number of nodes; instead of $\binom{21}{3} \cdot \binom{18}{3} = 1,085,280$
possible two-player deals, the deal node need only distinguish each pair of
initial buckets.
Figure 5.1: Initial deal transition matrix for the 2 bucket case:

              Bucket A   Bucket B
    Bucket A    0.437      0.229
    Bucket B    0.229      0.105
Each entry $a_{ij}$ is the probability of player 1 being dealt a hand in the ith bucket and player 2 being
dealt a hand in the jth bucket. Since each player has an equal likelihood of
receiving a particular hand, transition matrices are symmetric; that is, $a_{ij} = a_{ji}$
for all i, j. For computational feasibility, our simulations iterate over a subset
of all possible hand combinations. Therefore, the matrices shown in Figures
5.1 and 5.2 are only approximately symmetric.
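To illustrate the symmetry claim, the Figure 5.1 entries can be checked and, if desired, made exactly symmetric by averaging the matrix with its transpose. The symmetrization step is our own suggestion for repairing sampling noise, not part of the thesis pipeline:

```python
# Entries of the 2 bucket initial-deal matrix (Figure 5.1)
A = [[0.437, 0.229],
     [0.229, 0.105]]

def symmetrize(M):
    """Average a square matrix with its transpose to enforce a_ij = a_ji."""
    n = len(M)
    return [[(M[i][j] + M[j][i]) / 2 for j in range(n)] for i in range(n)]

A_sym = symmetrize(A)

# the deal probabilities over all bucket pairs should sum to 1
total = sum(sum(row) for row in A_sym)
assert abs(total - 1.0) < 1e-9
```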
Each entry $b_{ij}$ represents the probability that the players move
from initial bucket combination j to final bucket combination i with the
discard round. Each ordered pair represents the pair of buckets each player’s
hand is in, respectively. For example, the transition probability from state A, A
to 1, 1 is 0.174. This is interpreted as follows: given that both players are
initially dealt hands in bucket A, the probability that they will both have
hands in bucket 1 after the discard round is 0.174.
5.3 Simulation
An important element of our simplified version of 2-7 Draw is the
transition probabilities between each game state, conceived as the associated
buckets of each player’s hand. Recall that in this game, we do not consider
individual hands as nodes in the game tree, only pairs of buckets. However,
the probability of the game being in each of these bucket combination states is
not uniform. Therefore, we must determine the individual transition
probabilities between each bucket pairing through simulation. This section
presents an overview of the algorithms necessary to model this and other
elements of the game, before we use the sequence form LP method to find the
optimal strategy. All of the following code was written in Python.
The first step is to generate an ordered list of all possible 3 card hands,
based on hand strength. There are
$\binom{21}{3} = 1,330$
total hands. The ordering of the hands has two levels. First, we place the
hands into the ranking categories shown in Table 5.1 such as Three of a Kind,
Flush, One Pair, etc. This is done through conditional statements based on
card values and suits of each hand. Within each of these categories, we then
sort based on the highest card in the hand to create a sub ordering. This
process is repeated for each ranking category, and the ordered categories are
appended to an overall list OrderedHands. Algorithm 1 shows a schematic version
of this process for the Three of a Kind category.
for AllHands do
    if Card1Value = Card2Value = Card3Value then
        Append(Hand) to ThreeKindList
    end
end
Sort(ThreeKindList);
Append(ThreeKindList) to OrderedHands
Algorithm 1: Hand Sorting Algorithm
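Algorithm 1's Three of a Kind pass translates to runnable Python roughly as follows. This is a sketch: the 21 card deck is assumed to use seven consecutive values (2 through 8) in three suits, since the exact values are not listed here, and lower trips sort first because lower hands are better in lowball.

```python
from itertools import combinations

SUITS = "csh"                                  # clubs, spades, hearts
VALUES = range(2, 9)                           # assumed 7 values per suit
DECK = [(v, s) for v in VALUES for s in SUITS]

all_hands = list(combinations(DECK, 3))
assert len(all_hands) == 1330

three_kind = []
for hand in all_hands:
    values = [v for v, _ in hand]
    if values[0] == values[1] == values[2]:    # Card1Value = Card2Value = Card3Value
        three_kind.append(hand)

three_kind.sort(key=lambda h: h[0][0])         # lowball: lower trips first

ordered_hands = []                             # OrderedHands in the pseudocode
ordered_hands.extend(three_kind)               # other categories appended similarly
```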
Recall that the particular hand composition of our initial buckets is based
on the roll-out strength of each hand, given all possible draw possibilities. We
iterate each hand over all possible draws and determine the average final hand
ranking, based on its index in OrderedHands. In order to determine which card
to discard, we use the following rule: if the hand contains a pair or three of a kind,
discard one of those cards. Otherwise, discard the highest card in the hand.
A general version of this process is shown in Algorithm 2. First, we
remove each of the three cards in the original hand from the available Deck,
since players cannot redraw a card that they discard. Then, if the original
hand is a Pair or Three of a Kind (PairsThreeKind), that card is replaced with
one from the Deck. Otherwise, the highest card is replaced with one from the
Deck. We then find the Strength of the new hand, given by its index in the
OrderedHands. This process iterates for each of the 18 possible cards to be
drawn. Then, for each hand, we compute the average Strength and place the
hand into a bucket with similar Strength values.
for OrderedHands do
    Hand = Card1, Card2, Card3;
    Remove(Card1, Card2, Card3) from Deck;
    for Deck do
        if Hand in PairsThreeKind then
            Replace PairCard with Deck_i
        else
            Replace HighCard with Deck_i
        end
        Strength = OrderedHands.index(NewHand)
    end
    if Average(Strength) < value then
        Append(Hand) to BucketA
    else
        Append(Hand) to BucketB
    end
    Add(Card1, Card2, Card3) to Deck;
end
Algorithm 2: Roll-Out Hand Strength Algorithm
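Algorithm 2's inner loop might be sketched in Python as below. The strength function is passed in as a parameter (in the thesis it is the hand's index in OrderedHands); the deck encoding matches the earlier sketch and is an assumption.

```python
from itertools import combinations

SUITS = "csh"
VALUES = range(2, 9)                           # assumed 21 card deck
DECK = [(v, s) for v in VALUES for s in SUITS]

def discard_index(hand):
    """Index of the card to replace: a paired/tripled card if any, else the highest card."""
    values = [v for v, _ in hand]
    for i, v in enumerate(values):
        if values.count(v) >= 2:               # hand is a Pair or Three of a Kind
            return i
    return values.index(max(values))

def rollout_strength(hand, strength):
    """Average strength of `hand` over all one-card draws, per the discard rule."""
    deck = [c for c in DECK if c not in hand]  # a discarded card cannot be redrawn
    i = discard_index(list(hand))
    total = 0
    for card in deck:                          # 18 possible replacement cards
        new_hand = list(hand)
        new_hand[i] = card
        total += strength(tuple(new_hand))
    return total / len(deck)

# toy usage: a placeholder strength (sum of values) instead of the OrderedHands index
avg = rollout_strength(((2, "c"), (2, "s"), (7, "h")),
                       lambda h: sum(v for v, _ in h))
```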
We next determine the probability that each pair of initial buckets is reached
by the deal. For each Hand, first remove its three cards from the
Deck. Then, generate each 3 card combination of the remaining Deck as the list structure
Combos. For each of these, the ordered pair HandPair is the combination of the
original Hand and an element of Combos. This pair of two hands is then sorted
into the appropriate paired bucket, based on the original membership of each
of its component hands. For example, if both Hand and a particular element of
Combos were originally members of bucket A (defined in Algorithm 2), this
HandPair would be appended to BucketAA. Then, the probability that
BucketAA is reached after the initial deal is Length(BucketAA)/1,085,280. A generalized version
of this process is shown in Algorithm 3.
for Hands do
    Hand = Card1, Card2, Card3;
    Remove(Card1, Card2, Card3) from Deck;
    Combos = Combinations(Deck, 3);
    Sort(Combos);
    for Combos do
        HandPair = (Hand, Combos_i);
        if Hand in BucketA and Combos_i in BucketA then
            Append(HandPair) to BucketAA
        end
    end
    Add(Card1, Card2, Card3) to Deck;
end
Algorithm 3: Initial Bucket Probability Algorithm
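Algorithm 3 can be sketched in Python as follows. Here `in_bucket_a` is a placeholder predicate standing in for the roll-out based membership computed by Algorithm 2; the threshold is an arbitrary stand-in, and the deck encoding is the same assumption as before.

```python
from itertools import combinations

SUITS = "csh"
VALUES = range(2, 9)                           # assumed 21 card deck
DECK = [(v, s) for v in VALUES for s in SUITS]

def in_bucket_a(hand):
    """Placeholder membership test; the thesis uses roll-out hand strength."""
    return sum(v for v, _ in hand) <= 12

count_aa = 0                                   # Length(BucketAA)
for hand in combinations(DECK, 3):
    if not in_bucket_a(hand):
        continue
    rest = [c for c in DECK if c not in hand]
    for combo in combinations(rest, 3):        # opponent's possible hands
        if in_bucket_a(combo):
            count_aa += 1

# probability that both deals land in bucket A
p_aa = count_aa / 1_085_280
```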
We now turn our attention to the second set of buckets, representing game
states after each player has discarded and redrawn. Their hand is now fixed,
and we can consider a new set of buckets based only on the strength of their
hand. These buckets are defined in Table 5.3; we construct them in our code
by iterating over all 1,085,280 two-player hand combinations and sorting
based on the respective indices in OrderedHands. The result is a set of 9 list
structures, Bucket11, Bucket12, etc., which contain pairs of hands meeting the
criteria for each single hand bucket. For example, if both players held a hand
in the top 25% of overall hands, the pair would be placed in Bucket33.
With this second set of buckets defined, we now determine the transition
probabilities for moving between each first round bucket pair (defined in
Algorithm 2) and each second round bucket pair. This is the most
computationally intensive step of this simulation process, and required
time-saving measures at multiple points. This algorithm is shown in
generalized form in Algorithm 4. First, we initialize a counter Count11,
representing the number of hand pairs that fall into Bucket11. For each hand
pair in BucketAA, remove the six cards held by the two players
from the available Deck. Then, generate all permutations of Deck of length 2.
This is an ordered pair representing the cards that each player will receive
after they discard one card, respectively. For computational efficiency,
randomly select one of these and assign it to Draw. Then, replace one card in
each player’s hand with one of the cards in Draw. The particular card to be
replaced is determined by the same discard rule: if holding One Pair or Three
of a Kind, discard one of those cards. Otherwise, discard the highest card.
With the resulting pair of hands, check the membership of each in the
previously defined final buckets and increment the corresponding counter.
For example, if both of the hands fall in the category corresponding to Bucket11
after the particular Draw, add 1 to Count11. Then, Count11/1,085,280 is the approximate
probability that, given original membership in BucketAA, the players will end
in Bucket11 after the discard and redraw. This value has a corresponding entry
in Figure 5.3 or 5.4 for the 2 and 3 initial bucket cases, respectively.
Count11 = 0;
for BucketAA do
    HandPair = [(Card1, Card2, Card3), (Card4, Card5, Card6)];
    P1Hand = (Card1, Card2, Card3);
    P2Hand = (Card4, Card5, Card6);
    Remove(Card1, Card2, Card3, Card4, Card5, Card6) from Deck;
    Draws = Permutations(Deck, 2);
    Draw = RandomSample(Draws, 1);
    if P1Hand in PairsThreeKind and P2Hand in PairsThreeKind then
        Replace P1PairCard with Draw(1) and P2PairCard with Draw(2)
    else if P1Hand in PairsThreeKind and P2Hand not in PairsThreeKind then
        Replace P1PairCard with Draw(1) and P2HighCard with Draw(2)
    else if P1Hand not in PairsThreeKind and P2Hand in PairsThreeKind then
        Replace P1HighCard with Draw(1) and P2PairCard with Draw(2)
    else
        Replace P1HighCard with Draw(1) and P2HighCard with Draw(2)
    end
    FinalPair = (P1Hand, P2Hand);
    if FinalPair in Bucket11 then
        Count11 = Count11 + 1
    end
    Add(Card1, Card2, Card3, Card4, Card5, Card6) to Deck;
end
Algorithm 4: First to Second Round Transition Probabilities Algorithm
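The per-pair sampling step of Algorithm 4 might look as follows in Python. Rather than materializing all permutations of the deck and sampling one, `random.sample` draws the same ordered pair of distinct cards directly; `discard_index` implements the shared discard rule, and the deck encoding remains our assumption.

```python
import random

SUITS = "csh"
VALUES = range(2, 9)                           # assumed 21 card deck
DECK = [(v, s) for v in VALUES for s in SUITS]

def discard_index(hand):
    """Index of the card to replace: a paired/tripled card if any, else the highest."""
    values = [v for v, _ in hand]
    for i, v in enumerate(values):
        if values.count(v) >= 2:               # holding a Pair / Three of a Kind
            return i
    return values.index(max(values))

def sample_final_pair(p1_hand, p2_hand, rng=random):
    """Apply the discard rule to both hands with one random two-card draw."""
    used = set(p1_hand) | set(p2_hand)
    deck = [c for c in DECK if c not in used]  # 15 cards remain
    d1, d2 = rng.sample(deck, 2)               # one ordered pair of distinct cards
    new_p1, new_p2 = list(p1_hand), list(p2_hand)
    new_p1[discard_index(new_p1)] = d1
    new_p2[discard_index(new_p2)] = d2
    return tuple(new_p1), tuple(new_p2)

# a counter such as Count11 would be incremented when the returned pair
# tests into the corresponding final bucket
```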
5.4 Results
We solve the sequence form LP of Simplified 2-7 Draw using the Gambit
software package, for both the 2 and 3 initial bucket cases. The solutions to
these LPs are the optimal realization plans for each game. These LPs also
compute an expected value of the game, which is simply the optimal objective
function value. Gambit’s sequence form LP algorithms compute only one
optimal solution; there may be other optimal realization plans for Simplified
2-7 Draw. This section provides a brief discussion of Gambit’s computed
solution; for more detail, see Appendix B.
For the 2 initial bucket case, both players’ optimal realization plans
characterize a relatively conservative strategy. We begin by discussing player
1’s optimal play. Player 1 is first dealt a random hand, which is a member of
either Bucket A or Bucket B. Hands in Bucket A have a higher average roll-out
hand strength than hands in Bucket B. On his first opportunity to bet, player 1
should always check. If player 2 bets after player 1 checks, player 1 should
always call the bet. Thus, player 1’s strategy for the first round is simple:
regardless of Bucket, check and call a bet if necessary.
After the first round of betting, each player discards and redraws one card
from the remaining deck. At this point, player 1’s optimal realization plan
becomes more complicated. It defines optimal play for each combination of
original bucket and final bucket. Recall that in the final round, hands are
sorted by absolute strength into Buckets 1 − 3, in ascending order. Hands in
Bucket 3 are superior to those in Bucket 2, which are in turn superior to those
in Bucket 1. If player 1’s hand was originally in Bucket A and both players
checked in the first betting round, he should bet with probability 0.178 while
holding a hand in Bucket 1, always check with Bucket 2, and always bet with
Bucket 3. If he does check and player 2 then bets, player 1 should always fold
with Bucket 1 and always call with Bucket 2. Notice that this optimal
realization plan does not specify what player 1 should do with a hand in
Bucket 3 after player 2 has bet; since it is optimal for player 1 to always bet
with Bucket 3 initially, he will simply never be in this situation. These
irrelevant sequences are denoted *** in the full table of results in Appendix B.
If player 1’s hand was originally in Bucket A and the betting sequence of
the first round was player 1 checks, player 2 bets, and player 1 calls, player 1
should bet with probability 0.122 while holding a hand in Bucket 1, always
check with Bucket 2, and always bet with Bucket 3. If player 1 checks and
player 2 bets, player 1 should always fold with Bucket 1 and always call with
Bucket 2. If player 1’s hand was originally in Bucket B and both players
checked in the first round, player 1 should always check with Bucket 1, bet
with probability 0.233 with Bucket 2, and always bet with Bucket 3. If player 1
checks and player 2 bets, always fold with Bucket 1 and call with Bucket 2. If
player 1’s hand was originally in Bucket B and the betting sequence of the first
round was player 1 checks, player 2 bets, and player 1 calls, player 1 should
always bet with Bucket 1, check with Bucket 2, and always bet with Bucket 3.
If player 1 checks and player 2 bets, always call with Bucket 2.
Player 2’s first round strategy is more aggressive and complex than player
1’s. If player 1 checks, player 2 should always check with Bucket A and always
bet with Bucket B. If player 1 bets, player 2 should call with probability 0.983
with Bucket A and always call with Bucket B. The discard and redraw follows
this round of betting. If player 2 originally had a hand in Bucket A and both
players checked in the first round, and player 1 bets in the final round, player
2 should always fold with Bucket 1, call with probability 0.891 with Bucket 2,
and always call with Bucket 3. Since player 2 never checks in the first round
with Bucket B, we need not define a realization plan for the second round.
If player 2 originally had a hand in Bucket B, the first round betting action
was player 1 checks, player 2 bets, and player 1 calls, and player 1 checks after
the discard and redraw, player 2 should always bet with Bucket 1, bet with
probability 0.435 with Bucket 2, and always bet with Bucket 3. Instead, if
player 1 bets, player 2 should call with probability 0.415 with Bucket 1, and
always call with Buckets 2 and 3.
If player 2 originally had a hand in Bucket A, the first round betting action
was player 1 bets and player 2 calls, and player 1 bets after the discard and
redraw, player 2 should call with probability 0.664 with Bucket 1, always with
Bucket 2, and with probability 0.640 with Bucket 3. If player 1 checks, player 2
should always check with every Bucket. If player 2 instead originally had a
hand in Bucket B, he should always call with Buckets 1 and 2, and with
probability 0.595 with Bucket 3. If player 1 checks, player 2 should always
check with every Bucket.
When each player uses these optimal realization plans, the value of the
game is approximately −14/100. This value is from player 1’s perspective; for each
hand of Simplified 2-7 Draw played according to these optimal realization
plans, player 1 can expect to lose 14/100 of a betting unit to player 2. Recall that the
Finally, it is notable that little folding should occur in the first betting
round. Player 2 folds with probability 0.017 when holding a hand in Bucket A
after player 1 bets. Otherwise, each player’s optimal realization plan involves
always remaining in the hand until the final betting round. This highlights the
importance of the discard and redraw phase, where each player’s hand
strength can change dramatically. Each initial bucket contained plenty of
hands that could become very strong after the discard and redraw; this was
simply more likely with Bucket A. This feature illustrates a considerable issue
with the abstraction techniques that we used to reduce 2-7 Draw to a tractable
size. It is likely that more specific buckets, a bigger deck, bigger individual
hands, and more betting rounds would incentivize players to fold more often
before the final betting round.
When solving the 3 initial bucket LP with Gambit’s rational arithmetic
algorithm, it was not clear how long a game of this size would take to run, but we were surprised
that the algorithm could not finish. The 2 initial bucket LP with 80 information
sets computed a rational solution within 20 minutes, and the 3 initial bucket
LP only has 120 information sets. Fortunately, the non-rational algorithm for
the 3 initial bucket LP finishes within an hour. However, these results are
clearly flawed, and are shown in Appendix B. Many of the supposed optimal
probabilities cannot even be such; they often do not sum to 1 in each
information set, are negative, or are greater than 1. We report these results for
the sake of completeness of the project, but they cannot accurately describe an
optimal realization plan for the game. Future versions of Gambit and more
powerful computers may be able to find the exact rational solution.
Chapter 6
Conclusion
The simplifications described in Section 5.2.2 could be modified to create a more realistic game. While our
version is a reasonable approximation, it loses many key facets of the game.
For example, players are not actually required to discard and redraw a card at
every opportunity, nor by our specific discard rule. Thus, a natural extension
of this project is to analyze a version with fewer abstractions.
Computational constraints played a significant role in our analysis of
Simplified 2-7 Draw, and considering a more accurate version would require
better computational tools. The current version of Gambit can compute
solutions for medium-sized games like ours, but quickly fails when faced with
truly large-scale games. In particular, it is unable to find
optimal strategies for large games using the sequence form LP method, though it
may be able to do so using other methods. Although not discussed or utilized in this
project, Gambit does have a variety of other methods of finding optimal
strategies [6]. Building the sequence form LPs manually, instead of
automatically with Gambit, would be difficult and prone to error. However,
such LPs could be solved using external solvers, which are much better equipped to
handle large LPs. For example, a sequence form LP with over 1.1 million
variables was used to solve an abstracted game of Rhode Island Hold’Em with
external LP solvers [9]. Future work based on computational improvement
could use a more advanced version of Gambit, a different solution method, or
external LP solvers.
Appendix A
Gambit
Figure A.1 shows a portion of the Kuhn Poker extensive form game tree
constructed in the Gambit GUI. It allows the user to specify and label players,
actions, nodes, information sets, and payoffs. Since Kuhn Poker is relatively
small, we compute the Nash equilibrium both using the sequence form LP
Figure A.1: Part of the Extensive Form of Kuhn Poker in the Gambit GUI
The first version of Simplified 2-7 Triple Draw that we developed involved
combinations of 5 buckets for the first round of betting, transitioning to
combinations of 3 buckets for the second round. This is in contrast to the 2 and
3 initial bucket versions for which we present solutions in Chapter 5. The 5
bucket case is more complex, with 200 information sets, 400 sequences, and
6301 nodes. Attempts to find a Nash equilibrium for this game with Gambit
return an unexplained error. Direct correspondence with Gambit developer
Theodore Turocy suggests that solving a game of this size is infeasible in
Gambit’s current version [11].
Appendix B
Simplified 2-7 Draw LP Results
The following tables show the results of the Simplified 2-7 Draw sequence
form LPs. Each row denotes an information set for a particular player, each of
which has two possible actions. The Information column denotes the
previous action and information known to the player; this describes the
information set. The P(C/F) column lists the probability with which the
player should optimally check or fold, depending on context. The P(B/C) column lists the
corresponding probability of the other action, either betting or calling. For
example, information set 1 for player 1 prescribes checking with probability
0.822 and betting with probability 0.178, given that player 1 initially had a
hand in Bucket A, then both players checked, and now he has a hand in Bucket
1. Many of the information sets with ‘optimal’ probabilities of 0.5 for each
action correspond to histories that the player will never face, given the other prescriptions
of the strategy. For example, consider information set 14 for player 1. In the
associated history, player 1 bet after the initial deal; since he always checks in
that situation per information set 0, he will never face the choice of actions in
information set 14.
Table B.1: Optimal realization plans for the 2 initial bucket case (fragment)

Player   Info Set   Information           P(C/F)   P(B/C)
1        0          Deal to Bucket A      1        0
1        20         Deal to Bucket B      1        0
2        0          Bucket A, p1 check    1        0
2        33         Bucket B, p1 bet      0        1
Notice that the values do not describe probability distributions over each
information set; they sometimes do not sum to 1, are greater than 1, or are less
than 0. The values in Table B.2 are unedited, and directly reflect Gambit’s
output. We do not identify irrelevant sequences, but these are likely to be
information sets where both sequences are played with probability 0.5.
Table B.2: Optimal realization plans for the 3 initial bucket case (fragment)

Player   Info Set   Information                              P(C/F)   P(B/C)
1        0          Deal to Bucket C                         1        0
1        5          Bucket C, p1 check, p2 check, Bucket 3   0        1
1        20         Deal to Bucket D                         1        0
1        40         Deal to Bucket E                         1        0
2        13         Bucket C, p1 bet                         1        0
2        20         Bucket D, p1 check                       1        0
2        33         Bucket D, p1 bet                         1        0
2        53         Bucket E, p1 bet                         1        0
Bibliography
[3] Michael Bowling, Neil Burch, Michael Johanson, and Oskari Tammelin.
Heads-Up Limit Hold’em Poker is Solved. Science, 347(6218):145–149, 2015.
[4] Drew Fudenberg and Jean Tirole. Game Theory. MIT Press, Cambridge,
MA, 1995.
[8] Pokerstars. 2-7 (Deuce to Seven) Single Draw Lowball Rules, 2017.
https://www.pokerstars.com/poker/games/draw/2-7/single/.
[9] Tuomas Sandholm and Andrew Gilpin. Optimal Rhode Island Hold’em
Poker. National Conference on Artificial Intelligence (AAAI), Intelligent
Systems Demonstration Program, 2005.
[10] Byron Spice. Carnegie Mellon Artificial Intelligence Beats Top Poker
Pros. 2017. https://www.cmu.edu/news/stories/archives/2017/january/AI-
beats-poker-pros.html.