A mathematical analysis of the long-run behavior of genetic algorithms for social modeling

Ludo Waltman¹ &
Nees Jan van Eck¹

1807 Accesses
2 Citations
Explore all metrics

Abstract

We present a mathematical analysis of the long-run behavior of genetic algorithms (GAs) that are used for modeling social phenomena. Our analysis relies on commonly used mathematical techniques in the field of evolutionary game theory. We make a number of assumptions in our analysis, the most important one being that the mutation rate is positive but infinitely small. Given our assumptions, we derive results that can be used to calculate the exact long-run behavior of a GA. Using these results, the need to rely on computer simulations can be avoided. We also show that if the mutation rate is infinitely small the crossover rate has no effect on the long-run behavior of a GA. To demonstrate the usefulness of our mathematical analysis, we replicate a well-known study by Axelrod in which a GA is used to model the evolution of strategies in iterated prisoner’s dilemmas. The theoretically predicted long-run behavior of the GA turns out to be in perfect agreement with the long-run behavior observed in computer simulations. Also, in line with our theoretically informed expectations, computer simulations indicate that the crossover rate has virtually no long-run effect. Some general new insights into the behavior of GAs in the prisoner’s dilemma context are provided as well.

Characteristic Features of the Sustainable Strategies in the Evolvable Iterated Prisoners’ Dilemma

Evolutionary Mechanisms

Epictetusian rationality and evolutionary stability

Article 17 July 2024

Discover the latest articles, news and stories from top researchers in related subjects.

Artificial Intelligence

Use our pre-submission checklist

Avoid common mistakes on your manuscript.

1 Introduction

The field of evolutionary computation is concerned with the study of all kinds of evolutionary algorithms. These algorithms can be used for various purposes. Perhaps the most popular purpose for which they can be used is function optimization (e.g., Gen and Cheng 2000; Goldberg 1989; Michalewicz 1996). In the function optimization context, evolutionary algorithms can be seen as heuristics that serve as alternatives to more traditional techniques from the fields of combinatorial optimization and mathematical programming. Another important purpose for which evolutionary algorithms can be used is the modeling of biological and social phenomena (e.g., Mitchell 1996). This is the topic with which we are concerned in this paper. Our focus is in particular on the use of evolutionary algorithms for modeling social phenomena.

When using evolutionary algorithms in the social modeling context, one of the assumptions one makes is that the agents whose behavior is being modeled are boundedly rational. This basically means that the agents are assumed not to behave in a utility-maximizing manner. There are numerous ways in which boundedly rational behavior can be modeled (e.g., Brenner 2006; Fudenberg and Levine 1998). A popular approach is to rely on an evolutionary metaphor. This is the approach that is taken by evolutionary algorithms. In its simplest form, the evolutionary approach assumes that there is a population of agents and that for each agent in the population the strategy it uses depends on the population-wide past performance of strategies. The better the past performance of a strategy, the more likely the strategy is to be used again. The evolutionary approach also assumes that there always is a small probability that an agent experiments with a new strategy.

The evolutionary approach to modeling boundedly rational behavior has attracted a lot of attention, not only from researchers in the field of evolutionary computation but also from researchers in the social sciences, in particular from economists. Traditionally, economists have typically relied on game-theoretic models to analyze interactions between agents. These models assume agents to behave in a fully rational way. Nowadays, however, the limitations of game-theoretic models are well recognized and many economists have started to study evolutionary models of agent behavior. These models are based on the assumption that the behavior of agents can best be described using some evolutionary mechanism rather than using the idea of full rationality.

In the field of economics, there are two quite separate streams of research that are both concerned with the evolutionary approach to modeling boundedly rational behavior. One stream of research, which is usually referred to as agent-based computational economics (e.g., Tesfatsion 2006), makes use of techniques from the field of evolutionary computation. Especially genetic algorithms (GAs) are frequently used. Early work in this stream of research includes (Andreoni and Miller 1995; Arifovic 1994, 1996; Dawid 1996; Holland and Miller 1991; Marks 1992; Miller 1986), and examples of more recent work are Alkemade et al. (2006, 2009), Georges (2006), Haruvy et al. (2006), Lux and Schornstein (2005), Vriend (2000), Waltman and Van Eck (2009), and Waltman et al. (2011). The other stream of research is more closely related to traditional game theory and is referred to as evolutionary game theory (e.g., Gintis 2000; Maynard Smith 1982; Vega-Redondo 1996; Weibull 1995). Like the traditional game-theoretic approach, the evolutionary game-theoretic approach is model-based and relies heavily on mathematical analysis. The use of computer simulations is not very common in evolutionary game theory.

In this paper, it is not our aim to argue in favor of either the agent-based computational economics approach, which emphasizes algorithms and computer simulations, or the evolutionary game-theoretic approach, which emphasizes models and mathematical analysis. Instead, we want to show how the former approach can benefit from the mathematical techniques used in the latter approach. More specifically, we want to show how evolutionary algorithms that are used for modeling social phenomena can be analyzed mathematically using techniques that are popular in evolutionary game theory. Our focus in this paper is on one particular type of evolutionary algorithm, namely GAs with a binary encoding. However, we emphasize that the approach that we take can be applied to other types of evolutionary algorithms as well. The reason for focusing on GAs with a binary encoding is that this seems to be the type of evolutionary algorithm that is used most frequently for modeling social phenomena (e.g., Alkemade et al. 2006, 2007; Andreoni and Miller 1995; Arifovic 1994, 1996; Ashlock et al. 1996; Axelrod 1987; Crowley et al. 1996; Dawid 1996; Georges 2006; Ishibuchi and Namikawa 2005; Lux and Schornstein 2005; Marks 1992; Miller 1986, 1996; Van Bragt et al. 2001; Vriend 2000; Yao and Darwen 1994).

The mathematical analysis that we present in this paper deals with the long-run behavior of GAs with a binary encoding. The GAs are assumed to be used in the social modeling context (for theoretical work on GAs in the function optimization context, see, e.g., Mitchell 1996; Nix and Vose 1992; Rudolph 1994, 1998; Vose 1999). In the terminology of Vriend (2000), we are concerned with GAs that are used for modeling social learning (as opposed to individual learning). Our work can be seen as an extension of the work of Dawid (1996), who derived a number of important mathematical results on the behavior of GAs. For small and moderate population sizes, the results of Dawid do not provide a full characterization of the long-run behavior of GAs. We extend the work of Dawid by deriving results that do provide a full characterization of the long-run behavior of GAs for small and moderate population sizes. Using our results, the long-run behavior of a GA can be calculated exactly and need not be estimated using computer simulations. This means that it is no longer necessary to run a GA a large number of times for a large number of iterations in order to get insight into its long-run behavior. The use of our mathematical results has at least three advantages over the use of computer simulations:

1.
Our mathematical results can be used to calculate the long-run behavior of a GA exactly, while computer simulations can only be used to estimate the long-run behavior of a GA.
2.
When using computer simulations, it can be difficult to determine how many iterations of a GA are required to approximate the long-run behavior of the GA reasonably closely. Our mathematical results do not have this problem.
3.
Calculating the exact long-run behavior of a GA using our mathematical results requires less computing time than obtaining a reasonably accurate estimate of the long-run behavior of a GA using computer simulations.

Our mathematical results have one important limitation, which is that on most of today’s computers they can only be used if the chromosome length is not greater than about 24 bits. If the chromosome length is greater than about 24 bits, the use of our mathematical results to calculate the long-run behavior of a GA most likely requires a prohibitive amount of computer memory.

Like in Dawid (1996), the mathematical analysis presented in this paper relies on the assumption that the mutation rate is positive but infinitely small. (In other words, the analysis is concerned with the limit case in which the mutation rate approaches zero.) In simulation studies with GAs, researchers typically work with values between 0.001 and 0.01 for the mutation rate. This seems to be a rather pragmatic choice (cf. Dawid 1996). On the one hand, lower values for the mutation rate would lead to very slow convergence and, consequently, very long simulation runs. On the other hand, higher values for the mutation rate would lead to convergence to unstable, difficult to interpret outcomes. We believe that our assumption of an infinitely small mutation rate is justified because an infinitely small mutation rate is less arbitrary than a mutation rate whose value is determined solely based on pragmatic grounds (cf. Foster and Young 1990). The assumption of an infinitely small mutation rate is also in line with the common practice in evolutionary game theory, in which a similar assumption is almost always made. The advantage of assuming an infinitely small mutation rate is that it greatly simplifies the mathematical analysis of the long-run behavior of GAs (see also Dawid 1996). In fact, GAs with an infinitely small mutation rate can be analyzed in a similar way as well-known models in evolutionary game theory (e.g., Foster and Young 1990; Kandori et al. 1993; Vega-Redondo 1997; Young 1993). Like in evolutionary game theory, mathematical results provided by Freidlin and Wentzell (1998) are the key tool for analyzing the long-run behavior to which convergence will take place. We note that, in addition to the assumption of an infinitely small mutation rate, there are some other technical assumptions on which our mathematical analysis relies. Most of these assumptions are not very strong and will probably be satisfied by most GAs.

To demonstrate the usefulness of our mathematical analysis, we replicate a well-known study by Axelrod (1987) (reprinted in Axelrod 1997; see also Dawid 1996; Mitchell 1996). Axelrod used a GA to model the evolution of strategies in iterated prisoner’s dilemmas (IPDs). He showed that an evolutionary mechanism can lead to cooperative behavior. Axelrod’s study has been one of the first and also one of the most influential studies on the use of evolutionary algorithms for modeling social phenomena. Directly or indirectly, his study seems to have inspired many researchers (e.g., Ashlock et al. 1996, 2006; Chong and Yao 2005; Crowley et al. 1996; Fogel 1993; Ishibuchi and Namikawa 2005; Mühlenbein 1991; Thibert-Plante and Charbonneau 2007; Van Bragt et al. 2001; Yao and Darwen 1994). The results obtained by Axelrod are all based on computer simulations. In this paper, we show that more or less the same results can be calculated exactly, with no need to rely on simulations. We also discuss some new insights that exact calculations provide.

The mathematical analysis that we present in this paper also has an important implication for the choice of the parameters of a GA. The analysis indicates that if the mutation rate is infinitely small the crossover rate has no effect on the long-run behavior of a GA. This is a quite remarkable result that, to the best of our knowledge, has not been reported before in the theoretical literature on GAs. The result implies that when GAs are used for modeling social phenomena the crossover rate is likely to be a rather insignificant parameter, at least when one is mainly interested in the behavior of GAs in the long run (for the short run, see Thibert-Plante and Charbonneau 2007). This suggests that in many cases the crossover rate can simply be set to zero, in which case no crossover will take place at all. Simulation results that we report in this paper indeed show no significant effect of the crossover rate on the long-run behavior of a GA.

The remainder of this paper is organized as follows: in Sect. 2, we present a mathematical analysis of the long-run behavior of GAs that are used for modeling social phenomena. Based on the analysis, we derive an algorithm for calculating the long-run behavior of GAs in Sect. 3. In Sect. 4, we demonstrate an application of the algorithm by replicating Axelrod’s study (1987). Finally, we discuss the conclusions of our research in Sect. 5. Proofs of our mathematical results are provided in the Appendix.

2 Analysis

The general form of the GAs that we analyze in this paper is shown in Fig. 1. In this figure, and also in the rest of this paper, the positive integers n and m and the probabilities γ and $\varepsilon$ denote, respectively, the population size, the chromosome length, the crossover rate, and the mutation rate. For simplicity, we assume the population size n to be even. We further assume the crossover rate γ and the mutation rate $\varepsilon$ to remain constant over time. We also assume $\varepsilon$ to be positive. The GAs that we analyze are generalizations of the canonical GA discussed in, for example, Goldberg (1989) and Mitchell (1996). Like the canonical GA, we assume the use of a binary encoding, that is, chromosomes correspond to bit strings in our GAs. Unlike the canonical GA, we do not assume the use of specific selection and crossover operators. Instead, the GAs that we analyze may use almost any selection operator, such as roulette wheel selection (sometimes referred to as fitness-proportionate selection), tournament selection, or rank selection, and any crossover operator, such as single-point crossover, two-point crossover, or uniform crossover. Furthermore, in the GAs that we analyze, the fitness of a chromosome may depend, either deterministically or stochastically, on the entire population rather than only on the chromosome itself. When using GAs for social modeling, the fitness of a chromosome typically depends on the entire population. This is referred to as state-dependent fitness in Dawid (1996). In most studies, GAs that are used for social modeling have the same general form as the GAs that we analyze in this paper.

We now introduce the terminology and the mathematical notation that we use in our analysis. We note that an overview of the mathematical notation is provided in Table 1. There are μ = 2^m different chromosomes, denoted by $0, \ldots, \mu-1.$ Each chromosome has a unique binary encoding, which is given by a bit string of length m.^{Footnote 1} $\mathcal{C}=\{0, \ldots, \mu-1\}$ denotes the set of all chromosomes. i and j denote typical chromosomes and take values in $\mathcal{C}.$ The following definition introduces the notion of uniform and non-uniform populations.

Table 1 Overview of the mathematical notation

Full size table

Definition 1

A population is said to be uniform if and only if all n chromosomes in the population are identical. A population is said to be non-uniform if and only if some chromosomes in the population are different.

$\mathcal{U}$ denotes the set of all uniform populations. Obviously, since there are μ different chromosomes, there are also μ different uniform populations, that is, $\vert{\mathcal{U}}\vert =\mu.$ $u(i) \in \mathcal{U}$ denotes the uniform population consisting of n times chromosome i.δ(i, j) denotes the Hamming distance between chromosomes i and j, that is, the number of corresponding bits in the binary encodings of i and j that are different. $\mathcal{G}(i)$ denotes the set of all chromosomes that have the same binary encoding as chromosome i except that one bit has been changed from one into zero. Conversely, $\mathcal{H}(i)$ denotes the set of all chromosomes that have the same binary encoding as chromosome i except that one bit has been changed from zero into one. In mathematical notation,

$$ {\mathcal{G}}(i)=\{j \vert j < i \hbox{ and } \delta(i, j) = 1\} $$

(1)

$$ {\mathcal{H}}(i)=\{j \vert j > i \hbox{ and } \delta(i, j) = 1\}. $$

(2)

Notice that $j \in \mathcal{G}(i)$ if and only if $i \in\mathcal{H}(j).$ There are

$$ \nu = \mu m/2 = m{2^{m - 1}} $$

(3)

combinations of two chromosomes i and j such that δ(i, j) = 1, that is, such that the binary encodings of i and j differ by exactly one bit. (To see this, notice that there are μ different chromosomes and that for each chromosome there are m chromosomes that have the same binary encoding except for one bit. Dividing by two is necessary to avoid double counting.) k and $k^{\prime}$ denote indices that take values in $\{1, \ldots, \nu\}.\;\widetilde{{\mathcal{V}}}$ denotes the set of all populations in which there are exactly two different chromosomes and in which the binary encodings of these chromosomes differ by exactly one bit. There are

$$ \xi=\lvert{\widetilde{\mathcal{V}}}\rvert = \nu(n - 1) = (n - 1)m{2 ^ {m-1}} $$

(4)

such populations. (The order of the chromosomes within a population has no effect on the behavior of a GA. Populations consisting of the same chromosomes in different orders are therefore considered identical.) $\mathcal{V}$ denotes the set that is obtained by adding the uniform populations to $\widetilde{\mathcal{V}},$ that is, $\mathcal{V} = \widetilde{\mathcal{V}} \cup \mathcal{U}.$ For i and j such that δ(i, j) = 1 and for $\lambda \in \{0,\ldots, n\}, v(i, j, \lambda) \in \mathcal{V}$ denotes the population consisting of λ times chromosome i and n − λ times chromosome j. Notice that v(i, j, λ) = v(j, i, n − λ) and that v(i, j, 0) = u(j) and $v(i, j, n)=u(i).$ $\mathcal{W}$ denotes the set of all possible populations. As shown in Nix and Vose (1992, Lemma 1) and Dawid (1996), the number of possible populations equals

$$ \vert{\mathcal{W}}\vert = \left(\begin{array}{l} n + \mu - 1\\ \mu - 1 \end{array}\right)=\frac{(n + \mu - 1)!}{n!(\mu - 1)!}. $$

(5)

(Again, populations consisting of the same chromosomes in different orders are considered identical.) For $t \in \{0, 1, \ldots\},$ the random variable $W_t \in \mathcal{W}$ denotes the population at the beginning of iteration t of a GA. For i and j such that δ(i, j) = 1 and for $\lambda \in \{1, \ldots, n - 1\}$ and $\lambda^{\prime} \in \{0, \ldots, n\}, \pi(i, j, \lambda,\lambda^{\prime})$ denotes the limit as the mutation rate $\varepsilon$ approaches zero of the probability that population v(i, j, λ) is turned into population $v(i, j,\lambda^{\prime})$ in a single iteration of a GA. In mathematical notation,

$$ \pi(i, j, \lambda, \lambda^{\prime}) = \lim_{\varepsilon \rightarrow 0}\Pr(W_{t + 1} = v(i, j, \lambda^{\prime}) \vert W_t = v(i, j, \lambda)) $$

(6)

where $t \in \{0, 1, \ldots\}.$ Because the binary encodings of chromosomes i and j differ by only one bit, applying the crossover operator to chromosomes i and j has no effect. As a consequence, the crossover operator has no effect on $\pi(i, j,\lambda, \lambda^{\prime}).$ Moreover, because $\varepsilon$ approaches zero, the mutation operator has no effect on $\pi(i, j,\lambda, \lambda^{\prime})$ either. $\pi(i, j, \lambda,\lambda^{\prime})$ therefore equals the probability that the selection operator turns population v(i, j, λ) into population $v(i, j, \lambda^{\prime})$ in a single iteration of a GA.

The following definition introduces the notion of almost uniform populations.

Definition 2

A non-uniform population $w \in \mathcal{W} \setminus\mathcal{U}$ is said to be almost uniform if and only if

$$ \lim_{\varepsilon \rightarrow 0}\Pr(W_{t + N} = u \vert W_t = w) > 0 $$

(7)

for all $t \in \{0, 1, \ldots\},$ some finite positive integer N, and some $u \in \mathcal{U}.$

Hence, a non-uniform population is almost uniform if and only if no mutation is required to go from the non-uniform population to some uniform population. This is not a very strong condition. In many cases, all non-uniform populations are almost uniform. For example, if a GA uses roulette wheel selection or tournament selection, there is always a possibility that the selection operator selects n times the same chromosome. In other words, the selection operator can turn any non-uniform population into a uniform population in a single iteration. Because of this, when roulette wheel selection or tournament selection is used, all non-uniform populations are almost uniform.

The following two definitions introduce the notion of a connection from one chromosome to another:

Definition 3

A direct connection from chromosome i to chromosome j is said to exist if and only if δ(i, j) = 1 and

$$ \lim_{\varepsilon \rightarrow 0}\Pr(W_{t + N} = u(j) \vert W_t = v(i, j, n - 1)) > 0 $$

(8)

for all $t \in \{0, 1, \ldots\}$ and some finite positive integer N.

Definition 4

A connection from chromosome i to chromosome j is said to exist if and only if there exists a sequence $(i_1,\ldots, i_N)$ such that $i_1, \ldots, i_N \in \mathcal{C}, i_1 = i,i_N = j,$ and i _M is directly connected to i _M+1 for all $M\in \{1, \ldots, N - 1\}.$

Definition 3 states that there is a direct connection from chromosome i to chromosome j if and only if the minimum number of mutations required to go from uniform population u(i) to uniform population u(j) is one. We note that in many cases all chromosomes i and j such that δ(i, j) = 1 have mutual direct connections. This is, for example, the case if a GA uses roulette wheel selection and the fitness of a chromosome is always positive. Definition 4 states that there is a connection from chromosome i to chromosome j if and only if there is a sequence of chromosomes starting at i and ending at j such that each chromosome in the sequence is directly connected to its successor. Clearly, if all chromosomes i and j such that δ(i, j) = 1 have mutual direct connections, then each chromosome is connected to all other chromosomes.

It is well known that the population in the current iteration of a GA has no effect on the behavior of the GA in the long run (e.g., Dawid 1996; Nix and Vose 1992). More specifically, the population an infinite number of iterations in the future is statistically independent of the population in the current iteration. The following lemma states this result in a formal way:

Lemma 1

For each population $w \in \mathcal{W},$ there exists a long-run probability $\bar{q}(w)$ such that

$$ \lim_{N \rightarrow \infty}\Pr(W_{t + N} = w \vert W_t = w_t) = \bar{q}(w) $$

(9)

for all $t \in \{0, 1, \ldots\}$ and all $w_t \in\mathcal{W}.$

Proof

See the Appendix.

In our analysis, we are concerned with the long-run behavior of GAs in the limit as the mutation rate $\varepsilon$ approaches zero. We therefore use the following definition:

Definition 5

For $w \in \mathcal{W}, \hat{q}(w) =\lim_{\varepsilon \rightarrow 0}\bar{q}(w)$ is called the long-run limit probability of population w.

We now introduce the vectors and matrices that we need to state the main result of our analysis. We first note that throughout this paper vectors and matrices are represented by, respectively, bold lowercase and bold uppercase letters and that the transpose of a matrix X is written as X ^T. I _N denotes an identity matrix of order N × N, and 0 _M × N and 1 _M × N denote matrices of order M × N in which all elements are equal to, respectively, zero and one. We simply write I, 0, or 1 when the order of a matrix is clear from the context. g = [g _k] and h = [h _k] denote vectors of length ν that satisfy

$$ \forall k: g_k, h_k \in {\mathcal{C}} $$

(10)

$$ \forall k: h_k \in {\mathcal{H}}(g_k) $$

(11)

$$ \forall k, k^{\prime}: k \neq k^{\prime} \Rightarrow (g_k, h_k) \neq (g_{k^{\prime}}, h_{k^{\prime}}). $$

(12)

Hence, for each k, (g _k, h _k) denotes a combination of two chromosomes such that the binary encodings of the chromosomes differ by exactly one bit. $\mathbf{g}$ and $\mathbf{h}$ together contain all such combinations of two chromosomes. $\mathbf{A}, \mathbf{B},\mathbf{C},$ and $\mathbf{D}$ denote matrices of order μ × ξ, ξ × μ, ξ × ξ, and μ × μ, respectively. Matrix $\mathbf{A}$ is given by

$$ {\mathbf{A}}=\left[\begin{array}{lll} {\mathbf{a}}(0, 1)& \cdots & {\mathbf{a}}(0, \nu)\\ \vdots& \ddots&\vdots\\ {\mathbf{a}}(\mu - 1, 1)&\cdots&{\mathbf{a}}(\mu - 1, \nu) \end{array}\right] $$

(13)

where

$$ {\mathbf{a}}(i, k)= \left\{\begin{array}{ll}{\tilde{\mathbf{a}}_1},& \hbox{if}\; g_k = i\\ {{\tilde{\mathbf{a}}}_2}, & \hbox{if}\; h_k = i\\ {\mathbf{0}}_{1 \times (n - 1)}, & \hbox{otherwise} \end{array}\right. $$

(14)

and

$$ {{\tilde{\mathbf{a}}}_1}=\left[\begin{array}{ll}{\mathbf{0}}_{1 \times (n - 2)} & 1\end{array}\right], \quad {{\tilde{\mathbf{a}}}_2}= \left[\begin{array}{ll}1 & {\mathbf{0}}_{1 \times (n - 2)}\end{array}\right]. $$

(15)

Matrix $\mathbf{B}$ is given by

$$ {\mathbf{B}}=\left[\begin{array}{lll}{\mathbf{b}}(1, 0) &\cdots & {\mathbf{b}}(1, \mu - 1) \\ \vdots& \ddots & \vdots\\ {\mathbf{b}}(\nu, 0) & \cdots & {\mathbf{b}}(\nu, \mu - 1) \end{array}\right] $$

(16)

where

$$ {\mathbf{b}}(k, i)=\left\{\begin{array}{ll}{{\tilde{\mathbf{b}}}}(g_k, h_k, n), & \hbox{if}\; g_k=i \\ {{\tilde{\mathbf{b}}}}(g_k, h_k, 0), & \hbox {if}\; h_k = i\\ {\mathbf{0}}_{(n-1) \times 1}, & \hbox{otherwise} \end{array}\right. $$

(17)

and

$$ {{\tilde{\mathbf{b}}}}(i, j, \lambda) =\left[\begin{array}{l} \pi(i, j, 1, \lambda)\\ \vdots\\ \pi(i, j, n - 1, \lambda) \end{array}\right]. $$

(18)

Matrix $\mathbf{C}$ is given by

$$ {\mathbf{C}} =\left[\begin{array}{lll} {\mathbf{C}}(1, 1) & \cdots & {\mathbf{C}}(1, \nu)\\ \vdots&\ddots&\vdots\\ {\mathbf{C}}(\nu, 1)&\cdots&{\mathbf{C}}(\nu, \nu)\end{array}\right] $$

(19)

where

$$ {\mathbf{C}}(k, k^{\prime}) =\left\{\begin{array}{ll} {{\widetilde{\mathbf{C}}}}(g_k, h_k), &\hbox{if}\; k=k^{\prime}\\ {\mathbf{0}}_{(n-1) \times (n-1)}, & \hbox{otherwise}\end{array}\right. $$

(20)

and

$$ {{\widetilde{\mathbf{C}}}}(i, j)= \left[\begin{array}{lll} \pi(i, j, 1, 1)& \cdots & \pi(i, j, 1, n - 1)\\ \vdots&\ddots&\vdots\\ \pi(i, j, n - 1, 1)& \cdots& \pi(i, j, n - 1, n - 1) \end{array}\right]. $$

(21)

Matrix D is obtained from A, B, and C and is given by

$$ {\mathbf{D}} = {\mathbf{A}}{({\mathbf{I}} - {\mathbf{C}}) ^ {-1}}{\mathbf{B}} - m{\mathbf{I}}. $$

(22)

The following theorem states the main result of our analysis:

Theorem 1

Let all non-uniform populations be almost uniform, and let each chromosome in $\mathcal{C}$ be connected to all other chromosomes in $\mathcal{C}.$ Then, (1) all non-uniform populations have a long-run limit probability of zero, that is, $\hat{q}(w) = 0$ for all $w \in \mathcal{W} \setminus\mathcal{U},$ and (2) the long-run limit distribution ${{\hat{\mathbf{q}}}}=\left[\begin{array}{lll}\hat{q}(u(0)) \cdots \hat{q}(u(\mu - 1))\end{array}\right]$ satisfies

$$ {{\hat{\mathbf{q}}}}{\mathbf{D}}={\mathbf{0}} $$

(23)

$$ {{\hat{\mathbf{q}}}}{\mathbf{1}}=1 $$

(24)

which has a unique solution.

Proof

See the Appendix.

There are three comments that we would like to make on the above theorem. First, the result that under certain assumptions non-uniform populations have a long-run limit probability of zero is not new. A similar result can be found in Dawid (1996, Proposition 4.2.1). Second, under the assumptions of the theorem, the long-run limit probability of a population does not depend on the crossover rate γ. This is a quite remarkable result that, to the best of our knowledge, has not been reported before in the theoretical literature on GAs. It indicates that in the limit as the mutation rate $\varepsilon$ approaches zero γ has no effect on the long-run behavior of a GA. Third, the theorem can be used to calculate the long-run limit distribution ${\hat{\mathbf{q}}}$ only if the probabilities $\pi(i, j, \lambda, \lambda^{\prime})$ defined in (6) can be calculated for all i and all j such that δ(i, j) = 1 and for all $\lambda \in \{1, \ldots, n - 1\}$ and all $\lambda^{\prime} \in \{0, \ldots, n\}.$ Whether this is possible depends on the way in which the fitness of a chromosome is determined and on the selection operator that is used. This in turn depends heavily on the specific problem that one wants to model using a GA. Because of the dependence on the problem to be modeled, we cannot provide any general results for the calculation of the probabilities $\pi(i, j, \lambda, \lambda^{\prime}).$ In Sect. 4, however, we demonstrate how the probabilities $\pi(i, j, \lambda,\lambda^{\prime})$ can be calculated for a GA that is similar to the GA used by Axelrod in his seminal paper on GA modeling (Axelrod 1987).

3 Algorithm

In this section, we present an algorithm for calculating the long-run limit distribution ${\hat{\mathbf{q}}}.$ The algorithm is based on Theorem 1. Like Theorem 1, it assumes that all non-uniform populations are almost uniform and that each chromosome in $\mathcal{C}$ is connected to all other chromosomes in $\mathcal{C}.$ It also assumes that the probabilities $\pi(i, j,\lambda, \lambda^{\prime})$ defined in (6) can be calculated for all i and all j such that δ(i, j) = 1 and for all $\lambda\in \{1, \ldots, n - 1\}$ and all $\lambda^{\prime} \in \{0, \ldots,n\}.$

The most straightforward approach to calculating the long-run limit distribution ${\hat{\mathbf{q}}}$ would be to start with calculating the matrices $\mathbf{A}, \mathbf{B},$ and $\mathbf{C}$ using (13)–(21). Matrix $\mathbf{D}$ would then be calculated using (22), which would require solving a linear system. Finally, ${\hat{\mathbf{q}}}$ would be obtained by solving the linear system given by (23) and (24). Unfortunately, this approach to calculating ${\hat{\mathbf{q}}}$ requires a lot of computer memory and is therefore infeasible even for problems of only moderate size. Most memory is required for storing matrix $\mathbf{C}.$ This matrix has (at most) ν (n − 1)² = (n − 1)² m2^m−1 non-zero elements. Clearly, as the population size n and the chromosome length m increase, storing the non-zero elements of $\mathbf{C}$ in a computer’s main memory soon becomes infeasible. The algorithm that we propose for calculating ${\hat{\mathbf{q}}}$ exploits the sparsity of the matrices $\mathbf{A}, \mathbf{B},$ and $\mathbf{C}$ in order to calculate matrix $\mathbf{D}$ in a memory-efficient way. The algorithm does not require the entire matrices $\mathbf{A},\mathbf{B},$ and $\mathbf{C}$ to be stored in memory. The algorithm also solves the linear system given by (23) and (24) in a memory-efficient way. This is achieved by exploiting the sparsity of $\mathbf{D}.$ The algorithm is shown in Fig. 2. We now discuss it in more detail.

We first consider the efficient calculation of matrix $\mathbf{D}.$ Let ${\widehat{\mathbf{C}}} = (\mathbf{I} - \mathbf{C}) ^ {-1}.$ Because $\mathbf{C}$ is a block diagonal matrix, ${\widehat{\mathbf{C}}}$ can be written as

$$ {{\widehat{\mathbf{C}}}}=\left[\begin{array}{lll} {{\widehat{\mathbf{C}}}}(1, 1) & \cdots & {{\widehat{\mathbf{C}}}}(1, \nu)\\ \vdots& \ddots&\vdots\\ {{\widehat{\mathbf{C}}}}(\nu, 1)&\cdots&{{\widehat{\mathbf{C}}}}(\nu, \nu) \end{array}\right] $$

(25)

where

$$ {{\widehat{\mathbf{C}}}}(k, k^{\prime})=\left\{\begin{array}{ll}({\mathbf{I}} - {{\widetilde{\mathbf{C}}}}(g_k, h_k)) ^ {-1}, & \hbox {if \;$k = k^{\prime}$} \\ {\mathbf{0}}_{(n - 1) \times (n - 1)}, & \hbox {otherwise}.\end{array}\right. $$

(26)

Hence, ${\widehat{\mathbf{C}}}$ is a block diagonal matrix too. Let $\mathbf{D}$ be written as

$$ {\mathbf{D}}=\left[\begin{array}{lll} d(0, 0)& \cdots & d(0, \mu - 1)\\ \vdots&\ddots&\vdots\\ d(\mu - 1, 0)&\cdots & d(\mu-1, \mu-1) \end{array}\right]. $$

(27)

Taking into account the sparsity of $\mathbf{A}, \mathbf{B},$ and ${\widehat{\mathbf{C}}},$ it can be seen that

$$ d(i, j)= \left\{\begin{array}{ll}\sum_{i^{\prime} \in {\mathcal{G}}(i)}{{\tilde{\mathbf{a}}}_2}{{\tilde{\mathbf{e}}}}(i^{\prime}, i, 0) &\\\quad + \sum_{i^{\prime} \in {\mathcal{H}}(i)}{{\tilde{\mathbf{a}}}_1}{{\tilde{\mathbf{e}}}}(i, i^{\prime}, n) - m, & \hbox{if}\; i=j \\ {{\tilde{\mathbf{a}}}_2}{{\tilde{\mathbf{e}}}}(j, i, n), & \hbox{if}\; j \in {\mathcal{G}}(i)\\ {{\tilde{\mathbf{a}}}_1}{{\tilde{\mathbf{e}}}}(i, j, 0), & \hbox{if}\; j \in {\mathcal{H}}(i)\\ 0, & \hbox{otherwise}\end{array}\right. $$

(28)

where

$$ {{\tilde{\mathbf{e}}}}(i, j, \lambda) = {({\mathbf{I}} - {{\widetilde{\mathbf{C}}}}(i, j)) ^ {-1}}{{\tilde{\mathbf{b}}}}(i, j, \lambda). $$

(29)

This result shows that each non-zero element of $\mathbf{D}$ can be calculated by solving one or more relatively small linear systems, that is, systems of n − 1 equations and unknowns. Moreover, by calculating the elements of $\mathbf{D}$ one by one, there is no need to store the entire matrices $\mathbf{A}, \mathbf{B},$ and $\mathbf{C}$ in memory. Solving a linear system of n − 1 equations and unknowns can be done using standard Gaussian elimination methods. Except for very large values for the population size n, today’s computers have sufficient main memory to apply Gaussian elimination methods to such systems. We further note that the amount of computation required for obtaining $\mathbf{D}$ can be reduced by taking into account that

$$ \begin{aligned} {{\tilde{\mathbf{e}}}}(i, j, 0)&={({\mathbf{I}} - {{\widetilde{\mathbf{C}}}}(i, j)) ^ {-1}}{{\tilde{\mathbf{b}}}}(i, j, 0)\\ &={({\mathbf{I}} - {{\widetilde{\mathbf{C}}}}(i, j)) ^ {-1}}\left({\mathbf{1}} - \textstyle\sum\limits_{\lambda = 1}^n{{\tilde{\mathbf{b}}}}(i, j, \lambda)\right)\\ &={({\mathbf{I}} - {{\widetilde{\mathbf{C}}}}(i, j)) ^ {-1}}({\mathbf{1}} - {{\widetilde{\mathbf{C}}}}(i, j){\mathbf{1}} - {{\tilde{\mathbf{b}}}}(i, j, n)) \\ &={({\mathbf{I}} - {{\widetilde{\mathbf{C}}}}(i, j)) ^ {-1}}({\mathbf{I}} - {{\widetilde{\mathbf{C}}}}(i, j)){\mathbf{1}} - {{\tilde{\mathbf{e}}}}(i, j, n)\\ &={\mathbf{1}} - {{\tilde{\mathbf{e}}}}(i, j, n). \end{aligned} $$

(30)

Because of this, d(i, j) can be written as

$$ d(i,j)=\left\{\begin{array}{ll}\sum_{i^{\prime} \in {\mathcal{G}}(i)}(1 - {{\tilde{\mathbf{a}}}_2}{{\tilde{\mathbf{e}}}}(i^{\prime}, i, n))&\\\quad +\sum_{i^{\prime}\in{\mathcal{H}}(i)}{{\tilde{\mathbf{a}}}_1}{{\tilde{\mathbf{e}}}}(i, i^{\prime}, n) -m, & \hbox{if}\; i=j\\ {{\tilde{\mathbf{a}}}_2}{{\tilde{\mathbf{e}}}}(j, i, n), & \hbox{if}\; j \in {\mathcal{G}}(i)\\ 1 - {{\tilde{\mathbf{a}}}_1}{{\tilde{\mathbf{e}}}}(i, j, n), & \hbox {if}\; j \in {\mathcal{H}}(i)\\ 0, & \hbox{otherwise.}\end{array}\right. $$

(31)

Using (31) rather than (28) to calculate $\mathbf{D}$ halves the number of linear systems that need to be solved. In the algorithm in Fig. 2, the calculation of $\mathbf{D}$ based on (31) is performed between lines 3 and 3.

Matrix $\mathbf{D}$ has μ² = 2^2m elements. Consequently, storing all elements of $\mathbf{D}$ in a computer’s main memory is possible only if the chromosome length m is not too large. It follows from (28) and (31) that the number of non-zero elements in $\mathbf{D}$ equals μ(m + 1) = (m + 1)2^m. Hence, $\mathbf{D}$ is a rather sparse matrix and a lot of memory can be saved by storing only its non-zero elements.^{Footnote 2} In addition to the memory efficiency of the way in which $\mathbf{D}$ is stored, one should also pay attention to the memory efficiency of the method that is used to solve the linear system given by (23) and (24). Gaussian elimination and other direct (i.e., non-iterative) methods for solving linear systems generally require that at least a large number of elements of the coefficient matrix, including zero elements, are stored in memory. Consequently, when using such a method to solve the linear system given by (23) and (24), it would not be possible to fully exploit the sparsity of $\mathbf{D}.$ Linear systems can also be solved using iterative methods that require only the non-zero elements of the coefficient matrix to be stored in memory. One such method is the method of successive overrelaxation (e.g., Barrett et al. 1994; Stewart 1994; Tijms 1994, 2003). In the algorithm in Fig. 2, this method is used to solve the linear system given by (23) and (24) (see lines 3–3 of the algorithm). In addition to an initial guess ${\hat{\mathbf{q}}_0}$ for the solution of the linear system, the method of successive overrelaxation also requires a value for the relaxation factor ω. The value of ω, which should be between 0 and 2, may have a large effect on the rate of convergence of the method, and for some values of ω the method may not converge at all. An appropriate value for ω has to be determined experimentally. For ω = 1, the method of successive overrelaxation reduces to the Gauss-Seidel method, which is another iterative method for solving linear systems. We refer to Stewart (1994) for an in-depth discussion of both the method of successive overrelaxation and a number of alternative methods for solving linear systems similar to the one given by (23) and (24). We further note that the amount of main memory in most of today’s computers allows the algorithm in Fig. 2 to be run for chromosomes with length m up to about 24 bits.

4 Application

In this section, we demonstrate an application of the algorithm presented in the previous section. We study the use of a GA for modeling the evolution of strategies in IPDs. The use of GAs in this context was first studied by Axelrod (1987) (reprinted in Axelrod 1997; see also Dawid 1996; Mitchell 1996) and after him by many others (e.g., Ashlock et al. 1996, 2006; Crowley et al. 1996; Ishibuchi and Namikawa 2005; Miller 1996; Mühlenbein 1991; Thibert-Plante and Charbonneau 2007; Van Bragt et al. 2001; Yao and Darwen 1994). The algorithm presented in the previous section is used to analyze the long-run behavior of our GA. The results of the analysis are compared with results obtained using computer simulations (i.e., results obtained simply by running the GA). We emphasize that our primary aim is merely to illustrate the usefulness of the mathematical analysis provided in Sect. 2 and of the algorithm derived from the analysis in Sect. 3. It is not our primary aim to provide new insights into the behavior of GAs in the context of IPDs.

4.1 Genetic algorithm modeling in iterated prisoner’s dilemmas

The way in which we model the evolution of strategies in IPDs is similar to the way in which this was done by Axelrod (1987). However, Axelrod studied two approaches for modeling the evolution of strategies. In one approach, the fitness of a chromosome is determined by the performance of the chromosome in IPD games against a fixed set of opponents. In the other approach, the fitness of a chromosome is determined by the performance of the chromosome in IPD games against other chromosomes in the population. We restrict our attention to the second approach. This is the approach on which almost all studies after Axelrod’s work have focused (an exception is Mittal and Deb 2006).

We model the evolution of strategies in IPDs using a GA with a population size of n = 20 chromosomes. Each chromosome represents a strategy for playing IPD games. Players in IPD games are assumed to choose the action they play, that is, whether they cooperate or defect, based on their own actions and their opponent’s actions in the previous τ periods of the game, where τ is referred to as players’ memory length. Players are further assumed to play only pure strategies. We use the same binary encoding of strategies as was used by Axelrod (1987). For a description of this encoding, we refer to Axelrod (1987, 1997), Dawid (1996), and Mitchell (1996). Using Axelrod’s encoding, the chromosome length m depends on the memory length τ. We consider three memory lengths, 1, 2, and 3 periods, which result in chromosome lengths of, respectively, 6, 20, and 70 bits. In each iteration of the GA, each chromosome in the population plays an IPD game of 151 periods against all other chromosomes. In addition, each chromosome also plays a game against itself. The payoff matrix for a single period of an IPD game is shown in Table 2. The payoffs in this matrix must satisfy

$$ S < P < R < T $$

(32)

and

$$ S + T < 2R. $$

(33)

The payoff obtained by a chromosome in an IPD game equals the mean payoff obtained by the chromosome in all periods of the game. The fitness f of a chromosome equals the mean payoff obtained by the chromosome in the IPD games that it has played in the current iteration of the GA. Like in Axelrod’s work (1987), we use sigma scaling (e.g., Mitchell 1996) to normalize the fitness of a chromosome. The normalized fitness $\tilde{f}$ of a chromosome is given by

$$ \tilde{f}=\left\{\begin{array}{ll}\max\left({{f - \mu_{\rm f}}\over {\sigma_{\rm f}}} + 1, 0\right), & \hbox{if}\; \sigma_{\rm f} > 0 \\ 1, & \hbox{otherwise}\end{array}\right. $$

(34)

where μ _f and σ_f denote, respectively, the mean and the standard deviation of the fitness of the chromosomes in the population. The selection operator that we use is roulette wheel selection. Selection is performed based on the normalized fitness of the chromosomes in the population. The crossover operator that we use is single-point crossover.

Table 2 Payoff matrix for a single period of an iterated prisoner’s dilemma game

Full size table

4.2 Calculation of the long-run limit distribution of the genetic algorithm

In this subsection, we are concerned with the calculation of the long-run limit distribution of the GA discussed in the previous subsection. To calculate the long-run limit distribution of the GA, we use the algorithm presented in Sect. 3. This algorithm assumes that the probabilities $\pi(i, j, \lambda, \lambda^{\prime})$ defined in (6) can be calculated for all i and all j such that δ(i, j) = 1 and for all $\lambda \in \{1, \ldots, n - 1\}$ and all $\lambda^{\prime} \in \{0, \ldots, n\}.$ We now discuss the calculation of the probabilities $\pi(i, j, \lambda,\lambda^{\prime})$ for our GA. For $i^{\prime}, j^{\prime} \in\mathcal{C},$ let $\varphi(i^{\prime}, j^{\prime})$ denote the payoff obtained by chromosome $i^{\prime}$ in an IPD game against chromosome $j^{\prime}.$ Suppose that the population in the current iteration of our GA equals v(i, j, λ), where i and j satisfy δ(i, j) = 1 and where $\lambda \in \{1, \ldots, n-1\}.$ That is, the population in the current iteration of our GA consists of λ times chromosome i and n − λ times chromosome j. The fitness f _i of chromosome i is then given by

$$ f_i=\frac{\lambda\varphi(i, i) + (n - \lambda)\varphi(i, j)}{n}. $$

(35)

Similarly, the fitness f _j of chromosome j is given by

$$ f_j=\frac{\lambda\varphi(j, i) + (n - \lambda)\varphi(j, j)}{n}.$$

(36)

Furthermore, the mean μ _f and the standard deviation σ_f of the fitness of the chromosomes in the population are equal to, respectively,

$$ \mu_{\rm f}=\frac{{\lambda}f_i + (n - \lambda)f_j}{n}$$

(37)

and

$$ \sigma_{\rm f} = \sqrt{\frac{\lambda{(f_i - \mu_{\rm f}) ^ 2} + (n - \lambda){(f_j - \mu_{\rm f}) ^ 2}}{n}}.$$

(38)

The normalized fitness $\tilde{f}_i$ of chromosome i is obtained by substituting f _i, μ_f, and σ_f into (34). The normalized fitness $\tilde{f}_j$ of chromosome j is obtained in a similar way. Let $\tilde{\pi}_i$ and $\tilde{\pi}_j$ denote the probabilities that the roulette wheel selection operator selects, respectively, chromosome i and chromosome j. Obviously, $\tilde{\pi}_i$ and $\tilde{\pi}_j$ equal

$$ \tilde{\pi}_i=\frac{\lambda\tilde{f}_i}{\lambda\tilde{f}_i + (n - \lambda)\tilde{f}_j}, \quad \tilde{\pi}_j =\frac{(n - \lambda)\tilde{f}_j}{\lambda\tilde{f}_i +(n-\lambda)\tilde{f}_j}. $$

(39)

$\pi(i, j, \lambda, \lambda^{\prime}),$ where $\lambda^{\prime} \in\{0, \ldots, n\},$ equals the probability that the roulette wheel selection operator turns population v(i, j, λ) into population $v(i, j, \lambda^{\prime})$ in a single iteration of our GA. Taking into account that the roulette wheel selection operator selects chromosomes independently of each other, it can be seen that $\pi(i, j, \lambda, \lambda^{\prime})$ equals the probability mass function of a binomial distribution and is given by

$$ \pi(i, j, \lambda, \lambda^{\prime}) = \left( {\begin{array}{*{20}c} n \\ {\lambda ^{'} } \\ \end{array} } \right) {\tilde{\pi}_i{{} ^ {\lambda^{\prime}}}}{\tilde{\pi}_j{{}^{n - \lambda^{\prime}}}} $$

(40)

where the binomial coefficient $ \left( {\begin{array}{*{20}c} n \\ {\lambda ^{'} } \\ \end{array} } \right) $ is defined as

$$ \left( {\begin{array}{*{20}c} n \\ {\lambda ^{'} } \\ \end{array} } \right) = \frac{n!}{\lambda^{\prime}!(n - \lambda^{\prime})!}. $$

(41)

The algorithm presented in Sect. 3 also assumes that all non-uniform populations are almost uniform and that each chromosome in $\mathcal{C}$ is connected to all other chromosomes in $\mathcal{C}.$ Because of the use of roulette wheel selection, the assumption that all non-uniform populations are almost uniform is satisfied. The assumption that each chromosome in $\mathcal{C}$ is connected to all other chromosomes in $\mathcal{C}$ is satisfied if and only if matrix $\mathbf{D}$ calculated in lines 3–3 of the algorithm in Fig. 2 is irreducible. ($\mathbf{D} =\left[\begin{array}{l}d(i, j)\end{array}\right]$ is said to be irreducible if and only if there does not exist a non-empty set of chromosomes $\widetilde{\mathcal{C}} \subset \mathcal{C}$ such that d(i, j) = 0 for all $i \in \widetilde{\mathcal{C}}$ and all $j \in\mathcal{C} \setminus \widetilde{\mathcal{C}}.$) For the particular values that we use for the parameters S, P, R, T, and τ (see the next subsection), $\mathbf{D}$ turns out to be irreducible. Hence, the assumption that each chromosome in $\mathcal{C}$ is connected to all other chromosomes in $\mathcal{C}$ is satisfied.

4.3 Analysis of the long-run behavior of the genetic algorithm

In this subsection, we analyze the long-run behavior of our GA for the prisoner’s dilemma payoffs S = 0, P = 1, R = 3, and T = 5. These are the same payoffs as were used by Axelrod (1987) (see also Axelrod 1984) and by many others. The analysis is performed using the algorithm presented in Sect. 3. The use of this algorithm to analyze the long-run behavior of our GA was discussed in the previous subsection. We compare the results obtained using the algorithm with results obtained using computer simulations (i.e., results obtained simply by running the GA).^{Footnote 3} The parameter settings that we use are summarized in Table 3.

Table 3 Genetic algorithm parameter settings

Full size table

The long-run limit distribution for a memory length of τ = 1 period is shown in Fig. 3 (in dark gray). The distribution was calculated using the algorithm from Sect. 3. As mentioned earlier, τ = 1 results in a chromosome length of m = 6 bits. This implies that there are μ = 2^m = 64 different chromosomes and, as a consequence, that there are 64 different uniform populations. The long-run limit distribution is a probability distribution over these populations. As can be seen in Fig. 3, the long-run limit distribution spreads most of its mass over approximately 15 populations. It puts almost no mass on the remaining populations. Since all chromosomes in a uniform population are identical and represent the same strategy, the long-run limit distribution can be used to determine the long-run limit probability that a particular strategy is played. However, when doing so, it should be noted that there is some redundancy in the binary encoding of strategies that we use (as was already pointed out by Axelrod (1987). Due to this redundancy, it is possible that different chromosomes represent the same strategy. Some strategies can be encoded in two or three different ways, and the strategies always cooperate and always defect can even be encoded in twelve different ways. Taking into account the redundancy in the encoding, we have calculated the long-run limit probabilities of all possible strategies. The six strategies with the highest long-run limit probability are reported in Table 4. Together, these strategies have a long-run limit probability of almost 0.95. The remaining strategies all have very low long-run limit probabilities. It is sometimes claimed (e.g., Axelrod 1984, 1987) that a very effective strategy for playing IPD games is the tit-for-tat strategy, which is the strategy of cooperating in the first period and repeating the opponent’s previous action thereafter. The results reported in Table 4 do not really support this claim. As can be seen in the table, the always defect strategy has by far the highest long-run limit probability. In the long run, this strategy is played about 43% of the time. The tit-for-tat strategy has a long-run limit probability of no more than 0.14. This is even slightly less than the long-run limit probability of another cooperative strategy, namely the strategy that keeps cooperating until the opponent defects and then keeps defecting forever.

Table 4 The six strategies with the highest long-run limit probability (reported in the first column)

Full size table

In order to check the correctness of the algorithm presented in Sect. 3, we have also used computer simulations to analyze the long-run behavior of our GA. In other words, we have also analyzed the long-run behavior of our GA simply by running the GA. Like above, we first focus on the behavior of the GA for a memory length of τ = 1 period. We performed 500 runs of the GA. The crossover rate was set to γ = 1.0, and the mutation rate was set to $\varepsilon = 10 ^{-5}.$ Because of the very small value of $\varepsilon,$ the simulation results should be similar to the results obtained using the algorithm from Sect. 3. (Recall that the latter results hold in the limit as $\varepsilon$ approaches zero.) Each run of the GA lasted $2 \times 10 ^ 5$ iterations. This seemed sufficient for the GA to reach its steady state. After the last iteration of a GA run, we almost always observed that the population was uniform. Based on the 500 GA runs that we had performed, we determined for each uniform population the probability of observing that population at the end of a GA run. In this way, we obtained a probability distribution over the uniform populations. This distribution is shown in Fig. 3 (in light gray). Figure 3 allows us to compare the distribution with the long-run limit distribution calculated using the algorithm from Sect. 3. It can be seen that the two distributions are very similar. This confirms the correctness of the algorithm presented in Sect. 3.

In order to examine to what extent our GA results in the evolution of cooperative strategies, we now focus on the long-run mean fitness, that is, the mean fitness of a chromosome after a large number of iterations of the GA. For various values of the memory length τ, the crossover rate γ, and the mutation rate $\varepsilon,$ the long-run mean fitness obtained using computer simulations is reported in Table 5. The associated 95% confidence interval is also provided in the table. The simulation results for τ = 1 are based on 500 runs of the GA, and the simulation results for τ = 2 and τ = 3 are based on 200 runs. (Simulation runs with τ = 2 and τ = 3 took more computing time than simulation runs with τ = 1. We therefore performed a smaller number of runs with τ = 2 and τ = 3 than with τ = 1.) Each run lasted $2 \times 10 ^ 5$ iterations. The long-run mean fitness was calculated by taking the average over all GA runs of the mean fitness of a chromosome at the end of a run. In the limit as $\varepsilon$ approaches zero, the long-run mean fitness can be calculated exactly and does not depend on γ. The calculation of the long-run mean fitness is based on the long-run limit distribution of the GA, which can be obtained using the algorithm presented in Sect. 3. For τ = 1 and τ = 2, the long-run mean fitness in the limit as $\varepsilon$ approaches zero is reported in Table 5. For τ = 3, we cannot calculate the long-run limit distribution of the GA and we therefore do not know the long-run mean fitness in the limit as $\varepsilon$ approaches zero. Calculating the long-run limit distribution of the GA is impossible for τ = 3 because the chromosome length equals m = 70 bits and because for such a chromosome length storing the long-run limit distribution requires a prohibitive amount of computer memory.

Table 5 Long-run mean fitness and associated 95% confidence interval for various values of the memory length τ, the crossover rate γ, and the mutation rate $\varepsilon$

Full size table

Based on the results in Table 5, a number of observations can be made. First, for τ = 1 and τ = 2, the results obtained for $\varepsilon = 10 ^ {-4}$ and $\varepsilon = 10 ^ {-5}$ turn out to be very similar to the results obtained for $\varepsilon \rightarrow0.$ This again confirms the correctness of the algorithm presented in Sect. 3. Second, for τ = 1, we find that the results are quite sensitive to the value of $\varepsilon.$ Studies on GA modeling sometimes report that the long-run behavior of a GA is relatively insensitive to the value of $\varepsilon.$ Our results demonstrate that this need not always be the case. Third, for small values of $\varepsilon,$ it can be seen that increasing τ leads to a higher long-run mean fitness and, hence, to more cooperation. The evolution of cooperative strategies in IPD games therefore seems more likely when players have longer memory lengths. Finally, it can be observed that the value of γ has no significant effect on our results. This is in line with the mathematical analysis provided in Sect. 2. The mathematical analysis implies that for $\varepsilon \rightarrow 0$ the long-run mean fitness is independent of γ. The results in Table 5 indicate that this is the case not only for $\varepsilon \rightarrow 0$ but more generally.

5 Conclusions

In this paper, we have presented a mathematical analysis of the long-run behavior of GAs that are used for modeling social phenomena. Under the assumption of a positive but infinitely small mutation rate, the analysis provides a full characterization of the long-run behavior of GAs with a binary encoding. Based on the analysis, we have derived an algorithm for calculating the long-run behavior of GAs. In an economic context, the algorithm can for example be used to determine whether convergence to an equilibrium will take place and, if so, what kind of equilibrium will emerge. Compared with computer simulations, the main advantage of the algorithm that we have derived is that it calculates the long-run behavior of GAs exactly. Computer simulations only estimate the long-run behavior of GAs.

To demonstrate the usefulness of our mathematical analysis, we have replicated a well-known study by Axelrod in which a GA is used to model the evolution of strategies in IPDs (Axelrod 1987). We have used both our exact algorithm and computer simulations to replicate Axelrod’s study. By comparing the results of the two approaches, we have confirmed the correctness of our algorithm. We have also obtained some interesting new insights. For example, when players have a memory length of one period, the tit for tat strategy turns out to be less important than is sometimes claimed (e.g., Axelrod 1984, 1987). In the long run, the strategy is played only 14% of the time. Another finding is that the long-run behavior of a GA can be quite sensitive to the value of the mutation rate. We regard this as a serious problem, since the value of the mutation rate is typically chosen in a fairly arbitrary way without any empirical justification (see also Dawid 1996).

The mathematical analysis that we have presented also reveals that if the mutation rate is infinitely small the crossover rate has no effect on the long-run behavior of a GA. This remarkable result is perfectly in line with the simulation results that we have reported in Sect. 4. For various values of the mutation rate, the simulation results show no significant effect of the crossover rate on the long-run behavior of a GA. Hence, when GAs are used for modeling social phenomena, the crossover rate seems to be a rather unimportant parameter, at least when the focus is on the long run (for the short run, see Thibert-Plante and Charbonneau 2007). It seems likely that in many cases leaving out the crossover operator altogether has no significant effect on the long-run behavior of a GA. Interestingly, leaving out the crossover operator brings GAs quite close to well-known models in evolutionary game theory, such as those studied in Kandori et al. (1993) and Vega-Redondo (1997).

Finally, we note that an analysis such as the one presented in this paper can be performed not only for GAs with a binary encoding but also for other types of evolutionary algorithms. From a modeling point of view, a binary encoding in many cases has the disadvantage that it lacks a clear interpretation (e.g., Dawid 1996). The use of a binary encoding can therefore be difficult to justify and may even lead to artifacts (as shown in Waltman and Van Eck 2009; Waltman et al. 2011). Probably for these reasons, some researchers use evolutionary algorithms without a binary encoding (e.g., Haruvy et al. 2006; Lux and Schornstein 2005). The analysis presented in this paper then does not directly apply. However, when the action space of agents is assumed discrete, the long-run behavior of evolutionary algorithms without a binary encoding can still be analyzed in a similar way as we have done in this paper, namely by relying on mathematical results provided by Freidlin and Wentzell (1998). This indicates that our approach is quite general and can be adapted relatively easily to other types of evolutionary algorithms.

Notes

In this paper, we use a standard binary encoding. Hence, if m = 2, chromosomes 0, 1, 2, and 3 have binary encodings 00, 01, 10, and 11, respectively. We emphasize that the use of a standard binary encoding is by no means essential for our analysis. Other binary encoding schemes, such as Gray encoding, can be used as well. This does not require any significant changes in our analysis.
The non-zero elements of $\mathbf{D}$ can be stored efficiently by using two arrays: a one-dimensional array of size μ for the diagonal elements of $\mathbf{D}$ and a two-dimensional array of size m × μ for the non-zero off-diagonal elements of $\mathbf{D}.$ The element in the κth row and the ith column of the latter array is used to store d(j, i), where j has the same binary encoding as i except that the κth bit is inverted.
The software used to obtain the results reported in this subsection is available online at http://www.ludowaltman.nl/ga_analysis/. The software runs in MATLAB and has been written partly in the MATLAB programming language and partly in the C programming language.
The terminology used in Kemeny and Snell (1960) differs from the terminology used in many other texts on Markov chains. In particular, an ergodic Markov chain in Kemeny and Snell (1960) corresponds to an irreducible Markov chain in this paper.

References

Alkemade F, La Poutré JA, Amman HM (2006) Robust evolutionary algorithm design for socio-economic simulation. Comput Econ 28(4):355–370
Article Google Scholar
Alkemade F, La Poutré JA, Amman HM (2007) On social learning and robust evolutionary algorithm design in the Cournot oligopoly game. Comput Intell 23(2):162–175
Article Google Scholar
Alkemade F, La Poutré JA, Amman HM (2009) Robust evolutionary algorithm design for socio-economic simulation: a correction. Comput Econ 33(1):99–101
Article Google Scholar
Andreoni J, Miller JH (1995) Auctions with artificial adaptive agents. Game Econ Behav 10(1):39–64
Article MATH Google Scholar
Arifovic J (1994) Genetic algorithm learning and the cobweb model. J Econ Dyn Control 18(1):3–28
Article MATH Google Scholar
Arifovic J (1996) The behavior of the exchange rate in the genetic algorithm and experimental economies. J Political Econ 104(3):510–541
Article Google Scholar
Ashlock D, Kim EY, Leahy N (2006) Understanding representational sensitivity in the iterated prisoner’s dilemma with fingerprints. IEEE Trans Syst Man Cybern C 36(4):464–475
Article Google Scholar
Ashlock D, Smucker MD, Stanley EA, Tesfatsion L (1996) Preferential partner selection in an evolutionary study of prisoner’s dilemma. Biosystems 37(1–2):99–125
Article Google Scholar
Axelrod R (1984) The evolution of cooperation. Basic Books, London
Google Scholar
Axelrod R (1987) The evolution of strategies in the iterated prisoner’s dilemma. In: Davis L (ed) Genetic algorithms and simulated annealing. Morgan Kaufmann, Menlo Park, pp 32–41
Google Scholar
Axelrod R (1997) The complexity of cooperation: agent-based models of competition and collaboration. Princeton University Press, NJ
Google Scholar
Barrett R, Berry M, Chan TF, Demmel J, Donato J, Dongarra J, Eijkhout V, Pozo R, Romine C, Van der Vorst H (1994) Templates for the solution of linear systems: building blocks for iterative methods, 2nd edn. SIAM, Philadelphia
Book Google Scholar
Brenner T (2006) Agent learning representation: advice on modelling economic learning. In: Tesfatsion L, Judd KL (eds) Handbook of computational economics, vol 2. Elsevier, Amsterdam, pp 895–947
Chapter Google Scholar
Chong SY, Yao X (2005) Behavioral diversity, choices and noise in the iterated prisoner’s dilemma. IEEE Trans Evol Comput 9(6):540–551
Article Google Scholar
Crowley PH, Provencher L, Sloane S, Dugatkin LA, Spohn B, Rogers L, Alfieri M (1996) Evolving cooperation: the role of individual recognition. Biosystems 37(1–2):49–66
Article Google Scholar
Dawid H (1996) Adaptive learning by genetic algorithms: analytical results and applications to economic models. No. 441 in lecture notes in economics and mathematical systems. Springer, Berlin
Google Scholar
Fogel DB (1993) Evolving behaviors in the iterated prisoner’s dilemma. Evol Comput 1(1):77–97
Article MathSciNet Google Scholar
Foster D, Young P (1990) Stochastic evolutionary game dynamics. Theor Popul Biol 38(2):219–232
Article MathSciNet MATH Google Scholar
Freidlin MI, Wentzell AD (1998) Random perturbations of dynamical systems, 2nd edn. Springer, Berlin
Book MATH Google Scholar
Fudenberg D, Levine DK (1998) The theory of learning in games. MIT Press, Cambridge
MATH Google Scholar
Gen M, Cheng R (2000) Genetic algorithms and engineering optimization. Wiley, New York
Google Scholar
Georges C (2006) Learning with misspecification in an artificial currency market. J Econ Behav Organ 60(1):70–84
Article Google Scholar
Gintis H (2000) Game theory evolving: a problem-centered introduction to modeling strategic interaction. Princeton University Press, NJ
MATH Google Scholar
Goldberg DE (1989) Genetic algorithms in search, optimization, and machine learning. Addison-Wesley, Reading
MATH Google Scholar
Haruvy E, Roth AE, Utku Ünver M (2006) The dynamics of law clerk matching: an experimental and computational investigation of proposals for reform of the market. J Econ Dyn Control 30(3):457–486
Article MATH Google Scholar
Holland JH, Miller JH (1991) Artificial adaptive agents in economic theory. Am Econ Rev (Pap Proc) 81(2):365–370
Google Scholar
Ishibuchi H, Namikawa N (2005) Evolution of iterated prisoner’s dilemma game strategies in structured demes under random pairing in game playing. IEEE Trans Evol Comput 9(6):552–561
Article Google Scholar
Kandori M, Mailath GJ, Rob R (1993) Learning, mutation, and long run equilibria in games. Econometrica 61:29–56
Article MathSciNet MATH Google Scholar
Kemeny JG, Snell JL (1960) Finite Markov chains. D. Van Nostrand Company, New York
MATH Google Scholar
Lux T, Schornstein S (2005) Genetic learning as an explanation of stylized facts of foreign exchange markets. J Math Econ 41(1–2):169–196
Article MathSciNet MATH Google Scholar
Marks RE (1992) Breeding hybrid strategies: optimal behaviour for oligopolists. J Evol Econ 2(1):17–38
Article MathSciNet Google Scholar
Maynard Smith J (1982) Evolution and the theory of games. Cambridge University Press, London
MATH Google Scholar
Michalewicz Z (1996) Genetic algorithms + data structures = evolution programs, 3rd edn. Springer, Berlin
MATH Google Scholar
Miller JH (1986) A genetic model of adaptive economic behavior. Working paper, University of Michigan
Miller JH (1996) The coevolution of automata in the repeated prisoner’s dilemma. J Econ Behav Organ 29(1):87–112
Article Google Scholar
Mitchell M (1996) An introduction to genetic algorithms. MIT Press, Cambridge
Google Scholar
Mittal S, Deb K (2006) Optimal strategies of the iterated prisoner’s dilemma problem for multiple conflicting objectives. In: Proceedings of the 2006 IEEE symposium on computational intelligence and games, pp 197–204
Mühlenbein H (1991) Darwin’s continent cycle theory and its simulation by the prisoner’s dilemma. Complex Syst 5(5):459–478
MATH Google Scholar
Nix AE, Vose MD (1992) Modeling genetic algorithms with Markov chains. Ann Math Artif Intell 5(1):79–88
Article MathSciNet MATH Google Scholar
Rudolph G (1994) Convergence analysis of canonical genetic algorithms. IEEE Trans Neural Netw 5(1):96–101
Article Google Scholar
Rudolph G (1998) Finite Markov chain results in evolutionary computation: a tour d’horizon. Fundamenta Informaticae 35(1–4):67–89
MathSciNet MATH Google Scholar
Stewart WJ (1994) Introduction to the numerical solution of Markov chains. Princeton University Press, NJ
MATH Google Scholar
Tesfatsion L (2006) Agent-based computational economics: a constructive approach to economic theory. In: Tesfatsion L, Judd K (eds) Handbook of computational economics, vol 2. Elsevier, Amsterdam, pp 831–880
Chapter Google Scholar
Thibert-Plante X, Charbonneau P (2007) Crossover and evolutionary stability in the prisoner’s dilemma. Evol Comput 15(3):321–344
Article Google Scholar
Tijms HC (1994) Stochastic models: an algorithmic approach. Wiley, New York
MATH Google Scholar
Tijms HC (2003) A first course in stochastic models. Wiley, New York
Book MATH Google Scholar
Van Bragt DDB, Van Kemenade C, La Poutré JA (2001) The influence of evolutionary selection schemes on the iterated prisoner’s dilemma. Comput Econ 17(2–3):253–263
Article MATH Google Scholar
Vega-Redondo F (1996) Evolution, games, and economic behaviour. Oxford University Press, NY
Book Google Scholar
Vega-Redondo F (1997) The evolution of Walrasian behavior. Econometrica 65:375–384
Article MATH Google Scholar
Vose MD (1999) The simple genetic algorithm: foundations and theory. MIT Press, Cambridge
MATH Google Scholar
Vriend NJ (2000) An illustration of the essential difference between individual and social learning, and its consequences for computational analyses. J Econ Dyn Control 24(1):1–19
Article MathSciNet MATH Google Scholar
Waltman L, Van Eck NJ (2009) Robust evolutionary algorithm design for socio-economic simulation: some comments. Comput Econ 33(1):103–105
Article Google Scholar
Waltman L, van Eck NJ, Dekker R, Kaymak U (2011) Economic modeling using evolutionary algorithms: the effect of a binary encoding of strategies. J Evol Econ 21(5):737–756
Article Google Scholar
Weibull JW (1995) Evolutionary game theory. MIT Press, Cambridge
MATH Google Scholar
Yao X, Darwen P (1994) An experimental study of N-person iterated prisoner’s dilemma games. Informatica 18(4):435–450
MATH Google Scholar
Young HP (1993) The evolution of conventions. Econometrica 61:57–84
Article MathSciNet MATH Google Scholar

Download references

Acknowledgments

We would like to thank Uzay Kaymak and Rommert Dekker for their comments on earlier drafts of this paper.

Open Access

This article is distributed under the terms of the Creative Commons Attribution License which permits any use, distribution, and reproduction in any medium, provided the original author(s) and the source are credited.

Author information

Authors and Affiliations

Econometric Institute, Erasmus School of Economics, Erasmus University Rotterdam, P.O. Box 1738, 3000 DR, Rotterdam, The Netherlands
Ludo Waltman & Nees Jan van Eck

Authors

Ludo Waltman
View author publications
You can also search for this author in PubMed Google Scholar
Nees Jan van Eck
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Ludo Waltman.

Appendix

In this appendix, we prove the mathematical results presented in Sect. 2. Before proving the results, we first provide some definitions and lemmas on Markov chains.

Definition 6

A collection of random variables {X _t}, where the index t takes values in $\{0, 1, \ldots\}$ and where $X_0, X_1,\ldots$ take values in a finite set $\mathcal{X},$ is called a finite discrete-time Markov chain if

$$ \Pr(X_{t + 1} = x_{t + 1} \vert X_t = x_t)= \\ \Pr(X_{t + 1} = x_{t + 1} \vert X_t = x_t, \ldots, X_0 = x_0) $$

(42)

for all t and all $x_0, \ldots, x_{t + 1} \in \mathcal{X}.$ The elements of $\mathcal{X}$ are called the states of the Markov chain. $\mathcal{X}$ is called the state space of the Markov chain.

Definition 7

A finite discrete-time Markov chain {X _t} is said to be time-homogeneous if

$$ \Pr(X_{t + 1} = x_{t + 1} \vert X_t = x_t) = p(x_t, x_{t + 1}) $$

(43)

for all t, all $x_t, x_{t + 1} \in \mathcal{X},$ and some function $p: \mathcal{X} ^ 2 \rightarrow [0, 1]$ that does not depend on t. For $x, x^{\prime} \in \mathcal{X},$ the probability $p(x,x^{\prime})$ is called the transition probability from state x to state $x^{\prime}.$ The matrix

$$ {\mathbf{P}}=\left[\begin{array}{l} p(x, x^{\prime})\end{array}\right]_{x, x^{\prime} \in {\mathcal{X}}} $$

(44)

is called the transition matrix of the Markov chain.

In the remainder of this appendix, the term Markov chain always refers to a finite discrete-time Markov chain that is time-homogeneous.

Definition 8

Consider a Markov chain {X _t}. A row vector ${\bar{\mathbf{p}}} = [{\bar{p}}(x)]_{x \in \mathcal{X}}$ that satisfies

$$ {\bar{\mathbf{p}}}{\mathbf{P}}={\bar{\mathbf{p}}}$$

(45)

$$ {{\bar{\mathbf{p}}}}{\mathbf{1}}=1$$

(46)

is called a stationary distribution of the Markov chain. For $x \in \mathcal{X},$ the probability $\bar{p}(x)$ is called the stationary probability of state x.

Definition 9

A Markov chain {X _t} is said to be irreducible if for each $x, x^{\prime} \in \mathcal{X}$ there exists a positive integer N such that $\Pr(X_{t + N} = x^{\prime}\vert X_t = x) > 0.$

Lemma 2

If a Markov chain {X _t} is irreducible, it has a unique stationary distribution ${\bar{\mathbf{p}}}.$

Proof

See, for example, Tijms (1994, Th. 2.3.3).

Definition 10

An irreducible Markov chain {X _t} is said to be aperiodic if for each $x \in \mathcal{X}$ there exists a positive integer N such that $\Pr(X_{t + M} = x \vert X_t = x) >0$ for all integers M ≥ N.

Lemma 3

If a Markov chain {X _t} is irreducible and aperiodic, then

$$ \lim_{t \rightarrow \infty}\Pr(X_t = x \vert X_0 = x_0) = \bar{p}(x) $$

(47)

for all $x, x_0 \in \mathcal{X}.$

Proof

See, for example, Tijms (1994, Th. 2.3.1 and Lemma 2.3.2).

Lemma 4

Let a Markov chain {X _t} be irreducible. Let $\mathcal{Y} \subset \mathcal{X}$ and $\mathcal{Y} \neq\varnothing.$ Let

$$ {\mathbf{T}}=\left[\begin{array}{l} p(x, x^{\prime})\end{array}\right]_{x, x^{\prime} \in {\mathcal{Y}}},\quad {\mathbf{U}} = \left[\begin{array}{l}p(x, x^{\prime})\end{array}\right]_{x \in {\mathcal{Y}}, x^{\prime} \in {\mathcal{X}} \setminus {\mathcal{Y}}} $$

(48)

$$ {\mathbf{V}}=\left[\begin{array}{l}p(x, x^{\prime})\end{array}\right]_{x \in {\mathcal{X}} \setminus {\mathcal{Y}}, x^{\prime} \in {\mathcal{Y}}},\quad {\mathbf{W}} =\left[\begin{array}{l}p(x, x^{\prime})\end{array}\right]_{x, x^{\prime} \in {\mathcal{X}} \setminus {\mathcal{Y}}} $$

(49)

and let

$$ {\mathbf{P}}_Y = {\mathbf{T}} + {\mathbf{U}}{({\mathbf{I}} - {\mathbf{W}}) ^ {-1}}{\mathbf{V}}. $$

(50)

Let {Y _t} denote a Markov chain with state space $\mathcal{Y}$ and transition matrix $\mathbf{P}_Y.$ Markov chain {Y _t} is then irreducible and has stationary probabilities $\bar{p}_Y(y)$ that are given by

$$ \bar{p}_Y(y) = \frac{\bar{p}(y)}{\sum_{y^{\prime} \in {\mathcal{Y}}}\bar{p}(y^{\prime})} $$

(51)

where $y \in \mathcal{Y}.$

Proof

See Kemeny and Snell (1960, Th. 6.1.1).^{Footnote 4}

Definition 11

Consider a set $\mathcal{X}.$ For $x, x^{\prime} \in\mathcal{X},$ the ordered pair $(x, x^{\prime})$ is called an arrow from x to $x^{\prime}.$ For $x_1, \ldots, x_N \in\mathcal{X},$ the sequence of arrows $((x_1, x_2), (x_2, x_3),\ldots, (x_{N - 2}, x_{N - 1}), (x_{N - 1}, x_N))$ is called a path from x ₁ to x _N. For $x \in \mathcal{X},$ a set of arrows E is called an x-tree on $\mathcal{X}$ if it satisfies the following conditions:

1.
E contains no arrow that starts at x.
2.
For each $x^{\prime} \in \mathcal{X} \setminus \{x\}, E$ contains exactly one arrow that starts at $x^{\prime}.$
3.
For each $x^{\prime} \in \mathcal{X} \setminus \{x\}, E$ contains a path from $x^{\prime}$ to x (or, formulated more accurately, for each $x^{\prime} \in \mathcal{X} \setminus \{x\},$ there exists a path from $x^{\prime}$ to x such that E contains all arrows of the path).

Lemma 5

Let a Markov chain {X _t} be irreducible. For $x \in \mathcal{X},$ let $\mathcal{E}(x)$ denote the set of all x-trees on $\mathcal{X}.$ The stationary probabilities $\bar{p}(x)$ of the Markov chain are then given by

$$ \bar{p}(x) = \frac{\tilde{p}(x)}{\sum_{x^{\prime} \in {\mathcal{X}}}\tilde{p}(x^{\prime})} $$

(52)

where $x \in \mathcal{X}$ and

$$ \tilde{p}(x) = \sum_{E \in {\mathcal{E}}(x)}\prod_{(x, x^{\prime}) \in E}p(x, x^{\prime}). $$

(53)

Proof

A proof is provided by Freidlin and Wentzell (1998, Ch. 6, Lemma 3.1) (see also Dawid 1996, Th. 4.2.1).

Using the above definitions and lemmas, we now prove the mathematical results presented in Sect. 2.

Proof of Lemma 1

Notice that

$$ \Pr(W_{t + 1} = w_{t + 1} \vert W_t = w_t) = \\ \Pr(W_{t + 1} = w_{t + 1} \vert W_t = w_t, \ldots, W_0 = w_0) $$

(54)

for all $t \in \{0, 1, \ldots\}$ and all $w_0, \ldots, w_{t + 1} \in\mathcal{W}.$ That is, the population in iteration t + 1 of a GA depends only on the population in iteration t. Given the population in iteration t, the population in iteration t + 1 is independent of the populations in iterations $0, \ldots, t - 1.$ Notice further that

$$ \Pr(W_{t + 1} = w_{t + 1} \vert W_t = w_t) = q(w_t, w_{t + 1}) $$

(55)

for all $t \in \{0, 1, \ldots\},$ all $w_t, w_{t + 1} \in\mathcal{W},$ and some function $q: \mathcal{W} ^ 2 \rightarrow [0,1]$ that does not depend on t. That is, the probability of going from one population to some other population remains constant over time. (Recall that the crossover rate γ and the mutation rate $\varepsilon$ are assumed to remain constant over time.) It now follows from Definitions 6 and 7 that {W _t}, where the index t takes values in $\{0, 1, \ldots\},$ is a Markov chain with state space $\mathcal{W}$ and transition probabilities $q(w, w^{\prime}).$ Since the mutation rate $\varepsilon$ is assumed to be positive, any population can be turned into any other population in a single iteration of a GA. Hence, $q(w, w^{\prime}) > 0$ for all $w,w^{\prime} \in \mathcal{W}.$ Consequently, it follows from Definitions 9 and 10 that Markov chain {W _t} is irreducible and aperiodic. Lemma 3 then implies that for each population $w \in\mathcal{W}$ there exists a stationary probability $\bar{q}(w)$ such that

$$ \lim_{t \rightarrow \infty}\Pr(W_t = w \vert W_0 = w_0) = \bar{q}(w) $$

(56)

for all $w_0 \in \mathcal{W}.$ We refer to a stationary probability $\bar{q}(w)$ as the long-run probability of population w. Finally, (9) is obtained from (56) by taking into account the time-homogeneity of Markov chain {W _t}. This completes the proof of Lemma 1.

Proof of Theorem 1

As shown in the proof of Lemma 1, {W _t}, where the index t takes values in $\{0, 1, \ldots\},$ is an irreducible and aperiodic Markov chain with state space $\mathcal{W}.$ Markov chain {W _t} has stationary probabilities $\bar{q}(w),$ to which we refer as long-run probabilities. We now introduce some additional mathematical notation. Like in the proof of Lemma 1, the function $q: \mathcal{W} ^ 2 \rightarrow [0, 1]$ denotes the transition probabilities of Markov chain {W _t}. For $w, w^{\prime} \in \mathcal{W}, q(w, w^{\prime})$ is a polynomial in the mutation rate $\varepsilon$ and can therefore be written as

$$ q(w, w^{\prime}) = \sum_{l = 0}^\infty\alpha(w, w^{\prime}, l){\varepsilon ^ l} $$

(57)

where $\alpha(w, w^{\prime}, 0), \alpha(w, w^{\prime}, 1), \ldots$ denote the coefficients of the polynomial. $c(w, w^{\prime})$ is defined as

$$ c(w, w^{\prime}) = \min\{l \vert \alpha(w, w^{\prime}, l) \neq 0\}. $$

(58)

That is, $c(w, w^{\prime})$ is defined as the rate at which $q(w,w^{\prime})$ approaches zero as $\varepsilon$ approaches zero. It follows from this definition that $c(w, w^{\prime})$ equals the minimum number of mutations required to go from population w to population $w^{\prime}$ in a single iteration of a GA. $\alpha(w,w^{\prime})$ is defined as

$$ \alpha(w, w^{\prime}) = \alpha(w, w^{\prime}, c(w, w^{\prime})). $$

(59)

For $w \in \mathcal{W}, \tilde{q}(w)$ is defined as

$$ \tilde{q}(w) = \sum_{E \in {\mathcal{E}}(w)}\prod_{(w, w^{\prime}) \in E}q(w, w^{\prime}) $$

(60)

where $\mathcal{E}(w)$ denotes the set of all w-trees on $\mathcal{W}.$ Since the transition probabilities $q(w, w^{\prime})$ are polynomials in $\varepsilon, \tilde{q}(w)$ is a polynomial in $\varepsilon$ too. $\tilde{q}(w)$ can therefore be written as

$$ \tilde{q}(w) = \sum_{l = 0}^\infty\tilde{\alpha}(w, l){\varepsilon ^l} $$

(61)

where $\tilde{\alpha}(w, 0), \tilde{\alpha}(w, 1), \ldots$ denote the coefficients of the polynomial. $\tilde{c}(w)$ is defined as

$$ \tilde{c}(w) = \min\{l \vert \tilde{\alpha}(w, l) \neq 0\}. $$

(62)

That is, $\tilde{c}(w)$ is defined as the rate at which $\tilde{q}(w)$ approaches zero as $\varepsilon$ approaches zero. $\tilde{\alpha}(w)$ is defined as

$$ \tilde{\alpha}(w) = \tilde{\alpha}(w, \tilde{c}(w)). $$

(63)

Using the mathematical notation introduced above, we first prove part (1) of Theorem 1. It follows from (57), (58), and (60)–(62) that $\tilde{c}(w)$ can be written as

$$ \tilde{c}(w) = \min_{E \in {\mathcal{E}}(w)}\sum_{(w, w^{\prime}) \in E}c(w, w^{\prime}). $$

(64)

At least one mutation is required to go from a uniform population $u\in \mathcal{U}$ to any other population $w \in \mathcal{W}\setminus \{u\}.$ Hence, c(u, w) ≥ 1 for all $u \in\mathcal{U}$ and all $w \in \mathcal{W}$ such that u ≠ w. Consequently, it follows from (64) that $\tilde{c}(u) \geq \mu - 1$ for all $u \in \mathcal{U}$ and that $\tilde{c}(w) \geq \mu$ for all $w \in \mathcal{W} \setminus \mathcal{U}.$ We now show that for each chromosome i it is possible to construct a u(i)-tree E on $\mathcal{W}$ that satisfies

$$ \sum_{(w, w^{\prime}) \in E}c(w, w^{\prime}) = \mu - 1. $$

(65)

Consider an arbitrary chromosome i. Let the function $\rho:\mathcal{C} \rightarrow \mathcal{C}$ satisfy the following conditions:

1.
For each j ≠ i, chromosome j is directly connected to chromosome ρ(j).
2.
For each j ≠ i, ρ^N(j) = i for some positive integer N.

In condition (2), ρ^N(j) is defined as

$$ \rho^N(j) =\left\{\begin{array}{ll}\rho(j), & \hbox {if $N = 1$} \\ \rho\left(\rho^{N - 1}(j)\right), & \hbox {otherwise.}\end{array}\right. $$

(66)

Because Theorem 1 assumes that each chromosome is connected to all other chromosomes, a function ρ that satisfies the above two conditions is guaranteed to exist. In order to construct a u(i)-tree E on $\mathcal{W}$ that satisfies (65), we start with an empty set of arrows E. For each j ≠ i, we then add an arrow to E that starts at u(j) and ends at v(j, ρ(j), n − 1). It follows from condition (1) that one mutation is required to go from u(j) to v(j, ρ(j), n − 1) in a single iteration of a GA. Hence, c(u(j), v(j, ρ(j), n − 1)) = 1. Next, for each j ≠ i, we add a path to E that starts at v(j, ρ(j), n − 1) and ends at u(ρ(j)). Each path that we add to E must contain no cycles, that is, it must contain no two arrows $(w_1,w^{\prime}_1)$ and $(w_2, w^{\prime}_2)$ such that either w ₁ = w ₂ or $w^{\prime}_1 = w^{\prime}_2.$ In addition, each path must only contain arrows $(w, w^{\prime})$ for which $c(w, w^{\prime}) =0.$ Condition (1) guarantees the existence of paths that satisfy the latter requirement. Due to condition (2), for each $u \in\mathcal{U} \setminus \{u(i)\}, E$ now contains a path from u to u(i). Finally, for each $w \in \mathcal{W} \setminus \mathcal{U},$ if E does not yet contain an arrow that starts at w, we add such an arrow to E. We choose the arrows that we add to E in such a way that, after adding the arrows, E contains, for each $w \in\mathcal{W} \setminus \mathcal{U},$ a path from w to some $u \in\mathcal{U}$ (which implies that E contains a path from w to u(i)). In addition, we only choose arrows $(w, w^{\prime})$ for which $c(w, w^{\prime}) = 0.$ We can choose the arrows in this way because Theorem 1 assumes that all non-uniform populations are almost uniform. Using Definition 11, it can be seen that the set of arrows E constructed as discussed above is a u(i)-tree on $\mathcal{W}.$ Moreover, E satisfies (65). We have therefore shown that for each chromosome i a u(i)-tree E on $\mathcal{W}$ that satisfies (65) can be constructed. Consequently, it follows from (64) that $\tilde{c}(u) \leq \mu - 1$ for all $u \in \mathcal{U}.$ Since it has been shown above that $\tilde{c}(u) \geq \mu - 1$ for all $u \in \mathcal{U},$ this implies that $\tilde{c}(u) = \mu - 1$ for all $u \in \mathcal{U}.$ It has also been shown above that $\tilde{c}(w) \geq \mu$ for all $w \in \mathcal{W} \setminus\mathcal{U}.$ Hence, as the mutation rate $\varepsilon$ approaches zero, $\tilde{q}(w)$ approaches zero faster for $w \in \mathcal{W}\setminus \mathcal{U}$ than for $w \in \mathcal{U}.$ It then follows from Lemma 5 that for all non-uniform populations $w \in \mathcal{W}\setminus \mathcal{U}$ the long-run probability $\bar{q}(w)$ approaches zero as $\varepsilon$ approaches zero. In other words, the long-run limit probability $\hat{q}(w)$ equals zero for all non-uniform populations $w \in \mathcal{W} \setminus \mathcal{U}.$ This completes the proof of part (1) of Theorem 1.

We now prove part (2) of Theorem 1. It has been shown above that $\tilde{c}(u) = \mu - 1$ for all $u \in \mathcal{U}.$ Consequently, as the mutation rate $\varepsilon$ approaches zero,$\tilde{q}(u)$ approaches zero equally fast for all $u \in \mathcal{U}.$ Using Lemma 5, it can therefore be seen that the long-run limit probability $\hat{q}(u)$ of a uniform population $u \in \mathcal{U}$ is given by

$$ \hat{q}(u) = \lim_{\varepsilon \rightarrow 0}\bar{q}(u) = \frac{\tilde{\alpha}(u)}{\sum_{u^{\prime} \in {\mathcal{U}}}\tilde{\alpha}(u^{\prime})}. $$

(67)

For $u \in \mathcal{U},$ let $\widetilde{\mathcal{E}}(u)$ be defined as

$$ {\widetilde{\mathcal{E}}}(u) = \left\{E \in {\mathcal{E}}(u) \Big\vert \textstyle{\sum_{(w, w^{\prime}) \in E}c(w, w^{\prime})} = \mu - 1\right\}. $$

(68)

It then follows from (57) to (63) that $\tilde{\alpha}(u)$ can be written as

$$ \tilde{\alpha}(u)=\sum_{E \in {\widetilde{\mathcal{E}}}(u)}\prod_{(w, w^{\prime}) \in E}\alpha(w, w^{\prime}). $$

(69)

Consider an arbitrary uniform population $u \in \mathcal{U}$ and an arbitrary u-tree E on $\mathcal{W},$ where $E \in\widetilde{\mathcal{E}}(u).$ Let E ₁ and E ₂ denote sets of arrows that are given by

$$ E_1=\{(w, w^{\prime}) \in E \vert w \in {\mathcal{V}}\}$$

(70)

$$ E_2=E \setminus E_1. $$

(71)

It is immediately clear that E ₁ satisfies the following conditions:

(A1)
E ₁ contains no arrow that starts at u or at some $w \in\mathcal{W} \setminus \mathcal{V}.$
(A2)
For each $v \in \mathcal{V} \setminus \{u\}, E_1$ contains exactly one arrow that starts at v.

Notice that $c(u^{\prime}, w) \geq 1$ for all $u^{\prime} \in\mathcal{U}$ and all $w \in \mathcal{W}$ such that $u^{\prime} \neq w.$ Notice further that, due to (68), $\sum_{(w, w^{\prime}) \in E_1}c(w, w^{\prime}) \leq \mu - 1.$ These observations imply that, for each $(w, w^{\prime}) \in E_1, c(w, w^{\prime}) = 1$ if $w \in\mathcal{U}$ and $c(w, w^{\prime}) = 0$ otherwise. They also imply that E ₁ satisfies the following condition:
(A3)
$ \sum_{(w, w^{\prime}) \in E_1}c(w, w^{\prime}) = \mu - 1. $It is easy to see that c(v, w) ≥ 1 for all $v \in \mathcal{V}$ and all $w \in \mathcal{W} \setminus \mathcal{V}$ and that $c(u^{\prime}, w) \geq 2$ for all $u^{\prime} \in \mathcal{U}$ and all $w \in \mathcal{W} \setminus \mathcal{V}.$ Consequently, E ₁ contains no arrows that end at some $w \in \mathcal{W} \setminus\mathcal{V}.$ This implies the following condition on E ₁:
(A4)
For each $v \in \mathcal{V} \setminus \{u\}, E_1$ contains a path from v to u.

It is immediately clear that E ₂ satisfies the following conditions:

(B1)
E ₂ contains no arrow that starts at some $v \in\mathcal{V}.$
(B2)
For each $w \in \mathcal{W} \setminus \mathcal{V}, E_2$ contains exactly one arrow that starts at w.
(B3)
For each $w \in \mathcal{W} \setminus \mathcal{V}, E_2$ contains a path from w to some $v \in \mathcal{V}.$

Furthermore, taking into account that E ₁ satisfies condition (A3), (68) implies that E ₂ satisfies the following condition:
(B4)
$ \sum_{(w, w^{\prime}) \in E_2}c(w, w^{\prime}) = 0. $

For $u \in \mathcal{U},$ let $\widetilde{\mathcal{E}}_1(u)$ denote a set that contains all sets of arrows E ₁ satisfying conditions (A1)–(A4). Let $\widetilde{\mathcal{E}}_2$ denote a set that contains all sets of arrows E ₂ satisfying conditions (B1)–(B4). Notice that $\widetilde{\mathcal{E}}_2$ does not depend on u. Clearly, for each $E \in\widetilde{\mathcal{E}}(u),$ there exist an $E_1 \in\widetilde{\mathcal{E}}_1(u)$ and an $E_2 \in\widetilde{\mathcal{E}}_2$ such that E = E ₁ ∪ E ₂. Conversely, it can be seen that for each $E_1 \in \widetilde{\mathcal{E}}_1(u)$ and each $E_2 \in \widetilde{\mathcal{E}}_2$ there exists an $E \in\widetilde{\mathcal{E}}(u)$ such that E = E ₁ ∪ E ₂. Hence,

$$ {\widetilde{\mathcal{E}}}(u) = \left\{E_1 \cup E_2 \Big\vert E_1 \in {\widetilde{\mathcal{E}}}_1(u), E_2 \in {\widetilde{\mathcal{E}}}_2\right\}. $$

(72)

Equation 69 can now be written as

$$ \tilde{\alpha}(u) =\left(\textstyle{\sum_{E_1 \in {\widetilde{\mathcal{E}}}_1(u)}\prod_{(w, w^{\prime}) \in E_1}\alpha(w, w^{\prime})}\right) \\ \times \left(\textstyle{\sum_{E_2 \in {\widetilde{\mathcal{E}}}_2}\prod_{(w, w^{\prime}) \in E_2}\alpha(w, w^{\prime})}\right). $$

(73)

Consequently, it follows from (67) that

$$ \hat{q}(u) = \lim_{\varepsilon \rightarrow 0}\bar{q}(u)\\ = \frac{\sum_{E_1 \in {\widetilde{\mathcal{E}}}_1(u)}\prod_{(w, w^{\prime}) \in E_1}\alpha(w, w^{\prime})}{\sum_{u^{\prime} \in {\mathcal{U}}}\sum_{E_1 \in {\widetilde{\mathcal{E}}}_1(u^{\prime})}\prod_{(w, w^{\prime}) \in E_1}\alpha(w, w^{\prime})}. $$

(74)

Based on (74), the following observations can be made:

1.
For $w, w^{\prime} \in \mathcal{W}$ such that $w \neq w^{\prime}$ and such that there exists an $E_1 \in \bigcup_{u^{\prime} \in\mathcal{U}}\widetilde{\mathcal{E}}_1(u^{\prime})$ that contains an arrow $(w, w^{\prime}), \lim_{\varepsilon \rightarrow 0}\bar{q}(u)$ depends on the term of lowest degree in the transition probability $q(w, w^{\prime})$ and does not depend on other terms in $q(w,w^{\prime}).$
2.
For $w, w^{\prime} \in \mathcal{W}$ such that $w \neq w^{\prime}$ and such that there does not exist an $E_1 \in \bigcup_{u^{\prime}\in \mathcal{U}}\widetilde{\mathcal{E}}_1(u^{\prime})$ that contains an arrow $(w, w^{\prime}), \lim_{\varepsilon \rightarrow0}\bar{q}(u)$ does not depend on any of the terms in the transition probability $q(w, w^{\prime}).$

Let {V _t}, where the index t takes values in $\{0, 1,\ldots\},$ denote a Markov chain with state space $\mathcal{V},$ transition probabilities $r(v, v^{\prime}),$ and stationary probabilities $\bar{r}(v),$ where $v, v^{\prime} \in \mathcal{V}.$ For $v \neq v^{\prime},$ let

$$ r(v, v^{\prime}) = \left\{\begin{array}{ll} \alpha(v, v^{\prime})\varepsilon, & \hbox {if}\; v \in {\mathcal{U}} {\hbox{ and }} c(v, v^{\prime}) = 1\\ \alpha(v, v^{\prime}), & \hbox {if}\; v \notin {\mathcal{U}} {\hbox{ and }} c(v, v^{\prime}) = 0 \\ 0, & \hbox {otherwise.}\end{array}\right. $$

(75)

Furthermore, let $r(v, v) = 1 - \sum_{v^{\prime} \in \mathcal{V}\setminus \{v\}}r(v, v^{\prime}).$ Clearly, Markov chain {V _t} is irreducible. Taking into account the two observations made above, it can be seen that $\lim_{\varepsilon \rightarrow 0}\bar{r}(v) =\lim_{\varepsilon \rightarrow 0}\bar{q}(v)$ for all $v \in\mathcal{V}.$ That is, in the limit as $\varepsilon$ approaches zero, corresponding states of Markov chains {V _t} and {W _t} have the same stationary probability. It follows from this that $\lim_{\varepsilon \rightarrow 0}\bar{r}(v) = \hat{q}(v)$ for all $v\in \mathcal{V}.$

The following observations can be made:

1.
For $v \in \mathcal{U}$ and $v^{\prime} \in \mathcal{V}, c(v,v^{\prime}) = 1$ if and only if v = u(i) and $v^{\prime} = v(i, j,n - 1)$ for some i and some j such that δ(i, j) = 1.
2.
For $v \in \mathcal{U}$ and $v^{\prime} \in \mathcal{V}$ such that $c(v, v^{\prime}) = 1, q(v, v^{\prime})$ equals the probability that the mutation operator inverts one specific bit in the binary encoding of an arbitrarily chosen chromosome and that it does not invert any other bits in the binary encoding of the chosen chromosome or of any other chromosome in the population. This probability does not depend on v or $v^{\prime}.$ Consequently, for all $v_1, v_2 \in \mathcal{U}$ and all $v^{\prime}_1,v^{\prime}_2 \in \mathcal{V}$ such that $c(v_1, v^{\prime}_1) =c(v_2, v^{\prime}_2) = 1, q(v_1, v^{\prime}_1) = q(v_2,v^{\prime}_2)$ and hence $\alpha(v_1, v^{\prime}_1) = \alpha(v_2,v^{\prime}_2).$
3.
For $v \in \mathcal{V} \setminus \mathcal{U}$ and $v^{\prime} \in\mathcal{V}, c(v, v^{\prime}) = 0$ only if v = v(i, j, λ) and $v^{\prime} = v(i, j, \lambda^{\prime})$ for some i and some j such that δ(i, j) = 1 and for some $\lambda \in \{1,\ldots, n - 1\}$ and some $\lambda^{\prime} \in \{0, \ldots, n\}.$
4.
For $v \in \mathcal{V} \setminus \mathcal{U}$ and $v^{\prime} \in\mathcal{V}$ such that $c(v, v^{\prime}) = 0, \alpha(v, v^{\prime})= \pi(i, j, \lambda, \lambda^{\prime}),$ where i, j, λ, and $\lambda^{\prime}$ satisfy v = v(i, j, λ) and $v^{\prime} = v(i, j, \lambda^{\prime})$ and where $\pi(i, j,\lambda, \lambda^{\prime})$ is defined in (6).

Let $\alpha = \alpha(v, v^{\prime})$ for all $v \in \mathcal{U}$ and all $v^{\prime} \in \mathcal{V}$ such that $c(v, v^{\prime}) = 1.$ Using (75), it follows from the first two observations made above that $r(v, v^{\prime}) = \alpha\varepsilon$ if v = u(i) and $v^{\prime} = v(i, j, n - 1)$ for some i and some j such that δ(i, j) = 1. It also follows that $r(v, v^{\prime}) = 1 -m\alpha\varepsilon$ if $v = v^{\prime} \in \mathcal{U}.$ Furthermore, taking into account the last two observations made above, it can be seen from (75) that $r(v, v^{\prime}) = \pi(i, j,\lambda, \lambda^{\prime})$ if v = v(i, j, λ) and $v^{\prime} = v(i, j, \lambda^{\prime})$ for some i and some j such that δ(i, j) = 1 and for some $\lambda \in \{1, \ldots,n - 1\}$ and some $\lambda^{\prime} \in \{0, \ldots, n\}.$ Finally, (75) implies that $r(v, v^{\prime}) = 0$ if none of the above conditions is satisfied. Let the vector ${\tilde{\mathbf{v}}}=\left[\begin{array}{lll}\tilde{v}_1 & \cdots &\tilde{v}_\xi\end{array}\right]$ be given by

$$ \tilde{{\mathbf{v}}}^{\rm T}=\left[\begin{array}{l} v(g_1, h_1, 1)\\ \vdots\\ v(g_1, h_1, n - 1)\\ v(g_2, h_2,1)\\ \vdots\\ \vdots\\ v(g_{\nu - 1}, h_{\nu - 1}, n - 1) \\ v(g_\nu, h_\nu, 1)\\ \vdots\\ v(g_\nu, h_\nu, n - 1)\end{array}\right] $$

(76)

where $\mathbf{g} = [g_k]$ and $\mathbf{h} = [h_k]$ are defined in Sect. 2. Notice that ${\tilde{\mathbf{v}}}$ contains each population in $\mathcal{V} \setminus \mathcal{U}$ exactly once. It can be seen that

$$ (1-m\alpha\varepsilon){\mathbf{I}} = \\ \left[\begin{array}{lll} r(u(0), u(0))& \cdots & r(u(0), u(\mu - 1))\\ \vdots& \ddots & \vdots\\ r(u(\mu - 1), u(0))&\cdots & r(u(\mu - 1), u(\mu - 1)) \end{array}\right] $$

(77)

$$ \alpha\varepsilon{\mathbf{A}} =\left[\begin{array}{lll} r(u(0), \tilde{v}_1) &\cdots & r(u(0), \tilde{v}_\xi)\\ \vdots & \ddots & \vdots\\ r(u(\mu - 1), \tilde{v}_1) & \cdots & r(u(\mu-1),\tilde{v}_\xi) \end{array}\right] $$

(78)

$$ {\mathbf{B}}=\left[\begin{array}{lll} r(\tilde{v}_1, u(0)) & \cdots & r(\tilde{v}_1, u(\mu - 1)) \\ \vdots & \ddots & \vdots \\ r(\tilde{v}_\xi, u(0)) & \cdots & r(\tilde{v}_\xi, u(\mu - 1)) \end{array}\right] $$

(79)

$$ {\mathbf{C}} =\left[\begin{array}{lll} r(\tilde{v}_1, \tilde{v}_1) & \cdots & r(\tilde{v}_1, \tilde{v}_\xi) \\ \vdots & \ddots & \vdots \\ r(\tilde{v}_\xi, \tilde{v}_1) & \cdots & r(\tilde{v}_\xi, \tilde{v}_\xi) \end{array}\right] $$

(80)

where $\mathbf{A}, \mathbf{B},$ and $\mathbf{C}$ are defined in (13), (16), and (19). Let $\mathbf{S}$ denote a μ × μ matrix that is obtained from the matrices in (77)–(80) and that is given by

$$ {\mathbf{S}} = (1 - m\alpha\varepsilon){\mathbf{I}} + \alpha\varepsilon{\mathbf{A}}{({\mathbf{I}} - {\mathbf{C}}) ^ {-1}}{\mathbf{B}}. $$

(81)

This can be written more simply as

$$ {\mathbf{S}} = {\mathbf{I}} + \alpha\varepsilon{\mathbf{D}} $$

(82)

where $\mathbf{D}$ is defined in (22). Let {U _t}, where the index t takes values in $\{0, 1, \ldots\},$ denote a Markov chain with state space $\mathcal{U}$ and transition matrix $\mathbf{S}.$ Using (77)–(81), it follows from Lemma 4 that Markov chain {U _t} is irreducible and has stationary probabilities $\bar{s}(u)$ that are given by

$$ \bar{s}(u) = \frac{\bar{r}(u)}{\sum_{u^{\prime} \in {\mathcal{U}}}\bar{r}(u^{\prime})} $$

(83)

where $u \in \mathcal{U}.$ Definition 8 states that the stationary distribution ${\bar{\mathbf{s}}} =\left[\begin{array}{lll}\bar{s}(u(0)) & \cdots & \bar{s}(u(\mu - 1))\end{array}\right]$ of Markov chain {U _t} satisfies

$$ {{\bar{\mathbf{s}}}}{\mathbf{S}} = {{\bar{\mathbf{s}}}} $$

(84)

$$ {{\bar{\mathbf{s}}}}{\mathbf{1}} = 1. $$

(85)

Lemma 2 implies that this linear system has a unique solution. The equality in (84) can be written as

$$ {{\bar{\mathbf{s}}}}({\mathbf{S}} - {\mathbf{I}}) = \alpha\varepsilon{{\bar{\mathbf{s}}}}{\mathbf{D}} = {\mathbf{0}}. $$

(86)

Since α > 0 and $\varepsilon > 0,$ this can be simplified to

$$ {{\bar{\mathbf{s}}}}{\mathbf{D}} = {\mathbf{0}}. $$

(87)

Notice that $\mathbf{D}$ does not depend on $\varepsilon.\;{\bar{\mathbf{s}}}$ therefore does not depend on $\varepsilon$ either. Recall further that $\lim_{\varepsilon \rightarrow 0}\bar{r}(v) =\hat{q}(v)$ for all $v \in \mathcal{V}$ and that $\hat{q}(w) = 0$ for all $w \in \mathcal{W} \setminus \mathcal{U}.$ Using (83), it now follows that

$$ \bar{s}(u) = \lim_{\varepsilon \rightarrow 0}\bar{s}(u) = \lim_{\varepsilon \rightarrow 0}\frac{\bar{r}(u)}{\sum_{u^{\prime} \in {\mathcal{U}}}\bar{r}(u^{\prime})} = \frac{\hat{q}(u)}{\sum_{u^{\prime} \in {\mathcal{U}}}\hat{q}(u^{\prime})} = \hat{q}(u) $$

(88)

for all $u \in \mathcal{U}.$ Hence, the stationary distribution ${\bar{\mathbf{s}}}$ of Markov chain {U _t} equals the long-run limit distribution ${\hat{\mathbf{q}}}.$ Consequently, (85) and (87) imply that ${\hat{\mathbf{q}}}$ satisfies (23) and (24). It also follows that the linear system given by (23) and (24) has a unique solution. This completes the proof of part (2) of Theorem 1.

Rights and permissions

Open Access This is an open access article distributed under the terms of the Creative Commons Attribution Noncommercial License (https://creativecommons.org/licenses/by-nc/2.0), which permits any noncommercial use, distribution, and reproduction in any medium, provided the original author(s) and source are credited.

Reprints and permissions

About this article

Cite this article

Waltman, L., van Eck, N.J. A mathematical analysis of the long-run behavior of genetic algorithms for social modeling. Soft Comput 16, 1071–1089 (2012). https://doi.org/10.1007/s00500-012-0804-x

Download citation

Published: 24 January 2012
Issue Date: June 2012
DOI: https://doi.org/10.1007/s00500-012-0804-x

A mathematical analysis of the long-run behavior of genetic algorithms for social modeling

Abstract

Similar content being viewed by others

Characteristic Features of the Sustainable Strategies in the Evolvable Iterated Prisoners’ Dilemma

Evolutionary Mechanisms

Epictetusian rationality and evolutionary stability

Explore related subjects

1 Introduction

2 Analysis

Definition 1

Definition 2

Definition 3

Definition 4

Lemma 1

Proof

Definition 5

Theorem 1

Proof

3 Algorithm

4 Application

4.1 Genetic algorithm modeling in iterated prisoner’s dilemmas

4.2 Calculation of the long-run limit distribution of the genetic algorithm

4.3 Analysis of the long-run behavior of the genetic algorithm

5 Conclusions

Notes

References

Acknowledgments

Open Access

Author information

Authors and Affiliations

Corresponding author

Appendix

Appendix

Definition 6

Definition 7

Definition 8

Definition 9

Lemma 2

Proof

Definition 10

Lemma 3

Proof

Lemma 4

Proof

Definition 11

Lemma 5

Proof

Proof of Lemma 1

Proof of Theorem 1

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Search

Navigation