1 Introduction

The field of evolutionary computation is concerned with the study of all kinds of evolutionary algorithms. These algorithms can be used for various purposes. Perhaps the most popular purpose for which they can be used is function optimization (e.g., Gen and Cheng 2000; Goldberg 1989; Michalewicz 1996). In the function optimization context, evolutionary algorithms can be seen as heuristics that serve as alternatives to more traditional techniques from the fields of combinatorial optimization and mathematical programming. Another important purpose for which evolutionary algorithms can be used is the modeling of biological and social phenomena (e.g., Mitchell 1996). This is the topic with which we are concerned in this paper. Our focus is in particular on the use of evolutionary algorithms for modeling social phenomena.

When using evolutionary algorithms in the social modeling context, one of the assumptions one makes is that the agents whose behavior is being modeled are boundedly rational. This basically means that the agents are assumed not to behave in a utility-maximizing manner. There are numerous ways in which boundedly rational behavior can be modeled (e.g., Brenner 2006; Fudenberg and Levine 1998). A popular approach is to rely on an evolutionary metaphor. This is the approach that is taken by evolutionary algorithms. In its simplest form, the evolutionary approach assumes that there is a population of agents and that for each agent in the population the strategy it uses depends on the population-wide past performance of strategies. The better the past performance of a strategy, the more likely the strategy is to be used again. The evolutionary approach also assumes that there always is a small probability that an agent experiments with a new strategy.

The evolutionary approach to modeling boundedly rational behavior has attracted a lot of attention, not only from researchers in the field of evolutionary computation but also from researchers in the social sciences, in particular from economists. Traditionally, economists have typically relied on game-theoretic models to analyze interactions between agents. These models assume agents to behave in a fully rational way. Nowadays, however, the limitations of game-theoretic models are well recognized and many economists have started to study evolutionary models of agent behavior. These models are based on the assumption that the behavior of agents can best be described using some evolutionary mechanism rather than using the idea of full rationality.

In the field of economics, there are two quite separate streams of research that are both concerned with the evolutionary approach to modeling boundedly rational behavior. One stream of research, which is usually referred to as agent-based computational economics (e.g., Tesfatsion 2006), makes use of techniques from the field of evolutionary computation. Especially genetic algorithms (GAs) are frequently used. Early work in this stream of research includes (Andreoni and Miller 1995; Arifovic 1994, 1996; Dawid 1996; Holland and Miller 1991; Marks 1992; Miller 1986), and examples of more recent work are Alkemade et al. (2006, 2009), Georges (2006), Haruvy et al. (2006), Lux and Schornstein (2005), Vriend (2000), Waltman and Van Eck (2009), and Waltman et al. (2011). The other stream of research is more closely related to traditional game theory and is referred to as evolutionary game theory (e.g., Gintis 2000; Maynard Smith 1982; Vega-Redondo 1996; Weibull 1995). Like the traditional game-theoretic approach, the evolutionary game-theoretic approach is model-based and relies heavily on mathematical analysis. The use of computer simulations is not very common in evolutionary game theory.

In this paper, it is not our aim to argue in favor of either the agent-based computational economics approach, which emphasizes algorithms and computer simulations, or the evolutionary game-theoretic approach, which emphasizes models and mathematical analysis. Instead, we want to show how the former approach can benefit from the mathematical techniques used in the latter approach. More specifically, we want to show how evolutionary algorithms that are used for modeling social phenomena can be analyzed mathematically using techniques that are popular in evolutionary game theory. Our focus in this paper is on one particular type of evolutionary algorithm, namely GAs with a binary encoding. However, we emphasize that the approach that we take can be applied to other types of evolutionary algorithms as well. The reason for focusing on GAs with a binary encoding is that this seems to be the type of evolutionary algorithm that is used most frequently for modeling social phenomena (e.g., Alkemade et al. 2006, 2007; Andreoni and Miller 1995; Arifovic 1994, 1996; Ashlock et al. 1996; Axelrod 1987; Crowley et al. 1996; Dawid 1996; Georges 2006; Ishibuchi and Namikawa 2005; Lux and Schornstein 2005; Marks 1992; Miller 1986, 1996; Van Bragt et al. 2001; Vriend 2000; Yao and Darwen 1994).

The mathematical analysis that we present in this paper deals with the long-run behavior of GAs with a binary encoding. The GAs are assumed to be used in the social modeling context (for theoretical work on GAs in the function optimization context, see, e.g., Mitchell 1996; Nix and Vose 1992; Rudolph 1994, 1998; Vose 1999). In the terminology of Vriend (2000), we are concerned with GAs that are used for modeling social learning (as opposed to individual learning). Our work can be seen as an extension of the work of Dawid (1996), who derived a number of important mathematical results on the behavior of GAs. For small and moderate population sizes, the results of Dawid do not provide a full characterization of the long-run behavior of GAs. We extend the work of Dawid by deriving results that do provide a full characterization of the long-run behavior of GAs for small and moderate population sizes. Using our results, the long-run behavior of a GA can be calculated exactly and need not be estimated using computer simulations. This means that it is no longer necessary to run a GA a large number of times for a large number of iterations in order to get insight into its long-run behavior. The use of our mathematical results has at least three advantages over the use of computer simulations:

  1. 1.

    Our mathematical results can be used to calculate the long-run behavior of a GA exactly, while computer simulations can only be used to estimate the long-run behavior of a GA.

  2. 2.

    When using computer simulations, it can be difficult to determine how many iterations of a GA are required to approximate the long-run behavior of the GA reasonably closely. Our mathematical results do not have this problem.

  3. 3.

    Calculating the exact long-run behavior of a GA using our mathematical results requires less computing time than obtaining a reasonably accurate estimate of the long-run behavior of a GA using computer simulations.

Our mathematical results have one important limitation, which is that on most of today’s computers they can only be used if the chromosome length is not greater than about 24 bits. If the chromosome length is greater than about 24 bits, the use of our mathematical results to calculate the long-run behavior of a GA most likely requires a prohibitive amount of computer memory.

Like in Dawid (1996), the mathematical analysis presented in this paper relies on the assumption that the mutation rate is positive but infinitely small. (In other words, the analysis is concerned with the limit case in which the mutation rate approaches zero.) In simulation studies with GAs, researchers typically work with values between 0.001 and 0.01 for the mutation rate. This seems to be a rather pragmatic choice (cf. Dawid 1996). On the one hand, lower values for the mutation rate would lead to very slow convergence and, consequently, very long simulation runs. On the other hand, higher values for the mutation rate would lead to convergence to unstable, difficult to interpret outcomes. We believe that our assumption of an infinitely small mutation rate is justified because an infinitely small mutation rate is less arbitrary than a mutation rate whose value is determined solely based on pragmatic grounds (cf. Foster and Young 1990). The assumption of an infinitely small mutation rate is also in line with the common practice in evolutionary game theory, in which a similar assumption is almost always made. The advantage of assuming an infinitely small mutation rate is that it greatly simplifies the mathematical analysis of the long-run behavior of GAs (see also Dawid 1996). In fact, GAs with an infinitely small mutation rate can be analyzed in a similar way as well-known models in evolutionary game theory (e.g., Foster and Young 1990; Kandori et al. 1993; Vega-Redondo 1997; Young 1993). Like in evolutionary game theory, mathematical results provided by Freidlin and Wentzell (1998) are the key tool for analyzing the long-run behavior to which convergence will take place. We note that, in addition to the assumption of an infinitely small mutation rate, there are some other technical assumptions on which our mathematical analysis relies. Most of these assumptions are not very strong and will probably be satisfied by most GAs.

To demonstrate the usefulness of our mathematical analysis, we replicate a well-known study by Axelrod (1987) (reprinted in Axelrod 1997; see also Dawid 1996; Mitchell 1996). Axelrod used a GA to model the evolution of strategies in iterated prisoner’s dilemmas (IPDs). He showed that an evolutionary mechanism can lead to cooperative behavior. Axelrod’s study has been one of the first and also one of the most influential studies on the use of evolutionary algorithms for modeling social phenomena. Directly or indirectly, his study seems to have inspired many researchers (e.g., Ashlock et al. 1996, 2006; Chong and Yao 2005; Crowley et al. 1996; Fogel 1993; Ishibuchi and Namikawa 2005; Mühlenbein 1991; Thibert-Plante and Charbonneau 2007; Van Bragt et al. 2001; Yao and Darwen 1994). The results obtained by Axelrod are all based on computer simulations. In this paper, we show that more or less the same results can be calculated exactly, with no need to rely on simulations. We also discuss some new insights that exact calculations provide.

The mathematical analysis that we present in this paper also has an important implication for the choice of the parameters of a GA. The analysis indicates that if the mutation rate is infinitely small the crossover rate has no effect on the long-run behavior of a GA. This is a quite remarkable result that, to the best of our knowledge, has not been reported before in the theoretical literature on GAs. The result implies that when GAs are used for modeling social phenomena the crossover rate is likely to be a rather insignificant parameter, at least when one is mainly interested in the behavior of GAs in the long run (for the short run, see Thibert-Plante and Charbonneau 2007). This suggests that in many cases the crossover rate can simply be set to zero, in which case no crossover will take place at all. Simulation results that we report in this paper indeed show no significant effect of the crossover rate on the long-run behavior of a GA.

The remainder of this paper is organized as follows: in Sect. 2, we present a mathematical analysis of the long-run behavior of GAs that are used for modeling social phenomena. Based on the analysis, we derive an algorithm for calculating the long-run behavior of GAs in Sect. 3. In Sect. 4, we demonstrate an application of the algorithm by replicating Axelrod’s study (1987). Finally, we discuss the conclusions of our research in Sect. 5. Proofs of our mathematical results are provided in the Appendix.

2 Analysis

The general form of the GAs that we analyze in this paper is shown in Fig. 1. In this figure, and also in the rest of this paper, the positive integers n and m and the probabilities γ and \(\varepsilon\) denote, respectively, the population size, the chromosome length, the crossover rate, and the mutation rate. For simplicity, we assume the population size n to be even. We further assume the crossover rate γ and the mutation rate \(\varepsilon\) to remain constant over time. We also assume \(\varepsilon\) to be positive. The GAs that we analyze are generalizations of the canonical GA discussed in, for example, Goldberg (1989) and Mitchell (1996). Like the canonical GA, we assume the use of a binary encoding, that is, chromosomes correspond to bit strings in our GAs. Unlike the canonical GA, we do not assume the use of specific selection and crossover operators. Instead, the GAs that we analyze may use almost any selection operator, such as roulette wheel selection (sometimes referred to as fitness-proportionate selection), tournament selection, or rank selection, and any crossover operator, such as single-point crossover, two-point crossover, or uniform crossover. Furthermore, in the GAs that we analyze, the fitness of a chromosome may depend, either deterministically or stochastically, on the entire population rather than only on the chromosome itself. When using GAs for social modeling, the fitness of a chromosome typically depends on the entire population. This is referred to as state-dependent fitness in Dawid (1996). In most studies, GAs that are used for social modeling have the same general form as the GAs that we analyze in this paper.

Fig. 1
figure 1

General form of the genetic algorithms analyzed in this paper

We now introduce the terminology and the mathematical notation that we use in our analysis. We note that an overview of the mathematical notation is provided in Table 1. There are μ = 2m different chromosomes, denoted by \(0, \ldots, \mu-1.\) Each chromosome has a unique binary encoding, which is given by a bit string of length m.Footnote 1 \(\mathcal{C}=\{0, \ldots, \mu-1\}\) denotes the set of all chromosomes. i and j denote typical chromosomes and take values in \(\mathcal{C}.\) The following definition introduces the notion of uniform and non-uniform populations.

Table 1 Overview of the mathematical notation

Definition 1

A population is said to be uniform if and only if all n chromosomes in the population are identical. A population is said to be non-uniform if and only if some chromosomes in the population are different.

\(\mathcal{U}\) denotes the set of all uniform populations. Obviously, since there are μ different chromosomes, there are also μ different uniform populations, that is, \(\vert{\mathcal{U}}\vert =\mu.\) \(u(i) \in \mathcal{U}\) denotes the uniform population consisting of n times chromosome i.δ(ij) denotes the Hamming distance between chromosomes i and j, that is, the number of corresponding bits in the binary encodings of i and j that are different. \(\mathcal{G}(i)\) denotes the set of all chromosomes that have the same binary encoding as chromosome i except that one bit has been changed from one into zero. Conversely, \(\mathcal{H}(i)\) denotes the set of all chromosomes that have the same binary encoding as chromosome i except that one bit has been changed from zero into one. In mathematical notation,

$$ {\mathcal{G}}(i)=\{j \vert j < i \hbox{ and } \delta(i, j) = 1\} $$
(1)
$$ {\mathcal{H}}(i)=\{j \vert j > i \hbox{ and } \delta(i, j) = 1\}. $$
(2)

Notice that \(j \in \mathcal{G}(i)\) if and only if \(i \in\mathcal{H}(j).\) There are

$$ \nu = \mu m/2 = m{2^{m - 1}} $$
(3)

combinations of two chromosomes i and j such that δ(ij) = 1, that is, such that the binary encodings of i and j differ by exactly one bit. (To see this, notice that there are μ different chromosomes and that for each chromosome there are m chromosomes that have the same binary encoding except for one bit. Dividing by two is necessary to avoid double counting.) k and \(k^{\prime}\) denote indices that take values in \(\{1, \ldots, \nu\}.\;\widetilde{{\mathcal{V}}}\) denotes the set of all populations in which there are exactly two different chromosomes and in which the binary encodings of these chromosomes differ by exactly one bit. There are

$$ \xi=\lvert{\widetilde{\mathcal{V}}}\rvert = \nu(n - 1) = (n - 1)m{2 ^ {m-1}} $$
(4)

such populations. (The order of the chromosomes within a population has no effect on the behavior of a GA. Populations consisting of the same chromosomes in different orders are therefore considered identical.) \(\mathcal{V}\) denotes the set that is obtained by adding the uniform populations to \(\widetilde{\mathcal{V}},\) that is, \(\mathcal{V} = \widetilde{\mathcal{V}} \cup \mathcal{U}.\) For i and j such that δ(i, j) = 1 and for \(\lambda \in \{0,\ldots, n\}, v(i, j, \lambda) \in \mathcal{V}\) denotes the population consisting of λ times chromosome i and n − λ times chromosome j. Notice that v(ij, λ) = v(jin − λ) and that v(ij, 0) = u(j) and \(v(i, j, n)=u(i).\) \(\mathcal{W}\) denotes the set of all possible populations. As shown in Nix and Vose (1992, Lemma 1) and Dawid (1996), the number of possible populations equals

$$ \vert{\mathcal{W}}\vert = \left(\begin{array}{l} n + \mu - 1\\ \mu - 1 \end{array}\right)=\frac{(n + \mu - 1)!}{n!(\mu - 1)!}. $$
(5)

(Again, populations consisting of the same chromosomes in different orders are considered identical.) For \(t \in \{0, 1, \ldots\},\) the random variable \(W_t \in \mathcal{W}\) denotes the population at the beginning of iteration t of a GA. For i and j such that δ(i, j) = 1 and for \(\lambda \in \{1, \ldots, n - 1\}\) and \(\lambda^{\prime} \in \{0, \ldots, n\}, \pi(i, j, \lambda,\lambda^{\prime})\) denotes the limit as the mutation rate \(\varepsilon\) approaches zero of the probability that population v(ij, λ) is turned into population \(v(i, j,\lambda^{\prime})\) in a single iteration of a GA. In mathematical notation,

$$ \pi(i, j, \lambda, \lambda^{\prime}) = \lim_{\varepsilon \rightarrow 0}\Pr(W_{t + 1} = v(i, j, \lambda^{\prime}) \vert W_t = v(i, j, \lambda)) $$
(6)

where \(t \in \{0, 1, \ldots\}.\) Because the binary encodings of chromosomes i and j differ by only one bit, applying the crossover operator to chromosomes i and j has no effect. As a consequence, the crossover operator has no effect on \(\pi(i, j,\lambda, \lambda^{\prime}).\) Moreover, because \(\varepsilon\) approaches zero, the mutation operator has no effect on \(\pi(i, j,\lambda, \lambda^{\prime})\) either. \(\pi(i, j, \lambda,\lambda^{\prime})\) therefore equals the probability that the selection operator turns population v(ij, λ) into population \(v(i, j, \lambda^{\prime})\) in a single iteration of a GA.

The following definition introduces the notion of almost uniform populations.

Definition 2

A non-uniform population \(w \in \mathcal{W} \setminus\mathcal{U}\) is said to be almost uniform if and only if

$$ \lim_{\varepsilon \rightarrow 0}\Pr(W_{t + N} = u \vert W_t = w) > 0 $$
(7)

for all \(t \in \{0, 1, \ldots\},\) some finite positive integer N, and some \(u \in \mathcal{U}.\)

Hence, a non-uniform population is almost uniform if and only if no mutation is required to go from the non-uniform population to some uniform population. This is not a very strong condition. In many cases, all non-uniform populations are almost uniform. For example, if a GA uses roulette wheel selection or tournament selection, there is always a possibility that the selection operator selects n times the same chromosome. In other words, the selection operator can turn any non-uniform population into a uniform population in a single iteration. Because of this, when roulette wheel selection or tournament selection is used, all non-uniform populations are almost uniform.

The following two definitions introduce the notion of a connection from one chromosome to another:

Definition 3

A direct connection from chromosome i to chromosome j is said to exist if and only if δ(i, j) = 1 and

$$ \lim_{\varepsilon \rightarrow 0}\Pr(W_{t + N} = u(j) \vert W_t = v(i, j, n - 1)) > 0 $$
(8)

for all \(t \in \{0, 1, \ldots\}\) and some finite positive integer N.

Definition 4

A connection from chromosome i to chromosome j is said to exist if and only if there exists a sequence \((i_1,\ldots, i_N)\) such that \(i_1, \ldots, i_N \in \mathcal{C}, i_1 = i,i_N = j,\) and i M is directly connected to i M+1 for all \(M\in \{1, \ldots, N - 1\}.\)

Definition 3 states that there is a direct connection from chromosome i to chromosome j if and only if the minimum number of mutations required to go from uniform population u(i) to uniform population u(j) is one. We note that in many cases all chromosomes i and j such that δ(i, j) = 1 have mutual direct connections. This is, for example, the case if a GA uses roulette wheel selection and the fitness of a chromosome is always positive. Definition 4 states that there is a connection from chromosome i to chromosome j if and only if there is a sequence of chromosomes starting at i and ending at j such that each chromosome in the sequence is directly connected to its successor. Clearly, if all chromosomes i and j such that δ(i, j) = 1 have mutual direct connections, then each chromosome is connected to all other chromosomes.

It is well known that the population in the current iteration of a GA has no effect on the behavior of the GA in the long run (e.g., Dawid 1996; Nix and Vose 1992). More specifically, the population an infinite number of iterations in the future is statistically independent of the population in the current iteration. The following lemma states this result in a formal way:

Lemma 1

For each population \(w \in \mathcal{W},\) there exists a long-run probability \(\bar{q}(w)\) such that

$$ \lim_{N \rightarrow \infty}\Pr(W_{t + N} = w \vert W_t = w_t) = \bar{q}(w) $$
(9)

for all \(t \in \{0, 1, \ldots\}\) and all \(w_t \in\mathcal{W}.\)

Proof

See the Appendix.

In our analysis, we are concerned with the long-run behavior of GAs in the limit as the mutation rate \(\varepsilon\) approaches zero. We therefore use the following definition:

Definition 5

For \(w \in \mathcal{W}, \hat{q}(w) =\lim_{\varepsilon \rightarrow 0}\bar{q}(w)\) is called the long-run limit probability of population w.

We now introduce the vectors and matrices that we need to state the main result of our analysis. We first note that throughout this paper vectors and matrices are represented by, respectively, bold lowercase and bold uppercase letters and that the transpose of a matrix X is written as X T. I N denotes an identity matrix of order N × N, and 0 M × N and 1 M × N denote matrices of order M × N in which all elements are equal to, respectively, zero and one. We simply write I0, or 1 when the order of a matrix is clear from the context. g = [g k ] and h = [h k ] denote vectors of length ν that satisfy

$$ \forall k: g_k, h_k \in {\mathcal{C}} $$
(10)
$$ \forall k: h_k \in {\mathcal{H}}(g_k) $$
(11)
$$ \forall k, k^{\prime}: k \neq k^{\prime} \Rightarrow (g_k, h_k) \neq (g_{k^{\prime}}, h_{k^{\prime}}). $$
(12)

Hence, for each k, (g k h k ) denotes a combination of two chromosomes such that the binary encodings of the chromosomes differ by exactly one bit. \(\mathbf{g}\) and \(\mathbf{h}\) together contain all such combinations of two chromosomes. \(\mathbf{A}, \mathbf{B},\mathbf{C},\) and \(\mathbf{D}\) denote matrices of order μ × ξ, ξ ×  μ, ξ × ξ, and μ ×  μ, respectively. Matrix \(\mathbf{A}\) is given by

$$ {\mathbf{A}}=\left[\begin{array}{lll} {\mathbf{a}}(0, 1)& \cdots & {\mathbf{a}}(0, \nu)\\ \vdots& \ddots&\vdots\\ {\mathbf{a}}(\mu - 1, 1)&\cdots&{\mathbf{a}}(\mu - 1, \nu) \end{array}\right] $$
(13)

where

$$ {\mathbf{a}}(i, k)= \left\{\begin{array}{ll}{\tilde{\mathbf{a}}_1},& \hbox{if}\; g_k = i\\ {{\tilde{\mathbf{a}}}_2}, & \hbox{if}\; h_k = i\\ {\mathbf{0}}_{1 \times (n - 1)}, & \hbox{otherwise} \end{array}\right. $$
(14)

and

$$ {{\tilde{\mathbf{a}}}_1}=\left[\begin{array}{ll}{\mathbf{0}}_{1 \times (n - 2)} & 1\end{array}\right], \quad {{\tilde{\mathbf{a}}}_2}= \left[\begin{array}{ll}1 & {\mathbf{0}}_{1 \times (n - 2)}\end{array}\right]. $$
(15)

Matrix \(\mathbf{B}\) is given by

$$ {\mathbf{B}}=\left[\begin{array}{lll}{\mathbf{b}}(1, 0) &\cdots & {\mathbf{b}}(1, \mu - 1) \\ \vdots& \ddots & \vdots\\ {\mathbf{b}}(\nu, 0) & \cdots & {\mathbf{b}}(\nu, \mu - 1) \end{array}\right] $$
(16)

where

$$ {\mathbf{b}}(k, i)=\left\{\begin{array}{ll}{{\tilde{\mathbf{b}}}}(g_k, h_k, n), & \hbox{if}\; g_k=i \\ {{\tilde{\mathbf{b}}}}(g_k, h_k, 0), & \hbox {if}\; h_k = i\\ {\mathbf{0}}_{(n-1) \times 1}, & \hbox{otherwise} \end{array}\right. $$
(17)

and

$$ {{\tilde{\mathbf{b}}}}(i, j, \lambda) =\left[\begin{array}{l} \pi(i, j, 1, \lambda)\\ \vdots\\ \pi(i, j, n - 1, \lambda) \end{array}\right]. $$
(18)

Matrix \(\mathbf{C}\) is given by

$$ {\mathbf{C}} =\left[\begin{array}{lll} {\mathbf{C}}(1, 1) & \cdots & {\mathbf{C}}(1, \nu)\\ \vdots&\ddots&\vdots\\ {\mathbf{C}}(\nu, 1)&\cdots&{\mathbf{C}}(\nu, \nu)\end{array}\right] $$
(19)

where

$$ {\mathbf{C}}(k, k^{\prime}) =\left\{\begin{array}{ll} {{\widetilde{\mathbf{C}}}}(g_k, h_k), &\hbox{if}\; k=k^{\prime}\\ {\mathbf{0}}_{(n-1) \times (n-1)}, & \hbox{otherwise}\end{array}\right. $$
(20)

and

$$ {{\widetilde{\mathbf{C}}}}(i, j)= \left[\begin{array}{lll} \pi(i, j, 1, 1)& \cdots & \pi(i, j, 1, n - 1)\\ \vdots&\ddots&\vdots\\ \pi(i, j, n - 1, 1)& \cdots& \pi(i, j, n - 1, n - 1) \end{array}\right]. $$
(21)

Matrix D is obtained from AB, and C and is given by

$$ {\mathbf{D}} = {\mathbf{A}}{({\mathbf{I}} - {\mathbf{C}}) ^ {-1}}{\mathbf{B}} - m{\mathbf{I}}. $$
(22)

The following theorem states the main result of our analysis:

Theorem 1

Let all non-uniform populations be almost uniform, and let each chromosome in \(\mathcal{C}\) be connected to all other chromosomes in \(\mathcal{C}.\) Then, (1) all non-uniform populations have a long-run limit probability of zero, that is, \(\hat{q}(w) = 0\) for all \(w \in \mathcal{W} \setminus\mathcal{U},\) and (2) the long-run limit distribution \({{\hat{\mathbf{q}}}}=\left[\begin{array}{lll}\hat{q}(u(0)) \cdots \hat{q}(u(\mu - 1))\end{array}\right]\) satisfies

$$ {{\hat{\mathbf{q}}}}{\mathbf{D}}={\mathbf{0}} $$
(23)
$$ {{\hat{\mathbf{q}}}}{\mathbf{1}}=1 $$
(24)

which has a unique solution.

Proof

See the Appendix.

There are three comments that we would like to make on the above theorem. First, the result that under certain assumptions non-uniform populations have a long-run limit probability of zero is not new. A similar result can be found in Dawid (1996, Proposition 4.2.1). Second, under the assumptions of the theorem, the long-run limit probability of a population does not depend on the crossover rate γ. This is a quite remarkable result that, to the best of our knowledge, has not been reported before in the theoretical literature on GAs. It indicates that in the limit as the mutation rate \(\varepsilon\) approaches zero γ has no effect on the long-run behavior of a GA. Third, the theorem can be used to calculate the long-run limit distribution \({\hat{\mathbf{q}}}\) only if the probabilities \(\pi(i, j, \lambda, \lambda^{\prime})\) defined in (6) can be calculated for all i and all j such that δ(i, j) = 1 and for all \(\lambda \in \{1, \ldots, n - 1\}\) and all \(\lambda^{\prime} \in \{0, \ldots, n\}.\) Whether this is possible depends on the way in which the fitness of a chromosome is determined and on the selection operator that is used. This in turn depends heavily on the specific problem that one wants to model using a GA. Because of the dependence on the problem to be modeled, we cannot provide any general results for the calculation of the probabilities \(\pi(i, j, \lambda, \lambda^{\prime}).\) In Sect. 4, however, we demonstrate how the probabilities \(\pi(i, j, \lambda,\lambda^{\prime})\) can be calculated for a GA that is similar to the GA used by Axelrod in his seminal paper on GA modeling (Axelrod 1987).

3 Algorithm

In this section, we present an algorithm for calculating the long-run limit distribution \({\hat{\mathbf{q}}}.\) The algorithm is based on Theorem 1. Like Theorem 1, it assumes that all non-uniform populations are almost uniform and that each chromosome in \(\mathcal{C}\) is connected to all other chromosomes in \(\mathcal{C}.\) It also assumes that the probabilities \(\pi(i, j,\lambda, \lambda^{\prime})\) defined in (6) can be calculated for all i and all j such that δ(i, j) = 1 and for all \(\lambda\in \{1, \ldots, n - 1\}\) and all \(\lambda^{\prime} \in \{0, \ldots,n\}.\)

The most straightforward approach to calculating the long-run limit distribution \({\hat{\mathbf{q}}}\) would be to start with calculating the matrices \(\mathbf{A}, \mathbf{B},\) and \(\mathbf{C}\) using (13)–(21). Matrix \(\mathbf{D}\) would then be calculated using (22), which would require solving a linear system. Finally, \({\hat{\mathbf{q}}}\) would be obtained by solving the linear system given by (23) and (24). Unfortunately, this approach to calculating \({\hat{\mathbf{q}}}\) requires a lot of computer memory and is therefore infeasible even for problems of only moderate size. Most memory is required for storing matrix \(\mathbf{C}.\) This matrix has (at most) ν (n − 1)2 = (n − 1)2 m2m−1 non-zero elements. Clearly, as the population size n and the chromosome length m increase, storing the non-zero elements of \(\mathbf{C}\) in a computer’s main memory soon becomes infeasible. The algorithm that we propose for calculating \({\hat{\mathbf{q}}}\) exploits the sparsity of the matrices \(\mathbf{A}, \mathbf{B},\) and \(\mathbf{C}\) in order to calculate matrix \(\mathbf{D}\) in a memory-efficient way. The algorithm does not require the entire matrices \(\mathbf{A},\mathbf{B},\) and \(\mathbf{C}\) to be stored in memory. The algorithm also solves the linear system given by (23) and (24) in a memory-efficient way. This is achieved by exploiting the sparsity of \(\mathbf{D}.\) The algorithm is shown in Fig. 2. We now discuss it in more detail.

Fig. 2
figure 2

Algorithm for calculating the long-run limit distribution of a genetic algorithm

We first consider the efficient calculation of matrix \(\mathbf{D}.\) Let \({\widehat{\mathbf{C}}} = (\mathbf{I} - \mathbf{C}) ^ {-1}.\) Because \(\mathbf{C}\) is a block diagonal matrix, \({\widehat{\mathbf{C}}}\) can be written as

$$ {{\widehat{\mathbf{C}}}}=\left[\begin{array}{lll} {{\widehat{\mathbf{C}}}}(1, 1) & \cdots & {{\widehat{\mathbf{C}}}}(1, \nu)\\ \vdots& \ddots&\vdots\\ {{\widehat{\mathbf{C}}}}(\nu, 1)&\cdots&{{\widehat{\mathbf{C}}}}(\nu, \nu) \end{array}\right] $$
(25)

where

$$ {{\widehat{\mathbf{C}}}}(k, k^{\prime})=\left\{\begin{array}{ll}({\mathbf{I}} - {{\widetilde{\mathbf{C}}}}(g_k, h_k)) ^ {-1}, & \hbox {if \;$k = k^{\prime}$} \\ {\mathbf{0}}_{(n - 1) \times (n - 1)}, & \hbox {otherwise}.\end{array}\right. $$
(26)

Hence, \({\widehat{\mathbf{C}}}\) is a block diagonal matrix too. Let \(\mathbf{D}\) be written as

$$ {\mathbf{D}}=\left[\begin{array}{lll} d(0, 0)& \cdots & d(0, \mu - 1)\\ \vdots&\ddots&\vdots\\ d(\mu - 1, 0)&\cdots & d(\mu-1, \mu-1) \end{array}\right]. $$
(27)

Taking into account the sparsity of \(\mathbf{A}, \mathbf{B},\) and \({\widehat{\mathbf{C}}},\) it can be seen that

$$ d(i, j)= \left\{\begin{array}{ll}\sum_{i^{\prime} \in {\mathcal{G}}(i)}{{\tilde{\mathbf{a}}}_2}{{\tilde{\mathbf{e}}}}(i^{\prime}, i, 0) &\\\quad + \sum_{i^{\prime} \in {\mathcal{H}}(i)}{{\tilde{\mathbf{a}}}_1}{{\tilde{\mathbf{e}}}}(i, i^{\prime}, n) - m, & \hbox{if}\; i=j \\ {{\tilde{\mathbf{a}}}_2}{{\tilde{\mathbf{e}}}}(j, i, n), & \hbox{if}\; j \in {\mathcal{G}}(i)\\ {{\tilde{\mathbf{a}}}_1}{{\tilde{\mathbf{e}}}}(i, j, 0), & \hbox{if}\; j \in {\mathcal{H}}(i)\\ 0, & \hbox{otherwise}\end{array}\right. $$
(28)

where

$$ {{\tilde{\mathbf{e}}}}(i, j, \lambda) = {({\mathbf{I}} - {{\widetilde{\mathbf{C}}}}(i, j)) ^ {-1}}{{\tilde{\mathbf{b}}}}(i, j, \lambda). $$
(29)

This result shows that each non-zero element of \(\mathbf{D}\) can be calculated by solving one or more relatively small linear systems, that is, systems of n − 1 equations and unknowns. Moreover, by calculating the elements of \(\mathbf{D}\) one by one, there is no need to store the entire matrices \(\mathbf{A}, \mathbf{B},\) and \(\mathbf{C}\) in memory. Solving a linear system of n − 1 equations and unknowns can be done using standard Gaussian elimination methods. Except for very large values for the population size n, today’s computers have sufficient main memory to apply Gaussian elimination methods to such systems. We further note that the amount of computation required for obtaining \(\mathbf{D}\) can be reduced by taking into account that

$$ \begin{aligned} {{\tilde{\mathbf{e}}}}(i, j, 0)&={({\mathbf{I}} - {{\widetilde{\mathbf{C}}}}(i, j)) ^ {-1}}{{\tilde{\mathbf{b}}}}(i, j, 0)\\ &={({\mathbf{I}} - {{\widetilde{\mathbf{C}}}}(i, j)) ^ {-1}}\left({\mathbf{1}} - \textstyle\sum\limits_{\lambda = 1}^n{{\tilde{\mathbf{b}}}}(i, j, \lambda)\right)\\ &={({\mathbf{I}} - {{\widetilde{\mathbf{C}}}}(i, j)) ^ {-1}}({\mathbf{1}} - {{\widetilde{\mathbf{C}}}}(i, j){\mathbf{1}} - {{\tilde{\mathbf{b}}}}(i, j, n)) \\ &={({\mathbf{I}} - {{\widetilde{\mathbf{C}}}}(i, j)) ^ {-1}}({\mathbf{I}} - {{\widetilde{\mathbf{C}}}}(i, j)){\mathbf{1}} - {{\tilde{\mathbf{e}}}}(i, j, n)\\ &={\mathbf{1}} - {{\tilde{\mathbf{e}}}}(i, j, n). \end{aligned} $$
(30)

Because of this, d(ij) can be written as

$$ d(i,j)=\left\{\begin{array}{ll}\sum_{i^{\prime} \in {\mathcal{G}}(i)}(1 - {{\tilde{\mathbf{a}}}_2}{{\tilde{\mathbf{e}}}}(i^{\prime}, i, n))&\\\quad +\sum_{i^{\prime}\in{\mathcal{H}}(i)}{{\tilde{\mathbf{a}}}_1}{{\tilde{\mathbf{e}}}}(i, i^{\prime}, n) -m, & \hbox{if}\; i=j\\ {{\tilde{\mathbf{a}}}_2}{{\tilde{\mathbf{e}}}}(j, i, n), & \hbox{if}\; j \in {\mathcal{G}}(i)\\ 1 - {{\tilde{\mathbf{a}}}_1}{{\tilde{\mathbf{e}}}}(i, j, n), & \hbox {if}\; j \in {\mathcal{H}}(i)\\ 0, & \hbox{otherwise.}\end{array}\right. $$
(31)

Using (31) rather than (28) to calculate \(\mathbf{D}\) halves the number of linear systems that need to be solved. In the algorithm in Fig. 2, the calculation of \(\mathbf{D}\) based on (31) is performed between lines 3 and 3.

Matrix \(\mathbf{D}\) has μ2 = 22m elements. Consequently, storing all elements of \(\mathbf{D}\) in a computer’s main memory is possible only if the chromosome length m is not too large. It follows from (28) and (31) that the number of non-zero elements in \(\mathbf{D}\) equals μ(m + 1) = (m + 1)2m. Hence, \(\mathbf{D}\) is a rather sparse matrix and a lot of memory can be saved by storing only its non-zero elements.Footnote 2 In addition to the memory efficiency of the way in which \(\mathbf{D}\) is stored, one should also pay attention to the memory efficiency of the method that is used to solve the linear system given by (23) and (24). Gaussian elimination and other direct (i.e., non-iterative) methods for solving linear systems generally require that at least a large number of elements of the coefficient matrix, including zero elements, are stored in memory. Consequently, when using such a method to solve the linear system given by (23) and (24), it would not be possible to fully exploit the sparsity of \(\mathbf{D}.\) Linear systems can also be solved using iterative methods that require only the non-zero elements of the coefficient matrix to be stored in memory. One such method is the method of successive overrelaxation (e.g., Barrett et al. 1994; Stewart 1994; Tijms 1994, 2003). In the algorithm in Fig. 2, this method is used to solve the linear system given by (23) and (24) (see lines 3–3 of the algorithm). In addition to an initial guess \({\hat{\mathbf{q}}_0}\) for the solution of the linear system, the method of successive overrelaxation also requires a value for the relaxation factor ω. The value of ω, which should be between 0 and 2, may have a large effect on the rate of convergence of the method, and for some values of ω the method may not converge at all. An appropriate value for ω has to be determined experimentally. For ω = 1, the method of successive overrelaxation reduces to the Gauss-Seidel method, which is another iterative method for solving linear systems. We refer to Stewart (1994) for an in-depth discussion of both the method of successive overrelaxation and a number of alternative methods for solving linear systems similar to the one given by (23) and (24). We further note that the amount of main memory in most of today’s computers allows the algorithm in Fig. 2 to be run for chromosomes with length m up to about 24 bits.

4 Application

In this section, we demonstrate an application of the algorithm presented in the previous section. We study the use of a GA for modeling the evolution of strategies in IPDs. The use of GAs in this context was first studied by Axelrod (1987) (reprinted in Axelrod 1997; see also Dawid 1996; Mitchell 1996) and after him by many others (e.g., Ashlock et al. 1996, 2006; Crowley et al. 1996; Ishibuchi and Namikawa 2005; Miller 1996; Mühlenbein 1991; Thibert-Plante and Charbonneau 2007; Van Bragt et al. 2001; Yao and Darwen 1994). The algorithm presented in the previous section is used to analyze the long-run behavior of our GA. The results of the analysis are compared with results obtained using computer simulations (i.e., results obtained simply by running the GA). We emphasize that our primary aim is merely to illustrate the usefulness of the mathematical analysis provided in Sect. 2 and of the algorithm derived from the analysis in Sect. 3. It is not our primary aim to provide new insights into the behavior of GAs in the context of IPDs.

4.1 Genetic algorithm modeling in iterated prisoner’s dilemmas

The way in which we model the evolution of strategies in IPDs is similar to the way in which this was done by Axelrod (1987). However, Axelrod studied two approaches for modeling the evolution of strategies. In one approach, the fitness of a chromosome is determined by the performance of the chromosome in IPD games against a fixed set of opponents. In the other approach, the fitness of a chromosome is determined by the performance of the chromosome in IPD games against other chromosomes in the population. We restrict our attention to the second approach. This is the approach on which almost all studies after Axelrod’s work have focused (an exception is Mittal and Deb 2006).

We model the evolution of strategies in IPDs using a GA with a population size of n = 20 chromosomes. Each chromosome represents a strategy for playing IPD games. Players in IPD games are assumed to choose the action they play, that is, whether they cooperate or defect, based on their own actions and their opponent’s actions in the previous τ periods of the game, where τ is referred to as players’ memory length. Players are further assumed to play only pure strategies. We use the same binary encoding of strategies as was used by Axelrod (1987). For a description of this encoding, we refer to Axelrod (1987, 1997), Dawid (1996), and Mitchell (1996). Using Axelrod’s encoding, the chromosome length m depends on the memory length τ. We consider three memory lengths, 1, 2, and 3 periods, which result in chromosome lengths of, respectively, 6, 20, and 70 bits. In each iteration of the GA, each chromosome in the population plays an IPD game of 151 periods against all other chromosomes. In addition, each chromosome also plays a game against itself. The payoff matrix for a single period of an IPD game is shown in Table 2. The payoffs in this matrix must satisfy

$$ S < P < R < T $$
(32)

and

$$ S + T < 2R. $$
(33)

The payoff obtained by a chromosome in an IPD game equals the mean payoff obtained by the chromosome in all periods of the game. The fitness f of a chromosome equals the mean payoff obtained by the chromosome in the IPD games that it has played in the current iteration of the GA. Like in Axelrod’s work (1987), we use sigma scaling (e.g., Mitchell 1996) to normalize the fitness of a chromosome. The normalized fitness \(\tilde{f}\) of a chromosome is given by

$$ \tilde{f}=\left\{\begin{array}{ll}\max\left({{f - \mu_{\rm f}}\over {\sigma_{\rm f}}} + 1, 0\right), & \hbox{if}\; \sigma_{\rm f} > 0 \\ 1, & \hbox{otherwise}\end{array}\right. $$
(34)

where μ f and σf denote, respectively, the mean and the standard deviation of the fitness of the chromosomes in the population. The selection operator that we use is roulette wheel selection. Selection is performed based on the normalized fitness of the chromosomes in the population. The crossover operator that we use is single-point crossover.

Table 2 Payoff matrix for a single period of an iterated prisoner’s dilemma game

4.2 Calculation of the long-run limit distribution of the genetic algorithm

In this subsection, we are concerned with the calculation of the long-run limit distribution of the GA discussed in the previous subsection. To calculate the long-run limit distribution of the GA, we use the algorithm presented in Sect. 3. This algorithm assumes that the probabilities \(\pi(i, j, \lambda, \lambda^{\prime})\) defined in (6) can be calculated for all i and all j such that δ(i, j) = 1 and for all \(\lambda \in \{1, \ldots, n - 1\}\) and all \(\lambda^{\prime} \in \{0, \ldots, n\}.\) We now discuss the calculation of the probabilities \(\pi(i, j, \lambda,\lambda^{\prime})\) for our GA. For \(i^{\prime}, j^{\prime} \in\mathcal{C},\) let \(\varphi(i^{\prime}, j^{\prime})\) denote the payoff obtained by chromosome \(i^{\prime}\) in an IPD game against chromosome \(j^{\prime}.\) Suppose that the population in the current iteration of our GA equals v(ij, λ), where i and j satisfy δ(i, j) = 1 and where \(\lambda \in \{1, \ldots, n-1\}.\) That is, the population in the current iteration of our GA consists of λ times chromosome i and n − λ times chromosome j. The fitness f i of chromosome i is then given by

$$ f_i=\frac{\lambda\varphi(i, i) + (n - \lambda)\varphi(i, j)}{n}. $$
(35)

Similarly, the fitness f j of chromosome j is given by

$$ f_j=\frac{\lambda\varphi(j, i) + (n - \lambda)\varphi(j, j)}{n}.$$
(36)

Furthermore, the mean μ f and the standard deviation σf of the fitness of the chromosomes in the population are equal to, respectively,

$$ \mu_{\rm f}=\frac{{\lambda}f_i + (n - \lambda)f_j}{n}$$
(37)

and

$$ \sigma_{\rm f} = \sqrt{\frac{\lambda{(f_i - \mu_{\rm f}) ^ 2} + (n - \lambda){(f_j - \mu_{\rm f}) ^ 2}}{n}}.$$
(38)

The normalized fitness \(\tilde{f}_i\) of chromosome i is obtained by substituting f i ,  μ f , and σ f into (34). The normalized fitness \(\tilde{f}_j\) of chromosome j is obtained in a similar way. Let \(\tilde{\pi}_i\) and \(\tilde{\pi}_j\) denote the probabilities that the roulette wheel selection operator selects, respectively, chromosome i and chromosome j. Obviously, \(\tilde{\pi}_i\) and \(\tilde{\pi}_j\) equal

$$ \tilde{\pi}_i=\frac{\lambda\tilde{f}_i}{\lambda\tilde{f}_i + (n - \lambda)\tilde{f}_j}, \quad \tilde{\pi}_j =\frac{(n - \lambda)\tilde{f}_j}{\lambda\tilde{f}_i +(n-\lambda)\tilde{f}_j}. $$
(39)

\(\pi(i, j, \lambda, \lambda^{\prime}),\) where \(\lambda^{\prime} \in\{0, \ldots, n\},\) equals the probability that the roulette wheel selection operator turns population v(ij, λ) into population \(v(i, j, \lambda^{\prime})\) in a single iteration of our GA. Taking into account that the roulette wheel selection operator selects chromosomes independently of each other, it can be seen that \(\pi(i, j, \lambda, \lambda^{\prime})\) equals the probability mass function of a binomial distribution and is given by

$$ \pi(i, j, \lambda, \lambda^{\prime}) = \left( {\begin{array}{*{20}c} n \\ {\lambda ^{'} } \\ \end{array} } \right) {\tilde{\pi}_i{{} ^ {\lambda^{\prime}}}}{\tilde{\pi}_j{{}^{n - \lambda^{\prime}}}} $$
(40)

where the binomial coefficient \( \left( {\begin{array}{*{20}c} n \\ {\lambda ^{'} } \\ \end{array} } \right) \) is defined as

$$ \left( {\begin{array}{*{20}c} n \\ {\lambda ^{'} } \\ \end{array} } \right) = \frac{n!}{\lambda^{\prime}!(n - \lambda^{\prime})!}. $$
(41)

The algorithm presented in Sect. 3 also assumes that all non-uniform populations are almost uniform and that each chromosome in \(\mathcal{C}\) is connected to all other chromosomes in \(\mathcal{C}.\) Because of the use of roulette wheel selection, the assumption that all non-uniform populations are almost uniform is satisfied. The assumption that each chromosome in \(\mathcal{C}\) is connected to all other chromosomes in \(\mathcal{C}\) is satisfied if and only if matrix \(\mathbf{D}\) calculated in lines 3–3 of the algorithm in Fig. 2 is irreducible. (\(\mathbf{D} =\left[\begin{array}{l}d(i, j)\end{array}\right]\) is said to be irreducible if and only if there does not exist a non-empty set of chromosomes \(\widetilde{\mathcal{C}} \subset \mathcal{C}\) such that d(i, j) = 0 for all \(i \in \widetilde{\mathcal{C}}\) and all \(j \in\mathcal{C} \setminus \widetilde{\mathcal{C}}.\)) For the particular values that we use for the parameters SPRT, and τ (see the next subsection), \(\mathbf{D}\) turns out to be irreducible. Hence, the assumption that each chromosome in \(\mathcal{C}\) is connected to all other chromosomes in \(\mathcal{C}\) is satisfied.

4.3 Analysis of the long-run behavior of the genetic algorithm

In this subsection, we analyze the long-run behavior of our GA for the prisoner’s dilemma payoffs S = 0, P = 1, R = 3, and T = 5. These are the same payoffs as were used by Axelrod (1987) (see also Axelrod 1984) and by many others. The analysis is performed using the algorithm presented in Sect. 3. The use of this algorithm to analyze the long-run behavior of our GA was discussed in the previous subsection. We compare the results obtained using the algorithm with results obtained using computer simulations (i.e., results obtained simply by running the GA).Footnote 3 The parameter settings that we use are summarized in Table 3.

Table 3 Genetic algorithm parameter settings

The long-run limit distribution for a memory length of τ = 1 period is shown in Fig. 3 (in dark gray). The distribution was calculated using the algorithm from Sect. 3. As mentioned earlier, τ = 1 results in a chromosome length of m = 6 bits. This implies that there are μ = 2m = 64 different chromosomes and, as a consequence, that there are 64 different uniform populations. The long-run limit distribution is a probability distribution over these populations. As can be seen in Fig. 3, the long-run limit distribution spreads most of its mass over approximately 15 populations. It puts almost no mass on the remaining populations. Since all chromosomes in a uniform population are identical and represent the same strategy, the long-run limit distribution can be used to determine the long-run limit probability that a particular strategy is played. However, when doing so, it should be noted that there is some redundancy in the binary encoding of strategies that we use (as was already pointed out by Axelrod (1987). Due to this redundancy, it is possible that different chromosomes represent the same strategy. Some strategies can be encoded in two or three different ways, and the strategies always cooperate and always defect can even be encoded in twelve different ways. Taking into account the redundancy in the encoding, we have calculated the long-run limit probabilities of all possible strategies. The six strategies with the highest long-run limit probability are reported in Table 4. Together, these strategies have a long-run limit probability of almost 0.95. The remaining strategies all have very low long-run limit probabilities. It is sometimes claimed (e.g., Axelrod 1984, 1987) that a very effective strategy for playing IPD games is the tit-for-tat strategy, which is the strategy of cooperating in the first period and repeating the opponent’s previous action thereafter. The results reported in Table 4 do not really support this claim. As can be seen in the table, the always defect strategy has by far the highest long-run limit probability. In the long run, this strategy is played about 43% of the time. The tit-for-tat strategy has a long-run limit probability of no more than 0.14. This is even slightly less than the long-run limit probability of another cooperative strategy, namely the strategy that keeps cooperating until the opponent defects and then keeps defecting forever.

Fig. 3
figure 3

The long-run limit distribution calculated using the algorithm presented in Sect. 3 (in dark gray) and a probability distribution over the uniform populations obtained using computer simulations (in light grey). The memory length τ equals 1. On the horizontal axis, integers between 0 and 63 are used to represent the uniform populations. Integer i represents the uniform population consisting of 20 times chromosome i

Table 4 The six strategies with the highest long-run limit probability (reported in the first column)

In order to check the correctness of the algorithm presented in Sect. 3, we have also used computer simulations to analyze the long-run behavior of our GA. In other words, we have also analyzed the long-run behavior of our GA simply by running the GA. Like above, we first focus on the behavior of the GA for a memory length of τ = 1 period. We performed 500 runs of the GA. The crossover rate was set to γ = 1.0, and the mutation rate was set to \(\varepsilon = 10 ^{-5}.\) Because of the very small value of \(\varepsilon,\) the simulation results should be similar to the results obtained using the algorithm from Sect. 3. (Recall that the latter results hold in the limit as \(\varepsilon\) approaches zero.) Each run of the GA lasted \(2 \times 10 ^ 5\) iterations. This seemed sufficient for the GA to reach its steady state. After the last iteration of a GA run, we almost always observed that the population was uniform. Based on the 500 GA runs that we had performed, we determined for each uniform population the probability of observing that population at the end of a GA run. In this way, we obtained a probability distribution over the uniform populations. This distribution is shown in Fig. 3 (in light gray). Figure 3 allows us to compare the distribution with the long-run limit distribution calculated using the algorithm from Sect. 3. It can be seen that the two distributions are very similar. This confirms the correctness of the algorithm presented in Sect. 3.

In order to examine to what extent our GA results in the evolution of cooperative strategies, we now focus on the long-run mean fitness, that is, the mean fitness of a chromosome after a large number of iterations of the GA. For various values of the memory length τ, the crossover rate γ, and the mutation rate \(\varepsilon,\) the long-run mean fitness obtained using computer simulations is reported in Table 5. The associated 95% confidence interval is also provided in the table. The simulation results for τ = 1 are based on 500 runs of the GA, and the simulation results for τ = 2 and τ = 3 are based on 200 runs. (Simulation runs with τ = 2 and τ = 3 took more computing time than simulation runs with τ = 1. We therefore performed a smaller number of runs with τ = 2 and τ = 3 than with τ = 1.) Each run lasted \(2 \times 10 ^ 5\) iterations. The long-run mean fitness was calculated by taking the average over all GA runs of the mean fitness of a chromosome at the end of a run. In the limit as \(\varepsilon\) approaches zero, the long-run mean fitness can be calculated exactly and does not depend on γ. The calculation of the long-run mean fitness is based on the long-run limit distribution of the GA, which can be obtained using the algorithm presented in Sect. 3. For τ = 1 and τ = 2, the long-run mean fitness in the limit as \(\varepsilon\) approaches zero is reported in Table 5. For τ = 3, we cannot calculate the long-run limit distribution of the GA and we therefore do not know the long-run mean fitness in the limit as \(\varepsilon\) approaches zero. Calculating the long-run limit distribution of the GA is impossible for τ = 3 because the chromosome length equals m = 70 bits and because for such a chromosome length storing the long-run limit distribution requires a prohibitive amount of computer memory.

Table 5 Long-run mean fitness and associated 95% confidence interval for various values of the memory length τ, the crossover rate γ, and the mutation rate \(\varepsilon\)

Based on the results in Table 5, a number of observations can be made. First, for τ = 1 and τ = 2, the results obtained for \(\varepsilon = 10 ^ {-4}\) and \(\varepsilon = 10 ^ {-5}\) turn out to be very similar to the results obtained for \(\varepsilon \rightarrow0.\) This again confirms the correctness of the algorithm presented in Sect. 3. Second, for τ = 1, we find that the results are quite sensitive to the value of \(\varepsilon.\) Studies on GA modeling sometimes report that the long-run behavior of a GA is relatively insensitive to the value of \(\varepsilon.\) Our results demonstrate that this need not always be the case. Third, for small values of \(\varepsilon,\) it can be seen that increasing τ leads to a higher long-run mean fitness and, hence, to more cooperation. The evolution of cooperative strategies in IPD games therefore seems more likely when players have longer memory lengths. Finally, it can be observed that the value of γ has no significant effect on our results. This is in line with the mathematical analysis provided in Sect. 2. The mathematical analysis implies that for \(\varepsilon \rightarrow 0\) the long-run mean fitness is independent of γ. The results in Table 5 indicate that this is the case not only for \(\varepsilon \rightarrow 0\) but more generally.

5 Conclusions

In this paper, we have presented a mathematical analysis of the long-run behavior of GAs that are used for modeling social phenomena. Under the assumption of a positive but infinitely small mutation rate, the analysis provides a full characterization of the long-run behavior of GAs with a binary encoding. Based on the analysis, we have derived an algorithm for calculating the long-run behavior of GAs. In an economic context, the algorithm can for example be used to determine whether convergence to an equilibrium will take place and, if so, what kind of equilibrium will emerge. Compared with computer simulations, the main advantage of the algorithm that we have derived is that it calculates the long-run behavior of GAs exactly. Computer simulations only estimate the long-run behavior of GAs.

To demonstrate the usefulness of our mathematical analysis, we have replicated a well-known study by Axelrod in which a GA is used to model the evolution of strategies in IPDs (Axelrod 1987). We have used both our exact algorithm and computer simulations to replicate Axelrod’s study. By comparing the results of the two approaches, we have confirmed the correctness of our algorithm. We have also obtained some interesting new insights. For example, when players have a memory length of one period, the tit for tat strategy turns out to be less important than is sometimes claimed (e.g., Axelrod 1984, 1987). In the long run, the strategy is played only 14% of the time. Another finding is that the long-run behavior of a GA can be quite sensitive to the value of the mutation rate. We regard this as a serious problem, since the value of the mutation rate is typically chosen in a fairly arbitrary way without any empirical justification (see also Dawid 1996).

The mathematical analysis that we have presented also reveals that if the mutation rate is infinitely small the crossover rate has no effect on the long-run behavior of a GA. This remarkable result is perfectly in line with the simulation results that we have reported in Sect. 4. For various values of the mutation rate, the simulation results show no significant effect of the crossover rate on the long-run behavior of a GA. Hence, when GAs are used for modeling social phenomena, the crossover rate seems to be a rather unimportant parameter, at least when the focus is on the long run (for the short run, see Thibert-Plante and Charbonneau 2007). It seems likely that in many cases leaving out the crossover operator altogether has no significant effect on the long-run behavior of a GA. Interestingly, leaving out the crossover operator brings GAs quite close to well-known models in evolutionary game theory, such as those studied in Kandori et al. (1993) and Vega-Redondo (1997).

Finally, we note that an analysis such as the one presented in this paper can be performed not only for GAs with a binary encoding but also for other types of evolutionary algorithms. From a modeling point of view, a binary encoding in many cases has the disadvantage that it lacks a clear interpretation (e.g., Dawid 1996). The use of a binary encoding can therefore be difficult to justify and may even lead to artifacts (as shown in Waltman and Van Eck 2009; Waltman et al. 2011). Probably for these reasons, some researchers use evolutionary algorithms without a binary encoding (e.g., Haruvy et al. 2006; Lux and Schornstein 2005). The analysis presented in this paper then does not directly apply. However, when the action space of agents is assumed discrete, the long-run behavior of evolutionary algorithms without a binary encoding can still be analyzed in a similar way as we have done in this paper, namely by relying on mathematical results provided by Freidlin and Wentzell (1998). This indicates that our approach is quite general and can be adapted relatively easily to other types of evolutionary algorithms.