Abstract
Quality diversity (QD) is a branch of evolutionary computation that has gained increasing interest in recent years. The Map-Elites QD approach defines a feature space, i.e., a partition of the search space, and stores the best solution for each cell of this space. We study a simple QD algorithm in the context of pseudo-Boolean optimisation on the “number of ones” feature space, where the ith cell stores the best solution amongst those with a number of ones in \([(i-1)k, ik-1]\), for a granularity parameter \(1 \le k \le n+1\). We give a tight bound on the expected time until all cells are covered, for arbitrary fitness functions and all k, and analyse the expected optimisation time of QD on OneMax and other problems whose structure aligns favourably with the feature space. On combinatorial problems we show that QD efficiently finds a \({(1-1/e)}\)-approximation when maximising any monotone submodular function under a single uniform cardinality constraint. Defining the feature space as the number of connected components of an edge-weighted graph, we show that QD finds a minimum spanning forest in expected polynomial time. We further consider QD’s performance on classes of transformed functions in which the feature space is not well aligned with the problem. The asymptotic performance is unaffected by transformations on easy functions like OneMax. Applying a worst-case transformation to a deceptive problem increases the expected optimisation time from \(O(n^2 \log n)\) to exponential. However, QD is still faster than a (1+1) EA by an exponential factor.
1 Introduction
Evolutionary algorithms (EAs) are randomised search heuristics imitating principles of natural evolution. They maintain a population, a multi-set of candidate solutions, and aim to iteratively improve the quality (or fitness) of solutions in the population by creating new solutions via mutation and/or crossover of current population members, and then selecting promising solutions to form the next generation’s population [1]. EAs have proven to perform excellently in many domains, including—but not limited to—multi-objective optimisation, logistics, and transportation [2].
The classic focus of EAs is on pure (global) optimisation, i.e., finding a single high-performing solution to the problem at hand. In multi-modal optimisation [3], by contrast, the goal is to find all, or at least most, of multiple local and/or global optima. Such algorithms usually employ explicit diversity-preservation mechanisms to allow a more exploratory optimisation process and to prevent the population from converging into local optima, which often act as traps. Another related branch of evolutionary computation, first introduced by Ulrich and Thiele [4] in the continuous domain, is evolutionary diversity optimisation (EDO). In EDO the aim is to evolve a population of diverse solutions which are all high-performers (i.e., they all satisfy a minimum quality threshold which may be set a priori or adapted dynamically). Usually, diversity is measured by a domain- or even problem-specific diversity function. Recently, EDO has gained increasing interest in combinatorial optimisation. Very recently, a new branch of evolutionary computation termed quality diversity (QD) emerged [5]. In the so-called MAP-Elites approach [6], the search space is partitioned into disjoint regions of interest termed cells. The goal is to find high-quality solutions for each cell, making diversity explicit rather than implicit as, e.g., in EDO. QD algorithms have major applications in the field of robotics [7] and have only recently been successfully applied to, e.g., TSP instance space coverage [8] or combinatorial optimisation for the knapsack problem [9] and the travelling thief problem [10].
Stemming from engineering applications, the theory of evolutionary algorithms was mainly neglected in the early years. With some delay it finally developed into a mature field with many runtime results on pseudo-Boolean optimisation, many combinatorial optimisation problems [11], multi-modal optimisation [12], diversity preservation [13] and—rather recently—advances in multi-objective optimisation [14] and EDO [15]. However, at the moment of writing—to the best of our knowledge—there is just one work on the runtime analysis of quality diversity algorithms. Nikfarjam et al. [10] analyse a QD algorithm for the knapsack problem. They show that the studied QD algorithm, operating on a suitable two-dimensional feature space, mimics the dynamic programming approach for the knapsack problem, and they prove upper bounds on the expected optimisation time. Apart from this, there is no theoretical work explaining the potential benefits of QD. Many questions remain open: when does QD perform well, and why? How does its performance compare with that of established algorithms? How to design effective QD algorithms for interesting problem classes? Many of these questions can be resolved by runtime analysis.
We perform runtime analyses of a simple MAP-Elites quality diversity algorithm, termed QD, in the context of pseudo-Boolean optimisation and combinatorial optimisation. QD operates on the k-number-of-ones (k-NoO) feature space, where \(k \in \{1, \ldots , n+1\}\) steers the granularity of the feature space, i.e., the number of cells. The ith cell of the map, for \(i = 1, \ldots , (n+1)/k=:L\), describes the niche of solutions with a number of ones in the interval \([(i-1)k,ik-1]\).
We consider three performance measures: the time to find an optimal search point (optimisation time), the time to cover all cells (cover time) and the time to find an optimal search point within each cell (optimal-cover time).
In Sect. 3 we show that the cover time of QD operating on the k-NoO feature space on every pseudo-Boolean fitness function is \(O(n^2 \log n)\) for \(k=1\) and \(O(n/(\sqrt{k}4^k p_m^k))\) for larger k, where \(p_m = \Theta (1/n)\) is the mutation probability. For granularity \(k=1\) this immediately gives upper bounds of \(O(n^2 \log n)\) for all functions of unitation (where the fitness only depends on the number of ones) and all monotone functions [16, 17], demonstrating that QD can be powerful if the feature space is well aligned with relevant problem characteristics. We show that the aforementioned bounds are tight for OneMax and that the optimal-cover time on OneMax is of the same order as well.
Moreover, we give a general fitness-level upper bound for QD and consider its performance on classes of transformed functions on which the feature space is not necessarily well aligned with the problem. On all transformed OneMax functions the optimal-cover time is still of order \(O(n^2 \log n)\). Applying a worst-case transformation to Trap functions turns an expected optimisation time of \(O(n^2 \log n)\) into exponential time. However, there is still a small benefit for QD as its worst possible running time is drastically smaller than that of a (1+1) EA.
In Sect. 4 we further study QD on the k-NoO feature space in the domain of monotone submodular function maximisation which is known to be NP-hard in general. QD finds a \((1-1/e)\)-approximation on any monotone submodular function with a single cardinality constraint r in expected time \(O(n^2(\log (n) + r))\), matching the upper bound shown by Friedrich and Neumann [18] for the multi-objective optimiser GSEMO. Both GSEMO and our QD operate similarly, but QD is simpler and more straightforward from an algorithmic perspective. This application indicates an interesting relationship between the definition of a feature space in the context of quality diversity optimisation and the fitness-function construction—encoding features of the encoded solutions as additional objectives—in multi-objective optimisation.
Last but not least, in Sect. 5 we focus on the minimum spanning forest (MSF) problem on an edge-weighted undirected graph with n nodes, m edges, and \(C(G) \in \{1, \ldots , n\}\) connected components (CCs); the minimum spanning tree (MST) problem is the special case of connected graphs with \(C(G)=1\). Solutions are encoded as bit strings over \(\{0,1\}^m\), i.e., the feature space is defined over a property of the phenotype of the encoded solutions. We first consider a two-dimensional feature space with cells for every feasible number of CCs and every possible number of selected edges, referred to as the CCE feature space. We show an upper bound of \(O(m^2(n-C(G))\max \{n-C(G), \log n\})\) for finding a minimum spanning forest of G. Then we observe that a simpler feature space is sufficient and leads to better performance. In the CC feature space we have cells for each feasible number of CCs. QD operating on this CC feature space finds a minimum spanning forest of any edge-weighted source graph in at most \(O((n-C(G)+1)m\max \{n-C(G)+1,\log n\})\) steps, provided all edge weights are bounded by \(2^{O(n)}\). As a consequence, an MST is found after \(O(n^2m)\) steps in expectation on any connected graph. Again, the algorithm works similarly to the bi-objective formulation studied by Neumann and Wegener [19], who used the number of connected components and the total weight of the encoded solution as two conflicting objectives to be minimised simultaneously, leading to the same time bound of \(O(n^2 m)\). QD is a simpler algorithm since it does not require the concept of Pareto dominance to deal with the partial ordering of bi-objective fitness vectors, and the runtime analysis is more straightforward as a consequence.
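For intuition, the CC feature used here, the number of connected components of the subgraph selected by a bit string over the edges, can be computed with a union-find structure. The following sketch is ours (function names are not from the paper) and only illustrates the feature computation:

```python
def num_components(n_nodes, edges, x):
    """Number of connected components of the subgraph of an n-node graph
    containing exactly the edges e with x[e] = 1 (union-find sketch)."""
    parent = list(range(n_nodes))

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    components = n_nodes
    for bit, (u, v) in zip(x, edges):
        if bit:
            ru, rv = find(u), find(v)
            if ru != rv:                   # selected edge merges two CCs
                parent[ru] = rv
                components -= 1
    return components
```

Starting from n isolated nodes, each selected edge that joins two distinct components reduces the count by one, so the bit string's cell in the CC feature space is obtained in near-linear time.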
A preliminary version of this work with parts of the results has been published at a conference [20]. In this improved and extended manuscript we added the general fitness-level upper bound, results on transformations of fitness functions (Sect. 3.4) and the two-dimensional feature space CCE for computing minimum spanning forests.
2 The QD Algorithm
The studied quality diversity algorithm QD is outlined in Algorithm 1. The algorithm follows the Map-Elites scheme and operates on functions \(f :\{0,1\}^n \rightarrow {\mathbb {R}}\). First, QD initialises a map M which, over the course of optimisation, stores one solution for each cell of the feature space. The specific details will be discussed later; for now assume that M is a function that maps \(\{0,1\}^n\) to a feature space. In the beginning the map is empty. Next, an initial solution \(x \in \{0,1\}^n\) is sampled uniformly at random. This solution is stored alongside its objective function value in the map in cell M(x). After initialisation the evolutionary loop runs until a stopping condition is met. In each iteration a parent x is sampled uniformly at random from the set of already covered map cells. A single offspring y is generated from x by bit-wise mutation with mutation probability \(p_m\). If the cell at position M(y) is not yet covered, y is stored there unconditionally. If M(y) is already populated, say with z, the fitness function decides which one to keep: if f(y) is no worse than f(z), y replaces z in the respective map cell.
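The loop just described can be stated compactly in code. The following is a minimal illustrative Python sketch of the algorithm on the k-NoO feature space of Definition 1; the function names, the budget-based stopping condition and the choice of \(c=1\) are our own, not prescribed by the paper:

```python
import random

def qd(f, n, k, budget, seed=0):
    """Minimal sketch of Map-Elites-style QD on the k-number-of-ones
    feature space: cell i (zero-based) keeps the best solution found so
    far whose number of ones lies in [i*k, (i+1)*k - 1]."""
    assert (n + 1) % k == 0, "k must divide n + 1"
    rng = random.Random(seed)
    L = (n + 1) // k                    # number of cells
    p_m = 1.0 / n                       # mutation rate c/n with c = 1
    M = [None] * L                      # the map: (solution, fitness) per cell

    def cell(x):
        return sum(x) // k              # constant-time cell index

    x = [rng.randint(0, 1) for _ in range(n)]   # uniform random initialisation
    M[cell(x)] = (x, f(x))
    for _ in range(budget):
        parent, _ = rng.choice([c for c in M if c is not None])
        # bit-wise mutation: flip each bit independently with probability p_m
        y = [1 - b if rng.random() < p_m else b for b in parent]
        i, fy = cell(y), f(y)
        if M[i] is None or fy >= M[i][1]:       # store y if cell empty or y no worse
            M[i] = (y, fy)
    return M
```

For example, `qd(sum, 15, 2, 200_000)` runs QD on OneMax with \(n = 15\) and granularity \(k = 2\); with this generous budget, all 8 cells are covered and the global optimum found.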
We define the following goals and corresponding hitting times.
- Optimisation: let \(T_{f, \text {OPT}}\) be the random number of function evaluations until QD finds a global optimum of f.
- Coverage: let \(T_{f, \text {C}}\) be the random number of function evaluations until QD has found one solution for each map cell.
- Optimal coverage: let \(T_{f, \text {COPT}}\) be the random number of function evaluations until an optimal solution is found for each map cell.
Note that the optimal-coverage time is the largest of all measures: \(T_{f, \text {C}} \le T_{f, \text {COPT}}\) and \(T_{f, \text {OPT}} \le T_{f, \text {COPT}}\).
We say that an event occurs with overwhelming probability if its probability is at least \(1-2^{\Omega (n^\varepsilon )}\) for some positive constant \(\varepsilon > 0\). By H(x, y) we denote the Hamming distance (number of disagreeing bits) between two search points \(x, y \in \{0, 1\}^n\).
We will consider three different feature spaces, a generic feature space for pseudo-Boolean optimisation based on the number of ones, and two tailored feature spaces for computing minimum spanning forests. The latter two will be introduced later on, in Sect. 5.
Definition 1
The number-of-ones (NoO) feature space is defined as follows. Given a granularity parameter \(k \in \{1, \ldots , n+1\}\) with k dividing \(n+1\), the map M is of size \(L {:}{=}\frac{n+1}{k}\) and the i-th cell, \(i \in \{1, \ldots , L\}\), stores the best solution found so far amongst all solutions with a number of ones in \([(i-1)k, ik-1]\). We refer to this as the k-NoO feature space.
Note that \(k=1\) matches the arguably most natural feature space, storing a best search point for every number of ones, that is, for every level of the Boolean hypercube. It should be noted that QD can be implemented efficiently for the k-NoO feature space: the map can be represented via an array of length L. Given a solution \(x \in \{0,1\}^n\), we can compute its cell M(x) in constant time via the (zero-based) array index \(\lfloor |x|_1/k\rfloor \).
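As a quick illustration (ours, assuming a zero-based array of length L), the constant-time index computation is simply:

```python
def cell_index(x, k):
    """Zero-based k-NoO cell index of bit string x: cell i stores solutions
    with a number of ones in [i*k, (i+1)*k - 1]."""
    return sum(x) // k
```

For \(n = 7\) and \(k = 2\) this yields \(L = 4\) cells, with the all-zeros string in cell 0 and the all-ones string in cell \(L-1 = 3\).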
The number-of-ones feature space is a natural space for the Boolean hypercube and it is well aligned with many well-studied fitness functions. Functions of unitation, e.g., OneMax, Jump and Trap, only depend on the number of ones. These functions are defined in this way for simplicity and notational convenience. For some combinatorial optimisation problems, e.g., selection problems like Knapsack or Vertex Cover, as well as for optimising submodular functions, the number of ones is a natural property.
3 Pseudo-Boolean Optimisation
3.1 Upper Bounds for Cover Times
We first give an upper bound on the expected time to cover all cells. This upper bound applies to every pseudo-Boolean fitness function and relies on worst-case probabilities of jumping from one cell to an undiscovered neighbouring cell.
Theorem 1
Consider QD operating on the k-NoO feature space with granularity \(1 \le k \le n+1\), k dividing \(n+1\), with an arbitrary initial search point and mutation rate \(p_m = c/n\) for a constant \(c > 0\). For every fitness function, the expected cover time is \(O(n^2\log n)\) for \(k=1\) and \(O\left( Lp_m^{-k}/\left( {\begin{array}{c}2k-1\\ k\end{array}}\right) \right) = O(n/(\sqrt{k}4^kp_m^k))\) for \(k \ge 2\).
Note that, while Theorem 1 bounds the cover time irrespective of the fitness function, the optimal-cover time does depend on it. In particular, cell \(\lceil L/2 \rceil \) contains all \(\Omega (2^n/\sqrt{n})\) search points with \(\lfloor n/2 \rfloor \) ones, and the fitness function in this cell could be deceptive or a needle-in-a-haystack function, implying exponential optimal-cover times. Hence the optimal-cover time cannot be bounded as in Theorem 1.
To prove Theorem 1, we show the following technical lemma.
Lemma 2
For all \(k \in {\mathbb {N}}\), \(k \ge 2\) and \(i \ge 3\), \(\binom{ik-1}{k} \big / \binom{2k-1}{k} \ge i^2/3\).
Proof
Since the fraction \(\binom{ik-1}{k} \big / \binom{2k-1}{k}\) is non-decreasing in k, its minimum is attained for \(k=2\) and we obtain the lower bound \(\binom{2i-1}{2} \big / \binom{3}{2} = (2i-1)(i-1)/3 = (2i^2-3i+1)/3\).
Using \(i \ge 3\), which implies \(i^2 \ge 3i\), we bound the numerator from below by \(2i^2 - 3i + 1 \ge i^2 + 1 \ge i^2\). This completes the proof. \(\square \)
Proof of Theorem 1
We assume \(k \le (n+1)/2\) as otherwise the only k dividing \(n+1\) is \(k=n+1\), where we only have one cell and the cover time is trivial. After initialisation one cell i is covered and QD will maintain a solution \(x^{(i)}\) whose number of ones is in \([k(i-1), ki-1]\). Note that the precise search point can be replaced by another search point having the same property and a fitness value that is no worse. If \(i > 1\) then cell \(i-1\) can be covered by choosing the current search point \(x^{(i)}\) in cell i as a parent and creating an offspring whose number of ones is in \([k(i-2), k(i-1)-1]\). In the worst case, \(|x^{(i)}|_1 = ki-1\), and then selecting \(x^{(i)}\) as a parent and flipping exactly k one-bits in \(x^{(i)}\) creates an offspring in cell \(i-1\). Since the probability of selecting a parent from one out of at most L covered cells is always at least 1/L, the probability of this event is at least \(s_i {:}{=}\frac{1}{L}\binom{ki-1}{k} p_m^k (1-p_m)^{n-k}\).
The expected time for covering all cells with an index of at most i is bounded by \(\sum _{j=2}^{i} \frac{1}{s_j} \le \sum _{j=2}^{L} \frac{1}{s_j}\). The same arguments apply symmetrically for reaching cells with a larger number of ones, assuming that in the worst case a search point in cell j has \(k(j-1)\) ones and swapping the roles of zeros and ones. Thus, the expected time to cover all cells is bounded by
Now for the summation in the last line we make a case distinction. For \(k=1\) the summation simplifies to
where H(n) denotes the n-th harmonic number, as \(k=1\) implies \(L-1= \frac{n+1}{k} -1 = n\). Since \(H(n) = \ln (n) + \Theta (1)\), along with \(p_m = \Theta (1/n)\) and \((1-p_m)^{n-k} = (1-c/n)^{n-k} = \Theta (1)\), this proves the claimed upper bound for \(k=1\).
For \(k \ge 2\), applying Lemma 2 to all summands with \(i \ge 3\),
since \(\sum _{i=3}^L \frac{1}{i^2} \le \sum _{i=1}^\infty \frac{1}{i^2} = \frac{\pi ^2}{6} = O(1)\). Hence, along with \((1-p_m)^{n-k} = (1-c/n)^{n-k} = \Theta (1)\), the expected coverage time is \(O\left( Lp_m^{-k}/\left( {\begin{array}{c}2k-1\\ k\end{array}}\right) \right) \). Using well known results on the largest binomial coefficient (see, e.g., [21, Equation (1.4.18), Corollary 1.4.12]), \(\left( {\begin{array}{c}2k-1\\ k\end{array}}\right) = 2^{2k-1}/\Theta (\sqrt{k}) = \Theta (4^k/\sqrt{k})\), we obtain the simplified bound \(O(Lp_m^{-k} \cdot \sqrt{k}/4^k) = O(n/(\sqrt{k}4^kp_m^k))\). \(\square \)
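The binomial estimate used in the last step can be checked numerically. A small sanity check of ours compares \(\binom{2k-1}{k}\) with the Stirling-type approximation \(4^k/(2\sqrt{\pi k})\), which follows from \(\binom{2k-1}{k} = \binom{2k}{k}/2\) and \(\binom{2k}{k} \sim 4^k/\sqrt{\pi k}\):

```python
from math import comb, pi, sqrt

def central_estimate(k):
    # Stirling-type approximation of C(2k-1, k) = C(2k, k) / 2
    return 4 ** k / (2 * sqrt(pi * k))

for k in [5, 20, 50]:
    ratio = comb(2 * k - 1, k) / central_estimate(k)
    print(k, round(ratio, 4))   # the ratio tends to 1 as k grows
```

Already for moderate k the approximation is within a few percent of the exact value, confirming \(\binom{2k-1}{k} = \Theta (4^k/\sqrt{k})\).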
We remark that the term \(p_m^{-k}\) (along with a factor \((1-p_m)^{-n+k} = \Theta (1)\) that was absorbed in the asymptotic notation) reflects the expected time to perform a mutation flipping precisely k bits and the binomial coefficient \(\left( {\begin{array}{c}2k-1\\ k\end{array}}\right) \) reflects the number of possible choices for these k flipping bits. The term \(p_m^{-k}\) could possibly be reduced by using heavy-tailed mutations [22, 23] that increase the probability of flipping many bits. This is outside the scope of this work, though.
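To illustrate the heavy-tailed idea, a mutation strength \(\alpha\) can be drawn from a power law and the mutation rate set to \(\alpha/n\), so that mutations flipping many bits occur with only polynomially small probability. The following sketch is ours; the exponent \(\beta = 1.5\) and the support \(\{1, \ldots , n/2\}\) follow common choices in the cited line of work, but the details vary between variants:

```python
import random

def heavy_tailed_rate(n, beta=1.5, rng=random):
    """Draw a mutation strength alpha from a power law P(alpha = a) ~ a^(-beta)
    on {1, ..., n // 2} and return the resulting mutation rate alpha / n."""
    support = list(range(1, n // 2 + 1))
    weights = [a ** (-beta) for a in support]
    alpha = rng.choices(support, weights=weights)[0]
    return alpha / n
```

With such a rate, a mutation flipping about k bits has probability \(\Omega (k^{-\beta })\) of being attempted via \(\alpha \approx k\), instead of the \(p_m^{k}\)-type probabilities of standard bit mutation.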
3.2 Functions of Unitation and Monotone Functions
Our general Theorem 1 guarantees good performance for QD if the feature space aligns favourably with the problem at hand. We give two such examples. Functions of unitation only depend on the number of ones in the bit string. Well-known examples include OneMax\((x) {:}{=}\sum _{i=1}^n x_i\), Jump [24], Cliff [25], Hurdle [26] and Trap (see Sect. 3.4), functions of wildly varying difficulty for evolutionary algorithms. For functions of unitation and the 1-NoO feature space (\(k=1\)), covering all cells is equivalent to covering all cells optimally, and implies that a global optimum has been found. Hence Theorem 1 yields the following simple implication.
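For concreteness, here are Python sketches of such functions. The Jump definition below is the commonly used form going back to Droste, Jansen and Wegener; the exact parameterisation of Trap varies in the literature, and the simple form below (all non-optimal search points point away from the optimum) is our assumption:

```python
def one_max(x):
    """OneMax(x) = number of ones; maximised by the all-ones string."""
    return sum(x)

def trap(x):
    """A deceptive trap function of unitation: fitness grows with the number
    of zeros, yet the all-ones string is the unique global optimum
    (one common parameterisation, assumed here for illustration)."""
    n, u = len(x), sum(x)
    return n + 1 if u == n else n - u

def jump(x, m):
    """Jump_m: like OneMax, but with a fitness valley of width m - 1 just
    below the optimum (commonly used form)."""
    n, u = len(x), sum(x)
    return m + u if u <= n - m or u == n else n - u
```

All three depend on x only through \(|x|_1\), so with \(k=1\) every cell of the 1-NoO map holds an optimal representative of its fitness level as soon as it is covered.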
Corollary 3
The expected cover time, the expected optimal-cover time, and the expected time to find a global optimum for QD operating on the 1-NoO feature space, with mutation rate \(p_m = c/n\), \(c > 0\) constant, on any function of unitation are all \(O(n^2 \log n)\).
Note that for larger k, the optimal-cover time on functions of unitation can be larger than the cover time. If in cell 1 the all-zeros string is optimal and search points with \(k-1\) ones have the second-best fitness of that cell, a mutation flipping \(k-1\) specific bits is required to optimally cover this cell. Along with a factor of \(\Theta (L)\) for selecting a parent from cell 1, this yields an expected time of \(\Omega (L p_m^{-k+1})\) from a typical population covering cell 1 and \(\Omega (L)\) other cells. If k is large enough such that \(4^k/\sqrt{k} = \omega (1/p_m)\), this is larger than the upper bound from Theorem 1.
It should be pointed out that Corollary 3 only holds because the feature space aligns with the definition of functions of unitation. It is not robust in the sense that changing the encoding of such functions by swapping the meaning of zeros and ones for selected bits immediately invalidates the proof. We will consider transformations of fitness functions in more detail in Sect. 3.4.
A second example is the class of monotone functions [16, 17, 27,28,29]. A pseudo-Boolean function is called monotone if flipping only 0-bits to 1, while not flipping any 1-bits, strictly increases the fitness. This implies that \(1^n\) is the unique global optimum. The class of monotone functions includes all linear functions with positive weights. All monotone functions can be solved by randomised local search in expected time \(O(n \log n)\). But, surprisingly, there are monotone functions on which the (1+1) EA with mutation rate c/n, for a sufficiently large constant \(c > 0\), requires exponential time with high probability [16, 17]. The reason is that mutation probabilities larger than 2.2/n frequently lead to mutations flipping both zeros and ones (thus circumventing the monotonicity requirement) and to accepting search points with a smaller number of ones. Since QD on the 1-NoO feature space stores every increase in the number of ones, it runs in expected time \(O(n^2 \log n)\) for every mutation rate c/n.
Corollary 4
The expected time until QD operating on the 1-NoO feature space, with mutation rate \(p_m = c/n\), \(c > 0\) constant, finds a global optimum on any monotone function is \(O(n^2 \log n)\).
Even though every mutation that flips only 0-bits strictly increases the fitness, this property is not needed for Corollary 4, as empty cells are filled regardless of fitness. But it suggests that the result on monotone functions may be more robust to fitness transformations than our result on functions of unitation. Indeed, we will show in Sect. 3.4 that our upper bound on functions of unitation is not robust, in the sense that exponential optimisation times occur with overwhelming probability for worst-case transformed functions of unitation.
3.3 Tight Bounds for OneMax
We also show that the upper bound from Theorem 1 is asymptotically tight for OneMax, for all values of k. For \(k \ge 2\) the reason is that, on OneMax, when cell 2 is reached, the best solution in cell 2 is one with \(2k-1\) ones. The best chance of reaching cell 1 from there is to pick M(2) as a parent and to flip k out of \(2k-1\) 1-bits. This shows that even for the simplest possible fitness function (with a unique optimum), the expected cover time grows exponentially with the granularity k (except for the trivial setting of \(k=n+1\), that is, having one cell only, for which QD equals the (1+1) EA and the expected optimal-cover time is \(O(n \log n)\)).
Theorem 5
Consider QD operating on the k-NoO feature space, with \(1 \le k \le (n+1)/2\) and mutation rate \(p_m = c/n\), \(c > 0\) constant, on OneMax. The expected cover time and the expected optimal-cover time are both in \(\Theta (n^2\log n)\) for \(k=1\) and \(\Theta \left( Lp_m^{-k}/\left( {\begin{array}{c}2k-1\\ k\end{array}}\right) \right) = \Theta (n/(\sqrt{k}4^kp_m^k))\) for \(k \ge 2\).
We first show a technical lemma on decaying jump probabilities.
Lemma 6
Let \(p_{i, j}\) denote the probability that a standard bit mutation with mutation probability \(p_m\) creates a mutant with j ones from a parent with i ones. For all \(0< p_m < 1\), all \(j < i\) and all \(k \in \{0, \ldots , j\}\), \(p_{i, j-k} \le \left( \frac{i \cdot p_m}{1-p_m}\right) ^k p_{i, j}\).
Proof
For \(j < i\) and \(\ell \ge 0\) let \(p_{i, j, \ell }\) denote the probability of a standard bit mutation with mutation probability \(p_m\) creating a mutant with j ones from a parent with i ones by flipping \(i-j+\ell \) ones and flipping \(\ell \) zeros, that is, \(p_{i, j, \ell } = \binom{i}{i-j+\ell }\binom{n-i}{\ell } p_m^{i-j+2\ell }(1-p_m)^{n-(i-j+2\ell )}\).
Note that \(p_{i, j, \ell } = 0\) if \(\ell > \min \{j, n-i\}\). We claim that for all \(i \in \{2, \ldots , n\}\), all \(d \in \{1, \ldots , i\}\) and all \(\ell \in {\mathbb {N}}_0\), \(p_{i, i-d-1, \ell } \le \frac{i \cdot p_m}{1-p_m} \cdot p_{i, i-d, \ell }\) (1).
This is trivial for \(p_{i, i - d - 1, \ell } = 0\), hence we assume \(p_{i, i - d - 1, \ell } > 0\) (which implies \(i - d - \ell \ge 1\)) and have \(\frac{p_{i, i-d-1, \ell }}{p_{i, i-d, \ell }} = \frac{\binom{i}{d+1+\ell }}{\binom{i}{d+\ell }} \cdot \frac{p_m}{1-p_m} = \frac{i-d-\ell }{d+\ell +1} \cdot \frac{p_m}{1-p_m}\).
Since \(d + \ell \ge 1\) and \(i \ge 2\), we have \(\frac{d+\ell +1}{i-d-\ell } \ge \frac{2}{i-1} \ge \frac{1}{i}\) and this implies (1). Consequently, \(p_{i, i-d-1} = \sum _{\ell \ge 0} p_{i, i-d-1, \ell } \le \frac{i \cdot p_m}{1-p_m} \sum _{\ell \ge 0} p_{i, i-d, \ell } = \frac{i \cdot p_m}{1-p_m} \, p_{i, i-d}\).
The statement now follows from applying (1) repeatedly and bounding the resulting factors as \(\left( \frac{i \cdot p_m}{1-p_m}\right) ^k\).
\(\square \)
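The transition probabilities \(p_{i,j}\) can also be tabulated exactly, which is handy for checking such decay claims numerically on small instances. A sketch of ours:

```python
from math import comb

def ones_distribution(n, i, p):
    """Exact distribution of the number of ones after standard bit mutation
    with rate p on a parent with i ones: flipping a of the i one-bits and
    b of the n - i zero-bits yields i - a + b ones."""
    probs = [0.0] * (n + 1)
    for a in range(i + 1):
        for b in range(n - i + 1):
            probs[i - a + b] += (comb(i, a) * comb(n - i, b)
                                 * p ** (a + b) * (1 - p) ** (n - a - b))
    return probs
```

For example, with \(n=12\), \(i=5\) and \(p = 1/n\), the probabilities \(p_{5,4} > p_{5,3} > \cdots > p_{5,0}\) decay rapidly, matching the geometric decay asserted by Lemma 6.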
The following lemma shows that, with probability \(\Omega (1)\), the map will be partially filled before the cells with lower indices are reached. This implies that the probability of selecting a parent from the cell with smallest index is decreased by a factor of \(\Theta (1/L)\), slowing down the exploration of smaller cells.
Lemma 7
Consider the scenario of Theorem 5. There exists a constant \(\varepsilon > 0\) such that, with probability \(\Omega (1)\), QD reaches a state where at least \(\varepsilon L\) cells are covered and cells 1 to \(\varepsilon L\) are not yet covered.
Proof
If \(k=(n+1)/2\) then since k divides \(n+1\) we know that \(n+1\) is even and there are just two symmetric cells 1 and 2. By symmetry, the probability of initialising in cell 2 is 1/2 and then the claim follows. The same arguments apply to values \(k < (n+1)/2\) and \(k = \Omega (n)\) as then \(L = O(1)\) and covering one cell with a solution of at least n/2 ones is sufficient when choosing \(\varepsilon := 1/L\).
In the following, we assume \(k = o(n)\) and define \(X_t\) as the smallest number of ones in any search point seen so far. Note that, trivially, \(X_{t+1} \le X_t\). We claim that with probability \(1-o(1)\) the following statements hold for a function \(\alpha {:}{=}\alpha (n) {:}{=}\min \{n/3, n/(4c)\}\). (1) the algorithm is initialised with \(X_0 \ge \alpha \), (2) whenever \(X_t\) is decreased, it is decreased by at most \(\alpha /4\), (3) while \(X_t \in [1, \alpha ]\), steps decreasing \(X_t\) have an exponential decay (as will be made precise in the following). Claim (1) follows from Chernoff bounds since \(\alpha \le n/3\). Claim (2) follows since decreasing \(X_t\) by at least \(\alpha /4\) requires flipping \(\Omega (n)\) bits. This has probability at most \(\left( {\begin{array}{c}n\\ \alpha /4\end{array}}\right) p_m^{\alpha /4} \le 2^n \cdot (c/n)^{\alpha /4} = n^{-\Omega (n)}\) whereas accepted steps decreasing \(X_t\) have probability at least \(\Omega (n^{-k}/L) = n^{-o(n)}\). Along with a union bound over O(n) values of \(X_t\), the conditional probability of decreasing \(X_t\) by at least \(\alpha /4\) throughout the run is \(O(n) \cdot n^{-\Omega (n)}/n^{-o(n)} = n^{-\Omega (n)}\). It remains to prove (3).
Assuming \(X_t \le \alpha \), we consider the decrease of \(X_t\) in steps in which \(X_{t+1} < X_t\) and define the difference \(D_t:= X_t - X_{t+1}\). Adopting the notation from Lemma 6, we have
Recalling \(\alpha \le n/(4c)\), by Lemma 6 we have, for all \(i \le \alpha \) and all \(\ell \in \{0, \ldots , i-1\}\), \(p_{i, i-\ell -1} \le \frac{i \cdot p_m}{1-p_m} \, p_{i, i-\ell } \le \frac{1}{4(1-p_m)} \, p_{i, i-\ell } \le \frac{1}{2} \, p_{i, i-\ell }\)
for n large enough such that \(1-p_m \ge 1/2\). Hence the above is at most
Thus, \((D_t \mid X_{t+1} < X_t)\) is stochastically dominated by a geometric random variable with parameter 1/2. In other words, the progress towards \(0^n\) in steps in which \(X_t\) decreases is stochastically dominated by a sequence \(Z_0, Z_1, \ldots \) of geometric random variables with parameter 1/2. Now choose \(\varepsilon {:}{=}\alpha /(12Lk)\) and note that \(\varepsilon = \Theta (1)\).
Consider a phase of \(\alpha /12\) steps. The expected total decrease in \(\alpha /12\) steps is at most \(\alpha /6\). By Chernoff bounds for sums of geometric random variables (Theorem 10.32 in [21]), the probability of the total decrease being at least \(\alpha /4\) is at most \(e^{-\Omega (\alpha )} = e^{-\Omega (n)}\). Hence, with the converse probability \(1 - e^{-\Omega (n)}\) at least \(\alpha /12\) steps decreasing \(X_t\) are necessary to reduce \(X_t\) by a total of \(\alpha /4\). By claims (1) and (2), with high probability the first \(X_t\)-value of at most \(\alpha \) is at least \(3\alpha /4\), hence the final \(X_t\)-value is at least \(\alpha /2\).
Since \(\alpha /2 = 6\varepsilon L k\), the event \(X_t \ge \alpha /2\) implies that the first \(\varepsilon L\) cells have not been reached yet. Since at most k decreasing steps may concern the same cell, during \(\alpha /12\) steps decreasing \(X_t\), at least \(\alpha /(12k) = \varepsilon L\) cells are visited. Observing that the union bound of all failure probabilities is o(1) completes the proof. \(\square \)
Now we put everything together to prove Theorem 5.
Proof of Theorem 5
The upper bounds on the expected cover time follow from Theorem 1. For \(k=1\) the cover time equals the optimal-cover time. For \(k \ge 2\) we use a simple and crude argument: as long as not all cells are covered optimally, there is a cell i whose search point can be improved. An improvement happens with probability \(1/L \cdot 1/(en)\), if the right parent is selected and an improving Hamming neighbour is found. The expected waiting time is at most enL and at most n such steps are sufficient to cover all cells optimally. This only adds a term \(O(n^2L) = O(n^3)\) to the upper bound, which is no larger than the claimed runtime bound.
It remains to show the claimed lower bounds for the expected cover time. By Lemma 7, with probability \(\Omega (1)\), QD reaches a state where at least \(\varepsilon L\) cells are covered and the first \(\varepsilon L\) cells are not yet covered. We work under the condition that this event happens and note that this conditioning only costs a constant factor in the lower bound.
We first show the statement for \(k \ge 2\) and \(k \le \varepsilon (n+1)/2\). We may assume that \(\varepsilon (n+1)/2 \le n/(16c)\) as otherwise we may simply choose a smaller value for \(\varepsilon \). This ensures that the exponential decay of jump lengths shown in the proof of Lemma 7 applies for all search points with at most \(4k \le 4\varepsilon (n+1)/2\) ones. Let S be the set of all search points with at most \(2k-1\) ones. As the first \(\varepsilon L\) cells are not covered yet, the minimum number of ones seen so far is at least \(\varepsilon k L = \varepsilon (n+1) \ge 2k\). Applying Lemma 6 as in the proof of Lemma 7 we get that, for all \(2k \le i \le 4\varepsilon (n+1)/2\), the probability that a search point with exactly \(2k-1\) ones is reached, when S is reached for the first time, is
The same probability bound also holds for \(i > 4\varepsilon (n+1)/2\) as the conditional probability of jumping to S from a search point with more than \(4\varepsilon (n+1)/2\) ones is asymptotically smaller than the probability of jumping to S from cell 3. Since we are optimising OneMax, the algorithm will maintain a search point with exactly \(2k-1\) ones in cell 2 forever. To discover cell 1, the algorithm either needs to pick the search point from cell 2 as parent and jump to a search point of at most \(k-1\) ones, or pick a parent from some cell i, \(i \ge 3\), that has at least \((i-1)k\) ones and jump to the same target. The probability of any such event is at most
For all \(m \ge k-1\), since at least \(m-(k-1)\) bits have to flip, \(\sum _{j=0}^{k-1} p_{m, j} \le \left( {\begin{array}{c}m\\ m-(k-1)\end{array}}\right) p_m^{m-k+1} = \left( {\begin{array}{c}m\\ k-1\end{array}}\right) p_m^{m-k+1}\), thus this is at most
Using that for \(k \ge 2\)
we get, for \(i \ge 4\),
as \((i-1)p_m^{i-3} \le 1\) for all \(i \ge 4\) and large enough n.
The summand for \(i=3\) is
Together, the sought probability is bounded by
Taking the reciprocal yields a lower bound on the expected time for finding cell 1.
For \(\varepsilon (n+1)/2 < k \le (n+1)/2\) all cells have linear width. Since the probability of initialising with at least \(n/2 + \sqrt{n}\) ones is \(\Omega (1)\) by properties of the binomial distribution, it is easy to show that the algorithm will reach a search point with \(2k-1\) ones before reaching cell 1, with high probability. Then we apply the above arguments.
For \(k=1\) we argue differently as the time is no longer dominated by the expected waiting time to jump from cell 2 to cell 1. We assume that the event stated in Lemma 7 occurs and define \(X_t\) as the minimum number of ones in any search point of the population \(P_t\) at time t. Let \(Q_t\) be the number of ones in the parent chosen in generation t. Define the drift
under the condition that a parent is chosen whose number of ones exceeds \(X_t\) by i. Note that \(\Delta _0\) is bounded by the expected number of flipping 1-bits, \(\Delta _0 \le s \cdot p_m = \frac{cs}{n}\). We focus on \(X_t \le n^{1/5}\) and show that here the drift is at most
where the first equality follows from the law of total probability and the last equality remains to be proven. We will show that for all \(i \ge 1\), \(\Delta _i\) is asymptotically much smaller than \(\Delta _0\). We trivially have \(\Delta _i \le \textrm{Pr}\left( X_{t+1} < X_t \mid X_t, Q_t = X_t + i\right) \cdot (X_t+i)\) since the maximum possible decrease is \(X_t+i\). In order to decrease \(X_t\) by mutating a parent with \(X_t+i\) ones, it is necessary to flip at least \(i+1\) ones. For \(i \le X_t + n^{1/5}\), this event has probability
and thus \(\Delta _i \le (2cn^{-4/5})^{i+1} \cdot 2n^{1/5} \le O(n^{-7/5})\) and \(\sum _{i=0}^{n^{1/5}-1} \Delta _i = O(n^{-6/5})\). For all \(i \ge X_t + n^{1/5}\) the above probability is at most
Since \(i! \ge (i/e)^i\), this term is \(n^{-\Omega (n^{1/5})}\) and \(\sum _{i=n^{1/5}}^{n-X_t} \Delta _i = n^{-\Omega (n^{1/5})}\). Along with \(L = n+1\), (2) follows.
Finally, we apply the multiplicative drift theorem for lower bounds, Theorem 2.2 in [30], to the process \(X_t\), using the following parameters: \(s_{\min } {:}{=}\log ^2 n\) and \(\beta {:}{=}1/\ln n\). The condition on the drift, \(\text {E}\left( X_t - X_{t+1} \mid X_t = s\right) \le \delta s\), is satisfied for \(\delta {:}{=}c'/n^2\), \(c'\) being a sufficiently small positive constant.
We still need to show that, for all \(s \ge s_{\min } = \log ^2 n\),
Since \(\beta s \ge \beta s_{\min } = \Omega (\log n)\), the above bound holds since the probability of flipping a logarithmic number of bits with \(p_m = c/n\) is superpolynomially small.
By the multiplicative drift theorem, the expected time for reaching a distance of at most \(s_{\min }\), and thus the time to reach cell 1, starting from a distance of at least \(\varepsilon L = \varepsilon n\), is at least
\(\square \)
As an aside, we point out that Theorem 5 gives a lower bound on the expected time for GSEMO to cover the whole Pareto front of the bi-objective test function OneMinMax\((x) {:}{=}(\textsc {OneMax} (x), n - \textsc {OneMax} (x))\). To our knowledge, this is a novel (albeit not surprising) result; previously a lower bound of \(\Omega (n^2 \log n)\) was known only for SEMO [31]. SEMO only flips one bit in each mutation, and previous work avoided the complexity of analysing standard mutations.
Theorem 8
The expected time for QD, operating on the 1-NoO feature space, on OneMax to cover all \(n+1\) cells equals the expected time of GSEMO covering the Pareto front of OneMinMax.
Proof
Both algorithms keep one search point in every cell and create new search points by choosing a covered cell uniformly at random, applying standard bit mutation and keeping the result if it covers an uncovered cell. \(\square \)
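Both algorithms instantiate the same minimal loop, which can be sketched as follows. This is an illustration under our own naming and parameter choices, not pseudocode from the paper: the k-NoO cell index of a string is its number of ones divided by the granularity k, and each cell keeps its best solution seen so far.

```python
import random

def onemax(x):
    return sum(x)

def qd_1noo(f, n, k=1, steps=20000, p_m=None, rng=None):
    """Minimal Map-Elites sketch on the k-NoO feature space.
    Cell i stores the best solution whose number of ones lies in [i*k, (i+1)*k - 1]."""
    rng = rng or random.Random(0)
    p_m = p_m if p_m is not None else 1.0 / n
    grid = {}  # cell index -> (fitness, bitstring)
    x0 = [rng.randint(0, 1) for _ in range(n)]
    grid[sum(x0) // k] = (f(x0), x0)
    for _ in range(steps):
        # choose a covered cell uniformly at random, mutate its elite
        _, parent = grid[rng.choice(list(grid))]
        child = [b ^ (rng.random() < p_m) for b in parent]  # standard bit mutation
        cell = sum(child) // k
        fit = f(child)
        # keep the child if its cell is empty or it improves the cell's elite
        if cell not in grid or fit > grid[cell][0]:
            grid[cell] = (fit, child)
    return grid
```

For OneMax with \(k=1\) all search points in a cell share the same fitness, so covering all \(n+1\) cells is exactly the event analysed above.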
3.4 General Results and Transformations
We now take a closer look at the question of how the performance of QD is affected if the fitness function is not well aligned with the feature space.
Following [32], we define XOR-transformations as follows.
Definition 2
Given a bit string \(a \in \{0, 1\}^n\) and a fitness function \(f :\{0, 1\}^n \rightarrow {\mathbb {R}}\), we define \(f_a :\{0, 1\}^n \rightarrow {\mathbb {R}}\) by \(f_a(x) {:}{=}f(x \oplus a)\) where \(\oplus \) denotes the bit-wise exclusive OR operation.
This transformation implies that in all positions i where \(a_i = 1\), the operation \(x \oplus a\) inverts the bit value before evaluating the fitness. The special case \(a=0^n\) gives \(f_a = f\). If f has a single global optimum \(x^*\), it is moved to \(x^* \oplus a\) when considering \(f_a\) instead.
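The transformation is a one-liner in code; a small sketch with our own naming, using OneMax as the base function:

```python
def onemax(x):
    return sum(x)

def xor_transform(f, a):
    """Return f_a with f_a(x) = f(x XOR a); a = 0^n recovers f itself."""
    return lambda x: f([xi ^ ai for xi, ai in zip(x, a)])
```

For OneMax the unique optimum \(1^n\) moves to \(1^n \oplus a\), the bitwise complement of a.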
We now discuss generalisations of our previous results to classes of transformed fitness functions. For easy functions like OneMax, QD is still efficient because of elitism. We give a more general fitness-level upper bound based on the well-known fitness-level method, or method of f-based partitions [33].
The idea is to partition the search space into sets that are strictly ordered with respect to the fitness of the contained individuals.
Definition 3
For two sets \(A, B \subseteq \{0, 1\}^n\) and fitness function f let \(A <_f B\) if \(f(a) < f(b)\) for all \(a \in A\) and all \(b \in B\). Consider a partition of the search space into non-empty sets \(A_1, \ldots , A_m\) such that \( A_1<_f A_2<_f \cdots <_f A_m \) and \(A_m\) only contains global optima. We say that \(QD \) is in \(A_i\) or on level i if the best individual created so far is in \(A_i\).
Owing to elitism, the algorithm can only increase its current fitness level. Assuming that for each non-optimal fitness level we have a lower bound on the probability of reaching some higher fitness level set, this gives an upper bound on the expected time to escape from the current level. Summing up all these expected waiting times yields an upper bound on the expected time for reaching the final level.
Theorem 9
(Fitness-level upper bound for QD) Consider QD operating on any feature space of L cells. Given a fitness-level partition \(A_1, \ldots , A_m\), let \(s_i\) be a lower bound on the probability of creating a new offspring in \(A_{i+1} \cup \cdots \cup A_m\), provided the best search point in \(QD \)’s map is in \(A_i\). Then the expected optimisation time of \(QD \) on f is at most
Proof
For every set \(A_i\), \(i < m\), the following holds. If the best search point in the map is currently in \(A_i\), the probability of selecting this cell is 1/L and the probability of creating a search point in a better fitness level is at least \(s_i\). If this happens, the set \(A_i\) is left for good. Thus, the expected time spent in \(A_i\) is bounded by the expectation of a geometric random variable with parameter \(s_i/L\), which is \(L/s_i\). Summing up these times for all non-optimal \(A_i\) implies the claim. \(\square \)
For instance, on OneMax we may choose \(A_i = \{x \mid \textsc {OneMax} (x) = i\}\) for \(i \in \{0, \ldots , n\}\) and \(s_i \ge (n-i)p_m(1-p_m)^{n-1}\) for all \(i \le n-1\) as a sufficient condition for increasing the fitness is to flip one of \(n-i\) zero-bits and not to flip any of the other bits. This argument remains valid for all transformations of OneMax when replacing “zero-bits” with “bits that disagree with the global optimum”. For \(p_m = \Theta (1/n)\), we can use \((1-p_m)^{n-1} = \Theta (1)\) and
using the well-known fact that the n-th harmonic number is \(O(\log n)\). This yields the following.
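As a sanity check, the bound \(L \sum_{i} 1/s_i\) with \(s_i = (n-i)p_m(1-p_m)^{n-1}\) can be evaluated numerically. The sketch below (our own naming) confirms the \(O(Ln \log n)\) order of magnitude for \(p_m = 1/n\):

```python
import math

def fitness_level_bound(n, L, p_m):
    """Evaluate L * sum over non-optimal OneMax levels i of 1/s_i,
    with s_i = (n - i) * p_m * (1 - p_m)^(n - 1)."""
    q = (1.0 - p_m) ** (n - 1)  # probability of not flipping the other n-1 bits
    return L * sum(1.0 / ((n - i) * p_m * q) for i in range(n))
```

For \(p_m = 1/n\) the factor \(q\) converges to \(1/e\), and the sum contributes the harmonic number \(H_n = O(\log n)\) times \(n\), matching the bound in Corollary 10.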
Corollary 10
Consider QD with mutation rate \(p_m = c/n\), \(c > 0\) constant, operating on any feature space with L cells. For every bit string \(a \in \{0, 1\}^n\), the expected optimisation time of QD on the transformed function \(\textsc {OneMax} _a\) is \(O(Ln \log n)\). For the 1-NoO feature space this simplifies to \(O(n^2 \log n)\).
This result concerns the optimisation time. The expected cover time on transformed OneMax functions using the 1-NoO feature space is still bounded by \(O(n^2 \log n)\) as well since Theorem 1 applies to all pseudo-Boolean fitness functions. However, while for standard OneMax a cover implies that all fitness values have been found and all cells are covered optimally (since, trivially, all search points in one cell have the same fitness), this does not necessarily hold for transformed OneMax functions. The following theorem shows that a time bound of \(O(n^2 \log n)\) still applies for the optimal-cover time of every transformed OneMax function.
Theorem 11
Consider QD with mutation rate \(p_m = c/n\), \(c > 0\) constant, operating on the 1-NoO feature space. For every \(a \in \{0, 1\}^n\) the expected optimal-cover time of QD on the transformed function \(\textsc {OneMax} _a\) is \(O(n^2 \log n)\).
Proof
In expected time \(O(n^2 \log n)\) all cells are covered by Theorem 1. This in particular implies that, numbering cells from 0 to n, cells 0 and n are covered optimally, since each of these cells contains a single search point (\(0^n\) and \(1^n\), respectively).
Now we estimate the expected remaining time for covering all other cells optimally. Let \(i^*\) denote the number of ones in the global optimum, \({\overline{a}}\). For \(0 \le j \le i^*\) let \(T_j\) denote the random time until all cells \(0, 1, \ldots , j\) are covered optimally. For \(0 \le j \le i^*-1\) we consider the algorithm at time \(T_j\) and estimate the expected time for covering cell \(j+1\) optimally. Note that the minimum Hamming distance between the search point \(x_j\) in cell j and the optimum is \(i^* - j\). As \(x_j\) is optimal for cell j, there must be \(i^*-j\) bits that are 1 in the global optimum and 0 in \(x_j\). A 1-bit flip inverting one of these bits creates an optimal solution in cell \(j+1\). The probability of this event is at least \(1/L \cdot (i^*-j)\cdot C/n\) for some constant \(C > 0\) that depends on \(p_m\). Consequently, the expected time for covering all cells \(0, \ldots , i^*\) optimally, starting with a full cover, is at most
A symmetric argument applies for the expected time to cover all cells \({n, n-1, \ldots , i^*+1}\) optimally, swapping the roles of zeros and ones. This shows an upper bound of \(O(n^2 \log n)\). \(\square \)
In summary, on easy functions (for which the fitness-level method yields good bounds), transformations are not that harmful for the expected optimisation time.
Now we give a more extreme example where transformations affect performance significantly.
Definition 4
The deceptive function Trap is defined as
It is well known [24] that the (1+1) EA with mutation rate \(p_m = 1/n\) requires time \(\Theta (n^n)\) in expectation on Trap and that this is the largest expected optimisation time possible for algorithms using standard bit mutation with mutation probability 1/n in each offspring creation. The reason is that in each offspring creation, the probability of turning the parent x into a global optimum \(x^*\) is
where the inequality holds for all mutation probabilities \(p_m \le 1/2\) (as then \(p_m \le 1-p_m\)). Consequently, for every fitness function and every mutation probability \(0 < p_m \le 1/2\), the expected time for creating a global optimum is at most \((1/p_m)^n\).
Note that Trap is a function of unitation, hence Corollary 3 yields an upper bound of \(O(n^2 \log n)\) for QD operating on the 1-NoO feature space with respect to the expected optimisation time and even the expected optimal-cover time. This result breaks down when considering transformations of the Trap function.
Theorem 12
Consider QD with mutation rate \(p_m = c/n\), \(c > 0\) constant, operating on the 1-NoO feature space. There is a transformed Trap function for which the optimisation time of QD is in \(\Omega ((1/p_m)^{\lfloor n/2 \rfloor }n)\) with overwhelming probability and in expectation.
In the analysis we will consider the impact of a mutation operation and distinguish between flipped bits that decrease the Hamming distance and those that increase it. This process can be modelled as an urn model with red and white balls, where the red balls correspond to bit flips approaching the optimum. If the number of red balls is small, the probability of drawing more red balls than white balls is bounded by r/N, where N is the total number of balls and r the number of red ones. When drawing exactly one ball, this is trivially the exact probability. The following lemma shows that the bound applies regardless of the number of balls drawn.
Lemma 13
Consider an urn model in which k balls are chosen uniformly at random, without replacement, from an urn containing \(N \in {\mathbb {N}}\) red or white balls, r of which are red. If \(r \le N/4\) then, for every value \(k \in \{0, \ldots , N\}\) the probability of drawing more red balls than white balls is at most r/N.
Proof
The probability of drawing more red than white balls equals the sum of probabilities of drawing i red balls, for \(i \in \{\lfloor k/2\rfloor +1, \ldots , k\}\). Let \(p_i\) denote the probability of drawing i red balls, then \(p_i = 0\) if \(i > r\) and otherwise, if \(r \ge i\),
Using \(r \ge i\), the product of the first and the last \(\prod \) terms is at most 1 and the above simplifies to
As an aside, note that the latter, when multiplied by a factor of \((1-r/N)^{k-i}\), yields the probability of choosing i red balls with replacement, that is, a binomial distribution with parameters k and r/N. Now the sought probability is at most
Noting \(\sum _{i=0}^k \left( {\begin{array}{c}k\\ i\end{array}}\right) = 2^k\), the sum is precisely \(2^k/2\) if k is odd by symmetry of binomial coefficients and \(2^k/2 - \left( {\begin{array}{c}k\\ k/2\end{array}}\right) /2 \le 2^k/2\) if k is even, thus we get
where the last inequality is obvious for even k and follows from \((1/4)^{(k-1)/2} = (1/2)^{k-1}\) for odd k. \(\square \)
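The bound of Lemma 13 can be checked exhaustively for small urns via the hypergeometric distribution; a sketch with our own naming:

```python
from math import comb

def prob_more_red(N, r, k):
    """Exact probability of drawing more red than white balls when k balls
    are drawn uniformly without replacement from N balls, r of them red."""
    # i red balls are drawn with probability C(r,i)*C(N-r,k-i)/C(N,k);
    # "more red than white" means i > k - i, i.e. i >= floor(k/2) + 1.
    return sum(comb(r, i) * comb(N - r, k - i)
               for i in range(k // 2 + 1, min(k, r) + 1)) / comb(N, k)
```

For \(k=1\) the expression collapses to exactly r/N, matching the trivial case mentioned before the lemma.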
Now we are able to give a proof of Theorem 12.
Proof of Theorem 12
Throughout this proof we assume that n is large enough such that all inequalities holding for large enough n hold. We choose the transformation according to \(a {:}{=}1^{\lfloor n/2 \rfloor } 0^{\lceil n/2 \rceil }\), that is, the global optimum is a and the fitness of all other search points equals the Hamming distance to a. Note that in every non-empty cell the Hamming distance to the optimum can only increase, except for the cell containing search points with \(\lfloor n/2 \rfloor \) ones in the unlikely case that a is reached. However, empty cells will accept any search point, including those in which the Hamming distance to a has decreased, compared to the current parent. Hence we need to show that QD does not evolve any search points that are too close to a.
To this end, denoting by \(M_t\) the map at time t we define the distance measure
Note that \(D_t = 0\) if and only if a is found. Then we show that the following statements all hold with overwhelming probability:
1. Initially, \(D_0 \ge 2n^{2/3}\).
2. Whenever \(D_t\) is decreased, it decreases by at most \(n^{1/4}\).
3. There are at most \(n^{5/12}\) steps decreasing \(D_t\) before the map is filled.
Together, if these very likely events all occur, the total decrease of \(D_t\) is bounded by \(n^{1/4} \cdot n^{5/12} = n^{2/3}\) and \(D_t\) is always at least \(n^{2/3}\) while the map is being filled. We assume in the following that this happens. Once the map is filled, as long as a is not found, QD behaves as on the transformed function \(\textsc {OneMax} _a\). We consider the random time \(T^*\) to find a or to cover each cell with the optimal \(\textsc {OneMax} _a\)-value. We know \(\text {E}\left( T^*\right) = O(n^2\log n)\) by Theorem 11, using that the unlikely event of finding a is an additional stopping event. During this time, every parent has Hamming distance at least \(D_t\) to a by definition of \(D_t\). Using \(D_t \ge n^{2/3}\), the probability of creating a is at most \(p_m^{n^{2/3}}\) and the probability of this happening during a random phase of length \(T^*\) is at most
After time \(T^*\), if a has not been found, every cell will have a search point with maximum Hamming distance to a. The maximum Hamming distance of any search point in the cell containing i ones to a is attained for the search point \(0^{n-i}1^i\). In case \(i \le \lceil n/2 \rceil \) the maximum Hamming distance is \(i+\lfloor n/2 \rfloor \) as all ones are placed in disjoint positions. In case \(i > \lceil n/2 \rceil \) the maximum Hamming distance is \(\lceil n/2 \rceil + n - i\) as all zeros are placed in disjoint positions. Together, the maximum Hamming distance is \(\lfloor n/2 \rfloor \) for \(i=0\), \(\lceil n/2 \rceil \) for \(i=n\) and at least \(\lfloor n/2 \rfloor + 1\) for all \(1 \le i \le n-1\). Hence, creating the optimum by mutation has probability at most \(2/L \cdot p_m^{\lfloor n/2 \rfloor } + (1-2/L) \cdot p_m^{\lfloor n/2 \rfloor +1} = O(p_m^{\lfloor n/2 \rfloor }/n)\). This implies the claimed time bound.
It remains to show the three claims from above. The first claim easily follows from applying Chernoff bounds to the number of bits in the initial search point that disagree with a. The expectation of this quantity is n/2 and the probability of it being less than \(2n^{2/3}\) is \(2^{-\Omega (n)}\). For the second claim, we use that decreasing \(D_t\) by d requires the algorithm to flip at least d bits in one standard bit mutation. The probability of flipping at least \(n^{1/4}\) bits is at most
using \(j! \ge (j/e)^j\) and properties of the geometric series.
The probability of this happening in the first \(T^*\) steps is still exponentially small, using the same arguments as above.
Finally, we show the third claim, assuming that the typical events from the first two statements occur. Note that \(D_t\) can only be decreased if the parent has Hamming distance at most \(D_t + n^{1/4} \le 2n^{2/3} + n^{1/4} \le 3n^{2/3}\) to a. Consequently, it suffices to consider cells with a number of ones in \(S {:}{=}\{|a|_1 - 3n^{2/3} + 1, \ldots , |a|_1 + 3n^{2/3} - 1\}\) as all other cells are guaranteed to have a larger Hamming distance to a. As mentioned before, it also suffices to consider steps where empty cells are filled as these are the only steps capable of decreasing \(D_t\). We consider such a step and apply the method of deferred decisions: we assume that we know the number of one-bits flipped in this step and the number of flipped zero-bits, but that we do not yet know their positions. This is a valid argument since only the number of flipped one-bits and zero-bits determine the cell the offspring belongs to. Now, assume that the new cell has been reached, creating an offspring y from flipping o one-bits and z zero-bits in its parent, x, with \(H(x, a) \le 3n^{2/3}\). The latter implies that at most \(3n^{2/3}\) one-bits in x disagree with a and at least \(|a|_1 - 3n^{2/3} - 3n^{2/3} \ge n/2 - 6n^{2/3} \ge n/3\) one-bits in x agree with a. A necessary condition for decreasing the Hamming distance to a is that amongst all flipping one-bits we have a surplus of bits that disagree with a or that amongst all flipping zero-bits we have a surplus of bits that disagree with a. Imagining the choice of bits as an urn model with at most \(3n^{2/3}\) red balls and at least n/3 balls in total, the probability of having a surplus of red balls representing flipping one-bits is at most \(3n^{2/3}/(n/3) = 9n^{-1/3}\) and the same holds for flipping zero-bits. A union bound over these two events yields a probability of at most \(18n^{-1/3}\).
Since these arguments are agnostic of the specific number of bits flipped, and these events are independent for different cells, the number of steps decreasing \(D_t\) is stochastically dominated by a sequence of \(|S| \le 6n^{2/3}\) independent Bernoulli random variables with parameter \(18n^{-1/3}\). The expected number of steps decreasing \(D_t\), \(\text {E}\left( X\right) \), is at most \(108n^{1/3}\). We apply Chernoff bounds to the latter, choosing a parameter \(\delta \) such that \((1+\delta )\text {E}\left( X\right) = n^{5/12}\), which implies \(\delta = \Theta (n^{1/12})\). Then Chernoff bounds yield a probability bound of \(e^{-\Omega (\text {E}\left( X\right) \delta ^2)} = e^{-\Omega (n^{1/3} \cdot n^{2/12})} = e^{-\Omega (n^{1/2})}\) of having more than \(n^{5/12}\) such steps.
This completes the proof of the third claim and the theorem. \(\square \)
Next, we show that the above lower bound is asymptotically tight.
Theorem 14
Consider QD with mutation rate \(p_m = c/n\), \(c > 0\) constant, operating on the 1-NoO feature space. For every pseudo-Boolean function \(f :\{0, 1\}^n \rightarrow {\mathbb {R}}\) the expected optimisation time of QD on f is at most
where \(\textrm{OPT}_f\) is the set of global optima of f and, for sets \(A, B \subseteq \{0, 1\}^n\), \(H(A, B) {:}{=}\min _{x \in A, y \in B}H(x, y)\) denotes the minimum Hamming distance between any search point \(x \in A\) and any search point \(y \in B\).
Proof
Within expected time \(O(n^2 \log n)\), QD has covered all cells, by Theorem 1. In particular, QD has found both strings \(0^n\) and \(1^n\). Let \(x^*\) be a global optimum of f with \(H(\{x^*\}, \{0^n, 1^n\}) = H(\textrm{OPT}_f, \{0^n, 1^n\}) {=}{:}h\). Then \(x^*\) can be created from flipping h bits in either \(0^n\) or \(1^n\). The probability of selecting a parent with Hamming distance at most h to \(x^*\) and mutating it into \(x^*\) is at least \(1/L \cdot p_m^h(1-p_m)^{n-h} = \Omega (p_m^{h}/n)\).
Taking the reciprocal yields an upper bound on the expected waiting time for finding a global optimum after covering all cells. The second bound follows since for any search point \(x^* \in \textrm{OPT}_f\) we have \(H(x^*, 0^n) + H(x^*, 1^n) = n\) as the left-hand side adds up the number of ones in \(x^*\) and the number of zeros in \(x^*\). This implies \(H(\textrm{OPT}_f, \{0^n, 1^n\}) \le \min \{H(x^*, 0^n), H(x^*, 1^n)\} \le \lfloor n/2 \rfloor \). \(\square \)
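The quantity \(H(\textrm{OPT}_f, \{0^n, 1^n\})\) governing the bound is easy to compute when the set of global optima is given explicitly; a small sketch with our own naming:

```python
def min_dist_to_monochromatic(optima, n):
    """h = min over global optima of the Hamming distance to 0^n or 1^n.
    By the argument above, h <= floor(n / 2) always holds, since the
    distances of any string to 0^n and to 1^n sum to exactly n."""
    def hamming(x, y):
        return sum(a != b for a, b in zip(x, y))
    zeros, ones = [0] * n, [1] * n
    return min(min(hamming(x, zeros), hamming(x, ones)) for x in optima)
```

The worst case \(h = \lfloor n/2 \rfloor\) is attained by an optimum with half its bits set, which is exactly the situation constructed in Theorem 12.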
We conclude that QD with the 1-NoO feature space is efficient when there are global optima close to \(0^n\) or \(1^n\) in terms of Hamming distance. But it cannot avoid exponential optimisation times if all global optima have a sufficiently large Hamming distance from both \(0^n\) and \(1^n\). However, the worst case of \(\Theta ((1/p_m)^{n})\) can be lowered significantly, to \(\Theta ((1/p_m)^{\lfloor n/2 \rfloor }n)\). The latter is nearly the square root of the former and the upper bound is smaller by a factor of order \((1/p_m)^{\lceil n/2\rceil }/n\). Note, however, that in the worst case both bounds are exponential in n, limiting the usefulness of these bounds. This is why we now turn to combinatorial problems where we will show that QD can guarantee finding good solutions in expected polynomial time.
4 Monotone Submodular Function Optimisation
We now consider the NP-hard problem of submodular function maximisation. Submodular functions generalise many well-known NP-hard combinatorial optimisation problems, e.g., finding a maximum cut in a graph, and are thus of utmost relevance. Let \(\Omega \) be a finite ground set and let \(f: 2^{\Omega } \rightarrow {\mathbb {R}}\) be a set function. The function f is submodular if for all \(A, B \subseteq \Omega \)
and f is monotone if \(f(A) \le f(B)\) for all \(A \subseteq B \subseteq \Omega \). We assume that f is normalised, i.e., \(f(A) \ge 0 \,\forall A \subseteq \Omega \) and \({f(\emptyset ) = 0}\). Given a cardinality constraint r, the goal is to find a set \(\text {OPT}\subseteq \Omega \) with \(f(\text {OPT}) = \max _{A \subseteq \Omega , |A|\le r} f(A)\). Friedrich and Neumann [18, 34] were the first to study the performance of evolutionary algorithms on monotone submodular functions. In particular they showed that GSEMO, using a bi-objective fitness function, obtains a \((1-1/e)\)-approximation on any monotone submodular function with a single uniform matroid constraint in expected time \(O(n^2 (\log (n) + r))\).
The authors consider the fitness function \(g(x):= (z(x), |x|_0)\) where \(z(x) = f(x)\) if the solution x is feasible (i.e., \(|x|_1 \le r\)) and \(z(x) = -1\) otherwise. They use the concept of Pareto-dominance and define \(g(x) \ge g(y)\) if and only if \(((z(x) \ge z(y)) \wedge (|x|_0 \ge |y|_0))\) holds; in this case x (Pareto-)dominates y. They study the expected runtime of the GSEMO (see Algorithm 2) until a solution \(x^{*} = \text {arg max}_{x \in P} z(x)\) with \(f(x^{*})/\text {OPT}\ge \alpha \) is hit for the first time.
In the following we show that QD with \(p_m {:}{=}1/n\) also achieves an approximation ratio of \((1-1/e)\) in polynomial expected time \(O(n^2(\log (n) + r))\) using the function f itself as the fitness function.
Theorem 15
The expected time until QD operating on the 1-NoO feature space has found a \((1-1/e)\)-approximate solution for a monotone submodular function with a uniform cardinality constraint r is \(O(n^2 (\log (n) + r))\).
The proof follows the proof of Theorem 2 in [18] and the proof of the approximation quality of the deterministic greedy algorithm by Nemhauser et al. [35]. This is because the working principles of QD and GSEMO match. While the fitness function for GSEMO cleverly encodes the cardinality constraint as a second objective and uses Pareto-dominance to decide which solutions to keep, the 1-NoO feature space does the job for QD.
Proof of Theorem 15
By Theorem 1 the first cell of the map is populated with the solution \(0^n\) in \(O(n^2 \log n)\) steps in expectation. Since \(0^n\) is the only solution with n zeroes it will not be replaced.
More precisely, we show that once \(0^n\) is stored in the map, QD evolves solutions \(x_j\) for \(0 \le j \le r\) with approximation ratio
in expected time \(O(n^2r)\). The proof is by induction over j. As a base case we note that once the all zeroes string is in the map, Eq. (4) holds for \(x_0 = 0^n\) since
Now assume that Eq. (4) holds for every \(0 \le i \le j < r\); we show that it also holds for \(x_{j+1}\). To this end the algorithm needs to (i) select the \(x_j \in M\) with \(|x_j|_1 = j\) and (ii) add the element with the largest possible increase in f in a single mutation. Let \(\delta _{j+1}\) be the increase in fitness we obtain by this event. Then we get
where the first inequality follows from monotonicity of f and the second by submodularity. Rearranging terms we obtain
This leads to
The latter for \(j = r\) simplifies to
proving the claimed approximation ratio. Note that the required selection plus mutation event happens with probability at least \(\frac{1}{n+1} \cdot \frac{1}{en}\) where the first factor is the probability to select the respective parent and the second factor is a lower bound on the probability to greedily add the element with largest increase in fitness. Summing over all r iterations we obtain an expected runtime of \(\sum _{j=0}^{r-1} (n+1)ne = O(n^2r)\). Adding the initial time to hit \(0^n\) yields \(O(n^2\log n) + O(n^2r) = O(n^2(\log (n) + r))\) and completes the proof. \(\square \)
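The step QD must perform in each stage of the induction is exactly the greedy step of Nemhauser et al. For one concrete monotone submodular function, maximum coverage, this deterministic step looks as follows (the instance and all names are our own illustration):

```python
def greedy_max_coverage(sets, r):
    """The greedy step QD imitates: r times, add the set with the largest
    marginal coverage gain. Coverage is monotone and submodular, so the
    result is a (1 - 1/e)-approximation of the optimal r-subset."""
    chosen, covered = [], set()
    for _ in range(r):
        best = max(range(len(sets)), key=lambda i: len(sets[i] - covered))
        chosen.append(best)
        covered |= sets[best]
    return chosen, len(covered)
```

In QD the same step is realised stochastically: select the cell holding \(x_j\) (probability \(1/(n+1)\)) and flip exactly the bit of the element with the largest marginal gain (probability at least \(1/(en)\)).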
We see that QD achieves the same approximation ratio as GSEMO and it does so in the same expected runtime. This comes as little surprise, as the algorithms work similarly (population with at most \((n+1)\) non-dominated individuals versus a map with at most \((n+1)\) individuals) as pointed out before. However, QD is a much simpler algorithm. The fitness function is the set function itself and no bi-objective formulation is required (that role is played by the feature space). As a consequence, the algorithm does not need to perform dominance checks in every iteration. We are confident that many other results from [18] also hold for QD.
5 Minimum Spanning Forests
We now turn the focus to problem-tailored feature spaces and a well-known combinatorial optimisation problem. Consider an undirected graph \(G=(V,E)\) with \(n=|V|\) nodes, \(m = |E|\) edges, and a weight function \(w: E \rightarrow {\mathbb {N}}^{+}\) that maps each edge to a positive integer weight. Let \(C(G) \in \{1, \ldots , n\}\) be the number of connected components (CCs) of G. For a connected graph G (i.e., \(C(G) = 1\)) every acyclic sub-graph of G that contains all vertices is called a spanning tree (ST); let \({\mathcal {T}}\) be the set of all spanning trees of G. A spanning tree \(T^{*}\) with \(T^{*} = \text {arg min}_{T \in {\mathcal {T}}} \sum _{e \in T} w(e)\) is a minimum spanning tree (MST) of G. For an unconnected undirected graph G with \(C(G) > 1\) a spanning forest (SF) is a collection of C(G) STs (one for each connected component). A minimum spanning forest (MSF) is an SF composed of an MST for every CC.
We encode solution candidates as bit-strings x from the search space \(S=\{0,1\}^m\) where \(x_i = 1\) means that the ith edge is in the solution and \(x_i=0\) otherwise, given an arbitrary but fixed order of the edges \(e_1, \ldots , e_m\). The fitness of solution candidate x is—overloading the function w—defined as
Let cc(x) be the number of connected components of \(x \in S\).
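Both \(w(x)\) and \(cc(x)\) are cheap to evaluate for a given bit string; a union-find sketch with our own naming, representing each edge as a tuple (u, v, w):

```python
def cc_and_weight(n, edges, x):
    """Number of connected components and total weight of the subgraph
    on n nodes selected by bit string x; edges is a list of (u, v, w)."""
    parent = list(range(n))
    def find(u):
        while parent[u] != u:
            parent[u] = parent[parent[u]]  # path halving
            u = parent[u]
        return u
    weight = 0
    for bit, (u, v, w) in zip(x, edges):
        if bit:
            weight += w
            ru, rv = find(u), find(v)
            if ru != rv:
                parent[ru] = rv  # merge the two components
    return len({find(u) for u in range(n)}), weight
```

The pair \((cc(x), |x|_1)\) is exactly the CCE feature descriptor used below, while \(w(x)\) is the fitness to be minimised.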
The MST problem was studied for Randomised Local Search [19], ant colony optimisation [36] and simulated annealing [37, 38].
Neumann and Wegener [19] used a bi-objective formulation of the MST problem where the fitness \(f': S \rightarrow {\mathbb {R}}^2\) of an individual \(x \in S\) was defined by \(f'(x) = (cc(x),w(x))\) which is to be minimised in both objectives simultaneously. Since \(cc(x) \in \{1, \ldots , n\}\) the Pareto front has size n and thus the population size of SEMO and GSEMO is bounded from above by n. They showed that both SEMO and GSEMO, given an arbitrary initial search point, solve the bi-objective variant of the MST in expected time \(O(n^2m)\) if all edge weights are at most \(2^n\).
In the following we first consider QD operating on a two-dimensional feature space. The first dimension is defined by the number of CCs of a solution. Note that, when implementing the algorithm, one might want to map these values to a set \(\{1, \ldots , n-C(G)+1\}\) by applying some affine transformation, but for our proofs this is immaterial. The second dimension of M is defined by the number of edges used in the solutions. In consequence, the size of the map is \(L = (n-C(G)+1)(m+1)\). We term this the CCE feature space. We remark that some cells may remain empty forever since not all cells admit feasible solutions. In particular, \(C(G) \ge n-m\) since introducing an edge can only reduce the number of connected components by at most 1. Thus, all cells with \(C(G) < n-m\) are always empty. We shall ignore empty cells as they will not play a role for parent selection and we adapt our notation of cover times and optimal-cover times in an obvious way, to exclude cells that admit no feasible solutions. The only (mild) effect is that L is an overestimation of the number of non-empty cells.
Theorem 16
Let \(G=(V,E)\) be an undirected graph with n nodes, m edges, edge weights \(w: E \rightarrow {\mathbb {N}}^{+}\), and \(C(G) \in \{1, \ldots , n\}\) connected components. QD operating on the CCE feature space finds an MSF of G in expected time \(O(m^2(n-C(G))(n-C(G) + \log m))\).
Proof
The proof consists of two phases. We first need to add the empty edge set \(0^m\) (with n connected components) to the map. Once this event happened, we proceed with the construction of an MSF in the second phase. As the case \(m = 0\) is trivial, we assume \(m > 0\).
Phase 1: Let \(x \in M\) be the solution with the minimum number of edges and—among such solutions—with the maximum number of CCs. Dropping an edge from x brings us closer to the empty edge set and the solution is accepted in M since it decreases the minimum number of edges. Note that the number of CCs of the mutant may increase by exactly one or stay the same compared to x; in particular, the number of CCs is non-decreasing during this phase. If x has \(1 \le i \le m\) edges such an event happens with probability at least
Here we used that \(L = (n-C(G)+1)(m+1) \le 2(n-C(G)) \cdot 2m = 4(n-C(G))m\). Using artificial fitness levels, \(0^m\) is hit for the first time after at most
expected steps.
Phase 2: Recall Kruskal’s MST algorithm [39, 40]. It starts with the empty edge set and adds the cheapest edge that does not introduce a cycle to the solution until an MSF is found. This step is repeated \(n-C(G)\) times. Once \(0^m \in M\), QD on the CCE feature space can mimic this behaviour. Let \(x_t\) at time t denote the acyclic solution with the smallest number of CCs and minimum weight that is currently stored in the map. At the beginning of the process \(0^m\) fulfills these requirements. In order to locate the cheapest acyclic sub-graph with \(cc(x_t)-1\) CCs, QD needs to add the cheapest edge that reduces the number of CCs and does not yield a cycle (exactly as Kruskal’s algorithm proceeds in a deterministic manner). This event happens with probability at least \(1/(eLm) \ge \frac{1}{4em^2(n-C(G))}\). After \(n-C(G)\) such steps an MSF is found. Taking the reciprocal to obtain an upper bound on the expected waiting time for such a step, this phase ends after at most \((n-C(G))^24em^2 = O((n-C(G))^2m^2)\) expected iterations.
Adding up the expected times of both phases yields an upper bound of \(O((n-C(G))m^2 \log m + (n-C(G))^2 m^2) = O((n-C(G))m^2(n-C(G)+\log m))\).
\(\square \)
It follows that on a connected graph an MST is found by QD on the CCE feature space after \(O(m^2n^2)\) function evaluations.
Corollary 17
Let \(G=(V,E)\) be an undirected, connected graph with n nodes, m edges and edge weights \(w: E \rightarrow {\mathbb {N}}^{+}\). QD operating on the CCE feature-space finds an MST of G in expected time \(O(m^2n^2)\).
Since \(L = \Theta (nm)\), the probability of a successful mutation is quite low, though, yielding a runtime bound of \(O(n^6)\) on dense graphs with \(m = \Theta(n^2)\) edges.
In the following we drop the second dimension of the map and derive an upper bound for QD with the feature space defined solely via the number of CCs. Thus, the map size in this setting is \(L = n - C(G) + 1\). For each number of CCs we store the best-so-far solution in the map. We refer to this as the CC feature space.
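As an illustration, one iteration of QD on the CC feature space could be sketched as follows. This is a simplified sketch under our own naming: it assumes minimisation of total weight, standard bit mutation with rate \(1/m\), and that ties are broken in favour of the mutant.

```python
import random

def cc_count(n, edges, bitstring):
    """Number of connected components induced by the selected edges."""
    parent = list(range(n))
    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v
    comps = n
    for bit, (u, v, _) in zip(bitstring, edges):
        if bit and find(u) != find(v):
            parent[find(u)] = find(v)
            comps -= 1
    return comps

def qd_step(n, edges, M, fitness):
    """One QD iteration on the CC feature space: pick a uniform random cell,
    mutate its elite by standard bit mutation, and store the mutant in the
    cell given by its number of CCs if it is at least as good (minimisation)."""
    m = len(edges)
    parent = random.choice(list(M.values()))
    child = [b ^ (random.random() < 1.0 / m) for b in parent]
    cell = cc_count(n, edges, child)
    if cell not in M or fitness(child) <= fitness(M[cell]):
        M[cell] = child
    return M
```

Repeating `qd_step` maintains, for each number of CCs, the best edge selection seen so far, which is exactly the map the analysis below argues about.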
Theorem 18
Let \(G=(V,E)\) be an undirected graph with n nodes, m edges, edge weights \(w: E \rightarrow {\mathbb {N}}^{+}\), and \(C(G) \in \{1, \ldots , n\}\) connected components. QD operating on the CC feature space locates \(0^m\) (the empty edge set) in an expected time of \(O((n-C(G)+1)m\log (nw_{\max }))\) where \(w_{\max }\) is the maximum edge weight.
Proof
We follow well-established arguments for the analysis of the \((\mu + 1)\)-EA and GSEMO on MST. Let \(x_t\) denote a solution with minimum weight amongst all solutions stored by QD at time t. Note that \(w(x_t)\) is non-increasing over time. If there are k edges selected in \(x_t\), there are k distinct mutations flipping only a single selected edge and not flipping any unselected edges. As all these mutations, applied at the same time, would decrease the weight from \(w(x_t)\) to 0, the expected decrease in weight when applying one such mutation chosen uniformly at random is \(w(x_t)/k\). The event we condition on has probability \(k \cdot 1/m \cdot (1-1/m)^{m-1} \ge k/(em)\). In addition, the probability of selecting \(x_t\) as parent is at least \(1/(n-C(G)+1)\). Together, the expected decrease in weight in one iteration is at least \({\mathbb {E}}[w(x_t) - w(x_{t+1}) \mid w(x_t)] \ge \frac{1}{n-C(G)+1} \cdot \frac{k}{em} \cdot \frac{w(x_t)}{k} = \frac{w(x_t)}{em(n-C(G)+1)}\).
The weight of the initial solution is \(w_0 \le n w_{\max }\). Applying the multiplicative drift theorem [41] yields an upper bound of \(O((n-C(G)+1)m \log (n w_{\max }))\). \(\square \)
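The application of the multiplicative drift theorem can be spelled out as follows; this is a sketch of the standard argument, where \(\delta\) denotes the relative drift, \(w_0 \le n w_{\max}\) bounds the initial weight, and \(w_{\min} \ge 1\) holds because all weights are positive integers.

```latex
\delta := \frac{1}{em\,(n-C(G)+1)}, \qquad
{\mathbb{E}}[T] \;\le\; \frac{1 + \ln(w_0/w_{\min})}{\delta}
  \;\le\; em\,(n-C(G)+1)\bigl(1 + \ln(n w_{\max})\bigr)
  \;=\; O\bigl((n-C(G)+1)\, m \log(n w_{\max})\bigr).
```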
The following result follows for connected graphs where \(C(G) = 1\).
Corollary 19
On every connected graph \(G=(V,E)\) with n nodes, m edges and edge weights \(w: E \rightarrow {\mathbb {N}}^{+}\), QD operating on the CC feature space locates \(0^m\) (the empty edge set) in expected time of \(O(nm\log (nw_{\max }))\) where \(w_{\max } = \max _{e \in E} w(e)\) is the maximum edge weight.
The time bound matches the one for the expected time until the population of GSEMO contains \(0^m\) (see [19, Theorem 1]).
We remark that the preliminary version of this work published at GECCO 2023 [20] claims an upper bound of \(O(nm \log n)\) in its Theorem 6.1. This matches the above statement when all weights are polynomial in n. Regrettably, the proof in [20] is flawed, as mutations that add edges while also removing edges may increase the total number of edges. Hence, the argument used in [20] that QD may remove all edges by mutations flipping one bit is not sufficient. The authors thank Frank Neumann for pointing this out. To our knowledge, \(O(nm \log (n w_{\max }))\) is the best known upper bound on the expected time of the (1+1) EA and GSEMO for finding an empty selection of edges in the MST problem [19], along with a cruder bound of \(O(nm^2 \log n)\) due to Reichel and Skutella [42] that removes the dependence on weights. We conjecture that a time bound of \(O(nm \log n)\) holds for all weights, but we currently do not have a proof and therefore leave this as an open problem. The following theorem bounds the expected time until an MSF of G is found. In addition, the proof shows that optimal solutions are found for all cells, i.e., QD finds minimum-weight edge selections that induce i connected components for all values of \(i = C(G), \ldots , n\). We therefore obtain a bound on the expected optimal-cover time. An MSF corresponds to an optimal solution of cell \(i = C(G)\).
Theorem 20
Let \(G=(V,E)\) be an undirected graph with n nodes, m edges, edge weights \(w: E \rightarrow {\mathbb {N}}^{+}\), and \(C(G) \in \{1, \ldots , n\}\) connected components. The expected optimal-cover time of QD operating on the CC feature space and starting with \(0^m \in M\) is \(O((n-C(G)+1)nm)\). In particular, QD finds an MSF of G in expected time \(O((n-C(G)+1)nm)\).
Proof
For \(i = C(G), \ldots , n\) let \(x_i^{*}\) be a minimum-weight edge selection inducing i connected components, i.e., \(x_i^{*} \in \arg \min \{w(x) \mid cc(x) = i\}\).
It holds that \(x_n^{*}=0^m\) since \(cc(0^m)=n\) and \(0^m\) is the empty edge set; all edge weights are positive, so adding edges can only worsen the fitness. Clearly, \(x_{C(G)}^{*}\) is a minimum spanning forest of the input graph. Given \(x_i^{*}\) for \(i \in \{C(G)+1, \ldots , n\}\), we can reach \(x_{i-1}^{*}\) by injecting an edge of minimal weight into \(x_i^{*}\) that reduces the number of CCs by one or, put differently, that does not close a cycle. We call such a step a Kruskal-step as it is exactly the way Kruskal’s well-known MST algorithm builds an MST (see, e.g., [40]). Now, starting with \(x_n^{*}\), we can create the sequence \(x_{n-1}^{*}, x_{n-2}^{*}, \ldots , x_{C(G)+1}^{*}, x_{C(G)}^{*}\) by \(n-C(G)\) consecutive Kruskal-steps. Having \(x_i^{*} \in M\), the probability of a Kruskal-step is at least \(1/((n-C(G)+1)em)\) as we need to select the right cell and flip exactly one bit. In consequence, optimal solutions for all cells, including an MSF, are found after at most \((n-C(G))(n-C(G)+1)em = O((n-C(G)+1)^2m)\) steps in expectation. \(\square \)
Combining the results of Theorem 18 and Theorem 20 yields the following result.
Corollary 21
On every graph \(G=(V,E)\) with n nodes, m edges, edge weights \(w: E \rightarrow {\mathbb {N}}^{+}\) with \(w(e) \le 2^{O(n)}\) for all \(e \in E\), and \(C(G) \in \{1, \ldots , n\}\) connected components, QD operating on the CC feature space and starting with an arbitrary initial solution finds optimal solutions for all cells, including an MSF of G, in expected time \(O((n-C(G)+1)m\max \{n-C(G)+1,\log n\})\).
Corollary 22
On every connected graph \(G=(V,E)\) with n nodes, m edges and edge weights \(w: E \rightarrow {\mathbb {N}}^{+}\) with \(w(e) \le 2^{O(n)}\) for all \(e \in E\), QD operating on the CC feature space and starting with an arbitrary initial solution finds optimal solutions for all cells, including an MSF of G, in expected time \(O(n^2m)\).
For edge weights bounded from above by \(2^{O(n)}\) this matches the result by Neumann and Wegener for (G)SEMO [19].
6 Conclusion
We performed runtime analyses of a simple quality diversity algorithm termed QD in the context of pseudo-Boolean and combinatorial optimisation. For the number-of-ones feature space we showed an upper bound on the expected time until all L cells are covered, for all values of the granularity parameter k. We gave upper bounds on the expected optimisation time on OneMax, functions of unitation and monotone functions, for which the feature space is aligned favourably with the structure of the problem.
Applying a transformation to such problems may diminish this favourable alignment. On easy functions like OneMax, a general fitness-level bound shows that performance is unaffected by transformations, at least asymptotically. While Trap can be optimised in expected time \(O(n^2 \log n)\), after applying a worst-case transformation the expected time increases to \(\Theta ((1/p_m)^{\lfloor n/2 \rfloor }n)\), which is still better than the expected time of \((1/p_m)^{n}\) for a (1+1) EA by an exponential factor. We also saw that including two complementary search points like \(0^n\) and \(1^n\) can exponentially decrease the expected optimisation time on the hardest possible problems.
We showed that QD finds a \((1-1/e)\)-approximation for the task of monotone submodular function maximisation, given a single uniform cardinality constraint.
Finally, we considered the problem of finding a minimum spanning forest of edge-weighted graphs G with \(C(G) \in \{1, \ldots , n\}\) connected components. Here, a two-dimensional feature space spanned by the number of connected components and the number of edges in the solution yields a bound of \(O(m^2n^2)\) by imitating Kruskal’s well-known MST algorithm. However, we showed that QD using only the CC-based dimension yields a significantly better upper bound of \(O((n-C(G)+1)m\max \{n-C(G)+1,\log n\})\) for integer weights up to \(2^{O(n)}\). On connected graphs this runtime bound simplifies to \(O(n^2m)\). Interestingly, the working principles of QD are similar to those of GSEMO, while its implementation is easier and there is no need to use the concept of Pareto-dominance.
We see this paper as a starting point for a deep dive into the theoretical study of QD algorithms. One important question for future work is to investigate the alignment between the feature space and the problem structure in more depth, to see how the efficiency of QD is affected if there is a mismatch between the two. Straightforward extensions might consider further combinatorial optimisation problems, e.g., graph colouring, the study of different (multi-)dimensional feature spaces, and supplementary experiments. A first step in this direction was undertaken recently [43], where, in the context of monotone submodular function optimisation, the 1-NoO feature space was generalised by allowing users to add Boolean formulas to guide the optimisation process, leading to several overlaid problems being optimised in parallel. We also see different directions for methodological and algorithmic extensions, e.g., keeping multiple solutions per cell or combining classic population-based EAs with quality diversity approaches. Another promising research direction is identifying general relationships between QD and GSEMO.
Data Availability
No datasets were generated or analysed during the current study.
Notes
In multi-objective optimisation a search point x Pareto-dominates a solution y if x is no worse than y in all objectives and x is strictly better than y in at least one objective.
This worst case is likely to occur frequently on the function OneMax, when trying to reach cells of smaller index, and on the function −TwoMax with \(\textsc {TwoMax}(x):= \max \{\sum _{i=1}^n x_i, \sum _{i=1}^n (1-x_i)\}\).
The Pareto front F is the set of objective vectors for all Pareto-optimal solutions, that is, solutions that are not Pareto-dominated by any other search point in the search space.
References
Eiben, A.E., Smith, J.E.: Introduction to Evolutionary Computing, 2nd edn. Springer, Berlin (2015)
Deb, K.: Optimization for Engineering Design: Algorithms and Examples, 2nd edn. PHI Learning Private Limited, New Delhi (2012)
Preuss, M.: Multimodal optimization by means of evolutionary algorithms. In: Natural Computing Series. Springer International Publishing, Cham (2015). https://doi.org/10.1007/978-3-319-07407-8
Ulrich, T., Thiele, L.: Maximizing population diversity in single-objective optimization. In: Proceedings of the Genetic and Evolutionary Computation Conference (GECCO ’11), pp. 641–648. ACM, New York (2011). https://doi.org/10.1145/2001576.2001665
Chatzilygeroudis, K., Cully, A., Vassiliades, V., Mouret, J.-B.: Quality-diversity optimization: a novel branch of stochastic optimization. In: Pardalos, P.M., Rasskazova, V., Vrahatis, M.N. (eds.) Black Box Optimization, Machine Learning, and No-Free Lunch Theorems. Springer Optimization and Its Applications, pp. 109–135. Springer International Publishing, Cham (2021). https://doi.org/10.1007/978-3-030-66515-9_4
Mouret, J.-B., Clune, J.: Illuminating search spaces by mapping elites (2015). arXiv:1504.04909
Cully, A., Mouret, J.-B.: Evolving a behavioral repertoire for a walking robot. Evol. Comput. 24(1), 59–88 (2016). https://doi.org/10.1162/EVCO_a_00143
Bossek, J., Neumann, F.: Exploring the feature space of TSP instances using quality diversity. In: Proceedings of the Genetic and Evolutionary Computation Conference. GECCO ’22, pp. 186–194. ACM, New York (2022). https://doi.org/10.1145/3512290.3528851
Nikfarjam, A., Neumann, A., Neumann, F.: On the use of quality diversity algorithms for the traveling thief problem. In: Proceedings of the Genetic and Evolutionary Computation Conference. GECCO ’22, pp. 260–268. ACM, New York (2022). https://doi.org/10.1145/3512290.3528752
Nikfarjam, A., Viet Do, A., Neumann, F.: Analysis of quality diversity algorithms for the knapsack problem. In: Parallel Problem Solving from Nature (PPSN ’22), pp. 413–427. Springer-Verlag, Berlin (2022). https://doi.org/10.1007/978-3-031-14721-0_29
Neumann, F., Witt, C.: Bioinspired computation in combinatorial optimization: algorithms and their computational complexity. In: Natural Computing Series. Springer, Berlin (2010)
Rajabi, A., Witt, C.: Stagnation detection in highly multimodal fitness landscapes. In: Proceedings of the Genetic and Evolutionary Computation Conference. GECCO ’21, pp. 1178–1186. ACM, New York (2021). https://doi.org/10.1145/3449639.3459336
Friedrich, T., Oliveto, P.S., Sudholt, D., Witt, C.: Analysis of diversity-preserving mechanisms for global exploration. Evol. Comput. 17(4), 455–476 (2009). https://doi.org/10.1162/evco.2009.17.4.17401
Zheng, W., Liu, Y., Doerr, B.: A first mathematical runtime analysis of the non-dominated sorting genetic Algorithm II (NSGA-II). Proc. AAAI Conf. Artif. Intell. 36(9), 10408–10416 (2022). https://doi.org/10.1609/aaai.v36i9.21283
Do, A.V., Guo, M., Neumann, A., Neumann, F.: Analysis of evolutionary diversity optimisation for permutation problems. In: Proceedings of the Genetic and Evolutionary Computation Conference. GECCO ’21, pp. 574–582. ACM, New York (2021). https://doi.org/10.1145/3449639.3459313
Doerr, B., Jansen, T., Sudholt, D., Winzen, C., Zarges, C.: Mutation rate matters even when optimizing monotonic functions. Evol. Comput. 21(1), 1–21 (2013). https://doi.org/10.1162/EVCO_a_00055
Lengler, J., Steger, A.: Drift analysis and evolutionary algorithms revisited. Comb. Probab. Comput. 27(4), 643–666 (2018). https://doi.org/10.1017/S0963548318000275
Friedrich, T., Neumann, F.: Maximizing submodular functions under matroid constraints by evolutionary algorithms. Evol. Comput. 23(4), 543–558 (2015). https://doi.org/10.1162/EVCO_a_00159
Neumann, F., Wegener, I.: Minimum spanning trees made easier via multi-objective optimization. Nat. Comput. 5(3), 305–319 (2006). https://doi.org/10.1007/s11047-006-9004-x
Bossek, J., Sudholt, D.: Runtime analysis of quality diversity algorithms. In: Proceedings of the Genetic and Evolutionary Computation Conference (GECCO 2023), pp. 1546–1554. ACM Press, New York (2023). https://doi.org/10.1145/3583131.3590383
Doerr, B.: Theory of Evolutionary Computation: Recent Developments in Discrete Optimization, pp. 1–87. Springer (2020). Chap. Probabilistic Tools for the Analysis of Randomized Optimization Heuristics
Doerr, B., Le, H.P., Makhmara, R., Nguyen, T.D.: Fast genetic algorithms. In: Proceedings of the Genetic and Evolutionary Computation Conference (GECCO ’17), pp. 777–784. ACM Press, New York (2017). https://doi.org/10.1145/3071178.3071301
Friedrich, T., Quinzan, F., Wagner, M.: Escaping large deceptive basins of attraction with heavy-tailed mutation operators. In: Proceedings of the Genetic and Evolutionary Computation Conference. GECCO ’18, pp. 293–300. Association for Computing Machinery, New York (2018). https://doi.org/10.1145/3205455.3205515
Droste, S., Jansen, T., Wegener, I.: On the analysis of the (1+1) evolutionary algorithm. Theoret. Comput. Sci. 276(1–2), 51–81 (2002). https://doi.org/10.1016/S0304-3975(01)00182-7
Jägersküpper, J., Storch, T.: When the plus strategy outperforms the comma strategy and when not. In: Proceedings of the IEEE Symposium on Foundations of Computational Intelligence (FOCI), pp. 25–32. IEEE, Honolulu (2007). https://doi.org/10.1109/FOCI.2007.372143
Prügel-Bennett, A.: When a genetic algorithm outperforms hill-climbing. Theoret. Comput. Sci. 320(1), 135–153 (2004). https://doi.org/10.1016/j.tcs.2004.03.038
Lengler, J.: A general dichotomy of evolutionary algorithms on monotone functions. IEEE Trans. Evol. Comput. 24(6), 995–1009 (2020). https://doi.org/10.1109/TEVC.2019.2917014
Lengler, J., Zou, X.: Exponential slowdown for larger populations: the (\(\mu \)+1)-EA on monotone functions. Theor. Comput. Sci. 875, 28–51 (2021). https://doi.org/10.1016/j.tcs.2021.03.025
Kaufmann, M., Larcher, M., Lengler, J., Zou, X.: Self-adjusting population sizes for the (1, \(\lambda \))-EA on monotone functions. In: Rudolph, G., Kononova, A.V., Aguirre, H.E., Kerschke, P., Ochoa, G., Tusar, T. (eds.) Parallel Problem Solving from Nature (PPSN ’22). Lecture Notes in Computer Science, vol. 13399, pp. 569–585. Springer, Cham (2022). https://doi.org/10.1007/978-3-031-14721-0_40
Witt, C.: Tight bounds on the optimization time of a randomized search heuristic on linear functions. Comb. Probab. Comput. 22(2), 294–318 (2013). https://doi.org/10.1017/S0963548312000600
Covantes Osuna, E., Gao, W., Neumann, F., Sudholt, D.: Design and analysis of diversity-based parent selection schemes for speeding up evolutionary multi-objective optimisation. Theor. Comput. Sci. 832, 123–142 (2020). https://doi.org/10.1016/j.tcs.2018.06.009
Droste, S., Jansen, T., Wegener, I.: Upper and lower bounds for randomized search heuristics in black-box optimization. Theory Comput. Syst. 39(4), 525–544 (2006)
Wegener, I.: Methods for the analysis of evolutionary algorithms on pseudo-Boolean functions. In: Sarker, R., Yao, X., Mohammadian, M. (eds.) Evolutionary Optimization, pp. 349–369. Kluwer, New York (2002)
Friedrich, T., Neumann, F.: Maximizing submodular functions under matroid constraints by multi-objective evolutionary algorithms. In: Bartz-Beielstein, T., Branke, J., Filipič, B., Smith, J. (eds.) Parallel Problem Solving from Nature (PPSN ’14), pp. 922–931. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-10762-2_91
Nemhauser, G.L., Wolsey, L.A., Fisher, M.L.: An analysis of approximations for maximizing submodular set functions-I. Math. Program. 14(1), 265–294 (1978). https://doi.org/10.1007/BF01588971
Neumann, F., Witt, C.: Ant colony optimization and the minimum spanning tree problem. Theor. Comput. Sci. 411(25), 2406–2413 (2010). https://doi.org/10.1016/j.tcs.2010.02.012
Wegener, I.: Simulated annealing beats Metropolis in combinatorial optimization. In: Proceedings of the 32nd International Colloquium on Automata, Languages and Programming (ICALP ’05). LNCS, vol. 3580, pp. 589–601 (2005). https://doi.org/10.1007/11523468_48
Doerr, B., Rajabi, A., Witt, C.: Simulated annealing is a polynomial-time approximation scheme for the minimum spanning tree problem. In: Fieldsend, J.E., Wagner, M. (eds.) Proceedings of the Genetic and Evolutionary Computation Conference (GECCO ’22), pp. 1381–1389. ACM, New York (2022). https://doi.org/10.1145/3512290.3528812
Kruskal, J.B.: On the shortest spanning subtree of a graph and the traveling salesman problem. Proc. Am. Math. Soc. 7(1), 48–50 (1956)
Cormen, T.H., Leiserson, C.E., Rivest, R.L., Stein, C.: Introduction to Algorithms, 3rd edn. The MIT Press, Cambridge (2009)
Doerr, B., Johannsen, D., Winzen, C.: Multiplicative drift analysis. Algorithmica 64, 673–697 (2012)
Reichel, J., Skutella, M.: On the size of weights in randomized search heuristics. In: Proceedings of the Tenth ACM SIGEVO Workshop on Foundations of Genetic Algorithms (FOGA ’09), pp. 21–28. ACM, New York (2009). https://doi.org/10.1145/1527125.1527130
Schmidbauer, M., Opris, A., Bossek, J., Neumann, F., Sudholt, D.: In: Proceedings of the Genetic and Evolutionary Computation Conference (GECCO ’24). ACM Press (2024). To appear
Funding
Open Access funding enabled and organized by Projekt DEAL.
Author information
Contributions
Both J.B. and D.S. contributed equally. All authors reviewed the manuscript.
Ethics declarations
Competing interests
The authors have no competing interests to declare that are relevant to the content of this article.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
Cite this article
Bossek, J., Sudholt, D. Runtime Analysis of Quality Diversity Algorithms. Algorithmica 86, 3252–3283 (2024). https://doi.org/10.1007/s00453-024-01254-z