Introduction

A multi-objective optimization problem (MOP) can be stated as follows:

$$\begin{aligned} \left\{ \begin{aligned}&\mathrm{{minimize}} \quad F(x)={({f_1}(x),\ldots ,{f_m}(x))}^T\\&\mathrm{{subject \ to}} \quad x \in \varOmega \end{aligned}\right. \end{aligned}$$
(1)

where \(x = ({x_1},{x_2},\ldots ,{x_n})\) is called the decision vector and \(\varOmega \) is the feasible region in the decision space. \({R^m}\) is the m-dimensional objective space, and \(F:\varOmega \rightarrow {R^m}\) consists of m real-valued objective functions. Very often, since the objectives in (1) conflict with one another, no point in \(\varOmega \) minimizes all the objectives simultaneously. Hence, the goal is to find solutions that represent the best possible trade-offs among the objectives. The best trade-off among all the objectives is defined in terms of Pareto optimality. More precisely, a point \({x^*} \in \varOmega \) is Pareto optimal for (1) if there is no \(x \in \varOmega \) such that F(x) dominates \(F({x^*})\). The set of all such points is called the Pareto set (PS), and the set of all objective vectors corresponding to the PS is called the Pareto front (PF).
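Here dominance is the usual Pareto-dominance relation for minimization: F(x) dominates \(F({x^*})\) if and only if

$$\begin{aligned} f_i(x) \le f_i(x^*) \ \ \mathrm{for\ all}\ i \in \{1,\ldots ,m\}, \quad \mathrm{and} \quad f_j(x) < f_j(x^*) \ \ \mathrm{for\ at\ least\ one}\ j. \end{aligned}$$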

Multi-objective evolutionary algorithms (MOEAs) are well suited to this type of problem because they evolve a set of candidate solutions simultaneously, which allows them to find several members of the Pareto optimal set in a single run rather than performing a series of separate runs. Generally speaking, there are two critical components in MOEAs for solving MOPs: reproduction and selection. The former decides how offspring are generated, whereas the latter decides which individuals are kept for the next generation. Up to now, plenty of work has been reported on designing environmental selection criteria, and it can be divided into three categories: dominance-, decomposition-, and indicator-based MOEAs. The decomposition strategy is commonly used in traditional mathematical programming to solve MOPs. Zhang et al. [27] combined this approach with an evolutionary algorithm (EA) and proposed one of the most famous and popular MOEAs in the evolutionary computation community, the MOEA based on decomposition (MOEA/D).

MOEA/D provides a decomposition-based framework built on two critical concepts: decomposition and collaboration. It utilizes a set of weight vectors (also called reference vectors) to decompose an MOP into many single-objective optimization subproblems and optimizes them concurrently based on a neighborhood relationship. Compared with other algorithms, MOEA/D shows significant advantages in accelerating convergence and maintaining population diversity. Owing to its excellent performance on continuous MOPs, MOEA/D attracted researchers' attention as soon as it was proposed, and many variants have since been designed to further improve its performance or to solve specific types of MOPs. For instance, ENS-MOEA/D [30] ensembles different neighborhood sizes with online self-adaptation to obtain superior performance on the UF benchmark. EAG-MOEA/D [1] uses the nondominated sorting scheme of NSGA-II [2] to maintain an extra archive, which enhances the performance of MOEA/D on combinatorial optimization problems. In [3, 13, 23], efforts have been made to solve many-objective optimization problems (MaOPs) under the MOEA/D framework, with satisfactory results.

Another way to achieve better solutions for MOPs is to integrate powerful genetic operators (also called reproduction operators) into MOEA/D. There are many innovative strategies in the literature [7, 8, 16, 20], in which differential evolution (DE) variants or other powerful operators are proposed and embedded into MOEA/D to search the decision space more efficiently. In recent years, researchers have realized that different genetic operators have their own unique abilities and are suitable for different kinds of problems or different stages of the evolution. Combining complementary operators in an MOEA can benefit from their different search patterns and is more robust than using a single one. In [10], an MOEA/D with multiple DE operators was proposed that produces three offspring at each reproduction step, which improves the performance of MOEA/D significantly. The last decade has also witnessed the development of EAs with adaptive operator selection (AOS). Most of these involve two major tasks: credit assignment and operator selection. AOS methods originated in single-objective EAs [15, 22, 24], where the quality of an operator is easy to quantify by fitness value. When extended to multi-objective optimization (MO), it is natural to apply AOS within MOEA/D because it decomposes an MOP into a series of scalar optimization subproblems. In [9], a bandit-based AOS method was proposed that selects operators according to their recent performance. In [11], an adaptive operators-pool selection strategy and a parameter adaptation scheme were proposed for MOEA/D. The work in [6] combines fitness landscape analysis with a conventional AOS method for solving multi- and many-objective problems. Some researchers also treat AOS as a classification problem, in which the candidate solutions created by different operators are classified first and only those labeled as positive are kept as offspring. For example, Zhang et al. [25] proposed a classification-based preselection strategy (CPS), which uses KNN to filter newly generated solutions. By modifying and extending CPS to other MOEAs [26], they proposed MOEA/D-MO-CPS three years later. Lin et al. [12] used a support vector machine (SVM) model to preselect the trial solutions in MOEA/D and achieved better results.

This paper presents a novel classification tree based AOS method (CTAOS) for MOEA/D. The classification tree is used to predict the reproduction results that would be obtained by each operator under the given conditions, and the operator with the best predicted result is applied to generate the offspring each time. For this purpose, we record information about each newly generated offspring and label it with one of three categories according to its performance. This data set is then used to train a classifier at the end of each generation. Compared with other AOS methods, our operator selection scheme is driven by prediction rather than empirical experience. The positional relationship between parents in the decision space, which is often ignored by other AOS methods, is taken into consideration. A novel DE variant based on search inertia (SiDE) is developed as well to enhance the search ability of conventional DE and to shrink the size of the operators pool. With the integration of the aforementioned algorithmic components, we design a novel classification tree based AOS method for MOEA/D (MOEA/D-CTAOS). Preliminary results show the competitiveness of our proposal when compared with several state-of-the-art MOEA/D variants. Subsequently, MOEA/D-CTAOS is compared with other MOEAs with AOS to further confirm its effectiveness and superiority. The rationale and effectiveness of SiDE are also demonstrated through the experimental results.

The remainder of this paper is organized as follows: the second section introduces related work in the literature and the motivation of this study. The third section presents MOEA/D-CTAOS. The fourth section demonstrates the superiority and rationale of our proposal through several contrast experiments. The last section concludes this paper and introduces future work.

Related work and motivation

In this section, we give a brief introduction to classification tree training procedure and adaptive operator selection in EAs. The motivation of this study is discussed as well.

Classification tree

Classification and regression trees (CART) are statistical structures first proposed by Breiman et al. in 1984, and they have been well developed in the machine learning field since then. Compared with other classification and regression methods, e.g., k-nearest neighbour (KNN), neural networks (NN) and support vector machines (SVM), the most distinctive features of CART are its good visualization and explainability. The computational cost of CART is much lower than that of the other approaches, which makes it more suitable for being embedded into an MOEA. In the training phase, a set of training data is fed to the CART structure and rules for classifying the training data are extracted. The most critical step of CART training is splitting the data set into two parts according to one feature. This step is repeated until a complete binary tree is generated. There are several approaches for this task, among which the most frequently used one is Gini impurity. The procedure of CART training is described in Algorithm 1.

Algorithm 1 CART training procedure
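To make the splitting step concrete, the following minimal Python sketch (an illustration only, not the paper's implementation) computes the Gini impurity of a label set and searches for the best binary split threshold on a single numeric feature:

from collections import Counter

def gini(labels):
    """Gini impurity of a collection of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(values, labels):
    """Threshold on one numeric feature minimizing the weighted Gini
    impurity of the two resulting subsets."""
    n = len(values)
    best_t, best_score = None, float("inf")
    for t in sorted(set(values)):
        left = [l for v, l in zip(values, labels) if v <= t]
        right = [l for v, l in zip(values, labels) if v > t]
        if not left or not right:
            continue
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best_score:
            best_t, best_score = t, score
    return best_t, best_score

A full CART repeats this search over every feature, keeps the split with the lowest weighted impurity, and applies the same procedure recursively to the two resulting subsets until the tree is complete.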

Adaptive operator selection

As mentioned in the first section, it is well known that different genetic operators have their own characteristics and are suitable for different types of MOPs or different stages of the evolution. AOS is a paradigm for adaptively selecting the most appropriate operator to generate new offspring at each time point. This method involves two major tasks: credit assignment and operator selection. The former defines how to reward operators based on their recent performance, while the latter decides which operator should be applied next.

1. Credit assignment The most commonly used metric for credit assignment in the MOEA/D framework is based on the fitness improvement rates (FIRs) of newly generated offspring during the evolutionary process. Once the FIRs are calculated, each operator receives a reward according to certain rules. In [18], four of the most frequently used credit assignment strategies are discussed and compared through contrast experiments. A rank-based credit assignment scheme with a decaying mechanism was proposed in [9]. Besides, when AOS is applied to dominance-based MOEAs, the number of offspring surviving selection is treated as the most important factor in credit assignment, as in [4, 19, 21].

2. Operator selection Guided by the credit values, an operator is selected to generate offspring. Two strategies are most frequently used in the literature: probability- and bandit-based operator selection. The probability-based approach randomly selects an operator with a certain probability each time; the more credit an operator has earned, the more likely it is to be picked. The bandit-based method uses the upper confidence bound (UCB) algorithm to choose an operator deterministically at each time point.
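As a concrete illustration of the probability-based scheme (a generic sketch with our own naming, not code from the cited works), the roulette-wheel selection below picks an operator with probability proportional to its accumulated credit, with a small floor p_min so that no operator is ever completely starved:

import random

def select_operator(credits, p_min=0.05):
    """Probability-based operator selection: operator k is chosen with a
    probability proportional to its credit, bounded below by p_min."""
    K = len(credits)
    total = sum(credits)
    if total == 0:                       # no feedback yet: choose uniformly
        return random.randrange(K)
    probs = [p_min + (1.0 - K * p_min) * c / total for c in credits]
    r, acc = random.random(), 0.0
    for k, p in enumerate(probs):
        acc += p
        if r <= acc:
            return k
    return K - 1

A bandit-based alternative would instead return the operator maximizing its credit plus a UCB exploration term computed from how often each operator has been used.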

Motivation

From the literature review, we can see that conventional AOS methods focus on the design of credit assignment and operator selection strategies. These strategies choose the operator with the best reward at each time point. However, the positional relationship between parents in the decision space also determines the quality of the newly generated offspring, and this tends to be neglected in most existing AOS strategies. In fact, different operators are suitable for different positional relationships between parents. Some genetic operators use parents from very different subpopulations to accelerate convergence or maintain diversity, which increases the importance of this positional relationship. When conducting operator selection, the algorithm should take this relationship into consideration as well. A classification tree can classify data via several unrelated features, so it is feasible to predict the reproduction results produced by different operators under a given positional relationship between parents. Moreover, the operators pool in an MOEA should not be too large, because the algorithm may waste considerable computational resources exploring the ability of each operator, and the best one can then hardly be guaranteed to be selected often. A powerful genetic operator is therefore needed to shrink the size of the operators pool.

Fig. 1

The definition of search inertia. Individuals are searching along the direction of the previous generation

Proposed algorithm: MOEA/D-CTAOS

In this section, we present a novel classification tree and decomposition based MOEA with AOS. The proposed algorithm pays particular attention to the design of the novel operator and the dynamic operator selection scheme. The main framework of MOEA/D-CTAOS is introduced as well.

Fig. 2

Procedure of SiDE mutation and the relationship between the parents. Both objective functions are minimized

Differential evolution based on search inertia

Searching for solutions effectively and efficiently is the first priority when designing a genetic operator for EAs. Conventional genetic operators usually ignore historical information from the evolutionary process, which limits their search efficiency. It is in the nature of an EA that survival pressure forces individuals in the decision space to move towards the true PS; the movements of individuals may therefore provide information for estimating future search directions. Intuitively, the most effective strategy is to search along the direction taken in the previous generation rather than to search randomly. As can be seen in Fig. 1, the search inertia is defined by the following formula:

$$\begin{aligned} v_i = x_i + \mathrm{IC} \cdot \left( x_i - x_{i}^p \right) \end{aligned}$$
(2)

where \(v_i\) denotes a potentially better solution in the decision space, i is the index of the individual at the current generation, and \(x_{i}^p\) represents the main parent (the donor vector) of this individual. IC denotes the inertial coefficient. Although the exact best search direction is unavailable, this strategy can still provide hints for estimating a promising search direction.

Exploration and exploitation are two of the most important aspects when assessing the performance of a genetic operator. It is difficult to enhance exploitation without losing exploratory ability, as if the two were in conflict with each other. As mentioned above, search inertia helps to guide the future search direction. However, note that if the difference between \(x_{i}\) and \(x_{i}^p\) becomes smaller and smaller, the solutions may easily be trapped in local optima, especially in the final stage of the evolution. Meanwhile, due to the fixed movement trajectory, stagnation of the evolutionary process may easily occur. Thus, a purely direction-guided technique cannot guarantee satisfactory results. To address this issue, we utilize search inertia in a special way: as shown in Fig. 2, individuals are inspired by the search inertia of their neighbors and use this information to generate offspring. The neighborhood relationship helps to exchange information within a limited region and injects additional randomness into the evolution, which helps prevent the loss of diversity. In MOEA/D, the neighborhood relationship is already defined by the similarity between weight vectors, so no additional neighborhood definition is needed when SiDE is applied within MOEA/D.

In SiDE, there are three individuals involved in the mutation. One provides information on estimating the promising search direction via search inertia, and the others conduct conventional differential mutation. The proposed SiDE mutation strategy can be described by the following formula:

$$\begin{aligned} v_i = x_i + \mathrm{IC} \cdot \left( x_\mathrm{rn1} - x_\mathrm{rn1}^p \right) + F \cdot \left( x_\mathrm{rn2} - x_\mathrm{rn3}\right) \end{aligned}$$
(3)

where \(x_\mathrm{rn1}\), \(x_\mathrm{rn2}\), and \(x_\mathrm{rn3}\) are three parents randomly selected from the neighborhood, and \(x_\mathrm{rn1}^p\) is the main parent of \(x_\mathrm{rn1}\), which to some extent represents the state of \(x_\mathrm{rn1}\) at the previous generation. The newly generated donor vector \(v_i\) then undergoes crossover and mutation to produce the trial solution.
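To make formula (3) concrete, the sketch below (Python/NumPy, assuming real-valued decision vectors and that each neighbor's previous-generation position is stored alongside its current one) generates a SiDE donor vector; boundary handling and the subsequent crossover and mutation are omitted:

import numpy as np

def side_donor(x_i, neighbors, prev_pos, IC=0.5, F=0.5, rng=np.random):
    """SiDE donor vector of Eq. (3): x_i plus a search-inertia term from one
    randomly chosen neighbor and a DE difference term from two others.
    neighbors: current decision vectors in the neighborhood (list of arrays)
    prev_pos:  previous-generation vectors aligned with `neighbors`"""
    rn1, rn2, rn3 = rng.choice(len(neighbors), size=3, replace=False)
    inertia = IC * (neighbors[rn1] - prev_pos[rn1])      # search inertia
    difference = F * (neighbors[rn2] - neighbors[rn3])   # conventional DE term
    return x_i + inertia + difference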

Classification based operator selection

The operators pool in MOEA/D-CTAOS consists of three genetic operators:

  1. 1.

    DE/rand/1: \(v_i = x_i + F \cdot (x_\mathrm{rn1} - x_\mathrm{rn2}) \)

  2. 2.

    DE/rand/2: \(v_i = x_i + F \cdot (x_\mathrm{rn1} - x_\mathrm{rn2}) + F \cdot (x_\mathrm{rn3} - x_\mathrm{rn4})\)

  3. 3.

    SiDE: \(v_i = x_i + \mathrm{IC} \cdot (x_\mathrm{rn1} - x_\mathrm{rn1}^p) + F \cdot (x_\mathrm{rn2} - x_\mathrm{rn3})\)

The "DE/rand/1" mutation strategy is the most frequently used one in MOEAs, where the parents are randomly selected from the entire population or from a small region. Consequently, it randomly chooses a new search direction each time, without preferring any specific direction. "DE/rand/2" employs five parents, which may lead to a larger perturbation than "DE/rand/1". SiDE is an enhanced version of the conventional DE operator, which favors searching along the direction of the previous generation while performing some random search to explore other areas of the decision space, as described in the previous section. A classification based AOS method is proposed and applied in MOEA/D-CTAOS. It mainly involves three procedures: training data set collection, classifier building, and operator selection. The details of these steps are introduced as follows.

Data collection

The training data set is collected from historical information about offspring. Three issues need to be considered. The first is data labeling. It is not feasible to label the data directly with the exact quality of the offspring, even though it is a single scalar value, because regression is much more complex than classification. Instead, we only need to record the quality level of the offspring so that the reproduction results created by different operators can be distinguished. The quality level of the solutions is defined by sorting the FIRs at the end of each generation, and each level contains the same number of solutions. Since there are three operators in the pool, at least three classes are needed to label the data, i.e., \(l=\{Q_1,Q_2,Q_3\}\), with \(Q_1\) denoting the best and \(Q_3\) the worst quality level.

The second issue is the data features. The genetic operator and the positional relationship between parents both influence the performance of the offspring. In differential mutation, the difference between parents in the decision space directly determines how the offspring is generated. In this study, we use the Manhattan distance to describe this relationship in the decision space. Given two parents \(x_1\) and \(x_2\), the Manhattan distance between them is defined by the following formula:

$$\begin{aligned} d=\sum \limits _{i = 1}^n {\left| x_1^i - x_2^i \right| } \end{aligned}$$
(4)

where n denotes the dimension of the decision space. Compared with recording the solution x directly, this strategy compresses the number of features, especially when n is very large. There are six parents involved in reproduction, but only eight kinds of distance need to be recorded according to the mutation strategies in the operators pool, as illustrated in Fig. 3. By adding the index of the operator that produced the solution, we obtain the feature vector \(x=\{d_1,d_2,\ldots ,d_8,op\}\) of one piece of data.

Fig. 3

The eight kinds of distance between parents involved in the mutation. Manhattan distance is used to describe the positional relationship in the decision space
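A minimal sketch of how one training sample could be assembled (the specific eight parent pairs follow Fig. 3 and are simply passed in here as a list `pairs`, a hypothetical argument; only the Manhattan distance of Eq. (4) and the operator index are taken from the text):

import numpy as np

def manhattan(x1, x2):
    """Eq. (4): Manhattan distance between two decision vectors."""
    return float(np.sum(np.abs(np.asarray(x1) - np.asarray(x2))))

def make_feature(pairs, op_index):
    """Build one feature vector x = {d1, ..., d8, op} from the eight parent
    pairs involved in the mutation and the index of the operator used."""
    return [manhattan(a, b) for a, b in pairs] + [op_index]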

Fig. 4

Illustration of the data structure of evolutionary information and the FIFO sliding window

The third issue is data set update. In MOEA/D-CTAOS, when a new solution is generated and has successfully replaced some other solutions in the neighborhood, the corresponding evolutionary information, along with the quality level of the offspring, is recorded into a sliding window, as illustrated in Fig. 4. The sliding window is organized as a first-in, first-out (FIFO) queue, which only memorizes the most recent data. New solutions easily replace old ones at the beginning of the evolution, but this becomes difficult towards the end. It may then be hard to label offspring with different quality levels, since most offspring achieve no fitness improvement. For this reason, the sliding window only records information on newly generated offspring that have a positive FIR. The size of the sliding window (which also determines the training data set size, hereafter called TS) is set by the user. Once the sliding window is filled, it is fed to the classifier training at the end of the generation.
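The FIFO sliding window of Fig. 4 can be realized with a bounded double-ended queue; the field layout below is illustrative rather than taken from the paper:

from collections import deque

TS = 4000                         # training data set size (user-defined)
window = deque(maxlen=TS)         # oldest entries are dropped automatically

def record(features, quality_level, fir):
    """Store one offspring's evolutionary information, but only if it
    achieved a positive fitness improvement rate (FIR)."""
    if fir > 0:
        window.append((features, quality_level))

def window_full():
    return len(window) == TS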

Classifier training

Classification aims to find a classifier that reveals the relationship between features and labels based on the existing data points, and thus predicts the category of a new data point from its features. Which classifier should be used and how to train it with the given data set are two critical issues in this field. In our case, CART is chosen as the classifier to predict the reproduction results, for the following two main reasons:

1. The features in our case, i.e., \(x=\{d_1,d_2,\ldots ,d_8,\mathrm{op}\}\), may be unrelated to each other. For example, the index of the operator has nothing to do with the Manhattan distances. CART can classify data using several unrelated features, whereas the other approaches may not be suitable for this task.

2. When using classification in an MOEA, additional time consumption is inevitable. In our AOS method, the classifier is trained at every generation, so it is important to use a computationally friendly approach. Among the common classification approaches, CART has the lowest time complexity, with \(O(\log S)\) for prediction and \(O(DS^2\log S)\) for training (D and S denoting the number of features and the data set size respectively), yet it is still able to guarantee satisfactory results overall. In our case, three predictions are made per subproblem, and \(S = \mathrm{TS}\), \(D = 9\) for training at each generation. Thus, the total additional computational cost in MOEA/D-CTAOS is \(O(G \times (N+\mathrm{TS}^2\log \mathrm{TS}))\), where N represents the population size and G the maximal number of generations.

Based on the above two reasons, CART is used to conduct classification in our proposal. The procedure of CART training is described in Algorithm 1 in the ”Related work and motivation” section.
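With the sliding window filled, training the CART at the end of a generation reduces to one call to an off-the-shelf decision-tree learner; the sketch below uses scikit-learn's Gini-based tree as a stand-in for Algorithm 1 (the paper's own implementation may differ in its stopping and pruning rules):

from sklearn.tree import DecisionTreeClassifier

def train_classifier(window):
    """Fit a CART on the sliding-window data: the inputs are the 9-dimensional
    feature vectors {d1, ..., d8, op} and the targets are the quality levels
    Q1/Q2/Q3 (encoded, e.g., as 1, 2, 3)."""
    X = [features for features, label in window]
    y = [label for features, label in window]
    return DecisionTreeClassifier(criterion="gini").fit(X, y)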

Operator selection

The operator selection in CTAOS is based on the predicted reproduction results, as shown in Algorithm 2. In Lines 2–5, the CART predicts the quality of the reproduction each operator would provide for the given parents. Subsequently, the index of the operator predicted to produce the best quality level is returned in Line 6. If more than one operator is predicted to produce the best result, one of them is selected at random, as described in Lines 7–8.

Compared with other AOS methods, our approach is based on prediction and does not generate any candidate solutions for preselection, which avoids unnecessary computational cost in reproduction and also saves function evaluations. Meanwhile, for the first time, the positional relationship between parents is taken into consideration when conducting operator selection.

Algorithm 2 Operator selection in CTAOS
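The selection step of Algorithm 2 can then be sketched as follows (a simplified illustration, not the paper's exact pseudocode): the trained tree predicts a quality level for each candidate operator given the distances of the already chosen parents, and ties for the best predicted level are broken at random.

import random

def select_by_prediction(tree, distances, n_ops=3):
    """Predict the quality level each operator would achieve for the given
    parent distances d1..d8 and return the index of the best operator."""
    preds = [tree.predict([distances + [op]])[0] for op in range(n_ops)]
    best = min(preds)        # with levels encoded 1 (Q1, best) .. 3 (Q3, worst)
    candidates = [op for op, p in enumerate(preds) if p == best]
    return random.choice(candidates)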

Integration of CTAOS with MOEA/D

The purpose of this study is to improve the performance of MOEA/D by replacing its original genetic operator. Several MOEA/D variants have been proposed in the literature. In this paper, we select MOEA/D with dynamical resource allocation (MOEA/D-DRA [29]), which won the CEC 2009 MOEA contest, as the base framework of our algorithm. Algorithm 3 shows the pseudocode of MOEA/D-CTAOS. Note that TS is the only extra parameter of the algorithm, which makes our proposal more convenient to apply in practice. We would like to make the following comments on Algorithm 3.

  • SW is the abbreviation of sliding window.

  • Line 1: The initial solution at each subproblem is randomly sampled from the feasible region in the decision space. The weight vectors of the subproblems are randomly distributed.

  • Line 5: The neighborhood of each subproblem consists of the T subproblems whose weight vectors are closest to its own. Once the neighborhood is defined, it does not change during the subsequent process.

  • Line 8: Each subproblem may not contribute equally to solving the problem. At each subgeneration, subproblems are selected by 10-tournament selection according to their utility \(\pi ^i\).

  • Line 11: Parents are selected locally or from the whole population with a certain probability, to enhance population diversity.

  • Line 17–22: The genetic operator is selected at each time point with the CTAOS method shown in Algorithm 2.

  • Line 38–40: Only the evolutionary information of newly generated offspring with a positive FIR is recorded into the sliding window.

  • Line 43–45: Once the sliding window is filled, it is fed to the classifier training at the end of each generation.

  • Line 47: If gen is a multiple of 50, the utility of each subproblem is updated. The value 50 is set as recommended in [29].

Algorithm 3 Pseudocode of MOEA/D-CTAOS

Empirical study

In this section, we evaluate the overall performance of MOEA/D-CTAOS and the effectiveness of each algorithmic component through several experiments. Our empirical study can be divided into five parts:

  1. 1.

    The "Comparison with MOEA/D variants" section investigates the overall performance of MOEA/D-CTAOS by comparing it with four other representative MOEA/D variants, namely MOEA/D-MO [10], MOEA/D-AWA [14], MOEA/D-STM [5] and ENS-MOEA/D [30].

  2. 2.

    The "MOEA/D-CTAOS versus other AOS-based MOEAs" section compares our proposed algorithm with other MOEA/D variants based on different AOS methods, to confirm the effectiveness of CTAOS.

  3. 3.

    The proposed MOEA/D-CTAOS utilizes three operators instead of a single one for reproduction. To clarify the benefits of adaptive operator selection, the "Operators pool versus single DE operator" section compares MOEA/D-CTAOS with three other versions of the algorithm, each using only a single genetic operator.

  4. 4.

    Search inertia is a critical concept in the proposed SiDE. To verify the rationale and effectiveness of this novel operator, the "Effectiveness of search inertia" section designs a contrast experiment to investigate how search inertia affects the evolutionary process.

  5. 5.

    The size of the training data set is an important control parameter for the classifier. The "Parameter sensitivity analysis" section conducts a sensitivity analysis on it.

Environments, benchmarks, and parameters

All the experiments in this study were run on PlatEMO [17], an open-source MATLAB-based platform for evolutionary multi-objective optimization. An Intel Core i5 machine with 8 GB RAM and a 3.0 GHz clock speed was used to conduct the experiments. MATLAB 2020b was used for coding and simulation.

A large number of benchmark problems have been proposed in the literature to assess the performance of MOEAs. Among them, the most frequently used ones [7, 9] for testing genetic operators are the LZ [7] and UF [28] benchmark families. The LZ and UF test instances cover varied types of MOPs in terms of separability, bias, modality, and shape of the PF, and all of them are unconstrained minimization problems. Both benchmarks have very complex PSs in the decision space, which makes them difficult to solve. The parameter settings in our empirical study are listed as follows:

  1. 1.

    Control parameters in DE, SBX, and polynomial mutation:

    • \({C_\mathrm{r}} = 1.0\), \(K=0.5\), IC = 0.5 and \(F = 0.5\) in the DE variants.

    • \({C_\mathrm{r}} = 1.0\) and \(\eta = 20\) in the SBX.

    • \(\eta = 20\) and \({P_\mathrm{m}} = 1/n\) in the polynomial mutation.

  2. 2.

    Control parameters in MOEA/D framework:

    • Neighborhood size: \(T=20\)

    • Maximum number of solutions replaced by each offspring: \(n_\mathrm{r}=2\)

    • The probability of choosing parents locally: \(\delta = 0.9\)

    • Decomposition method: Tchebycheff approach (the scalarizing function is recalled after this list)

  3. 3.

    Number of runs, population, and stop condition: Each algorithm is run 30 times independently on each test instance. The population size is set to 600 for bi-objective problems (UF1-7, F1-F5 and F7-9) and to 1000 for tri-objective problems (UF8-10 and F6). The algorithm stops when the maximal number of function evaluations (MAX-EV) is reached. MAX-EV is set to 300,000 for the UF and LZ6 benchmarks, and to 150,000 for the rest.

  4. 4.

    Other control parameters: Extra control parameters of the other algorithms are listed separately in the following sections. In MOEA/D-CTAOS, the training data set size is set to 4000 except in the "Parameter sensitivity analysis" section.
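For reference, the Tchebycheff approach mentioned above is the standard scalarizing function of MOEA/D: given a weight vector w and the ideal point \(z^*\), each subproblem minimizes

$$\begin{aligned} g^{te}(x \mid w,z^*) = \max \limits _{1 \le j \le m} \left\{ w_j \left| f_j(x) - z_j^* \right| \right\} \end{aligned}$$

and the FIR of an offspring is typically measured on these scalarized values.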

Table 1 Average and standard deviation IGD values obtained by MOEA/D variants on each benchmark (30 runs)

Performance metric

The inverted generational distance (IGD) and hypervolume (HV) are taken as the performance metrics in our experiments. We give a brief introduction to them below.

  • IGD is defined as the following formula:

    $$\begin{aligned} \mathrm{IGD}(P^*,P) = \frac{\sum _{v \in P^*} d(v,P)}{\left| P^* \right| }\end{aligned}$$
    (5)

    where P is the approximation to the PF produced by an algorithm, and \({P^*}\) is a set of uniformly distributed points along the PF in the objective space. d(v, P) is the minimum Euclidean distance between v and the points in P. If \(\left| {{P^*}} \right| \) is large enough, IGD can measure both the convergence and the diversity of the population. In our experiments, \({P^*}\) consists of 10,000 evenly distributed points sampled along the PF of each test instance.

  • HV is defined as follows:

    $$\begin{aligned} \mathrm{HV} = \mathrm{volume} \left( \bigcup \limits _{y \in S} [{y_1},y_1^*] \times \cdots \times [{y_m},y_m^*] \right) \end{aligned}$$
    (6)

    where \({y^*} = \left( y_1^*,y_2^*,\ldots ,y_m^* \right) \) is an anti-optimal reference point which is dominated by all the solution vectors in the objective space, and S is the obtained approximation to the PF in the objective space. HV is then the volume of the region dominated by S and bounded by \(y^*\). In our experiments, after normalization we use \({y^*} = (1,1)\) for bi-objective problems and \({y^*} = (1,1,1)\) for tri-objective problems.

Generally speaking, the lower the IGD value, the better the algorithm performs; for HV, higher is better.
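A direct implementation of Eq. (5) reads as follows (a short illustrative Python sketch; PlatEMO, on which the experiments were run, provides its own metric implementations):

import numpy as np

def igd(pf_ref, approx):
    """Eq. (5): mean distance from each reference point in P* (pf_ref)
    to its nearest point in the obtained approximation P (approx)."""
    pf_ref = np.asarray(pf_ref, dtype=float)   # shape (|P*|, m)
    approx = np.asarray(approx, dtype=float)   # shape (|P|,  m)
    dists = np.linalg.norm(pf_ref[:, None, :] - approx[None, :, :], axis=2)
    return float(dists.min(axis=1).mean())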

Table 2 Average and standard deviation HV values obtained by MOEA/D variants on each benchmark (30 runs)

Comparison with MOEA/D variants

We first compare MOEA/D-CTAOS with four other MOEA/D variants, namely MOEA/D-MO, MOEA/D-AWA, MOEA/D-STM and ENS-MOEA/D, to assess its competitiveness. MOEA/D-MO replaces SBX with three DE operators to produce better solutions. MOEA/D-AWA uses an adaptive weight vector adjustment strategy to tackle MOPs with complex PFs. MOEA/D-STM introduces a stable matching model to coordinate the selection process in MOEA/D. ENS-MOEA/D ensembles different neighborhood sizes and tunes them in an online manner. In summary, the four comparative algorithms improve MOEA/D from the aspects of the genetic operator, the weight vectors, the selection scheme, and the neighborhood size, respectively. The other control parameters, which are not listed in the "Environments, benchmarks, and parameters" section, are set as recommended in the literature:

  1. 1.

    MOEA/D-AWA with ratio of updated weight vectors \(\mathrm{rate\_update\_weight}=0.05\); ratio of iterations to evolve with only MOEA/D \(\mathrm{rate\_evol} = 0.8\); iteration interval of utilizing AWA \(wag=100\).

  2. 2.

    ENS-MOEA/D with set of neighborhood sizes NS = {25, 50, 75, 100}; learning period LP = 50.

Except for MOEA/D-MO, all the algorithms are based on the framework of MOEA/D-DRA, which means that the differences between them come only from their unique strategies. Tables 1 and 2 present the performance of each algorithm in terms of IGD and HV values respectively, with the best entries highlighted in boldface. In order to draw statistically grounded conclusions, Wilcoxon's rank-sum test at a 0.05 significance level is used to assess the significance of the difference between the results of two competing algorithms. "+", "−", and "=" denote that the results produced by the other algorithm are better than, worse than, or statistically equal to those of MOEA/D-CTAOS, respectively.

From the tables, it is clear that MOEA/D-CTAOS achieves the best results on 8 out of 19 problems in terms of IGD, and on 9 out of 19 problems in terms of HV. In fact, MOEA/D-CTAOS wins 110 out of 152 pairwise comparisons when compared with each algorithm independently on each instance in terms of IGD and HV. To observe the evolutionary process more clearly, we plot the runtime performance (IGD versus the number of function evaluations) on six problems in Fig. 5. It is evident that the proposed MOEA/D-CTAOS converges much faster than the other algorithms except on LZ8. Note also that the first data point in each chart represents the IGD value after a small number of evaluations rather than at initialization. From the above comparison, we can conclude that MOEA/D-CTAOS is competitive with the other representative MOEA/D variants in general.

On the LZ problems, MOEA/D-CTAOS produces significantly better results than the other algorithms, which shows that the AOS method is well suited to MOPs with complex PSs. On the tri-objective problems UF8-10 and LZ6, MOEA/D-CTAOS yields promising results, even though it does not perform best on all of them. This indicates the potential of extending CTAOS to many-objective optimization, which will be a research issue for our future work.

Fig. 5

The average IGD values in 30 runs produced by five MOEA/D variants at each time point

MOEA/D-CTAOS versus other AOS-based MOEAs

In this section, we compare MOEA/D-CTAOS with other state-of-the-art decomposition-based algorithms with AOS methods to demonstrate the effectiveness of CTAOS and the proposed genetic operator. Three representative algorithms are selected for comparison, namely MOEA/D-FRRMAB [9], MOEA/D-CDE [11] and MOEA/D-MO-CPS [26]. MOEA/D-FRRMAB uses a bandit-based AOS method to select an operator at each time point based on the operators' recent performance. MOEA/D-CDE combines several operators pools with composite operators, dynamically selects the pool used to evolve the population, and tunes the scaling factor F of the DE variants. MOEA/D-MO-CPS uses a classification based preselection (CPS) strategy to filter the candidate solutions created by different operators, and only those with a positive label are kept as offspring. The additional parameter settings are listed as follows:

  1. 1.

    MOEA/D-FRRMAB with scaling factor in bandit-based operator selection \(C=5\); size of sliding window \(W=N/2\); decaying factor in calculating credit value \(D=1\);

  2. 2.

    MOEA/D-CDE with scaling factor in bandit-based operator selection \(C=5\); size of sliding window \(W=N/2\);

Another issue is whether an intelligent AOS method truly benefits the algorithm. For this purpose, a counterpart of MOEA/D-CTAOS that selects among the three operators uniformly at random (i.e., with CTAOS removed), hereafter called MOEA/D-Uniform, is added to this experiment. The five algorithms involved in the experiment are all based on the MOEA/D framework, so differences caused by the underlying framework are eliminated.

As can be seen from Tables 3 and 4, MOEA/D-CTAOS outperforms the other MOEA/D variants with different AOS methods. Compared with MOEA/D-Uniform, MOEA/D-CTAOS wins on 17 out of 19 test instances. From these results, we can conclude that the intelligent AOS method does drive the operator selection process in a more effective way. The second-best algorithms are MOEA/D-CDE and MOEA/D-MO-CPS. MOEA/D-CDE benefits from its adaptive parameter scheme for the DE variants, which will also be a research issue for our further investigation. MOEA/D-MO-CPS uses a preselection strategy to filter the new solutions, which is equivalent to conducting operator selection.

Table 3 Average and standard deviation IGD values obtained by MOEA/D variants with different AOS methods on each benchmark (30 runs)
Table 4 Average and standard deviation HV values obtained by MOEA/D variants with different AOS methods on each benchmark (30 runs)

In order to visualize the evolutionary process of the five algorithms, we plot the runtime performance on six problems, as shown in Fig. 6. Besides, the effectiveness of each operator and the visualization of operator selection are also of interest, so the usage times of each operator at different evolutionary phases are plotted in Fig. 7. As can be seen in the figure, the usage of each operator differs considerably across the six benchmarks. On some of them, i.e., UF1, UF8 and LZ3, the operator usage changes sharply across the stages of the evolution, which verifies the effectiveness of each operator and of our AOS approach.

Fig. 6

The average IGD values in 30 runs produced by five MOEAs with different AOS methods at each time point

Fig. 7

The average usage times of each operator in 30 runs at different evolutionary phases

Table 5 Average and standard deviation runtime values obtained by MOEA/D variants with different AOS methods on each benchmark (30 runs)
Table 6 Average and standard deviation IGD values obtained by four versions of MOEA/D-CTAOS on each benchmark (30 runs)
Table 7 Average and standard deviation HV values obtained by four versions of MOEA/D-CTAOS on each benchmark (30 runs)
Fig. 8

The average IGD values in 30 runs produced by two algorithms at each time point

Fig. 9

Boxplots of MOEA/D-CTAOS on UF and LZ benchmarks with different training data set size. The five versions of MOEA/D-CTAOS, namely v1, v2, v3, v4 and v5, denote \(TS=1000, 2000, 3000, 4000\) and 5000 respectively

The satisfactory results produced by MOEA/D-CTAOS benefit from its principled consideration of the positional relationship between parents and from its powerful SiDE operator. However, the outstanding performance of MOEA/D-CTAOS does not come for free. Table 5 shows the average and standard deviation of the runtime of each MOEA/D variant with a different AOS method over 30 runs. As can be seen in the table, compared with MOEA/D-Uniform, extra computational cost is always incurred no matter which AOS method is employed. The classification-based approaches seem to cost more computational resources than the other AOS methods, in exchange for better performance. Compared with the KNN used in CPS, the CART classifier is more computationally friendly, which supports our earlier analysis in the "Classifier training" section.

Operators pool versus single DE operator

In this section, we would like to determine what benefits are obtained when an operators pool is adopted as the search engine of an MOEA instead of a single operator. For this purpose, we derived three versions of MOEA/D-CTAOS, each using only one genetic operator (DE/rand/1, DE/rand/2 or SiDE), and compare their performance with that of the original algorithm on the UF and LZ test suites.

From Tables 6 and 7, it is evident that MOEA/D-CTAOS beats all the single-operator variants on 11 out of 19 MOPs in terms of IGD and on 13 out of 19 in terms of HV. On UF1-2, UF4, UF7, UF9, LZ1 and LZ6, although MOEA/D-CTAOS fails to achieve the best results, it is still the second-best algorithm on these problems. The second-best algorithm overall is MOEA/D-CTAOS with SiDE only, which wins 4 and 2 times in terms of IGD and HV respectively. The other two conventional operators, i.e., DE/rand/1 and DE/rand/2, cannot guarantee satisfactory results when searching the decision space. The experimental results demonstrate the benefit of using multiple operators compared with the variants using a single operator. Thus, we can conclude that the success of MOEA/D-CTAOS comes from the application of the operators pool.

Effectiveness of search inertia

In the proposed SiDE, the movement of an individual is guided by search inertia. For further investigation, an obvious question needs to be answered: does search inertia really steer the optimization progress in an effective and efficient way, or are the results only affected by random perturbations? To clearly demonstrate the effectiveness of search inertia, a contrast experiment is conducted by reversing the search inertia term in SiDE; the modified operator is called rSiDE hereafter.

  • Original Search Inertia (SiDE):

    $$\begin{aligned} v_i = x_i + \mathrm{IC} \cdot \left( x_\mathrm{rn1} - x_\mathrm{rn1}^p \right) + F \cdot \left( x_\mathrm{rn2} - x_\mathrm{rn3}\right) \end{aligned}$$
    (7)
  • A Reverse Version of Search Inertia (rSiDE):

    $$\begin{aligned} v_i = x_i + \mathrm{IC} \cdot \left( x_\mathrm{rn1}^p - x_\mathrm{rn1}\right) + F \cdot \left( x_\mathrm{rn2} - x_\mathrm{rn3}\right) \end{aligned}$$
    (8)

All the elements involved in rSiDE are exactly identical to those of the original operator except for the direction of the search inertia. The influence of all other factors is therefore eliminated and the impact of search inertia is highlighted. The two versions of SiDE are each implemented in MOEA/D-DRA by replacing the original DE operator.

To save space, we only plot the average IGD values over 30 runs on UF1, UF3, UF8, LZ1, LZ3, and LZ8, as shown in Fig. 8. It is evident that the IGD value obtained by SiDE decreases much faster than that of rSiDE. From the final results obtained by the two algorithms, we can see that SiDE performs better than rSiDE on UF3, UF8, LZ3 and LZ8, and about the same on the rest. Thus, from the above empirical study, we can conclude that search inertia helps to steer the evolutionary process more effectively and significantly improves the convergence speed of the population.

Parameter sensitivity analysis

CTAOS introduces one extra parameter into the algorithm. To investigate how sensitive MOEA/D-CTAOS is to this parameter, a parameter analysis experiment is conducted by setting the training data set size (TS) to different values. The boxplots of the IGD values on the UF and LZ benchmarks are shown in Fig. 9. On UF1, UF3-6, UF8-9, LZ1, LZ4-5, LZ7 and LZ9, the training data set size has only a slight influence on the average IGD results, whereas the influence is larger on the rest. However, the size may determine the robustness of the algorithm on all benchmarks. To determine the best parameter setting, we plot the average ranking of all the settings on the 19 instances, as shown in Fig. 10. It is clear that with TS = 4000 the algorithm achieves the lowest average ranking in terms of both IGD and HV. For this reason, we set TS = 4000 in the other experiments. It is worth noting that this value may not suit all types of problems; TS should be set carefully for a specific problem.

Conclusion

In this paper, we propose a classification tree and decomposition based MOEA for solving MOPs with complex PSs, which picks the most appropriate operator by considering not only its recent performance but also the positional relationship between parents. A novel DE variant based on search inertia is also developed to steer the evolutionary process more efficiently. The main task of the proposed CTAOS is to produce more promising offspring within MOEA/D frameworks. The empirical study demonstrates the superiority of our proposal when embedded into MOEA/D-DRA, and the advantages of CTAOS are confirmed through the comparison with other AOS-based algorithms. Additionally, we conducted a contrast experiment to validate the rationale and effectiveness of the direction-guided search strategy.

Fig. 10

Average ranking for five versions of parameter setting on 19 instance in terms of IGD and HV values respectively

Although MOEA/D-CTAOS outperforms other MOEA/D variants on a wide variety of MOPs, some work remains to be done in the future. First, extending CTAOS to many-objective optimization should be considered. In addition, parameter adaptation for the genetic operators could be designed. Finally, integration into more MOEA frameworks may also be explored.