CN108647771A - A method for laying out scientific workflow data in a hybrid cloud environment - Google Patents
A method for laying out scientific workflow data in a hybrid cloud environment
- Publication number
- CN108647771A CN108647771A CN201810427966.1A CN201810427966A CN108647771A CN 108647771 A CN108647771 A CN 108647771A CN 201810427966 A CN201810427966 A CN 201810427966A CN 108647771 A CN108647771 A CN 108647771A
- Authority
- CN
- China
- Prior art keywords
- data
- particle
- scientific workflow
- cloud environment
- data center
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/004—Artificial life, i.e. computing arrangements simulating life
- G06N3/006—Artificial life, i.e. computing arrangements simulating life based on simulated virtual individual or collective life forms, e.g. social simulations or particle swarm optimisation [PSO]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/48—Program initiating; Program switching, e.g. by interrupt
- G06F9/4806—Task transfer initiation or dispatching
- G06F9/4843—Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
- G06F9/4881—Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
- G06F9/5005—Allocation of resources, e.g. of the central processing unit [CPU] to service a request
- G06F9/5027—Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/12—Computing arrangements based on biological models using genetic models
- G06N3/126—Evolutionary algorithms, e.g. genetic algorithms or genetic programming
Abstract
The present invention discloses a method for laying out scientific workflow data in a hybrid cloud environment. A discrete particle swarm encoding scheme is used to encode the scientific workflow data layout problem; the mutation and crossover operations of the genetic algorithm are introduced into the particle update, effectively overcoming the premature convergence problem of discrete particle swarm optimization; an adaptive update method that adjusts the inertia weight factor based on the degree of difference between the global historical optimal particle and the current particle is constructed, effectively matching the complex and changeable nature of the data layout problem; and the fitness function is defined separately for infeasible particles that exceed data center storage capacity and for feasible particles with remaining capacity. The number of data movements between data centers is reduced and the data transmission volume is compressed, thereby effectively improving the execution efficiency of scientific workflows.
Description
Technical Field
The invention relates to the field of parallel and distributed high-performance computing, and in particular to a scientific workflow data layout method in a hybrid cloud environment.
Background
In scientific research fields such as astronomy and bioinformatics, scientists analyze data from existing data sources or collected from physical devices by executing thousands of tasks, generating large amounts of new data such as intermediate data or final results, often on the TB or even PB scale. In the past, scientists generally used simple methods (e.g., Perl scripts) to orchestrate tasks and manage data, but this approach is not only time-consuming but also prone to error. Computing tasks and scientific applications are therefore often modeled as workflows, with data analysis automated through data or control dependencies. Scientific workflows that contain a large number of computing tasks require not only high-performance computing resources but also large amounts of storage. Currently, some large scientific workflows are deployed on complex distributed computer systems such as supercomputers because of their high performance and mass storage; however, these systems are very expensive to build and time-consuming to gain access to. Cloud computing virtualizes geographically distributed resources into a resource pool and offers them to users on a pay-as-you-go basis; its efficiency, flexibility, and customizability provide a new way to address the problems encountered when running scientific workflows.
Bandwidth between data centers is typically limited and expensive, so an improper data placement strategy may cause excessive data transfer between data centers, resulting in excessive cost to the user. In recent years, many workflow scheduling algorithms have been developed for efficient workflow execution, but most of them focus on scheduling computation-intensive workflows. As more and more scientific workflows become data-intensive, new approaches are needed to handle them. At present, most data layout optimization algorithms adopt a clustering-matrix method that optimizes the workflow in two stages, build time and run time; when the sizes of the data sets differ greatly, the clustering matrix has clear limitations. Intelligent methods have also been applied to data layout, but most of this work assumes that all data sets are the same size, whereas in practice the data sets of different workflows differ greatly in size, and these factors can significantly affect the data layout.
Regarding the scientific workflow data layout problem that accounts for private data placement in a hybrid cloud environment, few studies exist at present, either domestically or abroad. The most relevant work considers data dependency relationships, screens out non-private data sets with high dependency on private data sets, and places them in the corresponding private cloud data centers, effectively shortening data transmission time; however, that work does not consider the influence of size differences among data sets on the data layout.
Disclosure of Invention
The invention aims to overcome the defects of the prior art, and provides a scientific workflow data layout method based on scientific workflow execution characteristics and self structure, data set size difference, data center capacity and other relevant factors under a mixed cloud environment.
In order to achieve the purpose of the invention, the technical scheme of the invention is as follows:
a scientific workflow data layout method in a mixed cloud environment comprises the following steps:
step 1: constructing a scientific workflow structure in a hybrid cloud environment, and calculating data transmission quantity generated when each task is executed according to the execution sequence of the tasks;
step 2: initializing a population size, maximum iteration times, inertia weight factors and cognitive factors based on a scientific workflow structure in the mixed cloud environment, and randomly generating an initial population; initializing self history optimal particles of the first generation particles and initial population global optimal particles;
step 3: updating the velocity and position of each particle;
step 4: calculating the fitness value of each particle;
step 5: updating each particle's self-history optimal particle;
if the fitness value of the current particle is smaller than that of its historical optimal particle, the current particle becomes the new historical optimal particle;
step 6: updating the global optimal particle of the population;
if the fitness value of the current particle is smaller than that of the population global optimal particle, the current particle becomes the new population global optimal particle;
step 7: checking whether the algorithm termination condition is met; if so, the algorithm ends; otherwise, go to step 3.
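The seven steps above can be sketched as a minimal discrete particle swarm loop. Everything below is a toy assumption for illustration — the instance, the transfer-count fitness, and the simplified random-reset move in step 3 stand in for the patent's GA-based update and full fitness definition.

```python
import random

random.seed(42)

# Toy instance (assumed): M data sets, Q data centers, and three tasks
# whose input data sets are listed by id.
M, Q = 6, 3
TASK_INPUTS = [[0, 1], [1, 2, 5], [3, 4]]

def fitness(position):
    # Count data sets that must be moved: each task runs where the majority
    # of its inputs live, and every input stored elsewhere costs one transfer.
    total = 0
    for inputs in TASK_INPUTS:
        centers = [position[d] for d in inputs]
        run_dc = max(set(centers), key=centers.count)
        total += sum(1 for c in centers if c != run_dc)
    return total

def run_pso(pop_size=20, max_iter=100):
    # steps 1-2: build the instance and a random initial population
    pop = [[random.randrange(Q) for _ in range(M)] for _ in range(pop_size)]
    pbest = [p[:] for p in pop]
    gbest = min(pop, key=fitness)[:]
    for _ in range(max_iter):                  # step 7: termination condition
        for i, p in enumerate(pop):
            p[random.randrange(M)] = random.randrange(Q)  # step 3 (simplified)
            if fitness(p) < fitness(pbest[i]):            # steps 4-5
                pbest[i] = p[:]
            if fitness(p) < fitness(gbest):               # step 6
                gbest = p[:]
    return gbest

best = run_pso()
```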
Further, the step 1 of constructing the scientific workflow structure in the hybrid cloud environment specifically includes the following steps:
step 1-1: generating the scientific workflow data set DS, each element of which is a tuple dsi = <ds_size, init, deploy, Ti, pri_flag, ini_flag>;
wherein dsi represents the data set numbered i; ds_size represents the size of data set dsi; init represents the initial position of the data set; deploy represents the final layout position of the data set; Ti represents the set of tasks that require data set dsi; pri_flag is the private data set storage flag: if dsi is a private data set, pri_flag is the number of the data center storing dsi, otherwise pri_flag is 0; ini_flag is the initial data set flag: if dsi is an initial data set, ini_flag is 1, otherwise ini_flag is 0;
step 1-2: constructing the data center set DC in the hybrid cloud environment based on the scientific workflow data set DS, each element of which is a tuple dci = <DSi, size, availsize, type>;
wherein dci represents the data center numbered i; DSi represents the set of data sets stored in data center i; size represents the storage capacity of the data center; availsize represents the currently available storage capacity of the data center; type represents the type of data center: when type is 0, the data center is a public cloud data center; when type is 1, it is a private cloud data center;
step 1-3: generating the task set T of the scientific workflow, each element of which is a tuple ti = <inDSi, outDSi, runDC>;
wherein ti represents the task numbered i; inDSi represents the set of input data sets of task ti; outDSi represents the set of output data sets of task ti; runDC represents the data center where task ti is executed;
step 1-4: constructing the scientific workflow triple:
G = <T, E, DS> (1)
wherein G is a directed acyclic graph; E is the control-flow set representing the execution order of tasks; T is the task set of the scientific workflow; DS is the set of data sets of the scientific workflow.
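As an illustration only, the triple G = <T, E, DS> and its tuple attributes might be held in code as below; the class and field names are assumptions for this sketch, not part of the patent.

```python
from dataclasses import dataclass

@dataclass
class DataSet:
    ds_id: int
    ds_size: float      # size, e.g. in GB
    pri_flag: int = 0   # data center number if private, else 0
    ini_flag: int = 0   # 1 if the data set exists before the workflow runs

@dataclass
class Task:
    t_id: int
    in_ds: list         # ids of input data sets (inDSi)
    out_ds: list        # ids of output data sets (outDSi)

@dataclass
class Workflow:         # the triple G = <T, E, DS>
    tasks: list
    edges: list         # control-flow pairs (ti, tj): ti executes before tj
    datasets: list

# Illustrative two-task workflow with one private initial data set (id 3)
wf = Workflow(
    tasks=[Task(1, [1, 2], [6]), Task(2, [6, 3], [])],
    edges=[(1, 2)],
    datasets=[DataSet(1, 1.0, ini_flag=1), DataSet(2, 1.0, ini_flag=1),
              DataSet(3, 2.0, pri_flag=2, ini_flag=1), DataSet(6, 30.0)],
)
```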
Further, in the step 1, the method for allocating the tasks and the generated data sets in the scientific workflow structure in the mixed cloud environment is as follows:
when a task contains a private data set, the task is placed in the data center that stores that private data; otherwise, the task is placed in the data center holding more of its input data sets.
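The allocation rule above can be sketched as follows; the function and argument names are illustrative assumptions, with ties broken by whichever center Counter returns first.

```python
from collections import Counter

def assign_task(input_sets, placement, private_dc_of):
    """Pick a run data center for a task: a private input pins the task to
    that private data center; otherwise run where most inputs are stored."""
    for ds in input_sets:
        if ds in private_dc_of:
            return private_dc_of[ds]
    centers = Counter(placement[ds] for ds in input_sets)
    return centers.most_common(1)[0][0]

# Example mirroring the description: data set 4 is private, pinned to center 2
private = {4: 2}
placement = {3: 1, 4: 2, 6: 1}
```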
Further, the pri_flag and ini_flag in step 1-1 take the following values: pri_flag = PriDC(dsi) if dsi ∈ DSpri, and pri_flag = 0 if dsi ∈ DSpub; ini_flag = 1 if dsi ∈ DSini, and ini_flag = 0 if dsi ∈ DSgen;
wherein DSpri is the set of private data sets that must be placed in specific private cloud data centers, PriDC(dsi) is the number of the private cloud data center storing data set dsi, DSpub is the set of all public data sets, DSini is the set of all initial data sets that exist before the scientific workflow runs, and DSgen is the set of data sets generated while the scientific workflow runs.
Further, the step 2 of initializing the population comprises the following steps:
step 2-1: particle encoding: a two-dimensional particle encoding strategy is used, in which each particle represents one data layout scheme of the scientific workflow in the hybrid cloud environment; the first dimension of a particle indexes the data sets and the second dimension gives the data center where each data set is placed; the number of data sets is M and the number of data centers is Q;
the position of particle i at time t is expressed as:
X_i^t = (x_{i,1}^t, x_{i,2}^t, …, x_{i,M}^t)
wherein x_{i,k}^t represents the data center where the kth data set is placed at time t;
step 2-2: initializing the population size, the maximum iteration times, the inertia weight factor, the individual cognition factor, the global cognition factor and the particle initial speed according to a standard PSO algorithm.
Further, the initialization of the corresponding parameters in step 2-2 are respectively as follows: the population size is set to 50, the maximum number of iterations is set to 500, the inertial weight factor is set to 0.4, the individual cognition factor is set to 0.4, and the global cognition factor is set to 0.6.
Further, the specific method for initializing the self-history optimal particles and the initial population global optimal particles of the first generation of particles in step 2 is as follows:
step 2-3: calculating the fitness value of each particle of the first generation;
step 2-4: selecting the particle with the minimum fitness value as the population global optimal particle, and setting each particle in the first generation as its own self-history optimal particle.
Further, the fitness of a particle is calculated piecewise:
fitness = evaluate(q1), if q1 ∈ QTuf and q2 ∈ QTul;
fitness = min(evaluate(q1), evaluate(q2)), if q1, q2 ∈ QTuf or q1, q2 ∈ QTul;
wherein particle q1 represents one coding scheme of the problem and particle q2 represents another; QTuf is the feasible solution set and QTul is the infeasible solution set; evaluate(q1) and evaluate(q2) represent the data transmission amounts of the placement strategies corresponding to particles q1 and q2;
further, the update formula for updating particle i in step 3 consists of three parts: an inertia part A_i^t, an individual cognition part B_i^t, and a global cognition part that yields the new position X_i^t;
combining the standard PSO algorithm with the mutation operation of the genetic algorithm, the inertia part of particle i at time t is:
A_i^t = Mu(X_i^{t-1}) if r1 < w, otherwise A_i^t = X_i^{t-1}
wherein w is the inertia weight factor, used to adjust the particles' search capability over the solution space; Mu() denotes a mutation operation that randomly selects one position of the particle and randomly changes its value within the corresponding value range; r1 is a random number between 0 and 1;
combining the standard PSO algorithm with the crossover operation of the genetic algorithm, the individual cognition part and the global cognition part of particle i at time t are:
B_i^t = Cp(A_i^t, pBest) if r2 < c1, otherwise B_i^t = A_i^t
X_i^t = Cg(B_i^t, gBest) if r3 < c2, otherwise X_i^t = B_i^t
wherein c1 is the individual cognition factor and c2 is the global cognition factor; pBest and gBest respectively denote the individual optimal position of the particle after multiple iterations and the global optimal position of the population; Cp() and Cg() denote crossover operations that randomly select two positions of the particle and cross the corresponding segment with pBest or gBest; r2 and r3 are random numbers between 0 and 1, used to enhance randomness in the iterative search process.
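The probabilistic mutation/crossover update described above can be sketched as follows. This is a hedged reading of the operators Mu(), Cp(), and Cg(); the exact operator details shown in the patent figures may differ, and the sizes are toy assumptions.

```python
import random

random.seed(7)
M, Q = 6, 3  # toy sizes: number of data sets and data centers (assumed)

def mutate(x):
    # Mu(): randomly pick one position and redraw its value within [0, Q)
    y = x[:]
    y[random.randrange(M)] = random.randrange(Q)
    return y

def crossover(x, best):
    # Cp()/Cg(): pick two cut points and copy that segment from pBest/gBest
    y = x[:]
    a, b = sorted(random.sample(range(M), 2))
    y[a:b + 1] = best[a:b + 1]
    return y

def update_particle(x, pbest, gbest, w=0.4, c1=0.4, c2=0.6):
    a = mutate(x) if random.random() < w else x[:]              # inertia part
    b = crossover(a, pbest) if random.random() < c1 else a      # individual cognition
    return crossover(b, gbest) if random.random() < c2 else b   # global cognition
```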
With the above technical scheme, a discrete particle swarm encoding scheme is adopted to encode the scientific workflow data layout problem; the mutation and crossover operations of the genetic algorithm are introduced into the particle update, effectively overcoming the premature convergence problem of discrete particle swarm optimization; an update method that adaptively adjusts the inertia weight factor based on the degree of difference between the global historical optimal particle and the current particle is constructed, effectively matching the complex and changeable nature of the data layout problem; and the fitness function is defined separately for infeasible particles exceeding data center storage capacity and for feasible particles with remaining capacity. The number of data movements between data centers is reduced and the data transmission volume is compressed, so the execution efficiency of the scientific workflow can be effectively improved.
Drawings
The invention is described in further detail below with reference to the accompanying drawings and the detailed description.
FIG. 1 is an example diagram of a scientific workflow of a method for laying out scientific workflow data in a hybrid cloud environment according to the present invention;
FIG. 2 is a first scientific workflow data layout scheme of a scientific workflow data layout method in a hybrid cloud environment according to the present invention;
FIG. 3 is a scientific workflow data layout scheme II of the scientific workflow data layout method in a hybrid cloud environment according to the present invention;
FIG. 4 is a schematic particle encoding diagram of a scientific workflow data layout method in a hybrid cloud environment according to the present invention;
FIG. 5 is a mutation operator operation diagram of an inertial part of the scientific workflow data layout method in a hybrid cloud environment according to the present invention;
fig. 6 is a schematic diagram of the operation of the crossover operator of the personal or social cognitive part of the method for laying out the scientific workflow data in the hybrid cloud environment.
Fig. 7 is an algorithm flowchart of a scientific workflow data layout method in a hybrid cloud environment according to the present invention.
Detailed Description
For a more detailed description of the present invention, reference is made to the following detailed description taken in conjunction with the accompanying drawings.
1. Problem analysis:
As shown in fig. 1, a scientific workflow example of the present invention is given, and fig. 2 and fig. 3 show two data layout schemes, respectively, to describe the problem in detail. The scientific workflow consists of 5 tasks t1, t2, t3, t4, t5, 5 input data sets ds1, ds2, ds3, ds4, ds5, and 1 intermediate data set ds6. Data set ds4 is a private data set that must be stored in data center dc2; the input data set of task t4 is {ds3, ds4, ds6}, so t4 must also run in data center dc2. If only the number of data transmissions is considered, the number of transmissions in fig. 2 is 2 and in fig. 3 is 5; but the sizes of the data sets cannot be ignored. Assuming ds1 = 1GB, ds2 = 1GB, and ds6 = 30GB, the data transfer amount in fig. 2 is 60GB while that in fig. 3 is 34GB.
Consider that: 1) data sets are large; 2) there are dependencies between data sets; 3) the computing and storage resources of a single data center are insufficient to process and store the data of the entire application; 4) for security and privacy, some data sets can only be stored in designated private cloud data centers. Different data sets must therefore be uploaded to different data centers, and data must be transmitted from other data centers when tasks execute, so that the scientific workflow can run normally. The invention therefore provides an adaptive particle swarm optimization algorithm based on genetic algorithm operations, which introduces the random two-point crossover operation and random single-point mutation operation of the genetic algorithm to improve diversity during population evolution, effectively improving the execution efficiency of the scientific workflow and reducing the number of data movements and the data transmission volume between data centers.
2. System modeling:
scientific workflows are generally described as a Directed Acyclic Graph (DAG), in which nodes represent tasks and edges represent control dependencies. But this model is not suitable for data intensive scientific workflows because it does not have an input data set. Thus, the present invention extends the DAG model by considering the input data set.
Definition 1: a scientific workflow is defined as a triple:
G=<T,E,DS>(1)
the control flow set G is a Directed Acyclic Graph (DAG), and E represents the order of execution of tasks. T represents the set of all tasks of the scientific workflow. The DS represents the set of all data sets of the scientific workflow.
Definition 2: the data center set in the hybrid cloud environment is defined as:
DC = {dc1, dc2, …, dcn}, dci = <DSi, size, availsize, type>
wherein dci represents the data center numbered i; DSi represents the set of data sets stored in data center i; size represents the storage capacity of the data center; availsize represents the currently available storage capacity of the data center; when type is 0, the data center is a public cloud data center; when type is 1, it is a private cloud data center.
Definition 3: the set of data sets of a scientific workflow is defined as:
DS = {ds1, ds2, …, dsM}, dsi = <ds_size, init, deploy, Ti, pri_flag, ini_flag>
wherein dsi represents the data set numbered i; ds_size represents the size of data set dsi; init represents the initial position of the data set; deploy represents the final layout position of the data set; Ti represents the set of tasks that require data set dsi; pri_flag and ini_flag take the following values:
pri_flag = PriDC(dsi) if dsi ∈ DSpri, 0 if dsi ∈ DSpub; ini_flag = 1 if dsi ∈ DSini, 0 if dsi ∈ DSgen.
The private data sets in DSpri must be placed in specific private cloud data centers; PriDC(dsi) is the private cloud data center storing data set dsi; DSpub denotes the set of all public data sets. Based on the generation time of a data set, the invention calls a data set that exists before the scientific workflow runs an initial data set, with DSini denoting the set of all initial data sets; a generated data set is one produced while the scientific workflow runs, with DSgen denoting the set of all generated data sets.
Definition 4: the task set of the scientific workflow is defined as:
T = {t1, t2, …, tn}, ti = <inDSi, outDSi, runDC>
wherein ti represents the task numbered i. The invention focuses on the data placement problem of scientific workflows in a hybrid cloud environment, so only the input and output data sets of tasks are considered. inDSi denotes the set of input data sets of task ti; outDSi denotes the set of output data sets of task ti; runDC denotes the data center where task ti is executed.
3. Data placement strategy of the invention
3.1 Traditional particle swarm optimization algorithm
The PSO algorithm is a population-based stochastic optimization algorithm inspired by the social behavior of bird flocks, proposed by Eberhart and Kennedy in 1995. A particle in the PSO algorithm represents a candidate solution of the optimization problem; it moves through the problem space at a certain velocity and iteratively adjusts its position and velocity according to its own experience and that of surrounding particles. The update formulas for velocity and position are expressed as:
V_i^{t+1} = w·V_i^t + c1·r1·(pBest_i^t − X_i^t) + c2·r2·(gBest^t − X_i^t) (7)
X_i^{t+1} = X_i^t + V_i^{t+1} (8)
where t is the current iteration number; X_i^t and V_i^t respectively denote the position and velocity of the ith particle after t iterations, and a maximum velocity Vmax is defined for all particles to keep the solution within the solution space; pBest_i^t and gBest^t respectively denote the best position found so far by the ith particle and the historical best position of the population. The inertia weight w adjusts the particles' search capability over the solution space and is important for the convergence of the algorithm. c1 and c2 are cognition factors representing the current particle's cognitive learning toward its own historical optimum and the population's global historical optimum. r1 and r2 are random variables in the range [0, 1] used to enhance randomness in the iterative search. In addition, to judge the quality of solutions, a fitness function is introduced to evaluate the solution quality of each particle.
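For reference, one iteration of the standard continuous PSO velocity and position update can be written as below; the parameter values are illustrative defaults, not values from the patent.

```python
import random

random.seed(0)

def pso_step(x, v, pbest, gbest, w=0.7, c1=1.5, c2=1.5, vmax=4.0):
    """v <- w*v + c1*r1*(pbest - x) + c2*r2*(gbest - x);  x <- x + v,
    with each velocity component clamped to [-vmax, vmax]."""
    r1, r2 = random.random(), random.random()
    v = [max(-vmax, min(vmax, w * vi + c1 * r1 * (p - xi) + c2 * r2 * (g - xi)))
         for xi, vi, p, g in zip(x, v, pbest, gbest)]
    x = [xi + vi for xi, vi in zip(x, v)]
    return x, v
```

Note that this continuous form cannot directly encode data-center indices, which is why the patent switches to a discrete, GA-operator-based update.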
3.2 Adaptive particle swarm optimization algorithm based on genetic algorithm operations
The mutation process of the genetic algorithm makes it easy to escape a local optimum, but it may also leave a promising region, prolonging the search time. As shown in equation (7), the PSO update includes three core components: an inertia part, an individual cognition part, and a global cognition part. To overcome the premature convergence of the particle swarm algorithm, the invention introduces the mutation and crossover operations of the genetic algorithm into the corresponding update operations of the particle swarm algorithm. The algorithm is described in the following six parts.
Particle encoding:
The encoding of particles affects the search efficiency and performance of the algorithm. The invention proposes a two-dimensional particle encoding strategy and adopts the integer encoding of the genetic algorithm to avoid invalid solutions. Each particle represents one data layout scheme of the scientific workflow in the hybrid cloud environment. The invention sets an M-dimensional search space for each particle, M being the number of data sets, and each dimension takes an integer value from a set of Q possibilities, Q being the number of data centers. The position of particle i at time t can be expressed as:
X_i^t = (x_{i,1}^t, x_{i,2}^t, …, x_{i,M}^t)
wherein x_{i,k}^t indicates the data center where the kth data set is placed at time t. Fig. 4 illustrates a particle encoding strategy comprising 6 data sets: the first dimension of the particle indexes the data sets and the second dimension gives each data set's placement position. The invention assumes there are 3 data centers in the hybrid cloud environment, i.e., values range from 0 to 2. As can be seen from fig. 4, the placement position of data set ds3 is dc1.
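In this encoding a particle is simply a length-M integer vector, so validity is easy to check; the values below are illustrative, not those of the patent's figure.

```python
M, Q = 6, 3  # number of data sets and data centers in the toy example
particle = [0, 2, 1, 0, 2, 1]  # particle[k] = data center holding data set k

def is_valid(position, num_centers=Q):
    # Integer coding drawn from range(Q) can never yield an invalid center
    # index, which is why GA-style integer coding avoids invalid solutions.
    return all(isinstance(c, int) and 0 <= c < num_centers for c in position)
```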
Setting parameters:
w in equation (7) affects the search ability and convergence of the particle swarm algorithm: when w is larger, the algorithm has stronger global search capability; when w is smaller, it has stronger local search capability. Due to the diversity and complexity of the initial problem space, the particles need strong global search capability at first, and as the number of iterations increases, the local search capability needs to grow significantly. The inertia weight factor w should therefore decrease linearly as the evolution proceeds. The adjustment formula of w is set as:
w^t = wmax − (wmax − wmin) × t / tmax
wherein wmax and wmin respectively represent the maximum and minimum values of w at initialization, t represents the current iteration number, and tmax the maximum number of iterations. The strategy further takes into account the difference between the current particle and the global optimal particle; diff_i^t is defined as:
diff_i^t = diffNum(X_i^t, gBest^t) / DSN
wherein diffNum(X_i^t, gBest^t) denotes the number of positions in which the current particle differs from the global optimal particle, and DSN denotes the number of data sets in the scientific workflow. When diff_i^t is small, the difference between the current particle and the global optimal particle is small, so w is reduced so that the particle searches for an optimal solution within a small range; otherwise, w is increased to enlarge the particle's search space in order to find the optimal solution faster.
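One plausible reading of the difference-based adaptation is sketched below: w grows with the fraction of positions in which the particle differs from gBest. This is an assumption for illustration; the exact formula in the patent may combine this with the linear decrease over iterations.

```python
def adaptive_w(position, gbest, w_min=0.4, w_max=0.9):
    # diff = (# differing positions) / DSN;  w = w_min + (w_max - w_min) * diff
    # (w_min/w_max values are illustrative, not taken from the patent)
    dsn = len(position)
    diff = sum(a != b for a, b in zip(position, gbest)) / dsn
    return w_min + (w_max - w_min) * diff
```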
Initialization strategy:
To ensure the integrity and diversity of the search space, the invention uses a simple initialization strategy: the population size, maximum number of iterations, inertia weight factor, cognition factors, and other parameters are initialized according to the rules mentioned above, and the initial population is generated randomly. When a data center is randomly assigned to each public data set, any data center may receive data sets. Private data sets are always assigned to their designated data centers.
Fitness function:
The main objective of the proposed data placement strategy is to reduce the total data transmission volume between data centers during scientific workflow execution. The fitness function of the particles is used to compare the quality of two particles; in general, the particle with the smaller fitness value is better, and the invention takes the global transmission volume of the data placement scheme as the evaluation function. Since the particle encoding strategy above does not guarantee soundness, i.e., the remaining capacity of a data center may be insufficient to store the data sets generated while the workflow runs, the invention distinguishes feasible and infeasible solutions and defines the fitness function in the following three cases. A particle q1 denotes one coding scheme of the problem, and a particle q2 denotes another.
Case 1: particle q1Belong to the feasible solution set QTufParticles q2Belonging to the infeasible solver set QTul. A feasible solution is chosen without any controversy, whose fitness function is defined as shown in equation (12), where evaluate (q)1) Represents feasible particles q1The amount of data transfer for the corresponding placement policy.
fitness=evaluate(q1),q1∈QTuf(12)
Case 2: particle q1And q is2All belong to a feasible solution set QTufThe particle with the least total data transmission amount is selected. The fitness function is defined as shown in formula (13):
fitness=min(evaluate(q1),evaluate(q2)),q1,q2∈QTuf(13)
case 3: particle q1And q is2All belong to the infeasible solution set QTulThe particle with the least total data transmission amount is selected. As the particle is more likely to become a viable solution after evolution. The fitness function is defined as shown in formula (14):
fitness=min(evaluate(q1),evaluate(q2)),q1,q2∈QTul(14)
Furthermore, each task in a scientific workflow must satisfy four requirements: (1) the task is scheduled to execute on a specific data center; (2) if the task's input data sets include a private data set, that data set must be placed on a private cloud data center; (3) the initial position of the task's output data sets is the data center that executes the task; (4) all data sets required by the task must reside on the same data node. Therefore, if a task's input data sets are stored on different data centers, data must be transferred between data nodes before the task executes. To reduce the data transfer between data centers caused by task scheduling, the present invention assumes that a task is placed on the data center that holds the larger number of its input data sets.
To calculate a particle's fitness value, the tasks and generated data sets are first assigned: if a task's inputs contain a private data set, the task is placed on the corresponding private data center; otherwise it is placed on the data center holding more of its input data sets. Next, the data transfer volume incurred by each task is computed in task execution order. Finally, the particle's fitness value, i.e., the total data transfer volume produced by the placement scheme corresponding to the particle, is returned.
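The fitness evaluation above can be sketched as follows. The sketch is simplified: it applies only the majority-placement rule for choosing a task's execution data center and omits the private-data-set override and the feasible/infeasible distinction of equations (12)-(14); all names are illustrative assumptions.

```python
from collections import Counter

def fitness(particle, tasks, ds_sizes):
    """Total inter-data-center transfer volume of one placement scheme.

    particle  : data set index -> data center index
    tasks     : execution-ordered list of input data set index lists
    ds_sizes  : data set index -> data set size

    Each task runs on the data center holding the most of its input
    data sets; every input stored elsewhere is transferred in first."""
    total = 0
    for inputs in tasks:
        placements = Counter(particle[i] for i in inputs)
        run_dc = placements.most_common(1)[0][0]  # majority data center
        total += sum(ds_sizes[i] for i in inputs if particle[i] != run_dc)
    return total
```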
Update strategy
As shown in equation (7), the PSO update includes three core components: an inertia part, an individual cognition part, and a global cognition part. The invention introduces the crossover and mutation operations of the genetic algorithm to overcome the premature convergence of the particle swarm algorithm. The update formula for particle i at time t is as follows:
For the inertia part, the invention applies the mutation operation of the genetic algorithm to update the corresponding term of equation (7); the inertia part of particle i at time t is:
where Mu() denotes the mutation operation and r1 is a random number between 0 and 1. Mu() randomly selects one position of the particle and randomly changes its value, subject to the corresponding value range. FIG. 5 shows the mutation operation applied to the encoded particle of FIG. 4: the placement position corresponding to data set ds2 is updated from 0 to 1, and the mutated particle still satisfies the scheduling criteria.
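The inertia-part mutation can be sketched as follows; treating r1 < w as the condition for applying the mutation is an assumed interpretation of the (figure-only) formula, and a full implementation would additionally skip pinned private-data-set positions:

```python
import random

def mutate(particle, num_dcs, r1, w):
    """GA-style mutation for the inertia part: if r1 falls below the
    inertia weight w, pick one placement position at random and assign
    it a new random data center within the valid range [0, num_dcs)."""
    if r1 < w:
        particle = particle[:]  # work on a copy, keep the input intact
        k = random.randrange(len(particle))
        particle[k] = random.randrange(num_dcs)
    return particle
```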
For the individual cognition part and the global cognition part, the invention applies the crossover operation of the genetic algorithm to update the corresponding terms of equation (7); the individual cognition part and global cognition part of particle i at time t are:
where Cp() and Cg() denote crossover operations and r2 and r3 are random numbers between 0 and 1. Cp() and Cg() randomly select two positions of the particle and exchange the values between those positions with the corresponding positions of pBest (or gBest).
FIG. 6 shows the crossover operation applied to the encoded particle of FIG. 4: the values of the placement positions in the interval from ds2 to ds4 are exchanged with the values at the corresponding positions of pBest (gBest), and the resulting particle satisfies the scheduling criteria.
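The cognition-part crossover can be sketched as follows; using the cognition factor c as the probability threshold is an assumed interpretation, since the exact formulas appear only as figures:

```python
import random

def crossover(particle, best, r, c):
    """GA-style two-point crossover for the cognition parts: if r falls
    below the cognition factor c, choose two cut positions and copy the
    segment between them (inclusive) from pBest or gBest into the
    particle."""
    if r < c:
        i, j = sorted(random.sample(range(len(particle)), 2))
        particle = particle[:i] + best[i:j + 1] + particle[j + 1:]
    return particle
```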
Algorithm flow
Fig. 7 is a flowchart of the scientific workflow data placement method of the present invention, which includes the following steps:
Step 1: construct a scientific workflow structure in the hybrid cloud environment, and calculate the data transfer volume generated when each task is executed according to the task execution order;
Step 2: initialize the population size, maximum number of iterations, inertia weight factor, and cognition factors based on the scientific workflow structure in the hybrid cloud environment, and randomly generate an initial population; initialize each first-generation particle's personal historical optimal particle and the initial population's global optimal particle;
Step 3: update the velocity and position of each particle;
Step 4: calculate the fitness value of each particle;
Step 5: update each particle's personal historical optimal particle:
if the fitness value of the current particle is smaller than that of its historical optimal particle, the current particle becomes the new historical optimal particle;
Step 6: update the population's global optimal particle:
if the fitness value of the current particle is smaller than that of the population's global optimal particle, the current particle becomes the new global optimal particle;
Step 7: check whether the algorithm termination condition is met; if so, terminate; otherwise, return to Step 3.
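Steps 1-7 above can be sketched end to end as follows. This is a self-contained illustrative sketch, not the claimed method itself: the probability-threshold reading of w, c1, and c2, the majority-placement fitness, and all names are assumptions; the hyperparameter defaults follow the values stated later for the parameter initialization (population 50, 500 iterations, w = 0.4, c1 = 0.4, c2 = 0.6).

```python
import random
from collections import Counter

def dpso_ga(datasets, num_dcs, tasks, ds_sizes,
            pop_size=50, max_iter=500, w=0.4, c1=0.4, c2=0.6):
    """Discrete PSO whose update is realized by GA mutation (inertia
    part) and GA crossover (cognition parts); returns the best placement
    found and its total inter-data-center transfer volume."""
    # positions that may be re-placed (private data sets stay pinned)
    free = [k for k, (_, dc) in enumerate(datasets) if dc is None]

    def fitness(p):
        # total transfer volume, tasks processed in execution order
        total = 0
        for inputs in tasks:
            run_dc = Counter(p[i] for i in inputs).most_common(1)[0][0]
            total += sum(ds_sizes[i] for i in inputs if p[i] != run_dc)
        return total

    def mutate(p, r):
        # inertia part: random re-placement of one non-pinned data set
        if r < w and free:
            p = p[:]
            p[random.choice(free)] = random.randrange(num_dcs)
        return p

    def crossover(p, best, r, c):
        # cognition parts: two-point crossover with pBest or gBest
        if r < c:
            i, j = sorted(random.sample(range(len(p)), 2))
            p = p[:i] + best[i:j + 1] + p[j + 1:]
        return p

    # Step 2: random initial population; private data sets stay pinned
    pop = [[dc if dc is not None else random.randrange(num_dcs)
            for _, dc in datasets] for _ in range(pop_size)]
    p_best = [p[:] for p in pop]
    g_best = min(pop, key=fitness)[:]
    for _ in range(max_iter):                        # Step 7: iterate
        for i in range(pop_size):
            p = mutate(pop[i], random.random())      # Step 3: update
            p = crossover(p, p_best[i], random.random(), c1)
            p = crossover(p, g_best, random.random(), c2)
            pop[i] = p
            f = fitness(p)                           # Step 4
            if f < fitness(p_best[i]):               # Step 5
                p_best[i] = p[:]
            if f < fitness(g_best):                  # Step 6
                g_best = p[:]
    return g_best, fitness(g_best)
```

Because both pBest and gBest keep private data sets at their pinned centers, the crossover segments never move a private data set off its designated private cloud data center.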
With the above technical scheme, a discrete particle swarm encoding is used to encode the scientific workflow data placement problem; the mutation and crossover operations of the genetic algorithm are introduced into the particle update, effectively alleviating premature convergence in discrete particle swarm optimization; an update method that adaptively adjusts the inertia weight factor according to the degree of difference between the global historical optimal particle and the current particle is constructed, matching the complex and changeable nature of the data placement problem; and by defining distinct fitness functions for infeasible particles that exceed data center storage capacity and for feasible particles with spare capacity, the number of data movements between data centers is reduced and the data transfer volume is compressed, thereby effectively improving the execution efficiency of scientific workflows.
Claims (9)
1. A method for placing scientific workflow data in a hybrid cloud environment, characterized by comprising the following steps:
Step 1: construct a scientific workflow structure in the hybrid cloud environment, and calculate the data transfer volume generated when each task is executed according to the task execution order;
Step 2: initialize the population size, maximum number of iterations, inertia weight factor, and cognition factors based on the scientific workflow structure in the hybrid cloud environment, and randomly generate an initial population; initialize each first-generation particle's personal historical optimal particle and the initial population's global optimal particle;
Step 3: update the velocity and position of each particle;
Step 4: calculate the fitness value of each particle;
Step 5: update each particle's personal historical optimal particle:
if the fitness value of the current particle is smaller than that of its historical optimal particle, the current particle becomes the new historical optimal particle;
Step 6: update the population's global optimal particle:
if the fitness value of the current particle is smaller than that of the population's global optimal particle, the current particle becomes the new global optimal particle;
Step 7: check whether the algorithm termination condition is met; if so, terminate; otherwise, return to Step 3.
2. The method for placing scientific workflow data in a hybrid cloud environment according to claim 1, wherein constructing the scientific workflow structure in the hybrid cloud environment in step 1 specifically comprises the following steps:
Step 1-1: generate the scientific workflow data set collection DS;
where dsi represents the data set numbered i; ds_size represents the size of data set dsi; init represents the initial position of the data set; dep represents the final placement position of the data set; Ti represents the set of tasks that require data set dsi; pri_flag is the private-data-set storage flag: if dsi is a private data set, pri_flag is the number of the data center storing dsi, otherwise pri_flag is 0; ini_flag is the initial-data-set flag: if dsi is an initial data set, ini_flag is 1, otherwise ini_flag is 0;
Step 1-2: construct the data center set DC in the hybrid cloud environment based on the scientific workflow data set collection DS;
where dci represents the data center numbered i; DSi represents the set of data sets stored in data center i; size represents the storage capacity of the data center; availsize represents the currently available storage capacity of the data center; type represents the type of the data center: when type is 0 the data center is a public cloud data center, and when type is 1 it is a private cloud data center;
Step 1-3: generate the task set T of the scientific workflow;
where ti represents the task numbered i; inDSi represents the set of input data sets of task ti; outDSi represents the set of output data sets of task ti; runDC represents the data center on which task ti executes;
Step 1-4: construct the scientific workflow as a triple:
G = <T, E, DS> (1);
where E is the set of control-flow edges representing the execution order of the tasks, and G is a directed acyclic graph; T is the task set of the scientific workflow; DS is the set of data sets of the scientific workflow.
3. The method for placing scientific workflow data in a hybrid cloud environment according to claim 1, wherein the tasks and generated data sets in the scientific workflow structure in the hybrid cloud environment in step 1 are distributed as follows:
when a task contains a private data set, the task is placed on the data center that stores that private data; otherwise, the task is placed on the data center that holds more of its input data sets.
4. The method for placing scientific workflow data in a hybrid cloud environment according to claim 2, wherein the values of pri_flag and ini_flag in step 1-1 are as follows:
DSpri represents the set of private data sets that must be placed on specific private cloud data centers; priDC(dsi) represents the private cloud data center on which data set dsi is stored; DSpub represents the set of all public data sets; DSini represents the set of all initial data sets that exist before the scientific workflow runs; DSgen represents the set of data sets generated while the scientific workflow runs.
5. The method for placing scientific workflow data in a hybrid cloud environment according to claim 1, wherein initializing the population in step 2 comprises the following steps:
Step 2-1: particle encoding: a two-dimensional particle encoding strategy is used, in which each particle represents one data placement scheme of the scientific workflow in the hybrid cloud environment; the first dimension of a particle indexes the data sets, and the second dimension records the data center where each data set is placed; the number of data sets is M and the number of data centers is Q;
the position of particle i at time t is expressed as:
where the kth component represents the data center on which the kth data set is placed at time t;
Step 2-2: initialize the population size, maximum number of iterations, inertia weight factor, individual cognition factor, global cognition factor, and initial particle velocities according to the standard PSO algorithm.
6. The method for placing scientific workflow data in a hybrid cloud environment according to claim 5, wherein the corresponding parameters in step 2-2 are initialized as follows: the population size is set to 50, the maximum number of iterations to 500, the inertia weight factor to 0.4, the individual cognition factor to 0.4, and the global cognition factor to 0.6.
7. The method for placing scientific workflow data in a hybrid cloud environment according to claim 1, wherein each first-generation particle's personal historical optimal particle and the initial population's global optimal particle in step 2 are initialized as follows:
Step 2-3: calculate the fitness value of each first-generation particle;
Step 2-4: select the particle with the smallest fitness value as the population's global optimal particle, and set each first-generation particle as its own historical optimal particle.
8. The method for placing scientific workflow data in a hybrid cloud environment according to claim 1 or 7, wherein the fitness of a particle is calculated as follows:
where particle q1 denotes one encoding scheme for the problem and particle q2 denotes another; QTuf is the feasible solution set and QTul is the infeasible solution set; evaluate(q1) denotes the data transfer volume of the placement scheme corresponding to feasible particle q1, and evaluate(q2) denotes that of the placement scheme corresponding to particle q2.
9. The method for placing scientific workflow data in a hybrid cloud environment according to claim 1, wherein the update formula for particle i in step 3 is as follows:
where the three terms are, respectively, the inertia part, the individual cognition part, and the global cognition part;
combining the standard PSO algorithm with the mutation operation of the genetic algorithm, the inertia part of particle i at time t is obtained as:
where w is the inertia weight factor, used to adjust the particle's search capability over the solution space; Mu() denotes the mutation operation; r1 is a random number between 0 and 1; Mu() randomly selects one position of the particle and randomly changes its value within the corresponding value range;
combining the standard PSO algorithm with the crossover operation of the genetic algorithm, the individual cognition part and the global cognition part of particle i at time t are obtained as:
where c1 is the individual cognition factor and c2 is the global cognition factor; pBest and gBest respectively denote the personal optimal position of a particle after multiple iterations and the global optimal position of the population; Cp() and Cg() denote crossover operations; r2 and r3 are random numbers between 0 and 1; Cp() and Cg() randomly select two positions of the particle and exchange the values between those positions with the same positions of pBest or gBest; r1 and r2 are random variables in the range [0, 1] used to enhance randomness during the iterative search.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810427966.1A CN108647771A (en) | 2018-05-07 | 2018-05-07 | The layout method of research-on-research flow data under a kind of mixing cloud environment |
Publications (1)
Publication Number | Publication Date |
---|---|
CN108647771A true CN108647771A (en) | 2018-10-12 |
Family
ID=63749278
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810427966.1A Pending CN108647771A (en) | 2018-05-07 | 2018-05-07 | The layout method of research-on-research flow data under a kind of mixing cloud environment |
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110033076A (en) * | 2019-04-19 | 2019-07-19 | 福州大学 | Mix the Work stream data layout method below cloud environment to cost optimization |
CN110058812A (en) * | 2019-03-08 | 2019-07-26 | 中国农业科学院农业信息研究所 | Scientific workflow data placement method under a kind of cloud environment |
CN111882234A (en) * | 2020-08-03 | 2020-11-03 | 浪潮云信息技术股份公司 | Scientific workflow task management method and device |
CN112579987A (en) * | 2020-12-04 | 2021-03-30 | 河南大学 | Migration deployment method and operation identity verification method of remote sensing program in hybrid cloud |
CN112632615A (en) * | 2020-12-30 | 2021-04-09 | 福州大学 | Scientific workflow data layout method based on mixed cloud environment |
CN113806039A (en) * | 2021-08-09 | 2021-12-17 | 杭州电子科技大学 | Particle swarm optimization workflow scheduling method based on directional search |
CN114049083A (en) * | 2021-11-10 | 2022-02-15 | 福建师范大学 | Scientific workflow data layout method and terminal based on privacy level classification |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
GB201712010D0 (en) * | 2015-01-27 | 2017-09-06 | Beijing Didi Infinity Tech And Dev Co Ltd | Information providing method and system for on-demand service |
EP3261054A1 (en) * | 2015-02-10 | 2017-12-27 | Beijing Didi Infinity Technology and Development Co., Ltd. | Order pushing method and system |
CN107656799A (en) * | 2017-11-06 | 2018-02-02 | 福建师范大学 | The workflow schedule method of communication and calculation cost is considered under a kind of more cloud environments |
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
GB201712010D0 (en) * | 2015-01-27 | 2017-09-06 | Beijing Didi Infinity Tech And Dev Co Ltd | Information providing method and system for on-demand service |
US20180017405A1 (en) * | 2015-01-27 | 2018-01-18 | Beijing Didi Infinity Technology And Development Co., Ltd. | Methods and systems for providing information for an on-demand service |
EP3261054A1 (en) * | 2015-02-10 | 2017-12-27 | Beijing Didi Infinity Technology and Development Co., Ltd. | Order pushing method and system |
CN107656799A (en) * | 2017-11-06 | 2018-02-02 | 福建师范大学 | The workflow schedule method of communication and calculation cost is considered under a kind of more cloud environments |
Non-Patent Citations (3)
Title |
---|
WU, Jiahao et al.: "A particle swarm and genetic optimization cloud workflow scheduling algorithm based on a multi-agent system", Journal of Nanjing University (Natural Science) * |
SUN, Ping'an: "Research on cloud computing resource allocation based on particle swarm optimization", Journal of Southwest China Normal University (Natural Science Edition) * |
LI, Xuejun et al.: "A data-center-oriented workflow data placement method in hybrid clouds", Journal of Software * |
Cited By (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110058812A (en) * | 2019-03-08 | 2019-07-26 | 中国农业科学院农业信息研究所 | Scientific workflow data placement method under a kind of cloud environment |
CN110058812B (en) * | 2019-03-08 | 2022-11-22 | 中国农业科学院农业信息研究所 | Scientific workflow data placement method in cloud environment |
CN110033076A (en) * | 2019-04-19 | 2019-07-19 | 福州大学 | Mix the Work stream data layout method below cloud environment to cost optimization |
CN110033076B (en) * | 2019-04-19 | 2022-08-05 | 福州大学 | Workflow data layout method for cost optimization in mixed cloud environment |
CN111882234A (en) * | 2020-08-03 | 2020-11-03 | 浪潮云信息技术股份公司 | Scientific workflow task management method and device |
CN112579987A (en) * | 2020-12-04 | 2021-03-30 | 河南大学 | Migration deployment method and operation identity verification method of remote sensing program in hybrid cloud |
CN112579987B (en) * | 2020-12-04 | 2022-09-13 | 河南大学 | Migration deployment method and operation identity verification method of remote sensing program in hybrid cloud |
CN112632615A (en) * | 2020-12-30 | 2021-04-09 | 福州大学 | Scientific workflow data layout method based on mixed cloud environment |
CN112632615B (en) * | 2020-12-30 | 2023-10-31 | 福州大学 | Scientific workflow data layout method based on hybrid cloud environment |
CN113806039A (en) * | 2021-08-09 | 2021-12-17 | 杭州电子科技大学 | Particle swarm optimization workflow scheduling method based on directional search |
CN114049083A (en) * | 2021-11-10 | 2022-02-15 | 福建师范大学 | Scientific workflow data layout method and terminal based on privacy level classification |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108647771A (en) | The layout method of research-on-research flow data under a kind of mixing cloud environment | |
CN112016812B (en) | Multi-unmanned aerial vehicle task scheduling method, system and storage medium | |
Kaur et al. | Deep‐Q learning‐based heterogeneous earliest finish time scheduling algorithm for scientific workflows in cloud | |
CN108989098B (en) | Time delay optimization-oriented scientific workflow data layout method in hybrid cloud environment | |
Jat et al. | A memetic algorithm for the university course timetabling problem | |
CN106227599B (en) | The method and system of scheduling of resource in a kind of cloud computing system | |
Hardiansyah et al. | Solving economic load dispatch problem using particle swarm optimization technique | |
CN110033076B (en) | Workflow data layout method for cost optimization in mixed cloud environment | |
CN102662743A (en) | Heuristic type coarse grain parallel grid task scheduling method | |
US20150170052A1 (en) | Method of reducing resource fluctuations in resource leveling | |
Deng et al. | A data and task co-scheduling algorithm for scientific cloud workflows | |
CN104937544A (en) | Computing regression models | |
Liu et al. | A data placement strategy for scientific workflow in hybrid cloud | |
Chen et al. | Scheduling independent tasks in cloud environment based on modified differential evolution | |
Mirzayi et al. | A hybrid heuristic workflow scheduling algorithm for cloud computing environments | |
Xhafa | A hybrid evolutionary heuristic for job scheduling on computational grids | |
Pooranian et al. | Hybrid metaheuristic algorithm for job scheduling on computational grids | |
CN112632615B (en) | Scientific workflow data layout method based on hybrid cloud environment | |
Kune et al. | Genetic algorithm based data-aware group scheduling for big data clouds | |
CN114461368A (en) | Multi-target cloud workflow scheduling method based on cooperative fruit fly algorithm | |
CN113987936A (en) | Equipment test resource overall allocation method based on chaotic genetic algorithm | |
CN107155215B (en) | Distribution method and device of application home service cluster | |
Entezari-Maleki et al. | A genetic algorithm to increase the throughput of the computational grids | |
Yu | [Retracted] Research on Optimization Strategy of Task Scheduling Software Based on Genetic Algorithm in Cloud Computing Environment | |
Li et al. | An evolutionary framework for multi-agent organizations |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20181012 |