Continuous annealing strip steel performance multi-index prediction method based on multi-objective evolutionary deep learning
Technical Field
The invention relates to the technical field of data processing in a cold rolling production control process of steel enterprises, in particular to a continuous annealing strip steel performance multi-index prediction method based on multi-objective evolutionary deep learning.
Background
Continuous annealing is an important step in the cold rolling process of the steel industry; it improves the physical and processing properties of metal materials, making them more suitable for various industrial applications. Because the continuous annealing production process involves multiple processing stages (heating, soaking, rapid cooling, slow cooling, overaging, quenching and the like), and each stage contains multiple mutually coupled control process parameters, the process mechanism is complex. As a result, the multiple performance indexes of continuous annealing strip steel are difficult to judge accurately from manual experience during actual production, the quality indexes of the strip steel are prone to large fluctuations, and continuous, stable operation of production is affected.
Therefore, data-driven modeling methods represented by deep learning have become an important means of predictive modeling of continuous annealing strip steel performance indexes, and a key technology for improving both the quality of continuous annealing strip steel products and the intelligence level of production process control. For example, patent application CN116796627A provides a cold-rolled steel sheet performance prediction method and system based on deep learning, in which the parameters of the deep neural network are set mainly according to manual experience. Although deep neural networks have many advantages in capturing the implicit relations among process control parameters and extracting associated features, their architecture design and network parameter settings generally rely on manual experience, so strip steel performance prediction models based on traditional deep learning easily lose generalization capability when working conditions change and have difficulty meeting actual production needs. In addition, most machine-learning-based continuous annealing strip steel performance prediction models (for example, invention patent ZL201710159565.8, and invention patent ZL201410843307.8, a method for online detection of continuous annealing strip steel quality based on hybrid ensemble learning) can only predict a single performance index and cannot predict multiple performance indexes simultaneously.
Disclosure of Invention
Aiming at the defects of the prior art, the invention provides a continuous annealing strip steel performance multi-index prediction method based on multi-objective evolutionary deep learning. The method overcomes the shortcomings of traditional deep neural network models built on artificial experience and realizes simultaneous prediction of multiple performance indexes: a multi-objective evolutionary algorithm performs a global search in the hyper-parameter space of the deep learning model to obtain the hyper-parameter combination with the best performance, thereby realizing autonomous optimized construction of the deep learning model and multi-index prediction of strip steel performance.
In order to solve the technical problems, the invention adopts the following technical scheme:
A continuous annealing strip steel performance multi-index prediction method based on multi-objective evolutionary deep learning comprises the following steps:
step 1, constructing a sample set;
The output of a sample is a plurality of performance indexes of one continuous annealing strip steel, and the input is the process data recorded while that strip steel undergoes continuous annealing production;
Step 2, data preprocessing;
Normalizing the n samples in the established sample set S_sam;
Step 3, generating a multi-target evolution deep learning model;
According to the n samples obtained in step 2, optimizing the network hyper-parameters with a multi-objective evolutionary deep learning method based on NSGA-II;
step 4, selecting an optimal solution;
Based on the objective function MSE_sum and the model complexity of each solution in the final population obtained in step 3, a TOPSIS selection is performed, and the top-ranked solution is taken as the optimal hyper-parameter combination of the CNN architecture;
step 5, constructing a hybrid deep learning network model;
Constructing a deep convolutional neural network according to the value of each hyper-parameter in the final solution selected in step 4, and then training the network on all data samples to obtain the corresponding deep convolutional neural network CNN;
Step 6, based on the deep convolutional neural network CNN obtained in step 5, a Transformer network is stacked after the CNN to construct a hybrid deep learning network model CNN-Transformer; the network is then trained on all data samples to obtain the final strip steel performance multi-index prediction model CNN-Transformer;
Step 7, acquiring the continuous annealing production process data of each strip steel in real time through an OPC interface, and predicting 3 key performance indexes (yield strength, tensile strength and elongation) with the CNN-Transformer model obtained in step 6, thereby realizing accurate online multi-index prediction; when the deviation between the predicted value and the expected value of a performance index exceeds a given range, an alarm is raised in time to ensure the stability of continuous annealing strip steel product quality.
Further, the specific method of step 3 is as follows:
Step 3.1, randomly dividing the n preprocessed samples into 0.8n and 0.2n samples to form a training set D_train and a validation set D_validate respectively; if 0.8n and 0.2n are not integers, they are rounded down;
Step 3.2, setting the maximum iteration number L_max, the population size N_pop and the network hyper-parameter set S_para = {x_1, x_2, x_3, x_4, x_5, x_6} of the multi-objective evolutionary deep learning algorithm, and setting an iteration counter l = 0, wherein x_1, x_2, x_3, x_4, x_5, x_6 correspond to 6 parameters of the convolutional neural network model, namely the number of convolution layers, the number of output channels, the convolution kernel size, the number of hidden units, the learning rate and the weight decay;
Step 3.3, initializing the population: taking all hyper-parameters of the deep learning network model as decision variables, randomly generating N_pop real-valued vectors whose length equals the number of hyper-parameters as the initial (first-generation) population, wherein each vector x = (x_1, x_2, x_3, x_4, x_5, x_6) is one solution in the population, represents one hyper-parameter combination scheme, and a corresponding deep learning network can be constructed from that individual;
Step 3.4, constructing the corresponding deep convolutional neural network according to the hyper-parameter values of each solution in the population, training it on the training set D_train, and computing on the validation set D_validate the prediction accuracy, defined as the sum MSE_sum of the mean square errors of the multiple indexes, and the model complexity Complexity as the two optimization objective functions of the solution;
Step 3.5, performing non-dominated sorting of the solutions in the current population according to the objective functions and crowding distance, and selecting N_pop solutions from the current population as the parent set using a binary tournament strategy;
Step 3.6, randomly selecting two solutions from the population and applying crossover and mutation operations in sequence to generate new offspring individuals; this is repeated N_pop times to obtain N_pop offspring solutions;
Step 3.7, for each offspring solution, computing its objective functions with the method of step 3.4;
Step 3.8, merging the newly generated N_pop offspring solutions with the current population to form a population of size 2N_pop;
Step 3.9, setting the new population as an empty set, and dividing the 2N_pop solutions of the merged population into different Pareto fronts using the NSGA-II fast non-dominated sorting method;
Step 3.10, setting the iteration counter l = l + 1; if l ≥ L_max, go to step 4, otherwise return to step 3.6.
Further, in step 3.2, the architecture of the deep learning network model needs to be designed accordingly; building the model comprises the following steps:
Step 3.2.1, convolution layer design: the network comprises x_1 convolution layers, each using a convolution kernel of size x_3 with x_2 output channels, and each convolution layer introduces a nonlinear transformation through the ReLU activation function;
Step 3.2.2, optimizer and loss function selection: the Adam optimizer and the mean square error loss function MSELoss are adopted; the loss function measures the difference between predicted and actual values through the squared difference between the network output and the target;
Step 3.2.3, fully connected layer design: the network comprises 3 fully connected layers, each containing x_4 hidden units, and each hidden unit is followed by a ReLU activation function;
Step 3.2.4, weight decay setting: the weights of the convolution layers and the linear layers are initialized with the Xavier initialization method; during training of the whole network, the parameters are continuously optimized by back propagation and gradient descent to minimize the loss function, with the learning rate set to x_5 and the weight decay set to x_6.
Further, the specific method for the crossover and mutation operations in step 3.6 is as follows:
Step 3.6.1, performing SBX crossover on the selected individuals: first a random number u is generated; if u < 0.9, crossover is performed, otherwise the parent individuals are copied directly to the next generation;
Step 3.6.2, performing the mutation operation on newly generated individuals: first a random number u' is generated; if u' < 0.5, the mutation operation is performed, one individual parameter x_i is selected from the current population, and the mutated individual parameter is obtained according to the formula x_i,new = x_i,old + δ·(x_i,upper - x_i,lower), wherein x_i,new is the mutated individual parameter, x_i,old is the selected individual parameter, x_i,upper and x_i,lower are the upper and lower bounds of the decision variable x_i (i = 1, 2, ..., 6), and δ represents the variation amount; otherwise the individual is kept unchanged.
Further, the specific process of step 3.9 is as follows:
Starting from the best Pareto front, solutions are selected front by front to generate the next generation population: if, after adding all solutions of the current Pareto front, the number of solutions in the new population is still smaller than N_pop, all solutions of that front are added to the new population and the procedure moves to the next front; otherwise, the most crowded solutions are deleted one by one from the current front based on crowding distance until adding the remaining solutions brings the new population to exactly N_pop, and those remaining solutions are then added, forming the new population for the next iteration.
Further, the specific method of step 4 is as follows:
step 4.1, constructing a decision matrix;
Constructing a decision matrix X with v rows and 2 columns, in which MSE_sum and Complexity form the two columns, i.e. X = (x_ij), i = 1, ..., v, j = 1, 2, where x_ij is the value of objective j for solution i;
Step 4.2, standardizing a decision matrix;
The decision matrix is normalized so that all indexes lie in the same numerical range; the normalized matrix is recorded as R = (r_ij), with r_ij = x_ij / sqrt(Σ_{i=1}^{v} x_ij²), wherein x_ij represents an element of the original decision matrix;
Step 4.3, determining weights;
Determining a weight for each index that represents its relative importance in the comprehensive evaluation: w_1 = 0.6 and w_2 = 0.4 are the weights of the two indexes, corresponding respectively to the MSE_sum and Complexity columns of the decision matrix X;
Step 4.4, constructing a weighted normalized decision matrix;
Multiplying the normalized decision matrix by the weights to obtain the weighted normalized decision matrix T = (t_ij), with t_ij = w_j · r_ij, wherein r_ij represents an element of the normalized decision matrix;
Step 4.5, calculating and determining the positive ideal solution A+ and the negative ideal solution A-: since both indexes are cost criteria to be minimized, A+ = (min_i t_i1, min_i t_i2) and A- = (max_i t_i1, max_i t_i2);
Step 4.6, calculating the distance;
Calculating the distance of each solution to the positive ideal solution, D_i+ = sqrt(Σ_j (t_ij - A_j+)²), and to the negative ideal solution, D_i- = sqrt(Σ_j (t_ij - A_j-)²);
Step 4.7, calculating comprehensive scores;
Calculating the closeness C_i = D_i- / (D_i+ + D_i-) of each solution as its comprehensive score;
Step 4.8, selecting an optimal scheme;
The C_i values of all schemes are ranked from high to low according to the comprehensive score, and the top-ranked solution is selected as the optimal hyper-parameter setting of the convolutional network model.
Further, the specific method in the step 5 is as follows:
Step 5.1, obtaining the optimal individual: the optimal individual selected in step 4 is taken as the decoding object; it contains the optimal hyper-parameter settings of one convolutional neural network CNN;
Step 5.2, decoding the different types of hyper-parameters separately and ensuring that the decoded hyper-parameters fall within the defined parameter ranges; if a decoded hyper-parameter exceeds its defined range, truncation is performed to the nearest value within the decision variable's range;
Step 5.3, applying the decoded hyper-parameters to the architecture and training process of the convolutional neural network model CNN.
The continuous annealing strip steel performance multi-index prediction method based on multi-objective evolutionary deep learning has the following beneficial effects: applying a multi-objective evolutionary algorithm to the hyper-parameter selection of the deep learning model further improves the accuracy of the online detection method; meanwhile, the training set is obtained by random selection, and the search strategy in the hyper-parameter space quickly yields the optimal hyper-parameter combination of the deep learning model, ultimately improving the generalization capability of the online detection model. Tests on actual production data show that the method provided by the invention can predict multiple performance indexes of continuous annealing strip steel and thus help cold rolling production improve its level of product quality control.
Drawings
FIG. 1 is a schematic diagram of a cold rolling production process flow and data acquisition based on data driving provided by an embodiment of the invention;
FIG. 2 is a flowchart of a multi-objective evolutionary deep learning model algorithm provided by an embodiment of the invention;
FIG. 3 is a diagram of the continuous annealing strip steel performance multi-index prediction network structure provided by an embodiment of the invention;
FIG. 4 is a graph showing a comparison of tensile strength predictions provided by an embodiment of the present invention.
Detailed Description
The following describes the embodiments of the present invention in further detail with reference to the drawings and examples. The following examples are illustrative of the invention and are not intended to limit its scope.
Aiming at strip steel of a certain temper degree, this embodiment provides a continuous annealing strip steel performance multi-index prediction method based on multi-objective evolutionary deep learning, which is specifically described as follows.
Step 1, constructing a sample set.
The output of a sample is a plurality of performance indexes (yield strength, tensile strength, elongation and the like) of one continuous annealing strip steel, and the input is the process data from its continuous annealing production. n = 15201 samples under normal production conditions are collected from the historical data of the cold rolling continuous annealing line, and a sample set S_sam is established. The cold rolling production process data of this embodiment are listed in Table 1, and the data collection process is shown in Fig. 1.
TABLE 1 acquisition item information of Cold Rolling Process data
Acquisition item name | Unit
Continuous annealing plan thickness | mm
Continuous annealing plan width | mm
Rp0.2 | N/mm2
Elongation percentage | %
Upper surface roughness | μm
Lower surface roughness | μm
Furnace speed | m/min
JPF | °C
... | ...
Tapping temperature | °C
Average coiling temperature | °C
Average finish rolling temperature | °C
Average finishing temperature | °C
Continuous pickling speed | m/s
Rolling elongation | %
Leveling machine elongation | %
Leveling machine inlet tension | kN
Step 2, data preprocessing. The n = 15201 samples in the sample set S_sam are normalized, scaling the data to the [0, 1] range to ensure a consistent scale across features.
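As an illustrative sketch (not the production code), the [0, 1] min-max scaling of step 2 can be written in plain Python; the two feature columns and their values below are invented for demonstration:

```python
def min_max_normalize(samples):
    """Scale each feature column of a list of sample vectors to [0, 1]."""
    n_features = len(samples[0])
    lo = [min(row[j] for row in samples) for j in range(n_features)]
    hi = [max(row[j] for row in samples) for j in range(n_features)]
    return [
        [(row[j] - lo[j]) / (hi[j] - lo[j]) if hi[j] > lo[j] else 0.0
         for j in range(n_features)]
        for row in samples
    ]

# Example with two made-up process features (e.g. furnace speed, tapping temperature)
raw = [[180.0, 880.0], [220.0, 900.0], [200.0, 890.0]]
scaled = min_max_normalize(raw)  # every value now lies in [0, 1]
```

In practice the per-feature minima and maxima would be computed on the training data and reused when normalizing newly acquired production samples.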
Step 3, generating the multi-objective evolutionary deep learning model. Based on the 15201 samples obtained in step 2, the network hyper-parameters are optimized with the NSGA-II-based multi-objective evolutionary deep learning method; the flow chart is shown in Fig. 2, and the main procedure comprises:
Step 3.1, randomly dividing the preprocessed n = 15201 samples into 0.8n = 12160 and 0.2n = 3040 samples, forming the training set D_train and the validation set D_validate respectively.
Step 3.2, setting the maximum iteration number L_max = 100, the population size N_pop = 100, the crossover probability p_c = 0.9, the mutation probability p_m = 0.5, the SBX distribution index η_c = 5, and the network hyper-parameter set S_para = {x_1, x_2, x_3, x_4, x_5, x_6}, and setting the iteration counter l = 0, wherein x_1, x_2, x_3, x_4, x_5, x_6 correspond to the parameters of the convolutional neural network model, namely the number of convolution layers, the number of output channels, the convolution kernel size, the number of hidden units, the learning rate and the weight decay.
In the deep learning model, the architecture needs to be designed accordingly; building the model comprises the following steps:
Step 3.2.1, convolution layer design: the network comprises x_1 convolution layers, each using a convolution kernel of size x_3 with x_2 output channels; each convolution layer introduces a nonlinear transformation through the ReLU activation function.
Step 3.2.2, optimizer and loss function selection: the Adam optimizer and the mean square error loss function MSELoss are used; the difference between predicted and actual values is measured through the squared difference between the network output and the target.
Step 3.2.3, fully connected layer design: the network comprises three fully connected layers, each containing x_4 hidden units; each hidden unit is followed by a ReLU activation function to improve the nonlinear capability of the model.
Step 3.2.4, weight decay setting: the weights of the convolution layers and the linear layers are initialized with the Xavier initialization method, and during training of the whole network the parameters are continuously optimized by back propagation and gradient descent to minimize the loss function, with the learning rate set to x_5 and the weight decay set to x_6.
Step 3.3, initializing the population: taking all hyper-parameters of the deep learning model as decision variables, 100 real-valued vectors whose length equals the number of hyper-parameters are randomly generated as the initial (first-generation) population. Each vector x = (x_1, x_2, x_3, x_4, x_5, x_6) is one solution in the population and represents one hyper-parameter combination scheme; a corresponding deep learning network can be constructed from that individual.
Step 3.4, constructing the corresponding deep convolutional neural network according to the hyper-parameters of each solution in the population, training the network on the training set D_train, and computing on the validation set D_validate the prediction accuracy, defined as the sum of the mean square errors of the multiple indexes, MSE_sum = Σ_{j=1}^{3} (1/(0.2n)) Σ_{i=1}^{0.2n} (y_ij - ŷ_ij)², and the model complexity Complexity as the two optimization objective functions of the solution. Here 0.2n is the number of validation samples, j = 1, 2, 3 indexes the yield strength, tensile strength and elongation, y_ij is the actual value of the j-th index of the i-th sample, and ŷ_ij is its predicted value.
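The accuracy objective above can be condensed into a minimal sketch; the function name and the sample triples below are illustrative only, and the Complexity objective (a property of the network, not of the predictions) is omitted:

```python
def mse_sum(actual, predicted):
    """Sum of per-index mean squared errors over the validation set.

    actual, predicted: lists of [yield strength, tensile strength,
    elongation] triples, one per validation sample.
    """
    n = len(actual)            # number of validation samples (0.2n in the text)
    n_idx = len(actual[0])     # number of performance indexes (3 here)
    return sum(
        sum((actual[i][j] - predicted[i][j]) ** 2 for i in range(n)) / n
        for j in range(n_idx)
    )

# Invented example values for two validation samples
y_true = [[300.0, 450.0, 30.0], [310.0, 460.0, 32.0]]
y_pred = [[302.0, 449.0, 30.5], [309.0, 462.0, 31.5]]
obj = mse_sum(y_true, y_pred)
```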
Step 3.5, performing non-dominated sorting of the solutions in the current population according to the objective functions and crowding distance, and selecting 100 solutions from the current population as the parent set using a binary tournament strategy.
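A minimal sketch of Pareto dominance and binary tournament selection for the two minimization objectives (MSE_sum, Complexity); the function names and the tie-breaking rule are assumptions, not taken from the source:

```python
import random

def dominates(a, b):
    """Pareto dominance for minimization: a dominates b if it is no worse
    in every objective and strictly better in at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def binary_tournament(population, objectives, rng=random):
    """Pick two random solutions; return the dominating one, otherwise one at random."""
    i, j = rng.randrange(len(population)), rng.randrange(len(population))
    if dominates(objectives[i], objectives[j]):
        return population[i]
    if dominates(objectives[j], objectives[i]):
        return population[j]
    return population[rng.choice([i, j])]
```

Full NSGA-II additionally breaks ties by front rank and crowding distance; the random tie-break here keeps the sketch short.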
Step 3.6, randomly selecting two solutions from the population and applying crossover and mutation in sequence to generate new offspring individuals; this is repeated 100 times to obtain 100 offspring solutions.
Step 3.6.1, performing SBX crossover on the selected individuals: a random number u is generated first; if u < 0.9, crossover is performed, otherwise the parent individuals are copied directly to the next generation.
Step 3.6.2, performing the mutation operation on the newly generated individuals: a random number u' is generated first; if u' < 0.5, the mutation operation is performed, one individual parameter x_i is selected from the current population, and the mutated individual parameter is obtained according to the formula x_i,new = x_i,old + δ·(x_i,upper - x_i,lower); otherwise the individual is kept unchanged. Here x_i,new is the mutated individual parameter, x_i,old is the selected individual parameter, x_i,upper and x_i,lower are the upper and lower bounds of the decision variable x_i (i = 1, 2, ..., 6), and δ represents the variation amount, set to δ = 0.2.
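A hedged sketch of this mutation step: the source fixes only δ = 0.2, so the random choice of perturbation direction and the clipping back to the variable bounds below are assumptions added to keep the operator well behaved:

```python
import random

def mutate(x_old, lower, upper, delta=0.2, rng=random):
    """Perturb one hyper-parameter by delta times its range, clipped to bounds.

    The random sign is an assumption: the source gives only the magnitude
    delta = 0.2 of the variation relative to the variable's range.
    """
    sign = 1 if rng.random() < 0.5 else -1
    x_new = x_old + sign * delta * (upper - lower)
    return min(max(x_new, lower), upper)   # truncate to [lower, upper]
```

For example, mutating a kernel-size variable x_old = 3.0 with bounds [1.0, 6.0] moves it by ±0.2 × 5.0 = ±1.0, giving 2.0 or 4.0.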
Step 3.7, computing the objective functions of each offspring solution with the method of step 3.4.
Step 3.8, merging the newly generated 100 offspring solutions with the current population to form a population of size 200.
Step 3.9, setting the new population as an empty set and dividing the 200 solutions into different Pareto fronts with the NSGA-II fast non-dominated sorting method. Solutions are selected front by front, starting from the best Pareto front, to generate the next generation population: if the number of solutions in the new population is still smaller than 100 after adding the whole current front, all its solutions are added and the procedure moves to the next front; otherwise, the most crowded solutions are deleted one by one from the current front based on crowding distance until adding the remaining solutions brings the new population to exactly 100, and those remaining solutions are then added, forming the new population for the next iteration.
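The crowding-distance truncation used when a front does not fit can be sketched as follows; this is the standard NSGA-II computation, and the helper names are illustrative:

```python
def crowding_distance(front_objs):
    """Crowding distance of each solution in one Pareto front,
    given as a list of (MSE_sum, Complexity) tuples."""
    n = len(front_objs)
    dist = [0.0] * n
    for m in range(len(front_objs[0])):                 # each objective
        order = sorted(range(n), key=lambda i: front_objs[i][m])
        dist[order[0]] = dist[order[-1]] = float("inf")  # keep boundary solutions
        span = front_objs[order[-1]][m] - front_objs[order[0]][m]
        if span == 0:
            continue
        for k in range(1, n - 1):
            dist[order[k]] += (front_objs[order[k + 1]][m]
                               - front_objs[order[k - 1]][m]) / span
    return dist

def truncate_front(front_objs, k):
    """Indices of the k least-crowded solutions (largest crowding distance),
    i.e. what remains after deleting the most crowded ones."""
    dist = crowding_distance(front_objs)
    return sorted(range(len(front_objs)), key=lambda i: -dist[i])[:k]
```

Deleting "the most crowded" solutions is equivalent to keeping those with the largest crowding distance, which preserves the spread of the front.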
Step 3.10, setting l = l + 1; if l ≥ L_max, go to step 4, otherwise return to step 3.6.1.
Step 4, selecting the optimal solution: based on the objective function MSE_sum and the model complexity of each solution in the final population, a TOPSIS selection is performed and the top-ranked solution is taken as the optimal hyper-parameter combination of the CNN architecture.
Step 4.1, constructing the decision matrix: a decision matrix X with 100 rows and 2 columns is constructed, with MSE_sum and Complexity as the two columns, i.e. X = (x_ij), i = 1, ..., 100, j = 1, 2.
Step 4.2, normalizing the decision matrix so that all indexes lie in the same numerical range; the normalized matrix is recorded as R = (r_ij), with r_ij = x_ij / sqrt(Σ_{i=1}^{v} x_ij²), where x_ij represents an element of the original decision matrix and v = 100 is the number of rows.
Step 4.3, determining the weights of the two indexes, i.e. the MSE_sum and Complexity columns of the decision matrix X, which represent their relative importance in the comprehensive evaluation: here w_1 = 0.6 and w_2 = 0.4.
Step 4.4, constructing the weighted normalized decision matrix by multiplying the normalized decision matrix by the weights: T = (t_ij), with t_ij = w_j · r_ij, where r_ij represents an element of the normalized decision matrix.
Step 4.5, determining the positive and negative ideal solutions: since both indexes are to be minimized, the positive ideal solution is A+ = (min_i t_i1, min_i t_i2) and the negative ideal solution is A- = (max_i t_i1, max_i t_i2).
Step 4.6, calculating distances: the distance of each solution to the positive ideal solution is D_i+ = sqrt(Σ_j (t_ij - A_j+)²) and to the negative ideal solution D_i- = sqrt(Σ_j (t_ij - A_j-)²).
Step 4.7, calculating the comprehensive score: the closeness of each solution, C_i = D_i- / (D_i+ + D_i-), is taken as its index value.
Step 4.8, selecting the optimal scheme: the C_i of all schemes are ranked from high to low according to the comprehensive score, and the top-ranked solution is selected as the optimal hyper-parameter setting of the convolutional network model.
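Steps 4.1 to 4.8 can be condensed into one illustrative TOPSIS routine. The objective values below are invented, and treating both columns as cost criteria (so the positive ideal takes the column minima) is an assumption consistent with minimizing MSE_sum and Complexity:

```python
from math import sqrt

def topsis(matrix, weights=(0.6, 0.4)):
    """Closeness scores C_i for a v-by-2 matrix of (MSE_sum, Complexity)
    values; both columns are treated as cost criteria. Higher C_i is better."""
    v = len(matrix)
    # Step 4.2: vector normalization; Step 4.4: apply the weights
    norms = [sqrt(sum(row[j] ** 2 for row in matrix)) for j in range(2)]
    t = [[weights[j] * row[j] / norms[j] for j in range(2)] for row in matrix]
    # Step 4.5: ideal solutions (minima for costs, maxima for the negative ideal)
    pos = [min(t[i][j] for i in range(v)) for j in range(2)]
    neg = [max(t[i][j] for i in range(v)) for j in range(2)]
    # Steps 4.6-4.7: distances and closeness
    scores = []
    for row in t:
        d_pos = sqrt(sum((row[j] - pos[j]) ** 2 for j in range(2)))
        d_neg = sqrt(sum((row[j] - neg[j]) ** 2 for j in range(2)))
        scores.append(d_neg / (d_pos + d_neg))
    return scores

# Invented objective values for three candidate hyper-parameter sets
objs = [[0.10, 5000.0], [0.08, 9000.0], [0.20, 3000.0]]
scores = topsis(objs)
best = max(range(len(objs)), key=lambda i: scores[i])  # step 4.8: top-ranked solution
```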
Step 5, constructing the hybrid deep learning network model. The corresponding deep convolutional neural network is constructed according to the value of each hyper-parameter in the final solution selected in step 4, and the network is then trained on all data samples to obtain the corresponding deep convolutional neural network.
Step 5.1, obtaining the optimal individual: the optimal individual selected in step 4 is taken as the decoding object; it contains the optimal hyper-parameter settings of one convolutional neural network CNN.
Step 5.2, decoding the different types of hyper-parameters separately and ensuring that they fall within the defined parameter ranges; if a decoded hyper-parameter exceeds its defined range, it is truncated to the nearest value within the decision variable's range.
Step 5.3, applying the decoded hyper-parameters to the architecture and training process of the convolutional neural network model CNN.
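A sketch of the decoding in steps 5.1 to 5.3: the parameter bounds below are illustrative assumptions (the source does not state them); only the integer rounding of discrete parameters and the truncation-to-range behavior follow the text:

```python
def decode_hyperparams(vector):
    """Map a real-valued NSGA-II solution vector (x_1, ..., x_6) to
    concrete CNN settings. Bounds are assumed for illustration."""
    bounds = {
        "n_conv_layers": (1, 6, int),
        "n_channels":    (16, 256, int),
        "kernel_size":   (2, 5, int),
        "n_hidden":      (10, 200, int),
        "learning_rate": (1e-5, 0.1, float),
        "weight_decay":  (1e-8, 1e-2, float),
    }
    decoded = {}
    for (name, (lo, hi, cast)), x in zip(bounds.items(), vector):
        x = min(max(x, lo), hi)                 # truncate to the defined range
        decoded[name] = int(round(x)) if cast is int else float(x)
    return decoded

# An out-of-range channel count (300.0) is truncated to the assumed upper bound
params = decode_hyperparams([4.2, 300.0, 3.4, 141.0, 0.06, 1.4e-6])
```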
Step 6, based on the obtained deep convolutional neural network CNN, a Transformer network is stacked after the CNN to construct the hybrid deep learning network model CNN-Transformer; the network architecture is shown in Fig. 3. The network is then trained on all data samples to obtain the final strip steel performance multi-index prediction model CNN-Transformer, in which the output dimension of the CNN serves as the input of the Transformer network, the number of attention heads is set to 4, and the number of repeated decoder units in the Transformer is set to 1.
Step 7, applying the CNN-Transformer model: in actual production, the continuous annealing process data of each strip steel are acquired in real time through an OPC interface, and the constructed CNN-Transformer model predicts the 3 key performance indexes of yield strength, tensile strength and elongation, realizing accurate online multi-index prediction; when the deviation between the predicted and expected value of a performance index exceeds a given range, an alarm is raised in time to ensure the stability of continuous annealing strip steel product quality.
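The deviation alarm of step 7 can be sketched as a simple threshold check; the tolerance values, index names and sample readings below are invented for illustration:

```python
def check_quality(predicted, expected, tolerance):
    """Return the names of performance indexes whose predicted-vs-expected
    deviation exceeds the allowed range; a non-empty result triggers an alarm."""
    return [name for name in predicted
            if abs(predicted[name] - expected[name]) > tolerance[name]]

# Invented values: yield strength deviates by 8.0, beyond its assumed 5.0 tolerance
pred = {"yield_strength": 318.0, "tensile_strength": 452.0, "elongation": 31.2}
want = {"yield_strength": 310.0, "tensile_strength": 450.0, "elongation": 31.0}
tol  = {"yield_strength": 5.0,   "tensile_strength": 10.0,  "elongation": 2.0}
alarms = check_quality(pred, want, tol)
```

In deployment such a check would run on each coil's predictions as they arrive over the OPC interface, with tolerances set per steel grade.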
Table 2 compares the hyper-parameter tuning results with empirically set values on the 12160 training samples for the continuous annealing strip steel performance multi-index prediction method based on multi-objective evolutionary deep learning provided in this embodiment, and Table 3 gives the experimental results on the test samples (samples not used to train the model); in the experiment, the test set was run independently 30 times for statistical analysis. As can be seen from Table 3, the root mean square error RMSE and mean absolute error MAE of the proposed multi-objective evolutionary deep learning method are both low in the multi-index prediction task, which reflects the closeness between the model's predicted and actual values and shows that, after parameter tuning, the hybrid deep learning model achieves good stability and prediction accuracy highly consistent with the actual situation in the multi-index prediction task. Fig. 4 shows a comparison of predicted and actual values of the tensile strength index.
Table 2 comparison of evolutionary Algorithm tuning results and empirical value set results
Parameter | Tuning result | Empirical value
Number of convolution layers | 4 | 5
Number of output channels | 192 | 128
Convolution kernel size | 3 | 4
Number of hidden units | 141 | 50
Learning rate | 0.06039172533384231 | 0.001
Weight decay | 1.3798747705105333E-06 | 0.001
TABLE 3 evaluation results of mechanical Properties prediction index
Mechanical property | MAE after evolution | RMSE after evolution | MAE before evolution | RMSE before evolution
Yield strength | 2.308 | 4.785 | 2.405 | 4.862
Tensile strength | 1.908 | 3.712 | 2.902 | 4.213
Elongation | 0.761 | 1.386 | 0.775 | 1.406
It should be noted that the above embodiments are merely intended to illustrate the technical solution of the present invention, not to limit it. Although the present invention has been described in detail with reference to the above embodiments, those skilled in the art should understand that the technical solutions described in the above embodiments may be modified, or some or all of their technical features equivalently replaced, and such modifications or substitutions do not cause the essence of the corresponding technical solutions to depart from the scope of the present invention defined by the claims.