CN111598854A - Complex texture small defect segmentation method based on a rich robust convolution feature model
- Publication number: CN111598854A
- Application number: CN202010368806.1A
- Authority: CN (China)
- Prior art keywords: stage, layer, convolution, output, layers
- Legal status: Granted
Classifications
- G06T7/0004 — Image analysis; inspection of images, e.g. flaw detection; industrial image inspection
- G06T7/10 — Image analysis; segmentation; edge detection
- G06T2207/10004 — Image acquisition modality: still image; photographic image
- G06T2207/20081 — Special algorithmic details: training; learning
- G06T2207/20084 — Special algorithmic details: artificial neural networks [ANN]
- G06T2207/30108 — Subject of image: industrial image inspection
- G01N21/8851 — Investigating the presence of flaws or contamination; scan or image signal processing specially adapted therefor
- G01N2021/8887 — Scan or image signal processing based on image processing techniques
- G06F18/253 — Pattern recognition; fusion techniques of extracted features
- G06N3/045 — Neural networks; combinations of networks
- Y02P90/30 — Climate change mitigation in the production of goods: computing systems specially adapted for manufacturing
Abstract
The invention discloses a method for segmenting small defects in complex textures based on a rich robust convolution feature model. The method acquires an image containing an object to be segmented and performs feature recombination on it with the rich robust convolution feature model to obtain a feature map for each side output layer. Each side-output feature map is connected in turn to a deconvolution layer and a side-output-layer fine loss function to obtain the prediction feature map of each stage-side output layer. A fusion layer is also added to the model: the deconvolved feature maps of all side output layers are fused and then connected to a fusion-layer fine loss function to obtain the final prediction map, realizing defect segmentation. The method solves the problem of inaccurate prediction caused by the unbalanced ratio between target pixels and background pixels, and can predict fine targets.
Description
Technical Field
The invention relates to the technical field of lithium battery surface defect detection, and in particular to a method for segmenting small defects in complex textures based on a rich robust convolution feature model.
Background
Surface defect detection has become an important technical means of controlling the surface quality of lithium batteries; good surface quality not only prolongs the service life of the battery assembly but also improves its power generation efficiency.

Crack segmentation methods based on convolutional neural networks (CNNs) usually face two problems: cracks are seriously missed or falsely detected, and the predicted crack segmentation is coarse, requiring complex post-processing to obtain fine cracks. The complex, non-uniform texture background of the lithium battery surface is one main cause; another is the extremely unbalanced ratio of crack pixels to background pixels in the defect image. For example, a defect image may contain one million pixels while the defect occupies only a few dozen, or even a dozen, pixels.

The information captured by different convolutional layers becomes coarser and more global as the network deepens. Lower convolutional layers contain both the complex random texture background and target detail information; the distinction between target and background is weak, so the network learns only indiscriminate cues such as shapes and corner features. Higher convolutional layers retain the important target information, while middle convolutional layers contain essential target details. However, a typical convolutional neural network model uses only the output features of the last convolutional layer, or of the convolutional layers immediately before each stage's pooling layer, and ignores the target detail information contained in the middle convolutional layers. For crack segmentation, the critical difficulty is the high similarity between background and target information: excessive fusion causes serious false detection.

Although CNN-based segmentation methods are good at predicting features rich in semantic information such as contours and edges, analysis shows that cracks segmented directly with a convolutional neural network are much coarser than the annotated cracks of the real label, so crack pixels cannot be accurately located. The problem of predicting cracks, edges, contours or lines too thick is rarely discussed in the existing literature. One possible reason is that these methods usually apply a thinning post-processing step to the initial prediction to approach the real label, so the width of the raw prediction has little influence on the reported result; in fact this reduces prediction precision and cannot meet detection tasks with higher requirements on accurate pixel-level localization.

The loss function evaluates the inconsistency between the model's predicted value and the true value: the smaller the loss, the more robust the model, and the loss function guides model learning. Because the ratio of crack pixels to background pixels in a lithium battery crack defect image is extremely unbalanced, the negative samples (background pixels) dominate the model loss, so the learning process falls into a local minimum of the loss function, predictions are biased toward background pixels, and the trained model cannot detect cracks, which are rare events.
Disclosure of Invention
Aiming at the defects in the prior art, the invention provides a method for segmenting small defects in complex textures based on a rich robust convolution feature model.
The technical scheme adopted by the invention for solving the technical problems is as follows:
A method for segmenting small defects in complex textures based on a rich robust convolution feature model, characterized in that the method acquires an image containing an object to be segmented and performs feature recombination on it with the rich robust convolution feature model to obtain a feature map for each side output layer; each side-output feature map is connected in turn to a deconvolution layer and a side-output-layer fine loss function to obtain the prediction feature map of each stage-side output layer;

meanwhile, a fusion layer is added to the model: the deconvolved feature maps of all side output layers are fused together and then connected to a fusion-layer fine loss function to obtain the final prediction map, realizing defect segmentation;
wherein the side-output-layer fine loss function satisfies formula (1):

P_side = σ(A_side),  A_side = {a_j, j = 1, …, |Y|}   (2)

where L^(k)(P_side, G) denotes the distance loss function of the k-th stage; L(W, w^(k)) denotes the weighted cross-entropy loss function of the k-th stage; P_side denotes the prediction feature map of the k-th stage-side output layer; σ is the sigmoid activation function; A_side denotes the set of activation values at all pixels of the prediction feature map of the k-th stage-side output layer; a_j denotes the activation value at any pixel j in that prediction feature map; |Y| denotes the total number of defective and non-defective pixels in the map;
the fusion layer fine loss function is given by:
Lfuse(W,w)=Lc(Pfuse,G) (3)
in the formula, LcA cross entropy loss function representing a criterion; pfuseRepresenting the fusion of the prediction characteristic graphs of the k stage side output layers, namely the fusion layer weight; k represents the total number of stages;
summarizing the fusion layer fine loss function and the side output layer fine loss functions of all stages by using an argmin function to obtain a target function L, and expressing the target function L by using a formula (5);
and finally, optimizing the objective function to obtain the weights of the side output layer fine loss function and the fusion layer fine loss function.
The specific process of feature recombination with the rich robust convolution feature model is as follows:

the fully connected layer and the fifth-stage pooling layer are removed from the original ResNet40 network, and one convolutional layer is laterally connected to the identity block layer of the first stage and to the identity block layer of the second stage of the original ResNet40 network, respectively, to obtain the feature maps of the first- and second-stage-side output layers;

one convolutional layer is laterally connected after each block layer of the third, fourth and fifth stages of the original ResNet40 network to obtain the convolved feature map of each block layer, and the convolved feature maps of all block layers within the same stage are then added element by element to obtain the feature map of the corresponding stage-side output layer.

The convolutional layers laterally connected to the first- and second-stage identity block layers have kernel size 1 × 1, stride 1 and 1 channel; the convolutional layers laterally connected after each block layer of the third, fourth and fifth stages have kernel size 1 × 1, stride 1 and 21 channels.
The original ResNet40 network comprises 40 convolutional layers and a fully connected layer at the end of the network, divided into 5 stages. Each stage comprises one convolution block layer and one or more identity block layers: the first and second stages each comprise one convolution block layer and one identity block layer, the third, fourth and fifth stages each comprise one convolution block layer and two identity block layers, and every convolution block layer and identity block layer contains several convolutional layers. Each stage adds a pooling layer with pooling window size 2 × 2 and stride 2 after all its block layers.
The specific structure of the original ResNet40 network is as follows: the input target image first passes through a convolution with kernel size 5 × 5, stride 1 and 32 channels, followed by a max-pooling layer with window size 2 × 2 and stride 2, giving the input features of the first stage; the first-stage input features pass through three convolutions with kernel sizes 1 × 1, 3 × 3 and 1 × 1 (each with stride 1 and 32 channels) and are summed with a residual shortcut convolution of kernel size 1 × 1, stride 1 and 32 channels, giving the output features of the first-stage convolution block layer; these pass through three convolutions with kernel sizes 1 × 1, 3 × 3 and 1 × 1 (stride 1, 32 channels), giving the output features of the first-stage identity block layer, which pass through a pooling layer with window size 2 × 2 and stride 2, giving the output features of the first stage;

the first-stage output features pass through three convolutions with kernel sizes 1 × 1, 3 × 3 and 1 × 1 (stride 1, 64 channels) summed with a residual shortcut convolution of kernel size 1 × 1, stride 1 and 64 channels, giving the output features of the second-stage convolution block layer; these pass through three convolutions with kernel sizes 1 × 1, 3 × 3 and 1 × 1 (stride 1, 64 channels), giving the output features of the second-stage identity block layer, which pass through a pooling layer with window size 2 × 2 and stride 2, giving the output features of the second stage;

the second-stage output features pass through three convolutions with kernel sizes 1 × 1, 3 × 3 and 1 × 1 (stride 1, 256 channels) summed with a residual shortcut convolution of kernel size 1 × 1, stride 1 and 256 channels, giving the output features of the third-stage convolution block layer; these pass through three convolutions with kernel sizes 1 × 1, 3 × 3 and 1 × 1 (stride 1, 256 channels), giving the output features of the first identity block layer of the third stage, which pass through the same three convolutions, giving the output features of the second identity block layer of the third stage; these pass through a pooling layer with window size 2 × 2 and stride 2, giving the output features of the third stage;

the fourth stage repeats the operations of the third stage on the third-stage output features, giving the output features of the fourth stage;

the fifth stage repeats the convolution block layer and two identity block layers of the fourth stage on the fourth-stage output features, giving the output features of the fifth stage.
A method for segmenting small defects in complex textures based on a rich robust convolution feature model comprises the following specific steps:
S1 image preprocessing
Collect images containing defects to be segmented and normalize them to 1024 × 1024 pixels; add pixel-level labels to the normalized images, the labeled images being the target images; divide the target images into different sample sets in proportion;
S2 construction of the rich robust convolution feature model
Based on the original ResNet40 network, laterally connect one convolutional layer to the identity block layer of the first stage and to the identity block layer of the second stage, respectively, to obtain the feature maps of the first- and second-stage-side output layers;

laterally connect one convolutional layer after each block layer of the third, fourth and fifth stages of the original ResNet40 network to obtain the convolved feature map of each block layer, then add the convolved feature maps of all block layers within the same stage element by element to obtain the feature map of the corresponding stage-side output layer;

connect the feature maps of the five stage-side output layers each to a deconvolution layer (deconv) for up-sampling to obtain the deconvolved feature map of each stage, and connect each stage's deconvolved feature map to a side-output-layer fine loss function for pixel-by-pixel classification to obtain the prediction feature map of each stage-side output layer;

concatenate the deconvolved feature maps of all stages, then fuse them through a convolutional layer with kernel size 1 × 1 and stride 1 to obtain the fusion-layer feature map; finally connect the fusion-layer feature map to a fusion-layer fine loss function to obtain the final prediction feature map;
S3 model training and testing
Initialize the model parameters and input the training target images with their corresponding pixel-level labels. During model training, the loss is propagated back to the weights of each convolutional layer by stochastic gradient descent and the weight values are updated; the momentum of stochastic gradient descent is 0.9 and the weight decay is 0.0005. One image is randomly sampled per training step, and training stops when the number of iterations reaches 100 epochs, completing model training.

Scale the target images for testing to 1024 × 1024 pixels and input them into the trained model; the test time for a single image is 0.1 s. Repeat the model's operations to complete the model test.
The object to be segmented is a crack, an edge or a linear structure.
Compared with the prior art, the invention has the beneficial effects that:
the method is based on the reasonable utilization of convolution characteristics and the angle of designing a loss function, aims to enable a model to learn defect characteristics which are as rich and complete as possible, and predicts fine defects with robustness without using a post-processing method, so that the defect characteristics learned by the model can generate a prediction characteristic diagram which is as similar as possible to a real label; therefore, the invention constructs a rich robust convolution characteristic model based on an original ResNet40 network, and performs end-to-end deep learning under a Keras1.13 deep learning framework, the model adopts a network structure with multi-scale and multi-level characteristics, more high-level characteristics (third, fourth and fifth stages) are fused, and less low-level (first and second stages) characteristics are fused, meanwhile, each stage respectively adopts convolution with convolution kernel size of 1x1 for superposition fusion, all convolution characteristics are packaged into a richer and more robust expression mode, and the expression capability of the characteristics is improved; and the output characteristic diagram of the middle layer of each stage is utilized, so that the defect that the conventional convolutional neural network model only uses the output characteristics of the last convolutional layer or convolutional layers before each stage pooling layer and ignores target detail information contained in the middle layer is overcome.
To address inaccurate predictions caused by the unbalanced ratio of crack pixels to non-crack pixels, and to predict fine cracks, side-output-layer fine loss functions are introduced at the side output layers of each stage and a fusion-layer fine loss function is introduced at the model's fusion layer. The side-output-layer fine loss function combines a weighted cross-entropy loss function and a distance loss function to refine the defect features in each stage's side-output prediction feature map, and during training it optimizes the weights of every convolutional layer in the side output layers. The fusion-layer fine loss function incorporates the side-output-layer fine losses to refine the defect features in the final prediction feature map, and during training it optimizes the weights of every convolutional layer in the fusion layer, achieving crack prediction from global to local.
Compared with the traditional filter-based segmentation method and conventional convolutional neural networks, the rich robust convolution feature model predicts finer cracks, and the crack segmentation accuracy reaches 79.64%.
The method also offers an approach for segmenting other crack-like targets with extreme aspect ratios and high fineness requirements.
Drawings
FIG. 1 is a network architecture diagram of a rich robust convolution signature model of the present invention;
FIG. 2 is a graph of crack segmentation results for various segmentation methods of the present invention;
FIG. 3 is a comparison of evaluation results of different segmentation methods of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the specific drawings, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The method of the present application is described in detail below, taking its application to surface crack defect detection on a lithium battery as an example.
The invention provides a method for segmenting small defects in complex textures based on a rich robust convolution feature model (hereinafter, the method), comprising the following steps:
S1 image preprocessing
S1-1 acquiring an image
Lithium battery images are collected with a 1.4-megapixel near-infrared camera; the actual size of the collected lithium battery image is 165 mm × 165 mm. The collected images are normalized to 1024 × 1024 pixels as original images; no complex preprocessing is needed, and the size normalization guarantees suitability for model input. This image size is almost equal to the size acquired by the camera, so the original image information is better preserved, complex processing is avoided, the algorithm runs faster, and the real-time requirement of production line detection is met. The original images include images containing objects to be segmented and images without them;
S1-2 image labeling
All original images from step S1-1 that contain an object to be segmented are manually annotated with LabelImg software, adding pixel-level labels that contain the area size and spatial position information of the defect; the labeled images are the target images used for model training, testing and verification;
S1-3 sample sets
The target images from step S1-2 are grouped: 20% (default value) of the target images are randomly drawn as the test sample set, and the remaining target images are randomly divided into a training sample set and a verification sample set at a ratio of 4:1;
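For illustration, the division described in this step can be sketched as follows (the function name, random seed and use of file paths are assumptions for the sketch, not part of the patent):

```python
import random

def split_samples(image_paths, test_frac=0.2, seed=0):
    """Hold out 20% for testing, then split the rest 4:1 into train/validation."""
    rng = random.Random(seed)
    paths = list(image_paths)
    rng.shuffle(paths)
    n_test = int(len(paths) * test_frac)
    test, rest = paths[:n_test], paths[n_test:]
    n_val = len(rest) // 5                    # 4:1 train : validation ratio
    return rest[n_val:], rest[:n_val], test   # train, validation, test
```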
S2 construction of the Rich and Robust Convolutional Features (RRCF) model
S2-1 original ResNet40 network
The invention improves on the original ResNet40 network. The original ResNet40 network comprises 40 convolutional layers (Conv) and a fully connected layer at the end of the network, and is divided into 5 stages (Stage). Each stage comprises one convolution block layer (Conv Block) and one or more identity block layers (Identity Block), denoted stagek_blockm, where k is the stage index and m is the block index within that stage. The first and second stages each comprise one convolution block layer and one identity block layer; the third, fourth and fifth stages each comprise one convolution block layer and two identity block layers; each convolution block layer and identity block layer contains multiple convolutional layers. In each stage, a pooling layer with pooling window size 2 × 2 and stride 2 is added after all the block layers. Specific parameters of each convolutional layer of the original ResNet40 network are listed in Table 1;
the input target image first passes through a convolution with kernel size 5 × 5, stride 1 and 32 channels, followed by a max-pooling layer (Maxpool) with window size 2 × 2 and stride 2, giving the input features of the first stage; the first-stage input features pass through three convolutions with kernel sizes 1 × 1, 3 × 3 and 1 × 1 (each with stride 1 and 32 channels) and are summed with a residual shortcut (Shortcut) convolution of kernel size 1 × 1, stride 1 and 32 channels, giving the output features of the first-stage convolution block layer; these pass through three convolutions with kernel sizes 1 × 1, 3 × 3 and 1 × 1 (stride 1, 32 channels), giving the output features of the first-stage identity block layer, which pass through a pooling layer with window size 2 × 2 and stride 2, giving the output features of the first stage;

the first-stage output features pass through three convolutions with kernel sizes 1 × 1, 3 × 3 and 1 × 1 (stride 1, 64 channels) summed with a residual shortcut convolution of kernel size 1 × 1, stride 1 and 64 channels, giving the output features of the second-stage convolution block layer; these pass through three convolutions with kernel sizes 1 × 1, 3 × 3 and 1 × 1 (stride 1, 64 channels), giving the output features of the second-stage identity block layer, which pass through a pooling layer with window size 2 × 2 and stride 2, giving the output features of the second stage;

the second-stage output features pass through three convolutions with kernel sizes 1 × 1, 3 × 3 and 1 × 1 (stride 1, 256 channels) summed with a residual shortcut convolution of kernel size 1 × 1, stride 1 and 256 channels, giving the output features of the third-stage convolution block layer; these pass through three convolutions with kernel sizes 1 × 1, 3 × 3 and 1 × 1 (stride 1, 256 channels), giving the output features of the first identity block layer of the third stage, which pass through the same three convolutions, giving the output features of the second identity block layer of the third stage; these pass through a pooling layer with window size 2 × 2 and stride 2, giving the output features of the third stage;

the fourth stage repeats the operations of the third stage on the third-stage output features, giving the output features of the fourth stage;

the fifth stage repeats the convolution block layer and two identity block layers of the fourth stage on the fourth-stage output features, giving the output features of the fifth stage;
features of the input target image are computed and extracted layer by layer: the input target image is 1024 × 1024; after the convolution with kernel size 5 × 5, stride 1 and 32 channels, the output size is 1024 × 1024 × 32 (length and width 1024, 32 channels); after max pooling with window size 2 × 2 and stride 2, the output size is 512 × 512 × 32, i.e. the first-stage input features have size 512 × 512 × 32. After the above operations, the first-stage output features have size 256 × 256 × 32, the second-stage 128 × 128 × 64, the third-stage 64 × 64 × 256, the fourth-stage 32 × 32 × 256, and the fifth-stage 16 × 16 × 256;
TABLE 1 details of the original ResNet40 network
In the table, Identity Block × 2 indicates that the identity block operation is performed twice;
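For illustration, the convolution block / identity block pattern described above can be sketched with the Keras functional API as follows; layer names, the padding choice and the omission of batch normalization are assumptions of the sketch, not the patented implementation:

```python
from keras.layers import Conv2D, Add, Activation

def conv_block(x, channels):
    """Three convolutions (1x1, 3x3, 1x1, stride 1) summed with a 1x1 shortcut convolution."""
    y = Conv2D(channels, 1, strides=1, padding='same', activation='relu')(x)
    y = Conv2D(channels, 3, strides=1, padding='same', activation='relu')(y)
    y = Conv2D(channels, 1, strides=1, padding='same')(y)
    shortcut = Conv2D(channels, 1, strides=1, padding='same')(x)  # residual branch
    return Activation('relu')(Add()([y, shortcut]))

def identity_block(x, channels):
    """Same three convolutions with an identity shortcut (input channels must equal `channels`)."""
    y = Conv2D(channels, 1, strides=1, padding='same', activation='relu')(x)
    y = Conv2D(channels, 3, strides=1, padding='same', activation='relu')(y)
    y = Conv2D(channels, 1, strides=1, padding='same')(y)
    return Activation('relu')(Add()([y, x]))
```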
S2-2 feature map recombination of the rich robust convolution feature model
The fully connected layer and the fifth-stage pooling layer are removed from the original ResNet40 network constructed in step S2-1. On one hand, removing the fully connected layer yields a fully convolutional network that outputs image-to-image predictions and reduces the model's computational complexity; on the other hand, the fifth-stage pooling layer would double the stride and impair the localization of defects. Although pooling layers affect localization, they are kept in the first four stages mainly to speed up training;
a convolutional layer with kernel size 1 × 1 and 1 channel is laterally connected to the identity block layer of the first stage (stage1_block2) and of the second stage (stage2_block2) of the original ResNet40 network for channel dimensionality reduction, giving the feature maps of the first- and second-stage-side output layers and integrating the feature information;
after each block layer of the third, fourth and fifth stages of the original ResNet40 network, namely stage3_block1, stage3_block2, stage3_block3, stage4_block1, stage4_block2, stage4_block3, stage5_block1, stage5_block2 and stage5_block3, a convolutional layer with kernel size 1 × 1, stride 1 and 21 channels is laterally connected to obtain the convolved feature map of each block layer; the convolved feature maps of all block layers within the same stage are then added element by element to obtain the feature map of the corresponding stage-side output layer;
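For illustration, the recombination for one of the third to fifth stages can be sketched as follows (tensor names are illustrative; the 1 × 1, stride-1, 21-channel lateral convolution follows the text):

```python
from keras.layers import Conv2D, Add

def stage_side_output(block_outputs):
    """Laterally convolve each block-layer output (1x1, stride 1, 21 channels),
    then add the convolved maps element by element to form the stage-side feature map."""
    lateral = [Conv2D(21, 1, strides=1, padding='same')(b) for b in block_outputs]
    return Add()(lateral) if len(lateral) > 1 else lateral[0]
```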
S2-3 construction of the prediction feature map of the rich robust convolution feature model
The feature maps of the five stage-side output layers are each connected to a deconvolution layer (deconv) that up-samples them to the scale of the target image, giving the deconvolved feature map of each stage; the deconvolved feature maps retain the spatial position information of the defects in the target image;

each stage's deconvolved feature map is connected to a side-output-layer fine loss function for pixel-by-pixel classification, giving the prediction feature map of each stage-side output layer: every pixel of the deconvolved feature map is classified, and during training the side-output-layer fine loss function optimizes the weights of every convolutional layer in the side output layer;

to use the prediction feature maps of all stage-side output layers directly, a fusion layer is added to the model and its weights are learned during training: the deconvolved feature maps of all stages are concatenated, then fused through a convolutional layer with kernel size 1 × 1 and stride 1 to obtain the fusion-layer feature map; finally the fusion-layer feature map is connected to a fusion-layer fine loss function to obtain the final prediction feature map, in which the defect features are finely and finally predicted, while during training the fusion-layer fine loss function optimizes the weights of every convolutional layer in the fusion layer; the final prediction feature map is the prediction feature map of the rich robust convolution feature model;
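For illustration, the deconvolution and fusion steps can be sketched as follows; the transposed-convolution kernel sizes and the single-channel side outputs are assumptions of the sketch (the patent fixes only the up-sampling role of the deconvolution layers and the 1 × 1, stride-1 fusion convolution):

```python
from keras.layers import Conv2DTranspose, Concatenate, Conv2D, Activation

def side_predictions_and_fusion(side_maps, up_factors=(4, 8, 16, 32, 64)):
    """Up-sample each stage-side feature map back to the 1024x1024 input scale,
    then concatenate all up-sampled maps and fuse them with a 1x1 convolution."""
    upsampled = [Conv2DTranspose(1, kernel_size=2 * f, strides=f, padding='same')(m)
                 for m, f in zip(side_maps, up_factors)]
    side_preds = [Activation('sigmoid')(u) for u in upsampled]   # per-stage predictions
    fused = Conv2D(1, 1, strides=1, padding='same')(Concatenate()(upsampled))
    return side_preds, Activation('sigmoid')(fused)              # final prediction map
```

The up-sampling factors (4, 8, 16, 32, 64) follow from the stage output sizes given above (256, 128, 64, 32 and 16 pixels, respectively, against the 1024-pixel input).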
S3 design of the fine loss function
S3-1 weighted cross entropy loss function
In a defect image, small defects are unevenly distributed over the pixels against a complex texture background, and most pixels are randomly distributed non-defect (background) pixels, e.g. crack versus non-crack pixels; a plain cross-entropy loss function therefore cannot accurately separate defect pixels from non-defect pixels. The weighted cross-entropy loss introduces a class-balance weight coefficient β to counteract the imbalance between defective and non-defective pixels; the per-pixel loss satisfies formula (1):

β = |Y⁻| / |Y|,  1 − β = |Y⁺| / |Y|   (2)

where X denotes the target image; W denotes the set of all network layer parameters; w^(k) denotes the weights of the prediction feature map of the k-th stage-side output layer; Y⁺ and Y⁻ denote the defective and non-defective pixels respectively; β denotes the class-balance weight coefficient, with |Y| = |Y⁺| + |Y⁻|; y_j denotes any pixel in the target image; Pr(y_j = 1 | X; W, w^(k)) denotes the class score at pixel y_j computed with the sigmoid activation function, with Pr ∈ [0, 1];
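For illustration, this class-balanced loss can be sketched as a Keras-style loss function; the clipping epsilon and the per-batch computation of β are implementation assumptions:

```python
import keras.backend as K

def weighted_cross_entropy(y_true, y_pred, eps=1e-7):
    """y_true: 0/1 defect mask; y_pred: sigmoid class scores in [0, 1]."""
    y_pred = K.clip(y_pred, eps, 1.0 - eps)
    beta = 1.0 - K.mean(y_true)          # beta = |Y-|/|Y|, the background fraction
    loss = -(beta * y_true * K.log(y_pred)
             + (1.0 - beta) * (1.0 - y_true) * K.log(1.0 - y_pred))
    return K.mean(loss)
```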
S3-2 distance loss function (Dice loss)

Given a target image X with corresponding real label G and predicted image P, the distance loss function (Dice loss) compares the similarity between the predicted image P and the real label G and minimizes the distance between them; the Dice loss Dist(P, G) is given by the following formula:

where p_j ∈ P is any pixel in the predicted image P, g_j ∈ G is any pixel in the real label G, and N denotes the total number of pixels in the target image;
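The Dice formula itself appears only as a figure in the filing; the sketch below assumes the standard Dice form, with a smoothing term added for numerical stability:

```python
import keras.backend as K

def dice_loss(y_true, y_pred, smooth=1.0):
    """1 - Dice similarity between predicted image P and real label G."""
    p, g = K.flatten(y_pred), K.flatten(y_true)
    dice = (2.0 * K.sum(p * g) + smooth) / (K.sum(K.square(p)) + K.sum(K.square(g)) + smooth)
    return 1.0 - dice
```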
S3-3 design of the fine loss function
To obtain better defect prediction performance, a fine loss function (Precision Loss Function) combining the weighted cross-entropy loss and the Dice loss is proposed. The Dice loss is an image-level loss focused on the similarity between two sets of image pixels; it reduces redundant information and is the key to producing fine cracks in this application, but it is prone to incomplete predictions and target loss, e.g. part of a crack missing from the prediction. The weighted cross-entropy loss focuses on pixel-level differences, being the sum of distances between corresponding pixels of the predicted image and the real label; it predicts completely and does not lose targets, but easily introduces extra background information, making the prediction inaccurate. Combining the two therefore minimizes the distance from image level to pixel level and achieves prediction from global to local;
to obtain a more refined prediction feature map at each stage-side output layer, a side-output-layer fine loss function is proposed, satisfying the following formula:

P_side = σ(A_side),  A_side = {a_j, j = 1, …, |Y|}   (5)

where L^(k)(P_side, G) denotes the distance loss function of the k-th stage; L(W, w^(k)) denotes the weighted cross-entropy loss function of the k-th stage; P_side denotes the prediction feature map of the k-th stage-side output layer; σ is the sigmoid activation function; A_side denotes the set of activation values at all pixels of the prediction feature map of the k-th stage-side output layer; a_j denotes the activation value at any pixel j in that prediction feature map;
the fusion-layer fine loss function is given by:

L_fuse(W, w) = L_c(P_fuse, G)   (6)

where L_c denotes the standard cross-entropy loss function; P_fuse denotes the fusion of the prediction feature maps of the K stage-side output layers, i.e. the fusion-layer weighting; K denotes the total number of stages;
summarizing the fusion layer fine loss function and the side output layer fine loss functions of all stages by using an argmin function (a minimum function which represents a variable value when the target function takes the minimum value) to obtain a target function, wherein the formula is shown in a formula (8); optimizing a target function by a standard random gradient descent method, and further optimizing the weight of the fine loss function of each side output layer and the weight of the fine loss function of the fusion layer;
S4 model training and testing
S4-1 model parameter initialization: all weight values, bias values and batch normalization scale factors are initialized, the initialized parameters are loaded into the rich robust convolution feature model built in step S2, and the initial learning rate λ of the model is set to 0.001; the weight standard deviation of the convolutional layers in the first to fifth stages is initialized to 0.01 with bias 0; the weight standard deviation of all convolutional layers of the fusion layer is initialized to 0.2 with bias 0;
S4-2 model training: the target images of the training sample set and their corresponding pixel-level labels are input into the rich robust convolution feature model initialized in step S4-1; during training, the loss is propagated back to the weights of each convolutional layer by stochastic gradient descent (SGD) and the weight values are updated, with SGD momentum 0.9 and weight decay 0.0005; one image is randomly sampled per training step, and training stops after 100 iteration epochs, completing the training of the rich robust convolution feature model. All operations are performed under a Windows 10 system; the training computer has a Core i7 series CPU, 32 GB of memory and an NVIDIA GeForce GTX 2080 Ti graphics card. Training of the model is implemented on the Keras 1.13 deep learning framework;
S4-3 model testing: the target images of the test sample set are scaled to 1024 × 1024 pixels and input into the rich robust convolution feature model trained in step S4-2; the test time for a single image is 0.1 s, meeting the production efficiency requirement, and the operations of step S4-2 are repeated to complete the model test.
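For illustration, the stated training configuration can be sketched as follows; `build_rrcf_model`, the data arrays, the single-channel input and the loss wiring are placeholders, and in Keras the stated weight decay of 0.0005 would normally be realized as an L2 kernel regularizer on each convolutional layer rather than as an optimizer argument:

```python
from keras.optimizers import SGD

def fine_loss(y_true, y_pred):
    """Per-stage fine loss: weighted cross-entropy plus Dice distance (sketched above)."""
    return weighted_cross_entropy(y_true, y_pred) + dice_loss(y_true, y_pred)

model = build_rrcf_model(input_shape=(1024, 1024, 1))   # hypothetical model builder
model.compile(optimizer=SGD(lr=0.001, momentum=0.9), loss=fine_loss)
model.fit(train_images, train_labels, batch_size=1, epochs=100)  # 1 image per step, 100 epochs
```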
To verify the effectiveness of the method, experiments were carried out on lithium battery images containing crack defects, and the results were compared with a traditional segmentation method (Gabor filter) and a commonly used convolutional neural network (UNet, U-shaped network); the comparison is shown in FIG. 2, where (a1) is an original image containing a crack defect and (a5) is the corresponding real label; (a2) is the feature extraction result of the Gabor filter method; (a3) is the result of the UNet model (U-shaped network); (a4) is the result of the rich robust convolution feature model (RRCF) proposed by this method;
as can be seen from fig. 2, the RRCF model proposed by the method has more learned crack information due to the superposition and fusion of convolution features at each stage, and overcomes the defects that a Gabor filter method easily falsely detects a grid line structure similar to a crack structure and a UNet model easily falsely detects a portion shielded by a crystal grain as a crack; the crack lines predicted by the RRCF model provided by the method are thinner and are closer to real labels, and the result shows that the two fine loss functions in the method are beneficial to predicting fine cracks, so that the prediction result of the Gabor filter and the UNet model which are not fine enough in prediction is improved, and the prediction precision is higher.
To quantitatively evaluate the performance of each method, three indexes are used: cpt (integrity), crt (accuracy) and F-measure. The F-measure is computed from cpt and crt; the higher the F-measure, the more effective the method. The expressions are given in formulas (9) to (11);

where L_g denotes the number of crack pixels in the manually annotated real label; L_t denotes the number of pixels extracted by the detection method; and L denotes the number of extracted pixels that match the real label;
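Formulas (9) to (11) appear as figures in the filing; from the pixel counts defined above the indexes can be computed as below, where the harmonic-mean form of the F-measure is an assumption consistent with its standard definition:

```python
def evaluate(n_matched, n_label, n_extracted):
    """cpt = L / L_g, crt = L / L_t, F-measure = harmonic mean of cpt and crt."""
    cpt = n_matched / n_label        # integrity: fraction of true crack pixels recovered
    crt = n_matched / n_extracted    # accuracy: fraction of extracted pixels that are correct
    return cpt, crt, 2 * cpt * crt / (cpt + crt)
```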
each index value of the three methods is shown in fig. 3, the UNet model and the rich robust convolution characteristic model both show higher integrity cpt, and reflect the advantages of the convolution neural network in solving the crack detection problem under the complex background interference; the F-measure of the rich robust convolution characteristic model is 85.81%, and the performance is superior to that of the other two methods; the integrity and the accuracy of the rich robust convolution feature model are respectively 93.02% and 79.64%, on one hand, the integrity of crack segmentation is improved through multi-level fusion of a network, on the other hand, two fine loss functions are designed according to the characteristic of the extreme length-width ratio of the crack, the interference of background information is reduced, the accuracy is improved, the identification accuracy is obviously improved, crack features are not easy to lose in the process, and the crack omission is avoided; the accuracy of the UNet model is the lowest (69.5%), which is because the UNet model is greatly influenced by background interference, excessive background information is introduced, fine segmentation of cracks cannot be realized, and the accuracy is the lowest; in conclusion, the method has the highest completeness and accuracy of crack segmentation, has the best segmentation effect, and can realize fine segmentation of cracks.
Matters not described in detail in this specification belong to the prior art known to those skilled in the art.
Claims (7)
1. A method for segmenting small defects in complex textures based on a rich robust convolution feature model, characterized in that the method acquires an image containing an object to be segmented and performs feature recombination on it with the rich robust convolution feature model to obtain a feature map for each side output layer; each side-output feature map is connected in turn to a deconvolution layer and a side-output-layer fine loss function to obtain the prediction feature map of each stage-side output layer;

meanwhile, a fusion layer is added to the model: the deconvolved feature maps of all side output layers are fused together and then connected to a fusion-layer fine loss function to obtain the final prediction map, realizing defect segmentation;
wherein the side-output-layer fine loss function satisfies formula (1):

P_side = σ(A_side),  A_side = {a_j, j = 1, …, |Y|}   (2)

where L^(k)(P_side, G) denotes the distance loss function of the k-th stage; L(W, w^(k)) denotes the weighted cross-entropy loss function of the k-th stage; P_side denotes the prediction feature map of the k-th stage-side output layer; σ is the sigmoid activation function; A_side denotes the set of activation values at all pixels of the prediction feature map of the k-th stage-side output layer; a_j denotes the activation value at any pixel j in that prediction feature map; |Y| denotes the total number of defective and non-defective pixels in the map;
the fusion-layer fine loss function is given by:

L_fuse(W, w) = L_c(P_fuse, G)   (3)

where L_c denotes the standard cross-entropy loss function; P_fuse denotes the fusion of the prediction feature maps of the K stage-side output layers, i.e. the fusion-layer weighting; K denotes the total number of stages;
summarizing the fusion layer fine loss function and the side output layer fine loss functions of all stages by using an argmin function to obtain a target function L, and expressing the target function L by using a formula (5);
and finally, optimizing the objective function to obtain the weights of the side output layer fine loss function and the fusion layer fine loss function.
2. The segmentation method according to claim 1, wherein the specific process of feature recombination with the rich robust convolution feature model is as follows:

the fully connected layer and the fifth-stage pooling layer are removed from the original ResNet40 network, and one convolutional layer is laterally connected to the identity block layer of the first stage and to the identity block layer of the second stage of the original ResNet40 network, respectively, to obtain the feature maps of the first- and second-stage-side output layers;

one convolutional layer is laterally connected after each block layer of the third, fourth and fifth stages of the original ResNet40 network to obtain the convolved feature map of each block layer, and the convolved feature maps of all block layers within the same stage are then added element by element to obtain the feature map of the corresponding stage-side output layer.
3. The segmentation method according to claim 2, wherein the convolutional layers laterally connected to the first- and second-stage identity block layers have kernel size 1 × 1, stride 1 and 1 channel; the convolutional layers laterally connected after each block layer of the third, fourth and fifth stages have kernel size 1 × 1, stride 1 and 21 channels.
4. The segmentation method according to claim 2, wherein the original ResNet40 network comprises 40 convolutional layers and a fully connected layer at the end of the network and is divided into 5 stages, each comprising one convolution block layer and one or more identity block layers; the first and second stages each comprise one convolution block layer and one identity block layer, the third, fourth and fifth stages each comprise one convolution block layer and two identity block layers, and each convolution block layer and identity block layer contains a plurality of convolutional layers; each stage adds a pooling layer with pooling window size 2 × 2 and stride 2 after all its block layers.
5. The segmentation method according to claim 4, wherein the original ResNet40 network has a specific structure:
firstly, sequentially carrying out convolution with convolution kernel size of 5 multiplied by 5, step length of 1, channel number of 32 and maximum pooling layer with convolution kernel size of 2 multiplied by 2 and step length of 2 on an input target image to obtain input characteristics of a first stage; the input features of the first stage are connected with a residual error with convolution kernel size of 1 × 1, step size of 1 × 3 and channel number of 32 through three convolution kernels with convolution kernel size of 1 × 1, convolution kernel size of 3 × 3 and convolution kernel size of 1 × 1 and channel number of 1 × 1 in sequence to obtain output features of the convolution block layer of the first stage; the output characteristics of the first stage convolution block layer are subjected to three convolutions with convolution kernel sizes of 1 × 1, 3 × 3 and 1 × 1, step lengths of 1 and channel numbers of 32 in sequence to obtain the output characteristics of the first stage identification block layer; the output characteristic of the first stage identification block layer is subjected to a pooling layer with a convolution kernel size of 2 multiplied by 2 and a step length of 2 to obtain the output characteristic of the first stage;
the output characteristics of the first stage are connected by three convolution volumes with convolution kernel sizes of 1 × 1, 3 × 3 and 1 × 1, step lengths of 1 and channel numbers of 64 and a residual error with convolution kernel sizes of 1 × 1, step lengths of 1 and channel numbers of 64 in sequence to obtain the output characteristics of the convolution block layer of the second stage; the output characteristics of the second stage convolution block layer are subjected to three convolutions with convolution kernel sizes of 1 × 1, 3 × 3 and 1 × 1, step lengths of 1 and channel numbers of 64 in sequence to obtain the output characteristics of the second stage identification block layer; the output characteristics of the second stage identification block layer pass through a pooling layer with convolution kernel size of 2 multiplied by 2 and step length of 2 to obtain the output characteristics of the second stage;
the second-stage output features pass through three convolutions with kernel sizes of 1 × 1, 3 × 3 and 1 × 1, each with a stride of 1 and 256 channels, plus a residual connection with a 1 × 1 kernel, a stride of 1 and 256 channels, giving the output features of the third-stage convolutional block layer; these pass through three convolutions with kernel sizes of 1 × 1, 3 × 3 and 1 × 1, each with a stride of 1 and 256 channels, giving the output features of the first identity block layer of the third stage; these in turn pass through three convolutions with kernel sizes of 1 × 1, 3 × 3 and 1 × 1, each with a stride of 1 and 256 channels, giving the output features of the second identity block layer of the third stage; finally, a pooling layer with a 2 × 2 window and a stride of 2 gives the output features of the third stage;
the fourth stage performs the same operations as the third stage: repeating the third-stage operations on the third-stage output features gives the fourth-stage output features;
the fifth stage performs the same convolutional-block and identity-block operations as the fourth stage: repeating them on the fourth-stage output features gives the fifth-stage output features (a minimal sketch of this backbone follows).
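As a reading aid, here is a minimal PyTorch sketch of the backbone described in claims 4 and 5. The BatchNorm/ReLU placement, the stem padding and all module names are assumptions not specified in the claims.

```python
import torch.nn as nn

def bottleneck(in_ch, ch):
    # Three convolutions with kernel sizes 1x1, 3x3, 1x1, all stride 1.
    return nn.Sequential(
        nn.Conv2d(in_ch, ch, 1), nn.BatchNorm2d(ch), nn.ReLU(inplace=True),
        nn.Conv2d(ch, ch, 3, padding=1), nn.BatchNorm2d(ch), nn.ReLU(inplace=True),
        nn.Conv2d(ch, ch, 1), nn.BatchNorm2d(ch),
    )

class ConvBlock(nn.Module):
    """Bottleneck with a 1x1, stride-1 residual projection on the shortcut."""
    def __init__(self, in_ch, ch):
        super().__init__()
        self.body = bottleneck(in_ch, ch)
        self.shortcut = nn.Conv2d(in_ch, ch, kernel_size=1, stride=1)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.relu(self.body(x) + self.shortcut(x))

class IdentityBlock(nn.Module):
    """Bottleneck whose shortcut is the identity mapping."""
    def __init__(self, ch):
        super().__init__()
        self.body = bottleneck(ch, ch)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.relu(self.body(x) + x)

def make_stage(in_ch, ch, n_identity):
    # One conv block, n identity blocks, then 2x2 stride-2 max pooling.
    blocks = [ConvBlock(in_ch, ch)]
    blocks += [IdentityBlock(ch) for _ in range(n_identity)]
    blocks.append(nn.MaxPool2d(kernel_size=2, stride=2))
    return nn.Sequential(*blocks)

# Stem: 5x5 stride-1, 32-channel convolution plus 2x2 stride-2 max pooling.
stem = nn.Sequential(nn.Conv2d(3, 32, 5, stride=1, padding=2),
                     nn.ReLU(inplace=True),
                     nn.MaxPool2d(2, 2))
stages = nn.Sequential(
    make_stage(32, 32, 1),    # stage 1
    make_stage(32, 64, 1),    # stage 2
    make_stage(64, 256, 2),   # stage 3
    make_stage(256, 256, 2),  # stage 4
    make_stage(256, 256, 2),  # stage 5
)
```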
6. The segmentation method according to any one of claims 1 to 5, characterized in that the method comprises the following specific steps:
S1: image preprocessing
collecting images containing the defects to be segmented, and normalizing each collected image to 1024 × 1024 pixels; adding pixel-level labels to the normalized images, the labelled images being the target images; dividing the target images into different sample sets according to a preset ratio;
S2: construction of the rich robust convolution characteristic model
based on the original ResNet40 network, laterally connecting one convolutional layer to the identity block layer of the first stage and one to the identity block layer of the second stage, respectively, to obtain the side-output feature maps of the first and second stages;
laterally connecting a convolutional layer behind each block layer in the third, fourth and fifth stages of the original ResNet40 network to obtain the convolved feature map of each block layer, and then adding the convolved feature maps of all block layers within the same stage element by element to obtain the side-output feature map of the corresponding stage;
connecting the feature map of each of the five side-output layers to a deconvolution (deconv) layer for upsampling to obtain the deconvolved feature map of each stage, and connecting each deconvolved feature map to a side-output-layer fine loss function for pixel-wise classification to obtain the prediction feature map of each side-output layer;
concatenating the deconvolved feature maps of all stages and fusing them through a convolutional layer with a 1 × 1 kernel and a stride of 1 to obtain the fusion-layer feature map; finally, connecting the fusion-layer feature map to the fusion-layer fine loss function to obtain the final prediction feature map (an upsampling-and-fusion sketch follows this claim);
S3: model training and testing
initializing the model parameters and inputting the training target images together with their corresponding pixel-level labels; during training, the weights of every convolutional layer are updated by stochastic gradient descent on the loss, with a momentum of 0.9 and a weight decay of 0.0005; one image is randomly sampled at each training step, and training stops once the number of iteration epochs reaches 100, completing the training of the model (a training sketch follows claim 7);
scaling the target images used for testing to 1024 × 1024 pixels and inputting them into the trained model; a single image takes about 0.1 s to test, and the test is completed by running the model over all test images.
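A hedged sketch of the upsampling-and-fusion head of step S2 follows, written in the style of HED/RCF side-output heads. The transposed-convolution kernel/stride/padding choices (sized to undo an assumed cumulative 2× pooling per stage), the per-stage side-channel counts and the sigmoid read-out are assumptions, not figures taken from the patent.

```python
import torch
import torch.nn as nn

class RRCFHead(nn.Module):
    """Upsamples each stage's side-output map with a deconvolution and fuses
    all five maps through a 1x1, stride-1 convolution (step S2)."""
    def __init__(self, side_channels=(1, 1, 21, 21, 21)):
        super().__init__()
        # Stage k's map is assumed downsampled by 2**k; each deconv's
        # kernel/stride/padding are chosen so the output regains full size.
        self.deconvs = nn.ModuleList(
            [nn.ConvTranspose2d(c, 1, kernel_size=2 ** (k + 1),
                                stride=2 ** k, padding=2 ** (k - 1))
             for k, c in enumerate(side_channels, start=1)]
        )
        self.fuse = nn.Conv2d(len(side_channels), 1, kernel_size=1, stride=1)

    def forward(self, side_maps):
        ups = [d(m) for d, m in zip(self.deconvs, side_maps)]
        side_preds = [torch.sigmoid(u) for u in ups]   # per-stage predictions
        fused = torch.sigmoid(self.fuse(torch.cat(ups, dim=1)))
        return side_preds, fused                        # final prediction map
```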
7. The segmentation method according to claim 1, wherein the object to be segmented is a crack, an edge or a line structure.
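Finally, an illustrative training loop matching the hyper-parameters stated in step S3 (SGD with momentum 0.9 and weight decay 0.0005, one randomly sampled 1024 × 1024 image per step, 100 epochs). The learning rate, the loss choice and the data pipeline are placeholders, since the side-output and fusion "fine loss functions" are not spelled out in the text above.

```python
import torch
from torch.utils.data import DataLoader

def train(model, dataset, epochs=100, lr=1e-6):
    # SGD with the momentum and weight decay stated in step S3.
    opt = torch.optim.SGD(model.parameters(), lr=lr,
                          momentum=0.9, weight_decay=0.0005)
    loader = DataLoader(dataset, batch_size=1, shuffle=True)  # 1 image/step
    bce = torch.nn.BCELoss()  # placeholder for the "fine loss" functions
    for _ in range(epochs):
        for image, label in loader:  # image: 1 x 3 x 1024 x 1024
            side_preds, fused = model(image)
            # Supervise every side-output map plus the fused map.
            loss = bce(fused, label)
            loss = loss + sum(bce(p, label) for p in side_preds)
            opt.zero_grad()
            loss.backward()
            opt.step()
```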
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010368806.1A CN111598854B (en) | 2020-05-01 | 2020-05-01 | Segmentation method for small defects of complex textures based on rich robust convolution feature model |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010368806.1A CN111598854B (en) | 2020-05-01 | 2020-05-01 | Segmentation method for small defects of complex textures based on rich robust convolution feature model |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111598854A (en) | 2020-08-28
CN111598854B CN111598854B (en) | 2023-04-28 |
Family
ID=72186940
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010368806.1A Active CN111598854B (en) | 2020-05-01 | 2020-05-01 | Segmentation method for small defects of complex textures based on rich robust convolution feature model |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111598854B (en) |
Patent Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2018028255A1 (en) * | 2016-08-11 | 2018-02-15 | 深圳市未来媒体技术研究院 | Image saliency detection method based on adversarial network |
CN110110692A (en) * | 2019-05-17 | 2019-08-09 | 南京大学 | A kind of realtime graphic semantic segmentation method based on the full convolutional neural networks of lightweight |
Non-Patent Citations (2)
Title |
---|
CHEN Haiyong et al.: "Detection of Cracks in Electroluminescence Images by Fusing Deep Learning and Structural Decoupling" *
ZHOU Ying; MAO Li; ZHANG Yan; CHEN Haiyong: "Application of an improved CNN to solar panel defect detection" (in Chinese) *
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112164082A (en) * | 2020-10-09 | 2021-01-01 | 深圳市铱硙医疗科技有限公司 | Method for segmenting multi-modal MR brain image based on 3D convolutional neural network |
CN112215819A (en) * | 2020-10-13 | 2021-01-12 | 中国民航大学 | Airport pavement crack detection method based on depth feature fusion |
CN112215819B (en) * | 2020-10-13 | 2023-06-30 | 中国民航大学 | Airport pavement crack detection method based on depth feature fusion |
CN112489023A (en) * | 2020-12-02 | 2021-03-12 | 重庆邮电大学 | Pavement crack detection method based on multiple scales and multiple layers |
CN114372958A (en) * | 2021-12-15 | 2022-04-19 | 西安铂力特增材技术股份有限公司 | Scanning defect identification method based on deep learning |
CN114397306A (en) * | 2022-03-25 | 2022-04-26 | 南方电网数字电网研究院有限公司 | Multi-stage model joint detection method for super-complex-category defects of power grid grading rings |
Also Published As
Publication number | Publication date |
---|---|
CN111598854B (en) | 2023-04-28 |
Similar Documents
Publication | Title
---|---
CN110443143B (en) | Remote sensing image scene classification method fusing multi-branch convolutional neural networks
CN109919108B (en) | Fast remote sensing image target detection method based on a deep hash auxiliary network
CN111598854A (en) | Complex texture small defect segmentation method based on rich robust convolution characteristic model
CN109118479B (en) | Capsule-network-based insulator defect identification and positioning device and method
CN113569667B (en) | Inland ship target identification method and system based on a lightweight neural network model
CN111598861A (en) | Non-uniform texture small defect detection method based on an improved Faster R-CNN model
CN110060237A (en) | Fault detection method, device, equipment and system
CN112241699A (en) | Object defect category identification method and device, computer equipment and storage medium
CN106504233A (en) | Method and system for recognizing electric power components in UAV inspection images based on Faster R-CNN
CN110543906B (en) | Automatic skin recognition method based on the Mask R-CNN model
Savino et al. | Automated classification of civil structure defects based on convolutional neural network
CN111985554A (en) | Model training method, bracelet identification method and corresponding devices
CN113469950A (en) | Method for diagnosing abnormal heating defects of composite insulators based on deep learning
CN114972759B (en) | Remote sensing image semantic segmentation method based on a hierarchical contour cost function
CN116612098B (en) | Insulator RTV spraying quality evaluation method and device based on image processing
CN113673482B (en) | Cell antinuclear antibody fluorescence recognition method and system based on dynamic label distribution
CN110008899B (en) | Candidate target extraction and classification method for visible-light remote sensing images
CN116630700A (en) | Remote sensing image classification method introducing a channel-spatial attention mechanism
CN112364974B (en) | Improved YOLOv3 algorithm based on a modified activation function
CN115239672A (en) | Defect detection method and device, equipment and storage medium
CN113221956A (en) | Target identification method and device based on an improved multi-scale depth model
CN114596273B (en) | Intelligent detection method for multiple ceramic substrate defects using a YOLOv4 network
CN111340748A (en) | Battery defect identification method and device, computer equipment and storage medium
CN113657196B (en) | SAR image target detection method and device, electronic equipment and storage medium
CN113096079B (en) | Image analysis system and construction method thereof
Legal Events
Code | Title
---|---
PB01 | Publication
SE01 | Entry into force of request for substantive examination
GR01 | Patent grant