Disclosure of Invention
Aiming at the problem that there is not enough cloth defect image data with holes, stains and loose warps to support the training of a deep learning model, the invention provides a cloth defect image generation system and method based on an improved variational self-encoder network. The variational self-encoder network is improved with the discriminator network of a generative adversarial network: the feature representations learned in the GAN discriminator are used as the basis of the VAE reconstruction target, and the pixel similarity metric of the VAE is replaced with the feature metric learned by the discriminator, so that the quality of the generated images is improved.
In one aspect, the present invention provides a cloth defect image generating system based on an improved variational self-encoder network, comprising a variational self-encoder network and a discriminator network, the variational self-encoder network being divided into an encoder network and a decoder network;
an encoder network for encoding the real target image x into a normal distribution q(z|x) over the latent space variable z;
a decoder network for sampling the latent space variable z from the normal distribution q(z|x) to generate a new target image;
and a discriminator network for measuring the similarity between the generated target image and the real target image, calculating the adversarial loss and passing it back to the encoder network and the decoder network, while the pixel-based reconstruction metric in the variational self-encoder network is replaced with the feature metric represented in the discriminator network.
Further, the encoder network comprises a resnet network; the resnet network comprises a Conv2d layer, a maximum pooling layer, 8 residual blocks and an average pooling layer connected in sequence from shallow to deep; the 8 residual blocks are connected in series.
The Conv2d layer consists of a convolution layer with a 7×7 convolution kernel and a step size of 2, a normalization layer and a ReLU activation function; the maximum pooling layer has a 3×3 kernel and a step size of 2; the average pooling layer has a 1×1 kernel. The 1st and 2nd residual blocks each contain two convolution layers with 3×3 kernels and 64 output channels; the 3rd and 4th residual blocks each contain two convolution layers with 3×3 kernels and 128 output channels; the 5th and 6th residual blocks each contain two convolution layers with 3×3 kernels and 256 output channels; the 7th and 8th residual blocks each contain two convolution layers with 3×3 kernels and 512 output channels.
Further, the decoder network comprises 6 layers: the first layer comprises a deconvolution layer with a 4×4 convolution kernel and a ReLU activation function; the second to fifth layers each comprise a deconvolution layer with a 4×4 convolution kernel, a normalization layer and a ReLU activation function; the sixth layer comprises a deconvolution layer with a 4×4 convolution kernel and a Tanh activation function; the numbers of output channels of the first to sixth layers are 512, 384, 192, 96, 64 and 3 in order.
Further, the discriminator network comprises 6 layers: the first layer comprises a convolution layer with a 4×4 convolution kernel and a step size of 2, and a LeakyReLU activation function; the second to fifth layers each consist of a convolution layer with a 4×4 convolution kernel and a step size of 2, a normalization layer and a ReLU activation function; the sixth layer consists of a convolution layer with a 4×4 convolution kernel and a step size of 1, and a Sigmoid activation function.
In another aspect, the present invention provides a cloth defect image generating method based on an improved variational self-encoder network, comprising:
Step 1: constructing the cloth defect image generation system based on the improved variational self-encoder network;
Step 2: screening cloth defect image data with holes, stains or loose warp defects as a training set, and scaling and cropping each image to 256×256 pixels;
step 3: defining a loss function of the cloth defect image generating system;
step 4: initializing a cloth defect image generation system;
Step 5: training the cloth defect image generation system on the training set using the Adam algorithm;
Step 6: generating cloth defect images with hole, stain or loose-warp defects using the trained cloth defect image generation system.
Further, the loss function of the cloth defect image generating system comprises the regularized prior loss L_prior of the variational self-encoder network, the adversarial loss L_GAN between the variational self-encoder network and the discriminator network, and the feature loss L_Dis1 of the first layer of the discriminator network, which replaces the pixel loss.
The regularized prior loss L_prior of the variational self-encoder network is defined as the KL divergence between the normal distribution q(z|x), obtained by encoding the target image x into the latent space variable z, and a given prior normal distribution p(z):

L_prior = D_KL(q(z|x) || p(z))
The adversarial loss L_GAN and the feature loss L_Dis1 of the first layer of the discriminator network are respectively defined as:

L_GAN = log(Dis(x)) + log(1 - Dis(Gen(z)))

L_Dis1 = -E[log p(Dis_1(x) | z)]
where D_KL denotes the KL divergence loss function; Dis denotes the discriminator function, which judges whether the picture input to it is real or fake, outputting 1 if real and 0 otherwise; Gen denotes the variational self-encoder, whose encoder encodes the input real picture into a latent space variable and whose decoder decodes the latent space variable into a new picture that serves as input to the discriminator; Dis_1 denotes the output of the first layer of the discriminator; and E denotes the expected value over the distribution.
Further, step 5 specifically includes: in the training process, the system parameters are updated through iteration until the system converges; in each iteration process, the process of updating the system parameters specifically comprises the following steps:
calculating losses of the encoder network and the decoder network through a loss function;
calculating the increment ΔW(i) of the convolution layer weights from the obtained loss using the back propagation algorithm and the gradient descent algorithm, and executing W(i) = W(i-1) - η·ΔW(i) to update the convolution kernel parameters;
wherein W(i) represents the convolution kernel parameters of the convolution layer after the i-th iteration, ΔW(i) is the parameter update computed by the back propagation algorithm and the gradient descent algorithm in the i-th iteration, and η is the step size.
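By way of a non-limiting illustration, a single plain gradient-descent update of a convolution kernel can be sketched in PyTorch as follows; the tensor shape, the placeholder loss and the step size value are assumptions made only for this sketch (the preferred embodiment uses the Adam algorithm, which additionally maintains moment estimates).

```python
import torch

eta = 0.001  # step size (learning rate) η; the value is assumed for illustration

# W stands for the convolution kernel parameters after the (i-1)-th iteration;
# the shape (out_channels, in_channels, kH, kW) is chosen only for illustration.
W = torch.randn(64, 3, 3, 3, requires_grad=True)

# A placeholder scalar loss stands in for the encoder/decoder losses described above.
loss = (W ** 2).sum()
loss.backward()                    # back propagation yields the gradient dL/dW

with torch.no_grad():
    delta_W = W.grad               # increment ΔW(i) from back propagation / gradient descent
    W -= eta * delta_W             # W(i) = W(i-1) - η·ΔW(i)
    W.grad.zero_()
```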
The invention has the beneficial effects that:
The invention improves the variational self-encoder network with the discriminator network of a generative adversarial network and, aiming at the problem that image data of the three cloth defects of holes, stains and loose warps is insufficient to support the training of a deep learning model, designs a cloth defect image generation system based on the improved variational self-encoder network together with its parameter configuration, which specifically includes: improving the network structure of the VAE with the GAN to form a new VAE-GAN; designing an improved variational self-encoder network capable of generating higher-quality images of the three cloth defects of holes, stains and loose warps; and effectively solving the problem that the image data of these three cloth defects is insufficient to support the training of the deep learning model.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the present invention more apparent, the technical solutions in the embodiments of the present invention will be clearly described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
As shown in fig. 1, an embodiment of the present invention provides a cloth defect image generating system based on an improved variational self-encoder network, which includes a variational self-encoder network Generator and a discriminator network Discriminator, the variational self-encoder network being divided into an encoder network Encoder and a decoder network Decoder;
an encoder network for encoding the real target image x into a normal distribution q(z|x) over the latent space variable z;
a decoder network for sampling the latent space variable z from the normal distribution q(z|x) to generate a new target image;
and a discriminator network for measuring the similarity between the generated target image and the real target image, calculating the adversarial loss and passing it back to the encoder network and the decoder network, while the pixel-based reconstruction metric in the variational self-encoder network is replaced with the feature metric represented in the discriminator network.
As an embodiment, as shown in fig. 2, the encoder network comprises a resnet network; the resnet network comprises a Conv2d layer, a maximum pooling layer, 8 residual blocks and an average pooling layer connected in sequence from shallow to deep; the 8 residual blocks are connected in series.
The Conv2d layer consists of a convolution layer with a 7×7 convolution kernel and a step size of 2, a normalization layer and a ReLU activation function; the maximum pooling layer has a 3×3 kernel and a step size of 2; the average pooling layer has a 1×1 kernel. The 1st and 2nd residual blocks each contain two convolution layers with 3×3 kernels and 64 output channels; the 3rd and 4th residual blocks each contain two convolution layers with 3×3 kernels and 128 output channels; the 5th and 6th residual blocks each contain two convolution layers with 3×3 kernels and 256 output channels; the 7th and 8th residual blocks each contain two convolution layers with 3×3 kernels and 512 output channels.
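As a non-limiting sketch of one possible implementation of this encoder in PyTorch: the kernel sizes, channel widths and block layout follow the description above, while the paddings, the downsampling strides and shortcut convolutions of the residual blocks, the global average pooling, the latent dimension (128) and the two fully connected heads producing the mean and log-variance of q(z|x) are assumptions not fixed by the text.

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Two 3x3 convolutions with a skip connection; the 1x1 shortcut used when
    the shape changes is an assumed detail."""
    def __init__(self, in_ch, out_ch, stride=1):
        super().__init__()
        self.conv1 = nn.Conv2d(in_ch, out_ch, 3, stride, 1, bias=False)
        self.bn1 = nn.BatchNorm2d(out_ch)
        self.conv2 = nn.Conv2d(out_ch, out_ch, 3, 1, 1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_ch)
        self.shortcut = nn.Sequential()
        if stride != 1 or in_ch != out_ch:
            self.shortcut = nn.Sequential(
                nn.Conv2d(in_ch, out_ch, 1, stride, bias=False),
                nn.BatchNorm2d(out_ch))
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return self.relu(out + self.shortcut(x))

class Encoder(nn.Module):
    """ResNet-style encoder mapping an image x to the mean and log-variance of
    q(z|x); the latent dimension of 128 is an assumed value."""
    def __init__(self, latent_dim=128):
        super().__init__()
        self.stem = nn.Sequential(
            nn.Conv2d(3, 64, 7, stride=2, padding=3, bias=False),  # Conv2d layer, 7x7, step 2
            nn.BatchNorm2d(64),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(3, stride=2, padding=1))                  # max pooling, 3x3, step 2
        self.blocks = nn.Sequential(                               # 8 residual blocks in series
            ResidualBlock(64, 64),   ResidualBlock(64, 64),
            ResidualBlock(64, 128, 2), ResidualBlock(128, 128),
            ResidualBlock(128, 256, 2), ResidualBlock(256, 256),
            ResidualBlock(256, 512, 2), ResidualBlock(512, 512))
        self.avgpool = nn.AdaptiveAvgPool2d(1)                     # average pooling
        self.fc_mu = nn.Linear(512, latent_dim)                    # mean of q(z|x)
        self.fc_logvar = nn.Linear(512, latent_dim)                # log-variance of q(z|x)

    def forward(self, x):
        h = self.avgpool(self.blocks(self.stem(x))).flatten(1)
        return self.fc_mu(h), self.fc_logvar(h)
```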
As an embodiment, as shown in fig. 3, the decoder network includes 6 layers. The first layer comprises a deconvolution layer with a 4×4 convolution kernel and a ReLU activation function, and its structure is represented as UpConv = [ConvTranspose4×4-ReLU]; the second to fifth layers each comprise a deconvolution layer with a 4×4 convolution kernel, a normalization layer and a ReLU activation function, represented as UpConv = [ConvTranspose4×4-BN-ReLU]; the sixth layer comprises a deconvolution layer with a 4×4 convolution kernel and a Tanh activation function, represented as UpConv = [ConvTranspose4×4-Tanh]. The numbers of output channels of the first to sixth layers are 512, 384, 192, 96, 64 and 3 in order.
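A corresponding non-limiting sketch of the six-layer decoder follows; only the 4×4 kernels, the activations and the channel sequence 512, 384, 192, 96, 64, 3 come from the description above, while the stride of 2, the padding of 1 and the initial linear projection of z to a 4×4 feature map (so that six spatial doublings yield a 256×256 output) are assumptions.

```python
import torch
import torch.nn as nn

def up_block(in_ch, out_ch, norm=True, act="relu"):
    """UpConv = [ConvTranspose4x4 (-BN) -ReLU/Tanh]; stride 2 and padding 1
    (each layer doubling the spatial size) are assumed, since the text only
    fixes the 4x4 kernel."""
    layers = [nn.ConvTranspose2d(in_ch, out_ch, 4, stride=2, padding=1, bias=not norm)]
    if norm:
        layers.append(nn.BatchNorm2d(out_ch))
    layers.append(nn.ReLU(inplace=True) if act == "relu" else nn.Tanh())
    return nn.Sequential(*layers)

class Decoder(nn.Module):
    """Maps a latent vector z to a 3-channel image; the linear projection of z
    to a 1024x4x4 feature map is an assumed detail."""
    def __init__(self, latent_dim=128):
        super().__init__()
        self.proj = nn.Linear(latent_dim, 1024 * 4 * 4)
        self.net = nn.Sequential(
            up_block(1024, 512, norm=False),          # layer 1: ConvT + ReLU
            up_block(512, 384),                       # layer 2: ConvT + BN + ReLU
            up_block(384, 192),                       # layer 3
            up_block(192, 96),                        # layer 4
            up_block(96, 64),                         # layer 5
            up_block(64, 3, norm=False, act="tanh"))  # layer 6: ConvT + Tanh

    def forward(self, z):
        h = self.proj(z).view(z.size(0), 1024, 4, 4)  # 4x4 -> ... -> 256x256
        return self.net(h)
```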
The variational self-encoder network in the embodiment of the invention extracts image features with the resnet network, uses the idea of residual blocks to alleviate the vanishing-gradient problem caused by an overly deep network, and generates a new target image by sampling z from the normal distribution q(z|x) and decoding it through the decoder network.
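The sampling of z from q(z|x) is typically implemented with the reparameterization trick so that gradients can flow back into the encoder; the following minimal sketch assumes the Encoder and Decoder classes from the preceding sketches.

```python
import torch

def reparameterize(mu, logvar):
    """Draw z ~ N(mu, sigma^2) as mu + sigma * eps with eps ~ N(0, I), keeping
    the sampling step differentiable with respect to the encoder outputs."""
    std = torch.exp(0.5 * logvar)
    eps = torch.randn_like(std)
    return mu + eps * std

# Usage (shapes assumed; encoder/decoder come from the earlier sketches):
# mu, logvar = encoder(x)          # parameters of q(z|x)
# z = reparameterize(mu, logvar)   # sampled latent space variable z
# x_tilde = decoder(z)             # newly generated target image
```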
As an embodiment, as shown in fig. 4, the discriminator network includes 6 layers: the first layer comprises a convolution layer with a 4×4 convolution kernel and a step size of 2, and a LeakyReLU activation function; the second to fifth layers each consist of a convolution layer with a 4×4 convolution kernel and a step size of 2, a normalization layer and a ReLU activation function; the sixth layer consists of a convolution layer with a 4×4 convolution kernel and a step size of 1, and a Sigmoid activation function.
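A non-limiting sketch of this discriminator follows; the kernel sizes, step sizes and activations are as described above, while the channel widths, the paddings, the LeakyReLU slope and the averaging of the final stride-1 convolution into a single probability are assumptions. The forward pass also returns the first-layer feature map, which is needed for the feature loss described below.

```python
import torch
import torch.nn as nn

class Discriminator(nn.Module):
    """Six-layer discriminator returning a real/fake probability and the
    first-layer feature map Dis_1(x)."""
    def __init__(self):
        super().__init__()
        self.layer1 = nn.Sequential(                 # layer 1: Conv 4x4, step 2, LeakyReLU
            nn.Conv2d(3, 64, 4, stride=2, padding=1),
            nn.LeakyReLU(0.2, inplace=True))

        def block(in_ch, out_ch):                    # layers 2-5: Conv 4x4, step 2, BN, ReLU
            return nn.Sequential(
                nn.Conv2d(in_ch, out_ch, 4, stride=2, padding=1, bias=False),
                nn.BatchNorm2d(out_ch),
                nn.ReLU(inplace=True))

        self.layers2_5 = nn.Sequential(
            block(64, 128), block(128, 256), block(256, 512), block(512, 512))
        self.layer6 = nn.Sequential(                 # layer 6: Conv 4x4, step 1, Sigmoid
            nn.Conv2d(512, 1, 4, stride=1, padding=0),
            nn.Sigmoid())

    def forward(self, x):
        feat1 = self.layer1(x)                       # first-layer features Dis_1(x)
        score = self.layer6(self.layers2_5(feat1))   # per-patch real/fake probabilities
        return score.flatten(1).mean(dim=1), feat1   # averaged probability, features
```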
In the embodiment of the invention, the discriminator network extracts image features with its convolution layers and judges whether the input image is real or fake through the Sigmoid function; the first-layer feature loss of the discriminator network and the discrimination result are fed back to the variational self-encoder network, which keeps regenerating images according to the feedback until the model is stable.
The invention also provides a cloth defect image generation method based on the improved variational self-encoder network, which adopts the cloth defect image generation system of the above embodiment and specifically comprises the following steps:
S101: constructing a cloth defect image generating system based on an improved variational self-encoder network, such as the cloth defect image generating system of the above embodiment;
S102: screening cloth defect image data with holes, stains or loose warp defects as a training set, and scaling and cropping each image to 256×256 pixels;
S103: defining a loss function of the cloth defect image generating system;
Specifically, in the embodiment of the present invention, the loss function of the cloth defect image generating system comprises the regularized prior loss L_prior of the variational self-encoder network, the adversarial loss L_GAN between the variational self-encoder network and the discriminator network, and the feature loss L_Dis1 of the first layer of the discriminator network, which replaces the pixel loss. The feature loss learned by the first layer of the discriminator is used as the loss of the generated image because pixel loss is less suitable for image data; a higher-level and sufficiently invariant representation of the image is therefore used to measure image similarity.
The regularized prior loss L_prior of the variational self-encoder network is defined as the KL divergence between the normal distribution q(z|x), obtained by encoding the target image x into the latent space variable z, and a given prior normal distribution p(z):

L_prior = D_KL(q(z|x) || p(z))
The adversarial loss L_GAN and the feature loss L_Dis1 of the first layer of the discriminator network are respectively defined as:

L_GAN = log(Dis(x)) + log(1 - Dis(Gen(z)))

L_Dis1 = -E[log p(Dis_1(x) | z)]
where D_KL denotes the KL divergence loss function; Dis denotes the discriminator function, which judges whether the picture input to it is real or fake, outputting 1 if real and 0 otherwise; Gen denotes the variational self-encoder, whose encoder encodes the input real picture into a latent space variable and whose decoder decodes the latent space variable into a new picture that serves as input to the discriminator; Dis_1 denotes the output of the first layer of the discriminator; and E denotes the expected value over the distribution; the larger the expected value, the smaller the loss.
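The three loss terms can be sketched in PyTorch as follows, assuming the encoder outputs the mean and log-variance of a diagonal Gaussian q(z|x); the closed-form KL term and the reading of the first-layer feature loss as a mean squared error in feature space (a Gaussian log-likelihood up to constants) follow common VAE-GAN practice and are assumptions of this sketch rather than a verbatim restatement of the formulas above.

```python
import torch
import torch.nn.functional as F

def prior_loss(mu, logvar):
    """L_prior = KL( q(z|x) || N(0, I) ), in closed form for a diagonal Gaussian."""
    return -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())

def gan_loss(dis_real, dis_fake, eps=1e-8):
    """L_GAN = log(Dis(x)) + log(1 - Dis(Gen(z))), averaged over the batch."""
    return torch.mean(torch.log(dis_real + eps) + torch.log(1.0 - dis_fake + eps))

def feature_loss(feat_real, feat_fake):
    """Feature-space reconstruction term: mean squared error between the
    discriminator's first-layer features of the real and generated images."""
    return F.mse_loss(feat_fake, feat_real)
```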
S104: initializing the cloth defect image generating system, comprising: for each convolution layer in the network, initializing the convolution kernel parameters in the Xavier manner.
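A minimal sketch of the Xavier initialization of step S104, using the initializers available in Pytorch; applying it to deconvolution layers as well, and setting the bias terms to 0.0, mirrors the experimental configuration described later and is otherwise an assumption.

```python
import torch.nn as nn

def init_weights(m):
    """Xavier-initialize every convolution kernel and zero its bias."""
    if isinstance(m, (nn.Conv2d, nn.ConvTranspose2d)):
        nn.init.xavier_uniform_(m.weight)
        if m.bias is not None:
            nn.init.constant_(m.bias, 0.0)

# Usage, assuming the modules from the earlier sketches:
# encoder.apply(init_weights); decoder.apply(init_weights); discriminator.apply(init_weights)
```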
S105: training the cloth defect image generation system by adopting a training set by using an Adam algorithm;
Specifically, the image generated by decoding the latent space variable z and the real image are sent to the discriminator network to calculate the adversarial loss; the adversarial loss measures the quality of the image generated by decoding the latent space variable z, and the result is fed back to the variational self-encoder network.
In the training process, the system parameters are updated through iteration until the system converges; in each iteration process, the process of updating the system parameters specifically comprises the following steps:
calculating losses of the encoder network and the decoder network through a loss function;
calculating the increment ΔW(i) of the convolution layer weights from the obtained loss using the back propagation algorithm and the gradient descent algorithm, and executing W(i) = W(i-1) - η·ΔW(i) to update the convolution kernel parameters;
wherein W(i) represents the convolution kernel parameters of the convolution layer after the i-th iteration, ΔW(i) is the parameter update computed by the back propagation algorithm and the gradient descent algorithm in the i-th iteration, and η is the step size. The step size η is also referred to as the learning rate, and its magnitude determines how fast and how well the network converges during training: when η is relatively large the network converges quickly during training but may fail to reach the global optimum, whereas when η is small the training converges more slowly but can often reach the global optimum. In the preferred embodiment, the Adam algorithm is used to iteratively update the parameters, and the learning rate is initially set to 0.001.
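One possible training iteration is sketched below, reusing the modules, data loader and loss functions from the earlier sketches; the alternating discriminator/generator updates, the placement of detach() and the exact combination of loss terms are assumptions, as the text only states that the Adam algorithm with an initial learning rate of 0.001 is used.

```python
import torch

# encoder, decoder, discriminator, train_loader, reparameterize, prior_loss,
# gan_loss and feature_loss are taken from the earlier sketches.
encoder, decoder, discriminator = Encoder(), Decoder(), Discriminator()
opt_gen = torch.optim.Adam(list(encoder.parameters()) + list(decoder.parameters()), lr=0.001)
opt_dis = torch.optim.Adam(discriminator.parameters(), lr=0.001)

for x, _ in train_loader:
    mu, logvar = encoder(x)
    z = reparameterize(mu, logvar)
    x_tilde = decoder(z)                       # image generated by decoding z

    # Discriminator step: maximize L_GAN, i.e. minimize its negative.
    d_real, _ = discriminator(x)
    d_fake, _ = discriminator(x_tilde.detach())
    loss_dis = -gan_loss(d_real, d_fake)
    opt_dis.zero_grad()
    loss_dis.backward()
    opt_dis.step()

    # Generator (VAE) step: prior KL + first-layer feature loss + adversarial term.
    d_fake, feat_fake = discriminator(x_tilde)
    _, feat_real = discriminator(x)
    loss_gen = (prior_loss(mu, logvar)
                + feature_loss(feat_real.detach(), feat_fake)
                + torch.mean(torch.log(1.0 - d_fake + 1e-8)))
    opt_gen.zero_grad()
    loss_gen.backward()
    opt_gen.step()
```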
S106: and generating a cloth defect image with holes, stains or loose warp defects by using a trained cloth defect image generation system.
Specifically, the cloth defect images with hole, stain or loose-warp defects generated by the trained cloth defect image generation system can be compared with images generated by a VAE and a GAN, and the quality of the generated images can be evaluated with their Cosine similarity and Mean squared error values, thereby measuring the performance of the cloth defect image generation system provided by the invention.
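The Cosine similarity and Mean squared error between a generated image and a reference image can be computed as in the following sketch; how generated images are paired with real images and how the scores are averaged over an evaluation set are not specified above and are left to the caller.

```python
import torch
import torch.nn.functional as F

def evaluate_pair(generated, real):
    """Cosine similarity and mean squared error between two images given as
    tensors of shape (C, H, W); both are assumed to share the same pixel scale."""
    g, r = generated.flatten(), real.flatten()
    cosine = F.cosine_similarity(g, r, dim=0).item()
    mse = F.mse_loss(g, r).item()
    return cosine, mse
```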
In order to verify the effectiveness of the cloth defect image generation system and method provided by the invention, the invention also provides the following experiment.
In the experiment, Pytorch-1.6 is used to implement the system network structure, and an LFW test set is used to test the network performance; the network is trained with the Adam algorithm, whose learning-rate hyperparameter is initially set to 0.0002 and automatically adjusted with a dynamic learning-rate schedule. All convolution kernels of the convolution layers are initialized with the Xavier initialization method in Pytorch, and the bias terms are set to 0.0. In each iteration, any 4 images from the training set form a batch and are fed into the network for training, which is repeated until the network converges. After training, the similarity of the image attributes is expressed by the evaluation indices Cosine similarity and Mean squared error; for comparison, networks such as the VAE and GAN are trained with the same dataset split and tested under their original configurations.
As shown in Table 1, the Cosine similarity and Mean squared error values of the resulting cloth defect images were tested under the VAE, the GAN, and the system and method of the present invention, respectively; as can be seen from Table 1, the system and method of the present invention perform better on the dataset than the VAE and the GAN.
TABLE 1
Model     | Cosine similarity | Mean squared error
test_set  | 0.9193            | 14.1987
VAE       | 0.9030            | 27.59±1.42
GAN       | 0.8892            | 27.89±3.0
VAEGAN    | 0.9114            | 22.39±1.16
Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present invention, and are not limiting; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention.