
CN112927236B - Clothing analysis method and system based on channel attention and self-supervision constraint - Google Patents


Info

Publication number
CN112927236B
CN112927236B
Authority
CN
China
Prior art keywords: data, output, output data, upsampled, decoded
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110226332.1A
Other languages
Chinese (zh)
Other versions
CN112927236A (en)
Inventor
项欣光
左成婷
张冬
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University of Science and Technology
Original Assignee
Nanjing University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University of Science and Technology
Priority to CN202110226332.1A
Publication of CN112927236A
Application granted
Publication of CN112927236B
Legal status: Active
Anticipated expiration

Classifications

    • G - PHYSICS
      • G06 - COMPUTING; CALCULATING OR COUNTING
        • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
          • G06T 7/00 - Image analysis
            • G06T 7/10 - Segmentation; Edge detection
          • G06T 2207/00 - Indexing scheme for image analysis or image enhancement
            • G06T 2207/20 - Special algorithmic details
              • G06T 2207/20081 - Training; Learning
              • G06T 2207/20084 - Artificial neural networks [ANN]
        • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
          • G06N 3/00 - Computing arrangements based on biological models
            • G06N 3/02 - Neural networks
              • G06N 3/04 - Architecture, e.g. interconnection topology
                • G06N 3/045 - Combinations of networks
                • G06N 3/048 - Activation functions
              • G06N 3/08 - Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

The invention relates to a garment analysis method and system based on channel attention and self-supervision constraint. The method comprises: acquiring a clothing picture data set, wherein the data set comprises the clothing analysis graph corresponding to each clothing picture; inputting each clothing picture in the data set into a neural network comprising a channel attention module; performing image feature extraction multiple times on each input clothing picture based on the channel attention module to obtain an output feature map of each clothing picture; iteratively training the neural network based on a strong supervision constraint and a self-supervision constraint to obtain a trained neural network model, wherein the strong supervision constraint is imposed by the clothing analysis graph corresponding to each clothing picture on the output feature map, and the self-supervision constraint is imposed by higher-level outputs on lower-level outputs among the decoded output data in the neural network; and inputting a clothing picture to be analyzed into the trained neural network model and outputting a clothing analysis graph. The invention reduces the number of parameters and improves garment analysis accuracy.

Description

Clothing analysis method and system based on channel attention and self-supervision constraint
Technical Field
The invention relates to the technical field of computer vision semantic segmentation, in particular to a garment analysis method and system based on channel attention and self-supervision constraint.
Background
Image semantic segmentation is one of the core research problems in computer vision. It aims to extract correct high-level semantic information from an image and associate that information correctly with every pixel. Clothing analysis is an important and finer-grained research direction within semantic segmentation: it aims to automatically analyze an acquired image and parse the clothing worn by the people in it, so as to replace the human eye in identifying and locating garments. Clothing analysis has broad application prospects, mainly in virtual try-on, pedestrian re-identification, clothing recommendation, clothing retrieval and related fields. However, it is a relatively new research direction and related techniques are still few; at the same time, deep learning technology keeps advancing, so an advanced garment analysis method that follows the current development trend is urgently needed.
Existing garment analysis methods fall into two major categories: multi-stage methods and end-to-end methods. Multi-stage methods mostly add human pose estimation and clothing classification to the garment analysis network as prior knowledge, and use a conditional random field as a subsequent auxiliary processing step. End-to-end methods feed the original fashion clothing picture directly into the garment analysis network, which directly outputs the parsed clothing prediction map. However, these garment analysis networks either introduce too much prior knowledge and subsequent assistance; or the introduced prior knowledge contains noise, so the extracted features are not optimal; or they require a bulky network architecture. Some networks have been improved from the perspective of simplifying the structure, but their segmentation performance still needs improvement.
Disclosure of Invention
The invention aims to provide a garment analysis method and system based on channel attention and self-supervision constraint, so as to improve garment analysis accuracy.
In order to achieve the purpose, the invention provides the following scheme:
a method of garment parsing based on channel attention and self-supervision constraints, the method comprising:
acquiring a clothing picture data set, wherein the clothing picture data set comprises clothing analytic graphs corresponding to all clothing pictures;
inputting each clothing picture in the clothing picture data set into a neural network, wherein the neural network comprises a channel attention module;
based on the channel attention module, performing multiple times of image feature extraction on the input clothing pictures to obtain output feature maps of the clothing pictures;
performing iterative training on the neural network based on strong supervision constraint and self-supervision constraint to obtain a trained neural network model; the strong supervision constraint is the strong supervision constraint of the clothing analysis graph corresponding to each clothing picture on the output characteristic graph, and the self-supervision constraint is the self-supervision constraint of high-level output on low-level output in each decoding output data in the neural network;
and inputting the clothing picture to be analyzed into the trained neural network model, and outputting a clothing analysis picture.
Optionally, before inputting each clothing picture in the clothing picture data set into the neural network, the method further includes:
and carrying out normalization processing on each clothing picture in the clothing picture data set.
Optionally, the image feature extraction specifically includes:
different convolution operations are carried out on the clothes pictures input into the neural network to obtain first data;
carrying out average pooling on the first data to obtain second data;
performing feature extraction on the second data by using set convolution to obtain first feature data;
performing maximum pooling on the first data to obtain third data;
performing feature extraction on the third data by using the set convolution to obtain second feature data;
adding the first characteristic data and the second characteristic data and inputting the added first characteristic data and the added second characteristic data into a Sigmoid function to obtain fourth data;
and multiplying the fourth data by the first data to obtain output data of the channel attention module.
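For clarity, the channel attention computation described in the steps above can be sketched in PyTorch as follows; the module and parameter names, the choice of a shared 1x1 bottleneck for the set convolution, and the reduction ratio are illustrative assumptions rather than details taken from the patent.

    import torch
    import torch.nn as nn

    class ChannelAttention(nn.Module):
        # Sketch: the input feature map (first data) is average pooled and max pooled
        # over the spatial dimensions, each branch passes through the same "set
        # convolution", the two results are added, squashed by a Sigmoid, and used
        # to re-weight the channels of the input.
        def __init__(self, channels, reduction=4):
            super().__init__()
            # the "set convolution" is assumed here to be a shared 1x1 bottleneck
            self.shared_conv = nn.Sequential(
                nn.Conv2d(channels, channels // reduction, kernel_size=1, bias=False),
                nn.ReLU(inplace=True),
                nn.Conv2d(channels // reduction, channels, kernel_size=1, bias=False),
            )
            self.sigmoid = nn.Sigmoid()

        def forward(self, b):
            c = torch.mean(b, dim=(2, 3), keepdim=True)   # second data: average pooling
            d = self.shared_conv(c)                        # first feature data
            e = torch.amax(b, dim=(2, 3), keepdim=True)    # third data: max pooling
            f = self.shared_conv(e)                        # second feature data
            h = self.sigmoid(d + f)                        # fourth data
            return b * h                                   # output data of the module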
Optionally, the performing, based on the channel attention module, multiple times of image feature extraction on the input clothing pictures to obtain an output feature map of each clothing picture specifically includes:
repeating the image feature extraction 7 times to obtain first encoded output data, where I = ψ_encoder1(A) = ρ_7(ρ_6(ρ_5(ρ_4(ρ_3(ρ_2(ρ_1(A))))))), in which I represents the first encoded output data, ψ_encoder1 represents the acquisition process of the first encoded output data, ρ_i' represents the i'-th repetition, i' ∈ {1, 2, 3, 4, 5, 6, 7}, and A represents a clothing picture in the clothing picture data set;
down-sampling the first encoded output data to obtain first down-sampled encoded output data;
repeating the image feature extraction 6 times to obtain second encoded output data, where J = ψ_encoder2(I') = ρ_6(ρ_5(ρ_4(ρ_3(ρ_2(ρ_1(I')))))), in which J represents the second encoded output data, ψ_encoder2 represents the acquisition process of the second encoded output data, and I' represents the first downsampled encoded output data;
down-sampling the second coded output data to obtain second down-sampled coded output data;
repeating the image feature extraction 5 times to obtain third encoded output data, where K = ψ_encoder3(J') = ρ_5(ρ_4(ρ_3(ρ_2(ρ_1(J'))))), in which K represents the third encoded output data, ψ_encoder3 represents the acquisition process of the third encoded output data, and J' represents the second downsampled encoded output data;
down-sampling the third encoded output data to obtain third down-sampled encoded output data;
repeating the image feature extraction 4 times to obtain fourth encoded output data, where L = ψ_encoder4(K') = ρ_4(ρ_3(ρ_2(ρ_1(K')))), in which L represents the fourth encoded output data, ψ_encoder4 represents the acquisition process of the fourth encoded output data, and K' represents the third downsampled encoded output data;
down-sampling the fourth encoded output data to obtain fourth down-sampled encoded output data;
repeating the image feature extraction 4 times to obtain fifth encoded output data, where M = ψ_encoder5(L') = ρ_4(ρ_3(ρ_2(ρ_1(L')))), in which M represents the fifth encoded output data, ψ_encoder5 represents the acquisition process of the fifth encoded output data, and L' represents the fourth downsampled encoded output data;
performing downsampling on the fifth coded output data to obtain fifth downsampled coded output data;
repeating the image feature extraction 4 times to obtain sixth encoded output data, where N = ψ_encoder6(M') = ρ_4(ρ_3(ρ_2(ρ_1(M')))), in which N represents the sixth encoded output data, ψ_encoder6 represents the acquisition process of the sixth encoded output data, and M' represents the fifth downsampled encoded output data;
performing upsampling on the sixth coded output data N to obtain sixth upsampled coded output data N';
repeating the image feature extraction 4 times to obtain fifth decoded output data, where O = ψ_decoder5(N', M) = ρ_4(ρ_3(ρ_2(ρ_1(concat(N', M))))), in which O represents the fifth decoded output data, ψ_decoder5 represents the acquisition process of the fifth decoded output data, concat() represents channel fusion, and N' represents the sixth upsampled encoded output data;
performing upsampling on the fifth decoded output data to obtain fifth upsampled coded output data;
repeating the image feature extraction 4 times to obtain fourth decoded output data, where P = ψ_decoder4(O', L) = ρ_4(ρ_3(ρ_2(ρ_1(concat(O', L))))), in which P represents the fourth decoded output data, ψ_decoder4 represents the acquisition process of the fourth decoded output data, and O' represents the fifth upsampled encoded output data;
up-sampling the fourth decoded output data to obtain fourth up-sampled encoded output data;
repeating the image feature extraction 5 times to obtain third decoded output data, where Q = ψ_decoder3(P', K) = ρ_5(ρ_4(ρ_3(ρ_2(ρ_1(concat(P', K)))))), in which Q represents the third decoded output data, ψ_decoder3 represents the acquisition process of the third decoded output data, and P' represents the fourth upsampled encoded output data;
up-sampling the third decoded output data to obtain third up-sampled encoded output data;
repeating the image feature extraction 6 times to obtain second decoded output data, where R = ψ_decoder2(Q', J) = ρ_6(ρ_5(ρ_4(ρ_3(ρ_2(ρ_1(concat(Q', J))))))), in which R represents the second decoded output data, ψ_decoder2 represents the acquisition process of the second decoded output data, and Q' represents the third upsampled encoded output data;
upsampling the second decoded output data to obtain second upsampled coded output data;
repeating the image feature extraction 7 times to obtain first decoded output data, where S = ψ_decoder1(R', I) = ρ_7(ρ_6(ρ_5(ρ_4(ρ_3(ρ_2(ρ_1(concat(R', I)))))))), in which S represents the first decoded output data, ψ_decoder1 represents the acquisition process of the first decoded output data, and R' represents the second upsampled encoded output data;
upsampling the sixth encoded output data, fifth decoded output data, fourth decoded output data, third decoded output data, second decoded output data and first decoded output data to the same size as the clothing picture, and denoting the results respectively as sixth encoded output upsampled data, fifth decoded output upsampled data, fourth decoded output upsampled data, third decoded output upsampled data, second decoded output upsampled data and first decoded output upsampled data;
fusing the sixth encoded output upsampled data, fifth decoded output upsampled data, fourth decoded output upsampled data, third decoded output upsampled data, second decoded output upsampled data and first decoded output upsampled data by channel fusion to obtain fused data;
and performing convolution operation on the fusion data by adopting the set convolution to obtain an output characteristic diagram of the clothing picture.
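To show how these repeated extractions, the downsampling and upsampling between blocks, the channel fusion with the encoder outputs, and the final fusion convolution fit together, a condensed PyTorch-style sketch follows. The channel widths, kernel sizes, pooling and interpolation choices, and the number of output classes are assumptions made only for illustration; ChannelAttention refers to the sketch given earlier.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    def feature_block(in_ch, out_ch, repeats):
        # one encoding/decoding block: the two-convolution + channel-attention unit
        # (the "image feature extraction" above) repeated `repeats` times
        layers, ch = [], in_ch
        for _ in range(repeats):
            layers += [nn.Conv2d(ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
                       nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
                       ChannelAttention(out_ch)]           # from the earlier sketch
            ch = out_ch
        return nn.Sequential(*layers)

    class EncoderDecoder(nn.Module):
        def __init__(self, ch=64, num_classes=20):
            super().__init__()
            reps = [7, 6, 5, 4, 4, 4]                       # repetitions of encoding blocks 1..6
            self.encoders = nn.ModuleList(
                [feature_block(3 if i == 0 else ch, ch, r) for i, r in enumerate(reps)])
            self.decoders = nn.ModuleList(                  # decoding blocks 5..1
                [feature_block(2 * ch, ch, r) for r in [4, 4, 5, 6, 7]])
            self.heads = nn.ModuleList([nn.Conv2d(ch, num_classes, 1) for _ in range(6)])
            self.fuse = nn.Conv2d(6 * num_classes, num_classes, 1)  # convolution on the fused data

        def forward(self, a):
            feats, x = [], a
            for i, enc in enumerate(self.encoders):
                if i > 0:
                    x = F.max_pool2d(x, 2)                  # downsampling between encoding blocks
                x = enc(x)
                feats.append(x)                             # I, J, K, L, M, N
            side, x = [feats[-1]], feats[-1]                # N plus the five decoder outputs
            for dec, skip in zip(self.decoders, reversed(feats[:-1])):
                x = F.interpolate(x, size=skip.shape[2:], mode='bilinear', align_corners=False)
                x = dec(torch.cat([x, skip], dim=1))        # channel fusion with the encoder output
                side.append(x)
            preds = [F.interpolate(h(s), size=a.shape[2:], mode='bilinear', align_corners=False)
                     for h, s in zip(self.heads, side)]     # the six upsampled side outputs
            return self.fuse(torch.cat(preds, dim=1)), preds  # output feature map and side outputs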
Optionally, the iteratively training the neural network based on the strong supervision constraint and the self-supervision constraint to obtain a trained neural network model specifically includes:
respectively calculating the cross entropy losses of the clothing analysis graph corresponding to the clothing picture and the output characteristic graph of the clothing picture, the first decoding output up-sampling data, the second decoding output up-sampling data, the third decoding output up-sampling data, the fourth decoding output up-sampling data, the fifth decoding output up-sampling data and the sixth encoding output up-sampling data, to obtain a first group of cross entropy losses;
respectively calculating cross entropy losses of the output feature map of the clothing picture and the first decoding output up-sampled data, the second decoding output up-sampled data, the third decoding output up-sampled data, the fourth decoding output up-sampled data, the fifth decoding output up-sampled data and the sixth encoding output up-sampled data to obtain a second group of cross entropy losses;
calculating cross entropy losses of the first decoded output upsampled data and the second decoded output upsampled data, the third decoded output upsampled data, the fourth decoded output upsampled data, the fifth decoded output upsampled data and the sixth encoded output upsampled data, respectively, to obtain a third set of cross entropy losses;
calculating cross entropy losses of the second decoded output upsampled data and the third decoded output upsampled data, the fourth decoded output upsampled data, the fifth decoded output upsampled data and the sixth encoded output upsampled data, respectively, to obtain a fourth set of cross entropy losses;
calculating cross entropy losses of the third decoded output upsampled data and the fourth decoded output upsampled data, the fifth decoded output upsampled data and the sixth encoded output upsampled data, respectively, to obtain a fifth set of cross entropy losses;
calculating cross entropy losses of the fourth decoded output upsampled data, the fifth decoded output upsampled data and the sixth encoded output upsampled data, respectively, to obtain a sixth set of cross entropy losses;
respectively calculating cross entropy losses of the fifth decoded output upsampled data and the sixth encoded output upsampled data to obtain a seventh group of cross entropy losses;
weighting and adding the first group of cross entropy losses, the second group of cross entropy losses, the third group of cross entropy losses, the fourth group of cross entropy losses, the fifth group of cross entropy losses, the sixth group of cross entropy losses and the seventh group of cross entropy losses to obtain a loss function;
and performing iterative training on the neural network by adopting a random gradient descent algorithm based on the loss function.
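One hedged way to realize the combination of strong supervision and cross-layer self-supervision described above is sketched below; the function and weight names are hypothetical, and using the argmax of a higher-level output as a pseudo label is only one possible form of the self-supervision constraint.

    import torch.nn.functional as F

    def supervised_loss(label, outputs, w_strong, w_self):
        # outputs: [Y, S'', R'', Q'', P'', O'', N''] ordered from highest to lowest level,
        # all upsampled to the input resolution as (N, C, H, W) logits; label is (N, H, W).
        # Strong supervision: the clothing analysis graph constrains every output.
        strong = sum(w * F.cross_entropy(o, label) for w, o in zip(w_strong, outputs))
        # Self-supervision: each higher-level output constrains every lower-level output.
        self_sup = 0.0
        for i, high in enumerate(outputs[:-1]):
            pseudo = high.argmax(dim=1).detach()            # higher-level prediction as pseudo label
            for j, low in enumerate(outputs[i + 1:]):
                self_sup = self_sup + w_self[i][j] * F.cross_entropy(low, pseudo)
        return strong + self_sup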
The invention also discloses a clothing analysis system based on channel attention and self-supervision constraint, which comprises:
the system comprises a data set acquisition module, a data processing module and a data processing module, wherein the data set acquisition module is used for acquiring a clothing picture data set, and the clothing picture data set comprises clothing analytic graphs corresponding to all clothing pictures;
the data input module is used for inputting each clothing picture in the clothing picture data set into a neural network, and the neural network comprises a channel attention module;
the feature extraction module is used for extracting image features of the input clothing pictures for multiple times based on the channel attention module to obtain output feature maps of the clothing pictures;
the neural network training module is used for carrying out iterative training on the neural network based on strong supervision constraint and self-supervision constraint to obtain a trained neural network model; the strong supervision constraint is the strong supervision constraint of the clothing analysis graph corresponding to each clothing picture on the output characteristic graph, and the self-supervision constraint is the self-supervision constraint of high-level output on low-level output in each decoding output data in the neural network;
and the neural network model application module is used for inputting the clothing picture to be analyzed into the trained neural network model and outputting a clothing analysis picture.
Optionally, the system further comprises:
and the preprocessing module is used for carrying out normalization processing on each clothing picture in the clothing picture data set.
Optionally, the image feature extraction specifically includes:
different convolution operations are carried out on the clothes pictures input into the neural network to obtain first data;
carrying out average pooling on the first data to obtain second data;
performing feature extraction on the second data by using set convolution to obtain first feature data;
performing maximum pooling on the first data to obtain third data;
performing feature extraction on the third data by using the set convolution to obtain second feature data;
adding the first characteristic data and the second characteristic data and inputting the added first characteristic data and the added second characteristic data into a Sigmoid function to obtain fourth data;
and multiplying the fourth data by the first data to obtain output data of the channel attention module.
Optionally, the feature extraction module specifically includes:
a first encoding unit, configured to repeat the image feature extraction 7 times to obtain first encoded output data, where I = ψ_encoder1(A) = ρ_7(ρ_6(ρ_5(ρ_4(ρ_3(ρ_2(ρ_1(A))))))), in which I represents the first encoded output data, ψ_encoder1 represents the acquisition process of the first encoded output data, ρ_i' represents the i'-th repetition, i' ∈ {1, 2, 3, 4, 5, 6, 7}, and A represents a clothing picture in the clothing picture data set;
the first down-sampling unit is used for down-sampling the first coding output data to obtain first down-sampling coding output data;
a second encoding unit, configured to repeat the image feature extraction 6 times to obtain second encoded output data, where J = ψ_encoder2(I') = ρ_6(ρ_5(ρ_4(ρ_3(ρ_2(ρ_1(I')))))), in which J represents the second encoded output data, ψ_encoder2 represents the acquisition process of the second encoded output data, and I' represents the first downsampled encoded output data;
the second downsampling unit is used for downsampling the second coding output data to obtain second downsampling coding output data;
a third encoding unit, configured to repeat the image feature extraction 5 times to obtain third encoded output data, where K = ψ_encoder3(J') = ρ_5(ρ_4(ρ_3(ρ_2(ρ_1(J'))))), in which K represents the third encoded output data, ψ_encoder3 represents the acquisition process of the third encoded output data, and J' represents the second downsampled encoded output data;
a third downsampling unit, configured to downsample the third encoded output data to obtain third downsampled encoded output data;
a fourth encoding unit, configured to repeat the image feature extraction 4 times to obtain fourth encoded output data, where L = ψ_encoder4(K') = ρ_4(ρ_3(ρ_2(ρ_1(K')))), in which L represents the fourth encoded output data, ψ_encoder4 represents the acquisition process of the fourth encoded output data, and K' represents the third downsampled encoded output data;
a fourth downsampling unit, configured to downsample the fourth encoded output data to obtain fourth downsampled encoded output data;
a fifth encoding unit, configured to repeat the image feature extraction 4 times to obtain fifth encoded output data, where M = ψ_encoder5(L') = ρ_4(ρ_3(ρ_2(ρ_1(L')))), in which M represents the fifth encoded output data, ψ_encoder5 represents the acquisition process of the fifth encoded output data, and L' represents the fourth downsampled encoded output data;
a fifth downsampling unit, configured to downsample the fifth encoded output data to obtain fifth downsampled encoded output data;
a sixth encoding unit, configured to repeat the image feature extraction 4 times to obtain sixth encoded output data, where N = ψ_encoder6(M') = ρ_4(ρ_3(ρ_2(ρ_1(M')))), in which N represents the sixth encoded output data, ψ_encoder6 represents the acquisition process of the sixth encoded output data, and M' represents the fifth downsampled encoded output data;
the first up-sampling unit is used for up-sampling the sixth coded output data to obtain sixth up-sampled coded output data;
a fifth decoding unit, configured to repeat the image feature extraction 4 times to obtain fifth decoded output data, where O = ψ_decoder5(N', M) = ρ_4(ρ_3(ρ_2(ρ_1(concat(N', M))))), in which O represents the fifth decoded output data, ψ_decoder5 represents the acquisition process of the fifth decoded output data, concat() represents channel fusion, and N' represents the sixth upsampled encoded output data;
a second upsampling unit, configured to upsample the fifth decoded output data to obtain fifth upsampled encoded output data;
a fourth decoding unit, configured to repeat the image feature extraction 4 times to obtain fourth decoded output data, where P = ψ_decoder4(O', L) = ρ_4(ρ_3(ρ_2(ρ_1(concat(O', L))))), in which P represents the fourth decoded output data, ψ_decoder4 represents the acquisition process of the fourth decoded output data, and O' represents the fifth upsampled encoded output data;
a third upsampling unit, configured to upsample the fourth decoded output data to obtain fourth upsampled encoded output data;
a third decoding unit, configured to repeat the image feature extraction 5 times to obtain third decoded output data, where Q = ψ_decoder3(P', K) = ρ_5(ρ_4(ρ_3(ρ_2(ρ_1(concat(P', K)))))), in which Q represents the third decoded output data, ψ_decoder3 represents the acquisition process of the third decoded output data, and P' represents the fourth upsampled encoded output data;
a fourth upsampling unit, configured to upsample the third decoded output data to obtain third upsampled encoded output data;
a second decoding unit, configured to repeat the image feature extraction 6 times to obtain second decoded output data, where R = ψ_decoder2(Q', J) = ρ_6(ρ_5(ρ_4(ρ_3(ρ_2(ρ_1(concat(Q', J))))))), in which R represents the second decoded output data, ψ_decoder2 represents the acquisition process of the second decoded output data, and Q' represents the third upsampled encoded output data;
a fifth upsampling unit, configured to upsample the second decoded output data to obtain second upsampled encoded output data;
a first decoding unit, configured to repeat the image feature extraction 7 times to obtain first decoded output data, where S = ψ_decoder1(R', I) = ρ_7(ρ_6(ρ_5(ρ_4(ρ_3(ρ_2(ρ_1(concat(R', I)))))))), in which S represents the first decoded output data, ψ_decoder1 represents the acquisition process of the first decoded output data, and R' represents the second upsampled encoded output data;
a sixth upsampling unit, configured to upsample the sixth encoded output data, fifth decoded output data, fourth decoded output data, third decoded output data, second decoded output data and first decoded output data to the same size as the clothing picture, and to denote the results respectively as sixth encoded output upsampled data, fifth decoded output upsampled data, fourth decoded output upsampled data, third decoded output upsampled data, second decoded output upsampled data and first decoded output upsampled data;
the data fusion unit, configured to fuse the sixth encoded output upsampled data, fifth decoded output upsampled data, fourth decoded output upsampled data, third decoded output upsampled data, second decoded output upsampled data and first decoded output upsampled data by channel fusion to obtain fused data;
and the characteristic diagram output unit is used for carrying out convolution operation on the fusion data by adopting the set convolution to obtain an output characteristic diagram of the clothing image.
Optionally, the neural network training module specifically includes:
a first group of cross entropy loss obtaining units, configured to respectively calculate cross entropy losses of a clothing analysis graph corresponding to the clothing picture and an output feature graph of the clothing picture, the first decoded output upsampled data, the second decoded output upsampled data, the third decoded output upsampled data, the fourth decoded output upsampled data, the fifth decoded output upsampled data, and the sixth encoded output upsampled data, so as to obtain a first group of cross entropy losses;
a second group of cross entropy loss obtaining units, configured to calculate cross entropy losses between the output feature map of the clothing picture and the first decoded output upsampled data, the second decoded output upsampled data, the third decoded output upsampled data, the fourth decoded output upsampled data, the fifth decoded output upsampled data, and the sixth encoded output upsampled data, respectively, to obtain a second group of cross entropy losses;
a third group of cross entropy loss obtaining units, configured to calculate cross entropy losses of the first decoded output upsampled data and the second decoded output upsampled data, the third decoded output upsampled data, the fourth decoded output upsampled data, the fifth decoded output upsampled data, and the sixth encoded output upsampled data, respectively, to obtain a third group of cross entropy losses;
a fourth group of cross entropy loss obtaining units, configured to calculate cross entropy losses of the second decoded output upsampled data and the third decoded output upsampled data, the fourth decoded output upsampled data, the fifth decoded output upsampled data, and the sixth encoded output upsampled data, respectively, to obtain a fourth group of cross entropy losses;
a fifth group of cross entropy loss obtaining units, configured to calculate cross entropy losses of the third decoded output upsampled data and the fourth decoded output upsampled data, the fifth decoded output upsampled data, and the sixth encoded output upsampled data, respectively, to obtain a fifth group of cross entropy losses;
a sixth group of cross entropy loss obtaining units, configured to calculate cross entropy losses of the fourth decoded output upsampled data, the fifth decoded output upsampled data, and the sixth encoded output upsampled data, respectively, to obtain a sixth group of cross entropy losses;
a seventh group of cross entropy loss obtaining units, configured to calculate cross entropy losses of the fifth decoded output upsampled data and the sixth encoded output upsampled data, respectively, to obtain a seventh group of cross entropy losses;
a loss function obtaining unit, configured to weight and add the first group of cross entropy losses, the second group of cross entropy losses, the third group of cross entropy losses, the fourth group of cross entropy losses, the fifth group of cross entropy losses, the sixth group of cross entropy losses, and the seventh group of cross entropy losses to obtain a loss function;
and the training unit is used for carrying out iterative training on the neural network by adopting a random gradient descent algorithm based on the loss function.
According to the specific embodiment provided by the invention, the invention discloses the following technical effects:
the invention discloses a garment analysis method and system based on channel attention and self-supervision constraint.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings needed to be used in the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings without inventive exercise.
FIG. 1 is a schematic flow chart of a garment analysis method based on channel attention and self-supervision constraint according to the present invention;
FIG. 2 is a detailed flowchart of a garment parsing method based on channel attention and self-supervision constraint according to the present invention;
FIG. 3 is a schematic diagram of a neural network according to the present invention;
fig. 4 is a schematic structural diagram of a garment analysis system based on channel attention and self-supervision constraint according to the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The invention aims to provide a garment analysis method and system based on channel attention and self-supervision constraint, so as to improve garment analysis accuracy.
In order to make the aforementioned objects, features and advantages of the present invention comprehensible, embodiments accompanied with figures are described in further detail below.
Fig. 1 is a schematic flow chart of a garment analysis method based on channel attention and self-supervision constraint according to the present invention, and as shown in fig. 1, the garment analysis method based on channel attention and self-supervision constraint includes:
step 101: and acquiring a clothing picture data set, wherein the clothing picture data set comprises clothing analytic graphs corresponding to the clothing pictures.
Step 102: and inputting each clothing picture in the clothing picture data set into a neural network, wherein the neural network comprises a channel attention module.
Before each clothing picture in the clothing picture data set is input into the neural network, the method further comprises the following steps:
and carrying out normalization processing on each clothing picture in the clothing picture data set.
Step 103: and performing multiple times of image feature extraction on the input clothing pictures based on the channel attention module to obtain output feature maps of the clothing pictures.
The image feature extraction in step 103 specifically includes:
and carrying out different convolution operations on the clothes pictures input into the neural network to obtain first data B.
And carrying out average pooling on the first data B to obtain second data C.
And performing feature extraction on the second data C by using set convolution to obtain first feature data D.
And performing maximum pooling on the first data B to obtain third data E.
And performing feature extraction on the third data E by using the set convolution to obtain second feature data F.
And adding the first characteristic data D and the second characteristic data F and inputting the added first characteristic data D and the added second characteristic data F into a Sigmoid function to obtain fourth data H.
Multiplying the fourth data H by the first data B to obtain the output data H_out of the channel attention module.
Step 103, specifically, the method further comprises:
Repeating the image feature extraction 7 times to obtain first encoded output data, where I = ψ_encoder1(A) = ρ_7(ρ_6(ρ_5(ρ_4(ρ_3(ρ_2(ρ_1(A))))))), in which I represents the first encoded output data, ψ_encoder1 represents the acquisition process of the first encoded output data, ρ_i' represents the i'-th repetition, i' ∈ {1, 2, 3, 4, 5, 6, 7}, and A represents a clothing picture in the clothing picture data set.
And performing downsampling on the first coded output data to obtain first downsampled coded output data.
Repeating the image feature extraction 6 times to obtain second encoded output data, where J = ψ_encoder2(I') = ρ_6(ρ_5(ρ_4(ρ_3(ρ_2(ρ_1(I')))))), in which J represents the second encoded output data, ψ_encoder2 represents the acquisition process of the second encoded output data, and I' represents the first downsampled encoded output data.
And performing downsampling on the second coded output data to obtain second downsampled coded output data.
Repeating the image feature extraction 5 times to obtain third encoded output data, where K = ψ_encoder3(J') = ρ_5(ρ_4(ρ_3(ρ_2(ρ_1(J'))))), in which K represents the third encoded output data, ψ_encoder3 represents the acquisition process of the third encoded output data, and J' represents the second downsampled encoded output data.
And performing downsampling on the third coded output data to obtain third downsampled coded output data.
Repeating the image feature extraction 4 times to obtain fourth encoded output data, where L = ψ_encoder4(K') = ρ_4(ρ_3(ρ_2(ρ_1(K')))), in which L represents the fourth encoded output data, ψ_encoder4 represents the acquisition process of the fourth encoded output data, and K' represents the third downsampled encoded output data.
And performing down-sampling on the fourth coded output data to obtain fourth down-sampled coded output data.
Repeating the image feature extraction 4 times to obtain fifth encoded output data, where M = ψ_encoder5(L') = ρ_4(ρ_3(ρ_2(ρ_1(L')))), in which M represents the fifth encoded output data, ψ_encoder5 represents the acquisition process of the fifth encoded output data, and L' represents the fourth downsampled encoded output data.
And performing downsampling on the fifth coded output data to obtain fifth downsampled coded output data.
Repeating the image feature extraction 4 times to obtain sixth encoded output data, where N = ψ_encoder6(M') = ρ_4(ρ_3(ρ_2(ρ_1(M')))), in which N represents the sixth encoded output data, ψ_encoder6 represents the acquisition process of the sixth encoded output data, and M' represents the fifth downsampled encoded output data.
And performing upsampling on the sixth coded output data to obtain sixth upsampled coded output data.
Repeating the image feature extraction 4 times to obtain fifth decoded output data, where O = ψ_decoder5(N', M) = ρ_4(ρ_3(ρ_2(ρ_1(concat(N', M))))), in which O represents the fifth decoded output data, ψ_decoder5 represents the acquisition process of the fifth decoded output data, concat() represents channel fusion, and N' represents the sixth upsampled encoded output data.
And performing upsampling on the fifth decoding output data to obtain fifth upsampling coding output data.
Repeating the image feature extraction 4 times to obtain fourth decoded output data, where P = ψ_decoder4(O', L) = ρ_4(ρ_3(ρ_2(ρ_1(concat(O', L))))), in which P represents the fourth decoded output data, ψ_decoder4 represents the acquisition process of the fourth decoded output data, and O' represents the fifth upsampled encoded output data.
And performing upsampling on the fourth decoding output data to obtain fourth upsampling coding output data.
Repeating the image feature extraction 5 times to obtain third decoded output data, where Q = ψ_decoder3(P', K) = ρ_5(ρ_4(ρ_3(ρ_2(ρ_1(concat(P', K)))))), in which Q represents the third decoded output data, ψ_decoder3 represents the acquisition process of the third decoded output data, and P' represents the fourth upsampled encoded output data.
And performing upsampling on the third decoded output data to obtain third upsampled coded output data.
Repeating the image feature extraction 6 times to obtain second decoded output data, where R = ψ_decoder2(Q', J) = ρ_6(ρ_5(ρ_4(ρ_3(ρ_2(ρ_1(concat(Q', J))))))), in which R represents the second decoded output data, ψ_decoder2 represents the acquisition process of the second decoded output data, and Q' represents the third upsampled encoded output data.
And performing upsampling on the second decoding output data to obtain second upsampling coding output data.
Repeating the image feature extraction 7 times to obtain first decoded output data, where S = ψ_decoder1(R', I) = ρ_7(ρ_6(ρ_5(ρ_4(ρ_3(ρ_2(ρ_1(concat(R', I)))))))), in which S represents the first decoded output data, ψ_decoder1 represents the acquisition process of the first decoded output data, and R' represents the second upsampled encoded output data.
The sixth encoded output data N, the fifth decoded output data O, the fourth decoded output data P, the third decoded output data Q, the second decoded output data R and the first decoded output data S are all upsampled to the same size as the clothing picture A, giving the sixth encoded output upsampled data N″ (O_6 in fig. 3), the fifth decoded output upsampled data O″ (O_5 in fig. 3), the fourth decoded output upsampled data P″ (O_4 in fig. 3), the third decoded output upsampled data Q″ (O_3 in fig. 3), the second decoded output upsampled data R″ (O_2 in fig. 3) and the first decoded output upsampled data S″ (O_1 in fig. 3).
The sixth encoded output upsampled data N″, the fifth decoded output upsampled data O″, the fourth decoded output upsampled data P″, the third decoded output upsampled data Q″, the second decoded output upsampled data R″ and the first decoded output upsampled data S″ are fused by channel fusion to obtain the fused data T (O_0 in fig. 3).
A convolution operation is performed on the fused data T using the set convolution to obtain the output feature map Y of the clothing picture.
Step 104: performing iterative training on the neural network based on strong supervision constraint and self-supervision constraint to obtain a trained neural network model; the strong supervision constraint is the strong supervision constraint of the clothing analysis graph corresponding to each clothing picture on the output characteristic graph, and the self-supervision constraint is the self-supervision constraint of the high-level output to the low-level output in each decoding output data in the neural network.
Step 104, specifically comprising:
Cross entropy losses between the clothing analysis graph A° corresponding to the clothing picture and the output feature map Y of the clothing picture, the first decoded output upsampled data S″, the second decoded output upsampled data R″, the third decoded output upsampled data Q″, the fourth decoded output upsampled data P″, the fifth decoded output upsampled data O″ and the sixth encoded output upsampled data N″ are calculated respectively to obtain a first group of cross entropy losses, which includes loss_0, loss_1, loss_2, loss_3, loss_4, loss_5, loss_6.
Cross entropy losses between the output feature map Y of the clothing picture and the first decoded output upsampled data S″, the second decoded output upsampled data R″, the third decoded output upsampled data Q″, the fourth decoded output upsampled data P″, the fifth decoded output upsampled data O″ and the sixth encoded output upsampled data N″ are calculated respectively to obtain a second group of cross entropy losses, which includes loss_01, loss_02, loss_03, loss_04, loss_05, loss_06.
Cross entropy losses between the first decoded output upsampled data S″ and the second decoded output upsampled data R″, the third decoded output upsampled data Q″, the fourth decoded output upsampled data P″, the fifth decoded output upsampled data O″ and the sixth encoded output upsampled data N″ are calculated respectively to obtain a third group of cross entropy losses, which includes loss_12, loss_13, loss_14, loss_15, loss_16.
Cross entropy losses between the second decoded output upsampled data R″ and the third decoded output upsampled data Q″, the fourth decoded output upsampled data P″, the fifth decoded output upsampled data O″ and the sixth encoded output upsampled data N″ are calculated respectively to obtain a fourth group of cross entropy losses, which includes loss_23, loss_24, loss_25, loss_26.
Cross entropy losses between the third decoded output upsampled data Q″ and the fourth decoded output upsampled data P″, the fifth decoded output upsampled data O″ and the sixth encoded output upsampled data N″ are calculated respectively to obtain a fifth group of cross entropy losses, which includes loss_34, loss_35, loss_36.
Cross entropy losses between the fourth decoded output upsampled data P″ and the fifth decoded output upsampled data O″ and the sixth encoded output upsampled data N″ are calculated respectively to obtain a sixth group of cross entropy losses, which includes loss_45, loss_46.
The cross entropy loss between the fifth decoded output upsampled data O″ and the sixth encoded output upsampled data N″ is calculated to obtain a seventh group of cross entropy losses, which includes loss_56.
The first to seventh groups of cross entropy losses are multiplied by different weights and added to obtain the loss function loss_final:
loss_final = ω_g·[loss_0, loss_1, loss_2, loss_3, loss_4, loss_5, loss_6]^T + ω_0·[loss_01, loss_02, loss_03, loss_04, loss_05, loss_06]^T + ω_1·[loss_12, loss_13, loss_14, loss_15, loss_16]^T + ω_2·[loss_23, loss_24, loss_25, loss_26]^T + ω_3·[loss_34, loss_35, loss_36]^T + ω_4·[loss_45, loss_46]^T + ω_5·loss_56,
where [·]^T denotes the transpose of a matrix; ω_g, ω_0, ω_1, ω_2, ω_3, ω_4 denote the weight coefficient vectors, with ω_g = [ω_g0, ω_g1, ω_g2, ω_g3, ω_g4, ω_g5, ω_g6, ω_g7], ω_0 = [ω_01, ω_02, ω_03, ω_04, ω_05, ω_06], ω_1 = [ω_12, ω_13, ω_14, ω_15, ω_16], ω_2 = [ω_23, ω_24, ω_25, ω_26], ω_3 = [ω_34, ω_35, ω_36], ω_4 = [ω_45, ω_46]; ω_ij denotes an element of the corresponding weight vector, and ω_5 denotes a weight coefficient.
And performing iterative training on the neural network by adopting a random gradient descent algorithm based on the loss function.
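The weighted combination of the seven loss groups above amounts to a set of dot products plus one scalar term; a small illustrative helper (not from the patent) could look like:

    import numpy as np

    def combine_losses(loss_groups, weight_vectors, w5, loss56):
        # loss_groups: the first six groups of losses as lists/arrays;
        # weight_vectors: the matching weight vectors w_g, w_0, ..., w_4;
        # w5 and loss56: the scalar weight and the single seventh-group loss.
        return sum(float(np.dot(w, g)) for w, g in zip(weight_vectors, loss_groups)) + w5 * loss56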
Step 105: and inputting the clothing picture to be analyzed into the trained neural network model, and outputting a clothing analysis picture.
The garment analysis method based on channel attention and self-supervision constraint is refined into the following 42 steps, as shown in fig. 2:
step 1: a garment data set (garment picture data set) with labels is collected, wherein the labels refer to garment analytic graphs corresponding to all garment pictures in the data set, 70% of the pictures are randomly selected to serve as a training set, and the rest pictures serve as a test set.
Step 2: normalization processes are to normalize an image pixel value to [0, 1 ] for an image that is to be input into the network]Get training data A (without label), test data Atest(without notation). The step is not carried out on the labeled data in the data set, and the training labeled data A and the testing labeled data A obtained by Stepl are directly used in the subsequent stepstest
Step 3: conv, two different convolution operations were performed on the training data A without labels preprocessed in Step21(Conv2(A) Get data B, Conv)1() And Conv2() Representing two different convolution operations, respectively.
Step 4: the data B obtained in Step3 is transmitted to the channel attention module, and the data C is obtained by performing average pooling.
Step 5: by convolution
Figure GDA0003224781920000161
And performing feature extraction on the data C obtained in Step4 to obtain data D.
Step 6: data E was obtained by maximizing pooling of data B obtained in Step 3.
Step 7: by convolution
Figure GDA0003224781920000162
And (4) performing feature extraction operation on the data E obtained in Step6 to obtain data F.
Step 8: data G is obtained by adding data D obtained at Step5 and data F obtained at Step 7.
Step 9: sending the data G obtained in Step8 into a Sigmoid function to obtain data H, and multiplying the data H by the output data B of Step3 to obtain the output data H of the channel attention moduleout
Step 10: steps 3 to 9 are repeated 7 times to obtain the output data of encoding block 1, I = ψ_encoder1(A) = ρ_7(ρ_6(ρ_5(ρ_4(ρ_3(ρ_2(ρ_1(A))))))), where ψ_encoder1 denotes the entire process of encoding block 1 and ρ_i' denotes the i'-th iteration, i' ∈ {1, 2, 3, 4, 5, 6, 7}.
Step 11: the data I obtained at Step 10 is downsampled to obtain data I'.
Step 12: steps 3 to 9 are repeated 6 times to obtain the output data of encoding block 2, J = ψ_encoder2(I') = ρ_6(ρ_5(ρ_4(ρ_3(ρ_2(ρ_1(I')))))), where ψ_encoder2 denotes the entire process of encoding block 2.
Step 13: the data J obtained at Step 12 is downsampled to obtain data J'.
Step 14: steps 3 to 9 are repeated 5 times to obtain the output data of encoding block 3, K = ψ_encoder3(J') = ρ_5(ρ_4(ρ_3(ρ_2(ρ_1(J'))))), where ψ_encoder3 denotes the entire process of encoding block 3.
Step 15: the data K obtained at Step 14 is downsampled to obtain data K'.
Step 16: steps 3 to 9 are repeated 4 times to obtain the output data of encoding block 4, L = ψ_encoder4(K') = ρ_4(ρ_3(ρ_2(ρ_1(K')))), where ψ_encoder4 denotes the entire process of encoding block 4.
Step 17: the data L obtained at Step 16 is downsampled to obtain data L'.
Step 18: steps 3 to 9 are repeated 4 times to obtain the output data of encoding block 5, M = ψ_encoder5(L') = ρ_4(ρ_3(ρ_2(ρ_1(L')))), where ψ_encoder5 denotes the entire process of encoding block 5.
Step 19: the data M obtained at Step 18 is downsampled to obtain data M'.
Step 20: steps 3 to 9 are repeated 4 times to obtain the output data of encoding block 6, N = ψ_encoder6(M') = ρ_4(ρ_3(ρ_2(ρ_1(M')))), where ψ_encoder6 denotes the entire process of encoding block 6.
Step 21: and upsampling the data N obtained at Step20 to obtain data N'.
Step 22: steps 3 to 9 are repeated 4 times to obtain the output data of decoding block 5, O = ψ_decoder5(N', M) = ρ_4(ρ_3(ρ_2(ρ_1(concat(N', M))))), where ψ_decoder5 denotes the entire process of decoding block 5, concat() denotes channel fusion, and M is the data obtained at Step 18.
Step 23: data O obtained at Step22 is up-sampled to obtain data O'.
Step 24: steps 3 to 9 are repeated 4 times to obtain the output data of decoding block 4, P = ψ_decoder4(O', L) = ρ_4(ρ_3(ρ_2(ρ_1(concat(O', L))))), where ψ_decoder4 denotes the entire process of decoding block 4, concat() denotes channel fusion, and L is the data obtained at Step 16.
Step 25: the data P obtained at Step24 is up-sampled to data P'.
Step 26: steps 3 to 9 are repeated 5 times to obtain the output data of decoding block 3, Q = ψ_decoder3(P', K) = ρ_5(ρ_4(ρ_3(ρ_2(ρ_1(concat(P', K)))))), where ψ_decoder3 denotes the entire process of decoding block 3, concat() denotes channel fusion, and K is the data obtained at Step 14.
Step 27: data Q from Step26 is up-sampled to data Q'.
Step 28: steps 3 to 9 are repeated 6 times to obtain the output data of decoding block 2, R = ψ_decoder2(Q', J) = ρ_6(ρ_5(ρ_4(ρ_3(ρ_2(ρ_1(concat(Q', J))))))), where ψ_decoder2 denotes the entire process of decoding block 2, concat() denotes channel fusion, and J is the data obtained at Step 12.
Step 29: and upsampling the data R obtained at Step27 to obtain data R'.
Step 30: steps 3 to 9 are repeated 7 times to obtain the output data of decoding block 1, S = ψ_decoder1(R', I) = ρ_7(ρ_6(ρ_5(ρ_4(ρ_3(ρ_2(ρ_1(concat(R', I)))))))), where ψ_decoder1 denotes the entire process of decoding block 1, concat() denotes channel fusion, and I is the data obtained at Step 10.
Step 31: the data N, O, P, Q, R, S obtained at Steps 20, 22, 24, 26, 28 and 30 are upsampled to the same size as the data A obtained at Step 2 to obtain data N″, O″, P″, Q″, R″, S″, and the data T = concat(N″, O″, P″, Q″, R″, S″) is obtained by channel fusion.
Step 32: and (4) convolving the data T obtained at Step31 to obtain output data Y of the whole network.
Step 33: cross entropy losses between the label data A° corresponding to the data A acquired at Step 2 and the data Y obtained at Step 32 as well as the data S″, R″, Q″, P″, O″, N″ obtained at Step 31 are calculated respectively with
f_loss(A°, X) = -(1/(A_h × A_w)) Σ_{i=1..A_h×A_w} Σ_{j=1..label} y_ij · log(x_ij),
giving loss_0, loss_1, loss_2, loss_3, loss_4, loss_5, loss_6. Here A_h × A_w denotes the number of pixels of the data A of height h and width w collected at Step 2; label denotes the total number of categories in the data set; y_ij ∈ A° indicates whether the i-th pixel of the data A belongs to the j-th class; and x_ij denotes the predicted probability that the i-th pixel of the data A belongs to the j-th class.
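For concreteness, f_loss can be rendered as follows, assuming a one-hot label map and a predicted probability map both flattened over the A_h x A_w pixels (the helper name and the epsilon are illustrative):

    import numpy as np

    def f_loss(y_onehot, x_prob, eps=1e-12):
        # y_onehot: (A_h*A_w, label) one-hot ground truth; x_prob: predicted class
        # probabilities of the same shape; returns the mean per-pixel cross entropy
        num_pixels = y_onehot.shape[0]
        return -np.sum(y_onehot * np.log(x_prob + eps)) / num_pixels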
Step 34: cross entropy losses f_loss between the data Y obtained at Step 32 and the data S″, R″, Q″, P″, O″, N″ obtained at Step 31 are calculated respectively, giving loss_01, loss_02, loss_03, loss_04, loss_05, loss_06.
Step 35: cross entropy losses f_loss between the data S″ obtained at Step 31 and the data R″, Q″, P″, O″, N″ obtained at Step 31 are calculated respectively, giving loss_12, loss_13, loss_14, loss_15, loss_16.
Step 36: cross entropy losses f_loss between the data R″ obtained at Step 31 and the data Q″, P″, O″, N″ obtained at Step 31 are calculated respectively, giving loss_23, loss_24, loss_25, loss_26.
Step 37: cross entropy losses f_loss between the data Q″ obtained at Step 31 and the data P″, O″, N″ obtained at Step 31 are calculated respectively, giving loss_34, loss_35, loss_36.
Step 38: cross entropy losses f_loss between the data P″ obtained at Step 31 and the data O″, N″ obtained at Step 31 are calculated respectively, giving loss_45, loss_46.
Step 39: the cross entropy loss f_loss between the data O″ obtained at Step 31 and the data N″ obtained at Step 31 is calculated, giving loss_56.
Step 40: all the loss functions obtained at Steps 33 to 39 are combined with different weights to obtain the final loss:
loss_final = ω_g·[loss_0, loss_1, loss_2, loss_3, loss_4, loss_5, loss_6]^T + ω_0·[loss_01, loss_02, loss_03, loss_04, loss_05, loss_06]^T + ω_1·[loss_12, loss_13, loss_14, loss_15, loss_16]^T + ω_2·[loss_23, loss_24, loss_25, loss_26]^T + ω_3·[loss_34, loss_35, loss_36]^T + ω_4·[loss_45, loss_46]^T + ω_5·loss_56,
where [·]^T denotes the transpose of a matrix; ω_g, ω_0, ω_1, ω_2, ω_3, ω_4 denote the weight coefficient vectors, with ω_g = [ω_g0, ω_g1, ω_g2, ω_g3, ω_g4, ω_g5, ω_g6, ω_g7], ω_0 = [ω_01, ω_02, ω_03, ω_04, ω_05, ω_06], ω_1 = [ω_12, ω_13, ω_14, ω_15, ω_16], ω_2 = [ω_23, ω_24, ω_25, ω_26], ω_3 = [ω_34, ω_35, ω_36], ω_4 = [ω_45, ω_46]; ω_ij denotes an element of the corresponding weight vector, and ω_5 denotes a weight coefficient.
Step 41: based on the loss function loss_final obtained at Step 40, the whole network is iteratively trained z times with a stochastic gradient descent algorithm. The model best.pth corresponding to the highest value of the evaluation index (mean intersection over union, mIoU) is selected and saved.
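A hedged sketch of Step 41, training with stochastic gradient descent and keeping the checkpoint with the best mIoU; the optimizer settings, loss_final_fn and evaluate_miou are placeholders, not details from the patent:

    import torch

    def train_and_select(model, train_loader, val_loader, loss_final_fn, z, lr=0.01):
        optimizer = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
        best_miou = 0.0
        for _ in range(z):
            model.train()
            for image, label in train_loader:
                outputs = model(image)                      # fused output and side outputs
                loss = loss_final_fn(label, outputs)
                optimizer.zero_grad()
                loss.backward()
                optimizer.step()
            miou = evaluate_miou(model, val_loader)         # hypothetical evaluation helper
            if miou > best_miou:
                best_miou = miou
                torch.save(model.state_dict(), "best.pth")  # keep the best model as best.pth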
Step 42: the test data A_test obtained at Step 2 is input into the trained neural network model best.pth obtained at Step 41 for parsing, yielding the final prediction map (clothing analysis graph).
The invention mainly comprises 3 processes. (1) Data acquisition and preprocessing. (2) The network structure of the invention is an encoding-decoding structure, as shown in fig. 3: the encoding part consists of 6 encoding blocks (Block-1 to Block-6 from left to right in fig. 3), the decoding part consists of 5 decoding blocks (Block-5 to Block-1 from left to right in fig. 3), and every encoding block and decoding block is formed by iterating the same structure. This structure is: two different convolutional layers (Step 3) followed by the channel attention module (average pooling layer, convolutional layer, max pooling layer, convolutional layer, Sigmoid layer) (Steps 4 to 9); for example, encoding block 1 is formed by iterating this structure 7 times. The encoding blocks are connected by downsampling operations (Steps 11, 13, 15, 17 and 19). The outputs of all decoding blocks and the output of encoding block 6 are upsampled to the size of the input picture (thereby obtaining their respective prediction results; for example, the prediction result of decoding block 1 is obtained by upsampling the output of decoding block 1), channel fusion is performed (Step 31), and the fused result is then processed by a convolution operation to obtain the final output of the whole neural network (Step 32). (3) Training the neural network. For the self-supervision part, cross entropy losses are first calculated between the final output of the whole network and the prediction results of the 5 decoding blocks and of encoding block 6 (Step 34); then decoding block 1 calculates cross entropy losses against the prediction results of decoding blocks 2 to 5 and encoding block 6 (Step 35); decoding block 2 against the prediction results of decoding blocks 3 to 5 and encoding block 6 (Step 36); decoding block 3 against the prediction results of decoding blocks 4 to 5 and encoding block 6 (Step 37); decoding block 4 against the prediction results of decoding block 5 and encoding block 6 (Step 38); and finally decoding block 5 against the prediction result of encoding block 6 (Step 39). These cross entropy losses are added with different weights to obtain the loss function of the self-supervision part (Step 40). Besides the self-supervision part, the whole network also contains strong supervision: the strong supervision part uses the label data to calculate cross entropy losses against the final output of the network, the prediction results of the 5 decoding blocks and the prediction result of encoding block 6 (Step 33), and the loss function of the strongly supervised part is likewise obtained by adding these losses with different weights (Step 40). The self-supervised loss function and the strongly supervised loss function are added to obtain the final loss function (Step 40). The whole network is trained for a fixed number of iterations with the designed loss function, the model with the best result on the evaluation index (mean intersection over union) is selected, and this model is used to parse the clothing images of the test set.
The invention discloses a garment analysis method based on channel attention and self-supervision constraint. For fashion clothing images, the method improves performance from the channel perspective in two ways: a channel attention module and a cross-layer channel self-supervision constraint network. Firstly, the fashion clothing image data containing persons is normalized and fed into the network; after a specific convolution layer, the resulting feature map is input into the channel attention module to extract channel attention weights; these weights are then combined with the original feature map to obtain a new feature map; the new feature map is then sent to the subsequent processing module for further feature extraction. Finally, in the cross-layer channel self-supervision constraint network, based on the six side-branch prediction graphs of the backbone network and the fused prediction graph, each higher layer applies a self-supervision constraint on the lower layers, while the labeled image applies a strong supervision constraint on the fusion result and the six side-branch prediction graphs. In particular, the self-supervision constraint can take many forms, one of which may be selected and combined with the strong supervision constraint to supervise a feature. In addition, on the current fashion garment data set, the method can effectively improve clothing analysis precision while requiring only a small number of model parameters.
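As noted above, the self-supervision constraint can take several forms. One possible form, sketched below only as an assumed example (the description does not fix this particular choice), lets a higher-level prediction supervise a lower-level prediction through a temperature-softened KL divergence instead of a hard cross entropy:

```python
import torch
import torch.nn.functional as F

def soft_self_supervision(teacher_logits: torch.Tensor,
                          student_logits: torch.Tensor,
                          temperature: float = 2.0) -> torch.Tensor:
    """One possible self-supervision form: the higher-level (teacher) prediction
    constrains the lower-level (student) prediction via KL divergence.
    The temperature value is an assumption."""
    t = F.softmax(teacher_logits.detach() / temperature, dim=1)
    log_s = F.log_softmax(student_logits / temperature, dim=1)
    return F.kl_div(log_s, t, reduction="batchmean") * temperature ** 2
```

Such a soft constraint could replace or complement the cross entropy form used in the loss groups described later in this section.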
Fig. 4 is a schematic structural diagram of a garment analysis system based on channel attention and self-supervision constraint according to the present invention, the system includes:
a data set obtaining module 201, configured to obtain a clothing picture data set, where the clothing picture data set includes clothing analysis diagrams corresponding to clothing pictures;
a data input module 202, configured to input each clothing picture in the clothing picture data set into a neural network, where the neural network includes a channel attention module;
the feature extraction module 203 is configured to perform multiple image feature extractions on the input clothing pictures based on the channel attention module to obtain output feature maps of the clothing pictures;
the neural network training module 204 is used for performing iterative training on the neural network based on strong supervision constraint and self-supervision constraint to obtain a trained neural network model; the strong supervision constraint is the strong supervision constraint of the clothing analysis graph corresponding to each clothing picture on the output characteristic graph, and the self-supervision constraint is the self-supervision constraint of high-level output on low-level output in each decoding output data in the neural network;
and the neural network model application module 205 is configured to input the clothing image to be analyzed into the trained neural network model, and output a clothing analysis graph.
The system further comprises:
and the preprocessing module is used for carrying out normalization processing on each clothing picture in the clothing picture data set.
The image feature extraction in the feature extraction module 203 specifically includes the following steps (an illustrative sketch is given after them):
different convolution operations are carried out on the clothes pictures input into the neural network to obtain first data;
carrying out average pooling on the first data to obtain second data;
performing feature extraction on the second data by using set convolution to obtain first feature data;
performing maximum pooling on the first data to obtain third data;
performing feature extraction on the third data by using the set convolution to obtain second feature data;
adding the first characteristic data and the second characteristic data and inputting the added first characteristic data and the added second characteristic data into a Sigmoid function to obtain fourth data;
and multiplying the fourth data by the first data to obtain output data of the channel attention module.
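The data flow just described can be summarized in a small PyTorch sketch; the reduction ratio, the ReLU between the two shared 1×1 convolutions and the layer names are assumptions added for illustration and are not specified by the invention:

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Channel attention: global average / max pooling, shared 1x1 convolutions,
    Sigmoid gate, then channel-wise re-weighting of the input feature map."""
    def __init__(self, channels: int, reduction: int = 16):  # reduction ratio is assumed
        super().__init__()
        self.avg_pool = nn.AdaptiveAvgPool2d(1)   # produces the "second data"
        self.max_pool = nn.AdaptiveMaxPool2d(1)   # produces the "third data"
        # the "set convolution" applied to both pooled branches (shared here by assumption)
        self.conv = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, kernel_size=1, bias=False),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, kernel_size=1, bias=False),
        )
        self.sigmoid = nn.Sigmoid()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x is the "first data" produced by the two preceding convolution layers
        avg_att = self.conv(self.avg_pool(x))      # first feature data
        max_att = self.conv(self.max_pool(x))      # second feature data
        weights = self.sigmoid(avg_att + max_att)  # fourth data
        return x * weights                         # output data of the channel attention module
```

An input feature map of shape (B, C, H, W) keeps its shape; each channel is simply re-weighted by its learned attention value.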
The feature extraction module 203 specifically further includes the following units (a condensed code sketch is given after the list):
a first encoding unit for repeating the image feature extraction 7 times to obtain first encoded output data, I = ψ_encoder1(A) = ρ7(ρ6(ρ5(ρ4(ρ3(ρ2(ρ1(A))))))), where I represents the first encoded output data, ψ_encoder1 represents the acquisition process of the first encoded output data, ρi represents the i-th repetition, i ∈ {1, 2, 3, 4, 5, 6, 7}, and A represents the clothing picture in the clothing picture data set;
the first down-sampling unit is used for down-sampling the first coding output data to obtain first down-sampling coding output data;
a second encoding unit for repeating the image feature extraction 6 times to obtain second encoded output data, J = ψ_encoder2(I′) = ρ6(ρ5(ρ4(ρ3(ρ2(ρ1(I′)))))), where J represents the second encoded output data, ψ_encoder2 represents the acquisition process of the second encoded output data, and I′ represents the first downsampled encoded output data;
the second downsampling unit is used for downsampling the second coding output data to obtain second downsampling coding output data;
a third encoding unit for repeating the image feature extraction 5 times to obtain third encoded output data, K = ψ_encoder3(J′) = ρ5(ρ4(ρ3(ρ2(ρ1(J′))))), where K represents the third encoded output data, ψ_encoder3 represents the acquisition process of the third encoded output data, and J′ represents the second downsampled encoded output data;
a third downsampling unit, configured to downsample the third encoded output data to obtain third downsampled encoded output data;
a fourth encoding unit for repeating the image feature extraction 4 times to obtain fourth encoded output data, L = ψ_encoder4(K′) = ρ4(ρ3(ρ2(ρ1(K′)))), where L represents the fourth encoded output data, ψ_encoder4 represents the acquisition process of the fourth encoded output data, and K′ represents the third downsampled encoded output data;
a fourth downsampling unit, configured to downsample the fourth encoded output data to obtain fourth downsampled encoded output data;
a fifth encoding unit for repeating the image feature extraction 4 times to obtain fifth encoded output data, M = ψ_encoder5(L′) = ρ4(ρ3(ρ2(ρ1(L′)))), where M represents the fifth encoded output data, ψ_encoder5 represents the acquisition process of the fifth encoded output data, and L′ represents the fourth downsampled encoded output data;
a fifth downsampling unit, configured to downsample the fifth encoded output data to obtain fifth downsampled encoded output data;
a sixth encoding unit for repeating the image feature extraction 4 times to obtain sixth encoded output data, N = ψ_encoder6(M′) = ρ4(ρ3(ρ2(ρ1(M′)))), where N represents the sixth encoded output data, ψ_encoder6 represents the acquisition process of the sixth encoded output data, and M′ represents the fifth downsampled encoded output data;
the first up-sampling unit is used for up-sampling the sixth coded output data to obtain sixth up-sampled coded output data;
a fifth decoding unit for repeating the image feature extraction 4 times to obtain fifth decoded output data, O = ψ_decoder5(N′, M) = ρ4(ρ3(ρ2(ρ1(concat(N′, M))))), where O represents the fifth decoded output data, ψ_decoder5 represents the acquisition process of the fifth decoded output data, concat() represents channel fusion, and N′ represents the sixth upsampled encoded output data;
a second upsampling unit, configured to upsample the fifth decoded output data to obtain fifth upsampled encoded output data;
a fourth decoding unit for repeating the image feature extraction 4 times to obtain fourth decoded output data, P = ψ_decoder4(O′, L) = ρ4(ρ3(ρ2(ρ1(concat(O′, L))))), where P represents the fourth decoded output data, ψ_decoder4 represents the acquisition process of the fourth decoded output data, and O′ represents the fifth up-sampled encoded output data;
a third upsampling unit, configured to upsample the fourth decoded output data to obtain fourth upsampled encoded output data;
a third decoding unit for repeating the image feature extraction 5 times to obtain third decoded output data, Q = ψ_decoder3(P′, K) = ρ5(ρ4(ρ3(ρ2(ρ1(concat(P′, K)))))), where Q represents the third decoded output data, ψ_decoder3 represents the acquisition process of the third decoded output data, and P′ represents the fourth up-sampled encoded output data;
a fourth upsampling unit, configured to upsample the third decoded output data to obtain third upsampled encoded output data;
a second decoding unit for repeating the image feature extraction 6 times to obtain second decoded output data, R = ψ_decoder2(Q′, J) = ρ6(ρ5(ρ4(ρ3(ρ2(ρ1(concat(Q′, J))))))), where R represents the second decoded output data, ψ_decoder2 represents the acquisition process of the second decoded output data, and Q′ represents the third up-sampled encoded output data;
a fifth upsampling unit, configured to upsample the second decoded output data to obtain second upsampled encoded output data;
a first decoding unit for repeating the image feature extraction 7 times to obtain first decoded output data, S = ψ_decoder1(R′, I) = ρ7(ρ6(ρ5(ρ4(ρ3(ρ2(ρ1(concat(R′, I)))))))), where S represents the first decoded output data, ψ_decoder1 represents the acquisition process of the first decoded output data, and R′ represents the second up-sampled encoded output data;
a sixth upsampling unit, configured to upsample the sixth encoded output data, the fifth decoded output data, the fourth decoded output data, the third decoded output data, the second decoded output data and the first decoded output data to the same size as the clothing picture, and to denote the results as sixth encoded output upsampled data, fifth decoded output upsampled data, fourth decoded output upsampled data, third decoded output upsampled data, second decoded output upsampled data and first decoded output upsampled data, respectively;
the data fusion unit is used for fusing sixth encoding output up-sampling data, fifth decoding output up-sampling data, fourth decoding output up-sampling data, third decoding output up-sampling data, second decoding output up-sampling data and first decoding output up-sampling data by adopting a channel to obtain fused data;
and the characteristic diagram output unit is used for carrying out convolution operation on the fusion data by adopting the set convolution to obtain an output characteristic diagram of the clothing image.
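As mentioned above, the units of the feature extraction module can be condensed into the following PyTorch-style skeleton. It is only a sketch under assumptions: each ρ block is reduced to convolution + ReLU (the channel attention of the earlier sketch is omitted for brevity), max pooling stands in for downsampling, bilinear interpolation for upsampling, and the channel width, class count and 1×1 side convolutions are illustrative choices; only the repetition counts, the skip connections, the six side predictions and the channel fusion follow the description.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def rho_block(in_ch: int, out_ch: int, repeats: int) -> nn.Sequential:
    """The rho unit (two convolutions plus channel attention in the description),
    iterated `repeats` times; reduced here to conv + ReLU for brevity."""
    layers = []
    for i in range(repeats):
        layers += [nn.Conv2d(in_ch if i == 0 else out_ch, out_ch, 3, padding=1),
                   nn.ReLU(inplace=True)]
    return nn.Sequential(*layers)

class EncoderDecoderSketch(nn.Module):
    """6 encoding blocks and 5 decoding blocks with skip connections; the six side
    outputs are upsampled to the input size and fused along the channel dimension."""
    def __init__(self, num_classes: int = 20, ch: int = 64):  # both values are assumptions
        super().__init__()
        reps_enc = [7, 6, 5, 4, 4, 4]      # encoding blocks 1..6
        reps_dec = [4, 4, 5, 6, 7]         # decoding blocks 5..1
        self.encoders = nn.ModuleList(
            [rho_block(3, ch, reps_enc[0])] +
            [rho_block(ch, ch, r) for r in reps_enc[1:]])
        self.decoders = nn.ModuleList([rho_block(2 * ch, ch, r) for r in reps_dec])
        self.side = nn.ModuleList([nn.Conv2d(ch, num_classes, 1) for _ in range(6)])
        self.fuse = nn.Conv2d(6 * num_classes, num_classes, 1)  # convolution on the fused data

    def forward(self, x: torch.Tensor):
        h, w = x.shape[-2:]
        skips, feats = [], x
        for i, enc in enumerate(self.encoders):
            feats = enc(feats)
            skips.append(feats)
            if i < 5:
                feats = F.max_pool2d(feats, 2)        # downsampling between encoding blocks
        outs = [feats]                                # output of encoding block 6
        for i, dec in enumerate(self.decoders):
            skip = skips[4 - i]
            up = F.interpolate(outs[-1], size=skip.shape[-2:],
                               mode="bilinear", align_corners=False)
            outs.append(dec(torch.cat([up, skip], dim=1)))   # channel fusion with the skip
        preds = [F.interpolate(side(o), size=(h, w), mode="bilinear", align_corners=False)
                 for side, o in zip(self.side, outs)]         # six side predictions
        fused = self.fuse(torch.cat(preds, dim=1))            # final output of the network
        return fused, preds   # preds[0]: encoding block 6, preds[5]: decoding block 1
```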
The neural network training module 204 specifically includes the following units (an illustrative sketch of the resulting loss function follows them):
a first group of cross entropy loss obtaining units, configured to respectively calculate cross entropy losses of a clothing analysis graph corresponding to the clothing picture and an output feature graph of the clothing picture, the first decoded output upsampled data, the second decoded output upsampled data, the third decoded output upsampled data, the fourth decoded output upsampled data, the fifth decoded output upsampled data, and the sixth encoded output upsampled data, so as to obtain a first group of cross entropy losses;
a second group of cross entropy loss obtaining units, configured to calculate cross entropy losses between the output feature map of the clothing picture and the first decoded output upsampled data, the second decoded output upsampled data, the third decoded output upsampled data, the fourth decoded output upsampled data, the fifth decoded output upsampled data, and the sixth encoded output upsampled data, respectively, to obtain a second group of cross entropy losses;
a third group of cross entropy loss obtaining units, configured to calculate cross entropy losses of the first decoded output upsampled data and the second decoded output upsampled data, the third decoded output upsampled data, the fourth decoded output upsampled data, the fifth decoded output upsampled data, and the sixth encoded output upsampled data, respectively, to obtain a third group of cross entropy losses;
a fourth group of cross entropy loss obtaining units, configured to calculate cross entropy losses of the second decoded output upsampled data and the third decoded output upsampled data, the fourth decoded output upsampled data, the fifth decoded output upsampled data, and the sixth encoded output upsampled data, respectively, to obtain a fourth group of cross entropy losses;
a fifth group of cross entropy loss obtaining units, configured to calculate cross entropy losses of the third decoded output upsampled data and the fourth decoded output upsampled data, the fifth decoded output upsampled data, and the sixth encoded output upsampled data, respectively, to obtain a fifth group of cross entropy losses;
a sixth group of cross entropy loss obtaining units, configured to calculate cross entropy losses of the fourth decoded output upsampled data, the fifth decoded output upsampled data, and the sixth encoded output upsampled data, respectively, to obtain a sixth group of cross entropy losses;
a seventh group of cross entropy loss obtaining units, configured to calculate cross entropy losses of the fifth decoded output upsampled data and the sixth encoded output upsampled data, respectively, to obtain a seventh group of cross entropy losses;
a loss function obtaining unit, configured to perform different weighting on the first group of cross entropy losses, the second group of cross entropy losses, the third group of cross entropy losses, the fourth group of cross entropy losses, the fifth group of cross entropy losses, the sixth group of cross entropy losses, and the seventh group of cross entropy losses, and then add the weighted values to obtain a loss function;
and the training unit is used for carrying out iterative training on the neural network by adopting a random gradient descent algorithm based on the loss function.
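To illustrate how the seven groups of cross entropy losses and their weighted sum might be assembled, a hedged sketch follows; the group weights, the use of detached argmax pseudo-labels for the self-supervision terms, the side-output ordering and the SGD hyper-parameters are all assumptions, not values fixed by the invention:

```python
import torch
import torch.nn.functional as F

def total_loss(fused, preds, label, w=(1.0, 0.4, 0.4, 0.3, 0.3, 0.2, 0.1)):
    """fused: (B, C, H, W) final network output
       preds: [enc6, dec5, dec4, dec3, dec2, dec1] upsampled side outputs (assumed order)
       label: (B, H, W) ground-truth clothing analysis map
       w:     per-group weights (assumed values)."""
    enc6, dec5, dec4, dec3, dec2, dec1 = preds
    ce = F.cross_entropy

    def self_sup(teacher, students):
        # higher-level output supervises lower-level outputs via its argmax pseudo-label
        pseudo = teacher.detach().argmax(dim=1)
        return sum(ce(s, pseudo) for s in students)

    groups = [
        sum(ce(p, label) for p in [fused, dec1, dec2, dec3, dec4, dec5, enc6]),  # strong supervision
        self_sup(fused, [dec1, dec2, dec3, dec4, dec5, enc6]),
        self_sup(dec1, [dec2, dec3, dec4, dec5, enc6]),
        self_sup(dec2, [dec3, dec4, dec5, enc6]),
        self_sup(dec3, [dec4, dec5, enc6]),
        self_sup(dec4, [dec5, enc6]),
        self_sup(dec5, [enc6]),
    ]
    return sum(wi * gi for wi, gi in zip(w, groups))

# training step with stochastic gradient descent (hyper-parameters assumed):
# optimizer = torch.optim.SGD(net.parameters(), lr=0.01, momentum=0.9, weight_decay=5e-4)
# fused, preds = net(images); loss = total_loss(fused, preds, labels)
# optimizer.zero_grad(); loss.backward(); optimizer.step()
```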
The embodiments in the present description are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments are referred to each other. For the system disclosed by the embodiment, the description is relatively simple because the system corresponds to the method disclosed by the embodiment, and the relevant points can be referred to the method part for description.
The principles and embodiments of the present invention have been described herein using specific examples, which are provided only to help understand the method and the core concept of the present invention; meanwhile, for a person skilled in the art, there may be variations in the specific embodiments and the scope of application according to the idea of the present invention. In view of the above, the content of this specification should not be construed as limiting the invention.

Claims (8)

1. A method for analyzing a garment based on channel attention and self-supervision constraint, the method comprising:
acquiring a clothing picture data set, wherein the clothing picture data set comprises clothing analytic graphs corresponding to all clothing pictures;
inputting each clothing picture in the clothing picture data set into a neural network, wherein the neural network comprises a channel attention module;
based on the channel attention module, performing multiple times of image feature extraction on the input clothing pictures to obtain output feature maps of the clothing pictures;
performing iterative training on the neural network based on strong supervision constraint and self-supervision constraint to obtain a trained neural network model; the strong supervision constraint is the strong supervision constraint of the clothing analysis graph corresponding to each clothing picture on the output characteristic graph, and the self-supervision constraint is the self-supervision constraint of high-level output on low-level output in each decoding output data in the neural network;
inputting the clothing picture to be analyzed into the trained neural network model, and outputting a clothing analysis picture;
the method for extracting the image features of the input clothing pictures for multiple times based on the channel attention module to obtain the output feature map of each clothing picture specifically comprises the following steps:
repeating the image feature extraction for 7 times to obtain first coded output data, wherein I = ψ_encoder1(A) = ρ7(ρ6(ρ5(ρ4(ρ3(ρ2(ρ1(A))))))), where I represents the first encoded output data, ψ_encoder1 represents the acquisition process of the first encoded output data, ρi′ represents the i′-th repetition, i′ ∈ {1, 2, 3, 4, 5, 6, 7}, and A represents the clothing picture in the clothing picture data set;
down-sampling the first encoded output data to obtain first down-sampled encoded output data;
repeating the image feature extraction for 6 times to obtain second coded output data, J = ψ_encoder2(I′) = ρ6(ρ5(ρ4(ρ3(ρ2(ρ1(I′)))))), where J represents the second encoded output data, ψ_encoder2 represents the acquisition process of the second encoded output data, and I′ represents the first downsampled encoded output data;
down-sampling the second coded output data to obtain second down-sampled coded output data;
repeating the image feature extraction for 5 times to obtain third coded output data, wherein K = ψ_encoder3(J′) = ρ5(ρ4(ρ3(ρ2(ρ1(J′))))), where K represents the third encoded output data, ψ_encoder3 represents the acquisition process of the third encoded output data, and J′ represents the second downsampled encoded output data;
down-sampling the third encoded output data to obtain third down-sampled encoded output data;
repeating the image feature extraction 4 times to obtain fourth coded output data, wherein L = ψ_encoder4(K′) = ρ4(ρ3(ρ2(ρ1(K′)))), where L represents the fourth encoded output data, ψ_encoder4 represents the acquisition process of the fourth encoded output data, and K′ represents the third downsampled encoded output data;
down-sampling the fourth encoded output data to obtain fourth down-sampled encoded output data;
repeating the image feature extraction for 4 times to obtain fifth coded output data, wherein M = ψ_encoder5(L′) = ρ4(ρ3(ρ2(ρ1(L′)))), where M represents the fifth encoded output data, ψ_encoder5 represents the acquisition process of the fifth encoded output data, and L′ represents the fourth downsampled encoded output data;
performing downsampling on the fifth coded output data to obtain fifth downsampled coded output data;
repeating the image feature extraction for 4 times to obtain sixth coded output data, N = ψ_encoder6(M′) = ρ4(ρ3(ρ2(ρ1(M′)))), where N represents the sixth encoded output data, ψ_encoder6 represents the acquisition process of the sixth encoded output data, and M′ represents the fifth downsampled encoded output data;
performing up-sampling on the sixth encoded output data to obtain sixth up-sampled encoded output data;
repeating the image feature extraction 4 times to obtain fifth decoded output data, O = ψ_decoder5(N′, M) = ρ4(ρ3(ρ2(ρ1(concat(N′, M))))), where O represents the fifth decoded output data, ψ_decoder5 represents the acquisition process of the fifth decoded output data, concat() represents channel fusion, and N′ represents the sixth upsampled encoded output data;
performing upsampling on the fifth decoded output data to obtain fifth upsampled coded output data;
repeating the image feature extraction 4 times to obtain fourth decoded output data, P = ψ_decoder4(O′, L) = ρ4(ρ3(ρ2(ρ1(concat(O′, L))))), where P represents the fourth decoded output data, ψ_decoder4 represents the acquisition process of the fourth decoded output data, and O′ represents the fifth up-sampled encoded output data;
up-sampling the fourth decoded output data to obtain fourth up-sampled encoded output data;
repeating the image feature extraction 5 times to obtain third decoded output data, Q = ψ_decoder3(P′, K) = ρ5(ρ4(ρ3(ρ2(ρ1(concat(P′, K)))))), where Q represents the third decoded output data, ψ_decoder3 represents the acquisition process of the third decoded output data, and P′ represents the fourth up-sampled encoded output data;
up-sampling the third decoded output data to obtain third up-sampled encoded output data;
repeating the image feature extraction 6 times to obtain second decoded output data, wherein R = ψ_decoder2(Q′, J) = ρ6(ρ5(ρ4(ρ3(ρ2(ρ1(concat(Q′, J))))))), where R represents the second decoded output data, ψ_decoder2 represents the acquisition process of the second decoded output data, and Q′ represents the third up-sampled encoded output data;
upsampling the second decoded output data to obtain second upsampled coded output data;
repeating the image feature extraction for 7 times to obtain first decoded output data, S = ψ_decoder1(R′, I) = ρ7(ρ6(ρ5(ρ4(ρ3(ρ2(ρ1(concat(R′, I)))))))), where S represents the first decoded output data, ψ_decoder1 represents the acquisition process of the first decoded output data, and R′ represents the second up-sampled encoded output data;
upsampling the sixth coded output data, the fifth decoded output data, the fourth decoded output data, the third decoded output data, the second decoded output data and the first decoded output data to the same size as the clothing picture, and respectively denoting the results as sixth encoded output upsampled data, fifth decoded output upsampled data, fourth decoded output upsampled data, third decoded output upsampled data, second decoded output upsampled data and first decoded output upsampled data;
fusing sixth encoding output up-sampled data, fifth decoding output up-sampled data, fourth decoding output up-sampled data, third decoding output up-sampled data, second decoding output up-sampled data and first decoding output up-sampled data by adopting a channel to obtain fused data;
and performing convolution operation on the fusion data by adopting set convolution to obtain an output characteristic diagram of the clothing picture.
2. The method of claim 1, wherein before inputting each of the garment pictures in the garment picture dataset into a neural network, the method further comprises:
and carrying out normalization processing on each clothing picture in the clothing picture data set.
3. The garment parsing method based on channel attention and self-supervision constraint according to claim 1, wherein the image feature extraction specifically comprises:
different convolution operations are carried out on the clothes pictures input into the neural network to obtain first data;
carrying out average pooling on the first data to obtain second data;
performing feature extraction on the second data by using set convolution to obtain first feature data;
performing maximum pooling on the first data to obtain third data;
performing feature extraction on the third data by using the set convolution to obtain second feature data;
adding the first characteristic data and the second characteristic data and inputting the added first characteristic data and the added second characteristic data into a Sigmoid function to obtain fourth data;
and multiplying the fourth data by the first data to obtain output data of the channel attention module.
4. The garment parsing method based on channel attention and self-supervision constraint according to claim 1, wherein the iteratively training the neural network based on strong supervision constraint and self-supervision constraint to obtain a trained neural network model specifically comprises:
respectively calculating cross entropy losses between the clothing analysis graph corresponding to the clothing picture and each of the output characteristic graph of the clothing picture, the first decoded output upsampled data, the second decoded output upsampled data, the third decoded output upsampled data, the fourth decoded output upsampled data, the fifth decoded output upsampled data and the sixth encoded output upsampled data, to obtain a first group of cross entropy losses;
respectively calculating cross entropy losses of the output feature map of the clothing picture and the first decoding output up-sampled data, the second decoding output up-sampled data, the third decoding output up-sampled data, the fourth decoding output up-sampled data, the fifth decoding output up-sampled data and the sixth encoding output up-sampled data to obtain a second group of cross entropy losses;
calculating cross entropy losses of the first decoded output upsampled data and the second decoded output upsampled data, the third decoded output upsampled data, the fourth decoded output upsampled data, the fifth decoded output upsampled data and the sixth encoded output upsampled data, respectively, to obtain a third set of cross entropy losses;
calculating cross entropy losses of the second decoded output upsampled data and the third decoded output upsampled data, the fourth decoded output upsampled data, the fifth decoded output upsampled data and the sixth encoded output upsampled data, respectively, to obtain a fourth set of cross entropy losses;
calculating cross entropy losses of the third decoded output upsampled data and the fourth decoded output upsampled data, the fifth decoded output upsampled data and the sixth encoded output upsampled data, respectively, to obtain a fifth set of cross entropy losses;
calculating cross entropy losses of the fourth decoded output upsampled data, the fifth decoded output upsampled data and the sixth encoded output upsampled data, respectively, to obtain a sixth set of cross entropy losses;
respectively calculating cross entropy losses of the fifth decoded output upsampled data and the sixth encoded output upsampled data to obtain a seventh group of cross entropy losses;
weighting and adding the first group of cross entropy losses, the second group of cross entropy losses, the third group of cross entropy losses, the fourth group of cross entropy losses, the fifth group of cross entropy losses, the sixth group of cross entropy losses and the seventh group of cross entropy losses to obtain a loss function;
and performing iterative training on the neural network by adopting a random gradient descent algorithm based on the loss function.
5. A garment parsing system based on channel attention and self-supervision constraints, the system comprising:
the system comprises a data set acquisition module, a data processing module and a data processing module, wherein the data set acquisition module is used for acquiring a clothing picture data set, and the clothing picture data set comprises clothing analytic graphs corresponding to all clothing pictures;
the data input module is used for inputting each clothing picture in the clothing picture data set into a neural network, and the neural network comprises a channel attention module;
the feature extraction module is used for extracting image features of the input clothing pictures for multiple times based on the channel attention module to obtain output feature maps of the clothing pictures;
the neural network training module is used for carrying out iterative training on the neural network based on strong supervision constraint and self-supervision constraint to obtain a trained neural network model; the strong supervision constraint is the strong supervision constraint of the clothing analysis graph corresponding to each clothing picture on the output characteristic graph, and the self-supervision constraint is the self-supervision constraint of high-level output on low-level output in each decoding output data in the neural network;
the neural network model application module is used for inputting the clothing picture to be analyzed into the trained neural network model and outputting a clothing analysis graph;
the feature extraction module specifically includes:
a first encoding unit for repeating the image feature extraction 7 times to obtain first encoded output data, I = ψ_encoder1(A) = ρ7(ρ6(ρ5(ρ4(ρ3(ρ2(ρ1(A))))))), where I represents the first encoded output data, ψ_encoder1 represents the acquisition process of the first encoded output data, ρi′ represents the i′-th repetition, i′ ∈ {1, 2, 3, 4, 5, 6, 7}, and A represents the clothing picture in the clothing picture data set;
the first down-sampling unit is used for down-sampling the first coding output data to obtain first down-sampling coding output data;
a second encoding unit for repeating the image feature extraction 6 times to obtain second encoded output data, J = ψ_encoder2(I′) = ρ6(ρ5(ρ4(ρ3(ρ2(ρ1(I′)))))), where J represents the second encoded output data, ψ_encoder2 represents the acquisition process of the second encoded output data, and I′ represents the first downsampled encoded output data;
the second downsampling unit is used for downsampling the second coding output data to obtain second downsampling coding output data;
a third encoding unit for repeating the image feature extraction 5 times to obtain third encoded output data, K = ψ_encoder3(J′) = ρ5(ρ4(ρ3(ρ2(ρ1(J′))))), where K represents the third encoded output data, ψ_encoder3 represents the acquisition process of the third encoded output data, and J′ represents the second downsampled encoded output data;
a third downsampling unit, configured to downsample the third encoded output data to obtain third downsampled encoded output data;
a fourth encoding unit for repeating the image feature extraction 4 times to obtain fourth encoded output data, L = ψ_encoder4(K′) = ρ4(ρ3(ρ2(ρ1(K′)))), where L represents the fourth encoded output data, ψ_encoder4 represents the acquisition process of the fourth encoded output data, and K′ represents the third downsampled encoded output data;
a fourth downsampling unit, configured to downsample the fourth encoded output data to obtain fourth downsampled encoded output data;
a fifth encoding unit for repeating the image feature extraction 4 times to obtain fifth encoded output data, M = ψ_encoder5(L′) = ρ4(ρ3(ρ2(ρ1(L′)))), where M represents the fifth encoded output data, ψ_encoder5 represents the acquisition process of the fifth encoded output data, and L′ represents the fourth downsampled encoded output data;
a fifth downsampling unit, configured to downsample the fifth encoded output data to obtain fifth downsampled encoded output data;
a sixth encoding unit for repeating the image feature extraction 4 times to obtain sixth encoded output data, N = ψ_encoder6(M′) = ρ4(ρ3(ρ2(ρ1(M′)))), where N represents the sixth encoded output data, ψ_encoder6 represents the acquisition process of the sixth encoded output data, and M′ represents the fifth downsampled encoded output data;
the first up-sampling unit is used for up-sampling the sixth coded output data to obtain sixth up-sampled coded output data;
a fifth decoding unit for repeating the image feature extraction 4 times to obtain fifth decoded output data, O = ψ_decoder5(N′, M) = ρ4(ρ3(ρ2(ρ1(concat(N′, M))))), where O represents the fifth decoded output data, ψ_decoder5 represents the acquisition process of the fifth decoded output data, concat() represents channel fusion, and N′ represents the sixth upsampled encoded output data;
a second upsampling unit, configured to upsample the fifth decoded output data to obtain fifth upsampled encoded output data;
a fourth decoding unit for repeating the image feature extraction 4 times to obtain fourth decoded output data, P = ψ_decoder4(O′, L) = ρ4(ρ3(ρ2(ρ1(concat(O′, L))))), where P represents the fourth decoded output data, ψ_decoder4 represents the acquisition process of the fourth decoded output data, and O′ represents the fifth up-sampled encoded output data;
a third upsampling unit, configured to upsample the fourth decoded output data to obtain fourth upsampled encoded output data;
a third decoding unit for repeating the image feature extraction 5 times to obtain third decoded output data, Q = ψ_decoder3(P′, K) = ρ5(ρ4(ρ3(ρ2(ρ1(concat(P′, K)))))), where Q represents the third decoded output data, ψ_decoder3 represents the acquisition process of the third decoded output data, and P′ represents the fourth up-sampled encoded output data;
a fourth upsampling unit, configured to upsample the third decoded output data to obtain third upsampled encoded output data;
a second decoding unit for repeating the image feature extraction 6 times to obtain second decoded output data, R = ψ_decoder2(Q′, J) = ρ6(ρ5(ρ4(ρ3(ρ2(ρ1(concat(Q′, J))))))), where R represents the second decoded output data, ψ_decoder2 represents the acquisition process of the second decoded output data, and Q′ represents the third up-sampled encoded output data;
a fifth upsampling unit, configured to upsample the second decoded output data to obtain second upsampled encoded output data;
a first decoding unit for repeating the image feature extraction 7 times to obtain first decoded output data, S = ψ_decoder1(R′, I) = ρ7(ρ6(ρ5(ρ4(ρ3(ρ2(ρ1(concat(R′, I)))))))), where S represents the first decoded output data, ψ_decoder1 represents the acquisition process of the first decoded output data, and R′ represents the second up-sampled encoded output data;
a sixth upsampling unit, configured to upsample the sixth encoded output data, the fifth decoded output data, the fourth decoded output data, the third decoded output data, the second decoded output data and the first decoded output data to the same size as the clothing picture, and to denote the results as sixth encoded output upsampled data, fifth decoded output upsampled data, fourth decoded output upsampled data, third decoded output upsampled data, second decoded output upsampled data and first decoded output upsampled data, respectively;
the data fusion unit is used for fusing sixth encoding output up-sampling data, fifth decoding output up-sampling data, fourth decoding output up-sampling data, third decoding output up-sampling data, second decoding output up-sampling data and first decoding output up-sampling data by adopting a channel to obtain fused data;
and the characteristic diagram output unit is used for carrying out convolution operation on the fusion data by adopting set convolution to obtain an output characteristic diagram of the clothing image.
6. The channel attention and self-supervision constraint based garment parsing system of claim 5, further comprising:
and the preprocessing module is used for carrying out normalization processing on each clothing picture in the clothing picture data set.
7. The system for analyzing clothes based on channel attention and self-supervision constraint according to claim 5, wherein the image feature extraction specifically comprises:
different convolution operations are carried out on the clothes pictures input into the neural network to obtain first data;
carrying out average pooling on the first data to obtain second data;
performing feature extraction on the second data by using set convolution to obtain first feature data;
performing maximum pooling on the first data to obtain third data;
performing feature extraction on the third data by using the set convolution to obtain second feature data;
adding the first characteristic data and the second characteristic data and inputting the added first characteristic data and the added second characteristic data into a Sigmoid function to obtain fourth data;
and multiplying the fourth data by the first data to obtain output data of the channel attention module.
8. The system for analyzing clothes based on channel attention and self-supervision constraint according to claim 5, wherein the neural network training module specifically comprises:
a first group of cross entropy loss obtaining units, configured to respectively calculate cross entropy losses of a clothing analysis graph corresponding to the clothing picture and an output feature graph of the clothing picture, the first decoded output upsampled data, the second decoded output upsampled data, the third decoded output upsampled data, the fourth decoded output upsampled data, the fifth decoded output upsampled data, and the sixth encoded output upsampled data, so as to obtain a first group of cross entropy losses;
a second group of cross entropy loss obtaining units, configured to calculate cross entropy losses between the output feature map of the clothing picture and the first decoded output upsampled data, the second decoded output upsampled data, the third decoded output upsampled data, the fourth decoded output upsampled data, the fifth decoded output upsampled data, and the sixth encoded output upsampled data, respectively, to obtain a second group of cross entropy losses;
a third group of cross entropy loss obtaining units, configured to calculate cross entropy losses of the first decoded output upsampled data and the second decoded output upsampled data, the third decoded output upsampled data, the fourth decoded output upsampled data, the fifth decoded output upsampled data, and the sixth encoded output upsampled data, respectively, to obtain a third group of cross entropy losses;
a fourth group of cross entropy loss obtaining units, configured to calculate cross entropy losses of the second decoded output upsampled data and the third decoded output upsampled data, the fourth decoded output upsampled data, the fifth decoded output upsampled data, and the sixth encoded output upsampled data, respectively, to obtain a fourth group of cross entropy losses;
a fifth group of cross entropy loss obtaining units, configured to calculate cross entropy losses of the third decoded output upsampled data and the fourth decoded output upsampled data, the fifth decoded output upsampled data, and the sixth encoded output upsampled data, respectively, to obtain a fifth group of cross entropy losses;
a sixth group of cross entropy loss obtaining units, configured to calculate cross entropy losses of the fourth decoded output upsampled data, the fifth decoded output upsampled data, and the sixth encoded output upsampled data, respectively, to obtain a sixth group of cross entropy losses;
a seventh group of cross entropy loss obtaining units, configured to calculate cross entropy losses of the fifth decoded output upsampled data and the sixth encoded output upsampled data, respectively, to obtain a seventh group of cross entropy losses;
a loss function obtaining unit, configured to weight and add the first group of cross entropy losses, the second group of cross entropy losses, the third group of cross entropy losses, the fourth group of cross entropy losses, the fifth group of cross entropy losses, the sixth group of cross entropy losses, and the seventh group of cross entropy losses to obtain a loss function;
and the training unit is used for carrying out iterative training on the neural network by adopting a random gradient descent algorithm based on the loss function.
CN202110226332.1A 2021-03-01 2021-03-01 Clothing analysis method and system based on channel attention and self-supervision constraint Active CN112927236B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110226332.1A CN112927236B (en) 2021-03-01 2021-03-01 Clothing analysis method and system based on channel attention and self-supervision constraint

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110226332.1A CN112927236B (en) 2021-03-01 2021-03-01 Clothing analysis method and system based on channel attention and self-supervision constraint

Publications (2)

Publication Number Publication Date
CN112927236A CN112927236A (en) 2021-06-08
CN112927236B true CN112927236B (en) 2021-10-15

Family

ID=76172932

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110226332.1A Active CN112927236B (en) 2021-03-01 2021-03-01 Clothing analysis method and system based on channel attention and self-supervision constraint

Country Status (1)

Country Link
CN (1) CN112927236B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114511573B (en) * 2021-12-29 2023-06-09 电子科技大学 Human body analysis device and method based on multi-level edge prediction
CN114998934B (en) * 2022-06-27 2023-01-03 山东省人工智能研究院 Clothes-changing pedestrian re-identification and retrieval method based on multi-mode intelligent perception and fusion


Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2008505589A (en) * 2004-06-30 2008-02-21 コメット テクノロジーズ エルエルシー Method of data compression including compression of video data
US10204286B2 (en) * 2016-02-29 2019-02-12 Emersys, Inc. Self-organizing discrete recurrent network digital image codec
CN110889868B (en) * 2019-10-28 2023-04-18 杭州电子科技大学 Monocular image depth estimation method combining gradient and texture features
CN111681252B (en) * 2020-05-30 2022-05-03 重庆邮电大学 Medical image automatic segmentation method based on multipath attention fusion
CN112232391B (en) * 2020-09-29 2022-04-08 河海大学 Dam crack detection method based on U-net network and SC-SAM attention mechanism
CN112381172B (en) * 2020-11-28 2022-09-16 桂林电子科技大学 InSAR interference image phase unwrapping method based on U-net

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108876728A (en) * 2018-04-20 2018-11-23 南京理工大学 Single image to the fog method based on residual error study
CN108596902A (en) * 2018-05-04 2018-09-28 北京大学 The full reference image quality appraisement method of multitask based on gating convolutional neural networks
CN108932517A (en) * 2018-06-28 2018-12-04 中山大学 A kind of multi-tag clothes analytic method based on fining network model
CN110097519A (en) * 2019-04-28 2019-08-06 暨南大学 Double supervision image defogging methods, system, medium and equipment based on deep learning
CN110751636A (en) * 2019-10-12 2020-02-04 天津工业大学 Fundus image retinal arteriosclerosis detection method based on improved coding and decoding network
CN110930408A (en) * 2019-10-15 2020-03-27 浙江大学 Semantic image compression method based on knowledge reorganization
CN111833277A (en) * 2020-07-27 2020-10-27 大连海事大学 Marine image defogging method with non-paired multi-scale hybrid coding and decoding structure
CN111968088A (en) * 2020-08-14 2020-11-20 西安电子科技大学 Building detection method based on pixel and region segmentation decision fusion
CN112418027A (en) * 2020-11-11 2021-02-26 青岛科技大学 Remote sensing image road extraction method for improving U-Net network

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
Jindong Jiang et al.; RedNet: Residual Encoder-Decoder Network for Indoor RGB-D Semantic Segmentation; Computer Vision and Pattern Recognition; 2018 *
Semantic analysis, retrieval and recommendation of clothing images based on deep learning; Xu Hui et al.; Basic Sciences Journal of Textile Universities; 20201027; vol. 33, no. 3; 64-72 *
Research on image semantic segmentation methods based on encoder-decoder networks and feature encoding; Li Xiao; China Masters' Theses Full-text Database, Information Science and Technology; 20210115, no. 1; I138-1481 *
Frontalization method for multi-pose face images based on encoder-decoder networks; Xu Haiyue et al.; Scientia Sinica Informationis; 20191231; vol. 49, no. 4; 450-463 *

Also Published As

Publication number Publication date
CN112927236A (en) 2021-06-08

Similar Documents

Publication Publication Date Title
CN110111366B (en) End-to-end optical flow estimation method based on multistage loss
CN112926396B (en) Action identification method based on double-current convolution attention
CN112541503B (en) Real-time semantic segmentation method based on context attention mechanism and information fusion
CN111582316B (en) RGB-D significance target detection method
CN111369565B (en) Digital pathological image segmentation and classification method based on graph convolution network
CN113469094A (en) Multi-mode remote sensing data depth fusion-based earth surface coverage classification method
CN110969124A (en) Two-dimensional human body posture estimation method and system based on lightweight multi-branch network
CN111340814A (en) Multi-mode adaptive convolution-based RGB-D image semantic segmentation method
CN112927236B (en) Clothing analysis method and system based on channel attention and self-supervision constraint
CN114283158A (en) Retinal blood vessel image segmentation method and device and computer equipment
CN115311186B (en) Cross-scale attention confrontation fusion method and terminal for infrared and visible light images
CN111242068B (en) Behavior recognition method and device based on video, electronic equipment and storage medium
CN113450313B (en) Image significance visualization method based on regional contrast learning
CN113782190B (en) Image processing method based on multistage space-time characteristics and mixed attention network
CN117351363A (en) Remote sensing image building extraction method based on transducer
CN111709289A (en) Multi-task deep learning model for improving human body analysis effect
CN109766918A (en) Conspicuousness object detecting method based on the fusion of multi-level contextual information
CN115578280A (en) Construction method of double-branch remote sensing image defogging network
CN115049739A (en) Binocular vision stereo matching method based on edge detection
CN114821434A (en) Space-time enhanced video anomaly detection method based on optical flow constraint
CN113538402A (en) Crowd counting method and system based on density estimation
CN116824308B (en) Image segmentation model training method and related method, device, medium and equipment
CN117727022A (en) Three-dimensional point cloud target detection method based on transform sparse coding and decoding
CN117173595A (en) Unmanned aerial vehicle aerial image target detection method based on improved YOLOv7
CN114841887B (en) Image recovery quality evaluation method based on multi-level difference learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant